sort - sort, merge or sequence check text files

sort sorts lines in an input file and writes the result on the standard output.

If you specify more than one file, sort sorts and merges the files in the same operation, i.e. the contents of all input files are sorted and printed together.

Sorting can be performed either by whole lines or by specific parts of lines, known as sort keys. If you wish to sort by whole lines, you do not specify any sort keys; one or more keys can be used to sort by particular portions of lines. A sort key is defined by specifying the positions of fields in a line in the form +pos1 -pos2 (see “Defining specific sort keys”).

sort divides the lines of a file into fields. A field is a string of characters that is delimited by a field separator or a newline. Blanks and tabs are the default field separators. In a sequence of one or more default separators, all separators are part of the next field.
Leading blanks at the beginning of a line thus by default form part of the first field.


Syntax


Format 1: sort [ -m][ -o output_file][ -bdfiMnru][ -t char]
     [-k keydef...][ -z recsz][ -y[kmem]]
     [ -T directory][ file...]
Format 2: sort -c[ -bdfiMnru][ -t char]
     [-k keydef...][ -z recsz][ -y[kmem]]
     [ -T directory][ file...]
Format 3: sort [-m][ -o output_file][ -bdfiMnru][ -t char]
     [ +pos1[ -pos2]][ -z recsz][ -y[kmem]]
     [ -T directory][ file...]
Format 4: sort -c[ -bdfiMnru][ -t char]
     [ +pos1[ -pos2]][ -z recsz][ -y[kmem]]
     [ -T directory][ file...]

The formats are described together because they differ only in the following respects:

  • options -m and -o output_file in formats 1 and 3 are replaced by option -c in formats 2 and 4,

  • option -k keydef in formats 1 and 2 is replaced by +pos1[ -pos2] in formats 3 and 4.

No option specified

Input lines are sorted lexicographically by bytes (characters) in machine collating sequence.
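
The default ordering can be tried out with a few invented lines. This is a minimal sketch that assumes an ASCII-based collating sequence, forced here with LC_ALL=C; on a system with a different machine collating sequence (EBCDIC, for example) the relative order of uppercase letters, lowercase letters and digits may differ.

```shell
# Sort three sample lines with no options.
# LC_ALL=C forces a plain bytewise comparison; in ASCII, uppercase letters
# have lower byte values than lowercase letters, so "C" sorts before "a".
default_order=$(printf 'b\na\nC\n' | LC_ALL=C sort)
printf '%s\n' "$default_order"
```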

Options that alter the behavior of sort

Multiple options may be used, provided they are specified individually, each preceded by a blank and a minus sign.

-c

sort checks whether the input file is already sorted according to the current ordering rules. If it is, nothing is output; otherwise, the first line that does not match the ordering rules is displayed.

Only one file may be specified with option -c!
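
Because -c reports its result mainly through the exit status, it is usually evaluated in a script. The following sketch uses invented input supplied on standard input and assumes the exit status values 0 and 1 described under "Exit status".

```shell
# Already sorted input: sort -c prints nothing and the status stays 0.
sorted_status=0
printf 'a\nb\nc\n' | LC_ALL=C sort -c || sorted_status=$?

# Out-of-order input: the status is 1. Any diagnostic output is discarded
# here so that only the exit status is inspected.
unsorted_status=0
printf 'b\na\n' | LC_ALL=C sort -c >/dev/null 2>&1 || unsorted_status=$?

echo "sorted input: $sorted_status, unsorted input: $unsorted_status"
```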

-m

sort merges input files which are already sorted.
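
A minimal sketch of a merge, assuming two small files that are each already in order. The scratch directory and the file names odd.txt and even.txt are hypothetical and exist only for this illustration.

```shell
# Create two presorted sample files in a scratch directory.
workdir=$(mktemp -d)
printf 'a\nc\ne\n' > "$workdir/odd.txt"
printf 'b\nd\n'    > "$workdir/even.txt"

# -m relies on each input being sorted already and only interleaves them.
merged=$(LC_ALL=C sort -m "$workdir/odd.txt" "$workdir/even.txt")
printf '%s\n' "$merged"

rm -r "$workdir"
```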

-o output_file

output_file is the name of a file to which the sorted contents of the input file are to be written. The file named as output_file can also be one of the input files, but in this case the original unsorted contents of the named file are overwritten.

-o output_file not specified:
sort writes on the standard output.
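
Naming an input file as output_file sorts that file in place. The sketch below illustrates this with a hypothetical file fruits.txt in a scratch directory; the contents are invented.

```shell
# Create an unsorted sample file.
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/fruits.txt"

# The same file is named as input and as output_file:
# its unsorted contents are overwritten with the sorted result.
LC_ALL=C sort -o "$workdir/fruits.txt" "$workdir/fruits.txt"
in_place=$(cat "$workdir/fruits.txt")
printf '%s\n' "$in_place"

rm -r "$workdir"
```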

-T directory

Specifies a directory for temporary files.

-u  

(u - unique)

Causes identical lines to be output once only. Lines with identical sort keys are considered identical lines.
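
As a small illustration with invented input, the following sketch sorts whole lines, so only completely identical lines are collapsed:

```shell
# Four input lines, two distinct values: each value is output once.
unique_lines=$(printf 'b\na\nb\na\n' | LC_ALL=C sort -u)
printf '%s\n' "$unique_lines"
```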

-y[kmem]

Option -y defines the amount of memory that sort starts with. This initial size has a large impact on sorting speed: sorting a small file in a large amount of memory wastes memory, and sorting a large file in a small amount of memory wastes CPU time.

kmem

Amount of memory (in Kbytes) initially assigned to sort. If you specify a value above the maximum of 1 Mbyte or below the minimum of 16 Kbytes, the corresponding limit is used. Thus if you specify 0 (-y0), for example, sort starts with minimum memory.

kmem not specified:
sort starts with maximum memory.

-y[kmem] not specified:
sort starts with a system default memory size (32 Kbytes), and continues to use more space if required.

-z recsz

With this option you allocate correctly sized buffers for the merge phase. You only need to do this if you are using option -c or -m, i.e. if you are not actually sorting the files:

If you are sorting the files, sort records the size of the longest line read in the sort phase so that buffers of the correct size can be allocated during the merge phase.

If you are not sorting the files, sort normally uses a default value for the buffer size. Lines longer than this will cause sort to terminate abnormally. Supplying the actual number of bytes in the longest line to be merged (or some larger value) will prevent abnormal termination.

Options that alter ordering rules

The following options can be specified in two ways:

  • Before the first +pos1 specification:

    They then apply globally to all sort keys specified with +pos1.

    Multiple options can either be specified as usual, each with a minus sign and delimiting blanks, or they can be grouped together without intervening spaces and with just one minus sign at the beginning.

  • After a +pos1 or -pos2 specification:

    They then override global settings for the sort key to which they are attached, i.e. the altered ordering rule applies only to the preceding position specification.

    These options are directly appended to +pos1 or -pos2 without minus signs and blanks.

-b

Ignores leading field separators when determining the start and end of a sort key. Note that the b option is only effective when sorting is based on sort keys (i.e. not on the whole line).

-d

Performs a lexicographical sort, taking into account only the characters for which the C functions isalnum() and isspace() return a value of "true". These are the characters defined in the current locale as alphanumeric letters, digits, or characters producing white space, such as blanks or tabs.

-f

Folds lowercase into uppercase before sorting, thus making no distinction between them.
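
The effect of -f is easiest to see next to the default ordering. This sketch uses invented input and assumes an ASCII-based collating sequence (LC_ALL=C), in which all uppercase letters precede all lowercase letters.

```shell
# Without -f, "C" sorts before "b" because of its lower byte value.
plain=$(printf 'b\nA\nC\n' | LC_ALL=C sort)

# With -f, case is ignored for the comparison, giving alphabetical order.
folded=$(printf 'b\nA\nC\n' | LC_ALL=C sort -f)

printf 'default:\n%s\nwith -f:\n%s\n' "$plain" "$folded"
```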

-i

In non-numeric comparisons, ignores all characters for which the C function isprint() returns a value of "false", i.e. all characters defined as non-printing in the current locale. If the collating sequence is based on the ASCII table, for example, characters 001 through 037 (octal) and character 0177 (octal) are ignored (see section “ASCII character set (ISO 646)”).

-M

The first three characters of the sort key are converted to uppercase, treated as names of months, and collated in calendar order. The -M option implies the -b option.
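
A short sketch with invented input, assuming English month abbreviations as provided by the C locale (the month names actually recognized depend on LC_TIME):

```shell
# Lowercase abbreviations are converted to uppercase and compared as months,
# so the result follows the calendar rather than the alphabet.
by_month=$(printf 'mar\njan\nfeb\n' | LC_ALL=C sort -M)
printf '%s\n' "$by_month"
```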

-n

Sorts numerically. A numeric value must come first in the sort key and may consist of: blanks, minus signs, digits 0-9, and a decimal point. The -n option implies the -b option, i.e. leading blanks are ignored.
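
The difference between character-based and numeric ordering can be sketched with three invented values. The example assumes an ASCII-based collating sequence (LC_ALL=C) for the non-numeric comparison.

```shell
# Compared as text, "10" precedes "9" because the character 1 sorts before 9.
text_order=$(printf '10\n9\n-1\n' | LC_ALL=C sort)

# Compared numerically, the values are ordered by magnitude.
numeric_order=$(printf '10\n9\n-1\n' | LC_ALL=C sort -n)

printf 'default:\n%s\nwith -n:\n%s\n' "$text_order" "$numeric_order"
```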

-r

Reverses the collating sequence (sorting order).

Option that alters field separators

This option must be specified separately with a minus sign.

-t char

Uses the character you specify for char as the field separator. Unlike default field separators, char is itself not part of a field. It may, however, be part of a sort key, for example if the sort key extends from the first to the third x-separated field. Every field separator char is significant, i.e. charchar delimits an empty field.

-t char not specified:
The default field separators apply (blanks and tabs). A sequence of one or more default field separators forms part of the following field.
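
The following sketch shows a colon used as field separator, including an empty field. The input is invented, and the key is given in the -k form with 1-based field numbers as defined by POSIX; that numbering is an assumption about the sort implementation in use.

```shell
# Sort on the second colon-separated field only.
# In "y::" the second field is empty, and an empty key sorts first.
by_second=$(printf 'x:2\ny::\nz:1\n' | LC_ALL=C sort -t : -k 2,2)
printf '%s\n' "$by_second"
```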

Defining specific sort keys

When defining sort keys please note that sequences of letters defined as one collating element in the current locale count as a single letter. In a Spanish locale, for example, ch is a single collating element.

-k keydef

Defines the sort fields. keydef is defined as a sort field in the following form:

start_of_sort_field[type][,end_of_sort_field[type]]

where start_of_sort_field corresponds to +pos1 and end_of_sort_field to -pos2 (see description below). type corresponds to one of the options b, d, f, i, n or r.

+pos1[ -pos2]

+pos1 and -pos2 specify the start and end of a sort key on the basis of the fields in the input lines.
+pos1 is the position of the first character in the sort key, -pos2 refers to the first character after it. +pos1 must come before -pos2.

-pos2 not specified:
The sort key extends from +pos1 to the end of the line.

The pos1 and pos2 arguments have the form:

m[.n]

where m and n are integers with the following significance:

m

Skips m fields of the line, addressing field m+1.


.n

Skips n characters plus the field separator as of the last character of field m, thus addressing character n+1 within field m+1. If the -b option is in effect, field separators at the start of a field are not counted; thus, +m.nb refers to the n+1th non-whitespace character after field m.

.n not specified:
Is equivalent to .0 and refers to the first character after field m. If the -b option is in effect, field separators at the start of a field are not counted; thus, +m.0b refers to the first non-whitespace character in the m+1th field.

Example

To specify a sort key that begins with the fourth character in the second field and ends with this field, you enter:

sort +1.3 -2

Explanation:


    End         End     End
    Field1      Field2  Field3
         |           |       | 
030-456537 A.Mackenzie  Dublin
             |       |
             Sort key


+1.3

Skip field 1 and 3 characters: 
the 4th character after field 1 is the 1st character in the sort key: M

-2

Skip field 2 and 0 characters:
the 1st character after field 2 is the 1st character after the sort field: blank. Thus the character before is the last character in the sort key: e

Note that default field separators, unlike those defined with option -t, are part of the following field. Hence the first character of field 2 is the blank, the second character is the A, and so on.
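
The same key can be tried out in the -k notation. This is a sketch under the assumption that -k uses 1-based field and character numbers as defined by POSIX, so that +1.3 -2 is written as -k 2.4,2. The second input line is invented for the comparison.

```shell
# The key starts at the 4th character of field 2 (the leading blank counts
# as the 1st character) and ends with field 2.
# The keys are therefore "Mackenzie" and "Adams".
by_name=$(printf '%s\n' \
  '030-456537 A.Mackenzie  Dublin' \
  '040-123456 B.Adams  Cork' | LC_ALL=C sort -k 2.4,2)
printf '%s\n' "$by_name"
```

Sorted by whole lines, the line starting with 030 would come first; sorted by this key, the Adams line precedes the Mackenzie line.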

When there are multiple sort keys, later keys are compared only after all earlier keys compare equal.
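
A minimal sketch of two sort keys with invented input, again assuming the POSIX -k notation with 1-based field numbers: the first field is compared as text, and the second field is compared numerically only where the first fields are equal.

```shell
# Primary key: field 1 (text). Secondary key: field 2 (numeric, type n).
# The two "a" lines tie on the first key and are ordered by 9 < 10.
two_keys=$(printf 'b 2\na 10\na 9\n' | LC_ALL=C sort -k 1,1 -k 2,2n)
printf '%s\n' "$two_keys"
```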

file

Name of the file you wish to sort.
You may name more than one file. All named files are sorted and merged, and the input lines from all of them together are sorted and written to standard output. In the input files, any letter sequence defined as a collating element in the current locale counts as a single letter. Thus in a Spanish locale ch is a single collating element. When the last line in an input file is missing a newline character, sort appends one, issues a warning, and continues.

Only one file may be specified together with the -c option.

If you use a dash (-) as the name for file, sort reads from standard input.

file not specified:
sort reads from standard input.

Exit status

The following exit status values may occur:

0     All input files were processed successfully. If option -c was set, the input file was sorted correctly.
1     If -c was set, the input file was not sorted as specified. If both -c and -u were set, two input lines with identical sort keys were found.
>1    An error has occurred.

Locale

The following environment variables affect the execution of sort:

LANG

Provide a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the implementation-specific default locale will be used. If any of the internationalization variables contains an invalid setting, the utility will behave as if none of the variables had been defined.

LC_ALL

If set to a non-empty string value, override the values of all the other internationalization variables.

LC_COLLATE

Determine the preset collating sequence used by the sort command.

LC_CTYPE

Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters in arguments and input files). LC_CTYPE also governs how character classes are handled by the -b, -d, -f and -i options.

LC_MESSAGES

Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.

LC_NUMERIC

Determine the form of the radix character (decimal point) in conjunction with the -n option.

LC_TIME

Determine the currently valid month names, their abbreviations and their collating sequence in conjunction with option -M.

NLSPATH

Determine the location of message catalogs for the processing of LC_MESSAGES.

Example 1

Sorting the contents of input_file with the second field as the sort key.

$ sort +1 -2 input_file

Example 2

You wish to sort the contents of input_file1 and input_file2 in reverse order on the second character of the second field (this is the first non-blank character if all fields are separated by exactly one blank). The output is to be written to output_file.

$ sort -r -o output_file +1.0 -1.2 input_file1 input_file2

Example 3

Sorting the contents of input_file1 and input_file2 in reverse order, placing the output in output_file, and using the first character in the second field as the sort key.

$ sort -r -o output_file +1.0b -1.1b input_file1 input_file2

Example 4

Displaying the presorted file input_file, suppressing all but the first occurrence of lines having the same third field.

$ sort -u +2 -3 input_file

See also

comm, join, uniq