sort sorts lines in an input file and writes the result on the standard output.
If you specify more than one file, sort sorts and merges the files in the same operation, i.e. the contents of all input files are sorted and printed together.
Sorting can be performed either by whole lines or by specific parts of lines, known as sort keys. If you wish to sort by whole lines, you do not specify any sort keys; one or more keys can be used to sort by particular portions of lines. A sort key is defined by specifying the positions of fields in a line in the form +pos1 -pos2 (see “Defining specific sort keys”).
sort divides the lines of a file into fields. A field is a string of characters that is delimited by a field separator or a newline. Blanks and tabs are the default field separators. In a sequence of one or more default separators, all separators are part of the next field.
Leading blanks at the beginning of a line thus by default form part of the first field.
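The effect of leading blanks on the first field can be tried out with a small sketch. The sample data is made up, LC_ALL=C is set only to make the byte order predictable, and the b modifier in the -k notation (Format 1) is assumed to behave as in POSIX sort.

```shell
# Two made-up sample lines; the first one starts with a blank.
data=$(printf ' b\na\n')

# Whole-line sort: the leading blank is compared like any other
# character, and a blank sorts before letters in the C locale.
whole_line=$(printf '%s\n' "$data" | LC_ALL=C sort)

# Key on field 1 with the b modifier: leading blanks are skipped
# when the start of the key is determined.
ignore_blanks=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1b,1)

printf 'whole line:\n%s\n' "$whole_line"
printf 'with b modifier:\n%s\n' "$ignore_blanks"
```

In the first result the line with the leading blank comes first; in the second it comes last, because only the letters are compared.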
Syntax
Format 1:
sort [-m] [-o output_file] [-bdfiMnru] [-t char] [-k keydef...] [-z recsz] [-y[kmem]]
Format 2:
sort -c [-bdfiMnru] [-t char] [-k keydef...] [-z recsz] [-y[kmem]]
Format 3:
sort [-m] [-o output_file] [-bdfiMnru] [-t char] [+pos1 [-pos2]] [-z recsz] [-y[kmem]]
Format 4:
sort -c [-bdfiMnru] [-t char] [+pos1 [-pos2]] [-z recsz] [-y[kmem]]
The formats are described together because
No option specified
Input lines are sorted lexicographically by bytes (characters) in machine collating sequence.
Options that alter the behavior of sort
Multiple options may be used, provided they are specified individually, each preceded by a blank and a minus sign.
-c
sort checks whether the input file is already sorted according to the current ordering rules. If it is, nothing is output; otherwise, the first line that violates the ordering rules is displayed. Only one file may be specified with option -c.
-m
sort merges input files that are already sorted.
-o output_file
output_file is the name of the file to which the sorted contents of the input files are written. output_file can also be one of the input files, in which case the original unsorted contents of that file are overwritten.
-o output_file not specified:
Specifies a directory for temporary files.
-u
(u - unique) Causes identical lines to be output once only. Lines with identical sort keys are considered identical.
-y[kmem]
Defines the amount of memory that sort starts with. This initial size has a large impact on sorting speed: sorting a small file in a large amount of memory wastes memory, and sorting a large file in a small amount of memory wastes CPU time.
kmem
Amount of memory (in Kbytes) initially assigned to sort. If you specify a value above the maximum of 1 Mbyte or below the minimum of 16 Kbytes, the corresponding limit is used. If you specify 0 (-y0), for example, sort starts with the minimum amount of memory.
kmem not specified:
-y[kmem] not specified:
-z recsz
Allocates correctly sized buffers for the merge phase. You only need this option if you are using option -c or -m, i.e. if you are not actually sorting the files. When sorting, sort records the size of the longest line read in the sort phase so that buffers of the correct size can be allocated during the merge phase. When not sorting, sort uses a default buffer size, and lines longer than this cause sort to terminate abnormally. Specifying the actual number of bytes in the longest line to be merged (or a larger value) prevents abnormal termination.
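The -c, -m, -o and -u options described above can be tried out as follows. This is only a sketch: the file names and contents are made up, the files live in a temporary directory, and the exit status of -c is assumed to follow POSIX (0 if the file is in order, non-zero otherwise).

```shell
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
printf 'apple\ncherry\n'  > "$dir/sorted1"
printf 'banana\ncherry\n' > "$dir/sorted2"
printf 'pear\napple\n'    > "$dir/unsorted"

# -c: check only; the exit status tells whether the file is in order.
if LC_ALL=C sort -c "$dir/sorted1" >/dev/null 2>&1; then
    check_sorted=ok
else
    check_sorted=disorder
fi
if LC_ALL=C sort -c "$dir/unsorted" >/dev/null 2>&1; then
    check_unsorted=ok
else
    check_unsorted=disorder
fi

# -m: merge files that are already sorted.
# -u: output identical lines only once ("cherry" occurs in both files).
merged=$(LC_ALL=C sort -m -u "$dir/sorted1" "$dir/sorted2")

# -o: the output file may be one of the input files.
LC_ALL=C sort -o "$dir/unsorted" "$dir/unsorted"
in_place=$(cat "$dir/unsorted")

printf '%s\n' "$check_sorted" "$check_unsorted" "$merged" "$in_place"
rm -rf "$dir"
```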
Options that alter ordering rules
You have two possibilities to specify the following options:
-b
Ignores leading field separators when determining the start and end of a sort key. Note that the b option is only effective when sorting is based on sort keys (i.e. not on the whole line).
-d
Performs a lexicographical sort, taking into account only the characters for which the C functions isalnum() and isspace() return a value of "true". These are the characters defined in the current locale as letters, digits, or characters producing white space, such as blanks or tabs.
-f
Folds lowercase into uppercase before sorting, thus making no distinction between them.
-i
In non-numeric comparisons, ignores all characters for which the C function isprint() returns a value of "false", i.e. all characters defined as non-printing in the current locale. If the collating sequence is based on the ASCII table, for example, characters 001 through 037 (octal) and character 0177 (octal) are ignored (see section “ASCII character set (ISO 646)”).
-M
The first three characters of the sort key are converted to uppercase, treated as names of months, and collated in calendar order. The -M option implies the -b option.
-n
Sorts numerically. A numeric value must come first in the sort key and may consist of blanks, minus signs, digits 0-9, and a decimal point. The -n option implies the -b option, i.e. leading blanks are ignored.
-r
Reverses the collating sequence (sorting order).
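A short sketch of how some of the ordering options described above change the result. The input values are made up, and LC_ALL=C is set so that the byte order, the decimal point and the month names are predictable.

```shell
# Made-up sample values.
numbers=$(printf '10\n9\n-3\n2.5\n')

# No ordering option: byte-wise comparison, so "10" sorts before "2.5".
lexical=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n compares by numeric value; adding -r reverses that order.
numeric=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)
numeric_reversed=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n -r)

# -f folds lowercase into uppercase for the comparison.
folded=$(printf 'b\nA\nC\n' | LC_ALL=C sort -f)

# -M compares the first three characters as month names.
months=$(printf 'mar\nJAN\nFeb\n' | LC_ALL=C sort -M)

printf '%s\n' "$lexical" "$numeric" "$numeric_reversed" "$folded" "$months"
```

Without -n the values come out as -3, 10, 2.5, 9; with -n they come out as -3, 2.5, 9, 10.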
Option that alters field separators
This option must be specified separately with a minus sign.
-t char
Uses the character you specify for char as the field separator. Unlike the default field separators, char is not itself part of a field. It may, however, be part of a sort key, for example if the sort key extends from the first to the third x-separated field. Every occurrence of char is significant, i.e. two consecutive occurrences (charchar) delimit an empty field.
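The following sketch illustrates -t with a colon as the separator. The records are made up, and the key notation -k 2,2 (field 2 only) and -k 3n,3 (field 3, numeric) is assumed to follow the POSIX form of keydef.

```shell
# Made-up colon-separated records: name, group (may be empty), number.
records=$(printf 'carol::30\nalice:staff:20\nbob:admin:10\n')

# With -t : every colon is significant, so "carol::30" has an empty
# second field, which sorts before "admin" and "staff".
by_group=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 2,2)

# Field 3 compared numerically (n modifier on the key).
by_number=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 3n,3)

printf '%s\n' "$by_group" "$by_number"
```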
Defining specific sort keys
When defining sort keys, please note that sequences of letters defined as one collating element in the current locale count as a single letter. In a Spanish locale, for example, ch is a single collating element.
-k keydef
Defines the sort fields. keydef specifies a sort field in the following form:
where start_of_sort_field corresponds to +pos1 and end_of_sort_field to -pos2 (see description below). type corresponds to one of the options b, d, f, i, n or r.
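As a sketch of how -k keys and their type letters combine, the following assumes the POSIX notation field_start[type][,field_end[type]], in which fields are counted from 1. The records are made up for illustration.

```shell
# Made-up records: category and amount, separated by one blank.
records=$(printf 'fruit 10\nveg 2\nfruit 9\n')

# First key: field 1. Second key: field 2 compared numerically (type n).
# The second key only decides between lines whose first key is equal.
by_category_then_amount=$(printf '%s\n' "$records" |
    LC_ALL=C sort -k 1,1 -k 2n,2)

# A single key on field 2, numeric and reversed (types n and r).
by_amount_descending=$(printf '%s\n' "$records" |
    LC_ALL=C sort -k 2nr,2)

printf '%s\n' "$by_category_then_amount" "$by_amount_descending"
```

Because the type letters are attached to a key, they apply to that key only; the first key in the first command is still compared as plain text.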
+pos1 [-pos2]
+pos1 and -pos2 specify the start and end of a sort key on the basis of the fields in the input lines.
-pos2 not specified:
The pos1 and pos2 arguments have the form:
where m and n are integers with the following significance:
Skips n characters plus the field separator as of the last character of field m, thus addressing character n+1 within field m+1. If the -b option is in effect, field separators at the start of a field are not counted; thus, +m.nb refers to the (n+1)th non-whitespace character after field m.
.n not specified:
Example
To specify a sort key that begins with the fourth character in the second field and ends with this field, you specify +1.3 -2.
Explanation:
       End         End    End
    Field1      Field2 Field3
         |           |      |
030-456537 A.Mackenzie Dublin
             |       |
             Sort key
+1.3  Skip field 1 and 3 characters: start of the sort key
-2    Skip field 2 and 0 characters: end of the sort key
Note that default field separators, unlike those defined with option -t, are part of the following field. Hence the first character of field 2 is the blank, the second character is the A, and so on.
When there are multiple sort keys, later keys are compared only after all earlier keys compare equal.
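The key from the example above can be tried out with the sketch below. It uses -k 2.4,2, which is assumed to be the -k counterpart of +1.3 -2 (in the POSIX -k notation, fields and characters are counted from 1), because not every sort implementation still accepts the +pos1 -pos2 form. Only the first record comes from the example; the other two are made up.

```shell
# One record from the example plus two made-up ones.
records=$(printf '%s\n' \
    '030-456537 A.Mackenzie Dublin' \
    '030-111111 C.Zed Galway' \
    '030-999999 B.Adams Cork')

# Key: from the 4th character of field 2 to the end of field 2.
# The blank before field 2 counts as its first character, so the
# keys are "Mackenzie", "Zed" and "Adams".
by_surname=$(printf '%s\n' "$records" | LC_ALL=C sort -k 2.4,2)

printf '%s\n' "$by_surname"
```

The records come out in the order Adams, Mackenzie, Zed, which differs both from a whole-line sort and from a sort on the complete second field.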
file
Name of the file you wish to sort. Only one file may be specified together with the -c option. If you use a dash (-) as the name for file, sort reads from standard input.
file not specified:
Exit status
The following exit status values may occur:
Locale
The following environment variables affect the execution of sort:
LANG
Provide a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the implementation-specific default locale will be used. If any of the internationalization variables contains an invalid setting, the utility will behave as if none of the variables had been defined.
LC_ALL
If set to a non-empty string value, override the values of all the other internationalization variables.
LC_COLLATE
Determine the preset collating sequence used by the sort command.
LC_CTYPE
Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters in arguments and input files). LC_CTYPE also governs how character classes are handled by the -b, -d, -f and -i options.
LC_MESSAGES
Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.
LC_NUMERIC
Determine the form of the radix character (decimal point) in conjunction with the -n option.
LC_TIME
Determine the currently valid month names, their abbreviations and their collating sequence in conjunction with option -M.
NLSPATH
Determine the location of message catalogs for the processing of LC_MESSAGES.
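A minimal sketch of how the locale variables can be set for a single sort call. The sample data is made up, and en_US.UTF-8 is only an example locale name that may not be installed on a given system.

```shell
# Made-up sample data with mixed case.
letters=$(printf 'b\nA\na\nB\n')

# LC_ALL=C selects byte order: all uppercase letters precede lowercase.
c_order=$(printf '%s\n' "$letters" | LC_ALL=C sort)

# LC_ALL overrides LC_COLLATE, so the result is the same even though
# LC_COLLATE names another (hypothetical) locale here.
overridden=$(printf '%s\n' "$letters" |
    LC_COLLATE=en_US.UTF-8 LC_ALL=C sort)

printf '%s\n' "$c_order" "$overridden"
```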
Example 1
Sorting the contents of input_file with the second field as the sort key.
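One possible command for this example, written in the -k notation and run against made-up data in a temporary directory, is sketched below. The key -k 2,2 is an assumption based on the POSIX form of keydef.

```shell
# Create a made-up input_file in a throwaway directory.
dir=$(mktemp -d)
printf 'x pear\ny apple\nz fig\n' > "$dir/input_file"

# Sort on the second field only.
result=$(LC_ALL=C sort -k 2,2 "$dir/input_file")

printf '%s\n' "$result"
rm -rf "$dir"
```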
Example 2
Sorting the contents of input_file1 and input_file2 in reverse order on the second character of the second field (which is the first non-blank character if the fields are separated by exactly one space), writing the output to output_file.
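A sketch of one way to express this example in the -k notation. The file contents are made up, and the key -k 2.2,2.2 (second character of the second field only) is an assumption based on the POSIX form of keydef.

```shell
# Create made-up input files in a throwaway directory.
dir=$(mktemp -d)
printf 'one apple\ntwo cherry\n' > "$dir/input_file1"
printf 'three banana\n'          > "$dir/input_file2"

# -r reverses the order, -o names the output file, and the key is
# the second character of field 2 (the blank is the first character).
LC_ALL=C sort -r -o "$dir/output_file" -k 2.2,2.2 \
    "$dir/input_file1" "$dir/input_file2"

result=$(cat "$dir/output_file")
printf '%s\n' "$result"
rm -rf "$dir"
```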
Example 3
Sorting the contents of input_file1 and input_file2 in reverse order, placing the output in output_file, and using the first character in the second field as the sort key.
Example 4
Displaying the presorted file input_file, suppressing all but the first occurrence of lines having the same third field.
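This example can be sketched by combining -u and -m with a key on the third field. The data is made up and already in order by its third field; -k 3,3 is an assumption based on the POSIX form of keydef.

```shell
# Made-up file that is already sorted on its third field.
dir=$(mktemp -d)
printf 'a x 1\nb y 1\nc z 2\n' > "$dir/input_file"

# -m: the file is only merged, not sorted again.
# -u: of the lines with the same key, only the first is output.
result=$(LC_ALL=C sort -u -m -k 3,3 "$dir/input_file")

printf '%s\n' "$result"
rm -rf "$dir"
```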
See also
comm, join, uniq