uniq - report or filter out repeated lines in a file

The uniq command searches a file for runs of identical adjacent lines and writes the file to standard output, keeping only one line from each run. Repeated lines are found only if they are adjacent, so an input file whose duplicates may be scattered must be sorted first (for example with sort).
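
The adjacency requirement can be illustrated with a minimal sketch. The sample lines and the variable names (unsorted, as_is, sorted_first) are made up for this illustration and are not part of the manual.

```shell
# Hypothetical sample input: "apple" occurs twice, but not on adjacent lines.
unsorted=$(printf '%s\n' apple pear apple)

# Without sorting, uniq sees no adjacent duplicates and keeps all three lines.
as_is=$(printf '%s\n' "$unsorted" | uniq)

# After sorting, the two "apple" lines are adjacent and collapse into one.
sorted_first=$(printf '%s\n' "$unsorted" | sort | uniq)

printf 'without sort:\n%s\n' "$as_is"
printf 'with sort:\n%s\n' "$sorted_first"
```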


Syntax


Format 1: uniq[ -c| -d| -u][ -n][ +m][ input_file[ output_file]]
Format 2: uniq[ -c| -d| -u][ -f field][ -s char][ input_file[ output_file]]

The two formats are described together because option -n in format 1 is equivalent to option -f field in format 2, and option +m in format 1 is equivalent to option -s char in format 2.

No option specified

The named input_file is output without repeated lines.

-c

Outputs each line once, preceded by a decimal number indicating how many times it occurred consecutively in input_file. If -c is specified together with -u or -d, uniq ignores the -u and -d options.
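
A small sketch of the -c option follows. The input lines and the variable names are illustrative only. The width of the count column is not described here and may differ between implementations, so the sketch normalizes it with awk before the result is used further.

```shell
# Hypothetical input: "red" appears on two adjacent lines, "blue" once.
counted=$(printf '%s\n' red red blue | uniq -c)

# Show the raw output; the padding in front of the count may vary.
printf '%s\n' "$counted"

# Normalize to "count line" with a single blank (assumes one word per line).
normalized=$(printf '%s\n' "$counted" | awk '{ print $1, $2 }')
printf '%s\n' "$normalized"
```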

-d

Outputs one copy each of only those lines that are repeated in input_file.

-u

Outputs only the lines that are not repeated in input_file.
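
The difference between -d and -u may be easiest to see on the same input. The following is a minimal sketch; the sample data and variable names are assumptions made for the illustration.

```shell
# Hypothetical input: "a" and "c" are repeated, "b" and "d" occur once.
input=$(printf '%s\n' a a b c c c d)

# -d: one copy of each line that is repeated.
repeated=$(printf '%s\n' "$input" | uniq -d)

# -u: only the lines that are not repeated.
single=$(printf '%s\n' "$input" | uniq -u)

printf 'repeated lines:\n%s\n' "$repeated"
printf 'unrepeated lines:\n%s\n' "$single"
```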

-n

Ignores the first n fields from the beginning of the line, plus any tabs or blanks located in front of a field, when comparing for duplicates. A field is a string of non-blank characters separated from its neighbors by tabs or blanks.

-n not specified:
Lines are compared from the beginning of the line or, if +m is specified, starting with character m+1.

Option -n is equivalent to option -f field in format 2.
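
As a sketch of field skipping, the following uses the format 2 spelling -f 1, which the text above describes as equivalent to -1 in format 1. The sample lines, in which a hypothetical sequence number precedes the actual content, are made up for the illustration.

```shell
# Hypothetical input: field 1 is a sequence number, field 2 the content.
numbered=$(printf '%s\n' '1 alpha' '2 alpha' '3 beta')

# Ignore the first field when comparing; the first line of each run is kept.
by_content=$(printf '%s\n' "$numbered" | uniq -f 1)

printf '%s\n' "$by_content"
```

Note that the output retains the skipped field of the first line in each run; the option only affects the comparison, not what is written.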

+m

Causes the first m characters from the beginning of the line to be ignored when comparing for duplicates. If the +m option is combined with the -n option, the first m characters after the nth field are ignored. Blanks following the nth field are not ignored: they must be allowed for in the value of m.

+m not specified:
Lines are compared from the beginning of the line or, if -n is specified, starting with field n+1.


Option +m is equivalent to option -s char in format 2.
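
The following sketch shows character skipping on its own and combined with field skipping, using the format 2 spellings -s and -f. The sample lines and variable names are assumptions. In the combined case the blank that follows the first field counts toward the skipped characters, as described above.

```shell
# Hypothetical input with a 2-character prefix in front of the content.
by_suffix=$(printf '%s\n' 'A:one' 'B:one' 'C:two' | uniq -s 2)
printf '%s\n' "$by_suffix"

# Combined: skip field 1, then 2 more characters (the blank and the sign).
combined=$(printf '%s\n' 'u1 -alpha' 'u2 +alpha' 'u3 +beta' | uniq -f 1 -s 2)
printf '%s\n' "$combined"
```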

input_file

Name of the file that is to be examined.

input_file not specified:
uniq reads from standard input.

output_file

Name of the file to which the output is to be written.

output_file not specified:
uniq writes to standard output.
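
A minimal sketch of the two file operands follows. The file names in.txt and out.txt and the use of mktemp to create a scratch directory are assumptions for the illustration. The second operand is written, not read, so an existing file of that name would be overwritten.

```shell
# Create a scratch directory (mktemp is assumed to be available).
workdir=$(mktemp -d)

# Hypothetical input file with one adjacent duplicate.
printf '%s\n' x x y > "$workdir/in.txt"

# First operand is read, second operand receives the output.
uniq "$workdir/in.txt" "$workdir/out.txt"

result=$(cat "$workdir/out.txt")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -r "$workdir"
```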

Locale

The following environment variables affect the execution of uniq:

LANG

Provide a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the implementation-specific default locale will be used. If any of the internationalization variables contains an invalid setting, the utility will behave as if none of the variables had been defined.

LC_ALL

If set to a non-empty string value, override the values of all the other internationalization variables.

LC_CTYPE

Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters in arguments and input files), the classification of characters as upper- or lower-case, and the mapping of characters from one case to the other.

LC_MESSAGES

Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.

NLSPATH

Determine the location of message catalogs for the processing of LC_MESSAGES.

Example 1

You want to search a file for identical lines, regardless of where they are located in the file.

A count showing how often each of these lines occurs is also to be output.

$ sort file | uniq -c

Example 2

You want to output the 10 most frequently occurring words in the file text.

$ cat text \
> | sed 's/  */ /g' \
> | tr ' ' '\n' \
> | sed '/^$/d' \
> | sort \
> | uniq -c \
> | while read N W; do printf "%06d %s\n" $N "$W"; done \
> | sort -r \
> | head -n 10

Explanation:

  • The first sed replaces each run of one or more blanks in text with a single blank.

  • tr replaces blanks in this list by newline characters.

  • sed removes empty lines from this list.

  • sort sorts this list according to EBCDIC.

  • uniq -c outputs each distinct line once, preceded by the number of times it occurs.

  • The while loop replaces the frequency with a 6-digit number with leading zeros, so that the character-by-character sort that follows orders the counts correctly.

  • sort -r sorts this frequency list in descending order, i.e. the most frequent word is on the first line.

  • head outputs the first 10 lines of this list.
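
The steps above can also be sketched in a shorter variant. This is not the pipeline shown in the example but an alternative under stated assumptions: tr -s squeezes repeated blanks while translating them to newlines, and sort -rn compares the counts numerically, which makes the zero-padding loop unnecessary. The sample text and the variable names are made up, and awk is used only to normalize the count column.

```shell
# Hypothetical sample text; a file could be read instead.
text_sample='the cat and the dog and the bird'

top_words=$(printf '%s\n' "$text_sample" \
  | tr -s ' ' '\n' \
  | sed '/^$/d' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 2 \
  | awk '{ print $1, $2 }')

# The two most frequent words with their counts.
printf '%s\n' "$top_words"
```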

See also

comm, sort