comm - select or reject lines common to two files

comm compares two files in which the lines are sorted on the basis of the currently valid collating sequence. Sorting can be performed with the sort command.

Syntax

comm [ -123 ] file1 file2

No option specified:


comm produces three columns with the following meanings:
Column 1: lines which occur in file1 only
Column 2: lines which occur in file2 only
Column 3: lines which occur in both files


Options:


-1      Column 1 is not output.
-2      Column 2 is not output.
-3      Column 3 is not output.

Combinations of options 1, 2 and 3 are also permitted, e.g.:
-12     comm outputs all lines common to both files.
-23     comm outputs all lines which only occur in file1.
-13     comm outputs all lines which only occur in file2.
-123    comm generates no output.
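
The option combinations can be tried out with a short sketch such as the following. The file names left.txt and right.txt, their contents and the use of mktemp -d for a scratch directory are assumptions made purely for illustration; both files are written in sorted order.

```shell
# Hypothetical sample data in a scratch directory; both files are already sorted.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/left.txt"
printf '%s\n' banana cherry damson > "$dir/right.txt"

# -12: suppress columns 1 and 2, leaving the lines common to both files.
both=$(comm -12 "$dir/left.txt" "$dir/right.txt")
# -23: suppress columns 2 and 3, leaving the lines found only in the first file.
only_left=$(comm -23 "$dir/left.txt" "$dir/right.txt")
# -13: suppress columns 1 and 3, leaving the lines found only in the second file.
only_right=$(comm -13 "$dir/left.txt" "$dir/right.txt")

echo "common to both:     $(echo $both)"    # banana cherry
echo "only in left.txt:   $only_left"       # apple
echo "only in right.txt:  $only_right"      # damson

rm -r "$dir"
```

Because only one column remains in each call, the output is not indented and can be passed directly to other commands.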


file1 file2

Names of the two sorted files which you want to compare.
The comm command will not function properly unless both files have been sorted. If you specify a dash (-) as one of the names, comm reads that input from standard input.
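
Reading from standard input is convenient when one of the inputs still has to be sorted. The following is a minimal sketch under assumed data: the file name sorted.txt and the word lists are hypothetical, and mktemp -d is assumed to be available.

```shell
# Hypothetical sorted reference file in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' banana cherry damson > "$dir/sorted.txt"

# The unsorted list is sorted on the fly; the dash tells comm to read
# the first input from standard input instead of from a named file.
common=$(printf '%s\n' cherry apple banana | sort | comm -12 - "$dir/sorted.txt")
echo "$common"

rm -r "$dir"
```

This avoids creating an intermediate file for the sorted copy of the first input.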

Locale

The following environment variables affect the execution of comm:

LANG

Provide a default value for the internationalization variables that are unset or null. If LANG is unset or null, the corresponding value from the implementation-specific default locale will be used. If any of the internationalization variables contains an invalid setting, the utility will behave as if none of the variables had been defined.

LC_ALL

If set to a non-empty string value, override the values of all the other internationalization variables.

LC_COLLATE

Determine the locale for the collating sequence comm expects to have been used when the input files were sorted.

LC_CTYPE

Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters in arguments). LC_CTYPE governs character classes, character conversion (shifting) and the behavior of character classes in regular expressions.

LC_MESSAGES

Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.

NLSPATH

Determine the location of message catalogs for the processing of LC_MESSAGES.
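
Since comm expects its input to be sorted according to the collating sequence selected by LC_COLLATE, it is safest to run sort and comm under the same locale. The sketch below fixes the locale to C for all three calls by setting LC_ALL, which overrides LC_COLLATE. The file names and contents are hypothetical, and the choice of the C locale is an assumption; any locale works as long as it is the same for sort and comm. The order of the output lines depends on the code set in use.

```shell
# Hypothetical unsorted input files with mixed upper and lower case.
dir=$(mktemp -d)
printf '%s\n' cherry apple Banana > "$dir/a.txt"
printf '%s\n' Banana damson apple > "$dir/b.txt"

# Sort both inputs under one fixed locale ...
LC_ALL=C sort "$dir/a.txt" > "$dir/a.sorted"
LC_ALL=C sort "$dir/b.txt" > "$dir/b.sorted"

# ... and compare them under the same locale, so that comm expects
# exactly the collating sequence that sort actually used.
common=$(LC_ALL=C comm -12 "$dir/a.sorted" "$dir/b.sorted")
echo "$common"

rm -r "$dir"
```

If sort and comm run under different collating sequences, comm may treat the input as unsorted and report lines in the wrong column.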

Example

The file books contains the titles of books and their authors. Each line contains the title of one book and the name of its author, with a space between them. You would now like to search the file books for a number of authors whose names you have listed in the file authors1. The contents of books and authors1 are as follows:

books                                            authors1
"Gormenghast" Peake                              Blyton
"Buddenbrooks" Mann                              Gogol
"Noddy" Blyton                                   Joyce
"Ulysses" Joyce                                  Kafka
                                                 Mann
                                                 Tolstoy

You can now proceed as follows:

  • Use awk to extract the authors from books.

  • Sort the authors in books using sort.

  • Redirect the output of sort to the new file authors2.

  • Compare the files authors1 and authors2 using comm -2.


$ awk '{ printf "%s\n", $2 }' books | sort > authors2


The file authors2 contains the following:

Blyton
Joyce
Mann
Peake

$ comm -2 authors1 authors2
        Blyton
Gogol
        Joyce
Kafka
        Mann
Tolstoy


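Authors that occur only in authors1 are output without indentation (column 1). The indented entries are the contents of column 3, i.e. the authors present in both files. Column 2 (authors that occur only in authors2, here Peake) is suppressed by the -2 option.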
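
If only the authors that were actually found in books are of interest, columns 1 and 2 can both be suppressed. The following sketch recreates authors1 and authors2 with the contents shown above in a scratch directory (the use of mktemp -d is an assumption for illustration) and uses -12 instead of -2:

```shell
# Recreate the two sorted author lists from the example.
dir=$(mktemp -d)
printf '%s\n' Blyton Gogol Joyce Kafka Mann Tolstoy > "$dir/authors1"
printf '%s\n' Blyton Joyce Mann Peake > "$dir/authors2"

# -12 leaves only column 3: authors present in both files.
found=$(comm -12 "$dir/authors1" "$dir/authors2")
echo "$found"
# Blyton
# Joyce
# Mann

rm -r "$dir"
```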

See also

cmp, diff, sort, uniq