Your Browser is not longer supported

Please use Google Chrome, Mozilla Firefox or Microsoft Edge to view the page correctly
Loading...

{{viewport.spaceProperty.prod}}

join - relational database operator

&pagelevel(4)&pagelevel

join compares two files on the basis of relations ("join fields") and joins all pairs of lines with identical join fields. The result is displayed on the standard output.

When join is invoked, a join field on which the files are to be compared must be specified for each of the two files. Each field is bounded by a pair of field separators. join compares each line in the first file with lines in the second and displays one output line on the standard output for each pair of lines with identical join fields. The output line comprises specific fields from both lines.

Before the call

Each input file must be sorted so that the join fields are arranged in the currently valid collating sequence (see sort). If the default field separator is used (join without the -t option), leading separators must be ignored (see sort option -b) when the files are sorted. However, if you invoke join with option -t, leading field separators must be taken into account when sorting the files (see sort without option -b).


Syntax


Format 1: join[ -a n| -v n][ -e string]
    [ -o list][ -t c]
    [ -1 field][ -2 field] file1 file2
Format 2: join[ -a n][ -e string]
    [ -j field][ -j1 field][ -j2 field]
    [ -o list...][ -t c] file1 file2

The formats are described together since the option -j field in format 2 corresponds to the options -1 field -2 field in format 1. -j1 field is equivalent to -1 field and -j2 field is equivalent to -2 field.

No option specified

The first field in a line is the default join field for both files; the default separators are blanks, tabs, and newline characters. Multiple field separators count as one field separator, and leading separators are ignored.

join displays one output line on the standard output for each pair of lines with identical join fields. Each output line consists of the following entries in the given order:

  • the common field

  • the rest of the line from the first file

  • the rest of the line from the second file

The default output field separator is a blank.

-a n

(additional output) in addition to the normal output, join outputs all the lines of the nth input file whose comparison field does not match any comparison field in the other file.You may enter 1 or 2 for n. If you require output for both files enter -a 1 -a 2.

Option -a and option -v must not both be specified.

-v n

Instead of the standard output, a line is generated for each line in n for which no match is found. n may be 1 or 2. If you enter both -v 1 and -v 2 then all the lines for which there is no match are output.

-e string

(empty output fields) Replaces empty output fields with the specified string.

-j[n] m

The mth field is specified as the comparison field for the nth file. You may enter 1 or 2 for n; m must be a whole number greater than or equal to 1.

If you do not specify the option -j for the other file then the comparison field for this other file is the 1st field.

n not specified:
The join field for both files is the m-th field.

-j not specified:
The join field for both files is the first field.

-o list

(output format)join changes the output line format, so that each output line comprises the individual fields specified in list. The common field is not printed unless you explicitly specify it in list.

The list you specify must consist of elements in the form n.m, where n is either 1 or 2, and m is greater than or equal to 1. Each element in the form n.m stands for the mth field in the nth file. Individual elements must be delimited by blanks or tabs.

-t c

Defines character c as a field separator for both input and output lines. Each occurrence of c is interpreted as a field separator, i.e.

  • two consecutive c separators designate an empty field, and

  • a leading c is significant and designates an empty first field.

In addition, the newline character acts as a field separator for the input lines.

The default field separators (blanks and tabs) are interpreted as field separators only if you specify them as a value for c.

-1 field

Joins the field field from file 1. The fields are decimal whole numbers starting with 1.

-2 field

Joins the field field from file 2. The fields are decimal whole numbers starting with 1.

file1 file2

Names of the two files to be joined on the basis of common fields by join.
If you use a dash (-) as the name for file1, join reads from standard input.


If the files are not sorted on their join fields, join will not process all lines!

Problems may arise if a numeric file name (e.g. 1.2) is specified for file1 and the -o option is used immediately before this file name is listed. To avoid such conflicts, a numeric file name should be preceded by a slash (e.g. ./1.2).

Locale

The following environment variables affect the execution of join:

LANG

Provide a default value for the internationalization variables that are unset or null. If LANG is unset of null, the corresponding value from the implementation-specific default locale will be used. If any of the internationalization variables contains an invalid setting, the utility will behave as if none of the variables had been defined.

LC_ALL

If set to a non-empty string value, override the values of all the other internationalization variables.

LC_COLLATE

Determine the locale for the collating sequence join expects to have been used when the input files were sorted.

LC_CTYPE

Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters in arguments and input files), the classification of characters as upper- to lower-case, and the mapping of characters from one case to the other.

LC_MESSAGES

Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error.

NLSPATH

Determine the location of message catalogs for the processing of LC_MESSAGES.

Example 1

In the file place, a place is assigned to a name. In the file amount, an amount and a date are assigned to the same names. Both files are sorted by name. join is to join the two files on the names:

Contents of place:

Albert Buffalo
Hugh Washington
Irene Philadelphia

Contents of amount :

Albert   287.56  20.03.94
Hugh      23.15  25.06.93
Hugh     167.87  16.12.93
Irene   1212.12  12.12.94
Irene      1.98  01.01.94

Join the two files on the first join field:

$ join place amount

Albert Buffalo 287.56 20.03.94

Hugh Washington 23.15 25.06.93

Hugh Washington 167.87 16.12.93

Irene Philadelphia 1212.12 12.12.94

Irene Philadelphia 1.98 01.01.94

Join the two files and format in columns with awk:

$ join place amount | awk '{printf("%-10s %-15s %-10s %-10s\n",$1,$2,$3,$4)}'

Albert    Buffalo          287.56    20.03.94

Hugh      Washington        23.15    25.06.93

Hugh      Washington       167.87    16.12.93
Irene     Philadelphia    1212.12    12.12.94

Irene     Philadelphia       1.98    01.01.94

Example 2

In the file city, a name is assigned to a city. In the file amount (see Example 1), an amount and a date are assigned to a name. city is sorted by cities, amount by names. join is to join the two files on the names.

Contents of city:

Buffalo      Albert
Buffalo      Frank
Washington   Hugh
New York     Eric
Philadelphia Irene

In this example, the join field for city is field 2, while that of amount is field 1.
Before the files are joined, city must be sorted on field 2. The output is subsequently formatted into columns with awk:

$ sort -b +1 city | join -j1 2 - amount | \

> awk '{printf("%-10s %-15s %-10s %-10s\n",$1,$2,$3,$4)}'
Albert     Buffalo         287.56     20.03.94

Hugh       Washington      23.15      25.06.93

Hugh       Washington      167.87     16.12.93
Irene      Philadelphia    1212.12    12.12.94

Irene      Philadelphia    1.98       01.01.94

See also

awk, comm, sort, uniq