join compares two files on the basis of relations ("join fields") and joins all pairs of lines with identical join fields. The result is displayed on the standard output.
When join is invoked, a join field on which the files are to be compared must be specified for each of the two files. Each field is bounded by a pair of field separators. join compares each line in the first file with lines in the second and displays one output line on the standard output for each pair of lines with identical join fields. The output line comprises specific fields from both lines.
Before the call
Each input file must be sorted so that the join fields are arranged in the currently valid collating sequence (see sort). If the default field separator is used (join without the -t option), leading separators must be ignored (see sort option -b) when the files are sorted. However, if you invoke join with option -t, leading field separators must be taken into account when sorting the files (see sort without option -b).
Syntax
Format 1: |
join[ -a n| -v n][ -e string] |
Format 2: |
join[ -a n][ -e string] |
The formats are described together since the option -j No option specified The first field in a line is the default join field for both files; the default separators are blanks, tabs, and newline characters. Multiple field separators count as one field separator, and leading separators are ignored. join displays one output line on the standard output for each pair of lines with identical join fields. Each output line consists of the following entries in the given order:
The default output field separator is a blank.
(additional output) in addition to the normal output, join outputs all the lines of the nth input file whose comparison field does not match any comparison field in the other file.You may enter 1 or 2 for n. If you require output for both files enter -a Option -a and option -v must not both be specified.
Instead of the standard output, a line is generated for each line in n for which no match is found. n may be 1 or 2. If you enter both -v
(empty output fields) Replaces empty output fields with the specified string.
The mth field is specified as the comparison field for the nth file. You may enter 1 or 2 for n; m must be a whole number greater than or equal to 1. If you do not specify the option -j for the other file then the comparison field for this other file is the 1st field. n not specified: -j not specified:
(output format)join changes the output line format, so that each output line comprises the individual fields specified in list. The common field is not printed unless you explicitly specify it in list. The list you specify must consist of elements in the form n.m, where n is either 1 or 2, and m is greater than or equal to 1. Each element in the form n.m stands for the mth field in the nth file. Individual elements must be delimited by blanks or tabs.
Defines character c as a field separator for both input and output lines. Each occurrence of c is interpreted as a field separator, i.e.
In addition, the newline character acts as a field separator for the input lines. The default field separators (blanks and tabs) are interpreted as field separators only if you specify them as a value for c.
Joins the field field from file 1. The fields are decimal whole numbers starting with 1.
Joins the field field from file 2. The fields are decimal whole numbers starting with 1.
Names of the two files to be joined on the basis of common fields by join. If the files are not sorted on their join fields, join will not process all lines! Problems may arise if a numeric file name (e.g. 1.2) is specified for file1 and the -o option is used immediately before this file name is listed. To avoid such conflicts, a numeric file name should be preceded by a slash (e.g. ./1.2). |
Locale
The following environment variables affect the execution of join: LANG Provide a default value for the internationalization variables that are unset or null. If LANG is unset of null, the corresponding value from the implementation-specific default locale will be used. If any of the internationalization variables contains an invalid setting, the utility will behave as if none of the variables had been defined. LC_ALL If set to a non-empty string value, override the values of all the other internationalization variables. LC_COLLATE Determine the locale for the collating sequence join expects to have been used when the input files were sorted. LC_CTYPE Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single- as opposed to multi-byte characters in arguments and input files), the classification of characters as upper- to lower-case, and the mapping of characters from one case to the other. LC_MESSAGES Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard error. NLSPATH Determine the location of message catalogs for the processing of LC_MESSAGES. |
Example 1
In the file place, a place is assigned to a name. In the file amount, an amount and a date are assigned to the same names. Both files are sorted by name. join is to join the two files on the names: Contents of place: Albert Buffalo Hugh Washington Irene Philadelphia Contents of amount : Albert 287.56 20.03.94 Hugh 23.15 25.06.93 Hugh 167.87 16.12.93 Irene 1212.12 12.12.94 Irene 1.98 01.01.94 Join the two files on the first join field:
Join the two files and format in columns with awk:
|
Example 2
In the file city, a name is assigned to a city. In the file amount (see Example 1), an amount and a date are assigned to a name. city is sorted by cities, amount by names. join is to join the two files on the names. Contents of city: Buffalo Albert Buffalo Frank Washington Hugh New York Eric Philadelphia Irene In this example, the join field for city is field 2, while that of amount is field 1.
|
See also
awk, comm, sort, uniq |