comm
Compare two sorted files line by line
TLDR
Produce three tab-separated columns: lines only in first file, lines only in second file and common lines
Print only lines common to both files
Print only lines common to both files, reading one file from stdin
Get lines only found in first file, saving the result to a third file
Print lines only found in second file, when the files aren't sorted
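The five tasks above might be carried out as follows. This is a minimal sketch rather than the page's original commands: the file names (file1, file2, file1_only, and the unsorted/sorted copies) and their contents are hypothetical, and the unsorted case is handled by sorting into scratch files first so that the example also runs in a plain POSIX shell.

```shell
# Scratch directory for the hypothetical sample files.
dir=$(mktemp -d)

# Two small, already sorted inputs.
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana cherry date > "$dir/file2"

# Three tab-separated columns: only in file1, only in file2, common.
all_columns=$(comm "$dir/file1" "$dir/file2")

# Only the lines common to both files.
common=$(comm -12 "$dir/file1" "$dir/file2")

# The same comparison, reading the first file from standard input.
common_stdin=$(comm -12 - "$dir/file2" < "$dir/file1")

# Lines found only in the first file, saved to a third file.
comm -23 "$dir/file1" "$dir/file2" > "$dir/file1_only"

# Lines found only in the second file when the inputs are not sorted:
# sort each input into a copy, then compare the sorted copies.
printf '%s\n' cherry apple banana > "$dir/unsorted1"
printf '%s\n' date cherry banana > "$dir/unsorted2"
sort "$dir/unsorted1" > "$dir/sorted1"
sort "$dir/unsorted2" > "$dir/sorted2"
only_second=$(comm -13 "$dir/sorted1" "$dir/sorted2")

printf '%s\n' "$all_columns"
```

In shells that support process substitution, such as bash, the last case is often written without scratch files as comm -13 <(sort file1) <(sort file2).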
SYNOPSIS
comm [OPTION]... FILE1 FILE2
PARAMETERS
-1
Suppress printing of column 1 (lines unique to FILE1).
-2
Suppress printing of column 2 (lines unique to FILE2).
-3
Suppress printing of column 3 (lines common to both files).
--output-delimiter=STR
Separate columns with STR instead of the default tab character.
--nocheck-order
Do not check that the input is sorted. This can lead to incorrect output if files are not sorted.
--zero-terminated, -z
Line delimiter is NUL, not newline. Input lines can contain newlines.
--help
Display help message and exit.
--version
Output version information and exit.
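As an illustration of how the suppression options and --output-delimiter change the output, here is a short sketch. The files a.txt and b.txt are hypothetical, and --output-delimiter assumes the GNU implementation of comm.

```shell
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/a.txt"
printf '%s\n' banana cherry date > "$dir/b.txt"

# -3 drops the common column, leaving only the lines unique to each file.
unique=$(comm -3 "$dir/a.txt" "$dir/b.txt")

# --output-delimiter replaces the tab that separates columns (GNU comm).
delimited=$(comm --output-delimiter=, "$dir/a.txt" "$dir/b.txt")

printf '%s\n' "$unique" "$delimited"
rm -r "$dir"
```

The suppression flags can be combined into one argument, so -12 leaves only the common lines and -23 leaves only the lines unique to FILE1.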
DESCRIPTION
The comm command compares two already sorted files, line by line. It outputs three columns by default: lines unique to the first file, lines unique to the second file, and lines common to both files.
This utility is useful for finding the differences and the overlap between two datasets whose lines can be sorted, such as lists of names, identifiers, or paths. Both input files must be sorted in the same collating sequence (for example, with sort run under the same locale as comm); otherwise comm may produce incorrect or incomplete output. The options -1, -2, and -3 suppress the corresponding columns and can be combined, so the output can be limited to the unique lines of one file or to the common lines. Either file can be read from standard input by giving a hyphen (-) in place of its name.
CAVEATS
Both input files must be sorted, and sorted in the same collating sequence that comm itself uses. If they are not, the output is unreliable. GNU comm usually diagnoses wrongly ordered input and exits with a nonzero status; --nocheck-order disables that check but does not make the output correct.
comm compares whole lines and performs no normalization. Lines that differ only in leading or trailing whitespace or in letter case are treated as different lines, so normalize the input before comparison if such differences should be ignored.
It is not designed for comparing unsorted files or for complex diffing scenarios (like showing line changes within a block), for which diff is more appropriate.
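The normalization mentioned above could look like the following sketch. The helper name normalize and the sample files are hypothetical; the example lowercases each line, trims leading and trailing whitespace, and sorts, with LC_ALL=C set so that sort and comm agree on the collating sequence.

```shell
# Use one collating sequence for both sort and comm.
export LC_ALL=C

dir=$(mktemp -d)
# Sample inputs that differ only in case and surrounding whitespace
# (both are already sorted in the C locale).
printf 'Apple \nbanana\n' > "$dir/a.txt"
printf '  Banana\napple\ncherry\n' > "$dir/b.txt"

# Hypothetical helper: lowercase, trim surrounding whitespace, then sort.
normalize() {
    tr '[:upper:]' '[:lower:]' < "$1" |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        sort
}

normalize "$dir/a.txt" > "$dir/a.norm"
normalize "$dir/b.txt" > "$dir/b.norm"

# Without normalization no line matches; with it, two lines do.
raw_common=$(comm -12 "$dir/a.txt" "$dir/b.txt")
norm_common=$(comm -12 "$dir/a.norm" "$dir/b.norm")

printf 'raw: [%s]\nnormalized: [%s]\n' "$raw_common" "$norm_common"
rm -r "$dir"
```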
COLUMN STRUCTURE
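By default, comm outputs three columns: column 1 contains lines found only in FILE1, column 2 contains lines found only in FILE2, and column 3 contains lines found in both files. Each output line carries exactly one input line, placed in its column by leading delimiters: none for column 1, one for column 2, and two for column 3. The delimiter is a tab unless changed with --output-delimiter. When a column is suppressed, the delimiter that would have preceded the later columns is dropped as well, so the remaining columns shift left.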
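Because the column is encoded in the number of leading tabs, the default output can be post-processed with ordinary text tools. The sketch below counts the lines in each column with awk; the sample files are hypothetical, and the approach assumes that the input lines themselves contain no tab characters.

```shell
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/a.txt"
printf '%s\n' banana cherry date > "$dir/b.txt"

# With tab as the field separator, a line in column N has exactly N fields,
# because N-1 delimiters precede it (assumes no tabs in the data).
counts=$(comm "$dir/a.txt" "$dir/b.txt" |
    awk -F'\t' '{ n[NF]++ } END { printf "%d %d %d\n", n[1], n[2], n[3] }')

printf 'only-first only-second common: %s\n' "$counts"
rm -r "$dir"
```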
STANDARD INPUT USAGE
One of the input files can be specified as a hyphen (-), which indicates that comm should read from standard input for that file. For example, comm file1 - reads the content for the second file from standard input.
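Reading from standard input is most useful at the end of a pipeline. In this sketch, with hypothetical sample data, the second input is produced and sorted on the fly, and comm reads it as FILE2 through the hyphen.

```shell
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/file1"

# The second input arrives unsorted on a pipe; sort it, then let comm read
# it as FILE2 through "-". -13 keeps only the lines missing from file1.
new_lines=$(printf '%s\n' date banana elderberry | sort | comm -13 "$dir/file1" -)

printf '%s\n' "$new_lines"
rm -r "$dir"
```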
HISTORY
The GNU version of comm is part of the GNU Core Utilities (coreutils), a collection of fundamental tools found on most Linux systems. The command itself is a long-standing Unix utility and is specified by POSIX, so other implementations exist on BSD-derived systems. The options -1, -2, and -3 are portable; long options such as --output-delimiter and --nocheck-order, as well as -z, are GNU extensions and may not be available elsewhere.