LinuxCommandLibrary

comm

Compare two sorted files line by line

TLDR

Produce three tab-separated columns: lines only in first file, lines only in second file, and common lines

$ comm [file1] [file2]

Print only lines common to both files
$ comm -12 [file1] [file2]

Print only lines common to both files, reading one file from stdin
$ cat [file1] | comm -12 - [file2]

Get lines only found in first file, saving the result to a third file
$ comm -23 [file1] [file2] > [file1_only]

Print lines only found in second file, when the files aren't sorted
$ comm -13 <(sort [file1]) <(sort [file2])

SYNOPSIS

comm [OPTION]... FILE1 FILE2

PARAMETERS

-1
    suppress lines unique to FILE1 (column 1)

-2
    suppress lines unique to FILE2 (column 2)

-3
    suppress lines common to both files (column 3)

-z, --zero-terminated
    use NUL instead of newline as the line delimiter for input and output; the column delimiter is unchanged (GNU)

--help
    display usage information and exit

--version
    output version information and exit

DESCRIPTION

comm compares two sorted text files line by line, outputting three tab-separated columns to stdout:

  • Column 1: lines unique to FILE1
  • Column 2: lines unique to FILE2
  • Column 3: lines common to both

Lines are compared exactly as-is. Column 1 has no leading delimiter; column 2 has one tab; column 3 has two tabs.
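
The column layout described above can be seen with a small sketch. The file names and contents are made up for illustration, and both files are already sorted:

```shell
# Hypothetical sample data; both files are already in sorted order.
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmpdir/a.txt"
printf '%s\n' banana cherry date > "$tmpdir/b.txt"

# Plain comm: column 1 has no prefix, column 2 one tab, column 3 two tabs.
columns=$(comm "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$columns"

rm -r "$tmpdir"
```

Here "apple" is printed flush left, "banana" and "cherry" are indented by two tabs, and "date" by one tab.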

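Files must be sorted in ascending order with sort(1), using the same locale collating sequence (LC_COLLATE) that comm runs under; otherwise the output is unreliable, and GNU comm warns that the input is not in sorted order. Because comm reads each file once in a single pass, it computes differences, intersections, and symmetric differences of large files efficiently.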
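
One way to keep sort and comm in agreement is to pin the locale for both. This is a minimal sketch: the file names are hypothetical, and forcing the C locale is one possible choice, not a requirement:

```shell
# Hypothetical unsorted inputs with mixed case.
tmpdir=$(mktemp -d)
printf '%s\n' pear Apple banana > "$tmpdir/raw1.txt"
printf '%s\n' banana Zebra pear > "$tmpdir/raw2.txt"

# Sort and compare under the same locale so both tools agree on ordering.
LC_ALL=C sort "$tmpdir/raw1.txt" > "$tmpdir/s1.txt"
LC_ALL=C sort "$tmpdir/raw2.txt" > "$tmpdir/s2.txt"
common=$(LC_ALL=C comm -12 "$tmpdir/s1.txt" "$tmpdir/s2.txt")
printf '%s\n' "$common"

rm -r "$tmpdir"
```

In the C locale, uppercase letters sort before lowercase ones, so "Apple" and "Zebra" come first in their files. The common lines are "banana" and "pear".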

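Options suppress columns: -1 hides lines unique to FILE1, -2 hides lines unique to FILE2, and -3 hides common lines. Options can be combined: comm -12 prints only the common lines (the intersection), comm -23 prints lines found only in FILE1, and comm -13 prints lines found only in FILE2. When a column is suppressed, the tabs that would precede the remaining columns are omitted as well.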
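
The option combinations map onto set operations. The sketch below uses hypothetical files of single digits. The symmetric difference strips tabs with tr, which assumes the data itself contains no tabs:

```shell
# Hypothetical sorted inputs.
tmpdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 > "$tmpdir/a.txt"
printf '%s\n' 3 4 5 6 > "$tmpdir/b.txt"

intersection=$(comm -12 "$tmpdir/a.txt" "$tmpdir/b.txt")  # lines in both files
only_a=$(comm -23 "$tmpdir/a.txt" "$tmpdir/b.txt")        # a minus b
only_b=$(comm -13 "$tmpdir/a.txt" "$tmpdir/b.txt")        # b minus a

# -3 keeps columns 1 and 2; deleting the tab prefix merges them into one list.
# Assumption: no line contains a tab of its own.
sym_diff=$(comm -3 "$tmpdir/a.txt" "$tmpdir/b.txt" | tr -d '\t')

rm -r "$tmpdir"
```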

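If FILE1 or FILE2 (but not both) is -, that input is read from stdin. With -z (a GNU extension), input and output lines are terminated by NUL instead of newline; columns are still separated by tabs.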
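
Reading one input from stdin lets comm sit at the end of a pipeline. The following is a sketch with made-up names; it sorts the piped data on the fly and assumes the file on disk is already sorted:

```shell
# Hypothetical reference list, already sorted.
tmpdir=$(mktemp -d)
printf '%s\n' alice bob carol > "$tmpdir/allowed.txt"

# "-" stands for standard input; the piped names are sorted before comparison.
not_allowed=$(printf '%s\n' dave bob alice | sort | comm -23 - "$tmpdir/allowed.txt")
printf '%s\n' "$not_allowed"

rm -r "$tmpdir"
```

Only "dave" is printed, since it is the one piped name missing from the reference list.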

Exit status: 0 on success, nonzero on error such as an unreadable file (GNU comm returns 1). comm performs no case folding or whitespace normalization, so lines must match exactly to count as common.
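
A script can branch on the exit status. This sketch triggers an error by pointing at a deliberately missing, hypothetical file:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' a b > "$tmpdir/present.txt"

# The second path does not exist, so comm is expected to fail.
if comm "$tmpdir/present.txt" "$tmpdir/missing.txt" >/dev/null 2>&1; then
  status=0
else
  status=$?   # exit status of the failed comm invocation
fi
echo "comm exit status: $status"

rm -r "$tmpdir"
```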

CAVEATS

Input files must be pre-sorted with sort(1); unsorted input yields incorrect results. Only one of the two inputs can be stdin. comm matches whole lines only and has no support for regex patterns.

COLUMN DELIMITERS

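Default: tab. Column 1: no prefix; column 2: 1 tab; column 3: 2 tabs. -z changes the line terminator to NUL, not the column delimiter. GNU comm accepts --output-delimiter=STR to use STR in place of the tab.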
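
Because the column is encoded only by the number of leading tabs, the output can be post-processed by counting them. This is a rough sketch using awk; the labels and file names are illustrative, and it assumes no input line begins with a tab:

```shell
# Hypothetical sorted inputs.
tmpdir=$(mktemp -d)
printf '%s\n' a b c > "$tmpdir/x.txt"
printf '%s\n' b c d > "$tmpdir/y.txt"

# Classify each output line by its leading tabs.
# Assumption: the compared lines themselves never start with a tab.
labeled=$(comm "$tmpdir/x.txt" "$tmpdir/y.txt" | awk '
  /^\t\t/ { print "both: "   substr($0, 3); next }
  /^\t/   { print "second: " substr($0, 2); next }
          { print "first: "  $0 }
')
printf '%s\n' "$labeled"

rm -r "$tmpdir"
```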

EXAMPLES

sort file1 > f1; sort file2 > f2; comm f1 f2 # full comparison
comm -12 f1 f2 # common lines only
comm -23 f1 f2 # lines only in f1

HISTORY

First appeared in Version 4 Unix (1973). Specified by POSIX (included in POSIX.1-2001). The -z option was added in GNU coreutils 8.25 (2016). Widely used in data processing pipelines.

SEE ALSO

sort(1), uniq(1), diff(1), join(1), cmp(1)
