LinuxCommandLibrary

uniq

Remove adjacent duplicate lines

TLDR

Display each line once

$ sort [path/to/file] | uniq

Display only unique lines
$ sort [path/to/file] | uniq [[-u|--unique]]

Display only duplicate lines
$ sort [path/to/file] | uniq [[-d|--repeated]]

Display number of occurrences of each line along with that line
$ sort [path/to/file] | uniq [[-c|--count]]

Display number of occurrences of each line, sorted by the most frequent
$ sort [path/to/file] | uniq [[-c|--count]] | sort [[-nr|--numeric-sort --reverse]]

SYNOPSIS

uniq [OPTION]... [INPUT_FILE [OUTPUT_FILE]]

PARAMETERS

-c, --count
    Prefix lines by the number of occurrences. This option effectively counts how many times each unique line appears.

-d, --repeated
    Only print duplicate lines, one occurrence per group of adjacent duplicates.

-D, --all-repeated[=METHOD]
    Print all duplicate lines. METHOD can be 'none' (the default: print all duplicates with no delimiting), 'prepend' (insert a blank line before each group of duplicates), or 'separate' (insert a blank line between groups of duplicates).

-f N, --skip-fields=N
    Avoid comparing the first N fields. A field is a run of blanks (spaces or tabs) followed by non-blank characters.

-i, --ignore-case
    Ignore differences in case when comparing lines.

-s N, --skip-chars=N
    Avoid comparing the first N characters on a line. Applied after any field skipping.

-u, --unique
    Only print unique lines (lines that have no adjacent duplicates).

-z, --zero-terminated
    Line delimiter is NUL, not newline. Useful for processing output from commands like 'find -print0'.

-w N, --check-chars=N
    Compare at most N characters in lines. Applied after any field or character skipping.

--help
    Display a help message and exit.

--version
    Output version information and exit.
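A few of the less common options above can be illustrated with short pipelines. These sketches assume GNU uniq; the -D/--all-repeated and -z options are GNU extensions and may be missing or behave differently in BSD or busybox variants:

```shell
# -D prints every line of each duplicate group (here, both 'a' lines)
printf 'a\na\nb\n' | uniq -D
# a
# a

# -f 1 skips the first field, so lines differing only in a
# leading timestamp still compare as equal
printf '2024-01-01 error\n2024-01-02 error\n' | uniq -f 1
# 2024-01-01 error

# -w 5 compares only the first 5 characters of each line
printf 'apples\napplesauce\n' | uniq -w 5
# apples

# -z uses NUL delimiters, matching sort -z and find -print0;
# tr converts the NULs back to newlines for display
printf 'a\0a\0b\0' | sort -z | uniq -z | tr '\0' '\n'
# a
# b
```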

DESCRIPTION

The uniq command filters or reports unique lines from a sorted input.

It reads data from standard input or a specified file and writes the processed output to standard output or another file. A critical aspect of uniq is that it only detects and operates on adjacent duplicate lines: if a file contains duplicate lines that are not next to each other (e.g., 'apple', 'banana', 'apple'), uniq will not treat the non-adjacent 'apple' lines as duplicates unless the input is sorted first. It is therefore very common to pipe the output of the sort command into uniq (e.g., sort file.txt | uniq) so that all duplicates throughout the file are handled. Depending on the options used, uniq can output only the unique lines, only the duplicate lines, all occurrences of duplicate lines, or a count of occurrences for each line.

CAVEATS

The primary limitation of uniq is that it only identifies and processes adjacent duplicate lines. If your input file contains duplicate lines that are not immediately consecutive, uniq will not detect them as duplicates. To ensure all duplicate lines throughout a file are handled, it is almost always necessary to sort the input data first, typically using the sort command (e.g., sort data.txt | uniq).

TYPICAL USAGE PATTERN

Due to its design of only processing adjacent duplicates, uniq is most effectively used in conjunction with the sort command. The standard pattern for removing all duplicate lines from a file is:

sort your_file.txt | uniq > unique_file.txt

This ensures that all identical lines are brought together by sort before uniq processes them.
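As a side note, GNU sort's own -u flag performs the same plain de-duplication in a single step, though it cannot replace uniq's counting or duplicate-only modes:

```shell
# Equivalent to: sort your_file.txt | uniq
printf 'b\na\nb\n' | sort -u
# a
# b
```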

HISTORY

The uniq command is a fundamental utility that has been part of Unix systems since their early development. Its core task of identifying and processing adjacent duplicate lines has remained consistent over decades. It is now a standard component of GNU Coreutils, the package of basic file, shell, and text utilities for GNU/Linux and other Unix-like systems. While its basic behavior is unchanged, modern versions have introduced additional options, such as --all-repeated, to cover more specific use cases.

SEE ALSO

sort(1), comm(1), diff(1)
