LinuxCommandLibrary

uniq

Filter adjacent duplicate lines

TLDR

Display each line once

$ sort [path/to/file] | uniq

Display only the lines that appear exactly once
$ sort [path/to/file] | uniq [[-u|--unique]]

Display only duplicate lines
$ sort [path/to/file] | uniq [[-d|--repeated]]

Display number of occurrences of each line along with that line
$ sort [path/to/file] | uniq [[-c|--count]]

Display number of occurrences of each line, sorted by the most frequent
$ sort [path/to/file] | uniq [[-c|--count]] | sort [[-nr|--numeric-sort --reverse]]

SYNOPSIS

uniq [options] [input [output]]

PARAMETERS

-c, --count
    Prefix each line with the number of times it occurred.

-d, --repeated
    Only print duplicate lines, one for each group.

-D, --all-repeated[=METHOD]
    Print all duplicate lines, not just one per group. METHOD controls how groups are delimited: 'none' (the default) prints no delimiter, 'prepend' inserts a blank line before each group of repeated lines, and 'separate' inserts a blank line between groups.

-f, --skip-fields=N
    Avoid comparing the first N fields.

-i, --ignore-case
    Ignore differences in case when comparing lines.

-s, --skip-chars=N
    Avoid comparing the first N characters.

-u, --unique
    Only print lines that are not repeated (occur exactly once).

-w, --check-chars=N
    Compare no more than N characters in lines.

input
    The input file. If not specified, reads from standard input.

output
    The output file. If not specified, writes to standard output.
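The field- and character-skipping options above can be combined to compare only part of each line. As a minimal sketch (the log lines below are made up for illustration), skipping the first whitespace-separated field lets lines that differ only in a leading timestamp count as duplicates:

```shell
# Each line starts with a timestamp field, followed by a message.
# -f 1 skips the first field during comparison, so the two
# "error disk" lines are treated as adjacent duplicates.
printf '10:01 error disk\n10:02 error disk\n10:03 ok\n' | uniq -f 1
```

Note that the retained line in each group is the first one seen, timestamp included; -s and -w work the same way but count characters instead of fields.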

DESCRIPTION

The uniq command filters adjacent matching lines from its input, writing one copy of each to its output. Input can come from a file or standard input. By default, uniq collapses each run of identical adjacent lines into a single line. Options exist to count occurrences, output only repeated or only unique lines, ignore case, skip leading fields or characters, and limit how many characters are compared. Because uniq only detects *adjacent* duplicates, it is not a general-purpose duplicate remover: it is almost always combined with sort so that identical lines end up next to each other. This simple, specialized design makes it a staple of shell scripting and text-processing pipelines.
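The adjacency rule is easy to see with a small stream in which a duplicate line is not next to its twin:

```shell
# "a" appears twice, but the copies are not adjacent,
# so uniq alone removes nothing:
printf 'a\nb\na\n' | uniq          # a, b, a

# Sorting first makes the duplicates adjacent,
# so uniq can collapse them:
printf 'a\nb\na\n' | sort | uniq   # a, b
```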

CAVEATS

uniq only works on adjacent lines. Therefore, it is almost always used in conjunction with sort to ensure that all identical lines are next to each other.
Be careful with the -D/--all-repeated option, as it can produce very large outputs if the input data contains many repeated lines.

EXAMPLES

Remove duplicate lines from a sorted file:
sort input.txt | uniq > output.txt

Count the occurrences of each line in a sorted file:
sort input.txt | uniq -c

Print only the lines that appear more than once:
sort input.txt | uniq -d

Print only the unique lines:
sort input.txt | uniq -u

Ignore case when comparing lines (treat 'A' and 'a' as the same):
sort input.txt | uniq -i

Display every occurrence of each duplicated line (not just one per group):
sort input.txt | uniq -D

EXIT STATUS

The uniq utility exits with one of the following values:
0: Successful completion.
>0: An error occurred.
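In scripts, note that uniq exits 0 even when its output is empty; to test for the *presence* of duplicates, one approach is to pipe the -d output through grep -q, which fails on empty input:

```shell
# grep -q . succeeds only if uniq -d produced at least one line,
# i.e. the sorted stream contained a duplicate.
if printf 'x\nx\ny\n' | uniq -d | grep -q .; then
    echo "duplicates present"
fi
```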

HISTORY

uniq has been a standard utility in Unix-like operating systems since the early days of Unix. Its purpose remains the same: to filter adjacent matching lines. Over time, options have been added to provide more control over the comparison process, such as ignoring case or skipping fields. The core functionality, however, has remained consistent, making it a reliable and widely used tool.

SEE ALSO

sort(1), comm(1), tr(1), awk(1), sed(1)
