uniq
Remove adjacent duplicate lines
TLDR
Display each line once
Display only unique lines
Display only duplicate lines
Display number of occurrences of each line along with that line
Display number of occurrences of each line, sorted by the most frequent
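The TLDR items above correspond to the pipelines below. This is a sketch that uses a hypothetical sample_input function in place of a real file; any file or stream can be substituted. The padding in front of the counts printed by -c differs between implementations.

```shell
#!/bin/sh
# Hypothetical sample data standing in for a real file.
sample_input() {
  printf 'banana\napple\nbanana\ncherry\nbanana\napple\n'
}

# Display each line once
sample_input | sort | uniq

# Display only unique lines
sample_input | sort | uniq -u

# Display only duplicate lines
sample_input | sort | uniq -d

# Display number of occurrences of each line along with that line
sample_input | sort | uniq -c

# Display number of occurrences of each line, sorted by the most frequent
sample_input | sort | uniq -c | sort -nr
```

With this input the last pipeline lists banana (3), then apple (2), then cherry (1). The final sort -nr works because uniq -c puts the count at the start of each line.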
SYNOPSIS
uniq [options] [input [output]]
PARAMETERS
-c, --count
Prefix each line with the number of times it occurred.
-d, --repeated
Only print duplicate lines, one for each group.
-D, --all-repeated[=METHOD]
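Print all duplicate lines, i.e. every line of each group of repeated lines rather than one per group. The long form accepts an optional METHOD that delimits groups: 'none' (the default, no delimiting), 'prepend' (an empty line before each group), or 'separate' (an empty line between groups). -D takes no METHOD and behaves like --all-repeated=none.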
-f, --skip-fields=N
Avoid comparing the first N fields. A field is a run of blanks followed by non-blank characters.
-i, --ignore-case
Ignore differences in case when comparing lines.
-s, --skip-chars=N
Avoid comparing the first N characters. When combined with -f, the characters are skipped after the fields.
-u, --unique
Only print unique lines, i.e. lines that have no adjacent duplicate.
-w, --check-chars=N
Compare no more than N characters in lines.
input
The input file. If not specified, or if given as -, reads from standard input.
output
The output file. If not specified, writes to standard output.
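A short sketch of the comparison options. Each printf supplies input whose duplicates are already adjacent, so no sort is needed. -w is a GNU extension and may not be available in other implementations such as BSD uniq.

```shell
#!/bin/sh
# -f 1: skip the first field (the number), so "1 apple" and "2 apple" compare equal
printf '1 apple\n2 apple\n3 pear\n' | uniq -f 1

# -s 2: skip the first two characters, so "xxfoo" and "yyfoo" compare equal
printf 'xxfoo\nyyfoo\nzzbar\n' | uniq -s 2

# -w 3 (GNU): compare at most three characters, so "abc1" and "abc2" compare equal
printf 'abc1\nabc2\nabd\n' | uniq -w 3

# -i: ignore case, so "Apple" and "apple" compare equal
printf 'Apple\napple\nPear\n' | uniq -i
```

In each case uniq keeps the first line of a matching group, so the outputs are "1 apple" and "3 pear"; "xxfoo" and "zzbar"; "abc1" and "abd"; "Apple" and "Pear".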
DESCRIPTION
The uniq command filters adjacent matching lines from input, writing one copy of each to output. Input can come from a file or from standard input. By default, uniq removes duplicate adjacent lines; options count occurrences, print only repeated or only unique lines, ignore case, skip leading fields or characters, and limit how many characters are compared. Because uniq only compares each line with the one before it, it is typically run after sort so that all identical lines become adjacent, and this combination is a staple of shell scripts and text-processing pipelines. uniq is not a general-purpose duplicate remover: it removes only *adjacent* duplicates.
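The adjacency rule is easiest to see on a tiny input in which the same line appears twice but not consecutively:

```shell
#!/bin/sh
# "a" appears twice, separated by "b": uniq leaves all three lines alone
printf 'a\nb\na\n' | uniq

# Sorting first makes the two "a" lines adjacent, so uniq collapses them
printf 'a\nb\na\n' | sort | uniq

# Only the consecutive run of "a" collapses; the later "a" survives
printf 'a\na\nb\na\n' | uniq
```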
CAVEATS
uniq only works on adjacent lines. Therefore, it is almost always used in conjunction with sort to ensure that all identical lines are next to each other.
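Sorting only helps if sort groups lines the same way uniq compares them. With -i, for example, lines that differ only in case must end up adjacent, which a plain byte-order sort does not guarantee; sort -f folds case while sorting. This sketch sets LC_ALL=C so the result does not depend on the local collation order, and words is a hypothetical helper supplying the input.

```shell
#!/bin/sh
# Use byte-order collation so the outcome is the same on every system
export LC_ALL=C

# Hypothetical input: "a" and "A" differ only in case
words() { printf 'a\nB\nA\n'; }

# Byte order puts uppercase first (A, B, a), so "A" and "a" are not adjacent
# and uniq -i keeps all three lines
words | sort | uniq -i

# sort -f folds case while sorting (A, a, B), so uniq -i can merge "A" and "a"
words | sort -f | uniq -i
```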
Be careful with the -D/--all-repeated option: it prints every copy of each repeated line rather than one per group, so on highly repetitive input the output can be nearly as large as the input.
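The difference between -D and the --all-repeated methods shows up on a small, already sorted input. The METHOD argument is a GNU coreutils extension, so this sketch assumes GNU uniq; groups is a hypothetical helper supplying the input. In every mode each repeated line is printed as many times as it occurs, while the non-repeated line "b" is dropped.

```shell
#!/bin/sh
# Hypothetical sorted input: "a" twice, "b" once, "c" twice
groups() { printf 'a\na\nb\nc\nc\n'; }

echo '--- -D (same as --all-repeated=none)'
groups | uniq -D

echo '--- --all-repeated=separate: empty line between groups'
groups | uniq --all-repeated=separate

echo '--- --all-repeated=prepend: empty line before each group'
groups | uniq --all-repeated=prepend
```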
EXAMPLES
Remove duplicate lines from a file (sorting it first so duplicates are adjacent):
  sort input.txt | uniq > output.txt
Count the occurrences of each line in a file:
  sort input.txt | uniq -c
Print only the lines that appear more than once (one copy of each):
  sort input.txt | uniq -d
Print only the lines that appear exactly once:
  sort input.txt | uniq -u
Remove duplicates while ignoring case (treating 'A' and 'a' as the same); sort -f folds case so that such lines become adjacent:
  sort -f input.txt | uniq -i
Print every copy of each repeated line, with an empty line between groups (GNU uniq):
  sort input.txt | uniq --all-repeated=separate
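Instead of shell redirection, the output file can be given as a second operand, as in the SYNOPSIS, and an input operand of - means standard input, so a pipeline can still name an output file. The temporary paths below exist only for this sketch.

```shell
#!/bin/sh
# Scratch directory for this sketch, removed when the script exits
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'x\nx\ny\n' > "$tmpdir/input.txt"

# Both operands given: read input.txt, write the result to output.txt
uniq "$tmpdir/input.txt" "$tmpdir/output.txt"

# "-" as the input operand reads standard input; the result still goes to a named file
printf 'z\nz\ny\n' | sort | uniq - "$tmpdir/sorted.txt"

cat "$tmpdir/output.txt" "$tmpdir/sorted.txt"
```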
EXIT STATUS
The uniq utility exits with one of the following values:
0: Successful completion.
>0: An error occurred.
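Scripts usually act on the exit status rather than on the error message. A minimal sketch, assuming the hypothetical path below does not exist:

```shell
#!/bin/sh
# Hypothetical path assumed not to exist
missing=/nonexistent/uniq-demo.txt

# Successful run: status 0 (in a pipeline, $? is the status of uniq, the last command)
printf 'ok\nok\n' | uniq > /dev/null
echo "status on success: $?"

# Unreadable input file: uniq reports an error and returns a non-zero status
uniq "$missing" 2> /dev/null
status=$?
if [ "$status" -ne 0 ]; then
  echo "uniq failed with status $status"
fi
```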
HISTORY
uniq has been a standard utility in Unix-like operating systems since the early days of Unix and is specified by POSIX. Its purpose remains the same: to filter adjacent matching lines. Over time, options have been added to provide more control over the comparison process, such as ignoring case or skipping fields, but the core behavior has remained consistent.