uniq
Remove adjacent duplicate lines
TLDR
Display each line once
Display only unique lines
Display only duplicate lines
Display number of occurrences of each line along with that line
Display number of occurrences of each line, sorted by the most frequent
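The use cases above correspond, in order, to invocations along the following lines (file.txt is a placeholder name; GNU uniq and sort are assumed):
sort file.txt | uniq
sort file.txt | uniq -u
sort file.txt | uniq -d
sort file.txt | uniq -c
sort file.txt | uniq -c | sort -nr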
SYNOPSIS
uniq [OPTION]... [INPUT_FILE [OUTPUT_FILE]]
PARAMETERS
-c, --count
Prefix each line with the number of adjacent occurrences it represents; on sorted input this effectively counts how many times each line appears.
-d, --repeated
Only print duplicate lines, one occurrence per group of adjacent duplicates.
-D, --all-repeated[=METHOD]
Print all duplicate lines. METHOD controls how groups are delimited: 'none' (default, no delimiting), 'prepend' (print a blank line before each group of duplicates), or 'separate' (print a blank line between groups).
-f N, --skip-fields=N
Avoid comparing the first N fields. A field is a run of non-blank characters separated from the next by blanks (spaces or tabs); see the examples after this list.
-i, --ignore-case
Ignore differences in case when comparing lines.
-s N, --skip-chars=N
Avoid comparing the first N characters on a line. Applied after any field skipping.
-u, --unique
Only print unique lines (lines that have no adjacent duplicates).
-z, --zero-terminated
Line delimiter is NUL, not newline. Useful for processing output from commands like 'find -print0'.
-w N, --check-chars=N
Compare at most N characters in lines. Applied after any field or character skipping.
--help
Display a help message and exit.
--version
Output version information and exit.
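As a brief illustration of the comparison-scoping options (the file names and field layouts here are hypothetical): to skip a leading timestamp field before comparing lines, one might run
uniq -f 1 log.txt
and to compare only the first 8 characters of each line, ignoring case:
uniq -i -w 8 log.txt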
DESCRIPTION
The uniq command filters adjacent matching lines from its input, or reports them, depending on the options given.
It reads data from standard input or a specified file and writes the processed output to standard output or another file. A critical aspect of uniq is that it only detects and operates on adjacent duplicate lines: if a file contains duplicates that are not next to each other (e.g., the three lines 'apple', 'banana', 'apple'), uniq will not treat the two 'apple' lines as duplicates unless the file is sorted first. It is therefore very common to pipe the output of sort into uniq (e.g., sort file.txt | uniq) so that all duplicates throughout the file are handled. Depending on the options used, uniq can output only the unique lines, only the duplicate lines, all occurrences of duplicate lines, or a count of the occurrences of each line.
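Continuing the example above, suppose fruits.txt (a hypothetical file) contains the three lines 'apple', 'banana', 'apple'. Running
uniq fruits.txt
prints all three lines unchanged, because the two 'apple' lines are not adjacent, whereas
sort fruits.txt | uniq
prints only 'apple' and 'banana'.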
CAVEATS
The primary limitation of uniq is that it only identifies and processes adjacent duplicate lines; duplicates that are not immediately consecutive go undetected. To handle all duplicate lines throughout a file, sort the input first, typically with the sort command (e.g., sort data.txt | uniq).
TYPICAL USAGE PATTERN
Because uniq only processes adjacent duplicates, it is most effective when used together with the sort command. The standard pattern for removing all duplicate lines from a file is:
sort your_file.txt | uniq > unique_file.txt
This ensures that all identical lines are brought together by sort before uniq processes them.
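Since uniq itself accepts input and output file operands (see SYNOPSIS), an equivalent form, assuming the input has already been sorted into sorted_file.txt, is:
uniq sorted_file.txt unique_file.txt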
HISTORY
The uniq command is a fundamental utility that has been part of Unix systems since their early development. Its core functionality of identifying and processing adjacent duplicate lines has remained consistent over decades. It is now a standard component of GNU Coreutils, the package that provides the basic file, shell, and text manipulation utilities of the GNU operating system. While its basic behavior has not changed, modern versions have introduced additional options, such as --all-repeated, to cover more specific use cases.