LinuxCommandLibrary

csv-diff

Compare two CSV files for differences

TLDR

Display a human-readable summary of differences between files using a specific column as a unique identifier

$ csv-diff [path/to/file1.csv] [path/to/file2.csv] --key [column_name]

Display a human-readable summary of differences between files that includes unchanged values in rows with at least one change
$ csv-diff [path/to/file1.csv] [path/to/file2.csv] --key [column_name] --show-unchanged

Display a summary of differences between files in JSON format using a specific column as a unique identifier
$ csv-diff [path/to/file1.csv] [path/to/file2.csv] --key [column_name] --json

SYNOPSIS

csv-diff [options] FILE1 FILE2

PARAMETERS

-h, --help
    Show help message and exit

--count COUNT, -c COUNT
    Stop after finding COUNT differences (default: unlimited)

--delimiter DELIMITER, -d DELIMITER
    Field delimiter (default: ,)

--decimal DECIMAL
    Decimal point character (default: .)

--ignore-columns IGNORE_COLUMNS [IGNORE_COLUMNS ...]
    One or more column names to exclude from the comparison

--ignore-lines IGNORE_LINES [IGNORE_LINES ...]
    Line numbers or patterns to skip (e.g., headers)

--ignore-spaces
    Ignore leading/trailing whitespace differences

--key KEY
    Column name(s) for key-based row matching

--quiet, -q
    Suppress all output (exit code indicates differences)

--style {table,compact,json,line}, -s {table,compact,json,line}
    Output format (default: table)

DESCRIPTION

csv-diff compares two CSV files and reports both structural and content differences: variations in rows, columns, headers, and cell values. This makes it suited to data validation, ETL testing, and consistency checks between datasets.

Key features include customizable delimiters, ignoring specific columns or lines (e.g., headers), key-based matching for unordered data, whitespace tolerance, and various output styles like table, compact, JSON, or line-by-line diffs. It supports stopping after a set number of differences and quiet mode for scripting. Unlike generic diff(1), it understands CSV semantics, handling quoted fields and escapes correctly.
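The point about CSV semantics can be illustrated with a short Python sketch. It uses only the standard csv module and is not csv-diff's own code; the sample data is invented for illustration. A line-oriented tool that splits on commas breaks a quoted field apart, while a CSV parser keeps it whole:

```python
import csv
import io

# Hypothetical sample: the name field contains a comma inside quotes.
text = 'id,name\n1,"Smith, Jane"\n'

# Naive splitting treats the quoted comma as a field separator.
naive = text.splitlines()[1].split(",")

# A CSV-aware parser keeps the quoted field intact and strips the quotes.
parsed = list(csv.reader(io.StringIO(text)))[1]

print(len(naive))   # number of fragments from naive splitting
print(len(parsed))  # number of fields seen by the CSV parser
```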

Provide two files as arguments, or pipe data via stdin. The output lists added, deleted, and modified rows with context, which helps locate problems quickly in large files.
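As a rough illustration of key-based matching and the added/deleted/modified classification, here is a minimal Python sketch. It is an assumption about how such a comparison can work, not the tool's implementation; the function name diff_by_key and the sample data are hypothetical.

```python
import csv
import io


def diff_by_key(old_text, new_text, key):
    """Classify rows as added, removed, or changed, matching rows on `key`."""
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_text))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_text))}

    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]

    changed = {}
    for k in old:
        if k in new and old[k] != new[k]:
            # Keep only the columns that differ: column -> (old value, new value).
            changed[k] = {
                col: (old[k].get(col), new[k].get(col))
                for col in old[k].keys() | new[k].keys()
                if old[k].get(col) != new[k].get(col)
            }
    return added, removed, changed


# Hypothetical data; the rows in the second file are deliberately reordered.
old_csv = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
new_csv = "id,name,age\n2,Pancakes,3\n1,Cleo,4\n3,Bailey,1\n"

added, removed, changed = diff_by_key(old_csv, new_csv, "id")
print(added)
print(removed)
print(changed)
```

Because rows are looked up by key rather than by position, the reordering of rows between the two files does not register as a difference.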

CAVEATS

Assumes both files share a consistent structure. Comparison is case-sensitive by default. Large files may consume significant memory. Not installed by default; install it with pip install csv-diff or a similar method.

EXAMPLES

Basic diff: csv-diff file1.csv file2.csv
Semicolon delimiter, skip line 1, match rows on the id column: csv-diff -d';' --ignore-lines 1 --key id data1.csv data2.csv

EXIT CODES

0: identical files; 1: differences found; 2: error (e.g., missing files)
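For scripting, the exit status could be checked from Python along these lines. This is a sketch that takes the exit codes listed above at face value; they have not been verified against any particular release. The helper names interpret_exit_code and compare are hypothetical, and only the --key option shown in the TLDR section is used.

```python
import subprocess

# Meanings copied from the exit-code list above (an assumption, not verified).
EXIT_MEANINGS = {0: "identical", 1: "differences found", 2: "error"}


def interpret_exit_code(code):
    """Translate an exit status into a short description."""
    return EXIT_MEANINGS.get(code, "unknown")


def compare(file1, file2, key):
    """Run csv-diff on two files and describe the resulting exit status."""
    result = subprocess.run(
        ["csv-diff", file1, file2, "--key", key],
        capture_output=True,
        text=True,
    )
    return interpret_exit_code(result.returncode)
```

Calling compare("file1.csv", "file2.csv", "id") requires csv-diff to be installed and on the PATH.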

HISTORY

Tools of this name originated as open-source projects in the 2010s. A popular Python implementation by Eyeseetea (2018 onward) evolved from Perl precursors such as csvdiff. It is widely used in data engineering workflows.

SEE ALSO

diff(1), comm(1), join(1), mlr(1)
