csv-diff
Compare two CSV files for differences
TLDR
Display a human-readable summary of differences between files using a specific column as a unique identifier
Display a human-readable summary of differences between files that includes unchanged values in rows with at least one change
Display a summary of differences between files in JSON format using a specific column as a unique identifier
SYNOPSIS
csv-diff [options] FILE1 FILE2
PARAMETERS
-h, --help
Show help message and exit
--count COUNT, -c COUNT
Stop after finding COUNT differences (default: unlimited)
--delimiter DELIMITER, -d DELIMITER
Field delimiter (default: ,)
--decimal DECIMAL
Decimal point character (default: .)
--ignore-columns IGNORE_COLUMNS [IGNORE_COLUMNS ...]
One or more columns to ignore in the comparison
--ignore-lines IGNORE_LINES [IGNORE_LINES ...]
Line numbers or patterns to skip (e.g., headers)
--ignore-spaces
Ignore leading/trailing whitespace differences
--key KEY
Column name(s) for key-based row matching
--quiet, -q
Suppress all output (exit code indicates differences)
--style {table,compact,json,line}, -s {table,compact,json,line}
Output format (default: table)
DESCRIPTION
csv-diff compares two CSV files side by side and highlights structural and content differences. It detects variations in rows, columns, headers, and cell values, which makes it useful for data validation, ETL testing, and checking consistency between datasets.
Key features include customizable delimiters, ignoring specific columns or lines (e.g., headers), key-based matching for unordered data, whitespace tolerance, and a choice of output styles (table, compact, JSON, or line). It can stop after a set number of differences and offers a quiet mode for scripting. Unlike generic diff(1), it understands CSV semantics and handles quoted fields and escapes correctly.
Provide two files as arguments, or pipe data via stdin. The output shows added, deleted, and modified rows with context, which helps locate issues quickly in large files.
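The key-based matching described above can be illustrated with a short Python sketch. This is not csv-diff's own implementation; the function name diff_by_key, its parameters, and the sample data are hypothetical and exist only to show the idea: rows are indexed by a key column, so row order does not matter, and each key is classified as added, removed, or changed.

```python
import csv
import io


def diff_by_key(old_text, new_text, key, ignore_columns=(), delimiter=","):
    """Classify rows as added, removed, or changed, matching rows on `key`.

    Hypothetical helper for illustration only; not part of csv-diff.
    """

    def load(text):
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        # Index rows by key so the order of rows in the file is irrelevant.
        return {row[key]: row for row in reader}

    old_rows, new_rows = load(old_text), load(new_text)
    skipped = set(ignore_columns) | {key}

    added = sorted(new_rows.keys() - old_rows.keys())
    removed = sorted(old_rows.keys() - new_rows.keys())

    changed = {}
    for k in sorted(old_rows.keys() & new_rows.keys()):
        # Compare only the columns present in the old row and not skipped.
        cells = {
            col: (old_rows[k][col], new_rows[k].get(col))
            for col in old_rows[k]
            if col not in skipped and old_rows[k][col] != new_rows[k].get(col)
        }
        if cells:
            changed[k] = cells
    return {"added": added, "removed": removed, "changed": changed}


# Sample data (made up): row 2 is removed, row 4 is added, row 3 changes age.
OLD = "id,name,age\n1,Ann,30\n2,Bob,41\n3,Cy,25\n"
NEW = "id,name,age\n3,Cy,26\n1,Ann,30\n4,Dee,19\n"

result = diff_by_key(OLD, NEW, key="id")
print(result)
```

Because rows are looked up by key rather than by position, the reordering of rows 1 and 3 in the sample is not reported as a difference. Passing ignore_columns=["age"] would suppress the one changed cell, which mirrors the intent of ignoring columns in a comparison.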
CAVEATS
The comparison assumes a consistent structure between the two files and is case-sensitive by default. Large files may consume significant memory. csv-diff is not installed by default; it requires pip install csv-diff or a similar installation step.
EXAMPLES
Basic diff: csv-diff file1.csv file2.csv
Ignore header and match rows by key (semicolon-delimited files): csv-diff -d';' --ignore-lines 1 --key id data1.csv data2.csv
EXIT CODES
0: identical files; 1: differences found; 2: error (e.g., missing files)
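For scripting, the exit codes listed above can be mapped to outcomes in a small wrapper. The following Python sketch assumes the exit codes and the --quiet flag behave as documented on this page; the names EXIT_MEANINGS, interpret_exit_code, and compare are hypothetical, and the installed version of the tool may behave differently.

```python
import subprocess

# Meanings taken from the EXIT CODES section of this page.
EXIT_MEANINGS = {0: "identical", 1: "different", 2: "error"}


def interpret_exit_code(code):
    """Translate an exit code into a short label; unknown codes are flagged."""
    return EXIT_MEANINGS.get(code, "unknown")


def compare(file1, file2, executable="csv-diff"):
    """Run the tool in quiet mode and return a label for its exit code.

    Assumes `executable` is on PATH and accepts --quiet as listed under
    PARAMETERS; raises FileNotFoundError if the executable is missing.
    """
    completed = subprocess.run([executable, "--quiet", file1, file2], check=False)
    return interpret_exit_code(completed.returncode)


if __name__ == "__main__":
    # The mapping itself can be checked without the tool installed.
    for code in (0, 1, 2, 3):
        print(code, interpret_exit_code(code))
```

Keeping the code-to-label mapping separate from the subprocess call makes the mapping testable without the tool installed, and check=False is used so that a non-zero exit code is treated as a result rather than an exception.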
HISTORY
CSV comparison tools of this kind originated as open-source projects around the 2010s; the popular Python implementation by Eyeseetea (2018+) evolved from Perl precursors such as csvdiff. It is widely used in data engineering workflows.


