csv-diff
Compare two CSV files for differences
TLDR
Display a human-readable summary of differences between files using a specific column as a unique identifier
Display a human-readable summary of differences between files that includes unchanged values in rows with at least one change
Display a summary of differences between files in JSON format using a specific column as a unique identifier
SYNOPSIS
csv-diff [OPTIONS] FILE1 FILE2
PARAMETERS
--key COLUMN
Specifies the column to use as a key for identifying rows. Multiple --key options can be used to define a composite key.
--ignore-column COLUMN
Specifies a column to ignore during the comparison.
--delimiter CHAR
Specifies the delimiter character used in the CSV files (default is comma).
--skip-lines INTEGER
Specifies the number of lines to skip at the beginning of each file.
--output FORMAT
Specifies the output format. The formats described under OUTPUT FORMATS are 'summary', 'diff', and 'json'.
--version
Show program's version number and exit.
--help
Show help message and exit.
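The sketch below illustrates how the options listed above (--key, --ignore-column, --delimiter, --skip-lines) could shape the way rows are loaded before comparison. It is a minimal illustration written with Python's standard csv module, not csv-diff's own implementation; the helper load_keyed_rows and its parameter names are hypothetical.

```python
import csv
import io


def load_keyed_rows(text, key_columns, ignore_columns=(), delimiter=",", skip_lines=0):
    """Parse CSV text into {key_tuple: row_dict}.

    Hypothetical helper for illustration only; not part of csv-diff.
    """
    # Drop leading preamble lines, mirroring the idea behind --skip-lines.
    lines = text.splitlines(keepends=True)[skip_lines:]
    reader = csv.DictReader(io.StringIO("".join(lines)), delimiter=delimiter)
    rows = {}
    for row in reader:
        # Several key columns together form a composite key.
        key = tuple(row[col] for col in key_columns)
        if key in rows:
            raise ValueError(f"duplicate key {key!r}")
        # Ignored columns are left out, so they can never count as a change.
        rows[key] = {c: v for c, v in row.items() if c not in ignore_columns}
    return rows


sample = "exported 2024-01-01\nregion;id;name;updated\neu;1;Ada;mon\nus;1;Bob;tue\n"
rows = load_keyed_rows(
    sample,
    key_columns=["region", "id"],
    ignore_columns=["updated"],
    delimiter=";",
    skip_lines=1,
)
print(rows[("eu", "1")])  # {'region': 'eu', 'id': '1', 'name': 'Ada'}
```

Here the two rows share id 1 but remain distinct because the key combines region and id.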
DESCRIPTION
csv-diff is a command-line utility that compares two CSV (comma-separated values) files and reports the differences between them. Options control the comparison: which columns identify a row, which delimiter the files use, and how the result is formatted. Rows are matched by key and then reported as added, deleted, or modified. Typical uses include validating data, auditing changes to CSV datasets, and automating comparisons in data pipelines or alongside version-controlled data files.
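The idea of classifying rows as added, deleted, or modified by key can be sketched in a few lines of Python. This is an illustration under simple assumptions (one key column, unique keys, both files fit in memory), not csv-diff's actual code; the function names keyed and compare are hypothetical.

```python
import csv
import io


def keyed(text, key):
    """Index parsed rows by the value in the key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def compare(old_text, new_text, key):
    """Return (added, removed, changed) for two CSV texts."""
    old, new = keyed(old_text, key), keyed(new_text, key)
    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]
    changed = {}
    for k in old.keys() & new.keys():
        # Record (old, new) value pairs for columns present in both rows.
        fields = {
            col: (old[k][col], new[k][col])
            for col in old[k]
            if col in new[k] and old[k][col] != new[k][col]
        }
        if fields:
            changed[k] = fields
    return added, removed, changed


old_csv = "id,name,age\n1,Cleo,4\n2,Pancakes,2\n"
new_csv = "id,name,age\n1,Cleo,5\n3,Bailey,1\n"
added, removed, changed = compare(old_csv, new_csv, key="id")
print(changed)  # {'1': {'age': ('4', '5')}}
```

Without a key, the row for id 3 could not be told apart from an edited version of the row for id 2; the key is what makes "modified" distinguishable from "deleted plus added".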
CAVEATS
csv-diff relies on consistent CSV formatting in both input files. Variations in formatting, such as different quoting styles, may produce inaccurate results. Comparing very large CSV files can be slow.
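To check whether two files differ only in formatting, it can help to parse them before comparing. The sketch below uses Python's standard csv module with made-up sample data and says nothing about how csv-diff itself parses input: quoting differences disappear once the files are parsed, while stray whitespace after a delimiter survives parsing and shows up as a changed value.

```python
import csv
import io


def parse(text):
    """Parse CSV text into a list of rows using the csv module's defaults."""
    return list(csv.reader(io.StringIO(text)))


quoted = 'id,name\n1,"Ada"\n2,"Lovelace, A."\n'
unquoted = 'id,name\n1,Ada\n2,"Lovelace, A."\n'
spaced = 'id,name\n1, Ada\n2,"Lovelace, A."\n'

# A line-by-line text comparison sees the quoting difference.
raw_lines_equal = quoted.splitlines() == unquoted.splitlines()

# After parsing, optional quotes no longer matter.
parsed_equal = parse(quoted) == parse(unquoted)

# Leading whitespace is kept as data by default, so this still differs.
spaced_equal = parse(unquoted) == parse(spaced)

print(raw_lines_equal, parsed_equal, spaced_equal)  # False True False
```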
OUTPUT FORMATS
The output formats differ in their level of detail.
summary: Gives an overview of the number of changes.
diff: Shows added, deleted, and modified lines in unified diff format.
json: Outputs the result in JSON format.
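The difference between a summary and a JSON rendering can be illustrated by formatting the same comparison result twice. The result structure below (keys added, removed, changed) and the functions to_summary and to_json are assumptions made for this sketch; the tool's real output may be shaped and worded differently.

```python
import json

# Hypothetical comparison result; the layout is assumed for illustration.
diff = {
    "added": [{"id": "3", "name": "Bailey", "age": "1"}],
    "removed": [{"id": "2", "name": "Pancakes", "age": "2"}],
    "changed": [{"key": "1", "changes": {"age": ["4", "5"]}}],
}


def to_summary(result):
    """Condense the result into counts per change type."""
    parts = []
    for label in ("changed", "added", "removed"):
        n = len(result[label])
        if n:
            parts.append(f"{n} row{'s' if n != 1 else ''} {label}")
    return ", ".join(parts) or "no differences"


def to_json(result):
    """Serialize the full result so other programs can consume it."""
    return json.dumps(result, indent=2, sort_keys=True)


summary_text = to_summary(diff)
json_text = to_json(diff)
print(summary_text)  # 1 row changed, 1 row added, 1 row removed
print(json_text)
```

The summary suits a quick read in a terminal, while the JSON form keeps every changed value and is easier to process in scripts.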
HISTORY
csv-diff likely grew out of the need to automate CSV file comparisons, a common task in data management and software development. Its popularity has grown with the increasing use of CSV as a data exchange format.