csvjoin
Join CSV files based on common columns
SYNOPSIS
csvjoin [options] [FILE ...]
PARAMETERS
-c COLUMNS, --columns COLUMNS
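Column name(s) or 1-based index(es) to join on: either one value used for every file, or a comma-separated list with one entry per file, in the order the files are given. If omitted, the files are joined sequentially without matching.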
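--left
Perform a left outer join instead of the default inner join. With more than two files, the left joins are applied in sequence, starting from the left.
--right
Perform a right outer join instead of the default inner join. With more than two files, the right joins are applied in sequence, starting from the right.
--outer
Perform a full outer join instead of the default inner join.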
-d DELIM, --delimiter DELIM
Field delimiter character (e.g., comma, tab).
-b, --no-doublequote
Treat double quotes in the input as not doubled (older csvkit releases spelled this flag --doublequote).
--quotechar QUOTECHAR
Character for quoting CSV fields.
-K N, --skip-lines N
Number of initial lines to skip before the header row.
-H, --no-header-row
Treat first row as data, not header.
-z N, --maxfieldsize N
Maximum length of a single field in the input.
-e ENCODING, --encoding ENCODING
Input encoding (e.g., utf-8).
-h, --help
Show help.
-V, --version
Show version.
DESCRIPTION
csvjoin is a command-line utility from the csvkit suite that joins two or more CSV files on specified columns, in the manner of a SQL JOIN.
By default it performs an inner join. Key columns are selected with -c/--columns, by name or by 1-based index: give one value to use the same column in every file, or a comma-separated list with one entry per file. If -c is omitted, no matching is done and the files are joined sequentially, row by row.
It supports inner (the default), left outer (--left), right outer (--right), and full outer (--outer) joins. It also accepts csvkit's common input options for delimiters, quoting, encodings, and skipping leading lines.
Output is written to standard output and includes the columns of every input file, with the first file's columns first. It is suited to ETL pipelines, merging data without a database, and quick analysis of tabular data.
Unlike the core Unix join, it parses CSV directly, preserving headers and quoted fields, and does not require sorted input.
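The join semantics can be sketched in a few lines of Python. This is an illustration that uses only the standard library, not csvkit's implementation: the function name join_rows and the sample data are hypothetical, and dropping the repeated right-hand key column is a choice made for this sketch.

```python
import csv
import io


def join_rows(left_text, right_text, left_key, right_key, how="inner"):
    """Join two CSV strings on one key column per file (hypothetical helper).

    how="inner" keeps only rows whose key appears in both inputs;
    how="left" also keeps unmatched left rows, padding the right-hand
    columns with empty strings.
    Assumes non-key column names do not collide between the two inputs.
    """
    left_reader = csv.DictReader(io.StringIO(left_text))
    right_reader = csv.DictReader(io.StringIO(right_text))
    left_fields = list(left_reader.fieldnames)
    # Sketch-only choice: do not repeat the right-hand key column in the output.
    right_fields = [f for f in right_reader.fieldnames if f != right_key]

    # Index the right-hand rows by key value; one key may map to several rows.
    index = {}
    for row in right_reader:
        index.setdefault(row[right_key], []).append(row)

    out = [left_fields + right_fields]
    for row in left_reader:
        matches = index.get(row[left_key], [])
        if not matches and how == "left":
            # No partner on the right: keep the left row with blank right columns.
            matches = [{f: "" for f in right_fields}]
        for match in matches:
            out.append([row[f] for f in left_fields] +
                       [match[f] for f in right_fields])
    return out


# Made-up sample data; the keys are "id" on the left and "dept_id" on the right.
staff = "id,name\n1,Ann\n2,Bob\n3,Cy\n"
info = "dept_id,dept\n1,Sales\n3,Ops\n"

for line in join_rows(staff, info, "id", "dept_id", how="left"):
    print(",".join(line))
```

With how="left", the row for Bob is kept with an empty dept field; with the default how="inner" it is dropped because no right-hand row has a matching key.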
CAVEATS
Requires csvkit (pip install csvkit); it is a Python tool, not a core Unix utility. Input files are read into memory, so joins of large files can be memory-intensive. Column names passed to -c must match the header exactly, and column indices are 1-based.
EXAMPLES
csvjoin -c name employees.csv departments.csv
csvjoin --left -c id,dept_id staff.csv info.csv > output.csv
INSTALLATION
pip install csvkit
Or via package managers: apt install csvkit (Debian/Ubuntu), brew install csvkit (macOS).
HISTORY
Part of csvkit, created by Christopher Groskopf around 2011 for CSV data wrangling. The suite is actively maintained; csvjoin was added early to fill a gap in Unix tools for joining structured data.


