LinuxCommandLibrary

csvjoin

Join CSV files based on common columns

SYNOPSIS

csvjoin [options] FILE1 [FILE2 ...]

PARAMETERS

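-c COLUMNS, --columns COLUMNS
    Column name(s) or index(es) to join on: either a single value used for every file, or a comma-separated list with one entry per file, in the order the files are given. If omitted, the files are joined sequentially, row by row, without any matching.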

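--left
    Perform a left outer join instead of the default inner join. With more than two files, this runs as a sequence of left outer joins, starting from the left.

--right
    Perform a right outer join instead of the default inner join. With more than two files, this runs as a sequence of right outer joins, starting from the right.

--outer
    Perform a full outer join instead of the default inner join.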

-d DELIM, --delimiter DELIM
    Field delimiter character (e.g., comma, tab).

-b, --no-doublequote
    Indicate that quote characters inside quoted fields are not doubled in the input.

-q QUOTECHAR, --quotechar QUOTECHAR
    Character used to quote fields in the input.

-K N, --skip-lines N
    Number of initial lines to skip before the header row (for example, comments or notices).

-H, --no-header-row
    Treat first row as data, not header.

-z N, --maxfieldsize N
    Maximum length of a single field in the input.

-e ENCODING, --encoding ENCODING
    Input encoding (e.g., utf-8).

-h, --help
    Show help.

-V, --version
    Show version.

DESCRIPTION

csvjoin is a command-line utility from the csvkit suite that joins two or more CSV files on specified columns, in the manner of a SQL JOIN.

When -c/--columns is given, it performs an inner join by default on the named key column(s). The key can be a single name or index shared by all files, or a comma-separated list with one entry per file; column indices are 1-based by default. If -c is omitted, no matching is done and the files are joined sequentially, row by row.
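As a small illustration of joining on a shared key, the sketch below builds two throwaway files and joins them on id. The file names and contents are made up for the example, and -I (--no-inference) is used only so that values are compared and printed as plain text.

```shell
# Hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)
printf 'id,name\n1,Ann\n2,Bob\n3,Cy\n' > "$tmp/people.csv"
printf 'id,dept\n1,Sales\n3,Ops\n4,HR\n' > "$tmp/depts.csv"

# Inner join on the "id" column present in both files.
# Only ids found in both inputs (1 and 3) appear in the result.
csvjoin -I -c id "$tmp/people.csv" "$tmp/depts.csv"

# Expected output:
# id,name,dept
# 1,Ann,Sales
# 3,Cy,Ops
```

If the key columns had different names in each file, -c would instead take one name per file, separated by commas and listed in file order.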

It supports inner (the default), left outer (--left), right outer (--right), and full outer (--outer) joins. It also handles alternative delimiters, quoting styles, and encodings, and can skip leading lines before the header row.
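The join types can be compared side by side on the same pair of files. This is a sketch with made-up data, assuming a csvkit release where the join type is selected with the boolean flags --left, --right, and --outer; the comments describe which rows are kept rather than the exact column layout, which can differ between join types.

```shell
# Hypothetical sample data in a temporary directory.
work=$(mktemp -d)
printf 'id,name\n1,Ann\n2,Bob\n3,Cy\n' > "$work/people.csv"
printf 'id,dept\n1,Sales\n3,Ops\n4,HR\n' > "$work/depts.csv"

# Default (inner): only ids present in both files (1 and 3).
csvjoin -I -c id "$work/people.csv" "$work/depts.csv"

# --left: every row of people.csv; id 2 has no matching dept.
csvjoin -I --left -c id "$work/people.csv" "$work/depts.csv"

# --right: every row of depts.csv; id 4 has no matching name.
csvjoin -I --right -c id "$work/people.csv" "$work/depts.csv"

# --outer: every row of both files, matched where possible.
csvjoin -I --outer -c id "$work/people.csv" "$work/depts.csv"
```

With this data the four commands should produce 2, 3, 3, and 4 data rows respectively, plus a header line each.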

Output is written to standard output and includes the columns of all input files, with the key column from the first file taking precedence. It suits ETL pipelines, merging data without a database, and quick analysis of tabular data.

Unlike the core Unix join(1), it parses CSV directly, including header rows and quoted fields.

CAVEATS

Requires csvkit (pip install csvkit); it is Python-based and not part of core Unix. All input files are read into memory, so joins on large files can be memory-intensive. Column names passed to -c must match the file headers exactly.

EXAMPLES

csvjoin -c name employees.csv departments.csv
csvjoin --left -c id,dept_id staff.csv info.csv > output.csv

INSTALLATION

pip install csvkit
Or via package managers: apt install csvkit (Debian/Ubuntu), brew install csvkit (macOS).

HISTORY

Part of csvkit, created by Christopher Groskopf around 2011 for CSV data wrangling. csvkit is actively maintained; csvjoin was added early to fill a gap in Unix tools for joining structured data.

SEE ALSO

csvcut(1), csvlook(1), csvstat(1), csvsort(1), join(1), paste(1)
