csvjoin
SQL-style joins for CSV files
TLDR
Join two CSV files on a common column
SYNOPSIS
csvjoin [options] file1 file2
DESCRIPTION
csvjoin is part of csvkit that performs SQL-style joins on CSV files. It combines data from two files based on matching values in specified columns, similar to JOIN operations in databases.
The default join is an inner join, returning only rows with matches in both files. Left, right, and outer joins preserve unmatched rows from one or both files, filling missing values with empty strings.
When joining on columns with different names, specify both names separated by a comma. Multiple columns can be used as composite keys for more complex joins.
PARAMETERS
-c COLUMN, --columns COLUMN
Column(s) to join on. Comma-separated if different in each file.--left
Perform a left outer join (keep all rows from first file).--right
Perform a right outer join (keep all rows from second file).--outer
Perform a full outer join (keep all rows from both files).-d CHAR, --delimiter CHAR
Field delimiter (default: comma).-e ENCODING, --encoding ENCODING
Input file encoding.-H, --no-inference
Disable type inference.--no-header-row
Files have no header row.
CAVEATS
Both files must fit in memory for joining. Column matching is exact; consider cleaning data first. Non-matching join columns result in Cartesian products. Output column order follows first file then second file.
HISTORY
csvjoin is part of csvkit, created by Christopher Groskopf in 2011. It brings database-style join operations to command-line CSV processing, enabling data combination without importing into a database.
