csvkit
Convert and work with CSV files
TLDR
Run a command on a CSV file with a custom delimiter
Run a command on a CSV file with a tab as a delimiter (overrides -d)
Run a command on a CSV file with a custom quote character
Run a command on a CSV file with no header row
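The four cases above could look as follows. This is a minimal sketch: the file names (semi.csv, tabs.tsv, quoted.csv, noheader.csv) and their contents are made up for illustration, and csvcut stands in for any csvkit tool, since these input flags are shared across the suite.

```shell
# Hypothetical sample files, created only for this illustration.
printf 'name;age\nAda;36\n' > semi.csv
printf 'name\tage\nAda\t36\n' > tabs.tsv
printf "name,note\nAda,'likes a, b'\n" > quoted.csv
printf 'Ada,36\nBob,41\n' > noheader.csv

# Custom delimiter: fields are separated by ';'
csvcut -d ';' -c name semi.csv

# Tab-delimited input (-t overrides -d)
csvcut -t -c age tabs.tsv

# Custom quote character: fields are wrapped in single quotes
csvcut -q "'" -c note quoted.csv

# No header row: select the second column by position
csvcut -H -c 2 noheader.csv
```

With -H, csvkit generates placeholder column names (a, b, c, ...) and writes them as the header row of the output.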
SYNOPSIS
Suite of tools invoked individually:
csvcut [-c COLUMNS] [-C COLUMNS] [FILE]
csvlook [OPTIONS] [FILE]
csvstat [OPTIONS] [FILE]
csvgrep [-c COLUMNS] [-m PATTERN | -r REGEX | -f MATCHFILE] [FILE]
PARAMETERS
-d DELIM, --delimiter DELIM
Field delimiter (default: ',')
-t, --tabs
Treat tabs as field delimiters
-q QC, --quotechar QC
Quote character (default: '"')
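-p ESCAPECHAR, --escapechar ESCAPECHAR
Character used to escape the delimiter when quoting is disabled, or the quote character when quote doubling is disabled
-z FIELD_SIZE_LIMIT, --maxfieldsize FIELD_SIZE_LIMIT
Maximum length of a single field in the input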
-e E, --encoding E
Input/output encoding (default: utf-8)
-b, --no-doublequote
Input does not double its quote characters to escape them
--blanks
Keep empty and null-like values ("", "na", "n/a", "none", "null", ".") as text instead of converting them to NULL
--date-format DATE_FORMAT
strptime format string for parsing dates, e.g. "%m/%d/%Y"
--zero
Use zero-based column numbers instead of the default 1-based numbering
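As an illustration of the input options, the sketch below reads a Latin-1 encoded file and keeps a null-like value as text. The file latin1.csv and its contents are hypothetical; csvlook is used only because it makes the effect easy to see.

```shell
# Hypothetical Latin-1 file: \366 is the single byte for "o with diaeresis".
printf 'city,pop\nMalm\366,NA\n' > latin1.csv

# Declare the input encoding so the city name is decoded correctly.
csvlook -e latin1 latin1.csv

# Same file, but keep "NA" as text instead of treating it as a missing value.
csvlook -e latin1 --blanks latin1.csv
```

Without -e, a file like this would typically fail to decode, because its bytes are not valid UTF-8.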
DESCRIPTION
csvkit is an open-source collection of command-line tools for working with CSV, the most common plain-text format for tabular data. It lets users inspect, filter, reshape, and convert data directly in the shell, without a spreadsheet or other graphical software.
Key tools include:
csvcut: Select, reorder, or exclude columns.
csvlook: Pretty-print CSV as formatted tables.
csvstat: Compute descriptive statistics like min/max/mean.
csvgrep: Search rows with grep-like patterns.
csvjoin: Join multiple CSV files on key columns.
csvsort: Sort rows by specified columns.
in2csv: Convert Excel, JSON, GeoJSON, DBF, or fixed-width files to CSV.
csvsql: Generate SQL CREATE TABLE statements, load CSV into a database, or run SQL queries directly against CSV files.
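A few of the tools listed above, sketched on two small made-up files (people.csv and cities.csv are hypothetical and exist only for this illustration):

```shell
# Hypothetical sample data.
printf 'id,name\n1,Ada\n2,Bob\n' > people.csv
printf 'id,city\n1,London\n2,Paris\n' > cities.csv

# csvjoin: join the two files on their shared id column.
csvjoin -c id people.csv cities.csv

# csvsort: sort by name in reverse (descending) order.
csvsort -c name -r people.csv

# csvsql: print a CREATE TABLE statement inferred from the data.
csvsql people.csv

# csvsql: run a query against the file; the table is named after the file.
csvsql --query 'SELECT name FROM people WHERE id = 2' people.csv
```

Each tool writes CSV (or SQL text, in the case of the plain csvsql call) to standard output, so any of these results can be piped into another csvkit tool.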
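Tools chain via pipes and support custom delimiters, encodings, Excel-style quoting, and quoted fields that contain embedded delimiters or line breaks. Tools that process input row by row, such as csvcut and csvgrep, can handle very large files; others load the whole input first. Suited to data journalists, analysts, sysadmins, and ETL pipelines. Written in Python; csvpy loads a CSV file into an interactive Python shell for further exploration.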
CAVEATS
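Usually not installed by default; install it with pip or your distribution's package manager. There is no single 'csvkit' binary: invoke the individual tools instead. Input is assumed to be well-formed CSV, and malformed input may cause errors. Operations that need the whole dataset at once, such as csvsort and csvjoin, load the input into memory, so very large files can be memory-intensive.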
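One way to limit memory use, sketched under the assumption that csvgrep and csvcut process rows as they are read, is to filter rows and drop unneeded columns before handing the data to csvsort. The file big.csv and its columns are hypothetical.

```shell
# Hypothetical stand-in for a large file.
printf 'state,name,age\nNY,Ada,36\nCA,Bob,41\nNY,Cy,29\n' > big.csv

# Filter rows and trim columns first, then sort only the smaller result.
csvgrep -c state -m NY big.csv | csvcut -c name,age | csvsort -c age
```

Only the rows and columns that survive the first two stages reach csvsort, which is the stage that holds its input in memory.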
INSTALLATION
pip install csvkit
Debian/Ubuntu: sudo apt install csvkit
Fedora: sudo dnf install csvkit
QUICK EXAMPLE
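csvcut -c name,age data.csv | csvlook
csvcut -c name,age data.csv | csvstat -c age
Cuts the name and age columns, then either pretty-prints them or shows statistics for age. csvlook must come last in a pipeline because its output is a formatted table, not CSV.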
PIPING WORKFLOW
in2csv sheet.xlsx | csvgrep -c state -m NY | csvjson
Converts an Excel sheet to CSV, keeps rows whose state column contains NY, and outputs JSON.
HISTORY
Christopher Groskopf began developing csvkit in the early 2010s while working at the Chicago Tribune. It grew from a few simple column cutters into a full suite, reached v1.0 in 2016, and is now community-maintained on GitHub under the wireservice organization.


