LinuxCommandLibrary

csvkit

Convert and work with CSV files

TLDR

Run a command on a CSV file with a custom delimiter

$ [command] [[-d|--delimiter]] [delimiter] [path/to/file.csv]

Run a command on a CSV file with a tab as a delimiter (overrides -d)
$ [command] [[-t|--tabs]] [path/to/file.csv]

Run a command on a CSV file with a custom quote character
$ [command] [[-q|--quotechar]] [quote_char] [path/to/file.csv]

Run a command on a CSV file with no header row
$ [command] [[-H|--no-header-row]] [path/to/file.csv]

SYNOPSIS

Suite of tools invoked individually:
csvcut [-c COLUMNS] [-C COLUMNS] [FILE]
csvlook [OPTIONS] [FILE]
csvstat [OPTIONS] [FILE]
csvgrep -c COLUMNS (-m PATTERN | -r REGEX | -f FILE) [OPTIONS] [FILE]

PARAMETERS

-d DELIM, --delimiter DELIM
    Field delimiter (default: ',')

-t, --tabs
    Treat tabs as field delimiters

-q QC, --quotechar QC
    Quote character (default: '"')

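-u {0,1,2,3}, --quoting {0,1,2,3}
    Quoting style of the input: 0 = minimal, 1 = all, 2 = non-numeric, 3 = none

-p EC, --escapechar EC
    Character used to escape the delimiter when --quoting 3 is set, and the quote character when --no-doublequote is set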

-z SIZE, --maxfieldsize SIZE
    Maximum length of a single field in the input

-e E, --encoding E
    Encoding of the input file (default: utf-8)

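-b, --no-doublequote
    Quote characters inside fields are escaped with the escape character instead of being doubled

--blanks
    Do not convert "", "na", "n/a", "none", "null" or "." to NULL

-H, --no-header-row
    Input has no header row; default column names are generated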

--date-format FMT
    Format for parsing dates

--zero
    Use zero-based column numbering instead of the default 1-based numbering

DESCRIPTION

csvkit is a powerful open-source collection of command-line tools designed for working with CSV files, the most ubiquitous format for tabular data. It empowers users to manipulate, analyze, and transform data directly in the shell without needing spreadsheets or graphical software.

Key tools include:
csvcut: Select, reorder, or exclude columns.
csvlook: Pretty-print CSV as formatted tables.
csvstat: Compute descriptive statistics like min/max/mean.
csvgrep: Search rows with grep-like patterns.
csvjoin: Join multiple CSV files on key columns.
csvsort: Sort rows by specified columns.
in2csv: Convert Excel, JSON, DBF, GeoJSON, or fixed-width files to CSV.
csvsql: Generate SQL CREATE TABLE statements, load CSV into a database, or run SQL queries on CSV files.

The tools chain through pipes and handle custom delimiters and encodings as well as quoted fields that contain embedded delimiters or line breaks. Useful for data journalists, analysts, sysadmins, and ETL pipelines. Written in Python; csvpy loads a CSV file into an interactive Python shell.

CAVEATS

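Usually not installed by default; install with pip or the distribution's package manager. There is no single csvkit binary; each tool is a separate command. Tools assume well-formed CSV, and malformed input may cause errors. Operations that load the whole table into memory, such as csvsort and csvjoin, can be slow or memory-intensive on large files.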

INSTALLATION

pip install csvkit
Debian/Ubuntu: sudo apt install csvkit
Fedora: sudo dnf install csvkit

QUICK EXAMPLE

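csvcut -c name,age data.csv | csvlook
csvcut -c name,age data.csv | csvstat -c age
Selects the name and age columns, then pretty-prints them or shows summary statistics for age. csvlook prints a display table rather than CSV, so it belongs at the end of a pipeline.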

PIPING WORKFLOW

in2csv sheet.xlsx | csvgrep -c state -m NY | csvjson
Converts an Excel sheet to CSV, keeps rows whose state column contains NY, and outputs JSON.
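A runnable variant of this workflow, assuming csvkit is installed. An inline CSV with invented data stands in for the in2csv sheet.xlsx step so the sketch runs without a spreadsheet file.

```shell
#!/bin/sh
# Assumes csvkit is installed. The inline CSV replaces the
# "in2csv sheet.xlsx" step; the rows are illustrative.
# csvgrep keeps rows whose state column contains NY,
# and csvjson turns the remaining rows into a JSON array.
printf 'name,state\nAnn,NY\nBo,CA\n' |
  csvgrep -c state -m NY |
  csvjson
```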

HISTORY

Developed in 2010 by Christopher Groskopf for data journalism at the Chicago Tribune. It grew from a few simple column-cutting utilities into a full suite by 2012 (v0.9) and reached v1.0 in 2016. It is now hosted under the wireservice organization on GitHub and maintained by the community.

SEE ALSO

cut(1), awk(1), grep(1), join(1), paste(1), column(1), xsv(1)
