LinuxCommandLibrary

csvkit

Convert and work with CSV files

TLDR

Run a command on a CSV file with a custom delimiter

$ [command] [[-d|--delimiter]] [delimiter] [path/to/file.csv]

Run a command on a CSV file with a tab as a delimiter (overrides -d)
$ [command] [[-t|--tabs]] [path/to/file.csv]

Run a command on a CSV file with a custom quote character
$ [command] [[-q|--quotechar]] [quote_char] [path/to/file.csv]

Run a command on a CSV file with no header row
$ [command] [[-H|--no-header-row]] [path/to/file.csv]

SYNOPSIS

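COMMAND [OPTIONS] [path/to/file.csv ...]
COMMAND --help
COMMAND --version

COMMAND is one of the individual tools listed under PARAMETERS (csvcut, csvgrep, and so on). csvkit installs each tool as its own executable; there is no single csvkit command. Most tools read from standard input when no file is given.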

PARAMETERS

csvcut
    Select, reorder, or remove columns from CSV files.

csvgrep
    Filter rows from CSV files based on pattern matching.

csvsort
    Sort rows in a CSV file by one or more columns.

csvstack
    Stack multiple CSV files vertically, concatenating rows.

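csvjoin
    Join two or more CSV files on a shared key column given with -c (an inner join by default; --left, --right, and --outer select other join types). Without -c, rows are joined by position.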

csvstat
    Compute descriptive statistics for columns in a CSV file.

csvlook
    Render CSV data in a fixed-width format suitable for terminal viewing.

csvclean
    Report and fix common errors in CSV files, such as rows whose number of fields differs from the header.

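csvformat
    Convert a CSV file to a custom output format, e.g. a different delimiter (-D), tab-delimited output (-T), a different quote character (-Q), or different line endings (-M).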

in2csv
    Convert various formats (e.g., Excel, JSON, DBF, fixed-width) to CSV.

csvsql
    Generate SQL CREATE TABLE statements for CSV files, run SQL queries directly against them, or import them into a SQL database.

csvjson
    Convert CSV to JSON: an array of objects by default, an object keyed by a column (-k), newline-delimited JSON (--stream), or GeoJSON (--lat and --lon).

csvpy
    Load CSV data into a Python environment for scripting or interactive analysis.

sql2csv
    Execute an SQL query on a database and write the results to CSV.

DESCRIPTION

csvkit is a suite of command-line tools for converting to and working with CSV (comma-separated values) data. Each tool performs one task, such as viewing, cleaning, transforming, or analyzing tabular data, and most read from standard input and write CSV to standard output, so they combine naturally with Unix pipes. Built on Python, csvkit sits between simple text-processing tools such as cut and awk, which do not understand CSV quoting, and full database systems. Its tools convert Excel or JSON to CSV, filter rows, select columns, join and sort files, compute summary statistics, and move data into and out of SQL databases. It is widely used by data journalists, analysts, and anyone who routinely handles tabular data.

CAVEATS

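csvkit is written in Python and requires a Python 3 installation. Tools that sort, join, or analyze data, such as csvsort, csvjoin, csvstat, and csvsql, load the entire input into memory and infer column types, so they can become slow or run out of memory on large files; for very large datasets, specialized big-data tools are a better fit. csvcut and csvgrep process input row by row and are useful for shrinking a large file before passing it to those tools. csvkit also assumes reasonably well-formed CSV: files with inconsistent row lengths or an unusual dialect may need csvclean or explicit options such as -d, -q, or -e.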

INSTALLATION

csvkit is a Python package and can be installed with pip, Python's package installer:
$ pip install csvkit

PIPING

One of csvkit's greatest strengths is that its tools read from standard input and write CSV to standard output, so they can be chained with Unix pipes into larger workflows. Note that csvcut takes its column list as a single comma-separated argument:
$ csvgrep -c City -m "New York" data.csv | csvcut -c Name,Age | csvlook

ENCODING

There is no global encoding setting; instead, every csvkit tool accepts -e/--encoding to specify the character encoding of its input file (for example -e latin1). UTF-8 is assumed by default.
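A minimal sketch of reading a non-UTF-8 file, assuming csvkit is installed. latin1.csv is a hypothetical file written with octal escapes (\351 is "é" and \341 is "á" in ISO-8859-1). With -e the bytes are decoded as Latin-1; the output is written in your terminal's encoding, typically UTF-8.

```shell
# Scratch directory for the hypothetical sample file.
cd "$(mktemp -d)"

# Hypothetical Latin-1 file: "José" and "Málaga" encoded as single bytes.
printf 'Name,City\nJos\351,M\341laga\n' > latin1.csv

# Without -e, csvkit tries to decode these bytes as UTF-8 and exits with an error.
# With -e latin1, the input is decoded correctly.
csvcut -e latin1 -c Name,City latin1.csv
```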

HISTORY

csvkit was created by Christopher Groskopf and first released in 2012. It grew out of the need for reliable, scriptable tools for handling CSV files on the command line. It quickly gained popularity among journalists, data analysts, and developers because it is easy to use and fits naturally into Unix-like command-line workflows, filling a niche between basic shell utilities and full statistical software or database systems. It continues to be developed as an open-source project maintained by its community.

SEE ALSO

grep(1), awk(1), sed(1), cut(1), sort(1), join(1), datamash(1), q(1)
