csvkit
Convert and work with CSV files
TLDR
Run a command on a CSV file with a custom delimiter
Run a command on a CSV file with a tab as a delimiter (overrides -d)
Run a command on a CSV file with a custom quote character
Run a command on a CSV file with no header row
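The four cases above correspond to csvkit's common input options -d, -t, -q and -H. The following is a minimal sketch, assuming csvkit is installed and using csvlook as a stand-in for any of the tools; the sample file names and contents are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample files, created here so the sketch is self-contained.
printf 'name;age\nAda;36\n' > semi.csv
printf 'name\tage\nAda\t36\n' > tabs.tsv
printf "name,note\nAda,'likes, commas'\n" > quoted.csv
printf 'Ada,36\nGrace,45\n' > noheader.csv

# Run the csvkit commands only if csvkit is available on this machine.
if command -v csvlook >/dev/null 2>&1; then
    csvlook -d ';' semi.csv      # custom delimiter
    csvlook -t tabs.tsv          # tab as delimiter (overrides -d)
    csvlook -q "'" quoted.csv    # custom quote character
    csvlook -H noheader.csv      # no header row; default column names are generated
fi
```

The same options can be passed to the other tools that read CSV input, such as csvcut or csvgrep.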
SYNOPSIS
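COMMAND [COMMON_OPTIONS] [COMMAND_OPTIONS] [FILE]
COMMAND --version
COMMAND --help
Here COMMAND is one of the tools listed under PARAMETERS. csvkit installs each tool as its own executable; there is no single csvkit command.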
PARAMETERS
csvcut
Select, reorder, or remove columns from CSV files.
csvgrep
Filter rows from CSV files based on pattern matching.
csvsort
Sort rows in a CSV file by one or more columns.
csvstack
Stack multiple CSV files vertically, concatenating rows.
csvjoin
Join two or more CSV files horizontally based on common columns.
csvstat
Compute descriptive statistics for columns in a CSV file.
csvlook
Render CSV data in a fixed-width format suitable for terminal viewing.
csvclean
Report and fix common errors in CSV files, such as rows with an inconsistent number of columns.
csvformat
Convert a CSV file to a custom output format (delimiter, quote character, line terminator).
in2csv
Convert various formats (e.g., JSON, Excel, HTML tables) to CSV.
csvsql
Execute SQL queries against CSV files, or import/export to/from SQL databases.
csvjson
Convert CSV to JSON (by default, an array of objects keyed by column name).
csvpy
Load CSV data into a Python environment for scripting or interactive analysis.
csvdiff
Compare two CSV files and highlight differences. Note that csvdiff is a separate project and is not installed as part of csvkit.
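As a rough illustration of how several of the tools listed above are invoked, the sketch below runs a few of them against a small sample file. It assumes csvkit is installed; people.csv and its contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data, written to the current directory.
cat > people.csv <<'EOF'
Name,Age,City
Ada,36,London
Grace,45,New York
Linus,28,Helsinki
EOF

# Run the csvkit commands only if csvkit is available on this machine.
if command -v csvcut >/dev/null 2>&1; then
    csvcut -n people.csv                       # list column names with their indices
    csvcut -c Name,City people.csv             # keep only the Name and City columns
    csvgrep -c City -m "New York" people.csv   # rows whose City contains "New York"
    csvsort -c Age -r people.csv               # sort by Age, descending
    csvstat -c Age people.csv                  # summary statistics for the Age column
    csvjson people.csv                         # emit the rows as JSON
fi
```

Each tool writes CSV (or JSON, for csvjson) to standard output, so any of these commands can be redirected to a file.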
DESCRIPTION
csvkit is a suite of command-line tools for converting to and working with CSV (comma-separated values) data. It covers common data manipulation tasks, with utilities for viewing, cleaning, transforming, and analyzing CSV files directly from the terminal. Written in Python, csvkit sits between simple text-processing tools and full database operations, and its tools read and write standard streams so they can be combined with Unix pipes. The individual tools convert JSON or Excel to CSV, filter rows, select columns, join files, sort, compute statistics, and import to or export from SQL databases. It is aimed at data professionals, analysts, and anyone who routinely handles tabular data.
CAVEATS
csvkit is written in Python, requiring a Python installation on your system. While powerful for many tasks, for extremely large datasets (e.g., terabytes), it might be less performant than specialized big data tools, as it often loads data into memory. It generally assumes well-formed CSV; malformed files might require prior cleaning or specific options.
INSTALLATION
csvkit is a Python package and can be installed with pip, Python's package installer:
pip install csvkit
PIPING
One of csvkit's greatest strengths is that its tools work with Unix pipes, so several commands can be chained into a single data workflow, for example:
cat data.csv | csvgrep -c "City" -m "New York" | csvcut -c "Name,Age"
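A pipeline of this kind can be sketched end to end as follows. This is an illustrative example, assuming csvkit is installed; data.csv and its rows are made up, and the extra csvsort and csvlook stages are additions for demonstration.

```shell
#!/bin/sh
# Hypothetical sample data, written to the current directory.
cat > data.csv <<'EOF'
Name,Age,City
Ada,36,London
Grace,45,New York
Alan,41,New York
EOF

# Run the pipeline only if csvkit is available on this machine.
if command -v csvgrep >/dev/null 2>&1; then
    # Keep the New York rows, select two columns, sort by age, pretty-print.
    csvgrep -c City -m "New York" data.csv \
        | csvcut -c Name,Age \
        | csvsort -c Age \
        | csvlook
fi
```

Each stage reads CSV on standard input and writes CSV on standard output, which is what allows the stages to be reordered or replaced independently.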
ENCODING
csvkit supports a wide range of character encodings. The encoding of the input file can usually be specified with the common -e (--encoding) option, and some tools offer additional encoding-related options of their own.
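The sketch below shows one way the input encoding might be declared for a file that is not UTF-8. It assumes csvkit is installed; latin1.csv is a hypothetical file written with Latin-1 bytes (octal 351 is "é" in that encoding).

```shell
#!/bin/sh
# Hypothetical Latin-1 encoded sample: \351 is the single byte for "é".
printf 'name,city\nJos\351,M\351rida\n' > latin1.csv

# Run the csvkit command only if csvkit is available on this machine.
if command -v csvcut >/dev/null 2>&1; then
    # Tell csvcut how to decode the input, then select the name column.
    csvcut -e latin1 -c name latin1.csv
fi
```

Without -e, the tool would try to decode the file with its default encoding, which is likely to fail or garble the accented characters.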
HISTORY
csvkit was created by Christopher Groskopf and first released in 2012. It emerged from the need for robust, programmatic tools to handle CSV files efficiently on the command line, leveraging Python's strong data handling capabilities. It quickly gained popularity among journalists, data analysts, and developers due to its ease of use and ability to integrate into Unix-like command-line workflows, filling a niche between basic shell utilities and full-fledged statistical software or database systems. Its development continues as an open-source project, actively maintained and improved by its community.