csvkit
Suite of command-line CSV processing tools
SYNOPSIS
tool [options] [file]
DESCRIPTION
csvkit is a comprehensive suite of command-line tools for working with CSV files. It brings database-like operations to tabular data without requiring a database, following Unix philosophy principles.
The tools handle CSV quoting and escaping correctly, avoiding the pitfalls of using awk, sed, or cut directly on CSV data. They support various input encodings and delimiters, making them versatile for real-world data processing.
csvkit is particularly useful for data journalism, quick data exploration, ETL processes, and as part of data pipelines. All tools can read from stdin and write to stdout for easy chaining.
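The quoting pitfall mentioned above can be illustrated with a short sketch. This is not csvkit code: it uses Python's standard csv module as a stand-in for any CSV-aware parser, and the sample data is made up for the example. It compares splitting on commas (roughly what cut -d, or awk -F, would do) with real CSV parsing.

```python
import csv
import io

# Made-up sample: one record whose name field contains a comma inside quotes.
raw = 'id,name,city\n1,"Smith, Jane",Boston\n'

# Naive approach: split every line on commas, ignoring CSV quoting rules.
naive_rows = [line.split(",") for line in raw.splitlines()]

# CSV-aware approach: let a real parser interpret the quotes.
parsed_rows = list(csv.reader(io.StringIO(raw)))

print(len(naive_rows[1]))   # the quoted name is split into two fields
print(len(parsed_rows[1]))  # the quoted name stays a single field
print(parsed_rows[1])
```

The naive split sees four fields in the data row, while the parser sees the three that the header declares. A column selection based on the naive split would silently return the wrong values.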
INCLUDED TOOLS
in2csv
Convert various formats (Excel, JSON, fixed-width) to CSV.
sql2csv
Execute a SQL query on a database and output results as CSV.
csvlook
Display CSV in a human-readable table format.
csvstat
Generate statistics for CSV columns.
csvcut
Select and reorder columns.
csvgrep
Filter rows by column values.
csvsort
Sort rows by columns.
csvjoin
Join two CSV files on common columns.
csvstack
Concatenate CSV files vertically.
csvsql
Generate SQL statements or execute queries against a database.
csvjson
Convert CSV to JSON.
csvpy
Load CSV into a Python shell for interactive exploration.
csvclean
Validate and fix CSV formatting issues.
csvformat
Convert CSV to other delimited formats.
CAVEATS
Some operations load entire files into memory. Type inference can sometimes misclassify data. Performance may be slower than specialized tools for very large files. csvkit requires a Python installation.
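As a rough illustration of how type inference can misclassify data, the sketch below uses a deliberately simplified, hypothetical infer function. It is not csvkit's actual inference logic and only shows the general failure mode: an identifier that looks numeric, such as a postal code with a leading zero, loses information once it is treated as a number.

```python
def infer(value):
    """Hypothetical, simplified inference: try int, then float, else keep text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value

zip_code = "02134"           # made-up postal code with a leading zero
inferred = infer(zip_code)

print(repr(inferred))        # now an integer; the leading zero is gone
print(repr(infer("N/A")))    # non-numeric text is left unchanged
```

When columns like this matter, it is worth checking whether the tool in use offers a way to turn inference off. Recent csvkit releases document a --no-inference (-I) option on several tools, but confirm this against the installed version.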
HISTORY
csvkit was created by Christopher Groskopf and first released in 2011. It was designed to provide data journalists and analysts with powerful command-line tools for processing CSV data, becoming a standard toolkit in the data science community.
