csvkit
Convert and work with CSV files
TLDR
Run a command on a CSV file with a custom delimiter
Run a command on a CSV file with a tab as a delimiter (overrides -d)
Run a command on a CSV file with a custom quote character
Run a command on a CSV file with no header row
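The four cases above map onto csvkit's common input options `-d`, `-t`, `-q`, and `-H`. A minimal sketch, assuming csvkit is installed and using `csvlook` as the example command; the sample files (`semi.csv`, `tabs.tsv`, `quoted.csv`, `nohead.csv`) are hypothetical and created here only for illustration:

```shell
# Create small sample files (hypothetical names and data, for illustration only).
printf 'id;name\n1;Ada\n2;Grace\n' > semi.csv
printf 'id\tname\n1\tAda\n2\tGrace\n' > tabs.tsv
printf "id,name\n1,'Lovelace, Ada'\n" > quoted.csv
printf '1,Ada\n2,Grace\n' > nohead.csv

# Custom delimiter (-d / --delimiter).
csvlook -d ';' semi.csv

# Tab-delimited input (-t / --tabs); overrides -d.
csvlook -t tabs.tsv

# Custom quote character (-q / --quotechar).
csvlook -q "'" quoted.csv

# No header row (-H / --no-header-row); default column names are generated.
csvlook -H nohead.csv
```

The same options should work with the other csvkit tools, since they share a common set of input arguments.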
SYNOPSIS
TOOL [options] [input_file]
TOOL is one of the commands listed under SUBCOMMANDS, e.g. csvcut or csvlook.
PARAMETERS
--encoding ENCODING
Character encoding of the input CSV file.
--snifflimit BYTES
Limit CSV dialect sniffing to the specified number of bytes.
--compression COMPRESSION
Specify the compression format (e.g., 'gzip', 'bz2').
--format FORMAT
Format of the input file when converting to CSV with in2csv (e.g., 'xlsx', 'json'). If omitted, it is inferred from the file extension.
--delimiter DELIMITER
Character delimiting fields within a row.
--quotechar QUOTECHAR
Character used to enclose fields containing special characters.
--escapechar ESCAPECHAR
Character used to escape the quotechar.
--no-doublequote
Indicate that quote characters inside quoted fields are not doubled in the input.
--skipinitialspace
Ignore whitespace immediately following the delimiter.
--tabs
Specify that the input is tab-delimited.
--no-header-row
Indicate that the input CSV file has no header row.
--names
Display column names and indices from the input and exit (available in tools such as csvcut and csvgrep).
--blanks
Do not convert blank cells to NULL.
--date-format DATE_FORMAT
Date format string (e.g., '%Y-%m-%d').
--datetime-format DATETIME_FORMAT
Datetime format string (e.g., '%Y-%m-%d %H:%M:%S').
--locale LOCALE
Specify the locale (e.g., 'en_US') of any formatted numbers in the input.
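--db CONNECTION_STRING
SQLAlchemy connection string for the database (csvsql and sql2csv).
--tables TABLE_NAMES
Comma-separated names of the tables to create (csvsql). Defaults to the input file names without extensions.
--insert
Insert the data into the table (csvsql; requires --db).
--no-create
Skip creating the table (csvsql; requires --insert). By default, csvsql creates the table.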
-V, --version
Show program's version number and exit.
-h, --help
Show help message and exit.
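The database-related options above belong to `csvsql` and `sql2csv`. A minimal sketch of a round trip through SQLite, assuming csvkit is installed; the file `people.csv` and the database `people.db` are hypothetical names chosen for illustration:

```shell
# Sample data (hypothetical, for illustration only).
printf 'id,name,city\n1,Ada,London\n2,Grace,New York\n' > people.csv

# Start from a clean database so the example can be re-run.
rm -f people.db

# Print a CREATE TABLE statement inferred from the CSV (no database needed).
csvsql people.csv

# Create the table in a SQLite database and insert the rows.
# The table name defaults to the file name without its extension ("people").
csvsql --db sqlite:///people.db --insert people.csv

# Query the database and write the result as CSV.
sql2csv --db sqlite:///people.db --query "SELECT name FROM people WHERE city = 'London'"
```

SQLite is used here only because it needs no server; other databases require the matching SQLAlchemy driver to be installed.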
DESCRIPTION
csvkit is a suite of command-line tools for converting to and working with CSV files. It provides utilities for converting CSV to other formats (such as JSON and SQL), cleaning and validating data, analyzing column types, and joining multiple CSV files. The tools handle different encodings and delimiters, which makes them useful for quick data assessment, data cleaning, and simple ETL pipelines run entirely from the command line, without writing a program.
SUBCOMMANDS
csvkit does not install a single csvkit executable. It comprises several standalone commands, each invoked by its own name and performing a specific task:
csvlook: Display CSV data in a human-readable format.
csvstat: Calculate descriptive statistics for CSV columns.
csvclean: Clean and normalize CSV data.
csvcut: Select specific columns from a CSV file.
csvgrep: Filter rows matching a pattern.
csvjoin: Join multiple CSV files based on common columns.
csvformat: Convert a CSV file to a custom output dialect (delimiter, quoting, line terminator).
csvjson: Convert a CSV file into JSON.
csvsql: Execute SQL queries on CSV files, or generate SQL statements from them.
in2csv: Convert other formats, such as Excel files, into CSV.
sql2csv: Execute an SQL query against a database and output the result as CSV.
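Because each tool reads CSV from a file or standard input and writes to standard output, the tools can be chained with pipes. A minimal sketch, assuming csvkit is installed; `people.csv` is a hypothetical sample file created for illustration:

```shell
# Sample data (hypothetical, for illustration only).
printf 'id,name,city\n1,Ada,London\n2,Grace,New York\n3,Alan,London\n' > people.csv

# Keep two columns, keep rows whose city matches "London", render as a table.
csvcut -c name,city people.csv | csvgrep -c city -m London | csvlook

# Summarize a single column.
csvstat -c city people.csv

# Convert the whole file to JSON.
csvjson people.csv
```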
BEST PRACTICES
When using csvkit, consider the following best practices:
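Specify the input encoding with --encoding when the file is not UTF-8, to avoid garbled characters.
Set the delimiter and quote character explicitly when the file does not use a comma and double quotes.
For large files, consider keeping them compressed to save disk space; csvkit tools can read gzip- and bz2-compressed input directly, based on the file extension.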
Use `csvstat` to understand data before manipulation.
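The first and last practices can be sketched together: declare the encoding, then inspect the data before changing it. This assumes csvkit is installed; `latin1.csv` is a hypothetical Latin-1 encoded file created for illustration:

```shell
# Write a small Latin-1 encoded file; \351 is the byte for "é" in Latin-1.
printf 'name,amount\nRen\351,10\nAda,20\n' > latin1.csv

# List the column names and their indices.
csvcut -e latin1 -n latin1.csv

# Declare the input encoding explicitly and print per-column statistics.
csvstat -e latin1 latin1.csv
```

Without `-e latin1`, the non-UTF-8 byte in this file would likely cause a decoding error.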
HISTORY
csvkit was created to provide a dedicated set of tools for handling CSV data on the command line. It addresses shortcomings of generic text processing utilities when dealing with structured CSV formats.
Development of csvkit was driven by the need for easier CSV data exploration and manipulation. Initially written as a single script, it grew into a suite of specialized utilities.
Over time, csvkit has gained popularity among data scientists, analysts, and developers for its simplicity, flexibility, and focus on CSV-specific operations. It continues to be actively maintained and extended with new features and improvements.