LinuxCommandLibrary

csvkit

Convert and work with CSV files

TLDR

Run a command on a CSV file with a custom delimiter

$ [command] [[-d|--delimiter]] [delimiter] [path/to/file.csv]

Run a command on a CSV file with a tab as a delimiter (overrides -d)
$ [command] [[-t|--tabs]] [path/to/file.csv]

Run a command on a CSV file with a custom quote character
$ [command] [[-q|--quotechar]] [quote_char] [path/to/file.csv]

Run a command on a CSV file with no header row
$ [command] [[-H|--no-header-row]] [path/to/file.csv]
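
As a concrete illustration of the templates above, the following sketch substitutes csvcut (one of the csvkit tools) for [command]. The file names and sample rows are made up for the example, and the csvkit calls are skipped when the tools are not installed.

```shell
# Scratch directory with made-up sample data (hypothetical file names).
workdir="$(mktemp -d)"
printf 'id|name\n1|Ada\n2|Grace\n' > "$workdir/people.psv"
printf '1\tAda\n2\tGrace\n' > "$workdir/people.tsv"

if command -v csvcut >/dev/null 2>&1; then
    # -d '|': read pipe-delimited input and keep only the "name" column.
    csvcut -d '|' -c name "$workdir/people.psv"

    # -t -H: tab-delimited input without a header row; the column is
    # selected by position (here the second column).
    csvcut -t -H -c 2 "$workdir/people.tsv"
else
    echo "csvkit not found; install it to run the csvcut calls" >&2
fi
```

The same dialect flags are accepted by the other csvkit tools, so csvcut can be swapped for csvlook, csvstat, and so on.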

SYNOPSIS

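[command] [options] [input_file]

Here [command] is one of the csvkit tools, such as csvlook, csvcut, or in2csv. Each tool is installed as its own executable; there is no single csvkit command.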

PARAMETERS

--encoding ENCODING
    Character encoding of the input CSV file.

--snifflimit BYTES
    Limit CSV dialect sniffing to the specified number of bytes.

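--compression COMPRESSION
    Specify the compression format (e.g., 'gzip', 'bz2'). This flag is not among csvkit's documented common arguments; compressed input is normally recognized from the file extension (e.g., '.gz', '.bz2').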

--format FORMAT
    Format of the input file, used by in2csv (e.g., 'csv', 'json', 'xls', 'xlsx'). If omitted, the format is inferred from the file extension.

--delimiter DELIMITER
    Character delimiting fields within a row.

--quotechar QUOTECHAR
    Character used to enclose fields containing special characters.

--escapechar ESCAPECHAR
    Character used to escape the quotechar.

--no-doublequote
    Indicate that a quote character inside a field is not escaped by doubling it.

--skipinitialspace
    Ignore whitespace immediately following the delimiter.

--tabs
    Specify that the input is tab-delimited.

--no-header-row
    Indicate that the input CSV file has no header row.

--names
    Display the column names and their indices from the input CSV file, then exit. Takes no argument.

--blanks
    Do not convert blank cells to NULL.

--date-format DATE_FORMAT
    Date format string (e.g., '%Y-%m-%d').

--datetime-format DATETIME_FORMAT
    Datetime format string (e.g., '%Y-%m-%d %H:%M:%S').

--locale LOCALE
    Specify the locale (e.g., 'en_US') of any formatted numbers in the input.

--db CONNECTION_STRING
    Connection string to the database (for database operations).

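--tables TABLE_NAMES
    Comma-separated names of the database tables to create (csvsql). By default, tables are named after the input files without their extensions.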

--insert
    In addition to creating the table, insert the data into it (csvsql only; requires --db).

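--create-if-not-exists
    Create the table only if it does not already exist (csvsql). Table creation is otherwise the default; --no-create skips it.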

-V, --version
    Show program's version number and exit. (Lowercase -v is --verbose.)

-h, --help
    Show help message and exit.
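
The database options apply to csvsql and sql2csv rather than to every tool. A minimal sketch of a round trip through SQLite follows, assuming csvkit is installed with its default SQL support; the table name, file names, and data are invented for the example. Note that csvsql spells its table option --tables.

```shell
# Scratch directory with a small made-up CSV file.
workdir="$(mktemp -d)"
printf 'id,name\n1,Ada\n2,Grace\n' > "$workdir/people.csv"

# mktemp -d yields an absolute path, so the URL has four slashes in total.
db_url="sqlite:///$workdir/demo.db"

if command -v csvsql >/dev/null 2>&1 && command -v sql2csv >/dev/null 2>&1; then
    # Create a table named "people" and insert the rows of the CSV file.
    csvsql --db "$db_url" --tables people --insert "$workdir/people.csv"

    # Query the table and write the result back out as CSV.
    sql2csv --db "$db_url" --query 'SELECT name FROM people ORDER BY id'
else
    echo "csvkit not found; install it to run csvsql and sql2csv" >&2
fi
```

Without --db, csvsql only prints the CREATE TABLE statement it would run, which is a convenient way to inspect the inferred column types first.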

DESCRIPTION

csvkit is a suite of command-line tools for converting to and working with CSV files. Its utilities convert CSV to and from other formats (such as JSON, SQL, and Excel), clean and validate data, report column types and summary statistics, and join multiple CSV files. The tools handle different encodings and delimiters and can be chained with pipes, which makes them useful for quick data assessment, data cleaning, and simple ETL pipelines without writing a program.
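
To make the joining and converting described above concrete, here is a small sketch that joins two files on a shared column and converts the result to JSON. The files and their contents are hypothetical, and the csvkit calls are skipped when the tools are missing.

```shell
# Two made-up CSV files that share an "id" column.
workdir="$(mktemp -d)"
printf 'id,name\n1,Ada\n2,Grace\n' > "$workdir/people.csv"
printf 'id,team\n1,analysis\n2,compilers\n' > "$workdir/teams.csv"

if command -v csvjoin >/dev/null 2>&1 && command -v csvjson >/dev/null 2>&1; then
    # Join on "id", then convert the joined rows to indented JSON.
    csvjoin -c id "$workdir/people.csv" "$workdir/teams.csv" | csvjson -i 2
else
    echo "csvkit not found; install it to run csvjoin and csvjson" >&2
fi
```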

SUBCOMMANDS

csvkit comprises several standalone tools, each installed as its own command and performing a specific task:
csvlook: Display CSV data as a human-readable table.
csvstat: Calculate descriptive statistics for CSV columns.
csvclean: Report and fix common errors in a CSV file.
csvcut: Select specific columns from a CSV file.
csvgrep: Filter rows matching a pattern.
csvjoin: Join multiple CSV files on common columns.
csvformat: Rewrite a CSV file with a different output dialect, such as another delimiter or quoting style.
csvjson: Convert a CSV file into JSON.
csvsql: Generate SQL statements for a CSV file or execute SQL queries on it.
in2csv: Convert other formats, such as Excel files, into CSV.
sql2csv: Execute an SQL query on a database and output the result as CSV.
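
Because each tool reads CSV from standard input and writes CSV to standard output, they are commonly combined in a pipeline. The sketch below filters rows, selects columns, and prints a table; the staff file and its columns are invented for the example.

```shell
# Made-up sample data.
workdir="$(mktemp -d)"
printf 'name,dept,salary\nAda,eng,120\nGrace,eng,130\nAlan,ops,110\n' \
    > "$workdir/staff.csv"

if command -v csvgrep >/dev/null 2>&1; then
    # Keep rows whose "dept" column matches "eng", select two columns,
    # and render the result as a text table.
    csvgrep -c dept -m eng "$workdir/staff.csv" \
        | csvcut -c name,salary \
        | csvlook
else
    echo "csvkit not found; install it to run the pipeline" >&2
fi
```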

BEST PRACTICES

When using csvkit, consider the following best practices:
Always specify the correct encoding to avoid character issues.
Use the correct delimiter and quote character for your CSV file.
Large files can be kept compressed to save disk space: csvkit reads '.gz' and '.bz2' input directly, although compression does not make processing faster.
Use `csvstat` to understand data before manipulation.
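
A short sketch of the first and last points: inspecting a file that is deliberately not UTF-8 before doing anything else with it. The file name and contents are made up, and the encoding name 'latin1' is an assumption about this particular sample file.

```shell
# \351 is the Latin-1 byte for "é", so this sample file is not valid UTF-8.
workdir="$(mktemp -d)"
printf 'name,city\nJos\351,Lyon\nAda,London\n' > "$workdir/latin1.csv"

if command -v csvstat >/dev/null 2>&1 && command -v csvcut >/dev/null 2>&1; then
    # List the column names and their indices.
    csvcut -e latin1 -n "$workdir/latin1.csv"

    # Summarize every column (inferred type, nulls, unique values, ...).
    csvstat -e latin1 "$workdir/latin1.csv"
else
    echo "csvkit not found; install it to run csvcut and csvstat" >&2
fi
```

Leaving out -e here would make the tools try to decode the file with their default encoding, which is the kind of character issue the first recommendation warns about.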

HISTORY

csvkit was created to provide a dedicated set of tools for handling CSV data on the command line. It addresses shortcomings of generic text processing utilities when dealing with structured CSV formats.

Development of csvkit was driven by the need for easier CSV data exploration and manipulation. Initially written as a single script, it grew into a suite of specialized utilities.

Over time, csvkit has gained popularity among data scientists, analysts, and developers for its simplicity, flexibility, and focus on CSV-specific operations. It continues to be actively maintained and extended with new features and improvements.

SEE ALSO

cut(1), grep(1), awk(1), sed(1)
