csvjson
Convert CSV data to JSON
SYNOPSIS
csvjson [OPTIONS] [FILE]
PARAMETERS
FILE
The path to the input CSV file. If omitted, csvjson reads CSV data from standard input (stdin).
-h, --help
Display a help message for csvjson and exit.
-d DELIMITER, --delimiter DELIMITER
Specify the field delimiter character used in the input CSV file. Defaults to a comma (,).
-t, --tabs
Treat the input CSV file as tab-delimited. This is a shorthand for `--delimiter '\t'`.
-q QUOTECHAR, --quote-character QUOTECHAR
Define the character used to quote fields containing special characters like delimiters or newlines. Defaults to a double quote (").
-u {0,1,2,3}, --quoting {0,1,2,3}
Control the quoting style used in the input CSV file: 0 (quote minimal), 1 (quote all), 2 (quote non-numeric), or 3 (quote none), corresponding to the `QUOTE_*` constants in the Python `csv` module.
-b, --doublequote
Interpret two consecutive quote characters within a quoted field as a single, escaped quote character.
-p, --no-skip-initial-space
By default, spaces immediately following the delimiter are skipped. This option disables that behavior, treating leading spaces as part of the field content.
-z ESCAPECHAR, --escapechar ESCAPECHAR
Specify a character that is used to escape the delimiter or quote character within fields.
-e ENCODING, --encoding ENCODING
Set the character encoding of the input CSV file, e.g., 'utf-8'.
-L LOCALE, --locale LOCALE
Specify the locale to use for parsing numbers and dates in the input CSV.
-S, --skip-initial-space
Skip spaces immediately following the delimiter. This is the default behavior; the `-p` option disables it.
--blanks
Do not convert empty strings in the CSV to `null` values in the JSON output; instead, treat them as literal empty strings.
--no-inference
Disable automatic type inference. All CSV fields will be treated as strings in the JSON output, potentially speeding up processing for large files.
--zero-pad-float
When converting floating-point numbers, pad them with a leading zero (e.g., `0.5` rather than `.5`) for consistent formatting.
--skip-lines SKIP_LINES
Specify the number of lines to skip at the beginning of the input file before parsing the header or data rows.
--date-format DATE_FORMAT
Provide a `strftime` format string (e.g., '%Y-%m-%d') to use when parsing date columns.
--datetime-format DATETIME_FORMAT
Provide a `strftime` format string to use when parsing datetime columns.
--null-value NULL_VALUE
Specify a string value in the CSV that should be interpreted as `null` in the output JSON. By default, only empty strings are treated as null.
-H, --no-header-row
Treat the first row of the CSV as a data row rather than a header row; generic column names (a, b, c, ...) are generated automatically.
-K KEY, --key KEY
Use the values from the specified column as keys in a JSON object, instead of outputting an array. If the key column is not unique, rows with duplicate keys overwrite one another.
--snifflimit SNIFFLIMIT
Limit the number of bytes `csvjson` sniffs from the input to detect the CSV dialect. Defaults to 1024 bytes; specify 0 to disable sniffing.
--names
Output only a single JSON array containing the column names (header row) and then exit.
-c COLUMNS, --columns COLUMNS
A comma-separated list of column names or 1-based indices (e.g., '1,colB,3') to include in the output. Only the specified columns are processed.
-C COLUMNS, --exclude-columns COLUMNS
A comma-separated list of column names or 1-based indices to exclude from the output. All other columns are included.
-o OUTPUT_PATH, --output OUTPUT_PATH
Specify the path to an output file where the JSON results will be written. If omitted, output is sent to standard output (stdout).
--pretty
Format the JSON output with indentation and newlines, making it more human-readable. Equivalent to `--indent 2`.
--indent INDENT
Specify the number of spaces to indent pretty-printed JSON output. Implies `--pretty` if not already set.
--enumerate
Output an array of JSON objects, where each object represents a row.
--stream
Output JSON objects one per line (Newline Delimited JSON, NDJSON). This is suitable for processing very large datasets without holding the entire JSON structure in memory.
--format {array,objects,ndjson}
Specify the desired output JSON structure: `array` (an array of arrays), `objects` (an array of objects, the default), or `ndjson` (newline-delimited JSON objects).
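Several of the parsing options above correspond directly to parameters of Python's standard `csv` module, on which csvkit is built. The following is a rough sketch of that mapping, not csvjson's actual implementation; the sample data is hypothetical:

```python
import csv
import io

# Hypothetical tab-delimited input, analogous to using `-t` (or `-d '\t'`).
raw = 'name\tage\n"Smith, Jane"\t32\n'

reader = csv.reader(
    io.StringIO(raw),
    delimiter="\t",         # -d / --delimiter (or -t / --tabs)
    quotechar='"',          # -q / --quote-character
    doublequote=True,       # -b / --doublequote
    skipinitialspace=True,  # default; -p / --no-skip-initial-space disables it
)
rows = list(reader)
print(rows)  # [['name', 'age'], ['Smith, Jane', '32']]
```

Note that the quoted field containing a literal delimiter-like comma survives intact, which is exactly what the quote-character options exist to guarantee.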
DESCRIPTION
The csvjson command is a utility from the csvkit suite for converting Comma Separated Values (CSV) data into JavaScript Object Notation (JSON) format.
It processes CSV input, which can be provided via standard input or from a specified file, and outputs JSON to standard output or an output file. By default, csvjson infers data types (e.g., integers, floats, booleans, dates) for each column and represents each CSV row as a JSON object within a larger JSON array. It supports various options for controlling input parsing (delimiters, quoting, encoding), column selection, and output formatting (pretty printing, indentation, different JSON structures like arrays of arrays or newline-delimited JSON).
csvjson is particularly useful in data processing pipelines, allowing CSV data to feed applications and services that consume JSON, such as web APIs, NoSQL databases, or JavaScript frontends.
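The default conversion described above (header row becomes object keys, each row an object in a top-level array, with type inference) can be approximated in stdlib Python. This is a deliberately naive sketch; csvjson's real inference handles dates, locales, and more:

```python
import csv
import io
import json

def infer(value):
    """Naive stand-in for csvjson's type inference."""
    if value == "":
        return None  # empty strings become null by default
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # fall back to string

raw = "id,name,score\n1,alice,3.5\n2,bob,\n"
rows = csv.DictReader(io.StringIO(raw))
records = [{k: infer(v) for k, v in row.items()} for row in rows]
print(json.dumps(records, indent=2))  # roughly what pretty-printed output looks like
```

With `--no-inference`, the `infer` step would simply be skipped and every value kept as a string.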
CAVEATS
When processing very large CSV files, especially without `--no-inference`, csvjson can consume significant memory as it attempts to infer types and construct the JSON output. Using `--stream` or `--format ndjson` is highly recommended for large datasets to reduce memory footprint by outputting line-by-line.
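The memory advantage of streaming output comes from emitting one JSON document per row instead of accumulating the whole array first. A minimal sketch of the idea in stdlib Python (not csvjson's implementation):

```python
import csv
import io
import json

def to_ndjson(fileobj):
    """Yield one JSON document per CSV row, holding only one row in memory."""
    for row in csv.DictReader(fileobj):
        yield json.dumps(row)

raw = "city,pop\nParis,2100000\nLyon,520000\n"
for line in to_ndjson(io.StringIO(raw)):
    print(line)
```

Because the generator never builds a list of rows, memory use stays constant regardless of input size.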
Automatic type inference might not always correctly identify the desired data type for all columns, particularly for mixed-type columns or ambiguous date/datetime formats. In such cases, explicitly disabling inference with `--no-inference` or providing `--date-format`/`--datetime-format` can be necessary.
The `--key` option assumes the chosen column contains unique values; rows with duplicate keys silently overwrite one another, so it suits specific JSON output structures rather than general conversion.
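The keyed-object structure that `--key`-style output produces, and the overwrite hazard that makes the option need care, can be sketched as follows (the `code` column and sample data are hypothetical):

```python
import csv
import io
import json

raw = "code,name\nfr,France\nde,Germany\nfr,FRANCE\n"  # note duplicate key 'fr'
rows = csv.DictReader(io.StringIO(raw))
keyed = {row["code"]: row for row in rows}  # like keying output on the 'code' column
print(json.dumps(keyed))
```

The first 'fr' row is silently lost: only the last row with a given key value survives in the resulting object.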
PART OF CSVKIT
csvjson is one of several specialized tools included in the csvkit collection, a comprehensive set of command-line utilities for working with CSV files. Other tools in the suite include `csvcut` for column selection, `csvgrep` for row filtering, and `csvstat` for statistical analysis, all designed to integrate seamlessly within shell scripts and data workflows.
USE IN DATA PIPELINES
Due to its command-line nature and support for standard input/output, csvjson is ideal for use in data processing pipelines. It can easily be chained with other Unix commands (e.g., `cat input.csv | csvjson | jq '.'`) for complex transformations, making it a flexible component in ETL (Extract, Transform, Load) processes or data analysis workflows.
HISTORY
csvjson is part of the csvkit suite of utilities, an open-source project initiated by Christopher Groskopf around 2012. csvkit was developed to provide powerful, command-line tools for working with CSV files, leveraging Python's robust `csv` module. Its development focused on making data manipulation and transformation accessible through standard Unix-like pipes and commands, enhancing productivity for data professionals. csvjson specifically addresses the common need to bridge between CSV and JSON data formats in modern data pipelines.


