LinuxCommandLibrary

csvjson

Convert CSV data to JSON

SYNOPSIS

csvjson [OPTIONS] [FILE]

PARAMETERS

FILE
    The path to the input CSV file. If omitted, csvjson reads CSV data from standard input (stdin).
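    For example (data.csv being a hypothetical input file), both of the following produce the same output:
        csvjson data.csv
        cat data.csv | csvjson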

-h, --help
    Display a help message for csvjson and exit.

-d DELIMITER, --delimiter DELIMITER
    Specify the field delimiter character used in the input CSV file. Defaults to a comma (,).
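    For example, to convert a hypothetical semicolon-delimited file:
        csvjson -d ';' sales.csv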

-t, --tabs
    Treat the input CSV file as tab-delimited. This is a shorthand for `--delimiter '\t'`.

-q QUOTECHAR, --quotechar QUOTECHAR
    Define the character used to quote fields containing special characters such as the delimiter or newlines. Defaults to a double quote (").

-u {0,1,2,3}, --quoting {0,1,2,3}
    Specify the quoting style used in the input CSV file: 0 (quote minimal), 1 (quote all), 2 (quote non-numeric), 3 (quote none). These values correspond to the quoting constants in the Python `csv` module.

-b, --no-doublequote
    Do not interpret two consecutive quote characters within a quoted field as a single, escaped quote character. By default, this doubling is recognized.

-p ESCAPECHAR, --escapechar ESCAPECHAR
    Specify a character used to escape the delimiter when `--quoting 3` (quote none) is set, or to escape the quote character when `--no-doublequote` is set.

-z FIELD_SIZE_LIMIT, --maxfieldsize FIELD_SIZE_LIMIT
    Set the maximum length, in characters, of a single field in the input CSV file.

-e ENCODING, --encoding ENCODING
    Set the character encoding of the input CSV file, e.g., 'utf-8'.
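    For example, assuming a legacy input file encoded as Latin-1:
        csvjson -e latin-1 legacy.csv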

-L LOCALE, --locale LOCALE
    Specify the locale (e.g., 'en_US') to use when parsing formatted numbers in the input CSV.

-S, --skipinitialspace
    Ignore whitespace immediately following the delimiter.

--blanks
    Do not convert empty strings and the default null markers ('na', 'n/a', 'none', 'null', '.') in the CSV to `null` values in the JSON output; instead, keep them as literal strings.

-I, --no-inference
    Disable automatic type inference. All CSV fields will be treated as strings in the JSON output, which can also speed up processing of large files.
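    For example, to keep every value as a string (data.csv is a hypothetical input file):
        csvjson --no-inference data.csv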

--zero
    When interpreting or displaying column numbers, use 0-based numbering instead of the default 1-based numbering.

-K SKIP_LINES, --skip-lines SKIP_LINES
    Specify the number of initial lines to skip (e.g., comments or copyright notices) before the header or data rows are parsed.
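    For example, assuming a hypothetical file that begins with two comment lines before the header:
        csvjson --skip-lines 2 report.csv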

--date-format DATE_FORMAT
    Provide a `strptime` format string (e.g., '%Y-%m-%d') to use when parsing date columns.
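    For example, assuming a date column formatted as day/month/year:
        csvjson --date-format '%d/%m/%Y' orders.csv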

--datetime-format DATETIME_FORMAT
    Provide a `strptime` format string to use when parsing datetime columns.

--null-value NULL_VALUE
    Specify additional string values in the CSV that should be interpreted as `null` in the output JSON, on top of the defaults (empty strings, 'na', 'n/a', 'none', 'null', '.').

-H, --no-header-row
    Specify that the input CSV has no header row; the first row is treated as data, and generic column names (a, b, c, ...) are generated instead.
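    For example, for a hypothetical file containing only data rows:
        csvjson -H raw_rows.csv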

-k KEY, --key KEY
    Output a JSON object keyed by the values of the specified column, instead of an array of objects. The values in that column must be unique.
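    For example, assuming a column named id whose values are unique:
        csvjson -k id users.csv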

-y SNIFFLIMIT, --snifflimit SNIFFLIMIT
    Limit the number of bytes `csvjson` sniffs from the input to detect the CSV dialect. Defaults to 1024 bytes; a value of 0 disables sniffing.

--names
    Output only a single JSON array containing the column names (header row) and then exit.

-c COLUMNS, --columns COLUMNS
    A comma-separated list of column names or 1-based indices (e.g., '1,colB,3') to include in the output. Only specified columns will be processed.

-C COLUMNS, --exclude-columns COLUMNS
    A comma-separated list of column names or 1-based indices to exclude from the output. All other columns will be included.

-o OUTPUT_PATH, --output OUTPUT_PATH
    Specify the path to an output file where the JSON results will be written. If omitted, output is sent to standard output (stdout).

--pretty
    Format the JSON output with indentation and newlines, making it more human-readable. Equivalent to `--indent 2`.

-i INDENT, --indent INDENT
    Specify the number of spaces to indent the JSON output. Output is compact (no indentation) by default.
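    For example, to produce human-readable output indented by two spaces:
        csvjson -i 2 data.csv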

--enumerate
    Output an array of JSON objects, where each object represents a row. This matches csvjson's default output structure.

--stream
    Output JSON objects one per line (Newline Delimited JSON, NDJSON). This is suitable for processing very large datasets without holding the entire JSON structure in memory.
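    For example, to emit one JSON object per line from a hypothetical large file:
        csvjson --stream big.csv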

--format {array,objects,ndjson}
    Specify the desired output JSON structure: `array` (an array of arrays), `objects` (an array of objects, the default), or `ndjson` (newline-delimited JSON objects).

DESCRIPTION

The csvjson command is a utility from the csvkit suite for converting Comma-Separated Values (CSV) data into JavaScript Object Notation (JSON).

It processes CSV input, which can be provided via standard input or from a specified file, and outputs JSON to standard output or an output file. By default, csvjson infers data types (e.g., integers, floats, booleans, dates) for each column and represents each CSV row as a JSON object within a larger JSON array. It supports various options for controlling input parsing (delimiters, quoting, encoding), column selection, and output formatting (pretty printing, indentation, different JSON structures like arrays of arrays or newline-delimited JSON).
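
As a minimal sketch, given a hypothetical file people.csv containing:

    name,age
    Alice,30
    Bob,25

running `csvjson people.csv` produces output along the lines of:

    [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]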

csvjson is particularly useful in data processing pipelines, allowing for seamless integration of CSV data with applications and services that consume JSON, such as web APIs, NoSQL databases, or JavaScript frontends. Its flexibility makes it an essential tool for data engineers, analysts, and developers working with structured text data.

CAVEATS

When processing very large CSV files, especially without `--no-inference`, csvjson can consume significant memory as it attempts to infer types and construct the JSON output. Using `--stream` or `--format ndjson` is highly recommended for large datasets to reduce memory footprint by outputting line-by-line.
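
For example, a low-memory invocation for a hypothetical large export, keeping all values as strings and writing one JSON object per line:

    csvjson --no-inference --stream big_export.csv > big_export.ndjson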

Automatic type inference might not always correctly identify the desired data type for all columns, particularly for mixed-type columns or ambiguous date/datetime formats. In such cases, explicitly disabling inference with `--no-inference` or providing `--date-format`/`--datetime-format` can be necessary.

The `--key` option requires that the values in the chosen column be unique; it produces a JSON object keyed by those values rather than an array, so it is suited to lookup-style output rather than general conversion.

PART OF CSVKIT

csvjson is one of several specialized tools included in the csvkit collection, a comprehensive set of command-line utilities for working with CSV files. Other tools in the suite include `csvcut` for column selection, `csvgrep` for row filtering, and `csvstat` for statistical analysis, all designed to integrate seamlessly within shell scripts and data workflows.
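
For example, a hypothetical pipeline that selects two columns with csvcut before converting (assuming the input has columns named name and age):

    csvcut -c name,age data.csv | csvjson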

USE IN DATA PIPELINES

Due to its command-line nature and support for standard input/output, csvjson is ideal for use in data processing pipelines. It can easily be chained with other Unix commands (e.g., `cat input.csv | csvjson | jq '.'`) for complex transformations, making it a flexible component in ETL (Extract, Transform, Load) processes or data analysis workflows.
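
For instance, to pull a single field out of every row of a hypothetical input.csv that has a name column:

    csvjson input.csv | jq '.[].name'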

HISTORY

csvjson is part of the csvkit suite of utilities, an open-source project initiated by Christopher Groskopf in 2011. csvkit was developed to provide powerful command-line tools for working with CSV files, leveraging Python's robust `csv` module. Its development focused on making data manipulation and transformation accessible through standard Unix-like pipes and commands, enhancing productivity for data professionals. csvjson specifically addresses the common need to bridge between CSV and JSON formats in modern data pipelines.

SEE ALSO

csvlook(1), csvgrep(1), csvcut(1), jq(1), json_pp(1)
