LinuxCommandLibrary

csvformat

Reformat CSV files to a consistent standard

TLDR

Convert to a tab-delimited file (TSV)

$ csvformat [[-T|--out-tabs]] [data.csv]

Convert delimiters to a custom character
$ csvformat [[-D|--out-delimiter]] "[custom_character]" [data.csv]

Convert line endings to carriage return (^M) + line feed
$ csvformat [[-M|--out-lineterminator]] "[\r\n]" [data.csv]

Minimize use of quote characters
$ csvformat [[-U|--out-quoting]] 0 [data.csv]

Maximize use of quote characters
$ csvformat [[-U|--out-quoting]] 1 [data.csv]
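
The numeric quoting levels appear to mirror the quoting constants of Python's csv module, which csvkit builds on (0 minimal, 1 all, 2 non-numeric, 3 none). The following is a minimal sketch using only the standard library, not csvkit's own code; the helper name render and the sample rows are made up for illustration.

```python
import csv
import io


def render(rows, quoting):
    """Write rows with the given csv quoting constant and return the text."""
    buf = io.StringIO()
    # escapechar only matters for QUOTE_NONE, where delimiters must be escaped.
    writer = csv.writer(buf, quoting=quoting, escapechar="\\", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data: integers stay numeric, the last field holds a comma.
rows = [["id", "note"], [1, "plain"], [2, "has,comma"]]

for name in ("QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE"):
    level = getattr(csv, name)
    print(f"-- level {level} ({name})")
    print(render(rows, level), end="")
```

In this sketch a value counts as numeric because of its Python type; how csvformat decides which fields are numeric for level 2 may differ between csvkit versions.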

SYNOPSIS

csvformat [options] [FILE]

Common options:
csvformat [-H | --no-header-row] [-E | --skip-header] [-e ENCODING | --encoding ENCODING] [-S | --skipinitialspace] [-T | --out-tabs] [-D DELIMITER | --out-delimiter DELIMITER] [-Q QUOTECHAR | --out-quotechar QUOTECHAR] [-U {0,1,2,3} | --out-quoting {0,1,2,3}] [-B | --out-no-doublequote] [-P ESCAPECHAR | --out-escapechar ESCAPECHAR] [-M LINETERMINATOR | --out-lineterminator LINETERMINATOR] [FILE]

PARAMETERS

FILE
    The CSV file to process. If omitted, csvformat reads from standard input (stdin).

-H, --no-header-row
    Specify that the input CSV file has no header row. csvkit then generates default column names (a, b, c, ...).

-T, --out-tabs
    Use tabs as the field delimiter for the output. Overrides -D.

-U {0,1,2,3}, --out-quoting {0,1,2,3}
    Control the quoting behavior for output fields.
    0: Quote minimally, only when necessary (e.g., fields containing delimiters or quote characters). This is the default.
    1: Quote all fields.
    2: Quote non-numeric fields.
    3: Quote nothing.

-D DELIMITER, --out-delimiter DELIMITER
    Specify a custom output field delimiter. It must be a single character.

-Q QUOTECHAR, --out-quotechar QUOTECHAR
    Specify a custom output quote character. The default is the double quote (").

-e ENCODING, --encoding ENCODING
    Specify the character encoding of the input CSV file, e.g., 'iso-8859-1'.

-S, --skipinitialspace
    When reading input, ignore whitespace immediately following the delimiter.

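-B, --out-no-doublequote
    Do not double quote characters embedded in an output field; combine with -P so that embedded quotes are escaped instead. Without this flag, embedded quotes are doubled, which is the conventional CSV behavior and generally recommended.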

-P ESCAPECHAR, --out-escapechar ESCAPECHAR
    Specify an escape character for output. It escapes the delimiter when quoting is disabled (-U 3) and the quote character when --out-no-doublequote is given.

-M LINETERMINATOR, --out-lineterminator LINETERMINATOR
    Specify a custom line terminator for output, e.g. '\r\n' for Windows-style line endings. The default is a line feed ('\n').

-E, --skip-header
    Do not write a header row to the output. Useful if the input file has a header that the consumer of the output should not see.
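
The output-side escaping and line-terminator options can be pictured with Python's standard csv module, on which csvkit is built. This is a hedged sketch of the underlying writer behavior rather than csvformat itself; the helper write_row and the sample row are hypothetical.

```python
import csv
import io


def write_row(row, **dialect):
    """Serialize one row with the given csv.writer dialect options."""
    buf = io.StringIO()
    csv.writer(buf, **dialect).writerow(row)
    return buf.getvalue()


# Hypothetical row whose second field contains embedded double quotes.
row = ["1", 'say "hi"']

# Conventional CSV: the field is quoted and embedded quotes are doubled.
doubled = write_row(row, lineterminator="\n")

# Doubling disabled plus an escape character: quotes are backslash-escaped.
escaped = write_row(row, doublequote=False, escapechar="\\", lineterminator="\n")

# Custom line terminator: carriage return + line feed.
crlf = write_row(row, lineterminator="\r\n")

for label, text in (("doubled", doubled), ("escaped", escaped), ("crlf", crlf)):
    print(label, repr(text))
```

Printing with repr makes the otherwise invisible line terminators visible, which is handy when checking what a downstream system will actually receive.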

DESCRIPTION

The csvformat command is a utility from the csvkit suite designed to reformat Comma Separated Values (CSV) files. It allows users to control various aspects of the CSV dialect, such as the delimiter, quote character, quoting style, and line terminator. This is particularly useful when converting CSV files between different formats, standardizing disparate CSV inputs, or preparing data for systems that expect a specific CSV dialect.

For example, it can convert a tab-separated file to a comma-separated one, change how fields are quoted (e.g., quoting all fields, only non-numeric fields, or minimally), or adjust the character used for escaping. It reads data from standard input or a specified file and writes the reformatted CSV to standard output.
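
Conceptually, reformatting means parsing the input under one dialect and re-serializing it under another. A minimal sketch of that idea with Python's standard csv module follows; it is an illustration under stated assumptions, not csvkit's implementation, and the function name reformat and its parameters are invented here.

```python
import csv
import io


def reformat(text, in_delimiter="\t", out_delimiter=",", out_quoting=csv.QUOTE_MINIMAL):
    """Parse text in one CSV dialect and re-serialize it in another."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=in_delimiter)
    out = io.StringIO()
    writer = csv.writer(
        out, delimiter=out_delimiter, quoting=out_quoting, lineterminator="\n"
    )
    # Each parsed row is written back out under the output dialect.
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical tab-separated input; the second field contains a comma.
tsv = "name\tcity\nAda\tLondon, UK\n"
print(reformat(tsv), end="")
```

Because the output delimiter is a comma, the field that contains a comma has to be quoted on output, which is the kind of adjustment a dialect conversion takes care of.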

CAVEATS

csvformat is part of the csvkit suite, which is written in Python and therefore requires a Python environment. While versatile, it expects reasonably well-formed CSV input; badly malformed files may still need pre-processing with other tools or manual cleaning. Output is quoted minimally by default (-U 0), so pass -U 1 or -U 2 explicitly when a downstream tool expects quoted fields.

HISTORY

csvformat is a component of the csvkit project, an open-source suite of command-line tools for working with CSV data. csvkit was originally created by Christopher Groskopf and first released in 2011. It gained traction within the data science, journalism, and analysis communities due to its simplicity, Unix-like philosophy (do one thing well), and Python-based extensibility. csvformat addresses the common need to transform CSV files between different dialects.

SEE ALSO

csvkit(1), csvlook(1), csvcut(1), csvgrep(1), csvstack(1), csvjson(1), in2csv(1)
