csvformat
Reformat CSV files to a consistent standard
TLDR
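Convert to a tab-delimited file (TSV):
csvformat -T data.csv
Convert delimiters to a custom character:
csvformat -D "|" data.csv
Convert line endings to carriage return (^M) + line feed (bash/zsh quoting shown):
csvformat -M $'\r\n' data.csv
Minimize use of quote characters:
csvformat -U 0 data.csv
Maximize use of quote characters:
csvformat -U 1 data.csv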
SYNOPSIS
csvformat [options] [FILE]
Common options:
csvformat [-H | --no-header-row] [-E | --skip-header] [-T | --out-tabs] [-U {0,1,2,3} | --out-quoting {0,1,2,3}] [-D DELIMITER | --out-delimiter DELIMITER] [-Q QUOTECHAR | --out-quotechar QUOTECHAR] [-e ENCODING | --encoding ENCODING] [-S | --skipinitialspace] [-B | --out-no-doublequote] [-P ESCAPECHAR | --out-escapechar ESCAPECHAR] [-M LINETERMINATOR | --out-lineterminator LINETERMINATOR] [-K SKIP_LINES | --skip-lines SKIP_LINES] [FILE]
PARAMETERS
FILE
The CSV file to process. If omitted, csvformat reads from standard input (stdin).
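-H, --no-header-row
Specify that the input CSV file has no header row, so the first row is treated as data. To drop the header row from the output instead, use -E (--skip-header).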
-T, --out-tabs
Use tabs as the field delimiter for the output. Overrides -D.
-U {0,1,2,3}, --out-quoting {0,1,2,3}
Control the quoting style of output fields.
0: Quote minimally, only when necessary (e.g., fields containing the delimiter, the quote character, or a line break). This is the default.
1: Quote all fields.
2: Quote non-numeric fields.
3: Quote nothing; special characters must then be escaped (see -P).
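The quoting styles can be previewed with Python's standard csv module, whose constants QUOTE_MINIMAL, QUOTE_ALL, and QUOTE_NONNUMERIC have the values 0, 1, and 2. The following is a minimal sketch, not csvformat's own code; the helper name render and the sample rows are made up for illustration.

```python
import csv
import io


def render(rows, quoting):
    """Write rows with the given csv quoting constant and return the text."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical sample data: one numeric column, one text column.
rows = [["id", "note"], [1, "plain"], [2, "has,comma"]]

minimal = render(rows, csv.QUOTE_MINIMAL)         # constant value 0
quote_all = render(rows, csv.QUOTE_ALL)           # constant value 1
non_numeric = render(rows, csv.QUOTE_NONNUMERIC)  # constant value 2

print(minimal)
print(quote_all)
print(non_numeric)
```

With minimal quoting only the field containing a comma is quoted; with quote-all every field is; with non-numeric quoting the integers stay bare. Note that this sketch passes real Python integers to the writer, whereas a command-line tool first has to decide which text values count as numbers.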
-D DELIMITER, --out-delimiter DELIMITER
Specify the output field delimiter. It must be a single character.
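What changing the output delimiter involves can be sketched with Python's standard csv module. This is an illustration under assumptions, not csvformat's implementation: the function name redelimit and the sample text are hypothetical. It also shows why quoting matters when a field already contains the new delimiter.

```python
import csv
import io


def redelimit(text, out_delimiter):
    """Parse comma-separated text and rewrite it with another delimiter."""
    reader = csv.reader(io.StringIO(text, newline=""))
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=out_delimiter, lineterminator="\n")
    writer.writerows(reader)
    return buf.getvalue()


# Hypothetical input: the second field of the data row contains a pipe.
source = "name,motto\nAda,a|b\n"

piped = redelimit(source, "|")    # the field "a|b" now needs quoting
tabbed = redelimit(source, "\t")  # no field contains a tab, so nothing is quoted

print(piped)
print(tabbed)
```

The csv module rejects delimiters longer than one character, which is why a single character is required here.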
-Q QUOTECHAR, --out-quotechar QUOTECHAR
Specify the character used to quote fields in the output. The default is the double quote (").
-e ENCODING, --encoding ENCODING
Specify the character encoding of the input CSV file, e.g., 'iso-8859-1'.
-S, --skipinitialspace
When reading input, ignore whitespace immediately following the delimiter. (In csvkit, -t is the --tabs flag, which declares that the input is tab-delimited.)
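-B, --out-no-doublequote
Do not double embedded quote characters in the output. By default, a quote character inside a field is written twice, which is the usual CSV convention and generally recommended. When -B is given, set an escape character with -P. (In csvkit, -p is the input-side --escapechar option.)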
-P ESCAPECHAR, --out-escapechar ESCAPECHAR
Specify an escape character for the output. It escapes the delimiter when quoting is disabled (-U 3) and the quote character when -B (--out-no-doublequote) is given.
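The difference between doubling quotes and escaping can be sketched with Python's standard csv module. This is a rough illustration rather than csvformat's own code; the helper render and the sample rows are hypothetical, and the backslash is just one possible escape character.

```python
import csv
import io


def render(row, **fmt):
    """Write a single row with the given csv formatting options."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()


row = ['say "hi"', "a,b"]

# Default CSV convention: embedded quote characters are doubled.
doubled = render(row)

# Doubling turned off: the embedded quotes are escaped instead.
escaped = render(row, doublequote=False, escapechar="\\")

# Quoting disabled entirely: the delimiter inside a field must be escaped.
unquoted = render(["a,b", "c"], quoting=csv.QUOTE_NONE, escapechar="\\")

print(doubled)
print(escaped)
print(unquoted)
```

Without an escape character, the csv module raises an error as soon as a field needs escaping, so the two settings are usually chosen together.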
-M LINETERMINATOR, --out-lineterminator LINETERMINATOR
Specify the line terminator for the output, e.g., carriage return + line feed for Windows-style files. If omitted, csvkit's writer default applies, which is a line feed ('\n') in current releases.
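The effect of a custom line terminator can be sketched with Python's standard csv module. The function name with_terminator and the sample input are assumptions for illustration; this is not csvformat's implementation.

```python
import csv
import io


def with_terminator(text, terminator):
    """Re-emit CSV text so that every record ends with the given terminator."""
    # newline="" keeps Python from translating line endings on its own.
    reader = csv.reader(io.StringIO(text, newline=""))
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator=terminator).writerows(reader)
    return buf.getvalue()


# Hypothetical input with Unix line endings, rewritten with CR + LF.
crlf = with_terminator("a,b\n1,2\n", "\r\n")

print(repr(crlf))
```

Printing the repr makes the otherwise invisible carriage returns visible.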
-E, --skip-header
Do not write the header row to the output. To discard leading non-CSV lines (comments, notices, blank rows) that precede the header in the input, use -K N (--skip-lines N).
DESCRIPTION
The csvformat command is a utility from the csvkit suite designed to reformat Comma Separated Values (CSV) files. It allows users to control various aspects of the CSV dialect, such as the delimiter, quote character, quoting style, and line terminator. This is particularly useful when converting CSV files between different formats, standardizing disparate CSV inputs, or preparing data for systems that expect a specific CSV dialect.
For example, it can convert a tab-separated file to a comma-separated one, change how fields are quoted (e.g., quoting all fields, only non-numeric, or minimally), or adjust the character used for escaping. It reads data from standard input or specified files and writes the reformatted CSV to standard output.
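The read-then-rewrite flow described above can be approximated in a few lines with Python's standard csv module. This is a sketch under assumptions, not csvkit's code: the function reformat, its keyword names, and the sample data are all hypothetical.

```python
import csv
import io


def reformat(src, dst, *, in_delimiter=",", out_delimiter=",",
             out_quoting=csv.QUOTE_MINIMAL, out_lineterminator="\n"):
    """Stream rows from src to dst, parsing one dialect and writing another."""
    reader = csv.reader(src, delimiter=in_delimiter)
    writer = csv.writer(dst, delimiter=out_delimiter, quoting=out_quoting,
                        lineterminator=out_lineterminator)
    for row in reader:
        writer.writerow(row)


# Hypothetical tab-separated input whose second field contains a comma.
tsv = io.StringIO("city\tnote\nOslo\tcold, dry\n", newline="")
out = io.StringIO(newline="")

reformat(tsv, out, in_delimiter="\t")
result = out.getvalue()

print(result)
```

Because the rows are parsed into fields before being written, a value that was unambiguous in the tab-separated input is quoted automatically once the comma becomes the delimiter.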
CAVEATS
csvformat is part of the csvkit suite, written in Python, and thus requires a Python environment to be installed. While highly versatile, it expects reasonably well-formed CSV input. Extremely malformed files might still require pre-processing with other tools or manual cleaning. The default quoting style is minimal (-U 0), so systems that expect every field, or every non-numeric field, to be quoted need -U 1 or -U 2 set explicitly.
HISTORY
csvformat is part of the csvkit project, an open-source suite of command-line tools for working with CSV data. csvkit was originally created by Christopher Groskopf and first released in the early 2010s. It gained traction within the data science, journalism, and analysis communities due to its simplicity, Unix-like philosophy (do one thing well), and Python-based extensibility. Within the suite, csvformat addresses the common need to convert CSV files between dialects.