LinuxCommandLibrary

csvformat

Reformat CSV files to a consistent standard

TLDR

Convert to a tab-delimited file (TSV)

$ csvformat [[-T|--out-tabs]] [data.csv]

Convert delimiters to a custom character
$ csvformat [[-D|--out-delimiter]] "[custom_character]" [data.csv]

Convert line endings to carriage return (^M) + line feed
$ csvformat [[-M|--out-lineterminator]] "[\r\n]" [data.csv]

Minimize use of quote characters
$ csvformat [[-U|--out-quoting]] 0 [data.csv]

Maximize use of quote characters
$ csvformat [[-U|--out-quoting]] 1 [data.csv]

SYNOPSIS

csvformat [OPTION...] [FILE]

PARAMETERS

-D, --out-delimiter=DELIM
    Output delimiter character. Default: ,

-T, --out-tabs
    Tab-delimited output. Overrides -D

-Q, --out-quotechar=CHAR
    Output quote character. Default: "

-U, --out-quoting=N
    Output quoting style: 0 = minimal (default), 1 = all fields, 2 = non-numeric fields, 3 = none

-B, --out-no-doublequote
    Escape quote characters inside fields with the escape character instead of doubling them

-P, --out-escapechar=CHAR
    Output escape character, used with -U 3 and with -B

-M, --out-lineterminator=STR
    Output line terminator

-E, --skip-header
    Do not output a header row

-d, --delimiter=DELIM
    Input delimiter character. Default: ,

-t, --tabs
    Tab-delimited input

-q, --quotechar=CHAR
    Input quote character. Default: "

-u, --quoting=N
    Input quoting style (0-3, same values as -U)

-b, --no-doublequote
    Input escapes quote characters instead of doubling them

-p, --escapechar=CHAR
    Input escape character

-z, --maxfieldsize=N
    Maximum length of a single input field

-e, --encoding=ENC
    Input character encoding

-S, --skipinitialspace
    Ignore whitespace immediately after the input delimiter

-H, --no-header-row
    Treat first row as data

-K, --skip-lines=N
    Skip first N lines of input

DESCRIPTION

csvformat is a command-line utility from the csvkit suite for standardizing and converting CSV (comma-separated values) data. It reads CSV from a file or stdin and writes it back out with a different delimiter, quote character, quoting style, escape character, or line terminator. It is well suited to cleaning up inconsistent CSV data, such as converting pipe-delimited files to comma-delimited ones or making quoting consistent.

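Output options set a custom delimiter (e.g., tab or semicolon), the quoting style (minimal, all fields, non-numeric fields, or none), the quote and escape characters, and the line terminator. Input options describe files that are not standard CSV (different delimiter, quote character, quoting, or encoding), skip leading lines, or treat the first row as data. The header row is passed through by default and can be dropped with -E. csvformat changes only the CSV syntax; it does not reformat values such as dates or change their case.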

Common use cases include normalizing data exported from databases or spreadsheets, preparing files for import into other tools, and batch-processing many CSV files. Output goes to stdout unless redirected. Input is processed row by row, so large files stream through without being loaded into memory. However, any field longer than the CSV reader's field size limit is rejected unless the limit is raised with -z.

Like the rest of csvkit, it is written in Python, runs cross-platform, and fits naturally into pipelines with tools such as csvcut and csvgrep.

CAVEATS

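Sequential streaming only; no random access. Does not parse or convert values: dates, numbers, and letter case pass through unchanged. Fields longer than the field size limit cause an error unless the limit is raised with -z. With -U 3, fields containing the delimiter or quote character need an escape character (-P); otherwise csvformat fails. Not for binary data.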

EXAMPLE

csvformat -d '|' -U 1 data.txt > formatted.csv
Converts a pipe-delimited file to comma-delimited CSV with every field quoted.

PIPING

csvcut -c 1,3 input.csv | csvformat -U 3 > noquotes.csv
Selects columns 1 and 3 and writes them without quote characters. Add -P if any field contains the delimiter.

HISTORY

Developed by Christopher Groskopf as part of the csvkit suite, first released around 2010. Has gone through many releases since and is now at 2.x under new maintainers. Native Python 3 support since 1.0. Widely used in data journalism and ETL pipelines.

SEE ALSO

csvcut(1), csvlook(1), csvgrep(1), csvkit(1), awk(1), sed(1), cut(1)
