LinuxCommandLibrary

csvcut

Extract columns from CSV files

TLDR

Print indices and names of all columns

$ csvcut [[-n|--names]] [data.csv]

Extract the first and third columns
$ csvcut [[-c|--columns]] [1,3] [data.csv]

Extract all columns except the fourth one
$ csvcut [[-C|--not-columns]] [4] [data.csv]

Extract the columns named "id" and "first name" (in that order)
$ csvcut [[-c|--columns]] [id,"first name"] [data.csv]

SYNOPSIS

csvcut [OPTIONS] [FILE]

PARAMETERS

-c, --columns COLUMN[,COLUMN,...]
    A comma-separated list of column names, 1-based column indices, or ranges to extract.
    Columns are output in the order given.
    E.g. -c 1,3 or -c firstname,lastname or -c 1,id,3-5

-C, --not-columns COLUMN[,COLUMN,...]
    A comma-separated list of column names, 1-based column indices, or ranges to exclude.
    This is the inverse of -c.

-n, --names
    Display the index and name of each column, then exit. Useful for identifying column names before using -c or -C.

-d, --delimiter DELIMITER
    Delimiting character of the input file. Default is ','. E.g. -d ';'

-t, --tabs
    Specify that the input file is tab-delimited. Overrides -d.

--zero
    When interpreting or displaying column numbers (-c, -C, -n), use zero-based numbering instead of the default one-based numbering.



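-p, --escapechar ESCAPECHAR
    Character used to escape the delimiter when --quoting 3 (quote none) is in effect, and to escape the quote character when double quoting is disabled. No escape character is used by default.
    For example, to use a backslash: --escapechar '\'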

-u, --quoting {0,1,2,3}
    Quoting style used in the input file: 0 = quote minimal, 1 = quote all, 2 = quote non-numeric, 3 = quote none. Defaults to 0 (minimal).

-H, --no-header-row
    Indicate that the input file does not have a header row.

-y, --snifflimit SNIFFLIMIT
    Limit CSV dialect sniffing to this many bytes. Default is 1024. Specify 0 to disable sniffing entirely, or -1 to sniff the entire file.

-e, --encoding ENCODING
    Specify the encoding of the input file. E.g. --encoding utf-8

-l, --linenumbers
    Prepend line numbers to each row of output. This is for debugging purposes.

-I, --no-inference
    Disable type inference when parsing the input.

-V, --version
    Show program's version number and exit.

-h, --help
    Show program's help message and exit

DESCRIPTION

csvcut is a command-line utility that extracts columns from comma-separated values (CSV) files. It is part of csvkit, a suite of tools for working with CSV data. It selects columns by name or by position (index) and writes a new CSV containing only those columns, which makes it useful for data cleaning, for building subsets of larger datasets for analysis, and for isolating specific fields in scripts and pipelines.

By default, csvcut tries to detect the delimiter automatically; use -d (or -t for tab-delimited input) to set it explicitly. csvcut also handles quoting, so fields with embedded delimiters (such as commas inside quoted values) are parsed correctly. This makes it more robust than general text-processing tools like cut or awk for CSV data.

CAVEATS

csvcut may have difficulties with very large CSV files if you are low on memory. In that case, consider using tools designed for larger datasets.
If your CSV file doesn't conform to standard CSV format (e.g., inconsistent quoting), csvcut might not parse it correctly.
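Columns can be selected by name, by position, or by a mix of both (e.g. -c 1,id). However, an identifier made only of digits is always treated as a position, so a column whose name is a number (such as 2020) must be selected by its index.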

ERROR HANDLING

If csvcut encounters an error, such as an unknown column name or an out-of-range index, it prints a message to stderr and exits with a non-zero status. The shell has no try/catch blocks; instead, check the exit status (with if, ||, or set -e) and redirect stderr to a file if you want to keep the messages.

FILE INPUT

If no FILE is specified, csvcut reads from standard input (stdin), allowing it to be used in pipelines.

HISTORY

csvcut is part of csvkit, a suite of command-line tools built specifically for converting to and working with CSV data. The suite has been actively developed to handle a wide range of CSV formats and use cases.

SEE ALSO

csvlook(1), csvstat(1), csvgrep(1), cut(1), awk(1)
