csvcut
Extract columns from CSV files
TLDR
Print indices and names of all columns
csvcut -n data.csv
Extract the first and third columns
csvcut -c 1,3 data.csv
Extract all columns except the fourth one
csvcut -C 4 data.csv
Extract the columns named "id" and "first name" (in that order)
csvcut -c id,"first name" data.csv
SYNOPSIS
csvcut [-H] [-c COLUMNS | -C COLUMNS] [-n] [-d DELIMITER] [-q QUOTECHAR] [FILE]
PARAMETERS
-c COLUMNS, --columns COLUMNS
A comma-separated list of column names or 1-based indices to include in the output, in the order given. Names are usually clearer when the input has a header row; without one (-H), refer to columns by index.
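The following is a minimal Python sketch of what -c does, built on the standard csv module. It is not csvkit's implementation. The function names resolve and select_columns are hypothetical. The rule "all-digit identifiers are 1-based indices, anything else must match a header name exactly" is an assumption made for this illustration.

```python
import csv
import io

def resolve(identifiers, header):
    """Map comma-separated identifiers to 0-based Python positions.

    Assumed rule for this sketch: all-digit identifiers are 1-based
    indices; anything else must match a header name exactly
    (case-sensitive). Names containing commas are not supported here.
    """
    positions = []
    for ident in identifiers.split(","):
        if ident.isdigit():
            positions.append(int(ident) - 1)  # 1-based index -> 0-based
        else:
            positions.append(header.index(ident))  # ValueError if no match
    return positions

def select_columns(text, identifiers):
    rows = list(csv.reader(io.StringIO(text)))
    positions = resolve(identifiers, rows[0])
    # Output columns follow the order in which identifiers were given.
    return [[row[p] for p in positions] for row in rows]

data = "id,first name,last name\n1,Ada,Lovelace\n2,Alan,Turing\n"
print(select_columns(data, "last name,id"))
# [['last name', 'id'], ['Lovelace', '1'], ['Turing', '2']]
```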
-C COLUMNS, --not-columns COLUMNS
A comma-separated list of column names or 1-based indices to exclude from the output. All remaining columns are kept in their original order. This option is the inverse of -c.
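A short Python sketch of the exclusion behavior described for -C follows. It is an illustration using the standard csv module, not csvkit's code. The helper exclude_columns is hypothetical, and for brevity it accepts only 1-based positions, not names.

```python
import csv
import io

def exclude_columns(text, positions_1based):
    """Drop the given 1-based column positions; keep the rest in order."""
    rows = list(csv.reader(io.StringIO(text)))
    drop = {p - 1 for p in positions_1based}  # convert to 0-based
    return [[v for i, v in enumerate(row) if i not in drop] for row in rows]

data = "a,b,c,d,e\n1,2,3,4,5\n"
print(exclude_columns(data, [4]))
# [['a', 'b', 'c', 'e'], ['1', '2', '3', '5']]
```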
-n, --names
Prints the 1-based index and name of every column in the header row, then exits. This is useful for discovering the identifiers to pass to -c or -C.
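To show the kind of listing -n produces, here is a small Python sketch that numbers header names from 1. The function list_columns is hypothetical. The right-aligned "  1: name" layout only approximates csvcut's output and should not be relied on as its exact format.

```python
import csv
import io

def list_columns(text):
    """Return 'index: name' lines for the header row, numbered from 1."""
    header = next(csv.reader(io.StringIO(text)))
    # The 3-character padding is an assumption of this sketch.
    return [f"{i:3d}: {name}" for i, name in enumerate(header, start=1)]

for line in list_columns('id,"first name",score\n7,Grace,99\n'):
    print(line)
```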
-H, --no-header-row
Specifies that the input CSV file does not contain a header row, so its first line is treated as data. Columns are then referenced by their 1-based integer indices.
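The effect of -H on the first line can be sketched in Python as follows. This is an illustration only. The helper split_header and its no_header_row flag are hypothetical stand-ins for the option, and the sketch does not model what csvcut itself writes as an output header.

```python
import csv
import io

def split_header(text, no_header_row=False):
    """Return (header, data_rows) for a small CSV string.

    no_header_row=True plays the role of -H in this sketch: the first
    line is kept as data and there are no header names to match.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if no_header_row:
        return None, rows
    return rows[0], rows[1:]

text = "1,Ada,Lovelace\n2,Alan,Turing\n"
header, rows = split_header(text, no_header_row=True)
# With no names available, select the second column by 1-based position 2.
second = [row[2 - 1] for row in rows]
print(second)  # ['Ada', 'Alan']
```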
-d DELIMITER, --delimiter DELIMITER
Specifies the single-character field delimiter used in the input CSV file. The default delimiter is a comma (,).
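As a sketch of why the delimiter must be known before parsing, the Python snippet below reads semicolon-delimited input with the standard csv module. The helper reparse is hypothetical. Writing the rows back out comma-delimited is a choice made for this illustration.

```python
import csv
import io

def reparse(text, delimiter):
    """Parse text using a custom delimiter and re-emit comma-delimited CSV."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return rows, out.getvalue()

rows, output = reparse('id;city\n1;Paris\n2;"Rome; Italy"\n', ";")
print(rows)  # [['id', 'city'], ['1', 'Paris'], ['2', 'Rome; Italy']]
print(output, end="")
```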
-q QUOTECHAR, --quotechar QUOTECHAR
Specifies the single character used to quote fields in the input CSV file. The default quote character is a double quote (").
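The Python sketch below shows how the quote character changes parsing when fields are wrapped in single quotes. It uses the standard csv module's quotechar parameter as an analogue of -q. It is not a call into csvcut.

```python
import csv
import io

text = "name,title\n'Doe, Jane',Engineer\n"

# Default quote character ("): the apostrophes are ordinary characters,
# so the comma inside the name splits it into two fields.
default_rows = list(csv.reader(io.StringIO(text)))

# quotechar="'" plays the role of passing -q "'" for this input.
single_rows = list(csv.reader(io.StringIO(text), quotechar="'"))

print(default_rows[1])  # ["'Doe", " Jane'", 'Engineer']
print(single_rows[1])   # ['Doe, Jane', 'Engineer']
```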
FILE
The input CSV file. If it is omitted, csvcut reads CSV data from standard input (stdin).
DESCRIPTION
csvcut is a command-line utility from the csvkit suite for selecting, reordering, and removing columns in CSV data. The traditional cut tool splits each line on a delimiter (or by byte or character position) without understanding quoting. csvcut instead parses the CSV format itself, so quoted fields that contain delimiters or line breaks, and header rows, are handled correctly. Columns can be selected by name or by index. Typical uses include data cleaning, trimming datasets before further analysis, and reordering columns for readability. csvcut reads from a file or from standard input and writes the result to standard output, which makes it easy to use in shell pipelines.
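The difference between delimiter-only splitting and real CSV parsing can be sketched in Python with a single line of input. Running cut -d, -f2 on the same line would print the fragment "Smith. The snippet uses only the standard csv module and illustrates the parsing problem, not csvcut's code.

```python
import csv
import io

line = '42,"Smith, John",NYC\n'

# Splitting on every comma, as a delimiter-only tool does, breaks the quoted field.
naive = line.rstrip("\n").split(",")

# A CSV parser honors the quotes and keeps "Smith, John" as one field.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['42', '"Smith', ' John"', 'NYC']
print(parsed)  # ['42', 'Smith, John', 'NYC']
```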
CAVEATS
- csvcut expects well-formed CSV input; malformed data may lead to unexpected results or errors.
- Column selection by name is case-sensitive.
- With the -H (no header row) option, the first line is data, so there are no header names from the input to select by; reference columns by their 1-based integer index.
COLUMN INDEXING BEHAVIOR
When the input has a header row, referencing columns by name is usually clearer and more robust, because names do not shift if columns are added or reordered upstream. If the -H or --no-header-row option is used, csvcut treats the first line of the input as data, and columns are then selected by 1-based integer index ('1' for the first column, '2' for the second, and so on). These are the same numbers that -n prints.
INPUT AND OUTPUT STREAMS
csvcut is designed for use in shell pipelines. If no input file is given as an argument, it reads CSV data from standard input (stdin). All output, whether selected, excluded, or reordered columns, is written to standard output (stdout), so it can be piped to other command-line utilities or redirected to a new file.
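The stdin-to-stdout filter pattern can be sketched in Python as a function that reads CSV rows from one stream and writes selected columns to another. The function cut_stream and its 1-based positions argument are hypothetical, and the sketch makes no claim about how csvcut buffers its input. The in-memory streams stand in for stdin and stdout.

```python
import csv
import io

def cut_stream(src, dst, positions_1based):
    """Copy selected columns (1-based positions) from src to dst row by row."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:  # handles one row at a time instead of loading everything
        writer.writerow([row[p - 1] for p in positions_1based])

src = io.StringIO('id,name,score\n1,"Lee, Ann",90\n')
dst = io.StringIO()
cut_stream(src, dst, [2, 3])
print(dst.getvalue(), end="")
# name,score
# "Lee, Ann",90
```

To use it as a real filter, import sys and call cut_stream(sys.stdin, sys.stdout, [2, 3]). When reading files directly, open them with newline="" as the Python csv module documentation recommends.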
HISTORY
csvcut is a fundamental utility within the csvkit suite, which was developed by Christopher Groskopf. The motivation behind its creation was to overcome the limitations of traditional Unix tools (like cut) when processing structured CSV data, particularly regarding quoted fields and varying delimiters. It provides a robust, Python-based solution for common CSV manipulation tasks, making command-line CSV processing more reliable and user-friendly.