
csvtool

Manipulate CSV (Comma Separated Value) files

TLDR

Extract the second column from a CSV file

$ csvtool [[-c|--column]] [2] [path/to/file.csv]

Extract the second and fourth columns from a CSV file
$ csvtool [[-c|--column]] [2,4] [path/to/file.csv]

Extract lines from a CSV file where the second column exactly matches 'Foo'
$ csvtool [[-c|--column]] [2] [[-s|--search]] '[^Foo$]' [path/to/file.csv]

Extract lines from a CSV file where the second column starts with 'Bar'
$ csvtool [[-c|--column]] [2] [[-s|--search]] '[^Bar]' [path/to/file.csv]

Find lines in a CSV file where the second column ends with 'Baz' and then extract the third and sixth columns
$ csvtool [[-c|--column]] [2] [[-s|--search]] '[Baz$]' [path/to/file.csv] | csvtool [[-e|--no-header]] [[-c|--column]] [3,6]
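
The three search patterns above differ only in their anchors. As a rough illustration that does not require csvtool, the sketch below applies the same patterns with grep -E to a plain list of values. It assumes csvtool's search option treats ^ and $ the way common regular-expression engines do, which the examples above suggest but which should be confirmed against csvtool's own help output. The sample values are made up.

```shell
# Hypothetical sample values standing in for the contents of column 2.
values=$(printf '%s\n' 'Foo' 'Foobar' 'Bar' 'Barbaz' 'FooBaz' 'xFoo')

# ^Foo$ : the whole value must be exactly "Foo".
exact=$(printf '%s\n' "$values" | grep -E '^Foo$')

# ^Bar : the value must start with "Bar".
starts=$(printf '%s\n' "$values" | grep -E '^Bar')

# Baz$ : the value must end with "Baz" (matching is case-sensitive).
ends=$(printf '%s\n' "$values" | grep -E 'Baz$')

printf 'exact match:\n%s\n' "$exact"
printf 'starts with Bar:\n%s\n' "$starts"
printf 'ends with Baz:\n%s\n' "$ends"
```

Without the anchors, a pattern such as Foo would also match Foobar and xFoo, which is why the exact-match example uses both ^ and $.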

SYNOPSIS

csvtool subcommand [options] [input_file...]

PARAMETERS

-t char, --separator char
    Specify input field separator (default ',').

-T char, --output-separator char
    Specify output field separator (default ',').

-u, --unquoted
    Do not quote output fields.

-q char, --quote char
    Specify output quote character (default '"').

-U char, --unquoted-separator char
    Separate unquoted fields with specified character.

-n, --no-header
    Do not treat the first line as a header.

-H, --header
    Explicitly treat the first line as a header (default for some commands).

-v, --version
    Show version information and exit.

-h, --help
    Display help message and exit.

subcommand-specific-options
    Many subcommands accept additional options specific to their function (e.g., `-f`, `-r`, `-n`). Use `csvtool subcommand --help` for details.

DESCRIPTION

csvtool is a command-line utility for processing Comma Separated Values (CSV) files. Unlike general-purpose text tools such as awk or sed, it parses CSV as structured data and correctly handles quoted fields, alternative delimiters, and embedded newlines. Its operations include selecting and reordering columns, merging multiple CSV files, splitting large files, sorting, identifying unique records, and counting rows. Because it respects CSV quoting rules, it avoids the field-splitting errors that line-oriented tools make on quoted data.
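
To make the contrast with line-oriented tools concrete, the following sketch shows what happens when a quoted field containing a comma is split naively with cut and awk. It uses only standard utilities and made-up sample data; it does not show csvtool's output, only the failure mode that a CSV-aware parser is meant to avoid.

```shell
# Create a throwaway sample file (hypothetical data) with a quoted field
# that contains the separator character.
sample=$(mktemp)
printf '%s\n' 'id,name,city' '1,"Doe, Jane",Paris' > "$sample"

# Naive approach: split on every comma and take field 2 of the data row.
# The quoted name is cut at the embedded comma.
naive=$(cut -d, -f2 "$sample" | tail -n 1)
printf 'cut sees field 2 as: %s\n' "$naive"

# Counting comma-separated pieces per line shows the mismatch:
# the header has 3 fields, but the data row appears to have 4.
counts=$(awk -F, '{ printf "%s%s", (NR > 1 ? " " : ""), NF } END { print "" }' "$sample")
printf 'fields per line according to awk -F, : %s\n' "$counts"

rm -f "$sample"
```

A parser that honors CSV quoting would treat "Doe, Jane" as a single field, so both lines would have three columns.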

CAVEATS

csvtool is not installed by default on every Linux distribution and may need to be installed manually. For very large datasets or highly complex transformations, a scripting approach (e.g., Python with Pandas) or a database may offer better performance or flexibility. The 'select' subcommand uses its own expression language, so writing filters requires familiarity with that syntax.
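
Since availability varies, a script that depends on csvtool may want to check for it first. The sketch below is one minimal way to do that with the POSIX `command -v` builtin; the helper name have_cmd is hypothetical, and the package name to install is deliberately not given because it differs between distributions.

```shell
# Return success if the named command can be found on PATH.
# (have_cmd is an illustrative helper name, not part of csvtool.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd csvtool; then
    status="csvtool is available at $(command -v csvtool)"
else
    status="csvtool is not installed or not on PATH"
fi
printf '%s\n' "$status"
```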

KEY FEATURES

csvtool parses CSV directly, including quoted fields and varied delimiters, so fields are not split incorrectly. Its subcommands cover common data tasks such as column extraction, row filtering, sorting, and merging. For structured data it is a fast, reliable alternative to general-purpose text utilities in typical command-line use.

BASIC USAGE EXAMPLES

Below are some common usage examples for csvtool:

Extract columns 1 and 3 from a file
$ csvtool col 1,3 input.csv

Sort a CSV file by the second column numerically
$ csvtool sort -n 2 input.csv

Filter rows where the first column equals 'active'
$ csvtool select '{1} == "active"' input.csv

Concatenate multiple CSV files
$ csvtool cat file1.csv file2.csv > output.csv

HISTORY

csvtool was developed by Chris Double as a lightweight command-line alternative for common CSV manipulation tasks. In keeping with the Unix philosophy, its design prioritizes correct handling of CSV specifics such as quoted fields and delimiters. It is used for quick CSV processing on the command line without the need to set up a scripting language.

SEE ALSO

awk(1), sed(1), grep(1), cut(1), sort(1), uniq(1), mlr(1), q(1), csvkit(1)
