LinuxCommandLibrary

csvsort

Sort CSV (Comma Separated Values) files

TLDR

Sort a CSV file by column 9

$ csvsort [[-c|--columns]] [9] [data.csv]

Sort a CSV file by the "name" column in descending order
$ csvsort [[-r|--reverse]] [[-c|--columns]] [name] [data.csv]

Sort a CSV file by column 2, then by column 4
$ csvsort [[-c|--columns]] [2,4] [data.csv]

Sort a CSV file without inferring data types
$ csvsort [[-I|--no-inference]] [[-c|--columns]] [columns] [data.csv]

SYNOPSIS

csvsort [OPTIONS] [FILE]
csvsort [OPTIONS] < FILE

PARAMETERS

-h, --help
    Show help message and exit.

-d DELIMITER, --delimiter DELIMITER
    Specify the delimiter of the input CSV file.

-t, --tabs
    Indicate that the input CSV file is delimited with tabs. Overrides -d.

-q QUOTECHAR, --quotechar QUOTECHAR
    Specify the quote character of the input CSV file.

-u {0,1,2,3}, --quoting {0,1,2,3}
    Specify the quoting style used in the input CSV file: 0 = quote minimal, 1 = quote all, 2 = quote non-numeric, 3 = quote none.

-b, --no-doublequote
    Indicate that quote characters inside fields are not doubled in the input CSV file.

-p ESCAPECHAR, --escapechar ESCAPECHAR
    Specify the character used to escape the delimiter when --quoting 3 is given, and to escape the quote character when --no-doublequote is given.

-z FIELD_SIZE_LIMIT, --maxfieldsize FIELD_SIZE_LIMIT
    Specify the maximum length of a single field in the input CSV file.

-H, --no-header-row
    Indicate that the input CSV file has no header row. Default column names (a, b, c, ...) are generated.

-c COLUMNS, --columns COLUMNS
    A comma-separated list of column indices, names, or ranges to sort by (e.g., 1,id,3-5). Indices are 1-based. Defaults to all columns.

-r, --reverse
    Sort the data in descending order.

-L LOCALE, --locale LOCALE
    Specify the locale (e.g., en_US) of any formatted numbers in the input.

-I, --no-inference
    Disable type inference when parsing the input; all values are treated as text.

-i, --ignore-case
    Perform a case-insensitive sort.

--zero
    When interpreting column numbers, use zero-based numbering instead of the default 1-based numbering.
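
The effect of the -I (--no-inference) option from the TLDR section is easiest to see on a numeric column. The following is a minimal sketch, assuming csvkit is installed; the file name ids.csv and its contents are made up for illustration.

```shell
# Hypothetical sample data: an "id" column holding whole numbers.
printf 'id,label\n10,b\n9,a\n100,c\n' > ids.csv

# Default: the id column is inferred as numeric,
# so rows come out in numeric order (9, 10, 100).
csvsort -c id ids.csv

# With -I, every value stays text, so the same column
# is ordered character by character (10, 100, 9).
csvsort -I -c id ids.csv
```

Disabling inference is also useful when values such as ZIP codes or identifiers with leading zeros should pass through unchanged.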

DESCRIPTION

csvsort is a command-line utility from the csvkit suite for sorting CSV (Comma Separated Values) files by one or more columns. Unlike generic text sorting tools such as sort, it parses the input as CSV, so quoted fields, embedded delimiters, and the header row are handled correctly, and it accepts other dialects such as tab-separated values. By default it infers column data types, so numeric columns sort numerically rather than as text. It supports ascending and descending order and case-insensitive sorting. It reads from a file path or from standard input and writes the sorted data to standard output, which makes it suitable for data pipelines.

CAVEATS

csvsort requires the csvkit Python package to be installed on the system. The entire input is loaded into memory before sorting, so memory consumption can be significant for very large files. Performance is bounded by Python's CSV parsing and sorting speed.
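
A script that depends on csvsort can check for it before use. This is a minimal sketch; it assumes csvkit is installed through pip, which is one common route, and the wording of the message is only illustrative.

```shell
# Check whether csvsort is on PATH before relying on it.
if command -v csvsort >/dev/null 2>&1; then
    # Print the installed version.
    csvsort --version
else
    # Assumption: pip is the preferred installer on this system.
    echo "csvsort not found; install csvkit, e.g.: python3 -m pip install csvkit" >&2
fi
```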

COLUMN SPECIFICATION

Sort columns are selected with the -c option. Columns can be given by header name (e.g., -c 'City,Population') or by 1-based numerical index (e.g., -c '1,3'). The --zero option switches to 0-based indexing. When multiple columns are provided, csvsort performs a multi-level sort: rows are ordered by the first column, rows with identical values in the first column are ordered by the second, and so on.
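
The column selection forms described above can be tried on a small file. This is a minimal sketch, assuming csvkit is installed; cities.csv and its contents are invented for illustration.

```shell
# Hypothetical sample data with a header row.
cat > cities.csv <<'EOF'
City,Country,Population
Lyon,France,522000
Berlin,Germany,3645000
Paris,France,2161000
Munich,Germany,1472000
EOF

# Select the sort column by header name.
csvsort -c City cities.csv

# Select the sort column by 1-based index (column 3 is Population).
csvsort -c 3 cities.csv

# Multi-level sort: by Country first, then by City within each country.
csvsort -c Country,City cities.csv
```

In the multi-level case the French cities come first (Lyon, Paris), followed by the German ones (Berlin, Munich).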

STANDARD I/O SUPPORT

Following the Unix philosophy, csvsort reads from standard input (stdin) when no file is given and always writes to standard output (stdout). The output of another command can therefore be piped (|) into csvsort, and the sorted result can be piped to a further command or redirected to a file.
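
Both input styles can be sketched as follows, assuming csvkit is installed; scores.csv, by_name.csv, and the data are placeholders for illustration.

```shell
# Hypothetical sample data.
printf 'name,score\ncarol,72\nalice,91\nbob,85\n' > scores.csv

# Read from stdin via redirection and write the sorted result to a file.
csvsort -c name < scores.csv > by_name.csv

# Use csvsort in the middle of a pipeline: sort by score, highest first,
# then keep the header row plus the top two data rows.
cat scores.csv | csvsort -r -c score | head -n 3
```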

HISTORY

The csvsort command is a component of the open-source csvkit suite, a collection of Python-based command-line tools for working with CSV files. Initiated by Christopher Groskopf around 2012, csvkit aims to bridge the gap between simple shell tools and complex scripting for data analysis, offering CSV-aware equivalents to common Unix utilities like sort, cut, and grep. It has evolved through community contributions to become a widely used tool for data professionals.

SEE ALSO

sort(1), csvcut(1), csvjoin(1), csvstack(1), csvkit(1)
