csvsort
Sort CSV (Comma Separated Values) files
TLDR
Sort a CSV file by column 9
Sort a CSV file by the "name" column in descending order
Sort a CSV file by column 2, then by column 4
Sort a CSV file without inferring data types
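The tasks listed above could be run roughly as follows. This is only a sketch: data.csv, its column names, and its values are made-up sample data, created here so the commands have something to read.

```shell
# Hypothetical nine-column sample file (names and values are illustrative only).
cat > data.csv <<'EOF'
id,name,color,score,c5,c6,c7,c8,rank
1,bob,red,20,a,a,a,a,3
2,alice,blue,10,a,a,a,a,1
3,carol,red,30,a,a,a,a,2
EOF

# Sort by column 9
csvsort -c 9 data.csv

# Sort by the "name" column in descending order
csvsort -r -c name data.csv

# Sort by column 2, then by column 4
csvsort -c 2,4 data.csv

# Sort without inferring data types (values are compared as text)
csvsort -I -c 9 data.csv
```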
SYNOPSIS
csvsort [OPTIONS] [FILE]
csvsort [OPTIONS] < FILE
PARAMETERS
-h, --help
Show help message and exit.
-d DELIMITER, --delimiter DELIMITER
Specify the delimiter of the input CSV file.
-t, --tabs
Indicate that the input CSV file is delimited with tabs. Overrides -d.
-q QUOTECHAR, --quotechar QUOTECHAR
Specify the quote character of the input CSV file.
-u {0,1,2,3}, --quoting {0,1,2,3}
Specify the quoting style used in the input CSV file: 0 = QUOTE_MINIMAL, 1 = QUOTE_ALL, 2 = QUOTE_NONNUMERIC, 3 = QUOTE_NONE.
-b, --no-doublequote
Indicate that quote characters inside fields of the input CSV file are not doubled.
-p ESCAPECHAR, --escapechar ESCAPECHAR
Specify the character used to escape the delimiter when --quoting 3 is given, and to escape the quote character when --no-doublequote is given.
-z FIELD_SIZE_LIMIT, --maxfieldsize FIELD_SIZE_LIMIT
Specify the maximum length of a single field in the input CSV file.
-H, --no-header-row
Indicate that the input CSV file has no header row. Default column names (a, b, c, ...) are generated instead.
-c COLUMNS, --columns COLUMNS
A comma-separated list of column names, 1-based indices, or ranges to sort by (e.g., "1,id,3-5"). Defaults to all columns.
-r, --reverse
Sort the data in descending order.
-L LOCALE, --locale LOCALE
Specify the locale (e.g., en_US) used to parse formatted numbers in the input.
-I, --no-inference
Disable type inference when parsing the input, so that all values are treated as text.
-i, --ignore-case
Perform a case-insensitive sort.
--zero
When interpreting or displaying column numbers, use 0-based numbering instead of the default 1-based numbering.
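As a rough illustration of how the input-dialect options combine, the following sketch sorts a tab-separated file and a semicolon-delimited file that has no header row. The file names and contents are invented for the example.

```shell
# Hypothetical tab-separated file with a header row.
printf 'city\tpopulation\nParis\t2100000\nOslo\t700000\n' > cities.tsv

# -t reads tab-delimited input; the sorted result is written as comma-separated CSV.
csvsort -t -c population cities.tsv

# Hypothetical semicolon-delimited file without a header row.
printf 'Paris;2100000\nOslo;700000\n' > cities_noheader.csv

# -d sets the delimiter; --no-header-row marks the first line as data,
# so the sort column is addressed by position (here column 2).
csvsort -d ';' --no-header-row -c 2 cities_noheader.csv
```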
DESCRIPTION
csvsort is a command-line utility from the csvkit suite for sorting CSV (Comma Separated Values) files by one or more columns. It handles various CSV dialects, including tab-separated values. Unlike generic text sorting tools, csvsort parses the CSV structure, so quoted fields that contain delimiters or line breaks are treated as single values. By default it infers column types, so numeric columns sort numerically rather than as text. It supports ascending and descending order and case-insensitive sorting. It reads from a file path or standard input and writes the sorted data to standard output, which makes it suitable for data pipelines.
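The difference from a generic text sort can be seen with a field that contains a comma. The sketch below assumes a made-up file, people.csv, and compares csvsort with plain sort.

```shell
# Hypothetical file whose "name" values contain commas inside quotes.
printf 'name,city\n"Smith, John",Paris\n"Adams, Zoe",Berlin\n' > people.csv

# csvsort treats each quoted name as one field and orders the rows by city.
csvsort -c city people.csv

# Plain sort splits on every comma, so its second field is the tail of the
# name rather than the city, and the header line is sorted like any other row.
sort -t, -k2 people.csv
```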
CAVEATS
csvsort requires the csvkit Python package to be installed (for example, with pip install csvkit). The entire input is loaded into memory before sorting, so memory consumption can be significant for very large files. Performance depends on Python's CSV parsing and sorting speed.
COLUMN SPECIFICATION
Columns to sort by are given with the -c option, either by header name (e.g., -c 'City,Population') or by 1-based numerical index (e.g., -c '1,3'). The --zero option switches to 0-based indexing. When multiple columns are given, csvsort performs a multi-level sort: rows are ordered by the first column, ties are broken by the second, and so on.
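A minimal sketch of the three ways to name the same sort columns, using an invented towns.csv file. Each command is expected to produce the same ordering: by Country first, then by Population within each country.

```shell
# Hypothetical sample data.
cat > towns.csv <<'EOF'
City,Country,Population
Lyon,France,520000
Bergen,Norway,290000
Paris,France,2100000
Oslo,Norway,700000
EOF

# By header name.
csvsort -c Country,Population towns.csv

# By 1-based index (Country is column 2, Population is column 3).
csvsort -c 2,3 towns.csv

# By 0-based index: with --zero the same columns are numbered 1 and 2.
csvsort --zero -c 1,2 towns.csv
```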
STANDARD I/O SUPPORT
Adhering to the Unix philosophy, csvsort is designed to work seamlessly with standard input (stdin) and standard output (stdout). This enables powerful data processing pipelines, where the output of one command can be directly fed as input to csvsort using pipes (|), and its sorted output can then be directed to another command or redirected to a file.
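A pipeline along these lines might look as follows. The file names and data are hypothetical, and grep stands in for any upstream command that emits CSV.

```shell
# Hypothetical input file.
printf 'name,score\nbob,20\nalice,10\ncarol,30\n' > scores.csv

# Drop one row with grep, pipe the rest into csvsort on stdin,
# sort by score in descending order, and redirect stdout to a file.
grep -v '^bob,' scores.csv | csvsort -r -c score > sorted_scores.csv
```

Because no FILE argument is given, csvsort reads the piped data from standard input.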
HISTORY
The csvsort command is a component of the open-source csvkit suite, a collection of Python-based command-line tools for working with CSV files. Initiated by Christopher Groskopf around 2012, csvkit aims to bridge the gap between simple shell tools and complex scripting for data analysis, offering CSV-aware equivalents to common Unix utilities like sort, cut, and grep. It has evolved through community contributions to become a widely used tool for data professionals.