LinuxCommandLibrary

xsv

Slice, index, filter, and join CSV data

TLDR

Inspect the headers of a file

$ xsv headers [path/to/file.csv]

Count the number of entries
$ xsv count [path/to/file.csv]

Get an overview of the shape of entries
$ xsv stats [path/to/file.csv] | xsv table

Select a few columns
$ xsv select [column1,column2] [path/to/file.csv]

Show 10 random entries
$ xsv sample [10] [path/to/file.csv]

Join a column from one file to another
$ xsv join --no-case [column1] [path/to/file1.csv] [column2] [path/to/file2.csv] | xsv table

SYNOPSIS

xsv <subcommand> [options] [arguments]

Common subcommands and their basic syntax:
xsv cat [--no-headers] {rows|columns} [file...]
xsv count [--delimiter <char>] [file]
xsv flatten [--delimiter <char>] [file]
xsv headers [--delimiter <char>] [file...]
xsv index [--delimiter <char>] [file]
xsv search [--delimiter <char>] {regex} [file]
xsv select [--delimiter <char>] {selection} [file]
xsv sort [--delimiter <char>] [--select <selection>] [file]
xsv stats [--delimiter <char>] [--everything] [file]
xsv table [--delimiter <char>] [file]

Note: This is a simplified representation. Each subcommand has its own specific options and arguments.

PARAMETERS

--help
    Displays general help information or help for a specific subcommand (e.g., `xsv subcommand --help`).

--version
    Shows the `xsv` version information.

-d <char>, --delimiter <char>
    Specifies the field delimiter character to use (e.g., `,`, `\t`, `|`). Defaults to a comma.

-j <n>, --jobs <n>
    Sets the number of parallel jobs (threads) to run for operations that can be parallelized.

-n, --no-headers
    Treats the first row of data as a regular row rather than a header row.

-o <file>, --output <file>
    Writes the output of the command to <file> instead of standard output.

-s <selection>, --select <selection>
    Used by some subcommands to select columns by name or 1-based index (e.g., `Col1,Col2`, `1-3,5`).

--sniff-limit
    Limits the number of bytes to sniff when inferring the delimiter and whether or not there are headers.
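The flags above combine freely on most subcommands. A minimal sketch of reading a tab-separated file and redirecting the result (file and column names here are hypothetical):

```shell
# Create a small tab-separated sample file (hypothetical data).
printf 'id\tname\n1\talice\n2\tbob\n' > /tmp/data.tsv

# Read with a tab delimiter, keep only the name column,
# and write the result to a file instead of standard output.
xsv select -d '\t' -o /tmp/names.csv name /tmp/data.tsv
```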

DESCRIPTION

`xsv` is a modern, high-performance command-line utility for working with CSV (Comma Separated Values) data.
Written in Rust, it emphasizes speed and efficiency, making it particularly suitable for processing large datasets.
It provides a comprehensive suite of subcommands to perform common CSV manipulation tasks directly from the terminal, including selection, filtering, sorting, joining, indexing, and analysis.
`xsv` aims to be a fast and reliable alternative to traditional scripting with `awk`, `cut`, or `grep` when dealing specifically with tabular data in CSV format.
Its capabilities extend from basic viewing to complex transformations, making it an invaluable tool for data scientists, developers, and anyone regularly handling structured text files.

CAVEATS

While `xsv` is generally very efficient, some operations, especially sorting or joining large datasets, can consume significant memory or disk space.
Building an index with `xsv index` takes some time up front but greatly speeds up subsequent row-access operations such as `slice` and `count`, and allows subcommands like `stats` to run in parallel.
`xsv` expects well-formed CSV input; malformed files may produce errors or unexpected results, although its error handling for common issues is robust.
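A sketch of the indexing workflow (paths hypothetical). `xsv index` writes a sidecar `.idx` file next to the CSV that later commands detect automatically:

```shell
# Build a one-time index next to the CSV (creates big.csv.idx).
xsv index /tmp/big.csv

# With the index present, row counting and random-access slicing
# no longer require a full scan of the file.
xsv count /tmp/big.csv
xsv slice -s 100 -l 10 /tmp/big.csv | xsv table
```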

SUBCOMMANDS OVERVIEW

`xsv` is designed around a subcommand model, where each subcommand performs a specific task.
This modular approach allows users to chain operations using standard Unix pipes (`|`), creating powerful data processing pipelines.
Common subcommands include `select` (choose columns), `search` (filter rows matching a regex), `sort` (sort rows), `join` (merge two CSV files), `index` (create an index for faster lookups), and `stats` (calculate statistics).
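As a sketch of such a pipeline (file, column, and pattern are hypothetical): keep two columns, retain only rows whose city column matches a regex, sort by name, and pretty-print:

```shell
xsv select name,city /tmp/people.csv \
  | xsv search -s city 'Berlin' \
  | xsv sort -s name \
  | xsv table
```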

PERFORMANCE FOCUS

A core design principle of `xsv` is performance.
By being written in Rust, it achieves remarkable speed, often outperforming tools written in other languages for large datasets.
This makes `xsv` an excellent choice for batch processing or integrating into scripts where speed is critical.

HISTORY

`xsv` was developed by BurntSushi (Andrew Gallant), a prominent Rust developer, with a strong focus on performance and correctness.
It was created to provide a robust and extremely fast alternative to existing CSV processing tools, leveraging Rust's memory safety and concurrency features.
First released around 2015, it was driven by the need for efficient handling of large CSV files and quickly gained popularity in the data science and bioinformatics communities for its speed and reliability.

SEE ALSO

csvkit (Python-based CSV utilities), awk(1) (pattern scanning and processing language), cut(1) (remove sections from each line of files), grep(1) (print lines matching a pattern), sed(1) (stream editor for filtering and transforming text), mlr(1) (Miller, like sed/awk/cut/join/sort for name-indexed data)
