xsv
Slice, index, filter, and join CSV data
TLDR
Inspect the headers of a file
Count the number of entries
Get an overview of the shape of entries
Select a few columns
Show 10 random entries
Join a column from one file to another
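Typical invocations, in the same order as above (file and column names are placeholders):
xsv headers file.csv
xsv count file.csv
xsv stats file.csv | xsv table
xsv select column_a,column_b file.csv
xsv sample 10 file.csv
xsv join column_a file_a.csv column_b file_b.csv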
SYNOPSIS
xsv <subcommand> [options] [arguments]
Common Subcommands and their basic syntax:
xsv cat {rows|columns} [--no-headers] [file...]
xsv count [--delimiter <char>] [file]
xsv flatten [--delimiter <char>] [file]
xsv headers [--delimiter <char>] [file...]
xsv index [--delimiter <char>] <file>
xsv join [--delimiter <char>] <columns1> <file1> <columns2> <file2>
xsv sample [--delimiter <char>] <sample-size> [file]
xsv search [--delimiter <char>] <regex> [file]
xsv select [--delimiter <char>] <selection> [file]
xsv sort [--delimiter <char>] [file]
xsv stats [--delimiter <char>] [file]
Note: This is a simplified representation. Each subcommand has its own specific options and arguments.
PARAMETERS
--help
Displays general help information or help for a specific subcommand (e.g., `xsv subcommand --help`).
--version
Shows the `xsv` version information.
-d, --delimiter <char>
Specifies the field delimiter character to use (e.g., `,`, `\t`, `|`). Defaults to comma.
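For example, listing the headers of a semicolon-delimited file (data.csv is a placeholder name):
xsv headers -d ';' data.csv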
-j, --jobs <n>
Sets the number of parallel jobs (threads) to run for operations that can be parallelized.
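For example, computing statistics with four worker threads (big.csv is a placeholder name):
xsv stats -j 4 big.csv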
-n, --no-headers
Treats the first row of data as a regular row rather than a header row.
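For example, selecting columns by position from a file without a header row (data.csv is a placeholder name):
xsv select -n 1,3 data.csv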
-o, --output <file>
Writes the output of the command to the specified `file` instead of standard output.
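For example, writing sorted output to a new file (both file names are placeholders):
xsv sort -o sorted.csv data.csv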
-s, --select <selection>
Used by some subcommands to select columns by name or 1-based index (e.g., `Col1,Col2`, `1-3,5`).
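For example, sorting on a single named column (the column and file names are placeholders):
xsv sort -s country data.csv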
--sniff-limit
Limits the number of bytes to sniff when inferring the delimiter and whether or not there are headers.
DESCRIPTION
`xsv` is a modern, high-performance command-line utility for working with CSV (Comma Separated Values) data.
Written in Rust, it emphasizes speed and efficiency, making it particularly suitable for processing large datasets.
It provides a comprehensive suite of subcommands to perform common CSV manipulation tasks directly from the terminal, including selection, filtering, sorting, joining, indexing, and analysis.
`xsv` aims to be a fast and reliable alternative to traditional scripting with `awk`, `cut`, or `grep` when dealing specifically with tabular data in CSV format.
Its capabilities extend from basic viewing to complex transformations, making it an invaluable tool for data scientists, developers, and anyone regularly handling structured text files.
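A small illustration of the difference (data.csv is a placeholder name): `cut` splits on every comma, including commas inside quoted fields, while `xsv` honors CSV quoting.
cut -d, -f2 data.csv     # naive; mis-splits quoted fields such as "Doe, Jane"
xsv select 2 data.csv    # CSV-aware; quoted fields stay intact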
CAVEATS
While `xsv` is generally very efficient, some operations, especially those involving sorting or joining large datasets without prior indexing, might consume significant memory or disk space.
Indexing large files with `xsv index` can be resource-intensive initially but greatly speeds up subsequent operations such as `slice` and `count` (see the example after this paragraph).
It expects well-formed CSV input: malformed files may produce errors or surprising output, although common problems are reported clearly and `xsv fixlengths` can repair rows with inconsistent field counts.
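For example, indexing once pays off across later commands (big.csv is a placeholder name; `xsv index` writes big.csv.idx next to the input):
xsv index big.csv
xsv count big.csv             # answered from the index, near-instant
xsv slice -i 500000 big.csv   # jump directly to one record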
SUBCOMMANDS OVERVIEW
`xsv` is designed around a subcommand model, where each subcommand performs a specific task.
This modular approach allows users to chain operations using standard Unix pipes (`|`), creating powerful data processing pipelines.
Common subcommands include `select` (choose columns), `search` (filter rows matching a regular expression), `sort` (sort rows), `join` (merge two CSV files on key columns), `index` (create an index for faster lookups), and `stats` (compute summary statistics).
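A sketch of such a pipeline (the column names and search pattern are placeholders):
xsv select name,city data.csv | xsv search -s city 'Berlin' | xsv sort -s name | xsv table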
PERFORMANCE FOCUS
A core design principle of `xsv` is performance.
Written in Rust, it achieves remarkable speed, often outperforming CSV tools written in interpreted languages on large datasets.
This makes `xsv` an excellent choice for batch processing or integrating into scripts where speed is critical.
HISTORY
`xsv` was developed by BurntSushi (Andrew Gallant), a prominent Rust developer, with a strong focus on performance and correctness.
It was created to provide a robust and extremely fast alternative to existing CSV processing tools, leveraging Rust's memory safety and concurrency features.
First released around 2015, it was built to handle large CSV files efficiently and quickly gained popularity in the data science and bioinformatics communities for its speed and reliability.
SEE ALSO
csvkit (Python-based CSV utilities), awk(1) (pattern scanning and processing language), cut(1) (remove sections from each line of files), grep(1) (print lines matching a pattern), sed(1) (stream editor for filtering and transforming text), mlr(1) (Miller, like sed/awk/cut/join/sort for name-indexed data)