csvstat
Summarize and analyze CSV data
TLDR
Show all stats for all columns
Show all stats for columns 2 and 4
Show sums for all columns
Show the max value length for column 3
Show the number of unique values in the "name" column
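The tasks above can be sketched as the following commands. The file data.csv and its columns are hypothetical sample data created only for this illustration; column numbers are 1-based, and a column can also be selected by its header name.

```shell
# Hypothetical sample file; any CSV with at least four columns and a "name" column works.
cat > data.csv <<'EOF'
id,name,city,score
1,Ann,Oslo,9.5
2,Bob,Lima,7
3,Ann,Oslo,8.25
EOF

# Show all stats for all columns
csvstat data.csv

# Show all stats for columns 2 and 4
csvstat -c 2,4 data.csv

# Show sums for all columns
csvstat --sum data.csv

# Show the max value length for column 3
csvstat -c 3 --len data.csv

# Show the number of unique values in the "name" column
csvstat -c name --unique data.csv
```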
SYNOPSIS
csvstat [OPTION...] [FILE]
PARAMETERS
-d DELIM, --delimiter DELIM
Field delimiter (default: comma)
-t, --tabs
Use tab as delimiter
-q Q, --quotechar Q
Quote character (default: ")
-p E, --escapechar E
Character used to escape the delimiter when quoting is disabled, or the quote character when doubling is disabled
-z N, --maxfieldsize N
Maximum length of a single field in the input
-u {0,1,2,3}, --quoting {0,1,2,3}
Quoting style: 0 = quote minimal, 1 = quote all, 2 = quote non-numeric, 3 = quote none
--blanks
Do not convert "", "na", "n/a", "none", "null", or "." to NULL
--null-value NULL
Additional string to treat as NULL; may be given more than once
-S, --skipinitialspace
Ignore whitespace immediately following the delimiter
-y N, --snifflimit N
Limit CSV dialect sniffing to the first N bytes (0 disables sniffing)
-I, --no-inference
Disable type inference when parsing the input
-H, --no-header-row
Input has no header row; treat the first row as data and name columns a, b, c, ...
-c COLS, --columns COLS
Comma-separated list of column indices, names, or ranges to analyze (default: all columns)
--freq
Only output lists of the most frequent values
--count
Only output the total row count
--min
Show minimum values
--max
Show maximum values
--mean
Show mean values
--median
Show median values
--sum
Show sum of values
--stdev
Show standard deviation
--len
Show the length of the longest value
--type
Show inferred types
--unique
Show unique value counts
DESCRIPTION
csvstat is a command-line utility from the csvkit suite for analyzing CSV files. It computes descriptive statistics for each column: inferred data type, null and unique value counts, minimum and maximum, sum, mean, median, standard deviation, longest value length, and most common values, plus the total row count. By default, it processes all columns and prints a per-column report.
Suited to data exploration, quality assessment, and quick summaries, it can be restricted to selected columns (-c) or to a single statistic (for example --sum), and it accepts parsing options such as delimiters and quoting styles. Input can come from a file or stdin, so it works in pipelines with tools like csvcut. csvstat reads the entire input before reporting, so for large datasets it is usually quicker to feed it a subset, for example the output of head.
Unlike spreadsheet software, csvstat excels in automation and scripting, providing reproducible insights directly in the terminal.
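As a minimal sketch of the file and stdin input modes and the parsing options described above, the commands below use a hypothetical semicolon-delimited file, sales.csv, created only for this illustration.

```shell
# Hypothetical semicolon-delimited sample, created only for this sketch.
cat > sales.csv <<'EOF'
region;units
north;12
south;7
EOF

# Read from a file, overriding the field delimiter.
csvstat -d ';' sales.csv

# Read from stdin: with no FILE argument, csvstat reads standard input.
# Here the data is converted to tab-separated form first and parsed with -t.
tr ';' '\t' < sales.csv | csvstat -t --mean

# Summarize only a subset of a larger file by piping in its first rows
# (header line plus two data rows).
head -n 3 sales.csv | csvstat -d ';' --count
```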
CAVEATS
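Type inference is heuristic and can misclassify columns with mixed or ambiguous values (for example, numeric codes with leading zeros); the whole input is read into memory, so very large files are slow to process; input must be well-formed CSV.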
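The type inference caveat can be illustrated with a column of codes that look numeric. This is a sketch only: zips.csv is a hypothetical sample file, and it assumes the installed csvkit version provides the -I (--no-inference) option.

```shell
# Hypothetical sample: ZIP codes with leading zeros that should stay text.
cat > zips.csv <<'EOF'
zip,city
02134,Boston
10001,New York
EOF

# With inference enabled, the zip column will typically be reported as a number.
csvstat --type zips.csv

# With inference disabled, every column is read as text.
csvstat -I --type zips.csv
```

Comparing the two reports is a quick way to spot columns whose inferred type does not match their meaning before relying on numeric statistics such as --sum or --mean.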
BASIC USAGE
csvstat data.csv — Full column stats.
csvstat -c 1,2 data.csv — Stats for columns 1 and 2.
PIPING EXAMPLE
csvcut -c name,age data.csv | csvstat --freq — Frequencies after column selection.
HISTORY
Part of csvkit, created by Christopher Groskopf around 2011. It grew out of data journalism work and is now at version 2.x with Python 3 support.


