LinuxCommandLibrary

csvstat

Summarize and analyze CSV data

TLDR

Show all stats for all columns

$ csvstat [data.csv]

Show all stats for columns 2 and 4
$ csvstat [[-c|--columns]] [2,4] [data.csv]

Show sums for all columns
$ csvstat --sum [data.csv]

Show the max value length for column 3
$ csvstat [[-c|--columns]] [3] --len [data.csv]

Show the number of unique values in the "name" column
$ csvstat [[-c|--columns]] [name] --unique [data.csv]

SYNOPSIS

csvstat [OPTION...] [FILE]

PARAMETERS

-d DELIM, --delimiter DELIM
    Field delimiter (default: comma)

-t, --tabs
    Use tab as delimiter

--lb L, --line-breaks L
    Custom line break sequence

-q Q, --quotechar Q
    Quote character (default: ")

-p E, --escapechar E
    Character used to escape the delimiter or the quote character

-z N, --maxfieldsize N
    Maximum length of a single field in the input

-u {0,1,2,3}, --quoting {0,1,2,3}
    Quoting style: 0 quote minimal, 1 quote all, 2 quote non-numeric, 3 quote none

--blanks
    Do not convert "", "na", "n/a", "none", "null", "." to NULL

--null-value NULL [NULL ...]
    Convert the given value(s) to NULL

-S, --skipinitialspace
    Ignore whitespace immediately following the delimiter

-y N, --snifflimit N
    Limit CSV dialect sniffing to N bytes (0 disables sniffing)

-I, --no-inference
    Disable type inference when parsing the input

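-H, --no-header-row
    Input has no header row; default column names (a, b, c, ...) are generated

-c COLS, --columns COLS
    Comma-separated list of column indices, names, or ranges to analyze (default: all columns)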

--freq
    Show only lists of the most frequent values

--count
    Show only the total row count

--min
    Show minimum values

--max
    Show maximum values

--mean
    Show mean values

--median
    Show median values

--sum
    Show sum of values

--stdev
    Show standard deviation

--len
    Show the length of the longest value

--type
    Show inferred types

--unique
    Show unique value counts
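
Parsing options and statistic options can be combined in a single call. The following is a minimal sketch, assuming a small tab-separated file named people.tsv (a hypothetical sample created here) that has no header row. It is guarded so that it only calls csvstat when csvkit is installed.

```shell
# Build a small tab-separated file with no header row (hypothetical sample data).
printf 'alice\t30\nbob\t25\ncarol\t41\n' > people.tsv

# -t reads tabs as the delimiter; -H says the input has no header row,
# so csvkit assigns default column names. -c 2 restricts the report to
# the second column, and --mean asks for that one statistic only.
if command -v csvstat >/dev/null 2>&1; then
    csvstat -t -H -c 2 --mean people.tsv
else
    echo "csvstat not found; install csvkit to run this example" >&2
fi
```

Column positions passed to -c are counted from 1, so -c 2 refers to the ages in this sample.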

DESCRIPTION

csvstat is a command-line utility from the csvkit suite that prints descriptive statistics for each column of a CSV file: inferred data type, null counts, unique value counts, minimum and maximum, sum, mean, median, standard deviation, longest value length, and most frequent values, along with the total row count. By default, it processes all columns and prints a plain-text report with one block per column.

Useful for data exploration, quality assessment, and quick summaries, it can be narrowed to specific columns (-c) or to a single statistic (for example --sum), and it accepts parsing options such as delimiters and quoting styles. Input can come from a file or from stdin, so it fits into pipelines with tools like csvcut. csvstat reads the whole input before reporting, so for large datasets it helps to cut the input down first, for example with head.

Unlike spreadsheet software, csvstat excels in automation and scripting, providing reproducible insights directly in the terminal.
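
As a sketch of scripted use, the snippet below captures one statistic in a shell variable. It assumes a sample file data.csv with name and age columns (hypothetical data created here), and relies on csvstat printing just the value when exactly one column and one statistic are selected.

```shell
# Hypothetical sample data with a header row.
printf 'name,age\nalice,30\nbob,25\ncarol,41\n' > data.csv

if command -v csvstat >/dev/null 2>&1; then
    # One column (-c age) plus one statistic (--max) is expected to
    # yield a single value that can be captured directly.
    max_age=$(csvstat -c age --max data.csv)
    echo "Oldest age in data.csv: $max_age"
else
    echo "csvstat not found; install csvkit to run this example" >&2
fi
```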

CAVEATS

Type inference is heuristic and may misclassify columns with mixed data. The entire input is loaded into memory, so very large files can be slow or exhaust RAM. Input must be well-formed CSV.
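
One way to keep a large input manageable is to truncate it before csvstat sees it. This is a minimal sketch, assuming a file big.csv (hypothetical, generated here) with a single header line and no newlines embedded in quoted fields. Statistics computed this way describe only the first rows, not the whole file.

```shell
# Generate a hypothetical large file: a header plus 1000 data rows.
{ echo 'id,value'; seq 1 1000 | awk '{ print $1 "," ($1 * 2) }'; } > big.csv

# Keep the header line and the first 100 data rows.
head -n 101 big.csv > sample.csv

if command -v csvstat >/dev/null 2>&1; then
    csvstat sample.csv
else
    echo "csvstat not found; install csvkit to run this example" >&2
fi
```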

BASIC USAGE

csvstat data.csv — Full column stats.
csvstat -c 1,2 data.csv — Stats for columns 1 and 2.

PIPING EXAMPLE

csvcut -c name,age data.csv | csvstat --freq — Frequencies after column selection.

HISTORY

Part of csvkit, created by Christopher Groskopf around 2011. It grew out of data journalism work and is now at version 2.x, running on Python 3.

SEE ALSO

csvlook(1), csvcut(1), csvkit(1), awk(1)
