csvstat

Summarize and analyze CSV data

TLDR

Show all stats for all columns

$ csvstat [data.csv]

Show all stats for columns 2 and 4

$ csvstat [[-c|--columns]] [2,4] [data.csv]

Show sums for all columns

$ csvstat --sum [data.csv]

Show the max value length for column 3

$ csvstat [[-c|--columns]] [3] --len [data.csv]

Show the number of unique values in the "name" column

$ csvstat [[-c|--columns]] [name] --unique [data.csv]

PARAMETERS

-d DELIM, --delimiter DELIM
    Field delimiter (default: comma)

-t, --tabs
    Use tab as delimiter

--lb L, --line-breaks L
    Custom line break sequence

-q Q, --quotechar Q
    Quote character (default: ")

--escapechar E
    Escape character

-z N, --maxfieldsize N
    Maximum length of a single field in the input

-u {0,1,2,3}, --quoting {0,1,2,3}
    Quoting style of the input: 0 = quote minimal, 1 = quote all, 2 = quote non-numeric, 3 = quote none

--blanks
    Treat blanks as empty, not NULL

--null-value NULL
    Extra string to treat as NULL, in addition to empty fields

--skipinitialspace
    Skip whitespace after delimiter

--maxrows N
    Max rows to read

--samplerows N
    Rows to sample for type inference

-H, --no-header-row
    Input has no header row; columns are named a, b, c, ...

-c COLS, --columns COLS
    Comma-separated column names, indices, or ranges to analyze (default: all columns)

--freq
    Show the most frequent values and their counts

--count
    Show only the total row count

--min
    Show minimum values

--max
    Show maximum values

--mean
    Show mean values

--median
    Show median values

--sum
    Show sum of values

--stdev
    Show standard deviation

--len
    Show the length of the longest value

--type
    Show inferred types

--unique
    Show unique value counts
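
To make the metric flags above concrete, here is a minimal Python sketch that computes rough equivalents for one column using only the standard library. It is not csvkit's implementation. The sample data, the column_stats helper, the choice to skip empty values, and the use of the sample standard deviation are all assumptions made for illustration.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample data; the column names are made up for illustration.
SAMPLE = """name,score
alice,10
bob,20
carol,20
dave,
"""

def column_stats(text, column):
    """Compute rough equivalents of several csvstat metrics for one column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    raw = [row[column] for row in rows]
    # Assumption: empty strings are NULL and are skipped by every metric below.
    present = [v for v in raw if v != ""]
    numbers = [float(v) for v in present]
    return {
        "count": len(rows),                        # --count (header excluded)
        "min": min(numbers),                       # --min
        "max": max(numbers),                       # --max
        "sum": sum(numbers),                       # --sum
        "mean": statistics.mean(numbers),          # --mean
        "median": statistics.median(numbers),      # --median
        "stdev": statistics.stdev(numbers),        # --stdev (sample formula assumed)
        "len": max(len(v) for v in present),       # --len (longest value)
        "unique": len(set(present)),               # --unique (NULLs excluded here)
        "freq": Counter(present).most_common(1),   # --freq (most common value)
    }

stats = column_stats(SAMPLE, "score")
for key, value in stats.items():
    print(f"{key}: {value}")
```

Running csvstat with the matching flag on a real file is the authoritative check; the sketch only shows what each number means.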

DESCRIPTION

csvstat is a command-line utility from the csvkit suite for analyzing CSV files. It computes summary statistics for each column: inferred data type, null and unique value counts, minimum and maximum, sum, mean, median, standard deviation, longest value length, and most common values, plus the overall row count. By default, it processes all columns and prints a plain-text report, one block per column.
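
The idea of inferring a type and counting nulls per column can be sketched in a few lines of Python. This is a deliberately simplified illustration, not csvkit's logic: the infer_type and summarize helpers and the sample data are hypothetical, only three types are recognized, and only empty strings are treated as null.

```python
import csv
import io

# Hypothetical sample; column names are illustrative only.
SAMPLE = """id,active,city
1,true,Oslo
2,false,
3,true,Lima
"""

def infer_type(values):
    """Guess a column type from its non-empty values (simplified)."""
    present = [v for v in values if v != ""]
    if not present:
        return "Text"
    if all(v.lower() in ("true", "false") for v in present):
        return "Boolean"
    try:
        for v in present:
            float(v)
        return "Number"
    except ValueError:
        return "Text"

def summarize(text):
    """Return {column: {"type": ..., "nulls": ...}} for every column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    report = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        report[name] = {
            "type": infer_type(values),
            "nulls": sum(1 for v in values if v == ""),
        }
    return report

report = summarize(SAMPLE)
for name, info in report.items():
    print(f"{name}: type={info['type']}, nulls={info['nulls']}")
```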

It is suited to data exploration, quality checks, and quick summaries. Output can be narrowed to specific columns with -c or to a single metric with flags such as --sum, and parsing options cover delimiters, quoting, and header handling. Input is read from a file or from stdin, so csvstat fits into pipelines with tools like csvcut. For large datasets, --maxrows caps the number of rows read.
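
The parsing options have close analogues in Python's csv module, which makes their effect easy to try out. The sketch below is an illustration under assumptions: read_table is a hypothetical helper, not part of csvkit, and the generated a, b, c names mimic the header-less behavior for up to 26 columns only.

```python
import csv
import io
import string

def read_table(stream, delimiter=",", quotechar='"',
               skipinitialspace=False, header=True):
    """Parse delimited text into (column_names, rows)."""
    reader = csv.reader(
        stream,
        delimiter=delimiter,
        quotechar=quotechar,
        skipinitialspace=skipinitialspace,
    )
    rows = list(reader)
    if header:
        return rows[0], rows[1:]
    # Without a header row, generate placeholder names a, b, c, ...
    names = list(string.ascii_lowercase[:len(rows[0])])
    return names, rows

# Any text stream works, including sys.stdin when data is piped in.
tsv = io.StringIO("city\tpopulation\nOslo\t700000\n")
names, rows = read_table(tsv, delimiter="\t")

# A space after each comma, a quoted field, and no header row.
spaced = io.StringIO('1, "a, b"\n2, c\n')
names2, rows2 = read_table(spaced, skipinitialspace=True, header=False)

print(names, rows)
print(names2, rows2)
```

Passing sys.stdin as the stream gives the same pipeline-friendly behavior described above.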

Unlike spreadsheet software, csvstat excels in automation and scripting, providing reproducible insights directly in the terminal.
