csvstat

TLDR

Show all stats for all columns
$ csvstat [data.csv]

Show all stats for columns 2 and 4
$ csvstat -c [2,4] [data.csv]

Show sums for all columns
$ csvstat --sum [data.csv]

Show the length of the longest value in column 3
$ csvstat -c [3] --len [data.csv]

Show the number of unique values in the "name" column
$ csvstat -c [name] --unique [data.csv]

DESCRIPTION

usage: csvstat [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b] [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H] [-K SKIP_LINES] [-v] [-l] [--zero] [-V] [--csv] [-n] [-c COLUMNS] [--type] [--nulls] [--unique] [--min] [--max] [--sum] [--mean] [--median] [--stdev] [--len] [--freq] [--freq-count FREQ_COUNT] [--count] [-y SNIFF_LIMIT] [FILE]

Print descriptive statistics for each column in a CSV file.

positional arguments:

FILE

The CSV file to operate on. If omitted, input is read from STDIN (for example, data piped from another command).

optional arguments:

-h, --help

show this help message and exit

-d DELIMITER, --delimiter DELIMITER

Delimiting character of the input CSV file.

-t, --tabs

Specify that the input CSV file is delimited with tabs. Overrides "-d".

-q QUOTECHAR, --quotechar QUOTECHAR

Character used to quote strings in the input CSV file.

-u {0,1,2,3}, --quoting {0,1,2,3}

Quoting style used in the input CSV file. 0 = Quote Minimal, 1 = Quote All, 2 = Quote Non-numeric, 3 = Quote None.
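
The numeric values match the quoting constants of Python's standard csv module (csvkit is written in Python). As an illustration only, not csvstat's own code, the sketch below writes one hypothetical sample row in each style to show what an input file using that style looks like. The names ROW, STYLES and render are made up for this example.

```python
import csv
import io

# Hypothetical sample row: a plain string, a number, and a string
# that contains the delimiter.
ROW = ["plain", 42, "has,comma"]

# The -u values and the csv module constants they correspond to.
STYLES = {
    0: csv.QUOTE_MINIMAL,
    1: csv.QUOTE_ALL,
    2: csv.QUOTE_NONNUMERIC,
    3: csv.QUOTE_NONE,
}


def render(style):
    """Write ROW with the given quoting style and return the resulting line."""
    buffer = io.StringIO()
    # escapechar is needed for QUOTE_NONE so the embedded comma can be escaped.
    writer = csv.writer(buffer, quoting=style, escapechar="\\")
    writer.writerow(ROW)
    return buffer.getvalue().rstrip("\r\n")


rendered = {number: render(style) for number, style in STYLES.items()}
for number, line in rendered.items():
    print(number, line)
```

A file written in style 3 contains no quote characters at all, which is why an escape character is needed for the embedded comma.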

-b, --no-doublequote

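Specify that quote characters inside quoted fields are not doubled in the input CSV file. Such files normally escape embedded quotes with ESCAPECHAR instead (see -p).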

-p ESCAPECHAR, --escapechar ESCAPECHAR

Character used to escape the delimiter if --quoting 3 ("Quote None") is specified and to escape the QUOTECHAR if --no-doublequote is specified.
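
To make the interaction between -b, -p and -u 3 concrete, here is a minimal sketch using Python's standard csv module, on the assumption that csvstat hands these settings to a reader of the same kind. The input lines are hypothetical.

```python
import csv

# Two hypothetical input lines that encode the same record: a, say "hi", b
doubled = ['a,"say ""hi""",b']    # embedded quotes are doubled (the default)
escaped = ['a,"say \\"hi\\"",b']  # embedded quotes are backslash-escaped

# Default behaviour: a doubled quote inside a quoted field is one literal quote.
row_doubled = next(csv.reader(doubled))

# Analogue of "-b -p \": no doubling, a backslash escapes the quote character.
row_escaped = next(csv.reader(escaped, doublequote=False, escapechar="\\"))

# Analogue of "-u 3 -p \": no quoting at all, a backslash escapes the delimiter.
row_unquoted = next(
    csv.reader(["x\\,y,z"], quoting=csv.QUOTE_NONE, escapechar="\\")
)

print(row_doubled)
print(row_escaped)
print(row_unquoted)
```

The first two calls yield the same three fields; the third yields two fields, the first of which contains a literal comma.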

-z FIELD_SIZE_LIMIT, --maxfieldsize FIELD_SIZE_LIMIT

Maximum length of a single field in the input CSV file.

-e ENCODING, --encoding ENCODING

Specify the encoding of the input CSV file.

-S, --skipinitialspace

Ignore whitespace immediately following the delimiter.

-H, --no-header-row

Specify that the input CSV file has no header row. Default column names (a,b,c,...) are generated instead.

-K SKIP_LINES, --skip-lines SKIP_LINES

Specify the number of initial lines to skip before the header row (e.g. comments, copyright notices, empty rows).
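
The effect of skipping leading lines can be sketched with the standard library. This is an illustration under assumed input, not csvstat's implementation; RAW and read_skipping are hypothetical names.

```python
import csv
import io
from itertools import islice

# Hypothetical input: two comment lines precede the real header row.
RAW = """# generated by an export tool
# contact: data team
id,name
1,ann
2,bob
"""


def read_skipping(text, skip_lines):
    """Drop the first `skip_lines` lines, then parse the rest as CSV with a header."""
    lines = islice(io.StringIO(text), skip_lines, None)
    return list(csv.DictReader(lines))


rows = read_skipping(RAW, 2)
print(rows)
```

With skip_lines set to 2, the third line becomes the header row, which corresponds to running csvstat with -K 2 on such a file.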

-v, --verbose

Print detailed tracebacks when errors occur.

-l, --linenumbers

Insert a column of line numbers at the front of the output. Useful when piping to grep or as a simple primary key.

--zero

When interpreting or displaying column numbers, use zero-based numbering instead of the default 1-based numbering.

-V, --version

Display version information and exit.

--csv

Output results as a CSV, rather than text.

-n, --names

Display column names and indices from the input CSV and exit.

-c COLUMNS, --columns COLUMNS

A comma separated list of column indices, names or ranges to be examined, e.g. "1,id,3-5". Defaults to all columns.
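
A column specification mixes indices, names and ranges. The sketch below shows one way such a string could be resolved, assuming that ranges are inclusive and that indices are 1-based unless --zero is given. The function resolve_columns and the sample header are hypothetical, and error handling is omitted.

```python
def resolve_columns(spec, header, zero_based=False):
    """Turn a spec such as "1,name,4-5" into zero-based positions in `header`."""
    offset = 0 if zero_based else 1
    positions = []
    for part in spec.split(","):
        part = part.strip()
        if part in header:
            # A column name.
            positions.append(header.index(part))
        elif part.isdigit():
            # A single index.
            positions.append(int(part) - offset)
        else:
            # A range, assumed to include both endpoints.
            first, last = (int(p) for p in part.split("-"))
            positions.extend(range(first - offset, last - offset + 1))
    return positions


header = ["id", "name", "city", "age", "score", "notes"]
selected = resolve_columns("1,name,4-5", header)
selected_zero = resolve_columns("0,name,3-4", header, zero_based=True)
print([header[i] for i in selected])
```

Both calls select the same four columns; only the numbering convention differs.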

--type

Only output data type.

--nulls

Only output whether columns contain nulls.

--unique

Only output counts of unique values.

--min

Only output smallest values.

--max

Only output largest values.

--sum

Only output sums.

--mean

Only output means.

--median

Only output medians.

--stdev

Only output standard deviations.
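
For a numeric column, the quantities selected by --min, --max, --sum, --mean, --median and --stdev can be reproduced with Python's standard library. The help text does not say whether --stdev is a sample or a population standard deviation; the sketch assumes the sample form. The values are hypothetical.

```python
import statistics

# Hypothetical numeric column, already parsed from CSV text.
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "min": min(values),
    "max": max(values),
    "sum": sum(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    # Sample standard deviation (n - 1 denominator); this choice is an assumption.
    "stdev": statistics.stdev(values),
}

for name, value in summary.items():
    print(name, value)
```

If the results differ from csvstat's in the last digits of stdev, try statistics.pstdev, which uses the population form.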

--len

Only output the length of the longest values.

--freq

Only output lists of frequent values.

--freq-count FREQ_COUNT

The maximum number of frequent values to display.
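
The idea behind --unique, --freq and --freq-count can be sketched with collections.Counter. This is only an illustration on hypothetical data; csvstat's output format and tie-breaking are not reproduced here.

```python
from collections import Counter

# Hypothetical column values.
cities = ["Oslo", "Rome", "Oslo", "Lima", "Rome", "Oslo"]

FREQ_COUNT = 2  # analogue of --freq-count 2

counts = Counter(cities)
top = counts.most_common(FREQ_COUNT)  # most frequent values, capped at FREQ_COUNT
unique = len(counts)                  # number of distinct values

print(top)
print(unique)
```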

--count

Only output total row count.

-y SNIFF_LIMIT, --snifflimit SNIFF_LIMIT

Limit CSV dialect sniffing to the specified number of bytes. Specify "0" to disable sniffing entirely.
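
Dialect sniffing means guessing the delimiter and quoting rules from a sample of the input. A minimal sketch with Python's csv.Sniffer follows, assuming csvstat does something comparable. The helper sniff_delimiter and the data are hypothetical, and the limit here counts characters rather than bytes.

```python
import csv

# Hypothetical semicolon-delimited input.
DATA = "id;name;score\n1;ann;90\n2;bob;85\n3;cy;70\n4;dee;60\n"


def sniff_delimiter(text, sniff_limit):
    """Guess the delimiter from at most `sniff_limit` characters; None if disabled."""
    if sniff_limit == 0:
        # Sniffing disabled: the caller falls back to its default dialect.
        return None
    return csv.Sniffer().sniff(text[:sniff_limit]).delimiter


guessed = sniff_delimiter(DATA, 1024)
disabled = sniff_delimiter(DATA, 0)
print(guessed)
print(disabled)
```

A small limit keeps sniffing cheap on large files but can cut the sample mid-line; when the guess is wrong, pass the delimiter explicitly with -d and disable sniffing with -y 0.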

SEE ALSO

The full documentation for csvstat is maintained as a Texinfo manual. If the info and csvstat programs are properly installed at your site, the command info csvstat should give you access to the complete manual.
