datamash
command-line tool for basic numeric and statistical operations
TLDR
Get max, min, mean, median of a column
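A minimal sketch of this summary, assuming the numbers to summarize are in column 1. Here seq only generates the sample values 1 to 10 and is not part of datamash:

```shell
# Print max, min, mean, and median of column 1, tab-separated on one line
seq 1 10 | datamash max 1 min 1 mean 1 median 1
```

To summarize a file instead, redirect it into standard input, for example with < data.txt (a hypothetical file name).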
SYNOPSIS
datamash [options] operation column [operation column...]
DESCRIPTION
datamash performs basic numeric, textual, and statistical operations on input data from the command line. It's designed for quick data analysis tasks that would otherwise require scripting or statistical software, supporting operations like sum, mean, median, standard deviation, variance, and more.
Input is read from standard input, so files are supplied by redirection or through a pipe. Fields are separated by tabs by default; -W accepts any run of whitespace and -t sets another delimiter. The tool can group data by fields and compute aggregate statistics for each group, similar to SQL's GROUP BY.
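The grouping behavior can be sketched as follows, assuming a made-up two-column, tab-separated input with a key in column 1 and a value in column 2. The -s option sorts the input before grouping, which matters here because the sample rows for key a are not adjacent:

```shell
# Sum and average column 2 for each distinct value of column 1
# (roughly: SELECT key, SUM(value), AVG(value) ... GROUP BY key)
printf 'a\t1\nb\t3\na\t5\n' | datamash -s -g 1 sum 2 mean 2
# a	6	3
# b	3	3
```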
datamash is part of the GNU project and excels at one-liners for data exploration. It's commonly used in pipelines to analyze CSV files, log data, or any tabular text data. It handles both numeric and textual operations, such as counting unique values, collapsing a group's values into a list, and picking a random value from a group.
PARAMETERS
-R, --round digits
Round numeric output to the specified number of decimal places
--narm
Ignore NA and NaN values
-t char
Use the specified field separator
-g, --group fields
Group by the specified fields
-H, --headers
First line is header
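As a rough illustration of how these options combine, the sketch below averages one column of a small CSV. The data and the column names name and score are invented for the example, and LC_ALL=C is set only so that the dot is read as the decimal separator regardless of the user's locale:

```shell
# Hypothetical CSV: a header line, comma-separated fields, one missing value marked NA
# -t,     use a comma as the field separator
# -H      treat the first line as a header
# --narm  skip the NA entry instead of failing on it
# -R 2    round the result to 2 decimal places
printf 'name,score\nx,1.5\ny,NA\nz,2\n' |
  LC_ALL=C datamash -t, -H --narm -R 2 mean 2
```

With the NA row skipped, the mean is taken over 1.5 and 2 only.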
OPERATIONS
sum, min, max, mean, median
Basic statistics
pstdev, sstdev
Population/sample standard deviation
count, unique, collapse
Counting and grouping
first, last, rand
Selection operations
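Several of these operations can be applied to the same column in one call. The sketch below uses invented tab-separated input that is already sorted by the key in column 1; rand is left out because its result varies from run to run:

```shell
# Per group: number of rows, distinct values, all values, first value, last value
printf 'a\t1\na\t2\na\t2\nb\t7\n' |
  datamash -g 1 count 2 unique 2 collapse 2 first 2 last 2
# a	3	1,2	1,2,2	1	2
# b	1	7	7	7	7
```

unique and collapse both return comma-separated lists; unique drops the repeated 2, while collapse keeps every value.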
CAVEATS
In locales that use a comma as the decimal separator, floating-point input must use a comma as well (use tr to convert). Column numbering starts at 1.
