LinuxCommandLibrary

mlr

Process, reshape, and analyze tabular data

TLDR

Pretty-print a CSV file in a tabular format

$ mlr --icsv --opprint cat [example.csv]

Read JSON from standard input and pretty-print it
$ echo '{"hello":"world"}' | mlr --ijson --opprint cat

Sort alphabetically on a field
$ mlr --icsv --opprint sort -f [field] [example.csv]

Sort in descending numerical order on a field
$ mlr --icsv --opprint sort -nr [field] [example.csv]

Convert CSV to JSON, computing a new field from existing fields and including it in the output
$ mlr --icsv --ojson put '$[newField1] = $[oldFieldA]/$[oldFieldB]' [example.csv]

Receive JSON and format the output as vertical JSON
$ echo '{"hello":"world", "foo":"bar"}' | mlr --ijson --ojson --jvstack cat

Filter records of a compressed CSV file, treating numbers as [S]trings
$ mlr --prepipe 'gunzip' --csv filter -S '$[fieldName] =~ "[regular_expression]"' [example.csv.gz]
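The sorting and field-computation commands above can be tried end to end on a small file. The following is a minimal sketch, assuming a hypothetical example.csv with made-up fields name, qty, and price; it uses --ocsv, the CSV counterpart of the --ojson output flag.

```shell
# Hypothetical sample data; the file name and field names are invented for illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/example.csv" <<'EOF'
name,qty,price
pear,3,0.50
apple,10,0.25
fig,7,1.00
EOF

# Sort in descending numerical order on qty and keep CSV as the output format.
sorted=$(mlr --icsv --ocsv sort -nr qty "$tmpdir/example.csv")
printf '%s\n' "$sorted"

# Convert to JSON while adding a computed field named total.
json=$(mlr --icsv --ojson put '$total = $qty * $price' "$tmpdir/example.csv")
printf '%s\n' "$json"

rm -rf "$tmpdir"
```

The single quotes around the put expression keep the shell from expanding $qty and $price before Miller sees them.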

SYNOPSIS

mlr [global options] verb [verb options] [file...]

PARAMETERS

--csv
    Input and output data as CSV.

--tsv
    Input and output data as TSV.

--json
    Input and output data as JSON.

--jsonl
    Input and output data as JSON Lines.

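-n
    Don't read any input. Useful with put 'end { ... }' blocks, for example to use Miller as a calculator. For CSV data without a header line, use --implicit-csv-header (input) and --headerless-csv-output (output) instead.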

filter 'expression'
    Filter records based on an expression.

put 'statement'
    Evaluate statements for each record to compute fields.

cut -f field1,field2,...
    Select specified fields.

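stats1 -a sum,mean,... -f field1,field2,...
    Calculate descriptive statistics (such as sum, mean, min, max, count) for the specified fields. Add -g to group the results by one or more fields.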

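uniq -g field1,field2,...
    Print the distinct values of the specified fields (use -a to compare whole records). Unlike uniq(1), duplicates do not need to be adjacent. Add -c to count how many records share each value.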
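As a rough illustration of the cut and uniq verbs listed above, the sketch below runs them on a hypothetical shapes.csv (the file name and fields are invented for the example). It assumes uniq's -g option, which names the fields whose distinct values are wanted.

```shell
# Hypothetical sample data with a repeated shape/color combination.
tmpdir=$(mktemp -d)
cat > "$tmpdir/shapes.csv" <<'EOF'
shape,color,qty
circle,red,3
square,blue,10
circle,red,7
square,red,1
EOF

# Keep only the shape and qty fields.
picked=$(mlr --icsv --ocsv cut -f shape,qty "$tmpdir/shapes.csv")
printf '%s\n' "$picked"

# Distinct shape/color combinations; duplicates need not be adjacent.
distinct=$(mlr --icsv --ocsv uniq -g shape,color "$tmpdir/shapes.csv")
printf '%s\n' "$distinct"

# Same, with a count of how many records fall into each combination.
counted=$(mlr --icsv --ocsv uniq -g shape,color -c "$tmpdir/shapes.csv")
printf '%s\n' "$counted"

rm -rf "$tmpdir"
```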

DESCRIPTION

Miller (mlr) is a command-line data processing tool for CSV, TSV, JSON, JSON Lines, and other record-oriented formats. It provides verbs for filtering, grouping, aggregating, reformatting, and reshaping data, and it addresses fields by name rather than by column position. This makes it a more expressive alternative to combining `awk`, `sed`, `cut`, `join`, and similar utilities, because one tool with one syntax covers several data formats and tasks.

Miller is useful for tasks ranging from simple data extraction and format conversion to multi-step analysis and reporting. Its key features include support for a wide range of data formats, a large set of built-in functions, and an expression language (used by put and filter) for per-record processing. Most verbs process records as a stream, so memory use does not grow with file size and large datasets can be handled. Input and output formats are selected with flags such as --icsv and --ojson; when no format flag is given, Miller assumes its native key=value (DKVP) format.
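Several verbs can be combined in a single invocation with Miller's then keyword, which is one way the unified interface described above shows up in practice. A minimal sketch, assuming a hypothetical orders.csv with invented fields:

```shell
# Hypothetical sample data; names and values are made up for illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/orders.csv" <<'EOF'
name,qty,price
pear,3,0.50
apple,10,0.25
fig,7,1.00
EOF

# Filter, sort, and project in one pass; each verb feeds the next via "then".
top=$(mlr --icsv --ocsv \
  filter '$qty > 5' \
  then sort -nr qty \
  then cut -f name,qty \
  "$tmpdir/orders.csv")
printf '%s\n' "$top"

rm -rf "$tmpdir"
```

Chaining inside one mlr process avoids re-parsing the data between steps, which a pipeline of separate commands would require.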

EXAMPLE USES

To cut specific fields from a CSV file:

mlr --csv cut -f field1,field2 input.csv

To filter records where field equals 'value':

mlr --json filter '$field == "value"' input.json

To calculate the sum of a specific field:

mlr --tsv stats1 -a sum -f field input.tsv
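The statistics verb is named stats1. Beyond a single sum, it can compute several accumulators at once and group them by another field with -g. The sketch below assumes a hypothetical input.csv with invented fields shape and qty.

```shell
# Hypothetical sample data; the file name and fields are invented for illustration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/input.csv" <<'EOF'
shape,qty
circle,3
square,10
circle,7
square,1
EOF

# Sum and count of qty for each distinct shape, written as CSV.
grouped=$(mlr --icsv --ocsv stats1 -a sum,count -f qty -g shape "$tmpdir/input.csv")
printf '%s\n' "$grouped"

# The same report as an aligned table for reading in a terminal.
mlr --icsv --opprint stats1 -a sum,count -f qty -g shape "$tmpdir/input.csv"

rm -rf "$tmpdir"
```

Output field names are built from the input field and the accumulator, for example qty_sum and qty_count.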

HISTORY

Miller was created by John Kerl and first released in 2015, with the aim of addressing the limitations of utilities like `awk`, `sed`, and `cut` when working with named fields and multiple data formats. It was originally written in C; Miller 6, released in 2022, is a rewrite in Go. It is used by data scientists, engineers, and system administrators, and it continues to be developed as an open-source project with community contributions.

SEE ALSO

awk(1), sed(1), cut(1), join(1)
