mlr
Process, reshape, and analyze tabular data
TLDR
Pretty-print a CSV file in a tabular format
Receive JSON data and pretty-print the output
Sort alphabetically on a field
Sort in descending numerical order on a field
Convert CSV to JSON, performing calculations and displaying the results
Receive JSON and format the output as vertical JSON
Filter lines of a compressed CSV file treating numbers as [S]trings
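The items above map onto invocations like the following. File names and data values are placeholders for illustration; the vertical-JSON and compressed-file cases are omitted here since their exact flags vary by mlr version.

```shell
# Sample data (hypothetical values, just to make the commands runnable)
printf 'name,age,price,quantity\nalice,30,2.50,4\nbob,25,1.75,2\n' > example.csv

# Pretty-print a CSV file in a tabular format
mlr --icsv --opprint cat example.csv

# Sort alphabetically on a field
mlr --icsv --opprint sort -f name example.csv

# Sort in descending numerical order on a field
mlr --icsv --opprint sort -nr age example.csv

# Convert CSV to JSON, computing a new field along the way
mlr --icsv --ojson put '$total = $price * $quantity' example.csv
```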
SYNOPSIS
mlr [global options] {verb} [verb options] [input files ...]
PARAMETERS
-h, --help
Display help for mlr or a specific verb.
-v, --version
Print mlr version information.
-I, --inplace
Process files in place; each input file is overwritten with the command's output (written via a temporary file).
-N, --no-auto-header
Treat the first line of input as data, not as a header row.
-S {count}, --skip-header-lines {count}
Skip {count} lines at the start of the input before processing.
-i {format}, --iformat {format}
Specify input format (e.g., csv, tsv, json, jsonl, nidx, dkvp). Default is dkvp.
-o {format}, --oformat {format}
Specify output format (e.g., csv, tsv, json, jsonl, nidx, pprint, xtab). Default is dkvp.
-C {char}, --ifs {char}
Set input field separator character. Default depends on the input format (e.g., comma for CSV, tab for TSV).
-c {char}, --ofs {char}
Set output field separator character. Default depends on the output format.
-F {char}, --irs {char}
Set input record separator character. Default is newline for most formats.
-f {char}, --ors {char}
Set output record separator character. Default is newline for most formats.
-T, --stack-traces
Print full stack traces on errors, useful for debugging complex scripts.
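As a sketch of the separator flags above (shown with long-form flags only; file name and data are placeholders), converting a semicolon-delimited file to standard comma-delimited CSV might look like:

```shell
# Hypothetical semicolon-delimited input
printf 'name;age\nalice;30\n' > data.txt

# Read with ';' as the input field separator, write standard CSV
mlr --icsv --ocsv --ifs ';' cat data.txt
```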
DESCRIPTION
mlr, or Miller, is a powerful command-line tool designed for processing delimited data files such as CSV, TSV, and JSON. It functions similarly to traditional Unix tools like awk, sed, cut, sort, join, and grep, but is specifically optimized for structured data.
mlr operates on "records" (rows) and "fields" (columns), letting users filter, reformat, sort, join, and aggregate data through a verb-based syntax. It infers data types automatically and handles many input/output formats, making it well suited to data manipulation, scripting, and exploratory analysis directly from the terminal. Its key strength is handling header rows and referring to fields by name, which greatly simplifies complex data transformations.
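Because fields are addressed by name, a record-level filter is a one-liner. A minimal sketch, with illustrative data and threshold:

```shell
printf 'name,age\nalice,30\nbob,25\n' > example.csv

# Keep only records whose age field exceeds 26
mlr --icsv --ocsv filter '$age > 26' example.csv
```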
CAVEATS
While powerful, mlr has a steeper learning curve than simpler tools like cut or grep because of its verb-based syntax and field-aware operations. For very large datasets, some operations (e.g., sort, join) can consume significant RAM, since those verbs must hold all records in memory; most other verbs stream record by record. mlr excels with flat, delimited files; deeply nested JSON or complex XML structures are not its primary focus, although its JSON support is robust for arrays of objects.
VERB PARADIGM
mlr operates using a "verb" paradigm, which is central to its design. Instead of a single command with many flags, you specify an action (a verb) followed by options specific to that verb, and you can chain multiple verbs together with the then keyword. This modular design makes mlr flexible and composable. Common verbs include cat (display), cut (select/reorder fields), sort (sort by field), filter (select records), stats1 (aggregate data), join (merge data from multiple files), group-by (group records by field values), rename (rename fields), and put (create/modify fields using expressions).
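A minimal sketch of chaining verbs with the then keyword (data and field names are illustrative):

```shell
printf 'name,age,city\nbob,25,paris\nalice,30,oslo\n' > example.csv

# Sort by name, then keep only two of the fields
mlr --icsv --opprint sort -f name then cut -f name,city example.csv
```

Each verb in the chain receives the previous verb's output as its input, so a pipeline of transformations stays a single mlr invocation.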
HISTORY
mlr (Miller) was created by John Kerl, with its first public release in 2015. It was designed to address the limitations of traditional Unix command-line tools when processing structured data formats like CSV and TSV, offering a more robust, field-aware alternative. Development has remained active, continually adding new verbs, input/output formats, and performance optimizations, making mlr a staple for data scientists, system administrators, and anyone working with structured text directly from the terminal.