trdsql

Query data from various formats using SQL

TLDR

Convert object data from multiple JSON files to a CSV file with a header (-oh) and double-quoted fields

$ trdsql -ocsv -oh "SELECT * FROM [path/to/file/*.json]" | sed 's/\([^,]*\)/"&"/g' > [path/to/file.csv]

Interpret a JSON list as a table, turning the objects inside into rows with their keys as columns (path/to/file.json: {"list":[{"age":"26","name":"Tanaka"}]})
$ trdsql "SELECT * FROM [path/to/file.json]::.list"

Run a more complex SQL query on data from multiple CSV files whose first line is a header (-ih)
$ trdsql -icsv -ih "SELECT [column1,column2] FROM [path/to/file*.csv] WHERE column2 != '' GROUP BY column1 ORDER BY column1"

Merge the content of two CSV files into one
$ trdsql "SELECT [column1,column2] FROM [path/to/file1.csv] UNION SELECT [column1,column2] FROM [path/to/file2.csv]"

Connect to PostgreSQL database
$ trdsql -driver postgres -dsn "host=[hostname] port=[5433] dbname=[database_name]" "SELECT 1"

Create a table in a MySQL database from a CSV file
$ trdsql -driver mysql -dsn "[username]:[password]@[hostname]/[database]" -ih "CREATE TABLE [table] ([column1] int, [column2] varchar(20)) AS SELECT [column3] AS [column1],[column2] FROM [path/to/header_file.csv]"

Show data from compressed LTSV log files (gzip files are decompressed automatically)
$ trdsql -iltsv "SELECT * FROM [path/to/access.log.gz]"

SYNOPSIS

trdsql [OPTIONS] "<SQL QUERY>"
or
data | trdsql [OPTIONS] "<SQL QUERY>"
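
Input files are not passed as separate arguments; they are referenced as table names inside the FROM clause. When data is piped in instead, the stream itself can be queried. A minimal sketch, assuming stdin is exposed as the table named - (the CSV content here is purely illustrative):
$ printf '1,Apple\n2,Orange\n' | trdsql -icsv "SELECT * FROM -"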

PARAMETERS

-icsv, -ijson, -iltsv, -itbln
    Force the input format (CSV, JSON, LTSV, TBLN). By default the format is guessed from the file extension.

-ocsv, -ojson, -ojsonl, -oltsv, -oat, -omd, -oraw
    Select the output format: CSV (default), JSON, JSON Lines, LTSV, aligned text table (AT), Markdown table (MD), or raw text.

-id <CHAR>
    Field delimiter for CSV input. Defaults to `,`; a Go string literal such as `\t` is accepted.

-od <CHAR>
    Field delimiter for the output. Defaults to `,`.

-ih
    Interpret the first line of the input as column names (CSV only). Without this option, columns are automatically named c1, c2, and so on.

-oh
    Output column names as a header row. Useful with formats like CSV.

-a <FILE>
    Analyze the given file and suggest a SQL query for it.

-q <FILE>
    Read the SQL query from the specified file instead of the command line.

-driver <DRIVER>
    Specifies the database driver to use: sqlite3 (default), postgres, or mysql. Required for external database connections.

-dsn <DSN>
    Provides the Data Source Name for connecting to an external database (e.g., `[username]:[password]@tcp([hostname]:[port])/[database]` for MySQL).

-version
    Displays the trdsql version information and exits.

-help
    Displays usage information and exits.
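
As an illustration of how these options combine (the file name and columns are hypothetical), the following reads a tab-delimited file whose first line is a header and prints the result as a Markdown table:
$ trdsql -icsv -id "\t" -ih -omd "SELECT * FROM [path/to/data.tsv]"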

DESCRIPTION

trdsql is a command-line tool for querying various structured text data formats with standard SQL. It treats input files such as CSV, TSV, JSON, and LTSV, as well as data piped from standard input, as virtual tables in an in-memory database. By default it uses SQLite, providing a fast and efficient way to perform complex data manipulations, including filtering, sorting, aggregation, and joins across different files or data streams, directly from the terminal.

Beyond its default SQLite backend, trdsql can connect to external databases such as PostgreSQL and MySQL, acting as a bridge between flat files and full relational database systems. It also offers flexible output options, formatting results as CSV, JSON, LTSV, aligned text (AT), and more, which makes it a versatile utility for data analysis, transformation, and reporting in any Unix-like environment.
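
For instance, a join across two CSV files reads just like a join between database tables. A hedged sketch (the file names, aliases, and columns below are hypothetical):
$ trdsql -icsv -ih "SELECT u.name, o.total FROM [path/to/users.csv] AS u JOIN [path/to/orders.csv] AS o ON (u.id = o.user_id)"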

CAVEATS

When using the default in-memory SQLite database, processing exceptionally large files or datasets may consume significant RAM. Connecting to external databases requires a correctly formed DSN string and a reachable database server for the chosen driver. For very simple, single-purpose text manipulations, specialized tools such as awk or sed may be marginally faster, though with far less SQL-like power and versatility.

SUPPORTED DATA FORMATS

trdsql supports a wide range of input and output formats, making it adaptable to diverse data processing tasks. Input formats include CSV, TSV (CSV with a tab delimiter), JSON, JSONL (JSON Lines), LTSV (Labeled Tab-Separated Values), and TBLN. For output, it supports CSV, TSV, JSON, JSONL, LTSV, TBLN, AT (an aligned text table for human readability), MD (Markdown table), and RAW.
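
As a quick format-conversion sketch (the file name is hypothetical), a headered CSV file can be rewritten as JSON in a single step:
$ trdsql -icsv -ih -ojson "SELECT * FROM [path/to/file.csv]"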

SQL DIALECT

When using its default in-memory SQLite database, trdsql supports SQLite's SQL dialect. This means most standard SQL features, including SELECT, FROM, WHERE, GROUP BY, ORDER BY, JOIN operations, and SQLite's built-in functions, are available. When connected to an external database such as PostgreSQL or MySQL via the -driver option, trdsql uses that database's SQL dialect, ensuring broad compatibility with existing queries and schemas.
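
Because the default backend is SQLite, its built-in aggregate functions and clauses can be used directly. A hedged sketch (file and column names are hypothetical):
$ trdsql -icsv -ih "SELECT [column1], count(*) AS cnt FROM [path/to/file.csv] GROUP BY [column1] ORDER BY cnt DESC LIMIT 5"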

HISTORY

trdsql was initiated by Noboru Iwamatsu, with its first public commits appearing in 2018. Developed primarily in Go, it was conceived as a high-performance, self-contained command-line utility capable of executing SQL queries directly on various text-based data formats. Its creation addressed a growing demand for a flexible command-line data processing tool that could leverage the familiarity and power of SQL without necessitating a full-blown database installation, evolving to incorporate extensive support for multiple input/output formats and external database connections.

SEE ALSO

q(1), csvkit(1), awk(1), jq(1), sqlite3(1)
