trdsql
Query data from various formats using SQL
TLDR
Convert object data from multiple JSON files to a CSV file with header (-oh) and double quote
Interpret JSON list as a table and put objects inside as columns (path/to/file.json: {"list":[{"age":"26","name":"Tanaka"}]})
Manipulate complex SQL query with data from multiple CSV files with first line is header (-ih)
Merge content of 2 CSV files to one CSV file
Connect to PostgreSQL database
Create table data to MySQL database from CSV file
Show data from compress log files
SYNOPSIS
trdsql [OPTIONS] "<SQL QUERY>" [FILE...]
or
data | trdsql [OPTIONS] "<SQL QUERY>"
PARAMETERS
-i, --input-format <FORMAT>
Specifies the input data format (e.g., CSV, JSON, LTSV, TBLN, RAW). Defaults to `CSV`.
-o, --output-format <FORMAT>
Specifies the desired output data format (e.g., CSV, JSON, JSONL, AT for aligned text, GFM for GitHub Flavored Markdown table). Defaults to `AT`.
-d, --delimiter <CHAR>
Sets the field delimiter character for CSV/TSV input. Can be a single character or a Go string literal like `\t`.
-H, --header
Indicates that the first row of input should be treated as a header, using its values as column names.
-ih, --without-header
Treats input as having no header row; columns will be automatically named c1, c2, etc.
-oh, --out-header
Includes column headers in the output. Useful with formats like CSV or AT.
-E, --encoding <ENCODING>
Specifies the input character encoding (e.g., sjis, eucjp, utf8). Defaults to `utf8`.
-q, --quiet
Suppresses informational messages and outputs only the query results.
-dl, --driver <DRIVER>
Specifies the database driver to use (sqlite3 (default), postgres, mysql). Required for external connections.
-dsn, --dsn <DSN>
Provides the Data Source Name for connecting to an external database (e.g., `user:password@tcp(host:port)/dbname`).
-p, --parallel
Enables parallel processing for multiple input files, potentially speeding up large queries.
-v, --version
Displays the trdsql version information and exits.
DESCRIPTION
trdsql is a powerful command-line tool that enables users to query various structured text data formats using standard SQL. It treats input files such as CSV, TSV, JSON, LTSV, and even data piped from standard input, as virtual tables within an in-memory database. By default, trdsql leverages SQLite, providing a fast and efficient way to perform complex data manipulations, including filtering, sorting, aggregation, and joining across different files or data streams, directly from the terminal. Beyond its default SQLite backend, trdsql can connect to external databases like PostgreSQL and MySQL, acting as a versatile bridge between flat files and full-fledged relational database systems. It also offers flexible output options, allowing results to be formatted as CSV, JSON, LTSV, aligned text (AT), and more, making it an indispensable utility for data analysis, transformation, and reporting in any Unix-like environment.
CAVEATS
When using the default in-memory SQLite database, processing exceptionally large files or datasets may consume significant system RAM. Connecting to external databases requires careful configuration of the DSN string and ensuring the respective database drivers are correctly set up and accessible by trdsql. While highly flexible, for extremely simple, single-purpose text manipulations, specialized tools like awk or sed might offer marginally faster execution, though with significantly less SQL-like power and versatility.
SUPPORTED DATA FORMATS
trdsql supports a wide array of input and output formats, making it highly adaptable for diverse data processing tasks. Input formats include CSV, TSV, JSON, JSONL (JSON Lines), LTSV (Labeled Tab-Separated Values), TBLN, and RAW. For output, it extends its capabilities to include CSV, TSV, JSON, JSONL, LTSV, TBLN, AT (aligned text table for human readability), GFM (GitHub Flavored Markdown table), and RAW.
SQL DIALECT
When utilizing its default in-memory SQLite database, trdsql supports the widely used SQLite's SQL dialect. This means most standard SQL features, including SELECT, FROM, WHERE, GROUP BY, ORDER BY, JOIN operations, and various SQL functions, are fully available. When configured to connect to external databases such as PostgreSQL or MySQL using the -dl option, trdsql leverages their respective SQL dialects, ensuring broad compatibility with existing database queries and schemas.
HISTORY
trdsql was initiated by Noboru Iwamatsu, with its first public commits appearing in 2018. Developed primarily in Go, it was conceived as a high-performance, self-contained command-line utility capable of executing SQL queries directly on various text-based data formats. Its creation addressed a growing demand for a flexible command-line data processing tool that could leverage the familiarity and power of SQL without necessitating a full-blown database installation, evolving to incorporate extensive support for multiple input/output formats and external database connections.