textql

Execute SQL queries against text-based data

TLDR

Print the lines in the specified CSV file that match an SQL query to stdout

$ textql -sql "[SELECT * FROM filename]" [path/to/filename.csv]

Query a TSV file

$ textql -dlm=tab -sql "[SELECT * FROM filename]" [path/to/filename.tsv]

Query file with header row

$ textql -dlm=[delimiter] -header -sql "[SELECT * FROM filename]" [path/to/filename.csv]

Read data from stdin

$ cat [path/to/file] | textql -sql "[SELECT * FROM stdin]"

Join two files on a specified common column

$ textql -header -sql "SELECT * FROM [path/to/file1] JOIN [file2] ON [path/to/file1].[c1] = [file2].[c1] LIMIT [10]" -output-header [path/to/file1.csv] [path/to/file2.csv]

Format output using an output delimiter with an output header line

$ textql -output-dlm=[delimiter] -output-header -sql "SELECT [column] AS [alias] FROM [filename]" [path/to/filename.csv]

SYNOPSIS

textql [options] [--query <SQL query> | --sql <file>] [<input_file>...]
textql [options] <SQL query>

-header, --header
    Treat the first row of input files as column headers. These headers will be used as column names in SQL queries.

-dlm <char>, --delimiter <char>
    Specify the input field delimiter (e.g., ',' for CSV, '\t' for TSV). Defaults to comma.

-tab-table
    Shorthand for --delimiter='\t', indicating that input files are tab-separated.

-output-dlm <char>, --output-delimiter <char>
    Specify the output field delimiter. Defaults to the input delimiter if not explicitly set.

-output-header, --output-header
    Include column headers in the query output. Useful for generating new delimited files.

-source <file>..., --source <file>...
    Specify one or more input file paths for the query. Can be omitted if data is piped via stdin.

-query <query>, --query <query>
    The SQL query string to execute. It's recommended to enclose the query in quotes.

-sql <file>, --sql <file>
    Path to a file containing the SQL query to execute, allowing for complex queries.

-output-file <file>, -o <file>
    Direct the query output to the specified file instead of printing to stdout.

-db-path <path>, --db-path <path>
    Path to a SQLite database file. Useful for persistent tables, indexing, or more complex multi-step queries.

-save-to <table>, --save-to <table>
    Save the results of the query into a specified table within the database given by -db-path.

-console, --console
    Start an interactive SQLite console with the input files already loaded as temporary tables, for direct SQL interaction.

-pretty, --pretty
    Format output in a human-readable, tabular form, especially useful for displaying results directly in the terminal.

-version
    Print the textql version and exit.

DESCRIPTION

textql is a versatile command-line utility that enables users to execute SQL queries directly on CSV, TSV, or any other delimited text files. It eliminates the need for manual data imports into a traditional database by treating each input file as a virtual table. Built upon the robust SQLite engine, textql allows for powerful data manipulation, including filtering, sorting, aggregation, and joining across multiple files, using familiar SQL syntax.

It's an invaluable tool for ad-hoc data analysis, quick reporting, and automating data transformation tasks on flat files. Whether you need to extract specific columns, count occurrences, or combine data from several spreadsheets, textql provides a fast and efficient solution right from your terminal.

CAVEATS

While powerful, textql treats input files as temporary tables by default unless the --db-path option is used, meaning changes are not saved back to the original files directly. Its SQL dialect is based on SQLite, so specific functions or syntaxes from other SQL implementations (e.g., PostgreSQL, MySQL) might not be supported. Performance for extremely large files (gigabytes+) can be slower than dedicated database systems, though it generally performs well for typical command-line data processing tasks.

TABLE NAMING CONVENTION

When processing files, textql automatically assigns table names based on the input filename. For example, 'data.csv' becomes the table 'data' within the SQL context. If multiple files are provided, each corresponds to its own table name based on its base name (filename without extension).

PIPING INPUT

textql can accept input directly from stdin. When piping data, the input is typically treated as a single table named 'stdin'. This allows for chaining commands, e.g., 'cat file.csv | textql "SELECT * FROM stdin WHERE column > 10"'.

HISTORY

textql was initially developed by Paul Pellegrini and released as an open-source project written in Go. Its creation was motivated by the desire to bring the power and familiarity of SQL to the common use case of processing delimited text files directly from the command line, without the overhead of importing data into a full-fledged database. It quickly gained traction among data analysts and system administrators for its simplicity, efficiency, and cross-platform compatibility.