LinuxCommandLibrary

textql

Execute SQL queries against text-based data

TLDR

Print the lines in the specified CSV file that match an SQL query to stdout

$ textql -sql "[SELECT * FROM filename]" [path/to/filename.csv]
copy

Query a TSV file
$ textql -dlm=tab -sql "[SELECT * FROM filename]" [path/to/filename.tsv]
copy

Query file with header row
$ textql -dlm=[delimiter] -header -sql "[SELECT * FROM filename]" [path/to/filename.csv]
copy

Read data from stdin
$ cat [path/to/file] | textql -sql "[SELECT * FROM stdin]"
copy

Join two files on a specified common column
$ textql -header -sql "SELECT * FROM [path/to/file1] JOIN [file2] ON [path/to/file1].[c1] = [file2].[c1] LIMIT [10]" -output-header [path/to/file1.csv] [path/to/file2.csv]
copy

Format output using an output delimiter with an output header line
$ textql -output-dlm=[delimiter] -output-header -sql "SELECT [column] AS [alias] FROM [filename]" [path/to/filename.csv]
copy

SYNOPSIS

textql [options]

PARAMETERS

--input <file>
    Specifies the input file to query. If not provided, textql will try to read from STDIN.

--sql <query>
    The SQL query to execute against the data. Use single quotes to encapsulate your query to avoid shell interpretation.

--dl <delimiter>
    Sets the delimiter for the input file. Common delimiters include comma (,), tab (\t), and pipe (|). Defaults to comma.

--header
    Specifies that the first row of the input file should be treated as the header row, defining the column names. If not provided, columns will be named col1, col2, etc.

--output-table
    The name of the table you want to use to query your file.

--output-dl <delimiter>
    Delimiter for output, defaults to comma.

--max-lines <number>
    Sets the maximum number of lines to read from input to determine schema.

--verbose
    Enables verbose output, providing more information about the process.

--version
    Displays the version of textql.

DESCRIPTION

textql is a command-line tool that allows you to treat text-based data files like CSV, TSV, and log files as relational tables and query them using SQL.
It automatically infers the schema from the file contents, making it easy to extract, transform, and analyze data without complex scripting.
It helps you to analyze big amount of data in a simple way.

CAVEATS

textql relies on SQLite under the hood, so complex queries that may be slow in SQLite will also be slow in textql.
The automatic schema inference may not always be perfect, especially with complex or inconsistent data files.
You may need to manually adjust the delimiter or column names.

<B>EXAMPLES</B>

1. Querying a CSV file:
textql --input data.csv --sql 'SELECT * FROM data WHERE column1 > 10'

2. Using a tab-separated file:
textql --input data.tsv --dl '\t' --header --sql 'SELECT column2, column3 FROM data'

3. Reading from STDIN:
cat data.csv | textql --sql 'SELECT * FROM stdin WHERE col1 LIKE "A%"'

<B>PERFORMANCE CONSIDERATIONS</B>

For very large files, consider using the `--max-lines` parameter to limit the number of lines used for schema inference. Optimizing your SQL queries is also crucial for performance. Using indexes and avoiding full table scans can significantly improve query speed.

HISTORY

textql was created to provide a more intuitive way to query text-based data files compared to traditional tools like `awk` and `sed`.
It leverages the power and familiarity of SQL, making data analysis more accessible to a wider audience. The tool has gained popularity as a quick and easy solution for ad-hoc data exploration and transformation.

SEE ALSO

awk(1), sed(1), grep(1), sqlite3(1)

Copied to clipboard