LinuxCommandLibrary

q

Run SQL-like queries on CSV/TSV files and other tabular text data

TLDR

Query a CSV file by specifying the delimiter as ','
$ q [[-d|--delimiter]] ',' "SELECT * from [path/to/file]"

Query a TSV file
$ q [[-t|--tab-delimited]] "SELECT * from [path/to/file]"

Query file with header row
$ q [[-d|--delimiter]] [delimiter] [[-H|--skip-header]] "SELECT * from [path/to/file]"

Read data from stdin; '-' in the query represents the data from stdin
$ [output] | q "select * from -"

Join two files (aliased as f1 and f2 in the example) on column c1, a common column
$ q "SELECT * FROM [path/to/file] f1 JOIN [path/to/other_file] f2 ON (f1.c1 = f2.c1)"

Format output using an output delimiter with an output header line (Note: Command will output column names based on the input file header or the column aliases overridden in the query)
$ q [[-D|--output-delimiter]] [delimiter] [[-O|--output-header]] "SELECT [column] as [alias] from [path/to/file]"

SYNOPSIS

q [options] "SQL_QUERY"
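
Input files are referenced by name inside the query's FROM clause rather than passed as separate arguments. A minimal sketch, assuming a hypothetical comma-delimited data.csv with a header containing name and age columns:

$ q -d ',' -H "SELECT name FROM data.csv WHERE age > 30"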

PARAMETERS

-H, --skip-header
    Treat the first input line as a header row: it is used for column names and is not treated as data. Without this flag, every line is data and columns are named c1, c2, c3, and so on.

-d DELIMITER, --delimiter DELIMITER
    Specify the input column delimiter (e.g., , for CSV, \t for TSV, | for pipe-separated). Space is used when no delimiter is given.

-D DELIMITER, --output-delimiter DELIMITER
    Specify the output column delimiter. Defaults to the input delimiter if not specified.

-O, --output-header
    Print a header line in the output. Output column names are taken from the input file header or from column aliases in the query (a combined example follows this parameter list).

-t, --tab-delimited
    A shortcut to set the input delimiter to tab (\t).

-c COLUMN_COUNT, --column-count COLUMN_COUNT
    Specify the expected number of columns, for use with the relaxed or strict parsing modes.

-q QUERY_FILENAME, --query-filename QUERY_FILENAME
    Read the SQL query from the specified file instead of directly from the command line.

-z, --gzipped
    Treat the input data as gzip-compressed.

-A, --analyze-only
    Analyze the input and print the detected column names and types without executing the query.

-b, --beautify
    Beautify the output, aligning columns according to their actual values.
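
As a combined sketch of these options (assuming a hypothetical comma-delimited sales.csv with a header containing region and amount columns), the following skips the header, aggregates by region, and writes pipe-delimited output with a header line:

$ q -d ',' -H -D '|' -O "SELECT region, SUM(amount) AS total FROM sales.csv GROUP BY region"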

DESCRIPTION

The q command is a command-line tool that lets users execute SQL-like queries directly on text files, treating them as if they were tables in a relational database. Developed by Harel Ben-Attia, it provides an intuitive way to filter, transform, and analyze data in CSV, TSV, and other delimited text formats (including space-delimited data such as many log files), without requiring the data to be loaded into an actual database system.

By leveraging standard SQL syntax (queries are executed against an in-memory sqlite3 database), q enables complex data manipulation such as SELECT with its usual clauses (WHERE, GROUP BY, ORDER BY, LIMIT), JOIN operations across multiple files, and aggregate functions like COUNT, SUM, AVG, MIN, and MAX. It handles header rows (via -H), arbitrary delimiters, and automatic column type inference. Its ability to read data from standard input and write results to standard output makes it highly composable within typical Unix pipelines, integrating seamlessly with other command-line utilities.
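
For example, a top-N aggregation over a hypothetical comma-delimited access.csv (header containing url and bytes columns) might look like this:

$ q -d ',' -H "SELECT url, COUNT(*) AS hits, SUM(bytes) AS total_bytes FROM access.csv GROUP BY url ORDER BY hits DESC LIMIT 10"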

This tool is particularly useful for system administrators, data analysts, and developers who frequently work with log files, configuration files, or other forms of structured text data. It offers a quick and efficient method for ad-hoc data analysis, reporting, and transformation tasks, effectively bridging the gap between simple text processing utilities and full-fledged database management systems. While not a complete database, its 'text as a database' paradigm simplifies many common data wrangling challenges.

CAVEATS

q is not a full-fledged relational database management system. It lacks features such as persistent storage, transactions, indexing for query optimization, and complex data types. While efficient for many tasks, performance on very large datasets or complex JOIN operations may not match an optimized database engine, since input data is loaded into an in-memory sqlite3 database on each invocation. q is implemented in Python and is typically installed through a package manager (such as Homebrew or apt) or via pip.

INPUT AND OUTPUT FLEXIBILITY

q excels in its versatility with input and output. It can process data from standard input (stdin) or from one or more files named in the query, with the input delimiter chosen via -d or -t (space is the default when none is specified). Output is written to standard output (stdout), allowing seamless integration into shell pipelines for further processing with tools like awk, sed, or jq.
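
A typical pipeline sketch (the UID column name here assumes standard ps -ef output; adjust to your system):

$ ps -ef | q -H "SELECT UID, COUNT(*) AS cnt FROM - GROUP BY UID ORDER BY cnt DESC LIMIT 5"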

SQL-92 COMPLIANCE

While not a complete SQL implementation, q supports a significant subset of the SQL-92 standard. This includes SELECT statements with clauses like FROM, WHERE, GROUP BY, ORDER BY, LIMIT, HAVING, UNION ALL, and DISTINCT. It also supports aggregate functions (e.g., COUNT, SUM, AVG), and cross-file JOIN operations, enabling powerful data correlation across different text sources.
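
For instance, HAVING and column aliases work as they would in a regular database (assuming a hypothetical comma-delimited employees.csv with department and salary columns):

$ q -d ',' -H "SELECT department, AVG(salary) AS avg_salary FROM employees.csv GROUP BY department HAVING COUNT(*) > 5"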

HISTORY

The q command was created by Harel Ben-Attia, with its first public release appearing around 2012. It quickly gained popularity within the Unix command-line community for its innovative approach to querying text files using familiar SQL syntax. The project is actively maintained on GitHub, with ongoing development and improvements to its SQL compliance and performance.

SEE ALSO

awk(1), sed(1), grep(1), csvsql(1) (part of csvkit), jq(1), sqlite3(1)
