LinuxCommandLibrary

duckdb

Run SQL queries on data

TLDR

Start an interactive shell with a transient in-memory database

$ duckdb
copy

Start an interactive shell on a database file. If the file does not exist, a new database is created
$ duckdb [path/to/dbfile]
copy

Query a CSV, JSON, or Parquet file using SQL
$ duckdb -c "[SELECT * FROM 'data_source.[csv|csv.gz|json|json.gz|parquet]']"
copy

Directly query a CSV, JSON, or Parquet file using the file view
$ duckdb [data_source.[csv|csv.gz|json|json.gz|parquet]] -c "[ SELECT * FROM file ]"
copy

Run an SQL script
$ duckdb -f [path/to/script.sql]
copy

Run query on database file and keep the interactive shell open
$ duckdb [path/to/dbfile] -cmd "[SELECT DISTINCT * FROM tbl]"
copy

Read CSV from stdin and write CSV to stdout
$ cat [path/to/source.csv] | duckdb -c "[COPY (FROM read_csv('/dev/stdin')) TO '/dev/stdout' WITH (FORMAT CSV, HEADER)]"
copy

Start the DuckDB UI, a web interface with notebooks
$ duckdb -ui
copy

SYNOPSIS

duckdb [options] [[DATABASE_PATH] SQL_QUERY ...]

PARAMETERS

-h, --help
    Show help message and exit.

-v, --version
    Print DuckDB version information.

-c SQL, --command SQL
    Execute SQL query and exit.

--csv
    Output results in CSV format.

--json
    Output results in JSON format.

--html
    Output results as HTML table.

--table
    Output results as pretty table (default in interactive mode).

-i, --interactive
    Force interactive REPL mode.

--config KEY=VALUE
    Set configuration option (e.g., memory_limit='4GB').

--init FILE
    Execute SQL from file on startup.

--extension PATH|NAME
    Load extension by path or name.

-l, --list
    List installed extensions.

--batch
    Run non-interactively, suppress prompts.

DESCRIPTION

DuckDB is an embeddable, in-process columnar database management system designed for analytical (OLAP) workloads. The duckdb command provides a powerful CLI for executing SQL queries directly on large datasets in formats like CSV, Parquet, JSON, Arrow, and relational databases. It supports complex analytical queries with high performance, vectorized execution, and parallel processing, all without requiring a server process or complex setup.

Key features include seamless integration with data science tools (e.g., Python Pandas, R), spatial extensions (GEOS), full-text search (FTS), and hundreds of built-in functions. Users can query files directly (duckdb 'SELECT * FROM "file.parquet"'), pipe data via stdin, or load persistent databases. Interactive mode offers auto-completion, history, and pretty-printing. It's lightweight, portable across platforms, and licensed under MIT for broad adoption in data analytics, ETL pipelines, and exploratory analysis.

CAVEATS

High memory usage for large datasets; single-user only (no concurrency); experimental features may change.

BASIC EXAMPLES

duckdb -c "SELECT 42"
duckdb 'SELECT * FROM "data.csv" LIMIT 5'
duckdb mydb.db # Interactive on database

INSTALLATION

Linux: sudo apt install duckdb or download binary from duckdb.org. Supports Homebrew, Conda.

HISTORY

DuckDB originated in 2018 at CWI Amsterdam as research into analytical query engines. First public release (v0.1.0) in June 2019. Rapid growth with v1.0 in 2024, now used by millions for data analytics.

SEE ALSO

sqlite3(1), psql(1), clickhouse-client(1)

Copied to clipboard