duckdb
Run SQL queries on data
TLDR
Start an interactive shell with a transient in-memory database
Start an interactive shell on a database file. If the file does not exist, a new database is created
Query a CSV, JSON, or Parquet file using SQL
Directly query a CSV, JSON, or Parquet file using the file view
Run an SQL script
Run query on database file and keep the interactive shell open
Read CSV from stdin and write CSV to stdout
Start the DuckDB UI, a web interface with notebooks
SYNOPSIS
duckdb [options] [[DATABASE_PATH] SQL_QUERY ...]
PARAMETERS
-h, --help
Show help message and exit.
-v, --version
Print DuckDB version information.
-c SQL, --command SQL
Execute SQL query and exit.
--csv
Output results in CSV format.
--json
Output results in JSON format.
--html
Output results as HTML table.
--table
Output results as pretty table (default in interactive mode).
-i, --interactive
Force interactive REPL mode.
--config KEY=VALUE
Set configuration option (e.g., memory_limit='4GB').
--init FILE
Execute SQL from file on startup.
--extension PATH|NAME
Load extension by path or name.
-l, --list
List installed extensions.
--batch
Run non-interactively, suppress prompts.
DESCRIPTION
DuckDB is an embeddable, in-process columnar database management system designed for analytical (OLAP) workloads. The duckdb command provides a powerful CLI for executing SQL queries directly on large datasets in formats like CSV, Parquet, JSON, Arrow, and relational databases. It supports complex analytical queries with high performance, vectorized execution, and parallel processing, all without requiring a server process or complex setup.
Key features include seamless integration with data science tools (e.g., Python Pandas, R), spatial extensions (GEOS), full-text search (FTS), and hundreds of built-in functions. Users can query files directly (duckdb 'SELECT * FROM "file.parquet"'), pipe data via stdin, or load persistent databases. Interactive mode offers auto-completion, history, and pretty-printing. It's lightweight, portable across platforms, and licensed under MIT for broad adoption in data analytics, ETL pipelines, and exploratory analysis.
CAVEATS
High memory usage for large datasets; single-user only (no concurrency); experimental features may change.
BASIC EXAMPLES
duckdb -c "SELECT 42"
duckdb 'SELECT * FROM "data.csv" LIMIT 5'
duckdb mydb.db # Interactive on database
INSTALLATION
Linux: sudo apt install duckdb or download binary from duckdb.org. Supports Homebrew, Conda.
HISTORY
DuckDB originated in 2018 at CWI Amsterdam as research into analytical query engines. First public release (v0.1.0) in June 2019. Rapid growth with v1.0 in 2024, now used by millions for data analytics.


