LinuxCommandLibrary

csvsql

Execute SQL queries against CSV files

TLDR

Generate a CREATE TABLE SQL statement for a CSV file

$ csvsql [path/to/data.csv]
copy

Import a CSV file into an SQL database
$ csvsql --insert --db "[mysql://user:password@host/database]" [data.csv]
copy

Run an SQL query on a CSV file
$ csvsql --query "[select * from 'data']" [data.csv]
copy

SYNOPSIS

csvsql [OPTION...] [FILE]

PARAMETERS

-h, --help
    Show help message and exit

-v, --version
    Show program's version number and exit

--db=DB
    Database connection string (SQLite by default)

--insert
    Generate INSERT statements

--create[=/path]
    Generate CREATE TABLE statements

--tables
    Generate both CREATE and INSERT statements

--query=QUERY
    Execute given SQL query on CSV

--where=WHERE
    WHERE clause for INSERT statements

-s SCHEMA, --schema=SCHEMA
    Schema to create tables in

--no-inference
    Disable type inference; use TEXT

--types={text|float|numeric|integer|boolean|date}
    Override types for all columns

--blanks
    Skip generating SQL for blank rows

-f FIELDS, --primary-key=FIELDS
    Fields to use as primary key

--overwrite
    Drop table if it exists before creating

-d DELIM, --delimiter=DELIM
    Field delimiter (default: ',')

-t, --tabs
    Tab-delimited input

-q QUOTE, --quote=QUOTE
    Quote character

-u, --doublequote
    Double-quote characters within fields

--escape=CHAR
    Escape character for fields

-z MAX, --maxfieldsize=MAX
    Maximum field size in bytes

-e LTERM, --lineterminator=LTERM
    Line terminator

--encoding=ENCODING
    Input file encoding

-H, --no-header-row
    Treat first row as data, not headers

-c COLS, --columns=COLS
    Operate on specific columns only

--excel
    Use Excel-style escaping

DESCRIPTION

csvsql is a powerful command-line utility from the csvkit suite designed for converting CSV data into SQL statements or querying CSV files with SQL. It excels at generating CREATE TABLE statements with inferred column types, producing INSERT statements for database loading, or executing custom SQL queries directly on CSV input using an in-memory SQLite database.

Key use cases include data migration from CSV to relational databases like PostgreSQL or MySQL, data exploration via SQL without loading into a full DB, and automating ETL processes. Type inference automatically detects integers, floats, booleans, dates, and text, but can be overridden. It supports standard CSV variations with custom delimiters, quotes, and encodings.

When no database is specified, it uses SQLite for queries. For production, connect to external DBs via connection strings. Blank rows can be skipped, primary keys defined, and tables overwritten. Ideal for data engineers and analysts bridging flat files with SQL workflows, saving time on boilerplate scripting.

CAVEATS

Type inference may misclassify data; use --no-inference or --types for precision.
Large CSVs limited by SQLite memory; use --db for external DBs.
Requires csvkit package installed.

EXAMPLE

csvsql --create --insert data.csv > schema.sql
csvsql --query "SELECT * FROM stdin WHERE col1 > 10" data.csv

HISTORY

Part of csvkit, created by Christopher Filler in 2010. Evolved through Python 2/3 transitions; now v2+ maintained on GitHub (sf csvkit org). Widely used in data pipelines since early 2010s.

SEE ALSO

csvlook(1), csvstat(1), csvcut(1), in2csv(1), csvjson(1), sqlite3(1)

Copied to clipboard