LinuxCommandLibrary

csvsql

Execute SQL queries against CSV files

TLDR

Generate a CREATE TABLE SQL statement for a CSV file

$ csvsql [path/to/data.csv]
copy

Import a CSV file into an SQL database
$ csvsql --insert --db "[mysql://user:password@host/database]" [data.csv]
copy

Run an SQL query on a CSV file
$ csvsql --query "[select * from 'data']" [data.csv]
copy

SYNOPSIS

csvsql [OPTIONS] [SQL]

PARAMETERS

-i, --dialect
    SQL dialect. Options: 'sqlite' (default), 'mysql', 'postgresql'.

-d, --delimiter
    Delimiter character. Defaults to ','.

-t, --tabs
    Specify that the input CSV file is delimited with tabs.

-q, --quotechar
    Quote character. Defaults to '"'.

-u, --db-uri
    Database URI to connect to. If not specified, a temporary SQLite database is created in memory.

--query
    SQL query to execute. Overrides SQL argument.

--insert
    Generate INSERT statements for the CSV data rather than executing a query.

--table
    Table name to use when generating INSERT statements or creating the table from the CSV file.

--before-insert
    SQL query to execute before each insert statement when using the --insert option.

--after-insert
    SQL query to execute after each insert statement when using the --insert option.

--no-header-row
    Do not interpret the first row as a header row.

--snifflimit
    Limit CSV sniffing to the specified number of bytes. Specify 0 to disable sniffing.

--encoding
    Specify the encoding of the input CSV file.

--locale
    Specify the locale to use for type inference, e.g., fr_FR.

--date-format
    Override the default date format. See https://arrow.readthedocs.io/en/latest/ for available formats.

--datetime-format
    Override the default datetime format. See https://arrow.readthedocs.io/en/latest/ for available formats.

--blanks
    Do not skip blank rows.

--escapechar
    Character used to escape other characters. Defaults to '\'.

--doublequote
    Whether or not double quotes are doubled within fields.

-H, --help
    Show this help message and exit.

-V, --version
    Show program's version number and exit.

DESCRIPTION

The `csvsql` command is a powerful utility from the `csvkit` suite designed to enable you to treat CSV files as relational databases. It allows you to use SQL queries (specifically, SQLite dialect) to extract, manipulate, and analyze data stored in CSV files. This command automatically infers the data types of CSV columns and generates an SQL schema (CREATE TABLE statement) based on these types. You can then run SQL queries against this virtual table, allowing for complex data analysis tasks such as filtering, aggregation, joins, and sorting. `csvsql` provides an efficient way to process CSV data using the familiar and expressive power of SQL. The standard output of the command is the resultset of the query, returned as another CSV. Input and output encodings are assumed to be UTF-8 by default, however other encodings can be set through parameters.

USAGE EXAMPLES

Example 1: Selecting specific columns from a CSV file using SQL:
csvsql --query "SELECT column1, column2 FROM filename.csv" filename.csv

Example 2: Filtering data based on a condition:
csvsql --query "SELECT * FROM filename.csv WHERE column3 > 100" filename.csv

Example 3: Generating INSERT statements for importing data into a database:
csvsql --insert --table my_table filename.csv

Example 4: Connect to a postgresql database with the URI:
csvsql --db-uri postgresql://user:password@host:port/database --query "SELECT * FROM filename.csv WHERE column3 > 100" filename.csv

TYPE INFERENCE

`csvsql` attempts to automatically infer the data types of columns in the CSV file. It examines the first few rows of the file and determines the most appropriate type for each column. This inference can be influenced by using the `--locale`, `--date-format`, and `--datetime-format` options. If type inference fails, you may need to explicitly specify the schema using a separate SQL CREATE TABLE statement.

SEE ALSO

csvlook(1), csvcut(1), csvstat(1), csvgrep(1)

Copied to clipboard