csvsql
Execute SQL queries against CSV files
TLDR
Generate a CREATE TABLE SQL statement for a CSV file
Import a CSV file into an SQL database
Run an SQL query on a CSV file
SYNOPSIS
csvsql [OPTIONS] [SQL]
PARAMETERS
-i, --dialect
SQL dialect. Options: 'sqlite' (default), 'mysql', 'postgresql'.
-d, --delimiter
Delimiter character. Defaults to ','.
-t, --tabs
Specify that the input CSV file is delimited with tabs.
-q, --quotechar
Quote character. Defaults to '"'.
-u, --db-uri
Database URI to connect to. If not specified, a temporary SQLite database is created in memory.
--query
SQL query to execute. Overrides SQL argument.
--insert
Generate INSERT statements for the CSV data rather than executing a query.
--table
Table name to use when generating INSERT statements or creating the table from the CSV file.
--before-insert
SQL query to execute before each insert statement when using the --insert option.
--after-insert
SQL query to execute after each insert statement when using the --insert option.
--no-header-row
Do not interpret the first row as a header row.
--snifflimit
Limit CSV sniffing to the specified number of bytes. Specify 0 to disable sniffing.
--encoding
Specify the encoding of the input CSV file.
--locale
Specify the locale to use for type inference, e.g., fr_FR.
--date-format
Override the default date format. See https://arrow.readthedocs.io/en/latest/ for available formats.
--datetime-format
Override the default datetime format. See https://arrow.readthedocs.io/en/latest/ for available formats.
--blanks
Do not skip blank rows.
--escapechar
Character used to escape other characters. Defaults to '\'.
--doublequote
Whether or not double quotes are doubled within fields.
-H, --help
Show this help message and exit.
-V, --version
Show program's version number and exit.
DESCRIPTION
The `csvsql` command is a powerful utility from the `csvkit` suite designed to enable you to treat CSV files as relational databases. It allows you to use SQL queries (specifically, SQLite dialect) to extract, manipulate, and analyze data stored in CSV files. This command automatically infers the data types of CSV columns and generates an SQL schema (CREATE TABLE statement) based on these types. You can then run SQL queries against this virtual table, allowing for complex data analysis tasks such as filtering, aggregation, joins, and sorting. `csvsql` provides an efficient way to process CSV data using the familiar and expressive power of SQL. The standard output of the command is the resultset of the query, returned as another CSV. Input and output encodings are assumed to be UTF-8 by default, however other encodings can be set through parameters.
USAGE EXAMPLES
Example 1: Selecting specific columns from a CSV file using SQL:
csvsql --query "SELECT column1, column2 FROM filename.csv" filename.csv
Example 2: Filtering data based on a condition:
csvsql --query "SELECT * FROM filename.csv WHERE column3 > 100" filename.csv
Example 3: Generating INSERT statements for importing data into a database:
csvsql --insert --table my_table filename.csv
Example 4: Connect to a postgresql database with the URI:
csvsql --db-uri postgresql://user:password@host:port/database --query "SELECT * FROM filename.csv WHERE column3 > 100" filename.csv
TYPE INFERENCE
`csvsql` attempts to automatically infer the data types of columns in the CSV file. It examines the first few rows of the file and determines the most appropriate type for each column. This inference can be influenced by using the `--locale`, `--date-format`, and `--datetime-format` options. If type inference fails, you may need to explicitly specify the schema using a separate SQL CREATE TABLE statement.