
bq

Interact with Google BigQuery

TLDR

Run a query against a BigQuery table using standard SQL (add the --dry_run flag to estimate the number of bytes the query would read)
$ bq query --nouse_legacy_sql 'SELECT COUNT(*) FROM [dataset_name].[table_name]'

Run a parameterized query
$ bq query --use_legacy_sql=false --parameter='ts_value:TIMESTAMP:2016-12-07 08:00:00' 'SELECT TIMESTAMP_ADD(@ts_value, INTERVAL 1 HOUR)'

Create a new dataset or table in the US location
$ bq mk --location=US [dataset_name].[table_name]

List all datasets in a project
$ bq ls --filter labels.[key]:[value] --max_results [integer] --format=prettyjson --project_id [project_id]

Batch-load data from a file (CSV, newline-delimited JSON, Parquet, or Avro) into a table
$ bq load --location [location] --source_format [CSV|NEWLINE_DELIMITED_JSON|PARQUET|AVRO] [dataset].[table] [path_to_source]

Copy one table to another
$ bq cp [dataset].[old_table] [dataset].[new_table]

Display help
$ bq help

SYNOPSIS

bq [GLOBAL_FLAGS] command [command_args]
bq --version
bq help [command]

PARAMETERS

--project_id PROJECT_ID
    Specifies the Google Cloud project ID to use for the command. Overrides the default configured project.

--dataset_id DATASET_ID
    Sets the default dataset for operations that don't explicitly specify one.

--location LOCATION
    Specifies the BigQuery dataset location (e.g., 'US', 'europe-west2').

--format FORMAT
    Output format for data (e.g., 'json', 'csv', 'pretty', 'none').

--dry_run
    Checks a query or load job without actually running it or incurring costs.
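    For example, a hedged sketch pairing --dry_run with the standard SQL query from the TLDR section (placeholder names are illustrative):
    $ bq query --nouse_legacy_sql --dry_run 'SELECT COUNT(*) FROM [dataset_name].[table_name]'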

--max_rows COUNT
    Specifies the maximum number of rows to return from a query or table data command.
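    For instance, a hedged sketch combining --max_rows with the head subcommand to preview a handful of rows (placeholders follow the document's convention):
    $ bq head --max_rows 10 [dataset].[table]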

--quiet
    Suppresses informational messages, showing only results or errors.

--verbosity LEVEL
    Sets the verbosity level for logging (e.g., 'debug', 'info', 'warning', 'error', 'critical').

--synchronous_mode
    Waits for asynchronous jobs (e.g., queries, loads) to complete before exiting.
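    As a hedged illustration, the negated form submits the job and returns immediately, printing a job ID that can be inspected later:
    $ bq --nosync query --nouse_legacy_sql 'SELECT COUNT(*) FROM [dataset].[table]'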

DESCRIPTION

The bq command is the command-line interface (CLI) tool for Google BigQuery, a fully managed, serverless, and highly scalable data warehouse. It allows users to interact with BigQuery directly from their terminal, managing datasets, tables, views, and jobs. Users can run SQL queries, load data into tables from various sources (e.g., Google Cloud Storage, local files), export query results, and perform administrative tasks like managing access control and monitoring job status. While it operates on Linux, it is not a native Linux utility but rather part of the Google Cloud SDK, requiring installation and authentication to a Google Cloud project. It serves as a powerful and flexible alternative to the BigQuery web UI, enabling scripting and automation of data warehouse operations.
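As a hedged sketch of the export path mentioned above, the extract subcommand writes a table to a Cloud Storage bucket (the bucket, file, and format values are illustrative placeholders):
$ bq extract --destination_format CSV [dataset].[table] gs://[bucket]/[file].csv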

CAVEATS

The bq command requires the Google Cloud SDK to be installed and configured with appropriate authentication credentials for a Google Cloud project. Operations performed using bq consume BigQuery resources and may incur costs according to Google Cloud's pricing. It's crucial to understand BigQuery's data processing and storage costs. Some commands may create asynchronous jobs, and their completion status needs to be monitored separately unless --synchronous_mode is used. The command's functionality is entirely dependent on the availability and performance of the BigQuery service.
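For asynchronous jobs, a hedged sketch of checking on and waiting for a job by its ID (the job ID placeholder is illustrative):
$ bq show -j [job_id]
$ bq wait [job_id]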

COMMON SUBCOMMANDS

The bq tool operates via subcommands, each tailored for specific BigQuery operations. Common subcommands include:
bq query SQL_QUERY: Executes a SQL query against BigQuery.
bq load DATASET.TABLE SOURCE_URI SCHEMA: Loads data into a table from a specified source.
bq mk RESOURCE_TYPE ID: Creates a new dataset, table, or view.
bq ls [DATASET_ID]: Lists datasets or tables within a dataset.
bq rm RESOURCE_TYPE ID: Deletes a dataset, table, or view.
bq show RESOURCE_TYPE ID: Displays information about a dataset, table, or job.
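Hedged usage sketches for two of these subcommands, with placeholder names following the document's convention:
$ bq show --schema --format=prettyjson [dataset].[table]
$ bq rm -r -f [dataset]
The --schema flag limits show to the table schema, while -r and -f remove a dataset recursively without prompting.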

AUTHENTICATION

Before using bq, you must authenticate to Google Cloud. This is typically done using the gcloud auth login command, which opens a browser for interactive login. Alternatively, service accounts can be used for automated scripts via gcloud auth activate-service-account.
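A hedged sketch of both approaches (the account and key-file values are placeholders):
$ gcloud auth login
$ gcloud auth activate-service-account [service_account_email] --key-file=[key_file]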

HISTORY

The bq command is an integral part of the Google Cloud SDK, which was first released in 2013 to provide a unified command-line experience for interacting with Google Cloud services. As BigQuery evolved from its initial beta release in 2010 into a fully managed enterprise-grade data warehouse, the bq tool has been continually updated to support new features, improve performance, and offer a comprehensive interface for data analysts, engineers, and administrators. Its development has focused on providing robust scripting capabilities and seamless integration with BigQuery's growing ecosystem of features, including streaming inserts, data manipulation language (DML), and advanced query capabilities.

SEE ALSO

gcloud(1), gsutil(1), psql(1), mysql(1)
