LinuxCommandLibrary

bq

Interact with Google BigQuery

TLDR

Run a query against a BigQuery table using standard SQL; add the --dry_run flag to estimate the number of bytes the query would read

$ bq query --nouse_legacy_sql 'SELECT COUNT(*) FROM [dataset_name].[table_name]'
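
Estimate the cost of the same query with a dry run, without executing it (a sketch reusing the placeholders above)
$ bq query --nouse_legacy_sql --dry_run 'SELECT COUNT(*) FROM [dataset_name].[table_name]'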

Run a parameterized query
$ bq query --use_legacy_sql=false --parameter='ts_value:TIMESTAMP:2016-12-07 08:00:00' 'SELECT TIMESTAMP_ADD(@ts_value, INTERVAL 1 HOUR)'

Create a new dataset or table in the US location
$ bq mk --location=US [dataset_name].[table_name]
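
Create a table with an explicit inline schema (a sketch; the dataset, table, and field definitions are placeholders)
$ bq mk --table [dataset_name].[table_name] name:STRING,value:INTEGER,created_at:TIMESTAMP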

List all datasets in a project
$ bq ls --filter labels.[key]:[value] --max_results [integer] --format=prettyjson --project_id [project_id]

Batch load data from a file in a supported format (CSV, newline-delimited JSON, Parquet, Avro) into a table
$ bq load --location [location] --source_format [CSV|NEWLINE_DELIMITED_JSON|PARQUET|AVRO] [dataset].[table] [path/to/source]
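
Let BigQuery infer the schema while loading (a sketch; the table and Cloud Storage path are placeholders)
$ bq load --autodetect --source_format CSV [dataset].[table] gs://[bucket]/[file].csv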

Copy one table to another
$ bq cp [dataset].[old_table] [dataset].[new_table]

Display help
$ bq help

SYNOPSIS

bq [--format=<format>] [--location=<location>] [--project_id=<PROJECT_ID>] <command> [<args>...]
Common commands: query, ls, show, load, extract

PARAMETERS

--configuration=<name>
    Use named configuration properties

--dry_run
    Estimate the number of bytes processed without running the query

--format=<none|json|prettyjson|csv|sparse|pretty>
    Output format (default: pretty, a formatted table)

--help or -h or -?
    Show help for command or flag

--location=<location>
    Default location (e.g., US, EU)

--max_results=<integer>
    Max rows to return

--nosync
    Create the job and return immediately; don't wait for completion

--project_id or -p <PROJECT_ID>
    Default GCP project

--quiet or -q
    Suppress status updates while jobs are running

--use_legacy_sql={true|false}
    SQL dialect; bq query defaults to legacy SQL, so pass --use_legacy_sql=false (or --nouse_legacy_sql) for standard SQL

--apilog=<PATH>
    Log API calls to file

--format_time={boolean}
    Format timestamp outputs

DESCRIPTION

bq is the command-line interface for Google Cloud BigQuery, a serverless data warehouse for analyzing massive datasets using SQL queries.

It enables users to manage datasets, tables, jobs, routines, and models; load and export data; run queries; and script workflows. Installed via the Google Cloud SDK (gcloud components install bq), it supports interactive use and automation in pipelines.

Key capabilities include querying petabyte-scale data, scheduling queries, copying tables across projects, extracting to Cloud Storage, and integrating with other GCP services. Output formats like JSON, CSV, and tables aid scripting with tools like jq or spreadsheets.
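
For instance, JSON output can be piped to jq in a script (a minimal sketch; the project, dataset, table, and field names are placeholders, and jq must be installed):
$ bq query --format=json --nouse_legacy_sql 'SELECT name FROM `[project].[dataset].[table]` LIMIT 5' | jq -r '.[].name'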

bq handles authentication via Application Default Credentials, respects project quotas and locations (e.g., US, EU), and provides dry runs for cost estimation. It suits data engineers, analysts, and DevOps engineers running ETL, BI, and ML workflows without depending on the web UI.

CAVEATS

Requires the Google Cloud SDK and authentication (gcloud auth login); subject to BigQuery quotas, billing, and regional availability.
Not installed by default on Linux; install it with gcloud components install bq. Large queries may time out or incur costs.

AUTHENTICATION

Run gcloud auth application-default login to set credentials.
Use --impersonate_service_account for service accounts.
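
A typical first-time setup looks like this (a sketch; [project_id] is a placeholder):
$ gcloud auth login
$ gcloud config set project [project_id]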

COMMON EXAMPLES

$ bq query --use_legacy_sql=false 'SELECT * FROM `project.dataset.table` LIMIT 10'
$ bq ls --project_id=myproject mydataset
$ bq load mydataset.mytable gs://bucket/file.csv schema.json
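
The show and extract commands from the synopsis follow the same pattern (a sketch; table and bucket names are placeholders):
$ bq show --format=prettyjson mydataset.mytable
$ bq extract mydataset.mytable gs://bucket/export.csv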

HISTORY

Introduced by Google in 2011 with BigQuery public preview; evolved with Cloud SDK integration (~2015). Supports BigQuery v2 API; active development tracks features like BI Engine, slots, and federated queries.

SEE ALSO

gcloud(1), gsutil(1), bqstat(1)
