dbt
Transform data in your data warehouse
TLDR
Debug the dbt project and the connection to the database
Run all models of the project
Run all tests of example_model
Build (load seeds, run models, snapshots, and tests associated with) example_model and its downstream dependents
Build all models, except the ones with the tag not_now
Build all models with tags one and two
Build all models with tags one or two
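One way to express the tasks above on the command line, sketched with dbt's node selection syntax (a trailing `+` for downstream dependents, `tag:` for tags, a comma for intersection, a space for union). The `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences, not part of dbt: by default they only print each command, so the snippet is safe to paste outside a dbt project.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1 (default), execute it otherwise.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Debug the dbt project and the connection to the database
run dbt debug

# Run all models of the project
run dbt run

# Run all tests of example_model
run dbt test --select example_model

# Build example_model and its downstream dependents (trailing + selects descendants)
run dbt build --select example_model+

# Build all models, except the ones with the tag not_now
run dbt build --exclude "tag:not_now"

# Build all models with tags one AND two (comma = intersection)
run dbt build --select "tag:one,tag:two"

# Build all models with tags one OR two (space = union)
run dbt build --select "tag:one tag:two"
```

Set `DRY_RUN=0` inside a configured dbt project to execute the commands instead of printing them.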
SYNOPSIS
dbt subcommand [options] [arguments]
dbt [--version | --help]
Examples:
dbt run
dbt test --select my_model
dbt docs generate
PARAMETERS
run
Executes the models defined in your project against the data warehouse. Seeds, snapshots, and tests have their own subcommands (seed, snapshot, test) or can be run together with build.
test
Runs tests defined in the dbt project against your data to ensure data quality and integrity.
build
Executes models, tests, snapshots, and seeds in a single command, respecting dependencies.
seed
Loads CSV files from your dbt project's seed directory ('seeds' by default; 'data' in versions before 1.0) directly into your data warehouse.
snapshot
Captures the state of a changing source table at specific points in time, useful for slowly changing dimensions.
docs
Generates documentation for your dbt project and data models, including a lineage graph ('docs generate'), and serves it locally as a website ('docs serve').
debug
Verifies connectivity to your data warehouse and checks dbt configuration settings, aiding in troubleshooting.
init
Initializes a new dbt project with a basic directory structure and essential configuration files.
compile
Compiles dbt models into executable SQL without running them against the warehouse, useful for validation.
deps
Installs dbt packages (reusable dbt projects) from your 'packages.yml' file, managing external dependencies.
clean
Deletes the directories listed under 'clean-targets' in dbt_project.yml, typically the 'target/' directory (compiled SQL and run artifacts) and the directory of installed packages.
--project-dir
Specifies the dbt project directory to use if it's not the current working directory.
--profiles-dir
Specifies the directory where the 'profiles.yml' file (containing warehouse connection details) is located.
--target
Specifies the profile target (a specific connection configuration within a profile) to use for the run.
--profile
Specifies the dbt profile (a collection of targets) to use, as defined in 'profiles.yml'.
--version
Displays the dbt version information and exits.
--help
Displays help information for the dbt command or a specific subcommand.
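The global options can be combined with any subcommand. The sketch below shows one way to point dbt at a project and a profiles directory outside the current directory and to pick a target. The paths, the `dev` target name, the `run` wrapper, and `DRY_RUN` are all assumptions for illustration; by default the commands are only printed.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1 (default), execute it otherwise.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assumed locations and target name; adjust to your own setup.
PROJECT_DIR="${PROJECT_DIR:-$HOME/analytics}"
PROFILES_DIR="${PROFILES_DIR:-$HOME/.dbt}"
TARGET="${TARGET:-dev}"

# Check configuration and connectivity for that project.
run dbt debug --project-dir "$PROJECT_DIR" --profiles-dir "$PROFILES_DIR"

# Run the models against the chosen target.
run dbt run --project-dir "$PROJECT_DIR" --profiles-dir "$PROFILES_DIR" --target "$TARGET"
```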
DESCRIPTION
dbt (data build tool) is an open-source command-line tool that enables data analysts and engineers to transform data in their data warehouse by writing SQL select statements. It treats data transformation code like software development code, allowing for modularity, version control, testing, and documentation. dbt compiles SQL and Jinja into executable SQL, creating tables and views. It helps teams establish a single source of truth for their data, improve data quality, and accelerate data delivery. Users define data models, tests, and documentation within a dbt project, and dbt handles the orchestration, dependency management, and execution of these transformations within the cloud data warehouse.
CAVEATS
dbt is not a built-in Linux command; it requires Python and typically needs to be installed via pip (e.g., pip install dbt-core dbt-
It requires a properly configured profiles.yml file for connecting to a data warehouse.
Its primary function is data transformation within a data warehouse, not general-purpose data manipulation on the Linux filesystem.
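To illustrate the profiles.yml requirement, here is a minimal sketch that writes a profile for a PostgreSQL warehouse into a scratch directory. The profile name `example_profile`, the target name `dev`, and every connection value are placeholders, and the key set assumes the dbt-postgres adapter; other adapters expect different keys.

```shell
# Write the file into a scratch directory so an existing ~/.dbt/profiles.yml is left untouched.
PROFILES_DIR="$(mktemp -d)"

# Quoted heredoc delimiter: the shell leaves {{ ... }} untouched for dbt's Jinja to resolve.
cat > "$PROFILES_DIR/profiles.yml" <<'EOF'
example_profile:          # must match the "profile" key in dbt_project.yml
  target: dev             # target used when --target is not given
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dbt_user
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: dbt_dev
      threads: 4
EOF

# Command to verify the connection with this file (printed, not executed).
echo "dbt debug --profiles-dir $PROFILES_DIR"
```

Reading the password through `env_var` keeps the secret out of the file itself.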
PROJECT STRUCTURE
A typical dbt project includes directories for:
models/ (SQL files defining data transformations),
tests/ (SQL files for singular data tests; generic tests are declared in YAML files, usually next to the models),
macros/ (Jinja macros for reusable SQL code),
seeds/ (CSV files for static data loaded directly),
snapshots/ (SQL files for capturing historical data changes),
and the dbt_project.yml configuration file. The profiles.yml file with connection details normally lives outside the project, in ~/.dbt/ by default.
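The layout above can be sketched by hand as follows. In practice `dbt init` generates a comparable skeleton, so this is only meant to show what the pieces look like. The project name, the profile name, and the trivial model are placeholders, and the path keys assume dbt 1.0 or later.

```shell
# Build the skeleton in a scratch directory so nothing existing is overwritten.
PROJECT_DIR="$(mktemp -d)/example_project"

for dir in models tests macros seeds snapshots; do
  mkdir -p "$PROJECT_DIR/$dir"
done

# Minimal project configuration (placeholder names).
cat > "$PROJECT_DIR/dbt_project.yml" <<'EOF'
name: example_project
version: "1.0.0"
config-version: 2
profile: example_profile
model-paths: ["models"]
test-paths: ["tests"]
macro-paths: ["macros"]
seed-paths: ["seeds"]
snapshot-paths: ["snapshots"]
clean-targets: ["target", "dbt_packages"]
EOF

# A model is a single SELECT statement; dbt materializes it as a view or table.
cat > "$PROJECT_DIR/models/example_model.sql" <<'EOF'
select 1 as id
EOF

ls "$PROJECT_DIR"
```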
ADAPTERS
dbt is warehouse-agnostic through the use of 'adapters'. While dbt-core provides the main functionality, a specific adapter (e.g., dbt-snowflake, dbt-bigquery, dbt-redshift, dbt-postgres) is needed to connect and interact with a particular data warehouse. These adapters are installed separately via pip.
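As a sketch of the install step, the snippet below prints a pip command that pairs dbt-core with one of the adapters named above. The `adapter_install_cmd` helper and the `ADAPTER` variable are hypothetical and only print the command, so nothing is installed until you run the printed line yourself.

```shell
# Hypothetical helper: print (not run) the pip command for a given adapter package.
adapter_install_cmd() {
  printf 'python -m pip install dbt-core %s\n' "$1"
}

# Adapter names come from the list above; pick the one matching your warehouse.
ADAPTER="${ADAPTER:-dbt-postgres}"
adapter_install_cmd "$ADAPTER"
```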
HISTORY
dbt was created by Fishtown Analytics (now dbt Labs) and open-sourced in 2016. It emerged from a need to apply software engineering best practices to analytics code. Its adoption grew rapidly due to its focus on SQL, modularity, testing, and documentation, democratizing data transformation for analysts who were comfortable with SQL but not necessarily complex programming languages. dbt Labs continues to drive its development, alongside a vibrant open-source community, making it a cornerstone of the modern data stack.