LinuxCommandLibrary

dbt

Transform data in your data warehouse

TLDR

Debug the dbt project and the connection to the database

$ dbt debug

Run all models of the project
$ dbt run

Run all tests of example_model
$ dbt test --select example_model

Build example_model and its downstream dependents (load seeds, run models and snapshots, and run the associated tests)
$ dbt build --select example_model+

Build all models, except the ones with the tag not_now
$ dbt build --exclude "tag:not_now"

Build all models with tags one and two
$ dbt build --select "tag:one,tag:two"

Build all models with tags one or two
$ dbt build --select "tag:one tag:two"
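The selection examples above rely on three rules: a trailing + adds downstream dependents, a comma intersects criteria, and a space unions them. The following is a toy model of those rules, not dbt's actual selector implementation. The graph, the model names other than example_model, and the helper functions are all hypothetical.

```python
# Toy model of dbt node selection. The graph and every helper here are
# hypothetical; dbt's real selector supports many more methods and operators.
# Each model maps to (tags, parents).
MODELS = {
    "stg_orders": ({"one"}, []),
    "example_model": ({"one", "two"}, ["stg_orders"]),
    "orders_daily": ({"two"}, ["example_model"]),
    "orders_report": ({"not_now"}, ["orders_daily"]),
}


def by_tag(tag):
    """Return the names of all models carrying the given tag."""
    return {name for name, (tags, _) in MODELS.items() if tag in tags}


def with_descendants(name):
    """Return the model plus everything downstream of it (trailing '+')."""
    selected = {name}
    frontier = [name]
    while frontier:
        current = frontier.pop()
        for child, (_, parents) in MODELS.items():
            if current in parents and child not in selected:
                selected.add(child)
                frontier.append(child)
    return selected


def resolve(part):
    """Resolve one criterion: 'tag:x', 'model+', or a plain model name."""
    if part.startswith("tag:"):
        return by_tag(part[len("tag:"):])
    if part.endswith("+"):
        return with_descendants(part[:-1])
    return {part} if part in MODELS else set()


def select(spec):
    """Union the space-separated terms; intersect the comma-separated parts."""
    result = set()
    for term in spec.split():
        parts = [resolve(part) for part in term.split(",")]
        result |= set.intersection(*parts)
    return result


if __name__ == "__main__":
    print(sorted(select("tag:one,tag:two")))  # intersection
    print(sorted(select("tag:one tag:two")))  # union
    print(sorted(select("example_model+")))   # model plus descendants
    print(sorted(set(MODELS) - select("tag:not_now")))  # like --exclude
```

In this sketch, --exclude is modeled as a plain set difference against everything that would otherwise be selected.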

SYNOPSIS

dbt subcommand [options] [arguments]
dbt [--version | --help]

Examples:
dbt run
dbt test --select my_model
dbt docs generate
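
When dbt is driven from a script, the synopsis maps naturally onto an argument list. The helper below is a minimal sketch of that mapping; dbt_command is a hypothetical name, and it uses only flags listed on this page. It does not check that a given subcommand accepts a given option (for example, whether --select makes sense for debug).

```python
import shlex


def dbt_command(subcommand, *args, select=None, exclude=None,
                project_dir=None, profiles_dir=None, profile=None, target=None):
    """Return an argv list of the form: dbt subcommand [arguments] [options]."""
    argv = ["dbt", subcommand, *args]
    options = [
        ("--select", select),
        ("--exclude", exclude),
        ("--project-dir", project_dir),
        ("--profiles-dir", profiles_dir),
        ("--profile", profile),
        ("--target", target),
    ]
    for flag, value in options:
        # Only emit options the caller actually set.
        if value is not None:
            argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    # shlex.join shows the command as it would be typed in a shell.
    print(shlex.join(dbt_command("run")))
    print(shlex.join(dbt_command("test", select="my_model")))
    print(shlex.join(dbt_command("docs", "generate")))
```

On a machine where dbt is installed, the returned list could be passed to subprocess.run(argv, check=True). Passing a list rather than a single string avoids shell quoting problems with selectors that contain spaces.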

PARAMETERS

run
    Executes the models defined in your project against the data warehouse. Seeds and snapshots have their own subcommands (seed, snapshot), or use build to run everything together.

test
    Runs tests defined in the dbt project against your data to ensure data quality and integrity.

build
    Executes models, tests, snapshots, and seeds in a single command, respecting dependencies.

seed
    Loads CSV files from your dbt project's seed directory ('seeds/' by default; 'data/' in versions before 1.0) into your data warehouse.

snapshot
    Captures the state of a changing source table at specific points in time, useful for slowly changing dimensions.

docs
    Generates and serves comprehensive documentation for your dbt project and data models, including lineage.

debug
    Verifies connectivity to your data warehouse and checks dbt configuration settings, aiding in troubleshooting.

init
    Initializes a new dbt project with a basic directory structure and essential configuration files.

compile
    Compiles dbt models into executable SQL without running them against the warehouse, useful for validation.

deps
    Installs dbt packages (reusable dbt projects) from your 'packages.yml' file, managing external dependencies.

clean
    Deletes the directories listed under 'clean-targets' in dbt_project.yml, typically the 'target/' directory of compiled files and the installed packages directory.

--project-dir
    Specifies the dbt project directory to use if it's not the current working directory.

--profiles-dir
    Specifies the directory where the 'profiles.yml' file (containing warehouse connection details) is located.

--target
    Specifies the profile target (a specific connection configuration within a profile) to use for the run.

--profile
    Specifies the dbt profile (a collection of targets) to use, as defined in 'profiles.yml'.

--version
    Displays the dbt version information and exits.

--help
    Displays help information for the dbt command or a specific subcommand.
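
The relationship between --profile and --target can be pictured as a two-level lookup: a profile holds a default target name plus a set of named outputs, and --target overrides the default. The sketch below models that lookup with a plain dictionary shaped like profiles.yml. The profile name, schemas, and the resolve_target helper are hypothetical, and this is not how dbt itself loads profiles.

```python
# Dictionary shaped like a profiles.yml file. All names and values are
# hypothetical placeholders.
PROFILES = {
    "example_project": {
        "target": "dev",  # default target when --target is not given
        "outputs": {
            "dev": {"type": "postgres", "schema": "dbt_dev"},
            "prod": {"type": "postgres", "schema": "analytics"},
        },
    }
}


def resolve_target(profiles, profile, target=None):
    """Return (target_name, connection_settings) for a profile.

    `profile` plays the role of --profile and `target` the role of --target.
    """
    if profile not in profiles:
        raise ValueError(f"profile {profile!r} not found")
    entry = profiles[profile]
    name = target if target is not None else entry["target"]
    if name not in entry["outputs"]:
        raise ValueError(f"target {name!r} not found in profile {profile!r}")
    return name, entry["outputs"][name]


if __name__ == "__main__":
    print(resolve_target(PROFILES, "example_project"))
    print(resolve_target(PROFILES, "example_project", target="prod"))
```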

DESCRIPTION

dbt (data build tool) is an open-source command-line tool that lets data analysts and engineers transform data in their data warehouse by writing SQL select statements. It treats transformation code like software: modular, version-controlled, tested, and documented. dbt compiles SQL mixed with Jinja into executable SQL and materializes the results as tables and views. This helps teams establish a single source of truth for their data, improve data quality, and accelerate data delivery. Users define models, tests, and documentation within a dbt project, and dbt handles the orchestration, dependency management, and execution of these transformations inside the warehouse.
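
To illustrate the compile step, the sketch below replaces dbt's ref() calls with a schema-qualified table name and records which models were referenced, which is the information dependency ordering is built from. It is a deliberately simplified stand-in that uses a regular expression instead of Jinja. The schema name, the model names, and the compile_sql helper are hypothetical, and real dbt output is adapter-specific and typically quoted.

```python
import re

# Matches {{ ref('model_name') }} with single or double quotes.
REF = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def compile_sql(sql, schema="analytics"):
    """Return (compiled_sql, dependencies) for a model's source text.

    Toy substitute for dbt's Jinja rendering: each ref() becomes
    '<schema>.<model>', and the referenced model names are collected.
    """
    dependencies = REF.findall(sql)
    compiled = REF.sub(lambda match: f"{schema}.{match.group(1)}", sql)
    return compiled, dependencies


if __name__ == "__main__":
    source = "select order_id, amount from {{ ref('stg_orders') }} where amount > 0"
    compiled, dependencies = compile_sql(source)
    print(compiled)
    print(dependencies)
```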

CAVEATS

dbt is not a built-in Linux command; it requires Python and typically needs to be installed via pip (e.g., pip install dbt-core dbt-).
It requires a properly configured profiles.yml file for connecting to a data warehouse.
Its primary function is data transformation within a data warehouse, not general-purpose data manipulation on the Linux filesystem.

PROJECT STRUCTURE

A typical dbt project includes directories for:
models/ (SQL files defining data transformations),
tests/ (SQL files for custom data quality checks; generic tests are usually declared in YAML files alongside the models),
macros/ (Jinja macros for reusable SQL code),
seeds/ (CSV files for static data loaded directly),
snapshots/ (SQL files for capturing historical data changes),
and the dbt_project.yml configuration file. Connection settings live in profiles.yml, which is usually kept outside the project, in ~/.dbt/ by default.
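
The layout above can be sketched as a small scaffolding script. This is only an illustration of the directory structure, not a replacement for dbt init. The project name and the scaffold helper are hypothetical, and the dbt_project.yml written here contains only a minimal set of keys.

```python
import tempfile
from pathlib import Path

# Directories named in the list above.
SUBDIRS = ("models", "tests", "macros", "seeds", "snapshots")

# Minimal project file; 'example_project' is a hypothetical name.
PROJECT_YML = """\
name: example_project
version: "1.0.0"
config-version: 2
profile: example_project  # must match a profile name in profiles.yml
"""


def scaffold(root):
    """Create the skeleton under `root` and return the sorted entry names."""
    root = Path(root)
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    (root / "dbt_project.yml").write_text(PROJECT_YML)
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    # Use a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        for name in scaffold(tmp):
            print(name)
```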

ADAPTERS

dbt is warehouse-agnostic through the use of 'adapters'. While dbt-core provides the main functionality, a specific adapter (e.g., dbt-snowflake, dbt-bigquery, dbt-redshift, dbt-postgres) is needed to connect and interact with a particular data warehouse. These adapters are installed separately via pip.
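
Because adapters are ordinary pip packages, one way to see which of them are present in the current Python environment is to query the installed package metadata. The sketch below checks only the four adapters named above; installed_versions is a hypothetical helper, and it reports None for anything that is not installed.

```python
from importlib import metadata

# Adapter package names taken from the paragraph above.
ADAPTERS = ("dbt-snowflake", "dbt-bigquery", "dbt-redshift", "dbt-postgres")


def installed_versions(packages=ADAPTERS):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for package in packages:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


if __name__ == "__main__":
    for package, version in installed_versions().items():
        print(f"{package}: {version or 'not installed'}")
```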

HISTORY

dbt was created by Fishtown Analytics (now dbt Labs) and open-sourced in 2016. It emerged from a need to apply software engineering best practices to analytics code. Its adoption grew rapidly due to its focus on SQL, modularity, testing, and documentation, democratizing data transformation for analysts who were comfortable with SQL but not necessarily complex programming languages. dbt Labs continues to drive its development, alongside a vibrant open-source community, making it a cornerstone of the modern data stack.
