LinuxCommandLibrary

dbt

Transform data in your data warehouse

TLDR

Debug the dbt project and the connection to the database

$ dbt debug
copy

Run all models of the project
$ dbt run
copy

Run all tests of example_model
$ dbt test --select example_model
copy

Build (load seeds, run models, snapshots, and tests associated with) example_model and its downstream dependents
$ dbt build --select example_model+
copy

Build all models, except the ones with the tag not_now
$ dbt build --exclude "tag:not_now"
copy

Build all models with tags one and two
$ dbt build --select "tag:one,tag:two"
copy

Build all models with tags one or two
$ dbt build --select "tag:one tag:two"
copy

SYNOPSIS

dbt [options]

PARAMETERS

run
    Executes dbt models to transform data.

test
    Runs data quality tests defined in dbt projects.

docs generate
    Generates documentation for the dbt project.

seed
    Loads seed data into the data warehouse.

clean
    Removes the dbt 'target' directory (where compiled SQL and logs are stored).

debug
    Prints debug information about the dbt project, useful for troubleshooting.

compile
    Compiles the dbt project without running the models.

run-operation
    Runs a specific macro defined in the dbt project.

snapshot
    Executes dbt snapshots.

--profiles-dir
    Specifies the directory where the dbt 'profiles.yml' file is located.

--target
    Specifies target to use from the profiles.yml

DESCRIPTION

dbt (data build tool) is a command-line tool that enables data analysts and engineers to transform data in their data warehouses more effectively by using software engineering best practices.

It allows users to write modular SQL and use Jinja templating for dynamic code generation. dbt primarily focuses on the 'Transform' step of the ELT (Extract, Load, Transform) process, operating on data already loaded into a data warehouse. It manages dependencies between transformations, handles testing and documentation, and allows for version control and collaboration in data transformation workflows. It is not a data integration or ETL tool; it assumes the data is already loaded.

CAVEATS

dbt relies on a data warehouse (e.g., Snowflake, BigQuery, Redshift) being already set up and populated with data. It's not a standalone database solution. It requires a 'profiles.yml' file configured correctly to connect to the data warehouse. dbt is case-sensitive, be aware with names.

DBT PROJECT STRUCTURE

A dbt project typically includes a 'dbt_project.yml' file defining project settings, a 'models' directory containing SQL models, a 'tests' directory for data quality tests, a 'macros' directory for reusable SQL code, and a 'seeds' directory for seed data.

JINJA TEMPLATING

dbt uses Jinja templating within SQL models, enabling dynamic code generation and parameterization. This allows you to create reusable and flexible transformations.

Copied to clipboard