dbt
Transform data in your data warehouse
TLDR
Debug the dbt project and the connection to the database
Run all models of the project
Run all tests of example_model
Build (load seeds, run models, snapshots, and tests associated with) example_model and its downstream dependents
Build all models, except the ones with the tag not_now
Build all models with tags one and two
Build all models with tags one or two
SYNOPSIS
dbt
PARAMETERS
run
Executes dbt models to transform data.
test
Runs data quality tests defined in dbt projects.
docs generate
Generates documentation for the dbt project.
seed
Loads seed data into the data warehouse.
clean
Removes the dbt 'target' directory (where compiled SQL and logs are stored).
debug
Prints debug information about the dbt project, useful for troubleshooting.
compile
Compiles the dbt project without running the models.
run-operation
Runs a specific macro defined in the dbt project.
snapshot
Executes dbt snapshots.
--profiles-dir
Specifies the directory where the dbt 'profiles.yml' file is located.
--target
Specifies target to use from the profiles.yml
DESCRIPTION
dbt (data build tool) is a command-line tool that enables data analysts and engineers to transform data in their data warehouses more effectively by using software engineering best practices.
It allows users to write modular SQL and use Jinja templating for dynamic code generation. dbt primarily focuses on the 'Transform' step of the ELT (Extract, Load, Transform) process, operating on data already loaded into a data warehouse. It manages dependencies between transformations, handles testing and documentation, and allows for version control and collaboration in data transformation workflows. It is not a data integration or ETL tool; it assumes the data is already loaded.
CAVEATS
dbt relies on a data warehouse (e.g., Snowflake, BigQuery, Redshift) being already set up and populated with data. It's not a standalone database solution. It requires a 'profiles.yml' file configured correctly to connect to the data warehouse. dbt is case-sensitive, be aware with names.
DBT PROJECT STRUCTURE
A dbt project typically includes a 'dbt_project.yml' file defining project settings, a 'models' directory containing SQL models, a 'tests' directory for data quality tests, a 'macros' directory for reusable SQL code, and a 'seeds' directory for seed data.
JINJA TEMPLATING
dbt uses Jinja templating within SQL models, enabling dynamic code generation and parameterization. This allows you to create reusable and flexible transformations.