dbt Modeling
In today's data-driven business environment, companies need efficient and flexible systems for data analysis and decision-making. Data warehouses play a vital role in managing, storing, and analyzing large datasets within an organization. However, traditional data warehouse modeling techniques often fall short in meeting the demands for rapid iteration and flexibility. Enter dbt (Data Build Tool), a modern data transformation tool specifically designed for data modeling, testing, and deployment within data warehouses.
By combining the data modeling capabilities of dbt with the storage advantages of Tacnode, you can create a robust, flexible, and easily maintainable data warehouse ecosystem. This article explores how to use dbt and Tacnode for data warehouse modeling, from building data pipelines to executing data transformations.
Preparation
- Install Python 3 by visiting Python to download or use your system package manager for installation.
- Install Git by going to Git for downloading or use your system package manager.
- Download VS Code Editor.
- Install dbt and set up the project.
Install dbt
Open your terminal and run pip3 install dbt-core dbt-postgresql
dbt Modeling
Concepts
schema.yml
: Defines metadata for the table, including description, tags, and test sets.- Seed: Used to load CSV files and can import external dimension tables or small test datasets.
- Model: Conducted by writing an SQL template.
source
function establishes a relationship between the model and the source for lineage management, e.g.,select * from {{ source('public', 'et') }};
.ref
function associates relationships between different models for lineage management, e.g.,select * from {{ ref('base_model') }}
.
- Source: Defines the source table; table structure definition and data maintenance are not completed in dbt.
schema.yml example
For detailed configuration, refer to dbt schema
dbt lineage graph management example
- Lineage diagrams are automatically generated by modules linked through the
source
andref
functions.
Modeling Practice
Initialize the dbt project
Open a terminal and execute
Execute Built-in Modeling and Testing
Ran dbt test1
and found a test error report showing a failure in the not_null_my_first_dbt_model_id
test (located in models/example/schema.yml).
Upon reviewing the models/example/schema.yml
file, I noticed that the model my_first_dbt_model
implements the not_null
test. In models/example/my_first_dbt_model.sql
, line 16 generates a NULL
value.
To resolve this issue, modify the statement -- where id is not null
on line 27 of models/example/my_first_dbt_model.sql
to where id is not null
.
After making this adjustment, run dbt run; dbt test
again and confirm that the issue has been fixed.
Upload CSV data via seed
Download the file country_codes.csv, place it in the seeds
folder, and then run dbt seed
.
Access Tacnode using psql to examine country_code
.
Creating Custom Tests
For instance, consider country_codes
. Begin by creating a file named seeds/schema.yml
and paste the content below.
Generate a tailored SQL test template by creating a file named tests/generic/count_between.sql
and pasting the content below.
Executing the test dbt test --select country_codes
Define the source table and begin initial modeling
- Download the external table ods_orders.sql.gz, unzip it, and execute the import command:
psql ...connection_string... -f ods_orders.sql
- Edit
models/erp/schema.yml
to add a description undersources
.
- In cases of dirty data, the condition for filtering valid data is
where sales are not null and sales > 0
. - The cleaned model is named
dw_orders
. Create a file namedmodels/erp/dw_orders.sql
and paste the following content:
- Run
dbt run --select erp
to execute a new model and observe its results.
Model Documentation
- Create Documentation
dbt docs generate
The produced documents are located in thetarget
directory. To enable document access, static sites can be deployed usinghttpd
likengnix
. CI allows for automatic version updates. - View locally using the web
dbt docs serve
Appendix
- Obtain the project file example.tgz and review the modification records for each step using git.
- To run the project, update the profiles.yml file by adding the following content:
profiles.yml
file path:- macOS, Linux:
$HOME/.dbt/profiles.yml
- Windows:
c:/users/$HOME/.dbt/profiles.yml
- macOS, Linux: