dbt (data build tool) is a framework that uses SQL as its base syntax for transforming analytical data. It focuses on the Transformation (T) step of ETL (Extract, Transform, Load).
DuckDB is an embeddable relational DBMS built for analytical (OLAP) query workloads. Like SQLite, DuckDB prioritizes simplicity and ease of integration by having no external dependencies at compile time or run time.

Why DuckDB? DuckDB is designed to be embedded within applications or used as a serverless database. You can integrate it directly into your data pipeline without installing or configuring a separate server.
- dbt Core
- DuckDB
- DBeaver (optional)
- Create an isolated virtual environment for dbt-core
conda create --name dbtenv python=3.11
- Activate the environment
conda activate dbtenv
- Install the DuckDB adapter
pip install dbt-duckdb
- dbt seed
- dbt run
- dbt test
- dbt docs generate
- dbt docs serve

Data source reference: https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data?select=offers.csv.gz
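For the `dbt test` step, dbt runs the tests declared in a YAML properties file next to your models. The following is only a minimal sketch: the file location `models/schema.yml`, the model name `stg_offers`, and the column name `offer` are assumptions for illustration and are not taken from this project. `unique` and `not_null` are generic tests that ship with dbt; in dbt 1.8 and later the `tests:` key is also accepted under the name `data_tests:`.

```yaml
# models/schema.yml (hypothetical location and names)
version: 2

models:
  - name: stg_offers        # assumed model name, replace with your own
    columns:
      - name: offer         # assumed key column, replace with your own
        tests:
          - unique          # fails if any value appears more than once
          - not_null        # fails if any value is NULL
```

With a file like this in place, `dbt test` reports one pass or fail result per declared test.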
The Path should be the same as the one you defined in profiles.yml; alternatively, choose Open to browse to the directory.
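For reference, a dbt-duckdb profile usually looks something like the sketch below. The profile name `my_dbt_project`, the target name `dev`, and the database file `dev.duckdb` are placeholders rather than values from this project; the profile name has to match the `profile:` entry in your `dbt_project.yml`.

```yaml
# ~/.dbt/profiles.yml (placeholder names, adjust to your project)
my_dbt_project:             # must match `profile:` in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: duckdb          # selects the dbt-duckdb adapter
      path: dev.duckdb      # DuckDB database file; use this same path in DBeaver
      threads: 1
```

A relative `path` is resolved from the directory where dbt is run, so using an absolute path can make it easier to point DBeaver at the same file.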
- Learn more about dbt in the docs
- Learn more about DuckDB in the docs
- Check out the blog for the latest news on dbt's development and best practices