Demo for running an analytics pipeline locally using Apache Airflow to schedule and run dbt jobs.
We're using Google Cloud Platform (GCP) and BigQuery as a data warehouse for the purposes of this demo, but if you're familiar with another public cloud or warehouse, you can substitute where applicable (i.e. dbt-bigquery
-> dbt-snowflake
).
dbt models can be found in the analytics directory. Add .sql
and .yml
files as shown in the models directory to create tables or views in the warehouse using this pipeline.
The below instructions will walk you through deploying Airflow in a local Docker container. This was chosen for easy setup and development, but a cloud hosted Airflow instance is necessary for a production-level solution. Each major public cloud provider supports Airflow hosting.
Clone the repo and open a terminal from the cloned directory
git clone https://github.com/chelseybeck/airflow-dbt-demo.git
-
GCP Service Account (+ bigquery-key.json) with the following permissions:
- BigQuery Data Editor
- BigQuery Data Viewer
- Bigquery Job User
The service account can be created manually in the GCP UI or locally using the Terraform module in the
terraform
directory - see details
cp .env.example .env
Open .env
and update the GCP_PROJECT
with your project ID
Build the Docker image
DOCKER_BUILDKIT=0 sudo docker build -t airflow-dbt-demo .
Run the new container - change local-path-to-key
to your local path
sudo docker run --env-file .env \
-p 8080:8080 \
-v /local-path-to-key/bigquery-key.json:/app/bigquery-key.json \
--name airflow-dbt-demo-container \
airflow-dbt-demo
Create Airflow admin user
docker exec -it airflow-dbt-demo-container airflow users create \
--username gstacy \
--firstname Gwen \
--lastname Stacy \
--role Admin \
--email gstacy@spiderverse.com
Access the Airflow UI and login with admin user credentials
Turn on the spotify_ingestion_dag
(which will trigger it to run) and once it completes, turn on the daily_dbt_dag
.
You should now see two new datasets in BQ:
raw_ingestion
- contains the raw spotify table spotify_top_2023_metadata
dev_spotify_derived
- contains the dbt models as defined in the models directory
Stop the existing container (if running)
docker stop airflow-dbt-demo-container
Remove the existing container
docker rm airflow-dbt-demo-container