Data Engineering Best Practices

Code for the blog post Data Engineering Best Practices - #1. Data flow & Code.

Project

Assume we are extracting customer and order information from upstream sources and creating a daily report of the number of orders.
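As a rough illustration of the report described above, here is a minimal Python sketch that counts orders per day. The sample rows, the field names (`customer_id`, `order_id`, `order_date`), and the choice to drop orders from unknown customers are all assumptions made for this example; they are not taken from the project's code.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows; the field names are assumptions for illustration.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "order_date": date(2023, 1, 1)},
    {"order_id": 11, "customer_id": 2, "order_date": date(2023, 1, 1)},
    {"order_id": 12, "customer_id": 1, "order_date": date(2023, 1, 2)},
    {"order_id": 13, "customer_id": 99, "order_date": date(2023, 1, 2)},  # unknown customer
]


def daily_order_report(customers, orders):
    """Count orders per day, keeping only orders placed by known customers."""
    known_ids = {c["customer_id"] for c in customers}
    counts = Counter(
        o["order_date"] for o in orders if o["customer_id"] in known_ids
    )
    # Return rows sorted by date so the report is stable across runs.
    return sorted(counts.items())


report = daily_order_report(customers, orders)
for day, num_orders in report:
    print(day.isoformat(), num_orders)
```

The real pipeline runs on Spark tables, so treat this only as a picture of the input and output shapes.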

Setup

If you'd like to code along, you'll need the following.

Prerequisites:

git version >= 2.37.1
Docker version >= 20.10.17 and Docker Compose v2 version >= v2.10.2. Make sure that Docker is running by checking that docker ps succeeds.
pgcli
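One way to check the minimum versions listed above is to compare the version strings that each tool prints. The sketch below is a small helper for that comparison. The function names and the sample version strings are hypothetical; in practice you would pass in whatever `git --version`, `docker --version`, and `docker compose version` print on your machine.

```python
import re

# Minimum versions taken from the prerequisites list above.
MINIMUMS = {"git": "2.37.1", "docker": "20.10.17", "docker compose": "2.10.2"}


def parse_version(text):
    """Extract the first dotted numeric version (e.g. 2.37.1) from a string."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version_output, minimum):
    """Return True if the version found in version_output is >= minimum."""
    return parse_version(version_output) >= parse_version(minimum)


# Sample strings standing in for real command output (assumed, not captured).
print(meets_minimum("git version 2.39.2", MINIMUMS["git"]))
print(meets_minimum("Docker version 20.10.12, build e91ed57", MINIMUMS["docker"]))
print(meets_minimum("Docker Compose version v2.10.2", MINIMUMS["docker compose"]))
```

Versions are compared as tuples of integers, so 2.10.2 correctly ranks above 2.9.9, which a plain string comparison would get wrong.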

Run the following commands in a terminal. If you are using Windows, set up Ubuntu with WSL and run the commands in that terminal.

git clone https://github.com/josephmachado/data_engineering_best_practices.git
cd data_engineering_best_practices
make up # Spin up containers
make ddl # Create tables & views
make ci # Run checks & tests
make etl # Run etl
make spark-sh # Spark shell to check created tables

spark.sql("select partition from adventureworks.sales_mart group by 1").show() // the number of rows should equal the number of times you ran `make etl`
spark.sql("select count(*) from businessintelligence.sales_mart").show() // 59
spark.sql("select count(*) from adventureworks.dim_customer").show() // 1000 * num of etl runs
spark.sql("select count(*) from adventureworks.fct_orders").show() // 10000 * num of etl runs
:q // Quit scala shell
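
The comments on the checks above imply that the row counts grow linearly with each ETL run. As a small sketch of that arithmetic, the hypothetical helper below computes the values to expect after a given number of runs. The function and key names are made up for illustration, and the multipliers are the ones stated in the comments.

```python
def expected_counts(num_etl_runs):
    """Expected results of the Spark checks after num_etl_runs runs of `make etl`."""
    return {
        "sales_mart_partitions": num_etl_runs,     # one partition per run
        "dim_customer_rows": 1000 * num_etl_runs,  # 1000 rows per run
        "fct_orders_rows": 10000 * num_etl_runs,   # 10000 rows per run
    }


counts_after_three_runs = expected_counts(3)
print(counts_after_three_runs)
```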

To see the results of the data quality (DQ) checks, run make meta and then query the validations store:

select * from ge_validations_store limit 1;
exit

Use make down to spin down containers.

