Template ETL with Airflow, Spark and Postgres

A simple template for ETL pipelines using Postgres, Spark, and Airflow, deployed in Docker containers.

To deploy, simply run:

docker-compose up

After that, log in to the Airflow interface at http://localhost:8080/home using the following credentials:

user: airflow
password: airflow

The databases and tables in Postgres are created by the script /dags/db/init.sql when the container is first created by Docker Compose, and the credentials are located in the environment file /dags/.env (in a real deployment, this file should not be committed to git).
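
As a rough illustration, a task can load those credentials with python-dotenv and open a connection with psycopg2. This is a minimal sketch; the variable names below are assumptions, not necessarily the ones used in /dags/.env:

import os

import psycopg2
from dotenv import load_dotenv

# Load credentials from the environment file shipped alongside the DAGs
# (the variable names here are hypothetical).
load_dotenv("/dags/.env")

conn = psycopg2.connect(
    host=os.getenv("POSTGRES_HOST", "postgres"),
    dbname=os.getenv("POSTGRES_DB"),
    user=os.getenv("POSTGRES_USER"),
    password=os.getenv("POSTGRES_PASSWORD"),
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT 1")  # sanity check that the connection works
    print(cur.fetchone())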

The Dockerfile pulls an Airflow Docker image and installs the required Python packages listed in requirements.txt.
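
A minimal sketch of such a Dockerfile might look like the following (the base image tag is an assumption, not necessarily the one this repo pins):

# Hypothetical base image tag; pick the Airflow version you need.
FROM apache/airflow:2.7.3
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt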

The ETL DAG is located at /dags/etl_DAG.py. As a demonstration, it reads data from an API, transforms it using PySpark, loads it into the database, and aggregates it using SQL queries.
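
A condensed sketch of what such a DAG can look like is below. The task names, API URL, JDBC URL, table names, and SQL are illustrative assumptions, not the exact contents of /dags/etl_DAG.py:

from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator
from pyspark.sql import SparkSession

API_URL = "https://example.com/api/data"  # hypothetical source API
JDBC_URL = "jdbc:postgresql://postgres:5432/etl_db"  # hypothetical database

def extract_transform_load():
    # Extract: pull raw JSON records from the API.
    records = requests.get(API_URL, timeout=30).json()

    # Transform: clean the records with PySpark.
    spark = SparkSession.builder.appName("etl_template").getOrCreate()
    df = spark.createDataFrame(records).dropna()

    # Load: append the cleaned rows into Postgres over JDBC.
    df.write.jdbc(
        url=JDBC_URL,
        table="raw_data",
        mode="append",
        properties={
            "user": "airflow",
            "password": "airflow",
            "driver": "org.postgresql.Driver",
        },
    )
    spark.stop()

with DAG(
    dag_id="etl_template",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    etl = PythonOperator(task_id="etl", python_callable=extract_transform_load)

    # Aggregate: summarize the loaded rows with a SQL query (hypothetical table).
    aggregate = PostgresOperator(
        task_id="aggregate",
        postgres_conn_id="postgres_default",
        sql="INSERT INTO daily_totals SELECT date, count(*) FROM raw_data GROUP BY date;",
    )

    etl >> aggregate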
