This project is an interactive dashboard for an airline company, integrating a data pipeline orchestrated with Airflow to handle data ingestion, transformation, and visualization.
The data used in this project is available through this link.
- Project Overview
- Data Source
- Pipeline Architecture
- Technologies Used
- How to Run the Project
- Makefile Commands
## Project Overview

The project consists of two main components:
- Data Pipeline: Managed by Airflow, this pipeline extracts data from a PostgreSQL database, transforms it using DuckDB, and loads it into MongoDB Atlas for efficient storage.
- Streamlit Dashboard: This application retrieves the data stored in MongoDB Atlas, processes it, and displays it in an interactive dashboard for real-time visualization.
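To make the dashboard side concrete, here is a minimal sketch of such a Streamlit app. The database name (`airline`), collection name (`flight_stats`), field names, and the `mongo_uri` secret key are assumptions for illustration, not the project's actual schema:

```python
# Minimal sketch of the dashboard side; database, collection, and field
# names are placeholders, not the project's real schema.
import pandas as pd
import streamlit as st
from pymongo import MongoClient

# In practice the Atlas URI would be read from Streamlit secrets or an env var
client = MongoClient(st.secrets["mongo_uri"])
collection = client["airline"]["flight_stats"]

st.title("Airline Dashboard")

# Fetch the documents written by the Airflow pipeline, dropping Mongo's _id
df = pd.DataFrame(list(collection.find({}, {"_id": 0})))
st.dataframe(df)
st.bar_chart(df, x="departure_airport", y="n_flights")
```

Reading pre-aggregated documents keeps the dashboard responsive, since the heavy lifting happens upstream in the pipeline.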
## Data Source

The data used for this project is sourced from a PostgreSQL database and can be downloaded from the following link. It includes information about flights, passengers, and various operational aspects of the airline.
## Pipeline Architecture

The data pipeline follows these steps:
- Data Extraction (PostgreSQL): Airflow orchestrates the extraction of data from the PostgreSQL database.
- Transformation (DuckDB): DuckDB is used to perform fast and efficient data transformations.
- Loading (MongoDB Atlas): The transformed data is loaded into MongoDB Atlas, ready for visualization.
- Visualization (Streamlit): The Streamlit app connects to MongoDB, retrieves the data, processes it, and displays it in an interactive dashboard.
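As a rough illustration of how these steps map to Airflow code, here is a minimal TaskFlow-style DAG sketch. The connection ID, table name, SQL query, and MongoDB database/collection names are assumptions, not the project's actual code:

```python
# Sketch of the ETL DAG; connection ID, table, query, and MongoDB names
# are placeholders, not the project's real ones.
from datetime import datetime

import duckdb
import pandas as pd
from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook
from pymongo import MongoClient


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def airline_etl():
    @task
    def extract() -> list[dict]:
        # Pull raw rows from PostgreSQL via an Airflow connection
        hook = PostgresHook(postgres_conn_id="postgres_default")
        return hook.get_pandas_df("SELECT * FROM flights").to_dict("records")

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # DuckDB runs SQL directly over the in-memory DataFrame
        flights = pd.DataFrame(rows)
        return duckdb.sql(
            "SELECT departure_airport, COUNT(*) AS n_flights "
            "FROM flights GROUP BY departure_airport"
        ).df().to_dict("records")

    @task
    def load(records: list[dict]) -> None:
        # The Atlas URI would come from an Airflow connection or secret
        client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
        client["airline"]["flight_stats"].insert_many(records)

    load(transform(extract()))


airline_etl()
```

Passing plain lists of dicts between tasks keeps the intermediate results JSON-serializable for XCom.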
## Technologies Used

The following technologies are utilized in this project:
- Airflow: A workflow orchestrator used to automate the ETL pipeline.
- DuckDB: An OLAP engine used for efficient data processing.
- PostgreSQL: A relational database that serves as the data source.
- MongoDB Atlas: A NoSQL database for storing the processed data.
- Streamlit: A Python framework for building the interactive web dashboard.
- Docker & Docker Compose: Used to containerize the services and manage orchestration.
## How to Run the Project

Make sure you have the following tools installed on your machine:
- Docker
- Docker Compose
- Make
- Clone the GitHub repository:

  ```bash
  git clone https://github.com/abrahamkoloboe27/Airflow-Pipeline-Dashboard-Compagnie-Aerienne
  cd Airflow-Pipeline-Dashboard-Compagnie-Aerienne
  ```
- Configure Airflow connections:
  - After starting the services, navigate to the Airflow web interface.
  - Go to Admin > Connections.
  - Add connections for PostgreSQL and MongoDB Atlas with the correct URI, login, and password (a scripted alternative is sketched after this list).
- Build the Docker images:

  ```bash
  make build
  ```

  This command builds the necessary Docker images for the services.
- Start the services:

  ```bash
  make up
  ```

  This command starts Airflow, PostgreSQL, MongoDB Atlas, and the Streamlit app.
- Build and start the services in one step:

  ```bash
  make up-build
  ```

  This command rebuilds the services if necessary and then starts them.
- Stop the services:

  ```bash
  make down
  ```

  This command stops all running services.
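As an alternative to clicking through the UI in the connections step above, connections can also be created programmatically. A minimal sketch, assuming placeholder connection IDs, hosts, and credentials (substitute your own values):

```python
# Sketch of scripting the Airflow connections; every ID, host, and
# credential below is a placeholder, not the project's real values.
from airflow import settings
from airflow.models import Connection

session = settings.Session()
session.add(Connection(
    conn_id="postgres_default",  # assumed ID; match whatever the DAGs use
    conn_type="postgres",
    host="postgres",
    schema="airline",
    login="airflow",
    password="airflow",
    port=5432,
))
session.add(Connection(
    conn_id="mongo_atlas",       # assumed ID
    conn_type="mongo",
    host="<cluster>.mongodb.net",
    login="<user>",
    password="<password>",
))
session.commit()
```

The same result can also be achieved with `airflow connections add` on the CLI or `AIRFLOW_CONN_*` environment variables.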
## Makefile Commands

The Makefile included in the project allows you to execute the following commands:

- `make build`: Builds the Docker images required for the project.
- `make up`: Starts the containerized services (Airflow, PostgreSQL, MongoDB, Streamlit).
- `make up-build`: Rebuilds the Docker images and starts the services.
- `make down`: Stops all the running services.
This project provides a comprehensive solution for data management and visualization for an airline company. It integrates a complete data pipeline that automates extraction, transformation, and loading (ETL) of data, while Streamlit provides an interactive environment for exploring and analyzing the data in real time.