
amosproj/amos2023ws04-pipeline-manager


Data Pipeline and Storage Manager (AMOS WS 2023/2024)

Project description

The goal of this project is to develop a cloud-native pipelining (CNP) service that facilitates the analysis and management of project data. A project can request a data pipeline and storage, and then load the data into the pipeline. The loaded raw data is stored in a storage bucket, the insights gained from the raw data are stored in a database. The end user is provided with a frontend to upload the data, and to access and analyse the stored data.

Getting started

Prerequisites

  • Docker installed and configured
    • docker --version ~ Docker version 24.0.6
    • docker compose version ~ Docker Compose version v2.23.0-desktop.1
  • Amazon S3 Bucket (https://aws.amazon.com/en/s3/)
    • create and set up an S3 bucket for file storage
  • Keycloak
    • HTTPS/SSL is required for Keycloak to work (see README.md)
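To compare your local tools with the versions listed above, a small script can parse the output of `docker --version` and `docker compose version`. This is a minimal sketch, not part of the repository; the helper names `parse_version` and `check` are hypothetical, and the listed versions are treated as a reference point rather than a hard minimum.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Versions this README lists as known to work (a reference, not a hard minimum).
TESTED_DOCKER = (24, 0, 6)
TESTED_COMPOSE = (2, 23, 0)


def parse_version(text: str) -> tuple[int, ...] | None:
    """Return the first MAJOR.MINOR.PATCH triple found in a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def check(command: list[str], tested: tuple[int, ...]) -> str:
    """Run a version command and compare its output with the tested version."""
    if shutil.which(command[0]) is None:
        return f"{command[0]}: not found on PATH"
    output = subprocess.run(command, capture_output=True, text=True).stdout
    version = parse_version(output)
    if version is None:
        return f"{' '.join(command)}: could not parse {output.strip()!r}"
    status = "ok" if version >= tested else "older than the tested version"
    return f"{' '.join(command)}: {'.'.join(map(str, version))} ({status})"


if __name__ == "__main__":
    print(check(["docker", "--version"], TESTED_DOCKER))
    print(check(["docker", "compose", "version"], TESTED_COMPOSE))
```

A version older than the listed one may still work; the script only flags it so you know where to look if the build fails.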

Clone the repo

Clone the repo:

git clone https://github.com/amosproj/amos2023ws04-pipeline-manager.git

Navigate to the main root folder using:

cd amos2023ws04-pipeline-manager

Because the backend app uses secrets, copy the templates to a local .env file and a client_secrets.json file:

cp src/backend/.env.template src/backend/.env
cp src/backend/client_secrets.template.json src/backend/client_secrets.json

Then configure the environment variables so that the backend can connect to your AWS and Apache Airflow instances.
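Before building, it can help to confirm that every variable declared in `.env.template` has a value in your `.env`. The following is a minimal sketch, assuming both files use simple `KEY=VALUE` lines (optionally prefixed with `export`, with `#` comments); it does not handle inline comments or multi-line values, and the helper names are hypothetical. It reads the real variable names from the template rather than assuming any.

```python
from __future__ import annotations

from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):].strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def unset_keys(template_text: str, env_text: str) -> list[str]:
    """Return keys declared in the template that are missing or empty in .env."""
    expected = parse_env(template_text)
    actual = parse_env(env_text)
    return sorted(key for key in expected if not actual.get(key))


if __name__ == "__main__":
    backend = Path("src/backend")
    template, env = backend / ".env.template", backend / ".env"
    if not template.exists():
        print("Run this from the repository root (src/backend/.env.template not found).")
    else:
        env_text = env.read_text(encoding="utf-8") if env.exists() else ""
        missing = unset_keys(template.read_text(encoding="utf-8"), env_text)
        print("All variables set." if not missing else "Still unset: " + ", ".join(missing))
```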

To build the images:

docker compose build

Then start the system:

docker compose up -d # in detached mode
ℹ️ The first run can take a while because the images have to be downloaded and configured; subsequent builds are faster.

Software Components and Functionality

Deployment Pipeline Functionality

  1. An IT staff member can roll out an instance of a CNP at the request of a project.
  2. The deployment pipeline provides the entire infrastructure consisting of frontend, backend, data pipeline, storage, and database so that a project member can then work with the CNP.
  3. The deployed frontend is accessible from the internet.

Frontend Functionality

  1. A user can upload their data through the provisioned CNP and select a data pipeline suitable for that purpose.
  2. A user can search and retrieve relevant information from their CNP project and its associated data.
  3. A user can check and control the status of the CNP project.

Backend Functionality

The backend takes care of the orchestration of the described software components, with the following rough process flow:

  1. Access control to the project and project data using an appropriate IAM system
  2. Receiving the data provided via the frontend
  3. Forwarding the data to the data pipeline
  4. Transfer of the prepared data and raw data to the storage system and database
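The four steps above can be sketched as a single request handler. This is an illustrative, in-memory sketch only, not the project's backend: the `Backend` class and `handle_upload` method are hypothetical, a permissions dictionary stands in for the IAM system (Keycloak), a plain callable stands in for the data pipeline (Apache Airflow), and a dict and a list stand in for the S3 bucket and the database.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Backend:
    """Hypothetical in-memory stand-in for the backend orchestration flow."""

    permissions: dict[str, set[str]]  # user -> projects (stand-in for the IAM system)
    pipeline: Callable[[bytes], dict]  # stand-in for the data pipeline
    bucket: dict[str, bytes] = field(default_factory=dict)  # stand-in for the S3 bucket
    database: list[dict] = field(default_factory=list)  # stand-in for the insights database

    def handle_upload(self, user: str, project: str, filename: str, data: bytes) -> dict:
        # 1. Access control: reject users without access to the project.
        if project not in self.permissions.get(user, set()):
            raise PermissionError(f"{user!r} has no access to project {project!r}")
        # 2. The data received from the frontend arrives as the function arguments.
        # 3. Forward the raw data to the data pipeline.
        insights = self.pipeline(data)
        # 4. Store the raw data in the bucket and the insights in the database.
        key = f"{project}/{filename}"
        self.bucket[key] = data
        record = {"project": project, "object": key, **insights}
        self.database.append(record)
        return record


if __name__ == "__main__":
    backend = Backend(permissions={"alice": {"demo"}}, pipeline=lambda d: {"rows": d.count(b"\n")})
    print(backend.handle_upload("alice", "demo", "data.csv", b"a,b\n1,2\n"))
    # prints {'project': 'demo', 'object': 'demo/data.csv', 'rows': 2}
```

Keeping the raw object key in each database record links every insight back to the file it was derived from in the bucket.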

SD Issue creation Guidelines

Please see the SD wiki for how to create personal tickets/issues for the project.

Issue creation Guidelines for SDs


Documentation

Contributors

  • @keldami
  • @krutarth4
  • @bhanuPrakashMa
  • @sravanthidatla78
  • @ingunnaf
  • @CAgcoder
  • @elementator
  • @lalitha2395