data-engineering-project

This ETL pipeline uses AWS cloud services to transform YouTube video statistics data. The data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged with AWS Glue so it can be queried with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.

python aws spark aws-lambda etl aws-s3 pandas pyspark data-engineering aws-iam aws-cloudwatch data-pipeline etl-pipeline aws-glue data-engineering-workflows data-engineering-pipeline aws-lambda-layers aws-data-engineering-project data-engineering-project

Updated May 30, 2024
Python
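
The Lambda step in the pipeline described above reacts to a raw file landing in S3. The following is a minimal sketch of that idea, assuming the standard S3 event notification payload. It only works out where the cleansed Parquet object would be written. The bucket name `cleansed-bucket-example`, the function names, and the key layout are hypothetical and not taken from the repository, and the actual Parquet conversion is left out.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote_plus

# Hypothetical name; the description does not give the real bucket name.
CLEANSED_BUCKET = "cleansed-bucket-example"


def cleansed_target(event: dict) -> list:
    """Return (source_bucket, source_key, target_key) for each S3 record."""
    targets = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Assumption: the cleansed object keeps the same path, with a .parquet suffix.
        target_key = str(PurePosixPath(key).with_suffix(".parquet"))
        targets.append((bucket, key, target_key))
    return targets


def lambda_handler(event, context):
    # A real handler would read each source object and write Parquet here.
    return {"cleansed_bucket": CLEANSED_BUCKET, "objects": cleansed_target(event)}
```

Keeping the key mapping in its own function makes it possible to test the routing logic locally, without AWS credentials.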

janaom / gcp-de-project-data-pipeline-with-cloud-run-functions-airflow-biggueryml

Build a data pipeline on Google Cloud using an event-driven architecture, leveraging GCS, Cloud Run functions, and BigQuery. Explore both VM and Composer options for Airflow management, and utilize Logging & Monitoring for pipeline health. Discover how SQL-based BigQuery ML can be used for initial ML implementation in specific scenarios.

bigquery airflow composer google-cloud-platform cloud-functions bigqueryml data-engineering-project cloud-run-functions

Updated Aug 25, 2024
Python
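
In the event-driven design described above, a file arriving in GCS triggers a function that loads it into BigQuery. As a rough sketch, assuming the event data carries the `bucket` and `name` fields of a Cloud Storage object event, the function below derives a load request from it. The dataset name `example_dataset`, the function name, and the table-naming rule are hypothetical, and no BigQuery client is called.

```python
import re


def load_request(event_data: dict, dataset: str = "example_dataset") -> dict:
    """Map a Cloud Storage object event to a source URI and a destination table."""
    bucket = event_data["bucket"]
    name = event_data["name"]
    # Use the file name without directory and extension as the table name.
    stem = name.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    # Assumption: replace anything that is not a letter, digit, or underscore.
    table = re.sub(r"\W", "_", stem)
    return {
        "source_uri": f"gs://{bucket}/{name}",
        "destination": f"{dataset}.{table}",
    }
```

A deployed function would pass the result to a BigQuery load job; that part is omitted here so the sketch runs without Google Cloud libraries.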

k0rsakov / scd_dag_factory

A DAG factory driven by an SCD table of configurations

airflow tutorial docker-compose tutorials python3 data-engineering tutorial-code airflow-dags data-engineering-project

Updated Aug 15, 2024
Python
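
The idea of a DAG factory fed by an SCD table can be sketched as follows. The description does not say which SCD type the repository uses, so this assumes Type 2 style rows with a validity interval. The column names, the sample rows, and both function names are hypothetical. Plain dictionaries stand in for Airflow DAG objects so the sketch runs without Airflow installed.

```python
from datetime import date

# Hypothetical SCD Type 2 rows: one row per version of a DAG configuration.
CONFIG_ROWS = [
    {"dag_id": "load_orders", "schedule": "@daily",
     "valid_from": date(2024, 1, 1), "valid_to": date(2024, 6, 1)},
    {"dag_id": "load_orders", "schedule": "@hourly",
     "valid_from": date(2024, 6, 1), "valid_to": None},
    {"dag_id": "load_users", "schedule": "@weekly",
     "valid_from": date(2024, 3, 1), "valid_to": None},
]


def current_configs(rows, on: date) -> dict:
    """Pick the version of each configuration that is valid on the given date."""
    active = {}
    for row in rows:
        started = row["valid_from"] <= on
        # An open-ended row (valid_to is None) is the current version.
        not_ended = row["valid_to"] is None or on < row["valid_to"]
        if started and not_ended:
            active[row["dag_id"]] = row
    return active


def build_dag_specs(rows, on: date) -> list:
    """Turn each active configuration into a dict a DAG constructor could consume."""
    return [
        {"dag_id": dag_id, "schedule": cfg["schedule"]}
        for dag_id, cfg in sorted(current_configs(rows, on).items())
    ]
```

Because old versions stay in the table, the same factory can rebuild the set of DAGs as it stood on any past date, which is the main reason to keep configurations in an SCD table rather than a flat one.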

data-engineering-project

Here are 8 public repositories matching this topic...

k0rsakov / dag_factory

k0rsakov / infrastructure_for_data_engineer_S3

k0rsakov / all_about_DuckDB

agusabdulrahman / Realtime-Data-Streaming

k0rsakov / infrastructure_for_data_engineer_kafka

waqarg2001 / Youtube-Data-Pipeline-AWS

janaom / gcp-de-project-data-pipeline-with-cloud-run-functions-airflow-biggueryml

k0rsakov / scd_dag_factory
