This repo contains details about data engineering tech stacks in GCP
Updated Dec 6, 2024 - Jupyter Notebook
Batch ETL on GCP using Cloud Composer, Google Cloud Storage, Dataflow, and Cloud Build
Apache Beam demo projects
For those who want to create a data pipeline with Apache Beam, Google Dataflow, and BigQuery.
🤖 Apache Beam RunInference API sample
A small Dataflow job that receives a Pub/Sub message every time someone accesses a shortened URL. It accumulates the messages in a fixed time window, groups them by ID, and updates Firestore with the number of clicks.
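The windowing step described above can be illustrated with a minimal plain-Python sketch. This is not the repository's code and does not use the Beam SDK. The 60-second window size, the event tuples, and the names `window_start` and `count_clicks` are all assumptions made for illustration, and the Firestore update is left out.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed window size; the description does not state one


def window_start(timestamp: float, size: int = WINDOW_SECONDS) -> int:
    """Return the start (in seconds) of the fixed window containing `timestamp`."""
    return int(timestamp // size) * size


def count_clicks(events, size: int = WINDOW_SECONDS):
    """Count clicks per (window start, URL id).

    `events` is an iterable of (timestamp_seconds, url_id) pairs that stand in
    for the Pub/Sub messages.
    """
    counts = Counter()
    for timestamp, url_id in events:
        counts[(window_start(timestamp, size), url_id)] += 1
    return dict(counts)


# Hypothetical click events: (seconds since start, shortened-URL id).
events = [(0.5, "abc"), (12.0, "abc"), (59.9, "xyz"), (60.0, "abc"), (75.3, "xyz")]
clicks_per_window = count_clicks(events)
```

In a Beam pipeline the same idea is usually expressed with `beam.WindowInto(beam.window.FixedWindows(60))` followed by a per-key count. Each resulting (window, ID) total would then be written to the datastore.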
Read the step-by-step tutorial here: https://frazynondo.medium.com/etl-with-gcp-part-i-apache-beam-eclipse-gcs-and-bigquery-dc9529ee7f19
Sample code for building big data pipelines (batch and streaming) using Apache Beam in Python
Apache Beam and EDA Projects: Showcasing real-time data processing with Apache Beam, interactive visualizations with D3.js, and automated EDA with Sweetviz and PyCaret. Includes Jupyter notebooks and outputs for learning and exploration.
Personal Apache Beam studies repository
Set up your Python development environment, get the Apache Beam SDK for Python, and run an example pipeline.
Project to analyze images using different computer vision algorithms, in order to extract as much information as possible from an image
Projects done to learn about databases