This project demonstrates how to consume events from the Wikimedia recent changes stream using Python, with aiohttp for asynchronous HTTP requests and confluent-kafka for producing messages to Kafka.
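Wikimedia publishes the recent changes feed as Server-Sent Events (SSE) at https://stream.wikimedia.org/v2/stream/recentchange. As a rough sketch of the consuming side, assuming each event's JSON payload arrives on its `data:` line(s), the hypothetical helper `iter_sse_events` below turns raw byte lines into Python dicts. It is not taken from this repository, and it ignores the `event:`, `id:`, and `retry:` fields.

```python
import json
from typing import AsyncIterable, AsyncIterator


async def iter_sse_events(lines: AsyncIterable[bytes]) -> AsyncIterator[dict]:
    """Yield one decoded JSON payload per Server-Sent Event.

    Hypothetical helper: `lines` is any async iterable of raw byte lines,
    such as aiohttp's `response.content`.
    """
    data_parts: list[str] = []
    async for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            # SSE allows one optional space after the field colon.
            data_parts.append(line[5:].removeprefix(" "))
        elif line == "" and data_parts:
            # A blank line ends the event; multi-line data is joined with "\n".
            yield json.loads("\n".join(data_parts))
            data_parts = []
        # Other fields (event:, id:, retry:) and ":" comment lines are skipped.
```

With aiohttp, the body of a `session.get(...)` response can be passed in directly as `response.content`, because aiohttp's `StreamReader` supports line-by-line `async for` iteration.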
Prerequisites:
- Python 3.11.5
- aiohttp library (pip install aiohttp)
- Alternatively, the httpx library can be used for the asynchronous streaming GET request (pip install httpx)
- confluent-kafka library (pip install confluent-kafka)
- Docker Engine
- Docker Compose
Installation:
- Clone the repository:
  git clone https://github.com/mainakm7/kafka_wikimedia_stream.git
- Install dependencies:
  pip install -r requirements.txt
- Set up Kafka, Zookeeper, and PostgreSQL using Docker Compose. Ensure Docker Engine and Docker Compose are installed and running, then run:
  docker-compose up -d
Note: The Docker Compose file also includes the Conduktor platform UI, which is used to monitor the Kafka topics.
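Kafka can take a few seconds to start after `docker-compose up -d`. A minimal way to check that the broker port is accepting connections, assuming it is published on `localhost:9092` as recommended for `bootstrap.servers`, is a plain TCP probe. The helper name `kafka_port_open` is hypothetical, and a successful connection only shows that something is listening on the port, not that the broker is fully started or healthy.

```python
import socket


def kafka_port_open(host: str = "localhost", port: int = 9092, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only proves that something is listening on the port; it does not
    confirm that the Kafka broker is ready to serve requests.
    """
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `kafka_port_open()` with its defaults before starting the producer can help avoid connection errors while the broker is still starting up.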
Usage:
- Update bootstrap.servers in kafka_producer/main.py to localhost:9092 for the Kafka connection.
- Update topic_name in kafka_producer/main.py to the desired Kafka topic name.
- Run the main script to start consuming Wikimedia events (run instructions are still in progress).
- Logs are written to wiki_producer.log in the current directory.
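The bootstrap.servers and topic_name settings correspond to the producer configuration and the target topic. The following is a minimal sketch of how an event could be serialized and sent with confluent-kafka, not the contents of `kafka_producer/main.py`; `TOPIC_NAME`, `delivery_report`, and `send_event` are hypothetical, and keying messages by the event's `wiki` field is an assumption made for illustration. The producer is passed in as an argument, so the function works with any object exposing `produce` and `poll`.

```python
import json

# Placeholder settings; keep these in sync with kafka_producer/main.py.
KAFKA_CONFIG = {"bootstrap.servers": "localhost:9092"}
TOPIC_NAME = "wikimedia_recentchange"  # hypothetical topic name


def delivery_report(err, msg) -> None:
    """Report one message's outcome; confluent-kafka calls this from poll() or flush()."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")


def send_event(producer, topic: str, event: dict) -> None:
    """Serialize one event as JSON and queue it on the producer."""
    # Assumption: keying by wiki (e.g. "enwiki") sends each wiki's events
    # to one partition, which preserves their relative order.
    wiki = event.get("wiki")
    producer.produce(
        topic,
        value=json.dumps(event).encode("utf-8"),
        key=wiki.encode("utf-8") if wiki else None,
        on_delivery=delivery_report,
    )
    # Serve delivery callbacks for earlier messages without blocking.
    producer.poll(0)
```

With the real client, this would look like `producer = Producer(KAFKA_CONFIG)` after `from confluent_kafka import Producer`, then `send_event(producer, TOPIC_NAME, event)` for each event, and finally `producer.flush()` so queued messages are delivered before the process exits.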
Features:
- Asynchronous Event Handling: Uses aiohttp for asynchronous HTTP requests to fetch events.
- Kafka Integration: Produces fetched events to a Kafka topic using confluent-kafka.
- Error Handling: Logs errors and exceptions to wiki_producer.log for troubleshooting.
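The error-handling behavior described above can be sketched as follows, under some assumptions: events arrive as JSON strings (for example, the `data:` payloads of the SSE stream), the logger is named `wiki_producer`, and `configure_logging` and `forward_events` are hypothetical names rather than functions from this repository. Malformed events and producer failures are logged to `wiki_producer.log` and skipped, so one bad event does not stop the stream.

```python
import json
import logging
from typing import AsyncIterable


def configure_logging(path: str = "wiki_producer.log") -> logging.Logger:
    """Attach a file handler so INFO-and-above records land in `path`."""
    logger = logging.getLogger("wiki_producer")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # Avoid attaching duplicate handlers if this is called more than once.
    if not logger.handlers:
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


async def forward_events(
    lines: AsyncIterable[str], producer, topic: str, logger: logging.Logger
) -> int:
    """Produce each JSON line to `topic`; log and skip anything that fails.

    Returns the number of events accepted by the producer.
    """
    sent = 0
    async for line in lines:
        try:
            json.loads(line)  # validate the payload before producing it
        except json.JSONDecodeError:
            logger.warning("Skipping malformed event: %r", line[:80])
            continue
        try:
            producer.produce(topic, value=line.encode("utf-8"))
            sent += 1
        except BufferError:
            # confluent-kafka raises BufferError when its local queue is full.
            logger.error("Producer queue full; event dropped")
        except Exception:
            # Record the traceback but keep consuming the stream.
            logger.exception("Failed to produce event")
        producer.poll(0)  # serve pending delivery callbacks
    producer.flush()  # wait for queued messages to be delivered
    return sent
```

Because the default log path is relative, the file is created in the current working directory, matching where this README says the logs appear.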
Contributions are welcome! Please fork the repository and submit pull requests with improvements or new features.
This project is licensed under the MIT License - see the LICENSE file for details.