This repository demonstrates a simple implementation of real-time stock market data streaming using Apache Kafka. The project includes a Kafka producer (`producer.py` and `producer.ipynb`) to simulate stock data generation and a Kafka consumer (`consumer.py` and `consumer.ipynb`) to receive and process the streaming data.
- `producer.py`
  - Python script showcasing how the Kafka producer works.
  - Sends random data to a Kafka topic named 'test'.
  - Uses the `kafka` library and a JSON value serializer (a sketch of this pattern follows the file list).
- `consumer.py`
  - Python script illustrating how the Kafka consumer operates.
  - Consumes messages from the 'test' Kafka topic.
  - Uses the `kafka` library with a JSON value deserializer and subscribes to the topic (a sketch follows the file list).
- `docker-compose.yaml`
  - Docker Compose configuration file for setting up the Kafka and Zookeeper services.
  - Uses Confluent (`confluentinc`) Docker images for Kafka and Zookeeper.
- `producer.ipynb`
  - Jupyter Notebook version of the Kafka producer.
  - Simulates real-time stock market data streaming.
  - Reads processed stock market data from a CSV file and continuously produces records to the 'stock-data' Kafka topic.
- `consumer.ipynb`
  - Jupyter Notebook version of the Kafka consumer.
  - Consumes and prints messages from the 'stock-data' Kafka topic.
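
As an illustration, here is a minimal sketch of the `producer.py` pattern described above, assuming a broker reachable at `localhost:9092` and illustrative field names (neither detail is taken from the repository):

```python
from json import dumps
from random import uniform
from time import sleep

from kafka import KafkaProducer

# Producer with a JSON value serializer, as described above.
# The broker address is an assumption for a local docker-compose setup.
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: dumps(v).encode("utf-8"),
)

while True:
    # Simulated stock quote; the field names are placeholders.
    quote = {"symbol": "DEMO", "price": round(uniform(100, 200), 2)}
    producer.send("test", value=quote)
    sleep(1)
```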
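
Similarly, a minimal sketch of the `consumer.py` pattern, with the same assumed broker address:

```python
from json import loads

from kafka import KafkaConsumer

# Consumer subscribed to the 'test' topic with a JSON value deserializer.
consumer = KafkaConsumer(
    "test",
    bootstrap_servers=["localhost:9092"],
    auto_offset_reset="earliest",  # assumption: read the topic from the start
    value_deserializer=lambda v: loads(v.decode("utf-8")),
)

for message in consumer:
    # message.value is the deserialized dict sent by the producer.
    print(message.value)
```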
- Make sure you have Docker and Docker Compose installed.
- Install the necessary Python libraries: `pip install pandas kafka-python` (the `kafka` import used in the scripts is provided by the `kafka-python` package).
- Run `docker-compose up -d` to start the Kafka and Zookeeper services.
- Execute `producer.py` or `producer.ipynb` to start the data simulation (a CSV-driven sketch of the notebook producer follows these steps).
- Run `consumer.py` or `consumer.ipynb` to consume and process the streaming data.
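
The notebook producer follows the same pattern as `producer.py` but streams rows from a CSV file to the 'stock-data' topic. A rough sketch, assuming a file named `processed_stock_data.csv` (the file name, column layout, and send interval are placeholders, not details from the repository):

```python
from json import dumps
from time import sleep

import pandas as pd
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: dumps(v).encode("utf-8"),
)

# Placeholder file name; the repository's processed CSV may differ.
df = pd.read_csv("processed_stock_data.csv")

while True:
    # Pick a random row and send it as a dict to the 'stock-data' topic.
    record = df.sample(1).to_dict(orient="records")[0]
    producer.send("stock-data", value=record)
    sleep(1)
```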
Useful commands for working with the running Kafka service:

$ docker-compose exec -it kafka bash
$ docker-compose exec kafka kafka-console-producer.sh --topic test --broker-list kafka:9092
$ docker-compose exec kafka kafka-topics --list --bootstrap-server kafka:9092
$ docker-compose exec kafka kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server kafka:9092
$ docker-compose down
`consumer.ipynb` is currently configured to print the consumed stock data. Work is in progress to modify the consumer to save the data to Azure Data Lake Storage Gen2, which will provide a more permanent storage solution for the streaming stock data.
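
One possible shape for that change, sketched with the `azure-storage-file-datalake` package; the account URL, credential, file system name, and path layout are all placeholders, not details from the repository:

```python
from azure.storage.filedatalake import DataLakeServiceClient
from kafka import KafkaConsumer

# Placeholder connection details for an ADLS Gen2 account.
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<account-key-or-sas-token>",
)
file_system = service.get_file_system_client(file_system="stock-data")

consumer = KafkaConsumer(
    "stock-data",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: v,  # keep the raw JSON bytes as produced
)

for message in consumer:
    # One file per message, keyed by partition and offset (illustrative layout).
    path = f"raw/{message.partition}-{message.offset}.json"
    file_system.get_file_client(path).upload_data(message.value, overwrite=True)
```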