This project is a series of data analyses of the world-famous TV show 'Game of Thrones', focusing on every sentence spoken during its 8 seasons.
- Python
- Spark (PySpark)
- Docker
- Jupyter Notebook
- Sentiment Analysis Python packages
- SQL strategies
The 'star' of this repository can be checked here. I wrote a series of data analyses that closely resemble a real professional use case and, as a bonus, added sentiment analysis for each sentence in the dataset, showing how the Python libraries 'vaderSentiment' and 'textblob' can easily be used in data engineering pipelines or as a service. Please have fun :)
Q: "lyamada, why not use the official Jupyter image?" A: Because we need the sentiment analysis libraries installed in our container.
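Once the container is running, a quick way to confirm that the custom image really ships the extra libraries is to check that they are importable from a notebook cell. This is a hypothetical sanity check (the `missing_packages` helper is not part of the repository), using only the standard library:

```python
# Hypothetical sanity check: run inside the container (e.g. in a notebook cell).
import importlib.util

# Top-level import names of the two sentiment libraries used in this project.
REQUIRED = ("vaderSentiment", "textblob")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All sentiment libraries are installed.")
```

Running the same check against a stock Jupyter image would be expected to report both libraries as missing, which is the reason for building a custom image.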
$ docker build -t jupy_sentiment:latest .
$ cd jupyter_pyspark
$ docker-compose up -d
$ docker ps
$ docker logs <container_id>
Copy the URL containing the token, change the port from '8888' to '10000', and paste it into your favorite browser.
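The port swap is needed because Jupyter prints the port it listens on inside the container, while the host reaches it through a different published port. A minimal sketch of the same rewrite, assuming the compose file maps container port 8888 to host port 10000 as described above (`remap_port` is a hypothetical helper and `abc123` a placeholder token):

```python
# Hypothetical helper: rewrite the port of the Jupyter URL printed in the logs.
from urllib.parse import urlsplit, urlunsplit


def remap_port(url: str, new_port: int = 10000) -> str:
    """Return `url` with its port replaced by `new_port`; path, query and token are kept."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname without the old port
    if ":" in host:              # IPv6 literals must be wrapped in brackets again
        host = f"[{host}]"
    return urlunsplit(parts._replace(netloc=f"{host}:{new_port}"))


if __name__ == "__main__":
    print(remap_port("http://127.0.0.1:8888/?token=abc123"))
```

Only the network location changes, so the `?token=...` query string that authenticates you stays intact.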
Open localhost:4040 in your browser to access the Spark UI.
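Note that the Spark UI is served by the driver of a running Spark application, so nothing answers on port 4040 until a notebook has created a SparkSession (and if 4040 is already taken, Spark tries the next ports, such as 4041). A small hypothetical helper to check from the host whether something is listening before opening the browser, assuming the port is published as 4040:

```python
# Hypothetical helper: check whether a TCP port (e.g. the Spark UI on 4040) accepts connections.
import socket


def is_port_open(host: str = "localhost", port: int = 4040, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    print("Spark UI reachable:", is_port_open())
```

Inside the notebook, `spark.sparkContext.uiWebUrl` reports the address the driver bound the UI to, as seen from inside the container.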
$ cd jupyter_pyspark
$ docker-compose down