Data analysis has become an essential part of day-to-day life; it helps us make decisions wisely. In a digitalized world, WhatsApp has become one of our main channels for texting and communication. Unfortunately, we end up in so many group conversations that it is hard to keep track of what is going on.
A chat analyzer gives you a clear picture of how active and healthy a group is. This project is a demo that computes statistics for your personal groups.
Clone the repository
git clone https://github.com/EphronM/Chat_analyzer_with_Sentimental_analysis.git
- Note: WordCloud may fail to install on newer Python versions. It is preferable to set the runtime to Python 3.7.
conda create -n chatenv python=3.7 -y
conda activate chatenv
pip install -r requirements.txt
streamlit run app.py
WhatsApp has a handy feature for exporting chats in .txt format.
Open chat >> More >> Export chat >> Without Media
We only need the text data, so media files play no role. You can try this web app with both group and direct chats.
After the text file is uploaded to the web app, it is converted into a clean pandas DataFrame, which makes the analysis more convenient.
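The conversion step can be sketched roughly as follows. This is a minimal illustration, not the app's actual parser: it assumes a 24-hour export format such as "31/12/2021, 21:15 - Alice: Hello" (the real format varies with phone locale), and the names `LINE_RE`, `parse_chat`, and `group_notification` are hypothetical.

```python
import re
from datetime import datetime

# Assumed export line format: "31/12/2021, 21:15 - Alice: Hello"
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4}), (\d{1,2}:\d{2}) - (.*)$")


def parse_chat(text):
    """Turn raw exported chat text into a list of row dictionaries."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            # No timestamp: treat the line as a continuation of the previous message.
            if rows:
                rows[-1]["message"] += "\n" + line
            continue
        date_part, time_part, rest = match.groups()
        timestamp = datetime.strptime(f"{date_part} {time_part}", "%d/%m/%Y %H:%M")
        # Lines without a "name: " prefix are treated as system notifications.
        user, sep, message = rest.partition(": ")
        if not sep:
            user, message = "group_notification", rest
        rows.append({"date": timestamp, "user": user, "message": message})
    return rows


sample = (
    "31/12/2021, 21:15 - Alice: Happy new year!\n"
    "and all the best\n"
    "31/12/2021, 21:16 - Bob: <Media omitted>\n"
    "01/01/2022, 00:01 - Alice created group \"Friends\"\n"
)
rows = parse_chat(sample)
for row in rows:
    print(row["date"], row["user"], repr(row["message"]))
```

Passing `rows` to `pandas.DataFrame` would then give one row per message with date, user, and message columns.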
If the uploaded data is from a group chat, the app fetches the user names and displays them in a dropdown menu. You can analyze either an individual user's activity or the overall analytics.
The app first displays the total message count, taken as the length of the dataset. Because the chat is exported without media, each media message appears as the placeholder <Media omitted>, and the media count is the number of occurrences of that placeholder. The links present in the chat are extracted with URLExtract. Finally, splitting each message string into words and counting them gives the total word count.
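A rough sketch of these four statistics is shown below. It is an illustration under assumptions: a simplified regular expression stands in for URLExtract (the two are not equivalent), and `chat_stats`, `MEDIA_PLACEHOLDER`, and `URL_RE` are hypothetical names.

```python
import re

# Placeholder WhatsApp writes for media when exporting "Without Media".
MEDIA_PLACEHOLDER = "<Media omitted>"
# Simplified URL pattern; a stand-in for URLExtract, not a replacement for it.
URL_RE = re.compile(r"https?://\S+")


def chat_stats(messages):
    """Return the basic counts for a list of message strings."""
    total_messages = len(messages)
    media_count = sum(1 for m in messages if m.strip() == MEDIA_PLACEHOLDER)
    links = [url for m in messages for url in URL_RE.findall(m)]
    # Splitting on whitespace gives the words of each message.
    word_count = sum(len(m.split()) for m in messages)
    return {
        "messages": total_messages,
        "media": media_count,
        "links": len(links),
        "words": word_count,
    }


messages = [
    "Hello everyone",
    "<Media omitted>",
    "Read this https://example.com/post today",
]
stats = chat_stats(messages)
print(stats)
```

Note that this sketch counts the placeholder text itself as two words; whether to exclude it from the word count is a design choice.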
This is performed only for a group chat, or when the "overall" option is selected.
The value counts of the "USERS" feature in the dataset are used to display a bar plot and a table of the most active users.
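The value-count step might look like the sketch below. The sample data and the column names "user" and "messages" in the output table are assumptions for illustration; only the "USERS" feature name comes from the description above.

```python
import pandas as pd

# Hypothetical sample data; the real frame comes from the parsed chat export.
df = pd.DataFrame(
    {
        "USERS": ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"],
        "message": ["hi", "hello", "how are you?", "fine", "great", "bye"],
    }
)

# Number of messages per user, sorted from most to least active.
counts = df["USERS"].value_counts()

# Keep the top five users and turn the result into a two-column table.
top_users = counts.head(5)
table = top_users.rename_axis("user").reset_index(name="messages")
print(table)
```

With matplotlib installed, `top_users.plot.bar()` draws the corresponding bar plot.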
A word cloud shows the most frequently used words together. The words in the cloud indicate whether the chat language is formal or informal. Stop words and punctuation are removed so that only relevant words remain.
Following the word cloud, a word count table lists the most commonly used words along with the number of times each was used. For easier interpretation, the same data is also shown as a horizontal bar graph.
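The word count behind the table can be sketched as follows. The stop-word set here is a tiny illustrative list, not the one the app uses, and `most_common_words` is a hypothetical helper name.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "is", "a", "to", "and", "in", "of"}


def most_common_words(messages, top_n=20):
    """Count words across messages after removing punctuation and stop words."""
    strip_punctuation = str.maketrans("", "", string.punctuation)
    words = []
    for message in messages:
        for word in message.lower().translate(strip_punctuation).split():
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words).most_common(top_n)


messages = ["The match is today!", "Is the match in the stadium?", "Today, yes."]
top_words = most_common_words(messages, top_n=2)
print(top_words)
```

The same (word, count) pairs can feed both the table and the horizontal bar graph.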
A similar table displays the counts of the most often used emojis. The icon people use most to express their emotions digitally tops the list. The emoji count is obtained by using the emoji.UNICODE_EMOJI['en'] dictionary from the emoji package, which contains most of the available emojis, and counting each character that appears in it.
Two consecutive line graphs display the message traffic on a monthly and a daily basis. This gives a more detailed view of the group's activity over time.
Bar plots are among the simplest charts to read. From them, we can clearly see which month of the year and which day of the week the group is most active. This can hint at why a particular month was the busiest and why others were the quietest.
The dataset is divided on an hourly basis by creating a new feature named period, which combines the preceding hour with the current hour feature. A pivot table is then created with the day names as the index, the messages as the values, and the new period feature as the columns.
The seaborn heatmap is drawn from this pivot table. It shows at which period of the day the traffic was at its peak and at its lowest.
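The period feature and pivot table could be built along these lines. This is one reading of the description, not the app's confirmed code: the "HH-HH" label format, the wrap-around at midnight, the sample data, and the names `hour_to_period` and `day_name` are all assumptions.

```python
import pandas as pd

# Hypothetical sample data; 2022-01-03 is a Monday.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2022-01-03 09:15",
                "2022-01-03 09:40",
                "2022-01-03 23:05",
                "2022-01-04 00:30",
            ]
        ),
        "message": ["hi", "hello", "good night", "still up?"],
    }
)
df["day_name"] = df["date"].dt.day_name()
df["hour"] = df["date"].dt.hour


def hour_to_period(hour):
    """Label an hour with the preceding hour, e.g. 9 -> '08-09', 0 -> '23-00'."""
    previous = (hour - 1) % 24
    return f"{previous:02d}-{hour:02d}"


df["period"] = df["hour"].apply(hour_to_period)

# Rows are day names, columns are periods, cells are message counts.
pivot = df.pivot_table(
    index="day_name", columns="period", values="message", aggfunc="count"
).fillna(0)
print(pivot)
```

With seaborn installed, `seaborn.heatmap(pivot)` renders this matrix as the heatmap.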
The Sentiment Intensity Analyzer is imported from NLTK, using the vader_lexicon package, and applied to the messages in the dataset.
The analyzer gives a polarity score for:
- Positive
- Negative
- Neutral
Each score is stored as a separate feature. The three new features are then summed over all messages, and the largest total determines the overall sentiment.
Plotting the three totals shows the overall sentiment of the chat. To compare positive and negative sentiment directly, a separate pie chart is plotted for the total positive and negative scores.
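The aggregation step can be sketched as below. To keep the example self-contained, it does not call NLTK: the per-message scores are made-up values shaped like the "pos", "neg", and "neu" entries that the analyzer's polarity scores contain, and `overall_sentiment` is a hypothetical helper name.

```python
def overall_sentiment(scores):
    """Sum the per-message polarity scores and pick the largest total."""
    totals = {
        "Positive": sum(s["pos"] for s in scores),
        "Negative": sum(s["neg"] for s in scores),
        "Neutral": sum(s["neu"] for s in scores),
    }
    # The label with the highest total is reported as the overall sentiment.
    label = max(totals, key=totals.get)
    return label, totals


# Made-up scores for three messages, for illustration only.
scores = [
    {"pos": 0.6, "neg": 0.0, "neu": 0.4},
    {"pos": 0.1, "neg": 0.5, "neu": 0.4},
    {"pos": 0.7, "neg": 0.0, "neu": 0.3},
]
label, totals = overall_sentiment(scores)
print(label, totals)
```

The `totals` dictionary holds the three values to plot, and its "Positive" and "Negative" entries are the two slices of the pie chart.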
Author: EphronM
Email: ephronmartin2016@gmail.com