Web app that analyzes personal WhatsApp chats and provides a sentiment analysis of the messages shared

EphronM/Chat_analyzer_with_Sentimental_analysis

Analyzing WhatsApp Chats with Sentiment Analysis

Demo Site Link

Analyzing data has become an essential part of day-to-day life, and it helps us make decisions wisely. In a digital world, WhatsApp has become one of our main channels for texting and communication. Unfortunately, we end up in so many group conversations that it is hard to keep track of what is going on.

A chat analyzer gives you a clear idea of how active and healthy a group is. This app is a demo that shows statistics for our personal groups.

Run this web app locally

Clone the repository

git clone https://github.com/EphronM/Chat_analyzer_with_Sentimental_analysis.git
  • Note: WordCloud may fail to install on newer Python versions. It is recommended to set the runtime to Python 3.7.

Create a conda environment after opening the repository

conda create -n chatenv python=3.7 -y
conda activate chatenv

Install the required dependencies

pip install -r requirements.txt

All set to run the web app

streamlit run app.py

Where does the data come from?

WhatsApp has a handy feature for exporting chats in .txt format.

Open chat >> More >> Export chat >> Without Media

The app only needs the text, so media files play no role. You can try this web app with both group and direct chats.

Analysing raw data

After the text file is uploaded, the web app converts it into a clean pandas DataFrame, which makes it more convenient to analyze.

If the uploaded data is from a group chat, the app fetches the user names and displays them in a dropdown menu. You can then analyze either an individual user or the group overall.
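The conversion step can be sketched roughly as follows. This is a minimal illustration, not the code from app.py: the line format ("DD/MM/YYYY, HH:MM - user: message"), the function name parse_chat, the column names, and the "group_notification" label are all assumptions, and real exports vary by phone locale and platform.

```python
import re
import pandas as pd

# Assumed export line format: "06/02/2022, 16:48 - Alice: Hello"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4}), (\d{2}:\d{2}) - (.*)$")

def parse_chat(text: str) -> pd.DataFrame:
    """Turn raw export text into a DataFrame with date, user and message columns."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            # No timestamp: treat the line as a continuation of the previous message.
            if rows:
                rows[-1]["message"] += "\n" + line
            continue
        date, time, rest = match.groups()
        # System notices (for example "Alice joined") carry no "user: " prefix.
        user, sep, message = rest.partition(": ")
        if not sep:
            user, message = "group_notification", rest
        rows.append({"date": f"{date} {time}", "user": user, "message": message})
    df = pd.DataFrame(rows, columns=["date", "user", "message"])
    df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y %H:%M")
    return df

sample = (
    "06/02/2022, 16:48 - Alice: Hello everyone\n"
    "06/02/2022, 16:49 - Bob: Hi!\n"
    "How are you?\n"
    "06/02/2022, 17:00 - Alice: <Media omitted>\n"
)
chat = parse_chat(sample)
print(chat)
```

The distinct values of the user column could then feed the dropdown menu for group chats.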

The app first displays the total message count, taken as the length of the dataset. Because the chat was exported without media, each media item appears as a placeholder message, and the media count is the number of such occurrences. Links in the chat are extracted with URLExtract. Finally, splitting each message string into words gives the total word count.
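As a rough sketch of these four counts: the example below uses only the standard library, so a simple regular expression stands in for URLExtract, and the placeholder text "<Media omitted>" is an assumption about how the export marks skipped media. Leaving placeholders out of the word count is also a choice made for this sketch.

```python
import re

MEDIA_PLACEHOLDER = "<Media omitted>"   # assumed placeholder text in the export
URL_RE = re.compile(r"https?://\S+")    # rough stand-in for URLExtract

def basic_stats(messages):
    """Return message, media, link and word counts for a list of message strings."""
    is_media = [m.strip() == MEDIA_PLACEHOLDER for m in messages]
    links = [url for m in messages for url in URL_RE.findall(m)]
    # Split each non-media message on whitespace to count words.
    words = sum(len(m.split()) for m, media in zip(messages, is_media) if not media)
    return {
        "messages": len(messages),
        "media": sum(is_media),
        "links": len(links),
        "words": words,
    }

stats = basic_stats([
    "Hello everyone",
    "Check https://example.com now",
    "<Media omitted>",
])
print(stats)
```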

Finding the most active users

This is performed only for a group chat, when the "overall" option is selected.

The message count per user is obtained by taking the value counts of the "USERS" feature of the dataset. A bar plot and a table of the most active users are displayed.
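A minimal sketch of the value-count step, using a made-up DataFrame. The column name "user" and the sample names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: one row per message, labeled with its sender.
df = pd.DataFrame({"user": ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"]})

# value_counts() returns message counts per user, sorted from most to least active.
counts = df["user"].value_counts()
top_users = counts.head(5)

# Share of all messages per user, as a percentage.
percent = (counts / len(df) * 100).round(2)

print(top_users)
print(percent)
```

The counts Series can be passed directly to a bar plot, and the percentages can fill the table of most active users.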

The wordCloud

A word cloud shows the most frequently used words together. The words in the cloud indicate whether the chat language is formal or informal. Stopwords and punctuation are removed so that only relevant words remain.

Most common words

Following the word cloud, a word count table lists the most commonly used words along with the number of times each was used. For easier interpretation, the same data is also shown as a horizontal bar graph.
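The word counting behind both the cloud and the table can be sketched with the standard library. The tiny stopword set and the function name most_common_words are illustrative assumptions; the app presumably uses a much longer stopword list.

```python
import string
from collections import Counter

# Tiny illustrative stopword list (assumption, not the app's list).
STOPWORDS = {"the", "is", "a", "to", "and", "at", "i", "you"}

def most_common_words(messages, n=20):
    """Lowercase, strip punctuation, drop stopwords, then count the words."""
    strip_punct = str.maketrans("", "", string.punctuation)
    words = []
    for message in messages:
        for word in message.lower().translate(strip_punct).split():
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words).most_common(n)

top = most_common_words(
    ["The meeting is at noon.", "Meeting moved to noon, sorry!", "See you at the meeting"],
    n=2,
)
print(top)
```

The same cleaned word list, joined into one string, is the kind of input a word cloud generator expects.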

The routine of emojis

A similar table shows the most frequently used emojis, with the emoji people use most to express their emotions at the top of the list. The emoji count is obtained by checking each character against emoji.UNICODE_EMOJI['en'], a dictionary from the emoji package that contains most of the available emoji, and counting the matches.
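The counting idea can be sketched without the emoji package by using a small hand-written set as a stand-in for the lookup table. KNOWN_EMOJI and count_emojis are hypothetical names for this example. Note that recent releases of the emoji package reportedly dropped UNICODE_EMOJI in favor of EMOJI_DATA, so the lookup may need adjusting on newer versions.

```python
from collections import Counter

# Hypothetical stand-in for the emoji package's lookup table.
KNOWN_EMOJI = {"😂", "❤", "👍", "🙏"}

def count_emojis(messages):
    """Count every character that appears in the emoji lookup set."""
    found = [ch for message in messages for ch in message if ch in KNOWN_EMOJI]
    return Counter(found).most_common()

emoji_counts = count_emojis(["lol 😂😂", "thanks 👍", "😂"])
print(emoji_counts)
```

Checking character by character is simple, but it does not handle emoji made of several code points (skin tones, joined sequences).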

Timelines

Two line graphs display the message traffic on a monthly and a daily basis. These give a more detailed view of the group's activity over time.
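One way to build the two timelines is to group messages by calendar day and by month. The sketch below uses made-up timestamps, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical messages with parsed timestamps.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-01-30 10:00", "2022-01-30 21:15",
        "2022-02-01 09:00", "2022-02-06 16:48", "2022-02-06 17:00",
    ]),
    "message": ["a", "b", "c", "d", "e"],
})

# Messages per calendar day.
daily = df.groupby(df["date"].dt.date)["message"].count()

# Messages per month (a Period such as 2022-01 keeps the year and month together).
monthly = df.groupby(df["date"].dt.to_period("M"))["message"].count()

print(daily)
print(monthly)
```

Each resulting Series is indexed by time and can be drawn as a line graph.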

Bar plot for busy months and days

Bar plots are a simple way to represent and understand data. From these, we can clearly see in which month of the year and on which day of the week the group is most active. This can point to why a particular month was the busiest and why others were the quietest.
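The busy-day and busy-month counts can be sketched with pandas datetime accessors. The timestamps here are made up for illustration.

```python
import pandas as pd

# Hypothetical message timestamps.
dates = pd.Series(pd.to_datetime([
    "2022-01-30 10:00", "2022-01-30 21:15",
    "2022-02-01 09:00", "2022-02-06 16:48", "2022-02-06 17:00",
]))

# Count messages per weekday name and per month name.
busy_day = dates.dt.day_name().value_counts()
busy_month = dates.dt.month_name().value_counts()

print(busy_day)
print(busy_month)
```

Both Series are already sorted from busiest to quietest, which suits a bar plot.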

Periodic Heatmap

The dataset is divided on an hourly basis by creating a new feature named period, which combines each message's hour with the adjacent hour. A pivot table is then created with day names as the index, the new period feature as the columns, and message counts as the values.

The matrix is drawn from the prepared pivot table using the seaborn heatmap. It shows at which periods of the day the traffic was at its peak and at its lowest.
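A minimal sketch of the period feature and the pivot table, using made-up timestamps. The label format (an hour paired with the hour that follows it, such as "16-17") and the column names are assumptions about how the period is built.

```python
import pandas as pd

# Hypothetical messages with parsed timestamps.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-02-06 16:48", "2022-02-06 16:59",
        "2022-02-06 10:05", "2022-02-01 23:30",
    ]),
    "message": ["a", "b", "c", "d"],
})

df["day_name"] = df["date"].dt.day_name()

# Label each message with its one-hour window, wrapping 23 around to 00.
df["period"] = df["date"].dt.hour.map(
    lambda h: f"{int(h):02d}-{(int(h) + 1) % 24:02d}"
)

# Rows are day names, columns are periods, cells are message counts.
heat = df.pivot_table(
    index="day_name", columns="period", values="message", aggfunc="count"
).fillna(0)

print(heat)
# With seaborn installed, seaborn.heatmap(heat) would draw this matrix.
```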

Sentiment Analysis

The Sentiment Intensity Analyzer is imported from NLTK, using the vader_lexicon package, and applied to the messages in the dataset.

The Analyzer gives us the polarity score for

  • Positive
  • Negative
  • Neutral

Each of these values is stored as a separate feature. The totals of the three new features are then computed and compared, and the sentiment with the highest total is reported as the overall sentiment.

Plotting all three totals gives the overall sentiment of the chat. To compare positive and negative sentiment directly, a separate pie chart is plotted for the total positive and negative scores.
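The aggregation step can be sketched as below. To keep the example self-contained, the per-message scores are made-up values in the shape that NLTK's SentimentIntensityAnalyzer().polarity_scores(text) returns (keys "pos", "neg", "neu"). The variable names are assumptions.

```python
# Made-up per-message scores; in the app these would come from the VADER analyzer.
scores = [
    {"pos": 0.6, "neg": 0.0, "neu": 0.4},
    {"pos": 0.0, "neg": 0.5, "neu": 0.5},
    {"pos": 0.3, "neg": 0.0, "neu": 0.7},
]

LABELS = {"pos": "Positive", "neg": "Negative", "neu": "Neutral"}

# Sum each score over all messages.
totals = {key: sum(s[key] for s in scores) for key in LABELS}

# The sentiment with the largest total is taken as the overall sentiment.
overall = LABELS[max(totals, key=totals.get)]

print(totals)
print(overall)
```

The three totals can be plotted together, and the "pos" and "neg" totals alone can feed the pie chart.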

Author: EphronM
Email: ephronmartin2016@gmail.com
