- Emotion Predictor
- Technologies
- ETL
- Test & Train
- Visualization
- Learnings
- Run Flask App
- Heroku
- Resources
- Contact
What is the Emotions Predictor?
Emotions Predictor is an application that lets users record and play back a short audio file; its built-in, super cool Machine Learning model then predicts the range of emotions and the gender of the speaker in the clip. There are also options to compare the frequencies and other attributes of various emotions to see how they show up on a scale.
Why Emotions Predictor? Empathy is important because it helps us understand how others are feeling so we can respond appropriately to the situation. Not all of us are born as manipulators, fortune-tellers, or psychics with excellent empathetic skills, and some people have difficulty understanding emotions, including:
- A person who has difficulty identifying and expressing emotions
- People having trouble identifying social cues
- People who are hard of hearing
Emotions Predictor can come in handy for interpreting the emotions in voicemails, FBI recordings, legal disputes, alien interactions, etc.
Emotions include:
- 02 = calm
- 03 = happy
- 04 = sad
- 05 = angry
- 06 = fearful
- 07 = disgust
- 08 = surprised
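Since the RAVDESS files encode these labels numerically, here is a minimal sketch of mapping a code to its label (assuming the dataset's standard naming convention, where the third hyphen-separated field of a filename carries the emotion code; the helper name is ours, not the repo's):

```python
# RAVDESS emotion codes -> labels (neutral/calm handling is discussed under Learnings)
EMOTIONS = {
    "02": "calm", "03": "happy", "04": "sad", "05": "angry",
    "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(filename: str) -> str:
    """e.g. '03-01-05-01-02-01-12.wav' -> 'angry' (code 05)."""
    code = filename.split("-")[2]  # third field is the emotion code
    return EMOTIONS.get(code, "unknown")
```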
- Machine Learning
- Jupyter Notebook / Pandas
- JavaScript
- Flask App
- D3
- HTML / CSS
- Tableau
- Data from CSV files of audio voice recordings
- Used the `librosa` package to convert audio files into 128 features, including low-level feature extraction such as chromagrams, Mel spectrograms, MFCCs, and various other spectral and rhythmic features (see the extraction sketch below)
- Used Pandas to provide the feature data for emotions and gender as input to the models
- Tested `RandomForestClassifier`, `KNeighborsClassifier`, Keras Deep Learning, and Linear Regression to find the most accurate model
- Developed record and playback functionality, the output of which could be read by a model to predict the emotions and the gender of the recorded audio
- Sample pre-recorded test clips were given as input to the models, and the emotions were predicted successfully.
Using the librosa library, we fed the .wav files through a parser that breaks each file into 128 Mel-frequency cepstral coefficient (MFCC) features. MFCCs represent the audio clip as a nonlinear "spectrum-of-a-spectrum". In an MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system.
This process allowed each audio file to be represented by data that could then be fed into various machine learning models for testing and training. We then used the same parsing method to break a user's submitted file down into MFCC data for the model to predict the makeup of emotions in the person's voice.
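As a rough illustration, here is a minimal sketch of that extraction step (the function name and the averaging over frames are our assumptions; the repo's actual parser may differ):

```python
import librosa
import numpy as np

def extract_features(wav_path):
    """Load a .wav clip and return a 128-dimensional MFCC feature vector."""
    # res_type="kaiser_fast" speeds up resampling for short clips
    audio, sample_rate = librosa.load(wav_path, res_type="kaiser_fast")
    # 128 MFCCs per frame, averaged over time into one fixed-length vector
    mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=128)
    return np.mean(mfccs, axis=1)  # shape: (128,)
```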
Across the models we tested, the Random Forest Classifier produced the most accurate predictions for our purpose and was the model that did not overfit.
Accuracy charts: Emotion Random Forest, Gender Random Forest, Emotion Deep Learning, Gender Deep Learning, and KNN Model.
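A minimal sketch of how such a comparison can be run, assuming a hypothetical features.csv of extracted MFCC vectors with an "emotion" label column (the gap between train and test accuracy is how overfitting shows up):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("features.csv")  # hypothetical file of extracted MFCC features
X, y = df.drop(columns=["emotion"]), df["emotion"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

for name, model in [
    ("Random Forest", RandomForestClassifier(n_estimators=200, random_state=42)),
    ("KNN", KNeighborsClassifier(n_neighbors=5)),
]:
    model.fit(X_train, y_train)
    # A large train/test gap signals overfitting
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: train={train_acc:.3f}, test={test_acc:.3f}")
```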
The Emotion Predictor runs as a client-side Flask application; in its current edition it neither needs nor contains a database. If we were to expand the project and store user-submitted files as additional training inputs for the model, a database would then be needed.
The application uses the built-in functionality of HTML5 to let the user's browser record and store the audio file. Once recorded, the file is passed into the Flask app using the POST method, run through the audio parser to break it into features, and then passed through the model. The application uses two different models, one for emotion and one for gender; the predictions, along with the probability of each emotion and sex, are then written into a JSON dictionary that is used to generate the Plotly bar chart of emotions.
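A minimal sketch of what that POST flow can look like (the route, the "audio" field name, and the model variables are illustrative assumptions; extract_features is the parser sketched above):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Save the clip the browser recorded, then reuse the training-time parser
    audio_file = request.files["audio"]
    audio_file.save("clip.wav")
    features = extract_features("clip.wav").reshape(1, -1)

    # Two separate models: one for emotion, one for gender
    emotion_probs = emotion_model.predict_proba(features)[0]
    gender_probs = gender_model.predict_proba(features)[0]

    # Class -> probability dictionaries, consumed by the Plotly bar chart
    return jsonify({
        "emotion": dict(zip(emotion_model.classes_, emotion_probs.tolist())),
        "gender": dict(zip(gender_model.classes_, gender_probs.tolist())),
    })
```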
Test Clips
A similar function is used on the Test Clips page, but the data is stored as a JSON file to avoid the lag of calling 8 audio files in a row to build the dictionary on each session. Instead, the page calls a stored dictionary and recalls the specific values for each of the 8 Bootstrap cards, allowing for a much more seamless user experience.
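A minimal sketch of precomputing that dictionary (file paths and the output location are hypothetical):

```python
import json

test_clips = [f"static/clips/clip_{i}.wav" for i in range(1, 9)]  # hypothetical paths

precomputed = {}
for path in test_clips:
    features = extract_features(path).reshape(1, -1)
    probs = emotion_model.predict_proba(features)[0]
    precomputed[path] = dict(zip(emotion_model.classes_, probs.tolist()))

# The Test Clips page reads this file instead of calling the models live
with open("static/test_clips.json", "w") as f:
    json.dump(precomputed, f, indent=2)
```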
Sound cards made with Bootstrap cards
Alexis Bar Chart
Emotion Visualizations made using Pandas
Tableau
Model Accuracy
- The gender data contained more female than male samples once we combined both datasets. The emotion data originally included neutral, calm, happy, sad, fearful, angry, disgust, and surprised; we combined neutral and calm because they were similar.
- This was causing our model to predict female, and calm, more often than it should. We took steps to remove the extra female data and eliminated calm from our data, which resulted in a more accurate model (see the rebalancing sketch below).
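A minimal sketch of that rebalancing step, assuming a combined DataFrame with hypothetical "gender" and "emotion" columns:

```python
import pandas as pd

df = pd.read_csv("features.csv")  # hypothetical combined RAVDESS + TESS features
df = df[df["emotion"] != "calm"]  # drop the over-predicted calm class

# Downsample female rows to match the male count
male = df[df["gender"] == "male"]
female = df[df["gender"] == "female"].sample(n=len(male), random_state=42)
balanced = pd.concat([male, female]).reset_index(drop=True)
```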
Predicting Gender
- Even with an accurate model, it is still difficult to predict someone's gender based on their voice.
- The only difference between the male and female larynx is size.
- Several investigators have argued that a comprehensive understanding of gender differences in vocal emotion recognition can only be achieved by replicating these studies while accounting for influential factors such as stimulus type, gender-balanced samples, and the number of encoders, decoders, and emotional categories.
To deploy our Flask app, please follow the steps below:
- Step 1: Git clone our repository to your local machine
- Step 2: From the folder, in your terminal, type `python app.py` to launch the site
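For reference, `python app.py` assumes the standard Flask entry point at the bottom of app.py, something like:

```python
if __name__ == "__main__":
    app.run(debug=True)  # serves the site locally at http://127.0.0.1:5000
```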
RAVDESS Dataset: Livingstone & Russo, "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)", licensed under CC BY-NC-SA 4.0. Link
TESS Dataset: Pichora-Fuller, M. Kathleen; Dupuis, Kate, 2020, "Toronto emotional speech set (TESS)", Link, Scholars Portal Dataverse, V1
Elliott McFarland * Celeste Muniz * Saroja Shreenivasan * Sai Prasanna * Tim Samson * Sara Simoes