Emotion Recognition - Machine Learning

logo

Emotions Predictor


What is the Emotions Predictor?

Emotions Predictor is an application that lets users record and play back a short audio file; its built-in (super cool) machine learning model then predicts the range of emotions in the clip and the gender of the speaker. There are options to compare the frequencies and other attributes of the various emotions to see how they show up on a scale.

Why Emotions Predictor? Empathy is important because it helps us understand how others are feeling so we can respond appropriately to the situation. Not all of us are born manipulators, fortune-tellers, or psychics with excellent empathetic skills. There are also people who have difficulty understanding emotions, including:

  • People who have difficulty identifying and expressing emotions
  • People who have trouble picking up on social cues
  • People who are hard of hearing

The Emotions Predictor can come in handy for interpreting the emotions in voicemails, FBI recordings, legal disputes, alien interactions, etc.

Emotions include (numbered by their RAVDESS label codes):

  • 02 = calm
  • 03 = happy
  • 04 = sad
  • 05 = angry
  • 06 = fearful
  • 07 = disgust
  • 08 = surprised

Technologies

  • Machine Learning
  • Jupyter Notebook / Pandas
  • JavaScript
  • Flask App
  • D3
  • HTML / CSS
  • Tableau

Extract Transform Load

  • Data from CSV files of audio voices.
  • Used the librosa package to convert each audio file into 128 features via low-level feature extraction, including chromagrams, Mel spectrograms, MFCCs, and various other spectral and rhythmic features (see the sketch after this list).
  • Used Pandas to provide the feature data for emotions and gender as input to the models.
  • Tested RandomForestClassifier, KNeighborsClassifier, Keras deep learning, and linear regression to find the most accurate model.
  • Developed record and playback functionality, the output of which could be read by the models to predict the emotions and the gender of the recorded audio.
  • Sample pre-recorded test clips were given as input to the models, and the emotions were predicted successfully.
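
The sketch below illustrates this extraction step; the helper name, parameters, and the RAVDESS-style filename are ours for illustration, not the repository's exact code:

    import librosa
    import numpy as np

    def extract_features(path, n_mfcc=128):
        # Load at the file's native sample rate, then average each coefficient
        # over time so every clip becomes one fixed-length vector of 128 values.
        y, sr = librosa.load(path, sr=None)
        mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        return np.mean(mfccs.T, axis=0)

    # Hypothetical RAVDESS-style filename; the third field (05) encodes "angry".
    features = extract_features("03-01-05-01-01-01-12.wav")
    print(features.shape)  # (128,)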

wave gif

Test Train

We ran the WAV files through the librosa library to parse each file into 128 Mel-frequency cepstral coefficient (MFCC) features. MFCCs represent the audio clip as a nonlinear "spectrum-of-a-spectrum". In an MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system.
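
To see that nonlinear spacing concretely, a quick check with librosa's mel converters (assuming only that librosa is installed) shows equal mel steps widening in Hz:

    import librosa
    import numpy as np

    # Equal steps on the mel scale map to increasingly wide steps in Hz,
    # mirroring how human hearing compresses high frequencies.
    mels = np.linspace(0, 3000, 7)
    print(np.round(librosa.mel_to_hz(mels)))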

This process allowed each audio file to be represented by data that could then be fed into various machine learning models for testing and training. We then used the same parsing method to break a user's input file down into MFCC data for the model to predict the makeup of emotions in the person's voice.

Through testing these models it was determined that, for our purpose, the Random Forest classifier produced the most accurate predictions and was the model that was not overtrained.
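
A minimal sketch of that comparison for two of the candidates, with synthetic data standing in for the real 128-feature matrix (hyperparameters are illustrative, not the notebook's exact settings):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the real (n_clips, 128) MFCC feature matrix.
    X, y = make_classification(n_samples=500, n_features=128,
                               n_informative=20, n_classes=7, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    for model in (RandomForestClassifier(n_estimators=100, random_state=42),
                  KNeighborsClassifier(n_neighbors=5)):
        model.fit(X_train, y_train)
        # A large gap between train and test accuracy signals overtraining.
        print(type(model).__name__,
              f"train={model.score(X_train, y_train):.2f}",
              f"test={model.score(X_test, y_test):.2f}")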

model

Emotion Random Forest

emotion RF

Gender Random Forest

gender RF

Emotion Deep Learning

emotion deep learning

Gender Deep Learning

gender deep learning

KNN Model

knn model

Accuracy

model chart

winner

Visualization

The Emotion Predictor runs as a Flask application with all recording handled in the client's browser; in its current edition it neither needs nor contains a database. If we were to expand the project and store users' uploaded files as additional training inputs for the model, a database would then be needed.

The application uses the built-in functionality of HTML5 to let the user's browser record and store the audio file. Once recorded, the file is passed into the Flask app using the POST method, where it is run through the audio parser, broken into features, and then fed through the models. The application uses two different models, one for emotion and one for gender; the prediction, along with the probability of each emotion and sex, is then passed into a JSON dictionary that is used to generate the Plotly bar chart of emotions.
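
A minimal sketch of that request flow, reusing the hypothetical extract_features helper from the ETL sketch; the route, form field, and model paths are assumptions rather than the app's exact code:

    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    # Assumes the two trained models were saved with joblib (paths hypothetical).
    emotion_model = joblib.load("models/emotion_rf.pkl")
    gender_model = joblib.load("models/gender_rf.pkl")

    @app.route("/predict", methods=["POST"])
    def predict():
        # The browser POSTs the recorded clip as multipart form data.
        request.files["audio"].save("upload.wav")
        row = extract_features("upload.wav").reshape(1, -1)

        def probs(model):
            # Map each class label to its predicted probability.
            return {str(c): float(p)
                    for c, p in zip(model.classes_, model.predict_proba(row)[0])}

        return jsonify({"emotion": probs(emotion_model), "gender": probs(gender_model)})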

model test

Test Clips

A similar function is used on the Test Clips page, but the data is stored as a JSON file to avoid the lag of the application running all 8 audio files through the models to build the dictionary on each session. Instead, the page calls a stored dictionary and recalls the specific values for each of the 8 bootstrap cards, allowing for a much more seamless user experience.
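
One way that stored dictionary could be generated offline, again reusing the hypothetical helper and model objects from the sketches above (clip and output names are illustrative):

    import json

    # One hypothetical clip per bootstrap card on the Test Clips page.
    clips = [f"test_clip_{i}.wav" for i in range(1, 9)]

    stored = {}
    for clip in clips:
        row = extract_features(clip).reshape(1, -1)
        stored[clip] = {str(c): float(p)
                        for c, p in zip(emotion_model.classes_,
                                        emotion_model.predict_proba(row)[0])}

    with open("static/test_clips.json", "w") as f:
        json.dump(stored, f, indent=2)  # the page fetches this instead of re-running the models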

sound

Sound cards made with Bootstrap cards

test clip1

test clip2

Alexis Bar Chart

Alexis

Emotion Visualizations made using Pandas

calm

angry

Tableau

tableau 1

tableau 2

Learnings

Model Accuracy

  • The gender data contained more female samples than male once we combined the two datasets. The emotion data originally included neutral, calm, happy, sad, fearful, angry, disgust, and surprise; we combined neutral and calm because they were similar.

  • This was causing our model to predict female and calm more often than it should. We took steps to remove the extra female data and eliminated calm from our data, which resulted in a more accurate model (a sketch of this rebalancing follows below).
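
A sketch of that rebalancing with a toy pandas frame standing in for the real feature table (column names and counts are illustrative):

    import pandas as pd

    # Toy stand-in for the combined RAVDESS/TESS feature table.
    df = pd.DataFrame({
        "gender": ["female"] * 700 + ["male"] * 500,
        "emotion": ["calm", "happy", "sad", "angry"] * 300,
    })

    df = df[df["emotion"] != "calm"]            # drop the over-predicted class
    n = df["gender"].value_counts().min()       # size of the smaller gender group
    balanced = df.groupby("gender", group_keys=False).sample(n=n, random_state=42)
    print(balanced["gender"].value_counts())    # now equal female/male counts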

Predicting Gender

  • Even with an accurate model, it is still difficult to predict someone's gender based on their voice.

  • The only difference between the male and female larynx is size.

  • Several investigators have argued that a comprehensive understanding of gender differences in vocal emotion recognition can only be achieved by replicating existing studies while accounting for influential factors such as stimulus type, gender-balanced samples, and the number of encoders, decoders, and emotional categories.

Run Flask App

To deploy our Flask app, follow the steps below:

  • Step 1: Clone our repository to your local machine (git clone https://github.com/timsamson/audible_emotion_recognition.git).

  • Step 2: From the repository folder in your terminal, type python app.py to launch the site.

Heroku

Emotion Predictor

Resources

faces

RAVDESS Dataset: "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)" by Livingstone & Russo is licensed under CC BY-NC-SA 4.0. Link

TESS Dataset: Pichora-Fuller, M. Kathleen; Dupuis, Kate, 2020, "Toronto emotional speech set (TESS)", Scholars Portal Dataverse, V1. Link

HTML Template

Google Doc Presentation

Contact

team

Elliott McFarland * Celeste Muniz * Saroja Shreenivasan * Sai Prasanna * Tim Samson * Sara Simoes

thanks!

About

Collaborated to create a machine learning model, trained and tested with a Random Forest classifier, to predict the primary emotion of an input audio file. Data was cleaned and the model trained in a Jupyter Notebook using Pandas and Librosa. Results are visualized using Pandas, Tableau, and JavaScript functions with Bootstrap in a dynamic HTML website.
