Real-Time Audio/Video Vowel Recognition System

Authors: Francesco Papaleo, Tommaso Settimi, Chris Morse

Final Project for the Sound Communication course - Master in Sound and Music Computing

Universitat Pompeu Fabra, Barcelona

Description

This project is a proof of concept for a vowel recognition system that combines mouth gestures and sound. Data flows through the system as follows:

  audio-in                                                        FaceOSC (Face tracking)
    |                                                                       |
audio feature extraction                                        mouth gesture extraction 
(SuperCollider)                                                         (FaceOSC)
    |                                                                       |
   OSC                                                                     OSC
    |                                                                       |                                           
    ---------------------------> Python OSC Server  <------------------------
                                        |
                                       OSC
                                        |
                                Wekinator Input Helper
                                        |
                                        |
                                Wekinator Classifier
                                (Vowel recognition)
                                        |
                                        |
                                       OSC
                                        |
                                    Max / MSP
                        (visual feedback and audio examples)

Goals (work in progress)

The purpose of this project is to create a working infrastructure that could support language-teaching applications.

For demonstration purposes, the system considers five vowel sounds: /a/, /e/, /i/, /o/, /u/.
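
As an illustration of how a client could turn the classifier output into one of these five vowels, here is a minimal sketch. It assumes that Wekinator sends a single float class number (starting at 1) on the address `/wek/outputs`, and that the classes were trained in the order a, e, i, o, u. The class order and the helper names are assumptions, not something stated in this repository.

```python
import struct

# Assumed class order: class 1 = /a/, ..., class 5 = /u/.
VOWELS = ["a", "e", "i", "o", "u"]


def _read_string(data: bytes, offset: int):
    """Read a null-terminated OSC string and return it with the next offset."""
    end = data.index(b"\x00", offset)
    text = data[offset:end].decode("ascii")
    # Skip the terminator and padding up to the next 4-byte boundary.
    return text, (end + 4) & ~3


def decode_floats(packet: bytes):
    """Decode an OSC message whose arguments are all 32-bit floats."""
    address, pos = _read_string(packet, 0)
    tags, pos = _read_string(packet, pos)
    count = tags.count("f")
    values = struct.unpack_from(f">{count}f", packet, pos)
    return address, list(values)


def vowel_from_output(packet: bytes):
    """Map a classifier output message to a vowel, or None if it is invalid."""
    address, values = decode_floats(packet)
    if address != "/wek/outputs" or not values:
        return None
    index = int(round(values[0])) - 1
    return VOWELS[index] if 0 <= index < len(VOWELS) else None


# Hand-built example message carrying class 3.
example = b"/wek/outputs\x00\x00\x00\x00,f\x00\x00" + struct.pack(">f", 3.0)
label = vowel_from_output(example)
```

In the project itself this mapping happens in the Max / MSP patch; the sketch only shows the idea in Python.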

Run this code

install Wekinator
install FaceOSC (optional)
install SuperCollider
install Max / MSP

run from terminal:

pip3 install -r requirements.txt

cd src

python3 audio_video_server.py

open SuperCollider > File > Open > src/FeatureExtractor.scd
launch FaceOSC (optional)
open Max / MSP > File > Open > patch
open Wekinator Input Helper
open Wekinator > File > Open > project file
run the pre-trained model
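
With several applications exchanging OSC over UDP, a quick way to see whether each one is listening is to probe its port. The sketch below is a hypothetical helper, not part of this repository. The port numbers are assumptions based on the usual Wekinator defaults (6448 for inputs, 12000 for outputs) and should be adjusted to your configuration.

```python
import socket

# Assumed defaults, not taken from this repository: adjust to your setup.
EXPECTED_LISTENERS = {
    "Wekinator (inputs)": 6448,
    "Max / MSP (Wekinator outputs)": 12000,
}


def udp_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if the UDP port cannot be bound, i.e. something holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Any bind failure is treated as "port already in use".
            return True
    return False


def report(listeners: dict) -> dict:
    """Map each component name to whether its port is currently bound."""
    return {name: udp_port_in_use(port) for name, port in listeners.items()}
```

Calling `report(EXPECTED_LISTENERS)` after the steps above should show `True` for every component that is running and listening on the assumed port.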

FaceOSC Keyboard controls

r - reset the face tracker
m - toggle face mesh drawing
g - toggle GUI visibility
p - pause/unpause (only works with movie source)
up/down - increase/decrease movie playback speed (only works with movie source)

Other scripts in Python

For demonstration purposes, we provide some scripts that extract audio and video features from audio files and from live audio/video input. These scripts are optional and are not required to run the main project.

audio_osc.py: sends audio features to Wekinator
formants_extractor.py: extracts formants from audio files with Praat-Parselmouth
video_osc.py: sends mouth gesture features to Wekinator
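
To give an idea of what formant extraction with Praat-Parselmouth can look like, here is a minimal sketch. It is not the code in formants_extractor.py: the function names, the choice of two formants, and the 10 ms time step are assumptions made for illustration. It uses Parselmouth's Burg formant analysis and then averages the frames, skipping frames where a formant is undefined.

```python
import math


def extract_formants(path, n_formants=2, time_step=0.01):
    """Return one [F1, F2, ...] row in Hz per analysis frame."""
    import parselmouth  # third-party: pip install praat-parselmouth

    sound = parselmouth.Sound(path)
    formant = sound.to_formant_burg(time_step=time_step)
    return [
        [formant.get_value_at_time(i, t) for i in range(1, n_formants + 1)]
        for t in formant.xs()  # frame centre times in seconds
    ]


def mean_formants(frames):
    """Average each formant over frames, skipping NaN (undefined) values."""
    means = []
    for column in zip(*frames):
        valid = [v for v in column if not math.isnan(v)]
        means.append(sum(valid) / len(valid) if valid else math.nan)
    return means
```

A call such as `mean_formants(extract_formants("vowel.wav"))`, where `vowel.wav` is a placeholder file name, would give one average value per formant that could be sent to Wekinator as audio features.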

Folder Structure

    .
    ├── assets                              # screenshots and slides of the project's presentation
    ├── Democlassifier                      # pre-trained model for Wekinator
    │   ├── current
    │   │   └── models
    │   └── saved
    └── src                                 # source code
        ├── audio_osc.py                    # calls formants_extractor and sends audio features to Wekinator
        ├── audio_video_server.py           # sends audio and video features to Wekinator via OSC  
        ├── FeatureExtractor.scd            # SuperCollider script for audio feature extraction
        ├── formants_extractor.py           # extracts formants from audio files with Praat-Parselmouth
        ├── MonitorOSC.maxpat               # Max patch for monitoring OSC messages and testing the project
        ├── training_GUI.amxd               # Max patch for the training of vowel sounds
        └── video_osc.py                    # sends mouth gesture features to Wekinator
