Skip to content

Latest commit

 

History

History
113 lines (82 loc) · 4.51 KB

README.md

File metadata and controls

113 lines (82 loc) · 4.51 KB

Real-Time Audio/Video Vowel Recognition System

Authors: Francesco Papaleo, Tommaso Settimi, Chris Morse

Final Project for the Sound Communication course - Master in Sound and Music Computing

Universitat Pompeu Fabra, Barcelona

Description

This project is a proof-of-concept for a vowel recognition system based on mouth gesture and sound. The system is based on the following steps:

  audio-in                                                        FaceOSC (Face tracking)
    |                                                                       |
audio feature extraction                                        mouth gesture extraction 
(Super Collider)                                                        (FaceOSC)
    |                                                                       |
   OSC                                                                     OSC
    |                                                                       |                                           
    ---------------------------> Python OSC Server  <------------------------
                                        |
                                       OSC
                                        |
                                Wekinator Input Helper
                                        |
                                        |
                                Wekinator Classifier
                                (Vowels recognition)
                                        |
                                        |
                                       OSC
                                        |
                                    Max / MSP
                        (visual feedback and audio examples)

Goals (work in progress)

The purpose of this project is to create a working infra-structure that could support language teaching applications.

For demonstration purposes, 5 possible vowels sounds are considered: /a/, /e/, /i/, /o/, /u/.

Run this code

  1. install Wekinator

  2. install FaceOSC (optional)

  3. install SuperCollider

  4. install Max / MSP

  5. run from terminal:

    pip3 install -r requirements.txt
    
    cd src
    
    python3 audio_video_server.py
  6. open SuperCollider > File > Open > script

  7. launch FaceOSC (optional)

  8. open Max / MSP > File > Open > patch

  9. open Wekinator Input Helper

  10. open Wekinator > File > Open > project file

  11. run the pre-trained model

FaceOSC Keyboard controls

r - reset the face tracker
m - toggle face mesh drawing
g - toggle gui's visibility
p - pause/unpause (only works with movie source)
up/down - increase/decrease movie playback speed (only works with movie source)

Other scripts in python

For demonstration purposes we provide some scripts that can be used to extract audio and video features from audio files and live audio/video input. These script are optional and are not required to run the main project.

Folder Structure

    .
    ├── assets                              # screenshots and slides of the project's presentation
    ├── Democlassifier                      # pre-trained model for Wekinator
    │   ├── current
    │   │   └── models
    │   └── saved
    └── src                                 # source code
        ├── audio_osc.py                    # calls formants_extractor and sends audio features to Wekinator
        ├── audio_video_server.py           # sends audio and video features to Wekinator via OSC  
        ├── FeatureExtractor.scd            # SuperCollider script for audio feature extraction
        ├── formants_extractor.py           # extract formants from audio files with Praat-Parselmouth
        ├── MonitorOSC.maxpat               # Max patch for monitoring OSC messages and testing the project
        ├── training_GUI.amxd               # Max patch for the training of vowel sounds
        └── video_osc.py                    # sends mouth gesture features to Wekinator