This project translates the American Sign Language (ASL) fingerspelled alphabet (26 letters, A-Z). I used transfer learning to extract image features from a pre-trained network, followed by a custom classification block to classify the letters. The model is then deployed in a real-time system with OpenCV, reading frames from a web camera and classifying them frame-by-frame. This repository contains the code & weights for real-time ASL alphabet classification.
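As a rough sketch of this transfer-learning setup (the actual classification block used in this repo is described later, and the choice of MobileNet and the layer sizes below are placeholders), a frozen pre-trained base feeding a small custom classification block could look like this in Keras:

```python
# Illustrative sketch only: frozen pre-trained base + custom classification block.
# The real architecture and hyperparameters live in this repo's training scripts.
from keras.applications.mobilenet import MobileNet
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model

NUM_CLASSES = 26  # A-Z fingerspelled letters

# Pre-trained base used purely as a feature extractor
base = MobileNet(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

# Custom classification block (sizes are placeholders)
x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(NUM_CLASSES, activation='softmax')(x)

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```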
This project was developed as my portfolio project at the Data Science Retreat (Batch 09) in Berlin. Please feel free to fork/comment/collaborate! Presentation slides are available in the repo :)
The entire pipeline (web camera -> image crop -> pre-processing -> classification) can be executed by running the live_demo.py script.
The live_demo.py script loads a pre-trained model (VGG16/ResNet50/MobileNet) with a custom classification block, and classifies the ASL alphabet frame-by-frame in real-time. The script will automatically access your web camera and open up a window with the live camera feed. A rectangular region of interest (ROI) is shown on the camera feed. This ROI is cropped and passed to the classifier, which returns the top 3 predictions. The largest letter shown is the top prediction, and the bottom 2 letters are the second (left) and third (right) most probable predictions. The architecture of the classification block will be described further in Sections 4/5.
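For intuition, the frame-by-frame loop in live_demo.py works roughly as sketched below. The ROI coordinates, pre-processing, and overlay details are assumptions for illustration, and `model` stands in for a classifier like the one sketched above:

```python
# Rough sketch of the live loop: capture -> crop ROI -> pre-process -> classify -> overlay.
# The exact ROI position, pre-processing and display logic in live_demo.py may differ.
import cv2
import numpy as np

LETTERS = [chr(c) for c in range(ord('A'), ord('Z') + 1)]

cap = cv2.VideoCapture(0)                      # default web camera
while True:
    ret, frame = cap.read()
    if not ret:
        break

    x0, y0, size = 100, 100, 224               # placeholder ROI position/size
    roi = frame[y0:y0 + size, x0:x0 + size]
    cv2.rectangle(frame, (x0, y0), (x0 + size, y0 + size), (0, 255, 0), 2)

    # Pre-process the crop to the network's input format (backbone-specific in practice)
    img = cv2.resize(roi, (224, 224))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0

    probs = model.predict(np.expand_dims(img, axis=0))[0]
    top3 = probs.argsort()[-3:][::-1]           # indices of the 3 most probable letters

    label = '{}   {} {}'.format(LETTERS[top3[0]], LETTERS[top3[1]], LETTERS[top3[2]])
    cv2.putText(frame, label, (x0, y0 - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)

    cv2.imshow('ASL live demo', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):       # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```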
The code was developed with Python 3.5 and requires the following libraries/versions:
- OpenCV 3.1.0
- keras 2.0.8
- tensorflow-gpu 1.0.1 (the non-GPU tensorflow package also works)
- numpy 1.13.3
- joblib 0.10.3
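A quick way to check that your installed versions match the list above (this snippet is just a convenience, not part of the repo):

```python
# Print installed library versions to compare against the requirements list
import cv2, keras, tensorflow, numpy, joblib

print('OpenCV     ', cv2.__version__)
print('Keras      ', keras.__version__)
print('TensorFlow ', tensorflow.__version__)
print('NumPy      ', numpy.__version__)
print('joblib     ', joblib.__version__)
```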
NOTE - feature extraction with the pre-trained Keras models was run on an AWS EC2 p2.8xlarge instance with the Bitfusion Ubuntu 14 TensorFlow-2017 AMI. Packages had to be updated manually, and the AMI's default Python is Python 2. You can either install Python 3 or edit the scripts to work with Python 2 (the only issues should be the print statements).
When running the script, you must choose the pre-trained model you wish to use. You may optionally load your own weights for the classification block.
$ python live_demo.py --help
usage: live_demo.py [-h] [-w WEIGHTS] -m MODEL

optional arguments:
  -h, --help            show this help message and exit
  -w WEIGHTS, --weights WEIGHTS
                        path to the model weights

required arguments:
  -m MODEL, --model MODEL
                        name of pre-trained network to use
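For example (the strings accepted by --model are defined in live_demo.py, so treat the names and the weights path below as placeholders):

$ python live_demo.py --model mobilenet
$ python live_demo.py --model vgg16 --weights path/to/your_weights.h5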
NOTE - On a MacBook Pro (macOS Sierra, 16GB 1600MHz DDR3, 2.2 GHz Intel Core i7) using the CPU only, classifying a single frame can take up to ~250 ms. This causes lag during real-time classification, as the effective frame rate is anywhere from 1-10 frames per second depending on which model is running. MobileNet is the most efficient model. Performance for all models is significantly improved when running on a GPU.
There are no accurate measurements of how many people use American Sign Language (ASL) - estimates vary from 500,000 to 15 million people. However, 28 million Americans (~10% of the population) have some degree of hearing loss, and 2 million of these 28 million are classified as deaf. For many of these people, ASL is their first language.
The ASL alphabet is 'fingerspelled' - this means every letter of the alphabet (26 letters, from A-Z) can be signed using one hand. There are 3 main use cases for fingerspelling in any sign language:
(i) Spelling your name
(ii) Emphasising a point (i.e. literally spelling out a word)
(iii) Saying a word not present in the ASL dictionary (the current Oxford English Dictionary has ~170,000 words, while estimates for ASL range from 10,000-50,000 words)
This project is a (very small!) first step towards bridging the gap between 'signers' and 'non-signers'.
coming soon I promise
coming soon
coming soon
coming soon
- https://research.gallaudet.edu/Publications/ASL_Users.pdf
- https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html