This project was developed for my Speech Processing course at my university. The model architecture is inspired by DeepSpeech2.
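As a rough illustration, here is a minimal PyTorch sketch of a DeepSpeech2-style network: 2D convolutions over the spectrogram, stacked bidirectional GRUs, and a linear head that emits per-frame log-probabilities for CTC. The class name, layer sizes, and hyperparameters below are hypothetical and not taken from this repo.

```python
import torch
import torch.nn as nn

class DeepSpeech2Like(nn.Module):
    """Minimal DeepSpeech2-style sketch (hypothetical sizes, not this repo's
    exact model): conv frontend -> bidirectional GRU stack -> CTC head."""

    def __init__(self, n_mels=80, n_classes=29, hidden=512):
        super().__init__()
        # Two 2D convolutions over (frequency, time); with these kernel/stride/
        # padding choices the frequency axis shrinks by 4x (assumes n_mels % 4 == 0)
        # and the time axis by 2x.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(20, 5)),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1), padding=(10, 5)),
            nn.ReLU(),
        )
        self.rnn = nn.GRU(32 * (n_mels // 4), hidden, num_layers=3,
                          bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, spec):                            # spec: (batch, 1, n_mels, time)
        x = self.conv(spec)                             # (batch, 32, n_mels/4, time/2)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, time, features)
        x, _ = self.rnn(x)
        return self.head(x).log_softmax(-1)             # per-frame log-probs for CTC

model = DeepSpeech2Like()
logp = model(torch.randn(4, 1, 80, 200))                # -> shape (4, 100, 29)
```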
I highly recommend using a conda virtual environment. I implemented this model with PyTorch and PyTorch Lightning.
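For example, a typical setup might look like this (`speech` is just a placeholder environment name, and the Python version is an assumption):

conda create -n speech python=3.9

conda activate speech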
pip install -r requirements.txt
The dataset used for training and evaluating this model was recorded and cleaned by me and my teammates. It contains 3,800 WAV files covering the 18 commands below:
python train.py --epoch [number of epochs] --batch_size [batch size] --data [path to data directory] --vocab [path to vocab model file] --mode [decode mode: 'greedy' or 'beam']
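For example, a hypothetical run (every path and value below is a placeholder, not a setting from this repo):

python train.py --epoch 50 --batch_size 32 --data ./data/commands --vocab ./vocab.model --mode greedy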
I used CTC as the loss function. For decoding, there are two strategies: a greedy decoder or a beam-search decoder.
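To illustrate the greedy strategy, here is a minimal sketch of standard greedy CTC decoding (not necessarily the exact code in this repo; it assumes the blank token has index 0): take the argmax class at each frame, collapse consecutive repeats, then drop blanks.

```python
import torch

def greedy_ctc_decode(log_probs, blank=0):
    """Greedy CTC decode: argmax per frame, collapse repeats, drop blanks.
    `log_probs` is a (time, n_classes) tensor of per-frame log-probabilities."""
    best = log_probs.argmax(dim=-1).tolist()  # most likely class at each frame
    decoded, prev = [], None
    for label in best:
        if label != blank and label != prev:  # skip blanks and repeated labels
            decoded.append(label)
        prev = label
    return decoded

# Toy example: per-frame argmaxes [1, 1, 0, 1, 2, 2] collapse to [1, 1, 2]
# (the blank between the two 1s keeps them as separate symbols).
frames = torch.eye(3)[[1, 1, 0, 1, 2, 2]].log_softmax(-1)
print(greedy_ctc_decode(frames))  # [1, 1, 2]
```

Beam search keeps the top-k partial hypotheses per frame instead of a single argmax path, which usually improves accuracy at the cost of decoding speed.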