Customized Speech Recognition model for Command Recognition

Motivation

This project was developed for the Speech Processing course at my university. The model architecture is inspired by DeepSpeech2.

Requirements

I highly recommend using a conda virtual environment. The model is implemented with PyTorch and PyTorch Lightning.

pip install -r requirements.txt

Dataset

The dataset used for training and evaluating this model was recorded and cleaned by my teammates and me. It contains 3800 WAV files covering the 18 commands below:

Training

python train.py --epoch [num of epochs] --batch_size [batch size] --data [path to data directory] --vocab [path to vocab model file] --mode [decode mode: 'greedy' or 'beam']

Decoding

The model is trained with CTC loss. Two decoding strategies are available for turning the per-frame output into text: greedy decoding and beam search decoding.
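To illustrate how the two strategies differ, here is a minimal pure-Python sketch of both. It is not the code used in this repository: the function names, the `blank=0` index, the `beam_width` parameter, and the plain list-of-lists input are all assumptions made for the example. Greedy decoding takes the most likely class per frame, collapses repeats, and drops blanks. Beam search (here a basic CTC prefix beam search without a language model) keeps several candidate prefixes and sums the probability of every alignment that collapses to the same text.

```python
from collections import defaultdict


def ctc_greedy_decode(probs, vocab, blank=0):
    """Pick the best class per frame, collapse repeats, drop blanks.

    probs: list of frames, each a list of per-class scores (hypothetical format).
    vocab: list mapping class index to a character; vocab[blank] is unused.
    """
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    decoded = []
    previous = None
    for index in best_path:
        # Emit a symbol only when it changes and is not the blank.
        if index != previous and index != blank:
            decoded.append(vocab[index])
        previous = index
    return "".join(decoded)


def ctc_beam_search_decode(probs, vocab, beam_width=3, blank=0):
    """Basic CTC prefix beam search over per-frame probabilities (not log-probs)."""
    # Each prefix maps to (prob. of ending in blank, prob. of ending in non-blank).
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            total = p_blank + p_non_blank
            for index, p in enumerate(frame):
                if index == blank:
                    # A blank keeps the prefix unchanged.
                    next_beams[prefix][0] += total * p
                elif prefix and prefix[-1] == index:
                    # Same symbol right after itself: merges into the prefix...
                    next_beams[prefix][1] += p_non_blank * p
                    # ...unless a blank separated them, which starts a new symbol.
                    next_beams[prefix + (index,)][1] += p_blank * p
                else:
                    next_beams[prefix + (index,)][1] += total * p
        # Keep only the most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(scores) for prefix, scores in ranked[:beam_width]}
    best_prefix = max(beams.items(), key=lambda kv: sum(kv[1]))[0]
    return "".join(vocab[index] for index in best_prefix)


if __name__ == "__main__":
    toy_vocab = ["_", "a"]                # index 0 is the CTC blank
    toy_probs = [[0.6, 0.4], [0.6, 0.4]]  # two frames, blank slightly ahead
    print(repr(ctc_greedy_decode(toy_probs, toy_vocab)))
    print(repr(ctc_beam_search_decode(toy_probs, toy_vocab)))
```

On the toy input, greedy decoding returns an empty string because the blank wins every frame, while beam search returns "a": the three alignments that collapse to "a" have a combined probability of 0.64, against 0.36 for the all-blank path. Greedy is faster; beam search can recover outputs like this at the cost of extra computation.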

Inference