Convolutional neural networks for Google speech commands data set with PyTorch.

General

We, xuyuan and tugstugi, participated in the Kaggle competition TensorFlow Speech Recognition Challenge and finished in 10th place. This repository contains a simplified and cleaned-up version of our team's code.

Features

1x32x32 mel-spectrogram as network input
a single network implementation for both the CIFAR10 and Google Speech Commands data sets
faster audio data augmentation performed on the STFT
Kaggle private leaderboard (LB) scores evaluated on 150,000+ audio files
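
To make the 1x32x32 input shape concrete, here is a minimal NumPy-only sketch of a log-mel-spectrogram for a one-second clip. It is not the repository's code: the 16 kHz sample rate, `n_fft=2048`, `hop=512`, the HTK-style mel formula, and the helper names `mel_filterbank` and `log_mel_spectrogram` are all assumptions chosen so that one second of audio yields 32 frames of 32 mel bands.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Build n_mels triangular filters spaced evenly on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(samples, sr=16000, n_fft=2048, hop=512, n_mels=32):
    """Return a (1, n_mels, n_frames) log-mel-spectrogram in decibels."""
    # Reflect-pad so that frames are centered on multiples of the hop size.
    padded = np.pad(samples, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [padded[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    log_mel = 10.0 * np.log10(np.maximum(mel, 1e-10))
    return log_mel[np.newaxis, :, :].astype(np.float32)

# One second of a synthetic 440 Hz tone with a little noise as stand-in audio.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clip = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(16000)
spec = log_mel_spectrogram(clip)
print(spec.shape)  # (1, 32, 32)
```

With these assumed parameters, a 16,000-sample clip gives 1 + 16000 // 512 = 32 frames, which is why the result can be fed to an image-style network as a single-channel 32x32 input.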

Results

Due to the competition's time limit, we trained most of the networks with SGD using ReduceLROnPlateau for 70 epochs. For the training parameters and dependencies, see TRAINING.md. Stopping training earlier sometimes produces a better Kaggle score.
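
As a rough illustration of the schedule described above, the following sketch combines `torch.optim.SGD` with `torch.optim.lr_scheduler.ReduceLROnPlateau` for 70 epochs. It is not the repository's training script: the stand-in linear model, the random tensors, the 12 output classes, and every hyperparameter (learning rate, momentum, `factor`, `patience`) are assumptions for demonstration only; the actual values are documented in TRAINING.md.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

torch.manual_seed(0)

# Stand-in model and random data (assumptions, not the repository's networks).
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 12))
train_x, train_y = torch.randn(64, 1, 32, 32), torch.randint(0, 12, (64,))
valid_x, valid_y = torch.randn(32, 1, 32, 32), torch.randint(0, 12, (32,))

criterion = nn.CrossEntropyLoss()
initial_lr = 0.01  # hypothetical value
optimizer = torch.optim.SGD(model.parameters(), lr=initial_lr, momentum=0.9)
# Multiply the learning rate by `factor` once the validation loss has not
# improved for more than `patience` consecutive epochs.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)

for epoch in range(70):
    # One full-batch training step per epoch keeps the sketch short.
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(train_x), train_y)
    loss.backward()
    optimizer.step()

    # The scheduler is driven by the validation loss, not the training loss.
    model.eval()
    with torch.no_grad():
        valid_loss = criterion(model(valid_x), valid_y).item()
    scheduler.step(valid_loss)

final_lr = optimizer.param_groups[0]["lr"]
print(f"learning rate after 70 epochs: {final_lr:g}")
```

Because the scheduler only reacts to a stalled validation metric, the point at which the learning rate drops depends on the run, which is consistent with the observation that stopping earlier can sometimes score better.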

| Model | CIFAR10 test set accuracy | Speech Commands test set accuracy | Speech Commands test set accuracy with crop | Speech Commands Kaggle private LB score | Speech Commands Kaggle private LB score with crop | Remarks |
|---|---|---|---|---|---|---|
| VGG19 BN | 93.56% | 97.337235% | 97.527432% | 0.87454 | 0.88030 | |
| ResNet32 | - | 96.181419% | 96.196050% | 0.87078 | 0.87419 | |
| WRN-28-10 | - | 97.937089% | 97.922458% | 0.88546 | 0.88699 | |
| WRN-28-10-dropout | 96.22% | 97.702999% | 97.717630% | 0.89580 | 0.89568 | |
| WRN-52-10 | - | 98.039503% | 97.980980% | 0.88159 | 0.88323 | another trained model has 97.52%/0.89322 |
| ResNext29 8x64 | - | 97.190929% | 97.161668% | 0.89533 | 0.89733 | our best model during competition |
| DPN92 | - | 97.190929% | 97.249451% | 0.89075 | 0.89286 | |
| DenseNet-BC (L=100, k=12) | 95.52% | 97.161668% | 97.147037% | 0.88946 | 0.89134 | |
| DenseNet-BC (L=190, k=40) | - | 97.117776% | 97.147037% | 0.89369 | 0.89521 | |

Results with Mixup

After the competition, some of the networks were retrained using mixup, the technique introduced in "mixup: Beyond Empirical Risk Minimization" by Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin and David Lopez-Paz.

| Model | CIFAR10 test set accuracy | Speech Commands test set accuracy | Speech Commands test set accuracy with crop | Speech Commands Kaggle private LB score | Speech Commands Kaggle private LB score with crop | Remarks |
|---|---|---|---|---|---|---|
| VGG19 BN | - | 97.483541% | 97.542063% | 0.89521 | 0.89839 | |
| WRN-52-10 | - | 97.454279% | 97.498171% | 0.90273 | 0.90355 | same score as the 16-th place in Kaggle |
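
For readers unfamiliar with mixup, the core idea from the paper is to train on convex combinations of pairs of examples and of their labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution. The sketch below shows that idea in NumPy. It is not the contents of this repository's `mixup.py`: the function name `mixup_batch`, the value `alpha=0.2`, the batch size, and the 12-class one-hot labels are all assumptions.

```python
import numpy as np

def mixup_batch(inputs, one_hot_targets, alpha=0.2, rng=None):
    """Mix each example with a randomly chosen partner from the same batch.

    Returns the mixed inputs, the mixed (soft) targets, and the mixing weight.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    perm = rng.permutation(len(inputs))   # partner index for each example
    mixed_inputs = lam * inputs + (1.0 - lam) * inputs[perm]
    mixed_targets = lam * one_hot_targets + (1.0 - lam) * one_hot_targets[perm]
    return mixed_inputs, mixed_targets, lam

# Stand-in batch: 8 single-channel 32x32 spectrograms with 12 classes.
rng = np.random.default_rng(0)
batch = rng.standard_normal((8, 1, 32, 32))
labels = np.eye(12)[rng.integers(0, 12, size=8)]

mixed_x, mixed_y, lam = mixup_batch(batch, labels, alpha=0.2, rng=rng)
print(mixed_x.shape, mixed_y.shape)
```

Each row of `mixed_y` still sums to 1, so it can be used as a soft target with a cross-entropy loss that accepts class probabilities.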
