Finetuning-Whisper

This repository contains code for fine-tuning Whisper, an automatic speech recognition (ASR) model released by OpenAI.

Paper link: https://arxiv.org/abs/2212.04356

Environments

Colab Pro V100 GPU

Input

Pairs of an audio sample and its transcribed text (JSON format)
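As a rough illustration of what such an input pair might look like, the sketch below parses a small JSON array into (audio path, transcript) tuples. The field names `audio` and `text`, the file path, and the helper `load_pairs` are hypothetical and chosen only for this example; the actual layout used by the notebooks may differ.

```python
import json

# Hypothetical record layout: the field names "audio" and "text" and the
# file path are assumptions for illustration, not taken from the repository.
raw = '[{"audio": "clips/sample_0001.mp3", "text": " नमस्ते दुनिया "}]'


def load_pairs(json_text):
    """Parse a JSON array and return a list of (audio_path, transcript) tuples."""
    records = json.loads(json_text)
    pairs = []
    for record in records:
        # Fail early if a record is missing either half of the pair.
        if "audio" not in record or "text" not in record:
            raise ValueError(f"incomplete record: {record}")
        # Strip stray whitespace around the transcript.
        pairs.append((record["audio"], record["text"].strip()))
    return pairs


pairs = load_pairs(raw)
```

Validating each record up front keeps a missing transcript from surfacing later as a confusing training error.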

Output

transcribed text

Model

Whisper-small (parameters: 244M, sampling rate: 16 kHz)
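To make the 16 kHz figure concrete, here is a minimal sketch of how many samples one model input holds, assuming the fixed 30-second input window described in the Whisper paper. The helper `pad_or_trim` is written for this example only and works on plain Python lists; in practice a feature extractor from an ASR library would normally handle this step.

```python
SAMPLE_RATE = 16_000   # Hz, the sampling rate stated above
CHUNK_SECONDS = 30     # assumption: Whisper's fixed input window, per the paper
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # samples per model input


def pad_or_trim(samples, length=N_SAMPLES):
    """Return a list of exactly `length` samples.

    Longer inputs are cut off at `length`; shorter ones are
    right-padded with silence (zeros).
    """
    if len(samples) >= length:
        return list(samples[:length])
    return list(samples) + [0.0] * (length - len(samples))


# A 2-second clip is padded up to the full window.
two_seconds = [0.1] * (2 * SAMPLE_RATE)
window = pad_or_trim(two_seconds)
```

Audio recorded at a different rate would need to be resampled to 16 kHz before this step.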

Dataset

You can use the Mozilla Foundation's Common Voice 11 Hindi dataset.

Training samples	Test samples
6.5K	2.9K

Hyperparameter Configuration

Epochs	Batch size	Learning rate
10	16	1e-5
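For a sense of scale, the settings above translate into a number of optimizer steps. The arithmetic below is only a sketch: it treats "6.5K" as exactly 6,500 training samples and assumes a single device with no gradient accumulation, neither of which is confirmed by the repository.

```python
import math

train_samples = 6_500   # assumption: "6.5K" taken as exactly 6,500
batch_size = 16
epochs = 10
learning_rate = 1e-5    # listed for completeness; not used in the arithmetic

# The last batch of each epoch may be smaller, so round up.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```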

Result

WER (Word Error Rate): 33%
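WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference and the predicted transcript, divided by the number of reference words. The function below is a minimal pure-Python sketch of that definition. It is not necessarily how the notebooks compute the metric, since they may rely on a library implementation, and the name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 100% when the prediction is much longer than the reference.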

Limitation

Because of limited storage and GPU resources, I used the Hindi dataset rather than a Korean or English one.

References

UNIT 1. Working with audio data

UNIT 2. A Gentle Introduction to Audio Applications

UNIT 3. Transformer Architectures for Audio

UNIT 5. Automatic Speech Recognition

