Finetuning-Whisper

This repository is for fine-tuning Whisper, one of the ASR models released by OpenAI.

Paper link: https://arxiv.org/abs/2212.04356

Environments

Colab Pro (V100 GPU)

Input

A pair of an audio sample and its transcribed text (JSON format)
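One such input pair could be sketched as a minimal JSON record. The field names `audio` and `text` below are illustrative assumptions, not necessarily the exact schema used by this repository:

```python
import json

# Hypothetical example of one training pair: an audio file path plus its
# transcript. Field names and the file path are assumptions for illustration.
sample = {
    "audio": "clips/common_voice_hi_000001.mp3",
    "text": "नमस्ते दुनिया",
}

# Serialize to JSON and parse it back, as a pair stored on disk would be read.
encoded = json.dumps(sample, ensure_ascii=False)
decoded = json.loads(encoded)
print(decoded["text"])
```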

Output

Transcribed text

Model

Whisper-small (number of parameters: 244M, sampling rate: 16 kHz)

Dataset

This project uses the Mozilla Foundation's Common Voice 11 Hindi dataset.

| Training | Test |
| --- | --- |
| 6.5K | 2.9K |

Hyperparameter Configuration

| Epochs | Batch size | Learning rate |
| --- | --- | --- |
| 10 | 16 | 1e-5 |
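The hyperparameters above might map onto Hugging Face's `Seq2SeqTrainingArguments` roughly as follows. This is a hedged sketch: the `output_dir` and all arguments other than the three table values are assumptions, not this repository's exact configuration.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the hyperparameter table; output_dir and the remaining defaults
# are illustrative assumptions, not this repo's exact training config.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-hi",   # assumed output path
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=1e-5,
)
```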

Result

WER (Word Error Rate): 33
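For reference, WER is the number of word-level substitutions, deletions, and insertions divided by the reference length. A minimal self-contained sketch of the metric (not the evaluation code used in this repository):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,      # deletion
                d[i][j - 1] + 1,      # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("hello world", "hello word")` counts one substitution over two reference words, giving 0.5.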

Limitation

Due to limited storage and GPU resources, I used the Hindi dataset rather than Korean or English.

References (Hugging Face Audio Course)

UNIT 1. Working with audio data

UNIT 2. A Gentle Introduction to Audio Applications

UNIT 3. Transformer Architectures for Audio

UNIT 5. Automatic Speech Recognition
