This repository is for fine-tuning Whisper, one of the ASR (automatic speech recognition) models released by OpenAI.
Paper link: https://arxiv.org/abs/2212.04356
Colab Pro V100 GPU
Pairs of audio samples and transcribed text (JSON format)
transcribed text
Whisper-small (number of parameters: 244M, sampling rate: 16 kHz)
You can use the Mozilla Foundation's Common Voice 11 Hindi dataset.
Training | Test |
---|---|
6.5K | 2.9K |
epoch | batch size | learning rate |
---|---|---|
10 | 16 | 1e-5 |
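
As a rough sanity check on the training length, the number of optimizer steps implied by the tables above can be estimated as follows. This is a minimal sketch: it assumes exactly 6,500 training samples (the table only says 6.5K), no gradient accumulation, and that the last partial batch of each epoch is kept. The helper name `training_steps` is hypothetical and not part of this repository.

```python
import math


def training_steps(num_samples, batch_size, epochs):
    """Return (steps per epoch, total steps), keeping the last partial batch."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Assumed values: 6.5K rounded to 6500 samples, batch size 16, 10 epochs.
steps_per_epoch, total_steps = training_steps(6500, 16, 10)
print(steps_per_epoch, total_steps)
```

Under these assumptions the run is about 407 steps per epoch and about 4,070 steps in total.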
WER (Word Error Rate): 33
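
For reference, WER is the word-level edit distance (substitutions, deletions, and insertions) between the model output and the reference transcript, divided by the number of reference words. The sketch below is a minimal pure-Python version for illustration, not the evaluation code used in this repository. The function name `word_error_rate` is hypothetical, and it splits on whitespace without any text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(score * 100, 1))
```

Assuming the score above is reported as a percentage, a WER of 33 corresponds to roughly one word error for every three reference words.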
Because of limited storage and GPU resources, I had to use the Hindi dataset rather than a Korean or English one.
UNIT 1. Working with audio data
UNIT 2. A Gentle Introduction to Audio Applications