
modify slides
kurianbenoy committed Sep 26, 2023
1 parent a0b3dc4 commit 74c9728
Showing 1 changed file with 30 additions and 24 deletions.
54 changes: 30 additions & 24 deletions talks/pyconindia-2023/pyconind.qmd

![](https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/OpenAI_Logo.svg/1024px-OpenAI_Logo.svg.png){width=500 fig-align="center"}

- <span style="color:red">I think Whisper^[<span style="color:black">According to the [research paper](https://cdn.openai.com/papers/whisper.pdf) p.2, the name Whisper is an abbreviation of WSPSR: `Web-scale Supervised Pretraining for Speech Recognition`.</span>] is the most `under-rated model` released by OpenAI.</span>
- <span style="color:green">It was open-sourced on September 21, 2022 by releasing the inference code and pre-trained model weights.</span>

## About OpenAI Whisper Model
- Fine-tuned Whisper models are achieving SOTA in a lot of languages
- [Speaker diarization](https://huggingface.co/spaces/dwarkesh/whisper-speaker-recognition)
- [Audio classification using OpenAI’s Whisper](https://github.com/jumon/zac)
- [4x faster with same accuracy using faster-whisper](https://github.com/guillaumekln/faster-whisper)

::: aside
For more checkout [awesome list by Sindre Sorhus](https://github.com/sindresorhus/awesome-whisper)

:::


## Malayalam performance in the Whisper paper

| Model | WER |
<span style="color:black">Appendix D2.2.2 CommonVoice 9 dataset results ([Whisper research paper](https://cdn.openai.com/papers/whisper.pdf) p.23).</span>
:::

## Whisper Event

- <span style="color:red">The HuggingFace team conducted a Whisper fine-tuning event for two weeks, from 5 December 2022 to 19 December 2022. The results were announced on 23 December 2022.</span>
- <span style="color:blue">The goal was to fine-tune the Whisper model to build state-of-the-art speech recognition systems in the languages of our choice 🗣</span>

::: aside
[Whisper Event huggingface page](https://huggingface.co/whisper-event)
:::


## Malayalam models produced in Whisper Event

- <span style="color:red">For the language Malayalam, the results are as follows:</span>

![Last commit in thennal/whisper-medium-ml](../fossasia2023/thennal_commit.png)


## Metrics for evaluating ASR models

- ASR evaluation relies on a comparison between the <span style="color:red">ground truth</span> and the <span style="color:red">ASR output</span>.
<span style="color:black">To learn more about ASR evaluation check this [blogpost by AWS](https://aws.amazon.com/blogs/machine-learning/evaluating-an-automatic-speech-recognition-service/)</span>
:::
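As a rough illustration of what WER measures (a toy sketch of the metric, not the `jiwer` implementation the benchmark actually uses), word error rate is the word-level edit distance between the ground truth and the ASR output, divided by the number of ground-truth words:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Toy WER: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("hello world", "hello word"))  # one substitution in two words -> 0.5
```

In practice the benchmark delegates this to `jiwer`, which also handles the alignment details and reports CER the same way at the character level.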

## Objective of my benchmarking

- <span style="color:red">To test whether a 10% WER was achievable on the available academic datasets.</span>

**Datasets**

- <span style="color:blue">Common Voice 11 Malayalam subset</span>
- <span style="color:blue">SMC Malayalam Speech Corpus</span>


## Methodology for benchmarking

1. <span style="color:red">Create it as a Python library so further Whisper-based transformer models can be benchmarked.</span>
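A sketch of how such a reusable benchmarking helper might be organized (the names `BenchmarkResult` and `evaluate_pairs` are hypothetical illustrations, not the actual library API):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class BenchmarkResult:
    model_name: str
    dataset_name: str
    score: float  # average metric value over the dataset, in [0, 1]

def evaluate_pairs(
    model_name: str,
    dataset_name: str,
    references: List[str],
    predictions: List[str],
    metric: Callable[[str, str], float],
) -> BenchmarkResult:
    """Score one model on one dataset with a pluggable metric (e.g. jiwer.wer)."""
    scores = [metric(ref, pred) for ref, pred in zip(references, predictions)]
    return BenchmarkResult(model_name, dataset_name, sum(scores) / len(scores))

# Hypothetical usage with a trivial exact-match metric:
result = evaluate_pairs(
    "thennal/whisper-medium-ml",
    "common_voice_11_ml",
    ["some reference text"],
    ["some reference text"],
    metric=lambda r, p: 0.0 if r == p else 1.0,
)
print(result.score)  # -> 0.0
```

Keeping the metric as a plain callable is what makes it easy to benchmark further Whisper-based models: each new model only has to produce a list of predictions.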

![Time for a new adventure](../fossasia2023/adventure_talk.jpg)

## Libraries I used

* Dependencies:
+ transformers
+ datasets
+ jiwer
+ whisper_normalizer
+ pandas
+ numerize
+ librosa
+ soundfile
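Of these, `whisper_normalizer` is there to normalize both the ground truth and the model output before scoring. A minimal sketch of the idea, using only the standard library (the real package does considerably more, including Indic-script handling), might look like:

```python
import re
import unicodedata

def basic_normalize(text: str) -> str:
    """Toy Whisper-style normalizer: NFKC-normalize, lower-case,
    replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)        # strip punctuation/symbols
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace

print(basic_normalize("Hello,   World!"))  # -> "hello world"
```

Without a step like this, WER penalizes harmless casing and punctuation differences instead of actual recognition errors.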

## Libraries I used

* Development library:

+ nbdev
+ black
+ Jupyter Lab

## Loading the dataset for benchmarking
## Future Ideas for Benchmarking

- <span style="color:red">Something very similar to OpenLLM Leaderboard with results of latest malayalam speech models.</span>
- <span style="color:blue">Should include results for ASR models based on other architectures like Kaldi, Meta's MMS, Wav2Vec etc.</span>
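A leaderboard like that could start as little more than a sorted table. A sketch with placeholder model names and made-up scores (not measured results):

```python
import pandas as pd

# Illustrative placeholder scores only -- not actual benchmark results.
leaderboard = pd.DataFrame(
    {
        "model": ["model-a", "model-b", "model-c"],
        "wer_percent": [38.6, 11.5, 23.2],
    }
).sort_values("wer_percent").reset_index(drop=True)

print(leaderboard.iloc[0]["model"])  # best (lowest-WER) model first
```

From there, publishing it as a Hugging Face Space with periodic re-evaluation would give the OpenLLM-Leaderboard-style experience for Malayalam speech models.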

![Open LLM leaderboard in [huggingface spaces](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)](../delft-fastai/openllm.png)

::: {.incremental}
- <span style="color:red">In Malayalam we have achieved phenomenal results for fine-tuned Whisper models.</span>
- <span style="color:blue">The best model after benchmarking is `thennal/whisper-medium-ml`.</span>
- <span style="color:green">I think there now seems to be a good ASR model suitable for production use-cases.</span>
- <span style="color:orange">You can also do it in your own language, especially if it is a low-resource language.</span>
:::

