
modify slides
kurianbenoy committed Sep 26, 2023
1 parent a0b3dc4 commit 74c9728
Showing 1 changed file with 30 additions and 24 deletions.
54 changes: 30 additions & 24 deletions talks/pyconindia-2023/pyconind.qmd

![](https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/OpenAI_Logo.svg/1024px-OpenAI_Logo.svg.png){width=500 fig-align="center"}

- <span style="color:red">I think Whisper^[<span style="color:black">According to the [research paper](https://cdn.openai.com/papers/whisper.pdf) p.2, the name Whisper is an abbreviation of WSPSR: `Web-scale Supervised Pretraining for Speech Recognition`.</span>] is the most `under-rated model` released by OpenAI.</span>
- <span style="color:green">It was open-sourced on September 21, 2022 by releasing the inference code and pre-trained model weights.</span>

## About OpenAI Whisper Model
- Fine-tuned Whisper models are achieving SOTA in a lot of languages
- [Speaker diarization](https://huggingface.co/spaces/dwarkesh/whisper-speaker-recognition)
- [Audio classification using OpenAI’s Whisper](https://github.com/jumon/zac)
- [4x faster with same accuracy using faster-whisper](https://github.com/guillaumekln/faster-whisper)

::: aside
For more checkout [awesome list by Sindre Sorhus](https://github.com/sindresorhus/awesome-whisper)

:::


## Malayalam performance in the Whisper paper

| Model | WER |
<span style="color:black">Appendix D2.2.2 CommonVoice 9 dataset results ([Whisper research paper](https://cdn.openai.com/papers/whisper.pdf) p.23).</span>
:::

## Whisper Event

- <span style="color:red">The HuggingFace team conducted a Whisper fine-tuning event for two weeks, from 5 December 2022 to 19 December 2022. The results were announced on 23 December 2022.</span>
- <span style="color:blue">The goal was to fine-tune the Whisper model to build state-of-the-art speech recognition systems in the languages of our choice 🗣</span>

::: aside
[Whisper Event huggingface page](https://huggingface.co/whisper-event)
:::


## Malayalam models produced in Whisper Event

- <span style="color:red">For the language Malayalam, the results are as follows:</span>

![Last commit in thennal/whisper-medium-ml](../fossasia2023/thennal_commit.png)


## Metrics for evaluating ASR models

- ASR evaluation relies on a comparison between the <span style="color:red">ground truth</span> and the <span style="color:red">ASR output</span>.
<span style="color:black">To learn more about ASR evaluation check this [blogpost by AWS](https://aws.amazon.com/blogs/machine-learning/evaluating-an-automatic-speech-recognition-service/)</span>
:::
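As a rough illustration of what WER measures (a toy sketch of the metric, not the `jiwer` implementation the benchmark actually uses), word error rate is the word-level edit distance between the ground truth and the ASR output, divided by the number of ground-truth words:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Toy WER: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("hello world", "hello word"))  # one substitution in two words -> 0.5
```

In practice the benchmark delegates this to `jiwer`, which also handles the alignment details and reports CER the same way at the character level.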

## Objective of my benchmarking

- <span style="color:red">To test whether a 10% WER was achievable on the available academic datasets.</span>

**Datasets**

- <span style="color:blue">Common Voice 11 Malayalam subset</span>
- <span style="color:blue">SMC Malayalam Speech Corpus</span>


## Methodology for benchmarking

1. <span style="color:red">Create it as a Python library so further Whisper-based transformer models can be benchmarked.</span>
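A sketch of how such a reusable benchmarking helper might be organized (the names `BenchmarkResult` and `evaluate_pairs` are hypothetical illustrations, not the actual library API):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class BenchmarkResult:
    model_name: str
    dataset_name: str
    score: float  # average metric value over the dataset, in [0, 1]

def evaluate_pairs(
    model_name: str,
    dataset_name: str,
    references: List[str],
    predictions: List[str],
    metric: Callable[[str, str], float],
) -> BenchmarkResult:
    """Score one model on one dataset with a pluggable metric (e.g. jiwer.wer)."""
    scores = [metric(ref, pred) for ref, pred in zip(references, predictions)]
    return BenchmarkResult(model_name, dataset_name, sum(scores) / len(scores))

# Hypothetical usage with a trivial exact-match metric:
result = evaluate_pairs(
    "thennal/whisper-medium-ml",
    "common_voice_11_ml",
    ["some reference text"],
    ["some reference text"],
    metric=lambda r, p: 0.0 if r == p else 1.0,
)
print(result.score)  # -> 0.0
```

Keeping the metric as a plain callable is what makes it easy to benchmark further Whisper-based models: each new model only has to produce a list of predictions.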

![Time for a new adventure](../fossasia2023/adventure_talk.jpg)

## Libraries I used

* Dependencies:
+ transformers
+ datasets
+ jiwer
+ whisper_normalizer
+ pandas
+ numerize
+ librosa
+ soundfile
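Of these, `whisper_normalizer` is there to normalize both the ground truth and the model output before scoring. A minimal sketch of the idea, using only the standard library (the real package does considerably more, including Indic-script handling), might look like:

```python
import re
import unicodedata

def basic_normalize(text: str) -> str:
    """Toy Whisper-style normalizer: NFKC-normalize, lower-case,
    replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)        # strip punctuation/symbols
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace

print(basic_normalize("Hello,   World!"))  # -> "hello world"
```

Without a step like this, WER penalizes harmless casing and punctuation differences instead of actual recognition errors.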

## Libraries I used

* Development library:

+ nbdev
+ black
+ Jupyter Lab

## Loading the dataset for benchmarking
## Future Ideas for Benchmarking

- <span style="color:red">Something very similar to OpenLLM Leaderboard with results of latest malayalam speech models.</span>
- <span style="color:blue">Should include results for ASR models based on other architectures like Kaldi, Meta's MMS, Wav2Vec etc.</span>
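A leaderboard like that could start as little more than a sorted table. A sketch with placeholder model names and made-up scores (not measured results):

```python
import pandas as pd

# Illustrative placeholder scores only -- not actual benchmark results.
leaderboard = pd.DataFrame(
    {
        "model": ["model-a", "model-b", "model-c"],
        "wer_percent": [38.6, 11.5, 23.2],
    }
).sort_values("wer_percent").reset_index(drop=True)

print(leaderboard.iloc[0]["model"])  # best (lowest-WER) model first
```

From there, publishing it as a Hugging Face Space with periodic re-evaluation would give the OpenLLM-Leaderboard-style experience for Malayalam speech models.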

![Open LLM leaderboard in [huggingface spaces](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)](../delft-fastai/openllm.png)

::: {.incremental}
- <span style="color:red">In Malayalam we have achieved phenomenal results for fine-tuned Whisper models.</span>
- <span style="color:blue">The best model after benchmarking is `thennal/whisper-medium-ml`.</span>
- <span style="color:green">I think there now seems to be a good ASR model suitable for production use-cases.</span>
- <span style="color:orange">You can also do it in your own language, especially if it is a low-resource language.</span>
:::

