Whisper's output for Hindi audio leads to a transcript in Arabic or Persian script #1662

abhilashgupta · 2023-09-15T19:19:56Z

I tried transcribing a Hindi audio file (in a non-standard dialect).
The API call went through fine and returned the transcription in the Devanagari script.
However, when I run Whisper transcription on the same file locally, it recognises that the audio is in Hindi, but the transcription comes out in Arabic or Persian script (or maybe Urdu, I'm not sure).
The same happens when I force the language to Hindi, Nepali or Marathi (which use almost the same script as Hindi, with one extra character).

Writing here because I couldn't see an option to raise an issue.

Steps to reproduce:
I cannot attach a .wav file here, so here is the link to the video -> https://www.youtube.com/watch?v=x2CnZZFf8Jg
To use the API, call it with the audio file; the response comes back in the correct script.
To run locally, just run whisper downloaded_audio.wav --language hi --model base or whisper downloaded_audio.wav --model base

There's a conversation at #118 about how Whisper's auto-detection identifies Hindi audio as Urdu (which makes sense given their similarity). However, the API gives the response in the correct script, while the toolkit outputs it in the wrong script even if I explicitly specify the language.
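
To pin down which script a local transcript actually came back in, a small check like the one below may help. It is only a sketch: the helper names `script_counts` and `dominant_script` are made up for illustration, and it relies on the first word of each character's Unicode name. Note that Urdu and Persian text both report as ARABIC here, since both are written in the Arabic script.

```python
import unicodedata
from collections import Counter
from typing import Optional


def script_counts(text: str) -> Counter:
    """Count letters per script, keyed by the first word of the Unicode character name."""
    counts: Counter = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation and combining vowel signs
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1  # e.g. "DEVANAGARI", "ARABIC", "LATIN"
    return counts


def dominant_script(text: str) -> Optional[str]:
    """Return the most frequent script label, or None if the text has no letters."""
    counts = script_counts(text)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical usage with a local Whisper result (assumes openai-whisper is installed):
#   result = whisper.load_model("base").transcribe("downloaded_audio.wav", language="hi")
#   print(dominant_script(result["text"]))
```

Running `dominant_script` on the text from both the API and the local toolkit would show whether the two outputs differ only in script.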

7gxycn08 · 2023-09-17T06:12:21Z

Don't bother using Whisper; the accuracy rate is pretty low compared to letting GPT-3.5 do the translation.

1 reply

abhilashgupta Sep 17, 2023
Author

I am not translating here.
I have an audio file from which I want to extract the transcription of the audio in the same language, which is a very different problem than translation, no?

kennethleungty · 2024-01-25T06:53:10Z

@abhilashgupta Did you come across any solution/new insights for this?

2 replies

abhilashgupta Jan 25, 2024
Author

Yes.
The issue was observed with the base model; however, it doesn't occur with the large models (large-v2 or large-v3). So I started using the large models wherever possible.
However, when set to auto-detect, it still occasionally detects Hindi as Urdu, since the two languages are close to each other, and in that case it still returns the transcription in Urdu script. To avoid that, always set the language manually to Hindi where possible.
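
The workaround described above (force the language to Hindi, prefer a large model) could be wrapped roughly as follows. This is a sketch under assumptions, not a confirmed fix: `first_devanagari_transcript` and its `transcribe_fn` parameter are hypothetical names, the fallback order of models is an illustrative choice, and the 50% threshold is arbitrary. `whisper.load_model` and `model.transcribe(..., language="hi")` are the same calls used elsewhere in this thread.

```python
from typing import Callable, Iterable, Optional, Tuple


def is_mostly_devanagari(text: str, threshold: float = 0.5) -> bool:
    """True if at least `threshold` of the letters fall in the Devanagari block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    deva = sum(1 for ch in letters if "\u0900" <= ch <= "\u097f")
    return deva / len(letters) >= threshold


def first_devanagari_transcript(
    audio_path: str,
    transcribe_fn: Callable[[str, str], str],
    model_names: Iterable[str] = ("large-v3", "large-v2"),
) -> Tuple[Optional[str], str]:
    """Try each model in turn; return (model_name, text) for the first Devanagari
    result, or (None, last_text) if no model produced one."""
    text = ""
    for name in model_names:
        text = transcribe_fn(audio_path, name)
        if is_mostly_devanagari(text):
            return name, text
    return None, text


def whisper_transcribe(audio_path: str, model_name: str) -> str:
    """Backend using openai-whisper, with the language forced to Hindi."""
    import whisper  # imported lazily so the helpers above run without the package

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, language="hi")["text"]


# Hypothetical usage:
#   model_used, text = first_devanagari_transcript("downloaded_audio.wav", whisper_transcribe)
```

Passing the transcription step in as a function keeps the script check testable without downloading a model.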

Aliraza1010a Oct 21, 2024

Brother, I am facing this issue too.
Please tell me how to fix it.

nairajay2k · 2024-03-13T23:52:39Z

I am facing the same issue. When I call model.transcribe('filename', language='hi'), the returned text is in Urdu/Persian script.

1 reply

gaurav241102 Nov 13, 2024

Did you come up with any solution for this?

firofame · 2024-12-18T05:35:35Z

Same issue with Malayalam audio:

import whisper

model = whisper.load_model('large-v3')
result = model.transcribe("audio.mp3")
print(result["language"])  # the correct language is detected, i.e. "ml"
print(result["text"])  # but the text is in a different language
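
A mismatch like this, where `result["language"]` is right but `result["text"]` is not, could be flagged automatically with a check along these lines. It is a rough sketch: `EXPECTED_BLOCK` and `script_match_ratio` are hypothetical names, the table covers only languages mentioned in this thread, and it compares code points against the main Unicode block for each language rather than doing real language identification.

```python
from typing import Optional

# Main Unicode block per language code (illustrative, not exhaustive).
EXPECTED_BLOCK = {
    "hi": (0x0900, 0x097F),  # Devanagari
    "mr": (0x0900, 0x097F),  # Devanagari
    "ne": (0x0900, 0x097F),  # Devanagari
    "ur": (0x0600, 0x06FF),  # Arabic
    "ml": (0x0D00, 0x0D7F),  # Malayalam
}


def script_match_ratio(result: dict) -> Optional[float]:
    """Fraction of letters in result["text"] that lie in the block expected for
    result["language"]; None if the language is not in the table or there are no letters."""
    block = EXPECTED_BLOCK.get(result.get("language", ""))
    letters = [ch for ch in result.get("text", "") if ch.isalpha()]
    if block is None or not letters:
        return None
    low, high = block
    return sum(1 for ch in letters if low <= ord(ch) <= high) / len(letters)


# Hypothetical usage with the result from model.transcribe(...):
#   ratio = script_match_ratio(result)
#   if ratio is not None and ratio < 0.5:
#       print("detected language and output script disagree")
```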

0 replies
