Finetuning Whisper for translation tasks #1646

WassayS · 2023-09-09T04:17:40Z

I have a domain-specific audio dataset in Hindi, and I want to improve Whisper's translation performance on it. Currently, the model can take non-English audio input and translate it into English text.

I understand how to fine-tune Whisper for transcription, where the output text is in the same language as the audio, but I'm not sure how to fine-tune it specifically for cross-lingual translation, where the audio is in another language and the goal is better translation into English. Could you provide guidance on how to fine-tune the model for this purpose, or share a repo?

AmgadHasan · 2024-01-23T00:09:46Z

Hi.
Did you have any success with this?


emanueleielo · 2024-02-04T12:09:04Z

I read somewhere that to fine-tune for this task you can follow this guide on fine-tuning Whisper, swap in your own dataset, and set the task to translation instead of transcription.

I need to do the same, but I haven't tried it yet. My worry is the WER calculation: how can WER be computed for a translation task? The predicted text can easily have the same meaning as the reference text while using different words, in which case the WER will be misleading.
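As far as I understand it, the change comes down to the task token in the decoder prompt. A minimal sketch, assuming the Hugging Face transformers `WhisperProcessor` and the `openai/whisper-small` checkpoint (both are my assumptions, the guide may use something else); the helper names are made up for illustration:

```python
from typing import List


def whisper_decoder_prompt(language_code: str, task: str,
                           timestamps: bool = False) -> List[str]:
    """Illustrative helper: the special tokens Whisper's decoder starts with.

    Switching from transcription to translation only swaps the task token.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    prompt = ["<|startoftranscript|>", f"<|{language_code}|>", f"<|{task}|>"]
    if not timestamps:
        prompt.append("<|notimestamps|>")
    return prompt


def load_translation_processor(checkpoint: str = "openai/whisper-small"):
    """Assumed setup with Hugging Face transformers (downloads the checkpoint).

    The language is the language spoken in the audio; task="translate"
    asks the tokenizer to build English-translation prompts.
    """
    from transformers import WhisperProcessor  # imported lazily on purpose
    return WhisperProcessor.from_pretrained(
        checkpoint, language="Hindi", task="translate"
    )


if __name__ == "__main__":
    print(whisper_decoder_prompt("hi", "transcribe"))
    print(whisper_decoder_prompt("hi", "translate"))
```

The labels in the dataset would then be the English translations rather than the Hindi transcripts.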

Did you manage it?


AmgadHasan · 2024-02-15T15:26:14Z

@emanueleielo
Yes, I managed to fine-tune Whisper for translation successfully.

For evaluation, WER isn't a good metric for translation. Use a translation metric such as BLEU, METEOR, or COMET instead.
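To give an idea of what BLEU measures, here is a simplified sketch in plain Python: clipped n-gram precision up to 4-grams combined with a brevity penalty. It assumes whitespace tokenization, one reference per sentence, and no smoothing, so its scores will not match those of a maintained tool such as sacrebleu, which is what you would normally use. The function names are illustrative.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(references: List[str], hypotheses: List[str],
                max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale (simplified, unsmoothed)."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        ref_len += len(ref_tokens)
        hyp_len += len(hyp_tokens)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(ref_tokens, n)
            hyp_counts = ngram_counts(hyp_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(count, ref_counts[gram])
                                  for gram, count in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any empty n-gram order zeroes the score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat today"]
    hyps = ["the cat sat on the mat"]
    print(simple_bleu(refs, hyps))
```

In the Trainer's `compute_metrics` you would decode predictions and labels to text exactly as in the transcription setup and pass them to the translation metric instead of WER.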

Hope that helps!


rishikksh20 · 2024-05-12T07:46:52Z

EmreOzkose · 2024-12-11T13:22:52Z

Hi @AmgadHasan, can you share how you prepared your custom data?

For example, let's say the language pair is en->hi.

print(custom_en_hi_dataloader["train"][0])

would be

{'audio': {'path': 'path_to_en.wav', 
           'array': ...,
           'sampling_rate': 16000},
 'sentence': 'खीर की मिठास पर गरमाई बिहार की सियासत, कुशवाहा ने दी सफाई'}

right?
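My understanding is that a sample in this shape would then go through the same preprocessing as in the transcription setup, with the target-language text tokenized as the labels. A rough sketch, assuming the feature extractor and tokenizer come from a Hugging Face `WhisperProcessor` (my assumption); `prepare_example` is just an illustrative name:

```python
from typing import Any, Callable, Dict


def prepare_example(batch: Dict[str, Any],
                    feature_extractor: Callable[..., Any],
                    tokenizer: Callable[..., Any]) -> Dict[str, Any]:
    """Turn one {'audio': ..., 'sentence': ...} sample into model inputs.

    feature_extractor and tokenizer are passed in explicitly; with
    transformers they would be processor.feature_extractor and
    processor.tokenizer (assumed, not confirmed in this thread).
    """
    audio = batch["audio"]
    # Source-language audio -> log-Mel input features for the encoder.
    batch["input_features"] = feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Target-language text -> token ids used as decoder labels.
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch
```

With a `datasets` dataset this would typically be applied with `dataset.map(prepare_example, fn_kwargs={...})`, passing the two processors through `fn_kwargs`.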

