Inconsistent Transcription Results with faster-whisper in Asynchronous Functions #1207

savank7 · 2024-12-18T13:00:03Z

Issue Description:

I am using the faster-whisper library for speech-to-text transcription and encountering inconsistent results when running the transcription function asynchronously. Here's the code snippet:

from faster_whisper import WhisperModel
import asyncio

model_size = 'medium'
model = WhisperModel(model_size, device='cuda')

segments, _ = model.transcribe('audio_path', language="en")
transcription = " ".join(segment.text for segment in segments)
print(f"Transcription without async: {transcription}")


async def tts_func(num):
    asy_segments, _ = model.transcribe('audio_path', language="en")
    asy_transcription = " ".join(asy_segment.text for asy_segment in asy_segments)
    print(f"Transcription {num}: {asy_transcription}")


async def call_func():
    for i in range(5):
        await tts_func(i)


if __name__ == "__main__":
    asyncio.run(call_func())

Output:

Transcription without async:  good morning hello hello hello
Transcription 0:  good morning hello hello hello  good morning hello hello hello
Transcription 1:  Good morning. Hello. Hello. Hello.
Transcription 2:  good morning hello hello hello  Hello
Transcription 3:  Good morning. Hello. Hello. Hello.
Transcription 4:  Good morning. Hello. Hello. Hello.  So.  So.  So.  So.  So.

Problem:
The synchronous transcription produces consistent and accurate results:
good morning hello hello hello

However, the asynchronous transcriptions are inconsistent and sometimes produce repeated or erroneous outputs, such as:
good morning hello hello hello good morning hello hello hello
Good morning. Hello. Hello. Hello. So. So. So. So. So.

This is critical because my project requires asynchronous processing for optimal flow, but the inconsistencies make it unusable.

Expected Behavior:
The transcription results from the asynchronous calls should be identical to the synchronous call's results.

Question:

Why is the asynchronous transcription producing inconsistent results?
Is the faster-whisper model thread-safe, or does it require specific handling in an asynchronous context?
How can I resolve this issue to get consistent results when using asynchronous functions?
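A note on the snippet above: `tts_func` is declared `async`, but it calls the blocking `model.transcribe()` directly, and `call_func` awaits each call in turn. The five transcriptions therefore run one after another, exactly as five synchronous calls would, so `async` by itself should not change the decoded text. What it does do is block the event loop. Below is a minimal sketch, assuming you want to keep the loop responsive, that runs the whole transcription in a worker thread with `asyncio.to_thread`. faster-whisper's `transcribe()` returns a lazy generator of segments (decoding happens while you iterate), so the joining must happen inside the thread too. The `threading.Lock` is a conservative assumption that serializes access to one shared model; `WhisperModel` also has a `num_workers` option intended for concurrent calls from several threads. `make_async_transcriber` is a hypothetical helper name.

```python
import asyncio
import threading
from typing import Awaitable, Callable


def make_async_transcriber(
    transcribe_fn: Callable[[str], str],
) -> Callable[[str], Awaitable[str]]:
    """Wrap a blocking transcription function so it can be awaited.

    `transcribe_fn` must do *all* the blocking work, including consuming the
    segment generator, so nothing heavy runs on the event loop thread.
    """
    lock = threading.Lock()  # serialize calls into the shared model (conservative)

    def _run(audio_path: str) -> str:
        with lock:
            return transcribe_fn(audio_path)

    async def transcribe_async(audio_path: str) -> str:
        # Run the blocking call in the default thread pool executor.
        return await asyncio.to_thread(_run, audio_path)

    return transcribe_async
```

With faster-whisper, the blocking function could be something like `lambda p: " ".join(s.text for s in model.transcribe(p, language="en")[0])`. Because of the lock, `asyncio.gather` over several calls still transcribes one clip at a time, but the event loop stays free to handle other work meanwhile.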

Additional Details:

Model size: medium
Device: CUDA (Nvidia GPU)
Language: English (language="en")
Library: faster-whisper

Any guidance or recommendations to ensure consistent results in the asynchronous workflow would be greatly appreciated.


Purfview · 2024-12-18T13:43:06Z

The Whisper model is non-deterministic if temperature is not 0. Try setting temperature=0, but results may degrade.
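Why temperature matters even when you never set it: as far as I know, faster-whisper's `transcribe()` defaults to a list of temperatures starting at 0.0 (`temperature=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]`) and re-decodes with the next temperature whenever a result looks too repetitive (`compression_ratio_threshold`, default 2.4) or too unlikely (`log_prob_threshold`, default -1.0). Any attempt above 0 samples randomly, so two runs on the same audio can diverge. The following is a simplified, self-contained sketch of that fallback idea, not the library's actual code; `decode` is a hypothetical stand-in for one decoding pass returning `(text, avg_logprob)`, and this sketch simply keeps the last attempt when every temperature fails.

```python
import zlib
from typing import Callable, Optional, Sequence, Tuple


def compression_ratio(text: str) -> float:
    """Raw bytes / zlib-compressed bytes; repetitive text compresses well (high ratio)."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], Tuple[str, float]],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: float = 2.4,
    log_prob_threshold: float = -1.0,
) -> Tuple[str, float]:
    """Return (text, temperature) for the first attempt that passes both checks."""
    last: Optional[Tuple[str, float]] = None
    for t in temperatures:
        text, avg_logprob = decode(t)
        last = (text, t)
        too_repetitive = compression_ratio(text) > compression_ratio_threshold
        too_unlikely = avg_logprob < log_prob_threshold
        if not (too_repetitive or too_unlikely):
            return last
    if last is None:
        raise ValueError("temperatures must not be empty")
    # Every temperature failed: this sketch keeps the final (highest-temperature) attempt.
    return last
```

With `temperatures=(0.0,)`, roughly what `temperature=0` requests, there is nothing to fall back to, so the same input gives the same output. The cost is that a repetitive or low-confidence result is kept as-is, which is the "results may degrade" trade-off.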

savank7 · 2024-12-18T14:02:36Z

Hello @Purfview

I have tried the solution you mentioned, but I still get the same result.

from faster_whisper import WhisperModel
import asyncio

model_size = 'medium'
model = WhisperModel(model_size, device='cuda')
temperature = 1  # the runs shown below used 0, 0.5, and 1

segments, _ = model.transcribe('audio_path', language="en", temperature=temperature)
transcription = " ".join(segment.text for segment in segments)
print(f"Transcription without async and temperature - {temperature}: {transcription}")


async def tts_func(num):
    asy_segments, _ = model.transcribe('audio_path', language="en", temperature=temperature)
    asy_transcription = " ".join(asy_segment.text for asy_segment in asy_segments)
    print(f"Transcription {num} and temperature - {temperature} : {asy_transcription}")


async def call_func():
    for i in range(5):
        await tts_func(i)


if __name__ == "__main__":
    asyncio.run(call_func())

Output:

Transcription without async and temperature - 0:  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.
Transcription 0 and temperature - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.
Transcription 1 and temperature - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.
Transcription 2 and temperature - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.
Transcription 3 and temperature - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.
Transcription 4 and temperature - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.


Transcription without async and temperature - 0.5:  Good morning. Hello. Hello. Hello.  Good morning.
Transcription 0 and temperature - 0.5 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello.
Transcription 1 and temperature - 0.5 :  Good morning. Hello. Hello. Hello.  Hello.
Transcription 2 and temperature - 0.5 :  Good morning. Hello. Hello. Hello.
Transcription 3 and temperature - 0.5 :  Good morning. Hello. Hello. Hello.  Hello. Hello. Hello.
Transcription 4 and temperature - 0.5 :  Good morning. Hello. Hello. Hello.


Transcription without async and temperature - 1:  Good morning. Hello. Hello. Hello.  Hello-hello! Hello!  Hello! Hello!  Hello! Hello!  Hello! Hello!  Hello! Hello!
Transcription 0 and temperature - 1 :  Good morning Hello  Hello Hello  Hello Hello  Hello Hello  Good morning Hello  Hello Hello  Hello Hello  Hello Hello  Hello Hello  Hello Hello  Hello Hello
Transcription 1 and temperature - 1 :  Good morning. Hello. Hello. Hello.
Transcription 2 and temperature - 1 :  Good morning. Hello. Hello. Hello.  Hello. Hello. Hello. Hello. Hello.  Hello. Hello. Hello. Hello. Hello. Hello. Hello. Hello.  Hello.  Hello.
Transcription 3 and temperature - 1 :  good morning hello hello hello  so
Transcription 4 and temperature - 1 :  Good morning. Hello. Hello. Hello.  Hello. Hello. Hello.  Hello. Hello. Hello.  Hello. Hello. Hello.  Hello. Hello.  Hello. Hello.

I have also attached a screenshot for reference.

Purfview · 2024-12-18T14:15:54Z

> Expected Behavior:
> The transcription results from the asynchronous calls should be identical to the synchronous call's results.

> I have tried the solution you mentioned, but I still get the same result.

But it's not the same:

Without async and temperature   - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.
Transcription 0 and temperature - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.
Transcription 1 and temperature - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.
Transcription 2 and temperature - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.
Transcription 3 and temperature - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.
Transcription 4 and temperature - 0 :  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.  Good morning. Hello. Hello. Hello.

vs your previous result:

Without async:    good morning hello hello hello
Transcription 0:  good morning hello hello hello  good morning hello hello hello
Transcription 1:  Good morning. Hello. Hello. Hello.
Transcription 2:  good morning hello hello hello  Hello
Transcription 3:  Good morning. Hello. Hello. Hello.
Transcription 4:  Good morning. Hello. Hello. Hello.  So.  So.  So.  So.  So.

Purfview · 2024-12-18T14:19:53Z

Read it again:

> The Whisper model is non-deterministic if temperature is not 0. Try setting temperature=0, but results may degrade.

savank7 · 2024-12-18T14:20:09Z

Yes, @Purfview, it is producing new results every time.

Purfview · 2024-12-18T14:24:46Z

> Yes, @Purfview, it is producing new results every time.

It doesn't when you set temperature to 0.

savank7 · 2024-12-18T14:30:34Z

> > Yes, @Purfview, it is producing new results every time.
>
> It doesn't when you set temperature to 0.

Yes, @Purfview, but the generated text is incorrect.

Purfview · 2024-12-18T14:36:32Z

> Yes, @Purfview, but the generated text is incorrect.

Whisper doesn't guarantee a correct result; you can try a bigger model such as large-v2.

And:

> ...but results may degrade.

Purfview · 2024-12-18T14:46:16Z

BTW, your result looks like it contains hallucinations; try hallucination_silence_threshold=2.

savank7 · 2024-12-18T15:13:38Z

@Purfview, can you please show me where to use hallucination_silence_threshold=2 in the given code?

savank7 · 2024-12-18T15:21:09Z

@Purfview, let me give you a brief overview of my use case. I have implemented a WebSocket server in Python that uses the Whisper model for audio transcription. The server receives audio data as bytes from a Vosk-Asterisk integration. I process this audio data by converting the bytes into 3-second WAV files. These WAV files are then transcribed using the Whisper model, and the transcribed text is sent back to the client. However, I am currently facing issues in this process, as this is a real-time streaming project.
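The pipeline described here (receive bytes, cut windows, transcribe, send text back) can be decoupled so the WebSocket receive loop never waits on the GPU. Below is a minimal sketch, assuming a single consumer task per connection fed through an `asyncio.Queue`. `transcription_worker`, `transcribe_fn` (blocking, bytes in, text out) and `send` (an async callback, e.g. wrapping your WebSocket's send method) are hypothetical placeholders, and `None` is used as a stop signal.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def transcription_worker(
    queue: "asyncio.Queue[Optional[bytes]]",
    transcribe_fn: Callable[[bytes], str],
    send: Callable[[str], Awaitable[None]],
) -> None:
    """Pull audio windows from `queue`, transcribe off the event loop, send text back.

    A `None` item is used as a stop sentinel.
    """
    while True:
        audio = await queue.get()
        try:
            if audio is None:
                return
            # The blocking model call runs in a worker thread, not on the event loop.
            text = await asyncio.to_thread(transcribe_fn, audio)
            if text:  # skip empty results instead of sending blank messages
                await send(text)
        finally:
            queue.task_done()
```

The receive handler then only does `await queue.put(window)`. With one worker per queue, windows are transcribed strictly in arrival order, so the text goes back in the same order as the audio.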

Purfview · 2024-12-18T16:00:14Z

> can you please show me where to use hallucination_silence_threshold=2 in the given code?

model.transcribe('audio_path', language="en", hallucination_silence_threshold=1) doesn't work?
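In other words, it is just another keyword argument of `transcribe()`. One caveat, going by the parameter's docstring in faster-whisper: `hallucination_silence_threshold` only takes effect when `word_timestamps=True`. A small sketch that keeps the options in one place; `transcribe_options` is a hypothetical helper, and its defaults (`temperature=0.0`, a 2-second threshold) are illustrative values taken from this thread, not recommendations from the library.

```python
from typing import Any, Dict, Optional


def transcribe_options(
    language: str = "en",
    temperature: float = 0.0,
    hallucination_silence_threshold: Optional[float] = 2.0,
) -> Dict[str, Any]:
    """Build keyword arguments for WhisperModel.transcribe()."""
    opts: Dict[str, Any] = {"language": language, "temperature": temperature}
    if hallucination_silence_threshold is not None:
        opts["hallucination_silence_threshold"] = hallucination_silence_threshold
        # The silence-skipping heuristic relies on word-level timestamps.
        opts["word_timestamps"] = True
    return opts
```

Usage: `segments, _ = model.transcribe("audio_path", **transcribe_options())`.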

> I process this audio data by converting the bytes into 3-second WAV files... ...a real-time streaming

I don't know what the proper approach with streaming is, but Whisper models are trained on 30s chunks, so 3s chunks are too weird; try without_timestamps=True.
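One way to act on this is to stop cutting fixed 3-second files and instead accumulate the incoming PCM bytes into longer windows before transcribing. A minimal stdlib-only sketch, assuming the audio arrives as mono 16-bit PCM; `PcmAccumulator` and its 10-second default window are hypothetical choices. Longer windows give the model more context at the cost of latency, so the window length is the knob to tune for a real-time use case.

```python
from typing import List


class PcmAccumulator:
    """Collect raw mono PCM chunks and release fixed-length windows."""

    def __init__(self, sample_rate: int = 16000, window_seconds: float = 10.0,
                 bytes_per_sample: int = 2) -> None:
        # Whole samples only, so a window never splits a 16-bit sample in half.
        self.window_bytes = int(sample_rate * window_seconds) * bytes_per_sample
        self._buf = bytearray()

    def push(self, chunk: bytes) -> List[bytes]:
        """Add a chunk; return every complete window that is now available."""
        self._buf.extend(chunk)
        windows: List[bytes] = []
        while len(self._buf) >= self.window_bytes:
            windows.append(bytes(self._buf[: self.window_bytes]))
            del self._buf[: self.window_bytes]
        return windows

    def flush(self) -> bytes:
        """Return and clear whatever partial window remains (e.g. at hang-up)."""
        rest = bytes(self._buf)
        self._buf.clear()
        return rest
```

Each window returned by `push()` can then be handed to the transcription step, and `flush()` covers the tail of the call.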

BTW, you can skip the WAV step and pass a NumPy audio array to transcribe().
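A sketch of that conversion, assuming the bytes are little-endian signed 16-bit mono PCM. When `transcribe()` receives an array, faster-whisper treats it as a 16 kHz float32 waveform, so audio at another rate (for example 8 kHz telephony audio) needs resampling first. The linear interpolation below is a crude stand-in for a proper resampler; `pcm16_to_float32` is a hypothetical helper name.

```python
import numpy as np

WHISPER_SAMPLE_RATE = 16000  # rate faster-whisper assumes for array input


def pcm16_to_float32(pcm: bytes, sample_rate: int = WHISPER_SAMPLE_RATE) -> np.ndarray:
    """Convert little-endian signed 16-bit mono PCM to float32 in [-1, 1) at 16 kHz."""
    audio = np.frombuffer(pcm, dtype="<i2").astype(np.float32) / 32768.0
    if sample_rate != WHISPER_SAMPLE_RATE and audio.size:
        # Naive linear-interpolation resampling; a proper resampler with
        # low-pass filtering would give better quality.
        n_out = int(round(audio.size / sample_rate * WHISPER_SAMPLE_RATE))
        src_t = np.arange(audio.size) / sample_rate
        dst_t = np.arange(n_out) / WHISPER_SAMPLE_RATE
        audio = np.interp(dst_t, src_t, audio).astype(np.float32)
    return audio
```

Usage: `segments, _ = model.transcribe(pcm16_to_float32(window), language="en")`, passing `sample_rate=8000` if the source audio is 8 kHz.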
