Inconsistent Transcription Results with faster-whisper in Asynchronous Functions #1207
Comments
Whisper model is non-deterministic if temperature is not 0, try to set |
Hello @Purfview, I have tried the solution you mentioned, but I still get the same result.
output:
I have also attached a screenshot for reference. |
But it's not the same:
vs your previous result:
|
Read it again:
|
Yes @Purfview, it produces different results every time. |
It doesn't when you set temperature to 0. |
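A minimal sketch of the temperature suggestion, assuming faster-whisper's `WhisperModel.transcribe()`, which accepts a `temperature` argument. The helper name `transcribe_fixed_temperature` and the `beam_size=5` value are illustrative choices, not taken from this thread. The model is passed in, so the helper makes no assumption about how it was loaded.

```python
def transcribe_fixed_temperature(model, audio, beam_size=5):
    """Transcribe `audio` with temperature fixed at 0.

    `model` is expected to be a faster_whisper.WhisperModel (or any object
    with a compatible transcribe() method). `audio` is a file path, a
    file-like object, or a float32 numpy waveform.
    """
    # Passing a single temperature of 0 replaces the default fallback
    # schedule of increasing temperatures, so decoding does not sample.
    segments, _info = model.transcribe(audio, beam_size=beam_size, temperature=0)
    # `segments` is a lazy generator: decoding only runs while iterating.
    return " ".join(segment.text.strip() for segment in segments)
```

With faster-whisper installed, `model` would come from something like `WhisperModel("small")`; the model size here is only a placeholder.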
Whisper doesn't guarantee a correct result, you can try a bigger model like And:
|
BTW, your result looks like it contains hallucinations, try |
@Purfview, can you please help me with where I can use |
@Purfview, let me give you a brief on my use case. I have implemented a WebSocket server in Python that uses the Whisper model for audio transcription. The server receives audio data as bytes from a Vosk-Asterisk integration. I convert these bytes into 3-second WAV files, transcribe each file with the Whisper model, and send the transcribed text back to the client. However, I am currently facing issues in this process. Note that this is a real-time streaming project. |
I dunno what the proper approach is with streaming, but Whisper models are trained on 30s chunks, so 3s chunks are too weird, try Btw, you can skip the WAV step and pass a numpy audio array to transcribe(). |
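A minimal sketch of skipping the WAV step, under assumptions the thread does not confirm: the incoming bytes are 16-bit signed little-endian mono PCM, and the stream is already sampled at 16 kHz, which is the rate faster-whisper expects for array input as far as I know. If the Asterisk stream uses a different rate, it would need resampling first. The helper name `pcm16_bytes_to_float32` is hypothetical.

```python
import numpy as np


def pcm16_bytes_to_float32(pcm_bytes):
    """Convert raw 16-bit little-endian mono PCM bytes to a float32 waveform.

    Assumption: `pcm_bytes` holds whole samples (an even number of bytes);
    np.frombuffer raises ValueError otherwise.
    """
    samples = np.frombuffer(pcm_bytes, dtype="<i2")
    # Scale int16 values into [-1.0, 1.0), the usual float waveform range.
    return samples.astype(np.float32) / 32768.0
```

The resulting array can then be passed as the first argument of `transcribe()` in place of a file path.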
Issue Description:
I am using the faster-whisper library for speech-to-text transcription and encountering inconsistent results when running the transcription function asynchronously. Here's the code snippet:
output:
Problem:
The synchronous transcription produces consistent and accurate results:
good morning hello hello hello
However, the asynchronous transcriptions are inconsistent and sometimes produce repeated or erroneous outputs, such as:
good morning hello hello hello good morning hello hello hello
Good morning. Hello. Hello. Hello. So. So. So. So. So.
This is critical because my project requires asynchronous processing for optimal flow, but the inconsistencies make it unusable.
Expected Behavior:
The transcription results from the asynchronous calls should be identical to the synchronous call's results.
Question:
Additional Details:
Any guidance or recommendations to ensure consistent results in the asynchronous workflow would be greatly appreciated.
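One way to keep an asynchronous flow while ruling out overlapping calls on a shared model is to serialize transcriptions behind a lock and run the blocking work in a worker thread. This is only a sketch: the thread does not establish that concurrency causes the differences (the replies point at temperature and hallucinations instead), and the class name `SerializedTranscriber` is hypothetical. It assumes Python 3.9+ for `asyncio.to_thread` and a model object with a faster-whisper style `transcribe()` method.

```python
import asyncio


class SerializedTranscriber:
    """Run at most one blocking transcription at a time on a shared model."""

    def __init__(self, model):
        self._model = model
        self._lock = asyncio.Lock()

    def _transcribe_blocking(self, audio):
        segments, _info = self._model.transcribe(audio, temperature=0)
        # Consume the lazy segment generator here, inside the worker thread,
        # so no decoding happens on the event loop.
        return " ".join(segment.text.strip() for segment in segments)

    async def transcribe(self, audio):
        # The lock keeps calls from overlapping; to_thread keeps the event
        # loop responsive while the model is busy.
        async with self._lock:
            return await asyncio.to_thread(self._transcribe_blocking, audio)
```

Creating the `SerializedTranscriber` from inside the running event loop (for example at the top of the server's main coroutine) avoids binding the lock to a different loop on older Python versions.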