Inconsistent definition of clip_timestamps parameter between WhisperModel and BatchedInferencePipeline #1205
Comments
Yes, I was thinking to rename it to …
I think unifying the interfaces would be better than documenting the differences. We could change both to Optional[List[dict]], where each dict has start/end times. This would make it possible to use …
clip_timestamps in sequential mode is meant for quick user input, not for a "self-defined VAD". By the way, you can't have this functionality in batched mode, since the audio there is already chunked by VAD. If you want a "self-defined VAD" in sequential mode, then add a new option and rename the batched clip_timestamps to that.
I had that in mind when naming the parameter. It's not ideal, I know, but if anything is going to change, the sequential one is the candidate, because the batched one is much clearer IMO, with explicit starts and ends.
If you change the sequential one, it will lose its original functionality, which is to replicate vanilla Whisper's clip_timestamps.
The clip_timestamps parameter is defined differently in WhisperModel and BatchedInferencePipeline, which creates confusion when switching between these classes:

In WhisperModel.transcribe():
faster-whisper/faster_whisper/transcribe.py, line 735 in 1b24f28

In BatchedInferencePipeline.transcribe():
faster-whisper/faster_whisper/transcribe.py, line 294 in 1b24f28
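To make the discrepancy concrete, here is a minimal sketch of a helper that normalizes both formats discussed in this thread into the batched-style list of start/end dicts. This is not faster-whisper API: the function name normalize_clip_timestamps and the exact "start"/"end" dict keys are assumptions for illustration, and the even-count check is a simplification (vanilla Whisper allows omitting the final end time).

```python
from typing import List, Union


def normalize_clip_timestamps(
    clips: Union[str, List[float], List[dict]],
) -> List[dict]:
    """Normalize clip timestamps into a list of {"start", "end"} dicts.

    Sequential (vanilla Whisper) style: a comma-separated string "0,10,20,30"
    or a flat list [0.0, 10.0, 20.0, 30.0], read as alternating start/end
    times in seconds.
    Batched style: already a list of dicts with "start"/"end" keys.
    """
    if isinstance(clips, str):
        # Parse the comma-separated string form into floats.
        clips = [float(t) for t in clips.split(",")] if clips else []
    if clips and isinstance(clips[0], dict):
        # Already in the batched format; return a shallow copy.
        return list(clips)
    if len(clips) % 2 != 0:
        # Simplification: vanilla Whisper lets the last end default to
        # the end of the audio; here we require explicit pairs.
        raise ValueError("Expected an even number of start/end timestamps")
    return [
        {"start": clips[i], "end": clips[i + 1]}
        for i in range(0, len(clips), 2)
    ]
```

A helper like this would let a unified Optional[List[dict]] signature, as proposed above, stay backward compatible with the string and flat-list forms accepted by the sequential interface.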