Inconsistent definition of `clip_timestamps` parameter between `WhisperModel` and `BatchedInferencePipeline` #1205

zh-plus · 2024-12-17T09:38:55Z

The `clip_timestamps` parameter is defined differently in `WhisperModel` and `BatchedInferencePipeline`, which is confusing when switching between the two classes:

In WhisperModel.transcribe():

faster-whisper/faster_whisper/transcribe.py

Line 735 in 1b24f28

clip_timestamps: Union[str, List[float]] = "0",

In BatchedInferencePipeline.transcribe():

faster-whisper/faster_whisper/transcribe.py

Line 294 in 1b24f28

clip_timestamps: Optional[List[dict]] = None,


Purfview · 2024-12-18T12:23:29Z

Yes, I was thinking of renaming it to vad_timestamps for batched, but I think it's "cleaner" as is.
Maybe just add notes to the descriptions saying that it needs different input in batched vs. sequential modes?

zh-plus · 2024-12-19T02:34:12Z

I think unifying the interfaces would be better than documenting differences. We could change both to Optional[List[dict]] where each dict has start/end times. This would make it possible to use WhisperModel and BatchedInferencePipeline with the same clip_timestamps for self-defined VAD.
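To illustrate the proposal above, here is a minimal sketch of converting between the two shapes from the signatures quoted in this issue: a list of start/end dicts and a flat `[start, end, start, end, ...]` list. The helper names `dicts_to_flat` and `flat_to_dicts` are hypothetical and not part of faster-whisper. The sketch also assumes both sides use the same time unit (seconds), which should be checked against the docstrings of the version in use.

```python
from typing import Dict, List


def dicts_to_flat(clips: List[Dict[str, float]]) -> List[float]:
    """Flatten [{"start": s, "end": e}, ...] into [s, e, s, e, ...].

    Hypothetical helper: assumes every dict has "start" and "end" keys
    expressed in the same unit (seconds here, which is an assumption).
    """
    flat: List[float] = []
    for clip in clips:
        start, end = float(clip["start"]), float(clip["end"])
        if end < start:
            raise ValueError(f"clip ends before it starts: {clip}")
        flat.extend((start, end))
    return flat


def flat_to_dicts(flat: List[float]) -> List[Dict[str, float]]:
    """Group [s, e, s, e, ...] back into [{"start": s, "end": e}, ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of timestamps")
    return [{"start": s, "end": e} for s, e in zip(flat[::2], flat[1::2])]


clips = [{"start": 0.0, "end": 2.5}, {"start": 4.0, "end": 6.0}]
flat = dicts_to_flat(clips)
```

With a pair of helpers like this, one set of user-defined regions could in principle be kept in the dict form and flattened only when calling the sequential API, without changing either signature.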

Purfview · 2024-12-19T10:16:32Z

> We could change both to Optional[List[dict]]

clip_timestamps in sequential mode is for quick user input, not for a "self-defined VAD". By the way, you can't have this functionality in batched mode, because the audio there is already chunked by VAD.

If you want a "self-defined VAD" in sequential mode then make a new option, and rename batched "clip_timestamps" to that.

MahmoudAshraf97 · 2024-12-19T11:24:01Z

I had that in mind when naming the parameter. It's not ideal, I know, but if I'm going to change anything, the sequential one is the candidate, because the batched one is much clearer IMO, with clear starts and ends.

Purfview · 2024-12-19T12:05:08Z

If you change the sequential one, it will lose its original functionality, which is to replicate vanilla Whisper's clip_timestamps.
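For context on the sequential form, here is a rough sketch of how a flat `clip_timestamps` value in the vanilla Whisper style (a comma-separated string or a list of floats) can be read as (start, end) clips, with a trailing unpaired start running to the end of the audio. The function name `parse_clip_timestamps` and the `duration` argument are hypothetical and chosen for illustration. This shows the semantics only and is not the library's actual code.

```python
from typing import List, Tuple, Union


def parse_clip_timestamps(
    clip_timestamps: Union[str, List[float]], duration: float
) -> List[Tuple[float, float]]:
    """Turn "s,e,s,e,..." or [s, e, s, e, ...] into (start, end) pairs.

    Hypothetical helper: timestamps and `duration` are in seconds.
    """
    if isinstance(clip_timestamps, str):
        # An empty string means "no explicit clips".
        points = (
            [float(ts) for ts in clip_timestamps.split(",")]
            if clip_timestamps
            else []
        )
    else:
        points = [float(ts) for ts in clip_timestamps]

    if not points:
        points = [0.0]  # default: start at the beginning
    if len(points) % 2 == 1:
        points.append(duration)  # last end defaults to the end of the audio
    return list(zip(points[::2], points[1::2]))


default_clips = parse_clip_timestamps("0", duration=30.0)
two_clips = parse_clip_timestamps("5,10,20", duration=30.0)
```

Under this reading, the default `"0"` covers the whole file, which is the "quick user input" use case that a list of start/end dicts would not replicate directly.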
