v0.7.0
This is a very exciting release because we're seeing yet another massive speedup in offline throughput thanks to VAD-based chunking.
Highlights
- Energy VAD based chunking @jkrukowski
  - There is a new decoding option called `chunkingStrategy` which can significantly speed up your single file transcriptions with minimal WER downsides.
  - It works by finding a clip point in the middle of the longest silence (lowest audio energy) in the last 15s of a 30s window and uses that to split up all the audio ahead of time so it can be asynchronously decoded in parallel.
  - Here's a video of it in action, comparing the `.none` chunking strategy with `.vad`:
vad.chunking.mp4
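
The splitting idea described above can be sketched in plain Swift. This is a simplified illustration, not WhisperKit's actual implementation: `quietestSplitIndex` and `splitIntoChunks` are hypothetical names, the frame length is an assumed value, and the sketch picks the single lowest-energy frame in the second half of each window rather than measuring the longest silence.

```swift
import Foundation

/// Hypothetical helper: returns the index (relative to `window`) at the middle
/// of the lowest-energy frame in the second half of the window.
func quietestSplitIndex(in window: [Float], frameLength: Int) -> Int {
    precondition(frameLength > 0, "frameLength must be positive")
    let searchStart = window.count / 2          // only look at the last half
    var bestIndex = window.count                // fallback: split at window end
    var bestEnergy = Float.greatestFiniteMagnitude
    var frameStart = searchStart
    while frameStart + frameLength <= window.count {
        // Mean squared amplitude of this frame.
        var sum: Float = 0
        for i in frameStart..<(frameStart + frameLength) {
            sum += window[i] * window[i]
        }
        let energy = sum / Float(frameLength)
        if energy < bestEnergy {
            bestEnergy = energy
            bestIndex = frameStart + frameLength / 2
        }
        frameStart += frameLength
    }
    return bestIndex
}

/// Hypothetical helper: splits `samples` into ranges of at most one window,
/// cutting each window at its quietest point so chunks can be decoded independently.
func splitIntoChunks(_ samples: [Float], sampleRate: Int = 16_000, windowSeconds: Int = 30) -> [Range<Int>] {
    let windowLength = sampleRate * windowSeconds
    let frameLength = max(1, sampleRate / 10)   // assumed 100 ms frames
    var chunks: [Range<Int>] = []
    var start = 0
    while samples.count - start > windowLength {
        let window = Array(samples[start..<(start + windowLength)])
        let split = start + quietestSplitIndex(in: window, frameLength: frameLength)
        chunks.append(start..<split)
        start = split
    }
    if start < samples.count {
        chunks.append(start..<samples.count)    // remaining tail fits in one window
    }
    return chunks
}
```

Because every chunk boundary is computed ahead of time, each range could then be handed to its own decoding task. In WhisperKit itself, none of this needs to be written by hand: the behavior is selected through the `chunkingStrategy` decoding option.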
- Detect language helper:
  - You can now call `detectLanguage` with just an audio path as input from the main `whisperKit` object. This returns a simple language code and probability as a tuple, and has minimal logging/timing.
  - Example:
let whisperKit = try await WhisperKit()
let (language, probs) = try await whisperKit.detectLanguage(audioPath: "your/audio/path/spanish.wav")
print(language) // "es"
- WhisperKit via Expo @seb-sep
  - For anyone who has been wanting to use WhisperKit in React Native, @seb-sep is maintaining a repo that makes it easy, and has also set up an automation that updates it with each new WhisperKit release. Check it out here: https://github.com/seb-sep/whisper-kit-expo
- Bug fixes and enhancements:
  - @jiangdi0924 and @fengcunhan contributed some nice fixes in this release with #136 and #138 (see below)
  - Also moved the decoding progress callback to be fully async so that it doesn't block the decoder thread
What's Changed
- Fix language detection by @jkrukowski in #133
- Fix the reset operation exception in transcribeFile in the Demo. by @jiangdi0924 in #136
- gh action for making pr to whisper-kit-expo on whisperkit release by @seb-sep in #137
- add reStartRecordingLive function by @fengcunhan in #138
- Added `@_disfavoredOverload` for deprecated methods by @jkrukowski in #143
- VAD audio chunking by @jkrukowski in #135
- Async Progress Callback by @ZachNagengast in #145
- Detect language helper by @ZachNagengast in #146
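
For readers unfamiliar with `@_disfavoredOverload`, here is a minimal sketch of why it helps when a deprecated method and its replacement differ only in return type. The `Transcriber` type and its methods are hypothetical and are not WhisperKit's API.

```swift
// Hypothetical type for illustration only.
struct Transcriber {
    // Old signature, kept for source compatibility but ranked lower
    // during overload resolution.
    @available(*, deprecated, message: "Use the array-returning overload")
    @_disfavoredOverload
    func transcribe(_ path: String) -> String? {
        return "legacy"
    }

    // New signature that callers should get by default.
    func transcribe(_ path: String) -> [String] {
        return ["modern"]
    }
}

let transcriber = Transcriber()

// No type annotation: without @_disfavoredOverload this call would be
// ambiguous; with it, the compiler picks the array-returning overload.
let results = transcriber.transcribe("audio.wav")

// An explicit type still reaches the deprecated overload (with a warning).
let legacyResult: String? = transcriber.transcribe("audio.wav")
```

The attribute is underscored, so it is not an officially stable part of the language, but it lets existing call sites keep compiling while new code resolves to the preferred method.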
New Contributors
- @jiangdi0924 made their first contribution in #136
- @seb-sep made their first contribution in #137
- @fengcunhan made their first contribution in #138
Full Changelog: v0.6.1...v0.7.0