planning: Ichigo VAD #91

dan-homebrew · 2024-10-18T06:25:41Z

Goal

Remove the need to press a button by detecting the user's voice automatically
- Medium-term
- Enables ambient voice detection
- Enables interruptibility
Small model with a binary classifier for voice activity detection (VAD)
- Need to determine the activity parameter (e.g. how many seconds)
Long-term: advance VAD to a listening model
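
To make the "binary classifier plus activity parameter" idea concrete, here is a minimal sketch. It assumes the activity parameter means how long voiced (or silent) audio must persist before a speech segment opens (or closes). The RMS threshold is only a stand-in for the small model, and `frame_is_voiced`, `ActivityParams`, and `detect_segments` are hypothetical names, not part of Ichigo or SileroVAD.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


def frame_is_voiced(frame: Sequence[float], energy_threshold: float = 0.01) -> bool:
    """Binary decision for one frame: is the RMS energy above a threshold?

    Placeholder for a learned classifier (assumption, not the real model).
    """
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > energy_threshold


@dataclass
class ActivityParams:
    # All values are illustrative guesses, to be tuned.
    frame_seconds: float = 0.03        # duration of one frame
    min_speech_seconds: float = 0.09   # voiced time needed to open a segment
    min_silence_seconds: float = 0.30  # silent time needed to close a segment


def detect_segments(decisions: Iterable[bool],
                    params: ActivityParams) -> List[Tuple[int, int]]:
    """Turn per-frame decisions into (start_frame, end_frame) pairs, end exclusive."""
    open_after = max(1, round(params.min_speech_seconds / params.frame_seconds))
    close_after = max(1, round(params.min_silence_seconds / params.frame_seconds))
    segments: List[Tuple[int, int]] = []
    in_speech = False
    voiced_run = silent_run = start = total = 0
    for i, voiced in enumerate(decisions):
        total = i + 1
        if voiced:
            voiced_run += 1
            silent_run = 0
            if not in_speech and voiced_run >= open_after:
                in_speech = True
                start = i - voiced_run + 1  # segment begins where the run began
        else:
            silent_run += 1
            voiced_run = 0
            if in_speech and silent_run >= close_after:
                in_speech = False
                segments.append((start, i - silent_run + 1))
    if in_speech:
        # Audio ended mid-segment: close it at the last voiced frame.
        segments.append((start, total - silent_run))
    return segments
```

Short gaps inside a segment (shorter than `min_silence_seconds`) do not close it, which is one way the "how many seconds" choice trades responsiveness against cutting the speaker off mid-sentence.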

Tasklist

Menlo Realtime API
Cortex Realtime API

Resources

Use SileroVAD: https://github.com/snakers4/silero-vad
Used by both VITA and the Hugging Face speech-to-speech model
The Hugging Face pipeline might be a good reference point: https://github.com/huggingface/speech-to-speech

hahuyhoang411 · 2024-10-20T18:50:31Z

End-to-end VAD: https://github.com/modelscope/FunASR

PodsAreAllYouNeed · 2024-10-23T02:57:36Z

Hugging Face uses FunASR to support the Paraformer STT model, but uses SileroVAD for voice activity detection. The FSMN-VAD provided by FunASR could be worth looking into as well. The FunASR pipeline also bundles VAD and diarization together with STT, which could be very useful.

The VAD handler written by Hugging Face, which reuses some of the SileroVAD code, is quite nice: https://github.com/huggingface/speech-to-speech/blob/93d74ba3bc3ad1a948cc167d7cdb95699e49d867/VAD/vad_handler.py

It also includes enhancement, which is very useful. We could adapt the handler to support other VADs as well, which would also cater to #93.
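
As a rough illustration of what "adapting the handler to support other VADs" could look like, the sketch below puts the detector behind a small interface so that SileroVAD, FSMN-VAD, or anything else could be swapped in. This is not the Hugging Face handler's actual API. `VADBackend`, `PeakThresholdVAD`, and `VADHandler` are hypothetical names, and the peak-threshold backend is a toy stand-in for a real model.

```python
from typing import Callable, List, Optional, Protocol, Sequence


class VADBackend(Protocol):
    """Anything that can say whether a chunk of samples contains speech."""

    def is_speech(self, chunk: Sequence[float]) -> bool: ...


class PeakThresholdVAD:
    """Toy backend (assumption): speech if any sample exceeds a threshold."""

    def __init__(self, threshold: float = 0.1) -> None:
        self.threshold = threshold

    def is_speech(self, chunk: Sequence[float]) -> bool:
        return any(abs(s) > self.threshold for s in chunk)


class VADHandler:
    """Buffers chunks while the backend reports speech, then emits the
    whole utterance (optionally enhanced) once speech stops."""

    def __init__(
        self,
        backend: VADBackend,
        enhance: Optional[Callable[[List[float]], List[float]]] = None,
    ) -> None:
        self.backend = backend
        self.enhance = enhance
        self._buffer: List[float] = []

    def process(self, chunk: Sequence[float]) -> Optional[List[float]]:
        if self.backend.is_speech(chunk):
            self._buffer.extend(chunk)
            return None  # still inside an utterance
        if not self._buffer:
            return None  # silence with nothing buffered
        utterance, self._buffer = self._buffer, []
        return self.enhance(utterance) if self.enhance else utterance


handler = VADHandler(PeakThresholdVAD(threshold=0.1))
for chunk in ([0.0, 0.0], [0.5, 0.2], [0.3], [0.0]):
    utterance = handler.process(chunk)
    if utterance is not None:
        print("utterance samples:", len(utterance))
```

Because the handler only depends on `is_speech`, a different backend needs just a thin wrapper class, and the enhancement step stays optional.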

Current Pipeline
Audio -> Ichigo -> TTS

Pipeline using hf/s2s handler
Audio -> (VAD -> Enhancement) -> Ichigo -> TTS
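
The difference between the two pipelines can be sketched as a chain of stages in which any stage may drop its input. Every function below is a hypothetical stand-in (no real Ichigo, TTS, or enhancement code is called). The point is only that a VAD gate in front of Ichigo keeps silence from ever reaching the model.

```python
from typing import Callable, List, Optional, Sequence

Stage = Callable[[object], Optional[object]]


def run_pipeline(item: object, stages: Sequence[Stage]) -> Optional[object]:
    """Pass the item through each stage; stop early if a stage returns None."""
    for stage in stages:
        item = stage(item)
        if item is None:
            return None
    return item


def vad_gate(audio: List[float]) -> Optional[List[float]]:
    # Stand-in VAD: forward audio only if some sample looks like speech.
    return audio if any(abs(s) > 0.1 for s in audio) else None


def enhance(audio: List[float]) -> List[float]:
    # Stand-in enhancement: peak-normalise (the VAD gate guarantees peak > 0.1).
    peak = max(abs(s) for s in audio)
    return [s / peak for s in audio]


def ichigo_stub(audio: List[float]) -> str:
    # Stand-in for the Ichigo model: returns a text reply.
    return f"reply to {len(audio)} samples"


def tts_stub(text: str) -> str:
    # Stand-in for TTS: returns a tag instead of real audio.
    return f"<audio:{text}>"


current = [ichigo_stub, tts_stub]
with_vad = [vad_gate, enhance, ichigo_stub, tts_stub]

silence = [0.0, 0.0]
speech = [0.5, -0.25]
print(run_pipeline(silence, current))   # the model still runs on silence
print(run_pipeline(silence, with_vad))  # dropped by the VAD gate
print(run_pipeline(speech, with_vad))
```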

tikikun · 2024-11-11T05:52:03Z

Great, @nguyenhoangthuan99, you can take this over if you continue working on the Ichigo demo.

tikikun · 2024-11-22T03:09:11Z

on Alex now

dan-homebrew added this to Research Oct 18, 2024

dan-homebrew converted this from a draft issue Oct 18, 2024

dan-homebrew added this to the Ichigo v0.4 milestone Oct 18, 2024

dan-homebrew assigned PodsAreAllYouNeed Oct 18, 2024

dan-homebrew changed the title ~~epic: Ichigo VAD~~ planning: Ichigo VAD Oct 18, 2024

tikikun assigned nguyenhoangthuan99 and unassigned PodsAreAllYouNeed Nov 11, 2024

tikikun assigned tikikun and tuanlda78202 and unassigned nguyenhoangthuan99 Nov 11, 2024

PodsAreAllYouNeed mentioned this issue Nov 12, 2024

planning: Ichigo v0.4 Speech Enhancements #93

Closed

tikikun assigned nguyenhoangthuan99 and unassigned tikikun and tuanlda78202 Nov 22, 2024

hiento09 added this to Jan & Cortex Nov 22, 2024

github-project-automation bot moved this to Investigating in Jan & Cortex Nov 22, 2024

dan-homebrew modified the milestones: Ichigo v0.4, Ichigo Prod Demo Nov 25, 2024

bachvudinh assigned bachvudinh and unassigned bachvudinh Dec 5, 2024

