About streaming and reuse of reference audios #692

Picus303 · 2024-12-01T15:07:01Z

Hi! I have a few questions about how fish speech works:

1. I understand that audio is generated in two steps: semantic tokens first, then audio. Do you generate all the tokens and then stream only the audio, or do you stream everything (text -> tokens -> audio)?
2. I took a look at the code and it looks like it could handle receiving a stream of text as input, but I couldn't find an implementation of it. Is this currently possible?
3. If I need to generate a lot of text using the same reference audios, is there a way to process them only once to save time?
4. There currently doesn't seem to be a simple way to use fish speech by importing it as a Python library. Would you be interested in a PR that implements it?

AnyaCoder (Collaborator) · 2024-12-02T04:36:31Z

1. Currently, only the audio is streamed.
2. It could now handle receiving a stream of text chunks.
3. Yes, see the --use_memory_cache argument in the argument parser.
4. We'd certainly be happy if you could make a PR for that.
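
For point 3, the general idea of processing a reference audio only once can be sketched as an in-memory cache keyed by a hash of the audio bytes. This is a minimal illustration and not fish speech's actual code: `ReferenceCache` and `fake_encode` are hypothetical names, and how `--use_memory_cache` works internally is not described in this thread.

```python
import hashlib
from typing import Callable, Dict, List


class ReferenceCache:
    """Memoize the encoding of reference audio, keyed by a hash of its bytes."""

    def __init__(self, encode: Callable[[bytes], List[int]]) -> None:
        self._encode = encode
        self._store: Dict[str, List[int]] = {}
        self.misses = 0  # number of times the encoder actually ran

    def get(self, audio: bytes) -> List[int]:
        key = hashlib.sha256(audio).hexdigest()
        if key not in self._store:
            # First time this audio is seen: run the expensive encoder once.
            self.misses += 1
            self._store[key] = self._encode(audio)
        return self._store[key]


def fake_encode(audio: bytes) -> List[int]:
    # Hypothetical stand-in for the expensive reference-audio encoder.
    return list(audio[:4])


cache = ReferenceCache(fake_encode)
for _ in range(3):
    tokens = cache.get(b"reference-audio-bytes")
```

With this shape, repeated requests that reuse the same reference audio pay the encoding cost only on the first call; the trade-off is that encoded references stay in memory for the lifetime of the process.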

Picus303 (Author) · 2024-12-02T07:13:07Z

Thanks for your answers. I'd like to better understand the codebase before contributing.

1. Do you generate all the tokens first because that step is very fast compared to audio generation, or do you think changing that could have a noticeable impact on latency?
2. I'm not sure I understand: is it implemented? If not, I'd like to work on that too.

AnyaCoder (Collaborator) · 2024-12-03T12:44:15Z

1. Generating all the tokens first is needed because it requires context.
2. It is not yet implemented. PR welcome :)
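
The flow described in this thread (all semantic tokens first, then audio streamed out) can be sketched roughly as below. This is a toy illustration only: `generate_tokens`, `stream_audio`, and `synthesize` are hypothetical names, and the "decoding" is a placeholder rather than anything taken from the fish speech codebase.

```python
from typing import Iterator, List


def generate_tokens(text: str) -> List[int]:
    # Stage 1 (hypothetical stand-in): produce ALL semantic tokens up front,
    # so the full context is available before any audio is decoded.
    return [ord(c) for c in text]


def stream_audio(tokens: List[int], chunk_size: int = 4) -> Iterator[bytes]:
    # Stage 2 (hypothetical stand-in): decode tokens chunk by chunk and
    # yield each audio chunk as soon as it is ready.
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        yield bytes(t % 256 for t in chunk)


def synthesize(text: str) -> Iterator[bytes]:
    tokens = generate_tokens(text)   # blocks until every token exists
    yield from stream_audio(tokens)  # only this stage is streamed


chunks = list(synthesize("hello world"))
```

In this arrangement the caller receives the first audio chunk only after the whole token sequence has been produced, which is the latency question raised above; streaming text input would require restructuring the first stage as well.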

Picus303 (Author) · 2024-12-03T13:13:10Z

Thanks! I like the project and I'll do my best to contribute.
I'll start with simple refactoring PRs to get familiar with the code (and because those 1000-line files are hurting my feelings), and then work on the Python library and input streaming.
I guess I'll have to talk more with everyone, so I joined the Discord :)
