OpenAI APIs for TTS/STT? #348

skorokithakis · 2023-12-20T03:27:30Z

Is there (a plan for) a way to use the OpenAI servers for STT/TTS? They are fairly slow, unfortunately, but they might be a good option for some people.

kristiankielhofner · 2023-12-20T20:35:00Z

It's not exactly impossible, but it hasn't been a focus because, as you say, it's quite slow, to the point of going against our mission of an Alexa-competitive voice interface.

Willow has a fairly unique streaming method to WIS. I'm not completely familiar with the OpenAI speech API but at best you'd almost certainly need a proxy of some sort, and if you were doing advanced things like audio compression (AMR) you'd need to do more.
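To illustrate the kind of proxy mentioned above, here is a minimal sketch of the speech-to-text half only. It assumes the proxy has already received the streamed audio as raw 16-bit mono PCM chunks at 16 kHz (an assumption about the input, not a description of Willow's actual wire format), buffers them, wraps them in a WAV container, and builds a single multipart upload for OpenAI's `/v1/audio/transcriptions` endpoint. The helper names `pcm_to_wav` and `build_stt_request` are hypothetical, and compressed audio such as AMR would need a decoding step that is not shown.

```python
import io
import uuid
import wave
import urllib.request

# Public OpenAI transcription endpoint. Everything on the Willow side is assumed.
OPENAI_STT_URL = "https://api.openai.com/v1/audio/transcriptions"


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM (assumed input format) in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


def build_stt_request(chunks, api_key: str, model: str = "whisper-1") -> urllib.request.Request:
    """Buffer streamed chunks and build one multipart/form-data upload request.

    The request is only constructed here, not sent; pass it to
    urllib.request.urlopen() to actually call the API.
    """
    wav = pcm_to_wav(b"".join(chunks))
    boundary = uuid.uuid4().hex
    body = b"".join([
        # "model" form field
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode(),
        # "file" form field carrying the buffered audio
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'.encode(),
        wav,
        f'\r\n--{boundary}--\r\n'.encode(),
    ])
    return urllib.request.Request(
        OPENAI_STT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Because the whole utterance has to be buffered before the upload starts, a proxy shaped like this adds latency on top of the API's own response time, which is the concern raised above.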

skorokithakis · 2023-12-20T20:36:15Z

Makes sense, thank you.

skorokithakis · 2024-06-15T23:54:01Z

I'd like to revisit this now that GPT-4o is out. The multimodal functionality of sending the audio directly to the model and getting audio back might be interesting. Are there any plans for WIS to send the audio to the REST endpoint directly, and receive audio back?
