v0.0.52
Added
- Constructor arguments for GoogleLLMService to directly set tools and tool_config.
- Smart turn detection example (22d-natural-conversation-gemini-audio.py) that leverages Gemini 2.0 capabilities (see https://x.com/kwindla/status/1870974144831275410).
- Added DailyTransport.send_dtmf() to send dial-out DTMF tones.
- Added DailyTransport.sip_call_transfer() to forward SIP and PSTN calls to another address or number. For example, transfer a SIP call to a different SIP address or transfer a PSTN call to a different PSTN phone number.
- Added DailyTransport.sip_refer() to transfer incoming SIP/PSTN calls from outside Daily to another SIP/PSTN address.
- Added an auto_mode input parameter to ElevenLabsTTSService. auto_mode is set to True by default. Enabling this setting disables the chunk schedule and all buffers, which reduces latency.
- Added KoalaFilter, which implements on-device noise reduction using Koala Noise Suppression (see https://picovoice.ai/platform/koala/).
- Added CerebrasLLMService for Cerebras integration with an OpenAI-compatible interface. Added foundational example 14k-function-calling-cerebras.py.
- Pipecat now supports Python 3.13. We had a dependency on the audioop package, which was deprecated and has now been removed in Python 3.13. We are now using audioop-lts (https://github.com/AbstractUmbra/audioop) to provide the same functionality.
- Added timestamped conversation transcript support:
  - New TranscriptProcessor factory provides access to user and assistant transcript processors.
  - UserTranscriptProcessor processes user speech with timestamps from transcription.
  - AssistantTranscriptProcessor processes assistant responses with LLM context timestamps.
  - Messages are emitted with ISO 8601 timestamps indicating when they were spoken.
  - Supports all LLM formats (OpenAI, Anthropic, Google) via standard message format.
  - New examples: 28a-transcription-processor-openai.py, 28b-transcription-processor-anthropic.py, and 28c-transcription-processor-gemini.py.
- Added support for more languages to ElevenLabs (Arabic, Croatian, Filipino, Tamil) and PlayHT (Afrikaans, Albanian, Amharic, Arabic, Bengali, Croatian, Galician, Hebrew, Mandarin, Serbian, Tagalog, Urdu, Xhosa).
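As a rough illustration of the timestamped transcript entries described above, the sketch below builds a message that carries an ISO 8601 timestamp. It uses only the Python standard library; `TranscriptMessage` and `make_message` are hypothetical names chosen for this example and are not Pipecat's actual classes or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TranscriptMessage:
    """Hypothetical stand-in for a transcript entry (not Pipecat's class)."""

    role: str  # "user" or "assistant"
    content: str  # the text that was spoken
    timestamp: str  # ISO 8601 string marking when it was spoken


def make_message(role: str, content: str, spoken_at: datetime) -> TranscriptMessage:
    # Normalize to UTC so every message carries an unambiguous offset.
    stamp = spoken_at.astimezone(timezone.utc).isoformat()
    return TranscriptMessage(role=role, content=content, timestamp=stamp)


spoken_at = datetime(2024, 12, 23, 15, 30, 0, tzinfo=timezone.utc)
msg = make_message("user", "Hello there", spoken_at)
print(msg.timestamp)  # 2024-12-23T15:30:00+00:00
```

Because the timestamp is a plain ISO 8601 string, it can be parsed back with `datetime.fromisoformat()` when transcripts need to be sorted or merged.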
Changed
- PlayHTTTSService uses the new v4 WebSocket API, which also fixes an issue where text input to the TTS didn't return audio.
- The default model for ElevenLabsTTSService is now eleven_flash_v2_5.
- OpenAIRealtimeBetaLLMService now takes a model parameter in the constructor.
- Updated the default model for the OpenAIRealtimeBetaLLMService.
- Room expiration (exp) in DailyRoomProperties is now optional (None) by default instead of automatically setting a 5-minute expiration time. You must now set an expiration time explicitly if you want one.
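Since rooms no longer receive an automatic 5-minute expiration, code that relied on the old default may want to compute one itself. The following is a minimal sketch, assuming `exp` is expressed as a Unix timestamp in seconds; `room_expiration` is a hypothetical helper, and the `DailyRoomProperties(exp=...)` usage shown in the comment is an assumption rather than verified code.

```python
import time
from typing import Optional


def room_expiration(minutes: float, now: Optional[float] = None) -> float:
    """Return a Unix timestamp `minutes` in the future (hypothetical helper)."""
    base = time.time() if now is None else now
    return base + minutes * 60


# Restore the previous behavior by asking for a 5-minute lifetime.
exp = room_expiration(5)

# Assumed usage, not executed here:
#   properties = DailyRoomProperties(exp=exp)
```

Leaving `exp` unset keeps the new default, in which no expiration time is set on the room.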
Deprecated
- AWSTTSService is now deprecated; use PollyTTSService instead.
Fixed
- Fixed token counting in GoogleLLMService. Tokens were summed incorrectly (double-counted in many cases).
- Fixed an issue that could cause the bot to stop talking if there was a user interruption before getting any audio from the TTS service.
- Fixed an issue that would cause ParallelPipeline to handle EndFrame incorrectly, causing the main pipeline to not terminate or to terminate too early.
- Fixed an audio stuttering issue in FastPitchTTSService.
- Fixed a BaseOutputTransport issue that was causing non-audio frames to be processed before the previous audio frames were played. This allows, for example, sending a frame A after a TTSSpeakFrame, and frame A will only be pushed downstream after the audio generated from the TTSSpeakFrame has been spoken.
- Fixed a DeepgramSTTService issue that was causing language to be passed as an object instead of a string, which made the connection fail.
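The last fix above concerns a common class of bug: passing an enum-like object where a service expects a plain string. The sketch below shows one way to guard against it. The `Language` enum and `connection_options` function are hypothetical and exist only for illustration; this is not the actual DeepgramSTTService code.

```python
from enum import Enum


class Language(Enum):
    """Hypothetical language enum, for illustration only."""

    EN_US = "en-US"
    ES = "es"


def connection_options(language) -> dict:
    # Accept either an enum member or a plain string, and always emit a
    # string so the value can be serialized into connection parameters.
    code = language.value if isinstance(language, Enum) else str(language)
    return {"language": code}


print(connection_options(Language.EN_US))  # {'language': 'en-US'}
```

Converting at the boundary keeps the enum convenient inside application code while ensuring only strings reach the connection layer.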