I'm trying to write a callback handler that streams results to a Google Doc, but writes to the doc in batches so that it doesn't flood the API with a tiny write request for every token. But my code always cuts off the last chunk, because it can't tell when the last chunk has streamed in. It seems that in this codepath on_llm_start and on_llm_end are not called, so I can't flush the final results in on_llm_end.
Any ideas on how I can do this?
This is the code I'm using to generate the response:
```python
callback_handler = StreamingGoogleDocCallbackHandler(google_docs_service, doc_id, user, generation_key)
chat = ChatOpenAI(
    callback_manager=CallbackManager([callback_handler]),
    client=openai.ChatCompletion,
    model_name="gpt-3.5-turbo",
    streaming=True,
)
return chat([SystemMessage(content=system_prompt), HumanMessage(content=human_prompt)]).content
```
And StreamingGoogleDocCallbackHandler implements on_llm_new_token like this:
```python
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
    self.current_batch += token
    if len(self.current_batch) > MIN_BATCH_LENGTH:
        self.__write_text_to_doc(self.current_batch)
        self.current_batch = ""
```
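For now, the workaround I'm leaning towards is exposing an explicit flush() on the handler and calling it myself after chat(...) returns, so the tail of the stream still gets written even when on_llm_end never fires. A rough sketch of the full handler under that assumption (the Google Docs write is simplified to an append-at-end batchUpdate, MIN_BATCH_LENGTH is just a placeholder, and I'm assuming the other BaseCallbackHandler methods default to no-ops on my LangChain version):

```python
from typing import Any

from langchain.callbacks.base import BaseCallbackHandler

MIN_BATCH_LENGTH = 200  # placeholder threshold, tune to taste


class StreamingGoogleDocCallbackHandler(BaseCallbackHandler):
    def __init__(self, google_docs_service, doc_id, user, generation_key):
        self.google_docs_service = google_docs_service
        self.doc_id = doc_id
        self.user = user
        self.generation_key = generation_key
        self.current_batch = ""

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Buffer tokens and only write once the batch is large enough.
        self.current_batch += token
        if len(self.current_batch) > MIN_BATCH_LENGTH:
            self.__write_text_to_doc(self.current_batch)
            self.current_batch = ""

    def on_llm_end(self, response, **kwargs: Any) -> None:
        # Flush the remainder if this callback does get invoked.
        self.flush()

    def flush(self) -> None:
        # Manual fallback for codepaths where on_llm_end is never called.
        if self.current_batch:
            self.__write_text_to_doc(self.current_batch)
            self.current_batch = ""

    def __write_text_to_doc(self, text: str) -> None:
        # Simplified: append the text at the end of the document body.
        self.google_docs_service.documents().batchUpdate(
            documentId=self.doc_id,
            body={"requests": [{"insertText": {"endOfSegmentLocation": {}, "text": text}}]},
        ).execute()
```

And then at the call site, something like:

```python
result = chat([SystemMessage(content=system_prompt), HumanMessage(content=human_prompt)])
callback_handler.flush()  # write whatever is left in the last partial batch
return result.content
```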