I'm trying to write a callback handler that streams results to a Google Doc, but writes to the doc in batches so that it doesn't flood the API with a tiny write request for every token. But my code always cuts off the last chunk, because it can't tell when the last chunk has streamed in. It seems that in this codepath on_llm_start and on_llm_end are not called, so I can't flush the final results in on_llm_end.
Any ideas on how I can do this?
This is the code I'm using to generate the response:
```python
callback_handler = StreamingGoogleDocCallbackHandler(google_docs_service, doc_id, user, generation_key)
chat = ChatOpenAI(
    callback_manager=CallbackManager([callback_handler]),
    client=openai.ChatCompletion,
    model_name="gpt-3.5-turbo",
    streaming=True,
)
return chat([SystemMessage(content=system_prompt), HumanMessage(content=human_prompt)]).content
```
And StreamingGoogleDocCallbackHandler implements on_llm_new_token like this:
```python
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
    self.current_batch += token
    if len(self.current_batch) > MIN_BATCH_LENGTH:
        self.__write_text_to_doc(self.current_batch)
        self.current_batch = ""
```
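For now, the workaround I'm leaning towards is exposing an explicit flush() on the handler and calling it myself after chat(...) returns, so the tail of the stream still gets written even when on_llm_end never fires. A rough sketch of the full handler under that assumption (the Google Docs write is simplified to an append-at-end batchUpdate, MIN_BATCH_LENGTH is just a placeholder, and I'm assuming the other BaseCallbackHandler methods default to no-ops on my LangChain version):

```python
from typing import Any

from langchain.callbacks.base import BaseCallbackHandler

MIN_BATCH_LENGTH = 200  # placeholder threshold, tune to taste


class StreamingGoogleDocCallbackHandler(BaseCallbackHandler):
    def __init__(self, google_docs_service, doc_id, user, generation_key):
        self.google_docs_service = google_docs_service
        self.doc_id = doc_id
        self.user = user
        self.generation_key = generation_key
        self.current_batch = ""

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Buffer tokens and only write once the batch is large enough.
        self.current_batch += token
        if len(self.current_batch) > MIN_BATCH_LENGTH:
            self.__write_text_to_doc(self.current_batch)
            self.current_batch = ""

    def on_llm_end(self, response, **kwargs: Any) -> None:
        # Flush the remainder if this callback does get invoked.
        self.flush()

    def flush(self) -> None:
        # Manual fallback for codepaths where on_llm_end is never called.
        if self.current_batch:
            self.__write_text_to_doc(self.current_batch)
            self.current_batch = ""

    def __write_text_to_doc(self, text: str) -> None:
        # Simplified: append the text at the end of the document body.
        self.google_docs_service.documents().batchUpdate(
            documentId=self.doc_id,
            body={"requests": [{"insertText": {"endOfSegmentLocation": {}, "text": text}}]},
        ).execute()
```

And then at the call site, something like:

```python
result = chat([SystemMessage(content=system_prompt), HumanMessage(content=human_prompt)])
callback_handler.flush()  # write whatever is left in the last partial batch
return result.content
```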