Replies: 2 comments
-
🤖 Hello @girithodu! 🚀 I am Dosu, a friendly bot designed to help solve bugs, answer questions, and assist you in becoming a contributor while you await a human maintainer. I'm here to help, so let's get started!

Based on the information provided, the issue you're experiencing might be related to the `streaming` parameter. When you're running your application locally, you might be manually setting the `streaming` parameter to `True`, which would explain why streaming works there. To fix this issue, you could ensure that `streaming=True` is set when you initialize the model:

```python
cf_ai = CloudflareWorkersAI(
    account_id=my_account_id,
    api_token=my_api_token,
    model=llm_model,
    streaming=True
)
```

This will ensure that the responses from the Cloudflare Workers AI service are streamed, regardless of whether your application is running locally or on Azure with Kubernetes. You can find more details about this in the CloudflareWorkersAI class in the LangChain repository.

As for your question about whether the LangChain framework supports streaming responses in its implementation, the answer is yes; the `streaming` parameter shown above is part of the model's interface.

Regarding your question about whether the LangChain framework has any specific configurations or settings for handling streaming when deployed on different platforms like Azure with Kubernetes, I wasn't able to find any specific information about this in the LangChain repository. However, it's possible that some platform-specific configurations or settings need to be adjusted when deploying your application on Azure with Kubernetes. If you could provide more details about your deployment configuration, I might be able to provide more specific advice.

I hope this helps! If you have any more questions, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
@girithodu, were you able to figure this out? I am facing the same issue here.
-
Hello everyone,

I'm encountering issues with streaming in my application. Streaming works smoothly in my local environment, where the backend streams one token at a time and the frontend consumes tokens in the same manner. However, when I deploy both services on Azure with Kubernetes, streaming no longer functions correctly: the entire LLM answer arrives at once at the end instead of progressively. Could someone please help me understand what might be causing this issue?

```python
yield f"data: {json.dumps({'token': token})}\n\n"

return StreamingResponse(gen, media_type="text/event-stream")
```

`gen` is the generator that does the streaming (see the sketch below).

Technologies:
- FastAPI for the Python backend
- Next.js/React for the frontend
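Below is a hypothetical sketch of such an SSE endpoint, not the poster's actual code: the route name, the placeholder token source, and the prompt handling are assumptions. It also adds response headers that ask intermediaries not to buffer the stream, since a common cause of "streams locally but arrives all at once in production" is a reverse proxy (for example an nginx-based Kubernetes ingress) buffering the response; depending on the ingress controller, buffering may also need to be disabled in its own configuration.

```python
# Hypothetical FastAPI SSE endpoint sketch; names and token source are placeholders.
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/stream")
async def stream_llm(prompt: str):
    async def gen():
        # Placeholder token source; in the real app this would iterate over the
        # LLM's streamed chunks (e.g. an async iterator over model output).
        for token in ["Hello", " ", "world"]:
            yield f"data: {json.dumps({'token': token})}\n\n"

    # The extra headers hint proxies not to buffer the SSE stream:
    # "X-Accel-Buffering: no" is honored by nginx-based proxies,
    # "Cache-Control: no-cache" avoids caching of the event stream.
    return StreamingResponse(
        gen(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",
        },
    )
```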