[xinference] Error: An error occurred during streaming #8253
Comments
Hey @tigflanker! I'm here to assist you with any bugs, questions, or contributions you have. Let's tackle this issue together!

I found a similar unsolved discussion that might be relevant to your issue: dify 0.6.8 fails to deploy local xinference vl model (qwen-vl-chat) [1].

To address the streaming error by disabling the streaming feature, you need to locate the model invocation code in your Dify codebase. Typically, this code is found in the "Model Runtime" module, which provides the interface for invoking and authenticating various models [2]. Here is a general guide on where to look and how to modify the code.
Here is an example of how you might modify the code:

```python
from collections.abc import Generator
from typing import Optional, Union

# Entity imports from Dify's Model Runtime (exact module paths may vary by version):
from core.model_runtime.entities.llm_entities import LLMResult
from core.model_runtime.entities.message_entities import PromptMessage, PromptMessageTool


# Method of the xinference LLM class in Dify's Model Runtime.
def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None,
            stream: bool = True, user: Optional[str] = None) \
        -> Union[LLMResult, Generator]:
    """
    Invoke large language model

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param model_parameters: model parameters
    :param tools: tools for tool calling
    :param stop: stop words
    :param stream: is stream response
    :param user: unique user id
    :return: full response or stream response chunk generator result
    """
    # _handle_stream_response / _handle_sync_response are illustrative helper names.
    if stream:
        return self._handle_stream_response(model, credentials, prompt_messages,
                                            model_parameters, tools, stop, user)
    return self._handle_sync_response(model, credentials, prompt_messages,
                                      model_parameters, tools, stop, user)
```

In your specific case, you need to set `stream=False` in the invocation, for example:

```python
import os

# `model` is your already-initialized xinference model instance;
# SystemPromptMessage / UserPromptMessage come from the Model Runtime message entities.
response = model.invoke(
    model='Qwen-VL-Chat',
    credentials={
        'server_url': os.environ.get('XINFERENCE_SERVER_URL'),
        'model_uid': os.environ.get('XINFERENCE_CHAT_MODEL_UID')
    },
    prompt_messages=[
        SystemPromptMessage(
            content='You are a helpful AI assistant.',
        ),
        UserPromptMessage(
            content='Hello World!'
        )
    ],
    model_parameters={
        'temperature': 0.7,
        'top_p': 1.0,
    },
    stop=['you'],
    stream=False,  # Disable streaming
    user="abc-123"
)
```

Make sure to replace the placeholder values with your actual model details and parameters. This should help you avoid the streaming error by disabling the streaming feature.
The bot replied automatically. It told you to modify the source code, but you run with Docker, lol.
We need the logs from the xinference side and from our api container.
@Howe829 @crazywoola I traced the llm.py file in the docker-api-1 container and modified two places, including the _invoke function and the _generate function. dosubot, can you tell me how to find the source code where a workflow job invokes this qwen-vl-chat API (such as 'response = model.invoke...')?
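One quick way to locate candidate call sites from inside the container is a standard-library search script; this is only a sketch, not an answer from the thread. The container name docker-api-1 comes from the comment above, while the /app/api source path and the search patterns are assumptions.

```python
# find_invoke_sites.py -- run inside the api container, e.g.:
#   docker exec -it docker-api-1 python find_invoke_sites.py
# Scans the api source tree for lines that look like model invocations.
from pathlib import Path

SOURCE_ROOT = Path("/app/api")  # assumption: where the api code lives in the image
PATTERNS = ("model.invoke(", "_invoke(", "stream=")  # assumption: likely call-site markers

for py_file in SOURCE_ROOT.rglob("*.py"):
    try:
        lines = py_file.read_text(encoding="utf-8", errors="ignore").splitlines()
    except OSError:
        continue
    for lineno, line in enumerate(lines, start=1):
        if any(pattern in line for pattern in PATTERNS):
            print(f"{py_file}:{lineno}: {line.strip()}")
```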
@crazywoola Hi crazywoola, here is the log info from the xinference side:
and the log info from docker-api-1:
@Minamiyama Is it possible for you to look at this issue?
I have the same issue. Has anyone solved this problem?
Hi, @tigflanker. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale. Issue Summary:
Next Steps:
Thank you for your understanding and contribution!
Let me take a look at this in the next few days. @laipz8200, can this be assigned to me?
I cannot reproduce this in my local environment. Can you provide your DSL?
Self Checks
Dify version
0.8.0
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Same problem as this discussion: #5813
I use xinference model serving with Docker, and the deployed model is Qwen-VL-Chat.
The reply told us to disable the stream function like this:
But we don't know where to modify this file.
Could anyone please help explain this step? Thanks.
✔️ Expected Behavior
No response
❌ Actual Behavior
No response