[xinference] Error: An error occurred during streaming #8253

Open

tigflanker opened this issue Sep 11, 2024 · 11 comments

@tigflanker

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.8.0

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Same problem as in this discussion: #5813

I serve models with xinference via Docker, and the deployed model is Qwen-VL-Chat.
[screenshot: 微信截图_20240911141245]

The reply there suggested disabling streaming, like this:

import os  # assumed: needed for the os.environ lookups below

# Assumed context: `model`, SystemPromptMessage and UserPromptMessage come from
# Dify's model runtime; this snippet is illustrative, not a standalone script.
response = model.invoke(
    model='ChatGLM3',
    credentials={
        'server_url': os.environ.get('XINFERENCE_SERVER_URL'),
        'model_uid': os.environ.get('XINFERENCE_CHAT_MODEL_UID')
    },
    prompt_messages=[
        SystemPromptMessage(
            content='You are a helpful AI assistant.',
        ),
        UserPromptMessage(
            content='Hello World!'
        )
    ],
    model_parameters={
        'temperature': 0.7,
        'top_p': 1.0,
    },
    stop=['you'],
    stream=False,  # Disable streaming
    user="abc-123"
)

# By setting stream=False, you should be able to avoid the streaming error.

But we don't know which file we should modify.
Could anyone please explain this step? Thanks.

✔️ Expected Behavior

No response

❌ Actual Behavior

No response


dosubot bot commented Sep 11, 2024

Hey @tigflanker! I'm here to assist you with any bugs, questions, or contributions you have. Let's tackle this issue together!

I found a similar unsolved discussion that might be relevant to your issue: dify 0.6.8 fails to deploy local xinference vl model(qwen-vl-chat) [1].

To address the streaming error by disabling the streaming feature, you need to locate the model invocation code in your Dify codebase. Typically, this code is found in the "Model Runtime" module, which provides the interface for invoking and authenticating various models [2].

Here is a general guide on where to look and how to modify the code:

  1. Locate the Model Invocation Code:

    • The model invocation code is usually in a file related to the model you are using. For example, if you are using the Qwen-VL-Chat model, you might find the relevant code in a file like llm.py under the model_runtime module.
  2. Modify the Invocation Code:

    • Once you locate the file, look for the _invoke method or similar function where the model is invoked. You need to set the stream parameter to False in this method.

Here is an example of how you might modify the code:

def _invoke(self, model: str, credentials: dict,
            prompt_messages: list[PromptMessage], model_parameters: dict,
            tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None,
            stream: bool = True, user: Optional[str] = None) \
        -> Union[LLMResult, Generator]:
    """
    Invoke large language model

    :param model: model name
    :param credentials: model credentials
    :param prompt_messages: prompt messages
    :param model_parameters: model parameters
    :param tools: tools for tool calling
    :param stop: stop words
    :param stream: is stream response
    :param user: unique user id
    :return: full response or stream response chunk generator result
    """
    # Collect the arguments so they can be forwarded to either handler.
    kwargs = dict(model=model, credentials=credentials, prompt_messages=prompt_messages,
                  model_parameters=model_parameters, tools=tools, stop=stop, user=user)
    if stream:
        return self._handle_stream_response(**kwargs)
    return self._handle_sync_response(**kwargs)

In your specific case, you need to set stream=False when invoking the model:

response = model.invoke(
    model='Qwen-VL-Chat',
    credentials={
        'server_url': os.environ.get('XINFERENCE_SERVER_URL'),
        'model_uid': os.environ.get('XINFERENCE_CHAT_MODEL_UID')
    },
    prompt_messages=[
        SystemPromptMessage(
            content='You are a helpful AI assistant.',
        ),
        UserPromptMessage(
            content='Hello World!'
        )
    ],
    model_parameters={
        'temperature': 0.7,
        'top_p': 1.0,
    },
    stop=['you'],
    stream=False,  # Disable streaming
    user="abc-123"
)

Make sure to replace the placeholder values with your actual model details and parameters. This should help you avoid the streaming error by disabling the streaming feature.

To continue talking to Dosu, mention @dosu.

@Howe829
Contributor

Howe829 commented Sep 11, 2024

The bot replied automatically. It told you to modify the source code, but you are running with Docker, lol.

@crazywoola
Member

We need the logs from the xinference side and from our api container.

@tigflanker
Author

@Howe829 @crazywoola
Thanks for the reply.

I traced the llm.py file inside the docker-api-1 container and modified two places, the _invoke function and the _generate function (roughly as in the sketch below).
But another issue has surfaced...
[screenshot: 微信截图_20240911154343]

dosubot, can you tell me how to find the source code where a worker job invokes this qwen-vl-chat API (such as 'response = model.invoke...')?
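
A minimal sketch of the kind of change described above, i.e. forcing stream=False inside _invoke. The class name and the _generate stub are hypothetical stand-ins; the real provider file (assumed here to be api/core/model_runtime/model_providers/xinference/llm/llm.py in Dify 0.8.x) has a richer signature:

from typing import Generator, Optional, Union


class XinferenceLLMPatched:
    """Hypothetical stand-in for the provider class in llm.py, not the real Dify class."""

    def _invoke(self, model: str, credentials: dict, prompt_messages: list,
                model_parameters: dict, tools: Optional[list] = None,
                stop: Optional[list] = None, stream: bool = True,
                user: Optional[str] = None) -> Union[dict, Generator]:
        stream = False  # the actual edit: force non-streaming regardless of the caller's request
        return self._generate(model, credentials, prompt_messages,
                              model_parameters, tools, stop, stream, user)

    def _generate(self, model, credentials, prompt_messages, model_parameters,
                  tools, stop, stream, user):
        # Stub: the real method builds and sends the request to the xinference server.
        return {"model": model, "stream": stream}


print(XinferenceLLMPatched()._invoke("Qwen-VL-Chat", {}, [], {}))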

@tigflanker
Author

@crazywoola Hi, here is the log from the xinference side:

2024-09-11T08:22:24.558692415Z 2024-09-11 01:22:24,556 xinference.api.restful_api 1 ERROR    Chat completion stream got an error: [address=0.0.0.0:41349, pid=4710] isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
2024-09-11T08:22:24.558744398Z  * (Tensor elements, Tensor test_elements, *, bool assume_unique = False, bool invert = False, Tensor out = None)
2024-09-11T08:22:24.558756994Z  * (Number element, Tensor test_elements, *, bool assume_unique = False, bool invert = False, Tensor out = None)
2024-09-11T08:22:24.558767300Z  * (Tensor elements, Number test_element, *, bool assume_unique = False, bool invert = False, Tensor out = None)
2024-09-11T08:22:24.558776930Z Traceback (most recent call last):
2024-09-11T08:22:24.558785930Z   File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1728, in stream_results
2024-09-11T08:22:24.558795803Z     async for item in iterator:
2024-09-11T08:22:24.558805087Z   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 340, in __anext__
2024-09-11T08:22:24.558814322Z     return await self._actor_ref.__xoscar_next__(self._uid)
2024-09-11T08:22:24.558823316Z   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
2024-09-11T08:22:24.558832381Z     return self._process_result_message(result)
2024-09-11T08:22:24.558841062Z   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
2024-09-11T08:22:24.558850515Z     raise message.as_instanceof_cause()
2024-09-11T08:22:24.558859249Z   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 656, in send
2024-09-11T08:22:24.558868246Z     result = await self._run_coro(message.message_id, coro)
2024-09-11T08:22:24.558880874Z   File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 367, in _run_coro
2024-09-11T08:22:24.558918177Z     return await coro
2024-09-11T08:22:24.558927952Z   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
2024-09-11T08:22:24.558937127Z     return await super().__on_receive__(message)  # type: ignore
2024-09-11T08:22:24.558945898Z   File "xoscar/core.pyx", line 558, in __on_receive__
2024-09-11T08:22:24.559058617Z     raise ex
2024-09-11T08:22:24.559108495Z   File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
2024-09-11T08:22:24.559122536Z     async with self._lock:
2024-09-11T08:22:24.559132010Z   File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
2024-09-11T08:22:24.559141802Z     with debug_async_timeout('actor_lock_timeout',
2024-09-11T08:22:24.559151140Z   File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
2024-09-11T08:22:24.559160616Z     result = await result
2024-09-11T08:22:24.559178825Z   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 431, in __xoscar_next__
2024-09-11T08:22:24.559188608Z     raise e
2024-09-11T08:22:24.559197722Z   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 417, in __xoscar_next__
2024-09-11T08:22:24.559207998Z     r = await asyncio.to_thread(_wrapper, gen)
2024-09-11T08:22:24.559217120Z   File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
2024-09-11T08:22:24.559226253Z     return await loop.run_in_executor(None, func_call)
2024-09-11T08:22:24.559235066Z   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2024-09-11T08:22:24.559243965Z     result = self.fn(*self.args, **self.kwargs)
2024-09-11T08:22:24.559253402Z   File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 402, in _wrapper
2024-09-11T08:22:24.559263060Z     return next(_gen)
2024-09-11T08:22:24.559272182Z   File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 326, in _to_generator
2024-09-11T08:22:24.559281913Z     for v in gen:
2024-09-11T08:22:24.559290273Z   File "/usr/local/lib/python3.10/dist-packages/xinference/model/utils.py", line 368, in inner
2024-09-11T08:22:24.559300028Z     for obj in func(*args, **kwargs):
2024-09-11T08:22:24.559309013Z   File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/utils.py", line 585, in _to_chat_completion_chunks
2024-09-11T08:22:24.559318311Z     for i, chunk in enumerate(chunks):
2024-09-11T08:22:24.559327806Z   File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen_vl.py", line 212, in _generate_stream
2024-09-11T08:22:24.559337777Z     for response in response_generator:
2024-09-11T08:22:24.559346771Z   File "/root/.cache/huggingface/modules/transformers_modules/Qwen-VL-Chat-pytorch-7b/modeling_qwen.py", line 1021, in stream_generator
2024-09-11T08:22:24.559356738Z     for token in self.generate_stream(
2024-09-11T08:22:24.559393780Z   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2024-09-11T08:22:24.559404074Z     return func(*args, **kwargs)
2024-09-11T08:22:24.559412895Z   File "/usr/local/lib/python3.10/dist-packages/transformers_stream_generator/main.py", line 208, in generate
2024-09-11T08:22:24.559421896Z     ] = self._prepare_attention_mask_for_generation(
2024-09-11T08:22:24.559431094Z   File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 498, in _prepare_attention_mask_for_generation
2024-09-11T08:22:24.559440961Z     torch.isin(elements=inputs, test_elements=pad_token_id).any()
2024-09-11T08:22:24.559450334Z TypeError: [address=0.0.0.0:41349, pid=4710] isin() received an invalid combination of arguments - got (test_elements=int, elements=Tensor, ), but expected one of:
2024-09-11T08:22:24.559462690Z  * (Tensor elements, Tensor test_elements, *, bool assume_unique = False, bool invert = False, Tensor out = None)
2024-09-11T08:22:24.559471854Z  * (Number element, Tensor test_elements, *, bool assume_unique = False, bool invert = False, Tensor out = None)
2024-09-11T08:22:24.559481066Z  * (Tensor elements, Number test_element, *, bool assume_unique = False, bool invert = False, Tensor out = None)
2024-09-11T08:22:24.559489769Z 

And here is the log from docker-api-1:

2024-09-11T08:25:11.014640187Z 2024-09-11 08:25:11,014.014 INFO [Thread-34 (_generate_worker)] [_client.py:1038] - HTTP Request: POST http://xx.xx.xx.xx:9997/v1/chat/completions "HTTP/1.1 200 OK"
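
For context, the TypeError in the traceback above comes from torch.isin being called with a plain int for test_elements, which matches none of its overloads. A minimal, self-contained reproduction (the pad_token_id value is made up):

import torch

inputs = torch.tensor([[1, 2, 3, 0]])
pad_token_id = 0  # a plain Python int, as the failing code path passes it

try:
    # Fails: keyword test_elements with an int matches none of torch.isin's overloads.
    torch.isin(elements=inputs, test_elements=pad_token_id)
except TypeError as exc:
    print(exc)

# Wrapping the int in a tensor satisfies the (Tensor, Tensor) overload.
print(torch.isin(elements=inputs, test_elements=torch.tensor(pad_token_id)))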

@YihanWang0810

I have the same issue. How do I set stream=False in Dify or in Xinference? I can call the Xinference LLM on the web UI, but it returns an error when I use the API.
[screenshot]
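
One way to narrow this down (a sketch, not an official procedure) is to call the xinference OpenAI-compatible endpoint directly with stream set to false and check whether the non-streaming path works on its own. The URL and model uid below are placeholders:

import requests

# Placeholders: replace with your xinference endpoint and the deployed model uid.
XINFERENCE_URL = "http://xx.xx.xx.xx:9997/v1/chat/completions"

resp = requests.post(
    XINFERENCE_URL,
    json={
        "model": "qwen-vl-chat",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,  # exercise only the non-streaming code path
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])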

@qinxuye

qinxuye commented Oct 22, 2024

@Minamiyama Is it possible for you to look at this issue?

@sheepying

I have the same issue. Has anyone solved this problem?


dosubot bot commented Dec 9, 2024

Hi, @tigflanker. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • You experienced a streaming error with the Qwen-VL-Chat model using xinference model serving with Docker.
  • Guidance was provided to disable streaming by setting stream=False, but there was confusion on where to apply this change.
  • I provided a detailed explanation on modifying the model invocation code, but modifying source code in Docker was challenging.
  • You attempted modifications and shared logs for further insights, with other users reporting similar issues.
  • Assistance was requested from @Minamiyama to help address the problem, but the issue remains unresolved.

Next Steps:

  • Please confirm if this issue is still relevant with the latest version of the Dify repository. If so, you can keep the discussion open by commenting here.
  • If there is no further activity, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

dosubot bot added the "stale" label (Issue has not had recent activity or appears to be solved; stale issues will be automatically closed) on Dec 9, 2024
@yihong0618
Contributor

Let me take a look at this in the next few days. @laipz8200, can this be assigned to me?

dosubot bot removed the "stale" label on Dec 10, 2024
@yihong0618
Contributor

I cannot reproduce this locally. Can you provide your DSL?
