Error uploading files while building the knowledge base #11657

Open
5 tasks done
zhudemiao opened this issue Dec 14, 2024 · 3 comments
Labels
🐞 bug Something isn't working

Comments

@zhudemiao

Self Checks

  • This is only for bug reports. If you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.13.1, 0.13.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Error uploading files while building the knowledge base

✔️ Expected Behavior

The uploaded knowledge base files have been successfully indexed.

❌ Actual Behavior

I uploaded 3 files (2 MB, 3 MB, and 5 MB) to build a knowledge base, and all three stayed in the "queued" state. After I restarted all Docker containers, one of the files started indexing but has now been stuck in the "indexing" state for 5 hours without any progress. When I upload 15 files, the upload times out immediately. The embedding model is bge-large-zh-v1.5, and the rerank model used for hybrid retrieval is bge-reranker-v2-m3. The issue occurs on both v0.13.1 and v0.13.2. What steps should I take to resolve this?

@dosubot dosubot bot added the 🐞 bug Something isn't working label Dec 14, 2024

dosubot bot commented Dec 14, 2024

There are similar issues that have been discussed and resolved. Here are some relevant threads and solutions:

  1. Rate Limit Issue: One common cause for files remaining in a "queued" or "indexing" state is reaching the rate limit for the Embedding model API key. It's recommended to check if the rate limit has been reached for the API key used [1].

  2. Celery Worker Service: Ensure that the Celery worker service is running correctly. You can check the status of the worker service using Docker commands. If the worker service is not running, start it with docker compose up -d worker. Also, check the logs for any errors or warnings that might indicate why the worker is not processing the file upload tasks [2].

  3. Service Status: Verify that all necessary services are running correctly. Use docker compose ps to check the status of all services, including api, worker, web, weaviate, db, redis, and nginx. All should show a status of Up [2].

  4. Pending Processes and Configuration: Check for any pending segmentation or embedding processes, resource limitations, and configuration issues. Ensure all dependencies and services like db and redis are running smoothly [2].

These steps should help you diagnose and resolve the issue with files being stuck in the "queued" or "indexing" state.


@zhudemiao
Author

I am using a locally deployed large language model. My worker container reports the error below. However, when I enter the worker container itself and run curl http://192.1.1.9:8080, it returns "Ollama is running."

core.model_runtime.errors.invoke.InvokeConnectionError: [ollama] Connection Error, HTTPConnectionPool(host='192.1.1.9', port=8080): Read timed out. (read timeout=300)
2024-12-14 12:05:14.185 ERROR [Thread-19 (_format_qa_document)] [qa_index_processor.py:159] - Failed to format qa document
Traceback (most recent call last):
  File "/app/api/.venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
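
For anyone reading this error: the message says "Read timed out" with a 300-second read timeout. In urllib3 terms that means the connection to 192.1.1.9:8080 was opened but no response arrived within the window. This would be consistent with curl getting a quick "Ollama is running" from the same container while a slower model request still times out, although nothing in this thread confirms the cause. The sketch below only parses the message text. `classify_pool_error` is a hypothetical helper name introduced here for illustration and is not part of Dify or urllib3.

```python
import re
from typing import Optional

# Matches the urllib3-style message quoted in the worker log above.
_POOL_ERROR = re.compile(
    r"HTTPConnectionPool\(host='(?P<host>[^']+)', port=(?P<port>\d+)\): (?P<detail>.+)"
)
_READ_TIMEOUT = re.compile(r"\(read timeout=(?P<seconds>[\d.]+)\)")


def classify_pool_error(message: str) -> Optional[dict]:
    """Extract host, port, and failure kind from a connection-pool error message.

    Hypothetical helper: returns None when the message does not look like
    an HTTPConnectionPool error.
    """
    match = _POOL_ERROR.search(message)
    if match is None:
        return None

    detail = match.group("detail")
    timeout = _READ_TIMEOUT.search(detail)
    if "Read timed out" in detail:
        # The connection was established; the server did not answer in time.
        kind = "read_timeout"
    else:
        kind = "other"

    return {
        "host": match.group("host"),
        "port": int(match.group("port")),
        "kind": kind,
        "timeout_s": float(timeout.group("seconds")) if timeout else None,
    }


if __name__ == "__main__":
    msg = (
        "[ollama] Connection Error, HTTPConnectionPool(host='192.1.1.9', "
        "port=8080): Read timed out. (read timeout=300)"
    )
    print(classify_pool_error(msg))
```

A "read_timeout" result suggests checking how long the model server takes to answer a real request, rather than checking basic network reachability.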

@zhudemiao
Author

However, after I restarted all Dify-related containers, the worker container no longer reports "Connection Error, HTTPConnectionPool." The files that were already uploaded remain in the "queued" state. When I upload files again, nothing new appears in the logs. The worker container log stops at the lines below, with no further output after an hour of waiting. All containers show an "up" status. Worker container log:
2024-12-14 13:31:56.496 INFO [MainThread] [connection.py:22] - Connected to redis://:@redis:6379/1
2024-12-14 13:31:56.500 INFO [MainThread] [mingle.py:40] - mingle: searching for neighbors
2024-12-14 13:31:57.508 INFO [MainThread] [mingle.py:49] - mingle: all alone
2024-12-14 13:31:57.519 INFO [MainThread] [worker.py:175] - celery@56ce5d9168c4 ready.
2024-12-14 13:31:57.521 INFO [MainThread] [strategy.py:161] - Task tasks.document_indexing_task.document_indexing_task[635ae99e-c7f4-4bee-ac92-cde41fd99e98] received
2024-12-14 13:31:57.600 INFO [Dummy-1] [document_indexing_task.py:59] - Start process document: d3a29daa-0382-4006-887c-330e8dc812d1
2024-12-14 13:31:57.658 INFO [Dummy-2] [pidbox.py:111] - pidbox: Connected to redis://:@redis:6379/1.
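
One way to quantify "no output after waiting for an hour" is to measure the gap between the last timestamped log line and the current time, and to list the documents the worker reported starting. The sketch below assumes the log format shown above (a `YYYY-MM-DD HH:MM:SS.mmm` prefix and a "Start process document: <id>" message). The function names are hypothetical and the log text is assumed to have been captured separately, for example from the worker container's log output.

```python
import re
from datetime import datetime, timedelta
from typing import List, Optional

# Timestamp prefix as it appears in the worker log lines above.
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")
# Message emitted when the worker begins processing a document.
_STARTED = re.compile(r"Start process document: (\S+)")


def last_log_time(log_text: str) -> Optional[datetime]:
    """Return the timestamp of the last timestamped line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = _TIMESTAMP.match(line)
        if match:
            last = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return last


def silent_for(log_text: str, now: datetime) -> Optional[timedelta]:
    """Return how long the log has been silent relative to `now`."""
    last = last_log_time(log_text)
    return None if last is None else now - last


def started_documents(log_text: str) -> List[str]:
    """Return document IDs the worker reported starting, in log order."""
    return _STARTED.findall(log_text)


if __name__ == "__main__":
    sample = (
        "2024-12-14 13:31:57.600 INFO [Dummy-1] [document_indexing_task.py:59] - "
        "Start process document: d3a29daa-0382-4006-887c-330e8dc812d1\n"
        "2024-12-14 13:31:57.658 INFO [Dummy-2] [pidbox.py:111] - "
        "pidbox: Connected to redis://:@redis:6379/1.\n"
    )
    # `now` is passed in explicitly so the result does not depend on the clock.
    print(silent_for(sample, datetime(2024, 12, 14, 14, 31, 57, 658000)))
    print(started_documents(sample))
```

A long silence after a "Start process document" line, with no later error, points at a task that is blocked rather than one that failed. The sketch does not say what it is blocked on.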
