Notion syncing only some pages #9012

Open
5 tasks done
BURG3R5 opened this issue Oct 4, 2024 · 13 comments
Assignees
Labels
📚 feat:datasource Data sources like web, Notion, Logseq, Lark, Docs

Comments

@BURG3R5

BURG3R5 commented Oct 4, 2024

Self Checks

  • This is only for bug reports. If you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.9.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Followed the typical steps to connect Dify to Notion using an internal integration.

✔️ Expected Behavior

We have 1228 pages in our Notion database. All of them should be indexed into the Knowledge Base.

❌ Actual Behavior

The selection screen shows 125 pages selected and upon initial sync completion, only 107 documents are indexed.

[Two screenshots attached: "Screenshot 2024-10-04 at 3 03 11 PM" and "Screenshot 2024-10-04 at 3 01 15 PM"]
@dosubot dosubot bot added the 📚 feat:datasource Data sources like web, Notion, Logseq, Lark, Docs label Oct 4, 2024

dosubot bot commented Oct 4, 2024

Hey @BURG3R5! 👋 I'm here to help you with any bugs, questions, or contributions you have for Dify. I'm a bot designed to assist you while waiting for a human maintainer. Let's tackle this issue together!

I found a similar issue that was previously reported and closed: [Cloud] Missing added pages when trying to add a new page to Knowledge Base with Notion integration. This issue was resolved by a pull request that fixed the problem with adding Notion pages to the knowledge base [1].

To troubleshoot and resolve your issue with incomplete syncing of Notion pages in Dify, you can follow these steps:

  1. Authorization Verification: Ensure that the authorization verification is completed successfully. This involves binding the Notion content to Dify either during dataset creation or through the settings under Data Sources.

  2. Check Integration Configuration: Verify that the integration with Notion is correctly configured. Ensure that the environment variables in the .env file are set correctly for either internal or public integration.

  3. Segmentation and Cleaning: Confirm that the segmentation settings and indexing method are correctly configured. This affects how data is processed and imported.

  4. Synchronization Process: If the Notion content is modified, ensure that the synchronization is triggered by clicking Sync in the Dify dataset Document List Page.

  5. Data Type Limitations: Be aware that images and files are not supported for import, and tabular data will be converted to text display, which might affect the completeness of the sync.

  6. Automatic Sync Setup: Ensure that the automatic sync is set up correctly so that any updates in Notion are automatically reflected in Dify.

If these steps do not resolve the issue, further investigation into the specific configuration and logs may be necessary. Additionally, ensure that you are following the specific steps for internal or public integration as outlined in the Dify documentation [2][3][4].

To continue talking to Dosu, mention @dosu.

@BURG3R5
Author

BURG3R5 commented Oct 4, 2024

@dosu The mentioned issue is not really similar to mine. I need help from a human maintainer.


dosubot bot commented Oct 4, 2024

This is a tough nut to crack, could really use your help on this, @takatost.

To continue talking to Dosu, mention @dosu.

@BURG3R5
Author

BURG3R5 commented Oct 4, 2024

Additionally, I have already attempted the fixes the bot suggested, and none of them helped.

@JirehC

JirehC commented Oct 11, 2024

Are there any updates on this issue? I am facing the same problem of missing pages during sync, on both cloud and self-hosted.

@BURG3R5
Author

BURG3R5 commented Oct 11, 2024

@JirehC I've moved on to downloading Notion pages as Markdown and uploading them to an empty knowledge base using the API. Can't trust Dify to respond in time or even give proper error messages, so I'm implementing the upload and sync processes myself. At least this way I have more control over the processes and can actually see what's going wrong.
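
For anyone taking the same route, the upload step could be sketched roughly as below. This is only an illustration, not my exact script: the `create_by_text` endpoint path and the payload fields are assumptions based on the knowledge base API docs and may differ between Dify versions, and `build_upload_request`, `upload_markdown_dir`, the base URL, API key, and dataset ID are hypothetical placeholders.

```python
import json
import urllib.request
from pathlib import Path


def build_upload_request(base_url: str, api_key: str, dataset_id: str,
                         name: str, markdown: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates one document from text."""
    # Assumed endpoint path; verify it against the API docs for your Dify version.
    url = f"{base_url.rstrip('/')}/datasets/{dataset_id}/document/create_by_text"
    payload = {
        "name": name,
        "text": markdown,
        "indexing_technique": "high_quality",
        "process_rule": {"mode": "automatic"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def upload_markdown_dir(directory: str, base_url: str, api_key: str,
                        dataset_id: str) -> list[str]:
    """Upload every .md file in `directory`; return the names of files that failed."""
    failed = []
    for path in sorted(Path(directory).glob("*.md")):
        req = build_upload_request(base_url, api_key, dataset_id,
                                   path.stem, path.read_text(encoding="utf-8"))
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                resp.read()
        except OSError as exc:  # URLError, HTTPError and timeouts are all OSError
            # Log the failure instead of silently dropping the page.
            print(f"{path.name}: {exc}")
            failed.append(path.name)
    return failed
```

Returning the list of failed file names is the point of doing it this way: every page that does not make it into the knowledge base is reported, so the count of indexed documents can be checked against the count of exported pages.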

Not gonna close the issue because this is still an unsolved problem in the product.

@JirehC

JirehC commented Oct 11, 2024

@BURG3R5 I've found a potential cause for this, and it's not something we can fix; it might even be a Notion problem rather than a Dify problem. The sync may be hitting a time-out. For example, exporting a Notion workspace to Anytype through an integration has the same issue as Dify, where pages go missing. Anytype explained that a file could be too big to finish transferring before the time-out.

I've also moved to downloading Notion pages as Markdown, but got fed up uploading them one by one because I couldn't find anything about uploading via the API. Could you send me the link to the documentation on how to do it?

Edit: Is this the documentation you followed? https://docs.dify.ai/guides/knowledge-base/maintain-dataset-via-api

[Screenshot attached: "Screenshot 2024-10-11 at 17 07 30"]

@BURG3R5
Author

BURG3R5 commented Oct 11, 2024

@JirehC

  1. Regarding the API: Yes, I am using that alongside the notion-to-md library.
  2. Regarding the issue: Extending the timeout on the server side and using exponential back-off to avoid rate limits should be simple enough for the Dify maintainers. Notion's "retrieve block children" API is paginated, so even if a document contains many blocks, setting the page_size parameter small enough should resolve the timeout issues.
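
To make point 2 concrete, here is a rough sketch of what I mean. It is not Dify's code and I have not checked how Dify actually fetches blocks. It assumes Notion's documented pagination fields (`results`, `has_more`, `next_cursor`, with `start_cursor` and `page_size` as request parameters) and that a rate-limited call surfaces as an HTTP 429; `RateLimited`, `with_backoff`, `iter_block_children`, and the `fetch_page` callable are hypothetical names introduced for illustration.

```python
import random
import time
from typing import Callable, Iterator, Optional


class RateLimited(Exception):
    """Raised by the caller-supplied fetch function on an HTTP 429 response."""


def with_backoff(call: Callable[[], dict], max_retries: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Run `call`, retrying with exponentially growing delays when rate-limited."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise
            # Roughly 1s, 2s, 4s, ... plus a little jitter so retries do not align.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


def iter_block_children(fetch_page: Callable[[Optional[str], int], dict],
                        page_size: int = 25, **backoff_kwargs) -> Iterator[dict]:
    """Yield every child block, one small page at a time.

    `fetch_page(start_cursor, page_size)` is a hypothetical callable that performs
    GET /v1/blocks/{block_id}/children and returns the decoded JSON body.
    """
    cursor = None
    while True:
        page = with_backoff(lambda: fetch_page(cursor, page_size), **backoff_kwargs)
        yield from page["results"]
        if not page.get("has_more"):
            return
        cursor = page["next_cursor"]
```

Keeping `page_size` small bounds the work done per request, and the back-off wrapper means a rate-limit response delays the sync instead of silently dropping the page. A real client could also honor the `Retry-After` header that Notion sends with a 429 rather than relying only on the computed delay.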

I haven't explored the Dify repository myself to figure out exactly where the issue is, since it is a massive monorepo containing everything from the frontend to client SDKs to the backend. I wish a maintainer would provide insight into which code is failing.

@JirehC

JirehC commented Oct 11, 2024

Thanks @BURG3R5. I hadn't discovered that library; this is great. Thanks for sharing!

@Yawen-1010

@crazywoola @JohnJyong


dosubot bot commented Nov 22, 2024

Hi, @BURG3R5. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • You reported a bug in Dify v0.9.1 where only 107 out of 1228 Notion pages are synced.
  • Troubleshooting steps I took did not resolve the issue.
  • Another user, JirehC, suggested a potential "time-out" issue with Notion's API.
  • You have moved to manually handling Notion pages and suggested improvements for Dify.

Next Steps:

  • Please confirm if this issue is still relevant to the latest version of Dify. If so, you can keep the discussion open by commenting here.
  • If there is no further activity, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Nov 22, 2024
@JirehC

JirehC commented Nov 22, 2024

This issue is not solved.

@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Nov 22, 2024
@xy3xy3

xy3xy3 commented Dec 8, 2024

I have this problem too. When syncing a page, I got the following error:

{
  "code": "indexing_estimate_error",
  "message": "notion page type not supported",
  "status": 500
}

Projects
None yet
Development

No branches or pull requests

6 participants