Add Crawl4AI to tools and for syncing data from websites in Knowledge #8905

Open
4 of 5 tasks
tonycky opened this issue Sep 30, 2024 · 4 comments
Assignees
Labels
🔨 feat:tools Tools for agent, function call related stuff.

Comments


tonycky commented Sep 30, 2024

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

The Crawl4AI project states that it significantly outperforms Firecrawl:

Simple crawl: Crawl4AI is over 4 times faster than Firecrawl.
With JavaScript execution: Even when executing JavaScript to load more content (doubling the number of images found), Crawl4AI is still faster than Firecrawl's simple crawl.

I suggest adding this tool to Dify to offer another way to crawl data from websites.

2. Additional context or comments

No response

3. Can you help us with this feature?

  • I am interested in contributing to this feature.
dosubot bot added the 🔨 feat:tools label Sep 30, 2024
Author

tonycky commented Sep 30, 2024

For your convenience:
https://github.com/unclecode/crawl4ai

Yawen-1010 self-assigned this Oct 22, 2024

dosubot bot commented Nov 22, 2024

Hi, @chankwongyin. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary

  • Proposal to integrate Crawl4AI into Dify for better data crawling efficiency.
  • Crawl4AI is noted to be faster than Firecrawl, especially with JavaScript-loaded content.
  • A link to the Crawl4AI GitHub repository was provided for reference.

Next Steps

  • Please confirm if this issue is still relevant to the latest version of the Dify repository by commenting on this issue.
  • If there is no further activity, the issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

dosubot bot added the stale label (Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed) Nov 22, 2024
Author

tonycky commented Nov 22, 2024

I hope this feature can be added.

dosubot bot removed the stale label Nov 22, 2024

dosubot bot commented Nov 22, 2024

@takatost, the user @chankwongyin has indicated that the proposal to integrate Crawl4AI into Dify is still relevant and hopes this feature can be added. Could you please assist them with this issue?

Projects
None yet
Development

No branches or pull requests

2 participants