Stop and continue #58

Open
ishandutta2007 opened this issue Feb 22, 2020 · 1 comment

ishandutta2007 commented Feb 22, 2020

The issue with this tool is that once it halts, you have to start all over again from scratch.
With large sites, this is a very common scenario.
Since we already have the partially generated XML, it would be nice to continue from where it was interrupted. Let me know your thoughts on this and how it could be achieved. I am willing to send a pull request once I have a better understanding of the code.

@c4software
Owner

Hi,

It's a really nice idea. The major drawback I can see is that we can miss new pages linked from previously crawled pages.

But with some work (such as preloading all previously crawled links to avoid refetching them), we can implement that kind of feature.
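
To make the preloading idea concrete, here is a minimal sketch, not the project's actual code. It assumes the interrupted run left behind a sitemap file that may be cut off mid-entry, so it extracts complete `<loc>` values with a regular expression instead of a strict XML parser. The names `load_crawled_urls` and `filter_pending` are hypothetical helpers introduced only for illustration.

```python
import re
from xml.sax.saxutils import unescape

# Matches only complete <loc>...</loc> pairs; a truncated final entry is ignored.
LOC_PATTERN = re.compile(r"<loc>(.*?)</loc>", re.DOTALL)


def load_crawled_urls(partial_xml):
    """Collect every complete <loc> entry from a possibly truncated sitemap."""
    return {unescape(loc.strip()) for loc in LOC_PATTERN.findall(partial_xml)}


def filter_pending(discovered, already_crawled):
    """Keep links not yet written to the sitemap, preserving discovery order."""
    seen = set(already_crawled)
    pending = []
    for url in discovered:
        if url not in seen:
            seen.add(url)
            pending.append(url)
    return pending


# Hypothetical partial output: the last entry was cut off by the interruption.
partial = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://example.com/</loc></url>
<url><loc>https://example.com/a?x=1&amp;y=2</loc></url>
<url><loc>https://example.com/b"""

crawled = load_crawled_urls(partial)
todo = filter_pending(
    ["https://example.com/", "https://example.com/c", "https://example.com/c"],
    crawled,
)
```

Here the cut-off entry for `/b` is not treated as crawled, so it would be fetched again on resume. The drawback mentioned above still applies: pages in `crawled` are skipped, so any links added to them since the first run are not discovered. A resumed crawl would also likely need the list of links that were queued but not yet fetched, which the sitemap alone does not record.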

@c4software c4software self-assigned this Mar 5, 2020
@c4software c4software pinned this issue Mar 5, 2020