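My current approach is to concatenate URLs. For example, for http://www.A.com with two known paths, /path_a and /path_b, the command is: crawlergo -c chrome http://www.A.com/ http://www.A.com/path_a http://www.A.com/path_b
There are two problems:
Of course, it would be ideal if a later release added a parameter that accepts multiple paths as entry points.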
Originally posted by @djerrystyle in #31 (comment)
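The concatenation workaround described above could be scripted roughly as follows. This is only a sketch: the helper name `build_crawlergo_command` is hypothetical, `chrome` stands in for the real browser path exactly as in the command quoted above, and the script only builds the argument list without running the crawler.

```python
def build_crawlergo_command(base_url, paths, chrome_path="chrome"):
    """Build an argv list: crawlergo -c <chrome> <base> <base + path> ...

    Hypothetical helper for illustration; not part of crawlergo itself.
    """
    # Normalise the base so it ends with exactly one slash.
    base = base_url.rstrip("/") + "/"
    targets = [base]
    for path in paths:
        # Plain string concatenation, mirroring the manual approach.
        url = base + path.lstrip("/")
        if url not in targets:  # skip duplicate entry points
            targets.append(url)
    return ["crawlergo", "-c", chrome_path, *targets]


if __name__ == "__main__":
    cmd = build_crawlergo_command("http://www.A.com", ["/path_a", "/path_b"])
    # Prints: crawlergo -c chrome http://www.A.com/ http://www.A.com/path_a http://www.A.com/path_b
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd)` if crawlergo is on the `PATH`. Whether every URL given this way is actually used as an entry point is the open question in this thread.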
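It had no effect at all. The crawlergo crawler still has a lot of bugs: it cannot handle 404 pages, there is no way to customize the initial paths, and it cannot crawl traditional tags (. I hope the author can improve this.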
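After testing: do not add the --robots-path parameter under any circumstances, otherwise no data gets crawled at all. This is a bug.
Fixed the crawler crash caused by --robots-path.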
I had already found some ways to work around it earlier.