My current approach is concatenation: for example, for http://www.A.com with two known paths, /path_a and /path_b #43

asdfasadfasfa · 2020-03-08T10:06:13Z

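My current approach is concatenation. For example, for http://www.A.com with two known paths, /path_a and /path_b,
the command is: crawlergo -c chrome http://www.A.com/ http://www.A.com/path_a http://www.A.com/path_b

There are two problems:

Of course, it would be best if a later version added a parameter that accepts multiple paths as entry points.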
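
The concatenation described above can be scripted. The following is only a minimal sketch: the helper name `build_crawlergo_command` and its `chrome_path` parameter are hypothetical, and it assumes crawlergo accepts several positional URLs exactly as in the command quoted in this comment. It builds the argument list and does not run the crawler.

```python
def build_crawlergo_command(base_url, paths, chrome_path="chrome"):
    """Build an argv list that seeds one crawlergo run with several entry URLs.

    Hypothetical helper: it only mirrors the command quoted in this thread
    (`crawlergo -c chrome URL...`) and assumes no other crawlergo flags.
    """
    # Normalize the base so it ends with exactly one slash.
    base = base_url.rstrip("/") + "/"
    urls = [base]
    for path in paths:
        # Plain string concatenation of the base URL and each known path.
        url = base + path.lstrip("/")
        if url not in urls:  # skip duplicate entry points
            urls.append(url)
    return ["crawlergo", "-c", chrome_path, *urls]


command = build_crawlergo_command("http://www.A.com", ["/path_a", "/path_b"])
print(" ".join(command))
# prints: crawlergo -c chrome http://www.A.com/ http://www.A.com/path_a http://www.A.com/path_b
```

The returned list can be passed to `subprocess.run(command)` if crawlergo is on the PATH and `chrome_path` points to a real Chrome executable.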

Originally posted by @djerrystyle in #31 (comment)

asdfasadfasfa · 2020-03-08T10:07:44Z

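It had no effect at all. This crawlergo crawler still has a lot of bugs: it cannot handle 404 pages, there is no way to customize the initial paths, and it cannot crawl traditional tags. I hope the author can improve this.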

asdfasadfasfa · 2020-03-08T10:38:29Z

After testing: do not add the --robots-path parameter under any circumstances. With it, no data gets crawled at all. This is a bug.

Qianlitp · 2020-05-11T14:31:14Z

Fixed the crawler crash caused by --robots-path.

asdfasadfasfa · 2020-05-13T07:36:16Z

I had already found some workarounds for this earlier.

Qianlitp closed this as completed Jul 19, 2021
