My current approach is concatenation: for example, for http://www.A.com with two known paths, /path_a and /path_b #43

Closed
asdfasadfasfa opened this issue Mar 8, 2020 · 4 comments

Comments

@asdfasadfasfa

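My current approach is concatenation. For example, for http://www.A.com with two known paths, /path_a and /path_b,
the command is: crawlergo -c chrome http://www.A.com/ http://www.A.com/path_a http://www.A.com/path_b

There are two problems:

  1. When there are many known paths, concatenating them by hand is tedious.
  2. Does passing the concatenated URLs together give the same results as running them one at a time, or is there a difference? I have not verified this.

Of course, it would be best if a later version added an option that accepts multiple paths as entry points.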
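
The hand concatenation described in point 1 could be scripted. Below is a minimal sketch in Python, not something crawlergo provides: `build_crawlergo_args` is a hypothetical helper name, and it assumes only the invocation shape shown in the command above (`-c` followed by the Chrome path, then the URLs as positional arguments). It does not answer point 2; whether one combined run matches separate runs would still need to be checked.

```python
from urllib.parse import urljoin


def build_crawlergo_args(base_url, paths, chrome_path="chrome"):
    """Build the argv list for one crawlergo run over a base URL plus known paths.

    Hypothetical helper: it only mirrors the hand-written command from the post.
    """
    # Make sure the base ends with "/" so it is treated as the site root.
    root = base_url if base_url.endswith("/") else base_url + "/"
    urls = [root]
    for path in paths:
        # urljoin resolves an absolute path such as "/path_a" against the host.
        url = urljoin(root, path)
        if url not in urls:  # skip duplicates, keep the original order
            urls.append(url)
    return ["crawlergo", "-c", chrome_path, *urls]


if __name__ == "__main__":
    args = build_crawlergo_args("http://www.A.com", ["/path_a", "/path_b"])
    # Print the command line; pass `args` to subprocess.run to execute it.
    print(" ".join(args))
```

The list of paths could just as well be read from a file, one path per line, which avoids typing long commands by hand.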

Originally posted by @djerrystyle in #31 (comment)

@asdfasadfasfa
Author

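It had no effect at all. The crawlergo crawler still has quite a few bugs: it cannot handle 404 pages, there is no way to customize the initial paths, and it cannot crawl traditional tags (. I hope the author can improve this.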

@asdfasadfasfa
Author

After testing: do not add the --robots-path parameter under any circumstances, otherwise no data gets crawled at all. This is a bug.

@Qianlitp
Owner

The crawler crash caused by --robots-path has been fixed.

@asdfasadfasfa
Author

I had already found some workarounds for this earlier.
