Best practices for passing url list #178
Unanswered
benjohnsonn asked this question in Q&A
Hey y'all!
I've integrated Unlighthouse with a Scrapy crawler I already use: instead of relying on Unlighthouse's crawling feature, I feed it a list of URLs from Scrapy, anywhere from 200 to 10,000 of them. My approach follows the advice in the Manually Providing URLs section of the docs.
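For reference, here's roughly how I'm wiring it up (a simplified sketch; the `urls.txt` path and the file-reading step are stand-ins for my actual Scrapy export):

```ts
// unlighthouse.config.ts
import { readFileSync } from 'node:fs'

// Stand-in: in reality this list is exported by my Scrapy crawler.
const urls = readFileSync('./urls.txt', 'utf-8')
  .split('\n')
  .map(url => url.trim())
  .filter(Boolean)

export default {
  site: 'https://example.com', // placeholder site
  // Provide URLs manually instead of letting Unlighthouse crawl.
  urls,
}
```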
My first question: am I still getting the benefit of Unlighthouse's efficiency when I bypass the crawling feature? Since I'm passing exact URLs, I'm not sure route sampling still applies. Would it be better to convert my URL list into relative path patterns with regex rules and then let Unlighthouse's crawler take over?
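To make that alternative concrete, this is roughly what I mean (a rough sketch; the include patterns are made-up examples, and I'm assuming `scanner.include` and `scanner.dynamicSampling` behave the way the docs describe):

```ts
// unlighthouse.config.ts
export default {
  site: 'https://example.com', // placeholder site
  scanner: {
    // Restrict the built-in crawler to path groups derived
    // from my Scrapy URL list (patterns are made up).
    include: ['/products/*', '/blog/*'],
    // Sample at most this many URLs per dynamic route group.
    dynamicSampling: 5,
  },
}
```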
The main goal is to replace an existing Python script I wrote that queries the Lighthouse API directly; it works, but it isn't performant at all.
I know this isn't really the intended use case, so any pointers or insights you can provide would be greatly appreciated!
My second question: is it possible to configure the csvExpanded reporter to include even more information per URL? The crawl results in the UI seem to capture more information than the CSV does.
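If the CSV columns aren't configurable, I'm considering post-processing the expanded JSON report into my own CSV instead. A sketch of what I have in mind, assuming `reporter: 'jsonExpanded'` writes a ci-result.json into the default `.unlighthouse` output directory (the field names below are my guesses at the report shape, not confirmed):

```ts
// to-csv.ts — turn the expanded JSON report into a custom CSV.
import { readFileSync, writeFileSync } from 'node:fs'

// Guessed shape of a per-route entry in ci-result.json.
interface RouteReport {
  path: string
  score?: number
  categories?: Record<string, { score: number }>
}

const report: RouteReport[] = JSON.parse(
  readFileSync('.unlighthouse/ci-result.json', 'utf-8'),
)

const header = 'path,score,performance,accessibility,best-practices,seo'
const rows = report.map(r =>
  [
    r.path,
    r.score ?? '',
    r.categories?.performance?.score ?? '',
    r.categories?.accessibility?.score ?? '',
    r.categories?.['best-practices']?.score ?? '',
    r.categories?.seo?.score ?? '',
  ].join(','),
)

writeFileSync('report.csv', [header, ...rows].join('\n'))
```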