Best practices for passing url list #178
Unanswered
benjohnsonn asked this question in Q&A
Hey y'all!
I've integrated Unlighthouse with a Scrapy crawler I already use: instead of relying on Unlighthouse's crawling feature, I feed it a list of URLs from Scrapy, anywhere from 200 to 10,000 of them. My approach follows the advice in the Manually Providing URLs section of the docs.
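For reference, here's roughly how I'm wiring it up (a simplified sketch; the `urls.txt` path and the file-reading step are stand-ins for my actual Scrapy export):

```ts
// unlighthouse.config.ts
import { readFileSync } from 'node:fs'

// Stand-in: in reality this list is exported by my Scrapy crawler.
const urls = readFileSync('./urls.txt', 'utf-8')
  .split('\n')
  .map(url => url.trim())
  .filter(Boolean)

export default {
  site: 'https://example.com', // placeholder site
  // Provide URLs manually instead of letting Unlighthouse crawl.
  urls,
}
```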
My first question: am I still getting the benefit of Unlighthouse's efficiency when I bypass the crawling feature? Since I'm passing exact URLs, I'm not sure route sampling still applies. Would it be better to convert my URL list into relative path patterns with regex rules and then let Unlighthouse's crawler take over?
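To make that alternative concrete, this is roughly what I mean (a rough sketch; the include patterns are made-up examples, and I'm assuming `scanner.include` and `scanner.dynamicSampling` behave the way the docs describe):

```ts
// unlighthouse.config.ts
export default {
  site: 'https://example.com', // placeholder site
  scanner: {
    // Restrict the built-in crawler to path groups derived
    // from my Scrapy URL list (patterns are made up).
    include: ['/products/*', '/blog/*'],
    // Sample at most this many URLs per dynamic route group.
    dynamicSampling: 5,
  },
}
```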
The main goal is to replace an existing Python script I wrote that queries the Lighthouse API directly; it works, but it isn't performant at all.
I know this isn't really the intended use case, so any pointers or insights you can provide would be greatly appreciated!
My second question: is it possible to configure the csvExpanded reporter to include even more information per URL? The crawl results in the UI seem to capture more information than the CSV does.
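If the CSV columns aren't configurable, I'm considering post-processing the expanded JSON report into my own CSV instead. A sketch of what I have in mind, assuming `reporter: 'jsonExpanded'` writes a ci-result.json into the default `.unlighthouse` output directory (the field names below are my guesses at the report shape, not confirmed):

```ts
// to-csv.ts — turn the expanded JSON report into a custom CSV.
import { readFileSync, writeFileSync } from 'node:fs'

// Guessed shape of a per-route entry in ci-result.json.
interface RouteReport {
  path: string
  score?: number
  categories?: Record<string, { score: number }>
}

const report: RouteReport[] = JSON.parse(
  readFileSync('.unlighthouse/ci-result.json', 'utf-8'),
)

const header = 'path,score,performance,accessibility,best-practices,seo'
const rows = report.map(r =>
  [
    r.path,
    r.score ?? '',
    r.categories?.performance?.score ?? '',
    r.categories?.accessibility?.score ?? '',
    r.categories?.['best-practices']?.score ?? '',
    r.categories?.seo?.score ?? '',
  ].join(','),
)

writeFileSync('report.csv', [header, ...rows].join('\n'))
```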