Replies: 3 comments
-
If the CSRF token is part of the page's source, you can extract it like any other piece of information. You then have to work out how the site expects the token to be sent with each subsequent request, for example as a header. You can set that header from within your spider before dispatching new requests: https://roach-php.dev/docs/processing-responses#returning-custom-requests

So, assuming the CSRF token exists in the page source like this:

    <meta name="csrfToken" content="...">

your spider's parse method could look something like this:

    use RoachPHP\Http\Request;
    use RoachPHP\Http\Response;
    use RoachPHP\Spider\ParseResult;

    public function parse(Response $response): \Generator
    {
        // do your scraping here...

        // Read the token from the <meta name="csrfToken"> tag.
        $csrfToken = $response->filter('meta[name="csrfToken"]')->attr('content');

        $request = new Request(
            'POST',
            'https://next-url-to-crawl.com',
            // Callback that parses the response to this request
            $this->parse(...),
            // Assuming the CSRF token should get passed in the X-CSRF-Token header
            ['headers' => ['X-CSRF-Token' => $csrfToken]],
        );

        yield ParseResult::fromValue($request);
    }
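If you want to check the selector before wiring it into a spider, the extraction step can be tried on its own. The following is only a minimal sketch using PHP's built-in DOM extension rather than Roach's `Response::filter()`. The function name `extractCsrfToken` and the sample markup are hypothetical, and it assumes the token lives in a `<meta>` tag as described above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns the content of <meta name="..."> or null
 * if the tag is missing or empty. Assumes $metaName contains no quotes.
 */
function extractCsrfToken(string $html, string $metaName = 'csrfToken'): ?string
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $document = new DOMDocument();

    // Real-world markup is rarely valid, so collect libxml warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $nodes = $xpath->query(sprintf('//meta[@name="%s"]/@content', $metaName));

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $token = trim($nodes->item(0)->nodeValue ?? '');

    return $token === '' ? null : $token;
}

// Sample page source (made up for illustration).
$html = '<html><head><meta name="csrfToken" content="abc123"></head><body></body></html>';
$token = extractCsrfToken($html);
```

Returning `null` instead of an empty string makes it easy for the caller to skip the follow-up request when a page carries no token.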
-
Converting this to a discussion as it's more a question and less of an issue.
-
For scraping SPAs that use CSRF tokens, consider reverse engineering the API endpoints the page calls, or using a headless browser driven by a tool such as Puppeteer or Selenium. Remember to scrape responsibly. Also, check out Crawlbase for efficient scraping solutions.
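As a rough illustration of the "reverse engineered API endpoint" route: once you know which endpoint the SPA calls, the token usually just travels along as a header on a JSON request. The sketch below only builds the options array. It assumes the fourth argument of Roach's `Request` is forwarded to Guzzle as request options (`headers` and `json` are standard Guzzle options), and the function name, header name, and payload are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds Guzzle-style request options for a JSON API
 * call that carries a CSRF token in a header.
 *
 * @param array<string, mixed> $payload
 * @return array{headers: array<string, string>, json: array<string, mixed>}
 */
function buildApiRequestOptions(
    string $csrfToken,
    array $payload,
    string $headerName = 'X-CSRF-Token',
): array {
    return [
        'headers' => [
            // The header name is an assumption; check what the site really sends.
            $headerName => $csrfToken,
            'Accept' => 'application/json',
        ],
        // Guzzle serializes this array as the JSON request body.
        'json' => $payload,
    ];
}

// Example: options for requesting page 2 of some paginated endpoint.
$options = buildApiRequestOptions('abc123', ['page' => 2]);
```

The resulting array could then be passed as the options argument when constructing the request inside the spider, in place of the hand-written `['headers' => ...]` array.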
-
SPAs usually pass on a CSRF token for use in subsequent requests. Is there a Roach way of scraping such sites?