Is there a documented way to scrape Single Page Applications? #131

bilogic · 2023-04-08T07:47:11Z

SPAs usually hand out a CSRF token that has to be sent with subsequent requests. Is there a Roach way of scraping such sites?

ksassnowski · 2023-04-11T09:04:46Z

Maintainer

If the CSRF token is part of the page's source, you can extract it like any other piece of information. You would then have to figure out how exactly the site expects the CSRF token to be sent with each subsequent request, for example as a header.

You can then set the header from within your spider before dispatching new requests: https://roach-php.dev/docs/processing-responses#returning-custom-requests

So, assuming the CSRF token exists in the page source like this:

<meta name="csrfToken" content="...">

Your parse method could look something like this:

public function parse(Response $response): \Generator
{
    // do your scraping here...

    $csrfToken = $response->filter('meta[name="csrfToken"]')->attr('content');

    $request = new Request(
        'POST', 
        'https://next-url-to-crawl.com',
        $this->parse(...),
        // Assuming the csrf token should get passed in the X-CSRF-Token header
        ['headers' => ['X-CSRF-Token' => $csrfToken]],
    );

    yield ParseResult::fromValue($request);
}
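One thing the snippet above does not cover is a page where the meta tag is missing. As a minimal sketch of the extraction step on its own, assuming plain PHP with the DOM extension instead of Roach's Response object, the two helpers below return null (and an empty options array) in that case. The names extractCsrfToken() and csrfRequestOptions() are hypothetical and not part of Roach, and the X-CSRF-Token header name is the same assumption as above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Roach): returns the token from
 * <meta name="csrfToken" content="...">, or null if it is absent or empty.
 */
function extractCsrfToken(string $html): ?string
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }

    $document = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $nodes = $xpath->query('//meta[@name="csrfToken"]/@content');

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $token = trim((string) $nodes->item(0)->nodeValue);

    return $token === '' ? null : $token;
}

/**
 * Builds the options array for the follow-up request.
 * Assumption: the site expects the token in the X-CSRF-Token header.
 */
function csrfRequestOptions(?string $token): array
{
    return $token === null ? [] : ['headers' => ['X-CSRF-Token' => $token]];
}
```

The array returned by csrfRequestOptions() has the same shape as the fourth argument passed to new Request(...) above, so a spider could skip the header entirely when no token was found rather than fail on the missing tag.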

ksassnowski · 2023-06-03T07:10:17Z

Maintainer

Converting this to a discussion, as it's more of a question than an issue.

Charlotte-br560 · 2024-03-25T07:29:36Z


For scraping SPAs with CSRF tokens, consider reverse engineering API endpoints or using headless browsers like Puppeteer/Selenium. Remember to scrape responsibly. Also, check out Crawlbase for efficient scraping solutions.
