Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(main-resource): find last matching document request #15323

Closed
wants to merge 3 commits into from

Conversation

adamraine
Copy link
Member

Noticed this on https://umich.edu

If a page does a JS refresh then we will use the wrong network request as the main resource because we match on url. We should be looking for the last request.

@adamraine adamraine requested a review from a team as a code owner August 1, 2023 22:51
@adamraine adamraine requested review from brendankenny and removed request for a team August 1, 2023 22:51
@adamraine adamraine changed the title core(main-resource): find last matching request core(main-resource): find last matching document request Aug 1, 2023
@brendankenny
Copy link
Member

Our most bullet-proof method for this is probably:

/**
* This method generates the document request chain including client-side and server-side redirects.
*
* Example:
* GET /requestedUrl => 302 /firstRedirect
* GET /firstRedirect => 200 /firstRedirect, window.location = '/secondRedirect'
* GET /secondRedirect => 302 /thirdRedirect
* GET /thirdRedirect => 302 /mainDocumentUrl
* GET /mainDocumentUrl => 200 /mainDocumentUrl
*
* Returns network records [/requestedUrl, /firstRedirect, /secondRedirect, /thirdRedirect, /mainDocumentUrl]
*
* @param {Array<LH.Artifacts.NetworkRequest>} networkRecords
* @param {LH.Artifacts.ProcessedTrace} processedTrace
* @return {Array<LH.Artifacts.NetworkRequest>}
*/
static getDocumentRequestChain(networkRecords, processedTrace) {
/** @type {Array<LH.Artifacts.NetworkRequest>} */
const documentRequests = [];
// Find all the document requests by examining navigation events and their redirects
for (const event of processedTrace.processEvents) {
if (event.name !== 'navigationStart') continue;
const data = event.args.data || {};
if (!data.documentLoaderURL || !data.isLoadingMainFrame) continue;
let networkRecord = networkRecords.find(record => record.requestId === data.navigationId);
while (networkRecord) {
documentRequests.push(networkRecord);
// HTTP redirects won't have separate navStarts, so find through the redirect chain.
networkRecord = networkRecord.redirectDestination;
}
}
if (!documentRequests.length) {
throw new Error('No navigation requests found');
}
return documentRequests;
}

checks frame (don't want to match iframes from pages embedding themselves), matches requests by requestId not URL, follows JS and HTTP redirects, etc.

It requires the processed trace, but that's already in page-dependency-graph, and it's not great but not the worst to compute it at the end of the navigation.

@adamraine
Copy link
Member Author

It requires the processed trace, but that's already in page-dependency-graph, and it's not great but not the worst to compute it at the end of the navigation.

The error I saw was in main-document-content which does not have the processed trace or page dependency graph.

@connorjclark
Copy link
Collaborator

fyi #15001

@brendankenny
Copy link
Member

fyi #15001

Yeah, I brought up the redirects method because I was thinking back then we should switch more infrastructure over to using request IDs.

The error I saw was in main-document-content which does not have the processed trace

Any reason it couldn't? Anything expecting to find a main resource should have observed it in the trace, too.

@adamraine
Copy link
Member Author

adamraine commented Aug 2, 2023

Any reason it couldn't? Anything expecting to find a main resource should have observed it in the trace, too.

Sorry I should have said the reason for the error in main-document-content comes from the MainResources computed artifact getting the wrong resource. Presumably we would want to correct this behavior everywhere. It would be a lot of work to add the processed trace to that computed artifact and then update all it's call sites.

@connorjclark
Copy link
Collaborator

totally forgot I did this: #14520

@connorjclark
Copy link
Collaborator

we going with above pr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants