Exclude canonicalized pages #78

Open
Spidle opened this issue Oct 22, 2021 · 0 comments

Spidle commented Oct 22, 2021

Sometimes we have URLs that are canonicalized to other pages, and these should not be included in the sitemap. See Google's reference: https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap

So the logic would be to look for a canonical tag (`<link rel="canonical">`) on each crawled page and check whether its URL matches the crawled URL. If it does not, leave that page out of the sitemap.
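
A minimal sketch of that check, using only the standard library. The names `CanonicalParser`, `normalize`, and `is_canonical` are hypothetical and not part of this project's code. Treating a page with no canonical tag as canonical, and ignoring fragments and trailing slashes when comparing, are assumptions that may need adjusting.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class CanonicalParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link is kept; later ones are ignored.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing.
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonical = attr["href"]


def normalize(url):
    # Assumption: fragments and trailing slashes do not distinguish pages.
    return urldefrag(url).url.rstrip("/")


def is_canonical(page_url, html):
    """Return True if the page has no canonical tag or the tag points at page_url."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        # Assumption: a page without a canonical tag is its own canonical.
        return True
    # The href may be relative, so resolve it against the crawled URL.
    target = urljoin(page_url, parser.canonical)
    return normalize(target) == normalize(page_url)
```

The crawler could call `is_canonical(url, body)` on each fetched page and skip writing the URL to the sitemap when it returns `False`.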

I'm working on updating your code myself to include this but I'm still new to Python.
