
Limit search to path instead of domain? #50

Open
1kastner opened this issue Nov 22, 2018 · 5 comments

Comments

@1kastner

Would it be possible to restrict the search to a certain path?
A bad example would be to restrict a search to http://google.com/maps/ and ignore results that are in other "subdirectories" of http://google.com/.
Using the "domain" option for this purpose does not work.
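To illustrate the difference between a domain check and a path check, here is a minimal Python sketch. The helper `is_within_path` is hypothetical and not part of python-sitemap. It assumes a URL counts as "inside" the base URL when it has the same host and its path starts with the base path.

```python
from urllib.parse import urlparse


def is_within_path(url: str, base: str) -> bool:
    """Hypothetical helper: True if `url` has the same host as `base`
    and its path lies under the path of `base` (e.g. http://google.com/maps/)."""
    u, b = urlparse(url), urlparse(base)
    # A domain-only check would stop here; the host must still match.
    if u.netloc.lower() != b.netloc.lower():
        return False
    # Ensure the base path ends with "/" so "/maps" does not also match "/mapsfoo".
    base_path = b.path if b.path.endswith("/") else b.path + "/"
    path = u.path or "/"
    # Accept the base path itself (with or without trailing slash) or anything below it.
    return path == base_path.rstrip("/") or path.startswith(base_path)
```

With base `http://google.com/maps/`, a URL such as `http://google.com/maps/place/x` passes, while `http://google.com/search` is rejected even though it is on the same domain.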

@c4software
Owner

Hi,

Sorry for the delay. You can do it via

--exclude "maps/"

However, the list of exclusions has to be exhaustive.

Do you want something generic for all subfolders?

@1kastner
Author

Well, what I actually need is include logic, which is not yet implemented in https://github.com/c4software/python-sitemap/blob/master/main.py

@davidcx89

I agree that it would be cool to have an "include" function in the crawler.
1kastner, I think your phrase "A bad example" may have been read the opposite way by c4software.

@1kastner
Author

@davidcx89 Yep, sorry for the bad phrasing; I probably should have put more effort into describing the issue.

If I find the time, there might be a pull request soon.

@c4software
Owner

Hi,

An include pattern is indeed a great idea. Something with regex would be really great.

I will try to do this quickly, maybe this weekend.
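A minimal sketch of what regex-based include logic could look like when combined with exclusions. The function `should_crawl` and its `include`/`exclude` parameters are hypothetical; this does not claim to reproduce how the project's existing `--exclude` option matches URLs.

```python
import re
from typing import Iterable


def should_crawl(url: str, include: Iterable[str] = (), exclude: Iterable[str] = ()) -> bool:
    """Hypothetical filter: keep `url` only if it matches at least one include
    regex (when any are given) and matches none of the exclude regexes."""
    include = list(include)
    # With include patterns present, a URL must match one of them.
    if include and not any(re.search(p, url) for p in include):
        return False
    # Exclusions still apply on top of the include rule.
    return not any(re.search(p, url) for p in exclude)
```

For example, `should_crawl(url, include=[r"^https?://google\.com/maps/"])` keeps everything under `/maps/` without having to enumerate every other subfolder of the site as an exclusion.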


3 participants