
Dupe spotter user-defined list of expressions / separation of default dupe spotter expressions #197

Open
acrois opened this issue Aug 8, 2021 · 1 comment

acrois commented Aug 8, 2021

process_body(body, url) in dupespotter.py

As an end-user, I would like to be able to modify the dupe spotter expression list and update it at runtime, like other configuration options.
I would also like to be able to split the defaults into more intentional sets of defaults for specific website types.

Right now, the list is hard-coded in dupespotter.py. It may require more thought as to how to expose the list of expressions and keep it up to date (for example, writing to a file to update it).

What happens when the list is updated but an invalid expression is found? I think skipping the line and printing out an error should be sufficient.
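
A minimal sketch of that skip-and-report behavior, assuming a hypothetical format of one regular expression per line. `load_expressions` is an illustrative name, not an existing function in dupespotter.py, and the real expressions may need to be compiled differently (for example as bytes patterns):

```python
import re
import sys


def load_expressions(lines):
    """Compile one regex per non-empty line, skipping invalid ones.

    Hypothetical helper: `lines` is any iterable of strings, such as an
    open text file containing one expression per line.
    """
    compiled = []
    for lineno, raw in enumerate(lines, start=1):
        pattern = raw.strip()
        if not pattern:
            # Blank lines are ignored rather than treated as errors.
            continue
        try:
            compiled.append(re.compile(pattern))
        except re.error as exc:
            # Report the bad line on stderr and keep loading the rest.
            print(
                f"skipping invalid expression on line {lineno}: "
                f"{pattern!r} ({exc})",
                file=sys.stderr,
            )
    return compiled
```

Because the function accepts any iterable of lines, it could be called again on a freshly opened file whenever the list changes, so one bad line would not discard the rest of the list.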

Contributor

ivan commented Aug 28, 2021

Yeah, it would be nice to be able to customize dupespotter. But because most users won't, it probably makes sense to fix it in grab-site itself, for more of the websites that anyone would care to crawl.

Also note that it was written a while ago and the site-specific parts of it are probably out of date.
