Improve (& document) scraper testing workflow #37
It'd be nice to be able to pass a string (response or byte-string or whatever you think is best) into our Page class via I'm happy to expand this more if it'd be helpful.
This looks handy: Lines 18 to 37 in 2bf8f37
I'll try again to test a few of my scrapers this week. I think the main pain point is having a way to test selectors for a given page type and quickly see what broke.
I want to think through this a bit & welcome feedback from anyone that'd like better ways to test their scrapers written using spatula.
The problem this is attempting to solve: when writing scrapers, you might want to test against a cached copy of a page, and you'd also want an easy way to update that cached copy. This feels like it falls well within spatula's domain, and spatula could offer a solution that works for common cases.
I've considered a few approaches & am currently leaning towards the following:
Idea: Provide helper to turn page into a TestablePage
Sources are responsible for fetching themselves in Source.get_response. By replacing a page's sources with special caching versions, an existing Page can be tested against a cached response.
def test_example_page():
    # this would replace all of a page's sources with a new TestCacheURL;
    # other parameters would stay the same
    page = make_testable_page(ExamplePage(...))
    assert page.process_page() == [1, 2, 3]
TestCacheURL would do the following:
This would be pretty simple for 80% of cases. It might get complicated for pages that yield back other pages, etc., since presumably you'd want to have their sources replaced too.
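To make the idea concrete, here's a rough sketch of how the source swap might work. None of this is spatula's actual API: the `URL`, `FakeResponse`, and `ExamplePage` classes below are simplified stand-ins, the `fetch` callable and `cache_dir` argument are hypothetical, and the read-from-disk-else-fetch-and-save behavior is only my assumption about what TestCacheURL would do.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class FakeResponse:
    """Stand-in for an HTTP response; only carries the body."""
    content: bytes


@dataclass
class URL:
    """Stand-in for a source that fetches itself (not spatula's real URL class)."""
    url: str
    fetch: Callable[[str], bytes]  # injected so the sketch needs no network

    def get_response(self) -> FakeResponse:
        return FakeResponse(self.fetch(self.url))


@dataclass
class TestCacheURL:
    """Hypothetical caching source: serve from disk if cached, else fetch and save."""
    wrapped: URL
    cache_dir: Path

    def _path(self) -> Path:
        # one cache file per URL, named by a hash of the URL
        digest = hashlib.sha256(self.wrapped.url.encode()).hexdigest()
        return self.cache_dir / digest

    def get_response(self) -> FakeResponse:
        path = self._path()
        if path.exists():
            return FakeResponse(path.read_bytes())
        response = self.wrapped.get_response()
        path.write_bytes(response.content)
        return response


class ExamplePage:
    """Toy page that parses a comma-separated body into integers."""

    def __init__(self, source):
        self.source = source

    def process_page(self):
        body = self.source.get_response().content.decode()
        return [int(x) for x in body.split(",")]


def make_testable_page(page, cache_dir):
    """Swap the page's source for a caching version; leave everything else alone."""
    page.source = TestCacheURL(page.source, Path(cache_dir))
    return page
```

With something like this, updating the cached copy could be as simple as deleting the cache file and re-running the test. It only handles a single `source` attribute, so pages that yield other pages would still need their sources replaced some other way.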
I'd also considered just having a global flag that alters how URL sources work (SPATULA_TEST_MODE) but not sure I like that approach yet.
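For comparison, the global-flag variant might look roughly like the sketch below. Only the SPATULA_TEST_MODE name comes from the idea above; the `in_test_mode` and `choose_source` helpers, and the rule that any value other than empty or "0" enables the flag, are assumptions for illustration.

```python
import os


def in_test_mode(environ=os.environ) -> bool:
    """Treat any value other than empty or "0" as enabling test mode (assumed rule)."""
    return environ.get("SPATULA_TEST_MODE", "") not in ("", "0")


def choose_source(live_source, cached_source, environ=os.environ):
    """Pick the cached source when the flag is set, the live one otherwise."""
    return cached_source if in_test_mode(environ) else live_source
```

The downside this illustrates is that behavior now depends on ambient process state rather than on anything visible in the test itself.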