* add formrequest version of spider and checker #46

Open: 1 commit to merge into base branch master

@canercandan

No description provided.

@holgerd77
Owner

Hi, could you please provide more context on what this pull request is for?

@canercandan
Author

Hi @holgerd77,

This pull request makes use of the form-based web authentication that Scrapy already supports.

Here is an example: http://doc.scrapy.org/en/0.16/topics/request-response.html#request-usage-examples

To use the two new classes, replace DjangoSpider with FormRequestDjangoSpider and DjangoChecker with FormRequestDjangoChecker, then set four parameters: username, password, and the names of their respective form inputs (username_form and password_form).

Here is a usage example:

spiders.py

from dynamic_scraper.spiders.django_spider import FormRequestDjangoSpider
from ave.models import NewsWebsite, Article, ArticleItem

class ArticleSpider(FormRequestDjangoSpider):

    name = 'article_spider'

    def __init__(self, *args, **kwargs):
        self._set_ref_object(NewsWebsite, **kwargs)
        self.scraper = self.ref_object.scraper
        self.scrape_url = self.ref_object.url
        self.scheduler_runtime = self.ref_object.scraper_runtime
        self.scraped_obj_class = Article
        self.scraped_obj_item_class = ArticleItem

        # Login credentials and the names of the matching form inputs
        kwargs['username'] = 'USERNAME'
        kwargs['password'] = 'PASSWORD'
        kwargs['username_form'] = 'username'
        kwargs['password_form'] = 'password'

        super(ArticleSpider, self).__init__(self, *args, **kwargs)

checkers.py

from dynamic_scraper.spiders.django_checker import FormRequestDjangoChecker
from ave.models import Article

class ArticleChecker(FormRequestDjangoChecker):

    name = 'article_checker'

    def __init__(self, *args, **kwargs):
        self._set_ref_object(Article, **kwargs)
        self.scraper = self.ref_object.news_website.scraper
        self.scrape_url = self.ref_object.url
        self.scheduler_runtime = self.ref_object.checker_runtime

        # Login credentials and the names of the matching form inputs
        kwargs['username'] = 'USERNAME'
        kwargs['password'] = 'PASSWORD'
        kwargs['username_form'] = 'username'
        kwargs['password_form'] = 'password'

        super(ArticleChecker, self).__init__(self, *args, **kwargs)
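
For readers unfamiliar with Scrapy's form login, here is a minimal sketch of how the four parameters above might map onto the form data that gets submitted. The pull request's actual implementation is not shown in this thread, so the helper `build_login_formdata` and the callback name `after_login` are hypothetical; only `FormRequest.from_response` with its `formdata` and `callback` arguments is taken from the Scrapy documentation linked above.

```python
def build_login_formdata(username, password,
                         username_form='username', password_form='password'):
    """Map credentials onto the login form's input names.

    Hypothetical helper for illustration; not part of the pull request.
    """
    return {username_form: username, password_form: password}


# The same kwargs as in the spider and checker examples above.
login_kwargs = {
    'username': 'USERNAME',
    'password': 'PASSWORD',
    'username_form': 'username',
    'password_form': 'password',
}
formdata = build_login_formdata(**login_kwargs)

# With Scrapy installed, a spider callback could then submit the login form:
#     return scrapy.FormRequest.from_response(
#         response, formdata=formdata, callback=self.after_login)
# where after_login is a hypothetical callback that continues the crawl.
```

Keeping the input names separate from the credentials is what lets the same class work against sites whose login forms use names such as `email` or `passwd`.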

I guess we could improve this by moving these parameters to the admin panel.

@umrashrf commented Oct 1, 2019

https://github.com/scrapy/loginform

I am using this and it works great.
