Django application to scrape images from website using beautifulsoup

rachhek/imagescraper

Image Scraper

A simple image scraper built with Django that uses beautifulsoup to parse pages. It includes a basic frontend for viewing the scraped images and downloading them.

Dependencies

See the Pipfile for details.

  • beautifulsoup4
  • lxml
  • requests
  • cssutils
  • pillow
  • Python 3.7

Setup

Clone the repository

$ git clone https://github.com/rachhek/imagescraper.git
$ cd imagescraper

Create a virtual environment and install the dependencies

$ pipenv shell
$ (imagescraper) pipenv install

Once pipenv has finished installing, run the Django migrations.

$ (imagescraper) python manage.py migrate

Run the server

$ (imagescraper) python manage.py runserver

Open the application at http://127.0.0.1:8000/

Walkthrough

Homepage
[Screenshot 1]

Example of scraping the homepage of http://unity.com
[Screenshot 2]

The URLs and images can be downloaded
[Screenshot 4]

The downloaded images and the text file of URLs are stored in

<path_to_project>/imagescraper/media/

[Screenshot 3]

Code

Scraper Tool

scraper_app/lib.py

Gallery

scraper_app/templates/scraper_app/scraper/index.html

Limitations

  • Cannot download images that are embedded as base64
  • Only scrapes "img" tags and "background-url" styles
  • Does not automatically scroll pages
  • Might not properly scrape images from highly dynamic websites
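
To make the first two limitations concrete, here is a minimal sketch of collecting image URLs from "img" tags and from url(...) values in inline styles, while skipping base64 images. It is not the code in scraper_app/lib.py. The project uses beautifulsoup4, whereas this sketch swaps in Python's standard-library html.parser so that it runs without extra dependencies. The names ImageURLCollector and collect_image_urls are hypothetical, and treating base64 images as "data:" URIs is an assumption.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches url(...) inside a CSS declaration, with optional quotes.
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)")


class ImageURLCollector(HTMLParser):
    """Hypothetical helper: gathers image URLs from <img src> and inline styles."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        candidates = []
        if tag == "img" and attrs.get("src"):
            candidates.append(attrs["src"])
        if attrs.get("style"):
            candidates.extend(CSS_URL.findall(attrs["style"]))
        for candidate in candidates:
            candidate = candidate.strip()
            # Assumption: base64 images appear as data: URIs, which have
            # no remote file to download, so they are skipped.
            if candidate.startswith("data:"):
                continue
            # Resolve relative paths against the page URL.
            absolute = urljoin(self.base_url, candidate)
            if absolute not in self.urls:
                self.urls.append(absolute)


def collect_image_urls(html, base_url):
    """Return unique absolute image URLs in document order."""
    parser = ImageURLCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Because the sketch only reads the HTML it is given, images injected later by JavaScript or loaded on scroll never reach it, which is consistent with the last two limitations.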

Logs

The logs are stored in /imgscraper/debug.log
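
For readers who want to reproduce a similar log file, the following is a minimal sketch of a logging configuration that writes to a file such as debug.log. Django's LOGGING setting uses the standard logging.config.dictConfig format, so the dictionary below could be assigned to LOGGING in a settings module. The helper name build_logging_config, the "scraper_app" logger name, and the message format are assumptions, not taken from the project's settings.

```python
import logging
import logging.config


def build_logging_config(log_path):
    """Hypothetical helper: dictConfig-style settings that log DEBUG and above to log_path."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "simple": {
                "format": "%(asctime)s %(levelname)s %(name)s: %(message)s",
            },
        },
        "handlers": {
            "file": {
                "level": "DEBUG",
                "class": "logging.FileHandler",
                "filename": str(log_path),
                "formatter": "simple",
            },
        },
        "loggers": {
            # Assumed logger name; loggers created with
            # logging.getLogger(__name__) inside scraper_app modules
            # would propagate to this one.
            "scraper_app": {
                "handlers": ["file"],
                "level": "DEBUG",
                "propagate": False,
            },
        },
    }
```

Outside Django, the same dictionary can be applied directly with logging.config.dictConfig(build_logging_config("debug.log")).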
