- General Information
- Installation
- Usage
- Extra Configuration
- Features
- Project Status
- Acknowledgements
- Contact
- License
This is an automated data collection package (web scraper) tailored to scrape data from the Book Depository website based on a category keyword of your choice. See the Features section for details.
Use the package manager pip to install book_scraper.
- Install directly from the GitHub repository:
pip install git+https://github.com/fortune-uwha/book_scraper
BooksScraper takes the number of records to scrape, the keyword to search, and a boolean flag that also exports the results to a .csv file when set to True. It returns a pandas DataFrame with the scraped records.
- To export raw data without cleaning:
from scraper.bookscraper import BooksScraper
# 3000 records for the "economics" keyword; True also exports the results to a .csv file
scraper = BooksScraper(3000, "economics", True)
dataframe = scraper.collect_information()  # returns a pandas DataFrame
- To export clean data:
from scraper.bookscraper import CleanBookScraper
# same arguments: number of records, search keyword, .csv export flag
scraper = CleanBookScraper(3000, "economics", True)
dataframe = scraper.clean_dataframe()  # returns the cleaned pandas DataFrame
For more information, run help(BooksScraper) or help(CleanBookScraper).
To use the Database class, you will need to create a PostgreSQL database on Heroku or any other platform and enter the authentication credentials into the config.py file.
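As a rough illustration, config.py might hold credentials along these lines; the exact variable names are assumptions, so match them to what database/database.py actually reads:

# config.py - hypothetical field names; adjust to what Database() expects
HOST = "your-host.example.com"
DATABASE = "your_database_name"
USER = "your_username"
PASSWORD = "your_password"
PORT = 5432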
- Initialization
from database.database import Database
db = Database()  # uses the credentials from config.py
Example functions
- These functions are executed by running main.py; feel free to edit the variables to suit your requirements (see the sketch after this list).
- delete_tables() - Drops the categories and books tables. Handle with care - this will destroy your data!
- create_tables() - Creates the categories and books tables and sets up foreign keys.
- insert_data_into_db(dataframe, category) - Inserts the data from a DataFrame into the database.
- export_to_csv() - Fetches the data from the database and exports it as a .csv file.
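A minimal sketch of such a flow, assuming clean_dataframe() returns the cleaned DataFrame and reusing the function names listed above; the variable values are illustrative:

from scraper.bookscraper import CleanBookScraper
from database.database import Database

# Illustrative values; edit to suit your requirements
category = "economics"
scraper = CleanBookScraper(3000, category, True)
dataframe = scraper.clean_dataframe()

db = Database()
db.create_tables()  # creates the categories and books tables with foreign keys
db.insert_data_into_db(dataframe, category)
db.export_to_csv()  # writes the database contents back out as a .csv file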
Based on the specified category, BooksScraper collects information on:
- Book title
- Book author
- Book price
- Book edition
- Book publish date
- Book category
- Book item url
- Book image url
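To sanity-check a run, you can preview the resulting DataFrame; the column labels in the comment below are assumptions based on the field list above:

from scraper.bookscraper import BooksScraper
# Small test run; False skips the .csv export (per the flag described in Usage)
scraper = BooksScraper(10, "economics", False)
dataframe = scraper.collect_information()
print(dataframe.columns)  # e.g. title, author, price, edition, publish date, category, urls
print(dataframe.head())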
Project is: in progress
This project was based on Turing College coursework on SQL and data scraping.
Created by @fortune_uwha - feel free to contact me!
This project is open source and available under the terms of the MIT license.