Hepsiburada Review Scraper

Scraper for Hepsiburada product reviews (comments) and ratings, intended as a Turkish text dataset creator for data science and NLP projects. Nearly 30M reviews, together with their category and product links, can be crawled and used for text classification, sentiment analysis, text mining, NLP models, and similar tasks. Written in Python, with multithreading support.

Prerequisites

After cloning the repository (see Installation), install the dependencies from its root directory:

$ pip3 install -r requirements.txt

Installation

$ git clone https://github.com/0x01h/hepsiburada-review-scraper.git
$ cd hepsiburada-review-scraper
$ python3 hepsiburada.py

Usage

The program provides a human-friendly interactive shell.

Features

Shutdown computer after finishing: Optional; useful for deep, long-running scrapes.
Threads: Number of worker threads. Choose a value suited to your machine and connection. (Recommended value is 64.)
Timeout: A large value can cause long waits; a small value can lead to connection failures. (Recommended range is 15-30 seconds.)
Pagination Depth: Maximum number of paginated review pages to fetch for each product.
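
To illustrate how the thread count, timeout, and pagination depth might interact, here is a minimal sketch. It is not taken from hepsiburada.py: the names `collect_product_reviews`, `scrape_all`, and the `fetch_page` callback are hypothetical, and the actual HTTP request and HTML parsing are deliberately left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

# Hypothetical callback: fetch_page(product_url, page_number, timeout_seconds)
# returns the reviews found on that page, or an empty list when there are none.
FetchPage = Callable[[str, int, float], List[str]]


def collect_product_reviews(
    url: str, fetch_page: FetchPage, timeout: float, max_pages: int
) -> List[str]:
    """Walk review pages 1..max_pages for one product, stopping at the first empty page."""
    reviews: List[str] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(url, page, timeout)
        if not batch:
            break  # no more reviews for this product
        reviews.extend(batch)
    return reviews


def scrape_all(
    product_urls: Iterable[str],
    fetch_page: FetchPage,
    threads: int = 64,      # "Threads" option
    timeout: float = 20.0,  # "Timeout" option, in seconds
    max_pages: int = 5,     # "Pagination Depth" option (assumed default)
) -> Dict[str, List[str]]:
    """Scrape several products concurrently, one product per worker task."""
    urls = list(product_urls)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(
            lambda u: collect_product_reviews(u, fetch_page, timeout, max_pages),
            urls,
        )
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(urls, results))
```

In real use, `fetch_page` would wrap an HTTP call that honours the timeout, for example by passing `timeout=` to `requests.get`. Keeping it as a parameter lets the paging and threading logic be tested without network access.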

You can track progress via progress bars. The files categories.txt, products.txt, and hepsiburada.txt will be saved to your current directory.

Do not scrape aggressively! Otherwise, you will run into a CAPTCHA challenge.
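
One way to keep the request rate modest when many threads are running is a shared throttle. The sketch below is an assumption for illustration, not part of hepsiburada.py; the `Throttle` class and its `min_interval` parameter are hypothetical names.

```python
import threading
import time


class Throttle:
    """Space out requests so that, across all threads, at most one
    request starts per `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        """Block until this caller's reserved time slot; return the delay applied."""
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            # Reserve the next slot before releasing the lock.
            self._next_allowed = max(now, self._next_allowed) + self.min_interval
        if delay > 0:
            self._sleep(delay)  # sleep outside the lock so other threads can queue
        return delay
```

Each worker would call `throttle.wait()` immediately before issuing a request. A suitable interval is site-dependent and is not specified by this project.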

For educational purposes only.

