Hepsiburada review/comment and rating scraper. Turkish text dataset creator for data science and NLP projects. Nearly 30M reviews with category and product links can be crawled and used for text classification, sentiment analysis, text mining, NLP models etc. Supported by multithreading, written in Python.
$ pip3 install -r requirements.txt
$ git clone https://github.com/0x01h/hepsiburada-review-scraper.git
$ cd hepsiburada-review-scraper
$ python3 hepsiburada.py
Program provides an human-friendly interactive shell for users.
- Shutdown computer after finishing: Optional choice for deep and long scrapings.
- Threads: Try to give a proper number. (Recommended value is 64.)
- Timeout: Giving a large number could result in long waiting times, small numbers could lead connection failures. (Recommended time range is 15-30 seconds.)
- Pagination Depth: Maximum number of paginated review pages for each product.
You can track your progress via progress bars. categories.txt
, products.txt
, hepsiburada.txt
will be saving to your current directory.
Do not scrape aggressively! Otherwise, you will be caught by captcha challenge!
For educational purposes only.