Welcome to the News Scraper tutorial repository! This repository serves as a comprehensive guide for web scraping using BeautifulSoup and Selenium. In this tutorial, you will learn how to scrape news from websites like detik[dot]com.
Additionally, we'll focus on scraping news related to Indonesia's presidential candidates for 2024, using keywords such as "anies baswedan", "prabowo subianto", and "ganjar pranowo".
- img/
- img_save/
- notebook/
  - static_web.ipynb: Jupyter notebook for scraping news from traditional static websites.
  - dynamic_web.ipynb: Jupyter notebook for scraping news from dynamic web applications.
- deployment_script/: Contains scripts and files for deployment using Flask.
Static Web Scraping Tutorial: Explore the notebook/static_web.ipynb notebook to learn how to scrape news from traditional static websites.
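As a rough illustration of what static scraping with BeautifulSoup looks like, the sketch below parses an inline HTML string and keeps only headlines that mention one of the candidate keywords. The markup, the `example.com` URLs, and the `extract_headlines` helper are hypothetical stand-ins, not taken from the notebook; real pages will need different tags and selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded news listing page.
SAMPLE_HTML = """
<html><body>
  <article><h2><a href="https://example.com/news/1">Anies Baswedan visits Bandung</a></h2></article>
  <article><h2><a href="https://example.com/news/2">Weather update for Jakarta</a></h2></article>
  <article><h2><a href="https://example.com/news/3">Ganjar Pranowo meets farmers</a></h2></article>
</body></html>
"""

# Keywords named in this README.
KEYWORDS = ["anies baswedan", "prabowo subianto", "ganjar pranowo"]


def extract_headlines(html, keywords):
    """Return title, URL, and matched keyword for each matching headline."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for article in soup.find_all("article"):
        link = article.find("a")
        if link is None or not link.get("href"):
            continue  # skip articles without a usable link
        title = link.get_text(strip=True)
        # Case-insensitive substring match against each keyword.
        matched = [kw for kw in keywords if kw in title.lower()]
        if matched:
            results.append(
                {"title": title, "url": link["href"], "keyword": matched[0]}
            )
    return results


headlines = extract_headlines(SAMPLE_HTML, KEYWORDS)
for item in headlines:
    print(item["keyword"], "|", item["title"], "|", item["url"])
```

In a real run the HTML would come from an HTTP response rather than a string constant; the parsing step stays the same.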
Dynamic Web Scraping Tutorial: The notebook/dynamic_web.ipynb notebook guides you through scraping news from dynamic web applications.
Sentiment Analysis Tutorial: Learn lexicon-based sentiment analysis using TextBlob. Understand the sentiment behind news articles related to the selected keywords. Build a machine learning model from scratch for sentiment analysis.
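To show the idea behind lexicon-based sentiment analysis, here is a toy scorer written in plain Python. It is not TextBlob and not the notebook's code: the `LEXICON` entries, the `polarity` and `label` helpers, and the threshold are all made up for illustration. TextBlob's default analyzer relies on a much larger bundled lexicon and also accounts for things like negation and intensifiers, which this sketch ignores.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1.0, 1.0].
LEXICON = {
    "good": 0.7,
    "great": 0.8,
    "strong": 0.5,
    "bad": -0.7,
    "weak": -0.5,
    "terrible": -1.0,
}


def polarity(text):
    """Average the polarity of every lexicon word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    # Text with no known words is treated as neutral (0.0).
    return sum(scores) / len(scores) if scores else 0.0


def label(score, threshold=0.1):
    """Map a polarity score to a coarse sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for headline in [
    "A great and strong campaign",
    "A terrible, weak debate",
    "The weather is cloudy",
]:
    print(label(polarity(headline)), "|", headline)
```

With TextBlob itself, the comparable score is exposed as `TextBlob(text).sentiment.polarity`, which also ranges from -1.0 to 1.0.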
Clone the repository to your local machine
git clone https://github.com/Ubeydkhoiri/news-scraper.git
Navigate to the repository
cd news-scraper
Install dependencies
pip install -r requirements.txt
Run the Flask app
python deployment_script/app.py
Once the Flask app is running, copy the address it prints into your web browser. Edit the route to include a keyword such as 'anies baswedan' to export all news tagged 'anies baswedan'. The app can also run the scraper and update the data.
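Because the keywords contain spaces, it can help to build the address programmatically so the keyword is percent-encoded. The sketch below assumes Flask's default development address (`http://127.0.0.1:5000`) and a hypothetical `/export/<keyword>` route; the real route names are defined in deployment_script/app.py and may differ.

```python
from urllib.parse import quote

# Assumption: Flask's development server default host and port.
BASE_URL = "http://127.0.0.1:5000"


def export_url(keyword, base_url=BASE_URL, route="/export"):
    """Build a URL for a keyword; "/export" is a hypothetical route name."""
    # quote() percent-encodes the space so the URL is valid as typed.
    return f"{base_url}{route}/{quote(keyword)}"


print(export_url("anies baswedan"))
# → http://127.0.0.1:5000/export/anies%20baswedan
```

Most browsers encode the space automatically when the address is typed by hand, so this matters mainly when calling the app from a script.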
Explore the tutorials in the notebook directory and deploy the Flask application using the scripts in deployment_script.