SadedeGel Scraper

This web scraper was developed to meet the data requirements of the SadedeGel library. It scrapes data from news websites and stores it as .txt files. It was developed as part of the Açık Kaynak Hackathon Programı 2020.

💬 Where to ask questions

The SadedeGel project is maintained by @globalmaksimum AI team members @dafajon, @askarbozcan, @mccakir and @husnusensoy.

Type	Platforms
🚨 Bug Reports	GitHub Issue Tracker
🎁 Feature Requests	GitHub Issue Tracker

How it works

Gets the author URLs of the given news website
Gets the article URLs of each author
Scrapes the data from each article and writes it to a .txt file
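
The three steps above can be sketched as a small pipeline. This is only an illustration and not the scraper's actual code: `findLinks` and `fetchText` are hypothetical stand-ins for the HTTP and HTML-parsing work, the `article-N.txt` file names are made up, and the sketch returns (file name, content) pairs instead of writing to disk so that it runs without network access.

```scala
object ScrapePipelineSketch {
  // Hypothetical stand-in for the network: maps a page URL to the links found on it.
  type LinkFinder = String => List[String]
  // Hypothetical stand-in for article download: maps an article URL to its text.
  type TextFetcher = String => String

  /** Runs the three steps and returns (file name, content) pairs. */
  def scrape(authorsPage: String,
             findLinks: LinkFinder,
             fetchText: TextFetcher): List[(String, String)] = {
    // Step 1: author URLs of the news website
    val authorUrls = findLinks(authorsPage)
    // Step 2: article URLs of each author (duplicates dropped)
    val articleUrls = authorUrls.flatMap(findLinks).distinct
    // Step 3: one .txt entry per article (file naming is an assumption)
    articleUrls.zipWithIndex.map { case (url, i) =>
      (s"article-$i.txt", fetchText(url))
    }
  }

  def main(args: Array[String]): Unit = {
    // A tiny in-memory "website" used in place of real pages
    val site = Map(
      "https://example.com/authors"   -> List("https://example.com/authors/a", "https://example.com/authors/b"),
      "https://example.com/authors/a" -> List("https://example.com/a/1", "https://example.com/a/2"),
      "https://example.com/authors/b" -> List("https://example.com/b/1")
    )
    val files = scrape(
      "https://example.com/authors",
      url => site.getOrElse(url, Nil),
      url => s"text of $url"
    )
    files.foreach { case (name, text) => println(s"$name: $text") }
  }
}
```

Passing the page access in as functions keeps the step order visible and lets the pipeline be exercised against an in-memory map before pointing it at a real site.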

Install Scraper

You need sbt to build the project.

$ git clone https://github.com/GlobalMaksimum/sadedegel-scraper.git 
$ cd sadedegel-scraper
$ sbt assembly

The assembled jar is written to ./target/scala-[version]/

Example Run

$ nohup java -jar sadedegel-scraper-assembly-0.3.jar "hurriyet" > hurriyet.out &

Check the hurriyet-[dd-MM-yyyy] directory for the .txt files.
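
To make the directory naming concrete, the sketch below builds a name of the form [source]-[dd-MM-yyyy] with `java.time`. It is an assumption about how such a name can be produced, not the scraper's own code; `OutputDirName` and `outputDirName` are hypothetical names.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object OutputDirName {
  // Day-month-year pattern matching the [dd-MM-yyyy] suffix
  private val formatter = DateTimeFormatter.ofPattern("dd-MM-yyyy")

  /** Joins the source name and the formatted date with a hyphen. */
  def outputDirName(source: String, date: LocalDate): String =
    s"$source-${date.format(formatter)}"

  def main(args: Array[String]): Unit = {
    // Prints the directory name a run started today would be expected to use
    println(outputDirName("hurriyet", LocalDate.now()))
  }
}
```

For example, a run of the "hurriyet" source on 7 March 2020 would correspond to the name hurriyet-07-03-2020 under this pattern.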

For Developers

You can add support for additional news sources by extending the NewsWebsite trait.

Example:

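import com.sadedegel.ScraperUtils
import com.sadedegel.ScraperUtils.getArticles

class HurriyetScraper extends NewsWebsite {
  // Root URL of the news site
  val domain = "https://www.hurriyet.com.tr"
  // Page that lists all columnists
  val authorsUrl = "https://www.hurriyet.com.tr/yazarlar/tum-yazarlar/#hurriyetcomtr"

  // Step 1: return the author pages to scrape (a single hard-coded author in this example)
  override def getAuthorUrls(): List[String] = {
    List("https://www.hurriyet.com.tr/yazarlar/ilber-ortayli/")
  }

  // Step 2: collect the article URLs of each author and hand each one to writeArticlesToFile
  override def getArticlesOfAuthors(authorUrls: List[String], domain: String): Unit = {
    getArticles(authorUrls, domain, ".highlighted-box.mb20", writeArticlesToFile, "?p=", "")
  }

  // Step 3: extract the article text using the listed CSS selectors and write it to a .txt file
  override def writeArticlesToFile(articleUrl: String): Unit = {
    ScraperUtils.writeToFile(articleUrl, List(".article-content.news-text", ".rhd-all-article-detail"),
      "hurriyet")
  }
}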

