A crawler for Wikipedia (for now, only the English pages)
A search engine that takes keyword queries as input and retrieves a ranked list of relevant results. It scrapes a few thousand pages starting from a seed Wikipedia page and uses Elasticsearch as the full-text search engine.
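For illustration, a minimal sketch of the indexing/query flow such a project typically uses, assuming a local Elasticsearch instance and the elasticsearch-py 8.x client; the index name and document fields here are invented for the example, not taken from the repo:

```python
# Index a few scraped pages, then run a full-text query against them.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

pages = [
    {"title": "Web crawler", "body": "A web crawler systematically browses the web."},
    {"title": "Search engine", "body": "A search engine retrieves ranked results."},
]

for i, page in enumerate(pages):
    es.index(index="wiki_pages", id=i, document=page)

es.indices.refresh(index="wiki_pages")  # make new docs searchable immediately

resp = es.search(index="wiki_pages", query={"match": {"body": "ranked results"}})
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```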
The program can map out the shortest path between two Wikipedia pages.
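The usual approach is a breadth-first search over article links; here is a sketch using the public MediaWiki API (the repo's actual implementation may differ):

```python
# BFS over Wikipedia's link graph: the first path found is a shortest one.
from collections import deque
import requests

API = "https://en.wikipedia.org/w/api.php"

def links_from(title):
    """Yield titles linked from a page (first API batch only, for brevity)."""
    params = {"action": "query", "titles": title, "prop": "links",
              "pllimit": "max", "format": "json"}
    data = requests.get(API, params=params).json()
    for page in data["query"]["pages"].values():
        for link in page.get("links", []):
            yield link["title"]

def shortest_path(start, goal):
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links_from(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])

print(shortest_path("Python (programming language)", "Guido van Rossum"))
```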
Wikipedia web crawler written in Python and Scrapy. The ETL process extracts specific data from multiple Wikipedia pages/links with Scrapy and organizes it into a structured format using Scrapy items. The extracted data is saved as JSON for further analysis and integration into MySQL Workbench.
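A minimal sketch of that pattern: a Scrapy spider that fills a `scrapy.Item` per page, runnable with `scrapy runspider wiki_spider.py -o pages.json`. Field names and selectors are illustrative, not taken from the repo:

```python
import scrapy

class WikiPageItem(scrapy.Item):
    title = scrapy.Field()
    url = scrapy.Field()

class WikiSpider(scrapy.Spider):
    name = "wiki"
    start_urls = ["https://en.wikipedia.org/wiki/Web_crawler"]

    def parse(self, response):
        # Yield one structured item per page, then follow a few in-article links.
        yield WikiPageItem(
            title=response.css("h1#firstHeading ::text").get(),
            url=response.url,
        )
        for href in response.css("div#bodyContent a[href^='/wiki/']::attr(href)").getall()[:5]:
            yield response.follow(href, callback=self.parse)
```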
Custom implementation of Chord DHT and analysis of its operations
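For context, Chord places node IDs on a ring modulo 2^m, stores each key at its successor, and routes lookups through finger tables in O(log N) hops. A toy, network-free sketch with assumed parameters:

```python
# Static Chord ring: successor lookup and finger-table construction.
M = 5                                # identifier bits -> ring size 2^M = 32
RING = 2 ** M
nodes = sorted([1, 8, 14, 21, 28])   # illustrative node IDs

def successor(k):
    """First node clockwise from key k on the ring."""
    for n in nodes:
        if n >= k % RING:
            return n
    return nodes[0]                  # wrap around past the largest ID

def finger_table(n):
    """finger[i] = successor(n + 2^i mod 2^M), for i = 0..M-1."""
    return [successor((n + 2 ** i) % RING) for i in range(M)]

print(successor(17))     # key 17 is stored at node 21
print(finger_table(8))   # [14, 14, 14, 21, 28]
```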
A Wikipedia crawler that finds the worst-translated page reachable from an English starting page by following hypertext links.
Web scraping is a data-scraping technique used for extracting data from websites.
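A minimal example in that spirit, using the common requests + BeautifulSoup pairing (an assumption; the project may use different tools):

```python
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://en.wikipedia.org/wiki/Web_scraping")
soup = BeautifulSoup(resp.text, "html.parser")

print(soup.find("h1").get_text())             # page title
for a in soup.select("p a[href^='/wiki/']")[:5]:
    print(a.get_text(), "->", a["href"])      # a few in-article links
```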
The IR system component of the Innopolis IR 2016 course semester project.
[READ-ONLY] A word extractor for Wikipedia articles.
A Python web crawler to test the theory that repeatedly clicking the first link on ~97% of wiki pages eventually leads to the wiki page for knowledge 📡
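A sketch of the experiment: repeatedly fetch the first in-article link until the walk loops or a step cap is hit. The link-selection heuristic below is simplified (real implementations also skip parenthesized and italicized links):

```python
import requests
from bs4 import BeautifulSoup

BASE = "https://en.wikipedia.org"

def first_link(path):
    """Return the first in-article /wiki/ link on the page, or None."""
    soup = BeautifulSoup(requests.get(BASE + path).text, "html.parser")
    for a in soup.select("div#mw-content-text p > a[href^='/wiki/']"):
        if ":" not in a["href"]:     # skip File:, Help:, and other namespaces
            return a["href"]
    return None

path, seen = "/wiki/Special:Random", []
for _ in range(50):                  # cap the walk
    path = first_link(path)
    if path is None or path in seen:
        break
    seen.append(path)
    print(path)
```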
Python wrapper for the MediaWiki API to access and parse data from Wikipedia
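Typical usage of such a wrapper, shown here with the `wikipedia` PyPI package for illustration (the described project's own API may differ):

```python
import wikipedia

print(wikipedia.search("web crawler")[:3])            # title suggestions
print(wikipedia.summary("Web crawler", sentences=2))  # short plain-text summary

page = wikipedia.page("Web crawler")
print(page.url)
print(page.links[:5])                                 # outgoing article links
```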
Document Search Engine Tool