wikipedia-crawler

Here are 12 public repositories matching this topic...

Sarthakjain1206 / Intelligent_Document_Finder

Document Search Engine Tool

search-engine scrapy-spider indexer scrapy text-summarization search-algorithm webcrawler latent-dirichlet-allocation bm25 spellchecker document-similarity wikipedia-search wikipedia-crawler ranking-algorithm document-summarization reverse-index

Updated Dec 8, 2022
Python

lehinevych / MediaWikiAPI

Python wrapper for the MediaWiki API to access and parse data from Wikipedia

wikipedia python3 wikipedia-api mediawiki-api wikipedia-crawler wikipedia-scraper wikipedia-sc

Updated Jun 18, 2024
Python

nazaninsbr / Wikipedia-Crawler

A crawler for Wikipedia (for now, English pages only)

python crawler wikipedia python-crawler wikipedia-crawler

Updated Aug 7, 2018
Python

A search engine that takes keyword queries as input and returns a ranked list of relevant results. It scrapes a few thousand pages starting from one of the seed Wiki pages and uses Elasticsearch as its full-text search engine.

react search-engine elasticsearch information-retrieval material-ui web-scraping marvel elasticsearch-client scrapy-crawler nodejs-server wikipedia-crawler lavenshtein okapi-bm25 dirichlet-smoothing marvel-wiki

Updated Jan 13, 2023
JavaScript

georgegkonis / chord-dht-analysis

Custom implementation of Chord DHT and analysis of its operations

python jupyter wikipedia-crawler chord-dht

Updated Sep 22, 2024
Jupyter Notebook

TimurKasatkin / IR_system

IR system part of the Innopolis IR 2016 course semester project

cli search-engine crawler information-retrieval scala sbt vector-space-model tfidf ranked-fulltext-searches wikipedia-crawler

Updated Nov 30, 2016
Scala

mayankkumar2 / wikipedia-index-scraper

The program maps out the shortest path between two Wikipedia pages.

wikipedia wikipedia-crawler wikipedia-scraper wikipedia-entries

Updated May 24, 2020
Go
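
The shortest-path idea in the entry above can be illustrated with a breadth-first search over a link graph. This is only a minimal sketch in Python, not the repository's method (which is written in Go and is not shown here). The `links` dictionary and the `shortest_path` function are hypothetical names, and the graph is held in memory rather than fetched from Wikipedia.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Return the shortest list of pages from start to goal, or None.

    `links` is an assumed in-memory mapping: page title -> list of
    page titles it links to. A real crawler would fetch these links.
    """
    if start == goal:
        return [start]
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt in parents:
                continue  # already reached by a path at least as short
            parents[nxt] = page
            if nxt == goal:
                # Walk the parent pointers back to the start page.
                path = [nxt]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None  # goal is not reachable from start
```

Breadth-first search visits pages in order of increasing click distance, so the first time the goal is reached the path is a shortest one when every link counts as one step.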

adidottxt / wikipedia-crawler

Python web crawler that tests the theory that, for ~97% of wiki pages, repeatedly clicking the first link eventually leads to the wiki page for knowledge 📡

udacity beautifulsoup python-web-crawler wikipedia-crawler beautifulsoup4

Updated Feb 8, 2018
Python
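
The first-link experiment described in the entry above can be sketched as a simple chain walk with loop detection. This is a hedged illustration, not the repository's code: `follow_first_links` and the `first_link` mapping are hypothetical, and the mapping stands in for what a real crawler would obtain by downloading each page and extracting its first link.

```python
def follow_first_links(first_link, start, target, max_steps=100):
    """Follow first links from start until target, a loop, or a dead end.

    `first_link` is an assumed mapping: page title -> title of the
    first link on that page. Returns (path, reached_target).
    """
    path = [start]
    seen = {start}
    current = start
    while current != target and len(path) <= max_steps:
        nxt = first_link.get(current)
        if nxt is None or nxt in seen:
            # Dead end or cycle: this chain never reaches the target.
            return path, False
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path, current == target
```

Tracking the visited pages matters because a chain of first links can cycle, and the `max_steps` cap bounds the number of requests a real crawl would make.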

WillCaton2350 / Wikipedia-WebCrawler

Wikipedia web crawler written in Python with Scrapy. The ETL process extracts specific data from multiple Wikipedia pages and links using Scrapy and organizes it into a structured format using Scrapy items. The extracted data is saved as JSON for further analysis and integration into MySQL Workbench.

mysql python json web-crawler scrapy-spider scrapy-crawler python-crawler wikipedia-crawler