A simple search engine application with Spark

Source files

The project follows the usual sbt directory layout. Source files reside in the src/main/scala directory:

Indexer.scala - Contains the indexing logic.
utils/Functions.scala - Contains utility functions such as vectorize_text() and normalize_word().
Ranker.scala - Contains the ranking logic, including interactive querying.
RelevanceAnalizator.scala - Contains the ranking functions: simple inner product and BM25.
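To make the two ranking functions concrete, here is a minimal sketch in plain Scala (no Spark). It is not the code from RelevanceAnalizator.scala or utils/Functions.scala: the bodies of normalize_word() and vectorize_text(), the object name RelevanceSketch, all parameter names, and the BM25 defaults k1 = 1.2 and b = 0.75 are assumptions made for illustration. The BM25 variant shown uses the common non-negative IDF form ln(1 + (N - n + 0.5) / (n + 0.5)).

```scala
object RelevanceSketch {
  // Assumed normalization: lowercase and drop non-alphanumeric characters.
  def normalize_word(word: String): String =
    word.toLowerCase.replaceAll("[^a-z0-9]", "")

  // Assumed vectorization: sparse term-frequency vector keyed by normalized word.
  def vectorize_text(text: String): Map[String, Double] =
    text.split("\\s+").map(normalize_word).filter(_.nonEmpty)
      .groupBy(identity).map { case (w, occ) => w -> occ.length.toDouble }

  // Simple inner product of two sparse vectors: sum of products over shared terms.
  def innerProduct(query: Map[String, Double], doc: Map[String, Double]): Double =
    query.toSeq.map { case (term, weight) => weight * doc.getOrElse(term, 0.0) }.sum

  // BM25 score of one document for a query.
  // docFreq(term) is the number of documents containing the term; numDocs is the corpus size.
  def bm25(queryTerms: Seq[String],
           doc: Map[String, Double],
           docLen: Double,
           avgDocLen: Double,
           numDocs: Long,
           docFreq: Map[String, Long],
           k1: Double = 1.2,
           b: Double = 0.75): Double =
    queryTerms.distinct.map { term =>
      val tf = doc.getOrElse(term, 0.0)
      val n = docFreq.getOrElse(term, 0L)
      val idf = math.log(1.0 + (numDocs - n + 0.5) / (n + 0.5))
      idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * docLen / avgDocLen))
    }.sum
}
```

The inner product only rewards term overlap, while BM25 additionally dampens repeated terms (through k1), penalizes long documents (through b), and gives rarer terms more weight (through the IDF factor).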

How to compile

Run sbt package in the project's root directory. The resulting jar file is target/scala-2.11/searchengine_2.11-0.1.jar.

How to run

First, run the Indexer application to build the index data and save it to a path. Then run the Ranker on the indexed data.
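The README does not describe what the saved index contains. As a rough illustration of the kind of data an indexing step can produce for the ranking step to consume, the following sketch builds per-document term frequencies and corpus-wide document frequencies with plain Scala collections. The names IndexSketch, Index, tokenize, and buildIndex are hypothetical, and the actual Indexer runs on Spark and may store something different.

```scala
object IndexSketch {
  // Hypothetical index layout: term counts per document, plus the number of
  // documents each term appears in.
  case class Index(termFreqs: Map[String, Map[String, Int]],
                   docFreqs: Map[String, Int])

  // Assumed tokenization: lowercase, split on runs of non-alphanumeric characters.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSeq

  // docs maps a document id to its raw text.
  def buildIndex(docs: Map[String, String]): Index = {
    val termFreqs = docs.map { case (id, text) =>
      id -> tokenize(text).groupBy(identity).map { case (t, occ) => t -> occ.length }
    }
    // Each document contributes each of its distinct terms once.
    val docFreqs = termFreqs.values.toSeq.flatMap(_.keys)
      .groupBy(identity).map { case (t, occ) => t -> occ.length }
    Index(termFreqs, docFreqs)
  }
}
```

On a cluster, the same two passes would typically be expressed as transformations over a distributed collection rather than an in-memory Map.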

Indexer

The indexer is typically run in this format:

spark-submit --master yarn --class Indexer <jar-file> <input-path> <output-path>

Example:

spark-submit --master yarn --class Indexer searchengine_2.11-0.1.jar /EnWikiMedium IndexDir

Run with the -h argument to see the full help message.

Ranker

The ranker is typically run in this format:

spark-submit --master yarn --class Ranker <jar-file> -i <index-path> <ranker-method> <search-query>

Here <ranker-method> is either inner (simple inner product) or bm25.

Example:

spark-submit --master yarn --class Ranker searchengine_2.11-0.1.jar -i IndexDir bm25 Game of Thrones

Once the results of the first query are shown, the application prompts for the next query.
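A query loop of this kind can be sketched as below. This is an illustration only, not the code in Ranker.scala: the names QueryLoopSketch and run, the prompt text, and the rule that an empty line or end of input stops the loop are all assumptions, since the README does not say how the session ends.

```scala
import scala.io.StdIn

object QueryLoopSketch {
  // Processes queries one by one until an empty line or the end of input.
  // search is a placeholder for the real ranking call; out receives each result line.
  def run(lines: Iterator[String],
          search: String => Seq[String],
          out: String => Unit): Unit =
    lines.map(_.trim).takeWhile(_.nonEmpty).foreach { query =>
      search(query).foreach(out)
    }

  def main(args: Array[String]): Unit = {
    // StdIn.readLine returns null at end of input, which ends the iterator.
    val lines = Iterator
      .continually(StdIn.readLine("Next query: "))
      .takeWhile(_ != null)
    run(lines, q => Seq(s"(ranked results for '$q' would be listed here)"), println)
  }
}
```

Passing the input lines and the search function as parameters keeps the loop independent of the console, so it can be exercised without a running cluster.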

Run with the -h argument to see the full help message.
