GitHub - karantyagi/information-retrieval-systems: 🔎 Building and evaluating performance of retrieval models like tfidf, BM25, Smoothed Query Likelihood and Lucene.

Implementing and Evaluating Information Retrieval Models

This repository contains the project work for the course CS 6200 Information Retrieval Systems at Northeastern University. The project implements information retrieval steps such as cleaning, indexing, stemming, and query enhancement. It also implements several document search models, including BM25, TF-IDF, and the Query Likelihood Model, alongside Lucene. It uses CACM as the corpus.
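As a rough illustration of the kind of scoring a BM25 retriever performs, here is a minimal sketch. It is not taken from this repository: the class name `Bm25Sketch`, the method signature, and the parameter values `K1 = 1.2` and `B = 0.75` are assumptions (they are commonly used defaults), and the IDF uses the "plus one inside the log" variant, which may differ from the formula the project follows.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of BM25 scoring; not a class from this repository. */
final class Bm25Sketch {
    static final double K1 = 1.2;  // assumed term-frequency saturation parameter
    static final double B = 0.75;  // assumed document-length normalization parameter

    /**
     * Scores one document against a query.
     * docTermFreqs: term -> frequency in this document
     * docFreqs:     term -> number of documents containing the term
     */
    static double score(List<String> queryTerms,
                        Map<String, Integer> docTermFreqs,
                        int docLength,
                        double avgDocLength,
                        Map<String, Integer> docFreqs,
                        int totalDocs) {
        double score = 0.0;
        for (String term : queryTerms) {
            int tf = docTermFreqs.getOrDefault(term, 0);
            if (tf == 0) {
                continue; // a term absent from the document contributes nothing
            }
            int df = docFreqs.getOrDefault(term, 0);
            // IDF variant that stays non-negative even for very common terms
            double idf = Math.log(1.0 + (totalDocs - df + 0.5) / (df + 0.5));
            // Longer-than-average documents are penalized through this factor
            double lengthNorm = K1 * (1.0 - B + B * docLength / avgDocLength);
            score += idf * tf * (K1 + 1.0) / (tf + lengthNorm);
        }
        return score;
    }
}
```

With everything else fixed, a document that contains a query term more often scores higher, but the gain flattens as the term frequency grows.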

General Layout

The code is divided into multiple functional packages.

cleaner: handles cleaning logic.
indexer: handles indexing logic based on the cleaned corpus.
retriever: implements various document retrieval algorithms.
stemmer: handles the stemming task.
utils: general-purpose functions.
evaluation: evaluates the documents retrieved by each model using metrics such as Precision, Recall, MAP, and MRR.
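To make the evaluation metrics concrete, the following is a minimal sketch of how they can be computed for a single query. It does not reproduce the repository's evaluation package: the class `EvalSketch` and its method names are hypothetical, and the ranking is assumed to contain each document ID at most once. MAP and MRR are then the means of `averagePrecision` and `reciprocalRank` over all queries.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of ranking metrics; not the repository's evaluation code. */
final class EvalSketch {

    /** Fraction of the top k results that are relevant. */
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    /** Fraction of all relevant documents that appear anywhere in the ranking. */
    static double recall(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (String docId : ranked) {
            if (relevant.contains(docId)) {
                hits++;
            }
        }
        return (double) hits / relevant.size();
    }

    /** Sum of precision at each relevant rank, divided by the number of relevant documents. */
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    /** 1 / rank of the first relevant document, or 0 if none is retrieved. */
    static double reciprocalRank(List<String> ranked, Set<String> relevant) {
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }
}
```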

Compiling and Running Program

Creating cleaned corpus and index files.

Import the project into IntelliJ or Eclipse.
To generate the cleaned corpus, run Cleaner.java in the cleaner package. This writes its output under the src/main/resources/testcollection/cleanedcorpus folder.
To generate the index, use Indexer.java. StemmedIndexer.java can be used to generate an index of the stemmed version of the CACM corpus.
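For readers unfamiliar with the indexing step, here is a minimal sketch of building an inverted index from already-cleaned text. It is an illustration only, not the logic of Indexer.java: the class `InvertedIndexSketch`, the whitespace tokenization, and the in-memory map layout (term to document ID to term frequency) are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an inverted index; not the repository's Indexer. */
final class InvertedIndexSketch {

    /**
     * Builds term -> (document ID -> term frequency) from cleaned documents.
     * cleanedDocs maps a document ID to its cleaned, whitespace-separated text.
     */
    static Map<String, Map<String, Integer>> build(Map<String, String> cleanedDocs) {
        // TreeMap keeps terms sorted, which makes the index easy to inspect
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : cleanedDocs.entrySet()) {
            for (String token : doc.getValue().trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // an empty document yields a single empty token
                }
                index.computeIfAbsent(token, t -> new HashMap<>())
                     .merge(doc.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }
}
```

The per-document term frequencies and the size of each postings map (the document frequency) are the statistics that models such as TF-IDF and BM25 need at query time.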

Running project tasks

Every task in the project can be run using a command-line flag in Runner.java.
Run the Runner.java#main() method in the retreivalmodels package.
Run Options usage: Retreival Model: -taskName <arg>
task to run - [can be one of the TASK1, TASK2 or TASK3, PHASE1, PHASE2, noiseGeneration, softMatching]

NOTE: Read more about the tasks in the Problem Statement.
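The sketch below shows one way a `-taskName <arg>` flag could be read and validated. It is not the code in Runner.java: the class `TaskFlagSketch` and its method are hypothetical, and only the flag name and the list of task names come from the usage text above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of reading the -taskName flag; not the repository's Runner. */
final class TaskFlagSketch {
    // Task names as listed in the usage text
    static final List<String> TASKS = Arrays.asList(
            "TASK1", "TASK2", "TASK3", "PHASE1", "PHASE2",
            "noiseGeneration", "softMatching");

    /** Returns the value following -taskName, or throws if it is missing or unknown. */
    static String parseTaskName(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-taskName".equals(args[i])) {
                String task = args[i + 1];
                if (!TASKS.contains(task)) {
                    throw new IllegalArgumentException("Unknown task: " + task);
                }
                return task;
            }
        }
        throw new IllegalArgumentException("usage: -taskName <arg>");
    }

    public static void main(String[] args) {
        // Only reports the selected task; running it is left out of this sketch
        System.out.println("Selected task: " + parseTaskName(args));
    }
}
```

In an IDE, the same arguments (for example `-taskName TASK1`) would go into the program arguments field of the run configuration.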

Key Terms

BM25, Lucene, Query Language Model, Noise Generation, Soft Matching

Contributions

Harshmeet Kaur Johal (johal.k@husky.neu.edu)
Karan Tyagi (tyagi.k@husky.neu.edu)
Savan Patel (patel.sav@husky.neu.edu)
