Implementing and Evaluating Information Retrieval Models

This repository contains project work for the course CS 6200 Information Retrieval Systems at Northeastern University. The project implements information retrieval techniques such as corpus cleaning, indexing, stemming and query enhancement. It also implements several document search models, including BM25, TF-IDF and the Query Likelihood Model, alongside Lucene. The CACM collection is used as the corpus.
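For readers new to these models, two of the scores named above can be sketched as follows. This is a minimal, hypothetical Java sketch, not the code in the retriever package: the class and method names are illustrative, BM25 assumes no relevance information (r = R = 0) and the commonly used constants k1 = 1.2, b = 0.75 and k2 = 100, and query likelihood assumes Jelinek-Mercer smoothing with a caller-supplied lambda in (0, 1]. The project's actual parameters and smoothing method may differ.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical scoring helpers; illustrative only, not the repository's retriever code. */
final class ScoringSketch {

    // Commonly used BM25 constants; assumed values, not taken from this project.
    static final double K1 = 1.2, B = 0.75, K2 = 100.0;

    /**
     * BM25 score of one document for a query, with no relevance information (r = R = 0).
     *
     * @param queryTermFreqs term -> frequency of the term in the query
     * @param docTermFreqs   term -> frequency of the term in the document
     * @param docFreqs       term -> number of documents containing the term
     * @param numDocs        N, the number of documents in the corpus
     * @param docLength      length of this document in tokens
     * @param avgDocLength   average document length in the corpus
     */
    static double bm25(Map<String, Integer> queryTermFreqs, Map<String, Integer> docTermFreqs,
                       Map<String, Integer> docFreqs, int numDocs, int docLength, double avgDocLength) {
        // Length normalisation: longer-than-average documents get a larger K.
        double k = K1 * ((1 - B) + B * docLength / avgDocLength);
        double score = 0.0;
        for (Map.Entry<String, Integer> e : queryTermFreqs.entrySet()) {
            int n = docFreqs.getOrDefault(e.getKey(), 0);
            int f = docTermFreqs.getOrDefault(e.getKey(), 0);
            int qf = e.getValue();
            double idf = Math.log((numDocs - n + 0.5) / (n + 0.5));
            score += idf * ((K1 + 1) * f / (k + f)) * ((K2 + 1) * qf / (K2 + qf));
        }
        return score;
    }

    /**
     * Query likelihood with Jelinek-Mercer smoothing (log space).
     * Query terms that never occur in the collection are skipped to avoid log(0).
     */
    static double queryLikelihood(List<String> queryTerms, Map<String, Integer> docTermFreqs,
                                  int docLength, Map<String, Integer> collectionFreqs,
                                  long collectionLength, double lambda) {
        double score = 0.0;
        for (String term : queryTerms) {
            int c = collectionFreqs.getOrDefault(term, 0);
            if (c == 0) continue;
            int f = docTermFreqs.getOrDefault(term, 0);
            double pDoc = docLength == 0 ? 0.0 : (double) f / docLength;
            double pColl = (double) c / collectionLength;
            score += Math.log((1 - lambda) * pDoc + lambda * pColl);
        }
        return score;
    }
}
```

Summing per-term log probabilities instead of multiplying raw probabilities keeps query likelihood scores from underflowing on longer queries, and ranking is unchanged because log is monotonic.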

General Layout

The code is divided into the following functional packages:

cleaner: handles the cleaning logic.
indexer: handles indexing of the cleaned corpus.
retriever: implements the document retrieval algorithms.
stemmer: handles stemming.
utils: general-purpose helper functions.
evaluation: evaluates the documents retrieved by each model using metrics such as Precision, Recall, MAP and MRR.
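The metrics handled by the evaluation package can be sketched in Java as below. This is an illustrative, hypothetical sketch rather than the package's code: it assumes each query's results are a ranked list of document IDs and its relevance judgments are a set of IDs, and it uses the non-interpolated definition of average precision (dividing by the total number of relevant documents).

```java
import java.util.List;
import java.util.Set;

/** Hypothetical per-query metrics; not the repository's evaluation code. */
final class EvaluationSketch {

    /** P@k: relevant documents in the top k, divided by k (even if fewer than k were retrieved). */
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) return 0.0;
        int hits = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        return (double) hits / k;
    }

    /** R@k: fraction of all relevant documents that appear in the top k. */
    static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        return (double) hits / relevant.size();
    }

    /** Average of the precision values at each rank where a relevant document occurs. */
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    /** 1 / rank of the first relevant document, or 0 if none was retrieved. */
    static double reciprocalRank(List<String> ranked, Set<String> relevant) {
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) return 1.0 / (i + 1);
        }
        return 0.0;
    }

    /** Arithmetic mean over queries: MAP for AP values, MRR for reciprocal ranks. */
    static double mean(List<Double> perQuery) {
        if (perQuery.isEmpty()) return 0.0;
        double total = 0.0;
        for (double v : perQuery) total += v;
        return total / perQuery.size();
    }
}
```

MAP and MRR are the means of averagePrecision and reciprocalRank over all queries. If some queries have no relevance judgments, skipping them or counting them as zero gives different means, so that choice should match the project's.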

Compiling and Running the Program

Creating the cleaned corpus and index files

Import the project into IntelliJ or Eclipse.
To generate the cleaned corpus, run Cleaner.java in the cleaner package. This writes the cleaned corpus to the src/main/resources/testcollection/cleanedcorpus folder.
To generate the index, use Indexer.java. StemmedIndexer.java can be used to generate an index of the stemmed version of the CACM corpus.
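The cleaning and indexing steps above can be pictured with a small in-memory Java sketch. It is hypothetical: this README does not specify the cleaning rules, so the sketch assumes case folding plus replacing non-alphanumeric characters with spaces, and it builds a unigram inverted index that maps each term to per-document term frequencies, along with document lengths. The real Cleaner.java and Indexer.java write their output to files and may tokenize differently; stemming (as done for StemmedIndexer.java) is not shown, and document IDs are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical cleaning + inverted index sketch; not the repository's Cleaner or Indexer. */
final class IndexSketch {

    /** Assumed cleaning: lower-case, turn runs of non-alphanumerics into spaces, split. */
    static List<String> clean(String rawText) {
        List<String> tokens = new ArrayList<>();
        String normalized = rawText.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
        if (normalized.isEmpty()) return tokens;
        for (String t : normalized.split(" ")) tokens.add(t);
        return tokens;
    }

    // term -> (docId -> frequency of the term in that document)
    final Map<String, Map<String, Integer>> postings = new HashMap<>();
    // docId -> number of tokens, used for length normalisation in ranking models
    final Map<String, Integer> docLengths = new HashMap<>();

    /** Cleans a document and adds its term counts to the postings. */
    void addDocument(String docId, String rawText) {
        List<String> tokens = clean(rawText);
        docLengths.put(docId, tokens.size());
        for (String term : tokens) {
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    /** Number of documents that contain the term. */
    int documentFrequency(String term) {
        return postings.getOrDefault(term, Map.of()).size();
    }

    /** Occurrences of the term in the given document (0 if absent). */
    int termFrequency(String term, String docId) {
        return postings.getOrDefault(term, Map.of()).getOrDefault(docId, 0);
    }
}
```

Storing term frequencies and document lengths in the index lets scores such as BM25 or query likelihood be computed without rereading the corpus.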

Running project tasks

Every task in the project can be run using a command-line flag in Runner.java.
Run the Runner.java#main() method in the retreivalmodels package.
Run options (usage: Retreival Model -taskName <arg>):
-taskName <arg>: the task to run; one of TASK1, TASK2, TASK3, PHASE1, PHASE2, noiseGeneration or softMatching.

NOTE: Read more about the tasks in the Problem Statement.
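To illustrate how a single -taskName flag can select one of the tasks listed above, here is a hypothetical Java sketch of the parsing step. It is not Runner.java: the class name, the error messages and the case-sensitive matching are assumptions, and the real runner may use a command-line parsing library instead.

```java
import java.util.Arrays;

/** Hypothetical -taskName parser; the project's Runner may behave differently. */
final class TaskFlagSketch {

    /** Task names accepted by -taskName, as listed in this README. */
    enum Task { TASK1, TASK2, TASK3, PHASE1, PHASE2, noiseGeneration, softMatching }

    static final String USAGE = "usage: -taskName <arg>";

    /** Returns the task named after -taskName; throws with usage text if missing or unknown. */
    static Task parse(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-taskName".equals(args[i])) {
                String name = args[i + 1];
                // Exact, case-sensitive match against the enum constant names.
                for (Task t : Task.values()) {
                    if (t.name().equals(name)) return t;
                }
                throw new IllegalArgumentException("Unknown task '" + name + "'. " + USAGE
                        + ", where <arg> is one of " + Arrays.toString(Task.values()));
            }
        }
        throw new IllegalArgumentException(USAGE);
    }
}
```

A main method would then switch on the returned Task and call the matching retrieval, evaluation or query-processing code.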

Key Terms

BM25, Lucene, Query Language Model, Noise Generation, Soft Matching

Contributions

Harshmeet Kaur Johal (johal.k@husky.neu.edu)
Karan Tyagi (tyagi.k@husky.neu.edu)
Savan Patel (patel.sav@husky.neu.edu)
