Please download the following resources:
- AOL-IA documents: Follow the instructions on the ir-datasets website
- FastText model for filtering the AOL-IA dataset
- DMOZ dataset for training the document topic estimator
- RankLib for training the Learning-to-Rank model
Install the required dependencies as follows:
conda env create -f env.yml
conda activate PR-Rank
pip install ./PR-Rank
To execute the series of experiments, run the following commands:
# Estimate qrel
python PR-Rank/aolia_qrel/main.py
# Extract features
python -m spacy download en_core_web_sm
python PR-Rank/features_extraction/main.py
# Divide dataset
python PR-Rank/dataset_division/main.py
# Train & evaluate PR-Rank (Parameter regression model)
python PR-Rank/parameter_regression/main.py
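The commands above can also be chained from a single script so that a failed step stops the run. This is a minimal sketch, not part of the repository: `PIPELINE_STEPS` and `run_pipeline` are hypothetical names, and the script assumes it is launched from the directory that contains `PR-Rank/` with the conda environment already activated.

```python
import subprocess
import sys

# Steps copied from the commands listed above, in the same order.
# sys.executable is used so every step runs with the active interpreter.
PIPELINE_STEPS = [
    ("Estimate qrel", [sys.executable, "PR-Rank/aolia_qrel/main.py"]),
    ("Download spaCy model", [sys.executable, "-m", "spacy", "download", "en_core_web_sm"]),
    ("Extract features", [sys.executable, "PR-Rank/features_extraction/main.py"]),
    ("Divide dataset", [sys.executable, "PR-Rank/dataset_division/main.py"]),
    ("Train & evaluate PR-Rank", [sys.executable, "PR-Rank/parameter_regression/main.py"]),
]


def run_pipeline(steps=PIPELINE_STEPS, dry_run=False):
    """Run each step in order; with dry_run=True only print the commands."""
    for label, command in steps:
        print(f"[{label}] {' '.join(command)}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # which aborts the remaining steps.
            subprocess.run(command, check=True)
```

Calling `run_pipeline(dry_run=True)` prints the five commands without executing them, which is a cheap way to confirm the order before starting a long run.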
To modify experimental settings, edit the following configuration files:
- PR-Rank/aolia_qrel/config/config.yaml
- PR-Rank/features_extraction/config/config.yaml
- PR-Rank/dataset_division/config/config.yaml
- PR-Rank/parameter_regression/config/config.yaml
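Before editing, it can help to confirm that all four configuration files are where the list above says they are. The following is a small sketch using only the standard library; `CONFIG_FILES` and `missing_configs` are hypothetical names, and the `root` argument is assumed to be the directory that contains `PR-Rank/`.

```python
from pathlib import Path

# Configuration files named in the list above, relative to the checkout root.
CONFIG_FILES = [
    "PR-Rank/aolia_qrel/config/config.yaml",
    "PR-Rank/features_extraction/config/config.yaml",
    "PR-Rank/dataset_division/config/config.yaml",
    "PR-Rank/parameter_regression/config/config.yaml",
]


def missing_configs(root="."):
    """Return the configuration paths that do not exist under root."""
    return [path for path in CONFIG_FILES if not (Path(root) / path).is_file()]
```

An empty list from `missing_configs()` means every stage has its configuration file in place.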
PR-Rank involves two main experimental stages, each with its own configuration:
- Dataset Division
- PR-Rank Parameter Regression
You can independently select feature sets for each experimental stage:
In the dataset division configuration, modify the feature_sets parameter:
# Use only query features for dataset division
feature_sets:
- Q
In the PR-Rank configuration, modify the domain_feature_sets parameter:
# Use all feature sets for PR-Rank
domain_feature_sets:
- Q
- D
- Q-D
Both stages accept Q (Query), D (Document), Q-D (Query-Document pair), or any combination of them.
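Because each stage accepts any combination of the three feature sets, there are seven non-empty choices per stage. The sketch below enumerates them and renders a choice as the YAML list shown earlier. It is an illustration only: `feature_set_combinations` and `to_yaml` are hypothetical helpers, not functions provided by PR-Rank.

```python
from itertools import combinations

# The three feature sets described above.
FEATURE_SETS = ["Q", "D", "Q-D"]


def feature_set_combinations(options=FEATURE_SETS):
    """Return every non-empty combination, from single sets up to all sets."""
    combos = []
    for size in range(1, len(options) + 1):
        combos.extend(list(combo) for combo in combinations(options, size))
    return combos


def to_yaml(key, feature_sets):
    """Render one choice as a YAML list in the same layout as the config snippets."""
    lines = [f"{key}:"]
    lines.extend(f"- {name}" for name in feature_sets)
    return "\n".join(lines)
```

For example, `to_yaml("domain_feature_sets", ["Q", "D"])` yields a fragment that could be pasted into the PR-Rank configuration, assuming the file uses the list layout shown above.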
Give each experiment a descriptive name and use it as a directory component in that stage's paths, so the outputs of different runs stay separate.
In PR-Rank/dataset_division/config/config.yaml:
# experiment_name: q
domains_dir_path: PR-Rank/dataset_division/experiment/q/data/domains
ltr_datasets_dir_path: PR-Rank/dataset_division/experiment/q/data/ltr_datasets
...
In PR-Rank/parameter_regression/config/config.yaml:
# experiment_name: all
domain_features_dir_path: PR-Rank/parameter_regression/experiment/all/data/domain_features
model_parameters_dir_path: PR-Rank/parameter_regression/experiment/all/data/model_parameters
...
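The paths in both snippets follow the same pattern, with the experiment name as one directory component. A small helper can generate them consistently. This is a sketch under that assumption: `dataset_division_paths` and `parameter_regression_paths` are hypothetical names, and only the keys visible in the snippets above are covered, since the remaining settings are elided there.

```python
from pathlib import PurePosixPath


def dataset_division_paths(experiment_name):
    """Build the dataset division paths shown above for a given experiment name."""
    base = PurePosixPath("PR-Rank/dataset_division/experiment") / experiment_name / "data"
    return {
        "domains_dir_path": str(base / "domains"),
        "ltr_datasets_dir_path": str(base / "ltr_datasets"),
    }


def parameter_regression_paths(experiment_name):
    """Build the parameter regression paths shown above for a given experiment name."""
    base = PurePosixPath("PR-Rank/parameter_regression/experiment") / experiment_name / "data"
    return {
        "domain_features_dir_path": str(base / "domain_features"),
        "model_parameters_dir_path": str(base / "model_parameters"),
    }
```

Generating the values this way keeps the two directories of a stage under the same experiment name, which avoids mixing outputs from different runs.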