This repository contains datasets and code for our COLING 2024 paper: "Recommending Missed Citations Identified by Reviewers: A New Task, Dataset and Baselines".
CitationR is built by extracting the citations that reviewers recommend in NeurIPS and ICLR reviews. In total, we collect 76,143 official reviews and 21,598 submissions, of which around 35% are identified as lacking citations. Moreover, to better replicate the actual situation in which researchers search for papers to cite, we establish a larger and more challenging version of CitationR. This version includes an additional 40,810 papers published in top venues from which reviewers frequently recommend citations.
Download the base version from the following link: Baidu Drive (access code: q2vy)
Download the extended version from the following link: Baidu Drive (access code: gjwu)
Unzip the downloaded data and place it under data/, naming the directories base and extended, respectively.
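Before running any of the scripts, it can help to confirm that the data ended up where the instructions above expect it. The following is a minimal sketch, not part of this repository: the helper name `missing_dataset_dirs` is hypothetical, and it only checks that the two directories exist, not what they contain.

```python
from pathlib import Path


def missing_dataset_dirs(data_root="data", names=("base", "extended")):
    """Return the expected dataset directories that do not exist under data_root.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    root = Path(data_root)
    return [name for name in names if not (root / name).is_dir()]


# Run from the repository root so that "data" resolves to data/.
missing = missing_dataset_dirs()
if missing:
    print("Missing dataset directories under data/:", ", ".join(missing))
else:
    print("Found data/base and data/extended.")
```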
conda create -n rmc python==3.7.12
source activate rmc
pip install -r requirements.txt
- Evaluation
python ./src/bm25.py
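For readers unfamiliar with BM25, the sketch below shows how Okapi BM25 scores candidate papers against a query. It is an illustration only and makes no claim about how ./src/bm25.py is implemented: the function name `bm25_scores`, the whitespace tokenization, the toy texts, and the parameter values `k1=1.5` and `b=0.75` are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document in corpus_tokens.

    Illustrative helper; assumes a non-empty corpus of non-empty token lists.
    """
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            # Smoothed IDF; the "1 +" keeps it positive even for common terms.
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy example: rank three candidate papers for a short query.
candidates = [
    "graph neural networks for citation recommendation".split(),
    "image classification with convolutional networks".split(),
    "citation recommendation with pretrained language models".split(),
]
query = "citation recommendation".split()
scores = bm25_scores(query, candidates)
# Candidate indices sorted from most to least relevant.
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

In this toy example the second candidate shares no term with the query, so it scores zero and is ranked last.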
Follow the official instructions in the citeomatic repository.
Follow the official instructions in the Local-Citation-Recommendation repository.
- Evaluation of LMs not fine-tuned on CitationR
python ./src/evaluating/direct_plm_evaluating.py --dataset_name base --model_name scincl --way_ref concat
- Evaluation of LMs fine-tuned on CitationR
python ./src/evaluating/evaluate.py --dataset_name base --config_name example --experiment_name example_evaluation
- Train
python ./src/training/run_training.py --model scincl --dataset base --config_name example --experiment_suffix example_train
Our fine-tuned models are available: base (access code: nrbe), extended (access code: n3pb).
Put the downloaded models under experiments/dataset_name/example_evaluation/model/, where dataset_name is base or extended.
If you find our work useful, please cite the paper as:
@inproceedings{long24coling,
title = {Recommending Missed Citations Identified by Reviewers: A New Task, Dataset and Baselines},
author = {Kehan Long and Shasha Li and Pancheng Wang and Chenlong Bao and Jintao Tang and Ting Wang},
booktitle = {COLING},
year = {2024}
}
- We use some of the code from Local-Citation-Recommendation and specter to implement our project.