| Model | MRR@10 | Recall@10 | nDCG@10 |
|---|---|---|---|
| bge-base-en-v1.5 | 0.703 | 0.862 | 0.744 |
| + fine-tuning | 0.757 | 0.900 | 0.793 |
| e5-mistral-7b-instruct | 0.589 | 0.748 | 0.630 |
| + fine-tuning | 0.763 | 0.940 | 0.806 |
```sh
sh embed_pairwuse_train.sh
```

Optional: LLM embedding

```sh
sh embed_llm_train.sh
```
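The training scripts themselves are not reproduced here. As a rough sketch of what pairwise embedding fine-tuning commonly optimizes, the snippet below computes an InfoNCE-style contrastive loss with in-batch negatives in plain Python. It is an assumption that `embed_pairwuse_train.sh` and `embed_llm_train.sh` use this objective, and the temperature value is illustrative only.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_contrastive_loss(query_embs, passage_embs, temperature=0.05):
    """InfoNCE loss with in-batch negatives.

    passage_embs[i] is the positive for query_embs[i]; every other passage in
    the batch acts as a negative. temperature=0.05 is an illustrative default,
    not a value taken from the training scripts.
    """
    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [dot(q, p) / temperature for p in passage_embs]
        # -log softmax(logits)[i], computed stably by subtracting the max logit.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(query_embs)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    aligned = in_batch_contrastive_loss(q, [[1.0, 0.0], [0.0, 1.0]], temperature=1.0)
    swapped = in_batch_contrastive_loss(q, [[0.0, 1.0], [1.0, 0.0]], temperature=1.0)
    print(aligned < swapped)  # True: matching positives give a lower loss
```

Lowering the temperature sharpens the softmax, so the loss penalizes hard negatives more strongly.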
- Save the `(embedding vector, id)` pair for each corpus example; multiple files are supported.
- For LLM embedding encoding, remember to use the same instruction that was used during training.

```sh
sh encode_corpus.sh
```

Optional: LLM encoding

```sh
sh encode_llm_corpus.sh
```
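Each encoding run stores every vector together with the id of the example it came from. The minimal sketch below assumes each output file is a pickled `(vectors, ids)` tuple (an assumption; check the scripts for the real format) and shows how several shard files can be written and merged back into one aligned table. `save_shard` and `load_shards` are hypothetical helpers.

```python
import glob
import os
import pickle
import tempfile


def save_shard(path, vectors, ids):
    """Write one shard as a pickled (vectors, ids) tuple; ids[i] labels vectors[i]."""
    if len(vectors) != len(ids):
        raise ValueError("every vector needs exactly one id")
    with open(path, "wb") as f:
        pickle.dump((vectors, ids), f)


def load_shards(pattern):
    """Load every shard matching a glob pattern and concatenate them.

    Vectors and ids from the same shard are appended together, so they stay
    aligned regardless of the order in which shards are read.
    """
    all_vectors, all_ids = [], []
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            vectors, ids = pickle.load(f)
        all_vectors.extend(vectors)
        all_ids.extend(ids)
    return all_vectors, all_ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        save_shard(os.path.join(d, "corpus.0.pkl"), [[0.1, 0.2]], ["doc-a"])
        save_shard(os.path.join(d, "corpus.1.pkl"), [[0.3, 0.4]], ["doc-b"])
        vectors, ids = load_shards(os.path.join(d, "corpus.*.pkl"))
        print(ids)  # ['doc-a', 'doc-b']
```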
- Save the `(embedding vector, id)` pair for each query example.
- Use `Tevatron/scifact/dev` or `Tevatron/scifact/test` to choose whether the dev or the test split is encoded.

```sh
sh encode_query.sh
```

Optional: LLM encoding

```sh
sh encode_llm_query.sh
```
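The LLM path depends on applying the same instruction at training and encoding time, and keeping it in one shared constant is a simple way to enforce that. As a hedged sketch, the helper below formats a query in the `Instruct: {task}` / `Query: {query}` layout shown on the e5-mistral-7b-instruct model card. The task string and the `format_query` helper are hypothetical; substitute the instruction actually used by `embed_llm_train.sh`.

```python
# Hypothetical task string: replace it with the exact instruction used in training.
TASK = "Given a scientific claim, retrieve documents that support or refute the claim"


def format_query(query: str, task: str = TASK) -> str:
    """Prefix a query with an instruction, following the Instruct/Query layout
    from the e5-mistral-7b-instruct model card."""
    return f"Instruct: {task}\nQuery: {query}"


if __name__ == "__main__":
    print(format_query("Aspirin reduces the risk of stroke."))
```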
```sh
sh retrieve.sh
```
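`retrieve.sh` is not shown. One straightforward way to realize the retrieval step is exact inner-product search of every query vector against every corpus vector, which the pure-Python sketch below does with `heapq.nlargest`. Whether the script scores by inner product (rather than cosine similarity) or uses a library index such as FAISS is an assumption, and `retrieve` is a hypothetical helper.

```python
import heapq


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vecs, query_ids, doc_vecs, doc_ids, k=10):
    """Exact inner-product search.

    Returns {query_id: [(doc_id, score), ...]} holding the k highest-scoring
    documents for each query, best first.
    """
    results = {}
    for qid, q in zip(query_ids, query_vecs):
        # Score every document; tuples put the score first so nlargest ranks by it.
        scored = ((dot(q, d), did) for did, d in zip(doc_ids, doc_vecs))
        top = heapq.nlargest(k, scored)
        results[qid] = [(did, score) for score, did in top]
    return results


if __name__ == "__main__":
    hits = retrieve([[1.0, 0.0]], ["q1"],
                    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]], ["a", "b", "c"], k=2)
    print(hits)  # {'q1': [('a', 0.9), ('c', 0.5)]}
```

Brute force costs one dot product per (query, document) pair, which is fine for a corpus the size of SciFact; larger corpora usually call for an approximate index.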
```sh
sh rerank.sh
```
- Download `dev_qrels.txt` from Dropbox.

```sh
python evaluate.py
```
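`evaluate.py` is not shown, so the sketch below is only an assumption about what it computes: the three metrics from the results table (MRR@10, Recall@10, nDCG@10), derived from a ranked run and the qrels. It assumes `dev_qrels.txt` uses the TREC `qid iter docid relevance` layout. Standard tools such as trec_eval may differ in edge cases (tie-breaking, gain function). For binary relevance labels, the linear and exponential nDCG gain conventions coincide.

```python
import math
from collections import defaultdict


def load_qrels(path):
    """Parse qrels lines of the assumed TREC form 'qid iter docid relevance'."""
    qrels = defaultdict(dict)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 4:
                continue  # skip blank or malformed lines
            qid, _, docid, rel = parts
            qrels[qid][docid] = int(rel)
    return qrels


def evaluate(run, qrels, k=10):
    """Compute mean MRR@k, Recall@k and nDCG@k.

    run:   {qid: [docid, ...]} ranked best first (e.g. from the retrieval step)
    qrels: {qid: {docid: relevance}}
    Queries with no positive judgment are skipped; judged queries missing from
    the run score 0 on every metric.
    """
    totals = {"mrr": 0.0, "recall": 0.0, "ndcg": 0.0}
    n = 0
    for qid, judged in qrels.items():
        relevant = {d for d, r in judged.items() if r > 0}
        if not relevant:
            continue
        n += 1
        ranked = run.get(qid, [])[:k]
        # MRR@k: reciprocal rank of the first relevant document in the top k.
        for rank, docid in enumerate(ranked, start=1):
            if docid in relevant:
                totals["mrr"] += 1.0 / rank
                break
        # Recall@k: share of this query's relevant documents found in the top k.
        totals["recall"] += len(relevant.intersection(ranked)) / len(relevant)
        # nDCG@k with linear gain; 0-based position i is discounted by log2(i + 2).
        dcg = sum(max(judged.get(d, 0), 0) / math.log2(i + 2) for i, d in enumerate(ranked))
        ideal = sorted((r for r in judged.values() if r > 0), reverse=True)[:k]
        idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
        totals["ndcg"] += dcg / idcg
    return {f"{name}@{k}": value / n for name, value in totals.items()} if n else {}


if __name__ == "__main__":
    run = {"q1": ["d3", "d1", "d7"]}
    qrels = {"q1": {"d1": 1}}
    print(evaluate(run, qrels))
```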