SciFact

Model	MRR@10	Recall@10	nDCG@10
bge-base-en-v1.5	0.703	0.862	0.744
bge-base-en-v1.5 + fine-tuning	0.757	0.900	0.793
e5-mistral-7b-instruct	0.589	0.748	0.630
e5-mistral-7b-instruct + fine-tuning	0.763	0.940	0.806

Fine-tune the embedding model

sh embed_pairwuse_train.sh
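The README does not say which loss embed_pairwuse_train.sh optimizes. Pairwise embedding fine-tuning is commonly done with a contrastive (InfoNCE) loss over (query, positive passage) pairs, where the other passages in the batch act as negatives. The sketch below shows that objective in plain Python, assuming that setup; the temperature of 0.05 is a hypothetical value, not taken from the script.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_infonce_loss(query_embs, passage_embs, temperature=0.05):
    """Mean contrastive loss where passage i is the positive for query i
    and every other passage in the batch serves as a negative.

    temperature=0.05 is a hypothetical default, not read from the script.
    """
    losses = []
    for i, q in enumerate(query_embs):
        logits = [dot(q, p) / temperature for p in passage_embs]
        # Stable log-sum-exp: subtract the max logit before exponentiating.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching passage as the target class.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy batch: aligned pairs give a near-zero loss, swapped pairs a large one.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = in_batch_infonce_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = in_batch_infonce_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
```

Larger batches supply more negatives per step at no extra labeling cost, which is one reason in-batch negatives are a popular default.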

Optional: fine-tune an LLM-based embedding model instead

sh embed_llm_train.sh
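LLM embedders such as e5-mistral-7b-instruct usually turn a text into one vector by taking the hidden state of the last non-padding token and L2-normalizing it. Assuming embed_llm_train.sh pools the same way (not confirmed by this README), the pooling step looks roughly like this; `last_token_pool` and `l2_normalize` are illustrative names, and the code assumes right padding.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of the last real token in each sequence.

    hidden_states: batch x seq_len x dim (nested lists)
    attention_mask: batch x seq_len of 1 (token) / 0 (padding), right-padded
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last_index = sum(mask) - 1  # valid only for right padding
        pooled.append(states[last_index])
    return pooled


def l2_normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


# One sequence with two real tokens followed by one padding position.
hidden = [[[1.0, 0.0], [0.0, 2.0], [9.0, 9.0]]]
mask = [[1, 1, 0]]
embedding = l2_normalize(last_token_pool(hidden, mask)[0])
```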

Encode the corpus: save an (embedding vector, id) pair for each corpus document. The output can be split across multiple files.
For LLM embedding encoding, remember to use the same instruction as in training.

sh encode_corpus.sh
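A minimal sketch of a multi-file layout, assuming each file stores a pickled (embeddings, ids) tuple; the file names and the `save_shard` / `load_shards` helpers are hypothetical, and encode_corpus.sh may write a different format. Loading concatenates shards in order so every embedding stays aligned with its id.

```python
import os
import pickle
import tempfile


def save_shard(path, embeddings, ids):
    """Write one shard as a pickled (embeddings, ids) tuple."""
    if len(embeddings) != len(ids):
        raise ValueError("each embedding needs exactly one id")
    with open(path, "wb") as f:
        pickle.dump((embeddings, ids), f)


def load_shards(paths):
    """Concatenate (embeddings, ids) from several shard files, in order."""
    all_embeddings, all_ids = [], []
    for path in paths:
        with open(path, "rb") as f:
            embeddings, ids = pickle.load(f)
        all_embeddings.extend(embeddings)
        all_ids.extend(ids)
    return all_embeddings, all_ids


# Usage: two hypothetical shards written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    shard_paths = [os.path.join(tmp, f"corpus.{i}.pkl") for i in range(2)]
    save_shard(shard_paths[0], [[0.1, 0.2]], ["doc-a"])
    save_shard(shard_paths[1], [[0.3, 0.4], [0.5, 0.6]], ["doc-b", "doc-c"])
    corpus_embeddings, corpus_ids = load_shards(shard_paths)
```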

Optional: encode the corpus with the LLM embedding model

sh encode_llm_corpus.sh
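One way to follow the note about using the same instruction is to define the prompt templates once and import them from both the training and the encoding code. The templates below are hypothetical placeholders written in the e5-mistral style ("Instruct: ...\nQuery: ..."); set them to exactly what your training run used.

```python
# Hypothetical templates: set them to exactly what the training run used.
QUERY_TEMPLATE = (
    "Instruct: Given a scientific claim, retrieve documents that support "
    "or refute the claim\nQuery: {text}"
)
PASSAGE_TEMPLATE = "{text}"  # change this if training also prefixed passages


def format_query(text):
    """Wrap a query in the shared instruction template."""
    return QUERY_TEMPLATE.format(text=text)


def format_passage(text):
    """Wrap a corpus document in the shared passage template."""
    return PASSAGE_TEMPLATE.format(text=text)


prompt = format_query("Vitamin D lowers fracture risk.")
```

Keeping a single source for the templates means a change to the instruction cannot silently apply to training but not to encoding.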

Encode the queries: save an (embedding vector, id) pair for each query.
Set the dataset to Tevatron/scifact/dev or Tevatron/scifact/test to choose whether the dev or test queries are encoded.

sh encode_query.sh
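The dataset argument appears to combine a dataset name and a split. Assuming it is interpreted as `<dataset>/<split>`, a hypothetical helper that separates the two could look like this:

```python
def parse_dataset_spec(spec):
    """Split '<dataset>/<split>' into (dataset, split); only dev/test are accepted."""
    dataset, sep, split = spec.rpartition("/")
    if not sep or not dataset or split not in {"dev", "test"}:
        raise ValueError(f"expected '<dataset>/dev' or '<dataset>/test', got {spec!r}")
    return dataset, split


dev_spec = parse_dataset_spec("Tevatron/scifact/dev")
test_spec = parse_dataset_spec("Tevatron/scifact/test")
```

The resulting (name, split) pair maps onto loaders that take the split separately, such as the `split` argument of Hugging Face `datasets.load_dataset`.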

Optional: encode the queries with the LLM embedding model

sh encode_llm_query.sh

Retrieve candidates for each query

sh retrieve.sh
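Dense retrieval ranks corpus documents by the inner product between the query embedding and each document embedding. The sketch below does an exhaustive search in plain Python, assuming the embeddings are already loaded as lists; retrieve.sh may instead use an index such as FAISS, and `retrieve` is an illustrative name.

```python
import heapq


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_embs, query_ids, corpus_embs, corpus_ids, k=10):
    """Exhaustive inner-product search returning the top-k (doc_id, score) per query."""
    results = {}
    for q_emb, q_id in zip(query_embs, query_ids):
        scored = ((dot(q_emb, d_emb), d_id) for d_emb, d_id in zip(corpus_embs, corpus_ids))
        top = heapq.nlargest(k, scored)  # highest score first
        results[q_id] = [(d_id, score) for score, d_id in top]
    return results


run = retrieve(
    [[1.0, 0.0]], ["q1"],
    [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], ["a", "b", "c"],
    k=2,
)
```

With L2-normalized embeddings the inner product equals cosine similarity, so the same search works for both.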

Rerank the retrieved candidates

sh rerank.sh
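Reranking re-scores only the retrieved top-k candidates with a (usually more expensive) scorer, such as a cross-encoder, and re-sorts them. The README does not say which reranker rerank.sh uses, so `score_fn` below is a placeholder, and the word-overlap scorer in the usage example is a toy stand-in.

```python
def rerank(candidates, query_text, doc_texts, score_fn, k=10):
    """Re-score retrieved (doc_id, first_stage_score) pairs and keep the top k."""
    rescored = [(doc_id, score_fn(query_text, doc_texts[doc_id])) for doc_id, _ in candidates]
    # Python's sort is stable, so ties keep their first-stage order.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:k]


def toy_overlap_score(query, doc):
    """Toy stand-in for a cross-encoder: count shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


reranked = rerank(
    candidates=[("a", 0.9), ("b", 0.8)],
    query_text="does vitamin d reduce fracture risk",
    doc_texts={"a": "cats purr when content", "b": "vitamin d reduces fracture risk"},
    score_fn=toy_overlap_score,
)
```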

Evaluate the results

python evaluate.py
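The table reports MRR@10, Recall@10, and nDCG@10. Assuming binary relevance judgments (qrels) and a run that maps each query id to a ranked list of document ids, the metrics can be computed and averaged over queries as below. This is a reference sketch, not evaluate.py itself; conventions such as tie handling or graded relevance may make its numbers differ slightly.

```python
import math


def mrr_at_k(ranked, relevant, k=10):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked, relevant, k=10):
    """Fraction of relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance nDCG with a log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked[:k], start=1) if doc_id in relevant)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(run, qrels, k=10):
    """Average each metric over queries that have at least one relevant document."""
    metrics = {f"mrr@{k}": mrr_at_k, f"recall@{k}": recall_at_k, f"ndcg@{k}": ndcg_at_k}
    qids = [qid for qid, rel in qrels.items() if rel]
    if not qids:
        return {name: 0.0 for name in metrics}
    return {name: sum(fn(run.get(qid, []), qrels[qid], k) for qid in qids) / len(qids)
            for name, fn in metrics.items()}


scores = evaluate(
    run={"q1": ["a", "b", "c"], "q2": ["x", "y"]},
    qrels={"q1": {"b"}, "q2": {"z"}},
)
```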