Open-Retrievals examples

3. RAG

4. Whole pipeline examples

5. FAQ

  1. The grad_norm during training is always zero?
  • consider switching between fp16 and bf16
  • during training, set bf16 or fp16 in TrainingArguments; during inference, set use_fp16=True in AutoModelForEmbedding or LLMRanker (see the sketch below)
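  A minimal sketch of the two settings; the checkpoint name and hyperparameters are illustrative, not prescriptive:

```python
from transformers import TrainingArguments
from retrievals import AutoModelForEmbedding

# Training: enable mixed precision through TrainingArguments,
# not through the model wrapper.
training_args = TrainingArguments(
    output_dir="./embedding-ft",   # illustrative path
    bf16=True,                     # or fp16=True on GPUs without bfloat16 support
    learning_rate=3e-5,
    num_train_epochs=1,
)

# Inference: request fp16 weights from the model wrapper instead.
model = AutoModelForEmbedding.from_pretrained(
    "BAAI/bge-base-en-v1.5",       # example checkpoint
    pooling_method="cls",
    use_fp16=True,
)
embeddings = model.encode(["what is dense retrieval?"])
```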
  2. The fine-tuned embedding performs worse than the original model during inference?
  • check that the pooling_method matches the one used during fine-tuning
  • for LLM-based models, check that the prompt or instruction is exactly the same as during training (see the sketch below)
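  A minimal inference sketch; the checkpoint path, pooling value, and instruction string are assumptions and must match whatever was used during training:

```python
from retrievals import AutoModelForEmbedding

# pooling_method must be the same as in fine-tuning
# (e.g. "cls", "mean", or last-token pooling for LLM-based embedders),
# otherwise retrieval quality degrades silently.
model = AutoModelForEmbedding.from_pretrained(
    "./embedding-ft",          # path to the fine-tuned checkpoint (example)
    pooling_method="mean",     # must match the training configuration
    use_fp16=True,
)

# For LLM-based embedders, reuse the exact training instruction verbatim.
query_instruction = "Retrieve relevant passages that answer the query: "
queries = ["how to fine-tune an embedding model"]
query_embeddings = model.encode([query_instruction + q for q in queries])
```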
  3. How can we fine-tune the BAAI/bge-m3 ColBERT model?
  • open-retrievals supports fine-tuning BAAI/bge-m3 as a ColBERT model directly; just don't set use_fp16=True while fine-tuning, and use a smaller learning_rate (see the sketch below)
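  A training-arguments sketch under those constraints; the output directory, learning rate, and batch size are illustrative, and the ColBERT model itself should be loaded as in the library's ColBERT example:

```python
from transformers import TrainingArguments

# Key points for fine-tuning BAAI/bge-m3 as a ColBERT model:
# keep fp16 off (and do not pass use_fp16=True to the model),
# and use a smaller learning rate than for a plain bi-encoder.
training_args = TrainingArguments(
    output_dir="./bge-m3-colbert-ft",   # illustrative path
    fp16=False,                          # fp16 off, as advised above
    learning_rate=1e-5,                  # smaller than a typical 3e-5 / 5e-5
    num_train_epochs=1,
    per_device_train_batch_size=8,
)
```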
  4. The performance is worse than expected?
  • the collator and the loss must be aligned, especially for triplet training with negative embeddings. The collator provided by open-retrievals yields {query: value, positive: value, negative: value}. Another common collator yields {query: value, document: positive+negative}; if you use that layout, the loss function has to be adapted accordingly (see the sketch below)
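  A schematic sketch of the two batch layouts using random tensors; the loss functions, margin, and temperature are illustrative placeholders, not the library's API:

```python
import torch
import torch.nn.functional as F

# Layout A: the open-retrievals collator keeps separate fields,
# so the loss can use the negative embeddings explicitly.
batch_a = {
    "query": torch.randn(4, 768),      # (batch, dim) query embeddings
    "positive": torch.randn(4, 768),   # one positive per query
    "negative": torch.randn(4, 768),   # one hard negative per query
}

def triplet_loss(batch, margin=0.5):
    # Pull queries toward positives, push them away from negatives.
    pos = F.cosine_similarity(batch["query"], batch["positive"])
    neg = F.cosine_similarity(batch["query"], batch["negative"])
    return F.relu(margin - pos + neg).mean()

# Layout B: positives and negatives are concatenated into one "document"
# tensor; the loss must know that documents[i] is the positive for query i,
# with the rest acting as negatives (e.g. an InfoNCE / cross-entropy loss).
batch_b = {
    "query": torch.randn(4, 768),
    "document": torch.randn(8, 768),   # 4 positives followed by 4 negatives
}

def infonce_loss(batch, temperature=0.05):
    scores = batch["query"] @ batch["document"].T / temperature  # (4, 8)
    labels = torch.arange(batch["query"].size(0))                # positive index = row index
    return F.cross_entropy(scores, labels)
```

  Mixing layout B with a loss written for layout A (or vice versa) silently trains against the wrong targets, which is a common cause of degraded performance.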