Open-Retrievals examples

3. RAG

4. Whole pipeline examples

5. FAQ

  1. The grad_norm during training is always zero?
  • consider switching between fp16 and bf16
  • during training, set bf16 or fp16 in TrainingArguments; during inference, set use_fp16=True in AutoModelForEmbedding or LLMRanker (see the sketch below)
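  A minimal sketch of the two settings; the checkpoint name and hyperparameters are illustrative, not prescriptive:

```python
from transformers import TrainingArguments
from retrievals import AutoModelForEmbedding

# Training: enable mixed precision through TrainingArguments,
# not through the model wrapper.
training_args = TrainingArguments(
    output_dir="./embedding-ft",   # illustrative path
    bf16=True,                     # or fp16=True on GPUs without bfloat16 support
    learning_rate=3e-5,
    num_train_epochs=1,
)

# Inference: request fp16 weights from the model wrapper instead.
model = AutoModelForEmbedding.from_pretrained(
    "BAAI/bge-base-en-v1.5",       # example checkpoint
    pooling_method="cls",
    use_fp16=True,
)
embeddings = model.encode(["what is dense retrieval?"])
```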
  2. The fine-tuned embedding performs worse than the original model during inference?
  • check that the pooling_method matches the one used during fine-tuning
  • for LLM-based models, check that the prompt or instruction is exactly the same as during training (see the sketch below)
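  A minimal inference sketch; the checkpoint path, pooling value, and instruction string are assumptions and must match whatever was used during training:

```python
from retrievals import AutoModelForEmbedding

# pooling_method must be the same as in fine-tuning
# (e.g. "cls", "mean", or last-token pooling for LLM-based embedders),
# otherwise retrieval quality degrades silently.
model = AutoModelForEmbedding.from_pretrained(
    "./embedding-ft",          # path to the fine-tuned checkpoint (example)
    pooling_method="mean",     # must match the training configuration
    use_fp16=True,
)

# For LLM-based embedders, reuse the exact training instruction verbatim.
query_instruction = "Retrieve relevant passages that answer the query: "
queries = ["how to fine-tune an embedding model"]
query_embeddings = model.encode([query_instruction + q for q in queries])
```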
  3. How can we fine-tune the BAAI/bge-m3 ColBERT model?
  • open-retrievals supports fine-tuning BAAI/bge-m3 as a ColBERT model directly; just don't set use_fp16=True while fine-tuning, and use a smaller learning_rate (see the sketch below)
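  A training-arguments sketch under those constraints; the output directory, learning rate, and batch size are illustrative, and the ColBERT model itself should be loaded as in the library's ColBERT example:

```python
from transformers import TrainingArguments

# Key points for fine-tuning BAAI/bge-m3 as a ColBERT model:
# keep fp16 off (and do not pass use_fp16=True to the model),
# and use a smaller learning rate than for a plain bi-encoder.
training_args = TrainingArguments(
    output_dir="./bge-m3-colbert-ft",   # illustrative path
    fp16=False,                          # fp16 off, as advised above
    learning_rate=1e-5,                  # smaller than a typical 3e-5 / 5e-5
    num_train_epochs=1,
    per_device_train_batch_size=8,
)
```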
  4. The performance is worse than expected?
  • the collator and the loss must be aligned, especially for triplet training with negative embeddings. The collator provided by open-retrievals yields {query: value, positive: value, negative: value}. Another common collator yields {query: value, document: positive+negative}; if you use that layout, the loss function has to be adapted accordingly (see the sketch below)
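  A schematic sketch of the two batch layouts using random tensors; the loss functions, margin, and temperature are illustrative placeholders, not the library's API:

```python
import torch
import torch.nn.functional as F

# Layout A: the open-retrievals collator keeps separate fields,
# so the loss can use the negative embeddings explicitly.
batch_a = {
    "query": torch.randn(4, 768),      # (batch, dim) query embeddings
    "positive": torch.randn(4, 768),   # one positive per query
    "negative": torch.randn(4, 768),   # one hard negative per query
}

def triplet_loss(batch, margin=0.5):
    # Pull queries toward positives, push them away from negatives.
    pos = F.cosine_similarity(batch["query"], batch["positive"])
    neg = F.cosine_similarity(batch["query"], batch["negative"])
    return F.relu(margin - pos + neg).mean()

# Layout B: positives and negatives are concatenated into one "document"
# tensor; the loss must know that documents[i] is the positive for query i,
# with the rest acting as negatives (e.g. an InfoNCE / cross-entropy loss).
batch_b = {
    "query": torch.randn(4, 768),
    "document": torch.randn(8, 768),   # 4 positives followed by 4 negatives
}

def infonce_loss(batch, temperature=0.05):
    scores = batch["query"] @ batch["document"].T / temperature  # (4, 8)
    labels = torch.arange(batch["query"].size(0))                # positive index = row index
    return F.cross_entropy(scores, labels)
```

  Mixing layout B with a loss written for layout A (or vice versa) silently trains against the wrong targets, which is a common cause of degraded performance.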