🌐 Homepage | 🤗 Dataset (M-BEIR Benchmark) | 🤗 Checkpoints (UniIR models) | 📖 arXiv | GitHub
This repo contains the codebase for the ECCV 2024 paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers".
- 🔥[2024-04-13]: We highlight valuable concurrent research on training instruction-following, multi-task, multimodal retrievers with late interaction: PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers, by researchers at the University of Cambridge. They also introduce the M2KR benchmark, which can be used to train and evaluate universal multimodal information retrievers. We may combine the M2KR and M-BEIR benchmarks to facilitate progress in this field.
- 🔥[2024-03-18]: Released the UniIR (CLIP_SF) Large and UniIR (BLIP_FF) Large checkpoints: 🤗 Checkpoints
- 🔥[2023-12-21]: Our 🤗 M-BEIR Benchmark is now available for use.
We propose the UniIR (Universal multimodal Information Retrieval) framework to learn a single retriever that can accomplish (possibly) any retrieval task. Unlike traditional IR systems, UniIR follows instructions to take a heterogeneous query and retrieve from a heterogeneous candidate pool with millions of candidates in diverse modalities.
To train and evaluate universal multimodal retrieval models, we build a large-scale retrieval benchmark named M-BEIR (Multimodal BEnchmark for Instructed Retrieval).
We provide the M-BEIR dataset in the 🤗 Dataset. Please follow the instructions on the HF page to download the dataset and prepare the data for training and evaluation. You need to set up Git LFS and clone the repo directly:
git clone https://huggingface.co/datasets/TIGER-Lab/M-BEIR
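If Git LFS has not been initialized on your machine yet, you may need to set it up before cloning; a minimal sketch, assuming the `git-lfs` package is already installed on your system:

```bash
# One-time Git LFS setup, assuming the git-lfs package is already installed
git lfs install
# Then clone the dataset repository as shown above
```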
We provide the codebase for training and evaluating the UniIR CLIP-ScoreFusion, CLIP-FeatureFusion, BLIP-ScoreFusion, and BLIP-FeatureFusion models.
Prepare the codebase of the UniIR project and Conda environment using the following commands:
git clone https://github.com/TIGER-AI-Lab/UniIR
cd UniIR
cd src/models/
conda env create -f uniir_env.yml
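After the environment is created, activate it before launching any training or evaluation script. The environment name below is an assumption based on the file name; the actual name is set by the `name:` field in `uniir_env.yml`:

```bash
# Hypothetical environment name -- check the "name:" field in uniir_env.yml
conda activate uniir
```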
To train the UniIR models from pretrained CLIP and BLIP checkpoints, please follow the instructions below. The scripts will automatically download the pretrained CLIP and BLIP checkpoints.
Please download the M-BEIR benchmark by following the instructions in the M-BEIR section.
cd src/models/uniir_clip/clip_scorefusion/configs_scripts/large/train/inbatch/
Modify `inbatch.yaml` for hyperparameter tuning and `run_inbatch.sh` for your own environment and paths.
- Modify `UNIIR_DIR` in `run_inbatch.sh` to the directory where you want to store the checkpoints (see the sketch after this list).
- Modify `MBEIR_DATA_DIR` in `run_inbatch.sh` to the directory where you store the M-BEIR benchmark.
- Modify `SRC_DIR` in `run_inbatch.sh` to the directory where you store the codebase of the UniIR project (this repo).
- By default, UniIR models are trained on M-BEIR with in-batch negatives; the hard negatives provided by the original datasets are not used.
- We use wandb to log the training process. Please make sure a `.env` file with `WANDB_API_KEY`, `WANDB_PROJECT`, and `WANDB_ENTITY` is set (see the example below).
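As a rough sketch, the three path variables in `run_inbatch.sh` might end up looking like the following; the paths are placeholders, not the script's defaults:

```bash
# Hypothetical paths -- replace with your own directories
UNIIR_DIR="/data/checkpoints/UniIR"   # where checkpoints will be stored
MBEIR_DATA_DIR="/data/M-BEIR"         # where the M-BEIR benchmark was downloaded
SRC_DIR="/home/user/UniIR"            # root of this repository
```

A minimal `.env` file for wandb logging could look like this (the values are placeholders, not real credentials):

```bash
# Hypothetical values -- replace with your own wandb credentials
WANDB_API_KEY=your_wandb_api_key
WANDB_PROJECT=UniIR
WANDB_ENTITY=your_wandb_entity
```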
Then you can run the following command to train the UniIR CLIP_SF Large model.
bash run_inbatch.sh
cd src/models/uniir_blip/blip_featurefusion/configs_scripts/large/train/inbatch/
Modify `inbatch.yaml` for hyperparameter tuning and `run_inbatch.sh` for your own environment and paths.
bash run_inbatch.sh
Similarly, you can train the UniIR CLIP_FF and BLIP_SF models by modifying the corresponding scripts.
We provide the evaluation pipeline for the UniIR models on the M-BEIR benchmark.
Please create an environment for the FAISS library:
# From the root directory of the project
cd src/common/
conda env create -f faiss_env.yml
Please download the M-BEIR benchmark by following the instructions in the M-BEIR section.
You can train the UniIR models from scratch or download the pre-trained UniIR checkpoints by following the instructions in the Model Zoo section.
cd src/models/uniir_clip/clip_scorefusion/configs_scripts/large/eval/inbatch/
Modify `embed.yaml`, `index.yaml`, `retrieval.yaml`, and `run_eval_pipeline_inbatch.sh` for your own environment, paths, and evaluation settings.
- Modify `UNIIR_DIR` in `run_eval_pipeline_inbatch.sh` to the directory where you want to store large files, including the checkpoints, embeddings, index, and retrieval results. If you download our pretrained UniIR model, you can then place the `clip_sf_large.pth` file in the following path: `$UNIIR_DIR/checkpoint/CLIP_SF/Large/Instruct/InBatch/clip_sf_large.pth`. This is the default path specified by `model.ckpt_config` in the `embed.yaml` file (see the sketch after this list).
- Modify `MBEIR_DATA_DIR` in `run_eval_pipeline_inbatch.sh` to the directory where you store the M-BEIR benchmark.
- Modify `SRC_DIR` in `run_eval_pipeline_inbatch.sh` to the directory where you store the codebase of the UniIR project (this repo).
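As a concrete sketch, placing the downloaded checkpoint could look like this; the storage location is a placeholder, and only the subdirectory structure below `$UNIIR_DIR` comes from the default config:

```bash
# Hypothetical storage location -- set UNIIR_DIR to your own large-file directory
export UNIIR_DIR=/data/UniIR_storage
mkdir -p "$UNIIR_DIR/checkpoint/CLIP_SF/Large/Instruct/InBatch"
mv clip_sf_large.pth "$UNIIR_DIR/checkpoint/CLIP_SF/Large/Instruct/InBatch/"
```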
The default configuration will evaluate the UniIR CLIP_SF Large model on both the M-BEIR (5.6M heterogeneous candidate pool) and the M-BEIR_local (homogeneous candidate pool) benchmarks. `UNION` in the YAML files refers to the M-BEIR (5.6M heterogeneous candidate pool). You can follow the comments in the YAML files and modify the configurations to evaluate the model on the M-BEIR_local benchmark only.
bash run_eval_pipeline_inbatch.sh
`embed`, `index`, `logger`, and `retrieval_results` will be saved in the `$UNIIR_DIR` directory.
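After the pipeline finishes, the large-file directory should roughly contain the following subdirectories (a sketch; the exact layout is determined by the YAML configs):

```
$UNIIR_DIR/
├── checkpoint/           # model checkpoints (e.g., clip_sf_large.pth)
├── embed/                # query and candidate embeddings
├── index/                # FAISS indices
├── logger/               # pipeline logs
└── retrieval_results/    # retrieval outputs
```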
cd src/models/uniir_blip/blip_featurefusion/configs_scripts/large/eval/inbatch/
Similarly, if you download our pretrained UniIR model, you can place the `blip_ff_large.pth` file in the following path:
$UNIIR_DIR/checkpoint/BLIP_FF/Large/Instruct/InBatch/blip_ff_large.pth
The default configuration will evaluate the UniIR BLIP_FF Large model on both the M-BEIR and the M-BEIR_local benchmarks.
bash run_eval_pipeline_inbatch.sh
UniRAG evaluation is very similar to the default evaluation with the following differences:
- It stores JSONL files containing queries and their retrieved candidates under `retrieval_results`. This is useful when retrieved results will be used in downstream applications like RAG.
- When `retrieve_image_text_pairs` in `retrieval.yaml` is set to `True`, a complement candidate will be fetched for each candidate with text-only or image-only modality. With this setting, the candidate and its complement will always have `image, text` modality. Complement candidates are fetched by using the original candidates as queries (e.g., query `text` -> candidate `image` -> complement candidate `text`).
- To run evaluations in UniRAG mode, follow the instructions provided above, replacing `InBatch` and `inbatch` with `UniRAG` and `unirag`, respectively (a sketch follows this list).
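For instance, running the CLIP_SF Large evaluation in UniRAG mode would look roughly like this; the directory and script names are inferred from the substitution rule above and are not verified against the repo:

```bash
# Path and script name inferred by replacing "inbatch" with "unirag"
cd src/models/uniir_clip/clip_scorefusion/configs_scripts/large/eval/unirag/
bash run_eval_pipeline_unirag.sh
```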
You can train and evaluate the UniIR CLIP_FF and BLIP_SF models by modifying the corresponding scripts.
We provide the UniIR model checkpoints in the 🤗 Checkpoints. You can directly use the checkpoints for retrieval tasks or fine-tune the models for your own retrieval tasks.
| Model Name | Version | Model Size | Model Link |
|---|---|---|---|
| UniIR (CLIP-SF) | Large | 5.13 GB | Download Link |
| UniIR (BLIP-FF) | Large | 7.49 GB | Download Link |
You can download them by running:
git clone https://huggingface.co/TIGER-Lab/UniIR
- Cong Wei: c58wei@uwaterloo.ca
- Yang Chen: yangc@gatech.edu
- Alan Ritter: alan.ritter@cc.gatech.edu
- Wenhu Chen: wenhuchen@uwaterloo.ca
BibTeX:
@article{wei2023uniir,
  title={UniIR: Training and Benchmarking Universal Multimodal Information Retrievers},
author={Wei, Cong and Chen, Yang and Chen, Haonan and Hu, Hexiang and Zhang, Ge and Fu, Jie and Ritter, Alan and Chen, Wenhu},
journal={arXiv preprint arXiv:2311.17136},
year={2023}
}