VQAScore for Evaluating Text-to-Visual Models

This is an optimized version of the original implementation that improves inference with the default models.

The original implementation, paper, demo, and dataset can be found via the links below:

[VQAScore Page] [VQAScore Demo] [GenAI-Bench Page] [GenAI-Bench Demo] [CLIP-FlanT5 Model Zoo]

From the publication by the authors of the method:

VQAScore significantly outperforms previous metrics such as CLIPScore and PickScore on compositional text prompts, and it is much simpler than prior art (e.g., ImageReward, HPSv2, TIFA, Davidsonian, VPEval, VIEScore) that makes use of human feedback or proprietary models such as ChatGPT and GPT-4Vision.

Quick start

Install the package from a local clone:

git clone https://github.com/linzhiqiu/t2v_metrics
cd t2v_metrics
pip install -e .

or with a single command:

pip install git+https://github.com/404-Repo/vqa_score_repository.git@v1.0.0

An example of how to run inference with the model can be found in vqa_score_tool.py.
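For reference, here is a minimal sketch of scoring an image-text pair, assuming the t2v_metrics API of the original implementation; the model name, image path, and caption below are illustrative:

import t2v_metrics

# Load a VQAScore model (weights are downloaded on first use); 'clip-flant5-xl' is an illustrative choice
clip_flant5_score = t2v_metrics.VQAScore(model='clip-flant5-xl')

# Score how well each image matches each text; a higher score means better alignment
scores = clip_flant5_score(images=['images/example.png'],
                           texts=['a red cube on top of a blue sphere'])
print(scores)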

Notes on GPU usage

GPU usage: By default, this code uses the first CUDA device on your machine. We recommend a GPU with at least 40 GB of memory for the largest VQAScore models, such as clip-flant5-xxl and llava-v1.5-13b. If you have limited GPU memory, consider smaller models such as clip-flant5-xl and llava-v1.5-7b.
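For example, a sketch of pinning the process to a specific GPU via an environment variable and loading a smaller model; the device index and model name are illustrative, and no extra API beyond the constructor shown above is assumed:

import os

# Restrict the process to one GPU before any CUDA initialization (environment-level, illustrative index)
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import t2v_metrics

# A smaller model that fits into less GPU memory than clip-flant5-xxl
score_fn = t2v_metrics.VQAScore(model='clip-flant5-xl')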

Customizing the question and answer template (for VQAScore)

The question and answer templates slightly affect the final score, as shown in the Appendix of the corresponding paper. We provide a simple default template for each model and, for the sake of reproducibility, do not recommend changing it. However, the question and answer can be easily modified if needed. For example, CLIP-FlanT5 and LLaVA-1.5 use the following template, which can be found in t2v_metrics/clip_t5_model/clip_t5_model.py:

# {} will be replaced by the caption
default_question_template = 'Does this figure show "{}"? Please answer yes or no.'
default_answer_template = 'Yes'

You can customize the template by passing the question_template and answer_template parameters into the forward() or batch_forward() functions:

# Use a different question for VQAScore
scores = clip_flant5_score(images=images,
                           texts=texts,
                           question_template='Is this figure showing "{}"? Please answer yes or no.',
                           answer_template='Yes')
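The same parameters can be passed to batch_forward(). Below is a sketch assuming the batched dataset format of the original implementation; the image paths, captions, and batch size are illustrative:

# clip_flant5_score is the VQAScore model constructed as in the quick-start sketch above
dataset = [
    {'images': ['images/0.png'], 'texts': ['a red cube on top of a blue sphere']},
    {'images': ['images/1.png'], 'texts': ['two dogs playing in the snow']},
]
scores = clip_flant5_score.batch_forward(dataset=dataset,
                                         batch_size=16,
                                         question_template='Is this figure showing "{}"? Please answer yes or no.',
                                         answer_template='Yes')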

Citation

If you find the original work by the authors of the paper useful, please cite it as follows:

@article{lin2024evaluating,
  title={Evaluating Text-to-Visual Generation with Image-to-Text Generation},
  author={Lin, Zhiqiu and Pathak, Deepak and Li, Baiqi and Li, Jiayao and Xia, Xide and Neubig, Graham and Zhang, Pengchuan and Ramanan, Deva},
  journal={arXiv preprint arXiv:2404.01291},
  year={2024}
}
