This is an optimized version of the original implementation that improves inference with the default models.
The original implementation, paper, demo, and dataset can be found via the links below:
[VQAScore Page] [VQAScore Demo] [GenAI-Bench Page] [GenAI-Bench Demo] [CLIP-FlanT5 Model Zoo]
- "VQAScore: Evaluating Text-to-Visual Generation with Image-to-Text Generation" (ECCV 2024) [Paper] [HF]
  Zhiqiu Lin, Deepak Pathak, Baiqi Li, Jiayao Li, Xide Xia, Graham Neubig, Pengchuan Zhang, Deva Ramanan
- "GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation" (CVPR 2024, Best Short Paper @ SynData Workshop) [Paper] [HF]
  Baiqi Li*, Zhiqiu Lin*, Deepak Pathak, Jiayao Li, Yixin Fei, Kewen Wu, Tiffany Ling, Xide Xia*, Pengchuan Zhang*, Graham Neubig*, Deva Ramanan* (*Co-first and co-senior authors)
VQAScore significantly outperforms previous metrics such as CLIPScore and PickScore on compositional text prompts. It is also much simpler than prior art (e.g., ImageReward, HPSv2, TIFA, Davidsonian, VPEval, VIEScore), which relies on human feedback or proprietary models such as ChatGPT and GPT-4Vision.
Install the package from the cloned repository folder:

```bash
git clone https://github.com/linzhiqiu/t2v_metrics
cd t2v_metrics
pip install -e .
```

or install it in one line:

```bash
pip install git+https://github.com/404-Repo/vqa_score_repository.git@v1.0.0
```
An example of how to run inference with the model can be found in `vqa_score_tool.py`.
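For a quick start, here is a minimal scoring sketch based on the original t2v_metrics API; the `t2v_metrics.VQAScore` entry point matches the upstream repository, while the image path and caption are placeholder assumptions:

```python
import t2v_metrics

# Load a VQAScore model; 'clip-flant5-xl' is one of the smaller variants mentioned below.
clip_flant5_score = t2v_metrics.VQAScore(model='clip-flant5-xl')

# Score how well each image matches each text; higher is better.
# The image path and caption here are placeholders.
scores = clip_flant5_score(images=['images/0.png'],
                           texts=['a photo of a cat sitting on a laptop'])
print(scores)
```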
GPU usage: By default, this code uses the first CUDA device on your machine. We recommend 40GB GPUs for the largest VQAScore models such as `clip-flant5-xxl` and `llava-v1.5-13b`. If you have limited GPU memory, consider smaller models such as `clip-flant5-xl` and `llava-v1.5-7b`.
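If you are unsure which variant fits your GPU, a simple heuristic like the sketch below can help; it uses only standard PyTorch calls, and the 40GB cutoff mirrors the recommendation above:

```python
import torch

# Pick a VQAScore model based on the memory of the first CUDA device.
# The 40 GB threshold follows the recommendation above for the largest models.
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
model_name = 'clip-flant5-xxl' if total_gb >= 40 else 'clip-flant5-xl'
print(f'{total_gb:.0f} GB available -> using {model_name}')
```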
The question and answer templates slightly affect the final score, as shown in the Appendix of the paper. We provide a simple default template for each model and, for reproducibility, do not recommend changing it. That said, the question and answer can easily be modified. For example, CLIP-FlanT5 and LLaVA-1.5 use the following template, defined in `t2v_metrics/clip_t5_model/clip_t5_model.py`:
```python
# {} will be replaced by the caption
default_question_template = 'Does this figure show "{}"? Please answer yes or no.'
default_answer_template = 'Yes'
```
You can customize the template by passing the `question_template` and `answer_template` parameters into the `forward()` or `batch_forward()` functions:
```python
# Use a different question for VQAScore
scores = clip_flant5_score(images=images,
                           texts=texts,
                           question_template='Is this figure showing "{}"? Please answer yes or no.',
                           answer_template='Yes')
```
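For scoring many image-text pairs at once, the original t2v_metrics API also provides `batch_forward()`; the sketch below assumes that interface, and the dataset entries and batch size are illustrative:

```python
# Hypothetical dataset: each entry pairs candidate images with candidate texts.
dataset = [
    {'images': ['gen0.png', 'gen1.png'], 'texts': ['a red cube on a blue sphere']},
    {'images': ['gen2.png', 'gen3.png'], 'texts': ['two dogs playing in the snow']},
]

# Returns one (n_images x n_texts) score matrix per dataset entry.
scores = clip_flant5_score.batch_forward(dataset=dataset, batch_size=16)
```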
If you find the original work by the authors of the paper useful, please use the following citation:
```bibtex
@article{lin2024evaluating,
  title={Evaluating Text-to-Visual Generation with Image-to-Text Generation},
  author={Lin, Zhiqiu and Pathak, Deepak and Li, Baiqi and Li, Jiayao and Xia, Xide and Neubig, Graham and Zhang, Pengchuan and Ramanan, Deva},
  journal={arXiv preprint arXiv:2404.01291},
  year={2024}
}
```