
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering

Paper | Leaderboard

AutoEval-Video is a comprehensive and challenging benchmark for assessing the capabilities of large vision-language models. The highlights of AutoEval-Video include:

  • AutoEval-Video constructs open-ended video questions across 9 skill dimensions, covering perception, comprehension, and generation capabilities.
  • AutoEval-Video contains newly collected videos from YouTube that cover over 40 distinct themes.
  • Unique evaluation rules are annotated for every instance in AutoEval-Video, enabling accurate assessment by an LLM-based automatic evaluator.

Example Instance and Automatic Evaluation Process in AutoEval-Video.


Statistics of AutoEval-Video: (a) the distribution of skill dimensions and video themes in AutoEval-Video; (b) statistical information about the videos and annotations.

Please refer to our paper for more details about AutoEval-Video.

News

[2023.11.28] The AutoEval-Video Leaderboard is released! You are welcome to submit your model's results.

[2023.11.25] AutoEval-Video is released! Data and evaluation code are now available.

Leaderboard Submission

You are welcome to submit your model's results to the AutoEval-Video Leaderboard. Please ensure your results are prepared in JSON format, similar to prediction_sample.json.
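
For illustration only, below is a minimal sketch of writing such a prediction file in Python. The field names ("id", "prediction") and the output file name are assumptions, not the confirmed schema; prediction_sample.json in this repository is the authoritative reference.

import json

# Hypothetical prediction format: one record per AutoEval-Video instance.
# The field names "id" and "prediction" are assumptions; follow
# prediction_sample.json for the exact schema expected by eval.py.
predictions = [
    {"id": "1", "prediction": "The chef flips the pancake and it lands back in the pan."},
    {"id": "2", "prediction": "The dog drops the ball because a second dog distracts it."},
]

with open("my_model_predictions.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, indent=2, ensure_ascii=False)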

Run Evaluation

Use our evaluation script, eval.py, to generate output.json, which contains your model's evaluation results. Ensure your model's predictions are prepared in JSON format, similar to prediction_sample.json, then run the following evaluation command:

python3 eval.py --rule_path AutoEval-Video.json --pre_path <path_to_your_model_output> --output_dir ./results --ak <your_api_key>

The output.json file contains the accuracy of each instance, while the acc.txt file documents the overall accuracy score.
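
As a hedged sketch of inspecting these results, assuming output.json is a list of per-instance records with an accuracy field and acc.txt holds a single number (both assumptions, since the exact layouts are defined by eval.py):

import json

# Assumed result layout under ./results after running eval.py; the field
# names ("id", "accuracy") are guesses, not the confirmed output.json schema.
with open("./results/output.json", "r", encoding="utf-8") as f:
    per_instance = json.load(f)

for record in per_instance:
    print(record.get("id"), record.get("accuracy"))

# acc.txt documents the overall accuracy score.
with open("./results/acc.txt", "r", encoding="utf-8") as f:
    print("Overall accuracy:", f.read().strip())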

If you find that any evaluation rules are not comprehensive, please feel free to open an issue. We will refine the rules where problems are identified, and the leaderboard results will be updated to reflect these changes.

License

AutoEval-Video is released under Apache License Version 2.0.

Declaration

All videos in AutoEval-Video are collected from YouTube (https://www.youtube.com) under the Creative Commons license (https://support.google.com/youtube/answer/2797468).

Citation

If you find AutoEval-Video useful for your research and applications, please cite using this BibTeX:

@article{chen2023autoevalvideo,
      title={AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering}, 
      author={Xiuyuan Chen and Yuan Lin and Yuchen Zhang and Weiran Huang},
      year={2023},
      journal={arXiv preprint arXiv:2311.14906}
}
