
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering

Paper | Leaderboard

AutoEval-Video is a comprehensive and challenging benchmark for assessing the capabilities of large vision-language models. The highlights of AutoEval-Video include:

  • AutoEval-Video constructs open-ended video questions across 9 skill dimensions, covering perception, comprehension, and generation capabilities.
  • AutoEval-Video contains newly collected videos from YouTube that cover over 40 distinct themes.
  • Unique evaluation rules are annotated for every instance in AutoEval-Video, enabling accurate assessment by an LLM-based automatic evaluator.

Example Instance and Automatic Evaluation Process in AutoEval-Video.


Statistics of AutoEval-Video: (a) the distribution of skill dimensions and video themes in AutoEval-Video; (b) statistical information about the videos and annotations.

Please refer to our paper for more details about AutoEval-Video.

News

[2023.11.28] The AutoEval-Video Leaderboard is released! You are welcome to submit your model's results.

[2023.11.25] AutoEval-Video is released! Data and evaluation code are now available.

Leaderboard Submission

You are welcome to submit your model's results to the AutoEval-Video Leaderboard. Please ensure your results are prepared in JSON format, similar to prediction_sample.json.
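
For illustration only, below is a minimal sketch of writing such a prediction file in Python. The field names ("id", "prediction") and the output file name are assumptions, not the confirmed schema; prediction_sample.json in this repository is the authoritative reference.

import json

# Hypothetical prediction format: one record per AutoEval-Video instance.
# The field names "id" and "prediction" are assumptions; follow
# prediction_sample.json for the exact schema expected by eval.py.
predictions = [
    {"id": "1", "prediction": "The chef flips the pancake and it lands back in the pan."},
    {"id": "2", "prediction": "The dog drops the ball because a second dog distracts it."},
]

with open("my_model_predictions.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, indent=2, ensure_ascii=False)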

Run Evaluation

Use our evaluation script, eval.py, to generate output.json, which contains your model's evaluation results. Ensure your model's predictions are prepared in JSON format, similar to prediction_sample.json, then run the following evaluation command:

python3 eval.py --rule_path AutoEval-Video.json --pre_path <path_to_your_model_output> --output_dir ./results --ak <your_api_key>

The output.json file contains the accuracy of each instance, while the acc.txt file documents the overall accuracy score.
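
As a hedged sketch of inspecting these results, assuming output.json is a list of per-instance records with an accuracy field and acc.txt holds a single number (both assumptions, since the exact layouts are defined by eval.py):

import json

# Assumed result layout under ./results after running eval.py; the field
# names ("id", "accuracy") are guesses, not the confirmed output.json schema.
with open("./results/output.json", "r", encoding="utf-8") as f:
    per_instance = json.load(f)

for record in per_instance:
    print(record.get("id"), record.get("accuracy"))

# acc.txt documents the overall accuracy score.
with open("./results/acc.txt", "r", encoding="utf-8") as f:
    print("Overall accuracy:", f.read().strip())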

If you find that any evaluation rules are not comprehensive, please feel free to open an issue. We will refine the rules where problems are identified, and the leaderboard results will be updated to reflect these changes.

License

AutoEval-Video is released under Apache License Version 2.0.

Declaration

All videos in AutoEval-Video are collected from YouTube (https://www.youtube.com) under the Creative Commons license (https://support.google.com/youtube/answer/2797468).

Citation

If you find AutoEval-Video useful for your research and applications, please cite using this BibTeX:

@article{chen2023autoevalvideo,
      title={AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering}, 
      author={Xiuyuan Chen and Yuan Lin and Yuchen Zhang and Weiran Huang},
      year={2023},
      journal={arXiv preprint arXiv:2311.14906}
}
