Update index.html
SHYuanBest authored Oct 17, 2024
1 parent 6a9c85f commit 89a5212
Showing 1 changed file with 15 additions and 16 deletions: index.html
@@ -312,22 +312,21 @@ <h2 class="subtitle has-text-centered">
<h2 class="title is-3 is-centered">Abstract</h2>
<div class="content has-text-justified">
<p>
-We propose a novel text-to-video (T2V) generation benchmark, <i>ChronoMagic-Bench</i><sup>1</sup>, to evaluate the temporal
-and metamorphic capabilities of the T2V models (e.g., Sora<sup>2</sup> and Lumiere<sup>3</sup>) in time-lapse video generation.
-In contrast to existing benchmarks that focus on visual quality and textual relevance of generated videos, <i>ChronoMagic-Bench</i>
-focuses on the models’ ability to generate time-lapse videos with significant metamorphic amplitude and temporal coherence.
-The benchmark probes T2V models for their physics, biology, and chemistry capabilities, in a free-form text query. For these
-purposes, <i>ChronoMagic-Bench</i> introduces <strong>1,649</strong> prompts and real-world videos as references, categorized
-into four major types of time-lapse videos: biological, human-created, meteorological, and physical phenomena, which are further
-divided into 75 subcategories. This categorization ensures a comprehensive evaluation of the models’ capacity to handle diverse and
-complex transformations. To accurately align human preference with the benchmark, we introduce two new automatic metrics, MTScore
-and CHScore, to evaluate the videos' metamorphic attributes and temporal coherence. MTScore measures the metamorphic amplitude,
-reflecting the degree of change over time, while CHScore assesses the temporal coherence, ensuring the generated videos maintain
-logical progression and continuity. Based on the <i>ChronoMagic-Bench</i>, we conduct comprehensive manual evaluations of ten
-representative T2V models, revealing their strengths and weaknesses across different categories of prompts, and providing a thorough
-evaluation framework that addresses current gaps in video generation research. Moreover, we create a large-scale <i>ChronoMagic-Pro</i>
-dataset, containing <strong>460k</strong> high-quality pairs of 720p time-lapse videos and detailed captions. Each caption ensures high
-physical pertinence and large metamorphic amplitude, which have a far-reaching impact on the T2V generation community.
+We propose a novel text-to-video (T2V) generation benchmark, <i>ChronoMagic-Bench</i>, to evaluate the temporal and metamorphic
+knowledge skills in time-lapse video generation of the T2V models (e.g. Sora and Lumiere). Compared to existing benchmarks that
+focus on visual quality and text relevance of generated videos, <i>ChronoMagic-Bench</i> focuses on the models’ ability to generate
+time-lapse videos with significant metamorphic amplitude and temporal coherence. The benchmark probes T2V models for their physics,
+biology, and chemistry capabilities, in a free-form text control. For these purposes, <i>ChronoMagic-Bench</i> introduces <b>1,649</b>
+prompts and real-world videos as references, categorized into four major types of time-lapse videos: biological, human creation,
+meteorological, and physical phenomena, which are further divided into 75 subcategories. This categorization ensures a comprehensive
+evaluation of the models’ capacity to handle diverse and complex transformations. To accurately align human preference on the benchmark, we
+introduce two new automatic metrics, MTScore and CHScore, to evaluate the videos' metamorphic attributes and temporal coherence. MTScore
+measures the metamorphic amplitude, reflecting the degree of change over time, while CHScore assesses the temporal coherence, ensuring
+the generated videos maintain logical progression and continuity. Based on the <i>ChronoMagic-Bench</i>, we conduct comprehensive manual
+evaluations of eighteen representative T2V models, revealing their strengths and weaknesses across different categories of prompts,
+providing a thorough evaluation framework that addresses current gaps in video generation research. More encouragingly, we create a
+large-scale <i>ChronoMagic-Pro</i> dataset, containing <b>460k</b> high-quality pairs of 720p time-lapse videos and detailed captions.
+Each caption ensures high physical content and large metamorphic amplitude, which have a far-reaching impact on the video generation community.
</p>
</div>
</div>