# Vision Transformer for Video Interpolation 🤖📷✌️

This project implements a state-of-the-art Vision Transformer model for video frame interpolation, used to increase a video's FPS, and compares its performance with traditional approaches such as Deep Voxel Flow and Super SloMo.

## Table of Contents

- [Introduction](#introduction)
- [Demo](#demo)
- [Results](#results)
- [References](#references)
- [License](#license)

## Introduction

Here I implemented the VIFT and Super SloMo models as published in their respective papers (vift, slomo).

- **Deep Voxel Flow**
  - Uses an optical-flow-plus-CNN approach.
  - Unable to handle complex motions.
- **Super SloMo**
  - Replaces explicit optical flow with a flow-interpolation U-Net-like architecture.
  - Computationally expensive.
- **Video Frame Interpolation Transformer (VIFT)**
  - Uses Swin Transformer blocks (Shifted Window Transformer) to reduce the time complexity of attention from quadratic to linear in the number of tokens.
  - Much smaller than Super SloMo while still achieving better performance.
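The quadratic-to-linear claim can be made concrete with a back-of-envelope FLOP count. The sketch below is illustrative only (not code from this repo, and it ignores projection and MLP costs); the window size `M = 8` is an assumed Swin-style default.

```python
def global_attention_cost(H, W, C):
    """Rough cost of full self-attention over an H x W feature map:
    every token attends to every other token, so cost ~ (H*W)^2 * C."""
    N = H * W
    return N * N * C

def window_attention_cost(H, W, C, M=8):
    """Rough cost of Swin-style attention: tokens only attend within
    their own M x M window, so cost ~ (M*M) * (H*W) * C, linear in H*W."""
    N = H * W
    return (M * M) * N * C

# For a 64x64 feature map with 96 channels, windowed attention is
# (H*W) / (M*M) = 4096 / 64 = 64x cheaper on this rough count.
H, W, C = 64, 64, 96
print(global_attention_cost(H, W, C) / window_attention_cost(H, W, C))  # -> 64.0
```

The key point is that the windowed cost grows linearly with the number of tokens `H*W`, so doubling the frame resolution multiplies the attention cost by 4 rather than 16.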

## Demo

## Results

| Model | Parameters (M) | PSNR (peak signal-to-noise ratio) | SSIM (structural similarity index) |
| --- | --- | --- | --- |
| Deep Voxel Flow | – | 27.6 | 0.92 |
| Super SloMo | 38 | 31.4 | 0.94 |
| VIFT | 7 | 35.1 | 0.96 |
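For reference, PSNR can be computed directly from the mean squared error between an interpolated frame and the ground-truth frame. A minimal NumPy sketch (not taken from this repo; assumes 8-bit frames with a peak value of 255):

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10((max_val ** 2) / mse)

# Toy example: a constant per-pixel error of 16 gives MSE = 256,
# so PSNR = 10 * log10(255^2 / 256) ≈ 24.05 dB.
a = np.zeros((4, 4), dtype=np.uint8)
b = np.full((4, 4), 16, dtype=np.uint8)
print(psnr(a, b))
```

Higher is better: each additional 10 dB corresponds to a 10x reduction in MSE, which is why the gap between 31.4 dB (Super SloMo) and 35.1 dB (VIFT) is substantial.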

## References

## License

This project is licensed under the MIT License. See the LICENSE.md file for details.