Monocular Depth Estimation Rankings
and 2D to 3D Video Conversion Rankings

List of Rankings

2D to 3D Video Conversion Rankings

  1. Qualitative comparison of four 2D to 3D video conversion methods: Rank (human perceptual judgment)

Monocular Depth Estimation Rankings

I. Rankings based on temporal consistency metrics

  1. ScanNet++ (98 video clips with 32 frames each): TAE
  2. NYU-Depth V2: OPW<=0.37

II. Rankings based on 3D metrics

  1. Direct pairwise comparison of 9 metric depth models on 5 datasets: F-score

III. Rankings based on 2D metrics

  1. Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.078
  2. NYU-Depth V2: AbsRel<=0.045 (relative depth)
  3. NYU-Depth V2: AbsRel<=0.051 (metric depth)

IV. Old layout - currently no longer up to date

  1. NYU-Depth V2 (640×480): AbsRel<=0.058 (old layout - currently no longer up to date)
  2. DA-2K (mostly 1500×2000): Acc (%)>=86 (old layout - currently no longer up to date)
  3. UnrealStereo4K (3840×2160): AbsRel<=0.04 (old layout - currently no longer up to date)
  4. Middlebury2021 (1920×1080): SqRel<=0.5 (old layout - currently no longer up to date)

Appendices


Qualitative comparison of four 2D to 3D video conversion methods: Rank (human perceptual judgment)

📝 Note: There are no quantitative comparison results for StereoCrafter yet, so this ranking is based on my own perceptual judgment of the qualitative comparison results shown in Figure 7: one output frame (right view) is compared with the corresponding input frame (left view) from the video clip 22_dogskateboarder, and one output frame (right view) is compared with the corresponding input frame (left view) from the video clip scooter-black.

| RK | Model | Venue | Repository | Rank (human perceptual judgment) ↓ (source: StereoCrafter, arXiv) |
|----|-------|-------|------------|--------------------------------------------------------------------|
| 1 | StereoCrafter | arXiv | - | 1 |
| 2-3 | Immersity AI | - | - | 2-3 |
| 2-3 | Owl3D | - | - | 2-3 |
| 4 | Deep3D | ECCV | GitHub | 4 |

Back to Top Back to the List of Rankings

ScanNet++ (98 video clips with 32 frames each): TAE

| RK | Model | Venue | Repository | TAE ↓ {Input fr.} (source: DAV, arXiv) |
|----|-------|-------|------------|-----------------------------------------|
| 1 | Depth Any Video | arXiv | GitHub | 2.1 {MF} |
| 2 | DepthCrafter | arXiv | GitHub | 2.2 {MF} |
| 3 | ChronoDepth | arXiv | GitHub | 2.3 {MF} |
| 4 | NVDS | ICCV | GitHub | 3.7 {4} |

Back to Top Back to the List of Rankings

NYU-Depth V2: OPW<=0.37

| RK | Model | Venue | Repository | OPW ↓ {Input fr.} (source: FD, arXiv) | OPW ↓ {Input fr.} (source: NVDS+, TPAMI) | OPW ↓ {Input fr.} (source: NVDS, ICCV) |
|----|-------|-------|------------|----------------------------------------|-------------------------------------------|-----------------------------------------|
| 1 | FutureDepth | arXiv | - | 0.303 {4} | - | - |
| 2 | NVDS+ | TPAMI | GitHub | - | 0.339 {4} | - |
| 3 | NVDS | ICCV | GitHub | 0.364 {4} | - | 0.364 {4} |
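
For orientation, the sketch below shows a generic flow-based temporal consistency error: the previous depth frame is warped into the current frame with optical flow and the two frames are compared. This is only an illustrative approximation of what warping-based metrics such as OPW and TAE measure, not the exact definitions from the respective papers (normalization, masking and alignment details differ), and the helper names are hypothetical.

```python
# Illustrative flow-warping consistency error for video depth
# (NOT the exact OPW/TAE definitions from the cited papers).
import numpy as np

def warp_previous_depth(depth_prev: np.ndarray, backward_flow: np.ndarray) -> np.ndarray:
    """Warp the previous depth map into the current frame using the backward
    optical flow (current -> previous), with nearest-neighbour sampling."""
    h, w = depth_prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + backward_flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + backward_flow[..., 1]).astype(int), 0, h - 1)
    return depth_prev[src_y, src_x]

def temporal_consistency_error(depths: list, flows: list) -> float:
    """Mean absolute difference between each depth frame and the flow-warped
    previous frame, averaged over a clip."""
    errors = [
        np.mean(np.abs(depths[t] - warp_previous_depth(depths[t - 1], flows[t - 1])))
        for t in range(1, len(depths))
    ]
    return float(np.mean(errors))
```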

Back to Top Back to the List of Rankings

Direct pairwise comparison of 9 metric depth models on 5 datasets: F-score

📝 Note: This ranking is based on data from Table 4. The example result 3:0:2 (the first entry in the first row) means that Depth Pro achieves a better F-score than UniDepth-V on 3 datasets, the same F-score as UniDepth-V on no dataset, and a worse F-score than UniDepth-V on 2 datasets. The sketch after the table shows how such win:tie:loss records can be tallied from per-dataset F-scores.

| RK | Model | Venue | Repository | DP | UD | M3D v2 | DA V2 | DA | ZoD | M3D | PF | ZeD |
|----|-------|-------|------------|----|----|--------|-------|----|-----|-----|----|-----|
| 1 | Depth Pro | arXiv | GitHub | - | 3:0:2 | 3:1:1 | 5:0:0 | 5:0:0 | 5:0:0 | 5:0:0 | 5:0:0 | 3:0:0 |
| 2 | UniDepth-V | CVPR | GitHub | 2:0:3 | - | 4:0:1 | 5:0:0 | 5:0:0 | 5:0:0 | 5:0:0 | 5:0:0 | 3:0:0 |
| 3 | Metric3D v2 ViT-giant | TPAMI | GitHub | 1:1:3 | 1:0:4 | - | 4:1:0 | 5:0:0 | 5:0:0 | 5:0:0 | 5:0:0 | 3:0:0 |
| 4 | Depth Anything V2 | NeurIPS | GitHub | 0:0:5 | 0:0:5 | 0:1:4 | - | 4:1:0 | 4:0:1 | 5:0:0 | 4:0:1 | 3:0:0 |
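
As a small illustration of the note above, the sketch below derives a win:tie:loss record of the same form from per-dataset F-scores. The dataset names and numbers are made up for the example; only the counting logic mirrors how the table entries read.

```python
# Hypothetical example: turn per-dataset F-scores of two models into a
# "wins:ties:losses" record like the entries in the table above.
def head_to_head(fscores_a: dict, fscores_b: dict) -> str:
    """Compare model A against model B on their shared datasets
    (higher F-score is better) and return 'wins:ties:losses' for A."""
    wins = ties = losses = 0
    for dataset in fscores_a.keys() & fscores_b.keys():
        if fscores_a[dataset] > fscores_b[dataset]:
            wins += 1
        elif fscores_a[dataset] == fscores_b[dataset]:
            ties += 1
        else:
            losses += 1
    return f"{wins}:{ties}:{losses}"

# Made-up scores on five datasets: model A wins 3, ties 0, loses 2 -> "3:0:2"
model_a = {"d1": 0.91, "d2": 0.80, "d3": 0.75, "d4": 0.60, "d5": 0.52}
model_b = {"d1": 0.88, "d2": 0.82, "d3": 0.70, "d4": 0.65, "d5": 0.50}
print(head_to_head(model_a, model_b))  # 3:0:2
```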

Back to Top Back to the List of Rankings

Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.078

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (source: MonST3R, arXiv) | AbsRel ↓ {Input fr.} (source: DC, arXiv) |
|----|-------|-------|------------|------------------------------------------------|-------------------------------------------|
| 1 | MonST3R | arXiv | GitHub | 0.063 {MF} | - |
| 2 | DepthCrafter | arXiv | GitHub | 0.075 {MF} | 0.075 {MF} |
| 3 | Depth Anything | CVPR | GitHub | - | 0.078 {1} |

Back to Top Back to the List of Rankings

NYU-Depth V2: AbsRel<=0.045 (relative depth)

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (source: BD, arXiv) | AbsRel ↓ {Input fr.} (source: M3D v2, TPAMI) | AbsRel ↓ {Input fr.} (source: DA, CVPR) | AbsRel ↓ {Input fr.} (source: DA V2, NeurIPS) |
|----|-------|-------|------------|-------------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
| 1-2 | BetterDepth | arXiv | - | 0.042 {1} | - | - | - |
| 1-2 | Metric3D v2 ViT-Large | TPAMI | GitHub | - | 0.042 {1} | - | - |
| 3 | Depth Anything Large | CVPR | GitHub | 0.043 {1} | 0.043 {1} | 0.043 {1} | 0.043 {1} |
| 4 | Depth Anything V2 Large | NeurIPS | GitHub | - | - | - | 0.045 {1} |

Back to Top Back to the List of Rankings

NYU-Depth V2: AbsRel<=0.051 (metric depth)

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (source: M3D v2, TPAMI) | AbsRel ↓ {Input fr.} (source: GRIN, arXiv) |
|----|-------|-------|------------|-----------------------------------------------|---------------------------------------------|
| 1 | Metric3D v2 ViT-giant | TPAMI | GitHub | 0.045 {1} | - |
| 2 | GRIN_FT_NI | arXiv | - | - | 0.051 {1} |
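
The two NYU-Depth V2 rankings above distinguish relative-depth from metric-depth results. The sketch below shows AbsRel and one common way the two evaluation modes differ: relative-depth predictions are defined only up to scale and are typically aligned to the ground truth before scoring (median scaling is assumed here), while metric-depth predictions are scored directly. The exact alignment protocol varies between papers, so this is an illustration rather than the benchmarks' official evaluation code.

```python
# AbsRel = mean(|pred - gt| / gt) over valid pixels; the median scale
# alignment is an assumption used to illustrate relative-depth evaluation.
import numpy as np

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute relative error against ground-truth depth (metric depth)."""
    valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

def abs_rel_median_aligned(pred: np.ndarray, gt: np.ndarray) -> float:
    """AbsRel after median scale alignment, as commonly used for relative depth."""
    valid = gt > 0
    scale = np.median(gt[valid]) / np.median(pred[valid])
    return abs_rel(pred * scale, gt)
```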

Back to Top Back to the List of Rankings

NYU-Depth V2 (640×480): AbsRel<=0.058 (old layout - currently no longer up to date)

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|----|-------|----------------------|------------------|----------------------|-----------------|-------------|
| 1-2 | BetterDepth (arXiv); Backbone: Depth Anything & Marigold | 0.042 {1} (arXiv) | Hypersim & Virtual KITTI | - | - | - |
| 1-2 | Metric3D v2 CSTM_label (ICCV; ENH: arXiv); Backbone: DINOv2 with registers (ViT-L/14) | 0.042 {1} (arXiv) | DDAD & Lyft & Driving Stereo & DIML & Argoverse2 & Cityscapes & DSEC & Mapillary PSD & Pandaset & UASOL & Virtual KITTI & Waymo & Matterport3d & Taskonomy & Replica & ScanNet & HM3d & Hypersim | GitHub | - | - |
| 3 | Depth Anything Large (CVPR); Backbone: DINOv2 (ViT-L/14) | 0.043 {1} (CVPR) | Pretraining: BlendedMVS & DIML & HR-WSI & IRS & MegaDepth & TartanAir; Training: BDD100K & Google Landmarks & ImageNet-21K & LSUN & Objects365 & Open Images V7 & Places365 & SA-1B | GitHub | - | - |
| 4 | MiDaS v3.1 BEiTL-512 (TPAMI; ENH: arXiv); Backbone: BEiT512-L (ViT-L/16) | 0.048 {1} (CVPR) | Pretraining: ReDWeb & HR-WSI & BlendedMVS & NYU-Depth V2 & KITTI; Training: ReDWeb & DIML & 3D Movies & MegaDepth & WSVD & TartanAir & HR-WSI & ApolloScape & BlendedMVS & IRS & NYU-Depth V2 & KITTI | GitHub | - | PyTorch (GitHub) |
| 5 | GeoWizard (arXiv); Backbone: Stable Diffusion v2 | 0.052 {1} (arXiv) | Hypersim & Replica & 3D Ken Burns & Objaverse & proprietary | GitHub | - | - |
| 6 | Marigold (CVPR); Backbone: Stable Diffusion v2 | 0.055 {1} (CVPR) | Hypersim & Virtual KITTI | GitHub | - | - |
| 7 | GenPercept (arXiv); Backbone: Stable Diffusion v2.1 | 0.056 {1} (arXiv) | Hypersim & Virtual KITTI | GitHub | - | - |
| 8 | NeWCRFs + LightedDepth (CVPR; ENH: CVPR) | 0.057 {2} (CVPR) | ENH: NYU-Depth V2 | GitHub; ENH: GitHub | - | - |
| 9 | UniDepth-V (CVPR); Backbone: DINOv2 (ViT-L/14) | 0.0578 {1} (CVPR) | A2D2 & Argoverse2 & BDD100k & CityScapes & DrivingStereo & Mapillary PSD & ScanNet & Taskonomy & Waymo | GitHub | - | - |

Back to Top Back to the List of Rankings

DA-2K (mostly 1500×2000): Acc (%)>=86 (old layout - currently no longer up to date)

| RK | Model | Acc (%) ↑ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|----|-------|------------------------|------------------|----------------------|-----------------|-------------|
| 1 | Depth Anything V2 Giant (CVPR; ENH: arXiv); Backbone: DINOv2 (ViT-G/14) | 97.4 {1} (arXiv) | Pretraining: BlendedMVS & Hypersim & IRS & TartanAir & VKITTI 2; Training: BDD100K & Google Landmarks & ImageNet-21K & LSUN & Objects365 & Open Images V7 & Places365 & SA-1B | GitHub; ENH: GitHub | - | - |
| 2 | GeoWizard (arXiv); Backbone: Stable Diffusion v2 | 88.1 {1} (arXiv) | Hypersim & Replica & 3D Ken Burns & Objaverse & proprietary | GitHub | - | - |
| 3 | Marigold (CVPR); Backbone: Stable Diffusion v2 | 86.8 {1} (arXiv) | Hypersim & Virtual KITTI | GitHub | - | - |

Back to Top Back to the List of Rankings

UnrealStereo4K (3840×2160): AbsRel<=0.04 (old layout - currently no longer up to date)

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|----|-------|----------------------|------------------|----------------------|-----------------|-------------|
| 1 | ZoeDepth +PFR=128 (arXiv; ENH: CVPR) | 0.0388 {1} (CVPR) | ENH: UnrealStereo4K | GitHub; ENH: GitHub | - | - |

Back to Top Back to the List of Rankings

Middlebury2021 (1920×1080): SqRel<=0.5 (old layout - currently no longer up to date)

| RK | Model | SqRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|----|-------|---------------------|------------------|----------------------|-----------------|-------------|
| 1 | LeReS-GBDMF (CVPR; ENH: AAAI) | 0.444 {1} (AAAI) | ENH: HR-WSI | GitHub; ENH: GitHub | - | - |

Back to Top Back to the List of Rankings

Appendix 3: List of all research papers from the above rankings

| Method | Paper | Venue |
|--------|-------|-------|
| BetterDepth | BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation | arXiv |
| ChronoDepth | Learning Temporally Consistent Video Depth from Video Diffusion Priors | arXiv |
| Deep3D | Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks | ECCV |
| Depth Any Video | Depth Any Video with Scalable Synthetic Data | arXiv |
| Depth Anything | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | CVPR |
| Depth Anything V2 | Depth Anything V2 | NeurIPS |
| Depth Pro | Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | arXiv |
| DepthCrafter | DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | arXiv |
| FutureDepth | FutureDepth: Learning to Predict the Future Improves Video Depth Estimation | arXiv |
| GBDMF | Multi-Resolution Monocular Depth Map Fusion by Self-Supervised Gradient-Based Composition | AAAI |
| GenPercept | Diffusion Models Trained with Large Data Are Transferable Visual Models | arXiv |
| GeoWizard | GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image | arXiv |
| GRIN | GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion | arXiv |
| LeReS | Learning to Recover 3D Scene Shape from a Single Image | CVPR |
| LightedDepth | LightedDepth: Video Depth Estimation in light of Limited Inference View Angles | CVPR |
| Marigold | Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | CVPR |
| Metric3D | Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | ICCV |
| Metric3D v2 | Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | TPAMI |
| MiDaS | Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer | TPAMI |
| MiDaS v3.1 | MiDaS v3.1 – A Model Zoo for Robust Monocular Relative Depth Estimation | arXiv |
| MonST3R | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | arXiv |
| NeWCRFs | Neural Window Fully-connected CRFs for Monocular Depth Estimation | CVPR |
| NVDS | Neural Video Depth Stabilizer | ICCV |
| NVDS+ | NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation | TPAMI |
| PatchFusion | PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation | CVPR |
| StereoCrafter | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | arXiv |
| UniDepth | UniDepth: Universal Monocular Metric Depth Estimation | CVPR |
| ZoeDepth | ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | arXiv |

Back to Top Back to the List of Rankings