Junjie Hu, Yan Zhang, Takayuki Okatani, "Visualization of Convolutional Neural Networks for Monocular Depth Estimation," ICCV, 2019. paper
We attempt to interpret CNNs for monocular depth estimation. To this end, we propose to locate the pixels of the input image that are most relevant to depth inference. We formulate this as an optimization problem: identify the smallest number of image pixels from which the CNN can estimate a depth map with minimal difference from the estimate obtained from the entire image.
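The objective above can be sketched as a small NumPy example. This is a minimal illustration, not the paper's implementation: `depth_net` is a toy stand-in for a trained depth CNN, and `lam` is an assumed trade-off weight between fidelity and sparsity (the paper trains a mask prediction network to optimize this kind of objective).

```python
import numpy as np

# Toy stand-in for a trained depth CNN (hypothetical; the paper uses a real
# encoder-decoder network). Any deterministic mapping works for illustration.
def depth_net(img):
    return 0.5 * img + 0.1

def objective(mask, img, lam=0.1):
    """Sketch of the formulation: the depth estimated from the masked input
    should match the depth estimated from the full input (fidelity term),
    while the mask should select as few pixels as possible (L1 sparsity)."""
    d_full = depth_net(img)
    d_masked = depth_net(mask * img)
    fidelity = np.abs(d_masked - d_full).mean()
    sparsity = np.abs(mask).mean()
    return fidelity + lam * sparsity

rng = np.random.default_rng(0)
img = rng.random((8, 8))
full_mask = np.ones_like(img)   # keeps every pixel: zero fidelity cost
zero_mask = np.zeros_like(img)  # drops every pixel: pure fidelity cost
```

With `full_mask` the fidelity term vanishes and only the sparsity penalty remains; with `zero_mask` the roles are reversed, so minimizing the objective trades off the two terms.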
Extensive experimental results show:
- CNNs appear to select edges in input images based not on their strength but on their importance for inference of scene geometry.
- The networks tend to attend not only to the boundary of each individual object but also to its inside region.
- Image regions around vanishing points are important for depth estimation in outdoor scenes.
Please check our paper for more details.
Dependencies:
- python 2.7
- pytorch 0.3.1
Download the trained networks for depth estimation: Depth estimation networks
Download the trained networks for mask prediction: Mask prediction network
Download the NYU-v2 dataset: NYU-v2 dataset
Test:
- python test.py
Train:
- python train.py
If you use the code or the pre-processed data, please cite:
@inproceedings{Hu2019VisualizationOC,
title={Visualization of Convolutional Neural Networks for Monocular Depth Estimation},
author={Junjie Hu and Yan Zhang and Takayuki Okatani},
booktitle={IEEE International Conf. on Computer Vision (ICCV)},
year={2019}
}
@inproceedings{Hu2019RevisitingSI,
title={Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps With Accurate Object Boundaries},
author={Junjie Hu and Mete Ozay and Yan Zhang and Takayuki Okatani},
booktitle={2019 IEEE Winter Conference on Applications of Computer Vision (WACV)},
year={2019}
}