GPU-accelerated single-view passive stereo depth estimation pipeline. The architecture of the pipeline is shown in the image below.
- Real-time DNN based right view generation
- Multiple depth estimation backends
  - Real-time CUDA stereo matching algorithm
  - Group-wise Correlation Stereo Network (GwcNet)
  - MobileStereoNet (MSNet2D & MSNet3D)
- REST API for the entire depth estimation pipeline
The architecture and data flow of the right view synthesis module are shown in the image below.
Multiple depth estimation backends are implemented: the CUDA stereo matching algorithm, GwcNet, and MobileStereoNet. The backend is selected when creating an instance of the `DepthEstimationPipeline` class.
The algorithm consists of 9 steps, each of which can be efficiently implemented for execution on GPUs:
- Input images are converted to grayscale
- Input images are scaled down by a factor of $K$ using mean pooling
- Matching cost volume construction using SAD as the dissimilarity measure
- Multi-block cost function aggregation
- Winner-take-all disparity selection
- Secondary matching based on 1D disparity optimization using cost space parabola fit
- Disparity map upscaling by a factor of $K$
- Vertical disparity fill using bilateral estimation
- Horizontal disparity fill using bilateral estimation
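
The first seven steps above can be illustrated with a small CPU sketch in NumPy. This is not the pipeline's CUDA implementation: the function names, the single square aggregation window (standing in for multi-block aggregation), the three-point parabola refinement (one reading of the "parabola fit" step), and the nearest-neighbour upscale are all assumptions made for illustration, and the two bilateral fill steps are omitted.

```python
import numpy as np

def to_gray(rgb):
    """Step 1: RGB (H, W, 3) -> grayscale (H, W) with BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return rgb.astype(np.float32) @ weights

def mean_pool(img, k):
    """Step 2: downscale by k by averaging non-overlapping k x k tiles."""
    h, w = img.shape[0] // k * k, img.shape[1] // k * k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def sad_cost_volume(left, right, max_disp):
    """Step 3: cost[d, y, x] = |left[y, x] - right[y, x - d]|.
    Positions with no valid match keep a large constant cost."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), 255.0, dtype=np.float32)
    for d in range(max_disp):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    return cost

def box_aggregate(cost, radius):
    """Step 4 (simplified): sum costs over a single square window.
    The real algorithm combines several block shapes."""
    pad = ((0, 0), (radius, radius), (radius, radius))
    padded = np.pad(cost, pad, mode="edge")
    h, w = cost.shape[1:]
    out = np.zeros_like(cost)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out

def wta_subpixel(cost):
    """Steps 5-6 (simplified): winner-take-all, then refine with the vertex
    of a parabola through the costs at d-1, d, d+1. Needs max_disp >= 3."""
    d = cost.argmin(axis=0)
    d_c = np.clip(d, 1, cost.shape[0] - 2)
    yy, xx = np.indices(d.shape)
    c0, c1, c2 = cost[d_c - 1, yy, xx], cost[d_c, yy, xx], cost[d_c + 1, yy, xx]
    denom = c0 - 2.0 * c1 + c2
    safe = np.where(denom > 0, denom, 1.0)
    offset = np.where(denom > 0, 0.5 * (c0 - c2) / safe, 0.0)
    # Keep the integer disparity where the winner sits on the search boundary.
    return np.where(d == d_c, d + offset, d).astype(np.float32)

def upscale(disp, k):
    """Step 7: nearest-neighbour upscale; disparity values scale by k as well."""
    return np.repeat(np.repeat(disp, k, axis=0), k, axis=1) * k

def estimate_disparity(left_rgb, right_rgb, k=2, max_disp=8, radius=2):
    left = mean_pool(to_gray(left_rgb), k)
    right = mean_pool(to_gray(right_rgb), k)
    cost = box_aggregate(sad_cost_volume(left, right, max_disp), radius)
    return upscale(wta_subpixel(cost), k)
```

Note that `max_disp` is expressed at the pooled resolution, so the largest full-resolution disparity the sketch can recover is roughly `k * max_disp`.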
The architecture of the GwcNet model is shown in the image below.
The architecture of the MobileStereoNet model is very similar to that of GwcNet. The main differences are the use of depth-wise separable convolutions instead of regular 3D convolutions and a different method for constructing the combined cost volume from the feature maps of the left and right input images (shown in the image below).
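
To give a rough sense of why depth-wise separable convolutions make a network lighter, the sketch below compares weight counts for a regular 3D convolution and its depth-wise separable counterpart. The channel count (32) and kernel size (3) are illustrative assumptions, not values taken from MobileStereoNet, and biases are ignored.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a regular 3D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3

def separable_conv3d_params(c_in, c_out, k):
    """One k x k x k depth-wise filter per input channel,
    followed by a 1 x 1 x 1 point-wise convolution mixing channels."""
    return c_in * k ** 3 + c_in * c_out

regular = conv3d_params(32, 32, 3)              # 32 * 32 * 27 = 27648
separable = separable_conv3d_params(32, 32, 3)  # 32 * 27 + 32 * 32 = 1888
print(f"regular: {regular}, separable: {separable}, "
      f"reduction: {regular / separable:.1f}x")
```

For this layer shape the separable variant needs roughly 15 times fewer weights; the saving grows with the number of output channels and the kernel size.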
All videos are saved at 10 FPS; however, the actual frame rate of each depth estimation module differs: the stereo matching algorithm runs at 30 FPS, GwcNet at 6 FPS, and MobileStereoNet at 4 FPS.
cuda_2022-09-19_18-28-13.mp4
gwcnet_2022-09-19_18-28-51.mp4
msnet3d_2022-09-19_18-30-28.mp4
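
Because the saved frame rate differs from the processing rate, the videos do not play at real-time speed. The short calculation below, a sketch that assumes every processed frame is written to the video, shows the resulting playback speed for each backend.

```python
SAVED_FPS = 10

# Processing rates quoted above, in frames per second.
processing_fps = {
    "CUDA stereo matching": 30,
    "GwcNet": 6,
    "MobileStereoNet": 4,
}

# Playback speed relative to real time: values below 1 mean slow motion,
# values above 1 mean the video plays faster than it was captured.
playback_speed = {name: SAVED_FPS / fps for name, fps in processing_fps.items()}

for name, speed in playback_speed.items():
    print(f"{name}: {speed:.2f}x real time")
```

Under that assumption the CUDA video plays at about a third of real-time speed, while the GwcNet and MobileStereoNet videos are sped up by roughly 1.67x and 2.5x.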
- Right view synthesis
  - Xie, Junyuan, Ross Girshick, and Ali Farhadi. "Deep3D: Fully automatic 2D-to-3D video conversion with deep convolutional neural networks." European Conference on Computer Vision. Springer, Cham, 2016.
  - Luo, Yue, et al. "Single view stereo matching." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
- CUDA stereo matching
  - Chang, Qiong, and Tsutomu Maruyama. "Real-time stereo vision system: a multi-block matching on GPU." IEEE Access 6 (2018): 42030-42046.
- GwcNet
  - Guo, Xiaoyang, et al. "Group-wise correlation stereo network." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
- MobileStereoNet
  - Shamsafar, Faranak, et al. "MobileStereoNet: Towards lightweight deep networks for stereo matching." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2022.