Evaluating Monocular Absolute Depth Prediction on Various Objects

Introduction

Calculating the distance of an object from a camera is a fundamental machine vision problem. This project estimates that distance from a single image, given only the raw image file and no metadata other than its resolution. The depth estimate is then combined with Mask RCNN to obtain the distance of the target object.

Methods and reference

Paper followed: “Toward Hierarchical Self-Supervised Monocular Absolute Depth Estimation for Autonomous Driving Applications”. The paper proposes a self-supervised method built on the DNet architecture. Relative depth is estimated first, using a dense connected prediction (DCP) layer that hierarchically combines features from different levels and handles local gradients. Scale recovery is then performed with respect to the ground: a scale factor is estimated for the current relative depth map, and absolute depth is calculated pixel-wise from it. GitHub repo: TJ-IPLab/DNet

Basic Idea

The DNet architecture is used here for testing. DNet is a self-supervised monocular depth estimation pipeline that exploits densely connected hierarchical features to obtain more precise object-level depth inference and uses dense geometrical constraints to perform scale recovery. We run a pretrained model on our own custom images to observe how the architecture performs. The steps are: obtain the relative depth from the model, perform scale recovery using the dense geometrical constraints module, and compute the absolute depth. Mask RCNN then detects the target object in the image, and the mean absolute depth over its mask is taken as the object's distance from the camera.
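The last two steps can be sketched as follows. This is a minimal illustration and not the DNet code: the function names `recover_scale` and `object_distance` are hypothetical, the relative depth map and object mask are toy arrays, and the scale factor is assumed to be the ratio of the given camera height to the camera height estimated from the relative depth map.

```python
import numpy as np

def recover_scale(given_height, estimated_height):
    """Scale factor (assumed form): known camera height divided by the
    camera height measured in the relative-depth map."""
    return given_height / estimated_height

def object_distance(rel_depth, mask, scale):
    """Mean absolute depth over the pixels of one object mask."""
    abs_depth = scale * rel_depth      # pixel-wise absolute depth
    return float(abs_depth[mask].mean())

# Toy 4x4 relative-depth map with an "object" in the centre.
rel_depth = np.full((4, 4), 2.0)
rel_depth[1:3, 1:3] = 0.5
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                  # stand-in for a Mask RCNN instance mask

scale = recover_scale(given_height=1.6, estimated_height=0.4)
print(object_distance(rel_depth, mask, scale))
```

Because the mean is taken over every masked pixel, any background pixels that leak into the mask pull the result towards the background depth.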

Current Status

We tested on several indoor and outdoor images. Camera height is a crucial parameter in this model, so we used images taken at various distances and camera heights to see how the deviation in predicted distance changes as the camera height increases or decreases.

For indoor images at very short distances, the model gives moderate results. For outdoor images with different objects, results are very poor whenever Mask RCNN identifies more than one mask within the object. In one instance, supplying the exact camera height gave slightly high deviations, but as we decreased the supplied height, both the predicted distance and the deviation decreased.

The paper uses a statistical method to estimate the camera height, but the method is not clearly described. The scale factor is determined by comparing the given camera height with the estimated one. Because we cannot obtain the camera height reliably and the proper way to measure it is unknown, we get anomalies and errors in the distance prediction. Image resolution is a further issue: we used mobile phone cameras, whereas the authors used on-board cameras. With several factors affecting the deviation, no firm conclusion can be drawn from these results.
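To make the role of the estimated camera height concrete, here is a simplified stand-in for that step. It is not the paper's dense geometrical constraints module. It assumes a pinhole camera whose optical axis is parallel to flat ground, a known focal length `fy` and principal point row `cy` (both hypothetical inputs here), and a boolean `ground_mask` marking ground pixels. Under those assumptions each ground pixel back-projects to a point whose vertical coordinate equals the camera height, and the median over all ground pixels gives a robust estimate.

```python
import numpy as np

def estimate_camera_height(depth, ground_mask, fy, cy):
    """Median vertical offset of back-projected ground pixels.

    Pinhole model with the Y axis pointing down: Y = (v - cy) * Z / fy,
    where v is the pixel row and Z the (relative) depth at that pixel.
    """
    rows = np.arange(depth.shape[0], dtype=np.float64)[:, None]
    y = (rows - cy) * depth / fy
    return float(np.median(y[ground_mask]))

# Synthetic flat ground seen from a camera at relative height 0.5.
H, W, fy, cy, h_rel = 8, 6, 100.0, 3.5, 0.5
rows = np.arange(H, dtype=np.float64)[:, None]
depth = np.full((H, W), 10.0)                    # arbitrary depth above the horizon
ground_mask = np.zeros((H, W), dtype=bool)
ground_mask[4:, :] = True                        # rows below the horizon are ground
depth[4:, :] = fy * h_rel / (rows[4:] - cy)      # depth of a flat ground plane

estimated = estimate_camera_height(depth, ground_mask, fy, cy)
scale = 1.6 / estimated                          # given height / estimated height
print(estimated, scale)
```

If the ground mask is wrong or the camera is tilted, the estimated height shifts and the scale factor, and with it every predicted distance, shifts in proportion.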

Observations

The results above show that the model performs well in some cases and badly in others. The variable that most clearly affects this is the camera height. Both indoor and outdoor images were used, and in both cases the model's performance changes with the camera height supplied. The motorcycle is the clearest example: although the actual camera height was between 150 and 165 cm, the model performed very badly with that value. When we instead supplied an arbitrary value of 45 cm, the deviation dropped from around 300 % to less than 10 %, even though 45 cm was not the correct height.
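One way to read the motorcycle result: if the scale factor is the given camera height divided by the estimated one, the predicted distance is directly proportional to the height supplied. The sketch below (the helper name `rescale_prediction` is hypothetical) applies that proportionality to the 6 m motorcycle measurement reported in the Output section.

```python
def rescale_prediction(predicted, height_used, new_height):
    """Predicted distance after swapping the supplied camera height,
    assuming distance is proportional to that height."""
    return predicted * new_height / height_used

# 6 m motorcycle image: 20.45 m was predicted with a 165 cm camera height.
print(round(rescale_prediction(20.45, 165, 45), 2))  # 5.58
```

This agrees with the 5.58 m reported for the 45 cm run. The 5 m pair matches less closely (about 5.21 m against the reported 5.37 m), which would be expected if the height used for that run was not exactly 165 cm. The check only shows that the predictions scale with the supplied height; it does not indicate which height is correct.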

Output

Object Tag	Camera Height (approx.)	Actual Distance	Predicted Distance	Deviation
Stool	13 cm	1.25 m	1.31 m	4.8 %
Bottle	13 cm	0.90 m	1.09 m	21.1 %
Hydrant	38 cm	3.00 m	3.38 m	12.6 %
Hydrant	37 cm	5.00 m	2.60 m	48 %
Motorcycle	165 cm	5.00 m	19.11 m	294 %
Motorcycle	Random (45 cm)	5.00 m	5.37 m	7.4 %
Motorcycle	165 cm	6.00 m	20.45 m	240 %
Motorcycle	Random (45 cm)	6.00 m	5.58 m	7 %
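The Deviation column appears to be the absolute error relative to the actual distance. A minimal sketch, assuming that definition (the function name `deviation_percent` is hypothetical) and using three rows from the table with distances in metres:

```python
def deviation_percent(actual, predicted):
    """Absolute error as a percentage of the actual distance."""
    return abs(predicted - actual) / actual * 100.0

rows = [
    ("Stool", 1.25, 1.31),
    ("Bottle", 0.90, 1.09),
    ("Hydrant", 5.00, 2.60),
]
for name, actual, predicted in rows:
    print(f"{name}: {deviation_percent(actual, predicted):.1f} %")
```

Because the error is normalised by the actual distance, the same absolute error counts for more on the short indoor shots than on the outdoor ones.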

ACTUAL DISTANCE = 0.90 m PREDICTED DISTANCE = 1.09 m DEVIATION = 21.1 %

ACTUAL DISTANCE = 3.00 m PREDICTED DISTANCE = 3.38 m DEVIATION = 12.6 %

ACTUAL DISTANCE = 6.00 m PREDICTED DISTANCE = 5.58 m DEVIATION = 7 %

Conclusion

The camera height we supply may not be the exact height, and for the same object the model gives varying values at different heights. We have not yet found a way to resolve this. More research into how the authors estimate the camera height is needed in order to move forward.
