-
Your request is very reasonable. I don't know the answer to this, so here are some points which come to mind:
Here is the original paper: but it sounds like you already tried to research this yourself.
You should also look into the other depth estimators (ZoeDepth, LeReS, and so on); I think they already offer higher resolution.
-
Apparently there is a bug in the ControlNet extension right now: Fix 16-bit grayscale control image conversion
-
The depth map image is displayed as a grayscale image, with R, G, and B all being equal. I assume that's the representation used to control the image generation. If so, that means there are only 256 depth values. I was wondering if it might be possible to use the concatenated RGB values to allow for up to 24 bits of precision. I realize there are a number of steps in the process that might prevent this; however, if it were possible to increase the depth precision it would greatly benefit the common situation of a person in the foreground with objects in the distance.
EDIT: I tried to find out whether the MiDaS depth estimator (which is what I believe is used) would support returning a higher-precision result, but didn't understand what I read well enough to answer that question. I did find that it actually returns the reciprocal of the depth (inverse depth), which is somewhat interesting.
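To make the grayscale conversion concrete, here's a minimal sketch of how an inverse-depth prediction typically ends up as an 8-bit image. This is my own illustration, not the extension's actual code; I'm assuming the usual per-image min-max normalization applied to MiDaS-style output, and the function name is mine:

```python
import numpy as np

def inverse_depth_to_uint8(inv_depth):
    """Normalize a float inverse-depth map to [0, 255] and quantize.

    MiDaS-style predictions are inverse depth (larger = closer), and
    the common post-processing min-max normalizes each image before
    writing it out as an 8-bit grayscale picture. That final astype
    is where all precision beyond 1 part in 256 is discarded.
    """
    inv_depth = np.asarray(inv_depth, dtype=np.float64)
    lo, hi = inv_depth.min(), inv_depth.max()
    scaled = (inv_depth - lo) / max(hi - lo, 1e-12)  # map to 0..1
    return np.round(scaled * 255).astype(np.uint8)
```

Whatever float precision the estimator produces internally, after this step only 256 distinct depth levels remain in the saved image.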
EDIT: Since I've gotten no responses to my idea, even to say (rightly or wrongly) it's inane, I thought I'd at least explain my line of thinking.
I don't know a lot about GPUs, but from what I do know, I believe most of their calculations are done in floating-point arithmetic. I therefore think it's likely that the output of the depth-estimator preprocessor, and the input to the ControlNet model that uses the computed depth values, are floating-point numbers. If the data are conveyed via the image, they're converted from floats to 8-bit unsigned integers for storage in the image, and back from integers to floats to be processed by the model.
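That round trip through an 8-bit image can be demonstrated directly. This is just an illustration of the information loss, not the actual extension code:

```python
import numpy as np

def roundtrip_8bit(x):
    """Quantize normalized float depths to uint8 and convert back.

    This models what happens if float depth values travel through a
    grayscale image: any two values closer together than 1/255 can
    collapse to the same integer.
    """
    q = np.round(np.clip(x, 0.0, 1.0) * 255).astype(np.uint8)
    return q.astype(np.float64) / 255.0

# 10,000 distinct input depths survive as only 256 distinct outputs.
vals = np.linspace(0.0, 1.0, 10_000)
recovered = roundtrip_8bit(vals)
print(len(np.unique(recovered)))       # 256
print(np.abs(vals - recovered).max())  # worst-case error ~ 1/510
```

So if the image really is the data path, the generation step never sees more than 256 depth levels, no matter how precise the estimator is.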
Possibly the model directly uses the output of the preprocessor, and the grayscale depth image is only intended as a user-friendly representation of the underlying data. If that's so, the answer to the question in my comment title is "No" -- the full precision of the depth data is already being used.
If, however, the depth image is the output of the preprocessor and the input to the ControlNet model, then unless the precision of the depth preprocessor is less than 1 part in 256 throughout its range, the image generation would likely benefit from a more precise representation of the depth values.
The easy way to do that would be to treat the concatenated RGB components of the image as 24-bit numbers. That would no doubt provide far more precision than required. Perhaps, though, it's considered desirable for the image to be an intuitive representation of the data; if so, representing the depth values becomes more challenging. One suggestion is to use a pseudo-color scheme. For instance, the rainbow-like progression Black->Red->Yellow->Green->Cyan->Blue->Magenta->White could provide seven times the precision, which is 1 part in 1792.
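The 24-bit idea could be sketched like this. To be clear, this is a hypothetical encoding of my own, not something the pipeline currently does, and the function names are mine. Putting the most significant byte in R means the red channel alone still looks like a coarse grayscale depth map:

```python
import numpy as np

def depth_to_rgb24(d):
    """Pack a normalized depth in [0, 1] into 24 bits across R, G, B.

    R holds the most significant 8 bits, B the least significant, so
    viewing just the R channel recovers the familiar 8-bit depth map.
    """
    v = np.uint32(np.clip(d, 0.0, 1.0) * (2**24 - 1))
    return (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF

def rgb24_to_depth(r, g, b):
    """Invert the packing, with worst-case error below 1/(2**24 - 1)."""
    v = (np.uint32(r) << 16) | (np.uint32(g) << 8) | np.uint32(b)
    return float(v) / (2**24 - 1)
```

The pseudo-color rainbow ramp would be the human-readable alternative: far fewer levels (1 part in 1792 rather than 1 in ~16.7 million), but each level still renders as a visually sensible color.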