
Uncertainty regarding the optical center and depth width / height in metadata #94

Closed
gnikoloff opened this issue Oct 2, 2024 · 4 comments


gnikoloff commented Oct 2, 2024

Hey, I am attempting to write a Unity importer for the r3d format. I have been following this example as the basis for my code.

In this example the float32 depth map is resized via NumPy to match the RGB image resolution (720x960). All of the 2D-to-3D reprojection math that follows, involving the focal lengths and optical center extracted from K in the metadata, relies on the depth map matching the RGB image.

In my implementation I would rather not upscale the depth to match the RGB. This way I can deal with 49,152 points per frame (192 * 256) instead of more than half a million points per frame (720 * 960).

So I ported all of the math from the linked example; however, I downscale the camera's optical center by a factor of 3.75 (720 / 192) to bring it into the depth map's resolution range.

Here is my code:

float fx = K[0];
float fy = K[4];
// Downscale the optical center to depth map range
float cx = K[6] / 3.75f;
float cy = K[7] / 3.75f;

int depthMapWidth = 192;
int depthMapHeight = 256;

float halfDepthMapWidth = (float)depthMapWidth * 0.5f;
float halfDepthMapHeight = (float)depthMapHeight * 0.5f;

int pointsCount = depthMapWidth * depthMapHeight;
Vector3[] points = new Vector3[pointsCount];
int ii = 0;
for (int x = 0; x < depthMapWidth; x++) {
  for (int y = 0; y < depthMapHeight; y++) {
    float depth = depthImg[y][x];
    float f_x = ((float)(x) - halfDepthMapWidth - cx) * depth / fx;
    float f_y = ((float)(y) - halfDepthMapHeight - cy) * depth / fy;
    points[ii] = new Vector3(f_x, f_y, depth / 3);
    ii++;
  }
}

Now, this kind of works. Here it is rendered in Unity:

[Screenshot: the resulting point cloud rendered in Unity]

However, other captures break when I do:

float cx = K[6] / 3.75f;
float cy = K[7] / 3.75f;

If I remove the division, the point cloud renders, albeit wrongly offset.

My three questions are:

  1. Can I work in the depth map's resolution space and downscale everything horizontally by 720 / 192 and everything vertically by 960 / 256? Of course, this way I will need to sample roughly every 4th pixel of the RGB image (a stride of 3.75).
  2. Do I have to take the device pose into consideration when reprojecting from 2D back to 3D? The linked example above does not do it.
  3. Please notice how I subtract half of the depth map width and height to offset the points back to the center. Is this correct? Shouldn't cx and cy take care of this? (I tried relying on them alone and the points end up offset into only one quadrant, i.e. the center of the point cloud is shifted away from the center of the viewport.)

Thanks in advance!

@marek-simonik
Owner

Hello,

the problem with your code is twofold:

Problem 1

You should downscale not only the optical center, but also the focal lengths (fx and fy), by the same scale factor. This is the preferred way of downscaling the camera intrinsics:

float scale = (float) depthImgSize.width / rgbImgSize.width;
float fx_depth = K[0] * scale;
float fy_depth = K[4] * scale;
float cx_depth = K[6] * scale;
float cy_depth = K[7] * scale;

Record3D videos can have varying RGB image resolutions, so it is advised to always compute the scale by dividing the depth and RGB image resolutions of the specific video you are working with instead of hardcoding a fixed scale. For the resolutions mentioned above, that gives scale = 192 / 720 = 1 / 3.75, which matches the factor you used.

Problem 2

As you correctly said, subtracting half of the image resolution in addition to subtracting the optical center's coordinates (cx and cy in your code) is wrong/redundant. The 3D point's XY coordinates should be computed this way:

float f_x = ((float)(x) - cx) * depth / fx;
float f_y = ((float)(y) - cy) * depth / fy;

I believe the above answers your 3 questions, but in short:

  1. Yes
  2. No
  3. That's wrong (see Problem 2)

Off-topic, performance-related tip: assuming that depthImg stores pixels in row-major order, I suggest that you swap the two for loops, i.e. iterate over the rows (Y) of the image in the outer for loop and over the columns (X) in the inner for loop. This should make better use of the CPU cache, so the code should be faster (do measure the difference, though):

for (int y = 0; y < depthMapHeight; y++)
  for (int x = 0; x < depthMapWidth; x++)
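
Putting the two fixes and the loop-order tip together, the corrected loop might look like the sketch below. This is only a sketch under the assumptions of this thread (rgbWidth is the RGB image width of your capture, e.g. 720; depthImg, K and the depth map size follow the snippets above; the division of depth by 3 from the original snippet is omitted, since that scaling depends on the scene units you want):

// Scale the intrinsics from RGB resolution down to depth resolution.
float scale = (float)depthMapWidth / rgbWidth; // e.g. 192 / 720 == 1 / 3.75
float fx = K[0] * scale;
float fy = K[4] * scale;
float cx = K[6] * scale;
float cy = K[7] * scale;

Vector3[] points = new Vector3[depthMapWidth * depthMapHeight];
int ii = 0;
for (int y = 0; y < depthMapHeight; y++) {   // rows in the outer loop
  for (int x = 0; x < depthMapWidth; x++) {  // columns in the inner loop
    float depth = depthImg[y][x];
    // No extra half-resolution offset; the scaled cx/cy already center the points.
    float px = (x - cx) * depth / fx;
    float py = (y - cy) * depth / fy;
    points[ii++] = new Vector3(px, py, depth);
  }
}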


gnikoloff commented Oct 3, 2024

Thank you so much for the timely response! I just tried your math with various recordings and it does seem to work :)

One more question, and it is not directly related to Record3D, but I'd still appreciate your input: what is the "industry standard" when it comes to parsing and using the confidence levels? I was thinking about culling any pixel with confidence < 2 (like the linked example from above). Is this what is usually done? Do you use this data somehow for the built-in visualizer in Record3D?

Thanks again!

@marek-simonik
Owner

Apologies for the delay. I'm not sure what the industry standard is for processing the confidence values. They are not used in the in-app viewer of Record3D, but if I were to use them, I would consider doing simple thresholding as per your suggestion.
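
For illustration, such a threshold could look like the sketch below (an assumption-laden sketch rather than Record3D code: confidenceImg is a hypothetical per-pixel array aligned with depthImg, holding the ARKit confidence values 0, 1 and 2, where 2 means high confidence; fx, fy, cx, cy are the depth-scaled intrinsics from earlier in this thread):

// Cull low-confidence depth pixels before unprojecting them.
// Requires System.Collections.Generic for List<T>.
List<Vector3> filteredPoints = new List<Vector3>();
for (int y = 0; y < depthMapHeight; y++) {
  for (int x = 0; x < depthMapWidth; x++) {
    if (confidenceImg[y][x] < 2) {
      continue; // skip pixels that are not high-confidence
    }
    float depth = depthImg[y][x];
    float px = (x - cx) * depth / fx;
    float py = (y - cy) * depth / fy;
    filteredPoints.Add(new Vector3(px, py, depth));
  }
}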

@gnikoloff
Author

Thank you for all the input. My questions are answered, closing this issue.
