
Uncertainty regarding the optical center and depth width / height in metadata #94

Closed
gnikoloff opened this issue Oct 2, 2024 · 4 comments


gnikoloff commented Oct 2, 2024

Hey, I am attempting to write a Unity importer for the r3d format. I have been following this example as the basis for my code.

In this example the float32 depth map is resized via NumPy to match the RGB image resolution (720x960). All of the 2D-to-3D reprojection math that follows, involving the focal lengths and optical center extracted from K in the metadata, relies on the depth map matching the RGB image.

In my implementation I would rather not upscale the depth to match the RGB. This way I can deal with 49,152 points per frame (192 * 256) instead of more than half a million points per frame (720 * 960).

So I ported all of the math from the linked example; however, I downscale the camera's optical center by a factor of 3.75 (720 / 192) to bring it into the depth map's resolution range.

Here is my code:

float fx = K[0];
float fy = K[4];
// Downscale the optical center to depth map range
float cx = K[6] / 3.75f;
float cy = K[7] / 3.75f;

int depthMapWidth = 192;
int depthMapHeight = 256;

float halfDepthMapWidth = (float)depthMapWidth * 0.5f;
float halfDepthMapHeight = (float)depthMapHeight * 0.5f;

int pointsCount = depthMapWidth * depthMapHeight;
Vector3[] points = new Vector3[pointsCount];
int ii = 0;
for (int x = 0; x < depthMapWidth; x++) {
  for (int y = 0; y < depthMapHeight; y++) {
    float depth = depthImg[y][x];
    float f_x = ((float)(x) - halfDepthMapWidth - cx) * depth / fx;
    float f_y = ((float)(y) - halfDepthMapHeight - cy) * depth / fy;
    points[ii] = new Vector3(f_x, f_y, depth / 3);
    ii++;
  }
}

Now, this kind of works. Here it is rendered in Unity:

[Screenshot: the resulting point cloud rendered in Unity]

However, other captures break when I do:

float cx = K[6] / 3.75f;
float cy = K[7] / 3.75f;

If I remove the division, the point cloud renders, albeit wrongly offset.

My three questions are:

  1. Can I work in the depth map's resolution space and downscale everything horizontally by 720 / 192 and everything vertically by 960 / 256? Of course, this way I will need to sample roughly every 4th pixel of the RGB image (a stride of 3.75).
  2. Do I have to take the device pose into consideration when reprojecting from 2D back to 3D? The linked example above does not do it.
  3. Please notice how I subtract half of the depth map width and height to offset the points back to the center. Is this correct? Shouldn't cx and cy take care of this? (I tried relying on them alone and the points end up offset into only one quadrant, i.e. the center of the point cloud is shifted away from the center of the viewport.)

Thanks in advance!

@marek-simonik
Owner

Hello,

the problem with your code is twofold:

Problem 1

You should downscale not only the optical center, but also the focal lengths (fx and fy), by the same scale factor. This is the preferred way of downscaling the camera intrinsics:

float scale = (float) depthImgSize.width / rgbImgSize.width;
float fx_depth = K[0] * scale;
float fy_depth = K[4] * scale;
float cx_depth = K[6] * scale;
float cy_depth = K[7] * scale;

Record3D videos can have varying RGB image resolutions, so it is advised to always compute the scale by dividing the depth and RGB image resolutions of the specific video you are working with instead of hardcoding a fixed scale. For the resolutions mentioned above, that gives scale = 192 / 720 = 1 / 3.75, which matches the factor you used.

Problem 2

As you correctly said, subtracting half of the image resolution in addition to subtracting the optical center's coordinates (cx and cy in your code) is wrong/redundant. The 3D point's XY coordinates should be computed this way:

float f_x = ((float)(x) - cx) * depth / fx;
float f_y = ((float)(y) - cy) * depth / fy;

I believe the above answers your 3 questions, but in short:

  1. Yes
  2. No
  3. That's wrong (see Problem 2)

Off-topic, performance-related tip: assuming that depthImg stores pixels in row-major order, I suggest that you swap the two for loops, i.e. iterate over the rows (Y) of the image in the outer for loop and over the columns (X) in the inner for loop. This should make better use of the CPU cache, so the code should be faster (do measure the difference, though):

for (int y = 0; y < depthMapHeight; y++)
  for (int x = 0; x < depthMapWidth; x++)
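
Putting the two fixes and the loop-order tip together, the corrected loop might look like the sketch below. This is only a sketch under the assumptions of this thread (rgbWidth is the RGB image width of your capture, e.g. 720; depthImg, K and the depth map size follow the snippets above; the division of depth by 3 from the original snippet is omitted, since that scaling depends on the scene units you want):

// Scale the intrinsics from RGB resolution down to depth resolution.
float scale = (float)depthMapWidth / rgbWidth; // e.g. 192 / 720 == 1 / 3.75
float fx = K[0] * scale;
float fy = K[4] * scale;
float cx = K[6] * scale;
float cy = K[7] * scale;

Vector3[] points = new Vector3[depthMapWidth * depthMapHeight];
int ii = 0;
for (int y = 0; y < depthMapHeight; y++) {   // rows in the outer loop
  for (int x = 0; x < depthMapWidth; x++) {  // columns in the inner loop
    float depth = depthImg[y][x];
    // No extra half-resolution offset; the scaled cx/cy already center the points.
    float px = (x - cx) * depth / fx;
    float py = (y - cy) * depth / fy;
    points[ii++] = new Vector3(px, py, depth);
  }
}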


gnikoloff commented Oct 3, 2024

Thank you so much for the timely response! I just tried your math with various recordings and it does seem to work :)

One more question, and it is not directly related to Record3D, but I'd still appreciate your input: what is the "industry standard" when it comes to parsing and using the confidence levels? I was thinking about culling any pixel with confidence < 2 (like the linked example from above). Is this what is usually done? Do you use this data somehow for the built-in visualizer in Record3D?

Thanks again!

@marek-simonik
Owner

Apologies for the delay. I'm not sure what the industry standard is for processing the confidence values. They are not used in the in-app viewer of Record3D, but if I were to use them, I would consider doing simple thresholding as per your suggestion.
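
For illustration, such a threshold could look like the sketch below (an assumption-laden sketch rather than Record3D code: confidenceImg is a hypothetical per-pixel array aligned with depthImg, holding the ARKit confidence values 0, 1 and 2, where 2 means high confidence; fx, fy, cx, cy are the depth-scaled intrinsics from earlier in this thread):

// Cull low-confidence depth pixels before unprojecting them.
// Requires System.Collections.Generic for List<T>.
List<Vector3> filteredPoints = new List<Vector3>();
for (int y = 0; y < depthMapHeight; y++) {
  for (int x = 0; x < depthMapWidth; x++) {
    if (confidenceImg[y][x] < 2) {
      continue; // skip pixels that are not high-confidence
    }
    float depth = depthImg[y][x];
    float px = (x - cx) * depth / fx;
    float py = (y - cy) * depth / fy;
    filteredPoints.Add(new Vector3(px, py, depth));
  }
}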

@gnikoloff
Author

Thank you for all the input. My questions are answered, closing this issue.
