
Questions regarding the paper #24

Open
maciejhalber opened this issue Nov 15, 2024 · 3 comments

Comments


maciejhalber commented Nov 15, 2024

Excellent work with amazing results! Thank you so much for sharing the code, this is an extremely exciting line of work.

This might not be the best place to discuss this, but I was wondering if you could answer the following questions:

  1. The need for a custom solver for Eqn. 4 - Eqn. 4 is a sum of convex functions, so it is itself convex; any minimum we find is therefore a global minimum. I understand that applying an LP solver here might have been deemed too slow at runtime, but what about a couple of iterations of IRLS? Is it too slow / not accurate enough compared to your custom solver?
  2. In Table 5, when comparing the quality of different output representations, was the network retrained to predict depth only, or were these numbers computed by still predicting point maps and keeping only the z-component? Section 4.3 suggests that you did retrain, but I wanted to check.

Thanks so much!

@EasternJournalist
Collaborator

Hi, thank you for your interest and for raising these valuable questions!

  1. I agree that IRLS can be a viable alternative, particularly as a simpler compromise for the L1 minimization problem itself. However, beyond considerations of accuracy and efficiency, there are several additional reasons why we opted not to use an iterative algorithm in our experiments:
  • Uncertain convergence speed. While IRLS can reach a satisfactory solution within acceptable time, the trade-off between precision and speed is problem-dependent: the convergence speed can vary significantly with the input, making consistent performance hard to guarantee. Such non-deterministic runtime and precision are not ideal for large-scale training, and tuning parameters like the tolerance and the number of iterations would require extra effort. In contrast, a custom solver that provides an exact solution within a predictable runtime - comparable to 10~15 IRLS iterations with $N=4096$ - is more desirable.
  • Gradient calculation. During an early training experiment, we observed a critical issue: the scale and shift solutions must retain their gradients. If the gradients of $s$ and $t$ are detached, the network's output scale tends to explode after a few hundred iterations. This happens because a small, unconstrained gradient at each step incrementally increases the output scale, eventually leading to NaNs. Retaining these gradients effectively cancels the overall output-scale gradient, ensuring training stability.
    Our solver naturally retains the computational graph for the solution (using tensor scattering and gathering). However, iterative algorithms like IRLS do not explicitly provide the gradients. While approximations could be developed, they would unnecessarily complicate the problem.
  • Non-convexity of the truncated objective. A major motivation for developing our solver was to handle the truncated objective introduced in the paper: $\min \sum_i \min(w_i|s x_i+t-\hat x_i|, \tau)$. This formulation improves robustness in extreme cases and delivers slight performance gains. Adapting IRLS to this truncated objective is non-trivial due to the inherent non-convexity. Our solver still applies to this problem with a few adaptations.
  2. The network is not retrained for depth estimation. We directly use the z-coordinates of the point map as depth. The raw $z$ coordinates provide affine-invariant depth; for scale-invariant depth, the shift is recovered from the point-map prediction by solving a PnP-like problem (Sec. 3.1).
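For readers weighing the IRLS route discussed above, here is a minimal sketch of what those iterations could look like for the un-truncated objective $\min_{s,t}\sum_i w_i|s x_i + t - \hat x_i|$. This is a hypothetical illustration, not the paper's solver; the function name and defaults are my own:

```python
import numpy as np

def irls_scale_shift(x, y, w=None, n_iters=15, eps=1e-6):
    """Estimate s, t minimizing sum_i w_i * |s*x_i + t - y_i| via IRLS.

    Each iteration solves a weighted least-squares problem whose weights
    1/|residual| approximate the L1 objective (illustrative sketch only).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    s, t = 1.0, 0.0
    for _ in range(n_iters):
        r = s * x + t - y
        ww = w / np.maximum(np.abs(r), eps)  # IRLS reweighting for L1
        A = np.stack([x, np.ones_like(x)], axis=1)
        # Weighted 2x2 normal equations for (s, t)
        AtW = A.T * ww
        s, t = np.linalg.solve(AtW @ A, AtW @ y)
    return s, t
```

Note the runtime caveat from the answer above: how many iterations this needs to match the exact solver depends on the data, and the residual gradients would still need special handling during training.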

@maciejhalber
Author

maciejhalber commented Nov 19, 2024

Thank you so much for the thorough answers. I was quite intimidated by the equations and had talked myself into thinking that IRLS might be sufficient here, but you give compelling reasons to dig into them once more.

If I may be so forward - do you have any ideas on how to use the outputs of MoGe with known intrinsics? I know that you perform optimization to recover the focal length and z-shift, but these do not exactly match my intrinsics, and applying my intrinsics directly to MoGe depth maps leads to some warping. My idea was to take the raw (affine-invariant) predictions from MoGe and solve the following optimization:

Essentially we want to find scale and shift that would minimize the distance between a point and each corresponding ray. Would love to hear your thoughts.
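A minimal sketch of the point-to-ray idea above could look like this (the helper name and interface are hypothetical, not part of the MoGe codebase). Only the shift is solved for here: scaling the scene about the camera center keeps every point on its ray, so rays alone cannot determine the scale.

```python
import numpy as np

def solve_shift_to_rays(points, rays):
    """Least-squares shift t minimizing sum_i ||(I - d_i d_i^T)(p_i + t)||^2,
    i.e. the squared distance from each shifted point to its camera ray.

    points: (N, 3) affine-invariant point-map predictions (assumed input)
    rays:   (N, 3) ray directions from the known intrinsics
    """
    p = np.asarray(points, dtype=float)
    d = np.asarray(rays, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    # Projectors onto each ray's normal plane: P_i = I - d_i d_i^T
    P = np.eye(3)[None, :, :] - d[:, :, None] * d[:, None, :]
    # Residual P_i (p_i + t) is linear in t: stack into one system A t = b
    A = P.reshape(-1, 3)
    b = -np.einsum('nij,nj->ni', P, p).reshape(-1)
    t, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t
```

As the reply below points out, this Euclidean objective may be dominated by distant points, which are farther from their rays for the same angular error.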

@EasternJournalist
Collaborator

Great idea! I agree that recovering the shift with known intrinsics can be very useful in many scenarios. Although the model itself does not take intrinsics as input, the output point cloud can be forced to adapt to the user-provided intrinsics.

Since farther points are distributed more sparsely, the objective will likely be dominated by far-away points if we optimize the Euclidean distance. It might be more effective to optimize the projection error instead:

$$ \min_{t_x,t_y,t_z} \sum_{i=1}^N \left({f(x_i+t_x)\over z_i+t_z} - u_i\right)^2+\left({f(y_i+t_y)\over z_i+t_z} - v_i\right)^2 $$

which is then non-linear. If the principal point is centered, as assumed in our model, the problem is straightforward to solve with $t_x=t_y=0$, similar to the recovery optimization described in the paper, albeit with a fixed focal length. However, for general intrinsics with non-centered principal points, I am not sure whether the optimization would still converge efficiently. I am going to run some tests and integrate this into the inference interface soon.
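For the centered-principal-point case with $t_x=t_y=0$, the remaining one-dimensional search over $t_z$ could be sketched as a bracketed golden-section search. This is an illustrative sketch, not the repository's implementation, and it assumes the objective is unimodal on the given bracket:

```python
import numpy as np

def recover_z_shift(points, uv, f, lo=-10.0, hi=10.0, iters=100):
    """Recover t_z minimizing the projection error
    sum_i (f*x_i/(z_i+t_z) - u_i)^2 + (f*y_i/(z_i+t_z) - v_i)^2
    for a centered principal point and known focal length f.

    Golden-section search over [lo, hi]; assumes the objective is
    unimodal on the bracket (hypothetical helper, names are my own).
    """
    x, y, z = np.asarray(points, dtype=float).T
    u, v = np.asarray(uv, dtype=float).T

    def err(tz):
        w = z + tz
        return np.sum((f * x / w - u) ** 2 + (f * y / w - v) ** 2)

    phi = (np.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if err(c) < err(d):
            b, d = d, c            # keep [a, d]; reuse c as new d
            c = b - phi * (b - a)
        else:
            a, c = c, d            # keep [c, b]; reuse d as new c
            d = a + phi * (b - a)
    return 0.5 * (a + b)
```

A derivative-based 1D method would converge faster, but a bracketed search avoids any step-size tuning for this sketch.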

Besides, I am sorry for any confusion regarding the algorithm details. While the custom solver may appear complex at first glance, the underlying idea is quite simple, almost brute-force. The paper will be updated soon with more concise details and pseudocode to enhance clarity.
