Prediction shape mismatch with GroundTruth #156
Since some operations in the network involve splitting into patches and up/down sampling, you need to ensure that the input's width and height are divisible by 28.
When your input size is (132, 176), this is what happens: … Therefore, the recommended practice is to always include …
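As a concrete illustration of the divisibility constraint above, here is a minimal sketch that pads an image so both height and width become multiples of 28. The helper name `pad_to_multiple` and the choice of edge padding are illustrative assumptions, not the repository's own preprocessing:

```python
import numpy as np

def pad_to_multiple(img, multiple=28):
    """Pad H and W up to the next multiple of `multiple` using edge padding."""
    h, w = img.shape[:2]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad, mode="edge"), (pad_h, pad_w)

# The (132, 176) case from the comment above: both dims get rounded up.
img = np.zeros((132, 176, 3), dtype=np.uint8)
padded, (ph, pw) = pad_to_multiple(img)
print(padded.shape)  # (140, 196, 3)
```

After inference you would crop the prediction back by `(ph, pw)` to recover the original resolution.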
@ZachL1 Hello, thank you for answering my earlier questions about the data-structure requirements of the KITTI data in your training example. I want to fine-tune your model to suit my use case:
@ZachL1 So, after training for 30 epochs on my dataset, I do get a decent drop in loss. I get roughly the same accuracy on test images. But when I save the output images using 'do_test.py', I get output like this:- I looked at the ranges of values in the GT and predicted depth. It seems like the network takes as input GT depth normalized between 0 and 1, while it outputs a prediction normalized between 0 and 200. Does this immediately indicate some mistake I might be making, or is this by design?
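One quick way to sanity-check a range mismatch like the one described above is to print min/max/mean statistics over valid pixels for both tensors. This sketch uses random arrays as stand-ins for the real GT and prediction; the helper name and shapes are hypothetical:

```python
import numpy as np

def summarize(name, arr, mask=None):
    """Print min/max/mean over valid pixels to compare value ranges."""
    vals = arr[mask] if mask is not None else arr.ravel()
    print(f"{name}: min={vals.min():.3f} max={vals.max():.3f} mean={vals.mean():.3f}")

# Hypothetical tensors standing in for the real GT / prediction.
gt = np.random.uniform(0.0, 1.0, (1, 352, 1216))
pred = np.random.uniform(0.0, 200.0, (1, 352, 1216))
valid = gt > 0  # a common convention: zero depth means "no measurement"

summarize("gt", gt, valid)
summarize("pred", pred, valid)
# If the two ranges differ by a roughly constant factor, the canonical-space
# rescaling (metric scale / focal length) is the first place to check.
```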
Hi, @oywenjun11
Hi, @saadehmd I think this might primarily be a visualization issue. The lower left corner seems to be incorrectly predicted as sky. Referring to some existing discussions, you can try clipping the predicted depth first, and then see how the visualization results look.
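A minimal sketch of the clipping idea suggested above. The 0.3–10 m range and the 8-bit grayscale mapping are illustrative assumptions for visualization only, not the project's own rendering code:

```python
import numpy as np

def depth_to_vis(pred_depth, d_min=0.3, d_max=10.0):
    """Clip predicted depth to a plausible range, then map to 8-bit for display."""
    d = np.clip(pred_depth, d_min, d_max)
    d = (d - d_min) / (d_max - d_min)   # normalize clipped depth to [0, 1]
    return (d * 255).astype(np.uint8)

# 80.0 stands in for a spurious "sky" pixel that would otherwise
# compress the rest of the image into a few gray levels.
pred = np.array([[0.1, 5.0], [80.0, 2.0]])
vis = depth_to_vis(pred)
print(vis)
```

Without the clip, one outlier pixel dominates the normalization and the whole scene looks flat, which matches the "incorrectly predicted as sky" symptom described above.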
@ZachL1 Thank you so much for taking time out of your busy schedule to answer my questions; that resolved my doubts. I have recently started using my own dataset to try to fine-tune your model, and I have the following questions:
Hi, @oywenjun11
Hi, I'm trying to use your model on an outdoor scene to get metric depth, and eventually a sense of scale at a specific location, but the depth values are too varied. Can you help me with this? More details here.
@ZachL1
Hi,
Thanks a lot for sharing your work and the instructions for training and inference. I have been trying to train 'dino_vit_small_reg.dpt_raft' on a mini-dataset of my own. I modeled it almost exactly like KITTI, except of course for the differences in original image size, focal_length, and metric_scale. I also don't have any semantic maps, so those are left empty. There are also no pre-computed normals, so those are just arrays of zeros (the way the base dataset initializes them from None).
This is the relevant part of dataset config:-
The depth map I use has all depth values in metric space, and 8 m is the maximum detection range.
The process_depth part:-
I clip between 0.3 - 3.5 m and normalize to (0 - 1) for canonical transformation.
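The clip-and-normalize step described above might look like the sketch below. This is my own hypothetical reconstruction; the real `process_depth` in the codebase takes dataset-specific metadata and may differ:

```python
import numpy as np

def process_depth(depth_m, d_min=0.3, d_max=3.5):
    """Clip metric depth to the trusted range and scale to [0, 1].

    Pixels outside (d_min, d_max) are marked invalid and zeroed, a common
    convention so the loss can mask them out.
    """
    valid = (depth_m > d_min) & (depth_m < d_max)
    depth_norm = np.clip(depth_m, d_min, d_max) / d_max  # d_max maps to 1.0
    depth_norm[~valid] = 0.0
    return depth_norm, valid

depth = np.array([0.1, 1.75, 3.5, 8.0])  # meters
norm, valid = process_depth(depth)
print(norm)  # [0.   0.5  0.   0. ]
```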
Now, I get that this might not be the most desirable dataset with such small images, so I wasn't expecting any impressive results. But at least I was expecting to train on it without introducing errors in data loading / pre-processing. I can't figure out why, but a single forward() pass through the network generates predictions that are not the same shape as the ground truth, so the loss calculation, and hence the training, fails at the very beginning.
Here's a bunch of debug prints from prediction and GT shapes:-
I have also tried disabling almost all the augmentations except HorizontalFlip:-