Preparing batch execution for Zau for the odometry paper #986

Closed
brunofavs opened this issue Sep 20, 2024 · 12 comments
Labels
discussion Issue for discussing a topic

Comments

@brunofavs
Collaborator

This issue is a follow-up to #983, which was getting too long.

We found some issues that were preventing us from successfully calibrating Zau when using dataset splits containing only the later collections. This was fixed in #984: the initial estimate of the pattern poses was being computed before the odometry was corrected.

Regardless, to make sure the problem in #984 is now fixed, I ran calibrations on groups of successive collections and checked whether the results were similar. They were.

The command used to run the calibrations was the following:

export CSF="lambda x: int(x) in range(0,11)"
export DATASET=$ATOM_DATASETS/zau/inesc_day2_5_full/dataset_corrected_with_odometry_and_depth_and_rgb_and_pattern_poses.json && rosrun atom_calibration calibrate \
-json $DATASET -uic -v \
-csf "$CSF" -ssf "lambda x: x in ['rgb_body_left','rgb_body_right','rgbd_hand_color','rgbd_hand_depth','lidar_body']" \
 -ftol 1e-4 -xtol 1e-4 -gtol 1e-4

(The CSF function was the only thing modified between runs)
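For illustration, the per-split CSF values would look something like this; only the first line is copied verbatim from the command above, the others are reconstructed from the split headings below:

export CSF="lambda x: int(x) in range(0,11)"    # collections 0 to 10
export CSF="lambda x: int(x) in range(11,21)"   # collections 11 to 20
export CSF="lambda x: int(x) in range(21,31)"   # collections 21 to 30
export CSF="lambda x: int(x) in range(31,41)"   # collections 31 to 40
export CSF="lambda x: int(x) >= 40"             # collection 40 to the end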

And these were the results:

Collection 0 to 10

+------------+----------------+--------------------+---------------------+----------------------+---------------------+
| Collection | lidar_body [m] | rgb_body_left [px] | rgb_body_right [px] | rgbd_hand_color [px] | rgbd_hand_depth [m] |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+
|  Averages  |     0.0109     |       7.3352       |       10.1972       |        2.9100        |        0.0078       |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+

Collection 11 to 20

+------------+----------------+--------------------+---------------------+----------------------+---------------------+
| Collection | lidar_body [m] | rgb_body_left [px] | rgb_body_right [px] | rgbd_hand_color [px] | rgbd_hand_depth [m] |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+
|  Averages  |     0.0243     |       3.3670       |       27.4123       |       26.8969        |        0.0258       |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+

Collection 21 to 30

+------------+----------------+--------------------+---------------------+----------------------+---------------------+
| Collection | lidar_body [m] | rgb_body_left [px] | rgb_body_right [px] | rgbd_hand_color [px] | rgbd_hand_depth [m] |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+
|  Averages  |     0.0143     |      17.8234       |       11.4869       |        3.7402        |        0.0101       |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+

Collection 31 to 40

+------------+----------------+--------------------+---------------------+----------------------+---------------------+
| Collection | lidar_body [m] | rgb_body_left [px] | rgb_body_right [px] | rgbd_hand_color [px] | rgbd_hand_depth [m] |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+
|  Averages  |     0.0228     |      26.7431       |       22.5269       |        8.9211        |        0.0193       |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+

Collection 40 to end

+------------+----------------+--------------------+---------------------+----------------------+---------------------+
| Collection | lidar_body [m] | rgb_body_left [px] | rgb_body_right [px] | rgbd_hand_color [px] | rgbd_hand_depth [m] |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+
|  Averages  |     0.0123     |       7.0928       |        7.5322       |        6.1527        |        0.0097       |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+

The results show some disparity between them, but this is expected, as each run uses only 10 collections and some of those are still bad collections.


Running with all the collections

Afterwards, I did successive runs, filtering out the bad collections, and we can see that the error is lower now:

Filtered collections : [19,24,31,33,47,0,12,34]

(screenshot of the residuals table for the run with the filtered collections)

In comparison to the last comment made in #983:

lidar_body: 0.0137 m -> 0.0107 m (3 mm better)
rgb_body_left: 5.6301 px -> 5.3352 px (improves slightly)
rgb_body_right: 8.1524 px -> 4.6795 px (improves greatly)
rgbd_hand_color: 5.2223 px -> 3.2660 px (improves)
rgbd_hand_depth: 0.0086 m -> 0.0079 m (improves ever so slightly)

Total error in px: 18.9 px -> 13.4 px

Also, we can see in the printed table that there are no significant outliers, so further filtering should not be necessary.

With this, I'm pretty sure we are ready to move to the batch executions. What do you think @miguelriemoliveira @Kazadhum?

@miguelriemoliveira
Member

> Regardless, to make sure the problem in #984 is now fixed, I ran calibrations on groups of successive collections and checked whether the results were similar. They were.

This is great news. That means #984 is working well.

> Total error in px: 18.9 px -> 13.4 px

> Also, we can see in the printed table that there are no significant outliers, so further filtering should not be necessary.

This seems ok. Please create a new dataset without these collections.

> With this, I'm pretty sure we are ready to move to the batch executions. What do you think @miguelriemoliveira @Kazadhum?

One additional request we could perhaps look at: the normalizer of the rgb errors w.r.t. the normalizer for the metric sensors.

Now we get results like 1 centimeter in distance sensors, and 5 pixels in rgb sensors.
Perhaps we would prefer results like 1 pixel and 2 centimeters.

I created a new flag in calibrate called rgb_normalizer_multiplier which will do this.
If you set the normalizer multiplier to < 1.0, it will increase the weight of the errors in pixels.
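For illustration, a hedged sketch of how this could be invoked; the short flag -rnm is the form used in the runs further down, and the rest of the command just mirrors the one above:

# Values below 1.0 scale down the rgb normalizer, increasing the weight of the pixel errors.
rosrun atom_calibration calibrate -json $DATASET -uic -v \
    -csf "$CSF" -ssf "lambda x: x in ['rgb_body_left','rgb_body_right','rgbd_hand_color','rgbd_hand_depth','lidar_body']" \
    -ftol 1e-4 -xtol 1e-4 -gtol 1e-4 -rnm 0.5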

@brunofavs can you try to see if we get (even) better results?

@miguelriemoliveira
Member

Pushed with the wrong issue number: 52af175

@Kazadhum
Collaborator

Hi @miguelriemoliveira and @brunofavs!

I think we can continue towards batch executions, as the results are indeed better than they were a week ago. But I don't think we're quite there yet.

First, if you can achieve even better results by experimenting with the normalizer multiplier, then I think that's a great idea (and let us know which value you used, because it might yield some interesting conclusions!)

But secondly, what we're looking at in these tables are residuals. These being low tells us the calibration is probably good, but we still need to evaluate the results. I'd say that, before running batch executions, you should run some experiments where you randomly split the dataset into train and test datasets and then run the evaluations. Effectively, this is what each run in the batch executions will be doing. I think it would be good to make sure the evaluations are returning good results before moving on to batch executions.

So, to summarize, I'd do this:

  1. Create a new "clean" or, rather, filtered dataset, without collections [19,24,31,33,47,0,12,34];
  2. Write down the command you will use for splitting the dataset, calibrating the train dataset and then evaluating it (this will be similar to the template file for the batch executions; see the sketch after this list);
  3. Run it, and if you get good results, then I think we can move on to batches!
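
A possible sketch of item 2, assuming the train/test split is expressed with the same -csf collection filters that calibrate already accepts, and using the filtered dataset path that appears further down the thread. The split scheme and the TRAIN_CSF/TEST_CSF variable names are only illustrative, and the evaluation step is left as a placeholder since the exact evaluation command was still being discussed below:

export DATASET=$ATOM_DATASETS/zau/filtered/dataset_corrected_with_odometry_and_depth_and_rgb_and_pattern_poses_filtered.json
# Illustrative deterministic split (a random split could be generated the same way):
export TRAIN_CSF="lambda x: int(x) % 5 != 0"   # roughly 80% of the collections for calibration
export TEST_CSF="lambda x: int(x) % 5 == 0"    # remaining collections held out for evaluation

# Calibrate on the train collections only.
rosrun atom_calibration calibrate -json $DATASET -uic -v \
    -csf "$TRAIN_CSF" \
    -ssf "lambda x: x in ['rgb_body_left','rgb_body_right','rgbd_hand_color','rgbd_hand_depth','lidar_body']" \
    -ftol 1e-4 -xtol 1e-4 -gtol 1e-4

# Then evaluate the calibrated result on the test collections (placeholder: the specific
# evaluation script and its flags depend on the sensor pair being evaluated).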

Do you agree @miguelriemoliveira @brunofavs?

@miguelriemoliveira
Member

Hey,

I think @Kazadhum gave a very good suggestion. We need to see if the evaluations are ok before moving to the batch processing.

@brunofavs
Collaborator Author

Hey, sorry for the absence over the past 2 days, I had some personal things to deal with.

> Now we get results like 1 centimeter in distance sensors, and 5 pixels in rgb sensors.
> Perhaps we would prefer results like 1 pixel and 2 centimeters.

Yeah 1 pixel and 2 centimeters looks more appealing in a paper than 5 pixels and 1 centimeter for sure.

>   • Create a new "clean" or, rather, filtered dataset, without collections [19,24,31,33,47,0,12,34];
>   • Write down the command you will use for splitting the dataset, calibrating the train dataset and then evaluating it (this will be similar to the template file for the batch executions);
>   • Run it, and if you get good results, then I think we can move on to batches!

I agree with what @Kazadhum said as well, and I will be doing the tasks he summarized today.

I will give feedback as soon as I have something to show :)

brunofavs added a commit that referenced this issue Sep 25, 2024
This script works in a counter-intuitive way. The -csf function should
select the collections the user wants to remove, not the ones they want
to keep, because the name is "remove_collections_from_atom_dataset", not
"preserve_collections_from_atom_dataset".
@brunofavs
Collaborator Author

Hey.

I filtered the dataset already.
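For reference, a hedged sketch of how the filtering could have been done with the remove_collections_from_atom_dataset script mentioned in the commit above. The -csf behaviour comes from that commit message; the exact invocation and remaining flags are assumptions:

# Per the commit above, -csf selects the collections to REMOVE, not the ones to keep.
rosrun atom_calibration remove_collections_from_atom_dataset \
    -json $ATOM_DATASETS/zau/inesc_day2_5_full/dataset_corrected_with_odometry_and_depth_and_rgb_and_pattern_poses.json \
    -csf "lambda x: int(x) in [19,24,31,33,47,0,12,34]"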

About the normalizer multiplier.

export DATASET=$ATOM_DATASETS/zau/filtered/dataset_corrected_with_odometry_and_depth_and_rgb_and_pattern_poses_filtered.json && rosrun atom_calibration calibrate \
-json $DATASET -uic -v \
-ssf "lambda x: x in ['rgb_body_left','rgb_body_right','rgbd_hand_color','rgbd_hand_depth','lidar_body']" \
 -ftol 1e-6 -xtol 1e-6 -gtol 1e-6 -rnm 1

These first results are with ftol, xtol and gtol at 1e-6 to quickly assess whether the normalizer works. Even like this, each calibration already takes around 16 minutes (running simultaneously).

-rnm 1 (in theory without any effect)

  • Normalizer for lidar3d: 0.199703513281753
  • Normalizer for depth: 0.14716268425435777
  • Normalizer for rgb: 164.97977481201676
+------------+----------------+--------------------+---------------------+----------------------+---------------------+
| Collection | lidar_body [m] | rgb_body_left [px] | rgb_body_right [px] | rgbd_hand_color [px] | rgbd_hand_depth [m] |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+
|  Averages  |     0.0107     |       4.9642       |        4.4450       |        3.2032        |        0.0079       |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+

-rnm 0.5

Modifiers:

  • Normalizer for depth: 0.14716268425435783
  • Normalizer for rgb: 82.48988740600838
  • Normalizer for lidar3d: 0.19970351328175295
+------------+----------------+--------------------+---------------------+----------------------+---------------------+
| Collection | lidar_body [m] | rgb_body_left [px] | rgb_body_right [px] | rgbd_hand_color [px] | rgbd_hand_depth [m] |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+
|  Averages  |     0.0107     |       4.0851       |        3.5622       |        3.0086        |        0.0080       |
+------------+----------------+--------------------+---------------------+----------------------+---------------------+

The normalizer multiplier seems to be working: with -rnm 0.5 the rgb normalizer is halved (164.98 -> 82.49) and the pixel errors dropped accordingly. Interestingly, the metric errors didn't seem to change; I expected them to increase as a tradeoff.

About the evaluations.

I was talking with @Kazadhum about which evaluations to do, since I don't think the full evaluation script is working well, but we didn't come to any good conclusion.

With the 5 sensors we have, there are already 10 possible pairwise combinations, so I don't know if it's worth testing every single one of them. I suggested trying one evaluation between every possible modality, but @Kazadhum argued that it might not be conclusive due to the very different poses of the different sensors.

What is your opinion on this @miguelriemoliveira ?

@miguelriemoliveira
Member

miguelriemoliveira commented Sep 26, 2024 via email

@brunofavs
Collaborator Author

Hey @miguelriemoliveira @Kazadhum !

About the evaluations let's talk in person.

Are you both available to discuss this sometime Monday or Tuesday?

> How about 0.1 or 0.01? What are the results in those cases?

Will do these soon.

@Kazadhum
Collaborator

Hi @brunofavs! I can meet on either of those days!

@miguelriemoliveira
Member

Good morning. How about today Monday at 14h?

@Kazadhum
Collaborator

Good morning! Sounds good to me!

@brunofavs
Collaborator Author

Works for me as well.

Let's do it at 14h then.

@brunofavs brunofavs added the discussion Issue for discussing a topic label Oct 3, 2024
@brunofavs brunofavs mentioned this issue Oct 3, 2024