
Bad Softbot's calibration results with perfect simulation #950
Open
brunofavs opened this issue May 9, 2024 · 16 comments

Labels: bug Something isn't working

@brunofavs
Collaborator

brunofavs commented May 9, 2024

Problems

  • When calibrating the softbot system without any perturbations, we are getting ~1 px errors instead of something near 0.

  • Adding noise is also unreliable: more noise increases the error up to a certain threshold, beyond which the error magically disappears.

What was tried already:

  • Commenting out the noise functions doesn't change anything:
    addNoiseToInitialGuess(dataset, args, selected_collection_key)
    addBiasToJointParameters(dataset, args)
    addNoiseFromNoisyTFLinks(dataset, args, selected_collection_key)
  • Testing with a smaller prior dataset doesn't help
  • Calibrating a simpler system like the RRbot works well

What I need to test

  • Another dataset on softbot
  • Limiting the sensors with -ssf, might be something to do with the LiDAR

If those don't work

I'll post a branch here with a bagfile + dataset to make an MWE (minimal working example) so other people can help out.

Tagging @miguelriemoliveira @manuelgitgomes for visibility

Updates

Using just the RGB cameras, like in RRbot, doesn't help either, leading me to believe it is either a faulty dataset or a bug in calibrate.

Table -> Calibration results with the previous, simpler dataset used in atom_examples

+------------+------------------------+-------------------------+
| Collection | front_left_camera [px] | front_right_camera [px] |
+------------+------------------------+-------------------------+
|    000     |         2.7308         |          0.5883         |
|    001     |         1.2917         |          0.6410         |
|    002     |         0.7326         |          0.6489         |
|    003     |         0.6055         |          0.4272         |
|  Averages  |         1.3401         |          0.5764         |
+------------+------------------------+-------------------------+
@brunofavs brunofavs added the bug Something isn't working label May 9, 2024
@brunofavs brunofavs self-assigned this May 9, 2024
brunofavs added a commit that referenced this issue May 9, 2024
New minimal dataset, issue persists
@brunofavs
Collaborator Author

The branch issue950 has an MWE to test this issue.

The configuration file is updated with the new bagfile.

From this link you can download the dataset and bagfile.

To recreate :

Calibrating odometry :

rosrun atom_calibration calibrate -json $ATOM_DATASETS/softbot/issue950/dataset_corrected.json -v -ss 1

Without calibrating odometry :

rosrun atom_calibration calibrate -json $ATOM_DATASETS/softbot/issue950/dataset_corrected.json -v -ss 1 -atsl 'lambda x : x in []'

This is a completely perfect simulation.

We start with reprojection errors of around 3px and finish off at around 0.3px.
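
Side note on the -csf/-atsl flags: these strings are evaluated as Python lambdas to select items by name, roughly like the sketch below (illustrative only, not the actual ATOM code):

    # Roughly how a collection selection function (-csf) behaves (illustrative only).
    collections = ['000', '001', '002', '031']
    csf = eval("lambda x: int(x) in [0, 2, 3]")   # string passed on the command line
    print([c for c in collections if csf(c)])     # ['000', '002']

    # With -atsl 'lambda x : x in []' nothing matches, so no additional transforms
    # (i.e. the odometry) get calibrated -- hence the "without calibrating odometry" run.
    atsl = eval("lambda x : x in []")
    print([t for t in ['world-base_footprint'] if atsl(t)])   # []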

@brunofavs
Collaborator Author

brunofavs commented May 9, 2024

I found out that removing the problematic collection 31 from the long dataset solves the first issue. With a perfect simulation, calibrating odometry (this is 264 parameters):

| Averages | 2.5161 | 1.9251 | 0.0060 |

Without calibrating odometry (this is 24 parameters):

| Averages | 0.4000 | 0.3731 | 0.0057 |

The optimizer might be falling into a local minimum with this many parameters, but I'm not sure.

@manuelgitgomes told me this does make sense due to overfitting.
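
As a rough back-of-the-envelope on where the extra parameters come from (this breakdown is my assumption, not something taken from the code):

    # Assuming 6 pose parameters (tx, ty, tz, roll, pitch, yaw) per calibrated transform:
    params_with_odom = 264
    params_without_odom = 24
    extra = params_with_odom - params_without_odom   # 240
    print(extra / 6)   # 40.0 -> consistent with one extra 6-DOF transform per collection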

@brunofavs
Collaborator Author

It seems that when not calibrating odometry, the issue of the noise disappearing after a threshold is also gone.

(attached image of the results)

What I still don't get is why some results with noise in the odometry, while calibrating the odometry, are great, whereas others skyrocket to absurd values.

@brunofavs
Collaborator Author

#952 was likely another thing throwing us off.

@brunofavs
Collaborator Author

@miguelriemoliveira I will use the script I made today in #951 to automate all sorts of insightful plots so we can better analyze the data.

Are you available to meet on Monday to talk about it? I should have all the plots ready by then.

@brunofavs
Collaborator Author

Here are all the plots computed from the batch.

plots.zip

I couldn't plot anything like "without calibrating odom vs calibrating odom" because I don't have enough data; I didn't run many experiments without calibrating odometry. The few I have do show that calibrating a system with noisy odometry leads to pretty awful results.

I'm not sure if I should rerun all these experiments without calibrating odometry.

brunofavs added a commit that referenced this issue May 12, 2024
This is better because if the fold lists are defined with strings it's limiting on the lambdas. The user has to use whichever string identifier the fold lists aren't using, and has no idea until the program crashes when evaluating the lambdas.
@miguelriemoliveira
Member

Hi @brunofavs ,

we can talk Monday morning if you want.

9h or later?

@brunofavs
Collaborator Author

Sounds good to me.

Zoom or at Lar?

@miguelriemoliveira
Member

miguelriemoliveira commented May 12, 2024 via email

@brunofavs
Collaborator Author

OK, Zoom it is.
See ya tomorrow then :)

@brunofavs
Collaborator Author

Hey @miguelriemoliveira @manuelgitgomes

I've been running a lot of experiments in the terminal trying to figure this out. The suspicion that we might be on the "flat" part of the curve, which led to the apparently static results in plots such as the RGB calibration ones, was confirmed.

The thing is, I'm getting odd behaviors.

When running this command (x and y as placeholders for the nig/ntfv values, respectively):

clear && rosrun atom_calibration calibrate \
-json $ATOM_DATASETS/softbot/long_train_dataset1/dataset_corrected.json \
-v \
-ss 1 -nig x x \
-ntfv y y \
-ntfl "world:base_footprint" \
-csf 'lambda x: int(x) in [0, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 17, 18]' \
-ctgt \
-atsf "lambda x : x not in []" \
-ftol 1e-4 -xtol 1e-4 -gtol 1e-4
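
As a side note, a minimal sketch of how I'd automate this sweep (hypothetical helper script; the flags are copied from the command above, with x/y substituted per run):

    import itertools
    import subprocess

    # Same flags as the command above (clear and -v dropped for batch runs).
    base = ("rosrun atom_calibration calibrate "
            "-json $ATOM_DATASETS/softbot/long_train_dataset1/dataset_corrected.json "
            "-ss 1 "
            "-ntfl \"world:base_footprint\" "
            "-csf 'lambda x: int(x) in [0, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 17, 18]' "
            "-ctgt "
            "-atsf \"lambda x : x not in []\" "
            "-ftol 1e-4 -xtol 1e-4 -gtol 1e-4")

    nig_values = [0.1, 0.2, 0.4, 0.6, 1.0]   # x: sensor pose noise
    ntfv_values = [0.0, 0.3, 0.7]            # y: odometry noise

    for nig, ntfv in itertools.product(nig_values, ntfv_values):
        cmd = f"{base} -nig {nig} {nig} -ntfv {ntfv} {ntfv}"
        subprocess.run(cmd, shell=True, check=True)  # shell=True so $ATOM_DATASETS expands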

I'm getting this table for y (ntfv) = 0 and x (nig) varying:

Not calibrating odometry here

| nig | ctgt    | residuals |
|-----|---------|-----------|
| 0.1 | 0.00251 | 1.00      |
| 0.2 | 0.00565 | 1.13      |
| 0.4 | 0.00530 | 1.11      |
| 0.6 | 0.00578 | 1.05      |
| 1.0 | 0.00680 | 1.17      |
| 2   | 0.00581 | 1.07      |
| 5   | 0.00573 | 1.08      |
| 10  | 0.009   | 0.16      |
| 15  | 0.0099  | 0.09      |

It would be a weak assumption to say it converges for all feasible nig values, and this behavior is very odd to me. We won't see any plot other than either a horizontal line for the ctgt or an even weirder one for the residuals, which would be a line with a negative slope, not making a lot of sense.


Adding a fixed noise to the odometry and again varying the sensor pose noise (nig), I get consistently similar behavior:

Calibrating odometry here

| nig | ctgt    | residuals |
|-----|---------|-----------|
| 0.1 | 0.013   | 0.88      |
| 0.2 | 0.0095  | 0.6       |
| 0.4 | 0.0092  | 0.38      |
| 0.6 | 0.00985 | 0.38      |
| 1.0 | 0.00987 | 0.24      |
| 10  | 0.010   | 0.17      |

ctgt is somewhat constant and the residuals are declining.

Note: this experiment isn't the same as the plot we discussed in the morning (varying ntfv for a fixed nig); here I am varying nig for a given ntfv value. I'll talk about that case in the next section.


Varying ntfv for a given nig = 0.3 :

Calibrating odometry here

| ntfv | ctgt    | residuals |
|------|---------|-----------|
| 0.1  | 0.0085  | 0.45      |
| 0.2  | 0.0098  | 0.8       |
| 0.4  | 0.00906 | 1.58      |
| 0.6  | 0.00988 | 1.3       |
| 0.65 | 0.09    | 2.25      |
| 0.7  | 0.1     | 1.77      |
| 0.75 | 0.1     | 1.97      |
| 1    | 0.3     | 15000     |

Here the limit is more visible. At ntfv ≈ 0.7, for nig = 0.3, the results start to deteriorate quickly. The residuals paint a similar story.


For different values of nig other than 0.3, does the optimizer also break at ntfv = 0.7?

Once hitting the threshold, the error curves seem to skyrocket.

  • For nig = 0.1, ntfv = 0.77 showed a perfect calibration
  • For nig = 0.1, ntfv = 0.79, the error was already 0.46, almost 5 times bigger than the initial guess.

I do see that data with higher nig reaches the ntfv point where the optimizer can't handle it anymore sooner. That's expected. However, I expected this behavior change to be more linear and less abrupt.

In this plot:

(attached plot: error curves as a function of ntfv, one line per nig value)

I was hoping to make the point that for a certain value of ntfv we could actually see the lines in the correct order, meaning higher nig lines would be higher on the graph. If I could catch the zone of the behavior change in the plot, I could make that point. I'm afraid it won't be very visible, as the behavior is so abrupt that it will just spike upwards and that's it.

My worries

I could eventually use a logarithmic scale on my plots, but that would require a lot more data in the threshold zone. I would need increments like 0.77-0.772-0.774....0.79 to actually have the possibility of seeing something.

That would take an absurd amount of data and time to reach meaningful conclusions.
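
Still, just for reference, a minimal sketch of what such a log-scale plot could look like (the nig = 0.3 values are copied from the ntfv table above; the other curves would come from the batch):

    import matplotlib.pyplot as plt

    # ctgt error per ntfv value, one curve per nig; nig = 0.3 copied from the table above.
    ctgt_by_nig = {
        0.3: {0.1: 0.0085, 0.2: 0.0098, 0.4: 0.00906, 0.6: 0.00988,
              0.65: 0.09, 0.7: 0.1, 0.75: 0.1, 1.0: 0.3},
    }

    for nig, curve in sorted(ctgt_by_nig.items()):
        ntfv, ctgt = zip(*sorted(curve.items()))
        plt.plot(ntfv, ctgt, marker='o', label=f'nig = {nig}')

    plt.yscale('log')   # log scale keeps the pre- and post-threshold regimes both visible
    plt.xlabel('ntfv [m/rad]')
    plt.ylabel('ctgt error [m]')
    plt.legend()
    plt.show()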

I could also just make the assumption that for all feasible noise values the optimizer can deal with the problem.

I'm not sure what to do.

@brunofavs
Collaborator Author

brunofavs commented May 13, 2024

I will run a batch with -ftol 1e-4 -xtol 1e-4 -gtol 1e-4, 1 run, stratified shuffle split 70/30, 1 split, with:

  • Data near the thresholds
  • All results with and without calibrating odometry

Will post the plots later.

@miguelriemoliveira
Member

Going step by step through your comments:

It would be a weak assumption to say it converges for all feasible nig values, and this behavior is very odd to me. We won't see any plot other than either a horizontal line for the ctgt or an even weirder one for the residuals, which would be a line with a negative slope, not making a lot of sense.

Well, the ctgt is increasing as it should. The residuals are overfitting; I would not worry about those.
Also, nig 5 or 10 is too much. For angles it is more than 360 degrees!

Adding a fixed noise to odometry and varying again the sensor pose noise (nig), I get a consistent similar behavior :

I don't think I agree with this comparison; it's not apples to apples, because before you were not calibrating odometry and now you are.

In any case, also here I see nothing wrong. CTGT is increasing as it should.

Varying ntfv for a given nig = 0.3

also nothing strange here

However I expected this behavior change to be more linear and less abrupt.

Not sure you have any grounds to be expecting that. It could be abrupt or progressive; there is really no way of saying one is wrong and the other correct.

I was hoping to make the point that for a certain value of ntfv we could actually see the lines in the correct order, meaning higher nig lines would be higher on the graph. If I could catch the zone of the behavior change in the plot, I could make that point. I'm afraid it won't be very visible, as the behavior is so abrupt that it will just spike upwards and that's it.

This is still a bit strange to me, but remember we are seeing residuals, and those are not reliable.
If you do the graph with nig, I think the results would make sense.

I could eventually use a logarithmic scale on my plots but that would require a lot more data in the threshold zone. I would need increments like 0.77-0.772-0.774....0.79 to actually have the possibility of seeing something.

Too complicated.

That would take an absurd amount of data and time to reach meaningful conclusions.

Right, do not do it.

I could also just make the assumption that for all feasible noise values the optimizer can deal with the problem.
I'm not sure what to do.

I think the graph with ctgt values would make sense.

The conclusion to draw from these is that we should not trust the residuals, only the ctgt or, if we do not have ground truth, the evaluations (hopefully).

@brunofavs
Collaborator Author

Well, the ctgt is increasing as it should. The residuals are overfitting; I would not worry about those.
Also, nig 5 or 10 is too much. For angles it is more than 360 degrees!

I mean, they do increase as they should. But they increase so little that it makes me unconfident. Yep, I know anything past 2π is "useless"; I was really just exhausting possibilities because everything seemed to converge.

I don't think I agree with this comparison; it's not apples to apples, because before you were not calibrating odometry and now you are.
In any case, also here I see nothing wrong. CTGT is increasing as it should.

Is it increasing? It starts at 0.013 and finishes at 0.010. From 0.013 it went down twice, then up three times, to 0.010.

also nothing strange here

Here I do agree, for a fixed nig (while calibrating odom), increasing ntfv increases CTGT as expected, regardless of the residuals being overfitted. This is a great conclusion.

Not sure you have any grounds to be expecting that. It could be abrupt or progressive; there is really no way of saying one is wrong and the other correct.

I really have none, to be fair. The model is a black box, and I was expecting some linearity because there is some for lower values. I was wrong.

I think the graph with ctgt values would make sense.
The conclusion to draw from these is that we should not trust the residuals, only the ctgt or, if we do not have ground truth, the evaluations (hopefully).

I agree.

@brunofavs
Collaborator Author

brunofavs commented May 14, 2024

Update on this :

I will run a batch with -ftol 1e-4 -xtol 1e-4 -gtol 1e-4, 1 run, stratified shuffle split 70/30, 1 split, with:

I ran this batch. There was an odd error in the LiDAR evaluation, so I took it out (I will create an issue for it --> #957).

The main things to note are:

  1. The plots with the error relative to the ground truth (ctgt)
  2. The noise plots in the zone where the optimizer fails

Starting with (1)

Note for all the plots below: the Y label should've said "Error (m)"; that was my bad.

Calibrating Odometry

More nig, more error -> makes total sense

More odom noise (ntfv), more error, up until the point where the optimizer can't handle it anymore; that also makes sense

Not calibrating Odometry

Without the extra noise on the odometry, the optimizer can take a lot more nig noise. For all the data points in this plot the calibration results are great. Even for huge noise values it wouldn't fail; it failed only at 20 m/rad nig, which is outrageous and unrealistic.

The oscillation is amplified by the y-axis limits; the values only vary by about 1 mm.

Here the results seem a little more odd, even though they also spike upwards at 0.7; I will elaborate more in the next section. Here the error is a whole order of magnitude larger. I would argue that from the point where the error is way bigger than the starting point, the optimizer is already lost and the results aren't reliable. I would ask @manuelgitgomes to help me with a more scientific explanation for this. We discussed it today, but I'm failing to find the right words to properly explain it.

Optimizer spiking at 0.7

If we zoom in on the second plot of the previous section:

We can see that, generally, the higher-noise curves spike sooner. I'm not sure whether this is even a useful conclusion.

But what I find rather odd is that every curve spikes at roughly the same point (ntfv = 0.7 m/rad). I don't have a clear explanation for this.

The 'ctgt' tables printed at the end of the calibration always seemed to match.

There is something that I will correct for the final version, though. Right now the ctgt is saving a file with only the averages.

This is a little problematic for two reasons: there is an anchored sensor with error 0, and there is also the odometry error weighing in, so I'm not exactly getting the final error of each sensor (see the sketch after the example table below).

Example table to clarify my point.

+------------------------------------------------------+-------------------------+---------+---------+-------------+------------+
|                      Transform                       |       Description       | Et0 [m] |  Et [m] | Rrot0 [rad] | Erot [rad] |
+------------------------------------------------------+-------------------------+---------+---------+-------------+------------+
|  front_left_camera_link-front_left_camera_rgb_frame  |    front_left_camera    | 0.00000 | 0.00000 |   0.00000   |  0.00000   |
| front_right_camera_link-front_right_camera_rgb_frame |    front_right_camera   | 0.20000 | 0.20000 |   0.20000   |  0.20000   |
|         lidar3d_plate_link-lidar3d_base_link         |         lidar3d         | 0.20000 | 0.20000 |   0.20000   |  0.20000   |
|                 world-base_footprint                 | world_to_base_footprint | 0.75000 | 0.75000 |   0.75000   |  0.75000   |
|                       Averages                       |           ---           | 0.28750 | 0.28750 |   0.28750   |  0.28750   |
+------------------------------------------------------+-------------------------+---------+---------+-------------+------------+
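
A minimal sketch of what saving per-transform errors (instead of only the averages) could look like; the values are copied from the example table above, and the file and field names are just placeholders:

    import csv

    # (description, Et [m], Erot [rad]) copied from the example table above.
    rows = [
        ('front_left_camera',       0.00000, 0.00000),   # anchored sensor, error is always 0
        ('front_right_camera',      0.20000, 0.20000),
        ('lidar3d',                 0.20000, 0.20000),
        ('world_to_base_footprint', 0.75000, 0.75000),   # odometry, not a sensor
    ]

    with open('ctgt_per_sensor.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['description', 'Et [m]', 'Erot [rad]'])
        writer.writerows(rows)

    # The saved 0.28750 average mixes the anchored sensor and the odometry in;
    # averaging only the non-anchored sensors tells a different story:
    sensors = [r for r in rows if r[0] in ('front_right_camera', 'lidar3d')]
    print(sum(r[1] for r in sensors) / len(sensors))   # 0.2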

Final results

Adding up the calibrations with and without odometry, this will easily take 2 to 3 days to compute on my PC.
@miguelriemoliveira do you think it's worth stripping down the data yaml now that we know roughly which plots are more important, or should I invest the time in doing everything in case they prove useful later on?

Also, I'm not sure whether to do something before this big batch. Ideally I wanted it to be the final one, so I can focus the rest of the time on the writing.

@miguelriemoliveira
Member

But what I find rather odd is that every curve spikes at roughly the same point (ntfv = 0.7 m/rad). I don't have a clear explanation for this.

I think there is a bug with the nig or something. This very well defined limit suggests it.

This is a little problematic for two reasons: there is an anchored sensor with error 0, and there is also the odometry error weighing in, so I'm not exactly getting the final error of each sensor.

Right, it would be nice to show separately for each sensor.

Also, I'm not sure whether to do something before this big batch. Ideally I wanted it to be the final one, so I can focus the rest of the time on the writing.

My suggestion: wait a bit and let's have the meeting tomorrow, then talk on Friday.
In the meantime you can start writing...

Great job by the way...
