Step-like offset to IMU2 gyro mid-flight leads to EKF2 destabilization and crash. #12477

Open
lekston opened this issue Oct 2, 2019 · 12 comments

@lekston
Contributor

lekston commented Oct 2, 2019

Bug report

Issue details
This issue has happened to 3 aircraft on 4 occasions, all flying the same firmware and all fitted with Pixhawk v1. Only the first registered case did not lead to a crash, as the primary EKF continued to operate correctly.

After some log analysis, I’ve found that the first symptom is always a step-like offset affecting IMU2, typically only on the Y and Z axes. Maximum offsets reached about 0.4 rad/s on the first aircraft running SCHED_LOOP at 50 Hz; later, 0.2 rad/s was observed when running at 150 Hz.

The most curious thing about the cases that led to a crash is that IMU2 appears to be the one that starts providing incorrect readings (comparing IMU.GyrY with IMU2.GyrY, and NKF1.Pitch with NKF6.Pitch), yet when comparing IMU.EG with IMU2.EG, it is the first IMU that reports several communication errors and is hence classified as unhealthy by the driver logic.
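The comparison described above (IMU.GyrY against IMU2.GyrY) could be automated roughly as follows. This is only a sketch: `find_gyro_steps`, the window length, and the 0.1 rad/s threshold are illustrative assumptions, and the inputs are assumed to be two already time-aligned lists of gyro samples extracted from a log.

```python
from statistics import median

def find_gyro_steps(gyr_a, gyr_b, window=5, threshold=0.1):
    """Return sample indices at which a sustained offset between two
    gyro streams is first detected.

    The windowed median of (gyr_b - gyr_a) is used so that a single
    noisy sample does not trigger a detection. All names and default
    values here are hypothetical, chosen for illustration only.
    """
    diff = [b - a for a, b in zip(gyr_a, gyr_b)]
    steps = []
    offset_active = False
    for end in range(window, len(diff) + 1):
        m = median(diff[end - window:end])
        if abs(m) > threshold:
            if not offset_active:
                steps.append(end - 1)  # last sample of the detecting window
                offset_active = True
        else:
            offset_active = False      # offset gone, re-arm the detector
    return steps

# Synthetic example: IMU2 picks up a 0.2 rad/s offset, loses it, then
# picks it up again (similar in shape to the 326.BIN case).
imu1_gyr_y = [0.0] * 40
imu2_gyr_y = [0.0] * 10 + [0.2] * 10 + [0.0] * 10 + [0.2] * 10
onsets = find_gyro_steps(imu1_gyr_y, imu2_gyr_y)
```

Because of the median window, each onset is reported a couple of samples after the step actually begins.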

Screenshot from 2019-10-02 08-19-00
Image 1: logs from 2019.09.27 – 348.BIN – aircraft continues uncontrolled flight for a few minutes after the gyro / driver fault.

Screenshot from 2019-10-02 13-23-15
Image 2: logs from 2019.09.20 – 8.BIN – IMU2 offset is clearly visible and affects the secondary EKF2 estimate of Pitch (the aircraft was level from 14:37:47 to 14:37:50). Later, a communication error with IMU1 causes its health status to degrade (some incorrect readings are also visible). Around 14:37:51, the aircraft selects the secondary EKF2 as the new primary and control is lost.

Screenshot from 2019-10-02 13-42-27
Image 3: logs from 2019.07.09 – 326.BIN – IMU2 offset occurs at 10:51:05, then disappears at 10:55:56 and occurs again at 10:57:46, with the last occurrence actually leading to a crash.

Screenshot from 2019-10-02 13-43-49
Image 4: logs from 2019.07.09 – 326.BIN – close-up of the IMU2 offset disappearing.

Screenshot from 2019-10-02 13-13-10
Image 5: logs from 2019.07.09 – 324.BIN – aircraft continues the flight with secondary EKF destabilized. (Log is slightly damaged – level lines are noise)

Just to put things in perspective, this (3.9.8) firmware was flown on at least 10 aircraft for 3 months, so the issue does not occur often. Previously we used 3.8.4 (on NuttX) and there were no similar problems. I did go through all changes that went into the 3.9.8 release, especially those affecting the AP_InertialSensors and the AP_NavEKF, but I am unable to find the root cause of this behaviour (we are not using fast sampling, and this seemed to be the only major change). Perhaps it is related to using the ChibiOS SPI drivers?

One more comment about the HW configuration: all affected aircraft were using twin GPS receivers, where the first is uBlox and the second is Emlid Reach, so the secondary EKF does not receive speed accuracy (SAcc) data nor vertical accuracy data (Vacc).

Version
3.9.8 - customized (no changes to AP_InertialSensors nor to AP_NavEKF)
(see https://lekston@github.com/lekston/ardupilot.git branch: ft_release)

Platform
[x] All
[ ] AntennaTracker
[ ] Copter
[ ] Plane
[ ] Rover
[ ] Submarine

Airframe type
Flying wing

Hardware type
Pixhawk

Logs
https://www.dropbox.com/sh/w21zej1yw7yhchs/AABwRYyOn3JJXs6A0SUnQonpa?dl=0

@tridge tridge self-assigned this Oct 2, 2019
@tridge
Contributor

tridge commented Oct 2, 2019

interesting logs! I don't yet have an explanation, but will think about it

@lekston
Contributor Author

lekston commented Oct 2, 2019

Great to hear that, @tridge! Looking forward to hearing from you, and I will definitely let you know if I find anything related to the issue.

@lekston
Contributor Author

lekston commented Oct 7, 2019

Hi, my colleague managed to recreate the issue on a test bench (using the same HW/SW configuration as described above) and we got the response depicted below.

Screenshot from 2019-10-07 13-22-03

Also, I started looking at the IMU2 driver side in AP_HAL & AP_HAL_ChibiOS (since we previously had I2C problems after moving to Chibi-based builds), but finding no obvious culprits, I decided to rebuild 3.9.8 on PX4/NuttX (uavcan disabled to fit into 1MB ROM). I am now running this FW (3.9.8 + PX4) on the same HW that we used to recreate the issue, and after 30 min of trial runs I see no signs of the issue...

The only thing I did modify in the driver was adding register checks on the register responsible for disabling the I2C bus on the L3DG gyro (most likely negligible effect). I will confirm asap which of the two changes helped.

@lekston
Contributor Author

lekston commented Nov 6, 2019

Hi,
One more occurrence - the harshest we've seen...
Screenshot from 2019-11-06 08-27-51

Link to logs:
https://www.dropbox.com/s/bjdls3v9ccewm68/20191105_00000124.BIN?dl=0

Any suggestions would be much appreciated!

@ntamas
Contributor

ntamas commented Jun 3, 2021

@lekston Did you manage to solve this problem after all? We are experiencing the exact same type of problem on our quads after upgrading from ArduCopter 3.6.7 to 4.0.7, which I believe also meant a change from NuttX to ChibiOS so that could possibly be the problem here as well.

@ntamas
Contributor

ntamas commented Jun 3, 2021

Trying to connect the dots; this discussion and this one also report a similar issue.

@lekston
Contributor Author

lekston commented Jun 3, 2021

Hi, unfortunately I haven't found the solution for that yet. We've been flying modified 3.8.4 on NuttX exactly because of that (3 fixed-wing aircraft crashed for that reason and there was no point in risking more).

I did not test this issue on Cube Black though - all occurrences were on Pixhawk 1 type autopilots with dual IMUs. All our Cube Black equipped aircraft so far (~10) flew on 3.8.4. And since our typical fixed-wings were sold with a warranty against SW errors, I really was not comfortable doing any more experiments with ChibiOS since then. Now, as the Cube Black is off the market, this becomes more pressing, as no other Cube HW supports NuttX.

From my experience, this bug was incredibly hard to recreate without restarting a random number of times just waiting.

@ntamas
Contributor

ntamas commented Jun 3, 2021

Thanks for the info! You mentioned above that your colleague managed to reproduce the issue on a test bench. We are interested in investigating this further so it would be great if you could share what you needed to do in order to reproduce this in the lab. Did you just have to leave a Pixhawk1 on the desk and log the IMU state while disarmed, or did you need to do something more elaborate than that?

@rmackay9
Contributor

I was actually not aware of this issue but in any case, I think a good approach to getting this fixed is to test using the 4.1 release (only Copter or Rover are available for the moment although Plane is coming soon). If it can be replicated with 4.1 then I can add it to the Copter-4.1 issues list and it won't be forgotten and might be a blocker for the release.

@ntamas
Contributor

ntamas commented Jun 11, 2021

@rmackay9 I noticed that it is mentioned in the 4.1 issues list as "IMU1 failure handling not working". I think that the failure handling is working; the real problem is that it is IMU2 that seems to be having problems, but IMU1 marks itself as not healthy, so the vehicle switches over to IMU2, which is the one with the offset. Had IMU1 not marked itself as unhealthy, everything would have been fine even if IMU2 suddenly acquired a genuine offset within a few milliseconds.
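To make the failure mode above concrete, here is a deliberately simplified model of health-flag-based switching. It is not ArduPilot's actual selection code; `ImuState` and `select_primary` are hypothetical names, and the only assumption is that the selector sees a per-IMU health flag but has no knowledge of a measurement offset.

```python
from dataclasses import dataclass

@dataclass
class ImuState:
    name: str
    healthy: bool       # driver-level flag (e.g. no communication errors)
    gyro_offset: float  # rad/s; invisible to the selector, illustration only

def select_primary(imus, current=0):
    """Stay on the current IMU while it is flagged healthy; otherwise
    fall back to the first IMU that is flagged healthy.

    Hypothetical sketch: the decision uses the health flag only, so an
    IMU with a large but 'healthy' offset is a valid fallback target.
    """
    if imus[current].healthy:
        return current
    for i, imu in enumerate(imus):
        if imu.healthy:
            return i
    return current  # nothing is healthy: keep the current selection

# IMU2 has acquired an offset but still reports itself as healthy.
imus = [ImuState("IMU1", True, 0.0), ImuState("IMU2", True, 0.2)]
before = select_primary(imus)   # stays on IMU1, offset on IMU2 is harmless

# IMU1 now reports communication errors and is flagged unhealthy.
imus[0].healthy = False
after = select_primary(imus)    # switches to IMU2, the one with the offset
```

In this toy model the switch itself behaves as designed; the problem is that the health flag and the quality of the readings disagree.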

We'll try to test this with Copter-4.1 in the near future but it's pretty hard to reproduce -- that's why I'm interested in ways to reproduce it on the desk without actually flying a copter. Otherwise we would either need to upgrade, say, 50 quads, and fly them autonomously for half an hour at low altitude (so we don't cause too much damage), or just upgrade one or two and fly them manually for several hours. In the worst case we'll do that but it would be great if someone could say "hey, just leave it on the desk for three days and log the IMUs and the issue will pop up" :)

@vasarhelyi

One more comment on the tests performed together with @ntamas: we had several crashes in recent tests, presumably caused by this bug, after upgrading dozens of Pixhawk1-based drones from 3.6.7 to 4.0.7. In addition, many drones were not passing prearm checks on the ground, with the gyro health flag missing (which went away after a reboot). We therefore switched 50 drones back to 3.6.7 and tested them for several hours on the ground and also in flight, and so far all of this strange behavior has disappeared again, so our assumption that the bug is related to ChibiOS (and not to HW problems of our drones) is stronger now.

One more very scary-looking effect of this bug, which I have noticed several times with 4.0.7 on drones on the ground when watching them on the GCS map, is that their position estimate suddenly becomes unstable and drifts away tens or hundreds of meters within seconds without any warning or error message, so the drone believes that it is OK to speed up suddenly to 100 m/s or so... A reboot solves this as well.

@rmackay9
Contributor

@vasarhelyi,

Thanks very much for the report. I think the ship has sailed on 4.0.x but if the problem still exists on 4.1 I'm keen to fix it.

The position drifting sounds like it could be the same problem because an IMU step would overpower the EKF's ability to learn the accelerometer "biases" and could lead to a massive acceleration.
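For a rough sense of scale, here is a toy open-loop calculation, not a model of EKF2. It assumes a constant gyro offset that is never corrected, so it integrates into a tilt error, and the tilt error misprojects gravity as horizontal acceleration. `open_loop_drift` is a hypothetical helper; the 0.2 rad/s value is the offset reported earlier in this issue.

```python
import math

G = 9.80665  # standard gravity, m/s^2

def open_loop_drift(gyro_offset, duration, dt=0.01):
    """Integrate an uncorrected gyro offset (rad/s) into a tilt error,
    then integrate the misprojected gravity into horizontal velocity
    and position error. Simple Euler integration, no estimator
    corrections of any kind (an assumption made for illustration).
    """
    tilt = vel = pos = 0.0
    for _ in range(round(duration / dt)):
        tilt += gyro_offset * dt      # attitude error grows linearly
        acc = G * math.sin(tilt)      # gravity leaking into the horizontal
        vel += acc * dt
        pos += vel * dt
    return tilt, vel, pos

tilt_err, vel_err, pos_err = open_loop_drift(gyro_offset=0.2, duration=5.0)
```

Under these assumptions, a 0.2 rad/s offset left uncorrected for 5 s yields a position error on the order of tens of metres. A real estimator fuses GPS and other sensors and would fight this drift, so the numbers only indicate how quickly an uncompensated step can grow.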
