Basic Fault Protection Ports and Components #2536

timcanham · 2024-02-21T16:29:19Z · Maintainer

This is a proposal to implement a basic fault protection engine for F Prime. It is built around the following FDIR (fault detection, isolation, and recovery) concepts:

Concepts

1) Fault Announcement (The `FD` in FDIR)

Components detect faults locally, since they are the "experts": each one examines the data for its own domain, decides when a fault is present, and then announces the fault through a port using a specific fault identifier. Announcement is separate from the response. The response may also be handled by the announcing component via an input port, but only when the system fault protection implementation decides so. This does not preclude local responses where those are more appropriate, but it provides a standard way for the system to coordinate responses. For example, an instrument manager component might detect that its instrument is producing bad data.
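
A minimal sketch of the announcement side, written in plain C++ rather than with F Prime's autocoded component and port classes. The proposed `FaultAnnounce` output port is modeled as a `std::function`; the `FaultId` enumeration, the `InstrumentManager` class, and its valid-range check are hypothetical, chosen only to mirror the instrument example above.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical project-specific fault identifiers. In F Prime these would
// come from an FPP enumeration in the project's config directory.
enum class FaultId { InstrumentBadData, BatteryLow };

// Stand-in for the proposed FaultAnnounce output port.
using FaultAnnouncePort = std::function<void(FaultId)>;

// The "expert" component: it knows what valid instrument data looks like.
class InstrumentManager {
  public:
    InstrumentManager(double minValid, double maxValid, FaultAnnouncePort announce)
        : m_min(minValid), m_max(maxValid), m_announce(std::move(announce)) {
        assert(m_min <= m_max);
    }

    // Called for each new sample. The component only announces the fault;
    // any response is left to the system fault protection.
    void processSample(double sample) {
        const bool outOfRange = sample < m_min || sample > m_max;
        if (outOfRange && m_announce) {
            m_announce(FaultId::InstrumentBadData);
        }
    }

  private:
    double m_min;
    double m_max;
    FaultAnnouncePort m_announce;
};
```

Note that this sketch announces on every bad sample; filtering out transient symptoms is the job of the fault monitors described in the next concept.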

2) Fault Monitors (The `FD` in FDIR)

Fault protection implementations here at JPL have the notion of fault monitors: code that implements persistence counts and state to filter fault symptoms so that the system does not overreact. Each monitor is tunable to the particular item being monitored.
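
One way a persistence-count monitor might look, sketched as a standalone C++ class. The name `PersistenceMonitor` and its interface are hypothetical, not an existing F Prime API. A symptom must be present on `threshold` consecutive checks before the monitor trips, any clean check resets the count, and the threshold can be changed at runtime (for example, from a parameter update).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical persistence-count monitor helper. A fault is only declared
// after the symptom has been seen on `threshold` consecutive checks.
class PersistenceMonitor {
  public:
    explicit PersistenceMonitor(std::uint32_t threshold) : m_threshold(threshold) {
        assert(threshold > 0);
    }

    // Tunable at runtime, e.g. when a parameter is updated.
    void setThreshold(std::uint32_t threshold) {
        assert(threshold > 0);
        m_threshold = threshold;
    }

    // Feed one observation. Returns true exactly once: on the check where
    // the persistence count reaches the threshold.
    bool update(bool symptomPresent) {
        if (!symptomPresent) {
            m_count = 0;        // symptom cleared: reset the persistence count
            m_tripped = false;  // allow the monitor to trip again later
            return false;
        }
        if (m_tripped) {
            return false;       // already tripped; don't re-announce every cycle
        }
        ++m_count;
        if (m_count >= m_threshold) {
            m_tripped = true;
            return true;
        }
        return false;
    }

    bool tripped() const { return m_tripped; }

  private:
    std::uint32_t m_threshold;
    std::uint32_t m_count = 0;
    bool m_tripped = false;
};
```

A component would typically call `update()` once per check and only invoke its fault announcement port when it returns true.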

3) Fault Response (The `IR` in FDIR)

The fault protection implementation will look at a set of fault announcements and decide on a project-specific response. The component will have input ports for fault announcements, will map the announcements to various responses, and will invoke an output port with the response. The topology will connect the response output ports to the components implementing the response. The response can be "fanned out" by a splitter component if more than one component implements it. The response need not be implemented by the component that made the announcement. For instance, in the instrument fault example, one component (the instrument manager) might announce the fault while another handles the response (a power component turns off the instrument).
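
The announce, map, and respond flow, plus the splitter fan-out, can be sketched in plain C++ as follows. The proposed `FaultRespond` port is modeled as a `std::function`; `FaultResponder`, `ResponseSplitter`, and the enumeration values are hypothetical names, and the mapping is a simple one-to-one `std::map` standing in for a project's response table.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical fault and response enumerations (project-specific in practice).
enum class FaultId { InstrumentBadData, BatteryLow };
enum class ResponseId { InstrumentOff, LoadShed };

// Stand-in for the proposed FaultRespond port.
using FaultRespondPort = std::function<void(ResponseId)>;

// Minimal table-driven fault protection: announcement in, response out.
class FaultResponder {
  public:
    explicit FaultResponder(FaultRespondPort respond) : m_respond(std::move(respond)) {}

    void addMapping(FaultId fault, ResponseId response) { m_table[fault] = response; }

    // Handler for the FaultAnnounce input port. Unmapped faults are ignored.
    void faultAnnounce(FaultId fault) {
        const auto it = m_table.find(fault);
        if (it != m_table.end() && m_respond) {
            m_respond(it->second);
        }
    }

  private:
    std::map<FaultId, ResponseId> m_table;
    FaultRespondPort m_respond;
};

// Splitter: forwards one response to every connected component.
class ResponseSplitter {
  public:
    void connect(FaultRespondPort out) {
        assert(out && "cannot connect an empty port");
        m_outputs.push_back(std::move(out));
    }

    void respond(ResponseId response) {
        for (const auto& out : m_outputs) {
            out(response);
        }
    }

  private:
    std::vector<FaultRespondPort> m_outputs;
};
```

In a topology, the responder's output would connect to the splitter's input, and each splitter output to a component that implements part of the response (for example, the power component that turns off the instrument).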

4) Fault Completion

When a fault response is done, the component implementing the response announces that it is done, so that the fault response implementation can either declare the fault response complete or move to the next step if the response has multiple steps. In our example, the power component would announce completion once it has turned off the instrument.
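
A sketch of completion-driven sequencing, assuming a response made of ordered steps: the next step is issued only when the `FaultResponseComp` input reports that the current step finished, and the whole response is declared done after the last step completes. `SequencedResponse` and the step names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical response steps for the instrument example.
enum class ResponseId { InstrumentSafe, InstrumentOff, NotifyGround };

// Stand-in for the proposed FaultRespond output port.
using FaultRespondPort = std::function<void(ResponseId)>;

// Runs a multi-step response one step at a time.
class SequencedResponse {
  public:
    SequencedResponse(std::vector<ResponseId> steps, FaultRespondPort respond)
        : m_steps(std::move(steps)), m_respond(std::move(respond)) {
        assert(!m_steps.empty());
    }

    // Start the response, e.g. when the triggering fault is announced.
    void start() {
        m_next = 0;
        m_active = true;
        issueNext();
    }

    // Handler for the FaultResponseComp input port.
    void responseComplete(ResponseId completed) {
        // Ignore completions that don't match the step in progress.
        if (!m_active || m_next == 0 || m_steps[m_next - 1] != completed) {
            return;
        }
        if (m_next == m_steps.size()) {
            m_active = false;  // last step done: the fault response is complete
        } else {
            issueNext();
        }
    }

    bool active() const { return m_active; }

  private:
    void issueNext() {
        const ResponseId step = m_steps[m_next++];  // advance before invoking the port
        if (m_respond) {
            m_respond(step);
        }
    }

    std::vector<ResponseId> m_steps;
    FaultRespondPort m_respond;
    std::size_t m_next = 0;  // index of the next step to issue
    bool m_active = false;
};
```

Ignoring mismatched completions is one design choice; a project might instead treat them as an error.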

Implementation

Enumerations

A set of FPP enumerations would be created to enumerate faults and responses. These enumerations would live in the project-specific config directories, but would be used by the ports and components below. That allows the names to be customized by a project.

Ports

A set of F Prime ports would be created in Fw to implement the fault interface. The ports would have arguments based on the above enumerated types.

| Port | Function |
|---|---|
| `FaultAnnounce` | Announce the fault |
| `FaultRespond` | Announce the response |
| `FaultResponseComp` | Announce the completion of a fault response |

Components

Projects can implement their own arbitrarily complex fault component (or components) that implement the `FaultAnnounce`, `FaultRespond`, and `FaultResponseComp` ports. A basic implementation that uses a table to map announcements to responses can be provided in the F Prime repo.

Helpers

The Fault Monitors can be implemented as helper classes that can be instantiated by components to track a particular item. The persistence counts can be updated via [parameters](https://fprime-community.github.io/fpp/fpp-users-guide.html#Defining-Components_Parameters) or programmatically.

Future Direction

Fault protection is important enough, and lends itself well enough to modeling, that it could perhaps become a first-class FPP construct at some point.

zimri-leisher · 2024-02-21T18:00:00Z

One thing we took from the lessons-learned page (https://llis.nasa.gov/lesson/772) is that fault detection and response should be as configurable from the ground as possible. Your system could accomplish this by having fault detection and response rely on parameters in the ParamDB that the ground can configure, but we were looking at going a step further: writing fault detection and response entirely in command sequence files, which the ground could replace wholesale if necessary. This means we would have to implement logic in command sequence files, which is a separate issue but one we were already looking at for other reasons.
Assuming we implement logic in command sequences, what do you think of this? It wouldn't be mutually exclusive with this FDIR system.

1 reply

timcanham Feb 22, 2024
Maintainer Author

For past projects, we have had parameters to enable/disable responses and tune monitors (including disabling them) with some defaults in case one of the faults is that the file system is having issues. It is a good idea. Having a global enable/disable is also a good idea if you have to use a big hammer to solve a complex fault response problem at the lower levels.
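
The enable/disable scheme described in this reply might be sketched as a gate in front of the response output: a global enable acts as the "big hammer", and per-response flags allow finer control. In F Prime these flags could plausibly be backed by parameters with defaults; here they are plain members, and `ResponseGate` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical response identifiers; NumResponses is a count sentinel.
enum class ResponseId : std::size_t { InstrumentOff, LoadShed, NumResponses };

// Stand-in for the FaultRespond output port.
using FaultRespondPort = std::function<void(ResponseId)>;

class ResponseGate {
  public:
    explicit ResponseGate(FaultRespondPort out) : m_out(std::move(out)) {
        m_enabled.fill(true);  // default: every response enabled
    }

    // Global enable: the "big hammer" that suppresses all responses.
    void setGlobalEnable(bool enable) { m_globalEnable = enable; }

    // Per-response enable.
    void setResponseEnable(ResponseId id, bool enable) { m_enabled[index(id)] = enable; }

    // Returns true if both the global and per-response flags allow the
    // response, in which case it is passed to the output port.
    bool respond(ResponseId id) {
        if (!m_globalEnable || !m_enabled[index(id)]) {
            return false;
        }
        if (m_out) {
            m_out(id);
        }
        return true;
    }

  private:
    static constexpr std::size_t kCount = static_cast<std::size_t>(ResponseId::NumResponses);

    static std::size_t index(ResponseId id) {
        const auto i = static_cast<std::size_t>(id);
        assert(i < kCount);
        return i;
    }

    FaultRespondPort m_out;
    bool m_globalEnable = true;
    std::array<bool, kCount> m_enabled{};
};
```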

garthwatney · 2024-02-21T19:37:17Z

garthwatney
Feb 21, 2024

I like the more centralized fault protection approach. A response can be defined as a set of ground commands (picked from the dictionary). This way, responses can leverage existing commands, and we don't have to add extra fault protection ports and handlers to components to handle fault behavior specially.
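
A rough sketch of this centralized approach, where each response is simply a list of existing commands sent to a command dispatcher. The `Command` struct, the opcodes, and the `CommandSendPort` callback are assumptions for illustration; they are not F Prime's actual command buffer format or dispatcher interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical fault identifiers.
enum class FaultId { InstrumentBadData, BatteryLow };

// A command as it might be picked from the dictionary: an opcode plus
// arguments. Opcode values used with this struct are made up.
struct Command {
    std::uint32_t opcode;
    std::vector<std::int32_t> args;
};

// Stand-in for a port into the command dispatcher.
using CommandSendPort = std::function<void(const Command&)>;

// Each fault maps to a list of existing commands, so components need no
// special fault-response ports.
class CommandListResponder {
  public:
    explicit CommandListResponder(CommandSendPort send) : m_send(std::move(send)) {
        assert(m_send);
    }

    void setResponse(FaultId fault, std::vector<Command> commands) {
        m_responses[fault] = std::move(commands);
    }

    // Sends the commands for this fault in order; returns how many were sent.
    std::size_t faultAnnounce(FaultId fault) {
        const auto it = m_responses.find(fault);
        if (it == m_responses.end()) {
            return 0;
        }
        for (const Command& cmd : it->second) {
            m_send(cmd);
        }
        return it->second.size();
    }

  private:
    std::map<FaultId, std::vector<Command>> m_responses;
    CommandSendPort m_send;
};
```

Because responses are data rather than code, a table like this could in principle be reloaded from the ground, which connects to the configurability point raised earlier in the thread.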

SterlingPeet · 2024-02-21T19:38:55Z

SterlingPeet
Feb 21, 2024

Perhaps @EbenezerA99 or Antoine could weigh in here on the logic aspect. They presented at FSW and had a poster at SmallSat on their VISORS software, which used a parameter-table-driven system for fault logic, so they may have formed opinions while implementing this in F Prime that would be interesting here.

EbenezerA99 · 2024-02-22T03:43:21Z

EbenezerA99
Feb 22, 2024

Yes, I worked with Antoine on FDIR logic for a two-spacecraft formation-flying mission called VISORS, which used Fprime for its FSW. See this paper for more detailed information on our FDIR strategy.

This implementation looks great to me! IMO, having basic FDIR components in the Fprime repo would be of great benefit to users.

One thing I would like to see is flexibility in how Fault Responses are mapped to 'Fault Announcements', instead of having a strict 1-1 pairing between a Fault Announcement and a Fault Response.

For example, for the VISORS mission we had to implement logic such as:

- If Fault1 && Fault2 -> Invoke FaultResponse1 & FaultResponse2
- If Fault3 && Fault2 -> Invoke FaultResponse1 & FaultResponse3
- If only Fault1 -> Invoke FaultResponse3

etc.

5 replies

timcanham Feb 22, 2024
Maintainer Author

Did you have the notion of FaultResponse1 and FaultResponse2 being run sequentially or in parallel? If the former, was there a notion of a completed response status that would kick off the second response?

timcanham Feb 22, 2024
Maintainer Author

Each row of the fault response table could combine fault announcements with specified operators between them (like || or &&) and produce multiple responses as outputs.
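
A sketch of such a table in plain C++, encoding the VISORS-style rules listed earlier in the thread. Each row combines faults with one operator and lists the responses to invoke; rows are checked in order and the first match wins. The `Exactly` operator (the active faults must be exactly the listed ones) is a hypothetical addition to express the "only Fault1" case, and all names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

enum class FaultId { Fault1, Fault2, Fault3 };
enum class ResponseId { FaultResponse1, FaultResponse2, FaultResponse3 };
enum class Op { And, Or, Exactly };

// One row of a hypothetical response table.
struct ResponseRule {
    Op op;
    std::vector<FaultId> faults;
    std::vector<ResponseId> responses;
};

// True if the row's condition holds for the set of currently active faults.
bool matches(const ResponseRule& rule, const std::set<FaultId>& active) {
    // A row must name at least one fault; treat an empty row as never matching.
    assert(!rule.faults.empty());
    if (rule.faults.empty()) return false;
    const auto isActive = [&](FaultId f) { return active.count(f) != 0; };
    switch (rule.op) {
        case Op::And:
            return std::all_of(rule.faults.begin(), rule.faults.end(), isActive);
        case Op::Or:
            return std::any_of(rule.faults.begin(), rule.faults.end(), isActive);
        case Op::Exactly:
            return std::set<FaultId>(rule.faults.begin(), rule.faults.end()) == active;
    }
    return false;
}

// Evaluate rows in order; the first matching row's responses are returned.
std::vector<ResponseId> evaluate(const std::vector<ResponseRule>& table,
                                 const std::set<FaultId>& active) {
    for (const ResponseRule& rule : table) {
        if (matches(rule, active)) {
            return rule.responses;
        }
    }
    return {};
}
```

With rows for Fault1 && Fault2, Fault3 && Fault2, and Exactly Fault1 in that order, an active set of {Fault1, Fault2} yields FaultResponse1 and FaultResponse2, while {Fault1, Fault3} matches no row. First-match ordering avoids issuing the same response twice when several rows apply; collecting responses from every matching row would be an alternative design.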

EbenezerA99 Feb 23, 2024

In our implementation, the FR (fault response) component was connected to a PayloadStateMachine (PSM) component that actually actuated the responses (switching mission modes, switching formation roles, toggling power to subsystems, etc.). The port connecting the two components had arguments corresponding to each of the possible fault responses mentioned above, which the PSM would parse and then execute accordingly.

It executed the responses in a fixed order determined simply by how we had organized the code. The responses themselves weren't tied to each other, so there was no notion of a completed response status kicking off the second response.

BUT, I think having this functionality in the Fprime FR component would be a good idea. If there is a way to make it optional (so that users could decide if they want FaultResponse2 to only trigger if FaultResponse1 completed or if the responses are independent), that would be best imho.

EbenezerA99 Feb 23, 2024

I strongly support giving the FR table the ability to combine fault announcements with || or && operators and to trigger multiple responses.

apaletta3 Mar 10, 2024

Really cool stuff, I like the implementation you are proposing @timcanham! Would love to check it out once it's released.

timcanham · 2024-02-22T20:41:33Z · Maintainer Author

Added a discussion for possible FPP updates for this: #2540
