Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lifecycle: allow components to gracefully restart with exit codes similar to bootstrap #1331

Open
1 task
voxeljorge opened this issue Oct 12, 2022 · 10 comments
Labels
enhancement New feature or request

Comments

@voxeljorge
Copy link

It would be very helpful to allow Greengrass components to be restarted without entering into the BROKEN state in a controlled way. There is already a pattern in place here where components can request that the nucleus restart with special exit codes returned from Bootstrap.

Proposed Solution
I propose exit code support is added to Run which tells the Greengrass Nucleus that the process exited without error, with an exit code like 100 similar to the Bootstrap section. To avoid excessive restarts some kind of logic could be added that forces a minimum delay between restarts, or a maximum number of restarts per time period.

Other
We have a custom component which fetches configuration data and caches it into a file. Right now we have to design an out of band signaling mechanism for that component to signal other components that a configuration update is available. If graceful restarts were supported, the component update could happen via dependency definitions in components.

  • 👋 I may be able to implement this feature request
  • [ x] ⚠️ This feature might incur a breaking change

Changing exit code behavior on Run would technically be a breaking change, although in practice a number could probably be found to be very unlikely to collide with any typical exit code.

@voxeljorge voxeljorge added the needs-triage Needs eyeballs label Oct 12, 2022
@shaguptashaikh
Copy link
Contributor

Hi voxeljorge, thanks for your suggestion. While something like this may be one way to address the use case you describe, if the primary goal is for a component to signal other components to take an action, is there a reason you cannot use the IPC Pub/Sub APIs Greengrass provides? A restart shouldn't be required for this but even if it is, would an explicit mechanism for a component to request a restart for itself or other component via IPC suit your case better?

@shaguptashaikh shaguptashaikh removed the needs-triage Needs eyeballs label Oct 12, 2022
@voxeljorge
Copy link
Author

We are deploying existing software that is written in a language not supported by the IPC Pub/Sub APIs. Rewriting this software in a supported language is not really viable at this time. Additionally, we prefer to only use Pub/Sub approaches like this for pure control plane operations like reloading a configuration. A service restart guarantees a reload and is less likely to be accidentally missed than a Pub/Sub message (imagine a case where the application looking for configuration updates locks up and stops processing these messages due to a bug), as there are fewer moving pieces involved.

The exit/restart flow via dependencies seems much more robust to programming errors and other failures basically, and makes it easier to integrate both third party software and languages that are not supported by the official Greengrass Core SDK.

Side Note: I'm quite confused as to why these APIs are language specific rather than protocol specific, they could have easily been an MQTT or HTTP API which would have allowed for any language to integrate with the Greengrass Core very reasonably.

@voxeljorge
Copy link
Author

One other quick note for use-cases for this, there are frequent occasions when an application may just need to reload its own state from the ground up. I would venture to say that most applications are initially written to require an exit/restart to reload their configuration, even if they have some way to detect that their configuration has changed. I doubt that developers would ever want these types of restarts to trigger a broken condition, except in cases where the application is truly spinning (restarting very frequently without loading correctly).

@shaguptashaikh
Copy link
Contributor

Thanks for explaining, this is definitely one of the use cases the Pub/Sub IPC APIs were designed for. I understand you may have a preference for restarting for reasons you mentioned but Greengrass IPC overall provides a rich set of features otherwise unavailable or difficult to build on your own so I'd still recommend not to avoid IPC usage completely. As for the IPC SDKs not being protocol specific, this decision was made based on several considerations, one being performance overhead. Would you be able to share what language your legacy application is written in so we can see if we're going to support that language for IPC in near future?

Lastly, we will consider the original feature request for restarting on exit codes and post an update here on if it's feasible/whether Greengrass will be able to provide this capability.

@shaguptashaikh
Copy link
Contributor

Just leaving this here for reference, if anyone reading this wants to trigger a restart for a component from another component and is able to use IPC, we have dedicated IPC APIs for it that can be used https://docs.aws.amazon.com/greengrass/v2/developerguide/ipc-local-deployments-components.html#ipc-operation-restartcomponent

@voxeljorge
Copy link
Author

Greengrass IPC overall provides a rich set of features otherwise unavailable or difficult to build on your own so I'd still recommend not to avoid IPC usage completely.

We aren't intentionally avoiding IPC usage completely, we just already had applications written in Go which we did not want to re-write just to gain access to Greengrass IPC. Additionally we really do prefer to contain our "Control Plane" outside of the application where possible, following a pattern similar to the 12 Factor App . We feel that a design where our application code is running in a separate process from the control logic for logging/reloading/etc is considerably more robust. This could be accomplished with a wrapper written in a greengrass supported language but for now that seems unnecessary.

Would you be able to share what language your legacy application is written in so we can see if we're going to support that language for IPC in near future?

Sorry should have included this in the original post I guess, as mentioned above mostly Go.

One more note: We have implemented a hacky version of this that seems to work by setting the errorResetTime lifecycle property to something much lower than the original default of 1hr but long enough that if our application is spinning it still ends up in a BROKEN state. This works well enough for our use-case so we hope this flag doesn't go away (in fact it would be nice if were to be documented and officially supported)

@shaguptashaikh
Copy link
Contributor

shaguptashaikh commented Oct 12, 2022

Okay, as I said we do have a RestartComponent API that should always reliably and easily restart a component from another component or request restart for itself. In your case the component that you expect to exit and cause restart for its dependents could call this instead. We don't have a plan to support Go for IPC in immediate future, but in case you're eventually able to migrate to a supported language, or at least rewrite that one component that needs to trigger a restart for others.

errorResetTime is required to function as it does today for us to handle restarts and ERRORED -> BROKEN transitions, however it's undocumented, it most likely will stay this way in order to avoid breaking changes but I want to be careful about promising something like that for an undocumented config. I can provide an update about a plan to officially support this soon as well

Another less than ideal but functional workaround for you while we evaluate this feature request/until there's a better solution available - you could see if invoking the greengrass-cli restart command from your component works for you
https://docs.aws.amazon.com/greengrass/v2/developerguide/gg-cli-component.html#component-restart

@voxeljorge
Copy link
Author

voxeljorge commented Oct 13, 2022

Okay, as I said we do have a RestartComponent API that should always reliably and easily restart a component from another component or request restart for itself

As far as I can tell this design would would require creating a circular dependency between both services or having some kind of registration process, which is much more complex. For example:

  • Service A retrieves configuration data
  • Service B depends on the data retrieved by a, so a dependency in the recipe is created

In this scenario, what approach would you recommend for service A to know the list of services which it should notify about a configuration reload? How do we ensure that list is correct, up to date, etc? In a design which purely depends on component dependencies to force reloads, this dependency graph is acyclic which is typically more robust.

We don't have a plan to support Go for IPC in immediate future, but in case you're eventually able to migrate to a supported language, or at least rewrite that one component that needs to trigger a restart for others.

We are much more likely to drop Greengrass for an alternate approach than rewrite existing software in Java/C/C++ (maybe python is supported as well? not sure).

@voxeljorge
Copy link
Author

I can provide an update about a plan to officially support this soon as well

A plan for including errorResetTime in the supported recipe configuration schema would be excellent.

@shaguptashaikh
Copy link
Contributor

In this scenario, what approach would you recommend for service A to know the list of services which it should notify about a configuration reload? How do we ensure that list is correct, up to date, etc

You can have Service A request restart for itself using the RestartComponent API and Service B or any other unknown/newly added services will be restarted as long they declare a dependency on A. If you were to use the Pub/Sub APIs for notifying instead of restarting, knowing who to notify isn't needed - anyone subscribing can receive the message.

We'll post here when we have an update on errorResetTime or the original feature request.

@shaguptashaikh shaguptashaikh added the enhancement New feature or request label Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants