Extended view detection #44

Draft
wants to merge 8 commits into
base: main

Matthew-Whitlock
Collaborator

Opening the discussion on these options, though some cleanup is needed and I'm sure there are some changes that should be made to the API for clarity.

Essentially, this adds options for how views are captured for automatic resilience. Two primary changes:

  1. capture_internal:
  • Leaves view hooks enabled during execution of the resilient lambda, enabling capture of views that are only copied internally. Useful for views within classes that are too large to want to copy-construct into the resilient lambda and for more complex view-management (like in NimbleSM).
  • Drawbacks: the lambda must execute at least once before recovery can begin (handled internally with the same user code, but it changes the control flow of recovery in a hidden way), and users are more exposed to control flow changing which views are captured/recovered depending on application state. Neither affects normal execution.
  2. checkpoint_gather/gather_views:
  • checkpoint_gather sets a scope in which views are gathered sporadically and then checkpointed/recovered all at once. gather_views does the actual capturing of views for the checkpoint_gather scope. Useful for skipping view capture in portions of execution without complicating recovery/checkpoint version management.
  • Drawbacks similar to above, but again don't affect normal resilience use.
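
To make the difference between the capture modes concrete, here is a minimal standalone sketch. It does not use the real library: `TrackedView`, `Hooks`, `capture_on_copy`, `capture_during_run`, and `GatherScope` are all hypothetical stand-ins invented for illustration, and the real `capture_internal`, `checkpoint_gather`, and `gather_views` signatures may look quite different. The sketch assumes views are detected through a hook that fires on copy construction.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins only; none of these names come from the PR.
namespace sketch {

// Plays the role of the view hooks: while enabled, every copy of a
// TrackedView records the view's label.
struct Hooks {
  bool enabled = false;
  std::set<std::string> captured;
};

inline Hooks& hooks() {
  static Hooks h;
  return h;
}

struct TrackedView {
  std::string label;
  explicit TrackedView(std::string l) : label(std::move(l)) {}
  TrackedView(const TrackedView& other) : label(other.label) {
    if (hooks().enabled) hooks().captured.insert(label);
  }
  TrackedView& operator=(const TrackedView&) = default;
};

// Hooks are on only while the lambda object is copied, so only views
// stored directly in the lambda's capture list are seen.
template <class F>
std::set<std::string> capture_on_copy(const F& body) {
  assert(!hooks().enabled);  // this sketch does not support nested capture
  hooks().captured.clear();
  hooks().enabled = true;
  F copy(body);
  hooks().enabled = false;
  (void)copy;
  return hooks().captured;
}

// capture_internal-like behaviour: hooks stay on while the body runs, so
// views copied inside the body are seen too. The body has to execute once
// before the full set of views is known.
template <class F>
std::set<std::string> capture_during_run(const F& body) {
  assert(!hooks().enabled);
  hooks().captured.clear();
  hooks().enabled = true;
  F copy(body);
  copy();
  hooks().enabled = false;
  return hooks().captured;
}

// checkpoint_gather-like scope: views are gathered in several separate
// calls and handed over together in a single checkpoint.
struct GatherScope {
  std::set<std::string> pending;

  template <class F>
  void gather(const F& body) {
    std::set<std::string> seen = capture_on_copy(body);
    pending.insert(seen.begin(), seen.end());
  }

  // Returns everything gathered so far and resets the scope.
  std::set<std::string> checkpoint() {
    std::set<std::string> out;
    out.swap(pending);
    return out;
  }
};

}  // namespace sketch
```

In this toy model, a lambda that captures a large object by reference and copies one of its views internally is only fully detected by `capture_during_run`, which mirrors the "at least one execution before recovery" drawback noted above.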

Issues:

  1. Both options are more likely to capture the same view several times, and stdfile doesn't use a map to prevent checkpointing/recovering a view more than once, so this currently uniqs the list of views. That seems more expensive than using a map. Should we update the stdfile backend to behave more like the VeloC one?
  2. There are issues with the cref stuff when capture_internal is used (need to revisit to remember why), so crefs are not captured in that mode.
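
For the first issue, the trade-off can be sketched in isolation. This is not the stdfile or VeloC backend code: `ViewRecord`, `uniq_by_sort`, and `ViewRegistry` are hypothetical names, and the sketch assumes a view is identified by its label, which may not match how either backend actually keys views.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins only; not the actual backend types.
namespace sketch {

struct ViewRecord {
  std::string label;  // assumed to identify the view
  const void* data;
};

// Option A: collect everything, then sort and drop duplicates.
// Costs O(n log n) each time the list is deduplicated.
inline std::vector<ViewRecord> uniq_by_sort(std::vector<ViewRecord> views) {
  std::sort(views.begin(), views.end(),
            [](const ViewRecord& x, const ViewRecord& y) {
              return x.label < y.label;
            });
  views.erase(std::unique(views.begin(), views.end(),
                          [](const ViewRecord& x, const ViewRecord& y) {
                            return x.label == y.label;
                          }),
              views.end());
  return views;
}

// Option B: keep a map keyed by label, so a duplicate is rejected at
// registration time and never reaches the checkpoint list.
class ViewRegistry {
 public:
  // Returns true if the view was newly registered, false if already known.
  bool add(const ViewRecord& v) { return views_.emplace(v.label, v).second; }
  std::size_t size() const { return views_.size(); }

 private:
  std::map<std::string, ViewRecord> views_;
};

}  // namespace sketch
```

With a map, the deduplication cost is paid once per registration rather than on every checkpoint, which is the main reason it could be cheaper when the same views are captured repeatedly.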
