Draft: #48: vt and magistrate support #49

Draft: wants to merge 17 commits into main


nmm0 (Contributor) commented Jun 8, 2023

Closes #48

Matthew-Whitlock and others added 9 commits June 7, 2023 16:13
Restructured backends to extend a top-level virtual object
	(more backends to update)

New VTProxy registration, which checkpoints only status info
about that proxy but also registers with the VTContext for special handling
of the actual data

  VeloC doesn't appear to have a way to checkpoint an object as
just a single node (e.g., each checkpoint of a proxy element contains one
node's element data for an iteration, not the iteration's element data as
a whole), so recovering to different node positions is currently not possible.
  I'm working on that before I start on handling recovery of
the VT data in the VTContext. For now, I may just update the file backend
to the new virtual hierarchy and use that for more configurability.

  Checkpoints of status info seem very large right now, about 66M per node
with only one collection and one objgroup being checkpointed. I have not yet
investigated that.

  Another TODO: attach listeners to collection element migration events
to send the updated status info alongside. Not an immediate concern.
option(KR_ENABLE_STDFILE "use StdFile backend for automatic checkpointing" OFF)

option(KR_ENABLE_MAGISTRATE "use Magistrate for serializing and deserializing" OFF)
option(KR_ENABLE_RESILIENT_EXEC "enable resilient execution spaces" OFF)
Contributor Author

Ah, just noticed this is wrong; see KR_ENABLE_EXEC_SPACES below. I think I messed this up somewhere in the rebase.

nmm0 and others added 8 commits June 8, 2023 12:26
(cherry picked from commit 3e08aa6)

Conflicts:
	CMakeLists.txt
	src/resilience/AutomaticCheckpoint.hpp
	src/resilience/CMakeLists.txt
	src/resilience/backend/AutomaticBase.hpp
	src/resilience/backend/stdfile/StdFileBackend.cpp
	src/resilience/backend/stdfile/StdFileBackend.hpp
	src/resilience/backend/veloc/VelocBackend.cpp
	src/resilience/backend/veloc/VelocBackend.hpp
	src/resilience/context/ContextBase.hpp
	src/resilience/context/VTContext.hpp
	src/resilience/registration/Registration.hpp
	src/resilience/registration/ViewHolder.hpp
Development

Successfully merging this pull request may close these issues.

vt and magistrate support
2 participants