fix: set max slots and checkpoint gc policy should comply with config policies #10140
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #10140 +/- ##
==========================================
+ Coverage 54.71% 54.77% +0.06%
==========================================
Files 1266 1266
Lines 159970 159986 +16
Branches 3662 3661 -1
==========================================
+ Hits 87525 87637 +112
+ Misses 72312 72216 -96
Partials 133 133
✅ Deploy Preview for determined-ui canceled.
patch experiment really sketches me out. e.SetGroupWeight(*newResources.Weight) and all call e.db.SaveExperimentConfig(e.ID, e.activeConfig) and so does the endpoint itself. i feel like there are definitely calls where i could make the exp in memory state and db state inconsistent..?
hmm yea good point.
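To illustrate the in-memory vs. db concern above, here is a minimal sketch of one way to keep the two in step: persist first, and only mutate in-memory state once the write succeeds. The `configStore`, `memStore`, `experiment`, and `setWeight` names are hypothetical and are not taken from the Determined codebase; this is not what the PR does.

```go
package main

import (
	"errors"
	"fmt"
)

// configStore is a hypothetical stand-in for the experiment database layer.
type configStore interface {
	SaveWeight(expID int, weight float64) error
}

// memStore is an in-memory fake used only for this sketch.
type memStore struct {
	weights map[int]float64
	fail    bool // when true, SaveWeight returns an error
}

func (s *memStore) SaveWeight(expID int, weight float64) error {
	if s.fail {
		return errors.New("simulated db failure")
	}
	s.weights[expID] = weight
	return nil
}

// experiment is a hypothetical, stripped-down in-memory experiment.
type experiment struct {
	id     int
	weight float64
	db     configStore
}

// setWeight persists first and only then mutates in-memory state, so a
// failed write leaves memory and the store agreeing on the old value.
func (e *experiment) setWeight(w float64) error {
	if err := e.db.SaveWeight(e.id, w); err != nil {
		return fmt.Errorf("saving weight for experiment %d: %w", e.id, err)
	}
	e.weight = w
	return nil
}

func main() {
	store := &memStore{weights: map[int]float64{1: 1}}
	exp := &experiment{id: 1, weight: 1, db: store}

	// A failed save must not change the in-memory weight.
	store.fail = true
	err := exp.setWeight(4)
	fmt.Println("failed save:", err != nil, exp.weight, store.weights[1])

	// A successful save updates both sides.
	store.fail = false
	err = exp.setWeight(4)
	fmt.Println("successful save:", err == nil, exp.weight, store.weights[1])
}
```

Routing every mutation through a single setter like this means there is only one place where the two copies can drift apart.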
@@ -418,13 +418,12 @@ func (e *internalExperiment) SetGroupMaxSlots(msg sproto.SetGroupMaxSlots) {
return
}

slots, err := configpolicy.CanSetMaxSlots(msg.MaxSlots, w.ID)
err = configpolicy.CanSetMaxSlots(msg.MaxSlots, w.ID)
This PR changes/undoes some of the changes you just made in your previous PR ... why is that?
This was my bad, when I made the previous PR I didn't realize that a lot of the code in the PATCH experiment API handler is specifically for det e set CLI commands, and relied too heavily on integration tests of the functions I implemented to use in the PATCH handler, rather than testing the handler itself.
After manually testing PATCH requests resulting from those commands and adding automated integration testing for such requests, I think these changes should properly address any remaining issues/inconsistencies with expected config policy-related behavior in this API handler.
i'm still slightly fuzzy here. just to make sure we're all on the same page.. resources.max_slots/slots/slots_per_trial are all different but we're ok with constraints.max_slots controlling all of them? in which case.. maybe if there is a constraint or invariant for max slots it should always get set here, because otherwise an experiment could exceed max slots? once again i confused the check when patched and the check when launched.
Yea! This is just for patching the resources.max_slots field specifically. Somewhere in the request validation, it could be useful to check resources.slots or resources.slots_per_trial in this case to make sure it doesn't exceed the requested resources.max_slots, but I think this goes back to the conversation of not wanting a constraint to alter configs? I think it could be worth checking in this API handler that resources.max_slots >= resources.slots_per_trial in the experiment config, but that kinda goes beyond intended fixes for this ticket.
Happy to add it here either way though, since a check like that does make sense!
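The extra check floated here, that resources.max_slots >= resources.slots_per_trial, could look something like the sketch below. The `resources` struct and `validatePatchedMaxSlots` are hypothetical names, and treating a nil max_slots as "no limit" is an assumption for this example.

```go
package main

import "fmt"

// resources mirrors only the two experiment config fields discussed above.
type resources struct {
	MaxSlots      *int // resources.max_slots; nil is assumed to mean no limit
	SlotsPerTrial int  // resources.slots_per_trial
}

// validatePatchedMaxSlots rejects a new max_slots that is smaller than the
// experiment's slots_per_trial. It validates only and never edits the config.
func validatePatchedMaxSlots(newMax *int, r resources) error {
	if newMax == nil {
		return nil // clearing the limit can never starve a trial
	}
	if *newMax < r.SlotsPerTrial {
		return fmt.Errorf(
			"resources.max_slots (%d) must be >= resources.slots_per_trial (%d)",
			*newMax, r.SlotsPerTrial)
	}
	return nil
}

func main() {
	requested := 4
	cfg := resources{SlotsPerTrial: 8}

	if err := validatePatchedMaxSlots(&requested, cfg); err != nil {
		fmt.Println("rejected:", err)
	}
}
```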
2a432e3 to b75303c
b75303c to 78f0e8f
I wanna note that technically, users can still override the name and description of an experiment with det e set name and det e set description, but I think that should be allowed, @kkunapuli wdyt?
Yeah, I agree.
master/internal/api_experiment.go
Outdated
@@ -1280,6 +1306,11 @@ func (a *apiServer) PatchExperiment(
}
}

// `patch` represents the allowed mutations that can be performed on an experiment, in JSON
this comment seems removable to me? or maybe should go somewhere else ha.. def is a stray at this point though.
Removed this!
7693b6d to c287a09
c287a09 to caa9179
Ticket
CM-590
Description
This PR fixes a couple of issues:
- det e set max-slots to comply with invariant config and constraints
- det e set gc-policy to comply with invariant configs
Test Plan
Checklist
docs/release-notes/
See Release Note for details.