feat: Discover Instance Type Capacity Memory Overhead #7004
base: main
Conversation
✅ Deploy Preview for karpenter-docs-prod canceled.
Force-pushed from 2493708 to 6ee1b24
Force-pushed from 6975505 to c6f9af5
Thanks for the PR! Looks really good!
pkg/controllers/providers/instancetype/memoryoverhead/controller.go (outdated; resolved)
```go
nodeClassMap := lo.Associate(nodeClassList.Items, func(nodeClass v1.EC2NodeClass) (string, v1.EC2NodeClass) {
	return nodeClass.Hash(), nodeClass
})
```
Not sure you want to do the hash, as that encompasses more details than we care about for resource allocation.
Further down, there's a match against this map and the node's nodeclass hash annotation, so we're only storing cache values for the latest version of the nodeclass. Would you instead prefer keying this map on the nodeclass name and then, during the node loop, checking that the node's AMI is within the nodeclass status.amis list? Something like the sketch below.
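A rough sketch of that alternative, not code from this PR — the label keys and the loop body are assumptions for illustration:

```go
// Key the map on nodeclass name instead of Hash(), then during the node loop
// only accept a node whose AMI still appears in the nodeclass's status.amis.
nodeClassMap := lo.Associate(nodeClassList.Items, func(nodeClass v1.EC2NodeClass) (string, v1.EC2NodeClass) {
	return nodeClass.Name, nodeClass
})

for _, node := range nodeList.Items {
	nodeClass, ok := nodeClassMap[node.Labels["karpenter.k8s.aws/ec2nodeclass"]] // assumed label key
	if !ok {
		continue
	}
	amiID := node.Labels["karpenter.k8s.aws/ami-id"] // assumed label key
	if !lo.ContainsBy(nodeClass.Status.AMIs, func(ami v1.AMI) bool { return ami.ID == amiID }) {
		continue // node was launched from an AMI no longer in status.amis; skip it
	}
	// ... record the observed memory overhead for this node's instance type ...
}
```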
CC @jmdeal (I think you were the one on the call today?) There's a negligible amount of variance when measuring overhead in Ki, but I was unable to find any variance between nodes of the same instance type when measuring in Mi, which is what my PR does. Here's a simple script that can be used to verify: https://gist.github.com/jukie/df045af8fec68941f5d119044bf04aee
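This isn't the linked gist, just a minimal client-go sketch of the same verification idea: group node memory capacity by instance type, convert to Mi, and check whether any instance type reports more than one distinct value.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// instance type -> memory capacity in Mi -> node count
	capacityMi := map[string]map[int64]int{}
	for _, node := range nodes.Items {
		it := node.Labels[corev1.LabelInstanceTypeStable] // "node.kubernetes.io/instance-type"
		mem := node.Status.Capacity[corev1.ResourceMemory]
		mi := mem.Value() / (1024 * 1024)
		if capacityMi[it] == nil {
			capacityMi[it] = map[int64]int{}
		}
		capacityMi[it][mi]++
	}
	for it, byMi := range capacityMi {
		// More than one key for an instance type would indicate Mi-level variance.
		fmt.Printf("%s: %v\n", it, byMi)
	}
}
```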
Force-pushed from c63c9e6 to cda22f9
Fixes #5161
Description
This adds a new controller for managing a cache on the instancetype provider that stores the memory capacity overhead for each instance type, measured by comparing the advertised capacity against the actual value reported once a Kubernetes Node registers with the cluster.

The cache is then consulted by NewInstanceType() when calculating memory capacity. If a cached value doesn't exist, it falls back to the existing vmMemoryOverheadPercent logic. The cache is keyed by a combination of the instance type name and a hash of nodeClass.Status.AMIs, so it should always be accurate; when an AMI update is triggered, the existing vmMemoryOverheadPercent calculation is used to ensure safe instance creation every time.
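To make the description concrete, here's a minimal sketch of what the lookup could look like. The receiver, field names, cache type, and hashing call are assumptions for illustration rather than this PR's actual code; it presumes samber/lo, mitchellh/hashstructure/v2, and a go-cache-style cache.

```go
// Sketch only: identifiers here are illustrative, not this PR's actual code.
func (p *Provider) memoryCapacity(info ec2types.InstanceTypeInfo, nodeClass *v1.EC2NodeClass) *resource.Quantity {
	advertised := resource.NewQuantity(aws.ToInt64(info.MemoryInfo.SizeInMiB)*1024*1024, resource.BinarySI)

	// Key on instance type name + hash of the resolved AMIs so that an AMI
	// update naturally misses the cache and falls back to the estimate.
	amiHash := lo.Must(hashstructure.Hash(nodeClass.Status.AMIs, hashstructure.FormatV2, nil))
	key := fmt.Sprintf("%s-%d", info.InstanceType, amiHash)

	if overhead, ok := p.memoryOverheadCache.Get(key); ok {
		// Use the overhead observed from a real registered node.
		return resource.NewQuantity(advertised.Value()-overhead.(int64), resource.BinarySI)
	}
	// No observation yet (or the AMIs changed): fall back to the existing
	// vmMemoryOverheadPercent estimate.
	return resource.NewQuantity(int64(float64(advertised.Value())*(1-p.vmMemoryOverheadPercent)), resource.BinarySI)
}
```

How was this change tested?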
Suite tests
Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.