feat: per-instance type VM memory overhead percentages #6146

pH14 · 2024-05-03T20:31:19Z

Bandaid for #5161, without trying to solve the problem globally for all EC2 instance types

Description

As originally discussed in kubernetes-sigs/karpenter#716 the global vmMemoryOverheadPercent option is limiting when running on a heterogeneous set of nodes. For instance, a small EC2 instance may require ~7% overhead, but the larger nodes may only require 1-2% overhead. Undersizing this values risks provisioning nodes that cannot actually run the pods it's intended for, and oversizing is inefficient / expensive.

This PR is a strawman proposal for solving this problem: allow Karpenter to accept a mapping of instance type to memory overhead percentage, and fall back to the global vmMemoryOverheadPercent if no specific mapping exists. For our deployment this would allow us to create a mapping for the instance types we care about, and potentially sets up the suggestion in #5161 of "We could launch instance types, check their capacity and diff the reported capacity from the actual capacity in a generated file that can be shipped with Karpenter on release so we always have accurate measurements on the instance type overhead"

How was this change tested?

It hasn't been yet! Before going further on the tests, I wanted to gather input into whether this change might be accepted at all first.

Does this change impact docs?

Yes, PR includes docs updates
Yes, issue opened: #
No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…ping

netlify · 2024-05-03T20:31:35Z

✅ Deploy Preview for karpenter-docs-prod ready!

Name	Link
🔨 Latest commit	`a578fa2`
🔍 Latest deploy log	https://app.netlify.com/sites/karpenter-docs-prod/deploys/6635499ab9d90300080c7ffe
😎 Deploy Preview	https://deploy-preview-6146--karpenter-docs-prod.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

engedaam · 2024-05-12T00:47:44Z

@pH14 Thank you for the contribution! I have a few questions and suggestions on the next steps you can take. The team mainly hopes to have an RFC for long standing problems such as this one. Within the RFC, I hope that you can address questions such as; What are all the options for solving this problem? What would be the best short and long terms solutions? What is preventing us from implementing the long term solution? Why should Karpenter have a bandaid solution for the short term? Do we want to expose API surface to customers if we don’t intend for us to keep it long term? The team is heading to v1 of the project and we are critical thinking about API surface that we want to expose to user to avoid the need of introducing a breaking change.

The team is definitely open to accepting contributions for VM memory overhead percentages, but we just want to make sure we have evaluated all the possible solutions. Here are examples of RFC we have used in the past: https://github.com/kubernetes-sigs/karpenter/tree/main/designs and https://github.com/aws/karpenter-provider-aws/tree/main/designs. I also recommended coming to the working group meetings to discuss your ideas for possible solutions. Here is where you can find the schedule for the meetings: https://karpenter.sh/docs/contributing/working-group/.

github-actions · 2024-05-26T12:04:07Z

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

Add in option for per-instance type VM memory overhead percentage map…

a578fa2

…ping

pH14 requested a review from a team as a code owner May 3, 2024 20:31

pH14 requested a review from jonathan-innis May 3, 2024 20:31

jonathan-innis assigned engedaam May 10, 2024

github-actions bot added the lifecycle/stale label May 26, 2024

github-actions bot added the lifecycle/closed label Jun 9, 2024

github-actions bot closed this Jun 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: per-instance type VM memory overhead percentages #6146

feat: per-instance type VM memory overhead percentages #6146

pH14 commented May 3, 2024

netlify bot commented May 3, 2024 •

edited

Loading

engedaam commented May 12, 2024

github-actions bot commented May 26, 2024

feat: per-instance type VM memory overhead percentages #6146

feat: per-instance type VM memory overhead percentages #6146

Conversation

pH14 commented May 3, 2024

netlify bot commented May 3, 2024 • edited Loading

✅ Deploy Preview for karpenter-docs-prod ready!

engedaam commented May 12, 2024

github-actions bot commented May 26, 2024

netlify bot commented May 3, 2024 •

edited

Loading