How does Karpenter deal with the new m7i-flex instance type? #4367

Open
stevehipwell opened this issue Aug 3, 2023 · 30 comments

Comments

@stevehipwell
Contributor

Given that the new m7i-flex instance type is now available and breaks the current pattern of burstable instances living outside the m family, how does Karpenter currently treat these instances, and how can they be filtered out? If this isn't a default pattern, what plans are there to improve this?

My "two cents" is that the concept of burstable instances is very similar to the spot instance concept and should probably align with that.

@yuval-almog

yuval-almog commented Aug 3, 2023

Can I join in and also ask when the "regular" m7i instances will be supported by Karpenter?

@stevehipwell
Contributor Author

@yuval-almog I'd assume that they'd be supported as soon as they were supported in EC2 Fleet and I'd expect that to be immediately where constraints are unbounded or general to instance family/generation.

@stevehipwell
Contributor Author

I'm also not sure exactly what Karpenter understands about PVs and the EBS CSI Driver but if that is part of the scheduling logic then there would need to be some code changes to support the new improved attachment limits for m7i instances. If Karpenter is PV & CSI aware I can see some significant cost benefits for scheduling workloads with high PV requirements onto m7i at a higher per CPU cost but a lower total cost.

@ellistarn ellistarn added the feature New feature or request label Aug 7, 2023
@ellistarn
Contributor

Discussed with the team. We think we need to make two changes.

  • Introduce a new well known label for "burstable" compute. This will apply to flex+T instance types
  • Add a new default requirement that opts out of burstable instance types by default.

@ellistarn ellistarn added the good-first-issue Good for newcomers label Aug 9, 2023
@bwagner5 bwagner5 removed the good-first-issue Good for newcomers label Aug 22, 2023
@bwagner5
Contributor

After chatting with the EC2 team working on m7i-flex, we've decided to not treat this instance type as "burstable". We think this instance type will fit workloads for the majority of Karpenter users. There may be some edge cases where it's not the right fit, if you are running close to 100% node CPU utilization at all times, but I'd expect this to be a small number, so we'd like to generally treat this just like any other m family instance type. If you're not comfortable with using the m7i-flex, then you can always add a requirement to exclude the m7i-flex instance family like so:

  requirements:
    - key: "karpenter.k8s.aws/instance-family"
      operator: NotIn
      values: ["m7i-flex"]

Here's a quote from the m7i-flex docs page:

"M7i-flex instances efficiently use compute resources to deliver a baseline level of performance with the ability to scale up to the full compute performance a majority of the time."

@jmdeal jmdeal assigned jmdeal and unassigned jmdeal Aug 25, 2023
@stevehipwell
Contributor Author

@bwagner5 I think you're right about m7i-flex instances being generally fine, but semantically I think there should be a well-known label to differentiate them from non-flex instances for the use cases where they might be a problem (or, more likely, be perceived as a problem). This would make the distinction in Karpenter as clear as the -flex suffix makes it in the general AWS docs, and it would also mean that the addition of new -flex instances doesn't result in unexpected behaviour.

My suggestion would be to add a label to differentiate between standard, flex (-flex) & burstable (t) instances. This could then be used by a provisioner to change the default behaviour for all new instance types without needing a change for each new one.

@ellistarn
Contributor

ellistarn commented Aug 31, 2023

I wonder if t, flex, and standard could be considered as QoS classes that share the same label. @jonathan-innis, we'd need to think quickly on this if we wanted to change anything for beta, since burstable is a Boolean and would need to evolve to an enum. Maybe best to do it additively and leave the old label in.

@stevehipwell
Contributor Author

@ellistarn can/does Karpenter provision t instances as unlimited? If so is there a discussion about how a t instance set to unlimited should be classified?

@FernandoMiguel
Contributor

By default t3 are unlimited and t2 are not.
There is an account/org wide option to set that, besides the one per launch template.
Karpenter doesn't really handle that, and AFAIK has no native support to set that in the native lc

@stevehipwell
Contributor Author

@ellistarn did this get discussed before the beta changes? I think the principle of a QoS class for burstable instances makes sense, and I think all t3 instances (unlimited or not) should be treated the same for simplicity.

@ku524

ku524 commented Dec 11, 2023

I discovered that m7i-flex is running on EKS without specifying the t type in nodepool. It could have been a major issue if it occurred in production. Is there any update?

@saurabhmodh4

First, M7i-flex instances are not T instances.
M7i-flex instances belong to the M family and deliver similar performance as the M7i instances at a lower price.
M7i-flex instances are the easiest way for you to get price performance benefits for a majority of general-purpose workloads. They deliver up to 19% better price performance compared to M6i instances.
M7i-flex instances are designed to deliver a baseline CPU performance with the ability to scale up to the full CPU performance 95% of the time. With M7i-flex instances, you can seamlessly run web and application servers, virtual desktops, batch processing, microservices, enterprise applications, and more.
We have designed M7i-flex to deliver similar performance as M7i for all workloads including production workloads.

Comparison vs. M7i: M7i instances are a great choice for all general-purpose workloads, especially for workloads that need the largest instance sizes or continuous high CPU usage, such as large application servers and databases, gaming servers, CPU-based machine learning (ML), and video streaming.

https://aws.amazon.com/ec2/instance-types/m7i/
https://aws.amazon.com/about-aws/whats-new/2023/08/amazon-ec2-m7i-flex-m7i-instances/
https://aws.amazon.com/blogs/aws/new-seventh-generation-general-purpose-amazon-ec2-instances-m7i-flex-and-m7i/
https://aws.amazon.com/ec2/faqs/
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/general-purpose-instances.html#general-purpose-performance

Curious to know if you saw any issue or if you are speculating.

@stevehipwell
Contributor Author

@saurabhmodh4 M7i-flex instances aren't explicitly compatible with M7i instances (that's why they're cheaper) so shouldn't be treated the same. Due to how Kubernetes works you would need to have the full context to decide on a case-by-case basis if M7i-flex instances were appropriate or not; it's a classic "it depends..." scenario. Therefore the absolute minimum requirement is that it should be possible to proactively block all current and future flex instances from being resolved by a specific provisioner.

@ku524

ku524 commented Dec 13, 2023

I agree with what Steve Hipwell said. Why is it cheaper than m7i? Because there are trade-offs. The basic performance is provided at 40%, but it can burst up to 95% using credits. You mentioned that it is not a T type, but to me the concept seems completely identical to the T type, with only the name changed. Am I right? If I am misunderstanding something due to lack of information, please explain.

If I am correct, isn't it natural that problems could arise when m7i and m7i-flex are mixed? Imagine an EC2 provisioned in an environment operating on the assumption of 100% usage, facing performance limitations when there are no credits. Pods running on m7i-flex could experience issues such as increased latency due to not being able to use the CPU as needed, while the overall CPU usage remains low. This means that services with pod autoscaling based on CPU usage would not be able to expect autoscaling. This could cause disruptions in users' environments.

Although the name change without maintaining backward compatibility is a bit surprising, it is understandable if it's for the sake of consistency with the same generation/architecture of EC2. However, making it indistinguishable from other m-type instances could be a serious problem. Also, the fact that it is useful for specific workloads is not a reasonable basis for preventing the Karpenter development team from distinguishing it, especially when there is a clear difference.

@saurabhmodh4

Sorry, that is wrong. There are no credits in M7i-flex.
Let me clarify in a few steps:

  1. The baseline performance of 40% is a guidance for our customers to easily decide if their workloads are a fit or not. It is derived from the fact that a majority of workloads we see on EC2 do not have more than 40% CPU utilization.
  2. The baseline is not enforced in M7i-flex. However, the baseline is enforced in T instances by leveraging the CPU credit system.
  3. It was designed from the ground up to deliver the performance that majority of the workloads on our M instances need. That's why we call it M7i-flex and that's why we want it to be classified as an M instance in Karpenter.
  4. In your example @ku524 we do not have credits or restrictions so the CPU usage will increase as the Pods need more CPU.
  5. It is useful for a majority of workloads and not specific workloads.
  6. I encourage you to try it out and let me know if you see the things you are mentioning by measuring real-world application performance.
  7. We always provide an --exclude option to exclude a specific instance type if it does not match your workload requirements.

I don't think I got an explicit answer to my question from last time.
"Curious to know if you saw any issue or if you are speculating".

@stevehipwell
Contributor Author

@saurabhmodh4 the documentation specifically states that the M7i-flex instance isn't right for all workloads (emphasis mine).

"M7i-flex instances are the easiest way for you to get price performance benefits for a majority of general-purpose workloads."

I'm also pretty sure that what you see in general EC2 instances is significantly different to what is seen in Kubernetes, where we're attempting to optimise bin packing and CPU utilisation. Flex looks to be an alternative approach to cost cutting which may work in limited Kubernetes contexts but is likely to reduce performance and introduce not only additional latency but also a lack of repeatability.

Can I also point out that both the network and EBS bandwidths are significantly reduced for flex instances. Both of these are significant to the operation of most non-trivial Kubernetes clusters.

The more I think about this the more I think that flex instances shouldn't be treated as general purpose instances in the Kubernetes context. A sub 20% price performance benefit isn't going to make up for the engineering cost of guaranteeing unexpected behaviour isn't caused by the compute platform. It's definitely not going to be worth it where performant networking or storage is involved. So ideally Karpenter makes flex instances explicit opt-in.

Again I'll point out that at the bare minimum there needs to be a documented pattern for blocking current and future flex instances.

@saurabhmodh4

Agree on the network and EBS baseline bandwidths and the association to non-trivial clusters that need high baseline network and EBS specs. But I bet you those are not the majority which is what I believe you are also implying.
I would like to understand the requirements and the use cases you are referring to since our data showing a majority of workloads not consuming high CPU includes Kubernetes too. I do not agree with "limited Kubernetes contexts and reduced performance" based on the data I see.
I will work to arrange a meeting with you.

@ku524

ku524 commented Dec 14, 2023

  1. Of course, since we haven't actually encountered a service failure yet, it remains a concern for now. However, it's not necessary to wait for a failure in production to consider its significance. Waiting until then is too late.

  2. Thank you for the explanation, but there are still parts that I don't fully understand. So, what are the trade-offs of the m7i-flex? Is it just cheaper? A clear explanation of the trade-offs is necessary, as merely highlighting the benefits does not suffice for proper information delivery.

  3. To reiterate, regardless of how good the m7i-flex is, it's not justifiable to deprive us of the means to distinguish it. It must be a clearly different type from the m7i, or else AWS wouldn't have named them differently (m7i vs m7i-flex). I understand your point about m7i-flex benefiting most users, but I find it hard to agree: because Kubernetes workloads specify resource requests, the cases that benefit from m7i-flex seem to be the exception. Even if I were to agree, depriving users of the right to distinguish between them is a separate issue. EKS is not serverless, and users should have full control over the node types.

  4. I'm well aware that it's possible to exclude certain instance types. However, adopting such an approach poses several problems, with the biggest concern being the need to use the same method every time a new flex type is introduced. I am already investing a lot of effort to differentiate between Intel and AMD instance types, as they are not distinguished. I don't want to add complexity with the addition of flex types. Furthermore, excluding them through 'exclude' doesn't guarantee they will be excluded when new ones are added, does it? It's unreasonable to wait for a user to encounter a service failure before making such changes.
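The concern in point 4 can be illustrated with a short sketch. It is not Karpenter code: `allowed_by_not_in` is a hypothetical helper that only models a `NotIn` requirement on `karpenter.k8s.aws/instance-family`, `allowed_by_suffix` models a name-based check, and `c7i-flex` is used purely as an example of a flex family that appears after the exclusion list was written.

```python
# Hypothetical model of a NotIn requirement on the instance-family label.
# The list below stands for an exclusion written when only m7i-flex existed.
EXCLUDED_FAMILIES = {"m7i-flex"}


def family_of(instance_type: str) -> str:
    """Return the family part of an instance type, e.g. 'm7i-flex' for 'm7i-flex.large'."""
    return instance_type.split(".", 1)[0]


def allowed_by_not_in(instance_type: str) -> bool:
    """True when the family is absent from the explicit exclusion list."""
    return family_of(instance_type) not in EXCLUDED_FAMILIES


def allowed_by_suffix(instance_type: str) -> bool:
    """True when the family name does not end in '-flex'."""
    return not family_of(instance_type).endswith("-flex")


if __name__ == "__main__":
    for itype in ["m7i.large", "m7i-flex.large", "c7i-flex.large"]:
        print(itype, allowed_by_not_in(itype), allowed_by_suffix(itype))
```

In this model the explicit list still lets `c7i-flex.large` through, whereas the suffix check rejects it, which is why a single flex-wide selector would be less fragile than a per-family list.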

@ellistarn
Contributor

@jonathan-innis should we build a selector for amd/intel? I agree we should have a selector for flex.

@jonathan-innis
Contributor

I'm supportive of a selector on CPU manufacturer, similar to how we have one for GPU manufacturer. Looks like that particular issue is tracked here: #3529. We'd definitely be receptive if anyone was wanting to pick that one up and implement it.

@karlpvoss

@saurabhmodh4 is there any documentation you can point us to that shows how the new flex instances actually work? As you've stated, there's no credit system, but more information on the following would be very helpful:

"M7i-flex instances are designed to deliver a baseline CPU performance with the ability to scale up to the full CPU performance 95% of the time."

How long is the instance able to hold the 95% figure? What factors determine whether I can scale my CPU usage on the instance, and so on?

For what it's worth, I agree with the sentiment that this instance will probably be very useful for a large majority of users, but as a person looking to pack pods and use spot instances I was surprised to see this instance in my fleet.

@jonathan-innis
Contributor

"For what it's worth, I agree with the sentiment that this instance will probably be very useful for a large majority of users, but as a person looking to pack pods and use spot instances I was surprised to see this instance in my fleet."

To chime in on this: I think we're generally supportive of adding additional selectors here (both on CPU manufacturer and on whether something is a flex instance type or not). Maybe some labels like karpenter.k8s.aws/instance-cpu-manufacturer and karpenter.k8s.aws/instance-flex?

In either case, this isn't something that the maintainers have bandwidth to pull in right now but we'd be supportive of a PR that added this feature in if one was opened.

@NetanelK
Contributor

NetanelK commented Mar 4, 2024

Hi guys, I opened a PR (#5769) that takes care of CPU manufacturer selection.

I haven't seen an option to distinguish between flex and standard instances besides their name, unlike burstable ones, which the EC2 API does expose.

We could create a requirement (karpenter.k8s.aws/instance-performance-mode?) that checks the instance name (for flex) and the burstable flag (from the API), falls back to standard otherwise, and assigns one of the values flex, burstable, or standard.
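A minimal sketch of that classification, assuming a hypothetical function `performance_mode` and a `burstable` flag taken from the `BurstablePerformanceSupported` field that EC2's DescribeInstanceTypes returns. The label name and its three values are only the proposal above, and the order of the checks (name first, then the API flag) is an assumption.

```python
def performance_mode(instance_type: str, burstable: bool) -> str:
    """Classify an instance type as 'flex', 'burstable', or 'standard'.

    instance_type: an EC2 instance type name such as 'm7i-flex.large'.
    burstable: the BurstablePerformanceSupported value reported by EC2.
    """
    # The family is everything before the first dot, e.g. 'm7i-flex'.
    family = instance_type.split(".", 1)[0]
    # Flex is only detectable from the name, so check the suffix first.
    if family.endswith("-flex"):
        return "flex"
    # Burstable (T family) is exposed by the API, so trust the flag.
    if burstable:
        return "burstable"
    return "standard"


if __name__ == "__main__":
    for name, flag in [("m7i-flex.large", False), ("t3.medium", True), ("m7i.large", False)]:
        print(name, performance_mode(name, flag))
```

Keeping the result as a single enum-style value, rather than separate Boolean labels, matches the earlier suggestion that t, flex, and standard could share one label.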

@stevehipwell
Contributor Author

@bwagner5 did we come up with a proposed label for this?

@jonathan-innis
Contributor

jonathan-innis commented Apr 12, 2024

How do we feel about karpenter.k8s.aws/instance-capability-flex: true? EC2 treats the suffix of the instance type name as capabilities: https://docs.aws.amazon.com/ec2/latest/instancetypes/instance-type-names.html. Long-term (if we had asks for it), we could also consider things like karpenter.k8s.aws/instance-capability-network-optimized and karpenter.k8s.aws/instance-capability-block-storage-optimized.

@tehranian

@jonathan-innis -- re: network-optimized -- Yes, this has been on my mind as an ask on behalf of Twilio.
Ex: Our API Edge has its own EKS clusters, and our preference is to keep them on network-optimized instances.

@jonathan-innis
Contributor

@tehranian Do you have specific bandwidth requirements, or do you just want network-optimized in general?

@jonathan-innis jonathan-innis self-assigned this Apr 17, 2024
@tehranian

@jonathan-innis I think we just want a general selector for "network-optimized":

I imagine if we had a specific bandwidth requirement, one could use a >= selector on karpenter.k8s.aws/instance-network-bandwidth using the values from https://karpenter.sh/docs/reference/instance-types/, right?

I'd probably JOIN that in my head with the table from https://instances.vantage.sh/ and choose instance-network-bandwidth > 12500 -- but that's not very easy or intuitive :) For our API Edge clusters (for example), it'd be simpler to have a boolean selector for "network-optimized".
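For reference, a bandwidth threshold like the one above could be modelled as follows. This is a sketch of how a `Gt` requirement on `karpenter.k8s.aws/instance-network-bandwidth` would be evaluated, assuming Kubernetes node-selector semantics in which both the label value and the single requirement value are parsed as integers. `matches_gt` is a hypothetical helper, and the label values in the sample nodes are made up for illustration rather than taken from any instance type table.

```python
BANDWIDTH_LABEL = "karpenter.k8s.aws/instance-network-bandwidth"


def matches_gt(labels: dict, key: str, value: str) -> bool:
    """Model a Gt requirement: the label must exist and be strictly greater than value."""
    if key not in labels:
        return False
    try:
        # Both sides are compared as integers; non-numeric values never match.
        return int(labels[key]) > int(value)
    except ValueError:
        return False


if __name__ == "__main__":
    # Illustrative label sets only; the numbers are not real instance data.
    fast_node = {BANDWIDTH_LABEL: "25000"}
    edge_node = {BANDWIDTH_LABEL: "12500"}
    unlabeled_node = {}
    for node in (fast_node, edge_node, unlabeled_node):
        print(node, matches_gt(node, BANDWIDTH_LABEL, "12500"))
```

Because the comparison is strict, a threshold of 12500 excludes a node labelled exactly 12500, which is one reason a Boolean "network-optimized" selector would be simpler to reason about.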

@jonathan-innis
Contributor

@tehranian Do you mind opening a separate issue requesting karpenter.k8s.aws/instance-capability-network-optimized? If we are able to get the PR merged in, this issue may get closed out and the request would get lost.

@tehranian

Filed #6122 for a boolean field for network-optimized instances 👍
