Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve VPC CNI memory by reducing number of things it is caching #2887

Open
GnatorX opened this issue Apr 18, 2024 · 12 comments
Open

Improve VPC CNI memory by reducing number of things it is caching #2887

GnatorX opened this issue Apr 18, 2024 · 12 comments

Comments

@GnatorX
Copy link
Contributor

GnatorX commented Apr 18, 2024

What would you like to be added:
Narrow down what VPC CNI is caching to reduce memory utilization in large clusters. Currently we see pretty high memory utilization and seems to scale with nodes.

Why is this needed:
The current behavior is pretty problematic when cluster size gets large (5000+ nodes) causing us to increase memory request for the CNI even though it isn't necessary for the CNI to use all that memory.

I believe simply adding a new ByObject Filter on Node object is enough to reduce cache utilization. Since https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/k8sapi/k8sutils.go#L183 only gets the node object the CNI is running on.

@orsenthil
Copy link
Member

Thanks for the report and the Pull Request. Have you done any measurements with and without this change? Could you share the differences?

@GnatorX
Copy link
Contributor Author

GnatorX commented Apr 19, 2024

Not yet. Will update once I have tested this

@GnatorX
Copy link
Contributor Author

GnatorX commented Apr 19, 2024

@orsenthil I am wondering if it make sense to even cache nodes. K8s caches which usesList + watches on startup are extremely expensive calls. The CNI only cares about the node it is running on and calls with node name is index from k8s side which is relatively fast. Rather than filtering, why not just use non-cached calls get that information?

The availability difference isn't that high, watches vs a call.

@GnatorX
Copy link
Contributor Author

GnatorX commented Apr 29, 2024

I took a pprof of the issue.
Screenshot 2024-04-29 at 10 30 58 AM

It seems like the issue is with the stream watcher is consuming memory during cluster size increase. It seems to require quite a bit of memory in order to process all nodes and store it in the memory. Even though the memory consumption isn't very high, its still unnecessary to store all node information in cache.

I need to re-test this with my change however I do believe the real solution is to avoid performing list watch against all nodes and only watch for node events specific to the CNI.

@orsenthil
Copy link
Member

K8s caches which usesList + watches on startup are extremely expensive calls

Even though the memory consumption isn't very high, its still unnecessary to store all node information in cache.

I do believe the real solution is to avoid performing list watch against all nodes and only watch for node events specific to the CNI.

It is pretty standard for k8s client calls to use the cached client. It will be good to measure difference in the memory usage and the performance of the various operations in the large clusters before we decide to not use the cache.

With your changes, if you see any different in both memory and performance, please share an update here.

@GnatorX
Copy link
Contributor Author

GnatorX commented May 1, 2024

It is pretty standard for k8s client calls to use the cached client. It will be good to measure difference in the memory usage and the performance of the various operations in the large clusters before we decide to not use the cache.

Agreed.

When I tested my changes, it didn't yield significant difference in memory utilization. I believe, as shown in the pprof, the memory usage is because of the stream watcher attempting unmarshal incoming data. I think rather than using a informer cache and raw watch against the node itself may be more efficient(?).

I can close to issue for now since I likely don't have time to look into writing a direct watcher instead and I think the memory spike isn't large enough to be a concern.

@orsenthil
Copy link
Member

I can close to issue for now since I likely don't have time to look into writing a direct watcher instead and I think the memory spike isn't large enough to be a concern.

This sounds reasonable, if we have better proof with improvements, we can bring this change in.

Copy link

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.

@GnatorX
Copy link
Contributor Author

GnatorX commented Oct 10, 2024

@orsenthil can we reopen this. We have more data now

@GnatorX
Copy link
Contributor Author

GnatorX commented Oct 10, 2024

When we added the filter #2888 we were able to drop our memory utilization on a 3000 nodes cluster.
Screenshot 2024-10-10 at 11 14 15 AM

@orsenthil orsenthil reopened this Oct 10, 2024
@orsenthil
Copy link
Member

When we added the filter #2888 we were able to drop our memory utilization on a 3000 nodes cluster.

Could you explain this a bit more, how did adding cache filter on node reduce the VPC CNI memory utilization? Is it due to stream watcher you attributed to here - #2887 (comment)

@GnatorX
Copy link
Contributor Author

GnatorX commented Oct 23, 2024

Sorry i realize now that I commented on the PR and not the issue. Feel free to close this against the PR since it is merged now

@orsenthil orsenthil added this to the v1.18.6 milestone Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants