CNI ipamd reconciliation of in-use addresses #3109
Comments
The CNI today only reconciles its datastore with existing pods at startup, never again afterwards. It is possible for ipamd to go out of sync with the kubelet's view of the pods running on the node if ipamd fails or is temporarily unreachable while the CNI plugin is handling the DelNetwork call from the container runtime. In that case the CNI continues to consider the pod's IP allocated and will never free it, since it will never see another DelNetwork for that pod. This eventually causes the CNI to fail to assign IPs to new pods. This change adds a reconcile loop that periodically (once a minute) reconciles the allocated IPs against the existence of the pods' veth devices. If a veth device is not found, the corresponding allocation is freed, making the IP available for reuse. Fixes aws#3109
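For illustration, here is a minimal Go sketch of what such a veth-based reconcile loop could look like. The Datastore interface, its AllocatedIPs/UnassignIP methods, and the per-allocation veth name are hypothetical stand-ins for ipamd's internal datastore; only the netlink lookup reflects how the existence of a host-side veth can actually be checked.

```go
// Package reconciler: a minimal sketch of the periodic reconcile loop
// described above. Datastore and Allocation are hypothetical stand-ins
// for ipamd's internal datastore, not the project's real types.
package reconciler

import (
	"log"
	"time"

	"github.com/vishvananda/netlink"
)

// Allocation pairs an assigned pod IP with the host-side veth created for it.
type Allocation struct {
	IP       string
	VethName string
}

// Datastore is a stand-in for ipamd's IP datastore.
type Datastore interface {
	AllocatedIPs() []Allocation
	UnassignIP(ip string) error
}

// ReconcileLoop frees any allocation whose host-side veth no longer exists,
// running once a minute as the change describes.
func ReconcileLoop(ds Datastore, stop <-chan struct{}) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			for _, alloc := range ds.AllocatedIPs() {
				_, err := netlink.LinkByName(alloc.VethName)
				if err == nil {
					continue // veth still present, pod presumed alive
				}
				// LinkNotFoundError means the pod's veth is gone: the pod
				// was torn down but the DelNetwork was missed or failed.
				if _, ok := err.(netlink.LinkNotFoundError); ok {
					log.Printf("veth %s missing, freeing IP %s", alloc.VethName, alloc.IP)
					if err := ds.UnassignIP(alloc.IP); err != nil {
						log.Printf("failed to free %s: %v", alloc.IP, err)
					}
				}
			}
		case <-stop:
			return
		}
	}
}
```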
Hi @hbhasker, thank you for this report. Could you confirm this by looking at the ipamd.log? If you share the logs at k8s-awscni-triage@amazon.com, we can look over it too.
I did confirm by looking at the JSON for the datastore as well. It clearly had pods in there that had already terminated on the node. I will see if I run into another occurrence of the same and capture more information.
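For anyone wanting to repeat this check, below is a rough sketch of dumping the datastore checkpoint so it can be compared against the pods actually on the node. The file path (/var/run/aws-node/ipam.json) and the JSON field names are assumptions that may vary by CNI version; treat this as illustrative only.

```go
// Hypothetical inspection helper: print the IP allocations recorded in
// ipamd's checkpoint file. Path and JSON field names are assumptions.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

func main() {
	data, err := os.ReadFile("/var/run/aws-node/ipam.json") // assumed path
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	var checkpoint struct {
		Allocations []struct {
			IPv4        string `json:"ipv4"`        // assumed field name
			ContainerID string `json:"containerID"` // assumed field name
		} `json:"allocations"`
	}
	if err := json.Unmarshal(data, &checkpoint); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d allocations recorded:\n", len(checkpoint.Allocations))
	for _, a := range checkpoint.Allocations {
		fmt.Printf("  %s -> %s\n", a.IPv4, a.ContainerID)
	}
}
```

The printed count can then be compared against `kubectl get pods --field-selector spec.nodeName=<node> -o wide` (minus host-network pods) to spot stale entries.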
@orsenthil - We will have to check the plugin logs to see if the delete request landed on the CNI, since the kubelet is the source of truth. I don't think we should add more reconcilers; rather, we should check why the event was missed or not received.
What happened:
We noticed that on some hosts the CNI believed 64 IPs were in use by pods that had long since terminated. When we checked, the node had only 59 pods (including ones using host networking), but the CNI clearly thought 64 pods were running and failed to allocate IPs to new pods because all IPs were in use (we set max pods to 64 for the node). We spent some time trying to figure out how this happens; presumably it can occur if the CNI misses the delete event or fails to process it.
I was trying to read the code to see if there is some race where a delete and a create can collide, causing the CNI to incorrectly reject the delete and then proceed to mark the IP as allocated in ipamd. In that case the IP remains in use even though the pod is gone. (It's possible I am misunderstanding what kubelet/cri-o do when a pod is terminated and the CNI fails the DelNetwork request with an error.)
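A hedged illustration of why a failed or rejected DelNetwork can leak the IP: the CNI spec's convention is that DEL should generally complete without error even if some resources are already missing, because the runtime may otherwise give up retrying while the allocation stays pinned. The handler and ipamClient below are hypothetical, not the plugin's actual code.

```go
// Package plugin: a hypothetical DelNetwork handler illustrating idempotent
// DEL handling per the CNI convention; not the aws-vpc-cni plugin's code.
package plugin

import "errors"

// ErrUnknownPod is a stand-in for "ipamd has no record of this pod".
var ErrUnknownPod = errors.New("pod not found in datastore")

type ipamClient interface {
	ReleaseIP(containerID string) error
}

// delNetwork releases the pod's IP. If ipamd no longer knows the pod,
// returning success (rather than an error) lets the runtime finish its
// teardown instead of leaving the allocation stuck forever.
func delNetwork(c ipamClient, containerID string) error {
	if err := c.ReleaseIP(containerID); err != nil {
		if errors.Is(err, ErrUnknownPod) {
			return nil // already released; DEL must be idempotent
		}
		return err
	}
	return nil
}
```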
Mostly I am looking to understand whether this is a known issue. It looks like the CNI does reconcile its database on a restart, but maybe it needs to reconcile periodically to prevent this?
Environment:
- Kubernetes version (`kubectl version`): 1.26
- OS (`cat /etc/os-release`): Amazon Linux 2023
- Kernel (`uname -a`): 6.1