CNI ipamd reconciliation of in-use addresses #3109

Open
hbhasker opened this issue Nov 8, 2024 · 3 comments · May be fixed by #3113

Comments

@hbhasker

hbhasker commented Nov 8, 2024

What happened:
We noticed that on some hosts the CNI believed 64 IPs were in use by pods that had long since terminated. When we checked, the node had only 59 pods (including ones using host networking), but the CNI clearly thought 64 pods were running and failed to allocate IPs to new pods because all IPs were in use (we set max pods to 64 for the node). We spent some time trying to figure out how that happens; I suspect it can occur if the CNI somehow misses the delete event or fails to process it.

I was reading the code to see if there is a race where a delete and a create can interleave, causing the CNI to incorrectly reject the delete and then add the IP to ipamd as allocated, in which case the IP remains in use even though the pod is gone. (It's possible I am misunderstanding what kubelet/CRI-O do when a pod is terminated and the CNI fails the DelNetwork request with an error.)

Mostly I'm looking to understand whether this is a known issue. It looks like the CNI reconciles its datastore on restart but never afterwards; maybe it needs to reconcile periodically to prevent this?
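For reference, a quick way to spot this kind of mismatch is to compare the number of running, non-host-network pods on the node against the assigned-address gauge ipamd exports on its metrics endpoint. A minimal Go sketch follows; the port (61678) and the awscni_* metric names are assumptions about the default CNI metrics setup, so treat them accordingly and adjust for your version.

```go
// Minimal sketch: print ipamd's assigned/total IP gauges from its metrics
// endpoint so they can be compared against the node's running pod count.
// The port (61678) and the awscni_* metric names are assumptions about the
// default CNI metrics configuration; adjust for your setup.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strings"
)

func main() {
	resp, err := http.Get("http://localhost:61678/metrics")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch metrics:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	// Keep only the gauges relevant to spotting leaked allocations. Compare
	// awscni_assigned_ip_addresses with the number of running, non-host-network
	// pods on the node (e.g. from `kubectl get pods` filtered by node name).
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "awscni_assigned_ip_addresses") ||
			strings.HasPrefix(line, "awscni_total_ip_addresses") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read metrics:", err)
	}
}
```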

Environment:

  • Kubernetes version (use kubectl version): 1.26
  • CNI Version: 1.18.2
  • OS (e.g: cat /etc/os-release): Amazon Linux 2023
  • Kernel (e.g. uname -a): 6.1
hbhasker pushed a commit to hbhasker/amazon-vpc-cni-k8s that referenced this issue Nov 13, 2024
The CNI today only reconciles its datastore with existing pods at startup and never again. It is possible for IPAMD to go out of sync with the kubelet's view of the pods running on the node if IPAMD fails or is temporarily unreachable by the CNI plugin handling the DelNetwork call from the container runtime.

In such cases the CNI continues to consider the pod's IP allocated and will never free it, since it will not see another DelNetwork for that pod. This results in the CNI failing to assign IPs to new pods.

This change adds a reconcile loop that periodically (once a minute) reconciles the allocated IPs against the existence of the pods' veth devices. If a veth device is not found, the corresponding allocation is freed, making the IP available for reuse.

Fixes aws#3109
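For illustration, a minimal sketch of the kind of reconcile loop the commit describes is below. The Allocation and Datastore types, the use of /sys/class/net to detect the host-side veth, and the interval handling are all illustrative assumptions, not the actual ipamd datastore API or the code in #3113.

```go
// Sketch of a periodic reconcile loop along the lines described above.
// Types, names, and the veth check are illustrative; the real ipamd
// datastore and veth naming scheme differ.
package main

import (
	"log"
	"os"
	"path/filepath"
	"time"
)

// Allocation pairs an assigned pod IP with the host-side veth device created
// for the pod's sandbox (hypothetical shape, not ipamd's real type).
type Allocation struct {
	IP           string
	HostVethName string
}

// Datastore is a stand-in for ipamd's IP datastore.
type Datastore interface {
	ListAllocations() []Allocation
	UnassignIP(ip string) error
}

// vethExists reports whether the host-side veth is still present by checking
// for its entry under /sys/class/net.
func vethExists(name string) bool {
	_, err := os.Stat(filepath.Join("/sys/class/net", name))
	return err == nil
}

// reconcileLoop frees any allocation whose veth has disappeared, i.e. whose
// pod sandbox was torn down without ipamd ever seeing the DelNetwork call.
func reconcileLoop(ds Datastore, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			for _, alloc := range ds.ListAllocations() {
				if vethExists(alloc.HostVethName) {
					continue
				}
				log.Printf("veth %s for IP %s is gone; releasing allocation", alloc.HostVethName, alloc.IP)
				if err := ds.UnassignIP(alloc.IP); err != nil {
					log.Printf("failed to release %s: %v", alloc.IP, err)
				}
			}
		}
	}
}

// memStore is a toy in-memory datastore used only to exercise the loop.
type memStore struct{ allocs map[string]Allocation }

func (m *memStore) ListAllocations() []Allocation {
	out := make([]Allocation, 0, len(m.allocs))
	for _, a := range m.allocs {
		out = append(out, a)
	}
	return out
}

func (m *memStore) UnassignIP(ip string) error {
	delete(m.allocs, ip)
	return nil
}

func main() {
	ds := &memStore{allocs: map[string]Allocation{
		"10.0.0.5": {IP: "10.0.0.5", HostVethName: "eni1234example"},
	}}
	stop := make(chan struct{})
	// Short interval for the demo only; the commit proposes once a minute.
	go reconcileLoop(ds, 5*time.Second, stop)
	time.Sleep(12 * time.Second)
	close(stop)
}
```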
@hbhasker hbhasker linked a pull request Nov 13, 2024 that will close this issue
@orsenthil
Member

Hi @hbhasker, thank you for this report. Could you confirm this by looking at the ipamd.log? If you share the logs with k8s-awscni-triage@amazon.com, we can look them over too.

@hbhasker
Author

I did confirm by looking at the JSON for the datastore as well. It clearly contained pods that had already terminated on the node. I will see if I run into another occurrence of the same and capture more information.
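A small read-only helper along these lines can make that check repeatable. The checkpoint path /var/run/aws-node/ipam.json and the assumption that the file is a top-level JSON object may not hold for every CNI version, so the sketch decodes the file generically rather than relying on exact field names.

```go
// Read-only helper for eyeballing the ipamd datastore checkpoint. The path
// and the file layout are assumptions about the on-disk format and may
// differ between CNI versions; adjust to whatever your node actually has.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

func main() {
	// Assumed default location of the checkpoint file on the node.
	path := "/var/run/aws-node/ipam.json"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	raw, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintf(os.Stderr, "read %s: %v\n", path, err)
		os.Exit(1)
	}

	// Decode generically so we don't depend on exact field names.
	var doc map[string]any
	if err := json.Unmarshal(raw, &doc); err != nil {
		fmt.Fprintf(os.Stderr, "parse %s: %v\n", path, err)
		os.Exit(1)
	}

	// Report how many entries each top-level list holds, then dump the whole
	// document pretty-printed so stale pod entries can be spotted and compared
	// against the pods actually running on the node.
	for key, val := range doc {
		if entries, ok := val.([]any); ok {
			fmt.Printf("%s: %d entries\n", key, len(entries))
		}
	}
	pretty, _ := json.MarshalIndent(doc, "", "  ")
	fmt.Println(string(pretty))
}
```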

@jayanthvn
Contributor

@orsenthil - We will have to check the plugin logs to see whether the delete request landed on the CNI, since kubelet is the source of truth. I don't think we should add more reconcilers; rather, we should check why the event was missed or not received.
