host to service network not working after reboot/join after upgrade from v3.26.3 to v3.27.3 (v3.28.0) ebpf dataplane/vxlan/no kube-proxy/dsr #8867
Comments
What changes after killing the pod? Could you share your routing table before/after? |
Did you have ctlb disabled before upgrade? |
Nope. bpfConnectTimeLoadBalancingEnabled is not set; none of that is configured currently. |
The routes barely changed.
|
That route is correct. Since iirc 3.27 we route traffic from host to UDP service via that device by default. |
I wonder if some route caching is in play. Could you dump |
BTW |
Just to confirm, do you see the same problem from host-networked pods/processes only, or from regular pods as well? I tried to reproduce the issue: I created a cluster in GCP with kubeadm, installed Calico 3.26.4, upgraded to 3.28, and my DNS worked just fine. Would you be able to tcpdump whether your traffic is reaching the service, what kind of packets are exiting from If your cluster is not a production cluster, we could dig deeper by enabling BPF logging to get some more useful logs. Ideally we could sync on the Calico Users Slack. |
Regular pods don't have a problem. They work fine. I tested it to be sure. So only host-networked pods/processes have the problem. |
So: a fresh VM node with a single interface, joined to the cluster. nslookup openebs-api-rest.openebs.svc.l8s.local. 10.243.0.10 When it's working well, after the calico-node pod is killed |
That doesn't seem to be a problem ⬆️ Do you see packets returning to the client in both cases? you could also enable ⬇️ in default
|
it's a service IP |
That does not work:
bpfLogLevel: Debug
bpfLogFilters:
  all: host 172.24.1.29 and udp port 53
However, the bpfLogFilters property disappeared from the object. |
BTW I found a repeated error in the tigera operator
|
Thanks for the logs, helpful. It seems like the packets from bpfout.cali do not make it to any other device. Perhaps worth verifying with tcpdump. They seem to be eaten by the host network stack. They either have a wrong csum (unlikely, since restarting calico-node would not fix that) or they get dropped by RPF (could you check the value in |
Before killing the pod (after node restart, when we have the problem)
cat /proc/sys/net/ipv4/conf/bpfout.cali/rp_filter
1
route
route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.24.1.1 0.0.0.0 UG 100 0 0 eth0
10.243.0.10 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.8.170 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.27.94 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.38.83 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.51.78 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.54.252 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.61.248 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.70.167 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.77.103 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.94.254 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.117.56 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.121.189 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.125.134 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.140.20 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.150.161 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.157.165 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.157.183 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.158.119 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.164.105 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.181.122 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.185.212 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.194.22 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.230.225 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.251.8 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.254.169 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.255.22 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.244.16.0 10.244.16.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.244.32.0 10.244.32.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.244.48.0 0.0.0.0 255.255.240.0 U 0 0 0 *
10.244.48.6 0.0.0.0 255.255.255.255 UH 0 0 0 califee8cfb24e3
10.244.48.7 0.0.0.0 255.255.255.255 UH 0 0 0 calie89ffdb4633
10.244.48.8 0.0.0.0 255.255.255.255 UH 0 0 0 cali193e57c628e
10.244.48.9 0.0.0.0 255.255.255.255 UH 0 0 0 calid31be247766
10.244.192.0 10.244.192.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.245.80.0 10.245.80.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.246.96.0 10.246.96.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.247.80.0 10.247.80.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.247.112.0 10.247.112.0 255.255.240.0 UG 0 0 0 vxlan.calico
169.254.1.1 0.0.0.0 255.255.255.255 UH 0 0 0 bpfin.cali
172.24.1.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
iptables
iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* cali:zkuE8qdwsVpH6Kd2 */ /* Accept packets from flows that pre-date BPF. */ mark match 0x5000000/0x5000000 ctstate RELATED,ESTABLISHED
DROP all -- anywhere anywhere /* cali:XQL0mC-L6wldZdgN */ /* Drop packets from unknown flows. */ mark match 0x5000000/0x5000000
ACCEPT all -- anywhere anywhere /* cali:pbFdTFCLcV-MVLSS */ mark match 0x1000000/0x1000000
DROP all -- anywhere anywhere /* cali:u_TyW7ph8QsYnThE */ mark match ! 0x1000000/0x1000000
KUBE-FIREWALL all -- anywhere anywhere
Chain FORWARD (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* cali:umcmOn0WnTNOKJrp */ /* Pre-approved by BPF programs. */ mark match 0x3000000/0x3000000
DROP all -- anywhere anywhere /* cali:NnQ109Z-tVFkJGc1 */ /* From workload without BPF seen mark */ mark match ! 0x1000000/0x1000000
MARK all -- anywhere anywhere /* cali:YmI_zfAgHIHbINEV */ /* Mark pre-established flows. */ ctstate RELATED,ESTABLISHED MARK or 0x8000000
cali-to-wl-dispatch all -- anywhere anywhere /* cali:-EFgmtwMJVO64q9s */ /* To workload, check workload is known. */
ACCEPT all -- anywhere anywhere /* cali:wP1i1sEU71uRzM5d */ /* To workload, mark has already been verified. */
ACCEPT all -- anywhere anywhere /* cali:3t5V_2xe4DFVlHBq */ /* From */ /* bpfout.cali */ /* device, mark verified, accept. */
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
MARK all -- anywhere anywhere /* cali:CD7jZCSqPP_KjGsd */ /* Mark pre-established flows. */ ctstate RELATED,ESTABLISHED MARK or 0x8000000
KUBE-FIREWALL all -- anywhere anywhere
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- !127.0.0.0/8 127.0.0.0/8 /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain cali-to-wl-dispatch (1 references)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* cali:cEwZ48PLVj36YM8T */
ACCEPT all -- anywhere anywhere /* cali:Geyg9JmnnDNPlLHX */
ACCEPT all -- anywhere anywhere /* cali:ICRLMI_1Qq8A0HWR */
ACCEPT all -- anywhere anywhere /* cali:bhG4cOJs_TnufY0G */
DROP all -- anywhere anywhere /* cali:k6ZOE-XDbClrqbFe */ /* Unknown interface */
After killing the pod (when it is working well)
cat /proc/sys/net/ipv4/conf/bpfout.cali/rp_filter
0
route
route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.24.1.1 0.0.0.0 UG 100 0 0 eth0
10.243.0.10 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.8.170 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.27.94 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.38.83 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.51.78 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.54.252 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.61.248 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.70.167 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.77.103 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.94.254 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.117.56 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.121.189 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.125.134 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.140.20 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.150.161 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.157.165 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.157.183 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.158.119 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.164.105 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.181.122 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.185.212 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.194.22 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.230.225 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.251.8 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.254.169 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.243.255.22 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
10.244.16.0 10.244.16.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.244.32.0 10.244.32.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.244.48.0 0.0.0.0 255.255.240.0 U 0 0 0 *
10.244.48.6 0.0.0.0 255.255.255.255 UH 0 0 0 califee8cfb24e3
10.244.48.7 0.0.0.0 255.255.255.255 UH 0 0 0 calie89ffdb4633
10.244.48.8 0.0.0.0 255.255.255.255 UH 0 0 0 cali193e57c628e
10.244.48.9 0.0.0.0 255.255.255.255 UH 0 0 0 calid31be247766
10.244.192.0 10.244.192.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.245.80.0 10.245.80.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.246.96.0 10.246.96.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.247.80.0 10.247.80.0 255.255.240.0 UG 0 0 0 vxlan.calico
10.247.112.0 10.247.112.0 255.255.240.0 UG 0 0 0 vxlan.calico
169.254.1.1 0.0.0.0 255.255.255.255 UH 0 0 0 bpfin.cali
172.24.1.0 0.0.0.0 255.255.255.0 U 100 0 0 eth0
iptables
iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* cali:zkuE8qdwsVpH6Kd2 */ /* Accept packets from flows that pre-date BPF. */ mark match 0x5000000/0x5000000 ctstate RELATED,ESTABLISHED
DROP all -- anywhere anywhere /* cali:XQL0mC-L6wldZdgN */ /* Drop packets from unknown flows. */ mark match 0x5000000/0x5000000
ACCEPT all -- anywhere anywhere /* cali:pbFdTFCLcV-MVLSS */ mark match 0x1000000/0x1000000
DROP all -- anywhere anywhere /* cali:u_TyW7ph8QsYnThE */ mark match ! 0x1000000/0x1000000
KUBE-FIREWALL all -- anywhere anywhere
Chain FORWARD (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* cali:umcmOn0WnTNOKJrp */ /* Pre-approved by BPF programs. */ mark match 0x3000000/0x3000000
DROP all -- anywhere anywhere /* cali:NnQ109Z-tVFkJGc1 */ /* From workload without BPF seen mark */ mark match ! 0x1000000/0x1000000
MARK all -- anywhere anywhere /* cali:YmI_zfAgHIHbINEV */ /* Mark pre-established flows. */ ctstate RELATED,ESTABLISHED MARK or 0x8000000
cali-to-wl-dispatch all -- anywhere anywhere /* cali:-EFgmtwMJVO64q9s */ /* To workload, check workload is known. */
ACCEPT all -- anywhere anywhere /* cali:wP1i1sEU71uRzM5d */ /* To workload, mark has already been verified. */
ACCEPT all -- anywhere anywhere /* cali:3t5V_2xe4DFVlHBq */ /* From */ /* bpfout.cali */ /* device, mark verified, accept. */
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
MARK all -- anywhere anywhere /* cali:CD7jZCSqPP_KjGsd */ /* Mark pre-established flows. */ ctstate RELATED,ESTABLISHED MARK or 0x8000000
KUBE-FIREWALL all -- anywhere anywhere
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- !127.0.0.0/8 127.0.0.0/8 /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain cali-to-wl-dispatch (1 references)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* cali:cEwZ48PLVj36YM8T */
ACCEPT all -- anywhere anywhere /* cali:Geyg9JmnnDNPlLHX */
ACCEPT all -- anywhere anywhere /* cali:ICRLMI_1Qq8A0HWR */
ACCEPT all -- anywhere anywhere /* cali:bhG4cOJs_TnufY0G */
DROP all -- anywhere anywhere /* cali:k6ZOE-XDbClrqbFe */ /* Unknown interface */ |
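The two captures above are long and appear identical apart from the rp_filter value, which is easy to miss by eye. A minimal Python sketch for surfacing only the lines that differ between two saved captures is shown here; the helper name `changed_lines` and the `bpfout.cali rp_filter=...` line format are assumptions made for illustration, not something the poster used.

```python
import difflib

def changed_lines(before: str, after: str) -> list:
    """Return only the removed ('-') and added ('+') lines between two captures."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0))
    # The first two entries are the '---' / '+++' file headers; '@@' lines are hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]

# Hypothetical capture format: one rp_filter line plus one route row taken from the thread.
route_row = "10.243.0.10 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali"
before = "bpfout.cali rp_filter=1\n" + route_row
after = "bpfout.cali rp_filter=0\n" + route_row

print(changed_lines(before, after))
# ['-bpfout.cali rp_filter=1', '+bpfout.cali rp_filter=0']
```

Feeding it the full before/after text of each command would show at a glance whether anything besides rp_filter changed.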
That is the problem. Something sets it to 1 (strict), and when calico-node restarts, it sets it back to 0. The something is probably your systemd, which applies configuration when a new device is added. The issue seems to be present with systemd 245+. What is your Linux distro (which I should have asked a while ago)? |
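For context, the kernel validates sources using the larger of `net.ipv4.conf.all.rp_filter` and the per-interface value, so a per-device 1 is enough to enable strict mode even when `all` is 0. A minimal Python sketch for checking the effective mode on a node follows; the helper name `effective_rp_filter` and its `root` parameter are hypothetical, introduced only for illustration.

```python
from pathlib import Path

def effective_rp_filter(iface: str, root: str = "/proc/sys/net/ipv4/conf") -> int:
    """Return the rp_filter mode the kernel applies to iface.

    The kernel uses max(conf/all/rp_filter, conf/<iface>/rp_filter):
    0 = off, 1 = strict, 2 = loose.
    """
    base = Path(root)
    all_value = int((base / "all" / "rp_filter").read_text().strip())
    iface_value = int((base / iface / "rp_filter").read_text().strip())
    return max(all_value, iface_value)
```

On the affected node, `effective_rp_filter("bpfout.cali")` would be expected to return 1 after a reboot and 0 after the calico-node restart, assuming `all` stays at 0 as in the sysctl.conf posted in this thread.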
|
In the sysctl settings it is set as net.ipv4.conf.all.rp_filter=0
cat /etc/sysctl.conf
fs.inotify.max_user_instances=1048576
fs.inotify.max_user_watches=1048576
fs.inotify.max_queued_events=16384
fs.aio-max-nr=1048576
vm.max_map_count=262144
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_forward=1
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv4.neigh.default.gc_thresh1=8192
net.ipv4.neigh.default.gc_thresh2=12228
net.ipv4.neigh.default.gc_thresh3=24456
net.core.somaxconn=65535
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.all.accept_local=1
kernel.panic=30
kernel.panic_on_oops=1
vm.overcommit_memory=2
vm.panic_on_oom=0 |
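Note that the file above only sets the `all` key. It has no `default` or per-interface rp_filter entry, so it would not by itself stop something else (for example a drop-in under /etc/sysctl.d or /usr/lib/sysctl.d) from setting a new device to 1, and the kernel applies the larger of the `all` and per-interface values. A minimal Python sketch for listing the rp_filter keys in sysctl-style text follows; the helper name `rp_filter_settings` is hypothetical and the sample is a three-line excerpt of the file above.

```python
def rp_filter_settings(sysctl_text: str) -> dict:
    """Collect rp_filter keys from sysctl.conf-style text, skipping comments and blanks."""
    settings = {}
    for raw in sysctl_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.endswith(".rp_filter"):
            settings[key] = int(value.strip())
    return settings

sample = """\
net.ipv4.ip_forward=1
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.all.accept_local=1
"""
print(rp_filter_settings(sample))
# {'net.ipv4.conf.all.rp_filter': 0}
```

Running the same helper over every file in the sysctl drop-in directories is one way to look for whatever sets a per-device or default value of 1.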
@a-sorokin-sdg do you still see the issue? Have you figured out what is changing the rp_filter value? Closing now, but feel free to reopen if you have any new info. |
Yes, still have the issue. |
Host-to-service networking stops working after a node reboot/join following the upgrade from v3.26.3 to v3.27.3 (eBPF dataplane, VXLAN, no kube-proxy, DSR).
Killing the calico-node pod immediately fixes the problem.
Expected Behavior
Host-to-service networking works after a node reboot/join.
Current Behavior
Host-to-Kubernetes-service networking does not work after a node reboot/join until the calico-node pod is killed.
It starts working once the pod has been restarted.
Possible Solution
Kill the calico-node pod on the restarted or newly joined node.
Steps to Reproduce (for bugs)
Context
Any pod using host networking fails to start after the node is rebooted or joined.
The cluster (pod) network works fine.
Your Environment
kubernetes 1.29.5
calico-node log after new node join
calico-node install-cni log after new node join
calico-node log after killing pod
calico-node install-cni log after killing pod