
Connection issue for a multiple zone cluster with calico 3.28.0 #8860

Closed
lzhecheng opened this issue May 28, 2024 · 13 comments · Fixed by #9091


lzhecheng commented May 28, 2024

Expected Behavior

A Node can reach a service whose endpoint is on another Node (different zone) immediately.

Current Behavior

A Node cannot reach a service whose endpoint is on another Node (different zone) immediately. The first packet is dropped and the second one works.

Possible Solution

Use calico 3.27.3

Steps to Reproduce (for bugs)

  1. Create a multi-zone cluster
  2. Create a service with endpoints on nodes in different zones
  3. wget from a node to the service
  4. If the endpoint is not on the same node, the first packet is lost
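The steps above can be sketched as follows (the deployment name, image, and use of wget are illustrative, not from the original report; the service IP placeholder is deliberately left unfilled):

```shell
# Steps 1-2: a backend spread across zones, exposed as a ClusterIP service
kubectl create deployment echo --image=nginx --replicas=2
kubectl expose deployment echo --port=80

# Steps 3-4: from a node's host shell, time the first request; when the
# chosen endpoint is in another zone, the first SYN is dropped and the
# connection only completes after the SYN retransmit
time wget -qO- http://<service-cluster-ip>/ >/dev/null
```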

Context

Details here: kubernetes-sigs/cloud-provider-azure#6293

Your Environment

  • Calico version
  • Orchestrator version (e.g. kubernetes, mesos, rkt):
  • Operating System and version:
  • Link to your project (optional):
@matthewdupre
Member

Sounds like a regression; we'll have a look.

@sfudeus

sfudeus commented Jun 25, 2024

I might have a similar issue and was just about to open a bug report. For me, this is related to VXLAN checksum offloading: for VXLAN-tunneled traffic, the first SYN is lost and has to be resent.
Please advise if I should add my data here or create a dedicated issue.

@lzhecheng Can you try with disabling checksum offloading (featureDetectOverride: ChecksumOffloadBroken=true in FelixConfiguration) to see if it makes a difference for you?
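For reference, the override sfudeus mentions goes in the cluster-wide FelixConfiguration resource; a sketch of the relevant fragment (the resource name default is the usual one, verify against your cluster):

```yaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  featureDetectOverride: "ChecksumOffloadBroken=true"
```

It can also be applied in place with calicoctl patch felixconfiguration default --patch '{"spec":{"featureDetectOverride":"ChecksumOffloadBroken=true"}}'.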

@sfudeus

sfudeus commented Jun 26, 2024

I'm adding the basics of what I am observing here.
With VXLAN checksum offloading enabled, for any traffic that is

  • directed at a NodePort or an external/loadBalancer IP, and
  • forwarded to a pod on a different subnet (i.e. requiring VXLAN; likely always the case with vxlanMode: Always instead of CrossSubnet),

the first SYN packet is lost.

I could not observe packet loss when directing the traffic at the pod IP itself, only when it went via the K8s iptables rules for NodePort/LoadBalancer, likely because of the NAT?

I observed the packet loss only on the destination node, between the physical interface and the pod interface: I could still see the first SYN packet VXLAN-encapsulated on the physical interface, but only the second SYN showed up on the pod interface (cali*).
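A capture sketch matching that observation, run on the destination node (the interface names are environment-specific assumptions, not from the report):

```shell
# Physical interface: both SYNs arrive VXLAN-encapsulated (UDP port 4789)
tcpdump -ni eth0 udp port 4789

# Pod's veth: only the retransmitted SYN appears
tcpdump -ni cali0123456789a 'tcp[tcpflags] & tcp-syn != 0'
```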

My test client was only running in the hostNetwork, I didn't test from a pod (yet).

@matthewdupre
Member

matthewdupre commented Jul 22, 2024

We've got an internal repro of a kernel problem where, when VXLAN offloading is in use and a packet is SNATted multiple times (in this case for node -> service -> pod traffic), the checksum doesn't get calculated properly.

We'll revert ChecksumOffloadBroken to true in the next patch release (3.28.1) while we look into alternative fixes (perhaps finding a way to prevent the double SNAT).

I'm not certain that this exactly matches the issue here, but it certainly sounds similar.
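Until the 3.28.1 revert lands, the override's effect can be approximated by hand; to my understanding, ChecksumOffloadBroken makes Felix turn off TX checksum offload on the VXLAN device, roughly equivalent to (device name is Calico's default; re-apply if the device is recreated):

```shell
# Disable TX checksum offload on Calico's VXLAN device
ethtool -K vxlan.calico tx off

# Confirm the setting took effect
ethtool -k vxlan.calico | grep -i checksum
```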

matthewdupre added this to the Calico v3.28.1 milestone Jul 24, 2024
@matthewdupre
Member

I think this is likely a kernel problem with VXLAN checksum offload when there are multiple SNATs (which can happen in this kind of host -> service -> pod connection). We're going to disable this in 3.28.1 and will then try and look for a way to get the offload back.

@tomastigera
Contributor

@lzhecheng @sfudeus thanks for reporting. The kernel issue was supposed to be fixed, but we were looking for a possible repro.

What kernel / Linux distro do you use? Are you on a public cloud?

@sfudeus

sfudeus commented Jul 30, 2024

@tomastigera Pure on-premise on metal for us, currently Flatcar Container Linux 3975.1.1 (beta channel) with kernel 6.6.36-flatcar (likely 3941.1.0 with 6.6.30-flatcar at the time of reporting).

@lzhecheng
Author

@tomastigera my cluster was created on Azure with CAPZ.

The kernel version is 6.5.0-1024-azure on a recently created VM; I'm not sure of the version at the time I reported the issue.

@tomastigera
Contributor

> I observed the packet loss to happen only on the destination node, between the physical interface and the pod interface

Yes, the first packet has a wrong UDP csum and is thus dropped by the vxlan device and never forwarded to the pod.
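For context on why the kernel drops the packet: the UDP checksum it validates is the RFC 768 one's-complement sum over an IPv4 pseudo-header plus the UDP header and payload; if offload leaves a wrong value there, the receiver's recomputation mismatches and the encapsulated datagram is discarded. A minimal sketch of the computation (illustrative, not Calico code):

```python
import socket
import struct


def udp_checksum(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 payload: bytes) -> int:
    """RFC 768 UDP checksum over the IPv4 pseudo-header, UDP header, payload."""
    length = 8 + len(payload)  # UDP header is 8 bytes
    # Pseudo-header: src addr, dst addr, zero byte, protocol (17), UDP length
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, length)
            # UDP header with the checksum field zeroed for computation
            + struct.pack("!HHHH", src_port, dst_port, length, 0)
            + payload)
    if len(data) % 2:  # pad to a 16-bit boundary
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


print(hex(udp_checksum("10.0.0.1", "10.0.0.2", 1234, 5678, b"hi")))  # prints 0x686e
```

A receiver sums the same words including the transmitted checksum; anything other than 0xFFFF after folding means corruption, which is why a mis-offloaded checksum gets the packet dropped at the vxlan device.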

@sfudeus do you observe the issue with ebpf as well? My understanding is that in iptables mode, the situation is created by a conflict between calico and kube-proxy rules. In ebpf mode, there are no kube-proxy rules and packets take a completely different path, so I would expect this not to happen.

@sfudeus

sfudeus commented Aug 1, 2024

@tomastigera IIRC we saw this with non-ebpf only, but I'll recheck. Not sure when I'll get to this though; it might only be next week.

@tomastigera
Contributor

It is not an eBPF issue for sure.

@tomastigera
Contributor

This image thruby/node:3-28-vxlan-csum-fix-2 carries the fix if anybody wants to give it a try.

@sfudeus

sfudeus commented Aug 5, 2024

@tomastigera I can confirm that I no longer observe the first-SYN issue using thruby/node:3-28-vxlan-csum-fix-2, despite csum offloading being enabled, in a non-ebpf cluster. Is there anything special I should keep an eye on, such as the order of deployments? I already checked whether it makes a difference which of calico-node or kube-proxy starts last; I didn't observe a difference there.
