Pod and uplink port MTU mismatched after a testbed is changed from encap to noEncap mode #6456

luolanzone · 2024-06-18T03:34:22Z

Describe the bug

Recently, a few FlexibleIPAM e2e tests that rely on ping have kept failing on the dedicated testbed. The error looks like this:

 connectivity_test.go:101: Ping mesh test between all Pods
    connectivity_test.go:116: Ping 'testantreaipam-icwltfko/connectivity-testdifferentnodes-7zx9f' -> 'testantreaipam-icwltfko/connectivity-testdifferentnodes-hlhhz': ERROR (error when running ping command 'ping -c 5 -s 1472 -M do -4 192.168.250.81': command terminated with exit code 1 - stdout: PING 192.168.250.81 (192.168.250.81) 1472(1500) bytes of data.
        From 192.168.248.1 icmp_seq=1 Frag needed and DF set (mtu = 1450)
        
        --- 192.168.250.81 ping statistics ---
        5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 4088ms
        
         - stderr: ping: local error: message too long, mtu=1450
        ping: local error: message too long, mtu=1450
        ping: local error: message too long, mtu=1450
        ping: local error: message too long, mtu=1450
        )

After checking the environment, we found that the test client Pod interface's MTU is 1500:
2: eth0@if1728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default

but the uplink port MTU is 1450:
3008: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
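The numbers in the ping output above can be checked by hand: `ping -s 1472` adds an 8-byte ICMP header and a 20-byte IPv4 header, giving a 1500-byte packet, which fits the Pod's 1500 MTU but not the uplink's 1450. A minimal Go sketch of that arithmetic follows; the function names are illustrative and not part of Antrea, and it assumes an IPv4 header without options.

```go
package main

import "fmt"

const (
	ipv4HeaderLen = 20 // IPv4 header without options (assumption)
	icmpHeaderLen = 8  // ICMP echo header
)

// packetSize returns the IPv4 packet size produced by `ping -s payload`.
func packetSize(payload int) int {
	return payload + icmpHeaderLen + ipv4HeaderLen
}

// fits reports whether a ping with the given payload can cross a link with
// the given MTU without fragmentation (as required by `-M do`).
func fits(payload, mtu int) bool {
	return packetSize(payload) <= mtu
}

// maxPayload returns the largest `ping -s` value that fits in the given MTU.
func maxPayload(mtu int) int {
	return mtu - icmpHeaderLen - ipv4HeaderLen
}

func main() {
	fmt.Println(packetSize(1472)) // 1500
	fmt.Println(fits(1472, 1500)) // true: accepted on the Pod interface
	fmt.Println(fits(1472, 1450)) // false: rejected at the uplink
	fmt.Println(maxPayload(1450)) // 1422
}
```

With a 1450 MTU on the path, the largest payload that would pass with the DF bit set is 1422 bytes.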

After some troubleshooting, we found that this dedicated testbed had been unexpectedly deployed in encap mode by an e2e test, and then redeployed with noEncap + FlexibleIPAM for new tests.

The MTU of the existing coredns Pod had been set to 1450 by Antrea (in encap mode). According to @hongliangl's comment, Antrea creates the uplink port with the minimum MTU found on the current Node, which is now 1450 (because coredns was already configured with 1450). Even after the testbed is redeployed with noEncap + FlexibleIPAM, the uplink port's MTU remains 1450, which causes the ping tests to fail.

There is an obvious mismatch between the Pod's MTU and the uplink port's MTU. Since we can't stop users from trying Antrea in encap mode first and then redeploying the same testbed with noEncap + FlexibleIPAM, we may need to think about how to reset the uplink MTU when the mode changes, or find another enhancement to fix the mismatch.

luolanzone · 2024-06-18T03:35:15Z

cc @wenyingd @gran-vmv @hongliangl please check if you have more comments or input for this issue, thanks.

luolanzone · 2024-06-18T03:50:58Z

adding @antoninbas @tnqn for awareness.

antoninbas · 2024-06-18T18:55:29Z

To be clear, I don't think this is specific to FlexibleIPAM. Changing the traffic mode will cause MTU issues in different situations, even when the uplink is not moved to the bridge.
For example, deploying Antrea in noEncap mode first, then switching to encap mode, will not update the MTU of existing Pods.

1. helm install -n kube-system antrea antrea/antrea --set trafficEncapMode=noEncap
2. Create an antrea/toolbox Pod and check the MTU: it should be 1500 (assuming that the MTU of the physical interface is 1500).
3. helm uninstall -n kube-system antrea
4. helm install -n kube-system antrea antrea/antrea --set trafficEncapMode=encap
5. Check the MTU of the interface for the toolbox Pod: it will stay at 1500 instead of being updated to 1450 as required by the new datapath.
6. If you recreate the Pod, the MTU for the new Pod will be correct (1450).

I used a toolbox Pod here to be able to exec into it easily, but the issue will also affect the default coreDNS Pods.

Note that the MTU of the antrea-gw0 interface will be updated, as this is something we handle correctly:

antrea/pkg/agent/agent.go

Lines 684 to 694 in 5c1141e

    
    // Idempotent operation to set the gateway's MTU: we perform this operation regardless of
    // whether the gateway interface already exists, as the desired MTU may change across
    // restarts.
    klog.V(4).Infof("Setting gateway interface %s MTU to %d", i.hostGateway, i.networkConfig.InterfaceMTU)
    if err := i.configureGatewayInterface(gatewayIface); err != nil {
    	return err
    }
    if err := i.setInterfaceMTU(i.hostGateway, i.networkConfig.InterfaceMTU); err != nil {
    	return err
    }

Would it be reasonable for each Agent to iterate through all Pod interfaces after initializing the interface store, in order to update the MTU if needed? It should also take care of the uplink case for FlexibleIPAM, as long as the uplink port is created last.
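The idea above can be sketched as a pure planning step that compares each stored Pod interface against the desired MTU and reports the ones that are stale. This is only an illustration: `PodInterface` and `planMTUUpdates` are hypothetical names, Antrea's real interface store has a different API, and actually applying the change inside each Pod's network namespace is not shown.

```go
package main

import (
	"fmt"
	"sort"
)

// PodInterface is a hypothetical, simplified stand-in for an entry in the
// Agent's interface store.
type PodInterface struct {
	Name string
	MTU  int
}

// planMTUUpdates returns the sorted names of the interfaces whose current MTU
// differs from the desired MTU.
func planMTUUpdates(ifaces []PodInterface, desiredMTU int) []string {
	var stale []string
	for _, iface := range ifaces {
		if iface.MTU != desiredMTU {
			stale = append(stale, iface.Name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Made-up interface names: two Pods created in noEncap mode (MTU 1500)
	// and one created after the switch to encap mode (MTU 1450).
	ifaces := []PodInterface{
		{Name: "toolbox-a1b2c3", MTU: 1500},
		{Name: "coredns-d4e5f6", MTU: 1500},
		{Name: "web-0a1b2c", MTU: 1450},
	}
	fmt.Println(planMTUUpdates(ifaces, 1450)) // [coredns-d4e5f6 toolbox-a1b2c3]
}
```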

antoninbas · 2024-07-22T17:52:25Z

I am back from my leave so I can handle this if Hongliang is busy with BGP support - @luolanzone @hongliangl

luolanzone · 2024-07-25T02:16:46Z

@antoninbas you can take this one, since Hongliang is busy with BGP and will be taking two weeks of family leave. Thanks a lot!

antoninbas · 2024-07-30T23:08:25Z

> Would it be reasonable for each Agent to iterate through all Pod interfaces after initializing the interface store, in order to update the MTU if needed?

Thinking about this more, I am not so sure it is a great idea. It is certainly possible, but there is some risk involved and I don't believe other CNIs support this.

Instead we can just request the correct MTU when configuring the uplink port. Even if some existing containers are attached to the bridge and have a lower MTU, OVS will still accept the provided MTU. Users are responsible for restarting workloads if they want them to use the correct MTU as well (based on the updated Antrea configuration).
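A toy model of the behaviour described in this thread may make the difference clearer: without an explicit request, the uplink bridge port ends up with the lowest MTU among the ports already on the bridge; with an explicit request, the requested value is used. This is a simplified sketch, not OVS or Antrea code, and `effectiveMTU` and its parameters are made up for illustration.

```go
package main

import "fmt"

// effectiveMTU models the MTU that a newly created bridge port ends up with.
// existing holds the MTUs of ports already attached to the bridge, requested
// is the explicitly requested MTU (0 means no request), and fallback is used
// when there is neither a request nor any existing port.
func effectiveMTU(existing []int, requested, fallback int) int {
	if requested > 0 {
		return requested
	}
	if len(existing) == 0 {
		return fallback
	}
	lowest := existing[0]
	for _, mtu := range existing[1:] {
		if mtu < lowest {
			lowest = mtu
		}
	}
	return lowest
}

func main() {
	// A coredns port left over from encap mode (1450) and a newer Pod port (1500).
	existing := []int{1450, 1500}

	fmt.Println(effectiveMTU(existing, 0, 1500))    // 1450: the mismatch reported in this issue
	fmt.Println(effectiveMTU(existing, 1500, 1500)) // 1500: the request matches the physical adapter
}
```

In this model the leftover 1450 port no longer affects the uplink once a value is requested, but that port itself keeps its old MTU until the workload is restarted.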

We can revisit in the future if there is a user request to automatically update the MTU for existing workloads in case of a configuration change.

In bridging mode (on Linux), when moving the physical adapter to the bridge, we explicitly set the MTU for the bridge port to the same value as for the physical adapter. Without this change, the MTU may default to a different (lower) value if some existing container ports have a lower MTU value. For example, this occurs when first installing Antrea in encap mode, then re-installing Antrea in noEncap mode with bridging mode enabled.

We also do some minor documentation updates to indicate to users that they should consider restarting existing workloads when updating the Antrea datapath configuration.

Fixes antrea-io#6456

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>

luolanzone added the kind/bug Categorizes issue or PR as related to a bug. label Jun 18, 2024

luolanzone mentioned this issue Jun 18, 2024

Fix manifest generation step for e2e test #6452

Merged

antoninbas added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 18, 2024

antoninbas added this to the Antrea v2.1 release milestone Jun 18, 2024

luolanzone modified the milestones: Antrea v2.1 release, Antrea v2.2 release Jul 4, 2024

luolanzone assigned hongliangl Jul 4, 2024

luolanzone assigned antoninbas and unassigned hongliangl Jul 25, 2024

antoninbas mentioned this issue Jul 31, 2024

Use same MTU as uplink for bridge port #6577

Merged

antoninbas closed this as completed in ffa1af6 Aug 6, 2024

antoninbas closed this as completed in #6577 Aug 6, 2024
