
Pod and uplink port MTU mismatched after a testbed is changed from encap to noEncap mode #6456

Closed
luolanzone opened this issue Jun 18, 2024 · 6 comments · Fixed by #6577
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.


@luolanzone
Contributor

Describe the bug

Recently, a few FlexibleIPAM e2e tests related to ping have kept failing on the dedicated testbed. The error looks like this:

 connectivity_test.go:101: Ping mesh test between all Pods
    connectivity_test.go:116: Ping 'testantreaipam-icwltfko/connectivity-testdifferentnodes-7zx9f' -> 'testantreaipam-icwltfko/connectivity-testdifferentnodes-hlhhz': ERROR (error when running ping command 'ping -c 5 -s 1472 -M do -4 192.168.250.81': command terminated with exit code 1 - stdout: PING 192.168.250.81 (192.168.250.81) 1472(1500) bytes of data.
        From 192.168.248.1 icmp_seq=1 Frag needed and DF set (mtu = 1450)
        
        --- 192.168.250.81 ping statistics ---
        5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 4088ms
        
         - stderr: ping: local error: message too long, mtu=1450
        ping: local error: message too long, mtu=1450
        ping: local error: message too long, mtu=1450
        ping: local error: message too long, mtu=1450
        )

After checking the environment, we found that the test client Pod interface's MTU is 1500:
2: eth0@if1728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default

but the uplink port MTU is 1450:
3008: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
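These numbers are consistent with the failing ping command in the log: with `-s 1472 -M do`, each echo request carries 1472 bytes of payload plus an 8-byte ICMP header and a 20-byte IPv4 header, i.e. a 1500-byte packet with DF set, which cannot cross a 1450-byte link. A small Go sketch of that arithmetic, assuming IPv4 without header options; the helper names `maxPingPayload` and `fits` are hypothetical:

```go
package main

import "fmt"

const (
	ipv4HeaderLen = 20 // IPv4 header without options (assumption)
	icmpHeaderLen = 8  // ICMP echo request header
)

// maxPingPayload returns the largest `ping -s` value that fits in a single
// unfragmented IPv4 packet on a link with the given MTU.
func maxPingPayload(mtu int) int {
	return mtu - ipv4HeaderLen - icmpHeaderLen
}

// fits reports whether a ping with the given payload size can be sent with
// the DF bit set (`-M do`) over a link with the given MTU.
func fits(payload, mtu int) bool {
	return payload+ipv4HeaderLen+icmpHeaderLen <= mtu
}

func main() {
	fmt.Println(maxPingPayload(1500)) // 1472: the size the e2e test sends
	fmt.Println(maxPingPayload(1450)) // 1422: the most a 1450-byte uplink allows
	fmt.Println(fits(1472, 1450))     // false: matches "Frag needed and DF set (mtu = 1450)"
}
```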

After some troubleshooting, we found that this dedicated testbed had unexpectedly been deployed in encap mode by an earlier e2e test, and was then redeployed with noEncap + FlexibleIPAM for the new tests.

The existing coredns Pod's MTU had been set to 1450 by Antrea in encap mode. According to @hongliangl's comment, Antrea creates the uplink port with the minimal MTU among the existing ports on the current Node, which is now 1450 (because coredns was already configured with 1450). So even after the testbed is redeployed with noEncap + FlexibleIPAM, the uplink port's MTU stays at 1450, which causes the ping tests to fail.
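The "minimal MTU" behavior matches the Open vSwitch database documentation for the Interface `mtu_request` column: if no MTU is requested for an interface of type internal, OVS sets its MTU to the minimum MTU of the other interfaces in the bridge. The Go sketch below is only a simplified model of that rule, not OVS code; the function name is hypothetical and the port MTUs are taken from this issue (physical adapter at 1500, coredns left at 1450 from encap mode, a new Pod at 1500):

```go
package main

import "fmt"

// effectiveInternalPortMTU models how OVS picks the MTU of an internal port,
// such as the uplink-named bridge port used in FlexibleIPAM bridging mode.
// A requested value > 0 stands in for a set mtu_request; otherwise the result
// is the minimum MTU among the other interfaces on the bridge (0 if none,
// which is a simplification of this model).
func effectiveInternalPortMTU(otherPortMTUs []int, requested int) int {
	if requested > 0 {
		return requested
	}
	lowest := 0
	for _, m := range otherPortMTUs {
		if lowest == 0 || m < lowest {
			lowest = m
		}
	}
	return lowest
}

func main() {
	ports := []int{
		1500, // physical adapter attached to the bridge
		1450, // coredns Pod port created while Antrea ran in encap mode
		1500, // Pod port created after switching to noEncap
	}
	fmt.Println(effectiveInternalPortMTU(ports, 0))    // 1450: no MTU requested
	fmt.Println(effectiveInternalPortMTU(ports, 1500)) // 1500: MTU explicitly requested
}
```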

There is clearly a mismatch between the Pod's MTU and the uplink port's MTU. Since we can't stop users from trying Antrea in encap mode first and then redeploying the same testbed with Antrea in noEncap + FlexibleIPAM mode, we may need to think about how to reset the uplink MTU when the mode changes, or find another enhancement that fixes the mismatch.

@luolanzone luolanzone added the kind/bug Categorizes issue or PR as related to a bug. label Jun 18, 2024
@luolanzone
Contributor Author

cc @wenyingd @gran-vmv @hongliangl please check if you have more comments or input for this issue, thanks.

@luolanzone
Contributor Author

adding @antoninbas @tnqn for awareness.

@antoninbas antoninbas added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 18, 2024
@antoninbas
Contributor

To be clear, I don't think this is specific to FlexibleIPAM. Changing the traffic mode will cause MTU issues in different situations, even when the uplink is not moved to the bridge.
For example, deploying Antrea in noEncap mode first, then switching to encap mode, will not update the MTU of existing Pods.

  1. helm install -n kube-system antrea antrea/antrea --set trafficEncapMode=noEncap
  2. Create an antrea/toolbox Pod and check the MTU, it should be 1500 (assuming that the MTU of the physical interface is 1500)
  3. helm uninstall -n kube-system antrea
  4. helm install -n kube-system antrea antrea/antrea --set trafficEncapMode=encap
  5. Check the MTU of the interface for the toolbox Pod: it will stay at 1500 instead of being updated to 1450 as required by the new datapath.
  6. If you recreate the Pod, the MTU for the new Pod will be correct (1450).
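The MTU values in these steps follow from the tunnel overhead: in encap mode the Pod MTU must leave room for the outer headers, while in noEncap mode it can match the physical interface, and an existing Pod keeps whatever MTU it was given at creation. A rough Go sketch, assuming Geneve over an IPv4 underlay with a 50-byte overhead (outer IPv4 20 + UDP 8 + Geneve base header 8 + inner Ethernet 14), which matches the 1500 to 1450 values above; other tunnel types, IPv6 underlays, or features such as WireGuard or IPsec need different overheads, and all names here are hypothetical rather than Antrea's configuration code:

```go
package main

import "fmt"

type trafficMode string

const (
	encap   trafficMode = "encap"
	noEncap trafficMode = "noEncap"
)

// geneveIPv4Overhead is an assumed per-packet overhead: outer IPv4 (20) +
// UDP (8) + Geneve base header (8) + inner Ethernet (14).
const geneveIPv4Overhead = 20 + 8 + 8 + 14

// desiredPodMTU returns the MTU a newly created Pod interface should get.
func desiredPodMTU(physicalMTU int, mode trafficMode) int {
	if mode == encap {
		return physicalMTU - geneveIPv4Overhead
	}
	return physicalMTU
}

func main() {
	// Step 2: the toolbox Pod is created while Antrea runs in noEncap mode.
	podMTU := desiredPodMTU(1500, noEncap)
	// Steps 3-5: Antrea is reinstalled in encap mode, but the existing Pod
	// interface is not reconfigured, so podMTU keeps its old value.
	want := desiredPodMTU(1500, encap)
	fmt.Println(podMTU, want, podMTU == want) // 1500 1450 false
}
```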

I used a toolbox Pod here to be able to exec into it easily, but the issue will also affect the default coreDNS Pods.

Note that the MTU of the antrea-gw0 interface will be updated, as this is something we handle correctly:

antrea/pkg/agent/agent.go, lines 684 to 694 in 5c1141e:

// Idempotent operation to set the gateway's MTU: we perform this operation regardless of
// whether the gateway interface already exists, as the desired MTU may change across
// restarts.
klog.V(4).Infof("Setting gateway interface %s MTU to %d", i.hostGateway, i.networkConfig.InterfaceMTU)
if err := i.configureGatewayInterface(gatewayIface); err != nil {
	return err
}
if err := i.setInterfaceMTU(i.hostGateway, i.networkConfig.InterfaceMTU); err != nil {
	return err
}

Would it be reasonable for each Agent to iterate through all Pod interfaces after initializing the interface store, in order to update the MTU if needed? It should also take care of the uplink case for FlexibleIPAM, as long as the uplink port is created last.
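The proposal above could look roughly like the following Go sketch. Everything in it is hypothetical: `ifaceEntry`, `linkSetter`, and `reconcileMTUs` are illustrative names, not Antrea's interface store API, and a real agent would have to set the container-side MTU inside each Pod's network namespace rather than through one setter. What it shows is the ordering suggested above (Pod interfaces first, uplink port last) and that interfaces already at the desired MTU are skipped, so repeated runs are idempotent:

```go
package main

import (
	"fmt"
	"sort"
)

// ifaceEntry is a hypothetical, simplified interface-store record.
type ifaceEntry struct {
	Name     string
	MTU      int
	IsUplink bool
}

// linkSetter abstracts "set the MTU of this interface" so the sketch can run
// without touching real links.
type linkSetter func(name string, mtu int) error

// reconcileMTUs updates every interface whose MTU differs from desired,
// processing Pod interfaces before the uplink port. It returns the names of
// the interfaces it changed, in order.
func reconcileMTUs(entries []ifaceEntry, desired int, set linkSetter) ([]string, error) {
	sorted := append([]ifaceEntry(nil), entries...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return !sorted[i].IsUplink && sorted[j].IsUplink
	})
	var updated []string
	for _, e := range sorted {
		if e.MTU == desired {
			continue // already correct, nothing to do
		}
		if err := set(e.Name, desired); err != nil {
			return updated, fmt.Errorf("setting MTU of %s: %w", e.Name, err)
		}
		updated = append(updated, e.Name)
	}
	return updated, nil
}

func main() {
	entries := []ifaceEntry{
		{Name: "ens192", MTU: 1450, IsUplink: true},
		{Name: "coredns-abc", MTU: 1450},
		{Name: "toolbox-xyz", MTU: 1500},
	}
	updated, err := reconcileMTUs(entries, 1500, func(name string, mtu int) error {
		fmt.Printf("set %s mtu %d\n", name, mtu)
		return nil
	})
	fmt.Println(updated, err)
}
```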

@antoninbas
Contributor

I am back from my leave so I can handle this if Hongliang is busy with BGP support - @luolanzone @hongliangl

@luolanzone
Contributor Author

@antoninbas you can take this one, since Hongliang is busy with BGP and will be taking two weeks of family leave. Thanks a lot!

@luolanzone luolanzone assigned antoninbas and unassigned hongliangl Jul 25, 2024
@antoninbas
Contributor

Would it be reasonable for each Agent to iterate through all Pod interfaces after initializing the interface store, in order to update the MTU if needed?

Thinking about this more, I am not so sure it is a great idea. It is certainly possible, but there is some risk involved and I don't believe other CNIs support this.

Instead we can just request the correct MTU when configuring the uplink port. Even if some existing containers are attached to the bridge and have a lower MTU, OVS will still accept the provided MTU. Users are responsible for restarting workloads if they want them to use the correct MTU as well (based on the updated Antrea configuration).

We can revisit in the future if there is a user request to automatically update the MTU for existing workloads in case of a configuration change.

antoninbas added a commit to antoninbas/antrea that referenced this issue Jul 31, 2024
In bridging mode (on Linux), when moving the physical adapter to the
bridge, we explicitly set the MTU for the bridge port to the same value
as for the physical adapter. Without this change, the MTU may default to
a different (lower) value if some existing container ports have a lower
MTU value. For example, this occurs when first installing Antrea in
encap mode, then re-installing Antrea in noEncap mode with bridging mode
enabled.

We also do some minor documentation updates to indicate to users that
they should consider restarting existing workloads when updating the
Antrea datapath configuration.

Fixes antrea-io#6456

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
antoninbas added a commit to antoninbas/antrea that referenced this issue Aug 2, 2024
3 participants