Execute MASQ rules after kube-proxy to avoid conflict - reenable vxlan csum offload #9091

Merged Aug 6, 2024 (2 commits)

@tomastigera (Contributor) commented Aug 1, 2024

Description

Calico needs to do MASQ for several reasons, and kube-proxy also does MASQ
when it forwards NodePort traffic to another node. Calico's MASQ conflicts
with kube-proxy's when Calico routes the traffic to its destination via a
tunnel. This conflict can result in wrongly calculated checksums when TX
checksum offload is enabled on the tunnel device, which led to offload
being disabled on the tunnel and thus to a significant performance hit.

Doing MASQ after kube-proxy means that if kube-proxy has already done it,
Calico's MASQ is not executed; it would be a duplicate anyway. Other MASQ
and SNAT cases in the Calico chain are orthogonal to the kube-proxy use
case and still execute, because kube-proxy does not do MASQ in those cases.

Note that, unlike other chains where Calico may enforce policy and thus
needs to go first, Calico does not execute any policy in the nat table's
POSTROUTING chain.
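To make the ordering argument concrete, here is a minimal, simplified Go model of the first packet of a connection traversing the nat table's POSTROUTING chain. It is a sketch, not Felix or kube-proxy code: the `Packet` type, the `NeedsCalicoMasq` flag, and the rule functions are hypothetical. It assumes kube-proxy's default masquerade mark of `0x4000`, a kube-proxy rule that returns unless that mark is set and otherwise clears it (XOR) and masquerades, and that MASQUERADE ends traversal of the chain.

```go
package main

import "fmt"

// Assumption: kube-proxy's default masquerade mark (bit 14).
const kubeProxyMasqMark uint32 = 0x4000

// Packet is a hypothetical, simplified view of a packet in nat POSTROUTING.
type Packet struct {
	Mark            uint32
	NeedsCalicoMasq bool // hypothetical: Calico would SNAT this flow
	Masqueraded     bool
	MasqBy          string
}

// Rule reports whether it was terminal, i.e. traversal of the chain stops.
type Rule func(p *Packet) (terminal bool)

// kubeProxyPostrouting models kube-proxy's rule: RETURN unless the mark is
// set; otherwise clear the mark with XOR and MASQUERADE (terminal).
func kubeProxyPostrouting(p *Packet) bool {
	if p.Mark&kubeProxyMasqMark != kubeProxyMasqMark {
		return false // fall through to the next rule
	}
	p.Mark ^= kubeProxyMasqMark
	p.Masqueraded = true
	p.MasqBy = "kube-proxy"
	return true
}

// calicoPostrouting models Calico's MASQ rule (hypothetical match condition).
func calicoPostrouting(p *Packet) bool {
	if !p.NeedsCalicoMasq {
		return false
	}
	p.Masqueraded = true
	p.MasqBy = "calico"
	return true
}

// traverse runs rules in order on a copy of p until one is terminal.
func traverse(p Packet, rules []Rule) Packet {
	for _, r := range rules {
		if r(&p) {
			break
		}
	}
	return p
}

func main() {
	marked := Packet{Mark: kubeProxyMasqMark, NeedsCalicoMasq: true}
	unmarked := Packet{NeedsCalicoMasq: true}

	calicoFirst := traverse(marked, []Rule{calicoPostrouting, kubeProxyPostrouting})
	kubeProxyFirst := traverse(marked, []Rule{kubeProxyPostrouting, calicoPostrouting})
	calicoOnly := traverse(unmarked, []Rule{kubeProxyPostrouting, calicoPostrouting})

	fmt.Printf("calico first:     masq by %s, mark %#x\n", calicoFirst.MasqBy, calicoFirst.Mark)
	fmt.Printf("kube-proxy first: masq by %s, mark %#x\n", kubeProxyFirst.MasqBy, kubeProxyFirst.Mark)
	fmt.Printf("unmarked packet:  masq by %s, mark %#x\n", calicoOnly.MasqBy, calicoOnly.Mark)
}
```

With Calico's rule first, Calico masquerades and the kube-proxy mark stays set on the packet; with kube-proxy's rule first, kube-proxy masquerades and clears the mark, and the duplicate Calico rule never runs. A packet without the kube-proxy mark still reaches Calico's rule in either order, which matches the point that Calico's other MASQ/SNAT cases are unaffected by the reordering.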

fixes #8860

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

Fix interaction between kube-proxy and Calico's SNAT rules that could cause corrupted VXLAN packets when checksum offload was enabled.  Move Calico's rules after kube-proxy's to make sure kube-proxy's mark bit is cleared if both would have done SNAT.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@marvin-tigera marvin-tigera added this to the Calico v3.29.0 milestone Aug 1, 2024
@marvin-tigera marvin-tigera added release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Aug 1, 2024
@fasaxc (Member) left a comment

I had a go at describing the issue in full in the comment. What about nftables mode? How do we interact with kube-proxy in that case?

felix/dataplane/linux/int_dataplane.go (outdated, resolved)
@tomastigera tomastigera marked this pull request as ready for review August 2, 2024 20:21
@tomastigera tomastigera requested a review from a team as a code owner August 2, 2024 20:21
@fasaxc fasaxc added docs-not-required Docs not required for this change cherry-pick-candidate and removed docs-pr-required Change is not yet documented labels Aug 6, 2024
@tomastigera tomastigera merged commit ac7af84 into projectcalico:master Aug 6, 2024
2 checks passed
tomastigera added a commit that referenced this pull request Aug 7, 2024
…-release-v3.28

[release-v3.28] Auto pick #9091: Execute MASQ rules after kube-proxy to avoid conflict - reenable vxlan csum offload
Labels
cherry-pick-candidate docs-not-required Docs not required for this change release-note-required Change has user-facing impact (no matter how small)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Connection issue for a multiple zone cluster with calico 3.28.0
3 participants