Calico panics if kube-proxy using nftables mode #8025
Comments
Not quite. Felix is trying to read the iptables rules from your system using iptables-nft-save, but that command is failing. After retrying, Felix gives up and falls on its sword in an attempt to recover. See https://github.com/projectcalico/calico/blob/master/felix/iptables/table.go#L750. So the questions are: what are the "incompatible entries" in the filter table that iptables-nft-save doesn't like? Has Felix incorrectly chosen to use nftables on your system? And what is creating the incompatible entries? Can you get a dump of the iptables rules from that system and add it here, please? |
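For readers following along, the retry-then-give-up behaviour described above can be sketched roughly as follows. This is not Felix's code (Felix is written in Go; see the linked table.go for the real logic). The helper name `dump_table`, the `SaveFailed` exception, and the retry count and delay are all hypothetical, chosen only for illustration.

```python
import subprocess
import time


class SaveFailed(RuntimeError):
    """Raised when every attempt to dump the table fails (hypothetical name)."""


def dump_table(table, run=subprocess.run, retries=3, delay=0.1):
    """Try to dump one table with iptables-nft-save, retrying on failure.

    `retries` and `delay` are illustrative values, not Felix's real settings.
    `run` is injectable so the logic can be exercised without iptables installed.
    """
    last_err = ""
    for _ in range(retries):
        result = run(
            ["iptables-nft-save", "-t", table],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return result.stdout
        # Remember the most recent error so the final exception can report it.
        last_err = result.stderr.strip()
        time.sleep(delay)
    # All attempts failed: give up loudly instead of running with stale state.
    raise SaveFailed(
        f"iptables-nft-save failed {retries} times for table {table!r}: {last_err}"
    )
```

The point of the sketch is only that a persistent failure of the save command ends in a deliberate crash, which is why the node appears to panic-loop while the incompatible entries remain in place.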
What is the version of |
We are also facing the same issue in the environment below.
Please let us know if there is any resolution. One observation: if we flush the iptables rules, we no longer see this issue. The nft list ruleset command output looks like the below. |
Calico uses iptables 1.8.4 and it may lead to incompatibility with the other versions of iptables in the system. It needs some investigation. Could you build a calico-node image with 1.8.9 and test it out perhaps? https://github.com/projectcalico/calico/blob/release-v3.24/node/Dockerfile.amd64#L16C18-L16C26 |
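To compare the iptables build inside the calico-node image with the one on the host, a small script along these lines may help. It is a minimal sketch that assumes both sides print the usual iptables 1.8.x version line, for example `iptables v1.8.4 (nf_tables)` or `iptables v1.8.9 (legacy)`. The function names are hypothetical, and the two strings would have to be collected by hand (for instance by running `iptables --version` on the host and in the container).

```python
import re

# Matches e.g. "iptables v1.8.4 (nf_tables)"; the backend part is optional.
_VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)(?:\s+\((\w+)\))?")


def parse_iptables_version(text):
    """Return ((major, minor, patch), backend) from a version line."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised iptables version line: {text!r}")
    major, minor, patch, backend = match.groups()
    return (int(major), int(minor), int(patch)), backend


def describe_mismatch(container_line, host_line):
    """List human-readable differences between container and host iptables."""
    c_ver, c_backend = parse_iptables_version(container_line)
    h_ver, h_backend = parse_iptables_version(host_line)
    problems = []
    if c_ver != h_ver:
        problems.append(f"version differs: container {c_ver}, host {h_ver}")
    if c_backend != h_backend:
        problems.append(f"backend differs: container {c_backend}, host {h_backend}")
    return problems
```

An empty list means the two lines agree on both version and backend; anything else is a hint worth attaching to a report in this thread.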
We have a k8s cluster running on RHEL 9.2, using nftables, and canal (image calico 3.26.1/flannel 0.21.4). The canal daemon-set attempts to have 2 ready for each worker node and you will see that it can only have 1 of 2 running. As an FYI, iptables is deprecated in RHEL 9, and Canal and firewalld don't play well together. The following in my ruleset for nft is the issue regardless of syntax. Runs great without it.
or
The rule causes the exact behavior described above. Best regards |
I will have to update this. Over the weekend, without the rule, 3 nodes turned "Not Ready", so this doesn't appear to be caused by a specific rule. It also must be random, as there are 5 other nodes working just fine. |
My Environment
Solved: I ran apt upgrade and rebooted, and now it is running. Maybe I only needed to reboot the system? |
We are having the same issue after a cluster upgrade. Operating system
|
I tried the following since I posted the issue.
The nodes now look stable. When the issue persisted, I observed that:
I updated the
|
I've lost several days iterating through this problem, and it's unsolvable without a complete rewrite of the NFT support in Calico. The underlying problem here is that instead of adding real nftables support, it was added by using the iptables emulation layer. If any other subsystem on the node makes use of nft features incompatible with iptables, calico-node breaks entirely and ceases to work.
At this time, the current versions of all of the following make nft-specific changes to the rules which will cause Calico to break:
There is no solution, so people are being forced to switch away from Calico to restore their Kubernetes cluster networking. |
Set |
Apologies to all users on this thread that our documentation failed to provide the solution using FELIX_IPTABLESBACKEND. Doc/Ops team is looking at the best places (probably several) to ensure no one has to struggle with this again. |
That would be a very odd definition of "fixed" -- you must be referring to usage which means castrated? 😉 If my kernel is using nftables then even if I could run iptables and nftables side by side, why in the world would I want to have that confusion? And most modern distro releases don't even have the legacy option available any more. The move to nftables is approaching a decade old. Calico needs to update away from classic iptables before there's no kernels left that support it.
The documentation makes clear how AUTO and NFT options work. This isn't the problem. The problem is that your NFT support is still using iptables commands. It's not really NFT support, it's a passthrough to an emulator that tries to present the nftables in iptables output. Which fails with even simple native nft tables. When set to NFT, you should be using |
You raise good points that are being reviewed. I agree "fixed" was not the best choice of words here. As a writer, I can only help avoid churn, frustration, and time lost troubleshooting for other users until a proper solution is in place. |
Oh, I took no offense to your use of the word. As a writer myself, I tried to play with the word to make it clear I was laughing so that I didn't come off too intensely critical. Yes, the situation is complex, especially when supporting multiple generations of kernels in heterogeneous environments, and I know it's been tricky for projects to find the right balance of embracing nft while continuing to support iptables. I'm just trying to push that doing the investment in pure nftables support is necessary at this point, now that other projects have made that investment and the tables are no longer backwards compatible with iptables. |
I apologize for my confusion. Could some of you elaborate on some of the deep technical points raised here?
This would significantly improve my understanding so I can be on the same level as some of you. I appreciate any help you can provide. |
Under the hood you would run one, but yes, it may lead to some incompatibilities (perhaps referred here as confusion).
Modern means that newer versions do not come with compatibility packages between iptables and nftables. As @bmckercher123 said, we are looking into this issue and we will address it one way or another. Thanks for reporting the issue and apologies for the current troubles. |
Sorry @tomastigera @zzvara, setting the mode to legacy is not a solution to this problem. The current best answer is to use iptables-nft for all your components until we get a proper nftables backend in place. Using a mix of legacy iptables and nftables doesn't fail (assuming your kernel supports both) but the behaviour is very counter-intuitive. nftables can "undo" the verdict made by iptables-legacy, so your policy may not get properly enforced and the failures will be confusing. I understand the desire to jump to "proper" nftables mode ASAP, but please bear in mind that Kubernetes nftables mode is in Alpha in v1.29. It's not ready for prod use either. We've been relying on the iptables-nft translation layer for a long time, which has meant that we're in sync with kube-proxy. If we had moved to native nftables before kube-proxy, we'd have caused the same problem for kube-proxy! Clearly, now that kube-proxy has nftables support, we also need to add it ASAP in order to remain in sync. I for one didn't spot that nftables support was on the slate for v1.29. |
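One way to read the "undo" remark is that an accept verdict only ends the base chain that issued it, while a drop in any base chain on the same hook is final. The toy model below illustrates that reading. It is a simplified simulation under that assumption, not netfilter itself, and the chain owners, priorities, and rules are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class BaseChain:
    owner: str                       # who installed the chain (illustrative label)
    priority: int                    # lower priority values are evaluated first
    verdict: Callable[[dict], str]   # returns "accept" or "drop" for a packet


def traverse(packet: dict, chains: List[BaseChain]) -> Tuple[str, list]:
    """Run a packet through every base chain registered on one hook.

    "accept" only finishes the current chain; the next chain still sees the
    packet. "drop" is final and stops traversal immediately.
    """
    trace = []
    for chain in sorted(chains, key=lambda c: c.priority):
        result = chain.verdict(packet)
        trace.append((chain.owner, result))
        if result == "drop":
            return "drop", trace
    return "accept", trace


# Hypothetical setup: legacy rules accept port 80, a native nft chain drops it.
legacy = BaseChain("iptables-legacy", 0, lambda p: "accept" if p["dport"] == 80 else "drop")
native = BaseChain("nft", 10, lambda p: "drop")

final, trace = traverse({"dport": 80}, [native, legacy])
```

In this setup `final` is `"drop"` even though the legacy chain accepted the packet, which is the kind of confusing policy outcome the comment warns about when two rule sets coexist.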
Which does not work, as the issue reports here and as I and others have reported. iptables-nft fails when anything that iptables cannot express is in the nft tables, and every other project involved in Kubernetes is now adding rules that are incompatible.
Sounds like the definition of failure to me. It works only in very limited circumstances and debugging it is as confusing as hell.
kube-proxy 1.23+ is what is creating nft tables that iptables-nft can't parse, and causing calico to fail.
Yes, this is the core problem. |
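When gathering details for a report like this, it can help to list which tables the compatibility layer refuses to translate. The sketch below scans saved `iptables-nft-save` output for the comment line that iptables-nft prints for such tables. The exact marker text is an assumption based on commonly seen output and may differ between iptables versions; the function name is hypothetical.

```python
import re

# Assumed marker, e.g.: # Table `filter' is incompatible, use 'nft' tool.
_INCOMPATIBLE_RE = re.compile(r"^# Table `(?P<table>[^']+)' is incompatible")


def incompatible_tables(save_output):
    """Return the names of tables flagged as incompatible in save output."""
    tables = []
    for line in save_output.splitlines():
        match = _INCOMPATIBLE_RE.match(line.strip())
        if match:
            tables.append(match.group("table"))
    return tables
```

Feeding it the text captured from `iptables-nft-save` on an affected node gives the list of tables to inspect with `nft list ruleset`.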
Yes, there are two issues here:
Hopefully, this just requires bumping the iptables version to pick up the latest version of the compatibility shim, so we can get a fix for that out soon. Unfortunately, that fix won't make kube-proxy |
#8416 updates the version of the compatibility layer we include in Calico, and so should solve this first bullet point and make Calico compatible with kube-proxy when both are running in iptables-nft compatibility mode. As @fasaxc suggested above, in order to support compatibility with other users of nftables, we will likely need to stop depending on the iptables-nft compatibility layer. I'll be looking into this. |
Hi team, is there any workaround for this issue? |
@luniHw Yes, the workaround is to not use kube-proxy in nftables mode with Calico! |
Expected Behavior
One of my nodes is emitting a warning about incompatible nft rules and then panic-looping while failing to log something.
Current Behavior
Possible Solution
The error doesn't suggest how to remove the incompatible rules. I've tried
nft flush ruleset
but the problem consistently comes back.
Steps to Reproduce (for bugs)
Context
Your Environment