Is your feature request related to a problem? Please describe.
As pointed out in #1370, when we operate `iptables-restore` in its default mode we have the potential to cause race conditions with other iptables tooling that may exist on the host. In its default operation, `iptables-restore` flushes all existing rules and reads in the rules from scratch from the restore file given to it.
Using iptables in this way means that kube-router has the potential to inadvertently change or remove other applications' chains and rules. However, if we use `iptables-restore --noflush` we can choose to, for the most part, operate only on our own rules and chains. Any chain or rule that is not in the iptables restore input is left alone.
This is especially important because upstream netfilter does not make strong promises of backward compatibility with previous versions of the user-space tooling. These two issues together essentially caused #1370 to occur: kube-proxy used a newer version of the user-space tools than kube-router did, so when kube-router ran `iptables-save` it was not actually given all of the appropriate options on the kube-proxy rules. Thus, when it performed its overly broad `iptables-restore` operation, it changed rules that were not related to kube-router.
Describe the solution you'd like
kube-router should rework the logic inside the NPC controller so that it uses `iptables-restore --noflush` and tries as hard as possible to touch only the chains and rules that it is authoritative for. As suggested by the kube-proxy team (thanks to @danwinship), the most efficient way to get from what we have now to what we need would likely be logic along the lines of the following (a short sketch of the resulting restore input appears after this list):
Basically, when you use `--noflush`, there are three possibilities for a chain:

- if you have no `:CHAINNAME` line for the chain, and no `-X CHAINNAME` or `-A CHAINNAME` rules, then the restore doesn't affect that chain at all
- if you have `:CHAINNAME` and then later `-X CHAINNAME` (and no `-A CHAINNAME`), then the chain gets deleted
- if you have `:CHAINNAME` (and no `-X CHAINNAME`), and 0 or more `-A CHAINNAME ...` lines, then the chain gets completely replaced by the rules in the restore input (just like what would have happened in the non-`--noflush` case)
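To make the three cases concrete, here is a minimal, hypothetical sketch (not kube-router's actual NPC code) of building a `--noflush` restore input in Go and piping it to `iptables-restore`. The chain names `KUBE-ROUTER-FORWARD` and `KUBE-ROUTER-STALE` are illustrative assumptions, not the controller's real chains:

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

func main() {
	var buf bytes.Buffer
	buf.WriteString("*filter\n")

	// Case 3 (hypothetical chain): declare the chain and re-add its rules,
	// so its contents are completely replaced by exactly these rules.
	buf.WriteString(":KUBE-ROUTER-FORWARD - [0:0]\n")
	buf.WriteString("-A KUBE-ROUTER-FORWARD -m comment --comment \"example rule\" -j ACCEPT\n")

	// Case 2 (hypothetical chain): declare the chain, then delete it.
	buf.WriteString(":KUBE-ROUTER-STALE - [0:0]\n")
	buf.WriteString("-X KUBE-ROUTER-STALE\n")

	// Case 1: any chain not mentioned in this input (e.g. kube-proxy's
	// chains) is left completely untouched because of --noflush.
	buf.WriteString("COMMIT\n")

	cmd := exec.Command("iptables-restore", "--noflush")
	cmd.Stdin = &buf
	if out, err := cmd.CombinedOutput(); err != nil {
		fmt.Printf("iptables-restore failed: %v: %s\n", err, out)
	}
}
```

Because nothing in this input mentions kube-proxy's chains, they fall under the first case and are left alone, regardless of which user-space iptables version generated them.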