User defined primary network canary impl #4459
Conversation
The namespace active-network annotation is handled at the NAD controller: since namespaces are common to the cluster, they need to be updated at a common place, which in the case of IC is the control-manager. This change adds the namespace informer to it so the NAD controller can check whether the annotation is already set. Signed-off-by: Enrique Llorente <ellorent@redhat.com>
At IC the controller manager will be responsible for patching the namespace with active-network, so it needs permissions to do so. Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Add a net-attach-def manager system to react on creation/deletion of NADs; this will be used to attach the network segmentation namespace manager that reacts on NAD events. Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
This change ensures that there are no multiple primary networks in the same namespace and that there are no pods before activating the primary network, and updates the namespace with active-network if everything is fine; otherwise it annotates it with "unknown". Also, the namespace is created with active-network="default". Signed-off-by: Enrique Llorente <ellorent@redhat.com>
This change adds the following scenarios:
- Delete a duplicate primary network NAD
- Delete a primary network NAD with no pods
- Delete a primary network NAD with pods
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
This commit adds the primary network boolean field to a pod's annotation info so that this information can be used to tell whether the pod is managed by the default network or a secondary network. This will be useful in future commits and also for other feature controllers to leverage in the future.

k8s.ovn.org/pod-networks: '{
  "default":{
    "ip_addresses":["10.244.1.4/24"],
    "mac_address":"0a:58:0a:f4:01:04",
    "routes":[{"dest":"10.244.0.0/16","nextHop":"10.244.1.1"},{"dest":"100.64.0.0/16","nextHop":"10.244.1.1"}],
    "ip_address":"10.244.1.4/24",
    "primary":false
  },
  "ns2/l2-network":{
    "ip_addresses":["10.100.200.9/24"],
    "mac_address":"0a:58:0a:64:c8:09",
    "ip_address":"10.100.200.9/24",
    "tunnel_id":2,
    "primary":true
  }
}'

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
This commit ensures that, for L3 and L2, by default all traffic should leave via the default gateway IP of the pod subnet that is part of the user-defined secondary network which is the delegated primary network of the pod.

L3:
surya@fedora:~/user-defined-networks$ oc rsh -n ns1 tinypod
/ # ip r
default via 10.128.0.1 dev net1
10.96.0.0/16 via 10.128.0.1 dev net1
10.128.0.0/24 dev net1 proto kernel scope link src 10.128.0.4
10.128.0.0/16 via 10.128.0.1 dev net1
10.244.0.0/16 via 10.244.1.1 dev eth0
10.244.1.0/24 dev eth0 proto kernel scope link src 10.244.1.3
100.64.0.0/16 via 10.244.1.1 dev eth0

L2:
/ # ip r
10.96.0.0/16 via 10.100.200.1 dev net1
10.100.200.0/24 dev net1 proto kernel scope link src 10.100.200.9
10.244.0.0/16 via 10.244.1.1 dev eth0
10.244.1.0/24 dev eth0 proto kernel scope link src 10.244.1.4
100.64.0.0/16 via 10.244.1.1 dev eth0

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
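To make the annotation format in the pod-networks commit above concrete, here is a minimal sketch of parsing k8s.ovn.org/pod-networks to find a pod's primary network. The podNetwork struct and primaryNetworkFor helper are hypothetical names that mirror the JSON keys shown in the commit message; they are not ovn-k's actual annotation types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podNetwork mirrors one entry of the k8s.ovn.org/pod-networks annotation
// shown in the commit message above (only the fields needed here).
type podNetwork struct {
	IPAddresses []string `json:"ip_addresses"`
	MACAddress  string   `json:"mac_address"`
	Primary     bool     `json:"primary"`
}

// primaryNetworkFor returns the key of the network entry flagged "primary".
func primaryNetworkFor(annotation string) (string, error) {
	networks := map[string]podNetwork{}
	if err := json.Unmarshal([]byte(annotation), &networks); err != nil {
		return "", fmt.Errorf("failed to parse pod-networks annotation: %v", err)
	}
	for name, network := range networks {
		if network.Primary {
			return name, nil
		}
	}
	return "", fmt.Errorf("no primary network entry found")
}

func main() {
	// abbreviated version of the annotation value from the commit message
	annotation := `{"default":{"ip_addresses":["10.244.1.4/24"],"primary":false},
		"ns2/l2-network":{"ip_addresses":["10.100.200.9/24"],"primary":true}}`
	name, err := primaryNetworkFor(annotation)
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // ns2/l2-network
}
```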
// TODO: this should be a watch grabbed from the watch factory
nadClient networkattchmentdefclientset.Interface
this is a leftover; but we'll need the NAD watcher to plumb into the allocator, so we can figure out what's the namespace's primary network
I did this already in my PR; there is a NAD watcher.
ns, err := a.namespaceLister.Get(pod.Namespace)
if err != nil {
	return err
}
activeNetworkName, hasAnActiveNetwork := ns.Annotations["k8s.ovn.org/active-network"]
)
if hasAnActiveNetwork && activeNetworkName != "default" {
	network := &nettypes.NetworkSelectionElement{
		// TODO: below we need to have the NAD name, *not* the network name
@tssurya we need your helper to give us the NAD name since that's what the ovnkube-controller uses to match on the ovn-k annotations. Plus, keep in mind the network is cluster scoped, while the NAD is namespace scoped: trying to say <ns>/<network name> doesn't make sense.
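For illustration, a sketch of the kind of helper being discussed, assuming access to a NAD lister; findNADForNetwork and its placement are hypothetical, not the actual helper @tssurya is writing:

```go
package sketch // hypothetical placement, illustration only

import (
	"encoding/json"
	"fmt"

	nadlisters "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/client/listers/k8s.cni.cncf.io/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// findNADForNetwork resolves the namespaced NAD name ("<ns>/<NAD name>") for
// a cluster-scoped network name by scanning the NADs of the pod's namespace.
func findNADForNetwork(nadLister nadlisters.NetworkAttachmentDefinitionLister, namespace, networkName string) (string, error) {
	nads, err := nadLister.NetworkAttachmentDefinitions(namespace).List(labels.Everything())
	if err != nil {
		return "", err
	}
	for _, nad := range nads {
		conf := struct {
			Name string `json:"name"`
		}{}
		if err := json.Unmarshal([]byte(nad.Spec.Config), &conf); err != nil {
			continue // skip NADs whose config we cannot parse
		}
		if conf.Name == networkName {
			return fmt.Sprintf("%s/%s", nad.Namespace, nad.Name), nil
		}
	}
	return "", fmt.Errorf("no NAD found for network %q in namespace %q", networkName, namespace)
}
```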
// TODO: we need to get the NAD name here, not the network name.
Name: bsnc.GetNetworkName(),
@tssurya I assume your helper will be available here.
We also need to figure out what's the NAD name here, in order to build the network selection element.
@@ -235,6 +235,15 @@ func (bsnc *BaseSecondaryNetworkController) ensurePodForSecondaryNetwork(pod *ka
		return err
	}

	if bsnc.IsPrimaryNetwork() && (bsnc.TopologyType() == "layer2" || bsnc.TopologyType() == "layer3") {
		network := &nadapi.NetworkSelectionElement{
@qinqon we'll need to come up w/ a way to figure out the name of the IPAMClaim for a persistent IPs user defined network. Maybe we can just hard-code it? Like <vm name>.user-defined-primary-net or something like that. The kubevirt controller would create them w/ that name if there is a NAD on the namespace of the VM w/ the primary knob enabled.
Let's just annotate the virt-launcher pods with the IPAMClaim name at the kubevirt-ipam-controller, k8s.ovn.org/ipamclaim-reference or the like.
Why would we do that? Aren't we actually saying everywhere else that annotation / labeling based solutions are bad and should not be used?
NetworkSelectionElement is an annotation too.
This annotation is reconciled by the kubevirt-ipam-controller; if someone changes it, it will be restored.
An alternative is ovn-k reading the kubevirt VM label to compose the IPAMClaim name.
> NetworkSelectionElement is an annotation too.

It's part of a standard. I wouldn't put it on the same level.
Can you first please explain to me why you're against just hard-coding this name? It's by far the simplest solution from my perspective.
Then we can evaluate alternatives, sure.

> Let's just annotate the virt-launcher pods with the IPAMClaim name at the kubevirt-ipam-controller, k8s.ovn.org/ipamclaim-reference or the like.

I don't like this because it must happen in the webhook and the controller, if we want to put it back in case someone tampers with it.

> An alternative is ovn-k reading the kubevirt VM label to compose the IPAMClaim name.

I don't follow. Which label?
Thinking a bit more, I just realized that we need the annotation or label or whatever; otherwise, we'd do this always, depending on the network - and we don't want to do this for pods.
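For reference, the hard-coded scheme floated above would be a one-liner; the function name and the separator are assumptions, not an agreed convention:

```go
package sketch

import "fmt"

// ipamClaimName composes a deterministic IPAMClaim name for a VM on a
// user-defined primary network, e.g. "myvm.user-defined-primary-net".
// Both the kubevirt controller and ovn-k could derive it independently,
// which is what makes hard-coding attractive.
func ipamClaimName(vmName, networkName string) string {
	return fmt.Sprintf("%s.%s", vmName, networkName)
}
```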
Name:      activeNetworkName,
Namespace: pod.Namespace,
}
nadName := fmt.Sprintf("%s/%s", pod.Namespace, activeNetworkName)
it wouldn't be activeNetworkName, but the NAD.metadata.Name.
name:        userDefinedNetworkName,
networkName: userDefinedNetworkName,
ensure these values are different, once we're able to access both the network name + NAD name from both ovnkube-controller / ovnkube-control-plane.
if len(netStatus.IPs) > 1 {
	return "", fmt.Errorf("attachment for network %q with more than one IP", attachmentName)
}
return netStatus.IPs[0].IP.String(), nil
for dualstack, we'll need to parametrize this. Check what we're doing on secondary nets, and do that. Should just work.
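As a sketch of that parametrization, assuming the attachment addresses are available as []*net.IPNet like in the snippet above (attachmentIP is a hypothetical name):

```go
package sketch

import (
	"fmt"
	"net"
)

// attachmentIP picks the attachment address of the requested IP family
// instead of rejecting attachments that carry more than one IP (dualstack).
func attachmentIP(ips []*net.IPNet, wantIPv6 bool) (string, error) {
	for _, ipnet := range ips {
		if ipnet == nil || ipnet.IP == nil {
			continue
		}
		// To4() returns nil for IPv6 addresses
		if (ipnet.IP.To4() == nil) == wantIPv6 {
			return ipnet.IP.String(), nil
		}
	}
	family := "IPv4"
	if wantIPv6 {
		family = "IPv6"
	}
	return "", fmt.Errorf("attachment has no %s address", family)
}
```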
"", | ||
userDefinedNetworkName, // TODO: we need to find a way to have here access to the network name | ||
userDefinedNetworkName, // TODO: we can have the NAD name from the annotations. But the network name is missing. maybe we can add that to the annotations | ||
pr.CNIConf.MTU, |
we need a way to grab the MTU from the right network.
I think we need to actually read the NAD on ovnkube-controller.
right now I think the best way is for @tssurya's helper function to just spit out the NAD object.
Then anyone that needs it can access the metadata.name info / unmarshal the NAD.Spec.Config data into an OVN-Kubernetes configuration object and do whatever they need.
Food for thought.
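A sketch of that idea: the helper hands back the NAD object and callers unmarshal NAD.Spec.Config themselves for whatever they need, e.g. the MTU discussed above. netConf here is a minimal stand-in, not ovn-k's real NetConf type:

```go
package sketch

import (
	"encoding/json"
	"fmt"

	nadapi "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/apis/k8s.cni.cncf.io/v1"
)

// netConf is a minimal view of the OVN-Kubernetes NAD configuration,
// exposing just the fields needed here.
type netConf struct {
	Name string `json:"name"`
	MTU  int    `json:"mtu"`
}

// mtuFromNAD extracts the MTU of the network the given NAD defines.
func mtuFromNAD(nad *nadapi.NetworkAttachmentDefinition) (int, error) {
	conf := &netConf{}
	if err := json.Unmarshal([]byte(nad.Spec.Config), conf); err != nil {
		return 0, fmt.Errorf("failed to parse config of NAD %s/%s: %v", nad.Namespace, nad.Name, err)
	}
	return conf.MTU, nil
}
```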
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
In this test, we start a container running an HTTP service outside the Kubernetes cluster. We then start a client pod in the Kubernetes cluster, with a user defined primary network attachment, and try to access the service located outside the cluster from it. Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
@@ -538,8 +538,10 @@ func (bnc *BaseNetworkController) addLogicalPortToNetwork(pod *kapi.Pod, nadName
	// it for the default network as well. If at all possible, keep them
	// functionally equivalent going forward.
	var annotationUpdated bool
-	if bnc.IsSecondary() {
+	if bnc.IsSecondary() || bnc.IsPrimaryNetwork() {
might want to revisit the definition of a primary net; right now, a user defined primary network is a secondary net, so this added OR condition is not required.
}
return bsnc.ensurePodForUserDefinedPrimaryNet(pod, network)
}

on, networkMap, err := util.GetPodNADToNetworkMapping(pod, bsnc.NetInfo)
an alternative would be to set the dummy NSE created above in this map. Then it would just munch it and spit out the LS+LSP for the user-defined primary network.
pr.nadName, func(annotations map[string]string, nadName string) (*util.PodAnnotation, bool) {
	// TODO: we need to simplify this function.
	annotation, isReady := annotCondFn(annotations, nadName)
	if nadName == "default" {
this thing below should only happen iff the feature flag allows it.
Since this runs on ovnkube, I think we should be able to have access to that.
if hasUserDefinedPrimaryNetwork && userDefinedNetPodInfo != nil {
	klog.Infof("creating the user defined net request for %q on pod %q", userDefinedNetworkName, podName)
	userDefinedRequest := newUserDefinedPrimaryNetworkPodRequest(
		pr.Command,
		pod,
		pr.SandboxID,
		pr.Netns,
		"sdn1", // TODO: need to find a name here. This one is OK I guess
		userDefinedNetworkName,
		userDefinedNetPodInfo.NADName,
	)
	if _, err := userDefinedRequest.getCNIResult(clientset, userDefinedNetPodInfo); err != nil {
		return nil, err
	}
}
if we're on unprivileged mode, this code right here must happen in the CNI shim.
This means that probably the CNI server side (running in ovnkube-controller) should reply to the client (shim) w/ whatever extra info it needs, thus avoiding reading more stuff on the shim.
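As a thought experiment, the server-to-shim handoff could carry the extra info in the response payload; shimResponse and udnInfo below are hypothetical types, not ovn-k's actual CNI server response:

```go
package sketch

import current "github.com/containernetworking/cni/pkg/types/100"

// udnInfo carries the extra data the shim needs to set up the user-defined
// primary network without reading anything from the kube API itself.
type udnInfo struct {
	NADName     string `json:"nadName"`
	NetworkName string `json:"networkName"`
	MTU         int    `json:"mtu"`
}

// shimResponse is what the CNI server (in ovnkube-controller) would send
// back to the unprivileged shim over the local socket.
type shimResponse struct {
	// Result is the regular CNI result for the default network attachment.
	Result *current.Result `json:"result,omitempty"`
	// PrimaryUDN is set only when the namespace has an active user-defined
	// primary network.
	PrimaryUDN *udnInfo `json:"primaryUDN,omitempty"`
}
```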
maybe this is not exactly required / something else is missing, but I can say that if this commit is not present, you'd miss the default route below:
default via 10.128.0.1 dev sdn1
10.96.0.0/16 via 10.128.0.1 dev sdn1
10.128.0.0/16 dev sdn1 proto kernel scope link src 10.128.0.120
10.244.0.0/16 via 10.244.2.1 dev eth0
10.244.2.0/24 dev eth0 proto kernel scope link src 10.244.2.28
100.64.0.0/16 via 10.244.2.1 dev eth0
#4473 will supersede this PR
What this PR does and why is it needed
This PR tries to highlight all the changes that are needed e2e to plumb a user defined primary network into the pod.
It also provides an e2e test to assert communication over this network.
What the PR essentially does is:
Which issue(s) this PR fixes
Fixes #
Special notes for reviewers
It currently only works iff the name of the NAD matches the name of the attachment. That's pretty much because we don't currently have a way to pass around the name of the network to the CNI side. Or I fail to see it.
How to verify it
Details to documentation updates
Description for the changelog
Does this PR introduce a user-facing change?