
User defined primary network canary impl #4459

Closed

Conversation


@maiqueb maiqueb commented Jun 20, 2024

What this PR does and why is it needed

This PR tries to highlight all the changes that are needed e2e to plumb a user defined primary network into the pod.

It also provides an e2e test to assert communication over this network.

What the PR essentially does is:

  • sets the annotations for the user defined primary network
  • consumes those annotations to create the LS/LSP for the user defined primary network
  • consumes those annotations on the CNI side to configure both interfaces (default cluster network + user defined primary network)
  • provides an e2e test for east/west traffic over the user defined attachment

Which issue(s) this PR fixes

Fixes #

Special notes for reviewers

It currently only works iff the name of the NAD matches the name of the attachment. That's pretty much because we don't currently have a way to pass the name of the network around to the CNI side. Or I fail to see it.

How to verify it

Details to documentation updates

Description for the changelog

Does this PR introduce a user-facing change?


qinqon and others added 11 commits June 18, 2024 14:48
The namespace active-network annotation is handled at the NAD
controller; since namespaces are common to the cluster, they need to be
updated at a common place, which in the case of IC is the control
manager. This change adds the namespace informer to it so the NAD
controller can check whether the annotation is already set.

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
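
A minimal client-go sketch of the check this commit describes: look up a namespace through an informer-backed lister and see whether the active-network annotation is already set. The annotation key matches the one used elsewhere in this PR; everything else (factory wiring, namespace name) is illustrative.

package main

import (
	"fmt"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Informer-backed namespace lister, as the commit adds to the NAD controller.
	factory := informers.NewSharedInformerFactory(client, 0)
	nsLister := factory.Core().V1().Namespaces().Lister()
	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	ns, err := nsLister.Get("ns1") // illustrative namespace name
	if err != nil {
		panic(err)
	}
	if name, ok := ns.Annotations["k8s.ovn.org/active-network"]; ok {
		fmt.Printf("annotation already set: active-network=%s\n", name)
	}
}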
At IC the controller manager will be responsible for patching the
namespace with active-network, so it needs permissions to do so.

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Add a net-attach-def manager system that reacts on creation/deletion of
NADs; this will be used to attach the network segmentation namespace
manager that reacts on NAD events.

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Signed-off-by: Enrique Llorente <ellorent@redhat.com>
This change ensures that there are no multiple primary networks in the
same namespace and that there are no pods before activating the primary
network, and updates the namespace with active-network if everything is
fine; otherwise it annotates it with "unknown". Also, namespaces are
created with active-network="default".

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
This change adds the following scenarios:
- Delete a duplicate primary network nad
- Delete primary network nad with no pods
- Delete primary network nad with pods

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
This commit adds the primary network boolean
field to a pod's annotation info so that this
information can be used to tell whether the pod
is managed by the default network or by a
secondary network. This will be useful
in future commits and for other feature
controllers to leverage.

k8s.ovn.org/pod-networks: '{
  "default":{
    "ip_addresses":["10.244.1.4/24"],
    "mac_address":"0a:58:0a:f4:01:04",
    "routes":[{"dest":"10.244.0.0/16","nextHop":"10.244.1.1"},{"dest":"100.64.0.0/16","nextHop":"10.244.1.1"}],
    "ip_address":"10.244.1.4/24",
    "primary":false
  },
  "ns2/l2-network":{
    "ip_addresses":["10.100.200.9/24"],
    "mac_address":"0a:58:0a:64:c8:09",
    "ip_address":"10.100.200.9/24",
    "tunnel_id":2,
    "primary":true
  }
}'

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
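
A minimal sketch of how a consumer could read the new primary field from an annotation shaped like the one above; the struct mirrors only the fields used here, not the full ovn-k annotation type.

package main

import (
	"encoding/json"
	"fmt"
)

// podNetwork mirrors a subset of the k8s.ovn.org/pod-networks value.
type podNetwork struct {
	IPAddresses []string `json:"ip_addresses"`
	MACAddress  string   `json:"mac_address"`
	Primary     bool     `json:"primary"`
}

func main() {
	annotation := `{
	 "default":{"ip_addresses":["10.244.1.4/24"],"mac_address":"0a:58:0a:f4:01:04","primary":false},
	 "ns2/l2-network":{"ip_addresses":["10.100.200.9/24"],"mac_address":"0a:58:0a:64:c8:09","primary":true}
	}`
	nets := map[string]podNetwork{}
	if err := json.Unmarshal([]byte(annotation), &nets); err != nil {
		panic(err)
	}
	for nadName, n := range nets {
		if n.Primary {
			fmt.Printf("pod is managed by user defined primary network %s (%v)\n", nadName, n.IPAddresses)
		}
	}
}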
This commit ensures that for L3 and L2, by default,
all traffic leaves via the default gateway IP
of the pod subnet that is part of the user defined
secondary network, which is the delegated primary
network of the pod.

L3:
surya@fedora:~/user-defined-networks$ oc rsh -n ns1 tinypod
/ # ip r
default via 10.128.0.1 dev net1
10.96.0.0/16 via 10.128.0.1 dev net1
10.128.0.0/24 dev net1 proto kernel scope link src 10.128.0.4
10.128.0.0/16 via 10.128.0.1 dev net1
10.244.0.0/16 via 10.244.1.1 dev eth0
10.244.1.0/24 dev eth0 proto kernel scope link src 10.244.1.3
100.64.0.0/16 via 10.244.1.1 dev eth0

L2:
/ # ip r
10.96.0.0/16 via 10.100.200.1 dev net1
10.100.200.0/24 dev net1 proto kernel scope link src 10.100.200.9
10.244.0.0/16 via 10.244.1.1 dev eth0
10.244.1.0/24 dev eth0 proto kernel scope link src 10.244.1.4
100.64.0.0/16 via 10.244.1.1 dev eth0

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
@github-actions github-actions bot added area/unit-testing Issues related to adding/updating unit tests area/e2e-testing labels Jun 20, 2024
Comment on lines +66 to +68

// TODO: this should be a watch grabbed from the watch factory
nadClient networkattchmentdefclientset.Interface
@maiqueb (Contributor, Author):

this is a leftover; but we'll need the NAD watcher to plumb into the allocator, so we can figure out what the namespace's primary network is

Member:

I did this already in my PR; there is a NAD watcher.

Comment on lines +139 to +143
ns, err := a.namespaceLister.Get(pod.Namespace)
if err != nil {
return err
}
activeNetworkName, hasAnActiveNetwork := ns.Annotations["k8s.ovn.org/active-network"]
@maiqueb (Contributor, Author), Jun 20, 2024:

Waiting on @tssurya's helper to identify the namespace's user defined primary net.

Meaning, this code should be replaced by @tssurya's helper.
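
For reference, a sketch of the shape such a helper could take; the name and signature are assumptions, not the actual helper from the referenced work:

package activenetwork

import (
	corelisters "k8s.io/client-go/listers/core/v1"
)

const activeNetworkAnnotation = "k8s.ovn.org/active-network"

// GetActiveNetworkForNamespace is a hypothetical helper: it returns the
// namespace's user defined primary network name, or "default" if none is set.
func GetActiveNetworkForNamespace(nsLister corelisters.NamespaceLister, namespace string) (string, error) {
	ns, err := nsLister.Get(namespace)
	if err != nil {
		return "", err
	}
	if name, ok := ns.Annotations[activeNetworkAnnotation]; ok {
		return name, nil
	}
	return "default", nil
}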

)
if hasAnActiveNetwork && activeNetworkName != "default" {
network := &nettypes.NetworkSelectionElement{
// TODO: below we need to have the NAD name, *not* the network name
@maiqueb (Contributor, Author), Jun 20, 2024:

@tssurya we need your helper to give us the NAD name, since that's what the ovnkube-controller uses to match on the ovn-k annotations.

Plus, keep in mind the network is cluster scoped, while the NAD is namespace scoped. That is to say, <ns>/<network name> doesn't make sense.

Comment on lines +240 to +241
// TODO: we need to get the NAD name here, not the network name.
Name: bsnc.GetNetworkName(),
@maiqueb (Contributor, Author):

@tssurya I assume your helper will be available here.

We also need to figure out what the NAD name is here, in order to build the network selection element.

@@ -235,6 +235,15 @@ func (bsnc *BaseSecondaryNetworkController) ensurePodForSecondaryNetwork(pod *ka
return err
}

if bsnc.IsPrimaryNetwork() && (bsnc.TopologyType() == "layer2" || bsnc.TopologyType() == "layer3") {
network := &nadapi.NetworkSelectionElement{
@maiqueb (Contributor, Author):

@qinqon we'll need to come up w/ a way to figure out the name of the IPAMClaim for a persistent IPs user defined network.

Maybe we can just hard-code it ?... Like ... <vm name>.user-defined-primary-net

... or something like that. The kubevirt controller would create them w/ that name if there is a NAD on the namespace of the VM w/ the primary knob enabled.
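
For illustration, the hard-coded scheme floated above would be a one-liner; the suffix is just the example from this comment, not a settled convention:

package ipamclaim

// ipamClaimNameFor composes an IPAMClaim name following the
// "<vm name>.user-defined-primary-net" scheme suggested above
// (illustrative only; no convention was agreed on in this thread).
func ipamClaimNameFor(vmName string) string {
	return vmName + ".user-defined-primary-net"
}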

@qinqon (Contributor), Jun 20, 2024:

Let's just annotate the virt-launcher pods with the ipamclaim name at kubevirt-ipam-controller, k8s.ovn.org/ipamclaim-reference or the like.

@maiqueb (Contributor, Author):

Why would we do that? Aren't we actually saying everywhere else that annotation/labeling based solutions are bad, and should not be used?

@qinqon (Contributor):

NetworkSelectionElement is an annotation too.

@qinqon (Contributor):

This annotation is reconciled by kubevirt-ipam-controller; if someone changes it, it will be restored.

@qinqon (Contributor):

An alternative is ovn-k reading the kubevirt VM label to compose the IPAMClaim name.

@maiqueb (Contributor, Author):

> NetworkSelectionElement is an annotation too.

It's part of a standard; I wouldn't put it on the same level.

Can you first please explain to me why you're against just hard-coding this name? It's by far the simplest solution from my perspective.

Then we can evaluate alternatives, sure.

> Let's just annotate the virt-launcher pods with the ipamclaim name at kubevirt-ipam-controller, k8s.ovn.org/ipamclaim-reference or the like.

I don't like this because it must happen in the webhook and in the controller, if we want to put it back in case someone tampers with it.

> An alternative is ovn-k reading the kubevirt VM label to compose the IPAMClaim name.

I don't follow. Which label?

@maiqueb (Contributor, Author):

Thinking a bit more, I just realized that we need the annotation or label or whatever; otherwise, we'd do this always, depending on the network - and we don't want to do this for pods.

Name: activeNetworkName,
Namespace: pod.Namespace,
}
nadName := fmt.Sprintf("%s/%s", pod.Namespace, activeNetworkName)
@maiqueb (Contributor, Author):

it wouldn't be activeNetworkName, but the NAD.metadata.Name.

Comment on lines +345 to +346
name: userDefinedNetworkName,
networkName: userDefinedNetworkName,
@maiqueb (Contributor, Author):

Ensure these values are different once we're able to access both the network name and the NAD name from both ovnkube-controller and ovnkube-control-plane.

if len(netStatus.IPs) > 1 {
return "", fmt.Errorf("attachment for network %q with more than one IP", attachmentName)
}
return netStatus.IPs[0].IP.String(), nil
@maiqueb (Contributor, Author):

For dual-stack, we'll need to parametrize this. Check what we're doing on secondary nets, and do that; it should just work.
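
A sketch of the parametrized variant, assuming the status type carries a list of IPNets as in the snippet above; the type and function names are illustrative:

package main

import (
	"fmt"
	"net"
)

// attachmentStatus mirrors just what the snippet above touches; the real
// type in the e2e helpers may differ.
type attachmentStatus struct {
	Name string
	IPs  []*net.IPNet
}

// attachmentIPs is a dual-stack friendly variant: it returns every address
// on the attachment instead of rejecting statuses with more than one IP.
func attachmentIPs(status attachmentStatus) ([]string, error) {
	if len(status.IPs) == 0 {
		return nil, fmt.Errorf("attachment for network %q has no IPs", status.Name)
	}
	ips := make([]string, 0, len(status.IPs))
	for _, ipNet := range status.IPs {
		ips = append(ips, ipNet.IP.String())
	}
	return ips, nil
}

func main() {
	status := attachmentStatus{
		Name: "ns2/l2-network",
		IPs: []*net.IPNet{
			{IP: net.ParseIP("10.100.200.9"), Mask: net.CIDRMask(24, 32)},
			{IP: net.ParseIP("fd00:10:100::9"), Mask: net.CIDRMask(64, 128)},
		},
	}
	fmt.Println(attachmentIPs(status)) // [10.100.200.9 fd00:10:100::9] <nil>
}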

"",
userDefinedNetworkName, // TODO: we need to find a way to have here access to the network name
userDefinedNetworkName, // TODO: we can have the NAD name from the annotations. But the network name is missing. maybe we can add that to the annotations
pr.CNIConf.MTU,
@maiqueb (Contributor, Author):

we need a way to grab the MTU from the right network.

I think we need to actually read the NAD on ovnkube-controller.

@maiqueb (Contributor, Author):

Right now I think the best way is for @tssurya's helper function to just spit out the NAD object.

Then anyone that needs it can access the metadata.name info or unmarshal the NAD.Spec.Config data into an OVN-Kubernetes configuration object and do whatever they need.

Food for thought.
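
A sketch of the consumption pattern described above: hand out the NAD object, then read metadata.name and unmarshal Spec.Config for per-network fields such as the MTU. The netConf struct lists a few illustrative fields, not the full OVN-Kubernetes configuration type.

package nadutil

import (
	"encoding/json"
	"fmt"

	nadapi "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/apis/k8s.cni.cncf.io/v1"
)

// netConf carries a few illustrative fields from a NAD's Spec.Config.
type netConf struct {
	Name     string `json:"name"`
	Topology string `json:"topology"`
	MTU      int    `json:"mtu"`
}

// networkConfFromNAD returns the NAD name (what ovnkube-controller matches
// annotations on) alongside the parsed network configuration (which carries
// the cluster scoped network name, the MTU, etc.).
func networkConfFromNAD(nad *nadapi.NetworkAttachmentDefinition) (string, *netConf, error) {
	conf := &netConf{}
	if err := json.Unmarshal([]byte(nad.Spec.Config), conf); err != nil {
		return "", nil, fmt.Errorf("failed to parse config of NAD %s/%s: %v", nad.Namespace, nad.Name, err)
	}
	return nad.Name, conf, nil
}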

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
In this test, we start a container running an HTTP service outside the
Kubernetes cluster.

We then start a client pod in the Kubernetes cluster, with a user
defined primary network attachment, and try to access the service
located outside the cluster from it.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
@maiqueb maiqueb force-pushed the user-defined-primary-network-canary-impl branch from 74245dc to cb92150 Compare June 21, 2024 10:28
@tssurya tssurya added feature/user-defined-network-segmentation All PRs related to User defined network segmentation area/core-kubernetes Issues related to core kubernetes like pods, services, endpoints, endpointslices labels Jun 21, 2024
@tssurya tssurya added the feature/pods All issues related to the PodAPI label Jun 21, 2024
@@ -538,8 +538,10 @@ func (bnc *BaseNetworkController) addLogicalPortToNetwork(pod *kapi.Pod, nadName
// it for the default network as well. If at all possible, keep them
// functionally equivalent going forward.
var annotationUpdated bool
if bnc.IsSecondary() {
if bnc.IsSecondary() || bnc.IsPrimaryNetwork() {
@maiqueb (Contributor, Author):

might want to revisit the definition of a primary net; right now, a user defined primary network is a secondary net, so this added OR condition is not required.

}
return bsnc.ensurePodForUserDefinedPrimaryNet(pod, network)
}

on, networkMap, err := util.GetPodNADToNetworkMapping(pod, bsnc.NetInfo)
@maiqueb (Contributor, Author):

an alternative would be to set the dummy NSE created above in this map. Then it would just munch it and spit out the LS+LSP for the user-defined primary network.

pr.nadName, func(annotations map[string]string, nadName string) (*util.PodAnnotation, bool) {
// TODO: we need to simplify this function.
annotation, isReady := annotCondFn(annotations, nadName)
if nadName == "default" {
@maiqueb (Contributor, Author):

this thing below should only happen iff the feature flag allows it.

Since this runs on ovnkube, I think we should be able to have access to that.
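
A trivial sketch of that gate; the flag is passed in explicitly since the exact ovn-k config field is an assumption here:

package cni

// defaultNetOverridden reports whether the default-network annotation
// handling should defer to a user defined primary network. The boolean would
// come from ovn-k's parsed feature config; its exact name is an assumption.
func defaultNetOverridden(nadName string, networkSegmentationEnabled bool) bool {
	return nadName == "default" && networkSegmentationEnabled
}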

Comment on lines +210 to +224
if hasUserDefinedPrimaryNetwork && userDefinedNetPodInfo != nil {
klog.Infof("creating the user defined net request for %q on pod %q", userDefinedNetworkName, podName)
userDefinedRequest := newUserDefinedPrimaryNetworkPodRequest(
pr.Command,
pod,
pr.SandboxID,
pr.Netns,
"sdn1", // TODO: need to find a name here. This one is OK I guess
userDefinedNetworkName,
userDefinedNetPodInfo.NADName,
)
if _, err := userDefinedRequest.getCNIResult(clientset, userDefinedNetPodInfo); err != nil {
return nil, err
}
}
@maiqueb (Contributor, Author):

if we're in unprivileged mode, this code right here must happen in the CNI shim.

This means that probably the CNI server side (running in ovnkube-controller) should reply to the client (shim) w/ whatever extra info it needs, thus avoiding reading more stuff on the shim.

@maiqueb (Contributor, Author):

maybe this is not exactly required / something else is missing, but I can say that if this commit is not present, you'd miss the default route below:

default via 10.128.0.1 dev sdn1 
10.96.0.0/16 via 10.128.0.1 dev sdn1 
10.128.0.0/16 dev sdn1 proto kernel scope link src 10.128.0.120 
10.244.0.0/16 via 10.244.2.1 dev eth0 
10.244.2.0/24 dev eth0 proto kernel scope link src 10.244.2.28 
100.64.0.0/16 via 10.244.2.1 dev eth0 


tssurya commented Jul 1, 2024

#4473 will supersede this PR


tssurya commented Jul 8, 2024

closing in favor of #4473, where @maiqueb and @qinqon are working together to get it prod ready

@tssurya tssurya closed this Jul 8, 2024