Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for TCP_UDP to NLB TargetGroups and Listeners (rebase) #3807

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

lyda
Copy link

@lyda lyda commented Aug 13, 2024

This is a work in progress - I need to test it.

Issue

#1608 (comment)

And based on this PR: #2275

Description

Previously, aws-load-balancer-controller ignored extra overlapping
ServicePorts defined in the Kubernetes Service spec if the external port
numbers were the same even if the protocols were different (e.g. TCP:53,
UDP:53).

This behavior prevented users from exposing services that support TCP
and UDP on the same external load balancer port number.

This patch solves the problem by detecting when a user defines multiple
ServicePorts for the same external load balancer port number but using
TCP and UDP protocols separately. In such situations, a TCP_UDP
TargetGroup and Listener are created and SecurityGroup rules are
updated accordingly. If more than two ServicePorts are defined, only the
first two mergeable ServicePorts are used. Otherwise, the first
ServicePort is used.

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lyda
Once this PR has been reviewed and has the lgtm label, please assign m00nf1sh for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 13, 2024
@k8s-ci-robot
Copy link
Contributor

Welcome @lyda!

It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @lyda. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 13, 2024
@danielloader
Copy link

danielloader commented Aug 20, 2024

Thanks for taking this up, I get to subscribe to another pull request now. Hope you have much better luck than the last one.

Just in time for HTTP3 going more mainstream!

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 24, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2024
@lyda lyda force-pushed the tcpudp-lyda branch 4 times, most recently from b025fee to 23e2f43 Compare October 18, 2024 11:06
@lyda
Copy link
Author

lyda commented Oct 18, 2024

OK, I think I've addressed the issues with the rebase. But the verify step is now failing. Any ideas why?

@lyda
Copy link
Author

lyda commented Nov 4, 2024

@M00nF1sh / @oliviassss : Hey there, I think this is done and reviewable. I've run it in a test EKS system and it does what I needed it to do. However I'm not an expert in this code base or with load balancers so would love feedback from yourselves. Thank you!

@lyda lyda force-pushed the tcpudp-lyda branch 3 times, most recently from 1fb8e33 to 2ab64b4 Compare November 6, 2024 06:26
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 6, 2024
@lyda
Copy link
Author

lyda commented Nov 11, 2024

I've done manual testing and it worked fine. There really aren't docs to update. The information that the current aws-load-balancer-controller doesn't support listening to UDP and TCP on the same port number is information you'll find in the issues.

Is there a slack/discord/irc channel I can chat about this to address any remaining issues?

@z0rc
Copy link

z0rc commented Nov 11, 2024

Is there a slack/discord/irc channel I can chat about this to address any remaining issues?

I believe developers (I'm not one of them) are somewhat reachable via https://slack.k8s.io/, channel #provider-aws.

@@ -87,6 +87,9 @@ const (

// NetworkingProtocolUDP is the UDP protocol.
NetworkingProtocolUDP NetworkingProtocol = "UDP"

// NetworkingProtocolTCP_UDP is the TCP_UDP protocol.
NetworkingProtocolTCP_UDP NetworkingProtocol = "TCP_UDP"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this.
the networking rules here is meant for security groups(which don't have a tcp_udp rule protocol)

instead of TCP_UDP, we should just generate two rules in TGB's networking, one for TCP and another one for UDP.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but we do. This is for AWS load balancer target groups, not security groups, and as much I find it annoying, TCP_UDP is a protocol for AWS load balancers. It's used in the single case where the TCP and UDP port numbers are the same. So, for example, if you're implementing a domain name server and need to listen on port 53 for both TCP and UDP, the load balancer must be configured to have a target group protocol of TCP_UDP. I have no idea why they did this, but they did.

}
}

// execute build listener
for _, port := range t.service.Spec.Ports {
Copy link
Collaborator

@M00nF1sh M00nF1sh Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just loop over portMap instead of t.service.Spec.Ports. (we don't need this portMap[key] check this way)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's trying to find the case of the same port number being listened to on TCP and UDP. portMap is empty and is being used to find duplicate port numbers.

@@ -234,3 +256,24 @@ func (t *defaultModelBuildTask) buildListenerAttributes(ctx context.Context, svc
}
return attributes, nil
}

func mergeServicePortsForListener(ports []corev1.ServicePort) corev1.ServicePort {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this works but have two issues:

  1. we are hacking the k8s corev1.ServicePort concept by introduce a new corev1.Protocol("TCP_UDP"), which don't exists in k8s, and in theory is an malformed `corev1.ServicePort. (TCP_UDP is not a validate Protocol enum)
  2. this don't generate a warning/error if unsupported protocol/protocol conbination is found.
  3. this exists as a "glue" code to make the existing buildListeners function happy, but in the long run would make the code harder to understand and maintain.

Instead, i think a better option would be:

  1. modify buildListeners to accept a list of corev1.ServicePort. inside buildListeners, it first validate whether this list of corev1.ServicePort can form a single listener(and error out when cannot), and generate listener of correct protocol.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, that's what buildListeners now does. It loops through the corev1.ServicePorts and determines if there's a case where the same port number exists for TCP and UDP. It then synthesises a single corev1.ServicePort from the original two that uses the TCP_UDP protocol that AWS load balancers support.

Either way it then calls buildListener.

I could change everything from buildListener on down to use an internal data structure that includes a slice of corev1.ServicePorts as well as the protocol being used for the load balancer. That seems like a rather complex change. And it will still have the rats nest of conditionals the current code has. Plus all corev1.ServicePort stuff would operate off the first element - so might as well just include a single corev1.ServicePort. And at that point all I've done is avoided abusing an enum.

That said, point number two is definitely valid and I'm now returning errors for a few scenarios up the chain.

@TBBle
Copy link
Contributor

TBBle commented Dec 4, 2024

Does it make sense to mark this PR as closing #1608? That should cause GitHub to link to this PR from that ticket, rather than relying on the comments. (It might also keep k8s-ci-robot from closing that issue as stale while this PR is in-progress... maybe?)

I note that this PR isn't solving the actual use-case for which that ticket was opened, but the bulk of the discussion in that ticket was for this use-case, so tying them together makes sense to me. A solution for the original use-case would revolve around some kind of new LoadBalancerInstance CRD feature and is basically orthogonal to this PR.

@lyda
Copy link
Author

lyda commented Dec 4, 2024

Awesome, thanks for the review. I'll work through these this week.

@lyda
Copy link
Author

lyda commented Dec 5, 2024

Does it make sense to mark this PR as closing #1608? That should cause GitHub to link to this PR from that ticket, rather than relying on the comments. (It might also keep k8s-ci-robot from closing that issue as stale while this PR is in-progress... maybe?)

I note that this PR isn't solving the actual use-case for which that ticket was opened, but the bulk of the discussion in that ticket was for this use-case, so tying them together makes sense to me. A solution for the original use-case would revolve around some kind of new LoadBalancerInstance CRD feature and is basically orthogonal to this PR.

Good point. I'll turn that off because this is completely different to the original issue and just addresses a comment within. Sorry I missed that.

Copy link
Author

@lyda lyda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I've gone through them all. I see your point on buildListener but I don't think the change will make it any better. The code will still have to pick through the three "protocols" (I really do dislike AWS's TCP_UDP "feature" here so I have to put quotes around protocols at least once) and that's going to make the config translation code more complex than just the binary choices before.

The real decision is: do you want to support this feature for the few of us who need it and pay the complexity price or do you not. I completely understand either choice. I would prefer the former obviously because I have zero interest in maintaining a fork, but that's a me problem, not a you problem.

@@ -87,6 +87,9 @@ const (

// NetworkingProtocolUDP is the UDP protocol.
NetworkingProtocolUDP NetworkingProtocol = "UDP"

// NetworkingProtocolTCP_UDP is the TCP_UDP protocol.
NetworkingProtocolTCP_UDP NetworkingProtocol = "TCP_UDP"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but we do. This is for AWS load balancer target groups, not security groups, and as much I find it annoying, TCP_UDP is a protocol for AWS load balancers. It's used in the single case where the TCP and UDP port numbers are the same. So, for example, if you're implementing a domain name server and need to listen on port 53 for both TCP and UDP, the load balancer must be configured to have a target group protocol of TCP_UDP. I have no idea why they did this, but they did.

}
}

// execute build listener
for _, port := range t.service.Spec.Ports {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's trying to find the case of the same port number being listened to on TCP and UDP. portMap is empty and is being used to find duplicate port numbers.

@@ -234,3 +256,24 @@ func (t *defaultModelBuildTask) buildListenerAttributes(ctx context.Context, svc
}
return attributes, nil
}

func mergeServicePortsForListener(ports []corev1.ServicePort) corev1.ServicePort {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, that's what buildListeners now does. It loops through the corev1.ServicePorts and determines if there's a case where the same port number exists for TCP and UDP. It then synthesises a single corev1.ServicePort from the original two that uses the TCP_UDP protocol that AWS load balancers support.

Either way it then calls buildListener.

I could change everything from buildListener on down to use an internal data structure that includes a slice of corev1.ServicePorts as well as the protocol being used for the load balancer. That seems like a rather complex change. And it will still have the rats nest of conditionals the current code has. Plus all corev1.ServicePort stuff would operate off the first element - so might as well just include a single corev1.ServicePort. And at that point all I've done is avoided abusing an enum.

That said, point number two is definitely valid and I'm now returning errors for a few scenarios up the chain.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 7, 2024
M00nF1sh and others added 5 commits December 9, 2024 10:48
Previously, aws-load-balancer-controller ignored extra overlapping
ServicePorts defined in the Kubernetes Service spec if the external port
numbers were the same even if the protocols were different (e.g. TCP:53,
UDP:53).

This behavior prevented users from exposing services that support TCP
and UDP on the same external load balancer port number.

This patch solves the problem by detecting when a user defines multiple
ServicePorts for the same external load balancer port number but using
TCP and UDP protocols separately. In such situations, a TCP_UDP
TargetGroup and Listener are created and SecurityGroup rules are
updated accordingly. If more than two ServicePorts are defined, only the
first two mergeable ServicePorts are used. Otherwise, the first
ServicePort is used.

Note: rebasing errors would be my fault -- Kevin Lyda

Signed-off-by: Kevin Lyda <lyda@titanhq.com>
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants