Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnumbered BGP Peering design document #2508

Merged
merged 1 commit into from
Oct 16, 2024

Conversation

karampok
Copy link
Contributor

/kind design

What this PR does / why we need it:

Special notes for your reviewer:

In part of this PR there is

  1. A design document about unnumbered BGP peering, discussing implementation and API changes
  2. A draft version of how an e2e test can run, more for discussion rather than actually merging that file in that PR.

My plan would be that first we discuss how the raw frr config would look like, then the API in frr-k8s, then in metallb.

Thanks

Release note:

None

Copy link
Member

@oribon oribon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! great work, did a first pass

design/unnumbered-bgp.md Outdated Show resolved Hide resolved
design/unnumbered-bgp.md Outdated Show resolved Hide resolved
design/unnumbered-bgp.md Outdated Show resolved Hide resolved
design/unnumbered-bgp.md Outdated Show resolved Hide resolved
design/unnumbered-bgp.md Outdated Show resolved Hide resolved
design/unnumbered-bgp.md Outdated Show resolved Hide resolved
design/unnumbered-bgp.md Show resolved Hide resolved
e2etest/bgptests/unnumber.go Outdated Show resolved Hide resolved
Comment on lines 423 to 410
- Modify the testing infra to include point-to-point links which means
connect containers using VETH, and not through a bridge. We need NET-ADMIN cap (or sudo) when creating the VETH
- Create/Delete peer per specific test not to take resources from runners outside the runtime of the test
- This peer needs different configuration (not IPs, interface names), keep a raw config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc this will all be covered in the initial implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate what you mean?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asking if these points will be covered or there's still a question about them, while the last point (about primary interfaces) still needs to be thought of

connect containers using VETH, and not through a bridge. We need NET-ADMIN cap (or sudo) when creating the VETH
- Create/Delete peer per specific test not to take resources from runners outside the runtime of the test
- This peer needs different configuration (not IPs, interface names), keep a raw config
- There is no straightforward way to test unnumbered in the primary node interface. The testing setup (kind creation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's some value in adding another worker+container that are peered using unnumbered (no p2p, but there will be no conflicts as they're the only routers in the network using it) to get some feedback on peering with the primary node interface? wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to have a single worker as speaker, and have single external container frr connected on the same docker network bridge?

My concern is that locally the linux bridge does not forwarding RA when LLA address is the source (at least in my fedora laptop).

For sure we need to verifiy how that works on primary interface, and especially when the interface has also a non-LLA address configured.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that locally the linux bridge does not forwarding RA when LLA address is the source (at least in my fedora laptop).

oh, didn't you run into the "random" peering we talked about?

Do you mean to have a single worker as speaker, and have single external container frr connected on the same docker network bridge?

right (adding another worker / designating one), then the "randomness" will be okay - since there are only 2 peers on the same network using unnumbered they'll always peer with each other

@karampok karampok force-pushed the unnumbered-design branch 3 times, most recently from 411fb01 to ead4263 Compare September 10, 2024 09:18
@karampok karampok force-pushed the unnumbered-design branch 2 times, most recently from 54f2204 to d077dc3 Compare September 16, 2024 09:08
design/unnumbered-bgp.md Show resolved Hide resolved
exit
```

An example ToR configuration when based on Junos
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we need this :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo yes, maybe not our task, but unless I see that real config in real setup, I am not sure our use cases are valid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move it to the bottom in an addendum. If we can make this work with frr, it's likely to work with other routers as its part of the spec. Here it's distracting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am removing it as I was not able to find a real world example

design/unnumbered-bgp.md Outdated Show resolved Hide resolved
design/unnumbered-bgp.md Outdated Show resolved Hide resolved
design/unnumbered-bgp.md Outdated Show resolved Hide resolved
implementation will shadow all the existing errors and will fail in the
runtime.

#### Option 2. Add a new Interface field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the most valid one, and one that matches FRR the most in a sense (since it seems there's a distinction between interfaces and addresses) without locking us. Also, the docs don't need to be as complicated (stating all of the cases) as it's simply following the what frr is doing, that is we can just shortly say, if address is not specified the peering happens over the interface (unnumbered), if address is specified it is assumed the it is lla.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK then, will put that as but we need to decide if to further #2423
and have this

// 1. if address and no interface then as normal
// 2. if non valid address and no interface then fail
// 3. if address is LLA and interface not empty, then LLA support
// == extra FRR directive `neighbor fe80::a876:4dff:fe77:408 interface eth0`
// 4. If address is LLA and empty interface, then fail
// 5. If address is empty and interface has value then the BGP unnumber peering config will take place
// ==  modify FRR directive `neighbor PEER interface remote-as <ASN>`
// && replace any address directive to use interface e.g.  `neighbor net0 timers 180 540`
// && enable RAs on this interface
// 6. If address has value (non-LLA) and interface non empty, then fail
``` 
as the end API

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand - implementation wise the issue will be solved for free(?) and documentation wise it seems sufficient to document the Interface field something like "if specified without address then peering happens over the interface (unnumbered), if together with address then it is assumed to be LLA"?

// 6. If address has value (non-LLA) and interface non empty, then fail
```
That option might be not ideal (subjective opinion) because documentation looks complicated,
and user might have to do trial-and-error approach. Plus user needs to understand corner cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the user will need to understand the usecases as it does for frr, is there extra complexity beyond that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo yes :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I think it should be more detailed, because it is not really clear atm

design/unnumbered-bgp.md Outdated Show resolved Hide resolved
design/unnumbered-bgp.md Outdated Show resolved Hide resolved
Comment on lines 477 to 476
- Modify the testing infra to include point-to-point links which means connect
containers using VETH, and not through a bridge. We need NET-ADMIN cap (or sudo)
when creating the VETH.

- Create/Delete peer per that specific test not to take resources from runners
outside the runtime of the test. This peer needs different configuration (not
IPs and interface names), keep separate raw configuration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand, you're proposing a separate test for the unnumbered case or expanding the existing containers setup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose a separate isolated test for that use case.
(but no strong argument against expanding container setup).
Do you have a preference here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say if it's possible to have all the tests ran on unnumbered in a clean way it's preferable

@karampok karampok force-pushed the unnumbered-design branch 4 times, most recently from 8ec949e to 1cc83ee Compare September 18, 2024 08:28
Using BGP unnumbered peering, which dynamically discovers IPV6 neighbors,
reduces the burden of configuring all interfaces (on the network fabric
or on the cluster nodes) to have IPv4 addressing just for the BGP peering.
Unnumbered BGP utilizes IPv6 link local address to automatically decide who to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Unnumbered BGP utilizes IPv6 link local address to automatically decide who to
Unnumbered BGP utilizes IPv6 link local address to automatically decide which peer to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

reduces the burden of configuring all interfaces (on the network fabric
or on the cluster nodes) to have IPv4 addressing just for the BGP peering.
Unnumbered BGP utilizes IPv6 link local address to automatically decide who to
connect.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
connect.
connect to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

BGP Unnumbered allows routers to peer with each other when direct connected
without the need for IPs (config uses interface names). Once peering takes
places, exchange IPv4 or IPv6 prefixes can take place. That feature can be used
by FRR router instance configured by MetalLB to simplify specific setups.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid the last sentence, as we are talking about MetalLB in general and not about frr. We can talk about frr when we go in the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed FRR reference.

design/unnumbered-bgp.md Outdated Show resolved Hide resolved
## Limitation/Requirements

- To use unnumbered BGP feature, there must be a link (point-to-point, directly
connected) between the interfaces two BGP daemons are using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
connected) between the interfaces two BGP daemons are using
connected) between the interfaces two BGP speakers are using

Note that the other peer is normally a router

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other peer is a ToR which, which acts both as L2 switch, and L3 router. Do you see any confusion though here?

unnumber: true
```

### Testing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Testing
### Test plan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

There is no straightforward way to test unnumbered in the primary node
interface. The testing setup (kind creation) creates bridge and connect
containers. Ideally we need to setup a kind cluster, and replace the bridge
with a containers that runs FRR on each other side of the veth, and at the same
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with a containers that runs FRR on each other side of the veth, and at the same
with a container that runs FRR on each other side of the veth, and at the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

containers. Ideally we need to setup a kind cluster, and replace the bridge
with a containers that runs FRR on each other side of the veth, and at the same
time to provide switch/upstream. Until we find a way to automate that infra setup
we test that scenario manual.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
we test that scenario manual.
we will test that scenario manually.

Manually how? What we can do manually that we can't automate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed is not worth the effort to refactor how we set up the infra.

time to provide switch/upstream. Until we find a way to automate that infra setup
we test that scenario manual.

#### E2E test for [Unnumber on secondary interface](design/unnumbered-bgp.md#story-1-unnumbered-iebgp-on-secondary-node-interface)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### E2E test for [Unnumber on secondary interface](design/unnumbered-bgp.md#story-1-unnumbered-iebgp-on-secondary-node-interface)
#### E2E test for [Unnumbered on secondary interface](design/unnumbered-bgp.md#story-1-unnumbered-iebgp-on-secondary-node-interface)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

connect containers using VETH, and not through a bridge. We need NET-ADMIN cap
(or sudo) when creating the VETH.

##### Option 1 - have a separate test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd start with this, the majority of the scenarios are functional and should (!!!) be orthogonal to how we establish the session

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, i keep the text as is, one we have option 1 I would say we can merge, but I am in favor to then try to implement also option

Copy link
Contributor Author

@karampok karampok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the feedback

BGP Unnumbered allows routers to peer with each other when direct connected
without the need for IPs (config uses interface names). Once peering takes
places, exchange IPv4 or IPv6 prefixes can take place. That feature can be used
by FRR router instance configured by MetalLB to simplify specific setups.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed FRR reference.

Using BGP unnumbered peering, which dynamically discovers IPV6 neighbors,
reduces the burden of configuring all interfaces (on the network fabric
or on the cluster nodes) to have IPv4 addressing just for the BGP peering.
Unnumbered BGP utilizes IPv6 link local address to automatically decide who to
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

reduces the burden of configuring all interfaces (on the network fabric
or on the cluster nodes) to have IPv4 addressing just for the BGP peering.
Unnumbered BGP utilizes IPv6 link local address to automatically decide who to
connect.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

## Limitation/Requirements

- To use unnumbered BGP feature, there must be a link (point-to-point, directly
connected) between the interfaces two BGP daemons are using
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other peer is a ToR which, which acts both as L2 switch, and L3 router. Do you see any confusion though here?


- To use unnumbered BGP feature, there must be a link (point-to-point, directly
connected) between the interfaces two BGP daemons are using
- IPv6 must be enabled on that interfaces, and in specific LLA (link Local IPv6 address) enabled.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

connect containers using VETH, and not through a bridge. We need NET-ADMIN cap
(or sudo) when creating the VETH.

##### Option 1 - have a separate test
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, i keep the text as is, one we have option 1 I would say we can merge, but I am in favor to then try to implement also option

nexthop via inet6 fe80::dcad:beff:feff:1162 dev eth02 weight 1
```

### The API Extension
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


### The API Extension

We have three option:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three options how the API could look like, but that was removed.


#### BGP RouterID

//TODO is there a requirement for that?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expected if you could explain me how the RouterID is important or not. Example when we have single IPv6 stack. But I have that for discussion, removed


#### Loopback IPv4 address

Linux sends and receives IPv4 packets over an interface that does not have an IPv4 address.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO, explain better

@karampok karampok marked this pull request as ready for review October 2, 2024 10:18
@karampok karampok force-pushed the unnumbered-design branch 4 times, most recently from 074ad7f to d3c9b53 Compare October 3, 2024 09:33
@karampok
Copy link
Contributor Author

karampok commented Oct 3, 2024

@fedepaol @oribon I have updated the doc and I think I have addressed all your feedback. Please take a look, thanks

- Make it work over no point-to-point connection
- Define/Discuss dual ToR architecture, on top of bond interfaces or other HA strategies
- [Dynamic BGP Peering](https://ipwithease.com/dynamic-bgp-peering/) is a
feature different than unnumbered BGP and not supported by Metallb. Both
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove the "not supported by MetalLB" (it might be in the future, and unrelated to this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

feature different than unnumbered BGP and not supported by Metallb. Both
feature reduce the configuration needed for the BGP peering. The main difference is
that Dynamic BGP peering still needs IP addresses and does not require point-to-point links.
- Support `remote-as` to accept `<external|internal|auto>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also remove this (seems unrelated)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

We can introduce a new field `Interface string` which if defined then Unnumber
BGP takes place. In that case, the address must NOT have value. For
completeness, we could use field might be used to support `LLA%zone` [user case](https://github.com/metallb/metallb/issues/2423).
So at the end, the API documentation will look like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the other comment, I think this could be a lot shorter (no need to specify everything and its consequence)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I prefer to remove once @fedepaol take another feedback and he is ok with the consequence or not (we could decide no to support the LLA)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's focus on unnumbered and maybe implement LLA afterwards if we need / want to.
What's important here is that the API is compatible with that. If we want to mention it, let's add a subsection in the bottom

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, removed the reference to LLA

We can introduce a new field `Interface string` which if defined then Unnumber
BGP takes place. In that case, the address must NOT have value. For
completeness, we could use field might be used to support `LLA%zone` [user case](https://github.com/metallb/metallb/issues/2423).
So at the end, the API documentation will look like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's focus on unnumbered and maybe implement LLA afterwards if we need / want to.
What's important here is that the API is compatible with that. If we want to mention it, let's add a subsection in the bottom


### Option 2. Add a new Interface field

We can introduce a new field `Interface string` which if defined then Unnumber
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the proposed change, let's reword with A new interface field will be introduced

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


## The API Changes

### Option 2. Add a new Interface field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an option anymore, but the proposed change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: karampok <karampok@gmail.com>
@fedepaol
Copy link
Member

LGTM, thanks for this!

@fedepaol fedepaol merged commit abad99e into metallb:main Oct 16, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants