-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zebra: evpn: not coerce VTEP IP to IPv4 in nh_list #16341
Conversation
d5dfcb8
to
e316c87
Compare
e316c87
to
08fc718
Compare
I don't understand what does this failure mean
It is just on one architecture? Is it related to the change?... |
Ignore it, I rerun failed tests only. |
But still, could you fix frrbot check (styling)? |
08fc718
to
f5be337
Compare
Done. I did not do it intially because it asked to change some adjacent code that my change did not touch. Did now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but IMO this should also be covered by a topotest, what do you think?
I agree that it would be useful, but as I am not acquainted with the testing framework that this project is using, I am not sure that I will be able to do it in reasonable time... |
Is this the same case encountered in #14808? |
Looks like, and there is a topotest already. Anyway, it would be great if @chiragshah6 could help here reviewing. |
Looks like so! And such a shame that I did not find it and had to redo all the work myself. But even bigger shame that your PR was not merged back then. Would have saved us quite some trouble that we had in production... edit: @leonshaw , do you think that my change looks a bit cleaner? But, I am no judge, you guys decide which to take! The main thing is to have the bug fixed. |
I think you are correct. It's cleaner to have both IPv4/IPv6 nexthops explicitly. BTW, you can cherry-pick the topotest part if you like. |
Note that withdrawing IPv6 route should not affect IPv4. Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
In L3 BGP-EVPN, if there are both IPv4 and IPv6 routes in the VPN, zebra maintains two instances of `struct zebra_neigh` object: one with IPv4 address of the nexthop, and another with IPv6 address that is an IPv4 mapped to IPv6, but only one intance of `struct zebra_mac` object, that contains a list of nexthop addresses that use this mac. The code in `zebra_vxlan` module uses the fact that the list is empty as the indication that the `zebra_mac` object is unused, and needs to be dropped. However, preexisting code used nexthop address converted to IPv4 notation for the element of this list. As a result, when two `zebra_neigh` objects, one IPv4 and one IPv6-mapped-IPv4 were linked to the `zebra_mac` object, only one element was added to the list. Consequently, when one of the two `zebra_neigh` objects was dropped, the only element in the list was removed, making it empty, and `zebra_mac` object was dropped, and neigbrour cache elements uninstalled from the kernel. As a result, after the last route in _one_ family was removed from a remote vtep, all remaining routes in the _other_ family became unreachable, because RMAC of the vtep was removed. This commit makes `zebra_mac` use uncoerced IP address of the `zebra_neigh` object for the entries in the `nh_list`. This way, `zebra_mac` object no longer loses track of `zebra_neigh` objects that need it. Bug-URL: FRRouting#16340 Signed-off-by: Eugene Crosser <crosser@average.org>
f5be337
to
f883c71
Compare
Thanks! Cherry-picked, and put in front of the fix commit (to make it easier to reproduce the problem with with old code in place). (I could not make topotests run locally though, they require docker, I installed it, but it gives me some permission errors... Hopefully CI will be able to run them.) edit: I managed to make docker work on the worksation, but the test inside busyloops |
I miss the proposed backport of this PR in |
In L3 BGP-EVPN, if there are both IPv4 and IPv6 routes in the VPN, zebra maintains two instances of
struct zebra_neigh
object: one with IPv4 address of the nexthop, and another with IPv6 address that is an IPv4 mapped to IPv6, but only one intance ofstruct zebra_mac
object, that contains a list of nexthop addresses that use this mac.The code in
zebra_vxlan
module uses the fact that the list is empty as the indication that thezebra_mac
object is unused, and needs to be dropped. However, preexisting code used nexthop address converted to IPv4 notation for the element of this list. As a result, when twozebra_neigh
objects, one IPv4 and one IPv6-mapped-IPv4 were linked to thezebra_mac
object, only one element was added to the list. Consequently, when one of the twozebra_neigh
objects was dropped, the only element in the list was removed, making it empty, andzebra_mac
object was dropped, and neigbrour cache elements uninstalled from the kernel.As a result, after the last route in one family was removed from a remote vtep, all remaining routes in the other family became unreachable, because RMAC of the vtep was removed.
This commit makes
zebra_mac
use uncoerced IP address of thezebra_neigh
object for the entries in thenh_list
. This way,zebra_mac
object no longer loses track ofzebra_neigh
objects that need it.Bug-ID: 16340