✨ Add leader election for kcp (core) controllers #2996

embik · 2023-07-18T14:35:29Z

Summary

This PR adds leader elections for controllers started by kcp (core). As the leader election should be done by shard, the code uses the system:admin shard-local workspace. Namespace and Lease name within that namespace can be optionally configured via additional flags, but it defaults to kube-system/kcp-controllers.

This PR introduces some changes to how controllers are registered, mostly so they can start in a single post-start-hook and be wrapped by a leader election instead of all of them starting in their own hook.

My idea for leader election was: Since the kcp process is both serving the api and running the controllers, there is no need for it to crash when it loses a leader election. This is not kube-controller-manager. Instead leader election runs in a continuous loop and even if the leader lock is lost, it is possible that the same kcp process wins leader election at a later point and restarts the controllers that were previously shut down after losing the leader election.

While writing this code, I learned a lot about the kcp server code and ran into the following things that will not be addressed by this PR. It has been sitting in draft for several months now, and I believe that it cannot be all fixed at once:

The phase 0 + phase 1 bootstrapping runs outside of leader election, which means it will run unconditionally when starting a second kcp process. I've been thinking about this a lot (see feature: add leader election for controllers #2216 (comment)) and I believe that the bootstrapping has to be reworked (maybe into a controller) before it can be wrapped into a leader election.
Controllers started by the cache server are also not governed by leader election. I've been looking into the cache server, but the controllers seem to be started on the "generic" code part in upstream libraries, and I'm really confused why they don't seem to integrate leader elections. I want to explore this further, but in the mean time the cache server would need to be run as standalone if you really want to make sure no controllers run in parallel. This might be a requirement anyway because I'm not sure how deeply embedded the cache server is (do the two kcp instances have separate cache servers? seems possible but I don't know).
Running two kcp instances create unhappiness about the Lease object that some kube-apiserver code creates in system:admin. Both instances will continuously log the object has been modified for that. I haven't seen this affecting anything, but who knows.

Still, two kcp instances pointing to the same etcd are functional with this PR. I tested e.g. new workspaces being initialised properly, then terminating one of the instances, and after leader election times out the other instance assumes leadership and starts controllers, also initialising workspaces created afterwards.

Related issue(s)

Fixes #2216

Release Notes

Add leader elections to kcp (core) controllers that can enabled by passing `--enable-leader-election`

kcp-ci-bot · 2023-07-18T14:35:35Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

MadhavJivrajani · 2023-07-25T08:51:23Z

/cc

MadhavJivrajani

Hey @embik 👋🏼
Finally getting around to reviewing this, apologies for the delay.

From an initial pass, I just have a couple of questions around leader state transitions and what happens when those occur.

For the sake of completeness, here is the document that we were working on to compare how controllers were "managed" by kcm and kcp: https://hackmd.io/bZhJ7dtoT4qfqf8q9u9hng?view

The way kcm handles OnStoppedLeading answers my questions, but I'm not sure if we can do that since afaict, the kcp server is run in the foreground as a standalone binary, so we probably can't exit and rely on a daemon to bring it back.

MadhavJivrajani · 2023-07-31T12:07:04Z

pkg/server/server.go

+			OnStartedLeading: func(ctx context.Context) {
+				logger.Info("started leading")
+
+				for controllerName, controller := range s.controllers {
+					logger := logger.WithValues("controller", controllerName)
+					go s.runController(ctx, controller, logger)
+				}
+			},
+			OnStoppedLeading: func() {
+				logger.Info("stopped leading")
+			},


I'm trying to work through the scenario which looks like:

starts leading -> stops leading -> starts leading

If this were to happen, we end up re-spawning all registered controllers.

My understanding - and it might be flawed - is that all controllers should terminate because the context passed to them is canceled upon losing a leader election. So the idea for me was that re-spawning all controllers is exactly what we want when winning a leader election.

However, I haven't gotten around to actually test this yet. So I could be wrong here.

MadhavJivrajani · 2023-07-31T12:13:42Z

pkg/server/server.go

+		Name:     "kcp-controllers",
+	})
+
+	logger.Info("leader election has stopped")


RunOrDie exits either when some error occurs (through a panic), the ctx is cancelled or if the lease couldn't be renewed. If lease couldn't be renewed, we exit this function entirely and don't re-enter it again.

This is a good point - any thoughts on what should happen in that case? We could let the kcp process crash so it gets restarted or put this in a loop, but I'm anxious to just let it keep crashing again and again if there is a bigger problem.

I have chosen to implement a continuous loop around the leader election, so that if a kcp instance loses the leader role it will restart RunOrDie (until the overarching context is canceled).

pkg/server/server.go

embik · 2023-09-25T08:07:41Z

/retest

I suspect it's a flake this time.

embik · 2023-09-25T10:04:20Z

/cc @xrstf

xrstf · 2023-09-26T06:40:40Z

/lgtm

kcp-ci-bot · 2023-09-26T06:40:45Z

LGTM label has been added.

Git tree hash: c22f0988f4562225a962b53a567a85aa04bc5f27

embik · 2023-09-26T06:43:07Z

/cc @sttts

pkg/server/controllers.go

pkg/server/server.go

Signed-off-by: Marvin Beckers <marvin@kubermatic.com>

embik · 2023-09-26T12:33:55Z

/retest

embik · 2023-09-26T12:48:48Z

/retest

sttts · 2023-09-26T13:50:56Z

/lgtm
/approve

kcp-ci-bot · 2023-09-26T13:51:01Z

LGTM label has been added.

Git tree hash: 6630e01b14399c334fdbf16fac5d3fd24af7674f

kcp-ci-bot · 2023-09-26T13:51:04Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [sttts]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kcp-ci-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has signed the DCO. labels Jul 18, 2023

kcp-ci-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 18, 2023

embik changed the title ~~✨ Add leader elections for controllers~~ ✨ Add leader election for controllers Jul 18, 2023

kcp-ci-bot requested a review from MadhavJivrajani July 25, 2023 08:51

MadhavJivrajani reviewed Jul 31, 2023

View reviewed changes

embik mentioned this pull request Sep 18, 2023

feature: add leader election for controllers #2216

Closed

1 task

embik force-pushed the leader-election branch 4 times, most recently from e76617e to 19028c8 Compare September 22, 2023 12:54

embik changed the title ~~✨ Add leader election for controllers~~ ✨ Add leader election for kcp (core) controllers Sep 22, 2023

embik marked this pull request as ready for review September 22, 2023 13:12

kcp-ci-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 22, 2023

embik marked this pull request as draft September 22, 2023 13:15

kcp-ci-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 22, 2023

embik marked this pull request as ready for review September 22, 2023 13:30

kcp-ci-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 22, 2023

embik force-pushed the leader-election branch 4 times, most recently from f5bcadc to 30586d1 Compare September 25, 2023 07:39

kcp-ci-bot requested a review from xrstf September 25, 2023 10:04

kcp-ci-bot assigned xrstf Sep 26, 2023

kcp-ci-bot added the lgtm Indicates that a PR is ready to be merged. label Sep 26, 2023

kcp-ci-bot requested a review from sttts September 26, 2023 06:43