Environmental Info:
K3s Version:
v1.30.6+k3s1, but earlier releases are affected as well, probably going back to May/June when the load-balancer code was last worked on.
Node(s) CPU architecture, OS, and Version:
n/a
Cluster Configuration:
Minimally, 2 etcd-only nodes and 1 control-plane-only node.
Any split-role cluster with more than 1 etcd-only node is affected.
Do NOT use a fixed registration endpoint; instead, register all nodes against the first etcd-only node.
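For illustration, a minimal bring-up sketch of this topology (hostnames and the token are placeholders, not taken from the report):

# first etcd-only node (this becomes the default/fallback server for the other nodes)
k3s server --cluster-init --disable-apiserver --disable-controller-manager --disable-scheduler

# second etcd-only node, registered against the first etcd-only node rather than a fixed registration endpoint
k3s server --server https://<first-etcd-node>:6443 --token <token> --disable-apiserver --disable-controller-manager --disable-scheduler

# control-plane-only node, also registered against the first etcd-only node
k3s server --server https://<first-etcd-node>:6443 --token <token> --disable-etcd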
Describe the bug:
When all control-plane nodes are unavailable, the apiserver load-balancer on the secondary etcd-only nodes falls back to the default server, which is the initial etcd-only node. When the control-plane nodes come back, the load-balancer does not close the existing connections to that etcd-only node, so the kubelet and internal controllers remain connected to a server with no apiserver and continuously log "apiserver disabled" errors:
Nov 13 01:56:40 systemd-node-2 k3s[874]: E1113 01:56:40.186329 874 kubelet_node_status.go:544] "Error updating node status, will retry" err="error getting node \"systemd-node-2\": apiserver disabled"
Nov 13 01:56:40 systemd-node-2 k3s[874]: W1113 01:56:40.218182 874 reflector.go:547] k8s.io/client-go@v1.30.6-k3s1/tools/cache/reflector.go:232: failed to list *v1.Node: apiserver disabled
Nov 13 01:56:40 systemd-node-2 k3s[874]: E1113 01:56:40.218211 874 reflector.go:150] k8s.io/client-go@v1.30.6-k3s1/tools/cache/reflector.go:232: Failed to watch *v1.Node: failed to list *v1.Node: apiserver disabled
Nov 13 01:56:40 systemd-node-2 k3s[874]: E1113 01:56:40.742852 874 leaderelection.go:347] error retrieving resource lock kube-system/k3s-cloud-controller-manager: apiserver disabled
Nov 13 01:56:41 systemd-node-2 k3s[874]: E1113 01:56:41.411435 874 webhook.go:154] Failed to make webhook authenticator request: apiserver disabled
Nov 13 01:56:41 systemd-node-2 k3s[874]: E1113 01:56:41.411485 874 server.go:304] "Unable to authenticate the request due to an error" err="apiserver disabled"
Nov 13 01:56:43 systemd-node-2 k3s[874]: E1113 01:56:43.621369 874 controller.go:145] "Failed to ensure lease exists, will retry" err="apiserver disabled" interval="7s"
Steps To Reproduce:
Start an etcd-only node
Start another etcd-only node, joined to the first etcd-only node
Start a control-plane-only node, joined to the first etcd-only node
Once the cluster is up, restart k3s on the control-plane-only node
Note that after a short period of time, the second etcd-only node goes NotReady
Restart k3s on either of the etcd-only nodes. Note that all nodes become Ready again (see the command sketch after these steps)
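Roughly, on a systemd-based install the restart/observation steps above could be run as follows (kubeconfig location is the k3s default; adjust as needed):

# on the control-plane-only node: restart k3s so the apiserver briefly goes away
systemctl restart k3s

# from the control-plane-only node (kubeconfig at /etc/rancher/k3s/k3s.yaml): watch the second etcd-only node go NotReady
kubectl get nodes --watch

# on either etcd-only node: a restart rebuilds the load-balancer state and all nodes return to Ready
systemctl restart k3s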
Expected behavior:
All nodes reconnect to apiserver after outage
Actual behavior:
Secondary etcd-only nodes fail over to the default server (an etcd-only node with no apiserver) and get stuck there.
Additional context / logs:
cc @ShylajaDevadiga