Leader stepping down with membership change #246

jabolina · 2024-01-21T15:11:57Z

Our election algorithm builds on top of JGroups' view changes. Removing the leader through a membership operation only changes the Raft cluster members; the JGroups view is unchanged, so no election is triggered.

A leader removing itself does not step down and can still replicate operations! After the leader is removed and the operation committed, the leader needs to step down, and an election round needs to be initiated.

I still need to investigate how to address this. The Raft dissertation §3.10 has a leader transfer extension, which might give us some hints. The solution should not affect the election mechanism by view changes.

jabolina · 2024-01-21T16:09:25Z

In the Raft dissertation, §4.2.2 describes removing the current leader. The suggestion is to utilize the leadership transfer extension. We carry on with the membership operation, and the leader steps down after it is committed. This order is necessary to make progress: stepping down first and then removing the node would be difficult, as it would require an election to happen first.
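To make the ordering concrete, here is a minimal sketch of "commit the removal first, then step down". It is a toy model, not jgroups-raft code: `SelfRemovalSketch`, `onRemovalCommitted`, and the `electionRequested` flag are hypothetical names, and the sketch assumes the local node starts as leader.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model for illustration; none of these names come from jgroups-raft.
final class SelfRemovalSketch {
    enum Role { LEADER, FOLLOWER }

    private final String localId;
    private final Set<String> members;
    private Role role = Role.LEADER;      // assumption: this node starts as the leader
    private boolean electionRequested;

    SelfRemovalSketch(String localId, Set<String> initialMembers) {
        this.localId = localId;
        this.members = new LinkedHashSet<>(initialMembers);
    }

    // Invoked only once the membership entry removing `removed` is committed.
    void onRemovalCommitted(String removed) {
        members.remove(removed);
        // The leader keeps its role while the entry is in flight and steps down
        // only after the commit, then asks for an election round.
        if (role == Role.LEADER && removed.equals(localId)) {
            role = Role.FOLLOWER;
            electionRequested = true;
        }
    }

    Role role() { return role; }
    boolean electionRequested() { return electionRequested; }
    Set<String> members() { return Set.copyOf(members); }
}
```

Removing any other node leaves the role untouched; only the committed removal of the local node triggers the step down.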

jabolina · 2024-01-21T18:35:43Z

Having gone through the leader transfer mechanism in §3.10, I do not think it is worth including. Quoting: "we have not currently implemented or evaluated this leadership transfer approach." However, we can utilize some of its ideas.

A node can only be a leader if it has an up-to-date log. Since the membership operation also goes through consensus, once it is committed we are sure there are nodes with up-to-date logs. We could start an election round on the current leader and utilize only the current member list to restrict the candidates.
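A rough sketch of that restriction, under the assumption that the node running the election round has each responder's last log term and index at hand. `RestrictedElectionSketch`, `Candidate`, and `pickLeader` are hypothetical names for illustration, not part of jgroups-raft:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: pick the next leader only among current Raft members.
final class RestrictedElectionSketch {
    static final class Candidate {
        final String id;
        final long lastTerm;
        final long lastIndex;

        Candidate(String id, long lastTerm, long lastIndex) {
            this.id = id;
            this.lastTerm = lastTerm;
            this.lastIndex = lastIndex;
        }
    }

    // Most up-to-date log wins: higher last term first, then higher last index.
    private static final Comparator<Candidate> UP_TO_DATE =
        Comparator.<Candidate>comparingLong(c -> c.lastTerm)
                  .thenComparingLong(c -> c.lastIndex);

    static Optional<String> pickLeader(List<Candidate> responses, Set<String> raftMembers) {
        return responses.stream()
            .filter(c -> raftMembers.contains(c.id))   // drop nodes outside the member list
            .max(UP_TO_DATE)
            .map(c -> c.id);
    }
}
```

The removed leader may still be in the view and may even hold the longest log, but the filter keeps it out of the candidate set.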

The tricky part is handling requests that are still pending while the membership operation is in progress. The simplest solution would be to complete all of them exceptionally. The more complex solution would be for the current leader to enqueue requests and redirect them after a new leader is elected.

The latter would require changes mostly to the REDIRECT protocol. We could return a specific error code to retry.
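For comparison, the simplest option (completing everything exceptionally) could look roughly like the sketch below. It only uses `CompletableFuture` from the JDK; `PendingRequestsSketch`, `register`, `failAll`, and `LeaderSteppedDownException` are hypothetical names, and the exception type stands in for whatever retryable error would be returned to callers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: fail in-flight requests when the leader steps down.
final class PendingRequestsSketch {
    // Assumed marker exception that tells the caller the request is safe to retry.
    static final class LeaderSteppedDownException extends RuntimeException {
        LeaderSteppedDownException() {
            super("leader stepped down, retry against the new leader");
        }
    }

    private final List<CompletableFuture<byte[]>> pending = new ArrayList<>();

    // Track a request that is waiting to be committed.
    synchronized CompletableFuture<byte[]> register() {
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        pending.add(future);
        return future;
    }

    // Complete every still-open request exceptionally; returns how many were failed.
    synchronized int failAll() {
        int failed = 0;
        for (CompletableFuture<byte[]> future : pending) {
            // completeExceptionally returns false if the future was already done.
            if (future.completeExceptionally(new LeaderSteppedDownException())) {
                failed++;
            }
        }
        pending.clear();
        return failed;
    }
}
```

Callers would see the failure through the future and could retry once a new leader is known; requests that already completed are left untouched.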

jabolina added this to the 1.0.13 milestone Jan 21, 2024

jabolina linked a pull request Jan 28, 2024 that will close this issue

Leader step down when removing itself #248

Open

jabolina modified the milestones: 1.0.13, 1.0.14 Aug 11, 2024
