Leader stepping down with membership change #246

Open
jabolina opened this issue Jan 21, 2024 · 2 comments · May be fixed by #248

@jabolina
Member

Our election algorithms build on top of JGroups' view changes. Removing the leader through a membership operation changes only the Raft cluster members; the JGroups view is unchanged.

A leader that removes itself does not step down and can still replicate operations! After the leader is removed and the operation is committed, the leader needs to step down, and an election round needs to be initiated.

I still need to investigate how to address this. The Raft dissertation §3.10 has a leader transfer extension, which might give us some hints. The solution should not affect the election mechanism by view changes.

@jabolina jabolina added this to the 1.0.13 milestone Jan 21, 2024
@jabolina
Member Author

In the Raft dissertation, §4.2.2 describes removing the current leader. The suggestion is to utilize the leadership transfer extension. We carry on with the membership operation, and the leader steps down after it is committed. This ordering is necessary to make progress: stepping down first and then removing the node could be difficult, as it would require an election to happen first.
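A minimal sketch of that ordering, assuming a simplified in-memory model. The class `LeaderRemovalSketch` and its methods are hypothetical names for illustration and are not part of the jgroups-raft API. The point is only that the step-down happens in the commit callback, not when the removal is submitted.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model for illustration; none of these names come from jgroups-raft.
final class LeaderRemovalSketch {
    enum Role { LEADER, FOLLOWER }

    private final String self;
    private final Set<String> members;
    private Role role = Role.LEADER;
    private boolean electionRequested;

    LeaderRemovalSketch(String self, Set<String> members) {
        this.self = self;
        this.members = new LinkedHashSet<>(members);
    }

    // Assumed to be called only after the removal entry has been committed.
    void onRemovalCommitted(String removed) {
        members.remove(removed);
        // The leader keeps its role until its own removal is committed,
        // then steps down and asks for a new election round.
        if (role == Role.LEADER && removed.equals(self)) {
            role = Role.FOLLOWER;
            electionRequested = true;
        }
    }

    Role role() { return role; }
    boolean electionRequested() { return electionRequested; }
    Set<String> members() { return members; }
}
```

Removing any other node leaves the role untouched, so the regular view-based election path would not be affected by this check.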

@jabolina
Member Author

Going through the leader transfer mechanism in §3.10, I do not think it is worth including as is. Quoting: "we have not currently implemented or evaluated this leadership transfer approach." However, we can utilize some of the ideas.

A node can only be a leader if it has an up-to-date log. Since the membership operation also goes through consensus, once it is committed we are sure there are nodes with up-to-date logs. We could start an election round from the current leader and restrict it to the current member list.
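A rough sketch of such a restricted election, assuming the departing leader has collected each node's last log term and index. `ElectionSketch`, `LogPosition`, and `pickLeader` are hypothetical names, and choosing the node with the most up-to-date log is an assumption made for the example, not a description of the existing election code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical names for illustration; not the jgroups-raft election implementation.
final class ElectionSketch {
    // Last log term and index reported by a node; higher term wins, then higher index.
    static final class LogPosition implements Comparable<LogPosition> {
        final long lastTerm;
        final long lastIndex;

        LogPosition(long lastTerm, long lastIndex) {
            this.lastTerm = lastTerm;
            this.lastIndex = lastIndex;
        }

        @Override
        public int compareTo(LogPosition other) {
            int byTerm = Long.compare(lastTerm, other.lastTerm);
            return byTerm != 0 ? byTerm : Long.compare(lastIndex, other.lastIndex);
        }
    }

    // Considers only nodes in the current Raft member list, so a removed
    // leader that is still present in the view can never be picked.
    static Optional<String> pickLeader(Map<String, LogPosition> responses, Set<String> raftMembers) {
        String best = null;
        // Iterate in sorted order so ties resolve deterministically (first name wins).
        for (String node : new TreeSet<>(responses.keySet())) {
            if (!raftMembers.contains(node)) continue;
            if (best == null || responses.get(node).compareTo(responses.get(best)) > 0) {
                best = node;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With the member list as the filter, the view itself does not need to change for the removed leader to be excluded.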

The tricky part is handling requests that are still pending while the membership operation is in progress. The simplest solution would be to complete everything exceptionally. The complex solution would be to make the current leader enqueue requests and redirect everything after a new leader is elected.

The latter would require changes mostly to the REDIRECT protocol. We could return a specific error code that tells the caller to retry.
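The simplest option could look roughly like the sketch below, which fails every pending request with an exception the caller can treat as a retry signal. `PendingRequestsSketch`, `LeaderChangedException`, and `failAll` are hypothetical names for the example; only `CompletableFuture` is a real JDK class, and nothing here reflects how REDIRECT tracks requests today.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical names for illustration; not the jgroups-raft request table.
final class PendingRequestsSketch {
    // Stands in for the "specific error code": callers may retry on this type.
    static final class LeaderChangedException extends RuntimeException {
        LeaderChangedException(String message) {
            super(message);
        }
    }

    private final Queue<CompletableFuture<byte[]>> pending = new ArrayDeque<>();

    // Registers a request that has been accepted but not yet committed.
    CompletableFuture<byte[]> submit() {
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        pending.add(future);
        return future;
    }

    // Completes every pending request exceptionally; returns how many were failed.
    int failAll(String reason) {
        int failed = 0;
        CompletableFuture<byte[]> future;
        while ((future = pending.poll()) != null) {
            if (future.completeExceptionally(new LeaderChangedException(reason))) {
                failed++;
            }
        }
        return failed;
    }
}
```

The enqueue-and-redirect option would keep the same queue but hand its contents to the new leader instead of failing them, which is where the protocol changes would come in.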

@jabolina jabolina linked a pull request Jan 28, 2024 that will close this issue
@jabolina jabolina modified the milestones: 1.0.13, 1.0.14 Aug 11, 2024