Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LI-HOTFIX] Avoid assigning replicas to the preferred controllers or maintenance brokers #111

Open
wants to merge 6 commits into
base: 2.4-li
Choose a base branch
from

Conversation

gitlw
Copy link

@gitlw gitlw commented Jan 7, 2021

If we expand partitions using the AdminZkClient, currently there is no logic to avoid assigning replicas to preferred controllers.
This PR fixes the issue in the following two places:

  1. It avoids assignment of replicas to preferred controllers and maintenance brokers within the AdminZkClient
  2. In case someone is using an non-patched client and still assigns replicas to preferred controllers, the KafkaController will call rearrangePartitionReplicaAssignmentForNewPartitions to remove the preferred controllers from the assignment.

Testing Stretegy:
An unit test is added to ensure the problem doesn't happen again.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@gitlw gitlw requested a review from xiowu0 January 7, 2021 17:54
val topicsToBeRearranged = zkClient.getPartitionAssignmentForTopics(topicsToCheck.toSet).filter {
case (topic, partitionMap) =>
val existingAssignment = controllerContext.partitionAssignments.getOrElse(topic, mutable.Map.empty)
val newPartitions = partitionMap.filter{case (partitionId, _) => partitionId >= existingAssignment.size}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add some safe check here to ensure the partition state znode doesn't exist for these new partitions. Although I don't see an issue in the current implementation, if this function is used incorrectly, it would be very dangerous. For example, if we use "rearrangePartitionReplicaAssignmentForNewPartitions(topics, false)" when initializing controller context, this may cause all topics to get reassigned since controllerContext.partitionAssignments is an empty set (this will also result in orphan partitions).
Again, I don't see the issue in this implementation, but I think it is worth checking.

In addition, this doesn't address the case when there is controller move right after partition expansion (before the rearrange actually happened). I think it is ok since we are not trying to handle 100% no replicas in the preferred controller. Maybe we can add some comments for this unhandled situations.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments. It's true that they are all newPartitions given controllerContext.partitionAssignments is an empty set, but they shouldn't have a replica on the noNewPartitionBrokers, thus they shouldn't be rearranged.

It's a good point that the current implementation cannot handle controller switches. Given it's safe to scan all topics' partitions on controller switch, I think it should be done during a controller switch. Thoughts?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

manual assignment for existing topics can still assign partitions to noNewPartitionBrokers. I think we cannot guarantee that noNewPartitionBrokers won't get replicas. In addition, there are some small-time window that new replicas can still get assigned to preferred controllers due to the fact that preferred controller znode is emphermal znode (see the design doc for more detail).

I think a safer way is to rely on the existence of partition state znode when performing rearrangement.

scan all topics' partitions on controller switch => If we cannot guarantee 100% no replica in the preferred controllers, I think it is ok to not performing special handing during controller switch given the overhead/additional hacky code needed (up to you to make a decision).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon closer look, I find that there is already logic to handle the controller switch over by calling the rearrangePartitionReplicaAssignmentForNewPartitions method inside initializeControllerContext.

Regarding the preferred controller znodes being ephemeral, it's kinda an orthogonal design issue that we could address independently.

@gitlw gitlw force-pushed the no_replica_on_preferred_controller branch from 0126dbe to 38e9116 Compare May 26, 2021 23:14
@@ -1755,6 +1758,7 @@ class KafkaController(val config: KafkaConfig,
}

if (!isActive) return
rearrangePartitionReplicaAssignmentForNewPartitions(immutable.Set(topic))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it conflict with partition reassignment?
say if a partition gets reassigned to replica ( 1, 2, 5, 6 ) ==> if one of this replica is maintenance brokers, would this reassignment complete if we automatically changing zk node to disallow placing replicas on maintenance brokers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I've updated the PR to avoid assignment replicas to the undesirable hosts during partition reassignment. Please take another look. Thanks!

@gitlw gitlw changed the title [LI-HOTFIX] Avoid assigning replicas to the preferred controllers [LI-HOTFIX] Avoid assigning replicas to the preferred controllers or maintenance brokers May 28, 2021
reassignments.foreach { case (tp, targetReplicas) =>
if (replicasAreValid(tp, targetReplicas)) {
if (replicasAreValid(tp, targetReplicas, noNewPartitionBrokers)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still have some race condition here, because a broker can be marked as maintenance brokers after partition reassignment request is received before the reassignment request is completed

@efeg
Copy link

efeg commented May 29, 2021

Quick Update on Review: Had an offline chat with @gitlw regarding whether we may want to address the second point (i.e. 2. In case someone is using an non-patched client and still assigns replicas to preferred controllers, the KafkaController will call rearrangePartitionReplicaAssignmentForNewPartitions to remove the preferred controllers from the assignment.) in another way. We will chat more about the pros and cons (cc @xiowu0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants