[LI-HOTFIX] Fix race condition that blocks alterIsr request queue (#451)
TICKET = LIKAFKA-52126 LIKAFKA-52185
EXIT_CRITERIA = Until rebasing to a release that includes apache#13159

The patch is included as a part of apache#13159.

There's a race condition that can cause an unchecked exception to be thrown
```
2023/04/11 00:45:31.342 ERROR [NetworkClient] [BrokerToControllerChannelManager broker=5678 name=alterIsr] [kafka-server] [] [BrokerToControllerChannelManager broker=5678 name=alterIsr] Uncaught error in request completion:
java.lang.IllegalStateException: No entry found for connection 5674
```
in this loop
```scala
activeControllerAddress().foreach { controllerAddress => {
  networkClient.disconnect(controllerAddress.idString)
}}
```
When this happens, the following line is not executed,
```scala
requestQueue.putFirst(queueItem)
```
and the `queueItem` is lost forever. As a result, the corresponding [`clearInFlightRequest()`](https://github.com/linkedin/kafka/blob/bb63ee6a4d375d7dd2cef7109acf21562640e17a/core/src/main/scala/kafka/server/BrokerToControllerRequestManager.scala#L135) is never executed, and the **whole alterIsrRequest queue is blocked forever**.

This undermines the foundation of Kafka's monitoring and operations, because almost everything relies on ISR expansion, and failing to expand the ISR prevents us from doing any kind of surgery.
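To illustrate the failure mode described above, here is a minimal sketch of one way to guarantee that the re-enqueue step runs even when `disconnect` throws. It is not taken from the patch or from apache#13159, and it does not use Kafka's real classes: `QueueItem`, `Client`, `RequeueSketch`, and `disconnectAndRequeue` are hypothetical stand-ins, and catching and logging the exception (rather than rethrowing it) is an assumption made for the example.

```scala
import java.util.concurrent.LinkedBlockingDeque
import scala.util.control.NonFatal

// Hypothetical stand-in for the queued request item (not Kafka's real type).
final case class QueueItem(id: Int)

// Hypothetical stand-in for the part of the network client used here.
trait Client {
  def disconnect(nodeId: String): Unit
}

object RequeueSketch {
  // Disconnects from the controller (if one is known) and always puts the
  // item back at the head of the queue, even if disconnect throws.
  def disconnectAndRequeue(
      client: Client,
      controllerId: Option[String],
      queue: LinkedBlockingDeque[QueueItem],
      item: QueueItem): Unit = {
    try {
      controllerId.foreach(id => client.disconnect(id))
    } catch {
      case NonFatal(e) =>
        // Assumption: the failure is only logged; a real fix may handle it differently.
        System.err.println(s"disconnect failed: ${e.getMessage}")
    } finally {
      // Runs on both the normal and the exceptional path, so the item is never lost.
      queue.putFirst(item)
    }
  }
}
```

With a client whose `disconnect` throws `IllegalStateException`, the item still ends up at the head of the queue, so whatever later clears the in-flight request can still run.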