Problem
Currently, there is only one offset committer thread that acknowledges successful consumption back to Kafka. As per the Beast architecture, the Consumer, BQ Workers, and Acknowledger threads work independently and are connected by blocking queues.
The push operation that puts a batch of Kafka messages onto a blocking queue is not indefinitely blocking; instead, a timeout governs how long the producer waits for a free slot on the queue.
Since we can spawn any number of BQ Workers, the Commit Queue processed by the Acknowledger fills up, and even with sufficiently high timeouts it stays full because of the high load of messages on the Acknowledger.
We require a mechanism to increase the processing capacity of the Acknowledger thread so that it does not become the bottleneck for the application.
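To make the timed push concrete, here is a minimal sketch of what such a bounded handoff might look like; the class and field names (commitQueue, pushTimeoutMs) are illustrative and not the actual Beast identifiers:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of a producer-side timed push onto the commit queue.
class TimedPushSketch {
    private final BlockingQueue<String> commitQueue = new ArrayBlockingQueue<>(100);
    private final long pushTimeoutMs = 5_000;

    void pushBatch(String batch) throws InterruptedException {
        // offer() waits up to pushTimeoutMs for a free slot instead of blocking forever.
        boolean pushed = commitQueue.offer(batch, pushTimeoutMs, TimeUnit.MILLISECONDS);
        if (!pushed) {
            // The queue stayed full for the whole timeout: with a single, overloaded
            // Acknowledger draining it, this failure surfaces and the process exits.
            throw new IllegalStateException("commit queue still full after " + pushTimeoutMs + " ms");
        }
    }
}
```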
Approaches
Wait indefinitely for adding batch to commit queue
Currently, we wait only for a limited time to get a slot in the commit queue; if the queue is still full after that, the process exits.
One idea is to wait indefinitely to push data to the commit queue. With this change, the process would not restart even when the queue is full.
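A minimal sketch of the indefinite-wait variant, again with illustrative names rather than Beast's actual classes:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: replace the timed offer() with put(), which blocks until a slot frees up.
class BlockingPushSketch {
    private final BlockingQueue<String> commitQueue = new LinkedBlockingQueue<>(100);

    void pushBatch(String batch) throws InterruptedException {
        // put() never times out, so a full queue no longer crashes the process;
        // the calling thread simply stalls for as long as the Acknowledger lags.
        commitQueue.put(batch);
    }
}
```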
Disadvantages:
We push data onto the queue synchronously, so if a push to the commit queue takes a long time we are bottlenecked on it and are effectively using only one thread to push data to BigQuery. This causes significant performance degradation and diverges from Beast's philosophy of scaling.
Batch commits
Currently, we send one acknowledgement per batch.
The idea is to accumulate acknowledgements over a certain period of time and then send a single combined acknowledgement.
With this batch-commit approach, we need to make sure that there is no data loss.
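As a rough illustration of time-windowed commits, here is a sketch using the standard Kafka consumer API; the Acknowledger wrapper, the commit interval, and the assumption that only offsets of batches already written to BigQuery are recorded are all hypothetical choices, not Beast's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Sketch: accumulate acknowledgements and commit them to Kafka once per window.
class BatchCommitSketch {
    private final Map<TopicPartition, OffsetAndMetadata> pending = new HashMap<>();
    private final long commitIntervalMs = 10_000;
    private long lastCommitAt = System.currentTimeMillis();

    // Called once per batch that has been fully written to BigQuery. Committing an
    // offset implicitly acknowledges everything below it, so this must only advance
    // past offsets whose preceding batches have also been written.
    synchronized void acknowledge(TopicPartition partition, long lastOffsetOfBatch) {
        pending.merge(partition, new OffsetAndMetadata(lastOffsetOfBatch + 1),
                (oldOffset, newOffset) -> newOffset.offset() > oldOffset.offset() ? newOffset : oldOffset);
    }

    // Called periodically from the thread that owns the Kafka consumer.
    synchronized void maybeCommit(KafkaConsumer<?, ?> consumer) {
        boolean windowElapsed = System.currentTimeMillis() - lastCommitAt >= commitIntervalMs;
        if (!windowElapsed || pending.isEmpty()) {
            return;
        }
        consumer.commitSync(pending);   // one commit covers the whole window of batches
        pending.clear();
        lastCommitAt = System.currentTimeMillis();
    }
}
```

One commit per window keeps the Acknowledger's Kafka traffic roughly constant no matter how many BQ Workers are producing acknowledgements; the trade-off is that a crash between commits replays at most one window's worth of already-delivered batches, which is duplication rather than data loss.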