-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expire dedups on tick
#297
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #297 +/- ##
==========================================
- Coverage 89.75% 89.73% -0.03%
==========================================
Files 22 22
Lines 3241 3263 +22
==========================================
+ Hits 2909 2928 +19
- Misses 332 335 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
I think it should be easier to start with the other optimization (which also must bump the machine version) and this PR should depend on rabbitmq/ra#474 so I'll mark this as a draft for now. We should be able to come back to it after the other PR is merged. |
abc3ebe
to
a106501
Compare
a106501
to
504ffd9
Compare
Ra 2.10.1 introduced a new `handle_aux/5` callback that takes the place of `handle_aux/6`. Instead of passing the log state and machine state separately, this API passes a new `ra_aux:internal_state()` opaque argument which you can read log or machine state out of with helper functions from the `ra_aux` module. This commit only updates to use the new callback: there should be no functional change from this commit.
This will be used in the child commit to write a test which sets a low `tick_timeout` configuration.
As part of the `khepri_machine:post_apply/2` helper which is run after any command is applied, we use `maps:filter/2` to eliminate any entries in the `dedups` field of the machine state which are expired according to the command's timestamp. This `drop_expired_dedups/2` step becomes a bottleneck though when a Khepri store handles many transactions at once. For example in RabbitMQ, queue deletion is done with a transaction submitted from each queue process. When many (for example five thousand) queues are deleted at once, `drop_expired_dedups/2` becomes a noticeable chunk of a flamegraph and specifically the `maps:filter/2` within. `maps:filter/2` is slow here because the BIF used to implement it collects a list of all key-value pairs for which the predicate returns true, sorts it, and then creates a new hashmap from it. We are unlikely to expire any given dedup when handling a command, especially when submitting many commands at once, so we end up essentially calling `maps:from_list(maps:to_list(Dedups))`. It is a small improvement to replace `maps:filter/2` with `maps:fold/3`, a case expression and `maps:remove/2` (reflecting that we will always fold over the map but rarely remove elements) but it is not enough to eliminate this rather large chunk of the flamegraph. Instead the solution in this commit is to move the detection of expired dedups to the "tick" aux effect. Ra emits this effect periodically: every 1s by default but configurable. By moving this detection to `handle_aux/5` we remove it from the "hot path" of `apply/3`. Even better though: we are unlikely to actually need to expire any dedups. In the case of queue deletion in RabbitMQ for example, we are likely to handle all in-flight transactions and handle the subsequent `#dedup_ack{}` commands before even evaluating a `tick` effect. If we do handle a tick effect while transactions are in-flight then we are unlikely to need to expire them anyways so we will only scan the map with the new `khepri_utils:maps_any/2` helper. If we need to expire dedups, the aux handler for the `tick` effect appends a new `#expire_dedups{}` command to the log which does the same as `drop_expired_dedups/2` prior to this commit.
504ffd9
to
c522a88
Compare
This change moves the work we do in
khepri_machine:drop_expired_dedups/2
out ofapply/3
and into the handler for Ra's periodictick
aux effect.The commit message has more details but the TLDR is that the
maps:filter/2
withindrop_expired_dedups/2
can become expensive if many transactions are being handled before their dedups are received. Instead we can determine if any dedups are expired in the handler for thetick
effect and expire any by committing a new#expire_dedups{}
command.Also included is a change to use the new
handle_aux/5
API introduced in Ra 2.10 but that change shouldn't have any effect on behavior.