-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: etcdserver: separate "raft log compact" from snapshot #18235
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
Hi @clement2026. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
/retest |
server/etcdserver/server.go
Outdated
compacti := uint64(1) | ||
if snapi > s.Cfg.SnapshotCatchUpEntries { | ||
compacti = snapi - s.Cfg.SnapshotCatchUpEntries | ||
compacti := uint64(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change from 1 to 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/etcd-io/raft/blob/ee19bbe7f9bb16fd7d874a6c08dfac319596d2ad/storage.go#L258-L260
if compactIndex > ms.lastIndex() {
getLogger().Panicf("compact %d is out of bound lastindex(%d)", compactIndex, ms.lastIndex())
}
A panic occurred at this line during local test runs. It appears the initial value of ms.lastIndex() is 0, so compacti
should default to zero.
In the main branch, a panic doesn't occur because ms
always contains more than one element when compaction starts, which isn't the case in this patch.
Great work, the change look good, but as any change in the etcd apply loop we need to be careful here.
Would be good to investigate, would you be able to collect CPU&Memory profiles? |
Found by robustness tests:
|
Thanks! I understand the importance of the apply loop. I'll investigate it and provide CPU & Memory profiles after resolving the bug and test failures. |
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
/retest |
Thanks for fixing test, please note this PR is small, but might have big impact on etcd behavior, it's review might take a while. Would be great to repeat the tests and see the profile difference to understand performance difference. cc @ahrtr |
Thanks for checking in. I temporarily fixed the test to test my ideas, but it might cause a serious memory leak. Currently, I'm developing an optimized version to properly fixed the test. I’ll update you once it's done and provide the CPU and memory profiles. |
…all ongoing snapshots Signed-off-by: Clement <gh.2lgqz@aleeas.com>
abcae5f
to
93d0484
Compare
/retest |
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
adb970c
to
302254b
Compare
/retest |
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
/retest |
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
/retest |
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
/retest |
I'm looking into #17098 @serathius and made changes to separate the compact logic from the snapshot logic.
I ran
rw-benchmark.sh
several times to observe performance changes. I expected this patch to slow down etcd, but it seems to have improved rw-benchmark.sh results. I’m not sure how. Similar results were observed on both cloud VMs (8vCPU, 16GB Memory, NVMe) and a bare-metal machine (8CPU, 32GB Memory, SSD).Since the
rw-benchmark.sh
script was taking forever to finish, I adjusted the parameters to complete the benchmark within a few hours.Below are the script and the results from running it on a Vultr bare-metal machine (8 CPUs, 32GB Memory, SSD, Ubuntu 24.04 LTS x64).
main.csv
issue-17098-patch.csv