"Performance tuning" docs should be changed to prevent users from having performance-related incidents #2017
Comments
The template specifically says it is required, and it helps us identify the platform and settings that led to the case you describe. We can't commit cycles to debugging your issue when you don't invest time in collecting the data. https://operator.docs.scylladb.com/stable/support/must-gather.html |
Got it, sorry. Attached! |
For GCP local SSDs it's important to disable the disk writeback cache, otherwise the disks are not capable of handling the optimized load. The Operator only does this when Scylla Enterprise is used as the utility image. If you have a license, switch image in |
Thanks @zimnx! But do you mean we should both apply perftune and disable the disk writeback cache for the tuning to work as expected? |
yes |
In my opinion, if scylladb/seastar#2350 were implemented, the "experimental" info and the "needs testing in non-production and is not easy to revert" warning could be removed. But the main problem here has been resolved, so this issue can remain closed. Thank you for your understanding and your help, @tnozicka and @zimnx! |
@gdubicki - can you clarify if the issue was indeed the writeback cache? |
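To help answer whether the writeback cache was involved, the current mode of each disk can be inspected on the node. The following is a minimal, read-only sketch, assuming a Linux node that exposes the standard `queue/write_cache` sysfs attribute (reported as `write back` or `write through`). The thread does not say how the Operator or perftune.py changes this setting, so the sketch only reads it; the function names `write_cache_modes` and `devices_with_writeback` are hypothetical helpers, not part of Scylla Operator or perftune.py.

```python
from pathlib import Path


def write_cache_modes(sysfs_block: Path = Path("/sys/block")) -> dict[str, str]:
    """Map each block device name to the content of its queue/write_cache file.

    Assumption: the node runs Linux and exposes the attribute under
    /sys/block/<device>/queue/write_cache. Devices without it are skipped.
    """
    modes: dict[str, str] = {}
    if not sysfs_block.is_dir():
        return modes
    for dev in sorted(sysfs_block.iterdir()):
        attr = dev / "queue" / "write_cache"
        try:
            modes[dev.name] = attr.read_text().strip()
        except OSError:
            # Attribute missing or unreadable for this device; skip it.
            continue
    return modes


def devices_with_writeback(modes: dict[str, str]) -> list[str]:
    """Return the devices whose cache mode is reported as 'write back'."""
    return [name for name, mode in modes.items() if mode == "write back"]


if __name__ == "__main__":
    current = write_cache_modes()
    for name, mode in current.items():
        print(f"{name}: {mode}")
    print("writeback still enabled on:", devices_with_writeback(current))
```

Running this on each Scylla node before and after tuning would show whether the local SSDs were still in `write back` mode while the optimized load was applied.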
What happened?
We followed the "Performance tuning" docs to optimize our cluster, and afterwards the cluster's performance degraded severely.
In particular, average write times increased from ~500ms to ~3000ms (about 6x higher), while the 95th percentile increased from ~2500ms to ~17500ms (about 7x higher)!
Read times were affected as well, though less severely.
Please see the screenshot from our metrics. The "optimizations" were applied shortly before 21:00 here.
This led to a multi-day performance incident that took hours to revert.
We don't want anyone else to run into this problem ever again.
What did you expect to happen?
I am talking about this style warning:
perftune.py should become easily revertible (see "perftune.py should allow rollback / revert the changes it made", seastar#2350) if "Performance tuning" is to become a non-experimental feature. Preferably, the Scylla operator should then support running it in revert mode.
How can we reproduce it (as minimally and precisely as possible)?
Read the docs https://operator.docs.scylladb.com/stable/performance.html
Scylla Operator version
1.13.0
Kubernetes platform name and version
Please attach the must-gather archive.
I don't think it matters here, because the perftune.py performance reduction itself is being looked into in scylladb/seastar#2350, but here it is, if needed:
scylla-operator-must-gather-9zqx6hqsn6zb.zip
This is from after the revert of the perftune settings. For more info about what perftune did, please see scylladb/seastar#2350.
Anything else we need to know?
See also scylladb/seastar#2350 for some more info about our setup.