Additional recommended alerts #1135
Comments
@kannappanr I could use some assistance:
Most of the recommended metrics we discussed do not appear in the cluster metrics. They do appear on the node endpoint:
We had previously discussed de-emphasizing the node-level metrics because they should be included in the cluster endpoint as a rollup. Is this a bug? cc @donatello @shtripat, as I think you both have some experience here.
Very few of the drive metrics listed at https://github.com/minio/minio/blob/master/docs/metrics/prometheus/list.md#drive-metrics seem to roll up properly.
Partially addresses #1135. To consider: I added the tabs as part of step 3 of the procedure, but we might want a recommended alerts section separate from the procedure, perhaps above the "Dashboards" heading. Let me know your thoughts.
@kannappanr can you please assist here? |
This might be somewhat resolved with metrics v3, but until we've had enough time for customers to roll past that, we will need to maintain both:
And then fixups to ensure that node-level metrics are rolled up appropriately |
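To make the expected rollup concrete, here is a minimal Python sketch that sums node-level samples into one cluster-level value per metric. The sample text, the `server` label values, and the `rollup` helper are hypothetical; this is not how MinIO computes its cluster metrics, only an illustration of what "rolled up" would mean for additive metrics such as `minio_node_drive_offline_total`.

```python
from collections import defaultdict

# Hypothetical node-endpoint samples in Prometheus text exposition format.
# The label names and values are assumptions made for illustration only.
SAMPLE = """\
# hypothetical sample data
minio_node_drive_offline_total{server="node1:9000"} 1
minio_node_drive_offline_total{server="node2:9000"} 0
minio_node_drive_online_total{server="node1:9000"} 3
minio_node_drive_online_total{server="node2:9000"} 4
"""


def rollup(text):
    """Sum each metric's samples across all label sets (that is, across nodes)."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/HELP/TYPE lines.
        if not line or line.startswith("#"):
            continue
        # The sample value is the last whitespace-separated field.
        series, value = line.rsplit(" ", 1)
        # Drop the label set to get the bare metric name.
        name = series.split("{", 1)[0]
        totals[name] += float(value)
    return dict(totals)


cluster = rollup(SAMPLE)
print(cluster)
```

Summing only makes sense for additive metrics such as drive counts or byte totals; a metric like `minio_node_drive_latency_us` would need a different aggregation, such as a maximum or an average.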
On metrics v3:
Node metric
|
@kannappanr @anjalshireesh is there still progress on addressing the metrics v2 rollups above, or should we just proceed with documenting the node-level ones for now? Otherwise, we can focus on the cluster rollups that do work and drop the rest until v3 stabilizes.
Re: v2 rollup, a customer reported that these metrics were "missing" after an upgrade because they are now found under
|
@kannappanr @anjalshireesh are we generally going to leave metrics v2 as-is for now, then, and focus on metrics v3? Our attempt to document the recommended alerts gets flaky because we do not list the |
see also minio/minio#19932 |
Summary
From an internal discussion, we should expand the alerting page to include the following list of recommended metrics:
minio_node_drive_free_bytes
minio_node_drive_free_inodes
minio_node_drive_latency_us
minio_node_drive_offline_total
minio_node_drive_online_total
minio_node_drive_total
minio_node_drive_total_bytes
minio_node_drive_used_bytes
minio_node_drive_errors_timeout
minio_node_drive_errors_availability
minio_node_drive_io_waiting
There are a lot of metrics here, and the page already has some examples, so I'm thinking we can use a tab setup of something like
To help constrain the default length of the procedure.
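As a rough illustration of what alerts on a few of these metrics might look like, the sketch below builds Prometheus-style alerting rule definitions in Python. The alert names, thresholds, `for` durations, group name, and the `drive_alert` helper are all assumptions chosen for illustration, not recommended values.

```python
import json


def drive_alert(name, expr, duration="5m", severity="warning"):
    """Return one alerting rule using the alert/expr/for/labels fields of a Prometheus rule."""
    return {
        "alert": name,
        "expr": expr,
        "for": duration,
        "labels": {"severity": severity},
    }


# Hypothetical alert names and thresholds; tune these for a real deployment.
rules = [
    drive_alert(
        "DriveOffline",
        "minio_node_drive_offline_total > 0",
        severity="critical",
    ),
    drive_alert(
        "DriveLowFreeSpace",
        "minio_node_drive_free_bytes / minio_node_drive_total_bytes < 0.10",
    ),
    # Assumes minio_node_drive_errors_timeout behaves as a counter.
    drive_alert(
        "DriveTimeoutErrors",
        "increase(minio_node_drive_errors_timeout[10m]) > 0",
    ),
]

# Mirrors the groups/rules layout of a Prometheus rule file.
rule_file = {"groups": [{"name": "minio-drive-alerts", "rules": rules}]}
print(json.dumps(rule_file, indent=2))
```

The dictionary mirrors the `groups` and `rules` layout of a Prometheus rule file and would be written out as YAML for actual use.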
Goals
List the in-scope goals
Non-Goals
Extensive testing of Prometheus + Alertmanager with the above metrics