KubeAPIDown not working as intended if targeting a set of clusters #825
+1. The best approach, I think, is to compare with history: if an apiserver disappears, raise an alert. Another technique is to move only the "up" rule out into a separate group and deploy it per cluster.
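A minimal sketch of the "separate group per cluster" idea, assuming a `cluster` external label and the default `apiserver` job name (both are placeholders that depend on your setup):

```yaml
# One copy of this group per cluster, with the cluster label value filled in by
# whatever tooling generates the rules (the cluster name below is a placeholder).
groups:
  - name: kube-apiserver-availability-my-cluster
    rules:
      - alert: KubeAPIDown
        expr: absent(up{job="apiserver", cluster="my-cluster"} == 1)
        for: 15m
        labels:
          severity: critical
```

Since `absent()` cannot be aggregated per label value, each cluster needs its own copy of the expression, which is exactly the hard-coding the issue is trying to avoid.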
This is tough when considering auto-scaling node groups. For example, if a node is scaled down and removed intentionally, that shouldn't trigger an alert, so taking every single instance into account seems difficult. However, you could try to assert that at least one instance of the API server job is present in each cluster with a query like:

```
# This query lists all clusters found by kube_node_info and marks them as either
# 1 or 0 depending on whether they have up{job="kube-apiserver"} or not (respectively).
#
# List all clusters and mark them with value: 0
# {cluster="my-cluster-without-apiserver-job"} 0
1 - group by (cluster) (max by (cluster, node) (kube_node_info{cluster!=""}))
unless on (cluster) (
  # except those clusters with kube-apiserver
  group by (cluster) (up{job="kube-apiserver", cluster!=""})
)
# List all clusters with kube-apiserver and mark them with value: 1
or on (cluster) (
  # {cluster="my-cluster-with-apiserver-job"} 1
  group by (cluster) (max by (cluster, node) (kube_node_info{cluster!=""}))
)
```

But this is use-case dependent. Some users would want ALL clusters to have the apiserver job, which is fairly easy to alert on (look for anything with a value of zero). However, some users would want apiserver on only certain clusters, which likely requires modifying the query to match only the subset of clusters that are intended to have the apiserver job.
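As a rough illustration, the `unless` half of the query above is enough to drive an alert for the "ALL clusters must have the apiserver job" case. The alert name here is made up, and the `kube-apiserver` job label is the same assumption as in the query:

```yaml
- alert: KubeAPIMissingInCluster   # hypothetical name
  expr: |
    # clusters that report nodes via kube_node_info...
    group by (cluster) (max by (cluster, node) (kube_node_info{cluster!=""}))
    # ...but have no up series for the apiserver job
    # (use up{...} == 1 inside the inner selector if a scraped-but-down target
    #  should also count as missing)
    unless on (cluster)
    group by (cluster) (up{job="kube-apiserver", cluster!=""})
  for: 15m
  labels:
    severity: critical
```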
This issue has not had any activity in the past 30 days, so the stale label has been applied to it.
Thank you for your contributions!
hi,
I get the following rule when generating Prometheus alerts for the kube-apiserver:
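(For reference, the rule generated by kubernetes-mixin is roughly the following; the exact annotation text and `for` duration may differ between versions:)

```yaml
- alert: KubeAPIDown
  annotations:
    message: KubeAPI has disappeared from Prometheus target discovery.
  expr: |
    absent(up{job="apiserver"} == 1)
  for: 15m
  labels:
    severity: critical
```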
The issue I'm running into is that my Prometheus instance reads data from several clusters, so if I add this rule it doesn't work as intended: the alert will not trigger as long as any kube-apiserver is up.
I could create a rule for each cluster, but I'd like to avoid hard-coding.
Have you run into a similar situation, and what would you suggest for such a use case?
Thank you,