
KubeAPIDown not working as intended if it targets a set of clusters #825

Closed
thunko opened this issue Feb 13, 2023 · 3 comments
thunko commented Feb 13, 2023

Hi,

I get the following rule when generating Prometheus alerts for the kube-apiserver:

- "alert": "KubeAPIDown"
    "annotations":
      "description": "KubeAPI has disappeared from Prometheus target discovery."
      "runbook_url": "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapidown"
      "summary": "Target disappeared from Prometheus target discovery."
    "expr": |
      absent(up{job="kube-apiserver"} == 1)
    "for": "15m"
    "labels":
      "severity": "critical"

The issue I'm running into is that my Prometheus instance reads data from several clusters. If I add this rule as-is, it doesn't work as intended: `absent()` has no cluster dimension, so the alert will not fire as long as any kube-apiserver in any cluster is up.
I could create a rule for each cluster (see the sketch below), but I'd like to avoid hard-coding cluster names.
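For illustration, the hard-coded variant I'd like to avoid would pin the expression to one cluster at a time; the cluster name below is purely hypothetical:

# Hypothetical: one copy of this expression per cluster, name hard-coded.
absent(up{job="kube-apiserver", cluster="prod-eu-1"} == 1)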

Have you run into a similar situation, and what would you suggest for such a use case?
Thank you,

zoftdev commented Aug 30, 2024

+1. The best approach, I think, is to compare against history: if an apiserver that was previously present disappears, raise an alert.

Another technique is to move only the "up" rule out into a separate group and deploy that group once per cluster (see the sketch below).
That way there is one common rule group plus a per-cluster rule group.
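A minimal sketch of that layout, assuming the per-cluster group is deployed once per cluster with its name templated in (the group and cluster names here are examples, not from the mixin):

# Hypothetical per-cluster rule group; deploy one copy per cluster with
# the cluster name templated in. Names below are illustrative only.
"groups":
- "name": "kube-apiserver-up-prod-eu-1"
  "rules":
  - "alert": "KubeAPIDown"
    "expr": |
      absent(up{job="kube-apiserver", cluster="prod-eu-1"} == 1)
    "for": "15m"
    "labels":
      "severity": "critical"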

skl (Collaborator) commented Aug 30, 2024

This is tough when considering auto-scaling node groups. For example, if a node is scaled down and removed intentionally, that shouldn't trigger an alert. So taking every single instance into account seems difficult.

However, you could try to assert that at least one instance of the API server job is present in each cluster with a query like:

# This query lists all clusters found by kube_node_info and marks each as
# either 1 or 0 depending on whether it has up{job="kube-apiserver"} or not.
#
# List all clusters and mark them with value 0, e.g.:
# {cluster="my-cluster-without-apiserver-job"} 0
1 - group by (cluster) (max by (cluster, node) (kube_node_info{cluster!=""}))
unless on (cluster) (
  # ...except those clusters that do have kube-apiserver
  group by (cluster) (up{job="kube-apiserver", cluster!=""})
)
# List all clusters with kube-apiserver and mark them with value 1, e.g.:
# {cluster="my-cluster-with-apiserver-job"} 1
or on (cluster) (
  group by (cluster) (max by (cluster, node) (kube_node_info{cluster!=""}))
)

But this is use-case dependent.

Some users would want ALL clusters to have the apiserver job, which is fairly easy to alert on (look for anything with a value of zero).
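For that case, a minimal sketch of an alert built on the query above; the alert name and severity here are assumptions, not part of the mixin:

# Hypothetical alert: fires for any cluster whose marker value is 0,
# i.e. a cluster discovered via kube_node_info that is missing the
# kube-apiserver job. Alert name and labels are illustrative only.
- "alert": "KubeAPIServerJobMissing"
  "expr": |
    (
      1 - group by (cluster) (max by (cluster, node) (kube_node_info{cluster!=""}))
      unless on (cluster)
        group by (cluster) (up{job="kube-apiserver", cluster!=""})
      or on (cluster)
        group by (cluster) (max by (cluster, node) (kube_node_info{cluster!=""}))
    ) == 0
  "for": "15m"
  "labels":
    "severity": "critical"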

However, some users would want the apiserver job on only certain clusters, which likely needs the query modified to match only the subset of clusters that are intended to have the apiserver job.
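As a sketch of that modification, the kube_node_info selector could be restricted to an assumed naming convention; the "prod-.*" pattern is purely hypothetical:

# Hypothetical: only clusters matching "prod-.*" are expected to run the
# kube-apiserver job; adjust the regex to your own naming scheme.
1 - group by (cluster) (max by (cluster, node) (kube_node_info{cluster=~"prod-.*"}))
unless on (cluster)
  group by (cluster) (up{job="kube-apiserver", cluster!=""})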

github-actions bot commented Sep 30, 2024

This issue has not had any activity in the past 30 days, so the
stale label has been added to it.

  • The stale label will be removed if there is new activity
  • The issue will be closed in 7 days if there is no new activity
  • Add the keepalive label to exempt this issue from the stale check action

Thank you for your contributions!

@github-actions github-actions bot added the stale label Sep 30, 2024
@github-actions github-actions bot closed this as not planned Oct 7, 2024
4 participants
@skl @zoftdev @thunko and others