Hi, we have a similar issue with the memory/CPU metrics. When a pod is restarted, its old metrics persist for a little while, which creates peaks in the CPU/memory graphs because the old and new series are summed together. We have added id to the grouping of the expression to separate them.
Current expression:
"sum(rate(container_cpu_usage_seconds_total{namespace=\"$namespace\", pod=~\"$pod\", image!=\"\", container!=\"\", cluster=\"$cluster\"}[$__rate_interval])) by (container)"
Changed expression:
"sum(rate(container_cpu_usage_seconds_total{namespace=\"$namespace\", pod=~\"$pod\", image!=\"\", container!=\"\", cluster=\"$cluster\"}[$__rate_interval])) by (container,id)"
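One way to check whether a panel is affected by this overlap is to count how many series exist per pod and container. The following is only a sketch: it reuses the selector from the expression above (written as plain PromQL rather than as an escaped JSON string) and assumes the container metrics carry an id label that differs between the old and the restarted container.

```promql
# Returns pod/container pairs that currently have more than one series,
# which suggests an old and a new container id are being reported together.
count by (pod, container) (
  container_cpu_usage_seconds_total{namespace="$namespace", pod=~"$pod", image!="", container!="", cluster="$cluster"}
) > 1
```

If this returns results shortly after a restart and goes empty again a few minutes later, the peaks are most likely caused by the lingering series rather than by real usage.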
Describe the enhancement you'd like
I have been using the Kubernetes dashboards provided in this repository, and I appreciate the work that has gone into creating them. However, I’ve noticed that some of the metrics currently include non-running pods, which can lead to inaccurate resource usage and performance insights.
Specifically, I would like the PromQL queries related to resource requests and limits (kube_pod_container_resource_requests and kube_pod_container_resource_limits) to explicitly filter out non-running pods, for example by joining them against kube_pod_status_phase with a check for phase="Running".
Current expressions:
sum(kube_pod_container_resource_requests{namespace=~"$namespace", resource="cpu", cluster="$cluster"})
sum(kube_pod_container_resource_limits{namespace=~"$namespace", resource="memory", cluster="$cluster"})
Proposed modifications:
sum(kube_pod_container_resource_requests{namespace=~"$namespace", resource="cpu"} * on(namespace, pod) group_left() (sum(kube_pod_status_phase{phase="Running", cluster="$cluster"}) by (pod, namespace) == 1))
sum(kube_pod_container_resource_limits{namespace=~"$namespace", resource="memory"} * on(namespace, pod) group_left() (sum(kube_pod_status_phase{phase="Running", cluster="$cluster"}) by (pod, namespace) == 1))
Similar modifications should be applied to all relevant metrics to accurately reflect the state of running pods.
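A possible variant of the proposed CPU requests expression is sketched below. The proposal above moves the cluster matcher from the resource metric to kube_pod_status_phase and joins only on namespace and pod; this variant keeps the matcher on both sides and also joins on cluster, so that pods with the same name in different clusters cannot match each other. It assumes both metrics carry a cluster label, as the current expressions suggest. Using max instead of sum for the phase series is a stylistic choice, not a requirement.

```promql
sum(
  kube_pod_container_resource_requests{namespace=~"$namespace", resource="cpu", cluster="$cluster"}
  # Keep only containers whose pod is currently in the Running phase.
  * on(cluster, namespace, pod) group_left()
  (
    max by (cluster, namespace, pod) (
      kube_pod_status_phase{phase="Running", namespace=~"$namespace", cluster="$cluster"}
    ) == 1
  )
)
```

The same pattern should carry over to kube_pod_container_resource_limits and to resource="memory" by swapping the metric name or the resource matcher.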
Additional context
No response