# Ottoscalr metrics

All the standard kubebuilder controller metrics, as mentioned here, are available for the ottoscalr controllers.

In addition, ottoscalr exports the following metrics, which can be used for monitoring and for configuring alerts.

A ServiceMonitor is also bundled with the Helm chart and can optionally be deployed if your monitoring stack is based on KubePrometheus. If the ServiceMonitor is deployed, these metrics are prefixed with `ottoscalr_`.
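Because the series name depends on whether the ServiceMonitor is deployed, it can help to build queries programmatically. The following is a minimal sketch, assuming the `ottoscalr_` prefix is simply prepended to the base names listed in the table; the helper names (`metric_name`, `quote_label_value`, `selector`) and the `payments`/`checkout` label values are hypothetical and only for illustration.

```python
from typing import Mapping


def metric_name(base: str, service_monitor_deployed: bool) -> str:
    """Return the series name as it would appear in Prometheus.

    Assumption: the ottoscalr_ prefix is present only when the bundled
    ServiceMonitor is deployed, as the note above describes.
    """
    return f"ottoscalr_{base}" if service_monitor_deployed else base


def quote_label_value(value: str) -> str:
    """Render a label value as a double-quoted PromQL string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def selector(base: str, labels: Mapping[str, str], service_monitor_deployed: bool = True) -> str:
    """Build a PromQL instant-vector selector with exact-match (=) label matchers."""
    name = metric_name(base, service_monitor_deployed)
    if not labels:
        return name
    # Sort label keys so the generated selector is deterministic.
    matchers = ",".join(f"{key}={quote_label_value(val)}" for key, val in sorted(labels.items()))
    return f"{name}{{{matchers}}}"


# Hypothetical namespace and workload names, for illustration only.
print(selector("breachmonitor_breached", {"namespace": "payments", "workload": "checkout"}))
# prints: ottoscalr_breachmonitor_breached{namespace="payments",workload="checkout"}
```

The generated selector can be pasted into the Prometheus expression browser or used as the basis of an alerting rule expression (for example, appending `== 1` to flag workloads that have breached the CPU redline).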

| Metric name | Metric type | Description | Labels/tags |
|---|---|---|---|
| `policyreco_current_policy_max` | gauge | Current max replica count to be applied to the HPA | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `policyreco_current_policy_min` | gauge | Current min replica count to be applied to the HPA | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `policyreco_current_policy_utilization` | gauge | Current CPU utilization threshold to be applied to the HPA | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `policyreco_target_policy_max` | gauge | Max replica count recommended for the HPA | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `policyreco_target_policy_min` | gauge | Min replica count recommended for the HPA | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `policyreco_target_policy_utilization` | gauge | CPU utilization threshold recommended for the HPA | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `policyreco_workload_info` | gauge | Information about the policyrecommendation | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>`<br>`workload=<policyrecommendation-workload>`<br>`workloadKind=<workloadType(Deployment,Rollout)>` |
| `policyreco_reconciler_conditions` | gauge | Status of the different conditions in `.policyrecommendation.status` | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>`<br>`status=<True,False>`<br>`type=<RecoTaskProgress,TargetRecoAchieved>` |
| `policyreco_reconciler_task_progress_reason` | gauge | Reason for the `RecoTaskProgress` condition | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>`<br>`reason=<RecoTaskErrored,RecoTaskRecommendationGenerated>` |
| `policyreco_reconciled_count` | counter | Number of times a policyrecommendation has been reconciled | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `policyreco_reconciler_errored_count` | counter | Number of times a policyrecommendation's reconciliation has errored | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `policyreco_reconciler_targetreco_slo_days` | histogram | Time taken, in days, for a policyrecommendation to reach the target recommendation | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `minimum_percentage_of_datapoints_present` | gauge | Whether the minimum percentage of datapoints needed to generate a recommendation is present | `workload=<deployment-name>`<br>`namespace=<policyrecommendation-namespace>`<br>`reason=<RecoTaskErrored,RecoTaskRecommendationGenerated>` |
| `p8s_query_error_count` | counter | Number of failed queries made to Prometheus | `query=<query-type>`<br>`p8sinstance=<prometheusinstance-name>` |
| `p8s_query_success_count` | counter | Number of successful queries made to Prometheus | `query=<query-type>`<br>`p8sinstance=<prometheusinstance-name>` |
| `p8s_concurrent_queries` | gauge | Number of concurrent Prometheus API calls for a query | `query=<query-type>`<br>`p8sinstance=<prometheusinstance-name>` |
| `datapoints_fetched_by_p8s_instance` | gauge | Number of datapoints fetched from a Prometheus instance for a query for a workload | `query=<query-type>`<br>`p8sinstance=<prometheusinstance-name>`<br>`workload=<deployment-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `total_datapoints_fetched` | gauge | Total number of datapoints fetched for a query for a workload, aggregated across all Prometheus instances | `query=<query-type>`<br>`workload=<deployment-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `prometheus_scraper_query_latency` | histogram | Time to execute a Prometheus scraper query, in seconds | `query=<query-type>`<br>`p8sinstance=<prometheusinstance-name>`<br>`workload=<deployment-name>`<br>`namespace=<policyrecommendation-namespace>` |
| `get_avg_cpu_utilization_query_latency_seconds` | histogram | Total time to execute the utilization datapoint query, in seconds | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>`<br>`workload=<deployment-name>`<br>`workloadKind=<workloadType(Deployment,Rollout)>` |
| `get_reco_generation_latency_seconds` | histogram | Total time to generate a policyrecommendation for a workload once its execution has started | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>`<br>`workload=<deployment-name>`<br>`workloadKind=<workloadType(Deployment,Rollout)>` |
| `breachmonitor_breached` | gauge | Whether a workload has breached the CPU redline | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>`<br>`workload=<deployment-name>`<br>`workloadKind=<workloadType(Deployment,Rollout)>` |
| `breachmonitor_execution_rate` | gauge | Rate of breachmonitor executions for the workloads | |
| `concurrent_breachmonitor_executions` | counter | Number of concurrent breachmonitor executions for the workloads | |
| `breachmonitor_mitigation_latency_seconds` | histogram | Time to mitigate a breach for a workload, in seconds | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>`<br>`workload=<deployment-name>`<br>`workloadKind=<workloadType(Deployment,Rollout)>` |
| `hpaenforcer_reconciled_count` | counter | Number of times a policyrecommendation has been reconciled by the HPAEnforcer | `policyreco=<policyrecommendation-name>`<br>`namespace=<policyrecommendation-namespace>` |
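The histogram metrics above (for example `policyreco_reconciler_targetreco_slo_days`) are exposed as cumulative buckets, and in PromQL you would normally read a percentile from the `_bucket` series with `histogram_quantile`. To illustrate what that estimate means, here is a simplified sketch of the linear interpolation `histogram_quantile` performs within a bucket. It is not ottoscalr code, and the bucket boundaries and counts in the example are hypothetical, since ottoscalr's actual bucket layout is not documented here.

```python
import math
from bisect import bisect_left
from typing import Sequence


def bucket_quantile(q: float, upper_bounds: Sequence[float], cumulative_counts: Sequence[float]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    upper_bounds must be sorted ascending and end with +Inf (the le="+Inf" bucket).
    cumulative_counts[i] is the number of observations <= upper_bounds[i] and is
    assumed to be non-decreasing.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if len(upper_bounds) != len(cumulative_counts) or len(upper_bounds) < 2:
        raise ValueError("need at least two buckets with matching counts")
    if not math.isinf(upper_bounds[-1]):
        raise ValueError("last bucket must be +Inf")
    total = cumulative_counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = bisect_left(cumulative_counts, rank, 0, len(cumulative_counts) - 1)
    if b == len(upper_bounds) - 1:
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return upper_bounds[-2]
    if b == 0 and upper_bounds[0] <= 0:
        return upper_bounds[0]
    start = upper_bounds[b - 1] if b > 0 else 0.0
    prev = cumulative_counts[b - 1] if b > 0 else 0.0
    in_bucket = cumulative_counts[b] - prev
    if in_bucket == 0:
        return math.nan
    # Linear interpolation between the bucket's lower and upper bounds.
    return start + (upper_bounds[b] - start) * ((rank - prev) / in_bucket)


# Hypothetical buckets (in days) and cumulative counts, for illustration only.
bounds = [1.0, 3.0, 7.0, math.inf]
counts = [2, 6, 9, 10]
print(bucket_quantile(0.5, bounds, counts))  # prints 2.5
```

Note the behaviour when the requested rank lands in the `+Inf` bucket: the estimate is capped at the highest finite bound, so coarse buckets can understate tail values such as the p95 time to reach the target recommendation.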