All the standard kubebuilder controller metrics described here are available for the ottoscalr controllers. In addition, ottoscalr exports the following metrics, which can be used for monitoring and for configuring alerts.

A ServiceMonitor is also bundled with the helm chart and can optionally be deployed if your monitoring stack is based on KubePrometheus. When the ServiceMonitor is deployed, these metrics are prefixed with `ottoscalr_`.
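A minimal sketch of enabling the bundled ServiceMonitor through helm values. The `serviceMonitor.enabled` key is an assumption based on common chart conventions; check the chart's `values.yaml` for the exact key.

```yaml
# Assumed values.yaml snippet -- the exact key may differ in the ottoscalr chart.
serviceMonitor:
  enabled: true
```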
| Metric name | Metric type | Description | Labels/tags |
|---|---|---|---|
| policyreco_current_policy_max | gauge | Current max replica count to be applied to the HPA | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>` |
| policyreco_current_policy_min | gauge | Current min replica count to be applied to the HPA | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>` |
| policyreco_current_policy_utilization | gauge | Current CPU utilization threshold to be applied to the HPA | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>` |
| policyreco_target_policy_max | gauge | Max replica count recommended to be applied to the HPA | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>` |
| policyreco_target_policy_min | gauge | Min replica count recommended to be applied to the HPA | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>` |
| policyreco_target_policy_utilization | gauge | CPU utilization threshold recommended to be applied to the HPA | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>` |
| policyreco_workload_info | gauge | Information about the policyrecommendation | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>`, `workload=<policyrecommendation-workload>`, `workloadKind=<Deployment\|Rollout>` |
| policyreco_reconciler_conditions | gauge | Status of the different conditions in `.policyrecommendation.status` | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>`, `status=<True\|False>`, `type=<RecoTaskProgress\|TargetRecoAchieved>` |
| policyreco_reconciler_task_progress_reason | gauge | Reason for the condition type `RecoTaskProgress` | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>`, `reason=<RecoTaskErrored\|RecoTaskRecommendationGenerated>` |
| policyreco_reconciled_count | counter | Number of times a policyrecommendation has been reconciled | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>` |
| policyreco_reconciler_errored_count | counter | Number of times a policyrecommendation's reconciliation has errored | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>` |
| policyreco_reconciler_targetreco_slo_days | histogram | Time taken for a policyrecommendation to achieve the target recommendation, in days | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>` |
| minimum_percentage_of_datapoints_present | gauge | Whether the minimum percentage of datapoints is present to generate a recommendation | `workload=<deployment-name>`, `namespace=<policyrecommendation-namespace>`, `reason=<RecoTaskErrored\|RecoTaskRecommendationGenerated>` |
| p8s_query_error_count | counter | Error counter for a query made to Prometheus | `query=<query-type>`, `p8sinstance=<prometheusinstance-name>` |
| p8s_query_success_count | counter | Success counter for a query made to Prometheus | `query=<query-type>`, `p8sinstance=<prometheusinstance-name>` |
| p8s_concurrent_queries | gauge | Number of concurrent Prometheus API calls for a query | `query=<query-type>`, `p8sinstance=<prometheusinstance-name>` |
| datapoints_fetched_by_p8s_instance | gauge | Number of datapoints fetched for a query for a workload from a Prometheus instance | `query=<query-type>`, `p8sinstance=<prometheusinstance-name>`, `workload=<deployment-name>`, `namespace=<policyrecommendation-namespace>` |
| total_datapoints_fetched | gauge | Total number of datapoints fetched for a query for a workload, aggregated across all Prometheus instances | `query=<query-type>`, `workload=<deployment-name>`, `namespace=<policyrecommendation-namespace>` |
| prometheus_scraper_query_latency | histogram | Time to execute a Prometheus scraper query, in seconds | `query=<query-type>`, `p8sinstance=<prometheusinstance-name>`, `workload=<deployment-name>`, `namespace=<policyrecommendation-namespace>` |
| get_avg_cpu_utilization_query_latency_seconds | histogram | Total time to execute the utilization datapoint query, in seconds | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>`, `workload=<deployment-name>`, `workloadKind=<Deployment\|Rollout>` |
| get_reco_generation_latency_seconds | histogram | Total time to generate a policyrecommendation for a workload once its execution has started | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>`, `workload=<deployment-name>`, `workloadKind=<Deployment\|Rollout>` |
| breachmonitor_breached | gauge | Whether a workload has breached the CPU redline | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>`, `workload=<deployment-name>`, `workloadKind=<Deployment\|Rollout>` |
| breachmonitor_execution_rate | gauge | Rate of breach monitor executions for the workloads | |
| concurrent_breachmonitor_executions | counter | Number of concurrent breach monitor executions for the workloads | |
| breachmonitor_mitigation_latency_seconds | histogram | Time to mitigate a breach for a workload, in seconds | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>`, `workload=<deployment-name>`, `workloadKind=<Deployment\|Rollout>` |
| hpaenforcer_reconciled_count | counter | Number of times a policyrecommendation has been reconciled by the HPAEnforcer | `policyreco=<policyrecommendation-name>`, `namespace=<policyrecommendation-namespace>` |
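As a starting point for alerting, the sketch below shows a PrometheusRule using two of the metrics above. The rule names, thresholds, and durations are illustrative assumptions, not part of ottoscalr; the metric names assume the bundled ServiceMonitor is deployed, so every metric carries the `ottoscalr_` prefix.

```yaml
# Illustrative PrometheusRule sketch; tune thresholds for your environment.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ottoscalr-alerts
spec:
  groups:
    - name: ottoscalr
      rules:
        # Fire when a policyrecommendation's reconciliation keeps erroring.
        - alert: OttoscalrReconcileErrors
          expr: increase(ottoscalr_policyreco_reconciler_errored_count[15m]) > 0
          for: 15m
          labels:
            severity: warning
        # Fire when a workload has breached the CPU redline.
        - alert: OttoscalrCPURedlineBreached
          expr: ottoscalr_breachmonitor_breached == 1
          for: 5m
          labels:
            severity: critical
```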