k-yomo/elastic-cloud-autoscaler

elastic-cloud-autoscaler

License: Apache-2.0 Test Codecov Go Report Card

An autoscaler for Elastic Cloud, driven by CPU utilization or cron schedules and inspired by es-operator.

⚠️ This library is still experimental; use it at your own risk. I highly recommend running it with DryRun: true at first.

Compatibility

  • Elasticsearch >= 8.x

Features

The autoscaler supports the following ways of auto-scaling.

  • CPU utilization based auto-scaling.
    • Autoscaler tries to scale out or scale in when the average CPU utilization stays above or below the desired CPU utilization for the whole threshold duration.
  • Cron schedule based auto-scaling.
    • You can override the min/max memory size (and therefore the node count) for a configured duration, starting at a cron-format schedule.

Configuration

| Config property | Type | Required | Description |
|---|---|---|---|
| index | string | true | Index whose replica count is updated when scaling out/in. |
| shardsPerNode | int | true | Desired shard count per node. Autoscaler won't scale in or out to a node count that can't meet this ratio. |
| defaultMinMemoryGBPerZone | int | true | Default minimum memory size per zone. Only multiples of 64 are allowed (64 x N nodes). |
| defaultMaxMemoryGBPerZone | int | true | Default maximum memory size per zone. Only multiples of 64 are allowed (64 x N nodes). |
| autoScaling | object | | |
| autoScaling.desiredCPUUtilPercent | int | true (in autoScaling) | Desired CPU utilization percent. Autoscaler changes the node count to bring CPU utilization closer to this value. |
| autoScaling.scaleOutThresholdDuration | time.Duration | | Threshold duration for scale-out. Scale-out may happen when CPU utilization stays above desiredCPUUtilPercent for the whole threshold duration. |
| autoScaling.scaleOutCoolDownDuration | time.Duration | | Cool-down period for scale-out after the last scaling operation. |
| autoScaling.scaleInThresholdDuration | time.Duration | | Threshold duration for scale-in. Scale-in may happen when CPU utilization stays below desiredCPUUtilPercent for the whole threshold duration. |
| autoScaling.scaleInCoolDownDuration | time.Duration | | Cool-down period for scale-in after the last scaling operation. |
| scheduledScalings | array of object | | |
| scheduledScalings[i].startCronSchedule | string | true (in scheduledScaling) | Cron-format schedule at which the specified min/max size starts to apply. The default timezone is the machine's local timezone. To specify one, add a TZ= prefix (e.g. TZ=UTC 0 0 * * *). |
| scheduledScalings[i].duration | time.Duration | true (in scheduledScaling) | How long the min/max size below applies, counted from startCronSchedule. |
| scheduledScalings[i].minMemoryGBPerZone | int | true (in scheduledScaling) | Minimum memory size per zone during the specified period. |
| scheduledScalings[i].maxMemoryGBPerZone | int | true (in scheduledScaling) | Maximum memory size per zone during the specified period. |
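
The constraints in the table can be checked before handing a config to the autoscaler. The following is only a minimal sketch: the Config struct and its Validate method are hypothetical and are not the library's actual types, and the "min must not exceed max" rule is an assumption that the table does not state explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical, simplified mirror of the top-level properties
// in the table above. It is NOT the library's actual type.
type Config struct {
	Index                     string
	ShardsPerNode             int
	DefaultMinMemoryGBPerZone int
	DefaultMaxMemoryGBPerZone int
}

// nodeMemoryGB reflects the "64 x N nodes" rule from the table.
const nodeMemoryGB = 64

// Validate checks that required fields are set and that memory sizes are
// positive multiples of 64. The min <= max check is an assumption.
func (c Config) Validate() error {
	if c.Index == "" {
		return errors.New("index is required")
	}
	if c.ShardsPerNode <= 0 {
		return errors.New("shardsPerNode must be positive")
	}
	sizes := []struct {
		name string
		gb   int
	}{
		{"defaultMinMemoryGBPerZone", c.DefaultMinMemoryGBPerZone},
		{"defaultMaxMemoryGBPerZone", c.DefaultMaxMemoryGBPerZone},
	}
	for _, s := range sizes {
		if s.gb <= 0 || s.gb%nodeMemoryGB != 0 {
			return fmt.Errorf("%s must be a positive multiple of %d, got %d",
				s.name, nodeMemoryGB, s.gb)
		}
	}
	if c.DefaultMinMemoryGBPerZone > c.DefaultMaxMemoryGBPerZone {
		return errors.New("defaultMinMemoryGBPerZone must not exceed defaultMaxMemoryGBPerZone")
	}
	return nil
}

func main() {
	cfg := Config{
		Index:                     "test",
		ShardsPerNode:             1,
		DefaultMinMemoryGBPerZone: 64,
		DefaultMaxMemoryGBPerZone: 256,
	}
	fmt.Println(cfg.Validate()) // valid config, so the error is nil

	cfg.DefaultMinMemoryGBPerZone = 100 // not a multiple of 64
	fmt.Println(cfg.Validate())
}
```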

Example YAML Config

index: test
shardsPerNode: 1
defaultMinMemoryGBPerZone: 64
defaultMaxMemoryGBPerZone: 256
autoScaling:
  desiredCPUUtilPercent: 50
  scaleOutThresholdDuration: 5m
  scaleInThresholdDuration: 10m
scheduledScalings:
  - startCronSchedule: TZ=UTC 0 0 * * *
    duration: 1h
    minMemoryGBPerZone: 128
    maxMemoryGBPerZone: 256
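
To illustrate how a scheduled scaling overrides the defaults in a config like the one above, here is a rough sketch in Go. It assumes the cron expression has already been resolved to a concrete start time (cron parsing is out of scope here), that the window is half-open, and that the first matching window wins. MemoryRange, ScheduledWindow, and effectiveRange are hypothetical names, not the library's API.

```go
package main

import (
	"fmt"
	"time"
)

// MemoryRange is a hypothetical min/max memory size per zone, in GB.
type MemoryRange struct{ MinGB, MaxGB int }

// ScheduledWindow is a hypothetical, already-resolved occurrence of a
// scheduledScalings entry: a concrete start time plus its duration.
type ScheduledWindow struct {
	Start    time.Time
	Duration time.Duration
	Range    MemoryRange
}

// Values mirroring the example config: defaults of 64..256 GB, and a
// one-hour window starting at 00:00 UTC that raises the minimum to 128 GB.
var (
	defaultRange = MemoryRange{MinGB: 64, MaxGB: 256}
	windowStart  = time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	windows      = []ScheduledWindow{
		{Start: windowStart, Duration: time.Hour, Range: MemoryRange{MinGB: 128, MaxGB: 256}},
	}
)

// effectiveRange returns the range of the first window containing now
// (start inclusive, end exclusive), or the default range otherwise.
func effectiveRange(now time.Time, def MemoryRange, ws []ScheduledWindow) MemoryRange {
	for _, w := range ws {
		end := w.Start.Add(w.Duration)
		if !now.Before(w.Start) && now.Before(end) {
			return w.Range
		}
	}
	return def
}

func main() {
	inside := windowStart.Add(30 * time.Minute)
	outside := windowStart.Add(90 * time.Minute)
	fmt.Println(effectiveRange(inside, defaultRange, windows))  // {128 256}
	fmt.Println(effectiveRange(outside, defaultRange, windows)) // {64 256}
}
```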

How it scales

Autoscaler tries to scale out or scale in within the min/max range while keeping the configured shardsPerNode.

If CPU based auto-scaling is configured and CPU utilization stays above or below the target for the configured threshold duration, Autoscaler tries to increase or decrease the number of nodes and replicas so that utilization moves closer to the target.

If the resulting node count can't satisfy shardsPerNode, Autoscaler won't apply the scaling operation.
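
The node-count side of this can be sketched as a proportional calculation: keep "nodes x CPU utilization" roughly constant and solve for the node count that would yield the desired utilization, then clamp to the min/max range. This is an assumption based on the arithmetic used in this README's example, not necessarily the library's exact algorithm, and desiredNodesPerZone is a hypothetical helper.

```go
package main

import "fmt"

// desiredNodesPerZone estimates how many nodes per zone are needed so that
// the average CPU utilization approaches desiredCPU, assuming the total load
// stays constant and spreads evenly. The result is rounded up and clamped
// to [minNodes, maxNodes]. Hypothetical helper, not the library's API.
func desiredNodesPerZone(currentNodes, currentCPU, desiredCPU, minNodes, maxNodes int) int {
	// Integer ceiling of currentNodes * currentCPU / desiredCPU.
	n := (currentNodes*currentCPU + desiredCPU - 1) / desiredCPU
	if n < minNodes {
		n = minNodes
	}
	if n > maxNodes {
		n = maxNodes
	}
	return n
}

func main() {
	// 6 nodes per zone at 60% CPU with a 45% target, allowed range 6..12.
	fmt.Println(desiredNodesPerZone(6, 60, 45, 6, 12)) // 8

	// The same load with a maximum of 7 nodes per zone is clamped.
	fmt.Println(desiredNodesPerZone(6, 60, 45, 6, 7)) // 7
}
```

Whether the candidate node count is actually applied still depends on the shardsPerNode check described above.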

Example

- scale-out when CPU utilization is higher than the target

Premise
cluster:
  memoryGBPerZone: 384 (64g * 6)
  zoneCount: 2
  averageCPUUtil: 60 (keeping 60 for 5 minutes)
index:
  numberOfShards: 2
  numberOfReplicas: 5
Config
index: test
shardsPerNode: 1
defaultMinMemoryGBPerZone: 384 (64g * 6)
defaultMaxMemoryGBPerZone: 768 (64g * 12)
autoScaling:
  desiredCPUUtilPercent: 45
  scaleOutThresholdDuration: 5m
  scaleInThresholdDuration: 10m
Result

Autoscaler scales out to 8 nodes per zone to reduce CPU utilization (60% * 12 nodes / 16 nodes => 45%).

cluster:
  memoryGBPerZone: 384 (64g * 6) => 512 (64g * 8)
  zoneCount: 2
  averageCPUUtil: 60 => 45 (expected)
index:
  numberOfShards: 2
  numberOfReplicas: 5 => 7

- not scaling due to shardsPerNode violation

Premise
cluster:
  memoryGBPerZone: 192 (64g * 3)
  zoneCount: 2
index:
  numberOfShards: 3
  numberOfReplicas: 1
Config
index: test
shardsPerNode: 1
defaultMinMemoryGBPerZone: 256 (64g * 4)
defaultMaxMemoryGBPerZone: 256 (64g * 4)

In the above case, Autoscaler won't scale out to 4 nodes x 2 zones even though defaultMinMemoryGBPerZone corresponds to 4 nodes per zone, because neither 1 replica (6 shards in total) nor 2 replicas (9 shards in total) gives the 8 shards required by shardsPerNode: 1.
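
The check in this example can be sketched as follows: a node count is feasible only if some replica count makes the total shard count (primaries x (1 + replicas)) exactly equal to the total node count x shardsPerNode. The exact-match rule is inferred from the example above, and replicasFor is a hypothetical helper rather than the library's API.

```go
package main

import "fmt"

// replicasFor returns the replica count for which
// primaryShards * (1 + replicas) == totalNodes * shardsPerNode,
// and false when no such replica count exists.
// Hypothetical helper; the exact-match rule is inferred from the README example.
func replicasFor(primaryShards, totalNodes, shardsPerNode int) (int, bool) {
	slots := totalNodes * shardsPerNode
	if primaryShards <= 0 || slots < primaryShards || slots%primaryShards != 0 {
		return 0, false
	}
	return slots/primaryShards - 1, true
}

func main() {
	// Current state: 3 primaries on 3 nodes x 2 zones = 6 nodes.
	fmt.Println(replicasFor(3, 6, 1)) // 1 true

	// Target: 4 nodes x 2 zones = 8 nodes. 8 is not a multiple of 3,
	// so no replica count fits and the scaling is skipped.
	fmt.Println(replicasFor(3, 8, 1)) // 0 false
}
```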

Usage

Elastic Cloud Autoscaler can be used as a library. An example is in ./examples/main.go.

A ready-made Docker image is also provided. See kyomo/elastic-cloud-autoscaler for image details.

Constraints

Demo

You can easily try this library with the demo repository: https://github.com/k-yomo/elastic-cloud-autoscaler-demo