`StackSets` create `Stacks`, which in turn create the underlying resource types `Ingress`, `Service` and `Deployment`. In order to connect these resources at the port level, they must be configured to use the same port number or name. For `StackSets` there are a number of configuration options for setting up this port mapping.

The `backendPort` value under `spec.ingress` must be defined as shown below:
```yaml
apiVersion: zalando.org/v1
kind: StackSet
metadata:
  name: my-app
spec:
  # optional Ingress definition.
  ingress:
    hosts: [my-app.example.org]
    backendPort: 80
  stackTemplate:
    spec:
      version: v1
      replicas: 3
      podTemplate:
        spec:
          containers:
          - name: skipper
            image: ghcr.io/zalando/skipper:latest
            args:
            - skipper
            - -inline-routes
            - '* -> inlineContent("OK") -> <shunt>'
            - -address=:80
            ports:
            - containerPort: 80
```
This will result in an `Ingress` resource where the `service.port.number` value is `80`:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-v1-traffic-segment
spec:
  rules:
  - host: my-app.example.org
    http:
      paths:
      - backend:
          service:
            name: my-app-v1
            port:
              number: 80
```
And since the `podTemplate` of the `StackSet` also defines `containerPort: 80` for the container:
```yaml
containers:
- name: skipper
  image: ghcr.io/zalando/skipper:latest
  args:
  - skipper
  - -inline-routes
  - '* -> inlineContent("OK") -> <shunt>'
  - -address=:80
  ports:
  - containerPort: 80
```
the `Service` created for a `Stack` gets the following generated port configuration:
```yaml
ports:
- name: port-0
  port: 80
  protocol: TCP
  targetPort: 80
```
This ensures that there is a connection from `Ingress -> Service -> Pods`.
If you have multiple ports or containers defined in a pod, it's important that exactly one of the ports maps to the `backendPort` defined for the ingress, i.e. one port of the containers must have the same name or port number as the `backendPort`.
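For illustration, here is a minimal sketch (the second port and the port names are hypothetical, not from the docs) of a pod exposing two ports where only one matches an ingress `backendPort: 80`:

```yaml
# Sketch: with backendPort: 80 in the ingress, only the port named "main"
# matches; the "metrics" port is not wired to the ingress.
containers:
- name: skipper
  image: ghcr.io/zalando/skipper:latest
  ports:
  - containerPort: 80     # same number as backendPort -> used for ingress traffic
    name: main
  - containerPort: 7979   # additional port, e.g. for metrics scraping only
    name: metrics
```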
In some cases you want to expose multiple ports in your service. For this use case it's possible to define the service ports on the `StackSet`.
```yaml
apiVersion: zalando.org/v1
kind: StackSet
metadata:
  name: my-app
spec:
  # optional Ingress definition.
  ingress:
    hosts: [my-app.example.org]
    backendPort: 80
  stackTemplate:
    spec:
      version: v1
      replicas: 3
      # define custom ports for the service
      service:
        ports:
        - name: ingress
          port: 80
          protocol: TCP
          targetPort: 8080
        - name: metrics
          port: 79
          protocol: TCP
          targetPort: 7979
      podTemplate:
        spec:
          containers:
          - name: skipper
            image: ghcr.io/zalando/skipper:latest
            args:
            - skipper
            - -inline-routes
            - 'Path("/metrics") -> inlineContent("app_amazing_metric 42") -> <shunt>'
            - -inline-routes
            - '* -> inlineContent("OK") -> <shunt>'
            - -address=:8080  # listen on the port targeted by the "ingress" service port
            ports:
            - containerPort: 8080
              name: ingress
            - containerPort: 7979
              name: metrics
```
Here you must make sure that the value used for `spec.ingress.backendPort` also maps to one of the ports in the `Service`, either by name or port number. Additionally, the service ports should map to the corresponding container ports, also by name or port number.
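As a minimal sketch (assuming `backendPort` accepts a port name as well as a number, which the mapping rules above imply), the ingress in the example could reference the service port named `ingress` instead of its number:

```yaml
spec:
  ingress:
    hosts: [my-app.example.org]
    backendPort: ingress   # matches the service port named "ingress" (port 80 -> targetPort 8080)
```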
A Horizontal Pod Autoscaler (HPA) can be attached to the deployment created by the stackset. HPAs can be used to scale the number of pods based on metrics from different sources. Specifying an HPA for a deployment allows the stack to scale up during periods of higher traffic and then scale back down during off-peak hours to save costs.

HPAs can be specified via the `autoscaler` field. This is then resolved by the stackset-controller, which generates an HPA with an equivalent spec. Currently, the autoscaler can be used to specify scaling based on the following metrics:
- `CPU`
- `Memory`
- `AmazonSQS`
- `PodJSON`
- `Ingress`
- `RouteGroup`
- `RequestsPerSecond`
- `ZMON`
- `ScalingSchedule`
- `ClusterScalingSchedule`
Note: Depending on the metric types specified, you may also need to deploy the kube-metrics-adapter in your cluster.
The following is an example using the `autoscaler` field to generate an HPA with CPU metrics, memory metrics and an external metric based on the AmazonSQS queue size:
```yaml
autoscaler:
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: AmazonSQS
    queue:
      name: foo
      region: eu-west-1
    average: 30
  - type: CPU
    averageUtilization: 80
    # optional: scale based on metrics from a single named container as
    # opposed to the average of all containers in a pod.
    container: "app"
  - type: Memory
    averageUtilization: 80
    # optional: scale based on metrics from a single named container as
    # opposed to the average of all containers in a pod.
    container: "app"
```
Here the stackset would be scaled based on the length of the Amazon SQS queue so that there are no more than 30 items in the queue per pod. The autoscaler also tries to keep CPU usage in the pods below 80% by scaling. If multiple metrics are specified, the HPA calculates the number of pods required per metric and uses the highest recommendation.
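For instance (an illustrative calculation, not from the docs): with the configuration above, a queue holding 120 messages would on its own call for roughly ceil(120 / 30) = 4 pods; if the CPU metric at the same time only needs 2 pods, the HPA follows the higher recommendation and runs 4 pods, always bounded by `minReplicas` and `maxReplicas`.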
JSON metrics exposed by the pods are also supported. Here's an example where the pods expose metrics in JSON format on the `/metrics` endpoint on port 9090. The key for the metric should be specified as well.
```yaml
autoscaler:
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: PodJson
    endpoint:
      port: 9090
      path: /metrics
    key: '$.http_server.rps'
    average: 1k
```
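Note that `average` is a Kubernetes quantity, so `1k` here targets an average of 1000 per pod for the value found at the `$.http_server.rps` key.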
If Skipper is used for ingress in the cluster, then scaling can also be done based on the requests received by the stack. The following autoscaler metric specifies that the number of requests per second per pod should not be more than 30.
```yaml
autoscaler:
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Ingress
    average: 30
```
If using `RouteGroup` instead of `Ingress`, then the following config is the equivalent:
```yaml
autoscaler:
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: RouteGroup
    average: 30
```
If using an external ingress, in other words using neither `RouteGroup` nor `Ingress`, the recommended way to scale your applications on an RPS metric is the `RequestsPerSecond` type, as in the following example:
```yaml
autoscaler:
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: RequestsPerSecond
    average: 30
    requestsPerSecond:
      hostnames:
      - 'example.com'
      - 'foo.bar.baz'
```
The RPS measured for the specified hostnames is weighted by the amount of traffic the stack is getting. For example: let's say a traffic switch is happening from `stack-A` to `stack-B`, and in the current state 50% of the traffic is being routed to the `stack-B` backends. When calculating the RPS, the metric takes the total traffic to `example.com` and `foo.bar.baz` and sums it up; the final value is then multiplied by the traffic weight, in this case 50%, resulting in something like:

`sum(traffic('example.com'), traffic('foo.bar.baz')) * 0.5`

This final value is compared to the value in the `average` field, in this example 30. If the final number is bigger than 30, the backend will scale out; otherwise it will stay the same.
Note that the `hostnames` field can accept as many hostnames as you want.
If ZMON-based metrics are supported in your cluster, you can enable scaling based on ZMON checks as shown in the following metric configuration:
```yaml
autoscaler:
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: ZMON
    zmon:
      checkID: "1234"
      key: "custom.value"
      duration: "5m"
      aggregators:
      - avg
      tags:
        application: "my-app"
    average: 30
```
Metrics to scale based on time are also supported. These rely on the `ScalingSchedule` collectors. The following is an example metrics configuration for both `ScalingSchedule` and `ClusterScalingSchedule` resources:
```yaml
autoscaler:
  minReplicas: 1
  maxReplicas: 30
  metrics:
  - type: ClusterScalingSchedule
    # The average value per pod of the returned metric
    average: 1000
    clusterScalingSchedule:
      # The name of the deployed ClusterScalingSchedule object
      name: "cluster-wide-scheduling-event"
  - type: ScalingSchedule
    # The average value per pod of the returned metric
    average: 10
    scalingSchedule:
      # The name of the deployed ScalingSchedule object
      name: "namespaced-scheduling-event"
```
The stackset-controller has alpha support for prescaling stacks before directing traffic to them. That is, if you deploy your stacks with Horizontal Pod Autoscaling (HPA) enabled, then you might have the current stack scaled to 20 pods while a new stack is initially deployed with only 3 pods. In this case you want to make sure that the new stack is scaled to 20 pods before it gets any traffic, otherwise it might die under the unexpectedly high load, and the HPA would not be able to react and scale up fast enough.
To enable prescaling support, you simply need to add the `alpha.stackset-controller.zalando.org/prescale-stacks` annotation to your `StackSet` resource:
```yaml
apiVersion: zalando.org/v1
kind: StackSet
metadata:
  name: my-app
  annotations:
    alpha.stackset-controller.zalando.org/prescale-stacks: "yes"
    # alpha.stackset-controller.zalando.org/reset-hpa-min-replicas-delay: 20m # optional
spec:
  ...
```
The prescaling works as follows:

1. A user directs/increases traffic to a stack.
2. Before the stack gets any traffic, the controller calculates a prescale value `n` of replicas based on the sum of all stacks currently getting traffic.
3. The HPA of the stack gets its `MinReplicas` value set equal to the prescale value calculated in step 2.
4. Once the stack has `n` ready pods, traffic is switched to it.
5. After the traffic has been switched and has been running for 10 minutes, the HPA `MinReplicas` is reset back to what is configured in the stackset, allowing the HPA to scale down in case the load decreases for the service.
The default delay for resetting the `MinReplicas` of the HPA is 10 minutes. You can configure this time by setting the `alpha.stackset-controller.zalando.org/reset-hpa-min-replicas-delay` annotation on the stackset.
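For example, to use a 20 minute delay instead of the default, set the optional annotation shown commented out in the example above:

```yaml
metadata:
  annotations:
    alpha.stackset-controller.zalando.org/prescale-stacks: "yes"
    alpha.stackset-controller.zalando.org/reset-hpa-min-replicas-delay: 20m
```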
Note: Even if you switch traffic gradually, e.g. 10%...20%...50%...80%...100%, the controller will still prescale based on the sum of all stacks getting traffic within each step. This means that it might overscale for some minutes before the HPA kicks in and scales back down to the needed resources. Reliability is favoured over cost in the prescale logic.
To explain further, the amount of this overscale is decided as follows. Imagine a case with `minReplicas: 20` and `maxReplicas: 40`, where traffic is switched in steps of 1%, 25%, 50% and 100%. Additionally, imagine the existing stack has 20 replicas running.
- Upon switching 1% of the traffic to the new stack, the prescale amount should be 1% of 20 (the sum of the sizes of all stacks currently receiving traffic). However, since this is less than `minReplicas`, the stack is scaled up to `minReplicas`, i.e. 20 replicas.
- Before switching to 25% of the traffic, the stack is increased in size by 25% of 40 = 10 (20 from the old stack + 20 from the new), so the new stack is scaled to 20 + 10 = 30 replicas before the switch.
- Before switching to 50%, the same follows: the prescale is now 50% of 50 = 25 (30 replicas of the new stack + 20 replicas of the old), so the new stack should be scaled to 55 replicas; however, since `maxReplicas` is set to 40, the stack size is set to 40.
- Similarly, when 100% of the traffic is to be switched, the `maxReplicas` limit of 40 is enforced.
External controllers can create routes based on multiple Ingress, FabricGateway, RouteGroup, SMI, Istio or other CRDs.

Known controllers that support this:

- https://github.com/zalando-incubator/fabric-gateway
- `<add-yours-here>`
Users need to provide the `backendPort` value under `spec.externalIngress` as shown below:
```yaml
apiVersion: zalando.org/v1
kind: StackSet
metadata:
  name: my-app
spec:
  externalIngress:
    backendPort: 8080
```
Clients set the traffic switch states in `spec.traffic`. External controllers should read the actual traffic switch status from `status.traffic`. The `status.traffic` field has all the information needed to link to the right Kubernetes Service.
```yaml
apiVersion: zalando.org/v1
kind: StackSet
metadata:
  name: my-app
spec:
  traffic:
  - stackName: my-app-v1
    weight: 80
  - stackName: my-app-v2
    weight: 20
status:
  observedStackVersion: v2
  stacks: 2
  stacksWithTraffic: 1
  traffic:
  - stackName: my-app-v1
    serviceName: my-app-v1
    servicePort: 8080
    weight: 100
  - stackName: my-app-v2
    serviceName: my-app-v2
    servicePort: 8080
    weight: 0
```
Important: with `externalIngress` set, the StackSet controller won't create Ingress resources. Therefore, the external controller is also responsible for versioning the external Ingress.
`RouteGroup` is a Skipper-specific CRD and offers a more powerful routing configuration than Ingress. For example, say you want to redirect `/login` to your OpenID Connect provider if the cookie "my-login-cookie" is not set. Here is how you can do that:
```yaml
apiVersion: zalando.org/v1
kind: StackSet
metadata:
  name: my-app
spec:
  routegroup:
    additionalBackends:
    - name: theShunt
      type: shunt
    backendPort: 9090
    hosts:
    - "www.example.org"
    routes:
    - pathSubtree: "/"
    # a route with more predicates has more weight than the redirect route
    - path: "/login"
      predicates:
      - Cookie("my-login-cookie")
    - path: "/login"
      # backends defined within a route overwrite the default stack backends
      backends:
      - backendName: theShunt
      filters:
      - redirectTo(308, "https://login.example.com")
  stackTemplate:
    spec:
      version: v1
      replicas: 3
      podTemplate:
        spec:
          containers:
          - name: skipper
            image: ghcr.io/zalando/skipper:latest
            args:
            - skipper
            - -inline-routes
            - 'r0: * -> inlineContent("OK") -> <shunt>; r1: Path("/login") -> inlineContent("login") -> <shunt>;'
            - -address=:9090
            ports:
            - containerPort: 9090
```
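With this configuration, `/login` requests that carry the "my-login-cookie" cookie, as well as requests to all other paths, should be routed to the stack's backends, while `/login` requests without the cookie match the redirect route and receive a `308` redirect to `https://login.example.com`.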