I'm curious what rollout strategy Slack uses for index nodes. We roll out one node at a time, but since each index node restart kicks off a corresponding recovery task, IO load on Kafka spikes. I wonder if there is a better strategy. Does Slack have a mechanism or configuration that waits for recovery nodes to re-index and push a segment before the next index node reloads? Or maybe something else?
Replies: 2 comments
@autata for indexers we have a deploy script that deletes up to 10% of pods in parallel, then waits two minutes before continuing to the next phase. It looks something like the following:

#!/bin/bash
# list the pod names for the app (strip the "pod/" prefix)
pods="$(kubectl --context="$cluster" get pods -l app="$app" -o name | cut -d '/' -f 2)"
# delete 10% of the pods in parallel
# (integer division: with fewer than 10 replicas this is 0, so each pod must be Running again before the next is deleted)
let max_not_running=$replicas/10
for pod in $pods; do
  kubectl --context="$cluster" delete pod "$pod" &
  sleep 5
  not_running_pods="$(kubectl --context="$cluster" get pods -l app="$app" | grep -v STATUS | grep -v Running | wc -l)"
  while [[ $not_running_pods -gt $max_not_running ]]; do
    echo "Pausing till $not_running_pods pods get back to Running state before deploying more"
    # wait 2 mins before doing the next iteration
    sleep 120
    not_running_pods="$(kubectl --context="$cluster" get pods -l app="$app" | grep -v STATUS | grep -v Running | wc -l)"
  done
done
# wait until every replica is Running again before declaring the deploy complete
running_pods="$(kubectl --context="$cluster" get pods -l app="$app" | grep -v STATUS | grep Running | wc -l)"
while [[ $running_pods -lt $replicas ]]; do
  let remaining_replicas=$replicas-$running_pods
  echo "running_pods=$running_pods. Waiting for $remaining_replicas pods to reach Running state before completing deploy"
  sleep 10
  running_pods="$(kubectl --context="$cluster" get pods -l app="$app" | grep -v STATUS | grep Running | wc -l)"
done
done

environment:
  app: kaldb-index-example-dev
  cluster: logging-dev-iad
  replicas: 5
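
To make the throttling arithmetic easier to check, here is a minimal sketch of the two calculations the script relies on, pulled out into standalone functions. The function names are hypothetical and not part of the actual deploy script, and `count_not_running` assumes its input is the output of `kubectl get pods --no-headers`, where the third column is STATUS.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of the original deploy script.

# Threshold used by the script: 10% of replicas, using bash integer division.
# The result is truncated, so any replica count below 10 yields 0.
max_not_running() {
  local replicas=$1
  echo $(( replicas / 10 ))
}

# Count pods whose STATUS column is anything other than "Running".
# Assumption: stdin is the output of `kubectl get pods --no-headers`,
# whose columns are NAME READY STATUS RESTARTS AGE.
count_not_running() {
  awk '$3 != "Running" { n++ } END { print n + 0 }'
}
```

With `replicas: 5` as in the environment above, `max_not_running 5` prints 0, so the loop pauses after every deletion until all pods are Running again. With 20 replicas the threshold is 2, and since the check is a strict greater-than, the script keeps deleting until 3 pods are not Running before it pauses. Against a live cluster the counter could be fed with something like `kubectl --context="$cluster" get pods -l app="$app" --no-headers | count_not_running`.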