Tortoise scales unreasonably when application misbehaves #405

Open
randytqwjp opened this issue Jun 5, 2024 · 3 comments

Comments

@randytqwjp
Collaborator

https://mercari.slack.com/archives/C6HC4JBKM/p1717478614923889

When pods are stuck in CrashLoopBackOff, Tortoise's recommendation algorithm does not take this into account, and its recommendations scale the HPA's max replicas to an unreasonable amount.

@sanposhiho
Collaborator

> causes recommendations to scale HPA max replicas to an unreasonable amount

Does this mean that Tortoise lowered the HPA's target utilization too much, and the HPA consequently increased the number of replicas?

@randytqwjp
Collaborator Author

There was a bug in the application logic that caused some pods to get stuck in CrashLoopBackOff while utilization on the remaining pods went up. Tortoise then increased max replicas for this service, but since the new pods also got stuck in CrashLoopBackOff, Tortoise kept increasing max replicas.

@sanposhiho
Collaborator

My suggestion is to improve Tortoise so that it checks the status of all Pods and, if the ratio of crashed Pods is higher than some threshold (e.g. 50%), stops changing max replicas (or maybe stops changing any parameters until the situation is stable).
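
A rough sketch of what such a check could look like, written as standalone Go so it runs without a cluster. Everything here is an assumption for illustration, not Tortoise's actual code: the `PodStatus` type, the `crashLoopRatio` and `shouldFreezeMaxReplicas` helpers, and the 0.5 threshold are all hypothetical. The only thing taken from Kubernetes itself is the `CrashLoopBackOff` waiting reason that a container status reports.

```go
package main

import "fmt"

// PodStatus is a hypothetical, simplified view of a Pod. A real controller
// would read the waiting reason from the Pod's container statuses instead.
type PodStatus struct {
	Name          string
	WaitingReason string // e.g. "CrashLoopBackOff"; empty when the container is running
}

// crashLoopRatio returns the fraction of pods currently waiting in
// CrashLoopBackOff. An empty pod list yields 0.
func crashLoopRatio(pods []PodStatus) float64 {
	if len(pods) == 0 {
		return 0
	}
	crashed := 0
	for _, p := range pods {
		if p.WaitingReason == "CrashLoopBackOff" {
			crashed++
		}
	}
	return float64(crashed) / float64(len(pods))
}

// shouldFreezeMaxReplicas reports whether the crashed ratio is strictly
// higher than the threshold, in which case max replicas should be left alone.
func shouldFreezeMaxReplicas(pods []PodStatus, threshold float64) bool {
	return crashLoopRatio(pods) > threshold
}

func main() {
	// Assumed example: three of four pods are crash-looping.
	pods := []PodStatus{
		{Name: "app-1", WaitingReason: "CrashLoopBackOff"},
		{Name: "app-2", WaitingReason: "CrashLoopBackOff"},
		{Name: "app-3", WaitingReason: "CrashLoopBackOff"},
		{Name: "app-4"},
	}
	fmt.Printf("crashed ratio: %.2f, freeze max replicas: %v\n",
		crashLoopRatio(pods), shouldFreezeMaxReplicas(pods, 0.5))
}
```

The comparison is strict, so a service with exactly half of its pods crashing would still be scaled. Whether the boundary should be inclusive, and whether the freeze should cover only max replicas or every parameter, is left open as in the suggestion above.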
