Determine reliability of (Prometheus) metrics #447

Open
leplatrem opened this issue Nov 11, 2022 · 5 comments

Comments

@leplatrem
Contributor

Image

On Nov 9 at 16:30, it was reported in PROD that the load balancer heartbeat was called 10M times per minute (184K RPS). The normal heartbeat rate is 48K RPS. The report also showed /stripe_from_pubsub at 14K RPS, which led to 1378 5XX error responses per minute.

Are these numbers real?
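
A quick unit check on the heartbeat figures quoted above, as a rough sketch. The helper names are made up for illustration; only the two input numbers come from the report. The per-minute count and the RPS value do not convert into each other exactly, which may just mean "10M" was rounded or that the two values were read over different windows.

```python
def per_minute_to_rps(calls_per_minute: float) -> float:
    """Convert a per-minute call count to requests per second."""
    return calls_per_minute / 60.0


def rps_to_per_minute(rps: float) -> float:
    """Convert requests per second to a per-minute call count."""
    return rps * 60.0


# Figures taken from the report above.
heartbeat_per_minute = 10_000_000  # "10M times per minute"
heartbeat_rps_reported = 184_000   # "184K RPS"

implied_rps = per_minute_to_rps(heartbeat_per_minute)
implied_per_minute = rps_to_per_minute(heartbeat_rps_reported)

print(f"10M/min implies {implied_rps:,.0f} RPS")
print(f"184K RPS implies {implied_per_minute:,.0f} calls/min")
```

So 10M calls per minute corresponds to roughly 167K RPS, while 184K RPS corresponds to about 11M calls per minute.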

@bsieber-mozilla
Contributor

How would we recreate this event? Turn off ingress in staging and turn it back on?

@bkochendorfer
Member

I tried to recreate it in staging by enabling and disabling the ingress. I did see a spike, but nothing quite like this. Let's keep an eye on it and see if it happens again. I'm wondering if it's possibly related to the performance issues that were fixed by adding an ingress.

@bsieber-mozilla
Contributor

After reviewing the dashboard, it seems that 5xx responses have decreased; however, there have been spikes on the heartbeat. We'll keep this open for now.

@grahamalama
Contributor

Another possible complication here:

In this case, if you had multiple containers, by default, when Prometheus came to read the metrics, it would get the ones for a single container each time (for the container that handled that particular request), instead of getting the accumulated metrics for all the replicated containers.

Is this affecting us now?
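
To make the quoted scenario concrete, here is a small simulation, not real Prometheus client code. `Replica`, `simulate`, and the replica and request counts are all hypothetical. It assumes each container keeps its own in-process counter and that each scrape goes through the load balancer, so it is answered by one arbitrary replica. Whether our deployment is actually scraped this way is exactly the open question.

```python
import random


class Replica:
    """Hypothetical app container with its own in-process request counter."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.requests_total = 0

    def handle_request(self) -> None:
        self.requests_total += 1


def simulate(num_replicas: int = 3, num_requests: int = 9000,
             num_scrapes: int = 5, seed: int = 0):
    rng = random.Random(seed)
    replicas = [Replica(f"replica-{i}") for i in range(num_replicas)]

    # The load balancer spreads incoming requests across the replicas.
    for _ in range(num_requests):
        rng.choice(replicas).handle_request()

    # Each scrape is answered by a single replica, so it only sees that
    # replica's counter, never the sum over all of them.
    scrapes = [rng.choice(replicas).requests_total for _ in range(num_scrapes)]
    true_total = sum(r.requests_total for r in replicas)
    return scrapes, true_total


scrapes, true_total = simulate()
print("values seen by successive scrapes:", scrapes)
print("true total across replicas:      ", true_total)
```

Under these assumptions every scrape under-reports the total, and successive scrapes can move up and down depending on which replica answers. A counter that appears to decrease is treated as a counter reset by Prometheus's `rate()`, so this kind of setup could plausibly distort computed rates.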

@leplatrem
Contributor Author

Screenshot 2023-10-05 at 16 50 09

/cc @ahoneiser
