Add metrics for received etcd calls #270
Comments
A PR would be welcome. It is unlikely that we will be able to prioritize full parity with etcd's server metrics.
Thanks for your prompt response. My team is unfortunately unable to contribute Go features at this time. Partial, incomplete metrics would likely still be useful.
If you're just trying to figure out whether a "standby" instance is being used, wouldn't the SQL call metrics be a good proxy for client traffic? There will always be a base level of chatter, but fewer client requests will still directly correlate with fewer SQL calls.
From manually peeking at netstat connections in the kine container, we suspect there is no incoming client traffic into the "standby" instance, yet the SQL call metrics are still quite high. The etcd client metrics would be a robust way to get the truth. As a workaround, we're trying to infer etcd client traffic from container network traffic metrics. We're double-checking with Cilium flow metrics to confirm that the standby kine pod is not receiving ingress connections it should not. Note that our setup is somewhat complex; here is a draft sketch.
Our constraint is that cross-AZ bandwidth is limited in our environment (represented as the green WireGuard bars), so we try to favor local-AZ traffic affinity.
I will note that kine doesn't really have a concept of being in standby. Even if there are no client connections, it will continue polling the datastore, converting rows into events (even if they don't get sent to any watchers), processing event TTLs, attempting to perform compaction, and so on. The only thing that won't happen on the "standby" nodes is one-off queries related to direct resource retrieval or modification: basically anything that flows through the "append" or "list" nodes in the diagram at https://github.com/k3s-io/kine/blob/master/docs/flow.md. Everything in the poll loop at the bottom of that diagram runs at all times. With that in mind, you might consider changing your deployment to favor a different architecture, or working on optimizations that disable the poll loop when kine has no clients.
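A minimal sketch of the "disable the poll loop when kine has no clients" idea, assuming a hypothetical `pollGate` counter that watch handlers would increment on connect and decrement on disconnect. This is not kine's code: as noted later in the thread, kine's TTL handling and compaction currently depend on the poll loop, so a real change would have to keep those working.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pollGate is a hypothetical helper, not part of kine: it counts connected
// watch clients so a poll loop could skip datastore queries when none exist.
type pollGate struct {
	watchers atomic.Int64
}

// AddWatcher records a newly connected watch client.
func (g *pollGate) AddWatcher() { g.watchers.Add(1) }

// RemoveWatcher records a watch client that disconnected.
func (g *pollGate) RemoveWatcher() { g.watchers.Add(-1) }

// ShouldPoll reports whether anyone would consume the events a poll produces.
func (g *pollGate) ShouldPoll() bool { return g.watchers.Load() > 0 }

// pollOnce stands in for one iteration of a poll loop. It runs query only
// when the gate reports at least one watcher, and reports whether it ran.
func pollOnce(g *pollGate, query func()) bool {
	if !g.ShouldPoll() {
		return false
	}
	query()
	return true
}

func main() {
	var g pollGate
	queries := 0
	query := func() { queries++ }

	pollOnce(&g, query) // no watchers yet: skipped
	g.AddWatcher()
	pollOnce(&g, query) // one watcher: query runs
	g.RemoveWatcher()
	pollOnce(&g, query) // last watcher left: skipped again
	fmt.Println("queries issued:", queries)
}
```

The counter uses `atomic.Int64` (Go 1.19+) because watch handlers would update it from many goroutines while the poll loop reads it.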
Thanks for highlighting the background work performed by all kine instances. We'll take more time to understand it better. We initially focused on the compactor, as we noticed a high volume of related SQL queries, although we can't yet measure their cumulative bandwidth usage. We were considering staggering the start of kine processes to avoid concurrent execution of the compactor (as a workaround for turning it off with #230). Do you think that could help reduce the overall background activity? Once we confirm that no etcd requests hit the "standby" kine instance, we may need to analyze the background activity further and see whether it could come from an unusual or unoptimized usage pattern of the k8s API. Here are some thoughts on measuring the background activity:
We are considering asking the k8s scheduler to colocate a single kine with the single active pg process using pod affinity rules. When the pg process fails over, we'll need to reschedule the kine pod. The descheduler project looked appealing, but support for descheduling pods that violate podAffinity is pending; see kubernetes-sigs/descheduler#1286. Any other ideas? I am wary, however, of relying too much on the k8s scheduler during recovery from AZ outages, as I fear the failover time will be long. We have to test and measure that failover time.
Only one node will actually compact; the others will skip compaction due to the

Let me take a look at what metrics etcd provides, and see how much work it would be to add some initial ones to kine.
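The comment above is cut off before it names the mechanism kine uses. Purely to illustrate how several nodes can share one compaction schedule while only one does the work, here is a generic optimistic compare-and-swap sketch; whether kine does exactly this is an assumption the thread does not confirm, and `revStore`, `tryCompact`, and `raceCompaction` are hypothetical names. An in-memory row stands in for the datastore.

```go
package main

import (
	"fmt"
	"sync"
)

// revStore simulates one shared datastore row holding the last compacted
// revision. In SQL, compareAndSwap corresponds to a conditional UPDATE whose
// WHERE clause checks the value that was previously read.
type revStore struct {
	mu  sync.Mutex
	rev int64
}

func (s *revStore) load() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.rev
}

// compareAndSwap sets rev to newRev only if it still equals oldRev.
func (s *revStore) compareAndSwap(oldRev, newRev int64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.rev != oldRev {
		return false
	}
	s.rev = newRev
	return true
}

// tryCompact is what every node would run on its compaction interval: read
// the current revision, then try to claim the compaction up to target.
// Nodes that lose the race (or see it already done) skip the work.
func tryCompact(s *revStore, target int64) bool {
	old := s.load()
	if target <= old {
		return false
	}
	return s.compareAndSwap(old, target)
}

// raceCompaction starts `nodes` concurrent compactors against one store and
// returns how many of them actually performed the compaction.
func raceCompaction(nodes int, target int64) (winners int, final int64) {
	s := &revStore{}
	var wg sync.WaitGroup
	var mu sync.Mutex
	for i := 0; i < nodes; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if tryCompact(s, target) {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return winners, s.load()
}

func main() {
	winners, rev := raceCompaction(3, 100)
	fmt.Println("nodes that compacted:", winners, "compacted revision:", rev)
}
```

With this pattern, staggering process start times would not change how many nodes compact; it would only change which node tends to win.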
That sounds like more of a job for a custom operator than something the default scheduler can handle. If you want to control the placement of a pod based on the state of a system that is opaque to Kubernetes (i.e., the primary pg cluster member), that is operator territory.
Thanks a lot.
That sounds like the best option, coupled with having one kine container colocated with each pg container, regardless of whether it is a primary or a read-only replica. This architecture does not rely on the k8s scheduler to provide HA and failover, and should improve uptime in the face of network partitions or AZ loss. Our early attempt at colocating kine in the postgresql pod showed that the kine process exited early when connecting to a read-only pg replica. With the default container start command, this resulted in the whole pod failing, including postgresql. I'm considering options for having the "standby" kine instance wait for the backing store to be available before accepting incoming traffic. Here are some thoughts:
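One way to make the standby instance wait, sketched under assumptions: a small Go wrapper (hypothetical, not part of kine) that polls PostgreSQL with `SELECT pg_is_in_recovery()`, which returns true on a read-only standby, and only returns once the server is a writable primary. For `primaryCheck` to connect, a database/sql driver (for example pgx's `stdlib` package) would have to be imported; the demo uses a fake check so it runs without a database.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"time"
)

// primaryCheck returns a readiness check for PostgreSQL. pg_is_in_recovery()
// is true on a read-only standby, so the server is writable when it is false.
func primaryCheck(db *sql.DB) func(context.Context) (bool, error) {
	return func(ctx context.Context) (bool, error) {
		var inRecovery bool
		err := db.QueryRowContext(ctx, "SELECT pg_is_in_recovery()").Scan(&inRecovery)
		if err != nil {
			return false, err
		}
		return !inRecovery, nil
	}
}

// waitUntilWritable calls check every interval until it reports true or ctx
// is done. Errors such as "connection refused" count as "not ready yet".
func waitUntilWritable(ctx context.Context, check func(context.Context) (bool, error), interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if ok, err := check(ctx); err == nil && ok {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// simulateWait runs waitUntilWritable against a fake check that first
// succeeds on call number readyAfter, giving up after timeoutMs milliseconds.
func simulateWait(readyAfter, timeoutMs int) (calls int, err error) {
	fake := func(context.Context) (bool, error) {
		calls++
		if calls < readyAfter {
			return false, errors.New("connected to a read-only replica")
		}
		return true, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	err = waitUntilWritable(ctx, fake, 5*time.Millisecond)
	return calls, err
}

func main() {
	calls, err := simulateWait(3, 1000)
	if err != nil {
		fmt.Println("gave up waiting:", err)
		return
	}
	// Only at this point would the kine process be started.
	fmt.Println("backing store writable after", calls, "checks")
}
```

Treating query errors as "not ready" means the same loop covers both an unreachable database and a reachable read-only replica.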
I did a quick scan over the etcd metrics; I think these are the ones that we could theoretically support:
Awesome, thanks Brad, that would be very useful!
Regarding metrics: I suspect that the loop you kindly highlighted in #270 (comment), which fetches the content of all rows, accounts for a significant proportion of the kine-to-DB traffic we're observing (kine/pkg/logstructured/sqllog/sql.go, line 428 at commit c1da9bf).
What metrics could help measure the impact of the loop and focus potential future optimization efforts?
The SQL queries would be captured in the existing SQL metrics. Having the poll loop not run while there are no watch clients connected may not be an achievable goal. There are some internal maintenance/cleanup routines (mostly around TTL handling) that rely on having a constant feed of events from the cluster. Compaction will also attempt to run at intervals even if there are no clients. It'll take some experimentation to see what we can do; kine was not designed for active/standby use.
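To single out the poll loop's share of kine-to-database traffic, the SQL metrics would need to be broken down by caller. A minimal sketch of that idea, assuming each query site passes a hypothetical label such as "poll" or "list" (kine's actual metric labels may differ): accumulate calls and returned rows per label, then report each label's share of all rows. The numbers in `main` are illustrative only, not measurements.

```go
package main

import (
	"fmt"
	"sync"
)

// queryStats accumulates calls and returned rows per code-path label. The
// labels used below ("poll", "list") are hypothetical tags a caller attaches.
type queryStats struct {
	mu    sync.Mutex
	calls map[string]int
	rows  map[string]int
}

func newQueryStats() *queryStats {
	return &queryStats{calls: make(map[string]int), rows: make(map[string]int)}
}

// Observe records one executed query and the number of rows it returned.
func (s *queryStats) Observe(label string, rows int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.calls[label]++
	s.rows[label] += rows
}

// RowShare returns the fraction of all returned rows attributed to label.
func (s *queryStats) RowShare(label string) float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	total := 0
	for _, n := range s.rows {
		total += n
	}
	if total == 0 {
		return 0
	}
	return float64(s.rows[label]) / float64(total)
}

func main() {
	s := newQueryStats()
	// Illustrative numbers only: two poll queries and one list query.
	s.Observe("poll", 50)
	s.Observe("poll", 30)
	s.Observe("list", 20)
	for _, label := range []string{"list", "poll"} {
		fmt.Printf("%s: calls=%d rows=%d share=%.2f\n", label, s.calls[label], s.rows[label], s.RowShare(label))
	}
}
```

Rows returned is used as a rough proxy for bytes transferred; a byte counter per label would be a closer match to the bandwidth concern but requires measuring row sizes.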
Thanks again for your feedback.
This new metric would be very useful, thanks!
I still need to refine my analysis w.r.t. the suspected top traffic contributors. I don't yet have an automated mapping from pg statement stats to kine functions, so I currently need to work from samples of the pg statement log. Here is a copy of the pg statement stats, with queries sorted by number of rows returned. $3 does not always appear in the pg logs in my sample, and seems to be $3 = 'compact_rev_key', which rather hints at compactor queries. The queryId=7735331401850268309 associated with the top statement corresponds, in the pg log_statement=all output, to the following, with $1 iterating over many keys.
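Part of this manual sampling could be automated by reading `pg_stat_statements` directly (the extension must be installed and enabled). A hedged sketch follows; `topStatements`, `stmtStat`, and `byRowsPerCall` are hypothetical helpers, a PostgreSQL driver must be registered with database/sql for `topStatements` to run, and the values in `main` are placeholders rather than measurements. Mapping each `queryid` back to a kine function would still mean matching the normalized query text by hand.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"sort"
)

// topStatementsQuery lists the statements that returned the most rows.
const topStatementsQuery = `SELECT queryid, calls, rows, query
FROM pg_stat_statements
ORDER BY rows DESC
LIMIT $1`

// stmtStat holds one row of pg_stat_statements output.
type stmtStat struct {
	QueryID int64
	Calls   int64
	Rows    int64
	Query   string
}

// RowsPerCall separates "called very often" from "returns a lot per call".
func (s stmtStat) RowsPerCall() float64 {
	if s.Calls == 0 {
		return 0
	}
	return float64(s.Rows) / float64(s.Calls)
}

// topStatements runs topStatementsQuery against db.
func topStatements(ctx context.Context, db *sql.DB, limit int) ([]stmtStat, error) {
	rows, err := db.QueryContext(ctx, topStatementsQuery, limit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var out []stmtStat
	for rows.Next() {
		var s stmtStat
		if err := rows.Scan(&s.QueryID, &s.Calls, &s.Rows, &s.Query); err != nil {
			return nil, err
		}
		out = append(out, s)
	}
	return out, rows.Err()
}

// byRowsPerCall sorts stats so the heaviest statements per call come first.
func byRowsPerCall(stats []stmtStat) {
	sort.SliceStable(stats, func(i, j int) bool {
		return stats[i].RowsPerCall() > stats[j].RowsPerCall()
	})
}

func main() {
	// Placeholder values for illustration; real ones come from topStatements.
	stats := []stmtStat{
		{QueryID: 1, Calls: 1000, Rows: 2000, Query: "frequent small query"},
		{QueryID: 2, Calls: 10, Rows: 5000, Query: "rare large query"},
	}
	byRowsPerCall(stats)
	for _, s := range stats {
		fmt.Printf("queryid=%d calls=%d rows=%d rows/call=%.1f %s\n",
			s.QueryID, s.Calls, s.Rows, s.RowsPerCall(), s.Query)
	}
}
```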
Yes, the alternative for us is to have only a single active kine process at any given time, with the "standby" kine instance waiting for the backing store to be available before accepting incoming traffic and starting its compactor and poll loop (see #270 (comment)). Prototyping the container start command retry loop is on my list.
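A minimal sketch of such a retry wrapper, in Go to match kine: it restarts a process with exponential backoff whenever the process exits with an error, so a kine that exits after reaching a read-only replica is retried rather than ending the container's start command. `superviseWithBackoff` and `commandRunner` are hypothetical, the binary name and arguments are placeholders, and a clean exit (nil error) ends supervision.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// superviseWithBackoff calls run until it returns nil or ctx is done,
// doubling the pause between attempts up to maxDelay. It returns the number
// of attempts made.
func superviseWithBackoff(ctx context.Context, run func() error, initial, maxDelay time.Duration) (int, error) {
	delay := initial
	for attempt := 1; ; attempt++ {
		err := run()
		if err == nil {
			return attempt, nil
		}
		fmt.Fprintf(os.Stderr, "attempt %d failed: %v; retrying in %s\n", attempt, err, delay)
		select {
		case <-ctx.Done():
			return attempt, ctx.Err()
		case <-time.After(delay):
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
}

// commandRunner returns a run function that starts the named binary and
// waits for it to exit. The binary and its arguments are placeholders.
func commandRunner(ctx context.Context, name string, args ...string) func() error {
	return func() error {
		cmd := exec.CommandContext(ctx, name, args...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		return cmd.Run()
	}
}

// simulateSupervise runs the supervisor against a fake process that fails
// `failures` times (e.g. after reaching a read-only replica), then exits cleanly.
func simulateSupervise(failures int) (int, error) {
	n := 0
	fake := func() error {
		n++
		if n <= failures {
			return errors.New("connected to a read-only replica")
		}
		return nil
	}
	return superviseWithBackoff(context.Background(), fake, time.Millisecond, 4*time.Millisecond)
}

func main() {
	attempts, err := simulateSupervise(2)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Capping the delay keeps failover latency bounded: once the replica is promoted, the next attempt starts within at most `maxDelay`.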
This is not an event poll or a compaction; the query you're digging into here is a list query. You'll see the same thing with different values for the You'll see
Nope. You'll see a PR linked here when one is available.
Expected behavior
As a kine user
In order to diagnose performance issues such as chatty SQL for a standby kine instance in a multi-server cluster
I need to measure incoming etcd traffic received by each kine instance
Current behavior
Workaround
Measure client-side etcd calls from the K8s API server; however, due to load balancing, these metrics are insufficient to attribute traffic to individual kine instances.