Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Citus support #424

Open
ivyazmitinov opened this issue Aug 5, 2024 · 7 comments
Open

Citus support #424

ivyazmitinov opened this issue Aug 5, 2024 · 7 comments
Assignees

Comments

@ivyazmitinov
Copy link

Citus is a (fairly) popular extension that adds horizontal scaling capabilities for Postgres. Since with Citus queries become distributed, pg_stat_activity is not sufficient enough to track them, so Citus provides its own views, like citus_stat_activity.

It would be great improvement for DBAs who is using pg_activity for Citus cluster management to support those facilities out-of-box 🙂

@dlax
Copy link
Member

dlax commented Aug 27, 2024

Hi @ivyazmitinov, thank you for your feature request.

I've put together a draft PR to start adding support for Citus at #426, it just queries citus_stat_activity whenever we used to query pg_stat_activity when pg_activity is invoked with the (new) --citus command-line option.

I don't have the system setup (Citus) to test this on my side, so it'd be greatly appreciated if you could give it a try and report back. Thank you!


See https://github.com/dalibo/pg_activity?tab=readme-ov-file#from-source-using-git for how to checkout and install the development version; the target branch for #426 is named citus.

@dlax dlax self-assigned this Aug 27, 2024
@ivyazmitinov
Copy link
Author

Amazing, thank you @dlax!
I will test it out, and also send out a call in the Citus public slack 🙂

@blogh
Copy link
Collaborator

blogh commented Sep 3, 2024

Hi,

Thx for the patch @dlax.

We should probably add these columns for citus [1]:

  • global_pid: Citus global process id associated with the query;
  • nodeid: Citus node id of the node the citus_stat_activity row comes from;
  • is_worker_query: Boolean value, showing if the row is from one of the queries that run on the shards.

I am not sure the former pid column is still useful (local node pid)?

I also think that in the context of Citus joining into pg_locks is not meaningfull since citus_stat_activity
displays global pids whereas pg_locks show local locks. We should instead use citus_lock_waits [1].

The pg_cancel_backendand() and pg_terminate_backend() queries [1] also need to be changed to be fed
the global_pid instead of the "local" pid.

@ivyazmitinov: I'd love to have you opinion, I am not a user of Citus, I only too a quick look at the docs.

[1] https://www.citusdata.com/blog/2022/07/21/citus-stat-activity-views-for-distributed-postgres/
[2] https://docs.citusdata.com/en/stable/develop/api_metadata.html#distributed-query-activity

@blogh
Copy link
Collaborator

blogh commented Sep 3, 2024

Using citus_nodename_for_nodeid(nodeid) instead of nodeid is also probably better.

@ivyazmitinov
Copy link
Author

Absolutly agree, thanks @blogh!

I am not sure the former pid column is still useful (local node pid)?

It is: sometime only one of the distributed queries underperforms, so it is necessary to get down and troubleshoot individual worker

I was not able to test it out myself yet, but definitely will this week 🙂

@ivyazmitinov
Copy link
Author

I was also thinking that, since distributed queries are in a parent-child relationship with a main query on a coordinator, it would be greate to organize a tree structure, similar to subprocesses in htop, but it would require too much effort, I assume...

@ivyazmitinov
Copy link
Author

So, I was asble to test out the implementation. Good news is that it works, citus_stat_activity is queried correctly 🙂
Unfortunatelly, there is much more work to be done to adapt it to the distibuted nature of citus, in addition to those mentioned by @blogh :

  1. Database and system counters don't make much sense in current implementation, since they show only coordinator node, while queries are shown from every node in the cluster, and I am not sure, what is the best way to deal with this. Sum up all the available resources across all nodes in the cluster? Allow to "switch" between nodes to show utilization only for one node? Both?
    image
  2. Since queries are shown from every node in the cluster, and there can be an infinite amount of nodes (and shards per node), the amount of queries in citus_stat_activity grows geometrically and can easily hit tens or hundreds thousands which is almost impossible to manage. So, it is necessary to be able to have "fold/unfold" functionality for distributed queries, for a developer to be able inspect only those distributed queries that they interested in.

I will also try to tackle those issues on the base of #426 later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants