Ideas for "sonar analyze" #43
How I fetch data so that I can test on real data locally:
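The actual command wasn't captured here, but the general pattern is to copy a slice of the cluster's sonar log tree to a local directory and run the analysis against that. A minimal sketch, with the host name and data path as placeholders:

```bash
#!/bin/bash
# Minimal sketch: pull one month of sonar logs down from the cluster for local testing.
# The host name and data path below are placeholders, not the real installation.
REMOTE=login.example-cluster.no          # placeholder login host
DATA=/cluster/shared/sonar/data          # placeholder root of the sonar log tree
mkdir -p ./testdata/2023/10
rsync -av --include='*/' --include='*.csv' --exclude='*' \
  "$REMOTE:$DATA/2023/10/" ./testdata/2023/10/
```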
I agree, these cover all the non-automated use cases for https://github.com/NAICNO/Jobanalyzer as well, and I think the automated use cases will require a superstructure separate from sonar (but using sonar data).
@benteb is working on this part. More ideas and suggestions are most welcome.
As we're going to need to synthesize jobs for batchless systems (see #56), we're also going to need a way to list those jobs from the logs, so I'm sketching out a utility for that over in https://github.com/NAICNO/Jobanalyzer; see the subdirectory there. Edit:
It turns out (see above comment) that what I've been doing for Jobanalyzer is basically
Thank you! I will check it out. Definitely better to pool resources than to duplicate efforts. It is very possible that we might not need
I think that this is "done" and that we should move specific, detailed requests to the Jobanalyzer repo and try to resolve them there. In that repo there is a top-level directory, adhoc-reports/, that has some sample applications. These are primitive but evocative. For example, here's a rundown of the jobs Pubudu (a colleague here) ran on the ML cluster over the last 90 days:
(The columns are CPU seconds, GPU seconds, and the command names of the processes in the job. Most of his work is implemented as these immense pipelines, and he runs a lot of experiments.) This is a simple bash+awk script that executes remote sonalyze queries against the host that is the keeper of the sonar data. The script is not production ready - it needs parameters, formatting, headers, etc. - but it's getting us toward what the present issue is asking for.
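For reference, the shape of such a script is roughly the following. This is a sketch, not the actual adhoc-reports script: the sonalyze subcommand, flags, and format field names are assumptions (check `sonalyze jobs --help` for the real spelling), and the host and user names are placeholders.

```bash
#!/bin/bash
# Sketch of the remote-query pattern: ask the host that keeps the sonar data for one
# user's jobs over the last 90 days, then lay the result out locally with awk.
# Flags and format field names are assumptions; the real sonalyze options may differ.
user="someuser"                               # placeholder user name
keeper="sonar-keeper.example.org"             # placeholder host holding the sonar data
ssh "$keeper" \
  "sonalyze jobs --user $user --from 90d --fmt=csv,cputime/sec,gputime/sec,cmd" |
awk -F, 'BEGIN { printf "%-12s %-12s %s\n", "cpu-sec", "gpu-sec", "commands" }
         { printf "%-12s %-12s %s\n", $1, $2, $3 }'
```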
Hacked up another one: top 25 jobs by program name on Fox for the last week. The columns are command name, CPU seconds, and percentage of accumulated CPU time:
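The output itself isn't reproduced here, but the aggregation behind such a report is roughly the following. It is a sketch only; the input is assumed to be CSV rows shaped like the query output above (CPU seconds, GPU seconds, command), and `jobs.csv` is a placeholder file name.

```bash
#!/bin/bash
# Sketch: sum CPU seconds per command, compute each command's share of the total,
# and print the 25 largest. Assumes CSV input of the form cpu-seconds,gpu-seconds,command.
awk -F, '
  { cpu[$3] += $1; total += $1 }
  END {
    for (c in cpu)
      printf "%-30s %12d %6.1f%%\n", c, cpu[c], 100 * cpu[c] / total
  }' jobs.csv | sort -k2 -rn | head -25
```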
And for hack value, the same on Saga (because I'm stealing the current sonar data from Saga):
This highlights a couple of things. One is the annoying appearance of slurmstepd (I think this is a Saga artifact; I don't see it on Fox, possibly because sonar on Saga does not use the same process filtering as on Fox). The other is that many jobs are pipelines or trees of processes, which makes it a little more annoying to say that a particular program used so-and-so much compute.
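Until the process filtering is harmonized between installations, one cheap workaround on the analysis side is to drop known infrastructure processes before aggregating. A sketch only; which command names to exclude is a judgment call, and the input shape matches the CSV assumed above.

```bash
# Drop infrastructure processes (here slurmstepd and sshd) from the CSV rows
# before the aggregation step; extend the exclusion list as needed.
awk -F, '$3 != "slurmstepd" && $3 != "sshd"' jobs.csv
```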
Big picture goals: