Fox config info is not right #532

Open · lars-t-hansen opened this issue Jun 25, 2024 · 2 comments
Labels: component:infra (Shell scripts, cron scripts, web server, etc), pri:high, task:bug (Something isn't working)


lars-t-hansen commented Jun 25, 2024

Looks like the Fox config info needs to be regenerated. gpu-9 is 4xA100, but we've recorded it as 2xH100, so the plots are not looking right. There should probably be some kind of alert for this kind of mismatch, if we're not going to make the config info time sensitive.

  • Merge slurminfo and make-config-file; the current situation is an unmaintainable mess
  • Be sure to update documentation
  • Regenerate fox info and deploy it
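The alert suggested above could be as simple as comparing each node's configured hardware against its most recent sysinfo report. Below is a minimal Python sketch, assuming both sources have already been parsed into a hypothetical NodeHw record (GPU model and count); the real config and sysinfo formats are not shown in this issue, so the names here are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeHw:
    """Hypothetical subset of per-node hardware info (not the real format)."""
    gpu_model: str
    gpu_count: int


def find_mismatches(config: dict, sysinfo: dict) -> list:
    """Return alert strings for nodes whose configured hardware disagrees
    with the latest sysinfo report, or that are missing from the config.

    Nodes that appear only in the config (e.g. nodes that are down and did
    not report) are deliberately not flagged here.
    """
    alerts = []
    for host in sorted(sysinfo):
        reported = sysinfo[host]
        configured = config.get(host)
        if configured is None:
            alerts.append(f"{host}: reported by sysinfo but missing from config")
        elif configured != reported:
            alerts.append(
                f"{host}: config says {configured.gpu_count}x{configured.gpu_model}, "
                f"sysinfo says {reported.gpu_count}x{reported.gpu_model}"
            )
    return alerts


if __name__ == "__main__":
    # The gpu-9 case from this issue: recorded as 2xH100, actually 4xA100.
    config = {"gpu-9": NodeHw("H100", 2)}
    sysinfo = {"gpu-9": NodeHw("A100", 4)}
    for line in find_mismatches(config, sysinfo):
        print(line)
```

Run from cron against each day's sysinfo, a non-empty result would be the signal to regenerate the config.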
lars-t-hansen added the task:bug and component:infra labels on Jun 25, 2024
lars-t-hansen self-assigned this on Aug 27, 2024

lars-t-hansen commented Aug 27, 2024

Actually, this affects the Fox GPU usage numbers, which are becoming a thing (#522), so we should look into this with some more haste.

lars-t-hansen commented

I may be making this too complicated. In practice, we'll bring up a cluster by having it report some initial data, including sysinfo. In principle it should be able to do that without referring to the config file. (The add operation does not require the config to know about every host, though it's good to have an empty config present.) Then, after a day or so, we'll run make-config-info to generate an initial config file from the sysinfo. This will be missing any nodes that are down at the time, but those can be added by hand rather than from the background file; that process is fairly tricky anyway. Until the config file is available we can't use the dashboard or remote sonalyze, but that is not all that important. Once the config file is present, the server has to be restarted again. We probably want it to have some sort of restart functionality anyway.
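The bootstrap step described above (generate the config from sysinfo, then add down nodes by hand) might look roughly like the following Python sketch. The record layout (a "hostname" key plus hardware fields), the function name make_initial_config, and the example host names other than gpu-9 are hypothetical; this is not the actual make-config-info interface.

```python
import json


def make_initial_config(sysinfo_records, manual_entries=None):
    """Build a config mapping hostname -> hardware fields.

    sysinfo_records: iterable of dicts, one per sysinfo report, each with a
    hypothetical 'hostname' key. A later record for the same host replaces
    an earlier one, so if records are in time order the newest data wins.

    manual_entries: hand-written entries for nodes that were down and never
    reported. They are used only for hosts absent from sysinfo, so real
    reports always take precedence over hand-edited guesses.
    """
    config = {}
    for rec in sysinfo_records:
        host = rec["hostname"]
        config[host] = {k: v for k, v in rec.items() if k != "hostname"}
    for host, hw in (manual_entries or {}).items():
        config.setdefault(host, hw)
    # Sort by hostname so the generated file diffs cleanly between runs.
    return dict(sorted(config.items()))


if __name__ == "__main__":
    records = [
        {"hostname": "gpu-9", "gpu_model": "A100", "gpu_count": 4},
        {"hostname": "example-cpu-1", "gpu_model": None, "gpu_count": 0},
    ]
    # Made-up node that was down during the sampling window.
    manual = {"example-down-1": {"gpu_model": "A100", "gpu_count": 4}}
    print(json.dumps(make_initial_config(records, manual), indent=2))
```

Letting reported data override manual entries means a node added by hand is corrected automatically the next time the config is regenerated after it comes back up.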
