Read data directly from GPU APIs #87

Open · lars-t-hansen opened this issue Aug 16, 2023 · 9 comments
Labels: enhancement (New feature or request), Later (Low priority / background task), performance

@lars-t-hansen (Collaborator)

Related to #86. Currently we run nvidia-smi and rocm-smi to obtain GPU data. This is bad for several reasons:

  • the output formats are idiosyncratic, undocumented, and unstable
  • sometimes we have to run the commands multiple times to get all the data we need

It would probably be much better to use the cards' programmatic APIs.

On the other hand, needing to link against these C libraries adds to the complexity of sonar and creates a situation where the same sonar binary may not be usable on all systems. A compromise would be to write small (probably C) wrapper programs around the programmatic APIs and invoke them from sonar. Each would need to be run only once per sample and would have a defined, compact output format.
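
A very rough sketch of what such a wrapper could look like on the NVIDIA side, using NVML; the program name, output format, and field selection are just illustrative:

/* Hypothetical standalone wrapper around NVML (name, output format and
   field selection are illustrative only).  Build with something like
   cc -o nv-probe nv-probe.c -lnvidia-ml; sonar would invoke it once per
   sample and parse the compact key=value lines. */
#include <stdio.h>
#include <nvml.h>

int main(void) {
    nvmlReturn_t r = nvmlInit();
    if (r != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit: %s\n", nvmlErrorString(r));
        return 1;
    }
    unsigned int count = 0;
    if (nvmlDeviceGetCount(&count) == NVML_SUCCESS) {
        for (unsigned int i = 0; i < count; i++) {
            nvmlDevice_t dev;
            nvmlUtilization_t util;
            nvmlMemory_t mem;
            /* A failing card should fail individually here rather than
               taking down the query for every card. */
            if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS) {
                printf("gpu=%u error=unreachable\n", i);
                continue;
            }
            if (nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS &&
                nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS) {
                printf("gpu=%u sm%%=%u mem%%=%u mem_used_kib=%llu\n",
                       i, util.gpu, util.memory, mem.used / 1024);
            }
        }
    }
    nvmlShutdown();
    return 0;
}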

@lars-t-hansen (Collaborator Author)

This may be the same issue or a separate one: on a multi-card node it is usually a single card that goes down, but in that case nvidia-smi hangs or errors out for all the cards. Going directly to the API might fix that problem too.

(For NVIDIA at least that problem may be fixable while staying with nvidia-smi: we can enumerate devices by enumerating /dev/nvidia*, then use nvidia-smi -i to probe cards individually. But eight invocations for eight cards is not the situation we'd like to find ourselves in.)
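
For reference, roughly what that per-card probing could look like (the timeout and query fields are just illustrative choices):

/* Sketch: enumerate cards from /dev/nvidia[0-9]* and probe each one with a
   separate, time-limited nvidia-smi -i invocation, so that one wedged card
   cannot hang the query for all of them. */
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    glob_t g;
    if (glob("/dev/nvidia[0-9]*", 0, NULL, &g) != 0)
        return 1;
    for (size_t i = 0; i < g.gl_pathc; i++) {
        const char *idx = g.gl_pathv[i] + strlen("/dev/nvidia");
        char cmd[256];
        snprintf(cmd, sizeof cmd,
                 "timeout 5 nvidia-smi -i %s "
                 "--query-gpu=utilization.gpu,memory.used --format=csv,noheader",
                 idx);
        if (system(cmd) != 0)
            fprintf(stderr, "card %s did not respond\n", idx);
    }
    globfree(&g);
    return 0;
}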

@lars-t-hansen (Collaborator Author)

A related issue: currently on ML9, sonar data (and nvidia-smi) say that one card is at 100% utilization and the others are idle, but nvtop says two cards are running at 100%. It would be useful to try to reduce this discrepancy.

@lars-t-hansen (Collaborator Author)

Re the output format: On the very new node gpu-13.fox, nvidia-smi pmon now has a different output format and sonar is not able to parse it.

@lars-t-hansen (Collaborator Author)

ml1: NVIDIA System Management Interface -- v545.23.08

$ nvidia-smi pmon -c 1 -s u
# gpu         pid  type    sm    mem    enc    dec    command
# Idx           #   C/G     %      %      %      %    name
    0    1174916     C     88     54      -      -    python         
    0    1186862     C      -      -      -      -    python3        
    1    1174916     C     92     53      -      -    python         
    1    1223470     C      -      -      -      -    python3        
    2    1174916     C     89     53      -      -    python         
    2     941737     C      -      -      -      -    python3        

gpu-13.fox: NVIDIA System Management Interface -- v550.54.14

$ nvidia-smi pmon -c 1 -s u
# gpu         pid   type     sm    mem    enc    dec    jpg    ofa    command 
# Idx           #    C/G      %      %      %      %      %      %    name 
    0          -     -      -      -      -      -      -      -    -              
    1          -     -      -      -      -      -      -      -    -              
    2          -     -      -      -      -      -      -      -    -              
    3          -     -      -      -      -      -      -      -    -              

It looks like the sensible thing to do here would be to decode the "# gpu ..." header line and use it as a key into the columns of the data rows.
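
Roughly, that header-keyed parsing could look like this (assuming whitespace-separated columns, as in the dumps above):

/* Sketch: locate the "sm" and "mem" columns from the "# gpu ..." header of
   nvidia-smi pmon and use the positions to index the data rows, so that new
   columns such as jpg/ofa do not break the parse. */
#include <stdio.h>
#include <string.h>

#define MAXCOLS 32

static int split(char *line, char *fields[], int max) {
    int n = 0;
    for (char *tok = strtok(line, " \t\n"); tok != NULL && n < max;
         tok = strtok(NULL, " \t\n"))
        fields[n++] = tok;
    return n;
}

int main(void) {
    char line[512], buf[512];
    char *fields[MAXCOLS];
    int gpu_col = -1, pid_col = -1, sm_col = -1, mem_col = -1;
    while (fgets(line, sizeof line, stdin) != NULL) {
        strcpy(buf, line);
        int n = split(buf, fields, MAXCOLS);
        if (n > 1 && strcmp(fields[0], "#") == 0 && strcmp(fields[1], "gpu") == 0) {
            /* Header: "# gpu pid type sm mem ...".  Data rows have no "#",
               so column i in the header is field i-1 in a data row. */
            for (int i = 1; i < n; i++) {
                if (strcmp(fields[i], "gpu") == 0) gpu_col = i - 1;
                if (strcmp(fields[i], "pid") == 0) pid_col = i - 1;
                if (strcmp(fields[i], "sm") == 0)  sm_col = i - 1;
                if (strcmp(fields[i], "mem") == 0) mem_col = i - 1;
            }
        } else if (line[0] != '#' && sm_col >= 0 && n > mem_col) {
            printf("gpu=%s pid=%s sm=%s mem=%s\n", fields[gpu_col],
                   fields[pid_col], fields[sm_col], fields[mem_col]);
        }
    }
    return 0;
}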

@lars-t-hansen (Collaborator Author)

Going to fork that off as its own bug, and leave this bug to be about the original subject matter.

@lars-t-hansen (Collaborator Author)

As noted here, we would need to build against the NVIDIA library called "nvml" to do this (on NVIDIA). It is poorly documented and part of a larger SDK; it is unclear whether the SDK is needed on every machine or only at build time.

@lars-t-hansen (Collaborator Author)

From last week's Slurm conference: Slurm has been using nvml (and something similar for AMD) to talk to the GPUs, but they are finding this hard to manage - discrepancies between the build system and the deploy system are problematic. Plus, NVIDIA have reportedly been changing the API even after promising not to do so. They are finding that they can get what they need from the /sys filesystem instead, and in Slurm 24.11 the GPU monitoring will be via /sys. We should investigate this for the same reasons.
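
For AMD at least, the amdgpu driver exposes a per-card busy percentage in /sys, so a first sketch of the /sys route could be as simple as the following; what NVIDIA exposes there is something we would have to investigate separately:

/* Sketch: read per-card GPU utilization from sysfs on nodes with the amdgpu
   driver.  gpu_busy_percent is a single integer percentage.  NVIDIA
   coverage under /sys is not established here. */
#include <stdio.h>
#include <glob.h>

int main(void) {
    glob_t g;
    if (glob("/sys/class/drm/card[0-9]*/device/gpu_busy_percent", 0, NULL, &g) != 0)
        return 1;
    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "r");
        if (f == NULL)
            continue;
        int busy = -1;
        if (fscanf(f, "%d", &busy) == 1)
            printf("%s: %d%%\n", g.gl_pathv[i], busy);
        fclose(f);
    }
    globfree(&g);
    return 0;
}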

@lars-t-hansen (Collaborator Author)

$ cat /proc/driver/nvidia/gpus/*/information
Model: 		 NVIDIA GeForce RTX 2080 Ti
IRQ:   		 296
GPU UUID: 	 GPU-35080357-601c-7113-ec05-f6ca1e58a91e
Video BIOS: 	 90.02.17.00.b2
Bus Type: 	 PCIe
DMA Size: 	 47 bits
DMA Mask: 	 0x7fffffffffff
Bus Location: 	 0000:18:00.0
Device Minor: 	 0
GPU Excluded:	 No
Model: 		 NVIDIA GeForce RTX 2080 Ti
IRQ:   		 297
GPU UUID: 	 GPU-be013a01-364d-ca23-f871-206fe3f259ba
Video BIOS: 	 90.02.0b.40.09
Bus Type: 	 PCIe
DMA Size: 	 47 bits
DMA Mask: 	 0x7fffffffffff
Bus Location: 	 0000:3b:00.0
Device Minor: 	 1
GPU Excluded:	 No
Model: 		 NVIDIA GeForce RTX 2080 Ti
IRQ:   		 298
GPU UUID: 	 GPU-daa9f6ac-c8bf-87be-8adc-89b1e7d3f38a
Video BIOS: 	 90.02.0b.40.09
Bus Type: 	 PCIe
DMA Size: 	 47 bits
DMA Mask: 	 0x7fffffffffff
Bus Location: 	 0000:86:00.0
Device Minor: 	 2
GPU Excluded:	 No
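
So at least the static device inventory can be read directly from the filesystem without running nvidia-smi. A sketch of picking it up (which keys to keep is an arbitrary choice here):

/* Sketch: enumerate static card information from /proc/driver/nvidia.
   The files are "Key: <tab> Value" lines as shown above. */
#include <stdio.h>
#include <string.h>
#include <glob.h>

int main(void) {
    glob_t g;
    if (glob("/proc/driver/nvidia/gpus/*/information", 0, NULL, &g) != 0)
        return 1;
    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "r");
        if (f == NULL)
            continue;
        char line[256];
        while (fgets(line, sizeof line, f) != NULL) {
            char *colon = strchr(line, ':');
            if (colon == NULL)
                continue;
            *colon = '\0';
            char *val = colon + 1;
            while (*val == ' ' || *val == '\t')
                val++;
            if (strcmp(line, "Model") == 0 || strcmp(line, "GPU UUID") == 0 ||
                strcmp(line, "Device Minor") == 0)
                printf("%s=%s", line, val);   /* val still ends in '\n' */
        }
        fclose(f);
    }
    globfree(&g);
    return 0;
}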
