Issue with Hopper devices #11

Open
everburstSun opened this issue Dec 19, 2024 · 0 comments

I'm testing the library with the following configuration:

Driver Version: 535.183.01
CUDA Version: 12.2
CUDA Toolkit Version: 12.6
GPU: NVIDIA H100

With the default settings, running the dsscfg program produced the following output:

Begin testing DSSCFG on the CPU (double precision)
CPU iteration 0 F: -0.349732
CPU iteration 0 F: 0.0192875
CPU iteration 0 F: -0.496264
CPU iteration 100 F: -0.888434
CPU iteration 200 F: -1.00093
CPU iteration 300 F: -1.01115
CPU iteration 400 F: -1.01126
CPU iteration 500 F: -1.01129
Timing: 13.946 ms / iteration
Begin testing DSSCFG with CUDA (double precision)
terminate called after throwing an instance of 'thrust::THRUST_200500_600_610_700_720_750_860_NS::system::system_error'
  what():  after reduction step 1: cudaErrorInvalidDevice: invalid device ordinal
Aborted (core dumped)

Then I added 90 to the compute capability list in CMakeLists.txt:

foreach(ComputeCapability 60 61 70 72 75 86 90)
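For context, the H100 reports compute capability 9.0, and the namespace suffix in the exception above (`600_610_700_720_750_860`) appears to match the default list, which has no 9.x entry. Below is a minimal sketch of the compatibility rule as I understand it, assuming the build embeds only native binaries and no PTX: native code runs on a device with the same major version and an equal or higher minor version. The helper `has_native_binary` is hypothetical and is not part of this library or of CUDA.

```cpp
#include <vector>

// Hypothetical helper for illustration only (not a library or CUDA API).
// `device_cc` and the entries of `built` use the CMake list's encoding,
// e.g. 86 for compute capability 8.6 and 90 for 9.0.
// Assumption: only native binaries are embedded (no PTX fallback), so a
// binary is usable when the major versions match and the binary's minor
// version does not exceed the device's minor version.
bool has_native_binary(int device_cc, const std::vector<int>& built) {
    for (int cc : built) {
        const bool same_major = (cc / 10) == (device_cc / 10);
        const bool minor_ok = (cc % 10) <= (device_cc % 10);
        if (same_major && minor_ok) {
            return true;
        }
    }
    return false;
}
```

Under that assumption, `has_native_binary(90, {60, 61, 70, 72, 75, 86})` returns false, while the same call with 90 appended to the list returns true. That would be consistent with the exception disappearing after the change, although it does not explain the stall that follows.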

With this change the program runs without throwing an exception, but the GPU minimization stalls after the iteration 0 output and prints nothing further:

Begin testing DSSCFG on the CPU (double precision)
CPU iteration 0 F: -0.349732
CPU iteration 0 F: 0.0192875
CPU iteration 0 F: -0.496264
CPU iteration 100 F: -0.888434
CPU iteration 200 F: -1.00093
CPU iteration 300 F: -1.01115
CPU iteration 400 F: -1.01126
CPU iteration 500 F: -1.01129
Timing: 20.0913 ms / iteration
Begin testing DSSCFG with CUDA (double precision)
CUDA iteration 0 F: -0.349732
CUDA iteration 0 F: -0.199307
CUDA iteration 0 F: -0.496264

I'm not sure whether this is because the hardware is not supported.
