⚡️ Enable training with CUDA #59

emprice · 2024-11-24T05:01:03Z

PR implementing the feature suggested in #58. The integer matmul operation needed for a masked autoregressive transform should be explicitly carried out on CPU and then migrated to the current default device. All other tensors get created on the default device anyway. This way, if the user calls torch.set_default_device('cuda'), that preference will be followed.

I have modified the existing tests so that they run both on CPU and with CUDA, to confirm that my change does not break existing code and that the tests pass either way.

emprice · 2024-11-24T19:21:28Z

Looks like torch.set_default_device() and torch.get_default_device() are newer than I realized. torch.set_default_device() is only used for testing, and those could easily be conditionally skipped. Not sure how to fix the issue with torch.get_default_device(). Converting to a NumPy array and then using torch.tensor is a possibility as a workaround.
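One possible way to handle the missing torch.get_default_device() is sketched below. This is only an illustration, not code from this PR: the helper name default_device is hypothetical, and the fallback relies on the fact that a tensor created without an explicit device lands on the current default device.

```python
import torch


def default_device() -> torch.device:
    """Return the current default device (hypothetical helper, not part of zuko)."""
    getter = getattr(torch, "get_default_device", None)
    if getter is not None:
        # Recent PyTorch releases expose the getter directly.
        return getter()
    # Older releases: a freshly created tensor lives on the default device.
    return torch.empty(0).device


# Usage: move an existing tensor to whatever the default device is.
mask = torch.ones(3, 3, dtype=torch.bool).to(default_device())
```

This avoids the round trip through NumPy, at the cost of allocating an empty tensor on older versions.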

francois-rozet

Thank you for your PR. Unfortunately, I think there is a better approach to address #58.

The cast to .int() in MaskedMLP is there because matmul is not possible for boolean tensors on CPU (and CUDA). We can simply cast to .double() instead to support CUDA. I tried in b5fc6cc and it works perfectly.

Concerning the CI, instead of modifying many (but not all) tests, it would be better to add a global pytest option that runs all tests with torch.set_default_device("cuda"). I tried this in 2f325e4, which allows running

pytest --device cuda
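For readers unfamiliar with custom pytest options, a global --device flag could be wired up roughly as follows. This is a minimal sketch under assumptions, not the contents of 2f325e4: it assumes the hooks live in a top-level conftest.py and that the default is "cpu".

```python
# conftest.py (hypothetical sketch, not the code from the commit above)
import torch


def pytest_addoption(parser):
    """Register a --device command-line option (assumed default: cpu)."""
    parser.addoption(
        "--device",
        action="store",
        default="cpu",
        help="default torch device used for the whole test session",
    )


def pytest_configure(config):
    """Apply the chosen device once, before tests are collected and run."""
    torch.set_default_device(config.getoption("--device"))
```

With hooks like these, individual tests need no device-specific code; tensors created without an explicit device simply follow the session-wide default.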

francois-rozet · 2024-11-24T20:11:36Z

zuko/nn.py

+        # PyTorch doesn't support this operation for integer arrays on CUDA devices
+        precedence = (
+            adjacency.cpu().int() @ adjacency.cpu().int().t() == adjacency.sum(dim=-1).cpu()
+        )
+        try:
+            precedence = precedence.to(torch.get_default_device())
+        except AttributeError:
+            precedence = torch.tensor(precedence.detach().cpu().numpy())


The cast to .int() is here because matmul is not possible for boolean tensors on CPU (and CUDA). We can simply cast to .double() instead to support CUDA.
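The suggested cast can be illustrated with a small standalone sketch. The function name precedence_from_adjacency and the example matrix are hypothetical; only the expression itself mirrors the diff above, with .double() in place of .int() and without the CPU round trip.

```python
import torch


def precedence_from_adjacency(adjacency: torch.Tensor) -> torch.Tensor:
    """Compute the precedence mask from a boolean adjacency matrix.

    Hypothetical helper: the matmul is done in double precision,
    so no transfer to CPU is needed.
    """
    a = adjacency.double()
    # Entry (i, j) of a @ a.t() counts the columns shared by rows i and j;
    # it equals the sum of row j exactly when row j is contained in row i.
    return a @ a.t() == a.sum(dim=-1)


# Example adjacency: lower-triangular, created on the default device.
adjacency = torch.tensor(
    [
        [True, False, False],
        [True, True, False],
        [True, True, True],
    ]
)
precedence = precedence_from_adjacency(adjacency)
```

The counts involved are small integers, which double precision represents exactly, so the equality comparison remains reliable.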

emprice · 2024-11-24T21:42:44Z

No longer needed -- #58 has been fixed!

⚡️ Enable training with CUDA

31a2f21

💚 Ensure CI passes with older PyTorch

a98988d

francois-rozet reviewed Nov 24, 2024


emprice closed this Nov 24, 2024
