What is the current recommended approach for data parallelism? #9508
-
About 3-4 years ago I used an older data-parallel approach, and I'm wondering what the current recommendation is. I don't have a single very large graph that won't fit in memory; rather, I'm looking to obtain graph-level embeddings, so memory only becomes an issue when I batch my data, and I want to try a larger batch size by splitting the batch across multiple GPUs. FWIW, I will then be applying a contrastive loss to the graphs; I'm not sure if this makes a difference for what the recommended approach would be. Thanks for any help!
-
I'd simply use `DistributedDataParallel` for this use case. An example is here: https://github.com/pyg-team/pytorch_geometric/blob/fbafbc4fc9181e8759ec1f39d9618992793b5fe1/examples/multi_gpu/distributed_batching.py
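
For reference, here is a minimal sketch of that pattern, loosely following the linked `distributed_batching.py` example: each process owns one GPU, a `DistributedSampler` shards the dataset of small graphs across ranks, and DDP all-reduces gradients on the backward pass. The dataset (`TUDataset`/`PROTEINS`), the two-layer GCN encoder, and the placeholder loss are illustrative assumptions only — swap in your own data and contrastive objective over the graph embeddings.

```python
# Sketch: multi-GPU batching of many small graphs with DistributedDataParallel.
# Dataset, model size, and the placeholder loss are illustrative assumptions.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data.distributed import DistributedSampler

from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, global_mean_pool


class GraphEncoder(torch.nn.Module):
    """Two-layer GCN that pools node features into one embedding per graph."""

    def __init__(self, in_dim: int, hidden_dim: int = 64, out_dim: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.lin = torch.nn.Linear(hidden_dim, out_dim)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        x = global_mean_pool(x, batch)  # [num_graphs_in_batch, hidden_dim]
        return self.lin(x)


def run(rank: int, world_size: int):
    os.environ.setdefault('MASTER_ADDR', 'localhost')
    os.environ.setdefault('MASTER_PORT', '12355')
    dist.init_process_group('nccl', rank=rank, world_size=world_size)

    dataset = TUDataset(root='/tmp/PROTEINS', name='PROTEINS')

    # Each rank loads a disjoint shard of graphs, so the effective batch
    # size is batch_size * world_size.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=128, sampler=sampler)

    model = GraphEncoder(dataset.num_features).to(rank)
    model = DistributedDataParallel(model, device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(10):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for data in loader:
            data = data.to(rank)
            optimizer.zero_grad()
            emb = model(data.x, data.edge_index, data.batch)
            # Placeholder objective so the sketch runs end-to-end;
            # replace with your contrastive loss over `emb`.
            loss = emb.pow(2).mean()
            loss.backward()  # DDP all-reduces gradients across ranks here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == '__main__':
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)
```

One thing to keep in mind for a contrastive loss: by default each rank only sees its own shard of the batch, so in-batch negatives are limited to the per-rank batch. If you need negatives from the full global batch, you would gather embeddings across ranks (e.g. with `torch.distributed.all_gather`) before computing the loss.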