How to train the llama 7B model on two Nvidia 3090 GPUs #3459
guijuzhejiang asked this question in Community | Q&A
-
I want to train the llama 7B model with the "colossalai_zero2" strategy on two 3090 GPUs. Even with batch_size=1, the following CUDA out-of-memory error is reported:
OutOfMemoryError: CUDA out of memory. Tried to allocate 12.55 GiB (GPU 0; 23.70 GiB total capacity; 12.58 GiB already allocated; 10.13 GiB free; 12.59 GiB reserved in total by PyTorch)
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
It seems that the model needs about 25 GB of VRAM, but a single 3090 has only 24 GB, while the two cards together have 48 GB. How should I set the parameters so that the llama 7B model trains across both 3090s?
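For reference, the allocator hint in the error message can be applied as follows. This is a minimal sketch, not part of the original thread; the value 128 is an arbitrary starting point, and the variable must be set before CUDA is first initialized:

```python
# Minimal sketch: apply the max_split_size_mb hint from the error message.
# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator starts,
# so set it before any CUDA work (or export it in the shell instead).
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # 128 MB is an arbitrary starting value

import torch  # import torch only after the variable is set
```

Note that this only reduces fragmentation. It cannot make two 24 GB cards behave like one 48 GB card: ZeRO-2 shards gradients and optimizer states across GPUs but still keeps a full copy of the fp16 model weights (about 14 GB for 7B parameters) on every GPU.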
-
Hi @guijuzhejiang, we have updated How to train with limited resources. Thanks.
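For readers who reach this thread before following the link: the usual fix in this situation is to offload the ZeRO-2 optimizer states to CPU memory. Below is a minimal sketch, assuming the Coati example scripts that shipped with ColossalAI at the time; the import path and constructor arguments are taken from those examples and may have changed since:

```python
# Minimal sketch, assuming the Coati examples bundled with ColossalAI
# around the time of this thread; verify the import path for your version.
from coati.trainer.strategies import ColossalAIStrategy

# The example scripts map "colossalai_zero2" to stage=2 with all states
# kept on GPU. placement_policy="cpu" instead offloads optimizer states
# to host RAM, which is what lets a 7B model fit on 24 GB cards.
strategy = ColossalAIStrategy(stage=2, placement_policy="cpu")
```

In script versions that select strategies by name, this typically corresponds to passing --strategy colossalai_zero2_cpu instead of --strategy colossalai_zero2.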