How to train the llama 7B model on two Nvidia 3090 GPUs #3459
guijuzhejiang asked this question in Community | Q&A
-
I want to train the llama 7B model with the "colossalai_zero2" strategy on two 3090 GPUs. Even with batch_size=1, the following CUDA out-of-memory error is reported:
OutOfMemoryError: CUDA out of memory. Tried to allocate 12.55 GiB (GPU 0; 23.70 GiB total capacity; 12.58 GiB already allocated; 10.13 GiB free; 12.59 GiB reserved in total by PyTorch)
If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
It seems that the model needs about 25 GB of VRAM, but a single 3090 has only 24 GB, while the two cards together have 48 GB. How should I set the parameters so that the llama 7B model trains across both 3090s?
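For reference, the allocator hint in the error message can be applied as follows. This is a minimal sketch, not part of the original thread; the value 128 is an arbitrary starting point, and the variable must be set before CUDA is first initialized:

```python
# Minimal sketch: apply the max_split_size_mb hint from the error message.
# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator starts,
# so set it before any CUDA work (or export it in the shell instead).
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # 128 MB is an arbitrary starting value

import torch  # import torch only after the variable is set
```

Note that this only reduces fragmentation. It cannot make two 24 GB cards behave like one 48 GB card: ZeRO-2 shards gradients and optimizer states across GPUs but still keeps a full copy of the fp16 model weights (about 14 GB for 7B parameters) on every GPU.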
-
Hi @guijuzhejiang, we have updated How to train with limited resources. Thanks.
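For readers who reach this thread before following the link: the usual fix in this situation is to offload the ZeRO-2 optimizer states to CPU memory. Below is a minimal sketch, assuming the Coati example scripts that shipped with ColossalAI at the time; the import path and constructor arguments are taken from those examples and may have changed since:

```python
# Minimal sketch, assuming the Coati examples bundled with ColossalAI
# around the time of this thread; verify the import path for your version.
from coati.trainer.strategies import ColossalAIStrategy

# The example scripts map "colossalai_zero2" to stage=2 with all states
# kept on GPU. placement_policy="cpu" instead offloads optimizer states
# to host RAM, which is what lets a 7B model fit on 24 GB cards.
strategy = ColossalAIStrategy(stage=2, placement_policy="cpu")
```

In script versions that select strategies by name, this typically corresponds to passing --strategy colossalai_zero2_cpu instead of --strategy colossalai_zero2.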