Minimum GPU requirements for full-parameter fine-tuning of Qwen2-72B #4615
Replies: 6 comments 2 replies
-
Is your batch size 1? Try reducing cutoff_len by a factor of two or so and see whether it runs.
-
Yes, per_device_train_batch_size=1, but I need to pack sequences up to Qwen's maximum supported length of 32768. Is the GPU memory simply insufficient (am I just too poor?), or is there a less memory-hungry configuration? Below is my zero3+offload config.
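(The asker's actual config was not included in the thread. For readers, here is a minimal sketch of what a DeepSpeed ZeRO-3 + CPU-offload config typically looks like; the values and file name are illustrative assumptions, not the asker's file. It is written as a Python dict dumped to the JSON file that LLaMA-Factory's deepspeed option expects.)

```python
import json

# A minimal ZeRO-3 + CPU-offload sketch (illustrative values only, not the
# asker's actual file). "auto" lets the HF Trainer fill in the matching
# training-argument values.
ds_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

# Write it out; pass the path to LLaMA-Factory via its deepspeed option.
with open("ds_z3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```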
-
Fine-tuning at sequences this long is probably not supported yet; a method for it will be added later. For now I suggest training with an 8k length.
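To see why 32k is so much harder than 8k, here is a back-of-the-envelope activation estimate. All shapes (80 layers, 8192 hidden, bf16) and per-layer byte counts are rule-of-thumb assumptions for a Qwen2-72B-like model; ZeRO-3 shards parameters and optimizer states across GPUs but not activations, so these costs land on every GPU:

```python
# Rough activation memory per GPU (assumptions: Qwen2-72B-like shapes of
# 80 layers x 8192 hidden, bf16; ZeRO-3 does NOT shard activations).
# Real numbers depend on the kernels and implementation details.
LAYERS, HIDDEN, BYTES = 80, 8192, 2

def activations_gb(seq_len: int, ckpt: bool) -> float:
    if ckpt:
        # full gradient checkpointing keeps roughly one bf16 hidden-state
        # tensor per layer boundary
        per_layer = seq_len * HIDDEN * BYTES
    else:
        # without checkpointing, ~34 bytes per hidden element per layer is a
        # common rule of thumb for transformer blocks with flash attention
        per_layer = seq_len * HIDDEN * 34
    return LAYERS * per_layer / 1e9

for seq in (8192, 32768):
    print(f"seq={seq}: {activations_gb(seq, True):6.1f} GB (ckpt) "
          f"{activations_gb(seq, False):7.1f} GB (no ckpt)")
# seq=8192 :   10.7 GB (ckpt)   182.5 GB (no ckpt)
# seq=32768:   42.9 GB (ckpt)   730.1 GB (no ckpt)
```

Even with full gradient checkpointing, roughly 43 GB of activations per GPU plus the ZeRO-3 gathered layer parameters and offload traffic leaves little headroom on an 80 GB A800, which is consistent with the 8k recommendation above.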
-
@hiyouga For future work, could long-sequence fine-tuning perhaps be improved by drawing on this repo? https://github.com/jzhang38/EasyContext
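For context, the core idea behind EasyContext-style long-context training is to never materialize the full attention matrix: K/V are visited block by block (across ranks, in the real ring-attention setting) and combined with an online softmax. Below is a single-process sketch of that accumulation, verified against full attention; it illustrates the math only and is not EasyContext's actual code:

```python
import torch

def chunked_attention(q, k, v, num_chunks):
    """Visit K/V one block at a time and combine the blocks with a
    numerically stable online softmax, so the full (seq x seq) score
    matrix is never materialized."""
    d = q.size(-1)
    out = torch.zeros_like(q)
    row_max = torch.full((q.size(0), 1), float("-inf"))
    row_sum = torch.zeros(q.size(0), 1)
    for k_blk, v_blk in zip(k.chunk(num_chunks), v.chunk(num_chunks)):
        scores = q @ k_blk.T / d ** 0.5               # (seq, block)
        new_max = torch.maximum(row_max, scores.max(dim=-1, keepdim=True).values)
        scale = torch.exp(row_max - new_max)          # rescale old accumulators
        p = torch.exp(scores - new_max)
        out = out * scale + p @ v_blk
        row_sum = row_sum * scale + p.sum(dim=-1, keepdim=True)
        row_max = new_max
    return out / row_sum

torch.manual_seed(0)
q, k, v = (torch.randn(16, 8) for _ in range(3))
reference = torch.softmax(q @ k.T / 8 ** 0.5, dim=-1) @ v
assert torch.allclose(chunked_attention(q, k, v, num_chunks=4), reference, atol=1e-5)
```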
-
A question: when I run full-parameter SFT on qwen2-72b with only 1,000 samples it trains fine, but with the 2M + 1.1M datasets the same configuration runs out of GPU memory. Has anyone run into this?
-
@silvercherry Most likely some very long samples got mixed into the 2M dataset. Compute token-length statistics separately for the 1,000-sample set and the 2M set, then filter out the long samples, as sketched below.
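A minimal sketch of such a length audit and filter, assuming an alpaca-style JSON dataset with instruction/input/output fields (the file paths and field names are hypothetical; adapt them to your schema):

```python
import json
from transformers import AutoTokenizer

# Hypothetical paths and alpaca-style field names; adapt to your dataset.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-72B")

def token_len(sample: dict) -> int:
    text = sample.get("instruction", "") + sample.get("input", "") + sample.get("output", "")
    return len(tokenizer(text).input_ids)

with open("train.json") as f:
    data = json.load(f)

# Length statistics: the max and p99 usually reveal the outliers quickly.
lengths = sorted(token_len(s) for s in data)
print("max:", lengths[-1], " p99:", lengths[int(0.99 * len(lengths))])

MAX_LEN = 32768  # drop anything longer than the training cutoff
kept = [s for s in data if token_len(s) <= MAX_LEN]
print(f"kept {len(kept)}/{len(data)} samples")
with open("train_filtered.json", "w") as f:
    json.dump(kept, f, ensure_ascii=False, indent=2)
```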
-
Reminder
System Info
Reproduction
As the title says: for full-parameter fine-tuning of qwen2-72B with a training max length of 32768, on 128 A800 GPUs (80 GB each) with zero3+offload, is this configuration bound to run out of GPU memory, or is there a way to make it run?
Thanks!
Expected behavior
No response
Others
No response