
Help needed: setting batch_size for Qwen2-7B DPO training on Ascend #6315

Open
1 task done
liuanping opened this issue Dec 12, 2024 · 0 comments
Labels
npu: This problem is related to NPU devices
pending: This problem is yet to be addressed

Comments

@liuanping

Reminder

  • I have read the README and searched the existing issues.

System Info

I'm training a Qwen model with DPO on Ascend machines and hitting out-of-memory errors. I can get training to run with certain workarounds, but they feel risky, so I'm asking here for a better solution.
Configuration:
8 Ascend nodes, 64 cards, 64 GB of memory per card
seq_length 4096
micro_batch_size_per_gpu 1
train_batch_size 64
gradient_accumulation_step 1
With this configuration, training runs out of memory partway through.

Later I set seq_length to 2048; memory was then sufficient and training ran to completion.

I originally wanted to set train_batch_size=32 to reduce memory usage, but that raises the error: train_batch_size != micro_batch_size_per_gpu * gradient_accumulation_step * world_size.

My question: is there any way to set train_batch_size to 32 and still train normally, for example by configuring parameters such as PP or TP, so that the batch size can be chosen more flexibly?
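
For context, here is a minimal sketch of the batch-size consistency rule behind this error; the function and parameter names are illustrative assumptions, not the framework's actual API. It assumes the DeepSpeed-style constraint that the global batch size equals micro batch size times gradient accumulation steps times the data-parallel world size, and that tensor/pipeline parallelism shrink the data-parallel world size accordingly.

```python
# Illustrative only: names here are assumptions, not the project's real variables.

def check_batch_config(train_batch_size, micro_batch_size_per_gpu,
                       gradient_accumulation_steps, num_gpus,
                       tensor_parallel=1, pipeline_parallel=1):
    """Return True if the global batch size is consistent with the
    data-parallel world size implied by the TP/PP degrees."""
    # Data-parallel replicas = total GPUs / (TP degree * PP degree)
    dp_world_size = num_gpus // (tensor_parallel * pipeline_parallel)
    expected = micro_batch_size_per_gpu * gradient_accumulation_steps * dp_world_size
    return train_batch_size == expected

# Current setup: 64 GPUs, pure data parallelism -> train_batch_size must be 64.
print(check_batch_config(64, 1, 1, 64))                     # True
print(check_batch_config(32, 1, 1, 64))                     # False -> the reported error
# With a tensor-parallel degree of 2, the data-parallel size drops to 32,
# so train_batch_size=32 would become consistent under this rule.
print(check_batch_config(32, 1, 1, 64, tensor_parallel=2))  # True
```

Under this assumed rule, with 64 cards and pure data parallelism the only way to reach a global batch of 32 is to reduce the data-parallel world size (e.g. via TP or PP), since micro_batch_size_per_gpu and gradient_accumulation_steps are already at 1.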

Reproduction

Script used: torchrun --nnodes=$NNODES --node_rank=$NODE_RANK --nproc_per_node=$NGPUS_PER_NODE --master_addr $MASTER_ADDR --master_port $MASTER_PORT -m train example/train_full/llama_8B_full_train.yaml

Expected behavior

Not applicable.

Others

Not applicable.

@github-actions github-actions bot added the pending (This problem is yet to be addressed) and npu (This problem is related to NPU devices) labels Dec 12, 2024