-
Is it possible to select specific GPUs when training on a single machine with multiple GPUs?
-
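While using llamafactory recently, I found that starting training with the command CUDA_VISIBLE_DEVICES=0 FORCE_TORCHRUN=1 NNODES=2 RANK=0 llamafactory-cli train examples/train_lora/glm4-9b-chat-sft.yaml uses every GPU. The CUDA_VISIBLE_DEVICES setting has no effect: even after a specific GPU is selected, training still runs on the other cards, which leads to OOM.
Hardware
3080Ti * 4 and 3090 * 4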
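One way to narrow this down is to check what each process actually sees in its environment. The following is a minimal diagnostic sketch, not part of llamafactory: `visible_gpu_ids` is a hypothetical helper name, and it only parses the variable rather than querying CUDA. Running it under the same environment prefix as the training command would show whether CUDA_VISIBLE_DEVICES reaches the process at all.

```python
import os


def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device identifiers.

    Hypothetical helper for debugging only. Returns None when the variable
    is unset (CUDA then exposes every GPU) and an empty list when it is set
    to an empty string (no GPU is exposed).
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries may be integer indices or GPU UUIDs, so keep them as strings.
    return [part.strip() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    # Print the PID so output from several worker processes can be told apart.
    print(f"pid={os.getpid()} CUDA_VISIBLE_DEVICES -> {visible_gpu_ids()}")
```

If the value printed here is correct but training still touches other cards, the variable is probably being lost or overridden somewhere between the launcher and the worker processes; that is an assumption to verify, not a confirmed cause.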
method
model_name_or_path: /wspace/aigc/weights/THUDM/chatglm4-9b
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json # the model is loaded onto multiple GPUs whether I use z0, z2, z3, or no DeepSpeed at all
dataset
dataset: webqa,webnovel
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
output
output_dir: saves/glm4-9b-chat/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500