This discussion was converted from issue #6342 on December 17, 2024 10:16.
Reminder
System Info
Hi author,
I am using a two-stage SFT -> DPO training pipeline, with LoRA fine-tuning at each stage. The commands are as follows:
sft:
```shell
FORCE_TORCHRUN=1 NNODES=$WORLD_SIZE NODE_RANK=$RANK MASTER_ADDR=$MASTER_ADDR \
MASTER_PORT=$MASTER_PORT TORCH_USE_CUDA_DSA=1 CUDA_LAUNCH_BLOCKING=1 \
WANDB_MODE=disabled llamafactory-cli train \
    --model_name_or_path Qwen2-VL-7B-Instruct \
    --stage sft \
    --do_train \
    --finetuning_type lora \
    --deepspeed examples/deepspeed/ds_z3_config.json \
    --dataset $dataset \
    --template qwen2_vl \
    --cutoff_len 1000000 \
    --max_samples 100000000 \
    --preprocessing_num_workers 128 \
    --output_dir $output_dir \
    --logging_steps 10 \
    --save_steps 50 \
    --plot_loss \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --learning_rate 1e-4 \
    --num_train_epochs 3.0 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.1 \
    --bf16 \
    --ddp_timeout 180000000 \
    --val_size 0.05 \
    --per_device_eval_batch_size 1 \
    --eval_strategy steps \
    --eval_steps 10000 \
    --video_maxlen 768 \
    --overwrite_output_dir \
    --overwrite_cache True
```
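Between the two stages, the SFT LoRA is merged back into the base model. For reference, LLaMA-Factory performs such a merge via `llamafactory-cli export`; a config of the shape shipped in its `examples/merge_lora/` directory might look like the following sketch (the adapter path is a hypothetical placeholder, not taken from the commands above):

```yaml
### model
model_name_or_path: Qwen2-VL-7B-Instruct
adapter_name_or_path: path/to/sft_lora_checkpoint  # hypothetical: the SFT output_dir
template: qwen2_vl
finetuning_type: lora

### export
export_dir: ./LLaMA-Factory/saves/qwen2_vl-7b/mix_sft_72b_bs64/1e-4/full_model
```

This would then be run as `llamafactory-cli export merge_config.yaml`, producing the merged `full_model` directory referenced below.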
dpo:
```shell
WANDB_MODE=disabled llamafactory-cli train \
    --model_name_or_path ./LLaMA-Factory/saves/qwen2_vl-7b/mix_sft_72b_bs64/1e-4/full_model \
    --stage dpo \
    --do_train true \
    --finetuning_type lora \
    --lora_target all \
    --pref_beta 0.1 \
    --pref_loss sigmoid \
    --deepspeed examples/deepspeed/ds_z3_config.json \
    --dataset $dataset \
    --template qwen2_vl \
    --cutoff_len 10000000 \
    --max_samples 100000 \
    --preprocessing_num_workers 32 \
    --output_dir $output_dir \
    --logging_steps 10 \
    --save_steps 20 \
    --plot_loss \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --learning_rate 5e-6 \
    --num_train_epochs 10 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.1 \
    --bf16 \
    --ddp_timeout 180000000 \
    --val_size 0.0001 \
    --per_device_eval_batch_size 1 \
    --eval_strategy steps \
    --eval_steps 500 \
    --video_maxlen 128 \
    --overwrite_output_dir \
    --overwrite_cache True
```
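For reference, `--pref_loss sigmoid --pref_beta 0.1` corresponds to the standard DPO objective: the loss on one preference pair is `-log σ(β · ((log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))))`. A minimal numeric sketch of that formula (all log-probability values invented for illustration):

```python
import math

def dpo_sigmoid_loss(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss with the sigmoid formulation (pref_loss=sigmoid)."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) written stably as log(1 + e^{-x})
    return math.log1p(math.exp(-logits))

# When the policy has not moved away from the reference, the margin is 0
# and the loss is log(2) ~= 0.693.
loss = dpo_sigmoid_loss(-10.0, -12.0, -10.0, -12.0, beta=0.1)
```

As the policy widens the chosen/rejected margin relative to the reference, the logits grow and the loss falls below log(2).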
Here, `./LLaMA-Factory/saves/qwen2_vl-7b/mix_sft_72b_bs64/1e-4/full_model` holds the model with the SFT LoRA already merged in.
However, when I print the model before and after DPO training, the two `adapter_config.json` files are completely identical, and the network structure passed into the trainer in workflow.py is also identical, as follows:
```
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2VLForConditionalGeneration(
      (visual): Qwen2VisionTransformerPretrainedModel(
        (patch_embed): PatchEmbed(
          (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
        )
        (rotary_pos_emb): VisionRotaryEmbedding()
        (blocks): ModuleList(
          (0-31): 32 x Qwen2VLVisionBlock(
            (norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
            (norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
            (attn): VisionSdpaAttention(
              (qkv): Linear(in_features=1280, out_features=3840, bias=True)
              (proj): Linear(in_features=1280, out_features=1280, bias=True)
            )
            (mlp): VisionMlp(
              (fc1): Linear(in_features=1280, out_features=5120, bias=True)
              (act): QuickGELUActivation()
              (fc2): Linear(in_features=5120, out_features=1280, bias=True)
            )
          )
        )
        (merger): PatchMerger(
          (ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
          (mlp): Sequential(
            (0): Linear(in_features=5120, out_features=5120, bias=True)
            (1): GELU(approximate='none')
            (2): Linear(in_features=5120, out_features=3584, bias=True)
          )
        )
      )
      (model): Qwen2VLModel(
        (embed_tokens): Embedding(152064, 3584)
        (layers): ModuleList(
          (0-27): 28 x Qwen2VLDecoderLayer(
            (self_attn): Qwen2VLSdpaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=3584, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=3584, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=512, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=512, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=512, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=512, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=3584, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=3584, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (rotary_emb): Qwen2VLRotaryEmbedding()
            )
            (mlp): Qwen2MLP(
              (gate_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=18944, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=18944, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=18944, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=18944, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear(
                (base_layer): Linear(in_features=18944, out_features=3584, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=18944, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=3584, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): Qwen2RMSNorm((0,), eps=1e-06)
            (post_attention_layernorm): Qwen2RMSNorm((0,), eps=1e-06)
          )
        )
        (norm): Qwen2RMSNorm((0,), eps=1e-06)
        (rotary_emb): Qwen2VLRotaryEmbedding()
      )
      (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
    )
  )
)
```
From this structure it looks as though the DPO LoRA is merely initialized from the SFT LoRA, rather than a new LoRA being added on top of the merged model. If I want DPO training to end up merging in a new, separate LoRA, what is the correct way to do this?
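For context, "merging" a LoRA simply folds the low-rank update into the base weight, W' = W + (α/r)·B·A; a fresh adapter created on W' afterwards starts from zero and is independent of the SFT one. A toy sketch of that arithmetic with plain Python lists (shapes and values invented for illustration):

```python
def matmul(A, B):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def merge_lora(W, lora_A, lora_B, alpha, r):
    """Fold a LoRA update into the base weight: W' = W + (alpha/r) * B @ A."""
    scale = alpha / r
    delta = matmul(lora_B, lora_A)  # (out, r) @ (r, in) -> (out, in)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 identity base weight, rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
lora_A = [[1.0, 2.0]]     # (r=1, in=2)
lora_B = [[0.5], [0.25]]  # (out=2, r=1)
W_merged = merge_lora(W, lora_A, lora_B, alpha=1, r=1)
# W_merged == [[1.5, 1.0], [0.25, 1.5]]
```

In practice, peft's `merge_and_unload()` (which `llamafactory-cli export` uses under the hood) performs this per `lora.Linear` layer, leaving plain `Linear` modules behind.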
Reproduction
Expected behavior
No response
Others
No response