Why are the dataset requirements for `stage: ppo` not the same as the preference dataset required for `stage: dpo`? Both are RLHF techniques, and PPO computes rewards, which SFT-style datasets typically don't provide.

Relevant example configs:

- LLaMA-Factory/examples/train_lora/llama3_lora_ppo.yaml (line 12 in c93d55b)
- LLaMA-Factory/examples/train_lora/llama3_lora_dpo.yaml (line 13 in c93d55b)
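For context, the dataset lines those two configs point at look roughly like the sketch below. The exact dataset names at commit c93d55b are an assumption on my part; the point is that the PPO example references an SFT-style dataset while the DPO example references a preference dataset:

```yaml
# examples/train_lora/llama3_lora_ppo.yaml (sketch, dataset names assumed)
stage: ppo
dataset: identity,alpaca_en_demo  # SFT-style prompts; rewards are scored online by the reward_model

# examples/train_lora/llama3_lora_dpo.yaml (sketch, dataset name assumed)
stage: dpo
dataset: dpo_en_demo  # preference data with chosen/rejected response pairs
```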