Reminder
System Info
llamafactory version: 0.9.0
Platform: Linux-4.18.0-408.el8.x86_64-x86_64-with-glibc2.35
Python version: 3.10.14
PyTorch version: 2.3.1+cu121 (GPU)
Transformers version: 4.43.2
Datasets version: 2.19.0
Accelerate version: 0.30.1
PEFT version: 0.11.1
TRL version: 0.8.6
GPU type: NVIDIA - H100
DeepSpeed version: 0.14.0
Bitsandbytes version: 0.43.0
Reproduction
NA
Expected behavior
PPO has a KL-divergence parameter, `ppo_target`, but after reading through the code I could not find where it is actually used; there is only a single assignment. Moreover, changing this parameter's value during PPO training does not affect the loss or the reward, unless it is set to 0. Could someone please explain?
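For context on how a parameter like `ppo_target` is typically consumed: in TRL (which LLaMA-Factory's PPO trainer builds on), the target KL value usually feeds an adaptive KL controller that adjusts the KL penalty coefficient over the course of training rather than entering the loss directly, so its effect on the logged loss/reward can be subtle. Below is a minimal sketch modeled on TRL's `AdaptiveKLController`; the exact wiring into LLaMA-Factory's `ppo_target` is an assumption on my part, not confirmed from its source.

```python
class AdaptiveKLController:
    """Adapts the KL penalty coefficient so the observed KL tracks `target`.

    Sketch modeled on TRL's AdaptiveKLController; names and constants
    here are illustrative, not LLaMA-Factory's exact code.
    """

    def __init__(self, init_kl_coef: float, target: float, horizon: int):
        self.value = init_kl_coef  # current KL penalty coefficient
        self.target = target       # desired KL divergence (cf. ppo_target)
        self.horizon = horizon     # smoothing horizon in samples

    def update(self, current_kl: float, n_steps: int) -> None:
        # Proportional error between observed and target KL, clipped
        # to +/-0.2 to keep each adjustment small and stable.
        proportional_error = max(min(current_kl / self.target - 1.0, 0.2), -0.2)
        mult = 1.0 + proportional_error * n_steps / self.horizon
        self.value *= mult


# If the observed KL exceeds the target, the penalty coefficient grows;
# if it falls below the target, the coefficient shrinks.
ctl = AdaptiveKLController(init_kl_coef=0.2, target=6.0, horizon=10000)
ctl.update(current_kl=12.0, n_steps=256)  # KL above target -> coefficient rises
```

Under this scheme, `ppo_target` would only shift the trajectory of the KL coefficient gradually, which could explain why small changes to it are hard to see in the loss and reward curves.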
Others
No response