Where is ppo_target used? #6297

Closed
1 task done
wzongyu opened this issue Dec 10, 2024 · 0 comments
Labels
invalid This doesn't seem right

Comments

wzongyu commented Dec 10, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

llamafactory version: 0.9.0
Platform: Linux-4.18.0-408.el8.x86_64-x86_64-with-glibc2.35
Python version: 3.10.14
PyTorch version: 2.3.1+cu121 (GPU)
Transformers version: 4.43.2
Datasets version: 2.19.0
Accelerate version: 0.30.1
PEFT version: 0.11.1
TRL version: 0.8.6
GPU type: NVIDIA - H100
DeepSpeed version: 0.14.0
Bitsandbytes version: 0.43.0

Reproduction

NA

Expected behavior

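PPO has a KL-divergence parameter, ppo_target, but after reading the code I could not find anywhere it is actually used; it is only ever assigned. Changing its value for PPO training also has no effect on the loss or the reward, unless it is set to 0. Could someone knowledgeable please explain?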
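One possible explanation, offered as a sketch rather than a confirmed answer: a target-KL value of this kind is typically consumed by an adaptive KL controller inside the PPO library rather than in the calling code, which would explain why only an assignment is visible. The class below is modeled on the adaptive controller used by TRL's PPO trainer; whether ppo_target is forwarded to it is an assumption, and `run`, `observed_kl`, and `batch_size` are hypothetical names used only for illustration.

```python
class AdaptiveKLController:
    """Illustrative adaptive KL-coefficient controller (not LLaMA-Factory code)."""

    def __init__(self, init_kl_coef: float, target: float, horizon: int):
        self.value = init_kl_coef  # current KL penalty coefficient
        self.target = target       # assumed to be where ppo_target ends up
        self.horizon = horizon

    def update(self, current_kl: float, n_steps: int) -> None:
        # Relative error between observed KL and the target, clipped to +/-0.2.
        error = current_kl / self.target - 1.0
        error = max(-0.2, min(0.2, error))
        # The coefficient changes by a small multiplicative factor per update.
        self.value *= 1.0 + error * n_steps / self.horizon


def run(target: float, steps: int = 100, batch_size: int = 8,
        observed_kl: float = 3.0) -> float:
    """Hypothetical driver: return the KL coefficient after `steps` updates."""
    ctl = AdaptiveKLController(init_kl_coef=0.2, target=target, horizon=10000)
    for _ in range(steps):
        ctl.update(observed_kl, batch_size)
    return ctl.value


if __name__ == "__main__":
    for target in (1.0, 6.0, 20.0):
        print(target, run(target))
```

In this sketch the target only nudges the KL coefficient, and the clipping plus the large horizon keep each nudge small, so very different targets leave the coefficient close to its initial value over a short run. That would be consistent with the loss and reward appearing unchanged. A target of 0 is a special case because the ratio `current_kl / target` is undefined.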

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 10, 2024
@wzongyu wzongyu closed this as completed Dec 10, 2024
@hiyouga hiyouga added invalid This doesn't seem right and removed pending This problem is yet to be addressed labels Dec 10, 2024

2 participants