Reminder
System Info
llamafactory version: 0.9.0
Platform: Linux-4.18.0-408.el8.x86_64-x86_64-with-glibc2.35
Python version: 3.10.14
PyTorch version: 2.3.1+cu121 (GPU)
Transformers version: 4.43.2
Datasets version: 2.19.0
Accelerate version: 0.30.1
PEFT version: 0.11.1
TRL version: 0.8.6
GPU type: NVIDIA - H100
DeepSpeed version: 0.14.0
Bitsandbytes version: 0.43.0
Reproduction
NA
Expected behavior
PPO has a KL-divergence parameter, `ppo_target`, but after reading through the code I could not find where it is actually used; there is only a single assignment. Moreover, changing this parameter's value during PPO training does not affect the loss or the reward, unless it is set to 0. Could someone please explain?
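For context on how a parameter like `ppo_target` is typically consumed: in TRL (which LLaMA-Factory's PPO trainer builds on), the target KL value usually feeds an adaptive KL controller that adjusts the KL penalty coefficient over the course of training rather than entering the loss directly, so its effect on the logged loss/reward can be subtle. Below is a minimal sketch modeled on TRL's `AdaptiveKLController`; the exact wiring into LLaMA-Factory's `ppo_target` is an assumption on my part, not confirmed from its source.

```python
class AdaptiveKLController:
    """Adapts the KL penalty coefficient so the observed KL tracks `target`.

    Sketch modeled on TRL's AdaptiveKLController; names and constants
    here are illustrative, not LLaMA-Factory's exact code.
    """

    def __init__(self, init_kl_coef: float, target: float, horizon: int):
        self.value = init_kl_coef  # current KL penalty coefficient
        self.target = target       # desired KL divergence (cf. ppo_target)
        self.horizon = horizon     # smoothing horizon in samples

    def update(self, current_kl: float, n_steps: int) -> None:
        # Proportional error between observed and target KL, clipped
        # to +/-0.2 to keep each adjustment small and stable.
        proportional_error = max(min(current_kl / self.target - 1.0, 0.2), -0.2)
        mult = 1.0 + proportional_error * n_steps / self.horizon
        self.value *= mult


# If the observed KL exceeds the target, the penalty coefficient grows;
# if it falls below the target, the coefficient shrinks.
ctl = AdaptiveKLController(init_kl_coef=0.2, target=6.0, horizon=10000)
ctl.update(current_kl=12.0, n_steps=256)  # KL above target -> coefficient rises
```

Under this scheme, `ppo_target` would only shift the trajectory of the KL coefficient gradually, which could explain why small changes to it are hard to see in the loss and reward curves.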
Others
No response