
Why does DPO training not fix ref_model to the initial post-SFT model, and instead initialize it from the policy model after each batch's parameter update? #1801

Triggered via issue December 12, 2024 03:18
Status Success
Total duration 14s
Artifacts

label_issue.yml

on: issues
label_issue: 3s

Annotations

1 warning
label_issue
ubuntu-latest pipelines will use ubuntu-24.04 soon. For more details, see https://github.com/actions/runner-images/issues/10636