flat_grad_grad_kl = torch.cat([grad.contiguous().view(-1) for grad in grads]).data
return flat_grad_grad_kl + v * damping
stepdir = conjugate_gradients(Fvp, -loss_grad, 10)
shs = 0.5 * (stepdir * Fvp(stepdir)).sum(0, keepdim=True)
lm = torch.sqrt(shs / max_kl)
fullstep = stepdir / lm[0]
According to the TRPO formula, $\text{direction} = \sqrt{\frac{2\delta}{g^T F^{-1} g}}\, F^{-1} g$,
so $\text{shs} = g^T F^{-1} g$,
but your code is different from that. Why?
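
One way to compare the snippet with the formula is a small numerical sketch. It assumes that `stepdir` is (approximately) $F^{-1} g$, that `max_kl` plays the role of $\delta$, and that $F$ is symmetric positive definite. The 2x2 matrix, the gradient values, and the helper names (`matvec`, `dot`, `solve_2x2`) are made up for illustration and are not from the repository. Under those assumptions $s^T F s = g^T F^{-1} g$ for $s = F^{-1} g$, so the factor `0.5` inside `shs` appears to account for the $2$ in $2\delta$:

```python
import math

def matvec(F, v):
    # Dense matrix-vector product; stands in for Fvp(v) without damping.
    return [sum(F[i][j] * v[j] for j in range(len(v))) for i in range(len(F))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve_2x2(F, g):
    # Exact solve of F s = g; stands in for conjugate_gradients (assumption).
    (a, b), (c, d) = F
    det = a * d - b * c
    return [(d * g[0] - b * g[1]) / det, (-c * g[0] + a * g[1]) / det]

# Hypothetical toy values: a symmetric positive-definite F and a gradient g.
F = [[2.0, 0.5], [0.5, 1.0]]
g = [1.0, -3.0]
max_kl = 0.01  # delta in the formula

s = solve_2x2(F, g)  # s = F^{-1} g, the role of stepdir

# Step computed the way the snippet does it.
shs = 0.5 * dot(s, matvec(F, s))   # 0.5 * s^T F s
lm = math.sqrt(shs / max_kl)
step_code = [x / lm for x in s]

# Step computed directly from the formula in the question.
scale = math.sqrt(2.0 * max_kl / dot(g, s))  # g^T F^{-1} g = g . s
step_formula = [scale * x for x in s]

# Quadratic KL estimate of the resulting step; it should equal max_kl.
kl_quadratic = 0.5 * dot(step_code, matvec(F, step_code))

print(step_code, step_formula, kl_quadratic)
```

In this toy setting the two steps agree up to floating-point error. With the damping term and an inexact conjugate-gradient solve, as in the real code, the match would only be approximate.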