what is shs? #21

YUYING07 · 2023-03-02T08:44:15Z

    flat_grad_grad_kl = torch.cat([grad.contiguous().view(-1) for grad in grads]).data

    return flat_grad_grad_kl + v * damping

stepdir = conjugate_gradients(Fvp, -loss_grad, 10)

shs = 0.5 * (stepdir * Fvp(stepdir)).sum(0, keepdim=True)

lm = torch.sqrt(shs / max_kl)
fullstep = stepdir / lm[0]

According to the TRPO formula,
$direction = \sqrt{\frac{2\delta}{g^T F^{-1} g}} F^{-1} g$,
so $shs$ should be $g^T F^{-1} g$,
but your code computes it differently. Why?
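
A note on how the two forms may relate: if `conjugate_gradients(Fvp, -loss_grad, 10)` returns roughly $s = -F^{-1} g$ and $F$ is symmetric, then $s^T F s = g^T F^{-1} g$, so `shs` would be $\frac{1}{2} g^T F^{-1} g$ and `lm = sqrt(shs / max_kl)` would be $\sqrt{g^T F^{-1} g / (2\delta)}$. Dividing `stepdir` by `lm` then gives the same scaling as the formula, with the sign flipped because the code minimizes a loss. The sketch below checks this numerically on a toy problem. The 2x2 matrix, the gradient, `max_kl = 0.01`, and the helper `solve_2x2` are all made up for illustration; an exact solve stands in for conjugate gradients, and damping (which the snippet's `Fvp` adds to $F$) is ignored.

```python
import math

# Hypothetical 2x2 symmetric positive-definite matrix standing in for the Fisher matrix F.
F = [[2.0, 0.5],
     [0.5, 1.0]]
g = [1.0, -3.0]   # hypothetical loss gradient
max_kl = 0.01     # KL trust-region radius (delta in the formula)


def matvec(A, x):
    """Matrix-vector product; plays the role of Fvp without damping."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def solve_2x2(A, b):
    """Solve A x = b exactly (illustrative stand-in for conjugate_gradients)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


# Path taken by the snippet: s = F^{-1}(-g), shs = 0.5 * s^T F s.
stepdir = solve_2x2(F, [-x for x in g])
shs = 0.5 * dot(stepdir, matvec(F, stepdir))
lm = math.sqrt(shs / max_kl)
fullstep = [x / lm for x in stepdir]

# Path from the formula: sqrt(2*delta / (g^T F^{-1} g)) * F^{-1} g,
# negated here because the snippet descends on a loss.
Finv_g = solve_2x2(F, g)
gFg = dot(g, Finv_g)
scale = math.sqrt(2 * max_kl / gFg)
textbook_step = [-scale * x for x in Finv_g]

# Quadratic KL estimate of the resulting step, 0.5 * step^T F step.
step_kl = 0.5 * dot(fullstep, matvec(F, fullstep))

print(fullstep, textbook_step, step_kl)
```

Under these assumptions the two step vectors coincide and the quadratic KL estimate of the step equals `max_kl`, which suggests that `shs` is the formula's $g^T F^{-1} g$ with the factor $\frac{1}{2}$ folded in rather than a different quantity.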
