The computation of "fixed_log_prob" and the corresponding part of the "get_loss" function are exactly the same.
The two parts are executed consecutively, so the two values ("fixed_log_prob" and "log_prob") are exactly the same.
Is there a reason you wrote the code like this?
Hi, I believe the code is correct. If you inspect "fixed_log_prob", it is a constant tensor (not dependent on the network parameters); if you inspect "log_prob", you will see "grad_fn = ..." (it does depend on the network parameters). This is exactly what we want for importance sampling (treat the current parameters as the fixed old policy); the same logic applies to get_kl().
pytorch-trpo/main.py, lines 108 to 119 at e200eb8
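A minimal sketch of the detach pattern described in the reply, not the repo's exact code: the policy network, the `normal_log_density` helper, and the batch shapes below are illustrative stand-ins. It shows why the two log-probability tensors are numerically equal at the first evaluation while only one of them carries gradients.

```python
import math
import torch

def normal_log_density(x, mean, log_std):
    # log N(x; mean, exp(log_std)^2), summed over action dimensions
    var = torch.exp(2 * log_std)
    return (-(x - mean) ** 2 / (2 * var) - log_std
            - 0.5 * math.log(2 * math.pi)).sum(dim=1, keepdim=True)

policy = torch.nn.Linear(4, 2)                    # toy stand-in for the policy network
log_std = torch.zeros(1, 2, requires_grad=True)   # learnable log std of a Gaussian policy

states = torch.randn(8, 4)
actions = torch.randn(8, 2)
advantages = torch.randn(8, 1)

# Old-policy log prob: same forward pass, but under no_grad -> constant tensor, no grad_fn.
with torch.no_grad():
    fixed_log_prob = normal_log_density(actions, policy(states), log_std)

def get_loss():
    # Recomputed from the current parameters, so this tensor carries grad_fn.
    log_prob = normal_log_density(actions, policy(states), log_std)
    # Importance ratio pi_new / pi_old: equal to 1 in value at the first evaluation
    # (log_prob == fixed_log_prob numerically), but its gradient w.r.t. the
    # parameters is non-zero, because only log_prob is differentiable.
    ratio = torch.exp(log_prob - fixed_log_prob)
    return -(advantages * ratio).mean()

loss = get_loss()
loss.backward()  # gradients flow through log_prob only, never through fixed_log_prob
```

So even though the two values coincide when the loss is first evaluated, treating one of them as a constant is what makes the surrogate objective (and its gradient) correspond to importance sampling against a fixed old policy.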