You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, I am a newcomer to drl. When I try to read trpo_step in trpo.py, I notice that you use a linesearch method instead of trust region for numerical optimization. So I want to know why you choose that method and dose it conflict with a "trust region" policy gradient algorithm?
The text was updated successfully, but these errors were encountered:
Hi, I am a newcomer to drl. When I try to read trpo_step in trpo.py, I notice that you use a linesearch method instead of trust region for numerical optimization. So I want to know why you choose that method and dose it conflict with a "trust region" policy gradient algorithm?
The text was updated successfully, but these errors were encountered: