Right now, to do on-policy, off-policy, or "true" off-policy sampling efficiently, you need to set the "recalculate" flags appropriately.
If you set them incorrectly, you end up with bugs.
We propose changing this so that everything is recalculated by default. The user then always gets the expected output, and only needs to learn the API to speed up their code.
This will need to be accompanied by documentation.
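
As a rough illustration of the proposed default, here is a minimal sketch. It does not use this project's actual API: `CachedScorer`, `score_fn`, and the `recalculate` keyword are all hypothetical names chosen for the example. The idea is only that recalculation is the default and reusing cached values is an explicit opt-in.

```python
from typing import Callable, Dict


class CachedScorer:
    """Hypothetical stand-in for an object that caches per-state values."""

    def __init__(self, score_fn: Callable[[int], float]) -> None:
        self.score_fn = score_fn
        self._cache: Dict[int, float] = {}

    def score(self, state: int, recalculate: bool = True) -> float:
        # Proposed behaviour (assumption): recompute unless the caller
        # explicitly opts out, so the default is always up to date.
        if recalculate or state not in self._cache:
            self._cache[state] = self.score_fn(state)
        return self._cache[state]


# Toy "policy" whose parameters change between calls.
weights = {"scale": 1.0}
scorer = CachedScorer(lambda s: weights["scale"] * s)

first = scorer.score(3)                      # computed with scale 1.0
weights["scale"] = 2.0                       # policy is updated
fresh = scorer.score(3)                      # default: recomputed with scale 2.0
weights["scale"] = 5.0                       # policy is updated again
stale = scorer.score(3, recalculate=False)   # opt-in speed-up: cached value reused
```

With this default, forgetting the flag costs time rather than correctness. Passing `recalculate=False` becomes a deliberate optimisation for cases where the cached values are known to still be valid.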