At the moment, the environments' action space consists only of the four cardinal directions. This complicates solving some environments analytically, for example lava land, where the mouse can spawn surrounded by lava, or any level where the mouse spawns on the same square as the cheese (though we usually try to avoid the latter).
Consider adding a no-op action, which would simplify these corner cases. The maze-solving code already supports no-op actions.
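To make the proposal concrete, here is a minimal sketch of a grid step function with a fifth, no-op action. The `Action` enum, its ordering, the `DELTAS` table, and the `step` function are all hypothetical names for illustration and are not taken from the actual environment code.

```python
from enum import IntEnum


class Action(IntEnum):
    # The ordering of the four cardinal actions is an assumption.
    UP = 0
    LEFT = 1
    DOWN = 2
    RIGHT = 3
    NOOP = 4  # hypothetical new action: stay in place


# (row, column) displacement for each action.
DELTAS = {
    Action.UP: (-1, 0),
    Action.LEFT: (0, -1),
    Action.DOWN: (1, 0),
    Action.RIGHT: (0, 1),
    Action.NOOP: (0, 0),
}


def step(pos, action, walls):
    """Return the new position; moves into walls or off the grid leave it unchanged."""
    dr, dc = DELTAS[Action(action)]
    r, c = pos[0] + dr, pos[1] + dc
    in_bounds = 0 <= r < len(walls) and 0 <= c < len(walls[0])
    if not in_bounds or walls[r][c]:
        return pos
    return (r, c)
```

With a no-op available, a mouse that spawns on the cheese or is surrounded by lava has a well-defined "stay put" choice, instead of being forced to pick one of the four moves.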
Aside from changing the environments and the level solvers themselves, some other changes would be required, for example to policy heatmap plotting (thankfully, the diamond plots can still work, with the central square representing the no-op action) and to some of the environment demos, such as interactive mode.
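As a rough illustration of the plotting change, the sketch below arranges five action probabilities in a 3x3 layout, with the cardinal actions on the edges and the no-op in the centre. It assumes the diamond plot uses this kind of per-cell layout; `diamond_cells` and the action ordering are hypothetical and not the existing plotting code.

```python
def diamond_cells(probs):
    """Arrange (up, left, down, right, no-op) probabilities in a 3x3 grid.

    Corner cells are unused and set to None; the centre cell holds the no-op.
    """
    up, left, down, right, noop = probs
    return [
        [None, up, None],
        [left, noop, right],
        [None, down, None],
    ]
```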
The main negative side effect is that existing baselines would no longer be compatible with the new environments, because the architecture's type signature would change. This also means old checkpoints would no longer be loadable.
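The incompatibility comes down to the width of the policy output. A minimal sketch of the kind of check that would fail for old checkpoints, assuming the final policy layer is stored as a features-by-actions matrix (`check_head_compatible` and this storage layout are hypothetical):

```python
def check_head_compatible(head_weights, num_actions):
    """Raise ValueError if the policy head's output width differs from num_actions.

    head_weights is assumed to be a list of rows, one per input feature,
    each row holding one weight per action.
    """
    out_width = len(head_weights[0])
    if out_width != num_actions:
        raise ValueError(
            f"checkpoint policy head has {out_width} outputs, "
            f"but the environment has {num_actions} actions"
        )
```

A checkpoint trained with four actions passes with `num_actions=4` and is rejected with `num_actions=5`.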