There is a bug in how the expert-strategy reference scores are computed for the maze2d environments.
The incorrect scores result from the expert strategy not being invoked properly.
The experts' true scores should be higher than the currently recorded ones.
Description:
Environment: maze2d
If you use the provided script (scripts/reference_scores/maze2d_controller.py) to compute the score of the expert strategy, it yields inaccurate results.
The WaypointController (the expert strategy) only behaves correctly in the first episode; in subsequent episodes of the maze2d environment it typically fails to reach the goal.
Why does this happen?
The issue arises from the expert strategy implemented in the d4rl/pointmaze/waypoint_controller.py file. Specifically, the get_action function serves as the action selection mechanism for the expert strategy, and it contains the following code snippet:
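(The snippet itself did not survive formatting in this report; the check below is a paraphrase of the logic in d4rl/pointmaze/waypoint_controller.py, not a verbatim copy.)

```python
import numpy as np

def get_action(self, location, velocity, target):
    # Paraphrased: waypoints are recomputed only when the requested target
    # differs from the cached one (self._target); otherwise the previously
    # planned waypoints are reused unchanged.
    if np.linalg.norm(self._target - np.array(self.gridify_state(target))) > 1e-3:
        self._new_target(location, target)
    ...
```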
This code implies that the waypoints will only be recalculated when the endpoint changes.
Given the code in scripts/reference_scores/maze2d_controller.py, self._new_target() is therefore executed only at the start of the first episode, because env.reset() does not change the endpoint. Consequently, in subsequent episodes the waypoints are never recalculated; the waypoints from the first trajectory are reused, and the "optimal" strategy fails to reach the goal.
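A minimal fix is to force a replan at the start of every episode. This is a sketch, assuming the controller caches the goal in controller._target (initialized to a sentinel in the d4rl source) and that the maze2d env exposes env.get_target(); both names follow the d4rl code but may vary by version:

```python
import numpy as np

obs = env.reset()
# Invalidate the cached target so the distance check in get_action() trips
# and the waypoints get recomputed from the new start position:
controller._target = -1000 * np.ones(2)
# ...or, equivalently, replan explicitly for the current (unchanged) goal:
# controller._new_target(obs[0:2], env.get_target())
```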
Experiment
After adding env.render() to scripts/reference_scores/maze2d_controller.py, I observed that the expert strategy indeed fails to reach the target point. The video has been uploaded to Google Drive:
After making modifications to the code, I conducted a re-evaluation of the expert strategy across different environments. The results are presented below:
| env_name | maze2d-umaze-v1 | maze2d-medium-v1 | maze2d-large-v1 |
| --- | --- | --- | --- |
| expert policy (new) | 223.48 | 420.48 | 551.23 |
| expert policy (old) | 161.86 | 277.39 | 273.99 |
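For reference, a sketch of the per-episode evaluation loop with the replan applied after every reset. It only loosely follows scripts/reference_scores/maze2d_controller.py; the constructor argument (env.str_maze_spec), the observation layout (position in obs[0:2], velocity in obs[2:4]), the (action, done) return of get_action as used in the d4rl generation scripts, and env._max_episode_steps are all assumptions that may differ by version:

```python
import gym
import numpy as np
import d4rl  # noqa: F401 -- registers the maze2d environments
from d4rl.pointmaze import waypoint_controller

env = gym.make('maze2d-umaze-v1')
controller = waypoint_controller.WaypointController(env.str_maze_spec)

returns = []
for _ in range(100):
    obs = env.reset()
    # The fix: replan for the (unchanged) goal from the new start position.
    controller._new_target(obs[0:2], env.get_target())
    total = 0.0
    for _ in range(env._max_episode_steps):
        position, velocity = obs[0:2], obs[2:4]
        action, _ = controller.get_action(position, velocity, env.get_target())
        obs, reward, _, _ = env.step(action)
        total += reward
    returns.append(total)

print('average return:', np.mean(returns))
```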
Hi, I think you're right. I trained a Decision Transformer on maze2d-medium-dense-v1 and computed the normalized score as env.get_normalized_score(average return over 100 episodes). However, I obtained a score of 56, which does not align with the maximum score of 35 reported in the paper "QDT: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL".
Have you calculated the expert score for maze2d-medium-dense-v1?
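For context: D4RL's get_normalized_score (see d4rl/offline_env.py) is a linear rescaling between the stored reference random and expert scores, so an understated expert reference score, as described above, inflates every normalized result and can push it past 100. A minimal restatement of the formula:

```python
# d4rl normalizes a raw return linearly between the reference random score
# and the reference expert score; papers usually report 100x this value.
def get_normalized_score(score, ref_min_score, ref_max_score):
    return (score - ref_min_score) / (ref_max_score - ref_min_score)
```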
Hi, I'm also attempting to compute the normalized score via env.get_normalized_score(average return over 100 episodes) on the antmaze task, but I can't reproduce the scores reported in the paper. Have you found a solution to this issue?