
Not learning reward? #42

Open
IanWangg opened this issue Aug 17, 2021 · 3 comments

IanWangg commented Aug 17, 2021

Hi,

In the original codebase, the algorithm does not learn the reward function; instead, the agent computes cumulative reward using the true reward function taken directly from the environment. That means we are using ground-truth information from the environment. Is that allowed in the offline setting? Also, after changing the 'learn_reward' option to True in the config file, the performance is much worse.
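To illustrate what I mean, here is a rough sketch of the difference between the two settings during a model rollout. This is not code from this repository; the names `true_reward_fn`, `LearnedRewardModel`, and `rollout_reward` are made up for illustration:

```python
import numpy as np

def true_reward_fn(obs, act, next_obs):
    # Stand-in for the analytic reward taken straight from the environment
    # definition; the real expression depends on the task.
    return 1.0 - 1e-3 * float(np.sum(np.square(act)))

class LearnedRewardModel:
    """Stand-in for a reward head fit only on the offline dataset."""
    def __init__(self, weights):
        self.weights = weights

    def predict(self, obs, act, next_obs):
        feats = np.concatenate([obs, act, next_obs])
        return float(feats @ self.weights)

def rollout_reward(obs, act, next_obs, learn_reward, reward_model=None):
    if learn_reward:
        # 'learn_reward' = True: reward comes from the model trained offline.
        return reward_model.predict(obs, act, next_obs)
    # 'learn_reward' = False: reward comes from the ground-truth environment
    # function, i.e. privileged information in the offline setting.
    return true_reward_fn(obs, act, next_obs)
```

With 'learn_reward' = False the rollouts never suffer from reward-model error, which may explain part of the performance gap when switching the flag.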

IanWangg reopened this Aug 18, 2021
IanWangg (Author) commented

I cannot reproduce the results reported in the paper on the D4RL benchmark when using the learned reward model.


Valerio-Colombo commented Jan 10, 2022

Hi,
I'm having the same problem with the provided Hopper example. Has anyone solved it?
The rollout reward is stuck at very low values.

Valerio-Colombo commented

I may have a solution. The README suggests changing the configuration file to set 'learn_reward' = True and 'reward_file' = None, but setting 'reward_file' = None makes execution stop with an error. What works instead is to keep the 'reward_file' field and comment out the reward function defined inside that file; if you simply comment out the 'reward_file' flag itself, you also lose the termination function used for the rollouts.
At least in the Hopper example, these changes fixed the problem.
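Roughly, the edited reward file could look something like the sketch below. The exact function names and signatures in this repository may differ; the point is only to comment out the reward while keeping the termination rule:

```python
import numpy as np

# Reward function commented out so the learned reward model is used instead:
# def reward_fn(obs, act, next_obs):
#     ...  # analytic reward, no longer needed when 'learn_reward' = True

def termination_fn(obs, act, next_obs):
    # Keep the Hopper termination rule so model rollouts still end when the
    # agent falls over (standard Hopper health check on batched next_obs).
    height = next_obs[:, 0]
    angle = next_obs[:, 1]
    not_done = (
        np.isfinite(next_obs).all(axis=-1)
        & (np.abs(next_obs[:, 1:]) < 100).all(axis=-1)
        & (height > 0.7)
        & (np.abs(angle) < 0.2)
    )
    return ~not_done
```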
