
Not learning reward? #42

Open
IanWangg opened this issue Aug 17, 2021 · 3 comments

IanWangg commented Aug 17, 2021

Hi,

In the original codebase, the algorithm does not learn the reward function; instead, the agent computes cumulative reward using the true reward function taken directly from the environment. That means we are using ground-truth information from the environment. Is that allowed in the offline setting? Also, after changing the 'learn_reward' option to True in the config file, the performance is much worse.
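To illustrate what I mean, here is a rough sketch of the difference between the two settings during a model rollout. This is not code from this repository; the names `true_reward_fn`, `LearnedRewardModel`, and `rollout_reward` are made up for illustration:

```python
import numpy as np

def true_reward_fn(obs, act, next_obs):
    # Stand-in for the analytic reward taken straight from the environment
    # definition; the real expression depends on the task.
    return 1.0 - 1e-3 * float(np.sum(np.square(act)))

class LearnedRewardModel:
    """Stand-in for a reward head fit only on the offline dataset."""
    def __init__(self, weights):
        self.weights = weights

    def predict(self, obs, act, next_obs):
        feats = np.concatenate([obs, act, next_obs])
        return float(feats @ self.weights)

def rollout_reward(obs, act, next_obs, learn_reward, reward_model=None):
    if learn_reward:
        # 'learn_reward' = True: reward comes from the model trained offline.
        return reward_model.predict(obs, act, next_obs)
    # 'learn_reward' = False: reward comes from the ground-truth environment
    # function, i.e. privileged information in the offline setting.
    return true_reward_fn(obs, act, next_obs)
```

With 'learn_reward' = False the rollouts never suffer from reward-model error, which may explain part of the performance gap when switching the flag.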

IanWangg reopened this Aug 18, 2021
IanWangg (Author) commented

I cannot reproduce the results reported in the paper on the D4RL benchmark when using the learned reward model.


Valerio-Colombo commented Jan 10, 2022

Hi,
I'm having the same problem with the provided Hopper example. Has anyone solved it?
The rollout reward is stuck at very low values.

Valerio-Colombo commented

I may have a solution. The README suggests changing the configuration file to set 'learn_reward' = True and 'reward_file' = None, but setting 'reward_file' = None makes execution stop with an error. What works instead is to keep the 'reward_file' field and comment out the reward function defined inside that file; if you simply comment out the 'reward_file' flag itself, you also lose the termination function used for the rollouts.
At least in the Hopper example, these changes fixed the problem.
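Roughly, the edited reward file could look something like the sketch below. The exact function names and signatures in this repository may differ; the point is only to comment out the reward while keeping the termination rule:

```python
import numpy as np

# Reward function commented out so the learned reward model is used instead:
# def reward_fn(obs, act, next_obs):
#     ...  # analytic reward, no longer needed when 'learn_reward' = True

def termination_fn(obs, act, next_obs):
    # Keep the Hopper termination rule so model rollouts still end when the
    # agent falls over (standard Hopper health check on batched next_obs).
    height = next_obs[:, 0]
    angle = next_obs[:, 1]
    not_done = (
        np.isfinite(next_obs).all(axis=-1)
        & (np.abs(next_obs[:, 1:]) < 100).all(axis=-1)
        & (height > 0.7)
        & (np.abs(angle) < 0.2)
    )
    return ~not_done
```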
