
Experimental results of MOReL for D4RL benchmarks #35

Open
stevenyangyj opened this issue Mar 12, 2021 · 8 comments

@stevenyangyj

Hi,
Thanks so much for open-sourcing this work. The results presented here are very impressive, but I really cannot reproduce the experimental results shown in the README, especially on the random datasets of the D4RL benchmarks. Could you please share the config files and the reward_functions/*.py files used for the D4RL benchmarks?

Btw, here are my results on the hopper-random-v0 and hopper-medium-v0 datasets, using the default hyperparameters from d4rl_hopper_medium.txt:

[Plot: eval_score (hopper-random-v0)]

[Plot: eval_score_medium (hopper-medium-v0)]

@BrunoBSM

BrunoBSM commented Mar 17, 2021

I'm having issues reproducing the work as well. I noticed the d4rl_hopper_medium.txt configuration is not the same as in the paper; however, even after adapting it to the information given in the paper, I still had trouble reproducing the results.

Here is my config and results:

# general inputs

'env_name'      :   'Hopper-v2',
'act_repeat'    :   1,
'seed'          :   123,
'num_iter'      :   500,
'eval_rollouts' :   4,
'num_models'    :   4,
'save_freq'     :   25,
'device'        :   'cuda',
'learn_reward'  :   False,
'data_file'     :   'datasets/hopper-medium-v2.pickle',
'reward_file'   :   '../model_based_npg/utils/reward_functions/gym_hopper.py',
'model_file'    :   'hopper-medium-v2-models.pickle',
'bc_init'       :   True,
'pessimism_coef':   3.0,
'truncate_reward' : 0.0,
'exp_notes'     :   'Reproduction of MOReL paper results',

# dynamics learning

'hidden_size'   :   (512, 512),
'activation'    :   'relu',
'fit_lr'        :   1e-3,
'fit_wd'        :   0.0,
'fit_mb_size'   :   256,
'fit_epochs'    :   300,
'refresh_fit'   :   False,
'max_steps'     :   1e8,

# NPG params

'policy_size'   :   (32, 32),
'step_size'     :   0.02,
'init_log_std'  :   -0.25,
'min_log_std'   :   -2.0,
'gamma'         :   0.999,
'gae_lambda'    :   0.97,
'update_paths'  :   50,
'start_state'   :   'init',
'horizon'       :   400,
'npg_hp'        :   dict(FIM_invert_args={'iters': 25, 'damping': 1e-4}),

[Plot: evaluation results]

I imagine some of these settings are wrong. If so, please provide us with the correct configuration to reproduce the work.

@IcarusWizard

Hi guys. I just went through the code, and I found that the NumPy seed is not set in the learn_model.py script, which makes the results unreproducible.
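
For reference, a minimal sketch of what such a fix could look like near the top of learn_model.py, assuming the script already reads a seed value from the job config (the SEED name here is just a placeholder, not the repo's actual variable):

# Illustrative sketch: seed every source of randomness once, before model learning starts.
import random
import numpy as np
import torch

SEED = 123                        # placeholder; use the seed from the job config
random.seed(SEED)
np.random.seed(SEED)              # this is the call reported as missing
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)  # no-op if CUDA is unavailable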

@aravindr93
Owner

Thank you for the interest and apologies for the slow reply (I've been a bit behind due to my PhD defense). Below are quick replies. I will leave the issue open and provide more detailed answers in a week after ICML rebuttals are over.

  • @OverEuro the config/hyperparameters between the random and medium datasets are different, especially the pessimism coefficient. I will find and share the configs shortly. For the medium dataset, the results appear to be in the correct ballpark initially before the performance seems to degrade. This suggests the dynamics model is perhaps slightly different due to the random seed issue @IcarusWizard mentioned.

  • @BrunoBSM thanks for the question. The hyperparameters mentioned in the paper are for the BRAC datasets and not D4RL (which we included in the most recent update on arXiv). I will find and share the config files for other D4RL tasks.

  • @IcarusWizard thanks for catching this. I will fix this in the next update to the code.

@chicwzh

chicwzh commented Apr 26, 2021

Hi, I'm having issues reproducing the work as well. Here are my results on the hopper-medium-v0 dataset, using the official code from https://sites.google.com/view/morel and the default hyperparameters from d4rl_hopper_medium.txt.
[Image: evaluation results]

@jihwan-jeong

jihwan-jeong commented Sep 16, 2021

Hi :) Thanks for the great work! @aravindr93 If you don't mind me asking, I wonder when you would be able to share the configs for the D4RL results? If they're already shared, I'd appreciate it if anyone could point me to where I can find them. Thanks! :)

@symoon11

symoon11 commented Sep 30, 2021

Hi guys. I will give you some tips for reproducing MOReL on the D4RL datasets.

  1. You should properly transform actions from synthetic trajectories before feeding them into the dynamics model (see the sketch after this list). Actions in the D4RL datasets lie within the range (-1, 1), since they are generated by policies with a tanh output activation. However, the policy used in this library has no output activation, so its actions can take any real value.
  2. MOReL is very sensitive to hyperparameters. I recommend you sweep pessimism_coef, init_log_std, and damping, and choose the best (a minimal sweep sketch follows below).
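
A minimal sketch of the transformation in item 1; the policy.get_action and dynamics_model.predict names below are illustrative placeholders, not this repo's actual API:

import numpy as np

def to_dataset_action_range(action):
    # D4RL actions lie in (-1, 1) because the behavior policies use tanh outputs;
    # the policy here is unbounded, so squash (or clip) before querying the model.
    return np.tanh(action)  # alternatively: np.clip(action, -1.0, 1.0)

# hypothetical usage inside a model rollout loop:
# a = policy.get_action(s)[0]
# s_next = dynamics_model.predict(s, to_dataset_action_range(a))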

I hope this will be helpful for your research.
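
And a tiny sketch of the kind of sweep suggested in item 2, just enumerating config variants; the grid values are examples, not recommendations:

import itertools

base_config = {'pessimism_coef': 3.0, 'init_log_std': -0.25, 'damping': 1e-4}

# example grids; pick ranges appropriate for your dataset
grid = {
    'pessimism_coef': [0.0, 1.0, 3.0, 5.0],
    'init_log_std':   [-0.25, -0.5, -1.0],
    'damping':        [1e-4, 1e-3],
}

for values in itertools.product(*grid.values()):
    cfg = dict(base_config, **dict(zip(grid.keys(), values)))
    # launch one MOReL run per cfg (e.g., write cfg into a job config file and start a run)
    print(cfg)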

@qsa-fox

qsa-fox commented Oct 20, 2021

I reproduced the hopper-random-v0 result on D4RL by using d4rl_hopper_medium.txt, except that I modified pessimism_coef to 0. I also found that performance is sensitive to the trained world model; if you cannot get a good result, try retraining the world models.
[Plot: eval_score (hopper-random-v0)]
Here is the configuration file:
d4rl_hopper_random.txt

@XuJing1022

Hi, I ran the commands following the README at https://github.com/aravindr93/mjrl/tree/v2/projects/morel, but my results are different from the picture shown. Here are my results:
[Image: evaluation results]
I haven't changed any hyperparameters. Can you tell me how to reproduce the results? Thanks a lot!
