
Experimental results of MOReL for D4RL benchmarks #35

Open
stevenyangyj opened this issue Mar 12, 2021 · 8 comments

@stevenyangyj

Hi,
Thanks so much for open-sourcing this work. The results presented here are very impressive, but I really cannot reproduce the experimental results shown in the README, especially on the random datasets of the D4RL benchmarks. Could you please share the config files and the reward_functions/*.py files used for the D4RL benchmarks?

Btw, here are my results on the hopper-random-v0 and hopper-medium-v0 datasets, using the default hyperparameters from d4rl_hopper_medium.txt:

[Plot: eval_score (hopper-random-v0)]

[Plot: eval_score_medium (hopper-medium-v0)]

@BrunoBSM

BrunoBSM commented Mar 17, 2021

I'm having issues reproducing the work as well. I noticed the d4rl_hopper_medium.txt configuration is not the same as in the paper; however, even after adapting it to the information given in the paper, I still had trouble reproducing the results.

Here is my config and results:

# general inputs

'env_name'      :   'Hopper-v2',
'act_repeat'    :   1,
'seed'          :   123,
'num_iter'      :   500,
'eval_rollouts' :   4,
'num_models'    :   4,
'save_freq'     :   25,
'device'        :   'cuda',
'learn_reward'  :   False,
'data_file'     :   'datasets/hopper-medium-v2.pickle',
'reward_file'   :   '../model_based_npg/utils/reward_functions/gym_hopper.py',
'model_file'    :   'hopper-medium-v2-models.pickle',
'bc_init'       :   True,
'pessimism_coef':   3.0,
'truncate_reward' : 0.0,
'exp_notes'     :   'Reproduction of MOReL paper results',

# dynamics learning

'hidden_size'   :   (512, 512),
'activation'    :   'relu',
'fit_lr'        :   1e-3,
'fit_wd'        :   0.0,
'fit_mb_size'   :   256,
'fit_epochs'    :   300,
'refresh_fit'   :   False,
'max_steps'     :   1e8,

# NPG params

'policy_size'   :   (32, 32),
'step_size'     :   0.02,
'init_log_std'  :   -0.25,
'min_log_std'   :   -2.0,
'gamma'         :   0.999,
'gae_lambda'    :   0.97,
'update_paths'  :   50,
'start_state'   :   'init',
'horizon'       :   400,
'npg_hp'        :   dict(FIM_invert_args={'iters': 25, 'damping': 1e-4}),

[Plot: evaluation results]

I imagine some of these settings are wrong. If so, please provide us with the correct configuration to reproduce the work.

@IcarusWizard

Hi guys. I just went through the code, and I found that the NumPy seed is not set in the learn_model.py script, which makes the results unreproducible.
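
For reference, a minimal sketch of what such a fix could look like near the top of learn_model.py, assuming the script already reads a seed value from the job config (the SEED name here is just a placeholder, not the repo's actual variable):

# Illustrative sketch: seed every source of randomness once, before model learning starts.
import random
import numpy as np
import torch

SEED = 123                        # placeholder; use the seed from the job config
random.seed(SEED)
np.random.seed(SEED)              # this is the call reported as missing
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)  # no-op if CUDA is unavailable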

@aravindr93
Owner

Thank you for the interest and apologies for the slow reply (I've been a bit behind due to my PhD defense). Below are quick replies. I will leave the issue open and provide more detailed answers in a week after ICML rebuttals are over.

  • @OverEuro the config/hyperparameters between the random and medium datasets are different, especially the pessimism coefficient. I will find and share the configs shortly. For the medium dataset, the results appear to be in the correct ballpark initially before the performance seems to degrade. This suggests the dynamics model is perhaps slightly different due to the random seed issue @IcarusWizard mentioned.

  • @BrunoBSM thanks for the question. The hyperparameters mentioned in the paper are for the BRAC datasets and not D4RL (which we included in the most recent update on arXiv). I will find and share the config files for other D4RL tasks.

  • @IcarusWizard thanks for catching this. I will fix this in the next update to the code.

@chicwzh

chicwzh commented Apr 26, 2021

Hi, I'm having issues reproducing the work as well. Here are my results on the hopper-medium-v0 dataset, using the official code from https://sites.google.com/view/morel and the default hyperparameters from d4rl_hopper_medium.txt.
[Image: evaluation results]

@jihwan-jeong

jihwan-jeong commented Sep 16, 2021

Hi :) Thanks for the great work! @aravindr93 If you don't mind me asking, I wonder when you would be able to share the configs for the D4RL results? If they're already shared, I'd appreciate it if anyone could point me to where I can find them. Thanks! :)

@symoon11

symoon11 commented Sep 30, 2021

Hi guys. I will give you some tips for reproducing MOReL on the D4RL datasets.

  1. You should properly transform actions from synthetic trajectories before feeding them into the dynamics model (see the sketch after this list). Actions in the D4RL datasets lie within the range (-1, 1), since they are generated by policies with a tanh output activation. However, the policy used in this library has no output activation, so its actions can take any real value.
  2. MOReL is very sensitive to hyperparameters. I recommend you sweep pessimism_coef, init_log_std, and damping, and choose the best (a minimal sweep sketch follows below).
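
A minimal sketch of the transformation in item 1; the policy.get_action and dynamics_model.predict names below are illustrative placeholders, not this repo's actual API:

import numpy as np

def to_dataset_action_range(action):
    # D4RL actions lie in (-1, 1) because the behavior policies use tanh outputs;
    # the policy here is unbounded, so squash (or clip) before querying the model.
    return np.tanh(action)  # alternatively: np.clip(action, -1.0, 1.0)

# hypothetical usage inside a model rollout loop:
# a = policy.get_action(s)[0]
# s_next = dynamics_model.predict(s, to_dataset_action_range(a))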

I hope this will be helpful for your research.
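
And a tiny sketch of the kind of sweep suggested in item 2, just enumerating config variants; the grid values are examples, not recommendations:

import itertools

base_config = {'pessimism_coef': 3.0, 'init_log_std': -0.25, 'damping': 1e-4}

# example grids; pick ranges appropriate for your dataset
grid = {
    'pessimism_coef': [0.0, 1.0, 3.0, 5.0],
    'init_log_std':   [-0.25, -0.5, -1.0],
    'damping':        [1e-4, 1e-3],
}

for values in itertools.product(*grid.values()):
    cfg = dict(base_config, **dict(zip(grid.keys(), values)))
    # launch one MOReL run per cfg (e.g., write cfg into a job config file and start a run)
    print(cfg)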

@qsa-fox

qsa-fox commented Oct 20, 2021

I reproduced the hopper-random-v0 result on D4RL by using d4rl_hopper_medium.txt, except that I modified pessimism_coef to 0. I also found that performance is sensitive to the trained world model; if you cannot get a good result, try retraining the world models.
[Plot: eval_score (hopper-random-v0)]
Here is the configuration file:
d4rl_hopper_random.txt

@XuJing1022

Hi, I ran the commands following the README at https://github.com/aravindr93/mjrl/tree/v2/projects/morel, but my results are different from the picture shown. Here are my results:
[Image: evaluation results]
I haven't changed any hyperparameters. Can you tell me how to reproduce the results? Thanks a lot!
