Document running the entire benchmarking suite #657

Open. Wants to merge 14 commits into base: master.
81 changes: 79 additions & 2 deletions benchmarking/README.md
@@ -4,21 +4,98 @@ The `src/imitation/scripts/config/tuned_hps` directory provides the tuned hyperparameters…

Configuration files can be loaded either from the CLI or from the Python API.

## Single benchmark

To run a single benchmark from the command line:

```bash
python -m imitation.scripts.<train_script> <algo> with <algo>_<env>
```
`train_script` can be either `train_imitation` (with `algo` set to `bc` or `dagger`) or `train_adversarial` (with `algo` set to `gail` or `airl`). The `env` can be one of `seals_ant`, `seals_half_cheetah`, `seals_hopper`, `seals_swimmer`, or `seals_walker`. Hyperparameters for other environments have not been tuned yet; you may get reasonable performance by reusing the hyperparameters tuned for a similar environment, or you can tune them yourself with the `tuning` script.
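
For example, to run GAIL with the hyperparameters tuned for HalfCheetah:

```bash
python -m imitation.scripts.train_adversarial gail with gail_seals_half_cheetah
```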

To view the results:

```bash
python -m imitation.scripts.analyze analyze_imitation with \
source_dir_str="output/sacred" table_verbosity=0 \
csv_output_path=results.csv \
run_name="<name>"
```

To run a single benchmark from Python, run the corresponding Sacred experiment `<train_ex>` with the named config:

```python
...
from imitation.scripts.<train_script> import <train_ex>
<train_ex>.run(command_name="<algo>", named_configs=["<algo>_<env>"])
```
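
For instance, to run DAgger with the `seals_ant` hyperparameters (a sketch assuming the experiment object is exported as `train_imitation_ex`, matching the `<train_ex>` placeholder above):

```python
from imitation.scripts.train_imitation import train_imitation_ex

# Run DAgger with the hyperparameters tuned for seals_ant.
run = train_imitation_ex.run(command_name="dagger", named_configs=["dagger_seals_ant"])
print(run.result)
```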

## Entire benchmark suite

### Running locally

To generate the commands to run the entire benchmarking suite with multiple random seeds:

```bash
python experiments/commands.py \
--name=<name> \
--cfg_pattern "benchmarking/example_*.json" \
--seeds 0 1 2 \
--output_dir=output
```
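
If you want to review or edit the generated commands before running them (purely a convenience, not part of the documented workflow), redirect them to a file first:

```bash
python experiments/commands.py \
--name=<name> \
--cfg_pattern "benchmarking/example_*.json" \
--seeds 0 1 2 \
--output_dir=output > commands.txt
# Inspect commands.txt, then run it via `parallel` or `bash` as shown below.
```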

To run those commands in parallel:

```bash
python experiments/commands.py \
--name=<name> \
--cfg_pattern "benchmarking/example_*.json" \
--seeds 0 1 2 \
--output_dir=output | parallel -j 8
```

(You may need to `brew install parallel` for this to work on macOS.)

### Running on Hofvarpnir

To generate the commands for the Hofvarpnir cluster:

```bash
python experiments/commands.py \
--name=<name> \
--cfg_pattern "benchmarking/example_*.json" \
--seeds 0 1 2 \
--output_dir=/data/output \
--remote
```

To run those commands, pipe them into bash:

```bash
python experiments/commands.py \
--name <name> \
--cfg_pattern "benchmarking/example_*.json" \
--seeds 0 1 2 \
--output_dir /data/output \
--remote | bash
```

ernestum (Collaborator), Aug 7, 2023, commenting on the `--name <name>` line: Should this be `--name <name>` or `--name=<name>`?

### Results

To produce a table with all the results:

```bash
python -m imitation.scripts.analyze analyze_imitation with \
source_dir_str="output/sacred" table_verbosity=0 \
csv_output_path=results.csv \
run_name="<name>"
```
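
The resulting `results.csv` can be inspected with any tabular tool; here is a minimal pandas sketch (the exact columns depend on `table_verbosity`, so print them rather than assuming names):

```python
import pandas as pd

# Load the summary table written by analyze_imitation above.
results = pd.read_csv("results.csv")

print(results.columns.tolist())  # see which columns were emitted
print(results.head())            # preview the first few benchmark rows
```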

To compute a p-value to test whether the differences from the paper are statistically significant:
```bash
python -m imitation.scripts.compare_to_baseline results.csv
```

Review comments on this section:

Contributor: FWIW, in my test run they were statistically significant, so something may have changed since the paper.

Member: Not too surprising. Can you share the results, and whether they moved in a positive or negative direction since the paper? ;)

Contributor: I'll include this once I include the canonical results CSV.

Contributor: Actually, I think we should just do a bulk update of all results. I might need help with getting access to more compute for this, though.

Contributor: Created issue for this: #710

timbauman (Contributor), May 9, 2023: Okay, for Ant it looks like the original results were mean 1953, std dev 99, and the new results (for me) are mean 1794, std dev 244. The p-value is 0.20, so it's not a statistically significant difference. This is different from my run yesterday, though, which gives more reason to rerun everything in bulk IMO.
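
For intuition, a p-value like the one discussed in the thread above can be approximated from summary statistics alone. The sketch below uses SciPy's t-test from summary statistics with the Ant numbers quoted in the comment; the 5-seeds-per-run sample size is an assumption (matching the example result CSVs below), and this is not the actual `compare_to_baseline` implementation.

```python
from scipy.stats import ttest_ind_from_stats

# Ant returns quoted in the review comment above; n=5 seeds per run is an
# assumed sample size, not a value reported in the comment.
result = ttest_ind_from_stats(
    mean1=1953, std1=99, nobs1=5,   # results at the time of the paper
    mean2=1794, std2=244, nobs2=5,  # re-run results from the review comment
    equal_var=False,                # Welch's t-test
)
print(f"p = {result.pvalue:.2f}")   # in the same ballpark as the 0.20 quoted above
```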
# Tuning Hyperparameters

The hyperparameters of any algorithm in imitation can be tuned using `src/imitation/scripts/tuning.py`.
6 changes: 6 additions & 0 deletions benchmarking/results/logs_example_airl_seals_ant_bhp.csv
@@ -0,0 +1,6 @@
agent_path,checkpoint_interval,seed,show_config,total_timesteps,algorithm_kwargs.demo_batch_size,algorithm_kwargs.gen_replay_buffer_capacity,algorithm_kwargs.n_disc_updates_per_round,common.env_name,common.log_dir,common.log_format_strs,common.log_format_strs_additional.wandb,common.log_level,common.log_root,common.max_episode_steps,common.num_vec,common.parallel,common.wandb.wandb_kwargs.monitor_gym,common.wandb.wandb_kwargs.project,common.wandb.wandb_kwargs.save_code,common.wandb.wandb_name_prefix,common.wandb.wandb_tag,demonstrations.n_expert_demos,demonstrations.rollout_path,expert.policy_type,reward.add_std_alpha,reward.ensemble_size,reward.net_cls.py/type,reward.net_kwargs.normalize_input_layer.py/type,reward.normalize_output_layer.py/type,rl.batch_size,rl.rl_cls.py/type,rl.rl_kwargs.batch_size,rl.rl_kwargs.clip_range,rl.rl_kwargs.ent_coef,rl.rl_kwargs.gae_lambda,rl.rl_kwargs.gamma,rl.rl_kwargs.learning_rate,rl.rl_kwargs.max_grad_norm,rl.rl_kwargs.n_epochs,rl.rl_kwargs.vf_coef,train.n_episodes_eval,train.policy_cls.py/type,train.policy_kwargs.features_extractor_class.py/type,train.policy_kwargs.features_extractor_kwargs.normalize_class.py/type,algo,env_name,expert_return_summary,imit_return_summary
,0,101,False,10000000.0,8192,8192,16,seals/Ant-v0,output/airl/seals_Ant-v0/20221024_082122_711915,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_ant_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,16,0.3,3.27750078482474e-06,0.8,0.995,3.249429831179079e-05,0.9,10,0.4351450387648799,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/Ant-v0,2408.22 ± 665.201 (n=104),123.476 ± 2.16606 (n=56)
,0,100,False,10000000.0,8192,8192,16,seals/Ant-v0,output/airl/seals_Ant-v0/20221024_082120_c540b2,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_ant_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,16,0.3,3.27750078482474e-06,0.8,0.995,3.249429831179079e-05,0.9,10,0.4351450387648799,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/Ant-v0,2408.22 ± 665.201 (n=104),-378.377 ± 60.6063 (n=56)
,0,102,False,10000000.0,8192,8192,16,seals/Ant-v0,output/airl/seals_Ant-v0/20221024_082122_ba94a1,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_ant_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,16,0.3,3.27750078482474e-06,0.8,0.995,3.249429831179079e-05,0.9,10,0.4351450387648799,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/Ant-v0,2408.22 ± 665.201 (n=104),-314.108 ± 19.2371 (n=56)
,0,104,False,10000000.0,8192,8192,16,seals/Ant-v0,output/airl/seals_Ant-v0/20221024_082122_8c6aba,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_ant_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,16,0.3,3.27750078482474e-06,0.8,0.995,3.249429831179079e-05,0.9,10,0.4351450387648799,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/Ant-v0,2408.22 ± 665.201 (n=104),-0.402349 ± 19.7147 (n=56)
,0,103,False,10000000.0,8192,8192,16,seals/Ant-v0,output/airl/seals_Ant-v0/20221024_082122_47f04c,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_ant_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,16,0.3,3.27750078482474e-06,0.8,0.995,3.249429831179079e-05,0.9,10,0.4351450387648799,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/Ant-v0,2408.22 ± 665.201 (n=104),18.9413 ± 1.1345 (n=56)
6 changes: 6 additions & 0 deletions benchmarking/results/logs_example_airl_seals_half_cheetah_bhp.csv
@@ -0,0 +1,6 @@
agent_path,checkpoint_interval,seed,show_config,total_timesteps,algorithm_kwargs.demo_batch_size,algorithm_kwargs.gen_replay_buffer_capacity,algorithm_kwargs.n_disc_updates_per_round,common.env_name,common.log_dir,common.log_format_strs,common.log_format_strs_additional.wandb,common.log_level,common.log_root,common.max_episode_steps,common.num_vec,common.parallel,common.wandb.wandb_kwargs.monitor_gym,common.wandb.wandb_kwargs.project,common.wandb.wandb_kwargs.save_code,common.wandb.wandb_name_prefix,common.wandb.wandb_tag,demonstrations.n_expert_demos,demonstrations.rollout_path,expert.policy_type,reward.add_std_alpha,reward.ensemble_size,reward.net_cls.py/type,reward.net_kwargs.normalize_input_layer.py/type,reward.normalize_output_layer.py/type,rl.batch_size,rl.rl_cls.py/type,rl.rl_kwargs.batch_size,rl.rl_kwargs.clip_range,rl.rl_kwargs.ent_coef,rl.rl_kwargs.gae_lambda,rl.rl_kwargs.gamma,rl.rl_kwargs.learning_rate,rl.rl_kwargs.max_grad_norm,rl.rl_kwargs.n_epochs,rl.rl_kwargs.vf_coef,train.n_episodes_eval,train.policy_cls.py/type,train.policy_kwargs.features_extractor_class.py/type,train.policy_kwargs.features_extractor_kwargs.normalize_class.py/type,algo,env_name,expert_return_summary,imit_return_summary
,0,100,False,10000000.0,2048,512,16,seals/HalfCheetah-v0,output/airl/seals_HalfCheetah-v0/20221021_115006_924cb4,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_half_cheetah_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,64,0.1,0.0005544771755195421,0.95,0.95,0.00047248619386801587,0.8,5,0.11483689492120866,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/HalfCheetah-v0,3465.42 ± 976.462 (n=104),1674.29 ± 581.622 (n=56)
,0,104,False,10000000.0,2048,512,16,seals/HalfCheetah-v0,output/airl/seals_HalfCheetah-v0/20221021_115008_b838f5,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_half_cheetah_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,64,0.1,0.0005544771755195421,0.95,0.95,0.00047248619386801587,0.8,5,0.11483689492120866,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/HalfCheetah-v0,3465.42 ± 976.462 (n=104),3652.14 ± 648.766 (n=56)
,0,102,False,10000000.0,2048,512,16,seals/HalfCheetah-v0,output/airl/seals_HalfCheetah-v0/20221021_115008_23f6ee,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_half_cheetah_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,64,0.1,0.0005544771755195421,0.95,0.95,0.00047248619386801587,0.8,5,0.11483689492120866,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/HalfCheetah-v0,3465.42 ± 976.462 (n=104),3491.62 ± 368.717 (n=56)
,0,101,False,10000000.0,2048,512,16,seals/HalfCheetah-v0,output/airl/seals_HalfCheetah-v0/20221021_115008_ae2f97,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_half_cheetah_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,64,0.1,0.0005544771755195421,0.95,0.95,0.00047248619386801587,0.8,5,0.11483689492120866,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/HalfCheetah-v0,3465.42 ± 976.462 (n=104),4441.25 ± 87.8795 (n=56)
,0,103,False,10000000.0,2048,512,16,seals/HalfCheetah-v0,output/airl/seals_HalfCheetah-v0/20221021_115008_1ae278,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_half_cheetah_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,64,0.1,0.0005544771755195421,0.95,0.95,0.00047248619386801587,0.8,5,0.11483689492120866,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/HalfCheetah-v0,3465.42 ± 976.462 (n=104),3960.15 ± 108.134 (n=56)
6 changes: 6 additions & 0 deletions benchmarking/results/logs_example_airl_seals_hopper_bhp.csv
@@ -0,0 +1,6 @@
agent_path,checkpoint_interval,seed,show_config,total_timesteps,algorithm_kwargs.demo_batch_size,algorithm_kwargs.gen_replay_buffer_capacity,algorithm_kwargs.n_disc_updates_per_round,common.env_name,common.log_dir,common.log_format_strs,common.log_format_strs_additional.wandb,common.log_level,common.log_root,common.max_episode_steps,common.num_vec,common.parallel,common.wandb.wandb_kwargs.monitor_gym,common.wandb.wandb_kwargs.project,common.wandb.wandb_kwargs.save_code,common.wandb.wandb_name_prefix,common.wandb.wandb_tag,demonstrations.n_expert_demos,demonstrations.rollout_path,expert.policy_type,reward.add_std_alpha,reward.ensemble_size,reward.net_cls.py/type,reward.net_kwargs.normalize_input_layer.py/type,reward.normalize_output_layer.py/type,rl.batch_size,rl.rl_cls.py/type,rl.rl_kwargs.batch_size,rl.rl_kwargs.clip_range,rl.rl_kwargs.ent_coef,rl.rl_kwargs.gae_lambda,rl.rl_kwargs.gamma,rl.rl_kwargs.learning_rate,rl.rl_kwargs.max_grad_norm,rl.rl_kwargs.n_epochs,rl.rl_kwargs.vf_coef,train.n_episodes_eval,train.policy_cls,train.policy_kwargs.activation_fn.py/type,train.policy_kwargs.features_extractor_class.py/type,train.policy_kwargs.features_extractor_kwargs.normalize_class.py/type,train.policy_kwargs.net_arch,algo,env_name,expert_return_summary,imit_return_summary
,0,103,False,10000000.0,2048,8192,16,seals/Hopper-v0,output/airl/seals_Hopper-v0/20221022_223308_a8cbd6,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-10-11T06:27:42-07:00/seals_hopper_2/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,512,0.1,0.009709494745755033,0.98,0.995,0.0005807211840258373,0.9,20,0.20315938606555833,50,MlpPolicy,torch.nn.modules.activation.ReLU,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,"[{'pi': [64, 64], 'vf': [64, 64]}]",AIRL,seals/Hopper-v0,2630.92 ± 112.582 (n=104),2600.12 ± 155.143 (n=56)
,0,101,False,10000000.0,2048,8192,16,seals/Hopper-v0,output/airl/seals_Hopper-v0/20221022_223308_299f28,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-10-11T06:27:42-07:00/seals_hopper_2/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,512,0.1,0.009709494745755033,0.98,0.995,0.0005807211840258373,0.9,20,0.20315938606555833,50,MlpPolicy,torch.nn.modules.activation.ReLU,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,"[{'pi': [64, 64], 'vf': [64, 64]}]",AIRL,seals/Hopper-v0,2630.92 ± 112.582 (n=104),2663.1 ± 121.83 (n=56)
,0,104,False,10000000.0,2048,8192,16,seals/Hopper-v0,output/airl/seals_Hopper-v0/20221022_223307_1607e3,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-10-11T06:27:42-07:00/seals_hopper_2/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,512,0.1,0.009709494745755033,0.98,0.995,0.0005807211840258373,0.9,20,0.20315938606555833,50,MlpPolicy,torch.nn.modules.activation.ReLU,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,"[{'pi': [64, 64], 'vf': [64, 64]}]",AIRL,seals/Hopper-v0,2630.92 ± 112.582 (n=104),2740.77 ± 107.306 (n=56)
,0,100,False,10000000.0,2048,8192,16,seals/Hopper-v0,output/airl/seals_Hopper-v0/20221022_223305_7116b9,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-10-11T06:27:42-07:00/seals_hopper_2/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,512,0.1,0.009709494745755033,0.98,0.995,0.0005807211840258373,0.9,20,0.20315938606555833,50,MlpPolicy,torch.nn.modules.activation.ReLU,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,"[{'pi': [64, 64], 'vf': [64, 64]}]",AIRL,seals/Hopper-v0,2630.92 ± 112.582 (n=104),2758.67 ± 121.298 (n=56)
,0,102,False,10000000.0,2048,8192,16,seals/Hopper-v0,output/airl/seals_Hopper-v0/20221022_223307_23fde3,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-10-11T06:27:42-07:00/seals_hopper_2/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,512,0.1,0.009709494745755033,0.98,0.995,0.0005807211840258373,0.9,20,0.20315938606555833,50,MlpPolicy,torch.nn.modules.activation.ReLU,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,"[{'pi': [64, 64], 'vf': [64, 64]}]",AIRL,seals/Hopper-v0,2630.92 ± 112.582 (n=104),2613.26 ± 128.037 (n=56)