Document running the entire benchmarking suite #657
The `src/imitation/scripts/config/tuned_hps` directory provides the tuned hyperparameters.

Configuration files can be loaded either from the CLI or from the Python API.
## Single benchmark | ||

To run a single benchmark from the command line:

```bash
python -m imitation.scripts.<train_script> <algo> with <algo>_<env>
```
`train_script` can be either (1) `train_imitation`, with `algo` set to `bc` or `dagger`, or (2) `train_adversarial`, with `algo` set to `gail` or `airl`. `env` can be any of `seals_ant`, `seals_half_cheetah`, `seals_hopper`, `seals_swimmer`, or `seals_walker`. Hyperparameters for other environments have not been tuned yet; you may get reasonable performance by reusing hyperparameters tuned for a similar environment, or you can tune them yourself with the `tuning` script.
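The mapping from algorithm to training script can be sketched as a small helper that builds the command above. This is purely illustrative; `imitation` itself does not ship a `benchmark_command` function:

```python
# Hypothetical helper: compose the benchmark CLI invocation from the
# algorithm/script mapping described above.
ALGO_TO_SCRIPT = {
    "bc": "train_imitation",
    "dagger": "train_imitation",
    "gail": "train_adversarial",
    "airl": "train_adversarial",
}

def benchmark_command(algo: str, env: str) -> str:
    """Return the CLI invocation for one (algo, env) benchmark."""
    script = ALGO_TO_SCRIPT[algo]
    return f"python -m imitation.scripts.{script} {algo} with {algo}_{env}"

print(benchmark_command("airl", "seals_ant"))
# → python -m imitation.scripts.train_adversarial airl with airl_seals_ant
```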
To view the results: | ||

```bash
python -m imitation.scripts.analyze analyze_imitation with \
    source_dir_str="output/sacred" table_verbosity=0 \
    csv_output_path=results.csv \
    run_name="<name>"
```

To run a single benchmark from Python, add the config to your Sacred experiment `ex`:

```python
...
from imitation.scripts.<train_script> import <train_ex>

<train_ex>.run(command_name="<algo>", named_configs=["<algo>_<env>"])
```

## Entire benchmark suite

### Running locally

To generate the commands to run the entire benchmarking suite with multiple random seeds:

```bash
python experiments/commands.py \
    --name=<name> \
    --cfg_pattern "benchmarking/example_*.json" \
    --seeds 0 1 2 \
    --output_dir=output
```

To run those commands in parallel:

```bash
python experiments/commands.py \
    --name=<name> \
    --cfg_pattern "benchmarking/example_*.json" \
    --seeds 0 1 2 \
    --output_dir=output | parallel -j 8
```

(You may need to `brew install parallel` for this to work on macOS.)

### Running on Hofvarpnir

To generate the commands for the Hofvarpnir cluster:

```bash
python experiments/commands.py \
    --name=<name> \
    --cfg_pattern "benchmarking/example_*.json" \
    --seeds 0 1 2 \
    --output_dir=/data/output \
    --remote
```

To run those commands, pipe them into `bash`:

```bash
python experiments/commands.py \
    --name <name> \
    --cfg_pattern "benchmarking/example_*.json" \
    --seeds 0 1 2 \
    --output_dir /data/output \
    --remote | bash
```

### Results

To produce a table with all the results:

```bash
python -m imitation.scripts.analyze analyze_imitation with \
    source_dir_str="output/sacred" table_verbosity=0 \
    csv_output_path=results.csv \
    run_name="<name>"
```

To compute a p-value to test whether the differences from the paper are statistically significant:

```bash
python -m imitation.scripts.compare_to_baseline results.csv
```

> Review discussion:
> - "FWIW, in my test run they were statistically significant so something may have changed since the paper."
> - "Not too surprising, can you share the results and if they moved in a positive or negative direction since the paper? ;)"
> - "I'll include this once I include the canonical results CSV."
> - "Actually, I think we should just do a bulk update of all results - I might need help with getting access to more compute for this though." (Created issue for this: #710)
> - "Okay, for Ant it looks like the original results were mean 1953, std dev 99, and the new results (for me) are mean 1794, std dev 244. The p-value is 0.20, so it's not a statistically significant difference. This is different from my run yesterday, though. This gives more reasons to rerun everything in bulk IMO."
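As an illustrative sketch of this kind of comparison (not the actual implementation of `compare_to_baseline`), Welch's t-statistic can be computed from summary statistics like those quoted in the review thread. The sample size of 5 seeds per run is an assumption:

```python
import math

def welch_t(mean1, std1, n1, mean2, std2, n2):
    """Welch's two-sample t-statistic and approximate degrees of freedom."""
    v1, v2 = std1**2 / n1, std2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Paper vs. rerun Ant results from the review thread (n=5 seeds assumed).
t, df = welch_t(1953, 99, 5, 1794, 244, 5)
print(round(t, 2), round(df, 1))
```

The two-sided p-value then follows from the t-distribution, e.g. `scipy.stats.t.sf(abs(t), df) * 2`.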

# Tuning Hyperparameters

The hyperparameters of any algorithm in imitation can be tuned using `src/imitation/scripts/tuning.py`.

Added results CSV (AIRL on `seals/Ant-v0`, 5 seeds):

agent_path,checkpoint_interval,seed,show_config,total_timesteps,algorithm_kwargs.demo_batch_size,algorithm_kwargs.gen_replay_buffer_capacity,algorithm_kwargs.n_disc_updates_per_round,common.env_name,common.log_dir,common.log_format_strs,common.log_format_strs_additional.wandb,common.log_level,common.log_root,common.max_episode_steps,common.num_vec,common.parallel,common.wandb.wandb_kwargs.monitor_gym,common.wandb.wandb_kwargs.project,common.wandb.wandb_kwargs.save_code,common.wandb.wandb_name_prefix,common.wandb.wandb_tag,demonstrations.n_expert_demos,demonstrations.rollout_path,expert.policy_type,reward.add_std_alpha,reward.ensemble_size,reward.net_cls.py/type,reward.net_kwargs.normalize_input_layer.py/type,reward.normalize_output_layer.py/type,rl.batch_size,rl.rl_cls.py/type,rl.rl_kwargs.batch_size,rl.rl_kwargs.clip_range,rl.rl_kwargs.ent_coef,rl.rl_kwargs.gae_lambda,rl.rl_kwargs.gamma,rl.rl_kwargs.learning_rate,rl.rl_kwargs.max_grad_norm,rl.rl_kwargs.n_epochs,rl.rl_kwargs.vf_coef,train.n_episodes_eval,train.policy_cls.py/type,train.policy_kwargs.features_extractor_class.py/type,train.policy_kwargs.features_extractor_kwargs.normalize_class.py/type,algo,env_name,expert_return_summary,imit_return_summary | ||
,0,101,False,10000000.0,8192,8192,16,seals/Ant-v0,output/airl/seals_Ant-v0/20221024_082122_711915,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_ant_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,16,0.3,3.27750078482474e-06,0.8,0.995,3.249429831179079e-05,0.9,10,0.4351450387648799,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/Ant-v0,2408.22 ± 665.201 (n=104),123.476 ± 2.16606 (n=56) | ||
,0,100,False,10000000.0,8192,8192,16,seals/Ant-v0,output/airl/seals_Ant-v0/20221024_082120_c540b2,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_ant_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,16,0.3,3.27750078482474e-06,0.8,0.995,3.249429831179079e-05,0.9,10,0.4351450387648799,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/Ant-v0,2408.22 ± 665.201 (n=104),-378.377 ± 60.6063 (n=56) | ||
,0,102,False,10000000.0,8192,8192,16,seals/Ant-v0,output/airl/seals_Ant-v0/20221024_082122_ba94a1,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_ant_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,16,0.3,3.27750078482474e-06,0.8,0.995,3.249429831179079e-05,0.9,10,0.4351450387648799,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/Ant-v0,2408.22 ± 665.201 (n=104),-314.108 ± 19.2371 (n=56) | ||
,0,104,False,10000000.0,8192,8192,16,seals/Ant-v0,output/airl/seals_Ant-v0/20221024_082122_8c6aba,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_ant_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,16,0.3,3.27750078482474e-06,0.8,0.995,3.249429831179079e-05,0.9,10,0.4351450387648799,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/Ant-v0,2408.22 ± 665.201 (n=104),-0.402349 ± 19.7147 (n=56) | ||
,0,103,False,10000000.0,8192,8192,16,seals/Ant-v0,output/airl/seals_Ant-v0/20221024_082122_47f04c,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_ant_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,16,0.3,3.27750078482474e-06,0.8,0.995,3.249429831179079e-05,0.9,10,0.4351450387648799,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/Ant-v0,2408.22 ± 665.201 (n=104),18.9413 ± 1.1345 (n=56) |

Added results CSV (AIRL on `seals/HalfCheetah-v0`, 5 seeds):

agent_path,checkpoint_interval,seed,show_config,total_timesteps,algorithm_kwargs.demo_batch_size,algorithm_kwargs.gen_replay_buffer_capacity,algorithm_kwargs.n_disc_updates_per_round,common.env_name,common.log_dir,common.log_format_strs,common.log_format_strs_additional.wandb,common.log_level,common.log_root,common.max_episode_steps,common.num_vec,common.parallel,common.wandb.wandb_kwargs.monitor_gym,common.wandb.wandb_kwargs.project,common.wandb.wandb_kwargs.save_code,common.wandb.wandb_name_prefix,common.wandb.wandb_tag,demonstrations.n_expert_demos,demonstrations.rollout_path,expert.policy_type,reward.add_std_alpha,reward.ensemble_size,reward.net_cls.py/type,reward.net_kwargs.normalize_input_layer.py/type,reward.normalize_output_layer.py/type,rl.batch_size,rl.rl_cls.py/type,rl.rl_kwargs.batch_size,rl.rl_kwargs.clip_range,rl.rl_kwargs.ent_coef,rl.rl_kwargs.gae_lambda,rl.rl_kwargs.gamma,rl.rl_kwargs.learning_rate,rl.rl_kwargs.max_grad_norm,rl.rl_kwargs.n_epochs,rl.rl_kwargs.vf_coef,train.n_episodes_eval,train.policy_cls.py/type,train.policy_kwargs.features_extractor_class.py/type,train.policy_kwargs.features_extractor_kwargs.normalize_class.py/type,algo,env_name,expert_return_summary,imit_return_summary | ||
,0,100,False,10000000.0,2048,512,16,seals/HalfCheetah-v0,output/airl/seals_HalfCheetah-v0/20221021_115006_924cb4,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_half_cheetah_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,64,0.1,0.0005544771755195421,0.95,0.95,0.00047248619386801587,0.8,5,0.11483689492120866,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/HalfCheetah-v0,3465.42 ± 976.462 (n=104),1674.29 ± 581.622 (n=56) | ||
,0,104,False,10000000.0,2048,512,16,seals/HalfCheetah-v0,output/airl/seals_HalfCheetah-v0/20221021_115008_b838f5,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_half_cheetah_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,64,0.1,0.0005544771755195421,0.95,0.95,0.00047248619386801587,0.8,5,0.11483689492120866,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/HalfCheetah-v0,3465.42 ± 976.462 (n=104),3652.14 ± 648.766 (n=56) | ||
,0,102,False,10000000.0,2048,512,16,seals/HalfCheetah-v0,output/airl/seals_HalfCheetah-v0/20221021_115008_23f6ee,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_half_cheetah_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,64,0.1,0.0005544771755195421,0.95,0.95,0.00047248619386801587,0.8,5,0.11483689492120866,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/HalfCheetah-v0,3465.42 ± 976.462 (n=104),3491.62 ± 368.717 (n=56) | ||
,0,101,False,10000000.0,2048,512,16,seals/HalfCheetah-v0,output/airl/seals_HalfCheetah-v0/20221021_115008_ae2f97,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_half_cheetah_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,64,0.1,0.0005544771755195421,0.95,0.95,0.00047248619386801587,0.8,5,0.11483689492120866,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/HalfCheetah-v0,3465.42 ± 976.462 (n=104),4441.25 ± 87.8795 (n=56) | ||
,0,103,False,10000000.0,2048,512,16,seals/HalfCheetah-v0,output/airl/seals_HalfCheetah-v0/20221021_115008_1ae278,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-09-05T18:27:27-07:00/seals_half_cheetah_1/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,64,0.1,0.0005544771755195421,0.95,0.95,0.00047248619386801587,0.8,5,0.11483689492120866,50,imitation.policies.base.FeedForward32Policy,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,AIRL,seals/HalfCheetah-v0,3465.42 ± 976.462 (n=104),3960.15 ± 108.134 (n=56) |

Added results CSV (AIRL on `seals/Hopper-v0`, 5 seeds):

agent_path,checkpoint_interval,seed,show_config,total_timesteps,algorithm_kwargs.demo_batch_size,algorithm_kwargs.gen_replay_buffer_capacity,algorithm_kwargs.n_disc_updates_per_round,common.env_name,common.log_dir,common.log_format_strs,common.log_format_strs_additional.wandb,common.log_level,common.log_root,common.max_episode_steps,common.num_vec,common.parallel,common.wandb.wandb_kwargs.monitor_gym,common.wandb.wandb_kwargs.project,common.wandb.wandb_kwargs.save_code,common.wandb.wandb_name_prefix,common.wandb.wandb_tag,demonstrations.n_expert_demos,demonstrations.rollout_path,expert.policy_type,reward.add_std_alpha,reward.ensemble_size,reward.net_cls.py/type,reward.net_kwargs.normalize_input_layer.py/type,reward.normalize_output_layer.py/type,rl.batch_size,rl.rl_cls.py/type,rl.rl_kwargs.batch_size,rl.rl_kwargs.clip_range,rl.rl_kwargs.ent_coef,rl.rl_kwargs.gae_lambda,rl.rl_kwargs.gamma,rl.rl_kwargs.learning_rate,rl.rl_kwargs.max_grad_norm,rl.rl_kwargs.n_epochs,rl.rl_kwargs.vf_coef,train.n_episodes_eval,train.policy_cls,train.policy_kwargs.activation_fn.py/type,train.policy_kwargs.features_extractor_class.py/type,train.policy_kwargs.features_extractor_kwargs.normalize_class.py/type,train.policy_kwargs.net_arch,algo,env_name,expert_return_summary,imit_return_summary | ||
,0,103,False,10000000.0,2048,8192,16,seals/Hopper-v0,output/airl/seals_Hopper-v0/20221022_223308_a8cbd6,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-10-11T06:27:42-07:00/seals_hopper_2/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,512,0.1,0.009709494745755033,0.98,0.995,0.0005807211840258373,0.9,20,0.20315938606555833,50,MlpPolicy,torch.nn.modules.activation.ReLU,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,"[{'pi': [64, 64], 'vf': [64, 64]}]",AIRL,seals/Hopper-v0,2630.92 ± 112.582 (n=104),2600.12 ± 155.143 (n=56) | ||
,0,101,False,10000000.0,2048,8192,16,seals/Hopper-v0,output/airl/seals_Hopper-v0/20221022_223308_299f28,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-10-11T06:27:42-07:00/seals_hopper_2/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,512,0.1,0.009709494745755033,0.98,0.995,0.0005807211840258373,0.9,20,0.20315938606555833,50,MlpPolicy,torch.nn.modules.activation.ReLU,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,"[{'pi': [64, 64], 'vf': [64, 64]}]",AIRL,seals/Hopper-v0,2630.92 ± 112.582 (n=104),2663.1 ± 121.83 (n=56) | ||
,0,104,False,10000000.0,2048,8192,16,seals/Hopper-v0,output/airl/seals_Hopper-v0/20221022_223307_1607e3,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-10-11T06:27:42-07:00/seals_hopper_2/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,512,0.1,0.009709494745755033,0.98,0.995,0.0005807211840258373,0.9,20,0.20315938606555833,50,MlpPolicy,torch.nn.modules.activation.ReLU,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,"[{'pi': [64, 64], 'vf': [64, 64]}]",AIRL,seals/Hopper-v0,2630.92 ± 112.582 (n=104),2740.77 ± 107.306 (n=56) | ||
,0,100,False,10000000.0,2048,8192,16,seals/Hopper-v0,output/airl/seals_Hopper-v0/20221022_223305_7116b9,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-10-11T06:27:42-07:00/seals_hopper_2/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,512,0.1,0.009709494745755033,0.98,0.995,0.0005807211840258373,0.9,20,0.20315938606555833,50,MlpPolicy,torch.nn.modules.activation.ReLU,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,"[{'pi': [64, 64], 'vf': [64, 64]}]",AIRL,seals/Hopper-v0,2630.92 ± 112.582 (n=104),2758.67 ± 121.298 (n=56) | ||
,0,102,False,10000000.0,2048,8192,16,seals/Hopper-v0,output/airl/seals_Hopper-v0/20221022_223307_23fde3,"['tensorboard', 'stdout', 'wandb']",,20,,,8,True,False,algorithm-benchmark,False,,,,/home/taufeeque/imitation/output/train_experts/2022-10-11T06:27:42-07:00/seals_hopper_2/rollouts/final.pkl,ppo-huggingface,,,imitation.rewards.reward_nets.BasicShapedRewardNet,imitation.util.networks.RunningNorm,imitation.util.networks.RunningNorm,8192,stable_baselines3.ppo.ppo.PPO,512,0.1,0.009709494745755033,0.98,0.995,0.0005807211840258373,0.9,20,0.20315938606555833,50,MlpPolicy,torch.nn.modules.activation.ReLU,imitation.policies.base.NormalizeFeaturesExtractor,imitation.util.networks.RunningNorm,"[{'pi': [64, 64], 'vf': [64, 64]}]",AIRL,seals/Hopper-v0,2630.92 ± 112.582 (n=104),2613.26 ± 128.037 (n=56) |
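The `expert_return_summary` and `imit_return_summary` columns in these CSVs encode results as `mean ± std (n=…)` strings. A small parser for that format (a hypothetical helper, not part of `imitation`):

```python
import re

# Matches summary strings like "2408.22 ± 665.201 (n=104)".
SUMMARY_RE = re.compile(r"(-?[\d.]+) ± (-?[\d.]+) \(n=(\d+)\)")

def parse_return_summary(summary: str):
    """Split a 'mean ± std (n=...)' summary into (mean, std, n)."""
    match = SUMMARY_RE.fullmatch(summary.strip())
    if match is None:
        raise ValueError(f"unrecognized summary: {summary!r}")
    mean, std, n = match.groups()
    return float(mean), float(std), int(n)

print(parse_return_summary("2408.22 ± 665.201 (n=104)"))
# → (2408.22, 665.201, 104)
```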

> Review comment: Should this be `--name <name>` or `--name=<name>`?