Add scripts and configs for hyperparameter tuning #675

Merged: 69 commits into master from benchmark-pr on Oct 10, 2023

Conversation

@taufeeque9 (Collaborator) commented Feb 9, 2023

Description

This PR adds the scripts and configs for tuning hyperparameters of imitation algorithms. The results in the first draft of the paper were computed using the configs in the PR. Note that the preference comparisons algorithm still needs to be tuned on all of the environments.

Changes:

  • imitation/scripts/parallel.py: Added features useful for tuning hyperparameters (HPs) of the algorithms. Specifically, added options to 1) repeat trials across configs to get a better (mean) estimate of the return of HP configs, 2) evaluate the best config from an HP tune run, and 3) restart failed trials of a tune run.
  • imitation/scripts/analyze.py: Added table verbosity level 3, which generates a CSV with all the config HPs, including RL HPs and the specific algorithm's arguments, along with the expert returns and the returns of the imitation algorithm.
  • imitation/scripts/config/parallel.py: Added HP search spaces for various algorithms, including BC, DAgger, GAIL, and AIRL. The search space for Preference Comparisons may not be good enough yet (see the illustrative search-space sketch after this list).
  • imitation/scripts/config/train_*.py: The added configs contain the HPs for the tuned base RL algorithm.
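
For illustration, a hedged sketch of what a search-space entry of this kind can look like (the parameter names and ranges below are hypothetical, not the tuned values from this PR):

```python
# Hypothetical sketch of a Ray Tune search-space entry for BC, in the spirit
# of the entries added to imitation/scripts/config/parallel.py. The parameter
# names and ranges are illustrative only.
from ray import tune

example_bc_search_space = {
    "named_configs": ["seals_half_cheetah"],
    "config_updates": {
        "bc": {
            "batch_size": tune.choice([8, 16, 32, 64]),
            "l2_weight": tune.loguniform(1e-6, 1e-2),
        },
    },
}
```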

Testing

Updated test cases for parallel and tested HP tuning manually.

codecov bot commented Feb 9, 2023

Codecov Report

Merging #675 (664fc37) into master (b452384) will decrease coverage by 0.58%.
Report is 5 commits behind head on master.
The diff coverage is 46.24%.

@@            Coverage Diff             @@
##           master     #675      +/-   ##
==========================================
- Coverage   96.33%   95.75%   -0.58%     
==========================================
  Files          93       93              
  Lines        8789     8884      +95     
==========================================
+ Hits         8467     8507      +40     
- Misses        322      377      +55     
Files Changed Coverage Δ
src/imitation/scripts/train_imitation.py 92.64% <ø> (ø)
src/imitation/scripts/config/train_rl.py 48.57% <15.15%> (-10.41%) ⬇️
src/imitation/scripts/config/train_adversarial.py 53.26% <19.23%> (-6.00%) ⬇️
src/imitation/scripts/config/train_imitation.py 52.63% <31.57%> (-7.02%) ⬇️
...ion/scripts/config/train_preference_comparisons.py 59.57% <35.00%> (-4.07%) ⬇️
src/imitation/scripts/config/parallel.py 65.71% <50.00%> (+9.35%) ⬆️
src/imitation/scripts/parallel.py 62.33% <51.72%> (-3.71%) ⬇️
src/imitation/scripts/ingredients/reward.py 87.01% <66.66%> (-0.83%) ⬇️
src/imitation/scripts/analyze.py 91.60% <100.00%> (+0.19%) ⬆️
src/imitation/scripts/config/analyze.py 80.00% <100.00%> (ø)
... and 5 more


@ernestum (Collaborator) left a comment:

First round of review, mostly with suggestions for the parallel script.

src/imitation/scripts/parallel.py (3 outdated threads; resolved)
src/imitation/scripts/config/train_adversarial.py (outdated; resolved)
@ernestum (Collaborator) commented:

Not part of this PR, but @taufeeque9, could you elaborate on the benchmarking/util.py script? It is currently unclear where in the benchmark pipeline it is used.

src/imitation/scripts/config/train_rl.py (resolved)
src/imitation/scripts/train_adversarial.py (outdated; resolved)
ernestum added this to the Release v1.0 milestone on May 25, 2023
@AdamGleave (Member) left a comment:

Are the scripts in benchmarking/ being type checked and linted? I don't see them in the output at https://app.circleci.com/pipelines/github/HumanCompatibleAI/imitation/3897/workflows/02ec5434-d883-4edf-9fa5-99faeb0645f2/jobs/15772. The directory benchmarking/ is not specified in the inputs config for pytype in setup.cfg.

I think we should either move the scripts into src/imitation/scripts/ and make them part of the package, or update the CI config to test things in benchmarking/ (worth checking we're not missing them for flake8 etc. as well).

```python
from pandas.api import types as pd_types
from ray.tune.search import optuna
from sacred.observers import FileStorageObserver
from tuning_config import parallel_ex, tuning_ex
```
@AdamGleave (Member) commented:

PEP8: standard library, then third-party, then first-party imports, with a blank line separating each group.

Suggested change: separate the first-party import (from tuning_config import parallel_ex, tuning_ex) from the third-party imports above it with a blank line.
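
For reference, a minimal sketch of the grouping the comment asks for, assuming these are the only imports at the top of the file:

```python
# Third-party imports.
from pandas.api import types as pd_types
from ray.tune.search import optuna
from sacred.observers import FileStorageObserver

# First-party/local imports, separated from the group above by a blank line
# as PEP 8 recommends.
from tuning_config import parallel_ex, tuning_ex
```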

benchmarking/tuning.py (2 outdated threads; resolved)
@AdamGleave (Member) left a comment:

Right now it seems like there's a lot of duplication in specifying the pre-tuned hyperparameters in *.json files in benchmarking/, and in named_configs in each algorithm. If possible it seems preferable to make this more DRY (Don't Repeat Yourself) so that we have a single source of truth: otherwise I could imagine them drifting apart as we do future hyperparameter sweeps.

It is convenient to be able to write e.g. with seals_ant rather than with benchmarking/airl_seals_ant_best_hp_eval.json. But Sacred does support an ex.add_named_config: https://github.com/IDSIA/sacred/blob/17c530660d5b405af0f5c286b1a93f3d8911d026/sacred/ingredient.py#L251 So we could replace some of these with that.

It does seem like the named configs have been edited to be more minimal. That's great, but as a user, which one should I use? If the answer is "always use the named_config" then we should really consider editing the JSONs (manually or automatically). If the answer is "it depends" then it would help to document the pros and cons and justify why we need both.
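
A minimal sketch of that approach (not code from this PR; the config name and file path are illustrative), loading a tuned-HP JSON file and registering it as a Sacred named config:

```python
import json

from sacred import Experiment

ex = Experiment("train_adversarial")

# Register the tuned hyperparameters under a short name so that
# `with seals_ant` works on the command line instead of passing the JSON path.
with open("benchmarking/airl_seals_ant_best_hp_eval.json") as f:
    ex.add_named_config("seals_ant", json.load(f))
```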

benchmarking/tuning.py (outdated; resolved)
```python
][0]

if print_return:
    all_returns = df[df["mean_return"] == row["mean_return"]][return_key]
```
@AdamGleave (Member) commented:

This is a bit fragile: you could in theory have multiple distinct hyperparameter groups that led to the same mean returns across seeds, but in practice it's probably OK.

@taufeeque9 (Collaborator, Author) commented:

If we get multiple distinct hyperparameter groups with the same mean return across seeds, we pick the first group from the df. The ordering of groups in the df should be arbitrary, so doing this should be fine, right?
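
A minimal sketch of that tie-breaking behaviour, with illustrative data and column names:

```python
import pandas as pd

# Illustrative data: two hyperparameter groups tie on mean_return.
df = pd.DataFrame(
    {"mean_return": [100.0, 100.0, 80.0], "batch_size": [16, 32, 64]}
)

# If several groups share the best mean return, keep the first matching row;
# the ordering of groups in df is arbitrary, so any of the tied rows is fine.
best_rows = df[df["mean_return"] == df["mean_return"].max()]
chosen = best_rows.iloc[0]
print(chosen["batch_size"])  # -> 16, the first of the tied groups
```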

benchmarking/tuning_config.py (outdated; resolved)
src/imitation/scripts/analyze.py (outdated; resolved)
src/imitation/scripts/parallel.py (3 outdated threads; resolved)
tests/scripts/test_scripts.py (outdated; resolved)
@taufeeque9 (Collaborator, Author) commented:

> I think we should either move the scripts into src/imitation/scripts/ and make them part of the package

I have moved the benchmarking scripts into src/imitation/scripts and the tuned hyperparameter JSON files into src/imitation/scripts/config/tuned_hps. This seems cleaner and more elegant to me than keeping everything in the benchmarking directory. With these changes the benchmarking directory seems redundant, as it only contains a util.py file to clean up the JSON files. We could move util.py somewhere else and delete the benchmarking directory.

> Right now it seems like there's a lot of duplication in specifying the pre-tuned hyperparameters in *.json files in benchmarking/, and in named_configs in each algorithm.

This was a great point, thanks for noting it! I have added all of the JSON files as named configs in their respective train_<algo> scripts, so now we can include the named configs directly from the command line instead of passing the JSON files. As a result, I have also removed the old named configs for the algorithms and environments we have tuned.

For example, in the train_adversarial script, I have removed the seals_half_cheetah named config and added airl_seals_half_cheetah and gail_seals_half_cheetah configs for AIRL and GAIL respectively. A user can now run the following:

python -m imitation.scripts.train_adversarial airl with airl_seals_half_cheetah

Note that there is a redundancy in the command: airl is mentioned twice. I've tried to think of ways to remove this redundancy but couldn't find one that preserves the command name airl while removing airl from the named config. We could perhaps remove the command name altogether and keep algo_env config names, but that would be quite a big change for this PR. I think it's best to raise this as an issue for now so that we can deal with it in a later PR.

@AdamGleave (Member) left a comment:

LGTM, some minor suggestions.

CI is failing right now -- we should get it green before merging.

The lint error is just whitespace, so it should be easy to fix.

The unit test error is in test_scripts for eval_policy. I'm not sure why it's being triggered here, as we haven't changed that code; it may just be a Gymnasium upgrade triggering it. If so, I'm happy for you to do a hotfix for this in a separate PR; we can get that hotfix merged and then finally merge this PR. It can probably be resolved by changing gym.make to include an appropriate render_mode="rgb_array".
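
A minimal sketch of the suggested change, assuming a Gymnasium environment (the environment id is just an example):

```python
import gymnasium as gym

# Request RGB frames explicitly so that rendering-based evaluation keeps
# working after the Gymnasium upgrade.
env = gym.make("CartPole-v1", render_mode="rgb_array")
env.reset()
frame = env.render()  # returns an RGB array because of the render_mode above
```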

> With these changes the benchmarking directory seems redundant, as it only contains a util.py file to clean up the JSON files. We could move util.py somewhere else and delete the benchmarking directory.

Agreed, it's probably best to move util.py somewhere else. There's no obvious alternative location for benchmarking/README.md, and we might want to include benchmark table output etc. in that directory, so I'm OK keeping it around as just a stub for now; you could also move it to e.g. BENCHMARK.md at the top level.

> I've tried to think of ways to remove this redundancy but couldn't find one that preserves the command name airl while removing airl from the named config.

I think this is fine, people can type the extra 5 keystrokes :)

benchmarking/README.md (4 outdated threads; resolved)
@ernestum (Collaborator) commented Oct 9, 2023

test_experiments is now also crashing under macOS, so I guess we need to fix that one too?

@ernestum (Collaborator) left a comment:

LGTM
The Windows pipeline will be fixed in #811.
Can you merge this, @adam?

AdamGleave merged commit 20366b0 into master on Oct 10, 2023 (7 of 9 checks passed).
AdamGleave deleted the benchmark-pr branch on October 10, 2023 at 23:07.
@AdamGleave (Member) commented:

Merged -- well done @taufeeque9 for pushing this over the finish line, and thanks @ernestum for the detailed review!
