Add scripts and configs for hyperparameter tuning #675

Merged: 69 commits into master from benchmark-pr on Oct 10, 2023

Conversation

@taufeeque9 (Collaborator) commented Feb 9, 2023

Description

This PR adds the scripts and configs for tuning hyperparameters of imitation algorithms. The results in the first draft of the paper were computed using the configs in the PR. Note that the preference comparisons algorithm still needs to be tuned on all of the environments.

Changes:

  • imitation/scripts/parallel.py: Added features useful for tuning hyperparameters (HPs) of the algorithms. Specifically, added options to 1) repeat trials across configs to get a better (mean) estimate of the return of HP configs, 2) evaluate the best config from an HP tune run, and 3) restart failed trials of a tune run.
  • imitation/scripts/analyze.py: Added table verbosity level 3, which generates a CSV with all the config HPs, including RL HPs and the specific algorithm's arguments, along with the expert returns and the returns of the imitation algorithm.
  • imitation/scripts/config/parallel.py: Added HP search spaces for various algorithms, including BC, DAgger, GAIL, and AIRL. The search space for Preference Comparisons may not be good enough yet (see the illustrative search-space sketch after this list).
  • imitation/scripts/config/train_*.py: The added configs contain the HPs for the tuned base RL algorithm.
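
For illustration, a hedged sketch of what a search-space entry of this kind can look like (the parameter names and ranges below are hypothetical, not the tuned values from this PR):

```python
# Hypothetical sketch of a Ray Tune search-space entry for BC, in the spirit
# of the entries added to imitation/scripts/config/parallel.py. The parameter
# names and ranges are illustrative only.
from ray import tune

example_bc_search_space = {
    "named_configs": ["seals_half_cheetah"],
    "config_updates": {
        "bc": {
            "batch_size": tune.choice([8, 16, 32, 64]),
            "l2_weight": tune.loguniform(1e-6, 1e-2),
        },
    },
}
```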

Testing

Updated test cases for parallel and tested HP tuning manually.

codecov bot commented Feb 9, 2023

Codecov Report

Merging #675 (664fc37) into master (b452384) will decrease coverage by 0.58%.
Report is 5 commits behind head on master.
The diff coverage is 46.24%.

@@            Coverage Diff             @@
##           master     #675      +/-   ##
==========================================
- Coverage   96.33%   95.75%   -0.58%     
==========================================
  Files          93       93              
  Lines        8789     8884      +95     
==========================================
+ Hits         8467     8507      +40     
- Misses        322      377      +55     
Files Changed Coverage Δ
src/imitation/scripts/train_imitation.py 92.64% <ø> (ø)
src/imitation/scripts/config/train_rl.py 48.57% <15.15%> (-10.41%) ⬇️
src/imitation/scripts/config/train_adversarial.py 53.26% <19.23%> (-6.00%) ⬇️
src/imitation/scripts/config/train_imitation.py 52.63% <31.57%> (-7.02%) ⬇️
...ion/scripts/config/train_preference_comparisons.py 59.57% <35.00%> (-4.07%) ⬇️
src/imitation/scripts/config/parallel.py 65.71% <50.00%> (+9.35%) ⬆️
src/imitation/scripts/parallel.py 62.33% <51.72%> (-3.71%) ⬇️
src/imitation/scripts/ingredients/reward.py 87.01% <66.66%> (-0.83%) ⬇️
src/imitation/scripts/analyze.py 91.60% <100.00%> (+0.19%) ⬆️
src/imitation/scripts/config/analyze.py 80.00% <100.00%> (ø)
... and 5 more


@ernestum (Collaborator) left a comment:

First round of review, mostly with suggestions for the parallel script.

src/imitation/scripts/parallel.py (3 outdated threads; resolved)
src/imitation/scripts/config/train_adversarial.py (outdated; resolved)
@ernestum (Collaborator) commented:

Not part of this PR, but @taufeeque9, could you elaborate on the benchmarking/util.py script? It is currently unclear where in the benchmark pipeline it is used.

src/imitation/scripts/config/train_rl.py (resolved)
src/imitation/scripts/train_adversarial.py (outdated; resolved)
ernestum added this to the Release v1.0 milestone on May 25, 2023
@AdamGleave (Member) left a comment:

Are the scripts in benchmarking/ being type checked and linted? I don't see them in the output at https://app.circleci.com/pipelines/github/HumanCompatibleAI/imitation/3897/workflows/02ec5434-d883-4edf-9fa5-99faeb0645f2/jobs/15772. The directory benchmarking/ is not specified in the inputs config for pytype in setup.cfg.

I think we should either move the scripts into src/imitation/scripts/ and make them part of the package, or update the CI config to test things in benchmarking/ (worth checking we're not missing them for flake8 etc. as well).

```python
from pandas.api import types as pd_types
from ray.tune.search import optuna
from sacred.observers import FileStorageObserver
from tuning_config import parallel_ex, tuning_ex
```
@AdamGleave (Member) commented:

PEP8: standard library, then third-party, then first-party imports, with a blank line separating each group.

Suggested change: separate the first-party import (from tuning_config import parallel_ex, tuning_ex) from the third-party imports above it with a blank line.
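
For reference, a minimal sketch of the grouping the comment asks for, assuming these are the only imports at the top of the file:

```python
# Third-party imports.
from pandas.api import types as pd_types
from ray.tune.search import optuna
from sacred.observers import FileStorageObserver

# First-party/local imports, separated from the group above by a blank line
# as PEP 8 recommends.
from tuning_config import parallel_ex, tuning_ex
```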

benchmarking/tuning.py (2 outdated threads; resolved)
@AdamGleave (Member) left a comment:

Right now it seems like there's a lot of duplication in specifying the pre-tuned hyperparameters in *.json files in benchmarking/, and in named_configs in each algorithm. If possible it seems preferable to make this more DRY (Don't Repeat Yourself) so that we have a single source of truth: otherwise I could imagine them drifting apart as we do future hyperparameter sweeps.

It is convenient to be able to write e.g. with seals_ant rather than with benchmarking/airl_seals_ant_best_hp_eval.json. But Sacred does support an ex.add_named_config: https://github.com/IDSIA/sacred/blob/17c530660d5b405af0f5c286b1a93f3d8911d026/sacred/ingredient.py#L251 So we could replace some of these with that.

It does seem like the named configs have been edited to be more minimal. That's great, but as a user, which one should I use? If the answer is "always use the named_config" then we should really consider editing the JSONs (manually or automatically). If the answer is "it depends" then it would help to document the pros and cons and justify why we need both.
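
A minimal sketch of that approach (not code from this PR; the config name and file path are illustrative), loading a tuned-HP JSON file and registering it as a Sacred named config:

```python
import json

from sacred import Experiment

ex = Experiment("train_adversarial")

# Register the tuned hyperparameters under a short name so that
# `with seals_ant` works on the command line instead of passing the JSON path.
with open("benchmarking/airl_seals_ant_best_hp_eval.json") as f:
    ex.add_named_config("seals_ant", json.load(f))
```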

benchmarking/tuning.py (outdated; resolved)
```python
][0]

if print_return:
    all_returns = df[df["mean_return"] == row["mean_return"]][return_key]
```
@AdamGleave (Member) commented:

This is a bit fragile: you could in theory have multiple distinct hyperparameter groups that led to the same mean returns across seeds, but in practice it's probably OK.

@taufeeque9 (Collaborator, Author) commented:

If we get multiple distinct hyperparameter groups with the same mean return across seeds, we pick the first group from the df. The ordering of groups in the df should be arbitrary, so doing this should be fine, right?
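
A minimal sketch of that tie-breaking behaviour, with illustrative data and column names:

```python
import pandas as pd

# Illustrative data: two hyperparameter groups tie on mean_return.
df = pd.DataFrame(
    {"mean_return": [100.0, 100.0, 80.0], "batch_size": [16, 32, 64]}
)

# If several groups share the best mean return, keep the first matching row;
# the ordering of groups in df is arbitrary, so any of the tied rows is fine.
best_rows = df[df["mean_return"] == df["mean_return"].max()]
chosen = best_rows.iloc[0]
print(chosen["batch_size"])  # -> 16, the first of the tied groups
```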

benchmarking/tuning_config.py (outdated; resolved)
src/imitation/scripts/analyze.py (outdated; resolved)
src/imitation/scripts/parallel.py (3 outdated threads; resolved)
tests/scripts/test_scripts.py (outdated; resolved)
@taufeeque9 (Collaborator, Author) commented:

> I think we should either move the scripts into src/imitation/scripts/ and make them part of the package

I have moved the benchmarking scripts into src/imitation/scripts and the tuned hyperparameter JSON files into src/imitation/scripts/config/tuned_hps. This seems cleaner and more elegant to me than keeping everything in the benchmarking directory. With these changes the benchmarking directory seems redundant, as it only contains a util.py file to clean up the JSON files. We could move util.py somewhere else and delete the benchmarking directory.

> Right now it seems like there's a lot of duplication in specifying the pre-tuned hyperparameters in *.json files in benchmarking/, and in named_configs in each algorithm.

This was a great point, thanks for noting it! I have added all of the JSON files as named configs in their respective train_<algo> scripts, so now we can include the named configs directly from the command line instead of passing the JSON files. As a result, I have also removed the old named configs for the algorithms and environments we have tuned.

For example, in the train_adversarial script, I have removed the seals_half_cheetah named config and added airl_seals_half_cheetah and gail_seals_half_cheetah configs for AIRL and GAIL respectively. A user can now run the following:

python -m imitation.scripts.train_adversarial airl with airl_seals_half_cheetah

Note that there is a redundancy in the command: airl is mentioned twice. I've tried to think of ways to remove this redundancy but couldn't find one that preserves the command name airl while removing airl from the named config. We could perhaps remove the command name altogether and keep algo_env config names, but that would be quite a big change for this PR. I think it's best to raise this as an issue for now so that we can deal with it in a later PR.

@AdamGleave (Member) left a comment:

LGTM, some minor suggestions.

CI is failing right now -- we should get it green before merging.

The lint error is just whitespace, so it should be easy to fix.

The unit test error is in test_scripts for eval_policy. I'm not sure why it's being triggered here, as we haven't changed that code; it may just be a Gymnasium upgrade triggering it. If so, I'm happy for you to do a hotfix for this in a separate PR; we can get that hotfix merged and then finally merge this PR. It can probably be resolved by changing gym.make to include an appropriate render_mode="rgb_array".
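
A minimal sketch of the suggested change, assuming a Gymnasium environment (the environment id is just an example):

```python
import gymnasium as gym

# Request RGB frames explicitly so that rendering-based evaluation keeps
# working after the Gymnasium upgrade.
env = gym.make("CartPole-v1", render_mode="rgb_array")
env.reset()
frame = env.render()  # returns an RGB array because of the render_mode above
```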

> With these changes the benchmarking directory seems redundant, as it only contains a util.py file to clean up the JSON files. We could move util.py somewhere else and delete the benchmarking directory.

Agreed, it's probably best to move util.py somewhere else. There's no obvious alternative location for benchmarking/README.md, and we might want to include benchmark table output etc. in that directory, so I'm OK keeping it around as just a stub for now; you could also move it to e.g. BENCHMARK.md at the top level.

> I've tried to think of ways to remove this redundancy but couldn't find one that preserves the command name airl while removing airl from the named config.

I think this is fine, people can type the extra 5 keystrokes :)

benchmarking/README.md (4 outdated threads; resolved)
@ernestum (Collaborator) commented Oct 9, 2023

test_experiments is now also crashing under macOS, so I guess we need to fix that one too?

@ernestum (Collaborator) left a comment:

LGTM
The Windows pipeline will be fixed in #811.
Can you merge this, @adam?

AdamGleave merged commit 20366b0 into master on Oct 10, 2023 (7 of 9 checks passed).
AdamGleave deleted the benchmark-pr branch on October 10, 2023 at 23:07.
@AdamGleave (Member) commented:

Merged -- well done @taufeeque9 for pushing this over the finish line, and thanks @ernestum for the detailed review!
