Skip to content

Commit

Permalink
Prepare for v1.0.0 release (#314)
Browse files Browse the repository at this point in the history
* Prepare for v1.0.0 release

* add missing documentation

* update docs on open rl benchmark

* change site name

* update documentation

* point reproducibility script to master

* revert mkdocs change

* add docs

* Check requirements.txt exports in pre-commit CI

* check all files in pre-commit

* fix docs for dqn

* v1.0.0 blog (#315)

* v1.0.0 blog

* add changes

* support insider

* properly link github usernames

* Add a note on support gymnasium

* fix typo

* add link for google jax

* fix typo

* add release note and highlight jax's performance

* fix typo

* highlight performance

* Address comments

* update blog

* update description

* remove words

* quick change

* quick change

* quick change

* fix typo

* omit `dqn_jax.py` from the announcement
  • Loading branch information
vwxyzjn authored Nov 14, 2022
1 parent 19a0907 commit c37a3ec
Show file tree
Hide file tree
Showing 33 changed files with 497 additions and 152 deletions.
3 changes: 2 additions & 1 deletion .github/issue_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,8 @@

## Checklist
- [ ] I have installed dependencies via `poetry install` (see [CleanRL's installation guideline](https://docs.cleanrl.dev/get-started/installation/).
- [ ] I have checked that there is no similar [issue](https://github.com/vwxyzjn/cleanrl/issues) in the repo (required)
- [ ] I have checked that there is no similar [issue](https://github.com/vwxyzjn/cleanrl/issues) in the repo.
- [ ] I have checked the [documentation site](https://docs.cleanrl.dev/) and found not relevant information in [GitHub issues](https://github.com/vwxyzjn/cleanrl/issues).

## Current Behavior
<!--- Tell us what happens instead of the expected behavior -->
Expand Down
6 changes: 3 additions & 3 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,16 +16,16 @@
- [ ] I have updated the documentation and previewed the changes via `mkdocs serve`.
- [ ] I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.
If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.
- [ ] I have contacted [vwxyzjn](https://github.com/vwxyzjn) to obtain access to the [openrlbenchmark W&B team](https://wandb.ai/openrlbenchmark) (**required**).
- [ ] I have tracked applicable experiments in [openrlbenchmark/cleanrl](https://wandb.ai/openrlbenchmark/cleanrl) with `--capture-video` flag toggled on (**required**).
- [ ] I have added additional documentation and previewed the changes via `mkdocs serve`.
- [ ] I have explained note-worthy implementation details.
- [ ] I have explained the logged metrics.
- [ ] I have added links to the original paper and related papers (if applicable).
- [ ] I have added links to the PR related to the algorithm.
- [ ] I have added links to the PR related to the algorithm variant.
- [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [ ] I have added the learning curves (in PNG format with `width=500` and `height=300`).
- [ ] I have added the learning curves (in PNG format).
- [ ] I have added links to the tracked experiments.
- [ ] I have updated the overview sections at the [docs](https://docs.cleanrl.dev/rl-algorithms/overview/) and the [repo](https://github.com/vwxyzjn/cleanrl#overview)
- [ ] I have updated the tests accordingly (if applicable).
Expand Down
2 changes: 2 additions & 0 deletions .github/workflows/pre-commit.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,5 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
- uses: pre-commit/action@v2.0.3
with:
extra_args: --hook-stage manual --all-files
9 changes: 7 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,7 @@ You may also use a prebuilt development environment hosted in Gitpod:
| | [`ppo_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_continuous_action.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_continuous_actionpy)
| | [`ppo_atari_lstm.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_lstm.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_lstmpy)
| | [`ppo_atari_envpool.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_envpool.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_envpoolpy)
| | [`ppo_atari_envpool_xla_jax.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_envpool_xla_jax.py), [docs](/rl-algorithms/ppo/#ppo_atari_envpool_xla_jaxpy)
| | [`ppo_procgen.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_procgen.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_procgenpy)
| | [`ppo_atari_multigpu.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_multigpu.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_multigpupy)
| | [`ppo_pettingzoo_ma_atari.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_pettingzoo_ma_atari.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy)
Expand All @@ -138,14 +139,18 @@ You may also use a prebuilt development environment hosted in Gitpod:
|[Twin Delayed Deep Deterministic Policy Gradient (TD3)](https://arxiv.org/pdf/1802.09477.pdf) | [`td3_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py), [docs](https://docs.cleanrl.dev/rl-algorithms/td3/#td3_continuous_actionpy) |
| | [`td3_continuous_action_jax.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action_jax.py), [docs](https://docs.cleanrl.dev/rl-algorithms/td3/#td3_continuous_action_jaxpy) |
|[Phasic Policy Gradient (PPG)](https://arxiv.org/abs/2009.04416) | [`ppg_procgen.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppg_procgen.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppg/#ppg_procgenpy) |
|[Random Network Distillation (RND)](https://arxiv.org/abs/1810.12894) | [`ppo_rnd_envpool.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_rnd_envpool.py), [docs](/rl-algorithms/ppo-rnd/#ppo_rnd_envpoolpy) |


## Open RL Benchmark

CleanRL has a sub project called Open RL Benchmark (https://benchmark.cleanrl.dev/), where we have tracked thousands of experiments across domains. The benchmark is interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. Here are some screenshots.
To make our experimental data transparent, CleanRL participates in a related project called [Open RL Benchmark](https://github.com/openrlbenchmark/openrlbenchmark), which contains tracked experiments from popular DRL libraries such as ours, [Stable-baselines3](https://github.com/DLR-RM/stable-baselines3), [openai/baselines](https://github.com/openai/baselines), [jaxrl](https://github.com/ikostrikov/jaxrl), and others.

Check out https://benchmark.cleanrl.dev/ for a collection of Weights and Biases reports showcasing tracked DRL experiments. The reports are interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. In the future, Open RL Benchmark will likely provide an dataset API for researchers to easily access the data (see [repo](https://github.com/openrlbenchmark/openrlbenchmark)).

![](docs/static/o1.png)
![](docs/static/o2.png)
![](docs/static/o3.png)
![](docs/static/o1.png)


## Support and get involved
Expand Down
2 changes: 1 addition & 1 deletion benchmark/ppo.sh
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--workers 1


poetry install --with envpool
poetry install --with envpool,jax
poetry run python -m cleanrl_utils.benchmark \
--env-ids Alien-v5 Amidar-v5 Assault-v5 Asterix-v5 Asteroids-v5 Atlantis-v5 BankHeist-v5 BattleZone-v5 BeamRider-v5 Berzerk-v5 Bowling-v5 Boxing-v5 Breakout-v5 Centipede-v5 ChopperCommand-v5 CrazyClimber-v5 Defender-v5 DemonAttack-v5 \
--command "poetry run python ppo_atari_envpool_xla_jax.py --track --wandb-project-name envpool-atari --wandb-entity openrlbenchmark" \
Expand Down
14 changes: 9 additions & 5 deletions cleanrl_utils/benchmark.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,11 @@ def parse_args():
parser.add_argument("--num-seeds", type=int, default=3,
help="the number of random seeds")
parser.add_argument("--start-seed", type=int, default=1,
help="the number of random seeds")
parser.add_argument('--workers', type=int, default=0,
help='the number of eval workers to run benchmark experimenets (skips evaluation when set to 0)')
help="the number of the starting seed")
parser.add_argument("--workers", type=int, default=0,
help="the number of workers to run benchmark experimenets")
parser.add_argument("--auto-tag", type=lambda x: bool(strtobool(x)), default=True, nargs="?", const=True,
help="if toggled, the runs will be tagged with the output from `git describe --tags` (e.g., v1.0.0b2-11-g5db4db7)")
help="if toggled, the runs will be tagged with git tags, commit, and pull request number if possible")
args = parser.parse_args()
# fmt: on
return args
Expand Down Expand Up @@ -78,7 +78,9 @@ def autotag() -> str:
for env_id in args.env_ids:
commands += [" ".join([args.command, "--env-id", env_id, "--seed", str(args.start_seed + seed)])]

print(commands)
print("======= commands to run:")
for command in commands:
print(command)

if args.workers > 0:
from concurrent.futures import ThreadPoolExecutor
Expand All @@ -87,3 +89,5 @@ def autotag() -> str:
for command in commands:
executor.submit(run_experiment, command)
executor.shutdown(wait=True)
else:
print("not running the experiments because --workers is set to 0; just printing the commands to run")
4 changes: 4 additions & 0 deletions docs/blog/.authors.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
costa:
name: Costa Huang
description: Lead dev of CleanRL
avatar: https://avatars.githubusercontent.com/u/5555347
3 changes: 3 additions & 0 deletions docs/blog/.meta.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# comments: true
# hide:
# - feedback
1 change: 1 addition & 0 deletions docs/blog/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
# Blog
Loading

1 comment on commit c37a3ec

@vercel
Copy link

@vercel vercel bot commented on c37a3ec Nov 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Successfully deployed to the following URLs:

cleanrl – ./

docs.cleanrl.dev
cleanrl-vwxyzjn.vercel.app
cleanrl.vercel.app
cleanrl-git-master-vwxyzjn.vercel.app

Please sign in to comment.