Commit 476757a

added Transformer-XL (PPO-TrXL) to the navigation bar, improved docs

MarcoMeter committed Sep 9, 2024
1 parent cc24bfa commit 476757a
Showing 2 changed files with 6 additions and 5 deletions.
10 changes: 5 additions & 5 deletions docs/rl-algorithms/ppo-trxl.md
@@ -1,13 +1,13 @@
-# PPO leveraging Transformer-XL (TrXL) as Episodic Memory
+# Transformer-XL (PPO-TrXL)

## Overview

-Real-world tasks may expose imperfect information (e.g. partial observability). Such tasks require an agent to leverage memory capabilities. One way to do this is to use recurrent neural networks (e.g. LSTM) as seen in :material-github: [`ppo_atari_lstm.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_lstm.py), :material-file-document: [docs](/rl-algorithms/ppo/#ppo_atari_lstmpy). Here, Transformer-XL is used as episodic memory.
+Real-world tasks may expose imperfect information (e.g. partial observability). Such tasks require an agent to leverage memory capabilities. One way to do this is to use recurrent neural networks (e.g. LSTM) as seen in :material-github: [`ppo_atari_lstm.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_lstm.py), :material-file-document: [docs](/rl-algorithms/ppo/#ppo_atari_lstmpy). Here, Transformer-XL is used as episodic memory in Proximal Policy Optimization (PPO).

Original Paper and Implementation

* :material-file-document: [Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents](https://arxiv.org/abs/2309.17207)
-* :material-github: [neroRL](https://github.com/MarcoMeter/neroRL)
+* :material-github: [neroRL](https://github.com/MarcoMeter/neroRL), [Episodic Transformer Memory PPO](https://github.com/MarcoMeter/episodic-transformer-memory-ppo)

Related Publications and Repositories

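The overview paragraph changed above contrasts recurrent memory (LSTM) with Transformer-XL used as episodic memory. A minimal sketch of the idea, with hypothetical names and assumed shapes (an illustration of the mechanism, not the code in `ppo_trxl.py`): past hidden states are cached, the current step attends over them as keys and values, and gradients are stopped through the cache as in Transformer-XL.

```python
import torch
import torch.nn as nn


class EpisodicMemoryAttention(nn.Module):
    """Sketch of Transformer-XL-style memory: attend over cached hidden states."""

    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, h: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # h:      (batch, 1, dim)        current step's hidden state
        # memory: (batch, mem_len, dim)  cached hidden states from earlier steps
        kv = torch.cat([memory, h], dim=1)  # past states act as keys/values
        out, _ = self.attn(h, kv, kv)       # query with the current state
        return out


# Slide the memory window each step; detach so gradients do not flow
# through old timesteps (the Transformer-XL recurrence).
layer = EpisodicMemoryAttention()
memory = torch.zeros(8, 16, 64)  # (batch, mem_len, dim), assumed sizes
h = torch.randn(8, 1, 64)
out = layer(h, memory)
memory = torch.cat([memory[:, 1:], h.detach()], dim=1)
```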
@@ -22,7 +22,7 @@ Related Publications and Repositories

| Variants Implemented | Description |
| ----------- | ----------- |
-| :material-github: [`ppo_trxl.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_trxl/ppo_trxl.py), :material-file-document: [docs](/rl-algorithms/ppo_trxl#ppo_trxlpy) | For training on tasks like `Endless-MortarMayhem-v0`. |
+| :material-github: [`ppo_trxl.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_trxl/ppo_trxl.py), :material-file-document: [docs](/rl-algorithms/ppo-trxl#ppo_trxlpy) | For training on tasks like `Endless-MortarMayhem-v0`. |

Below is our single-file implementation of PPO-TrXL:

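The variants table points at tasks like `Endless-MortarMayhem-v0`. A hypothetical launch sketch; the `--env-id` flag follows the usual CleanRL convention, but the actual arguments should be checked with `python ppo_trxl.py --help`:

```python
# Hypothetical launch sketch; flag names are assumptions, verify with --help.
import subprocess

subprocess.run([
    "python", "cleanrl/ppo_trxl/ppo_trxl.py",
    "--env-id", "Endless-MortarMayhem-v0",
])
```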
@@ -97,7 +97,7 @@ Note: When training on potentially endless episodes, the cached hidden states de

Learning curves:

-<img src="./ppo-trxl/compare.png">
+<img src="../ppo-trxl/compare.png">


### Enjoy pre-trained models
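The hunk header above truncates a note about cached hidden states when training on potentially endless episodes. Whatever the full wording, the cache scales with the number of environments, the memory window length, the layer count, and the hidden size. A back-of-envelope sketch, with every size assumed for illustration:

```python
# Rough memory cost of a Transformer-XL hidden-state cache; all numbers
# here are assumptions for illustration, not ppo_trxl.py defaults.
num_envs, mem_len, num_layers, dim = 32, 256, 3, 384
bytes_per_float = 4  # float32

cache_bytes = num_envs * mem_len * num_layers * dim * bytes_per_float
print(f"hidden-state cache: {cache_bytes / 2**20:.1f} MiB")  # 36.0 MiB
```

Longer memory windows or more parallel environments grow this cache linearly, which is why endless episodes are called out in the note.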
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -49,6 +49,7 @@ nav:
- rl-algorithms/ppo-rnd.md
- rl-algorithms/rpo.md
- rl-algorithms/qdagger.md
+- rl-algorithms/ppo-trxl.md
- Advanced:
- advanced/hyperparameter-tuning.md
- advanced/resume-training.md
