🌲🌲🌲
Reinforcement learning algorithms based on TensorFlow 2.0.
This project includes SOTA and classic RL (reinforcement learning) algorithms for training agents that interact with Unity through ML-Agents Release 1 or with Gym. The goal of this framework is to provide stable implementations of standard RL algorithms while enabling fast prototyping of new methods.
It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research).
- Suitable for Windows, Linux, and OSX
- Faithful reimplementations of the original papers, with competitive performance
- Reusable modules
- Clear hierarchical structure and easy code control
- Compatible with OpenAI Gym and Unity3D Ml-agents
- Resuming training from where it stopped, retraining on a new task, fine-tuning
- Using another training run's model as parameter initialization by specifying `--load`
This project supports:
- Unity3D ml-agents.
- Gym (MuJoCo, PyBullet, gym_minigrid). For now, only two data types are compatible: `Box` and `Discrete`. Supports 99.65% of Gym environments (except `Blackjack-v0`, `KellyCoinflip-v0`, and `KellyCoinflipGeneralized-v0`). Supports parallel training with Gym environments; just set `--gym-agents` to the number of agents you want to train in parallel (because of the GIL, this turns out to be pseudo-multithreading). Supported observation type -> action type combinations:
  - Discrete -> Discrete
  - Discrete -> Box
  - Box -> Discrete
  - Box -> Box
  - Box/Discrete -> Tuple(Discrete, Discrete, Discrete)
- MultiAgent training. One brain controls multiple agents.
- MultiBrain training. The brains' models must use the same algorithm, or at least share the same update schedule (perStep or perEpisode).
- MultiImage input (only for ml-agents). Images are resized to the same shape, such as `[84, 84, 3]`, before being stored into the replay buffer.
- Four types of replay buffer; the default is ER:
- ER
- n-step ER
- Prioritized ER
- n-step Prioritized ER
- Noisy Net for better exploration (a minimal layer sketch follows this list).
- Intrinsic Curiosity Module (ICM) for almost all implemented off-policy algorithms.
- Parallel training of multiple scenes for Gym
- Unified environment data format between ml-agents and Gym
- Implementing an additional algorithm only requires writing a single file (algorithms share a similar structure).
- Many controllable factors and adjustable parameters
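
Noisy Net layers replace the deterministic weights of a dense layer with learnable mean and noise-scale parameters, so exploration comes from the network itself rather than ε-greedy action noise. Below is a minimal, self-contained TensorFlow 2 sketch of a factorized-Gaussian noisy dense layer; the class name `NoisyDense` and the `sigma0` hyper-parameter are illustrative and not necessarily the implementation used in this repository.

```python
import tensorflow as tf


class NoisyDense(tf.keras.layers.Layer):
    """Factorized-Gaussian noisy dense layer (Fortunato et al., 2018).
    Illustrative sketch only, not the exact layer defined in this repository."""

    def __init__(self, units, sigma0=0.4, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.sigma0 = sigma0

    def build(self, input_shape):
        self.in_dim = int(input_shape[-1])
        bound = 1.0 / self.in_dim ** 0.5
        mu_init = tf.keras.initializers.RandomUniform(-bound, bound)
        sigma_init = tf.keras.initializers.Constant(self.sigma0 * bound)
        self.w_mu = self.add_weight('w_mu', (self.in_dim, self.units), initializer=mu_init)
        self.w_sigma = self.add_weight('w_sigma', (self.in_dim, self.units), initializer=sigma_init)
        self.b_mu = self.add_weight('b_mu', (self.units,), initializer=mu_init)
        self.b_sigma = self.add_weight('b_sigma', (self.units,), initializer=sigma_init)

    @staticmethod
    def _f(x):
        # Noise-scaling function f(x) = sign(x) * sqrt(|x|) from the paper.
        return tf.sign(x) * tf.sqrt(tf.abs(x))

    def call(self, inputs):
        # Factorized noise: one noise vector per input unit, one per output unit.
        eps_in = self._f(tf.random.normal((self.in_dim, 1)))
        eps_out = self._f(tf.random.normal((1, self.units)))
        w = self.w_mu + self.w_sigma * (eps_in * eps_out)
        b = self.b_mu + self.b_sigma * tf.squeeze(eps_out, axis=0)
        return tf.matmul(inputs, w) + b
```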
For now, these algorithms are available:
- Single-agent training algorithms (some algorithms that only support continuous action spaces, e.g. DDPG, use the Gumbel-Softmax trick to implement discrete versions; see the sketch after this list):
- Q-Learning, Sarsa, Expected Sarsa
- 🐛Policy Gradient, PG
- 🐛Actor Critic, AC
- Advantage Actor Critic, A2C
- Trust Region Policy Optimization, TRPO
- 💥Proximal Policy Optimization, PPO
- Deterministic Policy Gradient, DPG
- Deep Deterministic Policy Gradient, DDPG
- 🔥Soft Actor Critic, SAC
- Tsallis Actor Critic, TAC
- 🔥Twin Delayed Deep Deterministic Policy Gradient, TD3
- Deep Q-learning Network, DQN, 2013, 2015
- Double Deep Q-learning Network, DDQN
- Dueling Double Deep Q-learning Network, DDDQN
- Deep Recurrent Q-learning Network, DRQN
- Deep Recurrent Double Q-learning, DRDQN
- Category 51, C51
- Quantile Regression DQN, QR-DQN
- Implicit Quantile Networks, IQN
- Rainbow DQN
- MaxSQN
- Soft Q-Learning, SQL
- Bootstrapped DQN
- Contrastive Unsupervised RL, CURL
- Hierarchical training algorithms: OC, AOC, PPOC, IOC, HIRO (see the table below)
- Multi-agent training algorithms (visual input not supported yet):
- Multi-Agent Deep Deterministic Policy Gradient, MADDPG
- Multi-Agent Deterministic Policy Gradient, MADPG
- Multi-Agent Twin Delayed Deep Deterministic Policy Gradient, MATD3
- Safe reinforcement learning algorithms (not stable yet): PD-DDPG (see the table below)
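
The Gumbel-Softmax trick mentioned above reparameterizes sampling from a categorical distribution so that a continuous-action algorithm such as DDPG can emit (approximately) one-hot discrete actions while remaining differentiable. Below is a minimal TensorFlow 2 sketch with an illustrative temperature value; the function and argument names are not taken from this repository.

```python
import tensorflow as tf


def gumbel_softmax(logits, temperature=1.0, hard=True):
    """Draw a differentiable (approximately one-hot) sample from a categorical
    distribution parameterized by `logits`. Illustrative sketch only."""
    # Gumbel(0, 1) noise via inverse transform sampling.
    u = tf.random.uniform(tf.shape(logits), minval=1e-10, maxval=1.0)
    gumbel = -tf.math.log(-tf.math.log(u))
    y_soft = tf.nn.softmax((logits + gumbel) / temperature, axis=-1)
    if hard:
        # Straight-through: discrete one-hot in the forward pass, soft gradients backward.
        y_hard = tf.one_hot(tf.argmax(y_soft, axis=-1), tf.shape(logits)[-1])
        y_soft = tf.stop_gradient(y_hard - y_soft) + y_soft
    return y_soft


# Example: an actor's logits become a (near) one-hot discrete action
# through which the critic's gradient can still flow.
logits = tf.constant([[2.0, 0.5, -1.0]])
action = gumbel_softmax(logits, temperature=0.5)
```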
| Algorithms (29) | Discrete | Continuous | Image | RNN | Command parameter |
| :-- | :-: | :-: | :-: | :-: | :-: |
| Q-Learning/Sarsa/Expected Sarsa | √ | | | | qs |
| PG | √ | √ | √ | | pg |
| AC | √ | √ | √ | √ | ac |
| A2C | √ | √ | √ | | a2c |
| TRPO | √ | √ | √ | | trpo |
| PPO | √ | √ | √ | | ppo |
| DQN | √ | | √ | √ | dqn |
| Double DQN | √ | | √ | √ | ddqn |
| Dueling Double DQN | √ | | √ | √ | dddqn |
| Bootstrapped DQN | √ | | √ | √ | bootstrappeddqn |
| Soft Q-Learning | √ | | √ | √ | sql |
| C51 | √ | | √ | √ | c51 |
| QR-DQN | √ | | √ | √ | qrdqn |
| IQN | √ | | √ | √ | iqn |
| Rainbow | √ | | √ | √ | rainbow |
| DPG | √ | √ | √ | √ | dpg |
| DDPG | √ | √ | √ | √ | ddpg |
| PD-DDPG | √ | √ | √ | √ | pd_ddpg |
| TD3 | √ | √ | √ | √ | td3 |
| SAC (with V network) | √ | √ | √ | √ | sac_v |
| SAC | √ | √ | √ | √ | sac |
| TAC | √ | √ | √ | √ | tac |
| MaxSQN | √ | | √ | √ | maxsqn |
| MADPG | √ | √ | | | ma_dpg |
| MADDPG | √ | √ | | | ma_ddpg |
| MATD3 | √ | √ | | | ma_td3 |
| OC | √ | √ | √ | √ | oc |
| AOC | √ | √ | √ | √ | aoc |
| PPOC | √ | √ | √ | √ | ppoc |
| IOC | √ | √ | √ | √ | ioc |
| HIRO | √ | √ | | | hiro |
| CURL | √ | √ | √ | | curl |
"""
Usage:
python [options]
Options:
-h,--help                   show this help message
-i,--inference              inference mode [default: False]
-a,--algorithm=<name>       specify the training algorithm [default: ppo]
-c,--config-file=<file>     specify the hyper-parameter config file of the model [default: None]
-e,--env=<file>             specify the environment name [default: None]
-p,--port=<n>               port [default: 5005]
-u,--unity                  whether to use the Unity client [default: False]
-g,--graphic                whether to show the graphical interface [default: False]
-n,--name=<name>            name of the training run [default: None]
-s,--save-frequency=<n>     save frequency [default: None]
-m,--models=<n>             how many models to train simultaneously [default: 1]
-r,--rnn                    whether to use an RNN model [default: False]
--store-dir=<file>          specify the directory for saving models, logs, and data [default: None]
--seed=<n>                  specify the random seed of the model [default: 0]
--unity-env-seed=<n>        specify the random seed of the Unity environment [default: 0]
--max-step=<n>              maximum steps per episode [default: None]
--max-episode=<n>           total number of training episodes [default: None]
--sampler=<file>            specify the file path of the random sampler [default: None]
--load=<name>               specify the training name of the model to load [default: None]
--prefill-steps=<n>         specify the number of experiences to prefill the replay buffer with [default: None]
--prefill-choose            choose actions randomly during no_op steps instead of using all zeros [default: False]
--gym                       whether to use a Gym training environment [default: False]
--gym-agents=<n>            specify the number of agents trained in parallel [default: 1]
--gym-env=<name>            specify the name of the Gym environment [default: CartPole-v0]
--gym-env-seed=<n>          specify the random seed of the Gym environment [default: 0]
--render-episode=<n>        specify from which episode the Gym environment starts rendering [default: None]
--info=<str>                a description of this training run, wrapped in double quotes [default: None]
--use-wandb                 whether to upload data to W&B [default: False]
Example:
python run.py -a sac -g -e C:/test.exe -p 6666 -s 10 -n test -c config.yaml --max-step 1000 --max-episode 1000 --sampler C:/test_sampler.yaml
python run.py -a ppo -u -n train_in_unity --load last_train_name
python run.py -ui -a td3 -n inference_in_unity
python run.py -gi -a dddqn -n inference_with_build -e my_executable_file.exe
python run.py --gym -a ppo -n train_using_gym --gym-env MountainCar-v0 --render-episode 1000 --gym-agents 4
python run.py -u -a ddpg -n pre_fill --prefill-steps 1000 --prefill-choose
"""
If you specify the gym flag, the unity flag, and an environment executable file path simultaneously, the following priority is used: gym > unity > unity_env (executable).
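
In code, this priority can be pictured as a simple cascade over the parsed command-line options. The helper below is purely illustrative (docopt-style keys matching the flags above), not the repository's actual implementation:

```python
def choose_env(options):
    """Illustrative only: resolve which backend to train in, mirroring the
    documented priority gym > unity > unity_env (executable file)."""
    if options['--gym']:
        return 'gym', options['--gym-env']   # e.g. CartPole-v0
    if options['--unity']:
        return 'unity', None                 # connect to the Unity Editor
    return 'unity_env', options['--env']     # launch the built executable
```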
- Logs, models, training parameter configurations, and data are stored in `C:/RLdata` on Windows, or `$HOME/RLdata` on Linux/OSX
- You may need to run with `su` or `sudo` on Linux/OSX
- The record directory layout is `RLdata/Environment/Algorithm/Group name (for ml-agents)/Training name/config&excel&log&model`
- Make sure the number of brains is greater than 1 when specifying `ma*` algorithms such as maddpg
- Multi-agent algorithms do not support visual input or PER for now
- Implementing a new algorithm takes 3 steps (a registration example is shown after these notes):
  - Write a `.py` file in the `algos/tf2algos` directory and make the policy inherit from the class `Policy`, `On_Policy`, or `Off_Policy`
  - Write the default configuration in `algos/tf2algos/config.yaml`
  - Register the new algorithm in the `algos` dictionary in `algos/tf2algos/register.py`, e.g. `'dqn': {'class': 'DQN', 'policy': 'off-policy', 'update': 'perStep'}`, making sure the `'class'` value matches the name of the algorithm class
- Set algorithms' hyper-parameters in `algos/tf2algos/config.yaml`
- Set the default training configuration in `config.py`
- Change neural network structures in `rls/tf2nn.py`
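
As an illustration of the registration step above, a hypothetical algorithm named `my_algo` could be added to the dictionary in `algos/tf2algos/register.py` as follows; the key, class name, and settings are placeholders that follow the entry format shown above:

```python
# algos/tf2algos/register.py -- illustrative entry for a hypothetical algorithm
algos = {
    # ... existing algorithms ...
    'my_algo': {
        'class': 'MyAlgo',        # must match the algorithm's class name in my_algo.py
        'policy': 'off-policy',   # or 'on-policy'
        'update': 'perStep',      # or 'perEpisode'
    },
}
```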
- RNN for on-policy algorithms
- Fix multi-agent algorithms
- DARQN
- ACER
- Ape-X
- R2D2
- ACKTR
- python>3.6, <=3.8
- tensorflow>=2.1.0
- numpy
- pywin32==224
- docopt
- pyyaml
- pillow
- openpyxl
- gym
- opencv-python
- ray, ray[debug] (for Linux-based OS)
$ git clone https://github.com/StepNeverStop/RLs.git
pip package coming soon.
If using this repository for your research, please cite:
@misc{RLs,
author = {Keavnn},
title = {RLs: Reinforcement Learning research framework for Unity3D and Gym},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/StepNeverStop/RLs}},
}
If you have any questions about this project, or spot any grammar mistakes, please let me know by opening an issue.