This is the final project for the Reinforcement Learning course of the 2018/2019 MVA master's program.
This project was carried out by Mehdi Boubnan & Ayman Chaouki. It consists of training an agent to play several scenarios of the game DOOM with deep reinforcement learning methods, ranging from Deep Q-Learning and its enhancements (Double Q-Learning, deep recurrent networks with LSTM, the dueling architecture, and prioritized experience replay) to Asynchronous Advantage Actor-Critic (A3C) and curiosity-driven learning.
You can take a look at our paper, Deep reinforcement learning applied to Doom, for more details on the algorithms and some empirical results.
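As a small illustration of the value-based methods listed above, here is a minimal sketch of a Double Q-Learning target computation in PyTorch. It is not the code used in this repository; the function, network, and tensor names are assumptions made for the example.

```python
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN targets: the online network selects the next action,
    the target network evaluates it, which reduces over-estimation bias."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # Bellman target; terminal transitions (dones == 1) do not bootstrap
        return rewards + gamma * next_q * (1.0 - dones)
```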
Here are two examples of agents trained with A3C.
- An operating system on which VizDoom can be built (there are some build problems with Ubuntu 16.04, for example); we use Ubuntu 18.04.
- NVIDIA GPU with CUDA and cuDNN (for optimal performance of the deep Q-learning methods).
- Python 3.6 (required to install TensorFlow).
- Install the VizDoom package according to your operating system; see https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md
- Install PyTorch:
conda install pytorch torchvision -c pytorch
- Install TensorFlow with GPU support; see https://www.tensorflow.org/install/pip
- Install TensorBoard and tensorboardX:
pip install tensorboard
pip install tensorboardX
- Install MoviePy:
pip install moviepy
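Once the dependencies are installed, you can sanity-check them with a short Python snippet such as the one below (a sketch; it only verifies that the packages import and that CUDA is visible to PyTorch):

```python
import torch
import tensorflow as tf
import vizdoom

print("ViZDoom imported from:", vizdoom.__file__)
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("TensorFlow:", tf.__version__)
```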
- Clone this repo:
git clone https://github.com/Swirler/Deep-Reinforcement-Learning-applied-to-DOOM
cd Deep-Reinforcement-Learning-applied-to-DOOM
cd "Deep Q Learning"
- scenarios : configuration and .wad files of the following scenarios: basic, deadly corridor, and defend the center (a minimal ViZDoom usage sketch follows this list).
- weights : the trained weights for each scenario are saved here.
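The configuration and .wad files are what the ViZDoom engine loads to define a scenario. The following sketch shows the bare interaction loop with the ViZDoom Python API, independent of this repository's code; the config path and the constant action are placeholders:

```python
import itertools
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("scenarios/basic.cfg")  # placeholder path to one of the scenario configs
game.init()

# Build one action per combination of available buttons
n_buttons = game.get_available_buttons_size()
actions = [list(a) for a in itertools.product([0, 1], repeat=n_buttons)]

game.new_episode()
while not game.is_episode_finished():
    state = game.get_state()
    frame = state.screen_buffer           # raw screen; an agent would preprocess this and feed it to its network
    action = actions[0]                   # placeholder: a trained agent would pick its best action here
    reward = game.make_action(action, 2)  # repeat the action for 2 tics (frame skipping)
print("Total reward:", game.get_total_reward())
game.close()
```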
- You can view training rewards, game variables and loss plots by running
tensorboard --logdir runs
and clicking the URL http://localhost:6006
- Train a model with train.py, for example:
python train.py --scenario basic --window 1 --batch_size 32 --total_episodes 100 --lr 0.0001 --freq 20
- The previous command saves training weights in weights/basic/ every 20 episodes. You can use the following command to watch your agent playing:
python play.py --scenario basic --window 1 --weights weights/none_19.pth --total_episodes 20 --frame_skip 2
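Among the DQN enhancements mentioned in the introduction, prioritized experience replay changes how transitions are sampled from the replay memory during training. Below is a minimal proportional-priority sketch; it is not the implementation used here, and the class name and hyper-parameters are assumptions:

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized replay (sketch): transitions with larger TD error
    are sampled more often; alpha controls how strongly priorities matter."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def push(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:      # drop the oldest transition when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-5) ** self.alpha)

    def sample(self, batch_size):
        probs = np.array(self.priorities) / np.sum(self.priorities)
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx   # indices returned so priorities can be updated later
```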
cd "A3C_Curiosity"
- scenarios : configuration and .wad files of the following scenarios: basic, deadly corridor, defend the center, defend the line, and my way home.
- saves : models, TensorBoard summaries, and worker GIFs generated during training are saved here.
- You can view training rewards, game variables and loss plots by running
python utils/launch_tensorboard.py
- Train a model with main.py, for example:
- Deadly corridor with default parameters:
python main.py --scenario deadly_corridor --actions all --num_workers 12 --max_episodes 1600
- Basic with default parameters:
python main.py --scenario basic --actions single --num_workers 12 --max_episodes 1200
- Deadly corridor with default parameters, using PPO:
python main.py --use_ppo --scenario deadly_corridor --actions all --num_workers 12 --max_episodes 1600
- Deadly corridor with default parameters, using curiosity (see the intrinsic-reward sketch at the end of this section):
python main.py --use_curiosity --scenario deadly_corridor --actions all --num_workers 12 --max_episodes 1600
See utils/args.py for more parameters.
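For context on what these training commands optimize: in A3C, several workers run in parallel and each updates a shared network with an actor-critic loss combining the policy gradient weighted by the advantage, a value (critic) loss, and an entropy bonus. A minimal PyTorch sketch of that loss follows; it is illustrative only, and the tensor names are assumptions:

```python
import torch
import torch.nn.functional as F

def a3c_loss(log_probs, values, entropies, returns,
             value_coef=0.5, entropy_coef=0.01):
    """Actor-critic loss for one worker's rollout.

    log_probs : log pi(a_t | s_t) of the actions taken, shape (T,)
    values    : V(s_t) predicted by the critic, shape (T,)
    entropies : entropy of pi(. | s_t), shape (T,)
    returns   : discounted (bootstrapped) returns, treated as constants, shape (T,)
    """
    advantages = returns - values
    policy_loss = -(log_probs * advantages.detach()).mean()   # actor: reinforce advantageous actions
    value_loss = F.mse_loss(values, returns.detach())         # critic: regress V towards the returns
    entropy_bonus = entropies.mean()                          # encourage exploration
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
```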
- You can use the following command to watch your agent playing, using the last trained model:
python main.py --play --scenario deadly_corridor --actions all --play_episodes 10
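As a final note on the --use_curiosity option: curiosity-driven learning adds an intrinsic reward equal to the prediction error of a learned forward model, so the agent is pushed towards states it cannot yet predict (the Intrinsic Curiosity Module, ICM). The sketch below is illustrative only; the module structure, feature dimension, and scaling factor are assumptions:

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next state embedding from the current embedding and the action."""
    def __init__(self, feat_dim, n_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, phi_s, action_onehot):
        return self.net(torch.cat([phi_s, action_onehot], dim=1))

def intrinsic_reward(forward_model, phi_s, phi_next, action_onehot, eta=0.01):
    """ICM-style intrinsic reward: squared prediction error of the forward model."""
    with torch.no_grad():
        phi_pred = forward_model(phi_s, action_onehot)
        return 0.5 * eta * (phi_pred - phi_next).pow(2).sum(dim=1)
```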