An implementation of the following methods: Deep Q-Network (DQN), Double Deep Q-Network (DDQN), the Dueling architecture, Deep Quality-Value (DQV) learning, and DQV-Max. The implementation is based on the following papers:
- Playing Atari with Deep Reinforcement Learning
- Deep Reinforcement Learning with Double Q-learning
- Dueling Network Architectures for Deep Reinforcement Learning
- Deep Quality-Value (DQV) Learning
- Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms
The methods are trained and evaluated on the Catch game.
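These algorithms differ mainly in the bootstrap target each network regresses on. The snippet below is a minimal, illustrative PyTorch sketch of those targets, not the repository's code; the function names and tensor shapes are assumptions, and which copy of each network (online vs. target) provides the bootstrap follows the papers above rather than this codebase.

```python
import torch

def dqn_target(reward, next_q_target, done, gamma):
    # DQN: y = r + gamma * max_a Q_target(s', a)
    return reward + gamma * (1.0 - done) * next_q_target.max(dim=1).values

def double_dqn_target(reward, next_q_online, next_q_target, done, gamma):
    # Double DQN: the online network selects the action, the target network evaluates it.
    best_action = next_q_online.argmax(dim=1, keepdim=True)      # (batch, 1)
    evaluated = next_q_target.gather(1, best_action).squeeze(1)  # (batch,)
    return reward + gamma * (1.0 - done) * evaluated

def dqv_target(reward, next_v, done, gamma):
    # DQV: both the Q-network and the V-network regress on y = r + gamma * V(s').
    return reward + gamma * (1.0 - done) * next_v.squeeze(-1)

def dqv_max_v_target(reward, next_q, done, gamma):
    # DQV-Max: the V-network instead regresses on y = r + gamma * max_a Q(s', a).
    return reward + gamma * (1.0 - done) * next_q.max(dim=1).values
```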
To install all dependencies, run the following command:
pip install -r requirements.txt
To train the agent, run the following command (an example invocation is shown after the list of options):
python source/train_agent.py [Training Options]
- --run_name (str): Name of the run.
- --algorithm ({DQN,Dueling_architecture,DQV,DQV_max}): Type of algorithm to use for training.
- --log_video: Whether to log a video of the agent's performance.
- --max_epochs (int): Maximum number of steps to train for.
- --batch_size (int): Batch size for training.
- --batches_per_step (int): Number of batches to sample from replay buffer per agent step.
- --optimizer ({Adam,RMSprop,SGD}): Optimizer to use for training.
- --learning_rate (float): Learning rate for training.
- --gamma (float): Discount factor.
- --epsilon_start (float): Initial epsilon.
- --epsilon_end (float): Final epsilon.
- --epsilon_decay_rate (int): Number of steps to decay epsilon over.
- --buffer_capacity (int): Capacity of replay buffer.
- --replay_warmup_steps (int): Number of steps to warm up the replay buffer.
- --target_net_update_freq (int): Number of steps between target network updates.
- --soft_update_tau (float): Tau for soft target network updates.
- --double_q_learning: Whether to use double Q-learning.
- --hidden_size (int): Number of hidden units in the feedforward network.
- --n_filters (int): Number of filters in the convolutional network.
- --prioritized_replay: Whether to use prioritized replay.
- --prioritized_replay_alpha (float): Alpha parameter for prioritized replay.
- --prioritized_replay_beta (float): Beta parameter for prioritized replay.
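For example, a DQN run on Catch might be launched as follows (the flag values are illustrative, not the repository's defaults):

python source/train_agent.py --run_name dqn_catch --algorithm DQN --max_epochs 5000 --batch_size 32 --optimizer Adam --learning_rate 0.001 --gamma 0.99 --epsilon_start 1.0 --epsilon_end 0.05 --epsilon_decay_rate 1000 --buffer_capacity 10000 --target_net_update_freq 100

Adding --double_q_learning to such a run enables the double Q-learning variant (DDQN).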