Gomoku RL

Documentation: https://hesic73.github.io/gomoku_rl/


Introduction

gomoku_rl is an open-source project that trains agents to play Gomoku through deep reinforcement learning. Previous works often rely on variants of AlphaGo/AlphaZero and use GPU resources inefficiently. Notably, many existing projects are limited to small boards, with only a few exceptions: [1] incorporates curriculum learning and other enhancements, while [2] and [3] collect transitions from multiple environments and also parallelize MCTS execution. In contrast, gomoku_rl features GPU-parallelized simulation and leverages recent advances in MARL. Starting from random play, a model can reach human-level performance on a $15\times15$ board within hours of training on a single RTX 3090.

Installation

Install gomoku_rl with the following command:

git clone git@github.com:hesic73/gomoku_rl.git
cd gomoku_rl
conda create -n gomoku python=3.11.5
conda activate gomoku
pip install -e .

I use Python 3.11.5, torch 2.1.0 and torchrl 0.2.1. Earlier versions of Python, as well as torch 1.x, should also be compatible.
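To compare your environment with the versions listed above, a small script like the following can report what is installed. It is a minimal sketch that is not part of gomoku_rl: the helper name installed_versions and the TESTED table are hypothetical, and it only reads package metadata through importlib.metadata, so it also runs when torch or torchrl is missing.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this README; used only as a reference point (hypothetical table).
TESTED = {"python": "3.11.5", "torch": "2.1.0", "torchrl": "0.2.1"}


def installed_versions(packages=("torch", "torchrl")):
    """Return {name: version string or None} for the interpreter and the given packages."""
    found = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # package is not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:8s} installed={ver}  tested={TESTED.get(name)}")
```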

Getting Started

gomoku_rl uses Hydra to configure training hyperparameters. You can modify the settings in cfg/train_InRL.yaml or override them from the command line:

# override default settings in cfg/train_InRL.yaml
python scripts/train_InRL.py num_env=1024 device=cuda epochs=500 wandb.mode=online
# or simply:
python scripts/train_InRL.py
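Hydra applies each key=value override on top of the YAML file, and dotted keys such as wandb.mode address nested fields. The sketch below imitates that merge with plain dictionaries to show what the training command changes. It is an illustration under simplifying assumptions: the helper names and the BASE values are hypothetical (not the real contents of cfg/train_InRL.yaml), and Hydra's actual override grammar is richer (for example, a + prefix is needed to add a key that is not already in the config).

```python
import copy

# Hypothetical stand-in for a few fields of cfg/train_InRL.yaml.
BASE = {"num_env": 256, "device": "cpu", "epochs": 100, "wandb": {"mode": "disabled"}}


def parse_value(text):
    """Rough approximation of how override values are typed (an assumption, not Hydra's parser)."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered == "null":
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # fall back to a plain string, e.g. "cuda" or "online"


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            node = node[name]  # unknown parent groups raise KeyError
        if not isinstance(node, dict) or leaf not in node:
            # Mirrors Hydra refusing to override keys that do not exist (without '+').
            raise KeyError(f"unknown config key: {key}")
        node[leaf] = parse_value(raw)
    return result


if __name__ == "__main__":
    print(apply_overrides(BASE, ["num_env=1024", "device=cuda", "epochs=500", "wandb.mode=online"]))
```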

By default, checkpoints are saved to wandb/*/files, or to tempfile.gettempdir() if wandb.mode=='disabled'. To use a different output directory, set the run_dir parameter.
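Because each W&B run gets its own wandb/*/files directory, locating the newest checkpoint to pass to scripts/demo.py can be tedious. A hypothetical helper such as latest_checkpoint (not part of the repository) can pick the most recently modified file, assuming checkpoints use the *.pt extension.

```python
import glob
import os


def latest_checkpoint(root="wandb", pattern="*.pt"):
    """Return the most recently modified checkpoint under <root>/*/files, or None if none exist."""
    # "**" with recursive=True also matches files directly inside files/.
    candidates = glob.glob(os.path.join(root, "*", "files", "**", pattern), recursive=True)
    return max(candidates, key=os.path.getmtime, default=None)


if __name__ == "__main__":
    print(latest_checkpoint())
```

When wandb.mode=disabled, pass the run_dir you chose (or tempfile.gettempdir()) as the search location instead.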

After training, play Gomoku with your model using the scripts/demo.py script:

# Install PyQt5
pip install PyQt5
python scripts/demo.py device=cpu grid_size=56 piece_radius=24 checkpoint=/model/path
# default checkpoint (only for board_size=15)
python scripts/demo.py

Pretrained models for a $15\times15$ board are available under pretrained_models/15_15/. A model trained for one board size cannot be loaded for another, because the network architectures do not match. In PPO, when share_network=True, the actor and the critic share an encoding module. At present, a PPO object with a shared encoder cannot load a checkpoint that was saved without one.
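When a checkpoint fails to load, comparing parameter names and shapes usually reveals whether the cause is a different board size or a shared versus separate encoder. The sketch works on plain {name: shape} dictionaries; with PyTorch you could build them as {k: tuple(v.shape) for k, v in module.state_dict().items()}. The parameter names in the example are hypothetical and will differ from the real networks.

```python
def compare_checkpoint(expected, checkpoint):
    """Compare {param_name: shape} maps of a model and a checkpoint.

    Returns (missing, unexpected, mismatched) as sorted lists of names.
    """
    missing = sorted(expected.keys() - checkpoint.keys())      # model needs, file lacks
    unexpected = sorted(checkpoint.keys() - expected.keys())   # file has, model lacks
    mismatched = sorted(
        name
        for name in expected.keys() & checkpoint.keys()
        if tuple(expected[name]) != tuple(checkpoint[name])    # e.g. 15x15 vs 19x19 heads
    )
    return missing, unexpected, mismatched


if __name__ == "__main__":
    # Hypothetical names: a shared encoder vs. an actor-owned encoder, 225 vs 361 actions.
    model = {"encoder.conv.weight": (32, 3, 3, 3), "actor.head.weight": (225, 64)}
    ckpt = {"actor.encoder.conv.weight": (32, 3, 3, 3), "actor.head.weight": (361, 64)}
    print(compare_checkpoint(model, ckpt))
```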

GUI

Note: for deployment, we use torch.jit.ScriptModule instead of torch.nn.Module. The *.pt files used by scripts/train_*.py are state dicts of a torch.nn.Module and cannot be used directly here.
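One way to turn a training state dict into a deployable ScriptModule is to load it into the matching nn.Module and call torch.jit.script; the saved file can then be opened with torch.jit.load in Python or torch::jit::load in LibTorch. The sketch uses a hypothetical TinyPolicy with a made-up 3-channel input; the real actor network, its observation encoding and the repository's own export path may differ, so treat this as an outline rather than the project's export script.

```python
import torch
import torch.nn as nn


class TinyPolicy(nn.Module):
    """Hypothetical stand-in for a trained actor network (not the repository's architecture)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.head = nn.Conv2d(16, 1, kernel_size=1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (N, 3, B, B) -> one logit per board cell: (N, B*B)
        x = torch.relu(self.conv(obs))
        return self.head(x).flatten(1)


def export_scripted(state_dict_path, out_path):
    """Load a state dict into the nn.Module, script it and save a TorchScript file."""
    model = TinyPolicy()
    model.load_state_dict(torch.load(state_dict_path, map_location="cpu"))
    model.eval()
    scripted = torch.jit.script(model)
    scripted.save(out_path)
    return scripted
```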

In addition to scripts/demo.py, there is a standalone C++ GUI application. To compile it, make sure Qt, LibTorch and CMake are installed. See https://pytorch.org/cppdocs/installing.html for instructions on installing the C++ distribution of PyTorch.

Here are the commands to build the executable:

# Make a directory
mkdir build; cd build

# Generate the build system
# If torch is not installed on your computer, specify the absolute path to Libtorch
cmake -DCMAKE_PREFIX_PATH=/absolute/path/to/libtorch ../src

# Alternatively, if torch is installed, use the following command
cmake -DCMAKE_PREFIX_PATH=`python3 -c 'import torch;print(torch.utils.cmake_prefix_path)'` ../src

# Build the executable
cmake --build . --config Release

PS: If CMake cannot find Torch, try adding set(Torch_DIR /absolute/path/to/libtorch/share/cmake/Torch) to src/CMakeLists.txt.

Algorithms

Presently, the framework implements PPO and DQN and is designed so that other RL methods can be added. For multi-agent training, it supports Independent RL and PSRO [4].

Notably, Independent RL has so far performed better than PSRO. As noted in [1], Gomoku is asymmetric between black and white, which makes it hard to train a single network to play both colors.

(Maybe I need to tune hyperparameters for PSRO.)
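With one agent per color, each agent learns only from the moves it made itself. The sketch splits a finished game into per-color trajectories and assigns the terminal rewards given in the Details section (+1 for the winner, -1 for the loser, on each agent's last move). It is an illustrative assumption about how transitions could be organized, not the repository's rollout code; the helper name is hypothetical, and giving a draw a reward of 0 is also an assumption.

```python
def split_by_player(actions, winner):
    """Split one finished game into per-color trajectories for Independent RL.

    actions: board positions in play order (black moves first).
    winner:  0 for black, 1 for white, None for a draw (assumed reward 0).
    Returns {0: [...], 1: [...]}, each entry an (action, reward, done) tuple.
    """
    trajectories = {0: [], 1: []}
    for t, action in enumerate(actions):
        trajectories[t % 2].append([action, 0.0, False])  # even steps: black, odd: white
    for player, steps in trajectories.items():
        if steps:
            steps[-1][2] = True  # the game ends for both agents
            if winner is not None:
                # The loser is penalized on its own last move, even though the
                # game ended on the opponent's turn.
                steps[-1][1] = 1.0 if player == winner else -1.0
    return {p: [tuple(s) for s in steps] for p, steps in trajectories.items()}
```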

Details

Free-style Gomoku is a two-player, zero-sum, extensive-form game. Two players alternately place black and white stones on the board, and the first to form an unbroken line of five or more stones of their color wins. In the context of Multi-Agent Reinforcement Learning (MARL), two agents learn competitively in the same environment. On each agent's turn, its observation is the (encoded) current board state, and its action is the selection of a position on the board at which to place a stone. We use action masking to prevent illegal moves. A win gives the agent a reward of +1, and a loss gives a reward of -1.
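GPU-parallel simulation amounts to stepping many boards stored as one tensor. The sketch, under stated assumptions (cell encoding 0/1/2, flat action indices, hypothetical function names), places one stone per board, derives the action mask from empty cells, and detects five-in-a-row on every board at once with a single conv2d over four line-shaped 5x5 kernels. It is not the implementation in gomoku_rl/, which may encode boards differently.

```python
import torch
import torch.nn.functional as F

EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical cell encoding


def line_kernels(device=None):
    """Four 5x5 kernels summing stones along a row, column, diagonal and anti-diagonal."""
    k = torch.zeros(4, 1, 5, 5, device=device)
    k[0, 0, 2, :] = 1.0                              # horizontal
    k[1, 0, :, 2] = 1.0                              # vertical
    k[2, 0] = torch.eye(5, device=device)            # main diagonal
    k[3, 0] = torch.eye(5, device=device).flip(-1)   # anti-diagonal
    return k


def action_mask(boards):
    """(N, B, B) boards -> (N, B*B) bool mask of legal (empty) cells."""
    return boards.flatten(1) == EMPTY


def step(boards, actions, players, kernels):
    """Place one stone per board in place and report wins.

    boards: (N, B, B) long; actions: (N,) flat indices; players: (N,) long (BLACK/WHITE).
    Returns (win, legal), both (N,) bool. Illegal actions leave the board unchanged.
    """
    n, size, _ = boards.shape
    idx = torch.arange(n, device=boards.device)
    rows, cols = actions // size, actions % size
    legal = boards[idx, rows, cols] == EMPTY
    boards[idx[legal], rows[legal], cols[legal]] = players[legal]
    own = (boards == players.view(n, 1, 1)).float().unsqueeze(1)  # (N, 1, B, B)
    # padding=2 lets every line of five, including ones touching the edge, sit under a kernel.
    counts = F.conv2d(own, kernels, padding=2)                      # (N, 4, B, B)
    win = (counts >= 4.5).flatten(1).any(dim=1) & legal
    return win, legal
```

Running on a GPU only requires creating the boards and kernels with device="cuda"; a line of six or more also counts as a win, because it contains a line of five.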

TO DO

Restructure the code to decouple rollout functionality from GomokuEnv.
Enhance documentation.
Further improvement

References

[1] https://arxiv.org/pdf/1809.10595

[2] https://github.com/initial-h/AlphaZero_Gomoku_MPI

[3] https://github.com/hijkzzz/alpha-zero-gomoku

[4] A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

[5] What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study

Citation

Please use this BibTeX entry if you want to cite this repository:

@misc{He2023gomoku_rl,
  author = {He, Sicheng},
  title = {gomoku_rl},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/hesic73/gomoku_rl}},
}
