OpenSpiel 1.3
This release adds several new games and algorithms, along with other improvements, bug fixes, and documentation updates.
Support and Process changes
- Added Python 3.11 support
- Added Roshambo bot population to wheels
- Removed Python 3.6 support
- Upgraded versions of supported extra packages (OR-Tools, abseil, JAX, TensorFlow, PyTorch, etc.)
Games
- Bach or Stravinsky matrix game
- Block Dominoes (Python)
- Crazy Eights
- Dou Dizhu
- Liar's Poker (Python)
- MAEDN (Mensch Ärgere Dich Nicht)
- Nine Men's Morris
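Each new game can be loaded and stepped through with the standard API. A minimal sketch, assuming Crazy Eights is registered under the short name `crazy_eights` (check `pyspiel.registered_names()` for the exact name in your build):

```python
import random

import pyspiel

game = pyspiel.load_game("crazy_eights")  # assumed short name
state = game.new_initial_state()
while not state.is_terminal():
  if state.is_chance_node():
    # Sample a chance outcome according to its distribution.
    outcomes, probs = zip(*state.chance_outcomes())
    state.apply_action(random.choices(outcomes, weights=probs)[0])
  else:
    state.apply_action(random.choice(state.legal_actions()))
print(state.returns())
```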
Game Transforms
- Add Noisy utility to leaves game transform
- Add Zero-sum game transform
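Transforms compose with base games through OpenSpiel's string-parameter syntax. A hedged sketch; the registered names (`add_noise`, `zerosum`) and their parameters are assumptions to verify against the game registry in your build:

```python
import pyspiel

# Perturb the leaf utilities of Kuhn poker with noise of magnitude 0.5
# (parameter names epsilon/seed are assumed).
noisy = pyspiel.load_game("add_noise(game=kuhn_poker(),epsilon=0.5,seed=1)")

# Wrap a general-sum game so that returns always sum to zero.
zs = pyspiel.load_game("zerosum(game=oh_hell())")

print(noisy.get_type().utility, zs.get_type().utility)
```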
Other environments
- Atari Learning Environment (ALE)
Algorithms
- Boltzmann Policy Iteration (for mean-field games)
- Correlated Q-learning
- Information State MCTS, Cowling et al. '12 (Python)
- LOLA and LOLA-DiCE (Foerster, Chen, Al-Shedivat, et al. '18) and Opponent Shaping (JAX)
- MIP Nash solver (Sandholm, Gilpin, and Conitzer '05)
- Proximal Policy Optimization (PPO); adapted from CleanRL. Supports single-agent use case, tested on ALE.
- Regret-matching (Hart & Mas-Colell '00) for normal-form games and as a PSROv2 meta-solver (see the sketch after this list)
- Regularized Nash Dynamics (R-NaD), Perolat, de Vylder, et al. '22, "Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning"
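To give a flavor of the regret-matching addition, here is a self-contained sketch of the algorithm itself on rock-paper-scissors (plain NumPy, not the library's API): each player mixes in proportion to positive cumulative regret, and the time-average policies converge toward equilibrium.

```python
import numpy as np

# Payoffs to player 0 in rock-paper-scissors; player 1 receives -A.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def rm_policy(cum_regret):
  """Play in proportion to positive cumulative regret (uniform if none)."""
  pos = np.maximum(cum_regret, 0.0)
  return pos / pos.sum() if pos.sum() > 0 else np.full(len(pos), 1 / len(pos))

regrets = [np.zeros(3), np.zeros(3)]
avg = [np.zeros(3), np.zeros(3)]
for _ in range(10_000):
  p0, p1 = rm_policy(regrets[0]), rm_policy(regrets[1])
  u0, u1 = A @ p1, -(A.T @ p0)   # per-action expected payoffs
  regrets[0] += u0 - p0 @ u0     # instantaneous regret vs. the current mix
  regrets[1] += u1 - p1 @ u1
  avg[0] += p0
  avg[1] += p1
print(avg[0] / 10_000, avg[1] / 10_000)  # approaches uniform (1/3, 1/3, 1/3)
```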
Bots
- Simple heuristic Gin Rummy bot
- Roshambo bot population (see python/examples/roshambo_bot_population.py)
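A hedged sketch of pitting two population bots against each other; the entry points (`pyspiel.roshambo_bot_names`, `pyspiel.make_roshambo_bot`) and the repeated-game string follow the example script above and may differ in your build:

```python
import numpy as np
import pyspiel
from open_spiel.python.algorithms import evaluate_bots

# 1000 throws per match, as in the original Roshambo competition.
game = pyspiel.load_game(
    "repeated_game(stage_game=matrix_rps(),num_repetitions=1000)")
names = pyspiel.roshambo_bot_names()            # assumed binding
bots = [pyspiel.make_roshambo_bot(p, names[p])  # assumed binding
        for p in range(2)]
returns = evaluate_bots.evaluate_bots(game.new_initial_state(), bots,
                                      np.random)
print(dict(zip(names[:2], returns)))
```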
Examples
- Opponent shaping on iterated matrix games example
- Roshambo population example
- Using the Nash bargaining solution for negotiation example
Improvements and other additions
- Add `Bot::Clone()` method for cloning bots
- Avoid relying on C++ exceptions for playthrough tests
- Add support for the Agent-vs-Task case in Nash averaging
- Add scoring variants to the game Oh Hell
- Add eligibility traces to C++ Q-learning and SARSA (see the sketch after this list)
- Allow creation of per-player random policies
- Support simultaneous move games in policy aggregator and exploitability
- Support UCIBot via pybind11
- Add single_tensor observer for all games
- Add used_indices for non-marginal solvers in PSROv2
- Add Flat Dirichlet random policy sampling
- Add several options to bargaining game: probabilistic ending, max turns, discounted utilities
- Add lambda returns support to JAX policy gradient
- Several improvements to Gambit EFG parser / support
- Add support for softmax policies in fictitious play
- Add temperature parameter to fixed point MFG algorithms
- Add information state tensor to battleship
- Add option to tabular BR to return maximum entropy BR
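The eligibility-trace change lands in the C++ tabular learners; purely as an illustration of the technique, here is a minimal, self-contained tabular SARSA(λ) sketch in Python with accumulating traces (all names are local to the sketch, not OpenSpiel API):

```python
import collections
import random

def egreedy(q, s, n_actions, eps):
  """Epsilon-greedy action with random tie-breaking."""
  if random.random() < eps:
    return random.randrange(n_actions)
  vals = [q[(s, a)] for a in range(n_actions)]
  best = max(vals)
  return random.choice([a for a, v in enumerate(vals) if v == best])

def sarsa_lambda(step, n_actions, episodes=500, alpha=0.1, gamma=0.99,
                 lam=0.9, eps=0.1):
  """Tabular SARSA(lambda) with accumulating eligibility traces."""
  q = collections.defaultdict(float)
  for _ in range(episodes):
    traces = collections.defaultdict(float)
    s, done = 0, False
    a = egreedy(q, s, n_actions, eps)
    while not done:
      s2, r, done = step(s, a)
      a2 = egreedy(q, s2, n_actions, eps)
      delta = r + (0.0 if done else gamma * q[(s2, a2)]) - q[(s, a)]
      traces[(s, a)] += 1.0                  # accumulate on the visited pair
      for key in list(traces):
        q[key] += alpha * delta * traces[key]
        traces[key] *= gamma * lam           # decay every trace each step
      s, a = s2, a2
  return q

# Toy 5-state chain: action 1 moves right (reward 1 on reaching the end),
# action 0 resets to the start.
def chain_step(s, a):
  if a == 1:
    return (s + 1, 1.0, True) if s + 1 == 4 else (s + 1, 0.0, False)
  return 0, 0.0, False

q = sarsa_lambda(chain_step, n_actions=2)
print([egreedy(q, s, 2, eps=0.0) for s in range(4)])  # learned: always go right
```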
Fixes
- Fix UCIBot compilation on Windows
- Misc fixes to Nash averaging
- RNaD: fix MLP torso in final layer
- Fix Dark Hex observation (max length)
- Fix max game length in abstracted poker games
- Fix legal moves in some ACPC (poker) game cases
- Fix joint policy aggregator
- Fix non-uniform chance outcome sampling in Deep CFR (TF2 & PyTorch)
- Fix randomization bug in alpha_zero_torch
Several other miscellaneous fixes and improvements.
Known issues
There are a few known issues that will be fixed in the coming months.
- pybind11 version collision in the C++ LibTorch AlphaZero build. See #966.
- PyTorch NFSP convergence issue. See #1008.
Acknowledgments
Thanks to Google DeepMind for continued support of development and maintenance of OpenSpiel.
Thanks to all of our contributors:
- Core Team: https://github.com/deepmind/open_spiel/blob/master/docs/authors.md
- All Contributors: https://github.com/deepmind/open_spiel/graphs/contributors