The goal of this project is to develop an agent that plays Blackjack with a positive net return (i.e. "beats the house") using reinforcement learning. To this end, different policies are tested in two different action spaces. Furthermore, we present a novel dynamic betting algorithm that lets the policy additionally adapt its betting amount in each round. For more information, see our project report.
We use two different action spaces for our static betting policies (i.e. policies that bet the same amount each round); both are illustrated in the sketch after this list:
- Limited action space: Only two of the allowed actions in Blackjack are used: hit and stand
- Full action space: All allowed actions of Blackjack are used: hit, stand, double, split and insurance
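As a rough illustration, the two action spaces can be written down as simple enumerations. The class and member names below are hypothetical and not taken from this repository:

```python
from enum import Enum, auto

class LimitedAction(Enum):
    """Limited action space: only the two basic moves."""
    HIT = auto()
    STAND = auto()

class FullAction(Enum):
    """Full action space: every allowed Blackjack move."""
    HIT = auto()
    STAND = auto()
    DOUBLE = auto()
    SPLIT = auto()
    INSURANCE = auto()
```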
In the limited action space, the following (static betting) policies have been tested (a Q-learning sketch follows this list):
- Value iteration
- Monte Carlo learning
- Q-learning
- Double Q-learning
- SARSA learning
- Deep Q-network
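To give an idea of how such a policy is trained, below is a minimal tabular Q-learning sketch on the limited action space (hit/stand). It uses Gymnasium's `Blackjack-v1` environment as a stand-in and illustrative hyperparameters; it is not the training code of this project:

```python
# Minimal tabular Q-learning on the limited action space (0 = stand, 1 = hit).
from collections import defaultdict
import random

import gymnasium as gym

env = gym.make("Blackjack-v1")
q = defaultdict(lambda: [0.0, 0.0])      # state -> [Q(stand), Q(hit)]
alpha, gamma, epsilon = 0.01, 1.0, 0.1   # illustrative hyperparameters

for episode in range(500_000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy exploration over the limited action space.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = max(range(2), key=lambda a: q[state][a])
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning update: bootstrap from the greedy value of the next state.
        target = reward + (0.0 if done else gamma * max(q[next_state]))
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
```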
In the full action space, the following (static betting) policies have been tested:
- Q-learning
- SARSA learning
During training of the aforementioned algorithms, the following exploration policies have been used to trade off exploration and exploitation (two of them are sketched after this list):
- Random policy
- Greedy policy
- Epsilon-greedy policy
- Upper confidence bound (UCB) policy
- Boltzmann policy
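For illustration, two of these selection rules can be written as stand-alone functions. The function names and signatures below are hypothetical and not tied to the code in this repository:

```python
import math
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def boltzmann(q_values, temperature):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    prefs = [math.exp(q / temperature) for q in q_values]
    return random.choices(range(len(q_values)), weights=prefs, k=1)[0]
```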
In the dynamic betting setting, a static betting policy (one of the policies above) from one of the two action spaces is augmented with our RL dynamic betting policy (see our project report for more detail), which uses, among other information, card counting.
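As background on the card-counting input, the sketch below shows a simplified Hi-Lo running count and a bet scaled by the true count. It only illustrates the counting signal; it is not the RL dynamic betting policy described in the report:

```python
# Simplified Hi-Lo counting: low cards (2-6) raise the count, high cards lower it.
HI_LO = {**{r: +1 for r in (2, 3, 4, 5, 6)},
         **{r: 0 for r in (7, 8, 9)},
         **{r: -1 for r in (10, "J", "Q", "K", "A")}}

def update_count(running_count, dealt_card):
    """Adjust the running count for every card that becomes visible."""
    return running_count + HI_LO[dealt_card]

def bet_size(base_bet, running_count, decks_remaining):
    """Scale the bet with the true count (running count per remaining deck)."""
    true_count = running_count / max(decks_remaining, 1)
    return max(base_bet, int(base_bet * true_count))
```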
For smaller decks (which are advantageous for card counting), our method achieves a large positive net return, i.e. it "beats the house", and also surpasses conventional methods used by professional Blackjack players. For larger decks, our method has not yet achieved a positive net return. We believe the cause lies in the (relatively) poor convergence of the static betting policies when trained in the full action space.