tic-tac-toe-rl-implementation

Created the extended tic tac toe reinforcement learning example from the Barto and Sutton reinforcement learning book in typescript. The reinforcement learning was added on top of my past project, which was just a regular tic tac toe game: https://github.com/Joel-Singh/tic-tac-toe

How it works

Creates a list of all possible states attaching a value to each (i.e a policy.) Default values of 1, 0, and 0.5 for winning, losing, and all other states respectively. After each move, the value of every state is updated, bringing it closer to the value of the proceeding state. Try playing the same first couple of moves and see how it learns.

Interesting code snippets

Implementation of a policy type. Each policy also keeps track of the possible next policies from it.

// @ts-ignore
function Policy(state: BoardState, value: number, possibleMoves: ReturnType<typeof Policy>[]) {
  return {
    state,
    value,
    possibleMoves: possibleMoves as ReturnType<typeof Policy>[]
  }
}
export type Policy = ReturnType<typeof Policy>

Here is how I create all possible policies. Recursively as a tree. Reino represents the rl player.

// assume Reino always goes second and is x
export function createAllPossiblePolicies() {
  const initialPolicy = Policy(createEmptyBoardState(), 0, []);

  function addPolicy(turnSymbol: possibleBoardStates, policy: Policy) {
    const newPolicyState = createCopy(policy.state)
    for (let i = 0; i < 9; i++) {
      if (policy.state[i] === 'empty') {
        newPolicyState[i] = turnSymbol;

        const newPolicy = Policy(createCopy(newPolicyState), 0, []);

        const xWinning = isWinner(newPolicyState, 'x');
        const oWinning = isWinner(newPolicyState, 'o');

        if (xWinning) {
          newPolicy.value = 1;
        } else if (oWinning) {
          newPolicy.value = 0;
        } else {
          newPolicy.value = 0.5;
        }

        policy.possibleMoves.push(newPolicy);

        const notAWinner = !(xWinning || oWinning);
        if (notAWinner) {
          addPolicy(turnSymbol === 'x' ? 'o' : 'x', newPolicy);
        }

        newPolicyState[i] = 'empty';
      } else {
        policy.possibleMoves.push(null);
      }
    }
  }

  addPolicy('o', initialPolicy);
  return initialPolicy;

  }
}

Function to update policies

  // equation on page 10 of rl by barto and sutton second edition: V(S_t) <- V(S_t) + a[V(S_t+1) - V(S_t)]
  function updatePolicy(currentPolicy: Policy, previousPolicy: Policy) {
    const stepSizeParameter = 1;
    previousPolicy.value = previousPolicy.value + stepSizeParameter*(currentPolicy.value - previousPolicy.value);
  }

Live Demo

https://joel-singh.github.io/tic-tac-toe-rl-implementation/

Name		Name	Last commit message	Last commit date
Latest commit History 134 Commits
src		src
.gitignore		.gitignore
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
project-screenshot.png		project-screenshot.png
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tic-tac-toe-rl-implementation

How it works

Interesting code snippets

Live Demo

About

Releases

Packages

Languages

Joel-Singh/tic-tac-toe-rl-implementation

Folders and files

Latest commit

History

Repository files navigation

tic-tac-toe-rl-implementation

How it works

Interesting code snippets

Live Demo

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages