Markov Decision Process

Solving a Markov Decision Process using Value Iteration, Policy Iteration, SARSA, Expected SARSA, and Q-Learning.

Problem Description



The maze environment is a 6x6 grid world that contains walls, rewards, and penalties. All green squares have a reward of +1, all orange/red squares have a penalty of -1, and all white squares have a reward of -0.04. All grey squares are walls and are unreachable states, so a reward for walls does not need to be defined; it has been set to 0 for this experiment.
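The square rewards described above can be collected into a simple lookup. This is a minimal illustrative sketch, not code from this repository; the name REWARDS and the colour keys are assumptions.

```python
# Reward per square type, as described above (names are illustrative).
REWARDS = {
    "green": 1.0,    # green squares: reward of +1
    "orange": -1.0,  # orange/red squares: penalty of -1
    "white": -0.04,  # white squares: small step cost
    "wall": 0.0,     # grey walls are unreachable; reward left at 0
}
```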



The transition model is as follows: the intended outcome occurs with probability 0.8, and with probability 0.1 each the agent moves at a right angle to the intended direction. If the move would make the agent walk into a wall, the agent stays in the same place as before. The rewards are -0.04 for the white squares, +1 for the green squares, and -1 for the orange/red squares. Note that there are no terminal states; the agent's state sequence is infinite.
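A minimal sketch of this transition model is shown below, assuming actions are unit moves on the grid. The function and helper names (sample_next_state, is_wall) are illustrative and not taken from this repository.

```python
import random

# Unit moves on the grid: (row offset, column offset).
ACTIONS = {"up": (-1, 0), "right": (0, 1), "down": (1, 0), "left": (0, -1)}
PERPENDICULAR = {"up": ("left", "right"), "down": ("left", "right"),
                 "left": ("up", "down"), "right": ("up", "down")}

def sample_next_state(state, action, is_wall, rng=random):
    """With probability 0.8 move in the intended direction and with
    probability 0.1 each move at a right angle; if the resulting square
    is a wall (or off the grid), stay in place."""
    side_a, side_b = PERPENDICULAR[action]
    actual = rng.choices([action, side_a, side_b], weights=[0.8, 0.1, 0.1])[0]
    dr, dc = ACTIONS[actual]
    candidate = (state[0] + dr, state[1] + dc)
    return state if is_wall(candidate) else candidate

# Example usage on a 6x6 grid with no interior walls (is_wall is hypothetical).
off_grid = lambda s: not (0 <= s[0] < 6 and 0 <= s[1] < 6)
print(sample_next_state((2, 3), "up", is_wall=off_grid))
```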


Note

There is no terminal state. For some reinforcement learning algorithms, the episode ends after a certain number of steps.


Getting Started

Install Requirements

pip install -r requirements.txt

Run Code

python __main__.py --algorithm=value_iteration --display_policy=True --display_utilities=True

Choice of Algorithms

  • value_iteration
  • policy_iteration
  • sarsa
  • expected_sarsa
  • q_learning
  • monte_carlo
  • dyna_q
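As a rough illustration of the temporal-difference methods in this list, the sketch below shows the standard tabular Q-learning update. It is the textbook rule, not code taken from this repository, and the names (q_table, alpha, gamma) are assumptions; SARSA would instead bootstrap on the action actually taken, and Expected SARSA on the expectation over the policy's action probabilities.

```python
from collections import defaultdict

def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error

# Example usage with a table keyed by (state, action) pairs.
q_table = defaultdict(float)
q_learning_update(q_table, state=(0, 0), action="right", reward=-0.04,
                  next_state=(0, 1), actions=["up", "right", "down", "left"])
```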

Results

Value Iteration

Policy

(see image in repository)

Utilities

(see image in repository)

Plot

(see image in repository)

Policy Iteration

Policy

(see image in repository)

Utilities

(see image in repository)

Plot

(see image in repository)
