milestone results

Marin Bukov edited this page Nov 20, 2016 · 8 revisions

potential problems of the algorithm

  1. The RL state does not know about the physical state: loops in the trajectory are potentially dangerous for convergence.
  2. Replays/forced learning induce overfitting: non-best (s,a) pairs are also updated through the tilings. This would not occur in a tabular algorithm, but the absence of such shared updates is also the main reason why tabular methods learn more slowly.
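
The second point can be illustrated with a minimal sketch. It is not the project's code: the one-dimensional state in [0, 1), the tiling sizes, and the names `active_tiles`, `TiledQ` and `TabularQ` are all assumptions made for illustration. It shows that a single update of one (s,a) pair under tile coding also shifts the value of a nearby state that shares tiles with it, whereas a tabular update touches only the visited entry.

```python
# Hypothetical sizes, chosen only for this illustration.
N_TILINGS = 4
TILES_PER_TILING = 10
SLOTS = TILES_PER_TILING + 1  # one extra tile per tiling absorbs the offset


def active_tiles(s):
    """Return one active tile index per tiling for a state s in [0, 1)."""
    width = 1.0 / TILES_PER_TILING
    tiles = []
    for t in range(N_TILINGS):
        offset = t * width / N_TILINGS  # each tiling is shifted by a fraction of a tile
        tiles.append(t * SLOTS + int((s + offset) / width))
    return tiles


class TiledQ:
    """Linear Q-function over tile-coded features, one weight vector per action."""

    def __init__(self, n_actions):
        self.w = [[0.0] * (N_TILINGS * SLOTS) for _ in range(n_actions)]

    def value(self, s, a):
        return sum(self.w[a][i] for i in active_tiles(s))

    def update(self, s, a, target, alpha=0.5):
        err = target - self.value(s, a)
        for i in active_tiles(s):
            # Every state that shares this tile sees the change.
            self.w[a][i] += alpha / N_TILINGS * err


class TabularQ:
    """Lookup table over states discretized to a grid of 0.01."""

    def __init__(self):
        self.q = {}

    def _key(self, s, a):
        return (round(s * 100), a)

    def value(self, s, a):
        return self.q.get(self._key(s, a), 0.0)

    def update(self, s, a, target, alpha=0.5):
        key = self._key(s, a)
        old = self.q.get(key, 0.0)
        self.q[key] = old + alpha * (target - old)


# One update of the pair (s=0.52, a=0) toward a target of 1.0 in both representations.
tiled, tabular = TiledQ(n_actions=2), TabularQ()
tiled.update(0.52, 0, 1.0)
tabular.update(0.52, 0, 1.0)

for s in (0.52, 0.54, 0.90):
    print(s, tiled.value(s, 0), tabular.value(s, 0))
```

In this sketch the state 0.54 was never visited, yet its tiled value moves because it shares tiles with 0.52, while the distant state 0.90 and the other action stay at zero. Repeating the same update many times, as replays do, repeats this side effect on the neighbouring states as well.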