milestone results

Marin Bukov edited this page Nov 20, 2016 · 8 revisions

potential problems of the algorithm

The RL state does not know about the physical state: loops in the trajectory are potentially dangerous for convergence.
Replays/forced learning induce overfitting: non-best (s,a) pairs are also updated through the tilings. This would not occur in a tabular algorithm, but the lack of such generalization is also the main reason why tabular methods learn more slowly.
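
The second point can be illustrated with a minimal sketch. It is not the code used for these results: it assumes a one-dimensional state in [0, 1), leaves out the action index, and uses made-up values for the number of tilings, the tile width and the learning rate. All function names are hypothetical. It shows that one update at a single state also changes the value estimate of a nearby state that shares tiles with it, while a tabular update touches only the visited entry.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
N_TILINGS = 4          # number of overlapping tilings
TILES_PER_TILING = 8   # 7 tiles cover [0, 1), plus 1 extra for the offsets
ALPHA = 0.5            # learning rate
N_BINS = 100           # resolution of the tabular comparison


def active_tiles(s):
    """Return the index of the active tile in each tiling for s in [0, 1)."""
    width = 1.0 / (TILES_PER_TILING - 1)
    # Each tiling is shifted by a fraction of one tile width.
    return [int((s + t * width / N_TILINGS) / width) for t in range(N_TILINGS)]


def q_tiled(w, s):
    """Value estimate: sum of the weights of the active tiles."""
    return sum(w[t, i] for t, i in enumerate(active_tiles(s)))


def update_tiled(w, s, target):
    """Move the estimate at s towards target; every active tile is changed."""
    delta = target - q_tiled(w, s)
    for t, i in enumerate(active_tiles(s)):
        w[t, i] += ALPHA / N_TILINGS * delta


def update_tabular(q, s, target):
    """Tabular counterpart: only the bin containing s is changed."""
    b = int(s * N_BINS)
    q[b] += ALPHA * (target - q[b])


w = np.zeros((N_TILINGS, TILES_PER_TILING))
q_tab = np.zeros(N_BINS)

# A single (replayed) update at s = 0.52 ...
update_tiled(w, 0.52, target=1.0)
update_tabular(q_tab, 0.52, target=1.0)

# ... also shifts the tiled estimate at the neighbouring state s = 0.55,
# while the tabular estimate there stays untouched.
print("tiled   Q(0.52), Q(0.55):", q_tiled(w, 0.52), q_tiled(w, 0.55))
print("tabular Q(0.52), Q(0.55):", q_tab[52], q_tab[55])
```

Repeated replays of the same trajectory apply this spill-over many times, which is how states off the best path can accumulate value they never earned. The same sharing of tiles is what lets the tiled estimate improve for states that have not been visited yet.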
