milestone results
Marin Bukov edited this page Nov 20, 2016 · 8 revisions
- The RL state does not know about the physical state: loops in the trajectory are potentially dangerous for convergence.
- Replays/forced learning induce overfitting: non-best (s,a) pairs are also updated through the tilings. This would not occur in a tabular algorithm, but the lack of such shared updates is also the main reason why tabular methods learn more slowly.
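
The second point can be illustrated with a minimal sketch. It is not the project's code: the 1D state in [0, 1], the four offset tilings, the single action, the step size, and the helper names `active_tiles`, `q`, `update`, and `bin_of` are all assumptions chosen for illustration. It shows that one update at a visited state also shifts the value of a nearby, unvisited state under tile coding, while a tabular update leaves the neighbour untouched.

```python
import numpy as np

N_TILINGS = 4   # assumed number of overlapping tilings
N_TILES = 8     # assumed tiles per tiling over the state range [0, 1]
WIDTH = 1.0 / N_TILES

# One weight per tile; each tiling gets one extra tile to cover its offset.
w = np.zeros(N_TILINGS * (N_TILES + 1))

def active_tiles(s):
    """Return the index of the single active tile in each tiling for state s."""
    idx = []
    for t in range(N_TILINGS):
        offset = t * WIDTH / N_TILINGS          # shift each tiling by a fraction of a tile
        i = min(int((s + offset) / WIDTH), N_TILES)
        idx.append(t * (N_TILES + 1) + i)
    return idx

def q(s):
    """Approximate value of state s (single action): sum of its active weights."""
    return w[active_tiles(s)].sum()

def update(s, target, alpha=0.5):
    """Move q(s) toward target; every state sharing a tile with s moves as well."""
    tiles = active_tiles(s)
    w[tiles] += alpha / N_TILINGS * (target - q(s))

s_visited, s_neighbour = 0.50, 0.56   # the neighbour is never updated directly

# Tile coding: a single update at the visited state.
update(s_visited, target=1.0)
q_visited, q_neighbour = q(s_visited), q(s_neighbour)

# Tabular baseline on a fine grid (100 bins, also an assumption).
table = np.zeros(100)

def bin_of(s):
    return min(int(s * 100), 99)

table[bin_of(s_visited)] += 0.5 * (1.0 - table[bin_of(s_visited)])
tab_visited, tab_neighbour = table[bin_of(s_visited)], table[bin_of(s_neighbour)]

print("tile coding:", q_visited, q_neighbour)
print("tabular:    ", tab_visited, tab_neighbour)
```

With these assumed settings the neighbour shares three of the four tiles with the visited state, so it receives most of the update even though it was never sampled. Repeating the same replayed trajectory compounds this effect, whereas the tabular version has to visit every state itself before its value changes.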