1. Reinforcement Learning Michael L. Littman Slides from http://www.cs.vu.nl/~elena/ml_13light.ppt which appear to have been adapted from http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-3/www/l20.ps
16. Note we used the general fact that |max_a f(a) − max_a g(a)| ≤ max_a |f(a) − g(a)|. This works with things other than max that satisfy this non-expansion property [Szepesvári & Littman, 1999].
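The non-expansion property above is easy to check numerically. The sketch below (my own illustration, not from the slides) verifies |max f − max g| ≤ max |f − g| on random vectors:

```python
import random

# Numeric check of the non-expansion property of max:
#   |max_a f(a) - max_a g(a)| <= max_a |f(a) - g(a)|
# which is the fact the convergence argument relies on.
random.seed(0)
for _ in range(1000):
    f = [random.uniform(-10, 10) for _ in range(5)]
    g = [random.uniform(-10, 10) for _ in range(5)]
    lhs = abs(max(f) - max(g))
    rhs = max(abs(a - b) for a, b in zip(f, g))
    assert lhs <= rhs + 1e-12  # never violated
```

Any operator with this property (e.g. min, or a fixed weighted average) can replace max in the update while preserving convergence, which is the point of the citation.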
18. Nondeterministic Case (2) Q learning generalizes to nondeterministic worlds. Alter the training rule to Q̂_n(s,a) ← (1 − α_n) Q̂_{n−1}(s,a) + α_n [r + γ max_{a′} Q̂_{n−1}(s′,a′)], where α_n = 1/(1 + visits_n(s,a)). Can still prove convergence of Q̂ to Q [Watkins and Dayan, 1992]. Standard properties of the learning rate: Σ_n α_n = ∞ and Σ_n α_n² < ∞.
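A minimal sketch of this nondeterministic update, on an assumed toy two-state MDP of my own (stochastic rewards and transitions, not from the slides). The learning rate α_n = 1/(1 + visits_n(s,a)) decays per state-action pair, so the rule averages over noisy targets instead of overwriting the estimate:

```python
import random
from collections import defaultdict

GAMMA = 0.9

def step(state, action):
    """Hypothetical stochastic world: noisy reward, random next state."""
    reward = (1.0 if action == 1 else 0.0) + random.gauss(0, 0.1)
    next_state = random.choice([0, 1])
    return reward, next_state

def q_learn(steps=20000):
    Q = defaultdict(float)       # Q-hat estimates, default 0
    visits = defaultdict(int)    # visits_n(s, a)
    state = 0
    random.seed(1)
    for _ in range(steps):
        action = random.choice([0, 1])           # explore uniformly
        reward, next_state = step(state, action)
        visits[(state, action)] += 1
        alpha = 1.0 / (1 + visits[(state, action)])
        target = reward + GAMMA * max(Q[(next_state, a)] for a in (0, 1))
        # Blend old estimate with the new sampled target (the altered rule):
        Q[(state, action)] = (1 - alpha) * Q[(state, action)] + alpha * target
        state = next_state
    return Q
```

With this decaying α the estimates settle despite reward noise; here action 1 (expected reward 1) ends up valued about 1 higher than action 0 in each state.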
19. Temporal Difference Learning (1) Q learning: reduce discrepancy between successive Q estimates. One-step time difference: Q^(1)(s_t, a_t) = r_t + γ max_a Q̂(s_{t+1}, a). Why not two steps? Q^(2)(s_t, a_t) = r_t + γ r_{t+1} + γ² max_a Q̂(s_{t+2}, a). Or n? Q^(n)(s_t, a_t) = r_t + γ r_{t+1} + ⋯ + γ^(n−1) r_{t+n−1} + γ^n max_a Q̂(s_{t+n}, a). Blend all of these: Q^λ(s_t, a_t) = (1 − λ) [Q^(1)(s_t, a_t) + λ Q^(2)(s_t, a_t) + λ² Q^(3)(s_t, a_t) + ⋯].
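These targets can be computed directly from a recorded trajectory. A sketch under assumed data (a short made-up reward sequence and fixed Q̂ estimates, not from the slides); a real implementation would truncate the λ-sum at the episode end and give the remaining weight to the final return:

```python
GAMMA = 0.9

# Assumed toy trajectory: rewards r_t, r_{t+1}, ... and the bootstrapped
# estimate max_a Qhat(s_{t+n}, a) at each visited state.
rewards = [1.0, 0.0, 2.0, 1.0]           # r_t, r_{t+1}, r_{t+2}, r_{t+3}
max_qhat = [5.0, 4.0, 6.0, 3.0, 2.0]     # max_a Qhat(s_{t+n}, a), n = 0..4

def n_step_target(n):
    """Q^(n): n discounted rewards plus a discounted bootstrapped tail."""
    tail = GAMMA ** n * max_qhat[n]
    return sum(GAMMA ** k * rewards[k] for k in range(n)) + tail

def lambda_target(lam, n_max=4):
    """Q^lambda = (1 - lam) * sum_{n>=1} lam^(n-1) * Q^(n), truncated at n_max."""
    return (1 - lam) * sum(lam ** (n - 1) * n_step_target(n)
                           for n in range(1, n_max + 1))
```

At λ = 0 the blend reduces to the one-step target Q^(1); as λ → 1 it weights longer lookaheads more heavily, trading bootstrap bias for reward-sequence variance.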