Solve Grid world problem

Reinforcement Learning in the Grid World Problem
Author: Alireza Andalib
Learning Machine

Supervised Learning:
Example, Class
Reinforcement Learning:
Situation, Reward, Situation, Reward, …
RL

Supervised Learning System
Inputs → Outputs
Training Info = desired (target) outputs
Error = (target output – actual output)
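
The error signal above can be made concrete with a short sketch. Only the line "error = target output - actual output" comes from the slide; the linear model, the learning rate, and the function names are illustrative assumptions.

```python
def predict(weights, inputs):
    """Actual output of a hypothetical linear system: weighted sum of inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def supervised_update(weights, inputs, target, learning_rate=0.1):
    """One delta-rule step (assumed update rule): shrink the error a little."""
    # Error = (target output - actual output), as on the slide.
    error = target - predict(weights, inputs)
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, error


weights = [0.0, 0.0]               # hypothetical starting weights
inputs, target = [1.0, 2.0], 1.0   # one made-up training example
errors = []
for _ in range(20):
    weights, error = supervised_update(weights, inputs, target)
    errors.append(abs(error))

print(errors[0], errors[-1])
```

Because the teacher supplies the target, the system knows both the size and the direction of its mistake on every example.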

RL System
Inputs → Outputs (“actions”)
Training Info = evaluations (“rewards” / “penalties”)
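
A minimal sketch of learning from evaluations alone, under made-up assumptions: the three actions, their rewards, and the function names below are hypothetical and only illustrate that the system is told how good an action was, never which action was correct.

```python
def environment(action):
    """Hypothetical evaluator: returns a reward or penalty for an action."""
    rewards = {"left": -1.0, "stay": 0.0, "right": 1.0}
    return rewards[action]


def learn_from_rewards(actions, trials_per_action=3):
    """Estimate each action's value purely from the rewards it produced."""
    totals = {a: 0.0 for a in actions}
    for _ in range(trials_per_action):
        for a in actions:
            # The system must try the action to find out how good it is.
            totals[a] += environment(a)
    values = {a: totals[a] / trials_per_action for a in actions}
    best = max(values, key=values.get)
    return values, best


values, best_action = learn_from_rewards(["left", "stay", "right"])
print(values, best_action)
```

The deterministic round-robin trial scheme is a simplification chosen for readability; a practical agent would balance exploration and exploitation.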

 1.7120   9.7461   3.1311   5.4209   1.0036
 0.7994   2.9233   2.3299   1.9586   0.4665
 0.0023   0.7899   0.7355   0.4364  -0.2287
-0.7664  -0.8488   0.0076  -0.1855  -0.9621
-0.9949  -1.3554  -1.0946  -1.4766  -2.0021

IPE
K = 100 (values indexed by grid cell i, j)
 1.4008   9.5698   3.1841   5.4309   0.8827
 0.6503   2.9231   1.9576   1.8581   0.3910
-0.0303   0.8137   0.7354   0.4787  -0.2830
-0.4062  -0.0118   0.0183  -0.1828  -0.7333
-0.6535  -0.4780  -0.4594  -0.5763  -0.9488
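
Assuming IPE stands for iterative policy evaluation, the sketch below shows how a 5x5 value table of this kind can be computed. The slides do not state the grid's dynamics, so everything here is an assumption borrowed from the usual textbook gridworld: a discount factor of 0.9, an equiprobable random policy, a penalty of -1 for bumping into a wall, and two special cells whose every action jumps to another cell with a bonus. The numbers it prints are therefore not expected to reproduce the table above.

```python
GAMMA = 0.9   # assumed discount factor
SIZE = 5      # 5x5 grid, like the tables on the slides
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
# Assumed special cells: any action jumps to the target cell and pays a bonus.
JUMPS = {(0, 1): ((4, 1), 10.0), (0, 3): ((2, 3), 5.0)}


def step(state, action):
    """Assumed deterministic model: return (next_state, reward)."""
    if state in JUMPS:
        return JUMPS[state]
    row, col = state[0] + action[0], state[1] + action[1]
    if 0 <= row < SIZE and 0 <= col < SIZE:
        return (row, col), 0.0
    return state, -1.0   # off the grid: stay in place, penalty of -1


def evaluate_random_policy(sweeps=100):
    """Iterative policy evaluation for the equiprobable random policy."""
    values = [[0.0] * SIZE for _ in range(SIZE)]
    for _ in range(sweeps):
        new_values = [[0.0] * SIZE for _ in range(SIZE)]
        for i in range(SIZE):
            for j in range(SIZE):
                for action in ACTIONS:
                    (ni, nj), reward = step((i, j), action)
                    # Each action has probability 1/4 under the random policy.
                    new_values[i][j] += 0.25 * (reward + GAMMA * values[ni][nj])
        values = new_values
    return values


values = evaluate_random_policy(sweeps=100)
for row in values:
    print("  ".join(f"{v:7.4f}" for v in row))
```

Each sweep replaces every cell's value with the expected one-step reward plus the discounted value of the cell reached, so the table settles toward the value function of the policy being evaluated.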

PI
State
Go Right  Jump   Go Left  Jump   Go Left
Go Up     Go Up  Go Left  Go Up  Go Left
Go Up     Go Up  Go Up    Go Up  Go Left
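
Assuming PI denotes policy improvement, a greedy policy can be read off a value table as sketched below. The value table is the one headed IPE on the slides, with trailing minus signs read as negative values. The dynamics are assumptions, since the slides do not state them: a discount factor of 0.9, a penalty of -1 for leaving the grid, and two jump cells in the top row.

```python
GAMMA = 0.9   # assumed discount factor
SIZE = 5
MOVES = {"Go Up": (-1, 0), "Go Down": (1, 0), "Go Left": (0, -1), "Go Right": (0, 1)}
# Assumed jump cells: every move leads to the target cell and pays a bonus.
JUMPS = {(0, 1): ((4, 1), 10.0), (0, 3): ((2, 3), 5.0)}

# Value table from the slides (trailing minus signs read as negatives).
VALUES = [
    [1.4008, 9.5698, 3.1841, 5.4309, 0.8827],
    [0.6503, 2.9231, 1.9576, 1.8581, 0.3910],
    [-0.0303, 0.8137, 0.7354, 0.4787, -0.2830],
    [-0.4062, -0.0118, 0.0183, -0.1828, -0.7333],
    [-0.6535, -0.4780, -0.4594, -0.5763, -0.9488],
]


def step(state, move):
    """Assumed deterministic model: return (next_state, reward)."""
    if state in JUMPS:
        return JUMPS[state]
    row, col = state[0] + move[0], state[1] + move[1]
    if 0 <= row < SIZE and 0 <= col < SIZE:
        return (row, col), 0.0
    return state, -1.0


def action_value(values, state, name):
    """One-step lookahead: reward plus discounted value of the next cell."""
    (ni, nj), reward = step(state, MOVES[name])
    return reward + GAMMA * values[ni][nj]


def greedy_policy(values):
    """Pick, in every cell, the move with the highest one-step lookahead."""
    policy = []
    for i in range(SIZE):
        row = []
        for j in range(SIZE):
            if (i, j) in JUMPS:
                row.append("Jump")   # all moves are equivalent in a jump cell
            else:
                row.append(max(MOVES, key=lambda n: action_value(values, (i, j), n)))
        policy.append(row)
    return policy


policy = greedy_policy(VALUES)
for row in policy:
    print("  ".join(f"{name:8s}" for name in row))
```

Under these assumptions the first row comes out as Go Right, Jump, Go Left, Jump, Go Left, as in the table above. Lower rows are not guaranteed to match, because the slides show neither the actual dynamics nor the values the policy was derived from.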

