# Reinforcement Learning: A Beginner’s Tutorial

This is the presentation version of a Reinforcement Learning tutorial for beginners that I worked on.

### Reinforcement Learning: A Beginner’s Tutorial

1. Reinforcement Learning: A Beginner’s Tutorial. By: Omar Enayet (Presentation Version)
2. The Problem
3. Agent-Environment Interface
4. Environment Model
5. Goals & Rewards
6. Returns
7. Credit-Assignment Problem
8. Markov Decision Process
    - An MDP is defined by the tuple $\langle S, A, p, r, \gamma \rangle$:
        - $S$: set of states of the environment
        - $A(s)$: set of actions possible in state $s$
        - $p(s' \mid s, a)$: probability of transition from $s$ to $s'$ when executing $a$
        - $r(s, a)$: expected reward when executing $a$ in $s$
        - $\gamma$: discount rate for expected reward
    - Assumption: discrete time $t = 0, 1, 2, \dots$
    - Trajectory: $\cdots \; s_t \xrightarrow{a_t} r_{t+1},\, s_{t+1} \xrightarrow{a_{t+1}} r_{t+2},\, s_{t+2} \xrightarrow{a_{t+2}} r_{t+3},\, s_{t+3} \xrightarrow{a_{t+3}} \cdots$
9. Value Functions
10. Value Functions
11. Value Functions
12. Optimal Value Functions
13. Exploration-Exploitation Problem
14. Policies
15. Elementary Solution Methods
16. Dynamic Programming
17. Perfect Model
18. Bootstrapping
19. Generalized Policy Iteration
20. Efficiency of DP
21. Monte-Carlo Methods
22. Episodic Return
23. Advantages over DP
    - No model
    - Simulation OR part of model
    - Focus on small subset of states
    - Less harmed by violations of the Markov property
26. First-Visit VS Every-Visit
27. On-Policy VS Off-Policy
28. Action-value instead of State-value
29. Temporal-Difference Learning
30. Advantages of TD Learning
31. SARSA (On-Policy)
32. Q-Learning (Off-Policy)
33.
34. Actor-Critic Methods (On-Policy)
35. R-Learning (Off-Policy)
    - Average expected reward per time-step
36. Eligibility Traces
37.
38.
39. References
    - Richard S. Sutton and Andrew G. Barto. Reinforcement Learning. Bradford Books, 1998.
    - Richard Crouch, Peter Bennett, Stephen Bridges, Nick Piper and Robert Oates. Monte Carlo. 2003.
    - Slides for reading with: Omar Enayet. Reinforcement Learning: A Beginner’s Tutorial. 2009.
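
The MDP definition on the Markov Decision Process slide (states, actions per state, transition probabilities, expected rewards, discount rate, discrete time steps) can be sketched in Python as below. This is only an illustrative sketch and is not taken from the slides: the `MDP` class, the `step`, `rollout` and `discounted_return` helpers, and the two-state `toy` problem with its numbers are all hypothetical names and values chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    """Hypothetical container for the tuple <S, A, p, r, gamma>."""
    states: list   # S: states of the environment
    actions: dict  # A(s): maps each state to its possible actions
    p: dict        # p[(s, a)] maps next states to transition probabilities
    r: dict        # r[(s, a)] is the expected reward for executing a in s
    gamma: float   # discount rate

    def step(self, s, a, rng):
        """Sample s_{t+1} from p(. | s, a); return (r_{t+1}, s_{t+1})."""
        next_states = list(self.p[(s, a)].keys())
        weights = list(self.p[(s, a)].values())
        s_next = rng.choices(next_states, weights=weights, k=1)[0]
        return self.r[(s, a)], s_next


def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_{t+k+1} over a finite reward sequence."""
    g = 0.0
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


def rollout(mdp, s, policy, steps, seed=0):
    """Follow `policy` for `steps` discrete time steps; collect rewards."""
    rng = random.Random(seed)
    rewards = []
    for _ in range(steps):
        a = policy(s)
        reward, s = mdp.step(s, a, rng)
        rewards.append(reward)
    return rewards


# Assumed toy problem (invented for illustration only).
toy = MDP(
    states=["low", "high"],
    actions={"low": ["wait", "charge"], "high": ["wait", "search"]},
    p={
        ("low", "wait"): {"low": 1.0},
        ("low", "charge"): {"high": 1.0},
        ("high", "wait"): {"high": 1.0},
        ("high", "search"): {"high": 0.5, "low": 0.5},
    },
    r={
        ("low", "wait"): 0.0,
        ("low", "charge"): 0.0,
        ("high", "wait"): 1.0,
        ("high", "search"): 2.0,
    },
    gamma=0.9,
)


def cautious_policy(s):
    """Charge when low, otherwise wait (a fixed, deterministic policy)."""
    return "charge" if s == "low" else "wait"


rewards = rollout(toy, "low", cautious_policy, steps=3)
print(rewards, discounted_return(rewards, toy.gamma))
```

The dictionary-based representation is only practical for small, discrete problems; it is used here because it maps one-to-one onto the symbols in the definition.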