
Nov. 03, 2009

This is the presentation version of a Reinforcement Learning tutorial for beginners which I worked on.


- 1. Reinforcement Learning: A Beginner's Tutorial. By: Omar Enayet (Presentation Version)
- 2. The Problem
- 3. Agent-Environment Interface
- 4. Environment Model
- 5. Goals & Rewards
- 6. Returns
- 7. Credit-Assignment Problem
- 8. Markov Decision Process
  An MDP is defined by <S, A, p, r, γ>:
  - S: set of states of the environment
  - A(s): set of actions possible in state s
  - p(s' | s, a): probability of transition from s to s' when executing a
  - r(s, a): expected reward when executing a in s
  - γ: discount rate for expected reward
  Assumption: discrete time t = 0, 1, 2, . . .
  Trajectory: s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, . . .
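The <S, A, p, r, γ> tuple above can be sketched as a small data structure. This is a minimal toy example, not from the slides: the state names, actions, probabilities, and rewards are all illustrative assumptions.

```python
import random

# A toy two-state MDP matching the <S, A, p, r, gamma> definition.
S = ["s0", "s1"]
A = {"s0": ["stay", "go"], "s1": ["stay", "go"]}

# p[(s, a)] -> list of (next_state, probability) pairs
p = {
    ("s0", "stay"): [("s0", 1.0)],
    ("s0", "go"):   [("s1", 0.9), ("s0", 0.1)],
    ("s1", "stay"): [("s1", 1.0)],
    ("s1", "go"):   [("s0", 1.0)],
}

# r[(s, a)] -> expected immediate reward for executing a in s
r = {("s0", "stay"): 0.0, ("s0", "go"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 0.0}

gamma = 0.9  # discount rate

def step(s, a):
    """One tick of discrete time t -> t+1: sample s_{t+1}, return r_{t+1}."""
    states, probs = zip(*p[(s, a)])
    s_next = random.choices(states, weights=probs)[0]
    return s_next, r[(s, a)]
```

Sampling `step(s, a)` repeatedly produces exactly the trajectory s_t, a_t, r_{t+1}, s_{t+1}, . . . shown on the slide.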
- 9. Value Functions
- 10. Value Functions
- 11. Value Functions
- 12. Optimal Value Functions
- 13. Exploration-Exploitation Problem
- 14. Policies
- 15. Elementary Solution Methods
- 16. Dynamic Programming
- 17. Perfect Model
- 18. Bootstrapping
- 19. Generalized Policy Iteration
- 20. Efficiency of DP
- 21. Monte-Carlo Methods
- 22. Episodic Return
- 23. Advantages over DP
  - No model required
  - A simulation, or only part of a model, suffices
  - Can focus on a small subset of states
  - Less harmed by violations of the Markov property
- First-Visit vs. Every-Visit
- 27. On-Policy vs. Off-Policy
- 28. Action-Value instead of State-Value
- 29. Temporal-Difference Learning
- 30. Advantages of TD Learning
- 31. SARSA (On-Policy)
- 32. Q-Learning (Off-Policy)
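The two slides above contrast the on-policy and off-policy tabular updates; a side-by-side sketch makes the difference concrete. The function names and the dict representation of Q are my own illustrative choices, not from the slides.

```python
def sarsa_update(Q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    # SARSA (on-policy): the target uses a_next, the action actually
    # chosen by the current behaviour policy (e.g. epsilon-greedy).
    target = reward + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    # Q-learning (off-policy): the target uses the greedy (max) action
    # in s_next, regardless of what the behaviour policy will do next.
    target = reward + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

The only difference is the bootstrap target: SARSA follows the sampled action, Q-learning the greedy one, which is what makes the latter off-policy.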
- 33. (image-only slide, no transcript text)
- 34. Actor-Critic Methods (On-Policy)
- 35. R-Learning (Off-Policy): average expected reward per time-step
- 36. Eligibility Traces
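Eligibility traces assign credit to recently visited states rather than only the last one. A minimal sketch of TD(λ) state-value prediction with accumulating traces, under my own assumptions about the episode format (a list of (s, reward, s_next) triples):

```python
def td_lambda_episode(V, transitions, alpha=0.1, gamma=0.9, lam=0.8):
    """Update state-values V in place over one episode of transitions."""
    e = {s: 0.0 for s in V}  # eligibility trace per state
    for s, reward, s_next in transitions:
        delta = reward + gamma * V[s_next] - V[s]  # one-step TD error
        e[s] += 1.0                                # accumulating trace
        for x in V:
            V[x] += alpha * delta * e[x]           # credit all traced states
            e[x] *= gamma * lam                    # decay every trace
    return V
```

With λ = 0 this reduces to plain one-step TD(0); with λ = 1 it approaches a Monte-Carlo update, bridging the two method families covered earlier in the deck.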
- 37. (image-only slide, no transcript text)
- 38. (image-only slide, no transcript text)
- 39. References
  - Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Bradford Books, 1998.
  - Richard Crouch, Peter Bennett, Stephen Bridges, Nick Piper and Robert Oates. Monte Carlo. 2003.
  - Slides for reading with: Omar Enayet. Reinforcement Learning: A Beginner's Tutorial. 2009.
