
This is a presentation of a Reinforcement Learning tutorial for beginners which I worked on.


- 1. Reinforcement Learning: A Beginner's Tutorial. By: Omar Enayet (Presentation Version)
- 2. The Problem
- 3. Agent-Environment Interface
- 4. Environment Model
- 5. Goals & Rewards
- 6. Returns
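The return on slide 6 is the discounted sum of future rewards, G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... A minimal sketch in Python (the reward sequence is invented for illustration):

```python
def discounted_return(rewards, gamma):
    """Discounted return G_t for a finite reward sequence."""
    g = 0.0
    # Accumulate backwards: G_t = r_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# 1 + 0.9*0 + 0.81*2 = 2.62 (up to float rounding)
print(discounted_return([1.0, 0.0, 2.0], 0.9))
```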
- 7. Credit-Assignment Problem
- 8. Markov Decision Process. An MDP is defined by <S, A, p, r, γ>: S - set of states of the environment; A(s) - set of actions possible in state s; p(s' | s, a) - probability of transition from s to s' when executing a; r(s, a) - expected reward when executing a in s; γ - discount rate for expected reward. Assumption: discrete time t = 0, 1, 2, ... The interaction forms the trajectory s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, ...
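The <S, A, p, r, γ> tuple of slide 8 can be encoded as a small tabular structure. Everything below (states, actions, probabilities, rewards) is invented for illustration:

```python
# Minimal tabular MDP sketch matching the <S, A, p, r, gamma> definition.
mdp = {
    "states": ["s0", "s1"],
    "actions": {"s0": ["a0", "a1"], "s1": ["a0"]},
    # p[(s, a)] -> list of (next_state, probability)
    "p": {
        ("s0", "a0"): [("s0", 0.7), ("s1", 0.3)],
        ("s0", "a1"): [("s1", 1.0)],
        ("s1", "a0"): [("s0", 1.0)],
    },
    # r[(s, a)] -> expected immediate reward
    "r": {("s0", "a0"): 0.0, ("s0", "a1"): 1.0, ("s1", "a0"): 5.0},
    "gamma": 0.9,
}

# Sanity check: transition probabilities out of each (s, a) must sum to 1
for sa, outcomes in mdp["p"].items():
    assert abs(sum(p for _, p in outcomes) - 1.0) < 1e-9
```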
- 9. Value Functions
- 10. Value Functions
- 11. Value Functions
- 12. Optimal Value Functions
- 13. Exploration-Exploitation Problem
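A standard way to handle the exploration-exploitation trade-off of slide 13 is the ε-greedy rule (not necessarily the one used in the slides): with probability ε pick a random action, otherwise the greedy one. The action values below are made up:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """q_values: dict mapping action -> estimated value."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))   # explore: random action
    return max(q_values, key=q_values.get)  # exploit: greedy action

q = {"left": 0.2, "right": 0.8}
print(epsilon_greedy(q, epsilon=0.0))  # epsilon=0 always exploits -> 'right'
```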
- 14. Policies
- 15. Elementary Solution Methods
- 16. Dynamic Programming
- 17. Perfect Model
- 18. Bootstrapping
- 19. Generalized Policy Iteration
- 20. Efficiency of DP
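The DP ideas on slides 16-20 (a perfect model, bootstrapping, sweeping until convergence) can be illustrated with a value-iteration sweep on a tiny hand-made MDP; all states, actions, and numbers below are invented:

```python
# Value iteration with a perfect model: repeatedly back up
# V(s) = max_a sum_{s'} p * (r + gamma * V(s')).
P = {  # P[(s, a)] -> list of (prob, next_state, reward)
    (0, "stay"): [(1.0, 0, 0.0)],
    (0, "go"):   [(1.0, 1, 1.0)],
    (1, "stay"): [(1.0, 1, 2.0)],
    (1, "go"):   [(1.0, 0, 0.0)],
}
states, actions, gamma = [0, 1], ["stay", "go"], 0.9

V = {s: 0.0 for s in states}
for _ in range(1000):  # sweep until (approximately) converged
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
            for a in actions
        )
        for s in states
    }

# State 1 can earn reward 2 forever: V(1) -> 2 / (1 - 0.9) = 20
print(round(V[1], 2))
```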
- 21. Monte-Carlo Methods
- 22. Episodic Return
- 23. Advantages over DP: no model required
- 24. A simulation, or only part of a model, suffices
- 25. Can focus on a small subset of states
- 26. Less harmed by violations of the Markov property
- First-Visit vs. Every-Visit
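First-visit Monte-Carlo evaluation averages the return following only the first occurrence of each state in an episode (every-visit averages over all occurrences). A sketch with made-up episodes:

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma):
    """episodes: list of [(state, reward), ...]; returns V estimates."""
    returns = defaultdict(list)
    for episode in episodes:
        first_visit = {}  # state -> index of its first occurrence
        for i, (s, _) in enumerate(episode):
            first_visit.setdefault(s, i)
        # Accumulate returns backwards so gs[i] is the return from step i
        gs, g = [0.0] * len(episode), 0.0
        for i in range(len(episode) - 1, -1, -1):
            g = episode[i][1] + gamma * g
            gs[i] = g
        for s, i in first_visit.items():
            returns[s].append(gs[i])
    return {s: sum(v) / len(v) for s, v in returns.items()}

eps = [[("A", 1.0), ("B", 0.0)], [("A", 0.0), ("A", 2.0)]]
V = first_visit_mc(eps, gamma=1.0)
# Episode 1: G from A's first visit = 1.0; episode 2: G = 0 + 2 = 2.0
print(V["A"])  # mean of [1.0, 2.0] -> 1.5
```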
- 27. On-Policy vs. Off-Policy
- 28. Action-Value Instead of State-Value
- 29. Temporal-Difference Learning
- 30. Advantages of TD Learning
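TD(0) prediction, the simplest TD method, moves V(s) toward the bootstrapped target r + γV(s') after every step rather than waiting for the episode to end. The transition and values below are invented:

```python
def td0_update(V, s, r, s_next, alpha, gamma):
    """One TD(0) step: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

V = {"A": 0.0, "B": 1.0}
td0_update(V, "A", 0.5, "B", alpha=0.1, gamma=0.9)
# V(A) = 0.0 + 0.1 * (0.5 + 0.9*1.0 - 0.0) = 0.14 (up to float rounding)
```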
- 31. SARSA (On-Policy)
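The SARSA update bootstraps from the action actually taken in the next state (hence State-Action-Reward-State-Action), which is what makes it on-policy. A sketch with invented values:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

Q = {("s0", "go"): 0.0, ("s1", "stay"): 2.0}
sarsa_update(Q, "s0", "go", 1.0, "s1", "stay", alpha=0.5, gamma=0.9)
# Q(s0,go) = 0 + 0.5 * (1.0 + 0.9*2.0 - 0) = 1.4
```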
- 32. Q-Learning (Off-Policy)
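Q-learning instead bootstraps from the greedy next action, independent of which action the behavior policy actually takes, which makes it off-policy. Invented values again:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

actions = ["stay", "go"]
Q = {("s0", "stay"): 0.0, ("s0", "go"): 0.0,
     ("s1", "stay"): 2.0, ("s1", "go"): 5.0}
q_learning_update(Q, "s0", "go", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
# Target uses max(2.0, 5.0) = 5.0: Q(s0,go) = 0.5 * (1.0 + 0.9*5.0) = 2.75
```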
- 34. Actor-Critic Methods (On-Policy)
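One common actor-critic scheme (a sketch, not necessarily the exact variant in the slides) has the critic's TD error drive both the value update and the actor's action preferences; names and values below are illustrative:

```python
def actor_critic_step(V, prefs, s, a, r, s_next, alpha_v, alpha_p, gamma):
    delta = r + gamma * V[s_next] - V[s]  # critic's TD error
    V[s] += alpha_v * delta               # critic update
    # Actor update: raise the preference for a in s if delta > 0
    prefs[(s, a)] = prefs.get((s, a), 0.0) + alpha_p * delta
    return delta

V, prefs = {"s0": 0.0, "s1": 1.0}, {}
delta = actor_critic_step(V, prefs, "s0", "go", 0.5, "s1",
                          alpha_v=0.1, alpha_p=0.2, gamma=0.9)
# delta = 0.5 + 0.9*1.0 - 0.0 = 1.4; V(s0) -> 0.14, pref(s0,go) -> 0.28
```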
- 35. R-Learning (Off-Policy): average expected reward per time-step
- 36. Eligibility Traces
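With accumulating eligibility traces (as in TD(λ)), every recently visited state keeps a trace that decays by γλ each step, and each TD error updates all states in proportion to their trace. A sketch with invented states and numbers:

```python
def td_lambda_step(V, e, s, r, s_next, alpha, gamma, lam):
    delta = r + gamma * V[s_next] - V[s]  # TD error for this step
    e[s] = e.get(s, 0.0) + 1.0            # accumulate trace for current state
    for state in list(e):
        V[state] += alpha * delta * e[state]  # credit all traced states
        e[state] *= gamma * lam               # decay every trace

V, e = {"A": 0.0, "B": 0.0, "C": 1.0}, {}
td_lambda_step(V, e, "A", 0.0, "B", alpha=0.1, gamma=0.9, lam=0.8)
td_lambda_step(V, e, "B", 1.0, "C", alpha=0.1, gamma=0.9, lam=0.8)
# Second step's TD error (1.9) also credits A through its lingering
# trace (0.72): V(A) -> 0.1*1.9*0.72 = 0.1368, V(B) -> 0.1*1.9 = 0.19
```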
- 39. References: Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Bradford Books, 1998. Richard Crouch, Peter Bennett, Stephen Bridges, Nick Piper and Robert Oates, Monte Carlo, 2003. Slides for reading with: Omar Enayet, Reinforcement Learning: A Beginner's Tutorial, 2009.
