
Reinforcement Learning: A Beginner's Tutorial


This is a presentation of a Reinforcement Learning tutorial for beginners that I worked on.


Reinforcement Learning: A Beginner's Tutorial

  1. Reinforcement Learning: A Beginner's Tutorial. By: Omar Enayet (Presentation Version)
  2. The Problem
  3. Agent-Environment Interface
  4. Environment Model
  5. Goals & Rewards
  6. Returns
  7. Credit-Assignment Problem
  8. Markov Decision Process
     An MDP is defined by <S, A, p, r, γ>
     S - set of states of the environment
     A(s) - set of actions possible in state s
     p - probability of transitioning from s to s' when executing a
     r - expected reward when executing a in s
     γ - discount rate for expected reward
     Assumption: discrete time t = 0, 1, 2, . . .
     Trajectory: s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, . . .
     (a worked restatement of this notation follows the transcript)
  9. Value Functions
  10. Value Functions
  11. Value Functions
  12. Optimal Value Functions (value-function definitions restated after the transcript)
  13. Exploration-Exploitation Problem
  14. Policies
  15. Elementary Solution Methods
  16. Dynamic Programming
  17. Perfect Model
  18. Bootstrapping
  19. Generalized Policy Iteration
  20. Efficiency of DP
  21. Monte-Carlo Methods
  22. Episodic Return
  23. Advantages over DP: no model needed
  24. A simulation, or only part of a model, suffices
  25. Focus on a small subset of states
  26. Less harmed by violations of the Markov property
  First-Visit vs. Every-Visit (a first-visit MC sketch follows the transcript)
  27. On-Policy vs. Off-Policy
  28. Action-Value instead of State-Value
  29. Temporal-Difference Learning
  30. Advantages of TD Learning
  31. SARSA (On-Policy)
  32. Q-Learning (Off-Policy) (update-rule sketches for SARSA and Q-Learning follow the transcript)
  33.
  34. Actor-Critic Methods (On-Policy)
  35. R-Learning (Off-Policy): average expected reward per time-step (update sketch after the transcript)
  36. Eligibility Traces (TD(λ) sketch after the transcript)
  37.
  38.
  39. References
      - Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Bradford Books, 1998.
      - Richard Crouch, Peter Bennett, Stephen Bridges, Nick Piper and Robert Oates. Monte Carlo. 2003.
      Slides to be read alongside:
      - Omar Enayet. Reinforcement Learning: A Beginner's Tutorial. 2009.
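
Slide 8's notation, restated in the standard form from Sutton and Barto (1998); the explicit conditional definitions of p and r and the discounted return G_t below are textbook material rather than text taken from the slides:

    % Transition probability, expected reward, and discounted return for an MDP <S, A, p, r, gamma>
    p(s' \mid s, a) = \Pr\{\, s_{t+1} = s' \mid s_t = s,\, a_t = a \,\}
    r(s, a) = \mathbb{E}[\, r_{t+1} \mid s_t = s,\, a_t = a \,]
    G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots
        = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad 0 \le \gamma \le 1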
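
For slides 9-12, the state-value and action-value functions of a policy π and the Bellman optimality equation, again in the textbook form rather than copied from the slides (the slides' own notation is not visible in the transcript):

    % Value functions under a policy pi, and the Bellman optimality equation for V*
    V^{\pi}(s) = \mathbb{E}_{\pi}[\, G_t \mid s_t = s \,]
    Q^{\pi}(s, a) = \mathbb{E}_{\pi}[\, G_t \mid s_t = s,\, a_t = a \,]
    V^{*}(s) = \max_{a \in A(s)} \Big( r(s, a) + \gamma \sum_{s'} p(s' \mid s, a)\, V^{*}(s') \Big)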
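
For slides 21-26 (Monte-Carlo methods, episodic return, first-visit vs. every-visit), a minimal first-visit Monte Carlo prediction sketch in Python. The episode format (a list of (state, reward) pairs, where the reward is the one received after leaving the state) and the function name are illustrative assumptions, not the slides' interface:

    from collections import defaultdict

    def first_visit_mc_prediction(episodes, gamma=0.9):
        """Estimate V(s) as the average return following the first visit to s in each episode."""
        returns_sum = defaultdict(float)
        returns_count = defaultdict(int)
        V = {}
        for episode in episodes:
            G = 0.0
            first_visit_return = {}
            # Walk the episode backwards, accumulating the discounted return.
            for state, reward in reversed(episode):
                G = reward + gamma * G
                # Overwriting while moving backwards leaves the return of the FIRST visit.
                first_visit_return[state] = G
            for state, G_first in first_visit_return.items():
                returns_sum[state] += G_first
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
        return V

An every-visit variant would average the returns from all occurrences of a state instead of keeping only the first one.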
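
The tabular update rules behind slides 31 and 32, written as small Python helpers. Q is assumed to be a dict keyed by (state, action); the step size alpha, discount gamma, and the explicit action list passed to the Q-Learning update are illustrative choices, since the slides do not show an interface:

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
        """On-policy TD update: bootstrap on the action actually taken in the next state."""
        td_target = r + gamma * Q.get((s_next, a_next), 0.0)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))

    def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        """Off-policy TD update: bootstrap on the greedy action in the next state."""
        best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
        td_target = r + gamma * best_next
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))

Both rules move Q(s, a) toward r + γ times a bootstrapped next-state value; the only difference is whether that value comes from the behaviour policy's actual next action (SARSA, on-policy) or from the greedy action (Q-Learning, off-policy).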
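
Slide 35's "average expected reward per time-step" is R-Learning's optimality criterion. A sketch of the usual update rules (Schwartz's R-Learning as presented in Sutton and Barto), stated here from the textbook rather than from the slides, with α and β as step sizes and ρ as the running estimate of the average reward:

    % R-Learning: relative action values plus a running average-reward estimate rho
    \delta = r - \rho + \max_{a'} Q(s', a') - Q(s, a)
    Q(s, a) \leftarrow Q(s, a) + \alpha\, \delta
    \text{if } a \text{ was greedy in } s: \quad \rho \leftarrow \rho + \beta\, \delta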
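
For slide 36, the accumulating-trace form of TD(λ) for state values, again the textbook formulation rather than the slides' own:

    % TD(lambda) with accumulating eligibility traces
    \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)
    e_t(s) = \gamma \lambda\, e_{t-1}(s) + \mathbf{1}[s = s_t]
    V(s) \leftarrow V(s) + \alpha\, \delta_t\, e_t(s) \quad \text{for every state } s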
