Lecture 21: Reinforcement Learning


  1. Introduction to Machine Learning, Lecture 21: Reinforcement Learning. Albert Orriols i Puig (http://www.albertorriols.net, aorriols@salle.url.edu). Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull.
  2. Recap of Lectures 5-18. Supervised learning: data classification; labeled data; build a model that covers all the space. Unsupervised learning: clustering (unlabeled data; group similar objects) and association rule analysis (unlabeled data; get the most frequent/important associations). Genetic fuzzy systems.
  3. Today's Agenda: introduction; reinforcement learning; some examples before going further.
  4. Introduction. What does reinforcement learning aim at? Learning from interaction with an environment; goal-directed learning (the agent perceives a state, takes an action on the environment, and pursues a goal); learning what to do and its effect; trial-and-error search and delayed reward.
  5. Introduction. Learn reactive behaviors: a behavior is a mapping between perceptions and actions. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. Dilemma: neither exploitation nor exploration can be pursued exclusively without failing at the task.
  6. How Can We Learn It? Four candidate representations: (1) look-up tables mapping each perceived state to an action (State 1 → Action 1, State 2 → Action 2, State 3 → Action 3, ...); (2) neural networks; (3) rules; (4) finite automata.
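The look-up-table representation above can be sketched as a plain dictionary from states to actions. This is a minimal illustration; the state and action names are hypothetical placeholders, not from the lecture:

```python
# A look-up-table policy: one action stored per perceived state.
# State and action names are made-up placeholders.
policy = {
    "state_1": "action_1",
    "state_2": "action_2",
    "state_3": "action_3",
}

def act(state):
    """Return the action the table prescribes for the given state."""
    return policy[state]

print(act("state_2"))  # action_2
```

The other three representations (neural networks, rules, finite automata) differ only in how this state-to-action mapping is stored and generalized.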
  7. Reinforcement Learning
  8. Reinforcement Learning. Reward function: r: S → R, or r: S × A → R. The agent and the environment interact at discrete time steps t = 0, 1, 2, ... At step t the agent observes state st ∈ S, produces action at ∈ A(st), gets the resulting reward rt+1 ∈ R, and goes to the next state st+1.
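The discrete-time interaction loop on this slide can be sketched as follows. This is a minimal sketch under stated assumptions: `toy_env_step` is a made-up stand-in environment, not anything from the lecture:

```python
import random

def toy_env_step(state, action):
    """Hypothetical environment: maps (s_t, a_t) to (s_{t+1}, r_{t+1})."""
    next_state = (state + action) % 5
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, reward

state = 0            # the agent observes the initial state s_0
total_reward = 0.0
for t in range(10):
    action = random.choice([0, 1, 2])            # a_t chosen from A(s_t)
    state, reward = toy_env_step(state, action)  # environment returns r_{t+1}, s_{t+1}
    total_reward += reward                       # agent accumulates reward over the trial
```

The loop body is the whole agent-environment protocol: observe, act, receive reward, move on.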
  9. Reinforcement Learning. Trace of a trial: st, at, rt+1, st+1, at+1, rt+2, st+2, at+2, rt+3, st+3, at+3, ... Agent goal: maximize the total amount of reward it receives. That means maximizing not only the immediate reward, but the cumulative reward in the long run.
  10. Example of RL: the recycling robot. State: charge level of the battery. Actions: look for cans, wait for a can, go recharge. Reward: positive for finding cans, negative for running out of battery.
  11. More precisely... Restricting to Markov Decision Processes (MDPs): a finite set of situations (states), a finite set of actions, transition probabilities, and reward probabilities. This means that the agent needs to have complete information about the world, and that state st+1 depends only on state st and action at.
  12. Recycling Robot Example (transition diagram). From high, search stays in high with probability α and moves to low with probability 1 − α, both with reward R^search; wait stays in high with probability 1 and reward R^wait. From low, search stays in low with probability β and reward R^search, and with probability 1 − β the battery runs out and the robot is carried back to high with reward −3; wait stays in low with probability 1 and reward R^wait; recharge moves to high with probability 1 and reward 0.
  13. Recycling Robot Example. S = {high, low}; A(high) = {wait, search}; A(low) = {wait, search, recharge}. R^search: expected number of cans while searching; R^wait: expected number of cans while waiting; R^search > R^wait.
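The recycling-robot MDP of the last two slides can be written down directly as a transition table mapping (state, action) to a list of (probability, next state, reward) outcomes. The numeric values of α, β, R^search, and R^wait below are arbitrary illustrative choices, not from the lecture:

```python
alpha, beta = 0.9, 0.6        # arbitrary illustrative probabilities
R_search, R_wait = 2.0, 1.0   # expected cans per step; R_search > R_wait as on the slide

# T[(state, action)] = [(probability, next_state, reward), ...]
T = {
    ("high", "search"):  [(alpha, "high", R_search), (1 - alpha, "low", R_search)],
    ("high", "wait"):    [(1.0, "high", R_wait)],
    ("low", "search"):   [(beta, "low", R_search), (1 - beta, "high", -3.0)],
    ("low", "wait"):     [(1.0, "low", R_wait)],
    ("low", "recharge"): [(1.0, "high", 0.0)],
}

# Sanity check: outgoing probabilities from every (state, action) sum to 1.
for sa, outcomes in T.items():
    assert abs(sum(p for p, _, _ in outcomes) - 1.0) < 1e-9, sa
```

This table is exactly the "complete information of the world" that the MDP assumption demands; the learning problem exists because the agent does not get to see it.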
  14. Breaking the Markov Property. Some problems do not satisfy the MDP assumptions. When the actions and states are not finite, a solution is to discretize the sets of actions and states. When the transition probabilities do not depend only on the current state, a possible solution is to represent states as structures built up over time from sequences of sensations; this is a POMDP (partially observable MDP), and POMDP algorithms can be used to solve such problems.
  15. Elements of Reinforcement Learning
  16. Elements of RL. Policy: what to do. Reward: what's good. Value: what's good because it predicts reward. Model: what follows what.
  17. Components of an RL Agent. Policy (behavior): a mapping from states to actions, π*: S → A. Reward: the local reward at time t is rt. Model: T(s, a, s') is the probability of transitioning from state s to s' by executing action a; the transition probabilities depend only on these parameters, and they are not known by the agent.
  18. Components of an RL Agent. Value functions: Vπ(s) is the long-term reward estimate from state s following policy π; Qπ(s, a) is the long-term reward estimate from state s executing action a and then following policy π. A simple example: a maze. Note that the agent does not know its own position; it can only perceive what it has in the surrounding states.
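One common concrete form of Qπ(s, a) is a table of long-term reward estimates; given such a table, the greedy policy just picks the highest-valued action in each state. The Q-values below are made-up numbers for the recycling robot, purely for illustration:

```python
# Hypothetical Q-value estimates (long-term reward) for the recycling robot.
Q = {
    ("high", "search"): 8.5,
    ("high", "wait"): 6.0,
    ("low", "search"): 3.0,
    ("low", "wait"): 4.5,
    ("low", "recharge"): 7.0,
}

def greedy_action(state):
    """Pick the action with the highest estimated long-term reward in this state."""
    candidates = [(q, a) for (s, a), q in Q.items() if s == state]
    return max(candidates)[1]

print(greedy_action("high"))  # search
print(greedy_action("low"))   # recharge
```

Note how the greedy choice in "low" is recharge, not search, even though searching pays more immediately: the Q-values already fold in the risk of the −3 rescue penalty.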
  20. Pursuing the Goal: Maximize Long-Term Reward
  21. Goals and Rewards. OK, but I need to maximize my long-term reward; how do I get it? The long-term reward is defined in terms of the goal of the agent, while the agent only receives a local reward at each time step. How? Intuitive idea: sum all the rewards obtained so far. Problem: the sum can grow without bound in non-ending tasks.
  22. Goals and Rewards. How can we deal with non-ending tasks? Use a weighted (discounted) sum of the local rewards: Rt = rt+1 + γ rt+2 + γ^2 rt+3 + ..., where the parameter γ (0 < γ < 1) is the discounting factor. Note the bias toward immediate rewards; if you want to avoid it, set γ close to 1.
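The discounted sum above is easy to compute for a finite trace of rewards (a minimal sketch):

```python
def discounted_return(rewards, gamma):
    """R_t = r_{t+1} + gamma*r_{t+2} + gamma^2*r_{t+3} + ..."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# With gamma = 0.5 the agent is fairly myopic: later rewards count for much less.
print(discounted_return([1, 1, 1], 0.5))  # 1 + 0.5 + 0.25 = 1.75
```

Because γ < 1, the infinite sum stays bounded even in a non-ending task (for rewards bounded by r_max, Rt ≤ r_max / (1 − γ)), which is exactly what the plain running sum failed to guarantee.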
  23. Some Examples
  24. Pole Balancing. Balance the pole; the cart can move forward and backward. Avoid failure: the pole falling beyond a certain critical angle, or the cart hitting the end of the track. Reward: −1 upon failure, so with discounting a failure k steps away contributes −γ^k to the return.
  25. Mountain Car Problem. Objective: get to the top of the hill as quickly as possible. State definition: car position and speed. Actions: forward, reverse, none. Reward: −1 for each step that the car is not on the top of the hill, so the return is minus the number of steps taken before reaching the top.
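The −1-per-step reward illustrates how a goal ("reach the top fastest") is encoded purely through rewards: the undiscounted return of an episode is just minus its length, so maximizing return minimizes episode length. A one-line check of that identity:

```python
def episode_return(num_steps):
    """Each non-goal step yields -1, so an episode's return is -num_steps."""
    return sum(-1 for _ in range(num_steps))

print(episode_return(120))  # -120: a 120-step episode scores -120
```

A shorter episode always scores strictly higher, which is why the agent is pushed toward the quickest climb.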
  26. Next Class: how to learn the policies.
  27. Introduction to Machine Learning, Lecture 21: Reinforcement Learning. Albert Orriols i Puig (http://www.albertorriols.net, aorriols@salle.url.edu). Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull.
