Reinforcement Learning
Russell and Norvig: ch 21
CMSC 671 – Fall 2005
Slides from Jean-Claude Latombe and Lise Getoor
Reinforcement Learning
Supervised (inductive) learning is the simplest and most studied type of learning
How can an agent learn behaviors when it doesn’t have a teacher to tell it how to perform?
• The agent has a task to perform
• It takes some actions in the world
• At some later point, it gets feedback telling it how well it did on performing the task
• The agent performs the same task over and over again
This problem is called reinforcement learning:
• The agent gets positive reinforcement for tasks done well
• The agent gets negative reinforcement for tasks done poorly
Reinforcement Learning (cont.)
The goal is to get the agent to act in the world so as to maximize its rewards
The agent has to figure out what it did that made it get the reward/punishment
• This is known as the credit assignment problem
Reinforcement learning approaches can be used to train computers to do many tasks
• backgammon and chess playing
• job shop scheduling
• controlling robot limbs
Reinforcement learning on the web
Nifty applets:
• for blackjack
• for robot motion
• for a pendulum controller
Formalization
Given:
• a state space S
• a set of actions a1, …, ak
• reward value at the end of each trial (may be positive or negative)
Output:
• a mapping from states to actions
Example: ALVINN (driving agent)
• state: configuration of the car
• learn a steering action for each state
Reactive Agent Algorithm
Accessible or observable state
Repeat:
• s ← sensed state
• If s is terminal then exit
• a ← choose action (given s)
• Perform a
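The reactive agent loop can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the `Corridor` world, the `run_reactive_agent` helper, and the `max_steps` safety cap are all assumptions made for the example.

```python
from typing import Callable, Hashable, List, Tuple

State = Hashable
Action = Hashable


def run_reactive_agent(
    sense: Callable[[], State],
    is_terminal: Callable[[State], bool],
    choose_action: Callable[[State], Action],
    perform: Callable[[Action], None],
    max_steps: int = 1000,
) -> List[Tuple[State, Action]]:
    """Repeat: sense s; exit if s is terminal; choose a given s; perform a."""
    trace = []
    for _ in range(max_steps):  # safety cap (assumption, not in the slide's loop)
        s = sense()
        if is_terminal(s):
            break
        a = choose_action(s)
        perform(a)
        trace.append((s, a))
    return trace


class Corridor:
    """Hypothetical toy world: positions 0..3, where position 3 is terminal."""

    def __init__(self) -> None:
        self.position = 0

    def sense(self) -> int:
        return self.position

    def is_terminal(self, s: int) -> bool:
        return s == 3

    def perform(self, action: str) -> None:
        step = 1 if action == "right" else -1
        self.position = max(0, min(3, self.position + step))


world = Corridor()
# The action choice here is a fixed rule; any function of the sensed state works.
trace = run_reactive_agent(
    world.sense, world.is_terminal, lambda s: "right", world.perform
)
print(trace)  # [(0, 'right'), (1, 'right'), (2, 'right')]
```

The agent keeps no memory between steps: each action depends only on the state it has just sensed, which is what makes it reactive.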

Policy (Reactive/Closed-Loop Strategy)
• A policy P is a complete mapping from states to actions
[Figure: grid world with columns 1-4 and rows 1-3; two cells are marked +1 and -1]
Reactive Agent Algorithm
Repeat:
• s ← sensed state
• If s is terminal then exit
• a ← P(s)
• Perform a
Approaches
Learn policy directly: function mapping from states to actions
Learn utility values for states (i.e., the value function)
Value Function
The agent knows what state it is in
The agent has a number of actions it can perform in each state.
Initially, it doesn't know the value of any of the states
If the outcome of performing an action at a state is deterministic, then the agent can update the utility value U() of states:
• U(oldstate) = reward + U(newstate)
The agent learns the utility values of states as it works its way through the state space
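As a rough sketch of the deterministic update U(oldstate) = reward + U(newstate), consider a hypothetical chain A -> B -> C -> goal in which only the last move pays a reward of 1. The state names, the reward table, and the `run_trial` helper are assumptions made for illustration.

```python
# Hypothetical deterministic chain: A -> B -> C -> goal
next_state = {"A": "B", "B": "C", "C": "goal"}
reward = {"A": 0.0, "B": 0.0, "C": 1.0}  # reward received when leaving each state

# Initially the agent does not know the value of any state.
U = {s: 0.0 for s in ("A", "B", "C", "goal")}


def run_trial(U):
    """Walk the chain once, updating each state as it is left."""
    s = "A"
    while s != "goal":
        t = next_state[s]
        U[s] = reward[s] + U[t]  # U(oldstate) = reward + U(newstate)
        s = t


history = []
for _ in range(3):
    run_trial(U)
    history.append(dict(U))

for trial, snapshot in enumerate(history, start=1):
    print(trial, snapshot)
# 1 {'A': 0.0, 'B': 0.0, 'C': 1.0, 'goal': 0.0}
# 2 {'A': 0.0, 'B': 1.0, 'C': 1.0, 'goal': 0.0}
# 3 {'A': 1.0, 'B': 1.0, 'C': 1.0, 'goal': 0.0}
```

In this sketch the reward moves back one state per trial, which is why the agent has to perform the same task repeatedly before early states acquire useful values.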
Exploration
The agent may occasionally choose to explore suboptimal moves in the hopes of finding better outcomes
• Only by visiting all the states frequently enough can we guarantee learning the true values of all the states
A discount factor is often introduced to prevent utility values from diverging and to promote the use of shorter (more efficient) sequences of actions to attain rewards
The update equation using a discount factor γ is:
• U(oldstate) = reward + γ * U(newstate)
Normally, γ is set between 0 and 1
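To see how the discount factor favors shorter action sequences, the update U(oldstate) = reward + γ * U(newstate) can be applied backwards along two hypothetical trajectories that both end with a reward of 1. The reward lists and the `discounted_return` helper are assumptions made for this sketch.

```python
def discounted_return(rewards, gamma):
    """Apply U(old) = reward + gamma * U(new) backwards along one trajectory."""
    u = 0.0
    for r in reversed(rewards):
        u = r + gamma * u
    return u


short_path = [0.0, 1.0]                 # reward arrives on the 2nd step
long_path = [0.0, 0.0, 0.0, 0.0, 1.0]   # same reward, but on the 5th step

gamma = 0.9
print(discounted_return(short_path, gamma))  # about 0.9
print(discounted_return(long_path, gamma))   # about 0.6561 (0.9 ** 4)

# Without discounting (gamma = 1) the two paths are indistinguishable.
print(discounted_return(short_path, 1.0) == discounted_return(long_path, 1.0))  # True
```

With γ below 1, the start of the short path is worth more than the start of the long one, so an agent maximizing utility prefers the quicker route.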
Q-Learning
Q-learning augments value iteration by maintaining an estimated utility value Q(s,a) for every action at every state
The utility of a state U(s), or Q(s), is simply the maximum Q value over all the possible actions at that state
Learns utilities of actions (not states)
• model-free learning
Q-Learning
foreach state s
    foreach action a
        Q(s,a) = 0
s = currentstate
do forever
    a = select an action
    do action a
    r = reward from doing a
    t = resulting state from doing a
    Q(s,a) = (1 - α) Q(s,a) + α (r + γ Q(t))
    s = t
The learning coefficient, α, determines how quickly our estimates are updated
Normally, α is set to a small positive constant less than 1
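A minimal tabular version of this loop might look like the sketch below. The four-state corridor, the reward of 1 for reaching the last state, the purely random action selection, and the use of episodes instead of "do forever" are all assumptions made for illustration; Q(t) is computed as the maximum of Q(t, a) over actions, as defined on the previous slide.

```python
import random

N_STATES = 4            # hypothetical corridor with states 0..3
TERMINAL = 3            # reaching state 3 ends an episode
ACTIONS = ("left", "right")


def step(s, a):
    """Deterministic toy dynamics: returns (reward, resulting state)."""
    t = min(s + 1, TERMINAL) if a == "right" else max(s - 1, 0)
    r = 1.0 if t == TERMINAL else 0.0
    return r, t


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # foreach state s, foreach action a: Q(s,a) = 0
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def q_max(s):
        # Q(s): the maximum Q value over all actions available in s
        return max(Q[(s, a)] for a in ACTIONS)

    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            a = rng.choice(ACTIONS)      # select an action (here: at random)
            r, t = step(s, a)            # do action a, observe r and t
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * q_max(t))
            s = t
    return Q


Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(TERMINAL)}
print(greedy)  # the greedy action in each non-terminal state
```

No transition model is consulted anywhere in the update: the agent only uses the observed reward and resulting state, which is what makes the method model-free.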
Selecting an Action
Simply choose action with highest (current) expected utility?
Problem: each action has two effects
• yields a reward (or penalty) on current sequence
• information is received and used in learning for future sequences
Trade-off: immediate good for long-term well-being
• stuck in a rut
• try a shortcut – you might get lost; you might learn a new, quicker route!
Exploration policy
Wacky approach (exploration): act randomly in hopes of eventually exploring entire environment
Greedy approach (exploitation): act to maximize utility using current estimate
Reasonable balance: act more wacky (exploratory) when agent has little idea of environment; more greedy when the model is close to correct
Example: n-armed bandits…
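One common way to realize such a balance is an epsilon-greedy rule whose exploration probability shrinks over time. The slides do not name this technique; the sketch below assumes it, along with a 1/sqrt(step) schedule, Gaussian rewards, and made-up arm means for a three-armed bandit.

```python
import random


def run_bandit(true_means, steps=5000, noise=0.1, seed=0):
    """Decaying epsilon-greedy on an n-armed bandit (illustrative assumptions)."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n   # current utility estimate for each arm
    counts = [0] * n        # number of pulls of each arm
    for step in range(1, steps + 1):
        epsilon = 1.0 / step ** 0.5          # assumed schedule: explore less over time
        if rng.random() < epsilon:
            arm = rng.randrange(n)           # "wacky": act randomly
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # greedy
        reward = rng.gauss(true_means[arm], noise)
        counts[arm] += 1
        # incremental running average of the rewards seen for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_bandit([0.1, 0.5, 0.9])
print(counts)      # pulls per arm; most should go to the last arm
print(estimates)   # learned estimates of each arm's mean reward
```

Early on the agent pulls arms almost at random; as its estimates improve it spends nearly all of its pulls on the arm that currently looks best.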
RL Summary
Active area of research
Approaches from both OR and AI
There are many more sophisticated algorithms that we have not discussed
Applicable to game-playing, robot controllers, others