SlideShare a Scribd company logo
1 of 15
Download to read offline
What is Reinforcement Learning?
Branch of machine learning concerned with taking
sequences of actions
Usually described in terms of agent interacting with a
previously unknown environment, trying to maximize
cumulative reward
Agent Environment
action
observation, reward
Formalized as partially observable Markov decision
process (POMDP)
Motor Control and Robotics
Robotics:
Observations: camera images, joint angles
Actions: joint torques
Rewards: stay balanced, navigate to target locations,
serve and protect humans
Business Operations
Inventory Management
Observations: current inventory levels
Actions: number of units of each item to purchase
Rewards: profit
In Other ML Problems
Classification with Hard Attention1
Observation: current image window
Action: where to look
Reward: +1 for correct classification
Sequential/structured prediction, e.g., machine
translation2
Observations: words in source language
Action: emit word in target language
Reward: sentence-level accuracy metric, e.g. BLEU score
1
V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu. “Recurrent models of visual attention”. NIPS. 2014.
2
H. Daum´e III, J. Langford, and D. Marcu. “Search-based structured prediction”. (2009); S. Ross,
G. J. Gordon, and D. Bagnell. “A Reduction of Imitation Learning and Structured Prediction to No-Regret Online
Learning.” AISTATS. 2011; M. Norouzi, S. Bengio, N. Jaitly, M. Schuster, Y. Wu, et al. “Reward augmented
maximum likelihood for neural structured prediction”. NIPS. 2016.
What Is Deep Reinforcement Learning?
Reinforcement learning using neural networks to approximate
functions
Policies (select next action)
Value functions (measure goodness of states or
state-action pairs)
Dynamics Models (predict next states and rewards)
How Does RL Relate to Other ML Problems?
Supervised learning:
Environment samples input-output pair (xt, yt) ∼ ρ
Agent predicts ˆyt = f (xt)
Agent receives loss (yt, ˆyt)
Environment asks agent a question, and then tells it the
right answer
How Does RL Relate to Other ML Problems?
Contextual bandits:
Environment samples input xt ∼ ρ
Agent takes action ˆyt = f (xt)
Agent receives cost ct ∼ P(ct | xt, ˆyt) where P is an
unknown probability distribution
Environment asks agent a question, and gives agent a
noisy score on its answer
Application: personalized recommendations
How Does RL Relate to Other ML Problems?
Reinforcement learning:
Environment samples input xt ∼ P(xt | xt−1, yt−1)
Environment is stateful: input depends on your previous
actions!
Agent takes action ˆyt = f (xt)
Agent receives cost ct ∼ P(ct | xt, ˆyt) where P a
probability distribution unknown to the agent.
How Does RL Relate to Other Machine Learning
Problems?
Summary of differences between RL and supervised learning:
You don’t have full access to the function you’re trying to
optimize—must query it through interaction.
Interacting with a stateful world: input xt depend on your
previous actions
1990s, beginnings
K. S. Narendra and K. Parthasarathy. “Identification and control of dynamical systems using neural networks”.
IEEE Transactions on neural networks (1990) W. T. Miller, P. J. Werbos, and R. S. Sutton. Neural networks for
control. 1991
1990s, beginnings
L.-J. Lin. Reinforcement learning for robots using neural networks. Tech. rep. DTIC Document, 1993
1990s, beginnings
G. Tesauro. “Temporal difference learning and TD-Gammon”. Communications of the ACM (1995). Figures
from R. S. Sutton and A. G. Barto. Introduction to reinforcement learning. MIT Press, 1998
Recent Success Stories in Deep RL
ATARI using deep Q-learning3
, policy gradients4
,
DAGGER5
3
V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, et al. “Playing Atari with Deep Reinforcement
Learning”. (2013).
4
J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel. “Trust Region Policy Optimization”. (2015);
V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, et al. “Asynchronous methods for deep reinforcement
learning”. (2016).
5
X. Guo, S. Singh, H. Lee, R. L. Lewis, and X. Wang. “Deep learning for real-time Atari game play using offline
Monte-Carlo tree search planning”. NIPS. 2014.
Recent Success Stories in Deep RL
Robotic manipulation using guided policy search6
Robotic locomotion using policy gradients7
6
S. Levine, C. Finn, T. Darrell, and P. Abbeel. “End-to-end training of deep visuomotor policies”. (2015).
7
J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. “High-dimensional continuous control using
generalized advantage estimation”. (2015).
Recent Success Stories in Deep RL
AlphaGo: supervised learning + policy gradients + value
functions + Monte-Carlo tree search8
8
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, et al. “Mastering the game of Go with deep neural
networks and tree search”. Nature (2016).

More Related Content

What's hot

25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learningAndres Mendez-Vazquez
 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learningBig Data Colombia
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learningJie-Han Chen
 
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...Universitat Politècnica de Catalunya
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningKhaled Saleh
 
Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Marina Santini
 
Machine Learning
Machine LearningMachine Learning
Machine Learningbutest
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning Chandra Meena
 
How popular are your tweets?
How popular are your tweets?How popular are your tweets?
How popular are your tweets?avijit_saha
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement LearningUsman Qayyum
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Treesananth
 
«Evolution strategies in reinforcement learning», Borys Tymchenko.
«Evolution strategies in reinforcement learning», Borys Tymchenko.«Evolution strategies in reinforcement learning», Borys Tymchenko.
«Evolution strategies in reinforcement learning», Borys Tymchenko.Provectus
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningDongHyun Kwak
 
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
Deep Learning For Practitioners,  lecture 2: Selecting the right applications...Deep Learning For Practitioners,  lecture 2: Selecting the right applications...
Deep Learning For Practitioners, lecture 2: Selecting the right applications...ananth
 
Infusing Digital Technologies for an Engineering Laboratory
Infusing Digital Technologies for an Engineering LaboratoryInfusing Digital Technologies for an Engineering Laboratory
Infusing Digital Technologies for an Engineering LaboratoryAlex See
 
End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0taeseon ryu
 
introducción a Machine Learning
introducción a Machine Learningintroducción a Machine Learning
introducción a Machine Learningbutest
 
Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introductionConnorShorten2
 

What's hot (20)

25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learning
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
 
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement Learning
 
Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning
 
Deep Q-Learning
Deep Q-LearningDeep Q-Learning
Deep Q-Learning
 
Intro rl
Intro rlIntro rl
Intro rl
 
How popular are your tweets?
How popular are your tweets?How popular are your tweets?
How popular are your tweets?
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Trees
 
«Evolution strategies in reinforcement learning», Borys Tymchenko.
«Evolution strategies in reinforcement learning», Borys Tymchenko.«Evolution strategies in reinforcement learning», Borys Tymchenko.
«Evolution strategies in reinforcement learning», Borys Tymchenko.
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
Deep Learning For Practitioners,  lecture 2: Selecting the right applications...Deep Learning For Practitioners,  lecture 2: Selecting the right applications...
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
 
Infusing Digital Technologies for an Engineering Laboratory
Infusing Digital Technologies for an Engineering LaboratoryInfusing Digital Technologies for an Engineering Laboratory
Infusing Digital Technologies for an Engineering Laboratory
 
End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0
 
introducción a Machine Learning
introducción a Machine Learningintroducción a Machine Learning
introducción a Machine Learning
 
Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introduction
 

Similar to Lec0

Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIJack Clark
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learningbutest
 
Human Factors and Ergonomics 2011 Maria C. R. Harrington
Human Factors and Ergonomics  2011 Maria C. R. HarringtonHuman Factors and Ergonomics  2011 Maria C. R. Harrington
Human Factors and Ergonomics 2011 Maria C. R. Harringtonmariacrharrington
 
Machine learning and Neural Networks
Machine learning and Neural NetworksMachine learning and Neural Networks
Machine learning and Neural Networksbutest
 
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...Yoonho Lee
 
Lecture 01: Machine Learning for Language Technology - Introduction
 Lecture 01: Machine Learning for Language Technology - Introduction Lecture 01: Machine Learning for Language Technology - Introduction
Lecture 01: Machine Learning for Language Technology - IntroductionMarina Santini
 
Machine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptMachine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptAnshika865276
 
J. Scholz 1 , B. Hengst 2 , G. Calbert 1 , A. Antoniades 2 ...
J. Scholz 1 , B. Hengst 2 , G. Calbert 1 , A. Antoniades 2 ...J. Scholz 1 , B. Hengst 2 , G. Calbert 1 , A. Antoniades 2 ...
J. Scholz 1 , B. Hengst 2 , G. Calbert 1 , A. Antoniades 2 ...butest
 
reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.pptcharusharma165
 
Machine Learning in Finance
Machine Learning in FinanceMachine Learning in Finance
Machine Learning in FinanceHamed Vaheb
 
Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Galit Shmueli
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.pptssuser43a599
 
Evaluation of recommender technology using multi agent simulation
Evaluation of recommender technology using multi agent simulationEvaluation of recommender technology using multi agent simulation
Evaluation of recommender technology using multi agent simulationZina Petrushyna
 
课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)butest
 
This is a heavily data-oriented
This is a heavily data-orientedThis is a heavily data-oriented
This is a heavily data-orientedbutest
 
This is a heavily data-oriented
This is a heavily data-orientedThis is a heavily data-oriented
This is a heavily data-orientedbutest
 

Similar to Lec0 (20)

Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Human Factors and Ergonomics 2011 Maria C. R. Harrington
Human Factors and Ergonomics  2011 Maria C. R. HarringtonHuman Factors and Ergonomics  2011 Maria C. R. Harrington
Human Factors and Ergonomics 2011 Maria C. R. Harrington
 
Machine learning and Neural Networks
Machine learning and Neural NetworksMachine learning and Neural Networks
Machine learning and Neural Networks
 
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
 
Lecture 01: Machine Learning for Language Technology - Introduction
 Lecture 01: Machine Learning for Language Technology - Introduction Lecture 01: Machine Learning for Language Technology - Introduction
Lecture 01: Machine Learning for Language Technology - Introduction
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
 
PREDICT 422 - Module 1.pptx
PREDICT 422 - Module 1.pptxPREDICT 422 - Module 1.pptx
PREDICT 422 - Module 1.pptx
 
Machine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptMachine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.ppt
 
My experiment
My experimentMy experiment
My experiment
 
J. Scholz 1 , B. Hengst 2 , G. Calbert 1 , A. Antoniades 2 ...
J. Scholz 1 , B. Hengst 2 , G. Calbert 1 , A. Antoniades 2 ...J. Scholz 1 , B. Hengst 2 , G. Calbert 1 , A. Antoniades 2 ...
J. Scholz 1 , B. Hengst 2 , G. Calbert 1 , A. Antoniades 2 ...
 
reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.ppt
 
Machine Learning in Finance
Machine Learning in FinanceMachine Learning in Finance
Machine Learning in Finance
 
Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...
 
YijueRL.ppt
YijueRL.pptYijueRL.ppt
YijueRL.ppt
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.ppt
 
Evaluation of recommender technology using multi agent simulation
Evaluation of recommender technology using multi agent simulationEvaluation of recommender technology using multi agent simulation
Evaluation of recommender technology using multi agent simulation
 
课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)课堂讲义(最后更新:2009-9-25)
课堂讲义(最后更新:2009-9-25)
 
This is a heavily data-oriented
This is a heavily data-orientedThis is a heavily data-oriented
This is a heavily data-oriented
 
This is a heavily data-oriented
This is a heavily data-orientedThis is a heavily data-oriented
This is a heavily data-oriented
 

Recently uploaded

VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒anilsa9823
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.Aaiza Hassan
 
Catalogue ONG NƯỚC uPVC - HDPE DE NHAT.pdf
Catalogue ONG NƯỚC uPVC - HDPE DE NHAT.pdfCatalogue ONG NƯỚC uPVC - HDPE DE NHAT.pdf
Catalogue ONG NƯỚC uPVC - HDPE DE NHAT.pdfOrient Homes
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurSuhani Kapoor
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMRavindra Nath Shukla
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdfRenandantas16
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst SummitHolger Mueller
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...Paul Menig
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Dave Litwiller
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Dipal Arora
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Serviceritikaroy0888
 
Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Neil Kimberley
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth MarketingShawn Pang
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMANIlamathiKannappan
 
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature SetCreating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature SetDenis Gagné
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...noida100girls
 

Recently uploaded (20)

VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
 
Nepali Escort Girl Kakori \ 9548273370 Indian Call Girls Service Lucknow ₹,9517
Nepali Escort Girl Kakori \ 9548273370 Indian Call Girls Service Lucknow ₹,9517Nepali Escort Girl Kakori \ 9548273370 Indian Call Girls Service Lucknow ₹,9517
Nepali Escort Girl Kakori \ 9548273370 Indian Call Girls Service Lucknow ₹,9517
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.
 
Catalogue ONG NƯỚC uPVC - HDPE DE NHAT.pdf
Catalogue ONG NƯỚC uPVC - HDPE DE NHAT.pdfCatalogue ONG NƯỚC uPVC - HDPE DE NHAT.pdf
Catalogue ONG NƯỚC uPVC - HDPE DE NHAT.pdf
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst Summit
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.
 
KestrelPro Flyer Japan IT Week 2024 (English)
KestrelPro Flyer Japan IT Week 2024 (English)KestrelPro Flyer Japan IT Week 2024 (English)
KestrelPro Flyer Japan IT Week 2024 (English)
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMAN
 
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature SetCreating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
 

Lec0

  • 1. What is Reinforcement Learning? Branch of machine learning concerned with taking sequences of actions Usually described in terms of agent interacting with a previously unknown environment, trying to maximize cumulative reward Agent Environment action observation, reward Formalized as partially observable Markov decision process (POMDP)
  • 2. Motor Control and Robotics Robotics: Observations: camera images, joint angles Actions: joint torques Rewards: stay balanced, navigate to target locations, serve and protect humans
  • 3. Business Operations Inventory Management Observations: current inventory levels Actions: number of units of each item to purchase Rewards: profit
  • 4. In Other ML Problems Classification with Hard Attention1 Observation: current image window Action: where to look Reward: +1 for correct classification Sequential/structured prediction, e.g., machine translation2 Observations: words in source language Action: emit word in target language Reward: sentence-level accuracy metric, e.g. BLEU score 1 V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu. “Recurrent models of visual attention”. NIPS. 2014. 2 H. Daum´e III, J. Langford, and D. Marcu. “Search-based structured prediction”. (2009); S. Ross, G. J. Gordon, and D. Bagnell. “A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning.” AISTATS. 2011; M. Norouzi, S. Bengio, N. Jaitly, M. Schuster, Y. Wu, et al. “Reward augmented maximum likelihood for neural structured prediction”. NIPS. 2016.
  • 5. What Is Deep Reinforcement Learning? Reinforcement learning using neural networks to approximate functions Policies (select next action) Value functions (measure goodness of states or state-action pairs) Dynamics Models (predict next states and rewards)
  • 6. How Does RL Relate to Other ML Problems? Supervised learning: Environment samples input-output pair (xt, yt) ∼ ρ Agent predicts ˆyt = f (xt) Agent receives loss (yt, ˆyt) Environment asks agent a question, and then tells it the right answer
  • 7. How Does RL Relate to Other ML Problems? Contextual bandits: Environment samples input xt ∼ ρ Agent takes action ˆyt = f (xt) Agent receives cost ct ∼ P(ct | xt, ˆyt) where P is an unknown probability distribution Environment asks agent a question, and gives agent a noisy score on its answer Application: personalized recommendations
  • 8. How Does RL Relate to Other ML Problems? Reinforcement learning: Environment samples input xt ∼ P(xt | xt−1, yt−1) Environment is stateful: input depends on your previous actions! Agent takes action ˆyt = f (xt) Agent receives cost ct ∼ P(ct | xt, ˆyt) where P a probability distribution unknown to the agent.
  • 9. How Does RL Relate to Other Machine Learning Problems? Summary of differences between RL and supervised learning: You don’t have full access to the function you’re trying to optimize—must query it through interaction. Interacting with a stateful world: input xt depend on your previous actions
  • 10. 1990s, beginnings K. S. Narendra and K. Parthasarathy. “Identification and control of dynamical systems using neural networks”. IEEE Transactions on neural networks (1990) W. T. Miller, P. J. Werbos, and R. S. Sutton. Neural networks for control. 1991
  • 11. 1990s, beginnings L.-J. Lin. Reinforcement learning for robots using neural networks. Tech. rep. DTIC Document, 1993
  • 12. 1990s, beginnings G. Tesauro. “Temporal difference learning and TD-Gammon”. Communications of the ACM (1995). Figures from R. S. Sutton and A. G. Barto. Introduction to reinforcement learning. MIT Press, 1998
  • 13. Recent Success Stories in Deep RL ATARI using deep Q-learning3 , policy gradients4 , DAGGER5 3 V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, et al. “Playing Atari with Deep Reinforcement Learning”. (2013). 4 J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel. “Trust Region Policy Optimization”. (2015); V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, et al. “Asynchronous methods for deep reinforcement learning”. (2016). 5 X. Guo, S. Singh, H. Lee, R. L. Lewis, and X. Wang. “Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning”. NIPS. 2014.
  • 14. Recent Success Stories in Deep RL Robotic manipulation using guided policy search6 Robotic locomotion using policy gradients7 6 S. Levine, C. Finn, T. Darrell, and P. Abbeel. “End-to-end training of deep visuomotor policies”. (2015). 7 J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. “High-dimensional continuous control using generalized advantage estimation”. (2015).
  • 15. Recent Success Stories in Deep RL AlphaGo: supervised learning + policy gradients + value functions + Monte-Carlo tree search8 8 D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, et al. “Mastering the game of Go with deep neural networks and tree search”. Nature (2016).