An Introduction to
Reinforcement Learning
Jie-Han Chen
NetDB, National Cheng Kung University
3/27, 2018 @ National Cheng Kung University, Taiwan
1
Disclaimer
The content of this lecture was borrowed from:
1. Rich Sutton’s textbook
2. David Silver’s Reinforcement Learning class at UCL
3. Sergey Levine’s Deep Reinforcement Learning class at UC Berkeley
2
Syllabus
● Introduction to Reinforcement Learning
● Markov Decision Process
● Dynamic Programming
● Monte Carlo method
● Temporal Difference method
● Deep Reinforcement Learning
● Policy Gradient
● Hierarchical Reinforcement Learning and Multiagent Reinforcement Learning
● Active Research Issues
3
Resources
Textbooks:
● Reinforcement Learning: An Introduction, Sutton and Barto
● Algorithms for Reinforcement Learning, Szepesvari
Courses:
● CS 294 Deep Reinforcement Learning, Berkeley
● David Silver’s Reinforcement Learning course, UCL
● CMU 10703 Deep Reinforcement Learning and Control, CMU
● Shan-Hung Wu’s Deep Learning course at NTHU
All of these serve as reference materials for this lecture.
4
Outline
● Syllabus
● Introduction
● Elements of reinforcement learning and its objective
● History of RL
● Applications
● The challenge and active research fields in RL
● Research institute and notable researchers
5
Machine Learning
From David Silver’s RL course
6
Introduction to Reinforcement Learning
Reinforcement learning is a learning framework distinct from supervised learning
and unsupervised learning.
It consists of a series of perceptions and interactions between an agent and its
environment.
From Sutton’s book
7
Agent and Environment
At each step t the agent:
● Receives scalar reward R_t
● Receives observation O_t
● Executes action A_t
The environment:
● Receives action A_t
● Emits observation O_{t+1}
● Emits scalar reward R_{t+1}
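The interaction loop above can be sketched in a few lines of Python. This is only an illustration: CoinFlipEnv, RandomAgent and run are hypothetical names invented here, and the toy task (guessing a coin flip) is an assumption, not something taken from the lecture.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a hidden coin (0 or 1)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.coin = self.rng.choice([0, 1])

    def step(self, action):
        # Receives action A_t, emits observation O_{t+1} and scalar reward R_{t+1}.
        reward = 1.0 if action == self.coin else 0.0
        observation = self.coin              # reveal the coin that was just guessed
        self.coin = self.rng.choice([0, 1])  # draw the next hidden coin
        return observation, reward


class RandomAgent:
    """Hypothetical agent that ignores its inputs and acts at random."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, observation, reward):
        # Receives O_t and R_t, then executes action A_t.
        return self.rng.choice([0, 1])


def run(env, agent, steps):
    """Run the agent-environment loop and record (t, A_t, O_{t+1}, R_{t+1})."""
    observation, reward = None, 0.0
    history = []
    for t in range(steps):
        action = agent.act(observation, reward)
        observation, reward = env.step(action)
        history.append((t, action, observation, reward))
    return history
```

Calling run(CoinFlipEnv(), RandomAgent(), 5) returns five records, one per time step. A learning agent would replace RandomAgent.act with something that uses the observation and reward it receives.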
8
Introduction to Reinforcement Learning
Reinforcement learning is often used to solve sequential decision problems.
● Goal: select actions to maximize total future reward
● Actions may have long-term consequences
● Reward may be delayed
● It may be better to sacrifice immediate reward to gain more long-term reward
● E.g.:
○ A financial investment
○ A chess game
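A tiny numeric illustration of why giving up immediate reward can pay off. The two reward sequences below are made-up numbers chosen only for this example, and total_reward is a hypothetical helper name.

```python
def total_reward(rewards):
    """Total (undiscounted) reward collected over one sequence of steps."""
    return sum(rewards)


# Hypothetical reward sequences for two ways of acting over four steps.
greedy_rewards = [1, 0, 0, 0]    # takes a small reward immediately
patient_rewards = [-1, 0, 0, 5]  # pays a cost first, is rewarded later

greedy_total = total_reward(greedy_rewards)
patient_total = total_reward(patient_rewards)
```

Under these assumed numbers the patient sequence ends with a larger total even though its first reward is negative, which is the investment situation described above.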
9
Supervised Learning & Unsupervised Learning
The input data are independent and identically distributed (i.i.d.).
The current output does not affect the next
input.
10
Reinforcement Learning
The agent’s actions affect the data
it receives in the future.
Figure from Wikipedia, made by waldoalvarez
11
Introduction to Reinforcement Learning
● In reinforcement learning the
agent learns by trial and error.
● Better experience lets the
agent learn a better policy.
● What kind of experience is
better?
The image is from:
http://www.homemeeting.us/franktmc/maze_2.jpg
12
Elements of reinforcement learning
● Policy
● Reward signal
● Value function
● Model of environment (optional)
13
Elements of reinforcement learning - policy
Policy
● Defines the learning agent’s way of behaving at a given time. It could be a
simple function, a lookup table, or a search process
● Often denoted by π
● Can be deterministic or stochastic
14
Elements of reinforcement learning - policy
Suppose you are Russell Westbrook and
James Harden is defending you. In
this situation, you have three choices:
● Cut
● Shoot
● Pass
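The three choices above can be used to sketch the difference between a stochastic and a deterministic policy for a single state. The probabilities are hypothetical values chosen for illustration, and sample_action and greedy_action are names made up for this sketch.

```python
import random

# Hypothetical stochastic policy: a probability for each action in this state.
STOCHASTIC_POLICY = {"cut": 0.2, "shoot": 0.5, "pass": 0.3}

# A deterministic policy puts all of its probability on one action.
DETERMINISTIC_POLICY = {"cut": 0.0, "shoot": 1.0, "pass": 0.0}


def sample_action(policy, rng):
    """Draw one action according to the policy's probabilities."""
    actions = list(policy)
    weights = [policy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def greedy_action(policy):
    """Return the single most probable action."""
    return max(policy, key=policy.get)
```

Sampling repeatedly from STOCHASTIC_POLICY gives a mix of all three actions, while sampling from DETERMINISTIC_POLICY always gives the same one.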
15
Stochastic policy
[Figure: probability of each action]
16
Deterministic policy
[Figure: probability of each action]
17
Policies - Action space
In reinforcement learning, we can categorize problems by their action space into
two types.
● Discrete action space
● Continuous action space
In the previous example, the actions lie in a discrete space, but there are
many examples of continuous control, e.g. a robotic arm. The stochastic policy of
a continuous control problem looks like a probability density function.
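For a continuous action, one common choice (an assumption here, the slides do not name a distribution) is a Gaussian policy: the policy defines a density over actions and the agent samples from it. The function names and the interpretation of the action as a joint torque are hypothetical.

```python
import math
import random


def gaussian_pdf(action, mean, std):
    """Probability density of a continuous action under a Gaussian policy."""
    z = (action - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sample_action(mean, std, rng):
    """Sample one continuous action, e.g. a torque for a robotic arm joint."""
    return rng.gauss(mean, std)
```

With a small std the policy is close to deterministic around mean; with a large std it explores a wider range of actions.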
18
Elements of reinforcement learning - reward
Reward: r / R_t
● Defines the goal in a reinforcement learning problem
● Indicates how well the agent is doing at step t
● Perceived immediately from the environment
19
Elements of reinforcement learning - reward
+2
0 or -0.2?
20
Elements of reinforcement learning - reward
In chess or Go, the reward is defined
by the outcome of the game.
● Win: +1
● Draw: 0
● Lose: -1
For most steps, we don’t receive any
reward (value = 0). This is a sparse
reward problem.
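The outcome-based reward above can be written as a small function. The names game_reward and episode_rewards are hypothetical, and the convention that an unfinished game is passed as None is an assumption of this sketch.

```python
def game_reward(outcome):
    """Reward for one step: nonzero only when the game has ended.

    outcome is "win", "draw", "lose", or None while the game is in progress.
    """
    return {"win": 1, "draw": 0, "lose": -1}.get(outcome, 0)


def episode_rewards(num_moves, outcome):
    """Reward sequence for a whole game: zeros, then the final outcome."""
    return [game_reward(None)] * (num_moves - 1) + [game_reward(outcome)]
```

For a four-move game that ends in a win, episode_rewards(4, "win") is three zeros followed by a single +1, which is what makes the signal sparse.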
21
Elements of reinforcement learning - reward
If we want the agent to reach the goal in
fewer steps, we often define the reward as
-1 for every step it takes.
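With a reward of -1 per step, the total reward of an episode is just the negative of its length, so a shorter path scores higher. The path lengths below are made-up numbers, and step_penalty_return is a hypothetical helper name.

```python
def step_penalty_return(num_steps, step_reward=-1):
    """Total undiscounted reward for an episode that reaches the goal in num_steps steps."""
    return step_reward * num_steps


short_path = step_penalty_return(8)   # hypothetical 8-step path to the goal
long_path = step_penalty_return(20)   # hypothetical 20-step path to the goal
```

Maximizing total reward under this scheme is therefore the same as minimizing the number of steps.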
22
Elements of reinforcement learning - value function
Value function
● Indicates which decisions are good in the long run.
● There are two forms:
○ state-value function
○ action-value function
● Unlike reward, a value function is an estimated value.
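One simple way to see why a value is an estimate: it can be computed by averaging the total reward observed after visiting a state over several episodes. The sketch below assumes an undiscounted, every-visit Monte Carlo average, which is one possible estimator rather than the one prescribed by the slides; estimate_state_values and the episode format are hypothetical.

```python
from collections import defaultdict


def estimate_state_values(episodes):
    """Estimate state values by averaging observed returns.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received after leaving that state.
    """
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        # Walk backwards so g accumulates the reward from each state onward.
        for state, reward in reversed(episode):
            g += reward
            returns[state].append(g)
    return {state: sum(gs) / len(gs) for state, gs in returns.items()}
```

The estimate changes as more episodes are added, whereas the reward for a given step is simply observed.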
23
Elements of reinforcement learning - value function
The score is 99 to 98 (we have 98) and only
5 seconds remain in the game.
If you have to take the throw-in at midcourt,
who would you pass the ball to?
1. Hanamichi Sakuragi (櫻木花道)
2. Hisashi Mitsui (三井壽)
24
Elements of reinforcement learning - model
Model of the environment (optional)
● Something that mimics the behavior of the environment.
● Allows inferences to be made about how the environment will behave
(planning).
● Methods for solving reinforcement learning problems that use models for
planning are called model-based methods. Methods that do not are called
model-free methods.
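A minimal sketch of what a learned model can look like: a table of observed transitions that can later be queried without touching the real environment. TabularModel and its methods are hypothetical names, and predicting the most frequent outcome is just one simple choice among many.

```python
from collections import Counter, defaultdict


class TabularModel:
    """Hypothetical model that counts observed (next_state, reward) outcomes."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, state, action, next_state, reward):
        """Record one real transition observed from the environment."""
        self.counts[(state, action)][(next_state, reward)] += 1

    def predict(self, state, action):
        """Return the most frequently observed (next_state, reward), or None."""
        outcomes = self.counts.get((state, action))
        if not outcomes:
            return None
        return outcomes.most_common(1)[0][0]
```

A model-based method would call predict to simulate experience for planning; a model-free method learns from the real transitions only.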
25
Elements of reinforcement learning - model
Interaction, inferences
Learn the model
The image is from David Silver’s RL course
26
Just like ...
27
Elements of reinforcement learning - model
28
Elements of reinforcement learning - model
29
Elements of reinforcement learning
● Policy
● Reward signal
● Value function
● Model of environment (optional)
30
The objective of reinforcement learning
Reinforcement learning is a framework
for goal-directed learning.
The objective of reinforcement learning
is to maximize the cumulative reward in
each task.
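Written out, the cumulative reward from step t is usually called the return G_t. The form below follows the convention in Sutton and Barto's textbook; the discount factor γ is an assumption added here (the slide itself only says "cumulative"), and setting γ = 1 in an episodic task gives the plain sum of rewards.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma \le 1,
\qquad
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\left[ G_t \right]
```

Here π* is a policy that maximizes the expected return, which is the sense in which the agent "maximizes cumulative reward".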
The image is from:
https://www.wikijob.co.uk/content/interview-advice/competencies/decision-making
31
History of Reinforcement Learning
Reinforcement learning is inspired by two fields:
● Optimal control
● Biological learning systems: animal learning
32
Optimal control
It is a mathematical optimization method for deriving control policies
especially under certain constraints.
The optimization method is largely due to the work of Lev Pontryagin and
Richard Bellman in the 1950s.
33
Richard Bellman
Richard Bellman was an applied
mathematician, who introduced dynamic
programming in 1953.
Work:
● Bellman Equation
● Curse of dimensionality
● Bellman-Ford algorithm
34
Animal Learning
● Teaching a dog - positive reward
35
Animal Learning
● Teaching a dog - penalty (negative reward)
36
Some questions about RL
● Why do we need to learn reinforcement learning?
● What makes reinforcement learning spring up like mushrooms?
37
Backgammon (IBM, 1992)
Temporal difference learning and TD-Gammon, by
Gerald Tesauro, 1992
Backgammon is 雙陸棋 in Chinese.
Source: Wikipedia
38
Autonomous Helicopter (Stanford, 2000)
Autonomous helicopter aerobatics has been studied since 2000 by Andrew Ng and
Pieter Abbeel at Stanford.
You can see more details at: http://heli.stanford.edu/
39
Deep reinforcement learning in Atari game (2013)
Deep Q-Network (DQN): proposed by V. Mnih et al. It is the first end-to-end
reinforcement learning model to combine deep learning with raw inputs.
40
Deep reinforcement learning in Atari game (2013)
41
Deep Reinforcement Learning for Robotic Manipulation
42
AlphaGo (DeepMind, 2016)
43
AlphaGo (DeepMind, 2016)
AlphaGo: David Silver, Aja Huang et al. used Monte Carlo tree search (MCTS) and
deep reinforcement learning (policy gradient) to master the game of Go.
44
AlphaGo Zero (DeepMind, 2017)
AlphaGo Zero: David Silver et al. used MCTS and policy iteration with a two-headed
ResNet architecture to learn from scratch without human knowledge.
45
AlphaGo Zero (DeepMind, 2017)
46
Dota2 (OpenAI, 2017)
● Beats the world’s top professionals at 1v1 matches
● The bot learned from scratch by self-play
47
Dota2 (OpenAI, 2017)
48
Dota2 (OpenAI, 2017)
49
Alibaba (StarCraft 1, multiagent)
50
Deep RL for Dialogue Generation (Li et al., 2016)
● RL agent generates more interactive responses
● RL agent tends to end a sentence with a question and hand the conversation
over to the user
● Next step: explore intrinsic rewards, large-scale training
From the slides on http://opendialogue.miulab.tw
51
The challenges of reinforcement learning
● Sparse reward issue
● Reward credit assignment
● Large space for exploration (trial-and-error)
● Imperfect information, partial observation
52
Active research domains
● Multiagent reinforcement learning
● Hierarchical reinforcement learning
● Inverse reinforcement learning
● Multi-task Transfer learning in reinforcement learning
● Meta learning
● One-shot reinforcement learning
● Deep reinforcement learning in dialogue generation
53
Research institute and notable researchers
54
The research scientists in RL you must know!
● Richard S. Sutton
● David Silver
● Pieter Abbeel
● Sergey Levine
55
Richard S. Sutton
● The founding father of reinforcement
learning
● Professor of Computer Science at University
of Alberta
● Temporal difference learning
● Dyna architecture
56
David Silver
● Research scientist at DeepMind
● Lead researcher on the AlphaGo and AlphaGo
Zero teams
● Supervised by Sutton during his Ph.D.
● Formerly a professor at University College
London
57
Pieter Abbeel
● Professor at UC Berkeley
● Director of the UC Berkeley Robot Learning Lab
● Research scientist and advisor at OpenAI
58
Sergey Levine
● Assistant Professor at UC Berkeley
● Research scientist at Google Brain
● Autonomous robots
59
Questions?
60

Proteomics: types, protein profiling steps etc.
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 

An introduction to reinforcement learning

  • 1. An Introduction to Reinforcement Learning Jie-Han Chen NetDB, National Cheng Kung University 3/27, 2018 @ National Cheng Kung University, Taiwan 1
  • 2. Disclaimer The content of this lecture was borrowed from: 1. Rich Sutton’s textbook 2. David Silver’s Reinforcement Learning class at UCL 3. Sergey Levine’s Deep Reinforcement Learning class at UCB 2
  • 3. Syllabus ● Introduction to Reinforcement Learning ● Markov Decision Process ● Dynamic Programming ● Monte Carlo method ● Temporal Difference method ● Deep Reinforcement Learning ● Policy Gradient ● Hierarchical Reinforcement Learning and Multiagent Reinforcement Learning ● Active Research Issue 3
  • 4. Resources Textbooks: ● Reinforcement Learning: An Introduction, Sutton and Barto ● Algorithms for Reinforcement Learning, Szepesvari Course: ● CS 294 Deep Reinforcement Learning, Berkeley ● David Silver’s Reinforcement Learning course, UCL ● CMU 10703 Deep Reinforcement Learning and Control, CMU ● Shan-Hung Wu’s Deep Learning course in NTHU All of them are our reference materials in this lecture. 4
  • 5. Outline ● Syllabus ● Introduction ● Elements of reinforcement learning and its objective ● History of RL ● Applications ● The challenge and active research fields in RL ● Research institute and notable researchers 5
  • 6. Machine Learning From David Silver’s RL course 6
  • 7. Introduction to Reinforcement Learning Reinforcement learning is a learning framework different from supervised learning and unsupervised learning. It consists of a series of perceptions and interactions between an agent and its environment. From Sutton’s book 7
  • 8. Agent and Environment At each step t the agent: ● Receives scalar reward Rt ● Receives observation Ot ● Executes action At The environment: ● Receives action At ● Emits observation Ot+1 ● Emits scalar reward Rt+1 8
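The agent-environment loop on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: `CoinFlipEnv`, `run_episode`, and the `reset`/`step` method names are hypothetical and do not come from the lecture. The toy environment asks the agent to guess a coin flip and pays a reward of 1 for a correct guess.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the outcome of a fair coin flip."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # initial observation (carries no information in this toy task)

    def step(self, action):
        # The environment receives A_t and emits O_{t+1} and R_{t+1}.
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        done = self.t >= self.episode_length
        return coin, reward, done


def run_episode(env, policy):
    """Run one episode and return (total reward, number of steps)."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(observation)                  # agent executes A_t
        observation, reward, done = env.step(action)  # agent receives O_{t+1}, R_{t+1}
        total_reward += reward
        steps += 1
    return total_reward, steps


# A trivial policy that always guesses 1, regardless of the observation.
total, steps = run_episode(CoinFlipEnv(), policy=lambda obs: 1)
```

The policy here ignores its observation; a learning agent would instead use the stream of observations and rewards to improve its choices over time.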
  • 9. Introduction to Reinforcement Learning Reinforcement Learning is often used to solve sequential decision problems. ● Goal: select actions to maximize total future reward ● Actions may have long-term consequences ● Reward may be delayed ● It may be better to sacrifice immediate reward to gain more long-term reward ● E.g.: ○ A financial investment ○ A chess game 9
  • 10. Supervised Learning & Unsupervised Learning The input data are independent and identically distributed (i.i.d.). The current output does not affect the next input. 10
  • 11. Reinforcement Learning The agent’s actions do affect the data received in the future. Figure from Wikipedia, made by waldoalvarez 11
  • 12. Introduction to Reinforcement Learning ● In reinforcement learning the agent learns from trial and error. ● Better experience lets the agent learn a better policy. ● What kind of experience is better? The image is from: http://www.homemeeting.us/franktmc/maze_2.jpg 12
  • 13. Elements of reinforcement learning ● Policy ● Reward signal ● Value function ● Model of environment (optional) 13
  • 14. Elements of reinforcement learning - policy Policy ● Defines the learning agent’s way of behaving at a given time. Could be a simple function, a lookup table, or a search process ● Often denoted by π ● Could be deterministic or stochastic 14
  • 15. Elements of reinforcement learning - policy Suppose you are Russell Westbrook and are being defended by James Harden. In this situation, you have 3 choices: ● Cut ● Shoot ● Pass 15
  • 18. Policies - Action space In reinforcement learning, we can categorize problems by their action space into 2 types. ● Discrete action space ● Continuous action space In the previous example, the decision (the action) lies in a discrete space, but there are many examples of continuous control, e.g. a robotic arm. The stochastic policy of a continuous control problem looks like a probability density function. 18
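The distinctions drawn on the policy slides (deterministic vs. stochastic, discrete vs. continuous) can be illustrated with a short sketch. Every state name, probability, and function name below is a made-up assumption for illustration, and the Gaussian is just one common choice of density for a continuous-action policy.

```python
import random

ACTIONS = ["cut", "shoot", "pass"]

# Deterministic policy as a lookup table: each state maps to exactly one action.
deterministic_policy = {"defended_tightly": "pass", "open_look": "shoot"}

# Stochastic policy: each state maps to a probability distribution over actions.
# The probabilities are hypothetical.
stochastic_policy = {
    "defended_tightly": {"cut": 0.3, "shoot": 0.1, "pass": 0.6},
    "open_look": {"cut": 0.1, "shoot": 0.8, "pass": 0.1},
}


def sample_discrete(policy, state, rng):
    """Draw one action from a discrete stochastic policy."""
    probs = policy[state]
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


def sample_continuous(mean, std, rng):
    """Draw a continuous action (e.g. a joint torque) from a Gaussian density."""
    return rng.gauss(mean, std)


rng = random.Random(0)
a_det = deterministic_policy["open_look"]
a_sto = sample_discrete(stochastic_policy, "defended_tightly", rng)
torque = sample_continuous(mean=0.0, std=0.5, rng=rng)
```

In the discrete case the policy assigns a probability to each of finitely many actions; in the continuous case it assigns a density over real-valued actions, which is why sampling replaces table lookup.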
  • 19. Elements of reinforcement learning - reward Reward: r / Rt ● Defines the goal in a reinforcement learning problem ● Indicates how well the agent is doing at step t ● Perceived immediately from the environment 19
  • 20. Elements of reinforcement learning - reward +2 0 or -0.2? 20
  • 21. Elements of reinforcement learning - reward In chess or Go, the reward is defined by the outcome of the game. ● Win: +1 ● Draw: 0 ● Lose: -1 In most steps, we don’t receive any reward (value = 0). This is a sparse reward problem. 21
  • 22. Elements of reinforcement learning - reward If we want to reach the goal in fewer steps, we often define the reward as -1 for every step taken. 22
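A small sketch can show why the -1-per-step reward favors shorter paths while a purely sparse reward does not. The function `total_reward` and the two scheme names are hypothetical, and the sketch assumes undiscounted returns and an episode that always ends by reaching the goal.

```python
def total_reward(num_steps, reward_scheme):
    """Undiscounted return of an episode that reaches the goal after num_steps."""
    if reward_scheme == "sparse":
        # Zero reward on every step except the last one, which reaches the goal.
        rewards = [0.0] * (num_steps - 1) + [1.0]
    elif reward_scheme == "step_penalty":
        # Reward of -1 on every step taken.
        rewards = [-1.0] * num_steps
    else:
        raise ValueError(f"unknown reward scheme: {reward_scheme}")
    return sum(rewards)


direct_path, detour = 4, 10  # hypothetical path lengths to the same goal

sparse_direct = total_reward(direct_path, "sparse")         # 1.0
sparse_detour = total_reward(detour, "sparse")              # 1.0
penalty_direct = total_reward(direct_path, "step_penalty")  # -4.0
penalty_detour = total_reward(detour, "step_penalty")       # -10.0
```

Under the sparse scheme both paths earn the same return, so the agent has no reason to prefer the shorter one. Under the step penalty the shorter path earns the higher return.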
  • 23. Elements of reinforcement learning - value function Value function ● Indicates which decision is good in the long run. ● There are two forms: ○ state-value function ○ action-value function ● Unlike reward, the value function is an estimated value. 23
  • 24. Elements of reinforcement learning - value function The score is 99 to 98 with our team trailing, and only 5 seconds are left in the game. If you have to inbound the ball at midcourt, who would you pass to? 1. Hanamichi Sakuragi (櫻木花道) 2. Hisashi Mitsui (三井壽) 24
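Because a value is an estimate rather than something the environment hands over directly, one way to obtain an action value is to average sampled outcomes. The sketch below does this for the last-possession example. The scoring probabilities are invented purely for illustration and say nothing about the actual players; the reward is assumed to be 1 if the final shot scores and 0 otherwise.

```python
import random

# Hypothetical probabilities that the final shot scores after each pass.
SCORE_PROB = {"pass_to_player_1": 0.35, "pass_to_player_2": 0.50}


def sample_return(action, rng):
    """One simulated outcome: reward 1 if our team scores, 0 otherwise."""
    return 1.0 if rng.random() < SCORE_PROB[action] else 0.0


def estimate_action_values(num_samples=10_000, seed=0):
    """Monte Carlo estimate of each action's value as the mean sampled return."""
    rng = random.Random(seed)
    return {
        action: sum(sample_return(action, rng) for _ in range(num_samples)) / num_samples
        for action in SCORE_PROB
    }


q = estimate_action_values()
best_action = max(q, key=q.get)
```

Each single outcome is a reward (0 or 1); the value of an action is the average the agent expects over many such outcomes, and the agent picks the action with the higher estimate.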
  • 25. Elements of reinforcement learning - model Model of environments (optional) ● Use something to mimic the behavior of the environment. ● Allow inferences to be made about how the environment will behave. (planning) ● Methods for solving reinforcement learning problems that use models for planning are called model-based methods. The opposites are model-free methods. 25
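As a minimal sketch of planning with a model, the code below treats the model as a table from (state, action) to (predicted next state, predicted reward) and searches over short action sequences without touching the real environment. The corridor layout, the reward values, and the names `model`, `simulate`, and `plan` are all assumptions made for this example.

```python
from itertools import product

GOAL = 3

# Hypothetical model of a 4-cell corridor (states 0..3, goal at state 3):
# (state, action) -> (predicted next state, predicted reward)
model = {
    (0, "left"): (0, -1.0), (0, "right"): (1, -1.0),
    (1, "left"): (0, -1.0), (1, "right"): (2, -1.0),
    (2, "left"): (1, -1.0), (2, "right"): (GOAL, 10.0),
}


def simulate(state, actions):
    """Roll the model forward and return (final state, total predicted reward)."""
    total = 0.0
    for action in actions:
        if state == GOAL:
            break  # the episode ends once the goal is reached
        state, reward = model[(state, action)]
        total += reward
    return state, total


def plan(state, horizon=3):
    """Pick the action sequence with the highest predicted total reward."""
    candidates = product(["left", "right"], repeat=horizon)
    return max(candidates, key=lambda seq: simulate(state, seq)[1])


best_sequence = plan(0)
final_state, predicted_return = simulate(0, best_sequence)
```

A model-free method would have to discover the same route by acting in the real environment; here every candidate sequence is evaluated inside the model. Exhaustive search is only feasible for tiny horizons like this one.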
  • 26. Elements of reinforcement learning - model Interaction, inferences Learn the model The image is from David Silver’s RL course 26
  • 28. Elements of reinforcement learning - model 28
  • 29. Elements of reinforcement learning - model 29
  • 30. Elements of reinforcement learning ● Policy ● Reward signal ● Value function ● Model of environment (optional) 30
  • 31. The objective of reinforcement learning Reinforcement learning is a framework for goal-directed learning. The objective of reinforcement learning is to maximize the cumulative reward in each task. The image is from: https://www.wikijob.co.uk/content/interview-advice/competencies/decision-making 31
  • 32. History of Reinforcement Learning Reinforcement Learning is inspired by two fields: ● Optimal control ● Biological learning systems: animal learning 32
  • 33. Optimal control It is a mathematical optimization method for deriving control policies especially under certain constraints. The optimization method is largely due to the work of Lev Pontryagin and Richard Bellman in the 1950s. 33
  • 34. Richard Bellman Richard Bellman was an applied mathematician, who introduced dynamic programming in 1953. Work: ● Bellman Equation ● Curse of dimensionality ● Bellman-Ford algorithm 34
  • 35. Animal Learning ● Teaching a dog - positive reward 35
  • 36. Animal Learning ● Teaching a dog - penalty (negative reward) 36
  • 37. Some questions about RL ● Why do we need to learn Reinforcement Learning? ● What makes Reinforcement Learning spring up like mushrooms? 37
  • 38. Backgammon (IBM, 1992) Temporal difference learning and TD-Gammon, by Gerald Tesauro, 1992 Backgammon is 雙陸棋 in Chinese. Source: Wikipedia 38
  • 39. Autonomous Helicopter (Stanford, 2000) Helicopter aerobatics have been studied since 2000 by Andrew Ng and Pieter Abbeel at Stanford. You can see more details at: http://heli.stanford.edu/ 39
  • 40. Deep reinforcement learning in Atari game (2013) Deep Q Network: proposed by V Mnih et al. It is the first end-to-end reinforcement learning model to combine deep learning with raw inputs. 40
  • 41. Deep reinforcement learning in Atari game (2013) 41
  • 42. Deep Reinforcement Learning for Robotic Manipulation 42
  • 44. AlphaGo (DeepMind, 2016) AlphaGo: David Silver, Aja Huang et al. used Monte Carlo tree search (MCTS) and deep reinforcement learning (policy gradient) to master the game of Go. 44
  • 45. AlphaGo Zero (DeepMind, 2017) AlphaGo Zero: David Silver et al. used MCTS and policy iteration with a two-headed ResNet architecture to learn from scratch without human knowledge. 45
  • 47. Dota2 (OpenAI, 2017) ● Beats the world’s top professionals at 1v1 matches ● The bot learned from scratch by self-play 47
  • 51. Deep RL for Dialogue Generation (Li et al., 2016) ● RL agent generates more interactive responses ● RL agent tends to end a sentence with a question and hand the conversation over to the user ● Next step: explore intrinsic rewards, large-scale training From the slides on http://opendialogue.miulab.tw 51
  • 52. The challenges of reinforcement learning ● Sparse reward issue ● Reward credit assignment ● Large space for exploration (trial-and-error) ● Imperfect information, partial observation 52
  • 53. Active research domains ● Multiagent reinforcement learning ● Hierarchical reinforcement learning ● Inverse reinforcement learning ● Multi-task transfer learning in reinforcement learning ● Meta learning ● One-shot reinforcement learning ● Deep reinforcement learning in dialogue generation 53
  • 54. Research institutes and notable researchers 54
  • 55. The research scientists in RL you must know! ● Richard S. Sutton ● David Silver ● Pieter Abbeel ● Sergey Levine 55
  • 56. Richard S. Sutton ● The founding father of reinforcement learning ● Professor of Computer Science at University of Alberta ● Temporal difference learning ● Dyna architecture 56
  • 57. David Silver ● Research scientist at DeepMind ● Lead researcher on the AlphaGo and AlphaGo Zero team ● Supervised by Sutton during his Ph.D. ● Formerly a professor at University College London 57
  • 58. Pieter Abbeel ● Professor at UC Berkeley ● Director of the UC Berkeley Robot Learning Lab ● Research scientist and advisor at OpenAI 58
  • 59. Sergey Levine ● Assistant Professor at UC Berkeley ● Research scientist at Google Brain ● Autonomous robots 59