Reinforcement Learning
By Usman Qayyum
13, Nov, 2018
Machine Learning Expert?
Supervised learning suffers from the underlying human bias present in the data
Machine Learning
• Supervised Learning: Example --> Class
  Classification, Regression
• Reinforcement Learning: Situation --> Reward --> Situation --> Reward …
  Q-learning, DQN, Policy Gradient, Actor-Critic
• Un-Supervised Learning: Example
  Clustering, Auto-Encoder
Human Learning (Trial & Error)
● Achieves goal / fails to achieve goal
A baby starts walking and successfully reaches the couch
Reinforcement Learning
● Trial & error learning
● Learning from interaction
● Learning what to do—how to map
situations to actions—so as to maximize a
numerical reward signal
How to Formulate an RL Problem
Environment: the physical world in which the agent operates
State: the current situation of the agent
Action: how the agent interacts with the environment
Reward: feedback from the environment
Policy: a method that maps the agent’s state to actions
Value: the future reward that an agent would receive by taking an action in a particular state
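The terms above can be tied together in a minimal interaction loop. This is only a sketch: the `CorridorEnv` class, its reward values and the two policies are hypothetical and were made up for this example; they do not come from the slides.

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor: states 0..length-1, goal at the last state."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls keep the agent in place.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # feedback from the environment
        return self.state, reward, done


def random_policy(state):
    """Policy: maps a state to an action (here uniformly at random)."""
    return random.choice([0, 1])


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return the sum of rewards collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

Calling `run_episode(CorridorEnv(), random_policy)` plays one episode with random actions; swapping in a better policy is what the learning algorithms in the rest of the deck are about.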
RL Applications (Games/Networking)

Objective: Complete the game with the highest score
State: Raw pixel inputs of the game state
Action: Game controls, e.g. Left, Right, Up, Down
Reward: Score increase/decrease at each time step

Objective: Win the game!
State: Position of all pieces
Action: Where to put the next piece down
Reward: 1 if win at the end of the game, 0 otherwise

Objective: Intelligent channel selection
State: Occupancy of each channel in the current time slot
Action: Set the channel to be used for the next time slot
Reward: +1 if there is no collision with the interferer, otherwise -1
Markov Decision Process 
• MDP is used to describe an environment for reinforcement learning
• Almost all RL problems can be formalized as MDPs
The Markov property states that “the future is independent of the past given the present”:
$P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, \ldots, S_t]$
Markov Chain Transition matrix
Markov reward
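A small Markov chain makes the transition matrix concrete. The sketch below is illustrative: the three weather states and all probabilities are invented for this example. Each row of `P` holds the distribution of the next state given the current one, and sampling looks only at the current state, which is exactly the Markov property.

```python
import random

# Hypothetical 3-state chain; row i is P[next state | current state i]
# and every row must sum to 1.
STATES = ["sunny", "cloudy", "rainy"]
P = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]


def next_state(current, rng):
    """Sample S_{t+1} using only S_t (the Markov property)."""
    return rng.choices(range(len(STATES)), weights=P[current])[0]


def simulate(start, steps, seed=0):
    """Return the names of the states visited over `steps` transitions."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return [STATES[i] for i in path]


def step_distribution(dist):
    """Push a distribution over states one step forward: dist' = dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
```

Attaching a reward to each state (or transition) turns this chain into a Markov reward process; adding actions that influence the transition probabilities gives an MDP.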
Model-Based / Model-Free Learning
Environment (Taxi Game)
Representations
Wall --> Can't be passed through; the taxi remains in the same position
Yellow --> Taxi's current location
Blue --> Pick-up location
Purple --> Drop-off location
Green --> The taxi turns green once the passenger boards
Q Learning …
● A Q-table is just a fancy name for a simple lookup table that stores, for each state, the maximum expected future reward of each action.
But the questions are:
How do we calculate the values of the Q-table?
Are the values available or predefined?
States = 500
Actions
0: move south
1: move north
2: move east
3: move west
4: pickup passenger
5: dropoff passenger
Reward:
+20: successfully pick up a passenger and
drop them off at desired location
-1: for each step
-10: every time you incorrectly pick up or
drop off a passenger
Q Learning …
Step 1: Before training starts, every Q-value is initialized to 0.
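For the Taxi game, the initial Q-table is a 500 x 6 grid of zeros. The sketch below uses plain Python lists. The `encode` helper is an assumption: it mirrors how the Gym Taxi environment is commonly described as packing its 500 states (25 taxi positions x 5 passenger locations x 4 destinations) into one integer, and it is not taken from the slides.

```python
N_STATES = 500   # number of states given for the Taxi game
N_ACTIONS = 6    # south, north, east, west, pickup, dropoff

# One row per state, one column per action, every entry starts at 0.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]


def encode(taxi_row, taxi_col, passenger_loc, destination):
    """Assumed state layout: 5 rows x 5 columns x 5 passenger locations x 4 destinations."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger_loc) * 4 + destination


# Look up the Q-values of all six actions for one particular situation.
state = encode(taxi_row=2, taxi_col=3, passenger_loc=1, destination=0)
action_values = q_table[state]
```

Building the table with a list comprehension gives every state its own row, so updating one state's values never changes another's.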
Q Learning …
Step 2 & 3: Choose and perform an action
In the beginning, the agent explores the environment by choosing actions at random.
As it learns more about the environment, it increasingly exploits that knowledge by choosing the action with the highest Q-value.
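One common way to trade exploration against exploitation is an epsilon-greedy rule with a decaying epsilon. The slides do not name a specific rule, so the following is a sketch under that assumption; the decay schedule and its default values are illustrative.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def decayed_epsilon(episode, eps_start=1.0, eps_min=0.01, decay=0.995):
    """Shrink epsilon exponentially so that exploration fades during training."""
    return max(eps_min, eps_start * decay ** episode)
```

With `epsilon = 1.0` every action is random, which matches the all-zero Q-table at the start; as epsilon approaches `eps_min` the agent mostly follows its learned values.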
Q Learning …
Step 4 & 5: Measure the reward and update the Q-table
The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a).
The update is controlled by the learning rate and the discount factor (which weights future reward).
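The standard tabular Q-learning update is $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$, where $\alpha$ is the learning rate and $\gamma$ the discount factor. The function below is a minimal sketch of that rule; the default values of `alpha` and `gamma` are illustrative assumptions.

```python
def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the new Q(s, a)."""
    # No future reward is available once the episode has ended.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

A learning rate near 0 changes the table slowly, and a value of 1 overwrites the old estimate with the new target. A discount factor near 0 makes the agent short-sighted, while a value near 1 makes future reward count almost as much as immediate reward.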
Q-Learning to DQN
Google DeepMind (Deep Q-Network)
“Human-level control through deep reinforcement learning”, Nature, 2015
Gym
A library that provides a large collection of simulated reinforcement learning environments, including Atari games.
Motivation:
• Lack of standardization of environments used in publications
• The need for better benchmarks
Example: Taxi Game Problem (OpenAI Gym)
Example-1
Example-2
Example-2 …
Deep Q-Network
Human-level control through deep reinforcement learning – Nature Vol 518, Feb 26, 2015
By Usman Qayyum
15, Nov, 2018
Model-Free RL (Recap)
● Policy-based RL
○ Search directly for the optimal policy π*
○ This is the policy achieving maximum future reward
● Value-based RL
○ Estimate the optimal value function Q*(s,a)
○ This is the maximum value achievable under any policy
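In the value-based case, a policy is obtained from the value function rather than searched for directly: once Q(s,a) has been estimated, acting greedily with respect to it yields a policy. A minimal sketch, assuming the Q-values are stored as a list of rows (one row per state); the function names are hypothetical:

```python
def greedy_policy(q_table):
    """For every state, pick the action with the largest Q-value."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q_table]


def state_values(q_table):
    """V(s) = max over actions of Q(s, a) under the greedy policy."""
    return [max(row) for row in q_table]
```

Policy-based methods skip the table and adjust the parameters of the policy itself, which is why they are listed separately.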
Q-Learning to DQN (Value-based RL)
A Q-table is like a “cheat sheet” that helps us find the maximum expected future reward of an action, given the current state.
• A good strategy, but it does not scale to large state spaces.
Playing Atari with Deep RL (Mnih et al., 2013)
● Played seven Atari 2600 games
● Beat previous ML approaches on six
● Beat a human expert on three
● Aim: create a single neural network agent that is able to successfully learn to play as many of the games as possible.
● Learns strictly from experience, with no pre-training.
● Inputs: game screen + score.
● No game-specific tuning.
What’s Next
Atari
● Rules of the game unknown
● Learn directly from interactive
game play
● Pick Action on joystick, see pixels
and score
Preprocessing & Temporal limitation
Convolution Layer/Fully Connected
• Frames are processed by three convolution layers.
• These layers exploit spatial relationships within each image.
• Because frames are stacked together, the layers can also pick up some temporal information, such as motion, across those frames.
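To see how three convolution layers shrink a stack of frames, the sketch below traces the tensor shapes layer by layer. The concrete numbers are assumptions taken from the Nature 2015 DQN paper rather than from these slides: 84 x 84 inputs, 4 stacked frames, and filters of 8 x 8 (stride 4), 4 x 4 (stride 2) and 3 x 3 (stride 1).

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial size after a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# (output channels, kernel size, stride) for each convolution layer (assumed values).
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def trace_shapes(height=84, width=84, stacked_frames=4):
    """Return the (channels, height, width) shape after the input and each layer."""
    shapes = [(stacked_frames, height, width)]
    for channels, kernel, stride in CONV_LAYERS:
        height = conv_output_size(height, kernel, stride)
        width = conv_output_size(width, kernel, stride)
        shapes.append((channels, height, width))
    return shapes


shapes = trace_shapes()
channels, height, width = shapes[-1]
flattened = channels * height * width  # size of the input to the fully connected part
```

Stacking the frames along the channel dimension is what lets the first layer compare pixel values across consecutive frames.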
Experience Replay
Experience replay helps us handle two things:
● Avoid forgetting previous experiences. If the network is trained only on the most recent interactions, its weights vary strongly and what it learned from earlier experiences is overwritten.
Solution: create a “replay buffer.” It stores experience tuples while the agent interacts with the environment, and we then sample a small batch of tuples to feed the neural network.
● Reduce correlations between experiences. Every action affects the next state, so consecutive experience tuples can be highly correlated.
Solution: sample from the replay buffer at random to break this correlation. This helps prevent action values from oscillating or diverging catastrophically.
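A replay buffer can be sketched with a bounded deque and uniform random sampling. The class name, its method names and the fixed capacity are assumptions for illustration; real DQN implementations usually add batching into arrays on top of this.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest tuples are dropped when full
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniform random minibatch without replacement; ignores temporal order."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Training typically starts only once `len(buffer)` exceeds the batch size, so that each minibatch mixes experiences from many different points in time.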
Clipping Rewards
Each game has a different score scale. For example, in Pong a player gets 1 point for winning a rally and -1 point for losing it, whereas in Space Invaders a player gets 10 to 30 points for defeating an invader. This difference would make training unstable.
The reward clipping technique therefore sets all positive rewards to +1 and all negative rewards to -1.
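Reward clipping amounts to taking the sign of the raw score change. A minimal sketch (the function name is hypothetical); zero rewards are left at 0:

```python
def clip_reward(reward):
    """Map any positive reward to +1, any negative reward to -1, and keep 0 as 0."""
    if reward > 0:
        return 1.0
    if reward < 0:
        return -1.0
    return 0.0
```

The trade-off is that the agent can no longer tell a 10-point event from a 30-point one, in exchange for using the same learning settings across games.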
DQN Algorithm
Performance
Recent graph from Google DeepMind, 2018 (current trend in RL gaming)
Naïve DQN vs Replay-buffer-based DQN
STRENGTHS AND WEAKNESSES
● Good at
‣ Quick-moving, complex, short-horizon games
‣ Semi-independent trials within the game
‣ Negative feedback on failure
● Bad at
‣ Long-horizon games that don’t converge
‣ Any “walking around” game
‣ Montezuma’s Revenge
Worldly knowledge helps humans play these games relatively easily.
Example Code
● DQN with Atari Game
○ Colab jupyter notebooks

Deep Reinforcement Learning

  • 1. 1 Reinforcement Learning By Usman Qayyum 13, Nov, 2018
  • 2. Machine Learning Expert ? 2 Supervised Learning suffers from underline human-bias present in the data
  • 3. Machine Learning • Supervised Learning Example Class • Reinforcement Learning Situation Reward Situation Reward … • Un-Supervised Learning Example Classification Regression Clustering Auto-Encoder Qlearning, DQN Policy Gradient Actor-Critic 3
  • 4. Human Learning (Trail & Error) ● Achieves Goal Fail to achieve Goal Baby starts walking and successfully reaches the couch 4
  • 5. Reinforcement Learning ● Trial & error learning ● Learning from interaction ● Learning what to do—how to map situations to actions—so as to maximize a numerical reward signal 5
  • 6. How to Formulate RL Problem Environment—Physical world in which the agent operates State—Current situation of the agent Action— Agent interaction with environment through actions Reward—Feedback from the environment Policy—Method to map agent’s state to actions Value—Future reward that an agent would receive by taking an action in a particular state 6
  • 7. RL Applications (Games/Networking) Objective Complete the game with the highest score State Raw pixel inputs of the game state Action Game controls e.g. Left, Right, Up, Down Reward Score increase/decrease at each time step Objective Win the game! State Position of all pieces Action Where to put the next piece down Reward 1 if win at the end of the game, 0 otherwise Objective Intelligent Channel Selection State Occupation on each channel in current time slot Action Set the channel to be used for the next time slot Reward +1 in case of no collision with interferer otherwise -17
  • 9. Markov Decision Process 9 • MDP is used to describe an environment for reinforcement learning • Almost all RL problems can be formalized as MDPs Markov property states that, “ The future is independent of the past given the present.” P[St+1 | St ] = P[ St+1 | S1, ….. , St ] Markov Chain Transition matrix Markov reward
  • 10. Model / Model-Free Learning 10
  • 11. Environment (Taxi Game) 11 Representations WALL --> (Can't pass through, will remain in the same position Yellow --> Taxi Current Location Blue --> Pick up Location Purple --> Drop-off Location Green --> Taxi turn green once passenger board
  • 12. Q Learning … ● A Q-table is a lookup table that stores, for each state and action, the maximum expected future reward of taking that action in that state. The open questions are: how do we calculate the values of the Q-table, and are they available or predefined? ● States: 500 ● Actions: 0 = move south, 1 = move north, 2 = move east, 3 = move west, 4 = pick up passenger, 5 = drop off passenger ● Rewards: +20 for successfully picking up a passenger and dropping them off at the desired location; -1 for each step; -10 for every incorrect pickup or drop-off
  • 13. Q Learning … Step 1: When training starts, before the first episode, every Q-value is initialized to 0.
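A minimal sketch of Step 1 for the Taxi setting: a table with one row per state and one column per action, all zeros. The sizes (500 states, 6 actions) are from the slide; the names `make_q_table` and `q_table` are illustrative. In the Gym Taxi environment the 500 states correspond to 25 taxi positions × 5 passenger locations × 4 destinations.

```python
N_STATES = 500  # from the slide
ACTIONS = ["south", "north", "east", "west", "pickup", "dropoff"]  # indices 0..5


def make_q_table(n_states=N_STATES, n_actions=len(ACTIONS)):
    """Return an n_states x n_actions table with every Q-value set to 0."""
    # Build each row separately so that rows do not share the same list object.
    return [[0.0] * n_actions for _ in range(n_states)]


q_table = make_q_table()
print(len(q_table), "states x", len(q_table[0]), "actions")
```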
  • 14. Q Learning … Steps 2 & 3: Choose and perform an action. In the beginning, the agent explores the environment by choosing actions at random. As it learns more about the environment, it increasingly exploits what it has learned by choosing the action with the highest Q-value.
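One common way to implement this shift from exploration to exploitation is an epsilon-greedy rule with a decaying epsilon. The slide does not name the exact scheme, so the sketch below is an assumption; the function names and the decay constants are illustrative.

```python
import math
import random


def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore: uniformly random action
    # exploit: index of the largest Q-value in this state's row
    return max(range(len(q_row)), key=lambda a: q_row[a])


def decayed_epsilon(episode, eps_min=0.01, eps_max=1.0, decay=0.01):
    """Exponentially shrink epsilon from eps_max toward eps_min (assumed schedule)."""
    return eps_min + (eps_max - eps_min) * math.exp(-decay * episode)


# Early episodes explore almost always; later episodes mostly exploit.
print(decayed_epsilon(0), decayed_epsilon(1000))
```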
  • 15. Q Learning … Steps 4 & 5: Measure the reward and update the Q-table. The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a). The update has two parameters: the learning rate and the discount factor (the weight given to future reward).
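The update equation on this slide did not survive extraction. Assuming it is the standard Q-learning rule, Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)], with α the learning rate and γ the discount factor, a minimal sketch looks like this. The function name and default values are illustrative, and terminal states are not treated specially here.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    best_next = max(q_table[next_state])          # max over a' of Q(s', a')
    td_target = reward + gamma * best_next        # reward plus discounted future value
    td_error = td_target - q_table[state][action]
    q_table[state][action] += alpha * td_error    # move a fraction alpha toward the target
    return q_table[state][action]


# Tiny two-state, two-action table for illustration.
q = [[0.0, 0.0], [2.0, 5.0]]
new_value = q_update(q, state=0, action=1, reward=-1, next_state=1)
print(new_value)
```

With these numbers the target is -1 + 0.9 × 5 = 3.5, and the updated entry moves 10% of the way from 0 toward it.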
  • 17. Google DeepMind (Deep Q-Network): “Human-level control through deep reinforcement learning”, Nature, 2015
  • 18. Gym ● A library that provides a large collection of reinforcement learning environments, including Atari games ● Motivation: the lack of standardization of the environments used in publications, and the need for better benchmarks
  • 19. Example: Taxi Game Problem (OpenAI Gym)
  • 23. Deep Q-Network: “Human-level control through deep reinforcement learning”, Nature, Vol. 518, Feb 26, 2015. By Usman Qayyum, 15 Nov 2018
  • 25. Model-Free RL (Recap) ● Policy-based RL ○ Search directly for the optimal policy π* ○ This is the policy achieving maximum future reward ● Value-based RL ○ Estimate the optimal value function Q*(s,a) ○ This is the maximum value achievable under any policy
  • 26. Q-Learning to DQN (Value-based RL) ● A Q-table is like a “cheat sheet” that helps us find the maximum expected future reward of an action, given the current state. ● It is a good strategy for small problems, but a table with one entry per state and action does not scale to large state spaces.
  • 27. Playing Atari with Deep RL (2013 arXiv paper, extended in Nature, 2015) ● Played seven Atari 2600 games ● Beat previous ML approaches on six ● Beat a human expert on three ● Aim: create a single neural network agent that can learn to play as many of the games as possible ● Learns strictly from experience, with no pre-training ● Inputs: game screen + score ● No game-specific tuning
  • 29. Atari ● The rules of the game are unknown ● Learn directly from interactive game play ● Pick an action on the joystick, then see the pixels and the score
  • 30. Preprocessing & Temporal limitation
  • 31. Convolution Layer/Fully Connected • Frames are processed by three convolution layers. • These layers exploit spatial relationships within each image. • Because several frames are stacked together as one input, the layers can also exploit relationships across those frames, such as motion.
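Frame stacking itself is simple bookkeeping, sketched below under the assumption of a stack of 4 frames (the size used in the DQN papers; the slide does not state it). The class name `FrameStack` is illustrative, and frames are represented by placeholder strings rather than real pixel arrays.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `size` frames so the network can see motion."""

    def __init__(self, size=4):
        self.frames = deque(maxlen=size)  # oldest frame is dropped automatically

    def reset(self, first_frame):
        """At the start of an episode, fill the stack with copies of the first frame."""
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame):
        """Add the newest frame and return the current stack, oldest first."""
        self.frames.append(frame)
        return list(self.frames)


stack = FrameStack(size=4)
stack.reset("f0")
print(stack.push("f1"))
```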
  • 32. Experience Replay ● Experience replay helps with two problems. ● Avoid forgetting previous experiences: if the network is trained only on sequential samples, its weights vary widely and new experiences overwrite what was learned from earlier ones, because consecutive actions and states are highly correlated. Solution: create a “replay buffer” that stores experience tuples while the agent interacts with the environment, then sample a small batch of tuples to train the neural network. ● Reduce correlations between experiences: every action affects the next state, so the agent produces a sequence of experience tuples that can be highly correlated. Solution: sample from the replay buffer at random to break this correlation, which prevents action values from oscillating or diverging catastrophically.
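A replay buffer along these lines can be sketched with a fixed-size queue and uniform random sampling. The class and field names are illustrative, and the capacity and batch size are arbitrary; a real DQN would store preprocessed frames and use a much larger buffer.

```python
import random
from collections import deque, namedtuple

# One experience tuple, as collected while interacting with the environment.
Experience = namedtuple(
    "Experience", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size store of experiences with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest experiences are evicted first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a batch without replacement; random order breaks temporal correlation."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add(state=step, action=0, reward=-1, next_state=step + 1, done=False)
batch = buffer.sample(batch_size=2)
print(len(buffer), [e.state for e in batch])
```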
  • 33. Clipping Rewards ● Each game has a different score scale. For example, in Pong a player gets +1 point for winning a rally and -1 point for losing it, whereas in Space Invaders a player gets 10 to 30 points for defeating an invader. This difference would make training unstable, so reward clipping sets all positive rewards to +1 and all negative rewards to -1.
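Reward clipping as described above reduces to taking the sign of the raw score change. The sketch below also leaves a reward of 0 unchanged, which the slide does not mention but which matches the DQN paper; the function name is illustrative.

```python
def clip_reward(reward):
    """Map any positive reward to +1, any negative reward to -1, and keep 0 as 0."""
    if reward > 0:
        return 1
    if reward < 0:
        return -1
    return 0


# Space Invaders-style and Pong-style rewards end up on the same scale.
print([clip_reward(r) for r in (30, 10, 1, 0, -1)])
```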
  • 35. Performance ● Figure: recent graph from Google DeepMind, 2018 (current trend in RL gaming) ● Figure: naïve DQN vs. replay-buffer-based DQN
  • 36. Strengths and Weaknesses ● Good at: quick-moving, complex, short-horizon games; semi-independent trials within the game; negative feedback on failure ● Bad at: long-horizon games that don’t converge; any “walking around” game; Montezuma’s Revenge ● Worldly knowledge helps humans play these games relatively easily.
  • 37. Example Code ● DQN with Atari Game ○ Colab Jupyter notebooks
  • 38. References ● Rich Sutton, Reinforcement Learning: An Introduction, 2017 ● Deep Reinforcement Learning: An Overview, 2017, https://arxiv.org/pdf/1701.07274.pdf ● UCL course, Reinforcement Learning, http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html ● CS231n, Reinforcement Learning, Lecture 14, 2017, http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture14.pdf ● Thomas Simonini, Medium post “An introduction to Reinforcement Learning”, https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419 ● Arthur Juliani, Medium post “Simple Reinforcement Learning in Tensorflow”, https://medium.com/@awjuliani/super-simple-reinforcement-learning-tutorial-part-1-fd544fab149