1
An introduction to Reinforcement Learning
by Xander Steenbrugge
2
“The difference in mind between man and the higher animals, great as it is, certainly is one of degree and not of kind.”
-- Charles Darwin
3
Overview
1. A brief history of AI
2. Machine Learning today
3. Introduction to Reinforcement Learning
4. Problems in Reinforcement Learning
5. Promising Research
6. A look into the future
4
A brief history of AI
5
1950: Alan Turing’s “Turing test”
6
1997: IBM’s Deep Blue
7
Chess branching factor ~ 35
8
Chessbot basics
MiniMax Search
● Concept:
“Maximize the evaluation of your
move while minimizing opponent's
move evaluation”
● Reviews each possible move sequence
● Has high time cost since every possible
future board position must be evaluated
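A minimal sketch of the MiniMax idea, assuming a hypothetical toy game rather than chess: players alternately remove 1, 2 or 3 stones from a pile, and whoever takes the last stone wins. The names `legal_moves`, `minimax` and `best_move` are illustrative only. The search visits every possible move sequence, which is exactly why the time cost grows so quickly.

```python
from typing import List, Tuple


def legal_moves(stones: int) -> List[int]:
    """Hypothetical toy game: remove 1, 2 or 3 stones; taking the last stone wins."""
    return [n for n in (1, 2, 3) if n <= stones]


def minimax(stones: int, maximizing: bool) -> int:
    """Return +1 if the maximizing player wins under perfect play, else -1."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximizing else 1
    scores = [minimax(stones - n, not maximizing) for n in legal_moves(stones)]
    # Maximize our own evaluation, assume the opponent minimizes it.
    return max(scores) if maximizing else min(scores)


def best_move(stones: int) -> Tuple[int, int]:
    """Return (move, score) with the highest minimax score for the player to move."""
    scored = [(minimax(stones - n, False), n) for n in legal_moves(stones)]
    score, move = max(scored)
    return move, score


if __name__ == "__main__":
    print(best_move(5))  # leaves the opponent with a pile of 4
```

A real chess engine cannot search to the end of the game, so it cuts the recursion at a fixed depth and scores the position with a heuristic evaluation instead.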
9
Minimax strategy + brute-force search + heuristic-based state evaluations
10
2011: Apple’s Siri, IBM’s Watson wins Jeopardy!
11
12
The Deep Learning Revolution: ImageNet 2012
Deep Learning on Google Trends
Ultimate Machine Learning with Google Cloud 13
The old, algorithmic approach
“apple”
“orange”
“banana”
IF (round) THEN
IF (orange AND coarse) THEN
“orange”
ELSE IF (green AND smooth) THEN
“apple”
ELSE IF ...
...
ELSE IF …
“banana”
14
Let the machine find the rules
“apple”
“orange”
“banana”
?
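A small sketch of what "let the machine find the rules" can mean in practice, assuming two hypothetical features per fruit (roundness and hue) and made-up toy numbers. Instead of hand-written IF/ELSE rules, a nearest-centroid classifier derives its decision rule from labelled examples. This is a simple stand-in for illustration, not the ConvNet approach used on real images.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Tuple

Features = Tuple[float, ...]

# Hypothetical toy data: (roundness 0..1, hue 0..1). The numbers are made up.
TRAIN: List[Tuple[Features, str]] = [
    ((0.90, 0.30), "apple"), ((0.95, 0.35), "apple"),
    ((0.90, 0.10), "orange"), ((0.85, 0.08), "orange"),
    ((0.20, 0.17), "banana"), ((0.25, 0.15), "banana"),
]


def fit_centroids(samples: List[Tuple[Features, str]]) -> Dict[str, Features]:
    """Learn one mean feature vector per label from the examples."""
    groups: Dict[str, List[Features]] = defaultdict(list)
    for features, label in samples:
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in groups.items()
    }


def predict(centroids: Dict[str, Features], features: Features) -> str:
    """Return the label whose learned centroid is closest to the input."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    model = fit_centroids(TRAIN)
    print(predict(model, (0.22, 0.16)))
```

Adding a new fruit here means adding examples, not writing another ELSE IF branch.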
Confidential & Proprietary
ConvNets
Keys to Successful ML
Large Datasets, Good Models, Lots of Computation
17
Machine Learning Today
Machine Learning is everywhere already…
20
Rapidly Accelerating Use of Deep Learning at Google
[Chart: number of projects using some form of deep learning, 2012 to 2015, on a scale of 0 to 1500]
Used across products:
Google Cloud Platform
21
Speech recognition
Audio Input → Deep Recurrent Neural Network → Text Output
● Reduced word errors by more than 30%
● 20% of Mobile queries are Voice Search
Google Research blog, August 2012, August 2015
“How cold is it outside?”
22
Google Translate
24
Datatonic & you
Machine learning:
In popular culture:
+ ‘The next big thing’
+ Sentient AI in the next 10 years
+ Will put humans out of a job
+ Foolproof
25
Machine learning:
Really:
+ Been around for 60 years now
+ ‘Sentient next year’, every year, for the last 60 years
+ AI winters: 1970, 1990, … ?
+ Not foolproof
26
A person on a beach flying a kite.
A person skiing down a snow covered slope.
A group of giraffe standing next to each other.
27
A woman riding a horse on a dirt road.
An airplane is parked on the tarmac at an airport.
A group of people standing on top of a beach.
28
29
Image classifiers are easily fooled!
30
31
Practical limitations of current AI systems
● Supervised: need large amounts of annotated training data
● Static inference machines
● Poor transfer of learning to new tasks
32
Introduction to Reinforcement Learning
33
34
Policy
35
Policy
36
37
Policy networks
Raw Pixels
38
39
Policy Gradients
Run a policy for a while. See what actions led to high rewards. Increase their probability.
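The one-line recipe above can be sketched on the simplest possible problem, a multi-armed bandit. This is a minimal REINFORCE-style example under stated assumptions: the policy is a softmax over one logit per action, rewards are 0 or 1, and the function name `train_bandit`, the learning rate and the running-average baseline are all illustrative choices, not something taken from the slides.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn logits into action probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(reward_probs: List[float], steps: int = 2000,
                 lr: float = 0.1, seed: int = 0) -> List[float]:
    """Run the policy, see which actions paid off, and raise their probability."""
    rng = random.Random(seed)
    logits = [0.0] * len(reward_probs)
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)
        # d log pi(action) / d logit_i = 1[i == action] - probs[i]
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    print(train_bandit([0.2, 0.8]))
```

With raw pixels as input, the table of logits is replaced by a policy network, but the update rule keeps the same shape.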
40
Practical Applications of RL
41
42
Data Centre cooling
46
Chess branching factor ~ 35
47
Go branching factor ~ 250
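A rough back-of-the-envelope sketch of why the branching factor matters: a full-width search to depth d visits on the order of b^d positions. The function name `positions` and the chosen depths are illustrative, and the branching factors are the approximate values quoted on the slides.

```python
def positions(branching_factor: int, depth: int) -> int:
    """Approximate number of positions in a full-width search tree of the given depth."""
    return branching_factor ** depth


if __name__ == "__main__":
    for depth in (2, 4, 6):
        chess = positions(35, depth)
        go = positions(250, depth)
        print(f"depth {depth}: chess ~{chess:.2e}, Go ~{go:.2e}, ratio ~{go / chess:.0f}x")
```

The gap between the two games widens with every extra move of lookahead, so brute-force search alone does not carry over from chess to Go.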
48
Neural nets to the rescue!
49
Deep Learning enhanced MiniMax
50
Advantages that AlphaGo can leverage
1. Fully deterministic: no noise in the game.
2. Fully observed: each player has complete information and there are no hidden variables (unlike poker, for example).
3. Discrete action space.
4. Each game is relatively short (approximately 200 actions).
5. Target function is clear (win/lose) and fast to evaluate.
6. Huge datasets of human gameplay are available to bootstrap the learning, so AlphaGo doesn’t have to start from scratch.
51
52
53
Image Segmentation
54
Deepdrive in GTA V
55
OpenAI Universe
Universe Starter Agent
56
Problems in Reinforcement Learning
57
Bad sample efficiency
58
Cold Start
59
Exploration vs Exploitation
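One common way to illustrate the exploration vs exploitation trade-off is an epsilon-greedy agent on a multi-armed bandit: with probability epsilon it tries a random arm, otherwise it pulls the arm with the best estimate so far. The sketch below assumes 0/1 rewards; the function name `epsilon_greedy` and its default parameters are illustrative.

```python
import random
from typing import List, Tuple


def epsilon_greedy(reward_probs: List[float], epsilon: float = 0.1,
                   steps: int = 5000, seed: int = 0) -> Tuple[List[float], List[int], float]:
    """Return (value estimates, pull counts, average reward) after `steps` pulls."""
    rng = random.Random(seed)
    n = len(reward_probs)
    values = [0.0] * n  # running mean reward per arm
    counts = [0] * n
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: pick any arm at random
        else:
            arm = max(range(n), key=lambda i: values[i])  # exploit: best estimate so far
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return values, counts, total / steps


if __name__ == "__main__":
    values, counts, average = epsilon_greedy([0.2, 0.5, 0.8])
    print(counts, round(average, 3))
```

Setting epsilon to 0 makes the agent purely greedy, which risks locking onto whichever arm happened to pay out first.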
60
Subgoal creation
61
Promising Research
62
Attention
63
64
Memory
65
Domain Transfer
66
PathNet
67
Physical Intuition
[Figure panels: Simulation, Real; Ground Truth, Prediction]
68
Auxiliary Learning Signals
Policy
69
Auxiliary Learning Signals
70
Auxiliary Learning Signals
71
Auxiliary Learning Signals (continued)
Divide observations into 3 classes:
1. Things that the agent can control
2. Things that the agent cannot control but that affect it
3. Things that the agent cannot control and that do not affect it
A good feature space for curiosity should model (1) and (2) and be unaffected by (3).
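A toy sketch of why the feature space matters, assuming one common formulation in which the curiosity reward is the prediction error of a forward model in feature space. Everything here is hypothetical: the observation is a pair (agent position, uncontrollable noise), and `predict_next` is a hand-written stand-in for a learned model. When the features drop the noise, a good model leaves no reward to chase; with raw observations, the noise keeps generating "curiosity" forever.

```python
import random
from typing import Callable, Tuple

Obs = Tuple[float, float]  # (agent position, uncontrollable noise), a hypothetical layout
Features = Tuple[float, ...]


def step(obs: Obs, action: int, rng: random.Random) -> Obs:
    """Toy environment: the action moves the agent, the noise is resampled."""
    return (obs[0] + action, rng.uniform(-1.0, 1.0))


def raw_features(obs: Obs) -> Features:
    """Feature space that also contains class (3): the irrelevant noise."""
    return obs


def filtered_features(obs: Obs) -> Features:
    """Feature space that keeps only what the agent can control."""
    return (obs[0],)


def predict_next(features: Features, action: int) -> Features:
    """Stand-in forward model: knows the position dynamics, expects the rest to stay put."""
    return (features[0] + action,) + tuple(features[1:])


def curiosity_reward(phi: Callable[[Obs], Features], obs: Obs,
                     action: int, next_obs: Obs) -> float:
    """Intrinsic reward = half the squared prediction error in feature space."""
    predicted = predict_next(phi(obs), action)
    actual = phi(next_obs)
    return 0.5 * sum((p - a) ** 2 for p, a in zip(predicted, actual))


if __name__ == "__main__":
    rng = random.Random(0)
    obs = (0.0, 0.0)
    nxt = step(obs, 1, rng)
    print(curiosity_reward(filtered_features, obs, 1, nxt))
    print(curiosity_reward(raw_features, obs, 1, nxt))
```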
72
Auxiliary Learning Signals (continued)
73
Auxiliary Learning Signals (continued)
74
Learning to communicate
[Diagram labels: Observations, Memory State, Message → Policy Network → Memory State, Message]
75
Third person imitation learning
76
Adversarial Networks
77
Generative Networks
78
Generative Networks
79
Generative Networks
80
Unsupervised Learning’s potential
[Figure: sample handwritten digits 7, 2, 1, 0, 4]
FFN < Auto-Encoder < GAN
81
Personal thoughts
▪ “Intelligent software will become the main driver of most technological advances in the next decade”
➢ Self driving cars
➢ Personal, digital assistants (Siri, Viv, Alexa, ...)
➢ Machine generated/augmented content
▪ “Virtual Reality (VR) will become a mainstream experience sharing platform”
▪ “Natural language processing will be fundamental to interacting with all of these new technologies”
Plenty of problems to solve...
Plenty of solutions around...
93
Thanks!
94
We are hiring!
datatonic.com
