An introduction to Reinforcement Learning, with an outlook on some of the most prominent problems and promising research from the past couple of years.
2. 2
“The difference in mind between man and the higher animals, great as it is, certainly is one of degree and not of kind.”
-- Charles Darwin
3. 3
Overview
1. A brief history of AI
2. Machine Learning today
3. Introduction to Reinforcement Learning
4. Problems in Reinforcement Learning
5. Promising Research
6. A look into the future
8. 8
Chessbot basics
MiniMax Search
● Concept: “Maximize the evaluation of your move while minimizing the evaluation of your opponent's move”
● Reviews each possible move sequence
● Has a high time cost, since every possible future board position must be evaluated
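The idea can be sketched in a few lines of Python. This is a minimal illustration, not a chess engine: the game tree is a hypothetical toy encoding (nested lists whose leaves are position evaluations), and the names `minimax` and `best_move` are chosen here for illustration only.

```python
from typing import List, Union

# A game tree as nested lists: a leaf is a numeric evaluation of a board
# position (from the maximizing player's point of view); an inner node is
# the list of positions reachable in one move. (Hypothetical toy encoding.)
Tree = Union[float, List["Tree"]]


def minimax(node: Tree, maximizing: bool) -> float:
    """Return the evaluation of `node` assuming both sides play optimally."""
    if not isinstance(node, list):  # leaf: static evaluation of the position
        return node
    # Every child is visited, so the cost grows exponentially with depth.
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


def best_move(root: List[Tree]) -> int:
    """Index of the move whose worst-case outcome is best for us."""
    values = [minimax(child, maximizing=False) for child in root]
    return values.index(max(values))


# Two candidate moves, each followed by two possible opponent replies.
toy_tree = [[3, 5], [2, 9]]
print(minimax(toy_tree, maximizing=True))  # 3
print(best_move(toy_tree))                 # 0
```

The second move looks tempting because it can lead to 9, but the opponent would answer with the reply worth 2, so the first move (worst case 3) is preferred.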
13. Ultimate Machine Learning with Google Cloud 13
The old, algorithmic approach
“apple”
“orange”
“banana”
IF (round) THEN
IF (orange AND coarse) THEN
“orange”
ELSE IF (green AND smooth) THEN
“apple”
ELSE IF ...
...
ELSE IF …
“banana”
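Written as code, the hand-crafted approach might look like the sketch below. The attribute names and string values are assumptions made for illustration; the branches that the slide elides are deliberately not reconstructed.

```python
def classify_fruit(shape: str, color: str, texture: str) -> str:
    """Hand-written rules: every case must be anticipated by the programmer."""
    if shape == "round":
        if color == "orange" and texture == "coarse":
            return "orange"
        if color == "green" and texture == "smooth":
            return "apple"
    # The slide's remaining branches (including the one that yields
    # "banana") are elided, so they are not reconstructed here.
    return "unknown"


print(classify_fruit("round", "orange", "coarse"))  # orange
print(classify_fruit("round", "green", "smooth"))   # apple
print(classify_fruit("round", "red", "smooth"))     # unknown
```

A red apple already falls through the rules, which hints at why this style becomes hard to maintain as the number of cases grows.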
14. Ultimate Machine Learning with Google Cloud 14
Let the machine find the rules
“apple”
“orange”
“banana”
?
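A minimal sketch of the alternative: the program is given labelled examples and derives its answers from them. A nearest-neighbour lookup stands in here for the learning algorithm (it is not the method used in the talk), and the training examples and attribute names are made up for illustration.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], str]  # (attributes, label)

# Made-up training examples, for illustration only.
TRAINING: List[Example] = [
    ({"shape": "round", "color": "orange", "texture": "coarse"}, "orange"),
    ({"shape": "round", "color": "green", "texture": "smooth"}, "apple"),
    ({"shape": "round", "color": "red", "texture": "smooth"}, "apple"),
    ({"shape": "long", "color": "yellow", "texture": "smooth"}, "banana"),
]


def similarity(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Number of attributes on which two descriptions agree."""
    return sum(1 for key in a if a[key] == b.get(key))


def predict(attributes: Dict[str, str],
            training: List[Example] = TRAINING) -> str:
    """Return the label of the most similar training example."""
    _, label = max(training, key=lambda ex: similarity(attributes, ex[0]))
    return label


# No rule was ever written for a coarse yellow fruit.
print(predict({"shape": "long", "color": "yellow", "texture": "coarse"}))  # banana
```

Changing the behaviour now means adding or correcting examples, not editing branches of code.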
20. 20
Rapidly Accelerating Use of Deep Learning at Google
[Chart: number of projects using some form of deep learning (vertical axis, 0 to 1500), by year (horizontal axis, 2012 to 2015)]
Used across products:
21. Confidential & Proprietary, Google Cloud Platform 21
Speech recognition
Audio Input → Deep Recurrent Neural Network → Text Output
● Reduced word errors by more than 30%
● 20% of mobile queries are voice searches
Google Research blog, August 2012, August 2015
“How cold is it outside?”
24. Datatonic & you 24
Machine learning:
In popular culture:
+ ‘The next big thing’
+ Sentient AI in the next 10 years
+ Will put humans out of a job
+ Foolproof
25. 25
Machine learning:
Really:
+ Has been around for 60 years now
+ ‘Sentient next year’, every year, for the last 60 years
+ AI winters: 1970, 1990, … ?
+ Not foolproof
26. 26
A person on a beach flying a kite.
A person skiing down a snow covered slope.
A group of giraffe standing next to each other.
27. 27
A woman riding a horse on a dirt road.
An airplane is parked on the tarmac at an airport.
A group of people standing on top of a beach.
31. 31
Practical limitations of current AI systems
● Supervised: they need large amounts of annotated training data
● Static inference machines
● Poor transfer of learned capabilities to new tasks
50. 50
Advantages that AlphaGo can leverage
1. Fully deterministic: there is no noise in the game.
2. Fully observed: each player has complete information and there are no hidden variables (unlike poker, for example).
3. Discrete action space.
4. Each game is relatively short (approximately 200 actions).
5. The target function is clear (win/lose) and fast to evaluate.
6. Huge datasets of human gameplay are available to bootstrap the learning, so AlphaGo doesn’t have to start from scratch.
71. 71
Auxiliary Learning Signals (continued)
Divide observations into three classes:
1. Things that the agent can control
2. Things that the agent cannot control but that affect it
3. Things that the agent cannot control and that do not affect it
A good feature space for curiosity should model (1) and (2) and be unaffected by (3).
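A toy sketch of why the feature space matters, assuming curiosity is measured as the error of a forward model that predicts the next features (the slide does not specify the formulation). The observation layout, the hand-picked `features` function, and the `predict_next` dynamics are all hypothetical; in a real agent both the features and the forward model would be learned.

```python
from typing import Tuple

# Hypothetical observation: (agent_x, wind, tv_noise)
#   agent_x  -> class (1): the agent controls it through its actions
#   wind     -> class (2): not controllable, but it pushes the agent
#   tv_noise -> class (3): not controllable and irrelevant to the agent
Observation = Tuple[float, float, float]
Features = Tuple[float, float]


def features(obs: Observation) -> Features:
    """Hand-picked feature space: keep (1) and (2), drop (3)."""
    agent_x, wind, _tv_noise = obs
    return (agent_x, wind)


def predict_next(feat: Features, action: float) -> Features:
    """Assumed forward model: the agent moves by its action plus the wind."""
    agent_x, wind = feat
    return (agent_x + action + wind, wind)


def curiosity(obs: Observation, action: float, next_obs: Observation) -> float:
    """Intrinsic reward: squared prediction error in feature space."""
    predicted = predict_next(features(obs), action)
    actual = features(next_obs)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


start = (0.0, 0.5, 0.1)
# Expected outcome, even though the TV noise changed a lot: no curiosity.
print(curiosity(start, 1.0, (1.5, 0.5, 0.9)))  # 0.0
# The agent ended up somewhere unexpected: curiosity is high.
print(curiosity(start, 1.0, (2.5, 0.5, 0.9)))  # 1.0
```

Had the noisy component been part of the feature space, the agent would be rewarded for staring at unpredictable but irrelevant input.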
84. Personal thoughts
▪ “Intelligent software will become the main driver of most technological advances in the next decade”
➢ Self-driving cars
➢ Personal, digital assistants (Siri, Viv, Alexa, ...)
➢ Machine generated/augmented content
89. Personal thoughts
▪ “Intelligent software will become the main driver of most technological advances in the next decade”
▪ “Virtual Reality (VR) will become a mainstream experience sharing platform”
▪ “Natural language processing will be fundamental to interacting with all of these new technologies”