© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Julien Simon
Principal Technical Evangelist, AI & Machine Learning, AWS
@julsimon
An Introduction
to Reinforcement Learning
Types of Machine Learning
Supervised learning
Run an algorithm on a labelled data set, i.e. a data set containing samples
and answers. Gradually, the model learns to predict the right answer.
Regression and classification are examples of supervised learning.
Unsupervised learning
Run an algorithm on an unlabelled data set, i.e. a data set containing
samples only. Here, the model progressively learns patterns in data and
organizes samples accordingly. Clustering and topic modeling are examples
of unsupervised learning.
Types of Machine Learning
[Chart: supervised learning, unsupervised learning, and reinforcement learning (RL) positioned along two axes: sophistication of ML models, and amount of training data required.]
Remember when you first learned this?
Or this?
We didn’t have an extensive labelled data
set back then
And yet we learned
How?
Defining Reinforcement Learning
An algorithm (aka an agent) interacts with its
environment.
The agent receives a positive or negative reward
for actions that it takes: rewards are computed by
a user-defined function which outputs a numeric
representation of the actions that should be
incentivized.
By trying to maximize the accumulation of
rewards, the agent learns an optimal strategy (aka
policy) for decision making.
Source: Wikipedia
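To make the definition concrete, here is a minimal sketch of the agent/environment/reward loop. Everything in it is an assumption made for illustration: the one-dimensional "line world", the reward values, and the names reward_function, run_episode, always_right and random_policy do not come from the slides.

```python
import random

def reward_function(state: int, target: int = 5) -> float:
    """Hypothetical user-defined reward: +1 at the target, small penalty elsewhere."""
    return 1.0 if state == target else -0.1

def run_episode(policy, start: int = 0, target: int = 5, max_steps: int = 20) -> float:
    """Let an agent act in a 1-D line world and return the accumulated reward."""
    state, total = start, 0.0
    for _ in range(max_steps):
        action = policy(state)                        # agent picks -1 (left) or +1 (right)
        state = max(0, min(target, state + action))   # environment applies the action
        total += reward_function(state, target)       # environment scores the new state
        if state == target:
            break
    return total

def always_right(state: int) -> int:
    """A fixed policy that happens to be optimal in this toy world."""
    return 1

def random_policy(state: int) -> int:
    """A policy that has learned nothing yet."""
    return random.choice([-1, 1])

print("always_right:", run_episode(always_right))
print("random_policy:", run_episode(random_policy))
```

A learning agent would start out behaving like random_policy and, by maximizing the accumulated reward, drift towards something like always_right.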
Use cases
• Large complex problems
• Uncertain, dynamic environments
• Continuous learning
• Supply chain management
• HVAC systems
• Industrial robotics
• Autonomous vehicles
• Portfolio management
• Oil exploration
• etc.
Caterpillar: 250-ton autonomous mining trucks
https://diginomica.com/2017/04/17/sending-disruption-mines/
https://www.cat.com/en_US/articles/customer-stories/built-for-it/thefutureisnow-driverless.html
Example: navigating a maze
• Imagine an agent learning to navigate a maze. It can move in certain directions but is
blocked from going through walls.
• The agent discovers its environment (the current maze) one step at a time, receiving a
reward each time: stepping into a dead end is a negative reward, moving one step closer
to the exit is a positive reward.
• After a certain number of steps (or when the agent finds the exit), the current episode ends.
• After a certain number of episodes, the agent uses the action/reward data points to
train a model, in order to make better decisions next time around.
• One critical thing to understand is that the RL model isn’t trained on a predefined set of
labelled mazes (that would be supervised learning).
• This cycle of exploring and training is central to RL: given enough mazes and enough
training time, the agent would eventually learn how to navigate any maze.
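The maze example can be sketched in a few lines of Python. This is not the method used in the talk: it swaps in tabular Q-learning, which updates its estimates after every step instead of training a model after a batch of episodes. The 3x3 maze, the reward values (a penalty for bumping into a wall, a small cost per step, a bonus at the exit) and all names are assumptions chosen to keep the sketch short.

```python
import random

# Hypothetical maze: S = start, E = exit, # = wall, . = free cell
MAZE = ["S.#",
        ".#.",
        "..E"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    r, c = state
    nr, nc = r + action[0], c + action[1]
    blocked = not (0 <= nr < 3 and 0 <= nc < 3) or MAZE[nr][nc] == "#"
    if blocked:
        return state, -1.0, False       # walls block the move (assumed penalty)
    if MAZE[nr][nc] == "E":
        return (nr, nc), 10.0, True     # reaching the exit ends the episode
    return (nr, nc), -0.1, False        # small cost for every other step

def train(episodes=500, max_steps=50, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Explore the maze episode by episode and learn action values."""
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> estimated value
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(4)    # explore
            else:
                a = max(range(4), key=lambda i: q.get((state, i), 0.0))  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q.get((nxt, i), 0.0) for i in range(4))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q

def greedy_path(q, max_steps=20):
    """Follow the learned values without exploring."""
    state, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        a = max(range(4), key=lambda i: q.get((state, i), 0.0))
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path

print(greedy_path(train()))
```

Note that no labelled maze is involved: the only training signal is the reward returned by step.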
Environment
• The space in which the RL model operates.
• This can be either a real-world environment
or a simulator.
• If you train a physical autonomous vehicle
on a physical road, that would be a
real-world environment.
• If you train a computer program that
models an autonomous vehicle driving on a
road, that would be a simulator… probably
a much safer option!
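As an illustration of the simulator option, here is a toy environment exposing reset and step methods. The method names are loosely modelled on the convention popularized by OpenAI Gym, but this is plain Python, not the Gym API, and the class name LaneKeepingSimulator, the lane half-width and the reward values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class LaneKeepingSimulator:
    """Hypothetical simulator: a car drifting laterally inside a lane."""
    half_width: float = 1.0   # distance from lane center to lane edge
    position: float = 0.0     # lateral offset from the lane center

    def reset(self) -> float:
        """Start a new episode and return the first observation."""
        self.position = 0.0
        return self.position

    def step(self, steering: float):
        """Apply a steering offset; return (observation, reward, done)."""
        self.position += steering
        off_road = abs(self.position) > self.half_width
        # Assumed reward: highest at the center, large penalty off the road.
        reward = -10.0 if off_road else 1.0 - abs(self.position)
        return self.position, reward, off_road

env = LaneKeepingSimulator()
observation = env.reset()
observation, reward, done = env.step(0.25)
print(observation, reward, done)
```

Leaving the road in a simulator only costs a negative reward and a call to reset, which is the point of the slide's "much safer option".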
Exploitation vs Exploration
• Selecting the next action is a balance
between exploitation (‘using what you’ve
learned’) and exploration (‘taking a chance
to learn new things’).
• If you favor exploitation, you may never
reach high-value rewards.
• If you favor exploration, you’ll probably run
into trouble very often!
• Initially, the agent will explore at random
for a fixed number of episodes (aka heatup
phase): this generates data for the first
round of training.
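One common way to strike this balance is an epsilon-greedy rule, sketched below together with a heatup phase. The slides do not say which exploration strategy is used, so the rule itself, the function name choose_action and the default values for heatup_episodes and epsilon are all assumptions.

```python
import random

def choose_action(q_values, episode, heatup_episodes=10, epsilon=0.1, rng=random):
    """Pick an action index given a list of estimated action values."""
    if episode < heatup_episodes:
        return rng.randrange(len(q_values))    # heatup: purely random exploration
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))    # explore: take a chance
    # exploit: use what has been learned so far
    return max(range(len(q_values)), key=q_values.__getitem__)

estimates = [0.1, 0.9, 0.3]
print(choose_action(estimates, episode=0))    # random during heatup
print(choose_action(estimates, episode=50))   # usually the best-valued action
```

Setting epsilon close to 0 favors exploitation, and setting it close to 1 favors exploration.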
Training an RL model
1. Formulate the problem: goal, environment, state, actions, reward
2. Define the environment: real-world or simulator?
3. Define the presets
4. Write the training code and the value function
5. Train the model
Amazon SageMaker RL
Reinforcement learning for every developer and data scientist
Key features
• Fully managed
• Broad support for frameworks: TensorFlow, Apache MXNet, Intel Coach, and Ray RL support
• Broad support for simulation environments, including Simulink and MATLAB
• 2D & 3D physics environments and OpenAI Gym support
• Supports Amazon Sumerian and Amazon RoboMaker
• Example notebooks and tutorials
How can we get developers rolling
with reinforcement learning?
Introducing AWS DeepRacer
Fully autonomous 1/18th scale race car, driven by reinforcement learning
https://youtu.be/X-6v4RZy-TE
• HD video camera
• Dual-core Intel processor
• Four-wheel drive
• Dual power for compute and drive
• Accelerometer
• Gyroscope
AWS DeepRacer
AWS DeepRacer League
Competitive racing league for AWS DeepRacer
• Train models with RL
• Compete virtually online
• Race in trials
• Final at AWS re:Invent
Getting started
http://aws.amazon.com/free
https://ml.aws
https://aws.amazon.com/sagemaker
https://aws.amazon.com/deepracer/
https://github.com/aws/sagemaker-python-sdk
https://github.com/awslabs/amazon-sagemaker-examples
https://medium.com/@julsimon
https://gitlab.com/juliensimon/dlnotebooks
Thank you!
Julien Simon
Principal Technical Evangelist, AI & Machine Learning, AWS
@julsimon

An Introduction to Reinforcement Learning (December 2018)

Editor's Notes

  • #4 1/The type of datasets Ground Truth typically helps create can be used to create extremely sophisticated models using a method called ‘supervised’ learning; this is common with computer vision, speech, and language. 2/It’s how we train Rekognition - our computer vision service is trained on tens of millions of labeled images, Polly’s lifelike voices come from hundreds of hours of scripted voice recordings, and so forth. 3/The sheer volume of the data, combined with deep learning neural networks, allows us to train models with human-like capabilities based on that data. 4/At the other end of this spectrum is ‘unsupervised’ learning, where algorithms don’t need large volumes of labeled data. 5/These approaches are commonly used for use cases such as anomaly detection; where the algorithm is only looking for statistical outliers in, say, a stream of data from an IoT temperature sensor. When it detects that the temperature is changing in a meaningful way, the model can send a signal and take action (open a window, for example). 6/These models are no less useful - in fact they are complementary to supervised methods - but they don’t attempt to mimic human level intelligence in the same way.
  • #5 1/ In the bottom right, we have a no man’s land where for the obvious reasons of not wanting to invest a lot for little gain, there’s no meaningful research happening. 2/ But, there’s fertile ground in the upper left
  • #21 1/ There are a lot of demands placed on organizations when dealing with documents. What they typically want to be able to do sounds straightforward… 2/ They want to be able to identify documents in any format; 3/ and then extract text from those documents, accurately. 4/ But there are a whole ton of challenges which make this difficult, such as the variety of forms and formats, and the quality. 5/ The way customers try to overcome this complexity today is either by manual review (which is accurate, but time consuming and expensive), or 6/ with simple OCR and/or… 7/ template based data extraction (which is fast, but tends not to be accurate enough, so they end up sending the documents to manual review or verification anyway). TRANSITION: we think there is a better way, and that instead of manual reviews, simplistic OCR, and templates, we can replace that heavy lifting with smart, cheap, powerful machine learning…
  • #23 1/ DeepRacer is a physical device, about the size of a shoe box, which is packed full of everything you need to learn about reinforcement learning through autonomous driving. 2/ It has an HD video camera mounted high up, so it can get a good view of the road ahead; 3/ To make it work, you access a fully configured 3D physics simulator available in the cloud, with a track and a virtual car ready to start training. 4/ All you need to do is provide a simple - or complex - scoring function, using simple Python code, and with a single click, we’ll train the model in the simulator using reinforcement learning in SageMaker - you can watch in real time if you wish to see how the learning is going. 5/ Then just take your model, load it onto DeepRacer, and watch it go… We think this is a really interesting and fun way to get started with reinforcement learning, and as we started to experiment with this internally, a funny thing happened… The teams started racing against each other, continually tweaking and adjusting their reward functions for speed around a virtual track. Factions sprang up, it got pretty competitive, and developers’ knowledge and experience with RL grew almost exponentially… In fact, we had so much fun that we wanted to bring this to our customers, and so today, I’m also announcing…
  • #24 Here’s how the league will work… 1/ Anyone can build an RL model in SageMaker (or develop it on their own and bring it to SageMaker) 2/ At our 20 or so AWS Summits in 2019 we’ll hold a DeepRacer League Race; you can compete in as many of these as you like. 3/ Winner of each DRL Race and top 10 points getters qualify for the DRL Championship Cup held at re:Invent 2019 here in Vegas. 4/ We’ll also have virtual events and tournaments throughout the year, likely about 20, where we will take the winners and top 10 points getters to the Championship Cup at re:Invent. 5/ While there will be individual prizes for each race, the big prize is the Championship Cup at re:Invent 6/ This year, for 2018, because we don’t have as much lead time, we’re doing an accelerated version for our first Championship Cup.
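
The note for #23 mentions that training a DeepRacer model only requires a scoring (reward) function written in simple Python. A minimal sketch of what such a function could look like follows. The function name reward_function and the keys all_wheels_on_track, track_width and distance_from_center are modelled on the input parameters described in the DeepRacer documentation, but treat them as assumptions and check the current documentation before relying on them. The band thresholds and reward values are purely illustrative.

```python
def reward_function(params):
    """Reward the car for staying close to the center line (illustrative values)."""
    # Assumed input keys: all_wheels_on_track, track_width, distance_from_center
    if not params["all_wheels_on_track"]:
        return 1e-3                       # near-zero reward when the car leaves the track

    track_width = params["track_width"]
    distance = params["distance_from_center"]

    # Three bands around the center line: the further out, the smaller the reward.
    if distance <= 0.1 * track_width:
        return 1.0
    if distance <= 0.25 * track_width:
        return 0.5
    if distance <= 0.5 * track_width:
        return 0.1
    return 1e-3

# Example call with a hand-written parameter dictionary
print(reward_function({"all_wheels_on_track": True,
                       "track_width": 1.0,
                       "distance_from_center": 0.05}))
```

Tweaking the bands, or adding terms for speed or steering, is the kind of reward-function adjustment the teams in the note were competing on.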