© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS DeepRacer
Revving up with Reinforcement Learning
How can we put reinforcement learning in the hands of all developers? Literally.
What is AWS DeepRacer?
Robotic autonomous race car
Racing league
Virtual simulator, to train and experiment
My first attempt at building a self-driving car… (2014)
AWS Robocar Rally (2017)
What is Reinforcement Learning?
Machine learning overview
SUPERVISED, UNSUPERVISED, REINFORCEMENT
RL vs. other approaches for robotic racing
METHOD: Supervised learning
HOW IT WORKS: An expert driver controls a real-world car that has a camera. The camera images are saved as inputs and the corresponding driving actions (speed and steering angle) as outputs, and a model is trained on them.
RESULT: Feed the state (an image) into the model and receive a driving action.
METHOD: Reinforcement learning
HOW IT WORKS: A virtual agent repeatedly interacts with a simulated environment and logs its experience (image, action, new state, reward). The experience is used to train a model, and the new model is used to gather more experience.
RESULT: Feed the state (an image) into the model and receive a driving action.
Reinforcement learning use cases
AUTONOMOUS CARS, FINANCIAL TRADING, DATACENTER COOLING, FLEET LOGISTICS
RL for A/B testing
Reinforcement learning terms
AGENT, ENVIRONMENT, STATE, ACTION, EPISODE, REWARD
How does learning happen?
VALUE FUNCTION, POLICY FUNCTION
Policy function: input → output
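A policy function maps a state (the input) to a choice of action (the output). A minimal sketch, assuming a hypothetical discrete action space of three steering choices and a state made of two numbers (the car's offset from the center line plus a constant bias term). A real DeepRacer policy is a neural network that takes camera images as input; here a linear score per action followed by a softmax stands in for it, and the weights are made up.

```python
import math

# Hypothetical discrete action space
ACTIONS = ["steer left", "straight", "steer right"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def policy(state, weights):
    """Map a state (feature vector) to one probability per action."""
    scores = [sum(w * x for w, x in zip(row, state)) for row in weights]
    return softmax(scores)

# Hypothetical weights: one row per action, one column per state feature
# (offset from center, constant bias). A positive offset is taken to mean the
# car is right of center, so "steer left" gets a higher score.
WEIGHTS = [
    [2.0, 0.0],   # steer left
    [0.0, 0.5],   # straight
    [-2.0, 0.0],  # steer right
]

probs = policy([1.0, 1.0], WEIGHTS)
print(dict(zip(ACTIONS, probs)))
```

Training changes only `WEIGHTS`; the input-to-output mapping stays the same shape, which is what the algorithms on the following slides exploit.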
RL algorithms: Vanilla policy gradient
Figure: the objective J(θ) is improved by nudging each weight (e.g. 0.4 ± δ, 0.3 ± δ) to get new weights.
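Vanilla policy gradient (REINFORCE) nudges each weight in the direction that makes rewarded actions more likely: θ ← θ + α · G · ∇θ log πθ(a). A minimal sketch on a hypothetical one-step task with two actions and made-up, fixed rewards, using a softmax policy over one preference per action; for a softmax, the gradient of log π(a) with respect to preference k is 1[k = a] - π(k).

```python
import math
import random

def softmax(prefs):
    m = max(prefs)                       # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(rewards, episodes=2000, alpha=0.1, seed=0):
    """Vanilla policy gradient on a hypothetical one-step task.

    rewards[a] is the (made-up, fixed) reward for action a. theta holds one
    preference per action, and the policy is softmax(theta).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(episodes):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        G = rewards[action]              # return of this one-step episode
        for k in range(len(theta)):
            # d/d theta[k] of log pi(action) for a softmax policy
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += alpha * G * grad_log   # nudge each weight by alpha * G * gradient
    return softmax(theta)

print(reinforce([1.0, 0.2]))
```

Because action 0 earns more, its probability climbs toward 1 over the episodes. No baseline is subtracted from `G` here, which keeps the sketch "vanilla" but makes updates noisier than in practical implementations.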
RL algorithms: Proximal policy optimization (PPO)
Figure: experience tuples (state, action, reward, next state), written (sₜ, aₜ, rₜ, sₜ₊₁), feed an advantage estimate that yields an improved model.
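PPO turns each logged (sₜ, aₜ, rₜ, sₜ₊₁) tuple into an advantage estimate, then improves the policy by maximizing a clipped surrogate objective: the probability ratio π_new(a|s) / π_old(a|s) is clipped to [1 - ε, 1 + ε] so that one update cannot move the policy too far. A minimal sketch of those two pieces in plain Python; it uses a one-step advantage r + γV(s') - V(s) for simplicity (PPO implementations commonly use generalized advantage estimation instead), and the value estimates and probabilities below are made up.

```python
def one_step_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """Estimate A(s, a) from one (s, a, r, s') tuple: r + gamma * V(s') - V(s)."""
    target = reward if done else reward + gamma * value_next
    return target - value_s

def ppo_clip_objective(new_probs, old_probs, advantages, epsilon=0.2):
    """Average clipped surrogate objective over a batch (higher is better)."""
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old                              # pi_new(a|s) / pi_old(a|s)
        clipped = max(1 - epsilon, min(ratio, 1 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Made-up batch of three logged actions
advantages = [
    one_step_advantage(1.0, value_s=0.5, value_next=0.5, gamma=0.9),
    one_step_advantage(0.0, value_s=0.8, value_next=0.5, gamma=0.9),
    one_step_advantage(1.0, value_s=0.2, value_next=0.0, done=True),
]
print(advantages)
print(ppo_clip_objective([0.6, 0.2, 0.5], [0.4, 0.4, 0.5], advantages))
```

In a full implementation the new probabilities come from the policy network being optimized, and the objective is maximized by gradient ascent over several passes through the same batch.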
What does a reward function look like?
def reward_function(on_track, x, y, distance_from_center, car_orientation,
                    progress, steps, throttle, steering, track_width, waypoints,
                    closest_waypoint):
    # Example centerline-following reward function: the closer the car
    # stays to the center line, the larger the reward.

    # Bands around the center line, as fractions of the track width
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if 0.0 <= distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track

    return float(reward)
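The signature above passes each input as a separate argument. Later versions of the AWS DeepRacer console pass a single `params` dictionary to `reward_function` instead; the same centerline-following logic can be written against it as sketched below. Only the `track_width` and `distance_from_center` keys are assumed here, and the sanity-check values are made up.

```python
def reward_function(params):
    """Centerline-following reward, reading inputs from a params dict."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Bands around the center line, as fractions of the track width
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track

    return float(reward)

# Sanity check with hand-picked (made-up) distances on a 1.0-wide track
for d in (0.05, 0.2, 0.4, 0.6):
    print(d, reward_function({"track_width": 1.0, "distance_from_center": d}))
```

Calling the function with a few hand-picked inputs like this is a cheap way to confirm each band returns the intended reward before spending time on training.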
Snakes on the (control) plane (@frankmunz)
Explore vs. exploit
Fish and Chips, Chole Poori, Paneer, Uttappam, Khara Dosa
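The menu is an explore-versus-exploit dilemma: keep ordering the dish you already like, or try something new that might be better. A minimal epsilon-greedy sketch, where the satisfaction scores are made up and fixed (each dish always scores the same) to keep the example deterministic: with probability ε the diner picks a random dish (explore), otherwise the dish with the best average so far (exploit).

```python
import random

# Hypothetical satisfaction for each dish on the slide; the diner does not know these.
TRUE_RATING = {
    "Fish and Chips": 0.6,
    "Chole Poori": 0.8,
    "Paneer": 0.7,
    "Uttappam": 0.5,
    "Khara Dosa": 0.9,
}

def epsilon_greedy_diner(visits=1000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    dishes = list(TRUE_RATING)
    estimate = {d: 0.0 for d in dishes}
    count = {d: 0 for d in dishes}
    for _ in range(visits):
        if rng.random() < epsilon:
            dish = rng.choice(dishes)               # explore: try something at random
        else:
            dish = max(dishes, key=estimate.get)    # exploit: order the current favorite
        reward = TRUE_RATING[dish]
        count[dish] += 1
        estimate[dish] += (reward - estimate[dish]) / count[dish]   # running mean
    return estimate, count

estimate, count = epsilon_greedy_diner()
print(count)
```

Without the occasional random order, the diner would keep eating the first dish that scored above zero and never discover the best one; with ε = 0 that is exactly what happens here.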
Explore the grid and accumulate rewards
Episode: the process of exploring the grid and earning rewards until the car moves out of bounds or reaches the goal.
Figure labels: Out of bounds, Final destination.
Iterate! Learning doesn’t happen on the first go!
The model learns which subsequent actions will result in the highest cumulative reward.
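The cumulative reward the model learns to maximize is the return: the sum of rewards from a given step to the end of the episode. A minimal sketch that computes it for every step of one episode; the discount factor γ is an assumption not on the slide (γ = 1 gives the plain sum, while γ < 1 weights near-term rewards more heavily).

```python
def discounted_returns(rewards, gamma=1.0):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):  # work backwards from the last step
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

print(discounted_returns([1.0, 0.5, 1.0]))             # plain cumulative reward
print(discounted_returns([1.0, 0.5, 1.0], gamma=0.9))  # discounted
```

Working backwards lets each return reuse the one after it, so the whole episode is processed in a single pass.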
Agent improves as it gains more experience.
As the agent gains more and more experience, it learns to
stay on the central squares to get higher rewards.
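The grid example can be reproduced with tabular Q-learning, one standard way to learn from repeated episodes (the slides do not say which algorithm the animation used). A minimal sketch under made-up assumptions: the grid is 6 rows by 5 columns; the car starts in the center column of the first row and moves forward one row per step while steering left, straight, or right; the center column pays 1.0, its neighbors 0.5, and the edge columns 0.1; leaving the grid ends the episode with a reward of -1, and reaching the last row ends it at the goal.

```python
import random
from collections import defaultdict

# Hypothetical grid: ROWS x COLS, the car moves forward one row per step.
ROWS, COLS, CENTER = 6, 5, 2
CELL_REWARD = [0.1, 0.5, 1.0, 0.5, 0.1]   # central squares pay the most
ACTIONS = (-1, 0, 1)                       # steer left, straight, right

def step(state, action):
    """Advance one row; return (next_state, reward, done)."""
    row, col = state
    new_row, new_col = row + 1, col + action
    if not 0 <= new_col < COLS:
        return None, -1.0, True            # out of bounds: episode ends
    done = new_row == ROWS - 1             # reached the final destination
    return (new_row, new_col), CELL_REWARD[new_col], done

def greedy(q, state):
    return max(ACTIONS, key=lambda a: q[(state, a)])

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)                 # (state, action) -> estimated value
    for _ in range(episodes):
        state, done = (0, CENTER), False
        while not done:
            # Explore with probability epsilon, otherwise exploit
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, state)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted value of the next state
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

def greedy_path(q):
    """Columns visited when following the learned policy without exploring."""
    state, done, path = (0, CENTER), False, []
    while not done:
        state, _, done = step(state, greedy(q, state))
        if state is not None:
            path.append(state[1])
    return path

print(greedy_path(q_learning()))
```

After enough episodes the greedy path stays in the center column all the way to the goal, which mirrors the behavior described on the slide.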
Exploration
Agent improves as it gains more experience.
Exploration vs. exploitation
EXPLORATION EXPLOITATION
Convergence

Revving up with Reinforcement Learning by Ricardo Sueiras
