Revving up with Reinforcement Learning by Ricardo Sueiras

© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS DeepRacer
Revving up with
Reinforcement Learning

How can we put reinforcement learning in the hands of all developers? Literally.

What is AWS DeepRacer?
Robotic autonomous race car
Virtual simulator, to train and experiment
Racing League

My first attempt at
building a self driving
car…
(2014)

What is Reinforcement Learning?

Machine learning overview
SUPERVISED, UNSUPERVISED, REINFORCEMENT

RL vs. other approaches for robotic racing
METHOD: Supervised learning
HOW IT WORKS: An expert driver controls a real-world car that has a camera. The camera images are saved as inputs and the corresponding driving actions (speed and steering angle) as outputs. A model is trained on these pairs.
RESULT: Provide a state (image) to the model and receive a driving action.
METHOD: Reinforcement learning
HOW IT WORKS: A virtual agent repeatedly interacts with a simulated environment and logs experience (image, action, new state, reward). The experience is used to train a model, and the new model is used to gather more experience.
RESULT: Provide a state (image) to the model and receive a driving action.
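
The reinforcement learning loop described above can be sketched in a few lines. This is only an illustration: `ToyTrack`, its one-dimensional state, and the random policy are hypothetical stand-ins for the DeepRacer simulator and model, which the slides do not show.

```python
import random


class ToyTrack:
    """Hypothetical 1-D track: the state is the car's offset from the centerline."""

    def __init__(self, half_width=3, max_steps=20):
        self.half_width = half_width
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.offset = 0
        self.steps = 0
        return self.offset

    def step(self, action):
        # action is -1 (steer left), 0 (straight) or +1 (steer right)
        self.offset += action
        self.steps += 1
        off_track = abs(self.offset) > self.half_width
        # Higher reward near the centerline, tiny reward once off track
        reward = 1e-3 if off_track else 1.0 / (1 + abs(self.offset))
        done = off_track or self.steps >= self.max_steps
        return self.offset, reward, done


def collect_episode(env, policy):
    """Run one episode and log (state, action, reward, new state) tuples."""
    experience = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state)
        new_state, reward, done = env.step(action)
        experience.append((state, action, reward, new_state))
        state = new_state
    return experience


rng = random.Random(0)  # seeded so the run is repeatable


def random_policy(state):
    # Placeholder for a trained model: ignores the state entirely
    return rng.choice([-1, 0, 1])


episode = collect_episode(ToyTrack(), random_policy)
print(len(episode), "experience tuples collected")
```

In a real training run the logged tuples would be used to update the model, and the updated model would replace `random_policy` for the next round of experience gathering.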

Reinforcement Learning use cases
AUTONOMOUS CARS, FINANCIAL TRADING, DATACENTER COOLING, FLEET LOGISTICS

RL for A/B Testing

Reinforcement learning terms
AGENT, ENVIRONMENT, STATE
ACTION, EPISODE, REWARD

How does learning happen?
VALUE FUNCTION
POLICY FUNCTION

Policy Function
Input
Output
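
As a rough sketch of what "input" and "output" mean for a policy function: the input is a state and the output is a probability for each available action. The linear scoring, the two-element state vector, and the three-action list below are assumptions made for illustration; the model in the talk works on camera images.

```python
import math

ACTIONS = ["left", "straight", "right"]  # hypothetical discrete action space


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy(state, weights):
    """Map a state (feature vector) to one probability per action."""
    scores = [sum(w * x for w, x in zip(row, state)) for row in weights]
    return softmax(scores)


state = [0.2, -0.5]                              # illustrative features
weights = [[1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]  # one row per action
probs = policy(state, weights)

for action, p in zip(ACTIONS, probs):
    print(f"{action}: {p:.3f}")
```

Training changes `weights` so that actions leading to higher reward receive higher probability.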

RL algorithms: Vanilla policy gradient
[Figure: the objective J(θ) is used to compute new weights at each update; example action probabilities 0.4 ± δ and 0.3 ± δ]
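
For reference, the standard textbook form of the vanilla policy gradient update is given below. The slide itself only shows the objective and the "new weights" labels, so the notation (policy π_θ, trajectory τ, return R(τ), learning rate α) is an assumption taken from the usual presentation of the method.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)
    \right],
\qquad
\theta_{\text{new}} = \theta + \alpha \, \nabla_\theta J(\theta)
```

Each update nudges the action probabilities by a small amount in the direction that increases expected reward, which is what the ± δ labels appear to illustrate.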

RL algorithms: Proximal policy optimization (PPO)
(State, action, reward, next state)
(s_t, a_t, r_t, s_{t+1})
Advantage
Improved model
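
The slide does not show how PPO uses the advantage, so the following is a minimal sketch of the commonly published clipped surrogate objective rather than the exact implementation behind DeepRacer. The names `ratios` (new policy probability divided by old policy probability for each logged action) and `epsilon` (clip range) are assumptions.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Average clipped surrogate objective over a batch of experience.

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = how much better action a_i was than expected
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Take the more pessimistic of the clipped and unclipped terms
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)


ratios = [1.5, 0.5, 1.0]
advantages = [1.0, -1.0, 2.0]
objective = ppo_clipped_objective(ratios, advantages)
print(round(objective, 6))
```

The clipping keeps any single update from moving the policy too far from the one that collected the experience, which is why the method is called "proximal".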

What does a reward function look like?
def reward_function(on_track, x, y, distance_from_center, car_orientation,
                    progress, steps, throttle, steering, track_width, waypoints,
                    closest_waypoint):
    # Example centerline-following reward function.
    # Markers at increasing distances from the center line.
    marker_1 = 0.1 * track_width
    # marker_2 and marker_3 are not defined on the slide; the values below
    # are assumed from the standard AWS DeepRacer centerline example.
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    reward = 1e-3
    if 0.0 <= distance_from_center <= marker_1:
        reward = 1
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed / close to off track
    return float(reward)
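
The same idea can be written in a table-driven style. This sketch assumes the dictionary-style interface used by later versions of the DeepRacer console, where the function receives a single `params` dict with keys such as `track_width` and `distance_from_center`. Only the 0.1 marker appears on the slide; the 0.25 and 0.5 tiers are illustrative assumptions.

```python
# Assumed tiers: (fraction of track width, reward), checked in order.
TIERS = [(0.1, 1.0), (0.25, 0.5), (0.5, 0.1)]


def reward_function(params):
    """Centerline-following reward: closer to the center earns more."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    for fraction, reward in TIERS:
        if distance_from_center <= fraction * track_width:
            return float(reward)
    return 1e-3  # likely crashed / close to off track


# Quick check of each tier on a track that is 1.0 wide
for distance in (0.05, 0.2, 0.4, 0.6):
    sample = {"track_width": 1.0, "distance_from_center": distance}
    print(distance, reward_function(sample))
```

Keeping the tiers in one list makes it easy to experiment with different thresholds without touching the control flow.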

Snakes on the (control) plane
(@frankmunz)

Fish and Chips Chole Poori Paneer Uttappam Khara Dosa
Explore vs Exploit

Explore the grid and accumulate rewards
Episode: the process of exploring the grid and earning rewards until the car moves out of bounds or reaches the goal.
Out of bounds | Final destination
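
A single episode on such a grid might look like the sketch below. The 3x5 grid, its reward values, and the three moves are hypothetical; the slide's actual grid is a figure that is not reproduced in the text.

```python
# Hypothetical reward grid: the middle row is the centerline.
GRID = [
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]
# Every move advances one column; "up" and "down" also change row.
MOVES = {"up": (-1, 1), "forward": (0, 1), "down": (1, 1)}


def run_episode(choose_move, start_row=1):
    """Play until the car leaves the grid or reaches the last column."""
    row, col = start_row, 0
    total = GRID[row][col]
    path = [(row, col)]
    while True:
        d_row, d_col = MOVES[choose_move(row, col)]
        row, col = row + d_row, col + d_col
        if not 0 <= row < len(GRID):
            return total, path, "out of bounds"
        total += GRID[row][col]
        path.append((row, col))
        if col == len(GRID[0]) - 1:
            return total, path, "final destination"


total, path, outcome = run_episode(lambda row, col: "forward")
print(total, outcome)

crash_total, crash_path, crash_outcome = run_episode(lambda row, col: "up")
print(crash_total, crash_outcome)
```

Staying on the middle row collects the highest reward in every column, while steering off the edge ends the episode early with little reward.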

Iterate! Learning doesn’t happen on the first go!
The model learns which subsequent actions will result in the highest cumulative reward.
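
"Cumulative reward" is usually computed as a discounted sum, where rewards further in the future count slightly less. The slide does not mention discounting, so the factor `gamma` here is an assumption; setting it to 1.0 gives the plain sum.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


centerline_path = [1.0, 1.0, 1.0, 1.0]  # illustrative per-step rewards
edge_path = [0.1, 0.1, 0.1, 0.1]

print(discounted_return(centerline_path))
print(discounted_return(edge_path))
```

Comparing the two values shows why a sequence of actions that stays near the center is preferred over one that hugs the edge.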

Agent Improves as it gains more experience.
As the agent gains more and more experience, it learns to
stay on the central squares to get higher rewards.

Exploration

Agent Improves as it gains more experience.

Exploration vs. exploitation
EXPLORATION EXPLOITATION
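
One common way to balance the two is an epsilon-greedy rule: with a small probability pick a random option (explore), otherwise pick the option with the best estimate so far (exploit). The slides do not name a specific strategy, so this is a sketch of one standard technique on a hypothetical multi-armed bandit; `true_means`, `epsilon`, and `pulls` are illustrative.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Return the index of the option to try next."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: random option
    # exploit: option with the highest estimated reward so far
    return max(range(len(estimates)), key=lambda i: estimates[i])


def run_bandit(true_means, epsilon=0.1, pulls=2000, seed=0):
    """Estimate each option's average reward from noisy samples."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(pulls):
        arm = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward
        counts[arm] += 1
        # Incremental running average of the rewards seen for this option
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8, 0.5])
print("times each option was tried:", counts)
```

With `epsilon=0` the agent never explores and can get stuck on an early favourite; with `epsilon=1` it never uses what it has learned.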

Convergence
