5. Why Policy Gradients?
Learn the policy directly instead of Q or V
No dynamics model needed
Vanilla Q-learning is intractable for large action spaces
10. Intuition
Gradient tries to:
● Increase the probability of paths with positive reward
● Decrease the probability of paths with negative reward
Figure source: Schulman & Abbeel
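The intuition above is the score-function (REINFORCE) estimator: the gradient of expected reward is E[∇ log π(a) · R], so sampled actions with positive reward get their log-probability pushed up, and those with negative reward pushed down. A minimal sketch on a hypothetical 3-armed bandit (the arm rewards, learning rate, and step count are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical bandit: arm 0 pays +1, arm 1 pays 0, arm 2 pays -1.
rewards = np.array([1.0, 0.0, -1.0])
theta = np.zeros(3)  # policy logits

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)
    # Gradient of log softmax w.r.t. logits: one_hot(a) - probs
    grad_logp = -probs
    grad_logp[a] += 1.0
    # Score-function update: grad log pi(a) scaled by the reward
    theta += 0.1 * grad_logp * rewards[a]

p = softmax(theta)
# The positive-reward arm ends up most probable,
# the negative-reward arm least probable.
```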
21. Active areas of research
New environments!
Better sample efficiency
Transfer learning
Perception
Exploration/Auxiliary Tasks
22. TensorFlow tips and tricks
Use tf summaries for bookkeeping
Variable scoping for re-use
● tf.get_collection(tf.GraphKeys.VARIABLES)
Global counters
Coordinator and server APIs for multi-threaded/distributed training
Gradient clipping
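The gradient-clipping tip can be illustrated without TensorFlow: this NumPy sketch mirrors what `tf.clip_by_global_norm` does (the function name and sample values here are illustrative):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient arrays so their joint L2 norm
    is at most clip_norm; returns (clipped grads, original norm)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm <= clip_norm:
        return grads, global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in grads], global_norm

# Example: two gradient tensors with joint norm sqrt(9 + 16 + 144) = 13
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm = clip_by_global_norm(grads, 5.0)
```

Clipping by the *global* norm (rather than per-tensor) preserves the direction of the overall update while bounding its magnitude, which guards against the occasional huge policy-gradient step.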
24. RL tips and tricks
Standardize your rollouts
Batch size makes a big difference
Neural net architecture doesn’t matter that much:
● batch norm, dropout, etc. help less than in supervised learning
Policy gradients don’t benefit as much from off-policy exploration (e.g. ε-greedy)
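“Standardize your rollouts” usually means normalizing each batch of returns (or advantages) to zero mean and unit variance before computing the policy gradient, so that roughly half the sampled actions are reinforced and half discouraged, which reduces gradient variance. A minimal sketch, with hypothetical return values:

```python
import numpy as np

def standardize(returns, eps=1e-8):
    """Shift and scale a batch of returns to zero mean, unit std.
    eps avoids division by zero when all returns are equal."""
    returns = np.asarray(returns, dtype=np.float64)
    return (returns - returns.mean()) / (returns.std() + eps)

# Hypothetical returns from one batch of rollouts
batch_returns = [10.0, 12.0, 8.0, 14.0]
adv = standardize(batch_returns)
# Actions with above-average return get positive advantage,
# below-average get negative advantage.
```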