Deep RL with Policy Optimization
Evan Casey
Reinforcement Learning
Figure source: Sutton & Barto, 1998
Policy Optimization
Figure source: Sutton & Barto, 1998
Defining the Objective
Cumulative reward for the trajectory
Joint probability of the trajectory, built from the individual state-action pair probabilities
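The equations on this slide did not survive extraction. One standard way to write the two quantities named above, and the objective they define, is sketched below. The symbols (trajectory \tau, policy \pi_\theta, initial-state distribution \rho_0, horizon T, objective U) are assumptions borrowed from common policy-gradient notation, not recovered from the slide.

```latex
\begin{align}
R(\tau) &= \sum_{t=0}^{T-1} r(s_t, a_t) \\
P(\tau;\theta) &= \rho_0(s_0) \prod_{t=0}^{T-1} P(s_{t+1} \mid s_t, a_t)\, \pi_\theta(a_t \mid s_t) \\
U(\theta) &= \mathbb{E}_{\tau \sim P(\cdot;\theta)}\big[ R(\tau) \big] = \sum_{\tau} P(\tau;\theta)\, R(\tau)
\end{align}
```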
Why Policy Gradients?
Learning policy directly instead of Q or V
Don’t need a dynamics model
Vanilla Q-learning is intractable for large action spaces
How does it work?
The Policy Gradient Theorem
Using Empirical Estimates
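The derivation on these slides was an image and is missing here. A sketch of the usual likelihood-ratio argument follows, assuming the notation U(\theta) = \sum_\tau P(\tau;\theta) R(\tau) and m sampled trajectories \tau^{(i)} (both assumptions, not taken from the slides). The dynamics terms in \log P(\tau;\theta) do not depend on \theta, which is why no dynamics model is needed.

```latex
\begin{align}
\nabla_\theta U(\theta)
  &= \sum_{\tau} \nabla_\theta P(\tau;\theta)\, R(\tau) \\
  &= \sum_{\tau} P(\tau;\theta)\, \nabla_\theta \log P(\tau;\theta)\, R(\tau) \\
  &= \mathbb{E}_{\tau}\Big[ \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau) \Big] \\
  &\approx \frac{1}{m} \sum_{i=1}^{m} \sum_{t=0}^{T-1}
     \nabla_\theta \log \pi_\theta\big(a_t^{(i)} \mid s_t^{(i)}\big)\, R\big(\tau^{(i)}\big)
\end{align}
```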
REINFORCE
Williams, 1992
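As a rough illustration of the REINFORCE update, here is a minimal sketch on a toy problem: a one-step bandit with a softmax policy over action logits. The environment, the function names (`reinforce_bandit`, `sample`) and the hyperparameters are all hypothetical choices for this example, not taken from the talk.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index from a categorical distribution."""
    u = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1


def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a one-step problem: theta += lr * grad log pi(a) * R."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # one logit per action
    for _ in range(steps):
        probs = softmax(theta)
        a = sample(probs, rng)
        R = rewards[a]  # return of this (single-step) trajectory
        for k in range(len(theta)):
            # d/d theta_k of log softmax(theta)[a] is 1[k == a] - probs[k]
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * grad_log_pi * R
    return softmax(theta)


# Hypothetical toy setup: action 1 pays reward 1, action 0 pays nothing.
final_probs = reinforce_bandit(rewards=[0.0, 1.0])
```

With only non-negative rewards, the update can only raise the probability of the rewarded action, which is the behaviour the gradient estimator is meant to produce.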
Intuition
Gradient tries to:
● Increase probability of paths with positive return R
● Decrease probability of paths with negative return R
Figure source: Schulman & Abbeel
Adding a baseline
Recall:
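The equation for this slide is missing from the extracted text. A sketch of the standard argument, assuming a constant baseline b and the trajectory notation P(\tau;\theta), R(\tau) (assumed symbols): subtracting b leaves the gradient estimate unbiased, because the subtracted term has zero expectation, and it can reduce the variance.

```latex
\begin{align}
\nabla_\theta U(\theta)
  &\approx \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta \log P\big(\tau^{(i)};\theta\big)\,
     \big( R(\tau^{(i)}) - b \big) \\
\mathbb{E}_{\tau}\big[ \nabla_\theta \log P(\tau;\theta)\, b \big]
  &= b \sum_{\tau} \nabla_\theta P(\tau;\theta)
   = b\, \nabla_\theta \sum_{\tau} P(\tau;\theta)
   = b\, \nabla_\theta 1 = 0
\end{align}
```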
Generalized advantage actor-critic
Implementation
Implementation (GAE)
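The code shown on the implementation slides was not captured in this text. The following is a minimal sketch of generalized advantage estimation, assuming the usual definitions delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and A_t = sum over l of (gamma * lambda)^l * delta_{t+l}. The function names and default hyperparameters are illustrative, not the author's.

```python
def discounted_cumsum(xs, discount):
    """out[t] = xs[t] + discount * xs[t+1] + discount**2 * xs[t+2] + ..."""
    out = [0.0] * len(xs)
    running = 0.0
    for t in reversed(range(len(xs))):
        running = xs[t] + discount * running
        out[t] = running
    return out


def gae_advantages(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Advantage estimates for one rollout.

    rewards[t] is r_t, values[t] is the critic's V(s_t), and last_value is
    V(s_T) used for bootstrapping (0.0 if the rollout ended in a terminal state).
    """
    next_values = list(values[1:]) + [last_value]
    # One-step TD residuals delta_t.
    deltas = [r + gamma * nv - v
              for r, v, nv in zip(rewards, values, next_values)]
    # Exponentially weighted sum of residuals with weight gamma * lam.
    return discounted_cumsum(deltas, gamma * lam)
```

Setting `lam=1.0` gives the Monte Carlo return minus the value estimate, while `lam=0.0` gives the one-step TD residual, so `lam` trades variance against bias.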
Results
A quick note on the cross-entropy method (CEM)
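The details of this note are not in the extracted text. For reference, a minimal sketch of the cross-entropy method as it is commonly described: sample parameter vectors from a Gaussian, keep the top-scoring fraction, and refit the Gaussian to those elites. The function name `cem`, the scoring function, and all hyperparameters are hypothetical choices for this example.

```python
import random
import statistics


def cem(score_fn, dim, iterations=30, pop_size=50, elite_frac=0.2,
        init_std=2.0, seed=0):
    """Derivative-free search: refit a diagonal Gaussian to the elite samples."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(pop_size * elite_frac))
    for _ in range(iterations):
        # Sample candidate parameter vectors around the current mean.
        population = [[rng.gauss(mean[d], std[d]) for d in range(dim)]
                      for _ in range(pop_size)]
        # Keep the highest-scoring candidates.
        population.sort(key=score_fn, reverse=True)
        elite = population[:n_elite]
        # Refit mean and standard deviation per dimension.
        for d in range(dim):
            column = [theta[d] for theta in elite]
            mean[d] = sum(column) / len(column)
            std[d] = statistics.pstdev(column) + 1e-3  # small noise floor
    return mean


# Hypothetical objective: negative squared distance to a target vector.
target = [1.0, -0.5]
best = cem(lambda th: -sum((a - b) ** 2 for a, b in zip(th, target)), dim=2)
```

In an RL setting, `score_fn` would be the average return of a policy with the given parameters; no gradients of the policy are required.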
A3C
Entropy regularization term
Mnih et al., 2016
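A minimal sketch of how an entropy bonus can enter the policy loss, assuming a discrete action distribution and a per-sample loss of the form -(log pi(a|s) * A + beta * H(pi(.|s))). The function names and the default `beta` are illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a), skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def policy_loss_with_entropy(log_prob_action, advantage, probs, beta=0.01):
    """Per-sample loss to minimise.

    The first term is the policy-gradient surrogate; the second rewards
    higher-entropy (less deterministic) action distributions.
    """
    return -(log_prob_action * advantage + beta * entropy(probs))
```

Because the entropy term is subtracted from the loss, a near-deterministic policy pays a small penalty relative to a more uniform one, which discourages premature convergence.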
A3C Results
Mnih et al., 2016
Active areas of research
New environments!
Better sample efficiency
Transfer learning
Perception
Exploration/Auxiliary Tasks
TensorFlow tips and tricks
Use tf summaries for bookkeeping
Variable scoping for re-use
● tf.get_collection(tf.GraphKeys.VARIABLES)
Global counters
Coordinator and server APIs for multi-threaded/distributed training
Gradient clipping
TensorBoard
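To make the gradient clipping tip concrete, here is a framework-free sketch of clipping by global norm, the scheme that TensorFlow exposes as `tf.clip_by_global_norm`. The pure-Python function below is an illustration with gradients as nested lists, not the talk's code.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together so their joint L2 norm is at most clip_norm.

    grads is a list of flat gradient lists, one per variable. Returns the
    clipped gradients and the global norm before clipping.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # scale == 1.0 when the norm is already within bounds.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm
```

Scaling every gradient by the same factor preserves the update direction, unlike clipping each element independently.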
RL tips and tricks
Standardize your rollouts
Batch size makes a big difference
Neural net architecture doesn’t matter that much:
● batch norm, dropout, etc
Policy gradients don’t benefit as much from off-policy exploration (e.g. ε-greedy)
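One plausible reading of "standardize your rollouts" is to rescale the returns (or advantages) in each batch to zero mean and unit variance before computing the gradient. The sketch below assumes that interpretation; the function name and the `eps` constant are illustrative.

```python
import statistics


def standardize(xs, eps=1e-8):
    """Shift and scale a batch of returns to zero mean and unit variance."""
    mean = sum(xs) / len(xs)
    std = statistics.pstdev(xs)  # population standard deviation of the batch
    # eps guards against division by zero when all returns are equal.
    return [(x - mean) / (std + eps) for x in xs]
```

This keeps the scale of the gradient roughly constant across environments with very different reward magnitudes.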
Sources
NIPS Tutorial (Schulman and Abbeel):
http://people.eecs.berkeley.edu/~pabbeel/nips-tutorial-policy-optimization-Schulman-Abbeel.pdf
Source code:
http://github.com/evancasey/demeter

Deep RL Talk @ NYTimes