SlideShare a Scribd company logo
1 of 20
Download to read offline
A Deeper Look at Experience Replay
(17.12)
Seungjae Ryan Lee
Online Learning
• Learn directly from experience
• Highly correlated data
New transition t
Experience Replay
• Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵
• Use batch 𝐵 to train the agent
(S, A, R, S)
(S, A, R, S)
(S, A, R, S)
Replay Buffer D
Transition t Batch B
Effectiveness of Experience Replay
• Only method that can generate uncorrelated data for online RL
• Except using multiple workers (A3C)
• Significantly improves data efficiency
• Norm in many deep RL algorithms
• Deep Q-Networks (DQN)
• Deep Deterministic Policy Gradient (DDPG)
• Hindsight Experience Replay (HER)
Problem with Experience Replay
• There has been default capacity of 106 used for:
• Different algorithms (DQN, PG, etc.)
• Different environments (retro games, continuous control, etc.)
• Different neural network architectures
Result 1
Replay buffer capacity can have significant negative impact on
performance if too low or too high.
Combined Experience Replay (CER)
• Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵
• Use batch 𝐵 to and online transition 𝑡 to train the agent
(S, A, R, S)
(S, A, R, S)
(S, A, R, S)
Replay Buffer D
Batch B
Transition t
Combined Experience Replay (CER)
Result 2
CER can remedy the negative influence of a large replay buffer with
𝑂 1 computation.
CER vs. Prioritized Experience Replay (PER)
• Prioritized Experience Replay (PER)
• Stochastic replay method
• Designed to replay the buffer more efficiently
• Always expected to improve performance
• 𝑂(𝑁 log 𝑁)
• Combined Experience Replay (CER)
• Guaranteed to use newest transition
• Designed to remedy negative influence of a large replay buffer
• Does not improve performance for good replay buffer sizes
• 𝑂(1)
Test agents
1. Online-Q
• Q-learning with online transitions 𝑡
2. Buffer-Q
• Q-learning with the replay buffer 𝐵
3. Combined-Q
• Q-learning with both the replay buffer 𝐵 and online transitions 𝑡
Testbed Environments
• 3 environments for 3 methods
• Tabular, Linear and Nonlinear approximations
• Introduce “timeout” to all tasks
• Episode ends automatically after 𝑇 timesteps (large enough for each task)
• Prevent episode being arbitrarily long
• Used partial-episode-bootstrap (PEB) to minimize negative side-effects
Testbed: Gridworld
• Represent tabular methods
• Agent starts in 𝑆 and has a goal state 𝐺
• Agent can move left, right, up, down
• Reward is -1 until goal is reached
• If the agent bumps into the wall (black),
it remains in the same position
Gridworld Results (Tabular)
• Online-Q solves task very slowly
• Buffer-Q shows worse performance / speed for larger buffers
• Combined-Q shows slightly faster speed for larger buffers
Gridworld Results (Nonlinear)
• Online-Q fails to learn
• Combined-Q significantly speeds up learning
Testbed: Lunar Lander
• Represent linear approximation methods (with tile coding)
• Agent tries to land a shuttle on
the moon
• State space: 𝑅8
• 4 discrete actions
Lunar Lander Results (Linear)
• Buffer-Q shows worse learning speed for larger buffers
• Combined-Q is robust for varying buffer size
Lunar Lander Results (Nonlinear)
• Online-Q achieves best performance
• Combined-Q shows marginal improvement to Buffer-Q
• Buffer-Q and Combined-Q overfits after some time
Testbed: Pong
• Represent nonlinear approximation methods
• RAM states used instead of raw
pixels
• More accurate state representation
• State space: 0, … , 255 128
• 6 discrete actions
Pong Results (Nonlinear)
• All 3 agents fail to learn with a simple 1-hidden-layer network
• CER does not improve performance or speed
Limitations of Experience Replay
• Important transitions have delayed effects
• Partially mitigated with PER, but has a cost of 𝑂(𝑁 log 𝑁)
• Partially mitigated with correct buffer size or CER
• Both are workarounds, not solutions
• Experience Replay itself is flawed
• Focus should be on replacing experience replay
Thank you!
Original Paper: https://arxiv.org/abs/1712.01275
Paper Recommendations:
• Prioritized Experience Replay
• Hindsight Experience Replay
• Asynchronous Methods for Deep Reinforcement Learning
You can find more content in www.endtoend.ai/slides

More Related Content

Recently uploaded

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

[1712.01275] A Deeper Look at Experience Replay

  • 1. A Deeper Look at Experience Replay (17.12) Seungjae Ryan Lee
  • 2. Online Learning • Learn directly from experience • Highly correlated data New transition t
  • 3. Experience Replay • Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵 • Use batch 𝐵 to train the agent (S, A, R, S) (S, A, R, S) (S, A, R, S) Replay Buffer D Transition t Batch B
  • 4. Effectiveness of Experience Replay • Only method that can generate uncorrelated data for online RL • Except using multiple workers (A3C) • Significantly improves data efficiency • Norm in many deep RL algorithms • Deep Q-Networks (DQN) • Deep Deterministic Policy Gradient (DDPG) • Hindsight Experience Replay (HER)
  • 5. Problem with Experience Replay • There has been default capacity of 106 used for: • Different algorithms (DQN, PG, etc.) • Different environments (retro games, continuous control, etc.) • Different neural network architectures Result 1 Replay buffer capacity can have significant negative impact on performance if too low or too high.
  • 6. Combined Experience Replay (CER) • Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵 • Use batch 𝐵 to and online transition 𝑡 to train the agent (S, A, R, S) (S, A, R, S) (S, A, R, S) Replay Buffer D Batch B Transition t
  • 7. Combined Experience Replay (CER) Result 2 CER can remedy the negative influence of a large replay buffer with 𝑂 1 computation.
  • 8. CER vs. Prioritized Experience Replay (PER) • Prioritized Experience Replay (PER) • Stochastic replay method • Designed to replay the buffer more efficiently • Always expected to improve performance • 𝑂(𝑁 log 𝑁) • Combined Experience Replay (CER) • Guaranteed to use newest transition • Designed to remedy negative influence of a large replay buffer • Does not improve performance for good replay buffer sizes • 𝑂(1)
  • 9. Test agents 1. Online-Q • Q-learning with online transitions 𝑡 2. Buffer-Q • Q-learning with the replay buffer 𝐵 3. Combined-Q • Q-learning with both the replay buffer 𝐵 and online transitions 𝑡
  • 10. Testbed Environments • 3 environments for 3 methods • Tabular, Linear and Nonlinear approximations • Introduce “timeout” to all tasks • Episode ends automatically after 𝑇 timesteps (large enough for each task) • Prevent episode being arbitrarily long • Used partial-episode-bootstrap (PEB) to minimize negative side-effects
  • 11. Testbed: Gridworld • Represent tabular methods • Agent starts in 𝑆 and has a goal state 𝐺 • Agent can move left, right, up, down • Reward is -1 until goal is reached • If the agent bumps into the wall (black), it remains in the same position
  • 12. Gridworld Results (Tabular) • Online-Q solves task very slowly • Buffer-Q shows worse performance / speed for larger buffers • Combined-Q shows slightly faster speed for larger buffers
  • 13. Gridworld Results (Nonlinear) • Online-Q fails to learn • Combined-Q significantly speeds up learning
  • 14. Testbed: Lunar Lander • Represent linear approximation methods (with tile coding) • Agent tries to land a shuttle on the moon • State space: 𝑅8 • 4 discrete actions
  • 15. Lunar Lander Results (Linear) • Buffer-Q shows worse learning speed for larger buffers • Combined-Q is robust for varying buffer size
  • 16. Lunar Lander Results (Nonlinear) • Online-Q achieves best performance • Combined-Q shows marginal improvement to Buffer-Q • Buffer-Q and Combined-Q overfits after some time
  • 17. Testbed: Pong • Represent nonlinear approximation methods • RAM states used instead of raw pixels • More accurate state representation • State space: 0, … , 255 128 • 6 discrete actions
  • 18. Pong Results (Nonlinear) • All 3 agents fail to learn with a simple 1-hidden-layer network • CER does not improve performance or speed
  • 19. Limitations of Experience Replay • Important transitions have delayed effects • Partially mitigated with PER, but has a cost of 𝑂(𝑁 log 𝑁) • Partially mitigated with correct buffer size or CER • Both are workarounds, not solutions • Experience Replay itself is flawed • Focus should be on replacing experience replay
  • 20. Thank you! Original Paper: https://arxiv.org/abs/1712.01275 Paper Recommendations: • Prioritized Experience Replay • Hindsight Experience Replay • Asynchronous Methods for Deep Reinforcement Learning You can find more content in www.endtoend.ai/slides