Model-Based Episodic Memory Induces
Dynamic Hybrid Controls
Authors: Hung Le, Thommen Karimpanal George, Majid Abdolshah, Truyen Tran, Svetha
Venkatesh
Presented by Hung Le
Reinforcement learning
Image source: Wikipedia
1. Model-based RL
2. Model-free RL
3. Episodic RL
Episodic control is the third way.
Episodic memory (hippocampus):
• Stores instances of experiences
• Fast learning
• Heuristic / suboptimal
Questions that episodic memory can answer:
What did you have for breakfast this morning?
Which action did the agent take that resulted in a high return?
Typical episodic control paradigm
[Diagram: the current experience is used to read from memory; the memory stores experiences and their returns and produces a value for the policy, which acts in the environment; new experiences and returns are written back to memory.]
• Key-value episodic memory
• Key = experience, which can be anything from a single state to the whole trajectory
• Value = return / estimated value (see the sketch below)
Image source: Sutton & Barto, Reinforcement Learning: An Introduction
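To make the key-value picture concrete, here is a minimal, illustrative sketch of such a memory (not the paper's implementation); the fixed capacity, ring-buffer overwrite, and k-nearest-neighbor averaging are assumptions chosen for brevity.

```python
import numpy as np

class KeyValueEpisodicMemory:
    """Minimal key-value episodic memory (illustrative sketch, not the paper's code).

    Keys are fixed-size embeddings of experiences (a single state or a whole
    trajectory); values are the returns observed for those experiences.
    """

    def __init__(self, key_dim, capacity=10000):
        self.keys = np.zeros((capacity, key_dim))
        self.values = np.zeros(capacity)
        self.capacity = capacity
        self.count = 0

    def write(self, key, episode_return):
        # Overwrite the oldest slot once full (simple ring buffer, an assumption).
        idx = self.count % self.capacity
        self.keys[idx] = key
        self.values[idx] = episode_return
        self.count += 1

    def read(self, query, k=5):
        # Value estimate = mean return of the k nearest stored keys.
        n = min(self.count, self.capacity)
        if n == 0:
            return 0.0
        dists = np.linalg.norm(self.keys[:n] - query, axis=1)
        nearest = np.argsort(dists)[:k]
        return float(self.values[nearest].mean())
```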
Hybrid design of episodic and model-free RL
(Complementary learning systems)
[Diagram: a slowly updated parametric network and a rapidly updated episodic memory jointly determine the action; the episodic memory supports rapid learning from new experience.]
Image source: internet, Neural Episodic Control
Limitations
• Near-deterministic assumption
→ stores only the best return
• Sample inefficiency
→ stores state-action values, which demands experiencing all actions to make reliable decisions
→ updates one memory slot at a time, so value propagation is slow
• Fixed combination of episodic and parametric values
→ the episodic contribution weight is unchanged across observations and requires manual tuning
Our contribution
• Episodic memory of trajectory values
→ stores trajectory representations instead of states, to handle noisy and partially observable (POMDP) settings
• Memory-based value estimation mechanism
→ Memory read: mixes the average and the max return of the nearest neighbors to balance pessimistic and optimistic estimates
→ Memory write: weighted-averaging write to multiple slots
→ Memory refine: bootstrapped memory updates hasten value propagation
• Dynamic hybrid control
→ a neural network learns to weight the episodic value against DQN’s value
→ conditioned on the current trajectory
Trajectory representation learning
• The trajectory model is an LSTM
• The hidden state τ⃗ is the trajectory representation
• Self-supervised learning (see the sketch below):
→ recall past events given the preceding event as a query (reconstruction loss)
→ two trajectories that share more common transitions are closer in the representation space
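As a rough, hedged illustration of this component (assumed sizes and a simplified objective, not the authors' code): an LSTM consumes flattened (observation, action, reward) transitions, its final hidden state plays the role of τ⃗, and a linear decoder trained to reconstruct the following transition stands in for the paper's query-based recall loss.

```python
import torch
import torch.nn as nn

class TrajectoryModel(nn.Module):
    """Illustrative LSTM trajectory model; dimensions and decoder are assumptions."""

    def __init__(self, obs_dim, act_dim, hidden_dim=128):
        super().__init__()
        # Each time step is a flattened (observation, action, reward) transition.
        self.lstm = nn.LSTM(obs_dim + act_dim + 1, hidden_dim, batch_first=True)
        # Linear decoder reconstructs the next transition (self-supervised signal).
        self.decoder = nn.Linear(hidden_dim, obs_dim + act_dim + 1)

    def forward(self, transitions):
        # transitions: (batch, time, obs_dim + act_dim + 1)
        outputs, (h, _) = self.lstm(transitions)
        tau = h[-1]            # trajectory representation for each sequence
        return tau, outputs

    def reconstruction_loss(self, transitions):
        # Predict transition t+1 from the hidden state after transition t.
        _, outputs = self.forward(transitions)
        pred = self.decoder(outputs[:, :-1])
        return nn.functional.mse_loss(pred, transitions[:, 1:])
```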
Memory reading
• (a) Average over neighbors: pessimistic
• (b) Max over neighbors: optimistic
• Randomly select (a) or (b) with probability p (see the sketch below)
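A hedged sketch of this read rule; k, p, and the Euclidean neighbor search are assumptions, and the paper may combine the two estimates differently.

```python
import numpy as np

def read_value(keys, values, query, k=5, p=0.5, rng=np.random):
    """Estimate a value from the k nearest neighbors of `query` (illustrative).

    With probability p use the pessimistic estimate (mean of neighbor returns),
    otherwise the optimistic one (max of neighbor returns).
    """
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    neighbor_returns = values[nearest]
    if rng.random() < p:
        return float(neighbor_returns.mean())   # (a) pessimistic
    return float(neighbor_returns.max())        # (b) optimistic
```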
Memory writing
At the end of an episode, update the values of multiple key neighbors so that they move toward the episode return, at rates determined by their distances to the written key (see the sketch below).
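One way to realize such a distance-weighted, multi-slot write, as an illustrative sketch; the Gaussian kernel, the number of neighbors, and the learning rate are assumptions.

```python
import numpy as np

def write_return(keys, values, new_key, episode_return, k=5, lr=0.5):
    """Move the values of the k nearest slots toward the episode return (illustrative).

    Each neighbor is updated at a rate that shrinks with its distance to the
    new key, so closer slots move faster toward the observed return.
    """
    dists = np.linalg.norm(keys - new_key, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] ** 2)        # closer slots get larger weights
    weights /= weights.sum() + 1e-8
    values[nearest] += lr * weights * (episode_return - values[nearest])
    return values
```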
Memory refining
• Refine the memory value at any step
• Bootstrapped update
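A minimal sketch of such a bootstrapped refinement, assuming a one-step target built from the current reward and the memory's own estimate of the next state's value; the discount and step size are assumptions.

```python
def refine(values, slot_idx, reward, next_value, gamma=0.99, alpha=0.1):
    """Bootstrapped refinement of one memory slot (illustrative sketch).

    The stored value moves toward the target r + gamma * V(next), so value
    information propagates between episodes instead of waiting for full returns.
    """
    target = reward + gamma * next_value
    values[slot_idx] += alpha * (target - values[slot_idx])
    return values
```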
Episodic value estimation via memory-based planning
• What is the value of taking action a from state s?
• The next observation is approximated by the trajectory representation that follows action a
• The value of that representation is queried from the memory
• The immediate reward r is estimated by a reward model (see the sketch below)
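Putting these pieces together, a hedged sketch of the one-step, memory-based value estimate; the rollout model, reward model, memory interface, and discount are assumed interfaces, not the paper's exact API.

```python
def episodic_q_value(tau, action, rollout_model, reward_model, memory, gamma=0.99):
    """Estimate Q(s, a) by one step of memory-based planning (illustrative).

    tau           : current trajectory representation
    rollout_model : predicts the next trajectory representation after `action` (assumed)
    reward_model  : predicts the immediate reward for (tau, action) (assumed)
    memory        : episodic memory exposing a read(key) value lookup (assumed)
    """
    next_tau = rollout_model(tau, action)   # approximate the post-action representation
    r_hat = reward_model(tau, action)       # estimated immediate reward
    return r_hat + gamma * memory.read(next_tau)
```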
MBEC agent in navigation tasks
Combining episodic and parametric value functions
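A hedged sketch of this dynamic combination; the paper's exact rule may differ, and here I assume a sigmoid gate conditioned on the trajectory representation that produces a convex mix of the two value estimates.

```python
import torch
import torch.nn as nn

class DynamicHybridValue(nn.Module):
    """Illustrative gating between episodic and parametric (DQN) value estimates."""

    def __init__(self, tau_dim):
        super().__init__()
        # The gate is conditioned on the current trajectory representation.
        self.gate = nn.Sequential(nn.Linear(tau_dim, 1), nn.Sigmoid())

    def forward(self, tau, q_dqn, q_episodic):
        # beta in (0, 1): how much to trust the episodic estimate for this trajectory.
        beta = self.gate(tau)
        return (1 - beta) * q_dqn + beta * q_episodic
```

Because the gate depends on the trajectory, the agent can lean on episodic values where they are reliable and fall back on the parametric estimate elsewhere, without manual tuning of a fixed weight.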
MBEC + DQN = MBEC++
MBEC++ in noisy classical control tasks
MBEC++ in POMDP and Atari tasks
Human normalized scores (mean/median) at
10 million frames for all and a subset of 25
games.
Key takeaways about our episodic memory
• Storing distributed trajectory representations produced by a trajectory model
• Memory-based planning with fast value-propagating memory writing and refining
• Dynamic consolidation of episodic values into the parametric value function
• Good results in:
  • Noisy environments
  • Atari games
  • POMDPs
Thank you
thai.le@deakin.edu.au
A²I²
Deakin University
Geelong Waurn Ponds Campus, Geelong, VIC 3220
Hung Le
