1. Continual Reinforcement Learning with Complex Synapses
Christos Kaplanis, Murray Shanahan, Claudia Clopath
presentation by Jia-Qi Yang
LAMDA Group
2. Idea
Catastrophic forgetting is a common problem in reinforcement
learning (non-stationary and correlated experiences, neural network
function approximation).
Replay buffers: do not scale well.
A possible solution: store the parameters in units that themselves
have a memory function.
A biologically plausible synaptic model: the Benna-Fusi model (2016,
Nature Neuroscience).
3. The Benna-Fusi Model
Maximise the expected signal-to-noise ratio (SNR) of memories over
time in a population of synapses undergoing continual plasticity in
the form of random, uncorrelated modifications:
$$w(t) = \sum_{t' < t} \Delta w(t')\, r(t - t')$$

The maximum SNR is achieved when $r(t) \sim t^{-1/2}$
(power-law decay).
Impractical to implement directly: computing $w(t)$ requires storing
every past modification.
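To see why, here is a minimal sketch of the naive implementation (function name and the update distribution are illustrative assumptions): both memory and compute grow linearly with the history of the synapse.

```python
import numpy as np

def weight_with_power_law_memory(deltas, t):
    """Naive power-law memory: w(t) = sum over t' < t of dw(t') * r(t - t'),
    with r(tau) ~ tau^(-1/2). Every past modification must be kept."""
    ages = t - np.arange(len(deltas))            # tau = t - t' for each update
    return np.sum(deltas * ages ** -0.5)

rng = np.random.default_rng(0)
deltas = rng.choice([-1.0, 1.0], size=10_000)    # random, uncorrelated updates
w = weight_with_power_law_memory(deltas, t=10_000)
```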
4. The Benna-Fusi Model
Power-law decay can be approximated by a synaptic model
consisting of a finite chain of N communicating dynamic variables,
whose dynamics are defined as:
$$C_k \frac{du_k}{dt} = g_{k-1,k}(u_{k-1} - u_k) + g_{k,k+1}(u_{k+1} - u_k)$$
This is an ordinary differential equation (ODE) and can be integrated
with the Euler method.
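A minimal Euler-integration sketch of one such chain. Here u[0] is the visible weight; the constants (capacities C_k doubling and conductances g_{k,k+1} halving along the chain, plus a leak at the end, i.e. u_{N+1} = 0) are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

class BennaFusiSynapse:
    """Chain of N communicating variables for a single weight (sketch).
    u[0] is the visible weight; deeper variables remember longer timescales."""

    def __init__(self, n_vars=8, g12=0.01, dt=1.0):
        self.u = np.zeros(n_vars)
        self.C = 2.0 ** np.arange(n_vars)        # assumed: C_k doubles along chain
        self.g = g12 / 2.0 ** np.arange(n_vars)  # assumed: g_{k,k+1} halves
        self.dt = dt

    def step(self, external_input=0.0):
        """One Euler step; plasticity updates enter as external_input to u_1."""
        u, g, C = self.u, self.g, self.C
        du = np.zeros_like(u)
        du[0] = external_input + g[0] * (u[1] - u[0])
        for k in range(1, len(u) - 1):
            du[k] = g[k - 1] * (u[k - 1] - u[k]) + g[k] * (u[k + 1] - u[k])
        du[-1] = g[-2] * (u[-2] - u[-1]) + g[-1] * (0.0 - u[-1])  # leak: u_{N+1} = 0
        self.u = u + self.dt * du / C
        return self.u[0]                         # current visible weight
```

With no further input, u[0] then decays approximately as a power law, which is the behaviour the chain is designed to approximate.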
5. The Benna-Fusi Model: Visualization
Figure 1: liquid flowing between a series of beakers of increasing size and decreasing tube widths.
6. Reinforcement Learning
Q-learning:

$$Q(s, a) = \mathbb{E}_\pi\!\left[\,\sum_{i=t}^{\infty} \gamma^{\,i-t} r_i \,\middle|\, s_t = s,\ a_t = a\right]$$

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta\left[r_t + \gamma V(s_{t+1}) - Q(s_t, a_t)\right]$$
Deep Q-learning (DQN): fit V(s) and Q(s, a) with a neural network,
i.e. use V(s; θ) and Q(s, a; θ).
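For concreteness, a minimal sketch of the tabular update above, taking V(s_{t+1}) as max over a' of Q(s_{t+1}, a') (standard Q-learning); all names and defaults are illustrative.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, eta=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + eta * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += eta * (td_target - Q[s, a])
    return Q
```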
7. Some details
Eligibility traces: used only in the tabular case.
Deep Q-learning: target network, replay buffer, soft Q-learning,
task-specific gains and biases.
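Soft Q-learning replaces the hard max with a log-sum-exp value and a Boltzmann policy. A minimal sketch (the inverse temperature beta=5.0 is an arbitrary illustration value):

```python
import numpy as np

def soft_value(q_row, beta=5.0):
    """V(s) = (1/beta) * log sum_a exp(beta * Q(s,a));
    approaches max_a Q(s,a) as beta -> infinity."""
    z = beta * np.asarray(q_row)
    z_max = z.max()
    return (z_max + np.log(np.exp(z - z_max).sum())) / beta

def soft_policy(q_row, beta=5.0):
    """pi(a|s) proportional to exp(beta * Q(s,a))."""
    z = beta * np.asarray(q_row)
    p = np.exp(z - z.max())
    return p / p.sum()
```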
8. Experiments
• Continual Q-learning (tabular Q-values)
• Continual Multi-task Deep RL (DQN, unrelated tasks)
• Continual Learning within a Single Task (without a replay buffer)
9. Continual Q-learning
A 10x10 grid map with 5 actions.
Two tasks:
1. the reward is located at the upper-right corner.
2. the reward is located at the bottom-left corner.
Tasks alternate every 10,000 episodes.
Q-values are memorized directly in a table (tabular); a minimal sketch
of the protocol follows.
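A sketch of the alternating-goal protocol with plain tabular Q-learning (grid layout, reward of 1 at the goal, epsilon-greedy exploration, and all constants are illustrative assumptions; in the paper each Q-value would additionally be consolidated through a Benna-Fusi chain, omitted here for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
N, ACTIONS = 10, 5                          # 10x10 grid, 5 actions
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # up, down, left, right, stay
GOALS = [(0, N - 1), (N - 1, 0)]            # task 1: upper right; task 2: bottom left
Q = np.zeros((N, N, ACTIONS))

def run_episode(goal, eta=0.1, gamma=0.9, eps=0.1, max_steps=200):
    s = (int(rng.integers(N)), int(rng.integers(N)))
    for _ in range(max_steps):
        a = int(rng.integers(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2 = (min(max(s[0] + MOVES[a][0], 0), N - 1),
              min(max(s[1] + MOVES[a][1], 0), N - 1))
        r = 1.0 if s2 == goal else 0.0
        Q[s][a] += eta * (r + gamma * np.max(Q[s2]) - Q[s][a])
        s = s2
        if r > 0:
            break

for episode in range(40_000):
    task = (episode // 10_000) % 2          # switch reward location every 10,000 episodes
    run_episode(GOALS[task])
```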
11. Continual Multi-task Deep RL
Two completely different tasks:
1. Cart-Pole.
2. Catcher.
Continuous observation spaces and discrete action spaces → DQN.
The Benna-Fusi model memorizes the parameters of the DQN (sketch
below).
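A minimal PyTorch sketch of attaching a Benna-Fusi chain to every DQN parameter (class name, constants, and the leak boundary mirror the earlier chain sketch and are illustrative assumptions): gradient updates move u_1, and the chain dynamics consolidate it between steps.

```python
import torch

class BennaFusiConsolidator:
    """One chain of slow variables per network parameter (sketch)."""

    def __init__(self, params, n_vars=8, g12=0.001, dt=1.0):
        self.params = list(params)
        self.chains = [[torch.zeros_like(p) for _ in range(n_vars)]
                       for p in self.params]
        self.g = [g12 / 2.0 ** k for k in range(n_vars)]  # assumed halving
        self.C = [2.0 ** k for k in range(n_vars)]        # assumed doubling
        self.dt = dt

    @torch.no_grad()
    def step(self):
        for p, u in zip(self.params, self.chains):
            u[0].copy_(p)                     # gradient updates enter at u_1
            du = [torch.zeros_like(v) for v in u]
            du[0] = self.g[0] * (u[1] - u[0])
            for k in range(1, len(u) - 1):
                du[k] = (self.g[k - 1] * (u[k - 1] - u[k])
                         + self.g[k] * (u[k + 1] - u[k]))
            du[-1] = self.g[-2] * (u[-2] - u[-1]) - self.g[-1] * u[-1]  # leak
            for k, v in enumerate(u):
                v.add_(self.dt / self.C[k] * du[k])       # Euler step
            p.copy_(u[0])                     # write consolidated value back
```

Hypothetical usage: construct `BennaFusiConsolidator(q_network.parameters())` once, then call its `step()` after every optimizer step.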
14. Continual Learning within a Single Task
Targets move during the learning process and experiences are strongly
correlated; a replay buffer is normally used to alleviate this.
Here: remove the replay buffer and learn a single task.
16. Conclusion
Works well on simple tasks.
Did not work on more complex tasks (from the Arcade Learning
Environment) → the approach is still limited to simple settings.
Low computational overhead: only 1.5-2 times slower than plain
Q-learning.