Continual Learning
in
Deep Neural Networks
Concepts, Recent Work Reviews, Direction of research
Wonjun Chung
wonjunc@mli.kaist.ac.kr
Contents
I. Concepts of Continual Learning
A. Problem Definition
B. Catastrophic Forgetting
C. Desiderata
II. Various Approaches
A. Regularization (EWC, SI*, SSL)
B. Dynamic Architectures (PN, DEN*)
C. Dual-Memory (PCN, GR*)
D. Others (IMM*, VCL*, GEM*)
III. Research Direction
* Not a paper left for future reading: I have already read it, but I will update the slides before the presentation.
To add (as of 02/08):
1. Experimental results of the previous work
2. Limitations of the previous work
3. Review slides for the * papers
4. Benchmarks for continual learning
I. Concepts of Continual Learning
A. Problem Definition
Humans and animals have the ability to continually acquire and fine-tune knowledge
throughout their lifespan.
I. Concepts of Continual Learning
A. Problem Definition
Continual learning capabilities are crucial for artificial agents interacting in the real
world and processing continuous streams of information.
Time
Additional tasks
I. Concepts of Continual Learning
B. Catastrophic Forgetting
Catastrophic Forgetting
When a neural network is used to learn a sequence of tasks, the learning of the later
tasks may degrade the performance of the models learned for the earlier tasks.
Time
Catastrophic forgetting
I. Concepts of Continual Learning
B. Catastrophic Forgetting
Catastrophic Forgetting
It occurs as the network parameters shift toward the optimal state for performing the
second of two successive tasks, overwriting the configuration that allowed them to
perform the first.
I. Concepts of Continual Learning
C. Desiderata
Desiderata of Continual Learning
● Online learning*
○ Learning occurs at every moment, with no fixed tasks or data sets and no
clear boundaries between tasks.
● Presence of transfer (forward/backward)*
○ Learning agents should be able to transfer and adapt what they learned from
previous experience, as well as use more recent experience to
improve performance on capabilities learned earlier.
* Not an absolute criterion
I. Concepts of Continual Learning
C. Desiderata
Desiderata of Continual Learning
● Resistance to catastrophic forgetting
○ New learning should not destroy performance on previously seen data.
● Bounded model size*
○ Capacity should be fixed, forcing the model to use its resources
intelligently, gracefully forgetting old knowledge when needed.
● No direct access to previous experience
○ The model cannot access all of its past experience (data).
* Not an absolute criterion
Contents
I. Concepts of Continual Learning
A. Problem Definition (data, metric)
B. Catastrophic Forgetting
C. Desiderata
II. Various Approaches
A. Regularization (EWC, SI*, SSL)
B. Dynamic Architectures (PN, DEN)
C. Dual-Memory (PCN, GR)
D. Others (IMM, VCL, GEM)
III. Research Direction
II. Various Approaches
A. Regularization
Regularization Approaches
Regularization approaches alleviate catastrophic forgetting by imposing constraints
on the update of the neural weights.
● Paper list
○ Overcoming catastrophic forgetting in neural networks (EWC), PNAS 2017
○ Continual Learning Through Synaptic Intelligence (SI), ICML 2017
○ Selfless Sequential Learning (SSL), ICLR 2019
II. Various Approaches
A. Regularization
Elastic Weight Consolidation (EWC)
● Motivation
EWC is inspired by the human brain, in which synaptic consolidation enables continual
learning by reducing the plasticity of synapses related to previously learned tasks.
In a neural network, the analogue is to constrain important parameters to stay close
to their old values.
II. Various Approaches
A. Regularization
Elastic Weight Consolidation (EWC)
Computationally, EWC adds a quadratic penalty on the difference between the parameters
for the old and the new tasks, which slows down learning for the task-relevant weights
encoding previously learned knowledge.
How do we choose the important parameters for the earlier tasks?
II. Various Approaches
A. Regularization
Elastic Weight Consolidation (EWC)
A Bayesian approach is used to measure the importance of parameters.
The relevance of the parameters $\theta$ with respect to a task's training data $\mathcal{D}$ is modeled
as the posterior distribution $p(\theta \mid \mathcal{D})$.
The log probability of the data given the parameters, $\log p(\mathcal{D} \mid \theta)$, is simply the negative
of the loss function for the problem at hand: $\log p(\mathcal{D} \mid \theta) = -\mathcal{L}(\theta)$.
II. Various Approaches
A. Regularization
Elastic Weight Consolidation (EWC)
The data is split into two independent parts,
one defining task A ($\mathcal{D}_A$) and the other task B ($\mathcal{D}_B$):

$$\log p(\theta \mid \mathcal{D}) = \log p(\mathcal{D}_B \mid \theta) + \log p(\theta \mid \mathcal{D}_A) - \log p(\mathcal{D}_B)$$

Left-hand side: posterior probability of the parameters given the entire dataset.
$\log p(\mathcal{D}_B \mid \theta)$: (negative) loss function for task B.
$\log p(\theta \mid \mathcal{D}_A)$: posterior probability of the parameters given task A.
II. Various Approaches
A. Regularization
Elastic Weight Consolidation (EWC)
The posterior probability of the parameters given task A, $p(\theta \mid \mathcal{D}_A)$, must
contain information about which parameters were important to task A and is the key
to implementing EWC. This posterior, however, is intractable:

$$\log p(\theta \mid \mathcal{D}) = \underbrace{\log p(\mathcal{D}_B \mid \theta)}_{\text{loss for task B}} + \underbrace{\log p(\theta \mid \mathcal{D}_A)}_{\text{intractable}} - \log p(\mathcal{D}_B)$$
II. Various Approaches
A. Regularization
Elastic Weight Consolidation (EWC)
EWC approximates the posterior as a Gaussian distribution (Laplace's approximation)
with mean given by the task-A parameters $\theta_A^*$ and a diagonal precision given by the
diagonal of the Fisher information matrix $F$.

Given this approximation, the objective in EWC is:

$$\mathcal{L}(\theta) = \mathcal{L}_B(\theta) + \sum_i \frac{\lambda}{2} F_i \left( \theta_i - \theta_{A,i}^* \right)^2$$

$\mathcal{L}_B$: loss for task B only; $\lambda$: importance of the old task; $i$: parameter index.
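Below is a minimal PyTorch sketch of this penalty term; the helper names (`old_params`, `fisher_diag`) and the value of λ are illustrative assumptions, not from the paper.

```python
import torch

def ewc_penalty(model, old_params, fisher_diag, lam):
    """Quadratic EWC penalty: (lambda / 2) * sum_i F_i * (theta_i - theta*_{A,i})^2.
    old_params / fisher_diag map parameter names to tensors saved after task A."""
    penalty = torch.zeros(())
    for name, param in model.named_parameters():
        penalty = penalty + (fisher_diag[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# While training on task B (loss_B is the plain task-B loss):
# loss = loss_B + ewc_penalty(model, old_params, fisher_diag, lam=400.0)
```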
II. Various Approaches
A. Regularization
Elastic Weight Consolidation (EWC)
● Fisher information matrix
It measures the sensitivity of the model with respect to the parameters.
The diagonal entries are:

$$F_i = \mathbb{E}_{x \sim \mathcal{D},\; y \sim p_\theta(y \mid x)} \left[ \left( \frac{\partial \log p_\theta(y \mid x)}{\partial \theta_i} \right)^2 \right]$$
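A sketch of how this diagonal can be estimated in PyTorch, sampling labels from the model's own predictive distribution; the function name and sample budget are my own choices.

```python
import torch
import torch.nn.functional as F

def fisher_diagonal(model, data_loader, n_samples=200):
    """Monte-Carlo estimate of F_i = E_{x ~ D, y ~ p(y|x)}[(d log p(y|x) / d theta_i)^2].
    Examples are processed one at a time so the squaring happens per sample."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    seen = 0
    for x, _ in data_loader:
        for xi in x:
            log_probs = F.log_softmax(model(xi.unsqueeze(0)), dim=1)
            y = torch.multinomial(log_probs.exp(), 1)   # y ~ p_theta(y | x)
            model.zero_grad()
            log_probs[0, y.item()].backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    fisher[n] += p.grad.detach() ** 2
            seen += 1
            if seen >= n_samples:
                return {n: f / seen for n, f in fisher.items()}
    return {n: f / max(seen, 1) for n, f in fisher.items()}
```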
II. Various Approaches
A. Regularization
Elastic Weight Consolidation (EWC)
II. Various Approaches
A. Regularization
Elastic Weight Consolidation (EWC)
● Results
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Core motivation
In neurobiology, lateral inhibition describes the process whereby an activated neuron reduces the
activity of its weaker neighbors. It creates a powerfully decorrelated and compact representation with
minimal interference between different input patterns in the brain (Yu et al., 2014).
1. Decorrelated representation
○ A decorrelated representation is less vulnerable to catastrophic forgetting.
2. Sparse representation
○ It leaves enough capacity for future tasks. (Selfless)
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Core motivation
Orange indicates neuron activations that changed as a result of the second task.
Such interference is largely reduced when imposing sparsity on the representation.
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Core Idea
The training objective combines three terms:

$$\mathcal{L}(\theta) \;=\; \underbrace{\mathcal{L}_{\text{task}}(\theta)}_{\text{loss for current task}} \;+\; \underbrace{\lambda_\Omega\, \Omega(\theta)}_{\substack{\text{penalizes changes to parameters}\\ \text{important for earlier tasks (e.g. EWC)}}} \;+\; \underbrace{\lambda_{\text{SSL}}\, R(H^l)}_{\substack{\text{encourages sparsity in the}\\ \text{activations of each layer}}}$$
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Sparse Coding through Neural Inhibition (SNI)
Let $H^l$ be a hidden layer with activations $h_i^m$ for a set of $M$ inputs, with $i, j$
running over all $N$ neurons:

$$R_{\text{SNI}}(H^l) = \frac{1}{M} \sum_{m=1}^{M} \sum_{i \ne j} h_i^m h_j^m$$
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Sparse Coding through Neural Inhibition (SNI)
1. By assuming a close-to-zero mean of the activations (ReLU), $\mathbb{E}[h_i] \approx 0$,
it minimizes the correlation between any two active neurons.
→ Decorrelated representation
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Sparse Coding through Neural Inhibition (SNI)
2. Each active neuron receives a penalty from every other active neuron,
proportional to that other neuron's activation magnitude.
→ Sparse representation
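A compact PyTorch sketch of the SNI term for one layer, following the formula above (treat it as illustrative):

```python
import torch

def sni_regularizer(h):
    """SNI penalty for one layer. h is an (M, N) batch of activations
    (M inputs, N neurons). Penalizes E_m[h_i * h_j] for every pair i != j,
    which both decorrelates active neurons and pushes toward a sparse code."""
    M = h.shape[0]
    corr = h.t() @ h / M                        # (N, N), entry (i, j) = E_m[h_i h_j]
    return corr.sum() - corr.diagonal().sum()   # drop the i == j terms
```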
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Sparse Coding through Local Neural Inhibition (SLNI)
SNI is too harsh for complex tasks that need a richer representation.
SLNI relaxes the objective by imposing a spatial weighting on the correlation penalty:
an active neuron mostly penalizes its close neighbours, and this effect vanishes for
neurons further away:

$$R_{\text{SLNI}}(H^l) = \frac{1}{M} \sum_{m=1}^{M} \sum_{i \ne j} e^{-\frac{(i-j)^2}{2\sigma^2}}\, h_i^m h_j^m$$
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Neuron Importance for Discounting Inhibition (SLNID)
When a new task is similar to previous tasks (shared patterns), the neurons used for
previous tasks will be active.
SLNI would then discourage other neurons from being active and encourage the new
task to adapt the already active neurons. => Catastrophic forgetting
To avoid such interference,
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Neuron Importance for Discounting Inhibition (SLNID)
Add a weight factor taking into account the importance of the neurons w.r.t. the
previous tasks, where a neuron's importance accumulates the sensitivity of the loss
w.r.t. that neuron's output.
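A PyTorch sketch combining the locality weighting and the importance discount; I assume the discount takes the form $e^{-(\alpha_i + \alpha_j)}$ on each pair, and the names are mine.

```python
import torch

def slnid_regularizer(h, importance, sigma=1.0):
    """SLNID penalty. h: (M, N) activations; importance: (N,) accumulated
    sensitivities alpha_i of the loss w.r.t. each neuron's output.
    Inhibition between neurons i and j is weighted by a Gaussian locality
    term exp(-(i-j)^2 / (2 sigma^2)) and discounted by exp(-(alpha_i + alpha_j)),
    so neurons important for earlier tasks neither inhibit nor get inhibited."""
    M, N = h.shape
    idx = torch.arange(N, dtype=h.dtype)
    locality = torch.exp(-(idx[:, None] - idx[None, :]) ** 2 / (2 * sigma ** 2))
    discount = torch.exp(-(importance[:, None] + importance[None, :]))
    weight = locality * discount
    weight.fill_diagonal_(0.0)          # only pairs i != j contribute
    corr = h.t() @ h / M                # entry (i, j) = E_m[h_i h_j]
    return (weight * corr).sum()
```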
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Neuron Importance for Discounting Inhibition (SLNID)
An important neuron will neither suppress the other neurons from being active
nor be affected by other active neurons.
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
[Figure: SNID vs. SLNID]
First-layer neuron importance after learning the first task. More active neurons are
tolerated in SLNID.
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
[Figure: SNID vs. SLNID]
SLNID allows neurons from previous tasks to be re-used for the third task. It avoids
changing the previously important neurons by adding new neurons instead.
II. Various Approaches
A. Regularization
Selfless Sequential Learning (SSL)
● Final Objective

$$\min_\theta\; \mathcal{L}_{\text{task}}(\theta) + \lambda_\Omega\, \Omega(\theta) + \lambda_{\text{SSL}}\, R_{\text{SLNID}}(H^l)$$
Contents
I. Concepts of Continual Learning
A. Problem Definition (data, metric)
B. Catastrophic Forgetting
C. Desiderata
II. Various Approaches
A. Regularization (EWC, SI, SSL)
B. Dynamic Architectures (PN, DEN*)
C. Dual-Memory (PCN, GR)
D. Others (IMM, VCL, GEM)
III. Research Direction
II. Various Approaches
B. Dynamic Architectures
Dynamic Architecture Approaches
These approaches change architectural properties in response to new tasks by
dynamically accommodating novel neural resources (e.g., re-training with an increased
number of neurons or layers).
● Paper list
○ Progressive Neural Networks (PN), NIPS 2016
○ Lifelong Learning with Dynamically Expandable Networks (DEN), ICLR 2018
II. Various Approaches
B. Dynamic Architectures
Progressive Neural Network (PN)
● Core Idea
Catastrophic forgetting is prevented by instantiating a new neural network for each
task being solved, while transfer is enabled via lateral connections to features of
previously learned columns.
II. Various Approaches
B. Dynamic Architectures
Progressive Neural Network (PN)
PN starts with a single column:
a neural network with $L$ layers, hidden activations $h_i^{(1)} \in \mathbb{R}^{n_i}$ (with $n_i$ the
number of units at layer $i \le L$), and parameters $\Theta^{(1)}$ trained to convergence.
II. Various Approaches
B. Dynamic Architectures
Progressive Neural Network (PN)
When switching to a second task, the parameters $\Theta^{(1)}$ are “frozen” and a new
column with parameters $\Theta^{(2)}$ is instantiated, where layer $h_i^{(2)}$ receives input from
both $h_{i-1}^{(2)}$ and $h_{i-1}^{(1)}$ via lateral connections:
II. Various Approaches
B. Dynamic Architectures
Progressive Neural Network (PN)
$$h_i^{(k)} = f\!\left( W_i^{(k)} h_{i-1}^{(k)} + \sum_{j<k} U_i^{(k:j)} h_{i-1}^{(j)} \right)$$

$W_i^{(k)}$: weight matrix of layer $i$ of column $k$.
$U_i^{(k:j)}$: lateral connections from layer $i-1$ of column $j$ ($j < k$) to layer $i$ of column $k$.
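A PyTorch sketch of one such layer, with plain linear lateral connections (class and argument names are mine; the paper's full model routes laterals through adapters, covered next):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ColumnLayer(nn.Module):
    """Layer i of column k: combines the column's own previous layer (W)
    with lateral inputs from the frozen earlier columns (U^(k:j), j < k)."""
    def __init__(self, in_dim, out_dim, lateral_dims):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim)
        self.U = nn.ModuleList(
            [nn.Linear(d, out_dim, bias=False) for d in lateral_dims])

    def forward(self, h_prev, laterals):
        # laterals: list of h_{i-1}^(j) tensors from earlier (frozen) columns
        out = self.W(h_prev)
        for U_j, h_j in zip(self.U, laterals):
            out = out + U_j(h_j)
        return F.relu(out)
```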
II. Various Approaches
B. Dynamic Architectures
Progressive Neural Network (PN)
● Adapters
Non-linear lateral connections. They serve both to improve initial conditioning and to
reduce dimensionality: the anterior feature vector $h_{i-1}^{(<k)}$ (with dimensionality
$n_{i-1}^{(<k)}$) is multiplied by a learned scale factor $\alpha$, reduced by a projection
matrix $V$, and passed through a non-linearity before entering the new column.
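A sketch of such an adapter under my reading (a single-hidden-layer MLP on the scaled anterior features; names and the choice of ReLU are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Non-linear lateral connection: scale the concatenated anterior features
    by a learned scalar, project them down with V (dimensionality reduction),
    apply a non-linearity, then map into the new column's layer."""
    def __init__(self, anterior_dim, hidden_dim, out_dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))        # learned scale factor
        self.V = nn.Linear(anterior_dim, hidden_dim)    # projection matrix
        self.U = nn.Linear(hidden_dim, out_dim, bias=False)

    def forward(self, h_anterior):
        return self.U(F.relu(self.V(self.alpha * h_anterior)))
```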
Contents
I. Concepts of Continual Learning
A. Problem Definition (data, metric)
B. Catastrophic Forgetting
C. Desiderata
II. Various Approaches
A. Regularization (EWC, SI, SSL)
B. Dynamic Architectures (PN, DEN)
C. Dual-Memory (PCN, GR*)
D. Others (IMM, VCL, GEM)
III. Research Direction
II. Various Approaches
C. Dual Memory
Dual-Memory Approaches
Complementary Learning Systems (CLS*) theory provides the basis for these approaches.
● Paper list
○ Progress & Compress: A scalable framework for continual learning (PCN), ICML 2018
○ Continual Learning with Deep Generative Replay (GR), NIPS 2017
*Complementary Learning Systems:
Rapid learning: initial learning of arbitrary new information
Gradual learning of structured knowledge
Bidirectional connections: storage, retrieval and replay
II. Various Approaches
C. Dual-Memory
Progress & Compress Network (PCN)
PCN implements two neural networks, a knowledge base and an active column,
which are trained in two distinct, alternating phases (Progress & Compress).
[Figure: knowledge base (compress) alongside active column (progress)]
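A high-level sketch of the alternation over a task stream; the two phase functions are assumed placeholders for the procedures described on the next slides.

```python
def continual_train(knowledge_base, active_column, task_stream,
                    progress_step, compress_step):
    """Alternate the two PCN phases. progress_step / compress_step are
    caller-supplied callables implementing the two training phases."""
    def set_trainable(net, flag):
        for p in net.parameters():
            p.requires_grad_(flag)

    for task in task_stream:
        # Progress: KB frozen; the active column is trained freely,
        # reusing KB features through layerwise lateral connections.
        set_trainable(knowledge_base, False)
        set_trainable(active_column, True)
        progress_step(active_column, knowledge_base, task)

        # Compress: distill the active column into the KB, protected
        # against forgetting by an online-EWC penalty.
        set_trainable(active_column, False)
        set_trainable(knowledge_base, True)
        compress_step(knowledge_base, active_column, task)
```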
II. Various Approaches
C. Dual-Memory
Progress & Compress Network (PCN)
● Progress phase
The knowledge base is fixed, while parameters in the active column are optimised
without constraints or regularisation.
PCN enables the reuse of past information through layerwise connections
between the knowledge base and the active column (similar to PN).
II. Various Approaches
C. Dual-Memory
Progress & Compress Network (PCN)
● Compress phase
Newly learnt parameters are consolidated into the knowledge base.
The consolidation is done via a distillation process.

Objective: minimize the KL divergence between the active-column and
knowledge-base predictions,

$$\mathbb{E}\left[ \mathrm{KL}\!\left( \pi^{AC}(\cdot \mid x) \,\|\, \pi^{KB}(\cdot \mid x) \right) \right]$$

$\pi^{AC}$, $\pi^{KB}$: predictions of the active column and the knowledge base, respectively.
II. Various Approaches
C. Dual-Memory
Progress & Compress Network (PCN)
● Compress phase
Since the active column is fixed during this phase, minimizing the KL divergence
is the same as minimizing the cross-entropy $H\!\left(\pi^{AC}, \pi^{KB}\right)$.
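A PyTorch sketch of this distillation loss; the temperature parameter is a standard distillation knob and an assumption on my part.

```python
import torch
import torch.nn.functional as F

def compress_loss(kb_logits, ac_logits, T=1.0):
    """Cross-entropy between the fixed active-column prediction (teacher)
    and the knowledge-base prediction (student); equal to the KL divergence
    up to a constant, since the active column is frozen."""
    ac_probs = F.softmax(ac_logits.detach() / T, dim=1)   # fixed teacher
    kb_log_probs = F.log_softmax(kb_logits / T, dim=1)
    return -(ac_probs * kb_log_probs).sum(dim=1).mean()
```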
II. Various Approaches
C. Dual-Memory
Progress & Compress Network (PCN)
● Compress phase
The original EWC makes the computational cost linear in the number of tasks,
so online EWC is used instead to avoid catastrophic forgetting.

Final objective:

$$\mathbb{E}\left[ \mathrm{KL}\!\left( \pi^{AC}(\cdot \mid x) \,\|\, \pi^{KB}(\cdot \mid x) \right) \right] + \frac{1}{2} \left\| \theta^{KB} - \theta^{KB*} \right\|^2_{\gamma F^*} \quad \text{(online EWC)}$$
II. Various Approaches
C. Dual-Memory
Progress & Compress Network (PCN)
● Online EWC
Apply Laplace's approximation to the whole posterior, rather than to the individual
likelihood terms. The Gaussian approximations of previous task likelihoods are
“re-centered” at the latest MAP parameters.

EWC (one Mahalanobis norm per task): $\sum_i \frac{1}{2} \left\| \theta - \theta_i^* \right\|^2_{F_i}$ — a mean and a Fisher need to be
kept for each task, which makes the computational cost linear in the number of tasks.

Online EWC: a single penalty $\frac{1}{2} \left\| \theta - \theta^* \right\|^2_{\gamma F^*}$, with one anchor $\theta^*$ and a running
Fisher $F^* \leftarrow \gamma F^* + F_{\text{new}}$.
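A minimal sketch of the online-EWC bookkeeping, reusing the dictionary layout of the earlier EWC snippets; γ and the function names are my own.

```python
import torch

def update_online_fisher(running_fisher, new_fisher, gamma=0.95):
    """Keep a single running Fisher instead of one per task:
    F* <- gamma * F* + F_new (gamma < 1 down-weights older tasks)."""
    return {n: gamma * running_fisher[n] + new_fisher[n] for n in running_fisher}

def online_ewc_penalty(model, anchor_params, running_fisher, lam):
    """Single quadratic penalty re-centred at the latest MAP parameters."""
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        penalty = penalty + (running_fisher[n] * (p - anchor_params[n]) ** 2).sum()
    return 0.5 * lam * penalty
```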
Contents
I. Concepts of Continual Learning
A. Problem Definition (data, metric)
B. Catastrophic Forgetting
C. Desiderata
II. Various Approaches
A. Regularization (EWC, SI, SSL)
B. Dynamic Architectures (PN, DEN)
C. Dual-Memory (PCN, GR)
D. Others (IMM*, VCL*, GEM*)
III. Research Direction
IV. Appendix
Contents
I. Concepts of Continual Learning
A. Problem Definition (data, metric)
B. Catastrophic Forgetting
C. Desiderata
II. Various Approaches
A. Regularization (EWC, SI, SSL)
B. Dynamic Architectures (PN, DEN)
C. Dual-Memory (PCN, GR)
D. Others (IMM, VCL, GEM)
III. Research Direction
III. Research Direction
Plan (until early March)
1. Sparse & decorrelated representation
a. Implement Selfless Sequential Learning and run additional experiments
2. Knowledge distillation
a. Implement Progress & Compress and run additional experiments
3. Variational Bayes
a. Implement Variational Continual Learning and run additional experiments
4. Task-agnostic continual learning (final research goal)
a. Online version of variational Bayes (paper for task-agnostic CL)
b. Investigate whether 1, 2, and 3 above can be applied to task-agnostic continual learning
III. Research Direction
A. Motivation
● Task-agnostic Continual Learning
What if the task changes gradually?
So far, most papers assume that the model knows task boundaries or labels (task-aware).
In practice, the data distribution shifts gradually, without hard task boundaries or
task labels (task-agnostic).
[Figure: task-aware setting (distinct Task 1, Task 2, Task 3) vs. task-agnostic setting
(gradually shifting tasks)]
III. Research Direction
A. Motivation
● Limitations of the previous work
EWC: task boundaries are needed to compute the Fisher of the previous task.
SSL: task boundaries are needed to compute the Fisher of previous tasks and the importance of neurons.
PN: task boundaries and labels are needed to add new columns and for inference.
PCN: task boundaries are needed to switch from the progress mode to the compress mode.
GEM: task boundaries and labels are needed to build the episodic memory and compute its gradients.
III. Research Direction
A. Motivation
General training scheme of continual learning (task-aware)
[Figure: Task 1 → Task 2 → Task 3. Learn T1; learn T2 while consolidating T1; learn T3
while consolidating T1 and T2. At each task boundary the model must stop training and,
e.g., compute the Fisher, add a new column, or run distillation.]
III. Research Direction
A. Motivation
Q.
1. When and how should we consolidate knowledge of earlier tasks?
2. How do we make a model that knows when the task is changing?
3. Online version of variational Bayes (Paper for task-agnostic CL)
III. Research Direction
B. Research plan
Reproducing & Extra experiment
● Selfless Sequential Learning
● Progress & Compress
● Variational Continual Learning
Paper Reading
● Online structured Laplace Approximations For Overcoming Catastrophic Forgetting
● Task Agnostic Continual Learning Using Online Variational Bayes
● A Bayesian Approach to On-line Learning
● Overcoming Catastrophic Interference using Conceptor-Aided Backpropagation
● iCaRL: Incremental classifier and representation learning
● Overcoming Catastrophic Forgetting with Hard Attention to the Task
● Lifelong learning with a network of experts
● Distilling the knowledge in a neural network
● etc...