VARIATIONAL CONTINUAL LEARNING
FOR DEEP DISCRIMINATIVE MODELS
2019. 2. 27.
Wonjun Chung
wonjunc@mli.kaist.ac.kr
CONTENTS
1. Continual Learning Backgrounds
2. Continual Learning by Approximate Bayesian
Inference
3. Variational Continual Learning and Episodic
Memory Enhancement
4. Experiments
5. Discussion
PART 1
CONTINUAL LEARNING BACKGROUNDS
- CONCEPTS, BENCHMARKS
• Continual learning is a very general form of online learning
• Data arrive continuously, in a non-i.i.d. way
• Tasks may change over time
• Entirely new tasks can emerge
• The model must adapt to perform well on the entire set of tasks, incrementally and without revisiting all previous data
CONCEPTS OF CONTINUAL LEARNING
• It is challenging to balance adapting to the most recent task against retaining knowledge from old tasks
• Plasticity & stability trade-off
CONCEPTS OF CONTINUAL LEARNING
• Permuted MNIST
• Split MNIST/CIFAR
BENCHMARKS OF CONTINUAL LEARNING
(Figure: example task sequence, Task 1 / Task 2 / Task 3)
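The Permuted MNIST protocol can be sketched in a few lines (a minimal NumPy version on dummy data; `make_permuted_tasks` is an illustrative helper, not from the paper): each task applies one fixed random pixel permutation to every flattened image, so the tasks share labels but have scrambled inputs.

```python
import numpy as np

def make_permuted_tasks(x, n_tasks, seed=0):
    """Build Permuted-MNIST-style tasks: each task applies one fixed
    random pixel (column) permutation to every flattened image in x."""
    rng = np.random.default_rng(seed)
    return [x[:, rng.permutation(x.shape[1])] for _ in range(n_tasks)]

# Dummy stand-in for flattened MNIST images (rows = images, cols = pixels).
x = np.arange(12, dtype=float).reshape(3, 4)
tasks = make_permuted_tasks(x, n_tasks=3)
```

Each task has the same shape as the original data, and each row is a permutation of the corresponding original row.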
• Split MNIST/CIFAR is more difficult than Permuted MNIST
• Multi-head discriminative network
• Each task t has its own “head network”, which is no longer an optimization variable in later tasks
• Discussion point 1:
• How to reduce catastrophic forgetting in the multi-head networks?
BENCHMARKS OF CONTINUAL LEARNING
PART 2
CONTINUAL LEARNING BY
APPROXIMATE BAYESIAN INFERENCE
- VARIATIONAL INFERENCE
• Bayesian Inference provides a natural framework for continual learning
BAYESIAN INFERENCE IN CONTINUAL LEARNING
• The posterior distribution after seeing T tasks (datasets) is recovered by applying Bayes’ rule:
p(θ | D_1:T) ∝ p(θ) ∏_{t=1}^{T} p(D_t | θ) ∝ p(θ | D_1:T−1) p(D_T | θ)
• The true posterior distribution is intractable
• Approximation is required
• Variational KL minimization (variational inference)
BAYESIAN INFERENCE IN CONTINUAL LEARNING
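The recursion can be checked numerically on a toy conjugate model (an illustrative sketch, not the paper's setup): estimating a scalar Gaussian mean under known unit noise, updating the posterior one "task" at a time gives exactly the same posterior as a single batch update on all data.

```python
import numpy as np

def posterior_update(mu, prec, data, noise_prec=1.0):
    """Conjugate Gaussian update: precisions add, and the mean is a
    precision-weighted combination of the prior mean and the data."""
    new_prec = prec + noise_prec * len(data)
    new_mu = (prec * mu + noise_prec * np.sum(data)) / new_prec
    return new_mu, new_prec

rng = np.random.default_rng(0)
tasks = [rng.normal(2.0, 1.0, size=20) for _ in range(3)]

# Recursive (continual) update: p(theta | D_1:t) built from p(theta | D_1:t-1).
mu, prec = 0.0, 1.0  # prior N(0, 1)
for data in tasks:
    mu, prec = posterior_update(mu, prec, data)

# Batch update on all data at once yields the same posterior.
mu_batch, prec_batch = posterior_update(0.0, 1.0, np.concatenate(tasks))
```

In the conjugate case the two agree exactly; in deep models the recursion must instead be approximated, which is what motivates the variational treatment below.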
• Recall the variational KL minimization: q*(θ) = argmin_{q ∈ Q} KL( q(θ) ‖ p(θ | D) )
VARIATIONAL INFERENCE
Minimizing this KL is equivalent to maximizing the variational lower bound (Appendix)
PART 3
VARIATIONAL CONTINUAL LEARNING AND EPISODIC
MEMORY ENHANCEMENT
- CORESET ALGORITHM
VARIATIONAL CONTINUAL LEARNING
Goal of VCL: after each task, project the online posterior back onto a tractable family
q_t(θ) = argmin_{q ∈ Q} KL( q(θ) ‖ (1/Z_t) q_{t−1}(θ) p(D_t | θ) )
• Q : set of allowed approximate posteriors (Gaussian mean-field approximation)
• Z_t : intractable normalizing constant (not required for the optimization)
• The zeroth approximate distribution is defined to be the prior: q_0(θ) = p(θ)
• Repeated approximation may accumulate errors, causing the model to forget old tasks
• Gaussian mean-field approximation: q_t(θ) = ∏_d N( θ_t,d ; μ_t,d , σ²_t,d )
• For each task, the coreset C_t is produced by selecting new data points from the current task and a selection from the old coreset C_{t−1}
• Any heuristic can be used to make the selections
• e.g., random selection, K-center algorithm
CORESET
Coreset: a small representative set of data from previously observed tasks, kept in order to mitigate catastrophic forgetting
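The K-center heuristic can be sketched as greedy farthest-point selection (a minimal version; the function name is illustrative): repeatedly add the point with the largest distance to its nearest already-chosen center.

```python
import numpy as np

def k_center_greedy(x, k, first=0):
    """Greedy K-center selection: start from one point, then repeatedly
    add the point farthest from its nearest chosen center."""
    chosen = [first]
    dists = np.linalg.norm(x - x[first], axis=1)  # distance to nearest center
    while len(chosen) < k:
        i = int(np.argmax(dists))                 # farthest remaining point
        chosen.append(i)
        dists = np.minimum(dists, np.linalg.norm(x - x[i], axis=1))
    return chosen

# Two tight points plus two far outliers: K-center spreads the coreset out.
x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [0.0, 5.0]])
coreset_idx = k_center_greedy(x, k=3)
```

Unlike random selection, this heuristic covers the input space, which is why the two can behave differently as stand-alone baselines.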
• Input: prior p(θ)
• Output: variational and predictive distributions for each task
• Initialize the coreset and variational approximation: C_0 ← ∅, q̃_0(θ) ← p(θ)
• For the first task (t = 1):
• Observe the dataset D_1
• Update the coreset C_1 using C_0 and D_1
• Update the variational distribution for the non-coreset data points:
q̃_1(θ) ≈ q̃_0(θ) p( D_1 ∪ C_0 \ C_1 | θ ), via KL projection onto Q
CORESET VCL ALGORITHM
• (con’t)
• Compute the final variational distribution: q_1(θ) ≈ q̃_1(θ) p(C_1 | θ), via KL projection onto Q
• q_t is used only for prediction, not for propagation to the next task
• Perform prediction at a test input x*: p(y* | x*, D_1:t) = ∫ q_t(θ) p(y* | θ, x*) dθ
• Iterate for t = 1 … T
CORESET VCL ALGORITHM
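The control flow of Coreset VCL can be sketched on the same toy conjugate-Gaussian model (exact conjugate updates stand in for the variational KL projection; all names are illustrative): propagate on non-coreset data, and fold the coreset in only when forming the prediction-time distribution.

```python
import numpy as np

def update(mu, prec, data, noise_prec=1.0):
    """Exact conjugate Gaussian update (stand-in for the KL projection)."""
    new_prec = prec + noise_prec * len(data)
    return (prec * mu + noise_prec * np.sum(data)) / new_prec, new_prec

def coreset_vcl(tasks, k=2, mu0=0.0, prec0=1.0):
    coreset = []                    # C_0 = empty set
    mu, prec = mu0, prec0           # propagated distribution q~_0 = prior
    finals = []
    for data in tasks:
        coreset.extend(data[:k])    # random heuristic: keep first k points
        mu, prec = update(mu, prec, data[k:])        # propagate on non-coreset data
        # Final distribution: fold the whole coreset in once,
        # used only for prediction, never propagated.
        finals.append(update(mu, prec, np.array(coreset)))
    return finals

rng = np.random.default_rng(1)
tasks = [rng.normal(0.5, 1.0, size=10) for _ in range(3)]
finals = coreset_vcl(tasks)
```

In this exact-update toy, the final distribution after the last task matches the batch posterior on all points, since every point is incorporated exactly once; with approximate projections that equivalence no longer holds, which is where the coreset earns its keep.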
CORESET VCL ALGORITHM
(Figure: training and test procedure of Coreset VCL)
• Discussion point 2: Is it reasonable?
• Recall
OBJECTIVE OF VCL
Final objective, minimized with respect to the variational parameters {μ_t, σ_t}:
L_t(q_t) = KL( q_t(θ) ‖ q_{t−1}(θ) ) − Σ_{n=1}^{N_t} E_{q_t(θ)}[ log p( y_t^(n) | θ, x_t^(n) ) ]
(first term: regularization; second term: likelihood)
• The KL divergence between two Gaussians can be computed in closed form (Appendix)
• The expected log-likelihood requires further approximation:
• Monte Carlo sampling: E_{q_t}[ log p(y | θ, x) ] ≈ (1/L) Σ_{l=1}^{L} log p( y | θ^(l), x ), θ^(l) ~ q_t(θ)
• (Local) reparameterization trick: θ^(l) = μ_t + σ_t ⊙ ε^(l), ε^(l) ~ N(0, I)
OBJECTIVE OF VCL
Final objective, minimized with respect to the variational parameters
• Reparameterization makes the Monte Carlo estimate differentiable in the variational parameters
MONTE CARLO GRADIENTS
Proposition (Blundell et al., 2015):
Let ε be a random variable having a probability density p(ε), and let θ = g(ε; φ), where g is a deterministic function. Suppose further that the marginal probability density of θ, q_φ(θ), is such that q_φ(θ) dθ = p(ε) dε.
Then for a function f with derivatives in θ:
∂/∂φ E_{q_φ(θ)}[ f(θ, φ) ] = E_{p(ε)}[ (∂f/∂θ)(∂θ/∂φ) + ∂f/∂φ ]
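A minimal numerical illustration of the setup behind the proposition (NumPy only; the helper name is ours): with θ = μ + σ·ε and ε ~ N(0, 1), a Monte Carlo average over ε estimates E_q[f(θ)], and every sample is a deterministic, differentiable function of (μ, σ), so gradients can pass through.

```python
import numpy as np

def mc_expectation(f, mu, sigma, n=200_000, seed=0):
    """Estimate E_{q(theta)}[f(theta)] for q = N(mu, sigma^2) by
    reparameterization: theta = mu + sigma * eps, eps ~ N(0, 1)."""
    eps = np.random.default_rng(seed).standard_normal(n)
    theta = mu + sigma * eps  # deterministic in (mu, sigma)
    return f(theta).mean()

# Sanity check: E[theta^2] = mu^2 + sigma^2 = 2.0 for mu = sigma = 1.
est = mc_expectation(lambda th: th ** 2, mu=1.0, sigma=1.0)
```

In an actual VCL implementation the same trick is applied per weight inside the network, with autodiff supplying ∂θ/∂φ.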
PART 4
EXPERIMENTS
- OVERVIEW OF RELATED WORKS, CONTRAST
• Continual learning for deep discriminative models:
• Regularized maximum likelihood estimation:
θ_t = argmax_θ Σ_n log p( y_t^(n) | θ, x_t^(n) ) − (λ/2) (θ − θ_{t−1})ᵀ Λ_{t−1} (θ − θ_{t−1})
(likelihood term + regularization term)
RELATED WORK
• λ : overall regularization strength
• Λ_{t−1} : diagonal matrix that encodes the relative strength of regularization on each element of θ
• Continual learning for deep discriminative models:
• Regularized maximum likelihood estimation
• Laplace propagation (LP): Laplace’s approximation at each step, so the penalty matrix is the Hessian of the negative log-posterior at the previous solution
• Diagonal Laplace propagation keeps only the diagonal of the Hessian
RELATED WORK
(likelihood term + penalty term)
• The penalty matrix is initialized using the covariance of the Gaussian prior
• Elastic Weight Consolidation (EWC):
• Approximates the average Hessian of the likelihoods using the Fisher information F_t
• Regularization:
• Only toward the immediately preceding task: (λ/2) (θ − θ_{t−1})ᵀ F_{t−1} (θ − θ_{t−1})
• Toward all previous tasks: Σ_{t′<t} (λ_{t′}/2) (θ − θ_{t′})ᵀ F_{t′} (θ − θ_{t′})
• Synaptic Intelligence (SI):
compares the rate of change of the gradients of the objective with the rate of change of the parameters
RELATED WORK
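An EWC-style penalty can be sketched as follows (a minimal diagonal-Fisher version; function names are illustrative): the empirical diagonal Fisher is the mean squared per-example log-likelihood gradient, and it weights a quadratic pull toward the previous task's solution.

```python
import numpy as np

def diagonal_fisher(per_example_grads):
    """Empirical diagonal Fisher information: mean of squared
    per-example log-likelihood gradients (rows = examples)."""
    return np.mean(np.square(per_example_grads), axis=0)

def ewc_penalty(theta, theta_prev, fisher, lam=1.0):
    """(lam / 2) * sum_i F_i * (theta_i - theta_prev_i)^2."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_prev) ** 2)

grads = np.array([[1.0, 0.0],
                  [3.0, 2.0]])        # toy per-example gradients
F = diagonal_fisher(grads)            # per-parameter importance weights
pen = ewc_penalty(np.array([1.0, 1.0]), np.zeros(2), F)
```

Parameters with larger Fisher values (more informative for the old task) are penalized more strongly for moving, which is the mechanism EWC relies on.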
• Permuted MNIST
• Coreset size = 200
• Discussion point 3:
• There is a significant gap between the random-coreset-only and K-center-only baselines, but the gap vanishes once VCL is applied
AVERAGE TEST SET ACCURACY
• There is no significant performance gap between VCL without a coreset and VCL with a large coreset
EFFECT OF CORESET SIZE
• VCL outperforms EWC and LP, but is slightly worse than SI
SPLIT MNIST ACCURACY
CONTOUR OF THE PREDICTION PROBABILITIES
PART 5
DISCUSSION
• Discussion point 1:
• How to reduce catastrophic forgetting in the multi-head networks?
• Discussion point 2:
• Is it reasonable?
• Discussion point 3:
• There is a significant gap between the random-coreset-only and K-center-only baselines, but the gap vanishes once VCL is applied
• The authors use Bayesian neural networks, but do not discuss uncertainty
• How does learning a new task affect the uncertainty of the old model?
• Uncertainty-guided continual learning?
DISCUSSION
ANY QUESTIONS?
• Nguyen, Li, Bui, and Turner. Variational Continual Learning. ICLR 2018.
• Blundell, Cornebise, Kavukcuoglu, and Wierstra. Weight Uncertainty in Neural Networks. ICML 2015.
REFERENCES
APPENDIX
• KL divergence between two Gaussians (closed form):
KL( N(μ₁, Σ₁) ‖ N(μ₂, Σ₂) ) = ½ [ tr(Σ₂⁻¹ Σ₁) + (μ₂ − μ₁)ᵀ Σ₂⁻¹ (μ₂ − μ₁) − d + log( det Σ₂ / det Σ₁ ) ]
where d is the dimension of θ; for the diagonal (mean-field) case this reduces to a sum of per-parameter terms
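For the mean-field case used in VCL, the closed form reduces to a sum of univariate terms; a minimal NumPy sketch (the helper name is ours):

```python
import numpy as np

def kl_diag_gaussians(mu1, s1, mu2, s2):
    """KL( N(mu1, diag(s1^2)) || N(mu2, diag(s2^2)) ) as a sum of
    per-parameter univariate KL terms."""
    return np.sum(np.log(s2 / s1)
                  + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2)
                  - 0.5)

kl_same = kl_diag_gaussians(np.zeros(2), np.ones(2), np.zeros(2), np.ones(2))
kl_shift = kl_diag_gaussians(np.array([1.0]), np.ones(1),
                             np.zeros(1), np.ones(1))
```

KL between identical Gaussians is zero, and shifting a unit Gaussian's mean by 1 gives KL = 1/2, matching the closed form above.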

More Related Content

What's hot

Recursive Neural Networks
Recursive Neural NetworksRecursive Neural Networks
Recursive Neural Networks
Sangwoo Mo
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion Models
Sangwoo Mo
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density Models
Sangwoo Mo
 
Self-Attention with Linear Complexity
Self-Attention with Linear ComplexitySelf-Attention with Linear Complexity
Self-Attention with Linear Complexity
Sangwoo Mo
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
Pedro Lopes
 
Object-Region Video Transformers
Object-Region Video TransformersObject-Region Video Transformers
Object-Region Video Transformers
Sangwoo Mo
 
Parallelizing Pruning-based Graph Structural Clustering
Parallelizing Pruning-based Graph Structural ClusteringParallelizing Pruning-based Graph Structural Clustering
Parallelizing Pruning-based Graph Structural Clustering
煜林 车
 
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Sangwoo Mo
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
Sangmin Woo
 
Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)
Sangwoo Mo
 
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networksPR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
Taesu Kim
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation Survey
Sangwoo Mo
 
DL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfDL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdf
sagayalavanya2
 
Scalable and Order-robust Continual Learning with Additive Parameter Decompos...
Scalable and Order-robust Continual Learning with Additive Parameter Decompos...Scalable and Order-robust Continual Learning with Additive Parameter Decompos...
Scalable and Order-robust Continual Learning with Additive Parameter Decompos...
MLAI2
 
G. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AI
G. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AIG. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AI
G. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AI
MLILAB
 
J. Park, H. Shim, AAAI 2022, MLILAB, KAISTAI
J. Park, H. Shim, AAAI 2022, MLILAB, KAISTAIJ. Park, H. Shim, AAAI 2022, MLILAB, KAISTAI
J. Park, H. Shim, AAAI 2022, MLILAB, KAISTAI
MLILAB
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
Kien Le
 
Introduction to Hamiltonian Neural Networks
Introduction to Hamiltonian Neural NetworksIntroduction to Hamiltonian Neural Networks
Introduction to Hamiltonian Neural Networks
Miles Cranmer
 
J. Park, AAAI 2022, MLILAB, KAIST AI
J. Park, AAAI 2022, MLILAB, KAIST AIJ. Park, AAAI 2022, MLILAB, KAIST AI
J. Park, AAAI 2022, MLILAB, KAIST AI
MLILAB
 

What's hot (19)

Recursive Neural Networks
Recursive Neural NetworksRecursive Neural Networks
Recursive Neural Networks
 
Introduction to Diffusion Models
Introduction to Diffusion ModelsIntroduction to Diffusion Models
Introduction to Diffusion Models
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density Models
 
Self-Attention with Linear Complexity
Self-Attention with Linear ComplexitySelf-Attention with Linear Complexity
Self-Attention with Linear Complexity
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
 
Object-Region Video Transformers
Object-Region Video TransformersObject-Region Video Transformers
Object-Region Video Transformers
 
Parallelizing Pruning-based Graph Structural Clustering
Parallelizing Pruning-based Graph Structural ClusteringParallelizing Pruning-based Graph Structural Clustering
Parallelizing Pruning-based Graph Structural Clustering
 
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)
 
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networksPR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation Survey
 
DL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfDL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdf
 
Scalable and Order-robust Continual Learning with Additive Parameter Decompos...
Scalable and Order-robust Continual Learning with Additive Parameter Decompos...Scalable and Order-robust Continual Learning with Additive Parameter Decompos...
Scalable and Order-robust Continual Learning with Additive Parameter Decompos...
 
G. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AI
G. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AIG. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AI
G. Park, J.-Y. Yang, et. al., NeurIPS 2020, MLILAB, KAIST AI
 
J. Park, H. Shim, AAAI 2022, MLILAB, KAISTAI
J. Park, H. Shim, AAAI 2022, MLILAB, KAISTAIJ. Park, H. Shim, AAAI 2022, MLILAB, KAISTAI
J. Park, H. Shim, AAAI 2022, MLILAB, KAISTAI
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
Introduction to Hamiltonian Neural Networks
Introduction to Hamiltonian Neural NetworksIntroduction to Hamiltonian Neural Networks
Introduction to Hamiltonian Neural Networks
 
J. Park, AAAI 2022, MLILAB, KAIST AI
J. Park, AAAI 2022, MLILAB, KAIST AIJ. Park, AAAI 2022, MLILAB, KAIST AI
J. Park, AAAI 2022, MLILAB, KAIST AI
 

Similar to Continual learning: Variational continual learning

ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention Networks
Seunghyun Hwang
 
Search to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSearch to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the Eyes
Sungchul Kim
 
Neural network learning ability
Neural network learning abilityNeural network learning ability
Neural network learning ability
Nabeel Aron
 
Comparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural NetworksComparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural Networks
Vincenzo Lomonaco
 
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Universitat Politècnica de Catalunya
 
Deep Learning in Limited Resource Environments
Deep Learning in Limited Resource EnvironmentsDeep Learning in Limited Resource Environments
Deep Learning in Limited Resource Environments
OguzVuruskaner
 
adjoint10_nilsvanvelzen
adjoint10_nilsvanvelzenadjoint10_nilsvanvelzen
adjoint10_nilsvanvelzen
Nils van Velzen
 
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
PR-393: ResLT: Residual Learning for Long-tailed Recognition
PR-393: ResLT: Residual Learning for Long-tailed RecognitionPR-393: ResLT: Residual Learning for Long-tailed Recognition
PR-393: ResLT: Residual Learning for Long-tailed Recognition
Sunghoon Joo
 
Semi-Supervised Learning with Variational Bayesian Inference and Maximum Unce...
Semi-Supervised Learning with Variational Bayesian Inference and Maximum Unce...Semi-Supervised Learning with Variational Bayesian Inference and Maximum Unce...
Semi-Supervised Learning with Variational Bayesian Inference and Maximum Unce...
Kien Duc Do
 
Optimization as a model for few shot learning
Optimization as a model for few shot learningOptimization as a model for few shot learning
Optimization as a model for few shot learning
Katy Lee
 
Learning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted DropoutLearning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted Dropout
Seunghyun Hwang
 
3_Transfer_Learning.pdf
3_Transfer_Learning.pdf3_Transfer_Learning.pdf
3_Transfer_Learning.pdf
FEG
 
Variational continual learning
Variational continual learningVariational continual learning
Variational continual learning
Nguyen Giang
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Sungchul Kim
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...
Dongmin Choi
 
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
thanhdowork
 
Dataset Augmentation and machine learning.pdf
Dataset Augmentation and machine learning.pdfDataset Augmentation and machine learning.pdf
Dataset Augmentation and machine learning.pdf
sudheeremoa229
 
EE5180_G-5.pptx
EE5180_G-5.pptxEE5180_G-5.pptx
EE5180_G-5.pptx
MandeepChaudhary10
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
Tuan Yang
 

Similar to Continual learning: Variational continual learning (20)

ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention Networks
 
Search to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the EyesSearch to Distill: Pearls are Everywhere but not the Eyes
Search to Distill: Pearls are Everywhere but not the Eyes
 
Neural network learning ability
Neural network learning abilityNeural network learning ability
Neural network learning ability
 
Comparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural NetworksComparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural Networks
 
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
 
Deep Learning in Limited Resource Environments
Deep Learning in Limited Resource EnvironmentsDeep Learning in Limited Resource Environments
Deep Learning in Limited Resource Environments
 
adjoint10_nilsvanvelzen
adjoint10_nilsvanvelzenadjoint10_nilsvanvelzen
adjoint10_nilsvanvelzen
 
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
 
PR-393: ResLT: Residual Learning for Long-tailed Recognition
PR-393: ResLT: Residual Learning for Long-tailed RecognitionPR-393: ResLT: Residual Learning for Long-tailed Recognition
PR-393: ResLT: Residual Learning for Long-tailed Recognition
 
Semi-Supervised Learning with Variational Bayesian Inference and Maximum Unce...
Semi-Supervised Learning with Variational Bayesian Inference and Maximum Unce...Semi-Supervised Learning with Variational Bayesian Inference and Maximum Unce...
Semi-Supervised Learning with Variational Bayesian Inference and Maximum Unce...
 
Optimization as a model for few shot learning
Optimization as a model for few shot learningOptimization as a model for few shot learning
Optimization as a model for few shot learning
 
Learning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted DropoutLearning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted Dropout
 
3_Transfer_Learning.pdf
3_Transfer_Learning.pdf3_Transfer_Learning.pdf
3_Transfer_Learning.pdf
 
Variational continual learning
Variational continual learningVariational continual learning
Variational continual learning
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...
 
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
 
Dataset Augmentation and machine learning.pdf
Dataset Augmentation and machine learning.pdfDataset Augmentation and machine learning.pdf
Dataset Augmentation and machine learning.pdf
 
EE5180_G-5.pptx
EE5180_G-5.pptxEE5180_G-5.pptx
EE5180_G-5.pptx
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 

Continual learning: Variational continual learning

  • 1. VARIATIONAL CONTINUAL LEARNING FOR DEEP DISCRIMINATIVE MODELS 2019. 2. 27. Wonjun Chung wonjunc@mli.kaist.ac.kr
  • 2. CONTENTS 1. Continual Learning Backgrounds 2. Continual Learning by Approximate Bayesian Inference 3. Variational Continual Learning and Episodic Memory Enhancement 4. Experiments 5. Discussion
  • 3. PART 1 CONTINUAL LEARNING BACKGROUNDS - CONCEPTS, BENCHMARKS
  • 4. • Continual learning is a very general form of online learning • Data arrive continuously in a non-i.i.d. way • Tasks may change over time • Entirely new tasks can emerge • The model must adapt to perform well on the entire set of tasks incrementally, without revisiting all previous data CONCEPTS OF CONTINUAL LEARNING
  • 5. • It is challenging to balance adapting to the most recent task against retaining knowledge from old tasks • Plasticity & stability trade-off CONCEPTS OF CONTINUAL LEARNING
  • 6. • Permuted MNIST: each task applies a fixed random pixel permutation to every MNIST image • Split MNIST/CIFAR: each task classifies a disjoint subset of the classes BENCHMARKS OF CONTINUAL LEARNING (figure: Task 1, Task 2, Task 3)
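The Permuted MNIST construction can be sketched in a few lines; `make_permuted_tasks` and the toy array below are hypothetical stand-ins for the real data pipeline, not the paper's code.

```python
import numpy as np

def make_permuted_tasks(images, num_tasks, seed=0):
    """Generate Permuted-MNIST-style tasks: each task applies one fixed
    random pixel permutation to every (flattened) image."""
    rng = np.random.default_rng(seed)
    n_pixels = images.shape[1]
    tasks = []
    for _ in range(num_tasks):
        perm = rng.permutation(n_pixels)  # one fixed permutation per task
        tasks.append(images[:, perm])
    return tasks

# Toy stand-in for flattened 28x28 MNIST images (3 images, 4 "pixels")
toy = np.arange(12, dtype=float).reshape(3, 4)
tasks = make_permuted_tasks(toy, num_tasks=3)
```

Each task keeps exactly the same pixel values per image, only reordered, which is why a model must relearn the input mapping for every task.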
  • 7. • Split MNIST/CIFAR is more difficult than Permuted MNIST • Multi-head discriminative network • Each task t has its own “head network”, whose parameters are not optimized in later tasks • Discussion point 1: • How can catastrophic forgetting be reduced in multi-head networks? BENCHMARKS OF CONTINUAL LEARNING
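The multi-head layout can be sketched as a toy numpy model (not the paper's architecture): one shared trunk plus a per-task head that is created when the task arrives and left untouched afterwards. Class and method names here are illustrative assumptions.

```python
import numpy as np

class MultiHeadNet:
    """Sketch of a multi-head discriminative net: a shared trunk plus
    a separate output head per task (shapes are illustrative)."""
    def __init__(self, in_dim, hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W_shared = self.rng.standard_normal((in_dim, hidden)) * 0.1
        self.heads = {}  # task id -> head weights, frozen after its task
        self.hidden = hidden

    def add_head(self, task_id, n_classes):
        # New head created when a new task arrives
        self.heads[task_id] = self.rng.standard_normal((self.hidden, n_classes)) * 0.1

    def forward(self, x, task_id):
        h = np.tanh(x @ self.W_shared)   # shared representation
        return h @ self.heads[task_id]   # task-specific head

net = MultiHeadNet(in_dim=4, hidden=8)
net.add_head(1, n_classes=2)
out = net.forward(np.ones((3, 4)), task_id=1)
```

Only `W_shared` would be updated across tasks; each head is trained during its own task and then held fixed, which is the setup Discussion point 1 asks about.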
  • 8. PART 2 CONTINUAL LEARNING BY APPROXIMATE BAYESIAN INFERENCE - VARIATIONAL INFERENCE
  • 9. • Bayesian inference provides a natural framework for continual learning: the posterior after one task becomes the prior for the next BAYESIAN INFERENCE IN CONTINUAL LEARNING
  • 10. • The posterior distribution after seeing T tasks (datasets) is recovered by applying Bayes' rule recursively: p(θ | D_1:T) ∝ p(θ) ∏_{t=1}^{T} p(D_t | θ) ∝ p(θ | D_1:T−1) p(D_T | θ) • The true posterior distribution is intractable • An approximation is required • Variational KL minimization (variational inference) BAYESIAN INFERENCE IN CONTINUAL LEARNING
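The recursive posterior update can be made concrete with a toy conjugate model where the update is exact; this Beta-Bernoulli example is an illustration added here, not part of the slides.

```python
def update_beta_posterior(alpha, beta, data):
    """Exact recursive Bayesian update for a Beta-Bernoulli model:
    p(theta | D_1:t) is proportional to p(theta | D_1:t-1) * p(D_t | theta).
    Each call folds one task's data into the Beta(alpha, beta) posterior."""
    heads = sum(data)
    return alpha + heads, beta + len(data) - heads

# Stream of two "tasks" (datasets); the posterior after both equals the
# posterior from seeing all the data at once.
a, b = 1, 1                                        # uniform prior Beta(1, 1)
a, b = update_beta_posterior(a, b, [1, 1, 0])      # task 1
a, b = update_beta_posterior(a, b, [1, 0, 0, 0])   # task 2
print((a, b))  # → (4, 5)
```

For neural networks this exact recursion is unavailable, which is why the slides turn to variational KL minimization next.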
  • 12. PART 3 VARIATIONAL CONTINUAL LEARNING AND EPISODIC MEMORY ENHANCEMENT - CORESET ALGORITHM
  • 13. VARIATIONAL CONTINUAL LEARNING • Goal of VCL: project the intractable posterior onto a tractable family at each step, q_t(θ) = argmin_{q ∈ Q} KL( q(θ) ‖ (1/Z_t) q_{t−1}(θ) p(D_t | θ) ) • Q: set of allowed approximate posteriors (Gaussian mean-field approximation) • Z_t: intractable normalizing constant (not required for the optimization) • The zeroth approximate distribution is defined to be the prior: q_0(θ) = p(θ) • Repeated approximation may accumulate errors, causing the model to forget old tasks • Gaussian mean-field approximation: q(θ) = ∏_d N(θ_d; μ_d, σ_d²)
  • 14. • For each task, the coreset C_t is produced by selecting new data points from the current task together with a selection from the old coreset • Any heuristic can be used to make the selections • e.g. random selection, the K-center algorithm CORESET • Coreset: a small representative set of data from previously observed tasks, kept in order to mitigate catastrophic forgetting
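The K-center heuristic mentioned above can be sketched as a greedy farthest-point selection; this is an illustrative implementation, not the paper's code.

```python
import numpy as np

def k_center_greedy(points, k):
    """Greedy K-center selection: repeatedly pick the point farthest
    from the current set of centers (a common coreset heuristic)."""
    centers = [0]  # start from an arbitrary point
    dists = np.linalg.norm(points - points[0], axis=1)
    while len(centers) < k:
        nxt = int(np.argmax(dists))        # farthest point so far
        centers.append(nxt)
        # distance to the *nearest* chosen center
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return centers

# Two well-separated clusters; the heuristic covers both
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.1]])
print(k_center_greedy(pts, 2))  # → [0, 3]
```

Unlike random selection, greedy K-center spreads the coreset over the input space, which is the spirit of the comparison in the experiments.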
  • 15. CORESET VCL ALGORITHM • Input: prior p(θ) • Output: variational distributions q̃_t and q_t for t = 1 … T • Initialize the coreset and variational approximation: C_0 = ∅, q̃_0(θ) = p(θ) • For the first task (t = 1): • Observe the dataset D_1 • Update the coreset C_1 using C_0 and D_1 • Update the variational distribution for the non-coreset data points: q̃_1(θ) ≈ proj( q̃_0(θ) p(D_1 ∪ C_0 \ C_1 | θ) )
  • 16. 1 6 • Con’t • Compute the final variational distribution: • Only used for prediction, and not propagation • Perform prediction at test input : • Iterate for t = 1… T CORESET VCL ALGORITHM
  • 17. CORESET VCL ALGORITHM (figure: train and test phases) • Discussion point 2: Is it reasonable?
  • 18. • Recall OBJECTIVE OF VCL • Final objective, minimized with respect to the variational parameters: L(q_t) = KL( q_t(θ) ‖ q_{t−1}(θ) ) − Σ_n E_{q_t(θ)}[ log p(y_n | θ, x_n) ] • The KL term acts as a regularizer toward the previous posterior; the second term is the expected log-likelihood
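A minimal sketch of the regularization term, assuming the Gaussian mean-field posteriors from the earlier slides; the function name and toy values are assumptions made here.

```python
import numpy as np

def kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p):
    """Closed-form KL(q || p) between diagonal Gaussians: the
    regularizer that ties q_t to the previous posterior q_{t-1}."""
    return 0.5 * np.sum(
        np.log(sig_p**2 / sig_q**2)
        + (sig_q**2 + (mu_q - mu_p)**2) / sig_p**2
        - 1.0
    )

# The KL is zero exactly when q matches the previous posterior
mu = np.array([0.5, -1.0])
sig = np.array([1.0, 0.3])
print(kl_diag_gaussians(mu, sig, mu, sig))  # → 0.0
```

Under the mean-field assumption this term is available in closed form, so only the expected log-likelihood needs sampling-based approximation.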
  • 19. • The KL divergence between two Gaussians can be computed in closed form (Appendix) • The expected log-likelihood requires further approximation: • Monte Carlo sampling • The local reparameterization trick OBJECTIVE OF VCL
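The Monte Carlo estimate with the reparameterization trick can be sketched as follows; `mc_expected_log_lik` and the toy log-likelihood are illustrative assumptions, not the paper's code.

```python
import numpy as np

def mc_expected_log_lik(mu, sigma, log_lik, n_samples=5000, seed=0):
    """Monte Carlo estimate of E_q[log p(D | theta)] using the
    reparameterization theta = mu + sigma * eps, eps ~ N(0, I),
    which makes the estimate differentiable in (mu, sigma)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples,) + mu.shape)
    theta = mu + sigma * eps
    return np.mean([log_lik(t) for t in theta])

# Sanity check on a toy case: with log_lik(theta) = -theta^2 the exact
# expectation is -(mu^2 + sigma^2) = -1.25 here
mu, sigma = np.array([1.0]), np.array([0.5])
est = mc_expected_log_lik(mu, sigma, lambda t: -np.sum(t**2))
```

In a real VCL implementation the gradient would flow through `theta` into `mu` and `sigma` via automatic differentiation; numpy only shows the estimator itself.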
  • 20. MONTE CARLO GRADIENTS • Proposition: Let ε be a random variable with probability density q(ε), and let w = t(θ, ε), where t is a deterministic function. Suppose further that the marginal probability density of w, q(w | θ), is such that q(ε) dε = q(w | θ) dw. Then, for a function f with derivatives in w: ∂/∂θ E_{q(w|θ)}[ f(w, θ) ] = E_{q(ε)}[ (∂f/∂w)(∂w/∂θ) + ∂f/∂θ ]
  • 21. PART 4 EXPERIMENTS - OVERVIEW OF RELATED WORKS, CONTRAST
  • 22. • Continual learning for deep discriminative models: • Regularized maximum-likelihood estimation: likelihood plus a regularization term • A scalar coefficient sets the overall regularization strength • A diagonal matrix encodes the relative strength of regularization on each element of the parameter vector RELATED WORK
  • 23. • Continual learning for deep discriminative models: • Regularized maximum-likelihood estimation: likelihood plus a penalty term • Laplace propagation (LP): Laplace's approximation at each step • Diagonal Laplace propagation • The penalty is initialized using the covariance of the Gaussian prior RELATED WORK
  • 24. • Elastic Weight Consolidation (EWC): • Approximates the average Hessian of the likelihoods using the Fisher information • Regularization: either toward the parameters from the task just before, or toward all previous tasks • Synaptic Intelligence (SI): • Compares the rate of change of the gradients of the objective with the rate of change of the parameters RELATED WORK
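A sketch of the EWC-style quadratic penalty described above, with a diagonal Fisher estimate; the function name, the scaling convention, and the toy values are illustrative assumptions.

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher_diag, lam=1.0):
    """EWC quadratic penalty: (lam / 2) * sum_i F_i (theta_i - theta*_i)^2.
    F_i is a diagonal Fisher-information estimate from the previous task,
    so parameters important to old tasks are anchored more strongly."""
    return 0.5 * lam * np.sum(fisher_diag * (theta - theta_old) ** 2)

theta_old = np.array([1.0, -2.0])   # parameters after the previous task
fisher = np.array([4.0, 0.0])       # second parameter was unimportant
theta = np.array([1.5, 0.0])        # candidate parameters for the new task
print(ewc_penalty(theta, theta_old, fisher))  # → 0.5
```

Note how moving the unimportant second parameter costs nothing, while moving the first is penalized in proportion to its Fisher weight; VCL replaces this fixed quadratic with the KL to the previous variational posterior.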
  • 25. AVERAGE TEST SET ACCURACY • Permuted MNIST • Coreset size = 200 • Discussion point 3: • There is a significant gap between random coreset (R.C.) only and K-center only, but the gap vanishes when VCL is applied
  • 26. • There is no significant performance gap between VCL without a coreset and VCL with a large coreset EFFECT OF CORESET SIZE
  • 27. • VCL outperforms EWC and LP but is slightly worse than SI SPLIT MNIST ACCURACY
  • 28. CONTOUR OF THE PREDICTION PROBABILITIES (figure only)
  • 30. DISCUSSION • Discussion point 1: • How can catastrophic forgetting be reduced in multi-head networks? • Discussion point 2: • Is the coreset train/test scheme reasonable? • Discussion point 3: • There is a significant gap between random coreset only and K-center only, but the gap vanishes when VCL is applied • The authors use Bayesian neural networks but do not discuss uncertainty • How does learning a new task affect the uncertainty of the old model? • Uncertainty-guided continual learning?
  • 32. • Variational Continual Learning (ICLR 2018) • Weight Uncertainty in Neural Networks (ICML 2015) REFERENCES
  • 33. APPENDIX • KL divergence between two Gaussians: KL( N(μ_q, Σ_q) ‖ N(μ_p, Σ_p) ) = ½ [ tr(Σ_p⁻¹ Σ_q) + (μ_p − μ_q)ᵀ Σ_p⁻¹ (μ_p − μ_q) − d + ln( det Σ_p / det Σ_q ) ], where d is the dimensionality of θ