5. Introduction
Problem in applications: image recognition
• New capabilities often have to be added to an existing vision system
• Retraining assumes the old data set is still available
• In practice this is often infeasible
• Goal: a CNN-based vision system that
• needs no old-task data
• trains using only new-task data
• preserves old-task performance
• LwF: a new method for neural nets to learn without forgetting
6. Introduction
CNN: Convolutional Neural Network
• Operates on volumes (width × height × depth)
• Convolution (filter): reduces the input to a smaller feature map carrying specific information
• ReLU: non-linear activation applied after each convolution
• Pooling: reduces the spatial size to cut computation and the number of parameters
• Max pooling keeps only the maximum value in each window (see the sketch below)
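A minimal sketch of such a block (assuming PyTorch; the layer sizes here are illustrative, not taken from the paper):

import torch
import torch.nn as nn

# One CNN block: convolution extracts local features over the input volume,
# ReLU adds non-linearity, max pooling halves the spatial size.
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),  # keep only the max in each 2x2 window
)

x = torch.randn(1, 3, 32, 32)     # a batch of one 32x32 RGB image
print(block(x).shape)             # torch.Size([1, 16, 16, 16]): spatial size halved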
14. Present methods
Currently developed methods
Less Forgetting Learning (LFL):
• Similar to LwF
• Hinders change in:
• the task-specific decision boundary
• the shared representation
• Adds an L2 loss (sketched below) so that:
• θs is discouraged from changing for the new task
• θo remains the same
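A minimal sketch of that L2 penalty (assuming PyTorch; old_net and new_net are hypothetical feature extractors with the same architecture):

import torch
import torch.nn.functional as F

def lfl_feature_loss(old_net, new_net, x):
    # Penalize drift of the shared representation: compare the features of
    # the frozen original network with those of the network being trained.
    with torch.no_grad():
        old_features = old_net(x)   # frozen reference (old θs)
    new_features = new_net(x)
    return F.mse_loss(new_features, old_features)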
15. Present methods
Currently developed methods
Cross-stitch Network:
• Works on multi-task learning (MTL)
• Introduces the cross-stitch module (sketched below)
• Jointly learns:
• two network blocks with the same structure
• two pairs of weights that linearly mix the blocks' outputs
• Outperforms joint training
• But needs the old-task data set
• And increases the network size
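A minimal sketch of a cross-stitch unit (assuming PyTorch; initializing the mixing matrix to the identity, i.e. no mixing, is an assumption):

import torch
import torch.nn as nn

class CrossStitch(nn.Module):
    # Learns a 2x2 mixing matrix that linearly combines the activations of
    # two same-structure blocks, one per task.
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.eye(2))

    def forward(self, x_a, x_b):
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b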
16. Present methods
Currently developed methods
WA-CNN:
• Expands the network (θs)
• Improves new-task performance
• Freezes θo (see the freezing sketch below)
• Maintains old-task performance
• Outperforms traditional fine-tuning
• But increases the network size faster than LwF
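The freezing pattern itself is simple; a minimal sketch (assuming PyTorch; old_head is a hypothetical module holding θo):

# Freeze the old-task parameters so old-task performance is maintained
# while the expanded part of the network trains on the new task.
for p in old_head.parameters():
    p.requires_grad = False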
18. The proposed method
• Uses only new-task data to train
• Preserves the network's original capabilities
• Performs favourably compared to:
• feature extraction
• fine-tuning adaptation
• Performs similarly to multi-task learning that uses the old data set
19. The proposed method
• A unified vision system
• The CNN has parameters:
• θs: shared parameters
• θo: old-task (task-specific) parameters
• The goal is to add θn: new-task parameters (see the sketch below)
• Learn parameters that work well on both the old and the new task
• using only the new-task data set, not the old one
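A minimal sketch of this three-part parameterization (assuming PyTorch; shared, feat_dim, and the class counts are illustrative):

import torch.nn as nn

class LwFNet(nn.Module):
    # θs: shared feature extractor, θo: old-task head, θn: new-task head.
    def __init__(self, shared, feat_dim, n_old_classes, n_new_classes):
        super().__init__()
        self.shared = shared                                 # θs
        self.old_head = nn.Linear(feat_dim, n_old_classes)   # θo
        self.new_head = nn.Linear(feat_dim, n_new_classes)   # θn (added)

    def forward(self, x):
        features = self.shared(x)
        return self.old_head(features), self.new_head(features)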
20. The proposed method
• Advantages over common approaches:
• Classification performance
• outperforms feature extraction and fine-tuning
• Computational efficiency
• faster training and test time
• though training is slower than fine-tuning
• Simplicity in deployment
• no need to retrain the existing network when adapting it to a new task
22. Procedure
Phase I: Initialization
• The old network's output Yo on the old tasks is recorded for the new-task data
• The response is a set of label probabilities
• A node is added for each new class (see the sketch below)
• Its weights are initialised randomly
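A minimal sketch of this phase (assuming PyTorch and the two-head network sketched earlier; new_task_loader is a hypothetical data loader):

import torch

@torch.no_grad()
def record_old_responses(net, new_task_loader):
    # Phase I: run each new-task image through the original network and
    # record Yo, the old-task label probabilities, before any training.
    net.eval()
    responses = []
    for images, _ in new_task_loader:
        old_logits, _ = net(images)   # old-task head only
        responses.append(torch.softmax(old_logits, dim=1))
    return responses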
23. Procedure
Phase II: Training
• Train to minimize the loss over all tasks
• Optimized with stochastic gradient descent, with weight decay as regularization
• Two steps (sketched below):
• Warm-up step
• freeze θo and θs and train only θn
• Joint-optimization step
• train all weights
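A minimal sketch of the two steps (assuming PyTorch and the LwFNet sketch above; the learning rates are illustrative):

import torch

# Warm-up step: freeze θs and θo, train only the new head θn.
for p in net.shared.parameters():
    p.requires_grad = False
for p in net.old_head.parameters():
    p.requires_grad = False
warmup_opt = torch.optim.SGD(net.new_head.parameters(), lr=0.01, momentum=0.9)

# Joint-optimization step: unfreeze everything and train all weights.
for p in net.parameters():
    p.requires_grad = True
joint_opt = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9,
                            weight_decay=5e-4)  # weight decay as the regularizer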
24. Procedure
Phase II: Training
• Multinomial logistic loss for the new task
• Knowledge distillation loss for the old tasks
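Putting these together, the paper's training objective (with R a weight-decay regularizer and λo the balance weight discussed on the next slide) is, in LaTeX form:

\theta_s^*, \theta_o^*, \theta_n^* =
  \operatorname{argmin}_{\hat\theta_s, \hat\theta_o, \hat\theta_n}
  \Big( \lambda_o \, \mathcal{L}_{old}(Y_o, \hat Y_o)
      + \mathcal{L}_{new}(Y_n, \hat Y_n)
      + \mathcal{R}(\hat\theta_s, \hat\theta_o, \hat\theta_n) \Big)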
25. Procedure
Phase II: Training
• The knowledge distillation loss is calculated from:
• the recorded probabilities Yo and the current old-task probabilities
• a temperature T > 1; usually T = 2
• λo is a loss-balance weight, set to 1 by default
• larger λo favours old-task performance
• smaller λo favours new-task performance
• (A sketch of the loss follows.)
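A minimal sketch of the distillation loss (assuming PyTorch; the small epsilon for numerical stability is an implementation assumption):

import torch

def distillation_loss(y_old, y_hat_old, T=2.0):
    # Raise the recorded probabilities Yo and the current old-task
    # probabilities to the power 1/T and renormalize; T > 1 increases the
    # weight of small probabilities. Then take the modified cross-entropy.
    yp = y_old ** (1.0 / T)
    yp = yp / yp.sum(dim=1, keepdim=True)
    yhp = y_hat_old ** (1.0 / T)
    yhp = yhp / yhp.sum(dim=1, keepdim=True)
    return -(yp * torch.log(yhp + 1e-8)).sum(dim=1).mean()

# total_loss = lambda_o * distillation_loss(Yo, current_old_probs) + new_task_loss
# with lambda_o = 1 by default.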
26. Experiment
• Setup:
• a large dataset trains the initial network
• a smaller dataset adds the new task
• Old/original task:
• ImageNet: 1000 object categories, more than 1000K training images
• Places365-Standard: 365 scene classes, ~1600K training images
• New task:
• PASCAL VOC (“VOC”): ~6K training images
• Caltech-UCSD Birds (“CUB”): ~6K training images
• MIT indoor scenes (“Scenes”): ~6K training images
27. Experiment
• Two scenarios:
• Single new task:
• on the new task, LwF outperforms LFL, fine-tuning only the FC layers, feature extraction, and fine-tuning in most dataset pairs
• on the old task, LwF performs better than fine-tuning
• but underperforms feature extraction, fine-tuning only the FC layers, and LFL
• Multiple new tasks:
• LwF outperforms all compared methods except joint training
29. Extension of LwF
• Network expansion (sketched below)
• adds nodes to some layers
• allows new-task-specific information to be stored
• Used together with LwF
• Performs better than feature extraction
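A minimal sketch of widening one fully connected layer (assuming PyTorch; the paper's expansion details may differ):

import torch
import torch.nn as nn

def widen_linear(layer, extra_out):
    # Add extra_out randomly initialised output nodes so the layer can store
    # new-task-specific information; the original weights are copied unchanged.
    new_layer = nn.Linear(layer.in_features, layer.out_features + extra_out)
    with torch.no_grad():
        new_layer.weight[:layer.out_features] = layer.weight
        new_layer.bias[:layer.out_features] = layer.bias
    return new_layer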
30. Limitations
• Cannot deal with a change in domain
• All new-task data must be present before computing the old-task responses
• Learning each new task decreases old-task recovery
Notes:
We learn new things when new neurons are created for them in our brain; the better the plasticity, the better we remember. As time goes by, without refreshing, these memories tend to fade away. The same holds for artificial neural networks (ANNs).
For example, if lemur identification is added to a network trained to recognise dogs, the network forgets the dog functionality: the old task is forgotten. Image recognition is one of the well-known victims of this forgetting.