In the last decade, Convolutional Neural Networks (CNNs) have been shown to perform incredibly well in many computer vision tasks such as object recognition and object detection, thanks to their ability to extract meaningful, high-level invariant features. However, partly because of their complex training and tricky hyper-parameter tuning, CNNs have been scarcely studied in the context of incremental learning, where data arrive in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN-based architectures, targeting real-world applications.
If you are interested in this work, please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
Comparing Incremental Learning Strategies for Convolutional Neural Networks
1. COMPARING INCREMENTAL LEARNING STRATEGIES FOR CONVOLUTIONAL NEURAL NETWORKS
Vincenzo Lomonaco & Davide Maltoni
{vincenzo.lomonaco, davide.maltoni}@unibo.it
Department of Computer Science and Engineering – DISI
University of Bologna
2. OUTLINE
1. Introduction
• CNNs and current limitations
• Incremental learning: Why?
2. Incremental Learning Strategies for CNNs
• Definitions
• Instantiations used in the experiments
3. Datasets
• iCubWorld28
• BigBrother
4. Experiments and Results
• Experimental design
• Results analysis
5. Conclusions and Future Work
4. INTRODUCTION – CNNs and Current Limitations
Strengths:
• State of the art for many tasks in computer vision, NLP, speech recognition, etc.
• Very general and adaptive
• Work directly on raw data (no hand-engineered features required)
Limitations:
• Computationally demanding
• Tricky hyper-parameter tuning
Applicability in an incremental learning scenario?
6. INTRODUCTION – Incremental Learning: Why?
Constraints:
• Memory: we can't afford to keep all the batches in memory.
• Computational power: we can't afford to retrain our classification model from scratch after each batch.
[Figure: a stream of training batches Batch_0, Batch_1, …, Batch_n; Batch_0 is the initial batch, the following ones are the incremental batches.]
7. INTRODUCTION – Incremental Learning: Why?
Goal:
• Maximize accuracy after each batch
• Move towards a smoother, more natural learning process while still using CNNs
9. INTRODUCTION – Incremental Learning: Why?
[Figure: the model M_0 is trained on Batch_0; updating M_0 with Batch_1 yields M_1, and so on up to Batch_n.]
• We can free the memory occupied by Batch_0 and obtain M_1 simply by updating M_0 with the incoming batch.
• However, we risk forgetting what we have previously learned.
14. INC. LEARNING STRATEGIES FOR CNNS - Definitions
The different possibilities we explored for dealing with an incremental tuning/learning scenario can be conveniently framed as three main strategies:
1. Training/tuning an ad hoc CNN architecture suited to the problem.
2. Using an already trained CNN as a fixed feature extractor in conjunction with an incremental classifier.
3. Fine-tuning an already trained CNN.
15. INC. LEARNING STRATEGIES FOR CNNS - Instantiations
In our experiments (focused on image classification) we tested one instantiation of each of the aforementioned strategies:
1. (Ad hoc arch.) LeNet7
Consists of the classical "LeNet7" proposed by Yann LeCun in 2004, still competitive on low/medium-scale problems.
2. (CNN-fixed w. inc. classifier) CaffeNet + SVM
Consists of a pre-trained CNN provided with the Caffe library (the "BVLC Reference CaffeNet", based on the "AlexNet" architecture) feeding an incremental linear SVM as classifier (see the sketch below).
3. (CNN fine-tuning) CaffeNet + FT
Consists again of the "BVLC Reference CaffeNet", but instead of being used as a fixed feature extractor the network is fine-tuned to suit the new task.
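A minimal sketch of how instantiation 2 could be realized in Python, assuming scikit-learn's SGDClassifier with hinge loss as the incremental linear SVM; extract_features and incremental_batches are hypothetical stand-ins for a forward pass through the frozen CNN (e.g., CaffeNet fc7 activations) and for the batch stream:

import numpy as np
from sklearn.linear_model import SGDClassifier

def extract_features(images):
    """Hypothetical stand-in: forward `images` through the frozen pre-trained
    CNN and return the activations of a late layer (e.g., fc7)."""
    raise NotImplementedError

classes = np.arange(28)                     # e.g., the 28 iCubWorld28 classes
svm = SGDClassifier(loss="hinge")           # a linear SVM trained by SGD

for images, labels in incremental_batches:  # Batch_0, Batch_1, ..., Batch_n
    feats = extract_features(images)        # CNN weights stay fixed
    # partial_fit updates the classifier without revisiting old batches,
    # so each batch can be discarded right after this call.
    svm.partial_fit(feats, labels, classes=classes)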
16. INC. LEARNING STRATEGIES FOR CNNS - Instantiations
Furthermore, for the BigBrother dataset we decided to test an additional pair of instantiations:
4. (CNN-fixed w. inc. classifier) VGG_Face + SVM
Consists of a pre-trained 16-layer CNN called "VGG_Face", trained on a very large dataset of faces (2,622 subjects and 2.6M images); again, an incremental linear SVM is used as classifier.
5. (CNN fine-tuning) VGG_Face + FT
Consists again of the "VGG_Face" CNN, but instead of being used as a fixed feature extractor the network is fine-tuned to suit the new task (see the fine-tuning sketch below).
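For the two fine-tuning instantiations, a minimal sketch using the pycaffe interface; the file names and iteration counts are illustrative assumptions, not the exact settings used in the paper:

import caffe

caffe.set_mode_gpu()

# The solver prototxt would specify a low base learning rate, with a higher
# lr_mult on the replaced top layer(s) sized for the new classes.
solver = caffe.SGDSolver('finetune_solver.prototxt')        # hypothetical file
solver.net.copy_from('bvlc_reference_caffenet.caffemodel')  # pre-trained weights

n_batches = 5                       # illustrative number of incremental batches
for batch_id in range(n_batches):   # one tuning round per incremental batch
    # (Re)point the data layer at the current batch, e.g., one LMDB per batch.
    solver.step(1000)               # illustrative number of SGD iterations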
18. DATASETS
We were interested in datasets where:
• the objects of interest have been acquired in a number of successive sessions;
• the environmental conditions can change between sessions.
We focused on two application fields where incremental learning is very relevant (robotics and biometrics) and chose one dataset for each:
• iCubWorld28
• BigBrother
22. EXPERIMENTS AND RESULTS – Exp. Design
Experimental policy:
• We trained the models until full convergence on the first batch of data.
• We tuned them on the successive incremental batches, trying to balance the trade-off between accuracy gain and forgetting (see the sketch below).
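The policy above can be summarized in a short sketch; Model is a hypothetical wrapper exposing the operations any of the five instantiations would need:

def run_protocol(model, batches, test_set):
    """Train on Batch_0 until convergence, then tune on each incremental
    batch, tracking accuracy on a fixed test set after every step."""
    x0, y0 = batches[0]
    model.train_to_convergence(x0, y0)   # full training on the initial batch
    accuracies = [model.accuracy(*test_set)]
    for x, y in batches[1:]:
        # A short tuning round: enough iterations to learn the new batch,
        # few enough (and with a low learning rate) to limit forgetting.
        model.tune(x, y)
        accuracies.append(model.accuracy(*test_set))
    return accuracies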
24. EXPERIMENTS AND RESULTS – iCubWorld28 Results
• CaffeNet + SVM shows a very good increase in recognition rate across the incremental batches.
• CaffeNet + FT is the most effective strategy.
• LeNet7 struggles to learn the complex invariant features necessary for this problem.
26. EXPERIMENTS AND RESULTS – BigBrother Results
• The LeNet7 model performs slightly better than CaffeNet + SVM or CaffeNet + FT.
• VGG_Face + SVM and VGG_Face + FT achieve impressive performance on this problem.
• VGG_Face + SVM seems to be the best choice in terms of both accuracy and stability.
28. EXPERIMENTS AND RESULTS – Dealing with Forgetting
• An adjustable learning rate is significantly more stable.
• A simple thresholding approach was used (a possible rule is sketched below).
• We did not find any significant difference when using a continuous approach.
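One plausible thresholding rule, purely as an assumption for illustration (the slides only state that a simple thresholding approach was used): scale the tuning learning rate with the size of the incoming batch and clip it, so that no single batch can dominate what was learned before:

def tuning_lr(base_lr, batch_size, initial_batch_size,
              min_lr=1e-5, max_lr=1e-3):
    # Hypothetical rule: larger incremental batches justify a larger step,
    # but hard thresholds (rather than a continuous schedule) bound the
    # update strength and hence the forgetting.
    lr = base_lr * batch_size / initial_batch_size
    return min(max(lr, min_lr), max_lr)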
30. CONCLUSIONS AND FUTURE WORK
• When possible (i.e., when transferring from the same domain), it is preferable to use a CNN as a fixed feature extractor feeding an incremental classifier.
• If the features are not optimized for the task, tuning the low-level layers may be preferable, and the learning strength can be used to control forgetting.
• Training a CNN from scratch can be advantageous if the problem patterns (and feature invariances) are highly specific and a sufficient number of samples is available.
31. CONCLUSIONS AND FUTURE WORK
In the near future we plan to extend this work by:
• performing a more extensive experimental evaluation;
• finding a more principled way to control forgetting and to adapt the tuning parameters to the size (and bias) of each incremental batch;
• studying real-world applications of semi-supervised incremental learning strategies for CNNs.
32. COMPARING INCREMENTAL LEARNING STRATEGIES FOR CONVOLUTIONAL NEURAL NETWORKS
Vincenzo Lomonaco & Davide Maltoni
{vincenzo.lomonaco, davide.maltoni}@unibo.it
Department of Computer Science and Engineering – DISI
University of Bologna
Thank you for your attention.
Any Questions?