In the last decade, Convolutional Neural Networks (CNNs) have been shown to perform incredibly well in many computer vision tasks such as object recognition and object detection, thanks to their ability to extract meaningful, high-level invariant features. However, partly because of their complex training and tricky hyper-parameter tuning, CNNs have been scarcely studied in the context of incremental learning, where data arrive in consecutive batches and retraining the model from scratch is unfeasible. In this work we compare different incremental learning strategies for CNN-based architectures, targeting real-world applications.
If you are interested in this work, please cite:
Lomonaco, V., & Maltoni, D. (2016, September). Comparing Incremental Learning Strategies for Convolutional Neural Networks. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition (pp. 175-184). Springer International Publishing.
For further information visit my website: http://www.vincenzolomonaco.com/
Comparing Incremental Learning Strategies for Convolutional Neural Networks
1. COMPARING INCREMENTAL LEARNING STRATEGIES FOR CONVOLUTIONAL NEURAL NETWORKS
Vincenzo Lomonaco & Davide Maltoni
{vincenzo.lomonaco, davide.maltoni}@unibo.it
Department of Computer Science and Engineering – DISI
University of Bologna
2. OUTLINE
1. Introduction
• CNNs and current limitations
• Incremental learning: Why?
2. Incremental Learning Strategies for CNNs
• Definitions
• Instantiations used in the experiments
3. Datasets
• iCubWorld28
• BigBrother
4. Experiments and Results
• Experimental design
• Results analysis
5. Conclusions and Future Work
4. INTRODUCTION – CNNs and Current Limitations
Strengths:
• State of the art for many tasks in computer vision, NLP, speech recognition, etc.
• Very general and adaptive
• Work directly on raw data (no hand-engineered features required)
Limitations:
• Computationally demanding
• Tricky hyper-parameter tuning
Applicability in an incremental learning scenario?
6. INTRODUCTION – Incremental Learning: Why?
Constraints:
• Memory: we can't afford to keep all the batches in memory.
• Computational power: we can't afford to retrain our classification model from scratch after each batch.
[Figure: a stream of training batches Batch_0, Batch_1, …, Batch_n; Batch_0 is the initial batch, the following ones are the incremental batches.]
7. INTRODUCTION – Incremental Learning: Why?
Goal:
• Maximize accuracy after each batch
• Move towards a smoother, more natural learning process while still using CNNs
9. INTRODUCTION – Incremental Learning: Why?
[Figure: the model M_0 is trained on Batch_0; updating M_0 with Batch_1 yields M_1, and so on up to Batch_n.]
• We can free the memory occupied by Batch_0 and obtain M_1 simply by updating M_0 with the incoming batch.
• However, we risk forgetting what we have previously learned.
14. INC. LEARNING STRATEGIES FOR CNNS - Definitions
The different possibilities we explored for dealing with an incremental tuning/learning scenario can be conveniently framed as three main strategies:
1. Training/tuning an ad hoc CNN architecture suited to the problem.
2. Using an already trained CNN as a fixed feature extractor in conjunction with an incremental classifier.
3. Fine-tuning an already trained CNN.
15. INC. LEARNING STRATEGIES FOR CNNS - Instantiations
In our experiments (focused on image classification) we tested one instantiation of each of the aforementioned strategies:
1. (Ad hoc arch.) LeNet7
Consists of the classical "LeNet7" proposed by Yann LeCun in 2004, still competitive on low/medium-scale problems.
2. (CNN-fixed w. inc. classifier) CaffeNet + SVM
Consists of a pre-trained CNN provided with the Caffe library (the "BVLC Reference CaffeNet", based on the "AlexNet" architecture) feeding an incremental linear SVM as classifier (see the sketch below).
3. (CNN fine-tuning) CaffeNet + FT
Consists again of the "BVLC Reference CaffeNet", but instead of being used as a fixed feature extractor the network is fine-tuned to suit the new task.
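A minimal sketch of how instantiation 2 could be realized in Python, assuming scikit-learn's SGDClassifier with hinge loss as the incremental linear SVM; extract_features and incremental_batches are hypothetical stand-ins for a forward pass through the frozen CNN (e.g., CaffeNet fc7 activations) and for the batch stream:

import numpy as np
from sklearn.linear_model import SGDClassifier

def extract_features(images):
    """Hypothetical stand-in: forward `images` through the frozen pre-trained
    CNN and return the activations of a late layer (e.g., fc7)."""
    raise NotImplementedError

classes = np.arange(28)                     # e.g., the 28 iCubWorld28 classes
svm = SGDClassifier(loss="hinge")           # a linear SVM trained by SGD

for images, labels in incremental_batches:  # Batch_0, Batch_1, ..., Batch_n
    feats = extract_features(images)        # CNN weights stay fixed
    # partial_fit updates the classifier without revisiting old batches,
    # so each batch can be discarded right after this call.
    svm.partial_fit(feats, labels, classes=classes)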
16. INC. LEARNING STRATEGIES FOR CNNS - Instantiations
Furthermore, for the BigBrother dataset we decided to test an additional pair of instantiations:
4. (CNN-fixed w. inc. classifier) VGG_Face + SVM
Consists of a pre-trained 16-layer CNN called "VGG_Face", trained on a very large dataset of faces (2,622 subjects and 2.6M images); again, an incremental linear SVM is used as classifier.
5. (CNN fine-tuning) VGG_Face + FT
Consists again of the "VGG_Face" CNN, but instead of being used as a fixed feature extractor the network is fine-tuned to suit the new task (see the fine-tuning sketch below).
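For the two fine-tuning instantiations, a minimal sketch using the pycaffe interface; the file names and iteration counts are illustrative assumptions, not the exact settings used in the paper:

import caffe

caffe.set_mode_gpu()

# The solver prototxt would specify a low base learning rate, with a higher
# lr_mult on the replaced top layer(s) sized for the new classes.
solver = caffe.SGDSolver('finetune_solver.prototxt')        # hypothetical file
solver.net.copy_from('bvlc_reference_caffenet.caffemodel')  # pre-trained weights

n_batches = 5                       # illustrative number of incremental batches
for batch_id in range(n_batches):   # one tuning round per incremental batch
    # (Re)point the data layer at the current batch, e.g., one LMDB per batch.
    solver.step(1000)               # illustrative number of SGD iterations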
18. DATASETS
We were interested in datasets where:
• the objects of interest have been acquired in a number of successive sessions;
• the environmental conditions can change between sessions.
We focused on two application fields where incremental learning is very relevant (robotics and biometrics) and chose one dataset for each:
• iCubWorld28
• BigBrother
22. EXPERIMENTS AND RESULTS – Exp. Design
Experimental policy:
• We trained the models until full convergence on the first batch of data.
• We tuned them on the successive incremental batches, trying to balance the trade-off between accuracy gain and forgetting (see the sketch below).
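The policy above can be summarized in a short sketch; Model is a hypothetical wrapper exposing the operations any of the five instantiations would need:

def run_protocol(model, batches, test_set):
    """Train on Batch_0 until convergence, then tune on each incremental
    batch, tracking accuracy on a fixed test set after every step."""
    x0, y0 = batches[0]
    model.train_to_convergence(x0, y0)   # full training on the initial batch
    accuracies = [model.accuracy(*test_set)]
    for x, y in batches[1:]:
        # A short tuning round: enough iterations to learn the new batch,
        # few enough (and with a low learning rate) to limit forgetting.
        model.tune(x, y)
        accuracies.append(model.accuracy(*test_set))
    return accuracies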
24. EXPERIMENTS AND RESULTS – iCubWorld28 Results
• CaffeNet + SVM shows a very good increase in recognition rate across the incremental batches.
• CaffeNet + FT is the most effective strategy.
• LeNet7 struggles to learn the complex invariant features necessary for this problem.
26. EXPERIMENTS AND RESULTS – BigBrother Results
• The LeNet7 model performs slightly better than CaffeNet + SVM or CaffeNet + FT.
• VGG_Face + SVM and VGG_Face + FT achieve impressive performance on this problem.
• VGG_Face + SVM seems to be the best choice in terms of both accuracy and stability.
28. EXPERIMENTS AND RESULTS – Dealing with Forgetting
• An adjustable learning rate is significantly more stable.
• A simple thresholding approach was used (a possible rule is sketched below).
• We did not find any significant difference when using a continuous approach.
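One plausible thresholding rule, purely as an assumption for illustration (the slides only state that a simple thresholding approach was used): scale the tuning learning rate with the size of the incoming batch and clip it, so that no single batch can dominate what was learned before:

def tuning_lr(base_lr, batch_size, initial_batch_size,
              min_lr=1e-5, max_lr=1e-3):
    # Hypothetical rule: larger incremental batches justify a larger step,
    # but hard thresholds (rather than a continuous schedule) bound the
    # update strength and hence the forgetting.
    lr = base_lr * batch_size / initial_batch_size
    return min(max(lr, min_lr), max_lr)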
30. CONCLUSIONS AND FUTURE WORK
• When possible (i.e., when transferring from the same domain), it is preferable to use a CNN as a fixed feature extractor feeding an incremental classifier.
• If the features are not optimized for the task, tuning the low-level layers may be preferable, and the learning strength can be used to control forgetting.
• Training a CNN from scratch can be advantageous if the problem patterns (and feature invariances) are highly specific and a sufficient number of samples is available.
31. CONCLUSIONS AND FUTURE WORK
In the near future we plan to extend this work by:
• performing a more extensive experimental evaluation;
• finding a more principled way to control forgetting and to adapt the tuning parameters to the size (and bias) of each incremental batch;
• studying real-world applications of semi-supervised incremental learning strategies for CNNs.
32. COMPARING INCREMENTAL LEARNING STRATEGIES FOR CONVOLUTIONAL NEURAL NETWORKS
Vincenzo Lomonaco & Davide Maltoni
{vincenzo.lomonaco, davide.maltoni}@unibo.it
Department of Computer Science and Engineering – DISI
University of Bologna
Thank you for your attention.
Any Questions?