In the last decade, Convolutional Neural Networks (CNNs) have been shown to perform very well on face recognition tasks, coping with large occlusions, extremely low resolutions, and strong illumination variations. However, partly because of their complex training and tricky hyper-parameter tuning, CNNs have been scarcely studied in the context of incremental learning.
In this work we compare different incremental learning strategies for CNN-based architectures in the context of face recognition.
CNN-Based Incremental Learning Strategies for Face Recognition
Vincenzo Lomonaco and Davide Maltoni
Biometric System Lab – DISI - University of Bologna
Contacts
Emails: {vincenzo.lomonaco, davide.maltoni}@unibo.it
Websites: www.vincenzolomonaco.com, http://bias.csr.unibo.it/maltoni/
References
[1] Franco, A., Maio, D., Maltoni, D., Scienze, C.L., Bologna, U.: The Big Brother Database: Evaluating Face
Recognition in Smart Home Environments. Image (Rochester, N.Y.). 142–150 (2009).
[2] Franco, A., Maio, D., Maltoni, D.: Incremental template updating for face recognition in home environments.
Pattern Recognit. 43, 2891–2903 (2010).
[3] Maltoni, D., Lomonaco, V.: Semi-supervised Tuning from Temporal Coherence. Tech. Report. DISI - Univ. of
Bologna. http://arxiv.org/pdf/1511.03163v3.pdf. 1–14 (2015).
Incremental Learning Strategies
One possible approach to the incremental scenario, in which training data
arrives in successive batches, is to store all previously seen data and retrain
the model from scratch as soon as a new batch becomes available. However, this
solution is often impractical for many real-world systems, where memory and
computational resources are subject to tight constraints.
A different approach is to update the model using only the newly available
batch of data.
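The two update protocols described above can be contrasted with a minimal sketch. It assumes a toy nearest-class-mean classifier on 2-D points rather than a CNN; the names `CumulativeLearner` and `IncrementalLearner` and the data are hypothetical and only illustrate what each protocol has to keep in memory.

```python
from collections import defaultdict


def class_means(samples):
    """Compute the mean feature vector of each class from (features, label) pairs."""
    sums, counts = {}, defaultdict(int)
    for x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def nearest_mean(means, x):
    """Return the label whose class mean is closest to x (squared Euclidean distance)."""
    return min(means, key=lambda y: sum((a - b) ** 2 for a, b in zip(means[y], x)))


class CumulativeLearner:
    """Stores every past sample and refits from scratch on each new batch."""

    def __init__(self):
        self.memory = []  # grows with every batch
        self.means = {}

    def update(self, batch):
        self.memory.extend(batch)
        self.means = class_means(self.memory)

    def predict(self, x):
        return nearest_mean(self.means, x)


class IncrementalLearner:
    """Keeps only running sums and counts; each batch can be discarded after use."""

    def __init__(self):
        self.sums, self.counts = {}, defaultdict(int)

    def update(self, batch):
        for x, y in batch:
            if y not in self.sums:
                self.sums[y] = [0.0] * len(x)
            self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
            self.counts[y] += 1

    def predict(self, x):
        means = {y: [s / self.counts[y] for s in self.sums[y]] for y in self.sums}
        return nearest_mean(means, x)


# Two hypothetical batches ("days") of labelled 2-D feature vectors.
batches = [
    [((0.0, 0.0), "a"), ((4.0, 4.0), "b")],
    [((1.0, 0.0), "a"), ((5.0, 4.0), "b")],
]
cumulative, incremental = CumulativeLearner(), IncrementalLearner()
for batch in batches:
    cumulative.update(batch)
    incremental.update(batch)
```

For this toy model both learners end up with the same class means. A CNN updated by gradient descent on the new batch alone generally does not have this property, which is why forgetting becomes a concern.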
Big Brother dataset
The BigBrother dataset (SETB) [1] was created from 2 DVDs made commercially
available at the end of the 2006 edition of the “Big Brother” reality show
produced for Italian TV.
It consists of 14,675 gray-scale face images (70×70) belonging to 7 subjects,
often characterized by bad lighting, poor focus, occlusions, and non-frontal
pose.
In addition to the typical training and test sets, it provides a large
additional set of images, called the “updating set”, which is split into 54 days
and intended for incremental learning/tuning.
Figure 1. The seven subjects of the Big Brother dataset (SETB).
Conclusions
• Forgetting can be a very detrimental issue: hence, when possible (i.e.,
transfer learning from the same domain), it is preferable to use the CNN as a
fixed feature extractor feeding an incremental classifier.
• If the features are not optimized (transfer learning from a different domain),
tuning the low-level layers may be preferable, and the learning strength
(i.e., learning rate, number of iterations, etc.) can be used to control
forgetting.
• Training a CNN from scratch can be advantageous if the problem patterns
(and feature invariances) are highly specific and a sufficient number of
samples is available.
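How learning strength trades off against forgetting can be illustrated with a deliberately small sketch. It assumes a one-weight linear regression model tuned by plain gradient descent, not a CNN; the batches, the learning rates, and the helper names `mse` and `tune` are hypothetical.

```python
def mse(w, batch):
    """Mean squared error of the one-weight linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def tune(w, batch, learning_rate, iterations):
    """Run plain gradient descent on `batch` only, starting from weight w."""
    for _ in range(iterations):
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad
    return w


old_batch = [(1.0, 1.0), (2.0, 2.0)]  # best fitted by w = 1
new_batch = [(1.0, 0.5), (2.0, 1.0)]  # best fitted by w = 0.5

w_old = 1.0  # weight assumed to result from training on the old batch
forgetting = {}
for lr in (0.01, 0.05, 0.2):
    w_new = tune(w_old, new_batch, lr, iterations=10)
    # Forgetting is measured as the increase in error on the old batch.
    forgetting[lr] = mse(w_new, old_batch) - mse(w_old, old_batch)
```

In this toy setting a larger learning rate (or more iterations) moves the weight further towards the new batch and increases the error on the old one. A small learning rate limits that drift, at the price of adapting less to the new data.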
Experiments and results
Figure 2. Accuracy of the different strategies tested on the SETB of the Big Brother dataset.
Figure 3. The impact of the learning rate on forgetting.
Final Acc. %        LeNet7     CaffeNet + SVM   VGG + SVM   CaffeNet + FT   VGG + FT
34 Days Split       82.35%     80.10%           96.96%      73.23%          91.39%
Orig. Days Split    75.33%     75.13%           96.73%      70.23%          89.58%
Cumulative Days     90.50%     86.79%           97.65%      84.26%          95.51%
Gain                +7.03%     +4.97%           +0.23%      +3.00%          +1.81%
Loss                -8.15%     -6.69%           -0.69%      -11.03%         -4.12%
Table 1. Accuracy gain and loss of the 34 Days split with respect to the Original and Cumulative days splits.
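The Gain and Loss rows can be recomputed from the accuracy rows. The sketch below assumes that Gain is the 34 Days accuracy minus the Original Days accuracy and that Loss is the 34 Days accuracy minus the Cumulative Days accuracy, a reading that matches the reported values; the dictionary layout and key names are only illustrative.

```python
# Final accuracies (%) copied from Table 1.
accuracy = {
    "LeNet7":         {"34 days": 82.35, "orig": 75.33, "cumulative": 90.50},
    "CaffeNet + SVM": {"34 days": 80.10, "orig": 75.13, "cumulative": 86.79},
    "VGG + SVM":      {"34 days": 96.96, "orig": 96.73, "cumulative": 97.65},
    "CaffeNet + FT":  {"34 days": 73.23, "orig": 70.23, "cumulative": 84.26},
    "VGG + FT":       {"34 days": 91.39, "orig": 89.58, "cumulative": 95.51},
}

# Assumed definitions: differences in percentage points, rounded to two decimals.
gain = {m: round(a["34 days"] - a["orig"], 2) for m, a in accuracy.items()}
loss = {m: round(a["34 days"] - a["cumulative"], 2) for m, a in accuracy.items()}

for model in accuracy:
    print(f"{model:15s} gain {gain[model]:+.2f}  loss {loss[model]:+.2f}")
```

Recomputing from the rounded accuracies gives a LeNet7 gain of 7.02 rather than the reported 7.03; the other entries match. The small difference is presumably due to the table values having been computed from unrounded accuracies.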
Incremental learning strategies compared (diagram labels): Pre-trained CNN + SVM; Pre-trained CNN + Finetuning; Ad-hoc CNN trained from scratch.