Continual Learning in
Deep Neural Networks
why, how, and when
Gabriele Graffieti
Ph.D. Day DS&C 2021
A tale about machine learning
Once upon a time, a lot of data was collected.
That data was fed into a huge machine learning model.
The model was then able to properly process the data, and when new, similar data
arrived at the model, it could handle it correctly.
What’s wrong with this?
A tale about machine learning
Once upon a time, a lot of data was collected.
That data was fed into a huge machine learning model.
The model was then able to properly process the data, and when new, similar data
arrived at the model, it could handle it correctly.
What’s wrong with this? Nothing! I just described how machine learning works!
And it works wonderfully!
The antagonists of our tale
● ...a lot of data was collected
○ is it always possible?
○ How to store it?
○ What about training time?
○ What if I don’t want to wait until I collected a lot of data?
○ When does “some data” become “a lot”? When is the data sufficient?
● ...when new, similar data arrived at the model, it could handle it correctly
○ What if dissimilar data arrives at the model?
○ What if I want to adapt the model to new data?
○ What if data changes over time?
○ What if I want to continually train the model?
The main villain of ML: catastrophic forgetting
● Catastrophic interference, also known as catastrophic forgetting, is the tendency of an artificial neural
network to completely and abruptly forget previously learned information upon learning new
information.
● This holds for all ML models trained with “greedy” algorithms (e.g., stochastic gradient
descent, CART, …)
● The training algorithm optimizes the parameters of the model using only the currently available
data; past data and past knowledge are not taken into consideration.
● Heavily related to the stability-plasticity dilemma.
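The “greedy” training described above can be seen in a toy experiment (an illustrative sketch, not from the slides): a one-feature logistic classifier trained with plain SGD on one task, then on a conflicting one, loses the first task entirely. All names and data here are made up for illustration.

```python
import math

def sgd_train(w, b, data, lr=0.5, epochs=100):
    # Plain (greedy) SGD on the logistic loss: only the current data matters,
    # nothing protects previously learned knowledge.
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

def accuracy(w, b, data):
    return sum(((w * x + b) > 0) == (y == 1) for x, y in data) / len(data)

task_a = [(1.0, 1), (2.0, 1), (-1.0, 0), (-2.0, 0)]   # rule: positive x -> class 1
task_b = [(1.5, 0), (2.5, 0), (-1.5, 1), (-2.5, 1)]   # conflicting rule

w, b = 0.0, 0.0
w, b = sgd_train(w, b, task_a)
acc_before = accuracy(w, b, task_a)   # task A is learned well
w, b = sgd_train(w, b, task_b)        # train only on the new data...
acc_after = accuracy(w, b, task_a)    # ...and task A is catastrophically forgotten
```

Training on task B drives the single weight to the opposite sign, so accuracy on task A collapses, abruptly and completely, exactly as in the definition above.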
Catastrophic Forgetting: an example
You collect some data from the real world, and after each collection phase you train the model (only on the most recently collected data) to classify it into two classes:
[Three plots: Data 1 · Data 2 · Data 3]
Catastrophic Forgetting: an example
You collect some data from the real world, and after each collection phase you train the model (only on the most recently collected data) to classify it into two classes:
[Three plots: final solution on data 1 · final solution on all data · optimal final solution]
ML fails every day!
● Social network
○ Impossible to train a model with all the available data (too much computation and time needed)
○ Training the model only on the latest data ≠ training it on the whole data
○ Goal: train a model incrementally only on new data (less time/computation required) without forgetting past knowledge
● Self-driving cars
○ Train the car while it is driving (using corrections from the driver as training signals) - Tesla is doing this!
○ Do not forget past knowledge! If I live in NYC, I don’t want my car to forget how to drive in the countryside
● Personalized devices
○ Keyboard next word suggestion may learn my writing style without forgetting about “good” writing style
○ Domestic robots can learn to recognize and handle new objects without forgetting how to handle other objects
● Scientific analysis of data
○ A model may be able to merge and reason about different results of different experiments
● Carbon footprint and energy consumption
○ Retraining a model on all the data every time new data arrives is costly, both economically and environmentally
○ Extreme case: training a complex language model (GPT-3) can emit as much CO2 as 5 cars over their entire lifetimes
Continual learning (CL) in a nutshell
We define a continual learning scenario as one in which we do not have all
the data at once, but discover new data as time progresses.
More specifically, the newly discovered data may not be a good approximation
of the total data distribution.
Other constraints:
● Every time new data arrives the model needs to be updated
● The model update should be fast enough to be used before new data
arrives
● Past knowledge must not be forgotten (at least not catastrophically)
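The scenario above can be sketched as a training/evaluation protocol (hypothetical names; the toy “model” here is a lookup table that trivially never forgets, used only to show the shape of the loop, since a real network trained with SGD would not be so lucky):

```python
def continual_fit(model_update, evaluate, stream):
    # Data arrives as a stream of experiences; the model is updated on each
    # one and then evaluated on *every* experience seen so far, which is how
    # forgetting is measured in continual learning.
    seen, history = [], []
    for experience in stream:
        model_update(experience)                      # fast update on current data only
        seen.append(experience)
        history.append([evaluate(e) for e in seen])   # scores on all past experiences
    return history

# Toy "model": a lookup table that never forgets.
memory = {}
def update(exp):
    memory.update(exp)
def eval_exp(exp):
    return sum(memory.get(k) == v for k, v in exp.items()) / len(exp)

stream = [{"a": 1}, {"b": 2}, {"c": 3}]
history = continual_fit(update, eval_exp, stream)
# history forms a triangle: one score after experience 1, two after 2, ...
```

The triangular `history` is the standard continual-learning accuracy matrix: each row shows whether earlier experiences are still handled correctly after the latest update.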
Continual Learning
Continual learning strategies
● Architectural
✅ Works pretty well
❌ Does not scale well
❌ Not biologically plausible
● Regularization
✅ Mathematically sound
❌ Difficult to optimize and implement
● Rehearsal / replay
✅ Straightforward to implement
✅ Works pretty well in most scenarios
❌ Memory
❌ Privacy
❌ Computation
Replay strategy
[Diagram: a stream of experiences e(k−2), e(k−1), e(k), e(k+1), with past samples stored in an external memory and replayed during training.]
Pros:
● Catastrophic forgetting is highly reduced.
● Simple and easy-to-implement strategy.
● Memory is cheap and abundant.
Cons:
● Memory is not infinite (while the stream of experiences can be).
● What about privacy and private data?
● Not biologically plausible.
● Computational cost.
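A minimal sketch of the replay strategy, assuming a bounded buffer filled by reservoir sampling (class and parameter names are illustrative, not from the slides):

```python
import random

class ReplayBuffer:
    """Bounded episodic memory. Reservoir sampling keeps (approximately)
    a uniform random sample of the whole stream in fixed memory, even if
    the stream itself is unbounded."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(item)
        else:
            j = self.rng.randrange(self.n_seen)   # keep with prob capacity/n_seen
            if j < self.capacity:
                self.buffer[j] = item

    def sample(self, k):
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

buf = ReplayBuffer(capacity=100)
for step in range(1000):
    new_example = step                      # stand-in for a real (input, label) pair
    batch = [new_example] + buf.sample(9)   # mix new data with replayed old data
    # model.train_on(batch)                 # the model would be updated here
    buf.add(new_example)
```

The fixed `capacity` makes the memory and computation cons above concrete: after 1000 examples only 100 survive in the buffer, and every training batch pays the extra cost of the replayed samples.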
Latent replay
Pellegrini, L., Graffieti, G., Lomonaco, V., & Maltoni, D. Latent replay for real-time continual learning. In 2020 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS)
● Only latent activations are memorized (smaller memory footprint)
● Only a portion of the network needs to be trained with replay data (smaller computational footprint)
● The latent replay layer can be moved to balance speed and accuracy
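The latent replay idea can be sketched as follows (an illustrative toy, not the paper's implementation: `lower` stands in for the frozen part of the network, and the “latents” are just small tuples):

```python
latent_memory = []   # stores small latent codes, not raw inputs

def lower(x):
    # Stand-in for the frozen lower layers of the network: maps a raw
    # input to a (much smaller) latent representation.
    return (x % 7, x % 3)

def train_step(head_update, new_inputs, replay_size=4):
    new_latents = [lower(x) for x in new_inputs]   # one forward pass through the frozen part
    replay = latent_memory[-replay_size:]          # replayed latents (no raw data kept)
    head_update(new_latents + replay)              # only the head sees replay data
    latent_memory.extend(new_latents)              # store latents for future rehearsal

head_batches = []
train_step(head_batches.append, [10, 11])
train_step(head_batches.append, [12, 13])
```

Storing latents instead of raw inputs shrinks the memory footprint, and replaying them only through the trainable head avoids re-running the frozen lower layers, which is what makes the real-time training in the paper feasible.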
Latent replay
My talk about it
Training on smartphone devices
Pellegrini, L., Lomonaco, V., Graffieti, G., & Maltoni, D. Continual Learning at the Edge: Real-Time Training on Smartphone Devices. In 2021 European Symposium on Artificial Neural Networks (ESANN)
Video demo
Generative replay
[Diagram: a stream of experiences e(k−2), e(k−1), e(k), e(k+1), with past data re-created by a generative model and replayed during training.]
Pros:
● No replay memory is needed.
● More biologically plausible.
● Can also generate unseen or new plausible data.
● Generative replay can generalize and possibly yield better results.
Cons:
● How to train the generative model?
○ The problem shifts to the continual training of the generator instead of the classifier.
○ Data quality is the main issue here.
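Generative replay can be sketched like this (everything here is a hypothetical stand-in: the “generator” just remembers per-experience means and samples around them, whereas a real system would train a GAN or VAE continually):

```python
import random

past_stats = []   # toy generator "parameters": one mean per past experience

def fit_generator(data):
    # Stand-in for continually training a generative model on new data.
    past_stats.append(sum(data) / len(data))

def generate(n, rng):
    # Stand-in for sampling pseudo-examples of past data from the generator.
    return [rng.choice(past_stats) + rng.gauss(0, 0.1) for _ in range(n)]

rng = random.Random(0)
experience_1 = [1.0, 1.2, 0.8]
fit_generator(experience_1)

experience_2 = [5.0, 5.2, 4.8]
replayed = generate(3, rng)               # pseudo-data standing in for experience 1
training_batch = experience_2 + replayed  # train on new data + generated old data
fit_generator(experience_2)
```

No buffer of real samples is kept, which addresses the memory and privacy cons of plain replay; the quality of `replayed` now depends entirely on how well the generator itself survives continual training, which is exactly the open problem listed above.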
Negative generative replay
● Instead of using low-quality generated data as replay data, use them only as negative examples.
○ E.g., do not tell the model “this image is a cat”; instead tell it “this image is not a dog”
Graffieti, G., Maltoni D., Pellegrini, L. & Lomonaco, V. __________ - Under double blind review at some conference
A continual learning Avalanche
Lomonaco, V., Pellegrini, L., Cossu, A., Carta, A., Graffieti, G., Hayes, T. L., ... & Maltoni, D. Avalanche: an End-to-End Library for Continual Learning. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
https://avalanche.continualai.org
Future work
Continual learning is a very active research area and can be addressed from many directions.
● We are now investigating replay (especially generative replay) and the role of (episodic) memory.
● Bio-inspired models and neuroscience may be the key to solving many issues.
● Gaussian processes (and Bayesian statistics) are being explored as continual learning frameworks.
My thoughts (not necessarily right)
● We need a “non-greedy” optimization algorithm (SGD and the majority of optimization algorithms for
NN are greedy).
● Memory is a key concept and should be analyzed and explored.
● We need a more “natural” way of learning (learn how to walk before learning how to run).
● Machine learning is the perfect fit if we want to solve a single task. ML is not the perfect fit if we want
to build real intelligence.
Publications
Gabriele Graffieti, Davide Maltoni, Lorenzo Pellegrini, Vincenzo Lomonaco, _________ - Under double blind review at some conference
Guido Borghi, Annalisa Franco, Gabriele Graffieti, Davide Maltoni, Automated Artifact Retouching in Morphed Images with Attention Maps, IEEE Access (2021)
Lorenzo Pellegrini, Vincenzo Lomonaco, Gabriele Graffieti, Davide Maltoni, Continual Learning at the Edge: Real-Time Training on Smartphone Devices.
Proceedings of the 29th European Symposium on Artificial Neural Networks (ESANN), 2021
Gabriele Graffieti, Davide Maltoni. Artifacts-Free Single Image Defogging. Atmosphere 12 (5), 577, 2021.
V Lomonaco, L Pellegrini, A Cossu, A Carta, G Graffieti, et al. Avalanche: an End-to-End Library for Continual Learning. Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR) 2021.
Gabriele Graffieti, Davide Maltoni. Towards Artifacts-free Image Defogging. International Conference on Pattern Recognition (ICPR) 2020.
Publications (cont.)
H Bae, E Brophy, RHM Chan, B Chen, F Feng, G Graffieti, et al. IROS 2019 Lifelong Robotic Vision: Object Recognition Challenge. IEEE Robotics & Automation Magazine 27 (2), 11-16, 2020
Lorenzo Pellegrini, Gabriele Graffieti, Vincenzo Lomonaco, Davide Maltoni. Latent replay for real-time continual learning. International Conference on
Intelligent Robots and Systems (IROS) 2020.
Gabriele Graffieti, Lorenzo Pellegrini, Vincenzo Lomonaco, Davide Maltoni. Efficient Continual Learning with Latent Rehearsal. International Conference on
Intelligent Robots and Systems (IROS) 2019.
Competitions
Self-supervised Learning for Next-Generation Industry-level Autonomous Driving (ICCV 2021) - 1st place ($5k prize)
Lifelong Robotic Vision Challenge (IROS 2019) - 2nd place
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 

Continual Learning: why, how, and when

  • 1. Continual Learning in Deep Neural Networks why, how, and when Gabriele Graffieti Ph.D. Day DS&C 2021
  • 2. A tale about machine learning Once upon a time, a lot of data was collected. That data was fed into a huge machine learning model. The model was then able to properly process the data, and when new similar data arrived at the model, it could correctly handle it. What’s wrong with this?
  • 3. A tale about machine learning Once upon a time, a lot of data was collected. That data was fed into a huge machine learning model. The model was then able to properly process the data, and when new similar data arrived at the model, it could correctly handle it. What’s wrong with this? Nothing! I just described how machine learning works! And it works wonderfully!
  • 4.
  • 5. The antagonists of our tale ● ...a lot of data was collected ○ Is it always possible? ○ How to store it? ○ What about training time? ○ What if I don’t want to wait until I have collected a lot of data? ○ When does some data count as a lot? When is it sufficient? ● ...when new similar data arrived at the model, it could correctly handle it ○ What if dissimilar data arrives at the model? ○ What if I want to adapt the model to new data? ○ What if data changes over time? ○ What if I want to continually train the model?
  • 6. The main villain of ML: catastrophic forgetting ● Catastrophic interference, also known as catastrophic forgetting, is the tendency of an artificial neural network to completely and abruptly forget previously learned information upon learning new information. ● This holds for all ML models that are trained with “greedy” algorithms (e.g. stochastic gradient descent, CART…) ● The training algorithm optimizes the parameters of the model using only the currently available data; past data and past knowledge are not taken into consideration. ● Heavily related to the stability-plasticity dilemma.
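The slide's point can be reproduced in a toy experiment: a one-parameter model trained greedily with SGD on one task and then on a second one forgets the first. A minimal sketch (the tasks, model, and names are invented for illustration, not taken from the talk):

```python
import random

def sgd(w, data, lr=0.1, steps=200):
    """Greedy SGD on squared error for a 1-parameter model y = w * x.
    Only the currently available data is used: past data is ignored."""
    for _ in range(steps):
        x, y = random.choice(data)
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w

def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

random.seed(0)
task1 = [(1.0, 2.0)]    # best fit on task 1: w = 2
task2 = [(1.0, -2.0)]   # best fit on task 2: w = -2

w = sgd(0.0, task1)               # train on task 1 only
loss_on_t1_before = loss(w, task1)
w = sgd(w, task2)                 # then train on task 2 only
loss_on_t1_after = loss(w, task1)
# the loss on task 1 has exploded: the model abruptly forgot it
assert loss_on_t1_after > loss_on_t1_before
```

The single parameter is pulled entirely toward whatever the optimizer saw last, which is exactly the stability-plasticity tension the slide mentions.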
  • 7. Catastrophic Forgetting: an example You collect some data from the real world, and after each collection phase you train the model (only on the most recently collected data) to classify it into two classes: Data 1 Data 2 Data 3
  • 8. Catastrophic Forgetting: an example You collect some data from the real world, and after each collection phase you train the model (only on the most recently collected data) to classify it into two classes: Final solution on data 1 Final solution on all data Optimal final solution
  • 9. ML fails every day! ● Social networks ○ Impossible to train a model with all the available data (too much computation and time needed) ○ Training the model only on the latest data ≠ training it on the whole data ○ Train the model incrementally only on new data (less time/computation required) without forgetting past knowledge ● Self-driving cars ○ Train the car while it is driving (corrections from the driver as training signals) - Tesla is doing this! ○ Do not forget past knowledge! If I live in NYC, I don’t want my car to forget how to drive in the countryside ● Personalized devices ○ Keyboard next-word suggestion may learn my writing style without forgetting about “good” writing style ○ Domestic robots can learn to recognize and handle new objects without forgetting how to handle other objects ● Scientific analysis of data ○ A model may be able to merge and reason about the different results of different experiments ● Carbon footprint and energy consumption ○ Retraining a model on all the data every time new data arrives is costly both economically and environmentally ○ Extreme case: training complex language models (GPT-3) can emit as much CO2 as 5 cars in their entire lifetimes
  • 10. Continual learning (CL) in a nutshell We define a continual learning scenario as one in which we do not have all the data at once, but discover new data as time progresses. More specifically, the data we discover may not be a good approximation of the total data distribution. Other constraints: ● Every time new data arrives the model needs to be updated ● The model update should be fast enough to be usable before new data arrives ● Past knowledge must not be forgotten (at least not catastrophically) Continual Learning
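The scenario described on the slide can be sketched as a stream of experiences, with the model updated after each one and never given the whole dataset at once (function and variable names here are illustrative, not from the talk):

```python
def experience_stream(dataset, n_experiences):
    """Split a dataset into a sequence of experiences: the learner
    receives one experience at a time, never the full dataset."""
    size = len(dataset) // n_experiences
    for k in range(n_experiences):
        yield dataset[k * size:(k + 1) * size]

def update(model, experience):
    # stand-in for a fast model update on the new data only;
    # here the "model" just accumulates what it has been shown
    return model + experience

model = []
seen = []
for exp in experience_stream(list(range(10)), n_experiences=5):
    model = update(model, exp)  # must finish before new data arrives
    seen.append(list(exp))
```

Each experience may come from a different distribution than the previous ones, which is precisely why the naive update in the sketch is not enough in practice: without extra machinery it forgets.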
  • 11. Continual learning strategies ● Architectural ✅ Works pretty well ❌ Does not scale well ❌ Not biologically plausible ● Regularization ✅ Mathematically sound ❌ Difficult to optimize and implement ● Rehearsal / replay ✅ Straightforward to implement ✅ Works pretty well in most scenarios ❌ Memory ❌ Privacy ❌ Computation
  • 12. Replay strategy e(k+1) e(k) e(k-1) e(k-2) External memory Pros: ● Catastrophic forgetting highly reduced. ● Simple and easy-to-implement strategy. ● Memory is cheap and abundant. Cons: ● Memory is not infinite (the stream of experiences can be infinite). ● What about privacy and private data? ● Not biologically plausible. ● Computational cost.
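One common way to keep the external memory bounded while the stream of experiences is unbounded is reservoir sampling, which maintains a uniform sample of everything seen so far. A minimal sketch (the class and method names are illustrative, not a specific library's API):

```python
import random

class ReplayBuffer:
    """Fixed-capacity memory filled by reservoir sampling: every example
    in the (possibly unbounded) stream has equal probability of being
    kept, no matter how long the stream runs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(example)
        else:
            # replace a stored example with probability capacity / seen
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = example

    def sample(self, k):
        # mini-batch of past examples to mix with the current experience
        return random.sample(self.memory, min(k, len(self.memory)))

random.seed(0)
buf = ReplayBuffer(capacity=10)
for i in range(1000):
    buf.add(i)
```

At training time, each mini-batch of new data is concatenated with `buf.sample(k)` so the gradient also reflects past experiences, which is what reduces forgetting.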
  • 13. Latent replay Pellegrini, L., Graffieti, G., Lomonaco, V., & Maltoni, D. Latent replay for real-time continual learning. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ● Only latent activations are memorized (less memory footprint) ● Only a portion of the network needs to be trained with replay data (less computational footprint) ● The latent replay layer can be moved to balance speed and accuracy
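The idea on the slide, freezing the layers below a chosen latent replay layer and storing activations instead of raw inputs, can be sketched as follows. The backbone here is a trivial stand-in and all names are invented for illustration; see the cited IROS paper for the actual method:

```python
def frozen_backbone(x):
    # stand-in for the frozen lower layers of the network
    # (everything below the latent replay layer)
    return [v * 0.5 for v in x]

latent_memory = []  # stores (latent activation, label) pairs

def store(batch):
    # memorize latents, not raw inputs: smaller memory footprint,
    # and raw data never needs to be kept around
    for x, y in batch:
        latent_memory.append((frozen_backbone(x), y))

def head_training_batch(new_batch, n_replay):
    # new data is forwarded through the frozen backbone once, then its
    # latents are mixed with replayed latents; only the head (the layers
    # above the latent replay layer) is trained on this batch
    new_latents = [(frozen_backbone(x), y) for x, y in new_batch]
    return new_latents + latent_memory[:n_replay]

store([([2.0, 4.0], 0)])
batch = head_training_batch([([6.0, 8.0], 1)], n_replay=1)
```

Moving the latent replay layer down gives the head more capacity to adapt (higher accuracy, more computation); moving it up makes updates cheaper, which is the speed/accuracy trade-off the slide mentions.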
  • 14. Latent replay Pellegrini, L., Graffieti, G., Lomonaco, V., & Maltoni, D. Latent replay for real-time continual learning. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) My talk about it
  • 15. Training on smartphone devices Pellegrini, L., Lomonaco, V., Graffieti, G., & Maltoni, D. Continual Learning at the Edge: Real-Time Training on Smartphone Devices.In 2021 European Symposium on Artificial Neural Networks (ESANN) Video demo
  • 16. Generative replay e(k+1) e(k) e(k-1) e(k-2) Generative Model Pros: ● No replay memory is needed. ● More biologically plausible. ● Can also generate unseen or new plausible data. ● Generative replay can generalize and possibly yield better results. Cons: ● How to train the generative model? ○ The problem is now the continual training of the generator instead of the classifier. ○ Data quality is the main issue here
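To make the generator's role concrete: instead of storing past examples, a model of their distribution is stored and sampled from at training time. The deliberately crude Gaussian "generator" below is a stand-in for a real generative model (e.g. a GAN or VAE), and all names are invented for illustration:

```python
import random

class ToyGenerator:
    """Stand-in for a generative model: remembers per-class mean/std of
    a 1-D feature and samples pseudo-examples from a Gaussian."""

    def __init__(self):
        self.stats = {}

    def fit(self, data):
        by_class = {}
        for x, y in data:
            by_class.setdefault(y, []).append(x)
        for y, xs in by_class.items():
            mean = sum(xs) / len(xs)
            var = sum((v - mean) ** 2 for v in xs) / len(xs)
            self.stats[y] = (mean, var ** 0.5)

    def sample(self, n):
        # pseudo-examples of past classes, used in place of replay memory
        out = []
        for _ in range(n):
            y = random.choice(sorted(self.stats))
            mean, std = self.stats[y]
            out.append((random.gauss(mean, std), y))
        return out

random.seed(0)
gen = ToyGenerator()
gen.fit([(0.0, "cat"), (2.0, "cat"), (10.0, "dog")])
samples = gen.sample(5)
```

The slide's main caveat shows up immediately: the generator itself must be trained continually, and if its samples are poor the classifier rehearses on wrong data.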
  • 17. Negative generative replay ● Instead of using low-quality generated data as replay data, use it only as negative examples. ○ E.g. do not tell the model “this image is a cat”; instead tell it “this image is not a dog” Graffieti, G., Maltoni D., Pellegrini, L. & Lomonaco, V. __________ - Under double blind review at some conference
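One plausible reading of the "negative example" idea is a loss that never asserts a possibly-wrong class for a generated sample, and only penalizes the probability of a class the sample is known not to be. This is my own sketch of that reading, not the loss from the paper under review:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def negative_replay_loss(logits, not_class):
    """-log(1 - p(not_class)): penalize only the probability mass the
    model puts on the class the generated sample is NOT ("this image is
    not a dog"), without claiming what it actually is."""
    p = softmax(logits)[not_class]
    return -math.log(max(1.0 - p, 1e-12))

# a model confident in the forbidden class is penalized heavily...
high = negative_replay_loss([5.0, 0.0, 0.0], not_class=0)
# ...one putting its mass elsewhere is barely penalized at all
low = negative_replay_loss([0.0, 5.0, 0.0], not_class=0)
```

The appeal, as the slide frames it, is robustness to generator quality: a blurry generated "cat" may be a terrible positive example of a cat but is still a perfectly valid non-dog.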
  • 18. A continual learning Avalanche Lomonaco, V., Pellegrini, L., Cossu, A., Carta, A., Graffieti, G., Hayes, T. L., ... & Maltoni, D. Avalanche: an End-to-End Library for Continual Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://avalanche.continualai.org
  • 19. Future work Continual learning is a very active research area and can be addressed in many directions. ● We are now investigating replay (especially generative replay) and the role of (episodic) memory. ● Bio-inspired models and neuroscience can be the key to solving many issues. ● Gaussian processes (and Bayesian statistics) are being explored as continual learning frameworks. My thoughts (not necessarily right) ● We need a “non-greedy” optimization algorithm (SGD and the majority of optimization algorithms for NNs are greedy). ● Memory is a key concept and should be analyzed and explored. ● We need a more “natural” way of learning (learn how to walk before learning how to run). ● Machine learning is a perfect fit if we want to solve a single task; it is not if we want to build real intelligence.
  • 20. Publications Gabriele Graffieti, Davide Maltoni, Lorenzo Pellegrini, Vincenzo Lomonaco, _________ - Under double blind review at some conference Guido Borghi, Annalisa Franco, Gabriele Graffieti, Davide Maltoni, Automated Artifact Retouching in Morphed Images with Attention Maps, IEEE Access (2021) Lorenzo Pellegrini, Vincenzo Lomonaco, Gabriele Graffieti, Davide Maltoni, Continual Learning at the Edge: Real-Time Training on Smartphone Devices. Proceedings of the 29th European Symposium on Artificial Neural Networks (ESANN), 2021 Gabriele Graffieti, Davide Maltoni. Artifacts-Free Single Image Defogging. Atmosphere 12 (5), 577, 2021. V Lomonaco, L Pellegrini, A Cossu, A Carta, G Graffieti, et al. Avalanche: an End-to-End Library for Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021. Gabriele Graffieti, Davide Maltoni. Towards Artifacts-free Image Defogging. International Conference on Pattern Recognition (ICPR) 2020.
  • 21. Publications (cont.) H Bae, E Brophy, RHM Chan, B Chen, F Feng, G Graffieti, et al. Iros 2019 lifelong robotic vision: Object recognition challenge. IEEE Robotics & Automation Magazine 27 (2), 11-16, 2020 Lorenzo Pellegrini, Gabriele Graffieti, Vincenzo Lomonaco, Davide Maltoni. Latent replay for real-time continual learning. International Conference on Intelligent Robots and Systems (IROS) 2020. Gabriele Graffieti, Lorenzo Pellegrini, Vincenzo Lomonaco, Davide Maltoni. Efficient Continual Learning with Latent Rehearsal. International Conference on Intelligent Robots and Systems (IROS) 2019.
  • 22. Competitions Self-supervised Learning for Next-Generation Industry-level Autonomous Driving (ICCV 2021) - 1st place (5k$ prize) Lifelong Robotic Vision Challenge (IROS 2019) - 2nd place