Continual Learning over Small non-i.i.d. Batches of Natural Video Streams

ContinualAI May Meetup
Gabriele Graffieti
gabriele.graffieti@unibo.it
PhD Student
Computer Science and Engineering Department
University of Bologna
Italy
May 29, 2020

Sequential data
An ideal CL application
• A new object is shown to a CL agent.
• The agent acquires a short video of the object.
• Frames extracted from the video constitute one or more small mini-batches containing
highly correlated patterns from a single class.
Problems
• Standard SGD-based optimization does not work well in this setting.
• High forgetting, since a single batch contains only one object.
• In popular benchmarks (CIFAR, ImageNet), instances of the same class are independent.
It's unlikely that an application experiences at time t a set of independent images of the same
class. It's more realistic to encounter a single object and observe it under different poses.
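The contrast between video-stream batches and i.i.d. batches can be illustrated with a toy example. This is a minimal sketch under assumed numbers (3 hypothetical objects, 300 frames each, mini-batches of 100 frames); the helper `stream_batches` is a name introduced here only for illustration.

```python
import random

def stream_batches(labels, batch_size):
    """Split labels into consecutive mini-batches, preserving stream order."""
    return [labels[i:i + batch_size] for i in range(0, len(labels), batch_size)]

# Hypothetical stream: 3 objects (classes), each observed in one 300-frame video.
frames_per_video = 300
stream = [cls for cls in range(3) for _ in range(frames_per_video)]

# Batches taken in acquisition order: every batch holds a single class.
video_batches = stream_batches(stream, batch_size=100)
classes_per_video_batch = [len(set(b)) for b in video_batches]

# Batches after shuffling, i.e. the i.i.d. setting of popular benchmarks.
rng = random.Random(0)
shuffled = stream[:]
rng.shuffle(shuffled)
iid_batches = stream_batches(shuffled, batch_size=100)
classes_per_iid_batch = [len(set(b)) for b in iid_batches]

print(classes_per_video_batch)  # one class per batch
print(classes_per_iid_batch)    # classes mixed within each batch
```

In the stream-ordered case each gradient step sees a single class, which is the situation the "Problems" list describes.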

Our setting
CORe50
• 50 classes: 10 categories, 5 different objects per category.
• ∼165,000 128×128 RGB-D images.
• For each object there are 11 different video sessions (∼300 frames each, recorded with a
Kinect 2 at 20 fps).
NICv2-391
• New Instances and Classes scenario.
• 391 batches of 300 images each (first batch 10 times larger).
• Only one object per incremental batch.
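The batch count can be checked with a short piece of arithmetic. The sketch below relies on two assumptions that are not stated on the slides: that 8 of the 11 sessions per object are used for training, and that the first batch groups 10 training sessions (hence "10 times larger").

```python
# Figures stated on the slides.
num_objects = 50            # 50 classes (10 categories x 5 objects)
frames_per_session = 300    # ~300 frames per video session
num_batches = 391           # NICv2-391
first_batch_scale = 10      # the first batch is 10 times larger

# Assumption (not on the slides): 8 of the 11 sessions per object are
# used for training, the remaining ones for testing.
train_sessions_per_object = 8

train_sessions = num_objects * train_sessions_per_object      # 400 sessions
# Assumption: the first batch groups 10 sessions; each later batch is one session.
incremental_batches = train_sessions - first_batch_scale      # 390 batches
total_batches = 1 + incremental_batches                       # 391 batches
first_batch_images = first_batch_scale * frames_per_session   # 3000 images
approx_train_images = train_sessions * frames_per_session     # ~120,000 images

print(total_batches, first_batch_images, approx_train_images)
```

Under these assumptions the total matches the 391 batches in the scenario name.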

AR1*
[Figure: network diagram showing the data layer, convolutional layers, and head]
For each batch:
• Weights in the online head are zero-initialized for new classes and reloaded from the
off-line head for known classes.
• At the end of the batch, weights are consolidated in the off-line memory by weighted
averaging:
$$cw[j] = \frac{cw[j] \cdot wpast_j + \left(tw[j] - \mathrm{avg}(tw)\right)}{wpast_j + 1}, \qquad wpast_j = \frac{past_j}{cur_j}$$
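The consolidation step can be sketched in a few lines of plain Python. This is an illustrative reading of the formula, not the authors' implementation. It assumes that cw holds the off-line (consolidated) head weights, tw the online weights trained on the current batch, past_j the number of patterns of class j seen in earlier batches, cur_j the number in the current batch, and that avg(tw) is taken over all entries of tw. The weighting wpast follows the ratio as written above; if the intended weighting differs (for instance a square root of that ratio), only that one line changes.

```python
from statistics import fmean

def consolidate(cw, tw, past, cur):
    """Merge online head weights into the off-line head after one batch.

    cw:   dict class -> list of consolidated (off-line) weights
    tw:   dict class -> list of online weights trained on the current batch
    past: dict class -> patterns of that class seen in earlier batches
    cur:  dict class -> patterns of that class in the current batch
    """
    # Assumption: avg(tw) is the mean over every entry of tw.
    avg_tw = fmean(w for weights in tw.values() for w in weights)
    for j in cur:
        wpast = past.get(j, 0) / cur[j]
        old = cw.get(j, [0.0] * len(tw[j]))  # new classes start from zero
        cw[j] = [(c * wpast + (t - avg_tw)) / (wpast + 1)
                 for c, t in zip(old, tw[j])]
        past[j] = past.get(j, 0) + cur[j]
    return cw

# Toy example: class 0 was seen before (300 patterns), class 1 is new.
cw = {0: [1.0, 1.0]}
past = {0: 300}
tw = {0: [3.0, 1.0], 1: [1.0, -1.0]}
cur = {0: 300, 1: 300}
consolidate(cw, tw, past, cur)
print(cw)    # class 0 is averaged with its old weights, class 1 is copied mean-shifted
print(past)
```

For a class never seen before, wpast is 0, so the consolidated weights are simply the mean-shifted online weights.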

AR1*
• To control forgetting in the lower layers we use Synaptic Intelligence (SI), which limits
the update of important weights.
• SI exploits information already made available by SGD and does not require any further
gradient propagation.
• The weight update does not require storing the old weights.
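The idea of limiting updates of important weights without keeping a copy of the old weights can be sketched as an importance-damped SGD step. This is a simplified illustration in the spirit of SI, not the exact AR1* rule: the damping factor, the `max_importance` clip, and the function name are assumptions introduced here, and the normalization that turns accumulated contributions into importance values is left out.

```python
def si_damped_sgd_step(theta, grad, importance, lr=0.1, max_importance=1.0):
    """One SGD step in which important weights move less.

    theta, grad, importance: lists of floats of equal length.
    Returns the updated weights and this step's path-integral contributions.
    """
    new_theta, contributions = [], []
    for w, g, imp in zip(theta, grad, importance):
        # Damping in [0, 1]: 1 for unimportant weights, 0 for fully protected ones.
        damping = 1.0 - min(imp, max_importance) / max_importance
        delta = -lr * damping * g
        new_theta.append(w + delta)
        # SI-style contribution: -gradient * weight change, computed from
        # quantities SGD already has (no extra backward pass, no old weights).
        contributions.append(-g * delta)
    return new_theta, contributions

# Toy example: the second weight is marked as fully important.
theta = [1.0, 1.0]
grad = [2.0, 2.0]
importance = [0.0, 1.0]
new_theta, contributions = si_damped_sgd_step(theta, grad, importance)
print(new_theta)       # first weight moves, second stays put
print(contributions)
```

Because the protection acts on the step size rather than through a penalty toward stored values, nothing besides the per-weight importance needs to be kept in memory.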

Results on CORe50 NICv2
Strategy     Run time (min)   Data (MB)   Params (MB)
CWR*         21.4             0           0.2
Naive        25.6             0           0
LWF          27.8             0           0
EWC          31.2             0           24.4
AR1*         39.9             0           12.2
DSLDA        79.1             0           0.2
Cumulative   2826.2           5,898.3     0

Improve the results with replay
[Figure: network diagram showing the data layer, replay memory, convolutional layers, and head]
• Straightforward technique: store past data and replay them through the network.
• No need for weight constraints in the convolutional layers (AR1*free).
• If only a few images per class are stored, the memory overhead is not critical.
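A replay memory of this kind can be sketched as a fixed-size buffer that is refreshed after every batch. The class below is a hypothetical illustration: the name `ReplayMemory`, the equal-share quota per batch, and the random replacement policy are assumptions chosen for simplicity, not details taken from the slides.

```python
import random

class ReplayMemory:
    """Fixed-size store of past (pattern, label) pairs."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.batches_seen = 0
        self.rng = random.Random(seed)

    def update(self, batch):
        """Insert a random subset of `batch`, sized as capacity / batches seen."""
        self.batches_seen += 1
        quota = self.capacity // self.batches_seen
        chosen = self.rng.sample(batch, min(quota, len(batch)))
        free = self.capacity - len(self.items)
        # Fill free slots first, then overwrite randomly chosen old entries.
        self.items.extend(chosen[:free])
        overflow = chosen[free:]
        slots = self.rng.sample(range(len(self.items)), len(overflow))
        for slot, item in zip(slots, overflow):
            self.items[slot] = item

    def mix(self, batch):
        """Return the new batch concatenated with everything in memory."""
        return list(batch) + list(self.items)

# Toy example: three single-class "videos" of 10 frames each.
memory = ReplayMemory(capacity=6)
video_a = [(f"a{i}", 0) for i in range(10)]
video_b = [(f"b{i}", 1) for i in range(10)]
video_c = [(f"c{i}", 2) for i in range(10)]
memory.update(video_a)
memory.update(video_b)
training_batch = memory.mix(video_c)   # new class 2 plus stored classes 0 and 1
print(len(training_batch), sorted(label for _, label in memory.items))
```

Mixing stored patterns into each single-class batch is what restores some class diversity within a training step.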

Replay problems
• Requires extra storage (e.g., for ImageNet, if we store 20 patterns per class, the total
storage is about 3.8 GB).
• Requires extra forward/backward steps: when mixing new and old patterns, more
iterations per epoch are needed.

Latent replay
Replay patterns are stored as activations of an intermediate layer rather than as input images.
Advantages:
• Efficiency: extra forward and backward steps take place only in the upper layers.
• Less storage required.
• Activations can be quantized/compressed with negligible accuracy loss.
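The mechanics can be sketched with toy stand-ins for the two halves of the network. Everything below is hypothetical: `lower_layers`, `upper_layers`, and the 8-bit quantization scheme are placeholders chosen to show the data flow, and the quantizer assumes non-negative (post-ReLU) activations bounded by `max_value`.

```python
def lower_layers(image):
    """Stand-in for the layers below the replay point: image -> latent activation."""
    return [sum(image) / len(image), max(image)]

def upper_layers(latent, head_weights):
    """Stand-in for the layers above the replay point: latent -> score."""
    return sum(a * w for a, w in zip(latent, head_weights))

def forward_with_latent_replay(new_images, latent_memory, head_weights):
    # Only the new images travel through the lower layers.
    new_latents = [lower_layers(img) for img in new_images]
    # Stored activations are injected directly at the replay layer.
    mixed = new_latents + [latent for latent, _label in latent_memory]
    scores = [upper_layers(latent, head_weights) for latent in mixed]
    return new_latents, scores

def quantize(latent, max_value):
    """Map activations in [0, max_value] to one byte each."""
    return bytes(round(255 * min(max(a, 0.0), max_value) / max_value) for a in latent)

def dequantize(codes, max_value):
    """Map stored bytes back to approximate activations."""
    return [c * max_value / 255 for c in codes]

# Toy example: two new images plus one stored latent pattern of class 3.
new_images = [[0.0, 1.0, 2.0], [4.0, 4.0, 4.0]]
latent_memory = [([0.5, 0.5], 3)]
new_latents, scores = forward_with_latent_replay(new_images, latent_memory, [1.0, -1.0])
restored = dequantize(quantize([0.0, 2.0, 4.0], max_value=4.0), max_value=4.0)
print(new_latents, scores, restored)
```

The stored pattern never passes through `lower_layers`, which is where the saving in forward and backward steps comes from.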

Latent replay

Strategy                Run time (min)   Mem. overhead (MB, data + params)   Final accuracy
Naive                   25.6             0 + 0                               7.13%
CWR*                    21.4             0 + 0.2                             56.99%
DSLDA                   79.1             0 + 0.2                             48.02%
AR1*                    39.9             0 + 12.4                            56.32%
AR1*free (Image)        133.3            75 + 0                              77.30%
AR1*free (conv5_4/dw)   41.2             48 + 0                              72.23%
AR1*free (pool6)        23.7             5.8 + 0                             59.75%
Cumulative              2826.2           ∼6,000 + 0                          85.26%

Future work
• Latent Generative Replay.
• Self-training by exploiting temporal coherence.
• Open-set classification (automatic discovery of new classes).
• Sparse human supervision (active learning).

Thank You!

Bibliography I
Vincenzo Lomonaco, Davide Maltoni
CORe50: a new Dataset and Benchmark for continual Object Recognition
CoRL 2017
Vincenzo Lomonaco, Davide Maltoni, Lorenzo Pellegrini
Rehearsal-Free Continual Learning over Small Non-I.I.D. Batches
CVPR 2020 (Workshop)
Zhizhong Li, Derek Hoiem
Learning without Forgetting
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017

Bibliography II
James Kirkpatrick et al.
Overcoming catastrophic forgetting in neural networks
Proceedings of the National Academy of Sciences, 2017
Tyler L. Hayes, Christopher Kanan
Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis
arXiv Preprint, 2019
Lorenzo Pellegrini, Gabriele Graffieti, Vincenzo Lomonaco, Davide Maltoni
Latent Replay for Real-Time Continual Learning
arXiv Preprint, 2020
