Introduction to continual learning
Giang Nguyen
KAIST, 2019. Some parts adapted from Prof. Hwang Sung Ju, KAIST
INTRODUCTION
Continual learning
• Continual Learning (CL) is built on the idea of learning continuously and
adaptively about the external world and enabling the autonomous
incremental development of ever more complex skills and knowledge
Ex.: from learning the alphabet to learning other languages
Continual learning
• Humans learn throughout their lives and retain and reuse previously
learned knowledge when learning a new task
Continual learning
• Humans become increasingly smarter over time. Couldn’t we build a
similar system that basically learns forever?
Refining means we adapt the learned models with new data: the weights of
the learned models can change in order to generalize to the new data
CHALLENGES
A new class appears
But it is still fine if the new data is a car
Challenge
The weights of the model could change significantly
in order to classify the zebra
SOLUTIONS
Continual learning
• Expectations of CL
‒ Online learning: learning occurs at every moment
‒ Presence of transfer: able to transfer knowledge from previous tasks to new ones
‒ Resistance to catastrophic forgetting
‒ No direct access to previous experience
Challenge for Continual Learning
• We need a balance between adapting to recent data and retaining
knowledge from old data because:
• Too much plasticity leads to the catastrophic forgetting problem
• Too much stability leads to an inability to adapt
Like our brain, we sometimes forget less important things in order to memorize valuable knowledge
This is how the CL system in our brain works
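To make the plasticity problem concrete, below is a minimal, hypothetical sketch (not from the slides) of catastrophic forgetting: a small MLP is trained on one group of classes, then naively fine-tuned on another, and its accuracy on the first group collapses. It assumes PyTorch and uses scikit-learn's small 8x8 digits as a stand-in for MNIST so it runs without a download.

```python
# Hypothetical illustration of catastrophic forgetting (not from the slides):
# train on Task A (digits 0-4), then fine-tune on Task B (digits 5-9) with no
# CL method, and watch Task A accuracy collapse.
import torch
import torch.nn as nn
from sklearn.datasets import load_digits

digits = load_digits()
X = torch.tensor(digits.data / 16.0, dtype=torch.float32)
y = torch.tensor(digits.target, dtype=torch.long)
task_a, task_b = y < 5, y >= 5

model = nn.Sequential(nn.Linear(64, 100), nn.ReLU(), nn.Linear(100, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train(mask, epochs=200):
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X[mask]), y[mask]).backward()
        opt.step()

def accuracy(mask):
    with torch.no_grad():
        return (model(X[mask]).argmax(1) == y[mask]).float().mean().item()

train(task_a)
print("Task A accuracy after learning A:", accuracy(task_a))
train(task_b)  # too much plasticity: nothing protects the Task A weights
print("Task A accuracy after learning B:", accuracy(task_a))  # drops sharply
```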
This loss function cares not only about the loss on Task B but also about the performance on Task A
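The slide's figure is not reproduced here; as a hedged illustration of such an objective, an EWC-style loss (EWC is mentioned in the speaker notes) fits Task B while penalizing movement of the weights that mattered for Task A. The symbols below are standard EWC notation, not taken from the slide: theta*_A is the solution found on Task A, F_i is the Fisher information of weight i, and lambda sets how strongly old knowledge is protected.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{B}(\theta)
  + \frac{\lambda}{2} \sum_{i} F_{i} \left( \theta_{i} - \theta^{*}_{A,i} \right)^{2}
```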
Use of coreset
• In order to mitigate the semantic drift problem, VCL (Variational Continual
Learning) includes a small representative set of data (a coreset) from
previously observed tasks.
• Here we assume that a few samples can represent the whole data
distribution of the previous tasks
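Below is a hedged sketch of the coreset bookkeeping described in the speaker notes for slide #19. The variational updates themselves are only indicated by comments; the synthetic tasks, the random selection heuristic, and the coreset size k=200 are assumptions for illustration, not necessarily the paper's exact choices.

```python
# Sketch of coreset bookkeeping in VCL (assumptions: synthetic data, random
# coreset selection, k=200). The Bayesian/variational updates are left as
# comments -- this only shows how the coreset is built and grows per task.
import numpy as np

rng = np.random.default_rng(0)

def make_synthetic_tasks(n_tasks=3, n=1000, d=784):
    """Stand-in data; a real run would use e.g. Permuted MNIST tasks."""
    return [(rng.normal(size=(n, d)), rng.integers(0, 10, size=n))
            for _ in range(n_tasks)]

def select_coreset(x, y, k):
    """Randomly pick k points for the coreset; return the rest separately."""
    idx = rng.choice(len(x), size=min(k, len(x)), replace=False)
    mask = np.zeros(len(x), dtype=bool)
    mask[idx] = True
    return (x[mask], y[mask]), (x[~mask], y[~mask])

coreset_x, coreset_y = [], []                  # accumulated coreset C_t
for t, (x_t, y_t) in enumerate(make_synthetic_tasks()):
    (cx, cy), (rest_x, rest_y) = select_coreset(x_t, y_t, k=200)
    coreset_x.append(cx)
    coreset_y.append(cy)
    # 1) update the variational posterior q_t on the NON-coreset data (rest_x, rest_y)
    # 2) before prediction, refine a copy of q_t on the accumulated coreset
    # 3) predict on test inputs to measure the effect of the coreset
    print(f"task {t}: coreset now holds {sum(len(c) for c in coreset_x)} points")
```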
Experiments
Benchmarks for CL
• Permuted MNIST
• Split MNIST
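As a hedged illustration of how these two benchmarks are typically constructed (using scikit-learn's small 8x8 digits as a stand-in for MNIST so the snippet runs without a download; the real benchmarks use full 28x28 MNIST):

```python
# Sketch of the two standard CL benchmarks, with sklearn's 8x8 digits
# standing in for MNIST (an assumption made so this runs offline).
import numpy as np
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)

# Permuted MNIST: every task keeps all ten classes, but applies a fixed
# random pixel permutation, so only the input statistics change.
permuted_tasks = []
for t in range(5):
    perm = rng.permutation(X.shape[1])        # one fixed permutation per task
    permuted_tasks.append((X[:, perm], y))

# Split MNIST: each task is a binary problem over a pair of digits
# (0 vs 1, 2 vs 3, ...), usually with a separate output head per task.
split_tasks = []
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]:
    mask = (y == a) | (y == b)
    split_tasks.append((X[mask], (y[mask] == b).astype(int)))

print(len(permuted_tasks), "permuted tasks;", len(split_tasks), "split tasks")
```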
Take-home messages
• CL is a very new domain; its potential is immense but has not yet been
fully realized
• Current experiments are on very simple tasks and not at large scale
• Applying CL to image captioning is possible, but the path is long and
depends on state-of-the-art CL work

Editor's Notes

  • #19 For each task, the new coreset Ct is produced by selecting new data points from the current task together with a selection from the old coreset Ct−1. Then we update the variational distribution on the non-coreset data points, meaning the dataset with the coreset points removed. Next, we compute the final variational distribution; here the distribution is computed in the presence of the coreset Ct. Lastly, we perform prediction on test inputs to see the effect of the coreset.
  • #21 OK, we are going to quickly show you two tasks. The first deals with a covariate-shift example. These are standard tasks for CL which previous methods have used as benchmarks. This one is called the Permuted MNIST task: task number one is just classifying MNIST. Task number two applies a fixed permutation to the pixels of each image, and then you have to categorize them again as zero through nine. So the statistics have changed according to the permutation.
  • #22 The second benchmark task is called Split MNIST. Here is how the benchmark works: we have a bunch of different tasks and a different head of the network for each one of those tasks, with a common base to the network. In the first task, we classify 0 vs 1 in MNIST. In task 2, we classify 2 vs 3, then 4 vs 5, and so on. The hope is that we can leverage the features learned in the first task to do better in these later tasks. Again, here are EWC and SI; SI is really good on this task, which is really interesting, while EWC does not perform well. VCL does pretty well and drops off towards the end, but it is still pretty good.