Study Meeting Presentation:



Really Quick Intro on Few-Shot Learning

Author: Noel Tay



Date: 2020/08/28 

Getting things in perspective …
[Figure: training images are fed to a predictor, which outputs a label such as “Dog”]
Let us limit the discussion to:
- Image classification
- Supervised learning
- Closed-set recognition
- Requires a huge amount of data for each task
- A new task requires retraining
- However, humans can learn such tasks effortlessly
Image source: “The CIFAR-10 dataset” (https://www.cs.toronto.edu/~kriz/cifar.html)
Image source: https://unsplash.com/
Getting things in perspective …
How many people are there?
What is this place?
Where is this place?
What is the time of day?
What is the temperature?
What is the mood?
Do they practice social distancing?
Do they wear masks?
Humans:
- Can decompose/manipulate representations
- Adapt to the task at hand
- Don’t need extra training
Data Bias!
Image Source: https://unsplash.com/
Getting things in perspective …
To mimic human ability:
- Finding good priors
Blank slate vs. innate behaviors
- Good representations
Learning with the help of ‘unlabeled’ data, such as self-supervised learning
- Transfer learning
Knowledge transfer from one task to another (for example, improving face recognition with a model that deals with facial expressions)
- Few-shot learning
This is what we will be talking about!
Few-Shot Learning
- To classify new data after being given a few samples
- Extreme case is called one-shot learning
[Figure: a few labeled samples each for Class 1 and Class 2, plus an unlabeled query “?” to classify]
- It is not meant to solve the insufficient-data problem, but to provide an alternative way of handling tasks with little data per class
Image source: https://unsplash.com/
Few-Shot Learning
- To quickly switch to a new classification task with few samples
[Figure: the same approach applied to different tasks, e.g. Truck vs. Car and Urban vs. Rural]
Image source: https://unsplash.com/
Few-Shot Learning
- To quickly switch to a new classification task with few samples
[Figure: another task switch, e.g. Yellow vs. Red]
Image source: https://unsplash.com/
Few-Shot Learning
- N-way-K-shot: a task consists of a support set with N classes and K labeled samples per class, plus a query to classify (see the sampling sketch below)
[Figure: a 2-way-4-shot task: a support set of four Truck and four Car images, and a query image “?”]
Image source: https://unsplash.com/
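To make the task structure concrete, here is a minimal Python sketch (not from the slides) of sampling one N-way-K-shot task; the name `sample_episode` and the dict-of-lists dataset layout are our own assumptions.

```python
import random

def sample_episode(dataset, n_way=2, k_shot=4, n_query=1):
    """Sample one N-way-K-shot task: a support set of K labeled
    samples for each of N classes, plus a query set to classify.
    `dataset` is assumed to map class names to lists of samples."""
    classes = random.sample(sorted(dataset.keys()), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        samples = random.sample(dataset[cls], k_shot + n_query)
        support += [(x, label) for x in samples[:k_shot]]
        query += [(x, label) for x in samples[k_shot:]]
    return support, query

# A 2-way-4-shot task, as on the slide (class names are illustrative):
# support, query = sample_episode({"truck": truck_imgs, "car": car_imgs})
```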
Few-Shot Learning
Meta-Learning Framework
- The conventional approach is to train the model on a dataset to perform classification
- Meta-learning instead ‘trains’ the model to learn how to use a dataset to perform classification (learning to learn)
[Figure: two few-shot tasks, each with Class 1/Class 2 support samples and a query “?”]
Meta-Learning
[Figure: the conventional setup: train a predictor on a fixed dataset, then test it (e.g., it outputs “Dog”)]
Image source: https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html
Meta-Learning: Learning to Learn
[Figure: the meta-learning setup: both training and testing are performed over tasks rather than over individual samples]
- There is no fixed sample-to-class binding
- Each ‘data sample’ at the meta level is itself a task
Image source: https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html
Meta-Learning: Classes, samples and labels shuffling
[Figure: across episodes, which concrete classes play the roles of ‘Class 1’ and ‘Class 2’ is reshuffled, as in the snippet below]
Image source modified from: https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html
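A tiny sketch of the shuffling idea (complementing the episode sampler above); the helper name is ours, not from the slides.

```python
import random

def shuffle_labels(episode_classes):
    """Per episode, the mapping from concrete classes to label
    indices 0..N-1 is re-drawn, so 'dog' may be label 0 in one
    task and label 1 in the next: the model cannot memorize a
    fixed class-to-label binding and must use the support set."""
    labels = list(range(len(episode_classes)))
    random.shuffle(labels)
    return dict(zip(episode_classes, labels))

# e.g. {'dog': 1, 'cat': 0} in one episode, {'dog': 0, 'cat': 1} in the next
print(shuffle_labels(["dog", "cat"]))
```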
Meta-Learning
- Based on similarity
- Matching networks
- Prototypical networks
- Relation networks
- Based on learning algorithm
- Model-agnostic meta-learning (MAML)
- Memory augmented neural network
- Based on data
- Bayesian programs
Meta-Learning: Based on Similarity
[Figure: Matching Network [1]: a query’s label is predicted as an attention-weighted sum of support labels (e.g., weights 0.08, 0.02, 0.1, 0.8)]
[Figure: Prototypical Network [2]: each class is summarized by a prototype embedding and queries are assigned to the nearest one (see the sketch below)]
[1] Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. Advances in neural information processing systems, 29, 3630-3638.
[2] Snell, J., Swersky, K., & Zemel, R. S. (2017). Prototypical networks for few-shot learning. arXiv preprint arXiv:1703.05175.
Image source from original paper [1]
Image source from original paper [2]
Image modified from original paper [1]
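A minimal PyTorch sketch of the prototypical-network scoring rule from [2] (matching networks [1] instead weight the support labels by attention, as in the figure above); the encoder is omitted and all names and shapes are illustrative.

```python
import torch

def prototypical_logits(support_emb, support_labels, query_emb, n_way):
    """Prototypical Networks [2], minimal version: each class
    prototype is the mean embedding of its support samples, and a
    query is scored by negative squared Euclidean distance to
    each prototype (higher score = closer)."""
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(n_way)
    ])                                          # (n_way, d)
    dists = torch.cdist(query_emb, prototypes)  # (n_query, n_way)
    return -dists ** 2

# Embeddings would come from a shared encoder f(x); shapes are made up:
support_emb = torch.randn(8, 64)                # 2-way-4-shot support set
support_labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
query_emb = torch.randn(1, 64)
pred = prototypical_logits(support_emb, support_labels, query_emb, 2).argmax(dim=1)
```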
Meta-Learning: Based on Similarity
Image source: https://www.borealisai.com/en/blog/tutorial-2-few-shot-learning-and-meta-learning-i/
Meta-Learning: Based on Learning Algorithm
Memory Augmented Neural Network (MANN)
Learns an algorithm for storing and retrieving memories [1] (a read sketch follows below)
[Figure: samples (dog, cat, dog, dog, cat, …) presented sequentially; the external memory starts out empty (NULL)]
[1] Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., & Lillicrap, T. (2016, June). Meta-learning with memory-augmented neural networks. In International conference on machine learning (pp. 1842-1850). PMLR.
Image source from original paper [1]
Image source: https://unsplash.com/
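A sketch of the content-based memory read that MANN builds on [1]; the write rule (least-recently-used access in [1]) and the controller network are omitted, and the slot count and key size are invented for the example.

```python
import torch
import torch.nn.functional as F

def memory_read(key, memory):
    """Content-based addressing as used by MANN [1]: compare a key
    vector against every memory row by cosine similarity, softmax
    the similarities into read weights, and return the weighted
    sum of memory rows."""
    sim = F.cosine_similarity(key.unsqueeze(0), memory, dim=1)  # (slots,)
    w = F.softmax(sim, dim=0)                                   # read weights
    return w @ memory                                           # (d,)

memory = torch.zeros(128, 40)  # 128 slots, initially empty ("NULL")
key = torch.randn(40)          # would be produced by the controller
r = memory_read(key, memory)
```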
Meta-Learning: Based on Learning Algorithm
Model-agnostic meta-learning (MAML) [1]
[Figure: loss landscape over weights (w1, w2): a meta-learned initialization ‘Init’ lies close to the optima of Task 1, Task 2, and Task 3]
[1] Finn, C., Abbeel, P., & Levine, S. (2017, July). Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning (pp. 1126-1135). PMLR.
Image source modified from: https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html
Meta-Learning: Based on Learning Algorithm
Model-agnostic meta-learning (MAML)
[Figure: in (w1, w2) space, ordinary learning moves the weights toward each task’s own optimum, using the data for Task 1, Task 2, and Task 3 separately]
Meta-Learning: Based on Learning Algorithm
Model-agnostic meta-learning (MAML)
[Figure: meta-learning instead uses each task’s data to update the shared initialization, pulling it toward a point from which every task is reachable]
Meta-Learning: Based on Learning Algorithm
Model-agnostic meta-learning (MAML)
[Figure: the resulting initialization can adapt to Task 1, Task 2, or Task 3 in a few gradient steps; see the sketch below]
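A compact sketch of one MAML meta-update [1] with a single inner gradient step, assuming PyTorch 2.x (`torch.func.functional_call`); real training uses several inner steps and minibatches of tasks, and first-order variants drop `create_graph=True`.

```python
import torch
from torch.func import functional_call

def maml_step(model, loss_fn, tasks, meta_opt, inner_lr=0.01):
    """One (second-order) MAML meta-update. `tasks` yields pairs
    ((x_support, y_support), (x_query, y_query)) per task."""
    meta_loss = 0.0
    for (xs, ys), (xq, yq) in tasks:
        params = dict(model.named_parameters())
        # Inner loop: adapt a copy of the weights on the support set.
        grads = torch.autograd.grad(
            loss_fn(functional_call(model, params, (xs,)), ys),
            list(params.values()), create_graph=True)
        fast = {name: p - inner_lr * g
                for (name, p), g in zip(params.items(), grads)}
        # Outer loss: post-adaptation performance on the query set.
        meta_loss = meta_loss + loss_fn(functional_call(model, fast, (xq,)), yq)
    meta_opt.zero_grad()
    meta_loss.backward()  # gradients flow back into the shared initialization
    meta_opt.step()
```

The key point matches the slides: the outer update does not minimize any single task's loss, it moves the initialization so that one inner step suffices for each task.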
Meta-Learning: Based on Data
Modeling through Bayesian Programs
- The structure of the model encodes information about how the output is created (a prior)
- Meta-learning learns how various Bayesian program modules can be combined to express unseen data (a toy sketch follows)
- Remember probabilistic programming with Pyro?
[1] Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332-1338.
Image source from original paper [1]
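This is not the model from [1], just a toy Pyro program in the same spirit: a latent choice selects which ‘program module’ generates the observation. The module parameters and the Gaussian observation model are invented for the sketch.

```python
import torch
import pyro
import pyro.distributions as dist

def toy_program():
    """A latent categorical variable picks one of three 'modules'
    (here, just mean vectors); the observation is then generated
    from the chosen module. Composing such pieces is the flavor of
    Bayesian program induction [1]."""
    means = torch.tensor([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
    k = pyro.sample("module", dist.Categorical(torch.ones(3) / 3))
    x = pyro.sample("x", dist.Normal(means[k], 1.0).to_event(1))
    return x

print(toy_program())  # one sample drawn from the prior
```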
Consideration (after getting things in perspective…)
- Do I need ‘learning to learn’, or do I just lack data?
Does my application justify its use?
- Is my dataset sufficient?
A huge amount of data does not necessarily mean it is sufficient
- What prior knowledge do I have?
For example: a data model, invariance assumptions
- Are there any training constraints I can impose?
For example: curriculum learning, multiple losses, feature-space constraints
