Few Shot Learning
Asif Ali
M.E SCUT CHINA
Date: May 31, 2019
Contents
• Introduction
• Problem statement, Why?
• Approaches
  – Meta-learning
    • Matching Networks
    • MAML
  – Metric learning
    • Relation Networks
    • Prototypical Networks
  – Augmentation-based
    • Delta-encoder
    • Few-shot learning through an information retrieval lens
Introduction
• The ability of deep neural networks to extract complex statistics and learn high-level features from vast datasets is proven. Yet current deep learning approaches suffer from poor sample efficiency, in stark contrast to human perception: even a child can recognise a giraffe after seeing a single picture.
• Fine-tuning a pre-trained model is a popular strategy for achieving higher sample efficiency, but it is a post-hoc hack.
Can machine learning do better? Few-shot learning aims to solve these issues.
Few-shot learning
• Whereas most machine-learning-based object categorization algorithms require training on hundreds or thousands of samples/images and very large datasets, one/few-shot learning aims to learn information about object categories from one, or only a few, training samples/images.
• It is estimated that a child has learned almost all of the 10,000-30,000 object categories in the world by the age of six. This is due not only to the human mind's computational power, but also to its ability to synthesize and learn new object classes from existing information about different, previously learned classes.
Problem statement
Using a large annotated offline dataset (e.g., dog, elephant, monkey, …), perform a task for novel categories (e.g., lemur, rabbit, mongoose), each represented by just a few samples.
• Offline training on the large dataset produces transferable knowledge.
• Knowledge transfer, plus online training on the few available samples, yields a model for the novel categories on the given task.
• The same scheme applies whether the task model is a classifier (classification), a detector (detection), or a regressor (regression).
Why work on few-shot learning?
1. It brings DL closer to real-world business use cases.
• Companies hesitate to spend much time and money on annotating data for a solution whose profitability is uncertain.
• Relevant objects are continuously replaced with new ones; DL has to be agile.
2. It involves a bunch of exciting cutting-edge technologies:
• Meta-learning methods
• Networks generating networks
• Data synthesizers
• Semantic metric spaces
• Graph neural networks
• Neural Turing Machines
• GANs
Few-shot learning: each category is represented by just a few examples; learn to perform classification, detection, or regression. Three families of approaches:
• Meta-learning: learn a learning strategy that adjusts well to a new few-shot learning task.
• Metric learning: learn a `semantic` embedding space using a distance loss function.
• Data augmentation: synthesize more data from the novel classes to facilitate regular learning.
The n-shot, k-way task
• The ability of an algorithm to perform few-shot learning is typically measured by its performance on n-shot, k-way tasks. These are run as follows (a sampling sketch follows this list):
1. A model is given a query sample belonging to a new, previously unseen class.
2. It is also given a support set, S, consisting of n examples each from k different unseen classes.
3. The algorithm then has to determine which of the support-set classes the query sample belongs to.
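To make the episode protocol concrete, here is a minimal sketch (not from the slides) of sampling one n-shot, k-way episode; the dict-of-lists `dataset` layout and the function name are assumptions made for the example.

```python
import random

def sample_episode(dataset, n_shot, k_way, n_query=1):
    """Sample one n-shot, k-way episode.

    `dataset` is assumed to map class label -> list of samples.
    Returns a support set with n_shot examples for each of k_way classes,
    plus query examples drawn from those same classes.
    """
    classes = random.sample(list(dataset.keys()), k_way)
    support, query = [], []
    for label in classes:
        examples = random.sample(dataset[label], n_shot + n_query)
        support += [(x, label) for x in examples[:n_shot]]
        query += [(x, label) for x in examples[n_shot:]]
    return support, query

# Typical configuration from the slides: N=5 ways, M=1 or 5 shots, e.g.
# support, query = sample_episode(train_data, n_shot=1, k_way=5)
```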
Meta-Learning
• Standard learning: data instances → training a learner on the data → model. The learner is task-specific: specific classes, training data, target data.
• Meta-learning: many tasks → training a meta-learner to learn on each task → a learning strategy (task-agnostic data knowledge).
• On a new task: the task data plus the meta-learner produce a task-specific model.
Recurrent meta-learners
Matching Networks, Vinyals et al., NIPS 2016
Distance-based classification, based on similarity between the query and support samples in the embedding space (adaptive metric):

$\hat{y} = \sum_i a(x, x_i)\, y_i, \qquad a(x, x_i) = \mathrm{similarity}\big(f(x, S),\, g(x_i, S)\big)$

where $f, g$ are LSTM embeddings of $x$ dependent on the support set $S$.
• The embedding space is class-agnostic.
• The LSTM attention mechanism adjusts the embedding to the task (to be elaborated later).
Concept of episodes: reproduce the test conditions in the training.
• N new categories
• M training examples per category
• one query example in {1..N} categories
• Typically, N=5, M=1 or 5.

Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
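As a concrete illustration of the attention formula above, here is a minimal PyTorch sketch of the classification step, assuming the embeddings f(x, S) and g(x_i, S) have already been computed elsewhere; tensor shapes and the helper name are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def matching_predict(query_emb, support_embs, support_labels, k_way):
    """Attention-based label mixture from Matching Networks:
    y_hat = sum_i a(x, x_i) * y_i, with a = softmax over cosine similarities.

    query_emb:      (d,)  embedding f(x, S) of the query
    support_embs:   (m, d) embeddings g(x_i, S) of the support samples
    support_labels: (m,)  integer class labels
    """
    sims = F.cosine_similarity(query_emb.unsqueeze(0), support_embs, dim=1)  # (m,)
    attn = F.softmax(sims, dim=0)                                            # a(x, x_i)
    one_hot = F.one_hot(support_labels, num_classes=k_way).float()           # (m, k)
    return attn @ one_hot                                                    # class distribution
```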
Optimization as a model for few-shot learning
• The Meta-Learner LSTM (Ravi & Larochelle, ICLR 2017) learns a general initialization of the learner (classifier) network that allows for quick convergence of training.
Problem: gradient-based optimization in high-capacity classifiers requires many iterative steps over many examples to perform well.
Solution: an LSTM-based meta-learner model that learns the exact optimization algorithm used to train another learner neural-network classifier in the few-shot regime.
Optimizers
Optimize the learner to perform well after fine-tuning on the task data with a single (or a few) step(s) of gradient descent.
MAML (Model-Agnostic Meta-Learning), Finn et al., ICML 2017
Standard objective (task-specific, for task T):

$\min_\theta \mathcal{L}_T(\theta)$, learned via the update $\theta' = \theta - \alpha \nabla_\theta \mathcal{L}_T(\theta)$

Meta-objective (across tasks):

$\min_\theta \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta')$, learned via the update $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta')$

(figure reprinted from Li et al., 2017)
Meta-SGD, Li et al., 2017: render α as a learned vector of the same size as θ.
"Interestingly, the learning process can continue forever, thus enabling life-long learning, and at any moment, the meta-learner can be applied to learn a learner for any new task."
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Meta-SGD | 54.24 / 70.86
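To make the two-level update concrete, below is a minimal sketch of one MAML meta-step on a toy linear learner (the linear model, the cross-entropy loss, and the hyperparameter values are assumptions for illustration; MAML itself is model-agnostic). Note how `create_graph=True` keeps the graph of the inner update so the outer gradient can flow through θ′.

```python
import torch
import torch.nn.functional as F

def forward(params, x):
    """The learner: a toy linear classifier, logits = x @ W + b."""
    W, b = params
    return x @ W + b

def maml_meta_step(params, tasks, alpha=0.01, beta=0.001):
    """One MAML meta-update over a batch of tasks.

    tasks: list of ((x_s, y_s), (x_q, y_q)) support/query batches.
    Inner loop:  theta' = theta - alpha * grad_theta L_T(theta)        (support)
    Outer loop:  theta <- theta - beta * grad_theta sum_T L_T(theta')  (query)
    """
    meta_loss = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:
        # Inner update on the support set; create_graph=True retains the
        # computation graph so second-order gradients can flow in the outer step.
        grads = torch.autograd.grad(F.cross_entropy(forward(params, x_s), y_s),
                                    params, create_graph=True)
        adapted = [p - alpha * g for p, g in zip(params, grads)]
        # Task loss at the adapted parameters theta', evaluated on the query set.
        meta_loss = meta_loss + F.cross_entropy(forward(adapted, x_q), y_q)
    # Outer update of the shared initialization theta.
    meta_grads = torch.autograd.grad(meta_loss, params)
    with torch.no_grad():
        for p, g in zip(params, meta_grads):
            p -= beta * g
    return float(meta_loss)
```

Initializing, e.g., `params = [torch.randn(64, 5, requires_grad=True), torch.zeros(5, requires_grad=True)]` and calling `maml_meta_step(params, tasks)` repeatedly performs meta-training; in Meta-SGD, `alpha` would itself be a learned per-parameter vector rather than a fixed scalar.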
Metric Learning
Offline training: data instances → deep embedding model.
• Training: achieve good class distributions for the offline categories (class 1, class 2, class 3, …) in a semantic embedding space.
• Inference: nearest neighbour in the embedding space. On new task data (classes A, B, C), embed the query q and classify it by comparing the distances d(q,A), d(q,B), d(q,C).
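A minimal sketch of that inference step, assuming the deep embedding model has already mapped the query and the new-task support samples to vectors (shapes and names are illustrative):

```python
import torch

def nearest_neighbour_classify(query_emb, support_embs, support_labels):
    """Nearest-neighbour inference in the learned embedding space:
    assign the query the label of the closest support embedding.

    query_emb:      (d,)   embedding of the query q
    support_embs:   (m, d) embeddings of the new-task support samples
    support_labels: list of m labels (e.g., 'A', 'B', 'C')
    """
    dists = torch.cdist(query_emb.unsqueeze(0), support_embs)  # (1, m): d(q, .)
    return support_labels[dists.argmin().item()]
```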
Metric Learning
Relation Networks, Sung et al., CVPR 2018
Use the Siamese-network principle:
• Concatenate the embeddings of the query and support samples.
• A relation module is trained to produce a score of 1 for the correct class and 0 for the others.
• Extends to zero-shot learning by replacing the support embeddings with semantic features.
(replicated from Sung et al., Learning to Compare: Relation Network for Few-Shot Learning, CVPR 2018)
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59
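For illustration, a minimal PyTorch sketch of a relation module along these lines; the MLP head below is an assumed stand-in for the paper's convolutional relation head, and the hidden size is an illustrative choice.

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Score a concatenated (query, support) embedding pair with a small MLP,
    trained so the correct class scores 1 and the others score 0."""
    def __init__(self, emb_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, query_emb, support_emb):
        pair = torch.cat([query_emb, support_emb], dim=-1)  # concatenation step
        return self.net(pair).squeeze(-1)                   # relation score in [0, 1]
```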
Metric Learning
Matching Networks, Vinyals et al., NIPS 2016
Objective: maximize the log-likelihood of the non-parametric softmax classifier, $\sum_{(x,y)} \log P_\theta(y \mid x, S)$, with

$P_\theta(y \mid x, S) = \mathrm{softmax}\big(\cos(f(x, S),\, g(x_i, S))\big)$

Prototypical Networks, Snell et al., 2017:
Each category is represented by its mean sample (prototype).
Objective: maximize the log-likelihood of the prototype-based softmax classifier.
Method | miniImageNet classification accuracy (1-shot / 5-shot)
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Prototypical Networks | 49.42 / 68.20
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59
Prototypical Networks
• In Prototypical Networks, Snell et al. apply a compelling inductive bias in the form of class prototypes to achieve impressive few-shot performance, exceeding Matching Networks without the complication of FCE (full context embeddings). The key assumption is that there exists an embedding in which samples from each class cluster around a single prototypical representation, which is simply the mean of the individual samples.
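A minimal sketch of Prototypical Networks inference under that assumption (the squared-Euclidean distance follows the paper; tensor shapes and the helper name are illustrative):

```python
import torch

def prototypical_classify(query_embs, support_embs, support_labels, k_way):
    """Each class prototype is the mean of its support embeddings; queries are
    classified by a softmax over negative squared distances to the prototypes.

    query_embs:     (q, d)
    support_embs:   (m, d)
    support_labels: (m,) integer labels in [0, k_way)
    """
    protos = torch.stack([support_embs[support_labels == c].mean(dim=0)
                          for c in range(k_way)])          # (k, d) class means
    dists = torch.cdist(query_embs, protos) ** 2           # squared distances
    return torch.softmax(-dists, dim=1)                    # (q, k) class probabilities
```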
Sample synthesis
• Offline stage: many data instances → train a synthesizer model capable of sampling from a class distribution (data knowledge).
• On new task data: the few available instances of the novel classes → synthesizer model → many synthesized data instances; together with the offline data, these are used to train a task model.
More augmentation approaches
Δ-encoder, Schwartz et al., NeurIPS 2018
• Use a variant of an autoencoder to capture, in the latent space, the intra-class difference between two samples of the same class.
• Transfer class distributions from training classes to novel classes.
Diagram: the encoder maps a sampled (reference, target) pair to a latent delta Z; at synthesis time the decoder applies a sampled delta to a new-class reference to produce a synthesized new-class example.
Eliyahu Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Rogerio Feris, Abhishek Kumar, Raja Giryes and Alex M. Bronstein. 'Delta-encoder: an effective sample synthesis method for few-shot object recognition', NeurIPS 2018.
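A minimal sketch of this encoder/decoder structure; the layer sizes are illustrative assumptions (the actual model operates on pre-computed feature vectors rather than raw images):

```python
import torch
import torch.nn as nn

class DeltaEncoder(nn.Module):
    """Training: encode the delta between a same-class (reference, target) pair
    into a low-dimensional Z, then reconstruct the target from (Z, reference).
    Synthesis: apply a delta sampled from seen-class pairs to a new-class
    reference to produce a synthesized new-class example."""
    def __init__(self, feat_dim, z_dim=16, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, z_dim))
        self.decoder = nn.Sequential(
            nn.Linear(z_dim + feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, feat_dim))

    def forward(self, reference, target):
        z = self.encoder(torch.cat([target, reference], dim=-1))   # intra-class delta
        return self.decoder(torch.cat([z, reference], dim=-1))     # reconstruct target

    def synthesize(self, sampled_reference, sampled_target, new_class_reference):
        z = self.encoder(torch.cat([sampled_target, sampled_reference], dim=-1))
        return self.decoder(torch.cat([z, new_class_reference], dim=-1))
```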
Few-shot learning through an Information Retrieval lens
Goal: ranking for classification.
We want to classify a point by finding out which class is the most similar one, so we rank all the other points w.r.t. some similarity measure.
Eleni Triantafillou, Richard Zemel, and Raquel Urtasun. Few-Shot Learning Through an Information Retrieval Lens. In Advances in Neural Information Processing Systems, 2252-2262, 2017. https://arxiv.org/abs/1707.02610
Mean Average Precision
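For reference, a small sketch of how AP and mAP are computed from binary relevance lists; in the few-shot setting each point acts as a query and its same-class points are the "relevant documents" (function names are illustrative):

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query: mean of precision@k at each relevant rank position."""
    ranked_relevance = np.asarray(ranked_relevance, dtype=float)
    hits = np.cumsum(ranked_relevance)
    ranks = np.arange(1, len(ranked_relevance) + 1)
    precision_at_hits = (hits / ranks)[ranked_relevance == 1]
    return precision_at_hits.mean() if len(precision_at_hits) else 0.0

def mean_average_precision(relevance_lists):
    """mAP: average AP over all queries."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))

# Example: same-class items at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
# print(average_precision([1, 0, 1, 0]))  # 0.833...
```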
PROBLEMS AHEAD
Mean Average Precision is a terrible loss function for gradient-descent purposes: it depends on the scores only through the ranking, so it is piecewise constant and its gradient is zero almost everywhere.
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky. Submitted on 20 May 2019.
Thank you
Editor's Notes
1. Meta-learning: "learn on other problems how to improve learning for our target problem."
2. References: Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. Matching Networks for One Shot Learning. NIPS 2016. Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-Learning with Memory-Augmented Neural Networks. ICML 2016.
3. References: Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017. Z. Li, F. Zhou, F. Chen, and H. Li. Meta-SGD: Learning to Learn Quickly for Few-Shot Learning. arXiv:1707.09835, 2017.
4. References: Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, and Timothy Hospedales. Learning to Compare: Relation Network for Few-Shot Learning. CVPR 2018.
5. References: Eliyahu Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Rogerio Feris, Abhishek Kumar, Raja Giryes, and Alex M. Bronstein. Delta-encoder: An Effective Sample Synthesis Method for Few-Shot Object Recognition. NeurIPS 2018. Z. Chen, Y. Fu, Y. Zhang, Y.-G. Jiang, X. Xue, and L. Sigal. Semantic Feature Augmentation in Few-Shot Learning. arXiv:1804.05298, 2018.