Siamese Networks for One-shot Learning
Masa Kato
Contents
An introduction to methods for one-shot learning using Siamese neural networks.
• Signature Verification using a "Siamese" Time Delay Neural Network (1993), NIPS
• Siamese Neural Networks for One-shot Image Recognition (2015), ICML
• Matching Networks for One Shot Learning (2016), NIPS
A proposal of my own idea for matching.
History of One-shot Learning
One-shot learning was first proposed by Fei-Fei et al. (2003); Fei-Fei et al. (2006), who developed a variational Bayesian framework.
Lake et al. (2013) proposed an algorithm based on a method called Hierarchical Bayesian Program Learning.
Methods based on metric learning were proposed (Koch et al. (2015); Vinyals et al. (2016)).
Methods based on neural networks with memory were proposed (Graves et al. (2014); Santoro et al. (2016)).
There are also other general formulations and domain-specific studies.
One-shot object detection was proposed by Schwartz et al. (2018).
Recent Methods for One-shot Learning
using Neural Networks
1. Metric learning
2. Memory networks
Papers:
1. Koch et al. (2015)
2. Graves et al. (2014)
1+2. Vinyals et al. (2016)
The Siamese network is often used.
• Siamese nets were first introduced by Bromley et al.
(1993) to solve signature verification as an image
matching problem.
• Koch et al. (2015) proposed Deep Siamese Networks
for one-shot image recognition.
• Vinyals et al. (2016) proposed Matching Nets, a model
that incorporates a memory network into Deep Siamese
Networks and formulates the task as a classification
problem.
• Schwartz et al. (2018) applied existing methods to
one-shot object detection.
Siamese Network
A Siamese network consists of two identical sub-networks joined at their outputs.
[Figure: Image A and Image B each pass through a sub-network layer; the joined outputs compute a metric between A and B.]
More Detail of Basic Structure
[Figure: Image A and Image B are processed by twin sub-networks with the same structure and weights.]
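As an illustrative sketch of this shared-weight structure (PyTorch assumed; the slides specify no framework, and the layer sizes and L1 distance below are arbitrary choices, not a specific paper's architecture):

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Twin sub-network; the layer sizes are illustrative placeholders."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, 128),  # assumes 28x28 grayscale inputs
        )

    def forward(self, x):
        return self.net(x)

class SiameseNet(nn.Module):
    def __init__(self):
        super().__init__()
        # One embedding module applied to both inputs, so the two
        # "twins" share the same structure and weights.
        self.embed = EmbeddingNet()

    def forward(self, x1, x2):
        h1, h2 = self.embed(x1), self.embed(x2)
        # Element-wise L1 distance between the two embeddings,
        # to be fed into a metric / similarity head.
        return torch.abs(h1 - h2)
```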
Signature Verification
using a “Siamese” Time Delay
Neural Network
• The aim of the project was to build a signature verification system based on the
NCR 5990 Signature Capture Device.
• A signature consists of 800 sets of 𝑥, 𝑦, and pen up/down points, sampled over time 𝑡.
• The data are preprocessed before training the network.
Bromley et al. (1993)
Performance
GA: Genuine signature pairs
• Pairs of correct, genuine signatures.
FR: Forgeries
• Signatures written to deceive.
The network classified signatures and detected forgeries with good performance.
Siamese Neural Networks
for One-shot Image Recognition
• Siamese nets were first introduced by Bromley et al. (1993) to solve signature
verification as an image matching problem.
• Koch et al. (2015) used a deep convolutional neural network to extract features of
images before computing the distance between them.
Koch et al. (2015)
• The model is a Siamese convolutional network with $L$ layers, each with $N_l$ units, where
$h_{1,l}$ represents the hidden vector in layer $l$ for the first twin, and $h_{2,l}$ denotes the same
for the second twin.
• ReLU units are used in the first $L-2$ layers and sigmoidal units in the remaining layers.
[Figure: Image A and Image B pass through the twin Deep Siamese Networks; a distance metric is computed between their outputs.]
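As a concrete sketch of this prediction step (PyTorch assumed; this follows the weighted L1 distance with a sigmoid output described in Koch et al. (2015), with the feature dimension as a placeholder):

```python
import torch
import torch.nn as nn

class SimilarityHead(nn.Module):
    """Koch et al. (2015)-style head: p = sigmoid(sum_j alpha_j * |h1_j - h2_j|),
    where alpha are learned weights on the component-wise L1 distance."""
    def __init__(self, feature_dim=4096):  # 4096 is a placeholder width
        super().__init__()
        self.alpha = nn.Linear(feature_dim, 1)

    def forward(self, h1, h2):
        dist = torch.abs(h1 - h2)               # component-wise L1 distance
        return torch.sigmoid(self.alpha(dist))  # P(same class)
```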
Learning
$M$: minibatch size.
$i$: indexes the $i$-th minibatch.
$y(x_1^{(i)}, x_2^{(i)})$: a length-$M$ vector containing the labels for the minibatch.
• $y(x_1^{(i)}, x_2^{(i)}) = 1$ whenever $x_1$ and $x_2$ are from the same class.
• $y(x_1^{(i)}, x_2^{(i)}) = 0$ otherwise.
Regularized cross-entropy objective on a binary classifier:
$$\mathcal{L}(x_1^{(i)}, x_2^{(i)}) = y(x_1^{(i)}, x_2^{(i)}) \log p(x_1^{(i)}, x_2^{(i)}) + \big(1 - y(x_1^{(i)}, x_2^{(i)})\big) \log\big(1 - p(x_1^{(i)}, x_2^{(i)})\big) + \boldsymbol{\lambda}^{T} |\mathbf{w}|^{2}$$
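A minimal sketch of this objective (PyTorch assumed; the function name and the single scalar `lam`, standing in for the per-layer $\lambda$ vector above, are my simplifications). Training minimizes the negative of the log-likelihood terms:

```python
import torch

def siamese_loss(p, y, weights, lam=1e-4):
    """Regularized binary cross-entropy over a minibatch of pairs.

    p: predicted P(same class) for each pair, shape (M,)
    y: labels, 1 if the pair is from the same class else 0, shape (M,)
    weights: iterable of weight tensors for the L2 penalty
    """
    eps = 1e-7  # guard against log(0)
    ce = -(y * torch.log(p + eps) + (1 - y) * torch.log(1 - p + eps)).mean()
    l2 = sum((w ** 2).sum() for w in weights)
    return ce + lam * l2
```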
Dataset
Dataset: Omniglot
1623 characters from 50 different alphabets (40 train, 10 test).
Each of these was hand-drawn by 20 different people.
The number of letters in each alphabet varies considerably from about 15 to
upwards of 40 characters.
N-way k-shot learning
This is a problem setting often used in one-shot learning (see the episode-sampling sketch below).
• Pick $N$ classes.
• Use $k$ training examples per class.
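A minimal sketch of sampling such an N-way k-shot episode (plain Python; the dict layout and all names are illustrative assumptions):

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1):
    """Sample an N-way k-shot episode plus one query example per class.

    data_by_class: dict mapping class label -> list of examples
    Returns (support, query) as lists of (example, label) pairs.
    """
    classes = random.sample(list(data_by_class), n_way)  # pick N classes
    support, query = [], []
    for label in classes:
        # k support examples and 1 query example, drawn without overlap
        drawn = random.sample(data_by_class[label], k_shot + 1)
        support += [(x, label) for x in drawn[:k_shot]]
        query.append((drawn[-1], label))
    return support, query
```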
Experiments
[Table: results by the number of samples and data augmentation. Training uses 30 of the 50 alphabets with 12 of the 20 drawings each; fine-tuning uses 20 further alphabets (excluding the previous 30) with 1 drawing from 20.]
Matching Networks
for One Shot Learning
[Figure: comparison of the two models. In "Siamese Neural Networks for One-shot Image Recognition", Image A and Image B pass through twin layers that compute a metric; in "Matching Networks for One Shot Learning", an attention mechanism over support images A, B, and C feeds a classification layer.]
Vinyals et al. (2016)
Concepts
Parametric models (Deep Learning):
➕ Excellent generalization.
➖ Learning is slow and depends on large datasets, requiring many weight updates via SGD.
Non-parametric models:
➕ Novel examples can be assimilated rapidly.
➖ Some models in this family require no training, but performance depends on the chosen metric.
Matching Nets incorporate characteristics of both parametric and non-parametric models: rapid acquisition of new examples while providing excellent generalization from common examples.
1. The paper proposes Matching Nets, a neural network that uses recent advances in attention and
memory to enable rapid learning.
2. The training procedure is based on a simple machine learning principle: test and train
conditions must match.
Model Architecture
• A neural attention mechanism is defined to access a memory matrix which
stores useful information to solve the task at hand.
Given a support set of $k$ image-label pairs $S = \{(x_i, y_i)\}_{i=1}^{k}$, we want a classifier $c_S(x)$ that defines a probability distribution over outputs $y$ given a test example $x$.
Define the mapping $S \to c_S(x)$ to be $P(y \mid x, S)$, where $P$ is parametrized by a neural network.
Model Architecture
• The model computes the output label $\hat{y}$ as
$$\hat{y} = \sum_{i=1}^{k} a(x, x_i)\, y_i$$
where $x_i, y_i$ are the samples and labels from the support set $S = \{(x_i, y_i)\}_{i=1}^{k}$, and $a$ is
an attention mechanism, discussed on the next slide.
• If the support set contains only one image per class, this is one-shot learning.
Formulation and Learning
The Attention Kernel
The algorithm relies on choosing $a(\cdot, \cdot)$, the attention mechanism. The simplest form is a softmax over the cosine distance $c$, i.e.,
$$a(x, x_i) = \frac{e^{c(f(x),\, g(x_i))}}{\sum_{j=1}^{k} e^{c(f(x),\, g(x_j))}}$$
with embedding functions $f$ and $g$ being appropriate neural networks to embed $x$ and $x_i$.
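A minimal sketch of this attention kernel together with the prediction $\hat{y} = \sum_i a(x, x_i)\, y_i$ from the previous slide (PyTorch assumed; the identity embeddings stand in for trained $f$ and $g$):

```python
import torch
import torch.nn.functional as F

def matching_predict(query, support_x, support_y, f, g):
    """Matching Nets prediction: softmax over cosine similarities
    between embedded query and support, then a label-weighted sum.

    query:     (d,) test example x
    support_x: (k, d) support examples x_i
    support_y: (k, n_classes) one-hot labels y_i
    """
    q = f(query)                                          # embed x
    s = g(support_x)                                      # embed each x_i
    sims = F.cosine_similarity(q.unsqueeze(0), s, dim=1)  # c(f(x), g(x_i))
    a = F.softmax(sims, dim=0)                            # attention a(x, x_i)
    return a @ support_y                                  # y_hat over classes

# Usage with identity embeddings as placeholders for trained networks:
f = g = lambda t: t
support_x = torch.randn(5, 16)   # 5-way, 1-shot support set
support_y = torch.eye(5)         # one-hot labels
print(matching_predict(torch.randn(16), support_x, support_y, f, g))
```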
Definition
$L$: a possible label set.
• For example, $L$ could be the label set $\{\text{cats}, \text{dogs}\}$.
$T$: a distribution over label sets $L$. This represents the training data.
Learning Step
1. Sample $L$ from $T$.
2. Sample a support set $S$ and a batch $B$ from $L$.
3. Minimize the error in predicting the labels of the batch $B$ conditioned on the
support set $S$.
Objective Function
$$\theta = \arg\max_{\theta}\; \mathbb{E}_{L \sim T}\Big[\mathbb{E}_{S \sim L,\, B \sim L}\Big[\sum_{(x, y) \in B} \log P_{\theta}(y \mid x, S)\Big]\Big]$$
This simulates the task of one-shot learning using only the training data.
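A sketch of this episode-based training loop (PyTorch assumed; `sample_episode` from the earlier sketch, and a `model(x, support)` returning the distribution $P_\theta(\cdot \mid x, S)$, are illustrative placeholders):

```python
import torch

def train_episodes(model, data_by_class, n_episodes=1000, n_way=5, k_shot=1):
    """Maximize E_L[E_{S,B}[sum log P_theta(y | x, S)]] by minimizing
    the negative log-likelihood of each sampled episode with SGD."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(n_episodes):
        # Steps 1-2: sample a label set, then a support set S and batch B.
        support, batch = sample_episode(data_by_class, n_way, k_shot)
        loss = torch.zeros(())
        for x, y in batch:
            probs = model(x, support)          # P_theta(. | x, S)
            loss = loss - torch.log(probs[y])  # y is an integer class index
        # Step 3: minimize prediction error on B conditioned on S.
        opt.zero_grad()
        loss.backward()
        opt.step()
```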
Experiments
N-way k-shot learning
• Pick $N$ unseen character classes, independent of alphabet, as $L$.
• Provide the model with one drawing of each of the $N$ characters as $S \sim L$ and a batch $B \sim L$.
Objective Function
$$\theta = \arg\max_{\theta}\; \mathbb{E}_{L \sim T}\Big[\mathbb{E}_{S \sim L,\, B \sim L}\Big[\sum_{(x, y) \in B} \log P_{\theta}(y \mid x, S)\Big]\Big]$$
Experiments
• Pixels: Nearest Neighbor on raw pixels.
• Baseline: Nearest Neighbor on features computed with a CNN.
• Convolutional Siamese net: "Siamese Neural Networks for One-shot Image Recognition".
[Table: results by the number of classes.]
References
Slides: https://www.slideshare.net/masa_s/dlmatching-networks-for-one-shot-learning-71539566
Blog: https://sorenbouma.github.io/blog/oneshot/
Papers:
• Signature Verification using a "Siamese" Time Delay Neural Network (1993), NIPS
• DeepFace: Closing the Gap to Human-Level Performance in Face Verification (2014), IEEE CVPR
• Siamese Neural Networks for One-shot Image Recognition (2015), ICML
• Matching Networks for One Shot Learning (2016), NIPS
• RepMet: Representative-based metric learning for classification and one-shot object detection (2018), arXiv