1
On Transfer Learning Techniques for Machine Learning
Assistive Robotics Technology Laboratory
School of Electrical and Computer Engineering
Purdue University, West Lafayette, IN, USA
Debasmit Das
Advisory Committee
C.S. George Lee (Chair)
Stanley Chan
Guang Lin
Guang Cheng
2
Outline
• INTRODUCTION
• DOMAIN ADAPTATION
- Motivation
- Introduction
- Method 1 (Graph Matching)
- Method 2 (Hyper-graph Matching)
- Method 3 (Graph Matching & Representation Learning)
- Summary
• SMALL SAMPLE LEARNING
- Introduction
- Few shot Learning
- Zero shot Learning
- Hypothesis Transfer Learning
• CONCLUSION
Completed Work: Domain Adaptation
Ongoing & Future Work: Small Sample Learning
3
Introduction: Machine Learning Taxonomy
Machine Learning
• Supervised Learning: Learns a function that maps input to output given input-output pairs as data. E.g. Classification, Regression.
• Unsupervised Learning: Learns some information about a dataset given input data without output labels. E.g. Clustering, Density Estimation.
• Reinforcement Learning: Learns to map the state of an agent to a desirable action, given state-action-reward data.
4
Introduction: Supervised Learning (SL)
• Collect lots of data and then annotate it.
• Choose your model depending on the task, e.g. SVM or ANN.
• Train your model against a loss: optimize cross entropy or least squares.
• Evaluate the model on unseen data: test samples → MODEL → output.
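The two training losses named on this slide can be sketched in a few lines. This is a minimal illustration in plain Python. The function names `softmax`, `cross_entropy` and `least_squares` are chosen here for illustration and do not come from the slides.

```python
import math

def softmax(logits):
    """Turn raw scores into class probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Classification loss: negative log-probability of the annotated class."""
    return -math.log(softmax(logits)[label])

def least_squares(pred, target):
    """Regression loss: mean squared difference between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# A confident correct prediction has a lower loss than a confident wrong one.
loss_right = cross_entropy([5.0, 0.0, 0.0], 0)
loss_wrong = cross_entropy([5.0, 0.0, 0.0], 1)
```

Training then amounts to adjusting the model parameters so that the chosen loss decreases on the annotated data.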
5
Introduction: SL vs. Human Learning
• Humans learn very fast compared to machines.
• Human learning allows recognizing new objects
and new domains with very little data.
• Current state-of-the-art models are closed set.
• Transfer Learning can benefit supervised learning.
SL performance increases with more
training data and more complex models.
E.g. deeper neural networks.
[Canziani et al. ISCAS’17]
Evolution of Deep Architectures
Close to human ability ?
Really ?
6
Introduction: Transfer Learning (TL)
• Allows pre-trained machine learning models to be adapted and applied to new tasks and new domains.
• New tasks can be novel categories.
• New domain can be a novel variety of the same categories.
Added Benefits
• Automatic Annotation : Reduces human effort of labelling new domains/tasks.
• Faster Learning : Learning novel tasks from less data avoids long training times.
• Data Efficiency : In some domains, obtaining data is cumbersome. E.g. Medical tests, Robotics.
7
Introduction
Real World Applications
Recognizing rare
novel objects.
Model transfer from
simulation to real world.
[Google AI Blog]
Dealing with changing appearances
and environment.
[Wulfmeier et al. IROS 2017]
8
Transfer Learning Tasks
Transfer Learning
• Domain Adaptation (Same Categories)
  - Unsupervised Domain Adaptation (Target Domain fully unlabeled)
  - Semi-supervised Domain Adaptation (Target Domain sparsely labelled)
• Small Sample Learning (Different Categories)
  - Few-shot Learning (Target Domain sparsely labelled)
  - Zero-shot Learning (Target Domain fully unlabeled)
  - Hypothesis Transfer Learning (Only source models available)
Introduction
Status legend: Completed, On-going, To Do, Not doing
9
Training and Testing conditions Introduction
UDA
• Distribution discrepancy between training and testing conditions.
• Testing data unlabeled but same categories as training.
HTL
• Base (novel) categories contain models/prototypes (few labelled data).
• Base categories used for training and novel categories used for testing.
FSL
• Base categories used for training and novel categories used for testing.
• Base categories contain abundant labelled data. Novel categories contain few labelled data.
ZSL
• Base categories used for training and novel categories used for testing.
• Base (novel) categories contain abundant labelled (unlabeled) data. Class-level semantic information available.
10
Unsupervised Domain Adaptation (UDA) Introduction
Minimize distribution discrepancy by matching graphs/hyper-graphs.
Domain adaptation with/without representation learning.
Create maximum margin classifier.
[Figure: source and target domains, without and with representation learning; Classes 1-3, Samples 1-3]
Results
• Produces better recognition performance than global methods.
• Representation learning is slower but produces better recognition performance.
• Third order matching produces better results than second order matching.
11
Few Shot Learning (FSL) Introduction
Infer the class statistics of the data-starved novel classes.
Use a discriminative low dimensional space.
Pair-wise distances between points used as features.
Produces more accurate classification.
Produces more dense feature space.
Preliminary Results
• Produces competitive performance with respect to previous methods.
• The most important component of the framework is not yet clear. Ablation studies required.
12
Zero Shot Learning (ZSL) Introduction
Structurally match feature and semantics.
Adapt the semantic space and classification scores (Domain Adaptation, Score calibration).
Preliminary Results
• Produces recognition performance much better than previous methods.
• DA step is most important because of better generalization to novel data.
• Calibration is not as effective as DA because it does not use novel test data.
13
Hypothesis Transfer Learning (HTL) Introduction
Match source models with target samples.
Properly constrain the correspondence matrix:
• Correspondences should be positive to prevent negative transfer.
• Correspondences should be bounded to select relevant source models.
• Possibly, explore first and second-order matching between models and samples.
Expected Results
• Should produce performance better than the no-adaptation baseline and close to the oracle baseline.
• Selected models should have semantic similarity with the samples.
14
Introduction
Common theme of using relations/matching and structures in Transfer Learning tasks.
Still, each task is unique in its methodology.
• Sample to sample relation for UDA.
• Sample to prototype relation for FSL.
• Sample to semantics relation for ZSL.
• Sample to model relation for HTL.
Relations
Sample (Level 1)
Prototype (Level 2)
Semantics (Level 3)
Category (Level 4)
Increasing level of information
Structures
• Structural alignment between domains
for UDA.
• Structural alignment between sample
and semantics for ZSL.
• Structure between samples yields a
new representation for FSL.
Unified approach to TL
15
Introduction: Impact beyond TL
UDA
Core Idea : Match distributions using graphs/hyper-graphs.
Impact : Generative Models, Anomaly detection.
FSL
Core Idea : Discriminative low-dimensional space and generating statistics.
Impact : Discriminative/Generative Learning.
ZSL
Core Idea : Adaptive matching between features and semantics.
Impact : Media Retrieval, Description generation.
HTL
Core Idea : Match models and samples.
Impact : Recommendation System, Learn-ware.
16
Publications Introduction
Unsupervised Domain Adaptation
• Debasmit Das and C. S. George Lee. "Sample-to-sample correspondence for unsupervised domain adaptation." Engineering Applications of Artificial Intelligence (EAAI), vol. 73, 2018, pp. 80-91.
• Debasmit Das and C. S. George Lee. "Graph Matching and Pseudo-Label Guided Deep Unsupervised Domain Adaptation." Proceedings of the International Conference on Artificial Neural Networks (ICANN), 2018, pp. 342-352.
• Debasmit Das and C. S. George Lee. "Unsupervised Domain Adaptation Using Regularized Hyper-Graph Matching." Proceedings of the IEEE International Conference on Image Processing (ICIP), 2018, pp. 3758-3762.
Zero Shot Learning
• Debasmit Das and C. S. George Lee. "Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibration." Accepted at the International Joint Conference on Neural Networks (IJCNN), 2019.
Few Shot Learning
• Debasmit Das and C. S. George Lee. "A Two-Stage Approach to Few-Shot Learning for Image Recognition." Under review at the IEEE Transactions on Image Processing (TIP).
17
Domain Adaptation
18
Motivation
Classical Supervised Learning Setting
Find where
Training and testing
samples from same
distribution !!
19
Classifying a Dog and a Cat
Training Samples Testing Samples
Training and testing
distribution different!!
Domain Adaptation
Required!!
Motivation
20
Domain Adaptation Methods
Non-Deep Methods
• Instance Re-weighting [Dai et al. ICML’07].
• Parameter Adaptation [Bruzzone et al. TPAMI’10].
• Feature Transformation [Fernando et al. ICCV’13] [Sun et al. AAAI’16].
Deep Methods
• Discrepancy Based [Long et al. ICML’15] [Sun et al. ECCV’16].
• Adversarial Based [Ganin et al. JMLR’16] [Tzeng et al. CVPR’17].
Motivation
21
Motivation
Discrepancy Based Methods
Mostly global metrics. Minimize the difference between
source and target data statistics such as
covariance [Sun et al. ECCV’16] or
maximum mean discrepancy
[Long et al. ICML’15].
Local Method
Optimal Transport
[Courty et al. TPAMI’17]. Basically
point-point matching. Using first
order information
might be misleading.
Higher order method
Uses structural
information. Relation
between data is used
to match domains.
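As an illustration of a global discrepancy metric, the squared maximum mean discrepancy (MMD) between a source and a target sample set can be estimated as below. This is a minimal sketch in plain Python using the biased estimator with an RBF kernel. The kernel choice, the `gamma` value and the function names are assumptions for illustration, not details taken from the cited papers.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel between two feature vectors (gamma is an assumed bandwidth)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    def mean_kernel(X, Y):
        return sum(rbf(x, y, gamma) for x in X for y in Y) / (len(X) * len(Y))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
near = [[x + 0.1, y + 0.1] for x, y in source]   # small domain shift
far = [[x + 3.0, y + 3.0] for x, y in source]    # large domain shift

shift_small = mmd2(source, near)
shift_large = mmd2(source, far)
```

The metric summarizes each domain by a single global statistic. It is zero for identical sets and grows with the shift, but it says nothing about which source sample corresponds to which target sample, which is the gap the local and higher-order methods address.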
22
Representing Correspondences
Correspondence Matrix (binary entries, 1 or 0) → Continuous Relaxation → Matching Matrix
First-order Matching → Continuous Relaxation
Second-order Matching → Continuous Relaxation
Introduction
23
Qualitative Comparison
Method                          | 1st order matching | 2nd order matching | 3rd order matching | Representation learning
Method 1 [Das & Lee, EAAI’18]   | Yes                | Yes                | No                 | No
Method 2 [Das & Lee, ICIP’18]   | Yes                | Yes                | Yes                | No
Method 3 [Das & Lee, ICANN’18]  | Yes                | Yes                | No                 | Yes
Introduction
• Each method has a unique optimization.
• Method 3 has additional training stage.
- Method 1 (Graph Matching)
- Method 2 (Hyper-graph Matching)
- Method 3 (Graph Matching &
Representation Learning)
24
Proposed Approach
Construct graphs
from source & target
samples
Find Matching
between sample
graphs
Map source domain
to target domain
Method 1 (Graph Matching)
For details, refer to: Debasmit Das and C.S. George Lee, “Sample-to-Sample Correspondence for Unsupervised Domain Adaptation,” Engineering Applications of Artificial Intelligence, Vol. 73, pp. 80-91, May 2018.
Components: First-order Matching, Second-order Matching, Class Regularization
Optimization : Conditional Gradient + Network Simplex
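To make the conditional gradient (Frank-Wolfe) idea concrete, here is a toy sketch on a relaxed graph matching objective. Everything specific in it is an assumption for illustration: the objective `<C, P> + lam * ||A P - P B||_F^2` (a first-order term plus a second-order edge-agreement term), the brute-force search over permutations standing in for the network simplex solver, and the grid line search. It is not the formulation from the EAAI paper.

```python
import itertools

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(r) for r in zip(*X)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def objective(P, C, A, B, lam):
    """First-order cost <C, P> plus second-order term lam * ||A P - P B||_F^2."""
    first = sum(c * p for rc, rp in zip(C, P) for c, p in zip(rc, rp))
    R = sub(matmul(A, P), matmul(P, B))
    return first + lam * sum(r * r for row in R for r in row)

def gradient(P, C, A, B, lam):
    R = sub(matmul(A, P), matmul(P, B))
    G2 = sub(matmul(transpose(A), R), matmul(R, transpose(B)))
    return [[c + 2 * lam * g for c, g in zip(rc, rg)] for rc, rg in zip(C, G2)]

def linear_oracle(G):
    """Best permutation matrix for the linearized cost (brute force, tiny n only)."""
    n = len(G)
    best = min(itertools.permutations(range(n)),
               key=lambda p: sum(G[i][p[i]] for i in range(n)))
    return [[1.0 if best[i] == j else 0.0 for j in range(n)] for i in range(n)]

def frank_wolfe(C, A, B, lam=1.0, iters=30):
    n = len(C)
    P = [[1.0 / n] * n for _ in range(n)]          # start from the uniform matching
    for _ in range(iters):
        S = linear_oracle(gradient(P, C, A, B, lam))
        candidates = []
        for k in range(21):                         # step sizes 0, 0.05, ..., 1
            g = k / 20
            Q = [[(1 - g) * p + g * s for p, s in zip(rp, rs)] for rp, rs in zip(P, S)]
            candidates.append((objective(Q, C, A, B, lam), k, Q))
        P = min(candidates, key=lambda t: t[0])[2]  # step 0 is allowed, so no increase
    return P

A = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]               # source graph edge weights
B = [[0, 2, 1], [2, 0, 0], [1, 0, 0]]               # same graph with nodes relabeled
C = [[0.0] * 3 for _ in range(3)]                   # no first-order cost in this toy
P0 = [[1 / 3] * 3 for _ in range(3)]
P = frank_wolfe(C, A, B)
```

Each iterate is a convex combination of permutation matrices, so it always stays inside the relaxed feasible set without any projection step. That property is the main reason conditional gradient suits matching problems.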
27
Real Data : SURF Features Method 1
Comparison with previous work
CalTech (C)
MNIST (M)
USPS (U)
Amazon (A)
DSLR (D) Webcam (W)
Deep Features, SURF Features
28
Proposed Approach
Find Exemplars from
both Domains
Find Matching
between exemplar
hyper-graphs
Map source domain
to target domain
Method 2 (Hyper-graph Matching)
For details, refer to: Debasmit Das and C.S. George Lee, “Unsupervised Domain Adaptation Using Regularized Hyper-Graph Matching,” Proceedings of 2018 IEEE International Conference on Image Processing (ICIP), Athens, Greece, pp. 3758-3762, October 7-10, 2018.
Exemplar selection: Affinity Propagation
Optimization : Conditional Gradient + ADMM
31
Experimental Results Method 2
32
Proposed Approach
Method 3 (Graph Matching & Representation Learning)
Stage 1
• Construct graphs from source & target representation.
• Find matching between source & target representation.
• Optimize the shared domain representation and classifier.
Result: domain discrepancy reduced.
Stage 2
• Select unlabeled target samples with confident outputs.
• Apply novel large margin loss on these samples.
• Optimize the shared domain representation and classifier.
Result: large margin classifier.
For details, refer to: Debasmit Das and C.S. George Lee, “Graph Matching and Pseudo-Label Guided Deep Unsupervised Domain Adaptation,” Proceedings of 2018 International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece, pp. 342-352, October 4-7, 2018.
Optimization : Stochastic Gradient Descent
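The Stage 2 selection step (keeping only unlabeled target samples with confident outputs) can be sketched as follows. The softmax-confidence criterion, the 0.9 threshold and the function names are assumptions for illustration. The large margin loss itself is not reproduced here.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_confident(target_logits, threshold=0.9):
    """Return (sample index, pseudo-label) for target samples whose top
    class probability reaches the (assumed) confidence threshold."""
    selected = []
    for idx, logits in enumerate(target_logits):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((idx, label))
    return selected

# Classifier outputs for three unlabeled target samples (made-up numbers).
target_logits = [[4.0, 0.0, 0.0],    # confident: class 0
                 [0.2, 0.1, 0.0],    # ambiguous: skipped
                 [0.0, 0.0, 5.0]]    # confident: class 2
pseudo = select_confident(target_logits)
```

Only the selected samples would then contribute to the Stage 2 loss, with their predicted classes used as pseudo-labels.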
33
Overall architecture
Stage 1 training Stage 2 training
Method 3
35
Comparison with previous work
MMD (Maximum Mean Discrepancy) [JMLR’12], DANN (Domain Adversarial) [JMLR’16], CORAL (Correlation alignment) [ECCV’16], WDGRL (Wasserstein Distance) [AAAI’18]
Method 3
36
Additional Results Method 3
Without Adaptation
With Graph Matching
With Graph Matching
& Pseudo-labeling
37
Conclusions
[Figure: Methods 1, 2 and 3 compared on recognition performance and computational efficiency]
Summary
• Proposed three methods for Unsupervised Domain Adaptation.
• Use graph/hyper-graph matching to minimize domain discrepancy.
• Competitive results on standard DA datasets for image recognition.
Impact
• Localized and structure based matching for
data distributions. Extensible to time series
as well.
• Beyond DA to any method requiring
distribution matching.
• Generative Modelling – Use GM loss
instead of KL divergence for GANs.
• Anomaly Detection – Samples with higher
GM losses are anomalies.
38
Small Sample Learning
39
Small Sample Learning (SSL) Introduction
DA
• Same task but different domain.
• Source task and target task have the same set of categories.
• Source domain has abundant data but target domain has few/zero labelled data.
• Distribution discrepancy is less between source and target task.
SSL
• Same domain but different task.
• Source task and target task have different sets of categories.
• Source domain has abundant data but target domain has few/zero labelled data.
• Distribution discrepancy is more between source and target task.
40
Few Shot Learning (FSL)
Feature Space
• Base Categories (source domain) contain
abundant labelled data.
• Novel Categories (target domain) contain
few labelled data.
• Need to extract useful knowledge from
source domain.
• Apply that to recognize novel categories.
41
Related Work of FSL
Few-shot Learning
• Metric Learning [Learn a metric]
  - Matching Net [Vinyals et al. NIPS’16]
  - Proto Net [Snell et al. NIPS’17]
• Meta Learning [Learn to learn]
  - LSTM Optimization [Ravi et al. ICLR’17]
  - Model Agnostic [Finn et al. ICML’17]
• Generative approaches [Generate data]
  - GAN Hallucination [Wang et al. CVPR’18]
  - Autoencoder [Schwartz et al. NIPS’18]
• Alternative approaches
  - Model Regression [Wang et al. ECCV’16]
  - Memory Augmented [Santoro et al. ICML’16]
42
Challenges of FSL
• Curse of dimensionality: sparser feature space with increasing dimensionality.
• Uncertain class variance: the class mean location is given, but the class variance σ is unknown.
• Ill sampling of data: a novel data sample is given, but the prototype location is unknown.
43
Proposed Solution
• Find a discriminative low dimensional space. Use relative distances.
• Estimate class variance from class mean location: the class mean location is given; the class variance σ is unknown.
• Learn a category-agnostic transformation from the given novel data sample to the unknown prototype location.
FSL
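The idea of using relative distances as a low dimensional representation can be sketched as below: each sample is described by its distances to a handful of reference points (here, class means), and a novel sample is assigned to the nearest mean. This is a generic nearest-class-mean illustration with made-up data and hypothetical function names. It is a stand-in, not the proposed two-stage framework.

```python
import math

def class_means(samples_by_class):
    """Prototype of each class: the mean of its (few) labelled samples."""
    means = {}
    for label, samples in samples_by_class.items():
        dim = len(samples[0])
        means[label] = [sum(s[d] for s in samples) / len(samples) for d in range(dim)]
    return means

def distance_features(x, anchors):
    """Represent x by its Euclidean distances to a list of anchor points."""
    return [math.dist(x, a) for a in anchors]

def nearest_prototype(x, means):
    labels = sorted(means)
    feats = distance_features(x, [means[l] for l in labels])
    return labels[feats.index(min(feats))]

# Two novel classes with two labelled samples each (a 2-shot toy example).
support = {"cat": [[0.0, 0.0], [2.0, 2.0]],
           "dog": [[8.0, 8.0], [10.0, 10.0]]}
means = class_means(support)
query_features = distance_features([2.0, 1.0], [means["cat"], means["dog"]])
prediction = nearest_prototype([2.0, 1.0], means)
```

The feature vector has one entry per anchor regardless of the original feature dimensionality, which is one way a relative-distance representation stays compact.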
44
Proposed Framework FSL
45
Preliminary Results
Datasets
MiniImageNet [Vinyals et al. NIPS’16]
Omniglot [Lake et al. CogSci’11]
FSL
Comparison with previous work on Omniglot.
Comparison with previous work on MiniImageNet.
46
Zero Shot Learning (ZSL)
Feature Space
Semantic Space
• Base Categories (source domain) contain
abundant labelled data.
• Novel Categories (target domain) contain
unlabeled data.
• However, class level semantic information is
available for all categories.
• Need to relate the feature space and the semantic space.
47
Related Work of ZSL
Zero-shot Learning
• Embedding Methods [Relate feature & semantics]
  - Linear embedding [Romera-Paredes et al. ICML’15]
  - Deep Embedding [Zhang et al. CVPR’17]
• Transductive approaches [Use unlabeled test data]
  - Multiview [Fu et al. TPAMI’15]
  - Dictionary Learning [Kodirov et al. ICCV’15]
• Generative approaches [Generate data]
  - Constrained VAE [Verma et al. CVPR’18]
  - Feature GAN [Xian et al. CVPR’18]
• Hybrid approaches [Novel class from old class]
  - Semantic Similarity [Zhang et al. CVPR’15]
  - Convex Combo [Norouzi et al. ICLR’13]
48
Challenges of ZSL
Hubness
• Phenomenon where only a few candidates become nearest neighbor predictions.
• Due to curse of dimensionality.
• Initially studied by Radovanovic et al. JMLR’10.
Domain Shift
• Domain shift between unseen test data and unseen semantic embeddings.
• Arises since unseen test data is not used in training.
Seen Class Biasedness
• In the GZSL setting, test data can be from both seen and unseen categories.
• Most unseen test data is predicted as seen categories.
• Initially studied by Chao et al. ECCV’16.
ZSL
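Hubness can be checked directly by counting how often each candidate ends up as the nearest neighbor of a query. The sketch below does that for a made-up set of queries and candidate embeddings. The function name `nearest_counts` and the data are hypothetical.

```python
from collections import Counter

def sqdist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest_counts(queries, candidates):
    """For each candidate, count the queries whose nearest neighbor it is."""
    counts = Counter({j: 0 for j in range(len(candidates))})
    for q in queries:
        nearest = min(range(len(candidates)), key=lambda j: sqdist(q, candidates[j]))
        counts[nearest] += 1
    return counts

# Candidate 0 sits near the center of the queries and absorbs most of them.
candidates = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
queries = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [2.0, 2.0], [4.0, 4.0]]
counts = nearest_counts(queries, candidates)
```

A strongly skewed count distribution, with a few candidates collecting most of the queries, is the signature of a hub.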
49
Proposed Solution
One-one and pair-wise regression
• Structural matching between semantics and feature.
• Implicit reduction of dimensionality.
Domain Adaptation
• Need to adapt semantic embeddings to unseen test data.
• Use previous DA approach [Das & Lee EAAI’18].
• Find correspondences between semantic embedding and unseen test samples.
Calibration
• Scaled calibration to reduce scores of seen classes.
• Implicit reduction of variance of seen classes.
ZSL
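A minimal sketch of scaled calibration: the scores of seen classes are multiplied by a factor below 1 before taking the arg-max, so unseen classes are no longer crowded out. The scale value 0.7, the non-negative similarity scores and the class names are assumptions for illustration only.

```python
def calibrate(scores, seen_classes, scale=0.7):
    """Shrink the (non-negative) scores of seen classes by an assumed factor."""
    return {c: (s * scale if c in seen_classes else s) for c, s in scores.items()}

def predict(scores):
    return max(scores, key=scores.get)

# Similarity scores of one unseen-class test sample (made-up numbers).
scores = {"horse": 0.50, "tiger": 0.30, "zebra": 0.40}
seen = {"horse", "tiger"}

before = predict(scores)                    # biased towards a seen class
after = predict(calibrate(scores, seen))    # unseen class can now win
```

In practice the scale would be tuned on held-out data. It trades accuracy on seen classes against accuracy on unseen classes in the GZSL setting.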
50
Proposed Framework ZSL
51
Preliminary Results
Datasets
• Animals with Attributes (AwA2) [Lampert et al. TPAMI’14]
• Pascal & Yahoo (aPY) [Farhadi et al. CVPR’09]
• Caltech-UCSD Birds (CUB) [Welinder et al. ’10]
• Scene Understanding (SUN) [Patterson et al. CVPR’12]
ZSL
Comparison with previous work on four datasets.
52
Hypothesis Transfer Learning (HTL)
Feature Space
• No access to base categories (source domain)
data.
• Only high-level information about source
categories available. E.g. Model parameters, class
prototypes etc.
• Novel Categories (target domain) contain
unlabeled data.
• Need to relate the source domain models and
target domain samples.
HTL
53
Related Work of HTL
Relatively unexplored topic. Existing methods constrain target models to be some combination of source models.
• Linear Combination [Linear Models]: [Orabona et al. ICRA’09, Tommasi et al. TPAMI’14]
• Kernel Method [Non-linear Models]: [Jie et al. ICCV’11]
• Feature selection [Greedy Method]: [Kuzborskij et al. CVIU’17]
54
Proposed Direction
• Previous works only consider constant contribution of
source models across target domain.
• Need to consider variable contribution of source model
across target domain.
• Need to find model-to-sample correspondences similar
to sample-to-sample correspondences.
• Need to constrain correspondences to obtain variable
solutions, e.g. sparser correspondences to avoid
redundant contributions of source models to the same
target sample, or to prevent negative transfer.
HTL
55
Conclusion
• Justified the importance of transfer learning for machine learning and real
world applications.
• Discussed three methods for unsupervised domain adaptation, which
produced competitive results with respect to previous methods.
• Described ongoing work on few/zero shot learning with some
preliminary results. More analysis of the methods is required.
• Proposed future work, which would consist of small sample learning in a
more realistic scenario.
• Insight : Common theme of using structure, relations and matching in all
the methods.
56
THANK YOU
Any Questions ?

Preliminary Exam Slides

  • 1.
    1 Click to editMaster title style On Transfer Learning Techniques for Machine Learning Assistive Robotics Technology Laboratory School of Electrical and Computer Engineering Purdue University, West Lafayette, IN, USA Debasmit Das Advisory Committee C.S. George Lee (Chair) Stanley Chan Guang Lin Guang Cheng
  • 2.
    2 Outline • INTRODUCTION • DOMAINADAPTATION - Motivation - Introduction - Method 1 (Graph Matching) - Method 2 (Hyper-graph Matching) - Method 3 (Graph Matching & Representation Learning) - Summary • SMALL SAMPLE LEARNING - Introduction - Few shot Learning - Zero shot Learning - Hypothesis Transfer Learning • CONCLUSION Completed Work Ongoing & Future Work
  • 3.
    3 IntroductionMachine Learning Taxonomy Machine Learning Supervised Learning Reinforcement Learning Unsupervised Learning Learnsa function that maps input to output given input-output pairs as data. E.g. Classification, Regression. Learns some information about dataset given input data without output labels. E.g. Clustering, Density Estimation. Learns to map desirable action from state of an agent given state-action-rewards as data.
  • 4.
    4 IntroductionSupervised Learning (SL) Collectlot’s of data and then annotate Choose your model depending on task Train your model against a loss Evaluate model on unseen data MODEL Annotate OR SVMANN Optimize cross entropy/least squares. Output Test samples
  • 5.
    5 IntroductionSL v/s HumanLearning • Humans learn very fast compared to machine. • Human learning allows recognizing new objects and new domains with very less data. • Current state-of-the-art models are closed set. • Transfer Learning can benefit supervised learning. SL performance increases with more training data and more complex models. E.g. deeper neural networks. [Canziani et al. ISCAS’17] Evolution of Deep Architectures Close to human ability ? Really ?
  • 6.
    6 IntroductionTransfer Learning (TL) •Allows pre-trained machine learning models to be adapted and applied to new tasks and new domains. • New tasks can be novel categories. • New domain can be a novel variety of the same categories. • Automatic Annotation : Reduces human effort of labelling new domains/tasks. • Faster Learning : Learning novel tasks from less data prevents long training time. • Data Efficiency : In some domains, obtaining data is cumbersome. E.g. Medical tests, Robotics. Added Benefits
  • 7.
    7 Introduction Real World Applications Recognizingrare novel objects. Model transfer from simulation to real world. [Google AI Blog] Dealing with changing appearances and environment. [Wulfmeier et al. IROS 2017]
  • 8.
    8 Transfer Learning Tasks TransferLearning Domain Adaptation Small Sample Learning Unsupervised Domain Adaptation Semi-supervised Domain Adaptation Few-shot Learning Zero-shot Learning (Target Domain Sparsely labelled) (Same Categories) (Different Categories) (Target Domain Sparsely labelled) (Target Domain fully unlabeled) (Target Domain fully unlabeled) Hypothesis Transfer Learning (Only source models available) Introduction Completed On-going To DoNot doing
  • 9.
    9 Training and Testingconditions Introduction UDA HTL FSL ZSL • Distribution discrepancy between training and testing conditions. • Testing data unlabeled but same categories as training. • Base (novel) categories contain models/prototypes (few labelled data). • Base categories used as training and novel categories used for testing. • Base categories used as training and novel categories used for testing. • Base categories contain abundant labelled data. Novel categories contain few labelled data. • Base categories used as training and novel categories used for testing. • Base (novel) categories contain abundant labelled (unlabeled) data. Class-level semantic information available.
  • 10.
    10 Unsupervised Domain Adaptation(UDA) Introduction Results Minimize distribution discrepancy by matching graphs/ hyper-graphs. Domain adaptation with/without representation learning. Source Domain Target Domain Without representation learning With representation learning Create maximum margin classifier Class 2Class 1 Class 3 Sample 1 Sample 2 Sample 3 • Produce better recognition performance with respect to global methods. • Representation learning is slower but produces better recognition performance. • Third order matching produces better results than second order matching.
  • 11.
    11 Few Shot Learning(FSL) Introduction Preliminary ResultsInfer the class statistics of the data-starved novel classes Use a discriminative low dimensional space 1 2 3 4 5 Pair-wise distances between points used as features Produces more accurate classification Produces more dense feature space • Produce competitive performance with respect to previous methods. • Most important component of framework not clear. Ablation studies required.
  • 12.
    12 Zero Shot Learning(ZSL) Introduction Preliminary Results Structurally match feature and semantics Adapt the semantic space and classification scores Domain Adaptation Score calibration • Produce recognition performance much better than previous methods. • DA step most important because of better generalization to novel data. • Calibration not as effective as DA because it does not use novel test data.
  • 13.
    13 Hypothesis Transfer Learning(HTL) Introduction Expected ResultsMatch source models with target samples Properly constrain the correspondence matrix • Correspondences should be positive to prevent negative transfer. • Correspondences should be bounded to select relevant source models. • Possibly, explore first and second-order matching between models and samples. • Should produce performance better than no adaptation baseline and close to oracle baseline. • Selected models should have semantic similarity with the samples.
  • 14.
    14 Introduction Common theme ofusing relations/matching and structures to Transfer Learning tasks. Still each task has uniqueness in the methodology • Sample to sample relation for UDA. • Sample to prototype relation for FSL. • Sample to semantics relation for ZSL. • Sample to model relation for HTL. Relations Sample (Level 1) Prototype (Level 2) Semantics (Level 3) Category (Level 4) Increasing level of information Structures • Structural alignment between domains for UDA. • Structural alignment between sample and semantics for ZSL. • Structure between samples yield new representation for FSL. Unified approach to TL
  • 15.
    15 IntroductionImpact beyond TL UDAHTL FSL ZSL Core Idea : Match distribution Using graphs/hyper-graphs. Impact : Generative Models, Anomaly detection. Core Idea : Discriminative Low-dimensional space and generating statistics. Impact : Discriminative/ Generative Learning. Core Idea : Adaptive matching Between features and semantics. Impact : Media Retrieval, Description generation. Core Idea : Match Model and samples. Impact : Recommendation System, Learn-ware.
  • 16.
    16 Publications Introduction Unsupervised DomainAdaptation Zero Shot Learning Few Shot Learning • Debasmit, Das, and C. S. George Lee. "Sample-to-sample correspondence for unsupervised domain adaptation." Engineering Applications of Artificial Intelligence (EAAI) (73) (2018): 80-91. • Debasmit, Das, and C. S. George Lee. "Graph Matching and Pseudo-Label Guided Deep Unsupervised Domain Adaptation." Proceedings of the International Conference on Artificial Neural Networks (ICANN), 2018, pp. 342-352. • Debasmit, Das, and C. S. George Lee. "Unsupervised Domain Adaptation Using Regularized Hyper-Graph Matching," Proceedings of the IEEE International Conference on Image Processing (ICIP), 2018, pp. 3758-3762. • Debasmit, Das, and C. S. George Lee. "Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibration," Accepted at the International Joint Conference on Neural Networks (IJCNN), 2019. • Debasmit, Das, and C. S. George Lee. "A Two-Stage Approach to Few-Shot Learning for Image Recognition." Under review at the IEEE Transactions on Image Processing (TIP).
  • 17.
    17 Click to editMaster title style Domain Adaptation
  • 18.
    18 Motivation Classical Supervised LearningSetting Find where Training and testing samples from same distribution !!
  • 19.
    19 Classifying a Dogand a Cat Training Samples Testing Samples Training and testing distribution different!! Domain Adaptation Required!! Motivation
  • 20.
    20 Domain Adaptation Methods Non-DeepMethods Deep Methods • Instance Re-weighting [Dai et al. ICML’07]. • Parameter Adaptation [Bruzzone et al. TPAMI’10]. • Feature Transformation [Fernando et al. ICCV’13] [Sun et al. AAAI’16]. • Discrepancy Based. [Long et al. ICML’15] [Sun et al. ECCV’16] • Adversarial Based. [Ganin et al. JMLR’16] [Tzeng et al. CVPR’17] Motivation
  • 21.
    21 Motivation Discrepancy Based Methods Mostlyglobal metrics. Minimizes statistics of data like covariance [Sun et al. ECCV’16] or maximum mean discrepancy [Long et al. ICML’15]. Local Method Optimal Transport [Courty et al. TPAMI’17]. Basically point-point matching. Using first order information might be misleading. Higher order method Uses structural information. Relation between data is used to match domains.
  • 22.
    22 Representing Correspondences Correspondence Matrix 1 0 MatchingMatrix Continuous Relaxation Introduction First-order Matching Second-order Matching Continuous Relaxation Continuous Relaxation
  • 23.
    23 Qualitative Comparison Methods 1storder Matching 2nd order matching 3rd order Matching Representation Learning Method 1 [Das & Lee, EAAI’18] Method 2 [Das & Lee, ICIP’18] Method 3 [Das & Lee, ICANN’18] Yes No No No No Yes Yes Yes Yes YesYesYes Introduction • Each method has an unique optimization. • Method 3 has additional training stage. - Method 1 (Graph Matching) - Method 2 (Hyper-graph Matching) - Method 3 (Graph Matching & Representation Learning)
  • 24.
    24 Proposed Approach Construct graphs fromsource & target samples Find Matching between sample graphs Map source domain to target domain Method 1 Debasmit Das and C.S. George Lee, “Sample-to-Sample Correspondence for Unsupervised Domain Adaptation," Engineering Applications of Artificial Intelligence, Vol. 73, pp. 80-91, May 2018. For details refer : (Graph Matching) First-order Matching Second-order Matching Class Regularization Optimization : Conditional Gradient + Network Simplex
  • 25.
    27 Real Data :SURF Features Method 1 Comparison with previous work CalTech (C) MNIST (M) USPS (U) Amazon (A) DSLR (D) Webcam (W) Deep FeaturesSURF Features
  • 26.
    28 Proposed Approach Find Exemplarsfrom both Domains Find Matching between exemplar hyper-graphs Map source domain to target domain Method 2 Debasmit Das and C.S. George Lee, “Unsupervised Domain Adaptation Using Regularized Hyper-Graph Matching,” Proceedings of 2018 IEEE International Conference on Image Processing (ICIP), Athens, Greece, pp. 3758-3762, October 7-10, 2018. For details refer : (Hyper-graph Matching) Optimization : Conditional Gradient + ADMM Affinity Propagation
  • 27.
  • 28.
    32 Proposed Approach Construct graphs fromsource & target representation Method 3 Find matching between source & target representation Optimize the shared domain representation and classifier Stage 1 Result Select unlabeled target samples with confident outputs Apply novel large margin loss on these samples Optimize the shared domain representation and classifier Stage 2 DOMAIN DISCREPNACY REDUCED LARGE MARGIN CLASSIFIER Debasmit Das and CS George Lee. “Graph Matching and Pseudo-Label Guided Deep Unsupervised Domain Adaptation,” Proceedings of 2018 International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece, pp. 342-352, October 4-7, 2018. For details refer : (Graph Matching & Representation Learning) Optimization : Stochastic Gradient Descent
  • 29.
    33 Overall architecture Stage 1training Stage 2 training Method 3
  • 30.
    35 Comparison with previouswork MMD [JMLR’12], DANN [JMLR’16], CORAL [ECCV’16], WDGRL [AAAI’18] (Maximum Mean Discrepancy) (Domain Adversarial) (Correlation alignment) (Wasserstein Distance) Method 3
  • 31.
    36 Additional Results Method3 Without Adaptation With Graph Matching With Graph Matching & Pseudo-labeling
  • 32.
    37 Conclusions Recognition Performance Computational Efficiency Method1 Method 1 Method 2 Method 2 Method 3 Method 3 • Proposed three methods on Unsupervised Domain Adaptation. • Use graph/hyper-graph matching to minimize domain discrepancy. • Competitive results on standard DA datasets for image recognition. Summary Impact • Localized and structure based matching for data distributions. Extensible to time series as well. • Beyond DA to any method requiring distribution matching. • Generative Modelling – Use GM loss instead of KL divergence for GANs. • Anomaly Detection – Samples with higher GM losses are anomalies.
  • 33.
    38 Click to editMaster title style Small Sample Learning
  • 34.
    39 Small Sample Learning(SSL) Introduction DA SSL • Same Task but different domain. • Same Domain but different task. • Source Task and Target task have same set of categories. • Source Task and Target task have different set of categories. • Source domain has abundant data but target domain has few/zero labelled data. • Source domain has abundant data but target domain has few/zero labelled data. • Distribution discrepancy less between source and target task. • Distribution discrepancy more between source and target task.
  • 35.
    40 Few Shot Learning (FSL)
    [Figure: Feature Space]
    • Base categories (source domain) contain abundant labelled data.
    • Novel categories (target domain) contain few labelled data.
    • Need to extract useful knowledge from the source domain.
    • Apply that knowledge to recognize novel categories.
  • 36.
    41 Related Work of FSL
    Few-shot Learning:
    • Metric Learning [Learn a metric]: Matching Net [Vinyals et al. NIPS’16], Proto Net [Snell et al. NIPS’17]
    • Meta Learning [Learn to learn]: LSTM Optimization [Ravi et al. ICLR’17], Model Agnostic [Finn et al. ICML’17]
    • Generative approaches [Generate data]: GAN Hallucination [Wang et al. CVPR’18], Autoencoder [Schwartz et al. NIPS’18]
    • Alternative approaches: Model Regression [Wang et al. ECCV’16], Memory Augmented [Santoro et al. ICML’16]
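To make the metric-learning family above concrete, here is a minimal sketch of nearest-prototype classification in the spirit of Proto Net. It assumes the features are already embedded (no network is trained here) and uses squared Euclidean distance. The function names and toy data are hypothetical and do not come from the slides.

```python
import numpy as np

def class_prototypes(support_x, support_y):
    """Average the few labelled support features of each novel class."""
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    return classes, protos

def nearest_prototype(query_x, classes, protos):
    """Assign each query to the class whose prototype is closest."""
    # Squared Euclidean distance between every query and every prototype.
    d2 = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]

# Two novel classes with two labelled samples each (a 2-way 2-shot toy task).
support_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
support_y = np.array([0, 0, 1, 1])
query_x = np.array([[0.2, 0.4], [5.0, 6.0]])

classes, protos = class_prototypes(support_x, support_y)
pred = nearest_prototype(query_x, classes, protos)
print(pred)  # [0 1]
```

With only a few samples per class, the averaged prototype can sit far from the true class mean, which is why prototype estimation is fragile in the few-shot regime.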
  • 37.
    42 Challenges of FSL
    • Curse of dimensionality: sparser feature space with increasing dimensionality.
    • Uncertain class variance: given the class mean location, the class variance σ is unknown.
    • Ill sampling of data: given a novel data sample, the prototype location is unknown.
  • 38.
    43 Proposed Solution (FSL)
    • Find a discriminative low-dimensional space.
    • Estimate class variance from class mean location.
    • Learn a category-agnostic transformation. Use relative distances.
    [Figure labels: given novel data sample, unknown prototype location; given class mean location, unknown class variance σ; transformation]
  • 39.
  • 40.
    45 Preliminary Results (FSL)
    Datasets: MiniImageNet [Vinyals et al. NIPS’16], Omniglot [Lake et al. CogSci’11]
    [Tables: Comparison with previous work on Omniglot; comparison with previous work on MiniImageNet]
  • 41.
    46 Zero Shot Learning (ZSL)
    [Figure: Feature Space and Semantic Space]
    • Base categories (source domain) contain abundant labelled data.
    • Novel categories (target domain) contain unlabeled data.
    • However, class-level semantic information is available for all categories.
    • Need to relate the feature space and the semantic space.
  • 42.
    47 Related Work of ZSL
    Zero-shot Learning:
    • Embedding Methods [Relate feature & semantics]: Linear embedding [Bernardino et al. ICML’15], Deep Embedding [Zhang et al. CVPR’17]
    • Transductive approaches [Use unlabeled test data]: Multiview [Fu et al. TPAMI’15], Dictionary Learning [Kodirov et al. ICCV’15]
    • Generative approaches [Generate data]: Constrained VAE [Verma et al. CVPR’18], Feature GAN [Xian et al. CVPR’18]
    • Hybrid approaches [Novel class from old class]: Semantic Similarity [Zhang et al. CVPR’15], Convex Combo [Norouzi et al. ICLR’13]
  • 43.
    48 Challenges of ZSL
    Seen Class Biasedness:
    • In the GZSL setting, test data can be from both seen and unseen categories.
    • Most unseen test data are predicted as seen categories.
    • Initially studied by Chao et al. ECCV’16.
    Domain Shift:
    • Domain shift between unseen test data and unseen semantic embeddings.
    • Arises because unseen test data are not used in training.
    Hubness:
    • Phenomenon where only a few candidates become nearest neighbor predictions.
    • Due to the curse of dimensionality.
    • Initially studied by Radovanovic et al. JMLR’10.
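Hubness can be made measurable by counting how often each candidate (for example, a class embedding) ends up among the nearest neighbors of the queries. The sketch below is a simple illustration under that assumption, using squared Euclidean distance. The function name `k_occurrence` and the toy points are hypothetical.

```python
import numpy as np

def k_occurrence(queries, candidates, k=1):
    """Count how many times each candidate is among the k nearest
    neighbors of a query. A few very large counts indicate hubs."""
    d2 = ((queries[:, None, :] - candidates[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]   # k closest candidates per query
    return np.bincount(nearest.ravel(), minlength=len(candidates))

candidates = np.array([[0.0, 0.0], [10.0, 10.0], [1.0, 1.0]])
queries = np.array([[0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [0.0, 0.3]])
counts = k_occurrence(queries, candidates, k=1)
print(counts)  # [3 0 1]
```

In this toy case candidate 0 attracts three of the four queries and candidate 1 attracts none, which is the skewed pattern the slide describes.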
  • 44.
    49 Proposed Solution (ZSL)
    Domain Adaptation:
    • Need to adapt semantic embeddings to unseen test data.
    • Use previous DA approach [Das & Lee EAAI’18].
    • Find correspondences between semantic embeddings and unseen test samples.
    Calibration:
    • Scaled calibration to reduce scores of seen classes.
    • Implicit reduction of variance of seen classes.
    One-to-one and pair-wise regression:
    • Structural matching between semantics and features.
    • Implicit reduction of dimensionality.
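As a rough illustration of the calibration idea, the sketch below scales down the scores of seen classes by a factor before taking the argmax. It assumes non-negative compatibility scores and a single multiplicative factor `gamma` in (0, 1]. Both the function name `calibrate_scores` and the value of `gamma` are hypothetical, and the exact calibration used in the presented work is not specified in the slides.

```python
import numpy as np

def calibrate_scores(scores, seen_mask, gamma=0.8):
    """Multiply seen-class scores by gamma to counter the bias
    toward seen classes in the generalized zero-shot setting.

    scores: (n_samples, n_classes) non-negative class scores.
    seen_mask: boolean array of length n_classes, True for seen classes.
    """
    out = np.array(scores, dtype=float)   # copy so the input is untouched
    out[:, seen_mask] *= gamma
    return out

# One test sample; classes 0 and 1 are seen, class 2 is unseen.
scores = np.array([[0.50, 0.40, 0.45]])
seen_mask = np.array([True, True, False])

before = scores.argmax(axis=1)
after = calibrate_scores(scores, seen_mask, gamma=0.8).argmax(axis=1)
print(before, after)  # [0] [2]
```

In practice the factor would be chosen on held-out data, since too small a value pushes seen-class samples into unseen classes.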
  • 45.
  • 46.
    51 Preliminary Results (ZSL)
    Datasets:
    • Animals with Attributes (AwA2) [Lampert et al. TPAMI’14]
    • Pascal & Yahoo (aPY) [Farhadi et al. CVPR’09]
    • Caltech-UCSD Birds (CUB) [Welinder et al. ’10]
    • Scene Understanding (SUN) [Patterson et al. CVPR’12]
    [Table: Comparison with previous work on four datasets]
  • 47.
    52 Hypothesis Transfer Learning (HTL)
    [Figure: Feature Space]
    • No access to base categories (source domain) data.
    • Only high-level information about source categories is available, e.g. model parameters, class prototypes, etc.
    • Novel categories (target domain) contain unlabeled data.
    • Need to relate the source domain models and target domain samples.
  • 48.
    53 Related Work of HTL
    • Linear Combination [Linear Models]: [Orabona et al. ICRA’09, Tommasi et al. TPAMI’14]
    • Kernel Method [Non-linear Models]: [Jie et al. ICCV’11]
    • Feature Selection [Greedy Method]: [Kuzborskij et al. CVIU’17]
    Relatively unexplored topic. Previous works constrain target models to be some combination of source models.
  • 49.
    54 Proposed Direction (HTL)
    • Previous works only consider a constant contribution of source models across the target domain.
    • Need to consider variable contribution of source models across the target domain.
    • Need to find model-to-sample correspondences similar to sample-to-sample correspondences.
    • Need to constrain correspondences to obtain variable solutions, e.g. sparser correspondences to ensure redundant contribution of source models on the same target sample or to prevent negative transfer.
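The difference between a constant and a variable contribution of source models can be sketched as follows. The sketch assumes each of K source models has already produced class scores for every target sample. In the constant case one weight per model is shared by all samples. In the variable case a model-to-sample correspondence matrix gives each sample its own weights. All names are hypothetical, and how the correspondence matrix would be learned is not covered here.

```python
import numpy as np

def combine_constant(source_scores, beta):
    """One weight per source model, shared by all target samples.
    source_scores: (K, n, C) scores from K models; beta: (K,) weights."""
    return np.tensordot(beta, source_scores, axes=1)        # -> (n, C)

def combine_per_sample(source_scores, corr):
    """Sample-specific weights. corr: (n, K) model-to-sample
    correspondences; each row weights the K models for one sample."""
    return np.einsum('nk,knc->nc', corr, source_scores)     # -> (n, C)

# Two source models, two target samples, two classes.
source_scores = np.array([[[1.0, 0.0], [1.0, 0.0]],    # model 0 always votes class 0
                          [[0.0, 1.0], [0.0, 1.0]]])   # model 1 always votes class 1
beta = np.array([0.5, 0.5])
corr = np.array([[1.0, 0.0],    # sample 0 relies only on model 0
                 [0.0, 1.0]])   # sample 1 relies only on model 1

constant = combine_constant(source_scores, beta)
variable = combine_per_sample(source_scores, corr)
print(constant.argmax(axis=1), variable.argmax(axis=1))
```

With constant weights both samples receive the same tied scores, whereas the sparse per-sample correspondences let each sample follow the source model that suits it.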
  • 50.
    55 Conclusion
    • Justified the importance of transfer learning for machine learning and real-world applications.
    • Discussed three methods for unsupervised domain adaptation, which produced competitive results with respect to previous methods.
    • Described ongoing work on few/zero shot learning with some preliminary results. More analyses of the methods are required.
    • Proposed future work, which would consist of small sample learning in a more realistic scenario.
    • Insight: Common theme of using structure, relations and matching in all the methods.
  • 51.