A Survey on Transfer Learning
Sinno Jialin Pan
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Joint work with Prof. Qiang Yang
Transfer Learning? (DARPA 05)
Transfer Learning (TL): the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (in new domains).

It is motivated by human learning: people can often transfer knowledge learnt previously to novel situations, e.g.,
• Chess → Checkers
• Mathematics → Computer Science
• Table Tennis → Tennis
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Traditional ML vs. TL (P. Langley 06)

[Diagram: in traditional ML across multiple domains, each domain has its own training items and test items, learned independently; in transfer of learning across domains, training items from one domain also help with test items in another.]

Humans can learn in many domains. Humans can also transfer from one domain to other domains.
Traditional ML vs. TL

[Diagram: learning process of traditional ML vs. transfer learning. In traditional ML, each set of training items feeds its own separate learning system. In transfer learning, knowledge extracted from the source learning systems is passed to the target learning system along with the target training items.]
Notation
Domain: $\mathcal{D} = \{\mathcal{X}, P(X)\}$
It consists of two components: a feature space $\mathcal{X}$ and a marginal distribution $P(X)$, where $X \in \mathcal{X}$.

In general, if two domains are different, then they may have different feature spaces or different marginal distributions.

Task: $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$
Given a specific domain $\mathcal{D}$ and label space $\mathcal{Y}$, for each $x$ in the domain, predict its corresponding label $y = f(x)$.

In general, if two tasks are different, then they may have different label spaces or different conditional distributions $P(Y \mid X)$.
Notation
For simplicity, we only consider at most two domains and two tasks.

Source domain: $\mathcal{D}_S$
Task in the source domain: $\mathcal{T}_S$
Target domain: $\mathcal{D}_T$
Task in the target domain: $\mathcal{T}_T$
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Why Transfer Learning?
• In some domains, labeled data are in short supply.
• In some domains, the calibration effort is very expensive.
• In some domains, the learning process is time consuming.

How can we extract knowledge learnt from related domains to help learning in a target domain with only a few labeled data?
How can we extract knowledge learnt from related domains to speed up learning in a target domain?

Transfer learning techniques may help!
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Settings of Transfer Learning

| Transfer learning settings | Labeled data in a source domain | Labeled data in a target domain | Tasks |
|---|---|---|---|
| Inductive Transfer Learning | × | √ | Classification, Regression, … |
| Inductive Transfer Learning | √ | √ | Classification, Regression, … |
| Transductive Transfer Learning | √ | × | Classification, Regression, … |
| Unsupervised Transfer Learning | × | × | Clustering, … |
An overview of various settings of transfer learning

[Diagram: a taxonomy of transfer learning settings.
• Labeled data are available in a target domain → Inductive Transfer Learning.
  – Case 1: no labeled data in a source domain → Self-taught Learning.
  – Case 2: labeled data are available in a source domain; source and target tasks are learnt simultaneously → Multi-task Learning.
• Labeled data are available only in a source domain → Transductive Transfer Learning.
  – Assumption: different domains but a single task → Domain Adaptation.
  – Assumption: single domain and single task → Sample Selection Bias / Covariate Shift.
• No labeled data in both source and target domains → Unsupervised Transfer Learning.]
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Approaches to Transfer Learning

| Transfer learning approaches | Description |
|---|---|
| Instance-transfer | Re-weight some labeled data in a source domain for use in the target domain |
| Feature-representation-transfer | Find a "good" feature representation that reduces the difference between a source and a target domain or minimizes the error of models |
| Model-transfer | Discover shared parameters or priors of models between a source domain and a target domain |
| Relational-knowledge-transfer | Build a mapping of relational knowledge between a source domain and a target domain |
Approaches to Transfer Learning

| | Inductive Transfer Learning | Transductive Transfer Learning | Unsupervised Transfer Learning |
|---|---|---|---|
| Instance-transfer | √ | √ | |
| Feature-representation-transfer | √ | √ | √ |
| Model-transfer | √ | | |
| Relational-knowledge-transfer | √ | | |
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
  – Inductive Transfer Learning
  – Transductive Transfer Learning
  – Unsupervised Transfer Learning
Inductive Transfer Learning
--- Instance-transfer Approaches
• Assumption: the source domain and target domain data use exactly the same features and labels.
• Motivation: although the source domain data cannot be reused directly, some parts of the data can still be reused by re-weighting.
• Main Idea: discriminatively adjust the weights of source domain data for use in the target domain.
Inductive Transfer Learning
--- Instance-transfer Approaches
Non-standard SVMs [Wu and Dietterich ICML-04]

Start from uniform weights, then correct the decision boundary by re-weighting:

$$\min_{w} \; \underbrace{C_T \sum_{i \in \mathcal{D}_T} \ell(y_i, w \cdot x_i)}_{\text{loss on the target domain data}} \; + \; \underbrace{C_S \sum_{j \in \mathcal{D}_S} \ell(y_j, w \cdot x_j)}_{\text{loss on the source domain data}} \; + \; \underbrace{\|w\|^2}_{\text{regularization term}}$$

Setting $C_T > C_S$ differentiates the cost for misclassification of the target and source data.
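As a minimal sketch of this idea (not the authors' exact formulation), a standard SVM can differentiate source and target misclassification costs through per-instance weights; the synthetic data and the weight values 0.2/1.0 are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins: many labeled source examples, few labeled target
# examples whose class boundary is slightly shifted.
Xs = rng.normal(0.0, 1.0, (200, 2)); ys = (Xs[:, 0] > 0).astype(int)
Xt = rng.normal(0.3, 1.0, (20, 2));  yt = (Xt[:, 0] > 0.3).astype(int)

X = np.vstack([Xs, Xt])
y = np.concatenate([ys, yt])

# Down-weight source instances so that misclassifying target data costs more.
sample_weight = np.concatenate([
    np.full(len(ys), 0.2),   # source cost (illustrative value)
    np.full(len(yt), 1.0),   # target cost
])

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y, sample_weight=sample_weight)
```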
Inductive Transfer Learning
--- Instance-transfer Approaches
TrAdaBoost [Dai et al. ICML-07]

TrAdaBoost combines two weight-update rules from [Freund et al. 1997]:
• Hedge(β): decrease the weights of misclassified source domain data.
• AdaBoost: increase the weights of misclassified target domain data.

The whole training set is the union of the source domain labeled data and the target domain labeled data. Classifiers are trained on the re-weighted labeled data and then applied to the target domain unlabeled data.
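A compact sketch of the TrAdaBoost weight updates described above, for binary labels in {0, 1}; the base learner (a decision stump), the iteration count, and the final-vote threshold handling are assumptions of this sketch:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost(Xs, ys, Xt, yt, n_iters=20):
    """Sketch of TrAdaBoost [Dai et al. ICML-07]; labels in {0, 1}."""
    n, m = len(Xs), len(Xt)
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt]).astype(float)
    w = np.ones(n + m)                        # instance weights
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_iters))
    learners, betas = [], []
    for _ in range(n_iters):
        p = w / w.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
        err = np.abs(h.predict(X) - y)        # 0/1 error per instance
        # Weighted error measured on the *target* portion only.
        eps = (w[n:] * err[n:]).sum() / w[n:].sum()
        eps = min(max(eps, 1e-10), 0.499)     # keep beta_t well-defined
        beta_t = eps / (1.0 - eps)
        w[:n] *= beta_src ** err[:n]          # Hedge: shrink bad source weights
        w[n:] *= beta_t ** (-err[n:])         # AdaBoost: grow bad target weights
        learners.append(h); betas.append(beta_t)

    def predict(Xnew):
        # Weighted vote over the second half of the learners, as in the paper.
        half = len(learners) // 2
        score = sum(-np.log(b) * h.predict(Xnew)
                    for h, b in zip(learners[half:], betas[half:]))
        thresh = 0.5 * sum(-np.log(b) for b in betas[half:])
        return (score >= thresh).astype(int)
    return predict
```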
Inductive Transfer Learning
--- Feature-representation-transfer Approaches
Supervised Feature Construction [Argyriou et al. NIPS-06, NIPS-07]

Assumption: if t tasks are related to each other, then they may share some common features which can benefit all tasks.

Input: t tasks, each with its own training data.

Output: common features learnt across the t tasks, and t models, one per task.
Supervised Feature Construction [Argyriou et al. NIPS-06, NIPS-07]

$$\min_{A, U} \; \underbrace{\sum_{t=1}^{T} \sum_{i=1}^{m} L\big(y_{ti}, \langle a_t, U^{\top} x_{ti} \rangle\big)}_{\text{empirical error across the } t \text{ tasks}} \; + \; \underbrace{\gamma \, \|A\|_{2,1}^{2}}_{\text{regularization to make the representation sparse}}$$

$$\text{s.t. } U^{\top} U = I \quad \text{(orthogonal constraints)}$$

where $\|A\|_{2,1} = \sum_{j} \|a^{j}\|_{2}$ is the (2,1)-norm over the rows of the coefficient matrix $A = [a_1, \dots, a_T]$, which encourages the tasks to share a small set of common features.
Inductive Transfer Learning
--- Feature-representation-transfer Approaches
Unsupervised Feature Construction [Raina et al. ICML-07]

Three steps:
1. Apply a sparse coding algorithm [Lee et al. NIPS-07] to learn a higher-level representation from unlabeled data in the source domain.
2. Transform the target data into the new representation using the bases learnt in the first step.
3. Apply traditional discriminative models to the new representations of the target data with the corresponding labels.
Unsupervised Feature Construction [Raina et al. ICML-07]

Step 1: learn bases from the source domain,
$$\min_{a, b} \; \sum_{i} \Big\| x_S^{(i)} - \sum_{j} a_S^{(i,j)} \, b_j \Big\|_2^2 + \beta \, \big\|a_S^{(i)}\big\|_1 \quad \text{s.t. } \|b_j\|_2 \le 1.$$
Input: source domain data $X_S = \{x_S^{(i)}\}$ and coefficient $\beta$.
Output: new representations of the source domain data $A_S = \{a_S^{(i)}\}$ and new bases $B = \{b_j\}$.

Step 2: encode the target domain data against the fixed bases,
$$a_T^{(i)} = \arg\min_{a} \; \Big\| x_T^{(i)} - \sum_{j} a^{(j)} b_j \Big\|_2^2 + \beta \, \|a\|_1.$$
Input: target domain data $X_T = \{x_T^{(i)}\}$, coefficient $\beta$, and bases $B = \{b_j\}$.
Output: new representations of the target domain data $A_T = \{a_T^{(i)}\}$.
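These two steps map directly onto scikit-learn's dictionary-learning tools; a hedged sketch, where the component count, penalty values, and synthetic data are illustrative assumptions rather than values from the paper:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, SparseCoder

rng = np.random.default_rng(0)
X_src = rng.normal(size=(500, 64))   # stand-in for unlabeled source data
X_tgt = rng.normal(size=(50, 64))    # stand-in for target data

# Step 1: learn bases B (dictionary atoms) and sparse activations on the source.
dict_learner = DictionaryLearning(n_components=32, alpha=1.0, max_iter=100,
                                  transform_algorithm="lasso_lars",
                                  random_state=0)
A_src = dict_learner.fit_transform(X_src)   # a_S: new source representation
B = dict_learner.components_                # b_j: learnt bases

# Step 2: encode target data against the *fixed* bases from Step 1.
coder = SparseCoder(dictionary=B, transform_algorithm="lasso_lars",
                    transform_alpha=1.0)
A_tgt = coder.transform(X_tgt)              # a_T: new target representation

# Step 3: train any discriminative model on (A_tgt, target labels).
```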
Inductive Transfer Learning
--- Model-transfer Approaches
Regularization-based Method [Evgeniou and Pontil, KDD-04]

Assumption: if t tasks are related to each other, then they may share some parameters among their individual models.

Assume $f_t(x) = w_t \cdot x$ is a hyper-plane for task $t$, where $t \in \{T, S\}$ and
$$w_t = w_0 + v_t,$$
with $w_0$ the common part shared by all tasks and $v_t$ the specific part for the individual task.

Encoding this into SVMs yields regularization terms for multiple tasks:
$$\min_{w_0, v_t, \xi} \; \sum_{t} \sum_{i} \xi_{ti} + \frac{\lambda_1}{T} \sum_{t} \|v_t\|^2 + \lambda_2 \|w_0\|^2 \quad \text{s.t. } y_{ti} \, (w_0 + v_t) \cdot x_{ti} \ge 1 - \xi_{ti}, \; \xi_{ti} \ge 0.$$
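A small numpy sketch of this shared/specific decomposition, minimizing the hinge-loss objective above by subgradient descent; the learning rate, epoch count, and regularization values are assumptions of the sketch:

```python
import numpy as np

def multitask_svm(tasks, lam1=1.0, lam2=0.1, lr=0.01, epochs=500):
    """Subgradient descent on the shared/specific SVM objective above.

    tasks: list of (X, y) pairs with labels y in {-1, +1}; w_t = w_0 + v_t.
    """
    d = tasks[0][0].shape[1]
    T = len(tasks)
    w0 = np.zeros(d)        # common part shared by all tasks
    V = np.zeros((T, d))    # specific part for each task
    for _ in range(epochs):
        g0 = 2.0 * lam2 * w0
        gV = 2.0 * (lam1 / T) * V
        for t, (X, y) in enumerate(tasks):
            margins = y * (X @ (w0 + V[t]))
            active = margins < 1.0           # hinge-loss subgradient support
            g = -(y[active, None] * X[active]).sum(axis=0)
            g0 += g
            gV[t] += g
        w0 -= lr * g0
        V -= lr * gV
    return w0, V  # predictor for task t: sign(x @ (w0 + V[t]))
```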
Inductive Transfer Learning
--- Relational-knowledge-transfer Approaches
TAMAR [Mihalkova et al. AAAI-07]

Assumption: if the target domain and source domain are related, then there may be relationships between the domains that are similar, which can be used for transfer learning.

Input:
1. Relational data in the source domain and a statistical relational model, a Markov Logic Network (MLN), which has been learnt in the source domain.
2. Relational data in the target domain.

Output: a new statistical relational model, an MLN, in the target domain.

Goal: to learn an MLN in the target domain more efficiently and effectively.
TAMAR [Mihalkova et al. AAAI-07]
Two Stages:
1. Predicate Mapping
   – Establish the mapping between predicates in the source and target domains. Once a mapping is established, clauses from the source domain can be translated into the target domain.
2. Revising the Mapped Structure
   – The clauses mapped directly from the source domain may not be completely accurate and may need to be revised, augmented, and re-weighted in order to properly model the target data.
TAMAR [Mihalkova et al. AAAI-07]

[Diagram: mapping from the source (academic) domain to the target (movie) domain.
• AdvisedBy(Student B, Professor A) ↔ WorkedFor(Actor A, Director B)
• Publication(Professor A, Paper T) ↔ MovieMember(Director B, Movie M)
• Publication(Student B, Paper T) ↔ MovieMember(Actor A, Movie M)
After mapping, the translated clauses are revised against the target data.]
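To make the first stage concrete, the mapping in the example above can be written down as a simple lookup table; this is purely illustrative of predicate mapping, not TAMAR's actual mapping-search procedure:

```python
# Hypothetical predicate mapping from the academic domain to the movie domain,
# mirroring the example above (TAMAR's first stage).
predicate_map = {
    "AdvisedBy(Student, Professor)": "WorkedFor(Actor, Director)",
    "Publication(Professor, Paper)": "MovieMember(Director, Movie)",
    "Publication(Student, Paper)":   "MovieMember(Actor, Movie)",
}

def translate_clause(clause: str) -> str:
    """Translate a source-domain clause literal-by-literal (stage 1).

    Stage 2 would then revise, augment, and re-weight the translated
    clauses against the target-domain data.
    """
    for src, tgt in predicate_map.items():
        clause = clause.replace(src, tgt)
    return clause
```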
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
  – Inductive Transfer Learning
  – Transductive Transfer Learning
  – Unsupervised Transfer Learning
Transductive Transfer Learning
--- Instance-transfer Approaches
Sample Selection Bias / Covariate Shift [Zadrozny ICML-04, Schwaighofer JSPI-00]

Input: a lot of labeled data in the source domain and no labeled data in the target domain.

Output: models for use on the target domain data.

Assumption: the source and target tasks are the same, and the two domains share the same feature space. In addition, $P(Y_S \mid X_S)$ and $P(Y_T \mid X_T)$ are the same, while $P(X_S)$ and $P(X_T)$ may differ owing to different sampling processes (training data vs. test data).

Main Idea: re-weight (importance sampling) the source domain data.
Sample Selection Bias / Covariate Shift
To correct sample selection bias, re-weight the source data by the density ratio:
$$\theta^{*} = \arg\min_{\theta} \; \sum_{(x_i, y_i) \in \mathcal{D}_S} \frac{P_T(x_i)}{P_S(x_i)} \, L(x_i, y_i, \theta),$$
where $\beta_i = P_T(x_i)/P_S(x_i)$ are the weights for the source domain data.

How to estimate $\beta_i$? One straightforward solution is to estimate $P(X_S)$ and $P(X_T)$ separately. However, estimating a density function is itself a hard problem.
Sample Selection Bias / Covariate Shift
Kernel Mean Matching (KMM) [Huang et al. NIPS 2006]

Main Idea: KMM estimates the weights $\beta_i$ directly instead of estimating density functions.

It can be proved that $\beta$ can be estimated by solving the following quadratic programming (QP) optimization problem, which matches the means of the training and test data in a reproducing kernel Hilbert space (RKHS):
$$\min_{\beta} \; \frac{1}{2} \beta^{\top} K \beta - \kappa^{\top} \beta \quad \text{s.t. } \beta_i \in [0, B], \; \Big| \sum_{i=1}^{n_S} \beta_i - n_S \Big| \le n_S \epsilon,$$
where $K_{ij} = k\big(x_S^{(i)}, x_S^{(j)}\big)$ and $\kappa_i = \frac{n_S}{n_T} \sum_{j=1}^{n_T} k\big(x_S^{(i)}, x_T^{(j)}\big)$.

Theoretical Support: Maximum Mean Discrepancy (MMD) [Borgwardt et al. BIOINFORMATICS-06]. The distance between two distributions can be measured by the Euclidean distance between their mean vectors in an RKHS.
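A hedged sketch of the KMM QP above using scipy's SLSQP solver; the RBF bandwidth, bound B, and slack ε are illustrative choices, and a dedicated QP solver would be more robust at scale:

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, gamma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kmm_weights(Xs, Xt, gamma=1.0, B=10.0, eps=None):
    """Estimate beta_i = P_T(x_i)/P_S(x_i) by matching kernel means (KMM)."""
    ns, nt = len(Xs), len(Xt)
    eps = B / np.sqrt(ns) if eps is None else eps
    K = rbf_kernel(Xs, Xs, gamma)                         # quadratic term
    kappa = (ns / nt) * rbf_kernel(Xs, Xt, gamma).sum(1)  # linear term
    obj = lambda b: 0.5 * b @ K @ b - kappa @ b
    grad = lambda b: K @ b - kappa
    cons = (
        {"type": "ineq", "fun": lambda b: ns * (1 + eps) - b.sum()},
        {"type": "ineq", "fun": lambda b: b.sum() - ns * (1 - eps)},
    )
    res = minimize(obj, np.ones(ns), jac=grad, method="SLSQP",
                   bounds=[(0.0, B)] * ns, constraints=cons)
    return res.x  # use as sample_weight when training on the source data
```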
Transductive Transfer Learning
--- Feature-representation-transfer Approaches
Domain Adaptation [Blitzer et al. EMNLP-06, Ben-David et al. NIPS-07, Daume III ACL-07]

Assumption: a single task across domains, which means $P(Y_S \mid X_S)$ and $P(Y_T \mid X_T)$ are the same, while $P(X_S)$ and $P(X_T)$ may differ because of different feature representations across domains.

Main Idea: find a "good" feature representation that reduces the "distance" between domains.

Input: a lot of labeled data in the source domain and only unlabeled data in the target domain.

Output: a common representation of the source domain and target domain data, and a model on the new representation for use in the target domain.
Domain Adaptation
Structural Correspondence Learning (SCL) [Blitzer et al. EMNLP-06, Blitzer et al. ACL-07, Ando and Zhang JMLR-05]

Motivation: if two domains are related to each other, then there may exist some "pivot" features across both domains. Pivot features are features that behave in the same way for discriminative learning in both domains.

Main Idea: identify correspondences among features from different domains by modeling their correlations with the pivot features. Non-pivot features from different domains that are correlated with many of the same pivot features are assumed to correspond, and they are treated similarly in a discriminative learner.
SCL [Blitzer et al. EMNLP-06, Blitzer et al. ACL-07, Ando and Zhang JMLR-05]

a) Heuristically choose m pivot features (task specific).
b) Transform each vector of pivot features into a vector of binary values, creating m corresponding prediction problems.
c) Learn the parameters of each prediction problem.
d) Do an eigen-decomposition of the matrix of parameters and learn the linear mapping function.
e) Use the learnt mapping function to construct new features and train classifiers on the new representations.
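A compact sketch of these steps; the pivot-prediction loss, the number of shared dimensions h, and the data layout are assumptions of the sketch (Blitzer et al. use a modified Huber loss and domain-specific pivot-selection heuristics):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def scl_mapping(X_src, X_tgt, pivot_idx, h=25, seed=0):
    """Learn SCL's linear mapping theta from unlabeled source/target data.

    X_src, X_tgt: (n, d) count/binary feature matrices; pivot_idx: indices
    of m pivot features assumed to be frequent in both domains.
    """
    X = np.vstack([X_src, X_tgt]).astype(float)
    nonpivot = np.setdiff1d(np.arange(X.shape[1]), pivot_idx)
    W = np.zeros((len(nonpivot), len(pivot_idx)))
    for j, p in enumerate(pivot_idx):
        y = (X[:, p] > 0).astype(int)      # binary pivot-prediction problem
        clf = SGDClassifier(loss="modified_huber", alpha=1e-4,
                            random_state=seed).fit(X[:, nonpivot], y)
        W[:, j] = clf.coef_.ravel()        # parameters of this problem
    U, _, _ = np.linalg.svd(W, full_matrices=False)  # eigen/SVD step
    theta = U[:, :h].T                     # linear mapping to h shared dims
    # New representation: original features augmented with theta @ x[nonpivot].
    return theta, nonpivot
```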
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
  – Inductive Transfer Learning
  – Transductive Transfer Learning
  – Unsupervised Transfer Learning
Unsupervised Transfer Learning
--- Feature-representation-transfer Approaches
Self-taught Clustering (STC) [Dai et al. ICML-08]

Input: a lot of unlabeled data in a source domain and a few unlabeled data in a target domain.

Goal: cluster the target domain data.

Assumption: the source domain and target domain data share some common features, which can help clustering in the target domain.

Main Idea: extend the information-theoretic co-clustering algorithm [Dhillon et al. KDD-03] to transfer learning.
Self-taught Clustering (STC) [Dai et al. ICML-08]

[Diagram: the source domain data and target domain data are co-clustered jointly through a shared clustering of their common features.]

The objective function to be minimized couples co-clustering in the target domain with co-clustering in the source domain:
$$J(\tilde{X}_T, \tilde{X}_S, \tilde{Z}) = \underbrace{I(X_T; Z) - I(\tilde{X}_T; \tilde{Z})}_{\text{co-clustering in the target domain}} \; + \; \lambda \, \underbrace{\big[ I(X_S; Z) - I(\tilde{X}_S; \tilde{Z}) \big]}_{\text{co-clustering in the source domain}},$$
where $Z$ denotes the common features, $\tilde{X}_T$, $\tilde{X}_S$, and $\tilde{Z}$ are the cluster functions, and $I(\cdot\,;\cdot)$ is mutual information. The output is the clustering $\tilde{X}_T$ of the target domain data.
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Negative Transfer
• Most approaches to transfer learning assume that transferring knowledge across domains is always positive.
• However, in some cases, when two tasks are too dissimilar, brute-force transfer may even hurt performance on the target task; this is called negative transfer [Rosenstein et al. NIPS-05 Workshop].
• Some researchers have studied how to measure relatedness among tasks [Ben-David and Schuller NIPS-03, Bakker and Heskes JMLR-03].
• How to design a mechanism to avoid negative transfer still needs to be studied theoretically.
Outline
• Traditional Machine Learning vs. Transfer Learning
• Why Transfer Learning?
• Settings of Transfer Learning
• Approaches to Transfer Learning
• Negative Transfer
• Conclusion
Conclusion

| | Inductive Transfer Learning | Transductive Transfer Learning | Unsupervised Transfer Learning |
|---|---|---|---|
| Instance-transfer | √ | √ | |
| Feature-representation-transfer | √ | √ | √ |
| Model-transfer | √ | | |
| Relational-knowledge-transfer | √ | | |

How to avoid negative transfer needs to attract more attention!
