Translated Learning:
Transfer Learning across Different Feature Spaces
Wenyuan Dai, Yuqiang Chen, Gui-Rong Xue, Qiang Yang, and Yong Yu.
In Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS 2008)
Definition
 Transfer learning across different feature spaces
 Applies when labeled data are scarcer in the target feature space than in other feature spaces.
E.g. Web data (text documents → images),
cross-language classification (English → Bangla, or other languages)
Human Learning Example
 Task: tyrannosaurus vs. stegosaurus
- tyrannosaurus: bipedal carnivore with a massive skull balanced by a long, heavy tail. Its forelimbs were small and retained only two digits.
- stegosaurus: quadruped ornithischian dinosaur with four long bony spikes on a flexible tail and two rows of upright triangular bony plates running along the back.
Model-level Translation
[Diagram: two learning pipelines (Input → Learning → Output), with a source-space description ("Elephants are big mammals on earth...") linked to a target-space description ("massive hoofed mammal of Africa...") by translating learning models]
 Make the best use of available data that have features of both the source and target domains
Model-level Translation
A language model links the class labels to the features in the source spaces, which is translated to the features in the target spaces. It is completed by tracing back to the instances in the target spaces.
[Diagram: model-level path 𝑐 → 𝑦𝑠 → 𝑦𝑡 → 𝑥𝑡]
Feature-level translator
[Diagram: source-space path 𝑐 → 𝑦𝑠 → 𝑥𝑠 translated to target-space path 𝑐 → 𝑦𝑡 → 𝑥𝑡, linking features in the source space to features in the target space]
Translated Learning
 Classify the instances 𝑥𝑢 as accurately as possible using the labeled training data ℒ = ℒ𝑠 ∪ ℒ𝑡 and the translator 𝜙.
[Diagram: labeled source-space data ℒ𝑠 (text such as "Elephants are big mammals on earth...") and labeled target-space data ℒ𝑡 ("massive hoofed mammal of Africa...") connected by the feature translator 𝑝(𝑦𝑡|𝑦𝑠) ∝ 𝜙(𝑦𝑡, 𝑦𝑠) to the unlabeled target-space instances 𝒰; each source instance is a feature vector 𝑥𝑠 = (𝑦𝑠¹, …, 𝑦𝑠ⁿˢ)]
Risk Minimization Framework
 Risk function
measures the risk of classifying 𝑥𝑡 into the category 𝑐.
 Loss function
loss with respect to the event that 𝜃𝐶 and 𝜃𝑋𝑡 are relevant, i.e. that the label of 𝑥𝑡 is 𝑐.
𝜃𝐶, 𝜃𝑋𝑡: models with respect to 𝐶 and 𝑋𝑡, ranging over all possible models
Δ: a distance function between the two models 𝜃𝐶 and 𝜃𝑋𝑡
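The risk and loss functions on this slide were shown as images. A plausible reconstruction, following the risk-minimization style of framework that TLRisk builds on (the paper's exact notation may differ), is:

```latex
% Risk of classifying x_t into category c: expected loss over all
% possible class models \theta_C and instance models \theta_{X_t}
R(c, x_t) = \int_{\Theta_C} \int_{\Theta_{X_t}}
    L(\theta_C, \theta_{X_t})\,
    p(\theta_C \mid c)\, p(\theta_{X_t} \mid x_t)\,
    \mathrm{d}\theta_C \,\mathrm{d}\theta_{X_t},
\qquad
L(\theta_C, \theta_{X_t}) = \Delta(\theta_C, \theta_{X_t})
```

Here Δ is the distance (dissimilarity) between a class model and an instance model; the concrete choices evaluated later are KL, NCOS, and NPCC.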
Model Estimation
Approximate the risk function by evaluating the loss at the estimated models.
Assume there is no prior difference among all the classes.
This prior balances the influence of different classes in the class-imbalance case.
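The approximated risk function (shown as an image on this slide) can plausibly be reconstructed as:

```latex
% Approximate the integrals by the estimated models \theta_c and \theta_{x_t};
% p(c) is the class prior, assumed uniform when there is no prior
% difference among the classes
R(c, x_t) \;\approx\; \Delta(\theta_c, \theta_{x_t})\, p(c)
```

Keeping the prior 𝑝(𝑐) in the formula is what balances the influence of different classes in the class-imbalance case.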
Translated Learning via Risk Minimization
Training phase
For each 𝑐 ∈ 𝒞: estimate the class model 𝜃𝑐
Test phase
For each 𝑥𝑡 ∈ 𝒰: estimate the instance model 𝜃𝑥𝑡 and predict the label minimizing the risk
[Diagram: source-space labeled data and target-space labeled data feeding the chain 𝑐 → 𝑦𝑠 → 𝑦𝑡 → 𝑥𝑡, yielding the models 𝜃𝑐 and 𝜃𝑥𝑡]
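The training/test loop above can be sketched in Python. The helper names (`estimate_class_model`, `estimate_instance_model`, `dissimilarity`, `prior`) are hypothetical stand-ins for the paper's estimation steps, not its API:

```python
def tlrisk_train(classes, estimate_class_model):
    """Training phase: for each class c in C, estimate its model theta_c
    (in TLRisk this goes through the feature translator)."""
    return {c: estimate_class_model(c) for c in classes}

def tlrisk_predict(x_t, class_models, estimate_instance_model,
                   dissimilarity, prior):
    """Test phase: estimate theta_{x_t} for the target instance, then
    predict the label minimizing the approximate risk
    Delta(theta_c, theta_{x_t}) * p(c)."""
    theta_x = estimate_instance_model(x_t)
    return min(class_models,
               key=lambda c: dissimilarity(class_models[c], theta_x) * prior(c))
```

With a uniform prior this reduces to nearest-model classification under the chosen dissimilarity.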
Feature Translator
[Diagram: 𝑥𝑠 — 𝑦𝑠 — 𝑦𝑡 — 𝑥𝑡, connecting the source space (text) to the target space (images)]
Co-occurrence data linking the two spaces:
- Instance level: search engine results in response to queries; web pages including text and pictures
- Feature level: social annotations on images
If we use instance-level co-occurrence data, [translator estimation shown as an equation image]
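When only co-occurrence observations are available, a feature-level translator still has to be estimated from them. A minimal sketch, assuming the translator is obtained by normalizing raw (source feature, target feature) co-occurrence counts (an illustrative assumption; the paper's estimator marginalizes over co-occurring instances more carefully):

```python
from collections import defaultdict

def estimate_translator(cooccurrences):
    """Estimate phi(y_t | y_s) from observed (y_s, y_t) feature pairs,
    e.g. a text word and an image feature found on the same web page.

    Returns translator[y_s][y_t] = p(y_t | y_s) by count normalization."""
    counts = defaultdict(lambda: defaultdict(float))
    for y_s, y_t in cooccurrences:
        counts[y_s][y_t] += 1.0
    translator = {}
    for y_s, row in counts.items():
        total = sum(row.values())
        translator[y_s] = {y_t: n / total for y_t, n in row.items()}
    return translator
```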
Evaluation
Text-aided Image Classification
 Images from Caltech-256, documents from ODP (Open Directory Project)
 Auxiliary data: binary labeled text documents
 Objective: image classification when co-occurrence data is insufficient
 Evaluated under three dissimilarity functions:
Kullback-Leibler divergence (KL), negative of the cosine function (NCOS), negative of the Pearson's correlation coefficient (NPCC)
 Co-occurrence data collected from an image search engine, 𝑝(𝑦𝑠, 𝑥𝑡)
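The three dissimilarity functions can be written out directly from their standard definitions (a plain sketch; the paper may smooth or normalize the models differently):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q); eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def neg_cosine(p, q):
    """NCOS: negative cosine similarity, so smaller means more similar."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return -dot / norm

def neg_pearson(p, q):
    """NPCC: negative Pearson correlation coefficient."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sq = math.sqrt(sum((b - mq) ** 2 for b in q))
    return -cov / (sp * sq)
```

All three return smaller values for more similar models, matching the argmin in the risk-minimization prediction rule.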
Evaluation
Text-aided Image Classification
[Results figure: classification performance as a function of the size of ℒ𝑡]
Evaluation
Text-aided Image Classification
[Results figure: the classification model relies more on the auxiliary text training data]
Evaluation
Cross-language Classification
 Dataset from ODP English/German pages
 English documents are used to help classify German documents
 Only 16 German labeled documents are available in each category
 Co-occurrence data: English-German dictionary, 𝑝(𝑦𝑡, 𝑦𝑠)
 NCOS is used as the dissimilarity function
 Assume that machine translation is unavailable, relying on the dictionary only
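A dictionary-based translator as described above might look like the following sketch (purely illustrative: spreading weight uniformly over a word's translations is an assumption, not the paper's estimator):

```python
def dictionary_translator(dictionary):
    """Build phi(y_t, y_s) from a bilingual dictionary mapping each
    English word y_s to its German translations y_t, spreading the
    probability mass uniformly over the listed translations."""
    phi = {}
    for y_s, translations in dictionary.items():
        for y_t in translations:
            phi[(y_t, y_s)] = 1.0 / len(translations)
    return phi
```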
Evaluation
Cross-language Classification
When 𝜆 is small, the performance of TLRisk is better and more stable (𝜆 = 2⁻⁴)
Conclusions
 A translated learning framework for classifying target data using data from another feature space.
 A bridge can be found to link the two spaces with only a little labeled data in the target space.
 The authors formulated the translated learning framework using risk minimization, with an approximation method for model estimation.
 Effectiveness was shown through two applications: text-aided image classification and cross-language classification.
Thank you !
Q&A
