SlideShare a Scribd company logo
1 of 40
Transfer Learning
Transfer Learning
Dog/Cat
Classifier
cat dog
Data not directly related to the task considered
elephant tiger
Similar domain, different tasks Different domains, same task
http://weebly110810.weebly.com/3
96403913129399.html
http://www.sucaitianxia.com/png/c
artoon/200811/4261.html
dog cat
Why?
Task Considered
Image
Recognition
Data not directly related
Speech
Recognition
Specific
domain
Taiwanese
English
Chinese
Webpages
……
Medical
Images
http://www.bigr.nl/website/structure/main.php?page=resear
chlines&subpage=project&id=64
Text
Analysis
http://www.spear.com.hk/Translation-company-Directory.html
Transfer Learning
• Example in real life
爆漫王
責編
漫畫家
投稿 jump
畫分鏡
指導教授
研究生
投稿期刊
跑實驗
漫畫家
研究生
(word embedding knows that)
Transfer Learning - Overview
Target
Data
Source Data (not directly related to the task)
labelled
labelled
unlabeled
unlabeled
Model Fine-tuning
Warning: different terminology in
different literature
Model Fine-tuning
• Task description
• Source data: 𝑥𝑠, 𝑦𝑠
• Target data: 𝑥𝑡
, 𝑦𝑡
• Example: (supervised) speaker adaption
• Source data: audio data and transcriptions from many
speakers
• Target data: audio data and its transcriptions of specific
user
• Idea: training a model by source data, then fine-
tune the model by target data
• Challenge: only limited target data, so be careful about
overfitting
Very little
A large amount
One-shot learning: only a few
examples in target domain
Conservative Training
Source data
(e.g. Audio data of
Many speakers)
Target data (e.g.
A little data from
target speaker)
Input layer
Output layer
initialization
Input layer
Output layer
output close
parameter close
Layer Transfer
Input layer
Output layer Copy some parameters
Target data
Source
data
1. Only train the rest layers (prevent
overfitting)
2. fine-tune the whole network (if
there is sufficient data)
Layer Transfer
• Which layer can be transferred (copied)?
• Speech: usually copy the last few layers
• Image: usually copy the first few layers
Layer 1 Layer 2 Layer L
1
x
2
x
……
……
……
……
……
elephant
……
N
x
Pixels
……
……
……
Layer Transfer - Image
Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, “How
transferable are features in deep neural networks?”, NIPS, 2014
Only train the
rest layers
fine-tune the
whole network
Source: 500 classes
from ImageNet
Target: another 500
classes from ImageNet
Layer Transfer - Image
Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, “How
transferable are features in deep neural networks?”, NIPS, 2014
Only train the
rest layers
Transfer Learning - Overview
Target
Data
Source Data (not directly related to the task)
labelled
labelled
unlabeled
unlabeled
Multitask Learning
Fine-tuning
Multitask Learning
• The multi-layer structure makes NN suitable for
multitask learning
Input
feature
Task A Task B
Task A Task B
Input feature
for task A
Input feature
for task B
Multitask Learning
- Multilingual Speech Recognition
acoustic features
states of
French
states of
German
Human languages
share some common
characteristics.
states of
Spanish
states of
Italian
states of
Mandarin
Similar idea in translation: Daxiang Dong, Hua Wu, Wei He, Dianhai Yu and
Haifeng Wang, "Multi-task learning for multiple language translation.“, ACL 2015
Multitask Learning - Multilingual
Huang, Jui-Ting, et al. "Cross-language knowledge transfer using multilingual
deep neural network with shared hidden layers." ICASSP, 2013
25
30
35
40
45
50
1 10 100 1000
Character
Error
Rate
Hours of training data for Mandarin
Mandarin
only
With
European
Language
Progressive Neural Networks
input
Task 1
input
Task 2
input
Task 3
Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert
Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia
Hadsell, “Progressive Neural Networks”, arXiv preprint 2016
Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori
Zwols, David Ha, Andrei A. Rusu, Alexander Pritzel, Daan Wierstra,
“PathNet: Evolution Channels Gradient Descent in Super Neural Networks”,
arXiv preprint, 2017
Transfer Learning - Overview
Target
Data
Source Data (not directly related to the task)
labelled
labelled
unlabeled
unlabeled
Multitask Learning
Fine-tuning
Domain-adversarial
training
Task description
• Source data: 𝑥𝑠, 𝑦𝑠
• Target data: 𝑥𝑡
Training data
Testing data
Same task,
mismatch
with label
without label
Domain-adversarial training
Domain-adversarial training
Similar to GAN
Too easy to feature
extractor ……
feature extractor
Domain classifier
Domain-adversarial training
Not only cheat the domain
classifier, but satisfying label
classifier at the same time
This is a big network, but different parts have different goals.
Maximize label
classification accuracy
Maximize domain
classification accuracy
Maximize label classification accuracy +
minimize domain classification accuracy
feature extractor
Domain classifier
Label predictor
Domain-adversarial training
Yaroslav Ganin, Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation,
ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand,
Domain-Adversarial Training of Neural Networks, JMLR, 2016
Domain classifier fails in the end
It should struggle ……
Domain-adversarial training
Yaroslav Ganin, Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation,
ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand,
Domain-Adversarial Training of Neural Networks, JMLR, 2016
Transfer Learning - Overview
Target
Data
Source Data (not directly related to the task)
labelled
labelled
unlabeled
unlabeled
Multitask Learning
Fine-tuning
Domain-adversarial
training
Zero-shot learning
Zero-shot Learning
• Source data: 𝑥𝑠, 𝑦𝑠
• Target data: 𝑥𝑡
Different
tasks
In speech recognition, we can not have all possible words
in the source (training) data.
Training data
Testing data
cat dog
How we solve this problem in speech recognition?
𝑥𝑠:
𝑦𝑠:
……
……
𝑥𝑡
:
http://evchk.wikia.com/wiki/%E8%8
D%89%E6%B3%A5%E9%A6%AC
Zero-shot Learning
• Representing each class by its attributes
furry 4 legs tail …
Dog O O O
Fish X X O
Chimp O X X
…
sufficient attributes for one
to one mapping
class
attributes
NN
Training
NN
furry 4 legs tail furry 4 legs tail
1 1 1
1 0 0 Database
Zero-shot Learning
• Representing each class by its attributes
furry 4 legs tail …
Dog O O O
Fish X X O
Chimp O X X
…
class
attributes
Find the class with the most
similar attributes
Testing
sufficient attributes for one
to one mapping
furry 4 legs tail
0 0 1
NN
Zero-shot Learning
• Attribute embedding
Embedding Space
y2 (attribute
of dog)
x1
x2
x3
y1 (attribute
of chimp)
y3 (attribute of
Grass-mud horse)
𝑓 𝑥1
𝑓 𝑥2
𝑔 𝑥3
𝑔 𝑦2
𝑓 ∗ and g ∗ can be NN.
Training target:
𝑓 𝑥𝑛
and 𝑔 𝑦𝑛
as
close as possible
𝑓 𝑦3
𝑔 𝑦1
Zero-shot Learning
• Attribute embedding + word embedding
Embedding Space
y2 (attribute
of dog)
x1
x2
x3
y1 (attribute
of chimp)
y3 (attribute of
Grass-mud horse)
𝑓 𝑥1
𝑓 𝑥2
𝑔 𝑥3
𝑔 𝑦2
𝑓 𝑦3
𝑔 𝑦1
What if we don’t
have database
V(chimp)
V(dog)
V(Grass-
mud_horse)
∗, 𝑔∗ = 𝑎𝑟𝑔 min
𝑓,𝑔
𝑛
𝑚𝑎𝑥 0, 𝑚 − 𝑓 𝑥𝑛 ∙ 𝑔 𝑥𝑛 + max
𝑚≠𝑛
𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑚
Zero-shot Learning
𝑓∗, 𝑔∗ = 𝑎𝑟𝑔 min
𝑓,𝑔
𝑛
𝑓 𝑥𝑛 − 𝑔 𝑦𝑛
2 Problem?
𝑓∗, 𝑔∗ = 𝑎𝑟𝑔 min
𝑓,𝑔
𝑛
𝑚𝑎𝑥 0, 𝑘 − 𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑛 + max
𝑚≠𝑛
𝑓 𝑥𝑚 ∙ 𝑔 𝑥
Zero loss: 𝑘 − 𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑛 + max
𝑚≠𝑛
𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑚 < 0
Margin you defined
𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑛 − max
𝑚≠𝑛
𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑚 > 𝑘
𝑓 𝑥𝑛 and 𝑔 𝑦𝑛 as close 𝑓 𝑥𝑛 and 𝑔 𝑦𝑚 not as close
Zero-shot Learning
• Convex Combination of Semantic Embedding
NN
lion tiger
0.5 0.5
V(lion)
V(tiger)
V(liger)
0.5V(tiger)+0.5V(lion)
Find the closest
word vector
Only need off-the-shelf NN for
ImageNet and word vector
https://arxiv.org/pdf/1312.5650v3.pdf
Example of Zero-shot Learning
Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen,
Nikhil Thorat. Google’s Multilingual Neural Machine Translation System: Enabling
Zero-Shot Translation, arXiv preprint 2016
Example of Zero-shot Learning
Transfer Learning - Overview
Target
Data
Source Data (not directly related to the task)
labelled
labelled
unlabeled
unlabeled
Multitask Learning
Fine-tuning
Self-taught learning
Domain-adversarial
training
Zero-shot learning
Self-taught Clustering
Rajat Raina , Alexis Battle , Honglak
Lee , Benjamin Packer , Andrew Y. Ng,
Self-taught learning: transfer learning
from unlabeled data, ICML, 2007
Wenyuan Dai, Qiang Yang,
Gui-Rong Xue, Yong Yu, "Self-
taught clustering", ICML 2008
Different from semi-
supervised learning
Self-taught learning
• Learning to extract better representation from the source
data (unsupervised approach)
• Extracting better representation for target data
Acknowledgement
• 感謝 劉致廷 同學於上課時發現投影片上的錯誤
• 感謝 John Chou 發現投影片上的錯字
Appendix
More about Zero-shot learning
• Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M.
Mitchell, “Zero-shot Learning with Semantic Output Codes”, NIPS
2009
• Zeynep Akata, Florent Perronnin, Zaid Harchaoui and Cordelia
Schmid, “Label-Embedding for Attribute-Based Classification”,
CVPR 2013
• Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff
Dean, Marc'Aurelio Ranzato, Tomas Mikolov, “DeViSE: A Deep
Visual-Semantic Embedding Model”, NIPS 2013
• Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram
Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey
Dean, “Zero-Shot Learning by Convex Combination of Semantic
Embeddings”, arXiv preprint 2013
• Subhashini Venugopalan, Lisa Anne Hendricks, Marcus
Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko,
“Captioning Images with Diverse Objects”, arXiv preprint 2016

More Related Content

Similar to Transfer Learning: An Overview of Common Techniques

Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersRoelof Pieters
 
An Introduction to Recent Advances in the Field of NLP
An Introduction to Recent Advances in the Field of NLPAn Introduction to Recent Advances in the Field of NLP
An Introduction to Recent Advances in the Field of NLPRrubaa Panchendrarajan
 
Peter Norvig - NYC Machine Learning 2013
Peter Norvig - NYC Machine Learning 2013Peter Norvig - NYC Machine Learning 2013
Peter Norvig - NYC Machine Learning 2013Michael Scovetta
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.pptyang947066
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer modelsDing Li
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2Karthik Murugesan
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningAmr Rashed
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Julien SIMON
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPMachine Learning Prague
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysisodsc
 
CSCE181 Big ideas in NLP
CSCE181 Big ideas in NLPCSCE181 Big ideas in NLP
CSCE181 Big ideas in NLPInsoo Chung
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2Viral Gupta
 
Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Ana Marasović
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLPSatyam Saxena
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLPAnuj Gupta
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsRoelof Pieters
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPMENGSAYLOEM1
 

Similar to Transfer Learning: An Overview of Common Techniques (20)

Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
An Introduction to Recent Advances in the Field of NLP
An Introduction to Recent Advances in the Field of NLPAn Introduction to Recent Advances in the Field of NLP
An Introduction to Recent Advances in the Field of NLP
 
Peter Norvig - NYC Machine Learning 2013
Peter Norvig - NYC Machine Learning 2013Peter Norvig - NYC Machine Learning 2013
Peter Norvig - NYC Machine Learning 2013
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysis
 
Semantic, Cognitive and Perceptual Computing -Deep learning
Semantic, Cognitive and Perceptual Computing -Deep learning Semantic, Cognitive and Perceptual Computing -Deep learning
Semantic, Cognitive and Perceptual Computing -Deep learning
 
CSCE181 Big ideas in NLP
CSCE181 Big ideas in NLPCSCE181 Big ideas in NLP
CSCE181 Big ideas in NLP
 
SNLI_presentation_2
SNLI_presentation_2SNLI_presentation_2
SNLI_presentation_2
 
Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...Colloquium talk on modal sense classification using a convolutional neural ne...
Colloquium talk on modal sense classification using a convolutional neural ne...
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLP
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 

More from HaibinSu2

CS0098S2G02_08 (1).ppt
CS0098S2G02_08 (1).pptCS0098S2G02_08 (1).ppt
CS0098S2G02_08 (1).pptHaibinSu2
 
chem2503_oct05.ppt
chem2503_oct05.pptchem2503_oct05.ppt
chem2503_oct05.pptHaibinSu2
 
Monte Carlo Berkeley.pptx
Monte Carlo Berkeley.pptxMonte Carlo Berkeley.pptx
Monte Carlo Berkeley.pptxHaibinSu2
 
MolecularMotors.ppt
MolecularMotors.pptMolecularMotors.ppt
MolecularMotors.pptHaibinSu2
 
Lesson 25-27.pdf
Lesson 25-27.pdfLesson 25-27.pdf
Lesson 25-27.pdfHaibinSu2
 
Lesson 28-29.pdf
Lesson 28-29.pdfLesson 28-29.pdf
Lesson 28-29.pdfHaibinSu2
 

More from HaibinSu2 (6)

CS0098S2G02_08 (1).ppt
CS0098S2G02_08 (1).pptCS0098S2G02_08 (1).ppt
CS0098S2G02_08 (1).ppt
 
chem2503_oct05.ppt
chem2503_oct05.pptchem2503_oct05.ppt
chem2503_oct05.ppt
 
Monte Carlo Berkeley.pptx
Monte Carlo Berkeley.pptxMonte Carlo Berkeley.pptx
Monte Carlo Berkeley.pptx
 
MolecularMotors.ppt
MolecularMotors.pptMolecularMotors.ppt
MolecularMotors.ppt
 
Lesson 25-27.pdf
Lesson 25-27.pdfLesson 25-27.pdf
Lesson 25-27.pdf
 
Lesson 28-29.pdf
Lesson 28-29.pdfLesson 28-29.pdf
Lesson 28-29.pdf
 

Recently uploaded

Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 

Recently uploaded (20)

Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 

Transfer Learning: An Overview of Common Techniques

  • 2. Transfer Learning Dog/Cat Classifier cat dog Data not directly related to the task considered elephant tiger Similar domain, different tasks Different domains, same task http://weebly110810.weebly.com/3 96403913129399.html http://www.sucaitianxia.com/png/c artoon/200811/4261.html dog cat
  • 3. Why? Task Considered Image Recognition Data not directly related Speech Recognition Specific domain Taiwanese English Chinese Webpages …… Medical Images http://www.bigr.nl/website/structure/main.php?page=resear chlines&subpage=project&id=64 Text Analysis http://www.spear.com.hk/Translation-company-Directory.html
  • 4. Transfer Learning • Example in real life 爆漫王 責編 漫畫家 投稿 jump 畫分鏡 指導教授 研究生 投稿期刊 跑實驗 漫畫家 研究生 (word embedding knows that)
  • 5. Transfer Learning - Overview Target Data Source Data (not directly related to the task) labelled labelled unlabeled unlabeled Model Fine-tuning Warning: different terminology in different literature
  • 6. Model Fine-tuning • Task description • Source data: 𝑥𝑠, 𝑦𝑠 • Target data: 𝑥𝑡 , 𝑦𝑡 • Example: (supervised) speaker adaption • Source data: audio data and transcriptions from many speakers • Target data: audio data and its transcriptions of specific user • Idea: training a model by source data, then fine- tune the model by target data • Challenge: only limited target data, so be careful about overfitting Very little A large amount One-shot learning: only a few examples in target domain
  • 7. Conservative Training Source data (e.g. Audio data of Many speakers) Target data (e.g. A little data from target speaker) Input layer Output layer initialization Input layer Output layer output close parameter close
  • 8. Layer Transfer Input layer Output layer Copy some parameters Target data Source data 1. Only train the rest layers (prevent overfitting) 2. fine-tune the whole network (if there is sufficient data)
  • 9. Layer Transfer • Which layer can be transferred (copied)? • Speech: usually copy the last few layers • Image: usually copy the first few layers Layer 1 Layer 2 Layer L 1 x 2 x …… …… …… …… …… elephant …… N x Pixels …… …… ……
  • 10. Layer Transfer - Image Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, “How transferable are features in deep neural networks?”, NIPS, 2014 Only train the rest layers fine-tune the whole network Source: 500 classes from ImageNet Target: another 500 classes from ImageNet
  • 11. Layer Transfer - Image Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, “How transferable are features in deep neural networks?”, NIPS, 2014 Only train the rest layers
  • 12. Transfer Learning - Overview Target Data Source Data (not directly related to the task) labelled labelled unlabeled unlabeled Multitask Learning Fine-tuning
  • 13. Multitask Learning • The multi-layer structure makes NN suitable for multitask learning Input feature Task A Task B Task A Task B Input feature for task A Input feature for task B
  • 14. Multitask Learning - Multilingual Speech Recognition acoustic features states of French states of German Human languages share some common characteristics. states of Spanish states of Italian states of Mandarin Similar idea in translation: Daxiang Dong, Hua Wu, Wei He, Dianhai Yu and Haifeng Wang, "Multi-task learning for multiple language translation.“, ACL 2015
  • 15. Multitask Learning - Multilingual Huang, Jui-Ting, et al. "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers." ICASSP, 2013 25 30 35 40 45 50 1 10 100 1000 Character Error Rate Hours of training data for Mandarin Mandarin only With European Language
  • 16. Progressive Neural Networks input Task 1 input Task 2 input Task 3 Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell, “Progressive Neural Networks”, arXiv preprint 2016
  • 17. Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A. Rusu, Alexander Pritzel, Daan Wierstra, “PathNet: Evolution Channels Gradient Descent in Super Neural Networks”, arXiv preprint, 2017
  • 18. Transfer Learning - Overview Target Data Source Data (not directly related to the task) labelled labelled unlabeled unlabeled Multitask Learning Fine-tuning Domain-adversarial training
  • 19. Task description • Source data: 𝑥𝑠, 𝑦𝑠 • Target data: 𝑥𝑡 Training data Testing data Same task, mismatch with label without label
  • 21. Domain-adversarial training Similar to GAN Too easy to feature extractor …… feature extractor Domain classifier
  • 22. Domain-adversarial training Not only cheat the domain classifier, but satisfying label classifier at the same time This is a big network, but different parts have different goals. Maximize label classification accuracy Maximize domain classification accuracy Maximize label classification accuracy + minimize domain classification accuracy feature extractor Domain classifier Label predictor
  • 23. Domain-adversarial training Yaroslav Ganin, Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation, ICML, 2015 Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, Domain-Adversarial Training of Neural Networks, JMLR, 2016 Domain classifier fails in the end It should struggle ……
  • 24. Domain-adversarial training Yaroslav Ganin, Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation, ICML, 2015 Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, Domain-Adversarial Training of Neural Networks, JMLR, 2016
  • 25. Transfer Learning - Overview Target Data Source Data (not directly related to the task) labelled labelled unlabeled unlabeled Multitask Learning Fine-tuning Domain-adversarial training Zero-shot learning
  • 26. Zero-shot Learning • Source data: 𝑥𝑠, 𝑦𝑠 • Target data: 𝑥𝑡 Different tasks In speech recognition, we can not have all possible words in the source (training) data. Training data Testing data cat dog How we solve this problem in speech recognition? 𝑥𝑠: 𝑦𝑠: …… …… 𝑥𝑡 : http://evchk.wikia.com/wiki/%E8%8 D%89%E6%B3%A5%E9%A6%AC
  • 27. Zero-shot Learning • Representing each class by its attributes furry 4 legs tail … Dog O O O Fish X X O Chimp O X X … sufficient attributes for one to one mapping class attributes NN Training NN furry 4 legs tail furry 4 legs tail 1 1 1 1 0 0 Database
  • 28. Zero-shot Learning • Representing each class by its attributes furry 4 legs tail … Dog O O O Fish X X O Chimp O X X … class attributes Find the class with the most similar attributes Testing sufficient attributes for one to one mapping furry 4 legs tail 0 0 1 NN
  • 29. Zero-shot Learning • Attribute embedding Embedding Space y2 (attribute of dog) x1 x2 x3 y1 (attribute of chimp) y3 (attribute of Grass-mud horse) 𝑓 𝑥1 𝑓 𝑥2 𝑔 𝑥3 𝑔 𝑦2 𝑓 ∗ and g ∗ can be NN. Training target: 𝑓 𝑥𝑛 and 𝑔 𝑦𝑛 as close as possible 𝑓 𝑦3 𝑔 𝑦1
  • 30. Zero-shot Learning • Attribute embedding + word embedding Embedding Space y2 (attribute of dog) x1 x2 x3 y1 (attribute of chimp) y3 (attribute of Grass-mud horse) 𝑓 𝑥1 𝑓 𝑥2 𝑔 𝑥3 𝑔 𝑦2 𝑓 𝑦3 𝑔 𝑦1 What if we don’t have database V(chimp) V(dog) V(Grass- mud_horse)
  • 31. ∗, 𝑔∗ = 𝑎𝑟𝑔 min 𝑓,𝑔 𝑛 𝑚𝑎𝑥 0, 𝑚 − 𝑓 𝑥𝑛 ∙ 𝑔 𝑥𝑛 + max 𝑚≠𝑛 𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑚 Zero-shot Learning 𝑓∗, 𝑔∗ = 𝑎𝑟𝑔 min 𝑓,𝑔 𝑛 𝑓 𝑥𝑛 − 𝑔 𝑦𝑛 2 Problem? 𝑓∗, 𝑔∗ = 𝑎𝑟𝑔 min 𝑓,𝑔 𝑛 𝑚𝑎𝑥 0, 𝑘 − 𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑛 + max 𝑚≠𝑛 𝑓 𝑥𝑚 ∙ 𝑔 𝑥 Zero loss: 𝑘 − 𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑛 + max 𝑚≠𝑛 𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑚 < 0 Margin you defined 𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑛 − max 𝑚≠𝑛 𝑓 𝑥𝑛 ∙ 𝑔 𝑦𝑚 > 𝑘 𝑓 𝑥𝑛 and 𝑔 𝑦𝑛 as close 𝑓 𝑥𝑛 and 𝑔 𝑦𝑚 not as close
  • 32. Zero-shot Learning • Convex Combination of Semantic Embedding NN lion tiger 0.5 0.5 V(lion) V(tiger) V(liger) 0.5V(tiger)+0.5V(lion) Find the closest word vector Only need off-the-shelf NN for ImageNet and word vector
  • 34. Example of Zero-shot Learning Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, arXiv preprint 2016
  • 36. Transfer Learning - Overview Target Data Source Data (not directly related to the task) labelled labelled unlabeled unlabeled Multitask Learning Fine-tuning Self-taught learning Domain-adversarial training Zero-shot learning Self-taught Clustering Rajat Raina , Alexis Battle , Honglak Lee , Benjamin Packer , Andrew Y. Ng, Self-taught learning: transfer learning from unlabeled data, ICML, 2007 Wenyuan Dai, Qiang Yang, Gui-Rong Xue, Yong Yu, "Self- taught clustering", ICML 2008 Different from semi- supervised learning
  • 37. Self-taught learning • Learning to extract better representation from the source data (unsupervised approach) • Extracting better representation for target data
  • 38. Acknowledgement • 感謝 劉致廷 同學於上課時發現投影片上的錯誤 • 感謝 John Chou 發現投影片上的錯字
  • 40. More about Zero-shot learning • Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell, “Zero-shot Learning with Semantic Output Codes”, NIPS 2009 • Zeynep Akata, Florent Perronnin, Zaid Harchaoui and Cordelia Schmid, “Label-Embedding for Attribute-Based Classification”, CVPR 2013 • Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, Tomas Mikolov, “DeViSE: A Deep Visual-Semantic Embedding Model”, NIPS 2013 • Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean, “Zero-Shot Learning by Convex Combination of Semantic Embeddings”, arXiv preprint 2013 • Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko, “Captioning Images with Diverse Objects”, arXiv preprint 2016

Editor's Notes

  1. Transfer Self-taught
  2. even sick 手速 進度報告
  3. Just train cannot work Extra constraint To evaluate how the size of the adaptation set affects the result, the number of adaptation utterances varies from 5 (32 s) to 200 (22min).
  4. Where shold I add -> I don’t know
  5. 60,0000
  6. Should I put Tanja’s AF here?
  7. Translation/summarization
  8. https://arxiv.org/pdf/1606.04671v3.pdf
  9. 高領毛衣
  10. Stratosphere同溫層