Transfer learning is a machine learning method in which a model developed for one task is reused as the starting point for a model on a second task. These slides cover several transfer learning techniques: fine-tuning, multitask learning, domain adaptation, and zero-shot learning. Fine-tuning takes a pre-trained model and updates its parameters on a new task that has only limited labelled data, taking care to avoid overfitting. Multitask learning trains a model on multiple related tasks simultaneously through a shared representation. Domain adaptation aligns the distributions of the source and target domains, for example with domain-adversarial training. Zero-shot learning recognizes classes never seen during training by learning a semantic embedding space that relates classes and attributes.
2. Transfer Learning
• Example: building a dog/cat classifier (cat vs. dog) using data not directly related to the task considered:
  • elephant and tiger photos: similar domain, different task
  • cartoon dogs and cats: different domain, same task
(Image sources: http://weebly110810.weebly.com/396403913129399.html, http://www.sucaitianxia.com/png/cartoon/200811/4261.html)
3. Why?
• The task considered often has only a little data in a specific domain, while plenty of data exists that is not directly related:
  • Speech recognition: Taiwanese (target) vs. English, Chinese, ……
  • Image recognition: medical images (target, http://www.bigr.nl/website/structure/main.php?page=researchlines&subpage=project&id=64) vs. generic images
  • Text analysis: a specific domain (target) vs. general webpages (http://www.spear.com.hk/Translation-company-Directory.html)
4. Transfer Learning
• Example in real life: in the manga Bakuman, a manga artist's life parallels a graduate student's:
  • manga artist ↔ graduate student
  • editor in charge ↔ supervising professor
  • submitting to Jump (a manga magazine) ↔ submitting to journals
  • drawing storyboards ↔ running experiments
• (word embeddings know this analogy)
5. Transfer Learning - Overview
• Techniques are organized by whether the target data and the source data (not directly related to the task) are labelled or unlabelled.
• Labelled target data + labelled source data → Model Fine-tuning
• Warning: different terminology is used in different literature.
6. Model Fine-tuning
• Task description
  • Source data: (x^s, y^s), a large amount
  • Target data: (x^t, y^t), very little
• Example: (supervised) speaker adaptation
  • Source data: audio data and transcriptions from many speakers
  • Target data: audio data and its transcriptions of a specific user
• Idea: train a model on the source data, then fine-tune the model on the target data
• Challenge: only limited target data, so be careful about overfitting
• One-shot learning: only a few examples in the target domain
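The idea above can be sketched with a toy numpy example (illustrative, not the lecture's implementation; the data, model, and learning rates are all made up): pre-train a logistic classifier on abundant source data, then fine-tune it on a tiny target set with few steps and a small learning rate to limit overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, y, lr, steps):
    """Plain gradient descent on the logistic loss."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # predicted probabilities
        w = w - lr * X.T @ (p - y) / len(y)  # gradient step
    return w

# Plenty of source data, labelled by a "source" direction w_src.
X_src = rng.normal(size=(1000, 5))
w_src = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y_src = (X_src @ w_src > 0).astype(float)

# Very little target data, from a slightly shifted but related task.
X_tgt = rng.normal(size=(20, 5))
y_tgt = (X_tgt @ (w_src + np.array([0.3, 0.0, -0.2, 0.1, 0.0])) > 0).astype(float)

# 1) Train on the source data.
w = train(np.zeros(5), X_src, y_src, lr=0.1, steps=200)

# 2) Fine-tune on the target data: few steps and a small learning
#    rate, so the tiny target set cannot cause heavy overfitting.
w = train(w, X_tgt, y_tgt, lr=0.02, steps=50)

acc = float(((X_tgt @ w > 0) == (y_tgt > 0.5)).mean())
```

Because the source model already points in roughly the right direction, the fine-tuning phase only needs a small nudge from the 20 target examples.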
7. Conservative Training
• Initialize the target network from the network trained on the source data (e.g. audio data of many speakers), then train it on the target data (e.g. a little data from the target speaker).
• Add extra constraints so that the new network stays close to the original: keep the outputs close (on the same input) and/or the parameters close.
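The "parameters close" constraint can be sketched as a penalty term added to the target loss (a minimal sketch on a hypothetical linear-regression task; the penalty strength `lam` and all data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Parameters of a (hypothetical) model already trained on many speakers.
w_source = rng.normal(size=4)

# A handful of target-speaker examples; the target task is a small
# perturbation of the source task.
X = rng.normal(size=(8, 4))
y = X @ (w_source + 0.2) + 0.05 * rng.normal(size=8)

def fit(lam):
    """Least squares on the target data, plus a penalty keeping the
    parameters close to the source model (the 'conservative' part)."""
    w = w_source.copy()
    for _ in range(500):
        grad_fit = X.T @ (X @ w - y) / len(y)  # fit the target data
        grad_reg = lam * (w - w_source)        # stay close to the source parameters
        w -= 0.05 * (grad_fit + grad_reg)
    return w

w_cons = fit(lam=1.0)   # conservative training
w_free = fit(lam=0.0)   # ordinary fine-tuning, no constraint

dist_cons = float(np.linalg.norm(w_cons - w_source))
dist_free = float(np.linalg.norm(w_free - w_source))
```

With the penalty on, the adapted parameters end up closer to the source model than unconstrained fine-tuning would leave them, which is exactly the protection against overfitting the slide describes.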
8. Layer Transfer
• Copy some parameters (layers) from the model trained on the source data.
• With the target data, either:
  1. Only train the rest of the layers (prevents overfitting when target data is scarce), or
  2. Fine-tune the whole network (if there is sufficient data).
9. Layer Transfer
• Which layers can be transferred (copied)?
  • Speech: usually copy the last few layers
  • Image: usually copy the first few layers
[Figure: a network from input pixels x_1 … x_N through Layer 1, Layer 2, …, Layer L to the output (e.g. "elephant"), indicating which layers are copied.]
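Option 1 from the previous slide (copy a layer, freeze it, and train only the rest) can be sketched with a toy 2-layer numpy net (shapes and data are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# A hypothetical 2-layer net: W1 is copied from a source model and
# kept frozen; only the new task-specific layer W2 is trained.
W1 = rng.normal(size=(5, 8)) / np.sqrt(5)   # transferred layer (frozen)
W2 = np.zeros((8, 1))                       # new layer, trained from scratch

X = rng.normal(size=(50, 5))
# Target labels generated through the same first-layer features,
# so the transferred representation is actually useful here.
y = np.tanh(X @ W1) @ rng.normal(size=(8, 1))

W1_before = W1.copy()
h = np.tanh(X @ W1)                  # features from the transferred layer
loss_start = float(np.mean((h @ W2 - y) ** 2))

for _ in range(300):
    err = h @ W2 - y
    W2 -= 0.05 * h.T @ err / len(X)  # only the new layer is updated

loss_end = float(np.mean((h @ W2 - y) ** 2))
frozen_unchanged = bool(np.array_equal(W1, W1_before))
```

Freezing W1 means the scarce target data can only move the small head W2, which is the overfitting protection the slide mentions; with more data one would unfreeze W1 and fine-tune everything.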
10. Layer Transfer - Image
Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, "How transferable are features in deep neural networks?", NIPS, 2014
• Source task: 500 classes from ImageNet; target task: another 500 classes from ImageNet.
• Two settings are compared: only train the rest of the layers, versus fine-tune the whole network.
11. Layer Transfer - Image
Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, "How transferable are features in deep neural networks?", NIPS, 2014
• Results for the setting where only the rest of the layers are trained.
12. Transfer Learning - Overview
• Labelled target data + labelled source data → Fine-tuning, Multitask Learning
13. Multitask Learning
• The multi-layer structure makes neural networks well suited to multitask learning.
• Two common architectures:
  • A shared input feature and shared lower layers, splitting into separate branches for Task A and Task B.
  • Separate input features for Task A and Task B, mapped into a shared intermediate representation before splitting again.
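The first architecture (hard parameter sharing) can be sketched with a toy numpy example (the two regression tasks, shapes, and learning rate are illustrative assumptions): one shared layer receives gradients from both task heads.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hard parameter sharing: one shared layer, one small head per task.
W_shared = rng.normal(size=(4, 6)) * 0.1
head_a = rng.normal(size=(6, 1)) * 0.1
head_b = rng.normal(size=(6, 1)) * 0.1

# Two related (hypothetical) regression tasks on the same input space.
Xa, Xb = rng.normal(size=(30, 4)), rng.normal(size=(30, 4))
ya = Xa[:, :1] + Xa[:, 1:2]   # task A target
yb = Xb[:, :1] - Xb[:, 1:2]   # task B target

def forward(X, head):
    h = np.tanh(X @ W_shared)  # shared representation
    return h, h @ head

for _ in range(500):
    # Alternate between tasks; both update the shared layer.
    for X, y, head in ((Xa, ya, head_a), (Xb, yb, head_b)):
        h, pred = forward(X, head)
        err = (pred - y) / len(X)
        head -= 0.1 * h.T @ err                               # task-specific update
        W_shared -= 0.1 * X.T @ (err @ head.T * (1 - h**2))   # shared update

loss_a = float(np.mean((forward(Xa, head_a)[1] - ya) ** 2))
loss_b = float(np.mean((forward(Xb, head_b)[1] - yb) ** 2))
base_a = float(np.mean(ya ** 2))   # loss of always predicting zero
base_b = float(np.mean(yb ** 2))
```

Because the shared layer sees gradients from both tasks, each task regularizes the representation the other one uses.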
14. Multitask Learning - Multilingual Speech Recognition
• One network maps acoustic features to the states of French, German, Spanish, Italian and Mandarin, with shared lower layers: human languages share some common characteristics.
• Similar idea in translation: Daxiang Dong, Hua Wu, Wei He, Dianhai Yu and Haifeng Wang, "Multi-task learning for multiple language translation", ACL 2015
15. Multitask Learning - Multilingual
Huang, Jui-Ting, et al. "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers." ICASSP, 2013
[Figure: character error rate (25-50%) versus hours of Mandarin training data (1 to 1000); training together with European languages consistently beats Mandarin-only.]
16. Progressive Neural Networks
• A new column (network) is added for each new task (Task 1, Task 2, Task 3, …); each new column receives lateral connections from the previously trained columns, whose parameters are kept frozen.
Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell, "Progressive Neural Networks", arXiv preprint 2016
17. PathNet
Chrisantha Fernando, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A. Rusu, Alexander Pritzel, Daan Wierstra, "PathNet: Evolution Channels Gradient Descent in Super Neural Networks", arXiv preprint, 2017
18. Transfer Learning - Overview
• Labelled target data + labelled source data → Fine-tuning, Multitask Learning
• Unlabelled target data + labelled source data → Domain-adversarial training
19. Task description
• Source data: (x^s, y^s), the training data, with labels
• Target data: x^t, the testing data, without labels
• Same task, but the training and testing data are mismatched
22. Domain-adversarial training
• This is one big network, but different parts have different goals:
  • Feature extractor: maximize label classification accuracy + minimize domain classification accuracy
  • Label predictor: maximize label classification accuracy
  • Domain classifier: maximize domain classification accuracy
• The feature extractor must not only cheat the domain classifier, but also satisfy the label predictor at the same time.
23. Domain-adversarial training
Yaroslav Ganin, Victor Lempitsky, "Unsupervised Domain Adaptation by Backpropagation", ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, "Domain-Adversarial Training of Neural Networks", JMLR, 2016
• In the end the domain classifier fails, but it should struggle before it does ……
24. Domain-adversarial training
Yaroslav Ganin, Victor Lempitsky, "Unsupervised Domain Adaptation by Backpropagation", ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, "Domain-Adversarial Training of Neural Networks", JMLR, 2016
25. Transfer Learning - Overview
• Labelled target data + labelled source data → Fine-tuning, Multitask Learning
• Unlabelled target data + labelled source data → Domain-adversarial training, Zero-shot learning
26. Zero-shot Learning
• Source data (training): (x^s, y^s), e.g. images labelled cat and dog
• Target data (testing): x^t, e.g. an image of a class never seen in training, such as a grass-mud horse (http://evchk.wikia.com/wiki/%E8%8D%89%E6%B3%A5%E9%A6%AC)
• The tasks are different: the testing classes do not appear in the training data.
• In speech recognition, we cannot have all possible words in the source (training) data. How is this problem solved there? (By recognizing phonemes rather than whole words, then mapping phonemes to words with a lexicon.)
27. Zero-shot Learning
• Represent each class by its attributes, using a database with sufficient attributes for a one-to-one mapping between classes and attribute vectors:

  class | furry | 4 legs | tail
  Dog   |   O   |   O    |   O
  Fish  |   X   |   X    |   O
  Chimp |   O   |   X    |   X

• Training: instead of predicting the class directly, the NN is trained to output the attributes of the input image, e.g. (furry=1, 4 legs=1, tail=1) for a dog image and (furry=1, 4 legs=0, tail=0) for a chimp image.
28. Zero-shot Learning
• Testing: the NN predicts the attributes of the test image, e.g. (furry=0, 4 legs=0, tail=1); then find the class in the database with the most similar attributes (here: Fish).
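The testing step is just a nearest-neighbour lookup in attribute space, which can be sketched in a few lines (the attribute values follow the slide's table; the predicted attribute vector is a made-up example of a network output):

```python
import numpy as np

# Attribute database from the slide: (furry, 4 legs, tail) per class.
classes = ["dog", "fish", "chimp"]
attrs = np.array([[1, 1, 1],    # dog:   furry, 4 legs, tail
                  [0, 0, 1],    # fish:  not furry, no legs, tail
                  [1, 0, 0]])   # chimp: furry, no 4 legs, no tail

def classify(pred_attrs):
    """Pick the class whose attribute vector is most similar
    to the attributes the network predicted for the image."""
    dists = np.abs(attrs - pred_attrs).sum(axis=1)
    return classes[int(np.argmin(dists))]

# Pretend the NN predicted these attributes for a test image.
label = classify(np.array([0.1, 0.2, 0.9]))
```

Nothing about `classify` depends on the class having been seen in training: any class with a row in the attribute database can be predicted.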
29. Zero-shot Learning
• Attribute embedding: map each image x_n by f(·) and each class attribute vector y_n by g(·) into a joint embedding space; f(·) and g(·) can both be neural networks.
• Training target: make f(x_n) and g(y_n) as close as possible, e.g. f(x1) near g(y1) (attributes of chimp), f(x2) near g(y2) (attributes of dog), f(x3) near g(y3) (attributes of grass-mud horse).
30. Zero-shot Learning
• Attribute embedding + word embedding: what if we don't have an attribute database? Use the word vector of each class name in place of its attributes, e.g. V(dog), V(chimp), V(grass-mud horse), and train the image embedding f(x_n) to be close to the word vector of its class.
31. Zero-shot Learning
• A naive training target:
  f*, g* = arg min_{f,g} Σ_n ||f(x_n) − g(y_n)||_2
  Problem? f and g can collapse everything to a single point and reach zero loss.
• A margin-based target fixes this:
  f*, g* = arg min_{f,g} Σ_n max(0, k − f(x_n)·g(y_n) + max_{m≠n} f(x_n)·g(y_m))
  where k is the margin you defined.
• The loss is zero exactly when
  k − f(x_n)·g(y_n) + max_{m≠n} f(x_n)·g(y_m) ≤ 0,
  i.e. f(x_n)·g(y_n) − max_{m≠n} f(x_n)·g(y_m) > k:
  f(x_n) and g(y_n) must be close, while f(x_n) and g(y_m) (m ≠ n) must not be close.
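The margin loss above can be computed with one matrix of dot products. A minimal sketch (the embeddings are hand-picked toy vectors, chosen only to show why the collapsed solution that fools the squared-distance objective does not fool the margin loss):

```python
import numpy as np

def pairwise_margin_loss(F, G, k=1.0):
    """sum_n max(0, k - f(x_n)·g(y_n) + max_{m != n} f(x_n)·g(y_m)),
    where F[n] = f(x_n) and G[n] = g(y_n)."""
    S = F @ G.T                       # S[n, m] = f(x_n)·g(y_m)
    pos = np.diag(S)                  # matching pairs
    S_off = S.copy()
    np.fill_diagonal(S_off, -np.inf)  # exclude m = n from the max
    neg = S_off.max(axis=1)           # hardest non-matching class
    return float(np.maximum(0.0, k - pos + neg).sum())

# Well-separated embeddings: each f(x_n) aligns only with its g(y_n).
good = pairwise_margin_loss(np.eye(3) * 2.0, np.eye(3), k=1.0)

# Collapsed embeddings: everything maps to the same point. The naive
# squared-distance objective would score this as perfect; the margin
# loss correctly penalizes it.
bad = pairwise_margin_loss(np.ones((3, 3)), np.ones((3, 3)), k=1.0)
```

In the collapsed case every class is equally close to every image, so each term pays the full margin k.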
32. Zero-shot Learning
• Convex Combination of Semantic Embeddings (ConSE): only needs an off-the-shelf NN trained on ImageNet plus word vectors; no extra training.
• Example: the NN outputs lion 0.5 and tiger 0.5; compute 0.5 V(lion) + 0.5 V(tiger) and find the closest word vector, here V(liger).
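The lion/tiger/liger example can be sketched directly (the 3-D word vectors and the classifier probabilities are made-up illustrations; in practice one would use pretrained embeddings and real softmax outputs):

```python
import numpy as np

# Hypothetical word vectors (stand-ins for pretrained embeddings).
word_vecs = {
    "lion":  np.array([1.0, 0.0, 0.5]),
    "tiger": np.array([0.0, 1.0, 0.5]),
    "liger": np.array([0.5, 0.5, 0.5]),
}

# Off-the-shelf classifier output on an image of a liger: the model
# only knows lion and tiger, so it hedges between them.
probs = {"lion": 0.5, "tiger": 0.5}

# Convex combination of the predicted classes' word vectors ...
v = sum(p * word_vecs[c] for c, p in probs.items())

# ... then pick the nearest word vector, which may be an unseen class.
best = min(word_vecs, key=lambda c: np.linalg.norm(word_vecs[c] - v))
```

The classifier itself never changes; the word-vector space supplies the unseen class, which is why only off-the-shelf components are needed.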
36. Transfer Learning - Overview
• Labelled target data + labelled source data → Fine-tuning, Multitask Learning
• Unlabelled target data + labelled source data → Domain-adversarial training, Zero-shot learning
• Labelled target data + unlabelled source data → Self-taught learning (different from semi-supervised learning)
  Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, Andrew Y. Ng, "Self-taught learning: transfer learning from unlabeled data", ICML, 2007
• Unlabelled target data + unlabelled source data → Self-taught Clustering
  Wenyuan Dai, Qiang Yang, Gui-Rong Xue, Yong Yu, "Self-taught clustering", ICML 2008
37. Self-taught learning
• Learn to extract a better representation from the (unlabelled) source data with an unsupervised approach.
• Then use it to extract a better representation for the target data.
40. More about Zero-shot learning
• Mark Palatucci, Dean Pomerleau, Geoffrey E. Hinton, Tom M. Mitchell, "Zero-shot Learning with Semantic Output Codes", NIPS 2009
• Zeynep Akata, Florent Perronnin, Zaid Harchaoui and Cordelia Schmid, "Label-Embedding for Attribute-Based Classification", CVPR 2013
• Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, Tomas Mikolov, "DeViSE: A Deep Visual-Semantic Embedding Model", NIPS 2013
• Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean, "Zero-Shot Learning by Convex Combination of Semantic Embeddings", arXiv preprint 2013
• Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko, "Captioning Images with Diverse Objects", arXiv preprint 2016