Sebastian Ruder

Research Scientist, AYLIEN
PhD Candidate, Insight Centre
@seb_ruder |01.03.17 | LinkedIn Tech Talk
Transfer Learning —
The Next Frontier for ML
Agenda
1. What is Transfer Learning?
2. Why Transfer Learning now?
3. Transfer Learning in practice
4. Transfer Learning for NLP
5. Our research
6. Opportunities and directions
What is Transfer Learning?
Traditional ML: training and evaluation on the same task or domain.
[Figure: Model A is trained on task / domain A and Model B on task / domain B, each in isolation.]
What is Transfer Learning?
Transfer learning: storing knowledge gained solving one problem and applying it to a different but related problem.
[Figure: knowledge from a model trained on the source task / domain is passed to a model for the target task / domain.]
“Transfer learning will be the next driver of ML success.”
Andrew Ng, NIPS 2016 keynote
Why Transfer Learning now?
Drivers of ML success in industry (Andrew Ng, NIPS 2016 keynote)
[Figure: commercial success over time, with 2016 marked on the time axis, for supervised learning, transfer learning, unsupervised learning, and reinforcement learning.]
Why Transfer Learning now?
1. Learn very accurate input-output mapping
2. Maturity of ML models
- Computer vision (5% error on ImageNet)
- Automatic speech recognition (3x faster than typing, 20% more accurate [1])
3. Large-scale deployment & adoption of ML models
- Google’s NMT System [2]
[1] Ruan, S., Wobbrock, J. O., Liou, K., Ng, A., & Landay, J. (2016). Speech Is 3x Faster than Typing for English and Mandarin Text Entry on Mobile Devices. arXiv preprint arXiv:1608.07323.
[2] Wu, Y., Schuster, M., Chen, Z., Le, Q. V, Norouzi, M., Macherey, W., … Dean, J. (2016). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144.
Huge reliance on labeled data 

Novel tasks / domains without (labeled) data
Transfer Learning in practice
Computer vision
• Train new model on features of large model trained on ImageNet [3]
• Train model to confuse source and target domains [4]
• Train model on domain-invariant representations [5, 6]
[3] Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 512–519.
[4] Ganin, Y., & Lempitsky, V. (2015). Unsupervised Domain Adaptation by Backpropagation. Proceedings of the 32nd International Conference on Machine Learning, 37.
[5] Bousmalis, K., Trigeorgis, G., Silberman, N., Krishnan, D., & Erhan, D. (2016). Domain Separation Networks. NIPS 2016.
[6] Sener, O., Song, H. O., Saxena, A., & Savarese, S. (2016). Learning Transferrable Representations for Unsupervised Domain Adaptation. NIPS 2016.
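The first approach on this slide, training a new model on the features of a large pretrained model, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `pretrained_features` is a hypothetical toy function standing in for a frozen network trained on ImageNet, and the data, learning rate, and epoch count are made up for the example.

```python
import math

def pretrained_features(x):
    """Hypothetical stand-in for a frozen network pretrained on a large source dataset."""
    # Fixed, non-trainable transformation of the raw input; the last entry is a bias feature.
    return [x[0] + x[1], x[0] - x[1], 1.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(data, epochs=200, lr=0.5):
    """Fit only a small logistic-regression head on top of the frozen features."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            f = pretrained_features(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            # Gradient step on the head only; the feature extractor is never updated.
            w = [wi - lr * (p - y) * fi for wi, fi in zip(w, f)]
    return w

def predict(w, x):
    f = pretrained_features(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f))) >= 0.5 else 0

# Small labeled target set (toy data): the label is 1 when x[0] > x[1].
target_data = [((2.0, 1.0), 1), ((1.0, 3.0), 0), ((0.5, 0.1), 1), ((0.2, 0.9), 0)]
head = train_head(target_data)
accuracy = sum(predict(head, x) == y for x, y in target_data) / len(target_data)
```

The point of the design is that only the small head is trained on the target data, so a handful of labeled target examples can be enough when the frozen features are already useful.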
Transfer Learning in practice
Reinforcement learning
• Progressive Neural Networks [7] have access to weights from trained models
• PathNet [8] learns weight paths via a genetic algorithm
[7] Rusu, A. A., Rabinowitz, N. C., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., … Hadsell, R. (2016). Progressive Neural Networks. arXiv preprint arXiv:1606.04671.
[8] Fernando, C., Banarse, D., Blundell, C., Zwols, Y., Ha, D., Rusu, A. A., … Wierstra, D. (2017). PathNet: Evolution Channels Gradient Descent in Super Neural Networks. arXiv preprint arXiv:1701.08734.
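The idea of a new model having access to weights from already trained models can be sketched as a new network column that reads the hidden activations of a frozen column through a lateral connection. This is a simplified illustration and not the published architecture: all weight values are toy numbers, and the names `W1_SOURCE`, `W1_TARGET`, `W2_TARGET`, and `U_LATERAL` are assumptions made for the example.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

# Column 1: hidden layer trained on the source task, then frozen (toy values).
W1_SOURCE = [[1.0, -1.0], [0.5, 0.5]]

# Column 2: new, trainable parameters for the target task (toy values).
W1_TARGET = [[0.2, 0.1], [-0.3, 0.4]]
W2_TARGET = [[1.0, 1.0]]   # reads the new column's hidden layer
U_LATERAL = [[0.5, -0.5]]  # lateral connection: reads the frozen column's hidden layer

def target_forward(x):
    h_source = relu(matvec(W1_SOURCE, x))  # frozen features, never updated
    h_target = relu(matvec(W1_TARGET, x))  # features learned for the new task
    # The target output combines the new column with the frozen column's features.
    return add(matvec(W2_TARGET, h_target), matvec(U_LATERAL, h_source))

output = target_forward([1.0, 2.0])
```

Because the source column is only read, never written, training on the target task cannot overwrite what was learned on the source task.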
Transfer Learning for NLP
A (slightly) more technical definition of task $\mathcal{T}$ and domain $\mathcal{D}$
• Domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ where
- $\mathcal{X}$: feature space, e.g. BOW representations
- $P(X)$: e.g. distribution over terms in documents
• Task $\mathcal{T} = \{\mathcal{Y}, P(Y|X)\}$ where
- $\mathcal{Y}$: label space, e.g. true/false labels
- $P(Y|X)$: learned mapping from samples to labels
• Transfer learning: Learning when $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$
Transfer Learning for NLP
Transfer scenarios
1. $P(X_S) \neq P(X_T)$: Different topics, text types, etc.
2. $\mathcal{X}_S \neq \mathcal{X}_T$: Different languages.
3. $P(Y_S|X_S) \neq P(Y_T|X_T)$: Unbalanced classes.
4. $\mathcal{Y}_S \neq \mathcal{Y}_T$: Different tasks.
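The four scenarios can be read as a checklist comparing a source setting with a target setting. The sketch below is only illustrative: the `Setting` class and its string-valued fields are hypothetical stand-ins for the feature space, the marginal distribution, the label space, and the conditional distribution, which in practice are not simple labels.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Setting:
    feature_space: str      # stands in for the feature space, e.g. the language
    marginal: str           # stands in for P(X), e.g. the topic or text type
    label_space: frozenset  # the label space
    conditional: str        # stands in for P(Y|X), e.g. the class balance

def transfer_scenarios(source, target):
    """Return which of the four conditions differ between source and target."""
    found = []
    if source.marginal != target.marginal:
        found.append("P(X_S) != P(X_T)")
    if source.feature_space != target.feature_space:
        found.append("X_S != X_T")
    if source.conditional != target.conditional:
        found.append("P(Y_S|X_S) != P(Y_T|X_T)")
    if source.label_space != target.label_space:
        found.append("Y_S != Y_T")
    return found

labels = frozenset({"positive", "negative"})
news = Setting("english", "news", labels, "balanced")
tweets = Setting("english", "tweets", labels, "balanced")
scenarios = transfer_scenarios(news, tweets)
```

Here only the text type changes between source and target, so the sketch reports the first scenario alone; several conditions can hold at once in real settings.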
Transfer Learning for NLP
Current status
• Not as straightforward as in CV
- No universal deep features
• However: “Simple” transfer through word
embeddings is pervasive
• History of research on task-specific transfer, e.g. for sentiment analysis and POS tagging, leveraging NLP phenomena such as structured features, sentiment words, etc.
• Little research on transfer between tasks
• More recently: representation-based research
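The "simple" transfer through word embeddings mentioned above can be sketched as follows: vectors learned elsewhere are reused as input features, so a word that never appears in the labeled target data still gets a meaningful representation. The vectors in `PRETRAINED` are hypothetical toy values, not real embeddings.

```python
import math

# Hypothetical toy vectors standing in for embeddings pretrained on a large unlabeled corpus.
PRETRAINED = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [-0.9, 0.1],
    "awful": [-0.8, 0.3],
}

def embed(tokens, table=PRETRAINED, dim=2):
    """Average the pretrained vectors of the known tokens."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Suppose the labeled target data only contains "good" and "bad".
seen_positive = embed(["good"])
seen_negative = embed(["bad"])
# "great" is unseen in the labeled data but covered by the pretrained table.
unseen = embed(["a", "great", "movie"])
closer_to_positive = cosine(unseen, seen_positive) > cosine(unseen, seen_negative)
```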
Our research
Research focus
Finding better ways to transfer knowledge to new domains, tasks, and languages that
1. perform well in large-scale settings and real-world applications;
2. are applicable to many tasks and models.
Current focus:
$P(X_S) \neq P(X_T)$: Training and test distributions are different.
Our research
Training and test distributions are different.
Different text types. Different accents/ages.
Different topics/categories.
Performance drop or even collapse is inevitable.
Our research
Transfer learning challenges in real-world applications
1. Domains are not well-defined, but fuzzy and conflate many factors.
[Figure labels: language, social factors, genre, topic.]
2. One-to-one adaptation is rare and many source domains are generally available.
3. Models need to be adapted frequently as conditions change, new data becomes available, etc.
Our research
How to adapt from large source domains?
• Idea: Use distillation + insights from semi-supervised learning to transfer knowledge from a single teacher (a) and multiple teachers (b) to a student model [9].
[Figure: panels (a) and (b).]
[9] Ruder, S., Ghaffari, P., & Breslin, J. G. (2017). Knowledge Adaptation: Teaching to Adapt. arXiv preprint arXiv:1702.02052.
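The distillation idea can be illustrated with generic soft-target distillation: teachers produce softened class probabilities on unlabeled target examples, and the student is trained to match them. This is a sketch of the general technique and not the method of the cited paper; the teacher weights, temperature, and logits below are assumptions chosen for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def combine_teachers(teacher_probs, weights):
    """Weighted average of the teachers' soft predictions."""
    total = sum(weights)
    n_classes = len(teacher_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, teacher_probs)) / total
        for i in range(n_classes)
    ]

def distillation_loss(student_logits, soft_targets, temperature=1.0):
    """Cross-entropy between the teachers' soft targets and the student's predictions."""
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student))

# Two teachers score one unlabeled target example (toy logits).
T = 2.0
teacher_a = softmax([2.0, 0.5], temperature=T)
teacher_b = softmax([1.0, 1.5], temperature=T)
# Hypothetical weights, e.g. reflecting how close each teacher's domain is to the target.
soft_targets = combine_teachers([teacher_a, teacher_b], weights=[0.7, 0.3])

agreeing = distillation_loss([2.0, 0.5], soft_targets, temperature=T)
disagreeing = distillation_loss([0.5, 2.0], soft_targets, temperature=T)
```

With a single teacher, `combine_teachers` reduces to passing that teacher's soft predictions through unchanged, which corresponds to case (a) on the slide.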
Our research
How to select data for adaptation?
• Idea: Take into account diversity of training data to select subsets (c) rather than an entire domain (a) or individual examples (b) [10].
[Figure: panels (a), (b), and (c).]
[10] Ruder, S., Ghaffari, P., & Breslin, J. G. (2017). Data Selection Strategies for Multi-Domain Sentiment Analysis. arXiv preprint arXiv:1702.02426.
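Selecting subsets of source data needs some score for how well a subset matches the target domain. The sketch below ranks candidate subsets by the Jensen-Shannon divergence between term distributions. It is one possible similarity signal chosen for illustration, it does not model diversity, and it is not claimed to be the selection strategy of the cited paper; the subset names and documents are made up.

```python
import math
from collections import Counter

def term_distribution(docs, vocab):
    """Relative term frequencies over a shared vocabulary (docs must be non-empty)."""
    counts = Counter(tok for doc in docs for tok in doc.split())
    total = sum(counts[t] for t in vocab)
    return [counts[t] / total for t in vocab]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Symmetric divergence between two distributions; 0 means identical."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_subsets(subsets, target_docs):
    """Order source subsets from most to least similar to the target documents."""
    all_docs = [doc for docs in subsets.values() for doc in docs] + list(target_docs)
    vocab = sorted({tok for doc in all_docs for tok in doc.split()})
    target = term_distribution(target_docs, vocab)
    scores = {
        name: js_divergence(term_distribution(docs, vocab), target)
        for name, docs in subsets.items()
    }
    return sorted(scores, key=scores.get)

subsets = {
    "books_a": ["great plot great characters"],
    "kitchen_b": ["sharp knife sturdy handle"],
}
target_docs = ["boring plot weak characters"]
ranking = rank_subsets(subsets, target_docs)
```

The top-ranked subsets could then be used as training data for the target domain, instead of a whole source domain or hand-picked individual examples.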
Our research
Opportunities and future directions
• Learn from past adaptation scenarios and
generalise across domains and tasks.
• Robust adaptation to non-English and low-resource languages.
• Adaptation for novel tasks and more sophisticated
models, e.g. QA and memory networks.
• Transfer across tasks and leveraging knowledge
from related tasks.
References
Image credit
• Google Research blog post [11]
• Mikolov, T., Joulin, A., & Baroni, M. (2015). A Roadmap towards Machine Intelligence. arXiv preprint arXiv:1511.08130.
• Google Research blog post [12]
Our papers
• Ruder, S., Ghaffari, P., & Breslin, J. G. (2017). Knowledge Adaptation: Teaching to Adapt. arXiv preprint arXiv:1702.02052.
• Ruder, S., Ghaffari, P., & Breslin, J. G. (2017). Data Selection Strategies for Multi-Domain Sentiment Analysis. arXiv preprint arXiv:1702.02426.
[11] https://research.googleblog.com/2016/10/how-robots-can-acquire-new-skills-from.html
[12] https://googleblog.blogspot.ie/2014/04/the-latest-chapter-for-self-driving-car.html
Thanks for your attention!
Questions?
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 

Recently uploaded (20)

Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxGood agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 

Transfer Learning -- The Next Frontier for Machine Learning

  • 1. Transfer Learning: The Next Frontier for ML
 Sebastian Ruder, Research Scientist, AYLIEN; PhD Candidate, Insight Centre
 @seb_ruder | 01.03.17 | LinkedIn Tech Talk
  • 2. Agenda: 1. What is Transfer Learning? 2. Why Transfer Learning now? 3. Transfer Learning in practice 4. Transfer Learning for NLP 5. Our research 6. Opportunities and directions
  • 3. What is Transfer Learning? Traditional ML: training and evaluation on the same task or domain. [Figure: Model A with task / domain A; Model B with task / domain B]
  • 4. What is Transfer Learning? Transfer learning: storing knowledge gained solving one problem and applying it to a different but related problem. [Figure: knowledge passes from the model for the source task / domain to the model for the target task / domain]
  • 6. “Transfer learning will be the next driver of ML success.” Andrew Ng, NIPS 2016 keynote
  • 7. Why Transfer Learning now? Drivers of ML success in industry (Andrew Ng, NIPS 2016 keynote). [Figure: commercial success over time, with 2016 marked, for supervised learning, transfer learning, unsupervised learning, and reinforcement learning]
  • 8. Why Transfer Learning now?
 1. Learn very accurate input-output mapping
 2. Maturity of ML models
 - Computer vision (5% error on ImageNet)
 - Automatic speech recognition (3x faster than typing, 20% more accurate [1])
 3. Large-scale deployment & adoption of ML models
 - Google’s NMT System [2]
 Huge reliance on labeled data
 Novel tasks / domains without (labeled) data
 [1] Ruan, S., Wobbrock, J. O., Liou, K., Ng, A., & Landay, J. (2016). Speech Is 3x Faster than Typing for English and Mandarin Text Entry on Mobile Devices. arXiv preprint arXiv:1608.07323.
 [2] Wu, Y., Schuster, M., Chen, Z., Le, Q. V, Norouzi, M., Macherey, W., … Dean, J. (2016). Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144.
  • 9. Transfer Learning in practice: Computer vision
 • Train new model on features of large model trained on ImageNet [3]
 • Train model to confuse source and target domains [4]
 • Train model on domain-invariant representations [5, 6]
 [3] Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 512–519.
 [4] Ganin, Y., & Lempitsky, V. (2015). Unsupervised Domain Adaptation by Backpropagation. Proceedings of the 32nd International Conference on Machine Learning, 37.
 [5] Bousmalis, K., Trigeorgis, G., Silberman, N., Krishnan, D., & Erhan, D. (2016). Domain Separation Networks. NIPS 2016.
 [6] Sener, O., Song, H. O., Saxena, A., & Savarese, S. (2016). Learning Transferrable Representations for Unsupervised Domain Adaptation. NIPS 2016.
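 The first bullet on the computer vision slide (train a new model on the features of a large pretrained model) can be sketched roughly as follows. This is a minimal illustration, not the setup used in the cited work: `frozen_features` is a hypothetical stand-in for a pretrained network's fixed output, and the toy inputs and labels are invented. Only the small linear head is trained on the target data.

```python
import math

def frozen_features(x):
    """Hypothetical stand-in for a pretrained network's fixed features; never updated here."""
    a, b = x
    return [math.tanh(a + b), math.tanh(a - b), 1.0]  # last entry acts as a bias feature

def train_linear_head(features, labels, epochs=200, lr=0.5):
    """Logistic regression on fixed features, trained by batch gradient descent."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-sum(wi * fi for wi, fi in zip(w, f))))
            for i, fi in enumerate(f):
                grad[i] += (p - y) * fi
        w = [wi - lr * g / len(features) for wi, g in zip(w, grad)]
    return w

def predict(w, x):
    """Classify a raw input by passing it through the frozen features, then the head."""
    f = frozen_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0

# Invented target-task data: only the head's weights are learned from it.
target_inputs = [(1.0, 0.5), (0.8, 1.2), (-1.0, -0.4), (-0.7, -1.1)]
target_labels = [1, 1, 0, 0]
head = train_linear_head([frozen_features(x) for x in target_inputs], target_labels)
predictions = [predict(head, x) for x in target_inputs]
```

 Keeping the feature extractor fixed means the target task only has to fit a handful of weights, which is why this approach can work with little labeled target data.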
  • 10. Transfer Learning in practice: Reinforcement learning
 • Progressive Neural Networks [7] have access to weights from trained models
 • PathNet [8] learns weight paths via a genetic algorithm
 [7] Rusu, A. A., Rabinowitz, N. C., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., … Hadsell, R. (2016). Progressive Neural Networks. arXiv preprint arXiv:1606.04671.
 [8] Fernando, C., Banarse, D., Blundell, C., Zwols, Y., Ha, D., Rusu, A. A., … Wierstra, D. (2017). PathNet: Evolution Channels Gradient Descent in Super Neural Networks. arXiv preprint arXiv:1701.08734.
  • 11. Transfer Learning for NLP: A (slightly) more technical definition
 • Task $\mathcal{T}$ and domain $\mathcal{D}$
 • Domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ where
 - $\mathcal{X}$: feature space, e.g. BOW representations
 - $P(X)$: e.g. distribution over terms in documents
 • Task $\mathcal{T} = \{\mathcal{Y}, P(Y|X)\}$ where
 - $\mathcal{Y}$: label space, e.g. true/false labels
 - $P(Y|X)$: learned mapping from samples to labels
 • Transfer learning: learning when $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$
  • 12. Transfer Learning for NLP: Transfer scenarios
 1. $P(X_S) \neq P(X_T)$: Different topics, text types, etc.
 2. $\mathcal{X}_S \neq \mathcal{X}_T$: Different languages.
 3. $P(Y_S|X_S) \neq P(Y_T|X_T)$: Unbalanced classes.
 4. $\mathcal{Y}_S \neq \mathcal{Y}_T$: Different tasks.
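 To make the first scenario concrete (same feature space, different marginal distributions over terms), the sketch below estimates a smoothed unigram distribution for a source and a target corpus over a shared vocabulary and measures how far apart they are. The choice of Jensen-Shannon divergence and the toy documents are assumptions made for illustration; the slides do not prescribe a particular measure.

```python
import math
from collections import Counter

def term_distribution(docs, vocab):
    """Estimate P(X) as an add-one smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts[t] + 1 for t in vocab)
    return {t: (counts[t] + 1) / total for t in vocab}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 means the two distributions are identical."""
    def kl(a, b):
        return sum(a[t] * math.log2(a[t] / b[t]) for t in a if a[t] > 0)
    m = {t: 0.5 * (p[t] + q[t]) for t in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Invented example corpora: product reviews as the source, film reviews as the target.
source_docs = ["the battery life of this laptop is great", "fast laptop with a sharp screen"]
target_docs = ["the plot of this movie is great", "a slow movie with a weak ending"]
vocab = sorted({tok for d in source_docs + target_docs for tok in d.lower().split()})

p_source = term_distribution(source_docs, vocab)
p_target = term_distribution(target_docs, vocab)
shift = js_divergence(p_source, p_target)  # larger values indicate a larger domain shift
```

 Because both distributions are defined over the same vocabulary, the feature space is shared and only $P(X)$ differs, which is what separates scenario 1 from scenario 2.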
  • 13. Transfer Learning for NLP: Current status
 • Not as straightforward as in CV: no universal deep features
 • However: “Simple” transfer through word embeddings is pervasive
 • History of research for task-specific transfer, e.g. sentiment analysis and POS tagging, leveraging NLP phenomena such as structured features, sentiment words, etc.
 • Little research on transfer between tasks
 • More recently: representation-based research
  • 14. Our research: Research focus
 Finding better ways to transfer knowledge to new domains, tasks, and languages that
 1. perform well in large-scale settings and real-world applications;
 2. are applicable to many tasks and models.
 Current focus: $P(X_S) \neq P(X_T)$: Training and test distributions are different.
  • 15. Our research: Training and test distributions are different. Different text types. Different accents/ages. Different topics/categories. Performance drop or even collapse is inevitable.
  • 16. Our research: Transfer learning challenges in real-world applications
 1. Domains are not well-defined, but fuzzy and conflate many factors. [Figure: language, social factors, genre, topic]
 2. One-to-one adaptation is rare and many source domains are generally available.
 3. Models need to be adapted frequently as conditions change, new data becomes available, etc.
  • 17. Our research: How to adapt from large source domains?
 • Idea: Use distillation + insights from semi-supervised learning to transfer knowledge from a single teacher (a) and multiple teachers (b) to a student model [9].
 [9] Ruder, S., Ghaffari, P., & Breslin, J. G. (2017). Knowledge Adaptation: Teaching to Adapt. arXiv preprint arXiv:1702.02052.
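 The distillation idea on this slide can be sketched in a few lines: a student is trained to match the softened output distribution of one teacher, or a weighted combination of several. This is a generic illustration of distillation under stated assumptions, not the method of the cited paper. The temperature, the logits, and the teacher weights are all invented, and how the weights would be chosen is left open.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def combine_teachers(teacher_probs, weights):
    """Weighted average of teacher distributions; weights are normalised to sum to 1."""
    total = sum(weights)
    n = len(teacher_probs[0])
    return [sum(w / total * p[i] for w, p in zip(weights, teacher_probs)) for i in range(n)]

def distillation_loss(student_logits, soft_targets, temperature=1.0):
    """Cross-entropy between the soft targets and the student's softened predictions."""
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student))

T = 2.0  # assumed temperature
teacher_a = softmax([2.0, 0.5], T)   # hypothetical teacher from source domain A
teacher_b = softmax([0.2, 1.0], T)   # hypothetical teacher from source domain B
targets = combine_teachers([teacher_a, teacher_b], weights=[0.75, 0.25])

loss_agree = distillation_loss([1.5, 0.3], targets, T)      # student close to the teachers
loss_disagree = distillation_loss([-1.5, 2.0], targets, T)  # student far from the teachers
```

 With a single teacher, `combine_teachers` is unnecessary and the teacher's softened output serves directly as the target. No target-domain labels appear anywhere in the loss, which is what makes the approach usable for unlabeled target data.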
  • 18. Our research: How to select data for adaptation?
 • Idea: Take into account diversity of training data to select subsets (c) rather than an entire domain (a) or individual examples (b) [10].
 [10] Ruder, S., Ghaffari, P., & Breslin, J. G. (2017). Data Selection Strategies for Multi-Domain Sentiment Analysis. arXiv preprint arXiv:1702.02426.
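 One simple way to picture subset selection that accounts for diversity is a greedy procedure that rewards similarity to the target data and penalises overlap with subsets already chosen. The sketch below is an assumption-laden illustration, not the strategy from the cited paper: the bag-of-words cosine similarity, the `diversity_weight` parameter, and the toy subsets are all hypothetical.

```python
import math
from collections import Counter

def bow(docs):
    """Bag-of-words counts for a list of documents."""
    return Counter(tok for doc in docs for tok in doc.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_subsets(candidates, target, k, diversity_weight=0.5):
    """Greedily pick k subsets: similar to the target, dissimilar to earlier picks."""
    vectors = {name: bow(docs) for name, docs in candidates.items()}
    target_vec = bow(target)
    selected = []
    while len(selected) < min(k, len(vectors)):
        def score(name):
            relevance = cosine(vectors[name], target_vec)
            redundancy = max((cosine(vectors[name], vectors[s]) for s in selected), default=0.0)
            return relevance - diversity_weight * redundancy
        best = max((n for n in vectors if n not in selected), key=score)
        selected.append(best)
    return selected

# Invented source subsets and target data.
candidates = {
    "laptops_a": ["great battery and fast screen"],
    "laptops_b": ["great battery and fast screen too"],
    "phones": ["great camera but weak battery"],
    "movies": ["boring plot and weak ending"],
}
target = ["great phone camera and fast battery"]

with_diversity = select_subsets(candidates, target, k=2)
without_diversity = select_subsets(candidates, target, k=2, diversity_weight=0.0)
```

 In this toy setup, ignoring diversity picks the two near-duplicate laptop subsets, while the diversity penalty swaps the second one for the phone subset, which adds vocabulary the first pick does not cover.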
  • 19. Our research: Opportunities and future directions
 • Learn from past adaptation scenarios and generalise across domains and tasks.
 • Robust adaptation to non-English and low-resource languages.
 • Adaptation for novel tasks and more sophisticated models, e.g. QA and memory networks.
 • Transfer across tasks and leveraging knowledge from related tasks.
  • 20. References
 Image credit
 • Google Research blog post [11]
 • Mikolov, T., Joulin, A., & Baroni, M. (2015). A Roadmap towards Machine Intelligence. arXiv preprint arXiv:1511.08130.
 • Google Research blog post [12]
 Our papers
 • Ruder, S., Ghaffari, P., & Breslin, J. G. (2017). Knowledge Adaptation: Teaching to Adapt. arXiv preprint arXiv:1702.02052.
 • Ruder, S., Ghaffari, P., & Breslin, J. G. (2017). Data Selection Strategies for Multi-Domain Sentiment Analysis. arXiv preprint arXiv:1702.02426.
 [11] https://research.googleblog.com/2016/10/how-robots-can-acquire-new-skills-from.html
 [12] https://googleblog.blogspot.ie/2014/04/the-latest-chapter-for-self-driving-car.html
  • 21. Thanks for your attention! Questions?