Semi-Supervised Learning 
Lukas Tencer 
PhD student @ ETS
Motivation
Image Similarity 
- Domain of origin 
Face Recognition 
- Cross-race effect 
Motivation in Machine Learning
Methodology
When to use Semi-Supervised Learning?
• Labelled data is hard to get and expensive
  – Speech analysis: the Switchboard dataset took 400 hours of annotation time for 1 hour of speech
  – Natural Language Processing: the Penn Chinese Treebank took 2 years for 4,000 sentences
  – Medical applications: require expert opinion, which might not be unique
• Unlabelled data is cheap
Types of Semi-Supervised Learning
• Transductive Learning
  – Does not generalize to unseen data
  – Produces labels only for the data available at training time
    1. Assume labels
    2. Train a classifier on the assumed labels
• Inductive Learning
  – Does generalize to unseen data
  – Produces not only labels but also the final classifier
  – Manifold assumption
Selected Semi-Supervised Algorithms
• Self-Training
• Help-Training
• Transductive SVM (S3VM)
• Multiview Algorithms
• Graph-Based Algorithms
• Generative Models
• …
Self-Training
• The Idea: if I am highly confident in the label of an example, I am right
• Given a training set T = {x_i} and an unlabelled set U = {u_j}:
  1. Train f on T
  2. Get predictions P = f(U)
  3. If P_i > α, add (x_i, f(x_i)) to T
  4. Retrain f on T
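A minimal sketch of this loop in Python with scikit-learn, assuming a Gaussian Naive Bayes as f (the function name, the 0.9 threshold, and the iteration cap are illustrative, not from the slides):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def self_train(T_X, T_y, U_X, alpha=0.9, max_iter=10):
    """Self-training loop: add high-confidence predictions on U back into T."""
    f = GaussianNB()
    for _ in range(max_iter):
        f.fit(T_X, T_y)                      # 1. train f on T
        if len(U_X) == 0:
            break
        P = f.predict_proba(U_X)             # 2. get predictions P = f(U)
        keep = P.max(axis=1) > alpha         # 3. keep only confident labels
        if not keep.any():
            break
        T_X = np.vstack([T_X, U_X[keep]])    # add (x, f(x)) to T
        T_y = np.concatenate([T_y, f.classes_[P[keep].argmax(axis=1)]])
        U_X = U_X[~keep]                     # 4. retrain f on the grown T
    return f
```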
Self-Training
• Advantages:
  – Very simple and fast method
  – Frequently used in NLP
• Disadvantages:
  – Amplifies noise in the labelled data
  – Requires an explicit definition of P(y|x)
  – Hard to implement for discriminative classifiers (SVM)
Self-Training 
1. Naïve Bayes classifier on Bag-of-Visual-Words for 2 classes
2. Classify unlabelled data based on the learned classifier
Self-Training 
3. Add the most confident images to the training set 
4. Retrain and repeat 
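scikit-learn ships a ready-made version of this loop; a hedged usage sketch of it (assuming scikit-learn ≥ 0.24, where -1 marks unlabelled rows; the toy bag-of-visual-words counts below are illustrative):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 50))       # toy bag-of-visual-words counts
y = np.full(100, -1)                         # -1 marks unlabelled images
y[:10] = rng.integers(0, 2, size=10)         # a handful of labelled images

# Naive Bayes base learner; only predictions above the threshold get added.
model = SelfTrainingClassifier(MultinomialNB(), threshold=0.9)
model.fit(X, y)
print(model.predict(X[:5]))
```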
Help-Training
• The Challenge: how to make Self-Training work for discriminative classifiers (SVM)?
• The Idea: train a generative helper classifier to get p(y|x)
• Given a training set T = {x_i}, an unlabelled set U = {u_j}, a generative classifier g, and a discriminative classifier f:
  1. Train f and g on T
  2. Get predictions P_g = g(U) and P_f = f(U)
  3. If P_g,i > α, add (x_i, f(x_i)) to T
  4. Reduce α if no prediction satisfies P_g,i > α
  5. Retrain f and g on T; repeat until U is empty
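A sketch of this loop under stated assumptions: Gaussian Naive Bayes as the generative helper g, a linear SVM as f, and an illustrative starting α with a simple decay schedule (the 0.5 floor is my addition, to guarantee termination):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

def help_train(T_X, T_y, U_X, alpha=0.95, decay=0.05):
    """Generative g gates which points enter T; discriminative f labels them."""
    g, f = GaussianNB(), LinearSVC()
    while len(U_X) > 0 and alpha > 0.5:
        g.fit(T_X, T_y)                          # 1. train f and g on T
        f.fit(T_X, T_y)
        P_g = g.predict_proba(U_X).max(axis=1)   # 2. generative confidence on U
        keep = P_g > alpha
        if not keep.any():
            alpha -= decay                       # 4. relax alpha if nothing passes
            continue
        T_X = np.vstack([T_X, U_X[keep]])        # 3. add (x, f(x)) to T
        T_y = np.concatenate([T_y, f.predict(U_X[keep])])
        U_X = U_X[~keep]
    f.fit(T_X, T_y)                              # 5. final retrain
    return f
```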
Transductive SVM (S3VM)
• The Idea: find the largest-margin classifier such that the unlabelled data lie outside the margin as much as possible; use regularization over the unlabelled data
• Given a training set T = {x_i} and an unlabelled set U = {u_j}:
  1. Enumerate all possible labelings U_1, …, U_n of U
  2. For each T_k = T ∪ U_k, train a standard SVM
  3. Choose the SVM with the largest margin
• What is the catch? This is an NP-hard problem; fortunately, approximations exist
Transductive SVM (S3VM)
• Solving a non-convex optimization problem:
  J(θ) = 1/2 ‖w‖² + c1 Σ_{x_i ∈ T} L(y_i f_θ(x_i)) + c2 Σ_{x_i ∈ U} L(|f_θ(x_i)|)
• Methods:
  – Local combinatorial search
  – Standard unconstrained optimization solvers (CG, BFGS, …)
  – Continuation methods
  – Concave-Convex Procedure (CCCP)
  – Branch and bound
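To make the objective concrete, here is a hedged sketch that evaluates J(θ) for a linear f_θ(x) = w·x + b with hinge loss L(t) = max(0, 1 − t); the function name and the default weights c1, c2 are illustrative:

```python
import numpy as np

def s3vm_objective(w, b, X_T, y_T, X_U, c1=1.0, c2=0.1):
    """J(theta) = 1/2 ||w||^2 + c1 * labelled hinge + c2 * unlabelled hinge."""
    hinge = lambda t: np.maximum(0.0, 1.0 - t)   # L(t) = max(0, 1 - t)
    f_T = X_T @ w + b                            # decision values on labelled T
    f_U = X_U @ w + b                            # decision values on unlabelled U
    return (0.5 * w @ w
            + c1 * hinge(y_T * f_T).sum()        # labelled: correct side of margin
            + c2 * hinge(np.abs(f_U)).sum())     # unlabelled: pushed out of margin
```

The L(|f_θ(x_i)|) term over U is what makes J non-convex, which is why the methods above are all heuristic or global-search procedures.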
Transductive SVM (S3VM)
• Advantages:
  – Can be used with any SVM
  – Clear optimization criterion, mathematically well formulated
• Disadvantages:
  – Hard to optimize
  – Prone to local minima (non-convex)
  – Only a small gain given modest assumptions
Multiview Algorithms
• The Idea: train 2 classifiers on 2 disjoint sets of features, then let each classifier label unlabelled examples and teach the other classifier
• Given a training set T = {x_i} and an unlabelled set U = {u_j}:
  1. Split T into T_1 and T_2 along the feature dimension
  2. Train f_1 on T_1 and f_2 on T_2
  3. Get predictions P_1 = f_1(U) and P_2 = f_2(U)
  4. Add the top k from P_1 to T_2 and the top k from P_2 to T_1
  5. Repeat until U is empty
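A co-training sketch under these assumptions: two logistic regressions as f_1 and f_2, the two views taken as a column split of one feature matrix, and an illustrative k (when both classifiers pick the same point, the more confident view provides the label):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(T_X, T_y, U_X, split, k=5, rounds=10):
    """Columns [:split] form view 1, columns [split:] form view 2."""
    f1, f2 = LogisticRegression(), LogisticRegression()
    for _ in range(rounds):
        if len(U_X) == 0:
            break
        f1.fit(T_X[:, :split], T_y)                      # train on each view
        f2.fit(T_X[:, split:], T_y)
        P1 = f1.predict_proba(U_X[:, :split]).max(axis=1)
        P2 = f2.predict_proba(U_X[:, split:]).max(axis=1)
        # each classifier teaches the other with its top-k confident picks
        top = np.unique(np.concatenate([np.argsort(-P1)[:k],
                                        np.argsort(-P2)[:k]]))
        labels = np.where(P1[top] >= P2[top],            # more confident view wins
                          f1.predict(U_X[top][:, :split]),
                          f2.predict(U_X[top][:, split:]))
        T_X = np.vstack([T_X, U_X[top]])
        T_y = np.concatenate([T_y, labels])
        U_X = np.delete(U_X, top, axis=0)                # repeat until U is empty
    return f1, f2
```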
Multiview Algorithms 
• Application: Web-page Topic Classification 
– 1. Classifier for Images; 2. Classifier for Text 
Multiview Algorithms
• Advantages:
  – Simple method applicable to any classifier
  – The 2 classifiers can correct each other's classification mistakes
• Disadvantages:
  – Assumes conditional independence between the feature sets
  – A natural split may not exist
  – An artificial split may be complicated if there are only a few features
Graph-Based Algorithms
• The Idea: create a connected graph from the labelled and unlabelled examples, then propagate labels over the graph
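scikit-learn implements two algorithms of this family; a hedged sketch using LabelSpreading on a toy two-moons set, with an RBF-kernel graph and -1 marking unlabelled points (the gamma value is illustrative):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)
y = np.full(200, -1)                          # -1 marks unlabelled examples
for c in (0, 1):                              # reveal one labelled point per class
    y[np.where(y_true == c)[0][0]] = c

model = LabelSpreading(kernel='rbf', gamma=20)
model.fit(X, y)                               # build the graph, propagate labels
print((model.transduction_ == y_true).mean()) # transductive accuracy
```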
Graph-Based Algorithms
• Advantages:
  – Great performance if the graph fits the task
  – Can be used in combination with any model
  – Explicit mathematical formulation
• Disadvantages:
  – Problems if the graph does not fit the task
  – Hard to construct a graph in sparse spaces
Generative Models
• The Idea: assume a distribution using the labelled data, then update it using the unlabelled data
• A simple model: GMM + EM
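A minimal EM sketch under the stated assumption of one Gaussian component per class: labelled points keep fixed (one-hot) responsibilities, and only the unlabelled block is re-estimated (function name, ridge term, and iteration count are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def ssl_gmm(X_T, y_T, X_U, n_iter=20):
    """One Gaussian per class, fit on T, responsibilities updated on U via EM."""
    classes = np.unique(y_T)
    K, X = len(classes), np.vstack([X_T, X_U])
    # fixed one-hot responsibilities for T, uniform start for U
    R = np.vstack([np.eye(K)[np.searchsorted(classes, y_T)],
                   np.full((len(X_U), K), 1.0 / K)])
    for _ in range(n_iter):
        # M-step: weighted priors, means, covariances over T and U together
        pi = R.mean(axis=0)
        mu = (R.T @ X) / R.sum(axis=0)[:, None]
        cov = [np.cov(X.T, aweights=R[:, k]) + 1e-6 * np.eye(X.shape[1])
               for k in range(K)]
        # E-step: update responsibilities on the unlabelled block only
        lik = np.column_stack([pi[k] * mvn.pdf(X_U, mu[k], cov[k])
                               for k in range(K)])
        R[len(X_T):] = lik / lik.sum(axis=1, keepdims=True)
    return classes[R[len(X_T):].argmax(axis=1)]   # predicted labels for U
```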
Generative Models
• Advantages:
  – Nice probabilistic framework
  – Instead of EM you can go full Bayesian and include a prior with MAP
• Disadvantages:
  – EM finds only a local optimum
  – Makes strong assumptions about the class distributions
What could go wrong?
• Semi-Supervised Learning makes a lot of assumptions
  – Smoothness
  – Clusters
  – Manifolds
• Some techniques (Co-Training) require a very specific setup
• Noisy labels are a frequent problem
• There is no free lunch
There is much more out there
• Structural Learning
• Co-EM
• Tri-Training
• Co-Boosting
• Unsupervised pretraining – deep learning
• Transductive Inference
• Universum Learning
• Active Learning + Semi-Supervised Learning
• …
Demo
Conclusion
• Play with Semi-Supervised Learning
• Basic methods are very simple to implement and can give you up to a 5 to 10% accuracy gain
• You can cheat at competitions by using unlabelled data; often no assumption is made about external data
• Be careful when running Semi-Supervised Learning in a production environment; keep an eye on your algorithm
• If running in production, be aware that data patterns change and old assumptions about labels may corrupt your new unlabelled data
Some more resources
Videos to watch:
• Semi-Supervised Learning Approaches – Tom Mitchell, CMU: http://videolectures.net/mlas06_mitchell_sla/
• MLSS 2012 Graph-Based Semi-Supervised Learning – Zoubin Ghahramani, Cambridge: https://www.youtube.com/watch?v=HZQOvm0fkLA
Books to read:
• Semi-Supervised Learning – Chapelle, Schölkopf, Zien
• Introduction to Semi-Supervised Learning – Zhu, Goldberg (Brachman, Dietterich, eds.)
THANKS FOR YOUR TIME
Lukas Tencer
lukas.tencer@gmail.com
http://lukastencer.github.io/
https://github.com/lukastencer
https://twitter.com/lukastencer
Graduating August 2015, looking for ML and DS opportunities