- 1. Nov. 23rd, 2009 { On SSL, and beyond } - Theories, Methods, and a Possible Suggestion on Semi-Supervised Learning - Lab Seminar Presentation Eunjeong Park
- 2. Agenda 1. Background 2. Semi-Supervised Learning Methods 3. Assumptions on SSL 4. Future Work
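- 4. Background Examples (1/2) • Spam E-mail Classification [Figure: e-mails sorted into "inbox" and "spam", with unlabeled e-mails marked "?"]
- 5. Background Examples (2/2) • Response Modeling [Figure: customers labeled as respondents or non-respondents, with unlabeled customers marked "?"]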
- 6. Background the Question (1/2) • Statistical learning methods require LOTS of training data – But we usually have only a limited amount of labeled data – Can we figure out a way for our learning algorithms to take advantage of all the unlabeled data? [Figure: labeled vs. unlabeled data]
- 7. Background the Question (2/2) • Learn f: x→y from labeled pairs <xi, yi> and unlabeled examples <xi> …? • Text/Web Mining – Document classification (f: Doc → Class; spam filtering, web page classification) – Information extraction (f: Sentence → Fact, f: Doc → Fact) – Translation (f: EnglishDoc → FrenchDoc) • Marketing – Response Modeling (f: Demo+RFM → Response) – Fraud Detection (f: Demo+PaymentHistory → Fraud) – Customer Segmentation (f: Demo+RFM → Customer Seg.)
- 9. Semi-Supervised Learning Methodology [1] • Generative models – Unlabeled data is used to either modify or reprioritize hypotheses obtained from labeled data alone – From Bayes' rule, $P(y \mid x) = \frac{p(x \mid y)\,P(y)}{p(x)}$, we can see that p(x) influences p(y|x) – Mixture models with EM are in this category, and to some extent self-training, too • Discriminative models – Plain discriminative training cannot be used for SSL, since p(y|x) is estimated ignoring p(x) – To solve this problem, p(x)-dependent terms are often brought into the objective function, which amounts to assuming that p(y|x) and p(x) share parameters – Transductive SVMs, Gaussian processes, information regularization, and graph-based methods are in this category ※ For more on GM and DM, refer to Appendix 1.
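One way to make the generative argument on this slide explicit is to write the model with a shared parameter vector. The symbol θ is an assumption introduced here for illustration; it does not appear on the slides.

```latex
P(y \mid x, \theta) = \frac{p(x \mid y, \theta)\, P(y \mid \theta)}{p(x \mid \theta)},
\qquad
p(x \mid \theta) = \sum_{y'} p(x \mid y', \theta)\, P(y' \mid \theta)
```

Under this reading, unlabeled examples constrain θ through the marginal p(x | θ), and the same θ determines P(y | x, θ). That is the sense in which p(x) influences p(y|x).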
- 10. Semi-Supervised Learning Previous methods • EM w/ Generative Mixture Models (Nigam et al., 2000; Miller & Uyar, 1997) • Self-Training • Co-Training and Multiview Learning (Blum & Mitchell, 1998; Goldman & Zhou, 2000) • TSVMs (Bennett et al., 1999; Joachims, 1999) • Gaussian Processes • Information Regularization • Entropy Minimization • Graph-based methods (Blum & Chawla, 2001) (Reorganized from Refs. [1], [2]) ※ For more on the use of the above methods, refer to Appendix 2.
- 11. Semi-Supervised Learning Previous methods: EM w/ Generative Models (1/3) [Figures: the basic EM algorithm; EM incorporated w/ unlabeled data [3]]
- 12. Semi-Supervised Learning Previous methods: EM w/ Generative Models (2/3) • In a binary classification problem, if we assume each class has a Gaussian distribution, then we can use unlabeled data to help parameter estimation. [1]
- 13. Semi-Supervised Learning Previous methods: EM w/ Generative Models (3/3)
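The Gaussian-per-class idea from the EM slides can be sketched roughly as follows. This is a minimal illustration, not the procedure from [1] or [3]: it assumes 1-D features and two classes, and the function names, the variance floor, and the toy data are all made up for the example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, n_iter=50, var_floor=1e-3):
    """labeled: list of (x, y) with y in {0, 1}; unlabeled: list of x.

    Assumes each class has at least one labeled example.
    Returns [(prior, mean, var), (prior, mean, var)] for classes 0 and 1.
    """
    # Initialise the parameters from the labeled data alone.
    params = []
    for c in (0, 1):
        xs = [x for x, y in labeled if y == c]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs) + var_floor
        params.append((len(xs) / len(labeled), mean, var))

    for _ in range(n_iter):
        # E-step: labeled points keep their known class,
        # unlabeled points get soft class posteriors.
        resp = [(x, (1.0 - y, float(y))) for x, y in labeled]
        for x in unlabeled:
            joint = [p * gaussian_pdf(x, m, v) for p, m, v in params]
            total = sum(joint)
            if total == 0.0:  # numerical underflow: fall back to "don't know"
                resp.append((x, (0.5, 0.5)))
            else:
                resp.append((x, tuple(j / total for j in joint)))

        # M-step: re-estimate priors, means and variances from all points.
        params = []
        for c in (0, 1):
            weight = sum(r[c] for _, r in resp)
            mean = sum(r[c] * x for x, r in resp) / weight
            var = sum(r[c] * (x - mean) ** 2 for x, r in resp) / weight + var_floor
            params.append((weight / len(resp), mean, var))
    return params


def predict(params, x):
    """Return the class with the larger joint density p(x | y) P(y)."""
    joint = [p * gaussian_pdf(x, m, v) for p, m, v in params]
    return 0 if joint[0] >= joint[1] else 1


# Toy data (hypothetical): four labeled points, ten unlabeled points.
labeled = [(-2.5, 0), (-1.5, 0), (1.5, 1), (2.5, 1)]
unlabeled = [-3.0, -2.4, -2.1, -1.8, -1.2, 1.1, 1.9, 2.2, 2.6, 3.1]
params = semi_supervised_em(labeled, unlabeled)
```

Here the unlabeled points only refine the class means, variances, and priors. If the Gaussian assumption does not match the data, the same mechanism can pull the estimates in the wrong direction.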
- 14. Semi-Supervised Learning Previous methods: Co-Training (1/4) [Figure: a web page, "Professor Cho", and a hyperlink whose text reads "My Advisor"]
- 15. Semi-Supervised Learning Previous methods: Co-Training (2/4) • Key Idea: Classifier1 and Classifier2 must… – Correctly classify labeled examples – Agree on the classification of unlabeled examples • Classifier 1: Hyperlinks only • Classifier 2: Page only [Figure: the "Professor Cho" page and the "My Advisor" hyperlink]
- 16. Semi-Supervised Learning Previous methods: Co-Training (3/4) [4] • Given: labeled data L, unlabeled data U • Loop: – Train g1 (hyperlink classifier) using L – Train g2 (page classifier) using L – Allow g1 to label p positive, n negative examples from U – Allow g2 to label p positive, n negative examples from U – Add these self-labeled examples to L [Figure: Classifier1 and Classifier2 producing Answer1 and Answer2 for the "Professor Cho" page and the "My Advisor" hyperlink]
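The loop on this slide can be sketched as below. This is only an illustration under strong assumptions: each view is reduced to a single number, g1 and g2 are simple nearest-centroid classifiers, and every name and data point is hypothetical (the slides do not specify the classifiers).

```python
def train_centroid(points):
    """points: list of (feature, label), label in {0, 1}. Returns (c0, c1)."""
    centroids = []
    for label in (0, 1):
        vals = [f for f, y in points if y == label]
        centroids.append(sum(vals) / len(vals))
    return tuple(centroids)


def score(centroids, feature):
    """Positive when class 1's centroid is closer; magnitude acts as confidence."""
    c0, c1 = centroids
    return abs(feature - c0) - abs(feature - c1)


def co_train(labeled, unlabeled, p=1, n=1, rounds=10):
    """labeled: list of ((view1, view2), y); unlabeled: list of (view1, view2).

    Assumes p >= 1 and n >= 1, and at least one labeled example per class.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Train g1 and g2 on L, each using only its own view.
        g1 = train_centroid([(x[0], y) for x, y in labeled])
        g2 = train_centroid([(x[1], y) for x, y in labeled])
        newly = {}
        for view, g in ((0, g1), (1, g2)):
            ranked = sorted(pool, key=lambda x: score(g, x[view]))
            # n most confident negatives and p most confident positives.
            for x in ranked[:n]:
                if score(g, x[view]) < 0:
                    newly.setdefault(x, 0)  # on conflict, g1's label is kept
            for x in ranked[-p:]:
                if score(g, x[view]) > 0:
                    newly.setdefault(x, 1)
        if not newly:
            break
        # Add the self-labeled examples to L and remove them from U.
        labeled.extend(newly.items())
        pool = [x for x in pool if x not in newly]
    return labeled


# Toy data (hypothetical): view 1 = hyperlink feature, view 2 = page feature.
L = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
U = [(0.1, 0.3), (0.2, 0.0), (0.9, 0.7), (0.8, 1.0), (0.4, 0.2), (0.6, 0.8)]
result = co_train(L, U)
labels = dict(result)
```

In this toy run the two classifiers first pick up the easy examples near each centroid. The examples closer to the boundary are labeled in a later round, once L has grown.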
- 17. Semi-Supervised Learning Previous methods: Co-Training (4/4) • Experimental Settings: – Begin with 12 labeled web pages (academic course) – Provide 1,000 additional unlabeled web pages • Results: – Average error, learning from labeled data: 11.1% – Average error, co-training: 5.0%
- 18. Semi-Supervised Learning Previous methods: TSVMs [Figure: scatter of examples marked + and -]
- 19. Semi-Supervised Learning Previous methods: Graph-based methods • Key idea: Define a graph where… – nodes are the labeled and unlabeled examples in the dataset, and – edges (possibly weighted) reflect the similarity of examples – Then, nodes connected by a large-weight edge tend to have the same label, and labels can propagate throughout the graph • Note: Graph-based methods enjoy nice properties from spectral graph theory
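A rough sketch of how labels might propagate over such a graph is given below. It is one simple iterative scheme, not necessarily the method of any paper cited on these slides: it assumes 1-D points, a Gaussian similarity for the edge weights, and labeled nodes held fixed. All names and data are hypothetical.

```python
import math


def propagate_labels(points, labels, sigma=1.0, n_iter=100):
    """points: list of floats; labels: dict {node index: 0 or 1} for labeled nodes.

    Returns one score per node in [0, 1], read as the estimated
    probability of class 1.
    """
    n = len(points)
    # Edge weights from a Gaussian similarity (an assumed choice of kernel).
    w = [[0.0 if i == j
          else math.exp(-((points[i] - points[j]) ** 2) / (2.0 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Labeled nodes start at their label, unlabeled nodes at 0.5.
    f = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(n_iter):
        new_f = list(f)
        for i in range(n):
            if i in labels:
                continue  # labeled nodes stay clamped to their labels
            # Each unlabeled node takes the weighted average of its neighbours.
            new_f[i] = sum(w[i][j] * f[j] for j in range(n)) / sum(w[i])
        f = new_f
    return f


# Toy data (hypothetical): two clusters with one labeled node in each.
points = [0.0, 0.5, 1.0, 4.0, 4.5, 5.0]
scores = propagate_labels(points, {0: 0, 5: 1})
```

Nodes joined by large-weight edges end up with similar scores, so each cluster inherits the label of its single labeled node.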
- 21. Assumptions on SSL The Utility of Unlabeled Data • Many SSL papers start with an introduction like… “labeled data…is often very difficult and expensive to obtain, and thus…unlabeled data holds significant promise in terms of vastly expanding the applicability of learning methods” [5] …but is this necessarily true? – No! Do not take it for granted! – Even though you don’t have to spend as much time labeling training data, you still need to spend much effort designing good models / features / kernels / similarity functions for SSL! • A good match between problem structure and model assumptions is necessary to use unlabeled data effectively – A bad match can degrade classifier performance
- 22. Assumptions on SSL An Example (1/2) • Unlabeled data can degrade the classification performance of generative classifiers [6] [Figure: Naive Bayes classifier learned from data generated by a Naive Bayes model (left) and by a TAN model (right). Each point summarizes 10 runs of each classifier on testing data; bars cover the 30th to 70th percentiles.]
- 23. Assumptions on SSL An Example (2/2) [Figure: e-mails with Spam=0 and Spam=1 plotted against the number of occurrences of the word ‘Loan’] • Q1: Is this e-mail spam? • Q2: Was this e-mail written on a Sunday?
- 25. Future Work Multi-Edge Graph-Based SSL • Aside from semi-supervised classification, there are also… – Semi-Supervised Clustering – Semi-Supervised Regression • There are also closely related methods such as… – Active learning • Based on the theories noted above, here’s my question: f: x→y <x1i> <x2i> <x3i> <x4i>
- 26. Future Work Multi-Edge Graph-Based SSL • Ex1: • Ex2:
- 27. Any Questions?
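- 28. Appendix 1 GM vs. DM • Discriminative models – Methodology: introduce a decision boundary – From the 1950s, when pattern recognition (PR) was first used to interpret radar signals, until the mid-1990s, this was effectively the single dominant approach in PR – Rosenblatt's Perceptron (1958) and the PDP group's MLP (1986) were also proposed along these lines • Generative models – First introduced in 1996 by Geoffrey Hinton, a core member of the PDP group (Hinton, G., Using Generative Models for Handwritten Digit Recognition, tPAMI, 1996.) – As a result, unsupervised learning, which had been thought to offer little beyond clustering, regained attention and soon advanced rapidly with the help of subspace analysis (e.g., PCA) – In other words, the view is that different classes are not necessarily located apart from one another, so the data should instead be described by central distributions that capture the distribution well, i.e., by a mixture of bases (e.g., Fourier series)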
- 29. Appendix 2 The Use of SSL Methods[1] • Do the classes produce well clustered data? – EM w/ generative mixture models • Is the existing supervised classifier complicated and hard to modify? – Self-training • Do the features naturally split into two sets? – Co-training • Already using SVM? – TSVMs • Is it true that two points with similar features tend to be in the same class? – Graph-based methods
- 30. References [1] Zhu, X., (2005). Semi-Supervised Learning Literature Survey, Computer Sciences, University of Wisconsin-Madison. [2] Seeger, M., (2001). Learning with labeled and unlabeled data (Technical Survey). [3] Nigam, K., McCallum, A. K., Mitchell, T. M., (2000). Text Classification from Labeled and Unlabeled Documents using EM, Machine Learning 39, 103-134. [4] Mitchell, T. M., (1999). The Role of Unlabeled Data in Supervised Learning, Sixth International Colloquium on Cognitive Science. [5] Raina, R., Battle, A., Packer, B., Ng, A. Y., (2007). Self-taught Learning: Transfer Learning from Unlabeled Data, 24th International Conference on Machine Learning. [6] Cozman, F. G., Cohen, I., Cirelo, M., (2002). Unlabeled data can degrade classification performance of generative classifiers, FLAIRS-02. [7] Balcan, M., Blum, A., Choi, P. P., Lafferty, J., Pantano, B., Rwebangira, M. R., Zhu, X., (2005). Person Identification in Webcam Images: An Application of Semi-Supervised Learning, Proc. of the 22nd ICML Workshop on Learning with Partially Classified Training Data, Bonn, Germany.