1. Semi-Supervised Learning
Can we improve the quality of our learning by combining
labeled and unlabeled data?
Usually far more unlabeled data is available than labeled data
Assume a set L of labeled data and U of unlabeled data
(from the same distribution)
Focus on Semi-Supervised Classification though there are
many other variations
– Aiding clustering with some labeled data
– Regression
– Model selection with unlabeled data (COD)
Transduction vs Induction (transduction predicts labels only for the given U; induction learns a model that also generalizes to unseen instances)
2. How Semi-Supervised Works
Most approaches make strong model assumptions
(guesses). If those assumptions are wrong, adding unlabeled data can make things worse.
Some commonly used assumptions:
– Points in the same cluster tend to share the same class
– Data can be represented as a mixture of parameterized distributions
– Decision boundaries should go through non-dense areas of the data
– The model should be as simple as possible (Occam's razor)
3. Unsupervised Learning of Domain Features
PCA, SVD
NLDR – Non-Linear Dimensionality Reduction
Many Deep Learning Models
– Deep Belief Nets
– Sparse Auto-encoders
– Self-Taught Learning
4. Deep Net with Greedy Layer-Wise Training
[Figure: original inputs are mapped through unsupervised learning into a new feature space, on which the ML model is then trained with supervised learning.]
5. Self-Training (Bootstrap)
Self-Training
– Train a supervised model on the labeled data L
– Test it on the unlabeled data U
– Add the most confidently classified members of U (with their predicted labels) to L
– Repeat (see the sketch below)
Multi-Model
– Uses multiple models to label/move instances of U to L
– Co-Training
Train two models on different, independent feature sets
Add the most confident instances from U under one model into the other model's L (i.e. they "teach" each other)
Repeat (see the co-training sketch after this list)
– Multi-View Learning
Train multiple diverse models on L. Those instances in U which most
models agree on are placed in L.
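A rough co-training sketch under simplifying assumptions: GaussianNB stands in for both learners, each model's top-k confident pseudo-labels join the other view's labeled pool, and all names and parameters are illustrative:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, labeled, rounds=10, k=2):
    """Co-training: models on two independent feature views (X1, X2)
    pseudo-label instances for each other. y is trusted only at `labeled`."""
    L1, L2 = set(labeled), set(labeled)   # per-view labeled index sets
    y1, y2 = y.copy(), y.copy()           # per-view label arrays
    for _ in range(rounds):
        m1 = GaussianNB().fit(X1[sorted(L1)], y1[sorted(L1)])
        m2 = GaussianNB().fit(X2[sorted(L2)], y2[sorted(L2)])
        # Each model labels its most confident instances for the OTHER view.
        for m, Xv, L_other, y_other in ((m1, X1, L2, y2), (m2, X2, L1, y1)):
            U = [i for i in range(len(y)) if i not in L_other]
            if not U:
                continue
            proba = m.predict_proba(Xv[U])
            for j in np.argsort(-proba.max(axis=1))[:k]:
                y_other[U[j]] = m.classes_[proba[j].argmax()]
                L_other.add(U[j])
    return m1, m2
```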
6. Generative Models
Generative – Assume the data can be represented by some
mixture of parameterized distributions (e.g. Gaussians) and use
EM to learn the parameters (à la Baum-Welch), treating the missing labels of U as latent variables
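A sketch of semi-supervised EM with one Gaussian per class: labeled points have their responsibilities clamped to their class, while unlabeled points get soft assignments (a simplified mixture, not the only formulation):

```python
import numpy as np

def ss_gmm_em(X_L, y_L, X_U, n_classes, n_iter=50):
    """Semi-supervised EM: labeled rows clamped, unlabeled rows soft."""
    X = np.vstack([X_L, X_U])
    n, d = X.shape
    nL = len(X_L)
    R = np.zeros((n, n_classes))            # responsibilities
    R[np.arange(nL), y_L] = 1.0             # clamp labeled rows (never updated)
    R[nL:] = 1.0 / n_classes                # uniform init for unlabeled rows
    for _ in range(n_iter):
        # M-step: weighted MLE of mixing weights, means, covariances.
        Nk = R.sum(axis=0)
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        cov = [((R[:, k, None] * (X - mu[k])).T @ (X - mu[k])) / Nk[k]
               + 1e-6 * np.eye(d) for k in range(n_classes)]
        # E-step: recompute class posteriors for the unlabeled rows only.
        logp = np.empty((n - nL, n_classes))
        for k in range(n_classes):
            diff = X[nL:] - mu[k]
            maha = np.einsum('ij,ij->i', diff @ np.linalg.inv(cov[k]), diff)
            logp[:, k] = np.log(pi[k]) - 0.5 * (
                np.log(np.linalg.det(2 * np.pi * cov[k])) + maha)
        logp -= logp.max(axis=1, keepdims=True)   # for numerical stability
        P = np.exp(logp)
        R[nL:] = P / P.sum(axis=1, keepdims=True)
    return pi, mu, cov
```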
7. Graph Models
Graph Models
– Neighboring nodes (those connected by larger edge weights) are assumed to be similar
– Force same-class members of L to be close, while maintaining
smoothness with respect to the graph for U
– Add members of U as neighbors based on some similarity measure
– Iteratively label U, spreading outward from L (breadth first)
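A minimal label-propagation sketch over a fully connected Gaussian-kernel graph (the kernel width and iteration count are illustrative):

```python
import numpy as np

def label_propagation(X_L, y_L, X_U, n_classes, sigma=1.0, n_iter=100):
    """Spread labels from L to U over a similarity graph."""
    X = np.vstack([X_L, X_U])
    nL = len(X_L)
    # Gaussian-kernel edge weights: nearby nodes get larger weights.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)    # row-stochastic transition matrix
    F = np.zeros((len(X), n_classes))
    F[np.arange(nL), y_L] = 1.0             # one-hot labels for L
    for _ in range(n_iter):
        F = P @ F                           # each node averages its neighbors
        F[:nL] = 0.0
        F[np.arange(nL), y_L] = 1.0         # re-clamp L each iteration
    return F[nL:].argmax(axis=1)            # predicted labels for U
```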
8. TSVM
Transductive SVM (TSVM) or Semi-Supervised SVM (S3VM)
Maximize the margin over both L and U, so the decision
surface is placed in non-dense regions of the data
– Assumes classes are "well-separated"
– Can also try to simultaneously maintain class proportion on both
sides similar to labeled proportion
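One common way to write the S3VM objective (a sketch; formulations vary): a hinge loss on the labeled points plus a label-free hinge on the unlabeled points,

\[
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2}
+ C\sum_{i\in L}\max\bigl(0,\ 1-y_i(\mathbf{w}^{\top}\mathbf{x}_i+b)\bigr)
+ C^{*}\sum_{j\in U}\max\bigl(0,\ 1-\lvert\mathbf{w}^{\top}\mathbf{x}_j+b\rvert\bigr)
\]

The C* term is smallest when unlabeled points fall outside the margin, which is exactly what pushes the surface into non-dense regions; the class-proportion idea above is usually added as a balancing constraint on U.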
9. Summary
Oracle Learning
Semi-supervised learning is becoming a more critical area as
more unlabeled data becomes cheaply available
10. Active Learning
Obtaining labeled data can be the most expensive part of a
machine learning task
Contrast with supervised, unsupervised, and semi-supervised learning, where the learner passively receives its data
In active learning we can query an oracle (e.g. a human
expert, a test, etc.) to obtain the label for a specific input
The goal is to learn the most accurate model
while querying the fewest possible labels
11. Active Learning
Often query:
1) A low-confidence instance (i.e. one near a decision boundary)
2) An instance in a relatively dense neighborhood, so its label informs many nearby points
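One common heuristic combines both criteria (an information-density-style sketch; the model is any classifier with predict_proba, and beta trades off the two terms; all names are illustrative):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def query_index(model, X_U, beta=1.0):
    """Pick which unlabeled instance to send to the oracle: low confidence
    (near the decision boundary), weighted by neighborhood density."""
    proba = model.predict_proba(X_U)
    uncertainty = 1.0 - proba.max(axis=1)                     # criterion 1
    density = np.exp(-pairwise_distances(X_U)).mean(axis=1)   # criterion 2
    return int(np.argmax(uncertainty * density ** beta))
```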
12. Active Clustering
Images (Objects, Words, etc.)
First do unsupervised clustering
Which points should we show an expert to get feedback on
the clustering and allow adjustment?
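One heuristic sketch: show the expert the points whose two nearest cluster centers are nearly equidistant, i.e. the most ambiguous assignments (KMeans and all parameters are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def points_to_show(X, n_clusters=3, n_queries=5):
    """Return indices of the most ambiguous points under a KMeans clustering."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    # Distance from every point to every cluster center, sorted per point.
    d = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    d.sort(axis=1)
    margin = d[:, 1] - d[:, 0]        # small margin = ambiguous assignment
    return np.argsort(margin)[:n_queries]
```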