1. Semi-Supervised Learning
Can we improve the quality of our learning by combining
labeled and unlabeled data?
Usually far more unlabeled data is available than labeled data
Assume a set L of labeled data and U of unlabeled data
(from the same distribution)
Focus on Semi-Supervised Classification though there are
many other variations
– Aiding clustering with some labeled data
– Regression
– Model selection with unlabeled data (COD)
Transduction vs Induction (transduction predicts labels only for the given U; induction learns a model that also generalizes to unseen instances)
2. How Semi-Supervised Works
Most approaches make strong model assumptions
(guesses). If those assumptions are wrong, adding unlabeled data can make things worse.
Some commonly used assumptions:
– Points in the same cluster tend to share the same class
– Data can be represented as a mixture of parameterized distributions
– Decision boundaries should go through non-dense areas of the data
– The model should be as simple as possible (Occam's razor)
3. Unsupervised Learning of Domain Features
PCA, SVD
NLDR – Non-Linear Dimensionality Reduction
Many Deep Learning Models
– Deep Belief Nets
– Sparse Auto-encoders
– Self-Taught Learning
4. Deep Net with Greedy Layer-Wise Training
[Figure: original inputs are mapped through unsupervised learning into a new feature space, on which the ML model is then trained with supervised learning.]
5. Self-Training (Bootstrap)
Self-Training
– Train a supervised model on the labeled data L
– Test it on the unlabeled data U
– Add the most confidently classified members of U (with their predicted labels) to L
– Repeat (see the sketch below)
Multi-Model
– Uses multiple models to label/move instances of U to L
– Co-Training
Train two models on different, independent feature sets
Add the most confident instances from U under one model into the other model's L (i.e. they "teach" each other)
Repeat (see the co-training sketch after this list)
– Multi-View Learning
Train multiple diverse models on L. Those instances in U which most
models agree on are placed in L.
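A rough co-training sketch under simplifying assumptions: GaussianNB stands in for both learners, each model's top-k confident pseudo-labels join the other view's labeled pool, and all names and parameters are illustrative:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, labeled, rounds=10, k=2):
    """Co-training: models on two independent feature views (X1, X2)
    pseudo-label instances for each other. y is trusted only at `labeled`."""
    L1, L2 = set(labeled), set(labeled)   # per-view labeled index sets
    y1, y2 = y.copy(), y.copy()           # per-view label arrays
    for _ in range(rounds):
        m1 = GaussianNB().fit(X1[sorted(L1)], y1[sorted(L1)])
        m2 = GaussianNB().fit(X2[sorted(L2)], y2[sorted(L2)])
        # Each model labels its most confident instances for the OTHER view.
        for m, Xv, L_other, y_other in ((m1, X1, L2, y2), (m2, X2, L1, y1)):
            U = [i for i in range(len(y)) if i not in L_other]
            if not U:
                continue
            proba = m.predict_proba(Xv[U])
            for j in np.argsort(-proba.max(axis=1))[:k]:
                y_other[U[j]] = m.classes_[proba[j].argmax()]
                L_other.add(U[j])
    return m1, m2
```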
6. Generative Models
Generative – Assume the data can be represented by some
mixture of parameterized distributions (e.g. Gaussians) and use
EM to learn the parameters (à la Baum-Welch), treating the missing labels of U as latent variables
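A sketch of semi-supervised EM with one Gaussian per class: labeled points have their responsibilities clamped to their class, while unlabeled points get soft assignments (a simplified mixture, not the only formulation):

```python
import numpy as np

def ss_gmm_em(X_L, y_L, X_U, n_classes, n_iter=50):
    """Semi-supervised EM: labeled rows clamped, unlabeled rows soft."""
    X = np.vstack([X_L, X_U])
    n, d = X.shape
    nL = len(X_L)
    R = np.zeros((n, n_classes))            # responsibilities
    R[np.arange(nL), y_L] = 1.0             # clamp labeled rows (never updated)
    R[nL:] = 1.0 / n_classes                # uniform init for unlabeled rows
    for _ in range(n_iter):
        # M-step: weighted MLE of mixing weights, means, covariances.
        Nk = R.sum(axis=0)
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        cov = [((R[:, k, None] * (X - mu[k])).T @ (X - mu[k])) / Nk[k]
               + 1e-6 * np.eye(d) for k in range(n_classes)]
        # E-step: recompute class posteriors for the unlabeled rows only.
        logp = np.empty((n - nL, n_classes))
        for k in range(n_classes):
            diff = X[nL:] - mu[k]
            maha = np.einsum('ij,ij->i', diff @ np.linalg.inv(cov[k]), diff)
            logp[:, k] = np.log(pi[k]) - 0.5 * (
                np.log(np.linalg.det(2 * np.pi * cov[k])) + maha)
        logp -= logp.max(axis=1, keepdims=True)   # for numerical stability
        P = np.exp(logp)
        R[nL:] = P / P.sum(axis=1, keepdims=True)
    return pi, mu, cov
```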
7. Graph Models
Graph Models
– Neighboring nodes (those connected by larger edge weights) are assumed to be similar
– Force same-class members of L to be close, while maintaining
smoothness with respect to the graph for U
– Add members of U as neighbors based on some similarity measure
– Iteratively label U, spreading outward from L (breadth first)
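A minimal label-propagation sketch over a fully connected Gaussian-kernel graph (the kernel width and iteration count are illustrative):

```python
import numpy as np

def label_propagation(X_L, y_L, X_U, n_classes, sigma=1.0, n_iter=100):
    """Spread labels from L to U over a similarity graph."""
    X = np.vstack([X_L, X_U])
    nL = len(X_L)
    # Gaussian-kernel edge weights: nearby nodes get larger weights.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)    # row-stochastic transition matrix
    F = np.zeros((len(X), n_classes))
    F[np.arange(nL), y_L] = 1.0             # one-hot labels for L
    for _ in range(n_iter):
        F = P @ F                           # each node averages its neighbors
        F[:nL] = 0.0
        F[np.arange(nL), y_L] = 1.0         # re-clamp L each iteration
    return F[nL:].argmax(axis=1)            # predicted labels for U
```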
8. TSVM
Transductive SVM (TSVM) or Semi-Supervised SVM (S3VM)
Maximize the margin over both L and U, so the decision
surface is placed in non-dense regions of the data
– Assumes classes are "well-separated"
– Can also try to simultaneously maintain class proportion on both
sides similar to labeled proportion
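One common way to write the S3VM objective (a sketch; formulations vary): a hinge loss on the labeled points plus a label-free hinge on the unlabeled points,

\[
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2}
+ C\sum_{i\in L}\max\bigl(0,\ 1-y_i(\mathbf{w}^{\top}\mathbf{x}_i+b)\bigr)
+ C^{*}\sum_{j\in U}\max\bigl(0,\ 1-\lvert\mathbf{w}^{\top}\mathbf{x}_j+b\rvert\bigr)
\]

The C* term is smallest when unlabeled points fall outside the margin, which is exactly what pushes the surface into non-dense regions; the class-proportion idea above is usually added as a balancing constraint on U.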
9. Summary
Oracle Learning
Semi-supervised learning is becoming a more critical area as
more unlabeled data becomes cheaply available
10. Active Learning
Obtaining labeled data can be the most expensive part of a
machine learning task
Contrast with supervised, unsupervised, and semi-supervised learning, where the learner passively receives its data
In active learning we can query an oracle (e.g. a human
expert, a test, etc.) to obtain the label for a specific input
The goal is to learn the most accurate model
while querying the fewest possible labels
11. Active Learning
Often query:
1) A low-confidence instance (i.e. one near a decision boundary)
2) An instance in a relatively dense neighborhood, so its label informs many nearby points
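One common heuristic combines both criteria (an information-density-style sketch; the model is any classifier with predict_proba, and beta trades off the two terms; all names are illustrative):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def query_index(model, X_U, beta=1.0):
    """Pick which unlabeled instance to send to the oracle: low confidence
    (near the decision boundary), weighted by neighborhood density."""
    proba = model.predict_proba(X_U)
    uncertainty = 1.0 - proba.max(axis=1)                     # criterion 1
    density = np.exp(-pairwise_distances(X_U)).mean(axis=1)   # criterion 2
    return int(np.argmax(uncertainty * density ** beta))
```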
12. Active Clustering
Images (Objects, Words, etc.)
First do unsupervised clustering
Which points should we show an expert to get feedback on
the clustering and allow adjustment?
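One heuristic sketch: show the expert the points whose two nearest cluster centers are nearly equidistant, i.e. the most ambiguous assignments (KMeans and all parameters are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def points_to_show(X, n_clusters=3, n_queries=5):
    """Return indices of the most ambiguous points under a KMeans clustering."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    # Distance from every point to every cluster center, sorted per point.
    d = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    d.sort(axis=1)
    margin = d[:, 1] - d[:, 0]        # small margin = ambiguous assignment
    return np.argsort(margin)[:n_queries]
```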