NIPS2010 reading: Semi-supervised learning with adversarially missing label information
1. Semi-supervised learning with
adversarially missing label information
by U. Syed and B. Taskar @ Univ. of Pennsylvania
Akisato Kimura, Ph.D.
[E-mail] akisato@ieee.org [Twitter] @_akisato
NIPS2010 reading @ Cybozu, December 26, 2010
2. Abstract
Semi-supervised learning in an adversarial setting
The paper considers a general framework that includes:
Covariate shift : Missing labels are not necessarily selected at random.
Multiple instance learning : Correspondences between labels and instances are not provided explicitly.
Graph-based regularization : Only information about which instances are likely to have the same label is given.
but does not include:
Malicious label noise setting : Labelers are allowed to mislabel a part of the training set.
In this framework, labelers can remove labels, but cannot change them.
3. Framework
$(x_1, y_1), \ldots, (x_m, y_m)$ : m labeled training examples
$x_1, \ldots, x_m \in \mathcal{X}$ : examples
$y_1, \ldots, y_m \in \mathcal{Y}$ : labels
$P(x, y)$ : unknown PDF
$R(q)$ : label regularization function
$q$ : soft labeling of the training examples
$q_{i,y}$ : probability that example $x_i$ has label $y$
The algorithm has access only to the examples $x_1, \ldots, x_m$ and the label regularization function $R$.
The correct labeling might be near the minimum of $R$, but not necessarily (hence it is called "adversarial").
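To make these objects concrete, here is a minimal sketch (my own illustration, not code from the paper) of how a soft labeling $q$ and one possible label regularization function could be represented; the array shapes and the particular semi-supervised $R$ below are assumptions.

```python
import numpy as np

# Soft labeling q: an (m x k) matrix, q[i, y] = probability that example i has label y.
# Each row sums to 1.
m, k = 5, 3
q = np.full((m, k), 1.0 / k)          # start from the uniform soft labeling

# Hypothetical semi-supervised label regularization function: labels are known
# only for the indices in `labeled`, and R(q) = 0 iff q puts all of its mass on
# the known label there (infinity otherwise).
labeled = {0: 2, 3: 1}                # index -> known label (assumed toy data)

def R_semi(q):
    for i, y in labeled.items():
        if not np.isclose(q[i, y], 1.0):
            return np.inf
    return 0.0

q[0] = 0.0; q[0, 2] = 1.0             # pin the observed labels
q[3] = 0.0; q[3, 1] = 1.0
print(R_semi(q))                      # 0.0 -> q is consistent with the observed labels
```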
4. Label regularization functions (1)
Fully supervised learning
$R(q) = 0$ if $q$ equals the correct soft labeling $\tilde{q}$, and $\infty$ otherwise.
[Note] The labels $y_1, \ldots, y_m$ also determine the correct soft labeling $\tilde{q}$.
Semi-supervised learning
$I$ : a set of indexes of unlabeled examples
$R(q) = 0$ if $q$ assigns the given label to every example whose index is not in $I$, and $\infty$ otherwise.
We don't mind how $q$ labels the examples in $I$.
Ambiguous labeling (≈ multiple instance learning)
$Y_i$ : a set of possible labels of the example $x_i$
$R(q) = 0$ if, for every $i$, the set of indexes of non-zero components of $q_{i,\cdot}$ (its support) is contained in $Y_i$, and $\infty$ otherwise.
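As a companion to the ambiguous-labeling case above, a minimal sketch (my own illustration; the candidate sets are assumed toy values) of such a regularization function:

```python
import numpy as np

# Hypothetical candidate-label sets for m = 4 examples over k = 3 labels.
candidates = [{0, 1}, {2}, {0, 1, 2}, {1}]

def R_ambiguous(q):
    # 0 if every example's soft label is supported only on its candidate set, else infinity.
    for i, Y_i in enumerate(candidates):
        support = set(np.flatnonzero(q[i] > 0).tolist())
        if not support <= Y_i:
            return np.inf
    return 0.0

q = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [0.2, 0.3, 0.5],
              [0.0, 1.0, 0.0]])
print(R_ambiguous(q))                 # 0.0 -> q respects every candidate set
```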
5. Label regularization functions (2)
Laplacian regularization
$R(q)$ is a quadratic form in $q$ defined by a positive semi-definite matrix $L$ (a graph Laplacian).
$R(q)$ is small when examples that are believed to have the same label receive similar soft labels.
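A minimal sketch of one common way to realize such a regularizer (the exact form used in the paper may differ): $W$ encodes assumed pairwise same-label beliefs, $L = D - W$ is the graph Laplacian, and $R(q)$ sums the quadratic form over the label columns of $q$.

```python
import numpy as np

# Assumed pairwise affinities: W[i, j] is large when examples i and j are
# believed to share a label (illustrative values, not from the paper).
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W        # graph Laplacian, positive semi-definite

def R_laplacian(q):
    # Sum of quadratic forms over the label columns of q; small when connected
    # examples have similar soft labels.
    return float(sum(q[:, y] @ L @ q[:, y] for y in range(q.shape[1])))

q_smooth = np.array([[1., 0.], [1., 0.], [1., 0.]])
q_rough  = np.array([[1., 0.], [0., 1.], [1., 0.]])
print(R_laplacian(q_smooth), R_laplacian(q_rough))   # smooth labeling scores 0.0, rough one 4.0
```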
Any combination of these is also covered by the framework,
e.g. semi-supervised learning with Laplacian regularization.
We do not specify how the label regularization function $R$ is selected.
E.g. in semi-supervised learning, we usually cannot know how the
unlabeled samples were selected (the "covariate shift" setting).
6. Loss functions for model learning
Any type of loss function is applicable, but we focus in particular on:
the negative log-likelihood of a log-linear PDF,
$L(\theta; x, y) = -\log p_{\theta}(y \mid x)$ with $p_{\theta}(y \mid x) \propto \exp(\theta^{\top} \phi(x, y))$, and
the binary (0-1) loss of a linear classifier,
$L(\theta; x, y) = \mathbf{1}[\, y \neq \arg\max_{y'} \theta^{\top} \phi(x, y') \,]$.
$\theta$ : a model/classifier parameter
$\phi(x, y)$ : a feature function ("basis function" in PRML)
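A minimal numeric sketch of these two losses (my own illustration; the feature function below is an assumed toy one, not the paper's):

```python
import numpy as np

def phi(x, y, k):
    # Toy joint feature function: place x in the block corresponding to label y.
    f = np.zeros(k * x.size)
    f[y * x.size:(y + 1) * x.size] = x
    return f

def neg_log_likelihood(theta, x, y, k):
    # -log p_theta(y | x) for the log-linear model p(y | x) proportional to exp(theta . phi(x, y)).
    scores = np.array([theta @ phi(x, yy, k) for yy in range(k)])
    return float(np.logaddexp.reduce(scores) - scores[y])

def binary_loss(theta, x, y, k):
    # 0-1 loss of the linear classifier argmax_y theta . phi(x, y).
    scores = np.array([theta @ phi(x, yy, k) for yy in range(k)])
    return float(int(np.argmax(scores)) != y)

k, x, y = 3, np.array([1.0, -0.5]), 2
theta = np.zeros(k * x.size)
print(neg_log_likelihood(theta, x, y, k), binary_loss(theta, x, y, k))
```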
7. Algorithm
Objective: Find a parameter $\theta$ that (approximately) achieves the minimum, over $\theta$, of the worst-case loss over the soft labelings allowed by $R$.
Its validity will be shown later.
For brevity, we abbreviate this objective below.
Note that the algorithm solves the max and the min in the opposite order!
Basic algorithm:
Game for Adversarially Missing Evidence (GAME)
1. Constants are given in advance.
2. Find the soft labeling $q^*$ that solves the outer maximization of the max-min problem.
3. Find the parameter $\theta^*$ that minimizes the loss under the fixed soft labeling $q^*$.
4. Output the parameter estimate $\theta^*$.
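A LaTeX sketch of the saddle-point structure as I read these slides; the exact constraint set and constants from the paper are abbreviated here as a generic feasible set $Q$ defined through $R$, so treat the details as assumptions.

```latex
% Objective (min-max): worst-case empirical loss over soft labelings allowed by R
\hat{\theta} \in \arg\min_{\theta} \; \max_{q \in Q}
  \; \frac{1}{m} \sum_{i=1}^{m} \sum_{y \in \mathcal{Y}} q_{i,y}\, L(\theta; x_i, y)
% GAME solves the problem in the opposite (max-min) order:
q^{*} \in \arg\max_{q \in Q} \; \min_{\theta}
  \; \frac{1}{m} \sum_{i=1}^{m} \sum_{y \in \mathcal{Y}} q_{i,y}\, L(\theta; x_i, y),
\qquad
\theta^{*} \in \arg\min_{\theta}
  \; \frac{1}{m} \sum_{i=1}^{m} \sum_{y \in \mathcal{Y}} q^{*}_{i,y}\, L(\theta; x_i, y)
```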
8. Algorithm in detail
Consider the case of the negative log-likelihood loss.
Step 3
The label regularization function can be ignored (the soft labeling $q^*$ is already fixed).
This step is equivalent to maximizing the likelihood of a log-linear model, for which many solvers are available (a sketch follows below).
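A minimal sketch of this step (my own illustration, reusing the assumed toy feature function from the loss example above): with $q^*$ fixed it is just an expected negative log-likelihood minimized over $\theta$, here by plain gradient descent with numerical gradients rather than whatever solver the paper uses.

```python
import numpy as np

def phi(x, y, k):
    # Toy joint feature function (same assumption as in the loss example).
    f = np.zeros(k * x.size)
    f[y * x.size:(y + 1) * x.size] = x
    return f

def expected_nll(theta, X, q, k):
    # (1/m) * sum_i sum_y q[i, y] * (-log p_theta(y | x_i)) for a log-linear model.
    total = 0.0
    for x_i, q_i in zip(X, q):
        scores = np.array([theta @ phi(x_i, y, k) for y in range(k)])
        log_probs = scores - np.logaddexp.reduce(scores)
        total += -(q_i @ log_probs)
    return total / len(X)

# Tiny illustrative data and a fixed soft labeling q* (assumed values).
X = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
q_star = np.array([[0.9, 0.1], [0.2, 0.8]])
k, dim = 2, 4
theta = np.zeros(dim)

# Plain gradient descent with numerical gradients, just to show the shape of Step 3.
for _ in range(200):
    grad, eps = np.zeros(dim), 1e-5
    for j in range(dim):
        e = np.zeros(dim); e[j] = eps
        grad[j] = (expected_nll(theta + e, X, q_star, k)
                   - expected_nll(theta - e, X, q_star, k)) / (2 * eps)
    theta -= 0.5 * grad
print(expected_nll(theta, X, q_star, k))   # decreases from the initial value log(2)
```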
Step 2
Take the dual of the inner minimization over $\theta$.
The dual is concave and contains a Shannon entropy term, so it is easy to optimize.
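For reference, the standard duality behind this step, stated in its generic textbook form (the paper's dual additionally carries the label-regularization constraints, which are omitted here): maximum likelihood for a log-linear model is dual to maximizing Shannon entropy subject to feature-expectation matching.

```latex
% Generic log-linear ML / max-entropy duality (textbook form, not the paper's exact dual)
\min_{\theta} \; -\frac{1}{m} \sum_{i=1}^{m} \log p_{\theta}(y_i \mid x_i)
\quad \Longleftrightarrow \quad
\max_{p} \; \frac{1}{m} \sum_{i=1}^{m} H\!\left(p(\cdot \mid x_i)\right)
\;\; \text{s.t.} \;\;
\frac{1}{m} \sum_{i} \mathbb{E}_{p(\cdot \mid x_i)}\!\left[\phi(x_i, y)\right]
 = \frac{1}{m} \sum_{i} \phi(x_i, y_i)
```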
9. Convergence properties (1)
Theorem 1 (Converse theorem)
If the condition stated in the paper holds, then for all parameters and label
regularization functions, the claimed bound is satisfied with high probability.
[Note] The bound is similar to the objective function of the algorithm.
10. Convergence properties (2)
Theorem 2 (Direct theorem: lower bound)
Consider the binary loss function and a finite set.
For every learning algorithm and example distribution, there exists a labeling
function for which the stated lower bound holds with high probability, under some assumptions.
The assumptions (1, 2 and 3) can be found in the paper.
Some detailed conditions are omitted for simplicity.
The parameter in the bound is the one obtained by the learning algorithm.
11. Convergence properties (3)
Theorem 3 (Direct theorem: upper bound)
Consider the binary loss function. Under the stated condition, there exist a
family of label regularization functions and a learning algorithm that satisfy
the upper bound with high probability, under some assumptions.
Assumption 3 can be removed here.
Some detailed conditions are omitted for simplicity.
The parameter in the bound is the one obtained by the learning algorithm.
12. Back to the algorithm part …
Objective: Find a parameter achieving the minimum of the upper bound in the converse theorem.
One of the quantities in the bound is not minimized over directly.
Instead, a regularization term is added so that it can be left unconstrained.