Combining Committee-Based Semi-supervised and Active Learning and Its Application to Handwritten Digits Recognition

Semi-supervised learning reduces the cost of labeling the training data of a supervised learning algorithm by using unlabeled data together with labeled data to improve performance. Co-Training is a popular semi-supervised learning algorithm that requires multiple redundant and independent sets of features (views). In many real-world application domains, this requirement cannot be satisfied. In this paper, a single-view variant of Co-Training, CoBC (Co-Training by Committee), is proposed, which requires an ensemble of diverse classifiers instead of redundant and independent views. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and committee-based active learning. An empirical study on handwritten digit recognition is conducted where the random subspace method (RSM) is used to create ensembles of diverse C4.5 decision trees. Experiments show that these two combinations outperform the other non-committee-based ones.

  1. Combining Committee-based Semi-supervised and Active Learning and Its Application to Handwritten Digits Recognition
     Mohamed Farouk Abdel Hady, Friedhelm Schwenker
     Institute of Neural Information Processing, University of Ulm, Germany
     {mohamed.abdel-hady|friedhelm.schwenker}@uni-ulm.de
     April 8, 2010
     Outline: Overview, Semi-Supervised Learning (SSL), Single-View CoBC, Experimental Results, Conclusion, Future Work
  2. Overview
  3. Semi-Supervised Learning
     In many domains, training examples are abundant but unlabeled. Labeling data is often tedious, expensive and time-consuming because it requires the effort of human experts.
     Research directions of SSL: Semi-Supervised Clustering, Semi-Supervised Classification, Semi-Supervised Regression, Semi-Supervised Dimensionality Reduction.
  4. Semi-Supervised Learning
     Description | SSL algorithm
     Single-view, single-learner, single classifier | EM (Nigam and Ghani, 2000); Self-Training (Nigam and Ghani, 2000)
     Multi-view, single-learner, multiple classifiers | Co-EM (Nigam and Ghani, 2000); Co-Training (Blum and Mitchell, COLT'98)
     Single-view, multi-learner, multiple classifiers | Statistical Co-Learning (Goldman et al., 2000); Democratic Co-Learning (Y. Zhou et al., 2004)
     Single-view, single-learner, multiple classifiers | Tri-Training (Z.-H. Zhou, TKDE'05); Co-Forest (Li and Z.-H. Zhou, TSMC'07); Co-Training by Committee
     Z.-H. Zhou and M. Li, Semi-supervised learning by disagreement, Knowledge and Information Systems, in press.
  5. How can unlabeled data be helpful? [figure]
  6. How can unlabeled data be helpful? [figure]
  7. Self-Training
     But the most confident examples often lie far away from the target decision boundary, making them non-informative. Therefore, in many cases this process does not create representative training sets, since it keeps selecting non-informative examples.
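As a reference point, the basic self-training loop can be sketched as follows. This is a minimal illustration only: the nearest-centroid base classifier and all function names are assumptions chosen for compactness, not the classifier or code used in the paper.

```python
# Self-training sketch: a base classifier repeatedly labels its most
# confident unlabeled example and retrains on the enlarged labeled set.

def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, c in zip(X, y):
        s = sums.setdefault(c, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def centroid_predict(model, x):
    """Return (label of nearest centroid, confidence = -squared distance)."""
    best_c, best_d = None, float("inf")
    for c, m in model.items():
        d = sum((a - b) ** 2 for a, b in zip(x, m))
        if d < best_d:
            best_c, best_d = c, d
    return best_c, -best_d

def self_train(Xl, yl, Xu, rounds=3):
    """Each round: label the single most confident unlabeled example
    and move it into the labeled set, then refit."""
    Xl, yl, Xu = list(Xl), list(yl), list(Xu)
    for _ in range(rounds):
        if not Xu:
            break
        model = centroid_fit(Xl, yl)
        preds = [centroid_predict(model, x) for x in Xu]  # (label, conf)
        best = max(range(len(Xu)), key=lambda i: preds[i][1])
        yl.append(preds[best][0])
        Xl.append(Xu.pop(best))
    return centroid_fit(Xl, yl)
```

Note that the confidence here is purely geometric, which illustrates the slide's criticism: the points closest to a centroid are the most confident but also the least informative about the decision boundary.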
  8. Multi-View Co-Training (Blum and Mitchell, 1998)
     Like any multi-view learning algorithm, it requires that each training example be represented by multiple sufficient and redundant views, i.e. two or more sets of features that are conditionally independent given the class label, each of which is sufficient for learning.
     For web page classification: (1) the text appearing on the page itself, and (2) the text attached to hyperlinks pointing to this page from other pages.
  9. Multi-View Co-Training [figure]
  10. Single-View Co-Training by Committee
     Contribution: a single-view variant of Co-Training is proposed for application domains in which there are no redundant and independent views, together with two learning frameworks that combine the merits of active learning with semi-supervised learning.
     Motivation: for many real-world applications, the requirement for two sufficient and independent views cannot be fulfilled; Co-Training does not work well without an appropriate feature split (Nigam and Ghani, 2000); and measuring the labeling confidence is not a straightforward task.
  11. Single-View Co-Training by Committee [figure]
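One CoBC iteration can be sketched as follows. This is a schematic under assumptions: the classifier interface (`add_training_data`, `retrain`) and the helper names are illustrative, not the paper's implementation; the key idea shown is that each member's "companion committee" (all other members) labels confident examples from that member's unlabeled pool.

```python
# One CoBC iteration sketch: for each committee member h_i, its
# companion committee H_i (the ensemble without h_i) labels the most
# confident examples from h_i's unlabeled pool; those examples are
# added to h_i's training set and h_i is retrained.

def cobc_iteration(members, pools, label_fn, confidence_fn, n_select=1):
    """members: list of classifiers; pools: per-member unlabeled pools.
    label_fn(committee, x) -> predicted label for x;
    confidence_fn(committee, x) -> labeling confidence for x."""
    for i, h_i in enumerate(members):
        companion = members[:i] + members[i + 1:]
        # rank the pool by the companion committee's confidence
        scored = sorted(pools[i],
                        key=lambda x: confidence_fn(companion, x),
                        reverse=True)
        chosen, pools[i] = scored[:n_select], scored[n_select:]
        new_examples = [(x, label_fn(companion, x)) for x in chosen]
        h_i.add_training_data(new_examples)  # assumed classifier API
        h_i.retrain()                        # assumed classifier API
    return members
```

The confidence function is the pluggable part: the following slides discuss two choices, the committee's class-probability estimate and a local competence estimate.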
  12. How to measure confidence
     Inaccurate confidence estimation leads to selecting and adding mislabeled examples to the training set, which degrades the classification accuracy.
     Confidence is the class-probability estimate (CPE) provided by the companion committee:
     \mathrm{Confidence}(x_u, H_i^{(t-1)}) = \max_{1 \le c \le C} H_i^{(t-1)}(x_u, \omega_c)
     Unfortunately, in many cases the classifier does not provide accurate CPEs. For instance, a decision tree provides piecewise-constant probability estimates: all unlabeled examples x_u that fall into a particular leaf receive the same CPE, because the exact value of x_u is not used in determining it.
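The CPE-based confidence on this slide amounts to averaging the members' class-probability vectors and taking the maximum. A small sketch (function names are illustrative; averaging the member CPEs is an assumed committee-combination rule):

```python
# CPE-based confidence of a committee on one unlabeled example.

def committee_cpe(member_cpes):
    """Average the class-probability vectors of the committee members."""
    n = len(member_cpes)
    return [sum(p[c] for p in member_cpes) / n
            for c in range(len(member_cpes[0]))]

def confidence(member_cpes):
    """Return (predicted label, confidence = max class probability)."""
    probs = committee_cpe(member_cpes)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]
```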
  13. Improving CPE of Decision Trees
     Laplace correction, Probability Estimation Tree (PET) (Provost, Machine Learning 2003):
     P(\omega_c \mid x_u) = \frac{n_c + 1}{N + C}
     where n_c is the number of examples of class \omega_c in the leaf reached by x_u, N is the total number of examples in that leaf, and C is the number of classes.
     Bagging of PETs.
     Retrofitting Decision Tree Classifiers Using Kernel Density Estimation (Fayyad, ICML'95).
     Improve Decision Trees for Probability-Based Ranking by Lazy Learners (Liang, ICTAI'06).
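The Laplace-corrected leaf estimate follows directly from the leaf's class counts; a one-function sketch (the helper name is hypothetical):

```python
# Laplace-corrected class-probability estimates at a decision-tree leaf:
# P(w_c | x) = (n_c + 1) / (N + C), where leaf_counts[c] = n_c is the
# number of training examples of class c in the leaf, N is their sum,
# and C is the number of classes.

def laplace_cpe(leaf_counts):
    N, C = sum(leaf_counts), len(leaf_counts)
    return [(n_c + 1) / (N + C) for n_c in leaf_counts]
```

A pure-frequency leaf with counts [8, 2] would give probabilities [0.8, 0.2]; the correction pulls them toward the uniform distribution, which avoids overconfident 0/1 estimates in small leaves.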
  14. Estimating local competence
     The local competence of an unlabeled example x_u given H_i^{(t-1)} is defined as follows:
     \mathrm{Comp}(x_u, H_i^{(t-1)}) = \sum_{x_n \in N(x_u),\, x_n \in \omega_{pred}} \frac{H_i^{(t-1)}(x_n, \omega_{pred})}{\|x_n - x_u\|^2 + \epsilon}
     where \omega_{pred} is the class label assigned to x_u by H_i^{(t-1)}; H_i^{(t-1)}(x_n, \omega_{pred}) is the probability given by H_i^{(t-1)} that neighbor x_n belongs to class \omega_{pred}; and \epsilon is a constant added to avoid a zero denominator.
     It is inspired by the decision-dependent, distance-based k-NN estimate of competence that was proposed for dynamic classifier selection (Woods, PAMI'97).
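The local competence formula can be implemented directly; a sketch under the assumption that the k nearest labeled neighbours, their labels, and their committee probabilities have already been retrieved (all names are illustrative):

```python
# Local competence of an unlabeled example x_u, following the slide's
# formula: sum, over neighbours x_n whose label equals the predicted
# class w_pred, of H(x_n, w_pred) / (||x_n - x_u||^2 + eps).

def local_competence(x_u, w_pred, neighbours, eps=1e-6):
    """neighbours: list of (x_n, y_n, p_n) triples where x_n is a
    labeled neighbour, y_n its label, and p_n = H(x_n, w_pred) the
    committee's probability that x_n belongs to class w_pred."""
    comp = 0.0
    for x_n, y_n, p_n in neighbours:
        if y_n != w_pred:
            continue  # only neighbours of the predicted class contribute
        d2 = sum((a - b) ** 2 for a, b in zip(x_n, x_u))
        comp += p_n / (d2 + eps)
    return comp
```

Unlike the piecewise-constant tree CPE, this score does depend on the exact position of x_u, since closer same-class neighbours contribute more.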
  15. Estimating local competence [figure: estimating the local competence of an unlabeled example given the companion committee]
  16. Handwritten Digits Recognition
     The handwritten digits are described by four sets of features and are publicly available at the UCI Repository. The digits were extracted from a collection of Dutch utility maps. A total of 2,000 patterns (200 patterns per class) have been digitized into binary images.
     Name | Description
     mfeat-pix | 240 pixel averages in 2 × 3 windows
     mfeat-kar | 64 Karhunen-Loève coefficients
     mfeat-fac | 216 profile correlations
     mfeat-fou | 76 Fourier coefficients of the character shapes
  17. Experimental Setup
     WEKA; 4 runs of 10-fold cross-validation.
     For SSL, 10% of the training examples (180 patterns) are randomly selected as the initial labeled data set L, while the remaining examples are used as the unlabeled data set U.
     The Random Subspace Method constructs an ensemble of ten pruned C4.5 decision trees (with Laplace correction), where each tree uses only 50% of the features.
     Pool size u = 100, sample size n = 1, and number of nearest neighbors used to estimate local competence k = 10.
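The random-subspace part of this setup, giving each ensemble member a random 50% of the feature indices, can be sketched as follows (seed handling and helper names are assumptions):

```python
# Random Subspace Method sketch: each ensemble member is trained on a
# random subset of the feature indices (here 50%, as in the slides).

import random

def random_subspaces(n_features, n_members, frac=0.5, seed=0):
    """Draw one sorted random feature-index subset per ensemble member."""
    rng = random.Random(seed)
    k = max(1, int(n_features * frac))
    return [sorted(rng.sample(range(n_features), k))
            for _ in range(n_members)]

def project(x, subspace):
    """Restrict an example to one member's feature subset."""
    return [x[i] for i in subspace]
```

For the mfeat-pix view this would mean ten trees, each seeing 120 of the 240 pixel-average features; the differing subspaces are what enforce diversity among the trees.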
  18. Experimental Results
     Comparison between forests and individual trees.
     Comparison between CoBC and Self-Training.
     Comparison between CPE and local-competence confidence measures.
     Comparison between CoBC and Co-Forest.
  19. Experimental Results [results table]
     • : significant difference under the corrected paired t-test implemented in WEKA at the 0.05 significance level.
  20. Combining QBC and CoBC
     Both semi-supervised learning and active learning tackle the same problem, but from different directions.
     QBC-then-CoBC: QBC provides CoBC with a better starting point than randomly selected labeled examples.
     QBC-with-CoBC: in QBC-then-CoBC, QBC does not benefit from CoBC; in QBC-with-CoBC, both algorithms benefit from each other.
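QBC's selection step queries the example the committee disagrees on most. The slides do not specify the disagreement measure, so the vote-entropy criterion below is an assumption chosen because it is a common choice for QBC:

```python
# Query-by-Committee selection sketch: pick the unlabeled example on
# which the committee members' votes disagree the most, measured by
# the entropy of the vote distribution.

import math

def vote_entropy(votes):
    """votes: the committee members' predicted labels for one example."""
    n = len(votes)
    ent = 0.0
    for c in set(votes):
        p = votes.count(c) / n
        ent -= p * math.log(p)
    return ent

def qbc_select(all_votes):
    """all_votes[i] = list of member votes for unlabeled example i.
    Return the index of the most disagreed-upon example."""
    return max(range(len(all_votes)),
               key=lambda i: vote_entropy(all_votes[i]))
```

This is the complementary selection rule to CoBC's: QBC asks the human expert for the examples the committee is least sure about, while CoBC self-labels the examples it is most sure about.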
  21. Experimental Results [results table]
     • : significant difference under the corrected paired t-test implemented in WEKA at the 0.05 significance level.
  22. Conclusion
     A new single-view committee-based semi-supervised learning framework is proposed. An ensemble of diverse and accurate classifiers can effectively exploit unlabeled data to improve recognition accuracy. The random subspace method not only enforces diversity but also reduces dimensionality, which is desirable when the training set is small. CoBC outperforms Self-Training. The local competence estimate is an effective confidence measure that outperforms class-probability estimates for sample selection.
  23. Future Work
     Influence of the ensemble size and the random subspace size.
     Different ensemble learners and base learners, such as SVM or kNN.
     CoBC depends only on the companion committee H_j^{(t-1)} constructed at the previous iteration to measure confidence; we will study the influence of depending on all previous versions H_j^{(t')}, t' = t-1, t-2, ..., 0.
  24. Thanks for your attention. Questions?
