1. A model-based relevance estimation approach for feature selection in microarray datasets
Gianluca Bontempi, Patrick E. Meyer
{gbonte,pmeyer}@ulb.ac.be
Machine Learning Group,
Computer Science Department
ULB, Université Libre de Bruxelles
Boulevard du Triomphe - CP 212
Bruxelles, Belgium
http://www.ulb.ac.be/di/mlg
2. Outline
• Feature selection in microarray classification tasks
• Definition of relevance
• Relevance and feature selection
• Our approach to relevance estimation: between filter and wrapper
• Experimental results
3. Feature selection in microarray data
• The availability of massive amounts of experimental data based on
genome-wide studies has given impetus in recent years to a large effort
in developing mathematical, statistical and computational techniques to
infer biological models from data.
• In many bioinformatics problems, the number of features is significantly
larger than the number of samples (high feature-to-sample ratio
datasets).
• This is typical of cancer classification tasks where a systematic
investigation of the correlation of expression patterns of thousands of
genes to specific phenotypic variations is expected to provide an
improved taxonomy of cancer.
• In this context, the number of features n corresponds to the number of expressed gene probes (up to several thousand) and the number of observations N to the number of tumor samples (typically on the order of hundreds).
• Feature selection, and consequently gene selection, is required to perform classification in such a high-dimensional task.
4. State-of-the-art
Feature selection requires an accurate assessment of a large number of
alternative subsets in terms of predictive power or relevance to the output
class.
The three main state-of-the-art approaches are:
Filters: these are preprocessing methods which assess the merits of features from the data alone, without recourse to any learning algorithm. Examples: ranking, PCA, t-test.
Wrappers: these methods rely on a learning algorithm to assess and compare
subsets of variables. They conduct a search for a good subset using the
learning algorithm itself as part of the evaluation function. Examples are
the forward/backward methods proposed in classical regression
analysis.
Embedded methods: they perform variable selection as part of the learning
procedure and are usually specific to given learning machines.
Examples are classification trees and methods based on regularization techniques (e.g., the lasso).
5. Between filters and wrappers
• Filter approaches rely on learner-independent estimators to assess the relevance of a set of features. The rationale of filter techniques is that the importance of a set of features should be independent of the prediction technique.
Our contribution: we propose a model-based strategy to assess the relevance of a set of features.
• Wrappers depend on a specific learner to assess a set of features and end up returning a quantity which confounds the relevance of a subset (the desired quantity) with the quality of the learner (not required). In other words, wrappers return a biased estimate of the relevance of a subset.
Our contribution: since the wrapper bias may have a strong negative impact on the selection procedure, we propose a model-based technique for relevance assessment which has low bias.
6. Feature selection and relevance
• Let us consider a binary classification problem where $x \in \mathcal{X} \subset \mathbb{R}^n$ and $y \in \mathcal{Y} = \{y_0, y_1\}$. Let $s \subseteq x$, $s \in S$, be a subset of the input vector.
• Let us denote
$$p_1(s) = \mathrm{Prob}\{y = y_1 \mid s\}, \qquad p_0(s) = \mathrm{Prob}\{y = y_0 \mid s\}.$$
• A feature selection problem can be formalized as a problem of (learner-independent) relevance maximization
$$s^* = \arg\max_{s \subseteq x,\; |s| \le d} R_s,$$
where the goal is to find the subset $s$ that maximizes the relevance quantity $R_s$, which accounts for the predictive power that the input $s$ has on the target $y$.
7. Relevance definitions
• A well-known example of a relevance measure is the mutual information $I(s; y) = H(y) - H(y \mid s)$.
• Here we will focus on the quantity
$$R_s = \int_S \left[p_0^2(s) + p_1^2(s)\right] dF_s(s) = \int_S r(s)\, dF_s(s),$$
where $r(s) = 1 - g(s)$ and $g(s)$ is the Gini index of diversity.
• Note that
$$r(s) = p_0^2(s) + p_1^2(s) = 1 - 2 p_0(s)\left(1 - p_0(s)\right) = 1 - 2\,\mathrm{Var}\{y \mid s\},$$
where $\mathrm{Var}\{y \mid s\}$ is the conditional variance of $y$.
• Also, a monotone function $G_H(\cdot) : [0, 1] \to [0, 0.5]$ maps the entropy $H(y \mid s)$ of a binary variable $y$ to the related Gini index $g$.
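To make these definitions concrete, here is a minimal Python sketch (the helper names are ours, not from the paper) that computes the Gini-based relevance $r$ from a conditional probability and obtains the $G_H$ mapping by numerically inverting the binary entropy:

```python
import numpy as np
from scipy.optimize import brentq

def gini_relevance(p1):
    """r = p0^2 + p1^2 = 1 - 2*p1*(1 - p1) for a binary target."""
    return 1.0 - 2.0 * p1 * (1.0 - p1)

def binary_entropy(p):
    """Entropy (in bits) of a binary variable with P(y = y1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def G_H(h):
    """Monotone map from a binary entropy h in [0, 1] to the Gini index in
    [0, 0.5]: invert H(p) on p in [0, 0.5], then return g = 2*p*(1 - p)."""
    p = brentq(lambda q: binary_entropy(q) - h, 0.0, 0.5)
    return 2.0 * p * (1.0 - p)

print(gini_relevance(0.9))       # ~0.82: a subset with P(y=y1|s)=0.9 is highly relevant
print(G_H(binary_entropy(0.9)))  # ~0.18 = 1 - 0.82, i.e. the Gini index g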
8. Bias of the wrapper approach
Given a learner $h$ trained on a dataset of size $N$, the wrapper approach translates the (learner-independent) relevance maximization problem into a (learner-dependent) minimization problem
$$\arg\min_{s \subseteq x,\; |s| \le d} M_s^h = \arg\min_{s \subseteq x,\; |s| \le d} \int_S \mathrm{MME}^h(s)\, dF_s(s),$$
where the Mean Misclassification Error is decomposed as follows (Kohavi and Wolpert, 1996):
$$\mathrm{MME}^h(s) = \frac{1}{2}\left[1 - \left(p_0^2(s) + p_1^2(s)\right)\right] + \frac{1}{2}\left[\left(p_0(s) - \hat{p}_0(s)\right)^2 + \left(p_1(s) - \hat{p}_1(s)\right)^2\right] + \frac{1}{2}\left[1 - \left(\hat{p}_0^2(s) + \hat{p}_1^2(s)\right)\right] = \frac{1}{2}\left(n(s) + b(s) + v(s)\right),$$
where $\hat{p}_0(s) = \mathrm{Prob}\{\hat{y} = y_0 \mid s\}$, $n(s) = 1 - r(s)$ is the noise variance term, $b(s)$ is the learner squared bias and $v(s)$ is the learner variance.
NB: the term $b(s)$ is NOT dependent on relevance.
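As a worked instance of the decomposition (numbers chosen purely for illustration): take $p_1(s) = 0.9$ and a learner whose predictions satisfy $\hat{p}_1(s) = 0.7$. Then
$$n(s) = 1 - (0.9^2 + 0.1^2) = 0.18, \quad b(s) = (0.1 - 0.3)^2 + (0.9 - 0.7)^2 = 0.08, \quad v(s) = 1 - (0.7^2 + 0.3^2) = 0.42,$$
so $\mathrm{MME}^h(s) = (0.18 + 0.08 + 0.42)/2 = 0.34$, which agrees with the direct computation $1 - \sum_y p(y \mid s)\,\hat{p}(y \mid s) = 1 - (0.9 \cdot 0.7 + 0.1 \cdot 0.3) = 0.34$. Only $n(s)$ reflects the relevance of $s$; $b(s)$ and $v(s)$ are properties of the learner.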
9. Bias of the wrapper approach
• In real classification tasks, the zero-one misclassification error $M_s^h$ of a learner $h$ for a subset $s$ cannot be derived analytically but only estimated (typically by cross-validation).
• A wrapper selection returns
$$s^h = \arg\min_{s \subset x,\; |s| \le d} \hat{M}_s^h, \qquad (1)$$
where $\hat{M}_s^h$ is the estimate of the misclassification error of the learner $h$ (e.g., computed by cross-validation).
If a wrapper strategy relies on a generic learner $h$, that is, a learner whose bias term $b(s)$ is significantly different from zero, the returned feature selection will depend on a quantity which is a biased estimate of the term $r(s)$, and consequently of the relevance $R_s$. In other words, wrappers do not maximize relevance.
10. Unbiased wrapper approach
• Intuitively, the bias would be reduced if we adopted a learner having a small bias term. A low-bias, yet high-variance, learner is the k-nearest neighbour classifier (kNN) for small values of k.
• In particular, it has been shown that for a 1NN learner and a binary classification problem
$$\lim_{N \to \infty} M_s^{1NN} = 1 - R_s,$$
where $M_s^{1NN}$ is the misclassification error of a nearest-neighbour classifier.
• Since cross-validation returns a consistent estimate of $M_s^{1NN}$, and since this quantity asymptotically converges to one minus the relevance $R_s$, the quantity $1 - \hat{M}_s^{1NN}$ is a consistent estimator of the relevance $R_s$.
• We then propose as relevance estimator
$$\hat{R}_s^{kNN} = 1 - \hat{M}_s^{kNN},$$
that is, one minus the cross-validation error of a kNN learner with a low k. This term returns a low-bias, yet high-variance, estimate of the relevance of the subset $s$.
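A minimal sketch of the $\hat{R}_s^{kNN}$ estimator using scikit-learn (the function name, the internal number of cross-validation folds and the toy data are our illustrative choices):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def knn_relevance(X_s, y, k=1, cv=10):
    """R_hat^kNN_s = 1 - cross-validated misclassification error of a low-k
    kNN classifier restricted to the columns of the candidate subset s."""
    knn = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(knn, X_s, y, cv=cv, scoring="accuracy").mean()

# Toy example (synthetic data, for illustration only):
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))           # 100 samples, 50 "genes"
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only the first two genes matter
print(knn_relevance(X[:, [0, 1]], y, k=1))    # high relevance
print(knn_relevance(X[:, [10, 11]], y, k=1))  # ~0.5, chance level
```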
11. Reducing the variance of the estimator
The low-bias, high-variance nature of the $\hat{R}_s^{kNN}$ estimator suggests that the best way to employ this estimator is by combining it with other relevance estimators.
We will take into consideration two possible estimators to combine with:
1. a direct model-based estimator $\hat{p}_1$ of the conditional probability $p_1(s) = \mathrm{Prob}\{y = y_1 \mid s\}$, and consequently of the quantity $r(s)$. This estimator first samples a set of $N$ unclassified input vectors $s_i$ according to the empirical distribution $\hat{F}_s$ and then computes the Monte Carlo estimate
$$\hat{R}_s^D = \frac{1}{N} \sum_{i=1}^N \left[\hat{p}_1^2(s_i) + \hat{p}_0^2(s_i)\right] = 1 - \frac{2}{N} \sum_{i=1}^N \hat{p}_1(s_i)\left(1 - \hat{p}_1(s_i)\right).$$
A similar estimator was proposed by Fukunaga in 1973 to estimate the Bayes error.
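A sketch of the direct estimator $\hat{R}_s^D$, assuming $\hat{p}_1$ is supplied by a probabilistic classifier (Gaussian naive Bayes here, purely as an example) and that the $s_i$ are taken as the empirical sample of the subset itself:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def direct_relevance(X_s, y):
    """Monte Carlo estimate R_hat^D_s = 1 - (2/N) * sum_i p1_hat(s_i)*(1 - p1_hat(s_i)),
    with p1_hat evaluated at the N points s_i of the empirical distribution."""
    p1 = GaussianNB().fit(X_s, y).predict_proba(X_s)[:, 1]
    return 1.0 - 2.0 * np.mean(p1 * (1.0 - p1))
```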
12. Reducing the variance of the estimator (cont.)
2. a filter estimator based on the notion of mutual information: several filter algorithms exploit this notion in order to estimate the relevance. An example is the MRMR algorithm (Peng et al., 2005), where the relevance of a feature subset $s$, expressed in terms of the mutual information $I(s; y) = H(y) - H(y \mid s)$, is approximated by the incremental formulation
$$I_{MRMR}(s; y) = I_{MRMR}(s_i; y) + I(x_i; y) - \frac{1}{m - 1} \sum_{x_j \in s_i} I(x_j; x_i), \qquad (2)$$
where $x_i$ is a feature belonging to the subset $s$, $s_i$ is the set $s$ with the feature $x_i$ set aside, and $m$ is the number of components of $s$. Now, since $H(y \mid s) = H(y) - I(s; y)$ and $G_s = 1 - R_s = G_H(H(y \mid s))$, we obtain that
$$\hat{R}_s^{MRMR} = 1 - G_H\left(H(y) - I_{MRMR}(s; y)\right)$$
is an MRMR estimator of the relevance $R_s$, where $G_H(\cdot)$ is the monotone mapping between the entropy $H$ and the Gini index.
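A sketch of the incremental score of Eq. (2), assuming expression values have been discretized so that mutual information can be estimated by plug-in counts (sklearn's mutual_info_score; the helper name and the greedy accumulation order are ours):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr_score(X_disc, y, subset):
    """I_MRMR(s; y) of Eq. (2), accumulated by adding the features of `subset`
    one at a time: each new feature x_i contributes I(x_i; y) minus the mean
    mutual information between x_i and the features already included."""
    score, chosen = 0.0, []
    for i in subset:
        redundancy = (np.mean([mutual_info_score(X_disc[:, j], X_disc[:, i])
                               for j in chosen]) if chosen else 0.0)
        score += mutual_info_score(X_disc[:, i], y) - redundancy
        chosen.append(i)
    return score
```

The full estimator then follows as $\hat{R}_s^{MRMR} = 1 - G_H(\hat{H}(y) - I_{MRMR}(s; y))$, reusing the $G_H$ helper sketched earlier (taking care to express the entropy and the mutual information in the same log base).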
13. Proposed relevance estimators
We propose two novel relevance estimators based on the principle of averaging,
$$\hat{R}_s' = \frac{\hat{R}_s^{CV} + \hat{R}_s^D}{2}, \qquad \hat{R}_s'' = \frac{\hat{R}_s^{CV} + \hat{R}_s^{MRMR}}{2},$$
where $\hat{R}_s^{CV}$ is the cross-validated kNN estimator $\hat{R}_s^{kNN}$, and the associated feature selection algorithms:
$$s^{R'} = \arg\max_{s \subset x,\; |s| \le d} \hat{R}_s', \qquad s^{R''} = \arg\max_{s \subset x,\; |s| \le d} \hat{R}_s''.$$
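A sketch of the averaged estimator together with a greedy forward search for the associated arg max (the forward heuristic is our illustrative choice, not fixed by the paper; it reuses the knn_relevance and direct_relevance helpers sketched earlier):

```python
def combined_relevance(X, y, subset, k=1):
    """Averaged estimator R' = (R_hat^CV + R_hat^D) / 2 on a candidate subset."""
    X_s = X[:, subset]
    return 0.5 * (knn_relevance(X_s, y, k=k) + direct_relevance(X_s, y))

def forward_selection(X, y, d, k=1):
    """Greedily grow s up to size d, at each step adding the feature that
    maximizes the averaged relevance R'."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(d):
        best = max(remaining,
                   key=lambda i: combined_relevance(X, y, selected + [i], k=k))
        selected.append(best)
        remaining.remove(best)
    return selected
```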
14. Experimental session
• 20 public-domain microarray expression datasets
• an external three-fold cross-validation scheme
• To avoid any dependency between the learning algorithm employed by the wrapper and the classifier used for prediction, the experimental session is composed of two parts:
• Part 1: comparison with the wrapper WSVM, using the set of classifiers C1 = {TREE, NB, SVMSIGM, LDA, LOG}, which does not include the SVMLIN learner;
• Part 2: comparison with the wrapper WNB, using the set of classifiers C2 = {TREE, SVMSIGM, SVMLIN, LDA, LOG}, which does not include the NB learner.
15. Experiments with cancer datasets
Name            N (samples)   n (features)   K (classes)
Golub           72            7129           2
Alon            62            2000           2
Notterman       36            7457           2
Nutt            50            12625          2
Shipp           77            7129           2
Singh           102           12600          2
Sorlie          76            7937           2
Wang            286           22283          2
Van’t Veer      65            24481          2
VandeVijver     295           24496          2
Sotiriou        99            7650           2
Pomeroy         60            7129           2
Khan            63            2308           4
Hedenfalk       22            3226           3
West            49            7129           4
Staunton        60            7129           9
Su              174           12533          11
Bhattacharjee   203           12600          5
Armstrong       72            12582          3
Ma              60            22575          3
18. Conclusions
• Feature selection demands an accurate estimation of the relevance of subsets of features.
• Wrapper methods use a cross-validation estimate of the misclassification error of generic learners. We show that this yields a biased estimate of relevance.
• The cross-validation assessment $\hat{R}_s^{kNN}$ returned by kNN techniques with low k provides a low-bias, yet high-variance, estimator of relevance.
• The variance can be reduced by combining it with other estimators.
• Experiments on real datasets showed that the resulting relevance estimators can outperform both conventional wrapper and filter algorithms.