A model-based relevance estimation
approach for feature selection in
microarray datasets
Gianluca Bontempi, Patrick E. Meyer
{gbonte,pmeyer}@ulb.ac.be
Machine Learning Group,
Computer Science Department
ULB, Université Libre de Bruxelles
Boulevard du Triomphe - CP 212
Bruxelles, Belgium
http://www.ulb.ac.be/di/mlg
A model-based relevance estimation approach for feature selection in microarray datasets – p. 1/1
Outline
• Feature selection in microarray classification tasks
• Definition of relevance
• Relevance and feature selection
• Our approach to relevance estimation: between filter and wrapper
• Experimental results
Feature selection in microarray
• The availability of massive amounts of experimental data based on
genome-wide studies has given impetus in recent years to a large effort
in developing mathematical, statistical and computational techniques to
infer biological models from data.
• In many bioinformatics problems, the number of features is significantly
larger than the number of samples (high feature-to-sample ratio
datasets).
• This is typical of cancer classification tasks where a systematic
investigation of the correlation of expression patterns of thousands of
genes to specific phenotypic variations is expected to provide an
improved taxonomy of cancer.
• In this context, the number of features n corresponds to the number of
expressed gene probes (up to several thousands) and the number of
observations N to the number of tumor samples (typically in the order of
hundreds).
• Feature selection, and consequently gene selection, is required to
perform classification in such a high-dimensional task.
State-of-the-art
Feature selection requires an accurate assessment of a large number of
alternative subsets in terms of predictive power or relevance to the output
class.
Three main state-of-the-art approaches are
Filters: these are preprocessing methods which assess the merits of features
from the data without having recourse to any learning algorithm.
Examples: ranking, PCA, t-test.
Wrappers: these methods rely on a learning algorithm to assess and compare
subsets of variables. They conduct a search for a good subset using the
learning algorithm itself as part of the evaluation function. Examples are
the forward/backward methods proposed in classical regression
analysis.
Embedded methods: they perform variable selection as part of the learning
procedure and are usually specific to given learning machines.
Examples are classification trees and methods based on regularization
techniques (e.g. the lasso).
Between filters and wrappers
• Filter approaches rely on learner-independent estimators to assess the
relevance of a set of features. The rationale of filter techniques is that
the importance of a set of features should be independent of the
prediction technique.
Our contribution: we propose a model-based strategy to assess
the relevance of a set of features.
• Wrappers depend on a specific learner to assess a set of features and
end up returning a quantity which confounds the relevance of a
subset (the desired quantity) with the quality of the learner (not required). In
other words, wrappers return a biased estimate of the relevance of a
subset.
Our contribution: since wrapper bias may have a strong negative
impact on the selection procedure we propose a model-based
technique for relevance assessment which is low-biased.
Feature selection and relevance
• Let us consider a binary classification problem where x ∈ X ⊂ R^n and
y ∈ Y = {y0, y1}. Let s ⊆ x, s ∈ S, be a subset of the input vector.
• Let us denote
p1(s) = Prob {y = y1|s}
p0(s) = Prob {y = y0|s}
• The feature selection problem can be formalized as a problem of (learner-
independent) relevance maximization
s* = arg max_{s ⊆ x, |s| ≤ d} R_s
where the goal is to find the subset s that maximizes the relevance
quantity R_s, which accounts for the predictive power that the input s has
on the target y.
Relevance definitions
• A well known example of relevance measure is mutual information
I(s, y) = H(y) − H(y|s).
• Here we will focus on the quantity
R_s = ∫_S [p0(s)^2 + p1(s)^2] dF_s(s) = ∫_S r(s) dF_s(s)
where r(s) = 1 − g(s) and g(s) is the Gini index of diversity.
• Note that
r(s) = p0(s)^2 + p1(s)^2 = 1 − 2 p0(s)(1 − p0(s)) = 1 − 2 Var{y|s}
where Var{y|s} is the conditional variance of y.
• Also, a monotone function G_H(·) : [0, 1] → [0, 0.5] maps the entropy
H(y|s) of a binary variable y to the related Gini index g.
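These quantities follow directly from the conditional class probabilities; a minimal sketch (the helper names are ours):

```python
import numpy as np

def gini_diversity(p0):
    """Gini index of diversity g(s) = 2 p0 (1 - p0) for a binary target."""
    p0 = np.asarray(p0, dtype=float)
    return 2.0 * p0 * (1.0 - p0)

def pointwise_relevance(p0):
    """r(s) = p0^2 + p1^2 = 1 - g(s) = 1 - 2 Var{y|s}."""
    return 1.0 - gini_diversity(p0)

# r(s) is 1 when the class is certain (p0 in {0, 1}) and 0.5 when p0 = 0.5
print(pointwise_relevance(np.array([0.0, 0.5, 0.9])))  # values 1.0, 0.5, 0.82
```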
Bias of the wrapper approach
Given a learner h trained on a dataset of size N, the wrapper approach
translates the (learner-independent) relevance maximization problem into a
(learner-dependent) minimization problem
arg min_{s ⊆ x, |s| ≤ d} M^h_s = arg min_{s ⊆ x, |s| ≤ d} ∫_S MME_h(s) dF_s(s)
where the Mean Misclassification Error is decomposed as follows (Wolpert,
Kohavi, 96)
MME_h(s) = 1/2 [1 − (p0(s)^2 + p1(s)^2)] +
           1/2 [(p0(s) − ˆp0(s))^2 + (p1(s) − ˆp1(s))^2] +
           1/2 [1 − (ˆp0(s)^2 + ˆp1(s)^2)]
         = 1/2 (n(s) + b(s) + v(s))
where ˆp0 = Prob {ˆy = y0|s}, n(s) = 1 − r(s) is the noise variance term, b(s) is
the learner squared bias and v(s) is the learner variance.
NB: the term b(s) does NOT depend on the relevance.
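The decomposition can be checked numerically at a single point s: for a stochastic classifier that draws ˆy from (ˆp0, ˆp1), the misclassification probability is ˆp0 p1 + ˆp1 p0, which matches (n + b + v)/2. The probability values below are arbitrary illustrations:

```python
import numpy as np

p0, phat0 = 0.8, 0.6           # true and estimated Prob{y = y0 | s} (example values)
p1, phat1 = 1 - p0, 1 - phat0

n = 1 - (p0**2 + p1**2)                         # noise term: 1 - r(s)
b = (p0 - phat0)**2 + (p1 - phat1)**2           # learner squared bias
v = 1 - (phat0**2 + phat1**2)                   # learner variance

mme_direct = phat0 * p1 + phat1 * p0            # misclassification probability
assert np.isclose(mme_direct, (n + b + v) / 2)  # the decomposition is exact
print(mme_direct)
```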
Bias of the wrapper approach
• In real classification tasks, the one-zero misclassification error M^h_s of a
learner h for a subset s cannot be derived analytically but only estimated
(typically by cross-validation).
• A wrapper selection returns
s^h = arg min_{s ⊂ x, |s| ≤ d} ˆM^h_s (1)
where ˆM^h_s is the estimate of the misclassification error of the learner h (e.g.
computed by cross-validation).
If a wrapper strategy relies on a generic learner h, that is a learner
where the bias term b(s) is significantly different from zero, the
returned feature selection will depend on a quantity which is a
biased estimate of the term r(s) and consequently of the relevance
R_s. In other words, wrappers do not maximize relevance.
Unbiased wrapper approach
• Intuitively, the bias would be reduced if we adopted a learner having a
small bias term. A low-bias, yet high-variance, learner is the k-nearest
neighbour (kNN) classifier for small values of k.
• In particular, it has been shown that for a 1NN learner and a binary
classification problem
lim_{N→∞} M^1NN_s = 1 − R_s
where M^1NN_s is the misclassification error of a nearest neighbour classifier.
• Since cross-validation returns a consistent estimate of M^h_s, and since
the quantity M^1NN_s asymptotically converges to one minus the relevance
R_s, we have that 1 − ˆM^1NN_s is a consistent estimator of the relevance R_s.
• We propose then as relevance estimator
ˆR^kNN_s = 1 − ˆM^kNN_s
which is one minus the cross-validation error of a kNN learner with low k. This term
returns a low-bias, yet high-variance, estimate of the relevance of the
subset s.
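A sketch of this estimator, using leave-one-out instead of K-fold cross-validation for simplicity (function and variable names are ours):

```python
import numpy as np

def knn_cv_relevance(X, y, k=1):
    """R_hat^kNN = 1 - leave-one-out error of a k-nearest-neighbour classifier."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(D, np.inf)              # leave-one-out: exclude the point itself
    nn = np.argsort(D, axis=1)[:, :k]        # indices of the k nearest neighbours
    pred = (y[nn].mean(axis=1) > 0.5).astype(int)  # majority vote, binary labels
    return 1.0 - np.mean(pred != y)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X_rel = y[:, None] + 0.3 * rng.standard_normal((200, 2))  # informative subset
X_irr = rng.standard_normal((200, 2))                     # irrelevant subset
print(knn_cv_relevance(X_rel, y), knn_cv_relevance(X_irr, y))
```

On this toy example the informative subset receives a relevance estimate close to 1, the irrelevant one close to 0.5, consistently with the bounds on r(s).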
Reducing the variance of the estimator
The low-bias, high-variance nature of the ˆR^kNN_s estimator suggests that the
best way to employ it is to combine it with other relevance
estimators.
We will take into consideration two possible estimators to combine with:
1. a direct model-based estimator ˆp1 of the conditional probability
p1(s) = Prob {y = y1|s}, and consequently of the quantity r(s).
This estimator first samples a set of N unclassified input vectors s_i
according to the empirical distribution ˆF_s and then computes the Monte
Carlo estimate
ˆR^D_s = (1/N) Σ_{i=1}^N [ˆp1(s_i)^2 + ˆp0(s_i)^2] = 1 − (2/N) Σ_{i=1}^N ˆp1(s_i)(1 − ˆp1(s_i))
A similar estimator was proposed by Fukunaga in 1973 to estimate the
Bayes error.
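A sketch of the Monte Carlo computation, with a hypothetical conditional-probability model standing in for ˆp1:

```python
import numpy as np

def direct_relevance(p1_hat, S):
    """Monte Carlo estimate R_hat^D = 1 - (2/N) * sum p1_hat(s_i)(1 - p1_hat(s_i)),
    with the s_i drawn from the empirical distribution (here: the rows of S)."""
    p1 = np.asarray([p1_hat(s) for s in S])
    return 1.0 - 2.0 * np.mean(p1 * (1.0 - p1))

# toy conditional-probability model (hypothetical): logistic in the first feature
p1_hat = lambda s: 1.0 / (1.0 + np.exp(-4.0 * s[0]))
S = np.random.default_rng(1).standard_normal((500, 3))
print(direct_relevance(p1_hat, S))
```

Since p(1 − p) ≤ 1/4, the estimate always falls in [0.5, 1], like R_s itself.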
2. a filter estimator based on the notion of mutual information: several filter
algorithms exploit this notion in order to estimate the relevance. An
example is the MRMR algorithm (Peng et al., 05), where the relevance of
a feature subset s, expressed in terms of the mutual information
I(s; y) = H(y) − H(y|s), is approximated by the incremental formulation
I_MRMR(s; y) = I_MRMR(s_i; y) + I(x_i; y) − (1/(m−1)) Σ_{x_j ∈ s_i} I(x_j; x_i) (2)
where x_i is a feature belonging to the subset s, s_i is the set s with the
feature x_i set aside and m is the number of components of s. Now since
H(y|s) = H(y) − I(s; y) and G_s = 1 − R_s = G_H(H(y|s)), we obtain that
ˆR^MRMR_s = 1 − G_H(H(y) − I_MRMR(s; y))
is an MRMR estimator of the relevance R_s, where G_H(·) is the monotone
mapping between entropy and Gini index.
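A sketch of the incremental MRMR term, with a plug-in mutual-information estimate for discrete features (helper names are ours):

```python
import numpy as np
from collections import Counter

def mutual_info(a, b):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * np.log((c / n) / (pa[u] / n * pb[v] / n))
               for (u, v), c in pab.items())

def mrmr_score(selected, y, candidate):
    """Incremental MRMR term: I(x_i; y) - (1/(m-1)) * sum_j I(x_j; x_i),
    scoring the addition of `candidate` to the already selected features."""
    relevance = mutual_info(candidate, y)
    if not selected:
        return relevance
    redundancy = np.mean([mutual_info(f, candidate) for f in selected])
    return relevance - redundancy
```

A feature identical to the target scores log 2 nats of relevance; adding it a second time scores 0, since its redundancy with the already selected copy cancels the relevance term.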
Proposed relevance estimators
We propose two novel relevance estimators based on the principle of
averaging
ˆR'_s = (ˆR^kNN_s + ˆR^D_s) / 2,   ˆR''_s = (ˆR^kNN_s + ˆR^MRMR_s) / 2
and the associated feature selection algorithms:
s^R' = arg max_{s ⊂ x, |s| ≤ d} ˆR'_s,   s^R'' = arg max_{s ⊂ x, |s| ≤ d} ˆR''_s
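The averaging step and the subset search can be sketched as follows; for illustration the search is exhaustive over subsets of size at most d, which is feasible only for a handful of features (in practice a forward search is used), and the toy estimator is hypothetical:

```python
import numpy as np
from itertools import combinations

def combined_relevance(r_knn, r_other):
    """R_hat' = (R_hat^kNN + R_hat^other) / 2 : averaging trades bias for variance."""
    return 0.5 * (r_knn + r_other)

def select_subset(n_features, d, estimator):
    """s* = arg max over subsets s with |s| <= d of estimator(s), exhaustively."""
    best, best_r = None, -np.inf
    for m in range(1, d + 1):
        for s in combinations(range(n_features), m):
            r = estimator(s)
            if r > best_r:
                best, best_r = s, r
    return best, best_r

# toy relevance estimator (hypothetical): features 0 and 2 are jointly relevant
toy = lambda s: 0.9 if set(s) == {0, 2} else 0.6
print(select_subset(4, 2, toy))  # ((0, 2), 0.9)
```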
Experimental session
• 20 public domain microarray expression datasets
• external three-fold cross-validation scheme
• to avoid any dependency between the learning algorithm employed by
the wrapper and the classifier used for prediction, the experimental
session is composed of two parts:
• Part 1: comparison with the wrapper WSVM and we use the set of
classifiers C1 ={TREE, NB, SVMSIGM, LDA, LOG} which does not
include the SVMLIN learner,
• Part 2: comparison with the wrapper WNB and we use the set of
classifiers C2 ={TREE, SVMSIGM, SVMLIN, LDA, LOG} which does
not include the NB learner.
Experiments with cancer datasets
Name N (samples) n (features) K (classes)
Golub 72 7129 2
Alon 62 2000 2
Notterman 36 7457 2
Nutt 50 12625 2
Shipp 77 7129 2
Singh 102 12600 2
Sorlie 76 7937 2
Wang 286 22283 2
Van’t Veer 65 24481 2
VandeVijver 295 24496 2
Sotiriou 99 7650 2
Pomeroy 60 7129 2
Khan 63 2308 4
Hedenfalk 22 3226 3
West 49 7129 4
Staunton 60 7129 9
Su 174 12533 11
Bhattacharjee 203 12600 5
Armstrong 72 12582 3
Ma 60 22575 3
Results 1st part
Name R’ WSVM R” MRMR RANK
Golub 0.0917 0.1177 0.1 0.1079 0.1225
Alon 0.2704 0.2658 0.2267 0.1996 0.2281
Notterman 0.1966 0.0985 0.1494 0.1472 0.1432
Nutt 0.3798 0.4171 0.3873 0.3847 0.4189
Shipp 0.1429 0.1319 0.1322 0.1362 0.1873
Singh 0.1619 0.1517 0.1266 0.1374 0.1328
Sorlie 0.3835 0.4314 0.3963 0.4004 0.3987
Wang 0.4282 0.4111 0.4218 0.4232 0.4181
Van’t Veer 0.2786 0.2638 0.2492 0.2217 0.2277
VandeVijver 0.454 0.4724 0.4365 0.4636 0.4482
Sotiriou 0.5279 0.5796 0.5351 0.5708 0.5339
Pomeroy 0.428 0.4191 0.4141 0.3876 0.4181
Khan 0.0878 0.1143 0.0582 0.0686 0.131
Hedenfalk 0.5475 0.5263 0.452 0.5273 0.5389
West 0.6463 0.6109 0.6186 0.5746 0.6109
Staunton 0.6822 0.71 0.6511 0.6865 0.7407
Su 0.2568 0.307 0.2549 0.3772 0.3352
Bhattacharjee 0.1232 0.1347 0.1105 0.1057 0.1515
Armstrong 0.1082 0.1199 0.1306 0.115 0.1122
Ma 0.2456 0.2041 0.2257 0.2413 0.2317
AVG 0.323 0.331 0.310 0.326 0.331
W/B than R’ (R”) 10/7 9/6 9/2
Results 2nd part
Name R’ WNB R” MRMR RANK
Golub 0.0886 0.1114 0.0971 0.1019 0.0904
Alon 0.2376 0.2568 0.2181 0.2109 0.221
Notterman 0.1852 0.2059 0.1491 0.1512 0.1645
Nutt 0.3929 0.3402 0.36 0.3898 0.4258
Shipp 0.1261 0.127 0.1198 0.1338 0.1734
Singh 0.1495 0.1454 0.1297 0.1377 0.1245
Sorlie 0.3848 0.4254 0.3808 0.3953 0.3838
Wang 0.4363 0.4345 0.4298 0.4281 0.4255
Van’t Veer 0.2747 0.2715 0.2421 0.2253 0.2325
VandeVijver 0.4626 0.44 0.4763 0.4721 0.4358
Sotiriou 0.5126 0.5578 0.5505 0.5732 0.5611
Pomeroy 0.4367 0.4389 0.4007 0.3902 0.4224
Khan 0.0804 0.0896 0.0628 0.0631 0.0901
Hedenfalk 0.5379 0.5187 0.4369 0.4904 0.4949
West 0.6413 0.6696 0.5542 0.5882 0.6728
Staunton 0.6689 0.8298 0.6981 0.6661 0.83
Su 0.2544 0.3096 0.2646 0.3739 0.3529
Bhattacharjee 0.1235 0.1209 0.101 0.1061 0.1186
Armstrong 0.1079 0.1668 0.125 0.1148 0.1034
Ma 0.2565 0.2635 0.2335 0.2443 0.2681
AVG 0.322 0.3335 0.315 0.327 0.331
W/B than R’ (R”) 9/2 10/3 11/2
Conclusions
• Feature selection demands accurate estimation of relevance of subsets
of features.
• Wrapper methods use cross-validation estimates of the misclassification
error of generic learners. We show that this yields a biased
estimation of relevance.
• The cross-validation assessment ˆR^kNN_s returned by kNN techniques
with low k provides a low-bias yet high-variance estimator of relevance.
• The variance can be reduced by combining this estimator with others.
• Experiments on real datasets showed that the resulting relevance
estimator can outperform both conventional wrapper and filter
algorithms.
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
janvikumar4133
 
History and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big DataHistory and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big Data
Jongwook Woo
 
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
6459astrid
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
uapta
 
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
tanupasswan6
 
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
NABLAS株式会社
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
45unexpected
 
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
gargnatasha985
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
kinni singh$A17
 
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
bhupeshkumar0889
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
AnujaGaikwad28
 
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
fatima shekh$A17
 
Data Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learnersData Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learners
mohamed Ibrahim
 
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
vrvipin164
 
Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)
Alireza Kamrani
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
ginni singh$A17
 
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion dataTowards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Samuel Jackson
 
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
sheetal singh$A17
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
revolutionary575
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
huseindihon
 

Recently uploaded (20)

Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
 
History and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big DataHistory and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big Data
 
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
Premium Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl Ser...
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
 
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
 
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
 
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Vadodara 000XX00000 Provide Best And Top Girl Service And No1 in City
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
 
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
 
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
 
Data Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learnersData Preprocessing Cheatsheet for learners
Data Preprocessing Cheatsheet for learners
 
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
Coimbatore Girls call Service 000XX00000 Provide Best And Top Girl Service An...
 
Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)Oracle Database Desupported Features on 23ai (Part B)
Oracle Database Desupported Features on 23ai (Part B)
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
 
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion dataTowards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data
 
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
 

A model-based relevance estimation approach for feature selection in microarray datasets

• 1. A model-based relevance estimation approach for feature selection in microarray datasets
Gianluca Bontempi, Patrick E. Meyer
{gbonte,pmeyer}@ulb.ac.be
Machine Learning Group, Computer Science Department
ULB, Université Libre de Bruxelles
Boulevard de Triomphe - CP 212, Bruxelles, Belgium
http://www.ulb.ac.be/di/mlg
• 2. Outline
• Feature selection in microarray classification tasks
• Definition of relevance
• Relevance and feature selection
• Our approach to relevance estimation: between filter and wrapper
• Experimental results
• 3. Feature selection in microarray
• The availability of massive amounts of experimental data from genome-wide studies has given impetus in recent years to a large effort in developing mathematical, statistical and computational techniques to infer biological models from data.
• In many bioinformatics problems, the number of features is significantly larger than the number of samples (high feature-to-sample ratio datasets).
• This is typical of cancer classification tasks, where a systematic investigation of the correlation of the expression patterns of thousands of genes with specific phenotypic variations is expected to provide an improved taxonomy of cancer.
• In this context, the number of features n corresponds to the number of expressed gene probes (up to several thousands) and the number of observations N to the number of tumor samples (typically in the order of hundreds).
• Feature selection, and consequently gene selection, is required to perform classification in such a high-dimensional task.
• 4. State-of-the-art
Feature selection requires an accurate assessment of a large number of alternative subsets in terms of predictive power, or relevance to the output class. Three main state-of-the-art approaches are:
• Filters: preprocessing methods which assess the merits of features from the data without having recourse to any learning algorithm. Examples: ranking, PCA, t-test.
• Wrappers: methods that rely on a learning algorithm to assess and compare subsets of variables. They conduct a search for a good subset using the learning algorithm itself as part of the evaluation function. Examples are the forward/backward methods proposed in classical regression analysis.
• Embedded methods: these perform variable selection as part of the learning procedure and are usually specific to a given learning machine. Examples are classification trees and methods based on regularization techniques (e.g. lasso).
• 5. Between filters and wrappers
• Filter approaches rely on learner-independent estimators to assess the relevance of a set of features. The rationale of filter techniques is that the importance of a set of features should be independent of the prediction technique. Our contribution: we propose a model-based strategy to assess the relevance of a set of features.
• A wrapper depends on a specific learner to assess a set of features and ends up returning a quantity which confounds the relevance of a subset (the desired quantity) with the quality of the learner (not required). In other terms, a wrapper returns a biased estimate of the relevance of a subset. Our contribution: since wrapper bias may have a strong negative impact on the selection procedure, we propose a model-based technique for relevance assessment which is low-biased.
• 6. Feature selection and relevance
• Let us consider a binary classification problem where x ∈ X ⊂ R^n and y ∈ Y = {y0, y1}. Let s ⊆ x, s ∈ S, be a subset of the input vector.
• Let us denote
p1(s) = Prob{y = y1 | s},   p0(s) = Prob{y = y0 | s}
• A feature selection problem can be formalized as a problem of (learner-independent) relevance maximization
s* = arg max_{s ⊆ x, |s| ≤ d} R_s
where the goal is to find the subset s that maximizes the relevance quantity R_s, which accounts for the predictive power that the input s has on the target y.
• 7. Relevance definitions
• A well-known example of a relevance measure is the mutual information I(s, y) = H(y) − H(y|s).
• Here we will focus on the quantity
R_s = ∫_S [p0(s)^2 + p1(s)^2] dF_s(s) = ∫_S r(s) dF_s(s)
where r(s) = 1 − g(s) and g(s) is the Gini index of diversity.
• Note that
r(s) = p0(s)^2 + p1(s)^2 = 1 − 2 p0(s)(1 − p0(s)) = 1 − 2 Var{y|s}
where Var{y|s} is the conditional variance of y.
• Also, a monotone function G_H(·) : [0, 1] → [0, 0.5] maps the entropy H(y|s) of a binary variable y to the related Gini index g.
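The quantities above, r(s) = 1 − 2 p0(1 − p0) and the monotone map G_H between binary entropy and the Gini index, can be sketched numerically. The following Python helpers (hypothetical names, not part of the original slides; binary case only) invert p → H(p) on [0, 0.5] by interpolation:

```python
import numpy as np

def gini_relevance(p1):
    """r(s) = p0^2 + p1^2 = 1 - 2*p0*(1 - p0) for a binary class posterior."""
    p0 = 1.0 - p1
    return p0 ** 2 + p1 ** 2

def binary_entropy(p1):
    """H (in bits) of a binary variable with P(y = y1) = p1."""
    p = np.clip(np.asarray(p1, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def entropy_to_gini(h, grid=np.linspace(0.0, 0.5, 10001)):
    """G_H: map an entropy value in [0, 1] to the Gini index in [0, 0.5]
    by inverting p -> H(p) on [0, 0.5] (entropy is monotone there) and
    evaluating g = 2 p (1 - p)."""
    p = np.interp(h, binary_entropy(grid), grid)
    return 2 * p * (1 - p)
```

For example, an uninformative subset (p1 = 0.5) gives the minimal relevance r = 0.5 and the maximal Gini index 0.5.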
• 8. Bias of the wrapper approach
Given a learner h trained on a dataset of size N, the wrapper approach translates the (learner-independent) relevance maximization problem into a (learner-dependent) minimization problem
arg min_{s ⊆ x, |s| ≤ d} M_s^h = arg min_{s ⊆ x, |s| ≤ d} ∫_S MME_h(s) dF_s(s)
where the Mean Misclassification Error is decomposed as follows (Wolpert, Kohavi, 96):
MME_h(s) = (1/2) [1 − (p0(s)^2 + p1(s)^2)]
         + (1/2) [(p0(s) − p̂0(s))^2 + (p1(s) − p̂1(s))^2]
         + (1/2) [1 − (p̂0(s)^2 + p̂1(s)^2)]
         = (1/2) (n(s) + b(s) + v(s))
where p̂0(s) = Prob{ŷ = y0 | s}, n(s) = 1 − r(s) is the noise variance term, b(s) is the learner squared bias and v(s) is the learner variance. NB: the term b(s) is NOT dependent on relevance.
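The decomposition can be checked numerically at a fixed input s: for a classifier whose prediction ŷ is drawn from p̂ independently of y given s, the zero-one error p0·p̂1 + p1·p̂0 equals (n + b + v)/2. A small sketch (helper names are illustrative, not from the slides):

```python
def mme_terms(p0, phat0):
    """Noise, squared-bias and variance terms of the Wolpert-Kohavi
    zero-one error decomposition at a fixed s (binary case, p1 = 1 - p0)."""
    p1, phat1 = 1.0 - p0, 1.0 - phat0
    n = 1.0 - (p0 ** 2 + p1 ** 2)              # noise term: 1 - r(s)
    b = (p0 - phat0) ** 2 + (p1 - phat1) ** 2  # learner squared bias
    v = 1.0 - (phat0 ** 2 + phat1 ** 2)        # learner variance
    return n, b, v

def mme_direct(p0, phat0):
    """P(yhat != y | s) when yhat ~ phat, independently of y given s."""
    return p0 * (1.0 - phat0) + (1.0 - p0) * phat0
```

With p0 = 0.7 and p̂0 = 0.4, both routes give the same error, and only b(s) changes when the learner improves while the relevance term n(s) stays fixed.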
• 9. Bias of the wrapper approach
• In real classification tasks, the zero-one misclassification error M_s^h of a learner h for a subset s cannot be derived analytically but only estimated (typically by cross-validation).
• A wrapper selection returns
s_h = arg min_{s ⊆ x, |s| ≤ d} M̂_s^h   (1)
where M̂_s^h is the estimate of the misclassification error of the learner h (e.g. computed by cross-validation).
• If a wrapper strategy relies on a generic learner h, that is a learner where the bias term b(s) is significantly different from zero, the returned feature selection will depend on a quantity which is a biased estimate of the term r(s) and consequently of the relevance R_s. In other words, wrappers do not maximize relevance.
• 10. Unbiased wrapper approach
• Intuitively, the bias would be reduced if we adopted a learner having a small bias term. A low-bias, yet high-variance, learner is the k-nearest neighbour classifier (kNN) for small values of k.
• In particular, it has been shown that for a 1NN learner and a binary classification problem
lim_{N→∞} M_s^1NN = 1 − R_s
where M_s^1NN is the misclassification error of a nearest neighbour classifier.
• Since cross-validation returns a consistent estimate of M^h, and since the quantity M^h asymptotically converges to one minus the relevance R_s, we have that 1 − M̂_s^1NN is a consistent estimator of the relevance R_s.
• We then propose as relevance estimator
R̂_s^kNN = 1 − M̂_s^kNN
which is one minus the cross-validation error of a kNN learner with low k. This term returns a low-bias, yet high-variance, estimate of the relevance of the subset s.
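A minimal sketch of the R̂_s^kNN estimator, using leave-one-out rather than k-fold cross-validation for simplicity (labels assumed to be 0/1; names are illustrative, not the authors' implementation):

```python
import numpy as np

def knn_relevance(X, y, k=1):
    """Leave-one-out kNN error on (X, y), returned as a relevance
    estimate R_hat = 1 - LOO error (low bias, high variance for small k)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    # pairwise squared Euclidean distances between all rows of X
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)        # exclude each point from its own neighbours
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest neighbours
    # majority vote among neighbour labels (binary 0/1 labels assumed)
    votes = (y[nn].mean(axis=1) >= 0.5).astype(int)
    return 1.0 - np.mean(votes != y)
```

On two well-separated clusters the LOO error is zero and the estimated relevance is 1; on a subset carrying no class information it drops toward 0.5.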
• 11. Reducing the variance of the estimator
The low-bias, high-variance nature of the R̂_s^kNN estimator suggests that the best way to employ it is to combine it with other relevance estimators. We will take into consideration two possible estimators to combine with:
1. A direct model-based estimator p̂1 of the conditional probability p1(s) = Prob{y = y1 | s}, and consequently of the quantity r(s). This estimator first samples a set of N unclassified input vectors s_i according to the empirical distribution F̂_s and then computes the Monte Carlo estimate
R̂_s^D = (1/N) ∑_{i=1..N} [p̂1(s_i)^2 + p̂0(s_i)^2] = 1 − (2/N) ∑_{i=1..N} p̂1(s_i)(1 − p̂1(s_i))
A similar estimator was proposed by Fukunaga in 1973 to estimate the Bayes error.
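The direct estimator R̂_s^D can be sketched as a bootstrap resampling of the observed inputs combined with any fitted conditional-probability model p̂1, here passed in as a callable (names and signature are illustrative assumptions):

```python
import numpy as np

def direct_relevance(S, p1_model, n_samples=1000, rng=None):
    """Monte Carlo estimate R_hat_D = mean_i [p1(s_i)^2 + p0(s_i)^2],
    with the s_i resampled from the empirical distribution of the rows
    of S and p1_model a fitted estimate of P(y = y1 | s)."""
    rng = np.random.default_rng(rng)
    S = np.asarray(S, dtype=float)
    idx = rng.integers(0, len(S), size=n_samples)  # bootstrap from F_hat_s
    p1 = np.asarray([p1_model(s) for s in S[idx]], dtype=float)
    return float(np.mean(p1 ** 2 + (1.0 - p1) ** 2))
```

A model returning a constant 0.5 yields the minimal relevance 0.5; a confident model (p̂1 near 0 or 1 everywhere) yields a relevance near 1.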
• 12. Reducing the variance of the estimator (continued)
2. A filter estimator based on the notion of mutual information: several filter algorithms exploit this notion in order to estimate the relevance. An example is the MRMR algorithm (Peng et al., 05), where the relevance of a feature subset s, expressed in terms of the mutual information I(s, y) = H(y) − H(y|s), is approximated by the incremental formulation
I_MRMR(s; y) = I_MRMR(s_{−i}; y) + I(x_i; y) − (1/(m − 1)) ∑_{x_j ∈ s_{−i}} I(x_j; x_i)   (2)
where x_i is a feature belonging to the subset s, s_{−i} is the set s with the feature x_i set aside, and m is the number of components of s. Now, since H(y|s) = H(y) − I(s, y) and G_s = 1 − R_s = G_H(H(y|s)), we obtain that
R̂_s^MRMR = 1 − G_H(H(y) − I_MRMR(s, y))
is an MRMR estimator of the relevance R_s, where G_H(·) is the monotone mapping between the entropy H and the Gini index.
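The incremental formulation (2) can be sketched by unrolling the recursion over an ordered feature list, with a plug-in estimate of mutual information for discrete features (an illustrative sketch, not the authors' implementation):

```python
from math import log2
from collections import Counter

def mutual_information(a, b):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (va, vb), c in pab.items():
        pxy = c / n
        mi += pxy * log2(pxy / ((pa[va] / n) * (pb[vb] / n)))
    return mi

def mrmr_score(ordered_cols, y):
    """Unrolls the MRMR recursion over an ordered feature list: each new
    feature x_i adds I(x_i; y) minus its mean MI with the features already
    included (the divisor is m - 1, with m the current subset size)."""
    score, selected = 0.0, []
    for xi in ordered_cols:
        red = 0.0
        if selected:
            red = sum(mutual_information(xj, xi) for xj in selected) / len(selected)
        score += mutual_information(xi, y) - red
        selected.append(xi)
    return score
```

Note how a feature fully redundant with an already-selected one contributes nothing: its relevance term is cancelled by its redundancy term.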
• 13. Proposed relevance estimators
We propose two novel relevance estimators based on the principle of averaging
R̂'_s = (R̂_s^CV + R̂_s^D) / 2,   R̂''_s = (R̂_s^CV + R̂_s^MRMR) / 2
where R̂_s^CV denotes the cross-validation kNN estimate R̂_s^kNN, and the associated feature selection algorithms:
s_R' = arg max_{s ⊆ x, |s| ≤ d} R̂'_s,   s_R'' = arg max_{s ⊆ x, |s| ≤ d} R̂''_s
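The averaging and the selection step reduce to a few lines; in this sketch the two relevance estimates for each candidate subset are assumed precomputed (names are illustrative):

```python
def combined_relevance(r_cv, r_other):
    """Average a low-bias/high-variance CV estimate with a second,
    lower-variance estimate (R_D or R_MRMR) to reduce variance."""
    return 0.5 * (r_cv + r_other)

def select_subset(estimates):
    """estimates: {subset_key: (R_cv, R_other)}.
    Returns the subset maximising the averaged relevance."""
    return max(estimates, key=lambda s: combined_relevance(*estimates[s]))
```

If the two estimators were unbiased with independent errors, averaging would halve the variance; in practice the gain depends on how correlated their errors are.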
• 14. Experimental session
• 20 public-domain microarray expression datasets
• external three-fold cross-validation scheme
• To avoid any dependency between the learning algorithm employed by the wrapper and the classifier used for prediction, the experimental session is composed of two parts:
• Part 1: comparison with the wrapper WSVM, using the set of classifiers C1 = {TREE, NB, SVMSIGM, LDA, LOG}, which does not include the SVMLIN learner;
• Part 2: comparison with the wrapper WNB, using the set of classifiers C2 = {TREE, SVMSIGM, SVMLIN, LDA, LOG}, which does not include the NB learner.
• 15. Experiments with cancer datasets (N: samples, n: features, K: classes)

Name            N     n      K
Golub           72    7129   2
Alon            62    2000   2
Notterman       36    7457   2
Nutt            50    12625  2
Shipp           77    7129   2
Singh           102   12600  2
Sorlie          76    7937   2
Wang            286   22283  2
Van’t Veer      65    24481  2
VandeVijver     295   24496  2
Sotiriou        99    7650   2
Pomeroy         60    7129   2
Khan            63    2308   4
Hedenfalk       22    3226   3
West            49    7129   4
Staunton        60    7129   9
Su              174   12533  11
Bhattacharjee   203   12600  5
Armstrong       72    12582  3
Ma              60    22575  3
• 16. Results, 1st part

Name            R'      WSVM    R''     MRMR    RANK
Golub           0.0917  0.1177  0.1     0.1079  0.1225
Alon            0.2704  0.2658  0.2267  0.1996  0.2281
Notterman       0.1966  0.0985  0.1494  0.1472  0.1432
Nutt            0.3798  0.4171  0.3873  0.3847  0.4189
Shipp           0.1429  0.1319  0.1322  0.1362  0.1873
Singh           0.1619  0.1517  0.1266  0.1374  0.1328
Sorlie          0.3835  0.4314  0.3963  0.4004  0.3987
Wang            0.4282  0.4111  0.4218  0.4232  0.4181
Van’t Veer      0.2786  0.2638  0.2492  0.2217  0.2277
VandeVijver     0.454   0.4724  0.4365  0.4636  0.4482
Sotiriou        0.5279  0.5796  0.5351  0.5708  0.5339
Pomeroy         0.428   0.4191  0.4141  0.3876  0.4181
Khan            0.0878  0.1143  0.0582  0.0686  0.131
Hedenfalk       0.5475  0.5263  0.452   0.5273  0.5389
West            0.6463  0.6109  0.6186  0.5746  0.6109
Staunton        0.6822  0.71    0.6511  0.6865  0.7407
Su              0.2568  0.307   0.2549  0.3772  0.3352
Bhattacharjee   0.1232  0.1347  0.1105  0.1057  0.1515
Armstrong       0.1082  0.1199  0.1306  0.115   0.1122
Ma              0.2456  0.2041  0.2257  0.2413  0.2317
AVG             0.323   0.331   0.310   0.326   0.331
W/B than R' (R'')       10/7            9/6     9/2
• 17. Results, 2nd part

Name            R'      WNB     R''     MRMR    RANK
Golub           0.0886  0.1114  0.0971  0.1019  0.0904
Alon            0.2376  0.2568  0.2181  0.2109  0.221
Notterman       0.1852  0.2059  0.1491  0.1512  0.1645
Nutt            0.3929  0.3402  0.36    0.3898  0.4258
Shipp           0.1261  0.127   0.1198  0.1338  0.1734
Singh           0.1495  0.1454  0.1297  0.1377  0.1245
Sorlie          0.3848  0.4254  0.3808  0.3953  0.3838
Wang            0.4363  0.4345  0.4298  0.4281  0.4255
Van’t Veer      0.2747  0.2715  0.2421  0.2253  0.2325
VandeVijver     0.4626  0.44    0.4763  0.4721  0.4358
Sotiriou        0.5126  0.5578  0.5505  0.5732  0.5611
Pomeroy         0.4367  0.4389  0.4007  0.3902  0.4224
Khan            0.0804  0.0896  0.0628  0.0631  0.0901
Hedenfalk       0.5379  0.5187  0.4369  0.4904  0.4949
West            0.6413  0.6696  0.5542  0.5882  0.6728
Staunton        0.6689  0.8298  0.6981  0.6661  0.83
Su              0.2544  0.3096  0.2646  0.3739  0.3529
Bhattacharjee   0.1235  0.1209  0.101   0.1061  0.1186
Armstrong       0.1079  0.1668  0.125   0.1148  0.1034
Ma              0.2565  0.2635  0.2335  0.2443  0.2681
AVG             0.322   0.3335  0.315   0.327   0.331
W/B than R' (R'')       9/2             10/3    11/2
• 18. Conclusions
• Feature selection demands an accurate estimation of the relevance of subsets of features.
• Wrapper methods use cross-validation estimates of the misclassification error with generic learners. We show that this yields a biased estimation of relevance.
• The cross-validation assessment R̂_s^kNN returned by kNN techniques with low k provides a low-bias yet high-variance estimator of relevance.
• Variance can be reduced by combining it with other estimators.
• Experiments on real datasets showed that the resulting relevance estimator can outperform both conventional wrapper and filter algorithms.