Computing, Artificial Intelligence and Information Technology
Ensemble strategies for a medical diagnostic decision
support system: A breast cancer diagnosis application
David West a,*, Paul Mangiameli b, Rohit Rampal c, Vivian West d
a Department of Decision Sciences, College of Business Administration, East Carolina University, Greenville, NC 27836, USA
b College of Business Administration, University of Rhode Island, Kingston, RI 02881, USA
c School of Business Administration, Portland State University, Portland, OR 97201, USA
d School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Received 21 March 2002; accepted 1 October 2003
Available online 19 December 2003
Abstract
The model selection strategy is an important determinant of the performance and acceptance of a medical diagnostic
decision support system based on supervised learning algorithms. This research investigates the potential of various
selection strategies from a population of 24 classification models to form ensembles in order to increase the accuracy of
decision support systems for the early detection and diagnosis of breast cancer. Our results suggest that ensembles
formed from a diverse collection of models are generally more accurate than either pure-bagging ensembles (formed
from a single model) or the selection of a ‘‘single best model.’’ We find that effective ensembles are formed from a small
and selective subset of the population of available models with potential candidates identified by a multicriteria process
that considers the properties of model generalization error, model instability, and the independence of model decisions
relative to other ensemble members.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Decision support systems; Medical informatics; Neural networks; Bootstrap aggregate models; Ensemble strategies
1. Introduction
Breast cancer is one of the most prevalent cancers, ranking third worldwide, and is the most prevalent form of cancer among women
(Parkin, 1998). Most developed countries have seen
increases in its incidence within the past 20 years.
Based on recent available international data, breast
cancer ranks second only to lung cancer as the most
common newly diagnosed cancer (Parkin, 2001). In
2001 the American Cancer Society predicted that in
the United States approximately 40,200 deaths
would result from breast cancer and that 192,200
women would be newly diagnosed with breast can-
cer (Greenlee et al., 2001). Breast cancer outcomes
have improved during the last decade with the
development of more effective diagnostic techniques
and improvements in treatment methodologies. A
key factor in this trend is the early detection and
* Corresponding author. Tel.: +1-252-3286370; fax: +1-919-3284092.
E-mail address: westd@mail.ecu.edu (D. West).
0377-2217/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.ejor.2003.10.013
European Journal of Operational Research 162 (2005) 532–551
www.elsevier.com/locate/dsw
accurate diagnosis of this disease. The long-term
survival rate for women in whom breast cancer has
not metastasized has increased, with the majority of
women surviving many years after diagnosis and
treatment. A medical diagnostic decision support
system (MDSS) is one technology that can facilitate
the early detection of breast cancer.
The MDSS (Miller, 1994; Sheng, 2000) is an
evolving technology capable of increasing diag-
nostic decision accuracy by augmenting the natural
capabilities of human diagnosticians in the com-
plex process of medical diagnosis. A recent study
finds that the physicians' diagnostic performance
can be strongly influenced by the quality of infor-
mation produced by a diagnostic decision support
system (Berner et al., 1999). For MDSS imple-
mentations that are based on supervised learning
algorithms, the quality of information produced is
dependent on the choice of an algorithm that learns
to predict the presence/absence of a disease from a
collection of examples with known outcomes. The
focus of this paper is MDSS systems based on these
inductive learning algorithms and excludes expert
system based approaches. Some of the earliest
MDSS systems used simple parametric models like
linear discriminant analysis and logistic regression.
The high cost of making a wrong diagnosis has
motivated an intense search for more accurate
algorithms, including non-parametric methods
such as k nearest neighbor or kernel density, feed-
forward neural networks such as multilayer
perceptron or radial basis function, and classifica-
tion-and-regression trees. Unfortunately, there is
no theory available to guide the selection of an
algorithm for a specific diagnostic application.
Traditionally, the model selection is accomplished
by selecting the ‘‘single best’’ (i.e., most accurate)
method after comparing the relative accuracy of a
limited set of models in a cross validation study.
Recent research suggests that an alternate
strategy to the selection of the ‘‘single best model’’
is to employ ensembles of models. Breiman (1996)
reports that ‘‘bootstrap ensembles,’’ combinations
of models built from perturbed versions of the
learning sets, may have significantly lower errors
than the ‘‘single best model’’ selection strategy. In
fact, several authors provide evidence that the
‘‘single best model’’ selection strategy may be the
wrong approach (Breiman, 1995, 1996; Wolpert,
1992; Zhang, 1999a,b).
The purpose of this research is to investigate
the potential of bootstrap ensembles to reduce the
diagnostic errors of MDSS applications for the
early detection and diagnosis of breast cancer. We
specifically investigate the effect of model diversity
(the number of different models in the ensemble)
on the generalization accuracy of the ensemble.
The ensemble strategies investigated include a
‘‘baseline-bagging’’ ensemble (i.e., an ensemble
formed from multiple instances of a single model),
a diverse ensemble with controlled levels of model
diversity, and an ensemble formed from a multi-
criteria selection methodology. The three ensemble
strategies are benchmarked against the ‘‘single best
model.’’ In all cases, an aggregate ensemble deci-
sion is achieved by majority vote of the decisions
of the ensemble members.
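The aggregation rule just described, a majority vote over the decisions of the ensemble members, can be sketched in a few lines. This is an illustrative sketch, not the authors' code; the function name is ours.

```python
from collections import Counter

def majority_vote(member_decisions):
    """Aggregate binary class decisions (e.g. 0 = benign, 1 = malignant)
    from the ensemble members into a single ensemble decision."""
    counts = Counter(member_decisions)
    # The class with the most votes wins; with an even membership of 24
    # a tie is possible, so a tie-break rule would be needed in practice.
    return counts.most_common(1)[0][0]

# Example: 15 of 24 members predict the malignant class (1).
decisions = [1] * 15 + [0] * 9
print(majority_vote(decisions))  # -> 1
```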
In the next section of this paper we review the
model decisions of several recent MDSS imple-
mentations. The third section will discuss our re-
search methodology and the experimental design
that we use to estimate the generalization error for
the MDSS ensembles. The fourth section presents
our results, the mean generalization error for sev-
eral ensemble strategies. We conclude with a dis-
cussion of these results, and implications to guide
the ensemble formation strategy for a medical
diagnostic decision support system.
2. MDSS model selection
Virtually all MDSS implementations to date use
the ‘‘single best model’’ selection strategy. This
strategy selects a model from a limited set of po-
tential models whose accuracy is estimated in cross
validation tests (Anders and Korn, 1999; Kon-
onenko and Bratko, 1991). The most accurate
model in the cross validation study is then selected
for use in the MDSS. A brief discussion of some of
the single model MDSS implementations reported
in the research literature follows. It is not possible
to present a complete survey of MDSS applica-
tions. Readers interested in more information
on this subject are referred to the survey papers of
Miller (1994) and Lisboa (2002).
2.1. Single model MDSS applications
Linear discriminant analysis has been used
to diagnose coronary artery disease (Detrano
et al., 1989), acute myocardial infarction (Gilpin
et al., 1983), and breast cancer (West and
West, 2000). Logistic regression has been used to
predict or diagnose spondylarthropathy (Douga-
dos et al., 1991), acute myocardial infarction
(Gilpin et al., 1983), coronary artery disease
(Hubbard et al., 1992), liver metastases (Makuch
and Rosenberg, 1988), gallstones (Nomura et al.,
1988), ulcers (Schubert et al., 1993), mortality risk
for reactive airway disease (Tierney et al., 1997),
and breast cancer (West and West, 2000). Non-
parametric models have also been used to diagnose
or predict various pathologies. K nearest neighbor
was used in comparative studies to diagnose lower
back disorders (Bounds et al., 1990), predict 30-day
mortality and survival following acute myocardial
infarction (Gilpin et al., 1983), and separate can-
cerous and non-cancerous breast tumor masses
(West and West, 2000). Kernel density has been
utilized to determine outcomes from a set of pa-
tients with severe head injury (Tourassi et al.,
1993), and to differentiate malignant and benign
cells taken from fine needle aspirates of breast
tumors (Wolberg et al., 1995). Neural networks
have also been used in a great number of MDSS
applications because of the belief that they have
greater predictive power (Tu et al., 1998). The
traditional multilayer perceptron has been used to
diagnose breast cancer (Baker et al., 1995, 1996;
Josefson, 1997; Wilding et al., 1994; Wu et al.,
1993), acute myocardial infarction (Baxt, 1990,
1991, 1994; Fricker, 1997; Rosenberg et al., 1993),
colorectal cancer (Bottaci et al., 1997), lower back
disorders (Bounds et al., 1990), hepatic cancer
(Maclin and Dempsey, 1994), sepsis (Marble and
Healy, 1999), cytomegalovirus retinopathy
(Sheppard et al., 1999), trauma outcome (Palocsay
et al., 1996), and ovarian cancer (Wilding et al.,
1994). PAPNET, an MDSS based on an MLP, is
now available for screening gynecologic cytology
smears (Mango, 1994, 1996). The radial basis
function neural network has been used to diagnose
lower back disorders (Bounds et al., 1990), classify
micro-calcifications in digital mammograms (Tsujji
et al., 1999), and in a comparative study of acute
pulmonary embolism (Tourassi et al., 1993). Clas-
sification and regression trees have been used to
predict patient function following head trauma
(Temkin et al., 1995), to evaluate patients with
chest pains (Buntinx et al., 1992), and to diagnose
anterior chest pain (Crichton et al., 1997).
2.2. Ensemble applications
There is a growing body of evidence that
ensembles, committees of machine learning algo-
rithms, result in higher prediction accuracy. One of
the most popular ensemble strategies is bootstrap
aggregation or bagging predictors advanced by
Breiman (1996). This strategy, depicted in Fig. 1,
uses multiple instances of a learning algorithm
(C1(x), ..., CB(x)) trained on bootstrap replicates of
the learning set (TB1, ..., TBB). A plurality vote is used
to produce an aggregate decision from the models'
individual decisions. If the classification algorithm
is unstable in the sense that perturbed versions of
the training set produce significant changes in the
predictor, then bagging predictors can increase the
decision accuracy. Breiman demonstrates this by
constructing bootstrap ensemble models from
classification and regression trees and tests the
resulting ensembles on several benchmark data
sets. On these data sets, the CART bagging
ensembles achieve reductions in classification
errors ranging from 6% to 77%. Breiman found
that ensembles with as few as 10 bootstrap repli-
cates are sufficient to generate most of the
improvement in classification accuracy (Breiman,
1996). Model instability is therefore an important
consideration in constructing bagging ensembles.
Models with higher levels of instability will achieve
greater relative improvement in classification
accuracy as a result of a bagging ensemble strategy.
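Breiman's bagging procedure can be sketched with an unstable learner such as a classification tree. This is a generic sketch on synthetic data, not a reproduction of the paper's experiments; the dataset and parameters are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=9, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Train B classifiers, each on a bootstrap replicate of the learning set.
B = 10  # Breiman (1996): as few as 10 replicates capture most of the gain
members = []
for _ in range(B):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))  # sample with replacement
    members.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

# Plurality (here, majority) vote over the member predictions.
votes = np.stack([m.predict(X_te) for m in members])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("ensemble accuracy:", (ensemble_pred == y_te).mean())
```

Because each tree sees a perturbed version of the training set, the members disagree on borderline cases and the vote averages out some of their individual errors.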
The current research on ensemble strategies fo-
cuses primarily on ensembles of a single classifica-
tion algorithm. For example, Zhang (1999b)
aggregates 30 multilayer perceptron neural net-
works with varying numbers of hidden neurons to
estimate polymer reactor quality, while Cunning-
ham et al. (2000) report improved diagnostic pre-
diction for MDSS systems that aggregate neural
network models. Bay (1999) tested combinations of
nearest neighbor classifiers trained on a random
subset of features and found the aggregate model
to outperform standard nearest neighbor variants.
Zhilkin and Somorjai (1996) explore model diver-
sity in bagging ensembles by using combinations of
linear and quadratic discriminant analysis, log-
istic regression, and multilayer perceptron neu-
ral networks to classify brain spectra by magnetic
resonance measurement. They report that the boot-
strap ensembles are more accurate in this applica-
tion than any ‘‘single best model,’’ and that the
performance of the single models varies widely,
performing well on some data sets and poorly on
others.
Research to date confirms that the generaliza-
tion error of a specific model can be reduced by
bootstrap ensemble methods. There has been little
systematic study of the properties of multimodel
MDSS systems. The contribution of this paper is
to investigate more thoroughly the model selection
strategies available to the practitioner implement-
ing an MDSS system, including the role of model
diversity in bagging ensembles.
3. Research methodology
Our research methodology is presented in three
parts. Part one describes the two data sets that we
examine. Part two describes the 24 models that we
employ for our MDSS ensembles. Part three pre-
sents our experimental design.
3.1. Breast cancer data sets
The two data sets investigated in this research
are both contributed by researchers at the Uni-
versity of Wisconsin. These data sets are avail-
able from the UCI Machine Learning Repository
(Blake and Merz, 1998). The Wisconsin breast
cancer data consists of records of breast cytol-
ogy first collected and analyzed by Wolberg,
Street, and Mangasarian (Mangasarian et al.,
1995; Wolberg et al., 1995). The data, which we
will refer to as the ‘‘cytology data,’’ consist of 699
records of visually assessed nuclear features of
fine needle aspirates from patients whose diag-
noses resulted in 458 benign and 241 malig-
nant cases of breast cancer. A malignant label is
confirmed by performing a biopsy on the breast
tissue. Nine ordinal variables measure properties
of the cell, such as thickness, size, and shape, and
are used to classify the case as benign or malig-
nant.
The second data set we refer to as the ‘‘prog-
nostic data.’’ These data are from follow-up visits
and include only those cases exhibiting invasive
breast cancer without evidence of distant metas-
tases at the time of diagnosis (Mangasarian et al.,
1995; Wolberg et al., 1995). Thirty features, com-
puted from a digitized image of a fine needle
aspirate of a breast mass, describe characteristics
of the cell nuclei present in the image. There are
198 examples; 47 are recurrent cases and 151 are
non-recurrent cases.
[Figure: bootstrap replicates TB1, TB2, ..., TBB of the learning set each train a classifier C1(x), C2(x), ..., CB(x); the aggregate prediction P̂ is the plurality vote C*(x) = arg max_{y ∈ Y} Σ_{i=1}^{B} 1(Ci(x) = y).]
Fig. 1. Bagging ensemble scheme for classification decisions.
3.2. Model description
The learning algorithms have been selected to
represent most of the methods used in prior MDSS
research and commercial applications. We use the
term ‘‘model’’ to refer to a specific configuration of
a learning architecture. These include linear dis-
criminant analysis (LDA), logistic regression (LR),
three different neural network algorithms (multi-
layer perceptron (MLP), mixture-of-experts
(MOE), and radial basis functions (RBF)), classi-
fication and regression trees (CART), nearest
neighbor classifier (KNN), and kernel density
(KD). Many of these algorithms require specific
configuration decisions or parameter choices. In
these cases, the specific configurations and
parameters we use are guided by principles of
model simplicity and generally accepted practices.
For example, the neural network models are lim-
ited to a single hidden layer with the number of
hidden layer neurons generally ranging from 2 to 8.
A total of four specific neural network configura-
tions are created for each of the three architectures.
Four different configurations are also created for
KNN, with a range of nearest neighbors from 3 to
11. For KD, density widths range from 0.1 to 7.0.
Both the Gini and Twoing splitting rules are used
to create two different CART configurations.
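A pool of model configurations like the one described above can be enumerated with a standard library such as scikit-learn. This is a partial sketch: mixture-of-experts, RBF networks, and kernel density classifiers are not directly available in scikit-learn, and the Twoing splitting rule is not offered for trees, so entropy stands in for it here; the specific parameter values are illustrative.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def build_model_pool():
    """Enumerate a pool of named classifier configurations."""
    pool = {"LDA": LinearDiscriminantAnalysis(),
            "LR": LogisticRegression(max_iter=1000)}
    # Four MLP configurations: a single hidden layer with 2-8 neurons
    for tag, h in zip("abcd", (2, 4, 6, 8)):
        pool[f"MLP{tag}"] = MLPClassifier(hidden_layer_sizes=(h,), max_iter=500)
    # Four KNN configurations with k ranging from 3 to 11
    for tag, k in zip("abcd", (3, 5, 9, 11)):
        pool[f"KNN{tag}"] = KNeighborsClassifier(n_neighbors=k)
    # Two CART configurations (entropy substitutes for the Twoing rule)
    pool["CARTa"] = DecisionTreeClassifier(criterion="gini")
    pool["CARTb"] = DecisionTreeClassifier(criterion="entropy")
    return pool

pool = build_model_pool()
print(len(pool), "configurations")  # -> 12 configurations
```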
3.3. Experimental design
The purpose of the experimental design is to
provide reliable estimates of the generalization
error for the ‘‘single best model’’ selection strategy
and several ensemble formation strategies. To
estimate the mean generalization error of each
strategy, the available data, D, is split into three
partitions: a training set, Tij, a validation set, Vij,
and a final independent holdout test set, Hi, which
is used to measure the model's generalization er-
ror. The subscript i is an index for the run number
varying from 1 to 100 while j indexes the cross
validation partition number varying from 1 to 10.
3.3.1. Estimating ‘‘single best model’’ generalization
error
The ‘‘single best’’ generalization error is esti-
mated by generating 10-fold cross validation par-
titions, training all 24 models with Tij, determining
the model with the lowest error on the validation
set Vij, and finally measuring that model's gener-
alization error on the independent holdout test set
Hi. The 10-fold cross validation partitioning is
repeated 100 times. The validation set Vij is used to
implement early stopping during training of the
neural networks to avoid model overfitting. The
details of this process are specified in the algorithm
below. We begin the process in step 1 by randomly
selecting data to form an independent test set and
removing the test set from the available data, D
(step 2). The test set is sized to be approximately
10% of the available data. The remaining data, Di
(where Di = D − Hi), is randomly shuffled and
partitioned into 10 mutually exclusive sets in step
3. One partition is used as a validation set (step 4a)
and the other nine are consolidated and used for
training (step 4b). This is repeated 10 times so that
each partition functions as a validation set. The
‘‘single best model’’ is then identified as the model
with the lowest error on the 10 validation sets (steps
5 and 6). An estimate of this model's accuracy on
future unseen cases is measured using the inde-
pendent test set (step 7). The seven steps are re-
peated 100 times to determine a mean
generalization error for the ‘‘single best model’’
strategy (step 8).
Repeat for i = 1 to 100
Step 1. Create holdout test set, Hi, by randomly sampling without replacement from the available data, D. Hi is sized to be approximately 10% of D
Step 2. Remove Hi from D, forming Di = D − Hi
Step 3. Randomly shuffle and partition Di into j = 1 ... 10 mutually exclusive partitions, Dij
Step 4. Repeat for j = 1 to 10
   a. Form validation set Vij = Dij
   b. Form training set Tij = Di − Vij by consolidating the remaining partitions
   c. Repeat for model number B, where B = 1 to 24
      i. Train each classifier model CB(x) with training set Tij
      ii. Evaluate classifier error EVijB on validation set Vij
   end loop B
end loop j
Step 5. Determine the average validation error EiB = (1/10) Σ_{j=1..10} EVijB for each model
Step 6. Identify the "single best model" B* = arg min_B (EiB) over all B ∈ {1, ..., 24}
Step 7. Estimate the generalization error for the winning model, ÊiB*, using the independent holdout test set, Hi
end loop i
Step 8. Determine the mean generalization error for the "single best model" strategy, ÊB* = (1/100) Σ_{i=1..100} ÊiB*
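The selection procedure above can be sketched in Python. This is an illustrative sketch with a reduced model pool and run count; the dataset, models, and sizes are placeholders, not the paper's.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier

models = {"LR": LogisticRegression(max_iter=1000),
          "KNN3": KNeighborsClassifier(3),
          "KNN9": KNeighborsClassifier(9)}
X, y = make_classification(n_samples=300, n_features=9, random_state=0)

gen_errors = []
for run in range(3):  # the paper uses 100 runs; 3 here for brevity
    # Steps 1-2: hold out ~10% as the independent test set H_i
    X_d, X_h, y_d, y_h = train_test_split(X, y, test_size=0.1, random_state=run)
    # Steps 3-5: 10-fold CV on the remaining data; accumulate validation error
    val_err = {name: 0.0 for name in models}
    for tr, va in KFold(10, shuffle=True, random_state=run).split(X_d):
        for name, est in models.items():
            fitted = clone(est).fit(X_d[tr], y_d[tr])
            val_err[name] += 1 - fitted.score(X_d[va], y_d[va])
    # Step 6: the "single best model" minimizes average validation error
    best = min(val_err, key=val_err.get)
    # Step 7: estimate its generalization error on H_i
    fitted = clone(models[best]).fit(X_d, y_d)
    gen_errors.append(1 - fitted.score(X_h, y_h))

print("mean generalization error:", np.mean(gen_errors))  # Step 8
```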
3.3.2. Estimating bagging ensemble generalization
error
The generalization error for bagging ensembles
is estimated at several controlled levels of ensemble
model diversity (the number of different models in
the ensemble). Each ensemble will consist of ex-
actly 24 models. We chose 24 ensemble members
because of the symmetry with the number of
models investigated and note that this number
exceeds the minimum of 10 members reported by
Breiman (1996) as necessary to achieve an
improvement in generalization accuracy. We use
the term ‘‘baseline-bagging ensembles’’ to describe
ensembles whose membership is limited to repli-
cations of a single model. These are the least di-
verse ensembles. The most diverse ensembles are
constructed from the 24 different models. Inter-
mediate levels of diversity (between the baseline
ensembles and the most diverse ensembles) are
constructed with 12, 8, 6, 4, 3, and 2 distinct
models. For these intermediate levels of diversity,
the ensembles are formed by randomly selecting
the specified number of models without replace-
ment from the set of 24 available models. The
models chosen are replicated to maintain a total of
24 ensemble members. For example, 12 models
would be replicated twice, and 8 models would be
replicated three times. The population of available
models from which the ensembles are formed is
also an important determinant of ensemble accu-
racy. To explore this effect, we also form diverse
models whose members are sampled from popu-
lations of the 12 most accurate models (top 50%)
and the 6 most accurate models (top 25%). Our
final ensemble is formed from a multicriteria
selection process that includes model instability
and model correlation as well as model accuracy
(Cunningham et al., 2000; Sharkey, 1996). The
ensemble configurations for these restricted pop-
ulations are limited to those defined in Table 1
below.
A methodology for measuring comparable
generalization errors for bagging ensembles is
discussed next. This methodology parallels the
work of Breiman (1996), Wolpert (1992), and
Zhang (1999b). All bagging ensembles investigated
in this research consist of 24 voting members that
have been trained with different bootstrap repli-
cate data. The purpose of the bootstrap replicates
Table 1
Ensemble configurations for model diversity
Number of different   Number of model    Population of models
models                replications       24 models   12 models   6 models   3 models
24                    1                  X
12                    2                  X           X
8                     3                  X           X
6                     4                  X           X           X
4                     6                  X           X           X
3                     8                  X           X           X          Xa
2                     12                 X           X           X
1b                    24                 X
a Multicriteria selection ensemble.
b Baseline ensemble.
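The ensemble configurations of Table 1 follow a simple rule: choose a number of distinct models without replacement from the population and replicate them to fill a fixed membership of 24. A minimal sketch, with model names that are placeholders:

```python
import random

def form_ensemble(model_names, n_distinct, size=24, seed=0):
    """Randomly choose n_distinct models without replacement and replicate
    each to fill a fixed ensemble membership of `size` slots."""
    assert size % n_distinct == 0, "membership must divide evenly"
    rng = random.Random(seed)
    chosen = rng.sample(model_names, n_distinct)
    return chosen * (size // n_distinct)  # each model replicated size/n times

pool = [f"M{i}" for i in range(24)]
ensemble = form_ensemble(pool, n_distinct=8)
print(len(ensemble), len(set(ensemble)))  # -> 24 8
```

With 8 distinct models each is replicated 3 times, matching the corresponding row of Table 1.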
is to create a different learning perspective for each
of the 24 models, thereby increasing the indepen-
dence of the prediction errors and generating more
accurate ensembles. The data partitions used to
train and test the ensemble members (i.e.,
Tij, Vij, Hi) are the same partitions created in the
earlier section for the ‘‘single best model’’ strategy.
For clarity in expressing the ensemble algorithm,
steps 1–3 used to create these partitions are re-
peated below. The ensemble algorithm first differs
from the "single best model" strategy's 10-fold
cross validation algorithm in step 4.c.i where 100
bootstrap training replicates are formed by sam-
pling with replacement from the training set Tij.
We refer to these bootstrap training sets by the
symbol TBijk where i refers to the run number, j the
cross validation partition and k the bootstrap
replicate. A total of k = 1 ... 100 bootstrap sam-
ples are created for each cross validation for a total
of 1000 bootstrap replicates. We create more than
the minimum number of bootstrap replicates re-
quired by the 24 models to produce better esti-
mates of the generalization error during the
sampling process described next.
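Forming a bootstrap training replicate means drawing n rows with replacement from an n-row training set. A minimal sketch (names are ours):

```python
import numpy as np

def bootstrap_replicates(X, y, n_replicates=100, seed=0):
    """Yield bootstrap training sets TB_k drawn with replacement from (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(n_replicates):
        idx = rng.integers(0, n, size=n)  # n row indices, with replacement
        yield X[idx], y[idx]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
reps = list(bootstrap_replicates(X, y, n_replicates=5))
print(len(reps), reps[0][0].shape)  # 5 replicates, each the size of the original
```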
Steps 5 through 7 define a process of randomly
creating ensembles with controlled levels of diver-
sity from the test sets created for each model in step
4.c.ii.2 where each trained classifier is tested on the
independent holdout test set Hi. For each level of
diversity investigated, 500 different ensembles are
formed to estimate a mean generalization error.
For each level of diversity, the number of models
included in the ensemble membership is defined in
Table 1 and varies from 1 (for the baseline-bagging
ensemble) to 24 (for the most diverse ensembles).
For the baseline-bagging ensembles, we form 24
ensembles, one for each model. For the most di-
verse ensembles, all models are included in the
ensemble. The intermediate levels of diversity in-
clude ensembles with 2, 3, 4, 6, 8, and 12 models
and are formed by randomly sampling from the
population of models without replacement (step 5).
Once the models are specified, step 6 defines a
process to identify the test results by randomly
sampling without replacement from the set of 100
bootstrap replicates produced in step 4.c.ii.2. En-
semble generalization errors can then be estimated
by a majority vote of the 24 ensemble members
(step 7). This process of ensemble formation is re-
peated 500 times to estimate a mean generalization
error for all of the ensemble formation strategies
investigated. Our decision to test the single best on
100 repetitions and the ensemble on 500 repetitions
is designed to obtain approximately equivalent
precision. The ensemble formation process intro-
duces some additional sources of variability. First,
the ensemble membership varies substantially
from run to run based on the random sampling of
models. Secondly, the ensembles are formed from
bootstrap training sets, a method of intentionally
perturbing the data set to introduce more diversity.
We therefore require additional iterations to obtain
an acceptable precision.
Repeat for i = 1 to 100
Step 1. Create holdout test set, Hi, by randomly sampling without replacement from the available data, D. Hi is sized to be approximately 10% of D
Step 2. Remove Hi from D, forming Di = D − Hi
Step 3. Randomly shuffle and partition Di into j = 1 ... 10 mutually exclusive partitions, Dij
Step 4. Repeat for j = 1 to 10
   a. Form validation set Vij = Dij
   b. Form training set Tij = Di − Vij by consolidating the remaining partitions
   c. Repeat for k = 1 to 100
      i. Form bootstrap training set TBijk by sampling with replacement from Tij
      ii. Repeat for B = 1 to 24
         1. Train each classifier model CB(x) with bootstrap training set TBijk
         2. Test each classifier CB(x) on test set Hi
      end loop B
   end loop k
end loop j
Repeat for z = 1 to 500 /* these steps form 500 different ensembles for each level of ensemble diversity */
Step 5. For intermediate levels of diversity {2, 3, 4, 6, 8, and 12 models}, randomly identify models without replacement from the population of models and form ensembles with 24 members
Step 6. For each model selected, retrieve the test results for that model for run i, for j = 1 to 10, and for a random k ∈ {1, ..., 100}
Step 7. Estimate the generalization error ÊBgi for the ensemble by majority vote
end loop z
end loop i
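Steps 5 through 7 can be sketched as follows. The stored test results here are synthetic stand-ins for the per-replicate predictions produced in step 4 (each member agrees with the true label about 80% of the time); the model names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_test, n_boot, size = 30, 100, 24
y_true = rng.integers(0, 2, size=n_test)

# Synthetic stand-ins for the stored step-4 results: each model's predicted
# labels on the holdout set H_i, one row per bootstrap replicate.
def fake_predictions():
    flip = rng.random((n_boot, n_test)) < 0.2  # ~20% individual error
    return np.where(flip, 1 - y_true, y_true)

stored = {name: fake_predictions() for name in ("RBFc", "MOEa", "MLPb")}

# Steps 5-6: a 24-member ensemble from 3 distinct models (8 replications
# each); each member's test results come from a distinct bootstrap
# replicate, sampled without replacement per model.
votes = []
for name in stored:
    ks = rng.choice(n_boot, size=size // len(stored), replace=False)
    votes.extend(stored[name][k] for k in ks)
votes = np.stack(votes)                              # shape (24, n_test)

# Step 7: majority vote and the resulting generalization-error estimate.
ensemble = (votes.mean(axis=0) >= 0.5).astype(int)
gen_error = (ensemble != y_true).mean()
print("estimated generalization error:", gen_error)
```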
4. Generalization error results of ensemble strategies
4.1. ‘‘Single best model’’ strategy generalization
error results
We first estimate the generalization error of a
strategy that selects the ‘‘single best model’’ from
the 24 models investigated. The most accurate
model (lowest generalization error) is identified
from a 10-fold cross validation study. These ‘‘single
best’’ generalization errors, which are mean values
of the 100 runs conducted, are reported at the
bottom of Table 2 in the line labeled ‘‘average
error.’’ The estimate of the generalization error for
the ‘‘single best’’ strategy is 0.226 for the prognostic
data and 0.029 for the cytology data. The model
‘‘winning frequency’’ is also reported in Table 2.
The winning frequency is the proportion of times
each model is determined to be the most accurate
for the validation data during the 100 cross vali-
dation trials. It is recognized that the ‘‘single best
model’’ selection procedure is very dependent on
the data partitions used to train and test the models
(Cunningham et al., 2000). The results of Table 2
demonstrate this deficiency. As the composition of
the cross validation training and test sets change
during the 100 trials, different models are judged to
Table 2
Single best model, winning frequency, and average generalization error
Model           Proportion of wins
                Cytology    Prognostic
1 MLPa 0.02 0.05
2 MLPb 0.02 0.03
3 MLPc 0 0.08
4 MLPd 0 0.07
5 MOEa 0.03 0.02
6 MOEb 0.01 0.04
7 MOEc 0.02 0.04
8 MOEd 0.05 0.08
9 RBFa 0.08 0.08
10 RBFb 0.11 0.08
11 RBFc 0.22 0.12
12 RBFd 0.42 0.22
13 CARTa 0 0.04
14 CARTb 0 0.03
15 LDA 0 0
16 LR 0 0
17 KNNa 0 0
18 KNNb 0.02 0.01
19 KNNc 0 0.01
20 KNNd 0 0
21 KDa 0 0
22 KDb 0 0
23 KDc 0 0
24 KDd 0 0
Average error 0.029 0.226
Standard deviation (0.001) (0.007)
be the most accurate. Of the 24 models investi-
gated, 16 different models are determined to be
most accurate on at least one trial for the prog-
nostic data, while the cytology data has 11 models
judged most accurate. It is also evident from Table
2 that the neural network models, particularly the
radial basis function networks, tend to win a dis-
proportionate share of the time for both data sets.
4.2. Baseline-bagging ensemble generalization error
results
The generalization error of baseline-bagging
ensembles formed with 24 members of a single
model is investigated next. Each of these ensembles
consists of 24 members that are variations of a
single classification model, e.g. MLPa or CARTb.
Diversity in the baseline-bagging ensembles is
introduced by differences in the bootstrap learning
sets, and in some instances from different parame-
ter initialization (i.e., neural network models). The
average generalization errors, as well as the maxi-
mum, minimum, and standard deviation for each
of the baseline-bagging ensembles, are summarized
in Table 3 for the prognostic data and Table 4 for
the cytology data. Each table is sorted in ascending
order by the generalization error. It is evident that
the most accurate baseline-bagging ensembles
correspond to models with high winning frequency
in the cross validation studies. For the prognostic
data, there are a total of eight (one-third of the
models) baseline-bagging ensembles that achieve a
mean generalization error lower than the 0.226 of
the ‘‘single best’’ strategy. For the cytology data,
only the RBFc and RBFd baseline-bagging
ensembles achieve comparable or lower errors than
the 0.029 of the ‘‘single best’’ strategy. There is
considerable variation in the generalization error
among the collection of baseline-bagging ensem-
bles. For the prognostic data, the RBFc model has
the lowest generalization error at 0.209, compared
to the highest error of 0.385 for the KNNa model.
For the cytology data, the RBFc model is again
most accurate with a generalization error of 0.028,
Table 3
Results of baseline-bagging ensembles: prognostic data
Model Average Min Max StDev
RBFc 0.209 0.195 0.226 0.006
RBFd 0.212 0.195 0.237 0.007
RBFb 0.211 0.195 0.232 0.006
MOEc 0.223 0.200 0.247 0.009
RBFa 0.217 0.200 0.232 0.007
MOEd 0.222 0.200 0.242 0.009
MOEa 0.223 0.200 0.242 0.008
MOEb 0.225 0.205 0.247 0.008
MLPc 0.233 0.211 0.253 0.008
MLPb 0.230 0.200 0.253 0.009
MLPd 0.239 0.211 0.263 0.009
MLPa 0.221 0.195 0.247 0.010
CARTa 0.250 0.216 0.284 0.015
CARTb 0.250 0.205 0.295 0.015
LR 0.255 0.221 0.284 0.012
LDA 0.282 0.253 0.316 0.012
KDa 0.315 0.279 0.358 0.014
KDb 0.317 0.279 0.363 0.015
KDc 0.347 0.300 0.400 0.015
KNNb 0.357 0.305 0.416 0.016
KNNd 0.369 0.326 0.426 0.016
KNNc 0.379 0.332 0.421 0.017
KDd 0.380 0.337 0.421 0.014
KNNa 0.385 0.337 0.432 0.015
compared to the highest error of 0.071 for the KDd
model. This suggests that model selection is still an
important consideration in the design of baseline-
bagging ensembles. In this application, it is not
feasible to produce a baseline-bagging ensemble
with low generalization error using a randomly
selected model.
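The baseline-bagging procedure evaluated above (train many copies of one model on bootstrap replicates, then combine their decisions by majority vote) can be sketched in a few lines. The `train_fn` callback and function names are illustrative assumptions, not the authors' implementation:

```python
import random
from collections import Counter

def bootstrap_sample(X, y):
    """Draw one bootstrap replicate: sample n training cases with replacement."""
    n = len(X)
    idx = [random.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def train_bagging_ensemble(train_fn, X, y, n_members=24):
    """Train n_members copies of a single model, each on its own bootstrap replicate.
    train_fn(X, y) must return a callable classifier model(x) -> class label."""
    return [train_fn(*bootstrap_sample(X, y)) for _ in range(n_members)]

def bagging_predict(members, x):
    """Combine the members' decisions for case x by majority vote."""
    votes = Counter(m(x) for m in members)
    return votes.most_common(1)[0][0]
```

A trivial base learner (a fixed decision threshold) is enough to exercise the interface; any of the 24 models in Appendix A could be substituted for `train_fn`.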
While many of the baseline-bagging ensembles
for the prognostic data achieve meaningful error
reductions relative to the ‘‘single best’’ strategy,
the same effect is not as pronounced for the
cytology data. The reason for the ineffectiveness of
ensemble methods for the cytology application
may be that the decision concepts to be learned
result in relatively similar decisions among the
ensemble members. Sharkey (1996) argues that the
reduction of error by bagging ensembles is limited
in situations where the ensemble members exhibit
a high degree of decision consensus. The levels of
concurrence among ensemble members can be in-
ferred by inspecting the correlation of the model
outputs expressed as posterior probabilities of
class membership. The correlation of model out-
puts for pairs of potential ensemble members is
given in Table 5 for the prognostic data and Table
6 for the cytology data. To save space, only models
in the upper 67th percentile of accuracy are in-
cluded in these tables. The average correlation of
all models is 0.476 for the prognostic data and
0.616 for the cytology data. The substantially
higher level of model concurrence among the
cytology ensemble members indicates that a di-
verse set of independent ensemble experts has not
been achieved, and that the potential for error
reduction from the use of bagging ensembles will
be more modest. We also note that the intra-
architecture correlations are very high for both
data sets. For example, the correlations for the
four radial basis function models for the prog-
nostic data range from 0.90 to 0.95, correlations
for the mixture-of-experts models range from 0.81
to 0.84, and correlations for the multilayer per-
ceptron models range from 0.72 to 0.76. These
correlations suggest that a more effective ensemble
formation strategy may be to select different
architectures for ensemble membership, and in
Table 4
Results of baseline-bagging ensembles: cytology data
Model Average Min Max StDev
RBFc 0.028 0.026 0.031 0.001
RBFd 0.029 0.026 0.031 0.001
RBFa 0.030 0.028 0.032 0.001
RBFb 0.030 0.026 0.032 0.001
MOEd 0.033 0.029 0.037 0.002
MOEa 0.034 0.029 0.038 0.001
MLPb 0.034 0.031 0.038 0.001
MOEc 0.034 0.031 0.038 0.001
MOEb 0.035 0.032 0.038 0.001
MLPa 0.035 0.031 0.038 0.002
MLPd 0.035 0.031 0.040 0.002
MLPc 0.036 0.031 0.041 0.002
KNNd 0.036 0.035 0.040 0.001
KNNb 0.036 0.034 0.040 0.001
KNNa 0.037 0.032 0.040 0.002
KNNc 0.037 0.034 0.040 0.001
LDA 0.040 0.040 0.043 0.001
KDc 0.043 0.037 0.049 0.002
KDb 0.046 0.040 0.051 0.002
CARTa 0.047 0.040 0.054 0.003
CARTb 0.047 0.040 0.054 0.003
KDa 0.048 0.038 0.054 0.003
LR 0.062 0.051 0.071 0.003
KDd 0.071 0.068 0.076 0.002
Table 5
Correlation of models for prognostic ensemble members
RBFc RBFd RBFb RBFa MOEd MOEa MOEb MOEc MLPa MLPc MLPb MLPd LR LDA CARTA CARTB
RBFc 1
RBFd 0.94 1
RBFb 0.95 0.93 1
RBFa 0.91 0.90 0.93 1
MOEd 0.65 0.64 0.64 0.63 1
MOEa 0.66 0.65 0.66 0.64 0.84 1
MOEb 0.66 0.65 0.66 0.64 0.83 0.82 1
MOEc 0.67 0.66 0.67 0.65 0.82 0.81 0.81 1
MLPa 0.66 0.66 0.66 0.67 0.73 0.73 0.73 0.74 1
MLPc 0.67 0.66 0.67 0.66 0.73 0.74 0.75 0.75 0.72 1
MLPb 0.67 0.66 0.67 0.66 0.74 0.74 0.74 0.74 0.72 0.75 1
MLPd 0.67 0.66 0.66 0.66 0.74 0.74 0.74 0.74 0.72 0.76 0.75 1
LR 0.34 0.34 0.33 0.33 0.38 0.38 0.36 0.37 0.34 0.34 0.34 0.34 1
LDA 0.26 0.26 0.25 0.24 0.40 0.40 0.37 0.37 0.32 0.33 0.33 0.33 0.53 1
CARTA 0.37 0.37 0.36 0.36 0.35 0.35 0.36 0.35 0.34 0.35 0.34 0.34 0.22 0.21 1
CARTB 0.36 0.37 0.36 0.36 0.35 0.35 0.36 0.35 0.34 0.35 0.34 0.34 0.22 0.21 0.50 1
Average correlation = 0.476.
Table 6
Correlation of models for cytology ensemble members
RBFc RBFd RBFb RBFa MOEd MOEa MOEb MOEc MLPa MLPc MLPb MLPd LR LDA CARTA CARTB
RBFc 1
RBFd 0.96 1
RBFb 0.96 0.96 1
RBFa 0.96 0.96 0.96 1
MOEd 0.76 0.76 0.75 0.74 1
MOEa 0.73 0.73 0.72 0.72 0.87 1
MOEb 0.71 0.71 0.70 0.70 0.85 0.84 1
MOEc 0.70 0.70 0.69 0.69 0.84 0.82 0.81 1
MLPa 0.68 0.68 0.67 0.67 0.78 0.77 0.77 0.76 1
MLPc 0.65 0.65 0.64 0.64 0.75 0.75 0.75 0.74 0.72 1
MLPb 0.67 0.67 0.66 0.66 0.77 0.76 0.75 0.74 0.72 0.69 1
MLPd 0.64 0.64 0.63 0.63 0.73 0.73 0.72 0.72 0.70 0.68 0.68 1
LR 0.36 0.36 0.36 0.36 0.39 0.39 0.38 0.38 0.37 0.37 0.37 0.36 1
LDA 0.59 0.59 0.59 0.60 0.69 0.69 0.66 0.64 0.64 0.61 0.62 0.59 0.38 1
CARTA 0.44 0.44 0.43 0.43 0.46 0.46 0.46 0.45 0.46 0.44 0.44 0.43 0.34 0.43 1
CARTB 0.44 0.44 0.43 0.43 0.46 0.46 0.46 0.45 0.46 0.44 0.44 0.43 0.34 0.43 0.86 1
Average correlation = 0.616.
particular those architectures with low levels of
correlation. For the prognostic data, logistic
regression, linear discriminant analysis, and
CART appear to be good potential candidates as
these models have correlations substantially lower
than the average of all models. For the cytology
data, logistic regression and CART have the low-
est correlations and are potentially good candi-
dates for ensemble membership. We will use these
insights later in a multicriteria selection of models
for ensemble membership.
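The concurrence measure behind Tables 5 and 6 is a correlation of paired model outputs expressed as posterior probabilities of class membership. A minimal sketch, assuming each model's outputs over the test cases are stored as a list keyed by model name (the paper does not publish its exact computation):

```python
def pearson(a, b):
    """Pearson correlation of two equal-length sequences of posterior probabilities."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def average_pairwise_correlation(outputs):
    """Mean correlation over all distinct model pairs, as reported at the foot
    of Tables 5 and 6. outputs maps model name -> list of posterior probabilities."""
    names = list(outputs)
    pairs = [(names[i], names[j])
             for i in range(len(names)) for j in range(i + 1, len(names))]
    return sum(pearson(outputs[a], outputs[b]) for a, b in pairs) / len(pairs)
```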
4.3. Ensemble generalization error results at controlled levels of model diversity
We next investigate the strategy of forming
ensembles with higher levels of model diversity.
We expect that this additional source of diversity
will result in more accurate ensembles. This new
source of diversity is induced in the ensemble by
controlling ensemble membership to include different models and different architectures. The most diverse ensembles are formed by including all 24 of the models investigated. We also create intermediate levels of diversity by forming ensembles with 12, 8, 6, 4, 3, and 2 different models. In all cases,
the models to include in ensemble membership are
chosen randomly without replacement from the set
of 24 models, and are replicated to maintain a total
of 24 ensemble members. For example, to form
ensembles with 12 different models requires form-
ing a random subset of 12 of the 24 models and
replicating each of the 12 models two times in the
ensemble. The test results for each model are
randomly chosen without replacement from the
100 available bootstrap test results for each data
partition. The generalization error for each level of
model diversity is estimated from 500 repetitions
of ensemble construction. The errors are plotted as
rectangular markers in Figs. 2 and 3 for the
prognostic data and the cytology data respectively.
These figures include the average from the ‘‘single
best’’ results of the cross validation study (plotted
as a solid horizontal line), the results of the base-
line-bagging ensembles, and the results of a mul-
ticriteria selection algorithm discussed in the next
subsection. Figs. 2 and 3 also show the effect of
restricting the population of available models for
the ensembles to the 12 most accurate models
(represented by triangular markers), and to the 6
most accurate models (represented by circular markers). All generalization error estimates are mean values from the 500 repetitions of ensemble formation.
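The membership procedure just described, drawing k distinct models at random without replacement and replicating them to fill 24 ensemble slots, can be sketched as follows; the pool of simple name strings is a placeholder for the paper's 24 trained classifiers:

```python
import random

def form_diverse_ensemble(model_pool, k, ensemble_size=24):
    """Choose k distinct models at random without replacement, then replicate
    them to maintain a total of ensemble_size members (24 in the paper).
    The paper's levels of k (24, 12, 8, 6, 4, 3, 2) all divide 24 evenly."""
    chosen = random.sample(model_pool, k)
    reps, extra = divmod(ensemble_size, k)
    return chosen * reps + chosen[:extra]
```

Repeating this construction 500 times and averaging the resulting ensemble errors gives the per-level estimates plotted in Figs. 2 and 3.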
The results depicted in Figs. 2 and 3 demon-
strate that higher levels of model diversity result in
lower ensemble generalization error, although the
improvement is fairly modest for ensembles with
three or more different models. It is also clear that
the policy of restricting the population of potential
Fig. 2. Mean generalization error for bootstrap ensembles: prognostic data. [Figure: generalization error (y-axis, 0.15 to 0.40) versus number of different models in the bagging ensemble (x-axis, 0 to 25); series shown are the baseline ensembles, ensembles from all 24 models, ensembles from the top 12 models, ensembles from the top 6 models, the single best CV model, and the ensemble from multicriteria selection.]
ensemble members to a smaller subset of the more
accurate models produces ensembles with lower
mean generalization errors for both data sets. For
example, ensembles of six different models sam-
pled from a population of 24 possible models for
the prognostic data have an average generalization
error of 0.225. The corresponding error for similar
ensembles with members sampled from the top 12
models is 0.212, and for ensembles sampled from
the top six models is 0.203. Similarly, for the
cytology data the expected generalization error is
0.033 for ensembles formed from 24 potential
models, 0.031 for ensembles formed from 12 po-
tential members, and 0.027 for ensembles formed
from six potential members. Two other conclu-
sions are evident from Figs. 2 and 3. In both
applications, the more diverse ensembles are more
accurate than the expected error of the ‘‘single
best’’ strategy. For the prognostic data, the strat-
egy of diverse bagging ensembles is universally
superior to the ‘‘single best’’ strategy for ensembles
with six or more models. For the cytology data,
the bagging ensembles are superior to the ‘‘single
best’’ strategy only for the most restricted model
population. For both data sets, the diverse
ensembles sampled from the population of the top
six models have a lower error than any of the
corresponding baseline-bagging ensembles.
4.4. Multicriteria model selection results
The prior observations suggest a strategy of
restricting the population of available ensemble
models to a small number of models with the
lowest generalization error. In this section we ex-
pand the selection criteria beyond accuracy to in-
clude low levels of model correlation as well as
high relative instability (Cunningham et al., 2000;
Sharkey, 1996). We measure model instability by
the range (maximum–minimum) of generalization
errors for the 500 baseline ensembles constructed.
High potential candidates for ensemble member-
ship are identified in Figs. 4 and 5. These figures
plot the baseline-bagging ensemble generalization
error as a function of model instability. Notice that a significant positive slope exists for both data sets, with regression R² values of 0.804 for the prognostic data and 0.747 for the cytology data. This positive trend inhibits the effec-
tiveness of more diverse ensembles because higher
model instability is achieved by sacrificing model
accuracy. An ‘‘efficient frontier’’ for identifying
high potential ensemble members exists in the
lower right areas of the instability regression
plots, where models with low error and high
instability are identified. For the prognostic data,
the RBFd model is favored over RBFc because of
increased model instability. For similar reasons,
the CARTb model is preferred over CARTa. The
multicriteria methodology of ensemble construc-
tion for the prognostic application is to combine
RBFd, the most accurate model with two other
members from the set MOEc (0.52), MLPa (0.49),
LR (0.34), and CARTb (0.38). The average model
correlation is given in parentheses and shown in
Fig. 4. The preferred models are CARTb and LR,
as they both have higher levels of instability and
relatively low correlation. The generalization error
of the resulting ensemble (RBFd, LR, CARTb), is
Fig. 3. Mean generalization error for bootstrap ensembles: cytology data. [Figure: generalization error (y-axis, 0.025 to 0.075) versus number of different models in the bagging ensemble (x-axis, 0 to 25); series shown are the baseline ensembles, ensembles from all 24 models, ensembles from the top 12 models, ensembles from the top 6 models, the single best CV models, and the ensemble from multicriteria selection.]
0.194 and is plotted in Fig. 2 as a horizontal
dashed line. Applying a similar multicriteria
ensemble construction strategy to the cytology
application produces an ensemble of RBFc,
RBFb, and MOEa with a resulting generalization
error of 0.027. This value is plotted on Fig. 3 as a
dashed horizontal line.
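The multicriteria screen described above, low baseline-ensemble error combined with high instability (the maximum minus minimum error range) and low average correlation, can be expressed as a simple filter. The cutoff parameters below are hypothetical; the paper's final choice of three members remains a judgment call on the "efficient frontier":

```python
def instability(errors):
    """Model instability: the range (maximum minus minimum) of the
    generalization errors observed for a model's baseline ensembles."""
    return max(errors) - min(errors)

def multicriteria_candidates(stats, error_cut, corr_cut):
    """Return candidate models whose mean error and average correlation fall
    below the given cutoffs, ranked by instability (most unstable first).
    stats maps model name -> (list_of_errors, average_correlation)."""
    keep = [(name, instability(errs), sum(errs) / len(errs))
            for name, (errs, corr) in stats.items()
            if sum(errs) / len(errs) <= error_cut and corr <= corr_cut]
    return sorted(keep, key=lambda t: -t[1])
```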
Table 7 presents a summary of the minimum
mean generalization errors achieved for each of
the ensemble strategies. For both of these data
sets, ensembles are an effective means to reduce
diagnostic error. The strategy of forming a base-
line-bagging ensemble from a single classification
model lowers the generalization error from the
error of a ‘‘single best’’ strategy for both data sets.
More diverse ensembles result in larger error
reductions than those achieved by the baseline-
bagging ensembles, but also require careful con-
sideration of relative model error as well as model
instability and independence of model decisions.
The multicriteria process, a selective strategy of
forming ensembles from a subset of three models,
yields the lowest generalization error of all strate-
gies. The multicriteria result for the cytology data
(0.027) can be compared to two other comparable
bagging ensemble studies that have been pub-
lished. Breiman (1996) reports a generalization
error of 0.037 for baseline-bagging ensembles
consisting of CART models, while Parmanto et al.
(1996) report a generalization error of 0.039 using
ensembles of neural networks.
The multicriteria selection ensemble reduces the
generalization error (relative to the ‘‘single best’’
strategy) for the prognostic data from 0.226 to
Fig. 4. Model instability versus generalization error: prognostic data. [Scatter plot of baseline-bagging ensemble generalization error (y-axis, 0 to 0.45) against model instability (x-axis, 0 to 0.12) for the 24 models; fitted regression y = 2.3278x + 0.1234, R² = 0.8043. The multicriteria selection ensemble candidates lie in the low-error, high-instability region and are annotated with their average model correlations: (0.50), (0.52), (0.49), (0.34), (0.38).]
Fig. 5. Model instability versus generalization error: cytology data. [Scatter plot of baseline-bagging ensemble generalization error (y-axis, 0 to 0.08) against model instability (x-axis, 0 to 0.025) for the 24 models; fitted regression y = 1.5609x + 0.0245, R² = 0.7467. The multicriteria selection ensemble candidates are annotated with their average model correlations: (0.63), (0.63), (0.65), (0.59).]
0.194. The default error rate for the naïve baseline
classifier for this data set is 0.226. The error
reduction achieved by the multicriteria selection
ensemble is a statistically significant reduction of
Table 7
Comparison of generalization errors
Model selection strategy Prognostic data Cytology data
Single best CV model 0.226 (0.007) 0.029 (0.0013)
Baseline-bagging ensemble 0.209 (0.006) 0.028 (0.0013)
Diverse ensembles
Ensemble––24 potential members 0.215 (0.014) 0.033 (0.0024)
Ensemble––12 potential members 0.209 (0.013) 0.031 (0.0033)
Ensemble––6 potential members 0.203 (0.010) 0.027 (0.0011)
Multicriteria ensemble (3 members) 0.194 (0.011) 0.027 (0.0012)
Parentheses identify standard errors.
Fig. 6. 95% confidence intervals: prognostic data (generalization error for baseline, single best, and multicriteria ensembles). [Figure: confidence intervals (y-axis, 0.15 to 0.45) for the multicriteria ensemble, the single best model, and each baseline-bagging ensemble, with models ordered from most to least accurate.]
Fig. 7. 95% confidence intervals: cytology data (generalization error for baseline, single best, and greedy ensembles). [Figure: confidence intervals (y-axis, 0.02 to 0.08) for the greedy ensemble, the single best model, and each baseline-bagging ensemble, with models ordered from most to least accurate.]
error at a p value less than 0.05 and represents a
14.1% error reduction. The multicriteria selection
ensemble did not result in a statistically significant
error reduction compared to the ‘‘single best
model’’ strategy with the cytology data (0.027
versus 0.029); it did show an error reduction of
6.9%. This is a decisive error reduction from the
naïve baseline value of 0.345 for this data set. It is
not possible to provide t tests for all the ensemble alternatives investigated in this research, because there are over 325 different pairs that could be consid-
ered. A perspective that includes the variability in
our estimates of mean generalization error is pro-
vided by the 95% confidence intervals shown in
Fig. 6 (prognostic data) and Fig. 7 (cytology data).
The reader is cautioned not to draw statistical
significance conclusions from these figures because of the Bonferroni effect arising from the multitude of individual pairs that can be contrasted.
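The 95% confidence intervals in Figs. 6 and 7 can be reproduced with a normal approximation over the repetitions of ensemble formation. This is a minimal sketch, assuming the per-repetition generalization errors for a strategy are available as a plain list:

```python
def ci95(values):
    """95% confidence interval for a mean generalization error, using the
    normal approximation (reasonable for the 500 repetitions per estimate)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    se = (var / n) ** 0.5                                  # standard error
    return mean - 1.96 * se, mean + 1.96 * se
```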
5. Concluding discussion
Breast cancer outcomes, like many decision
support applications in the health care field, are
critically dependent on early detection and accu-
rate diagnosis. The accuracy of these diagnostic
decisions can be increased by an effective medical
decision support system. Recent research has
shown that physicians' diagnostic performance is
directly linked to the quality of information
available from a decision support system (Berner
et al., 1999). The model selection strategy is
therefore an important determinant of the per-
formance and acceptance of an MDSS application.
The primary strategy to select a model for an
MDSS application has been the identification of a
single most accurate model in a cross validation
study. We show that this strategy is critically
dependent on the data partitions used in the cross
validation study, and that this single best strategy
does not result in an MDSS with the lowest
achievable generalization error. A strategy of
forming ensembles (a collection of individual
models) provides more accurate diagnostic guid-
ance for the physician. There is a significant body
of research evidence today that suggests that the
generalization error of a single model can be re-
duced by using ensembles of that model provided
the model is unstable. This research typically
compares the errors of a single model (frequently
CART or neural networks) to the resulting
ensembles formed from that same model. The
reader should appreciate that our definition of
‘‘single best’’ strategy is a more realistic estimate of
the expected error from a cross validation study.
We use a population of 24 different models and
conduct 100 cross validation repetitions, during
which a number of different models are judged to
be winners. We believe this is the first MDSS re-
search to show in a rigorous fashion that ensem-
bles are more accurate than the ‘‘single best’’
strategy where the ‘‘single best’’ model is selected
from a diverse group of models.
While the theory of bagging ensembles may give
the impression that model selection is no longer
relevant (i.e., the practitioner should choose an
unstable model and aggregate the ensemble mem-
bers' decisions), our results demonstrate that the
identification of high potential candidate models
for ensemble membership remains critically
important. For example, radial basis function
neural networks dominate the most accurate
ensembles for the cytology data. Our results also
suggest that ensembles formed from a diversity of
models are generally more accurate than the
baseline-bagging ensembles. There is a discernible
decrease in generalization error as the number of
different models in an ensemble increases. Most of
the improvement occurs with ensembles formed
from 3–5 different models. The most effective
ensembles formed in this research result from a
small and selective subset of the population of
available models, with potential candidates iden-
tified by jointly considering the properties of
model generalization error, model instability, and
the independence of model decisions relative to
other ensemble members. The ensemble formed
from the multicriteria selection process for the
cytology data is also more accurate than results
reported for baseline-bagging ensembles of CART
models (Breiman, 1996) and ensembles of neural
network models (Parmanto et al., 1996).
The multicriteria selection ensemble reduces the
expected generalization error obtained from the
single best strategy for the prognostic data from
D. West et al. / European Journal of Operational Research 162 (2005) 532–551 547
0.226 to 0.194. There is a 14.1% error reduction
which is statistically significant at a p value less
than 0.05. The clinical significance of this
improvement is that there are potentially 27,000
fewer incorrect diagnoses per year based on the US
breast cancer incidence rate alone. Although with
the cytology data the multicriteria selection algo-
rithm did not result in a statistically significant
reduction compared to the ‘‘single best model’’
strategy for reasons explained earlier, it did show
an error reduction of 6.9%. While the focus of
this paper is to minimize total misclassification
errors, it is likely that a clinical implementation
would be designed at a specificity-sensitivity
tradeoff that minimizes the occurrence of false
negatives.
The reader is cautioned that there are a number
of combining theories described in the research
literature for constructing optimal ensembles. The
use of these combining algorithms might result in
more accurate ensembles than the ensembles
formed in this research using majority vote. The
numerical instability and associated estimation
problems typical of these combining theories,
however, may militate against their use in health-
care applications. While we feel the medical data
used in this research is representative of diagnostic
decision support applications, the conclusions are
based on two specific applications in the breast
cancer diagnosis domain. More research is needed
to verify that these results generalize to other
medical domains and to areas beyond health care.
Appendix A. Model definitions

Prognostic data

Model Parameter

Neural network
MLPa 32 × 2 × 2
MLPb 32 × 4 × 2
MLPc 32 × 6 × 2
MLPd 32 × 8 × 2
  MLP = multilayer perceptron. Format I × H1 × O, where I = number of input nodes, H1 = number of nodes in hidden layer 1, O = number of output nodes.
MOEa 32 × 2 × 2 (2 × 2)
MOEb 32 × 4 × 2 (3 × 2)
MOEc 32 × 6 × 2 (4 × 2)
MOEd 32 × 8 × 2 (4 × 2)
  MOE = mixture of experts. Format I × H × O (Gh × Go), where I = number of input nodes, H = number of nodes in hidden layer, O = number of output nodes, Gh = number of nodes in gating hidden layer, Go = number of gating output nodes.
RBFa 32 × 20 × 2
RBFb 32 × 40 × 2
RBFc 32 × 60 × 2
RBFd 32 × 80 × 2
  RBF = radial basis function. Format I × H × O, where I = number of input nodes, H = number of nodes in hidden layer, O = number of output nodes.

Parametric
LDA Fisher's linear discriminant analysis
LR Logistic regression
Appendix A (continued)

Non-parametric
KNNa k = 3
KNNb k = 5
KNNc k = 7
KNNd k = 9
  KNN = k nearest neighbor. Format k = i, where i = number of nearest neighbors.
KDa R = 1
KDb R = 3
KDc R = 5
KDd R = 7
  KD = kernel density. Format R = j, where j = radius of kernel function.
CARTa Gini splitting rule
CARTb Twoing splitting rule

Cytology data

Model Parameter

Neural network
MLPa 9 × 2 × 2
MLPb 9 × 4 × 2
MLPc 9 × 6 × 2
MLPd 9 × 8 × 2
  MLP = multilayer perceptron. Format I × H1 × O, where I = number of input nodes, H1 = number of nodes in hidden layer 1, O = number of output nodes.
MOEa 9 × 2 × 2 (2 × 2)
MOEb 9 × 4 × 2 (3 × 2)
MOEc 9 × 6 × 2 (4 × 2)
MOEd 9 × 8 × 2 (4 × 2)
  MOE = mixture of experts. Format I × H × O (Gh × Go), where I = number of input nodes, H = number of nodes in hidden layer, O = number of output nodes, Gh = number of nodes in gating hidden layer, Go = number of gating output nodes.
RBFa 9 × 20 × 2
RBFb 9 × 40 × 2
RBFc 9 × 60 × 2
RBFd 9 × 80 × 2
  RBF = radial basis function. Format I × H × O, where I = number of input nodes, H = number of nodes in hidden layer, O = number of output nodes.

Parametric
LDA Fisher's linear discriminant analysis
LR Logistic regression

Non-parametric
KNNa k = 5
KNNb k = 7
KNNc k = 9
KNNd k = 11
  KNN = k nearest neighbor. Format k = i, where i = number of nearest neighbors.
References
Anders, U., Korn, O., 1999. Model selection in neural
networks. Neural Networks 12, 309–323.
Baker, J.A., Kornguth, P.J., Lo, J.Y., Williford, M.E., Floyd,
C.E., 1995. Breast cancer: Prediction with artificial neural
network based on bi-rads standardized lexicon. Radiology
196, 817–822.
Baker, J.A., Kornguth, P.J., Lo, J.Y., Floyd, C.E., 1996.
Artificial neural network: Improving the quality of breast
biopsy recommendations. Radiology 198, 131–135.
Baxt, W.G., 1990. Use of an artificial neural network for
data analysis in clinical decision-making: The diagnosis of
acute coronary occlusion. Neural Computation 2, 480–
489.
Baxt, W.G., 1991. Use of an artificial neural network for the
diagnosis of myocardial infarction. Annals of Internal
Medicine 115, 843–848.
Baxt, W.G., 1994. A neural network trained to identify the
presence of myocardial infarction bases some decisions on
clinical associations that differ from accepted clinical
teaching. Medical Decision Making 14, 217–222.
Bay, S.D., 1999. Nearest neighbor classification from
multiple feature subset. Intelligent Data Analysis 3, 191–
209.
Berner, E.S., Maisiak, R.S., Cobbs, C.G., Taunton, O.D., 1999.
Effects of a decision support system on physicians' diagnos-
tic performance. Journal of the American Medical Infor-
matics Association: JAMIA 6, 420–427.
Blake, C.L., Merz, C.J., 1998. UCI repository of machine
learning databases. Department of Information and Com-
puter Science, University of California, Irvine. Avail-
able from http://www1.ics.uci.edu/~mlearn/MLSummary.
html.
Bottaci, L., Drew, P.J., Hartley, J.E., Hadfieldet, M.B., 1997.
Artificial neural networks applied to outcome prediction for
colorectal cancer patients in separate institutions. The
Lancet 350, 469–472.
Bounds, D.G., Lloyd, P.J., Mathew, B.G., 1990. A comparison
of neural network and other pattern recognition approaches
to the diagnosis of low back disorders. Neural Networks 3,
583–591.
Breiman, L., 1995. Stacked regressions. Machine Learning 24,
49–64.
Breiman, L., 1996. Bagging predictors. Machine Learning 26,
123–140.
Buntinx, F., Truyen, J., Embrechts, P., Moreel, G., Peeters, R.,
1992. Evaluating patients with chest pain using classification
and regression trees. Family Practice 9 (2), 149–
153.
Crichton, N.J., Hinde, J.P., Marchini, J., 1997. Models for
diagnosing chest pain: Is CART helpful? Statistics in
Medicine 16 (7), 717–727.
Cunningham, P., Carney, J., Jacob, S., 2000. Stability problems
with artificial neural networks and the ensemble solution.
Artificial Intelligence in Medicine 20, 217–225.
Detrano, R., Janosi, A., Steinbrun, W., Pfisterer, M., Schmid,
J., Sandhu, S., Guppy, K.H., Lee, S., Froelicher, V., 1989.
International application of a new probability algorithm for
the diagnosis of coronary artery disease. The American
Journal of Cardiology 64, 304–310.
Dougados, M., Van der Linden, S., Juhlin, R., Huitfeldt, B.,
Amor, B., Calin, A., Cats, A., Dijkmans, B., Olivieri, I.,
Pasero, G., Veys, E., Zeidler, H., 1991. The European
spondylarthropathy study group preliminary criteria for the
classification of spondylarthropathy. Arthritis and Rheu-
matism 34 (10), 1218–1230.
Fricker, J., 1997. Artificial neural networks improve diagnosis
of acute myocardial infarction. The Lancet 350, 935.
Gilpin, E., Olshen, R., Henning, H., Ross Jr., J., 1983. Risk
prediction after myocardial infarction. Cardiology 70, 73–
84.
Greenlee, R., Hill-Harmon, M.B., Murray, T., Thun, M., 2001.
Cancer statistics 2001. A Cancer Journal for Clinicians 51,
15–36.
Hubbard, B.L., Gibbons, R.J., Lapeyre, A.C., Zinsmeister,
A.R., Clements, I.P., 1992. Identification of severe coronary
artery disease using simple clinical parameters. Archives of
Internal Medicine 152, 309–312.
Josefson, D., 1997. Computers beat doctors in interpreting
ECGs. British Medical Journal 315, 764–765.
Kononenko, I., Bratko, I., 1991. Information-based evaluation
criterion for classifier's performance. Machine Learning 6,
67–80.
Lisboa, P.J.G., 2002. A review of evidence of health benefit
from artificial neural networks in medical intervention.
Neural Networks 15, 11–39.
Maclin, P.S., Dempsey, J., 1994. How to improve a neural
network for early detection of hepatic cancer. Cancer
Letters 77, 95–101.
Makuch, R.W., Rosenberg, P.S., 1988. Identifying prognostic factors in binary outcome data: An application using liver function tests and age to predict liver metastases. Statistics in Medicine 7, 843–856.

Appendix A (continued)
KDa R = 0.1
KDb R = 0.5
KDc R = 1.0
KDd R = 1.5
  KD = kernel density. Format R = j, where j = radius of kernel function.
CARTa Gini splitting rule
CARTb Twoing splitting rule
Mangasarian, O.L., Street, W.N., Wolberg, W.H., 1995. Breast
cancer diagnosis and prognosis via linear programming.
Operations Research 43 (4), 570–577.
Mango, L.J., 1994. Computer-assisted cervical cancer screening
using neural networks. Cancer Letters 77, 155–162.
Mango, L.J., 1996. Reducing false negatives in clinical practice:
The role of neural network technology. American Journal of
Obstetrics and Gynecology 175, 1114–1119.
Marble, R.P., Healy, J.C., 1999. A neural network approach to
the diagnosis of morbidity outcomes in trauma care.
Artificial Intelligence in Medicine 15, 299–307.
Miller, R.A., 1994. Medical diagnostic decision support sys-
tems-past, present, and future: A threaded bibliography and
brief commentary. Journal of the American Medical Infor-
matics Association 1, 8–27.
Nomura, H., Kashiwagi, S., Hayashi, J., Kajiyama, W.,
Ikematsu, H., Noguchi, A., Tani, S., Goto, M., 1988.
Prevalence of gallstone disease in a general population of
Okinawa, Japan. American Journal of Epidemiology 128,
598–605.
Palocsay, S.W., Stevens, S.P., Brookshire, R.G., Sacco, W.J.,
Copes, W.S., Buckman Jr., R.F., Smith, J.S., 1996. Using
neural networks for trauma outcome evaluation. European
Journal of Operational Research 93, 369–386.
Parkin, D.M., 1998. Epidemiology of cancer: Global patterns
and trends. Toxicology Letters 102–103, 227–234.
Parkin, D.M., 2001. Global cancer statistics in the year 2000.
Lancet Oncology 2, 533–534.
Parmanto, B., Munro, P.W., Doyle, H.R., 1996. Reducing
variance of committee prediction with resampling tech-
niques. Connection Science 8, 405–425.
Rosenberg, C., Erel, J., Atlan, H., 1993. A neural network that
learns to interpret myocardial planar thallium scintigrams.
Neural Computation 5, 492–502.
Schubert, T.T., Bologna, S.D., Nensey, Y., Schubert, A.,
Mascha, E.J., Ma, C.K., 1993. Ulcer risk factors: Interac-
tions between heliocobacter pylori infection, nonsteroidal
use and age. The American Journal of Medicine 94, 416–
418.
Sharkey, A.J.C., 1996. On combining artificial neural nets.
Connection Science 8, 299–313.
Sheng, O.R.L., 2000. Editorial: Decision support for healthcare
in a new information age. Decision Support Systems 30,
101–103.
Sheppard, D., McPhee, D., Darke, C., Shrethra, B., Moore, R.,
Jurewits, A., Gray, A., 1999. Predicting cytomegalovirus
disease after renal transplantation: An artificial neural
network approach. International Journal of Medical Infor-
matics 54, 55–71.
Temkin, N.R., Holubkov, R., Machamer, J.E., Winn, H.R.,
Dikmen, S.S., 1995. Classification and regression trees
(CART) for prediction of function at 1 year following head
trauma. Journal of Neurosurgery 82, 764–771.
Tierney, W.H., Murray, M.D., Gaskins, D.L., Zhou, X.-H.,
1997. Using computer-based medical records to predict
mortality risk for inner-city patients with reactive airways
disease. Journal of the Medical Informatics Association 4,
313–321.
Tourassi, G.D., Floyd, C.E., Sostman, H.D., Coleman, R.E.,
1993. Acute pulmonary embolism: Artificial neural network
approach for diagnosis. Radiology 189, 555–558.
Tsujji, O., Freedman, M.T., Mun, S.K., 1999. Classification of
microcalcifications in digital mammograms using trend-
oriented radial basis function neural network. Pattern
Recognition 32, 891–903.
Tu, J.V., Weinstein, M.C., McNeil, B.J., Naylor, C.D., The
Steering Committee of the Cardiac Care Network of
Ontario, 1998. Predicting mortality after coronary artery
bypass surgery: What do artificial neural networks learn?
Medical Decision Making 18, 229–235.
West, D., West, V., 2000. Model selection for a medical
diagnostic decision support system: A breast cancer detec-
tion case. Artificial Intelligence in Medicine 20, 183–204.
Wilding, P., Morgan, M.A., Grygotis, A.E., Shoffner, M.A.,
Rosato, E.F., 1994. Application of backpropagation neural
networks to diagnosis of breast and ovarian cancer. Cancer
Letters 77, 145–153.
Wolberg, W.H., Street, W.N., Heisey, D.M., Mangasarian,
O.L., 1995. Computerized breast cancer diagnosis and
prognosis from fine needle aspirates. Archives of Surgery
130, 511–516.
Wolpert, D., 1992. Stacked generalization. Neural Networks 5,
241–259.
Wu, Y., Giger, M.L., Doi, K., Vyborny, C.J., Schmidt, R.A.,
Metz, C.E., 1993. Artificial neural networks in mammogra-
phy: Application to decision making in the diagnosis of
breast cancer. Radiology 187, 81–87.
Zhang, J., 1999a. Developing robust non-linear models through
bootstrap aggregated neural networks. Neurocomputing 25,
93–113.
Zhang, J., 1999b. Inferential estimation of polymer quality
using bootstrap aggregated neural networks. Neural Net-
works 12, 927–938.
Zhilkin, P.A., Somorjai, R.L., 1996. Application of several
methods of classification fusion to magnetic resonance
spectra. Connection Science 8, 427–442.
D. West et al. / European Journal of Operational Research 162 (2005) 532–551 551
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalorenarwatsonia7
 
Low Rate Call Girls Ambattur Anika 8250192130 Independent Escort Service Amba...
Low Rate Call Girls Ambattur Anika 8250192130 Independent Escort Service Amba...Low Rate Call Girls Ambattur Anika 8250192130 Independent Escort Service Amba...
Low Rate Call Girls Ambattur Anika 8250192130 Independent Escort Service Amba...narwatsonia7
 
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowSonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowRiya Pathan
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...Garima Khatri
 
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls AvailableVip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls AvailableNehru place Escorts
 
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safenarwatsonia7
 
Call Girl Coimbatore Prisha☎️ 8250192130 Independent Escort Service Coimbatore
Call Girl Coimbatore Prisha☎️  8250192130 Independent Escort Service CoimbatoreCall Girl Coimbatore Prisha☎️  8250192130 Independent Escort Service Coimbatore
Call Girl Coimbatore Prisha☎️ 8250192130 Independent Escort Service Coimbatorenarwatsonia7
 
Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000
Ahmedabad Call Girls CG Road 🔝9907093804  Short 1500  💋 Night 6000Ahmedabad Call Girls CG Road 🔝9907093804  Short 1500  💋 Night 6000
Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000aliya bhat
 
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...Miss joya
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.MiadAlsulami
 
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safenarwatsonia7
 
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort ServiceCall Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Serviceparulsinha
 

Recently uploaded (20)

College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
 
Russian Call Girls in Pune Tanvi 9907093804 Short 1500 Night 6000 Best call g...
Russian Call Girls in Pune Tanvi 9907093804 Short 1500 Night 6000 Best call g...Russian Call Girls in Pune Tanvi 9907093804 Short 1500 Night 6000 Best call g...
Russian Call Girls in Pune Tanvi 9907093804 Short 1500 Night 6000 Best call g...
 
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore EscortsVIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
 
Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...
Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...
Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...
 
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
Russian Call Girl Brookfield - 7001305949 Escorts Service 50% Off with Cash O...
 
Call Girls Chennai Megha 9907093804 Independent Call Girls Service Chennai
Call Girls Chennai Megha 9907093804 Independent Call Girls Service ChennaiCall Girls Chennai Megha 9907093804 Independent Call Girls Service Chennai
Call Girls Chennai Megha 9907093804 Independent Call Girls Service Chennai
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
Low Rate Call Girls Ambattur Anika 8250192130 Independent Escort Service Amba...
Low Rate Call Girls Ambattur Anika 8250192130 Independent Escort Service Amba...Low Rate Call Girls Ambattur Anika 8250192130 Independent Escort Service Amba...
Low Rate Call Girls Ambattur Anika 8250192130 Independent Escort Service Amba...
 
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowSonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
 
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls AvailableVip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
Vip Call Girls Anna Salai Chennai 👉 8250192130 ❣️💯 Top Class Girls Available
 
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Majestic 📞 9907093804 High Profile Service 100% Safe
 
Call Girl Coimbatore Prisha☎️ 8250192130 Independent Escort Service Coimbatore
Call Girl Coimbatore Prisha☎️  8250192130 Independent Escort Service CoimbatoreCall Girl Coimbatore Prisha☎️  8250192130 Independent Escort Service Coimbatore
Call Girl Coimbatore Prisha☎️ 8250192130 Independent Escort Service Coimbatore
 
Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000
Ahmedabad Call Girls CG Road 🔝9907093804  Short 1500  💋 Night 6000Ahmedabad Call Girls CG Road 🔝9907093804  Short 1500  💋 Night 6000
Ahmedabad Call Girls CG Road 🔝9907093804 Short 1500 💋 Night 6000
 
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCREscort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
 
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
 
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
 
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
 
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort ServiceCall Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
 

Ensemble strategies improve breast cancer diagnosis using decision support systems

1. Introduction

Breast cancer is one of the most prevalent cancers, ranking third worldwide, and is the most prevalent form of cancer among women (Parkin, 1998). Most developed countries have seen increases in its incidence within the past 20 years. Based on recent available international data, breast cancer ranks second only to lung cancer as the most common newly diagnosed cancer (Parkin, 2001). In 2001 the American Cancer Society predicted that in the United States approximately 40,200 deaths would result from breast cancer and that 192,200 women would be newly diagnosed with the disease (Greenlee et al., 2001). Breast cancer outcomes have improved during the last decade with the development of more effective diagnostic techniques and improvements in treatment methodologies. A key factor in this trend is the early detection and accurate diagnosis of this disease. The long-term survival rate for women in whom breast cancer has not metastasized has increased, with the majority of women surviving many years after diagnosis and treatment. A medical diagnostic decision support system (MDSS) is one technology that can facilitate the early detection of breast cancer.

The MDSS (Miller, 1994; Sheng, 2000) is an evolving technology capable of increasing diagnostic decision accuracy by augmenting the natural capabilities of human diagnosticians in the complex process of medical diagnosis. A recent study finds that physicians' diagnostic performance can be strongly influenced by the quality of information produced by a diagnostic decision support system (Berner et al., 1999). For MDSS implementations that are based on supervised learning algorithms, the quality of information produced depends on the choice of an algorithm that learns to predict the presence or absence of a disease from a collection of examples with known outcomes. The focus of this paper is MDSS systems based on these inductive learning algorithms; expert-system-based approaches are excluded.

Some of the earliest MDSS systems used simple parametric models such as linear discriminant analysis and logistic regression. The high cost of making a wrong diagnosis has motivated an intense search for more accurate algorithms, including non-parametric methods such as k nearest neighbor or kernel density, feedforward neural networks such as the multilayer perceptron or radial basis function, and classification-and-regression trees. Unfortunately, there is no theory available to guide the selection of an algorithm for a specific diagnostic application. Traditionally, model selection is accomplished by selecting the ‘‘single best'' (i.e., most accurate) method after comparing the relative accuracy of a limited set of models in a cross validation study.

* Corresponding author. Tel.: +1-252-3286370; fax: +1-919-3284092. E-mail address: westd@mail.ecu.edu (D. West).
European Journal of Operational Research 162 (2005) 532–551. doi:10.1016/j.ejor.2003.10.013. © 2003 Elsevier B.V. All rights reserved.
Recent research suggests that an alternative to selecting the ‘‘single best model'' is to employ ensembles of models. Breiman (1996) reports that ‘‘bootstrap ensembles,'' combinations of models built from perturbed versions of the learning sets, may have significantly lower errors than the ‘‘single best model'' selection strategy. In fact, several authors provide evidence that the ‘‘single best model'' selection strategy may be the wrong approach (Breiman, 1995, 1996; Wolpert, 1992; Zhang, 1999a,b).

The purpose of this research is to investigate the potential of bootstrap ensembles to reduce the diagnostic errors of MDSS applications for the early detection and diagnosis of breast cancer. We specifically investigate the effect of model diversity (the number of different models in the ensemble) on the generalization accuracy of the ensemble. The ensemble strategies investigated include a ‘‘baseline-bagging'' ensemble (i.e., an ensemble formed from multiple instances of a single model), a diverse ensemble with controlled levels of model diversity, and an ensemble formed from a multicriteria selection methodology. The three ensemble strategies are benchmarked against the ‘‘single best model.'' In all cases, an aggregate ensemble decision is achieved by majority vote of the decisions of the ensemble members.

In the next section of this paper we review the model decisions of several recent MDSS implementations. The third section discusses our research methodology and the experimental design we use to estimate the generalization error for the MDSS ensembles. The fourth section presents our results, the mean generalization error for several ensemble strategies. We conclude with a discussion of these results and their implications for guiding the ensemble formation strategy of a medical diagnostic decision support system.

2. MDSS model selection

Virtually all MDSS implementations to date use the ‘‘single best model'' selection strategy.
This strategy selects a model from a limited set of potential models whose accuracy is estimated in cross validation tests (Anders and Korn, 1999; Kononenko and Bratko, 1991). The most accurate model in the cross validation study is then selected for use in the MDSS. A brief discussion of some of the single model MDSS implementations reported in the research literature follows. It is not possible to present a complete survey of MDSS applications; readers interested in more information on this subject are referred to the survey papers of Miller (1994) and Lisboa (2002).
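Once cross validation errors are in hand, the ‘‘single best model'' rule reduces to an argmin over mean validation error. The following is a minimal sketch of that rule; the model names and error values are made up for illustration and are not from the paper's experiments:

```python
from statistics import mean

# Hypothetical validation errors: model name -> error on each of three
# cross validation folds. All names and numbers are illustrative.
val_errors = {
    "LDA": [0.08, 0.07, 0.09],
    "LR":  [0.06, 0.05, 0.07],
    "MLP": [0.04, 0.06, 0.05],
    "KNN": [0.09, 0.08, 0.10],
}

def single_best_model(errors):
    """Pick the model with the lowest mean validation error."""
    return min(errors, key=lambda m: mean(errors[m]))

best = single_best_model(val_errors)
print(best)  # → MLP
```

The rule commits entirely to one model; the ensemble strategies studied later in the paper relax exactly this commitment.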
2.1. Single model MDSS applications

Linear discriminant analysis has been used to diagnose coronary artery disease (Detrano et al., 1989), acute myocardial infarction (Gilpin et al., 1983), and breast cancer (West and West, 2000). Logistic regression has been used to predict or diagnose spondylarthropathy (Dougados et al., 1991), acute myocardial infarction (Gilpin et al., 1983), coronary artery disease (Hubbard et al., 1992), liver metastases (Makuch and Rosenberg, 1988), gallstones (Nomura et al., 1988), ulcers (Schubert et al., 1993), mortality risk for reactive airway disease (Tierney et al., 1997), and breast cancer (West and West, 2000).

Non-parametric models have also been used to diagnose or predict various pathologies. K nearest neighbor was used in comparative studies to diagnose lower back disorders (Bounds et al., 1990), predict 30-day mortality and survival following acute myocardial infarction (Gilpin et al., 1983), and separate cancerous and non-cancerous breast tumor masses (West and West, 2000). Kernel density has been utilized to determine outcomes for a set of patients with severe head injury (Tourassi et al., 1993), and to differentiate malignant and benign cells taken from fine needle aspirates of breast tumors (Wolberg et al., 1995).

Neural networks have been used in a great number of MDSS applications because of the belief that they have greater predictive power (Tu et al., 1998). The traditional multilayer perceptron has been used to diagnose breast cancer (Baker et al., 1995, 1996; Josefson, 1997; Wilding et al., 1994; Wu et al., 1993), acute myocardial infarction (Baxt, 1990, 1991, 1994; Fricker, 1997; Rosenberg et al., 1993), colorectal cancer (Bottaci et al., 1997), lower back disorders (Bounds et al., 1990), hepatic cancer (Maclin and Dempsey, 1994), sepsis (Marble and Healy, 1999), cytomegalovirus retinopathy (Sheppard et al., 1999), trauma outcome (Palocsay et al., 1996), and ovarian cancer (Wilding et al., 1994).
PAPNET, an MDSS based on an MLP, is now available for screening gynecologic cytology smears (Mango, 1994, 1996). The radial basis function neural network has been used to diagnose lower back disorders (Bounds et al., 1990), classify micro-calcifications in digital mammograms (Tsujji et al., 1999), and in a comparative study of acute pulmonary embolism (Tourassi et al., 1993). Classification and regression trees have been used to predict patient function following head trauma (Temkin et al., 1995), to evaluate patients with chest pains (Buntinx et al., 1992), and to diagnose anterior chest pain (Crichton et al., 1997).

2.2. Ensemble applications

There is a growing body of evidence that ensembles, committees of machine learning algorithms, result in higher prediction accuracy. One of the most popular ensemble strategies is bootstrap aggregation, or ‘‘bagging predictors,'' advanced by Breiman (1996). This strategy, depicted in Fig. 1, uses multiple instances of a learning algorithm (C1(x), ..., CB(x)) trained on bootstrap replicates of the learning set (TB1, ..., TBB). Plurality vote is used to produce an aggregate decision from the models' individual decisions. If the classification algorithm is unstable, in the sense that perturbed versions of the training set produce significant changes in the predictor, then bagging predictors can increase decision accuracy. Breiman demonstrates this by constructing bootstrap ensemble models from classification and regression trees and testing the resulting ensembles on several benchmark data sets. On these data sets, the CART bagging ensembles achieve reductions in classification errors ranging from 6% to 77%. Breiman found that ensembles with as few as 10 bootstrap replicates are sufficient to generate most of the improvement in classification accuracy (Breiman, 1996). Model instability is therefore an important consideration in constructing bagging ensembles.
Models with higher levels of instability will achieve greater relative improvement in classification accuracy as a result of a bagging ensemble strategy.

The current research on ensemble strategies focuses primarily on ensembles of a single classification algorithm. For example, Zhang (1999b) aggregates 30 multilayer perceptron neural networks with varying numbers of hidden neurons to estimate polymer reactor quality, while Cunningham et al. (2000) report improved diagnostic prediction for MDSS systems that aggregate neural network models. Bay (1999) tested combinations of
nearest neighbor classifiers trained on random subsets of features and found the aggregate model to outperform standard nearest neighbor variants. Zhilkin and Somorjai (1996) explore model diversity in bagging ensembles by using combinations of linear and quadratic discriminant analysis, logistic regression, and multilayer perceptron neural networks to classify brain spectra from magnetic resonance measurements. They report that the bootstrap ensembles are more accurate in this application than any ‘‘single best model,'' and that the performance of the single models varies widely, performing well on some data sets and poorly on others.

Research to date confirms that the generalization error of a specific model can be reduced by bootstrap ensemble methods. There has been little systematic study of the properties of multimodel MDSS systems. The contribution of this paper is to investigate more thoroughly the model selection strategies available to the practitioner implementing an MDSS system, including the role of model diversity in bagging ensembles.

3. Research methodology

Our research methodology is presented in three parts. Part one describes the two data sets that we examine. Part two describes the 24 models that we employ for our MDSS ensembles. Part three presents our experimental design.

3.1. Breast cancer data sets

The two data sets investigated in this research were both contributed by researchers at the University of Wisconsin and are available from the UCI Machine Learning Repository (Blake and Merz, 1998). The Wisconsin breast cancer data consist of records of breast cytology first collected and analyzed by Wolberg, Street, and Mangasarian (Mangasarian et al., 1995; Wolberg et al., 1995).
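For readers who want to experiment with the cytology data, a minimal parsing sketch follows. It assumes the comma-separated layout we understand the UCI repository file to use (a sample id, nine ordinal features, and a class label coded 2 = benign and 4 = malignant, with '?' marking missing values); the two records below are fabricated for illustration:

```python
import csv
import io

# Two illustrative rows in the assumed UCI 'breast-cancer-wisconsin' layout:
# sample id, nine ordinal cytology features (1-10), class (2=benign, 4=malignant).
raw = """1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,4
"""

def parse_cytology(text):
    """Parse records into (features, label) pairs, dropping incomplete rows."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if "?" in row:          # the file marks missing values with '?'
            continue            # one simple option: drop incomplete records
        features = [int(v) for v in row[1:10]]
        label = "malignant" if row[10] == "4" else "benign"
        records.append((features, label))
    return records

data = parse_cytology(raw)
print(len(data), data[0][1], data[1][1])  # → 2 benign malignant
```

Dropping rows with missing values is only one of several reasonable treatments; the paper does not specify its handling of the file's missing entries.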
The data, which we will refer to as the ‘‘cytology data,'' consist of 699 records of visually assessed nuclear features of fine needle aspirates from patients whose diagnoses resulted in 458 benign and 241 malignant cases of breast cancer. A malignant label is confirmed by performing a biopsy on the breast tissue. Nine ordinal variables measure properties of the cell, such as thickness, size, and shape, and are used to classify the case as benign or malignant.

The second data set we refer to as the ‘‘prognostic data.'' These data are from follow-up visits and include only those cases exhibiting invasive breast cancer without evidence of distant metastases at the time of diagnosis (Mangasarian et al., 1995; Wolberg et al., 1995). Thirty features, computed from a digitized image of a fine needle aspirate of a breast mass, describe characteristics of the cell nuclei present in the image. There are 198 examples; 47 are recurrent cases and 151 are non-recurrent cases.

[Fig. 1 depicts bootstrap replicates TB1, TB2, ..., TBB, each training a classifier C1(x), C2(x), ..., CB(x), whose decisions are aggregated by plurality vote: C*(x) = arg max_{y in Y} sum_{i=1..B} 1(Ci(x) = y).]
Fig. 1. Bagging ensemble scheme for classification decisions.
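As an illustration, the bootstrap-then-plurality-vote structure of Fig. 1 can be sketched in a few lines. The toy one-dimensional learning set and the 1-nearest-neighbour member rule are our own placeholders, not the paper's; in the study, the 24 models of Section 3.2 take the members' place:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw a bootstrap replicate: sample len(data) records with replacement."""
    return [rng.choice(data) for _ in data]

def one_nn(train):
    """A deliberately simple (and unstable) base classifier for the sketch."""
    def classify(x):
        return min(train, key=lambda rec: abs(rec[0] - x))[1]
    return classify

def plurality_vote(members, x):
    # C*(x) = argmax_y sum_i 1(C_i(x) = y) -- the aggregation rule of Fig. 1
    return Counter(c(x) for c in members).most_common(1)[0][0]

rng = random.Random(0)
learning_set = [(0.1, "benign"), (0.3, "benign"),
                (0.8, "malignant"), (0.9, "malignant")]
ensemble = [one_nn(bootstrap(learning_set, rng)) for _ in range(11)]
print(plurality_vote(ensemble, 0.2), plurality_vote(ensemble, 0.85))
```

Each member sees a perturbed version of the learning set, so members disagree on hard cases; the vote smooths those disagreements out.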
3.2. Model description

The learning algorithms have been selected to represent most of the methods used in prior MDSS research and commercial applications. We use the term ‘‘model'' to refer to a specific configuration of a learning architecture. The architectures include linear discriminant analysis (LDA), logistic regression (LR), three different neural network algorithms (multilayer perceptron (MLP), mixture-of-experts (MOE), and radial basis function (RBF)), classification and regression trees (CART), the nearest neighbor classifier (KNN), and kernel density (KD). Many of these algorithms require specific configuration decisions or parameter choices. In these cases, the specific configurations and parameters we use are guided by principles of model simplicity and generally accepted practices. For example, the neural network models are limited to a single hidden layer with the number of hidden layer neurons generally ranging from 2 to 8. A total of four specific neural network configurations are created for each of the three architectures. Four different configurations are also created for KNN, with the number of nearest neighbors ranging from 3 to 11. For KD, density widths range from 0.1 to 7.0. Both the Gini and Twoing splitting rules are used to create two different CART configurations.

3.3. Experimental design

The purpose of the experimental design is to provide reliable estimates of the generalization error for the ‘‘single best model'' selection strategy and several ensemble formation strategies. To estimate the mean generalization error of each strategy, the available data, D, is split into three partitions: a training set, Tij, a validation set, Vij, and a final independent holdout test set, Hi, which is used to measure the model's generalization error. The subscript i is an index for the run number, varying from 1 to 100, while j indexes the cross validation partition number, varying from 1 to 10.

3.3.1.
Estimating ‘‘single best model'' generalization error

The ‘‘single best'' generalization error is estimated by generating 10-fold cross validation partitions, training all 24 models with Tij, determining the model with the lowest error on the validation set Vij, and finally measuring that model's generalization error on the independent holdout test set Hi. The 10-fold cross validation partitioning is repeated 100 times. The validation set Vij is also used to implement early stopping during training of the neural networks to avoid model overfitting. The details of this process are specified in the algorithm below.

We begin in step 1 by randomly selecting data to form an independent test set, which is then removed from the available data, D (step 2). The test set is sized to be approximately 10% of the available data. The remaining data, Di (where Di = D - Hi), is randomly shuffled and partitioned into 10 mutually exclusive sets in step 3. One partition is used as a validation set (step 4a) and the other nine are consolidated and used for training (step 4b). This is repeated 10 times so that each partition functions as a validation set. The ‘‘single best model'' is then identified as the model with the lowest error on the 10 validation sets (steps 5 and 6). An estimate of this model's accuracy on future unseen cases is measured using the independent test set (step 7). The seven steps are repeated 100 times to determine a mean generalization error for the ‘‘single best model'' strategy (step 8).

Repeat for i = 1 to 100
  Step 1. Create holdout test set, Hi, by randomly sampling without replacement from the available data, D. Hi is sized to be approximately 10% of D.
  Step 2. Remove Hi from D, forming Di = D - Hi.
  Step 3. Randomly shuffle and partition Di into j = 1, ..., 10 mutually exclusive partitions, Dij.
  Step 4. Repeat for j = 1 to 10
    a. Form validation set Vij = Dij.
    b. Form training set Tij = Di - Vij by consolidating the remaining partitions.
    c.
Repeat for model number B, where B = 1 to 24
      i. Train classifier model CB(x) with training set Tij.
      ii. Evaluate classifier error EVijB on validation set Vij.
    end loop B
  end loop j
  Step 5. Determine the average validation error EiB = (1/10) * sum_{j=1..10} EVijB for each model.
  Step 6. Identify the ‘‘single best model'' B* = arg min_B (EiB) over B = 1, ..., 24.
  Step 7. Estimate the generalization error of the winning model, ÊiB*, using the independent holdout test set, Hi.
end loop i
Step 8. Determine the mean generalization error for the ‘‘single best model'' strategy, ÊB* = (1/100) * sum_{i=1..100} ÊiB*.

3.3.2. Estimating bagging ensemble generalization error

The generalization error for bagging ensembles is estimated at several controlled levels of ensemble model diversity (the number of different models in the ensemble). Each ensemble consists of exactly 24 members. We chose 24 ensemble members because of the symmetry with the number of models investigated, and note that this number exceeds the minimum of 10 members reported by Breiman (1996) as necessary to achieve an improvement in generalization accuracy. We use the term ‘‘baseline-bagging ensembles'' to describe ensembles whose membership is limited to replications of a single model; these are the least diverse ensembles. The most diverse ensembles are constructed from the 24 different models. Intermediate levels of diversity (between the baseline ensembles and the most diverse ensembles) are constructed with 12, 8, 6, 4, 3, and 2 distinct models. For these intermediate levels of diversity, the ensembles are formed by randomly selecting the specified number of models without replacement from the set of 24 available models. The models chosen are replicated to maintain a total of 24 ensemble members; for example, 12 models would each be replicated twice, and 8 models three times. The population of available models from which the ensembles are formed is also an important determinant of ensemble accuracy.
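The membership rule just described (select d distinct models without replacement, then replicate each to fill 24 voting seats) can be sketched as follows; the placeholder model names stand in for the 24 configurations of Section 3.2:

```python
import random

# Placeholder names for the paper's 24 model configurations.
POPULATION = [f"model_{i:02d}" for i in range(1, 25)]

def ensemble_membership(diversity, population, rng):
    """Pick `diversity` distinct models and replicate each to 24 members."""
    assert 24 % diversity == 0, "diversity level must divide 24"
    chosen = rng.sample(population, diversity)   # without replacement
    replications = 24 // diversity
    return [m for m in chosen for _ in range(replications)]

rng = random.Random(42)
members = ensemble_membership(8, POPULATION, rng)
print(len(members), len(set(members)))  # → 24 8
```

With diversity 1 this degenerates to a baseline-bagging ensemble (one model, 24 replications); with diversity 24 every model appears once.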
To explore this effect, we also form diverse ensembles whose members are sampled from populations of the 12 most accurate models (top 50%) and the 6 most accurate models (top 25%). Our final ensemble is formed from a multicriteria selection process that includes model instability and model correlation as well as model accuracy (Cunningham et al., 2000; Sharkey, 1996). The ensemble configurations for these restricted populations are limited to those defined in Table 1 below.

Table 1
Ensemble configurations for model diversity

Number of          Number of model     Population of models
different models   replications        24 models   12 models   6 models   3 models
24                 1                   X
12                 2                   X           X
8                  3                   X           X
6                  4                   X           X           X
4                  6                   X           X           X
3                  8                   X           X           X          X (a)
2                  12                  X           X           X
1 (b)              24                  X

(a) Multicriteria selection ensemble.
(b) Baseline ensemble.

A methodology for measuring comparable generalization errors for bagging ensembles is discussed next. This methodology parallels the work of Breiman (1996), Wolpert (1992), and Zhang (1999b). All bagging ensembles investigated in this research consist of 24 voting members that have been trained with different bootstrap replicate data. The purpose of the bootstrap replicates
is to create a different learning perspective for each of the 24 models, thereby increasing the independence of the prediction errors and generating more accurate ensembles. The data partitions used to train and test the ensemble members (i.e., T_ij, V_ij, H_i) are the same partitions created in the earlier section for the "single best model" strategy. For clarity in expressing the ensemble algorithm, steps 1–3 used to create these partitions are repeated below. The ensemble algorithm first differs from the "single best model" strategy's 10-fold cross validation algorithm in step 4.c.i, where 100 bootstrap training replicates are formed by sampling with replacement from the training set T_ij. We refer to these bootstrap training sets by the symbol TB_ijk, where i refers to the run number, j the cross validation partition, and k the bootstrap replicate. A total of k = 1...100 bootstrap samples are created for each cross validation, for a total of 1000 bootstrap replicates. We create more than the minimum number of bootstrap replicates required by the 24 models to produce better estimates of the generalization error during the sampling process described next.

Steps 5 through 7 define a process of randomly creating ensembles with controlled levels of diversity from the test sets created for each model in step 4.c.ii.2, where each trained classifier is tested on the independent holdout test set H_i. For each level of diversity investigated, 500 different ensembles are formed to estimate a mean generalization error. For each level of diversity, the number of models included in the ensemble membership is defined in Table 1 and varies from 1 (for the baseline-bagging ensembles) to 24 (for the most diverse ensembles). For the baseline-bagging ensembles, we form 24 ensembles, one for each model. For the most diverse ensembles, all models are included in the ensemble.
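The bootstrap training sets TB_ijk described above can be sketched as index resampling with replacement: each replicate is the same size as T_ij but repeats some rows and omits others. A minimal sketch (the 100-row partition and the helper name are illustrative stand-ins):

```python
# Sketch of forming k bootstrap training replicates TB_ijk from a
# training partition T_ij by sampling with replacement.
import numpy as np

rng = np.random.default_rng(0)
T_ij = np.arange(100)            # stand-in for a training partition of 100 rows

def bootstrap_replicates(train, k=100, rng=rng):
    """Return k bootstrap replicates, each drawn with replacement."""
    n = len(train)
    return [train[rng.integers(0, n, size=n)] for _ in range(k)]

reps = bootstrap_replicates(T_ij, k=100)
# On average, roughly 63.2% of the original rows appear in any one replicate;
# the omitted rows are what give each member a different learning perspective.
unique_frac = np.mean([len(np.unique(r)) / len(T_ij) for r in reps])
print(len(reps), round(unique_frac, 2))
```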
The intermediate levels of diversity include ensembles with 2, 3, 4, 6, 8, and 12 models and are formed by randomly sampling from the population of models without replacement (step 5). Once the models are specified, step 6 defines a process to identify the test results by randomly sampling without replacement from the set of 100 bootstrap replicates produced in step 4.c.ii.2. Ensemble generalization errors can then be estimated by a majority vote of the 24 ensemble members (step 7). This process of ensemble formation is repeated 500 times to estimate a mean generalization error for all of the ensemble formation strategies investigated. Our decision to test the single best on 100 repetitions and the ensemble on 500 repetitions is designed to obtain approximately equivalent precision. The ensemble formation process introduces some additional sources of variability. First, the ensemble membership varies substantially from run to run based on the random sampling of models. Secondly, the ensembles are formed from bootstrap training sets, a method of intentionally perturbing the data set to introduce more diversity. We therefore require additional iterations to obtain an acceptable precision.

Repeat for i = 1 to 100
  Step 1. Create holdout test set, H_i, by randomly sampling without replacement from the available data, D. H_i is sized to be approximately 10% of D
  Step 2. Remove H_i from D, forming D_i = D − H_i
  Step 3. Randomly shuffle and partition D_i into j = 1...10 mutually exclusive partitions, D_ij
  Step 4. Repeat for j = 1 to 10
    a. Form validation set V_ij = D_ij
    b. Form training set T_ij = D_i − V_ij by consolidating the remaining partitions
    c. Repeat for k = 1 to 100
      i. Form bootstrap training set TB_ijk by sampling with replacement from T_ij
      ii.
      Repeat for B = 1 to 24
        1. Train each classifier model C_B(x) with bootstrap training set TB_ijk
        2. Test each classifier C_B(x) on test set H_i
      end loop B
    end loop k
  end loop j
  Repeat for z = 1 to 500  /* these steps form 500 different ensembles for each level of ensemble diversity */
    Step 5. For intermediate levels of diversity {2, 3, 4, 6, 8, and 12 models}, randomly identify models without replacement from the population of models and form ensembles with 24 members
    Step 6. For each model selected, retrieve test results for the respective model for run i, for j = 1 to 10, and for random k = {1...100}
    Step 7. Estimate the generalization error, Ê_Bgi, for the ensemble by majority vote
  end loop z
end loop i

4. Generalization error results of ensemble strategies

4.1. "Single best model" strategy generalization error results

We first estimate the generalization error of a strategy that selects the "single best model" from the 24 models investigated. The most accurate model (lowest generalization error) is identified from a 10-fold cross validation study. These "single best" generalization errors, which are mean values of the 100 runs conducted, are reported at the bottom of Table 2 in the line labeled "average error." The estimate of the generalization error for the "single best" strategy is 0.226 for the prognostic data and 0.029 for the cytology data. The model "winning frequency" is also reported in Table 2. The winning frequency is the proportion of times each model is determined to be the most accurate for the validation data during the 100 cross validation trials. It is recognized that the "single best model" selection procedure is very dependent on the data partitions used to train and test the models (Cunningham et al., 2000). The results of Table 2 demonstrate this deficiency.
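Returning to Step 7 of the ensemble algorithm above, the majority vote simply stacks the 24 members' class predictions on the holdout set and takes the modal class per case. A self-contained sketch with simulated member predictions (the 10% label-noise members are an illustrative assumption, not the paper's classifiers):

```python
# Sketch of Step 7: estimate ensemble generalization error by majority
# vote over 24 member predictions on a holdout set (binary labels).
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=200)          # stand-in holdout labels H_i

# 24 simulated members: each is the truth corrupted with 10% label noise,
# approximating diverse members whose errors are roughly independent.
member_preds = np.array([
    np.where(rng.random(200) < 0.10, 1 - y_true, y_true) for _ in range(24)
])

ensemble_pred = (member_preds.mean(axis=0) > 0.5).astype(int)  # majority vote
E_hat = np.mean(ensemble_pred != y_true)       # ensemble generalization error
print(round(E_hat, 3))
```

With independent 10%-error members, the majority vote drives the ensemble error far below any single member's error, which is the effect the bootstrap replicates are designed to exploit.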
As the composition of the cross validation training and test sets changes during the 100 trials, different models are judged to be the most accurate. Of the 24 models investigated, 16 different models are determined to be most accurate on at least one trial for the prognostic data, while the cytology data has 11 models judged most accurate. It is also evident from Table 2 that the neural network models, particularly the radial basis function networks, tend to win a disproportionate share of the time for both data sets.

Table 2
Single best model, winning frequency, and average generalization error

                            Proportion of wins
     Model                  Cytology    Prognostic
1    MLPa                   0.02        0.05
2    MLPb                   0.02        0.03
3    MLPc                   0           0.08
4    MLPd                   0           0.07
5    MOEa                   0.03        0.02
6    MOEb                   0.01        0.04
7    MOEc                   0.02        0.04
8    MOEd                   0.05        0.08
9    RBFa                   0.08        0.08
10   RBFb                   0.11        0.08
11   RBFc                   0.22        0.12
12   RBFd                   0.42        0.22
13   CARTa                  0           0.04
14   CARTb                  0           0.03
15   LDA                    0           0
16   LR                     0           0
17   KNNa                   0           0
18   KNNb                   0.02        0.01
19   KNNc                   0           0.01
20   KNNd                   0           0
21   KDa                    0           0
22   KDb                    0           0
23   KDc                    0           0
24   KDd                    0           0
     Average error          0.029       0.226
     Standard deviation     (0.001)     (0.007)

4.2. Baseline-bagging ensemble generalization error results

The generalization error of baseline-bagging ensembles formed with 24 members of a single model is investigated next. Each of these ensembles consists of 24 members that are variations of a single classification model, e.g. MLPa or CARTb. Diversity in the baseline-bagging ensembles is introduced by differences in the bootstrap learning sets, and in some instances from different parameter initialization (i.e., neural network models). The average generalization errors, as well as the maximum, minimum, and standard deviation for each of the baseline-bagging ensembles, are summarized in Table 3 for the prognostic data and Table 4 for the cytology data. Each table is sorted in ascending order by the generalization error. It is evident that the most accurate baseline-bagging ensembles correspond to models with high winning frequency in the cross validation studies. For the prognostic data, there are a total of eight (one-third of the models) baseline-bagging ensembles that achieve a mean generalization error lower than the 0.226 of the "single best" strategy. For the cytology data, only the RBFc and RBFd baseline-bagging ensembles achieve comparable or lower errors than the 0.029 of the "single best" strategy. There is considerable variation in the generalization error among the collection of baseline-bagging ensembles. For the prognostic data, the RBFc model has the lowest generalization error at 0.209, compared to the highest error of 0.385 for the KNNa model.
For the cytology data, the RBFc model is again most accurate with a generalization error of 0.028, compared to the highest error of 0.071 for the KDd model.

Table 3
Results of baseline-bagging ensembles, prognostic data

Model    Average    Min      Max      StDev
RBFc     0.209      0.195    0.226    0.006
RBFd     0.212      0.195    0.237    0.007
RBFb     0.211      0.195    0.232    0.006
MOEc     0.223      0.200    0.247    0.009
RBFa     0.217      0.200    0.232    0.007
MOEd     0.222      0.200    0.242    0.009
MOEa     0.223      0.200    0.242    0.008
MOEb     0.225      0.205    0.247    0.008
MLPc     0.233      0.211    0.253    0.008
MLPb     0.230      0.200    0.253    0.009
MLPd     0.239      0.211    0.263    0.009
MLPa     0.221      0.195    0.247    0.010
CARTa    0.250      0.216    0.284    0.015
CARTb    0.250      0.205    0.295    0.015
LR       0.255      0.221    0.284    0.012
LDA      0.282      0.253    0.316    0.012
KDa      0.315      0.279    0.358    0.014
KDb      0.317      0.279    0.363    0.015
KDc      0.347      0.300    0.400    0.015
KNNb     0.357      0.305    0.416    0.016
KNNd     0.369      0.326    0.426    0.016
KNNc     0.379      0.332    0.421    0.017
KDd      0.380      0.337    0.421    0.014
KNNa     0.385      0.337    0.432    0.015

This suggests that model selection is still an important consideration in the design of baseline-bagging ensembles. In this application, it is not feasible to produce a baseline-bagging ensemble with low generalization error using a randomly selected model. While many of the baseline-bagging ensembles for the prognostic data achieve meaningful error reductions relative to the "single best" strategy, the same effect is not as pronounced for the cytology data. The reason for the ineffectiveness of ensemble methods for the cytology application may be that the decision concepts to be learned result in relatively similar decisions among the ensemble members. Sharkey (1996) argues that the reduction of error by bagging ensembles is limited in situations where the ensemble members exhibit a high degree of decision consensus. The levels of concurrence among ensemble members can be inferred by inspecting the correlation of the model outputs expressed as posterior probabilities of class membership. The correlation of model outputs for pairs of potential ensemble members is given in Table 5 for the prognostic data and Table 6 for the cytology data. To save space, only models in the upper 67th percentile of accuracy are included in these tables. The average correlation of all models is 0.476 for the prognostic data and 0.616 for the cytology data. The substantially higher level of model concurrence among the cytology ensemble members indicates that a diverse set of independent ensemble experts has not been achieved, and that the potential for error reduction from the use of bagging ensembles will be more modest. We also note that the intra-architecture correlations are very high for both data sets.
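The concurrence measure used here, the pairwise Pearson correlation of model outputs (posterior probabilities of class membership), can be sketched as follows. The three simulated "models" below are illustrative assumptions: two similar architectures sharing most of their signal, and one more independent model.

```python
# Sketch of the model-concurrence measure: pairwise correlation of the
# models' posterior-probability outputs, plus the average off-diagonal
# correlation reported under the correlation tables.
import numpy as np

rng = np.random.default_rng(2)
n = 500
signal = rng.random(n)                     # shared "true" posterior

# Simulated outputs: two similar architectures, one more independent model
outputs = {
    "RBFa": np.clip(signal + 0.05 * rng.standard_normal(n), 0, 1),
    "RBFb": np.clip(signal + 0.05 * rng.standard_normal(n), 0, 1),
    "LR":   np.clip(signal + 0.50 * rng.standard_normal(n), 0, 1),
}

names = list(outputs)
corr = np.corrcoef(np.vstack([outputs[m] for m in names]))
# Average off-diagonal correlation over all model pairs
avg_corr = (corr.sum() - len(names)) / (len(names) ** 2 - len(names))
print(round(avg_corr, 2))
```

The intra-architecture pair (RBFa, RBFb) correlates far more strongly than the cross-architecture pair (RBFa, LR), which is exactly the pattern the tables show and the motivation for mixing architectures.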
Table 4
Results of baseline-bagging ensembles, cytology data

Model    Average    Min      Max      StDev
RBFc     0.028      0.026    0.031    0.001
RBFd     0.029      0.026    0.031    0.001
RBFa     0.030      0.028    0.032    0.001
RBFb     0.030      0.026    0.032    0.001
MOEd     0.033      0.029    0.037    0.002
MOEa     0.034      0.029    0.038    0.001
MLPb     0.034      0.031    0.038    0.001
MOEc     0.034      0.031    0.038    0.001
MOEb     0.035      0.032    0.038    0.001
MLPa     0.035      0.031    0.038    0.002
MLPd     0.035      0.031    0.040    0.002
MLPc     0.036      0.031    0.041    0.002
KNNd     0.036      0.035    0.040    0.001
KNNb     0.036      0.034    0.040    0.001
KNNa     0.037      0.032    0.040    0.002
KNNc     0.037      0.034    0.040    0.001
LDA      0.040      0.040    0.043    0.001
KDc      0.043      0.037    0.049    0.002
KDb      0.046      0.040    0.051    0.002
CARTa    0.047      0.040    0.054    0.003
CARTb    0.047      0.040    0.054    0.003
KDa      0.048      0.038    0.054    0.003
LR       0.062      0.051    0.071    0.003
KDd      0.071      0.068    0.076    0.002

Table 5
Correlation of models for prognostic ensemble members

        RBFc RBFd RBFb RBFa MOEd MOEa MOEb MOEc MLPa MLPc MLPb MLPd LR   LDA  CARTa CARTb
RBFc    1
RBFd    0.94 1
RBFb    0.95 0.93 1
RBFa    0.91 0.90 0.93 1
MOEd    0.65 0.64 0.64 0.63 1
MOEa    0.66 0.65 0.66 0.64 0.84 1
MOEb    0.66 0.65 0.66 0.64 0.83 0.82 1
MOEc    0.67 0.66 0.67 0.65 0.82 0.81 0.81 1
MLPa    0.66 0.66 0.66 0.67 0.73 0.73 0.73 0.74 1
MLPc    0.67 0.66 0.67 0.66 0.73 0.74 0.75 0.75 0.72 1
MLPb    0.67 0.66 0.67 0.66 0.74 0.74 0.74 0.74 0.72 0.75 1
MLPd    0.67 0.66 0.66 0.66 0.74 0.74 0.74 0.74 0.72 0.76 0.75 1
LR      0.34 0.34 0.33 0.33 0.38 0.38 0.36 0.37 0.34 0.34 0.34 0.34 1
LDA     0.26 0.26 0.25 0.24 0.40 0.40 0.37 0.37 0.32 0.33 0.33 0.33 0.53 1
CARTa   0.37 0.37 0.36 0.36 0.35 0.35 0.36 0.35 0.34 0.35 0.34 0.34 0.22 0.21 1
CARTb   0.36 0.37 0.36 0.36 0.35 0.35 0.36 0.35 0.34 0.35 0.34 0.34 0.22 0.21 0.50  1

Average correlation = 0.476.

Table 6
Correlation of models for cytology ensemble members

        RBFc RBFd RBFb RBFa MOEd MOEa MOEb MOEc MLPa MLPc MLPb MLPd LR   LDA  CARTa CARTb
RBFc    1
RBFd    0.96 1
RBFb    0.96 0.96 1
RBFa    0.96 0.96 0.96 1
MOEd    0.76 0.76 0.75 0.74 1
MOEa    0.73 0.73 0.72 0.72 0.87 1
MOEb    0.71 0.71 0.70 0.70 0.85 0.84 1
MOEc    0.70 0.70 0.69 0.69 0.84 0.82 0.81 1
MLPa    0.68 0.68 0.67 0.67 0.78 0.77 0.77 0.76 1
MLPc    0.65 0.65 0.64 0.64 0.75 0.75 0.75 0.74 0.72 1
MLPb    0.67 0.67 0.66 0.66 0.77 0.76 0.75 0.74 0.72 0.69 1
MLPd    0.64 0.64 0.63 0.63 0.73 0.73 0.72 0.72 0.70 0.68 0.68 1
LR      0.36 0.36 0.36 0.36 0.39 0.39 0.38 0.38 0.37 0.37 0.37 0.36 1
LDA     0.59 0.59 0.59 0.60 0.69 0.69 0.66 0.64 0.64 0.61 0.62 0.59 0.38 1
CARTa   0.44 0.44 0.43 0.43 0.46 0.46 0.46 0.45 0.46 0.44 0.44 0.43 0.34 0.43 1
CARTb   0.44 0.44 0.43 0.43 0.46 0.46 0.46 0.45 0.46 0.44 0.44 0.43 0.34 0.43 0.86  1

Average correlation = 0.616.

For example, the correlations for the four radial basis function models for the prognostic data range from 0.90 to 0.95, correlations for the mixture-of-experts models range from 0.81 to 0.84, and correlations for the multilayer perceptron models range from 0.72 to 0.76. These correlations suggest that a more effective ensemble formation strategy may be to select different architectures for ensemble membership, and in
particular those architectures with low levels of correlation. For the prognostic data, logistic regression, linear discriminant analysis, and CART appear to be good potential candidates, as these models have correlations substantially lower than the average of all models. For the cytology data, logistic regression and CART have the lowest correlations and are potentially good candidates for ensemble membership. We will use these insights later in a multicriteria selection of models for ensemble membership.

4.3. Ensemble generalization error results at controlled levels of model diversity

We next investigate the strategy of forming ensembles with higher levels of model diversity. We expect that this additional source of diversity will result in more accurate ensembles. This new source of diversity is induced in the ensemble by controlling ensemble membership to include different models and different architectures. The most diverse ensembles are formed by including all 24 of the models investigated. We also create intermediate levels of diversity by forming ensembles with 12, 8, 6, 4, 3, and 2 different models. In all cases, the models to include in ensemble membership are chosen randomly without replacement from the set of 24 models, and are replicated to maintain a total of 24 ensemble members. For example, to form ensembles with 12 different models requires forming a random subset of 12 of the 24 models and replicating each of the 12 models two times in the ensemble. The test results for each model are randomly chosen without replacement from the 100 available bootstrap test results for each data partition. The generalization error for each level of model diversity is estimated from 500 repetitions of ensemble construction. The errors are plotted as rectangular markers in Figs. 2 and 3 for the prognostic data and the cytology data respectively.
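The controlled-diversity membership rule above can be sketched directly: pick d distinct models at random without replacement, then replicate each 24/d times so every ensemble has exactly 24 members. The model names and helper below are illustrative assumptions:

```python
# Sketch of forming a 24-member ensemble with a controlled number d of
# distinct models: sample d models without replacement, replicate each
# 24/d times (matching the configurations in Table 1).
import random

MODELS = [f"M{i}" for i in range(1, 25)]     # stand-in for the 24 models

def ensemble_members(d, pool=MODELS, size=24, seed=0):
    """Return a 24-member ensemble built from d distinct models."""
    assert size % d == 0, "d must divide the ensemble size"
    chosen = random.Random(seed).sample(pool, d)
    return chosen * (size // d)              # each model replicated size/d times

members = ensemble_members(6)
print(len(members), len(set(members)))
```

Note that the valid diversity levels {1, 2, 3, 4, 6, 8, 12, 24} are exactly the divisors of 24, which is why Table 1 contains those rows and no others.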
These figures include the average from the "single best" results of the cross validation study (plotted as a solid horizontal line), the results of the baseline-bagging ensembles, and the results of a multicriteria selection algorithm discussed in the next subsection. Figs. 2 and 3 also show the effect of restricting the population of available models for the ensembles to the 12 most accurate models (represented by triangular markers), and to the 6 most accurate models (represented by circular markers). All generalization error estimates are mean values from the 500 repetitions of ensemble formation.

[Fig. 2. Mean generalization error for bootstrap ensembles (prognostic data). Generalization error is plotted against the number of different models in the bagging ensemble for the baseline ensembles, ensembles drawn from all 24 models, from the top 12 models, and from the top 6 models, together with the single best CV model and the multicriteria selection ensemble.]

The results depicted in Figs. 2 and 3 demonstrate that higher levels of model diversity result in lower ensemble generalization error, although the improvement is fairly modest for ensembles with three or more different models. It is also clear that the policy of restricting the population of potential
ensemble members to a smaller subset of the more accurate models produces ensembles with lower mean generalization errors for both data sets. For example, ensembles of six different models sampled from a population of 24 possible models for the prognostic data have an average generalization error of 0.225. The corresponding error for similar ensembles with members sampled from the top 12 models is 0.212, and for ensembles sampled from the top six models is 0.203. Similarly, for the cytology data the expected generalization error is 0.033 for ensembles formed from 24 potential models, 0.031 for ensembles formed from 12 potential members, and 0.027 for ensembles formed from six potential members. Two other conclusions are evident from Figs. 2 and 3. In both applications, the more diverse ensembles are more accurate than the expected error of the "single best" strategy. For the prognostic data, the strategy of diverse bagging ensembles is universally superior to the "single best" strategy for ensembles with six or more models. For the cytology data, the bagging ensembles are superior to the "single best" strategy only for the most restricted model population. For both data sets, the diverse ensembles sampled from the population of the top six models have a lower error than any of the corresponding baseline-bagging ensembles.

4.4. Multicriteria model selection results

The prior observations suggest a strategy of restricting the population of available ensemble models to a small number of models with the lowest generalization error. In this section we expand the selection criteria beyond accuracy to include low levels of model correlation as well as high relative instability (Cunningham et al., 2000; Sharkey, 1996). We measure model instability by the range (maximum minus minimum) of generalization errors for the 500 baseline ensembles constructed. High potential candidates for ensemble membership are identified in Figs. 4 and 5.
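The instability measure used in this selection, the range of the 500 baseline-ensemble generalization errors, can be sketched over simulated error samples. The two hypothetical models below (an accurate but variable one and an inaccurate but stable one) are illustrative assumptions chosen to mirror the accuracy-instability tradeoff discussed next:

```python
# Sketch of the instability metric: range (max - min) of the 500
# baseline-ensemble generalization errors for each model.
import numpy as np

rng = np.random.default_rng(3)

# Simulated generalization errors from 500 baseline-ensemble constructions
errors = {
    "RBFd": rng.normal(0.212, 0.007, size=500),   # accurate, more variable
    "KNNa": rng.normal(0.385, 0.003, size=500),   # inaccurate, stable
}

for model, e in errors.items():
    instability = e.max() - e.min()               # the range, as in Figs. 4 and 5
    print(model, round(e.mean(), 3), round(instability, 3))
```

A candidate on the "efficient frontier" combines a low mean error with a large range, so in this sketch the RBFd-like model would be preferred for ensemble membership.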
These figures plot the baseline-bagging ensemble generalization error as a function of model instability. Notice that a significant positive slope exists for both data sets, with regression R² values of 0.804 for the prognostic data (Fig. 4) and 0.747 for the cytology data (Fig. 5). This positive trend inhibits the effectiveness of more diverse ensembles because higher model instability is achieved by sacrificing model accuracy. An "efficient frontier" for identifying high potential ensemble members exists in the lower right areas of the instability regression plots, where models with low error and high instability are identified. For the prognostic data, the RBFd model is favored over RBFc because of increased model instability. For similar reasons, the CARTb model is preferred over CARTa. The multicriteria methodology of ensemble construction for the prognostic application is to combine RBFd, the most accurate model, with two other members from the set MOEc (0.52), MLPa (0.49), LR (0.34), and CARTb (0.38). The average model correlation is given in parentheses and shown in Fig. 4. The preferred models are CARTb and LR, as they both have higher levels of instability and relatively low correlation.

[Fig. 3. Mean generalization error for bootstrap ensembles (cytology data). Generalization error is plotted against the number of different models in the bagging ensemble for the baseline ensembles, ensembles drawn from all 24 models, from the top 12 models, and from the top 6 models, together with the single best CV models and the multicriteria selection ensemble.]

The generalization error of the resulting ensemble (RBFd, LR, CARTb) is
0.194 and is plotted in Fig. 2 as a horizontal dashed line. Applying a similar multicriteria ensemble construction strategy to the cytology application produces an ensemble of RBFc, RBFb, and MOEa with a resulting generalization error of 0.027. This value is plotted on Fig. 3 as a dashed horizontal line.

[Fig. 4. Model instability versus generalization error (prognostic data). The fitted regression line is y = 2.3278x + 0.1234 with R² = 0.8043; the multicriteria selection ensemble candidates, with average model correlations given in parentheses, lie in the lower right of the plot.]

[Fig. 5. Model instability versus generalization error (cytology data). The fitted regression line is y = 1.5609x + 0.0245 with R² = 0.7467; the multicriteria selection ensemble candidates lie in the lower right of the plot.]

Table 7 presents a summary of the minimum mean generalization errors achieved for each of the ensemble strategies. For both of these data sets, ensembles are an effective means to reduce diagnostic error. The strategy of forming a baseline-bagging ensemble from a single classification model lowers the generalization error from the error of a "single best" strategy for both data sets. More diverse ensembles result in larger error reductions than those achieved by the baseline-bagging ensembles, but also require careful consideration of relative model error as well as model instability and independence of model decisions. The multicriteria process, a selective strategy of forming ensembles from a subset of three models, yields the lowest generalization error of all strategies. The multicriteria result for the cytology data (0.027) can be compared to two other comparable bagging ensemble studies that have been published. Breiman (1996) reports a generalization error of 0.037 for baseline-bagging ensembles consisting of CART models, while Parmanto et al. (1996) report a generalization error of 0.039 using ensembles of neural networks. The multicriteria selection ensemble reduces the generalization error (relative to the "single best" strategy) for the prognostic data from 0.226 to
0.194. The default error rate for the naïve baseline classifier for this data set is 0.226.

Table 7
Comparison of generalization errors

Model selection strategy               Prognostic data    Cytology data
Single best CV model                   0.226 (0.007)      0.029 (0.0013)
Baseline-bagging ensemble              0.209 (0.006)      0.028 (0.0013)
Diverse ensembles
  Ensemble, 24 potential members       0.215 (0.014)      0.033 (0.0024)
  Ensemble, 12 potential members       0.209 (0.013)      0.031 (0.0033)
  Ensemble, 6 potential members        0.203 (0.010)      0.027 (0.0011)
Multicriteria ensemble, 3 members      0.194 (0.011)      0.027 (0.0012)

Parentheses identify standard errors.

[Fig. 6. 95% confidence intervals, prognostic data (generalization error for baseline, single best, and multicriteria ensembles).]

[Fig. 7. 95% confidence intervals, cytology data (generalization error for baseline, single best, and greedy ensembles).]

The error reduction achieved by the multicriteria selection ensemble is a statistically significant reduction of
error at a p value less than 0.05 and represents a 14.1% error reduction. The multicriteria selection ensemble did not result in a statistically significant error reduction compared to the "single best model" strategy with the cytology data (0.027 versus 0.029); it did show an error reduction of 6.9%. This is a decisive error reduction from the naïve baseline value of 0.345 for this data set. It is not possible to provide t tests for all the ensemble alternatives investigated in this research, for there are over 325 different pairs that could be considered. A perspective that includes the variability in our estimates of mean generalization error is provided by the 95% confidence intervals shown in Fig. 6 (prognostic data) and Fig. 7 (cytology data). The reader is cautioned not to draw statistical significance conclusions from these figures because of the Bonferroni effect, the multitude of individual pairs that can be contrasted.

5. Concluding discussion

Breast cancer outcomes, like many decision support applications in the health care field, are critically dependent on early detection and accurate diagnosis. The accuracy of these diagnostic decisions can be increased by an effective medical decision support system. Recent research has shown that physicians' diagnostic performance is directly linked to the quality of information available from a decision support system (Berner et al., 1999). The model selection strategy is therefore an important determinant of the performance and acceptance of an MDSS application. The primary strategy to select a model for an MDSS application has been the identification of a single most accurate model in a cross validation study. We show that this strategy is critically dependent on the data partitions used in the cross validation study, and that this single best strategy does not result in an MDSS with the lowest achievable generalization error.
A strategy of forming ensembles (a collection of individual models) provides more accurate diagnostic guidance for the physician. There is a significant body of research evidence today suggesting that the generalization error of a single model can be reduced by using ensembles of that model, provided the model is unstable. This research typically compares the errors of a single model (frequently CART or neural networks) to the resulting ensembles formed from that same model. The reader should appreciate that our definition of the "single best" strategy is a more realistic estimate of the expected error from a cross validation study. We use a population of 24 different models and conduct 100 cross validation repetitions, during which a number of different models are judged to be winners. We believe this is the first MDSS research to show in a rigorous fashion that ensembles are more accurate than the "single best" strategy, where the "single best" model is selected from a diverse group of models.

While the theory of bagging ensembles may give the impression that model selection is no longer relevant (i.e., the practitioner should choose an unstable model and aggregate the ensemble members' decisions), our results demonstrate that the identification of high potential candidate models for ensemble membership remains critically important. For example, radial basis function neural networks dominate the most accurate ensembles for the cytology data. Our results also suggest that ensembles formed from a diversity of models are generally more accurate than the baseline-bagging ensembles. There is a discernible decrease in generalization error as the number of different models in an ensemble increases. Most of the improvement occurs with ensembles formed from 3–5 different models.
The most effective ensembles formed in this research result from a small and selective subset of the population of available models, with potential candidates identified by jointly considering the properties of model generalization error, model instability, and the independence of model decisions relative to other ensemble members. The ensemble formed from the multicriteria selection process for the cytology data is also more accurate than results reported for baseline-bagging ensembles of CART models (Breiman, 1996) and ensembles of neural network models (Parmanto et al., 1996). The multicriteria selection ensemble reduces the expected generalization error obtained from the single best strategy for the prognostic data from
0.226 to 0.194. This is a 14.1% error reduction, which is statistically significant at a p value less than 0.05. The clinical significance of this improvement is that there are potentially 27,000 fewer incorrect diagnoses per year based on the US breast cancer incidence rate alone. Although with the cytology data the multicriteria selection algorithm did not result in a statistically significant reduction compared to the "single best model" strategy, for reasons explained earlier, it did show an error reduction of 6.9%. While the focus of this paper is to minimize total misclassification errors, it is likely that a clinical implementation would be designed at a specificity-sensitivity tradeoff that minimizes the occurrence of false negatives.

The reader is cautioned that there are a number of combining theories described in the research literature for constructing optimal ensembles. The use of these combining algorithms might result in more accurate ensembles than the ensembles formed in this research using majority vote. The numerical instability and associated estimation problems typical of these combining theories, however, may mitigate against their use in healthcare applications. While we feel the medical data used in this research is representative of diagnostic decision support applications, the conclusions are based on two specific applications in the breast cancer diagnosis domain. More research is needed to verify that these results generalize to other medical domains and to areas beyond health care.
Appendix A. Model definitions

Prognostic data

Model              Parameter            Note
Neural network
  MLPa             32 × 2 × 2           MLP = multilayer perceptron
  MLPb             32 × 4 × 2           Format I × H1 × O, where I = number of input nodes,
  MLPc             32 × 6 × 2           H1 = number of nodes in hidden layer 1,
  MLPd             32 × 8 × 2           O = number of output nodes
  MOEa             32 × 2 × 2 (2 × 2)   MOE = mixture of experts
  MOEb             32 × 4 × 2 (3 × 2)   Format I × H × O (Gh × Go), where I = number of input nodes,
  MOEc             32 × 6 × 2 (4 × 2)   H = number of nodes in hidden layer, O = number of output nodes,
  MOEd             32 × 8 × 2 (4 × 2)   Gh = number of nodes in gating hidden layer, Go = number of gating output nodes
  RBFa             32 × 20 × 2          RBF = radial basis function
  RBFb             32 × 40 × 2          Format I × H × O, where I = number of input nodes,
  RBFc             32 × 60 × 2          H = number of nodes in hidden layer, O = number of output nodes
  RBFd             32 × 80 × 2
Parametric
  LDA                                   LDA = Fisher's linear discriminant analysis
  LR                                    LR = logistic regression
Appendix A (continued)

Non-parametric
  KNNa             k = 3                KNN = k nearest neighbor
  KNNb             k = 5                Format k = i, where i = number of nearest neighbors
  KNNc             k = 7
  KNNd             k = 9
  KDa              R = 1                KD = kernel density
  KDb              R = 3                Format R = j, where j = radius of kernel function
  KDc              R = 5
  KDd              R = 7
  CARTa            Gini                 Splitting rule
  CARTb            Twoing

Cytology data

Model              Parameter            Note
Neural network
  MLPa             9 × 2 × 2            MLP = multilayer perceptron
  MLPb             9 × 4 × 2            Format I × H1 × O, where I = number of input nodes,
  MLPc             9 × 6 × 2            H1 = number of nodes in hidden layer 1,
  MLPd             9 × 8 × 2            O = number of output nodes
  MOEa             9 × 2 × 2 (2 × 2)    MOE = mixture of experts
  MOEb             9 × 4 × 2 (3 × 2)    Format I × H × O (Gh × Go), where I = number of input nodes,
  MOEc             9 × 6 × 2 (4 × 2)    H = number of nodes in hidden layer, O = number of output nodes,
  MOEd             9 × 8 × 2 (4 × 2)    Gh = number of nodes in gating hidden layer, Go = number of gating output nodes
  RBFa             9 × 20 × 2           RBF = radial basis function
  RBFb             9 × 40 × 2           Format I × H × O, where I = number of input nodes,
  RBFc             9 × 60 × 2           H = number of nodes in hidden layer, O = number of output nodes
  RBFd             9 × 80 × 2
Parametric
  LDA                                   LDA = Fisher's linear discriminant analysis
  LR                                    LR = logistic regression
Non-parametric
  KNNa             k = 5                KNN = k nearest neighbor
  KNNb             k = 7                Format k = i, where i = number of nearest neighbors
  KNNc             k = 9
  KNNd             k = 11
(continued on next page)
Appendix A (continued)
  KDa              R = 0.1              KD = kernel density
  KDb              R = 0.5              Format R = j, where j = radius of kernel function
  KDc              R = 1.0
  KDd              R = 1.5
  CARTa            Gini                 Splitting rule
  CARTb            Twoing

References

Anders, U., Korn, O., 1999. Model selection in neural networks. Neural Networks 12, 309–323.
Baker, J.A., Kornguth, P.J., Lo, J.Y., Williford, M.E., Floyd, C.E., 1995. Breast cancer: Prediction with artificial neural network based on BI-RADS standardized lexicon. Radiology 196, 817–822.
Baker, J.A., Kornguth, P.J., Lo, J.Y., Floyd, C.E., 1996. Artificial neural network: Improving the quality of breast biopsy recommendations. Radiology 198, 131–135.
Baxt, W.G., 1990. Use of an artificial neural network for data analysis in clinical decision-making: The diagnosis of acute coronary occlusion. Neural Computation 2, 480–489.
Baxt, W.G., 1991. Use of an artificial neural network for the diagnosis of myocardial infarction. Annals of Internal Medicine 115, 843–848.
Baxt, W.G., 1994. A neural network trained to identify the presence of myocardial infarction bases some decisions on clinical associations that differ from accepted clinical teaching. Medical Decision Making 14, 217–222.
Bay, S.D., 1999. Nearest neighbor classification from multiple feature subsets. Intelligent Data Analysis 3, 191–209.
Berner, E.S., Maisiak, R.S., Cobbs, C.G., Taunton, O.D., 1999. Effects of a decision support system on physicians' diagnostic performance. Journal of the American Medical Informatics Association: JAMIA 6, 420–427.
Blake, C.L., Merz, C.J., 1998. UCI repository of machine learning databases. Department of Information and Computer Science, University of California, Irvine. Available from http://www1.ics.uci.edu/~mlearn/MLSummary.html.
Bottaci, L., Drew, P.J., Hartley, J.E., Hadfield, M.B., 1997. Artificial neural networks applied to outcome prediction for colorectal cancer patients in separate institutions. The Lancet 350, 469–472.
Bounds, D.G., Lloyd, P.J., Mathew, B.G., 1990. A comparison of neural network and other pattern recognition approaches to the diagnosis of low back disorders. Neural Networks 3, 583–591.
Breiman, L., 1995. Stacked regressions. Machine Learning 24, 49–64.
Breiman, L., 1996. Bagging predictors. Machine Learning 26, 123–140.
Buntinx, F., Truyen, J., Embrechts, P., Moreel, G., Peeters, R., 1992. Evaluating patients with chest pain using classification and regression trees. Family Practice 9 (2), 149–153.
Crichton, N.J., Hinde, J.P., Marchini, J., 1997. Models for diagnosing chest pain: Is CART helpful? Statistics in Medicine 16 (7), 717–727.
Cunningham, P., Carney, J., Jacob, S., 2000. Stability problems with artificial neural networks and the ensemble solution. Artificial Intelligence in Medicine 20, 217–225.
Detrano, R., Janosi, A., Steinbrun, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K.H., Lee, S., Froelicher, V., 1989. International application of a new probability algorithm for the diagnosis of coronary artery disease. The American Journal of Cardiology 64, 304–310.
Dougados, M., Van der Linden, S., Juhlin, R., Huitfeldt, B., Amor, B., Calin, A., Cats, A., Dijkmans, B., Olivieri, I., Pasero, G., Veys, E., Zeidler, H., 1991. The European spondylarthropathy study group preliminary criteria for the classification of spondylarthropathy. Arthritis and Rheumatism 34 (10), 1218–1230.
Fricker, J., 1997. Artificial neural networks improve diagnosis of acute myocardial infarction. The Lancet 350, 935.
Gilpin, E., Olshen, R., Henning, H., Ross Jr., J., 1983. Risk prediction after myocardial infarction. Cardiology 70, 73–84.
Greenlee, R., Hill-Harmon, M.B., Murray, T., Thun, M., 2001. Cancer statistics 2001. A Cancer Journal for Clinicians 51, 15–36.
Hubbard, B.L., Gibbons, R.J., Lapeyre, A.C., Zinsmeister, A.R., Clements, I.P., 1992. Identification of severe coronary artery disease using simple clinical parameters. Archives of Internal Medicine 152, 309–312.
Josefson, D., 1997. Computers beat doctors in interpreting ECGs. British Medical Journal 315, 764–765.
Kononenko, I., Bratko, I., 1991. Information-based evaluation criterion for classifier's performance. Machine Learning 6, 67–80.
Lisboa, P.J.G., 2002. A review of evidence of health benefit from artificial neural networks in medical intervention. Neural Networks 15, 11–39.
Maclin, P.S., Dempsey, J., 1994. How to improve a neural network for early detection of hepatic cancer. Cancer Letters 77, 95–101.
Makuch, R.W., Rosenberg, P.S., 1988. Identifying prognostic factors in binary outcome data: An application using liver function tests and age to predict liver metastases. Statistics in Medicine 7, 843–856.
Mangasarian, O.L., Street, W.N., Wolberg, W.H., 1995. Breast cancer diagnosis and prognosis via linear programming. Operations Research 43 (4), 570–577.
Mango, L.J., 1994. Computer-assisted cervical cancer screening using neural networks. Cancer Letters 77, 155–162.
Mango, L.J., 1996. Reducing false negatives in clinical practice: The role of neural network technology. American Journal of Obstetrics and Gynecology 175, 1114–1119.
Marble, R.P., Healy, J.C., 1999. A neural network approach to the diagnosis of morbidity outcomes in trauma care. Artificial Intelligence in Medicine 15, 299–307.
Miller, R.A., 1994. Medical diagnostic decision support systems - past, present, and future: A threaded bibliography and brief commentary. Journal of the American Medical Informatics Association 1, 8–27.
Nomura, H., Kashiwagi, S., Hayashi, J., Kajiyama, W., Ikematsu, H., Noguchi, A., Tani, S., Goto, M., 1988. Prevalence of gallstone disease in a general population of Okinawa, Japan. American Journal of Epidemiology 128, 598–605.
Palocsay, S.W., Stevens, S.P., Brookshire, R.G., Sacco, W.J., Copes, W.S., Buckman Jr., R.F., Smith, J.S., 1996. Using neural networks for trauma outcome evaluation. European Journal of Operational Research 93, 369–386.
Parkin, D.M., 1998. Epidemiology of cancer: Global patterns and trends. Toxicology Letters 102–103, 227–234.
Parkin, D.M., 2001. Global cancer statistics in the year 2000. Lancet Oncology 2, 533–534.
Parmanto, B., Munro, P.W., Doyle, H.R., 1996. Reducing variance of committee prediction with resampling techniques. Connection Science 8, 405–425.
Rosenberg, C., Erel, J., Atlan, H., 1993. A neural network that learns to interpret myocardial planar thallium scintigrams. Neural Computation 5, 492–502.
Schubert, T.T., Bologna, S.D., Nensey, Y., Schubert, A., Mascha, E.J., Ma, C.K., 1993. Ulcer risk factors: Interactions between Helicobacter pylori infection, nonsteroidal use and age. The American Journal of Medicine 94, 416–418.
Sharkey, A.J.C., 1996. On combining artificial neural nets. Connection Science 8, 299–313.
Sheng, O.R.L., 2000. Editorial: Decision support for healthcare in a new information age. Decision Support Systems 30, 101–103.
Sheppard, D., McPhee, D., Darke, C., Shrethra, B., Moore, R., Jurewits, A., Gray, A., 1999. Predicting cytomegalovirus disease after renal transplantation: An artificial neural network approach. International Journal of Medical Informatics 54, 55–71.
Temkin, N.R., Holubkov, R., Machamer, J.E., Winn, H.R., Dikmen, S.S., 1995. Classification and regression trees (CART) for prediction of function at 1 year following head trauma. Journal of Neurosurgery 82, 764–771.
Tierney, W.H., Murray, M.D., Gaskins, D.L., Zhou, X.-H., 1997. Using computer-based medical records to predict mortality risk for inner-city patients with reactive airways disease. Journal of the Medical Informatics Association 4, 313–321.
Tourassi, G.D., Floyd, C.E., Sostman, H.D., Coleman, R.E., 1993. Acute pulmonary embolism: Artificial neural network approach for diagnosis. Radiology 189, 555–558.
Tsujji, O., Freedman, M.T., Mun, S.K., 1999. Classification of microcalcifications in digital mammograms using trend-oriented radial basis function neural network. Pattern Recognition 32, 891–903.
Tu, J.V., Weinstein, M.C., McNeil, B.J., Naylor, C.D., The Steering Committee of the Cardiac Care Network of Ontario, 1998. Predicting mortality after coronary artery bypass surgery: What do artificial neural networks learn? Medical Decision Making 18, 229–235.
West, D., West, V., 2000. Model selection for a medical diagnostic decision support system: A breast cancer detection case. Artificial Intelligence in Medicine 20, 183–204.
Wilding, P., Morgan, M.A., Grygotis, A.E., Shoffner, M.A., Rosato, E.F., 1994. Application of backpropagation neural networks to diagnosis of breast and ovarian cancer. Cancer Letters 77, 145–153.
Wolberg, W.H., Street, W.N., Heisey, D.M., Mangasarian, O.L., 1995. Computerized breast cancer diagnosis and prognosis from fine needle aspirates. Archives of Surgery 130, 511–516.
Wolpert, D., 1992. Stacked generalization. Neural Networks 5, 241–259.
Wu, Y., Giger, M.L., Doi, K., Vyborny, C.J., Schmidt, R.A., Metz, C.E., 1993. Artificial neural networks in mammography: Application to decision making in the diagnosis of breast cancer. Radiology 187, 81–87.
Zhang, J., 1999a. Developing robust non-linear models through bootstrap aggregated neural networks. Neurocomputing 25, 93–113.
Zhang, J., 1999b. Inferential estimation of polymer quality using bootstrap aggregated neural networks. Neural Networks 12, 927–938.
Zhilkin, P.A., Somorjai, R.L., 1996. Application of several methods of classification fusion to magnetic resonance spectra. Connection Science 8, 427–442.