To appear in Proc. of the 2003 International Conference on Machine Learning and Applications (ICMLA'03),
Los Angeles, California, June 23-24, 2003.


                 Fast Decision Tree Learning Techniques
                    for Microarray Data Collections

                   Xiaoyong Li and Christoph F. Eick
                    Department of Computer Science
                  University of Houston, TX 77204-3010
                       e-mail: ceick@cs.uh.edu
Abstract

DNA microarrays allow monitoring of expression levels for thousands of genes simultaneously. The ability to successfully analyze the huge amounts of genomic data is of increasing importance for research in biology and medicine. The focus of this paper is the discussion of techniques and algorithms of a decision tree learning tool that has been devised taking into consideration the special features of microarray data sets: continuous-valued attributes and a small number of examples with a large number of genes. The paper introduces novel approaches to speed up leave-one-out cross validation through the reuse of results of previous computations, through attribute pruning, and through approximate computation techniques. Our approach employs special histogram-based data structures for continuous attributes, both for speed-up and for pruning. We present experimental results for three microarray data sets that suggest that these optimizations lead to speedups between 150% and 400%. We also present arguments that our attribute pruning techniques not only lead to better speed but also enhance the testing accuracy.

Key words and phrases: decision trees, concept learning for microarray data sets, leave-one-out cross validation, heuristics for split point selection, decision tree reuse.

1. Introduction

The advent of DNA microarray technology provides biologists with the ability to monitor expression levels for thousands of genes simultaneously. Applications of microarrays range from the study of gene expression in yeast under different environmental stress conditions to the comparison of gene expression profiles of tumors from cancer patients [1]. In addition to the enormous scientific potential of DNA microarrays to help in understanding gene regulation and interactions, microarrays have very important applications in pharmaceutical and clinical research. By comparing gene expression in normal and abnormal cells, microarrays may be used to identify which genes are involved in causing particular diseases. Currently, most approaches to the computational analysis of gene expression data focus on learning about genes and tumor classes in an unsupervised way. Many research projects employ cluster analysis for both tumor samples and genes, mostly using hierarchical clustering methods [2,3] and partitioning methods, such as self-organizing maps [4], to identify groups of similar genes and groups of similar samples.

This paper, however, centers on the application of supervised learning techniques to microarray data collections. In particular, we discuss the features of a decision tree learning tool for microarray data sets. We assume that each data set contains gene expression data of mRNA samples. Normally, the number of genes in these data sets is quite large (usually between 1,000 and 10,000). Each gene is characterized by numerical values that measure the degree to which the gene is turned on for the particular sample. The number of examples in the training set, on the other hand, is typically below one hundred. Associated with each sample is its type or class, which we are trying to predict. Moreover, in this paper we restrict our discussion to binary classification problems.

Section 2 introduces decision tree learning techniques for microarray data collections. Section 3 discusses how to speed up leave-one-out cross validation. Section 4 presents experimental results that evaluate our techniques for three microarray data sets, and Section 5 summarizes our findings.
2. Decision Tree Learning Techniques for Microarray Data Collections

2.1 Decision Tree Algorithms Reviewed

The traditional decision tree learning algorithm (for more discussion of decision trees see [5]) builds a decision tree breadth-first by repeatedly dividing the examples until each partition is pure by definition or meets other termination conditions (to be discussed later). If a node satisfies a termination condition, the node is marked with a class label that is the majority class of the samples associated with this node. In the case of microarray data sets, the splitting criterion for assigning examples to nodes is of the form "A < v" (where A is an attribute and v is a real number).

In the algorithm description in Fig. 1 below, we assume that:
1. D is the whole microarray training data set;
2. T is the decision tree to be built;
3. N is a node of the decision tree that holds the indexes of its samples;
4. R is the root node of the decision tree;
5. Q is a queue that contains nodes of the same type as N;
6. Si is a split point, a structure containing a gene index i, a real number v, and an information gain value. A split point provides a split criterion that partitions the tree node N into two nodes N1 and N2 based on whether or not gene i's value for each example in the node is greater than v;
7. Gi denotes the i-th gene.

The result of applying the decision tree learning algorithm is a tree whose intermediate nodes associate split points with attributes, and whose leaf nodes represent decisions (classes in our case). Test conditions for a node are selected by maximizing the information gain, relying on the following framework. We assume that our classification problem has two classes, sometimes called '+' and '-' in the following. A test S subdivides the examples D = (p1, p2) into two subsets D1 = (p11, p12) and D2 = (p21, p22). The quality of a test S is measured using Gain(D,S):

Let H(D = (p1, ..., pm)) = Σ_{i=1..m} pi * log2(1/pi)   (the entropy function)

Gain(D,S) = H(D) − Σ_{i=1..2} (|Di| / |D|) * H(Di)

In the above, |D| denotes the number of elements in set D, and D = (p1, p2) with p1 + p2 = 1 indicates that, of the |D| examples, p1*|D| examples belong to the first class and p2*|D| examples belong to the second class.
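To make the split selection criterion concrete, the following short Python sketch (our illustration, not code from the paper; the function and variable names are ours) evaluates H and Gain for a binary split of a two-class example set.

from math import log2

def entropy(pos, neg):
    """H(D) for a two-class example set given its class counts
    (terms with a zero count contribute nothing)."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if 0 < c < total:
            p = c / total
            h += p * log2(1.0 / p)
    return h

def gain(parent, left, right):
    """Gain(D,S) = H(D) - sum over the two subsets of |Di|/|D| * H(Di);
    each argument is a (positive count, negative count) pair."""
    n = sum(parent)
    return entropy(*parent) - sum(
        (sum(child) / n) * entropy(*child) for child in (left, right))

# Example: 7 samples (4 '+', 3 '-') split into (3+, 1-) and (1+, 2-).
print(round(gain((4, 3), (3, 1), (1, 2)), 3))   # 0.128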
Procedure buildTree(D):
1.  Initialize root node R of tree T using data set D;
2.  Initialize queue Q to contain root node R;
3.  While Q is not empty do {
4.    De-queue the first node N in Q;
5.    If N does not satisfy the termination condition {
6.      For each gene Gi (i = 1, 2, ...) {
7.        Evaluate splits on gene Gi based on information gain;
8.        Record the best split point Si for Gi and its information gain }
9.      Determine the split point Smax with the highest information gain;
10.     Use Smax to divide node N into N1 and N2 and attach nodes N1 and N2 to node N in the decision tree T;
11.     En-queue N1 and N2 to Q;
12.   }
13. }

Figure 1: Decision Tree Learning Algorithm

2.2 Attribute Histograms

Our research introduced a number of new data structures for the purpose of speeding up the decision tree learning algorithm. One of these data structures is the attribute histogram, which captures the class distribution of a sorted continuous attribute. Let us assume we have 7 examples whose values for an attribute A are 1.01, 1.07, 1.44, 2.20, 3.86, 4.3, and 5.71, and whose class distribution is (-, +, +, +, -, -, +); that is, the first example belongs to class 2, the second example to class 1, and so on. If we group all adjacent samples with the same class, we obtain the histogram for this attribute, which is (1-, 3+, 2-, 1+), or (1,3,2,1) for short, as depicted in Fig. 2; if the class distribution for the sorted attribute A had been (+,+,-,-,-,-,+), A's histogram would be (2,4,1). Efficient algorithms to compute attribute histograms are discussed in [6].
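A minimal sketch of how such an attribute histogram can be derived from an attribute's values and class labels (our illustration; the efficient algorithms of [6] are not reproduced here):

def attribute_histogram(values, labels):
    """Return run-length blocks of class labels after sorting by attribute value,
    e.g. values (1.01, ..., 5.71) with labels (-,+,+,+,-,-,+)
    -> [(1, '-'), (3, '+'), (2, '-'), (1, '+')]."""
    ordered = [lab for _, lab in sorted(zip(values, labels))]
    blocks = []
    for lab in ordered:
        if blocks and blocks[-1][1] == lab:
            blocks[-1][0] += 1
        else:
            blocks.append([1, lab])
    return [tuple(b) for b in blocks]

values = [1.01, 1.07, 1.44, 2.20, 3.86, 4.3, 5.71]
labels = ['-', '+', '+', '+', '-', '-', '+']
print(attribute_histogram(values, labels))   # [(1, '-'), (3, '+'), (2, '-'), (1, '+')]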
2.3 Searching for the Best Split Point

As mentioned earlier, the traditional decision tree algorithm has a preference for tests that reduce entropy. To find the best test for a node, we have to search through all possible split points for each attribute. To compute the best split point for a numeric attribute, normally the (sorted) list of its values is scanned from the beginning, and the entropy is computed for each split point placed halfway between two adjacent attribute values. Because of our attribute histogram data structure, the entropy for each split point can be computed efficiently, as illustrated in Figure 2. Based on its histogram (1-, 3+, 2-, 1+), we only consider three possible splits: (1- | 3+, 2-, 1+), (1-, 3+ | 2-, 1+), and (1-, 3+, 2- | 1+), where the vertical bar represents the split point. Thus we reduce the number of candidate split points from 6 to 3 (Fayyad and Irani proved in [7] that splitting between adjacent samples that belong to the same class leads to sub-optimal information gain; in general, their paper advocates a multi-splitting algorithm for continuous attributes, whereas our approach relies on binary splits).

Figure 2: Example of an Attribute Histogram
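The reduction of candidate split points can be sketched as follows (our code, not the authors' implementation): only the boundaries between adjacent histogram blocks are enumerated, which yields exactly the three candidates of the example above.

def boundary_split_candidates(blocks):
    """Given an attribute histogram [(count, label), ...], return the candidate
    splits that fall on class boundaries only (cf. Fayyad and Irani [7]);
    each candidate is a (left_blocks, right_blocks) pair."""
    return [(blocks[:i], blocks[i:]) for i in range(1, len(blocks))]

hist = [(1, '-'), (3, '+'), (2, '-'), (1, '+')]        # the (1-, 3+, 2-, 1+) example
for left, right in boundary_split_candidates(hist):
    print(left, '|', right)
# Only 3 candidates are produced instead of the 6 mid-point splits of the raw values.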
A situation that we have not discussed until now involves histograms that contain identical attribute values belonging to different classes. To cope with this situation when considering a split point, we need to check the attribute values of the two neighboring examples on both sides of the split point. If they are the same, we have to discard this split point even if its information gain is high.

After we have determined the best split point for each attribute (each gene in our case), the attribute with the highest information gain is selected and used to split the current node.

3. Optimizations for Leave-one-out Cross-validation

In k-fold cross-validation, we divide the data into k disjoint subsets of (approximately) equal size, then train the classifier k times, each time leaving out one of the subsets from training and using only the omitted subset as the test set to compute the error rate. If k equals the sample size, this is called "leave-one-out" cross-validation. For large data sets, leave-one-out is very computationally demanding, since it has to construct many more decision trees than the usual forms of cross validation (k = 10 is a popular choice in the literature). But for data sets with few examples, such as microarray data sets, leave-one-out cross validation is quite popular and practical, since it gives the least biased model evaluation. Also, when doing leave-one-out cross validation, the computations for the different subsets tend to be very similar. Therefore, it seems attractive to speed up leave-one-out cross validation through the reuse of results of previous computations, which is the main topic of the next subsection.
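For concreteness, a generic leave-one-out loop is sketched below (a schematic illustration rather than the authors' implementation; train and classify stand for an arbitrary learner, e.g. the buildTree procedure of Figure 1).

def leave_one_out_error(examples, train, classify):
    """Generic leave-one-out cross validation: train on all but one example,
    test on the held-out example, and report the overall error rate."""
    errors = 0
    for i, (x, y) in enumerate(examples):
        training_set = examples[:i] + examples[i + 1:]
        model = train(training_set)          # e.g. the buildTree procedure of Figure 1
        if classify(model, x) != y:
            errors += 1
    return errors / len(examples)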
3.1 Reuse of Sub-trees from Previous Runs

It is important to note that the whole data set and the training sets used in leave-one-out differ in only one example. Therefore, in the likely event that the same root test is selected for two such data sets, we already know that at least one of the two sub-trees below the root node generated by the first run (for the whole data set) can be reused when constructing the other decision trees. Similar opportunities for reuse exist at other levels of the decision trees. Taking advantage of this property, we compare the node to be split with the stored nodes from previous runs, and reuse sub-trees whenever a match occurs.

In order to get a speed-up through sub-tree reuse, it is critical that matching nodes from previous runs can be found quickly. To facilitate the comparison of two nodes, we use bit strings to represent the sample list of each node. For example, if we have 10 samples in total and 5 of them are associated with the current node, we use a bit string such as "0101001101" as the signature of this node, and use XOR string comparisons and signature hashing to quickly determine whether a reusable sub-tree exists.
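The signature idea can be sketched as follows (our illustration; the class name SubtreeCache and its methods are hypothetical): each node's sample set is encoded as a bit mask, hashed for fast lookup, and compared exactly before a sub-tree is reused.

class SubtreeCache:
    """Cache for sub-trees built in previous leave-one-out runs, keyed by the
    bit-string signature of a node's sample set (hypothetical helper, not the
    authors' data structure)."""

    def __init__(self):
        # signature -> sub-tree; dict hashing plays the role of signature hashing
        self._table = {}

    @staticmethod
    def signature(sample_ids):
        """Integer bit mask with bit i set iff sample i belongs to the node
        (e.g. 5 of 10 samples -> a pattern such as 0101001101)."""
        mask = 0
        for i in sample_ids:
            mask |= 1 << i
        return mask

    def store(self, sample_ids, subtree):
        self._table[self.signature(sample_ids)] = subtree

    def lookup(self, sample_ids):
        """Return a stored sub-tree whose sample set matches exactly
        (two masks match iff their XOR is 0), or None."""
        return self._table.get(self.signature(sample_ids))

cache = SubtreeCache()
cache.store([0, 2, 3, 6, 7], "subtree-from-first-run")
print(cache.lookup([0, 2, 3, 6, 7]))   # reusable sub-tree found
print(cache.lookup([0, 2, 3, 6, 8]))   # None: no match, the sub-tree must be built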
3.2 Using Histograms for Attribute Pruning
Assume that two histograms A = (2+, 2-) and B = (1+, 1-, 1+, 1-) are given, and that our job is to find the best split point among all possible splits of both histograms. Obviously, B can never give a better split than A, because (2+ | 2-) has entropy 0. This implies that performing information gain computations for attribute B is a waste of time. This prompts us to look for a way to distinguish between "good" and "bad" histograms, and to exclude attributes with bad histograms from consideration for the sake of speed.

Mathematically, it might be quite complicated to come up with a formula that predicts the best attribute to be used for a particular node of the decision tree. However, we are considering an approximate method that may not always be correct, but hopefully is correct most of the time. The idea is to use an index, which we call the "hist index". The hist index of a histogram S is defined as:

Hist(S) = Σ_{j=1..m} Pj^2

where Pj is the relative frequency of block j in S. For example, if we have a histogram (1, 3, 4, 2), its hist index would be 1^2 + 3^2 + 4^2 + 2^2 = 30. A histogram with a high hist index is more likely to contain the best split point than a histogram with a low hist index. Intuitively, we know that the fewer blocks a histogram has, the better its chance of containing a good split point; mathematically, a^2 > a1^2 + a2^2 holds whenever a = a1 + a2 for positive a1 and a2.

Our decision tree learning algorithm uses the hist index to prune attributes as follows. Prior to determining the best split point of an attribute, its hist index is computed and compared with the average hist index of all the previous histograms in the same round; only if its hist index is larger than this average is the best split point for the attribute determined; otherwise, the attribute is excluded from consideration for test conditions of the particular node.
3.3 Approximating Entropy Computations

This subsection addresses the following question: do we really have to compute the log values, which require a lot of floating point computation, in order to find the smallest entropy values?

Let us assume we have a histogram (2-, 3+, 7-, 5+, 2-) and we need to determine the split point that minimizes entropy. Consider the difference between the following two splits, 1st: (2-, 3+ | 7-, 5+, 2-) and 2nd: (2-, 3+, 7- | 5+, 2-). Apparently, the 2nd is better than the 1st. Since we are dealing only with binary classification, we can assign a numeric value of +1 to one class and a value of -1 to the other class, and use the sum of the absolute differences in class memberships in the two resulting partitions to approximate the entropy computation; the larger this result is, the lower the entropy. In this case, for the first split the sum is |-2 + 3| + |-7 + 5 - 2| = 5, and for the second split it is |-2 + 3 - 7| + |5 - 2| = 9. We call this method the absolute difference heuristic. We performed some experiments [8] to determine how often the same split point is picked by the information gain heuristic and by the absolute difference heuristic. Our results indicate that in most cases (approximately between 91% and 100%, depending on data set characteristics) the same split point is picked by both methods.
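A sketch of the absolute difference heuristic (our code, following the worked example above): classes are mapped to +1 and -1, and each class-boundary split of the histogram is scored by the sum of the absolute signed counts of its two partitions.

def abs_diff_score(blocks, split):
    """Score a split of a histogram [(count, sign), ...] where sign is +1 or -1:
    |sum of signed counts left| + |sum of signed counts right|.
    A larger score approximates a lower entropy."""
    left = sum(c * s for c, s in blocks[:split])
    right = sum(c * s for c, s in blocks[split:])
    return abs(left) + abs(right)

def best_split_abs_diff(blocks):
    """Return (best split position, its score) over all class-boundary splits."""
    return max(((i, abs_diff_score(blocks, i)) for i in range(1, len(blocks))),
               key=lambda t: t[1])

hist = [(2, -1), (3, +1), (7, -1), (5, +1), (2, -1)]   # the (2-, 3+, 7-, 5+, 2-) example
print(abs_diff_score(hist, 2))    # (2-, 3+ | 7-, 5+, 2-) -> 5
print(abs_diff_score(hist, 3))    # (2-, 3+, 7- | 5+, 2-) -> 9
print(best_split_abs_diff(hist))  # (3, 9)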
4. Evaluation

In this section we present the results of experiments that evaluate our methods for three different microarray data sets.

4.1 Data Sets and Experimental Design

The first data set is a leukemia data collection that consists of 62 bone marrow and 10 peripheral blood samples from acute leukemia patients (obtained from Golub et al. [8]). The 72 samples fall into two types of acute leukemia: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). The samples come from both adults and children. The RNA samples were hybridized to Affymetrix high-density oligonucleotide microarrays that contain probes for p = 7,130 human genes.

The second data set, a colon tissue data set, contains the expression levels (red intensity / green intensity) of the 2,000 genes with the highest minimal intensity across 62 colon tissues. The gene expressions in 40 tumor and 22 normal colon tissue samples were analyzed with an Affymetrix oligonucleotide array containing over 6,500 human genes (Alon et al. [2]).

The third data set comes from a study of gene expression in breast cancer patients (van 't Veer et al. [3]). It contains data from 98 primary breast cancer patients: 34 from patients who developed distant metastases within 5 years, 44 from patients who remained disease-free for a period of at least 5 years, 18 from patients with BRCA1 germline mutations, and 2 from BRCA2 carriers. All patients were lymph node negative and under 55 years of age at diagnosis.
In the experiments, we did not use all genes, but rather selected a subset P containing p of the genes. Decision trees were then learnt that operate on the selected subset of genes. As proposed in [9], we remove genes from the data sets based on the ratio of their between-groups to within-groups sum of squares. For a particular gene j, the ratio is defined as:

BSS(j) / WSS(j) = [ Σ_i Σ_k I(yi = k) (x̄_kj − x̄_.j)^2 ] / [ Σ_i Σ_k I(yi = k) (x_ij − x̄_kj)^2 ]

where x̄_.j denotes the average expression level of gene j across all samples and x̄_kj denotes the average expression level of gene j across the samples belonging to class k.

To give an explicit example, assume we have four samples and two genes per sample: the first gene's expression level values for the four samples are (1, 2, 3, 4) and the second's are (1, 3, 2, 4); the sample class memberships are (+, -, +, -) (listed in the order of samples no. 1, no. 2, no. 3, and no. 4). For gene 1, we have BSS/WSS = 0.125, and for gene 2, BSS/WSS = 4. If we have to remove one gene, gene 1 will be removed according to our rule, since it has the lower BSS/WSS value. The removal of gene 1 is reasonable because we can tell the class membership of the samples by looking at their gene 2 expression level values: if a sample's gene 2 expression level is greater than 2.5, the sample belongs to the negative class; otherwise, the sample belongs to the positive class. If we evaluate gene 1 instead, we are not able to perform the classification in one single step as we have just done with gene 2.

After we calculate the BSS/WSS ratios for all genes in a data set, only the p genes with the largest ratios remain in the data sets used in the experiments. Experiments were conducted with different p values.
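The gene selection rule can be sketched as follows (our illustration, not the authors' code). For the two-gene example above, gene 2 obtains a much larger BSS/WSS ratio than gene 1 and is therefore the gene that is kept; the exact numeric value for gene 1 depends on indexing conventions, but the ranking is the same.

def bss_wss(values, labels):
    """Ratio of between-group to within-group sum of squares for one gene,
    given its per-sample expression values and the sample class labels."""
    overall = sum(values) / len(values)
    class_means = {}
    for k in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == k]
        class_means[k] = sum(members) / len(members)
    bss = sum((class_means[lab] - overall) ** 2 for lab in labels)
    wss = sum((v - class_means[lab]) ** 2 for v, lab in zip(values, labels))
    return bss / wss

def select_top_genes(expression, labels, p):
    """expression: dict gene name -> list of per-sample values; keep the p genes
    with the largest BSS/WSS ratios."""
    ranked = sorted(expression, key=lambda g: bss_wss(expression[g], labels), reverse=True)
    return ranked[:p]

labels = ['+', '-', '+', '-']
expression = {'gene1': [1, 2, 3, 4], 'gene2': [1, 3, 2, 4]}
print(bss_wss(expression['gene2'], labels))     # 4.0: gene 2 separates the classes
print(select_top_genes(expression, labels, 1))  # ['gene2']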
In the experiments, we compared the popular C5.0/See5.0 decision tree tool (which was run with its default parameter settings) with two versions of our tool. The first version, called the microarray decision tree tool, does not use any optimizations but employs pre-pruning: it stops growing the tree when at least 90% of the examples belong to the majority class. The second version of our tool, called the optimized decision tree tool, uses the same pre-pruning and employs all the techniques discussed in Section 3.

4.2 Experimental Results

The first experiment evaluated the accuracy of the three decision tree learning tools. Tables 1-3 below display each tool's error rate for the three data sets and for three different p values used for gene selection. The first column of each table gives the p value that was used. The other columns give the total number of misclassifications and the error rate (in parentheses). Error rates were computed using leave-one-out cross validation.

Table 1: Leukemia data set test results (72 samples)

  p      C5.0 Decision Tree   Microarray Decision Tree   Optimized Decision Tree
  1024   5 (6.9%)             5 (6.9%)                   4 (5.6%)
  900    4 (4.6%)             8 (11.1%)                  5 (6.9%)
  750    13 (18.1%)           11 (15.3%)                 3 (4.2%)

Table 2: Colon Tissue data set test results (62 samples)

  p      C5.0 Decision Tree   Microarray Decision Tree   Optimized Decision Tree
  1600   12 (19.4%)           15 (24.2%)                 16 (25.8%)
  1200   12 (19.4%)           15 (24.2%)                 16 (25.8%)
  800    12 (19.4%)           14 (22.6%)                 16 (25.8%)

Table 3: Breast Cancer data set test results (78 samples)

  p      C5.0 Decision Tree   Microarray Decision Tree   Optimized Decision Tree
  5000   38 (48.7%)           29 (33.3%)                 35 (44.9%)
  1600   39 (50.0%)           32 (41.0%)                 30 (38.5%)
  1200   39 (50.0%)           31 (39.7%)                 29 (33.3%)
If we study the error rates for the three methods listed in the three tables carefully, we notice that, on average, the error rates of the optimized decision tree tool are lower than those of the non-optimized one. This looks quite surprising, since the optimized decision tree tool uses a lot of approximate computations and pruning.

However, further analysis revealed that the use of attribute pruning (using the hist index introduced in Section 3.2) provides an explanation for the better average accuracy of the optimized decision tree tool. Why would attribute pruning lead to more accurate predictions in some cases? The reason is that the entropy function does not take the class distribution over sorted attributes into consideration. For example, if we have two attribute histograms (3+, 3-, 6+) and (3+, 1-, 2+, 1-, 2+, 1-, 2+), the best split point for the first histogram is (3+ | 3-, 6+), but the second histogram has a similar split point (3+ | 1-, 2+, 1-, 2+, 1-, 2+) that is equivalent to (3+ | 3-, 6+) with respect to the information gain heuristic. Therefore, both split points have the same chance of being selected. Intuitively, however, the second split point is much worse than the first one because of its large number of blocks, which requires more tests to separate the two classes properly.

The traditional information gain heuristic ignores such distributional aspects entirely, which causes a loss of accuracy in some circumstances. Hist-index-based pruning, as proposed in Section 3.2, improves on this situation by removing attributes that have a low hist index (such as the second attribute in the above example) beforehand. Intuitively, continuous attributes with long histograms representing "flip-flopping" class memberships are not very attractive candidates for test conditions, because more nodes/tests are necessary in a decision tree to predict classes correctly based on such an attribute. In summary, some of those "bad" attributes were removed by attribute pruning, which explains the higher average accuracy in the experiments.

In another experiment we compared the CPU time of leave-one-out cross validation for the three decision tree learning tools: C5.0 Decision Tree, normal (Microarray Decision Tree), and optimized (Optimized Decision Tree). All these experiments were performed on an 850 MHz Intel Pentium processor with 128 MB of main memory. The CPU time displayed (in seconds) in Table 4 includes both the tree building and the evaluation process (note: these experiments are identical to those previously listed in Tables 1 to 3). Our experimental results suggest that the decision tree tool designed for microarray data sets normally runs slightly faster than the C5.0 tool, while the speedup of the optimized microarray decision tree tool is quite significant and ranges from 150% to 400%.

Table 4: CPU time comparison of the three decision tree tools

  Data Set                 p-Value   CPU Time (Seconds)
                                     C5.0    Normal   Optimized
  Leukemia Data set        1024      6.7     3.5      1.2
                           900       5.6     3.1      1.1
                           750       6.0     4.1      1.1
  Colon Tissue Data set    1600      12.0    8.0      2.2
                           1200      9.0     6.0      1.7
                           800       5.9     3.8      1.1
  Breast Cancer Data set   5000      74.5    75.3     15.9
                           2000      30.4    30.2     6.4
                           1500      22.4    20.4     4.8

5. Summary and Conclusion

We introduced decision tree learning algorithms for microarray data sets, together with optimizations to speed up leave-one-out cross validation. Toward this goal, several strategies were employed: the introduction of the hist index to help prune attributes, approximate computations that measure entropy, and the reuse of sub-trees from previous runs. We claim that the first two ideas are new, whereas the third idea was also explored in Blockeel's paper [10], which centered on the reuse of split points. The performance of the microarray decision tree tool was compared with that of the commercially available decision tree tool C5.0/See5.0 using three microarray data sets. The experiments suggest that our tool runs between 150% and 400% faster than C5.0.

We also compared the trees that were generated in the experiments for the same data sets. We observed that the trees generated by the same tool are very similar. Trees generated by different tools also had a significant degree of similarity. Basically, all the trees that were generated for the three data sets are of small size, normally with less than 10 nodes. We also noticed that smaller trees seem to be correlated with lower error rates.
Also worth mentioning is that our experimental results revealed that the use of the hist index resulted in better accuracy in some cases. These results also suggest that, for continuous attributes, the traditional entropy-based information gain heuristic does not work very well, because of its inability to reflect the class distribution characteristics of the samples with respect to continuous attributes. Therefore, better evaluation heuristics are needed for continuous attributes. This problem is the subject of our current research; in particular, we are currently investigating multi-modal heuristics that use both the hist index and entropy. Another problem investigated in our current research is the generalization of the techniques described in this paper to classification problems that involve more than two classes.

References

[1] A. Brazma and J. Vilo. Gene expression data analysis, FEBS Letters, 480:17-24, 2000.

[2] U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Cell Biology, Vol. 96, pp. 6745-6750, June 1999.

[3] Laura J. van 't Veer, Hongyue Dai, Marc J. van de Vijver, Yudong D. He, Augustinus A. M. Hart, Mao Mao, Hans L. Peterse, Karin van der Kooy, Matthew J. Marton, Anke T. Witteveen, George J. Schreiber, Ron M. Kerkhoven, Chris Roberts, Peter S. Linsley, René Bernards, and Stephen H. Friend. Gene expression profiling predicts clinical outcome of breast cancer, Nature, 415, pp. 530-536, 2002.

[4] P. Tamayo, D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E. Lander, and T. Golub. Interpreting patterns of gene expression with self-organizing maps, PNAS, 96:2907-2912, 1999.

[5] J. R. Quinlan. C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, 1993.

[6] Xiaoyong Li. Concept Learning Techniques for Microarray Data Collections, Master's Thesis, University of Houston, December 2002.

[7] U. Fayyad and K. Irani. Multi-interval discretization of continuous-valued attributes for classification learning, Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-93), pp. 1022-1029, 1993.

[8] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 286:531-537, 1999.

[9] S. Dudoit, J. Fridlyand, and T. P. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data, Journal of the American Statistical Association, Vol. 97, No. 457, pp. 77-87, 2002.

[10] H. Blockeel and J. Struyf. Efficient algorithms for decision tree cross-validation, Machine Learning: Proceedings of the Eighteenth International Conference, pp. 11-18, 2001.

 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

LE03.doc

Key words and phrases: decision trees, concept learning for microarray data sets, leave-one-out cross validation, heuristics for split point selection, decision tree reuse.

1. Introduction

The advent of DNA microarray technology provides biologists with the ability to monitor expression levels for thousands of genes simultaneously. Applications of microarrays range from the study of gene expression in yeast under different environmental stress conditions to the comparison of gene expression profiles of tumors from cancer patients [1].

The number of examples in a microarray training set, on the other hand, is typically below one hundred. Associated with each sample is its type or class, which we are trying to predict. Moreover, in this paper we restrict our discussion to binary classification problems.

Section 2 introduces decision tree learning techniques for microarray data collections. Section 3 discusses how to speed up leave-one-out cross validation. Section 4 presents experimental results that evaluate our techniques on three microarray data sets, and Section 5 summarizes our findings.
2. Decision Tree Learning Techniques for Microarray Data Collections

2.1 Decision Tree Algorithms Reviewed

The traditional decision tree learning algorithm (for more discussion of decision trees see [5]) builds a decision tree breadth-first by recursively dividing the examples until each partition is pure by definition or meets other termination conditions (to be discussed later). If a node satisfies a termination condition, the node is marked with a class label that is the majority class of the samples associated with this node. In the case of microarray data sets, the splitting criterion for assigning examples to nodes is of the form "A < v" (where A is an attribute and v is a real number).

In the algorithm description in Fig. 1 below, we assume that:
1. D is the whole microarray training data set;
2. T is the decision tree to be built;
3. N is a node of the decision tree, which holds the indexes of its samples;
4. R is the root node of the decision tree;
5. Q is a queue which contains nodes of the same type as N;
6. Si is a split point, a structure containing a gene index i, a real number v and an information gain value. A split point can be used to provide a split criterion that partitions the tree node N into two nodes N1 and N2 based on whether or not gene i's value for each example in the node is greater than value v;
7. Gi denotes the i-th gene.

The result of applying the decision tree learning algorithm is a tree whose intermediate nodes associate split points with attributes, and whose leaf nodes represent decisions (classes in our case). Test conditions for a node are selected by maximizing the information gain, relying on the following framework. We assume we have 2 classes, sometimes called '+' and '-' in the following, in our classification problem. A test S subdivides the examples D=(p1,p2) into 2 subsets D1=(p11,p12) and D2=(p21,p22). The quality of a test S is measured using Gain(D,S):

   H(D=(p1,...,pm)) = Σi=1..m pi log2(1/pi)   (the entropy function)

   Gain(D,S) = H(D) − Σi=1..2 (|Di| / |D|) H(Di)

In the above, |D| denotes the number of elements in set D, and D=(p1,p2) with p1+p2=1 indicates that of the |D| examples, p1*|D| examples belong to the first class and p2*|D| examples belong to the second class.

   Procedure buildTree(D):
   1. Initialize root node R of tree T using data set D;
   2. Initialize queue Q to contain root node R;
   3. While Q is not empty do {
   4.    De-queue the first node N in Q;
   5.    If N does not satisfy the termination condition {
   6.       For each gene Gi (i = 1, 2, ...)
   7.          { Evaluate splits on gene Gi based on information gain;
   8.            Record the best split point Si for Gi and its information gain }
   9.       Determine the split point Smax with the highest information gain;
   10.      Use Smax to divide node N into N1 and N2 and attach nodes N1 and N2 to node N in the decision tree T;
   11.      En-queue N1 and N2 to Q;
   12.   }
   13. }

   Figure 1: Decision Tree Learning Algorithm
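As a concrete illustration of the entropy and gain computations above, here is a minimal Python sketch; it is not the authors' implementation, and the function names and the representation of a node as a pair of class counts are our own assumptions.

```python
from math import log2

def entropy(counts):
    """H(D) for a node given its per-class example counts, e.g. (p1*|D|, p2*|D|)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) * log2(total / c) for c in counts if c > 0)

def information_gain(parent, left, right):
    """Gain(D,S) = H(D) - sum over the two subsets of |Di|/|D| * H(Di)."""
    total = sum(parent)
    return entropy(parent) - sum(sum(d) / total * entropy(d) for d in (left, right))

# A node with 4 '+' and 3 '-' examples, split into (3+, 0-) and (1+, 3-):
print(information_gain((4, 3), (3, 0), (1, 3)))
```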
2.2 Attribute Histograms

Our research introduced a number of new data structures for the purpose of speeding up the decision tree learning algorithm. One of these data structures, called the attribute histogram, captures the class distribution of a sorted continuous attribute. Let us assume we have 7 examples whose values for an attribute A are 1.01, 1.07, 1.44, 2.20, 3.86, 4.3, and 5.71, and whose class distribution is (-, +, +, +, -, -, +); that is, the first example belongs to class 2, the second example to class 1, and so on. If we group all adjacent samples with the same class, we obtain the histogram for this attribute, which is (1-, 3+, 2-, 1+), or (1,3,2,1) for short, as depicted in Fig. 2. If the class distribution for the sorted attribute A had been (+,+,-,-,-,-,+), A's histogram would be (2,4,1). Efficient algorithms to compute attribute histograms are discussed in [6].
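The attribute histogram itself can be obtained by run-length encoding the class labels of the examples after sorting them on the attribute. A minimal sketch (the function and variable names are ours, not the tool's):

```python
def attribute_histogram(values, labels):
    """Run-length encode the class labels of the examples sorted by attribute value.

    Returns a list of (class, block_size) pairs; e.g. the histogram (1-, 3+, 2-, 1+)
    comes back as [('-', 1), ('+', 3), ('-', 2), ('+', 1)].
    """
    blocks = []
    for i in sorted(range(len(values)), key=lambda j: values[j]):
        if blocks and blocks[-1][0] == labels[i]:
            blocks[-1][1] += 1             # extend the current block
        else:
            blocks.append([labels[i], 1])  # start a new block
    return [tuple(b) for b in blocks]

# The example from the text: 7 attribute values with class distribution (-, +, +, +, -, -, +).
values = [1.01, 1.07, 1.44, 2.20, 3.86, 4.3, 5.71]
labels = ['-', '+', '+', '+', '-', '-', '+']
print(attribute_histogram(values, labels))   # [('-', 1), ('+', 3), ('-', 2), ('+', 1)]
```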
2.3 Searching for the Best Split Point

As mentioned earlier, the traditional decision tree algorithm has a preference for tests that reduce entropy. To find the best test for a node, we have to search through all the possible split points for each attribute. In order to compute the best split point for a numeric attribute, normally the (sorted) list of its values is scanned from the beginning, and the entropy is computed for each split point placed half way between every two adjacent attribute values. The entropy for each split point can actually be computed efficiently, as shown in Figure 2, because of the existence of our attribute histogram data structure. Based on its histogram (1-, 3+, 2-, 1+), we only consider three possible splits: (1- | 3+, 2-, 1+), (1-, 3+ | 2-, 1+) and (1-, 3+, 2- | 1+). The vertical bar represents the split point. Thus we reduce the number of candidate split points from 6 down to 3 (Fayyad and Irani proved in [7] that splitting between adjacent samples that belong to the same class leads to sub-optimal information gain; in general, their paper advocates a multi-splitting algorithm for continuous attributes, whereas our approach relies on binary splits).

   Figure 2: Example of an Attribute Histogram

A situation that we have not discussed until now involves histograms that contain identical attribute values belonging to different classes. To cope with this situation when considering a split point, we need to check the two neighboring examples' attribute values on both sides of the split point. If they are the same, we have to discard this split point even if its information gain is high.

After we have determined the best split point for all the attributes (genes in our case), the attribute with the highest information gain is selected and used to split the current node.
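Restricting the search to histogram block boundaries, as argued above, can be sketched as follows. This is a simplified illustration that omits the identical-attribute-value check described in the text; the helper names are ours.

```python
from math import log2

def node_entropy(pos, neg):
    """Entropy of a partition that contains pos '+' examples and neg '-' examples."""
    h, total = 0.0, pos + neg
    for c in (pos, neg):
        if c:
            h += (c / total) * log2(total / c)
    return h

def best_block_split(histogram):
    """Return (boundary, gain) of the best split, trying only block boundaries.

    `histogram` is a list of (class, count) blocks such as [('-', 1), ('+', 3), ('-', 2), ('+', 1)];
    splitting inside a block cannot be optimal [7], so only len(histogram)-1 candidates remain.
    """
    pos = sum(n for c, n in histogram if c == '+')
    neg = sum(n for c, n in histogram if c == '-')
    total, parent = pos + neg, node_entropy(pos, neg)
    best = (None, -1.0)
    lp = ln = 0
    for b in range(1, len(histogram)):
        c, n = histogram[b - 1]
        lp, ln = (lp + n, ln) if c == '+' else (lp, ln + n)
        gain = parent \
            - (lp + ln) / total * node_entropy(lp, ln) \
            - (pos - lp + neg - ln) / total * node_entropy(pos - lp, neg - ln)
        if gain > best[1]:
            best = (b, gain)
    return best

# Histogram (1-, 3+, 2-, 1+): only the three splits discussed in the text are evaluated.
print(best_block_split([('-', 1), ('+', 3), ('-', 2), ('+', 1)]))
```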
3. Optimizations for Leave-one-out Cross-validation

In k-fold cross-validation, we divide the data into k disjoint subsets of (approximately) equal size, then train the classifier k times, each time leaving out one of the subsets from training and using only the omitted subset as the test set to compute the error rate. If k equals the sample size, this is called "leave-one-out" cross-validation. For large data sets, leave-one-out is computationally very demanding, since it has to construct many more decision trees than the usual forms of cross-validation (k=10 is a popular choice in the literature). But for data sets with few examples, such as microarray data sets, leave-one-out cross-validation is quite popular and practical, since it gives the least biased evaluation of the model. Also, when doing leave-one-out cross-validation, the computations for the different subsets tend to be very similar. Therefore, it seems attractive to speed up leave-one-out cross-validation through the reuse of results of previous computations, which is the main topic of the next subsection.

3.1 Reuse of Sub-trees from Previous Runs

It is important to note that the whole data set and the training sets in leave-one-out only differ in one example. Therefore, in the likely event that the same root test is selected for the two data sets, we already know that at least one of the 2 sub-trees below the root node generated by the first run (for the whole data set) can be reused when constructing the other decision trees. Similar opportunities for reuse exist at other levels of the decision trees. Taking advantage of this property, we compare the node to be split with the stored nodes from previous runs, and reuse sub-trees if a match occurs.

In order to get a speedup through sub-tree reuse, it is critical that matching nodes from previous runs can be found quickly. To facilitate the comparison of two nodes, we use bit strings to represent the sample list of each node. For example, if we have 10 samples in total and 5 of them are associated with the current node, we use a bit string such as "0101001101" as the signature of this node, and use XOR string comparisons and signature hashing to quickly determine if a reusable sub-tree exists.
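The bit-string signatures map naturally onto Python integers: bit i records whether sample i reaches the node, the XOR of two signatures is zero exactly when the sample lists are identical, and the integer can key a hash table of sub-trees built in earlier runs. A rough sketch under these assumptions (the cache layout is ours, not the paper's):

```python
def node_signature(sample_indexes):
    """Pack a node's sample-index set into an integer bit string (bit i set = sample i present)."""
    sig = 0
    for i in sample_indexes:
        sig |= 1 << i
    return sig

subtree_cache = {}   # signature -> sub-tree built for exactly this sample list in an earlier run

def reuse_or_build(sample_indexes, build_subtree):
    """Reuse a stored sub-tree when a node with the same sample list was split before."""
    sig = node_signature(sample_indexes)
    hit = subtree_cache.get(sig)                 # signature hashing: O(1) candidate lookup
    if hit is not None and (hit[0] ^ sig) == 0:  # XOR test: 0 means the bit strings are identical
        return hit[1]
    tree = build_subtree(sample_indexes)
    subtree_cache[sig] = (sig, tree)
    return tree

# Example: the node holding samples {1, 3, 6, 7, 9} out of 10 gets the signature below.
print(bin(node_signature([1, 3, 6, 7, 9])))      # 0b1011001010
```

With an exact-key dictionary the XOR test is redundant; it is kept only to mirror the comparison described in the text.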
3.2 Using Histograms for Attribute Pruning

Assume that two histograms A (2+, 2-) and B (1+, 1-, 1+, 1-) are given. In this case, our job is to find the best split point among all possible splits of both histograms. Obviously, B can never give a better split than A, because (2+ | 2-) has entropy 0. This implies that performing information gain computations for attribute B is a waste of time. That prompts us to think of some way to distinguish between "good" and "bad" histograms, and to exclude attributes with bad histograms from consideration for the sake of speed.

Mathematically, it might be quite complicated to come up with a formula that predicts the best attribute to be used for a particular node of the decision tree. However, we are considering an approximate method that may not always be correct, but hopefully is correct most of the time. The idea is to use an index, which we call the "hist index". The hist index of a histogram S is defined as:

   Hist(S) = Σj=1..m Pj²

where Pj is the relative frequency of block j in S. For example, if we have a histogram (1, 3, 4, 2), its hist index would be 1² + 3² + 4² + 2² = 30. A histogram with a high hist index is more likely to contain the best split point than a histogram with a low hist index. Intuitively, we know that the fewer blocks the histogram has, the better chance it has to contain a good split point; mathematically, a² > a1² + a2² holds if a = a1 + a2 (with a1, a2 > 0).

Our decision tree learning algorithm uses the hist index to prune attributes as follows. Prior to determining the best split point of an attribute, its hist index is computed and compared with the average hist index of all the previous histograms in the same round; only if its hist index value is larger than the previous average is the best split point for this attribute determined; otherwise, the attribute is excluded from consideration for test conditions of the particular node.
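A sketch of the hist index and the pruning rule follows. It squares block counts directly, as in the paper's own (1, 3, 4, 2) example; squaring relative frequencies instead would only rescale the index by the same factor for all attributes at a node, leaving the comparison unchanged. Keeping the first attribute of a round unconditionally is our assumption.

```python
def hist_index(histogram):
    """Sum of squared block sizes, e.g. (1, 3, 4, 2) -> 1 + 9 + 16 + 4 = 30."""
    return sum(n * n for _, n in histogram)

def prune_attributes(histograms):
    """Keep an attribute only if its hist index exceeds the average of those seen so far."""
    kept, seen = [], []
    for gene, hist in histograms:
        h = hist_index(hist)
        if not seen or h > sum(seen) / len(seen):
            kept.append(gene)        # worth a full information-gain scan
        seen.append(h)
    return kept

# Attribute A (2+, 2-) is kept; attribute B (1+, 1-, 1+, 1-) is pruned.
hists = [('A', [('+', 2), ('-', 2)]),
         ('B', [('+', 1), ('-', 1), ('+', 1), ('-', 1)])]
print(prune_attributes(hists))       # ['A']
```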
3.3 Approximating Entropy Computations

This sub-section addresses the following question: do we really have to compute the log values, which require a lot of floating point computation, to find the smallest entropy values?

Let us assume we have a histogram (2-, 3+, 7-, 5+, 2-) and we need to determine its split point that minimizes entropy. Let us consider the difference between two splits, the 1st: (2-, 3+ | 7-, 5+, 2-) and the 2nd: (2-, 3+, 7- | 5+, 2-). Apparently, the 2nd is better than the 1st. Since we are dealing with only binary classification, we can assign a numeric value of +1 to one class and a value of -1 to the other class, and we can use the sum of the absolute differences in class memberships in the two resulting partitions to approximate the entropy computations; the larger this result is, the lower the entropy is. In this case, for the first split the sum is |-2 + 3| + |-7 + 5 - 2| = 5, and for the second the sum is |-2 + 3 - 7| + |5 - 2| = 9. We call this method the absolute difference heuristic. We performed some experiments [8] to determine how often the same split point is picked by the information gain heuristic and the absolute difference heuristic. Our results indicate that in most cases (approximately between 91% and 100%, depending on data set characteristics) the same split point is picked by both methods.
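With +1/-1 class values, the absolute difference heuristic reduces to integer additions. The sketch below reproduces the worked example (a simplified illustration, names ours):

```python
def abs_difference_score(histogram, boundary):
    """Score a split at a block boundary: sum of |#(+) - #(-)| over the two partitions.

    Each '+' example counts +1 and each '-' example counts -1, so a partition contributes
    the absolute value of its signed sum; larger scores approximate lower entropy.
    """
    signed = [n if c == '+' else -n for c, n in histogram]
    return abs(sum(signed[:boundary])) + abs(sum(signed[boundary:]))

hist = [('-', 2), ('+', 3), ('-', 7), ('+', 5), ('-', 2)]
print(abs_difference_score(hist, 2))   # (2-, 3+ | 7-, 5+, 2-):  |-2+3| + |-7+5-2| = 5
print(abs_difference_score(hist, 3))   # (2-, 3+, 7- | 5+, 2-):  |-2+3-7| + |5-2| = 9
```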
4. Evaluation

In this section we present the results of experiments that evaluate our methods on 3 different microarray data sets.

4.1 Data Sets and Experimental Design

The first data set is a leukemia data collection that consists of 62 bone marrow and 10 peripheral blood samples from acute leukemia patients (obtained from Golub et al. [8]). The 72 samples fall into two types of acute leukemia: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). These samples come from both adults and children. The RNA samples were hybridized to Affymetrix high-density oligonucleotide microarrays that contain probes for p = 7,130 human genes.

The second data set, a colon tissue data set, contains the expression levels (red intensity / green intensity) of the 2,000 genes with the highest minimal intensity across 62 colon tissues. These gene expressions in 40 tumor and 22 normal colon tissue samples were analyzed with an Affymetrix oligonucleotide array containing over 6,500 human genes (Alon et al. [2]).

The third data set comes from a study of gene expression in breast cancer patients (Veer et al. [3]). The data set contains data from 98 primary breast cancer patients: 34 from patients who developed distant metastases within 5 years, 44 from patients who continued to be disease-free after a period of at least 5 years, 18 from patients with BRCA1 germline mutations, and 2 from BRCA2 carriers. All patients were lymph node negative, and under 55 years of age at diagnosis.

In the experiments, we did not use all genes, but rather selected a subset P of p genes. Decision trees were then learnt that operate on the selected subset of genes. As proposed in [9], we remove genes from the data sets based on the ratio of their between-groups to within-groups sum of squares. For a particular gene j, the ratio is defined as:

   BSS(j) / WSS(j) = [ Σi Σk I(yi = k) (x̄kj − x̄.j)² ] / [ Σi Σk I(yi = k) (xij − x̄kj)² ]

where x̄.j denotes the average expression level of gene j across all samples and x̄kj denotes the average expression level of gene j across the samples belonging to class k.

To give an explicit example, assume we have four samples and two genes per sample: the first gene's expression level values for the four samples are (1, 2, 3, 4) and the second's are (1, 3, 2, 4); the sample class memberships are (+, -, +, -) (listed in the order of samples no. 1, no. 2, no. 3 and no. 4). For gene 1, we have BSS/WSS = 0.125, and for gene 2, BSS/WSS = 4. If we have to remove one gene, gene 1 will be removed according to our rule, since it has the lower BSS/WSS value. The removal of gene 1 is reasonable because we can tell the class membership of the samples by looking at their gene 2 expression level values: if a sample's gene 2 expression level is greater than 2.5, the sample should belong to the negative class; otherwise the sample belongs to the positive class. If we evaluate gene 1 instead, we will not be able to perform the classification in one single step like we have just done with gene 2.

After we calculate the BSS/WSS ratios for all genes in a data set, only the p genes with the largest ratios remain in the data sets that are used in the experiments. Experiments were conducted with different p values.
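The BSS/WSS ratio can be computed directly from the definition above. The sketch below (our code, not the authors') ranks the two genes of the worked example; gene 1 ends up with the lower ratio and would be removed first.

```python
def bss_wss(expression, classes):
    """Between-groups to within-groups sum-of-squares ratio for one gene.

    expression: the gene's expression value for each sample.
    classes:    the class label of each sample.
    """
    overall = sum(expression) / len(expression)
    means = {k: sum(x for x, y in zip(expression, classes) if y == k) /
                sum(1 for y in classes if y == k)
             for k in set(classes)}
    bss = sum((means[y] - overall) ** 2 for y in classes)
    wss = sum((x - means[y]) ** 2 for x, y in zip(expression, classes))
    return bss / wss

classes = ['+', '-', '+', '-']
genes = {'gene 1': [1, 2, 3, 4], 'gene 2': [1, 3, 2, 4]}
ranked = sorted(genes, key=lambda g: bss_wss(genes[g], classes), reverse=True)
print(ranked)   # ['gene 2', 'gene 1'] -- gene 1 has the lower ratio, so it is removed first
```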
In the experiments, we compared the popular C5.0/See5.0 decision tree tool (run with its default parameter settings) with two versions of our tool. The first version, called the microarray decision tree tool, does not use any optimizations but employs pre-pruning: it stops growing the tree when at least 90% of the examples belong to the majority class. The second version of our tool, called the optimized decision tree tool, uses the same pre-pruning and employs all the techniques that were discussed in Section 3.

4.2 Experimental Results

The first experiment evaluated the accuracy of the three decision tree learning tools. Tables 1-3 below display each algorithm's error rate on the three different data sets, using three different p values for gene selection. The first column of each table gives the p value that was used. The other columns give the total number of misclassifications and the error rate (in parentheses). Error rates were computed using leave-one-out cross validation.

   Table 1: Leukemia data set test results (72 samples)

   p      C5.0          Microarray Decision Tree   Optimized Decision Tree
   1024   5 (6.9%)      5 (6.9%)                   4 (5.6%)
   900    4 (4.6%)      8 (11.1%)                  5 (6.9%)
   750    13 (18.1%)    11 (15.3%)                 3 (4.2%)

   Table 2: Colon Tissue data set test results (62 samples)

   p      C5.0          Microarray Decision Tree   Optimized Decision Tree
   1600   12 (19.4%)    15 (24.2%)                 16 (25.8%)
   1200   12 (19.4%)    15 (24.2%)                 16 (25.8%)
   800    12 (19.4%)    14 (22.6%)                 16 (25.8%)

   Table 3: Breast Cancer data set test results (78 samples)

   p      C5.0          Microarray Decision Tree   Optimized Decision Tree
   5000   38 (48.7%)    29 (33.3%)                 35 (44.9%)
   1600   39 (50.0%)    32 (41.0%)                 30 (38.5%)
   1200   39 (50.0%)    31 (39.7%)                 29 (33.3%)
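The leave-one-out error counts in Tables 1-3 come from a loop of the following shape; `train_tree` and `classify` stand for any decision tree learner and are placeholders, not part of the paper's tool.

```python
def leave_one_out_errors(samples, labels, train_tree, classify):
    """Train on all samples but one, test on the held-out sample, and count misclassifications."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        tree = train_tree(train_x, train_y)
        if classify(tree, samples[i]) != labels[i]:
            errors += 1
    return errors, errors / len(samples)   # e.g. 5 errors on 72 samples is an error rate of about 6.9%
```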
If we study the error rates for the three methods listed in the three tables carefully, it can be noticed that, on average, the error rates for the optimized decision tree are lower than those of the non-optimized version, which looks quite surprising since the optimized decision tree tool uses a lot of approximate computations and pruning.

However, further analysis revealed that the use of attribute pruning (using the hist index we introduced in Section 3.2) provides an explanation for the better average accuracy of the optimized decision tree tool. Why would attribute pruning lead to a more accurate prediction in some cases? The reason is that the entropy function does not take the class distribution of sorted attributes into consideration. For example, if we have two attribute histograms (3+, 3-, 6+) and (3+, 1-, 2+, 1-, 2+, 1-, 2+), for the first histogram the best split point is (3+ | 3-, 6+), but for the second histogram there is a similar split point (3+ | 1-, 2+, 1-, 2+, 1-, 2+) which is equivalent to (3+ | 3-, 6+) with respect to the information gain heuristic. Therefore, both split points have the same chance of being selected. But, just by intuition, we would say that the second split point is much worse than the first one because of its large number of blocks, requiring more tests to separate the two classes properly.

The traditional information gain heuristic ignores such distributional aspects altogether, which causes a loss of accuracy in some circumstances. Hist index based pruning, as proposed in Section 3.2, improves on this situation by removing attributes that have a low hist index (like the second attribute in the above example) beforehand. Intuitively, continuous attributes with long histograms, representing "flip-flopping" class memberships, are not very attractive choices for test conditions, because more nodes/tests are necessary in a decision tree to predict classes correctly based on such an attribute. In summary, some of those "bad" attributes were removed by attribute pruning, which explains the higher average accuracy in the experiments.
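To make the pruning argument concrete, squaring the block counts as before shows how sharply the "flip-flopping" histogram falls behind and why it would be pruned (our arithmetic, not taken from the paper):

```python
def hist_index(block_sizes):
    return sum(n * n for n in block_sizes)

print(hist_index([3, 3, 6]))              # (3+, 3-, 6+):                 9 + 9 + 36 = 54
print(hist_index([3, 1, 2, 1, 2, 1, 2]))  # (3+, 1-, 2+, 1-, 2+, 1-, 2+): 9 + 1 + 4 + 1 + 4 + 1 + 4 = 24
```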
In another experiment we compared the CPU time for leave-one-out cross validation for the three decision tree learning tools: C5.0 Decision Tree, normal (Microarray Decision Tree) and optimized (Optimized Decision Tree). All these experiments were performed on an 850 MHz Intel Pentium processor with 128 MB main memory. The CPU time displayed (in seconds) in Table 4 includes the time of the tree building and evaluation process (note: these experiments are identical to those previously listed in Tables 1 to 3). Our experimental results suggest that the decision tree tool designed for microarray data sets normally runs slightly faster than the C5.0 tool, while the speedup of the optimized microarray decision tree tool is quite significant and ranges from 150% to 400%.

   Table 4: CPU time comparison of the three decision tree tools (CPU time in seconds)

   Data Set                  p       C5.0    Normal   Optimized
   Leukemia data set         1024    6.7     3.5      1.2
                             900     5.6     3.1      1.1
                             750     6.0     4.1      1.1
   Colon Tissue data set     1600    12.0    8.0      2.2
                             1200    9.0     6.0      1.7
                             800     5.9     3.8      1.1
   Breast Cancer data set    5000    74.5    75.3     15.9
                             2000    30.4    30.2     6.4
                             1500    22.4    20.4     4.8

5. Summary and Conclusion

We introduced decision tree learning algorithms for microarray data sets, and optimizations to speed up leave-one-out cross validation. Aimed at this goal, several strategies were employed: the introduction of the hist index to help prune attributes, approximate computations that measure entropy, and the reuse of sub-trees from previous runs. We claim that the first two ideas are new, whereas the third idea was also explored in Blockeel's paper [10], which centered on the reuse of split points. The performance of the microarray decision tree tool was compared with that of the commercially available decision tree tool C5.0/See5.0 using 3 microarray data sets. The experiments suggest that our tool runs between 150% and 400% faster than C5.0.
We also compared the trees that were generated in the experiments for the same data sets. We observed that the trees generated by the same tool are very similar. Trees generated by different tools also had a significant degree of similarity. Basically, all the trees that were generated for the three data sets are of small size, normally with less than 10 nodes. We also noticed that smaller trees seem to be correlated with lower error rates.

Also worth mentioning is that our experimental results revealed that the use of the hist index resulted in better accuracy in some cases. These results also suggest that for continuous attributes the traditional entropy-based information gain heuristic does not work very well, because of its weakness in reflecting the class distribution characteristics of the samples with respect to continuous attributes. Therefore, better evaluation heuristics are needed for continuous attributes. This problem is the subject of our current research; in particular, we are currently investigating multi-modal heuristics that use both the hist index and entropy. Another problem investigated in our current research is the generalization of the techniques described in this paper to classification problems that involve more than two classes.

References

[1] A. Brazma, J. Vilo. Gene expression data analysis, FEBS Letters, 480:17-24, 2000.
[2] U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Cell Biology, Vol. 96, pp. 6745-6750, June 1999.
[3] L. J. van 't Veer, H. Dai, M. J. van de Vijver, Y. D. He, A. A. M. Hart, M. Mao, H. L. Peterse, K. van der Kooy, M. J. Marton, A. T. Witteveen, G. J. Schreiber, R. M. Kerkhoven, C. Roberts, P. S. Linsley, R. Bernards, and S. H. Friend. Gene expression profiling predicts clinical outcome of breast cancer, Nature, 415, pp. 530-536, 2002.
[4] P. Tamayo, D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E. Lander, and T. Golub. Interpreting patterns of gene expression with self-organizing maps, PNAS, 96:2907-2912, 1999.
[5] J. R. Quinlan. C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, 1993.
[6] X. Li. Concept learning techniques for microarray data collections, Master's Thesis, University of Houston, December 2002.
[7] U. Fayyad and K. Irani. Multi-interval discretization of continuous-valued attributes for classification learning, Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-93), pp. 1022-1029, 1993.
[8] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science, 286:531-537, 1999.
[9] S. Dudoit, J. Fridlyand, and T. P. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data, Journal of the American Statistical Association, Vol. 97, No. 457, pp. 77-87, 2002.
[10] H. Blockeel and J. Struyf. Efficient algorithms for decision tree cross-validation, Machine Learning: Proceedings of the Eighteenth International Conference, pp. 11-18, 2001.