
Lazy Association Classification



  1. Lazy Associative Classification Adriano Veloso, Wagner Meira Jr, Mohammed J. Zaki Computer Science Dept, Federal University of Minas Gerais, Brazil Computer Science Dept, Rensselaer Polytechnic Institute, Troy, USA ICDM’06 Reporter: Chieh-Chang Yang Date: 2007.03.19
  2. Outline <ul><li>Introduction </li></ul><ul><li>Related Work </li></ul><ul><li>Eager Associative Classifiers </li></ul><ul><li>Lazy Associative Classifiers </li></ul><ul><li>Experiments </li></ul><ul><li>Conclusions </li></ul>
  3. Introduction <ul><li>Classification is a well-studied problem, and several models have been proposed. </li></ul><ul><li>Among these models, decision tree classifiers are particularly attractive because they are relatively fast and simple. </li></ul><ul><li>Decision trees perform a greedy search for rules by selecting the most promising features. Such a greedy search may prune important rules. </li></ul>
  4. Introduction <ul><li>As an alternative model, associative classifiers first mine association rules from the training data and then use these rules to build a classifier. </li></ul><ul><li>Associative classifiers perform a global search for rules satisfying some quality constraints. However, this global search may generate a large number of rules. </li></ul>
  5. Introduction <ul><li>In this paper we propose a novel lazy associative classifier, in which the computation is performed on a demand-driven basis: while generating rules, it focuses on the features that actually occur within the test instance. </li></ul><ul><li>We assess the performance of the lazy associative classifier and show that it outperforms both the eager associative classifier and the decision tree classifier. </li></ul>
  6. Related Work <ul><li>Most existing work on associative classification relies on developing new algorithms to improve overall accuracy. </li></ul><ul><li>CBA generates a single rule-set and ranks the rules according to their confidence/support. It selects the best rule to apply to each test instance. </li></ul><ul><li>HARMONY uses an instance-centric rule-generation approach that assures the inclusion of at least one rule for each training instance in the final rule set. </li></ul><ul><li>CMAR uses multiple rules to perform the classification. </li></ul><ul><li>CPAR adopts a greedy technique to generate smaller rule-sets. </li></ul><ul><li>CAEP explores the concept of emerging patterns, which usually predict all classes accurately even if their populations are unbalanced. </li></ul>
  7. Related Work <ul><li>Rule induction classifiers include RISE, RIPPER, and SLEEPER. </li></ul><ul><li>RISE performs a complete overfitting by considering each instance as a rule, and then generalizes the rules. </li></ul><ul><li>RIPPER and SLEEPER extend the “overfit and prune” paradigm; that is, they start with a large rule-set and prune it using several heuristics. </li></ul><ul><li>SLEEPER also associates a probability with each rule, weighting the contribution of the rule during classification. </li></ul>
  8. Eager Associative Classifier <ul><li>Decision Trees and Decision Rules </li></ul><ul><li>Entropy-based Associative Classifier </li></ul>
  9. Decision trees and decision rules <ul><li>Given any subset of training instances S, let s_i denote the number of instances with class c_i, and |S| = Σ s_i. Then p_i = s_i/|S| denotes the probability of class c_i in S. </li></ul><ul><li>The entropy of S is E(S) = -Σ p_i log p_i. </li></ul><ul><li>For any partition of S into m subsets, with S = ∪ S_i, the split entropy is E({S_i}) = Σ (|S_i|/|S|) E(S_i). </li></ul><ul><li>The information gain for the split is I(S,{S_i}) = E(S) - E({S_i}). </li></ul>
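The definitions above translate directly into code. A minimal sketch (the function names are ours, not from the paper): entropy of a label list, and the information gain of a candidate partition.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E(S) = -sum over classes of p_i * log2(p_i), with p_i = s_i/|S|."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, partition):
    """I(S, {S_i}) = E(S) - sum over i of (|S_i|/|S|) * E(S_i)."""
    n = len(labels)
    split_entropy = sum(len(s) / n * entropy(s) for s in partition)
    return entropy(labels) - split_entropy
```

For a perfectly balanced binary set the entropy is 1 bit, and a split that separates the two classes completely has information gain 1.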
  10. Decision trees and decision rules <ul><li>A decision tree is built using a greedy, recursive splitting strategy, where the best split is chosen at each internal node according to the information gain. </li></ul><ul><li>The splitting at a node stops when all instances are from a single class or if the size of the node falls below a minimum support threshold, called minsup. </li></ul>
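The greedy split-and-stop procedure could be sketched as below. This is a simplified, hypothetical implementation (not the paper's code), assuming categorical features stored as dicts and an absolute minsup count.

```python
from collections import Counter
from math import log2

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, target, minsup):
    """rows: list of dicts mapping feature -> value; target: key of the class.
    Returns a class label (leaf) or (feature, {value: subtree}) (internal node)."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    features = [k for k in rows[0] if k != target]
    # Stop when the node is pure, smaller than minsup, or out of features.
    if len(set(labels)) == 1 or len(rows) < minsup or not features:
        return majority
    def gain(f):
        groups = {}
        for r in rows:
            groups.setdefault(r[f], []).append(r[target])
        split = sum(len(g) / len(rows) * _entropy(g) for g in groups.values())
        return _entropy(labels) - split
    best = max(features, key=gain)  # greedy choice by information gain
    children = {}
    for r in rows:
        children.setdefault(r[best], []).append(
            {k: v for k, v in r.items() if k != best})
    return (best, {v: build_tree(sub, target, minsup)
                   for v, sub in children.items()})
```

Because each node commits to the single best feature, rules along other branches are never generated, which is exactly the pruning behavior the slides criticize.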
  11. Decision trees and decision rules
  12. Entropy-based Associative Classifier <ul><li>We denote as class association rules (CARs) those association rules of the form X -> c, where the antecedent (X) is composed of feature variables and the consequent (c) is just a class. </li></ul><ul><li>CARs may be generated by a slightly modified association rule mining algorithm: each itemset must contain a class, and rule generation also follows a template in which the consequent is just a class. </li></ul><ul><li>CARs are ranked in decreasing order of information gain. During the testing phase, the associative classifier simply checks whether each CAR matches the test instance; the class associated with the first match is chosen. </li></ul>
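With CARs already mined and scored, the testing phase reduces to a first-match scan over the rules ranked by information gain. A minimal sketch (the rules and gain values in the test are illustrative, not mined from real data):

```python
def classify(cars, instance):
    """cars: (antecedent_set, class_label, info_gain) triples, pre-mined.
    instance: a set of feature=value items.
    The highest-ranked CAR whose antecedent matches decides the class."""
    for antecedent, label, _ in sorted(cars, key=lambda r: r[2], reverse=True):
        if antecedent <= instance:  # antecedent is a subset of the instance
            return label
    return None  # no CAR matches
```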
  13. 13. Entropy-based Associative Classifier
  14. Entropy-based Associative Classifier <ul><li>Three CARs match the test instance of our example using EAC: </li></ul><ul><li>{windy=false and temperature=cool -> play=yes} </li></ul><ul><li>{outlook=sunny and humidity=high -> play=no} </li></ul><ul><li>{outlook=sunny and temperature=cool -> play=yes} </li></ul><ul><li>The first rule is selected. In our example, the test case is recognized by only one rule in the decision tree, while the same test case is recognized by three CARs in the associative classifier. </li></ul>
  15. Entropy-based Associative Classifier <ul><li>The paper states and proves two theorems about the performance of decision trees and eager associative classifiers: </li></ul><ul><li>The rules derived from a decision tree are a subset of the CARs mined using an eager associative classifier based on information gain. </li></ul><ul><li>CARs perform no worse than decision tree rules, according to the information gain principle. </li></ul>
  16. Entropy-based Associative Classifier
  17. Lazy Associative Classifier
  18. Lazy Associative Classifier <ul><li>By definition, both C_e^A and C_l^A are composed of CARs {X -> c} in which X ⊆ A. Because D_A ⊆ D, for a given minsup, if a rule {X -> c} is frequent in D, then it must also be frequent in D_A. Since C_l^A is generated from D_A and C_e^A is generated from D (and D_A ⊆ D), C_e^A ⊆ C_l^A. </li></ul>
  19. Lazy Associative Classifier
  20. Lazy Associative Classifier & Eager Associative Classifier <ul><li>Suppose minsup is set to 40% (|D| = 10, so a rule must occur at least 4 times in D); the set of CARs found by the eager classifier is composed of these two: </li></ul><ul><li>{windy=false and humidity=normal -> play=yes} </li></ul><ul><li>{windy=false and temperature=cool -> play=yes} </li></ul><ul><li>Neither of the two CARs matches the test instance. </li></ul><ul><li>The lazy classifier finds two CARs in D_A (a rule only needs to occur twice in D_A): </li></ul><ul><li>{outlook=overcast -> play=yes} </li></ul><ul><li>{temperature=hot -> play=yes} </li></ul>
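The demand-driven step behind this example can be sketched as follows: project the training data onto the test instance's features (giving D_A), then count candidate CARs against an absolute minsup in the projection. This is a simplified, hypothetical illustration of the idea, limited to short antecedents.

```python
from collections import Counter
from itertools import combinations

def lazy_cars(train, test_features, minsup_count, max_len=2):
    """Mine CARs {X -> c} on demand from D_A, the training rows projected
    onto the features of the test instance.
    train: list of (feature_set, class_label); returns (X, label, support)."""
    projected = [(feats & test_features, label) for feats, label in train]
    cars = []
    for size in range(1, max_len + 1):
        counts = Counter()
        for feats, label in projected:
            for X in combinations(sorted(feats), size):
                counts[(X, label)] += 1  # support of the itemset X ∪ {c}
        cars += [(set(X), label, n) for (X, label), n in counts.items()
                 if n >= minsup_count]
    return cars
```

Because support is measured in the much smaller projected dataset, short rules like {outlook=overcast -> play=yes} can become frequent even when they are infrequent in the full training data.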
  21. Lazy Associative Classifier & Eager Associative Classifier <ul><li>Intuitively, lazy classifiers perform better than eager classifiers because of two characteristics: </li></ul><ul><li>Missing CARs: Eager classifiers search for CARs in a large search space, which is induced by all features of the training data. While this strategy generates a large rule-set, CARs that are important to some specific test instances may be missed. </li></ul><ul><li>Highly Disjunctive Spaces: Eager classifiers generate CARs before the test instance is even known. For this reason, eager classifiers often combine small disjuncts in order to generate more general predictions. This can reduce classification performance in highly disjunctive spaces. </li></ul>
  22. Problems of Lazy Associative Classifier <ul><li>The aforementioned discussion suggests an intuitive concept: the more CARs are generated, the better the classifier. </li></ul><ul><li>However, the same concept also leads to overfitting, reducing generalization and hurting classification accuracy. </li></ul>
  23. Problems of Lazy Associative Classifier <ul><li>In fact, overfitting and high sensitivity to irrelevant features are shortcomings of lazy classification. </li></ul><ul><li>A natural solution is to identify and discard the irrelevant features; thus, feature selection methods may be used. </li></ul><ul><li>In the experiments we show that our lazy classifiers were not seriously affected by overfitting, because only the best and most general CARs are used. </li></ul>
  24. Problems of Lazy Associative Classifier <ul><li>Another disadvantage is that lazy classifiers typically require more work to classify all test instances. </li></ul><ul><li>However, simple caching mechanisms are very effective at decreasing this workload. The basic idea is that different test instances may induce different rule-sets, but different rule-sets may share common CARs. </li></ul>
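One way to realize such caching is to memoize the support of each candidate CAR, since the rule-sets induced by different test instances overlap. This is a hypothetical sketch of the idea; the slides do not specify the exact caching scheme.

```python
class SupportCache:
    """Memoize the support count of each candidate CAR {X -> c} over the
    training data; rule-sets induced by different test instances share
    common CARs, so previously computed counts are reused."""

    def __init__(self, train):
        self.train = train  # list of (feature_set, class_label)
        self._memo = {}

    def support(self, X, c):
        key = (frozenset(X), c)
        if key not in self._memo:  # compute once, reuse across test instances
            self._memo[key] = sum(
                1 for feats, label in self.train
                if label == c and key[0] <= feats)
        return self._memo[key]
```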
  25. Experimental Evaluation <ul><li>In this section we show the experimental results for the evaluation of the proposed classifiers in terms of classification effectiveness and computational performance. </li></ul><ul><li>Our evaluation is based on a comparison against the C4.5 and LazyDT decision tree classifiers. We also compare our numbers to some results from other associative classifiers, such as CPAR, CMAR, and HARMONY, and to some results from rule induction classifiers, such as RISE, RIPPER, and SLEEPER. </li></ul>
  26. Experimental Evaluation <ul><li>We used 26 datasets from the UCI Machine Learning Repository to compare the effectiveness of the classifiers. </li></ul><ul><li>In all experiments we used 10-fold cross-validation. </li></ul><ul><li>We quantify the classification effectiveness of the classifiers through the conventional error rate. </li></ul><ul><li>We used the entropy method to discretize continuous attributes. </li></ul><ul><li>In the experiments we set minimum confidence to 50% and minsup to 1%. </li></ul>
  27. Comparison Between Decision Trees, Eager Classifiers, and Lazy Classifiers
  28. Comparison Between Decision Trees, Eager Classifiers, and Lazy Classifiers
  29. Comparison Between Rule Induction and Associative Classifiers
  30. Overfitting and Underfitting
  31. Execution Times
  32. Conclusions <ul><li>We present an assessment of associative classification and propose improvements to it by introducing a novel lazy classifier. </li></ul><ul><li>An important feature of the proposed lazy classifier is its ability to deal with the small disjuncts problem. </li></ul><ul><li>We also compare the proposed classifiers against three other associative classifiers and three rule induction classifiers; ours outperform them in most cases. </li></ul>