Decision tables

Published in: Technology, Education

  1. 1. Learning Rules that Classify E-Mail, by William W. Cohen
  2. 2. Motivation <ul><li>Rules are easy to comprehend, so users would be able to modify them </li></ul><ul><li>This enables an interactive system that combines automatically learned and manually written rules </li></ul><ul><li>However, rules base their decision on the presence or absence of a small number of keywords </li></ul><ul><li>The paper aims to evaluate any loss in accuracy from using keyword-spotting rules. It also evaluates runtime performance to check whether the approach is suitable for an interactive message filtering system. </li></ul>
  3. 3. What does the paper talk about? <ul><li>Two methods for learning text classifiers on personal e-mail messages </li></ul><ul><ul><li>A method based on TF-IDF weighting </li></ul></ul><ul><ul><li>A modified RIPPER rule learning algorithm for learning keyword-spotting rules </li></ul></ul><ul><li>How much accuracy is lost (if any) with keyword-spotting rules? </li></ul><ul><li>How much CPU time is needed? </li></ul><ul><li>How many examples are needed to learn an accurate rule-based classifier? </li></ul>
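To make "keyword-spotting rules" concrete, here is a minimal sketch of how such a ruleset could be evaluated. It assumes each message is a mapping from field name to the set of words in that field, and that a ruleset is a list of rules, each a conjunction of (field, keyword) tests. The rules and keywords below are invented for illustration; they are not rules reported in the paper, and this is not the RIPPER learner itself.

```python
# Hypothetical ruleset: each rule is a conjunction of (field, keyword) tests.
# A message is labeled positive if at least one rule fires.
# The keywords are invented for illustration only.
RULES = [
    [("subject", "talk")],
    [("body", "seminar"), ("body", "room")],
]


def rule_fires(rule, message):
    """A rule fires when every (field, keyword) test finds its keyword."""
    return all(keyword in message.get(field, set()) for field, keyword in rule)


def classify(message, rules=RULES):
    """Return True if any rule in the ruleset fires on the message."""
    return any(rule_fires(rule, message) for rule in rules)


# Two toy messages, already converted to sets of words per field.
announcement = {"subject": {"talk", "friday"}, "body": {"abstract", "speaker"}}
lunch = {"subject": {"lunch"}, "body": {"room", "pizza"}}

print(classify(announcement))  # the first rule matches on the subject
print(classify(lunch))         # "room" alone does not satisfy the second rule
```

Because each rule only checks for the presence of a few words, a user can read a rule and edit it directly, which is the property the motivation slide relies on.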
  4. 4. Experimental setting <ul><li>All words from the from, to and subject fields, plus the first 100 words of the body, were used </li></ul><ul><li>For modified RIPPER, these four fields were used as set-valued attributes </li></ul><ul><li>For TF-IDF, tokens of the form f_w formed the basis, where f is a field and w is a word, e.g. subject_call </li></ul>
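The two representations described on this slide can be sketched as follows. The tokenizer (lowercasing and splitting on non-alphanumeric characters) and the function names are assumptions made for this example; the slide does not say how the paper tokenized messages.

```python
import re

FIELDS = ("from", "to", "subject", "body")
BODY_WORD_LIMIT = 100  # only the first 100 body words are used


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9']+", text.lower())


def field_words(message):
    """Map each field to its list of words, truncating the body."""
    words = {}
    for field in FIELDS:
        tokens = tokenize(message.get(field, ""))
        if field == "body":
            tokens = tokens[:BODY_WORD_LIMIT]
        words[field] = tokens
    return words


def set_valued_attributes(message):
    """Rule-learner view: one set of words per field."""
    return {field: set(tokens) for field, tokens in field_words(message).items()}


def field_word_tokens(message):
    """TF-IDF view: a flat list of f_w tokens, e.g. 'subject_call'."""
    return [f"{field}_{w}"
            for field, tokens in field_words(message).items()
            for w in tokens]


msg = {
    "from": "alice@example.com",
    "to": "bob@example.com",
    "subject": "Call for papers",
    "body": "word " * 150,
}
print(set_valued_attributes(msg)["subject"])
print(field_word_tokens(msg)[:8])
```

Prefixing each word with its field keeps "call" in the subject distinct from "call" in the body, so the TF-IDF method sees the same field information that the set-valued attributes give the rule learner.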
  5. 5. Experiments <ul><li>Recognizing “talk announcements” in e-mail messages. </li></ul><ul><li>E-mail folders - A corpus of messages filed in folders was used. Each folder corresponded to a class label. Three different tests were performed using folders that were highly correlated with the sender of the email, were semantically correlated or were a combination of both (representing noisy data). </li></ul><ul><li>E-mail filters - Message categories pertaining to filtering or prioritizing were considered. Here accurate writing of manual classification rules was difficult. </li></ul>
  6. 6. Observations <ul><li>Both RIPPER and TF-IDF showed a steep learning curve </li></ul><ul><li>Overall, RIPPER did better than TF-IDF; however, TF-IDF performed best when there was little training data, and particularly well with few positive examples </li></ul><ul><li>The keyword-spotting ruleset performed well in both cases </li></ul><ul><ul><li>when there was a concise keyword-based description of a category </li></ul></ul><ul><ul><li>when the categories were semantically defined </li></ul></ul><ul><li>It suffered in runtime performance </li></ul>
  7. 7. Conclusions and Critical Remarks <ul><li>The performance of keyword-spotting rules is good enough to conclude that a system combining user-constructed and learned keyword-spotting rules is indeed viable </li></ul><ul><li>However, the paper does not show any examples of learned keyword-spotting rules to support the argument that the learned rules are intuitive enough for users to modify </li></ul><ul><li>The assumption that the first 100 words of the body capture the keywords might not work well for TF-IDF </li></ul>
  8. 8. The Power of Decision Tables, by Ron Kohavi
  9. 9. Motivation <ul><li>Decision tables are one of the simplest hypothesis spaces possible </li></ul><ul><li>They are easy to understand </li></ul><ul><li>This paper explores the power of Decision Table Majority (DTM) as a hypothesis representation </li></ul>
  10. 10. DTM (Decision Table Majority) <ul><li>A DTM has two components - </li></ul><ul><ul><li>a schema, which is a set of features, and </li></ul></ul><ul><ul><li>a body, which is a multiset of labelled instances. </li></ul></ul><ul><li>To label an unlabelled instance I, it is compared against the instances in the body, matching only on the features in the schema, to get ℐ, the set of matching instances. If ℐ is empty, return the majority class in the DTM; otherwise return the majority class in ℐ. </li></ul><ul><li>The error err(h, T) of a hypothesis h is estimated using an independent test set T. The optimal feature subset A* is the one for which the hypothesis built from it minimizes this error </li></ul>
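The labelling procedure on this slide can be sketched as below. The class name, the dictionary-based instance format and the toy weather data are assumptions made for the example. Storing the body in a dictionary keyed on the schema values is one way to make lookup a single hash access; the slide itself does not prescribe a data structure.

```python
from collections import Counter


class DecisionTableMajority:
    """Minimal DTM sketch: a schema (features) plus a body (labelled instances)."""

    def __init__(self, schema, body):
        # schema: names of the features to match on
        # body: list of (instance_dict, label) pairs
        self.schema = tuple(schema)
        # Majority class of the whole table, used when nothing matches.
        self.default = Counter(label for _, label in body).most_common(1)[0][0]
        # Group label counts by the instance's values on the schema features.
        self.table = {}
        for instance, label in body:
            self.table.setdefault(self._key(instance), Counter())[label] += 1

    def _key(self, instance):
        return tuple(instance[f] for f in self.schema)

    def predict(self, instance):
        matches = self.table.get(self._key(instance))
        if not matches:  # the set of matching instances is empty
            return self.default
        return matches.most_common(1)[0][0]


body = [
    ({"outlook": "sunny", "windy": False, "temp": "hot"}, "no"),
    ({"outlook": "sunny", "windy": True, "temp": "mild"}, "no"),
    ({"outlook": "rain", "windy": False, "temp": "mild"}, "yes"),
    ({"outlook": "rain", "windy": False, "temp": "cool"}, "yes"),
    ({"outlook": "rain", "windy": True, "temp": "cool"}, "no"),
]
dtm = DecisionTableMajority(schema=("outlook", "windy"), body=body)

# "temp" is not in the schema, so it is ignored when matching.
print(dtm.predict({"outlook": "rain", "windy": False, "temp": "hot"}))
# No instance in the body has outlook == "overcast": fall back to the majority class.
print(dtm.predict({"outlook": "overcast", "windy": False, "temp": "hot"}))
```

Ties between classes are broken here by whichever label `Counter` saw first, which is an arbitrary choice for this sketch.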
  11. 11. Inducer of DTMs (IDTM) <ul><li>Two challenges - </li></ul><ul><ul><li>The target function is unknown, so the exact error cannot be computed and must be estimated </li></ul></ul><ul><ul><li>The space of feature subsets, of size 2^n for n features, is too large to search exhaustively </li></ul></ul><ul><li>IDTM addresses the second issue by treating feature subsets as states and using best-first search to search the state space heuristically </li></ul><ul><li>For accuracy estimation, cross-validation is used </li></ul>
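A rough sketch of the idea, not the paper's implementation: feature subsets are states, each state is scored by the cross-validated accuracy of a DTM built on it, and best-first search expands the most promising state next. The simplifications here are assumptions of this example: children are generated only by adding a feature, folds are formed by striding through the data, and the search stops after `patience` expansions that do not improve on the best score.

```python
from collections import Counter
from itertools import product


def dtm_predict(schema, body, instance):
    """Majority label among body instances matching on the schema features."""
    matches = Counter(label for inst, label in body
                      if all(inst[f] == instance[f] for f in schema))
    # Fall back to the majority class of the whole body when nothing matches.
    pool = matches or Counter(label for _, label in body)
    return pool.most_common(1)[0][0]


def cv_accuracy(schema, data, folds=4):
    """k-fold cross-validated accuracy of a DTM using the given schema."""
    correct = 0
    for k in range(folds):
        test = data[k::folds]
        train = [ex for i, ex in enumerate(data) if i % folds != k]
        correct += sum(dtm_predict(schema, train, inst) == label
                       for inst, label in test)
    return correct / len(data)


def best_first_search(features, data, patience=3):
    """Search feature subsets, always expanding the best-scoring open state."""
    start = frozenset()
    open_states = {start: cv_accuracy(start, data)}
    closed = set()
    best, best_score = start, open_states[start]
    stale = 0  # expansions since the last improvement
    while open_states and stale < patience:
        state = max(open_states, key=open_states.get)
        score = open_states.pop(state)
        closed.add(state)
        if score > best_score:
            best, best_score, stale = state, score, 0
        else:
            stale += 1
        for feature in features:
            child = state | {feature}
            if child != state and child not in closed and child not in open_states:
                open_states[child] = cv_accuracy(child, data)
    return best, best_score


# Toy data: the label depends only on feature "a"; "b" and "c" are irrelevant.
data = [({"a": a, "b": b, "c": c}, a)
        for a, b, c in product([0, 1], repeat=3)] * 2
subset, accuracy = best_first_search(["a", "b", "c"], data)
print(sorted(subset), accuracy)
```

On this toy data the search settles on the single relevant feature: adding the irrelevant ones never raises the estimated accuracy, so they are left out of the schema.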
  12. 12. Experiments and results <ul><li>Datasets with discrete and continuous features were used </li></ul><ul><li>IDTM was compared against C4.5 using ten-fold cross-validation </li></ul><ul><li>Results demonstrate that IDTM can achieve high accuracy in discrete domains using the simplest hypothesis space, that of DTMs </li></ul><ul><li>Even with continuous features, IDTM outperforms C4.5 on some of the datasets, contrary to expectation </li></ul>
  13. 13. Conclusions <ul><li>DTMs are suited to concepts where features interact and only a few features are relevant </li></ul><ul><li>An algorithm like IDTM, which employs best-first search, is able to capture these global interactions and produce a good feature subset, although a heuristic search does not guarantee the optimal one </li></ul><ul><li>DTMs are suited to real-time applications </li></ul><ul><li>IDTM could also be used to select a feature subset that is then passed to a more complex algorithm </li></ul>