Decision tables

  • Learning Rules that Classify E-Mail (William W. Cohen)
  • Motivation
    • Rules are easier to comprehend, so users would be able to modify them
    • An interactive system could be based on a combination of automatic learning and manual rule construction
    • However, rules base their decisions on the presence or absence of a small number of keywords
    • The paper aims to evaluate any loss in accuracy from using keyword-spotting rules. It also evaluates runtime performance to check whether the approach fits an interactive message-filtering system.
  • What does the paper talk about?
    • Two methods for learning text classifiers on personal e-mail messages
      • A method based on TF-IDF weighting
      • A modified RIPPER rule-learning algorithm for learning keyword-spotting rules
    • How much accuracy is lost (if any) with keyword-spotting rules?
    • How much CPU time is needed?
    • How many examples are needed to learn an accurate rule-based classifier?
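To make the idea of a keyword-spotting rule concrete, here is a minimal sketch. It assumes a ruleset is a disjunction of rules and each rule is a conjunction of "keyword is present in field" tests. The message, the rules, and the names `RULES`, `rule_fires`, and `classify` are hypothetical illustrations, not rules learned in the paper.

```python
# Hypothetical message: each field is a set-valued attribute (a set of words).
message = {
    "from": {"alice", "example", "edu"},
    "to": {"seminar-list"},
    "subject": {"talk", "on", "rule", "learning"},
    "body": {"the", "talk", "will", "be", "held", "in", "room", "101"},
}

# Hypothetical ruleset for the class "talk announcement".
# Each rule is a list of (field, keyword) tests that must all hold.
RULES = [
    [("subject", "talk")],
    [("body", "seminar"), ("body", "room")],
]


def rule_fires(rule, msg):
    """A rule fires when every keyword it names is present in the given field."""
    return all(word in msg.get(field, set()) for field, word in rule)


def classify(rules, msg):
    """The message is labelled positive if at least one rule fires."""
    return any(rule_fires(rule, msg) for rule in rules)


print(classify(RULES, message))
```

Because each rule only tests for the presence of a few words, a user can read the ruleset directly and add, delete, or edit individual tests.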
  • Experiments setting
    • All words from the from, to, and subject fields and the first 100 words of the body were used
    • For modified RIPPER, these four fields were used as set-valued attributes
    • For TF-IDF, tokens of the form f_w formed the basis, where f is a field and w is a word, e.g. subject_call
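The f_w token scheme can be sketched as follows. Whitespace tokenization, lowercasing, and the function name `field_tokens` are assumptions made for illustration; the slides do not say how the paper tokenized messages.

```python
def field_tokens(message, body_limit=100):
    """Turn a message (dict of field -> text) into tokens of the form field_word.

    Only the first `body_limit` body words are kept, mirroring the
    "first 100 words" setting described above.
    """
    tokens = []
    for field in ("from", "to", "subject"):
        words = message.get(field, "").lower().split()
        tokens += [f"{field}_{w}" for w in words]
    body_words = message.get("body", "").lower().split()[:body_limit]
    tokens += [f"body_{w}" for w in body_words]
    return tokens


# Hypothetical message used only to show the token format.
example = {"subject": "Call for papers", "body": "Please call me back"}
print(field_tokens(example, body_limit=2))
```

Prefixing each word with its field keeps "call" in the subject distinct from "call" in the body, so the two can receive different weights.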
  • Experiments
    • Recognizing “talk announcements” in e-mail messages.
    • E-mail folders - A corpus of messages filed in folders was used. Each folder corresponded to a class label. Three different tests were performed, using folders that were highly correlated with the sender of the e-mail, folders that were semantically correlated, or a combination of both (representing noisy data).
    • E-mail filters - Message categories pertaining to filtering or prioritizing were considered. Here, writing accurate classification rules by hand was difficult.
  • Observations
    • Both RIPPER and TF-IDF showed a steep learning curve
    • Overall, RIPPER did better than TF-IDF; however, TF-IDF performed best when there was little training data, and particularly well with few positive examples
    • The keyword-spotting ruleset performed well in both cases
      • when there was a concise keyword-based description of a category
      • when the categories were semantically defined
    • It suffered in runtime performance
  • Conclusions and Critical Remarks
    • Keyword-spotting rules perform well enough to conclude that a system combining user-constructed and learned keyword-spotting rules is indeed viable
    • The paper, however, does not present any examples of the learned keyword-spotting rules to support the argument that the learned rules are intuitive enough for users to modify
    • The assumption that the first 100 words of the body capture the keywords might not work well for TF-IDF
  • The Power of Decision Tables (Ron Kohavi)
  • Motivation
    • Decision tables are one of the simplest hypothesis spaces possible
    • They are easy to understand
    • This paper explores the power of Decision Table Majority (DTM) for hypothesis representation
  • DTM (Decision Table Majority)
    • A DTM has two components:
      • a schema, which is a set of features, and
      • a body, which is a multiset of labelled instances.
    • To label an unlabeled instance I, it is compared against the instances in the body, using only the features in the schema, to get ℐ, the set of matching instances. If ℐ is empty, return the majority class in the DTM; otherwise return the majority class in ℐ.
    • The error err(h, T) of a hypothesis h is estimated using an independent test set T. The optimal feature subset A* is the one for which the resulting hypothesis has the minimum error.
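The labelling procedure above can be sketched in a few lines. The data layout (instances as dicts, the body as a list of `(instance, label)` pairs), the toy weather data, and the tie-breaking behaviour (the first label encountered wins) are assumptions for illustration only.

```python
from collections import Counter


def majority(labels):
    """Most frequent label; ties go to the label seen first (an assumption)."""
    return Counter(labels).most_common(1)[0][0]


def dtm_classify(schema, body, instance):
    """Label `instance` with a Decision Table Majority classifier.

    schema: list of feature names used for matching
    body:   list of (features_dict, label) pairs, i.e. a multiset of labelled instances
    """
    key = tuple(instance[f] for f in schema)
    matches = [label for feats, label in body
               if tuple(feats[f] for f in schema) == key]
    if not matches:
        # No matching instance: fall back to the majority class of the whole table.
        return majority(label for _, label in body)
    return majority(matches)


# Hypothetical toy table.
body = [
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "rain", "windy": False}, "yes"),
    ({"outlook": "rain", "windy": True}, "no"),
    ({"outlook": "rain", "windy": False}, "yes"),
]
schema = ["outlook", "windy"]

print(dtm_classify(schema, body, {"outlook": "rain", "windy": False}))
print(dtm_classify(schema, body, {"outlook": "overcast", "windy": False}))
```

The linear scan is only for clarity. Storing the body in a hash table keyed on the schema values would make each lookup constant time on average.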
  • Inducer of DTMs (IDTM)
    • Two challenges:
      • The target function is unknown, so the exact error cannot be computed and must be estimated
      • The feature subset space, of size 2^n for n features, is too large to search exhaustively
    • IDTM addresses the second issue by treating feature subsets as states and using best-first search to heuristically search the state space
    • For accuracy estimation, cross-validation is used
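A rough sketch of how best-first search over feature subsets could be combined with cross-validated accuracy is shown below. The operators (toggle one feature in or out), the stopping rule (`patience` consecutive expansions without improvement), the fold assignment, and all function names are assumptions for illustration. The slides do not give the exact settings used by IDTM.

```python
import heapq
from collections import Counter
from itertools import count, product


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def dtm_classify(schema, body, instance):
    key = tuple(instance[f] for f in schema)
    matches = [y for x, y in body if tuple(x[f] for f in schema) == key]
    return majority(matches) if matches else majority(y for _, y in body)


def cv_accuracy(schema, data, folds):
    """k-fold cross-validation estimate of DTM accuracy for one schema."""
    correct = 0
    for k in range(folds):
        test = data[k::folds]
        train = [row for i, row in enumerate(data) if i % folds != k]
        correct += sum(dtm_classify(schema, train, x) == y for x, y in test)
    return correct / len(data)


def best_first_search(features, data, folds=4, patience=5):
    """Heuristic search of the 2^n feature subsets, starting from the empty set."""
    start = frozenset()
    best, best_score = start, cv_accuracy(sorted(start), data, folds)
    tie = count()  # tie-breaker so the heap never has to compare frozensets
    open_heap = [(-best_score, next(tie), start)]
    seen = {start}
    stale = 0
    while open_heap and stale < patience:
        _, _, state = heapq.heappop(open_heap)  # most promising state so far
        improved = False
        for f in features:
            child = state ^ {f}  # add f if absent, remove it if present
            if child in seen:
                continue
            seen.add(child)
            score = cv_accuracy(sorted(child), data, folds)
            heapq.heappush(open_heap, (-score, next(tie), child))
            if score > best_score:
                best, best_score, improved = child, score, True
        stale = 0 if improved else stale + 1
    return best, best_score


# Hypothetical data: the label is a XOR b, and c is irrelevant.
rows = [({"a": a, "b": b, "c": c}, a ^ b)
        for a, b, c in product([0, 1], repeat=3)] * 2
best, score = best_first_search(["a", "b", "c"], rows)
print(sorted(best), score)
```

In this toy problem neither a nor b is useful alone, so a greedy forward search that stops at the first non-improving step would miss the pair. Best-first search keeps unexpanded states on the open list and can return to them.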
  • Experiments and results
    • Datasets with discrete and continuous features were used
    • IDTM was compared against C4.5 using ten-fold cross validation
    • Results demonstrate that IDTM can achieve high accuracy in discrete domains using the simplest hypothesis space, that of DTMs
    • Even with continuous features, IDTM outperforms C4.5 on some of the datasets, contrary to expectation
  • Conclusions
    • DTMs are suited to concepts where the features interact and only a few features are relevant
    • An algorithm like IDTM, which employs best-first search, is able to capture these global interactions and find a good feature subset, although a heuristic search does not guarantee the optimal one
    • DTMs are suited to real-time applications
    • IDTM could also be used to select a feature subset that is then passed on to a more complex algorithm