Recognizing “talk announcements” in e-mail messages.
E-mail folders - A corpus of messages filed in folders was used, with each folder corresponding to a class label. Three different tests were performed, using folders that were highly correlated with the sender of the e-mail, folders that were semantically correlated, or a combination of both (representing noisy data).
E-mail filters - Message categories pertaining to filtering or prioritizing were considered. Here, writing accurate classification rules by hand was difficult.
Both RIPPER and TF-IDF showed a steep learning curve
Overall, RIPPER did better than TF-IDF; however, TF-IDF performed best when there was little training data, and particularly well with few positive examples
The keyword-spotting ruleset performed well in both cases
when there was a concise keyword-based description of a category
when the categories were semantically defined
It suffered in runtime performance
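To make the idea of a keyword-spotting rule concrete, here is a minimal sketch of a hand-written rule for the "talk announcement" task. The word lists and the rule itself are hypothetical illustrations, not rules taken from the paper; the paper's learned rules are RIPPER-style conditions over words appearing in the message.

```python
# Hypothetical keyword-spotting rule for recognizing talk announcements.
# The keyword choices are illustrative assumptions, not from the paper.
def is_talk_announcement(words):
    """Fire if the message mentions both a talk-like word and a
    venue/speaker-like word."""
    words = set(words)
    return ("talk" in words or "seminar" in words) and \
           ("room" in words or "speaker" in words)

msg = "reminder: ai seminar today, speaker jane doe, room 123".split()
print(is_talk_announcement(msg))  # True
print(is_talk_announcement("lunch plans for friday".split()))  # False
```

Rules of this form are cheap to evaluate (set membership tests), which is consistent with the note above that such rulesets are fast enough for interactive use.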
Conclusions and Critical Remarks
The performance of keyword-spotting rules is efficient enough to conclude that a system combining user-constructed and learned keyword-spotting rules is indeed viable
The paper, however, does not present any examples of the learned keyword-spotting rules to support the claim that the learned rules are intuitive enough for users to modify
The assumption that the first 100 words of the body capture the keywords might not work well for TF-IDF
The Power of Decision Tables - Ron Kohavi
Decision Tables are one of the simplest hypothesis spaces possible
They are easy to understand
This paper explores the power of Decision Table Majority (DTM) as a hypothesis representation
DTM (Decision Table Majority)
A DTM has two components -
a schema, which is a set of features, and
a body, which is a multiset of labelled instances.
To label an unlabeled instance I, it is compared against the instances in the body to get I′, the set of matching instances. If I′ is empty, the majority class of the entire DTM is returned; otherwise, the majority class in I′ is returned.
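The lookup described above can be sketched as follows. This is a minimal illustration under the stated definitions; the schema, body, and feature names are made up for the example.

```python
from collections import Counter

def dtm_classify(schema, body, instance):
    """Classify `instance` with a Decision Table Majority (DTM).
    schema: list of feature names; body: list of (features_dict, label) pairs.
    Sketch of the lookup described in the notes, not Kohavi's implementation."""
    # Find stored instances that match on every schema feature.
    matches = [label for feats, label in body
               if all(feats[f] == instance[f] for f in schema)]
    if not matches:
        # No match: fall back to the majority class of the whole table.
        matches = [label for _, label in body]
    return Counter(matches).most_common(1)[0][0]

# Toy table with two schema features (hypothetical data).
schema = ["outlook", "windy"]
body = [({"outlook": "sunny", "windy": False}, "play"),
        ({"outlook": "sunny", "windy": False}, "play"),
        ({"outlook": "rain",  "windy": True},  "stay")]

print(dtm_classify(schema, body, {"outlook": "sunny", "windy": False}))  # play
print(dtm_classify(schema, body, {"outlook": "overcast", "windy": True}))  # play (global majority)
```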
The error err(h, T) is estimated using an independent test set T, h being the hypothesis. The optimal feature subset A* is then the one from which a hypothesis can be built that minimizes this error
Inducer of DTMs (IDTM)
Two challenges -
The target function is unknown, so the exact error cannot be computed
the feature-subset space of size 2^n for n features is too large to search exhaustively
IDTM addresses the second issue by treating feature subsets as states and using best-first search to heuristically explore the state space
For accuracy estimation, cross-validation is used
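The search described above can be sketched as a best-first search over feature subsets. This is a simplified illustration: the `evaluate` function stands in for IDTM's cross-validated accuracy estimate of the DTM built on each subset, and the toy scoring function, feature names, and stopping rule (`max_stale`) are all assumptions for the example.

```python
import heapq

def best_first_subset_search(features, evaluate, max_stale=5):
    """Best-first search over feature subsets, expanding a state by
    adding one feature at a time. `evaluate(subset)` would be a
    cross-validated accuracy estimate in IDTM; here it is supplied.
    Stops after `max_stale` expansions with no improvement."""
    start = frozenset()
    best, best_score = start, evaluate(start)
    # Min-heap on negated scores simulates a max-heap; sorted list breaks ties.
    frontier = [(-best_score, sorted(start))]
    visited = {start}
    stale = 0
    while frontier and stale < max_stale:
        _, state_list = heapq.heappop(frontier)
        state = frozenset(state_list)
        improved = False
        for f in features:
            if f in state:
                continue
            child = state | {f}
            if child in visited:
                continue
            visited.add(child)
            score = evaluate(child)
            heapq.heappush(frontier, (-score, sorted(child)))
            if score > best_score:
                best, best_score, improved = child, score, True
        stale = 0 if improved else stale + 1
    return best, best_score

# Toy evaluation: pretend "a" and "c" are the relevant features,
# with a small penalty for irrelevant ones (purely illustrative).
target = {"a", "c"}
score = lambda fs: len(fs & target) - 0.1 * len(fs - target)
subset, s = best_first_subset_search(["a", "b", "c", "d"], score)
print(sorted(subset))  # ['a', 'c']
```

Because best-first search can back up to an earlier promising state, it can recover subsets whose features only help in combination, which a purely greedy search would miss.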
Experiments and results
Datasets with discrete and continuous features were used
IDTM was compared against C4.5 using ten-fold cross validation
Results demonstrate that IDTM can achieve high accuracy in discrete domains using the simplest hypothesis space of DTMs
Even in the case of continuous features, IDTM outperforms C4.5 on some of the datasets, contrary to expectation
DTMs are suited to concepts where only a few features are relevant but those features interact
An algorithm like IDTM, employing best-first search, is able to capture these global interactions, producing a good feature subset
suited for real-time applications
could also be used to select a feature subset that is then fed into a more complex algorithm