Experiments with Randomisation and Boosting for Multi-instance Classification

A fairly recent development in the WEKA software has been the addition of algorithms for multi-instance classification, in particular, methods for ensemble learning. Ensemble classification is a well-known approach for obtaining highly accurate classifiers for single-instance data. This talk will first discuss how randomisation can be applied to multi-instance data by adapting Blockeel et al.'s multi-instance tree inducer to form an ensemble classifier, and then investigate how Maron's diverse density learning method can be used as a weak classifier to form an ensemble using boosting. Experimental results show the benefit of ensemble learning in both cases.

1. Experiments with Randomisation and Boosting for Multi-instance Classification
   Luke Bjerring, James Foulds, Eibe Frank
   University of Waikato
   September 13, 2011

2. What's in this talk?
   • What is multi-instance learning?
   • Basic multi-instance data format in WEKA
   • The standard assumption in multi-instance learning
   • Learning decision trees and rules
   • Ensembles using randomisation
   • Diverse density learning
   • Boosting diverse density learning
   • Experimental comparison
   • Conclusions

3. Multi-instance learning
   • Generalized (supervised) learning scenario where each example for learning is a bag of instances
   [Figure: a single-instance model maps one feature vector to a classification; a multi-instance model maps multiple feature vectors (a bag) to a classification. Based on a diagram in Dietterich et al. (1997)]

4. Example applications
   • Applicable whenever an object can best be represented as an unordered collection of instances
   • Two popular application areas in the literature:
     − Image classification (e.g. does an image contain a tiger?)
       • Approach: image is split into regions, each region becomes an instance described by a fixed-length feature vector
       • Motivation for MI learning: location of object not important for classification, some “key” regions determine outcome
     − Activity of molecules (e.g. does molecule smell musky?)
       • Approach: instances describe possible conformations in 3D space, based on fixed-length feature vector
       • Motivation for MI learning: conformations cannot easily be ordered, only some responsible for activity

5. Multi-instance data in WEKA
   • Bag of data given as the value of a relation-valued attribute
   • Each example consists of a bag identifier, the instances in the bag (the relational value), and a class label

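   A minimal sketch of reading such a file and looking inside the bags (the file name and attribute positions are hypothetical; isRelationValued() and relationalValue() are the standard WEKA API for relation-valued attributes):

      import weka.core.Instance;
      import weka.core.Instances;
      import weka.core.converters.ConverterUtils.DataSource;

      public class InspectBags {
        public static void main(String[] args) throws Exception {
          // Load a multi-instance ARFF file (hypothetical path)
          Instances data = DataSource.read("musk1.arff");
          data.setClassIndex(data.numAttributes() - 1);   // class label is the last attribute

          // Assumed layout: attribute 0 = bag identifier, attribute 1 = relation-valued bag contents
          System.out.println("Bag attribute is relation-valued: "
              + data.attribute(1).isRelationValued());

          for (int i = 0; i < data.numInstances(); i++) {
            Instance bag = data.instance(i);
            Instances contents = bag.relationalValue(1);   // the instances inside this bag
            System.out.println("Bag " + bag.stringValue(0) + ": "
                + contents.numInstances() + " instances, class = "
                + bag.classAttribute().value((int) bag.classValue()));
          }
        }
      }
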
6. What's the big deal?
   • Multi-instance learning is challenging because instance-level classifications are assumed to be unknown
     − Algorithm is told that an image contains a tiger, but not which regions are “tiger-like”
     − Similarly, a molecule is known to be active (or inactive), but algorithm is not told which conformation is responsible for this
   • Basic (standard) assumption in MI learning: bag is positive iff it contains at least one positive instance
     − Example: molecule is active if at least one conformation is active, and inactive otherwise
   • Generalizations of this are possible that assume interactions between instances in a bag
   • Alternative: instances contribute collectively to bag label

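   In symbols, with y_{ij} denoting the (hidden) label of instance j in bag i and Y_i the observed bag label, the standard assumption can be written as:

      Y_i = \max_j \, y_{ij} \qquad \text{i.e. } Y_i = 1 \iff \exists j : y_{ij} = 1
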
7. A synthetic example
   • 10 positive/negative bags, 10 instances per bag

8. A synthetic example
   • Bag positive iff at least one instance in (0.4,0.6)x(0.4,0.6)

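   A minimal sketch of how data like this could be generated (plain Java, not the code behind the slides; bag counts and the positive region follow the description above):

      import java.util.Random;

      public class SyntheticBags {
        public static void main(String[] args) {
          Random rnd = new Random(42);
          int bagsPerClass = 10, instancesPerBag = 10;
          int pos = 0, neg = 0;
          // Keep drawing random bags until we have 10 of each class
          while (pos < bagsPerClass || neg < bagsPerClass) {
            double[][] bag = new double[instancesPerBag][2];
            boolean positive = false;
            for (int j = 0; j < instancesPerBag; j++) {
              bag[j][0] = rnd.nextDouble();
              bag[j][1] = rnd.nextDouble();
              // Bag is positive iff at least one instance falls into (0.4,0.6) x (0.4,0.6)
              if (bag[j][0] > 0.4 && bag[j][0] < 0.6
                  && bag[j][1] > 0.4 && bag[j][1] < 0.6) {
                positive = true;
              }
            }
            if (positive && pos < bagsPerClass) {
              pos++;   // here the bag would be stored with a positive label
            } else if (!positive && neg < bagsPerClass) {
              neg++;   // here the bag would be stored with a negative label
            }
          }
          System.out.println("Generated " + pos + " positive and " + neg + " negative bags");
        }
      }
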
9. Assigning bag labels to instances...
   • 100 positive/negative bags, 10 instances per bag

10. Partitioning generated by C4.5
    • Many leaf nodes, only one of them matters...

11. Partitioning generated by C4.5
    • Many leaf nodes, only one of them matters...

12. Blockeel et al.'s MITI tree learner
    • Idea: home in on big positive leaf node, remove instances associated with that leaf node

      y <= 0.3942 : 443 [0 / 443] (-)
      y > 0.3942 : 1189
      |   y <= 0.6004 : 418
      |   |   x <= 0.6000 : 262
      |   |   |   x <= 0.3676 : 59 [0 / 59] (-)
      |   |   |   x > 0.3676 : 128
      |   |   |   |   x <= 0.3975 : 2 [0 / 2] (-)
      |   |   |   |   x > 0.3975 : 118
      |   |   |   |   |   y <= 0.3989 : 1 [0 / 1] (-)
      |   |   |   |   |   y > 0.3989 : 116 [116 / 0] (+)
      |   |   x > 0.6000 : 88 [0 / 88] (-)
      |   y > 0.6004 : 407 [0 / 407] (-)

13. How MITI works
    • Two key modifications compared to standard top-down decision tree inducers:
      − Nodes are expanded in best-first manner, based on proportion of positive instances (→ identify positive leaf nodes early)
      − Once a positive leaf node has been found, all bags associated with this leaf node are removed from the training data (→ all other instances in these bags are irrelevant)
    • Blockeel et al. also use a special-purpose splitting criterion and a biased estimate of the proportion of positives
    • Our experiments indicate that it is better to use the Gini index and an unbiased estimate of the proportion
      → Trees are generally slightly more accurate and substantially smaller (also affects runtime)

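    The Gini index mentioned here is just the usual two-class impurity computed over the instances at a node, with each instance inheriting its bag's label. A minimal sketch (plain Java, not taken from the WEKA implementation):

       public class GiniDemo {
         // Gini impurity of a node containing 'pos' positive and 'neg' negative
         // instances, where every instance simply inherits the label of its bag.
         static double gini(int pos, int neg) {
           int n = pos + neg;
           if (n == 0) return 0.0;
           double pPos = (double) pos / n;
           double pNeg = (double) neg / n;
           return 1.0 - pPos * pPos - pNeg * pNeg;
         }

         public static void main(String[] args) {
           System.out.println(gini(116, 0));  // pure positive leaf: impurity 0.0
           System.out.println(gini(10, 10));  // evenly mixed node: impurity 0.5
         }
       }
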
14. Learning rules: MIRI
    • Conceptual drawback of the MITI tree learner: deactivated data may have already been used to grow other branches
    • Simple fix based on separate-and-conquer rule learning using partial trees:
      − When a positive leaf is found, make the path to this leaf into an if-then rule and discard the rest of the tree
      − Start (partial) tree generation from scratch on the remaining data to generate the next rule
      − Stop when no positive leaf can be made; add a default rule
    • Experiments show: the resulting rule learner (MIRI) has similar classification accuracy to MITI
    • However: rule sets are much more compact than the corresponding decision trees

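    For example, applied to the partial tree on slide 12, the single positive leaf becomes the rule

       (y > 0.3942) and (y <= 0.6004) and (x <= 0.6000) and (x > 0.3676) and (x > 0.3975) and (y > 0.3989) => positive

    the rest of that tree is discarded, and all bags covered by this rule are removed before the next rule is grown.
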
15. Random forests for MI learning
    • Random forests are well-known to be high-performance ensemble classifiers in single-instance learning
    • Straightforward to adapt MITI to learn semi-random decision trees from multi-instance data
      − At each node, choose a random fixed-size subset of attributes, then choose the best split amongst those
      − Also possible to apply semi-random node expansion (not best-first), but this yields little benefit
    • Can trivially apply this to MIRI rule learning as well: it's based on partially grown MITI trees
    • Ensemble can be generated in WEKA using the RandomCommittee meta classifier

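    A minimal sketch of building such an ensemble in WEKA (assumes the multi-instance learning package is installed; the MITI class name comes from that package and the data file path is hypothetical):

       import java.util.Random;

       import weka.classifiers.Evaluation;
       import weka.classifiers.meta.RandomCommittee;
       import weka.classifiers.mi.MITI;
       import weka.core.Instances;
       import weka.core.converters.ConverterUtils.DataSource;

       public class MIRandomForest {
         public static void main(String[] args) throws Exception {
           // Multi-instance ARFF file (hypothetical path)
           Instances data = DataSource.read("musk1.arff");
           data.setClassIndex(data.numAttributes() - 1);

           // Semi-random MITI trees: set MITI's random attribute-subset option
           // here to get the splits described above (the exact flag depends on
           // the installed package version).
           MITI base = new MITI();

           RandomCommittee forest = new RandomCommittee();
           forest.setClassifier(base);
           forest.setNumIterations(100);   // 100 semi-random trees

           Evaluation eval = new Evaluation(data);
           eval.crossValidateModel(forest, data, 10, new Random(1));
           System.out.println(eval.toSummaryString());
         }
       }
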
16. Some experimental results: MITI

17. Some experimental results: MIRI

18. Maron's diverse density learning
    • Idea: identify point x in instance space where positive bags overlap, centre a bell-shaped function at this point
    • Using this function, the probability that instance Bij is positive, based on current hypothesis h, is assumed to be a bell-shaped function of the distance to x, where hypothesis h includes the location x but also a feature scaling vector s
    • Instance-level probabilities are turned into bag-level probabilities using the noisy-or function

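    For reference, the standard diverse density quantities being referred to (reconstructed from Maron's formulation, with B_{ijk} the k-th attribute value of instance j in bag i) are:

       \Pr(B_{ij}\text{ is positive} \mid h) = \exp\Bigl(-\textstyle\sum_k s_k^2\,(B_{ijk} - x_k)^2\Bigr)

       \Pr(B_i\text{ is positive} \mid h) = 1 - \prod_j \bigl(1 - \Pr(B_{ij}\text{ is positive} \mid h)\bigr)
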
19. Boosting diverse density learning
    • Point x and scaling vector s are found using gradient descent by maximising the bag-level likelihood
    • Problem: very slow; takes a very long time to converge
    • QuickDD heuristic: find the best point x first, using a fixed scaling vector s, then optimise s; if necessary, iterate
    • Much faster, similar accuracy on benchmark data (also compares favourably to subsampling-based EMDD)
    • Makes it computationally practical to apply boosting (RealAdaboost) to improve accuracy:
      − In this case, QuickDD is applied with weighted likelihood, symmetric learning, and a localised model

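    A sketch of how the boosted ensemble can be assembled in WEKA (RealAdaBoost and QuickDD are distributed as separate WEKA packages; the class names RealAdaBoost and QuickDDIterative below are my assumption of how the schemes appear there, so check the installed packages):

       import java.util.Random;

       import weka.classifiers.Evaluation;
       import weka.classifiers.meta.RealAdaBoost;
       import weka.classifiers.mi.QuickDDIterative;
       import weka.core.Instances;
       import weka.core.converters.ConverterUtils.DataSource;

       public class BoostedQuickDD {
         public static void main(String[] args) throws Exception {
           // Multi-instance ARFF file (hypothetical path)
           Instances data = DataSource.read("musk1.arff");
           data.setClassIndex(data.numAttributes() - 1);

           RealAdaBoost booster = new RealAdaBoost();
           booster.setClassifier(new QuickDDIterative());  // QuickDD as the weak learner
           booster.setNumIterations(10);                   // number of boosting iterations

           Evaluation eval = new Evaluation(data);
           eval.crossValidateModel(booster, data, 10, new Random(1));
           System.out.println(eval.toSummaryString());
         }
       }
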
20. Some experimental results: Boosted DD

21. So how do the ensembles compare?

22. But: improvement on “naive” methods?
    • Can apply standard single-instance random forests to multi-instance data using data transformations...

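    For comparison, such a “naive” setup can be put together with WEKA's propositionalisation wrappers, e.g. SimpleMI, which collapses each bag into a single instance (for example by averaging) so that a standard random forest can be applied (class name as in WEKA's multi-instance learners; the file path is hypothetical):

       import java.util.Random;

       import weka.classifiers.Evaluation;
       import weka.classifiers.mi.SimpleMI;
       import weka.classifiers.trees.RandomForest;
       import weka.core.Instances;
       import weka.core.converters.ConverterUtils.DataSource;

       public class NaiveMIForest {
         public static void main(String[] args) throws Exception {
           Instances data = DataSource.read("musk1.arff");   // multi-instance ARFF (hypothetical path)
           data.setClassIndex(data.numAttributes() - 1);

           // SimpleMI transforms each bag into one fixed-length instance,
           // so any single-instance learner such as RandomForest can be used.
           SimpleMI wrapper = new SimpleMI();
           wrapper.setClassifier(new RandomForest());

           Evaluation eval = new Evaluation(data);
           eval.crossValidateModel(wrapper, data, 10, new Random(1));
           System.out.println(eval.toSummaryString());
         }
       }
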
23. Summary
    • MITI and MIRI are fast methods for learning compact decision trees and rule sets for MI data
    • Randomisation for ensemble learning yields significantly improved accuracy in both cases
    • The heuristic QuickDD variant of diverse density learning makes it computationally practical to boost DD learning
    • Boosting yields substantially improved accuracy
    • Neither boosting nor randomisation has a clear advantage in accuracy, but randomisation is much faster
    • However: only a marginal improvement in accuracy compared to “naive” methods

24. Where in WEKA?
    • Location of multi-instance learners in the Explorer GUI: under weka > classifiers > mi in the classifier chooser
    • Available via package manager in WEKA 3.7, which also provides MITI, MIRI, and QuickDD

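    From the command line, the relevant package can be installed with the WEKA package manager (the package name multiInstanceLearning is my assumption; check the package listing for your WEKA version):

       java weka.core.WekaPackageManager -install-package multiInstanceLearning
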
25. Details on QuickDD for RealAdaboost
    • Weights in RealAdaboost are updated using the odds ratio
    • A weighted conditional likelihood is used in QuickDD
    • The QuickDD model is thresholded at 0.5 probability to achieve a local effect on weight updates
    • Symmetric learning is applied (i.e. both classes are tried as the positive class in turn)
      − Of the two models, the one that maximises the weighted conditional likelihood is added into the ensemble

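    For reference, the standard Real AdaBoost quantities these bullets refer to, reconstructed here (treat the exact form, and in particular the reading of the thresholding step, as my interpretation rather than the slide's own formulas):

       w_i \leftarrow w_i \left(\frac{1-\hat p(B_i)}{\hat p(B_i)}\right)^{y_i/2}, \qquad y_i \in \{-1,+1\}
       \qquad\text{(odds-ratio weight update)}

       LL_w(h) = \sum_i w_i \log \Pr(Y_i \mid B_i, h)
       \qquad\text{(weighted conditional likelihood)}

       \hat p(B_i) = \max\bigl(0.5,\; \Pr(Y_i = 1 \mid B_i, h)\bigr)
       \qquad\text{(thresholding at 0.5, so bags outside the positive region leave the weights unchanged)}
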
26. Random forest vs. bagging and boosting
