A fairly recent development in the WEKA software has been the addition of algorithms for multi-instance classification, in particular, methods for ensemble learning. Ensemble classification is a well-known approach for obtaining highly accurate classifiers for single-instance data. This talk will first discuss how randomisation can be applied to multi-instance data by adapting Blockeel et al.'s multi-instance tree inducer to form an ensemble classifier, and then investigate how Maron's diverse density learning method can be used as a weak classifier to form an ensemble using boosting. Experimental results show the benefit of ensemble learning in both cases.
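The talk boosts Maron's diverse density method as a weak learner on multi-instance data. WEKA's multi-instance code is not reproduced here, but the analogous single-instance setup is easy to sketch with scikit-learn: AdaBoost over decision stumps. This is a hedged stand-in for illustration, not the talk's actual method.

```python
# Illustrative only: AdaBoost with a weak learner (the default base
# estimator is a depth-1 decision stump), evaluated by cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(boosted, X, y, cv=5)
print(round(scores.mean(), 3))
```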
This document provides an introduction to ensemble learning techniques. It defines ensemble learning as combining the predictions of multiple machine learning models. The main ensemble methods described are bagging, boosting, and voting. Bagging involves training models on random subsets of data and combining results by majority vote. Boosting iteratively trains models to focus on misclassified examples from previous models. Voting simply averages the predictions of different model types. The document discusses how these techniques are implemented in scikit-learn and provides examples of decision tree bagging on the Iris dataset.
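The decision-tree bagging example the summary mentions can be sketched in a few lines of scikit-learn; this is a minimal illustration, not the document's own code.

```python
# Bagging on Iris: trees trained on bootstrap samples, combined by vote.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# The default base estimator is a decision tree; each of the 25 trees is
# trained on a random bootstrap sample and predictions are combined by
# majority vote.
bag = BaggingClassifier(n_estimators=25, random_state=0)
bag.fit(X_tr, y_tr)
print(round(bag.score(X_te, y_te), 3))
```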
How Machine Learning Helps Organizations to Work More Efficiently? - Tuan Yang
Data is increasing day by day, and so is the cost of storing and handling it. By understanding the concepts of machine learning, however, an organization can handle this excess data and process it affordably.
The process involves building models with several kinds of algorithms. If a model is built precisely for a given task, organizations stand a good chance of seizing profitable opportunities and avoiding the risks lurking behind the scenes.
Learn more about:
» Understanding Machine Learning Objectives.
» Data dimensions in Machine Learning.
» Fundamentals of Algorithms and Mapping from Input/Output.
» Parametric and Non-parametric Machine Learning Algorithms.
» Supervised, Unsupervised and Semi-Supervised Learning.
» Estimating Over-fitting and Under-fitting.
» Use Cases.
Machine learning can be used to predict whether a user will purchase a book on an online book store. Features about the user, book, and user-book interactions can be generated and used in a machine learning model. A multi-stage modeling approach could first predict if a user will view a book, and then predict if they will purchase it, with the predicted view probability as an additional feature. Decision trees, logistic regression, or other classification algorithms could be used to build models at each stage. This approach aims to leverage user data to provide personalized book recommendations.
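The two-stage idea can be sketched with synthetic data: stage 1 predicts whether a user views a book, and its predicted probability is appended as an extra feature for the stage-2 purchase model. All feature and label names below are illustrative, not from the source.

```python
# Hedged sketch of a two-stage model on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))              # user/book/interaction features
viewed = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)
purchased = ((X[:, 1] > 0) & (viewed == 1)).astype(int)

stage1 = LogisticRegression().fit(X, viewed)
p_view = stage1.predict_proba(X)[:, [1]]   # predicted view probability
X_stage2 = np.hstack([X, p_view])          # added as an extra feature
stage2 = LogisticRegression().fit(X_stage2, purchased)
print(round(stage2.score(X_stage2, purchased), 3))
```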
IRJET - A Survey on Machine Learning Algorithms, Techniques and Applications - IRJET Journal
This document discusses machine learning algorithms, techniques, and applications. It begins with an introduction to machine learning and different types of learning including supervised learning, unsupervised learning, reinforcement learning, and others. It then groups various machine learning algorithms based on similarities and compares the performance of popular algorithms like Naive Bayes, support vector machines, and decision trees. The document concludes that machine learning researchers aim to design more efficient algorithms that can perform better across different domains.
Hierarchical clustering methods create a hierarchy of clusters based on distance or similarity measures. They do not require specifying the number of clusters k in advance. Hierarchical methods either merge smaller clusters into larger ones (agglomerative) or split larger clusters into smaller ones (divisive) at each step. This continues recursively until all objects are linked or placed into individual clusters.
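The agglomerative case can be shown in a few lines: clusters are merged bottom-up by distance, and no cluster count k is fixed in advance; a distance threshold cuts the hierarchy instead. The toy data here is illustrative.

```python
# Agglomerative clustering without specifying k: a distance threshold
# decides where to stop merging.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# two well-separated groups of points
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10])
model = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0)
labels = model.fit_predict(X)
print(labels)
```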
This document provides an overview of cluster analysis techniques. It discusses the basic concepts of cluster analysis, associated statistics, and how to conduct cluster analysis, including formulating the problem, selecting distance/similarity measures and clustering procedures, deciding the number of clusters, interpreting results, and assessing validity. It covers hierarchical clustering methods like agglomerative and divisive approaches as well as non-hierarchical methods like k-means clustering.
Performance Issue? Machine Learning to the rescue! - Maarten Smeets
It can be difficult to determine how to improve the performance of microservices. There are many factors you can vary, but which one will have the most impact? During this presentation, a method using the random forest machine learning algorithm is applied to help improve the performance of a microservice running inside a JVM. Several measures are taken, such as throughput and response times. Java version, JVM supplier, heap size, garbage collection algorithm and microservice framework are all varied. Which factor matters most in determining the response time and throughput of the services? The random forest algorithm is introduced to solve this challenge. The presentation not only gives useful suggestions for improving the performance of microservices, but also introduces a novel way to approach performance tuning that can be applied to other use cases. It is especially interesting to developers and architects.
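The core of the approach can be sketched as follows: fit a random forest to benchmark measurements and read off feature importances to see which configuration factor drives response time. The factors and data below are synthetic stand-ins, not the presenter's measurements.

```python
# Hedged sketch: ranking configuration factors by random-forest
# feature importance on synthetic benchmark data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 400
heap_mb = rng.choice([512, 1024, 2048], size=n)
gc_algo = rng.integers(0, 3, size=n)       # encoded GC algorithm
java_ver = rng.integers(8, 18, size=n)
# in this toy data, heap size dominates response time by construction
response_ms = 1000 / heap_mb * 100 + gc_algo * 2 + rng.normal(scale=1, size=n)

X = np.column_stack([heap_mb, gc_algo, java_ver])
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, response_ms)
for name, imp in zip(["heap", "gc", "java_version"], forest.feature_importances_):
    print(name, round(imp, 3))
```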
Initial Experiments on Learning Based Randomized Bin-Picking Allowing Finger... - Kensuke Harada
This document summarizes an experiment on learning-based randomized bin-picking that allows finger contact with neighboring objects. The experiment uses linear SVM and random forest models to predict success or failure cases of picking objects based on the distribution of neighboring objects within the swept volume of the robot finger motion. The models were trained on datasets of successful and failed pick attempts. The random forest model achieved over 90% success prediction accuracy, significantly higher than conventional bin-picking methods. The experiment demonstrates that allowing finger contact and using machine learning to predict outcomes can enable automated randomized bin-picking.
This document provides an overview of classification predictive modeling and decision trees. It discusses classification, including binary, multi-class, and multi-label classification. It then describes decision trees, including key terminology like root nodes, interior nodes, and leaf nodes. It explains how decision trees are built and parameters like max_depth and min_samples_leaf. Finally, it compares regression and classification trees and discusses advantages and disadvantages of decision trees.
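The two parameters named above can be shown directly: `max_depth` caps how deep the tree grows, and `min_samples_leaf` keeps leaves from memorising single examples. A minimal sketch, using Iris as a stand-in dataset:

```python
# A shallow, regularised decision tree on Iris.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
tree.fit(X, y)
print(tree.get_depth(), round(tree.score(X, y), 3))
```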
Slides supporting the book "Process Mining: Discovery, Conformance, and Enhancement of Business Processes" by Wil van der Aalst. See also http://springer.com/978-3-642-19344-6 (ISBN 978-3-642-19344-6) and the website http://www.processmining.org/book/start providing sample logs.
This document provides an overview of key concepts in data mining. It discusses how data mining has grown with the increase in digital data and is now a mature discipline. The document outlines different types of data, variables, and learning techniques used in data mining like supervised learning, unsupervised learning, decision trees, clustering, association rule learning, and hidden Markov models. It also compares data mining and process mining, and discusses evaluating the quality of mining results using metrics like confusion matrices and cross-validation.
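The evaluation metrics mentioned above, a confusion matrix computed from cross-validated predictions, can be sketched like this (the classifier and dataset are illustrative):

```python
# Confusion matrix from out-of-fold predictions on Iris.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)
y_pred = cross_val_predict(clf, X, y, cv=5)   # each prediction made out-of-fold
cm = confusion_matrix(y, y_pred)
print(cm)
```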
Data mining Basics and complete description onword - Sulman Ahmed
This document discusses data mining and provides examples of its applications. It begins by explaining why data is mined from both commercial and scientific viewpoints in order to discover useful patterns and information. It then discusses some of the challenges of data mining, such as dealing with large datasets, high dimensionality, complex data types, and distributed data sources. The document outlines common data mining tasks like classification, clustering, association rule mining, and regression. It provides real-world examples of how these techniques are used for applications like fraud detection, customer profiling, and scientific discovery.
The document summarizes an Analytics Vidhya meetup event. It discusses that the meetups will occur once a month, with the next one on May 24th. It aims to provide networking and learning around data science, big data, machine learning and IoT. It introduces the volunteer organizers and outlines the agenda, which includes an introduction, discussing the model building lifecycle, data exploration techniques, and modeling techniques like logistic regression, decision trees, random forests, and SVMs. It provides details on practicing these techniques by predicting survival on the Titanic dataset.
This document discusses WEKA, an open-source data mining and machine learning tool. It summarizes how WEKA was used to analyze a bike sharing dataset from Washington D.C. to predict bike usage. Different WEKA techniques were explored, including classification algorithms like J48 and Naive Bayes. J48 performed best, and its decision trees could be visualized. Clustering was also attempted, but seasonal patterns were only partially distinguished. Overall, the dataset seemed better suited to classification than clustering for predicting bike usage.
Activity Monitoring Using Wearable Sensors and Smart Phone - DrAhmedZoha
The document discusses two problems related to real-time activity recognition using data from wearable sensors and mobile phones. For problem 1 of developing an algorithm to recognize exercises from a raw sensor data stream, the solution involves a two-phase learning and recognition process using techniques like filtering, time-windowing, feature extraction and selection, and classification models. For problem 2 of enabling real-time recognition on mobile phones, the document recommends using Android and Java APIs to receive Bluetooth sensor data, train models on servers, and locally recognize activities on phones for efficiency. Key challenges discussed include energy usage, response time, and developing flexible models for different users.
Multi-class Classification on Riemannian Manifolds for Video Surveillance - Diego Tosato
In video surveillance, classification of visual data can be very hard due to the scarce resolution and the noise characterizing sensor data. In this paper, we propose a novel feature, the ARray of COvariances (ARCO), and a multi-class classification framework operating on Riemannian manifolds. ARCO is composed of a structure of covariance matrices of image features, able to extract information from data at prohibitively low resolutions. The proposed classification framework instantiates a new multi-class boosting method working on the manifold of symmetric positive definite d×d (covariance) matrices. As practical applications, we consider different surveillance tasks, such as head pose classification and pedestrian detection, providing new state-of-the-art performances on standard datasets.
Business intelligence and data warehousing - Vaishnavi
This document provides an overview of business intelligence and data warehousing topics. It discusses the ID3 algorithm for building decision trees from datasets, the WEKA data mining software suite, and applications of web mining for business. The ID3 algorithm attempts to create the smallest possible decision tree using information theory. WEKA contains tools for data pre-processing, classification, clustering, and more. Web mining techniques can be used to generate user profiles, target internet advertising, detect fraud, and improve web search capabilities.
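The information-theoretic core of ID3 fits in a few lines: the entropy of a label set, and the information gain of splitting on an attribute. The tiny dataset below is illustrative.

```python
# Entropy and information gain as used by ID3 to pick split attributes.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Expected entropy reduction from splitting on attribute index attr."""
    total = entropy(labels)
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(lab)
    remainder = sum(len(ls) / len(labels) * entropy(ls)
                    for ls in by_value.values())
    return total - remainder

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
labels = ["no", "no", "yes", "yes"]
# attribute 0 separates the classes perfectly; attribute 1 not at all
print(information_gain(rows, labels, 0), information_gain(rows, labels, 1))
```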
This document provides an overview of machine learning techniques for classification with imbalanced data. It discusses challenges with imbalanced datasets, such as most classifiers being biased towards the majority class. It then summarizes techniques for dealing with imbalanced data, including random over/under sampling, SMOTE, cost-sensitive classification, and collecting more data.
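Two of the remedies listed can be sketched together: random oversampling of the minority class, and cost-sensitive learning via `class_weight` in scikit-learn. SMOTE (from the separate imbalanced-learn package) is omitted; the data here is synthetic.

```python
# Random oversampling and class weighting on an imbalanced toy set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(95, 2)),    # majority class
               rng.normal(3, 1, size=(5, 2))])    # minority class
y = np.array([0] * 95 + [1] * 5)

# random oversampling: resample minority rows with replacement
idx = rng.choice(np.where(y == 1)[0], size=90, replace=True)
X_bal = np.vstack([X, X[idx]])
y_bal = np.concatenate([y, y[idx]])

# cost-sensitive alternative: weight classes inversely to their frequency
clf = LogisticRegression(class_weight="balanced").fit(X, y)
print(np.bincount(y_bal), round(clf.score(X, y), 3))
```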
Improving the Model’s Predictive Power with Ensemble Approaches - SAS Asia Pacific
Bagus Sartono, lecture at the Department of Statistics, Institut Pertanian Bogor (IPB) University.
New Trends in Research Methodology & Analytics Technology Update, Nov 28, 2012, Jakarta, Indonesia.
This is an introductory workshop on machine learning. It introduces machine learning tasks such as supervised learning, unsupervised learning and reinforcement learning.
This document describes the SAX-VSM (Symbolic Aggregate approXimation - Vector Space Model) method for interpretable time series classification. SAX-VSM transforms time series data into symbolic representations called "words", then applies TF-IDF (Term Frequency - Inverse Document Frequency) to select discriminative words and create feature vectors, allowing classification using techniques like k-NN. The method is shown to achieve high accuracy on benchmark datasets like Gun/Point and Coffee Spectrograms, outperforming Euclidean and DTW distance measures. Open questions remain around efficient parameter searching and evaluation methodology.
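The SAX step at the heart of SAX-VSM can be sketched compactly: z-normalise a series, average it over equal segments (piecewise aggregate approximation), then map each segment to a letter via Gaussian breakpoints. The breakpoints below are the standard ones for a 4-letter alphabet; the series is illustrative.

```python
# Turning a time series into a SAX "word".
import numpy as np

def sax_word(series, n_segments=4, breakpoints=(-0.67, 0.0, 0.67)):
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                     # z-normalise
    paa = x.reshape(n_segments, -1).mean(axis=1)     # piecewise aggregate
    letters = "abcd"
    return "".join(letters[np.searchsorted(breakpoints, v)] for v in paa)

print(sax_word([0, 0, 1, 1, 5, 5, 9, 9]))
```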
This document discusses simulations and virtual worlds in educational research. It covers the theoretical bases, applications, opportunities, and challenges of using simulations and virtual worlds. Some key points include:
- Simulations model and imitate real-world systems through mathematical relationships, allowing researchers to manipulate variables and observe outcomes. Virtual worlds are persistent online spaces created and shaped by user interactions.
- They allow for prediction, understanding, explanation, and safe exploration of concepts. Researchers have control and visibility while being economical. However, they are not a replacement for real-world experiences.
- Applications include modeling real-life scenarios, collecting large data sets, studying human interaction over time, and exploring sensitive issues. However, challenges include ensuring common
Software can be used to speed up R&D into sustainable solutions such as alternative energy (batteries, fuel cells, biomass conversion), catalysts, and eliminating environmental toxins. The presentation gives an overview of the various methods and illustrates their application with case studies.
Distance-based bias in model-directed optimization of additively decomposable... - Martin Pelikan
For many optimization problems it is possible to define a distance metric between problem variables that correlates with the likelihood and strength of interactions between the variables. For example, one may define a metric so that the dependencies between variables that are closer to each other with respect to the metric are expected to be stronger than the dependencies between variables that are further apart. The purpose of this paper is to describe a method that combines such a problem-specific distance metric with information mined from probabilistic models obtained in previous runs of estimation of distribution algorithms with the goal of solving future problem instances of similar type with increased speed, accuracy and reliability. While the focus of the paper is on additively decomposable problems and the hierarchical Bayesian optimization algorithm, it should be straightforward to generalize the approach to other model-directed optimization techniques and other problem classes. Compared to other techniques for learning from experience put forward in the past, the proposed technique is both more practical and more broadly applicable.
Data Mining Module 3 Business Analytics.pdf - Jayanti Pande
Business Analytics Paper 2
| Data Mining | RTMNU Nagpur University MBA | Module 3
| Decision Trees and Decision Rules | By Jayanti Pande | ProNotesJRP | JRP Notes
Spectral Learning Methods for Finite State Machines with Applications to Na... - LARCA UPC
The document summarizes a spectral learning method for probabilistic finite-state machines (FSMs). It introduces observable operator models that represent probabilistic transducers using conditional probabilities between inputs, outputs, and hidden states. A key contribution is a spectral algorithm that learns the parameters of these models from data in linear time, with theoretical PAC-style guarantees. Experimental results on synthetic data show the method outperforms baselines like HMMs and k-HMMs on learning tasks.
Information networks are a popular way to represent information, especially in domains where the emphasis lies on the structural relationships between the entities rather than their features. Notable examples are online social networks and road networks. This special focus on network topology has led to the development of specialized graph databases. However, few of these databases offer a high-level declarative interface suited for analyzing information networks.
In this talk I present our work on developing a query language for analyzing networks. I will focus on the general principles we followed in the design of this language, and the main challenges related to developing it into a scalable tool for network analysis.
Similar to Experiments with Randomisation and Boosting for Multi-instance Classification
Initial Experiments on Learning Based Randomized Bin-Picking Allowing Finger...Kensuke Harada
This document summarizes an experiment on learning-based randomized bin-picking that allows finger contact with neighboring objects. The experiment uses linear SVM and random forest models to predict success or failure cases of picking objects based on the distribution of neighboring objects within the swept volume of the robot finger motion. The models were trained on datasets of successful and failed pick attempts. The random forest model achieved over 90% success prediction accuracy, significantly higher than conventional bin-picking methods. The experiment demonstrates that allowing finger contact and using machine learning to predict outcomes can enable automated randomized bin-picking.
This document provides an overview of classification predictive modeling and decision trees. It discusses classification, including binary, multi-class, and multi-label classification. It then describes decision trees, including key terminology like root nodes, interior nodes, and leaf nodes. It explains how decision trees are built and parameters like max_depth and min_samples_leaf. Finally, it compares regression and classification trees and discusses advantages and disadvantages of decision trees.
Slides supporting the book "Process Mining: Discovery, Conformance, and Enhancement of Business Processes" by Wil van der Aalst. See also http://springer.com/978-3-642-19344-6 (ISBN 978-3-642-19344-6) and the website http://www.processmining.org/book/start providing sample logs.
This document provides an overview of key concepts in data mining. It discusses how data mining has grown with the increase in digital data and is now a mature discipline. The document outlines different types of data, variables, and learning techniques used in data mining like supervised learning, unsupervised learning, decision trees, clustering, association rule learning, and hidden Markov models. It also compares data mining and process mining, and discusses evaluating the quality of mining results using metrics like confusion matrices and cross-validation.
Data mining Basics and complete description onwordSulman Ahmed
This document discusses data mining and provides examples of its applications. It begins by explaining why data is mined from both commercial and scientific viewpoints in order to discover useful patterns and information. It then discusses some of the challenges of data mining, such as dealing with large datasets, high dimensionality, complex data types, and distributed data sources. The document outlines common data mining tasks like classification, clustering, association rule mining, and regression. It provides real-world examples of how these techniques are used for applications like fraud detection, customer profiling, and scientific discovery.
The document summarizes an Analytics Vidhya meetup event. It discusses that the meetups will occur once a month, with the next one on May 24th. It aims to provide networking and learning around data science, big data, machine learning and IoT. It introduces the volunteer organizers and outlines the agenda, which includes an introduction, discussing the model building lifecycle, data exploration techniques, and modeling techniques like logistic regression, decision trees, random forests, and SVMs. It provides details on practicing these techniques by predicting survival on the Titanic dataset.
This document discusses WEKA, an open-source data mining and machine learning tool. It summarizes how WEKA was used to analyze a bike sharing dataset from Washington D.C. to predict bike usage. Different WEKA techniques were explored, including classification algorithms like J48 and Naive Bayes. J48 performed best by visualizing decision trees. Clustering was also attempted but seasonal patterns were only partially distinguished. Overall, the dataset seemed better suited to classification than clustering for predicting bike usage.
Activity Monitoring Using Wearable Sensors and Smart PhoneDrAhmedZoha
The document discusses two problems related to real-time activity recognition using data from wearable sensors and mobile phones. For problem 1 of developing an algorithm to recognize exercises from a raw sensor data stream, the solution involves a two-phase learning and recognition process using techniques like filtering, time-windowing, feature extraction and selection, and classification models. For problem 2 of enabling real-time recognition on mobile phones, the document recommends using Android and Java APIs to receive Bluetooth sensor data, train models on servers, and locally recognize activities on phones for efficiency. Key challenges discussed include energy usage, response time, and developing flexible models for different users.
Multi-class Classification on Riemannian Manifolds for Video SurveillanceDiego Tosato
In video surveillance, classification of visual data can be very hard due to the scarce resolution and the noise characterizing the sensors data. In this paper, we propose a novel feature, the ARray of COvariances (ARCO), and a multi-class classification framework operating on Riemannian manifolds. ARCO is composed by a structure of covariance matrices of image features, able to extract information from data at prohibitive low resolutions. The proposed classification framework consists in instantiating a new multi-class boosting method, working on the manifoldof symmetric positive definite d×d (covariance) matrices. As practical applications, we consider different surveillance tasks, such as head pose classification and pedestrian detection, providing novel state-of-the-art performances on standard datasets.
Business intelligence and data warehousingVaishnavi
This document provides an overview of business intelligence and data warehousing topics. It discusses the ID3 algorithm for building decision trees from datasets, the WEKA data mining software suite, and applications of web mining for business. The ID3 algorithm attempts to create the smallest possible decision tree using information theory. WEKA contains tools for data pre-processing, classification, clustering, and more. Web mining techniques can be used to generate user profiles, target internet advertising, detect fraud, and improve web search capabilities.
This document provides an overview of machine learning techniques for classification with imbalanced data. It discusses challenges with imbalanced datasets like most classifiers being biased towards the majority class. It then summarizes techniques for dealing with imbalanced data, including random over/under sampling, SMOTE, cost-sensitive classification, and collecting more data. [/SUMMARY]
Improving the Model’s Predictive Power with Ensemble ApproachesSAS Asia Pacific
Bagus Sartono, Lecture at Department of Statistics, Institut Pertanian Bogor (IPB) University,
New Trends in Research Methodoloy & Analytics Technology Update, Nov 28, 2012, Jakarta Indonesia
This is an introductory workshop for machine learning. Introduced machine learning tasks such as supervised learning, unsupervised learning and reinforcement learning.
This document describes the SAX-VSM (Symbolic Aggregate approXimation - Vector Space Model) method for interpretable time series classification. SAX-VSM transforms time series data into symbolic representations called "words", then applies TF-IDF (Term Frequency - Inverse Document Frequency) to select discriminative words and create feature vectors, allowing classification using techniques like k-NN. The method is shown to achieve high accuracy on benchmark datasets like Gun/Point and Coffee Spectrograms, outperforming Euclidean and DTW distance measures. Open questions remain around efficient parameter searching and evaluation methodology.
This document discusses simulations and virtual worlds in educational research. It covers the theoretical bases, applications, opportunities, and challenges of using simulations and virtual worlds. Some key points include:
- Simulations model and imitate real-world systems through mathematical relationships, allowing researchers to manipulate variables and observe outcomes. Virtual worlds are persistent online spaces created and shaped by user interactions.
- They allow for prediction, understanding, explanation, and safe exploration of concepts. Researchers have control and visibility while being economical. However, they are not a replacement for real-world experiences.
- Applications include modeling real-life scenarios, collecting large data sets, studying human interaction over time, and exploring sensitive issues. However, challenges include ensuring common
Software can be used to speed up R&D into sustainable solutions such as alternative energy (batteries, fuel cells, biomass conversion), catalysts, and eljminiating environmental toxins. The presentation gives an overview of the various methods and illustrates their applicaiton with case studies.
Software can be used to speed up R&D into sustainable solutions such as alternative energy (batteries, fuel cells, biomass conversion), catalysts, and eljminiating environmental toxins. The presentation gives an overview of the various methods and illustrates their applicaiton with case studies.
Distance-based bias in model-directed optimization of additively decomposable...Martin Pelikan
For many optimization problems it is possible to define a distance metric between problem variables that correlates with the likelihood and strength of interactions between the variables. For example, one may define a metric so that the dependencies between variables that are closer to each other with respect to the metric are expected to be stronger than the dependencies between variables that are further apart. The purpose of this paper is to describe a method that combines such a problem-specific distance metric with information mined from probabilistic models obtained in previous runs of estimation of distribution algorithms with the goal of solving future problem instances of similar type with increased speed, accuracy and reliability. While the focus of the paper is on additively decomposable problems and the hierarchical Bayesian optimization algorithm, it should be straightforward to generalize the approach to other model-directed optimization techniques and other problem classes. Compared to other techniques for learning from experience put forward in the past, the proposed technique is both more practical and more broadly applicable.
Data Mining Module 3 Business Analytics.pdfJayanti Pande
Business Analytics Paper 2
| Data Mining | RTMNU Nagpur University MBA | Module 3
| Decision Trees and Decision Rules | By Jayanti Pande | ProNotesJRP | JRP Notes
Similar to Experiments with Randomisation and Boosting for Multi-instance Classification
Spectral Learning Methods for Finite State Machines with Applications to Na...LARCA UPC
The document summarizes a spectral learning method for probabilistic finite-state machines (FSMs). It introduces observable operator models that represent probabilistic transducers using conditional probabilities between inputs, outputs, and hidden states. A key contribution is a spectral algorithm that learns the parameters of these models from data in linear time, with theoretical PAC-style guarantees. Experimental results on synthetic data show the method outperforms baselines like HMMs and k-HMMs on learning tasks.
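The core object behind such spectral algorithms is an empirical Hankel matrix of string probabilities, from whose low-rank factorization (typically an SVD) the operator parameters are recovered. Below is a minimal sketch of the Hankel construction only, with the factorization step omitted and all function names my own:

```python
from collections import Counter

def hankel(sequences, prefixes, suffixes):
    """Build the empirical Hankel matrix H[p][s] = P(prefix + suffix) from a
    sample of observed output sequences. Spectral algorithms recover the
    operator model's parameters from a low-rank factorization (e.g. an SVD)
    of this matrix; that step is omitted here."""
    counts = Counter(tuple(s) for s in sequences)
    total = sum(counts.values())
    return [[counts[tuple(p + s)] / total for s in suffixes] for p in prefixes]

seqs = [["a"], ["a"], ["a", "b"], ["b"]]
H = hankel(seqs, prefixes=[[], ["a"]], suffixes=[["a"], ["b"]])
# H[0][0] = P("a") = 0.5 and H[1][1] = P("ab") = 0.25
```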
Information networks are a popular way to represent information, especially in domains where the emphasis lies on the structural relationships between the entities rather than their features. Notable examples are online social networks and road networks. This special focus on network topology has led to the development of specialized graph databases. However, few of these databases offer a high-level declarative interface suited for analyzing information networks.
In this talk I present our work on developing a query language for analyzing networks. I will focus on the general principles we followed in the design of this language, and the main challenges related to developing it into a scalable tool for network analysis.
A discussion on sampling graphs to approximate network classification functionsLARCA UPC
The problem of network classification consists of assigning labels from a finite set to the nodes of a graph; the underlying assumption is that nodes with the same label tend to be connected via strong paths in the graph. This is similar to the assumptions made by semi-supervised learning algorithms based on graphs, which build an artificial graph from vectorial data. Such semi-supervised algorithms are based on label propagation principles, and their accuracy relies heavily on the structure (presence of edges) in the graph.
In this talk I will discuss ideas on how to perform sampling in the network graph, thus sparsifying the structure in order to apply semi-supervised algorithms and efficiently compute the classification function on the network. I will show very preliminary experiments indicating that the sampling technique has an important effect on the final results, and discuss open theoretical and practical questions that remain to be solved.
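The talk does not specify a particular algorithm; as a rough illustration of the label-propagation principle it builds on, here is a minimal synchronous majority-vote propagation on a toy graph. All names and the iteration scheme are assumptions for the sketch:

```python
def propagate_labels(edges, seeds, n_nodes, iters=20):
    """Synchronous majority-vote label propagation: seed nodes keep their
    labels fixed; every other node repeatedly adopts the most common label
    among its already-labelled neighbours."""
    adj = {i: [] for i in range(n_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    labels = {i: seeds.get(i) for i in range(n_nodes)}
    for _ in range(iters):
        old = dict(labels)  # snapshot so all updates use the previous round
        for i in range(n_nodes):
            if i in seeds:
                continue
            votes = [old[j] for j in adj[i] if old[j] is not None]
            if votes:
                labels[i] = max(set(votes), key=votes.count)
    return labels

# Two triangles joined by a single bridge edge, one seed in each cluster:
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = propagate_labels(edges, seeds={0: "A", 5: "B"}, n_nodes=6)
```

Sampling away edges before running such a procedure changes which paths carry label information, which is exactly why sparsification affects the final classification.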
Overlapping clustering, where a data point can be assigned to more than one cluster, is desirable in various applications, such as bioinformatics, information retrieval, and social network analysis. In this paper we generalize the framework of correlation clustering to deal with overlapping clusters. In short, we formulate an optimization problem in which each point in the dataset is mapped to a small set of labels, representing membership in different clusters. The number of labels does not have to be the same for all data points. The objective is to find a mapping so that the distances between points in the dataset agree as much as possible with distances taken over their sets of labels. For defining distances between sets of labels, we consider two measures: set-intersection indicator and the Jaccard coefficient.
To solve the problem we propose a local-search algorithm. Iterative improvement within our algorithm gives rise to non-trivial optimization problems, which, for the measures of set intersection and Jaccard, we solve using a greedy method and non-negative least squares, respectively.
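The two distances between label sets are easy to state concretely; a minimal sketch, with function names of my own choosing:

```python
def jaccard_distance(a, b):
    """Jaccard distance between two label sets: 1 - |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def intersection_indicator(a, b):
    """Set-intersection indicator: 0 if the label sets share any cluster,
    1 otherwise."""
    return 0 if set(a) & set(b) else 1

# Points sharing cluster 2 but differing elsewhere are moderately distant:
d = jaccard_distance({1, 2}, {2, 3})  # 1 - 1/3
```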
In this talk I will review several real-world applications and tools developed at the University of Waikato over the past 15 years. The early applications focused on agricultural problems such as cow culling, venison bruising and grass grubs. Following this we looked at the use of near-infrared spectroscopy coupled with data mining as an alternative laboratory technique for predicting compound concentrations in soil and plant samples. Our latest application is in the area of gas chromatography mass spectrometry (GCMS), a technique used in environmental applications to determine, for example, the petroleum content in soil and water samples.
Semi-random model tree ensembles: an effective and scalable regression method LARCA UPC
We present and investigate ensembles of semi-random model trees as a novel regression method. Such ensembles combine the scalability of tree-based methods with predictive performance rivalling the state of the art in numeric prediction. An empirical investigation shows that Semi-Random Model Trees are competitive with state-of-the-art methods like Gaussian Processes Regression or Additive Groves of Regression Trees. Training and optimization of Semi-Random Model Trees scales better to larger datasets than Gaussian Processes Regression, and is consistently faster than Additive Groves by one to two orders of magnitude.
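As a rough, heavily simplified illustration of the idea of combining randomized splits with ensemble averaging (depth-one "stumps" with leaf means standing in for full model trees with linear leaf models, which is my simplification, not the authors' method):

```python
import random

def fit_semi_random_stump(X, y):
    """One level of a 'semi-random' tree on a single feature: the split
    threshold is chosen at random, but the leaf predictions are fitted from
    the training data (here simplified to leaf means)."""
    t = random.uniform(min(X), max(X))
    left = [yi for xi, yi in zip(X, y) if xi <= t] or y
    right = [yi for xi, yi in zip(X, y) if xi > t] or y
    return t, sum(left) / len(left), sum(right) / len(right)

def predict_ensemble(stumps, x):
    """Average the member predictions, as in bagging-style ensembles."""
    preds = [(lm if x <= t else rm) for t, lm, rm in stumps]
    return sum(preds) / len(preds)

random.seed(0)
X = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 2.0, 3.0]
stumps = [fit_semi_random_stump(X, y) for _ in range(100)]
```

The scalability argument rests on the random split choice: no split-quality search is needed, so fitting each member is cheap.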
This document discusses two approaches for distributed clustering of data streams from sensor networks: DGClust and L2GClust. DGClust performs local discretization and representative clustering to reduce computation and communication loads when clustering sensor data streams at a central server. L2GClust performs local clustering based on each sensor's sketch of its own data and its neighbors' estimates of the global clustering, allowing each sensor to estimate the overall network clustering with limited resources and communication. Evaluation shows L2GClust achieves high agreement with centralized clustering while reducing storage, communication and sensitivity to uncertainty.
Adaptive pre-processing for streaming dataLARCA UPC
Many supervised learning approaches that adapt to changes in data distribution over time (e.g. concept drift) have been developed. The majority of them assume that data comes already pre-processed or that pre-processing is an integral part of a learning algorithm. In real application tasks data that comes from, e.g. sensor readings, is typically noisy, contains missing values, redundant features and a large part of model training needs to be devoted to data cleaning and pre-processing. As data is evolving over time, not only learning models, but also pre-processing mechanisms need to adapt. We will discuss under what circumstances it is beneficial to handle adaptivity of pre-processing and adaptivity of the learning model separately.
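As a toy example of pre-processing that adapts separately from the learner, here is a streaming standardizer whose mean and variance estimates decay over time, so the pre-processing step itself tracks distribution drift. The exponential-forgetting scheme and all parameter values are illustrative assumptions:

```python
class AdaptiveScaler:
    """Streaming standardizer with exponential forgetting: old observations
    are gradually discounted, so the scaling adapts when the input
    distribution drifts, independently of the downstream learner."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # forgetting factor: higher = faster adaptation
        self.mean = 0.0
        self.var = 1.0

    def transform(self, x):
        z = (x - self.mean) / max(self.var, 1e-12) ** 0.5
        # update the estimates with exponential forgetting
        self.mean = (1 - self.alpha) * self.mean + self.alpha * x
        self.var = (1 - self.alpha) * self.var + self.alpha * (x - self.mean) ** 2
        return z

scaler = AdaptiveScaler()
for x in [0.0] * 200:    # stable phase around 0
    scaler.transform(x)
for x in [100.0] * 200:  # abrupt drift to 100
    scaler.transform(x)
```

After the drift, the scaler's internal estimates converge to the new regime, whereas a pre-processor fitted once on the initial data would keep standardizing against the stale statistics.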
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your costs through an optimized configuration and keep them low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
What do a Lego brick and the XZ backdoor have in common?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations and training activities. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (which is where her nickname deneb_alpha comes from).
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the CCB and CCX licensing model have been a hot topic in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help you with it!
We will explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Experiments with Randomisation and Boosting for Multi-instance Classification
1. Experiments with Randomisation and Boosting for Multi-instance Classification
Luke Bjerring, James Foulds, Eibe Frank
University of Waikato
September 13, 2011