Modeling XCS in class imbalances: Population sizing and parameter settings


Albert Orriols-Puig (1,2), David E. Goldberg (2), Kumara Sastry (2), Ester Bernadó-Mansilla (1)

(1) Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University
(2) Illinois Genetic Algorithms Laboratory, Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign

Framework

[Figure: information based on experience (data consisting of examples and counter-examples) feeds a learner, which performs knowledge extraction to build a domain model; the model maps a new instance to a predicted output.]

In real-world domains, typically:
– Examples of the concept to be learnt are more costly to obtain.
– Consequently, the distribution of examples in the training dataset is usually imbalanced.

Applications:
Fraud detection
Medical diagnosis of rare illnesses
Detection of oil spills in satellite images
Illinois Genetic Algorithms Laboratory and Group of Research in Intelligent Systems
GECCO’07

Framework

Do learners suffer from class imbalances?

Training set → Learner, which minimizes the global error:

$$\text{error} = \frac{\text{num. errors}_{c_1} + \text{num. errors}_{c_2}}{\text{num. examples}}$$

This objective is biased towards the majority class.

The accuracy of the majority class is maximized to the detriment of the minority class.
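To make the bias concrete, the following minimal Python sketch (hypothetical data and function names, not taken from the slides) computes the global error of a trivial learner that always predicts the majority class on a 99:1 dataset. The global error is only 1%, yet the minority class accuracy is zero.

```python
def global_error(y_true, y_pred):
    """(num. errors of c1 + num. errors of c2) / num. examples."""
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


def class_accuracy(y_true, y_pred, label):
    """Fraction of the examples of class `label` that are predicted correctly."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


# Hypothetical training set: 99 majority (class 0) examples, 1 minority (class 1) example.
y_true = [0] * 99 + [1]
# A trivial learner that always predicts the majority class.
y_pred = [0] * len(y_true)

print(global_error(y_true, y_pred))       # 0.01
print(class_accuracy(y_true, y_pred, 0))  # 1.0
print(class_accuracy(y_true, y_pred, 1))  # 0.0
```

Minimizing the global error alone therefore gives almost no incentive to learn the minority class.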

And what about incremental learning?
– Instances of the minority class are sampled less frequently

Aim

Facetwise analysis of XCS for class imbalances:
– How can XCS create rules of the minority class?
– When will XCS remove these rules?
– What population size bound is required with respect to the imbalance ratio?
– Up to which imbalance ratio is XCS able to learn from the minority class?


Outline
1. Description of XCS
2. Facetwise Analysis
3. Design of Test Problems
4. XCS on the one-bit Problem
5. Analysis of Deviations
6. Results
7. Conclusions



Description of XCS

In single-step tasks:

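[Figure: the XCS cycle in single-step tasks. The environment presents a problem instance of the majority or the minority class. The classifiers of the population [P] that match it (each with condition C, action A, prediction P, error ε, fitness F, numerosity num, action set size as, time stamp ts, and experience exp) form the match set [M]. A prediction array over the classes c1, c2, …, cn is used to select an action (a random action in exploration), which defines the action set [A]. The reward (1000/0) updates the classifier parameters, and the genetic algorithm (selection, reproduction, mutation) and deletion act on the population. Niches activated by majority class instances are nourished niches; niches of the minority class are starved niches.]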

Problem niche: the schema defines the relevant attributes of a particular problem niche, e.g., 10**1*
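A minimal Python sketch of how a ternary condition or schema relates to binary inputs (function names are hypothetical). It treats both '#' (the don't-care symbol of classifier conditions) and '*' (schema notation) as wildcards, and computes the order of a schema as its number of fixed positions.

```python
WILDCARDS = {"#", "*"}  # '#' in classifier conditions, '*' in schema notation


def matches(condition: str, instance: str) -> bool:
    """True if every specified symbol of the ternary condition equals the input bit."""
    if len(condition) != len(instance):
        raise ValueError("condition and instance must have the same length")
    return all(c in WILDCARDS or c == x for c, x in zip(condition, instance))


def order(schema: str) -> int:
    """Order of a schema: number of fixed (non-wildcard) positions."""
    return sum(1 for c in schema if c not in WILDCARDS)


niche = "10**1*"
print(matches(niche, "101010"))  # True: positions 0, 1 and 4 agree
print(matches(niche, "111110"))  # False: position 1 differs
print(order(niche))              # 3
```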



Facetwise Analysis

Study the capability of XCS to provide representatives of starved niches:
– Population covering
– Generation of correct representatives of starved niches
– Time to extinction of these correct classifiers
Derive a bound on the population size that guarantees XCS will learn starved niches
Start from the theory developed for XCS:
– (Butz, Kovacs, Lanzi & Wilson, 2004): Model of the generalization pressures in XCS
– (Butz, Goldberg & Lanzi, 2004): Learning time bound
– (Butz, Goldberg, Lanzi & Sastry, 2007): Population size bound to guarantee niche support
– (Butz, 2006): Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design


Assumptions
– Problems consisting of n classes
– One class sampled with a lower frequency: minority class

$$ir = \frac{\text{num. instances of any class other than the minority class}}{\text{num. instances of the minority class}}$$

– Probability of sampling an instance of the minority class:

$$P_s(\min) = \frac{1}{1 + ir}$$
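The two definitions above can be evaluated exactly with rational arithmetic. This is a small illustrative sketch; the function names are hypothetical.

```python
from fractions import Fraction


def imbalance_ratio(n_other: int, n_minority: int) -> Fraction:
    """ir = num. instances of any non-minority class / num. instances of the minority class."""
    return Fraction(n_other, n_minority)


def p_sample_minority(ir) -> Fraction:
    """Ps(min) = 1 / (1 + ir)."""
    return 1 / (1 + Fraction(ir))


ir = imbalance_ratio(6400, 100)
print(ir)                     # 64
print(p_sample_minority(ir))  # 1/65
for r in (1, 16, 256, 1024):
    print(r, float(p_sample_minority(r)))
```

For the highest imbalance ratio considered later (ir = 1024), roughly one instance in a thousand belongs to the minority class.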


Facetwise Analysis
– Population initialization
– Population size bound

Population Initialization

Covering procedure
– Covering: each attribute of the input is generalized (replaced by #) with probability P#
– P# needs to satisfy the covering challenge (Butz et al., 2001)

Will covering be triggered on minority class instances?
– Probability that an instance is covered by at least one rule (Butz et al., 2001):

$$P(\text{cover}) = 1 - \left[1 - \left(\frac{2 - \sigma[P]}{2}\right)^{l}\right]^{N}$$

where σ[P] is the population specificity (initially 1 – P#), l is the input length, and N is the population size.
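The covering probability can be evaluated numerically with the short Python sketch below. It assumes, as the formula does, uniformly random binary inputs and independent classifiers of average specificity σ[P]; the function name is hypothetical, and the value P# = 0.6 is the one used in the experiments of these slides.

```python
def p_cover(specificity: float, length: int, pop_size: int) -> float:
    """P(cover) = 1 - [1 - ((2 - sigma[P]) / 2)^l]^N  (Butz et al., 2001)."""
    # A classifier matches a random input position with probability
    # (1 - sigma) * 1 + sigma * 1/2 = (2 - sigma) / 2.
    p_match_one = ((2.0 - specificity) / 2.0) ** length
    return 1.0 - (1.0 - p_match_one) ** pop_size


p_hash = 0.6           # P# used in the experiments of these slides
sigma0 = 1.0 - p_hash  # initial population specificity
for n in (10, 100, 1000):
    covered = p_cover(sigma0, 20, n)
    print(f"N={n}: P(cover)={covered:.4f}, P(covering triggered)={1 - covered:.4f}")
```

The probability that covering is triggered on an instance is simply 1 – P(cover).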

Population Initialization

[Figure: probability of applying covering on the first minority class instance, for l = 20.]



Creation of Representatives of Starved Niches

Assumptions
– Covering has not provided any representative of starved niches
– Simplified model: only mutation is considered

How can we generate representatives of starved niches?
– The population contains:
• Representatives of nourished niches
• Overgeneral classifiers
– A representative is obtained by correctly specifying all the bits of the schema that represents the starved niche
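The following Python sketch illustrates the mechanism under a deliberately simplified, hypothetical model; it is not the expression derived in the paper. It assumes an overgeneral parent with '#' at all km schema positions, and that a mutation event specifies each of those positions correctly (with the input's value) independently with probability μ. The number of classes n, which the analysis also takes into account, is not modeled here.

```python
import random


def p_representative(mu: float, k_m: int) -> float:
    """Hypothetical model: probability that one mutation event turns an overgeneral
    parent (all k_m schema bits set to '#') into a representative of the starved niche,
    assuming each schema bit is specified correctly with probability mu, independently."""
    return mu ** k_m


def expected_events(mu: float, k_m: int) -> float:
    """Mean number of mutation events until the first success (geometric distribution)."""
    return 1.0 / p_representative(mu, k_m)


def simulate(mu: float, k_m: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of p_representative under the same model."""
    rng = random.Random(seed)
    hits = sum(all(rng.random() < mu for _ in range(k_m)) for _ in range(trials))
    return hits / trials


for k_m in (1, 2, 3):
    print(k_m, p_representative(0.4, k_m), expected_events(0.4, k_m), simulate(0.4, k_m, 20000))
```

Even in this toy model, the waiting time grows exponentially with the order km of the schema, which is why the order of the starved niche matters.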



Creation of Representatives of Starved Niches

Summing up, time to obtain the first representative of a starved niche:

n: number of classes
μ: mutation probability
km: order of the schema

Time to extinction



Bounding the Population Size

Population size bound to guarantee that there will be
representatives of starved niches

– Require that:

– Bound:

μ: Mutation probability
km: Order of the schema

Bounding the Population Size

Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity:
– Consider θGA = 0

– We require that the best representative of a starved niche receives a genetic event before being removed

– Population size bound:

ir: Imbalance ratio
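The bound itself is not reproduced in this text. As an illustration only, the Python sketch below models the race described above under hypothetical simplifications: in each iteration the starved niche is sampled with probability p = Ps(min) = 1/(1 + ir), and since θGA = 0 every such visit gives the representative a genetic event; otherwise the representative is deleted with probability q = 1/N, a uniform-deletion stand-in for the actual XCS deletion scheme. In this toy model, keeping the success probability fixed requires N to grow linearly with ir. Function names are hypothetical.

```python
import math
import random
from fractions import Fraction


def p_event_before_deletion(ir, pop_size):
    """Toy model: each iteration the niche is sampled with p = 1/(1+ir) (GA fires, theta_GA = 0);
    if not, the classifier is deleted with q = 1/N. P(success) = p / (p + (1 - p) q)."""
    p = 1 / (1 + Fraction(ir))
    q = Fraction(1, pop_size)
    return p / (p + (1 - p) * q)


def min_pop_size(ir, target):
    """Smallest N with P(success) >= target; since (1 - p) / p = ir, N >= ir * target / (1 - target)."""
    target = Fraction(target)
    return math.ceil(Fraction(ir) * target / (1 - target))


def simulate(ir, pop_size, runs, seed=0):
    """Monte Carlo check of p_event_before_deletion."""
    rng = random.Random(seed)
    p, q = 1.0 / (1.0 + ir), 1.0 / pop_size
    wins = 0
    for _ in range(runs):
        while True:
            if rng.random() < p:  # genetic event first: success
                wins += 1
                break
            if rng.random() < q:  # deleted before any genetic event
                break
    return wins / runs


for ir in (16, 64, 256, 1024):
    print(ir, min_pop_size(ir, Fraction(9, 10)))
```

Exact fractions are used for the bound so that boundary cases such as ir = 64, target 9/10 (N = 576) are not affected by floating-point rounding.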



Design of Test Problems

One-bit problem
– Condition length l; the class is the value of the left-most bit, e.g., 000110 : 0
– Only two schemas of order one: 0***** and 1*****

Parity problem
– Condition length l with k relevant bits; the class is the number of ones in the relevant bits, mod 2, e.g., 01001010 : 1
– The k bits of parity form a single building block

Instances of the class labeled as 1 are undersampled:

$$P_s(\min) = \frac{1}{1 + ir}$$
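A minimal Python sketch of the two test problems and of the undersampling scheme, under stated assumptions: the k relevant parity bits are taken to be the k left-most bits (an assumption; the slide only says there are k relevant bits), and each instance is drawn by first choosing its class, with class 1 chosen with probability Ps(min) = 1/(1 + ir), and then rejection-sampling a uniformly random input of that class. Function names are hypothetical.

```python
import random


def one_bit_class(x: str) -> int:
    """One-bit problem: the class is the value of the left-most bit."""
    return int(x[0])


def parity_class(x: str, k: int) -> int:
    """Parity problem: number of ones in the k relevant bits, modulo 2.
    Assumption: the relevant bits are the k left-most bits."""
    return x[:k].count("1") % 2


def sample(length, label_fn, ir, rng):
    """Draw (instance, class); class 1 is the undersampled one, with Ps(min) = 1 / (1 + ir)."""
    wanted = 1 if rng.random() < 1.0 / (1.0 + ir) else 0
    while True:  # rejection sampling: half of all inputs belong to each class
        x = "".join(rng.choice("01") for _ in range(length))
        if label_fn(x) == wanted:
            return x, wanted


rng = random.Random(1)
print(one_bit_class("000110"))      # 0
print(parity_class("01001010", 3))  # 1
data = [sample(8, lambda x: parity_class(x, 3), ir=9, rng=rng) for _ in range(5)]
print(data)
```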



XCS on the one-bit Problem

XCS configuration
α=0.1, ν=5, ε0=1, θGA=25, χ=0.8, μ=0.4, θdel=20, θsub=200, δ=0.1, P#=0.6
selection=tournament, mutation=niched, [A]sub=false, N = 10,000 ir

Evaluation of the results:
– Minimum population size required to achieve TP rate × TN rate > 95%

– Results are averages over 25 seeds
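The evaluation criterion can be sketched in Python as follows. It assumes that the minority class is the positive class and that the 25-seed average is taken over the product TP rate × TN rate before the threshold test (the slide does not say which). `run` is a hypothetical stand-in for training and testing XCS with a given population size and seed.

```python
def tp_tn_product(y_true, y_pred, positive=1):
    """TP rate * TN rate, with the minority class as the positive class."""
    pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    tp_rate = sum(p == positive for p in pos) / len(pos)
    tn_rate = sum(p != positive for p in neg) / len(neg)
    return tp_rate * tn_rate


def minimum_population_size(candidates, run, seeds=range(25), threshold=0.95):
    """Smallest N whose TP*TN product, averaged over the seeds, exceeds the threshold.
    `run(N, seed)` is a hypothetical callback that trains XCS and returns TP rate * TN rate."""
    for n in sorted(candidates):
        avg = sum(run(n, seed) for seed in seeds) / len(seeds)
        if avg > threshold:
            return n
    return None


# A learner that ignores the minority class scores 0, however low its global error.
print(tp_tn_product([0] * 99 + [1], [0] * 100))  # 0.0
```

Because the product is zero whenever either class is entirely misclassified, this criterion cannot be met by a learner that only models the majority class.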

XCS on the one-bit Problem

N remains constant up to ir = 64

N increases linearly from ir=64
to ir=256

N increases exponentially from
ir=256 to ir=1024

Higher ir could not be solved

Analysis of the Deviations

Inheritance Error of Classifiers’ Parameters
– New promising representatives of starved niches are created from classifiers that belong to nourished niches.
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as = 1.

Subsumption
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving its first negative reward.
– Approach: set θsub > ir.

Stabilizing the population before testing
– Overgeneral classifiers are poorly evaluated.
– Approach: introduce some extra runs at the end of learning with the GA switched off.
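The first two adjustments can be sketched in Python as follows. The `Classifier` record is a hypothetical minimal structure holding only the fields used here; `can_subsume` follows the standard XCS subsumption test (experience above θsub and error below ε0). The example shows an overgeneral majority class classifier that has so far seen only ir positive rewards, and therefore still has zero error, being kept from subsuming others because θsub > ir.

```python
from dataclasses import dataclass, replace


@dataclass
class Classifier:
    """Hypothetical minimal XCS classifier record (only the fields used below)."""
    condition: str
    action: int
    error: float
    experience: int
    action_set_size: float


def offspring(parent: Classifier, condition: str, init_as_to_one: bool = True) -> Classifier:
    """Create a child; optionally reset as to 1 instead of inheriting the parent's value,
    so a child that lands in a starved niche does not carry a nourished niche's as."""
    child = replace(parent, condition=condition, experience=0)
    if init_as_to_one:
        child.action_set_size = 1.0
    return child


def can_subsume(cl: Classifier, theta_sub: int, epsilon_0: float) -> bool:
    """Standard XCS test: experienced (exp > theta_sub) and accurate (error < epsilon_0)."""
    return cl.experience > theta_sub and cl.error < epsilon_0


ir = 64
theta_sub = ir + 1  # following the slide: theta_sub > ir
overgeneral = Classifier("######", 0, error=0.0, experience=ir, action_set_size=40.0)
print(can_subsume(overgeneral, theta_sub, epsilon_0=1.0))  # False
```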

XCS+PCM in the one-bit Problem

N remains constant up to ir = 128

For higher ir, N slightly increases

We only have to guarantee that a
representative of the starved niche
will be created

XCS+PCM in the Parity Problem

Building blocks of size 3 need to
be processed

Empirical results agree with the
theory

Population size bound to guarantee
that a representative of the niche
will receive a genetic event

Conclusions and Further Work

We derived models that analyze the representatives of starved niches provided by covering and mutation
A population size bound was derived
The empirical observations met the theory once four aspects were considered:
– Niched mutation
– as initialization
– Subsumption
– Stabilization of the population

XCS is really robust to class imbalances
Further analysis of the covering operator


Motivation

And what about incremental learning?

Instances of the minority class are sampled less frequently

This influences the mechanisms of XCS (Orriols & Bernadó, 2006)


Niched Mutation vs. Free Mutation
– With niched mutation, representatives of starved niches can only be created when minority class instances are sampled
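For reference, the two operators can be sketched in Python as follows; both are illustrative sketches, not the code used in the experiments. Niched mutation follows the usual XCS definition (a mutated '#' takes the current input bit, a mutated bit becomes '#'), while free mutation is taken here to replace a mutated symbol by one of the other two ternary symbols at random. With niched mutation, offspring always match the current (here majority class) instance, so a representative of the starved niche 1***** can only appear while a minority instance is being processed; free mutation can produce it from a majority instance.

```python
import random


def niched_mutation(condition: str, instance: str, mu: float, rng: random.Random) -> str:
    """Each position mutates with probability mu: '#' takes the input bit and a specified
    bit becomes '#', so the offspring still matches the current instance."""
    return "".join(
        (x if c == "#" else "#") if rng.random() < mu else c
        for c, x in zip(condition, instance)
    )


def free_mutation(condition: str, mu: float, rng: random.Random) -> str:
    """Each position mutates with probability mu to one of the other two ternary symbols,
    chosen uniformly; the offspring need not match the current instance."""
    return "".join(
        rng.choice([s for s in "01#" if s != c]) if rng.random() < mu else c
        for c in condition
    )


def matches(condition: str, instance: str) -> bool:
    return all(c == "#" or c == x for c, x in zip(condition, instance))


rng = random.Random(0)
instance = "000000"  # a majority class input of the one-bit problem
niched = [niched_mutation("######", instance, 0.5, rng) for _ in range(200)]
free = [free_mutation("######", 0.5, rng) for _ in range(200)]
print(all(matches(c, instance) for c in niched))  # True
print(sum(c.startswith("1") for c in free))       # offspring specifying the minority niche's bit
```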

All these tweaks (niched mutation, as initialization, the subsumption setting, and population stabilization) are gathered in XCS+PCM
