This paper analyzes the scalability of the population size required in XCS to maintain niches that are infrequently activated. Facetwise models have been developed to predict the effect of the imbalance ratio—ratio between the number of instances of the majority class and the minority class that are sampled to XCS—on population initialization, and on the creation and deletion of classifiers of the minority class. While theoretical models show that, ideally, XCS scales linearly with the imbalance ratio, XCS with standard configuration scales exponentially.
The causes potentially responsible for this deviation from the ideal scalability are also investigated. Specifically, the inheritance procedure of classifiers’ parameters, mutation, and subsumption are analyzed, and improvements to XCS’s mechanisms are proposed to handle imbalanced problems effectively and efficiently. Once the recommendations are incorporated into XCS, empirical results show that the population size in XCS indeed scales linearly with the imbalance ratio.
Modeling XCS in class imbalances: Population sizing and parameter settings
1. Modeling XCS in Class Imbalances: Population Size and Parameter Settings
Albert Orriols-Puig (1,2), David E. Goldberg (2), Kumara Sastry (2), Ester Bernadó-Mansilla (1)
(1) Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University
(2) Illinois Genetic Algorithms Laboratory, Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign
2. Framework
[Figure: learning framework. Data (information based on experience), consisting of examples and counter-examples, is given to a learner, which extracts knowledge into a domain model; the model maps a new instance to a predicted output.]
In real-world domains, typically:
Higher cost to obtain examples of the concept to be learnt
So, distribution of examples in the training dataset is usually imbalanced
Applications:
Fraud detection
Medical diagnosis of rare illnesses
Detection of oil spills in satellite images
Illinois Genetic Algorithms Laboratory and Group of Research in Intelligent Systems Slide 2
GECCO’07
3. Framework
Do learners suffer from class imbalances?
A learner trained on the training set minimizes the global error:
$\text{error} = \frac{\text{num. errors}_{c1} + \text{num. errors}_{c2}}{\text{number of examples}}$
This objective is biased towards the overwhelming (majority) class: the accuracy on that class is maximized, to the detriment of the minority class.
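The bias of the global error can be seen with a small arithmetic sketch. The class sizes below (990 versus 10) and the function name `global_error` are hypothetical, chosen only for illustration.

```python
def global_error(errors_c1: int, errors_c2: int, n_examples: int) -> float:
    """Global error: errors on both classes divided by the number of examples."""
    return (errors_c1 + errors_c2) / n_examples


# Hypothetical training set: 990 majority-class and 10 minority-class examples.
n_majority, n_minority = 990, 10

# A trivial learner that always predicts the majority class makes no error on
# the majority class and misclassifies every minority-class example.
err = global_error(errors_c1=0, errors_c2=n_minority,
                   n_examples=n_majority + n_minority)
minority_accuracy = 0 / n_minority

print(f"global error = {err:.3f}, minority accuracy = {minority_accuracy:.1f}")
```

The global error is small even though no minority-class example is ever classified correctly, which is why a learner driven by this measure alone can ignore the minority class.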
And what about incremental learning?
– Sampling instances of the minority class less frequently
4. Aim
Facetwise analysis of XCS for class imbalances
How can XCS create rules of the minority class?
When will XCS remove these rules?
Population size bound with respect to the imbalance ratio
Up to which imbalance ratio is XCS able to learn from the minority class?
5. Outline
1. Description of XCS
2. Facetwise Analysis
3. Design of test Problems
4. XCS on the one-bit Problem
5. Analysis of Deviations
6. Results
7. Conclusions
6. Description of XCS
In single-step tasks:
[Figure: XCS learning cycle. The environment supplies a problem instance (of the majority or the minority class). Classifiers in the population [P], each with parameters C, A, P, ε, F, num, as, ts, exp, that match the instance form the match set [M] (match set generation). A prediction array over the classes c1, c2, ..., cn is computed and an action is selected (random action). The classifiers advocating that action form the action set [A]; their parameters are updated with the reward (1000/0). The genetic algorithm applies selection, reproduction and mutation, and classifiers are deleted from the population. Frequently activated niches are nourished niches; infrequently activated ones are starved niches.]
Problem niche: the schema defines the relevant attributes for a particular problem niche.
E.g., 10**1*
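As a minimal sketch of how a schema such as 10**1* delimits a niche, the helper below checks whether a binary instance falls inside a schema. It assumes that `*` is the don't-care symbol, as in the example above; the function names `matches` and `order` are illustrative and not taken from the slides.

```python
def matches(schema: str, instance: str) -> bool:
    """True if every specified (non-'*') position of the schema equals the instance bit."""
    return len(schema) == len(instance) and all(
        s == "*" or s == bit for s, bit in zip(schema, instance)
    )


def order(schema: str) -> int:
    """Order of a schema: the number of specified (non-'*') positions."""
    return sum(1 for s in schema if s != "*")


schema = "10**1*"
print(matches(schema, "100110"))  # instance inside the niche
print(matches(schema, "110110"))  # second bit differs, so outside the niche
print(order(schema))
```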
8. Facetwise Analysis
Study XCS capabilities to provide representatives of
starved niches:
– Population covering
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers
Derive a bound on the population size to guarantee that
XCS will learn starved niches
Build on theory developed for XCS
– (Butz, Kovacs, Lanzi & Wilson, 04): Model of generalization pressures of XCS
– (Butz, Goldberg & Lanzi, 04): Learning time bound
– (Butz, Goldberg, Lanzi & Sastry, 07): Population size bound to guarantee niche support
– (Butz, 2006): Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design
9. Facetwise Analysis
Assumptions
– Problems consisting of n classes
– One class sampled with a lower frequency: minority class
– Imbalance ratio: $ir = \frac{\text{num. instances of any class other than the minority class}}{\text{num. instances of the minority class}}$
– Probability of sampling an instance of the minority class: $P_s(min) = \frac{1}{1 + ir}$
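The sampling assumption can be sketched in a few lines. The function names are illustrative; the only ingredient taken from the slide is Ps(min) = 1 / (1 + ir).

```python
import random


def p_sample_minority(ir: float) -> float:
    """Probability of sampling a minority-class instance for imbalance ratio ir."""
    return 1.0 / (1.0 + ir)


def next_is_minority(ir: float, rng: random.Random) -> bool:
    """Decide whether the next sampled instance belongs to the minority class."""
    return rng.random() < p_sample_minority(ir)


rng = random.Random(0)
ir = 9
draws = [next_is_minority(ir, rng) for _ in range(100_000)]
print(p_sample_minority(ir), sum(draws) / len(draws))
```

On average, one minority-class instance is seen every 1 + ir iterations, which is why niches of the minority class are activated so rarely at high imbalance ratios.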
10. Facetwise Analysis
– Population initialization
– Generation of correct representatives of starved niches
– Time of extinction of these correct classifiers
– Population size bound
11. Population Initialization
Covering procedure
– Covering: Generalize over the input with probability P#
– P# needs to satisfy the covering challenge (Butz et al., 01)
Would covering be triggered on minority class instances?
– The probability that one instance is covered by at least one rule (Butz et al., 01) is a function of:
• the population specificity (initially 1 – P#)
• the input length
• the population size
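The equation on this slide did not survive extraction. The sketch below uses the form in which the covering probability is usually quoted from Butz et al., P(cover) = 1 − [1 − ((2 − σ[P]) / 2)^l]^N, and should be read as an assumption about what the slide showed. Here σ[P] is the population specificity, l the input length and N the population size; the function name is illustrative.

```python
def cover_probability(specificity: float, input_length: int, population_size: int) -> float:
    """Probability that a random instance is matched by at least one classifier.

    Assumed model: one condition position matches a random bit with probability
    (2 - specificity) / 2, positions are independent, and the population holds
    population_size independent classifiers.
    """
    p_match_one = ((2.0 - specificity) / 2.0) ** input_length
    return 1.0 - (1.0 - p_match_one) ** population_size


p_hash = 0.6  # P# from the configuration slide; initial specificity is 1 - P#
l = 20
for n in (100, 1_000, 10_000):
    print(n, round(cover_probability(1.0 - p_hash, l, n), 4))
```

When this probability is high, a minority class instance is likely to be matched by rules already in the population, so covering is unlikely to be triggered on it.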
12. Population Initialization
[Figure: probability of applying covering on the first minority class instance, for l = 20]
14. Creation of Representatives of Starved Niches
Assumptions
– Covering has not provided any representative of starved niches
– Simplified model: only mutation is considered.
How can we generate representatives of starved niches?
– In the population there are:
• Representatives of nourished niches
• Overgeneral classifiers
– By correctly specifying all the bits of the schema that represents the starved niche
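The idea of specifying all the bits of the starved niche's schema by mutation can be sketched as follows. The mutation model is an assumption made for illustration only: each schema position is mutated independently with probability μ, and a mutated position takes the required value with probability 1/2. The slides' own expression is not reproduced here, and `p_specify_schema` is a hypothetical name.

```python
def p_specify_schema(mu: float, k_m: int, p_correct_value: float = 0.5) -> float:
    """Probability that a single mutation event sets all k_m schema bits correctly.

    Assumed model (not taken from the slides): each of the k_m positions is
    mutated with probability mu and, if mutated, receives the required value
    with probability p_correct_value.
    """
    return (mu * p_correct_value) ** k_m


mu = 0.4  # mutation probability from the configuration slide
for k_m in (1, 2, 3):
    p = p_specify_schema(mu, k_m)
    print(f"k_m = {k_m}: p = {p:.4f}, expected mutation events = {1 / p:.0f}")
```

Under this assumed model the expected number of mutation events grows exponentially with the order of the schema, which is why the order km appears in the time to create the first representative.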
15. Creation of Representatives of Starved Niches
Summing up, the time to get the first representative of a starved niche is expressed in terms of:
– n: number of classes
– μ: mutation probability
– km: order of the schema
Time to extinction
17. Bounding the Population Size
Population size bound to guarantee that there will be
representatives of starved niches
– Require that:
– Bound:
n: number of classes
μ: Mutation probability
km: Order of the schema
18. Bounding the Population Size
Population size bound to guarantee that representatives of
starved niches will receive a genetic opportunity:
– Consider θGA = 0
– We require that the best representative of a starved niche receive a
genetic event before being removed
– Population size bound:
n: number of classes
ir: Imbalance ratio
20. Design of Test Problems
One-bit problem
– Condition length: l
– Class: value of the left-most bit, e.g. 000110 : 0
– Only two schemas of order one: 0***** and 1*****
Parity problem
– Condition length: l; relevant bits: k
– Class: number of 1s in the relevant bits, mod 2, e.g. 01001010 : 1
– The k bits of parity form a single building block
Undersampling instances of the class labeled as 1: $P_s(min) = \frac{1}{1 + ir}$
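A minimal sketch of the two test problems with undersampling of class 1 is given below. It assumes that the k relevant parity bits are the left-most ones and uses rejection sampling to obtain the desired class; both choices, and all function names, are illustrative rather than taken from the slides.

```python
import random


def one_bit_class(instance: str) -> int:
    """One-bit problem: the class is the value of the left-most bit."""
    return int(instance[0])


def parity_class(instance: str, k: int) -> int:
    """Parity problem: number of 1s in the k relevant bits, mod 2.

    Assumption: the relevant bits are the k left-most bits.
    """
    return instance[:k].count("1") % 2


def sample_instance(l: int, ir: float, label_fn, rng: random.Random) -> str:
    """Draw an l-bit instance so that class 1 appears with probability 1 / (1 + ir)."""
    want_minority = rng.random() < 1.0 / (1.0 + ir)
    while True:  # rejection sampling until the instance has the wanted class
        instance = "".join(rng.choice("01") for _ in range(l))
        if (label_fn(instance) == 1) == want_minority:
            return instance


rng = random.Random(1)
one_bit = [sample_instance(6, 9, one_bit_class, rng) for _ in range(5000)]
parity = [sample_instance(8, 9, lambda s: parity_class(s, 3), rng) for _ in range(5000)]
print(sum(one_bit_class(s) for s in one_bit) / len(one_bit))
print(sum(parity_class(s, 3) for s in parity) / len(parity))
```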
22. XCS on the one-bit Problem
XCS configuration
α=0.1, ν=5, ε0=1, θGA=25, χ=0.8, μ=0.4, θdel=20, θsub=200, δ=0.1, P#=0.6
selection=tournament, mutation=niched, [A]sub=false, N = 10,000 ir
Evaluation of the results:
– Minimum population size to achieve:
TP rate * TN rate > 95%
– Results are averages over 25 seeds
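The evaluation criterion can be sketched as the product of the true positive rate and the true negative rate. The sketch assumes that the minority class (labeled 1) is the positive class; the function name is illustrative.

```python
def tp_tn_product(y_true, y_pred, positive=1) -> float:
    """Product of the true positive rate and the true negative rate.

    Assumption: the minority class is the positive class. Returns 0.0 if
    either class is absent from y_true.
    """
    pairs = list(zip(y_true, y_pred))
    n_pos = sum(1 for t, _ in pairs if t == positive)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return (tp / n_pos) * (tn / n_neg)


y_true = [1, 1, 0, 0, 0, 0]
print(tp_tn_product(y_true, [1, 0, 0, 0, 0, 0]))  # half of the minority class found
print(tp_tn_product(y_true, [0, 0, 0, 0, 0, 0]))  # always predicting the majority class
```

Unlike the global error, this product drops to zero when the minority class is ignored, so the 95% threshold can only be met by learning both classes.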
23. XCS on the one-bit Problem
N remains constant up to ir = 64
N increases linearly from ir = 64 to ir = 256
N increases exponentially from ir = 256 to ir = 1024
Higher ir could not be solved
25. Analysis of the Deviations
Inheritance Error of Classifiers’ Parameters
– New promising representatives of starved niches are created from classifiers that belong to nourished niches.
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as=1.
Subsumption
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward
– Approach: set θsub>ir
Stabilizing the population before testing
– Overgeneral classifiers poorly evaluated
– Approach: introduce some extra runs at the end of learning with the GA switched off.
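The subsumption recommendation can be illustrated with a small calculation. It assumes a maximally overgeneral classifier of the majority class that matches every input, with instances sampled independently, so that each update is a positive reward with probability ir / (1 + ir). This simplified model and the function name are assumptions for illustration.

```python
def p_looks_accurate(ir: float, theta_sub: int) -> float:
    """Probability that theta_sub consecutive updates are all positive rewards.

    Assumed model: the overgeneral classifier matches every input, and each
    sampled instance is of the majority class with probability ir / (1 + ir).
    """
    return (ir / (1.0 + ir)) ** theta_sub


theta_sub = 200  # value from the configuration slide
for ir in (64, 256, 1024):
    print(ir, round(p_looks_accurate(ir, theta_sub), 3))
```

Under this model, once ir exceeds θsub an overgeneral classifier is quite likely to reach the experience threshold without having seen a single negative reward, so it can appear accurate and subsume correct rules. Setting θsub>ir makes that outcome much less likely.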
27. XCS+PCM in the one-bit Problem
N remains constant up to ir = 128
For higher ir, N increases slightly
We only have to guarantee that a representative of the starved niche will be created
28. XCS+PCM in the Parity Problem
Building blocks of size 3 need to be processed
Empirical results agree with the theory
Population size bound to guarantee that a representative of the niche will receive a genetic event
30. Conclusions and Further Work
We derived models that analyze the representatives of starved niches provided by covering and mutation
A population size bound was derived
The empirical observations agree with the theory once four aspects are considered:
– Mutation scheme (niched vs. free mutation)
– as initialization
– Subsumption
– Stabilization of the population
XCS is really robust to class imbalances
Further work: further analysis of the covering operator
32. Motivation
And what about incremental learning?
Sampling instances of the minority class less frequently
This influences the mechanisms of XCS (Orriols & Bernadó, 2006)
33. Analysis of the Deviations
Niched Mutation vs. Free Mutation
– Classifiers can only be created if minority class instances are sampled
Inheritance Error of Classifiers’ Parameters
– New promising representatives of starved niches are created from classifiers that belong to nourished niches
– These new promising rules inherit parameters from these classifiers. This is especially delicate for the action set size (as).
– Approach: initialize as=1.
34. Analysis of the Deviations
Subsumption
– An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward
– Approach: set θsub>ir
Stabilizing the population before testing
– Overgeneral classifiers poorly evaluated
– Approach: introduce some extra runs at the end of learning with the GA switched off.
We gather all these small modifications in XCS+PCM