GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter Settings
 

    Presentation Transcript

    • Modeling XCS in Class Imbalances: Population Size and Parameter Settings. Albert Orriols-Puig (1,2), David E. Goldberg (2), Kumara Sastry (2), Ester Bernadó-Mansilla (1). (1) Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University. (2) Illinois Genetic Algorithms Laboratory, Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign.
    • Framework. [Diagram: a learner extracts knowledge from information about the domain, based on experience consisting of examples and counter-examples, and builds a model that gives a predicted output for each new instance.] In real-world domains, examples of the concept to be learnt typically cost more to obtain, so the distribution of examples in the training dataset is usually imbalanced. Applications: fraud detection; medical diagnosis of rare illnesses; detection of oil spills in satellite images. [Slide footer: Enginyeria i Arquitectura La Salle, GRSI.]
    • Framework. Do learners suffer from class imbalances? Methods that do global optimization train the learner to minimize the global error over the training set: $\text{error} = \frac{\text{num. errors}_{c_1} + \text{num. errors}_{c_2}}{\text{number of examples}}$. This biases the learner towards the overwhelming (majority) class: the accuracy on that class is maximized to the detriment of the minority class.
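The bias described on this slide can be illustrated with a few lines of Python. This is only a sketch: the function name and the example counts (10 minority and 990 majority examples) are made up for illustration and do not come from the presentation. It scores a trivial learner that always predicts the majority class.

```python
def evaluate_always_majority(n_minority, n_majority):
    """Score a trivial learner that labels every example as the majority class."""
    errors_minority = n_minority  # every minority example is misclassified
    errors_majority = 0           # every majority example is classified correctly
    total = n_minority + n_majority
    return {
        "global_error": (errors_minority + errors_majority) / total,
        "minority_accuracy": 1 - errors_minority / n_minority,
        "majority_accuracy": 1 - errors_majority / n_majority,
    }


# Hypothetical dataset: 10 minority examples, 990 majority examples.
result = evaluate_always_majority(n_minority=10, n_majority=990)
print(result)
```

The global error stays small even though no minority example is ever classified correctly, which is why minimizing it alone is a poor objective under imbalance.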
    • Motivation. And what about incremental learning? Instances of the minority class are sampled less frequently, so rules that match minority class instances are rarely activated, and rules of the minority class receive fewer genetic opportunities (Orriols & Bernadó, 2006).
    • Aim. Facetwise analysis of XCS for class imbalances: (1) the impact of class imbalances on the initialization process; (2) how XCS can create rules of the minority class if the covering process fails; (3) a population size bound with respect to the imbalance ratio; (4) up to which imbalance ratio XCS is able to learn from the minority class.
    • Outline. 1. Description of XCS. 2. Facetwise Analysis. 3. Design of Test Problems. 4. XCS on the One-Bit Problem. 5. Analysis of Deviations. 6. Results. 7. Conclusions.
    • Description of XCS. In single-step tasks: [Diagram of the XCS cycle. The environment supplies a problem instance. The classifiers of the population [P] that match it form the match set [M] (match set generation). A prediction array over the classes c1, c2, ..., cn is built and an action is selected (a random action in the diagram). The classifiers that advocate the selected action form the action set [A]. The environment returns a reward (1000/0), the classifier parameters are updated, and the genetic algorithm (selection, reproduction, mutation) and deletion are applied. Each classifier is shown with a condition C, an action A, and the parameters P, ε, F, num, as, ts, exp. Majority class instances activate nourished niches; minority class instances activate starved niches.] Problem niche: the schema defines the relevant attributes for a particular problem niche, e.g. 10**1*.
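Match set generation and the schema notation can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the dictionary layout of a classifier and the three example classifiers are assumptions, and `*` is used as the don't-care symbol to follow the slide's schema example.

```python
DONT_CARE = "*"  # symbol used in the slide's schema example 10**1*


def matches(condition, instance):
    """Return True if every specified position of the condition equals the input bit."""
    if len(condition) != len(instance):
        raise ValueError("condition and instance must have the same length")
    return all(c == DONT_CARE or c == x for c, x in zip(condition, instance))


def match_set(population, instance):
    """Collect the classifiers of the population whose condition matches the instance."""
    return [cl for cl in population if matches(cl["condition"], instance)]


# Hypothetical population of three classifiers (condition, action).
population = [
    {"condition": "10**1*", "action": 1},
    {"condition": "0*****", "action": 0},
    {"condition": "******", "action": 0},
]

m = match_set(population, "100110")
print([cl["condition"] for cl in m])
```

For the input 100110, the first and third classifiers match and the second does not, because its first position is specified as 0.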
    • Facetwise Analysis. Study the capabilities of XCS to provide representatives of starved niches: population initialization; generation of correct representatives of starved niches; time of extinction of these correct classifiers. Derive a bound on the population size. Start from the theory developed for XCS: (Butz, Kovacs, Lanzi & Wilson, 04): model of the generalization pressures of XCS; (Butz, Goldberg & Lanzi, 04): learning time bound; (Butz, Goldberg, Lanzi & Sastry, 07): population size bound to guarantee niche support; (Butz, 2006): Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design.
    • Facetwise Analysis. Assumptions: problems consist of n classes; one class, the minority class, is sampled with a lower frequency. The imbalance ratio is $ir = \frac{\text{num. instances of any class other than the minority class}}{\text{num. instances of the minority class}}$. The sampling probabilities of a minority class and a majority class instance are $P_s(\text{min}) = \frac{1}{1 + ir}$ and $P_s(\text{maj}) = \frac{ir}{1 + ir}$.
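The two sampling probabilities can be computed directly from the imbalance ratio. The short sketch below only restates the slide's formulas; the function name and the example ratios are illustrative choices.

```python
def sampling_probabilities(ir):
    """Return (Ps(min), Ps(maj)) for an imbalance ratio ir, as defined on the slide."""
    if ir <= 0:
        raise ValueError("the imbalance ratio must be positive")
    p_min = 1.0 / (1.0 + ir)
    p_maj = ir / (1.0 + ir)
    return p_min, p_maj


# Example ratios chosen for illustration only.
for ir in (1, 9, 64, 1024):
    p_min, p_maj = sampling_probabilities(ir)
    print(f"ir={ir}: Ps(min)={p_min:.5f}, Ps(maj)={p_maj:.5f}")
```

With ir = 9, for instance, one instance in ten belongs to the minority class.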
    • Facetwise Analysis. Steps: population initialization; generation of correct representatives of starved niches; time of extinction of these correct classifiers; population size bound.
    • Population Initialization. Covering procedure: covering generalizes over the input with probability P#, and P# needs to satisfy the covering challenge (Butz et al., 01). Would covering be triggered on minority class instances? The probability that one instance is covered by at least one rule is given in (Butz et al., 01). [The equation is not preserved in the transcript; its annotated terms are the population specificity (initially 1 − P#), the input length, and the population size.]
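The covering probability can be explored numerically. The slide's equation did not survive extraction, so the sketch below rests on a stated assumption: binary inputs, and classifiers whose positions are each specified independently with probability equal to the population specificity, a specified position agreeing with a random input bit half of the time. Under that assumption a random classifier matches a random input with probability ((2 − specificity) / 2)^length. The function and argument names are illustrative.

```python
def cover_probability(specificity, input_length, population_size):
    """Probability that at least one of population_size random classifiers matches an input.

    Assumption (not taken from the transcript): each position is specified with
    probability `specificity`, and a specified position matches with probability 1/2.
    """
    p_match_one = ((2.0 - specificity) / 2.0) ** input_length
    return 1.0 - (1.0 - p_match_one) ** population_size


# P# = 0.6 is the value listed in the XCS configuration slide; the input
# lengths and population size below are hypothetical.
p_hash = 0.6
for length in (6, 20, 40):
    print(length, cover_probability(1.0 - p_hash, length, population_size=1000))
```

When this probability is close to one, covering is rarely triggered, so minority class instances are unlikely to obtain their own covered rules.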
    • Population Initialization. [Figure not preserved in the transcript.]
    • Creation of Representatives of Starved Niches. Assumptions: covering has not provided any representative of starved niches; the model is simplified and considers only mutation. How can representatives of starved niches be generated? By correctly specifying all the bits of the schema that represents the starved niche.
    • Creation of Representatives of Starved Niches. Possible cases: (a) a minority class instance is sampled, with $P_s(\text{min}) = \frac{1}{1 + ir}$, and it activates either a niche of the minority class or a niche of another class; (b) a majority class instance is sampled, with $P_s(\text{maj}) = \frac{ir}{1 + ir}$, and it activates either a niche of the minority class or a niche of another class. Notation: μ is the mutation probability and $k_m$ is the order of the schema. [The per-case equations are not preserved in the transcript.]
    • Creation of Representatives of Starved Niches. Summing up, the time to get the first representative of a starved niche [equation not preserved in the transcript] is expressed in terms of n (number of classes), μ (mutation probability), and $k_m$ (order of the schema). It increases linearly with the number of classes and exponentially with the order of the schema. It does not depend on the imbalance ratio.
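The three scaling properties can be made concrete with a toy model. The equation itself is missing from the transcript, so the form used below is a hypothetical stand-in, chosen only because it reproduces the stated behaviour: it assumes that each of the k_m schema bits is set correctly by mutation with probability μ/2 and that the right class is involved with probability 1/n. It should not be read as the presenters' formula.

```python
def time_to_first_representative(n_classes, mu, k_m):
    """Expected waiting time under a HYPOTHETICAL creation probability.

    Assumed per-step creation probability: (1 / n_classes) * (mu / 2) ** k_m.
    The expected wait for an event of probability p is 1 / p.
    """
    p_create = (1.0 / n_classes) * (mu / 2.0) ** k_m
    return 1.0 / p_create


# mu = 0.4 is the value listed in the configuration slide; n and k_m are examples.
for k_m in (1, 2, 3):
    print(k_m, time_to_first_representative(n_classes=2, mu=0.4, k_m=k_m))
```

The imbalance ratio does not appear as an argument, doubling the number of classes doubles the waiting time, and each additional schema bit multiplies it by 2/μ.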
    • Bounding the Population Size. Time to extinction, considering random deletion. [Equation not preserved in the transcript.]
    • Bounding the Population Size. Population size bound to guarantee that there will be representatives of starved niches. [The requirement and the resulting bound are equations that are not preserved in the transcript.]
    • Bounding the Population Size. Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity: consider θGA = 0; require that the best representative of a starved niche receives a genetic event before being removed; compute the time to receive the first genetic event. [Equation not preserved in the transcript.]
    • Bounding the Population Size. Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity: the population size needed to guarantee that the best representatives of starved niches receive at least one genetic opportunity increases linearly with the imbalance ratio.
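A toy calculation shows how a linear dependence on the imbalance ratio can arise. The bound's equations are missing from the transcript, so everything below is a sketch under simplifying assumptions that are not taken from the slides: one classifier is deleted uniformly at random per time step (expected lifetime of N steps), and a starved niche receives a genetic event only when a minority instance is sampled, with probability 1/(1 + ir), and its action is the one explored, with probability 1/n.

```python
def expected_wait_for_genetic_event(ir, n_classes):
    """Expected steps until the starved niche is activated (toy assumption)."""
    p_activate = (1.0 / (1.0 + ir)) * (1.0 / n_classes)
    return 1.0 / p_activate


def expected_lifetime_random_deletion(population_size):
    """Expected steps a classifier survives when one of N is deleted per step."""
    return float(population_size)


def min_population_size(ir, n_classes):
    """Smallest N (as a real number) whose expected lifetime reaches the expected wait."""
    return expected_wait_for_genetic_event(ir, n_classes)


# Example imbalance ratios; n_classes = 2 is an illustrative choice.
for ir in (64, 128, 256):
    print(ir, min_population_size(ir, n_classes=2))
```

Under these assumptions the threshold is n(1 + ir), which grows linearly with ir, in line with the slide's qualitative statement.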
    • Design of Test Problems. One-bit problem: for an input of condition length l, the class is the value of the left-most bit, e.g. 000110 : 0. There are only two schemas of order one: 0***** and 1*****. Instances of the class labeled 1 are undersampled: $P_s(\text{min}) = \frac{1}{1 + ir}$.
    • Design of Test Problems. Parity problem: for an input of condition length l with k relevant bits, the class is the number of ones modulo 2, e.g. 01001010 : 1. The k bits of parity form a single building block. Instances of the class labeled 1 are undersampled: $P_s(\text{min}) = \frac{1}{1 + ir}$.
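Both test problems can be sketched as small generators. This is an illustration under assumptions that are not stated on the slides: the k relevant parity bits are taken to be the left-most k bits, and undersampling is implemented by first choosing the class with probability Ps(min) and then drawing random inputs until one of that class appears. All function names are hypothetical.

```python
import random


def one_bit_class(bits):
    """One-bit problem: the class is the value of the left-most bit."""
    return int(bits[0])


def parity_class(bits, k):
    """Parity problem: number of ones in the k relevant bits, modulo 2.

    Assumption: the relevant bits are the left-most k bits.
    """
    return bits[:k].count("1") % 2


def sample_instance(rng, length, ir, class_fn):
    """Draw one labeled input; class 1 is the minority with probability 1 / (1 + ir)."""
    target = 1 if rng.random() < 1.0 / (1.0 + ir) else 0
    while True:  # rejection sampling: each class covers half of the input space
        bits = "".join(rng.choice("01") for _ in range(length))
        if class_fn(bits) == target:
            return bits, target


rng = random.Random(0)
samples = [sample_instance(rng, 6, 9, one_bit_class) for _ in range(10000)]
minority_fraction = sum(label for _, label in samples) / len(samples)
print(minority_fraction)  # close to 1 / (1 + 9)

parity_sample = sample_instance(rng, 8, 9, lambda b: parity_class(b, 3))
print(parity_sample)
```

The slide examples are reproduced by `one_bit_class("000110")` and by `parity_class("01001010", 8)` when all eight bits are treated as relevant.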
    • XCS on the One-Bit Problem. XCS configuration: α=0.1, ν=5, ε0=1, θGA=25, χ=0.8, μ=0.4, θdel=20, θsub=200, δ=0.1, P#=0.6, selection=tournament, mutation=niched, [A]sub=false, N = 10,000 ir. Evaluation of the results: minimum population size needed to achieve TP rate * TN rate > 95%. Results are averages over 25 seeds.
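The configuration and the success criterion can be written down as data plus a small check. This is a sketch only: the dictionary keys are illustrative names rather than the parameter names of any particular XCS implementation, the minority class is assumed to be the positive class, and the confusion counts in the example are invented.

```python
# Parameter values as listed on the slide; key names are hypothetical.
xcs_config = {
    "alpha": 0.1, "nu": 5, "epsilon_0": 1, "theta_GA": 25, "chi": 0.8,
    "mu": 0.4, "theta_del": 20, "theta_sub": 200, "delta": 0.1, "P_hash": 0.6,
    "selection": "tournament", "mutation": "niched",
    "action_set_subsumption": False,
}


def tp_tn_product(tp, fn, tn, fp):
    """Product of the true positive rate and the true negative rate."""
    tp_rate = tp / (tp + fn)
    tn_rate = tn / (tn + fp)
    return tp_rate * tn_rate


def meets_target(tp, fn, tn, fp, threshold=0.95):
    """Slide criterion: TP rate * TN rate must exceed 95%."""
    return tp_tn_product(tp, fn, tn, fp) > threshold


# Invented confusion counts for illustration.
print(meets_target(tp=98, fn=2, tn=980, fp=20))  # 0.98 * 0.98 exceeds 0.95
print(meets_target(tp=0, fn=10, tn=990, fp=0))   # ignoring the minority class fails
```

Unlike the global error, the product drops to zero as soon as one class is never predicted correctly.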
    • XCS on the One-Bit Problem. N remains constant up to ir = 64. N increases linearly from ir = 64 to ir = 256. N increases exponentially from ir = 256 to ir = 1024. Higher ir could not be solved.
    • Analysis of the Deviations. Niched mutation vs. free mutation: classifiers can only be created if minority class instances are sampled. Inheritance error of classifiers' parameters: new promising representatives of starved niches are created from classifiers that belong to nourished niches, and these new promising rules inherit parameters from those classifiers. This is especially delicate for the action set size (as). Approach: initialize as = 1.
    • Analysis of the Deviations. Subsumption: an overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward. Approach: set θsub > ir. Stabilizing the population before testing: overgeneral classifiers are poorly evaluated. Approach: introduce some extra runs at the end of learning with the GA switched off. We gather all these little tweaks in XCS+PMC.
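The adjustments listed on the two deviation slides can be summarized as a configuration transform. This is a hedged sketch: the key names, the function name, and the way the stabilization phase is expressed are all hypothetical, and the slides do not say how many extra GA-off runs are used, so that number is left to the caller.

```python
def protect_minority_settings(config, ir, stabilization_iterations):
    """Return a copy of config with the slide's tweaks applied (names are illustrative)."""
    adjusted = dict(config)
    # Subsumption: require more experience than the ir positive rewards an
    # overgeneral majority classifier may collect before its first negative one.
    adjusted["theta_sub"] = max(config["theta_sub"], ir + 1)
    # Inheritance error: offspring start with action set size estimate as = 1.
    adjusted["offspring_as_init"] = 1
    # Stabilization: extra runs at the end of learning with the GA switched off.
    adjusted["ga_off_final_iterations"] = stabilization_iterations
    return adjusted


base = {"theta_sub": 200, "mutation": "niched"}
# The value 1000 below is a placeholder, not a figure from the presentation.
adjusted = protect_minority_settings(base, ir=1024, stabilization_iterations=1000)
print(adjusted)
```

The original dictionary is left untouched, so the baseline and adjusted settings can be compared on the same problem.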
    • XCS+PMC on the One-Bit Problem. N remains constant up to ir = 128. For higher ir, N increases slightly. We only have to guarantee that a representative of the starved niche will be created.
    • XCS+PMC on the Parity Problem. Building blocks of size 3 need to be processed. Empirical results agree with the theory: the population size bound to guarantee that a representative of the niche will receive a genetic event.
    • Conclusions and Further Work. We derived models that analyze the representatives of starved niches provided by covering and mutation. A population size bound was derived. The empirical observations met the theory when four aspects were considered: type of mutation; as initialization; subsumption; stabilization of the population. Further work: further analysis of the covering operator.