GECCO'2007: Modeling XCS in Class Imbalances: Population Size and Parameter Settings

  1. Modeling XCS in Class Imbalances: Population Size and Parameter Settings
     Albert Orriols-Puig (1,2), David E. Goldberg (2), Kumara Sastry (2), Ester Bernadó-Mansilla (1)
     (1) Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University
     (2) Illinois Genetic Algorithms Laboratory, Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign
  2. Framework
     [Diagram: examples and counter-examples of the domain feed a learner, which performs knowledge extraction into a model; given a new instance, the model returns a predicted output based on experience.]
     In real-world domains, typically:
     – examples of the concept to be learnt are more costly to obtain,
     – so the distribution of examples in the training dataset is usually imbalanced.
     Applications:
     – fraud detection
     – medical diagnosis of rare illnesses
     – detection of oil spills in satellite images
     Enginyeria i Arquitectura la Salle, GRSI
  3. Framework
     Do learners suffer from class imbalances?
     – Methods that perform global optimization: the learner is trained on the training set to minimize the global error
       $\text{error} = \frac{\text{num. errors}_{c_1} + \text{num. errors}_{c_2}}{\text{number of examples}}$
     – This criterion is biased towards the overwhelming (majority) class: it maximizes the accuracy on that class, to the detriment of the minority class.
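To make the bias concrete, the sketch below evaluates the global error for a hypothetical learner that always predicts the majority class. It uses a two-class training set with imbalance ratio ir. The helper names and the choice of 100 minority examples are illustrative assumptions.

```python
def global_error(errors_c1: int, errors_c2: int, num_examples: int) -> float:
    """Global error: misclassified examples of both classes over all examples."""
    return (errors_c1 + errors_c2) / num_examples


def majority_only_error(ir: float, num_minority: int = 100) -> float:
    """Global error of a hypothetical learner that always predicts the majority class.

    Assumes two classes: num_minority minority examples (class c1) and
    ir * num_minority majority examples (class c2).
    """
    num_majority = round(ir * num_minority)
    # Every minority example is misclassified; no majority example is.
    return global_error(num_minority, 0, num_minority + num_majority)


for ir in (1, 9, 99):
    print(f"ir = {ir:>2}: global error = {majority_only_error(ir):.2f}")
```

The trivial learner never recognizes a minority example. Even so, its global error shrinks to 1% at ir = 99, so minimizing this error alone gives the learner no incentive to learn the minority class.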
  4. Motivation
     And what about incremental learning?
     – Instances of the minority class are sampled less frequently,
     – so rules that match instances of the minority class are rarely activated,
     – so rules of the minority class receive fewer genetic opportunities (Orriols & Bernadó, 2006).
  5. Aim
     Facetwise analysis of XCS for class imbalances:
     – Impact of class imbalances on the initialization process: how can XCS create rules of the minority class if the covering process fails?
     – Population size bound with respect to the imbalance ratio: up to which imbalance ratio is XCS able to learn from the minority class?
  6. Outline
     1. Description of XCS
     2. Facetwise Analysis
     3. Design of Test Problems
     4. XCS on the One-Bit Problem
     5. Analysis of Deviations
     6. Results
     7. Conclusions
  7. Description of XCS
     In single-step tasks:
     [Diagram: the environment presents a problem instance, of either the minority or the majority class. The classifiers of the population [P] that match it form the match set [M]. The prediction array (entries c1 ... cn) is used to select an action (a random action during exploration), and the classifiers advocating it form the action set [A]. The environment returns a reward of 1000/0, which drives the classifier parameter update. The genetic algorithm applies selection, reproduction and mutation, and deletion removes classifiers from [P]. The diagram distinguishes nourished niches from starved niches.]
     Each classifier holds: C (condition), A (action), P (prediction), ε (prediction error), F (fitness), num (numerosity), as (action set size estimate), ts (time stamp), exp (experience).
     Problem niche: the schema defines the relevant attributes for a particular problem niche, e.g. 10**1*.
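The match-set and prediction-array steps of the diagram can be sketched as follows. This is a minimal illustration that assumes the usual XCS conventions: ternary conditions with # as don't care, and the prediction array computed as the fitness-weighted average prediction per action. The `Classifier` type and its default values are hypothetical. Covering, reward update, the GA and deletion are omitted. Classifier conditions use #, whereas the slide writes schemas with *.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str            # C: ternary string over {0, 1, #}
    action: int               # A: advocated class
    prediction: float = 0.0   # P: reward prediction
    fitness: float = 0.01     # F: accuracy-based fitness


def matches(condition: str, instance: str) -> bool:
    """A condition matches a binary input if every specified (non-#) bit agrees."""
    return len(condition) == len(instance) and all(
        c == "#" or c == x for c, x in zip(condition, instance)
    )


def match_set(population: list[Classifier], instance: str) -> list[Classifier]:
    """[M]: the classifiers of [P] whose condition matches the input."""
    return [cl for cl in population if matches(cl.condition, instance)]


def prediction_array(mset: list[Classifier]) -> dict[int, float]:
    """Fitness-weighted average prediction for each action advocated in [M]."""
    num: dict[int, float] = {}
    den: dict[int, float] = {}
    for cl in mset:
        num[cl.action] = num.get(cl.action, 0.0) + cl.prediction * cl.fitness
        den[cl.action] = den.get(cl.action, 0.0) + cl.fitness
    return {a: num[a] / den[a] for a in num if den[a] > 0}


population = [
    Classifier("10##1#", 1, prediction=1000.0, fitness=0.9),
    Classifier("1#####", 1, prediction=0.0, fitness=0.1),
    Classifier("0#####", 0, prediction=1000.0, fitness=0.9),
]
m = match_set(population, "101110")
print([cl.condition for cl in m])  # ['10##1#', '1#####']
print(prediction_array(m))
```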
  8. Outline (next: 2. Facetwise Analysis)
  9. Facetwise Analysis
     Study the capability of XCS to provide representatives of starved niches:
     – population initialization
     – generation of correct representatives of starved niches
     – time of extinction of these correct classifiers
     Derive a bound on the population size.
     The analysis builds on the theory developed for XCS:
     – (Butz, Kovacs, Lanzi & Wilson, 04): model of the generalization pressures of XCS
     – (Butz, Goldberg & Lanzi, 04): learning time bound
     – (Butz, Goldberg, Lanzi & Sastry, 07): population size bound to guarantee niche support
     – (Butz, 2006): Rule-Based Evolutionary Online Learning Systems: A Principled Approach to LCS Analysis and Design
  10. Facetwise Analysis
      Assumptions:
      – Problems consisting of n classes
      – One class is sampled with a lower frequency: the minority class
      – Imbalance ratio: $ir = \frac{\text{num. instances of any class other than the minority class}}{\text{num. instances of the minority class}}$
      – Probabilities of sampling an instance of the minority class and of the majority class:
        $P_s(\min) = \frac{1}{1 + ir}, \qquad P_s(\mathrm{maj}) = \frac{ir}{1 + ir}$
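A small sketch of these definitions, computing ir from class counts and the two sampling probabilities from ir. The function names are illustrative.

```python
def imbalance_ratio(num_other_class: int, num_minority: int) -> float:
    """ir = instances of a class other than the minority class / instances of the minority class."""
    return num_other_class / num_minority


def sampling_probabilities(ir: float) -> tuple[float, float]:
    """Return (Ps(min), Ps(maj)) for a given imbalance ratio."""
    p_min = 1.0 / (1.0 + ir)
    p_maj = ir / (1.0 + ir)
    return p_min, p_maj


ir = imbalance_ratio(300, 100)     # 3.0
print(sampling_probabilities(ir))  # (0.25, 0.75)
```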
  11. Facetwise Analysis
      – Population initialization
      – Generation of correct representatives of starved niches
      – Time of extinction of these correct classifiers
      – Population size bound
  12. Population Initialization
      Covering procedure:
      – Covering generalizes over the input, replacing each attribute by # with probability P#.
      – P# needs to satisfy the covering challenge (Butz et al., 01).
      Would covering be triggered on minority class instances?
      – Probability that one instance is covered by at least one rule (Butz et al., 01):
        $P(\text{cover}) = 1 - \left[1 - \left(\frac{2 - \sigma[P]}{2}\right)^{\ell}\right]^{N}$
        where N is the population size, ℓ the input length, and σ[P] the specificity of the population, initially 1 − P#.
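The covering probability can be evaluated directly. This is a minimal sketch that assumes the standard expression of Butz et al. (2001), $P(\text{cover}) = 1 - [1 - ((2-\sigma)/2)^{\ell}]^{N}$, with initial specificity $\sigma = 1 - P_\#$. The function name is hypothetical. The term $((2-\sigma)/2)^{\ell}$ is the chance that one random rule matches the instance. Each position is either a # (probability $1-\sigma$) or a specified bit that agrees with the input (probability $\sigma/2$), which gives $1 - \sigma/2 = (2-\sigma)/2$ per position.

```python
def covering_probability(input_length: int, pop_size: int, p_dontcare: float) -> float:
    """P(cover) = 1 - [1 - ((2 - sigma) / 2) ** l] ** N, with sigma = 1 - P#."""
    sigma = 1.0 - p_dontcare                          # initial specificity of [P]
    p_single = ((2.0 - sigma) / 2.0) ** input_length  # one random rule matches the instance
    return 1.0 - (1.0 - p_single) ** pop_size


for n in (10, 100, 1000):
    print(n, round(covering_probability(20, n, 0.6), 3))
```

The class of the instance does not appear in the expression. Only ℓ, N and P# do.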
  13. Population Initialization [figure]
  14. Facetwise Analysis (next: generation of correct representatives of starved niches)
  15. Creation of Representatives of Starved Niches
      Assumptions:
      – Covering has not provided any representative of starved niches.
      – Simplified model: only mutation is considered.
      How can we generate representatives of starved niches?
      – By correctly specifying all the bits of the schema that represents the starved niche.
  16. Creation of Representatives of Starved Niches
      Possible cases:
      – Sample a minority class instance, $P_s(\min) = \frac{1}{1 + ir}$:
        • activate a niche of the minority class
        • activate a niche of another class
      – Sample a majority class instance, $P_s(\mathrm{maj}) = \frac{ir}{1 + ir}$:
        • activate a niche of the minority class
        • activate a niche of another class
      Notation: μ is the mutation probability; k_m is the order of the schema.
  17. Creation of Representatives of Starved Niches
      Summing up, the time to get the first representative of a starved niche depends on n (number of classes), μ (mutation probability) and k_m (order of the schema).
      It increases:
      – linearly with the number of classes,
      – exponentially with the order of the schema.
      It does not depend on the imbalance ratio.
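The equation on this slide did not survive in the transcript. As an assumption only, the expression $t_{rep} = n\,(2/\mu)^{k_m}$ is consistent with all three stated properties: it is linear in n, exponential in k_m, and independent of ir. The sketch below evaluates it and should not be read as the authors' exact derivation.

```python
def time_to_first_representative(n_classes: int, mu: float, k_m: int) -> float:
    """ASSUMED model: t_rep = n * (2 / mu) ** k_m.

    Assumes each genetic event creates a representative with probability
    (1 / n) * (mu / 2) ** k_m. The correct class is advocated with probability
    1 / n, and each of the k_m schema bits is mutated (prob. mu) to the correct
    value (prob. 1/2). The imbalance ratio does not appear.
    """
    p_event = (1.0 / n_classes) * (mu / 2.0) ** k_m
    return 1.0 / p_event


for k_m in (1, 2, 3):
    print(k_m, round(time_to_first_representative(2, 0.4, k_m)))
```

With n = 2 and μ = 0.4, this gives 10, 50 and 250 iterations for k_m = 1, 2, 3. Each extra schema bit multiplies the time by 2/μ = 5, while ir never enters.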
  18. Facetwise Analysis (next: time of extinction of these correct classifiers)
  19. Bounding the Population Size
      Time to extinction:
      – Consider random deletion:
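The random-deletion equation is missing from the transcript. The following Monte Carlo sketch only illustrates the idea under a simplified model assumed here: one deletion per event, chosen uniformly among N single-copy classifiers in a population of constant size. Under that model the lifetime of a given classifier is geometric with mean N. The function name is hypothetical.

```python
import random


def simulate_lifetime(pop_size: int, trials: int, seed: int = 0) -> float:
    """Mean number of deletion events until one tracked single-copy classifier is deleted.

    Assumes every deletion picks uniformly at random among pop_size slots and the
    population size stays constant (the freed slot is refilled by a new classifier).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        events = 1
        while rng.randrange(pop_size) != 0:  # slot 0 holds the tracked classifier
            events += 1
        total += events
    return total / trials


print(simulate_lifetime(50, 10_000))
```

With N = 50 the simulated mean should come out close to 50. Under this model, a rule is therefore expected to survive for about N deletion events.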
  20. Facetwise Analysis (next: population size bound)
  21. Bounding the Population Size
      Population size bound to guarantee that there will be representatives of starved niches:
      – Require that:
      – Bound:
  22. Bounding the Population Size
      Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity:
      – Consider θGA = 0.
      – We require that the best representative of a starved niche receives a genetic event before being removed.
      – Time to receive the first genetic event:
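The time to the first genetic event can be illustrated under an assumption stated here. With θGA = 0, a representative of a starved niche takes part in a GA event as soon as a minority class instance is sampled; action selection is ignored. The waiting time is then geometric with success probability $P_s(\min) = 1/(1+ir)$ and mean $1 + ir$. The function names are hypothetical.

```python
import random


def expected_time_to_genetic_event(ir: float) -> float:
    """Mean iterations until a minority instance is sampled: geometric with p = 1/(1+ir)."""
    return 1.0 + ir


def simulate_time_to_genetic_event(ir: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo check of the same waiting time."""
    rng = random.Random(seed)
    p_min = 1.0 / (1.0 + ir)
    total = 0
    for _ in range(trials):
        t = 1
        while rng.random() >= p_min:  # a majority instance was sampled instead
            t += 1
        total += t
    return total / trials


for ir in (1, 16, 64):
    print(ir, expected_time_to_genetic_event(ir), round(simulate_time_to_genetic_event(ir, 5_000), 1))
```

The mean grows linearly with ir. This is consistent with the linear growth of the population size bound stated on slide 23.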
  23. Bounding the Population Size
      Population size bound to guarantee that representatives of starved niches will receive a genetic opportunity:
      – The population size needed to guarantee that the best representatives of starved niches receive at least one genetic opportunity increases linearly with the imbalance ratio.
  24. Outline (next: 3. Design of Test Problems)
  25. Design of Test Problems
      One-bit problem:
      – Inputs have condition length ℓ, e.g. 000110:0; the class is the value of the left-most bit.
      – There are only two schemas of order one: 0***** and 1*****.
      – Instances of the class labeled as 1 are undersampled: $P_s(\min) = \frac{1}{1 + ir}$.
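A possible generator for the one-bit problem with undersampling of class 1 is sketched below. The sampling scheme (draw the class first, then fill the remaining bits at random) and the function names are assumptions made for illustration.

```python
import random


def one_bit_class(instance: str) -> int:
    """Class of the one-bit problem: the value of the left-most bit."""
    return int(instance[0])


def sample_one_bit(length: int, ir: float, rng: random.Random) -> tuple[str, int]:
    """Draw one labeled instance, sampling class 1 with probability 1 / (1 + ir).

    Assumption: the class is drawn first and the remaining bits uniformly at random.
    """
    label = 1 if rng.random() < 1.0 / (1.0 + ir) else 0
    rest = "".join(rng.choice("01") for _ in range(length - 1))
    return str(label) + rest, label


rng = random.Random(1)
print([sample_one_bit(6, 3, rng) for _ in range(3)])
```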
  26. Design of Test Problems
      Parity problem:
      – Inputs have condition length ℓ with k relevant bits, e.g. 01001010:1; the class is the number of ones in the relevant bits mod 2.
      – The k parity bits form a single building block.
      – Instances of the class labeled as 1 are undersampled: $P_s(\min) = \frac{1}{1 + ir}$.
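A similar sketch applies to the parity problem. The slide does not say which bits are relevant, so this sketch assumes the k left-most bits and uses rejection sampling to reach the desired class frequency. The names are hypothetical.

```python
import random


def parity_class(instance: str, k: int) -> int:
    """Number of ones among the k relevant bits, mod 2 (assumed: the k left-most bits)."""
    return instance[:k].count("1") % 2


def sample_parity(length: int, k: int, ir: float, rng: random.Random) -> tuple[str, int]:
    """Draw an instance whose class is 1 with probability 1 / (1 + ir), by rejection sampling."""
    target = 1 if rng.random() < 1.0 / (1.0 + ir) else 0
    while True:
        x = "".join(rng.choice("01") for _ in range(length))
        if parity_class(x, k) == target:  # each draw succeeds with probability 1/2
            return x, target


print(parity_class("01001010", 8))  # 1
```

For the example 01001010:1, the class is 1 whether all eight bits or only the three left-most bits are taken as relevant, because both contain an odd number of ones.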
  27. Outline (next: 4. XCS on the One-Bit Problem)
  28. XCS on the One-Bit Problem
      XCS configuration:
      – α = 0.1, ν = 5, ε0 = 1, θGA = 25, χ = 0.8, μ = 0.4, θdel = 20, θsub = 200, δ = 0.1, P# = 0.6
      – selection = tournament, mutation = niched, [A]sub = false, N = 10,000 ir
      Evaluation of the results:
      – minimum population size N needed to achieve TP rate × TN rate > 95%
      – results are averages over 25 seeds
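The success criterion can be written as a small function. This sketch assumes that the minority class (class 1) is the positive class; the function names are illustrative. The product penalizes a learner that ignores the minority class, even when its overall accuracy is high.

```python
def tp_tn_product(tp: int, fn: int, tn: int, fp: int) -> float:
    """TP rate x TN rate: accuracy on the minority (positive) class times accuracy on the majority class."""
    tp_rate = tp / (tp + fn)
    tn_rate = tn / (tn + fp)
    return tp_rate * tn_rate


def solved(tp: int, fn: int, tn: int, fp: int, threshold: float = 0.95) -> bool:
    """Success criterion: TP rate x TN rate > 95%."""
    return tp_tn_product(tp, fn, tn, fp) > threshold


# A majority-only predictor: 99% accuracy at ir = 99, but the product is 0.
print(solved(tp=0, fn=100, tn=9900, fp=0))  # False
print(solved(tp=98, fn=2, tn=990, fp=10))   # True (0.98 * 0.99 = 0.9702)
```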
  29. XCS on the One-Bit Problem
      – N remains constant up to ir = 64.
      – N increases linearly from ir = 64 to ir = 256.
      – N increases exponentially from ir = 256 to ir = 1024.
      – Higher imbalance ratios could not be solved.
  30. Outline (next: 5. Analysis of Deviations)
  31. Analysis of the Deviations
      Niched mutation vs. free mutation:
      – With niched mutation, representatives of starved niches can only be created when minority class instances are sampled.
      Inheritance error of the classifiers' parameters:
      – New promising representatives of starved niches are created from classifiers that belong to nourished niches.
      – These new rules inherit their parameters from those classifiers; this is especially delicate for the action set size (as).
      – Approach: initialize as = 1.
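The difference between the two mutation operators can be sketched as follows. Niched mutation is shown in its usual XCS form: a specified bit becomes #, and a # takes the value of the current input bit. Every offspring of a rule that matches the current instance therefore still matches it, which is consistent with the remark that, under niched mutation, rules for a starved niche can only appear while a minority instance is being processed. The free-mutation variant lets any symbol change to one of the other two in {0, 1, #}. It is one common definition and is an assumption here.

```python
import random


def niched_mutation(condition: str, instance: str, mu: float, rng: random.Random) -> str:
    """Each position mutates with probability mu: a specified bit becomes '#',
    a '#' becomes the corresponding input bit, so the child still matches instance."""
    out = []
    for c, x in zip(condition, instance):
        if rng.random() < mu:
            out.append(x if c == "#" else "#")
        else:
            out.append(c)
    return "".join(out)


def free_mutation(condition: str, mu: float, rng: random.Random) -> str:
    """Each position mutates with probability mu to one of the two other symbols of {0, 1, #}."""
    out = []
    for c in condition:
        if rng.random() < mu:
            out.append(rng.choice([s for s in "01#" if s != c]))
        else:
            out.append(c)
    return "".join(out)


rng = random.Random(0)
print(niched_mutation("1#0##1", "110101", 0.4, rng))
print(free_mutation("1#0##1", 0.4, rng))
```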
  32. Analysis of the Deviations
      Subsumption:
      – An overgeneral classifier of the majority class may receive ir positive rewards before receiving the first negative reward.
      – Approach: set θsub > ir.
      Stabilizing the population before testing:
      – Overgeneral classifiers are poorly evaluated.
      – Approach: run some extra iterations at the end of learning with the GA switched off.
      We gather all these small tweaks in XCS+PMC.
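The ir figure in the subsumption remark can be checked with a short simulation. The assumed model is that the overgeneral rule matches every instance, receives a positive reward when a majority instance is sampled, and receives a negative reward when a minority instance is sampled. The number of positive rewards before the first negative one is then geometric with mean ir. The function names are hypothetical.

```python
import random


def positives_before_first_negative(ir: float, rng: random.Random) -> int:
    """Positive rewards an overgeneral majority-class rule collects before its first negative one.

    Assumes the rule matches every instance, is rewarded positively whenever a majority
    instance is sampled and negatively whenever a minority instance is sampled.
    """
    p_min = 1.0 / (1.0 + ir)
    count = 0
    while rng.random() >= p_min:
        count += 1
    return count


def mean_positives(ir: float, trials: int, seed: int = 0) -> float:
    """Average of positives_before_first_negative over many independent runs."""
    rng = random.Random(seed)
    return sum(positives_before_first_negative(ir, rng) for _ in range(trials)) / trials


for ir in (4, 16, 64):
    print(ir, round(mean_positives(ir, 5_000), 1))
```

The distribution is geometric, so this count exceeds ir in a sizeable fraction of runs. Setting θsub > ir therefore protects against premature subsumption on average rather than in every run.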
  33. Outline (next: 6. Results)
  34. XCS+PCM in the One-Bit Problem
      – N remains constant up to ir = 128.
      – For higher ir, N increases slightly.
      – We only have to guarantee that a representative of the starved niche will be created.
  35. XCS+PCM in the Parity Problem
      – Building blocks of size 3 need to be processed.
      – Empirical results agree with the theory (population size bound to guarantee that a representative of the niche will receive a genetic event).
  36. Outline (next: 7. Conclusions)
  37. Conclusions and Further Work
      – We derived models that analyze the representatives of starved niches provided by covering and mutation.
      – A population size bound was derived.
      – The empirical observations matched the theory once four aspects were considered:
        • type of mutation
        • initialization of as
        • subsumption
        • stabilization of the population
      – Further work: further analysis of the covering operator.