GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets

Transcript

  • 1. Bounding XCS's Parameters for Unbalanced Datasets
    Albert Orriols-Puig, Ester Bernadó-Mansilla
    Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University, Barcelona, Spain
  • 2. Framework
    A learner extracts knowledge from a dataset of examples and counter-examples of the concept, builds a model based on experience, and predicts the output for each new instance. In real-world domains it is typically more costly to obtain examples of the concept to be learnt, so the distribution of examples in the training dataset is usually unbalanced. Applications: fraud detection, rare medical diagnosis, detection of oil spills in satellite images.
  • 3. Framework
    Do learners suffer from class imbalances? Given a training set, the learner minimizes the global error:
        error = (num. errors_c1 + num. errors_c2) / number of examples
    This measure is biased towards the overwhelming class: accuracy on the majority class is maximized to the detriment of the minority class.
  • 4. Aim
    Analyze the performance of XCS when learning from imbalanced datasets. Analyze the contribution of its different components. Propose approaches that facilitate learning minority-class regions.
  • 5. Outline
    1. Description of XCS
    2. Description of the Domain
    3. Experimentation
    4. XCS and Class Imbalances
    5. Guidelines for Parameter Tuning
    6. Online Adaptation
    7. Conclusions
  • 6. 1. Description of XCS
    In single-step tasks, XCS works as follows. The environment provides a problem instance; the match set [M] is built from the classifiers in the population [P] whose conditions match the instance (each classifier carries a condition C, an action A, a prediction P, an error ε, a fitness F, a numerosity num, an action-set size estimate as, a time stamp ts, and an experience exp). A prediction array is computed over the possible actions; an action is selected (at random during exploration) and the action set [A] is formed. The environment returns a reward, which updates the parameters of the classifiers in [A]. The genetic algorithm applies selection, reproduction, and mutation within [A], and deletion acts on [P].
  • 7. 1. Description of XCS
    Learning domain: the environment presents instances and returns a reward; XCS combines reinforcement learning (parameter updates) with a GA (rule discovery) over a set of rules. Example ratio between classes of 525:75, i.e., 1 minority-class example for every 7 majority-class examples.
  • 8. 2. Description of the Domain
    The (11-bit) multiplexer has 3 selection bits and 8 position bits; the selection bits address one position bit, whose value is the class. Example: 000 10010100 : 1. Complexity is related to the number of selection bits, and the problem is completely balanced. In the imbalanced multiplexer we under-sampled class 1; ir is the proportion between majority- and minority-class instances, and i = log2(ir) is the imbalance level. XCS should evolve the following maximally general rules:
        000 0#######:0   000 0#######:1   000 1#######:0   000 1#######:1
        001 #0######:0   001 #0######:1   001 #1######:0   001 #1######:1
        010 ##0#####:0   010 ##0#####:1   010 ##1#####:0   010 ##1#####:1
        011 ###0####:0   011 ###0####:1   011 ###1####:0   011 ###1####:1
        100 ####0###:0   100 ####0###:1   100 ####1###:0   100 ####1###:1
        101 #####0##:0   101 #####0##:1   101 #####1##:0   101 #####1##:1
        110 ######0#:0   110 ######0#:1   110 ######1#:0   110 ######1#:1
        111 #######0:0   111 #######0:1   111 #######1:0   111 #######1:1
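As a concrete illustration of the domain just described, here is a minimal Python sketch of the 11-bit multiplexer and one way to under-sample class 1 to reach a given ir. The function names and the rejection-sampling scheme are illustrative assumptions, not the authors' code.

```python
import random

def multiplexer11(bits):
    """11-bit multiplexer: the first 3 (selection) bits address one of
    the 8 remaining (position) bits; that bit's value is the class."""
    address = int("".join(str(b) for b in bits[:3]), 2)
    return bits[3 + address]

def imbalanced_sample(ir, rng):
    """Draw one training instance, under-sampling class 1 by rejection
    so that class 0 outnumbers class 1 by roughly ir:1."""
    while True:
        bits = [rng.randint(0, 1) for _ in range(11)]
        if multiplexer11(bits) == 0 or rng.random() < 1.0 / ir:
            return bits, multiplexer11(bits)

# The slide's example, 000 10010100: selection bits 000 address
# position bit 0, whose value is 1, so the class is 1.
assert multiplexer11([0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0]) == 1
```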
  • 9. 3. Experimentation
    We ran XCS with the following standard configuration from i = 0 (ir = 1:1) to i = 9 (ir = 512:1): N = 800, α = 0.1, ν = 5, Rmax = 1000, ε0 = 1, θGA = 25, β = 0.2, χ = 0.8, μ = 0.4, θdel = 20, δ = 0.1, θsub = 200, P# = 0.6, selection = rws, mutation = niched, GAsub = true, [A]sub = false.
  • 10. 3. Experimentation
    [Figure: true negative rate and true positive rate along training for ir = 16:1, 32:1, and 64:1.]
  • 11. 3. Experimentation
    Most numerous rules at ir = 128:1:
        Condition:Action   P          Error   F      Num
        ###########:0      1000       0.120   0.98   385
        ###########:1      1.2·10⁻⁴   0.074   0.98   366
    The estimated parameters are far from their theoretical values (P(###:0) = 992.24, P(###:1) = 7.75, ε = 15.38 for both): the error estimates are much too low, so these overgeneral classifiers appear accurate. Overgeneral classifiers overtake the population (they represent 94% of the population).
  • 12. 4. XCS and Class Imbalances
    We analyze the following factors: the classifiers' error; the stability of the prediction and error estimates; occurrence-based reproduction.
  • 13. 4.1. Classifiers' Error
    How does the imbalance ratio influence a classifier's error? XCS considers a classifier accurate if ε_cl < ε0. XCS receives a reward of Rmax (correct prediction) or 0 (incorrect prediction), and computes each classifier's prediction (p) and error (ε) as windowed averages:
        Prediction: p_{t+1} = p_t + β(R − p_t)
        Error:      ε_{t+1} = ε_t + β(|R − p_t| − ε_t)
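A minimal sketch of these two Widrow-Hoff updates (the function name is mine; the error is updated with the pre-update prediction, as in standard XCS descriptions):

```python
def update_classifier(p, eps, reward, beta=0.2):
    """One Widrow-Hoff step on a classifier's prediction p and error
    estimate eps after receiving `reward`:
        eps_{t+1} = eps_t + beta * (|R - p_t| - eps_t)
        p_{t+1}   = p_t   + beta * (R - p_t)
    The error uses the prediction *before* it is updated."""
    eps = eps + beta * (abs(reward - p) - eps)
    p = p + beta * (reward - p)
    return p, eps

# A classifier rewarded Rmax = 1000 on every activation converges to
# p = 1000 and eps = 0.
p, eps = 500.0, 100.0
for _ in range(200):
    p, eps = update_classifier(p, eps, 1000.0)
```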
  • 14. 4.1. Classifiers' Error
    Up to which class imbalance will XCS detect overgeneral classifiers?
    – Bound for an inaccurate classifier: ε ≥ ε0.
    – Given the estimated prediction and error, where Pc(cl) is the probability that the classifier predicts correctly:
        P = Pc(cl)·Rmax + (1 − Pc(cl))·Rmin
        ε = |P − Rmax|·Pc(cl) + |P − Rmin|·(1 − Pc(cl))
    – Writing p = !C/C (the ratio of incorrect to correct activations), the condition ε ≥ ε0 becomes:
        −ε0·p² + 2p(Rmax − ε0) − ε0 ≥ 0
    – Overgeneral classifiers are detected for 1/1998 ≤ p ≤ 1998 and not detected beyond these bounds.
    – For Rmax = 1000 and ε0 = 1 we get the maximum imbalance ratio ir_max = 1998, i.e., the imbalance level i_max = 10.
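The bound can be checked numerically: ε ≥ ε0 reduces to the quadratic ε0·p² − 2(Rmax − ε0)·p + ε0 ≤ 0, whose roots delimit the detection range. A sketch under the slide's Rmax = 1000 and ε0 = 1 (the function name is mine):

```python
import math

def detection_range(rmax=1000.0, eps0=1.0):
    """Roots of eps0*p**2 - 2*(rmax - eps0)*p + eps0 = 0: the ratios
    p = !C/C between which an overgeneral classifier's error reaches
    eps0 and the classifier is detected as inaccurate."""
    a, b, c = eps0, -2.0 * (rmax - eps0), eps0
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)

lo, hi = detection_range()
# hi is the slide's maximum imbalance ratio ir_max (about 1998),
# floor(log2(hi)) is i_max = 10, and lo = 1/hi is the symmetric
# bound 1/1998.
```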
  • 15. 4.1. Classifiers' Error
    XCS computes the error (ε) and prediction (p) as windowed averages:
        Prediction: p_{t+1} = p_t + β(R − p_t)
        Error:      ε_{t+1} = ε_t + β(|R − p_t| − ε_t)
    β determines the size of the window: the larger β, the faster the effect of previous rewards is forgotten. [Figure: influence of a reward on the estimate over time steps t+1 … t+8 for β = 0.2, 0.1, and 0.05.]
  • 16. 4.2. Stability of Prediction and Error Estimates
    Stability of prediction and error for ir = 128:1. [Figure: density of the estimates of the two overgeneral classifiers, whose theoretical predictions are 992.24 and 7.75.] With β = 0.2 the estimates oscillate widely around the theoretical values; with β = 0.002 they concentrate sharply around them. As ir increases, β should be decreased to stabilize the prediction and error estimates.
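A quick Monte Carlo sketch of this stability argument (the function name and the choice of 20,000 steps are mine): an overgeneral majority-class classifier at ir = 128:1 is rewarded Rmax with probability 128/129, and the spread of its Widrow-Hoff prediction estimate shrinks as β does.

```python
import random
import statistics

def estimate_spread(beta, ir=128, rmax=1000.0, steps=20000, seed=1):
    """Standard deviation of the Widrow-Hoff prediction estimate of an
    overgeneral majority-class classifier (rewarded rmax with
    probability ir/(ir+1)), measured over the second half of the run."""
    rng = random.Random(seed)
    p = rmax * ir / (ir + 1)          # start at the theoretical 992.24
    trace = []
    for t in range(steps):
        reward = rmax if rng.random() < ir / (ir + 1) else 0.0
        p += beta * (reward - p)
        if t >= steps // 2:
            trace.append(p)
    return statistics.stdev(trace)

# The estimate is markedly more stable with the decreased beta.
assert estimate_spread(0.002) < estimate_spread(0.2)
```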
  • 17. 4.3. Occurrence-based Reproduction
    To receive a GA event, a classifier has to belong to [A]. Frequency of occurrence in the 11-multiplexer at ir = 128:1:
        Classifier        p_occ formula                p_occ
        000 0#######:0    1/2^{sel+1} · ir/(1+ir)      0.062
        000 1#######:1    1/2^{sel+1} · 1/(1+ir)       0.000484
        ###########:0     1/2                          0.5
        ###########:1     1/2                          0.5
    [Figure: p_occ of these classifiers as a function of ir, from 0 to 500.] Classifiers that occur more frequently have better estimates and tend to receive more genetic opportunities, depending on θGA.
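The occurrence probabilities in the table can be reproduced with a small helper (the name `p_occ` is mine; the fully general classifiers are a separate case, occurring with probability 1/2 because the explore action is chosen at random):

```python
def p_occ(sel, minority, ir):
    """Occurrence probability in [A] of a maximally general rule with
    `sel` selection bits, per the slide's formulas:
        majority class: 1/2**(sel+1) * ir/(1+ir)
        minority class: 1/2**(sel+1) * 1/(1+ir)"""
    class_freq = 1.0 / (1 + ir) if minority else ir / (1 + ir)
    return class_freq / 2 ** (sel + 1)

# Slide values for the 11-multiplexer (sel = 3) at ir = 128:1:
assert round(p_occ(3, False, 128), 3) == 0.062      # 000 0#######:0
assert round(p_occ(3, True, 128), 6) == 0.000484    # 000 1#######:1
```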
  • 18. 4.3. Occurrence-based Reproduction
    Genetic opportunities: a classifier goes through a genetic event when (i) it occurs in [A] and (ii) the average time since the last GA application in the set exceeds θGA. [Figure: GA applications along the occurrence times T_occ of the frequent niche ###########:0/1 for θGA = 25, 50, 75, and 100, versus the infrequent niche 000 1#######:1.] Set θGA = T_occ of the most infrequent niche to balance the genetic opportunities that the different niches receive.
  • 19. 5. Guidelines for Parameter Tuning
    From the analysis we extract the following guidelines:
    – Rmax and ε0 determine the threshold between negligible noise and imbalance ratio.
    – β represents the reward forgetfulness rate. We want this rate to account for under-sampled instances: β = k1 · f_min / f_maj.
    – θGA throttles the GA rate when T_occ < θGA. If we want all niches to receive the same number of genetic opportunities: θGA = k2 · 1/f_min.
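A sketch of these guidelines as code. The constants k1 and k2 and the frequency normalization are assumptions; the defaults are chosen so that a balanced problem recovers the slide's standard β = 0.2 and θGA = 25.

```python
def tuned_parameters(f_min, f_maj, k1=0.2, k2=25.0):
    """Slide guidelines: beta = k1 * f_min/f_maj and
    theta_GA = k2 * 1/f_min, with niche frequencies expressed
    relative to the majority niche (f_maj = 1)."""
    return k1 * f_min / f_maj, k2 / f_min

# Balanced problem: the standard configuration.
assert tuned_parameters(1.0, 1.0) == (0.2, 25.0)

# ir = 64:1 (f_min = 1/64): beta shrinks and theta_GA grows by the
# same factor, landing in the ranges used in the experiments.
beta, theta_ga = tuned_parameters(1.0 / 64, 1.0)
```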
  • 20. 5. Guidelines for Parameter Tuning
    We set β = {0.04, 0.02, 0.01, 0.005} and θGA = {200, 400, 800, 1600}. [Figure: standard configuration vs. configuration following the guidelines, for ir = 16:1, 32:1, 64:1, 128:1, and 256:1.]
  • 21. 6. Online Adaptation
    Problem: how can we estimate the niche frequency? In the multiplexer, f_min = f_maj / ir. In a real-world problem, niche frequencies may not be related to the imbalance ratio because of small disjuncts. [Figure: two domains with ir = 5 in both, but different small disjuncts.]
  • 22. 6. Online Adaptation
    Our approach: let XCS discover the small disjuncts. We search for regions that promote overgeneral classifiers, estimate ir_cl based on those regions, and use ir_cl to adapt β and θGA. [Figure: an overgeneral classifier covering a region with ir_cl = 14:1.]
  • 23. 6. Online Adaptation
    The algorithm:
    1. Check whether the classifier's prediction oscillates.
    2. Estimate the imbalance ratio ir_cl.
    3. Require a minimum of experience and numerosity before adapting the parameters.
    4. Adapt β and θGA following the guidelines and the estimated imbalance ratio.
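The four steps above can be sketched as follows. This is a hypothetical implementation: the classifier record, the thresholds, and the way ir_cl is estimated from reward counts are all my assumptions, not the authors' algorithm.

```python
def maybe_adapt(cl, beta0=0.2, theta_ga0=25.0, exp_min=1000, num_min=5):
    """cl is a dict with counts of Rmax and zero rewards ('rmax_hits',
    'rmin_hits'), experience 'exp', and numerosity 'num'.
    (1) The prediction oscillates if both reward levels were seen;
    (2) the ratio of the two counts estimates ir_cl;
    (3) adaptation requires minimum experience and numerosity;
    (4) beta and theta_GA are rescaled following the guidelines."""
    oscillates = cl['rmax_hits'] > 0 and cl['rmin_hits'] > 0
    if not oscillates or cl['exp'] < exp_min or cl['num'] < num_min:
        return None  # keep the current parameter setting
    ir_cl = (max(cl['rmax_hits'], cl['rmin_hits'])
             / min(cl['rmax_hits'], cl['rmin_hits']))
    return beta0 / ir_cl, theta_ga0 * ir_cl

# The slide's example region, ir_cl = 14:1:
result = maybe_adapt({'rmax_hits': 1400, 'rmin_hits': 100,
                      'exp': 1500, 'num': 10})
```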
  • 24. 6. Online Adaptation
    [Figure: standard configuration vs. configuration following the guidelines vs. online adaptation, for ir = 16:1, 32:1, 64:1, 128:1, and 256:1.]
  • 25. 7. Conclusions
    We studied the behavior of XCS when the training set is unbalanced. XCS with the standard configuration can only solve the multiplexer for an imbalance ratio up to ir = 16. The theoretical analysis shows that XCS is highly robust to class imbalances if: (i) the classifier estimates are accurate, and (ii) the number of genetic opportunities of the niches is balanced. We defined guidelines to adapt XCS's parameters, with which XCS could solve the multiplexer up to an imbalance ratio ir = 256.
  • 26. 7. Conclusions
    As an advantage over other learners, XCS can automatically discover small disjuncts, enabling self-adaptation of its parameters.
  • 27. 7. Further Work
    What about the convergence time? An increase of θGA implies a slower search for promising rules. Cluster-based resampling methods could help but, unfortunately, there is no direct relation between clusters and niches. What about niche-based resampling? [Figure: a niche with ir_niche = 14:1; let's resample these instances at a rate of 1/ir_niche.]
  • 28. Bounding XCS's Parameters for Unbalanced Datasets
    Albert Orriols-Puig, Ester Bernadó-Mansilla
    Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University, Barcelona, Spain