
CEC'2005: Class Imbalance Problem in UCS Classifier System: Fitness Adaptation


  1. Class Imbalance Problem in UCS Classifier System: Fitness Adaptation
     Albert Orriols Puig, Ester Bernadó Mansilla
     Enginyeria i Arquitectura La Salle, Ramon Llull University
     September 4th, 2005. CEC-2005
  2. Outline
     1. Introduction
     2. Description of UCS
     3. Dataset design
     4. UCS on unbalanced datasets
     5. Dealing with imbalances
     6. Class-sensitive accuracy
     7. Weighted class-sensitive accuracy
     8. Conclusions
  3. Introduction
     - Class imbalances in the samples taken from real-world domains.
     - Do they affect the learning performance of some well-known systems?
     - Do class imbalances affect the performance of UCS?
     - If so, how can we deal with imbalances?
  4. Description of UCS
     [Diagram: the UCS learning cycle]
     - Environment: provides a problem instance plus its output class.
     - Population [P]: classifiers 1, 2, 3, ..., each listed as "C A acc F num cs ts exp".
     - Match set generation: builds the Match Set [M] from [P] (classifiers 1, 3, 5, 6 in the diagram).
     - Action set generation: builds the Correct Set [C] from [M] (classifiers 3, 6 in the diagram).
     - Classifier parameters update: $acc = \frac{\#\,Correct}{Experience}$, $Fitness = acc^{\nu}$.
     - Genetic algorithm: selection, reproduction, mutation, recombination.
     - Deletion.
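The parameter update shown on the slide (acc = #correct / experience, fitness = acc^ν) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Classifier` fields, the `update` function and the default `nu=10.0` are assumptions made for the example, since the slides do not give a value for ν.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    """Hypothetical, stripped-down UCS classifier (only the fields needed here)."""
    predicted_class: int
    exp: int = 0        # experience: number of times the rule was in a match set
    correct: int = 0    # number of times it was also in the correct set
    acc: float = 0.0
    fitness: float = 0.0


def update(cl: Classifier, true_class: int, nu: float = 10.0) -> None:
    """Update one matching classifier after seeing an instance of `true_class`.

    nu is an assumed value; the slides only state Fitness = acc^nu.
    """
    cl.exp += 1
    if cl.predicted_class == true_class:
        cl.correct += 1
    cl.acc = cl.correct / cl.exp
    cl.fitness = cl.acc ** nu


rule = Classifier(predicted_class=1)
update(rule, true_class=1)   # correct prediction
update(rule, true_class=0)   # wrong prediction
print(rule.acc, rule.fitness)
```

Because fitness is a power of accuracy, even a modest drop in accuracy is penalised heavily, which is what the fitness adaptation discussed later relies on.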
  5. Description of UCS: An example
     Training dataset -> UCS -> evolved model

     IRIS dataset:
       sepal length  sepal width  petal length  petal width  class
       5.1           3.5          1.4           0.2          Setosa
       7.0           3.2          4.7           1.4          Versic.
       6.3           3.3          6.0           2.5          Virgin.
       …

     Evolved model:
       if sepal_length <= 6.24 and petal_length <= 4.49 and petal_width <= 0.67 then Iris-setosa
       if sepal_length >= 4.95 and 2.22 <= petal_length <= 4.76 and 0.51 <= petal_width <= 2.36 then Iris-versicolour
       if petal_length >= 1.80 and petal_width >= 1.75 then Iris-virginica
       …
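The three Iris rules from this slide can be read as interval conditions over the attributes. The sketch below encodes them that way and checks which rule matches each of the three sample rows; the `RULES` layout and the helper names are illustrative assumptions, not part of UCS itself.

```python
import math

# Each rule: ({attribute: (lower, upper)}, predicted class).
# Attributes not mentioned by a rule are unconstrained.
RULES = [
    ({"sepal_length": (-math.inf, 6.24),
      "petal_length": (-math.inf, 4.49),
      "petal_width": (-math.inf, 0.67)}, "Iris-setosa"),
    ({"sepal_length": (4.95, math.inf),
      "petal_length": (2.22, 4.76),
      "petal_width": (0.51, 2.36)}, "Iris-versicolour"),
    ({"petal_length": (1.80, math.inf),
      "petal_width": (1.75, math.inf)}, "Iris-virginica"),
]


def matches(condition, instance):
    """True if every constrained attribute of the instance lies in its interval."""
    return all(lo <= instance[attr] <= hi for attr, (lo, hi) in condition.items())


def matching_classes(instance):
    """Classes predicted by all rules whose condition matches the instance."""
    return [cls for condition, cls in RULES if matches(condition, instance)]


samples = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7, "petal_width": 1.4},
    {"sepal_length": 6.3, "sepal_width": 3.3, "petal_length": 6.0, "petal_width": 2.5},
]
for sample in samples:
    print(matching_classes(sample))
```

Each sample row is matched by exactly one of the three rules, the one predicting its own class.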
  6. Chk Problem
     - Two real attributes $x, y \in [0,1]$
     - Two classes
     - Permits varying complexity along:
       a. Concept complexity (c)
       b. Dataset size (s)
       c. Imbalance level (i)

     Example with s=4096, c=4, i=2:
       #inst. maj. class = $s / c^2 = 4096 / 16 = 256$
       #inst. min. class = $s / (c^2 \cdot 2^i) = 4096 / (16 \cdot 4) = 64$
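The two instance-count formulas on this slide can be checked with a few lines of arithmetic. The function name `chk_instance_counts` is made up for the example; it only restates s/c² and s/(c²·2^i) and makes no claim about how the datasets were actually generated.

```python
def chk_instance_counts(s: int, c: int, i: int) -> tuple[int, int]:
    """Return (#inst. maj. class, #inst. min. class) as given on the slide."""
    majority = s // c**2
    minority = s // (c**2 * 2**i)
    return majority, minority


# Sweep the imbalance levels used in the experiments (i = 0..7).
for level in range(8):
    print(level, chk_instance_counts(4096, 4, level))
```

For s=4096, c=4, i=2 this reproduces the 256 and 64 of the slide; at i=0 both counts are equal, and each extra level halves the minority count.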
  7. We ran UCS on chk with s=4096, c=4 and i=[0..7]
     [Figure: Training datasets for chk problem]
  8. Obtaining the following results
     [Figure: Boundaries evolved by UCS in the chk problem with imbalance levels from 0 to 7]
  9. Analyzing the population evolved in higher imbalance levels

     Rules for imbalance level i=4:

       Id  Condition                       Class  Acc   F     Num
       1   [0.509, 0.750] [0.259, 0.492]   1      1.00  1.00  39
       2   [0.000, 0.231] [0.252, 0.492]   1      1.00  1.00  38
       3   [0.000, 0.248] [0.755, 1.000]   1      1.00  1.00  35
       4   [0.761, 1.000] [0.000, 0.249]   1      1.00  1.00  34
       5   [0.255, 0.498] [0.520, 0.730]   1      1.00  1.00  33
       6   [0.751, 1.000] [0.514, 0.737]   1      1.00  1.00  31
       7   [0.259, 0.498] [0.000, 0.244]   1      1.00  1.00  27
       8   [0.501, 0.743] [0.751, 1.000]   1      1.00  1.00  18
       9   [0.500, 0.743] [0.751, 1.000]   1      1.00  1.00  9
       10  [0.751, 1.000] [0.531, 0.737]   1      1.00  1.00  8
       …
       18  [0.509, 0.750] [0.246, 0.492]   1      0.64  0.01  1
       19  [0.000, 1.000] [0.000, 1.000]   0      0.94  0.54  20
       20  [0.000, 1.000] [0.000, 0.990]   0      0.94  0.54  13
       21  [0.012, 1.000] [0.000, 0.990]   0      0.94  0.54  10
       …
       64  [0.012, 1.000] [0.038, 0.973]   0      0.94  0.54  1

     - 18 rules predicting the under-sized class.
     - 47 rules predicting the over-sized class.
     - As the imbalance level increases, the accuracy of the over-general classifiers increases too. Then, they become stronger in the population.
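Why over-general rules gain accuracy with the imbalance level can be illustrated with a small calculation. The sketch assumes a rule that covers the whole feature space and predicts the majority class, and that majority and minority instances appear in a 2^i : 1 ratio (which follows from the chk counts if both classes occupy the same number of regions). The exponent `NU = 10` is an assumption; the slides do not state the value of ν.

```python
NU = 10  # assumed fitness exponent; not given in the slides


def overgeneral_accuracy(i: int) -> float:
    """Accuracy of a rule covering everything and predicting the majority class,
    assuming a majority:minority ratio of 2**i : 1."""
    ratio = 2 ** i
    return ratio / (ratio + 1)


for level in range(8):
    acc = overgeneral_accuracy(level)
    print(f"i={level}  acc={acc:.3f}  fitness={acc ** NU:.3f}")
```

For i=4 the accuracy is 16/17, about 0.94, in line with the Acc column of rules 19 to 64 in the table.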
  10. Methods to deal with imbalances
      1. Methods that act at the sampling level
         - Suppressing the bias towards the majority class.
         - Oversampling, undersampling, …: changing somehow the information available in the training dataset.
      2. Methods that act at the system level
         - Cost-sensitive learning approach: imbalanced datasets are a problem because rare classes tend to be more costly to learn.
  11. Class-sensitive accuracy
      - We compute the accuracy for each class:
          $acc_i = \frac{c_i}{exp_i}$
        where $c_i$ is the number of examples of class $i$ correctly classified and $exp_i$ is the number of examples of class $i$ covered by the rule.
      - The compound accuracy:
          $acc = \frac{1}{C_e} \sum_{i=1 \mid exp_i > 0}^{C} acc_i$
        where $C$ is the number of classes of the problem and $C_e$ is the number of different classes that a rule covers.
      - Changing accuracy also changes fitness. So, the fitness of individuals predicting instances of more than one class decreases very fast.
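The compound accuracy defined on this slide can be sketched directly from the two formulas. The function name and the list-based inputs (`correct[i]` for c_i, `exp[i]` for exp_i) are choices made for the illustration, and returning 0.0 for a rule with no experience is an assumption.

```python
def class_sensitive_accuracy(correct: list[int], exp: list[int]) -> float:
    """Average of the per-class accuracies c_i / exp_i over the classes
    the rule actually covers (exp_i > 0)."""
    covered = [i for i, e in enumerate(exp) if e > 0]
    if not covered:
        return 0.0  # assumption: a rule with no experience gets accuracy 0
    return sum(correct[i] / exp[i] for i in covered) / len(covered)


# Over-general rule predicting class 0: it covers 16 majority examples
# (all classified correctly) and 1 minority example (classified wrongly).
print(16 / 17)                                       # standard accuracy
print(class_sensitive_accuracy([16, 0], [16, 1]))    # class-sensitive accuracy

# Rule that covers a single class only.
print(class_sensitive_accuracy([10, 0], [10, 0]))
```

The over-general rule drops from roughly 0.94 to 0.5 under the class-sensitive measure, while a rule that covers only one class keeps accuracy 1.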
  12. Class-sensitive accuracy
      [Figure]
  13. Analyzing the population evolved in higher imbalance levels

      Rules for imbalance level i=4 using class-sensitive accuracy:

        Id  Condition                       Class  Acc   F     Num
        1   [0.485, 0.756] [0.483, 0.753]   0      1 -   1.00  34
        2   [0.000, 0.253] [0.502, 0.756]   0      1 -   1.00  34
        3   [0.252, 0.505] [0.750, 1.000]   0      1 -   1.00  32
        4   [0.753, 1.000] [0.749, 1.000]   0      1 -   1.00  31
        5   [0.737, 1.000] [0.238, 0.515]   0      1 -   1.00  29
        6   [0.499, 0.772] [0.000, 0.277]   0      1 -   1.00  27
        7   [0.000, 0.244] [0.000, 0.248]   0      1 -   1.00  27
        8   [0.225, 0.544] [0.223, 0.529]   0      1 -   1.00  27
        9   [0.252, 0.499] [0.000, 0.207]   1      - 1   1.00  21
        10  [0.752, 1.000] [0.000, 0.242]   1      - 1   1.00  18
        11  [0.751, 1.000] [0.502, 0.738]   1      - 1   1.00  15
        12  [0.506, 0.734] [0.761, 1.000]   1      - 1   1.00  15
        13  [0.510, 0.741] [0.252, 0.479]   1      - 1   1.00  13
        14  [0.000, 0.233] [0.757, 1.000]   1      - 1   1.00  12
        15  [0.000, 0.240] [0.254, 0.485]   1      - 1   1.00  11
        16  [0.252, 0.488] [0.516, 0.743]   1      - 1   1.00  6
        17  [0.252, 0.498] [0.516, 0.692]   1      - 1   1.00  6
        18  [0.000, 0.227] [0.757, 1.000]   1      - 1   1.00  4
        19  [0.504, 0.772] [0.000, 0.277]   0      1 -   1.00  4
        …

      - 8 rules predicting the over-sized class.
      - 8 rules predicting the under-sized class.
  14. Weighted class-sensitive accuracy
      - We compute the accuracy for each class:
          $acc_i = \frac{c_i}{exp_i}$
        where $c_i$ is the number of examples of class $i$ correctly classified and $exp_i$ is the number of examples of class $i$ covered by the rule.
      - The compound accuracy:
          $acc = \begin{cases} \frac{1}{C_e} \sum_{i=1 \mid exp_i > 0}^{C} acc_i & \text{if } \forall i : exp_i \ge \theta_{acc} \\ \frac{1}{C_e} \sum_{i=1 \mid exp_i > 0}^{C} acc_i \, w_i & \text{otherwise} \end{cases}$
        where $C_e$ is the number of different classes that a rule covers.
      - The weights:
          $w_i = \begin{cases} \frac{exp_i}{\theta_{acc}} & \text{if } 0 < exp_i < \theta_{acc} \\ \frac{C_e \cdot \theta_{acc} - \sum_{j=1 \mid 0 < exp_j < \theta_{acc}}^{C} exp_j}{C_{ee} \cdot \theta_{acc}} & \text{if } exp_i \ge \theta_{acc} \end{cases}$
        where $C_{ee}$ is the number of experienced classes and $\theta_{acc}$ is the threshold below which a class is inexperienced.
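A minimal sketch of the weighted compound accuracy, following the piecewise definitions on this slide. The function name, the list-based inputs, the value `theta_acc=10` used in the example, and the handling of a rule with no experience at all are assumptions made for illustration.

```python
def weighted_class_sensitive_accuracy(correct, exp, theta_acc):
    """Compound accuracy where classes with little experience (0 < exp_i < theta_acc)
    are down-weighted and the remaining weight goes to the experienced classes."""
    covered = [i for i, e in enumerate(exp) if e > 0]
    if not covered:
        return 0.0  # assumption: no experience at all gives accuracy 0
    c_e = len(covered)
    acc = {i: correct[i] / exp[i] for i in covered}

    inexperienced = [i for i in covered if exp[i] < theta_acc]
    if not inexperienced:
        # All covered classes are experienced: plain class-sensitive accuracy.
        return sum(acc.values()) / c_e

    experienced = [i for i in covered if exp[i] >= theta_acc]
    weights = {i: exp[i] / theta_acc for i in inexperienced}
    if experienced:
        c_ee = len(experienced)
        shared = (c_e * theta_acc - sum(exp[i] for i in inexperienced)) / (c_ee * theta_acc)
        for i in experienced:
            weights[i] = shared
    return sum(acc[i] * weights[i] for i in covered) / c_e


# Rule predicting class 0 that has seen 40 examples of class 0 (all correct)
# and only 2 examples of class 1, with an assumed threshold of 10.
print(weighted_class_sensitive_accuracy([40, 0], [40, 2], theta_acc=10))
# Same rule once class 1 is experienced as well.
print(weighted_class_sensitive_accuracy([40, 0], [40, 20], theta_acc=10))
```

With only two minority examples seen, the rule is penalised mildly (0.9 rather than the 0.5 of the unweighted measure); once the minority class is experienced, the result falls back to the unweighted class-sensitive accuracy. When at least one covered class is experienced, the weights sum to C_e, so the result remains a weighted average of the per-class accuracies.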
  15. Weighted class-sensitive accuracy
      [Figure]
  16. Conclusions
      - The class imbalance problem has proved to be a real problem for UCS.
        - For highly unbalanced datasets, overgeneral rules interfere with specific rules covering the minority class regions.
      - We proposed a fitness adaptation based on class-sensitive accuracy to diminish the generalization pressure that the GA exerts (guided by fitness).
        - UCS can discover the right boundaries, but tends to leave some regions of the feature space uncovered.
      - A weighted accuracy function was proposed to improve the coverage of the method.
  17. Further Work
      - Enhance the study by analyzing the contribution of each complexity factor.
      - Introduce new problems to the analysis.
      - Use real-world datasets to validate the results.
      - Test new strategies that act at the sampling level (to appear at IWLCS'05).
      - How can the new strategies deal with:
        - Noise
        - Scarcity
  18. Thanks for your attention
