Learning Classifier Systems for Class Imbalance Problems

Ester Bernadó-Mansilla analyzes the behavior of LCS on extreme class imbalance problems

Transcript

  • 1. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla Research Group in Intelligent Systems Enginyeria i Arquitectura La Salle Universitat Ramon Llull Barcelona, Spain
  • 2. Aim: Enhance the applicability of LCSs to knowledge discovery from datasets, focusing on classification problems in real-world domains.
  • 3. Framework model: LCS + Dataset → estimated performance.
       LCS: evolutionary pressures, interpretability, domain of applicability.
       Dataset: representativity of the target concept, geometrical complexity, class imbalance, noise.
  • 4. Class Imbalance. When one class is represented by a small number of examples compared to the other class/es. Usually the class that describes the circumscribed concept (the positive class) is the minority class. Where? Rare medical diagnoses, fraud detection, oil spills in satellite images.
  • 5. Class Imbalance and Classifiers. Is there a bias towards the majority class? Probably, because most classifier schemes are trained to minimize the global error. As a result, they classify accurately the examples from the majority class and tend to misclassify the examples of the minority class, which are often those representing the target concept.
  • 6. Measures of Performance. Confusion matrix:

                        Predicted A            Predicted B
       Actual A         true positive (TP)     false negative (FN)
       Actual B         false positive (FP)    true negative (TN)

       Accuracy = (TP + TN) / (TP + FN + FP + TN)
       TP rate  = TP / (TP + FN)
       TN rate  = TN / (TN + FP)
       ROC curves.
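The bias described on the previous slide is easy to see with these measures. Below is a minimal sketch (illustrative, not from the slides) that computes accuracy, TP rate, and TN rate for a classifier that always predicts the majority class on a hypothetical 10:1 dataset of 150 negative and 15 positive examples:

```python
# Minimal sketch: confusion-matrix metrics for an "always predict majority"
# classifier on an illustrative 10:1 dataset (150 negatives, 15 positives).
def metrics(tp, fn, fp, tn):
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    tp_rate = tp / (tp + fn)   # recall on the minority (positive) class
    tn_rate = tn / (tn + fp)   # recall on the majority (negative) class
    return accuracy, tp_rate, tn_rate

# Every positive example is misclassified, every negative one is correct.
acc, tpr, tnr = metrics(tp=0, fn=15, fp=0, tn=150)
print(f"accuracy={acc:.2%}, TP rate={tpr:.2%}, TN rate={tnr:.2%}")
# accuracy=90.91%, TP rate=0.00%, TN rate=100.00%
```

Global accuracy looks excellent even though the concept class is never recognized, which is exactly why the per-class rates and ROC curves are preferred here.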
  • 7. The Higher the Class Imbalance, the Higher the Bias?

       Dataset 1: concept: 15, counterpart: 150, ratio 10:1
       Dataset 2: concept: 15, counterpart: 45, ratio 3:1
  • 8. XCS. [Diagram: XCS maintains a set of rules; a search component (genetic algorithms) and an update component (reinforcement learning) act on the rule set. The environment (the dataset) provides an input, XCS predicts a class, and a reward is returned.]
  • 9. Our Approach with XCS: bounding XCS's parameters for unbalanced datasets; online identification of small disjuncts; adaptation of parameters for the discovery of small disjuncts.
  • 10. XCS's Behavior in Unbalanced Datasets. [Plots: performance on the unbalanced 11-multiplexer problem for ir=16:1, ir=32:1, and ir=64:1.]
  • 11. XCS's Population. Most numerous rules at ir=128:1:

       Classifier       P            Error   F      Num
       ###########:0    1000         0.12    0.98   385
       ###########:1    1.2 · 10⁻⁴   0.074   0.98   366

       Annotations on the slide: estimated prediction 992.24; estimated error 15.38 and 7.75; these are overgeneral classifiers with too-high numerosity and high fitness. Test examples are classified as belonging to the majority class.
  • 12. How Imbalance Affects XCS: the classifier's error; the stability of the prediction and error estimates; occurrence-based reproduction.
  • 13. Classifier's Error in Unbalanced Datasets. Will an overgeneral classifier be detected as inaccurate if the imbalance ratio is high? The bound for an inaccurate classifier is ε > ε₀. Given the estimated prediction and error,

       P = P_c(cl) · R_max + (1 − P_c(cl)) · R_min
       ε = |P − R_max| · P_c(cl) + |P − R_min| · (1 − P_c(cl))

       we derive

       −ε₀ · p² + 2p · (R_max − ε₀) − ε₀ > 0,   where p = |¬C| / |C|

       For R_max = 1000 and ε₀ = 1 we get the maximum imbalance ratio: ir_max = 1998.
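As a numeric sanity check (not part of the slides), the sketch below uses the two formulas above with R_min = 0 and P_c(cl) = p/(1+p), which gives ε = 2·R_max·p/(1+p)² for a fully overgeneral classifier, and solves for the largest imbalance ratio p at which ε still exceeds ε₀:

```python
import math

# Sanity check for the bound on slide 13 (a sketch, not the authors' code).
# Error of a fully overgeneral classifier, assuming R_min = 0 and that the
# classifier is correct with probability p/(1+p), where p is the imbalance ratio.
def overgeneral_error(p, r_max=1000.0):
    return 2.0 * r_max * p / (1.0 + p) ** 2

# The classifier looks inaccurate while error(p) > eps0; the largest such p is
# the positive root of  eps0*p^2 - 2*(r_max - eps0)*p + eps0 = 0.
def max_imbalance_ratio(r_max=1000.0, eps0=1.0):
    a, b, c = eps0, -2.0 * (r_max - eps0), eps0
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

print(max_imbalance_ratio())           # ~1998.0, the slide's ir_max
print(overgeneral_error(1998) > 1.0)   # False: beyond ir_max the error looks negligible
```

With R_max = 1000 and ε₀ = 1 this reproduces the slide's ir_max = 1998: beyond that ratio the overgeneral classifier no longer appears inaccurate to XCS.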
  • 14. Prediction and Error Estimates and the Learning Rate. [Plots: evolution of the prediction and error estimates of classifier ###########:0 at ir=128:1, for β=0.2 and β=0.002.]
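Slide 14's plots contrast β = 0.2 with β = 0.002. The quick simulation below (my own illustration with an assumed Bernoulli reward stream, not the original experiment) shows why: with a large β each rare minority-class example pulls the prediction estimate far from its expected value of R_max·ir/(ir+1) ≈ 992, while a small β keeps it stable.

```python
import random

# Sketch: stability of the Widrow-Hoff prediction estimate under ir=128:1
# for classifier ###########:0 (reward 1000 with prob. 128/129, 0 otherwise).
# Purely illustrative, not the slide's experiment.
def simulate_prediction(beta, steps=50_000, ir=128, r_max=1000.0, seed=1):
    rng = random.Random(seed)
    p, trace = r_max, []
    for _ in range(steps):
        reward = r_max if rng.random() < ir / (ir + 1) else 0.0
        p += beta * (reward - p)      # standard Widrow-Hoff update
        trace.append(p)
    tail = trace[steps // 2:]         # discard the transient
    return min(tail), max(tail)

print(simulate_prediction(beta=0.2))     # wide oscillations of the estimate
print(simulate_prediction(beta=0.002))   # estimate stays close to its expectation
```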
  • 15. Occurrence-based Reproduction. Probability of occurrence (p_occ), given ir = maj/min. In the balanced case: p_occ(###########:0) = p_occ(###########:1) = 1/2, and p_occ(0000#######:0) = p_occ(0001#######:1) = 1/32. [Plot: probability of occurrence (0 to 0.6) versus imbalance ratio (1 to 256) for the overgeneral classifiers ###########:0 and ###########:1 and for the specific classifiers 00000######:0 and 00001######:1.]
  • 16. Occurrence-based Reproduction. Probability of reproduction (p_GA):

       p_GA = 1 / T_GA,   where T_GA ≈ θ_GA if T_occ < θ_GA, and T_GA ≈ T_occ otherwise.

       With θ_GA = 20 (and assuming non-overlapping niches): T_GA(###########:0) ≈ θ_GA, while T_GA(0000#######:0) ≈ T_occ.
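To make the effect concrete, here is a small sketch (my own illustration, not the authors' code) comparing the expected reproduction period of a majority niche and of a minority niche as the imbalance ratio grows, using T_GA = max(θ_GA, T_occ) with T_occ = 1/p_occ; the niche occurrence probabilities used in the loop are assumptions chosen only to mimic the multiplexer setting.

```python
# Sketch: reproduction period per niche under imbalance (illustrative assumptions:
# non-overlapping niches, T_occ = 1 / p_occ, and T_GA = max(theta_GA, T_occ)).
THETA_GA = 20

def reproduction_period(p_occ, theta_ga=THETA_GA):
    t_occ = 1.0 / p_occ            # expected steps between niche activations
    return max(theta_ga, t_occ)    # the GA cannot fire more often than theta_GA allows

for ir in (1, 16, 64, 256):
    p_majority_niche = 0.5 * ir / (ir + 1)     # e.g. an overgeneral majority rule
    p_minority_niche = (1.0 / 16) / (ir + 1)   # a specific minority niche (assumed)
    print(ir,
          reproduction_period(p_majority_niche),
          reproduction_period(p_minority_niche))
# The minority niche's period grows roughly linearly with ir, so it receives far
# fewer reproductive events than the majority niches.
```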
  • 17. Guidelines for Parameter Tuning.

       R_max and ε₀ set the threshold that separates negligible noise from the imbalance ratios the system can detect.
       β determines the size of the moving window. The window should be large enough to include examples from both classes: β = k · f_min / f_maj.
       θ_GA can counterbalance the reproduction opportunities of the most frequent (majority) and least frequent (minority) niches: θ_GA = k' · 1 / f_min.
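To show how the guidelines translate into numbers, here is a hypothetical helper (the function name, the constants k and k', and the default values are my assumptions, not from the slides) that derives β and θ_GA from the class counts of a training set:

```python
# Hypothetical helper for the slide's tuning guidelines: beta proportional to
# f_min / f_maj and theta_GA proportional to 1 / f_min. The proportionality
# constants k and k_prime are illustrative assumptions.
def tune_xcs_parameters(n_minority, n_majority, k=0.2, k_prime=1.0):
    n_total = n_minority + n_majority
    f_min = n_minority / n_total
    f_maj = n_majority / n_total
    beta = k * f_min / f_maj      # slower estimate updates when the minority class is rare
    theta_ga = k_prime / f_min    # longer GA period so minority niches can catch up
    return beta, theta_ga

beta, theta_ga = tune_xcs_parameters(n_minority=15, n_majority=150)
print(beta, theta_ga)   # beta = 0.02, theta_GA = 11.0 for a 10:1 dataset
```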
  • 18. XCS with Parameter Tuning. [Plots: XCS with parameter tuning versus XCS with standard settings on the unbalanced multiplexer, for ir=16:1, 32:1, 64:1, and 256:1.]
  • 19. XCS Tuning for Real-world Datasets. How can we estimate the niche frequency? One option is to estimate it from the ratio of majority-class to minority-class instances. Problem: this may not be related to the distribution of niches in the feature space. Instead, approach it as a small disjuncts problem.
  • 20. Online Identification of Small Disjuncts. We search for regions that promote overgeneral classifiers. Estimate ir_cl from the classifier's experience on each class: ir_cl = exp_max / exp_min. Adapt β and θ_GA according to ir_cl. Example from the slide's illustration: ir_cl = 20 / 4 = 5.
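A rough sketch of how such online adaptation could look (my own illustration of the idea, not the authors' implementation; the specific adaptation rules reuse slide 17's guidelines with assumed constants):

```python
# Illustrative sketch of online small-disjunct detection: each classifier tracks
# how often it matches examples of each class and derives its local imbalance
# ratio ir_cl = exp_max / exp_min, which then drives its beta and theta_GA.
class ClassifierStats:
    def __init__(self, beta=0.2, theta_ga=25):
        self.experience = {}          # class label -> number of matched examples
        self.beta = beta
        self.theta_ga = theta_ga

    def update(self, true_class):
        self.experience[true_class] = self.experience.get(true_class, 0) + 1
        counts = sorted(self.experience.values())
        if len(counts) >= 2 and counts[0] > 0:
            ir_cl = counts[-1] / counts[0]         # exp_max / exp_min
            # Assumed adaptation rules, in the spirit of slide 17's guidelines:
            self.beta = min(0.2, 0.2 / ir_cl)      # slower estimates in unbalanced regions
            self.theta_ga = max(25, 25 * ir_cl)    # longer GA period there as well

cl = ClassifierStats()
for label in [0] * 20 + [1] * 4:
    cl.update(label)
print(cl.beta, cl.theta_ga)   # ir_cl = 20/4 = 5 -> beta = 0.04, theta_GA = 125
```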
  • 21. Online Parameter Adaptation. [Plot: performance with online parameter adaptation at ir=256:1.]
  • 22. What about UCS? A supervised XCS: it needs less exploration, avoids XCS's fitness dilemma, and is more robust to parameter settings. Overgeneral classifiers also tend to take over the population; their probability of occurrence depends on the imbalance ratio, and the effect is partially minimized by fitness sharing.
  • 23. What about UCS? [Plots: UCS's behavior at ir=256:1 and ir=512:1.]
  • 24. Are LCSs more error-prone to class imbalance than other classifier schemes? TP rate per dataset (mean ± std. dev.):

       Dataset    C4.5               SMO                XCS
       Bal2c1      0,00% ±  0,00%     0,00% ±  0,00%     0,00% ±  0,00%
       Bal2c2     81,65% ±  6,83%    93,72% ±  4,64%    81,96% ±  6,00%
       Bal2c3     81,90% ±  6,04%    93,77% ±  5,59%    83,99% ±  6,88%
       bpa        42,95% ± 14,09%     0,00% ±  0,00%    61,38% ±  9,10%
       gls2c1     80,00% ± 42,16%     0,00% ±  0,00%    50,00% ± 52,70%
       gls2c2     35,00% ± 47,43%    15,00% ± 33,75%    55,00% ± 49,72%
       gls2c3     30,00% ± 42,16%     0,00% ±  0,00%     5,00% ± 15,81%
       gls2c4     75,00% ± 32,63%    81,67% ± 25,40%    81,67% ± 25,40%
       gls2c5     77,14% ± 16,77%    10,00% ±  9,64%    84,29% ± 14,21%
       gls2c6     59,82% ± 15,13%     0,00% ±  0,00%    81,79% ± 13,95%
       h-s        75,83% ± 13,29%    80,00% ±  7,03%    80,00% ±  9,78%
       pim        55,37% ± 13,27%    53,38% ±  6,42%    55,93% ±  9,75%
       tao        95,23% ±  2,14%    84,11% ±  6,17%    92,58% ±  5,72%
       thy2c1     90,00% ± 16,10%    76,67% ± 22,50%    90,00% ± 16,10%
       thy2c2     94,17% ± 12,45%    54,17% ± 24,92%    90,83% ± 14,93%
       thy2c3     90,95% ± 10,34%    33,81% ± 21,35%    90,71% ±  8,05%
       wav2c1     75,74% ±  4,06%    88,51% ±  3,20%    87,24% ±  3,43%
       wav2c2     72,34% ±  3,89%    84,57% ±  4,05%    78,72% ±  2,57%
       wab2c3     77,64% ±  2,38%    89,97% ±  3,48%    87,86% ±  3,65%
       wbdc       92,95% ±  3,42%    95,42% ±  5,36%    95,83% ±  5,89%
       wdbc       92,47% ±  5,09%    94,81% ±  2,71%    93,83% ±  6,37%
       wine2c1    89,00% ± 16,63%   100,00% ±  0,00%   100,00% ±  0,00%
       wine2c2    95,00% ±  8,05%    98,33% ±  5,27%    98,33% ±  5,27%
       wine2c3    90,18% ± 11,70%    97,14% ±  6,02%    98,57% ±  4,52%
       wpbc       41,00% ± 12,87%     9,50% ± 17,07%    30,50% ± 24,99%
  • 25. How can we Minimize the Effects of Small Disjuncts?
       Resampling the dataset (addresses small disjuncts):
         Classical methods: random oversampling, random undersampling (see the sketch below).
         Heuristic methods: Tomek links, CNN, one-sided selection, SMOTE.
         Cluster-based oversampling (assumes that clusterization will find the small disjuncts and match the classifier's approximation).
       Cost-sensitive classifiers.
       Could XCS benefit from the online identification of small disjuncts?
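As a minimal illustration of the simplest remedy above, here is a random-oversampling sketch (my own code, unrelated to the cited heuristic methods) that duplicates minority-class examples until the classes are balanced:

```python
import random

# Minimal random-oversampling sketch: duplicate minority-class examples
# (sampled with replacement) until both classes have the same count.
def random_oversample(examples, labels, seed=0):
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(group) for group in by_class.values())
    xs, ys = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for x in group + extra:
            xs.append(x)
            ys.append(y)
    return xs, ys

X = [[i] for i in range(11)]
y = [0] * 10 + [1]              # 10:1 imbalance
Xb, yb = random_oversample(X, y)
print(sum(yb), len(yb))         # 10 minority copies, 20 examples in total
```

Note that this only balances the global class frequencies; it does not target specific small disjuncts, which is precisely the limitation the heuristic and cluster-based methods try to address.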
  • 26. Domains of Applicability. Should we use some counterbalancing scheme? Which learning scheme should we use? Is there a combination of counterbalancing scheme + learner that beats all others? How can we detect the presence of small disjuncts? Are there other complexity factors mixed up with the small disjuncts problem?
  • 27. Domains of Applicability. [Diagram: Dataset → dataset characterization → suggested approach (resampling, classifier, or resampling + classifier) → learn it → prediction. The characterization describes the type of dataset: geometrical distribution of classes, possible presence of small disjuncts, other complexity factors. Where are LCSs placed?]
  • 28. Future Directions. Potential benefit of XCS to discover small disjuncts and learn from them online. Further analyze UCS. How do LCSs perform with respect to other classifiers on unbalanced datasets? Measures for small disjuncts identification, and for other possible complexity factors. What is noise and what is a small disjunct? In which cases is an LCS applicable?
  • 29. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla Research Group in Intelligent Systems Enginyeria i Arquitectura La Salle Universitat Ramon Llull Barcelona, Spain
