SlideShare a Scribd company logo
Learning Classifier Systems
   for Class Imbalance
         Problems

           Ester Bernadó-Mansilla

    Research Group in Intelligent Systems
      Enginyeria i Arquitectura La Salle
          Universitat Ramon Llull
             Barcelona, Spain
Aim




                         Enhance the applicability of LCSs
                       to knowledge discovery from datasets

                                        Classification problems
                                         Real-world domains




Learning Classifier Systems for Class Imbalance Problems          Ester Bernadó-Mansilla
Framework




                                                                  model
                                                      LCS
              Dataset
                                                                    +
                                                               estimated
                                                              performance
       • Representativity of the target
                                                            • Evolutionary pressures
       concept

                                                            • Interpretability
       • Geometrical complexity

                                                            • Domain of applicability
       • Class imbalance

       • Noise
Learning Classifier Systems for Class Imbalance Problems                  Ester Bernadó-Mansilla
Class Imbalance


     When one class is represented by a small number of

     examples, compared to other class/es.

     Usually the class of that describes the circumscribed

     concept (positive class) is the minority class

     Where?

            Rare medical diagnoses
       
            Fraud detection
       
            Oil spills in satellite images
       




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Class Imbalance and Classifiers


     Is there a bias towards the majority class?




     Probably, because…

            Most classifier schemes are trained to minimize the global error
       



     As a result

            They classify accurately the examples from the majority class
       
            They tend to misclassify the examples of the minority class,
       
            which are often those representing the target concept.




Learning Classifier Systems for Class Imbalance Problems        Ester Bernadó-Mansilla
Measures of Performance


                                                       Confusion matrix

                                                               Prediction
                                                           A                   B
          Actual            A              true positive (TP)         false negative (FN)
                            B              false positive (FP)         true negative (TN)




                                Accuracy = (TP+TN)/(TP+FN+FP+TN)

                                TN rate = TN / (TN + FP)

                                TP rate = TP / (FN + TP)

                                ROC curves

Learning Classifier Systems for Class Imbalance Problems                             Ester Bernadó-Mansilla
The Higher Class Imbalance: the
                    Higher Bias?

                          Dataset 1                          Dataset 2

                       concept:      15                    concept:     15
                       counterpart: 150                    counterpart: 45
                       ratio:       10:1                   ratio:       3:1




Learning Classifier Systems for Class Imbalance Problems                 Ester Bernadó-Mansilla
XCS

                                                                           XCS

                                                                                     class
                      input                                Set of
                                                           Rules

                                                                         update
                                      search

                                          Genetic                   Reinforcement
                                         Algorithms                   Learning


                                                                            reward

                                                           Environment




                                                             Dataset




Learning Classifier Systems for Class Imbalance Problems                                     Ester Bernadó-Mansilla
Our Approach with XCS


     Bounding XCS’s parameters for unbalanced datasets




     Online identification of small disjuncts




     Adaptation of parameters for the discovery of small

     disjuncts




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
XCS’s Behavior in Unbalanced
                     Datasets

                       Unbalanced 11-multiplexer problem




                                                    ir=16:1   ir=32:1   ir=64:1


Learning Classifier Systems for Class Imbalance Problems                          Ester Bernadó-Mansilla
XCS’s Population



                                        Most numerous rules, ir=128:1


            Classifier                    P                Error         F          Num
         ###########:0                  1000                0.12       0.98         385
                                       1.2 10-4
         ###########:1                                     0.074       0.98         366




                                        estimated          estimated                   too high
                                                                            high
                                        prediction:
          overgeneral                                        error:                   numerosity
                                                                          fitness
                                          992.24
          classifiers                                        15.38
                                           7.75

          Test examples are classified as belonging to the majority class



Learning Classifier Systems for Class Imbalance Problems                                  Ester Bernadó-Mansilla
How Imbalance Affects XCS


         Classifier’s error
   



         Stability of prediction and error estimates
   



         Occurrence-based reproduction
   




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Classifier’s Error in Unbalanced
                     Datasets

        Will an overgeneral classifier be detected as inaccurate if the
  
        imbalance ratio is high?

         Bound for inaccurate classifier:                       !quot;!0
          Given the estimated prediction and error:

                                          P = Pc (cl ) Rmax + (1 ! Pc (cl )) Rmin
                                          quot;=| P ! Rmax | Pc (cl )+ | P ! Rmin | (1 ! Pc (cl ))
          We derive:

                     # quot;o p 2 + 2 p ( Rmax # quot;0 )# quot;0 ! 0
          where                                                                                  !quot;!0
                               p =!C / C
          For
                        Rmax = 1000 !0 = 1
          we get maximum imbalance ratio:

                        irmax = 1998
Learning Classifier Systems for Class Imbalance Problems                                            Ester Bernadó-Mansilla
Prediction and Error Estimates and
                     Learning Rate
                 ir=128:1, ###########:0
                                                            Error
                                       Prediction
       β=0.2
       β=0.002




Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Occurrence-based Reproduction


        Probability of occurrence (pocc)

        Given ir=maj/min:
                                                                                          0,6

           Classifier            poccB            poccI
                                                                                          0,5
                                  1/2             1/2
       ########### :0




                                                              probability of occurrence
                                  1/2             1/2
       ########### :1                                                                     0,4



                                                                                          0,3
       0000#######:0             1/32
                                                                                          0,2


       0001#######:1             1/32                                                     0,1



                                                                                           0
                                                                                                1    2    4   8    16   32   64   128   256

                                                                                                             imbalance ratio
                                               22ir
                                                     p occB                                     00001######:1             00000######:0
                                              ir + 1                                            ###########:0             ###########:1




Learning Classifier Systems for Class Imbalance Problems                                                                     Ester Bernadó-Mansilla
Occurrence-based Reproduction


      Probability of reproduction (pGA)
 

                                1
                      pGA =
                               TGA
                                                  if Tocc < % GA
                                  #% GA
                      where TGA $ quot;
                                  !Tocc           otherwise

      With θGA=20:
 
                                                                            GA
                                                                   Tocc
                                                                                 …
        T (# # # # # # # # # # #: 0) ! quot;
         GA                              GA
                                                                     θGA
                                                                                     Tocc
                                                                                                           GA
                                                                                       …
       T (0000# # # # # # #: 0) ! T 1
        GA                         occ
                                                                      θGA
1 Assuming   non-overlapping


 Learning Classifier Systems for Class Imbalance Problems                                   Ester Bernadó-Mansilla
Guidelines for Parameter Tuning


     Rmax and є0 determine the threshold between negligible noise and

     imbalance ratio

     β determines the size of the moving window. The window should be

     high enough to allow computing examples from both classes:
                                                                 f min
                                                       ! =k
                                                                 f maj
     θGA can counterbalance the reproduction opportunities of most frequent

     (majority) and least frequent niches (minority):

                                                                  1
                                                    ! GA = k '
                                                                 f min


Learning Classifier Systems for Class Imbalance Problems                 Ester Bernadó-Mansilla
XCS with Parameters Tuning


                                                                     XCS with parameter tuning
              XCS with standard settings




ir=16:1           ir=32:1               ir=64:1                ir=64:1        ir=256:1



    Learning Classifier Systems for Class Imbalance Problems                       Ester Bernadó-Mansilla
XCS Tuning for Real-world Datasets


     How we can estimate the niche frequency?



            Estimate from the ratio of majority class instances and minority
       
            class instances

            Problem:
       


               • This may not be related to the distribution of niches in the feature
                 space


            Take the approach to the small disjuncts problem
       




Learning Classifier Systems for Class Imbalance Problems                 Ester Bernadó-Mansilla
Online Identification of Small Disjuncts



     We search for regions that promote

     overgeneral classifiers

     Estimate ircl based on the classifier’s

     experience on each class:

                                        exp max
                               ircl =
                                        exp min


     Adapt β and θGA according to ircl



                                                           ircl = 20 / 4




Learning Classifier Systems for Class Imbalance Problems                   Ester Bernadó-Mansilla
Online Parameter Adaptation


                                                           ir=256:1




Learning Classifier Systems for Class Imbalance Problems       Ester Bernadó-Mansilla
What about UCS?


             Supervised XCS:
       
                   Needs less exploration
               


             Avoids XCS’s fitness dilemma
       


             More robust to parameter settings
       


             Overgeneral classifiers also tend to overcome the
       
             population
                   Their probability of occurrence depends on the imbalance ratio
               
                   Partially minimized with fitness sharing
               




Learning Classifier Systems for Class Imbalance Problems               Ester Bernadó-Mansilla
What about UCS?


                                                           ir=256:1
                                     ir=512:1




Learning Classifier Systems for Class Imbalance Problems       Ester Bernadó-Mansilla
Are LCSs more error-prone to class imbalance
                        than other classifier schemes?
                                             C4.5                            SMO                      XCS
TP rate          Bal2c1              0,00%   ±             0,00%    0,00%     ±    0,00%     0,00%     ±         0,00%

                 Bal2c2             81,65%   ±             6,83%    93,72%    ±    4,64%     81,96%    ±         6,00%

                 Bal2c3             81,90%   ±             6,04%    93,77%    ±    5,59%     83,99%    ±         6,88%

                 bpa                42,95%   ±         14,09%       0,00%     ±    0,00%     61,38%    ±         9,10%

                 gls2c1             80,00%   ±         42,16%       0,00%     ±    0,00%     50,00%    ±        52,70%

                 gls2c2             35,00%   ±         47,43%       15,00%    ±    33,75%    55,00%    ±        49,72%

                 gls2c3             30,00%   ±         42,16%       0,00%     ±    0,00%     5,00%     ±        15,81%

                 gls2c4             75,00%   ±         32,63%       81,67%    ±    25,40%    81,67%    ±        25,40%

                 gls2c5             77,14%   ±         16,77%       10,00%    ±    9,64%     84,29%    ±        14,21%

                 gls2c6             59,82%   ±         15,13%       0,00%     ±    0,00%     81,79%    ±        13,95%

                 h-s                75,83%   ±         13,29%       80,00%    ±    7,03%     80,00%    ±         9,78%

                 pim                55,37%   ±         13,27%       53,38%    ±    6,42%     55,93%    ±         9,75%

                 tao                95,23%   ±             2,14%    84,11%    ±    6,17%     92,58%    ±         5,72%

                 thy2c1             90,00%   ±         16,10%       76,67%    ±    22,50%    90,00%    ±        16,10%

                 thy2c2             94,17%   ±         12,45%       54,17%    ±    24,92%    90,83%    ±        14,93%

                 thy2c3             90,95%   ±         10,34%       33,81%    ±    21,35%    90,71%    ±         8,05%

                 wav2c1             75,74%   ±             4,06%    88,51%    ±    3,20%     87,24%    ±         3,43%

                 wav2c2             72,34%   ±             3,89%    84,57%    ±    4,05%     78,72%    ±         2,57%

                 wab2c3             77,64%   ±             2,38%    89,97%    ±    3,48%     87,86%    ±         3,65%

                 wbdc               92,95%   ±             3,42%    95,42%    ±    5,36%     95,83%    ±         5,89%

                 wdbc               92,47%   ±             5,09%    94,81%    ±    2,71%     93,83%    ±         6,37%

                 wine2c1            89,00%   ±         16,63%      100,00%    ±    0,00%    100,00%    ±         0,00%

                 wine2c2            95,00%   ±             8,05%    98,33%    ±    5,27%     98,33%    ±         5,27%

                 wine2c3            90,18%   ±         11,70%       97,14%    ±    6,02%     98,57%    ±         4,52%
Learning Classifier Systems for Class Imbalance Problems                                               Ester Bernadó-Mansilla
                 wpbc               41,00%   ±         12,87%       9,50%     ±    17,07%    30,50%    ±        24,99%
How can we Minimize the Effects of
                     Small Disjuncts?


     Resampling the dataset:
                                                          Addresses small
                                                           disjuncts
            Classical methods:
       
               • Random oversampling
               • Random undersampling                          Assumes that
                                                               clusterization will
            Heuristic methods:
       
                                                               find small
               •   Tomek links
                                                               disjuncts and
               •   CNN                                         match classifier’s
               •   One-sided selection                         approximation
               •   Smote
            Cluster-based oversampling
                                                                 Could XCS
                                                                  benefit from the
                                                                  online
     Cost-sensitive classifiers

                                                                  identification of
                                                                  small disjuncts?
Learning Classifier Systems for Class Imbalance Problems               Ester Bernadó-Mansilla
Domains of Applicability


                   Should we use some counterbalancing scheme?
             


                   Which learning scheme should we use?
             


                   Is there a combination of counterbalancing
             
                   scheme+learner that beats all others?

                   How can we know the presence of small
             
                   disjuncts?

                   Are there other complexity factors mixed up with
             
                   the small disjuncts problem?
Learning Classifier Systems for Class Imbalance Problems   Ester Bernadó-Mansilla
Domains of Applicability


                                                           Resampling/
                                              Learn it!    Classifier/
                                                           Resampling+classifier
                                                                                     Where are
                                                                                       LCSs
                                                                                      placed?

          Dataset                  Dataset
                                                                             Suggested
                                                            Prediction
                               characterization
                                                                             approach


                                                Type of dataset:
                                       Geometrical distribution of classes
                                      Possible presence of small disjuncts
                                           Other complexity factors


Learning Classifier Systems for Class Imbalance Problems                       Ester Bernadó-Mansilla
Future Directions


                   Potential benefit of XCS to discover small disjuncts
             
                     …and learn from it online

                   Further analyze UCS
             


                   How do LCSs perform w.r.t. other classifiers for unbalanced
             
                   datasets?

                   Measures for small disjuncts identification
             
                     … and other possible complexity factors

                   What is noise and what is a small disjunct?
             


                   In which cases a LCS is applicable?
             


Learning Classifier Systems for Class Imbalance Problems            Ester Bernadó-Mansilla
Learning Classifier Systems
   for Class Imbalance
         Problems

           Ester Bernadó-Mansilla

    Research Group in Intelligent Systems
      Enginyeria i Arquitectura La Salle
          Universitat Ramon Llull
             Barcelona, Spain

More Related Content

Similar to Learning Classifier Systems for Class Imbalance Problems

Structural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOAStructural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOAclima
 
GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced DatasetsGECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced DatasetsAlbert Orriols-Puig
 
Statistical classification: A review on some techniques
Statistical classification: A review on some techniquesStatistical classification: A review on some techniques
Statistical classification: A review on some techniques
Giorgos Bamparopoulos
 
Whitcher Ismrm 2009
Whitcher Ismrm 2009Whitcher Ismrm 2009
Whitcher Ismrm 2009
bwhitcher
 
Data mining project presentation
Data mining project presentationData mining project presentation
Data mining project presentationKaiwen Qi
 
It's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsIt's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithms
Brian Lange
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7
Roger Barga
 
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
Chakkrit (Kla) Tantithamthavorn
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
H K Yoon
 

Similar to Learning Classifier Systems for Class Imbalance Problems (10)

Structural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOAStructural Accuracy of Probabilistic Models in BOA
Structural Accuracy of Probabilistic Models in BOA
 
5 5 10
5 5 105 5 10
5 5 10
 
GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced DatasetsGECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
GECCO'2006: Bounding XCS’s Parameters for Unbalanced Datasets
 
Statistical classification: A review on some techniques
Statistical classification: A review on some techniquesStatistical classification: A review on some techniques
Statistical classification: A review on some techniques
 
Whitcher Ismrm 2009
Whitcher Ismrm 2009Whitcher Ismrm 2009
Whitcher Ismrm 2009
 
Data mining project presentation
Data mining project presentationData mining project presentation
Data mining project presentation
 
It's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithmsIt's Not Magic - Explaining classification algorithms
It's Not Magic - Explaining classification algorithms
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7
 
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
The Impact of Class Rebalancing Techniques on the Performance and Interpretat...
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 

More from Xavier Llorà

Meandre 2.0 Alpha Preview
Meandre 2.0 Alpha PreviewMeandre 2.0 Alpha Preview
Meandre 2.0 Alpha Preview
Xavier Llorà
 
Soaring the Clouds with Meandre
Soaring the Clouds with MeandreSoaring the Clouds with Meandre
Soaring the Clouds with Meandre
Xavier Llorà
 
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
Xavier Llorà
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using   Genetics-Based Machine LearningLarge Scale Data Mining using   Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learning
Xavier Llorà
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Xavier Llorà
 
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new TrendsScalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Xavier Llorà
 
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
Xavier Llorà
 
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...
Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...
Xavier Llorà
 
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...
A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...
Xavier Llorà
 
XCS: Current capabilities and future challenges
XCS: Current capabilities and future  challengesXCS: Current capabilities and future  challenges
XCS: Current capabilities and future challenges
Xavier Llorà
 
Negative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly DetectionNegative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly Detection
Xavier Llorà
 
Searle, Intentionality, and the Future of Classifier Systems
Searle, Intentionality, and the  Future of Classifier SystemsSearle, Intentionality, and the  Future of Classifier Systems
Searle, Intentionality, and the Future of Classifier Systems
Xavier Llorà
 
Computed Prediction: So far, so good. What now?
Computed Prediction:  So far, so good. What now?Computed Prediction:  So far, so good. What now?
Computed Prediction: So far, so good. What now?
Xavier Llorà
 
NIGEL 2006 welcome
NIGEL 2006 welcomeNIGEL 2006 welcome
NIGEL 2006 welcome
Xavier Llorà
 
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems TractableLinkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Xavier Llorà
 
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the CloudsMeandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Xavier Llorà
 
ZigZag: The Meandring Language
ZigZag: The Meandring LanguageZigZag: The Meandring Language
ZigZag: The Meandring Language
Xavier Llorà
 
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
Xavier Llorà
 
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Xavier Llorà
 
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Xavier Llorà
 

More from Xavier Llorà (20)

Meandre 2.0 Alpha Preview
Meandre 2.0 Alpha PreviewMeandre 2.0 Alpha Preview
Meandre 2.0 Alpha Preview
 
Soaring the Clouds with Meandre
Soaring the Clouds with MeandreSoaring the Clouds with Meandre
Soaring the Clouds with Meandre
 
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using   Genetics-Based Machine LearningLarge Scale Data Mining using   Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learning
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
 
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new TrendsScalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
Scalabiltity in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends
 
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
 
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...
Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...Towards a Theoretical  Towards a Theoretical  Framework for LCS  Framework fo...
Towards a Theoretical Towards a Theoretical Framework for LCS Framework fo...
 
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...
A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...A Retrospective Look at  A Retrospective Look at  Classifier System ResearchCl...
A Retrospective Look at A Retrospective Look at Classifier System ResearchCl...
 
XCS: Current capabilities and future challenges
XCS: Current capabilities and future  challengesXCS: Current capabilities and future  challenges
XCS: Current capabilities and future challenges
 
Negative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly DetectionNegative Selection for Algorithm for Anomaly Detection
Negative Selection for Algorithm for Anomaly Detection
 
Searle, Intentionality, and the Future of Classifier Systems
Searle, Intentionality, and the  Future of Classifier SystemsSearle, Intentionality, and the  Future of Classifier Systems
Searle, Intentionality, and the Future of Classifier Systems
 
Computed Prediction: So far, so good. What now?
Computed Prediction:  So far, so good. What now?Computed Prediction:  So far, so good. What now?
Computed Prediction: So far, so good. What now?
 
NIGEL 2006 welcome
NIGEL 2006 welcomeNIGEL 2006 welcome
NIGEL 2006 welcome
 
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems TractableLinkage Learning for Pittsburgh LCS: Making Problems Tractable
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
 
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the CloudsMeandre: Semantic-Driven Data-Intensive Flows in the Clouds
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
 
ZigZag: The Meandring Language
ZigZag: The Meandring LanguageZigZag: The Meandring Language
ZigZag: The Meandring Language
 
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
HUMIES 2007 Bronze Winner: Towards Better than Human Capability in Diagnosing...
 
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...
 
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...
 

Recently uploaded

GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 

Recently uploaded (20)

GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 

Learning Classifier Systems for Class Imbalance Problems

  • 1. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla Research Group in Intelligent Systems Enginyeria i Arquitectura La Salle Universitat Ramon Llull Barcelona, Spain
  • 2. Aim Enhance the applicability of LCSs to knowledge discovery from datasets Classification problems Real-world domains Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 3. Framework model LCS Dataset + estimated performance • Representativity of the target • Evolutionary pressures concept • Interpretability • Geometrical complexity • Domain of applicability • Class imbalance • Noise Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 4. Class Imbalance When one class is represented by a small number of  examples, compared to other class/es. Usually the class of that describes the circumscribed  concept (positive class) is the minority class Where?  Rare medical diagnoses  Fraud detection  Oil spills in satellite images  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 5. Class Imbalance and Classifiers Is there a bias towards the majority class?  Probably, because…  Most classifier schemes are trained to minimize the global error  As a result  They classify accurately the examples from the majority class  They tend to misclassify the examples of the minority class,  which are often those representing the target concept. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 6. Measures of Performance Confusion matrix Prediction A B Actual A true positive (TP) false negative (FN) B false positive (FP) true negative (TN) Accuracy = (TP+TN)/(TP+FN+FP+TN) TN rate = TN / (TN + FP) TP rate = TP / (FN + TP) ROC curves Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 7. The Higher Class Imbalance: the Higher Bias? Dataset 1 Dataset 2 concept: 15 concept: 15 counterpart: 150 counterpart: 45 ratio: 10:1 ratio: 3:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 8. XCS XCS class input Set of Rules update search Genetic Reinforcement Algorithms Learning reward Environment Dataset Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 9. Our Approach with XCS Bounding XCS’s parameters for unbalanced datasets  Online identification of small disjuncts  Adaptation of parameters for the discovery of small  disjuncts Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 10. XCS’s Behavior in Unbalanced Datasets Unbalanced 11-multiplexer problem ir=16:1 ir=32:1 ir=64:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 11. XCS’s Population Most numerous rules, ir=128:1 Classifier P Error F Num ###########:0 1000 0.12 0.98 385 1.2 10-4 ###########:1 0.074 0.98 366 estimated estimated too high high prediction: overgeneral error: numerosity fitness 992.24 classifiers 15.38 7.75 Test examples are classified as belonging to the majority class Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 12. How Imbalance Affects XCS Classifier’s error  Stability of prediction and error estimates  Occurrence-based reproduction  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 13. Classifier’s Error in Unbalanced Datasets Will an overgeneral classifier be detected as inaccurate if the  imbalance ratio is high? Bound for inaccurate classifier: !quot;!0 Given the estimated prediction and error: P = Pc (cl ) Rmax + (1 ! Pc (cl )) Rmin quot;=| P ! Rmax | Pc (cl )+ | P ! Rmin | (1 ! Pc (cl )) We derive: # quot;o p 2 + 2 p ( Rmax # quot;0 )# quot;0 ! 0 where !quot;!0 p =!C / C For Rmax = 1000 !0 = 1 we get maximum imbalance ratio: irmax = 1998 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 14. Prediction and Error Estimates and Learning Rate ir=128:1, ###########:0 Error Prediction β=0.2 β=0.002 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 15. Occurrence-based Reproduction Probability of occurrence (pocc) Given ir=maj/min: 0,6 Classifier poccB poccI 0,5 1/2 1/2 ########### :0 probability of occurrence 1/2 1/2 ########### :1 0,4 0,3 0000#######:0 1/32 0,2 0001#######:1 1/32 0,1 0 1 2 4 8 16 32 64 128 256 imbalance ratio 22ir p occB 00001######:1 00000######:0 ir + 1 ###########:0 ###########:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 16. Occurrence-based Reproduction Probability of reproduction (pGA)  1 pGA = TGA if Tocc < % GA #% GA where TGA $ quot; !Tocc otherwise With θGA=20:  GA Tocc … T (# # # # # # # # # # #: 0) ! quot; GA GA θGA Tocc GA … T (0000# # # # # # #: 0) ! T 1 GA occ θGA 1 Assuming non-overlapping Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 17. Guidelines for Parameter Tuning Rmax and є0 determine the threshold between negligible noise and  imbalance ratio β determines the size of the moving window. The window should be  high enough to allow computing examples from both classes: f min ! =k f maj θGA can counterbalance the reproduction opportunities of most frequent  (majority) and least frequent niches (minority): 1 ! GA = k ' f min Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 18. XCS with Parameters Tuning XCS with parameter tuning XCS with standard settings ir=16:1 ir=32:1 ir=64:1 ir=64:1 ir=256:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 19. XCS Tuning for Real-world Datasets How we can estimate the niche frequency?  Estimate from the ratio of majority class instances and minority  class instances Problem:  • This may not be related to the distribution of niches in the feature space Take the approach to the small disjuncts problem  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 20. Online Identification of Small Disjuncts We search for regions that promote  overgeneral classifiers Estimate ircl based on the classifier’s  experience on each class: exp max ircl = exp min Adapt β and θGA according to ircl  ircl = 20 / 4 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 21. Online Parameter Adaptation ir=256:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 22. What about UCS? Supervised XCS:  Needs less exploration  Avoids XCS’s fitness dilemma  More robust to parameter settings  Overgeneral classifiers also tend to overcome the  population Their probability of occurrence depends on the imbalance ratio  Partially minimized with fitness sharing  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 23. What about UCS? ir=256:1 ir=512:1 Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 24. Are LCSs more error-prone to class imbalance than other classifier schemes? C4.5 SMO XCS TP rate Bal2c1 0,00% ± 0,00% 0,00% ± 0,00% 0,00% ± 0,00% Bal2c2 81,65% ± 6,83% 93,72% ± 4,64% 81,96% ± 6,00% Bal2c3 81,90% ± 6,04% 93,77% ± 5,59% 83,99% ± 6,88% bpa 42,95% ± 14,09% 0,00% ± 0,00% 61,38% ± 9,10% gls2c1 80,00% ± 42,16% 0,00% ± 0,00% 50,00% ± 52,70% gls2c2 35,00% ± 47,43% 15,00% ± 33,75% 55,00% ± 49,72% gls2c3 30,00% ± 42,16% 0,00% ± 0,00% 5,00% ± 15,81% gls2c4 75,00% ± 32,63% 81,67% ± 25,40% 81,67% ± 25,40% gls2c5 77,14% ± 16,77% 10,00% ± 9,64% 84,29% ± 14,21% gls2c6 59,82% ± 15,13% 0,00% ± 0,00% 81,79% ± 13,95% h-s 75,83% ± 13,29% 80,00% ± 7,03% 80,00% ± 9,78% pim 55,37% ± 13,27% 53,38% ± 6,42% 55,93% ± 9,75% tao 95,23% ± 2,14% 84,11% ± 6,17% 92,58% ± 5,72% thy2c1 90,00% ± 16,10% 76,67% ± 22,50% 90,00% ± 16,10% thy2c2 94,17% ± 12,45% 54,17% ± 24,92% 90,83% ± 14,93% thy2c3 90,95% ± 10,34% 33,81% ± 21,35% 90,71% ± 8,05% wav2c1 75,74% ± 4,06% 88,51% ± 3,20% 87,24% ± 3,43% wav2c2 72,34% ± 3,89% 84,57% ± 4,05% 78,72% ± 2,57% wab2c3 77,64% ± 2,38% 89,97% ± 3,48% 87,86% ± 3,65% wbdc 92,95% ± 3,42% 95,42% ± 5,36% 95,83% ± 5,89% wdbc 92,47% ± 5,09% 94,81% ± 2,71% 93,83% ± 6,37% wine2c1 89,00% ± 16,63% 100,00% ± 0,00% 100,00% ± 0,00% wine2c2 95,00% ± 8,05% 98,33% ± 5,27% 98,33% ± 5,27% wine2c3 90,18% ± 11,70% 97,14% ± 6,02% 98,57% ± 4,52% Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla wpbc 41,00% ± 12,87% 9,50% ± 17,07% 30,50% ± 24,99%
  • 25. How can we Minimize the Effects of Small Disjuncts? Resampling the dataset:  Addresses small disjuncts Classical methods:  • Random oversampling • Random undersampling Assumes that clusterization will Heuristic methods:  find small • Tomek links disjuncts and • CNN match classifier’s • One-sided selection approximation • Smote Cluster-based oversampling  Could XCS benefit from the online Cost-sensitive classifiers  identification of small disjuncts? Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 26. Domains of Applicability Should we use some counterbalancing scheme?  Which learning scheme should we use?  Is there a combination of counterbalancing  scheme+learner that beats all others? How can we know the presence of small  disjuncts? Are there other complexity factors mixed up with  the small disjuncts problem? Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 27. Domains of Applicability Resampling/ Learn it! Classifier/ Resampling+classifier Where are LCSs placed? Dataset Dataset Suggested Prediction characterization approach Type of dataset: Geometrical distribution of classes Possible presence of small disjuncts Other complexity factors Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 28. Future Directions Potential benefit of XCS to discover small disjuncts  …and learn from it online Further analyze UCS  How do LCSs perform w.r.t. other classifiers for unbalanced  datasets? Measures for small disjuncts identification  … and other possible complexity factors What is noise and what is a small disjunct?  In which cases a LCS is applicable?  Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla
  • 29. Learning Classifier Systems for Class Imbalance Problems Ester Bernadó-Mansilla Research Group in Intelligent Systems Enginyeria i Arquitectura La Salle Universitat Ramon Llull Barcelona, Spain