BioHEL System
                           Our approach
                                 Results
                              Summary




     Post-processing Operators for
             Decision Lists

                              María A. Franco

                         Supervisor: Jaume Bacardit
                        University of Nottingham, UK,
                           ICOS Research Group,
                        School of Computer Science
                             mxf@cs.nott.ac.uk


                               June 12, 2012



María A. Franco. University of Nottingham   Post-processing Operators for Decision Lists   1 / 29

Motivation


   Goal of my PhD project
   To enhance evolutionary learning systems based on IRL
   (BioHEL) to work better with large scale datasets.

   How have we been doing this?
      Analysing the weaknesses of the system in different
      domains [Franco et al., 2012a]
      Improving the execution time by means of GPGPUs
       [Franco et al., 2010]
       Developing theoretical models that allow us to adapt
       parameters within the system [Franco et al., 2011]
       Improving the quality of the final solutions by means of
       local search (memetic operators) [Franco et al., 2012b]


Motivation


   Goal of this work
   To improve the quality of the decision lists by means of local
   search (memetic operators)

   Decision lists are a widespread paradigm in rule learning,
   guided local search and supervised learning.
   Examples
       Pittsburgh Learning Classifier Systems
       Rule induction systems in mainstream machine learning
       (PART, CN2, JRip)




Outline

   1   BioHEL
         Attribute List Knowledge Representation
         Structure of the solutions
         What is the problem?
   2   Our approach: Post-processing the rules
         Swapping
         Pruning
         Cleaning
   3   Results
   4   Summary
         Where to go from here?



Introduction to the BioHEL System




       BIOinformatics-oriented Hierarchical Evolutionary Learning
       - BioHEL [Bacardit et al., 2009]
       BioHEL is an evolutionary learning system that employs
       the Iterative Rule Learning (IRL) paradigm
       BioHEL was especially designed to cope with large scale
       datasets





Attribute List Knowledge Representation


       A meta-representation that handles large numbers of discrete
       and continuous attributes efficiently [Bacardit and Krasnogor, 2009].

                               ALKR Classifier Example

                          numAtt          3
                        whichAtt          0
                      predicates        0.5 0.7                     0.3

                      offsetPred           0
                              class       1
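
A minimal sketch of how a classifier with these fields could be stored and matched. It assumes continuous attributes only, with each expressed attribute using two consecutive entries of `predicates` (lower and upper bound) starting at its `offsetPred` position. The field names follow the slide; the class name `ALKRRule`, the `matches` method and the example values are illustrative assumptions, not BioHEL's actual code.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ALKRRule:
    """Rule that stores only the attributes it actually expresses."""
    which_att: List[int]      # indices of the expressed attributes
    predicates: List[float]   # concatenated [lower, upper] bounds
    offset_pred: List[int]    # start of each attribute's bounds in predicates
    cls: int                  # class predicted by the rule

    @property
    def num_att(self) -> int:
        return len(self.which_att)

    def matches(self, instance: Sequence[float]) -> bool:
        # Only the expressed attributes are visited; all others are ignored.
        for att, off in zip(self.which_att, self.offset_pred):
            lower, upper = self.predicates[off], self.predicates[off + 1]
            if not lower <= instance[att] <= upper:
                return False
        return True


# Hypothetical rule over a 10-attribute problem:
# attribute 0 in [0.5, 0.7] and attribute 4 in [0.1, 0.3]
rule = ALKRRule(which_att=[0, 4], predicates=[0.5, 0.7, 0.1, 0.3],
                offset_pred=[0, 2], cls=1)
instance = [0.6, 0.9, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9, 0.9, 0.9]
print(rule.matches(instance))  # True
```

The point of the list layout is that matching cost grows with the number of expressed attributes rather than with the total number of attributes in the dataset.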



Attribute List Knowledge Representation


   Discrete attributes
   GABIL representation

                                      F1          F2       F3
                                     100          01      1101
                                     ABC          DE      FGHI

                  F1 = A ∧ F2 = E ∧ F3 = (F ∨ G ∨ I)

   Continuous attributes
   Hyper-rectangle representation
                  C1 = [0.1, 0.3] ∧ C2 = [0.7, 0.9]
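
The two encodings above can be sketched as follows. This is an illustrative reading of the slide, not BioHEL's implementation: the helper names (`decode_gabil`, `matches_discrete`, `matches_continuous`) and the dictionary layout are assumptions.

```python
from typing import Dict, List, Tuple

# Value order of each discrete attribute, as on the slide
VALUES: Dict[str, str] = {"F1": "ABC", "F2": "DE", "F3": "FGHI"}


def decode_gabil(bits: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each attribute's bit string to the list of values it accepts."""
    return {att: [v for v, b in zip(VALUES[att], bitstr) if b == "1"]
            for att, bitstr in bits.items()}


def matches_discrete(bits: Dict[str, str], instance: Dict[str, str]) -> bool:
    """GABIL: every attribute value must have its bit set to 1."""
    accepted = decode_gabil(bits)
    return all(instance[att] in accepted[att] for att in bits)


def matches_continuous(box: Dict[str, Tuple[float, float]],
                       instance: Dict[str, float]) -> bool:
    """Hyper-rectangle: every value must fall inside its interval."""
    return all(lo <= instance[att] <= hi for att, (lo, hi) in box.items())


rule_bits = {"F1": "100", "F2": "01", "F3": "1101"}
print(decode_gabil(rule_bits))
# {'F1': ['A'], 'F2': ['E'], 'F3': ['F', 'G', 'I']}

box = {"C1": (0.1, 0.3), "C2": (0.7, 0.9)}
print(matches_continuous(box, {"C1": 0.2, "C2": 0.8}))  # True
```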



Solutions generated by the BioHEL system

      Since BioHEL uses IRL [Venturini, 1993] the solutions are
      hierarchical sets of rules ⇒ decision lists
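
A decision list is evaluated top to bottom, and the first rule that matches decides the class. The sketch below illustrates this under the assumption that a default class is used when no rule matches; the `classify` function and the two toy rules are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A rule is a (condition, predicted class) pair
Rule = Tuple[Callable[[Sequence[int]], bool], int]


def classify(rules: List[Rule], default_class: int, x: Sequence[int]) -> int:
    """Return the class of the first rule that matches x, else the default."""
    for condition, cls in rules:
        if condition(x):
            return cls
    return default_class


rules = [
    (lambda x: x[0] == 1 and x[2] == 0, 1),  # learned first, checked first
    (lambda x: x[0] == 1, 0),                # only reached if the rule above fails
]
print(classify(rules, 0, (1, 1, 0)))  # 1
print(classify(rules, 0, (1, 1, 1)))  # 0
```

Because later rules only see the instances that earlier rules did not match, the order of the rules is part of the solution.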





How can the rules be improved further?

   We encountered the following problems:
      The rules were learned in the wrong order
            Larger rulesets!

      The rules did not have the correct specificity
              The number of attributes expressed was rather high!

   Example
     Problem: x1 = 1 ∧ x3 = 0

       000 = 0    100 = 1
       001 = 0    101 = 0
       010 = 0    110 = 1
       011 = 0    111 = 0

     Good:
       x1 = 1 ∧ x3 = 0
     Over-specific:
       x1 = 1 ∧ x2 = 1 ∧ x3 = 0
       x1 = 1 ∧ x2 = 0 ∧ x3 = 0



Our approach: Post-processing the rules




   Ruleset-wise operators
       Rule swapping

   Rule-wise operators
       Pruning
       Cleaning





Rule Swapping



      Consists of swapping the order of the rules in the final
      rulesets.
      Which rules shall we swap? ⇒ Similarities

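  Measure of similarity

     \[
     S(i,j) = \frac{\mathit{Dis}}{\mathit{NA}} \cdot
              \frac{\sum_{k}^{\mathit{Dis}} S_k(i,j)}{\sum_{k}^{\mathit{Dis}} \mathit{numVals}(k)}
            + \frac{\mathit{Real}}{\mathit{NA}} \sum_{k}^{\mathit{Real}} S_k(i,j)
            + \frac{M_i}{\mathit{NA}}
     \]

             Measures the overlap between rules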


How does it work?

       Helps erase unnecessary rules
       It does not ensure the final rule set is minimal
       It has to reevaluate the rules in the new order in each iteration





Rule pruning



      Drops attributes that do not affect the accuracy of the rules.

  Example
    Problem: x1 = 1 ∧ x3 = 0

      000 = 0    100 = 1
      001 = 0    101 = 0
      010 = 0    110 = 1
      011 = 0    111 = 0

    Good:
      x1 = 1 ∧ x3 = 0
    Over-specific:
      x1 = 1 ∧ x2 = 1 ∧ x3 = 0
      x1 = 1 ∧ x2 = 0 ∧ x3 = 0
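
A minimal sketch of the pruning idea on the example above. It assumes a greedy pass that drops an attribute whenever the rule's accuracy on the matched examples does not decrease; the slides do not state the exact acceptance criterion or visiting order, so `accuracy` and `prune` are illustrative only.

```python
from itertools import product
from typing import Dict, List, Tuple

Example = Tuple[Tuple[int, ...], int]   # (attribute values, class)
Condition = Dict[int, int]              # attribute index -> required value


def accuracy(cond: Condition, cls: int, data: List[Example]) -> float:
    """Fraction of the examples matched by the rule that have class cls."""
    matched = [y for x, y in data if all(x[a] == v for a, v in cond.items())]
    return sum(y == cls for y in matched) / len(matched) if matched else 0.0


def prune(cond: Condition, cls: int, data: List[Example]) -> Condition:
    """Greedily drop attributes whose removal does not lower the accuracy."""
    current = dict(cond)
    for att in list(cond):
        trial = {a: v for a, v in current.items() if a != att}
        if trial and accuracy(trial, cls, data) >= accuracy(current, cls, data):
            current = trial
    return current


# Truth table of x1 = 1 AND x3 = 0 (x1, x2, x3 are indices 0, 1, 2)
data = [(x, int(x[0] == 1 and x[2] == 0)) for x in product((0, 1), repeat=3)]

# Over-specific rule x1 = 1 AND x2 = 1 AND x3 = 0 predicting class 1
print(prune({0: 1, 1: 1, 2: 0}, 1, data))  # {0: 1, 2: 0}
```

Dropping x2 leaves the accuracy at 1.0 while doubling the coverage, whereas dropping x1 or x3 lowers the accuracy to 0.5, so only x2 is removed.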




Our approach: Post-processing the rules




   Ruleset-wise operators
       Rule swapping

   Rule-wise operators
       Pruning ⇒ Wait! This does not work if the other attributes
       are not correctly specified!
       Cleaning





Rule cleaning


       In the χ-ary domain it is not always possible to drop attributes
       if the correct attributes are misaligned

   Example
   Problem:
   x1 nominal {a,b,c,d,e}
   x2 nominal {w,y,z}
   x3 nominal {m,n}

   Rule 1:
   x1 = (a ∨ b) ∧ x2 = w

   Generated Rule:
   x1 = (a ∨ b ∨ c) ∧ x2 = w ∧ x3 = m

   We need to deactivate literals in the attributes
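
   The example can be sketched in code to show why dropping attributes alone
   cannot turn the generated rule into Rule 1. This is a minimal illustration,
   assuming a rule is stored as a mapping from attribute name to its set of
   active values, with a missing attribute treated as irrelevant. The names
   `matches`, `rule1`, `generated`, `pruned` and `cleaned` are hypothetical
   and are not taken from BioHEL.

```python
# Hypothetical representation: attribute name -> set of active values.
# An attribute that is absent from the dict is treated as "don't care".
def matches(rule, example):
    """Return True if every attribute in the rule accepts the example's value."""
    return all(example[att] in values for att, values in rule.items())


rule1 = {"x1": {"a", "b"}, "x2": {"w"}}
generated = {"x1": {"a", "b", "c"}, "x2": {"w"}, "x3": {"m"}}

# Dropping the whole attribute x3 generalises the generated rule...
pruned = {att: vals for att, vals in generated.items() if att != "x3"}

# ...but x1 still carries the extra literal c. Only deactivating that
# single literal yields Rule 1.
cleaned = dict(pruned, x1=pruned["x1"] - {"c"})

example = {"x1": "c", "x2": "w", "x3": "n"}
print(matches(rule1, example))   # False: Rule 1 rejects x1 = c
print(matches(pruned, example))  # True: the pruned rule still accepts x1 = c
print(cleaned == rule1)          # True
```

   Whether a given literal should be deactivated depends on the training
   examples the rule covers, which this sketch leaves out.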

How does it work?

  Cleaning approaches:
      CL - Focus on the positives
      CL2 - Do not infer

       Continuous

        (  - - - -  (    (  + - + + + + - + - +  )    )  - - -  )
       OLD         CL2   CL                      CL  CL2       OLD

       Discrete
         Values covered by positive examples: a,b,c
         Values covered by negative examples: c,e

                111011         111000          111001
         OLD    abcdef    CL   abcdef    CL2   abcdef
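
   The two cleaning approaches can be sketched as follows. This is one
   reading of the slide, not the BioHEL implementation: CL keeps only the
   values (or the interval) supported by positive examples, while CL2
   deactivates only values seen exclusively in negative examples and leaves
   unobserved values untouched. All function names are hypothetical. The
   continuous CL2 variant is omitted because the diagram does not state
   where its bounds are placed.

```python
def clean_cl(active, pos_values):
    """CL (focus on the positives): keep an active value only if a
    covered positive example uses it."""
    return {v for v in active if v in pos_values}


def clean_cl2(active, pos_values, neg_values):
    """CL2 (do not infer): deactivate a value only if it is seen in
    negative examples and never in positive ones; unseen values stay."""
    return {v for v in active if v in pos_values or v not in neg_values}


def clean_interval_cl(lower, upper, pos_points):
    """Assumed continuous CL: shrink [lower, upper] to the span of the
    positive examples that fall inside the interval."""
    covered = [x for x in pos_points if lower <= x <= upper]
    if not covered:
        return lower, upper
    return min(covered), max(covered)


def to_bits(active, domain):
    """Render a set of active values as a bit string over the domain."""
    return "".join("1" if v in active else "0" for v in domain)


domain = "abcdef"
old = {v for v, bit in zip(domain, "111011") if bit == "1"}
pos = set("abc")  # values covered by positive examples
neg = set("ce")   # values covered by negative examples

cl_bits = to_bits(clean_cl(old, pos), domain)
cl2_bits = to_bits(clean_cl2(old, pos, neg), domain)
print(cl_bits)   # 111000
print(cl2_bits)  # 111001
```

   With the slide's data, CL drops e and f, whereas CL2 drops only e: value c
   stays because positives also cover it, and f stays because no example
   gives evidence against it.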


Experimental design


      We analysed the operators on the final rulesets generated
      for 35 real-world problems
      3 stages of experiments
           Independent operators
           Combinations between CL and PR
           Combinations with the SW operator

  Questions
      Where are the most significant improvements?
      Are the results significant?
      What about the computational time?



Results of the operators independently

   [Figure: % of variation per dataset (x-axis: datasets).
    Panels: Atts, Rules, Test_acc, Test_ensemble.
    Algorithms: CL, CL2, PR, SW.]

Results of combining CL and PR

   [Figure: % of variation per dataset (x-axis: datasets).
    Panels: Atts, Test_acc, Test_ensemble, shown separately for CL and CL2.
    Algorithms: CL-PR, PR-CL, CL2-PR, PR-CL2, PR-CL-PR, PR-CL2-PR.]

Results of combining CL, PR and SW

   [Figure: % of variation per dataset (x-axis: datasets).
    Panels: Atts, Rules, Test_acc, Test_ensemble.
    Algorithms: CL-SW, CL2-SW, PR-SW, PR-CL2-PR-SW.]

Are the results significant?


   Table: Rankings of the Friedman statistical tests. indicates that the
   algorithm is significantly better (Holm test with 99% confidence).
   Algorithm        Test acc    Test ensem    # Rules     # Atts

   P-values            0.708         0.962    8.9e-09    2.2e-16

   Base                 7.80          7.07       3.73      10.84
   CL                   7.73          7.86          –      10.84
   CL2                  7.64          7.84          –      10.84
   PR                   7.57          7.21          –       5.53
   SW                   7.51          6.60       2.59      11.30

   CL-PR                6.37          7.29          –       3.97
   PR-CL                6.67          7.31          –       5.53
   PR-CL-PR             5.87          6.79          –       1.51
   CL2-PR               6.59          6.79          –       5.81
   PR-CL2               6.89          7.16          –       5.71
   PR-CL2-PR            6.36          6.91          –       2.29

   CL-SW                7.14          6.51       2.07      11.23
   CL2-SW               7.46          6.83       2.40      11.17
   PR-SW                6.94          6.29       2.14       5.94
   PR-CL2-PR-SW         6.46          6.54       2.07       2.47
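
The rankings in this table are average ranks across datasets, and the Holm test is a step-down correction applied to the pairwise p-values. The following is a minimal sketch of both steps in plain Python, not the code used to produce the slide: the function names `average_ranks` and `holm_reject` are hypothetical, and `alpha=0.01` is an assumption chosen to match the 99% confidence level quoted in the caption.

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, List[float]],
                  higher_is_better: bool = True) -> Dict[str, float]:
    """Mean rank of each algorithm across datasets.

    Rank 1 is the best algorithm on a dataset; tied algorithms
    share the mean of the positions they occupy.
    """
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {name: 0.0 for name in names}
    for d in range(n_datasets):
        # Order the algorithms from best to worst on dataset d.
        ordered = sorted(names, key=lambda n: scores[n][d],
                         reverse=higher_is_better)
        i = 0
        while i < len(ordered):
            # Extend j over the block of algorithms tied with ordered[i].
            j = i
            while (j + 1 < len(ordered)
                   and scores[ordered[j + 1]][d] == scores[ordered[i]][d]):
                j += 1
            shared = ((i + 1) + (j + 1)) / 2.0  # mean of positions i+1..j+1
            for k in range(i, j + 1):
                totals[ordered[k]] += shared
            i = j + 1
    return {name: totals[name] / n_datasets for name in names}


def holm_reject(p_values: Dict[str, float],
                alpha: float = 0.01) -> Dict[str, bool]:
    """Holm step-down procedure: True where the hypothesis is rejected."""
    ordered = sorted(p_values, key=p_values.get)  # smallest p-value first
    m = len(ordered)
    rejected = {name: False for name in ordered}
    for i, name in enumerate(ordered):
        if p_values[name] <= alpha / (m - i):
            rejected[name] = True
        else:
            break  # all remaining (larger) p-values are retained
    return rejected
```

With made-up accuracies for three algorithms on three datasets, `average_ranks({"A": [0.90, 0.80, 0.70], "B": [0.85, 0.80, 0.75], "C": [0.80, 0.70, 0.60]})` gives A and B an average rank of 1.5 each and C a rank of 3.0. A lower average rank means a better algorithm under the chosen criterion; for rule and attribute counts, `higher_is_better=False` would be the appropriate setting.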


How long does the post-processing take?




   Table: Execution time of the application of each one of the different
   operators independently

   Problem   Instances   Rules             Atts            CL2 (s)           PR (s)            SW (s)

   CN-bin    493788       38.20 ± 1.85      7.12 ± 0.73     17.44 ± 0.76      20.52 ± 0.82        157.51 ± 76.42
   Adult      43960      194.24 ± 10.26    10.18 ± 2.80     49.87 ± 3.85      69.60 ± 10.22      5855.04 ± 874.14
   CN        234638      253.34 ± 12.48    10.09 ± 2.78    314.02 ± 26.01    631.68 ± 70.09     43097.44 ± 5429.48
   KDD       444619      188.84 ± 13.52     4.25 ± 2.99    213.95 ± 18.25    375.85 ± 59.00     23791.21 ± 5041.45
   C-4        60803      316.14 ± 19.10     9.96 ± 3.23     96.49 ± 8.33     192.21 ± 24.76     18763.03 ± 2614.41
   ParMX     235929      394.34 ± 19.39     9.00 ± 0.01    405.77 ± 37.05    619.20 ± 82.02    106343.70 ± 13094.78
   SS1        75583      773.26 ± 30.42    11.49 ± 3.40    293.70 ± 23.26    649.51 ± 85.94    133415.03 ± 19160.27


   Swapping is very slow: its cost depends on the number of instances
   and on the number of rules generated.
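
To put a number on "very slow", the mean times in the table can be turned into a slowdown factor of SW relative to PR. The sketch below only does that arithmetic on the mean values copied from the table (standard deviations are ignored); the names `mean_times` and `slowdown_vs_pr` are hypothetical.

```python
from typing import Dict, Tuple

# Mean execution times in seconds, copied from the table: (CL2, PR, SW).
mean_times: Dict[str, Tuple[float, float, float]] = {
    "CN-bin": (17.44, 20.52, 157.51),
    "Adult": (49.87, 69.60, 5855.04),
    "CN": (314.02, 631.68, 43097.44),
    "KDD": (213.95, 375.85, 23791.21),
    "C-4": (96.49, 192.21, 18763.03),
    "ParMX": (405.77, 619.20, 106343.70),
    "SS1": (293.70, 649.51, 133415.03),
}


def slowdown_vs_pr(times: Dict[str, Tuple[float, float, float]]) -> Dict[str, float]:
    """Ratio of the SW mean time to the PR mean time for each problem."""
    return {problem: sw / pr for problem, (_cl2, pr, sw) in times.items()}


if __name__ == "__main__":
    # Print the problems from the smallest to the largest slowdown.
    for problem, ratio in sorted(slowdown_vs_pr(mean_times).items(),
                                 key=lambda item: item[1]):
        print(f"{problem:7s} SW is {ratio:6.1f}x slower than PR")
```

On these means the factor ranges from roughly 8x on CN-bin, the problem with the fewest rules, to roughly 200x on SS1, the problem with the most rules.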


Summary and next steps



  Summary
      The operators manage to reduce the number of rules and
      expressed attributes by 30% in some cases.
  Next steps
      Apply the CL and PR operators during the learning process
      Investigate other measures of similarities among rules
      Apply these operators over other systems
           Different representations
      CUDA-accelerated operators?



References I


      Bacardit, J., Burke, E., and Krasnogor, N. (2009).
      Improving the scalability of rule-based evolutionary learning.
      Memetic Computing, 1(1):55–67.

      Bacardit, J. and Krasnogor, N. (2009).
      A mixed discrete-continuous attribute list representation for large scale classification domains.
      In GECCO ’09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages
      1155–1162, New York, NY, USA. ACM Press.

      Franco, M. A., Krasnogor, N., and Bacardit, J. (2012a).
      Analysing BioHEL using challenging boolean functions.
      Evolutionary Intelligence, 5:87–102.
      doi:10.1007/s12065-012-0080-9.

      Franco, M. A., Krasnogor, N., and Bacardit, J. (2010).
      Speeding up the evaluation of evolutionary learning systems using GPGPUs.
      In GECCO ’10: Proceedings of the 12th annual conference on Genetic and evolutionary computation, pages
      1039–1046, New York, NY, USA. ACM.

      Franco, M. A., Krasnogor, N., and Bacardit, J. (2011).
      Modelling the initialisation stage of the ALKR representation for discrete domains and GABIL encoding.
      In Proceedings of the 13th annual conference on Genetic and evolutionary computation, GECCO ’11, pages
      1291–1298, New York, NY, USA. ACM.




References II




      Franco, M. A., Krasnogor, N., and Bacardit, J. (2012b).
      Postprocessing operators for decision lists.
      In GECCO ’12: Proceedings of the 14th annual conference on Genetic and evolutionary computation,
      page to appear, New York, NY, USA. ACM Press.

      Venturini, G. (1993).
      SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts.
      In Brazdil, P. B., editor, Machine Learning: ECML-93 - Proceedings of the European Conference on Machine
      Learning, pages 280–296. Springer-Verlag.








        Questions or comments?




Post-processing Operators Improve Quality of Decision Lists in BioHEL System

  • 1. BioHEL System Our approach Results Summary Post-processing Operators for Decision Lists María A. Franco Supervisor: Jaume Bacardit University of Nottingham, UK, ICOS Research Group, School of Computer Science mxf@cs.nott.ac.uk June 12, 2012 María A. Franco. University of Nottingham Post-processing Operators for Decision Lists 1 / 29
  • 2. Motivation. Goal of my PhD project: to enhance evolutionary learning systems based on IRL (BioHEL) to work better with large-scale datasets. How have we been doing this? Analysing the weaknesses of the system in different domains [Franco et al., 2012a]; improving the execution time by means of GPGPUs [Franco et al., 2010]; developing theoretical models that allow us to adapt parameters within the system [Franco et al., 2011]; improving the quality of the final solutions by means of local search (memetic operators) [Franco et al., 2012b].
  • 7. Motivation. Goal of this work: to improve the quality of the decision lists by means of local search (memetic operators). Decision lists are a widespread paradigm in rule learning, guided local search and supervised learning. Examples: Pittsburgh Learning Classifier Systems; rule induction systems in mainstream machine learning (PART, CN2, JRip).
  • 9. Outline. 1 BioHEL: Attribute List Knowledge Representation; Structure of the solutions; What is the problem? 2 Our approach: post-processing the rules: Swapping; Pruning; Cleaning. 3 Results. 4 Summary: Where to go from here?
  • 10. Introduction to the BioHEL System. BIOinformatics-oriented Hierarchical Evolutionary Learning (BioHEL) [Bacardit et al., 2009]. BioHEL is an evolutionary learning system that employs the Iterative Rule Learning (IRL) paradigm. BioHEL was especially designed to cope with large-scale datasets.
  • 11. Attribute List Knowledge Representation (ALKR). Meta-representation to handle large numbers of discrete and continuous attributes quickly [Bacardit and Krasnogor, 2009]. ALKR classifier example: numAtt 3; whichAtt 0; predicates 0.5 0.7 0.3; offsetPred 0; class 1.
  • 12. Attribute List Knowledge Representation. Discrete attributes use the GABIL representation: for attributes F1, F2, F3 with values ABC, DE, FGHI, the bit string 100 01 1101 encodes F1 = A ∧ F2 = E ∧ F3 = (F ∨ G ∨ I). Continuous attributes use the hyper-rectangle representation: C1 = [0.1, 0.3] ∧ C2 = [0.7, 0.9].
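The two encodings on this slide can be illustrated with a small matching routine. This is a minimal sketch, not BioHEL's actual code: the function names, the `DOMAINS` table and the idea of combining the discrete and continuous examples into a single rule are assumptions made for illustration.

```python
from typing import Dict, Tuple, Union

# Value ordering of each discrete attribute, taken from the slide example.
DOMAINS: Dict[str, str] = {"F1": "ABC", "F2": "DE", "F3": "FGHI"}


def gabil_matches(bits: str, domain: str, value: str) -> bool:
    """GABIL: one bit per possible value; the value matches if its bit is 1."""
    return bits[domain.index(value)] == "1"


def interval_matches(bounds: Tuple[float, float], value: float) -> bool:
    """Hyper-rectangle: one [low, high] interval per continuous attribute."""
    low, high = bounds
    return low <= value <= high


def rule_matches(
    discrete: Dict[str, str],
    continuous: Dict[str, Tuple[float, float]],
    instance: Dict[str, Union[str, float]],
) -> bool:
    """A rule matches only if every expressed attribute matches (conjunction)."""
    for att, bits in discrete.items():
        if not gabil_matches(bits, DOMAINS[att], instance[att]):
            return False
    for att, bounds in continuous.items():
        if not interval_matches(bounds, instance[att]):
            return False
    return True


# Hypothetical rule that joins both slide examples.
discrete_part = {"F1": "100", "F2": "01", "F3": "1101"}
continuous_part = {"C1": (0.1, 0.3), "C2": (0.7, 0.9)}

covered = {"F1": "A", "F2": "E", "F3": "G", "C1": 0.2, "C2": 0.8}
not_covered = {"F1": "A", "F2": "E", "F3": "H", "C1": 0.2, "C2": 0.8}
```

Here `covered` satisfies the rule, while `not_covered` fails because the bit for value H of F3 is 0.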
  • 13. Solutions generated by the BioHEL system. Since BioHEL uses IRL [Venturini, 1993], the solutions are hierarchical sets of rules ⇒ decision lists.
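A decision list is read top to bottom: the first rule that matches an instance decides its class, and a default class is used when no rule matches. The following is a minimal sketch of that reading; the rule contents and the names `predict`, `rule_a` and `rule_b` are hypothetical and only serve to show that rule order matters.

```python
from typing import Callable, List, Sequence, Tuple

Rule = Tuple[Callable[[Sequence[int]], bool], int]  # (condition, class)


def predict(decision_list: List[Rule], default_class: int, x: Sequence[int]) -> int:
    """Return the class of the first matching rule, else the default class."""
    for condition, label in decision_list:
        if condition(x):
            return label
    return default_class


# Two overlapping hypothetical rules over binary attributes (x1, x2, x3).
rule_a: Rule = (lambda x: x[0] == 1 and x[2] == 0, 1)  # x1 = 1 AND x3 = 0 -> class 1
rule_b: Rule = (lambda x: x[0] == 1, 0)                # x1 = 1 -> class 0

instance = (1, 1, 0)
first_order = predict([rule_a, rule_b], 0, instance)   # rule_a fires first
second_order = predict([rule_b, rule_a], 0, instance)  # rule_b shadows rule_a
```

Because the rules overlap, the same instance gets class 1 in the first ordering and class 0 in the second.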
  • 24. How can the rules be improved further? We encountered the following problem: the rules were learned in the wrong order, which leads to larger rulesets.
  • 25. How can the rules be improved further? We encountered the following problem: the rules did not have the correct specificity, so the number of attributes expressed was rather high. Example: the problem is x1 = 1 ∧ x3 = 0, with truth table 000 = 0, 001 = 0, 010 = 0, 011 = 0, 100 = 1, 101 = 0, 110 = 1, 111 = 0. Good rule: x1 = 1 ∧ x3 = 0. Over-specific rules: x1 = 1 ∧ x2 = 1 ∧ x3 = 0 and x1 = 1 ∧ x2 = 0 ∧ x3 = 0.
  • 26. Our approach: post-processing the rules. Ruleset-wise operators: rule swapping. Rule-wise operators: pruning and cleaning.
  • 28. Rule swapping. Consists of swapping the order of the rules in the final rulesets. Which rules shall we swap? ⇒ Similarities. Measure of similarity: $S(i,j) = \frac{1}{NA}\sum_{k}^{Dis}\frac{S_k^{Dis}(i,j)}{numVals(k)} + \frac{1}{NA}\sum_{k}^{Real} S_k^{Real}(i,j) + \frac{M_i}{NA}$. It measures the overlap between rules.
  • 29. How does it work?
  • 39. How does it work? It helps erase unnecessary rules. It does not ensure that the final rule set is minimal. It has to re-evaluate the rules in the new order in each iteration.
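The effect described on this slide can be sketched with a toy swapping loop. This is an illustration under stated assumptions, not the operator from the talk: it tries adjacent pairs instead of choosing candidates with the similarity measure, it accepts a swap only when a rule becomes unused and training accuracy does not drop, and all names (`swap_and_clean`, `drop_unused`, `DATA`) are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Condition = Dict[int, int]          # attribute index (0-based) -> required value
Rule = Tuple[Condition, int]        # (condition, class)
Example = Tuple[Tuple[int, ...], int]

# Toy problem from the slides: class 1 iff x1 = 1 AND x3 = 0.
DATA: List[Example] = [
    ((a, b, c), int(a == 1 and c == 0)) for a, b, c in product((0, 1), repeat=3)
]


def covers(cond: Condition, x: Tuple[int, ...]) -> bool:
    return all(x[i] == v for i, v in cond.items())


def predict(rules: List[Rule], default: int, x: Tuple[int, ...]) -> int:
    for cond, label in rules:
        if covers(cond, x):
            return label
    return default


def accuracy(rules: List[Rule], default: int, data: List[Example]) -> float:
    return sum(predict(rules, default, x) == y for x, y in data) / len(data)


def drop_unused(rules: List[Rule], data: List[Example]) -> List[Rule]:
    """Keep only rules that are the first match of at least one example."""
    used = set()
    for x, _ in data:
        for idx, (cond, _label) in enumerate(rules):
            if covers(cond, x):
                used.add(idx)
                break
    return [rule for idx, rule in enumerate(rules) if idx in used]


def swap_and_clean(rules: List[Rule], default: int, data: List[Example]) -> List[Rule]:
    """Swap adjacent rules; keep a swap if it shrinks the list without losing accuracy."""
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for i in range(len(rules) - 1):
            candidate = rules[:i] + [rules[i + 1], rules[i]] + rules[i + 2:]
            candidate = drop_unused(candidate, data)  # re-evaluate in the new order
            if len(candidate) < len(rules) and (
                accuracy(candidate, default, data) >= accuracy(rules, default, data)
            ):
                rules, changed = candidate, True
                break
    return rules


# The over-specific rule was learned before the general one.
learned = [({0: 1, 1: 1, 2: 0}, 1), ({0: 1, 2: 0}, 1)]
result = swap_and_clean(learned, 0, DATA)
```

After the swap the general rule comes first, the over-specific rule no longer matches anything, and it is removed. The re-evaluation inside the loop is the cost the slide refers to: every candidate order requires another pass over the training data.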
  • 42. Our approach: post-processing the rules. Ruleset-wise operators: rule swapping. Rule-wise operators: pruning.
  • 44. Rule pruning. Drops attributes that do not affect the accuracy of the rules. Example: the problem is x1 = 1 ∧ x3 = 0, with truth table 000 = 0, 001 = 0, 010 = 0, 011 = 0, 100 = 1, 101 = 0, 110 = 1, 111 = 0. Good rule: x1 = 1 ∧ x3 = 0. Over-specific rules: x1 = 1 ∧ x2 = 1 ∧ x3 = 0 and x1 = 1 ∧ x2 = 0 ∧ x3 = 0.
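Pruning on the slide's example can be sketched as follows. This is a minimal illustration that assumes rule accuracy is measured as the fraction of covered examples belonging to the rule's class, and that attributes are tried one at a time in order; the names `prune`, `rule_accuracy` and `DATA` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Condition = Dict[int, int]  # attribute index (0-based: x1 -> 0) -> required value
Example = Tuple[Tuple[int, ...], int]

# Truth table of the slide: class 1 iff x1 = 1 AND x3 = 0.
DATA: List[Example] = [
    ((a, b, c), int(a == 1 and c == 0)) for a, b, c in product((0, 1), repeat=3)
]


def covers(cond: Condition, x: Tuple[int, ...]) -> bool:
    return all(x[i] == v for i, v in cond.items())


def rule_accuracy(cond: Condition, data: List[Example], target: int = 1) -> float:
    """Fraction of covered examples that belong to the rule's class."""
    covered = [y for x, y in data if covers(cond, x)]
    return sum(y == target for y in covered) / len(covered) if covered else 0.0


def prune(cond: Condition, data: List[Example]) -> Condition:
    """Drop each attribute whose removal does not lower the rule's accuracy."""
    pruned = dict(cond)
    for att in list(cond):
        candidate = {a: v for a, v in pruned.items() if a != att}
        if rule_accuracy(candidate, data) >= rule_accuracy(pruned, data):
            pruned = candidate
    return pruned


over_specific = {0: 1, 1: 1, 2: 0}  # x1 = 1 AND x2 = 1 AND x3 = 0
pruned_rule = prune(over_specific, DATA)
```

Dropping x1 or x3 would let the rule cover class-0 examples, so only x2 is removed and the result is the good rule x1 = 1 ∧ x3 = 0.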
  • 45. Our approach: post-processing the rules. Ruleset-wise operators: rule swapping. Rule-wise operators: pruning (⇒ Wait! This does not work if the other attributes are not correctly specified!) and cleaning.
  • 48. Rule cleaning. In the χ-ary domain it is not always possible to drop attributes if the correct attributes are misaligned. Example problem: x1 nominal {a,b,c,d,e}, x2 nominal {w,y,z}, x3 nominal {m,n}. Rule 1: x1 = (a ∨ b) ∧ x2 = w. Generated rule: x1 = (a ∨ b ∨ c) ∧ x2 = w ∧ x3 = m. We need to deactivate literals in the attributes.
  • 49. How does it work? Cleaning approaches: CL (focus on the positives) and CL2 (do not infer). Continuous: (- - - - ( (+ - + + + + - + -+) ) - - -), with the nested interval boundaries labelled, from left to right, OLD, CL2, CL, CL, CL2, OLD. Discrete: OLD is 111011 over the values abcdef; values covered by positive examples: a, b, c; values covered by negative examples: c, e; CL gives 111000 and CL2 gives 111001.
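The discrete example on this slide is consistent with the following reading, given here as a hedged sketch rather than the operators' actual definition: CL keeps only the active values that positive examples cover, while CL2 deactivates only the values that negative examples cover and positive examples do not. The function names are hypothetical.

```python
VALUES = "abcdef"  # value order of the attribute in the slide example


def to_set(bits: str) -> set:
    """Active values of a GABIL bit string."""
    return {v for v, b in zip(VALUES, bits) if b == "1"}


def to_bits(values: set) -> str:
    return "".join("1" if v in values else "0" for v in VALUES)


def clean_cl(old_bits: str, positives: str, negatives: str) -> str:
    """CL (focus on the positives): keep active values seen in positive examples."""
    return to_bits(to_set(old_bits) & set(positives))


def clean_cl2(old_bits: str, positives: str, negatives: str) -> str:
    """CL2 (do not infer): only deactivate values seen solely in negative examples."""
    return to_bits(to_set(old_bits) - (set(negatives) - set(positives)))


old = "111011"
cl_result = clean_cl(old, positives="abc", negatives="ce")
cl2_result = clean_cl2(old, positives="abc", negatives="ce")
```

With the slide's data, CL drops e and f, while CL2 drops only e and keeps f because no example gives evidence about it.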
  • 50. Experimental design. We analysed the operators over final rulesets generated with 35 real-world problems. 3 stages of experiments: independent operators; combinations between CL and PR; combinations with the SW operator. Questions: Where are the most significant improvements? Are the results significant? What about the computational time?
  • 52. Results of the operators independently. [Figure: % of variation per dataset for the algorithms CL, CL2, PR and SW; panels: Atts, Rules, Test_acc, Test_ensemble.]
  • 53. Results of combining CL and PR. [Figure: % of variation per dataset for the algorithms CL-PR, PR-CL, PR-CL-PR, CL2-PR, PR-CL2 and PR-CL2-PR; panels: Atts, Test_acc, Test_ensemble.]
  • 54. Results of combining CL, PR and SW. [Figure: % of variation per dataset for the algorithms CL-SW, CL2-SW, PR-SW and PR-CL2-PR-SW; panels: Atts, Rules, Test_acc, Test_ensemble.]
  • 55. Are the results significant? Table: Rankings of the Friedman statistical tests. indicates that the algorithm is significantly better (Holm test with 99% confidence).
      Algorithm       Test acc   Test ensem   # Rules    # Atts
      P-Values        0.708      0.962        8.9e-09    2.2e-16
      Base            7.80       7.07         3.73       10.84
      CL              7.73       7.86         –          10.84
      CL2             7.64       7.84         –          10.84
      PR              7.57       7.21         –          5.53
      SW              7.51       6.60         2.59       11.30
      CL-PR           6.37       7.29         –          3.97
      PR-CL           6.67       7.31         –          5.53
      PR-CL-PR        5.87       6.79         –          1.51
      CL2-PR          6.59       6.79         –          5.81
      PR-CL2          6.89       7.16         –          5.71
      PR-CL2-PR       6.36       6.91         –          2.29
      CL-SW           7.14       6.51         2.07       11.23
      CL2-SW          7.46       6.83         2.40       11.17
      PR-SW           6.94       6.29         2.14       5.94
      PR-CL2-PR-SW    6.46       6.54         2.07       2.47
  • 58. How long does the post-processing take? Table: Execution time of each operator applied independently.
      Prob     Ins      Rules            Atts         CL2 (s)        PR (s)         SW (s)
      CN-bin   493788   38.20 ± 1.85     7.12±0.73    17.44±0.76     20.52±0.82     157.51±76.42
      Adult    43960    194.24 ± 10.26   10.18±2.80   49.87±3.85     69.60±10.22    5855.04±874.14
      CN       234638   253.34 ± 12.48   10.09±2.78   314.02±26.01   631.68±70.09   43097.44±5429.48
      KDD      444619   188.84 ± 13.52   4.25±2.99    213.95±18.25   375.85±59.00   23791.21±5041.45
      C-4      60803    316.14 ± 19.10   9.96±3.23    96.49±8.33     192.21±24.76   18763.03±2614.41
      ParMX    235929   394.34 ± 19.39   9.00±0.01    405.77±37.05   619.20±82.02   106343.70±13094.78
      SS1      75583    773.26 ± 30.42   11.49±3.40   293.70±23.26   649.51±85.94   133415.03±19160.27
    Swapping is very slow: its cost depends on the number of instances and the number of rules generated.
  • 60. Summary and next steps. Summary: the operators manage to reduce the number of rules and expressed attributes by 30% in some cases. Next steps: apply the CL and PR operators during the learning process; investigate other measures of similarity among rules; apply these operators to other systems; different representations; CUDA-accelerated operators?
  • 62. References I
      Bacardit, J., Burke, E., and Krasnogor, N. (2009). Improving the scalability of rule-based evolutionary learning. Memetic Computing, 1(1):55–67.
      Bacardit, J. and Krasnogor, N. (2009). A mixed discrete-continuous attribute list representation for large scale classification domains. In GECCO ’09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 1155–1162, New York, NY, USA. ACM Press.
      Franco, M., Krasnogor, N., and Bacardit, J. (2012a). Analysing BioHEL using challenging Boolean functions. Evolutionary Intelligence, 5:87–102. doi:10.1007/s12065-012-0080-9.
      Franco, M. A., Krasnogor, N., and Bacardit, J. (2010). Speeding up the evaluation of evolutionary learning systems using GPGPUs. In GECCO ’10: Proceedings of the 12th annual conference on Genetic and evolutionary computation, pages 1039–1046, New York, NY, USA. ACM.
      Franco, M. A., Krasnogor, N., and Bacardit, J. (2011). Modelling the initialisation stage of the ALKR representation for discrete domains and GABIL encoding. In Proceedings of the 13th annual conference on Genetic and evolutionary computation, GECCO ’11, pages 1291–1298, New York, NY, USA. ACM.
  • 63. References II
      Franco, M. A., Krasnogor, N., and Bacardit, J. (2012b). Postprocessing operators for decision lists. In GECCO ’12: Proceedings of the 14th annual conference comp on Genetic and evolutionary computation, page to appear, New York, NY, USA. ACM Press.
      Venturini, G. (1993). SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In Brazdil, P. B., editor, Machine Learning: ECML-93 - Proceedings of the European Conference on Machine Learning, pages 280–296. Springer-Verlag.
  • 64. Questions or comments?