1. BioHEL System
Our approach
Results
Summary
Post-processing Operators for
Decision Lists
María A. Franco
Supervisor: Jaume Bacardit
University of Nottingham, UK,
ICOS Research Group,
School of Computer Science
mxf@cs.nott.ac.uk
June 12, 2012
María A. Franco. University of Nottingham Post-processing Operators for Decision Lists 1 / 29
Motivation
Goal of my PhD project
To enhance evolutionary learning systems based on IRL
(BioHEL) to work better with large scale datasets.
How have we been doing this?
Analysing the weaknesses of the system in different
domains [Franco et al., 2012a]
Improving the execution time by means of GPGPUs
[Franco et al., 2010]
Developing theoretical models that allow us to adapt
parameters within the system [Franco et al., 2011]
Improving the quality of the final solutions by means of
local search (memetic operators) [Franco et al., 2012b]
Motivation
Goal of this work
To improve the quality of the decision lists by means of local
search (memetic operators)
Decision lists are a widespread paradigm in rule learning,
guided local search and supervised learning.
Example
Pittsburgh Learning Classifier Systems
Rule induction systems in mainstream machine learning
(PART, CN2, JRip)
Outline
1 BioHEL
Attribute List Knowledge Representation
Structure of the solutions
What is the problem?
2 Our approach: Post-processing the rules
Swapping
Pruning
Cleaning
3 Results
4 Summary
Where to go from here?
Introduction to the BioHEL System
BIOinformatics-oriented Hierarchical Evolutionary Learning
- BioHEL [Bacardit et al., 2009]
BioHEL is an evolutionary learning system that employs
the Iterative Rule Learning (IRL) paradigm
BioHEL was especially designed to cope with large scale
datasets
Attribute List Knowledge Representation
Meta-representation for handling large numbers of discrete
and continuous attributes efficiently [Bacardit and Krasnogor, 2009].
ALKR Classifier Example
numAtt 3
whichAtt 0
predicates 0.5 0.7 0.3
offsetPred 0
class 1
Discrete attributes
GABIL representation
Attribute   F1    F2    F3
Bits        100   01    1101
Values      ABC   DE    FGHI
F1 = A ∧ F2 = E ∧ F3 = (F ∨ G ∨ I)
Continuous attributes
Hyper-rectangle representation
C1 = [0.1, 0.3] ∧ C2 = [0.7, 0.9]
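The two representations above can be illustrated with a minimal matching sketch. The function names, the dictionary layout and the value orderings in `DOMAINS` are assumptions made for illustration; they are not taken from the BioHEL code.

```python
from typing import Dict, Tuple

# Hypothetical value orderings, copied from the example on the slide.
DOMAINS: Dict[str, str] = {"F1": "ABC", "F2": "DE", "F3": "FGHI"}


def gabil_matches(bits: Dict[str, str], instance: Dict[str, str]) -> bool:
    """A GABIL predicate matches when the bit for the instance's value is 1."""
    for att, mask in bits.items():
        position = DOMAINS[att].index(instance[att])
        if mask[position] != "1":
            return False
    return True


def hyperrect_matches(bounds: Dict[str, Tuple[float, float]],
                      instance: Dict[str, float]) -> bool:
    """A hyper-rectangle matches when every value lies inside its interval."""
    return all(lo <= instance[att] <= hi for att, (lo, hi) in bounds.items())


# The rules shown on the slide.
rule_discrete = {"F1": "100", "F2": "01", "F3": "1101"}
rule_continuous = {"C1": (0.1, 0.3), "C2": (0.7, 0.9)}

print(gabil_matches(rule_discrete, {"F1": "A", "F2": "E", "F3": "G"}))
print(hyperrect_matches(rule_continuous, {"C1": 0.2, "C2": 0.8}))
```

In this sketch a bit set to 0 excludes the corresponding value, so the instance with F3 = H would not be matched by the discrete rule.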
Solutions generated by the BioHEL system
Since BioHEL uses IRL [Venturini, 1993] the solutions are
hierarchical sets of rules ⇒ decision lists
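A decision list is read top to bottom: the first rule that matches decides the class, and a default class covers everything else. The sketch below shows this under assumed names (`classify`, `Rule`, `Instance`); it is an illustration of the general idea, not BioHEL's implementation.

```python
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, int]
Rule = Tuple[Callable[[Instance], bool], int]  # (condition, predicted class)


def classify(decision_list: List[Rule], default_class: int, x: Instance) -> int:
    """Return the class of the first rule whose condition matches x."""
    for condition, label in decision_list:
        if condition(x):
            return label
    return default_class


# Hypothetical rules: a specific rule followed by a more general one.
specific = (lambda x: x["x1"] == 1 and x["x3"] == 0, 1)
general = (lambda x: x["x1"] == 1, 0)

example = {"x1": 1, "x2": 0, "x3": 0}
print(classify([specific, general], 0, example))  # specific rule fires first
print(classify([general, specific], 0, example))  # general rule shadows it
```

Because only the first match counts, the same rules in a different order can give a different prediction, which is why rule order matters for the operators discussed later.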
How can the rules be improved further?
We encountered the following problems:
The rules were learned in the wrong order
Larger rulesets!
The rules did not have the correct specificity
The number of attributes expressed was rather high!
Example
Problem: x1 = 1 ∧ x3 = 0
000 = 0   100 = 1
001 = 0   101 = 0
010 = 0   110 = 1
011 = 0   111 = 0
Good:
x1 = 1 ∧ x3 = 0
Over-specific:
x1 = 1 ∧ x2 = 1 ∧ x3 = 0
x1 = 1 ∧ x2 = 0 ∧ x3 = 0
Our approach: Post-processing the rules
Ruleset-wise operators
Rule swapping
Rule-wise operators
Pruning
Cleaning
Rule Swapping
Consists of swapping the order of the rules in the final
rulesets.
Which rules shall we swap? ⇒ Similarities
Measure of similarity
S(i, j) = \sum_{k}^{\mathrm{Dis}} \frac{S_k^{\mathrm{Dis}}(i, j)}{N_A \cdot \mathrm{numVals}(k)} + \sum_{k}^{\mathrm{Real}} \frac{S_k^{\mathrm{Real}}(i, j)}{N_A} + \frac{M_i}{N_A}
Measures the overlapping between rules
How does it work?
Helps erase unnecessary rules
It does not ensure the final rule set is minimal
It has to re-evaluate the rules in the new order in each iteration
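The re-evaluation step can be sketched as follows. This assumes that a rule is "unnecessary" when, in the new order, it is never the first matching rule for any training instance; that reading, and the names `first_match_counts` and `drop_unused`, are assumptions and not taken from the slides.

```python
from itertools import product
from typing import Callable, List, Tuple

Instance = Tuple[int, int, int]
Rule = Tuple[Callable[[Instance], bool], int]  # (condition, predicted class)


def first_match_counts(decision_list: List[Rule], data: List[Instance]) -> List[int]:
    """Count, per rule, the instances for which it is the first matching rule."""
    counts = [0] * len(decision_list)
    for x in data:
        for position, (condition, _label) in enumerate(decision_list):
            if condition(x):
                counts[position] += 1
                break  # later rules never see this instance
    return counts


def drop_unused(decision_list: List[Rule], data: List[Instance]) -> List[Rule]:
    """Keep only the rules that are the first match for at least one instance."""
    counts = first_match_counts(decision_list, data)
    return [rule for rule, n in zip(decision_list, counts) if n > 0]


# Hypothetical toy data: every instance over three binary attributes.
DATA = list(product((0, 1), repeat=3))

general = (lambda x: x[0] == 1, 1)
specific = (lambda x: x[0] == 1 and x[1] == 1, 1)

# With the general rule first, the specific rule is never reached.
swapped = [general, specific]
print(first_match_counts(swapped, DATA))
print(len(drop_unused(swapped, DATA)))
```

Each pass visits every instance and, in the worst case, every rule, which is in line with the cost of re-evaluating the whole list after each swap.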
Rule pruning
Drops attributes that do not affect the accuracy of the rules.
Example
Problem: x1 = 1 ∧ x3 = 0
000 = 0   100 = 1
001 = 0   101 = 0
010 = 0   110 = 1
011 = 0   111 = 0
Good:
x1 = 1 ∧ x3 = 0
Over-specific:
x1 = 1 ∧ x2 = 1 ∧ x3 = 0
x1 = 1 ∧ x2 = 0 ∧ x3 = 0
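The pruning idea can be tried on the truth table above. The sketch below is a greedy version under stated assumptions: a rule is a mapping from attribute index to required value, accuracy is measured over the instances the rule covers, and an attribute is dropped when accuracy does not decrease. The names `matches`, `accuracy` and `prune` are hypothetical, and the real operator may use a different criterion.

```python
from itertools import product
from typing import Dict, List, Tuple

Instance = Tuple[int, int, int]

# Truth table of the example: class 1 when x1 = 1 and x3 = 0.
DATA: List[Tuple[Instance, int]] = [
    (x, int(x[0] == 1 and x[2] == 0)) for x in product((0, 1), repeat=3)
]


def matches(rule: Dict[int, int], x: Instance) -> bool:
    """A rule matches when every expressed attribute has the required value."""
    return all(x[att] == value for att, value in rule.items())


def accuracy(rule: Dict[int, int], label: int, data) -> float:
    """Fraction of covered instances whose class equals the rule's class."""
    covered = [y for x, y in data if matches(rule, x)]
    if not covered:
        return 0.0
    return sum(y == label for y in covered) / len(covered)


def prune(rule: Dict[int, int], label: int, data) -> Dict[int, int]:
    """Greedily drop attributes whose removal does not lower the accuracy."""
    pruned = dict(rule)
    for att in list(rule):
        candidate = {a: v for a, v in pruned.items() if a != att}
        if accuracy(candidate, label, data) >= accuracy(pruned, label, data):
            pruned = candidate
    return pruned


# Over-specific rule from the slide: x1 = 1 and x2 = 1 and x3 = 0 (indices 0, 1, 2).
over_specific = {0: 1, 1: 1, 2: 0}
print(prune(over_specific, 1, DATA))
```

On this example only x2 can be dropped without losing accuracy, which turns the over-specific rule into the good rule x1 = 1 ∧ x3 = 0.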
Our approach: Post-processing the rules
Ruleset-wise operators
Rule swapping
Rule-wise operators
Pruning ⇒ Wait! This does not work if the other attributes
are not correctly specified!
Cleaning
Rule cleaning
In the χ-ary domain it is not always possible to drop attributes
if the correct attributes are misaligned
Example
Problem:
x1 nominal {a,b,c,d,e}
x2 nominal {w,y,z}
x3 nominal {m,n}
Rule 1:
x1 = (a ∨ b) ∧ x2 = w
Generated Rule:
x1 = (a ∨ b ∨ c) ∧ x2 = w ∧ x3 = m
We need to deactivate literals in the attributes
How does it work?
Cleaning approaches:
CL - Focus on the positives
CL2 - Do not infer
Continuous
(- - - - ( (+ - + + + + - + - +) ) - - -)
Interval bounds, from outermost to innermost: OLD, CL2, CL
Discrete
Values covered by positive examples: a,b,c
Values covered by negative examples: c,e
OLD: 111011 (abcdef)
CL:  111000 (abcdef)
CL2: 111001 (abcdef)
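The discrete example can be reproduced with a short sketch. It assumes that CL keeps only the active values covered by positive examples, and that CL2 deactivates only the values covered by negative examples and by no positive one, leaving unseen values untouched. This is a reading of the example above; the function names are hypothetical.

```python
from typing import Set


def clean_cl(mask: str, values: str, positives: Set[str]) -> str:
    """CL: keep an active value only if some positive example covers it."""
    return "".join(
        "1" if bit == "1" and value in positives else "0"
        for bit, value in zip(mask, values)
    )


def clean_cl2(mask: str, values: str, positives: Set[str], negatives: Set[str]) -> str:
    """CL2: deactivate a value only if negatives cover it and no positive does."""
    return "".join(
        "0" if bit == "1" and value in negatives and value not in positives else bit
        for bit, value in zip(mask, values)
    )


old_mask = "111011"
values = "abcdef"
positives = {"a", "b", "c"}
negatives = {"c", "e"}

print(clean_cl(old_mask, values, positives))
print(clean_cl2(old_mask, values, positives, negatives))
```

The only difference in this example is the value f: no example covers it, so CL switches it off while CL2 makes no inference about it and leaves it active.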
Experimental design
We analysed the operators over final rulesets generated
with 35 real world problems
3 stages of experiments
Independent operators
Combinations between CL and PR
Combinations with the SW operator
Questions
Where are the most significant improvements?
Are the results significant?
What about the computational time?
Results of the operators independently
[Figure: % of variation per dataset in Atts, Rules, Test_acc and Test_ensemble for the algorithms CL, CL2, PR and SW]
Results of combining CL and PR
[Figure: % of variation per dataset in Atts, Test_acc and Test_ensemble, in CL and CL2 panels, for the algorithms CL-PR, PR-CL, PR-CL-PR, CL2-PR, PR-CL2 and PR-CL2-PR]
Results of combining CL, PR and SW
[Figure: % of variation per dataset in Atts, Rules, Test_acc and Test_ensemble for the algorithms CL-SW, CL2-SW, PR-SW and PR-CL2-PR-SW]
Are the results significant?
Table: Rankings of the Friedman statistical tests. indicates that the
algorithm is significantly better (Holm test with 99% confidence).
Algorithm       Test acc   Test ensem   # Rules   # Atts
P-values        0.708      0.962        8.9e-09   2.2e-16
Base            7.80       7.07         3.73      10.84
CL              7.73       7.86         –         10.84
CL2             7.64       7.84         –         10.84
PR              7.57       7.21         –         5.53
SW              7.51       6.60         2.59      11.30
CL-PR           6.37       7.29         –         3.97
PR-CL           6.67       7.31         –         5.53
PR-CL-PR        5.87       6.79         –         1.51
CL2-PR          6.59       6.79         –         5.81
PR-CL2          6.89       7.16         –         5.71
PR-CL2-PR       6.36       6.91         –         2.29
CL-SW           7.14       6.51         2.07      11.23
CL2-SW          7.46       6.83         2.40      11.17
PR-SW           6.94       6.29         2.14      5.94
PR-CL2-PR-SW    6.46       6.54         2.07      2.47
How long does the post-processing take?
Table: Execution time of each operator applied independently
Prob     Ins      Rules            Atts           CL2 (s)          PR (s)           SW (s)
CN-bin   493788   38.20 ± 1.85     7.12 ± 0.73    17.44 ± 0.76     20.52 ± 0.82     157.51 ± 76.42
Adult    43960    194.24 ± 10.26   10.18 ± 2.80   49.87 ± 3.85     69.60 ± 10.22    5855.04 ± 874.14
CN       234638   253.34 ± 12.48   10.09 ± 2.78   314.02 ± 26.01   631.68 ± 70.09   43097.44 ± 5429.48
KDD      444619   188.84 ± 13.52   4.25 ± 2.99    213.95 ± 18.25   375.85 ± 59.00   23791.21 ± 5041.45
C-4      60803    316.14 ± 19.10   9.96 ± 3.23    96.49 ± 8.33     192.21 ± 24.76   18763.03 ± 2614.41
ParMX    235929   394.34 ± 19.39   9.00 ± 0.01    405.77 ± 37.05   619.20 ± 82.02   106343.70 ± 13094.78
SS1      75583    773.26 ± 30.42   11.49 ± 3.40   293.70 ± 23.26   649.51 ± 85.94   133415.03 ± 19160.27
Swapping is very slow: its cost depends on the number of instances
and the number of rules generated.
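To put the gap in perspective, the mean times in the table can be turned into SW/PR ratios. The snippet below only does that arithmetic on the reported means; the variable names are illustrative and the standard deviations are ignored.

```python
# Mean execution times in seconds, copied from the table: (PR, SW).
MEAN_SECONDS = {
    "CN-bin": (20.52, 157.51),
    "Adult": (69.60, 5855.04),
    "CN": (631.68, 43097.44),
    "KDD": (375.85, 23791.21),
    "C-4": (192.21, 18763.03),
    "ParMX": (619.20, 106343.70),
    "SS1": (649.51, 133415.03),
}

# How many times slower swapping is than pruning on each problem.
ratios = {prob: sw / pr for prob, (pr, sw) in MEAN_SECONDS.items()}

for prob, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{prob:7s} SW is {ratio:6.1f}x slower than PR")
```

The ratio is smallest for CN-bin, which has the fewest rules, and largest for SS1, which has the most, matching the remark that the cost of swapping grows with the number of rules.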
Summary and next steps
Summary
The operators manage to reduce the number of rules and
expressed attributes by 30% in some cases.
Next steps
Apply the CL and PR operators during the learning process
Investigate other measures of similarity between rules
Apply these operators over other systems
Different representations
CUDA accelerated operators?
References I
Bacardit, J., Burke, E., and Krasnogor, N. (2009).
Improving the scalability of rule-based evolutionary learning.
Memetic Computing, 1(1):55–67.
Bacardit, J. and Krasnogor, N. (2009).
A mixed discrete-continuous attribute list representation for large scale classification domains.
In GECCO ’09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages
1155–1162, New York, NY, USA. ACM Press.
Franco, M., Krasnogor, N., and Bacardit, J. (2012a).
Analysing BioHEL using challenging boolean functions.
Evolutionary Intelligence, 5:87–102.
10.1007/s12065-012-0080-9.
Franco, M. A., Krasnogor, N., and Bacardit, J. (2010).
Speeding up the evaluation of evolutionary learning systems using GPGPUs.
In GECCO ’10: Proceedings of the 12th annual conference on Genetic and evolutionary computation, pages
1039–1046, New York, NY, USA. ACM.
Franco, M. A., Krasnogor, N., and Bacardit, J. (2011).
Modelling the initialisation stage of the ALKR representation for discrete domains and GABIL encoding.
In Proceedings of the 13th annual conference on Genetic and evolutionary computation, GECCO ’11, pages
1291–1298, New York, NY, USA. ACM.
References II
Franco, M. A., Krasnogor, N., and Bacardit, J. (2012b).
Postprocessing operators for decision lists.
In GECCO ’12: Proceedings of the 14th annual conference on Genetic and evolutionary computation,
page to appear, New York, NY, USA. ACM Press.
Venturini, G. (1993).
SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts.
In Brazdil, P. B., editor, Machine Learning: ECML-93 - Proceedings of the European Conference on Machine
Learning, pages 280–296. Springer-Verlag.
Questions or comments?