1. A Double Lexicase Selection Operator for Bloat Control in Evolutionary Feature Construction for Regression
Hengzhe Zhang
Supervisor: Mengjie Zhang, Bing Xue, Qi Chen, Wolfgang Banzhaf (MSU)
Victoria University of Wellington
17/07/2023
4. Evolutionary Feature Construction
The general idea of feature construction is to construct a set of new features
{φ1, . . . , φm} to enhance the learning performance on a given dataset
{(x1, y1), . . . , (xn, yn)} compared to learning on the original features
{x1, . . . , xp}.
Genetic programming (GP) has been extensively employed to automatically
construct features due to its flexible representation and gradient-free search
mechanism.
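As an illustrative sketch (not the author's implementation), the idea can be shown by applying hypothetical GP-constructed features φ to raw inputs and fitting a linear model on the transformed data; the feature functions and toy dataset below are assumptions for demonstration only.

```python
# Sketch: a linear model on GP-constructed features {phi_1, phi_2} can fit
# relationships that are non-linear in the original features {x1, x2}.
import numpy as np

def construct_features(X, feature_fns):
    """Apply feature functions phi_1..phi_m to the raw feature matrix."""
    return np.column_stack([fn(X) for fn in feature_fns])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] * X[:, 1] + X[:, 0] ** 2          # target with interaction terms

# Hypothetical constructed features (in practice, evolved by GP)
feature_fns = [
    lambda X: X[:, 0] * X[:, 1],              # phi_1 = x1 * x2
    lambda X: X[:, 0] ** 2,                   # phi_2 = x1^2
]
Phi = construct_features(X, feature_fns)
# Ordinary least squares on the new feature space recovers [1, 1]
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

Here linear regression on the constructed feature space fits the target exactly, while it could not on the raw features.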
(a) Feature Construction on Linear Regression (b) New Feature Space
5. Bloat Phenomenon
Bloat refers to the tendency of GP solutions to grow more complex over time
without improving the fitness value.
(Figure: growth of program size over generations.)
6. Bloat Phenomenon
The explanations for bloat include:
Hitchhiking
Defense against crossover
Removal bias
The nature of a program search space
Regardless of the reason for bloat, it is widely acknowledged that addressing bloat
can increase search efficiency and enhance the interpretability of the final model.
7. Existing Bloat Control Techniques
Depth Limit: Set a strict depth limit for each GP tree.
Variation Operator
▶ Prune-and-Plant (PAP)
▶ Semantic Approximation (SA)
Selection Operator
▶ Double Tournament Selection (DTS)
▶ Semantic Tournament Selection (TS-S)
Fitness Function
▶ Tarpeian
▶ Alpha-Dominance MOGP
8. Double Tournament Selection
Two Stages of DTS:
Stage 1: Tournament selection twice, to get individuals A and B
Stage 2: Select the smaller of A and B with a probability of 0.7
Advantage:
Applicable in various scenarios
(GPSR, GPHH)
Disadvantage:
May reduce diversity, since the
tournament selection operator is
used repeatedly. → Motivation for lexicase selection!
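The two stages can be sketched as follows; this is a minimal illustration assuming individuals carry `fitness` (higher is better) and `size` attributes, with a tournament size of 7 and the 0.7 size-preference probability from the slide.

```python
# Sketch of double tournament selection (DTS)
import random
from dataclasses import dataclass

@dataclass
class Individual:
    fitness: float
    size: int

def tournament(population, k=7):
    """Stage 1: standard fitness tournament of size k."""
    return max(random.sample(population, k), key=lambda ind: ind.fitness)

def double_tournament(population, p_smaller=0.7):
    """Stage 2: between two tournament winners, prefer the smaller
    individual with probability p_smaller."""
    a, b = tournament(population), tournament(population)
    small, large = (a, b) if a.size <= b.size else (b, a)
    return small if random.random() < p_smaller else large

random.seed(0)
pop = [Individual(fitness=random.random(), size=random.randint(1, 50))
       for _ in range(100)]
winner = double_tournament(pop)
```

Because both stages reuse tournament selection, repeated calls tend to pick the same strong individuals, which is the diversity concern raised above.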
9. Lexicase Selection
Tournament vs Lexicase
Tournament selection produces many
semantically equivalent individuals.
Lexicase selection preserves
population diversity very well.
Why lexicase selection?
Evolutionary algorithms do not need to
aggregate all case errors into a single scalar such as the mean squared error:
MSE = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)²
How to perform lexicase selection?
Step 1: Construct a filter on a training case t using the
median absolute deviation (MAD) of the case errors:
MAD(e_t) = λ(e_t) = median_j | e_{tj} − median_k(e_{tk}) |
Step 2: Remove bad individuals
based on the filter
Step 3: Construct more filters until
only one individual remains
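The three steps above can be sketched as ε-lexicase selection, where the MAD of the population's errors on each case sets the filter threshold; this is a minimal sketch assuming each individual is paired with a per-case error vector.

```python
# Sketch of epsilon-lexicase selection with a MAD-based filter
import random
import numpy as np

def mad(errors):
    """lambda(e_t) = median_j | e_tj - median_k(e_tk) |"""
    return np.median(np.abs(errors - np.median(errors)))

def epsilon_lexicase(population, errors):
    """errors[j][t] = error of individual j on training case t."""
    errors = np.asarray(errors, dtype=float)
    candidates = list(range(len(population)))
    cases = list(range(errors.shape[1]))
    random.shuffle(cases)                     # random case ordering
    for t in cases:
        if len(candidates) == 1:
            break
        case_errors = errors[candidates, t]
        # Step 1: build the filter from the best error plus the MAD threshold
        threshold = case_errors.min() + mad(errors[:, t])
        # Step 2: remove individuals that fail the filter on this case
        candidates = [j for j, e in zip(candidates, case_errors)
                      if e <= threshold]
    # Step 3: ties after all cases are broken at random
    return population[random.choice(candidates)]

random.seed(0)
pop = ["ind%d" % j for j in range(5)]
errs = np.random.default_rng(0).random((5, 10))
selected = epsilon_lexicase(pop, errs)
```

The filter always keeps at least the best individual on each case, so the candidate set never becomes empty.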
11. Double Lexicase Selection
Two Stages of DLS:
Stage 1: Lexicase selection, repeated
k times to get individuals A, B, C, D, ... forming a
candidate pool
Stage 2: Roulette wheel
selection on the k individuals, with
probability inversely proportional to tree
size
Advantages:
Applicable in every scenario
(GPSR, GPHH)
Fully exploits semantics, thanks
to the lexicase selection
operator
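The two stages can be sketched as below; the `lexicase_select` parameter stands in for any lexicase variant (e.g., the ε-lexicase operator), and the stand-in `random.choice` in the usage example is purely to keep the sketch self-contained.

```python
# Sketch of double lexicase selection (DLS)
import random

def roulette_by_inverse_size(candidates, sizes):
    """Stage 2: sample one candidate with weight 1 / size,
    so smaller trees are favored without always winning."""
    weights = [1.0 / s for s in sizes]
    return random.choices(candidates, weights=weights, k=1)[0]

def double_lexicase(population, lexicase_select, size_of, k=10):
    """Stage 1: build a pool of k lexicase winners, then stage 2."""
    pool = [lexicase_select(population) for _ in range(k)]
    return roulette_by_inverse_size(pool, [size_of(ind) for ind in pool])

random.seed(0)
pop = [("ind%d" % j, j + 1) for j in range(20)]      # (name, tree size)
winner = double_lexicase(pop, lexicase_select=random.choice,
                         size_of=lambda ind: ind[1])
```

Because stage 1 only uses semantics and stage 2 only uses size, the operator needs no weighting between accuracy and complexity.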
13. Datasets
98 regression datasets are used in the experiments: all datasets in
PMLB with fewer than 2000 data items.
The sizes of the datasets range between 47 and 1059, and their dimensionality
ranges between 2 and 124.
(Figure: properties of the experimental datasets, shown as number of instances versus number of features.)
17. R² Score
Only DLS (first row), DSA, and the Tarpeian method have similar or better
performance on most datasets compared to the depth limit method.
Among these top three algorithms, DLS is better than the Tarpeian and
DSA methods.
Statistical comparison of test R² scores for different bloat control methods on 98 datasets.
("+", "∼", and "−" indicate that the method in a row is better than, similar to, or worse than
the method in a column.)
18. Model Size
DLS (first row) is a successful bloat control method, as it reduces model sizes
on all datasets.
When comparing the PAP, DSA, DTS and TS-S operators, the DLS operator is
worse at reducing model size.
However, DLS is better than the PAP, DSA, DTS and TS-S operators in terms of
test R2 scores.
Statistical comparison of model sizes for different bloat control methods on 98 datasets.
("+", "∼", and "−" indicate that the method in a row is better than, similar to, or worse than
the method in a column.)
19. Evolutionary Plots
The DLS operator maintains good R² scores over the whole
evolutionary process and thus achieves good final accuracy.
Depth limit cannot effectively control tree sizes. In contrast, the DLS operator
can effectively control tree size to a relatively low level.
(a) Evolutionary plots of test R² score for different bloat control methods on OpenML 582, 599, 618, and 645.
(b) Evolutionary plots of average tree sizes for different bloat control methods on the same datasets.
(Compared methods: DLS, αMOGP, Tarpeian, DTS, PAP, TS-S, DSA, and Depth Limiting.)
20. Overall Analysis
Friedman's rank of test R² scores and tree sizes on 98 datasets for different bloat control methods.
Only four methods (DLS, Tarpeian,
DSA, and αMOGP) do not degrade
predictive performance in terms of
R² scores compared to using the
depth limit alone.
The DLS operator achieves a good
trade-off between test R² scores
and model size.
21. Capacity of Candidate Pool
Model sizes decrease as the pool capacity increases.
Training time increases significantly when the pool capacity grows from 10
to 20.
(a) Statistical comparison of tree size between using a pool capacity of 10 and 2.
(b) Statistical comparison of tree size between using a pool capacity of 10 and 5.
(c) Distribution of training time versus candidate pool capacity.
22. Roulette Wheel Selection
The roulette wheel selection operator is significantly better than the minimum
selection operator in terms of test R² scores.
The minimum selection operator favors very small individuals and thus leads
to very poor predictive performance.
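This contrast can be demonstrated with a tiny simulation over a hypothetical candidate pool of tree sizes (the sizes and counts below are illustrative, not experimental data): minimum selection always returns the smallest tree, while inverse-size roulette spreads its picks across the pool.

```python
# Demo: minimum selection collapses onto the smallest tree,
# roulette merely biases toward smaller trees.
import random
from collections import Counter

random.seed(0)
sizes = [2, 5, 8, 20, 40]                  # hypothetical pool of tree sizes

# Minimum selection: 1000 draws, always the smallest size
min_picks = Counter(min(sizes) for _ in range(1000))

# Roulette wheel with weight 1/size: 1000 draws, spread over all sizes
roulette_picks = Counter(
    random.choices(sizes, weights=[1.0 / s for s in sizes], k=1000)
)
```

This is why minimum selection over-prunes and hurts accuracy, while roulette preserves a range of model sizes.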
(a) Statistical comparison of R² scores when using roulette instead of minimum as the
selection strategy on 98 datasets.
(b) Distribution of final tree sizes when using roulette or minimum as the selection strategy
on 98 datasets.
24. Compatibility with Multi-objective Methods
There is no significant difference between integrating DLS or not in terms of R²
scores.
Incorporating DLS into MO methods can significantly reduce model sizes on
nearly all datasets, compared to using MO methods alone.
(a) Statistical comparison of test R² scores for MO methods with and without DLS integration.
(b) Statistical comparison of model sizes for MO methods with and without DLS integration.
26. Conclusions
By employing a two-stage selection mechanism that takes both model
performance and model size into account, the double lexicase selection
operator can effectively control bloat.
Using a large candidate pool and roulette wheel selection are
crucial for double lexicase selection.
Open Source Project: Evolutionary Forest (90 GitHub Stars)
27. Thanks for listening!
Email: Hengzhe.zhang@ecs.vuw.ac.nz
GitHub Project: https://github.com/hengzhe-zhang/DoubleLexicaseSelection