Presentation by Xavier Llorà, Kumara Sastry, & David E. Goldberg showing how linkage learning is possible in Pittsburgh-style learning classifier systems.
OpenTag: Open Attribute Value Extraction From Product Profiles (Subhabrata Mukherjee)
Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Feifei Li
KDD 2018, London, UK
OpenTag brings deep learning and active learning together into a state-of-the-art system for imputation and open entity extraction.
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning... (Xavier Llorà)
A byproduct benefit of using probabilistic model-building genetic algorithms is the creation of cheap and accurate surrogate models. Learning classifier systems---and genetics-based machine learning in general---can greatly benefit from such surrogates, which may replace the costly procedure of matching a rule against large data sets. In this paper we investigate the accuracy of such surrogate fitness functions when coupled with the probabilistic models evolved by the x-ary extended compact classifier system (xeCCS). To achieve this goal, we show that the probabilistic models need to be able to represent all the accurate basis functions required for creating an accurate surrogate. We also introduce a procedure to transform populations of rules into dependency structure matrices (DSMs), which allows building accurate models of overlapping building blocks---a necessary condition for accurately estimating the fitness of the evolved rules.
Tutorial on Theory and Application of Generative Adversarial Networks (MLReview)
Description
The generative adversarial network (GAN) has recently emerged as a promising generative modeling approach. It consists of a generative network and a discriminative network; through the competition between the two, it learns to model the data distribution. In addition to modeling the image/video distribution in computer vision problems, the framework can be used to define visual concepts through examples, and to a large extent it eliminates the need to hand-craft objective functions for various computer vision problems. In this tutorial, we will present an overview of generative adversarial network research, covering several recent theoretical studies as well as training techniques and vision applications of generative adversarial networks.
Towards billion bit optimization via parallel estimation of distribution algo... (kknsastry)
This paper presents a highly efficient, fully parallelized implementation of the compact genetic algorithm (cGA) to solve very large scale problems with millions to billions of variables. The paper presents principled results demonstrating the scalable solution of a difficult test function on instances of over a billion variables using a parallel implementation of the cGA. The problem addressed is a noisy, blind problem over a vector of binary decision variables. Noise is added equaling up to a tenth of the deterministic objective function variance of the problem, thereby making it difficult for simple hillclimbers to find the optimal solution. The compact GA, on the other hand, is able to find the optimum in the presence of noise quickly, reliably, and accurately, and the solution scalability follows known convergence theories. These results on a noisy problem, together with other results on problems involving varying modularity, hierarchy, and overlap, foreshadow routine solution of billion-variable problems across the landscape of search problems.
A short presentation on the current state of Generative Adversarial Networks. Some of the materials are borrowed from the ICCV 2017 tutorial on GANs. I have put a reference where applicable at the bottom of the slide.
Empirical Analysis of ideal recombination on random decomposable problems (kknsastry)
This paper analyzes the behavior of a selectorecombinative genetic algorithm (GA) with an ideal crossover on a class of random additively decomposable problems (rADPs). Specifically, additively decomposable problems of order k whose subsolution fitnesses are sampled from the standard uniform distribution U[0,1] are analyzed. The scalability of the selectorecombinative GA is investigated for 10,000 rADP instances. The validity of facetwise models in bounding the population size, run duration, and the number of function evaluations required to successfully solve the problems is also verified. Finally, rADP instances that are easiest and most difficult are also investigated.
Slides for Ekaterina Vylomova's master's thesis, "Neural Network Modeling of Verbal Consciousness", which was defended in English (#iu5, #bmstu).
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ... (Vahid Taslimitehrani)
Presented at 15th International Conference on BioInformatics and BioEngineering (BIBE2014)
Prognostic modeling is central to medicine, as it is often used to predict patients' outcomes and responses to treatments and to identify important medical risk factors. Logistic regression is one of the most widely used approaches for clinical prediction modeling. Traumatic brain injury (TBI) is an important public health issue and a leading cause of death and disability worldwide. In this study, we adapt CPXR (Contrast Pattern Aided Regression, a recently introduced regression method) to develop a new logistic regression method called CPXR(Log) for general binary outcome prediction (including prognostic modeling), and we use the method to carry out prognostic modeling for TBI using admission-time data. The models produced by CPXR(Log) achieved AUC as high as 0.93 and specificity as high as 0.97, much better than those reported by previous studies. Our method produced interpretable prediction models for diverse patient groups for TBI, which show that different kinds of patients should be evaluated differently for TBI outcome prediction and that the odds ratios of some predictor variables differ significantly from those given by previous studies; such results can be valuable to physicians.
High-performance graph analysis is unlocking knowledge in computer security, bioinformatics, social networks, and many other data integration areas. Graphs provide a convenient abstraction for many data problems beyond linear algebra. Some problems map directly to linear algebra. Others, like community detection, look eerily similar to sparse linear algebra techniques. And then there are algorithms that strongly resist attempts at making them look like linear algebra. This talk will cover recent results with an emphasis on streaming graph problems, where the graph changes and results need to be updated with minimal latency. We'll also touch on issues of sensitivity and reliability, where graph analysis needs to learn from numerical analysis and linear algebra.
This talk describes a study showing that integrating foveation into modern convolutional neural networks improves their robustness to adversarial attacks and common image corruptions. These slides are from a talk given by Muhammad Ahmed Shah at RIKEN AIP, Tokyo, Japan as part of the TrustML Young Scientist Seminar.
A quick overview of the seed for Meandre 2.0 series. It covers the main motivations moving forward and the disruptive changes introduced via the use of Scala and MongoDB
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0 (Xavier Llorà)
One hundred and fifty years have passed since the publication of Darwin's world-changing manuscript "On the Origin of Species by Means of Natural Selection". Darwin's ideas have proven their power to reach beyond the biology realm, and their ability to define a conceptual framework which allows us to model and understand complex systems. In the mid-1950s and 60s, the efforts of a scattered group of engineers proved the benefits of adopting an evolutionary paradigm to solve complex real-world problems. In the 70s, the growing availability of computers brought us a new collection of artificial evolution paradigms, among which genetic algorithms rapidly gained widespread adoption. Currently, the Internet has driven an exponential growth of information and computational resources that is clearly disrupting our perception and forcing us to reevaluate the boundaries between technology and social interaction. Darwin's ideas can, once again, help us understand such disruptive change. In this talk, I will review the origin of artificial evolution ideas and techniques. I will also show how these techniques are, nowadays, helping to solve a wide range of applications, from life-science problems to Twitter puzzles, and how high-performance computing can make Darwin's ideas a routine tool to help us model and understand complex systems.
Large Scale Data Mining using Genetics-Based Machine Learning (Xavier Llorà)
We are living in the petabyte era. We have larger and larger data to analyze, process, and transform into useful answers for the domain experts. Robust data mining tools, able to cope with petascale volumes and/or high dimensionality while producing human-understandable solutions, are key in several domain areas. Genetics-based machine learning (GBML) techniques are perfect candidates for this task, among others, due to the recent advances in representations, learning paradigms, and theoretical modeling. If evolutionary learning techniques aspire to be a relevant player in this context, they need to have the capacity of processing these vast amounts of data, and they need to process this data within reasonable time. Moreover, massive computation cycles are getting cheaper every day, allowing researchers access to unprecedented degrees of parallelization. Several topics are interlaced in these two requirements: (1) having the proper learning paradigms and knowledge representations, (2) understanding them and knowing when they are suitable for the problem at hand, (3) using efficiency-enhancement techniques, and (4) transforming and visualizing the produced solutions to give back as much insight as possible to the domain experts.
This tutorial will try to answer these questions, following a roadmap that starts with what "large" means and why large is a challenge for GBML methods. Afterwards, we will discuss different facets in which we can overcome this challenge: efficiency-enhancement techniques, representations able to cope with high-dimensional spaces, and scalability of learning paradigms. We will also review a topic interlaced with all of them: how we can model the scalability of the components of our GBML systems to engineer them for the best performance on large datasets. The roadmap continues with examples of real applications of GBML systems and finishes with an analysis of further directions.
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us... (Xavier Llorà)
Data-intensive computing has positioned itself as a valuable programming paradigm to efficiently approach problems requiring processing very large volumes of data. This paper presents a pilot study about how to apply the data-intensive computing paradigm to evolutionary computation algorithms. Two representative cases (selectorecombinative genetic algorithms and estimation of distribution algorithms) are presented, analyzed, and discussed. This study shows that equivalent data-intensive computing evolutionary computation algorithms can be easily developed, providing robust and scalable algorithms for the multicore-computing era. Experimental results show how such algorithms scale with the number of available cores without further modification.
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr... (Xavier Llorà)
Cancer diagnosis is essentially a human task. Almost universally, the process requires the extraction of tissue (biopsy) and examination of its microstructure by a human. To improve diagnoses based on limited and inconsistent morphologic knowledge, a new approach has recently been proposed that uses molecular spectroscopic imaging to utilize microscopic chemical composition for diagnoses. In contrast to visible imaging, the approach results in very large data sets, as each pixel contains the entire molecular vibrational spectroscopy data from all chemical species. Here, we propose data handling and analysis strategies to allow computer-based diagnosis of human prostate cancer by applying a novel genetics-based machine learning technique (NAX). We apply this technique to demonstrate both fast learning and accurate classification that, additionally, scales well with parallelization. Preliminary results demonstrate that this approach can improve current clinical practice in diagnosing prostate cancer.
This presentation covers a brief overview of the current stage of the DISCUS project. General overview and introduction to some of the currently available tools
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
1. Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Xavier Llorà, Kumara Sastry, & David E. Goldberg
Illinois Genetic Algorithms Lab
University of Illinois at Urbana-Champaign
{xllora,kumara,deg}@illigal.ge.uiuc.edu
2. Motivation and Early Work
• Can we apply Wilson’s ideas for evolving rule sets
formed only by maximally accurate and general rules in
Pittsburgh LCS?
• Previous Multi-objective approaches:
Bottom up (Bernadó, 2002)
• Panmictic populations
• Multimodal optimization (sharing/crowding for niche formation)
Top down (Llorà, Goldberg, Traus, Bernadó, 2003)
• Explicitly address accuracy and generality
• Use them to produce compact rule sets
• The compact classifier system (CCS) is rooted in the bottom-up
approach.
NIGEL 2006 Llorà, X., Sastry, K., and Goldberg, D. 2
3. Maximally Accurate and General Rules
• Accuracy and generality can be computed as
α(r) = (n_t+(r) + n_t−(r)) / n_t        γ(r) = n_t+(r) / n_m
• Fitness should combine accuracy and generality:
f(r) = α(r) · γ(r)
• Such a measure can be applied either to rules or to rule sets.
• The CCS uses this fitness and a compact genetic algorithm
(cGA) to evolve such rules.
• One cGA run provides one rule.
• Multiple rules are required to form a rule set.
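Assuming the definitions above as reconstructed (n_t = number of training examples, n_m(r) = examples matched by rule r, n_t+(r) = positive examples r covers correctly, n_t−(r) = negative examples r correctly leaves unmatched), a minimal Python sketch of this fitness on the 6-input multiplexer could look like:

```python
from itertools import product

def mux6(bits):
    """6-input multiplexer: the first 2 address bits select one of 4 data bits."""
    addr = int(bits[:2], 2)
    return bits[2 + addr] == "1"

# Full training set: every 6-bit string labelled by the multiplexer.
EXAMPLES = [("".join(b), mux6("".join(b))) for b in product("01", repeat=6)]

def matches(rule, x):
    """A rule over {0, 1, #} matches x if every non-# position agrees."""
    return all(r == "#" or r == b for r, b in zip(rule, x))

def fitness(rule):
    """f(r) = alpha(r) * gamma(r), with alpha and gamma as on the slide."""
    n_t = len(EXAMPLES)
    matched = [y for x, y in EXAMPLES if matches(rule, x)]
    n_m = len(matched)
    n_t_pos = sum(1 for y in matched if y)             # positives correctly covered
    n_t_neg = sum(1 for x, y in EXAMPLES
                  if not matches(rule, x) and not y)   # negatives correctly rejected
    alpha = (n_t_pos + n_t_neg) / n_t
    gamma = n_t_pos / n_m if n_m else 0.0
    return alpha * gamma

print(fitness("001###"))  # maximally general, maximally accurate rule
print(fitness("001000"))  # over-specific version of the same rule scores lower
```

The more general rule scores higher, which is the pressure the fitness is designed to apply.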
4. The cGA Can Make It
• Rules may be obtained by optimizing f(r) = α(r) · γ(r)
• The basic cGA scheme:
1. Initialization: p_{x_i}^0 = 0.5
2. Model sampling (two individuals are generated)
3. Evaluation (f(r))
4. Selection (tournament selection)
5. Probabilistic model update
6. Repeat steps 2–5 until the termination criteria are met
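The scheme above can be sketched in Python on a toy onemax objective (illustrative only; the deck applies it to the rule fitness f(r)):

```python
import random

def cga(fitness, length, pop_size=50, max_iters=5000, seed=1):
    """Compact GA: evolve a probability vector instead of a population."""
    rng = random.Random(seed)
    p = [0.5] * length                           # step 1: p_{x_i}^0 = 0.5
    for _ in range(max_iters):
        # step 2: sample two individuals from the model
        a = [int(rng.random() < pi) for pi in p]
        b = [int(rng.random() < pi) for pi in p]
        # steps 3-4: evaluate and select (binary tournament)
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # step 5: shift the model by 1/pop_size toward the winner
        for i in range(length):
            if winner[i] != loser[i]:
                p[i] += (1 / pop_size) if winner[i] else -(1 / pop_size)
                p[i] = min(1.0, max(0.0, p[i]))
        if all(pi in (0.0, 1.0) for pi in p):    # step 6: model converged
            break
    return p

print(cga(sum, length=10))   # onemax: fitness = number of ones
```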
5. cGA Model Perturbation
• Facilitate the evolution of different rules
• Explore the frequency of appearance of each optimal
rule
• Initial model perturbation: p_{x_i}^0 = 0.5 + U(−0.4, 0.4)
• Experiments using the 3-input multiplexer
• 1,000 independent runs
• Visualize the pair-wise relations of the genes
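The perturbed initialization is a one-liner; a sketch (the function name is my own):

```python
import random

def perturbed_model(length, rng=random):
    """Initial cGA model perturbation: p_{x_i}^0 = 0.5 + U(-0.4, 0.4).
    Each gene starts biased one way or the other, so independent runs
    tend to converge to different optimal rules."""
    return [0.5 + rng.uniform(-0.4, 0.4) for _ in range(length)]

print(perturbed_model(3))   # e.g. an initial model for a 3-gene rule
```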
6. But One Rule Is Not Enough
• Model perturbation in the cGA evolves different rules
• The goal: evolve a population of rules that solve the
problem together
• The fitness measure f(r) can also be applied to rule
sets
• Two mechanisms:
Spawn new populations until the solution is met
Fuse populations when they represent the same rule
7. Spawning and Fusing Populations
8. Experiments & Scalability
• Analysis using multiplexer problems (3-, 6-, and 11-input)
• The number of rules in [O] grows exponentially.
It grows as 2^i, where i is the number of inputs.
Assume equal probability of hitting each rule (binomial model).
The number of runs needed to obtain all the rules in [O] grows
exponentially.
• cGA success rate as a function of the problem size:
3-input: 97%
6-input: 73.93%
11-input: 43.03%
• Scalability over 10,000 independent runs
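Under the equal-probability (binomial) assumption, the expected number of independent single-rule runs needed to collect all k rules of [O] is the coupon-collector quantity k·H_k; a sketch with illustrative values of k:

```python
def expected_runs(k):
    """Coupon collector: expected independent runs until all k equally
    likely rules of [O] have been obtained at least once (k * H_k)."""
    return k * sum(1 / i for i in range(1, k + 1))

for k in (4, 8, 16, 32):
    print(f"{k:>2} rules in [O]: {expected_runs(k):6.1f} expected runs")
```

This ignores failed runs, so it underestimates the real cost; the per-run success rates above make multiple runs even less attractive.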
10. So?
• Open questions:
Multiple runs are not an option.
Could the poor cGA scalability be the result of the existence of linkage?
• The χ-ary extended compact classifier system (χeCCS) needs to
provide answers to:
Perform linkage learning to improve the scalability of the rule learning
process.
Evolve [O] in a single run (rule niching?).
• The χeCCS answer:
Use the extended compact genetic algorithm (Harik, 1999)
Rule niching via restricted tournament replacement (Harik, 1995)
11. Extended Compact Genetic Algorithm
• A probabilistic model-building GA (Harik, 1999):
Builds models of good solutions as linkage groups
• Key idea:
Good probability distribution → linkage learning
• Key components:
Representation: marginal product model (MPM)
• Marginal distribution over a gene partition
Quality: minimum description length (MDL)
• Occam's razor principle
• All things being equal, simpler models are better
Search method: greedy heuristic search
12. Marginal Product Model (MPM)
• Partition variables into clusters
• Product of marginal distributions on a partition of genes
• Gene partition maps to linkage groups
MPM: [1, 2, 3], [4, 5, 6], …, [l−2, l−1, l]
Genes: x1 x2 x3 | x4 x5 x6 | … | x(l−2) x(l−1) xl
Each three-gene partition stores its joint marginal distribution:
{p000, p001, p010, p011, p100, p101, p110, p111}
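Estimating an MPM from a selected population amounts to one joint frequency table per linkage group; a sketch (names are my own):

```python
from collections import Counter

def mpm_marginals(population, partition):
    """Marginal product model: one joint frequency table per gene group.

    population: list of equal-length strings over the alphabet
    partition:  list of gene-index groups, e.g. [[0, 1, 2], [3, 4, 5]]
    """
    n = len(population)
    model = []
    for group in partition:
        counts = Counter("".join(ind[i] for i in group) for ind in population)
        model.append({setting: c / n for setting, c in counts.items()})
    return model

pop = ["000111", "000101", "111111", "111000"]
for table in mpm_marginals(pop, [[0, 1, 2], [3, 4, 5]]):
    print(table)
```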
13. Minimum Description Length Metric
• Hypothesis: for an optimal model,
model size and error are minimal
• Model complexity, Cm:
number of bits required to store all marginal probabilities
• Compressed population complexity, Cp:
entropy of the marginal distributions over all partitions
• MDL metric: Cc = Cm + Cp
14. Building an Optimal MPM
1. Assume independent genes ([1], [2], …, [l])
2. Compute the MDL metric, Cc
3. Form all combinations of two-subset merges,
e.g., {([1,2],[3],…,[l]), ([1,3],[2],…,[l]), …, ([1],[2],…,[l−1,l])}
4. Compute the MDL metric for all model candidates
5. Select the candidate with the minimum MDL metric, C′c
6. If C′c < Cc, accept the model and go to step 2
7. Else, the current model is optimal
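A sketch of the metric and the greedy search together, for a binary alphabet, using the standard eCGA formulas Cm = log2(n+1)·Σ(2^|g| − 1) and Cp = n·Σ H(group marginal); all names here are my own:

```python
import math
from collections import Counter

def mdl(population, partition):
    """MDL metric C_c = C_m + C_p for a binary marginal product model."""
    n = len(population)
    # model complexity: bits to store the marginals of every linkage group
    c_m = math.log2(n + 1) * sum(2 ** len(g) - 1 for g in partition)
    # compressed population complexity: entropy of each group's marginal
    c_p = 0.0
    for group in partition:
        counts = Counter("".join(ind[i] for i in group) for ind in population)
        c_p += n * -sum((c / n) * math.log2(c / n) for c in counts.values())
    return c_m + c_p

def greedy_mpm(population):
    """Greedy model search: start from independent genes, keep taking the
    pairwise merge that lowers C_c the most, stop when no merge helps."""
    partition = [[i] for i in range(len(population[0]))]
    best = mdl(population, partition)
    while len(partition) > 1:
        candidates = []
        for i in range(len(partition)):
            for j in range(i + 1, len(partition)):
                merged = ([partition[i] + partition[j]] +
                          [g for k, g in enumerate(partition) if k not in (i, j)])
                candidates.append((mdl(population, merged), merged))
        score, merged = min(candidates)
        if score >= best:        # no merge lowers C_c: current model is optimal
            break
        partition, best = merged, score
    return partition, best

# Genes 0 and 1 always agree (one linkage group); gene 2 is independent.
pop = ["001", "000", "111", "110", "001", "110", "000", "111"]
partition, score = greedy_mpm(pop)
print(sorted(sorted(g) for g in partition))
```

On this toy population the search merges genes 0 and 1 and then stops, since folding in the independent gene 2 would only raise the metric.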
15. Extended Compact Genetic Algorithm
1. Initialize the population (usually random initialization)
2. Evaluate the fitness of the individuals
3. Select promising solutions (e.g., tournament selection)
4. Build the probabilistic model
5. Optimize its structure & parameters to best fit the selected individuals
(automatic identification of sub-structures)
6. Sample the model to create new candidate solutions
(effective exchange of building blocks)
7. Repeat steps 2–6 until some convergence criterion is met
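The loop can be sketched end to end; for brevity this stand-in builds independent per-gene marginals (a UMDA-style simplification, not the full MPM/MDL search) so the select → model → sample structure stays visible:

```python
import random

def ecga_like(fitness, length, pop_size=100, generations=30, seed=3):
    """Skeleton of the model-building loop with independent marginals."""
    rng = random.Random(seed)
    # 1. initialize the population at random
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # 2-3. evaluate and select promising solutions (binary tournament)
        selected = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            selected.append(a if fitness(a) >= fitness(b) else b)
        # 4-5. build the model (here: per-gene marginals, not a full MPM)
        p = [sum(ind[i] for ind in selected) / pop_size for i in range(length)]
        # 6. sample the model to create the next population
        pop = [[int(rng.random() < p[i]) for i in range(length)]
               for _ in range(pop_size)]
    return max(pop, key=fitness)

print(ecga_like(sum, length=20))   # onemax as a stand-in objective
```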
16. Models built by eCGA
• Use model-building procedure of extended compact GA
Partition genes into (mutually) independent groups
Start with the lowest complexity model
Search for a least-complex, most-accurate model
Model Structure                                                  Metric
[X0] [X1] [X2] [X3] [X4] [X5] [X6] [X7] [X8] [X9] [X10] [X11]    1.0000
[X0] [X1] [X2] [X3] [X4X5] [X6] [X7] [X8] [X9] [X10] [X11]       0.9933
[X0] [X1] [X2] [X3] [X4X5X7] [X6] [X8] [X9] [X10] [X11]          0.9819
[X0] [X1] [X2] [X3] [X4X5X6X7] [X8] [X9] [X10] [X11]             0.9644
⋮
[X0] [X1] [X2] [X3] [X4X5X6X7] [X8X9X10X11]                      0.9273
⋮
[X0X1X2X3] [X4X5X6X7] [X8X9X10X11]                               0.8895
17. Modifying ecGA for Rule Learning
• Rules are described using χ-ary alphabets {0, 1, #}.
• χeCCS uses a χ-ary version of ecGA (Sastry and Goldberg,
2003; de la Ossa, Sastry, and Lobo, 2006).
• Maximally general and maximally accurate rules may be
obtained using:
f(r) = α(r) · γ(r)
• Needs to maintain multiple rules in a run → niching
We need an efficient niching method that does not adversely
affect the quality of the probabilistic models:
restricted tournament replacement (Harik, 1995)
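A sketch of restricted tournament replacement with Hamming distance and window size w (the two-peak toy fitness is illustrative, not the deck's rule fitness):

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def rtr_insert(population, offspring, fitness, w=8, rng=random):
    """Restricted tournament replacement: the offspring competes only
    against the most similar of w randomly chosen members, so different
    niches (here: different rules) can coexist in one population."""
    window = rng.sample(range(len(population)), min(w, len(population)))
    closest = min(window, key=lambda i: hamming(population[i], offspring))
    if fitness(offspring) > fitness(population[closest]):
        population[closest] = offspring   # replace only its nearest rival

# Toy usage: a two-peak fitness keeps both all-zeros and all-ones niches.
f = lambda s: max(sum(s), len(s) - sum(s))
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
for _ in range(200):
    child = [random.randint(0, 1) for _ in range(8)]
    rtr_insert(pop, child, f)
print(pop[:3])
```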
18. Experiments
• Goals:
1. Is linkage learning useful to solve the multiplexer problem using
Pittsburgh LCS?
2. How far can we push it?
• Multiplexer problems:
Address bits determine which input to use
There is an underlying structure, isn't there?
• The largest instance solved using Pittsburgh approaches (11-input):
Matches all the examples
No linkage learning available
• We borrowed the population-sizing theory from eCGA.
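The multiplexer family itself can be written generically: an instance with a address bits has i = a + 2^a inputs, giving i ∈ {3, 6, 11, 20, 37, 70} for a = 1…6 (sketch):

```python
def multiplexer(bits, a):
    """i-input multiplexer with a address bits (i = a + 2**a):
    the address selects which data bit is passed through."""
    assert len(bits) == a + 2 ** a
    addr = int("".join(map(str, bits[:a])), 2)
    return bits[a + addr]

# 6-input multiplexer (a = 2): address '10' selects data bit 2.
print(multiplexer([1, 0, 0, 0, 1, 0], a=2))   # -> 1
```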
19. χeCCS Models for Different Multiplexers
(Figure: evolved models for the different multiplexers; building block size increases.)
20. χeCCS Scalability
• Follows facet-wise theory:
1. Grows exponentially with the number of address bits (the building block size)
2. Grows quadratically with the problem size
21. Conclusions
• The χeCCS builds on competent GAs
• The facet-wise models from GA theory hold
• The χeCCS is able to:
1. Perform linkage learning to improve the scalability of the rule
learning process.
2. Evolve [O] in a single run.
• The χeCCS shows the need for linkage learning in
Pittsburgh LCS to effectively solve multiplexer
problems.
• The χeCCS solved the 20-input, 37-input, and 70-input
multiplexer problems for the first time using a Pittsburgh
LCS.
22. Linkage Learning for Pittsburgh LCS:
Making Problems Tractable
Xavier Llorà, Kumara Sastry, & David E. Goldberg
Illinois Genetic Algorithms Lab
University of Illinois at Urbana-Champaign
{xllora,kumara,deg}@illigal.ge.uiuc.edu