Presentation by Xavier Llorà, Kumara Sastry, & David E. Goldberg showing how linkage learning is possible on Pittsburgh-style learning classifier systems.

Linkage Learning for Pittsburgh LCS: Making Problems Tractable

  1. Linkage Learning for Pittsburgh LCS: Making Problems Tractable
     Xavier Llorà, Kumara Sastry, & David E. Goldberg
     Illinois Genetic Algorithms Lab, University of Illinois at Urbana-Champaign
     {xllora,kumara,deg}@illigal.ge.uiuc.edu
  2. Motivation and Early Work
     • Can we apply Wilson's ideas for evolving rule sets formed only by maximally accurate and general rules in Pittsburgh LCS?
     • Previous multi-objective approaches:
       - Bottom up (Bernadó, 2002): panmictic populations; multimodal optimization (sharing/crowding for niche formation)
       - Top down (Llorà, Goldberg, Traus, Bernadó, 2003): explicitly address accuracy and generality, and use them to produce compact rule sets
     • The compact classifier system (CCS) is rooted in the bottom-up approach.
     NIGEL 2006   Llorà, X., Sastry, K., and Goldberg, D.
  3. Maximally Accurate and General Rules
     • Accuracy and generality can be computed as
         α(r) = (n_t+(r) + n_t−(r)) / n_t        γ(r) = n_t+(r) / n_m(r)
     • Fitness should combine accuracy and generality: f(r) = α(r) · γ(r)
     • Such measures can be applied either to rules or to rule sets.
     • The CCS uses this fitness and a compact genetic algorithm (cGA) to evolve such rules.
     • One cGA run provides one rule; multiple rules are required to form a rule set.
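The measures on this slide can be sketched in a few lines of Python. This is a hedged reading of the slide's formulas, not the authors' code: `n_t` counts training examples, `n_m` those matched by the rule, `n_t+` matched examples of the predicted class, and `n_t−` correctly rejected examples of the other class. Rules are ternary strings over {0, 1, #} predicting class 1 of the 3-input multiplexer.

```python
# Hedged sketch of the rule-evaluation measures; names are illustrative.
from itertools import product

def multiplexer3(bits):
    # bit 0 is the address; it selects between data bits 1 and 2
    return bits[1 + bits[0]]

def matches(rule, bits):
    # '#' is a don't-care; every other position must agree with the input
    return all(r == '#' or int(r) == b for r, b in zip(rule, bits))

def fitness(rule, examples):
    n_t = len(examples)
    n_m = sum(matches(rule, x) for x, _ in examples)                  # matched
    n_tp = sum(y == 1 and matches(rule, x) for x, y in examples)      # matched positives
    n_tn = sum(y == 0 and not matches(rule, x) for x, y in examples)  # rejected negatives
    alpha = (n_tp + n_tn) / n_t            # accuracy term from the slide
    gamma = n_tp / n_m if n_m else 0.0     # generality term from the slide
    return alpha * gamma                   # f(r) = alpha(r) * gamma(r)

examples = [(bits, multiplexer3(bits)) for bits in product((0, 1), repeat=3)]
print(fitness('01#', examples), fitness('###', examples))
```

The two maximally accurate, maximally general rules for class 1 of this problem ('01#' and '1#1') score identically, while the fully general rule '###' scores lower.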
  4. The cGA Can Make It
     • Rules may be obtained by optimizing f(r) = α(r) · γ(r)
     • The basic cGA scheme:
       1. Initialization: every model probability starts at p_xi = 0.5
       2. Model sampling (two individuals are generated)
       3. Evaluation (f(r))
       4. Selection (tournament selection)
       5. Probabilistic model update
       6. Repeat steps 2-5 until the termination criteria are met
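The six steps above can be sketched as a minimal runnable cGA. The objective is a stand-in (OneMax) rather than the rule fitness f(r), and all names and parameter values are illustrative assumptions, not the authors' configuration.

```python
# Minimal compact GA following the six steps on the slide.
import random

def cga(fitness, n_bits=20, pop_size=50, max_iters=20000, seed=1):
    rng = random.Random(seed)
    p = [0.5] * n_bits                              # 1. initialize: p_xi = 0.5
    for _ in range(max_iters):
        a = [int(rng.random() < pi) for pi in p]    # 2. sample two individuals
        b = [int(rng.random() < pi) for pi in p]
        if fitness(a) < fitness(b):                 # 3.-4. evaluate + tournament
            a, b = b, a                             #       a is now the winner
        for i in range(n_bits):                     # 5. probabilistic model update
            if a[i] != b[i]:
                p[i] = min(1.0, max(0.0, p[i] + (1 if a[i] else -1) / pop_size))
        if all(pi in (0.0, 1.0) for pi in p):       # 6. stop once the model converges
            break
    return p

model = cga(sum)                                    # OneMax: fitness = number of ones
print(model)
```

Sampling only two individuals per step and shifting each disagreeing probability by 1/pop_size is what lets the cGA mimic a population of size pop_size while storing just one probability vector.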
  5. cGA Model Perturbation
     • Facilitates the evolution of different rules
     • Explores the frequency of appearance of each optimal rule
     • Initial model perturbation: p_xi = 0.5 + U(−0.4, 0.4)
     • Experiments using the 3-input multiplexer, 1,000 independent runs
     • Visualize the pair-wise relations of the genes
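The perturbed initialization is a one-line variation on the standard cGA start, sketched below (function name is illustrative):

```python
# Each initial model probability gets independent uniform noise in [-0.4, 0.4],
# so different runs start from different models and can reach different rules.
import random

def perturbed_model(n_bits, rng):
    return [0.5 + rng.uniform(-0.4, 0.4) for _ in range(n_bits)]

rng = random.Random(0)
p0 = perturbed_model(6, rng)
print(p0)   # every entry lies in [0.1, 0.9]
```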
  6. But One Rule Is Not Enough
     • Model perturbation in the cGA evolves different rules
     • The goal: evolve a population of rules that solve the problem together
     • The fitness measure f(r) can also be applied to rule sets
     • Two mechanisms:
       - Spawn a population until the solution is met
       - Fuse populations when they represent the same rule
  7. Spawning and Fusing Populations
  8. Experiments & Scalability
     • Analysis using multiplexer problems (3-, 6-, and 11-input)
     • The number of rules in [O] grows exponentially:
       - It grows as 2^i, where i is the number of inputs
       - Assume equal probability of hitting a rule (binomial model)
       - The number of runs needed to obtain all the rules in [O] grows exponentially
     • cGA success rate as a function of the problem size:
       - 3-input: 97%; 6-input: 73.93%; 11-input: 43.03%
     • Scalability measured over 10,000 independent runs
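The run-count argument above can be checked with a short computation. Under the slide's equal-probability assumption, collecting all k = 2^i rules via independent single-rule runs is the classic coupon-collector problem, whose expected run count k·H_k grows (slightly faster than) exponentially in i. The closed form is a standard probability result, not taken from the slides.

```python
# Expected number of independent single-rule runs to collect all rules in [O],
# assuming each run returns one of the k = 2^i rules uniformly at random.
def expected_runs(i):
    k = 2 ** i                                   # rules in [O]
    harmonic = sum(1 / j for j in range(1, k + 1))
    return k * harmonic                          # coupon-collector expectation

for i in (3, 6, 11):
    print(i, round(expected_runs(i)))
```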
  9. Scalability of CCS
  10. So?
      • Open questions:
        - Multiple runs are not an option.
        - Could the poor cGA scalability be the result of the existence of linkage?
      • The χ-ary extended compact classifier system (χeCCS) needs to provide answers to:
        - Perform linkage learning to improve the scalability of the rule-learning process
        - Evolve [O] in a single run (rule niching?)
      • The χeCCS answer:
        - Use the extended compact genetic algorithm (Harik, 1999)
        - Rule niching via restricted tournament replacement (Harik, 1995)
  11. Extended Compact Genetic Algorithm
      • A probabilistic model-building GA (Harik, 1999): builds models of good solutions as linkage groups
      • Key idea: good probability distribution → linkage learning
      • Key components:
        - Representation: marginal product model (MPM), the marginal distributions of a gene partition
        - Quality: minimum description length (MDL); Occam's razor principle: all things being equal, simpler models are better
        - Search method: greedy heuristic search
  12. Marginal Product Model (MPM)
      • Partition variables into clusters
      • Product of marginal distributions on a partition of genes
      • Gene partitions map to linkage groups
      • Example MPM over genes x1 … xl: [1, 2, 3], [4, 5, 6], …, [l-2, l-1, l]
        - Each three-gene group stores its joint frequencies {p000, p001, p010, p011, p100, p101, p110, p111}
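A hedged sketch of what an MPM stores and how it is sampled: each linkage group keeps a joint table over its own bit patterns, and an individual is drawn one group at a time, independently across groups. The partition and probability tables below are made-up illustrations.

```python
# Sampling from a marginal product model: groups are independent of each
# other, but bits inside a group are drawn jointly from the group's table.
import random

def sample_mpm(partition, tables, n_bits, rng):
    ind = [0] * n_bits
    for group, table in zip(partition, tables):
        patterns = list(table)
        weights = [table[p] for p in patterns]
        chosen = rng.choices(patterns, weights=weights)[0]
        for pos, bit in zip(group, chosen):
            ind[pos] = bit
    return ind

rng = random.Random(3)
partition = [(0, 1, 2), (3, 4, 5)]
tables = [{(0, 0, 0): 0.5, (1, 1, 1): 0.5},   # tightly linked group
          {(0, 1, 0): 1.0}]                   # deterministic group
print(sample_mpm(partition, tables, 6, rng))
```

Note how the first group only ever produces 000 or 111: a plain per-gene model could not express that linkage, which is exactly what the MPM adds.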
  13. Minimum Description Length Metric
      • Hypothesis: for an optimal model, model size and error are minimal
      • Model complexity, Cm: number of bits required to store all marginal probabilities
      • Compressed population complexity, Cp: entropy of the marginal distributions over all partitions
      • MDL metric: Cc = Cm + Cp
  14. Building an Optimal MPM
      1. Assume independent genes: [1], [2], …, [l]
      2. Compute the MDL metric, Cc
      3. Form all candidate models obtained by merging two subsets, e.g. {([1,2],[3],…,[l]), ([1,3],[2],…,[l]), …, ([1],[2],…,[l-1,l])}
      4. Compute the MDL metric for all model candidates
      5. Select the candidate with the minimum MDL metric; if it improves on the current model, accept it and go to step 2; else, the current model is optimal
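The greedy loop above can be sketched end to end. The metric follows the usual eCGA description (Cm as bits to store group frequencies, Cp as population entropy); the exact constants and the toy population are illustrative assumptions, not the authors' implementation.

```python
# Greedy MPM search: start from independent genes, repeatedly try every
# merge of two groups, keep the best merge while it lowers the MDL score.
import math
from itertools import combinations

def mdl(partition, pop):
    n = len(pop)
    score = 0.0
    for group in partition:
        score += (2 ** len(group) - 1) * math.log2(n + 1)   # Cm: stored frequencies
        counts = {}
        for ind in pop:
            key = tuple(ind[i] for i in group)
            counts[key] = counts.get(key, 0) + 1
        score += sum(-n * (c / n) * math.log2(c / n)        # Cp: n * group entropy
                     for c in counts.values())
    return score

def greedy_mpm(pop, n_bits):
    model = [(i,) for i in range(n_bits)]                   # step 1: independent genes
    best = mdl(model, pop)                                  # step 2
    while True:
        improved = None
        for a, b in combinations(range(len(model)), 2):     # steps 3-4: all merges
            merged = [g for k, g in enumerate(model) if k not in (a, b)]
            merged.append(tuple(sorted(model[a] + model[b])))
            score = mdl(merged, pop)
            if score < best:
                best, improved = score, merged
        if improved is None:                                # step 5: no gain -> optimal
            return model
        model = improved

# Toy population: genes 0 and 1 are perfectly correlated, gene 2 is free.
pop = [[0, 0, 1], [0, 0, 0], [1, 1, 1], [1, 1, 0]] * 4
print(greedy_mpm(pop, 3))
```

On this toy data the search merges genes 0 and 1 into one linkage group and leaves gene 2 alone, because the merged table compresses the population enough to pay for its extra stored frequencies.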
  15. Extended Compact Genetic Algorithm
      1. Initialize the population (usually random initialization)
      2. Evaluate the fitness of the individuals
      3. Select promising solutions (e.g., tournament selection)
      4. Build the probabilistic model: optimize structure & parameters to best fit the selected individuals (automatic identification of sub-structures)
      5. Sample the model to create new candidate solutions (effective exchange of building blocks)
      6. Repeat steps 2-5 until some convergence criteria are met
  16. Models Built by eCGA
      • Use the model-building procedure of the extended compact GA:
        - Partition genes into (mutually) independent groups
        - Start with the lowest-complexity model
        - Search for a least-complex, most-accurate model

        Model structure                                                  Metric
        [X0] [X1] [X2] [X3] [X4] [X5] [X6] [X7] [X8] [X9] [X10] [X11]   1.0000
        [X0] [X1] [X2] [X3] [X4X5] [X6] [X7] [X8] [X9] [X10] [X11]      0.9933
        [X0] [X1] [X2] [X3] [X4X5X7] [X6] [X8] [X9] [X10] [X11]         0.9819
        [X0] [X1] [X2] [X3] [X4X5X6X7] [X8] [X9] [X10] [X11]            0.9644
        …
        [X0] [X1] [X2] [X3] [X4X5X6X7] [X8X9X10X11]                     0.9273
        …
        [X0X1X2X3] [X4X5X6X7] [X8X9X10X11]                              0.8895
  17. Modifying eCGA for Rule Learning
      • Rules are described using χ-ary alphabets {0, 1, #}.
      • χeCCS uses a χ-ary version of eCGA (Sastry and Goldberg, 2003; de la Ossa, Sastry, and Lobo, 2006).
      • Maximally general and maximally accurate rules may be obtained using f(r) = α(r) · γ(r)
      • Multiple rules must be maintained within a run → niching:
        - We need an efficient niching method that does not adversely affect the quality of the probabilistic models
        - Restricted tournament replacement (Harik, 1995)
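Restricted tournament replacement can be sketched as below: each offspring competes with the most similar member of a random window of the population and replaces it only if fitter, so distinct niches displace only their own kind. The window size, Hamming similarity, and toy fitness are illustrative choices, not the authors' exact configuration.

```python
# One RTR insertion step; similarity is Hamming distance over the genome.
import random

def rtr_insert(pop, offspring, fitness, window, rng):
    sample_idx = rng.sample(range(len(pop)), window)
    # most similar = smallest Hamming distance to the offspring
    closest = min(sample_idx,
                  key=lambda i: sum(a != b for a, b in zip(pop[i], offspring)))
    if fitness(offspring) > fitness(pop[closest]):
        pop[closest] = offspring            # replace only the similar loser
    return pop

rng = random.Random(7)
fitness = lambda ind: sum(ind)              # stand-in objective
pop = [[0, 0, 0, 0] for _ in range(8)]
rtr_insert(pop, [1, 1, 1, 1], fitness, window=4, rng=rng)
print(sum(ind == [1, 1, 1, 1] for ind in pop))
```

Because each offspring displaces at most one similar individual, dissimilar high-fitness rules can coexist in the same population, which is the niching behavior χeCCS needs.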
  18. Experiments
      • Goals:
        1. Is linkage learning useful to solve the multiplexer problem using Pittsburgh LCS?
        2. How far can we push it?
      • Multiplexer problems:
        - Address bits determine which input to use
        - There is an underlying structure, isn't there?
        - The largest solved using Pittsburgh approaches is the 11-input one
      • Match all the examples:
        - No linkage-learning population-sizing theory available
        - We borrowed the population-sizing theory developed for eCGA
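The multiplexer family used in the experiments fits in a few lines; with k address bits the problem has k + 2^k inputs, so the 3-, 6-, 11-, and 20-input problems correspond to k = 1, 2, 3, 4.

```python
# k-address-bit multiplexer: the first k bits form an address that selects
# one of the remaining 2^k data bits as the output.
def multiplexer(bits, k):
    address = int(''.join(map(str, bits[:k])), 2)
    return bits[k + address]

print(multiplexer([1, 0, 1], 1))            # address 1 selects bits[2] -> 1
print(multiplexer([0, 1, 1, 0, 0, 0], 2))   # address 01 selects bits[3] -> 0
```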
  19. χeCCS Models for Different Multiplexers
      • Building block size increases
  20. χeCCS Scalability
      • Follows facet-wise theory:
        1. Grows exponentially with the number of address bits (building block size)
        2. Grows quadratically with the problem size
  21. Conclusions
      • The χeCCS builds on competent GAs
      • The facet-wise models from GA theory hold
      • The χeCCS is able to:
        1. Perform linkage learning to improve the scalability of the rule-learning process
        2. Evolve [O] in a single run
      • The χeCCS shows the need for linkage learning in Pittsburgh LCS to effectively solve multiplexer problems
      • χeCCS solved the 20-input, 37-input, and 70-input multiplexer problems for the first time using Pittsburgh LCS
  22. Linkage Learning for Pittsburgh LCS: Making Problems Tractable
      Xavier Llorà, Kumara Sastry, & David E. Goldberg
      Illinois Genetic Algorithms Lab, University of Illinois at Urbana-Champaign
      {xllora,kumara,deg}@illigal.ge.uiuc.edu
