Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques

A byproduct of using probabilistic model-building genetic algorithms is the creation of cheap and accurate surrogate models. Learning classifier systems---and genetics-based machine learning in general---can greatly benefit from such surrogates, which may replace the costly matching of a rule against large data sets. In this paper we investigate the accuracy of such surrogate fitness functions when coupled with the probabilistic models evolved by the χ-ary extended compact classifier system (χeCCS). To achieve this goal, we show that the probabilistic models need to represent all the basis functions required for creating an accurate surrogate. We also introduce a procedure to transform populations of rules into dependency structure matrices (DSMs), which allows building models of overlapping building blocks---a necessary condition to accurately estimate the fitness of the evolved rules.


Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques
Xavier Llorà1,2, Kumara Sastry2, Tian-Li Yu3, David E. Goldberg2
1 National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
2 Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign
3 Department of Electrical Engineering, National Taiwan University
Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199
GECCO 2007

Motivation
• Competent GBML
  – Use competent GAs to approach GBML problems
  – Take advantage of competent GA scalability
  – Provide insight about problem structure
  – χeCCS by Llorà, Sastry, Goldberg & de la Ossa (2006)
• Rule matching may threaten practical applications
  – Even for small-dimensional problems (MUX 20), rule matching may take more than 85% of the execution time in XCS
  – As the dimensionality or the cardinality of the training set increases, rule matching dominates the overall execution time
  – Efficient implementations (Llorà & Sastry, 2006) still require matching rules

Motivation
• Competent GAs
  – Byproduct: models and problem-structure insight
  – Revisit evaluation relaxation for expensive fitness evaluations
  – Idea: build a cheap surrogate fitness that is accurate enough
  – Successfully applied to GAs (Sastry, Lima & Goldberg, 2006)
  – Helps cut down the number of fitness evaluations
• GBML
  – Can we transfer the same ideas to GBML approaches?
  – What are the requirements for competent GBML to benefit from fitness relaxation?

Outline
• Overview of χeCCS
• Evaluation relaxation
• Fitness inheritance using least squares fitting
• Fitness inheritance and χeCCS
• Results
• Conclusions

χ-ary Extended Compact Classifier System
• No reinforcement learning is used
• A competent GA is in charge of the learning
• The idea:
  – A population of single rules
  – For each rule we compute its fitness
  – The χ-ary extended compact genetic algorithm
  – Niching to maintain different accurate rules (restricted tournament replacement)

Maximally Accurate and General Rules
• Accuracy and generality can be computed as
      \alpha(r) = \frac{n_{t^+}(r) + n_{t^-}(r)}{n_t} \qquad \nu(r) = \frac{n_{t^+}(r)}{n_m}
• Fitness should combine accuracy and generality
      f(r) = \alpha(r)\,\nu(r)^{\gamma}
• Such a measure can be applied either to rules or to rule sets (a worked sketch of the fitness computation follows below)

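To make the fitness concrete, here is a minimal Python sketch (not part of the original slides). It assumes a particular reading of the slide's notation: n_t+ as positive examples the rule correctly covers, n_t- as negative examples it correctly leaves uncovered, n_t as the total number of training examples, and n_m as the number of examples matched; the toy data set and the γ value are illustrative.

```python
def matches(rule, example):
    """A ternary rule matches an example iff every non-'#' position agrees."""
    return all(r == "#" or r == x for r, x in zip(rule, example))

def rule_fitness(rule, examples, labels, gamma=1.0):
    """f(r) = alpha(r) * nu(r)**gamma under the interpretation stated above."""
    n_t = len(examples)
    n_m = sum(matches(rule, x) for x in examples)
    n_tp = sum(matches(rule, x) and y == 1 for x, y in zip(examples, labels))
    n_tn = sum(not matches(rule, x) and y == 0 for x, y in zip(examples, labels))
    alpha = (n_tp + n_tn) / n_t          # fraction of examples handled correctly
    nu = n_tp / n_m if n_m else 0.0      # generality term
    return alpha * nu ** gamma

# Toy 3-input multiplexer data: x0 is the address bit, x1/x2 the data bits.
examples = ["000", "001", "010", "011", "100", "101", "110", "111"]
labels   = [0, 0, 1, 1, 0, 1, 0, 1]
print(rule_fitness("01#", examples, labels))   # one maximally general class-1 rule -> 0.75
```

Note that every call to `matches` here is exactly the per-example matching cost the deck argues should be avoided, which is what the surrogate later replaces.
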
Maximally Accurate and General Rules
(figure-only slide)

Extended Compact Genetic Algorithm
• A probabilistic model-building GA (Harik, 1999)
  – Builds models of good solutions as linkage groups
• Key idea:
  – Good probability distribution → linkage learning
• Key components:
  – Representation: marginal product model (MPM)
    • Marginal distribution of a gene partition
  – Quality: minimum description length (MDL)
    • Occam's razor principle
    • All things being equal, simpler models are better
  – Search method: greedy heuristic search

Marginal Product Model (MPM)
• Partition variables into disjoint sets
• Product of marginal distributions on a partition of genes
• Gene partition maps to linkage groups
• Example MPM: [1, 2, 3], [4, 5, 6], …, [l-2, l-1, l]
  – Marginal for [1, 2, 3]: {p000, p001, p00#, p010, p011, p01#, p100, p101, p10#, p110, p111, p11#, …} (27 probabilities)

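As a reading aid (the equation is not spelled out on the slide), the probability an MPM with linkage groups I_1, …, I_m assigns to an individual x factorizes over the groups:

    P(x) = \prod_{k=1}^{m} p_{I_k}(x_{I_k}),
    \quad\text{e.g.}\quad
    P(x) = p_{[1,2,3]}(x_1 x_2 x_3)\, p_{[4,5,6]}(x_4 x_5 x_6) \cdots p_{[l-2,l-1,l]}(x_{l-2} x_{l-1} x_l)
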
Minimum Description Length Metric
• Hypothesis: for an optimal model
  – Model size and error are minimal
• Model complexity, Cm
  – Number of bits required to store all marginal probabilities
• Compressed population complexity, Cp
  – Entropy of the marginal distributions over all partitions
• MDL metric: Cc = Cm + Cp

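The slide does not write the two terms out; following Harik's eCGA formulation, and adapting it to a χ-ary alphabet (the exact constants here are an editor's assumption), for a population of size n and linkage groups I_k:

    C_m = \log_2(n+1) \sum_k \bigl(\chi^{|I_k|} - 1\bigr), \qquad
    C_p = n \sum_k H(X_{I_k}) = -\,n \sum_k \sum_{s} p_{I_k}(s)\,\log_2 p_{I_k}(s), \qquad
    C_c = C_m + C_p
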
Building an Optimal MPM
1. Assume independent genes ([1], [2], …, [l])
2. Compute the MDL metric, Cc
3. Consider all combinations of two-subset merges
   – E.g., {([1,2],[3],…,[l]), ([1,3],[2],…,[l]), …, ([1],[2],…,[l-1,l])}
4. Compute the MDL metric for all candidate models
5. Select the candidate with minimum MDL
6. If the candidate's Cc is lower than the current model's, accept the merged model and go to step 2; else the current model is optimal
(A code sketch of this greedy loop follows below.)

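A minimal, self-contained Python sketch of the greedy search above, assuming a population given as equal-length strings over {0, 1, #} and the Cm/Cp terms as written after the previous slide; the function names are illustrative, not from the paper.

```python
import math
from collections import Counter
from itertools import combinations

def mdl(model, pop, chi=3):
    """MDL score Cc = Cm + Cp for a gene partition ('model') of the population."""
    n = len(pop)
    cm = cp = 0.0
    for group in model:
        cm += math.log2(n + 1) * (chi ** len(group) - 1)                    # model complexity
        counts = Counter(tuple(ind[i] for i in group) for ind in pop)
        cp -= n * sum((c / n) * math.log2(c / n) for c in counts.values())  # entropy term
    return cm + cp

def greedy_mpm(pop, chi=3):
    """Start with independent genes and keep merging pairs while Cc improves."""
    model = [(i,) for i in range(len(pop[0]))]                     # step 1: [1][2]...[l]
    best = mdl(model, pop, chi)                                    # step 2
    while len(model) > 1:
        candidates = []
        for a, b in combinations(range(len(model)), 2):            # step 3: all pair merges
            merged = [g for k, g in enumerate(model) if k not in (a, b)]
            merged.append(model[a] + model[b])
            candidates.append((mdl(merged, pop, chi), merged))     # step 4
        cand_cc, cand_model = min(candidates, key=lambda t: t[0])  # step 5
        if cand_cc < best:                                         # step 6: accept if Cc drops
            best, model = cand_cc, cand_model
        else:
            break                                                  # current model is optimal
    return model

# greedy_mpm(["01#", "10#"] * 20) groups the two correlated positions: [(2,), (0, 1)]
```
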
χeCCS Models for Different Multiplexers
• Building block size increases (figure)

Fitness Inheritance using Least Squares
• Proposed by Sastry, Lima & Goldberg (2006)
• The surrogate is a regression using basis functions identified by the BBs
• A simple example: [1,3] [2] [4]
  – The schemata represented are {0*0*, 0*1*, 0*#*, 1*0*, 1*1*, 1*#*, #*0*, #*1*, #*#*, *0**, *1**, *#**, ***0, ***1, ***#}
• Recode the individuals as indicator vectors over these schemata (sketched below)

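A minimal sketch of that recoding step, under the assumption (consistent with the schema list above) that every χ-ary schema of each linkage group gets one 0/1 indicator basis function; the helper names are illustrative.

```python
from itertools import product

ALPHABET = "01#"   # ternary rule alphabet used by χeCCS

def schemata(groups, length):
    """One schema string per (linkage group, symbol combination), e.g. '0*0*'."""
    basis = []
    for group in groups:
        for symbols in product(ALPHABET, repeat=len(group)):
            s = ["*"] * length
            for pos, sym in zip(group, symbols):
                s[pos] = sym
            basis.append("".join(s))
    return basis

def recode(individual, basis):
    """0/1 indicator vector: 1 if the individual instantiates the schema."""
    return [int(all(b == "*" or b == x for b, x in zip(schema, individual)))
            for schema in basis]

# The slide's example [1,3] [2] [4] in 0-based indexing:
groups = [(0, 2), (1,), (3,)]
basis = schemata(groups, length=4)   # 9 + 3 + 3 = 15 schemata, matching the list above
row = recode("01#1", basis)          # one row of the regression matrix A
```
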
Fitness Inheritance using Least Squares
• Recoding defines matrix A
• Normalize the fitness

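The equations on this slide are lost from the transcript; the following reconstruction is an editor's assumption consistent with the surrounding slides, not the original slide content. With one basis function ψ_j per schema and N already-evaluated rules r_1, …, r_N:

    A_{ij} = \psi_j(r_i) \in \{0, 1\}, \qquad
    \tilde f_i = \frac{f(r_i) - \bar f}{\sigma_f} \quad \text{(one possible normalization)}
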
Fitness Inheritance using Least Squares
• Solve for the weights using least squares
• Once solved, the fitness surrogate is the resulting weighted sum over the schema indicators (sketched below)

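A minimal, self-contained numerical sketch of the solve-and-predict step. The toy design matrix (two linkage groups, three schemata each), the mean/std normalization, and the use of numpy's least-squares solver are illustrative assumptions; only the overall shape — fit the weights once, then estimate fitness as the weighted sum of schema indicators — is what the slide describes.

```python
import numpy as np

# Rows of A: schema-indicator recodings of already-evaluated rules (see the
# recoding sketch above); f: their measured fitnesses. Toy values throughout.
A = np.array([
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
], dtype=float)
f = np.array([0.9, 0.4, 0.1, 0.7, 0.6])

f_mean, f_std = f.mean(), f.std()
w, *_ = np.linalg.lstsq(A, (f - f_mean) / f_std, rcond=None)   # min ||A w - f_norm||^2

def surrogate(indicators):
    """Cheap fitness estimate: weighted indicator sum, normalization undone."""
    return float(np.dot(indicators, w)) * f_std + f_mean

estimate = surrogate([1, 0, 0, 1, 0, 0])   # ~0.9 for a rule recoded like row 0
```
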
Fitness Inheritance and χeCCS
• Two different problems: the hidden XOR and the 6-input multiplexer

Hidden XOR
• Evolved rules and model (figure)
• Surrogate accuracy (figure)

6-input Multiplexer
• The evolved solution and model (figure)
• The surrogate is totally off

6-input Multiplexer
• The key: missing basis functions
• χeCCS is able to solve the problem quickly, reliably, and accurately
• However, the basis functions provided by the model are not accurate enough to build a proper surrogate

Overlapping BBs using DSMGA
• Proposed by Yu, Yassine, Goldberg and Chen (2003)
• Based on organizational theory
• Main property: the DSMGA model builder (DSMcluster) deals with overlapping building blocks
• The main issue: translating a population of rules into a dependency structure matrix (DSM)
• The intuition: the specified bits are the ones responsible for the kind of linkage we seek (see the sketch below)

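The paper's exact transformation is not reproduced in this transcript; the sketch below is one plausible reading of the intuition above, in which positions that are specified (non-'#') together in the same rules vote for a pairwise dependency. The function name and the 0.5 threshold are illustrative.

```python
import numpy as np

def rules_to_dsm(rules, threshold=0.5):
    """Binary dependency structure matrix from a population of ternary rules."""
    l = len(rules[0])
    counts = np.zeros((l, l))
    for rule in rules:
        specified = [i for i, s in enumerate(rule) if s != "#"]
        for i in specified:
            for j in specified:
                if i != j:
                    counts[i, j] += 1
    freq = counts / len(rules)              # fraction of rules co-specifying i and j
    return (freq >= threshold).astype(int)  # 1 = dependency edge, 0 = none

# Toy population shaped like 6-input multiplexer rules: the address bits i0, i1
# are always specified together with exactly one data bit.
rules = ["000###", "01#0##", "10##1#", "11###1"]
dsm = rules_to_dsm(rules)   # i0-i1 co-specified in every rule -> a strong dependency
```
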
Jumping to the Results
• DSMcluster model for the hidden XOR
  – [i0 i1 i2] [i3] [i4] [i5]
• DSMcluster model for the 6-input multiplexer
  – [i0 i1] <i2 i3 i4 i5>
  – It identifies a BB [i0 i1] of variables interacting with a bus <i2 i3 i4 i5>
  – Translated into χeCCS language: [i0 i1 i2] [i0 i1 i3] [i0 i1 i4] [i0 i1 i5]
  – The right model, which provides the right set of basis functions

Conclusions
• The matching process is crucial and expensive
• Efficient implementations can only take us so far
• Evaluation relaxation can remove the need for matching
• For some types of problems, overlapping BBs are required
• DSMGA provides the machinery to identify the proper basis functions for such a surrogate

Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning Techniques
Xavier Llorà1,2, Kumara Sastry2, Tian-Li Yu3, David E. Goldberg2
1 National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
2 Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign
3 Department of Electrical Engineering, National Taiwan University
Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199
