Using Problem-Specific Knowledge and Learning from Experience in Estimation of Distribution Algorithms

Slides from the presentation given at the Optimization by Building and Using Probabilistic Models workshop (OBUPM-2011) at ACM SIGEVO GECCO 2011.

1. Using Problem-Specific Knowledge and Learning from Experience in Estimation of Distribution Algorithms
   Martin Pelikan and Mark W. Hauschild
   Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
   University of Missouri, St. Louis, MO
   pelikan@cs.umsl.edu, mwh308@umsl.edu
   http://medal.cs.umsl.edu/
2. Motivation
   Two key questions:
   • Can we use past EDA runs to solve future problems faster? EDAs do more than solve a problem; they provide us with a lot of information about the landscape. Why throw this information away?
   • Can we use problem-specific knowledge to speed up EDAs? EDAs can adapt their exploration operators to the problem, so we do not have to know much about the problem to solve it. But why throw away prior problem-specific information when it is available?
   This presentation reviews some of the approaches that attempt to do this, focusing on two areas:
   • using prior problem-specific knowledge;
   • learning from experience (past EDA runs).
3. Outline
   1. EDA bottlenecks.
   2. Prior problem-specific knowledge.
   3. Learning from experience.
   4. Summary and conclusions.
4. Estimation of Distribution Algorithms
   Estimation of distribution algorithms (EDAs), also known as probabilistic model-building GAs:
   • work with a population of candidate solutions;
   • learn a probabilistic model of promising solutions;
   • sample the model to generate new solutions.
   [Diagram: current population → selected population → probabilistic model → new population]
   In short, EDAs replace crossover and mutation with model learning and sampling (see the sketch below).
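To make the loop concrete, here is a minimal sketch of a univariate EDA (UMDA-style) on a toy onemax objective. The objective, population size, truncation share, and generation count are illustrative assumptions, not details from the slides; more powerful EDAs such as hBOA replace the independent per-bit probabilities with a Bayesian network.

```python
import random

def onemax(x):
    """Toy objective: number of ones in a bit string."""
    return sum(x)

def umda(n=50, pop_size=100, top=0.5, generations=60):
    """Univariate EDA loop: select, learn per-bit probabilities, sample."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # 1. Select promising solutions (truncation selection).
        pop.sort(key=onemax, reverse=True)
        selected = pop[: int(top * pop_size)]
        # 2. Learn a probabilistic model: one independent probability per bit.
        p = [sum(x[i] for x in selected) / len(selected) for i in range(n)]
        # 3. Sample the model to generate the new population.
        pop = [[1 if random.random() < p[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
    return max(pop, key=onemax)

best = umda()
print(onemax(best))  # typically close to 50 on this toy problem
```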
5. Efficiency Enhancement of EDAs
   Main EDA bottlenecks:
   • evaluation;
   • model building;
   • model sampling;
   • memory complexity (models, candidate solutions).
   Efficiency enhancement techniques address one or more of these bottlenecks. They can adopt much from standard evolutionary algorithms, but EDAs provide opportunities to do more than that. Of the many approaches, we focus on a few.
6. What Comes Next?
   1. Using problem-specific knowledge.
   2. Learning from experience.
7. Problem-Specific Knowledge in EDAs
   Basic idea: we don't have to know much about the problem to use EDAs, but what if we do know something about it? Can we use prior problem-specific knowledge in EDAs?
   Bias populations:
   • inject high-quality solutions into the population (see the sketch after this slide);
   • modify solutions using a problem-specific procedure.
   Bias model building:
   • bias model structure (e.g., Bayesian network structure);
   • bias model parameters (e.g., conditional probabilities).
   Types of bias:
   • hard bias: restrict the admissible models/parameters;
   • soft bias: give some models/parameters preference over others.
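As one concrete illustration of population bias, the sketch below seeds part of a random initial population with solutions produced by a problem-specific constructor. The helper `construct_solution` is hypothetical; any greedy or repair-based domain heuristic could play its role.

```python
import random

def biased_initial_population(n, pop_size, construct_solution,
                              seed_fraction=0.1):
    """Population bias: seed part of a random initial population with
    solutions built by a problem-specific procedure.

    construct_solution(n) is a hypothetical domain-specific constructor
    (e.g., a greedy heuristic for the problem at hand).
    """
    n_seeds = int(seed_fraction * pop_size)
    seeded = [construct_solution(n) for _ in range(n_seeds)]
    randoms = [[random.randint(0, 1) for _ in range(n)]
               for _ in range(pop_size - n_seeds)]
    return seeded + randoms
```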
8. Example: Biasing Model Structure in Graph Bipartitioning
   Graph bipartitioning:
   • Input: a graph G = (V, E), where V are the nodes and E are the edges.
   • Task: split V into two equally sized subsets so that the number of edges between the subsets is minimized.
9. Example: Biasing Model Structure in Graph Bipartitioning
   Biasing models in graph bipartitioning:
   • Soft bias (Schwarz & Ocenasek, 2000): increase the prior probability of models whose dependencies are included in E, and decrease the prior probability of models with dependencies not included in E.
   • Hard bias (Mühlenbein & Mahnig, 2002): strictly disallow model dependencies that disagree with the edges in E.
   In both cases, EDA performance improved substantially. A sketch of both variants follows.
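A minimal sketch of how both variants could hook into model building, assuming a greedy structure search that scores candidate dependencies one at a time. The `base_score` function, the bonus, and the penalty are illustrative assumptions, not the exact schemes used in the cited papers.

```python
import math

def biased_dependency_score(i, j, graph_edges, base_score,
                            hard=False, in_bonus=2.0, out_penalty=0.5):
    """Score a candidate model dependency (i, j), biased by the input graph.

    graph_edges: set of frozenset({u, v}) pairs for the edges of G = (V, E).
    base_score:  the unbiased model-scoring function (assumed given).
    """
    in_graph = frozenset((i, j)) in graph_edges
    if hard:
        # Hard bias: strictly disallow dependencies that disagree with E.
        return base_score(i, j) if in_graph else float("-inf")
    # Soft bias: shift the log-score by a prior that prefers edges of E.
    return base_score(i, j) + math.log(in_bonus if in_graph else out_penalty)
```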
10. Important Challenges
   Challenges in the use of prior knowledge in EDAs:
   • Parameter bias using prior probabilities has not been explored much.
   • Structural bias has been introduced only rarely.
   • Model bias is often studied only superficially.
   • Theory is missing.
11. Learning from Experience
   Basic idea: when solving many instances of the same problem class, can we learn from past EDA runs to solve future instances of this problem type faster? This is similar to the use of prior knowledge, but here we automate the discovery of problem properties instead of relying on expert knowledge.
   What features to learn?
   • Model structure.
   • Promising candidate solutions or partial solutions.
   • Algorithm parameters.
   How to use the learned features?
   • Modify/restrict algorithm parameters.
   • Bias populations.
   • Bias models.
12. Example: Probability Coincidence Matrix
   Probability coincidence matrix (PCM); Hauschild, Pelikan, Sastry, Goldberg (2008):
   • Each model may contain a dependency between X_i and X_j.
   • The PCM stores the observed probabilities of these dependencies: PCM = {p_ij} with i, j ∈ {1, 2, ..., n}, where p_ij is the proportion of past models with a dependency between X_i and X_j.
   [The slide shows an example PCM.]
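A minimal sketch of how a PCM could be accumulated from the models of past runs, assuming each model is recorded as a list of dependency pairs (an assumed representation, not the authors' data format):

```python
def build_pcm(models, n):
    """Build a PCM: pcm[i][j] = proportion of past models with a
    dependency between variables i and j (0-based indices).

    models: list of models, each given as a list of (i, j) pairs.
    """
    pcm = [[0.0] * n for _ in range(n)]
    for model in models:
        for i, j in model:
            pcm[i][j] += 1
            pcm[j][i] += 1  # treat dependencies as symmetric
    return [[count / len(models) for count in row] for row in pcm]
```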
13. Example: Probability Coincidence Matrix
   Using the PCM for hard bias (Hauschild et al., 2008):
   • Set a threshold for the minimum proportion of a dependency.
   • Only accept dependencies occurring at least that often; strictly disallow all others (see the sketch below).
   Using the PCM for soft bias (Hauschild & Pelikan, 2009):
   • Introduce a prior probability over model structures so that dependencies that were more likely in the past are given preference.
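Under the same assumed representation, the hard bias reduces to thresholding the PCM; the threshold plays the role of the p_min values reported in the results that follow.

```python
def allowed_dependencies(pcm, p_min):
    """Hard bias: keep only dependencies observed in at least a
    proportion p_min of past models."""
    n = len(pcm)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if pcm[i][j] >= p_min}

# During model building, a candidate dependency (i, j) is rejected
# whenever it is not in this set; the soft variant would instead use
# pcm[i][j] inside a prior over model structures.
```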
14. Results: PCM for 32 × 32 2D Spin Glass
   [Figure: execution-time speedup (roughly up to 5×) plotted against the minimum edge percentage allowed; panels (b) 24×24 and (d) 32×32.]
   (Hauschild, Pelikan, Sastry, Goldberg; 2008)
15. Results: PCM for 2D Spin Glass

   Size             Execution-time speedup   p_min   % Total Dep.
   256  (16 × 16)   3.89                     0.020   6.4%
   324  (18 × 18)   4.37                     0.011   8.7%
   400  (20 × 20)   4.34                     0.020   7.0%
   484  (22 × 22)   4.61                     0.010   6.3%
   576  (24 × 24)   4.63                     0.013   4.6%
   676  (26 × 26)   4.62                     0.011   4.7%
   784  (28 × 28)   4.45                     0.009   5.4%
   900  (30 × 30)   4.93                     0.005   8.1%
   1024 (32 × 32)   4.14                     0.007   5.5%

   Table 2: Optimal speedup and the corresponding PCM threshold p_min, as well as the percentage of total possible dependencies that were considered, for the 2D Ising spin glass. (Hauschild, Pelikan, Sastry, Goldberg; 2008)

   Note: setting the maximum distance of dependencies remains a challenge. If the distances are restricted too severely, the bias on model building may be too strong to allow sufficiently complex models (this was also supported by results in Hauschild, Pelikan, Lima, and Sastry, 2007). On the other hand, if the distances are not restricted sufficiently, the benefits of this approach may be negligible.
16. Example: Distance Restrictions
   PCM limitations:
   • It can only be applied when variables have a fixed "function", i.e., when dependencies between specific variables are consistently more or less likely across many problem instances.
   • The concept is difficult to scale with the number of variables.
   Distance restrictions (Hauschild, Pelikan, Sastry, Goldberg, 2008):
   • Introduce a distance metric over problem variables such that variables at shorter distances are more likely to interact.
   • Gather statistics of dependencies at particular distances.
   • Decide on a distance threshold to disallow some dependencies (see the sketch below), or use the distances to provide soft bias via prior distributions.
   Distance metrics are often straightforward to define, especially for additively decomposable problems.
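A minimal sketch of the "gather statistics, then choose a threshold" step: collect the distances of dependencies seen in past models and keep the smallest cutoff covering a chosen share of them. The coverage target and data layout are illustrative assumptions; as slide 21 notes, threshold selection in the cited work remained manual, so this automated rule is only one possible heuristic.

```python
from collections import Counter

def distance_cutoff(models, dist, coverage=0.9):
    """Choose the smallest distance threshold that covers a given share
    of the dependencies observed in past models.

    models:   list of models, each a list of (i, j) dependency pairs.
    dist:     dist[i][j] = distance between variables i and j.
    coverage: share of past dependencies the cutoff should retain
              (0.9 is an illustrative choice, not from the slides).
    """
    counts = Counter(dist[i][j] for model in models for i, j in model)
    total = sum(counts.values())
    covered = 0
    for d in sorted(counts):
        covered += counts[d]
        if covered / total >= coverage:
            return d  # dependencies at distances > d would be disallowed
    return max(counts)
```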
17. Example: Distance Restrictions for Graph Bipartitioning
   Example for graph bipartitioning, given a graph G = (V, E):
   • Assign weight 1 to all edges in E.
   • The distance between two vertices is the length of the shortest path between them.
   • Unconnected vertices are given distance |V|.
   A sketch of this metric follows.
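A minimal sketch of this metric using breadth-first search, with vertices numbered 0 to n-1 (an assumed encoding):

```python
from collections import deque

def graph_distances(n, edges):
    """All-pairs shortest-path distances over unit-weight edges (BFS);
    unconnected vertex pairs get distance n = |V|, as on the slide.

    edges: iterable of (u, v) pairs with vertices numbered 0..n-1.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[n] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == n:  # not visited yet
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist
```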
18. Example: Distance Restrictions for ADFs
   Distance metric for an additively decomposable function (ADF):

   f(X_1, ..., X_n) = Σ_{i=1}^{m} f_i(S_i)

   where f_i is the i-th subfunction and S_i is a subset of the variables {X_1, ..., X_n}.
   • Connect variables that appear in the same subset S_i for some i.
   • The distance between two variables is the shortest path between them (if connected).
   • The distance is n if no path exists.
   The sketch below reduces this to the graph-distance computation above.
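Since the ADF metric reduces to shortest paths in the interaction graph induced by the subsets, a short sketch can reuse `graph_distances` from the previous example:

```python
from itertools import combinations

def adf_distances(n, subsets):
    """Distance metric for an ADF: connect variables sharing a subset
    S_i, then measure shortest paths (distance n if no path exists).

    subsets: list of iterables of variable indices, one per subfunction.
    Reuses graph_distances from the previous sketch.
    """
    edges = {frozenset(pair) for s in subsets for pair in combinations(s, 2)}
    return graph_distances(n, [tuple(e) for e in edges])
```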
19. Results: Distance Restrictions on 28 × 28 2D Spin Glass
   [Figure: execution-time speedup (roughly up to 6×) plotted against the ratio of total dependencies allowed, with arrows marking the maximum allowed distances; panels (b) 20 × 20 and (d) 28 × 28.]
   (Hauschild, Pelikan; 2009)
20. Results: Distance Restrictions on 2D Spin Glass
   Biasing models in hBOA using prior knowledge:

   Size            Execution-time speedup   Max Dist Allowed   q_min   % Total Dep.
   256 (16 × 16)   4.2901                   2                  0.62    4.7%
   400 (20 × 20)   4.9288                   3                  0.64    6.0%
   576 (24 × 24)   5.2156                   3                  0.60    4.1%
   784 (28 × 28)   4.9007                   5                  0.63    7.6%

   Table 3: Distance-cutoff runs with their best speedups by distance, as well as the percentage of total possible dependencies that were considered, for the 2D Ising spin glass. (Hauschild, Pelikan; 2009)

   Note: for each instance, experiments were run with dependencies restricted by a maximum distance, which was varied from 1 to the maximum distance found between any two propositions (for example, for p = 2^-4, experiments used maximum distances from 1 to 9). For some instances with p = 1, the maximum distance was 500, indicating that there was no path between some pairs of propositions. On the tested problems, small distance restrictions (only distance 1 or 2) were sometimes too restrictive, and some instances would not be solved even with extremely large population sizes (N = 512000); in these cases the results were omitted (such restrictions were not used).
21. Important Challenges
   Challenges in learning from experience:
   • The process of selecting the threshold is manual and difficult.
   • The ideas must be applied and tested on more problem types.
   • Theory is missing.
22. Another Related Idea: Model-Directed Hybridization
   Model-directed hybridization:
   • EDA models reveal a lot about the problem landscape.
   • Use this information to design advanced neighborhood structures (operators).
   • Use this information to design problem-specific operators.
   There have been many successes, and much work remains to be done.
23. Conclusions and Future Work
   Conclusions:
   • EDAs do a lot more than just solve the problem: they give us a lot of information about it.
   • EDAs allow the use of prior knowledge in various forms.
   • Yet most EDA researchers focus on the design of new EDAs, and only a few look at the use of EDAs beyond solving an isolated problem instance.
   Future work: some of the key challenges were mentioned throughout the talk. If you are interested in collaboration, talk to us.
24. Acknowledgments
   • NSF; NSF CAREER grant ECS-0547013.
   • University of Missouri: High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.
