Learning for Optimization: EDAs, probabilistic modelling, or ...

Transcript

  • 1. Explicit Modelling in Metaheuristic Optimization. Dr Marcus Gallagher, School of Information Technology and Electrical Engineering, University of Queensland, Q. 4072. marcusg@itee.uq.edu.au
  • 2. Talk outline:
      - Optimization, heuristics and metaheuristics.
      - “Estimation of Distribution” (optimization) algorithms (EDAs): a brief overview.
      - A framework for describing EDAs.
      - Other modelling approaches in metaheuristics.
      - Summary.
    Marcus Gallagher - MASCOS Symposium, 26/11/04
  • 3. “Hard” Optimization Problems. Goal: find $x^* \in S$ such that $f(x^*) \le f(x)$ for all $x \in S$, where S is often multi-dimensional and either real-valued or binary: $S \subseteq \mathbb{R}^n$ or $S \subseteq \{0,1\}^n$.
      - Many classes of optimization problems (and algorithms) exist.
      - When might it be worthwhile to consider metaheuristic or machine learning approaches?
  • 4. Situations where metaheuristic or machine learning approaches may be worth considering:
      - Finding an “exact” solution is intractable.
      - Knowledge of f() is limited: no derivative information; f() may be discontinuous, noisy, …
      - Evaluating f() is expensive in terms of time or cost.
      - f() is known or suspected to contain nasty features: many local minima, plateaus, ravines.
      - The search space is high-dimensional.
  • 5. What is the “practical” goal of (global) optimization?
      - “There exists a goal (e.g. to find as small a value of f() as possible), there exist resources (e.g. some number of trials), and the problem is how to use these resources in an optimal way.”
      - A. Torn and A. Zilinskas. Global Optimisation. Lecture Notes in Computer Science, Vol. 350. Springer-Verlag, 1989.
  • 6. Heuristics. Heuristic (or approximate) algorithms aim to find a good solution to a problem in a reasonable amount of computation time, but with no guarantee of “goodness” or “efficiency” (cf. exact or complete algorithms). Broad classes of heuristics:
      - Constructive methods.
      - Local search methods.
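As a concrete instance of the local search class, here is a minimal sketch of a first-improvement bit-flip hill climber. It assumes binary strings and an f to be maximized (equivalently, minimizing -f); the names `hill_climb` and `onemax` (a toy objective counting 1 bits) are hypothetical illustrations, not taken from the talk.

```python
import random

def onemax(x):
    # Hypothetical test objective: number of 1 bits (to be maximized).
    return sum(x)

def hill_climb(f, n_bits=20, max_evals=2000, seed=0):
    """First-improvement bit-flip local search; stops at a local optimum or the budget."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    fx, evals = f(x), 1
    improved = True
    while improved and evals < max_evals:
        improved = False
        # Try the single-bit-flip neighbours in random order; move to the first better one.
        for i in rng.sample(range(n_bits), n_bits):
            y = x[:]
            y[i] = 1 - y[i]
            fy = f(y)
            evals += 1
            if fy > fx:
                x, fx, improved = y, fy, True
                break
    return x, fx

x, fx = hill_climb(onemax)
```

On OneMax every local optimum is global, so the climber ends at the all-ones string; on objectives with the “nasty features” of slide 4 it would stop at whichever local optimum it reaches first.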
  • 7. Metaheuristics. Metaheuristics are (roughly) high-level strategies that combine lower-level techniques for exploration and exploitation of the search space.
      - An overarching term for algorithms including Evolutionary Algorithms, Simulated Annealing, Tabu Search, Ant Colony, Particle Swarm, Cross-Entropy, …
      - C. Blum and A. Roli. Metaheuristics in Combinatorial Optimization: Overview and Conceptual Comparison. ACM Computing Surveys, 35(3), 2003, pp. 268-308.
  • 8. Learning/Modelling for Optimization.
      - Most optimization algorithms make some (explicit or implicit) assumptions about the nature of f().
      - Many algorithms vary their behaviour during execution (e.g. simulated annealing).
      - In some optimization algorithms the search is adaptive: the future search points evaluated depend on previously searched points (and/or their f() values, derivatives of f(), etc.).
      - Learning/modelling can be implicit (e.g. adapting the step size in gradient descent, or the population in an EA)…
      - …or explicit; examples from the optimization literature: the Nelder-Mead simplex algorithm; response surfaces (metamodelling, surrogate functions).
  • 9. EDAs: Probabilistic Modelling for Optimization.
      - Based on the use of (unsupervised) density estimators/generative statistical models.
      - The idea is to convert the optimization problem into a search over probability distributions.
      - P. Larranaga and J. A. Lozano (eds.). Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, 2002.
      - The probabilistic model is, in some sense, an explicit model of (currently) promising regions of the search space.
  • 10. EDAs: toy example [figure not included in the transcript]
  • 11. EDAs: toy example [figure not included in the transcript]
  • 12. GAs and EDAs compared: GA pseudocode
      1. Initialize the population, X(t);
      2. Evaluate the objective function for each point;
      3. Selection();
      4. Crossover();
      5. Mutation();
      6. Form new population X(t+1);
      7. While !(terminate()) Goto 2;
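A minimal runnable version of the GA pseudocode on slide 12, for illustration only. It assumes bit-string individuals, binary tournament selection, one-point crossover and bit-flip mutation, and it maximizes f; these operator choices, the parameter values and the names `genetic_algorithm` and `onemax` are assumptions, not details from the talk.

```python
import random

def onemax(x):
    # Hypothetical test objective: number of 1 bits (to be maximized).
    return sum(x)

def genetic_algorithm(f, n_bits=20, pop_size=30, generations=50, p_mut=None, seed=0):
    rng = random.Random(seed)
    if p_mut is None:
        p_mut = 1.0 / n_bits                  # about one flipped bit per child
    # 1. Initialize the population X(t) with random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=f)
    for _ in range(generations):              # 7. repeat until terminate()
        # 2. Evaluate the objective function for each point.
        fitness = [f(x) for x in pop]

        # 3. Selection: binary tournament on the evaluated population.
        def select():
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if fitness[i] >= fitness[j] else pop[j]

        new_pop = []
        while len(new_pop) < pop_size:
            a, b = select(), select()
            # 4. Crossover: one-point.
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            # 5. Mutation: flip each bit independently with probability p_mut.
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            new_pop.append(child)
        # 6. Form the new population X(t+1); keep track of the best point seen so far.
        pop = new_pop
        best = max(pop + [best], key=f)
    return best

best = genetic_algorithm(onemax)
```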
  • 13. GAs and EDAs compared: EDA pseudocode
      1. Initialize a probability model, Q(x);
      2. Create a population of points by sampling from Q(x);
      3. Evaluate the objective function for each point;
      4. Update Q(x) using the selected population and f() values;
      5. While !(terminate()) Goto 2;
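A sketch of the EDA loop on slide 13, under the assumption that Q(x) is a product of independent Bernoulli marginals re-estimated from the `n_select` best points each generation (a UMDA-style model). That model choice, the clamping of probabilities, the parameter values and the function names are illustrative assumptions.

```python
import random

def onemax(x):
    # Hypothetical test objective: number of 1 bits (to be maximized).
    return sum(x)

def eda(f, n_bits=20, pop_size=50, n_select=15, generations=40, seed=0):
    rng = random.Random(seed)
    # 1. Initialize the model Q(x): independent Pr(x_i = 1) = 0.5 for every bit.
    q = [0.5] * n_bits
    best = None
    for _ in range(generations):              # 5. repeat until terminate()
        # 2. Create a population by sampling from Q(x).
        pop = [[1 if rng.random() < qi else 0 for qi in q] for _ in range(pop_size)]
        # 3. Evaluate f for each point (sorted best-first).
        pop.sort(key=f, reverse=True)
        if best is None or f(pop[0]) > f(best):
            best = pop[0]
        # 4. Update Q(x) from the selected points: per-bit frequency of ones.
        selected = pop[:n_select]
        q = [sum(x[i] for x in selected) / n_select for i in range(n_bits)]
        # Keep probabilities away from 0 and 1 so no bit value becomes unreachable.
        q = [min(max(qi, 1.0 / n_bits), 1.0 - 1.0 / n_bits) for qi in q]
    return q, best

q, best = eda(onemax)
```

Unlike the GA of slide 12, no population is carried from one generation to the next: everything the search has learned is held in `q`.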
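  • 14. EDA Example 1: Population-based Incremental Learning (PBIL).
      - S. Baluja and R. Caruana. Removing the Genetics from the Standard Genetic Algorithm. ICML’95.
      - The model is a vector of independent probabilities: $p_1 = \Pr(x_1 = 1),\ p_2 = \Pr(x_2 = 1),\ \ldots,\ p_n = \Pr(x_n = 1)$.
      - Update rule: $p_i \leftarrow (1 - \alpha)\, p_i + \alpha\, x_i^b$, where $x^b$ is the best point in the current population and $\alpha$ is the learning rate.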
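A minimal sketch of PBIL using only the update rule on slide 14, assuming bit strings, a maximized objective, and illustrative values for the population size and learning rate `alpha`; the function names are hypothetical.

```python
import random

def onemax(x):
    # Hypothetical test objective: number of 1 bits (to be maximized).
    return sum(x)

def pbil(f, n_bits=20, pop_size=30, alpha=0.1, generations=100, seed=0):
    rng = random.Random(seed)
    p = [0.5] * n_bits                        # p_i = Pr(x_i = 1)
    best = None
    for _ in range(generations):
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        xb = max(pop, key=f)                  # best point x^b of this generation
        if best is None or f(xb) > f(best):
            best = xb
        # p_i <- (1 - alpha) * p_i + alpha * x_i^b
        p = [(1 - alpha) * pi + alpha * xi for pi, xi in zip(p, xb)]
    return p, best

p, best = pbil(onemax)
```

Each update is a convex combination of values in [0, 1], so every `p_i` stays a valid probability without any clamping.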
  • 15. EDA Example 2: Mutual Information Maximization for Input Clustering (MIMIC).
      - J. De Bonet, C. Isbell and P. Viola. MIMIC: Finding optima by estimating probability densities. Advances in Neural Information Processing Systems, vol. 9, 1997.
      - Model: $p(x) = p(x_{i_1} \mid x_{i_2})\, p(x_{i_2} \mid x_{i_3}) \cdots p(x_{i_{n-1}} \mid x_{i_n})\, p(x_{i_n})$, where $i_1, \ldots, i_n$ is a permutation of the variable indices.
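MIMIC fits the chain greedily: the variable with the lowest empirical entropy is placed last ($i_n$), and each earlier position takes the unused variable with the lowest conditional entropy given the variable just placed. The sketch below shows that fit plus sampling from the chain, assuming binary variables and a list `data` of already-selected points; all helper names are hypothetical.

```python
import math
import random

def entropy(counts):
    """Empirical entropy (nats) of a distribution given by counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)

def marginal_entropy(data, i):
    ones = sum(x[i] for x in data)
    return entropy([ones, len(data) - ones])

def cond_entropy(data, j, k):
    """Empirical H(X_j | X_k) for binary variables j and k."""
    h = 0.0
    for v in (0, 1):
        rows = [x for x in data if x[k] == v]
        if rows:
            ones = sum(x[j] for x in rows)
            h += len(rows) / len(data) * entropy([ones, len(rows) - ones])
    return h

def fit_chain(data):
    """Greedy MIMIC ordering, returned as [i_n, i_{n-1}, ..., i_1]."""
    unused = set(range(len(data[0])))
    order = [min(unused, key=lambda i: marginal_entropy(data, i))]
    unused.remove(order[0])
    while unused:
        nxt = min(unused, key=lambda j: cond_entropy(data, j, order[-1]))
        order.append(nxt)
        unused.remove(nxt)
    return order

def sample_chain(data, order, rng):
    """Draw x_{i_n} from p(x_{i_n}), then each x_{i_k} from p(x_{i_k} | x_{i_{k+1}})."""
    x = [0] * len(data[0])
    root = order[0]
    x[root] = 1 if rng.random() < sum(r[root] for r in data) / len(data) else 0
    for prev, cur in zip(order, order[1:]):
        rows = [r for r in data if r[prev] == x[prev]]
        p1 = sum(r[cur] for r in rows) / len(rows) if rows else 0.5
        x[cur] = 1 if rng.random() < p1 else 0
    return x

# Hypothetical selected points: bit 2 is constant and bit 1 always copies bit 0.
data = [[0, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 1]]
order = fit_chain(data)
rng = random.Random(0)
samples = [sample_chain(data, order, rng) for _ in range(50)]
```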
  • 16. EDA Example 3: Combining Optimizers with Mutual Information Trees (COMIT).
      - S. Baluja and S. Davies. Using optimal dependency-trees for combinatorial optimization: learning the structure of the search space. Proc. ICML’97.
      - Uses a tree-structured graphical model.
      - The model can be constructed in $O(n^2)$ time using a variant of the minimum spanning tree algorithm.
      - The model is optimal, given the restrictions, in the sense that the Kullback-Leibler divergence between the model and a full joint distribution is minimized.
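A tree of this kind can be built Chow-Liu style: weight each pair of variables by its empirical mutual information and keep a maximum-weight spanning tree (the “variant” of the spanning tree algorithm mentioned above). This sketch assumes binary variables, uses Prim's algorithm rooted at variable 0, covers only the model-building step of COMIT, and uses hypothetical function names.

```python
import math

def mutual_information(data, i, j):
    """Empirical mutual information (nats) between binary variables i and j."""
    n = len(data)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for x in data if x[i] == a and x[j] == b) / n
            if p_ab > 0:
                p_a = sum(1 for x in data if x[i] == a) / n
                p_b = sum(1 for x in data if x[j] == b) / n
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(data):
    """Maximum-weight spanning tree on pairwise mutual information (Prim's algorithm).
    Returns (parent, child) edges of a tree rooted at variable 0."""
    n_vars = len(data[0])
    w = [[mutual_information(data, i, j) if i != j else 0.0 for j in range(n_vars)]
         for i in range(n_vars)]
    # For each variable not yet in the tree: (best weight to the tree, tree node giving it).
    frontier = {j: (w[0][j], 0) for j in range(1, n_vars)}
    edges = []
    while frontier:
        child = max(frontier, key=lambda j: frontier[j][0])
        _, parent = frontier.pop(child)
        edges.append((parent, child))
        for j in frontier:
            if w[child][j] > frontier[j][0]:
                frontier[j] = (w[child][j], child)
    return edges

# Hypothetical selected points: x1 copies x0, x2 is its complement, x3 is independent of x0.
data = [[0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 1]]
edges = chow_liu_tree(data)
```

The $n(n-1)/2$ pairwise weights dominate the cost, matching the $O(n^2)$ dependence on the number of variables (each weight also scans the data once).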
  • 17. EDA Example 4: Bayesian Optimization Algorithm (BOA).
      - M. Pelikan, D. Goldberg and E. Cantu-Paz. BOA: The Bayesian optimization algorithm. In Proc. GECCO’99.
      - Bayesian network model in which each node can have at most k parents.
      - The network structure is found by greedy search guided by the Bayesian Dirichlet equivalence metric.
  • 18. Further work on EDAs. EDAs have also been developed:
      - for problems with continuous and mixed variables;
      - that use mixture models and kernel estimators, allowing multi-modal distributions to be modelled;
      - …and more!
  • 19. A framework to describe building and adapting a probabilistic model for optimization.
      - See: M. Gallagher and M. Frean. Population-Based Continuous Optimization, Probabilistic Modelling and Mean Shift. To appear, Evolutionary Computation, 2005.
      - Consider a continuous EDA with model $Q(x) = \prod_{i=1}^{n} Q_i(x_i)$.
      - Consider a Boltzmann distribution over f(x), with normalizing constant Z and temperature T: $P(x) = \frac{1}{Z} \exp\left(-\frac{f(x)}{T}\right)$.
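The effect of the temperature T in the Boltzmann distribution can be checked numerically on a finite set of points. This sketch assumes a hypothetical 1-D objective sampled at seven points, with global minima at x = -2 and x = 2 and a local minimum at x = 0. Subtracting min f before exponentiating does not change P(x), because the same factor appears in Z.

```python
import math

def boltzmann(f_values, T):
    """P(x) = exp(-f(x)/T) / Z over a finite set of points (minimization convention)."""
    f_min = min(f_values)                     # shift for numerical stability; cancels in Z
    weights = [math.exp(-(fv - f_min) / T) for fv in f_values]
    Z = sum(weights)
    return [w / Z for w in weights]

# Hypothetical objective values at x = -3, -2, ..., 3.
xs = [-3, -2, -1, 0, 1, 2, 3]
fs = [1.0, 0.0, 1.0, 0.5, 1.0, 0.0, 1.0]
for T in (10.0, 1.0, 0.1, 0.01):
    probs = boltzmann(fs, T)
    print(f"T={T}: " + ", ".join(f"P({x})={p:.3f}" for x, p in zip(xs, probs)))
```

At high T the distribution is nearly flat; as T shrinks, the mass moves onto the two global minima and the local minimum at x = 0 loses its share.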
  • 20. As T → 0, P(x) tends towards a set of impulse spikes over the global optima. We now have a probability distribution of known form, Q(x), and we would like to modify it to be close to P(x).
      - KL divergence: $K = \int_x Q(x) \log \frac{Q(x)}{P(x)}\, dx$.
      - Let Q(x) be a Gaussian with mean μ and variance v; try to minimize K by gradient descent with respect to the mean parameter μ of Q(x).
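  • 21. The gradient becomes
      - $\frac{\partial Q}{\partial \mu} = Q(x)\, \frac{x - \mu}{v}$, giving $\frac{\partial K}{\partial \mu} = \frac{1}{vT} \int_x Q(x)\, (x - \mu)\, f(x)\, dx$.
      - An approximation to the integral is to use a sample S of points $x_i$ drawn from Q(x): $\frac{\partial K}{\partial \mu} \approx \frac{1}{|S|\, vT} \sum_{x_i \in S} (x_i - \mu)\, f(x_i)$.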
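The sample approximation of the gradient can be checked against a case with a closed form. For $f(x) = x^2$ and $Q = N(\mu, v)$, writing $x = \mu + s$ gives $E_Q[(x - \mu)x^2] = 2\mu v$ (the odd moments of s vanish), so $\partial K / \partial \mu = 2\mu / T$ exactly. The sketch assumes a 1-D Gaussian Q; the function name `kl_gradient_mu` is hypothetical.

```python
import math
import random

def kl_gradient_mu(f, mu, v, T, n_samples, rng):
    """Monte Carlo estimate of dK/dmu = (1/(vT)) * E_Q[(x - mu) f(x)] for Q = N(mu, v)."""
    sd = math.sqrt(v)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, sd)                 # x_i drawn from Q(x)
        total += (x - mu) * f(x)
    return total / (n_samples * v * T)

# Closed form for f(x) = x**2: E_Q[(x - mu) x^2] = 2 mu v, so dK/dmu = 2 mu / T.
mu, v, T = 1.5, 1.0, 1.0
estimate = kl_gradient_mu(lambda x: x * x, mu, v, T, 100_000, random.Random(1))
exact = 2 * mu / T
print(f"estimate={estimate:.3f}, exact={exact:.3f}")
```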
  • 22. The algorithm update rule is then $\mu \leftarrow \mu - \frac{\alpha}{|S|} \sum_{x_i \in S} (x_i - \mu)\, f(x_i)$, where the step size α absorbs the constant 1/(vT).
      - Similar ideas can be found in: A. Berny. Statistical Machine Learning and Combinatorial Optimization. In L. Kallel et al. (eds.), Theoretical Aspects of Evolutionary Computation, pp. 287-306. Springer, 2001.
      - M. Toussaint. On the evolution of phenotypic exploration distributions. In C. Cotta et al. (eds.), Foundations of Genetic Algorithms (FOGA VII), pp. 169-182. Morgan Kaufmann, 2003.
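Running the mean update in a loop gives a minimal continuous EDA. This sketch assumes a 1-D Gaussian with fixed variance v (only the mean adapts), a fresh sample S each iteration, and a hypothetical objective $f(x) = (x - 3)^2$, for which the expected step is $2\alpha v(\mu - 3)$, so μ contracts towards 3. The function name and parameter values are illustrative.

```python
import math
import random

def kl_mean_update(f, mu0, v=1.0, alpha=0.1, n_samples=100, iterations=200, seed=0):
    """Adapt the mean of Q = N(mu, v) with mu <- mu - (alpha/|S|) * sum_i (x_i - mu) f(x_i).
    The variance v stays fixed (a simplifying assumption of this sketch)."""
    rng = random.Random(seed)
    sd = math.sqrt(v)
    mu = mu0
    for _ in range(iterations):
        sample = [rng.gauss(mu, sd) for _ in range(n_samples)]      # S drawn from Q(x)
        step = sum((x - mu) * f(x) for x in sample) / n_samples     # no selection: f used directly
        mu -= alpha * step
    return mu

# Hypothetical objective with its minimum at x = 3.
mu = kl_mean_update(lambda x: (x - 3.0) ** 2, mu0=0.0)
print(f"final mean: {mu:.3f}")
```

Because $E_Q[x - \mu] = 0$, adding a constant to f leaves the expected update unchanged but increases the variance of the sampled step, so unlike rank-based selection this rule is sensitive to how f is offset and scaled.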
  • 23. Some insights.
      - The derived update rule is closely related to those found in Evolution Strategies and in a version of PBIL for continuous spaces.
      - These existing algorithms can be viewed as approximately performing KL minimization.
      - The objective function appears explicitly in this update rule (there is no selection step).
  • 24. Other Research in Learning/Modelling for Optimization.
      - J. A. Boyan and A. W. Moore. Learning Evaluation Functions to Improve Optimization by Local Search. Journal of Machine Learning Research 1:2, 2000.
      - B. Anderson, A. Moore and D. Cohn. A Nonparametric Approach to Noisy and Costly Optimization. International Conference on Machine Learning, 2000.
      - D. R. Jones. A Taxonomy of Global Optimization Methods Based on Response Surfaces. Journal of Global Optimization 21(4):345-383, 2001.
      - Reinforcement learning:
          - R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.
          - V. V. Miagkikh and W. F. Punch III. An Approach to Solving Combinatorial Optimization Problems Using a Population of Reinforcement Learning Agents. Genetic and Evolutionary Computation Conference (GECCO-99), pp. 1358-1365, 1999.
  • 25. Summary. The field of metaheuristics (including Evolutionary Computation) has:
      - produced a large variety of optimization algorithms;
      - demonstrated good performance on a range of real-world problems.
    Metaheuristics are considerably more general:
      - they can even be applied when there isn’t a “true” objective function (coevolution);
      - they can evolve non-numerical objects.
  • 26. Summary. EDAs take an explicit modelling approach to optimization.
      - Existing statistical models and model-fitting algorithms can be employed.
      - Potential for solving challenging problems.
      - The model can be more easily visualized/interpreted than a dynamic population in a conventional EA.
    Although the field is highly active, it is still relatively immature:
      - Improve the quality of experimental results.
      - Make sure research goals are well-defined.
      - Lots of preliminary ideas, but a lack of comparative/follow-up research.
      - Difficult to keep up with the literature and see connections with other fields.
  • 27. The End! Questions?