Explicit Modelling in
Dr Marcus Gallagher
School of Information Technology and Electrical Engineering
University of Queensland Q. 4072
Optimization, heuristics and metaheuristics.
“Estimation of Distribution” (optimization)
algorithms (EDAs): a brief overview.
A framework for describing EDAs.
Other modelling approaches in
Marcus Gallagher - MASCOS Symposium, 26/11/04 2
“Hard” Optimization Problems
Find $x^* \in S$ such that $f(x^*) \le f(x), \; \forall x \in S$,
where S is often multi-dimensional; real-valued or binary:
$S \subseteq \mathbb{R}^n$ or $S \subseteq \{0,1\}^n$
Many classes of optimization problems (and
When might it be worthwhile to consider metaheuristic
or machine learning approaches?
Finding an “exact” solution is intractable.
Limited knowledge of f()
No derivative information.
May be discontinuous, noisy,…
Evaluating f() is expensive in terms of time
f() is known or suspected to contain nasty features:
Many local minima, plateaus, ravines.
The search space is high-dimensional.
What is the “practical” goal of (global) optimization?
“There exists a goal (e.g. to find as small a
value of f() as possible), there exist resources
(e.g. some number of trials), and the problem
is how to use these resources in an optimal way.”
A. Torn and A. Zilinskas, Global Optimisation. Springer-
Verlag, 1989. Lecture Notes in Computer Science, Vol.
Heuristic (or approximate) algorithms aim
to find a good solution to a problem in a
reasonable amount of computation time –
but with no guarantee of “goodness” or
“efficiency” (cf. exact or complete algorithms).
Broad classes of heuristics:
Local search methods
Metaheuristics are (roughly) high-level strategies
that combine lower-level techniques for
exploration and exploitation of the search space.
An overarching term for algorithms including
Evolutionary Algorithms, Simulated Annealing, Tabu
Search, Ant Colony, Particle Swarm and Cross-Entropy methods.
C. Blum and A. Roli. Metaheuristics in Combinatorial
Optimization: Overview and Conceptual Comparison. ACM
Computing Surveys, 35(3), 2003, pp. 268-308.
Learning/Modelling for Optimization
Most optimization algorithms make some (explicit or
implicit) assumptions about the nature of f().
Many algorithms vary their behaviour during execution
(e.g. simulated annealing).
In some optimization algorithms the search is adaptive:
Future search points evaluated depend on previous points
searched (and/or their f() values, derivatives of f() etc).
Learning/modelling can be implicit (e.g. adapting the
step-size in gradient descent, population in an EA).
…or explicit; examples from optimization literature:
Nelder-Mead simplex algorithm.
Response surfaces (metamodelling, surrogate function).
EDAs: Probabilistic Modelling for Optimization
Based on the use of (unsupervised) density
estimators/generative statistical models.
The idea is to convert the optimization problem into a
search over probability distributions.
P. Larrañaga and J. A. Lozano (eds.). Estimation of Distribution
Algorithms: a new tool for evolutionary computation. Kluwer
Academic Publishers, 2002.
The probabilistic model is in some sense an
explicit model of (currently) promising regions of
the search space.
EDAs: toy example
GAs and EDAs compared
1. Initialize the population, X(t);
2. Evaluate the objective function for each point in X(t);
6. Form new population X(t+1);
7. While !(terminate()) Goto 2;
GAs and EDAs compared
1. Initialize a probability model, Q(x);
2. Create a population of points by
sampling from Q(x);
3. Evaluate the objective function for each point;
4. Update Q(x) using selected population
and f() values;
5. While !(terminate()) Goto 2;
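The five EDA steps above can be sketched in Python. This is a minimal illustration, not an algorithm from this talk: it assumes Q(x) is a Gaussian with independent components (a per-dimension mean and standard deviation), uses truncation selection (keep the `n_select` lowest-f points) as the "selected population" of step 4, and refits Q(x) to those points by maximum likelihood. The function name `gaussian_eda` and all parameter values are hypothetical.

```python
import numpy as np


def gaussian_eda(f, mean0, std0, pop_size=100, n_select=30, generations=50, seed=0):
    """Minimal continuous EDA sketch for minimization (hypothetical helper).

    Assumption: Q(x) is a Gaussian with independent components, refit each
    generation to the n_select best sampled points (truncation selection).
    """
    rng = np.random.default_rng(seed)
    # Step 1: initialize the probability model Q(x)
    mean = np.asarray(mean0, dtype=float)
    std = np.full_like(mean, std0)
    best_x, best_f = None, np.inf
    for _ in range(generations):
        # Step 2: create a population of points by sampling from Q(x)
        pop = rng.normal(mean, std, size=(pop_size, mean.size))
        # Step 3: evaluate the objective function for each point
        fvals = np.array([f(x) for x in pop])
        order = np.argsort(fvals)
        if fvals[order[0]] < best_f:
            best_f, best_x = float(fvals[order[0]]), pop[order[0]].copy()
        # Step 4: update Q(x) using the selected (lowest-f) points
        selected = pop[order[:n_select]]
        mean = selected.mean(axis=0)
        std = selected.std(axis=0)
    # Step 5: here terminate() is simply a fixed generation budget
    return best_x, best_f


sphere = lambda x: float(np.sum(x ** 2))
best_x, best_f = gaussian_eda(sphere, mean0=[1.0, 1.0], std0=2.0)
```

Truncation selection is only one way to form the "selected population"; any selection scheme can be plugged into step 4.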
EDA Example 1
Population-based Incremental Learning
S. Baluja, R. Caruana. Removing the Genetics from the
Standard Genetic Algorithm. ICML’95.
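Model: a probability vector $p = (p_1, p_2, \ldots, p_n)$ with $p_i = \Pr(x_i = 1)$.
Update towards the best solution $x^b$ in the current population, with learning rate $\alpha$:
$p_i \leftarrow (1 - \alpha)\, p_i + \alpha\, x_i^b$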
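A minimal Python sketch of the PBIL loop, assuming minimization (matching the problem definition at the start of the talk) and the simplest form in which only the single best sample $x^b$ updates $p$. The test function (OneMax written as counting zero bits), the function name `pbil` and the parameter values are illustrative choices, not taken from Baluja and Caruana.

```python
import numpy as np


def pbil(f, n_bits, pop_size=50, lr=0.1, generations=200, seed=0):
    """Minimal PBIL sketch for minimizing f over {0,1}^n_bits (hypothetical helper).

    Assumption: only the single best sample per generation updates p,
    following p_i <- (1 - lr) * p_i + lr * x_i^b.
    """
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)  # p_i = Pr(x_i = 1), initially uniform
    best_x, best_f = None, np.inf
    for _ in range(generations):
        # Sample a population from the independent Bernoulli model
        pop = (rng.random((pop_size, n_bits)) < p).astype(int)
        fvals = np.array([f(x) for x in pop])
        x_b = pop[np.argmin(fvals)]  # best (lowest-f) individual
        if fvals.min() < best_f:
            best_f, best_x = int(fvals.min()), x_b.copy()
        # Move the probability vector towards the best individual
        p = (1 - lr) * p + lr * x_b
    return best_x, best_f, p


# OneMax written as minimization: count the zero bits
count_zeros = lambda x: int(np.sum(x == 0))
best_x, best_f, p = pbil(count_zeros, n_bits=20)
```

The model has no dependencies between bits, which is what makes the update a single vector operation.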
EDA Example 2
Mutual Information Maximization for Input Clustering (MIMIC)
J. De Bonet, C. Isbell and P. Viola. MIMIC: Finding optima by
estimating probability densities. Advances in Neural Information
Processing Systems, vol.9, 1997.
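$p(x) = p(x_{i_1} \mid x_{i_2})\, p(x_{i_2} \mid x_{i_3}) \cdots p(x_{i_{n-1}} \mid x_{i_n})\, p(x_{i_n})$
where $i_1, \ldots, i_n$ is a permutation of the variable indices (a chain model).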
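MIMIC chooses the chain permutation greedily from empirical entropies of the selected population: $x_{i_n}$ is the variable with the lowest entropy, and each next variable is the unused one with the lowest conditional entropy given the previously chosen one. The sketch below assumes binary variables, estimates probabilities by counting, and adds Laplace smoothing when sampling (an assumption, not necessarily what De Bonet et al. use). The helper names and the toy data are hypothetical.

```python
import numpy as np


def entropy(counts):
    """Empirical entropy (nats) from an array of counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def cond_entropy(data, j, k):
    """h(X_j | X_k) = h(X_j, X_k) - h(X_k) for binary columns j, k."""
    joint = np.zeros((2, 2))
    for a in (0, 1):
        for b in (0, 1):
            joint[a, b] = np.sum((data[:, j] == a) & (data[:, k] == b))
    return entropy(joint.ravel()) - entropy(joint.sum(axis=0))


def mimic_chain(data):
    """Greedy MIMIC ordering; returns [i_n, i_{n-1}, ..., i_1]."""
    n = data.shape[1]
    marg = [entropy(np.bincount(data[:, j], minlength=2)) for j in range(n)]
    order = [int(np.argmin(marg))]  # i_n: lowest marginal entropy
    remaining = set(range(n)) - set(order)
    while remaining:
        prev = order[-1]
        nxt = min(remaining, key=lambda j: cond_entropy(data, j, prev))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def sample_chain(data, order, m, rng):
    """Draw m samples from p(x_{i_n}) p(x_{i_{n-1}} | x_{i_n}) ...

    Assumption: Laplace (+1) smoothing of all estimated probabilities.
    """
    out = np.zeros((m, data.shape[1]), dtype=int)
    root = order[0]
    p_root = (data[:, root].sum() + 1) / (len(data) + 2)
    out[:, root] = rng.random(m) < p_root
    for prev, cur in zip(order, order[1:]):
        for v in (0, 1):
            mask = data[:, prev] == v
            p_cur = (data[mask, cur].sum() + 1) / (mask.sum() + 2)
            rows = out[:, prev] == v
            out[rows, cur] = rng.random(rows.sum()) < p_cur
    return out


# Toy "selected population": x1 copies x0, x2 is the complement of x0
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, size=500)
data = np.column_stack([x0, x0, 1 - x0])
order = mimic_chain(data)
samples = sample_chain(data, order, 1000, rng)
```

New samples keep the pairwise dependencies of the selected population, which a fully factorized model such as PBIL's cannot represent.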
EDA Example 3
Combining Optimizers with Mutual Information Trees (COMIT)
S. Baluja and S. Davies. Using optimal dependency-trees for combinatorial
optimization: learning the structure of the search space. Proc. ICML’97.
Uses a tree-structured graphical model
Model can be constructed in $O(n^2)$ time using a
variant of the minimum spanning tree algorithm.
Model is optimal, given the restrictions, in the sense
that the Kullback-Leibler divergence between the
model and the full joint distribution (estimated from the data) is minimized.
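The tree construction described here corresponds to the classic Chow-Liu procedure: weight every pair of variables by their empirical mutual information and keep a maximum-weight spanning tree (equivalently, a minimum spanning tree on negated weights). A minimal sketch for binary data follows, using Prim's algorithm; the function names and toy data are hypothetical, and sampling from the fitted tree is omitted.

```python
import numpy as np


def mutual_information(data, j, k):
    """Empirical mutual information (nats) between binary columns j and k."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((data[:, j] == a) & (data[:, k] == b))
            p_a = np.mean(data[:, j] == a)
            p_b = np.mean(data[:, k] == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return float(mi)


def chow_liu_tree(data):
    """Maximum-weight spanning tree over pairwise MI (Prim's algorithm).

    Returns (parent, child) edges of a tree rooted at variable 0.
    """
    n = data.shape[1]
    w = np.zeros((n, n))
    for j in range(n):
        for k in range(j + 1, n):  # O(n^2) variable pairs
            w[j, k] = w[k, j] = mutual_information(data, j, k)
    best = w[0].copy()               # strongest link from each node to the tree
    parent = np.zeros(n, dtype=int)  # every node initially linked to the root
    remaining = set(range(1, n))
    edges = []
    while remaining:
        k = max(remaining, key=lambda i: best[i])
        edges.append((int(parent[k]), k))
        remaining.remove(k)
        for i in remaining:
            if w[k, i] > best[i]:
                best[i], parent[i] = w[k, i], k
    return edges


# Toy data: x1 copies x0, x2 is a noisy copy of x0, x3 is independent
rng = np.random.default_rng(1)
x0 = rng.integers(0, 2, size=2000)
noise = (rng.random(2000) < 0.1).astype(int)
data = np.column_stack([x0, x0, x0 ^ noise, rng.integers(0, 2, size=2000)])
edges = chow_liu_tree(data)
```

Unlike MIMIC's chain, each variable may have several children, but still at most one parent.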
EDA Example 4
Bayesian Optimization Algorithm (BOA)
M. Pelikan, D. Goldberg and E. Cantu-Paz. BOA: The Bayesian
optimization algorithm. In Proc. GECCO’99.
Bayesian network model where nodes can
have at most k parents.
Greedy search, scored by the Bayesian Dirichlet
equivalence (BDe) metric, to find the network structure.
Further work on EDAs
EDAs have also been developed:
For problems with continuous and mixed variables.
That use mixture models and kernel
estimators, allowing for the modelling of multimodal distributions.
A framework to describe building and adapting a
probabilistic model for optimization
M. Gallagher and M. Frean. Population-Based Continuous
Optimization, Probabilistic Modelling and Mean Shift. To
appear, Evolutionary Computation, 2005.
Consider a continuous EDA with model
$Q(x) = \prod_{i=1}^{n} Q_i(x_i)$
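Consider a Boltzmann distribution over f(x):
$P(x) = \frac{1}{Z} \exp\left(-\frac{f(x)}{T}\right)$
where Z is a normalizing constant and T > 0 is a temperature parameter.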
As T→0, P(x) tends towards a set of impulse
spikes over the global optima.
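A quick numerical illustration of this limit, assuming the minimization form $P(x) \propto \exp(-f(x)/T)$ and an arbitrary one-dimensional test function with two minima (a tilted double well chosen for illustration only). P(x) is normalized on a grid, and the code measures how much probability lies within 0.2 of the global minimizer as T decreases.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 6001)
f = (x ** 2 - 1) ** 2 + 0.3 * x  # tilted double well: global min near x = -1
x_star = x[np.argmin(f)]
dx = x[1] - x[0]


def boltzmann(T):
    """Grid-normalized P(x) = exp(-f(x)/T) / Z (shifted by min f for stability)."""
    w = np.exp(-(f - f.min()) / T)
    return w / (w.sum() * dx)


def mass_near_optimum(T, radius=0.2):
    """Probability mass of P within `radius` of the global minimizer."""
    P = boltzmann(T)
    return float(P[np.abs(x - x_star) <= radius].sum() * dx)


masses = {T: mass_near_optimum(T) for T in (1.0, 0.1, 0.01)}
```

As T shrinks, the mass concentrates around the global minimizer, while the local minimum near x = 1 loses its share.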
We now have a probability distribution Q(x) whose form we know,
and we would like to modify it to be close to P(x), as measured by
the KL divergence:
$K = \int Q(x) \log \frac{Q(x)}{P(x)}\, dx$
Let Q(x) be a Gaussian with mean $\mu$ and variance $v$ (in each
dimension); try to minimize K via gradient descent with respect
to the mean parameter $\mu$ of Q(x).
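The gradient becomes (the entropy of Q does not depend on $\mu$, and $\log Z$ is constant)
$\frac{\partial K}{\partial \mu} = \frac{1}{vT} \int Q(x)\,(x - \mu)\, f(x)\, dx$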
An approximation to the integral is to use a
sample S of n points drawn from Q(x):
$\frac{\partial K}{\partial \mu} \approx \frac{1}{nvT} \sum_{x_i \in S} (x_i - \mu)\, f(x_i)$
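As a sanity check on this estimator, assume an isotropic Gaussian $Q(x) = N(\mu, vI)$, $P(x) \propto \exp(-f(x)/T)$, and the quadratic $f(x) = \|x\|^2$, chosen only because the exact gradient is then available in closed form: $\int Q(x)(x-\mu) f(x)\,dx = 2v\mu$, so $\partial K/\partial\mu = 2\mu/T$. The Monte Carlo estimate should agree up to sampling error. The function name `kl_grad_estimate` is hypothetical.

```python
import numpy as np


def kl_grad_estimate(f, mu, v, T, n, rng):
    """Monte Carlo estimate of dK/dmu = (1/(n v T)) * sum_i (x_i - mu) f(x_i).

    Assumptions: Q(x) = N(mu, v I), P(x) proportional to exp(-f(x)/T),
    and f maps an (n, d) array of points to n objective values.
    """
    xs = rng.normal(mu, np.sqrt(v), size=(n, mu.size))  # sample S from Q(x)
    fx = f(xs)
    return ((xs - mu) * fx[:, None]).sum(axis=0) / (n * v * T)


rng = np.random.default_rng(0)
mu = np.array([1.0, -0.5])
v, T = 0.25, 1.0
sphere = lambda X: np.sum(X ** 2, axis=1)
estimate = kl_grad_estimate(sphere, mu, v, T, n=200_000, rng=rng)
exact = 2 * mu / T
```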
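The algorithm update rule is then a gradient-descent step, with the step size
and the constant factor $1/(vT)$ absorbed into a learning rate $\alpha$:
$\hat{\mu} = \mu - \frac{\alpha}{n} \sum_{x_i \in S} (x_i - \mu)\, f(x_i)$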
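A minimal sketch of iterating this update, assuming the minimization form $P(x) \propto \exp(-f(x)/T)$ (hence the minus sign), a fixed isotropic variance v that is not adapted, and the sphere function as a test problem. Because raw f values enter the update directly, α must suit the scale of f; the values below, and the name `kl_mean_descent`, are illustrative and not taken from the talk.

```python
import numpy as np


def kl_mean_descent(f, mu0, v=0.1, alpha=1.0, n=200, iters=200, seed=0):
    """Iterate mu <- mu - (alpha / n) * sum_{x_i in S} (x_i - mu) f(x_i).

    Assumptions: Q(x) = N(mu, v I) with v held fixed; f maps an (n, d)
    array to n values. There is no selection: every sample contributes,
    weighted by its objective value.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float)
    for _ in range(iters):
        xs = rng.normal(mu, np.sqrt(v), size=(n, mu.size))  # sample S from Q(x)
        fx = f(xs)
        mu = mu - alpha / n * ((xs - mu) * fx[:, None]).sum(axis=0)
    return mu


sphere = lambda X: np.sum(X ** 2, axis=1)
mu_final = kl_mean_descent(sphere, mu0=[2.0, -1.5])
```

On the sphere function the expected step is $-2\alpha v \mu$, so with these settings the mean contracts by roughly a factor 0.8 per iteration, up to sampling noise.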
Similar ideas can be found in:
A. Berny. Statistical Machine Learning and Combinatorial
Optimization. In L. Kallel et al. eds, Theoretical Aspects of
Evolutionary Computation, pp. 287-306. Springer. 2001.
M. Toussaint. On the evolution of phenotypic exploration
distributions. In C. Cotta et al. eds, Foundations of Genetic
Algorithms (FOGA VII), pp. 169-182. Morgan Kaufmann. 2003.
The derived update rule is closely related
to those found in Evolution Strategies and
a version of PBIL for continuous spaces.
It is possible to view these existing
algorithms as approximately performing KL divergence minimization.
The objective function appears explicitly in
this update rule (no selection).
Other Research in Learning/Modelling for Optimization
J. A. Boyan and A. W. Moore. Learning Evaluation Functions to
Improve Optimization by Local Search. Journal of Machine Learning
Research 1:2, 2000.
B. Anderson, A. Moore and D. Cohn. A Nonparametric Approach to
Noisy and Costly Optimization. International Conference on
Machine Learning, 2000.
D. R. Jones. A Taxonomy of Global Optimization Methods Based
on Response Surfaces. Journal of Global Optimization 21(4):345-
R. J. Williams. Simple statistical gradient-following algorithms for
connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.
V. V. Miagkikh and W. F. Punch III. An Approach to Solving Combinatorial
Optimization Problems Using a Population of Reinforcement Learning Agents.
Genetic and Evolutionary Computation Conference (GECCO-99), pp. 1358-1365, 1999.
The field of metaheuristics (including
Evolutionary Computation) has produced
a large variety of optimization algorithms, which have
demonstrated good performance on a range of real-world problems.
Metaheuristics are considerably more general:
can even be applied when there isn’t a “true”
objective function (coevolution).
Can evolve non-numerical objects.
EDAs take an explicit modelling approach to optimization.
Existing statistical models and model-fitting algorithms can be applied.
Potential for solving challenging problems.
Model can be more easily visualized/interpreted than a dynamic
population in a conventional EA.
Although the field is highly active, it is still relatively
Improve quality of experimental results.
Make sure research goals are well-defined.
Lots of preliminary ideas, but a lack of comparative/follow-up studies.
Difficult to keep up with the literature and see connections with