Covariance Matrix Adaptation Evolution Strategy - CMA-ES

Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
By: Osama Salah Eldin
Under supervision of: Prof. Magda B. Fayek
6/3/2016 CAIRO UNIVERSITY - COMPUTER ENGINEERING - 2015

Outline
o What is Optimization?
o What is an Evolution Strategy?
o Step-size Adaptation
o Cumulative Step-size Adaptation
o Covariance Matrix Adaptation
o Application - Modeling

What is optimization?
o Optimization is the minimization or the maximization of a function y = f(x)
[Figure: curve of y = f(x) with one global minimum and two local minima]

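o Try to solve these problems:
x^3 - 8 = 0               (solution: x = 2)
x^2 + 3y - 15 = 0         (one solution: x = 3, y = 2)
x^2 + y + 2z - 15 = 0     (one solution: x = 3, y = 2, z = 2)

Can one try all combinations?
This is not recommended: the number of combinations grows rapidly with the
number of variables.

Use an evolution strategy instead.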

What is an Evolution Strategy?
o It is a technique that searches a search space for the optimum solution
o Evolution strategies belong to the family of Evolutionary Computation
o Evolution strategy steps:
1. Generate a population of candidate solutions
2. Evaluate every individual in the population
3. Select parents from the fittest individuals
4. Reproduce offspring of the next generation (recombination & mutation)
5. Repeat until a termination criterion is met
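
The five steps above can be sketched as a short loop. This is only a minimal illustration, not the author's code: the function name `simple_es`, the fixed step size `sigma`, the population sizes and the squared-error objective for x^3 - 8 = 0 are all assumptions made for the example.

```python
import random

def simple_es(objective, x0, sigma=0.3, mu=3, lam=12, generations=200, seed=0):
    """Minimize `objective` with a basic evolution strategy (fixed step size)."""
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(generations):
        # 1. Generate a population by mutating the current mean
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(lam)]
        # 2. Evaluate every individual (lower objective value = fitter)
        population.sort(key=objective)
        # 3. Select the mu fittest individuals as parents
        parents = population[:mu]
        # 4. Recombine the parents (average) to get the next mean
        mean = [sum(p[i] for p in parents) / mu for i in range(len(mean))]
        # 5. Terminate early once the mean is good enough
        if objective(mean) < 1e-8:
            break
    return mean

def cubic_error(x):
    # Hypothetical objective: squared residual of x^3 - 8 = 0
    return (x[0] ** 3 - 8.0) ** 2

best = simple_es(cubic_error, [0.0])
print(best)  # a value close to 2
```

Because the step size is fixed, the result only hovers near the optimum; adapting the step size is what the later sections address.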

Evolution Strategies VS. Genetic Algorithms
                     ES                             GA
Initial Population   Random mutations of the        Random or seeded
                     initial guess
Evaluation           Objective Function             Fitness (Evaluation) Function
Selection            Truncation Selection           Different methods
Reproduction         Recombination + Mutation       Crossover + Mutation
Termination          Almost similar stop conditions (both)

What is an Evolution Strategy? - Example
1. Generate a population of candidate solutions
2. Evaluate every individual in the population (fitness)
3. Select parents from the fittest individuals
4. Reproduce offspring of the next generation (recombination & mutation)
5. Evaluate, select and reproduce again, generation after generation
6. Terminate when the optimum solution is reached
[Figure sequence: the population plotted on the curve y = f(x) at each step, moving towards the optimum solution]

The Basic Evolution Strategy
o The basic evolution strategy is defined by:
(µ/ρ, λ)-ES and (µ/ρ + λ)-ES
Where:
µ  The number of selected individuals (parents) per generation
ρ  The number of parents (selected from the µ) involved in recombination (ρ ≤ µ)
λ  The number of offspring per generation (population size)
,  Comma Selection: the µ parents are selected from the λ offspring only
+  Plus Selection: the µ parents are selected from the λ offspring together
   with the current µ parents

The Basic Evolution Strategy - Example
(10/6 , 50)-ES
Select the fittest 10 individuals from the 50 offspring of the current
generation, then pick 6 of them at random. Recombine these 6
parents to generate 50 new offspring.
(10/6 + 50)-ES
Select the fittest 10 individuals from the 50 offspring of the current
generation together with the 10 parents that produced them (60 candidates),
then pick 6 of them at random. Recombine these 6 parents to generate 50 new offspring.
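
The difference between comma and plus selection can be shown in a few lines. The sketch below assumes the usual convention in which the plus pool is the λ offspring together with the µ current parents; the helper name `select_parents` and the toy one-dimensional objective are hypothetical.

```python
def select_parents(parents, offspring, mu, fitness, plus=False):
    """Truncation selection for comma (offspring only) or plus (offspring + parents)."""
    pool = offspring + parents if plus else offspring
    # Lower fitness value = fitter (minimization)
    return sorted(pool, key=fitness)[:mu]

def distance_to_zero(x):
    # Hypothetical objective: distance of a scalar solution to 0
    return abs(x)

old_parents = [0.1, 0.2]
offspring = [0.5, -0.3, 0.9, 1.5]

comma = select_parents(old_parents, offspring, mu=2, fitness=distance_to_zero)
plus = select_parents(old_parents, offspring, mu=2, fitness=distance_to_zero, plus=True)
print(comma)  # [-0.3, 0.5]: the old parents are discarded even though they were fitter
print(plus)   # [0.1, 0.2]: the old parents survive because they beat all offspring
```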

The structure of an Individual
Object Parameter Vector (Y) | Strategy Parameter Vector (S) | Individual’s Fitness (F)
Y  The candidate solution of the problem (e.g. an (x, y) point)
S  The parameters used by the strategy (e.g. mutation strength)
F  The fitness of the candidate solution Y as measured by the fitness
   function (i.e. the value of the objective function)
Example: Y = {x1, x2, z}

• Evolution strategies search for the optimum in two search spaces:
1. Solution: the one with the highest fitness
2. Strategy Parameters: the ones that give the fastest improvement

ES Steps
1 - Initial Solution
2 - Initial Population
3 - Evaluation
4 - Selection (in the illustrated run: µ = 4, ρ = 1)
5 - Reproduction
6 - Termination (the optimum solution is reached)
1 - Initial Solution
• An initial guess; it should be as close as possible to the expected solution

2 - Initial Population
• The initial population is generated by mutating the initial solution

3 - Evaluation
• Every individual is evaluated by the objective function
• Best fitness = 0

4 - Selection
• Truncation selection is used:
  Select the fittest µ individuals
  Drop the other individuals
5 - Reproduction
Reproduction consists of recombination followed by mutation:
• Recombination: combining two or more parents to produce a mean for the
  new generation
• Mutation: adding normally distributed random vectors to the new mean

Recombination - Example
A simple recombination is taking the average:

           Solution      Strategy Parameters   Fitness
P1         (1, 3)        S1                    F1
P2         (4, 6)        S2                    F2
Average    (2.5, 4.5)    S3                    To be calculated
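
The averaging in this example can be written as a small helper. This is a sketch only: the name `recombine` is hypothetical, and only the solution part of the individuals is averaged here.

```python
def recombine(parents):
    """Intermediate recombination: component-wise average of the parents."""
    n = len(parents[0])
    return [sum(p[i] for p in parents) / len(parents) for i in range(n)]

p1 = [1.0, 3.0]
p2 = [4.0, 6.0]
child = recombine([p1, p2])
print(child)  # [2.5, 4.5]
```

The same helper works for any number of parents ρ, since it divides by the number of vectors it receives.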

[Figure: recombination illustrated with ρ = 2 parents and with ρ = 4 parents]


Mutation - Example
Parent (the mean produced by recombination): (5.5, 8.0)
1. Generate λ normally distributed random vectors (here λ = 3):
   (RX1, RY1), (RX2, RY2), (RX3, RY3)
2. Add each of the λ mutating vectors to the parent:
   (5.5 + RX1, 8.0 + RY1)
   (5.5 + RX2, 8.0 + RY2)
   (5.5 + RX3, 8.0 + RY3)
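
A minimal sketch of this mutation step, using the parent (5.5, 8.0) from the example. The function name `mutate` and the `sigma` scaling factor are assumptions; with the default `sigma = 1.0` the offspring are exactly parent + (RX, RY).

```python
import random

def mutate(parent, lam, sigma=1.0, seed=None):
    """Create lam offspring by adding normally distributed vectors to the parent."""
    rng = random.Random(seed)
    offspring = []
    for _ in range(lam):
        # One random vector (RX, RY, ...) with independent N(0, 1) components
        r = [rng.gauss(0.0, 1.0) for _ in parent]
        offspring.append([p + sigma * ri for p, ri in zip(parent, r)])
    return offspring

children = mutate([5.5, 8.0], lam=3, seed=1)
for child in children:
    print(child)
```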

[Figure: the three offspring (5.5 + RXi, 8.0 + RYi), i = 1, 2, 3, plotted around the parent (5.5, 8.0); each mutation vector (RXi, RYi) points from the parent to its offspring]

Mutation vectors are normally distributed around their parent.
Evaluation, selection and reproduction (steps 3 to 5) are repeated until termination.

Step-size Adaptation (σSA-ES)
[Figure: a search path made of many small steps]
Many small steps
These gaps are unrealistic. Why?
The parent of a generation is an individual in the previous generation.

The Basic Evolution Strategy - Revisited
[Figure: the ES loop (1 - Initial Solution, 3 - Evaluation, 4 - Selection, 5 - Reproduction by recombination and mutation, 6 - Termination), with the mutation step highlighted]
The Gaussian (Normal) Distribution
N(µ, σ²), where µ is the mean and σ² = (standard deviation)² is the variance
N(µ, σ²) ≡ µ + σ·N(0, 1)
Multivariate: µ + σ·N(0, I)
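
The identity N(µ, σ²) ≡ µ + σ·N(0, 1) can be checked numerically. The values below (mean 3, standard deviation 2, 20000 samples, the seed) are arbitrary choices for the illustration.

```python
import random
import statistics

rng = random.Random(42)
mean, sigma = 3.0, 2.0

# Scalar case: N(mean, sigma^2) sampled as mean + sigma * N(0, 1)
samples = [mean + sigma * rng.gauss(0.0, 1.0) for _ in range(20000)]
mean_est = statistics.mean(samples)
std_est = statistics.pstdev(samples)
print(mean_est, std_est)  # close to 3 and 2

# Multivariate case: mean vector + sigma * N(0, I),
# i.e. one independent N(0, 1) draw per coordinate
mean_vec = [5.5, 8.0]
point = [m + sigma * rng.gauss(0.0, 1.0) for m in mean_vec]
```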

The Basic Evolution Strategy - Revisited
[Figure: the same ES loop, with the step-size marked on the mutation step]
The Gaussian (Normal) Distribution
[Figure: Gaussian distributions with σ = 1.0, σ = 1.5, σ = 2.0 and σ = 2.5]

Object Parameter Vector (Y) | Strategy Parameter Vector (S) = σ | Individual’s Fitness (F)
n = length(Y): Problem dimension

Cumulative Step-size Adaptation (CSA)
[Figure: two search paths, one labelled "Increase step size" and one labelled "Decrease step size"]

1. Calculate the average Z_t of the fittest µ solutions
2. Calculate the cumulative path P_c at generation t
The parameter c is called the cumulation parameter; it determines how rapidly
the information stored in P_c^t fades. The typical value of c is between 1/n and 1/

3. Update the mutation strength (i.e. step-size)
The damping parameter dσ determines how much the step-size can
change (normally, it is set to 1).
||X|| denotes the Euclidean norm of the vector X.
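
The formulas for these three steps did not survive in the text. One common way of writing cumulative step-size adaptation is sketched below; it is an assumption, not necessarily the exact form used on the slide. Here z_{i:λ} denotes the N(0, I) vector that produced the i-th fittest offspring, and the factors √(c(2 − c)) and √µ are the usual normalisation constants.

```latex
\begin{align}
\bar{Z}_t &= \frac{1}{\mu}\sum_{i=1}^{\mu} z_{i:\lambda} \\
P_c^{(t)} &= (1-c)\,P_c^{(t-1)} + \sqrt{c\,(2-c)}\,\sqrt{\mu}\;\bar{Z}_t \\
\sigma^{(t)} &= \sigma^{(t-1)}\,
  \exp\!\left(\frac{c}{d_\sigma}
  \left(\frac{\lVert P_c^{(t)}\rVert}{E\lVert N(0,I)\rVert} - 1\right)\right)
\end{align}
```

Under this form, a path longer than the expected length of a random N(0, I) vector means that successive steps point in a similar direction, so the step-size is increased; a shorter path means the steps cancel each other out, so the step-size is decreased.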

Object Parameter Vector (Y) σ Individual’s Fitness (F)

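Covariance-Matrix Adaptation (CMA)
[Figure: the sampling distribution reshaped towards the promising individuals]
N(0, I) → N(0, C): the identity matrix I is replaced by a covariance matrix C
How? Adapt the covariance matrix.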
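
Replacing N(0, I) by N(0, C) only changes how the random vectors are drawn. A minimal sketch, assuming a Cholesky factor is used to colour the N(0, I) draws (implementations often use an eigendecomposition instead); the helper names and the example matrix are hypothetical.

```python
import math
import random

def cholesky(C):
    """Lower-triangular A with A * A^T = C (C symmetric positive definite)."""
    n = len(C)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                A[i][j] = math.sqrt(C[i][i] - s)
            else:
                A[i][j] = (C[i][j] - s) / A[j][j]
    return A

def sample_n0c(C, rng):
    """Draw one vector from N(0, C) as A z with z ~ N(0, I)."""
    A = cholesky(C)
    n = len(C)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(A[i][k] * z[k] for k in range(n)) for i in range(n)]

C = [[4.0, 2.0], [2.0, 3.0]]   # example covariance matrix
A = cholesky(C)
vector = sample_n0c(C, random.Random(0))
```

With C = I the factor A is the identity and the sample reduces to the plain N(0, I) draw used earlier.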

Covariance-Matrix

o To which direction should the population be directed?
Two concepts are needed: Variance and Covariance

Variance
o Variance is a measure of how far the samples of a variable spread from their mean
X̄ denotes the mean of the samples of X

[Figure: two sets of samples plotted around their mean (both axes from 0 to 12), one with high variance and one with low variance]

Covariance
o Covariance is a measure of how two variables change together
[Figure: three series (a), (b) and (c), with Cov(a, b) = 8.3909 and Cov(a, c) = -3.0091]

Covariance-Matrix
o It is a matrix whose (i, j) element is the covariance between the ith and the
jth variables
o The diagonal elements are the variances (σ²); the off-diagonal elements are
the covariances
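
A small worked example of a covariance matrix. The three series below are made up for the illustration (they are not the data behind the figures), and the 1/(N − 1) normalisation is an assumption, since the original formula is not preserved.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long series (1/(N-1) normalisation)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(variables):
    """Matrix whose (i, j) element is the covariance of variables i and j."""
    return [[covariance(a, b) for b in variables] for a in variables]

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]   # rises together with a: positive covariance
c = [4.0, 3.0, 2.0, 1.0]   # falls as a rises: negative covariance

C = covariance_matrix([a, b, c])
for row in C:
    print(row)
```

The diagonal entries are the variances of a, b and c, and the matrix is symmetric because Cov(a, b) = Cov(b, a).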

Principal Component
o CMA-ES performs a type of Principal Component Analysis (PCA)
o Principal Component: The principal variable (component) is equivalent to the
principal player:
1. High Variance
2. Low Covariance with other components
=> Distinct, or very special
=> The search moves towards the principal component

A practical run of CMA-ES
• The initial guess is (0, 0)
• The optimum solution is (5, 50)
• The population moves faster towards the direction of the
second component (50)
[Figure: per-generation plots of the solution moving from the initial (0, 0) to the optimum (5, 50), of the variance (σ²) and of the step-size]

CMA-ES (Steps)-1
o Initial Values
◦ C = I (n x n Identity Matrix)
◦ An initial guess m (n x 1 mean of the initial population)
◦ An initial step size σ (n x 1 vector of standard deviations)
1. Generate λ offspring by mutating the mean m: x_i = m + σ·N_i(0, C), i = 1, ..., λ
2. Evaluate the λ offspring
3. Sort the offspring by fitness so that x_1:λ is the fittest individual and
x_λ:λ the least fit
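
Steps 1 to 3 can be sketched as one function. The assumptions here: σ is a single scalar, the matrix `A` is a factor of C with A·Aᵀ = C (the identity in the first generation, when C = I), and lower objective values are fitter. The names `sample_and_sort` and `sphere` are hypothetical.

```python
import random

def sample_and_sort(objective, m, sigma, A, lam, rng):
    """Sample lam offspring x = m + sigma * A z, evaluate them, sort by fitness."""
    n = len(m)
    offspring = []
    for _ in range(lam):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]                      # z ~ N(0, I)
        y = [sum(A[i][k] * z[k] for k in range(n)) for i in range(n)]    # y ~ N(0, C)
        x = [m[i] + sigma * y[i] for i in range(n)]                      # x ~ m + sigma N(0, C)
        offspring.append((objective(x), x, z))
    offspring.sort(key=lambda item: item[0])   # fittest (lowest value) first
    return offspring

def sphere(x):
    # Hypothetical test objective with its optimum at the origin
    return sum(v * v for v in x)

rng = random.Random(0)
identity = [[1.0, 0.0], [0.0, 1.0]]
ranked = sample_and_sort(sphere, [3.0, -2.0], 0.5, identity, 6, rng)
fittest_value, fittest_x, fittest_z = ranked[0]
```

Keeping the z vector next to each offspring is convenient because the later path updates are computed from the random vectors that generated the selected individuals.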

CMA-ES (Steps)-2
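4. Update the mean m of the population as a weighted average of the µ
fittest offspring:
m = w_1·x_1:λ + w_2·x_2:λ + ... + w_µ·x_µ:λ
The constants w_i are selected such that w_1 ≥ w_2 ≥ ... ≥ w_µ > 0 and
w_1 + w_2 + ... + w_µ = 1
µ is the number of parents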
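
A sketch of the weighted mean update. The logarithmic weights used here are one common choice and are an assumption, as is the definition µw = 1 / Σ w_i² for the quantity that appears later in the learning rates; the function names are hypothetical.

```python
import math

def recombination_weights(mu):
    """Positive, decreasing weights that sum to 1 (log-based choice, an assumption)."""
    raw = [math.log(mu + 0.5) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def update_mean(sorted_offspring, weights):
    """Weighted average of the len(weights) fittest offspring (fittest first)."""
    n = len(sorted_offspring[0])
    return [sum(w * x[i] for w, x in zip(weights, sorted_offspring))
            for i in range(n)]

weights = recombination_weights(3)
mu_w = 1.0 / sum(w * w for w in weights)   # between 1 and mu

# Four offspring already sorted by fitness; only the first three are used
ranked = [[1.0, 1.0], [2.0, 0.0], [4.0, 2.0], [9.0, 9.0]]
new_mean = update_mean(ranked, weights)
```

Because the fittest offspring gets the largest weight, the new mean lies closer to it than a plain average would.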

CMA-ES (Steps)-3
5. Update the step-size cumulation path Pσ from the random vectors that
generated the selected individuals x_i:λ, where:
◦ cσ: Decay rate of the evolution path for the step-size σ (≈ 4/n)
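
The update formula itself is not preserved in the text. A commonly used form is sketched below as an assumption: the old path fades by (1 − cσ) and the weighted mean of the selected random vectors is added with the normalisation √(cσ(2 − cσ)µw). The argument `whitened_step` stands for that weighted mean expressed in N(0, I) coordinates; the function name is hypothetical.

```python
import math

def update_sigma_path(p_sigma, whitened_step, c_sigma, mu_w):
    """Fade the old path and add the normalised new step (assumed common form)."""
    coeff = math.sqrt(c_sigma * (2.0 - c_sigma) * mu_w)
    return [(1.0 - c_sigma) * p + coeff * s
            for p, s in zip(p_sigma, whitened_step)]

# A zero step only fades the stored path
faded = update_sigma_path([2.0, 2.0], [0.0, 0.0], c_sigma=0.5, mu_w=2.0)
# A step along the first axis builds the path up in that direction
built = update_sigma_path([0.0, 0.0], [1.0, 0.0], c_sigma=0.5, mu_w=2.0)
print(faded, built)
```

The sketch assumes 0 < cσ ≤ 1, so the rule of thumb 4/n only applies when n is larger than 4.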

CMA-ES (Steps)-4
6. Update the covariance-matrix cumulation path Pc ∈ ℝ^(n x 1):
cc: Decay rate for the evolution path of C
7. Update the step-size σ:
||X|| denotes the Euclidean norm of the vector X
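
Steps 6 and 7 can be sketched as two small functions. Both formulas are assumptions based on a commonly used form, since the originals are not preserved: the path Pc accumulates the weighted mean step `y_w` of the selected offspring, and σ grows or shrinks depending on whether ||Pσ|| is longer or shorter than the expected length of an N(0, I) vector. The stall safeguard used in production code is left out.

```python
import math

def update_cov_path(p_c, y_w, c_c, mu_w):
    """Step 6 (assumed form): fade the old path, add the normalised mean step."""
    coeff = math.sqrt(c_c * (2.0 - c_c) * mu_w)
    return [(1.0 - c_c) * p + coeff * y for p, y in zip(p_c, y_w)]

def update_step_size(sigma, p_sigma, c_sigma, d_sigma=1.0):
    """Step 7 (assumed form): compare ||p_sigma|| with the expected N(0, I) length."""
    n = len(p_sigma)
    # Standard approximation of E||N(0, I)|| in n dimensions
    expected_norm = math.sqrt(n) * (1.0 - 1.0 / (4.0 * n) + 1.0 / (21.0 * n * n))
    norm = math.sqrt(sum(p * p for p in p_sigma))
    return sigma * math.exp((c_sigma / d_sigma) * (norm / expected_norm - 1.0))

p_c = update_cov_path([0.0, 0.0], [1.0, 0.0], c_c=0.5, mu_w=2.0)
grown = update_step_size(1.0, [3.0, 0.0], c_sigma=0.5)    # long path: sigma increases
shrunk = update_step_size(1.0, [0.0, 0.0], c_sigma=0.5)   # empty path: sigma decreases
print(p_c, grown, shrunk)
```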

CMA-ES (Steps)-5
8. Update the covariance matrix C:
c1: Learning rate for the rank-one update of C (≈ 2/n²)
cµ: Learning rate for the rank-µ update of C (≈ µw/n²)
Repeat the previous steps until a satisfying solution is found, a maximum
number of generations is exceeded, or no significant improvement is
achieved
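
A sketch of step 8 combining the two learning rates. The formula is an assumption based on the commonly used form C ← (1 − c1 − cµ)·C + c1·Pc·Pcᵀ + cµ·Σ w_i·y_i·y_iᵀ, where the y_i are the selected steps (x_i:λ − m)/σ; the function name and the example numbers are hypothetical.

```python
def update_covariance(C, p_c, ys, weights, c_1, c_mu):
    """Rank-one update (from the path p_c) plus rank-mu update (from the steps ys)."""
    n = len(C)
    new_C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rank_one = p_c[i] * p_c[j]
            rank_mu = sum(w * y[i] * y[j] for w, y in zip(weights, ys))
            new_C[i][j] = ((1.0 - c_1 - c_mu) * C[i][j]
                           + c_1 * rank_one + c_mu * rank_mu)
    return new_C

identity = [[1.0, 0.0], [0.0, 1.0]]
# Path and selected steps all point along the first axis
new_C = update_covariance(identity, p_c=[2.0, 0.0],
                          ys=[[2.0, 0.0], [2.0, 0.0]], weights=[0.5, 0.5],
                          c_1=0.1, c_mu=0.2)
print(new_C)  # variance grows along the first axis and shrinks along the second
```

In this example the variance along the successful direction rises to about 1.9 while the other drops to about 0.7, so later samples are stretched along the direction that produced good offspring.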

Advantages of CMA-ES
o CMA-ES can outperform other strategies in the following cases:
◦ Non-separable problems (the parameters of the objective function are
dependent)
◦ The derivative of the objective function is not available
◦ High-dimensional problems (n is large)
◦ Very large search spaces

CMA-ES Limitations
o CMA-ES can be outperformed by other strategies in the following cases:
◦ Partly separable problems (i.e. the optimization of an n-dimensional objective
function can be divided into a series of n optimizations, one for every single
parameter)
◦ The derivative of the objective function is easily available (Gradient
Descent / Ascent)
◦ Low-dimensional problems
◦ Problems that can be solved using a relatively small number of function
evaluations (e.g. < 10n evaluations)

Application - Modeling
[Figure: an unknown function f(x) mapping the input x to the output y]
1. Collect Samples: pairs (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)
2. Guess a model: f(x) = a·x² + b·x + c
3. Optimize the model: find the optimum values of {a, b, c}
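
The three modeling steps can be tried on made-up data. To keep the sketch short and dependency-free, it optimizes {a, b, c} with a simple (1+1) evolution strategy using a 1/5th-success step-size rule rather than full CMA-ES; the samples, the function names and all constants are assumptions for the illustration.

```python
import random

# 1. Collect samples (hypothetical data generated from y = 2x^2 - 3x + 1)
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 * x * x - 3.0 * x + 1.0 for x in xs]

# 2. Guess a model f(x) = a*x^2 + b*x + c and measure its squared error
def model_error(params):
    a, b, c = params
    return sum((a * x * x + b * x + c - y) ** 2 for x, y in zip(xs, ys))

# 3. Optimize the model parameters
def one_plus_one_es(objective, x0, sigma=1.0, iterations=10000, seed=0):
    """(1+1)-ES: keep the offspring only if it is at least as good as its parent."""
    rng = random.Random(seed)
    best, best_f = list(x0), objective(x0)
    for _ in range(iterations):
        candidate = [v + sigma * rng.gauss(0.0, 1.0) for v in best]
        f = objective(candidate)
        if f <= best_f:
            best, best_f = candidate, f
            sigma *= 1.5             # success: take larger steps
        else:
            sigma *= 1.5 ** -0.25    # failure: take smaller steps
    return best, best_f

(a, b, c), error = one_plus_one_es(model_error, [0.0, 0.0, 0.0])
print(a, b, c, error)  # a, b, c close to 2, -3, 1
```

The objective passed to the optimizer is the model error, which is the same role the objective function plays when CMA-ES is used for this task.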

Application – Modeling in Robocode
Motion Model
Find a model for this path

Motion Model – Steps
1. Collect Samples: The (x, y) location of the enemy
2. Guess the model (using GA)
3. Optimize the model

Motion Model – Observations
oDifferent models give different human-like behaviors
Careless Reckless Tricky

Using the Source Code
o The source code of CMA-ES (available in C, C++, Java, Fortran, Python,
R, Scilab and MATLAB / Octave) can be found at:
https://www.lri.fr/~hansen/cmaes_inmatlab.html
o The MATLAB / Octave m-files are:
◦ purecmaes.m: Simple implementation
◦ cmaes.m: Production Code
1. Specify the initial values of the parameters (step-size, covariance matrix, initial
guess, population size, etc.)
2. Define your objective function (MATLAB / Octave):
    function f = obj_func(x)
      % Return the error to be minimized, e.g. the squared residual of x^3 - 8 = 0
      f = (x(1)^3 - 8)^2;
    end

3. Call the function (MATLAB / Octave):
    >> function_mfile(parameters)

Questions
osama.salah.eg@ieee.org

Thanks
