Why advanced population initialization techniques perform poorly in high dimension

Why Advanced Population Initialization Techniques
Perform Poorly in High Dimension?
Borhan Kazimipour
Xiaodong Li
A.K. Qin

Outlines
1. Introduction
2. Background
3. Questions
4. Experiments
5. Results
6. Conclusion
SEAL 2014, Dunedin, NZ Why Advanced PITs Perform Poorly in HD?

Definition of Population Initialization
• Definition:
– Initialization is the task of generating a set of initial points as potential solutions of
an optimization problem. These points are the first positions (or distribution)
of the individuals in the first generation.
• Common Parameters:
– Population size
– Number of variables or dimensionality (given)
– Variable ranges (given)
• Note: In this study, our main focus is on continuous techniques capable of generating
real-valued numbers in continuous spaces.
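
As a concrete illustration of the three parameters above, the following is a minimal sketch of the simplest initializer (uniform pseudo-random sampling). The function name `init_population`, the `bounds` layout, and the example ranges are assumptions made for this sketch; they do not come from the slides.

```python
import random

def init_population(pop_size, bounds, seed=None):
    """Return pop_size individuals, each a list with one value per variable.

    bounds is a list of (low, high) pairs, one pair per dimension.
    """
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for (low, high) in bounds]
            for _ in range(pop_size)]

# Hypothetical example: 5 individuals in a 3-dimensional search space.
bounds = [(-5.0, 5.0), (0.0, 1.0), (10.0, 20.0)]
population = init_population(pop_size=5, bounds=bounds, seed=42)
```

The population size is the only free choice here; the dimensionality and the variable ranges are fixed by the problem, which is why the slide marks them as given.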

Importance of Population Initialization
• Why is studying population initialization important?
– Popularity: All population-based algorithms, including EAs, need a population
initialization module.
– “Initialize population randomly” is the most widely used expression in the EA community!
– Variety: Many different population initialization techniques have been proposed so far.
– About 100 population initialization techniques have been proposed so far*.
– Effectiveness: Clearly, starting from a good position makes it easier and faster to
achieve the aim than starting from a bad one.
– “Advanced initialization techniques can increase the probability of finding global optima,
reduce the variation of the final results, decrease the computational costs and improve the
solution(s) quality.” *
– Inconsistency(!): Some contradictory findings have been reported.
– “For example, one claimed that the desirable effect of uniformity of initial population is more
significant in high dimensions (up to 50 dimensions) while another study, in contrast,
claimed that uniform initialization techniques lose their effectiveness in problems of 12 or
more dimensions.” *
* B. Kazimipour, X. Li, and A. K. Qin. "A review of population initialization techniques for evolutionary algorithms." In Evolutionary Computation (CEC), 2014 IEEE Congress on,
pp. 2585-2592. IEEE, 2014.

Definitions of Randomness
• True Random:
– A true random sequence is usually described as a sequence having strong
properties such as complete unpredictability, incompressibility and irregularity.
– Some believe true random sequences do not exist (theoretical drawback).
– There is no tool to prove that a given sequence is truly random (empirical drawback).
• Computational Random:
– A sequence is computationally random if it passes some tests on the properties of
true randomness e.g., unpredictability, and incompressibility.
• Statistical Random:
– A sequence is statistically random if it passes some tests on the statistical
(distributional) properties of true random sequences e.g., uniformity.
Continuum of Randomness: Completely Deterministic ↔ Truly Random

Measuring Randomness
• In this work, we follow the technique proposed in [1] to categorise PITs based on
randomness:
– Does the output depend on the initial seed?
– YES: Stochastic
– NO: Deterministic

Categorization based on Randomness
• Population Initialization Techniques
– Stochastic: Pseudo-Random Number Generator, Chaotic Number Generator
– Deterministic: Quasi-Random Sequence, Uniform Experimental Design

Stochastic vs. Deterministic
Stochastic
• Definition:
– Their results depend on initial seeds.
• Properties:
– Unpredictable (computationally)
– Irregular
• Examples:
– Pseudo-Random Number Generator (PRNG), e.g. WELL, KISS, and Mersenne Twister
– Chaotic Number Generator (CNG), e.g. Tent, Logistic, and Sine
Deterministic
• Definition:
– They always generate the same population regardless of any initial seed.
• Properties:
– Population uniformity is more important than randomness or unpredictability.
• Examples:
– Quasi-random Sequence, e.g. Sobol, Halton
– Uniform Experimental Design, e.g. Latin hypercube, good lattice points, and
orthogonal design
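
The seed-dependence criterion can be illustrated with one generator from each category. The sketch below pairs a logistic-map chaotic generator (stochastic: its output depends on the seed) with a Halton sequence (deterministic: no seed is involved). The function names, the parameter r = 4, and the choice of prime bases are assumptions made for this illustration, not details taken from the study.

```python
def logistic_map(seed, n, r=4.0):
    """Chaotic number generator: x_{k+1} = r * x_k * (1 - x_k), with 0 < seed < 1."""
    x, out = seed, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(x)
    return out

def van_der_corput(index, base):
    """Radical inverse of index in the given base (one Halton coordinate)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton(n, primes=(2, 3)):
    """First n Halton points; one prime base per dimension, no seed involved."""
    return [[van_der_corput(i, p) for p in primes] for i in range(1, n + 1)]

# Different seeds give different chaotic sequences ...
chaotic_a = logistic_map(0.30, 5)
chaotic_b = logistic_map(0.31, 5)
# ... while the Halton points are the same on every call.
quasi = halton(4)
```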

Questions
Goal
• Research Question:
– Why do EAs not benefit greatly from advanced population initialization
techniques when the dimensionality of the problem is very high?
• Hypothesis:
– The uniformity of the population, for both simple and advanced techniques, drops to the
same level when dimensionality grows.

Questions…
Two experiments
Part A
(baseline technique)
• Goal: Study the trend of population
uniformity when generated by popular
but simple techniques*.
• Research Questions:
1. How much can the uniformity of a
population be affected by
dimensionality?
2. Is it possible to enhance the
uniformity of the initial population in
high-dimensional spaces by
increasing the population size?
Part B
(advanced techniques)
• Goal: Compare the performance of
advanced initialization techniques with a
commonly used technique*.
• Research Questions:
1. Can adopting advanced
initialization techniques significantly
improve population uniformity?
2. How does population size affect the
performance of advanced
initializers?
*Random number generators (RNG) are the most widely used initializers in the field of EA.

Questions…
Quality measures
In both parts, we use discrepancy values to measure quality of populations.
• Definition of discrepancy:
– Literally, discrepancy means non-uniformity.
– Technically, discrepancy measures are tools for determining non-uniformity level of
a given point set.
– Point sets with low discrepancy are those with high level of uniformity.
• Variations of discrepancy:
– Star L2-discrepancy
– Centred L2-discrepancy*
– Modified L2-discrepancy
– Symmetric L2-discrepancy
– Wrap-around L2-discrepancy
* Centred L2-discrepancy (CD) is used in this study.

Questions…
Analytic formulas
• L2-discrepancy (D: dimensionality, N: population size, P: population, x_{i,j}: ith value of jth
individual)
• Centred L2-discrepancy (D: dimensionality, N: population size, P: population, x_{i,j}: ith
value of jth individual)
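
The formulas themselves are not present in this text. For reference, the widely used closed forms for points in the unit hypercube [0, 1]^D are given below in the slide's notation: Warnock's formula for the star L2-discrepancy and Hickernell's formula for the centred L2-discrepancy. Treat them as the standard textbook definitions; whether the slide used exactly this form (for example, squared or unsquared values) is an assumption.

```latex
% Star L2-discrepancy (Warnock's formula)
\left(D_2^{*}(P)\right)^2 = 3^{-D}
  - \frac{2^{1-D}}{N} \sum_{j=1}^{N} \prod_{i=1}^{D} \left(1 - x_{i,j}^{2}\right)
  + \frac{1}{N^{2}} \sum_{j=1}^{N} \sum_{k=1}^{N} \prod_{i=1}^{D}
    \left(1 - \max\left(x_{i,j},\, x_{i,k}\right)\right)

% Centred L2-discrepancy (Hickernell)
\left(CD(P)\right)^2 = \left(\frac{13}{12}\right)^{D}
  - \frac{2}{N} \sum_{j=1}^{N} \prod_{i=1}^{D}
    \left(1 + \tfrac{1}{2}\left|x_{i,j} - \tfrac{1}{2}\right|
            - \tfrac{1}{2}\left|x_{i,j} - \tfrac{1}{2}\right|^{2}\right)
  + \frac{1}{N^{2}} \sum_{j=1}^{N} \sum_{k=1}^{N} \prod_{i=1}^{D}
    \left(1 + \tfrac{1}{2}\left|x_{i,j} - \tfrac{1}{2}\right|
            + \tfrac{1}{2}\left|x_{i,k} - \tfrac{1}{2}\right|
            - \tfrac{1}{2}\left|x_{i,j} - x_{i,k}\right|\right)
```

Both expressions need only sums and products over the points, so they can be evaluated directly in O(N²D) operations.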

Questions…
Why did we choose discrepancy?
Discrepancy measures with analytic formulas are used in this study because:
• Discrepancy values are not affected by the features of the benchmark problems, the
employed EAs, or their parameters.
– Unlike final fitness value and success rate.
• Discrepancy measures can be easily applied to all kinds of real-valued populations.
– Unlike DieHard and TestU01, which can only be applied to stochastic populations.
• Discrepancy measures with analytic formulas are faster than similar
iterative/recursive algorithms (ideal for large and high-dimensional populations).
– Unlike early variants of Lp-discrepancy.
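
To make the point about analytic formulas concrete, here is a minimal standard-library sketch of the centred L2-discrepancy in its usual closed form (Hickernell's formula). It assumes the population has already been scaled to the unit hypercube; the function name and the example sizes are illustrative and not taken from the study's code.

```python
import math
import random

def centred_l2_discrepancy(points):
    """Centred L2-discrepancy of points in the unit hypercube [0, 1]^D.

    points is a list of N individuals, each a list of D coordinates.
    The cost is O(N^2 * D); no enumeration of sub-boxes is needed.
    """
    n = len(points)
    d = len(points[0])
    # Distance of every coordinate from the centre of its axis.
    dev = [[abs(x - 0.5) for x in p] for p in points]

    # Single-sum term of the formula.
    single = 0.0
    for a in dev:
        single += math.prod(1.0 + 0.5 * a_i - 0.5 * a_i * a_i for a_i in a)

    # Double-sum term over all ordered pairs of individuals.
    pair = 0.0
    for j in range(n):
        for k in range(n):
            pair += math.prod(
                1.0 + 0.5 * dev[j][i] + 0.5 * dev[k][i]
                - 0.5 * abs(points[j][i] - points[k][i])
                for i in range(d)
            )

    squared = (13.0 / 12.0) ** d - 2.0 * single / n + pair / (n * n)
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative round-off

# Hypothetical usage: 50 pseudo-random individuals in 10 dimensions.
rng = random.Random(0)
population = [[rng.random() for _ in range(10)] for _ in range(50)]
cd_value = centred_l2_discrepancy(population)
```

Because the measure depends only on the point coordinates, the same function applies unchanged to stochastic and deterministic populations.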

Experiments
Setup
• Six population initialization techniques are selected to study.
• Three stochastic and three deterministic techniques are included in the experiments.
• RNG, which is the most common and simplest initializer, is chosen as the control method.

Experiments…
Setup
• In both parts:
– 20 different dimension sizes are examined (2 ≤ D ≤ 1,000).
– 20 different population sizes are examined (10 ≤ N ≤ 10,000).
– Each experiment is run 25 times:
– 25 unique initial seeds are used for stochastic techniques
– 25 unique sequences are used for deterministic techniques (skip schema)
• Part A (baseline technique)
– Only performance of RNG is examined in different situations.
• Part B (advanced technique)
– The performance of advanced techniques is compared with the baseline (RNG).
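
The Part A protocol can be sketched as a small grid experiment: for each (D, N) pair, generate several seeded RNG populations and average their centred L2-discrepancy. The grid values and the reduced number of runs below are placeholders chosen to keep the sketch fast; they are not the 20 dimension sizes, 20 population sizes, or 25 runs used in the study, and the helper names are hypothetical.

```python
import math
import random
import statistics

def cd(points):
    """Centred L2-discrepancy (standard closed form) of points in [0, 1]^D."""
    n, d = len(points), len(points[0])
    dev = [[abs(x - 0.5) for x in p] for p in points]
    single = sum(math.prod(1 + 0.5 * a - 0.5 * a * a for a in row) for row in dev)
    pair = sum(
        math.prod(1 + 0.5 * dev[j][i] + 0.5 * dev[k][i]
                  - 0.5 * abs(points[j][i] - points[k][i]) for i in range(d))
        for j in range(n) for k in range(n)
    )
    return math.sqrt(max((13 / 12) ** d - 2 * single / n + pair / n ** 2, 0.0))

def rng_population(n, d, seed):
    """Baseline initializer: n uniform pseudo-random points in [0, 1]^d."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(d)] for _ in range(n)]

def mean_cd(n, d, runs):
    """Average CD over `runs` populations, one distinct seed per run."""
    return statistics.mean(cd(rng_population(n, d, seed)) for seed in range(runs))

DIMENSIONS = [2, 10, 30]   # placeholder grid, not the study's values
POP_SIZES = [10, 50]       # placeholder grid, not the study's values
results = {(d, n): mean_cd(n, d, runs=5) for d in DIMENSIONS for n in POP_SIZES}
```

Part B would follow the same loop with the advanced initializer swapped in for `rng_population` and each result compared against the RNG value for the same (D, N) pair.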

Outlines
1. Introduction
2. Background
3. Questions
4. Experiments
5. Results
– Part A
– Part B
6. Conclusion

Results
Part A – Dimensionality effect
• Discrepancy grows (i.e., uniformity drops) exponentially as the dimensionality increases.
– The discrepancy of 10,000 points in 50 dimensions is comparable with the discrepancy of 10 points
in 30 dimensions!
– A growth of about 67% in dimensionality (30 → 50) demands a 1,000-fold (roughly 100,000%) increase
in population size (10 → 10,000) to recover the uniformity!
• For D ≤ 50, a large population size may lessen the undesirable effect of dimensionality (see the
zoomed-in region of the graph).
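
The exponential trend has a simple analytic counterpart. For N independent uniform points in [0, 1]^D, the expected squared centred L2-discrepancy works out to ((5/4)^D − (13/12)^D) / N. This follows from the standard closed form of the measure and is offered here as a supporting sketch under that independence assumption; it is not a result reported in the slides, whose findings are empirical. The function names are hypothetical.

```python
import math

def expected_cd_squared(n, d):
    """Expected squared centred L2-discrepancy of n i.i.d. uniform points in [0, 1]^d."""
    return ((5.0 / 4.0) ** d - (13.0 / 12.0) ** d) / n

def population_size_for(target_cd_squared, d):
    """Smallest n whose expected squared discrepancy is at most the target."""
    return math.ceil(((5.0 / 4.0) ** d - (13.0 / 12.0) ** d) / target_cd_squared)

# Dimensionality enters through an exponential, population size only through 1/n.
few_points_30d = expected_cd_squared(10, 30)
many_points_50d = expected_cd_squared(10_000, 50)
needed_at_100d = population_size_for(few_points_30d, 100)
```

Under this model a 1,000-fold larger population in 50 dimensions ends up within roughly an order of magnitude (in squared discrepancy) of 10 points in 30 dimensions, and the population needed to match that level at D = 100 is far beyond anything that could be evaluated.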

Results
Part A – Low dimensions
• Population size has no considerable effect on the uniformity of very small-sized problems (D ≤ 10).
• For 30 ≤ D ≤ 50, population size has a significant effect on uniformity such that it can be improved
10 to 20 times in the CD scale.
• The magnitude of improvements falls rapidly such that increasing population size beyond 1,000
points shows only a minimal improvement.

Results
Part A – Medium dimensions
• An increase in population size significantly lessens the effect of dimensionality (especially for N ≤ 200).
• The magnitude of the improvements falls as the population grows.

Results
Part A – High dimensions
• Uniformity of populations in spaces of above 100 dimensions is so weak that increasing population
size from 1,000 to 10,000 cannot recover it.
• The feasible and reasonable population size for very large-scale problems (100 ≤ D) is surprisingly
less than 1,000 points.
• This does not imply that N has no effect for D > 100. Instead, it means N must be astronomically large to
achieve a significant enhancement. Since evaluating high-dimensional populations of that magnitude
is currently computationally infeasible, keeping N around 1,000 points is more practical and
reasonable.

Results
Part B – Improvement
Improvement over common technique:
• To compare advanced initialization techniques with a common RNG, we propose a
simple formula reflecting relative improvement achieved from each advanced
technique:
where P_c is the population generated by the control technique (RNG), P_i is the
population produced by the ith advanced initialization technique, and CD is the centred
L2-discrepancy.
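
The formula itself is not present in this text. One natural reading of the definitions, sketched below, is the percentage reduction in CD relative to the control: 100 × (CD(P_c) − CD(P_i)) / CD(P_c). This exact form is an assumption; it is chosen because it yields positive values when the advanced technique is more uniform than RNG and negative values when it is less uniform, which matches how the percentages are discussed in the results.

```python
def relative_improvement(cd_control, cd_advanced):
    """Percentage reduction in CD relative to the control (RNG) population.

    Assumed form: 100 * (CD(P_c) - CD(P_i)) / CD(P_c).
    Positive: the advanced technique is more uniform than RNG.
    Negative: the advanced technique is less uniform than RNG.
    """
    return 100.0 * (cd_control - cd_advanced) / cd_control

# Hypothetical CD values, for illustration only.
better = relative_improvement(cd_control=2.0, cd_advanced=1.0)
same = relative_improvement(cd_control=2.0, cd_advanced=2.0)
worse = relative_improvement(cd_control=1.0, cd_advanced=1.8)
```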

Results
Part B – Low dimensions
• Some techniques (TNT and SBL) succeed in improving on the common initializer (RNG), although
the biggest improvement in 2 ≤ D ≤ 50 is less than 20%.
• Some techniques (GLP) are very sensitive to population size; others (SBL) are more stable.
• For D ≤ 50, without exception, all techniques work relatively better as population size increases.
• Mixed good and bad results can be expected from both categories of initialization techniques*.
*B. Kazimipour, X. Li, and A. K. Qin. "Initialization methods for large scale global optimization." In Evolutionary Computation (CEC), 2013 IEEE Congress on, pp. 2750-2757. IEEE, 2013.

Results
Part B – Medium and High dimensions
• All trends converge to one of the three values: 0%, -25% and -80%.
• This clearly shows that employing advanced initialization techniques provides no significant
improvement in high dimensions, at least in terms of uniformity.

Results
• Even increasing population size from 10 to 10,000 does not result in any relative improvement.
• SBL with a population size of 10 and TNT with all population sizes perform almost the same as RNG.
• The others, however, perform poorly in comparison with an RNG having the same population size*.
* B. Kazimipour, X. Li, and A. K. Qin. "Effects of population initialization on differential evolution for large scale optimization." In Evolutionary Computation (CEC), 2014 IEEE
Congress on, pp. 2404-2411. IEEE, 2014.

Conclusion
What we did
• We investigated the reasons that cause advanced population initialization techniques to
perform as poorly as a simple RNG in high-dimensional spaces.
• We also studied the effect of population size on the quality (uniformity) of the resulting
populations.
• We studied:
– 6 techniques (3 deterministic and 3 stochastic),
– 20 dimension sizes (up to 1,000),
– 20 population sizes (up to 10,000),
– over 25 runs.

Conclusion
What we observed
• The uniformity of initial population drops exponentially when dimensionality rises
linearly.
• Increasing population size up to a computationally feasible bound cannot maintain
uniformity (except for some small and medium-sized spaces).
• The advanced initializers are as vulnerable to the curse of dimensionality as simple
RNG.
• Adopting advanced initializers in medium and large-scale spaces does not result in
any significant improvement.
• Some advanced techniques are even more sensitive to the adverse effect of
dimensionality than the simple RNG.
We recommend the use of advanced techniques only when the population and
dimension sizes are small. In higher-dimensional spaces, or when the population size
is relatively large, no significant improvement is expected from advanced techniques.

Thank you
☺☺☺☺
Any questions or comments?