Not all parents are equal for MO-CMA-ES

The Steady-State variants of the Multi-Objective Covariance Matrix Adaptation Evolution Strategy (SS-MO-CMA-ES) generate one offspring from a uniformly selected parent. This paper investigates other parent selection operators for SS-MO-CMA-ES. These operators rely on multi-objective rewards, estimating the expectation of the offspring's survival and of its hypervolume contribution. Two selection modes, respectively based on tournaments and inspired by the Multi-Armed Bandit framework, are used on top of these rewards. Extensive experimental validation comparatively demonstrates the merits of these new selection operators on unimodal MO problems.

Not all parents are equal for MO-CMA-ES

  1. Not all parents are equal for MO-CMA-ES. Ilya Loshchilov 1,2, Marc Schoenauer 1,2, Michèle Sebag 2,1. 1 TAO Project-team, INRIA Saclay - Île-de-France; 2 Laboratoire de Recherche en Informatique (UMR CNRS 8623), Université Paris-Sud, 91128 Orsay Cedex, France. Kindly presented by Jan Salmen, Universität Bochum, with apologies from the authors. Evolutionary Multi-Criterion Optimization - EMO 2011.
  2. Content. (1) Motivation: why are some parents better than others? What is the Multi-Objective Covariance Matrix Adaptation Evolution Strategy? (2) Parent Selection Operators: Tournament Selection (quality-based); Multi-armed Bandits; Reward-based Parent Selection. (3) Experimental validation: Results; Summary.
  3. Why are some parents better than others? [Figure: two objective-space sketches, Objective 1 vs. Objective 2.] Some segments of the Pareto front are hard to find, and some parents improve the fitness faster than others.
  4. Why MO-CMA-ES? µMO-(1+1)-CMA-ES ≡ µ × (1+1)-CMA-ES + global Pareto selection. CMA-ES is excellent on single-objective problems (e.g., BBOB). In µMO-(1+1)-CMA-ES, each individual is a (1+1)-CMA-ES 1. [Figure: two typical adaptations for MO-CMA-ES, shown in objective space.] 1 C. Igel, N. Hansen, and S. Roth (2005). "The Multi-objective Variable Metric Evolution Strategy".
  5. Evolution loop for steady-state MO-CMA-ES: Parent Selection → Variation → Evaluation → Environmental Selection → Update / Adaptation.
  6. Tournament Selection (quality-based). A total preorder relation ≻_X is defined on any finite subset X: x ≻_X y ⇔ PRank(x, X) < PRank(y, X) (lower Pareto rank), or PRank(x, X) = PRank(y, X) and ∆H(x, X) > ∆H(y, X) (same Pareto rank and higher hypervolume contribution). Tournament (µ +t 1) selection for MOO: Input: tournament size t ∈ ℕ and the population X of µ individuals. Procedure: uniformly select t individuals from X. Output: the best of the t individuals according to ≻_X. With t = 1, this is the standard steady-state MO-CMA-ES with random selection a. a C. Igel, T. Suttorp, and N. Hansen (EMO 2007). "Steady-state Selection and Efficient Covariance Matrix Update in the Multi-objective CMA-ES".
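A minimal Python sketch of this tournament operator, assuming each individual already carries its Pareto rank and hypervolume contribution (both computed elsewhere); the Individual class and its field names are illustrative, not taken from the paper.

```python
import random
from dataclasses import dataclass

@dataclass
class Individual:
    x: list                       # decision vector
    pareto_rank: int = 1          # lower is better (1 = non-dominated front)
    hv_contribution: float = 0.0  # hypervolume contribution within its front

def precedes(a: Individual, b: Individual) -> bool:
    """Preorder from the slide: lower Pareto rank wins,
    ties broken by larger hypervolume contribution."""
    if a.pareto_rank != b.pareto_rank:
        return a.pareto_rank < b.pareto_rank
    return a.hv_contribution > b.hv_contribution

def tournament_selection(population, t):
    """(mu +_t 1) parent selection: draw t individuals uniformly at random
    and return the best one; t = 1 recovers uniform random selection."""
    best = None
    for candidate in random.sample(population, t):
        if best is None or precedes(candidate, best):
            best = candidate
    return best
```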
  7. Multi-armed Bandits. Original Multi-Armed Bandit (MAB) problem: a gambler plays arm (machine) j at time t and wins reward r_{j,t} = 1 with probability p, 0 with probability (1 − p). Goal: maximize the sum of rewards. Upper Confidence Bound (UCB) [Auer, 2002]. Initialization: play each arm once. Loop: play the arm j that maximizes r̄_{j,t} + C √(2 ln Σ_k n_{k,t} / n_{j,t}), where r̄_{j,t} is the average reward of arm j and n_{j,t} is the number of plays of arm j.
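A small sketch of the UCB rule above; the list-based bookkeeping and the default C = 1.0 are illustrative choices, not prescribed by the slide.

```python
import math

def ucb_select(avg_reward, n_plays, C=1.0):
    """Return the index of the arm to play next.
    avg_reward[j]: average reward of arm j so far.
    n_plays[j]:    number of times arm j has been played."""
    # Initialization phase: play every arm once before trusting the scores.
    for j, n in enumerate(n_plays):
        if n == 0:
            return j
    total = sum(n_plays)
    scores = [avg_reward[j] + C * math.sqrt(2.0 * math.log(total) / n_plays[j])
              for j in range(len(n_plays))]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, ucb_select([0.2, 0.5], [3, 4]) trades off the higher average of the second arm against the larger exploration bonus of the first.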
  8. Reward-based Parent Selection: MAB for MOO. r̄_{j,t} is the average reward along a time window of size w; n_{i,t} = 0 for a new offspring or for an individual last selected w steps ago. Parent selection: select some i with n_{i,t} = 0 if one exists; otherwise select i = Argmax_j { r̄_{j,t} + C √(2 ln Σ_k n_{k,t} / n_{j,t}) }. At the moment, C = 0 (exploration iff n_{i,t} = 0). Loop: 1. ParentSelection(); 2. mutation; 3. Is the offspring successful? 4. Update; 5. ComputeRewards(). [Figure: one iteration illustrated in objective space.]
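A rough sketch of this selection rule; the (time_step, reward) bookkeeping and all names are illustrative, and with the slide's choice C = 0 the exploitation step reduces to picking the largest windowed average reward.

```python
import math

def select_parent(history, t, w, C=0.0):
    """history: dict mapping individual id -> list of (time_step, reward) pairs.
    t: current time step; w: window size.
    An individual with no reward inside the window (a brand-new offspring, or one
    last selected more than w steps ago) has n_{i,t} = 0 and is explored first."""
    windowed = {i: [r for (ts, r) in h if t - ts < w] for i, h in history.items()}
    unplayed = [i for i, rs in windowed.items() if not rs]
    if unplayed:
        return unplayed[0]                      # exploration: n_{i,t} = 0
    total = sum(len(rs) for rs in windowed.values())
    def score(i):
        rs = windowed[i]
        return sum(rs) / len(rs) + C * math.sqrt(2.0 * math.log(total) / len(rs))
    return max(windowed, key=score)             # exploitation (UCB score, C = 0 here)
```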
  9. Defining Rewards I: (µ + 1_succ), (µ + 1_rank). Parent a from the population Q^g generates offspring a'. Both the offspring and the parent receive the reward r. (µ + 1_succ): r = 1 if a' becomes a member of the new population Q^(g+1), and 0 otherwise. (µ + 1_rank): a smoother reward is defined by the rank of a' in Q^(g+1): r = 1 − rank(a')/µ if a' ∈ Q^(g+1), and 0 otherwise.
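Both rewards follow directly from the outcome of the environmental selection; a short sketch (ranks are taken as 1-based here, which the slide does not specify).

```python
def success_and_rank_rewards(survived, offspring_rank, mu):
    """survived: True if the offspring entered Q^{g+1}.
    offspring_rank: rank of the offspring in Q^{g+1} (1 = best), used only if it survived.
    Returns the (mu + 1_succ) and (mu + 1_rank) rewards for the parent/offspring pair."""
    r_succ = 1.0 if survived else 0.0
    r_rank = (1.0 - offspring_rank / mu) if survived else 0.0
    return r_succ, r_rank
```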
  10. Defining Rewards II: (µ + 1_∆H1), (µ + 1_∆Hi). (µ + 1_∆H1): set the reward to the increase of the total hypervolume contribution from generation g to g + 1: r = 0 if the offspring is dominated, and r = Σ_{a ∈ Q^(g+1)} ∆H(a, Q^(g+1)) − Σ_{a ∈ Q^(g)} ∆H(a, Q^(g)) otherwise. (µ + 1_∆Hi): a relaxation of the above reward, involving a rank-based penalization: r = (1 / 2^(k−1)) [ Σ_{a ∈ ndom_k(Q^(g+1))} ∆H(a, ndom_k(Q^(g+1))) − Σ_{a ∈ ndom_k(Q^(g))} ∆H(a, ndom_k(Q^(g))) ], where k denotes the Pareto rank of the current offspring and ndom_k(Q^(g)) is the k-th non-dominated front of Q^(g).
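Both rewards are built from per-individual hypervolume contributions ∆H(a, ·). For two minimization objectives these contributions can be obtained with the simple sweep sketched below; the reference point and function name are arbitrary, mutual non-dominance of the input points is assumed, and the general many-objective case needs a dedicated hypervolume algorithm.

```python
def hv_contributions_2d(front, ref):
    """Exclusive hypervolume contribution of each point of a bi-objective front
    (minimization). front: list of (f1, f2) points assumed mutually non-dominated;
    ref: reference point, e.g. (1.1, 1.1) on a normalized problem."""
    pts = sorted(front)                        # ascending f1 => descending f2 on a front
    contrib = {}
    for i, (f1, f2) in enumerate(pts):
        right_f1 = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        upper_f2 = pts[i - 1][1] if i > 0 else ref[1]
        contrib[(f1, f2)] = max(0.0, right_f1 - f1) * max(0.0, upper_f2 - f2)
    return contrib

# The (mu + 1_DeltaH1) reward is then the change of the summed contributions:
# r = sum(hv_contributions_2d(new_front, ref).values()) \
#     - sum(hv_contributions_2d(old_front, ref).values())
```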
  11. Experimental validation. Algorithms: the steady-state MO-CMA-ES with modified parent selection: 2 tournament-based, (µ +2 1) and (µ +10 1); 4 MAB-based, (µ + 1_succ), (µ + 1_rank), (µ + 1_∆H1) and (µ + 1_∆Hi). Baseline MO-CMA-ES: the steady-state (µ + 1)-MO-CMA and (µ< + 1)-MO-CMA, and the generational (µ + µ)-MO-CMA. Default parameters (µ = 100), 200,000 evaluations, 31 runs. Benchmark problems: sZDT1:3-6, with the true Pareto front shifted in decision space (xi ← |xi − 0.5| for 2 ≤ i ≤ n); IHR1:3-6, rotated variants of the original ZDT problems; LZ09-1:5, with complicated Pareto sets in decision space 2. 2 H. Li and Q. Zhang. "Multiobjective Optimization Problems With Complicated Pareto Sets, MOEA/D and NSGA-II." IEEE TEC 2009.
  12. Results: sZDT1, IHR1, LZ3 and LZ4. [Figure: ∆H_target versus number of function evaluations (×10^4) for (µ + 1), (µ< + 1), (µ +10 1), (µ + 1_∆Hi), (µ + 1_succ) and (µ + 1_rank) on the sZDT1, IHR1, LZ09-3 and LZ09-4 problems.]
  13. Results: (µ + 1), (µ +10 1), (µ + 1_rank) on LZ problems. [Figure: populations in decision space (x1, x2, x3) for the three algorithms on LZ09-1 through LZ09-5.]
  14. Loss of diversity. [Figure: objective-space plots for sZDT2 (left) and IHR3 (right), showing all populations of MO-CMA-ES, the true Pareto front, and the Pareto front of MO-CMA-ES.] Typical behavior of (µ + 1_succ)-MO-CMA on the sZDT2 and IHR3 problems after 5,000 fitness function evaluations.
  15. Summary. Speed-up (factor 2-5) with the MAB schemes (µ + 1_rank) and (µ + 1_∆Hi). Loss of diversity, especially on multi-modal problems. Too greedy schemes: (µ + 1_succ) and (µ +t 1). Perspectives: find an "appropriate" C for the exploration term r̄_{j,t} + C √(2 ln Σ_k n_{k,t} / n_{j,t}); allocate some budget of evaluations to dominated arms (individuals) generated in the past, to preserve diversity; integrate the reward mechanism and the update rules of (1+1)-CMA-ES, e.g. success rate and average reward in (µ + 1_rank); experiments on many-objective problems.
  16. Thank you for your attention! Please send your questions to {Ilya.Loshchilov,Marc.Schoenauer,Michele.Sebag}@inria.fr.
