Not all parents are equal for MO-CMA-ES

Abstract. The Steady State variants of the Multi-Objective Covariance Matrix Adaptation Evolution Strategy (SS-MO-CMA-ES) generate one offspring from a uniformly selected parent. This paper investigates several other parental selection operators for SS-MO-CMA-ES. These operators rely on multi-objective rewards, estimating the expectation of the offspring's survival and its hypervolume contribution. Two selection modes, respectively based on tournaments and inspired by the Multi-Armed Bandit framework, are used on top of these rewards. Extensive experimental validation comparatively demonstrates the merits of these new selection operators on unimodal MO problems.

    Presentation Transcript

    • Not all parents are equal for MO-CMA-ES. Ilya Loshchilov (1,2), Marc Schoenauer (1,2), Michèle Sebag (2,1). (1) TAO Project-team, INRIA Saclay - Île-de-France; (2) Laboratoire de Recherche en Informatique (UMR CNRS 8623), Université Paris-Sud, 91128 Orsay Cedex, France. Kindly presented by Jan Salmen, Universität Bochum, with apologies from the authors. Evolutionary Multi-Criterion Optimization (EMO 2011).
    • Content: 1. Motivation (Why are some parents better than others? What is the Multi-Objective Covariance Matrix Adaptation Evolution Strategy?); 2. Parent Selection Operators (Tournament Selection (quality-based), Multi-Armed Bandits, Reward-based Parent Selection); 3. Experimental validation (Results, Summary).
    • Why are some parents better than others? [Figure: two objective-space sketches, Objective 1 vs. Objective 2.] Some segments of the Pareto front are hard to find, and some parents improve the fitness faster than others.
    • Why MO-CMA-ES? µ_MO-(1+1)-CMA-ES ≡ µ × (1+1)-CMA-ES + global Pareto selection. CMA-ES is excellent on single-objective problems (e.g., BBOB). In µ_MO-(1+1)-CMA-ES, each individual is a (1+1)-CMA-ES [1]. [Figure: two typical adaptations of MO-CMA-ES, shown in objective space.] [1] C. Igel, N. Hansen, and S. Roth (2005), "The Multi-objective Variable Metric Evolution Strategy".
    • Evolution loop for steady-state MO-CMA-ES: Parent Selection → Variation → Evaluation → Environmental Selection → Update / Adaptation. A minimal sketch of this loop is given below.
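
A minimal, self-contained Python sketch of this steady-state loop (an illustration under simplifying assumptions, not the authors' implementation): parent selection is uniform, an isotropic Gaussian mutation stands in for the (1+1)-CMA-ES sampling and adaptation, environmental selection is a crude dominance-count truncation instead of the Pareto-rank/hypervolume criterion of the following slides, and the toy objective function and all parameter values are illustrative.

```python
import random

def evaluate(x):
    """Toy bi-objective minimization problem (an assumption, not from the slides)."""
    f1 = sum(xi ** 2 for xi in x)
    f2 = sum((xi - 1.0) ** 2 for xi in x)
    return (f1, f2)

def dominates(fa, fb):
    """Pareto dominance for minimization."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def steady_state_loop(mu=10, dim=5, evaluations=1000, sigma=0.1):
    pop = [[random.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(mu)]
    fits = [evaluate(x) for x in pop]
    for _ in range(evaluations):
        # 1. Parent selection (uniform here; the talk replaces exactly this step).
        j = random.randrange(mu)
        # 2. Variation: isotropic Gaussian mutation as a stand-in for (1+1)-CMA-ES.
        child = [xi + random.gauss(0.0, sigma) for xi in pop[j]]
        # 3. Evaluation.
        child_fit = evaluate(child)
        # 4. Environmental selection: drop the individual dominated by most others.
        pop.append(child)
        fits.append(child_fit)
        dom_counts = [sum(dominates(other, f) for other in fits) for f in fits]
        worst = max(range(len(pop)), key=lambda i: dom_counts[i])
        pop.pop(worst)
        fits.pop(worst)
        # 5. Step-size / covariance update of the selected (1+1)-CMA-ES would go here.
    return pop, fits

# Example: final_pop, final_fits = steady_state_loop()
```
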
    • Tournament Selection (quality-based). A total preorder ≻_X is defined on any finite subset X: x ≻_X y ⇔ PRank(x, X) < PRank(y, X) (lower Pareto rank), or PRank(x, X) = PRank(y, X) and ∆H(x, X) > ∆H(y, X) (same Pareto rank and higher hypervolume contribution). Tournament (µ +_t 1) selection for MOO: Input: tournament size t ∈ ℕ and a population X of µ individuals. Procedure: uniformly select t individuals from X. Output: the best of the t individuals according to ≻_X. With t = 1 this is the standard steady-state MO-CMA-ES with random selection [C. Igel, T. Suttorp, and N. Hansen (EMO 2007), "Steady-state Selection and Efficient Covariance Matrix Update in the Multi-objective CMA-ES"]. A hedged code sketch follows below.
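
A hedged Python sketch of this tournament selection for two objectives (minimization), assuming a standard non-dominated sorting for Pareto ranks and the usual 2-D exclusive hypervolume contribution as ∆H; the function names and the reference point are illustrative choices, not from the slides.

```python
import random

def dominates(fa, fb):
    """Pareto dominance for minimization."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def pareto_ranks(F):
    """Non-dominated sorting: rank 0 is the non-dominated front, rank 1 the next, ..."""
    ranks, remaining, r = [None] * len(F), set(range(len(F))), 0
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)]
        for i in front:
            ranks[i] = r
        remaining -= set(front)
        r += 1
    return ranks

def hv_contributions_2d(front, ref):
    """Exclusive hypervolume contribution of each point of a 2-objective
    non-dominated front, w.r.t. reference point ref (minimization)."""
    order = sorted(range(len(front)), key=lambda i: front[i][0])
    contrib = [0.0] * len(front)
    for k, i in enumerate(order):
        f1, f2 = front[i]
        right_f1 = front[order[k + 1]][0] if k + 1 < len(order) else ref[0]
        upper_f2 = front[order[k - 1]][1] if k > 0 else ref[1]
        contrib[i] = max(0.0, right_f1 - f1) * max(0.0, upper_f2 - f2)
    return contrib

def tournament_select(F, t, ref=(11.0, 11.0)):
    """(mu +_t 1) parent selection: pick t individuals uniformly and return the
    index of the best one under the preorder (lower Pareto rank, then larger dH)."""
    ranks = pareto_ranks(F)
    contribs = [0.0] * len(F)
    for r in set(ranks):                      # dH is computed front by front
        idx = [i for i in range(len(F)) if ranks[i] == r]
        for i, c in zip(idx, hv_contributions_2d([F[i] for i in idx], ref)):
            contribs[i] = c
    best = None
    for i in random.sample(range(len(F)), t):
        if best is None or (ranks[i], -contribs[i]) < (ranks[best], -contribs[best]):
            best = i
    return best

# Example: tournament_select([(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0)], t=2)
```
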
    • Multi-Armed Bandits. Original Multi-Armed Bandit (MAB) problem: a gambler plays arm (machine) j at time t and wins reward r_{j,t} = 1 with probability p, r_{j,t} = 0 with probability 1 − p. Goal: maximize the sum of rewards. Upper Confidence Bound (UCB) [Auer, 2002]. Initialization: play each arm once. Loop: play the arm j that maximizes $\bar{r}_{j,t} + C \sqrt{\frac{2 \ln \sum_k n_{k,t}}{n_{j,t}}}$, where $\bar{r}_{j,t}$ is the average reward of arm j and $n_{j,t}$ is the number of plays of arm j. A toy implementation is sketched below.
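
A toy Python implementation of this UCB rule on Bernoulli arms; the arm success probabilities in the example and the value C = 1.0 are illustrative assumptions, not from the slides.

```python
import math
import random

def ucb_play(success_probs, steps, C=1.0):
    """UCB as on the slide: play each arm once, then repeatedly play the arm j
    maximizing  avg_reward_j + C * sqrt(2 * ln(sum_k n_k) / n_j)."""
    K = len(success_probs)
    n = [0] * K          # number of plays per arm
    s = [0.0] * K        # sum of rewards per arm
    total = 0.0
    for t in range(steps):
        if t < K:
            j = t        # initialization: play each arm once
        else:
            N = sum(n)
            j = max(range(K),
                    key=lambda a: s[a] / n[a] + C * math.sqrt(2.0 * math.log(N) / n[a]))
        r = 1.0 if random.random() < success_probs[j] else 0.0   # Bernoulli reward
        n[j] += 1
        s[j] += r
        total += r
    return total, n

# Example: three arms with success probabilities 0.2, 0.5 and 0.8.
# total_reward, plays = ucb_play([0.2, 0.5, 0.8], steps=1000)
```
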
    • Reward-based Parent Selection: MAB for MOO. Here $\bar{r}_{j,t}$ is the average reward over a time window of size w, and $n_{i,t} = 0$ for a new offspring or for an individual last selected w steps ago. Selection rule: if an individual i with $n_{i,t} = 0$ exists, select it as parent; otherwise select $i = \mathrm{Argmax}_j \left( \bar{r}_{j,t} + C \sqrt{\frac{2 \ln \sum_k n_{k,t}}{n_{j,t}}} \right)$. At the moment C = 0, i.e. exploration happens only when $n_{i,t} = 0$. One iteration: 1. parent = ParentSelection(); 2. offspring = parent + mutation; 3. is the offspring successful? 4. update the population and strategy parameters; 5. ComputeRewards(). [Figure: this iteration illustrated in objective space.] A sketch of this selection rule follows below.
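
A minimal Python sketch of this selection rule, under the assumption that each individual keeps its own window of its last w rewards (a simplification of the slide's global time window); the class and method names are mine, not the authors'.

```python
import math
from collections import deque

class BanditParentSelection:
    """Parent selection as on the slide: individuals with no reward history
    (n_i = 0) are selected first; otherwise maximize the UCB-style score
    avg_reward_i + C * sqrt(2 * ln(sum_k n_k) / n_i).  The slide uses C = 0,
    so exploration only happens through n_i = 0."""

    def __init__(self, window=50, C=0.0):
        self.window = window
        self.C = C
        self.rewards = {}                     # individual id -> recent rewards

    def register(self, ind_id):
        """New offspring enter with an empty history, i.e. n_i = 0."""
        self.rewards[ind_id] = deque(maxlen=self.window)

    def remove(self, ind_id):
        """Called when an individual is discarded by environmental selection."""
        self.rewards.pop(ind_id, None)

    def add_reward(self, ind_id, r):
        if ind_id in self.rewards:
            self.rewards[ind_id].append(r)

    def select(self):
        fresh = [i for i, h in self.rewards.items() if len(h) == 0]
        if fresh:
            return fresh[0]
        total_plays = sum(len(h) for h in self.rewards.values())
        def score(i):
            h = self.rewards[i]
            bonus = self.C * math.sqrt(2.0 * math.log(total_plays) / len(h))
            return sum(h) / len(h) + bonus
        return max(self.rewards, key=score)
```
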
    • Defining Rewards I: (µ + 1_succ), (µ + 1_rank). Parent a from population Q^(g) generates offspring a′; both the offspring and the parent receive reward r. (µ + 1_succ): if a′ becomes a member of the new population Q^(g+1), then r = 1, and r = 0 otherwise. (µ + 1_rank): a smoother reward defined by the rank of a′ in Q^(g+1): r = 1 − rank(a′)/µ if a′ ∈ Q^(g+1), and 0 otherwise. Both rewards are sketched below.
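
A small Python sketch of these two rewards; the 0-based rank convention is my assumption, and, as on the slide, the returned reward r would be credited to both the offspring and its parent.

```python
def success_reward(offspring_id, new_population):
    """(mu + 1_succ): r = 1 if the offspring survived environmental selection."""
    return 1.0 if offspring_id in new_population else 0.0

def rank_reward(offspring_id, ranked_new_population):
    """(mu + 1_rank): r = 1 - rank(a') / mu if the offspring survived, 0 otherwise.
    ranked_new_population lists the mu survivors from best to worst (rank 0 = best)."""
    if offspring_id not in ranked_new_population:
        return 0.0
    mu = len(ranked_new_population)
    return 1.0 - ranked_new_population.index(offspring_id) / mu
```
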
    • Defining Rewards II: (µ + 1_∆H1), (µ + 1_∆Hi). (µ + 1_∆H1): set the reward to the increase of the total hypervolume contribution from generation g to g + 1: r = 0 if the offspring is dominated, and r = Σ_{a ∈ Q^(g+1)} ∆H(a, Q^(g+1)) − Σ_{a ∈ Q^(g)} ∆H(a, Q^(g)) otherwise. (µ + 1_∆Hi): a relaxation of the above reward with a rank-based penalization: r = (1 / 2^(k−1)) · [Σ_{a ∈ ndom_k(Q^(g+1))} ∆H(a, ndom_k(Q^(g+1))) − Σ_{a ∈ ndom_k(Q^(g))} ∆H(a, ndom_k(Q^(g)))], where k denotes the Pareto rank of the current offspring and ndom_k(Q^(g)) is the k-th non-dominated front of Q^(g). A 2-objective sketch of the ∆H1 reward is given below.
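
A 2-objective Python sketch of the (µ + 1_∆H1) reward (minimization). The reference point and the restriction of the sum to the non-dominated individuals are my simplifications; the (µ + 1_∆Hi) relaxation is only hinted at in a comment.

```python
def dominates(fa, fb):
    """Pareto dominance for minimization."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def hv_contributions_2d(front, ref):
    """Exclusive hypervolume contribution of each point of a 2-objective
    non-dominated front, w.r.t. reference point ref (minimization)."""
    order = sorted(range(len(front)), key=lambda i: front[i][0])
    contrib = [0.0] * len(front)
    for k, i in enumerate(order):
        f1, f2 = front[i]
        right_f1 = front[order[k + 1]][0] if k + 1 < len(order) else ref[0]
        upper_f2 = front[order[k - 1]][1] if k > 0 else ref[1]
        contrib[i] = max(0.0, right_f1 - f1) * max(0.0, upper_f2 - f2)
    return contrib

def nondominated(pop):
    return [f for f in pop if not any(dominates(g, f) for g in pop if g != f)]

def delta_h1_reward(old_pop, new_pop, offspring_fit, ref=(11.0, 11.0)):
    """(mu + 1_dH1): increase of the summed hypervolume contributions from
    generation g to g+1; 0 if the offspring is dominated in the new population.
    The (mu + 1_dHi) variant applies the same difference on the k-th front only,
    scaled by 1 / 2**(k - 1), where k is the offspring's Pareto rank."""
    if any(dominates(f, offspring_fit) for f in new_pop):
        return 0.0
    old_sum = sum(hv_contributions_2d(nondominated(old_pop), ref))
    new_sum = sum(hv_contributions_2d(nondominated(new_pop), ref))
    return new_sum - old_sum
```
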
    • Experimental validation. Algorithms: the steady-state MO-CMA-ES with modified parent selection, namely 2 tournament-based variants, (µ +_2 1) and (µ +_10 1), and 4 MAB-based variants, (µ + 1_succ), (µ + 1_rank), (µ + 1_∆H1) and (µ + 1_∆Hi); the baseline MO-CMA-ES: two steady-state (µ + 1)-MO-CMA variants and the generational (µ + µ)-MO-CMA. Default parameters (µ = 100), 200,000 evaluations, 31 runs. Benchmark problems: sZDT1:3-6, with the true Pareto front shifted in decision space (x_i ← |x_i − 0.5| for 2 ≤ i ≤ n); IHR1:3-6, rotated variants of the original ZDT problems; LZ09-1:5, with a complicated Pareto set in decision space [2]. [2] H. Li and Q. Zhang, "Multiobjective Optimization Problems With Complicated Pareto Sets, MOEA/D and NSGA-II," IEEE TEC 2009.
    • Results: sZDT1, IHR1, LZ09-3 and LZ09-4. [Figure: convergence plots of the distance to the target hypervolume (∆H_target) versus the number of function evaluations (up to 10^5) for the baseline and the new selection schemes on sZDT1, IHR1, LZ09-3 and LZ09-4.]
    • Results: (µ + 1), (µ +_10 1) and (µ + 1_rank) on LZ problems. [Figure: decision-space plots (x1, x2, x3) of the populations obtained by the three algorithms on LZ09-1 to LZ09-5.]
    • Loss of diversity. [Figure: objective-space plots showing all populations of MO-CMA-ES, the true Pareto front and the Pareto front reached by MO-CMA-ES on sZDT2 (left) and IHR3 (right).] Typical behavior of (µ + 1_succ)-MO-CMA on the sZDT2 (left) and IHR3 (right) problems after 5,000 fitness function evaluations.
    • Summary. The MAB schemes (µ + 1_rank) and (µ + 1_∆Hi) yield a speed-up by a factor of 2 to 5. Loss of diversity is observed, especially on multi-modal problems. The (µ + 1_succ) and (µ +_t 1) schemes are too greedy. Perspectives: find an "appropriate" C for the exploration term $\bar{r}_{j,t} + C \sqrt{\frac{2 \ln \sum_k n_{k,t}}{n_{j,t}}}$; allocate some evaluation budget to dominated arms (individuals generated in the past) to preserve diversity; integrate the reward mechanism with the update rules of (1+1)-CMA-ES, e.g. success rate and average reward in (µ + 1_rank); experiment on many-objective problems.
    • Thank you for your attention! Please send your questions to {Ilya.Loshchilov,Marc.Schoenauer,Michele.Sebag}@inria.fr.