Fast Evolutionary Optimization: Comparison-Based Optimizers Need Comparison-Based Surrogates. PPSN 2010

Taking inspiration from approximate ranking, this paper investigates the use of rank-based Support Vector Machines as surrogate models within CMA-ES, enforcing the invariance of the approach with respect to monotone transformations of the fitness function. Whereas the choice of the SVM kernel is known to be a critical issue, the proposed approach uses the covariance matrix adapted by CMA-ES within a Gaussian kernel, ensuring that the kernel adapts to the currently explored region of the fitness landscape at almost no computational overhead. Empirical validation on standard benchmarks, against CMA-ES and recent surrogate-assisted variants of CMA-ES, demonstrates the efficiency and scalability of the proposed approach.


  1. Comparison-Based Optimizers Need Comparison-Based Surrogates
     Ilya Loshchilov (1,2), Marc Schoenauer (1,2), Michèle Sebag (2,1)
     (1) TAO Project-team, INRIA Saclay - Île-de-France
     (2) Laboratoire de Recherche en Informatique (UMR CNRS 8623), Université Paris-Sud, 91128 Orsay Cedex, France
     ACM-ES = CMA-ES + RankSVM
  2. Content
     1. Motivation: Why Comparison-Based Surrogates? Previous Work
     2. Background: Covariance Matrix Adaptation Evolution Strategy (CMA-ES); Support Vector Machine (SVM)
     3. ACM-ES: Algorithm; Experiments
  3. Why Comparison-Based Surrogates?
     Surrogate-model (meta-model) assisted optimization:
     - Construct an approximation model $M(x)$ of $f(x)$.
     - Optimize the model $M(x)$ in lieu of $f(x)$ to reduce the number of costly evaluations of $f(x)$.
     Example: $f(x) = x_1^2 + (x_1 + x_2)^2$. An efficient Evolutionary Algorithm (EA) with surrogate models may be 4.3 times faster on $f(x)$, but the same EA is only 2.4 times faster on $f(x)^{1/4}$, although this monotone transformation leaves the ranking of all candidate solutions unchanged (see the sketch below).
     (Speed-ups measured for CMA-ES with quadratic meta-models, lmm-CMA-ES, on 2-D $f_{\mathrm{Schwefel}}$.)
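A quick illustrative check of the point above (not from the paper's code): a monotone transformation such as $g = f^{1/4}$ changes fitness values but not the ranking, which is all a comparison-based optimizer like CMA-ES actually uses.

```python
import numpy as np

# The 2-D example function from the slide: f(x) = x1^2 + (x1 + x2)^2 >= 0.
def f(x):
    return x[0] ** 2 + (x[0] + x[1]) ** 2

rng = np.random.default_rng(0)
points = rng.normal(size=(5, 2))
values = np.array([f(x) for x in points])

ranks_f = np.argsort(values)           # ranking under f
ranks_g = np.argsort(values ** 0.25)   # ranking under g = f^(1/4)

assert (ranks_f == ranks_g).all()      # identical: g is monotone in f
```

A regression surrogate fits the transformed values and degrades, while a rank-based surrogate sees exactly the same training signal in both cases.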
  4. Ordinal Regression in Evolutionary Computation
     Goal: find a function $F(x)$ which preserves the ordering of the training points $x_i$ ($x_i$ has rank $i$): $x_i \succ x_j \Leftrightarrow F(x_i) > F(x_j)$.
     Such an $F(x)$ is invariant to any rank-preserving transformation of the fitness.¹
     [Figure: mean fitness vs. number of function evaluations on Rosenbrock for n = 2, 5 and 10, comparing CMA-ES with Rank SVM surrogates: polynomial kernel (d = 2 and d = 4) and RBF kernel (γ = 1).]
     ¹ T. Runarsson (2006). "Ordinal Regression in Evolutionary Computation"
  5. Exploit the Local Topography of the Function
     CMA-ES adapts the covariance matrix $C$, which describes the local structure of the function.
     Mahalanobis (fully weighted Euclidean) distance: $d(x_i, x_j) = \sqrt{(x_i - x_j)^T C^{-1} (x_i - x_j)}$
     Results of CMA-ES with quadratic meta-models (lmm-CMA-ES):²
     - Speed-up: a factor of 2-4 for n ≥ 4
     - Rank-preserving invariance: no
     - Complexity: grows from $O(n^4)$ to $O(n^6)$; becomes intractable for n > 16
     [Figure: FLOP per saved function evaluation vs. n for lmm-CMA-ES, n = 2, 4, 8, 16.]
     ² S. Kern et al. (2006). "Local Meta-Models for Optimization Using Evolution Strategies"
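A minimal sketch of the distance above (the helper name is ours, not from the paper's code): the Mahalanobis distance induced by the covariance matrix $C$ that CMA-ES adapts, i.e. Euclidean distance measured in the coordinate system of the current search distribution.

```python
import numpy as np

def mahalanobis(x_i, x_j, C):
    # Distance weighted by C^{-1}; solve() avoids forming the inverse explicitly.
    d = x_i - x_j
    return float(np.sqrt(d @ np.linalg.solve(C, d)))

C = np.array([[4.0, 1.0], [1.0, 1.0]])  # an example positive-definite covariance
print(mahalanobis(np.array([1.0, 0.0]), np.zeros(2), C))
```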
  6. Tractable or Efficient?
     Answer: tractable and efficient and invariant.
     Ingredients: CMA-ES (Adaptive Encoding) and Rank SVM.
  7. Covariance Matrix Adaptation Evolution Strategy: Decompose to Understand
     While CMA-ES is, by definition, CMA plus ES, this algorithmic decomposition has only recently been made explicit.³

     Algorithm 1: CMA-ES = Adaptive Encoding + ES
       1: $x_i \leftarrow m + \sigma N_i(0, I)$, for $i = 1 \ldots \lambda$
       2: $f_i \leftarrow f(B x_i)$, for $i = 1 \ldots \lambda$
       3: if Evolution Strategy (ES) then
       4:   $\sigma \leftarrow \sigma \cdot \exp^{\propto}\!\left(\frac{\text{success rate}}{\text{expected success rate}} - 1\right)$
       5: if Cumulative Step-Size Adaptation ES (CSA-ES) then
       6:   $\sigma \leftarrow \sigma \cdot \exp^{\propto}\!\left(\frac{\|\text{evolution path}\|}{\text{expected }\|\text{evolution path}\|} - 1\right)$
       7: $B \leftarrow \text{AECMA-Update}(B x_1, \ldots, B x_\mu)$

     ³ N. Hansen (2008). "Adaptive Encoding: How to Render Search Coordinate System Invariant"
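A sketch of Algorithm 1 in code (not Hansen's reference implementation): a simple ES with a 1/5th-success-rule step size run in the coordinate system defined by B. The mean update and the success statistic here are illustrative simplifications; `ae_update` stands for the AECMA-Update of reference [3] and is left abstract.

```python
import numpy as np

def es_with_adaptive_encoding(f, m, sigma, B, lam, mu, ae_update, steps):
    f_best = np.inf
    for _ in range(steps):
        xs = [m + sigma * np.random.randn(len(m)) for _ in range(lam)]  # line 1
        fs = [f(B @ x) for x in xs]                                     # line 2
        order = np.argsort(fs)
        success = np.mean([fi < f_best for fi in fs])      # observed success rate
        sigma *= np.exp(success - 1.0 / 5.0)               # lines 3-4, target 1/5
        f_best = min(f_best, fs[order[0]])
        m = xs[order[0]]                                   # move to the best sample
        B = ae_update(B, [B @ xs[i] for i in order[:mu]])  # line 7, AECMA-Update
    return m, sigma, B
```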
  8. Adaptive Encoding: Inspired by Principal Component Analysis (PCA)
     [Figure: side-by-side illustration of Principal Component Analysis and the Adaptive Encoding update.]
  9. Support Vector Machine for Classification: Linear Classifier
     Main idea. Training data: $D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^p,\ y_i \in \{-1, +1\}\}_{i=1}^{n}$
     $\langle w, x_i \rangle \geq b + \epsilon \Rightarrow y_i = +1$; $\langle w, x_i \rangle \leq b - \epsilon \Rightarrow y_i = -1$.
     Dividing by $\epsilon > 0$:
     $\langle w, x_i \rangle - b \geq +1 \Rightarrow y_i = +1$; $\langle w, x_i \rangle - b \leq -1 \Rightarrow y_i = -1$.
     Optimization problem, primal form:
     Minimize over $(w, \xi)$: $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$
     subject to: $y_i(\langle w, x_i \rangle - b) \geq 1 - \xi_i$, $\xi_i \geq 0$.
     [Figure: linear classifier with margin $2/\|w\|$, hyperplanes $\langle w, x \rangle - b = 0, \pm 1$, and the support vectors.]
  10. Support Vector Machine for Classification: Linear Classifier
      Optimization problem, dual form. From the Lagrange theorem, instead of minimizing $F$ alone, consider $F - \sum_i \alpha_i G_i$ with multipliers $\alpha_i \geq 0$ for the constraints $G_i \geq 0$.
      Leaving out the details, the dual form reads:
      Maximize over $\alpha$: $\sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$
      subject to: $0 \leq \alpha_i \leq C$, $\sum_{i=1}^{n} \alpha_i y_i = 0$.
      Properties:
      - Decision function: $F(x) = \mathrm{sign}\left(\sum_{i=1}^{n} \alpha_i y_i \langle x_i, x \rangle - b\right)$.
      - The dual form may be solved using a standard quadratic programming solver (see the sketch below).
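A usage sketch with scikit-learn's SVC as a stand-in for the "standard quadratic programming solver" mentioned above; the toy data is ours, not from the paper.

```python
import numpy as np
from sklearn.svm import SVC  # off-the-shelf solver for exactly this dual problem

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.5], [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.dual_coef_)        # alpha_i * y_i for the support vectors
print(clf.support_vectors_)  # the x_i with alpha_i > 0
print(clf.intercept_)        # offset (sign convention differs from the slide's b)
```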
  11. Support Vector Machine for Classification: Non-Linear Classifier
      Non-linear classification with the "kernel trick":
      Maximize over $\alpha$: $\sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
      subject to: $\alpha_i \geq 0$, $\sum_{i=1}^{n} \alpha_i y_i = 0$,
      where $K(x, x') \stackrel{\mathrm{def}}{=} \langle \Phi(x), \Phi(x') \rangle$ is the kernel function.
      Decision function: $F(x) = \mathrm{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) - b\right)$ (implemented in the sketch below).
      [Figure: the map $\Phi$ sends the data to a feature space where the classes are linearly separated with margin $2/\|w\|$, between the hyperplanes $\langle w, \Phi(x) \rangle - b = \pm 1$.]
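A minimal sketch of the kernelized decision function above; the dual variables `alpha`, labels `y`, training points `X` and offset `b` are assumed to come from a solver such as the one on the previous slide.

```python
import numpy as np

def decision(x, X, y, alpha, b, kernel):
    # F(x) = sign( sum_i alpha_i * y_i * K(x_i, x) - b )
    s = sum(a_i * y_i * kernel(x_i, x) for a_i, y_i, x_i in zip(alpha, y, X))
    return np.sign(s - b)
```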
  12. Support Vector Machine for Classification: Non-Linear Classifier Kernels
      - Polynomial: $k(x_i, x_j) = (\langle x_i, x_j \rangle + 1)^d$
      - Gaussian or Radial Basis Function (RBF): $k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$
      - Hyperbolic tangent: $k(x_i, x_j) = \tanh(\kappa \langle x_i, x_j \rangle + c)$
      All three are spelled out in the sketch below.
      [Figure: example decision boundaries for the polynomial (left) and Gaussian (right) kernels.]
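The three kernels as plain functions (a sketch; the default hyperparameter values are illustrative, not taken from the paper).

```python
import numpy as np

def poly_kernel(x_i, x_j, d=2):
    return (np.dot(x_i, x_j) + 1.0) ** d

def rbf_kernel(x_i, x_j, sigma=1.0):
    diff = x_i - x_j
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def tanh_kernel(x_i, x_j, kappa=1.0, c=0.0):
    return np.tanh(kappa * np.dot(x_i, x_j) + c)
```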
  13. Ranking Support Vector Machine
      Find $F(x)$ which preserves the ordering of the training points.
      [Figure: level sets $L(x_{r_1})$, $L(x_{r_2})$ of the ranking function and the weight vector $w$.]
  14. Ranking Support Vector Machine
      Primal problem:
      Minimize over $(w, \xi)$: $\frac{1}{2}\|w\|^2 + \sum_{i=1}^{N-1} C_i \xi_i$
      subject to: $\langle w, \Phi(x_i) - \Phi(x_{i+1}) \rangle \geq 1 - \xi_i$ and $\xi_i \geq 0$, for $i = 1 \ldots N-1$.
      Dual problem:
      Maximize over $\alpha$: $\sum_{i=1}^{N-1} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{N-1} \alpha_i \alpha_j K(x_i - x_{i+1},\ x_j - x_{j+1})$
      subject to: $0 \leq \alpha_i \leq C_i$, for $i = 1 \ldots N-1$.
      Rank surrogate function (in the case one rank = one point):
      $F(x) = \sum_{i=1}^{N-1} \alpha_i \left( K(x_i, x) - K(x_{i+1}, x) \right)$
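A sketch of evaluating the rank surrogate above: `X_sorted` holds the training points ordered from best to worst, `alpha` the N-1 dual variables returned by the Rank SVM solver, and `kernel` is e.g. `rbf_kernel` from slide 12.

```python
def rank_surrogate(x, X_sorted, alpha, kernel):
    # F(x) = sum_i alpha_i * ( K(x_i, x) - K(x_{i+1}, x) )
    return sum(
        a * (kernel(X_sorted[i], x) - kernel(X_sorted[i + 1], x))
        for i, a in enumerate(alpha)  # alpha has length N - 1
    )
```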
  15. Model Learning: Non-Separable Ellipsoid Problem
      $K(x_i, x_j) = \exp\left(-\frac{(x_i - x_j)^T (x_i - x_j)}{2\sigma^2}\right)$; $\quad K_C(x_i, x_j) = \exp\left(-\frac{(x_i - x_j)^T C^{-1} (x_i - x_j)}{2\sigma^2}\right)$
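A sketch of the covariance-adapted kernel $K_C$ above (function name is ours): replacing the Euclidean distance with the Mahalanobis distance given by the CMA-ES covariance matrix aligns the kernel with the currently explored region of the landscape.

```python
import numpy as np

def kc_kernel(x_i, x_j, C_inv, sigma=1.0):
    # RBF kernel in Mahalanobis distance: C_inv is the precomputed inverse
    # of the covariance matrix adapted by CMA-ES.
    d = x_i - x_j
    return np.exp(-(d @ C_inv @ d) / (2.0 * sigma ** 2))
```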
  16. Optimization or Filtering? Don't Be Too Greedy
      - Optimization: significant potential speed-up if the surrogate model is global and accurate enough.
      - Filtering: "guaranteed" speed-up with a local surrogate model.
      [Figure: prescreening scheme. Pre-children are ranked by the surrogate; offspring are retained with a rank-dependent probability density, and λ′ of them are evaluated on the true fitness.]
  17. ACM-ES Optimization Loop
      1. Select the best k training points.
      2. Apply the change of coordinates, defined from the current covariance matrix and the current mean value (equation given in [4]).
      3. Build a surrogate model using Rank SVM.
      4. Generate pre-children and rank them according to the surrogate fitness function.
      5. Prescreen: retain offspring with a rank-dependent probability.
      6. Evaluate the λ′ retained children on the true fitness.
      7. Add the new λ′ training points and update the parameters of CMA-ES.
      [Figure: the loop illustrated as A. select training points, B. build a surrogate model, C. generate pre-children, D. select the most promising children.]
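A high-level sketch of one ACM-ES generation following the numbered steps above. Every argument is an illustrative stand-in (the released source code linked on the summary slide is the reference implementation).

```python
def acmes_generation(f, cma, archive, k, n_pre, lam_prime,
                     train_rank_svm, prescreen):
    train = archive[:k]                           # 1. best k training points
    surrogate = train_rank_svm(train, cma.C)      # 2.-3. Rank SVM surrogate in the
                                                  #   coordinate system given by C
    pre_children = [cma.sample() for _ in range(n_pre)]   # 4. pre-children,
    ranked = sorted(pre_children, key=surrogate)          #    ranked by surrogate
    selected = prescreen(ranked, lam_prime)       # 5. rank-based prescreening
    evaluated = [(x, f(x)) for x in selected]     # 6. evaluate on the true fitness
    archive.extend(evaluated)                     # 7. extend the training set ...
    cma.update(evaluated)                         # 7. ... and update CMA-ES
    return evaluated
```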
  18. Results: Speed-Up

      Function       | n  | λ   | λ′ | ACM-ES             | spu | CMA-ES
      ---------------|----|-----|----|--------------------|-----|------------------
      Schwefel       | 10 | 10  | 3  | 801 ± 36           | 3.3 | 2667 ± 87
      Schwefel       | 20 | 12  | 4  | 3531 ± 179         | 2.0 | 7042 ± 172
      Schwefel       | 40 | 15  | 5  | 13440 ± 281        | 1.7 | 22400 ± 289
      Schwefel^(1/4) | 10 | 10  | 3  | 1774 ± 37          | 4.1 | 7220 ± 206
      Schwefel^(1/4) | 20 | 12  | 4  | 6138 ± 82          | 2.5 | 15600 ± 294
      Schwefel^(1/4) | 40 | 15  | 5  | 22658 ± 390        | 1.8 | 41534 ± 466
      Rosenbrock     | 10 | 10  | 3  | 2059 ± 143 (0.95)  | 3.7 | 7669 ± 691 (0.90)
      Rosenbrock     | 20 | 12  | 4  | 11793 ± 574 (0.75) | 1.8 | 21794 ± 1529
      Rosenbrock     | 40 | 15  | 5  | 49750 ± 2412 (0.9) | 1.6 | 82043 ± 3991
      NoisySphere    | 10 | 10  | 3  | 766 ± 90 (0.95)    | 2.7 | 2058 ± 148
      NoisySphere    | 20 | 12  | 4  | 1361 ± 212         | 2.8 | 3777 ± 127
      NoisySphere    | 40 | 15  | 5  | 2409 ± 120         | 2.9 | 7023 ± 173
      Ackley         | 10 | 10  | 3  | 892 ± 28           | 4.1 | 3641 ± 154
      Ackley         | 20 | 12  | 4  | 1884 ± 50          | 3.5 | 6641 ± 108
      Ackley         | 40 | 15  | 5  | 3690 ± 80          | 3.3 | 12084 ± 247
      Ellipsoid      | 10 | 10  | 3  | 1628 ± 95          | 3.8 | 6211 ± 264
      Ellipsoid      | 20 | 12  | 4  | 8250 ± 393         | 2.3 | 19060 ± 501
      Ellipsoid      | 40 | 15  | 5  | 33602 ± 548        | 2.1 | 69642 ± 644
      Rastrigin      | 5  | 140 | 70 | 23293 ± 1374 (0.3) | 0.5 | 12310 ± 1098 (0.75)

      Entries give the number of function evaluations (mean ± standard deviation); values in parentheses are the fraction of successful runs where it is below 1; spu is the speed-up of ACM-ES over CMA-ES. For NoisySphere, the noise level is 0.15, 0.11 and 0.08 for n = 10, 20 and 40, respectively.
      [Figure: speed-up vs. problem dimension (0 to 40) for Schwefel, Schwefel^(1/4), Rosenbrock, NoisySphere, Ackley and Ellipsoid.]
  19. Results: Learning Time
      The cost of model learning/testing increases quasi-linearly with the dimension d on the Sphere function.
      [Figure: log-log plot of learning time (ms) vs. problem dimension; the fitted slope is 1.13.]
  20. Summary
      ACM-ES:
      - ACM-ES is 2 to 4 times faster on uni-modal problems.
      - Invariant to rank-preserving transformations: yes.
      - The computational complexity (the cost of the speed-up) is O(n), compared to O(n^6) for lmm-CMA-ES.
      - The source code is available online: http://www.lri.fr/~ilya/publications/ACMESppsn2010.zip
      Open questions:
      - Extension to multi-modal optimization.
      - Adaptation of the selection pressure and of the surrogate model complexity.
  21. Thank you for your attention! Questions?
  22. Parameters
      SVM learning:
      - Number of training points: $N_{\mathrm{training}} = 30\sqrt{d}$ for all problems, except Rosenbrock and Rastrigin, where $N_{\mathrm{training}} = 70\sqrt{d}$.
      - Number of iterations: $N_{\mathrm{iter}} = 50000\sqrt{d}$.
      - Kernel function: RBF with $\sigma$ equal to the average distance between the training points.
      - Cost of constraint violation: $C_i = 10^6 (N_{\mathrm{training}} - i)^{2.0}$.
      Offspring selection procedure (collected in the sketch below):
      - Number of test points: $N_{\mathrm{test}} = 500$.
      - Number of evaluated offspring: $\lambda' = \lambda/3$ (cf. the λ′ column of the results table).
      - Offspring selection pressure parameters: $\sigma^2_{\mathrm{sel0}} = 2\sigma^2_{\mathrm{sel1}} = 0.8$.
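The slide's hyperparameters collected in one place (the function and dict layout are ours; `lam_prime = lam // 3` matches the λ′ column of the results table).

```python
import math

def acmes_parameters(d, lam, hard_problem=False):  # hard: Rosenbrock, Rastrigin
    n_training = int((70 if hard_problem else 30) * math.sqrt(d))
    return {
        "n_training": n_training,
        "n_iter": int(50000 * math.sqrt(d)),        # Rank SVM learning iterations
        "constraint_cost": [1e6 * (n_training - i) ** 2.0  # C_i, i = 1 .. N-1
                            for i in range(1, n_training)],
        "n_test": 500,                              # test points per generation
        "lam_prime": lam // 3,                      # evaluated offspring
        "sel_pressure": {"sigma2_sel0": 0.8, "sigma2_sel1": 0.4},
    }
```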
