Complexity bounds for comparison-based optimization and parallel optimization

@article{fournier:inria-00452791,
  hal_id = {inria-00452791},
  url = {http://hal.inria.fr/inria-00452791},
  title = {{Lower Bounds for Comparison Based Evolution Strategies using VC-dimension and Sign Patterns}},
  author = {Fournier, Herv{\'e} and Teytaud, Olivier},
  abstract = {{We derive lower bounds on the convergence rate of comparison-based or selection-based algorithms, improving existing results in the continuous setting and extending them to non-trivial results in the discrete case. This is achieved by considering the VC-dimension of the level sets of the fitness functions; results are then obtained through the use of the shatter function lemma. In the special case of optimization of the sphere function, improved lower bounds are obtained by an argument based on the number of sign patterns.}},
  keywords = {Evolutionary Algorithms; Parallel Optimization; Comparison-based algorithms; VC-dimension; Sign patterns; Complexity},
  language = {English},
  affiliation = {Parall{\'e}lisme, R{\'e}seaux, Syst{\`e}mes d'information, Mod{\'e}lisation - PRISM; Laboratoire de Recherche en Informatique - LRI; TAO - INRIA Saclay - Ile de France},
  publisher = {Springer},
  journal = {Algorithmica},
  audience = {international},
  year = {2010},
  pdf = {http://hal.inria.fr/inria-00452791/PDF/evolution.pdf},
}

@incollection{teytaud:inria-00593179,
  hal_id = {inria-00593179},
  url = {http://hal.inria.fr/inria-00593179},
  title = {{Lower Bounds for Evolution Strategies}},
  author = {Teytaud, Olivier},
  abstract = {{The mathematical analysis of optimization algorithms involves upper and lower bounds; we here focus on the second case. Whereas other chapters will consider black-box complexity, we will here consider complexity based on the key assumption that the only information available on the fitness values is the rank of individuals - we will not make use of the exact fitness values. Such reduced information is known to be efficient in terms of robustness (Gelly et al., 2007), which gives a solid theoretical foundation to the robustness of evolution strategies, often argued without mathematical rigor - and we here show the implications of this reduced information on convergence rates. In particular, our bounds are proved without any infinite-dimension assumption, and they have since been used for designing algorithms with better performance in the parallel setting.}},
  language = {English},
  affiliation = {Laboratoire de Recherche en Informatique - LRI; TAO - INRIA Saclay - Ile de France},
  booktitle = {{Theory of Randomized Search Heuristics}},
  publisher = {World Scientific},
  pages = {327-354},
  volume = {1},
  editor = {Auger, Anne and Doerr, Benjamin},
  series = {Series on Theoretical Computer Science},
  audience = {international},
  year = {2011},
  month = May,
  pdf = {http://hal.inria.fr/inria-00593179/PDF/ws-book9x6.pdf},
}


    1. Complexity bounds in parallel evolution. A. Auger, H. Fournier, N. Hansen, P. Rolet, F. Teytaud, O. Teytaud. Paris, 2010. TAO, INRIA Saclay Ile-de-France; LRI (Université Paris Sud, France); UMR CNRS 8623; I&A team, Digiteo; PASCAL Network of Excellence.
    2. Outline. Introduction; Complexity bounds; Branching factor; Automatic parallelization; Real-world algorithms; Log(λ) corrections.
    3. Outline, Introduction: What is optimization? What are comparison-based optimization algorithms? Why are we interested in comparison-based optimization? Why do we consider parallel machines?
    4. Introduction: what is optimization? Consider f: X → R. We look for x* such that ∀x, f(x*) ≤ f(x). Noisy case: w a random variable; f is randomly drawn, f(x) = f(x,w).
    5. Introduction: what is optimization? Quality of "Opt" is quantified as follows (criterion on the slide, to be minimized), with w a random variable.
    6. Introduction: what is optimization? Consider f: X → R. We look for x* such that ∀x, f(x*) ≤ f(x). ==> Quasi-Newton, random search, Newton, simplex, interior points...
    7. Comparison-based optimization. An algorithm is comparison-based if it uses the fitness values only through comparisons, i.e. only the ranks of individuals (formal condition on the slide).
    8. The main rules for step-size adaptation. While (I have time) { Generate λ points (x1,...,xλ) distributed as N(x,σ); Evaluate the fitness at x1,...,xλ; Update x, update σ }. Main trouble: choosing σ. Cumulative step-size adaptation; mutative self-adaptation; Estimation of Multivariate Normal Algorithm.
    9-13. Example 1: Estimation of Multivariate Normal Algorithm (EMNA). While (I have time) { Generate λ points (x1,...,xλ) distributed as N(x,σ); Evaluate the fitness at x1,...,xλ; x = mean of the μ best points; σ = standard deviation of the μ best points }. Illustrated step by step on the slides: I have a Gaussian; I generate 6 points; I select the three best; I update the Gaussian. Obviously 6-parallel. A sketch follows below.
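    A minimal Python sketch of this EMNA loop (the function names and the test fitness are illustrative assumptions; the update rules are the ones on the slides):

        import numpy as np

        def emna(fitness, x, sigma, lam=6, mu=3, iterations=100):
            """EMNA as on slides 9-13: sample lam points around x,
            keep the mu best, refit the Gaussian to them."""
            rng = np.random.default_rng()
            for _ in range(iterations):
                # Generate lam points ~ N(x, sigma): embarrassingly lam-parallel
                pop = x + sigma * rng.standard_normal((lam, x.size))
                # Evaluate the fitness and select the mu best
                best = pop[np.argsort([fitness(p) for p in pop])[:mu]]
                # x = mean of the mu best, sigma = their standard deviation
                x, sigma = best.mean(axis=0), best.std(axis=0)
            return x

        # Example: minimize the sphere function in dimension 2
        x_opt = emna(lambda p: np.dot(p, p), x=np.ones(2), sigma=np.ones(2))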
    14. Example 2: Mutative self-adaptation. μ = λ/4. While (I have time) { Generate step-sizes (σ1,...,σλ) as σi = σ × exp(k·Ni), Ni standard Gaussian; Generate points (x1,...,xλ) with xi distributed as N(x,σi); Select the μ best points; Update x (= mean), update σ (= log-mean) }.
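    A matching sketch of the mutative self-adaptation loop (the constant k is an assumption; each offspring carries its own mutated step-size, and σ is updated as the log-mean of the selected ones):

        import numpy as np

        def mutative_sa(fitness, x, sigma, lam=8, k=0.3, iterations=100):
            """Mutative self-adaptation: offspring mutate their own step-size."""
            rng = np.random.default_rng()
            mu = lam // 4                  # mu = lambda / 4 (slide 14)
            for _ in range(iterations):
                # Each point gets its own sigma_i = sigma * exp(k * N(0,1))
                sigmas = sigma * np.exp(k * rng.standard_normal(lam))
                pop = x + sigmas[:, None] * rng.standard_normal((lam, x.size))
                order = np.argsort([fitness(p) for p in pop])[:mu]
                # x = mean of the mu best; sigma = log-mean of their step-sizes
                x = pop[order].mean(axis=0)
                sigma = np.exp(np.mean(np.log(sigmas[order])))
            return x, sigma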
    15. Plenty of comparison-based algorithms: EMNA and other EDAs; self-adaptive algorithms; cumulative step-size adaptation; pattern search methods; ...
    16. Families of comparison-based algorithms. Main parameter: λ = number of evaluations per iteration = parallelism. Full-ranking vs selection-based (parameter μ): FR, we know the ranking of the μ best; SB, we just know which are the μ best. Elitist or not: elitist, comparison with all visited points; non-elitist, only within the current offspring.
    17. EMNA? Self-adaptation? Main parameter: λ = number of evaluations per iteration = parallelism. Full-ranking vs selection-based: FR, we know the ranking of all visited points; SB, we just know which are the μ best. Elitist or not: elitist, comparison with all visited points; non-elitist, only within the current offspring. ==> Yet, they work quite well.
    18. Comparison-based algorithms are robust. Consider f: X → R. We look for x* such that ∀x, f(x*) ≤ f(x). ==> What if we see g∘f (g increasing)? ==> x* is the same, but xn might change. ==> Then, comparison-based methods are optimal.
    19. Robustness of comparison-based algorithms: formal statement (formulas on the slide). The run, and hence this criterion, does not depend on g for a comparison-based algorithm; a comparison-based algorithm is optimal for the worst case over increasing transformations g. (I don't give a proof here, but I promise it's true.)
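    The first claim is easy to check concretely: an increasing g preserves all rankings, so a comparison-based run is unchanged. A tiny sketch (g = exp is an arbitrary illustrative choice):

        import numpy as np

        f = lambda p: np.dot(p, p)            # original fitness
        g_of_f = lambda p: np.exp(f(p))       # g increasing => same ranks

        pts = np.random.default_rng(0).standard_normal((5, 2))
        assert (np.argsort([f(p) for p in pts])
                == np.argsort([g_of_f(p) for p in pts])).all()
        # Same ranking => a comparison-based algorithm behaves identically
        # on f and on g o f.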
    20-23. Introduction: I like λ large. Grid5000 = 5,000 cores (and increasing). Submitting jobs ==> grouping runs ==> λ much bigger than the number of cores. Next generations of computers: tens, hundreds, thousands of cores. Evolutionary algorithms are population-based, but they have a bad speed-up.
    24. Introduction: concluding :-) Optimization = finding minima. Many algorithms are comparison-based ==> good idea for robustness. The parallel case is interesting ==> now we can have fun with bounds.
    25. Outline. Introduction; Complexity bounds (on a given domain D, on a space F of objective functions such that {x*(f); f∈F} = D); Branching factor; Automatic parallelization; Real-world algorithms; Log(λ) corrections.
    26-27. Complexity bounds (N = dimension). Count the number of fitness evaluations needed for precision ε with probability at least ½, for all f. N(ε) = covering number of the search space at precision ε. Exp(−convergence ratio) = convergence rate; convergence ratio ~ 1/computational cost ==> more convenient for speed-ups.
    28-35. Complexity bounds: basic technique (ε-balls). We want to know how many iterations we need to reach precision ε with an evolutionary algorithm. Key observation: (most) evolutionary algorithms are comparison-based. Let's consider (for simplicity) a deterministic selection-based non-elitist algorithm. First idea: how many different branches do we have in a run? We select μ points among λ, therefore at most K = λ! / (μ! (λ−μ)!) different branches per iteration. Second idea: how many different answers should we be able to give? Use packing numbers: at least N(ε) different possible answers. Conclusion: the number n of iterations must satisfy K^n ≥ N(ε).
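    The bound K^n ≥ N(ε) gives n ≥ log N(ε) / log K. A quick worked instance (the covering number N(ε) ≈ (1/ε)^N for a unit cube in dimension N is an illustrative assumption):

        from math import comb, log

        lam, mu = 6, 3
        K = comb(lam, mu)                  # at most C(lambda, mu) branches/iter
        N_dim, eps = 10, 1e-3
        log_N_eps = N_dim * log(1 / eps)   # log covering number ~ N log(1/eps)
        n_min = log_N_eps / log(K)         # K^n >= N(eps) => n >= logN / logK
        print(f"K = {K}, minimum iterations n >= {n_min:.1f}")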
    36-37. Complexity bounds on the convergence ratio (table on slide). Linear in λ? FR: full ranking (selected points are ranked); SB: selection-based (selected points are not ranked).
    38. Linear speed-up? "My bound is tight, I've proved it!" But the bounds hold on a given domain D, for a space F of objective functions such that {x*(f); f∈F} = D ==> very strange F possible! ==> much easier than F = {||x − x*||; x* ∈ D}.
    39. Linear speed-up? "My bound is tight, I've proved it!" "Ok, tight bound. But what happens with a better model?"
    40. Complexity bounds on the convergence ratio. Comparison-based optimization (or optimization with limited-precision numbers). We have developed bounds based on: the branching factor, i.e. finitely many possible pieces of information on the problem per time step (→ communication complexity); the packing number, a lower bound on the number of possible outcomes. Adding assumptions ==> better bounds?
    41-42. Complexity bounds: improved technique. Same counting argument as above, but many of these K branches are very unlikely! We'll use... VC-dimension!
    43. (These slides on "shattering + VC-dimension" are extracted from Xue Mei's talk at ENEE698A.) Definition of shattering: a set S of points is shattered by a set H of sets if for every dichotomy of S there is a consistent hypothesis in H.
    44. Example: shattering. Is this set of points shattered by the set H?
    45. Yes! (The slide shows every dichotomy of the points, labeled with + and −, realized by a hypothesis in H.)
    46. Is this set of points shattered by circles?
    47. How about this one?
    48-51. VC-dimension. VC-dimension(set of sets) = maximum cardinality of a shattered set. VC-dimension(set of functions) = VC-dimension of the (sub)level sets. Known (as a function of the dimension) for many sets of functions: in particular, quadratic for ellipsoids, linear for homotheties of a fixed ellipsoid, linear for circles...
    52. VC-dimension: the link with optimization? Sauer's lemma: the number of subsets of λ points consistent with a set system of VC-dimension V is at most λ^V. So what? The number of possible selections is at most K ≤ λ^V ==> instead of K = λ! / (μ! (λ−μ)!). (V at least 3, otherwise a few details change...)
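    The improvement matters for λ large: log K drops from Θ(λ) to V log λ. A quick numeric comparison (V = 5 is an arbitrary illustrative value):

        from math import comb, log2

        V = 5                            # assumed VC-dimension of the level sets
        for lam in (16, 64, 256):
            naive = comb(lam, lam // 2)  # K = C(lambda, mu), worst case mu=lam/2
            vc = lam ** V                # K <= lambda^V via Sauer's lemma
            print(f"lam={lam}: log2 K = {log2(naive):.0f} (naive) "
                  f"vs {log2(vc):.0f} (VC)")
        # For small lam the binomial can still be smaller; for lam large,
        # log K falls from Theta(lam) down to V log(lam).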
    53-55. Complexity bounds on the convergence ratio, revisited (table on slide). The dependency should not be linear in λ! But something remains. FR: full ranking (selected points are ranked); SB: selection-based (selected points are not ranked).
    56-59. Sphere: fitness increases with distance to the optimum; 1 comparison = 1 hyperplane (illustrated over four slides).
    60. Outline. Introduction; Complexity bounds; Branching factor; Automatic parallelization; Real-world algorithms; Log(λ) corrections.
    61. Branching factor K (more in Gelly06; Fournier08). Rewrite your evolutionary algorithm so that the selection step g has values in a finite set of cardinality K: e.g. subsets of {1,2,...,λ} of size μ (K = λ! / (μ! (λ−μ)!)), or ordered subsets (K = λ! / (λ−μ)!), ...
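    The two cardinalities can be checked directly (sketch):

        from math import comb, perm

        lam, mu = 10, 3
        K_sb = comb(lam, mu)   # unordered subsets: lambda! / (mu! (lambda-mu)!)
        K_fr = perm(lam, mu)   # ordered subsets:   lambda! / (lambda-mu)!
        print(K_sb, K_fr)      # 120, 720: full ranking has mu! times more branches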
    62. Outline. Introduction; Complexity bounds; Branching factor; Automatic parallelization (upper bounds for the dependency in λ); Real-world algorithms; Log(λ) corrections.
    63. Automatic parallelization.
    64-66. Speculative parallelization with branching factor 3: consider the sequential algorithm (iterations 1, 2, 3 shown on the slides).
    67. Speculative parallelization with branching factor 3: parallel version for D = 2. Population = union of all populations for 2 iterations.
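    A structural sketch of this speculative parallelization (all names are assumptions: step produces the population a speculative node would generate, select returns the index of the branch the comparisons actually take; the point is that every node's evaluations can run in parallel):

        from itertools import product

        def speculative_paths(K, D):
            # One speculative node per prefix of branch decisions:
            # sum over i < D of K^i nodes in total.
            return [p for i in range(D) for p in product(range(K), repeat=i)]

        def speculative_run(state, step, select, K, D):
            # Parallel part: all these populations can be generated and
            # evaluated simultaneously.
            populations = {p: step(state, p) for p in speculative_paths(K, D)}
            # Replay: each comparison result picks one branch per level, at no
            # extra evaluation cost; D sequential iterations in one parallel step.
            path = ()
            for _ in range(D):
                path += (select(populations[path]),)
            return path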
    68. Outline. Introduction; Complexity bounds; Branching factor; Automatic parallelization; Real-world algorithms (tighter lower bounds for specific algorithms?); Log(λ) corrections.
    69. Real-world algorithms. Define σ* = the step-size after one iteration. Necessary condition for a log(λ) speed-up: −E log(σ*/σ) ~ log(λ). But for many algorithms, −E log(σ*/σ) = O(1) ==> constant speed-up.
    70. One-fifth rule: −E log(σ*/σ) = O(1). Let p̂ = the proportion of mutated points better than x. While (I have time) { Generate λ points (x1,...,xλ) distributed as N(x,σ); Evaluate the fitness at x1,...,xλ; Update x = mean; Update σ by the 1/5th rule (increase σ if p̂ > 1/5, decrease it otherwise) }.
    71. One-fifth rule: −E log(σ*/σ) = O(1), with p̂ the proportion of mutated points better than x. Consider the examples on the slide: in both cases σ*/σ is lower-bounded independently of λ ==> parameters should strongly depend on λ! A sketch follows below.
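    A sketch of the 1/5th-rule loop of slides 70-71 (the constants c and μ are illustrative assumptions); the comment marks the step that caps the speed-up: σ moves by a λ-independent factor, so −E log(σ*/σ) = O(1):

        import numpy as np

        def one_fifth_es(fitness, x, sigma, lam=100, c=1.3, iterations=100):
            rng = np.random.default_rng()
            mu = lam // 4                      # assumed selection size
            for _ in range(iterations):
                pop = x + sigma * rng.standard_normal((lam, x.size))
                fits = np.array([fitness(p) for p in pop])
                success = np.mean(fits < fitness(x))   # p_hat: mutants beating x
                x = pop[np.argsort(fits)[:mu]].mean(axis=0)   # update x = mean
                # 1/5th rule: sigma changes by a lam-independent factor, hence
                # -E log(sigma*/sigma) = O(1) -- the bottleneck for large lam.
                sigma = sigma * c if success > 0.2 else sigma / c ** 0.25
            return x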
    72. Self-adaptation, cumulative step-size adaptation. In many cases, the same result: with parameters depending on the dimension only (and not on λ), the speed-up is limited by a constant!
    73. Outline. Introduction; Complexity bounds; Branching factor; Automatic parallelization; Real-world algorithms; Log(λ) corrections.
    74. The starting point of this work. We have shown tight bounds. Usual algorithms don't reach the bounds for λ large. Trouble: the algorithms we propose are boring (too complicated); people prefer the usual algorithms. A simple patch for these algorithms?
    75. Log(λ) corrections. In the discrete case (experiments): automatic parallelization is surprisingly efficient. Simple trick in the continuous case: −E log(σ*/σ) should be linear in log(λ) (this provides corrections that work for SA, EMNA and CSA).
    76. Example 1: Estimation of Multivariate Normal Algorithm with correction. While (I have time) { Generate λ points (x1,...,xλ) distributed as N(x,σ); Evaluate the fitness at x1,...,xλ; x = mean of the μ best points; σ = standard deviation of the μ best points; σ /= log(λ/7)^(1/d) }.
    77. Example 2: log(λ) correction for mutative self-adaptation. μ = λ/4 ==> μ = min(λ/4, d). While (I have time) { Generate step-sizes (σ1,...,σλ) as σi = σ × exp(k·Ni); Generate points (x1,...,xλ) with xi distributed as N(x,σi); Select the μ best points; Update x (= mean), update σ (= log-mean) }.
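    The two corrections, written as small helpers against the earlier sketches (d is the dimension; the constant 7 comes from slide 76; everything else is an assumption):

        import numpy as np

        def emna_log_correction(sigma, lam, d):
            """Slide 76: after the usual EMNA update, shrink sigma faster
            when lam is large (requires lam > 7)."""
            return sigma / np.log(lam / 7.0) ** (1.0 / d)  # sigma /= log(lam/7)^(1/d)

        def sa_mu_correction(lam, d):
            """Slide 77: cap the number of selected parents at the dimension."""
            return min(lam // 4, d)                        # mu = min(lambda/4, d)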
    78-79. Log(λ) corrections (experiments: SA, dimension 3; plots on the slides). In the discrete case, automatic parallelization is surprisingly efficient; in the continuous case, the simple trick (−E log(σ*/σ) linear in log(λ)) provides corrections that work for SA and CSA.
    80. Conclusion. The case of large population size is not well handled by usual algorithms. We proposed (i) theoretical bounds; (ii) an automatic parallelization matching the bound, which works well in the discrete case; (iii) a necessary condition for the continuous case, which provides useful hints.
    81. Main limitation (of the application to the design of algorithms). All this is about a logarithmic speed-up: huge computational power buys only a logarithmic improvement (plot on slide). ==> Much better speed-up for noisy optimization.
    82. Further work 1. Apply VC-bounds to consider only "reasonable" branches in the automatic parallelization. Theoretically easy, but it produces extremely complicated algorithms.
    83. Further work 2. We have proofs for complicated algorithms and efficient (unproved) hints for usual algorithms. Proofs for the versions with the "trick"? NB: the discrete case is moral: the best algorithm is the proved one :-)
    84. Further work 3. What if the optimum is not a point but a subset with topological dimension N' < N?
    85. Further work 4. Parallel bandits? Experimentally, parallel UCT >> sequential UCT, with a speed-up depending on the number of arms. Theory? Perhaps not very hard, but not done yet.
