Multimodal or Expensive Optimization

Introductory talk


More technical details in:

@inproceedings{schoenauer:inria-00625855,
  hal_id = {inria-00625855},
  url = {http://hal.inria.fr/inria-00625855},
  title = {{A Rigorous Runtime Analysis for Quasi-Random Restarts and Decreasing Stepsize}},
  author = {Schoenauer, Marc and Teytaud, Fabien and Teytaud, Olivier},
  abstract = {{Multi-Modal Optimization (MMO) is ubiquitous in engineering, machine learning and artificial intelligence applications. Many algorithms have been proposed for multimodal optimization, and many of them are based on restart strategies. However, only few works address the issue of initialization in restarts. Furthermore, very few comparisons have been done, between different MMO algorithms, and against simple baseline methods. This paper proposes an analysis of restart strategies, and provides a restart strategy for any local search algorithm for which theoretical guarantees are derived. This restart strategy is to decrease some 'step-size', rather than to increase the population size, and it uses quasi-random initialization, that leads to a rigorous proof of improvement with respect to random restarts or restarts with constant initial step-size. Furthermore, when this strategy encapsulates a (1+1)-ES with 1/5th adaptation rule, the resulting algorithm outperforms state of the art MMO algorithms while being computationally faster.}},
  language = {English},
  affiliation = {TAO - INRIA Saclay - Ile de France, Microsoft Research - Inria Joint Centre - MSR-INRIA, Laboratoire de Recherche en Informatique - LRI},
  booktitle = {{Artificial Evolution}},
  address = {Angers, France},
  audience = {international},
  year = {2011},
  month = Oct,
  pdf = {http://hal.inria.fr/inria-00625855/PDF/qrrsEA.pdf},
}


  1. 1. Hard optimization (multimodal, adversarial, noisy). http://www.lri.fr/~teytaud/hardopt.odp http://www.lri.fr/~teytaud/hardopt.pdf (or Quentin's web page). Acknowledgments: koalas, dinosaurs, mathematicians. Olivier Teytaud, olivier.teytaud@inria.fr. Knowledge also from A. Auger, M. Schoenauer.
  2. 2. The next slide is the most important of all. Olivier Teytaud, Inria TAO, visiting the beautiful city of Liège.
  3. 3. In case of trouble, interrupt me. Olivier Teytaud, Inria TAO, visiting the beautiful city of Liège.
  4. 4. In case of trouble, interrupt me. Further discussion needed: R82A, Montefiore Institute; olivier.teytaud@inria.fr; or after the lessons (the 25th, not the 18th). Olivier Teytaud, Inria TAO, visiting the beautiful city of Liège.
  5. 5. Rather than complex algorithms, we'll see some concepts and some simple building blocks for defining algorithms: robust optimization, noisy optimization, warm starts, Nash equilibria, coevolution, surrogate models, niching, maximum uncertainty, clearing, Monte-Carlo estimates, multimodal optimization, quasi-Monte-Carlo, restarts (sequential / parallel), Van Der Corput, koalas and eucalyptus, Halton, dinosaurs (and other mass extinctions), scrambling, Fictitious Play, EGO, EXP3.
  6. 6. I. Multimodal optimization. II. Adversarial / robust cases. III. Noisy cases.
  7. 7. E.g. Schwefel's function (thanks to Irafm).
  8. 8. What is multimodal optimization? Finding one global optimum (not getting stuck in a local minimum)? Finding all local minima? Finding all global minima?
  9. 9. Classical methodologies: 1. Restarts. 2. "Real" diversity mechanisms.
  10. 10. Restarts: while (time left > 0) { run your favorite algorithm }. (Requires a halting criterion + randomness; why?)
  11. 11. Parallel restarts: concurrently, p times: { run your favorite algorithm }. ==> Still needs randomization.
  12. 12. Parallel restarts: concurrently, p times: { run your favorite algorithm }. ==> Needs randomization (at least in the initialization). A minimal sketch follows.
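A minimal sketch of the sequential restart loop around an off-the-shelf local optimizer (here SciPy's Nelder-Mead); the Rastrigin-style test function, budget, and deduplication tolerance are illustrative assumptions, not from the slides:

```python
# Sequential restarts: rerun a local optimizer from random initializations
# and keep every distinct local optimum found.
import numpy as np
from scipy.optimize import minimize

def f(x):  # multimodal test function (illustrative, Rastrigin-style)
    return np.sum(x**2) + 10 * np.sum(1 - np.cos(2 * np.pi * x))

rng = np.random.default_rng(0)
dim, lo, hi = 2, -5.0, 5.0
optima = []
for restart in range(50):                        # "while time left > 0"
    x0 = rng.uniform(lo, hi, dim)                # randomized initialization
    res = minimize(f, x0, method="Nelder-Mead")  # halting criterion is inside
    if all(np.linalg.norm(res.x - o) > 1e-2 for o in optima):
        optima.append(res.x)                     # a new local optimum
print(len(optima), "distinct local optima found")
```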
  13. 13. Random restarts vs. quasi-random restarts.
  14. 14. Let's open a parenthesis... (
  15. 15. The Van Der Corput sequence: quasi-random numbers in dimension 1. How should I write the nth point? (p = integer, parameter of the algorithm)
  16. 16. A (simple) Scrambled Van Der Corput sequence:
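A possible implementation, assuming the usual radical-inverse construction: the digits of n in base p are mirrored across the decimal point, and a simple scrambling applies a fixed permutation of the digits {0, ..., p-1} (fixing 0 so that trailing zeros stay zero):

```python
import random

def van_der_corput(n, p=2, perm=None):
    """n-th point of the (optionally scrambled) Van der Corput sequence in base p."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, p)
        if perm is not None:
            digit = perm[digit]   # simple scrambling: permute the digits
        denom *= p
        x += digit / denom
    return x

# Unscrambled, base 2: 1/2, 1/4, 3/4, 1/8, ...
print([van_der_corput(n) for n in range(1, 8)])
# Scrambled, base 3, with a random permutation that keeps 0 fixed
p = 3
perm = [0] + random.Random(42).sample(range(1, p), p - 1)
print([van_der_corput(n, p, perm) for n in range(1, 8)])
```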
  17. 17. Halton sequence: generalization to dimension d. Choose p1, ..., pd. All the pi's must be different (why?). Typically, the first prime numbers (why?).
  18. 18. Scrambled Halton sequence: generalization to dimension d with scrambling. Choose p1, ..., pd. All the pi's must be different (why?). Typically, the first prime numbers (why?).
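A sketch of the (scrambled) Halton sequence: one radical inverse per dimension, with the first primes as bases. Distinct bases are needed because equal bases would make the coordinates fully correlated:

```python
import random

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # one base per dimension

def radical_inverse(n, p, perm=None):
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, p)
        if perm is not None:
            digit = perm[digit]
        denom *= p
        x += digit / denom
    return x

def halton(n, d, scramble=False, seed=0):
    """n-th point of the d-dimensional (scrambled) Halton sequence in [0,1)^d."""
    rng = random.Random(seed)
    point = []
    for p in PRIMES[:d]:
        perm = [0] + rng.sample(range(1, p), p - 1) if scramble else None
        point.append(radical_inverse(n, p, perm))
    return point

print([halton(n, 2) for n in range(1, 5)])
```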
  19. 19. Let's close the parenthesis... (thank you Asterix) )
  20. 20. Consider an ES (an evolution strategy = an evolutionary algorithm in continuous domains). The restarts are parametrized by an initial point x and an initial step-size sigma. Which hypotheses on x and sigma do we need, so that the restarts of the ES will find all local optima?
  21. 21. X*(f) = set of optima. If I start close enough to an optimum, with a small enough step-size, I find it.
  22. 22. Restart (RS) algorithm:
  23. 23. What do you suggest ?
  24. 24. Accumulation of a sequence.
  25. 25. THEOREM:
  26. 26. THEOREM: ( (1) ==> (2); do you see why ?)
  27. 27. OK: step-sizes should accumulate around 0, and the xi's should be dense. x(i) = uniform random is OK; sigma(i) = exp(Gaussian), or something decreasing, are OK, but not a constant sigma. (Why?) Should I do something more sophisticated?
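Putting the pieces together, a sketch of the strategy from the paper cited at the top: quasi-random initialization (here via SciPy's scrambled Halton, assuming SciPy >= 1.7) plus a decreasing initial step-size, wrapped around a (1+1)-ES with a 1/5th-success-rule flavor. The budget, the 1/sqrt(1+i) schedule, and the constants are illustrative, not the paper's exact choices:

```python
import numpy as np
from scipy.stats import qmc

def one_plus_one_es(f, x, sigma, budget, rng):
    """(1+1)-ES with a 1/5th success rule; returns the best point found."""
    fx = f(x)
    for _ in range(budget):
        y = x + sigma * rng.standard_normal(x.shape)
        fy = f(y)
        if fy <= fx:
            x, fx = y, fy
            sigma *= 2.0 ** 0.25           # success: increase the step-size
        else:
            sigma *= 2.0 ** (-1.0 / 16.0)  # failure: decrease it (slower)
    return x, fx

def qr_restarts(f, dim, lo, hi, n_restarts=30, sigma0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    halton = qmc.Halton(d=dim, scramble=True, seed=seed)
    results = []
    for i in range(n_restarts):
        x0 = lo + (hi - lo) * halton.random(1)[0]  # quasi-random initialization
        sigma = sigma0 / np.sqrt(1 + i)            # decreasing initial step-size
        results.append(one_plus_one_es(f, x0, sigma, budget=200, rng=rng))
    return min(results, key=lambda t: t[1])

f = lambda x: np.sum(x**2) + 10 * np.sum(1 - np.cos(2 * np.pi * x))
print(qr_restarts(f, dim=2, lo=-5.0, hi=5.0))
```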
  28. 28. I want a small number of restarts. A criterion: dispersion (some maths should be developed around why...).
  29. 29. For random points ==> small difference ==> the key point is more the decreasing (or small) step-size.
  30. 30. Classical methodologies: 1. Restarts. 2. "Real" diversity mechanisms.
  31. 31. ~7x10^7 years ago: highly optimized; hundreds of millions of years on Earth.
  32. 32. ~7x10^7 years ago: died! Not adapted to the new environment.
  33. 33. ~7x10^7 years ago: survived thanks to niching.
  34. 34. Eating eucalyptus is dangerous for health (for many animals), but not for koalas. Also, they sleep most of the day. They're original animals. ==> This is an example of niching. ==> Niching is good for diversity.
  35. 35. USUAL SCHEME: winners kill losers. (Population = good + bad; the good ones make good babies.)
  36. 36. NEW SCHEME: winners kill losers only if they look like them. (Population = good + bad; the good ones make good babies.)
  37. 37. NEW SCHEME: winners kill losers only if they look like them. Don't kill koalas: their food is useless to you, and they can be useful in a future world.
  38. 38. CLEARING ALGORITHM. Input: population. Output: smaller population. 1) Sort the individuals (the best first). 2) Loop on the individuals; for each of them (if alive), kill the individuals which are (i) worse and (ii) at distance < some constant (...but the kappa first murders are sometimes canceled). 3) Keep at most mu individuals (the best).
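A sketch of this clearing procedure; the parameter names sigma (niche radius), kappa (niche capacity) and mu are assumptions matching the standard clearing formulation, and lower fitness is taken as better:

```python
import numpy as np

def clearing(pop, fitness, sigma=1.0, kappa=1, mu=None):
    """Keep at most kappa individuals per niche of radius sigma, then the mu best.
    pop: (n, d) array; fitness: (n,) array, lower is better. Returns kept indices."""
    order = np.argsort(fitness)            # best first
    alive = np.ones(len(pop), dtype=bool)
    for rank, i in enumerate(order):
        if not alive[i]:
            continue
        spared = kappa - 1                 # the kappa-1 first "murders" are canceled
        for j in order[rank + 1:]:         # worse individuals only
            if alive[j] and np.linalg.norm(pop[i] - pop[j]) < sigma:
                if spared > 0:
                    spared -= 1            # spare the first kappa-1 close neighbors
                else:
                    alive[j] = False       # worse and too close: cleared
    kept = order[alive[order]]
    return kept[:mu]

pop = np.array([[0.0], [0.1], [5.0], [5.05], [9.0]])
fit = np.array([1.0, 2.0, 0.5, 0.6, 3.0])
print(clearing(pop, fit, sigma=1.0, kappa=1, mu=3))  # one survivor per niche
```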
  39. 39. I. Multimodal optimization. II. Adversarial / robust cases: a) What does it mean? b) Nash's stuff: computation. c) Robustness stuff: computation. III. Noisy cases.
  40. 40. What is adversarial / robust optimization? I want to find argmin f. But I don't know f exactly; I just know that f is in a family of functions parameterized by theta. I can just compute f(x, theta) for a given theta.
  41. 41. What is adversarial / robust optimization? I want to find argmin f. But I don't know f exactly; I just know that f is in a family of functions parameterized by theta. I can just compute f(x, theta) for a given theta. Criteria: argmin_x E_theta f(x, theta) (average case; requires a probability distribution), or argmin_x sup_theta f(x, theta) <== this is adversarial / robust.
  42. 42. Same criteria: argmin_x E_theta f(x, theta): we'll see this case later. argmin_x sup_theta f(x, theta): let's see this now.
  43. 43. ==> Looks like a game: two opponents choosing their strategies, x vs. y (= theta). Can he try multiple times? Does he choose his strategy first?
  44. 44. Let's have a high-level look at all this stuff. We have seen two cases: inf_x sup_y f(x,y) (best power plant for the worst earthquake?) and inf_x E_y f(x,y) (best power plant for the average earthquake). Does it really cover all possible models?
  45. 45. inf_x sup_y f(x,y) is a bit pessimistic: this is the performance if y is chosen by an adversary who knows us and has infinite computational resources.
  46. 46. inf_x sup_y f(x,y) is a bit pessimistic: this is the performance if y is chosen by an adversary who knows us (or can do many trials) and has infinite computational resources. Good for nuclear power plants; not good for economy, games...
  47. 47. inf_x sup_y f(x,y) is a bit pessimistic: this is the performance if y is chosen by an adversary who knows us and has infinite computational resources. Good for nuclear power plants; not good for economy, games...
  48. 48. Decision = positioning my distribution points. Reward = size of the Voronoi cells. Newspaper distribution, or Quick vs. McDonald's (if customers ==> nearest).
  49. 49. Decision = positioning my distribution points. Reward = size of the Voronoi cells. Simple model: each company does its positioning in the morning for the whole day. Newspaper distribution, or Quick vs. McDonald's (if customers ==> nearest).
  50. 50. Decision = positioning my distribution points. Reward = size of the Voronoi cells. Simple model: each company does its positioning in the morning for the whole day. But choosing among strategies (i.e. decision rules, programs) leads to the same maths. Newspaper distribution, or Quick vs. McDonald's (if customers ==> nearest).
  51. 51. inf_x sup_y f(x,y): OK for power plants and earthquakes. But for economic choices? Replaced by sup_y inf_x f(x,y)?
  52. 52. inf_x sup_y f(x,y) replaced by sup_y inf_x f(x,y)? Means that we play second. ==> Let's see "real" uncertainty techniques.
  53. 53. inf_x sup_y f(x,y) is not equal to sup_y inf_x f(x,y)... but with finite domains: inf_{L(x)} sup_{L(y)} E f(x,y) is equal to sup_{L(y)} inf_{L(x)} E f(x,y) (von Neumann).
  54. 54. inf_x sup_y f(x,y) is not equal to sup_y inf_x f(x,y). E.g. rock-paper-scissors: inf sup = I win, sup inf = I lose. And sup inf sup inf sup inf... ==> loop. But equality in Chess, draughts, ... (turn-based, full information).
  55. 55. inf_x sup_y f(x,y) is not equal to sup_y inf_x f(x,y)... but with finite domains: inf_{L(x)} sup_{L(y)} E f(x,y) is equal to sup_{L(y)} inf_{L(x)} E f(x,y), and is equal to inf_x sup_{L(y)} E f(x,y). ==> The opponent chooses a randomized strategy without knowing what we choose.
  56. 56. inf_x sup_y f(x,y) is not equal to sup_y inf_x f(x,y)... but with finite domains: inf_{L(x)} sup_{L(y)} E f(x,y) is equal to sup_{L(y)} inf_{L(x)} E f(x,y), and is equal to inf_x sup_{L(y)} E f(x,y). ==> The opponent chooses a randomized strategy without knowing what we choose. ==> Good model for the average performance against non-informed opponents.
  57. 57. Let's summarize. Nash: best distribution against worst distribution = good model if the opponent can't know us and if the criterion = average perf. Best against worst: good in many games, and good for robust industrial design. Be creative: inf sup E is often reasonable (e.g. sup on the opponent, E on the climate...).
  58. 58. Nash: adversarial bidding (economy), choices on markets, military strategy, games.
  59. 59. In repeated games, you can (try to) predict your opponent's decisions. E.g. Rock-Paper-Scissors.
  60. 60. Remarks on this "worst-case" analysis: if you're stronger (e.g. computer against humans in chess), take the human weaknesses into account; in Poker, you will earn more money by playing against weak opponents.
  61. 61. With finite domains, inf_{L(x)} sup_{L(y)} E f(x,y) = sup_{L(y)} inf_{L(x)} E f(x,y) = inf_x sup_{L(y)} E f(x,y). (L(x), L(y)) is a Nash equilibrium; L(x), L(y) are not necessarily unique.
  62. 62. I. Multimodal optimization. II. Adversarial / robust cases: a) What does it mean? b) Nash's stuff: computation (fictitious play, EXP3, coevolution). c) Robustness stuff: computation. III. Noisy cases.
  63. 63. FIRST CASE: everything finite (x and theta are in finite domains). ==> Fictitious Play (Brown, Robinson). Simple, consistent, but slow (see however Kummer's paper).
  64. 64. FIRST CASE: everything finite. ==> Fictitious Play (Brown, Robinson): a sequence of simulated games (x(n), y(n)) defined as follows: x(n+1) = argmin_x sum_{i<n+1} f(x, y(i)); y(n+1) = argmax_y sum_{i<n+1} f(x(i), y). Simple, consistent, but slow.
  65. 65. ==> ... until n = N; then play randomly one of the x(i)'s.
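The slide's update rule translated directly into a sketch; the rock-paper-scissors payoff matrix and iteration count are illustrative:

```python
import numpy as np

def fictitious_play(F, n_iter=10000, rng=None):
    """Fictitious play on a finite zero-sum game. F[i, j] = f(x_i, y_j);
    x minimizes, y maximizes. Returns the empirical mixed strategies."""
    rng = rng or np.random.default_rng(0)
    nx, ny = F.shape
    cx, cy = np.zeros(nx), np.zeros(ny)      # cumulated payoffs vs. the history
    count_x, count_y = np.zeros(nx), np.zeros(ny)
    x, y = rng.integers(nx), rng.integers(ny)
    for _ in range(n_iter):
        count_x[x] += 1
        count_y[y] += 1
        cx += F[:, y]                        # x's payoffs against y's history
        cy += F[x, :]                        # y's payoffs against x's history
        x = int(np.argmin(cx))               # best response to y's empirical play
        y = int(np.argmax(cy))               # best response to x's empirical play
    return count_x / n_iter, count_y / n_iter

# Rock-paper-scissors: both empirical strategies approach (1/3, 1/3, 1/3)
rps = np.array([[0, 1, -1], [-1, 0, 1], [1, -1, 0]])
print(fictitious_play(rps))
```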
  66. 66. Much faster: EXP3, or INF (more complicated, see Audibert).
  67. 67. Complexity of EXP3, within logarithmic terms: K / epsilon^2 for reaching precision epsilon.
  68. 68. Complexity of EXP3, within logarithmic terms: K / epsilon^2. ==> We don't even read all the matrix of the f(x,y)! (of size K^2)
  69. 69. ==> Provably better than all deterministic algorithms.
  70. 70. ==> Requires randomness + RAM.
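A sketch of EXP3 for one player under bandit feedback, assuming rewards in [0, 1]; for a matrix game, one runs one such learner per player, each observing only the payoff of the pair of actions actually played. The gamma value and the demo arms are illustrative:

```python
import math
import random

def exp3(reward, K, T, gamma=0.1, rng=None):
    """EXP3: returns the empirical frequency of each of the K arms after T rounds.
    reward(arm) must return a value in [0, 1] (random, possibly adversarial)."""
    rng = rng or random.Random(0)
    w = [1.0] * K
    counts = [0] * K
    for _ in range(T):
        total = sum(w)
        probs = [(1 - gamma) * wi / total + gamma / K for wi in w]
        arm = rng.choices(range(K), weights=probs)[0]
        counts[arm] += 1
        x = reward(arm)
        w[arm] *= math.exp(gamma * (x / probs[arm]) / K)  # importance-weighted update
        m = max(w)
        if m > 1e100:                                     # avoid overflow; probs unchanged
            w = [wi / m for wi in w]
    return [c / T for c in counts]

# Example: arm 2 pays more on average, so EXP3 concentrates on it.
print(exp3(lambda a: random.random() * (0.2 + 0.4 * (a == 2)), K=4, T=20000))
```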
  71. 71. OK, it's great, but the complexity K is still far too big. For Chess, log10(K) ≈ 43. For the game of Go, log10(K) ≈ 170.
  72. 72. Coevolution for K huge: Population 1 vs. Population 2.
  73. 73. Coevolution for K huge: Population 1 vs. Population 2 (play games).
  74. 74. Coevolution for K huge: fitness = average fitness against the opponents.
  75. 75. Coevolution for K huge: fitness of Pop. 1 = average fitness against Pop. 2, and conversely.
  76. 76. Coevolution for K huge: fitness = average fitness against the opponents; + babies ==> new Population 1, new Population 2.
  77. 77. RED QUEEN EFFECT in finite population (FP keeps all the memory, not coevolution): 1. Rock vs Scissors. 2. Rock vs Paper. 3. Scissors vs Paper. 4. Scissors vs Rock. 5. Paper vs Rock. 6. Paper vs Scissors. 7. Rock vs Scissors = step 1 again!
  78. 78. RED QUEEN EFFECT: 1. Rock vs Scissors. 2. Rock vs Paper. ... 6. Paper vs Scissors. 7. Rock vs Scissors = step 1 again! ==> Population with archive (e.g. the best of each past iteration; looks like FP). A sketch follows.
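A sketch of two-population coevolution with a best-of-generation archive on rock-paper-scissors; the representation (mixed strategies), mutation operator, and parameters are assumptions. With the archive, the evolved strategies tend toward the uniform Nash strategy instead of cycling:

```python
import numpy as np

rng = np.random.default_rng(0)
PAYOFF = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])  # rock, paper, scissors

def payoff(p, q):
    return p @ PAYOFF @ q            # expected payoff of mixed strategy p vs. q

def mutate(p):
    q = np.maximum(p + 0.1 * rng.standard_normal(3), 1e-6)
    return q / q.sum()

def coevolve(generations=300, pop_size=10, use_archive=True):
    pop1 = [rng.dirichlet(np.ones(3)) for _ in range(pop_size)]
    pop2 = [rng.dirichlet(np.ones(3)) for _ in range(pop_size)]
    arch1, arch2 = [], []
    keep = pop_size // 2
    for _ in range(generations):
        opp1 = pop2 + arch2 if use_archive else pop2   # opponents faced by pop1
        opp2 = pop1 + arch1 if use_archive else pop1
        fit1 = [np.mean([payoff(p, q) for q in opp1]) for p in pop1]
        fit2 = [np.mean([payoff(q, p) for p in opp2]) for q in pop2]
        arch1.append(pop1[int(np.argmax(fit1))])       # archive each generation's best
        arch2.append(pop2[int(np.argmax(fit2))])
        top1 = list(np.argsort(fit1)[::-1][:keep])     # selection + babies
        top2 = list(np.argsort(fit2)[::-1][:keep])
        pop1 = [pop1[i] for i in top1] + [mutate(pop1[i]) for i in top1]
        pop2 = [pop2[i] for i in top2] + [mutate(pop2[i]) for i in top2]
    return arch1[-1], arch2[-1]

print(coevolve())   # both strategies should be near (1/3, 1/3, 1/3)
```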
  79. 79. Reasonably good in very hard cases, but plenty of more specialized algorithms for various cases: alpha-beta, (fitted) Q-learning, TD(lambda), MCTS, ...
  80. 80. I. Multimodal optimization. II. Adversarial / robust cases: a) What does it mean? b) Nash's stuff: computation. c) Robustness stuff: computation. III. Noisy cases.
  81. 81. c) Robustness stuff: computation. ==> We look for argmin_x sup_y f(x,y). ==> Classical optimization for x; classical optimization for y, with restarts.
  82. 82. Naive solution for robust optimization. Procedure Fitness(x): y = maximize f(x, .); return f(x, y). Minimize this fitness.
  83. 83. Warm start for robust optimization ==> much faster. Procedure Fitness(x): static y = y0; y = maximize f(x, .) starting from y; return f(x, y). Minimize this fitness.
  84. 84. Population-based warm start for robust optimization. Procedure Fitness(x): static P = P0; P = multimodal-maximize f(x, P); return max_{y in P} f(x, y). Minimize this fitness.
  85. 85. Population-based warm start for robust optimization; example: f(x,y) = sin(y) + cos((x+y)/pi), x, y in [-10,10]. ==> The global maxima (in y) move quickly with x (==> looks hard), but the local maxima (in y) move slowly as a function of x ==> warm start very efficient.
  86. 86. Population-based warm start for robust optimization, on the same example. ==> Interestingly, we have just rediscovered coevolution :-)
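A sketch of the population-based warm start on the slide's example: the static population P of y-values is re-optimized with a few hill-climbing steps at each call of Fitness, instead of maximizing over y from scratch. The step counts, sigma, and the outer random search over x are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x, y: np.sin(y) + np.cos((x + y) / np.pi)
LO, HI = -10.0, 10.0

P = rng.uniform(LO, HI, 10)          # static population of y's, reused across calls

def fitness(x, steps=20, sigma=0.3):
    """Robust fitness of x: warm-started multimodal maximization over y."""
    global P
    for _ in range(steps):           # a few (1+1) hill-climbing steps per member
        cand = np.clip(P + sigma * rng.standard_normal(P.shape), LO, HI)
        better = f(x, cand) > f(x, P)
        P = np.where(better, cand, P)
    return np.max(f(x, P))           # worst case over the tracked local maxima

# Minimize the robust fitness over x by simple random search (illustrative)
xs = rng.uniform(LO, HI, 200)
best_x = min(xs, key=fitness)
print(best_x, fitness(best_x))
```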
  87. 87. Plots of f(x, .) for various values of x.
  88. 88. Coevolution for Nash: Population 1 vs. Population 2; on both sides, fitness = average fitness against the opponents.
  89. 89. Coevolution for inf sup: Population 1 vs. Population 2; fitness of Population 1 = worst fitness against the opponents; fitness of Population 2 = average fitness against the opponents.
  90. 90. I. Multimodal optimization. II. Adversarial / robust cases. III. Noisy cases.
  91. 91. argmin_x E_y f(x,y). 1) Monte-Carlo estimates. 2) Races: choosing the precision. 3) Using machine learning.
  92. 92. Monte-Carlo estimates: E_y f(x,y) = (1/n) sum_i f(x, y_i) + error, with the error scaling as 1/sqrt(n). ==> How to be faster? ==> How to choose n? (later)
  93. 93. Faster Monte-Carlo evaluation: (a) more points in critical areas (proportionally to the standard deviation; + rescaling!); (b) evaluating f - g, where g has a known expectation and is "close" to f; (c) evaluating (f + f o h)/2 where h is some symmetry.
  94. 94. (a), more points in critical areas, is importance sampling.
  95. 95. (b), evaluating f - g with E g known, is the control-variables method.
  96. 96. (c), evaluating (f + f o h)/2, is antithetic variables.
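A small demo of trick (c), antithetic variables, with h(y) = -y (valid here because y is symmetric around 0): averaging f(y) and f(-y) cancels the odd part of f and cuts the variance at the same cost. The integrand e^y is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda y: np.exp(y)                 # estimate E f(y), y ~ N(0, 1)
n, runs = 10_000, 200

plain, antithetic = [], []
for _ in range(runs):
    y = rng.standard_normal(n)
    plain.append(f(y).mean())
    y2 = rng.standard_normal(n // 2)    # half the draws, each used twice
    antithetic.append(((f(y2) + f(-y2)) / 2).mean())  # (f + f o h)/2 with h = -id

print("plain      std:", np.std(plain))
print("antithetic std:", np.std(antithetic))          # noticeably smaller, same cost
```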
  97. 97. And quasi-Monte-Carlo evaluation? Error ~ log(n)^d / n instead of std/sqrt(n). ==> Much better for n really large (if some smoothness); much worse for d large.
  98. 98. argmin_x E_y f(x,y). 1) Monte-Carlo estimates. 2) Races: choosing the precision. 3) Using machine learning.
  99. 99. Bernstein's inequality (see Audibert et al.), where R = range of the X_i's.
  100. 100. Comparison-based optimization with noise: how to find the best individuals by a race? While (I want to go on) { evaluate each non-discarded arm once; compute a lower and an upper bound for each non-discarded individual (by Bernstein's bound); discard the individuals which are excluded by the bounds. }
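A sketch of such a race, using the empirical Bernstein bound of Audibert et al.; the deviation term sqrt(2 V log(3/delta)/n) + 3 R log(3/delta)/n is the form I recall, so treat it as an assumption. Arms are noisy fitness evaluators and lower is better:

```python
import math
import random

def bernstein_radius(values, R, delta):
    """Empirical Bernstein deviation (assumed form; Audibert et al.)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(2 * var * math.log(3 / delta) / n) + 3 * R * math.log(3 / delta) / n

def race(arms, R=1.0, delta=0.05, max_evals=100_000):
    """Return the index of the best (lowest-mean) arm among noisy evaluators."""
    alive = set(range(len(arms)))
    samples = {i: [] for i in alive}
    evals = 0
    while len(alive) > 1 and evals < max_evals:
        for i in alive:
            samples[i].append(arms[i]())           # one evaluation per living arm
            evals += 1
        bounds = {}
        for i in alive:
            m = sum(samples[i]) / len(samples[i])
            r = bernstein_radius(samples[i], R, delta / len(arms))
            bounds[i] = (m - r, m + r)
        best_ub = min(ub for _, ub in bounds.values())
        alive = {i for i in alive if bounds[i][0] <= best_ub}  # discard dominated arms
    return min(alive, key=lambda i: sum(samples[i]) / len(samples[i]))

# Three noisy fitness values with means 0.3, 0.5, 0.35 (uniform noise)
arms = [lambda m=m: random.random() * 0.6 + m - 0.3 for m in (0.3, 0.5, 0.35)]
print(race(arms))
```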
  101. 101. Trouble: what if two individuals have the same fitness value? ==> Infinite run! Tricks for avoiding that at iteration N: limiting the number of evals by some function of N (e.g. exp(N) if a log-linear convergence is expected w.r.t. the number of iterations); limiting the precision by some function of sigma (e.g. precision = sqrt(sigma) / log(1/sigma)).
  102. 102. argmin_x E_y f(x,y). 1) Monte-Carlo estimates. 2) Races: choosing the precision. 3) Using machine learning.
  103. 103. USING MACHINE LEARNING. Statistical tools: fhat(x) = approximation(x, x1, f(x1), x2, f(x2), ..., xn, f(xn)); y(n+1) = fhat(x(n+1)); e.g. fhat = the quadratic function closest to f on the x(i)'s. And x(n+1) = random, or x(n+1) = maxUncertainty, or x(n+1) = argmin fhat, or x(n+1) = argmax E max(0, bestSoFar - fhat(x)).
  104. 104. The approximation: SVM, Kriging, RBF, Gaussian processes; or fhat(x) = maximum likelihood on a parametric noisy model (e.g. a logistic or quadratic parametric approximation).
  105. 105. x(n+1) = argmin fhat: very fast if the model is good (e.g. quadratic model + noise). x(n+1) = argmax E max(0, bestSoFar - fhat(x)): expensive but efficient; this is Efficient Global Optimization (EGO), for multimodal contexts.
  106. 106. x(n+1) = argmax E max(0, bestSoFar - fhat(x)) is also termed "expected improvement".
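A sketch of this expected-improvement loop with a Gaussian-process surrogate, assuming scikit-learn is available; EI = (best - mu) Phi(z) + sigma phi(z) with z = (best - mu)/sigma is the standard closed form for minimization. The objective, noise level, and grid are illustrative:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.1 * rng.standard_normal(x.shape)  # noisy objective

X = rng.uniform(-2, 2, (8, 1))                   # initial design
y = f(X[:, 0])

for _ in range(20):                              # EGO-style loop
    gp = GaussianProcessRegressor(alpha=0.01).fit(X, y)   # alpha ~ noise level
    cand = np.linspace(-2, 2, 500).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-12)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = cand[np.argmax(ei)]                 # most promising point
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

print("best observed:", X[np.argmin(y), 0], y.min())
```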
  107. 107. So some tools for noisy optimization: 1. introducing races + tricks 2. using models equipped with uncertainty (2. more difficult, but sometimes much more efficient)
  108. 108. Deng et al.: application to DIRECT.
  109. 109. A rectangle j of size alpha(j) is split if, for some K:
  111. 111. Summary: warm starts, robust optimization, coevolution, noisy optimization, niching, Nash equilibria, surrogate models, clearing, maximum uncertainty, multimodal optimization, Monte-Carlo estimates, restarts (sequential / parallel), quasi-Monte-Carlo, koalas and eucalyptus, Van Der Corput, dinosaurs (and other mass extinctions), Halton, Fictitious Play, scrambling, EXP3, EGO, antithetic variables, control variables, importance sampling.
  112. 112. EXERCISES (to be done at home, deadline given soon): 1. Implement a (1+1)-ES and test it on the sphere function f(x) = ||x||^2 in dimension 3. Plot the convergence. NB: choose a scale so that you can check the convergence rate conveniently. 2. Implement a pattern search method (PSM) and test it on the sphere function, also in dimension 3 (choose the same scale as above).
  113. 113. 3. We now consider a varying dimension: x(n,d) will be the nth visited point in dimension d for the (1+1)-ES, and x'(n,d) will be the nth visited point in dimension d for the PSM. For both implementations, choose e.g. n = 5000, and plot x(n,d) and x'(n,d) as a function of d. Choose a convenient scaling as a function of d so that you can see the effect of the dimension. 4. What happens if you consider a quadratic, positive definite, ill-conditioned function instead of the sphere function? E.g. f(x) = sum_i 10^i x_i^2 = 10 x(1)^2 + 100 x(2)^2 + 1000 x(3)^2 in dimension 3; test in dimension 50. 5. (Without implementing.) Suggest a solution for the problem in Ex. 4.
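For exercise 1, a minimal starting point; the log scale on the y-axis makes log-linear convergence show up as a straight line, and the budget, initial step-size, and adaptation constants are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
f = lambda x: np.sum(x**2)              # sphere function

x = rng.standard_normal(3)              # dimension 3
sigma, history = 1.0, []
for n in range(2000):
    y = x + sigma * rng.standard_normal(3)
    fy = f(y)
    if fy <= f(x):
        x = y
        sigma *= 2.0 ** 0.25            # success: increase the step-size
    else:
        sigma *= 2.0 ** (-1 / 16)       # failure: decrease it (1/5th rule flavor)
    history.append(f(x))

plt.semilogy(history)                   # log-linear convergence appears as a line
plt.xlabel("iterations"); plt.ylabel("f(x)")
plt.show()
```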
