
Nature-inspired algorithms

A look at using algorithms inspired by nature to solve optimization problems, and some testing of which algorithms actually perform best.

Published in: Technology

  1. Nature-inspired algorithms. Lars Marius Garshol, lars.marius.garshol@schibsted.com, http://twitter.com/larsga, 2017-09-14, Oslo
  4. The problem
     • You need to tune something: a database, a search engine, a machine learning solution, …
     • Getting good results is important, but there are lots of values to tune, and the effects of tuning them are hard to predict
  5. Database tuning. Source: Automatic database management system tuning through large-scale machine learning, Aken et al., SIGMOD ’17
  6. Find the best pair (chart: effects of MySQL tuning)
  7. Config settings (screenshot)
  11. How to solve
     1. Formulate the problem clearly
     2. Measure results properly
     3. Then try to understand the problem in depth, and/or let your computer find a good solution
  15. Our kind of problem
     • If all your knobs are numeric, and you can measure how good a given set of settings is,
     • then basically you’re trying to find the highest point in a many-dimensional space
     • One dimension per knob, plus one dimension for the evaluation function
  18. The hill-climbing problem (animated chart: explore vs exploit)
  19. A warning!
     • Be very, very careful about the evaluation function
     • Your algorithm will produce a good value for the evaluation function
     • If the function matches poorly with what you actually need, you’re going to work hard to produce something bad …
  25. Genetic algorithm
     • The “original” nature-inspired algorithm (1960s): make n random solutions; evaluate them, throw away the worst, duplicate the best (exploit); add random newcomers (explore); make random changes; repeat
     • Weakness: can’t exploit the structure of numeric problems (no sense of hyperspace)
     • Strength: can solve non-numeric problems; can even write code
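The recipe in the bullets above can be sketched in a few lines of Python. This is a hedged sketch, not the talk's implementation: the population split (keep the best third, add a few random newcomers, mutate copies of the best) and the Gaussian mutation width are illustrative assumptions.

```python
import random

def genetic_search(fitness, dimensions, pop_size=20, generations=50):
    """Minimal genetic algorithm: evaluate, keep the best, add random
    newcomers, mutate copies of the best, repeat."""
    def random_solution():
        return [random.uniform(lo, hi) for (lo, hi) in dimensions]

    def mutate(sol):
        # small Gaussian changes, clamped to the legal range
        return [min(hi, max(lo, x + random.gauss(0, (hi - lo) * 0.05)))
                for x, (lo, hi) in zip(sol, dimensions)]

    population = [random_solution() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best = ranked[:pop_size // 3]                                  # exploit
        newcomers = [random_solution() for _ in range(pop_size // 5)]  # explore
        mutants = [mutate(random.choice(best))
                   for _ in range(pop_size - len(best) - len(newcomers))]
        population = best + mutants + newcomers
    return max(population, key=fitness)

# toy example: maximize -(x - 1)^2 over [0, 2]; the optimum is at x = 1
best = genetic_search(lambda s: -(s[0] - 1) ** 2, [(0.0, 2.0)])
```

Note that nothing here uses the distance between solutions, which is exactly the "no sense of hyperspace" weakness the slide points out.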
  26. Particle-swarm optimization (1995)
     • A swarm of particles explores the search space together: they move around semi-randomly, communicate about what they’ve seen, and are attracted toward high spots in the landscape
  27. PSO, initialization
     • Use 10 + int(2 * math.sqrt(dimensions)) particles
     • Position each particle randomly
     • Give each particle a random velocity
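The initialization step can be sketched directly from these bullets. The dict representation of a particle and the velocity range (uniform within plus/minus the axis range, one common convention) are my assumptions; the particle-count formula is the one on the slide.

```python
import math
import random

def init_swarm(dimensions):
    """Initialize a PSO swarm: particle count derived from the number
    of dimensions, random positions and random velocities."""
    n_particles = 10 + int(2 * math.sqrt(len(dimensions)))
    swarm = []
    for _ in range(n_particles):
        pos = [random.uniform(lo, hi) for (lo, hi) in dimensions]
        # assumed convention: initial velocity within +/- the axis range
        vel = [random.uniform(-(hi - lo), hi - lo) for (lo, hi) in dimensions]
        swarm.append({'pos': pos, 'vel': vel})
    return swarm

swarm = init_swarm([(0.0, 2.0)] * 4)   # 4 knobs gives 10 + int(2 * 2) = 14 particles
```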
  35. PSO, iteration
     • For each particle, in each dimension, update the velocity by adding:
        • old velocity * decay factor
        • random factor * (best position - current position)
        • random factor * (best neighbour position - current position)
     • Velocity tends to decrease as the current position approaches the best position, and as the best and best-neighbour positions converge
  36. Code

      import math
      import numpy as np
      import pso   # the talk's own PSO module

      def f1(x): return 1 + math.sin(2 * np.pi * x)
      def f2a(x): return x ** 3 - 2 * x ** 2 + 1 * x - 1
      def f(x): return f1(x) + f2a(x)

      swarm = pso.Swarm(
          dimensions = [(0.0, 2.0)],
          fitness = lambda x: f(x[0]),
          particles = 5
      )
      for ix in range(20):
          swarm.iterate()
      print(swarm.get_best_ever())
  37. Implementation

      class Particle:
          def __init__(self, dimensions, fitness):
              self._dimensions = dimensions
              self._fitness = fitness
              self._vel = [pick_velocity(min, max)
                           for (min, max) in dimensions]

          def iterate(self):
              # w and c are module-level tuning constants
              for ix in range(len(self._dimensions)):
                  self._vel[ix] = (
                      self._vel[ix] * w +
                      random.uniform(0, c) * (self._prev_best_pos[ix] - self._pos[ix]) +
                      random.uniform(0, c) * (self.neighbourhood_best(ix) - self._pos[ix])
                  )
                  self._pos[ix] += self._vel[ix]
                  self._constrain(ix)
              self._update()
  49. Failure (animated plot sequence): the swarm gets stuck at a local maximum
  58. Success (animated plot sequence): the swarm finds the global maximum
  59. Evaluation: 19 runs each on the toy problem. PSO runs end at either 2.0 (the global maximum) or about 1.141 (the local maximum), average 1.7; random search runs spread between about 1.42 and 2.0, average 1.9.
  60. Enough toy problems. Let’s try a real problem.
  61. Problem: find the duplicates. Table of hotel records from several sources (columns: ID, SOURCE, NAME, ADDRESS1, ADDRESS2, ADDRESS3, CITY, ZIP). The same hotel appears with variant spellings, e.g. “Augustin” / “AUGUSTIN” / “Augustin Hotel” at C. Sundts gate 22(-24), 5004 Bergen; “Bergen Travel” / “Bergen Travel Hotel” at Vestre Torggate 7; “Best Western Hordaheimen” in three variants; “Clarion Admiral” and “Clarion Bergen Airport” likewise.
  62. Configure manually

      PROPERTY   COMPARATOR               LOW    HIGH
      Name       LongestCommonSubstring   0.35   0.88
      Address1   WeightedLevenshtein      0.25   0.65
      Address2   Levenshtein              0.5    0.6
      Email      Exact                    0.49   0.51
      Phone      Exact                    0.45   0.65
      Geopos     Geoposition              0.25   0.6
      Region     Exact                    0.0    0.5

      Threshold: 0.74
  63. Dedup with PSO
     • A 27-dimension problem, really difficult to solve optimally
     • Takes a long time to evaluate solutions
     • No idea what the best possible solution actually is
  64. PSO vs random (chart; green: PSO, blue: random)
  65. Two PSO runs (chart)
  66. 100 PSO runs (chart)
  67. Average of 100 (chart)
  68. PSO vs genetic (chart)
  73. Firefly algorithm (Xin-She Yang, 2010)
     • Fireflies are positioned randomly
     • On each iteration, each firefly jumps toward every other firefly based on how bright it looks: the brighter the other firefly appears, the further toward it our firefly jumps; it only jumps toward fireflies that are brighter than itself
     • Each firefly shines brighter the better its fitness is, but attractiveness falls off with the square of the distance
     • Add random jiggling
  74. How it works
     • The best firefly always stands still, pulling the others toward the best result
     • Bad fireflies get pulled in all directions, but good fireflies get pulled much less: this is exploit vs explore
     • Pull diminishes with distance, so the fireflies don’t necessarily all converge on the same best position, and more local maxima get explored
  75. Firefly code

      def iterate(self):
          # alpha and gamma are module-level tuning constants
          for firefly in self._swarm.get_particles():
              if self._val < firefly._val:
                  dist = self.distance(firefly)
                  attract = firefly._val / (1 + gamma * (dist ** 2))
                  for ix in range(len(self._dimensions)):
                      jiggle = alpha * (random.uniform(0, 1) - 0.5)
                      diff = firefly._pos[ix] - self._pos[ix]
                      self._pos[ix] += jiggle + attract * diff
                      self._constrain(ix)
  76. Evaluation (table taken from Yang 2010). Unit: number of fitness evaluations needed to find the global maximum; lower is better.
  77. Firefly vs PSO (chart)
  78. Cuckoo search (Yang et al., 2010)
     • Very similar to the genetic algorithm: take a candidate, modify it; if better than an existing candidate, replace it
     • Every generation, discard some proportion of the candidates and fill up with random new ones
     • The difference is in how new candidates are produced: using Lévy flights
  79. Lévy flights (chart)
  80. How it works
     • Balance explore and exploit with Lévy flights: usually jump short, sometimes jump long
     • Never replace good candidates with bad ones, but always throw away the n worst
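The "usually short, sometimes long" behaviour comes from the heavy tail of the Lévy distribution. One common way to sample Lévy step lengths (a sketch, assuming Mantegna's algorithm, which cuckoo-search implementations often use; the slides don't say which sampler the talk's code uses) is:

```python
import math
import random

def levy_step(beta=1.5):
    """One Lévy-distributed step length via Mantegna's algorithm:
    mostly short steps, with occasional very long ones."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma)   # numerator: Gaussian with tuned width
    v = random.gauss(0, 1)       # denominator: standard Gaussian
    return u / abs(v) ** (1 / beta)

steps = [abs(levy_step()) for _ in range(10000)]
```

The median step stays small, but the maximum over many samples is large: that occasional long jump is what lets a "rogue cuckoo" escape a local maximum.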
  97. Toy problem again (animated plot sequence): the swarm starts exploring a local maximum, until a rogue cuckoo strikes gold
  98. Evaluation (chart)
  99. Comparison (chart)
  100. My 2015 attempt (chart)
  101. The moral
     • These are stochastic algorithms: one evaluation doesn’t tell you much about the algorithm, and even 10 evaluations isn’t enough
     • Be careful here, or you can fool yourself!
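The point can be made concrete: evaluate a stochastic optimizer by running it many times and looking at the mean and the spread, never at a single run. The stand-in "optimizer" below is illustrative (its two outcomes echo the toy problem's global maximum of 2.0 and local maximum of about 1.14); the real thing would be a full PSO or cuckoo run.

```python
import random
import statistics

def noisy_optimizer():
    """Stand-in for a stochastic optimizer: sometimes finds the
    global maximum (2.0), sometimes only a local one (~1.14)."""
    return 2.0 if random.random() < 0.6 else 1.14

def evaluate(algorithm, runs=40):
    """Run the algorithm many times; report mean and spread."""
    results = [algorithm() for _ in range(runs)]
    return statistics.mean(results), statistics.stdev(results)

mean, spread = evaluate(noisy_optimizer)
```

A single call to `noisy_optimizer()` would report either "perfect" or "mediocre"; only the distribution over many runs describes the algorithm.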
  102. It’s not that simple
     • Which PSO? SPSO 2006, 2007, or 2011? What values to use for the decay factor and randomization? What neighbourhood topology to use?
     • Firefly has parameters alpha, beta, and gamma
     • Cuckoo has alpha and scale
  103. Firefly, different alphas (chart)
  104. So … in order to tune our algorithm, we need to tune the algorithm that tunes our algorithm
  105. Problems
     • Doing one run of cuckoo search takes ~30 minutes, and we need ~40 runs to get a decent estimate
     • My laptop was already getting uncomfortably hot
     • My wife was complaining about the fan noise
     • What to do?
  106. In the cloud (diagram): one master and ten worker nodes; workers poll the master’s /get-task endpoint and report results to /answer
  107. Master algorithm
     • Run PSO on the cuckoo alpha & scale parameters
     • The fitness function sets the task handed out by /get-task, hangs until 20 evaluations have come in via /answer, and returns the average of the evaluations
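The master's control flow can be sketched as follows. The endpoint names /get-task and /answer come from the slides, but the HTTP layer is omitted here and simulated with an in-process queue, so this is an illustrative skeleton, not the real server.

```python
import queue

class Master:
    """Sketch of the master: publish a task, block until enough worker
    evaluations arrive, return their average as the fitness value."""
    def __init__(self, evaluations_needed=20):
        self._needed = evaluations_needed
        self._answers = queue.Queue()
        self.current_task = None      # what /get-task would hand out

    def fitness(self, params):
        self.current_task = params
        # block until enough evaluations have come in via /answer
        results = [self._answers.get() for _ in range(self._needed)]
        return sum(results) / len(results)

    def answer(self, value):          # what the /answer endpoint would do
        self._answers.put(value)

master = Master(evaluations_needed=3)
for v in (1.0, 2.0, 3.0):             # simulate three workers reporting
    master.answer(v)
avg = master.fitness((0.5, 1.5))      # returns the average, 2.0
```

The blocking `Queue.get()` is what makes the PSO loop on the master naturally wait for the distributed workers.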
  109. Sunday morning (chart; annotated: first values I tried)
  110. Monday morning (chart; annotated: first values I tried)
  111. Tuesday morning (chart)
  112. More algorithms
     • Flower-pollination algorithm [Xin-She Yang, 2013]
     • Bat-inspired algorithm [Xin-She Yang, 2010]
     • Ant colony algorithm [Marco Dorigo, 1992]
     • Bee algorithm [Pham et al, 2005]
     • Fish school search [Filho et al, 2007]
     • Artificial bee colony algorithm [Karaboga et al, 2005]
     • …
     (Slide annotations on some of these: “looks interesting, hard to find algorithm”; “paper too vague, can’t implement”; “complicated, didn’t finish implementation”)
  113. But what about these? (the manual configuration again)

      PROPERTY   COMPARATOR               LOW    HIGH
      Name       LongestCommonSubstring   0.35   0.88
      Address1   WeightedLevenshtein      0.25   0.65
      Address2   Levenshtein              0.5    0.6
      Email      Exact                    0.49   0.51
      Phone      Exact                    0.45   0.65
      Geopos     Geoposition              0.25   0.6
      Region     Exact                    0.0    0.5

      Threshold: 0.74
  114. The machine learning way (table): instead of choosing one comparator per property, apply every comparator (c1, c2, c3, c4, …) to every property (Name c1, Name c2, …, Address1 c1, Address1 c2, …), each combination with its own low/high values
  115. Opens a door
     • Means we can drop the probabilities and just use the numeric values coming out of the comparators
     • Feed into one of: random forests, Support Vector Machines (SVM), logistic regression, neural networks, …
     • Except that’s no longer optimization, but attacking the problem directly, so let’s stick with our algorithms
  116. General trick
     • It’s quite common to “cheat” this way to use numeric-only machine learning algorithms: turn a boolean into a [0, 1] parameter, and turn an enumeration into one boolean per value
     • Looks odd, but in general it does work, although there isn’t much spatial structure any more
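The trick above in code form. This is a sketch: the knob names and the `encode` helper are hypothetical, and the simple type-based dispatch (bool before numeric, since `bool` is a subclass of `int` in Python) is just one way to do it.

```python
def encode(settings, enum_values):
    """Turn a mixed config into a purely numeric vector: booleans
    become 0/1, an enumeration becomes one 0/1 slot per value."""
    vec = []
    for value in settings:
        if isinstance(value, bool):             # check bool before numeric!
            vec.append(1.0 if value else 0.0)
        elif isinstance(value, str):            # enumeration member
            vec.extend(1.0 if value == v else 0.0 for v in enum_values)
        else:                                   # already numeric
            vec.append(float(value))
    return vec

# hypothetical knobs: cache size, use_compression, storage engine
vec = encode([512, True, 'innodb'],
             enum_values=['innodb', 'myisam', 'memory'])
# vec is [512.0, 1.0, 1.0, 0.0, 0.0]
```

The one-value-per-slot encoding of the enumeration is exactly "one boolean per value" from the slide; the resulting dimensions carry no spatial structure, which is the caveat the slide ends on.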
  119. Tricky, tricky
     • Our problem now has 267 dimensions, which should allow us to tune for really detailed signals
     • The curse of dimensionality: everywhere is pretty much equally far away from everywhere else; hyperspace consists of all corners and no middle; many of the dimensions contain no signal
  120. How it went (chart)
  121. Another test (chart): a function defined for any number of dimensions (shown here in 2); the number of local minima is d!
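Judging by the code on a later slide (which spells it "michaelwicz"), the test function here appears to be the Michalewicz function. Its standard definition, given below as a sketch under that assumption, is known for steep narrow valleys and a factorially growing number of local minima:

```python
import math

def michalewicz(xs, m=10):
    """Michalewicz test function (minimization): steep valleys, and
    the number of local minima grows factorially with the dimension
    count d. m controls the steepness; m = 10 is the usual choice."""
    return -sum(math.sin(x) * math.sin((i + 1) * x * x / math.pi) ** (2 * m)
                for i, x in enumerate(xs))

# the known 2-D minimum is roughly -1.8013, near x = (2.20, 1.57)
value = michalewicz([2.20, 1.57])
```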
  122. Performance (chart)
  123. Meta-tuning again

      def fitness(pos):
          (firefly.alpha, firefly.gamma) = pos
          averages = crap.run_experiment(
              firefly,
              dimensions = [(0.0, math.pi)] * 16,
              fitness = function.michaelwicz,
              particles = 20,
              problem = 'michaelwicz',
              quiet = True
          )
          return crap.average(averages)

      dimensions = [(0.0, 1.0), (0.0, 10.0)]   # alpha, gamma
      swarm = pso.Swarm(dimensions, fitness, 10)
      crap.evaluate(swarm, 'meta-michaelwicz-firefly')
  124. It’s better … (chart)
  125. Rosenbrock (chart)
  126. Griewangk (chart)
  127. What to choose?
     • PSO: generally performs best; dead easy to implement; parameters available in the literature; no need to scale to the coordinate system used
     • SPSO 2007: values for w and c are given; ring topology, where each particle knows p-1, p, and p+1
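The ring topology mentioned for SPSO 2007 is trivial to write down: each particle's neighbourhood is itself plus the particles on either side, wrapping around at the ends (the function name here is my own; the talk's code may structure this differently).

```python
def ring_neighbours(p, n_particles):
    """Ring topology: particle p's neighbourhood is p-1, p, p+1,
    with wrap-around at both ends of the index range."""
    return [(p - 1) % n_particles, p, (p + 1) % n_particles]

first = ring_neighbours(0, 10)   # wraps to the end: [9, 0, 1]
last = ring_neighbours(9, 10)    # wraps to the start: [8, 9, 0]
```

Because information only spreads one step around the ring per iteration, a good position found by one particle pulls in the others gradually, which helps keep the swarm from collapsing onto one spot too early.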
  128. Simulated annealing
     • Inspired by the behaviour of cooling metals: guaranteed to eventually find the maximum, but with no guarantee that the time taken will be reasonable
     • To use it, you need the following: a candidate-neighbour generation procedure, an acceptance probability function, an annealing schedule, and an initial temperature
     • May work better than PSO and friends, but also requires a lot more effort to set up
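The four ingredients listed above can be seen in a minimal sketch. The specific choices here (Gaussian neighbour moves, the Metropolis acceptance rule, geometric cooling, initial temperature 1.0) are common defaults, not prescriptions from the slides:

```python
import math
import random

def simulated_annealing(fitness, dimensions, steps=5000, t0=1.0):
    """Minimal simulated annealing for maximization."""
    pos = [random.uniform(lo, hi) for (lo, hi) in dimensions]
    best, best_val = pos, fitness(pos)
    current_val = best_val
    for step in range(steps):
        t = t0 * (0.995 ** step)          # annealing schedule (geometric)
        # candidate neighbour: small Gaussian move, clamped to bounds
        cand = [min(hi, max(lo, x + random.gauss(0, (hi - lo) * 0.1)))
                for x, (lo, hi) in zip(pos, dimensions)]
        cand_val = fitness(cand)
        # acceptance rule: always take better moves; take worse moves
        # with a probability that shrinks as the temperature drops
        if (cand_val > current_val or
                random.random() < math.exp((cand_val - current_val) / max(t, 1e-12))):
            pos, current_val = cand, cand_val
            if current_val > best_val:
                best, best_val = pos, current_val
    return best, best_val

best, val = simulated_annealing(lambda s: -(s[0] - 1.5) ** 2, [(0.0, 2.0)])
```

Early on, high temperature means almost any move is accepted (explore); as the temperature decays, the walk becomes nearly greedy (exploit), which is the same trade-off the swarm algorithms balance in their own ways.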
  129. Choices, choices
     • There are always more advanced methods: sometimes they work much better, sometimes not
     • PSO gives you a dead easy place to start: whip it up in a few lines of code and see how it works
     • Vastly better than random trial and error, and adapts nicely to all kinds of problems without tuning
  130. Research papers
     • Publishing standards are probably too lax
     • Algorithm descriptions are weak: far too little information about tuning parameters; no code available
     • Evaluation sections are weak: only one evaluation metric; no information about how PSO/GA were tuned; no information about how the proposed algorithm was tuned; no cross-comparison with other algorithms
  131. See for yourself
     • https://github.com/larsga/py-snippets/tree/master/machine-learning/pso
        • links to all the papers
        • code for crap, genetic, pso, firefly, cuckoo
        • bonus: cuckoo2, server
        • also has the test functions
     • Total number of experiments: 24,008, which means evaluating fitness 2,400,800 times
