
# Nature-inspired algorithms

A look at using algorithms inspired by nature to solve optimization problems, and some testing of which algorithms actually perform best.


1. 1. Nature-inspired algorithms Lars Marius Garshol, lars.marius.garshol@schibsted.com http://twitter.com/larsga 2017–09–14, Oslo
4. 4. The problem • You need to tune something • a database • a search engine • a machine learning solution • … • Getting good results is important • but there are lots of values to tune • the effects of tuning them are hard to predict 2
5. 5. Database tuning 3 Automatic Database Management System Tuning Through Large-scale Machine Learning, Aken et al., SIGMOD ’17
6. 6. Find the best pair 4 Effects of MySQL tuning
7. 7. Config settings 5
11. 11. How to solve 1. Formulate the problem clearly 2. Measure results properly 3. Then • try to understand the problem in depth, and/or • let your computer find a good solution 6
15. 15. Our kind of problem • If • all your knobs are numeric and • you can measure how good a given set of settings is • then • basically you’re trying to find the highest point in a many-dimensional space • One dimension per knob + 1 dimension for evaluation function 7
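A useful baseline for “find the highest point in this space” is plain random search, which the talk compares against later. A minimal sketch; the test function and sample budget here are illustrative assumptions, not from the talk:

```python
import random

def random_search(fitness, dimensions, evaluations=1000):
    """Baseline: sample random points in the search space, keep the best seen.

    dimensions: list of (low, high) bounds, one per knob."""
    best, best_fit = None, float('-inf')
    for _ in range(evaluations):
        point = [random.uniform(lo, hi) for (lo, hi) in dimensions]
        fit = fitness(point)
        if fit > best_fit:
            best, best_fit = point, fit
    return best, best_fit

random.seed(3)
# illustrative 2-D fitness function with its peak at (1, -2)
best, best_fit = random_search(lambda p: -(p[0] - 1) ** 2 - (p[1] + 2) ** 2,
                               [(-5.0, 5.0), (-5.0, 5.0)])
```

Every smarter algorithm below has to beat this to justify its existence.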
18. 18. The hill-climbing problem 8 Explore vs Exploit
19. 19. A warning! • Be very, very careful about the evaluation function • Your algorithm will produce a good value for the evaluation function • If the function matches poorly with what you actually need, you’re going to work hard to produce something bad … 9
25. 25. Genetic algorithm • The “original” nature-inspired algorithm (1960s) • make n random solutions • evaluate them, throw away the worst, duplicate the best • add random newcomers • make random changes, repeat • Weakness: can’t exploit structure of numeric problems • no sense of hyperspace • Strength: can solve non-numeric problems • can even write code 10 Exploit Explore
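The recipe above fits in a few lines of Python. This is a minimal illustrative sketch; the population size, elite fraction, and Gaussian mutation width are my assumptions, not values from the talk:

```python
import random

def genetic_search(fitness, dimensions, population=20, generations=100):
    """Minimal genetic algorithm over numeric vectors (illustrative sketch).

    dimensions: list of (low, high) bounds, one per knob."""
    def random_solution():
        return [random.uniform(lo, hi) for (lo, hi) in dimensions]

    def mutate(sol):
        # change each value a little, staying within the bounds
        return [min(hi, max(lo, v + random.gauss(0, (hi - lo) * 0.1)))
                for (v, (lo, hi)) in zip(sol, dimensions)]

    pop = [random_solution() for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:population * 2 // 5]        # throw away the worst
        children = [mutate(s) for s in survivors]    # duplicate the best, with changes
        newcomers = [random_solution()               # add random newcomers
                     for _ in range(population - len(survivors) - len(children))]
        pop = survivors + children + newcomers
    return max(pop, key=fitness)

random.seed(42)
# maximize an illustrative 1-D function with its peak at x = 3
best = genetic_search(lambda s: -(s[0] - 3) ** 2, [(0.0, 10.0)])
```

Note that nothing here uses the distances between solutions: that is exactly the “no sense of hyperspace” weakness the slide mentions.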
26. 26. Particle-swarm optimization (1995) • A swarm of particles explore the search space together • move around semi-randomly • communicate about what they’ve seen • particles attracted toward high spots in the landscape 11
27. 27. PSO, initialization • Use 10 + int(2 * math.sqrt(dimensions)) particles • Position each particle randomly • Give each particle a random velocity 12
35. 35. PSO, iteration • For each particle, in each dimension, update the velocity by adding • old velocity * decay factor • random factor * (best position - current position) • random factor * (best neighbour position - current position) • Velocity tends to decrease as • current position goes toward best position and • best and best neighbour position converge 13
36. 36. Code

    import math
    import numpy as np
    import pso   # the talk's own PSO module (see the repo link at the end)

    def f1(x):
        return 1 + math.sin(2 * np.pi * x)

    def f2a(x):
        return x ** 3 - 2 * x ** 2 + 1 * x - 1

    def f(x):
        return f1(x) + f2a(x)

    swarm = pso.Swarm(
        dimensions = [(0.0, 2.0)],
        fitness = lambda x: f(x[0]),
        particles = 5
    )
    for ix in range(20):
        swarm.iterate()
    print(swarm.get_best_ever())

14
37. 37. Implementation

    class Particle:
        def __init__(self, dimensions, fitness):
            self._dimensions = dimensions
            self._fitness = fitness
            self._vel = [pick_velocity(min, max)
                         for (min, max) in dimensions]

        def iterate(self):
            # w and c are the SPSO decay (inertia) and acceleration constants
            for ix in range(len(self._dimensions)):
                self._vel[ix] = (
                    self._vel[ix] * w
                    + random.uniform(0, c) * (self._prev_best_pos[ix] - self._pos[ix])
                    + random.uniform(0, c) * (self.neighbourhood_best(ix) - self._pos[ix])
                )
                self._pos[ix] += self._vel[ix]
                self._constrain(ix)
            self._update()

15
38. 38. Failure (animation over slides 38-49: the swarm converges and gets stuck at a local maximum) 16
50. 50. Success (animation over slides 50-58: the swarm finds the global maximum) 17
59. 59. Evaluation 18. PSO (average 1.7; each run either finds the global maximum at 2.0 or gets stuck at the local maximum near 1.141): 1.14104381602 2.0 1.14104055169 1.14104347433 2.0 1.1410413438 2.0 2.0 1.14104338772 2.0 2.0 2.0 2.0 1.14091395333 2.0 2.0 2.0 2.0 2.0. Random (average 1.9): 1.54853512969 1.41797678323 1.95856547819 1.96104312889 1.97134000263 1.77291733994 1.89848029636 1.78709716124 1.87083300112 1.94696318783 1.92667787259 1.88582583083 1.99492413725 1.95307959097 1.84557417213 1.93746433855 1.88898766983 1.80740253329 1.99788027783
60. 60. Enough toy problems Let’s try a real problem
61. 61. Problem: Find the duplicates 20 ID SOURCE NAME ADDRESS1 ADDRESS2 ADDRESS3 CITY ZIP 9354686 1300001 Augustin C. Sundts Gate 22-24 Bergen 5004 8007 9316306 1300006 AUGUSTIN C. SUNDSGATE 22 5004 BERGEN NORWAY BGO 9025453 1300010 Augustin Hotel C. Sundts gate 22 Bergen 5004 9151327 1300010 Basic Hotel Bergen Hakonsgaten 27 Bergen 5015 9150992 1300010 Basic Hotel Marken Kong Oscars gate 45 Bergen 5017 9048595 1300010 Basic Hotel Victoria Kong Oscars Gate 29 Bergen 5017 9151853 1300010 Bergen Bed & Breakfast Hennebysmauet 9 Bergen 5005 9316307 1300006 BERGEN TRAVEL VESTRE TORGGATE 7 5015 BERGEN NORWAY BGO 9062459 1300010 Bergen Travel Hotel Vestre Torggaten 7 Bergen 5015 9010488 1300001 Best Western Hordaheimen C. Sundtsgt. 18 5004 9316314 1300006 BEST WESTERN HORDAHEIMEN C. SUNDTSGATE 18 BERGEN NORWAY 5004 BERGEN BGO 9032340 1300010 Best Western Hotell Hordaheimen C. Sundtsgate 18 Bergen 5004 9362760 1300001 Clarion Admiral C. Sundts Gate 9 P.o.box 252 Bergen 5004 8007 9316308 1300006 CLARION ADMIRAL C. SUNDTS GATE 9 5804 BERGEN NORWAY BGO 9364882 1300001 Clarion Admiral (Fjord View) C. Sundts Gate 9 P.o.box 252 Bergen 5004 8007 9010491 1300001 Clarion Bergen Airport Flyplassveien 555 5869 9363104 1300001 Clarion Bergen Airport Flyplassveien 555 Po Box 24 Bergen No-5869
62. 62. Configure manually 21

| Property | Comparator | Low | High |
| --- | --- | --- | --- |
| Name | LongestCommonSubstring | 0.35 | 0.88 |
| Address1 | WeightedLevenshtein | 0.25 | 0.65 |
| Address2 | Levenshtein | 0.5 | 0.6 |
| Email | Exact | 0.49 | 0.51 |
| Phone | Exact | 0.45 | 0.65 |
| Geopos | Geoposition | 0.25 | 0.6 |
| Region | Exact | 0.0 | 0.5 |

Threshold: 0.74
63. 63. Dedup with PSO • A 27-dimensional problem • Really difficult to solve optimally • Takes a long time to evaluate solutions • No idea what the best possible solution actually is 22
64. 64. PSO vs random 23 Green: PSO Blue: random
65. 65. Two PSO runs 24
66. 66. 100 PSO runs 25
67. 67. Average of 100 26
68. 68. PSO vs genetic 27
73. 73. Firefly algorithm • Fireflies are positioned randomly • On each iteration, each firefly jumps toward every other firefly based on how bright it looks • that is, the brighter the other firefly appears, the further toward it our firefly jumps • it only jumps toward fireflies that are brighter than itself • Each firefly shines brighter the better its fitness is • but attractiveness falls off with square of the distance • Add random jiggling 28 Xin-She Yang, 2010
74. 74. How it works • The best firefly always stands still, pulling the others toward the best result • Bad fireflies get pulled in all directions, but good fireflies get pulled much less • this is exploit vs explore • Pull diminishes with distance, so the fireflies don’t necessarily all explore the same best position • in order to explore more local maxima 29
75. 75. Firefly code

    def iterate(self):
        for firefly in self._swarm.get_particles():
            if self._val < firefly._val:
                dist = self.distance(firefly)
                attract = firefly._val / (1 + gamma * (dist ** 2))
                for ix in range(len(self._dimensions)):
                    jiggle = alpha * (random.uniform(0, 1) - 0.5)
                    diff = firefly._pos[ix] - self._pos[ix]
                    self._pos[ix] = self._pos[ix] + jiggle + (attract * diff)
                    self._constrain(ix)

30
76. 76. Evaluation 31 Unit: number of fitness evaluations needed to find the global maximum (lower is better). Taken from Yang 2010
77. 77. Firefly vs PSO 32
78. 78. Cuckoo search • Very similar to genetic algorithm • take candidate, modify it • if better than existing candidate, replace • Every generation, discard some proportion of candidates • fill up with random new ones • The difference is in how new results are produced • using Lévy flights 33 Yang et al. 2010
79. 79. Lévy flights 34
80. 80. How it works • Balance explore and exploit with Lévy flights • usually jump short, sometimes jump long • Never replace good candidates by bad • but always throw away the n worst 35
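The slides never show what a Lévy-flight step actually looks like in code. A common way to draw one is Mantegna’s algorithm, sketched below; β = 1.5 and the scaling are standard textbook choices, not values taken from the talk:

```python
import math
import random

def levy_step(beta=1.5):
    """Draw one Lévy-flight step via Mantegna's algorithm:
    mostly short steps, with occasional very long jumps."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

random.seed(1)
steps = sorted(abs(levy_step()) for _ in range(10000))
# the heavy tail is what lets a "rogue cuckoo" occasionally jump
# far outside the region the rest of the population is exploring
```

In a cuckoo search, each new candidate would be the old one plus `alpha * levy_step()` in every dimension, with `alpha` scaled to the problem.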
81. 81. Toy problem again (animation over slides 81-97: the cuckoos start exploring the local maximum, until a rogue cuckoo strikes gold at the global maximum) 36
98. 98. Evaluation 37
99. 99. Comparison 38
100. 100. My 2015 attempt 39
101. 101. Moral • These are stochastic algorithms • one evaluation doesn’t tell you much about the algorithm • even 10 evaluations isn’t enough • Be careful here, or you can fool yourself! 40
102. 102. It’s not that simple • Which PSO? • SPSO 2006, 2007, or 2011? • What values to use for decay factor and randomization? • What neighbourhood topology to use? • Firefly has parameters alpha, beta, and gamma • Cuckoo has alpha and scale 41
103. 103. Firefly, different alphas 42
104. 104. So … in order to tune our algorithm we need to tune the algorithm that tunes our algorithm
105. 105. Problems • Doing one run of Cuckoo search takes ~30 minutes • Need to do that ~40 times to get a decent estimate • My laptop was already getting uncomfortably hot • My wife was complaining about the fan noise • What to do? 44
106. 106. In the cloud 45 (diagram: one Master and ten Worker nodes, communicating over /get-task and /answer)
107. 107. Master algorithm • Run PSO on Cuckoo alpha & scale • Fitness function • sets the task handed out by /get-task • hangs until 20 evaluations have come in via /answer • returns average of evaluations 46
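A minimal sketch of what the master’s queue logic might look like. The /get-task and /answer endpoints and the 20-evaluation average come from the slides; everything else (JSON payloads, queue handling, handler names) is an assumption:

```python
import json
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

tasks = queue.Queue()     # parameter sets waiting to be evaluated by workers
answers = queue.Queue()   # fitness values reported back by workers

class MasterHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/get-task':
            task = tasks.get()                 # hand a parameter set to a worker
            self.send_response(200)
            self.end_headers()
            self.wfile.write(json.dumps(task).encode())

    def do_POST(self):
        if self.path == '/answer':
            length = int(self.headers['Content-Length'])
            answers.put(json.loads(self.rfile.read(length)))
            self.send_response(200)
            self.end_headers()

def fitness(params, evaluations=20):
    """Queues the task, hangs until enough answers arrive, returns the average."""
    for _ in range(evaluations):
        tasks.put(params)
    values = [answers.get() for _ in range(evaluations)]
    return sum(values) / len(values)

# serving would be: HTTPServer(('', 8080), MasterHandler).serve_forever()
```

The PSO on the master then calls `fitness` like any other fitness function; the blocking queue hides all the distribution.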
109. 109. Sunday morning 47 First values I tried
110. 110. Monday morning 48 First values I tried
111. 111. Tuesday morning 49
112. 112. More algorithms • Flower-pollination algorithm [Xin-She Yang 2013] • Bat-inspired algorithm [Xin-She Yang, 2010] • Ant colony algorithm [Marco Dorigo, 1992] • Bee algorithm [Pham et al 2005] • Fish school search [Filho et al 2007] • Artificial bee colony algorithm [Karaboga et al 2005] • … 50 (slide callouts: “Looks interesting, hard to find algorithm”; “Paper too vague, can’t implement”; “Complicated, didn’t finish implementation”)
113. 113. 51

| Property | Comparator | Low | High |
| --- | --- | --- | --- |
| Name | LongestCommonSubstring | 0.35 | 0.88 |
| Address1 | WeightedLevenshtein | 0.25 | 0.65 |
| Address2 | Levenshtein | 0.5 | 0.6 |
| Email | Exact | 0.49 | 0.51 |
| Phone | Exact | 0.45 | 0.65 |
| Geopos | Geoposition | 0.25 | 0.6 |
| Region | Exact | 0.0 | 0.5 |

Threshold: 0.74. But what about these?
114. 114. The machine learning way 52

| Property | Comparator | Low | High |
| --- | --- | --- | --- |
| Name | c1 | 0.35 | 0.88 |
| Name | c2 | 0.25 | 0.65 |
| Name | c3 | 0.5 | 0.6 |
| Name | c4 | 0.49 | 0.51 |
| … | … | 0.45 | 0.65 |
| Address1 | c1 | 0.25 | 0.6 |
| Address1 | c2 | 0.0 | 0.5 |
| Address1 | c3 | … | … |
| Address1 | c4 | … | … |
| … | … | … | … |
115. 115. Opens a door • Means we can drop the probabilities, and just use the numeric values coming out of the comparators • Feed into one of • random forests • Support Vector Machine (SVM) • logistic regression • neural networks • … • Except that’s no longer optimization, but attacking the problem directly, so let’s stick with our algorithms 53
116. 116. General trick • Quite common to “cheat” this way to use numeric- only machine learning algorithms • Turn boolean into [0, 1] parameter • Turn enumeration into one boolean per value • Looks odd, but in general it does work • of course, now there isn’t much spatial structure any more 54
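A sketch of the two encoding tricks. The knob values and comparator names are illustrative, and decoding the enumeration by taking the largest of its knobs (rather than one threshold per value) is my assumption:

```python
def encode_boolean(value):
    """Boolean knob as a [0, 1] parameter: anything >= 0.5 counts as true."""
    return value >= 0.5

def encode_enum(knobs, values):
    """Enumeration as one numeric knob per value; the largest knob wins.
    (One-hot-style trick; the argmax decoding is an assumption here.)"""
    best = max(range(len(values)), key=lambda i: knobs[i])
    return values[best]

# a particle position with one boolean knob and three enumeration knobs
position = [0.7, 0.1, 0.9, 0.3]
use_cache = encode_boolean(position[0])
strategy = encode_enum(position[1:], ['exact', 'levenshtein', 'soundex'])
```

The optimizer only ever sees a flat numeric vector; the decoding happens inside the fitness function, which is also why the spatial structure gets weaker.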
119. 119. Tricky, tricky • Our problem now has 267 dimensions • should allow us to tune for really detailed signals • The curse of dimensionality • everywhere is pretty much equally far away from everywhere else • hyperspace consists of all corners and no middle • many of the dimensions contain no signal 55
120. 120. How it went 56
121. 121. Another test 57 (works in any number of dimensions, shown here in 2; number of local minima is d!)
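The slide doesn’t name the function, but the meta-tuning code later refers to `function.michaelwicz`, which matches the standard Michalewicz test function: a sketch below, where m = 10 is the conventional steepness parameter (an assumption here):

```python
import math

def michalewicz(xs, m=10):
    """Michalewicz test function: steep, narrow valleys, roughly d! local
    minima in d dimensions; usually minimized over [0, pi] per dimension."""
    return -sum(math.sin(x) * math.sin((i + 1) * x * x / math.pi) ** (2 * m)
                for (i, x) in enumerate(xs))

# the known 2-D minimum is about -1.8013, near (2.20, 1.57)
value = michalewicz([2.20, 1.57])
```

To use it with a maximizing swarm, the fitness would simply be `-michalewicz(xs)`.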
122. 122. Performance 58
123. 123. Meta-tuning again

    def fitness(pos):
        (firefly.alpha, firefly.gamma) = pos
        averages = crap.run_experiment(
            firefly,
            dimensions = [(0.0, math.pi)] * 16,
            fitness = function.michaelwicz,
            particles = 20,
            problem = 'michaelwicz',
            quiet = True
        )
        return crap.average(averages)

    dimensions = [(0.0, 1.0), (0.0, 10.0)]  # alpha, gamma
    swarm = pso.Swarm(dimensions, fitness, 10)
    crap.evaluate(swarm, 'meta-michaelwicz-firefly')

59
124. 124. It’s better… 60
125. 125. Rosenbrock 61
126. 126. Griewangk 62
127. 127. What to choose? • PSO • generally performs best • dead easy to implement • parameters available in the literature • no need to scale to coordinate system used • SPSO 2007 • values for w and c are given • ring topology: each particle knows p-1, p, p+1 63
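The ring topology mentioned above is simple to sketch: particle p is informed by p-1, p, and p+1, wrapping around the ends of the particle list. The helper names and example values below are illustrative, and SPSO 2007 has further details this sketch skips:

```python
def ring_neighbours(i, n):
    """SPSO-2007-style ring: particle i is informed by i-1, i, i+1 (wrapping)."""
    return [(i - 1) % n, i, (i + 1) % n]

def neighbourhood_best(i, positions, values):
    """Best-known position among particle i's ring neighbours (sketch)."""
    best = max(ring_neighbours(i, len(positions)), key=lambda j: values[j])
    return positions[best]

# four particles: values[j] is the best fitness particle j has seen so far
positions = [[0.1], [0.4], [0.9], [0.2]]
values = [0.1, 0.9, 0.3, 0.7]
informer = neighbourhood_best(3, positions, values)
```

Because information only travels one step around the ring per iteration, good solutions spread slowly, which is what keeps the whole swarm from piling onto one local maximum.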
128. 128. Simulated annealing • Inspired by the behaviour of cooling metals • guaranteed to eventually find the maximum • no guarantee that the time taken will be reasonable • To use, requires the following parameters • Candidate neighbour generation procedure • Acceptance probability function • Annealing schedule • Initial temperature • May work better than PSO and friends, but also requires a lot more effort to set up 64
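A minimal sketch showing the four parameters in play. The Metropolis acceptance rule and geometric cooling schedule are the textbook choices, and every constant here is an illustrative assumption, not anything from the talk:

```python
import math
import random

def simulated_annealing(fitness, start, neighbour,
                        temp=1.0, cooling=0.995, steps=5000):
    """Minimal simulated annealing for maximization (illustrative sketch).

    neighbour: the candidate-generation procedure
    acceptance: Metropolis criterion (worse moves accepted with shrinking odds)
    schedule: geometric cooling, starting at `temp`"""
    current, current_fit = start, fitness(start)
    best, best_fit = current, current_fit
    for _ in range(steps):
        cand = neighbour(current)
        cand_fit = fitness(cand)
        delta = cand_fit - current_fit
        # always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops
        if delta >= 0 or random.random() < math.exp(delta / temp):
            current, current_fit = cand, cand_fit
            if current_fit > best_fit:
                best, best_fit = current, current_fit
        temp *= cooling
    return best

random.seed(7)
# maximize an illustrative 1-D function with its peak at x = 2
result = simulated_annealing(lambda x: -(x - 2) ** 2,
                             start=0.0,
                             neighbour=lambda x: x + random.gauss(0, 0.5))
```

Notice how every one of the four parameters shows up as an argument: that is the setup cost the slide is warning about.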
129. 129. Choices, choices • There are always more advanced methods • sometimes they work much better, sometimes not • PSO gives you a dead easy place to start • whip it up in a few lines of code • see how it works • vastly better than random trial and error • adapts nicely to all kinds of problems without tuning 65
130. 130. Research papers • Publishing standards are probably too lax • Algorithm descriptions are weak • far too little information about tuning parameters • no code available • Evaluation sections are weak • only one evaluation metric • no information about how PSO/GA were tuned • no information about how proposed algorithm was tuned • no cross-comparison with other algorithms 66
131. 131. See for yourself • https://github.com/larsga/py-snippets/tree/master/machine-learning/pso • links to all the papers • code for crap, genetic, pso, firefly, cuckoo • bonus: cuckoo2, server • also has the test functions • Total number of experiments: 24,008 • means evaluating fitness 2,400,800 times 67