Nature-inspired algorithms
Lars Marius Garshol, lars.marius.garshol@schibsted.com
http://twitter.com/larsga
2017–09–14, Oslo
The problem
• You need to tune something
• a database
• a search engine
• a machine learning solution
• …
• Getting good results is important
• but there are lots of values to tune
• the effects of tuning them are hard to predict
Database tuning
Automatic database management system tuning through large-scale machine learning, Aken et al., SIGMOD ’17
Find the best pair (figure)
Effects of MySQL tuning: config settings (figure)
How to solve
1. Formulate the problem clearly
2. Measure results properly
3. Then
• try to understand the problem in depth, and/or
• let your computer find a good solution
Our kind of problem
• If
• all your knobs are numeric and
• you can measure how good a given set of settings is
• then
• basically you’re trying to find the highest point in a many-dimensional space
• One dimension per knob + 1 dimension for the evaluation function
The hill-climbing problem (figures)
Explore vs. exploit
A warning!
• Be very, very careful about the evaluation function
• Your algorithm will produce a good value for the evaluation function
• If the function matches poorly with what you actually need, you’re going to work hard to produce something bad …
Genetic algorithm
• The “original” nature-inspired algorithm (1960s)
• make n random solutions
• evaluate them, throw away the worst, duplicate the best (exploit)
• add random newcomers (explore)
• make random changes, repeat
• Weakness: can’t exploit the structure of numeric problems
• no sense of hyperspace
• Strength: can solve non-numeric problems
• can even write code
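A minimal sketch of that loop, in Python; the fitness signature, population size, and the Gaussian mutation step are illustrative choices of mine, not the talk's code.

import random

def genetic_search(fitness, dimensions, pop_size=30, generations=100):
    # dimensions: list of (low, high) ranges; fitness: higher is better
    def random_candidate():
        return [random.uniform(lo, hi) for (lo, hi) in dimensions]

    def mutate(cand):
        # random changes: jiggle each value a little, staying inside its range
        return [min(hi, max(lo, v + random.gauss(0, (hi - lo) * 0.05)))
                for v, (lo, hi) in zip(cand, dimensions)]

    population = [random_candidate() for _ in range(pop_size)]
    for _ in range(generations):
        # evaluate, throw away the worst, duplicate (and mutate) the best
        population.sort(key=fitness, reverse=True)
        best = population[: pop_size // 3]
        population = best + [mutate(c) for c in best]
        # add random newcomers until the population is full again
        while len(population) < pop_size:
            population.append(random_candidate())
    return max(population, key=fitness)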
Particle-swarm optimization (1995)
• A swarm of particles explore the search space together
• move around semi-randomly
• communicate about what they’ve seen
• particles attracted toward high spots in the landscape
PSO, initialization
• Use 10 + int(2 * math.sqrt(dimensions)) particles
• Position each particle randomly
• Give each particle a random velocity
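A sketch of what that initialization could look like; scaling the initial velocity to the width of each dimension is an assumption of mine, not something taken from the talk.

import math
import random

def init_swarm(dimensions):
    # dimensions: list of (low, high) ranges, one per knob
    n = 10 + int(2 * math.sqrt(len(dimensions)))   # recommended swarm size
    swarm = []
    for _ in range(n):
        pos = [random.uniform(lo, hi) for (lo, hi) in dimensions]
        # random initial velocity, scaled to the width of each dimension
        vel = [random.uniform(-(hi - lo), hi - lo) for (lo, hi) in dimensions]
        swarm.append({'pos': pos, 'vel': vel})
    return swarm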
PSO, iteration
• For each particle, in each dimension, update the velocity by adding
• old velocity * decay factor
• random factor * (best position - current position)
• random factor * (best neighbour position - current position)
• Velocity tends to decrease as
• current position goes toward best position and
• best and best neighbour position converge
Code

import math
import numpy as np
import pso   # the author's PSO module (see the repo linked at the end)

def f1(x):
    return 1 + math.sin(2 * np.pi * x)

def f2a(x):
    return x ** 3 - 2 * x ** 2 + 1 * x - 1

def f(x):
    return f1(x) + f2a(x)

swarm = pso.Swarm(
    dimensions = [(0.0, 2.0)],
    fitness = lambda x: f(x[0]),
    particles = 5
)
for ix in range(20):
    swarm.iterate()
print(swarm.get_best_ever())
Implementation

import random
# Excerpt: w, c, pick_velocity and the particle's position / personal-best
# attributes are set up elsewhere in the module.

class Particle:
    def __init__(self, dimensions, fitness):
        self._dimensions = dimensions
        self._fitness = fitness
        self._vel = [pick_velocity(min, max) for (min, max) in dimensions]

    def iterate(self):
        for ix in range(len(self._dimensions)):
            self._vel[ix] = (
                self._vel[ix] * w +
                random.uniform(0, c) * (self._prev_best_pos[ix] - self._pos[ix]) +
                random.uniform(0, c) * (self.neighbourhood_best(ix) - self._pos[ix])
            )
            self._pos[ix] += self._vel[ix]
            self._constrain(ix)
        self._update()
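The excerpt above calls a few helpers it doesn't show: pick_velocity, neighbourhood_best, _constrain and _update, plus the constants w and c. Below is my reconstruction of what they might look like, written as standalone functions and assuming the ring topology mentioned near the end of the talk; the real versions are in the linked repo.

import random

w = 0.7   # velocity decay factor (assumed value)
c = 1.4   # upper bound on the random attraction factors (assumed value)

def pick_velocity(low, high):
    # initial velocity, scaled to the width of the dimension
    return random.uniform(-(high - low), high - low)

def neighbourhood_best(particle, ix):
    # ring topology: the particle plus its two neighbours; return the position
    # (in dimension ix) of the best personal best among them
    best = max(particle._neighbours + [particle],
               key=lambda p: p._prev_best_val)
    return best._prev_best_pos[ix]

def constrain(particle, ix):
    # clip the position back inside the allowed range for this dimension
    low, high = particle._dimensions[ix]
    particle._pos[ix] = min(high, max(low, particle._pos[ix]))

def update(particle):
    # evaluate the new position and remember it if it is a personal best
    val = particle._fitness(particle._pos)
    if val > particle._prev_best_val:
        particle._prev_best_val = val
        particle._prev_best_pos = list(particle._pos)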
Failure (animation): the swarm gets stuck at a local maximum
Success (animation)
Evaluation (results of repeated runs on the toy function)
PSO (average 1.7): most runs hit the global maximum (2.0) exactly, but several got stuck at the local maximum (≈1.141)
Random search (average 1.9): runs spread between ≈1.42 and ≈2.0, close to but never exactly at the global maximum
Enough toy problems
Let’s try a real problem
Problem: Find the duplicates
ID SOURCE NAME ADDRESS1 ADDRESS2 ADDRESS3 CITY ZIP
9354686 1300001 Augustin C. Sundts Gate 22-24 Bergen 5004 8007
9316306 1300006 AUGUSTIN C. SUNDSGATE 22 5004 BERGEN NORWAY BGO
9025453 1300010 Augustin Hotel C. Sundts gate 22 Bergen 5004
9151327 1300010 Basic Hotel Bergen Hakonsgaten 27 Bergen 5015
9150992 1300010 Basic Hotel Marken Kong Oscars gate 45 Bergen 5017
9048595 1300010 Basic Hotel Victoria Kong Oscars Gate 29 Bergen 5017
9151853 1300010 Bergen Bed & Breakfast Hennebysmauet 9 Bergen 5005
9316307 1300006 BERGEN TRAVEL VESTRE TORGGATE 7 5015 BERGEN NORWAY BGO
9062459 1300010 Bergen Travel Hotel Vestre Torggaten 7 Bergen 5015
9010488 1300001 Best Western Hordaheimen C. Sundtsgt. 18 5004
9316314 1300006 BEST WESTERN HORDAHEIMEN C. SUNDTSGATE 18 BERGEN NORWAY 5004 BERGEN BGO
9032340 1300010 Best Western Hotell Hordaheimen C. Sundtsgate 18 Bergen 5004
9362760 1300001 Clarion Admiral C. Sundts Gate 9 P.o.box 252 Bergen 5004 8007
9316308 1300006 CLARION ADMIRAL C. SUNDTS GATE 9 5804 BERGEN NORWAY BGO
9364882 1300001 Clarion Admiral (Fjord View) C. Sundts Gate 9 P.o.box 252 Bergen 5004 8007
9010491 1300001 Clarion Bergen Airport Flyplassveien 555 5869
9363104 1300001 Clarion Bergen Airport Flyplassveien 555 Po Box 24 Bergen No-5869
Configure manually
PROPERTY COMPARATOR LOW HIGH
Name LongestCommonSubstring 0.35 0.88
Address1 WeightedLevenshtein 0.25 0.65
Address2 Levenshtein 0.5 0.6
Email Exact 0.49 0.51
Phone Exact 0.45 0.65
Geopos Geoposition 0.25 0.6
Region Exact 0.0 0.5
Threshold: 0.74
Dedup with PSO
• A 27-dimensional problem
• Really difficult to solve optimally
• Takes a long time to evaluate solutions
• No idea what the best possible solution actually is
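To make the PSO connection concrete: every low/high probability plus the overall threshold becomes one dimension, and the fitness function runs the deduplicator with those settings and scores the result. A sketch under assumed names: run_dedup is a hypothetical stand-in for the actual dedup run, and using F1 as the score is my assumption, not something the talk states.

import pso   # the author's PSO module from the linked repo

dimensions = [(0.0, 1.0)] * 27   # one range per probability/threshold knob

def fitness(settings):
    # run_dedup: hypothetical helper that runs deduplication with these
    # settings and compares the output against the known duplicates
    precision, recall = run_dedup(settings)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)   # F1 (assumed metric)

swarm = pso.Swarm(dimensions = dimensions, fitness = fitness,
                  particles = 10 + int(2 * 27 ** 0.5))
for ix in range(50):
    swarm.iterate()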
PSO vs random (figure; green: PSO, blue: random)
Two PSO runs (figure)
100 PSO runs (figure)
Average of 100 runs (figure)
PSO vs genetic (figure)
Firefly algorithm (Xin-She Yang, 2010)
• Fireflies are positioned randomly
• On each iteration, each firefly jumps toward every other firefly based on how bright it looks
• that is, the brighter the other firefly appears, the further toward it our firefly jumps
• it only jumps toward fireflies that are brighter than itself
• Each firefly shines brighter the better its fitness is
• but attractiveness falls off with the square of the distance
• Add random jiggling
How it works
• The best firefly always stands still, pulling the others toward the best result
• Bad fireflies get pulled in all directions, but good fireflies get pulled much less
• this is exploit vs explore
• Pull diminishes with distance, so the fireflies don’t necessarily all explore the same best position
• in order to explore more local maxima
Firefly code

def iterate(self):
    # alpha (jiggle size) and gamma (how quickly attraction falls off with
    # distance) are parameters defined elsewhere in the module
    for firefly in self._swarm.get_particles():
        if self._val < firefly._val:   # only move toward brighter fireflies
            dist = self.distance(firefly)
            attract = firefly._val / (1 + gamma * (dist ** 2))
            for ix in range(len(self._dimensions)):
                jiggle = alpha * (random.uniform(0, 1) - 0.5)
                diff = firefly._pos[ix] - self._pos[ix]
                self._pos[ix] = self._pos[ix] + jiggle + (attract * diff)
                self._constrain(ix)
Evaluation (table from Yang 2010): number of fitness evaluations needed to find the global maximum; lower is better
Firefly vs PSO (figure)
Cuckoo search (Yang et al. 2010)
• Very similar to a genetic algorithm
• take a candidate, modify it
• if better than the existing candidate, replace it
• Every generation, discard some proportion of candidates
• fill up with random new ones
• The difference is in how new candidates are produced
• using Lévy flights
Lévy flights (figure)
How it works
• Balance explore and exploit with Lévy flights
• usually jump short, sometimes jump long
• Never replace good candidates by bad
• but always throw away the n worst
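What makes the flights "Lévy" is the heavy-tailed step length: mostly short hops, occasionally a very long jump. One common way to draw such steps is Mantegna's algorithm, sketched below; beta = 1.5 and the alpha scaling are standard choices of mine, not values from the talk.

import math
import random

def levy_step(beta=1.5):
    # one Lévy-distributed step length, via Mantegna's algorithm
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def new_candidate(current, dimensions, alpha=0.01):
    # move the current candidate by a Lévy flight in each dimension,
    # scaled by alpha and clipped back into the allowed range
    new = []
    for value, (low, high) in zip(current, dimensions):
        step = alpha * (high - low) * levy_step()
        new.append(min(high, max(low, value + step)))
    return new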
Toy problem again (animation): the swarm starts exploring the local maximum, until a rogue cuckoo strikes gold
Evaluation (figure)
Comparison (figure)
My 2015 attempt (figure)
Moral
• These are stochastic algorithms
• one evaluation doesn’t tell you much about the algorithm
• even 10 evaluations aren’t enough
• Be careful here, or you can fool yourself!
It’s not that simple
• Which PSO?
• SPSO 2006, 2007, or 2011?
• What values to use for decay factor and randomization?
• What neighbourhood topology to use?
• Firefly has parameters alpha, beta, and gamma
• Cuckoo has alpha and scale
Firefly, different alphas (figure)
So … in order to tune our algorithm we need to tune the algorithm that tunes our algorithm
Problems
• Doing one run of Cuckoo search takes ~30 minutes
• Need to do that ~40 times to get a decent estimate
• My laptop was already getting uncomfortably hot
• My wife was complaining about the fan noise
• What to do?
In the cloud (diagram): a master node and ten worker nodes, communicating over /get-task and /answer
Master algorithm
• Run PSO on Cuckoo alpha & scale
• Fitness function
• sets the task handed out by /get-task
• hangs until 20 evaluations have come in via /answer
• returns average of evaluations
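A sketch of that fitness function on the master: it publishes the candidate (alpha, scale) as the current task and then blocks until 20 evaluations have come back, returning their average. Backing /get-task and /answer with a shared queue is my assumption about the wiring; the actual server code is in the linked repo.

import queue   # Python 3 (the module is called Queue in Python 2)

answers = queue.Queue()   # the /answer handler puts worker results here
current_task = None       # what the /get-task handler hands out

def fitness(pos):
    global current_task
    alpha, scale = pos
    current_task = {'alpha': alpha, 'scale': scale}
    # block until 20 evaluations have come in via /answer
    results = [answers.get() for _ in range(20)]
    return sum(results) / len(results)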
Sunday morning (figure; the first values I tried are marked)
Monday morning (figure)
Tuesday morning (figure)
More algorithms
• Flower-pollination algorithm [Xin-She Yang 2013]
• Bat-inspired algorithm [Xin-She Yang, 2010]
• Ant colony algorithm [Marco Dorigo, 1992]
• Bee algorithm [Pham et al 2005]
• Fish school search [Filho et al 2007]
• Artificial bee colony algorithm [Karaboga et al 2005]
• …
Looks interesting, hard to find algorithm
Paper too vague, can’t implement
Complicated, didn’t finish implementation
PROPERTY COMPARATOR LOW HIGH
Name LongestCommonSubstring 0.35 0.88
Address1 WeightedLevenshtein 0.25 0.65
Address2 Levenshtein 0.5 0.6
Email Exact 0.49 0.51
Phone Exact 0.45 0.65
Geopos Geoposition 0.25 0.6
Region Exact 0.0 0.5
Threshold: 0.74
But what about these?
The machine learning way
PROPERTY COMPARATOR LOW HIGH
Name c1 c1 0.35 0.88
Name c2 c2 0.25 0.65
Name c3 c3 0.5 0.6
Name c4 c4 0.49 0.51
… … 0.45 0.65
Address1 c1 c1 0.25 0.6
Address1 c2 c2 0.0 0.5
Address1 c3 c3 … …
Address1 c4 c4 … …
… … … …
Opens a door
• Means we can drop the probabilities, and just use the
numeric values coming out of the comparators
• Feed into one of
• random forests
• Support Vector Machine (SVM)
• logistic regression
• neural networks
• …
• Except that’s no longer optimization, but attacking the
problem directly, so let’s stick with our algorithms
General trick
• Quite common to “cheat” this way to use numeric-only machine learning algorithms
• Turn boolean into [0, 1] parameter
• Turn enumeration into one boolean per value
• Looks odd, but in general it does work
• of course, now there isn’t much spatial structure any more
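As an illustration of that trick: a boolean knob becomes one [0, 1] dimension that gets thresholded, and an enumeration becomes one dimension per possible value, picking whichever scores highest. The knob names below are made up.

def decode(position):
    # boolean knob encoded as a [0, 1] value, thresholded at 0.5
    use_cache = position[0] >= 0.5
    # enumeration knob: one dimension per value, pick the largest
    strategies = ['exact', 'fuzzy', 'phonetic']
    strategy = strategies[max(range(3), key=lambda i: position[1 + i])]
    return {'use_cache': use_cache, 'strategy': strategy}

# four numeric dimensions encode one boolean and one three-valued enumeration
print(decode([0.8, 0.1, 0.7, 0.3]))   # {'use_cache': True, 'strategy': 'fuzzy'}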
Tricky, tricky
• Our problem now has 267 dimensions
• should allow us to tune for really detailed signals
• The curse of dimensionality
• everywhere is pretty much equally far away from everywhere else
• hyperspace consists of all corners and no middle
• many of the dimensions contain no signal
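One quick way to see the "equally far away" effect for yourself: sample random points in a unit hypercube and compare the nearest and farthest pairwise distances as the dimension count grows. A small experiment of my own, not from the talk:

import math
import random

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for (x, y) in zip(a, b)))

for dims in (2, 27, 267):
    points = [[random.random() for _ in range(dims)] for _ in range(200)]
    dists = [distance(points[i], points[j])
             for i in range(len(points)) for j in range(i + 1, len(points))]
    # as the dimension grows, the nearest and farthest pairs get relatively
    # closer to each other: the contrast between distances shrinks
    print(dims, round(min(dists), 2), round(max(dists), 2))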
How it went (figure)
Another test (figure): works in any number of dimensions, this plot shows 2; the number of local minima is d!
Performance (figure)
Meta-tuning again

import math
# crap, firefly, function and pso are modules from the author's repo
import crap, firefly, function, pso

def fitness(pos):
    (firefly.alpha, firefly.gamma) = pos
    averages = crap.run_experiment(
        firefly,
        dimensions = [(0.0, math.pi)] * 16,
        fitness = function.michaelwicz,
        particles = 20,
        problem = 'michaelwicz',
        quiet = True
    )
    return crap.average(averages)

dimensions = [(0.0, 1.0), (0.0, 10.0)]  # alpha, gamma
swarm = pso.Swarm(dimensions, fitness, 10)
crap.evaluate(swarm, 'meta-michaelwicz-firefly')
It’s better… (figure)
Rosenbrock (figure)
Griewangk (figure)
What to choose?
• PSO
• generally performs best
• dead easy to implement
• parameters available in the literature
• no need to scale to coordinate system used
• SPSO 2007
• values for w and c are given
• ring topology: each particle knows p-1, p, p+1
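For reference, the parameter values usually quoted for SPSO 2007 are the ones below; treat them as what I remember from the SPSO descriptions rather than something stated in the talk, and check the original papers before relying on them.

import math

w = 1.0 / (2.0 * math.log(2))   # ≈ 0.721, velocity decay factor
c = 0.5 + math.log(2)           # ≈ 1.193, used for both attraction terms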
Simulated annealing
• Inspired by the behaviour of cooling metals
• guaranteed to eventually find the maximum
• no guarantee that the time taken will be reasonable
• To use it, you need the following parameters
• Candidate neighbour generation procedure
• Acceptance probability function
• Annealing schedule
• Initial temperature
• May work better than PSO and friends, but also requires a lot more effort to set up
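A minimal simulated-annealing sketch showing where each of those four ingredients goes; the Gaussian neighbour step, the Metropolis acceptance rule and the geometric cooling schedule are textbook defaults I picked for illustration.

import math
import random

def anneal(fitness, dimensions, initial_temp=1.0, cooling=0.995, steps=10000):
    current = [random.uniform(lo, hi) for (lo, hi) in dimensions]
    current_val = fitness(current)
    temp = initial_temp   # initial temperature
    for _ in range(steps):
        # candidate neighbour generation: a small Gaussian step per dimension
        candidate = [min(hi, max(lo, v + random.gauss(0, (hi - lo) * 0.05)))
                     for v, (lo, hi) in zip(current, dimensions)]
        candidate_val = fitness(candidate)
        # acceptance probability: always accept improvements, sometimes accept
        # worse candidates, and less often as the temperature drops
        if (candidate_val > current_val or
                random.random() < math.exp((candidate_val - current_val) / temp)):
            current, current_val = candidate, candidate_val
        temp *= cooling   # annealing schedule
    return current, current_val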
Choices, choices
• There are always more advanced methods
• sometimes they work much better, sometimes not
• PSO gives you a dead easy place to start
• whip it up in a few lines of code
• see how it works
• vastly better than random trial and error
• adapts nicely to all kinds of problems without tuning
Research papers
• Publishing standards are probably too lax
• Algorithm descriptions are weak
• far too little information about tuning parameters
• no code available
• Evaluation sections are weak
• only one evaluation metric
• no information about how PSO/GA were tuned
• no information about how proposed algorithm was tuned
• no cross-comparison with other algorithms
See for yourself
• https://github.com/larsga/py-snippets/tree/master/machine-learning/pso
• links to all the papers
• code for crap, genetic, pso, firefly, cuckoo
• bonus: cuckoo2, server
• also has the test functions
• Total number of experiments: 24,008
• means evaluating fitness 2,400,800 times