Nature-inspired algorithms
Lars Marius Garshol, lars.marius.garshol@schibsted.com
http://twitter.com/larsga
2017–09–14, Oslo
The problem
• You need to tune something
• a database
• a search engine
• a machine learning solution
• …
• Getting good results is important
• but there are lots of values to tune
• the effects of tuning them are hard to predict
Database tuning
Automatic database management system tuning through large-scale machine learning, Aken et al., SIGMOD ’17
Find the best pair (figure)
Effects of MySQL tuning: config settings (figure)
How to solve
1. Formulate the problem clearly
2. Measure results properly
3. Then
• try to understand the problem in depth, and/or
• let your computer find a good solution
Our kind of problem
• If
• all your knobs are numeric and
• you can measure how good a given set of settings is
• then
• basically you’re trying to find the highest point in a many-dimensional space
• One dimension per knob + 1 dimension for the evaluation function
The hill-climbing problem (figures)
Explore vs. exploit
A warning!
• Be very, very careful about the evaluation function
• Your algorithm will produce a good value for the evaluation function
• If the function matches poorly with what you actually need, you’re going to work hard to produce something bad …
Genetic algorithm
• The “original” nature-inspired algorithm (1960s)
• make n random solutions
• evaluate them, throw away the worst, duplicate the best (exploit)
• add random newcomers (explore)
• make random changes, repeat
• Weakness: can’t exploit the structure of numeric problems
• no sense of hyperspace
• Strength: can solve non-numeric problems
• can even write code
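A minimal sketch of that loop, in Python; the fitness signature, population size, and the Gaussian mutation step are illustrative choices of mine, not the talk's code.

import random

def genetic_search(fitness, dimensions, pop_size=30, generations=100):
    # dimensions: list of (low, high) ranges; fitness: higher is better
    def random_candidate():
        return [random.uniform(lo, hi) for (lo, hi) in dimensions]

    def mutate(cand):
        # random changes: jiggle each value a little, staying inside its range
        return [min(hi, max(lo, v + random.gauss(0, (hi - lo) * 0.05)))
                for v, (lo, hi) in zip(cand, dimensions)]

    population = [random_candidate() for _ in range(pop_size)]
    for _ in range(generations):
        # evaluate, throw away the worst, duplicate (and mutate) the best
        population.sort(key=fitness, reverse=True)
        best = population[: pop_size // 3]
        population = best + [mutate(c) for c in best]
        # add random newcomers until the population is full again
        while len(population) < pop_size:
            population.append(random_candidate())
    return max(population, key=fitness)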
Particle-swarm optimization (1995)
• A swarm of particles explore the search space together
• move around semi-randomly
• communicate about what they’ve seen
• particles attracted toward high spots in the landscape
PSO, initialization
• Use 10 + int(2 * math.sqrt(dimensions)) particles
• Position each particle randomly
• Give each particle a random velocity
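A sketch of what that initialization could look like; scaling the initial velocity to the width of each dimension is an assumption of mine, not something taken from the talk.

import math
import random

def init_swarm(dimensions):
    # dimensions: list of (low, high) ranges, one per knob
    n = 10 + int(2 * math.sqrt(len(dimensions)))   # recommended swarm size
    swarm = []
    for _ in range(n):
        pos = [random.uniform(lo, hi) for (lo, hi) in dimensions]
        # random initial velocity, scaled to the width of each dimension
        vel = [random.uniform(-(hi - lo), hi - lo) for (lo, hi) in dimensions]
        swarm.append({'pos': pos, 'vel': vel})
    return swarm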
PSO, iteration
• For each particle, in each dimension, update the velocity by adding
• old velocity * decay factor
• random factor * (best position - current position)
• random factor * (best neighbour position - current position)
• Velocity tends to decrease as
• current position goes toward best position and
• best and best neighbour position converge
Code

import math
import numpy as np
import pso   # the author's PSO module (see the repo linked at the end)

def f1(x):
    return 1 + math.sin(2 * np.pi * x)

def f2a(x):
    return x ** 3 - 2 * x ** 2 + 1 * x - 1

def f(x):
    return f1(x) + f2a(x)

swarm = pso.Swarm(
    dimensions = [(0.0, 2.0)],
    fitness = lambda x: f(x[0]),
    particles = 5
)
for ix in range(20):
    swarm.iterate()
print(swarm.get_best_ever())
Implementation

import random
# Excerpt: w, c, pick_velocity and the particle's position / personal-best
# attributes are set up elsewhere in the module.

class Particle:
    def __init__(self, dimensions, fitness):
        self._dimensions = dimensions
        self._fitness = fitness
        self._vel = [pick_velocity(min, max) for (min, max) in dimensions]

    def iterate(self):
        for ix in range(len(self._dimensions)):
            self._vel[ix] = (
                self._vel[ix] * w +
                random.uniform(0, c) * (self._prev_best_pos[ix] - self._pos[ix]) +
                random.uniform(0, c) * (self.neighbourhood_best(ix) - self._pos[ix])
            )
            self._pos[ix] += self._vel[ix]
            self._constrain(ix)
        self._update()
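The excerpt above calls a few helpers it doesn't show: pick_velocity, neighbourhood_best, _constrain and _update, plus the constants w and c. Below is my reconstruction of what they might look like, written as standalone functions and assuming the ring topology mentioned near the end of the talk; the real versions are in the linked repo.

import random

w = 0.7   # velocity decay factor (assumed value)
c = 1.4   # upper bound on the random attraction factors (assumed value)

def pick_velocity(low, high):
    # initial velocity, scaled to the width of the dimension
    return random.uniform(-(high - low), high - low)

def neighbourhood_best(particle, ix):
    # ring topology: the particle plus its two neighbours; return the position
    # (in dimension ix) of the best personal best among them
    best = max(particle._neighbours + [particle],
               key=lambda p: p._prev_best_val)
    return best._prev_best_pos[ix]

def constrain(particle, ix):
    # clip the position back inside the allowed range for this dimension
    low, high = particle._dimensions[ix]
    particle._pos[ix] = min(high, max(low, particle._pos[ix]))

def update(particle):
    # evaluate the new position and remember it if it is a personal best
    val = particle._fitness(particle._pos)
    if val > particle._prev_best_val:
        particle._prev_best_val = val
        particle._prev_best_pos = list(particle._pos)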
Failure (animation): the swarm gets stuck at a local maximum
Success (animation)
Evaluation (results of repeated runs on the toy function)
PSO (average 1.7): most runs hit the global maximum (2.0) exactly, but several got stuck at the local maximum (≈1.141)
Random search (average 1.9): runs spread between ≈1.42 and ≈2.0, close to but never exactly at the global maximum
Enough toy problems
Let’s try a real problem
Problem: Find the duplicates
ID SOURCE NAME ADDRESS1 ADDRESS2 ADDRESS3 CITY ZIP
9354686 1300001 Augustin C. Sundts Gate 22-24 Bergen 5004 8007
9316306 1300006 AUGUSTIN C. SUNDSGATE 22 5004 BERGEN NORWAY BGO
9025453 1300010 Augustin Hotel C. Sundts gate 22 Bergen 5004
9151327 1300010 Basic Hotel Bergen Hakonsgaten 27 Bergen 5015
9150992 1300010 Basic Hotel Marken Kong Oscars gate 45 Bergen 5017
9048595 1300010 Basic Hotel Victoria Kong Oscars Gate 29 Bergen 5017
9151853 1300010 Bergen Bed & Breakfast Hennebysmauet 9 Bergen 5005
9316307 1300006 BERGEN TRAVEL VESTRE TORGGATE 7 5015 BERGEN NORWAY BGO
9062459 1300010 Bergen Travel Hotel Vestre Torggaten 7 Bergen 5015
9010488 1300001 Best Western Hordaheimen C. Sundtsgt. 18 5004
9316314 1300006 BEST WESTERN HORDAHEIMEN C. SUNDTSGATE 18 BERGEN NORWAY 5004 BERGEN BGO
9032340 1300010 Best Western Hotell Hordaheimen C. Sundtsgate 18 Bergen 5004
9362760 1300001 Clarion Admiral C. Sundts Gate 9 P.o.box 252 Bergen 5004 8007
9316308 1300006 CLARION ADMIRAL C. SUNDTS GATE 9 5804 BERGEN NORWAY BGO
9364882 1300001 Clarion Admiral (Fjord View) C. Sundts Gate 9 P.o.box 252 Bergen 5004 8007
9010491 1300001 Clarion Bergen Airport Flyplassveien 555 5869
9363104 1300001 Clarion Bergen Airport Flyplassveien 555 Po Box 24 Bergen No-5869
Configure manually
PROPERTY COMPARATOR LOW HIGH
Name LongestCommonSubstring 0.35 0.88
Address1 WeightedLevenshtein 0.25 0.65
Address2 Levenshtein 0.5 0.6
Email Exact 0.49 0.51
Phone Exact 0.45 0.65
Geopos Geoposition 0.25 0.6
Region Exact 0.0 0.5
Threshold: 0.74
Dedup with PSO
• A 27-dimensional problem
• Really difficult to solve optimally
• Takes a long time to evaluate solutions
• No idea what the best possible solution actually is
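To make the PSO connection concrete: every low/high probability plus the overall threshold becomes one dimension, and the fitness function runs the deduplicator with those settings and scores the result. A sketch under assumed names: run_dedup is a hypothetical stand-in for the actual dedup run, and using F1 as the score is my assumption, not something the talk states.

import pso   # the author's PSO module from the linked repo

dimensions = [(0.0, 1.0)] * 27   # one range per probability/threshold knob

def fitness(settings):
    # run_dedup: hypothetical helper that runs deduplication with these
    # settings and compares the output against the known duplicates
    precision, recall = run_dedup(settings)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)   # F1 (assumed metric)

swarm = pso.Swarm(dimensions = dimensions, fitness = fitness,
                  particles = 10 + int(2 * 27 ** 0.5))
for ix in range(50):
    swarm.iterate()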
PSO vs random (figure; green: PSO, blue: random)
Two PSO runs (figure)
100 PSO runs (figure)
Average of 100 runs (figure)
PSO vs genetic (figure)
Firefly algorithm (Xin-She Yang, 2010)
• Fireflies are positioned randomly
• On each iteration, each firefly jumps toward every other firefly based on how bright it looks
• that is, the brighter the other firefly appears, the further toward it our firefly jumps
• it only jumps toward fireflies that are brighter than itself
• Each firefly shines brighter the better its fitness is
• but attractiveness falls off with the square of the distance
• Add random jiggling
How it works
• The best firefly always stands still, pulling the others toward the best result
• Bad fireflies get pulled in all directions, but good fireflies get pulled much less
• this is exploit vs explore
• Pull diminishes with distance, so the fireflies don’t necessarily all explore the same best position
• in order to explore more local maxima
Firefly code

def iterate(self):
    # alpha (jiggle size) and gamma (how quickly attraction falls off with
    # distance) are parameters defined elsewhere in the module
    for firefly in self._swarm.get_particles():
        if self._val < firefly._val:   # only move toward brighter fireflies
            dist = self.distance(firefly)
            attract = firefly._val / (1 + gamma * (dist ** 2))
            for ix in range(len(self._dimensions)):
                jiggle = alpha * (random.uniform(0, 1) - 0.5)
                diff = firefly._pos[ix] - self._pos[ix]
                self._pos[ix] = self._pos[ix] + jiggle + (attract * diff)
                self._constrain(ix)
Evaluation (table from Yang 2010): number of fitness evaluations needed to find the global maximum; lower is better
Firefly vs PSO (figure)
Cuckoo search (Yang et al. 2010)
• Very similar to a genetic algorithm
• take a candidate, modify it
• if better than the existing candidate, replace it
• Every generation, discard some proportion of candidates
• fill up with random new ones
• The difference is in how new candidates are produced
• using Lévy flights
Lévy flights (figure)
How it works
• Balance explore and exploit with Lévy flights
• usually jump short, sometimes jump long
• Never replace good candidates by bad
• but always throw away the n worst
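What makes the flights "Lévy" is the heavy-tailed step length: mostly short hops, occasionally a very long jump. One common way to draw such steps is Mantegna's algorithm, sketched below; beta = 1.5 and the alpha scaling are standard choices of mine, not values from the talk.

import math
import random

def levy_step(beta=1.5):
    # one Lévy-distributed step length, via Mantegna's algorithm
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def new_candidate(current, dimensions, alpha=0.01):
    # move the current candidate by a Lévy flight in each dimension,
    # scaled by alpha and clipped back into the allowed range
    new = []
    for value, (low, high) in zip(current, dimensions):
        step = alpha * (high - low) * levy_step()
        new.append(min(high, max(low, value + step)))
    return new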
Toy problem again (animation): the swarm starts exploring the local maximum, until a rogue cuckoo strikes gold
Evaluation (figure)
Comparison (figure)
My 2015 attempt (figure)
Moral
• These are stochastic algorithms
• one evaluation doesn’t tell you much about the algorithm
• even 10 evaluations aren’t enough
• Be careful here, or you can fool yourself!
It’s not that simple
• Which PSO?
• SPSO 2006, 2007, or 2011?
• What values to use for decay factor and randomization?
• What neighbourhood topology to use?
• Firefly has parameters alpha, beta, and gamma
• Cuckoo has alpha and scale
Firefly, different alphas (figure)
So … in order to tune our algorithm we need to tune the algorithm that tunes our algorithm
Problems
• Doing one run of Cuckoo search takes ~30 minutes
• Need to do that ~40 times to get a decent estimate
• My laptop was already getting uncomfortably hot
• My wife was complaining about the fan noise
• What to do?
In the cloud (diagram): a master node and ten worker nodes, communicating over /get-task and /answer
Master algorithm
• Run PSO on Cuckoo alpha & scale
• Fitness function
• sets the task handed out by /get-task
• hangs until 20 evaluations have come in via /answer
• returns average of evaluations
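A sketch of that fitness function on the master: it publishes the candidate (alpha, scale) as the current task and then blocks until 20 evaluations have come back, returning their average. Backing /get-task and /answer with a shared queue is my assumption about the wiring; the actual server code is in the linked repo.

import queue   # Python 3 (the module is called Queue in Python 2)

answers = queue.Queue()   # the /answer handler puts worker results here
current_task = None       # what the /get-task handler hands out

def fitness(pos):
    global current_task
    alpha, scale = pos
    current_task = {'alpha': alpha, 'scale': scale}
    # block until 20 evaluations have come in via /answer
    results = [answers.get() for _ in range(20)]
    return sum(results) / len(results)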
Sunday morning (figure; the first values I tried are marked)
Monday morning (figure)
Tuesday morning (figure)
More algorithms
• Flower-pollination algorithm [Xin-She Yang 2013]
• Bat-inspired algorithm [Xin-She Yang, 2010]
• Ant colony algorithm [Marco Dorigo, 1992]
• Bee algorithm [Pham et al 2005]
• Fish school search [Filho et al 2007]
• Artificial bee colony algorithm [Karaboga et al 2005]
• …
Looks interesting, hard to find algorithm
Paper too vague, can’t implement
Complicated, didn’t finish implementation
PROPERTY COMPARATOR LOW HIGH
Name LongestCommonSubstring 0.35 0.88
Address1 WeightedLevenshtein 0.25 0.65
Address2 Levenshtein 0.5 0.6
Email Exact 0.49 0.51
Phone Exact 0.45 0.65
Geopos Geoposition 0.25 0.6
Region Exact 0.0 0.5
Threshold: 0.74
But what about these?
The machine learning way
PROPERTY COMPARATOR LOW HIGH
Name c1 c1 0.35 0.88
Name c2 c2 0.25 0.65
Name c3 c3 0.5 0.6
Name c4 c4 0.49 0.51
… … 0.45 0.65
Address1 c1 c1 0.25 0.6
Address1 c2 c2 0.0 0.5
Address1 c3 c3 … …
Address1 c4 c4 … …
… … … …
Opens a door
• Means we can drop the probabilities, and just use the
numeric values coming out of the comparators
• Feed into one of
• random forests
• Support Vector Machine (SVM)
• logistic regression
• neural networks
• …
• Except that’s no longer optimization, but attacking the
problem directly, so let’s stick with our algorithms
General trick
• Quite common to “cheat” this way to use numeric-only machine learning algorithms
• Turn boolean into [0, 1] parameter
• Turn enumeration into one boolean per value
• Looks odd, but in general it does work
• of course, now there isn’t much spatial structure any more
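As an illustration of that trick: a boolean knob becomes one [0, 1] dimension that gets thresholded, and an enumeration becomes one dimension per possible value, picking whichever scores highest. The knob names below are made up.

def decode(position):
    # boolean knob encoded as a [0, 1] value, thresholded at 0.5
    use_cache = position[0] >= 0.5
    # enumeration knob: one dimension per value, pick the largest
    strategies = ['exact', 'fuzzy', 'phonetic']
    strategy = strategies[max(range(3), key=lambda i: position[1 + i])]
    return {'use_cache': use_cache, 'strategy': strategy}

# four numeric dimensions encode one boolean and one three-valued enumeration
print(decode([0.8, 0.1, 0.7, 0.3]))   # {'use_cache': True, 'strategy': 'fuzzy'}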
Tricky, tricky
• Our problem now has 267 dimensions
• should allow us to tune for really detailed signals
• The curse of dimensionality
• everywhere is pretty much equally far away from everywhere else
• hyperspace consists of all corners and no middle
• many of the dimensions contain no signal
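One quick way to see the "equally far away" effect for yourself: sample random points in a unit hypercube and compare the nearest and farthest pairwise distances as the dimension count grows. A small experiment of my own, not from the talk:

import math
import random

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for (x, y) in zip(a, b)))

for dims in (2, 27, 267):
    points = [[random.random() for _ in range(dims)] for _ in range(200)]
    dists = [distance(points[i], points[j])
             for i in range(len(points)) for j in range(i + 1, len(points))]
    # as the dimension grows, the nearest and farthest pairs get relatively
    # closer to each other: the contrast between distances shrinks
    print(dims, round(min(dists), 2), round(max(dists), 2))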
How it went (figure)
Another test (figure): works in any number of dimensions, this plot shows 2; the number of local minima is d!
Performance (figure)
Meta-tuning again

import math
# crap, firefly, function and pso are modules from the author's repo
import crap, firefly, function, pso

def fitness(pos):
    (firefly.alpha, firefly.gamma) = pos
    averages = crap.run_experiment(
        firefly,
        dimensions = [(0.0, math.pi)] * 16,
        fitness = function.michaelwicz,
        particles = 20,
        problem = 'michaelwicz',
        quiet = True
    )
    return crap.average(averages)

dimensions = [(0.0, 1.0), (0.0, 10.0)]  # alpha, gamma
swarm = pso.Swarm(dimensions, fitness, 10)
crap.evaluate(swarm, 'meta-michaelwicz-firefly')
It’s better… (figure)
Rosenbrock (figure)
Griewangk (figure)
What to choose?
• PSO
• generally performs best
• dead easy to implement
• parameters available in the literature
• no need to scale to coordinate system used
• SPSO 2007
• values for w and c are given
• ring topology: each particle knows p-1, p, p+1
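For reference, the parameter values usually quoted for SPSO 2007 are the ones below; treat them as what I remember from the SPSO descriptions rather than something stated in the talk, and check the original papers before relying on them.

import math

w = 1.0 / (2.0 * math.log(2))   # ≈ 0.721, velocity decay factor
c = 0.5 + math.log(2)           # ≈ 1.193, used for both attraction terms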
Simulated annealing
• Inspired by the behaviour of cooling metals
• guaranteed to eventually find the maximum
• no guarantee that the time taken will be reasonable
• To use it, you need the following parameters
• Candidate neighbour generation procedure
• Acceptance probability function
• Annealing schedule
• Initial temperature
• May work better than PSO and friends, but also requires a lot more effort to set up
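A minimal simulated-annealing sketch showing where each of those four ingredients goes; the Gaussian neighbour step, the Metropolis acceptance rule and the geometric cooling schedule are textbook defaults I picked for illustration.

import math
import random

def anneal(fitness, dimensions, initial_temp=1.0, cooling=0.995, steps=10000):
    current = [random.uniform(lo, hi) for (lo, hi) in dimensions]
    current_val = fitness(current)
    temp = initial_temp   # initial temperature
    for _ in range(steps):
        # candidate neighbour generation: a small Gaussian step per dimension
        candidate = [min(hi, max(lo, v + random.gauss(0, (hi - lo) * 0.05)))
                     for v, (lo, hi) in zip(current, dimensions)]
        candidate_val = fitness(candidate)
        # acceptance probability: always accept improvements, sometimes accept
        # worse candidates, and less often as the temperature drops
        if (candidate_val > current_val or
                random.random() < math.exp((candidate_val - current_val) / temp)):
            current, current_val = candidate, candidate_val
        temp *= cooling   # annealing schedule
    return current, current_val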
Choices, choices
• There are always more advanced methods
• sometimes they work much better, sometimes not
• PSO gives you a dead easy place to start
• whip it up in a few lines of code
• see how it works
• vastly better than random trial and error
• adapts nicely to all kinds of problems without tuning
Research papers
• Publishing standards are probably too lax
• Algorithm descriptions are weak
• far too little information about tuning parameters
• no code available
• Evaluation sections are weak
• only one evaluation metric
• no information about how PSO/GA were tuned
• no information about how proposed algorithm was tuned
• no cross-comparison with other algorithms
See for yourself
• https://github.com/larsga/py-snippets/tree/master/machine-learning/pso
• links to all the papers
• code for crap, genetic, pso, firefly, cuckoo
• bonus: cuckoo2, server
• also has the test functions
• Total number of experiments: 24,008
• means evaluating fitness 2,400,800 times