Derivative-free optimization
Jeremy Rapin, Tristan Cazenave, Camille Couprie, Pauline Luc, Maxime Oquab, Baptiste Roziere, Olivier Teytaud
Facebook AI Research
(+thanks many people!)
2019
Outline
1. What is derivative-free optimization? It's optimization without derivatives.
2. Is it useful? Yes.
3. Methods: evolution, Bayesian optimization, genetic algorithms, sequential quadratic programming…
4. Examples: let us save the world.
What is derivative-free optimization?
(Numerical) optimization is about finding the (argument of the) minimum of f.
It's optimization without derivatives.
Maybe you have learnt Newton, BFGS, etc.? Those algorithms need the gradient of f.
It's finding (approximately) argmin f without knowing gradient(f), just with a black box x → f(x).
It's finding x* such that, for almost all x, f(x) ≥ f(x*).
Families of methods:
• Evolutionary programming
• Mathematical programming (COBYLA, SQP…)
• Design of experiments
• Bayesian optimization (EGO)
DFO for games?
- Partial observability: ok
- No gradient: ok
- Multi-agent: ok
- Collaborative or not: ok
- Two levels: ok (e.g. each agent has its own training, as in PBT)
[Diagram: a state evolves through a transition function; agents 1, 2 and 3 each act from a partial observation 1, 2, 3.]
Advantages of DFO
• Fog of war (partial information) taken into account.
• Bounded rationality taken into account.
• No gradient: ok.
• Noise: ok.
• Simple, transparent: a small piece of code contains all the information justifying the decisions → you know why power plant X is built (transparency++).
Examples:
• Power systems
• Roads
• Irrigation
• Games
Several agents → no problem with Ask/Tell/Recommend.
No gradient → no problem for DFO.
→ Compared to Bellman-style methods or model-predictive control: far fewer assumptions, and no parameter such as an operational/tactical/strategic horizon.
Serious game in power systems:
Each power plant is maximizing its own income.
Which law do we need to make this approximately equivalent to global maximization? No details here, but a real problem.
Methods
Evolution, Bayesian optimization, genetic algorithms, sequential quadratic programming…
Let us discuss evolution strategies!
Evolution strategies

(1+1)-ES:
    x(0) = (0, 0)
    σ(0) = 1
    for n in {0, 1, 2, 3, …}:
        x' = x(n) + σ(n) × Gaussian
        if x' better than x(n):
            x(n+1) = x'
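A minimal runnable version of this loop, for readers who want to try it (a Python/NumPy sketch; the function name, the budget and the fixed σ are illustrative, not from the slides):

    import numpy as np

    def one_plus_one_es(f, dim=2, iterations=500, sigma=1.0):
        """Minimal (1+1)-ES: one parent, one Gaussian offspring per iteration."""
        x = np.zeros(dim)          # x(0) = (0, 0, ...)
        fx = f(x)
        for _ in range(iterations):
            candidate = x + sigma * np.random.randn(dim)   # x' = x(n) + sigma * Gaussian
            fc = f(candidate)
            if fc < fx:            # "x' better than x(n)" (we minimize)
                x, fx = candidate, fc
        return x

    # Usage: minimize the sphere function.
    best = one_plus_one_es(lambda x: float(np.sum(x ** 2)))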
Problem: close to the optimum, we might want to reduce σ.

(1+1)-ES with one-fifth rule:
    x(0) = (0, 0)
    σ(0) = 1
    for n in {0, 1, 2, 3, …}:
        x' = x(n) + σ(n) × Gaussian
        if x' better than x(n):
            x(n+1) = x'
            σ(n+1) = 2 σ(n)
        else:
            σ(n+1) = 0.84 σ(n)

Quiz:
σ very big → the success rate goes to what?
σ very small → the success rate goes to what? (but slow progress)
Equilibrium when P(success) = what? (hint: 0.84^4 ≈ 1/2)
Answers:
σ very big → the success rate goes to 0.
σ very small → the success rate goes to ½, but progress is slow.
Equilibrium when P(success) = 1/5: one success (×2) per four failures (×0.84 each) leaves σ unchanged, since 2 × 0.84^4 ≈ 1.
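The same rule in runnable form (a Python/NumPy sketch; names and budget are illustrative):

    import numpy as np

    def one_plus_one_es_fifth(f, dim=2, iterations=500):
        """(1+1)-ES with the one-fifth rule: double sigma on success,
        multiply by 0.84 on failure; since 2 * 0.84**4 ~ 1, sigma is
        stationary at roughly one success per five trials."""
        x, fx, sigma = np.zeros(dim), f(np.zeros(dim)), 1.0
        for _ in range(iterations):
            candidate = x + sigma * np.random.randn(dim)
            fc = f(candidate)
            if fc < fx:
                x, fx = candidate, fc
                sigma *= 2.0
            else:
                sigma *= 0.84
        return x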
Problem: we might want to be parallel! Evaluate λ individuals simultaneously?

(µ/µ, λ)-ES with self-adaptation:
    x(0) = (0, 0)
    σ(0) = 1
    for n in {0, 1, 2, 3, …}:
        for i in {1, 2, 3, …, λ}:
            σ(n,i) = σ(n) × exp(1D-Gaussian)
            x'(i) = x(n) + σ(n,i) × 2D-Gaussian
        pick the µ best x'(i) and their σ(n,i)
        x(n+1) = average of those µ best x'(i)
        σ(n+1) = exp(average of those log σ(n,i))
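In runnable form (a Python/NumPy sketch; µ, λ and the budget are illustrative):

    import numpy as np

    def self_adaptive_es(f, dim=2, mu=5, lam=20, iterations=200):
        """(mu/mu, lambda)-ES with self-adaptation of a global step size."""
        x, sigma = np.zeros(dim), 1.0
        for _ in range(iterations):
            sigmas = sigma * np.exp(np.random.randn(lam))        # one mutated sigma per child
            children = x + sigmas[:, None] * np.random.randn(lam, dim)
            best = np.argsort([f(c) for c in children])[:mu]     # indices of the mu best
            x = children[best].mean(axis=0)                      # recombine the points
            sigma = np.exp(np.log(sigmas[best]).mean())          # recombine the log step sizes
        return x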
Problem: isotropic mutations! We want to mutate some variables more than others. E.g. f(x) = 100(x(1) − 7)² + x(2)².

(µ/µ, λ)-ES with anisotropic self-adaptation:
    x(0) = (0, 0)
    σ(0) = (1, 1)
    for n in {0, 1, 2, 3, …}:
        for i in {1, 2, 3, …, λ}:
            σ(n,i) = σ(n) ⊙ exp(2D-Gaussian)      (⊙ = pointwise product)
            x'(i) = x(n) + σ(n,i) ⊙ 2D-Gaussian
        pick the µ best x'(i) and their σ(n,i)
        x(n+1) = average of those µ best x'(i)
        σ(n+1) = exp(average of those log σ(n,i))
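The anisotropic variant in runnable form (Python/NumPy sketch, illustrative parameters):

    import numpy as np

    def anisotropic_es(f, dim=2, mu=5, lam=20, iterations=200):
        """(mu/mu, lambda)-ES with one self-adapted step size per coordinate."""
        x, sigma = np.zeros(dim), np.ones(dim)
        for _ in range(iterations):
            sigmas = sigma * np.exp(np.random.randn(lam, dim))   # pointwise product
            children = x + sigmas * np.random.randn(lam, dim)    # pointwise product
            best = np.argsort([f(c) for c in children])[:mu]
            x = children[best].mean(axis=0)
            sigma = np.exp(np.log(sigmas[best]).mean(axis=0))
        return x

    # The ill-conditioned function from the slide: f(x) = 100*(x1 - 7)**2 + x2**2.
    best = anisotropic_es(lambda x: 100 * (x[0] - 7) ** 2 + x[1] ** 2)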
Problem: what if there is noise? Like, my oven is not completely deterministic, and I want oven(temperature) to be excellent on average.
Problem: classical algorithms stagnate!

(µ/µ, λ)-ES with anisotropic self-adaptation and population control:
    x(0) = (0, 0)
    σ(0) = (1, 1)
    for n in {0, 1, 2, 3, …}:
        for i in {1, 2, 3, …, λ}:
            σ(n,i) = σ(n) ⊙ exp(2D-Gaussian)
            x'(i) = x(n) + σ(n,i) ⊙ 2D-Gaussian
        pick the µ best x'(i) and their σ(n,i)
        x(n+1) = average of those µ best x'(i)
        σ(n+1) = exp(average of those log σ(n,i))
        if the population average is significantly better than it was 5 iterations earlier:
            decrease λ
        else:
            increase λ
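A runnable sketch (Python/NumPy) where the "significantly better" test is deliberately simplified to a plain comparison of averages, so this illustrates the rule rather than the exact algorithm:

    import numpy as np
    from collections import deque

    def population_control_es(f, dim=2, mu=5, lam=20, iterations=300):
        """Anisotropic self-adaptive ES plus a crude population-control rule."""
        x, sigma = np.zeros(dim), np.ones(dim)
        history = deque(maxlen=5)                            # recent population averages
        for _ in range(iterations):
            sigmas = sigma * np.exp(np.random.randn(lam, dim))
            children = x + sigmas * np.random.randn(lam, dim)
            values = np.array([f(c) for c in children])
            best = np.argsort(values)[:mu]
            x = children[best].mean(axis=0)
            sigma = np.exp(np.log(sigmas[best]).mean(axis=0))
            if len(history) == 5:                            # compare to 5 iterations ago
                lam = max(2 * mu, lam - 1) if values.mean() < history[0] else lam + 1
            history.append(values.mean())
        return x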
Problem: what if it's discrete? Like, we optimize in {0,1}^10.

Discrete (1+1)-ES:
    x(0) = (0, 0, 0, …, 0)
    for n in {0, 1, 2, 3, …}:
        x' = mutation(x(n))
        if x' better than x(n):
            x(n+1) = x'

Mutation operators:
- RLS: randomly mutate one of the variables.
- (1+1) classical: each variable is mutated with probability 1/10 (if no mutation occurs, discard and redraw).
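Both mutation operators in runnable form (a Python sketch; the maximization convention and the OneMax test are illustrative):

    import random

    def discrete_one_plus_one(f, n=10, iterations=1000, rls=False):
        """Discrete (1+1)-ES on {0,1}^n with the two mutations above (maximization)."""
        x = [0] * n
        fx = f(x)
        for _ in range(iterations):
            y = list(x)
            if rls:                              # RLS: flip exactly one random bit
                y[random.randrange(n)] ^= 1
            else:                                # classical: flip each bit w.p. 1/n,
                while y == x:                    # redraw if nothing was flipped
                    y = [b ^ (random.random() < 1.0 / n) for b in x]
            fy = f(y)
            if fy > fx:                          # "x' better than x(n)"
                x, fx = y, fy
        return x

    # Sanity check on OneMax (number of ones).
    best = discrete_one_plus_one(sum)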
Problem: what if mutating a small number of variables does not work? E.g. f(x) has a bad value as long as fewer than half of the bits equal 1.

Discrete (1+1)-ES, same loop as above, with a third mutation operator:
- RLS: randomly mutate one of the variables.
- (1+1) classical: each variable is mutated with probability 1/10.
- (1+1) mixing of mutation rates: randomly draw the mutation probability p in (0, 1), then mutate each variable with probability p.

(Duc-Cuong Dang & Per Kristian Lehre: uniform mixing of mutation rates. Carola Doerr & Benjamin Doerr: parameter control.)
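The third operator in code (a small Python sketch):

    import random

    def mutate_mixed_rate(x):
        """Uniform mixing of mutation rates: draw the per-bit mutation
        probability p uniformly in (0, 1), then flip each bit with probability p,
        so large jumps (e.g. flipping half the bits) keep non-negligible probability."""
        p = random.random()
        return [b ^ (random.random() < p) for b in x]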
Convergence rates?
Self-adaptive algorithms, case f(x) = ||x||:
• x(n)/σ(n) is a homogeneous Markov chain.
• It has a (non-trivial) stationary distribution.
• E[log ||x(n+1)|| − log ||x(n)||] converges to a constant.
• log ||x(n)|| / n converges to a constant.
[Plot: log ||x(n)|| decreases linearly in n.] (Auger, TCS)
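In formulas, the statement sketched above (following the slide's notation; see Auger, TCS, for the exact assumptions):

    % x(n)/sigma(n) is a homogeneous Markov chain with a (non-trivial)
    % stationary distribution; averaging the per-step log-progress over
    % that distribution yields linear convergence at some rate c > 0:
    \[
      \mathbb{E}\!\left[\log\|x^{(n+1)}\| - \log\|x^{(n)}\|\right] \xrightarrow[n\to\infty]{} -c,
      \qquad
      \frac{\log\|x^{(n)}\|}{n} \xrightarrow[n\to\infty]{} -c .
    \]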
Convergence rates in the parallel case? Does the slope increase?
• The slope −log ||x(n)|| / n is linear in λ as long as λ is smaller than the dimension.
• Beyond that, it only grows logarithmically in λ.
[Plot: slope vs. λ; linear speedup up to λ ≈ dimension.] (Beyer's book; Fournier et al., PPSN & Algorithmica)
Convergence rates in the noisy case?
Known for some algorithms (not the most usual ones…):
• E[log ||x(n)||] / log(n) → −1/2 for some robust "repeat" algorithms.
• E[log ||x(n)||] / log(n) → −1 for population control and some other methods, under some assumptions.
[Plot: log ||x(n)|| vs. n.] (Astete-Morales et al., FOGA; Cauwet et al., FOGA)
Derivative-free methods
1. Random search: randomly draw 1,000,000 points and pick the best.
2. Estimation of Distribution Algorithm: randomly draw 1000 points, select the 250 best, fit a Gaussian to those 250 points; repeat until the budget is elapsed.
3. Particle swarm optimization: define 50 particles in the domain, with random velocities; particles are attracted by their own best visited point and by the best point of the whole population, and receive random noise.
4. Quasi-random search: similar to random search, but with better-spread points, using low-discrepancy sequences.
… and so many others!
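A sketch of method 2 (a Gaussian EDA) with the slide's numbers, 1000 sampled and 250 selected per generation (Python/NumPy; no safeguard against a degenerate covariance here):

    import numpy as np

    def gaussian_eda(f, dim=2, budget=20000, lam=1000, mu=250):
        """Sample from a Gaussian, keep the best quarter, refit the Gaussian,
        repeat until the budget is elapsed."""
        mean, cov = np.zeros(dim), np.eye(dim)
        best, best_value = None, float("inf")
        for _ in range(budget // lam):
            points = np.random.multivariate_normal(mean, cov, size=lam)
            values = np.array([f(p) for p in points])
            elite = points[np.argsort(values)[:mu]]
            mean, cov = elite.mean(axis=0), np.cov(elite, rowvar=False)
            if values.min() < best_value:
                best, best_value = points[values.argmin()], values.min()
        return best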
Derivative-free methods (continued)
1. CMA: extends anisotropic mutations to rotated ellipsoids.
2. Differential evolution: also rotated anisotropic mutations, with cheap computational cost.
3. Lamarckism, i.e. inheritance of acquired characteristics:
   a) part of the state is evolution-optimized;
   b) the rest evolves during the evaluation (e.g. neural network training).
→ As if your child could inherit your post-workout muscles or your trained neural net.
→ Remember the inheritance of the immune system?
Examples
Let us save the world.
Let us save the world.
What is the API of our optimization algorithms?
• Dear optimizer, which point should I evaluate? → ASK.
• Dear optimizer, at point (3.1, 2.0), I tell you that the quality is 47. → TELL.
• Dear optimizer, I am tired now, please recommend a good point. → RECOMMEND.
→ Asynchronous: ok. Multi-agent: ok. Noisy: ok.
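In Nevergrad, this conversation is a short loop (a sketch: the toy objective is ours, and the constructor argument has been named instrumentation or parametrization depending on the version, so check the docs of your install):

    import nevergrad as ng

    def f(x):  # toy objective; lower is better
        return (x[0] - 3.1) ** 2 + (x[1] - 2.0) ** 2

    # 2-dimensional problem, budget of 100 evaluations
    optimizer = ng.optimizers.OnePlusOne(parametrization=2, budget=100)
    for _ in range(optimizer.budget):
        candidate = optimizer.ask()            # ASK: which point should I evaluate?
        loss = f(*candidate.args)              # evaluate it (possibly elsewhere, asynchronously)
        optimizer.tell(candidate, loss)        # TELL: at this point, the quality is `loss`
    recommendation = optimizer.provide_recommendation()   # RECOMMEND: a good point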
Optimize the hyperparameters of machine learning algorithms, or the weights of a neural network.
• Much better than random search for hyperparameterizing video prediction!
• Much better than random search for hyperparameterizing image generation!
• Population control works well for neuro-playing 007!
Example (simplified): power systems.
• Time steps 1, 2, 3, …, 365 × 10 → 10 years.
• For i in {1, 2, 3, …, 12}:
  • one stock Si of water for hydroelectricity;
  • a random process Ri adding water to Si.
• A random process C, "electricity consumption", at each time step.
• A market converting the productions and C into a cost per MWh.
• An agent Ai converting water from Si into electricity Ei.
Agent Ai = a mapping from (X, C, S1, S2, …, S12) to a real number. This mapping has parameters, e.g. a neural network.
Simulator(parameters of the agents):
- For each time step, for each stock Si: update Si with Ri, and Ai(params) converts some water to electricity.
- Quality for society = the ecological/economic penalty associated with the production (includes some agents).
- Quality for each independent agent = money earned.
- → one optimizer per independent agent, and one optimizer for the others + society (a toy sketch follows below).
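A toy stand-in for this simulator, to make the black-box structure concrete (everything numeric here, inflows, demand, the one-parameter policies, is invented; only the control flow follows the slide):

    import numpy as np

    def simulate(agent_params, steps=365 * 10, n_stocks=12):
        """Toy simulator: loop over time steps; each agent releases water."""
        stocks = np.ones(n_stocks)
        penalty = 0.0
        for _ in range(steps):
            stocks += 0.1 * np.random.rand(n_stocks)   # random inflows R_i
            demand = 1.0 + 0.5 * np.random.rand()      # consumption process C
            for i in range(n_stocks):
                # agent A_i: here a one-parameter policy instead of a neural net
                release = min(stocks[i], agent_params[i] * demand)
                stocks[i] -= release                   # water converted to electricity E_i
                demand -= release
            penalty += max(demand, 0.0) ** 2           # unserved demand is penalized
        return penalty                                 # "quality for society", to minimize

    # Any DFO optimizer can now tune the 12 agent parameters, black-box style.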
Nevergrad: super easy to use! Needs Python >= 3.6.
Installation: pip install nevergrad
Or (developer):
• "git clone" the repository
• Run "pip install -e .[all]"
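Once installed, the shortest path to a result is the minimize shortcut (a sketch with an illustrative objective; same caveat as above about the constructor argument name across versions):

    import nevergrad as ng

    optimizer = ng.optimizers.TwoPointsDE(parametrization=2, budget=200)
    recommendation = optimizer.minimize(lambda x: float(sum((x - 0.5) ** 2)))
    print(recommendation.value)   # the recommended point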
[Figures: benchmark results per family (evolutionary programming; mathematical programming, e.g. COBYLA, SQP; design of experiments; Bayesian optimization, e.g. EGO) on X = deceptive (super hard functions), X = parallel, X = one-shot, X = ill-conditioned, X = real-world.]
No time for installing, coding, or experimenting? Don't want to take care of distributing your experiments over many machines? No problem!
Just hack the code directly in the GitHub interface:
- Duplicate the code of one of the optimizers in optimizerlib.py.
- Give it another name and modify it using your knowledge.
- Add your optimizer's name to one of the experiments in experiments.py.
→ Create a pull request; our servers will run your experiments a couple of days after it is merged, and the results will be visible at https://dl.fbaipublicfiles.com/nevergrad/allxps/list.html
Example: TBPSA (a variant of population control) for games.
• Without a redundant representation → population control fails.
• With a redundant representation → population control is excellent!
Why?
Example: random seeds!
Consider a stochastic player: action = f(random, situation).
Now:
- situation → hash → hash % 5000
- parameter = a vector of 5000 seeds
- seed 0 means "time" seed (i.e. no seed)
→ Looks weird, but works (a sketch follows below)!
→ Optimizes the opening book in partially observable games, e.g. phantom Go.
World Computer-Bridge Championship in September 2016: Wbridge5. Fast and easy experiments.
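A sketch of the trick (the names seed_book and act are ours; the hashing and the 5000-seed vector follow the slide):

    import random

    NUM_SEEDS = 5000              # size of the seed vector, as in the slide
    seed_book = [0] * NUM_SEEDS   # the parameter being optimized by DFO

    def act(situation, legal_moves):
        """Stochastic player whose randomness is steered by the seed book."""
        bucket = hash(situation) % NUM_SEEDS      # situation -> hash -> hash % 5000
        seed = seed_book[bucket]
        rng = random.Random(seed) if seed != 0 else random.Random()  # 0 = "time" seed
        return rng.choice(legal_moves)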
Example: the game of war, with evolution and/or bandits (the strategy is the order in which you pick up the cards you earn!).
Example: the "Guess Who?" game ("Shobute").
Our noisy optimization methods.
Nevergrad: our platform for evolutionary and derivative-free optimization.