Evolutionary computer vision
2019
Olivier Teytaud, Facebook AI Research
Contact: oteytaud@fb.com (Messenger: Olivier Teytaud; WhatsApp: +44 7540 143007)
Started working in AI last century. Currently working on games, AlphaZero-style learning, and derivative-free optimization. Has worked at Artelys, INRIA, Google, and Facebook. Has 4 beautiful kids.
Vision:
Camille Couprie, Facebook AI Research
Laurent Meunier, Facebook AI Research
Jeremy Rapin, Facebook AI Research
Baptiste Roziere, Facebook AI Research
Olivier Teytaud, Facebook AI Research
Games 2:
Tristan Cazenave, Univ. Dauphine
Yen-Chi Chen, National Taiwan Normal University
Guan-Wei Chen, National Dong Hwa University
Shi-Yu Chen, National Dong Hwa University
Xian-Dong Chiu, National Dong Hwa University
Julien Dehos, Univ. Littoral Cote d’Opale
Maria Elsa, National Dong Hwa University
Qucheng Gong, Facebook AI Research
Hengyuan Hu, Facebook AI Research
Vasil Khalidov, Facebook AI Research
Chen-Ling Li, National Dong Hwa University
Hsin-I Lin, National Dong Hwa University
Yu-Jin Lin, National Dong Hwa University
Games 1:
Xavier Martinet, Facebook AI Research
Vegard Mella, Facebook AI Research
Jeremy Rapin, Facebook AI Research
Baptiste Roziere, Facebook AI Research
Gabriel Synnaeve, Facebook AI Research
Fabien Teytaud, Univ. Littoral Cote d’Opale
Olivier Teytaud, Facebook AI Research
Shi-Cheng Ye, National Dong Hwa University
Yi-Jun Ye, National Dong Hwa University
Shi-Jim Yen, National Dong Hwa University
Sergey Zagoruyko, Facebook AI Research
Evolution for dummies
Application: adversarial attacks
Evolution for hyperparameters
Evolutionary GANs
Evolutionary Super Resolution
Vision for something else than vision: Polygames
THANK YOU!!!! To the people who helped me with visa issues :)
Outline
1. What is derivative-free optimization? It's optimization without derivatives.
2. Is it useful? Yes.
3. Methods: evolution, Bayesian optimization, genetic, sequential quadratic programming…
4. Examples: let us save the world.
What is derivative-free optimization?
It's optimization without derivatives.
(Numerical) optimization is about finding the (argument) minimum of f.
Maybe you have learnt Newton, BFGS, etc.? These algorithms need the gradient of f.
Derivative-free optimization finds an approximate argmin of f without knowing gradient(f), just with a black box x → f(x).
It's finding x* such that for almost all x, f(x) ≥ f(x*).
Is it useful? Yes.
- Oven: temperature and time → quality measurement (quiche quality or ceramic quality).
- Aerodynamics simulator: shape parameters → quality measurement (energy & noise saving).
- Agents simulator: regulation parameters → quality measurement.
- Wind farm simulator: positions of wind turbines → quality measurement (power capacity).
Methods
Evolution, Bayesian optimization, genetic, sequential quadratic programming…
Let us discuss evolution strategies!
Evolution strategies
(1+1)-ES:
  x(0) = (0, 0)
  σ(0) = 1
  for n in {0, 1, 2, 3, …}:
    x' = x(n) + σ(n) × Gaussian
    if x' better than x(n):
      x(n+1) = x'
    else:
      x(n+1) = x(n)
Problem: close to the optimum, we might want to reduce σ.
(1+1)-ES with one-fifth rule:
  x(0) = (0, 0)
  σ(0) = 1
  for n in {0, 1, 2, 3, …}:
    x' = x(n) + σ(n) × Gaussian
    if x' better than x(n):
      x(n+1) = x'
      σ(n+1) = 2 σ(n)
    else:
      x(n+1) = x(n)
      σ(n+1) = 0.84 σ(n)
σ very big → success rate goes to… what?
σ very small → success rate goes to… what? (but slow progress)
Equilibrium when P(success) = what? (Hint: 0.84^4 ≈ 1/2.)
Answers:
σ very big → success rate goes to 0.
σ very small → success rate goes to 1/2, but slow progress.
Equilibrium when P(success) = 1/5: since 0.84^4 ≈ 1/2, one success (σ doubled) compensates four failures (σ halved overall), so σ is stable at one success per five trials.
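The one-fifth rule above can be sketched in a few lines of Python (an illustrative sketch, not code from the slides; the function and parameter names are mine):

```python
import random

def one_plus_one_es(f, x0, sigma=1.0, iters=2000, seed=0):
    """(1+1)-ES with the one-fifth success rule: double sigma on success,
    multiply it by 0.84 on failure (0.84^4 is roughly 1/2)."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        fc = f(child)
        if fc < fx:          # success: keep the child and widen the search
            x, fx = child, fc
            sigma *= 2.0
        else:                # failure: shrink the step size
            sigma *= 0.84
    return x, fx

# Minimize a sphere function centered at (3, -2), starting from the origin.
sphere = lambda p: (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2
best, val = one_plus_one_es(sphere, [0.0, 0.0])
```

With a few thousand evaluations this reaches the optimum to high precision on smooth problems, while never touching a gradient.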
Problem: we might want to be parallel! Evaluate λ individuals simultaneously?
(µ/µ, λ)-ES with self-adaptation:
  x(0) = (0, 0)
  σ(0) = 1
  for n in {0, 1, 2, 3, …}:
    for i in {1, 2, 3, …, λ}:
      σ(n,i) = σ(n) × exp(1D-Gaussian)
      x'(i) = x(n) + σ(n,i) × 2D-Gaussian
    select the µ best x'(i) and their σ(n,i)
    x(n+1) = average of those µ best x'(i)
    σ(n+1) = exp(average of their log σ(n,i))
Problem: isotropic mutations! We want to mutate some variables more than others, e.g. f(x) = 100 (x(1) − 7)^2 + x(2)^2.
(µ/µ, λ)-ES with anisotropic self-adaptation:
  x(0) = (0, 0)
  σ(0) = (1, 1)
  for n in {0, 1, 2, 3, …}:
    for i in {1, 2, 3, …, λ}:
      σ(n,i) = σ(n) ⊙ exp(2D-Gaussian)   (⊙ = pointwise product)
      x'(i) = x(n) + σ(n,i) ⊙ 2D-Gaussian
    select the µ best x'(i) and their σ(n,i)
    x(n+1) = average of those µ best x'(i)
    σ(n+1) = exp(average of their log σ(n,i))
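The anisotropic scheme can be sketched as follows (illustrative; the damping factor tau on the log-normal mutation is a common refinement not on the slide, and all names are mine):

```python
import math
import random

def anisotropic_es(f, dim=2, mu=5, lam=20, gens=500, seed=0):
    """(µ/µ, λ)-ES with anisotropic self-adaptation: each child mutates its
    own per-coordinate step sizes; the µ best are recombined by averaging
    (arithmetic for x, geometric for the step sizes)."""
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(2.0 * dim)      # damping of the log-normal mutation
    x, sigma = [0.0] * dim, [1.0] * dim
    best, best_val = None, float("inf")
    for _ in range(gens):
        children = []
        for _ in range(lam):
            s = [si * math.exp(tau * rng.gauss(0.0, 1.0)) for si in sigma]
            c = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, s)]
            children.append((f(c), c, s))
        children.sort(key=lambda t: t[0])
        if children[0][0] < best_val:
            best_val, best = children[0][0], children[0][1]
        elite = children[:mu]
        x = [sum(c[i] for _, c, _ in elite) / mu for i in range(dim)]
        sigma = [math.exp(sum(math.log(s[i]) for _, _, s in elite) / mu)
                 for i in range(dim)]
    return best, best_val

# Ill-conditioned objective from the slide: f(x) = 100 (x1 - 7)^2 + x2^2.
f = lambda p: 100.0 * (p[0] - 7.0) ** 2 + p[1] ** 2
best, val = anisotropic_es(f)
```

The per-coordinate step sizes let σ(1) shrink faster than σ(2), matching the ill-conditioning of the objective.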
Derivative-free methods
1. Random search: randomly draw 1,000,000 points, and pick the best.
2. Estimation of Distribution Algorithm: randomly draw 1000 points, select the 250 best, fit a Gaussian to those 250 points, draw the next 1000 points from it; repeat until the budget is elapsed.
3. Particle swarm optimization: define 50 particles in the domain, with random velocities. Particles are attracted by their own best visited point and by the best point of the entire population, and receive random noise.
4. Quasi-random search: similar to random search, but with better-spread points, using low-discrepancy sequences.
… and so many others! Discrete domains, mixed domains, multi-objective, full covariance adaptation, etc.
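As a concrete instance, method 2 (a truncation EDA with an axis-aligned Gaussian) fits in a dozen lines of Python (an illustrative sketch; the names and default parameters are mine):

```python
import random

def eda(f, dim=2, pop=1000, elite=250, gens=40, seed=0):
    """Minimal Estimation of Distribution Algorithm: fit the mean and
    per-coordinate std of the elite, then sample the next population."""
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [5.0] * dim
    best, best_val = None, float("inf")
    for _ in range(gens):
        points = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        points.sort(key=f)
        if f(points[0]) < best_val:
            best, best_val = points[0], f(points[0])
        top = points[:elite]
        mean = [sum(p[i] for p in top) / elite for i in range(dim)]
        std = [max(1e-12, (sum((p[i] - mean[i]) ** 2 for p in top) / elite) ** 0.5)
               for i in range(dim)]
    return best, best_val

sphere = lambda p: sum((pi - 1.0) ** 2 for pi in p)
best, val = eda(sphere)
```

The elite's shrinking variance concentrates the search; on multimodal problems this can converge prematurely, which is one reason so many variants exist.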
Nevergrad: super easy to use!
Benchmark suites: deceptive (super hard functions), parallel, oneshot, illcondi, realworld.
Methods covered: evolutionary programming, mathematical programming (Cobyla, SQP…), design of experiments, Bayesian optimization (EGO).
2200+ GitHub stars, and growing.
Application: adversarial attacks
Adversarial attacks: given a classifier, find a small distortion so that it fails.
(Illustration: Goodfellow et al, OpenAI.)
Black-box adversarial attacks: no gradient, no white-box info; you can just send an image and get probabilities of classes.
(Illustration: Goodfellow et al, OpenAI.)
Black-box adversarial attacks with tiling: the distortion is constant over each tile of a coarse grid, which reduces the dimension of the search space.
Black-box adversarial attacks with tiling and evolution: optimize the per-tile distortions with an evolution strategy.
State of the art! Use a good optimization library rather than designing a bad ad hoc variant of random search…
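A toy sketch of the tiling + evolution idea (the black box here is a fabricated linear classifier, and all names are illustrative, not from the paper):

```python
import math
import random

def black_box_attack(predict, image, tiles, budget=500, eps=0.05, seed=0):
    """Tiled black-box attack with a (1+1)-style evolution: one distortion
    value per tile, scored only through calls to predict()."""
    rng = random.Random(seed)
    def apply(d):
        out = list(image)
        for val, idxs in zip(d, tiles):
            for i in idxs:
                out[i] = min(1.0, max(0.0, out[i] + val))
        return out
    d = [0.0] * len(tiles)
    score = predict(apply(d))        # probability of the true class
    for _ in range(budget):
        cand = [min(eps, max(-eps, v + rng.gauss(0.0, eps / 5))) for v in d]
        s = predict(apply(cand))
        if s < score:                # lower true-class probability = better attack
            d, score = cand, s
    return apply(d), score

# Toy black box: a fixed linear "classifier" over a 4-pixel image, two 2-pixel tiles.
w = [3.0, -1.0, 2.0, 0.5]
predict = lambda img: 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, img))))
adv, p = black_box_attack(predict, [0.5] * 4, tiles=[[0, 1], [2, 3]])
```

Only queries to `predict` are used: no gradient, no access to the weights, exactly the black-box setting above.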
Evolution for hyperparameters (with Pauline Luc)
Optimize the hyperparameters of machine learning algorithms.
- Much better than random search for hyperparametrizing video prediction!
- Much better than random search for hyperparametrizing image generation!
- Population control cool for neuro-playing 007!
- Good because less overfitting: the search targets areas stable under variable-wise perturbation.
- Less parallel than random search, but still very parallel (e.g. just 4 batches!).
Evolutionary GANs (with Morgane Riviere)
Generative models
GAN: generative adversarial network.
- A loss for "the discriminator must be unable to distinguish fake from real" (impacting the generator).
- A loss for "the discriminator must be able to distinguish fake from real" (impacting the discriminator).
NeuroEvolution & facial composites
GANs provide generators: given z (e.g. Gaussian in dim 256), G(z) is a face (or a texture, or …): faces, FashionGen, textures.
https://github.com/facebookresearch/pytorch_GAN_zoo
How to find a cool z? E.g. a face ~ Mickey Mouse, or a dress suggesting a given flower.
Maybe z = argmin Dissimilarity(G(z), targetImage)? (Dissimilarity: L2, VGG features, …)
Maybe z = argmin Dissimilarity(G(z), targetImage)?
- Pro: simple; no need for human interaction; non-trivial if G was not trained on data ~ targetImage.
- Con: needs a target image; needs a dissimilarity, and similarities on images do not work that well.
But we want more than just a copy-paste!
Maybe z = argmin Dissimilarity(G(z), targetImage) + penalization(z)?
(Dissimilarity: L2, VGG features, …; penalization: Discrim(G(z)), (norm(z) − dim(z))^2, …)
First idea: use Adam. Or SGD, or Nesterov momentum.
Better: use LBFGS. There is no stochasticity here, so Adam or SGD are just slower than LBFGS.
Same pros and cons as above, and with a target image it is hard to beat "copy-paste".
Fun idea: try evolutionary methods. They don't need "Dissimilarity(G(z), targetImage) + penalization(z)"; they just need answers to "is G(z1) better than G(z2)?"
- No dissimilarity.
- No penalization.
==> It's all in the user's head.
LBFGS is excellent without any tuning!
But when the objective function is only a proxy of the real objective, evolution does better → we need robustness.
Facial composites: select the 5 best.
Evolution is great: we don't need numerical criteria, just comparisons!
(Figure: target, and 3 random reconstructions, in 3 minutes each.)
Also for creating clothes!!! (FashionGen)
HEVOL rendering of Triss Merigold (The Witcher) vs artist rendering of Triss Merigold (The Witcher).
Wait, what is the state of the art in facial composites?
Holistic evolutionary methods outperform standard "local decomposition" methods: Frowd et al 2004, 2010, 2013; Gibson et al 2009; Solomon et al 2009.
Compared to this:
→ We add GANs.
→ We compare many derivative-free optimization methods.
→ We point out that humans do much better than any similarity measure (in terms of performance for a limited number of intermediate forward passes).
Overall: select the 5 best (replacement OK) out of 28 and average them, then repeat, with random perturbation.
→ Easy for humans
→ Fast convergence
→ End-to-end for all kinds of data
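The select-average-perturb loop can be sketched as follows (the human is replaced by a stand-in scoring function, and all names are mine):

```python
import random

def composite_search(pick_score, dim=8, pop=28, keep=5, gens=30, sigma=0.5, seed=0):
    """Comparison-based loop from the slide: show 28 candidates, keep the 5
    preferred ones (ranked here by a stand-in for the human), average them,
    and repeat with a slowly shrinking random perturbation."""
    rng = random.Random(seed)
    center = [0.0] * dim
    for _ in range(gens):
        cands = [[c + sigma * rng.gauss(0.0, 1.0) for c in center] for _ in range(pop)]
        chosen = sorted(cands, key=pick_score)[:keep]   # the "user" picks 5 of 28
        center = [sum(c[i] for c in chosen) / keep for i in range(dim)]
        sigma *= 0.93                                   # anneal the perturbation
    return center

# Stand-in for the human judgment: prefer latents close to a hidden target.
target = [1.0] * 8
pick_score = lambda z: sum((zi - ti) ** 2 for zi, ti in zip(z, target))
z = composite_search(pick_score)
```

Note that the loop never sees a numerical fitness, only a ranking of candidates, which is exactly what a human in front of 28 faces can provide.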
What else than facial composites? A cool fashion generator :)
The instructions were, respectively, to produce "Sportswear", "Clothes for cold weather", "Light clothes", "Sophisticated". 61 images were generated in each case, i.e. 4 generations of 15 images plus the initial one.
Evolutionary Super Resolution (with our Madagascar friends & Konstanz colleagues)
Super resolution
When training with noise injection: high-resolution = ConditionalGAN(LR, noise).
At inference time, typically noise = 0.
Let us instead search for the noise maximizing:
- QualityEstimator(ConditionalGAN(LR, noise))   (e.g. Koncept512)
- Discriminator(ConditionalGAN(LR, noise))
- −L2(noise)   (regularization)
Vision for something else than vision: Polygames (with Vegard Mella, Qucheng Gong, Hengyuan Hu, Xavier Martinet, Vasil Khalidov)
Because evolution → robustness to distribution shift + parallel + gradient-free.
The best contributions to Nevergrad will be rewarded (conference grants) → join us! It can be a huge technical code contribution, or a great one-line idea :)
Open scalable generic Zero learning: Polygames @ FB (Apr. 2019).
AlphaGo and AlphaZero are great. But:
- they use quite a lot of self-play data;
- there are still games in which humans are stronger than computers (global criteria, multiple goals, non-square cells);
- not all games can be zero-learnt.
→ Scientific innovations needed.
→ An open-source platform needed.
UCT (Upper Confidence Trees): Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06).
UCT starts with simple Monte Carlo: random exploration of possible futures.
Then keep track of statistics, and modify the Monte Carlo sampling with those statistics!
AlphaZero ingredient #1: Monte Carlo: random exploration of possible futures.
AlphaZero ingredient #2: Monte Carlo Tree Search (a.k.a. adaptive Monte Carlo): better than simple MC, use the statistics!
Exploitation: a move with 5 wins in 7 simulations gets
  SCORE = 5/7 + k·sqrt(log(10)/7)
… or exploration: a move with 0 wins in 2 simulations gets
  SCORE = 0/2 + k·sqrt(log(10)/2)
(10 = total number of simulations at this node.)
Why the second term? → For exploring everything, eventually.
Why the first term? → For more simulations in good directions.
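The score above is plain UCB1; as a sketch (illustrative names, k is the exploration constant):

```python
import math

def ucb_score(wins, visits, parent_visits, k=1.4):
    """UCB score from the slides: exploitation (empirical win rate) plus an
    exploration bonus that is large for rarely-visited moves."""
    if visits == 0:
        return float("inf")   # unvisited moves get tried first
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

# The two moves from the slides, with 10 simulations at the parent node:
exploit = ucb_score(5, 7, 10)   # 5/7 + k*sqrt(log(10)/7)
explore = ucb_score(0, 2, 10)   # 0/2 + k*sqrt(log(10)/2)
```

Because the bonus grows like log(parent_visits), every move is eventually revisited, while the first term keeps most simulations in good directions.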
UCT in one slide
UCT for choosing a move in a board B:
  while (I have time left):
    do a simulation:
      start at board B
      at each time step, choose the action by UCB (or at random if no statistics yet!)
    update statistics with this simulation
  return the most simulated action.
AlphaZero ingredient #3: deep network
Overview in "Deep Learning", LeCun, Bengio & Hinton, 2015.
Both a critic network (evaluating the probability of winning in a given position) and a policy network (providing a probability distribution on actions).
(Images: clarifai.com/technology and "the data science blog". Convolutions give invariance by translation; deeper layers give high-level features.)
PUCT: a variant of MCTS with a neural prior.
SCORE(state, action) = 5/7 + NN(state, action) × sqrt(10/7)
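The change from UCB can be sketched directly (following the slide's formula; the published AlphaZero bonus is slightly different, c·prior·sqrt(parent)/(1+visits)):

```python
import math

def puct_score(wins, visits, parent_visits, prior):
    """PUCT-style score as written on the slide: the UCB logarithm is
    replaced by the neural prior NN(state, action) for the move."""
    return wins / visits + prior * math.sqrt(parent_visits / visits)

score = puct_score(5, 7, 10, prior=0.3)   # 5/7 + 0.3*sqrt(10/7)
```

Moves the network believes in get a larger exploration bonus, so the search is steered by the prior instead of exploring uniformly.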
AlphaZero in a nutshell: a fixed-point method!
MCTS(NN): an MCTS which uses a neural net NN for
- evaluating leaves (no random rollout);
- suggesting policies (biasing the MCTS).
NN ← MCTS:
- Each client plays games with MCTS(NN).
- The server receives batches "(state, action, reward at end of game)" and trains with two loss functions (+ weight decay):
  - learn "state → reward" (critic);
  - learn "state → probability distribution on actions" (actor), i.e. mimic the MCTS.
ALPHAZERO:
- randomly initialize NN;
- iteratively imitate: NN-actor → MCTS(NN), NN-critic → game results.
Loss = prediction of value + (p imitates π from the MCTS) + weight decay.
The fixed point: the neural network is used by the MCTS (tree search + neural net), and the neural network is in turn trained using the MCTS results.
Adding mutations in Zero learning?
Original Zero learning = convolutions + fully connected layers.
Outputs of the network:
(1) A tensor Pi = logits of actions. Typically X×Y×C, where X×Y = board size = same first two dims as the input (as in dense image classification!).
(2) A float V = probability of winning.
Our claims:
(1) Fully connected layers have drawbacks for Pi: they lose the local invariance. Traditional Zero is invariant by permutation of the representation of actions on the board, which makes as little sense as full connections on images. Use convolutions, as in dense image classification!
(2) Add global pooling (= a fixed-length representation based on statistics over the board, for each channel).
→ By (1)+(2), the network is board-size-independent.
Adding mutations in Zero learning?
(3) Residual networks → we can add
- layers (initialized close to 0),
- channels (initialized close to 0),
- kernel size (new entries close to 0),
which preserves the computed function → incremental Zero learning, towards architecture search in Zero learning.
HEX
According to Bonnet et al (https://www.lamsade.dauphine.fr/~bonnet/publi/connection-games.pdf): "Since its independent inventions in 1942 and 1948 by the poet and mathematician Piet Hein and the economist and mathematician John Nash, the game of hex has acquired a special spot in the heart of abstract game aficionados. Its purity and depth has lead Jack van Rijswijck to conclude his PhD thesis with the following hyperbole [1]: << Hex has a Platonic existence, independent of human thought. If ever we find an extraterrestrial civilization at all, they will know hex, without any doubt. >>"
HEX: simplest rules ever!
I play black; you play white. We put a stone in turn.
If I connect my sides, I win. If you connect your sides, you win.
Theorem: no draw.
Until 2019/10/31, no computer managed to beat the best humans!
HEX: Polygames vs Arek Kulczycki (winner of the last LG tournament, best ELO rank on the LittleGolem server).
A bunch of GPUs, several days. Operated & trained by Vegard, a.k.a. "un putain de hacker de ouf" ("one hell of a crazy hacker"). Thanks a lot!!!
Fantastic game with a super long final path!
Trained on 13x13, won on 19x19.
THE END!!!
… and we're coming to many other games :)
Havannah: big board, diversity of winning conditions, long games, hexagons…
Let's have a beer!
Contact: oteytaud@fb.com

Evolutionary deep learning: computer vision.

  • 1.
    Evolutionary computer vision. .2019 Contact: oteytaud@fb.com Olivier Teytaud, Facebook AI Research OlIvier Teytaud Messenger Olivier TEYTAUD Whatsapp +44 7540 143007, Started to work in AI last century. Currently working on games, alphazero style learning, derivative-free optimization. Has been working at ARTELYS, INRIA, GOOGLE, FB. Has 4 beautiful kids.
  • 2.
    Evolutionary computer vision. .2019 Contact: oteytaud@fb.com Vision: Camille Couprie, Facebook AI Research Laurent Meunier, Facebook AI Research Jeremy Rapin, Facebook AI Research Baptiste Roziere, Facebook AI Research Olivier Teytaud, Facebook AI Research Games 2: Tristan Cazenave, Univ. Dauphine Yen-Chi Chen, National Taiwan Normal University Guan-Wei Chen, National Dong Hwa University Shi-Yu Chen, National Dong Hwa University Xian-Dong Chiu, National Dong Hwa University Julien Dehos, Univ. Littoral Cote d’Opale Maria Elsa, National Dong Hwa University Qucheng Gong, Facebook AI Research Hengyuan Hu, Facebook AI Research Vasil Khalidov, Facebook AI Research Chen-Ling Li, National Dong Hwa University Hsin-I Lin, National Dong Hwa University Yu-Jin Lin, National Dong Hwa University Games 1: Xavier Martinet, Facebook AI Research Vegard Mella, Facebook AI Research Jeremy Rapin, Facebook AI Research Baptiste Roziere, Facebook AI Research Gabriel Synnaeve, Facebook AI Research Fabien Teytaud, Univ. Littoral Cote d’Opale Olivier Teytaud, Facebook AI Research Shi-Cheng Ye, National Dong Hwa University Yi-Jun Ye, National Dong Hwa University Shi-Jim Yen, National Dong Hwa University Sergey Zagoruyko, Facebook AI Research
  • 3.
    Evolutionary computer vision. .2019 Contact: oteytaud@fb.com Evolution for dummies Application: adversarial attacks Evolution for hyperparameters Evolutionary GANs Evolutionary Super Resolution Vision for something else than vision: Polygames THANK YOU !!!! To people who helped me for visa issues J
  • 4.
    8 Methods Evolution, Bayesian optimization, genetic, sequential quadratic programming… Example s Letus save the world. Is it useful ? Yes. What is derivative free optimization ? It’s optimization without derivatives. • 1 2 3 4 Outline 1
  • 5.
    9 What is derivativefree optimization ? (Numerical) optimization is about finding the (argument) minimum of f. It’s optimization without derivatives. Maybe you have learnt Newton, BFGS, etc ? These algorithms need the gradient of f. It’s finding (approx.) argmin f without knowing gradient(f), just with a black-box x à f(x). It’s finding f* such that for almost all x, f(x) >= f(f*). 1
  • 6.
    1 8 Methods Evolution, Bayesian optimization, genetic, sequential quadratic programming… Examples Let ussave the world. Is it useful ? Yes. What is derivative free optimization ? It’s optimization without derivatives. 2 3 4 Outline 1
  • 7.
    1 9Is it useful? Yes. 2 Oven Quality Quality measurement Temperature Time
  • 8.
    2 0Is it useful? Yes. 2 Oven Quality Quality measurement Temperature Time Aerodynamism simulator Quality Quality measurement Shape parameters
  • 9.
    2 1Is it useful? Yes. 2 Oven Quality Quality measurement Temperature Time Aerodynamism simulator Quality Quality measurement Shape parameters Agents simulator Quality Quality measurement Regulation parameters
  • 10.
    2 2Is it useful? Yes. 2 Oven Quality (quiche quality or ceramic quality) Quality measurement Temperature Time Aerodynamism simulator Quality (energy & noise saving) Quality measurement Shape parameters Agents simulator Quality Quality measurement Regulation parameters Simulator Quality (power capacity) Quality measurement Position of wind turbines
  • 11.
    2 3 Methods Evolution, Bayesian optimization, genetic, sequential quadratic programming… Examples Let ussave the world. Is it useful ? Yes. What is derivative free optimization ? It’s optimization without derivatives. 1 2 3 4 Outline
  • 12.
    2 4 Methods Evolution, Bayesian optimization,genetic, sequential quadratic programming… Let us discuss evolution strategies! 3
  • 13.
    2 5 Evolution strategies (1+1)-ES: x(0) =(0, 0) σ(0) = 1 for n in {0, 1, 2, 3, …} x’ = x(n) + σ(n) x Gaussian if x’ better than x(n): x(n+1) = x’ 3
  • 14.
    2 6 Problem: close tothe optimum, we might want to reduce σ. (1+1)-ES with one-fifth rule: x(0) = (0, 0) σ(0) = 1 for n in {0, 1, 2, 3, …} x’ = x(n) + σ(n) x Gaussian if x’ better than x(n): x(n+1) = x’ σ(n+1) = 2 σ(n) else: σ(n+1) = 0.84 σ(n) σ very big è success rate goes to what ? σ very small è success rate what ?, but slow progress. Equilibrium when P(success) = What ? because 0.84^4 == 1 / 2
  • 15.
    2 7 Problem: close tothe optimum, we might want to reduce σ. (1+1)-ES with one-fifth rule: x(0) = (0, 0) σ(0) = 1 for n in {0, 1, 2, 3, …} x’ = x(n) + σ(n) x Gaussian if x’ better than x(n): x(n+1) = x’ σ(n+1) = 2 σ(n) else: σ(n+1) = 0.84 σ(n) 3 σ very big è success rate goes to 0. σ very small è success rate ½, but slow progress. Equilibrium when P(success) = 1/5 because 0.84^4 == 1 / 2
  • 16.
    2 8 Problem: we mightwant to be parallel! Evaluate λ individuals simultaneously ? (µ/µ, λ)-ES with self-adaptation: x(0) = (0, 0) σ(0) = 1 for n in {0, 1, 2, 3, …} for i in {1, 2, 3, …, λ} σ(n,i) = σ(n) x exp(1D-Gaussian) x’(i) = x(n) + σ(n,i) x 2D-Gaussian pick up the µ best x’(i) and their σ(n,i) x(n+1) = average of those µ best σ(n+1) = exp( average of these log σ(n,i)) 3
  • 17.
    2 9 Problem: isotropic mutations!We want to mutate some variables more than others. E.g. f(x) = 100(x(1)-7)2 + x(2)2 (µ/µ, λ)-ES with anisotropic self-adaptation: x(0) = (0, 0) σ(0) = (1,1) for n in {0, 1, 2, 3, …} for i in {1, 2, 3, …, λ} σ(n,i) = σ(n) *pointwise-product* exp(2D-Gaussian) x’(i) = x(n) + σ(n,i) *pointwise-product* 2D-Gaussian pick up the µ best x’(i) and their σ(n,i) x(n+1) = average of those µ best σ(n+1) = exp( average of these log σ(n,i) ) 3
  • 18.
    3 6 4 Derivative-free methods 1.Random search:randomly draw 1 000 000 points, and pick up the best. 2.Estimation of Distribution Algorithm: while (budget not elapsed) randomly draw 1000 points, select the 250 best, define a Gaussian matching those 250 points, repeat until budget elapsed. 3.Particle swarm optimization: define 50 particles in the domain, with random velocities. Particles are attracted by their best visited point, and by the best point for the entire population, and receive random noise. 4. Quasi-Random Search: similar to random search, but try to have a better positioning of points, using low-discrepancy sequences. … and so many others! Discrete domains, mixed domains,multi-objective, with full covariance adaptation, etc
  • 19.
    3 7 Methods Evolution, Bayesian optimization, genetic, sequential quadratic programming… Examples Let ussave the world. Is it useful ? Yes. What is derivative free optimization ? It’s optimization without derivatives. . 1 2 3 4 Outline
  • 20.
    3 8 Nevergrad: super easyto use! 4 X=deceptive (super hard functions) X=parallel X=oneshot X=illcondi X=realworld Evolutionary programming Mathematical programming (cobyla, sqp…) Design of experiments Bayesian Optimization (EGO) 2200 + github stars, growing.
  • 21.
    Evolutionary computer vision. .2019 Contact: oteytaud@fb.com Camille Couprie, Facebook AI Research Laurent Meunier, Facebook AI Research Jeremy Rapin, Facebook AI Research Baptiste Roziere, Facebook AI Research Olivier Teytaud, Facebook AI Research Evolution for dummies Application: adversarial attacks Evolution for hyperparameters Evolutionary GANs Evolutionary Super Resolution Vision for something else than vision: Polygames
  • 22.
    Adversarial attacks: givena classifier, find a small distortion so that it fails. . 2019 Goodfellow et al, OpenAI
  • 23.
    Adversarial attacks: givena classifier, find a small distortion so that it fails. . 2019 Goodfellow et al, OpenAI
  • 24.
    Black-box adversarial attacks (nogradient, no white box info: you can just send an image and get probabilities of classes) . 2019 Goodfellow et al, OpenAI
  • 25.
    Black-box adversarial attackswith tiling (no gradient, no white box info: you can just send an image and get probabilities of classes) . 2019 Contact: oteytaud@fb.com
  • 26.
    Black-box adversarial attackswith tiling and evolution (no gradient, no white box info: you can just send an image and get probabilities of classes) . 2019 Contact: oteytaud@fb.com State of the art ! Use a good library rather than designing a bad ad hoc variant of random search…
  • 27.
    Evolutionary computer vision. .2019 Contact: oteytaud@fb.com Camille Couprie, Facebook AI Research Laurent Meunier, Facebook AI Research Jeremy Rapin, Facebook AI Research Baptiste Roziere, Facebook AI Research Olivier Teytaud, Facebook AI Research +Pauline Luc Evolution for dummies Application: adversarial attacks Evolution for hyperparameters Evolutionary GANs Evolutionary Super Resolution Vision for something else than vision: Polygames
  • 28.
    4 9 Optimize the hyperparameters ofmachine learning algorithms. 4 Much better than random search for hyperparametrizing video prediction! Much better than random search for hyperparametrizing image generation! Population control cool for neuro-playing 007!
  • 29.
    5 0 Optimize the hyperparameters ofmachine learning algorithms. 4 Much better than random search for hyperparametrizing video prediction! Much better than random search for hyperparametrizing image generation! Population control cool for neuro-playing 007! Good because less overfitting. Less parallel than random search but still very parallel (e.g. just 4 batches!) Target = areas stable by variable-wise perturbation.
  • 30.
    Evolutionary computer vision. .2019 Contact: oteytaud@fb.com Camille Couprie, Facebook AI Research Laurent Meunier, Facebook AI Research Jeremy Rapin, Facebook AI Research Baptiste Roziere, Facebook AI Research Olivier Teytaud, Facebook AI Research +Morgane Riviere Evolution for dummies Application: adversarial attacks Evolution for hyperparameters Evolutionary GANs Evolutionary Super Resolution Vision for something else than vision: Polygames
  • 31.
    Generative models GAN: generativeadversarial network A loss for “the discriminator must be unable to distinguish fake from real” (impacting the generator) A loss for “the discriminator must be able to distinguish fake from real” (impacting the discriminator)
  • 32.
    NeuroEvolution & facialcomposites GANs provide generators: given z (e.g. Gaussian in dim 256), G(z) is a face (or a texture, or …): Face FashionGen Textures https://github.com/facebookresearch/pytorch_GAN_zoo How to find a cool z ? E.g. a face ~ Mickey Mouse, or a dress suggesting a given flower.
  • 33.
    Maybe z =argmin Dissimilarity(G(z), targetImage) L2, VGG, …
  • 34.
    Maybe z =argmin Dissimilarity(G(z), targetImage) • Pro: • Simple • No need for human interaction • Non trivial if G was not trained on data ~ targetImage • Con: • Needs a target image • Needs a dissimilariy, and similarities on images do not work that well But we want more than just a copy-paste! L2, VGG, … Discrim(G(z)), (norm(z)-dim(z))2 …
  • 35.
    Maybe z = argmin Dissimilarity(G(z), targetImage) + penalization(z)? First idea = use Adam. Or SGD, or Nesterov momentum. Better: use LBFGS. There is no stochasticity; Adam or SGD are just slower than LBFGS. (Dissimilarity: L2, VGG, …; penalization: Discrim(G(z)), (norm(z)−dim(z))², …)
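As a tiny illustration of this latent-space search, here is plain gradient descent on z against a stand-in linear "generator" and an L2 dissimilarity. The real setting uses a trained GAN and Adam/SGD/LBFGS; the matrix generator, the target, and the learning rate below are all toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(16, 4))                  # stand-in "generator": image = G @ z
target = G @ np.array([1.0, -1.0, 0.5, 2.0])  # an image that G can reproduce exactly

def dissimilarity(z):
    # L2 dissimilarity between the generated "image" and the target
    return float(np.sum((G @ z - target) ** 2))

# Plain gradient descent on z (the talk prefers LBFGS; this is the simplest variant).
z = np.zeros(4)
for _ in range(500):
    grad = 2 * G.T @ (G @ z - target)
    z -= 0.01 * grad
```

After the loop, G @ z is close to the target, illustrating why this works well when the target lies in the generator's range, and why it degenerates into copy-paste when it does.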
  • 36.
    Maybe z = argmin Dissimilarity(G(z), targetImage) + penalization(z)? • Pro: • Simple • No need for human interaction • Non-trivial if G was not trained on data ~ targetImage • Con: • Needs a target image: hard to beat “copy-paste” • Needs a dissimilarity, and similarities on images do not work that well. (Dissimilarity: L2, VGG, …; penalization: Discrim(G(z)), (norm(z)−dim(z))², …)
  • 37.
    Maybe z = argmin Dissimilarity(G(z), targetImage) + penalization(z)? First idea = use Adam. Or SGD, or Nesterov momentum. Better: use LBFGS. There is no stochasticity; Adam or SGD are just slower than LBFGS. Fun idea: try evolutionary methods. Because they don’t need “Dissimilarity(G(z), targetImage) + penalization(z)”, they just need answers to “is G(z1) better than G(z2)?” • No dissimilarity • No penalization ==> it’s all in the user’s head. (Dissimilarity: L2, VGG, …; penalization: Discrim(G(z)), (norm(z)−dim(z))², …)
  • 38.
  • 39.
    When the objective function is a proxy of the real objective function, evolution ==> better! ⇒ we need robustness.
  • 40.
  • 41.
  • 42.
    Facial composites: select the 5 best. Evolution is great – we don’t need numerical criteria, just comparisons! (Figure: target and 3 random reconstructions, 3 minutes each.)
  • 43.
    Also for creating clothes!!! (FashionGen)
  • 44.
    HEVOL rendering of Triss Merigold (The Witcher). Artist rendering of Triss Merigold (The Witcher).
  • 45.
    Wait, what is the state of the art in facial composites? Holistic evolutionary methods outperform standard “local decomposition” methods: Frowd et al. 2004, 2010, 2013; Gibson et al. 2009; Solomon et al. 2009. Compared to this: ⇒ We add GANs ⇒ We compare many derivative-free optimization methods ⇒ We point out that humans do much better than any similarity measure (in terms of performance for a limited number of intermediate forward passes). OVERALL: SELECT THE 5 BEST (REPLACEMENT OK) OUT OF 28 and average them (then repeat, with random perturbation) ⇒ Easy for humans ⇒ Fast convergence ⇒ End-to-end for all kinds of data
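The "select 5 of 28, average, perturb" loop above can be sketched directly. The human in the loop is simulated here by distance to a hidden target latent; the dimensions, the sigma schedule, and all names are illustrative assumptions, not the paper's exact settings.

```python
import random

def composite_search(hidden_target, pop=28, keep=5, gens=40, sigma=0.5,
                     rng=random.Random(0)):
    # Simulated user: prefers candidates closer to a hidden target latent.
    score = lambda z: -sum((a - b) ** 2 for a, b in zip(z, hidden_target))
    dim = len(hidden_target)
    center = [0.0] * dim
    for _ in range(gens):
        # sample a population around the current center with random perturbation
        population = [[c + sigma * rng.gauss(0, 1) for c in center]
                      for _ in range(pop)]
        best = sorted(population, key=score, reverse=True)[:keep]  # select the 5 best
        center = [sum(v) / keep for v in zip(*best)]               # average them
        sigma *= 0.9                                               # shrink perturbations
    return center

found = composite_search([1.0, -0.5, 0.3, 0.7])
```

Note that `score` is only used to rank candidates, never as a numerical gradient signal, which is exactly why a human can replace it.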
  • 46.
    What else than facial composites? A cool fashion generator :) The instruction was respectively to produce “Sportswear”, “Clothes for cold weather”, “Light clothes”, “Sophisticated”. 61 images were generated in each case, i.e. 4 generations of 15 images plus the initial one.
  • 47.
    Evolutionary computer vision. 2019. Contact: oteytaud@fb.com Camille Couprie, Facebook AI Research Laurent Meunier, Facebook AI Research Jeremy Rapin, Facebook AI Research Baptiste Roziere, Facebook AI Research Olivier Teytaud, Facebook AI Research & our Madagascar friends & Konstance guys. Evolution for dummies Application: adversarial attacks Evolution for hyperparameters Evolutionary GANs Evolutionary Super Resolution Vision for something else than vision: Polygames
  • 48.
    Super resolution. When training with noise injection: high-resolution = ConditionalGAN(LR, noise). At inference time, typically noise = 0. Let us instead try the noise maximizing:
    • QualityEstimator(ConditionalGAN(LR, noise))
    • Discriminator(ConditionalGAN(LR, noise))
    • −L2(noise) (regularization)
    Quality estimator: Koncept512.
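This noise search can be sketched as a simple evolutionary hill-climber maximizing quality minus an L2 penalty. The quality function below is a placeholder peaking at an arbitrary noise vector; the talk uses Koncept512 and the GAN's discriminator, neither of which is called here.

```python
import random

def pick_noise(quality, dim=8, budget=200, lam=0.1, rng=random.Random(0)):
    # (1+1)-style hill-climbing over the injected noise vector.
    # score = quality(noise) - lam * ||noise||^2  (the L2 regularization term)
    best = [0.0] * dim                                # start from noise = 0
    best_score = quality(best)
    for _ in range(budget):
        cand = [b + rng.gauss(0, 0.3) for b in best]  # mutate the current noise
        s = quality(cand) - lam * sum(c * c for c in cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score

# Placeholder quality estimator: peaks at noise = (0.5, ..., 0.5).
toy_quality = lambda n: -sum((x - 0.5) ** 2 for x in n)
noise, score = pick_noise(toy_quality)
```

The regularizer keeps the optimized noise close to the training-time distribution, so the conditional GAN is not asked to extrapolate.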
  • 49.
    Evolutionary computer vision. 2019. Contact: oteytaud@fb.com Camille Couprie, Facebook AI Research Laurent Meunier, Facebook AI Research Jeremy Rapin, Facebook AI Research Baptiste Roziere, Facebook AI Research Olivier Teytaud, Facebook AI Research + Vegard Mella + Qucheng Gong + Hengyuan Hu + Xavier Martinet + Vasil Khalidov. Evolution for dummies Application: adversarial attacks Evolution for hyperparameters Evolutionary GANs Evolutionary Super Resolution Vision for something else than vision: Polygames. Because evolution ⇒ robustness to distribution shift + parallel + gradient-free. The best contributions to Nevergrad will be rewarded (conference grants) ⇒ join us! Can be a huge technical code contribution, or a great one-line idea :)
  • 50.
    Open scalable generic Zero learning: Polygames @ FB. Apr. 2019. Contact: oteytaud@fb.com AlphaGo and AlphaZero are great. But: - they use quite a lot of self-play data - there are still games in which humans are stronger than computers (global criterion, multiple goals, non-squared locations) - not all games can be zero-learnt. ⇒ Scientific innovations needed ⇒ Open-source platform needed
  • 51.
    Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06). UCT (Upper Confidence Trees) starts with simple Monte Carlo…
  • 52.
    UCT: Monte Carlo… and keep track of statistics!
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
    Exploitation ... Monte Carlo, and build statistics… and modify MC with those statistics!
  • 58.
    Exploitation ... SCORE = 5/7 + k·sqrt(log(10)/7)
  • 59.
    Exploitation ... SCORE = 5/7 + k·sqrt(log(10)/7)
  • 60.
    AlphaZero ingredient #1: Monte Carlo: random exploration of possible futures.
  • 61.
    Better than simple MC: use the statistics! SCORE = 5/7 + k·sqrt(log(10)/7). AlphaZero ingredient #2: Monte Carlo Tree Search (a.k.a. adaptive Monte Carlo)
  • 62.
    ... or exploration? SCORE = 0/2 + k·sqrt(log(10)/2)
  • 63.
    ... or exploration? SCORE = 0/2 + k·sqrt(log(10)/2). Why the second term? Why the first term?
  • 64.
    ... or exploration? SCORE = 0/2 + k·sqrt(log(10)/2). Why the second term? ⇒ For exploring everything, eventually. Why the first term?
  • 65.
    ... or exploration? SCORE = 0/2 + k·sqrt(log(10)/2). Why the second term? ⇒ For exploring everything, eventually. Why the first term? ⇒ For more simulations in good directions.
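The two terms can be written down directly; the function name and the default exploration constant k below are just common choices, not values from the talk.

```python
import math

def ucb_score(wins, visits, parent_visits, k=1.4):
    exploitation = wins / visits                   # first term: empirical win rate
    exploration = k * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration              # second term favors rarely tried moves

# The slide's two arms, with 10 parent simulations in total:
well_tried = ucb_score(5, 7, 10)    # 5/7 + k*sqrt(log(10)/7)
under_tried = ucb_score(0, 2, 10)   # 0/2 + k*sqrt(log(10)/2)
```

With k = 0 only exploitation remains and the 5/7 arm always wins; a large enough k lets the under-tried 0/2 arm overtake it, which is the point of the second term.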
  • 66.
    UCT in one slide. UCT for choosing a move in a board B:
    While (I have time left) {
      Do a simulation {
        Start at board B
        At each time step, choose the action by UCB (or at random if no statistics!)
      }
      Update statistics with this simulation
    }
    Return the most simulated action.
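That pseudocode can be made concrete on a toy single-player game: choose a bit at each of 5 steps, reward = fraction of ones. This is a deliberately minimal sketch (no tree object, statistics in a flat dictionary), not Polygames' implementation.

```python
import math
import random

def uct(n_sims=2000, depth=5, k=1.4, rng=random.Random(0)):
    stats = {}  # (state, action) -> (visits, total_reward); state = moves so far
    root = ()
    for _ in range(n_sims):
        state, path = root, []
        while len(state) < depth:
            if all((state, a) in stats for a in (0, 1)):
                # choose the action by UCB
                n_parent = sum(stats[(state, a)][0] for a in (0, 1))
                action = max((0, 1), key=lambda a:
                             stats[(state, a)][1] / stats[(state, a)][0]
                             + k * math.sqrt(math.log(n_parent) / stats[(state, a)][0]))
            else:
                action = rng.choice((0, 1))   # random if no statistics
            path.append((state, action))
            state += (action,)
        reward = sum(state) / depth           # toy game: reward playing 1
        for sa in path:                       # update statistics with this simulation
            n, w = stats.get(sa, (0, 0.0))
            stats[sa] = (n + 1, w + reward)
    # return the most simulated action at the root
    return max((0, 1), key=lambda a: stats[(root, a)][0])
```

Early simulations are pure Monte Carlo (no statistics yet); later ones are biased toward good branches, which is exactly the "adaptive Monte Carlo" of ingredient #2.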
  • 67.
    AlphaZero ingredient #3: deep network. Overview in “Deep learning”, LeCun, Bengio, Hinton 2015. Both a critic network (evaluating the probability of winning in a given position) and a policy network (providing a probability distribution on actions). (Figure: invariance by translation → high-level features; images from clarifai.com/technology and “the data science blog”.)
  • 68.
    PUCT: a variant of MCTS with a neural prior. SCORE(state, action) = 5/7 + NN(state, action)·sqrt(10/7)
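The slide's score, with the network output as a prior weight on the exploration term and no logarithm (unlike UCB), can be written out as follows; the function name is ours, and the comment notes the published AlphaZero variant for contrast.

```python
import math

def puct_score(mean_value, prior, parent_visits, visits):
    # Slide's form: value + prior * sqrt(parent_visits / visits).
    # AlphaZero's published variant is value + c * prior * sqrt(parent) / (1 + visits).
    return mean_value + prior * math.sqrt(parent_visits / visits)

# The slide's numbers: 5 wins in 7 visits, 10 parent visits, prior = NN(state, action).
example = puct_score(5 / 7, 0.3, 10, 7)
```

A strong prior keeps the exploration bonus high even after several visits, so moves the network likes get simulated first.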
  • 69.
    AlphaZero in a nutshell: a fixed-point method! MCTS(NN): an MCTS which uses a neural net NN for • Evaluating leaves (no random rollout) • Suggesting policies (biasing the MCTS). NN ← MCTS: • Each client plays games with MCTS(NN) • Server: • receives batches “(states, actions, reward at end of games)” • Two loss functions (+ weight decay): • Learn “state → reward” (critic) • Learn “state → probability distribution on actions” (actor), i.e. mimic the MCTS. ALPHAZERO: • randomly initialize NN • iteratively imitate: the NN actor imitates MCTS(NN), the NN critic imitates game results. (Loss terms: prediction of value; p imitates 𝜋 from the MCTS; weight decay.)
  • 70.
    AlphaZero in a nutshell: a fixed-point method! Neural network → MCTS using the neural network (tree search + neural net) → neural network trained using the MCTS results. (Loss terms: prediction of value; p imitates 𝜋 from the MCTS; weight decay.)
  • 71.
    Adding mutations in Zero learning? Original Zero learning = convolutions + fully connected layers. Outputs of the network: (1) a tensor Pi = logits of actions, typically X×Y×C, where X×Y = board size = same first two dims as the inputs (as in dense image classification!); (2) a float V = probability of winning. Our claims: (1) Fully connected layers have drawbacks for Pi: they lose the local invariance. Traditional Zero = invariant by permutation of the representation of actions on the board. As much nonsense as full connections on images. Make it as in dense image classification! (2) Add global pooling (= fixed-length representation based on statistics over the board, for each channel). ⇒ By (1)+(2), the network is board-size-independent.
  • 72.
    Adding mutations in Zero learning? (3) Residual networks ⇒ adding - layers (initialized close to 0) - channels (initialized close to 0) - kernel size (new entries close to 0) ⇒ preserves the computed function → incremental Zero learning ⇒ towards architecture search in Zero learning.
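The function-preservation claim in (3) is easy to check numerically: a residual block whose new weights are zero adds exactly nothing, so the grown network computes the same function. This is a toy dense numpy network standing in for the convolutional one, not Polygames code.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))      # a batch of 4 toy "board feature" vectors
W = rng.normal(size=(8, 8))      # existing trained layer

def net(x, extra=None):
    h = np.tanh(x @ W)
    if extra is not None:
        h = h + np.tanh(h @ extra)   # newly added residual block
    return h

# Growing the network with a zero-initialized block leaves the output unchanged,
# because tanh(h @ 0) = 0 and the skip connection passes h through untouched.
grown = net(x, extra=np.zeros((8, 8)))
```

Training can then move the new weights away from zero, mutating the architecture without throwing away what was already learned.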
  • 73.
    HEX. According to Bonnet et al. (https://www.lamsade.dauphine.fr/~bonnet/publi/connection-games.pdf): “Since its independent inventions in 1942 and 1948 by the poet and mathematician Piet Hein and the economist and mathematician John Nash, the game of hex has acquired a special spot in the heart of abstract game aficionados. Its purity and depth has led Jack van Rijswijck to conclude his PhD thesis with the following hyperbole [1]: ‘Hex has a Platonic existence, independent of human thought. If ever we find an extraterrestrial civilization at all, they will know hex, without any doubt.’”
  • 74.
    HEX. Simplest rules ever! I play black. You play white. We put a stone in turn. If I connect my sides, I win. If you connect your sides, you win. Theorem: no draw. Until 2019/10/31, no computer managed to beat the best humans!
  • 75.
    HEX: Polygames vs Arek Kulczycki (winner of the last LG tournament, best ELO rank on the LittleGolem server). A bunch of GPUs, several days. Operated & trained by Vegard, a.k.a. “un putain de hacker de ouf” (“one hell of a hacker”). Thanks a lot!!!
  • 76.
    HEX. Simplest rules ever! I play black. You play white. We put a stone in turn. If I connect my sides, I win. If you connect your sides, you win. Theorem: no draw. Until 2019/10/31, no computer managed to beat the best humans! (Max Pixel)
  • 77.
    HEX. Simplest rules ever! I play black. You play white. We put a stone in turn. If I connect my sides, I win. If you connect your sides, you win. Theorem: no draw. Until 2019/10/31, no computer managed to beat the best humans! (pngimg.com)
  • 78.
    HEX. Simplest rules ever! I play black. You play white. We put a stone in turn. If I connect my sides, I win. If you connect your sides, you win. Theorem: no draw. Until 2019/10/31, no computer managed to beat the best humans! Fantastic game with a super long final path! TRAINED on 13x13, WON on 19x19.
  • 79.
    THE END !!! …and we’re coming to many other games :) Havannah: big board, diversity of winning conditions, long games, hexagons… Let’s have a beer! Contact: oteytaud@fb.com