DERIVATIVE-FREE OPTIMIZATION

http://www.lri.fr/~teytaud/dfo.pdf
(or Quentin's web page?)

Olivier Teytaud
Inria Tao, visiting the beautiful city of Liège
(also using slides from A. Auger)
The next slide is the most important of all.
In case of trouble,
interrupt me.

Further discussion needed:
 - R82A, Montefiore institute
 - olivier.teytaud@inria.fr
 - or after the lessons (the 25th, not the 18th)
I. Optimization and DFO
II. Evolutionary algorithms
III. From math. programming
IV. Using machine learning
V. Conclusions
Derivative-free optimization of f

   ==> No gradient!
   ==> Only depends on the x's and the f(x)'s
Derivative-free optimization of f

Why derivative-free optimization?
 - Ok, it's slower.
 - But sometimes you have no derivative.
 - It's simpler (by far) ==> fewer bugs.
 - It's more robust (to noise, to strange functions...).

Optimization algorithms:
   ==> Newton optimization
   ==> Quasi-Newton (BFGS)
   ==> Gradient descent
   ==> ...

      Derivative-free optimization
         (don't need gradients)

         Comparison-based optimization (coming soon),
            just needing comparisons,
            including evolutionary algorithms
I. Optimization and DFO
II. Evolutionary algorithms
III. From math. programming
IV. Using machine learning
V. Conclusions
II. Evolutionary algorithms
   a. Fundamental elements
   b. Algorithms
   c. Math. analysis
Preliminaries:
 - Gaussian distribution
 - Multivariate Gaussian distribution
 - Non-isotropic Gaussian distribution
 - Markov chains
      ==> for theoretical analysis

Gaussian distribution: density K exp( - p(x) ) with
 - p(x) a degree-2 polynomial (negative dominant coefficient)
 - K a normalization constant
(In the figure: μ gives the translation of the Gaussian,
 σ gives the size of the Gaussian.)

Multivariate Gaussian distribution, isotropic case:
==> general case: density = K exp( - || x - μ ||² / (2σ²) )
==> level sets are rotationally invariant
==> completely defined by μ and σ
      (do you understand why K is fixed by σ?)
==> "isotropic" Gaussian

Non-isotropic Gaussian distribution:
   step-size different on each axis

   density K exp( - p(x) ) with
    - p(x) a positive quadratic form (--> +infinity as ||x|| grows)
    - K a normalization constant
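To make the Gaussian variants concrete, here is a minimal numpy sketch (variable names are ours) showing how each kind is sampled:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
mu = np.zeros(d)                 # mean: translation of the Gaussian

# Isotropic Gaussian: a single step-size sigma for all axes.
sigma = 0.5
x_iso = mu + sigma * rng.standard_normal(d)

# Non-isotropic (axis-wise) Gaussian: one step-size per axis.
sigmas = np.array([0.1, 2.0])
x_aniso = mu + sigmas * rng.standard_normal(d)
```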
Notions that we will see:
- Evolutionary algorithm
- Cross-over
- Truncation selection / roulette wheel
- Linear / log-linear convergence
- Estimation of Distribution Algorithm
- EMNA
- Self-adaptation
- (1+1)-ES with 1/5th rule
- Voronoi representation
- Non-isotropy
Comparison-based optimization

Observation: we want robustness w.r.t. increasing transformations of
the fitness values y(i) = f(x(i)).

Opt is comparison-based if the points it proposes depend on the y(i)'s
only through their pairwise comparisons (the signs of the differences).
Population-based comparison-based algorithms ?

  x(1) = ( x(1,1), x(1,2), ..., x(1,λ) ) = Opt()
  x(2) = ( x(2,1), x(2,2), ..., x(2,λ) ) = Opt( x(1), signs of diff )
            ...            ...            ...
  x(n) = ( x(n,1), x(n,2), ..., x(n,λ) ) = Opt( x(n-1), signs of diff )

  ==> let's write it for λ = 2:

  x(1) = ( x(1,1), x(1,2) ) = Opt()
  x(2) = ( x(2,1), x(2,2) ) = Opt( x(1), sign( y(1,1) - y(1,2) ) )
            ...            ...            ...
  x(n) = ( x(n,1), x(n,2) ) = Opt( x(n-1), sign( y(n-1,1) - y(n-1,2) ) )

       with y(i,j) = f( x(i,j) )

  Abstract notations: x(i) is a population, I(i) is an
  internal state of the algorithm:

  x(1), I(1) = Opt()
  x(2), I(2) = Opt( x(1), sign( y(1,1) - y(1,2) ), I(1) )
            ...            ...            ...
  x(n), I(n) = Opt( x(n-1), sign( y(n-1,1) - y(n-1,2) ), I(n-1) )

  (at each step, Opt receives only the comparison outcomes
   of the previous population, nothing else about f)
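As an illustration, a minimal Python sketch of this abstraction (the names run_comparison_based, opt_init, opt_step are ours, not from the slides): the loop hands the optimizer a ranking of the current population, which encodes all the pairwise sign(y(i) - y(j)) comparisons, and never the raw fitness values.

```python
import numpy as np

def run_comparison_based(opt_init, opt_step, f, n_iters):
    """Generic comparison-based loop: Opt sees only comparison
    outcomes (a ranking), never the fitness values themselves."""
    population, state = opt_init()
    for _ in range(n_iters):
        ys = [f(x) for x in population]       # fitness values...
        ranking = np.argsort(ys)              # ...reduced to comparisons
        population, state = opt_step(population, ranking, state)
    return population, state
```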
Comparison-based optimization

==> Same behavior on many functions: a comparison-based Opt behaves
    identically on f and on g o f for any increasing g, since the
    comparisons are unchanged.

==> Quasi-Newton methods are very poor on this.
Why comparison-based algorithms?
   ==> more robust
   ==> this can be mathematically formalized: comparison-based opt.
       is slow ( d log ||x(n) - x*|| / n ~ constant )
       but robust (optimal for some worst-case analysis)
II. Evolutionary algorithms
   a. Fundamental elements
   b. Algorithms
   c. Math. analysis
Basic schema of an Evolution Strategy

Parameters: x, σ

   Generate λ points around x
      ( x + σN where N is a standard Gaussian )

   Compute their λ fitness values

   Select the μ best

   Let x = average of these μ best

Obviously parallel (the λ fitness evaluations are independent):
   multi-cores, clusters, grids...

Really simple. Not a negligible advantage:
   when I got access, for the first time, to a crucial industrial
   code of an important company, I believed that it would be
   clean and bug-free. (I was young.)
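A minimal Python sketch of this schema (our naming; the step-size σ is kept fixed here, which is exactly the issue discussed below):

```python
import numpy as np

def basic_es(f, x0, sigma=1.0, lam=6, mu=3, n_iters=100, seed=0):
    """Basic Evolution Strategy: sample lambda points around x,
    select the mu best, recenter x on their average."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        points = x + sigma * rng.standard_normal((lam, x.size))
        fitness = np.array([f(p) for p in points])
        best = points[np.argsort(fitness)[:mu]]   # truncation selection
        x = best.mean(axis=0)                     # average of the mu best
    return x

# Example on the sphere function f(x) = ||x||^2:
x_final = basic_es(lambda x: float(x @ x), x0=np.ones(10))
```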
The (1+1)-ES with 1/5th rule

Parameters: x, σ

   Generate 1 point x' around x
      ( x + σN where N is a standard Gaussian )

   Compute its fitness value

   Keep the best (x or x'):
      x = best(x, x')

   σ = 2σ if x' is best
   σ = 0.84σ otherwise
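A minimal sketch of this algorithm (our naming, minimization assumed):

```python
import numpy as np

def one_plus_one_es(f, x0, sigma=1.0, n_iters=1000, seed=0):
    """(1+1)-ES with the multiplicative step-size rule above:
    sigma *= 2 on success, sigma *= 0.84 on failure."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(n_iters):
        x_new = x + sigma * rng.standard_normal(x.size)
        f_new = f(x_new)
        if f_new < fx:                 # x' is best: accept, expand sigma
            x, fx, sigma = x_new, f_new, 2.0 * sigma
        else:                          # x' rejected: shrink sigma
            sigma *= 0.84
    return x
```

Note that 2 × 0.84^4 ≈ 1, so σ is roughly stationary when about one mutation in five is accepted; this is the 1/5th rule analyzed later in the deck.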
This is x...
I generate λ = 6 points.
I select the μ = 3 best points.
x = average of these μ = 3 best points.
Ok. Choosing an initial x is as in any algorithm.
But how do I choose sigma?

Sometimes by human guess.
But for a large number of iterations, there is better.
log || x(n) - x* || ~ - C n

Usually termed "linear convergence",
==> but it's linear in log-scale:
       || x(n) - x* || ~ exp( - C n )
Examples of evolutionary algorithms




Estimation of Multivariate Normal Algorithm (EMNA)

[Figures illustrating successive EMNA iterations.]
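The figures are lost; as a textual substitute, a minimal EMNA-style sketch (our naming and constants): fit a multivariate normal to the selected points, then resample from it.

```python
import numpy as np

def emna(f, x0, sigma0=1.0, lam=50, mu=12, n_iters=100, seed=0):
    """EMNA-style loop: estimate mean and covariance from the mu
    best points, sample the next population from that Gaussian."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    cov = (sigma0 ** 2) * np.eye(mean.size)
    for _ in range(n_iters):
        pop = rng.multivariate_normal(mean, cov, size=lam)
        fitness = np.array([f(p) for p in pop])
        best = pop[np.argsort(fitness)[:mu]]
        mean = best.mean(axis=0)                 # re-estimated mean
        centered = best - mean
        cov = centered.T @ centered / mu         # re-estimated covariance
        cov += 1e-12 * np.eye(mean.size)         # keep it positive definite
    return mean
```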
EMNA is usually non-isotropic.
Self-adaptation (works in many frameworks)

Can be used for non-isotropic multivariate Gaussian distributions.
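The slides do not spell out the update; as an illustration, a minimal self-adaptive ES sketch under the usual convention that each offspring perturbs its own step-size log-normally before mutating x (the constant tau and all names are ours):

```python
import numpy as np

def self_adaptive_es(f, x0, sigma0=1.0, lam=12, mu=3, n_iters=200, seed=0):
    """Self-adaptation: each offspring carries its own step-size,
    mutated log-normally; selection acts on (x, sigma) jointly."""
    rng = np.random.default_rng(seed)
    x, sigma = np.asarray(x0, dtype=float), sigma0
    tau = 1.0 / np.sqrt(2.0 * x.size)            # common choice of rate
    for _ in range(n_iters):
        sigmas = sigma * np.exp(tau * rng.standard_normal(lam))
        points = x + sigmas[:, None] * rng.standard_normal((lam, x.size))
        order = np.argsort([f(p) for p in points])[:mu]
        x = points[order].mean(axis=0)
        sigma = np.exp(np.log(sigmas[order]).mean())  # geometric mean
    return x, sigma
```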
Let's generalize.

We have seen algorithms which work as follows:

 - we keep one search point in memory (and one step-size)
      ==> μ search points
 - we generate individuals
      ==> λ generated individuals
 - we evaluate these individuals
 - we regenerate a search point and a step-size

Maybe we could keep more than one search point?
Parameters: x1, ..., xμ

   Generate λ points around x1, ..., xμ,
      e.g. each new x randomly generated from two points
      ==> this is a cross-over.

   Compute their λ fitness values.

   Select the μ best.

   Don't average: keep the μ best as the new x1, ..., xμ.

(Still obviously parallel, still really simple.)
Example of procedure for generating a point:

 - Randomly draw k parents x1, ..., xk
      (truncation selection: drawn randomly among the selected individuals)

 - For generating the i-th coordinate of the new individual z:
      u = random(1, k)
      z(i) = x(u)(i)
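A minimal sketch of this generation procedure (our naming):

```python
import numpy as np

def uniform_crossover(parents, rng):
    """Generate one child z: each coordinate is copied from a
    uniformly drawn parent, i.e. z(i) = x(u)(i) with u = random(1, k)."""
    parents = np.asarray(parents)            # shape (k, dimension)
    k, dim = parents.shape
    u = rng.integers(0, k, size=dim)         # one parent index per coordinate
    return parents[u, np.arange(dim)]

rng = np.random.default_rng(0)
child = uniform_crossover([[0., 0., 0.], [1., 1., 1.]], rng)
```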
Let's summarize:
We have seen a general scheme for optimization:
 - generate a population, either from some distribution (EDA)
      or from a set of search points (EA)
 - select the best = new search points

==> Small difference between an
    Evolutionary Algorithm (EA) and an
    Estimation of Distribution Algorithm (EDA).
==> Some EAs (older than the EDA acronym) are EDAs.
Gives a lot of freedom:
 - choose your representation and operators (depending on the problem)
 - if you have a step-size, choose an adaptation rule
 - choose your population size λ (depending on your computer/grid)
 - choose μ (carefully), e.g. μ = min(dimension, λ/4)

Can handle strange things:
 - optimize a physical structure?
 - structure represented as a Voronoi diagram
 - cross-over makes sense, benefits from local structure
 - not so many algorithms can work on that
Voronoi representation:
 - a family of points
 - their labels
==> cross-over makes sense
==> you can optimize a shape
==> not that mathematical, but really useful

Mutations: each label is changed with probability 1/n.

Cross-over, variant 1: each point/label is randomly drawn from one of
   the two parents.
Cross-over, variant 2: randomly pick one split in the representation:
   - left part from parent 1
   - right part from parent 2
   ==> related to biology
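A minimal sketch of these operators on a Voronoi individual, assuming binary labels stored as numpy arrays and parents with the same number of points (our simplification):

```python
import numpy as np

def mutate(points, labels, rng):
    """Each label is flipped independently with probability 1/n."""
    n = len(labels)
    flip = rng.random(n) < 1.0 / n
    return points, np.where(flip, 1 - labels, labels)

def crossover_uniform(parent1, parent2, rng):
    """Variant 1: each (point, label) pair comes from a random parent."""
    (pts1, lab1), (pts2, lab2) = parent1, parent2
    take1 = rng.random(len(lab1)) < 0.5
    return np.where(take1[:, None], pts1, pts2), np.where(take1, lab1, lab2)

def crossover_one_point(parent1, parent2, rng):
    """Variant 2: one random split; left part from parent 1,
    right part from parent 2."""
    (pts1, lab1), (pts2, lab2) = parent1, parent2
    s = rng.integers(1, len(lab1))
    return (np.vstack([pts1[:s], pts2[s:]]),
            np.concatenate([lab1[:s], lab2[s:]]))
```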
II. Evolutionary algorithms
   a. Fundamental elements
   b. Algorithms
   c. Math. analysis
Consider the (1+1)-ES:
   x(n) = x(n-1)  or  x(n-1) + σ(n-1) N

We want to maximize:

      - E log || x(n)   - x* ||
     ----------------------------
      - E log || x(n-1) - x* ||

We don't know x*. How can we optimize this?
==> We will observe the acceptance rate,
    and we will deduce if σ is too large or too small.
On the norm function: for each step-size, evaluate this
"expected progress rate"

      - E log || x(n)   - x* ||
     ----------------------------
      - E log || x(n-1) - x* ||

and evaluate "P(acceptance)".

[Figure: accepted mutations vs. rejected mutations
 around the current search point, for a given step-size.]
[Figure: progress rate as a function of the acceptance rate.
 A big step-size gives a small acceptance rate; a small step-size
 gives a big acceptance rate. We want to be at the top of the
 curve, and the acceptance rate is the variable we observe
 (approximately).]

Small acceptance rate ==> decrease sigma.
Big acceptance rate ==> increase sigma.
1/5th rule

Based on maths showing that
   good step-size <==> success rate ≈ 1/5.
I. Optimization and DFO
II. Evolutionary algorithms
III. From math. programming
IV. Using machine learning
V. Conclusions
III. From math. programming
==> pattern search methods

Comparison with ES:
 - code more complicated
 - same rate
 - deterministic
 - less robust
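The deck gives no pseudo-code for pattern search; as an illustration, a minimal compass-search sketch (one common member of the pattern search family; names and constants are ours):

```python
import numpy as np

def compass_search(f, x0, step=1.0, tol=1e-8, max_evals=10000):
    """Deterministic pattern search: poll the 2d axis directions;
    move on the first improvement, halve the step if none improves."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    evals = 0
    directions = np.vstack([np.eye(x.size), -np.eye(x.size)])
    while step > tol and evals < max_evals:
        for direction in directions:
            candidate = x + step * direction
            fc = f(candidate)
            evals += 1
            if fc < fx:                 # success: accept the move
                x, fx = candidate, fc
                break
        else:                           # full poll failed: shrink the step
            step *= 0.5
    return x
```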
III. From math. programming

Also:
 - Nelder-Mead algorithm (similar to pattern search,
      better constant in the rate)
 - NEWUOA (using value functions and
      not only comparisons)
I. Optimization and DFO
II. Evolutionary algorithms
III. From math. programming
IV. Using machine learning
V. Conclusions
IV. Using machine learning

What if computing f takes days?
==> parallelism
==> and "learn" an approximation of f
IV. Using machine learning

Statistical tools:
   f'(x) = approximation( x, x1, f(x1), x2, f(x2), ..., xn, f(xn) )
   y(n+1) = f'( x(n+1) )

e.g. f' = quadratic function closest to f on the x(i)'s.
IV. Using machine learning

==> keyword: "surrogate models"
==> use f' instead of f
==> periodically, re-use the real f
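A minimal least-squares sketch of the quadratic-surrogate idea above (our naming; a real surrogate-based optimizer would be more careful about sample size and model validity):

```python
import numpy as np

def fit_quadratic_surrogate(xs, ys):
    """f'(x): the quadratic function closest (least squares) to f
    on the evaluated points (x(i), f(x(i)))."""
    xs, ys = np.asarray(xs), np.asarray(ys)
    n, d = xs.shape
    feats = np.hstack([np.ones((n, 1)), xs,
                       np.einsum('ni,nj->nij', xs, xs).reshape(n, -1)])
    coef, *_ = np.linalg.lstsq(feats, ys, rcond=None)
    def f_prime(x):
        x = np.asarray(x, dtype=float)
        return float(np.concatenate([[1.0], x, np.outer(x, x).ravel()]) @ coef)
    return f_prime
```

One then optimizes the cheap f' for most iterations and periodically re-evaluates candidates with the real f, as the slide says.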
I. Optimization and DFO
II. Evolutionary algorithms
III. From math. programming
IV. Using machine learning
V. Conclusions
Derivative-free optimization is fun.

==> nice maths
==> nice applications + easily parallel algorithms
==> can handle really complicated domains
    (mixed continuous / integer, optimization
     on sets of programs)

Yet, it is often suboptimal on highly structured problems (when
BFGS is easy to use, thanks to fast gradients).
Keywords, readings

==> cross-entropy method (so close to evolution strategies)
==> genetic programming (evolutionary algorithms for
       automatically building programs)
==> H.-G. Beyer's book on ES = a good starting point
==> many resources on the web
==> keep in mind that representation / operators are
       often the key
==> we only considered isotropic algorithms; sometimes not
       a good idea at all
More Related Content

What's hot

11.solution of linear and nonlinear partial differential equations using mixt...
11.solution of linear and nonlinear partial differential equations using mixt...11.solution of linear and nonlinear partial differential equations using mixt...
11.solution of linear and nonlinear partial differential equations using mixt...
Alexander Decker
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
Pierre Jacob
 
Spectral Learning Methods for Finite State Machines with Applications to Na...
  Spectral Learning Methods for Finite State Machines with Applications to Na...  Spectral Learning Methods for Finite State Machines with Applications to Na...
Spectral Learning Methods for Finite State Machines with Applications to Na...
LARCA UPC
 
Approximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUsApproximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUs
Michael Stumpf
 
Neural Processes
Neural ProcessesNeural Processes
Neural Processes
Sangwoo Mo
 
Influence of the sampling on Functional Data Analysis
Influence of the sampling on Functional Data AnalysisInfluence of the sampling on Functional Data Analysis
Influence of the sampling on Functional Data Analysis
tuxette
 
A current perspectives of corrected operator splitting (os) for systems
A current perspectives of corrected operator splitting (os) for systemsA current perspectives of corrected operator splitting (os) for systems
A current perspectives of corrected operator splitting (os) for systems
Alexander Decker
 
11.[104 111]analytical solution for telegraph equation by modified of sumudu ...
11.[104 111]analytical solution for telegraph equation by modified of sumudu ...11.[104 111]analytical solution for telegraph equation by modified of sumudu ...
11.[104 111]analytical solution for telegraph equation by modified of sumudu ...
Alexander Decker
 
Artificial Intelligence and Optimization with Parallelism
Artificial Intelligence and Optimization with ParallelismArtificial Intelligence and Optimization with Parallelism
Artificial Intelligence and Optimization with Parallelism
Olivier Teytaud
 
Rouviere
RouviereRouviere
Rouviere
eric_gautier
 
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
Alex (Oleksiy) Varfolomiyev
 
Nonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer VisionNonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer Vision
zukun
 
Prévision de consommation électrique avec adaptive GAM
Prévision de consommation électrique avec adaptive GAMPrévision de consommation électrique avec adaptive GAM
Prévision de consommation électrique avec adaptive GAM
Cdiscount
 
YSC 2013
YSC 2013YSC 2013
YSC 2013
Adrien Ickowicz
 
Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputs
zukun
 
simplex.pdf
simplex.pdfsimplex.pdf
simplex.pdf
grssieee
 
Fdtd
FdtdFdtd

What's hot (17)

11.solution of linear and nonlinear partial differential equations using mixt...
11.solution of linear and nonlinear partial differential equations using mixt...11.solution of linear and nonlinear partial differential equations using mixt...
11.solution of linear and nonlinear partial differential equations using mixt...
 
Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...Estimation of the score vector and observed information matrix in intractable...
Estimation of the score vector and observed information matrix in intractable...
 
Spectral Learning Methods for Finite State Machines with Applications to Na...
  Spectral Learning Methods for Finite State Machines with Applications to Na...  Spectral Learning Methods for Finite State Machines with Applications to Na...
Spectral Learning Methods for Finite State Machines with Applications to Na...
 
Approximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUsApproximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUs
 
Neural Processes
Neural ProcessesNeural Processes
Neural Processes
 
Influence of the sampling on Functional Data Analysis
Influence of the sampling on Functional Data AnalysisInfluence of the sampling on Functional Data Analysis
Influence of the sampling on Functional Data Analysis
 
A current perspectives of corrected operator splitting (os) for systems
A current perspectives of corrected operator splitting (os) for systemsA current perspectives of corrected operator splitting (os) for systems
A current perspectives of corrected operator splitting (os) for systems
 
11.[104 111]analytical solution for telegraph equation by modified of sumudu ...
11.[104 111]analytical solution for telegraph equation by modified of sumudu ...11.[104 111]analytical solution for telegraph equation by modified of sumudu ...
11.[104 111]analytical solution for telegraph equation by modified of sumudu ...
 
Artificial Intelligence and Optimization with Parallelism
Artificial Intelligence and Optimization with ParallelismArtificial Intelligence and Optimization with Parallelism
Artificial Intelligence and Optimization with Parallelism
 
Rouviere
RouviereRouviere
Rouviere
 
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
Optimal Finite Difference Grids for Elliptic and Parabolic PDEs with Applicat...
 
Nonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer VisionNonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer Vision
 
Prévision de consommation électrique avec adaptive GAM
Prévision de consommation électrique avec adaptive GAMPrévision de consommation électrique avec adaptive GAM
Prévision de consommation électrique avec adaptive GAM
 
YSC 2013
YSC 2013YSC 2013
YSC 2013
 
Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputs
 
simplex.pdf
simplex.pdfsimplex.pdf
simplex.pdf
 
Fdtd
FdtdFdtd
Fdtd
 

Viewers also liked

Plantilla trazos
Plantilla trazosPlantilla trazos
Plantilla trazos
Ledis Gutierrez
 
Undecidability in partially observable deterministic games
Undecidability in partially observable deterministic gamesUndecidability in partially observable deterministic games
Undecidability in partially observable deterministic games
Olivier Teytaud
 
reinforcement learning for difficult settings
reinforcement learning for difficult settingsreinforcement learning for difficult settings
reinforcement learning for difficult settings
Olivier Teytaud
 
France presented to Taiwanese people
France presented to Taiwanese peopleFrance presented to Taiwanese people
France presented to Taiwanese people
Olivier Teytaud
 
Complexity of multiobjective optimization
Complexity of multiobjective optimizationComplexity of multiobjective optimization
Complexity of multiobjective optimization
Olivier Teytaud
 
An introduction to SVN
An introduction to SVNAn introduction to SVN
An introduction to SVN
Olivier Teytaud
 
Derandomized evolution strategies (quasi-random)
Derandomized evolution strategies (quasi-random)Derandomized evolution strategies (quasi-random)
Derandomized evolution strategies (quasi-random)
Olivier Teytaud
 
Presentación1 1
Presentación1 1Presentación1 1
Presentación1 1Jhavi17
 
Machine learning 2016: deep networks and Monte Carlo Tree Search
Machine learning 2016: deep networks and Monte Carlo Tree SearchMachine learning 2016: deep networks and Monte Carlo Tree Search
Machine learning 2016: deep networks and Monte Carlo Tree Search
Olivier Teytaud
 
Why power system studies (and many others!) should be open data / open source
Why power system studies (and many others!) should be open data / open sourceWhy power system studies (and many others!) should be open data / open source
Why power system studies (and many others!) should be open data / open source
Olivier Teytaud
 
Batchal slides
Batchal slidesBatchal slides
Batchal slides
Olivier Teytaud
 

Viewers also liked (11)

Plantilla trazos
Plantilla trazosPlantilla trazos
Plantilla trazos
 
Undecidability in partially observable deterministic games
Undecidability in partially observable deterministic gamesUndecidability in partially observable deterministic games
Undecidability in partially observable deterministic games
 
reinforcement learning for difficult settings
reinforcement learning for difficult settingsreinforcement learning for difficult settings
reinforcement learning for difficult settings
 
France presented to Taiwanese people
France presented to Taiwanese peopleFrance presented to Taiwanese people
France presented to Taiwanese people
 
Complexity of multiobjective optimization
Complexity of multiobjective optimizationComplexity of multiobjective optimization
Complexity of multiobjective optimization
 
An introduction to SVN
An introduction to SVNAn introduction to SVN
An introduction to SVN
 
Derandomized evolution strategies (quasi-random)
Derandomized evolution strategies (quasi-random)Derandomized evolution strategies (quasi-random)
Derandomized evolution strategies (quasi-random)
 
Presentación1 1
Presentación1 1Presentación1 1
Presentación1 1
 
Machine learning 2016: deep networks and Monte Carlo Tree Search
Machine learning 2016: deep networks and Monte Carlo Tree SearchMachine learning 2016: deep networks and Monte Carlo Tree Search
Machine learning 2016: deep networks and Monte Carlo Tree Search
 
Why power system studies (and many others!) should be open data / open source
Why power system studies (and many others!) should be open data / open sourceWhy power system studies (and many others!) should be open data / open source
Why power system studies (and many others!) should be open data / open source
 
Batchal slides
Batchal slidesBatchal slides
Batchal slides
 

Similar to Derivative Free Optimization

Theories of continuous optimization
Theories of continuous optimizationTheories of continuous optimization
Theories of continuous optimization
Olivier Teytaud
 
Dynamic Programming
Dynamic ProgrammingDynamic Programming
Dynamic Programming
Sahil Kumar
 
Noisy optimization --- (theory oriented) Survey
Noisy optimization --- (theory oriented) SurveyNoisy optimization --- (theory oriented) Survey
Noisy optimization --- (theory oriented) Survey
Olivier Teytaud
 
Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms
Xin-She Yang
 
numdoc
numdocnumdoc
numdoc
webuploader
 
Q-Metrics in Theory And Practice
Q-Metrics in Theory And PracticeQ-Metrics in Theory And Practice
Q-Metrics in Theory And Practice
guest3550292
 
Q-Metrics in Theory and Practice
Q-Metrics in Theory and PracticeQ-Metrics in Theory and Practice
Q-Metrics in Theory and Practice
Magdi Mohamed
 
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
npinto
 
Metaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical AnalysisMetaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical Analysis
Xin-She Yang
 
Automatic bayesian cubature
Automatic bayesian cubatureAutomatic bayesian cubature
Automatic bayesian cubature
Jagadeeswaran Rathinavel
 
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Enthought, Inc.
 
Open GL 04 linealgos
Open GL 04 linealgosOpen GL 04 linealgos
Open GL 04 linealgos
Roziq Bahtiar
 
Cuckoo Search Algorithm: An Introduction
Cuckoo Search Algorithm: An IntroductionCuckoo Search Algorithm: An Introduction
Cuckoo Search Algorithm: An Introduction
Xin-She Yang
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
Ahmed BESBES
 
Introduction
IntroductionIntroduction
Introduction
butest
 
Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
Dr. C.V. Suresh Babu
 
BEADS : filtrage asymétrique de ligne de base (tendance) et débruitage pour d...
BEADS : filtrage asymétrique de ligne de base (tendance) et débruitage pour d...BEADS : filtrage asymétrique de ligne de base (tendance) et débruitage pour d...
BEADS : filtrage asymétrique de ligne de base (tendance) et débruitage pour d...
Laurent Duval
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
Shocky1
 
Alg1
Alg1Alg1
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow
규영 허
 

Similar to Derivative Free Optimization (20)

Theories of continuous optimization
Theories of continuous optimizationTheories of continuous optimization
Theories of continuous optimization
 
Dynamic Programming
Dynamic ProgrammingDynamic Programming
Dynamic Programming
 
Noisy optimization --- (theory oriented) Survey
Noisy optimization --- (theory oriented) SurveyNoisy optimization --- (theory oriented) Survey
Noisy optimization --- (theory oriented) Survey
 
Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms
 
numdoc
numdocnumdoc
numdoc
 
Q-Metrics in Theory And Practice
Q-Metrics in Theory And PracticeQ-Metrics in Theory And Practice
Q-Metrics in Theory And Practice
 
Q-Metrics in Theory and Practice
Q-Metrics in Theory and PracticeQ-Metrics in Theory and Practice
Q-Metrics in Theory and Practice
 
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
[Harvard CS264] 09 - Machine Learning on Big Data: Lessons Learned from Googl...
 
Metaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical AnalysisMetaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical Analysis
 
Automatic bayesian cubature
Automatic bayesian cubatureAutomatic bayesian cubature
Automatic bayesian cubature
 
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve FittingScientific Computing with Python Webinar 9/18/2009:Curve Fitting
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
 
Open GL 04 linealgos
Open GL 04 linealgosOpen GL 04 linealgos
Open GL 04 linealgos
 
Cuckoo Search Algorithm: An Introduction
Cuckoo Search Algorithm: An IntroductionCuckoo Search Algorithm: An Introduction
Cuckoo Search Algorithm: An Introduction
 
Introduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from ScratchIntroduction to Neural Networks and Deep Learning from Scratch
Introduction to Neural Networks and Deep Learning from Scratch
 
Introduction
IntroductionIntroduction
Introduction
 
Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
 
BEADS : filtrage asymétrique de ligne de base (tendance) et débruitage pour d...
BEADS : filtrage asymétrique de ligne de base (tendance) et débruitage pour d...BEADS : filtrage asymétrique de ligne de base (tendance) et débruitage pour d...
BEADS : filtrage asymétrique de ligne de base (tendance) et débruitage pour d...
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
 
Alg1
Alg1Alg1
Alg1
 
Swift for tensorflow
Swift for tensorflowSwift for tensorflow
Swift for tensorflow
 

Recently uploaded

AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 

Recently uploaded (20)

AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 

Derivative Free Optimization

  • 1. DERIVATIVE-FREE OPTIMIZATION http://www.lri.fr/~teytaud/dfo.pdf (or Quentin's web page ?) Olivier Teytaud Inria Tao, en visite dans la belle ville de Liège using also Slides from A. Auger
  • 2. The next slide is the most important of all. Olivier Teytaud Inria Tao, en visite dans la belle ville de Liège
  • 3. In case of trouble, Interrupt me. Olivier Teytaud Inria Tao, en visite dans la belle ville de Liège
  • 4. In case of trouble, Interrupt me. Further discussion needed: - R82A, Montefiore institute - olivier.teytaud@inria.fr - or after the lessons (the 25 th , not the 18th) Olivier Teytaud Inria Tao, en visite dans la belle ville de Liège
  • 5.
  • 6. I. Optimization and DFO II. Evolutionary algorithms III. From math. programming IV. Using machine learning V. Conclusions Olivier Teytaud Inria Tao, en visite dans la belle ville de Liège
  • 7. Derivative-free optimization of f Olivier Teytaud Inria Tao, en visite dans la belle ville de Liège
  • 8. Derivative-free optimization of f Olivier Teytaud Inria Tao, en visite dans la belle ville de Liège
  • 9. Derivative-free optimization of f No gradient ! Only depends on the x's and f(x)'s Olivier Teytaud Inria Tao, en visite dans la belle ville de Liège
  • 10. Derivative-free optimization of f Why derivative free optimization ?
  • 11. Derivative-free optimization of f Why derivative free optimization ? Ok, it's slower
  • 12. Derivative-free optimization of f Why derivative free optimization ? Ok, it's slower But sometimes you have no derivative
  • 13. Derivative-free optimization of f Why derivative free optimization ? Ok, it's slower But sometimes you have no derivative It's simpler (by far) ==> less bugs
  • 14. Derivative-free optimization of f Why derivative free optimization ? Ok, it's slower But sometimes you have no derivative It's simpler (by far) It's more robust (to noise, to strange functions...)
  • 15. Derivative-free optimization of f Optimization algorithms ==> Newton optimization ? Why derivative free ==> Quasi-Newton (BFGS) Ok, it's slower But sometimes you have no derivative ==> Gradient descent It's simpler (by far) ==> ...robust (to noise, to strange functions...) It's more
  • 16. Derivative-free optimization of f Optimization algorithms Why derivative free optimization ? Ok, it's slower Derivative-free optimization But sometimes you have no derivative (don't need gradients) It's simpler (by far) It's more robust (to noise, to strange functions...)
  • 17. Derivative-free optimization of f Optimization algorithms Why derivative free optimization ? Derivative-free optimization Ok, it's slower But sometimes you have no derivative Comparison-based optimization (coming soon), It's simpler (by far)comparisons, just needing It's more robust (to noise, to strange functions...) incuding evolutionary algorithms
  • 18. I. Optimization and DFO II. Evolutionary algorithms III. From math. programming IV. Using machine learning V. Conclusions Olivier Teytaud Inria Tao, en visite dans la belle ville de Liège
  • 19. II. Evolutionary algorithms a. Fundamental elements b. Algorithms c. Math. analysis Olivier Teytaud Inria Tao, en visite dans la belle ville de Liège
  • 20. Preliminaries: - Gaussian distribution - Multivariate Gaussian distribution - Non-isotropic Gaussian distribution - Markov chains ==> for theoretical analysis
  • 21. Preliminaries: - Gaussian distribution - Multivariate Gaussian distribution - Non-isotropic Gaussian distribution - Markov chains
  • 22. K exp( - p(x) ) with - p(x) a degree 2 polynomial (neg. dom coef) - K a normalization constant Preliminaries: - Gaussian distribution - Multivariate Gaussian distribution - Non-isotropic Gaussian distribution - Markov chains
  • 23. K exp( - p(x) ) with - p(x) a degree 2 polynomial (neg. dom coef) - K a normalization constant Translation of the Preliminaries:Gaussian Sze of the - Gaussian distribution Gaussian - Multivariate Gaussian distribution - Non-isotropic Gaussian distribution - Markov chains
  • 24. Preliminaries: - Gaussian distribution - Multivariate Gaussian distribution - Non-isotropic Gaussian distribution
• 25. Preliminaries: - Gaussian distribution - Multivariate Gaussian distribution - Non-isotropic Gaussian distribution - Markov chains Isotropic case: density = K exp( - ||x - μ||² / (2σ²) ) ==> level sets are rotationally invariant ==> completely defined by μ and σ ==> "isotropic" Gaussian (do you understand why K is fixed by σ?) ==> general case: next slides
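For concreteness, a minimal numpy sketch of the isotropic case (the function names are ours, not from the slides): sampling is just μ + σN, and the normalization constant K = (2πσ²)^(-d/2) is indeed fixed once σ and the dimension d are.

```python
import numpy as np

def sample_isotropic(mu, sigma, n):
    """Draw n points from N(mu, sigma^2 I): mu + sigma * standard Gaussian."""
    mu = np.asarray(mu)
    return mu + sigma * np.random.randn(n, mu.size)

def density_isotropic(x, mu, sigma):
    """Density K exp(-||x - mu||^2 / (2 sigma^2)); K is fixed by sigma and d."""
    d = len(mu)
    K = (2 * np.pi * sigma**2) ** (-d / 2)
    return K * np.exp(-np.sum((np.asarray(x) - np.asarray(mu))**2) / (2 * sigma**2))
```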
  • 26. Preliminaries: - Gaussian distribution - Multivariate Gaussian distribution - Non-isotropic Gaussian distribution
• 27. Step-size different on each axis K exp( - p(x) ) with - p(x) a quadratic form (tending to +∞, so that the density is normalizable) - K a normalization constant
  • 28. Notions that we will see: - Evolutionary algorithm - Cross-over - Truncation selection / roulette wheel - Linear / log-linear convergence - Estimation of Distribution Algorithm - EMNA - Self-adaptation - (1+1)-ES with 1/5th rule - Voronoi representation - Non-isotropy
• 29. Comparison-based optimization Observation: we want robustness w.r.t. monotone transformations of f. An algorithm is comparison-based if its iterates depend on the f(x)'s only through comparisons (formal definition on the slide).
• 30. Comparison-based optimization, with y(i) = f(x(i)): the algorithm is comparison-based if the next population depends on the y(i)'s only through their pairwise comparisons (formal definition on the slide).
• 31. Population-based comparison-based algorithms ? x(1)=( x(1,1),x(1,2),...,x(1,λ) ) = Opt() x(2)=( x(2,1),x(2,2),...,x(2,λ) ) = Opt(x(1), signs of diff) … … ... x(n)=( x(n,1),x(n,2),...,x(n,λ) ) = Opt(x(n-1), signs of diff) ==> let's write it for λ=2.
• 32. Population-based comparison-based algorithms ? x(1)=(x(1,1),x(1,2)) = Opt() x(2)=(x(2,1),x(2,2)) = Opt(x(1), sign(y(1,1)-y(1,2)) ) … … ... x(n)=(x(n,1),x(n,2)) = Opt(x(n-1), sign(y(n-1,1)-y(n-1,2)) ) with y(i,j) = f ( x(i,j) )
• 33. Population-based comparison-based algorithms ? Abstract notations: x(i) is a population x(1) = Opt() x(2) = Opt(x(1), sign(y(1,1)-y(1,2)) ) … … ... x(n) = Opt(x(n-1), sign(y(n-1,1)-y(n-1,2)) )
• 34. Population-based comparison-based algorithms ? Abstract notations: x(i) is a population, I(i) is an internal state of the algorithm. x(1),I(1) = Opt() x(2),I(2) = Opt(x(1), sign(y(1,1)-y(1,2)), I(1) ) … … ... x(n),I(n) = Opt(x(n-1), sign(y(n-1,1)-y(n-1,2)), I(n-1) )
• 35. Population-based comparison-based algorithms ? Abstract notations: x(i) is a population, I(i) is an internal state of the algorithm, s(i) abbreviates the comparison signs. x(1),I(1) = Opt() x(2),I(2) = Opt(x(1), s(1), I(1) ) … … ... x(n),I(n) = Opt(x(n-1), s(n-1), I(n-1) )
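In code, this abstract scheme is just a loop; a sketch only, with an interface of our own invention (Opt is any comparison-based update rule with this signature):

```python
import itertools

def comparison_signs(values):
    """Signs of all pairwise differences: the only information passed to Opt."""
    return [(i, j, (values[i] > values[j]) - (values[i] < values[j]))
            for i, j in itertools.combinations(range(len(values)), 2)]

def run(f, Opt, n):
    """Opt sees populations and comparison signs only -- never the raw f(x) values."""
    population, state = Opt(None, None, None)   # x(1), I(1) = Opt()
    for _ in range(n - 1):
        signs = comparison_signs([f(x) for x in population])
        population, state = Opt(population, signs, state)
    return population
```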
• 36. Comparison-based optimization ==> same behavior on many functions (all monotone transformations of a given f).
• 37. Comparison-based optimization ==> same behavior on many functions. Quasi-Newton methods are very poor on this.
• 38. Why comparison-based algorithms ? ==> more robust ==> this can be mathematically formalized: comparison-based opt. is slow ( d log ||x(n) - x*|| / n ~ constant ) but robust (optimal for some worst-case analysis)
• 39. II. Evolutionary algorithms a. Fundamental elements b. Algorithms c. Math. analysis
• 40. Basic scheme of an Evolution Strategy. Parameters: x, σ. Generate λ points around x ( x + σN where N is a standard Gaussian )
• 41. Basic scheme of an Evolution Strategy. Parameters: x, σ. Generate λ points around x ( x + σN where N is a standard Gaussian ) Compute their λ fitness values
• 42. Basic scheme of an Evolution Strategy. Parameters: x, σ. Generate λ points around x ( x + σN where N is a standard Gaussian ) Compute their λ fitness values Select the μ best
• 43. Basic scheme of an Evolution Strategy. Parameters: x, σ. Generate λ points around x ( x + σN where N is a standard Gaussian ) Compute their λ fitness values Select the μ best Let x = average of these μ best
• 45. Obviously parallel (multi-cores, clusters, grids...). Parameters: x, σ. Generate λ points around x ( x + σN where N is a standard Gaussian ) Compute their λ fitness values Select the μ best Let x = average of these μ best
• 46. Obviously parallel. Really simple. Parameters: x, σ. Generate λ points around x ( x + σN where N is a standard Gaussian ) Compute their λ fitness values Select the μ best Let x = average of these μ best
• 47. Obviously parallel. Really simple. Not a negligible advantage: when I got access, for the 1st time, to a crucial industrial code of an important company, I believed that it would be clean and bug-free. (I was young.) Parameters: x, σ. Generate λ points around x ( x + σN where N is a standard Gaussian ) Compute their λ fitness values Select the μ best Let x = average of these μ best
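A minimal sketch of this scheme (numpy, minimization, fixed step-size; names and default parameter values are ours, not from the slides):

```python
import numpy as np

def basic_es(f, x, sigma, lam=20, mu=5, iterations=100):
    """Basic (mu/mu, lambda)-ES scheme from the slides, with a fixed sigma."""
    for _ in range(iterations):
        # Generate lambda points around x: x + sigma * N(0, I)
        offspring = x + sigma * np.random.randn(lam, len(x))
        # Compute their lambda fitness values (the embarrassingly parallel step)
        fitness = np.array([f(z) for z in offspring])
        # Select the mu best (minimization) and average them into the new x
        x = offspring[np.argsort(fitness)[:mu]].mean(axis=0)
    return x
```

E.g. basic_es(np.linalg.norm, np.ones(10), 1.0) drifts toward the optimum of the norm function; but with a fixed sigma the progress stalls once sigma is large relative to the remaining distance, which is exactly why step-size adaptation comes next.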
• 48. The (1+1)-ES with 1/5th rule. Parameters: x, σ. Generate 1 point x' around x ( x + σN where N is a standard Gaussian ) Compute its fitness value Keep the best (x or x'): x = best(x, x'), with σ = 2σ if x' is best, σ = 0.84σ otherwise
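The same scheme as code (a sketch with our names; note 0.84 ≈ 2^(-1/4), so a success rate near 1/5 leaves σ roughly unchanged):

```python
import numpy as np

def one_plus_one_es(f, x, sigma, iterations=1000):
    """(1+1)-ES with the slide's rule: sigma*2 on success, sigma*0.84 on failure."""
    fx = f(x)
    for _ in range(iterations):
        xp = x + sigma * np.random.randn(len(x))  # one point x' around x
        fxp = f(xp)
        if fxp <= fx:                   # x' is best: keep it, increase sigma
            x, fx, sigma = xp, fxp, 2 * sigma
        else:                           # x wins: keep it, decrease sigma
            sigma *= 0.84
    return x
```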
• 51. I select the μ=3 best points
• 52. x = average of these μ=3 best points
  • 53. Ok. Choosing an initial x is as in any algorithm. But how do I choose sigma ?
• 54. Ok. Choosing x is as in any algorithm. But how do I choose sigma ? Sometimes by human guess. But for a large number of iterations, there are better ways.
  • 58. log || xn – x* || ~ - C n
  • 59. Usually termed “linear convergence”, ==> but it's in log-scale. log || xn – x* || ~ - C n
• 60. Examples of evolutionary algorithms
• 61. Estimation of Multivariate Normal Algorithm (EMNA; illustrated over several slides)
• 65. EMNA is usually non-isotropic
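A minimal EMNA sketch (ours): refit a full, hence generally non-isotropic, Gaussian to the μ best points at each iteration. A real implementation would regularize the covariance estimate; a tiny ridge term stands in for that here.

```python
import numpy as np

def emna(f, mean, cov, lam=50, mu=12, iterations=100):
    """Estimation of Multivariate Normal Algorithm (minimization sketch)."""
    d = len(mean)
    for _ in range(iterations):
        pop = np.random.multivariate_normal(mean, cov, size=lam)
        best = pop[np.argsort([f(z) for z in pop])[:mu]]
        mean = best.mean(axis=0)                              # new mean: average of the best
        cov = np.cov(best, rowvar=False) + 1e-12 * np.eye(d)  # refit (anisotropic) covariance
    return mean
```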
• 67. Self-adaptation (works in many frameworks) Can be used for non-isotropic multivariate Gaussian distributions.
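One classical flavor, sketched below with standard (not slide-given) constants: each individual carries its own step-size, mutated log-normally before the point is mutated; selecting on fitness then tunes σ as a by-product.

```python
import numpy as np

def self_adaptive_es(f, x, sigma, lam=20, mu=5, iterations=100):
    """ES with a self-adapted scalar step-size; tau is the usual 1/sqrt(d)."""
    d = len(x)
    tau = 1.0 / np.sqrt(d)
    for _ in range(iterations):
        sigmas = sigma * np.exp(tau * np.random.randn(lam))   # mutate the step-sizes
        pop = x + sigmas[:, None] * np.random.randn(lam, d)   # mutate the points
        idx = np.argsort([f(z) for z in pop])[:mu]            # select the mu best
        x = pop[idx].mean(axis=0)
        sigma = np.exp(np.log(sigmas[idx]).mean())            # geometric mean of selected sigmas
    return x
```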
  • 69. Let's generalize. We have seen algorithms which work as follows: - we keep one search point in memory (and one step-size) - we generate individuals - we evaluate these individuals - we regenerate a search point and a step-size Maybe we could keep more than one search point ?
• 70. Let's generalize. We have seen algorithms which work as follows: - we keep one search point in memory (and one step-size) ==> μ search points - we generate individuals ==> λ generated individuals - we evaluate these individuals - we regenerate a search point and a step-size Maybe we could keep more than one search point ?
• 71. Obviously parallel. Really simple. Parameters: μ, λ. Generate λ points around x1,...,xμ ( e.g. each x randomly generated from two points ) Compute their λ fitness values Select the μ best Don't average...
• 72. Generate λ points around x1,...,xμ, e.g. each x randomly generated from two points
• 73. Generate λ points around x1,...,xμ, e.g. each x randomly generated from two points. This is a cross-over.
• 74. Generate λ points around x1,...,xμ, e.g. each x randomly generated from two points. This is a cross-over. Example of procedure for generating a point (see the sketch below): - Randomly draw k parents x1,...,xk (truncation selection: randomly among the selected individuals) - For generating the ith coordinate of the new individual z: u = random(1,k), z(i) = x(u)(i)
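The generating procedure of this slide, as a sketch (names ours):

```python
import random

def generate_child(selected, k=2):
    """Coordinate-wise cross-over: each coordinate copied from a random parent."""
    parents = [random.choice(selected) for _ in range(k)]   # draw k parents from the selected
    d = len(parents[0])
    return [parents[random.randrange(k)][i] for i in range(d)]  # z(i) = x(u)_i, u = random(1,k)
```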
  • 75. Let's summarize: We have seen a general scheme for optimization: - generate a population (e.g. from some distribution, or from a set of search points) - select the best = new search points ==> Small difference between an Evolutionary Algorithm (EA) and an Estimation of Distribution Algorithm (EDA). ==> Some EA (older than the EDA acronym) are EDAs.
• 76. Let's summarize: We have seen a general scheme for optimization: - generate a population (e.g. from some distribution [EDA], or from a set of search points [EA]) - select the best = new search points ==> Small difference between an Evolutionary Algorithm (EA) and an Estimation of Distribution Algorithm (EDA). ==> Some EA (older than the EDA acronym) are EDAs.
• 77. Gives a lot of freedom: - choose your representation and operators (depending on the problem) - if you have a step-size, choose an adaptation rule - choose your population-size λ (depending on your computer/grid) - choose μ (carefully), e.g. μ = min(dimension, λ/4)
• 78. Gives a lot of freedom: - choose your operators (depending on the problem) - if you have a step-size, choose an adaptation rule - choose your population-size λ (depending on your computer/grid) - choose μ (carefully), e.g. μ = min(dimension, λ/4) Can handle strange things: - optimize a physical structure ? - structure represented as a Voronoi - cross-over makes sense, benefits from local structure - not so many algorithms can work on that
• 79. Voronoi representation: - a family of points
  • 81. Voronoi representation: - a family of points - their labels
  • 82. Voronoi representation: - a family of points - their labels ==> cross-over makes sense ==> you can optimize a shape
  • 83. Voronoi representation: - a family of points - their labels ==> cross-over makes sense ==> you can optimize a shape ==> not that mathematical; but really useful Mutations: each label is changed with proba 1/n Cross-over: each point/label is randomly drawn from one of the two parents
  • 84. Voronoi representation: - a family of points - their labels ==> cross-over makes sense ==> you can optimize a shape ==> not that mathematical; but really useful Mutations: each label is changed with proba 1/n Cross-over: randomly pick one split in the representation: - left part from parent 1 - right part from parent 2 ==> related to biology
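A sketch of the representation and of both cross-over variants from these slides (assuming binary labels; everything here is our illustration, not the slides' code):

```python
import math
import random

def label_at(position, cells):
    """cells: list of (point, label); a position gets the label of the nearest point."""
    return min(cells, key=lambda c: math.dist(position, c[0]))[1]

def mutate(cells):
    """Each (binary) label is flipped with probability 1/n."""
    n = len(cells)
    return [(p, 1 - l) if random.random() < 1 / n else (p, l) for p, l in cells]

def crossover_uniform(a, b):
    """Each point/label randomly drawn from one of the two parents."""
    return [random.choice(pair) for pair in zip(a, b)]

def crossover_one_point(a, b):
    """One split: left part from parent 1, right part from parent 2."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]
```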
• 85. Gives a lot of freedom: - choose your operators (depending on the problem) - if you have a step-size, choose an adaptation rule - choose your population-size λ (depending on your computer/grid) - choose μ (carefully), e.g. μ = min(dimension, λ/4) Can handle strange things: - optimize a physical structure ? - structure represented as a Voronoi - cross-over makes sense, benefits from local structure - not so many algorithms can work on that
• 86. II. Evolutionary algorithms a. Fundamental elements b. Algorithms c. Math. analysis
• 87. Consider the (1+1)-ES: x(n) = x(n-1) or x(n-1) + σ(n-1)N. We want to maximize: - E log || x(n) - x* ||
• 88. Consider the (1+1)-ES: x(n) = x(n-1) or x(n-1) + σ(n-1)N. We want to maximize: ( - E log || x(n) - x* || ) / ( - E log || x(n-1) - x* || )
• 89. Consider the (1+1)-ES: x(n) = x(n-1) or x(n-1) + σ(n-1)N. We want to maximize: ( - E log || x(n) - x* || ) / ( - E log || x(n-1) - x* || ). We don't know x*. How can we optimize this ? We will observe the acceptance rate, and we will deduce if σ is too large or too small.
• 90. ( - E log || x(n) - x* || ) / ( - E log || x(n-1) - x* || ) ON THE NORM FUNCTION (figure: rejected mutations vs accepted mutations)
• 91. For each step-size, evaluate this "expected progress rate" and evaluate "P(acceptance)" (figure: rejected mutations vs accepted mutations)
• 92. (figure: progress rate as a function of the acceptance rate; rejected mutations)
• 93. We want to be here (maximal progress rate)! We observe (approximately) this variable: the acceptance rate.
• 94. Big step-size ==> low acceptance rate.
• 95. Small step-size ==> high acceptance rate.
• 96. Small acceptance rate ==> decrease sigma.
• 97. Big acceptance rate ==> increase sigma.
• 98. The 1/5th rule. Based on maths showing that: good step-size <==> success rate ~ 1/5
• 99. I. Optimization and DFO II. Evolutionary algorithms III. From math. programming IV. Using machine learning V. Conclusions
• 100. III. From math. programming ==> pattern search method Comparison with ES: - code more complicated - same rate - deterministic - less robust
• 102. III. From math. programming Also: - Nelder-Mead algorithm (similar to pattern search, better constant in the rate) - NEWUOA (using value functions and not only comparisons)
• 103. I. Optimization and DFO II. Evolutionary algorithms III. From math. programming IV. Using machine learning V. Conclusions
• 104. IV. Using machine learning What if computing f takes days ? ==> parallelism ==> and "learn" an approximation of f
  • 105. IV. Using machine learning Statistical tools: f ' (x) = approximation ( x, x1,f(x1), x2,f(x2), … , xn,f(xn)) y(n+1) = f ' (x(n+1) ) e.g. f' = quadratic function closest to f on the x(i)'s.
  • 106. IV. Using machine learning ==> keyword “surrogate models” ==> use f' instead of f ==> periodically, re-use the real f
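A sketch of the simplest version suggested above: fit the quadratic f' closest to f on the evaluated points by least squares, then query f' cheaply (and, as the slide says, periodically fall back to the real f). Function names are ours.

```python
import numpy as np
from itertools import combinations_with_replacement

def quad_features(x):
    """Monomials of degree <= 2 for one point: [1, x_i..., x_i * x_j...]."""
    x = list(x)
    idx = range(len(x))
    return [1.0] + x + [x[i] * x[j] for i, j in combinations_with_replacement(idx, 2)]

def fit_quadratic(xs, ys):
    """Least-squares quadratic surrogate f' of f on the evaluated points xs."""
    A = np.array([quad_features(x) for x in xs])
    coef, *_ = np.linalg.lstsq(A, np.array(ys), rcond=None)
    return lambda x: float(np.dot(quad_features(x), coef))
```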
• 107. I. Optimization and DFO II. Evolutionary algorithms III. From math. programming IV. Using machine learning V. Conclusions
  • 108. Derivative free optimization is fun. ==> nice maths ==> nice applications + easily parallel algorithms ==> can handle really complicated domains (mixed continuous / integer, optimization on sets of programs) Yet, often suboptimal on highly structured problems (when BFGS is easy to use, thanks to fast gradients)
  • 109. Keywords, readings ==> cross-entropy (so close to evolution strategies) ==> genetic programming (evolutionary algorithms for automatically building programs) ==> H.-G. Beyer's book on ES = good starting point ==> many resources on the web ==> keep in mind that representation / operators are often the key ==> we only considered isotropic algorithms; sometimes not a good idea at all