Lecture 3


       Stochastic Differentiation
Leonidas Sakalauskas
Institute of Mathematics and Informatics
Vilnius, Lithuania <sakal@ktl.mii.lt>

EURO Working Group on Continuous Optimization
Content
   Concept of stochastic gradient
   Analytical differentiation of expectation
   Differentiation of the objective function of
    two-stage SLP
   Finite difference approach
   Stochastic perturbation approximation
   Likelihood approach
   Differentiation of integrals given by inclusion
   Simulation of stochastic gradient
   Projection of Stochastic Gradient
Expected objective function

    Stochastic programming deals with objective and/or
    constraint functions defined as the expectation of a
    random function:

        F(x) = E f(x, ξ),    f : Rⁿ × Ω → R,

    ξ — an elementary event in the probability space

        (Ω, Σ, P_x),

    P_x — the measure defined by the probability density function

        p : Rⁿ → R.
Concept of stochastic gradient

The methods of nonlinear stochastic
programming are built using the concept of
stochastic gradient.

The stochastic gradient of the function F(x)
is a random vector g(x, ξ) such that:

        E g(x, ξ) = ∂F(x)/∂x
Methods of stochastic differentiation

      Several estimators have been examined for the
       stochastic gradient:
         Analytical approach (AA);

         Finite difference approach (FD);

         Likelihood ratio approach (LR);

         Simultaneous perturbation stochastic
          approximation (SPSA).
Stochastic gradient:
         an analytical approach

        F(x) = ∫_{Rⁿ} f(x, z) p(x, z) dz

Differentiating under the integral sign:

        ∂F(x)/∂x = ∫_{Rⁿ} [ ∇_x f(x, z) + f(x, z) · ∇_x ln p(x, z) ] p(x, z) dz

Hence the stochastic gradient:

        g(x, ξ) = ∇_x f(x, ξ) + f(x, ξ) · ∇_x ln p(x, ξ)
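As a minimal numerical check of this formula, consider an assumed toy setup (not from the lecture): f(x, z) = z² with p(x, ·) the N(x, 1) density, so F(x) = x² + 1 and F′(x) = 2x. Here ∇_x f = 0 and ∇_x ln p(x, z) = z − x, so the sample mean of g should approach 2x:

```python
import numpy as np

# Numerical check of g(x, z) = grad_x f(x, z) + f(x, z) * grad_x ln p(x, z).
# Assumed toy case: f(x, z) = z**2, p(x, .) = N(x, 1) density,
# so F(x) = x**2 + 1 and F'(x) = 2x.

rng = np.random.default_rng(0)

def stochastic_gradient(x, z):
    # grad_x f = 0 here; grad_x ln p(x, z) = (z - x) for N(x, 1)
    return z**2 * (z - x)

x = 1.5
z = rng.normal(loc=x, scale=1.0, size=200_000)
estimate = stochastic_gradient(x, z).mean()
print(estimate)  # should be close to F'(x) = 3.0
```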
Analytical approach (AA)
Assume that the density of the random variable
 does not depend on the decision variable.
Then the analytical stochastic gradient
 coincides with the gradient of the
 random integrand:

        g¹(x, ξ) = ∂f(x, ξ)/∂x
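A sketch of the AA estimator in this simple case, for an assumed toy problem (not from the lecture): f(x, ξ) = (x − ξ)² with ξ ~ N(0, 1), so F(x) = x² + 1 and F′(x) = 2x. Averaging g¹ over a sample recovers the true gradient:

```python
import numpy as np

# AA estimator when the density of xi does not depend on x:
# the stochastic gradient is just df/dx at a sampled xi.
# Assumed toy problem: f(x, xi) = (x - xi)**2, xi ~ N(0, 1).

rng = np.random.default_rng(1)

def g1(x, xi):
    # df/dx for f(x, xi) = (x - xi)**2
    return 2.0 * (x - xi)

x = 0.7
xi = rng.normal(size=100_000)
estimate = g1(x, xi).mean()
print(estimate)  # unbiased for F'(x) = 2x = 1.4
```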
Analytical approach (AA)
 Let us consider the two-stage SLP:

        F(x) = cᵀx + E min_y qᵀy → min,

        W y + T x = h,    y ∈ R₊ᵐ,

        A x = b,    x ∈ X;

the vectors q, h and the matrices W, T
can in general be random.
Analytical approach (AA)

  The analytical stochastic gradient is defined as

        g¹(x, ξ) = c − Tᵀu*

  by the set of solutions of the dual problem

        (h − T x)ᵀu* = max_u [ (h − T x)ᵀu | Wᵀu − q ≤ 0, u ∈ Rᵐ ]
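The computation can be sketched on a made-up one-dimensional instance (all problem data below are illustrative, not from the lecture), solving the dual LP with SciPy's `linprog` and forming g¹ = c − Tᵀu*:

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of the AA gradient for a two-stage SLP: solve the dual LP
# max (h - Tx)'u  s.t.  W'u <= q,  u free, then g1 = c - T'u*.
# The tiny problem data below are made up for illustration.

c = np.array([1.0])        # first-stage cost
q = np.array([2.0])        # second-stage cost
W = np.array([[1.0]])
T = np.array([[1.0]])
h = np.array([5.0])
x = np.array([1.0])

rhs = h - T @ x
# linprog minimizes, so negate the dual objective; u is free.
res = linprog(-rhs, A_ub=W.T, b_ub=q, bounds=[(None, None)] * len(h))
u_star = res.x
g1 = c - T.T @ u_star
print(g1)  # stochastic gradient at this realisation
```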
Finite difference (FD) approach
Let us approximate the gradient of the random
  function by finite differences.
Thus, each iᵗʰ component of the stochastic
  gradient g²(x, ξ) is computed as:

        g²ᵢ(x, ξ) = [ f(x + δ·eᵢ, ξ) − f(x, ξ) ] / δ,

where eᵢ is the vector with zero components except the iᵗʰ
one, equal to 1, and δ > 0 is some small value.
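A sketch of the FD estimator: forward differences of the random integrand at a single sampled ξ. The quadratic f below is an assumed toy example, not from the lecture:

```python
import numpy as np

# FD stochastic gradient: perturb each coordinate of x by delta
# while holding the sampled xi fixed.

def fd_gradient(f, x, xi, delta=1e-5):
    g = np.empty_like(x)
    fx = f(x, xi)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = 1.0                      # unit vector e_i
        g[i] = (f(x + delta * e, xi) - fx) / delta
    return g

def f(x, xi):
    # assumed toy integrand; true gradient is 2 * (x - xi)
    return np.sum((x - xi) ** 2)

rng = np.random.default_rng(2)
x = np.array([0.5, -1.0])
xi = rng.normal(size=2)
print(fd_gradient(f, x, xi))  # close to 2 * (x - xi)
```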
Simultaneous perturbation stochastic
  approximation (SPSA)

        g³ᵢ(x, ξ) = [ f(x + δ·Δ, ξ) − f(x − δ·Δ, ξ) ] / (2·δ·Δᵢ),

where Δ is a random vector whose components take the values 1 or −1
  with probabilities p = 0.5, and δ > 0 is some small value
(Spall 1992).
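The appeal of SPSA is that one random perturbation Δ yields all n gradient components from just two function evaluations. A sketch, again on an assumed quadratic toy integrand:

```python
import numpy as np

# SPSA estimator: a single Rademacher perturbation Delta gives all
# gradient components from two evaluations of f.

def spsa_gradient(f, x, xi, rng, delta=1e-3):
    Delta = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher vector
    diff = f(x + delta * Delta, xi) - f(x - delta * Delta, xi)
    return diff / (2.0 * delta * Delta)

def f(x, xi):
    # assumed toy integrand; true gradient is 2 * (x - xi)
    return np.sum((x - xi) ** 2)

rng = np.random.default_rng(3)
x = np.array([1.0, 2.0])
xi = np.array([0.0, 0.0])
print(spsa_gradient(f, x, xi, rng))
```

A single SPSA sample is noisy; only its expectation matches the gradient, so in practice the estimator is averaged over iterations or samples.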
Likelihood ratio (LR) approach

        F(x) = ∫_{Rⁿ} f(x + z) p(z) dz

        g⁴(x, ξ) = −( f(x + ξ) − f(x) ) · ∂ln p(ξ)/∂ξ
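For ξ ~ N(0, 1) we have ∂ln p(ξ)/∂ξ = −ξ, so g⁴ = (f(x + ξ) − f(x))·ξ; subtracting f(x) is a variance-reduction baseline, valid because E[∂ln p(ξ)/∂ξ] = 0. A sketch on an assumed toy problem f(x) = x², so F(x) = x² + 1 and F′(x) = 2x:

```python
import numpy as np

# LR estimator for F(x) = E f(x + xi), xi ~ N(0, 1):
# g4 = -(f(x + xi) - f(x)) * dln p(xi)/dxi = (f(x + xi) - f(x)) * xi.

rng = np.random.default_rng(4)

def f(x):
    # assumed toy function: F(x) = x**2 + 1, F'(x) = 2x
    return x**2

x = 1.0
xi = rng.normal(size=100_000)
g4 = (f(x + xi) - f(x)) * xi
print(g4.mean())  # estimates F'(x) = 2.0
```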
Stochastic differentiation of
integrals given by inclusion

Let us consider an integral over a set given by
inclusion:

        F(x) = ∫_{f(x,z) ∈ B} p(x, z) dz
Stochastic differentiation of
integrals given by inclusion

The gradient of this function is defined as

        G(x) = ∫_{f(x,z) ∈ B} [ ∂p(x, z)/∂x + q(x, z) ] dz,

where q(x, z) is defined through the
derivatives of p and f (see Uryasev (1994),
(2002)).
Simulation of stochastic gradient
     We assume here that a Monte-Carlo sample of a
     certain size N is provided for any x ∈ Rⁿ:

        Z = (z¹, z², ..., zᴺ),

where the zⁱ are independent random copies of ξ,

i.e., distributed according to the density p(x, ·).
Sampling estimators of the
objective function

The sampling estimator of the objective function:

        F̃(x) = (1/N) · Σᵢ₌₁ᴺ f(x, zⁱ)

and the sampling variance is computed as:

        D²(x) = (1/N) · Σᵢ₌₁ᴺ ( f(x, zⁱ) − F̃(x) )²
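These two estimators can be sketched directly; the toy integrand f and the sampling density below are assumed for illustration:

```python
import numpy as np

# Sample mean F_tilde(x) and sampling variance D2(x) from a
# Monte-Carlo sample z^1, ..., z^N.

rng = np.random.default_rng(5)

def f(x, z):
    # assumed toy integrand; E f(0, z) = E z**2 = 1 for z ~ N(0, 1)
    return (x - z) ** 2

x = 0.0
N = 50_000
z = rng.normal(size=N)                   # z^i ~ p(x, .) = N(0, 1) here

F_tilde = np.mean(f(x, z))               # sampling estimator of F(x)
D2 = np.mean((f(x, z) - F_tilde) ** 2)   # sampling variance (1/N form)
print(F_tilde, D2)
```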
Sampling estimator of the gradient

The gradient is evaluated using the same
random sample:

        G̃(x) = (1/N) · Σᵢ₌₁ᴺ g(x, zⁱ)
Sampling estimator of the gradient

The sampling covariance matrix

        A(x) = 1/(N − n) · Σᵢ₌₁ᴺ ( g(x, zⁱ) − G̃(x) ) ( g(x, zⁱ) − G̃(x) )ᵀ

is applied later on to normalise the gradient
estimator.
The Hotelling statistic, say, can be used for
testing whether the gradient is zero:

        T² = (N − n)/n · G̃(x)ᵀ A(x)⁻¹ G̃(x)
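The whole pipeline, gradient sample mean, covariance normalisation and the Hotelling statistic, can be sketched as follows; the stochastic gradient g below is an assumed toy example:

```python
import numpy as np

# Gradient sample mean, sampling covariance A(x), and Hotelling T^2
# for testing whether the gradient is zero.

rng = np.random.default_rng(6)

def g(x, z):
    # assumed toy stochastic gradient of f(x, z) = |x - z|^2
    return 2.0 * (x - z)

n, N = 2, 1000
x = np.array([0.05, -0.05])
Z = rng.normal(size=(N, n))              # Monte-Carlo sample

G = np.array([g(x, z) for z in Z])       # N x n sample of gradients
G_tilde = G.mean(axis=0)                 # gradient estimator
resid = G - G_tilde
A = resid.T @ resid / (N - n)            # sampling covariance matrix
T2 = (N - n) / n * G_tilde @ np.linalg.solve(A, G_tilde)
print(T2)                                # large T2 rejects "gradient = 0"
```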
Wrap-Up and conclusions
   The methods of nonlinear stochastic programming
    are built using the concept of stochastic gradient.

   Several methods exist to obtain the stochastic
    gradient; the objective function and the stochastic
    gradient can be evaluated on the same random sample.