Stochastic Differentiation


AACIMP 2010 Summer School lecture by Leonidas Sakalauskas. "Applied Mathematics" stream. "Stochastic Programming and Applications" course. Part 3.

  1. Lecture 3: Stochastic Differentiation. Leonidas Sakalauskas, Institute of Mathematics and Informatics, Vilnius, Lithuania. EURO Working Group on Continuous Optimization.
  2. Content: concept of the stochastic gradient; analytical differentiation of expectation; differentiation of the objective function of two-stage SLP; finite difference approach; simultaneous perturbation approximation; likelihood approach; differentiation of integrals given by inclusion; simulation of the stochastic gradient; projection of the stochastic gradient.
  3. Expected objective function. Stochastic programming deals with objective and/or constraint functions defined as expectations of a random function: $F(x) = \mathbf{E}\, f(x, \xi)$, where $f : \mathbb{R}^n \times \Omega \to \mathbb{R}$ and $\omega \in \Omega$ is an elementary event in the probability space $(\Omega, \Sigma, P_x)$; the measure $P_x$ is defined by the probability density function $p : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$.
  4. Concept of the stochastic gradient. The methods of nonlinear stochastic programming are built on the concept of the stochastic gradient. A stochastic gradient of the function $F(x)$ is a random vector $g(x, \xi)$ such that $\nabla_x F(x) = \mathbf{E}\, g(x, \xi)$.
  5. Methods of stochastic differentiation. Several estimators of the stochastic gradient are examined: analytical approach (AA); finite difference approach (FD); likelihood ratio approach (LR); simultaneous perturbation stochastic approximation (SPSA).
  6. Stochastic gradient: analytical approach. Writing the expectation as an integral, $F(x) = \int_{\mathbb{R}^n} f(x, z)\, p(x, z)\, dz$, differentiation under the integral sign gives $\nabla_x F(x) = \int_{\mathbb{R}^n} \left[ \nabla_x f(x, z) + f(x, z)\, \nabla_x \ln p(x, z) \right] p(x, z)\, dz$, so the analytical stochastic gradient is $g(x, \xi) = \nabla_x f(x, \xi) + f(x, \xi)\, \nabla_x \ln p(x, \xi)$.
  7. Analytical approach (AA). Assume the density of the random variable does not depend on the decision variable $x$. Then the analytical stochastic gradient coincides with the gradient of the random integrand: $g^1(x, \xi) = \nabla_x f(x, \xi)$.
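A minimal numerical sketch of the AA case above, with a made-up quadratic integrand (not from the slides): for $f(x, \xi) = \|x - \xi\|^2$ with $\xi$ distributed independently of $x$, the stochastic gradient is simply $g^1(x, \xi) = 2(x - \xi)$, and its sample mean recovers $\nabla F(x) = 2(x - \mathbf{E}\xi)$.

```python
import numpy as np

# Hypothetical test case: f(x, xi) = ||x - xi||^2, xi ~ N(mu, I) independent
# of x, so the analytical stochastic gradient is g(x, xi) = 2*(x - xi).
rng = np.random.default_rng(0)

def g_analytical(x, xi):
    return 2.0 * (x - xi)

x = np.array([1.0, -2.0])
mu = np.array([0.5, 0.5])
xi = rng.normal(mu, 1.0, size=(100_000, 2))   # Monte-Carlo sample of xi

# Averaging g(x, xi) over the sample estimates grad F(x) = 2*(x - mu)
g_hat = g_analytical(x, xi).mean(axis=0)
print(g_hat)
```

With $10^5$ draws the sample mean lands within a few thousandths of the exact gradient $2(x - \mu)$.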
  8. Analytical approach (AA). Consider the two-stage SLP: $F(x) = c^T x + \mathbf{E} \min_y q^T y$ subject to $W y \ge h - T x$, $y \in \mathbb{R}^m_+$, with first-stage constraints $A x \ge b$, $x \in X$; the vectors $q$, $h$ and the matrices $W$, $T$ can in general be random.
  9. Analytical approach (AA). The analytical stochastic gradient is $g^1(x, \xi) = c - T^T u^*$, where $u^*$ belongs to the set of solutions of the dual second-stage problem: $(h - T x)^T u^* = \max_u \left[ (h - T x)^T u \mid W^T u \le q,\; u \in \mathbb{R}^m_+ \right]$.
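A minimal sketch of this gradient computation, with entirely made-up data ($c$, $q$, $W$, $T$ fixed; only $h$ sampled): for one Monte-Carlo draw of $h$, solve the second-stage dual LP and form $g^1(x, h) = c - T^T u^*$.

```python
import numpy as np
from scipy.optimize import linprog

# Toy two-stage SLP data (assumed for illustration). For a sampled h,
# the stochastic gradient is g(x, h) = c - T^T u*, where u* solves the
# second-stage dual: max (h - Tx)^T u  s.t.  W^T u <= q, u >= 0.
rng = np.random.default_rng(1)

c = np.array([1.0, 2.0])
q = np.array([1.0, 1.0])
W = np.eye(2)
T = np.eye(2)
x = np.array([0.5, 0.5])

def stochastic_gradient(x, h):
    # linprog minimizes, so negate the dual objective (h - Tx)^T u
    res = linprog(c=-(h - T @ x), A_ub=W.T, b_ub=q,
                  bounds=[(0, None)] * 2, method="highs")
    u_star = res.x
    return c - T.T @ u_star

h = rng.normal(2.0, 1.0, size=2)    # one Monte-Carlo draw of h
print(stochastic_gradient(x, h))
```

Averaging `stochastic_gradient(x, h)` over many draws of $h$ then estimates $\nabla F(x)$.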
  10. Finite difference (FD) approach. Approximate the gradient of the random function by finite differences: the $i$-th component of the stochastic gradient $g^2(x, \xi)$ is computed as $g^2_i(x, \xi) = \frac{f(x + \delta e_i,\, \xi) - f(x, \xi)}{\delta}$, where $e_i$ is the vector with all components zero except the $i$-th, equal to 1, and $\delta > 0$ is some small value.
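The FD estimator can be sketched as follows; the quadratic `f` is an illustrative stand-in, not from the slides. Note that both evaluations use the same draw `xi`, which is what makes the difference quotient a stochastic gradient of $F$ rather than pure noise.

```python
import numpy as np

# Forward-difference stochastic gradient: component i uses
# (f(x + delta*e_i, xi) - f(x, xi)) / delta with a common draw xi.
def f(x, xi):
    return np.sum((x - xi) ** 2)          # toy integrand for illustration

def fd_gradient(f, x, xi, delta=1e-5):
    n = len(x)
    base = f(x, xi)
    g = np.empty(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = delta                      # perturb one coordinate at a time
        g[i] = (f(x + e, xi) - base) / delta
    return g

rng = np.random.default_rng(0)
x = np.array([1.0, -1.0])
xi = rng.normal(size=2)
print(fd_gradient(f, x, xi))              # approx 2*(x - xi) for this f
```

The cost is $n + 1$ evaluations of $f$ per draw, which motivates the simultaneous perturbation variant on the next slide.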
  11. Simultaneous perturbation stochastic approximation (SPSA). $g^3_i(x, \xi) = \frac{f(x + \delta \Delta,\, \xi) - f(x - \delta \Delta,\, \xi)}{2 \delta \Delta_i}$, where $\Delta$ is a random vector whose components take the values 1 or $-1$ with probabilities $p = 0.5$, and $\delta > 0$ is some small value (Spall, 1992).
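A sketch of the SPSA estimator with the same toy quadratic as before: all $n$ coordinates are perturbed at once by a random $\pm 1$ vector, so only two evaluations of $f$ are needed regardless of dimension; a single estimate is noisy, but its expectation is the gradient.

```python
import numpy as np

# Simultaneous perturbation (Spall 1992): two evaluations of f along a
# random +-1 direction Delta estimate the whole gradient at once.
def f(x, xi):
    return np.sum((x - xi) ** 2)          # toy integrand for illustration

def spsa_gradient(f, x, xi, delta=1e-3, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    Delta = rng.choice([-1.0, 1.0], size=len(x))   # Bernoulli +-1, p = 0.5
    diff = f(x + delta * Delta, xi) - f(x - delta * Delta, xi)
    return diff / (2.0 * delta) / Delta            # componentwise division

rng = np.random.default_rng(0)
x = np.array([1.0, -1.0])
xi = np.array([0.0, 0.0])
# Averaging many SPSA estimates recovers the gradient 2*(x - xi) of this f
g_hat = np.mean([spsa_gradient(f, x, xi, rng=rng) for _ in range(2000)], axis=0)
print(g_hat)
```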
  12. Likelihood ratio (LR) approach. For $F(x) = \int_{\mathbb{R}^n} f(x + z)\, p(z)\, dz$, a stochastic gradient is $g^4(x, \xi) = -\left( f(x + \xi) - f(x) \right) \nabla_\xi \ln p(\xi)$; subtracting the constant $f(x)$ does not bias the estimator but reduces its variance.
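A minimal sketch of the LR estimator, assuming $\xi \sim N(0, I)$ (a choice made here for illustration): then $\nabla \ln p(z) = -z$, so $g^4(x, \xi) = (f(x + \xi) - f(x))\, \xi$, and no derivative of $f$ is ever evaluated.

```python
import numpy as np

# LR stochastic gradient for F(x) = E f(x + xi), xi ~ N(0, I):
# g(x, xi) = -(f(x + xi) - f(x)) * grad log p(xi) = (f(x + xi) - f(x)) * xi.
rng = np.random.default_rng(0)

def f(x):
    return np.sum(x ** 2)               # illustrative smooth test function

def lr_gradient(f, x, xi):
    return (f(x + xi) - f(x)) * xi      # uses grad log p(z) = -z for N(0, I)

x = np.array([1.0, -1.0])
xi = rng.normal(size=(200_000, 2))

# Vectorized average of lr_gradient over the whole sample
diffs = np.sum((x + xi) ** 2, axis=1) - f(x)
g_hat = (diffs[:, None] * xi).mean(axis=0)
print(g_hat)                             # estimates grad F(x) = 2*x for this f
```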
  13. Stochastic differentiation of integrals given by inclusion. Consider an integral over a set given by an inclusion: $F(x) = \int_{f(x, z) \in B} p(x, z)\, dz$.
  14. Stochastic differentiation of integrals given by inclusion. The gradient of this function is $G(x) = \nabla_x F(x) = \int_{f(x, z) \in B} q(x, z)\, dz$, where $q(x, z)$ is defined through the derivatives of $p$ and $f$ (see Uryasev (1994), (2002)).
  15. Simulation of the stochastic gradient. We assume here that for any $x \in \mathbb{R}^n$ a Monte-Carlo sample of a certain size $N$ is provided: $Z = (z^1, z^2, \ldots, z^N)$, where the $z^i$ are independent random copies of $\xi$, i.e., distributed according to the density $p(x, z)$.
  16. Sampling estimators of the objective function. The sampling estimator of the objective function is $\tilde{F}(x) = \frac{1}{N} \sum_{i=1}^{N} f(x, z^i)$, and the sampling variance is computed as $D^2(x) = \frac{1}{N} \sum_{i=1}^{N} \left( f(x, z^i) - \tilde{F}(x) \right)^2$.
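These two estimators can be sketched directly; the integrand `f` and the sampling distribution are made-up stand-ins for whatever the model supplies.

```python
import numpy as np

# Monte-Carlo estimators of the objective and its sampling variance:
# F~(x) = mean of f(x, z^i), D^2(x) = (1/N) * sum of squared deviations.
rng = np.random.default_rng(0)

def f(x, z):
    return np.sum((x - z) ** 2, axis=-1)   # toy integrand for illustration

x = np.array([1.0, 0.0])
Z = rng.normal(0.0, 1.0, size=(10_000, 2)) # z^1, ..., z^N i.i.d. draws

vals = f(x, Z)
F_hat = vals.mean()
D2 = vals.var(ddof=0)                      # ddof=0 matches the 1/N formula
print(F_hat, D2)
```

For this toy case $F(x) = \|x\|^2 + 2 = 3$, and the estimate falls within a few hundredths of it at $N = 10^4$.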
  17. Sampling estimator of the gradient. The gradient is evaluated using the same random sample: $\tilde{G}(x) = \frac{1}{N} \sum_{i=1}^{N} g(x, z^i)$.
  18. Sampling estimator of the gradient. The sampling covariance matrix $A(x) = \frac{1}{N - n} \sum_{i=1}^{N} \left( g(x, z^i) - \tilde{G}(x) \right) \left( g(x, z^i) - \tilde{G}(x) \right)^T$ is applied later on for normalising the gradient estimator. For instance, the Hotelling statistic $T^2 = \frac{N - n}{n}\, \tilde{G}(x)^T A(x)^{-1} \tilde{G}(x)$ can be used for testing the zero value of the gradient.
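The whole pipeline (gradient sample, covariance, Hotelling-type test) fits in a few lines; this is a sketch with the toy analytical gradient from earlier, and the normalisation constants follow the reconstructed formulas above.

```python
import numpy as np

# Sampling gradient estimator with a zero-gradient test:
# G~(x) = sample mean, A(x) = covariance with 1/(N - n) normalisation,
# T^2 = (N - n)/n * G~^T A^{-1} G~  (large T^2 -> reject "gradient is zero").
rng = np.random.default_rng(0)
N, n = 1000, 2

def g(x, xi):                          # analytical gradient of f = ||x - xi||^2
    return 2.0 * (x - xi)

x = np.array([0.3, -0.3])
xi = rng.normal(0.0, 1.0, size=(N, n))
G = g(x, xi)                           # N x n matrix of gradient draws

G_bar = G.mean(axis=0)                 # sampling estimator of grad F(x)
R = G - G_bar
A = R.T @ R / (N - n)                  # sampling covariance matrix
T2 = (N - n) * G_bar @ np.linalg.solve(A, G_bar) / n
print(G_bar, T2)
```

Here the true gradient $2x$ is nonzero, so $T^2$ comes out far above any reasonable critical value; at a stationary point it would be of order one.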
  19. Wrap-up and conclusions. The methods of nonlinear stochastic programming are built using the concept of the stochastic gradient. Several methods exist for obtaining the stochastic gradient, evaluating the objective function and the stochastic gradient from the same random sample.