AACIMP 2010 Summer School lecture by Leonidas Sakalauskas. "Applied Mathematics" stream. "Stochastic Programming and Applications" course. Part 3.
More info at http://summerschool.ssa.org.ua
1. Lecture 3
Stochastic Differentiation
Leonidas Sakalauskas
Institute of Mathematics and Informatics
Vilnius, Lithuania <sakal@ktl.mii.lt>
EURO Working Group on Continuous Optimization
2. Content
Concept of stochastic gradient
Analytical differentiation of expectation
Differentiation of the objective function of two-stage SLP
Finite difference approach
Simultaneous perturbation stochastic approximation
Likelihood approach
Differentiation of integrals given by inclusion
Simulation of stochastic gradient
Projection of Stochastic Gradient
3. Expected objective function
Stochastic programming deals with objective and/or constraint functions defined as the expectation of a random function:

F(x) = E f(x, ξ),   f : Rⁿ × Ω → R,

where ξ is an elementary event in the probability space (Ω, Σ, Pₓ), and Pₓ is the measure defined by the probability density function p : Rⁿ × Rⁿ → R.
4. Concept of stochastic gradient
The methods of nonlinear stochastic
programming are built using the concept of
stochastic gradient.
The stochastic gradient of the function F(x) is a random vector g(x, ξ) such that:

E g(x, ξ) = ∂F(x)/∂x
5. Methods of stochastic differentiation
Several estimators have been examined for the stochastic gradient:
Analytical approach (AA);
Finite difference approach (FD);
Likelihood ratio approach (LR);
Simultaneous perturbation stochastic approximation (SPSA).
6. Stochastic gradient:
an analytical approach
F(x) = ∫_{Rⁿ} f(x, z) p(x, z) dz

∂F(x)/∂x = ∫_{Rⁿ} [ ∇ₓ f(x, z) + f(x, z) ∇ₓ ln p(x, z) ] p(x, z) dz

g(x, ξ) = ∇ₓ f(x, ξ) + f(x, ξ) ∇ₓ ln p(x, ξ)
7. Analytical approach (AA)
Assume that the density of the random variable does not depend on the decision variable x. Then the analytical stochastic gradient coincides with the gradient of the random integrand:

g¹(x, ξ) = ∂f(x, ξ)/∂x
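As a minimal sketch of the analytical approach, assume the illustrative integrand f(x, ξ) = (x − ξ)² with ξ ~ N(0, 1) (not part of the lecture); then F(x) = x² + 1, the true gradient is 2x, and averaging g¹(x, ξ) = 2(x − ξ) over independent copies of ξ recovers it:

```python
import random

def f_grad(x, xi):
    # Analytical gradient of the integrand f(x, xi) = (x - xi)^2
    # with respect to x: the AA stochastic gradient g1(x, xi).
    return 2.0 * (x - xi)

def aa_estimate(x, n_samples=100_000, seed=0):
    # Average g1 over independent copies of xi ~ N(0, 1);
    # by unbiasedness this approximates dF/dx = 2x,
    # since F(x) = E (x - xi)^2 = x^2 + 1.
    rng = random.Random(seed)
    return sum(f_grad(x, rng.gauss(0.0, 1.0)) for _ in range(n_samples)) / n_samples

est = aa_estimate(1.5)   # true gradient at x = 1.5 is 3.0
```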
8. Analytical approach (AA)
Consider the two-stage SLP:

F(x) = cᵀx + E min_y qᵀy → min,
W y + T x ≥ h,   y ∈ Rᵐ₊,
A x ≥ b,   x ∈ X,

where the vectors q, h and the matrices W, T can be random in general.
9. Analytical approach (AA)
The stochastic analytical gradient is defined as

g¹(x, ξ) = c − Tᵀu*,

where u* belongs to the set of solutions of the dual problem

(h − T x)ᵀu* = max_u { (h − T x)ᵀu | Wᵀu − q ≤ 0, u ∈ Rᵐ₊ }
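A one-dimensional sketch with scalar stand-ins for q, W, T, h (all values here are illustrative assumptions): the second stage min{q·y : w·y ≥ h − t·x, y ≥ 0} has the dual max{(h − t·x)·u : w·u ≤ q, u ≥ 0}, whose maximiser is available in closed form for w > 0, q ≥ 0, and the stochastic gradient is then c − t·u*:

```python
def dual_solution(q, w, t, h, x):
    # Dual of the 1-D second stage min { q*y : w*y >= h - t*x, y >= 0 }:
    # max { (h - t*x)*u : w*u <= q, u >= 0 }.
    # With w > 0 and q >= 0, the maximiser is q/w when the recourse
    # constraint is active (h - t*x > 0), and 0 otherwise.
    return q / w if h - t * x > 0 else 0.0

def stochastic_gradient(c, t, u_star):
    # g1(x, xi) = c - T^T u* for the two-stage SLP (scalars here).
    return c - t * u_star

u = dual_solution(q=2.0, w=1.0, t=1.0, h=3.0, x=1.0)  # constraint active
g = stochastic_gradient(c=1.0, t=1.0, u_star=u)
```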
10. Finite difference (FD) approach
Let us approximate the gradient of the random function by finite differences. Then the i-th component of the stochastic gradient g²(x, ξ) is computed as

g²ᵢ(x, ξ) = ( f(x + δeⁱ, ξ) − f(x, ξ) ) / δ,

where eⁱ is the vector with all components zero except the i-th, which equals 1, and δ is some small value.
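A minimal sketch of the FD estimator, using an assumed test integrand f(x, ξ) = (x₁ − ξ)² + 2x₂² with ξ ~ N(0, 1), so the true gradient at x = (1, 0.5) is (2, 2); the function and sample size are illustrative, not from the lecture:

```python
import random

def f(x, xi):
    # Illustrative random integrand: F(x) = E f(x, xi), xi ~ N(0, 1).
    return (x[0] - xi) ** 2 + 2.0 * x[1] ** 2

def fd_gradient(f, x, xi, delta=1e-5):
    # g2_i(x, xi) = (f(x + delta*e_i, xi) - f(x, xi)) / delta,
    # one extra function evaluation per coordinate.
    base = f(x, xi)
    grad = []
    for i in range(len(x)):
        x_shift = list(x)
        x_shift[i] += delta
        grad.append((f(x_shift, xi) - base) / delta)
    return grad

rng = random.Random(1)
N = 50_000
acc = [0.0, 0.0]
for _ in range(N):
    gi = fd_gradient(f, [1.0, 0.5], rng.gauss(0.0, 1.0))
    acc = [a + b for a, b in zip(acc, gi)]
g_bar = [a / N for a in acc]   # approximates the true gradient (2, 2)
```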
11. Simultaneous perturbation stochastic approximation (SPSA)

g³ᵢ(x, ξ) = ( f(x + δΔ, ξ) − f(x − δΔ, ξ) ) / (2δΔᵢ),

where Δ is a random vector whose components take the values 1 or −1 with probabilities p = 0.5, and δ is some small value (Spall 1992).
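A sketch of the SPSA estimator on the same kind of assumed test integrand as above (f, x, and the sample size are illustrative assumptions); note that only two function evaluations are needed per estimate, regardless of the dimension of x:

```python
import random

def f(x, omega):
    # Illustrative random integrand: F(x) = E f(x, omega), omega ~ N(0, 1),
    # with true gradient (2*x1, 4*x2) after taking the expectation.
    return (x[0] - omega) ** 2 + 2.0 * x[1] ** 2

def spsa_gradient(f, x, omega, rng, delta=1e-3):
    # Rademacher perturbation: each Delta_i equals +1 or -1 with prob. 0.5.
    d = [rng.choice((-1.0, 1.0)) for _ in x]
    xp = [v + delta * di for v, di in zip(x, d)]
    xm = [v - delta * di for v, di in zip(x, d)]
    diff = f(xp, omega) - f(xm, omega)
    # g3_i = (f(x + delta*Delta) - f(x - delta*Delta)) / (2*delta*Delta_i).
    return [diff / (2.0 * delta * di) for di in d]

rng = random.Random(2)
N = 50_000
x0 = [1.0, 0.5]
acc = [0.0, 0.0]
for _ in range(N):
    g = spsa_gradient(f, x0, rng.gauss(0.0, 1.0), rng)
    acc = [a + b for a, b in zip(acc, g)]
g_bar = [a / N for a in acc]   # approximates the true gradient (2, 2)
```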
12. Likelihood ratio (LR) approach
F(x) = ∫_{Rⁿ} f(x + z) p(z) dz

g⁴(x, ξ) = −( f(x + ξ) − f(x) ) ∂ln p(ξ)/∂ξ
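A sketch of the LR estimator under the assumption ξ ~ N(0, σ²), for which the score is ∂ln p(ξ)/∂ξ = −ξ/σ²; the baseline term f(x) does not change the expectation (the score has zero mean) but lowers the variance. The integrand f(x) = x² is an illustrative choice, so F(x) = x² + σ² and the true gradient at x = 1 is 2:

```python
import random

def f(x):
    # Deterministic integrand; randomness enters through additive noise:
    # F(x) = E f(x + xi) with xi ~ N(0, sigma^2).
    return x ** 2

def lr_gradient(f, x, xi, sigma=1.0):
    # Gaussian score: d ln p(xi)/d xi = -xi / sigma^2, hence
    # g4(x, xi) = -(f(x + xi) - f(x)) * (-xi / sigma^2)
    #           = (f(x + xi) - f(x)) * xi / sigma^2.
    # Subtracting the baseline f(x) leaves the mean unchanged
    # because the expectation of the score is zero.
    return (f(x + xi) - f(x)) * xi / sigma ** 2

rng = random.Random(3)
N = 200_000
est = sum(lr_gradient(f, 1.0, rng.gauss(0.0, 1.0)) for _ in range(N)) / N
# F(x) = x^2 + 1, so the true gradient at x = 1 is 2.0.
```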
14. Stochastic differentiation of
integrals given by inclusion
For an integral over a set given by inclusion,

F(x) = ∫_{f(x,z)∈B} p(x, z) dz,

the gradient is defined as

G(x) = ∫_{f(x,z)∈B} q(x, z) dz,

where q(x, z) is defined through the derivatives ∂p(x, z)/∂x of p and of f (see Uryasev (1994), (2002)).
15. Simulation of stochastic gradient
We assume here that a Monte-Carlo sample of a certain size N is provided for any x ∈ Rⁿ:

Z = (z¹, z², ..., zᴺ),

where the zⁱ are independent random copies of ξ, i.e., distributed according to the density p(x, z).
16. Sampling estimators of the
objective function
The sampling estimator of the objective function

F̃(x) = (1/N) Σᵢ₌₁ᴺ f(x, zⁱ)

and the sampling variance

D²(x) = (1/N) Σᵢ₌₁ᴺ ( f(x, zⁱ) − F̃(x) )²

are computed from the sample.
17. Sampling estimator of the gradient
The gradient is evaluated using the same
random sample:
G̃(x) = (1/N) Σᵢ₌₁ᴺ g(x, zⁱ)
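The three sampling estimators can be computed from one Monte-Carlo sample, as the slide emphasises. A sketch with an assumed integrand f(x, z) = (x − z)² and its analytical stochastic gradient g(x, z) = 2(x − z), z ~ N(0, 1) (both illustrative):

```python
import random

def f(x, z):
    # Random integrand: F(x) = E (x - z)^2 = x^2 + 1 for z ~ N(0, 1).
    return (x - z) ** 2

def g(x, z):
    # Its analytical stochastic gradient.
    return 2.0 * (x - z)

def sample_estimators(x, N=20_000, seed=4):
    rng = random.Random(seed)
    Z = [rng.gauss(0.0, 1.0) for _ in range(N)]        # one sample Z = (z1..zN)
    F_vals = [f(x, z) for z in Z]
    F_tilde = sum(F_vals) / N                          # objective estimator
    D2 = sum((v - F_tilde) ** 2 for v in F_vals) / N   # sampling variance
    G_tilde = sum(g(x, z) for z in Z) / N              # gradient, same sample
    return F_tilde, D2, G_tilde

F_tilde, D2, G_tilde = sample_estimators(1.0)
# F(1) = 2 and dF/dx(1) = 2; D2 estimates Var f(1, z).
```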
18. Sampling estimator of the gradient
The sampling covariance matrix

A(x) = (1/(N − n)) Σᵢ₌₁ᴺ ( g(x, zⁱ) − G̃(x) )( g(x, zⁱ) − G̃(x) )ᵀ

is applied later on for normalising the gradient estimator. For instance, the Hotelling statistic can be used for testing whether the gradient is zero:

T² = (N − n) G̃(x)ᵀ A(x)⁻¹ G̃(x) / n
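A one-dimensional sketch of the Hotelling test (n = 1, so A(x) is a scalar and the statistic reduces to a squared t-statistic); the integrand and sample size are illustrative assumptions. At a stationary point the statistic stays small, away from it the statistic is large:

```python
import random

def g(x, z):
    # Analytical stochastic gradient of f(x, z) = (x - z)^2, one dimension.
    return 2.0 * (x - z)

def hotelling_T2(x, N=1_000, seed=5):
    rng = random.Random(5 if seed is None else seed)
    gs = [g(x, rng.gauss(0.0, 1.0)) for _ in range(N)]
    G = sum(gs) / N                                  # gradient estimator
    A = sum((v - G) ** 2 for v in gs) / (N - 1)      # covariance (scalar, n = 1)
    # T^2 = (N - n) * G^T A^{-1} G / n with n = 1.
    return (N - 1) * G * G / A

t2_stationary = hotelling_T2(0.0)   # true gradient is 0: small statistic
t2_away = hotelling_T2(1.0)         # true gradient is 2: large statistic
```

Comparing T² with a quantile of the corresponding Fisher distribution then gives a statistical stopping rule for gradient-based methods.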
19. Wrap-Up and conclusions
The methods of nonlinear stochastic programming are built using the concept of the stochastic gradient.
Several methods exist to obtain the stochastic gradient; the objective function and the stochastic gradient can be evaluated using the same random sample.