Lecture 2


Basics of probability in statistical
simulation and stochastic programming

Leonidas Sakalauskas
Institute of Mathematics and Informatics
Vilnius, Lithuania <sakal@ktl.mii.lt>

EURO Working Group on Continuous Optimization
Content

o Random variables and random functions
o Law of Large Numbers
o Central Limit Theorem
o Computer simulation of random numbers
o Estimation of multivariate integrals by the Monte-Carlo method
Simple remark

o Probability theory provides a library of mathematical probabilistic models.
o Statistics gives us a manual for choosing the probabilistic model consistent with the collected data.
o Statistical simulation (the Monte-Carlo method) shows us how to simulate a random environment on a computer.
Random variable

A random variable (r.v.) $X$ is described by

o its set of support SUPP(X)
o its probability measure

The probability measure is described by the distribution function:

$$F(x) = \mathrm{Prob}(X \le x)$$
Probabilistic measure

A probabilistic measure has three components:

o continuous;
o discrete (integer);
o singular.
Continuous r.v.

A continuous r.v. is described by its probability density function $f(x)$.

Thus:

$$F(x) = \int_{-\infty}^{x} f(y)\, dy$$
Continuous variable
If the probability measure is absolutely continuous with density $p(x)$, the expected value of a random function $f(X)$ is the integral:

$$E f(X) = \int f(x)\, p(x)\, dx$$
Discrete variable
A discrete r.v. is described by its values and their probability masses:

values: $x_1, x_2, \ldots, x_n$
probabilities: $p_1, p_2, \ldots, p_n$
Discrete variable
If the probability measure is discrete, the expected value of a random function is a sum or series:

$$E f(X) = \sum_{i=1}^{n} f(x_i)\, p_i$$
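
The discrete expectation formula can be checked by hand on a small case. The sketch below assumes a fair six-sided die as the example distribution; the helper name `expectation` is hypothetical and not part of the lecture.

```python
def expectation(values, probs, f=lambda x: x):
    """Return E f(X) = sum_i f(x_i) * p_i for a discrete r.v."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    return sum(f(x) * p for x, p in zip(values, probs))

# Assumed example: a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mean = expectation(faces, probs)                             # E X
second_moment = expectation(faces, probs, lambda x: x * x)   # E X^2
variance = second_moment - mean ** 2                         # D^2 X
print(mean, second_moment, variance)
```

Passing a different `f` gives the expected value of any function of the same r.v. without changing the summation.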
Singular variable


The probabilistic measure of a singular r.v. is concentrated on a set of zero Lebesgue measure (say, the Cantor set).
Law of Large Numbers (Chebyshev, Kolmogorov)

$$\lim_{N \to \infty} \frac{\sum_{i=1}^{N} z_i}{N} = \bar{z},$$

here $z_1, z_2, \ldots, z_N$ are independent copies of the r.v. $\zeta$, and

$$\bar{z} = E\zeta$$
What did we learn?

The integral

$$\int f(x, z)\, p(z)\, dz$$

is approximated by the sampling average

$$\frac{\sum_{i=1}^{N} \eta_i}{N}$$

if the sample size $N$ is large; here $\eta_j = f(x, z_j)$, $j = 1, \ldots, N$, and $z_1, z_2, \ldots, z_N$ is a sample of independent copies of the r.v. $\zeta$, distributed with the density $p(z)$.
Central limit theorem (Gauss, Lindeberg, ...)



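$$\lim_{N \to \infty} P\left( \frac{\bar{x}_N - \mu}{\sigma / \sqrt{N}} \le x \right) = \Phi(x),$$

here

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy,$$

$$\bar{x}_N = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad \mu = EX, \qquad \sigma^2 = D^2 X = E(X - \mu)^2$$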
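
The CLT statement can be illustrated numerically. The following is a minimal sketch, assuming an exponential r.v. with rate 1 (so $\mu = 1$ and $\sigma = 1$) as the test distribution; the sample size, number of trials, and helper names are illustrative choices, not taken from the lecture.

```python
import math
import random

def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normalized_mean(rng, n, mu=1.0, sigma=1.0):
    """Draw n exponential(1) values and return (mean - mu) / (sigma / sqrt(n))."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    return (sum(xs) / n - mu) / (sigma / math.sqrt(n))

rng = random.Random(0)          # fixed seed for reproducibility
N, trials = 50, 10000
samples = [normalized_mean(rng, N) for _ in range(trials)]

# Compare the empirical frequency of {normalized mean <= x} with Phi(x).
gaps = {}
for x in (-1.0, 0.0, 1.0):
    empirical = sum(s <= x for s in samples) / trials
    gaps[x] = abs(empirical - normal_cdf(x))
    print(f"x={x:+.1f}  empirical={empirical:.3f}  Phi={normal_cdf(x):.3f}")
```

The remaining gap at moderate $N$ reflects the skewness of the exponential distribution, and it shrinks as $N$ grows.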
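Berry-Esseen theorem

$$\sup_x \left| F_N(x) - \Phi(x) \right| \le 0.41\, \frac{E|X - EX|^3}{\sigma^3 \sqrt{N}}$$

where $F_N(x) = \mathrm{Prob}\left( \frac{\bar{x}_N - \mu}{\sigma / \sqrt{N}} \le x \right)$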
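What did we learn?

According to the LLN:

$$\bar{x}_N = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \sigma^2 \approx \frac{\sum_{i=1}^{N} (x_i - \bar{x}_N)^2}{N}, \qquad E|X - EX|^3 \approx \frac{\sum_{i=1}^{N} |x_i - \bar{x}_N|^3}{N}$$

Thus, we can apply the CLT to evaluate the statistical error of the approximation and its validity.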
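
These sample estimates can be computed in a few lines. The sketch below assumes a uniform sample on (0,1) as test data and uses 1.96 as the normal quantile for the error estimate; the function name `mc_estimate` is hypothetical.

```python
import math
import random

def mc_estimate(sample):
    """Return sample mean, variance, third absolute central moment,
    and the CLT-based 95% half-width of the error of the mean."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n       # divides by N, as in the formula above
    abs3 = sum(abs(x - mean) ** 3 for x in sample) / n   # estimate of E|X - EX|^3
    half_width = 1.96 * math.sqrt(var / n)
    return mean, var, abs3, half_width

rng = random.Random(1)
sample = [rng.random() for _ in range(10000)]   # assumed test data: EX = 1/2, D^2 X = 1/12
mean, var, abs3, half_width = mc_estimate(sample)
print(f"mean = {mean:.4f} +/- {half_width:.4f}, var = {var:.4f}, E|X-EX|^3 = {abs3:.4f}")
```

The half-width shrinks like $1/\sqrt{N}$, so halving the statistical error requires roughly four times as many samples.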
Example

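Let some event occur $n$ times in $N$ independent repetitions of an experiment.
Then the confidence interval for the probability of the event is:

$$\left[\, p - 1.96 \sqrt{\frac{p(1-p)}{N}},\; p + 1.96 \sqrt{\frac{p(1-p)}{N}} \,\right]$$

here $p = \frac{n}{N}$ (1.96 is the 0.975 quantile of the normal distribution; the significance level is 5%, i.e. the confidence level is 95%).

This holds provided the Berry-Esseen condition is valid: $N p (1-p) > 6$ !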
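
As a worked instance of this interval, the sketch below assumes 30 occurrences in 100 experiments. The helper `proportion_ci` is hypothetical, and the validity check reads the condition above as $N p(1-p) > 6$, which is an assumption about the comparison sign.

```python
import math

def proportion_ci(n_events, n_trials, z=1.96):
    """Normal-approximation confidence interval for an event probability.

    Returns (lower, upper, normal_ok), where normal_ok flags the
    rule of thumb N * p * (1 - p) > 6 (assumed reading of the condition).
    """
    p = n_events / n_trials
    half = z * math.sqrt(p * (1 - p) / n_trials)
    normal_ok = n_trials * p * (1 - p) > 6
    return p - half, p + half, normal_ok

low, high, ok = proportion_ci(30, 100)   # assumed data: n = 30, N = 100
print(f"p in [{low:.4f}, {high:.4f}], normal approximation ok: {ok}")
```

For rare events the check fails quickly: with n = 2 and N = 100 the product is about 1.96, and the interval should not be trusted.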
Statistical integration …

$$I = \int_a^b f(x)\, dx = \; ???$$

Main idea: play out (simulate) a large number of random events.
Statistical integration


$$E f(X) = \int f(x)\, p(x)\, dx \approx \frac{\sum_{i=1}^{N} f(x_i)}{N}, \qquad x_i \sim p(\cdot)$$
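
For a definite integral over $[a, b]$, one possible choice is to take $p$ as the uniform density on $[a, b]$, so that $\int_a^b f(x)\,dx = (b-a)\, E f(X)$. The sketch below makes that assumption and uses $\int_0^{\pi} \sin x\, dx = 2$ as a test case; `mc_integral` is a hypothetical name.

```python
import math
import random

def mc_integral(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] with n uniform samples.

    Returns the estimate and its standard error (from the sample variance).
    """
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    estimate = (b - a) * mean
    std_error = (b - a) * math.sqrt(var / n)
    return estimate, std_error

rng = random.Random(2)
estimate, std_error = mc_integral(math.sin, 0.0, math.pi, 100000, rng)
print(f"I ~ {estimate:.4f} +/- {1.96 * std_error:.4f}  (exact value: 2)")
```

The same routine works unchanged in higher dimensions if the sampler draws points from a box, which is where the method becomes competitive with quadrature rules.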
Statistical simulation and Monte-Carlo method

$$F(x) = \int f(x, z)\, p(z)\, dz \to \min_x$$

$$\frac{\sum_{i=1}^{N} f(x, z_i)}{N} \to \min_x, \qquad z_i \sim p(\cdot)$$

(Sample Average Approximation (SAA), Shapiro (1985), etc.)
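
The sample-average idea can be sketched on a toy problem. The example below assumes the loss $f(x, z) = (x - z)^2$ with $z$ normally distributed with mean 3 and unit variance, so the true minimizer is $x = 3$; the golden-section search is just one simple way to minimize a one-dimensional unimodal function and is not prescribed by the lecture. All names are hypothetical.

```python
import random

def saa_objective(x, zs, f):
    """Sample average (1/N) * sum_i f(x, z_i) replacing the integral F(x)."""
    return sum(f(x, z) for z in zs) / len(zs)

def golden_section_min(g, lo, hi, tol=1e-5):
    """Minimize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if g(c) < g(d):
            hi = d      # the minimum lies in [lo, d]
        else:
            lo = c      # the minimum lies in [c, hi]
    return (lo + hi) / 2

def loss(x, z):
    """Assumed toy loss; its expectation is minimized at x = E z."""
    return (x - z) ** 2

rng = random.Random(3)
zs = [rng.gauss(3.0, 1.0) for _ in range(2000)]   # fixed sample z_1, ..., z_N
x_saa = golden_section_min(lambda x: saa_objective(x, zs, loss), -10.0, 10.0)
print(f"SAA solution: {x_saa:.4f} (true minimizer: 3)")
```

The sample is drawn once and held fixed during the search, so the approximate objective is an ordinary deterministic function of $x$.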
Simulation of random variables

There are many techniques and methods for simulating r.v.
Let the r.v. $\xi$ be uniformly distributed in the interval (0,1].

Then the random variable $U$, where

$$F(U) = \xi,$$

is distributed with the cumulative distribution function $F(\cdot)$.
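
This inversion recipe can be tried on a distribution whose $F$ inverts in closed form. The sketch below assumes the exponential distribution, $F(u) = 1 - e^{-\lambda u}$, so that solving $F(U) = \xi$ gives $U = -\ln(1 - \xi)/\lambda$. Python's `random.random()` returns values in [0, 1) rather than (0, 1], which keeps the logarithm finite; the function name is hypothetical.

```python
import math
import random

def sample_exponential(rate, rng):
    """Inverse-transform sample: solve F(U) = xi for F(u) = 1 - exp(-rate * u)."""
    xi = rng.random()                    # uniform on [0, 1)
    return -math.log(1.0 - xi) / rate    # 1 - xi lies in (0, 1], so log is defined

rng = random.Random(4)
rate = 2.0
draws = [sample_exponential(rate, rng) for _ in range(50000)]

sample_mean = sum(draws) / len(draws)                     # should be near 1/rate
frac_below = sum(d <= 0.5 for d in draws) / len(draws)    # should be near F(0.5)
print(f"mean = {sample_mean:.4f}, "
      f"P(U <= 0.5) = {frac_below:.4f} vs F(0.5) = {1 - math.exp(-rate * 0.5):.4f}")
```

When $F$ has no closed-form inverse, the equation $F(U) = \xi$ can instead be solved numerically for each draw.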
Example:

$$F(a) = \int_0^{\infty} x \cos(x \sin(a x))\, e^{-x}\, dx$$

$$f(x) = x \cos(x \sin(a x)), \qquad N = 100,\ 1000$$
Wrap-Up and conclusions

o the expectations of random functions,
defined by the multivariate integrals, can be
approximated by sampling averages
according to the LLN, if the sample size is
sufficiently large;

o the CLT can be applied to evaluate the
reliability and statistical error of this
approximation
