Probability Theory


Convergence of Random Variables
              Phong VO
      vdphong@fit.hcmus.edu.vn

          September 11, 2010




           – Typeset by FoilTEX –
Markov and Chebyshev Inequalities


Theorem 1. (Markov’s Inequality). If X is a r.v that takes only
nonnegative values, then for any value a > 0,

\[ P(X \ge a) \le \frac{E(X)}{a} \]




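Proof 1. We give a proof for the case where X is continuous with density
f:

\begin{align*}
E(X) &= \int_0^\infty x f(x)\,dx \\
     &= \int_0^a x f(x)\,dx + \int_a^\infty x f(x)\,dx \\
     &\ge \int_a^\infty x f(x)\,dx \\
     &\ge \int_a^\infty a f(x)\,dx \\
     &= a \int_a^\infty f(x)\,dx \\
     &= a P(X \ge a)
\end{align*}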
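
The inequality can be checked numerically. The sketch below is only an illustration: the Exponential distribution, the sample size, and the helper name `markov_check` are assumptions made here, not part of the slides. It compares the observed fraction of samples with X ≥ a against the sample mean divided by a.

```python
import random

def markov_check(a, n_samples=100_000, rate=1.0, seed=0):
    """Return (empirical P(X >= a), empirical E(X)/a) for X ~ Exponential(rate).

    The exponential distribution is an arbitrary nonnegative example.
    """
    rng = random.Random(seed)
    xs = [rng.expovariate(rate) for _ in range(n_samples)]
    tail = sum(x >= a for x in xs) / n_samples   # fraction of samples with X >= a
    bound = (sum(xs) / n_samples) / a            # sample mean divided by a
    return tail, bound

for a in (0.5, 1.0, 2.0, 4.0):
    tail, bound = markov_check(a)
    print(f"a={a}: P(X >= a) ~ {tail:.4f} <= E(X)/a ~ {bound:.4f}")
```

For small a the bound exceeds 1 and says nothing; it becomes informative only when a is large relative to E(X).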

Theorem 2. (Chebyshev’s Inequality). If X is a r.v with mean µ and
variance $\sigma^2$, then, for any value k > 0,

\[ P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2} \]

Proof 2. Since $(X - \mu)^2$ is a nonnegative random variable, we can apply
Markov’s inequality with $a = k^2$ to obtain

\[ P\big((X - \mu)^2 \ge k^2\big) \le \frac{E[(X - \mu)^2]}{k^2} \]

   But since $(X - \mu)^2 \ge k^2$ if and only if $|X - \mu| \ge k$, the preceding is
equivalent to

\[ P(|X - \mu| \ge k) \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2} \]

and the proof is complete.
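
A quick numerical check of Chebyshev's inequality, as a sketch under assumptions: the standard Normal samples and the helper name `chebyshev_check` are chosen here for illustration only. The observed fraction of samples at least k away from the sample mean is compared with the sample variance divided by k².

```python
import random

def chebyshev_check(k, n_samples=100_000, seed=0):
    """Return (empirical P(|X - mu| >= k), empirical sigma^2 / k^2) for Normal(0, 1) samples."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mu = sum(xs) / n_samples                          # sample mean
    var = sum((x - mu) ** 2 for x in xs) / n_samples  # sample (population-style) variance
    tail = sum(abs(x - mu) >= k for x in xs) / n_samples
    return tail, var / k ** 2

for k in (1.0, 2.0, 3.0):
    tail, bound = chebyshev_check(k)
    print(f"k={k}: P(|X - mu| >= k) ~ {tail:.4f} <= sigma^2/k^2 ~ {bound:.4f}")
```

The bound is usually far from tight, which is expected: it uses only the mean and variance and must hold for every distribution that has them.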




Motivation



• Since statistics and data mining are all about gathering data, we are
  naturally interested in what happens as we gather more and more data.

• This is a question about the behavior of sequences of random variables.




The Weak Law of Large Numbers (WLLN)



• This is one of the most important theorems in probability theory.

• It says that the mean of a large sample is close to the mean of the
  distribution.

• For example, the proportion of heads in a large number of tosses of a
  fair coin is expected to be close to 1/2.




Let $X_1, X_2, \ldots$ be an IID sample and let $E(X_i) = \mu$ and $\sigma^2 = V(X_i)$.
Recall that the sample mean is defined as $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ and that
$E(\bar{X}_n) = \mu$ and $V(\bar{X}_n) = \sigma^2/n$.

Theorem 3. If $X_1, X_2, \ldots$ are IID, then $\bar{X}_n \xrightarrow{P} \mu$ as $n \to \infty$;
that is, for every $\epsilon > 0$, $P(|\bar{X}_n - \mu| > \epsilon) \to 0$.

   Interpretation of WLLN: The distribution of $\bar{X}_n$ becomes more
concentrated around µ as n gets large.
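
The concentration can be seen in a small simulation. This is a sketch under assumptions: fair-coin tosses, the tolerance eps = 0.05, the number of trials, and the helper name `deviation_probability` are all choices made here for illustration. It estimates the probability that the sample mean misses µ by more than eps and prints it next to the Chebyshev bound σ²/(n·eps²), which follows from V(X̄n) = σ²/n.

```python
import random

def deviation_probability(n, eps=0.05, p=0.5, trials=500, seed=0):
    """Estimate P(|sample mean - p| > eps) for n tosses of a coin with P(heads) = p."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        heads = sum(rng.random() < p for _ in range(n))
        if abs(heads / n - p) > eps:
            misses += 1
    return misses / trials

for n in (10, 100, 1000):
    estimate = deviation_probability(n)
    # Chebyshev applied to the sample mean: sigma^2 / (n * eps^2), capped at 1.
    bound = min(1.0, 0.25 / (n * 0.05 ** 2))
    print(f"n={n}: estimated {estimate:.3f}, Chebyshev bound {bound:.3f}")
```

The estimated probability shrinks as n grows, which is exactly what convergence in probability asserts.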




The Central Limit Theorem (CLT)


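Theorem 4. Let $X_1, X_2, \ldots$ be IID with mean µ and variance $\sigma^2$. Let
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then

\[ Z_n \equiv \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \rightsquigarrow Z \]

     where $Z \sim N(0, 1)$ and $\rightsquigarrow$ denotes convergence in distribution. In other words,

\[ \lim_{n \to \infty} P(Z_n \le z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx \]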




Interpretation of CLT: Probability statements about $\bar{X}_n$ can be
approximated using a Normal distribution. It’s the probability statements
that we are approximating, not the random variable itself.

• This theorem provides a simple method for computing approximate
  probabilities for sums of independent random variables.

• It explains the remarkable fact that the empirical frequencies of so many
  natural "populations" exhibit a bell-shaped curve.

• This theorem holds for any distribution of the Xi’s that has a finite
  mean and variance.
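
The statement that the probabilities of Zn approach those of a standard Normal can be illustrated by simulation. The sketch below assumes Uniform(0, 1) summands, n = 30, and a helper named `standardized_mean_cdf`; none of these come from the slides. It estimates P(Zn ≤ z) and prints it beside Φ(z).

```python
import random
from math import sqrt
from statistics import NormalDist

def standardized_mean_cdf(z, n=30, trials=5000, seed=0):
    """Estimate P(Z_n <= z), Z_n = sqrt(n) * (mean - mu) / sigma, for X_i ~ Uniform(0, 1)."""
    mu, sigma = 0.5, sqrt(1 / 12)  # mean and standard deviation of Uniform(0, 1)
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if sqrt(n) * (sample_mean - mu) / sigma <= z:
            hits += 1
    return hits / trials

for z in (-1.0, 0.0, 1.0):
    simulated = standardized_mean_cdf(z)
    print(f"z={z:+.1f}: simulated {simulated:.3f}, Phi(z) = {NormalDist().cdf(z):.3f}")
```

Swapping in a skewed distribution for the summands is a natural variation; the agreement is then typically slower to appear as n grows.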




Example 1. (Normal Approximation to the Binomial) Let X be the
number of times that a fair coin, flipped 40 times, lands heads. Find the
probability that X = 20. Use the normal approximation and then compare
it to the exact solution.
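
One possible way to carry out Example 1 numerically is sketched below. It assumes the usual continuity correction, that is, P(X = 20) is approximated by P(19.5 < Y < 20.5) with Y Normal with mean np = 20 and variance np(1 − p) = 10; the variable names are chosen here for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.5
mu = n * p                     # mean of Binomial(40, 1/2)
sigma = sqrt(n * p * (1 - p))  # standard deviation of Binomial(40, 1/2)

# Exact binomial probability P(X = 20).
exact = comb(n, 20) * p ** 20 * (1 - p) ** 20

# Normal approximation with continuity correction: P(19.5 < Y < 20.5).
normal = NormalDist(mu, sigma)
approx = normal.cdf(20.5) - normal.cdf(19.5)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

Both values come out close to 0.125, so the approximation is already good at n = 40.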

Example 2. Let $X_i$, i = 1, 2, . . . , 10 be independent r.vs, each being
uniformly distributed over (0, 1). Estimate $P\left(\sum_{i=1}^{10} X_i > 7\right)$.
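
A sketch of Example 2 using the CLT, assuming no continuity correction is needed because the sum is continuous. Each Uniform(0, 1) variable has mean 1/2 and variance 1/12, so the sum of ten has mean 5 and variance 10/12; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 10
mean_sum = n * 0.5          # E(sum) = 10 * 1/2
sd_sum = sqrt(n * (1 / 12))  # sd(sum) = sqrt(10 * 1/12)

# CLT approximation: P(sum > 7) ~ 1 - Phi((7 - 5) / sd_sum).
p_sum_over_7 = 1 - NormalDist(mean_sum, sd_sum).cdf(7)
print(f"P(sum > 7) is approximately {p_sum_over_7:.4f}")
```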

Example 3. The lifetime of a special type of battery is a r.v with mean
40 hours and standard deviation 20 hours. A battery is used until it fails,
at which point it is replaced by a new one. Assuming a stockpile of 25
such batteries, the lifetimes of which are independent, approximate the
probability that over 1100 hours of use can be obtained.
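
Example 3 can be approached the same way. As a sketch: the total lifetime of the 25 batteries has mean 25 · 40 = 1000 hours and standard deviation 20 · √25 = 100 hours, and the CLT suggests treating it as approximately Normal. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n_batteries = 25
mean_total = n_batteries * 40      # expected total lifetime in hours
sd_total = 20 * sqrt(n_batteries)  # standard deviation of the total lifetime

# CLT approximation: P(total > 1100) ~ 1 - Phi((1100 - 1000) / 100) = 1 - Phi(1).
p_over_1100 = 1 - NormalDist(mean_total, sd_total).cdf(1100)
print(f"P(total lifetime > 1100 h) is approximately {p_over_1100:.4f}")
```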




Stochastic Processes


• A stochastic process {X(t), t ∈ T } is a collection of r.vs. For each t ∈ T ,
  X(t) is a r.v.

• We interpret t as time and X(t) as the state of the process at time t.

• T is called the index set of the process; discrete-time process: T is a
  countable set; continuous-time process: T is an interval of the real line

• The state space of a stochastic process is defined as the set of all possible
  values that the r.v X(t) can assume.

• A stochastic process is a family of r.vs that describes the evolution through
  time of some (physical) process.

Example 4. Consider a particle that moves along a set of m + 1 nodes,
labeled 0, 1, . . . , m, that are arranged around a circle. At each step the
particle is equally likely to move one position in either the clockwise or
counterclockwise direction. That is, if $X_n$ is the position of the particle after its
nth step, then

\[ P(X_{n+1} = i + 1 \mid X_n = i) = P(X_{n+1} = i - 1 \mid X_n = i) = \frac{1}{2} \]


   where i + 1 ≡ 0 when i = m and i − 1 ≡ m when i = 0. Suppose now
that the particle starts at 0 and continues to move around according to the
preceding rules until all the nodes 1, 2, . . . , m have been visited. What is
the probability that node i, i = 1, 2, . . . , m, is the last one visited?
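
The question in Example 4 can be explored by simulation before attempting a derivation. The sketch below is an illustration only: the function names, the choice m = 5, and the number of trials are assumptions made here. It runs the walk until every node has been visited and records which node was reached last. The classical answer is that each node 1, . . . , m is last with probability 1/m, and the simulated frequencies can be compared against that.

```python
import random
from collections import Counter

def last_visited(m, rng):
    """Run one walk on nodes 0..m starting at 0; return the node visited last."""
    position = 0
    unvisited = set(range(1, m + 1))
    while True:
        # Move one step clockwise or counterclockwise, wrapping around the circle.
        position = (position + rng.choice((-1, 1))) % (m + 1)
        if position in unvisited:
            unvisited.remove(position)
            if not unvisited:
                return position

def last_visit_frequencies(m, trials=20_000, seed=0):
    """Estimate, for each node i = 1..m, the probability that i is the last node visited."""
    rng = random.Random(seed)
    counts = Counter(last_visited(m, rng) for _ in range(trials))
    return {i: counts[i] / trials for i in range(1, m + 1)}

for node, freq in last_visit_frequencies(5).items():
    print(f"node {node}: estimated probability {freq:.3f}")
```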

