ST318
Probability Theory
Keegan Kang
Spring 2013
Second Edition
Contents
0 Measures 3
1 Axiomatic Probability Theory 10
2 Independence 12
3 Tail σ−algebra and Kolmogorov’s 0 − 1 law 16
4 Integration 24
5 Expectations 27
6 Inequalities 29
7 Convergence of Random Variables 35
8 Characteristic Functions and the Central Limit Theorem 43
9 Conditional Expectation & Martingales 46
10 Filtrations, martingales and stopping times 50
Notes on the First Edition
Thanks to Pierre Tai and Nico Prokop for pointing out the many typos within.
Keegan Kang
Notes on the Second Edition
These notes were written for the 2010-2011 course, so might not be directly relevant to
our course. There have been changes made since the first edition, but these are almost
exclusively cosmetic.
Iain Carson
0 Measures
Definition 0.1 – σ-algebra
F is a σ−algebra if it satisfies the following properties:
• Ω ∈ F
• if A ∈ F, then Aᶜ ∈ F
• if {Ai}i≥0 ⊆ F, then ⋃_{i≥0} Ai ∈ F
If we have F, then (Ω, F) is a measurable space.
Example 0.1 – Examples of σ−algebras on a set Ω
• smallest σ−algebra: {∅, Ω}
• largest σ−algebra: the power set 2^Ω
It is also possible to generate other σ−algebras on Ω.
Take a subset A of Ω, i.e. A ⊆ Ω. We know A ∈ 2^Ω.
We look at σ({A}), which is the smallest σ−algebra generated by A:
σ({A}) = ⋂_{F⊇{A}} F = {∅, Ω, A, Aᶜ}     (0.1)
(the intersection runs over σ−algebras F containing {A}; it is over a non-empty family because 2^Ω is such a σ−algebra)
To say that (0.1) is the smallest σ−algebra generated by A, we need to check that:
• (0.1) fulfills the axioms of a σ−algebra
• (0.1) is contained in every σ−algebra containing A
which are trivial.
Therefore, we can take any arbitrary collection C of subsets with C ⊆ 2^Ω to generate a σ−algebra, and σ(C) = ⋂_{F⊇C} F, where the intersection runs over all σ−algebras F containing C.
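On a finite Ω this generation procedure can be carried out mechanically: start from C and repeatedly close under complements and unions until nothing new appears. The sketch below (plain Python; sets are encoded as frozensets, and all names are illustrative rather than taken from these notes) computes σ(C) this way.

from itertools import combinations

def generate_sigma_algebra(omega, collection):
    """Close `collection` under complement and union until stable.
    On a finite omega this yields sigma(collection)."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(s) for s in collection}
    changed = True
    while changed:
        changed = False
        for a in list(family):
            comp = omega - a
            if comp not in family:
                family.add(comp); changed = True
        for a, b in combinations(list(family), 2):
            u = a | b
            if u not in family:
                family.add(u); changed = True
    return family

# sigma({A}) on Omega = {1,2,3,4} with A = {1,2}: expect {emptyset, Omega, A, A complement}
print(sorted(map(sorted, generate_sigma_algebra({1, 2, 3, 4}, [{1, 2}]))))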
Definition 0.2 – Borel σ−algebra (for R)
The Borel σ−algebra is the smallest σ−algebra containing all open sets of the topological space. So when Ω = R, B(R) is the smallest σ−algebra generated by the open intervals in R:
B(R) = σ(J : J open interval in R) = σ((−∞, x] : x ∈ Q)
Consider σ({m} : m ∈ Q). Is σ({m} : m ∈ Q) = B(R)? No, it is not.
Proof. We know that B(R) contains uncountable sets whose complements are also uncountable (for example, bounded intervals of positive length).
So if we can show that every set in σ({m} : m ∈ Q) is countable, or has a countable complement, then σ({m} : m ∈ Q) ≠ B(R).
But sets of rational points are countable. We can construct a bijection from the set of all rationals to a subset of N.
Define f : Q → N as follows.
• For each q ∈ Q⁺, write q = m/n where m, n ∈ Z, m, n > 0, hcf(m, n) = 1.
• For each q ∈ Q⁻, write q = m/n where m, n ∈ Z, m < 0, n > 0, hcf(|m|, n) = 1.
Write:
f(q) = 2^m · 3^n if q > 0,   5^|m| · 7^n if q < 0,   1 if q = 0.
This is an injection from Q into N, so there exists a bijection between Q and a particular subset of N; hence Q is countable.
Since every set in σ({m} : m ∈ Q) is countable or has countable complement, we conclude that σ({m} : m ∈ Q) ≠ B(R).
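The injection above is easy to test numerically. A short sketch (Python; names illustrative) encodes rationals in lowest terms and checks that distinct rationals receive distinct codes:

from fractions import Fraction

def encode(q: Fraction) -> int:
    """Injection Q -> N from the text: 2^m 3^n (q>0), 5^|m| 7^n (q<0), 1 (q=0)."""
    if q == 0:
        return 1
    m, n = q.numerator, q.denominator   # Fraction stores lowest terms, n > 0
    return 2**m * 3**n if q > 0 else 5**(-m) * 7**n

qs = {Fraction(p, r) for p in range(-6, 7) for r in range(1, 7)}
codes = [encode(q) for q in qs]
assert len(codes) == len(set(codes))    # distinct rationals -> distinct naturals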
We also want to show that B(R) = σ((−∞, x], x ∈ Q).
Proof. To show that B(R) = σ((−∞, x], x ∈ Q), we need to show that:
F := σ((−∞, x], x ∈ Q) ⊆ B(R)     (†)
F := σ((−∞, x], x ∈ Q) ⊇ B(R)     (††)
(†)
It is enough to show that (−∞, x] ∈ B(R) ∀ x ∈ Q.
This is true since (−∞, x]ᶜ = (x, ∞) is open, so (−∞, x] ∈ B(R) as B(R) is closed under complements; hence F ⊆ B(R).
(††)
It is enough to show that F contains J for every J = (a, b) ⊆ R, because B(R) is the smallest σ−algebra containing all open intervals, so any σ−algebra containing all open intervals contains B(R).
(a, b) ∈ F ⇔ R \ (a, b) ∈ F ⇔ (−∞, a] ∪ [b, ∞) ∈ F
Write ♣ = (−∞, a] and ♥ = [b, ∞). We just need to show that ♣ ∈ F and ♥ ∈ F.
It is obvious that ♣ ∈ F if a ∈ Q. Otherwise, we construct a decreasing sequence of rationals ai which tends to a, i.e. ai ↓ a, and write ♣ = (−∞, a] = ⋂_{i∈N} (−∞, ai] ∈ F. Hence (−∞, a] ∈ F for all a ∈ R.
We now consider ♥. We have ♥ = ⋂_{i∈N} (−∞, b − 1/i]ᶜ ∈ F if b ∈ Q. Otherwise, we construct a decreasing sequence of rationals bi which tends to b, i.e. bi ↓ b, and write:
♥ = [b, ∞) = ⋂_{n∈N} ⋃_{i∈N} (−∞, bi − 1/n]ᶜ ∈ F
Hence [b, ∞) ∈ F for all b ∈ R.
We have thus shown that (a, b) ∈ F, so (††) holds.
Hence B(R) = σ((−∞, x], x ∈ Q).
There is a fundamental question: When are two measures equal?
Let (Ω, F) be a measurable space and let µ, ν be two measures on (Ω, F). When does the
equality hold? i.e. When does µ(F) = ν(F) ∀ F ∈ F?
Definition 0.3 – d-system
Let Ω be a set. A collection of subsets D ⊆ 2^Ω is a d-system if:
i) Ω ∈ D
ii) If A ⊆ B and A, B ∈ D, then B − A ∈ D
iii) If Am ∈ D ∀ m ∈ N and Am ⊆ Am+1 ∀ m ∈ N, then ⋃_{m∈N} Am ∈ D
Remarks:
i) In literature, the d−system is also called a Dynkin system, or a λ−system.
ii) Every σ−algebra is a d−system.
iii) If µ, ν are finite measures on (Ω, F) such that µ(Ω) = ν(Ω), then
D = {F ∈ F | µ(F) = ν(F)}
is a d−system.
iv) For any collection I ⊆ 2^Ω, the smallest d−system d(I) containing I is given by
d(I) = ⋂_{D⊇I, D a d−system} D
Proof. Proof of ii):
Axiom i) follows by definition.
Axiom ii) is satisfied, since B − A = B ∩ Aᶜ and is thus in the σ−algebra.
Axiom iii) is satisfied: let Bn = A1 ∪ . . . ∪ An ∀ n ∈ N. Then Bm ⊆ Bm+1 ∀ m ∈ N, and ⋃_{m∈N} Bm = ⋃_{m∈N} Am ∈ D.
Proof of iii):
Axiom i) follows since Ω ∈ F by definition of σ−algebra, and µ(Ω) = ν(Ω) by assumption, so Ω ∈ D.
To prove Axiom ii), we need to show that A ⊆ B, A, B ∈ D ⇒ µ(B − A) = ν(B − A).
Rewrite µ(B − A) as µ(B) − µ(A) and similarly ν(B − A) as ν(B) − ν(A). We can do so since
• these are finite measures, hence µ(A), µ(B) are finite
• A ⊆ B, therefore µ(B − A) = µ(B) − µ(A) and ν(B − A) = ν(B) − ν(A)
But then we know µ(A) = ν(A) and µ(B) = ν(B). So Axiom ii) holds.
To prove Axiom iii), we need to show that:
Am ∈ D ∀ m ∈ N and Am ⊆ Am+1 ∀ m ∈ N ⇒ µ(⋃_{m∈N} Am) = ν(⋃_{m∈N} Am)
By continuity of measures, we can write:
µ(⋃_{m∈N} Am) = lim_{m↑∞} µ(Am)   and   ν(⋃_{m∈N} Am) = lim_{m↑∞} ν(Am)
But then we know lim_{m↑∞} µ(Am) = lim_{m↑∞} ν(Am). So Axiom iii) holds.
Proof of iv) (to show that d(I) = ⋂_{D⊇I, D a d−system} D is a d−system):
We need to show that the intersection is over a non-empty family, that d(I) is the smallest d−system containing I, and that it satisfies the axioms of a d−system.
The family is non-empty since 2^Ω is a d−system containing I. By construction, every d−system containing I contains d(I), so once we know d(I) is itself a d−system it is the smallest one.
To show d(I) satisfies the axioms of a d−system, let (Dk) be the family of d−systems containing I, so that d(I) = ⋂_k Dk.
Axiom 1: Ω ∈ Dk ∀ k ⇒ Ω ∈ ⋂_k Dk.
Axiom 2: Suppose A, B ∈ ⋂_k Dk with A ⊆ B. Then A, B ∈ Dk ∀ k, hence (B − A) ∈ Dk ∀ k, and thus (B − A) ∈ ⋂_k Dk.
Axiom 3: Suppose A1, A2, . . . ∈ ⋂_k Dk with Am ⊆ Am+1 ∀ m ∈ N. Then Am ∈ Dk ∀ k, so ⋃_m Am ∈ Dk ∀ k, and thus ⋃_m Am ∈ ⋂_k Dk.
So d(I) satisfies the axioms of a d−system.
Definition 0.4 – π−system
Let I ⊆ 2^Ω. Then I is a π−system if A, B ∈ I ⇒ A ∩ B ∈ I.
Example 0.2 – Examples of π−systems on R.
Consider the set R, and take I1 = {(−∞, x] : x ∈ R}. This is a π−system.
I2 = {(−∞, x] : x ∈ Q} is a π−system as well.
Proof. Suppose we have two sets A and B, with A, B ∈ I1. Without loss of generality, take
a ≤ b. Then:
A = (−∞, a]
B = (−∞, b]
and therefore:
A ∩ B = (−∞, a]
So A ∩ B = A ∈ I1. The same holds for I2. So I1 and I2 are both π−systems.
The Borel σ−algebra on R is generated by the π−system I1 (or by I2). In other words, B(R) = σ(I1) = σ(I2).
Remark 0.1
A collection C ⊆ 2Ω
is a σ−algebra ⇔ C is a d−system and π−system.
Proof.
(⇒)
We have proved above (remark ii) after Definition 0.3) that every σ−algebra is a d−system. We need to show that a σ−algebra is a π−system as well, i.e. that ∀ A, B ∈ F, A ∩ B ∈ F.
Take A, B ∈ F. Then:
A, B ∈ F ⇒ Aᶜ, Bᶜ ∈ F   by Axiom 2 of σ−algebra
        ⇒ Aᶜ ∪ Bᶜ ∈ F   by Axiom 3 of σ−algebra
        ⇒ (Aᶜ ∪ Bᶜ)ᶜ ∈ F   by Axiom 2 of σ−algebra
        ⇒ A ∩ B ∈ F   by De Morgan's Laws
(⇐)
We then need to show that the definitions of a π−system and a d−system fulfill the axioms
of a σ−algebra.
Axiom 1 is satisfied due to axiom 1 of the d−system, i.e. Ω ∈ D, so Ω ∈ C.
Axiom 2: Choose A ∈ C. Since A ⊆ Ω and Ω, A ∈ C, axiom 2 of the d−system gives Ω − A = Aᶜ ∈ C. So A ∈ C ⇒ Aᶜ ∈ C.
Axiom 3: Take A1, A2 ∈ C. We wish to show A1 ∪ A2 ∈ C.
We have proven that Axiom 2 of a σ−algebra is satisfied, so A1ᶜ, A2ᶜ ∈ C. By definition of a π−system, A1ᶜ ∩ A2ᶜ ∈ C. Again, by using Axiom 2, we have (A1ᶜ ∩ A2ᶜ)ᶜ = A1 ∪ A2 ∈ C.
By induction, every finite union A1 ∪ . . . ∪ An ∈ C; these finite unions increase to ⋃_{i∈N} Ai, so axiom iii) of the d−system gives ⋃_{i∈N} Ai ∈ C. Hence Axiom 3 is satisfied.
Therefore if C is both a d−system and a π−system, then C is a σ−algebra.
Theorem 0.1 – Monotone Class Theorem for Sets
If I ⊆ 2Ω
is a π−system, then d(I) = σ(I). In other words, the smallest d−system generated
by I coincides with the σ−algebra σ(I) generated by I.
Proof. Need to show that:
• d(I) ⊆ σ(I)
• d(I) ⊇ σ(I)
To show d(I) ⊆ σ(I):
We have proven that every σ−algebra is a d−system. So it follows that:
d(I) = ⋂_{D⊇I, D a d−system} D ⊆ ⋂_{F⊇I, F a σ−algebra} F = σ(I)
Hence d(I) ⊆ σ(I).
To show (d(I) ⊇ σ(I)):
To prove this, we note Remark 0.1 and show that d(I) is a d−system and a π−system.
Then d(I) would be a σ−algebra, and d(I) ⊇ σ(I).
Define the family of sets:
D1 = {B ∈ d(I) | B ∩ C ∈ d(I) ∀ C ∈ I}
We wish to show that D1 is a d−system, and is in fact d(I).
First note that I ⊆ D1 since B ∈ I ⊆ d(I) ⇒ B ∩C ∈ d(I) ∀ C ∈ I (this is how we defined
our D1).
Secondly, we show that D1 satisfies the axioms of a d−system.
Axiom 1: ∀ C ∈ I, Ω ∩ C = C ∈ d(I), hence Ω ∈ D1.
Axiom 2: Consider the equality (B − A) ∩ C = (B ∩ C) − (A ∩ C) which holds for every set
A, B, C, given that A ⊆ B.
Pick A, B ∈ D1 which satisfies A ⊆ B, and we want to show that B − A ∈ D1.
Since A, B ∈ d(I), which is a d−system, then B − A ∈ d(I). It suffices to check if (B −
A) ∩ C ∈ d(I) ∀ C ∈ I.
Since A, B ∈ D1, both A ∩ C and B ∩ C lie in d(I), and (A ∩ C) ⊆ (B ∩ C); using the above equality and axiom ii) of the d−system, (B − A) ∩ C = (B ∩ C) − (A ∩ C) ∈ d(I), and therefore (B − A) ∈ D1.
Axiom 3: Given Am ∈ D1 such that Am ⊆ Am+1 ∀ m ∈ N, we wish to show that (⋃_m Am) ∩ C ∈ d(I) ∀ C ∈ I.
Note that Am ∩ C ∈ d(I) ∀ m ∈ N and (Am ∩ C) ⊆ (Am+1 ∩ C) ∀ m, so by axiom iii) of the d−system, (⋃_m Am) ∩ C = ⋃_m (Am ∩ C) ∈ d(I) ∀ C ∈ I.
Now, we have satisfied the axioms for D1 to be a d−system, and since I ⊆ D1, we can write:
I ⊆ D1 ⊆ d(I) ⇒ D1 = d(I) (0.2)
since D1 contains I, and d(I) is the smallest d−system containing I.
Now consider the family of sets:
D2 = {B ∈ d(I) | B ∩ C ∈ d(I) ∀ C ∈ d(I)}
We want to show that I ⊆ D2: but for B ∈ I and C ∈ d(I) = D1, we have B ∩ C ∈ d(I) by (0.2), so every B ∈ I lies in D2.
D2 is also a d−system (similar proof to above, and using the same inequality), and therefore,
we can also write:
I ⊆ D2 ⊆ d(I) ⇒ D2 = d(I)
This implies that d(I) is a π−system.
We have thus shown that d(I) is both a π−system and a d−system, and therefore a
σ−algebra, hence d(I) ⊇ σ(I).
Therefore we have shown: d(I) ⊆ σ(I) and d(I) ⊇ σ(I) and hence d(I) = σ(I).
We can use the Monotone class theorem (Theorem 0.1) to state certain relations between
measures.
Proposition 0.1
1) Let µ, ν be two measures on a measurable space (Ω, F) such that µ(Ω) = ν(Ω) < ∞. If
µ(C) = ν(C) ∀ C ∈ I where I is a π−system in F, then µ and ν coincide on the smallest
σ−algebra generated by I, i.e. σ(I).
2) Any two probability measures that agree on a π−system must agree on the σ−algebra
generated by this π−system.
Proof. (of part 1)
We define the set:
D = {A ∈ F | µ(A) = ν(A)}.
This is a d−system (by the same proof as for part iii) of the remarks above). Since
µ(C) = ν(C) ∀ C ∈ I ⇒ I ⊆ D,
the σ−algebra generated by I satisfies σ(I) = d(I) ⊆ D by the Monotone class theorem (Theorem 0.1), so µ and ν agree on σ(I).
Example 0.3
Let P and P′ be two probability measures on (R, B(R)). If:
P(−∞, x] = P′(−∞, x] ∀ x ∈ R (or ∀ x ∈ Q),
then P and P′ coincide on B(R). This follows by Proposition 0.1 and the fact that
B(R) = σ({(−∞, x] | x ∈ R}).
Hence the cumulative distribution function of a probability measure P on (R, B(R)),
F : R → [0, 1], F(x) = P(−∞, x],
uniquely determines the measure P.
1 Axiomatic Probability Theory
We know from previous Probability courses that Ω is our sample space, i.e. all outcomes of
a random experiment.
Example 1.1
i) Toss a coin twice. Then Ω = {HT, TH, HH, TT}.
ii) An infinite sequence of coin tosses. Then Ω = {ω : N → {T, H}}. Here, |Ω| = ∞, and is
uncountable.
Proof. Proof that Example 1.1 ii) is uncountable.
We attempt a proof by contradiction. Suppose that Ω is countable (infinite). Then we can
enumerate out all possible ω. But if we find an ω not in the list, we get a contradiction, and
hence Ω is uncountable.
Note: Ω = {ω : N → {T, H}} is simply the set of all sequences ω of tosses. So, assume we have enumerated all our ω, say:
ω1 = ω11ω12ω13ω14 . . .
ω2 = ω21ω22ω23ω24 . . .
ω3 = ω31ω32ω33ω34 . . .
ω4 = ω41ω42ω43ω44 . . .
... =
...
with ωij = H or T.
So, we construct a sequence ω* say, such that ω*(i) ≠ ωii for i = 1, 2, 3, . . . (flip the i-th toss of the i-th listed sequence). Then ω* is not in the above list, and therefore Ω is uncountable.
Having defined ii) in Example 1.1, we wish to know the probability of the coin landing H (or T) on the i-th toss. We thus want:
(Ω, F) : F = σ({ω | ω(i) = H}, {ω | ω(i) = T} : i ∈ N)
We thus need a probability measure on F.
It is possible to embed Ω → [0, 1], where a Lebesgue measure has been constructed such that:
P[{ω | ω(i) = H}] = 1/2 ∀ i ∈ N
Definition 1.1 – Random Object / Variable / Vector
Given a probability space (Ω, F, P), a random object X in a measurable space (S, Σ) is a measurable function X : (Ω, F) → (S, Σ), i.e. the pre-image X⁻¹(Σ) ⊆ F.
If X takes values in R and X ∈ mB(R), then X is a random variable.
If X : Ω → Rⁿ and X ∈ mB(Rⁿ), then X is a random vector.
Recall: X ∈ mB(R) means that X is a measurable function with respect to B(R).
Example 1.2 – Defining a random variable
Recall part ii) of Example 1.1, where we defined Ω = {ω : N → {H, T}} and our σ−algebra
to be F = σ({ω(i) = T}, {ω(i) = H} : i ∈ N).
We can define our random variable to be:
Xi(ω) = 1 if ω(i) = H,   0 if ω(i) = T
Now Xi : Ω → {0, 1}, and it is a random variable.
Proof. (that Xi is a random variable)
Let F = {Ω, ∅, A, Aᶜ} where A is the event that Heads occurs at the i-th toss (so Aᶜ is the event that Tails occurs). Then Xi⁻¹({1}) = A and Xi⁻¹({0}) = Aᶜ, which are both in F.
We define a random variable Sn = Σ_{i=1}^n Xi, which is the number of heads obtained in n tosses. This is a random variable as well, since a sum of random variables is a random variable.
From the lecturer's Measure Theory notes, it follows that {lim_{n→∞} (1/n) Sn = p} ∈ F for any p ∈ [0, 1]. If p ∉ [0, 1], then this event is obviously ∅.
Definition 1.2 – Law of a random variable
Given a random variable X on (Ω, F, P), the law of X is the probability measure P_X on (R, B(R)) given by:
P_X[B] = P[X⁻¹(B)] ∀ B ∈ B(R)
It is enough to know P_X[(−∞, x]] = P[X ∈ (−∞, x]] ∀ x ∈ R (or ∀ x ∈ Q).
2 Independence
Definition 2.1 – Independence
Let (Ω, F, P) be a probability space and let Gi ⊆ F be sub−σ−algebras for i ∈ N. The family of sub−σ−algebras Gi, i ∈ N, is independent if for any choice of events Gi ∈ Gi, i ∈ N, and any family {i1, i2, . . . , ik} ⊆ N of distinct indices:
P[⋂_{j=1}^k Gij] = ∏_{j=1}^k P[Gij]     (2.1)
Remarks:
1. (2.1) has to hold for all finite subsets {i1, . . . , ik} ⊆ N.
2. If we have a finite family G1, . . . , Gn of sub−σ−algebras, condition (2.1) collapses to:
P[⋂_{i=1}^n Gi] = ∏_{i=1}^n P[Gi]
where Gi ∈ Gi for i = 1, 2, . . . , n.
3. Random variables X and Y are independent if and only if σ(X), σ(Y ) are independent.
We may wish to ask what is σ(X). While we know σ(X) = X−1
(B(R)), what does it
mean intuitively? σ(X) can be intuitively thought of as “information we can obtain
about the outcome of the random experiment by knowing X(ω), but not knowing ω”.
For random variables X, Y , σ(X) and σ(Y ) are independent if and only if:
P[X ∈ A, Y ∈ B] = P[X ∈ A] · P[Y ∈ B] ∀ A, B ∈ B(R)
4. Let E1, E2, . . . be events in F. They are independent if σ(E1), σ(E2), . . . are independent, where:
σ(E1) = {Ω, ∅, E1, E1ᶜ}
σ(E2) = {Ω, ∅, E2, E2ᶜ}
and so on.
Proof. Prove that E1, E2, . . . are independent if and only if:
P[⋂_{j=1}^m Eij] = ∏_{j=1}^m P[Eij] for every finite set {i1, i2, . . . , im} ⊆ N of distinct indices
(⇐)
This follows from Definition 2.1, showing that σ(E1), σ(E2), . . . are independent, but
remark 4 shows that this means E1, E2, . . . are independent.
(⇒) Exercise
5. Pairwise independence does not imply independence.
Example 2.1 – Example of above statement
Take Ω = {1, 2, 3, 4}, F = 2^Ω, A = {1, 2}, B = {1, 3}, C = {2, 3}, and define the probability measure P[{ω}] = 1/4 for ω ∈ Ω. Note that the pairs A, B and A, C and B, C are independent, since:
P[A ∩ B] = P[{1}] = 1/4 = P[A] · P[B] = (1/2) · (1/2)
P[A ∩ C] = P[{2}] = 1/4 = P[A] · P[C] = (1/2) · (1/2)
P[B ∩ C] = P[{3}] = 1/4 = P[B] · P[C] = (1/2) · (1/2)
However, A, B, C are not independent, since:
P[A ∩ B ∩ C] = 0 ≠ P[A] · P[B] · P[C] = 1/8
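This finite example can be checked mechanically. The sketch below (Python; all names illustrative) enumerates the uniform measure on Ω = {1, 2, 3, 4} and verifies pairwise but not mutual independence:

from itertools import combinations

omega = {1, 2, 3, 4}
P = lambda E: len(E & omega) / len(omega)   # uniform probability measure on omega

A, B, C = {1, 2}, {1, 3}, {2, 3}
events = {"A": A, "B": B, "C": C}

for (na, Ea), (nb, Eb) in combinations(events.items(), 2):
    assert P(Ea & Eb) == P(Ea) * P(Eb), (na, nb)   # pairwise independent

print(P(A & B & C), P(A) * P(B) * P(C))   # 0.0 vs 0.125: not mutually independent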
Theorem 2.1
See Probability with Martingales, Williams D.W. page 39.
Let (Ω, F, P) be a probability space and sub−σ−algebras H, G ⊆ F be generated by
π−systems I and J respectively. In other words:
σ(I) = H, σ(J ) = G
Then H and G are independent if and only if I and J are independent in the sense
A ∈ I, B ∈ J ⇒ P[A ∩ B] = P[A] · P[B]
Proof. Our goal is to prove:
P[H ∩ G] = P[H] · P[G] ∀ H ∈ H, G ∈ G (2.2)
Fix A ∈ I and consider the following two measures on F:
F → P[F ∩ A]
F → P[F] · P[A]
These measures have equal mass given by P[A].
By assumption, we have that the two measures coincide on J . Therefore by Proposi-
tion 0.1, we have
P[F ∩ A] = P[F] · P[A] ∀ F ∈ σ(J ) = G (2.3)
To show (2.2) , we define two measures. Fix G ∈ G and let
F → P[G ∩ F]
F → P[G] · P[F]
These two measures coincide on the π−system I by (2.3). Hence, as before, the two measures coincide on σ(I) = H.
Remarks:
1. Let X, Y be random variables on (Ω, F, P). Then:
X and Y are independent
⇔ P[X ∈ A, Y ∈ B] = P[X ∈ A] · P[Y ∈ B] ∀ A, B ∈ B(R) (by Theorem 2.1)
⇔ P[X ≤ x, Y ≤ y] = P[X ≤ x] · P[Y ≤ y] ∀ x, y ∈ R
We claim this since {(−∞, x] : x ∈ R} is a π−system in B(R) which generates
B(R). Hence {X ≤ x : x ∈ R} is a π−system in σ(X) which generates σ(X)
since σ(X) = X−1
(B(R)). So the π−systems π(X), π(Y ) are independent implies
σ(X), σ(Y ) independent.
2. Similarly, random variables X1, . . . , Xn are independent if and only if
P[Xi ≤ xi : 1 ≤ i ≤ n] = ∏_{i=1}^n P[Xi ≤ xi] ∀ xi ∈ R, i = 1, 2, . . . , n
3. If X is independent of Y and X is independent of Z, then it does not follow that X
is independent of (Y, Z).
Example 2.2
Let X = IA, Y = IB and Z = IC. Let A = {2, 3}, B = {1, 2}, C = {1, 3} be subsets in
Ω = {1, 2, 3, 4}, F = 2Ω
.
Recall that A, B independent, B, C independent. However X is not independent of
(Y, Z).
Definition 2.2 – Joint law (of random variables)
Let X, Y be random variables on (Ω, F, P) and let B(R²) be the Borel σ−algebra on R².
The joint law of X and Y is given by
P_(X,Y)[A] = P[(X, Y ) ∈ A] ∀ A ∈ B(R²)
Remarks:
1. Recall that:
B(R²) = B(R) ⊗ B(R)     (2.4)
Exercise: prove (2.4), i.e. that B(R²) = σ(U × V : U, V ∈ B(R)) = B(R) ⊗ B(R).
(2.4) implies that B(R²) is generated by the π−system {(−∞, x] × (−∞, y] : x, y ∈ R}.
Proposition 2.1
The following statements are equivalent:
a) X and Y are independent
b) P_(X,Y) = P_X ⊗ P_Y
c) Define F_XY(x, y) = P[X ≤ x, Y ≤ y] ∀ (x, y) ∈ R². Then F_XY(x, y) = F_X(x) F_Y(y).
Furthermore, if (X, Y ) has a density, i.e. there exists f_XY : R² → [0, ∞) such that:
P_(X,Y)[A] = ∫_A f_XY(x, y) dx ⊗ dy
then statements a), b), c) are further equivalent to:
d) f_XY(x, y) = f_X(x) f_Y(y) ∀ (x, y) ∈ R², where f_X, f_Y are the densities of X and Y respectively.
Remark:
Note in d) that the existence of f_XY implies the existence of the densities of the factors X and Y , and:
f_X(x) = ∫_R f_XY(x, y) dy,   f_Y(y) = ∫_R f_XY(x, y) dx
Proof. Based on B(R²) = B(R) ⊗ B(R). This implies that B(R²) is generated by the π−system {(−∞, x] × (−∞, y] : x, y ∈ R}. Apply Theorem 2.1.
3 Tail σ−algebra and Kolmogorov’s 0 − 1 law
Definition 3.1 – Tail σ−algebra
Let {Fn : n ∈ N} be a collection of σ−algebras. The tail σ−algebra T is given by:
T = ⋂_{n∈N} Tn,   where Tn = σ(Fn, Fn+1, . . .) = σ(⋃_{k≥n} Fk).
Remarks:
1. T is a σ−algebra that depends only on the tail events of a sequence of experiments, where the outcome of the n-th experiment is described by the σ−algebra Fn.
2. Note that the tail σ−algebra depends on the choice of {Fn : n ∈ N}.
Example 3.1
Let X1, X2, . . . be a sequence of random variables on (Ω, F, P) and define Fn := σ(Xn).
Then Tn = σ(Xn, Xn+1, . . .) ∀ n ∈ N and T = ⋂_{n∈N} Tn. We define the following events:
F1 = {∃ lim_{n→∞} Xn} = {ω ∈ Ω : lim_{n→∞} Xn(ω) exists}
F2 = {Σ_{n∈N} Xn exists}
F3 = {lim_{n→∞} (X1 + X2 + . . . + Xn)/n exists}
F4 = {Σ_{n∈N} Xn exists and Σ_{n∈N} Xn = 0}
Then F1, F2, F3 are contained within the tail σ−algebra of the sequence X1, X2, . . . .
Proof. It helps to think of tail events intuitively as those events whose occurrence or not is unaffected by altering any finite number of random variables in the sequence.
We claim that F1 ∈ T = ⋂_{n∈N} Tn. It is enough to show that F1 ∈ Tn ∀ n.
This is clear because the limit of a sequence Xk, k ∈ N only depends on (Xn+k)k∈N ∀ n ∈ N.
In other words, for a sequence to have a limit, we look at the tail of the sequence, i.e. we
can first discard the first finitely many terms.
Similarly, F2 ∈ Tn ∀ n ∈ N ⇒ F2 ∈ T .
To show F3 ∈ T is slightly trickier.
Let ξ = lim sup_{n→∞} Sn/n, where Sn = X1 + . . . + Xn.
We need to show the following:
i) ξ(ω) is well defined ∀ ω ∈ Ω and ξ ∈ mσ(X1, X2, . . .).
ii) ξ ∈ mTn ∀ n.
Consider i). We know that ξ(ω) exists in [−∞, ∞] since every sequence of real numbers has a lim sup (see Probability Theory Ex Sheet 1 Q1c).
Recall that ξ = inf_{k∈N} sup_{n≥k} Sn/n and hence:
{ξ ≥ a} = ⋂_{k∈N} {sup_{n≥k} Sn/n ≥ a} ∈ σ(Xi, i ∈ N)
which implies ξ ∈ mσ(Xi, i ∈ N). Here we used the fact that {(−∞, a], a ∈ R} is a π−system which generates B(R).
We now wish to prove ii).
Let S′k := Sn+k − Sn = Σ_{i=n+1}^{n+k} Xi, which is σ(Xn+1, . . . , Xn+k)-measurable. Then:
S′k/k = (Sn+k/(n + k)) · ((n + k)/k) − Sn/k
Letting k → ∞ with n fixed, the factor (n + k)/k → 1 and Sn/k → 0, hence
lim sup_{k→∞} S′k/k = lim sup_{k→∞} Sn+k/(n + k)
Therefore ξ = lim sup_{n→∞} Sn/n = lim sup_{k→∞} Sn+k/(n + k) = lim sup_{k→∞} S′k/k ∈ mTn ∀ n. The same argument applies to the lim inf, so F3 = {lim sup Sn/n = lim inf Sn/n ∈ R} ∈ Tn ∀ n, i.e. F3 ∈ T .
Now consider F4. We know that F2 ∈ T because F2 does not depend on the first finitely many terms.
F4 ∈ G, where G = σ(X1, X2, . . .), but F4 is not in general in the tail σ−algebra T . This is because the event F4 clearly depends on the value of X1 (and possibly on the first finitely many terms): changing X1 changes whether Σ_{n∈N} Xn = 0.
So F4 is not necessarily in T .
Theorem 3.1 – Kolmogorov’s 0-1 Law
Let {Fn : n ∈ N} be a sequence of independent sub−σ−algebras in (Ω, F, P). Then the tail σ−algebra T = ⋂_{n∈N} Tn, where Tn = σ(Fn+1, Fn+2, . . .), satisfies the following two properties:
i) ∀ F ∈ T , P[F] = 0 or P[F] = 1
ii) ∀ random variables ξ ∈ mT , ∃ c ∈ [−∞, ∞] such that P[ξ = c] = 1.
Proof. We start by proving i).
Define Hn := σ(F1, F2, . . . , Fn).
Step 1: We claim that Hn and Tn are independent.
Consider:
In = {⋂_{i=1}^n Fi : Fi ∈ Fi, i = 1, 2, . . . , n}
Jn = {⋂_{i=1}^l Fn+i : Fn+i ∈ Fn+i, i = 1, 2, . . . , l, l ∈ N}
Both are π−systems, as they are closed under intersection.
In generates Hn, since Fi ⊆ In ∀ i = 1, . . . , n and In ⊆ Hn.
Similarly, Jn is a π−system that generates Tn, since Jn ⊇ Fn+i ∀ i ∈ N and Jn ⊆ Tn.
For A ∈ In and B ∈ Jn we have P[A ∩ B] = P[A] · P[B], since {Fk : k ∈ N} are independent sub−σ−algebras. By Theorem 2.1, Hn and Tn are therefore independent.
Step 2: We claim Hn and T are independent.
Since T ⊆ Tn ∀ n ∈ N, T is independent of Hn ∀ n.
Step 3: We claim that T is independent of σ(⋃_{n∈N} Hn).
Since T is independent of Hn ∀ n, T is independent of ⋃_{n∈N} Hn, which by Theorem 2.1 implies that T is independent of σ(⋃_{n∈N} Hn). Here we have used the fact that ⋃_{n∈N} Hn is a π−system, since (Hn)n∈N is an increasing sequence of σ−algebras.
Step 4: We claim that T is independent of T .
Note that σ(⋃_{n∈N} Hn) = σ(Fi : i ∈ N) and hence T ⊆ σ(⋃_{n∈N} Hn).
So any F ∈ T ⊆ σ(⋃_{n∈N} Hn) is independent of itself. So:
P[F] = P[F ∩ F] = (P[F])²
But (P[F])² = P[F] with P[F] ∈ [0, 1] ⇒ P[F] = 0 or P[F] = 1.
We now prove part ii).
By part i), we have
P[ξ ≤ x] ∈ {0, 1} ∀ x ∈ R
Let c := sup{x : P[ξ ≤ x] = 0}, with the convention sup ∅ = −∞, so c is well defined in [−∞, ∞].
Then there are three cases.
If c = −∞, this implies that P[ξ ≤ x] = 1 ∀ x ∈ R ⇒ ξ = −∞ (P−a.s.).
Similarly, if c = +∞, this implies that P[ξ ≤ x] = 0 ∀ x ∈ R ⇒ ξ = +∞ (P−a.s.).
Suppose c ∈ R.
We then have P[ξ ≤ c − 1/n] = 0 ∀ n, and hence:
P[⋃_{n∈N} {ξ ≤ c − 1/n}] = lim_{n→∞} P[ξ ≤ c − 1/n] = P[ξ < c] = 0
We also have P[ξ ≤ c + 1/n] = 1 ∀ n, and hence:
P[⋂_{n∈N} {ξ ≤ c + 1/n}] = lim_{n→∞} P[ξ ≤ c + 1/n] = P[ξ ≤ c] = 1
Therefore P[ξ = c] = 1.
Definition 3.2 – Infinitely often (i.o.)
Let (En)n∈N be a sequence of events in (Ω, F, P). The event that En happens for infinitely many n ∈ N is given by:
lim sup_{n→∞} En := ⋂_{m∈N} ⋃_{n≥m} En = (En i.o.) (infinitely often)
Definition 3.3 – Eventually (ev)
Let (En)n∈N be a sequence of events in (Ω, F, P). The event that En happens for all n ≥ m, for some m ∈ N, is given by:
lim inf_{n→∞} En := ⋃_{m∈N} ⋂_{n≥m} En = (En ev) (eventually)
Remarks:
1. We can also write: lim sup_{n→∞} En = {ω ∈ Ω : ∀ m ∈ N ∃ n(ω) ≥ m s.t. ω ∈ En(ω)}.
2. Similarly, lim inf_{n→∞} En = {ω ∈ Ω : ∃ m(ω) ∈ N s.t. ∀ n ≥ m(ω) we have ω ∈ En}.
3. (En i.o.)ᶜ = (Enᶜ ev). To see this, note that (⋂_{m∈N} ⋃_{n≥m} En)ᶜ = ⋃_{m∈N} ⋂_{n≥m} Enᶜ.
4. (En i.o.), (En ev) ∈ T , where T is the tail σ−algebra of the family Fn = σ(En). To see this, recall that T = ⋂_{n∈N} σ(Fn, Fn+1, . . .) and note that (En i.o.) ∈ σ(Fm, Fm+1, . . .) ∀ m ∈ N, since (En i.o.) = ⋂_{k∈N} ⋃_{n≥k} En = ⋂_{k≥m} ⋃_{n≥k} En; this is because the sequence of events (⋃_{n≥k} En)k∈N is decreasing.
Lemma 3.1 – The first Borel-Cantelli lemma
Let (En)n∈N be a sequence of events in (Ω, F, P) with Σ_{n∈N} P[En] < ∞.
Then P[En i.o.] = P[lim sup En] = 0.
Proof. We have lim sup_{n→∞} En = ⋂_{m∈N} Am, where Am = ⋃_{n≥m} En.
Since Am ⊇ Am+1 ∀ m ∈ N, continuity from above gives P[En i.o.] = lim_{m→∞} P[Am].
But note that:
0 ≤ P[Am] ≤ Σ_{n≥m} P[En] → 0 as m → ∞
since the series Σ_n P[En] converges. This concludes the proof.
Remarks:
1. The first Borel-Cantelli Lemma is very important. It is for example used in the con-
struction of Brownian motion.
2. Let (Ω, F, P) be a probability space and let Q be a probability measure on (Ω, F). We say that Q is absolutely continuous with respect to P (written Q ≪ P) if ∀ F ∈ F, P[F] = 0 ⇒ Q[F] = 0.
We claim that if Q ≪ P, then ∀ ε > 0 ∃ δ > 0 s.t. ∀ F ∈ F with P[F] < δ we have Q[F] < ε.
Proof. We argue by contradiction, showing that the negation of the statement leads to a contradiction.
The negation is:
∃ ε > 0 s.t. ∀ δ > 0 ∃ Fδ ∈ F with P[Fδ] < δ and Q[Fδ] ≥ ε
Hence ∀ n ∈ N, pick δn = 2⁻ⁿ and let Fn ∈ F satisfy P[Fn] < 2⁻ⁿ and Q[Fn] ≥ ε.
Let F = lim sup_{n→∞} Fn.
We have P[F] = 0 by Borel Cantelli Lemma 1 (Lemma 3.1), since Σ_{n∈N} P[Fn] < ∞.
But Q[F] = lim_{m→∞} Q[⋃_{n≥m} Fn] ≥ ε ∀ m.
This implies that Q[F] ≥ ε. But if P[F] = 0, then Q[F] = 0 as well, since Q ≪ P.
Hence we have a contradiction.
Lemma 3.2 – The second Borel-Cantelli lemma
Let (En)n∈N in (Ω, F, P) be a sequence of independent events with Σ_{n∈N} P[En] = ∞.
Then P[En i.o.] = P[lim sup En] = 1.
Proof. Note that (En i.o.)ᶜ = ⋃_{m∈N} ⋂_{n≥m} Enᶜ.
So if we can show that this complement has probability 0, we are done. Now:
P[⋂_{n≥m} Enᶜ] = lim_{k→∞} P[⋂_{n=m}^k Enᶜ]   by continuity from above of the measure (P[Ω] = 1 < ∞)
             = lim_{k→∞} ∏_{n=m}^k P[Enᶜ]   by independence of the En (and hence of the Enᶜ)
             = lim_{k→∞} ∏_{n=m}^k (1 − P[En])
             ≤ lim_{k→∞} exp(−Σ_{n=m}^k P[En])   by the inequality 1 − x ≤ e⁻ˣ ∀ x
             = 0
since Σ_n P[En] = ∞. This implies that P[⋂_{n≥m} Enᶜ] = 0 ∀ m ∈ N. Therefore we have:
P[(En i.o.)ᶜ] = P[⋃_{m∈N} ⋂_{n≥m} Enᶜ] ≤ Σ_{m∈N} P[⋂_{n≥m} Enᶜ] = 0
and hence P[En i.o.] = 1.
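As a sanity check, one can simulate the second Borel-Cantelli lemma: with independent events En of probability 1/n the sum of probabilities diverges, so almost every sample path should contain arbitrarily late occurrences. A minimal sketch (Python with numpy; the horizon N is an arbitrary choice for illustration):

import numpy as np

rng = np.random.default_rng(0)
N = 200_000                               # simulation horizon
n = np.arange(1, N + 1)
occurred = rng.random(N) < 1.0 / n        # independent events E_n with P[E_n] = 1/n

# sum of P[E_n] ~ log N diverges, so by BC2 occurrences keep appearing
print("number of E_n that occurred:", occurred.sum())
print("largest n with E_n occurring:", n[occurred].max())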
Remarks:
1. Let (En)n∈N be the sequence of independent events. Then P [lim sup En] is either 0 or
1 by Kolmogorov’s 0 − 1 Law.
2. Furthermore, P[lim sup En] = 1 ⇔ Σ_{n∈N} P[En] = ∞, by combining Borel Cantelli Lemmas 1 and 2 (Lemmas 3.1 and 3.2).
Example 3.2
1. Let X ∼ N(0, 1). Then the following inequality holds for all x > 0:
f(x)/(x + x⁻¹) < P[X > x] < f(x)/x,   where f(x) = (1/√(2π)) e^(−x²/2)
This follows by noting that x ∫_x^∞ f(y) dy < ∫_x^∞ y f(y) dy, that f′(x) = −x f(x), and that
d/dx (f(x)/x) = −f(x) (1 + 1/x²).
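These Gaussian tail (Mills-ratio) bounds are easy to verify numerically; the sketch below (Python with scipy, purely illustrative) compares both bounds with the exact tail probability.

import numpy as np
from scipy.stats import norm

def f(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

for x in [0.5, 1.0, 2.0, 3.0, 5.0]:
    lower = f(x) / (x + 1 / x)
    upper = f(x) / x
    tail = norm.sf(x)                               # exact P[X > x]
    assert lower < tail < upper
    print(f"x={x:>4}: {lower:.3e} < {tail:.3e} < {upper:.3e}")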
2. Let Xn ∼ N(0, 1) be independent and let L = lim sup_{n→∞} Xn/√(2 log n). Show that P[L > 1] = 0.
Proof. Define En(a) := {Xn > (1 + a)√(2 log n)}, a ∈ R.
Note that {L > 1 + 1/k} ⊆ lim sup_{n→∞} En(1/(2k)) ∀ k ∈ N.
We want to show that P[lim sup En(1/(2k))] = 0 using Borel Cantelli Lemma 1 (Lemma 3.1).
Using part 1,
P[En(1/(2k))] < f((1 + 1/(2k))√(2 log n)) / ((1 + 1/(2k))√(2 log n)) = (1/√(2π)) · n^(−(1 + 1/(2k))²) / ((1 + 1/(2k))√(2 log n))
Since Σ_{n≥2} n^(−α)/√(log n) < ∞ for any α > 1, Borel Cantelli Lemma 1 (Lemma 3.1) gives P[lim sup En(1/(2k))] = 0, and hence P[L > 1 + 1/k] = 0.
Hence {L > 1} = ⋃_{k∈N} {L > 1 + 1/k} ⇒ P[L > 1] = 0. This is also equivalent to P[L ≤ 1] = 1.
3. Prove that P[L = 1] = 1.
Proof. It is sufficient to show that P[L < 1 − ε] = 0 ∀ ε > 0.
Recall that {L < 1} = ⋃_{n=2}^∞ {L < 1 − 1/n}, so if we can show that each {L < 1 − 1/n} has probability 0, then by countable union {L < 1} has probability 0.
We pick ε > 0, and consider the events:
En(a) = {Xn/√(2 log n) > 1 + a}
Then {L < 1 − ε} ⊆ (En(−ε)ᶜ ev) = (En(−ε) i.o.)ᶜ.
We want to show that P[lim sup En(−ε)] = 1.
So we need to prove that Σ_{n∈N} P[En(−ε)] = ∞, by showing that P[En(−ε)] ≥ an for some sequence an > 0 with Σ an = ∞ (using the inequalities in part 1).
Exercise: Find such a sequence an.
Note that the En(−ε) are independent since the random variables Xn are independent. Therefore Borel Cantelli Lemma 2 (Lemma 3.2) gives P[lim sup En(−ε)] = 1 ⇒ P[L < 1 − ε] = 0.
Exercise: Show that L ∈ mT .
Example 3.3
Let Xn ∼ N(0, 1) be independent random variables, and let Sn = X1 + . . . + Xn. Show that:
i) Sn/√n ∼ N(0, 1)
ii) lim inf Sn/n = lim sup Sn/n = 0 (which implies that lim Sn/n exists and equals 0)
Note that ii) is the strong Law of Large Numbers for N(0, 1) random variables.
For i), it is easy to check that E[Sn/√n] = 0 by properties of expectations. So all we need now is to check that Var[Sn/√n] = 1. This holds since Var[Sn/√n] = (1/√n)² Var[Sn] = (1/n) Σ_{i=1}^n Var[Xi] = 1 by independence, and a sum of independent normal random variables is normal.
For ii), we consider the events En = {|Sn| ≤ 2√(n log n)} and claim that P[En ev] = 1.
This claim is useful as it gives a bound for Sn: −2√(n log n) ≤ Sn ≤ 2√(n log n) for all large n, with probability 1.
Therefore we have:
−2√(n log n)/n ≤ Sn/n ≤ 2√(n log n)/n
for all large n (P−a.s.). But as n → ∞ both outer terms tend to 0, so lim Sn/n = 0, which proves ii). It then suffices to prove the claim.
To prove the claim, note that (En ev)ᶜ = (Enᶜ i.o.). We wish to apply Borel Cantelli Lemma 1 (Lemma 3.1), and hence we need to show that Σ_{n∈N} P[Enᶜ] is finite.
We need to find an upper bound P[Enᶜ] = P[|Sn|/√n ≥ 2√(log n)] ≤ an, say, such that Σ an < ∞.
Exercise: Find this upper bound.
We can then use Borel Cantelli Lemma 1 (Lemma 3.1) to show that P[Enᶜ i.o.] = 0.
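A quick simulation of ii) (Python with numpy; illustrative only) shows Sn/n collapsing to 0 while staying inside the 2√(n log n)/n envelope used in the claim:

import numpy as np

rng = np.random.default_rng(1)
N = 100_000
x = rng.standard_normal(N)
n = np.arange(1, N + 1)
s = np.cumsum(x)                          # partial sums S_n

envelope = 2 * np.sqrt(n * np.log(np.maximum(n, 2))) / n
inside = np.abs(s / n) <= envelope
print("S_N / N =", s[-1] / N)             # close to 0
print("fraction of n with |S_n/n| inside the envelope:", inside.mean())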
4 Integration
(Ω, F, µ) is a measure space, and mF = {f : Ω → R s.t. f⁻¹(B(R)) ⊆ F}.
The Lebesgue integral is first defined for f ∈ (mF)⁺, where:
f ∈ (mF)⁺ ⇔ f ∈ mF and f ≥ 0 µ−a.s.
Let f = Σ_{i=1}^n ai I_{Ai}, with Ai ∈ F and ai ≥ 0, be a simple function; its integral is ∫_Ω f dµ = Σ_{i=1}^n ai µ(Ai).
For general f ∈ (mF)⁺, we find a sequence (fn)n∈N of simple functions such that fn(ω) ↑ f(ω) ∀ ω ∈ Ω.
(Here ↑ means that fn(ω) is a monotone increasing sequence which converges to f(ω).)
We define:
∫_Ω f dµ = lim_{n→∞} ∫_Ω fn dµ     (4.1)
We need to check that (4.1) is a good definition, and hence need to check:
i) the limit in (4.1) exists (which is true since fn ≤ fn+1 ∀ n ⇒ ∫_Ω fn dµ ≤ ∫_Ω fn+1 dµ);
ii) if gn(ω) ↑ f(ω) ∀ ω ∈ Ω for another sequence of simple functions gn, then ∫_Ω gn dµ → ∫_Ω f dµ as n → ∞; in other words, the definition is independent of the approximating sequence (fn)n∈N.
Exercise: check ii).
Theorem 4.1 – Monotone convergence theorem
Take f, fn ∈ (mF)⁺ such that fn(ω) ≤ fn+1(ω) ∀ n ∈ N, ω ∈ Ω, and f(ω) = lim_{n→∞} fn(ω).
Then ∫_Ω f dµ = lim_{n→∞} ∫_Ω fn dµ.
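The staircase approximation behind (4.1) can be made concrete. The sketch below (Python with numpy; the dyadic truncation αn is the standard choice of simple function, not something prescribed by these notes) builds fn = αn(f) for f(x) = x² on ([0, 1], Lebesgue) and shows the integrals increasing towards ∫₀¹ x² dx = 1/3.

import numpy as np

def alpha(n, y):
    """Dyadic truncation: round y down to the grid 2^-n and cap at n."""
    return np.minimum(np.floor(y * 2**n) / 2**n, n)

x = np.linspace(0.0, 1.0, 1_000_001)   # fine grid standing in for ([0,1], Lebesgue)
f = x**2

for n in [1, 2, 4, 8, 16]:
    fn = alpha(n, f)                    # simple function, fn <= f, increasing in n
    print(n, fn.mean())                 # ~ integral of fn, increasing towards 1/3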
Properties of the Lebesgue Integral
• (Linearity) For a, b ≥ 0 and g, h ∈ (mF)⁺:
∫_Ω (ag + bh) dµ = a ∫_Ω g dµ + b ∫_Ω h dµ
• f ∈ (mF)⁺ ⇒ ∫_Ω f dµ ≥ 0 (from (4.1), since it is true for simple functions).
Definition 4.1 – Integrable
We define:
L¹(Ω, F, µ) = {f ∈ mF s.t. ∫_Ω f⁺ dµ < ∞ and ∫_Ω f⁻ dµ < ∞}
where f⁺ = max{f, 0} and f⁻ = max{−f, 0}. We then say f ∈ mF is integrable.
We have:
• ∫_Ω f dµ := ∫_Ω f⁺ dµ − ∫_Ω f⁻ dµ
• |f| = f⁺ + f⁻
Therefore:
|∫_Ω f dµ| = |∫_Ω f⁺ dµ − ∫_Ω f⁻ dµ| ≤ ∫_Ω f⁺ dµ + ∫_Ω f⁻ dµ = ∫_Ω |f| dµ
Lemma 4.1 – Fatou's lemma
Let (fn)n∈N be a sequence in (mF)⁺. Then we have:
∫_Ω lim inf_{n→∞} fn dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ
Proof. Recall that lim inf_{n→∞} fn = lim_{n→∞} gn, where gn = inf{fn, fn+1, . . .}.
Note that (gn)n∈N is non-decreasing and gn ≤ fn ∀ n ∈ N.
By the Monotone Convergence Theorem (Theorem 4.1), we have:
∫_Ω lim inf_{n→∞} fn dµ = ∫_Ω lim_{n→∞} gn dµ
                       = lim_{n→∞} ∫_Ω gn dµ
                       = lim inf_{n→∞} ∫_Ω gn dµ   (if the limit exists, it equals the lim inf)
                       ≤ lim inf_{n→∞} ∫_Ω fn dµ   (using ∫_Ω gn dµ ≤ ∫_Ω fn dµ ∀ n ∈ N)
Theorem 4.2 – Dominated convergence theorem
Let fn, f ∈ mF and assume that ∃ g ∈ L¹(Ω, F, µ) s.t. |fn| ≤ g ∀ n ∈ N and lim_{n→∞} fn(ω) = f(ω) ∀ ω ∈ Ω. Then ∫_Ω f dµ = lim_{n→∞} ∫_Ω fn dµ.
Proof. Note that f ∈ L¹ since |f| ≤ g: indeed f⁺ ≤ g ⇒ ∫_Ω f⁺ dµ ≤ ∫_Ω g dµ, so f⁺, f⁻ ∈ L¹ and are bounded by g. We wish to show that ∫_Ω |f − fn| dµ → 0.
Note that:
Fn := 2g − |f − fn| ≥ g − |f| + g − |fn| ≥ 0 ∀ n
By Fatou's lemma (Lemma 4.1) applied to (Fn)n∈N, we get:
∫_Ω lim inf Fn dµ ≤ lim inf ∫_Ω Fn dµ     (4.2)
For the LHS of (4.2), since |f − fn| → 0 pointwise:
∫_Ω lim inf_{n→∞} Fn dµ = ∫_Ω (2g − 0) dµ = ∫_Ω 2g dµ
For the RHS of (4.2):
lim inf_{n→∞} ∫_Ω Fn dµ = ∫_Ω 2g dµ − lim sup_{n→∞} ∫_Ω |f − fn| dµ
Rearranging (4.2), we have:
∫_Ω 2g dµ ≤ ∫_Ω 2g dµ − lim sup_{n→∞} ∫_Ω |f − fn| dµ
and since ∫_Ω 2g dµ is finite, this implies lim sup_{n→∞} ∫_Ω |f − fn| dµ ≤ 0, i.e. lim sup_{n→∞} ∫_Ω |f − fn| dµ = 0.
Since |f − fn| is non-negative and lim inf ≤ lim sup, it follows that lim_{n→∞} ∫_Ω |fn − f| dµ = 0.
Therefore |∫_Ω f dµ − ∫_Ω fn dµ| ≤ ∫_Ω |f − fn| dµ → 0.
5 Expectations
We take (Ω, F, P) to be our probability space, and X a random variable, which means that X ∈ mF.
If X ≥ 0, then E[X] = ∫_Ω X dP = ∫_Ω X(ω) P[dω].
For X ∈ mF, we say X ∈ L¹(Ω, F, P) if E[X⁺], E[X⁻] < ∞, where:
X⁺ = max{X, 0},   X⁻ = max{−X, 0}
The expectation of X is then given by E[X] = E[X⁺] − E[X⁻].
Proposition 5.1
Let X be a random variable on (Ω, F, P) and let g : R → R be Borel measurable. Then g(X) is in L¹(Ω, F, P) ⇔ g ∈ L¹(R, B, P_X), where P_X[A] = P[X ∈ A] ∀ A ∈ B(R). In that case we have:
E[g(X)] = ∫_R g(x) P_X[dx]     (5.1)
Remarks:
1. If X is a continuous random variable, i.e. P_X ≪ γ_L (the Lebesgue measure), equivalently P_X[A] = ∫_A f_X(x) dx, then by Proposition 5.1 we have E[g(X)] = ∫_R g(x) f_X(x) dx.
2. If X is a discrete random variable, e.g. X ∈ N with probability 1, then E[g(X)] = Σ_{k∈N} g(k) P[X = k], where P_X[{k}] = P[X = k].
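Both remarks can be checked numerically. The sketch below (Python with numpy/scipy; the choices g(x) = x² and X ∼ Exp(1) are arbitrary examples, not from the notes) computes E[g(X)] once against the density, as in remark 1, and once by averaging g(X(ω)) over simulated outcomes.

import numpy as np
from scipy import integrate

g = lambda x: x**2
f_X = lambda x: np.exp(-x)               # density of X ~ Exp(1) on [0, inf)

# E[g(X)] via the law of X (remark 1): integral of g(x) f_X(x) dx
law_value, _ = integrate.quad(lambda x: g(x) * f_X(x), 0, np.inf)

# E[g(X)] via simulation on Omega: average of g(X(omega))
rng = np.random.default_rng(2)
mc_value = g(rng.exponential(1.0, size=1_000_000)).mean()

print(law_value, mc_value)               # both close to E[X^2] = 2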
Proof. (of Proposition 5.1)
We show that (5.1) holds in stages: first for indicator functions, then simple functions, then non-negative functions, and finally all integrable g.
Let g = I_A, A ∈ B(R). Then (5.1) holds, since E[I_A(X)] = P[X ∈ A] = P_X[A]. (Indicator functions)
By linearity of integrals, and since simple functions are finite weighted sums of indicator functions, (5.1) holds for simple g as well. (Simple functions)
Assume g ≥ 0 and let 0 ≤ gn ↑ g be a monotone sequence of simple functions converging to g. We have E[gn(X)] = ∫_R gn(x) P_X[dx]. Since gn(X) is a simple random variable on Ω with gn(X) ↑ g(X), the Monotone Convergence Theorem (Theorem 4.1) gives E[g(X)] = lim_{n→∞} E[gn(X)] and ∫_R gn dP_X ↑ ∫_R g dP_X, so (5.1) holds for g ≥ 0. (Non-negative functions)
Lastly, take g ∈ L¹(R, B, P_X) and apply (5.1) to g⁺, g⁻. Then by linearity of the integral, (5.1) holds in general. (All integrable g)
Lemma 5.1
X ∈ (mF)⁺ and E[X] = 0 ⇒ P[X = 0] = 1 (⇔ P[X > 0] = 0).
Proof. Note that {X > 0} = ⋃_{n∈N} {X > 1/n}.
We attempt a proof by contradiction, and assume that P[X > 0] > 0.
P[X > 0] > 0 ⇒ ∃ n ∈ N s.t. P[X > 1/n] > 0.
Then we have:
E[X] = ∫_Ω X dP
     = ∫_Ω X I_{X>1/n} dP + ∫_Ω X I_{X≤1/n} dP
     ≥ ∫_Ω X I_{X>1/n} dP
     ≥ ∫_Ω (1/n) I_{X>1/n} dP
     = (1/n) P[X > 1/n]
     > 0
which is a contradiction. Therefore P[X > 0] = 0.
6 Inequalities
Definition 6.1 – Convex function (in R)
A function f : I → R, where I ⊆ R is an interval (either open or closed) is convex if
∀ p ∈ (0, 1) and x, y ∈ I, we have f(px + (1 − p)y) ≤ pf(x) + (1 − p)f(y).
If f is a convex function, then f is continuous. Exercise: Prove by contradiction.
Example 6.1 – Examples of convex functions
1. x → |x|
2. x → x2
3. x → eθx
∀ θ ∈ R
Example 6.2 – Examples of non-convex functions
1. x → −|x| (concave function)
2. x → sin x (neither convex or concave) Exercise: Prove this.
Proposition 6.1
If f is both convex and concave on R, there exists a, b ∈ R such that f(x) = ax + b ∀x ∈ R.
Exercise: Prove this.
Exercise: Prove that a concave function is continuous (this follows from a convex function
is continuous).
Proposition 6.2
If f : I → R is in C²(I), then f is convex ⇔ f″(x) ≥ 0 ∀ x ∈ I.
Proof.
(⇒)
Using Taylor's theorem, we can expand, for ε > 0:
f(x + ε) = f(x) + ε f′(x) + (ε²/2) f″(ξ₊), where ξ₊ ∈ (x, x + ε)
and
f(x − ε) = f(x) − ε f′(x) + (ε²/2) f″(ξ₋), where ξ₋ ∈ (x − ε, x)
Adding these, we can write:
f″(x) = lim_{ε↓0} (f(x + ε) + f(x − ε) − 2f(x)) / ε²     (6.1)
since ξ₊, ξ₋ → x as ε ↓ 0 and f″ is continuous.
Assume f is convex.
Then we can write x = p(x − ε) + (1 − p)(x + ε) with p = 1/2.
By convexity, we can write:
f(x) = f((1/2)(x − ε) + (1/2)(x + ε)) ≤ (1/2) f(x − ε) + (1/2) f(x + ε)
This gives f(x + ε) + f(x − ε) − 2f(x) ≥ 0, and since ε² > 0, (6.1) implies f″(x) ≥ 0.
(⇐) Exercise
Theorem 6.1 – Markov's inequality
Let (Ω, F, P) be a probability space. Take X ∈ mF, and let g : I → [0, ∞] be a non-decreasing B−measurable function, where I ⊆ R is an interval such that P[X ∈ I] = 1.
Then g(c) · P[X ≥ c] ≤ E[g(X)] ∀ c ∈ I.
Note that E[g(X)] exists (it may be +∞) since g(X) ∈ (mF)⁺.
Proof.
g(c) · P[X ≥ c] = E[g(c) · I_{X≥c}]
              ≤ E[g(X) · I_{X≥c}]   since on {X ≥ c} we have g(X) ≥ g(c), as g is non-decreasing
              ≤ E[g(X)]   since g(X) ≥ 0.
Example 6.3 – Examples of using Markov's inequality
Suppose X ∈ mF and ε > 0. Then:
P[|X| ≥ ε] ≤ E[|X|]/ε     (6.2)
and
P[|X| ≥ ε] ≤ E[X²]/ε²     (6.3)
(6.2) follows by applying Markov's inequality (Theorem 6.1) to the random variable |X| with g : [0, ∞] → [0, ∞], x → x.
(6.3) follows by applying Markov's inequality (Theorem 6.1) to the random variable |X| with g : [0, ∞] → [0, ∞], x → x². (6.3) is also known as Chebyshev's inequality.
Theorem 6.2 – Jensen's inequality
Let (Ω, F, P) be a probability space. Let X be a random variable such that P[X ∈ I] = 1, where I ⊆ R is an interval. Let g : I → R be a convex function such that E[g(X)] < ∞ and E[|X|] < ∞. Then g(E[X]) ≤ E[g(X)].
Proof. Since g is convex, we have g(x) = sup_{n∈N} {an x + bn} ∀ x ∈ I for some sequences (an)n∈N, (bn)n∈N. Hence:
g(X) ≥ an X + bn ∀ n ∈ N
⇒ E[g(X)] ≥ an E[X] + bn ∀ n ∈ N
⇒ E[g(X)] ≥ sup_{n∈N} {an E[X] + bn} = g(E[X])
Remarks:
1. If the assumptions of Jensen's inequality (Theorem 6.2) hold and g is concave, then we get the reverse inequality g(E[X]) ≥ E[g(X)].
2. If a random variable X takes two values x, y ∈ I with p = P[X = x] and 1 − p = P[X = y], then Jensen's inequality (Theorem 6.2) is just the definition of convexity of g, i.e. g(px + (1 − p)y) ≤ pg(x) + (1 − p)g(y).
3. Under the assumptions above, we have E[X] ∈ I.
Exercise: Prove this. Hint: If I = (a, b), then P[X < b] = 1 and P[X > a] = 1, which gives a < E[X] < b.
Hence g(E[X]) is well defined.
Definition 6.2 – Lᵖ space
We define Lᵖ(Ω, F, P) to be {X ∈ mF : E[|X|ᵖ] < ∞}.
p = 1, 2 are the most common, but for every p ≥ 1 we get a vector space.
Proof. (Lᵖ, p ≥ 1, is a vector space)
We first note that:
(x + y)ᵖ ≤ (2 max{x, y})ᵖ ≤ 2ᵖ (xᵖ + yᵖ) ∀ x, y ≥ 0     (6.4)
Take X, Y ∈ Lᵖ. We need to show that E[|αX + βY|ᵖ] < ∞ for α, β ∈ R. So:
E[|αX + βY|ᵖ] ≤ E[(|αX| + |βY|)ᵖ]   by the triangle inequality
            ≤ 2ᵖ (E[|α|ᵖ |X|ᵖ] + E[|β|ᵖ |Y|ᵖ])   using (6.4)
            < ∞
Definition 6.3 – ‖X‖_p
We define ‖X‖_p := (E[|X|ᵖ])^(1/p) for X ∈ Lᵖ, p ≥ 1.
Note that this is not quite a norm on Lᵖ: the first property fails, since ‖X‖_p = 0 only implies X = 0 (P−a.s.), not X = 0 everywhere.
Theorem 6.3 – Cauchy-Schwarz inequality
Take X, Y ∈ L². Then XY ∈ L¹ and:
|E[XY]| ≤ E[|XY|] ≤ (E[X²] E[Y²])^(1/2)
Furthermore, we have equality |E[XY]| = (E[X²] E[Y²])^(1/2) if and only if there exist a, b ∈ R with |a| + |b| > 0 such that aX + bY = 0 (P−a.s.).
Proof. We first note that:
0 ≤ (X + λY)² ∀ λ ∈ R     (6.5)
and X + λY ∈ L².
Hence XY = (1/2)[(X + Y)² − X² − Y²] ∈ L¹.
From (6.5), we have:
0 ≤ X² + 2λXY + λ²Y² ⇒ 0 ≤ E[X²] + 2λ E[XY] + λ² E[Y²] ∀ λ ∈ R
Note that if E[Y²] = 0, then Y = 0 (P−a.s.) and the Cauchy-Schwarz inequality (Theorem 6.3) holds trivially. So WLOG we assume E[Y²] > 0.
Differentiating the quadratic in λ, the minimum is attained at λ = −E[XY]/E[Y²].
Substituting this value of λ, we get:
0 ≤ E[X²] − 2 E[XY]²/E[Y²] + E[XY]²/E[Y²] ⇒ E[XY]² ≤ E[X²] E[Y²]
which proves the inequality (applied to |X|, |Y| it also gives E[|XY|] ≤ (E[X²] E[Y²])^(1/2), and |E[XY]| ≤ E[|XY|] always holds).
If equality holds, then:
0 = E[(X + λY)²] for λ = −E[XY]/E[Y²]
which implies that X + λY = 0 (P−a.s.), so we may take a = 1, b = λ.
If Y = 0 (P−a.s.), equality holds trivially and we may take a = 0, b = 1.
As L²(Ω, F, P) is a vector space, we define the 'inner product' ⟨X, Y⟩ = E[XY]. This is well defined since X, Y ∈ L² ⇒ XY ∈ L¹.
Then the Cauchy-Schwarz inequality takes the form:
|⟨X, Y⟩| ≤ ‖X‖₂ ‖Y‖₂,   where ‖X‖₂ = (E[|X|²])^(1/2).
Note that the triangle inequality ‖X + Y‖₂ ≤ ‖X‖₂ + ‖Y‖₂ holds for X, Y ∈ L² by the Cauchy-Schwarz inequality (Theorem 6.3).
Proof.
‖X + Y‖₂² = E[(X + Y)²] = E[X²] + E[Y²] + 2E[XY]
         ≤ E[X²] + E[Y²] + 2 (E[X²])^(1/2) (E[Y²])^(1/2)   by Cauchy-Schwarz
         = (‖X‖₂ + ‖Y‖₂)²
Theorem 6.4 – Monotonicity of Lᵖ norms
Given X ∈ Lᵖ(Ω, F, P), p ≥ 1, with ‖X‖_p = (E[|X|ᵖ])^(1/p), then for 1 ≤ p ≤ r < ∞ and any Y ∈ Lʳ(Ω, F, P) we have ‖Y‖_p ≤ ‖Y‖_r. In particular, Lʳ ⊆ Lᵖ.
Proof. Note that g(x) = x^(r/p) is convex on [0, ∞). Then we have:
g(E[|Y|ᵖ]) ≤ E[g(|Y|ᵖ)]   by Jensen's inequality
⇒ (E[|Y|ᵖ])^(r/p) ≤ E[|Y|ʳ]
⇒ ‖Y‖_p ≤ ‖Y‖_r   by taking r-th roots on both sides
Remark:
Note that Theorem 6.4 holds for probability measures only. Exercise: Find f ∈ L²(R, B, γ_L) such that f ∉ L¹(R, B, γ_L).
Recap of definitions from probability (all well defined if X, Y ∈ L²):
• Cov[X, Y ] = E[(X − E[X])(Y − E[Y ])]
• Var[X] = Cov[X, X]
• |Cov[X, Y ]| ≤ (Var[X] Var[Y ])^(1/2) (by the Cauchy-Schwarz inequality)
• ρ(X, Y ) = Cov[X, Y ] / (Var[X] Var[Y ])^(1/2) ∈ [−1, 1] (the correlation between two random variables)
Theorem 6.5 – Independence
If random variables X, Y ∈ L1
(Ω, F, P) and X and Y are independent, then XY ∈
L1
(Ω, F, P). Furthermore E[XY ] = E[X]E[Y ].
Remarks:
1. Let f, g : R → R ∈ mB and (independent) X, Y as in Theorem 6.5. Then if
E [f(X)] , E [g(Y )] are finite, we have:
E [f(X)g(Y )] = E [f(X)] E [g(Y )] (6.6)
Exercise: Prove that X, Y independent ⇒ f(X), g(Y ) independent. Use the fact that
f(X), g(Y ) ∈ mF.
To prove (6.6), we apply Theorem 6.5 to f(X) and g(Y ).
2. If X, Y are independent in L2
⇒ Cov[X, Y ] = 0. So:
Cov[X, Y ] = E [(X − E[X])(Y − E[Y ])]
= E [X − E[X]] E [Y − E[Y ]]
= 0
Example 6.4
Take E[X] = 0, E[|X|3
] < ∞. In other words, X ∈ L3
and E[X] = 0. If E[X3
] = 0, then
Cov[X, X2
] = 0 and X and X2
are not independent.
Prove this.
Proof. (Sketch proof of Theorem 6.5; Exercise: write out the full proof.)
Note that it is enough to prove the theorem for X, Y ∈ L¹ ∩ (mF)⁺, since X = X⁺ − X⁻ and Y = Y⁺ − Y⁻, the positive and negative parts are independent (X⁺ = max{X, 0} and Y⁺ = max{Y, 0} are functions of X and Y respectively), and we can then use linearity of expectation.
So assume X, Y ≥ 0 and note that α⁽ⁿ⁾(X) ↑ X and α⁽ⁿ⁾(Y ) ↑ Y ∀ ω ∈ Ω, where α⁽ⁿ⁾ : R → R is given by:
α⁽ⁿ⁾(x) := 0 if x = 0;   (i − 1)2⁻ⁿ if (i − 1)2⁻ⁿ < x ≤ i 2⁻ⁿ ≤ n, i ∈ N;   n if x > n
Then note (show this as an exercise):
1. α⁽ⁿ⁾(X) is a simple random variable.
2. α⁽ⁿ⁾(X), α⁽ⁿ⁾(Y ) are independent.
3. E[α⁽ⁿ⁾(X) α⁽ⁿ⁾(Y )] = E[α⁽ⁿ⁾(X)] E[α⁽ⁿ⁾(Y )] ∀ n.
4. α⁽ⁿ⁾(X) α⁽ⁿ⁾(Y ) ↑ XY as n → ∞.
5. Theorem 6.5 follows by applying the Monotone Convergence Theorem (Theorem 4.1) to 3.
7 Convergence of Random Variables
Let (Xn)n∈N be a sequence of random variables on (Ω, F, P).
Definition 7.1 – Converges almost surely
The sequence (Xn)n∈N converges to a random variable X almost surely if the set
{lim_{n→∞} Xn = X} = {ω ∈ Ω : lim_{n→∞} Xn(ω) = X(ω)}
has probability 1, i.e. P[lim_{n→∞} Xn = X] = 1.
Definition 7.2 – Converges in probability
The sequence (Xn)n∈N converges in probability to a random variable X if:
∀ ε > 0, lim_{n→∞} P[|Xn − X| > ε] = 0
Definition 7.3 – Converges in Lᵖ
Let Xn ∈ Lᵖ(Ω, F, P), p ≥ 1, ∀ n. Then the sequence (Xn)n∈N converges in Lᵖ to a random variable X ∈ Lᵖ if E[|Xn − X|ᵖ] → 0 as n → ∞.
Notation: Xn → X in ‖·‖_p.
Definition 7.4 – Cauchy (in Lᵖ)
A sequence (Xn)n∈N is Cauchy in Lᵖ if:
∀ ε > 0 ∃ N ∈ N s.t. ‖Xn − Xm‖_p < ε ∀ n, m > N
Definition 7.5 – Converges in distribution
The sequence (Xn)n∈N converges in distribution to a random variable X if:
lim_{n→∞} P[Xn ≤ x] = P[X ≤ x] for every x ∈ R at which the cdf F_X(y) = P[X ≤ y] is continuous.
Convergence in distribution is also known as weak convergence.
Notation: Xn →d X or Xn →w X.
Remarks:
1. Note that if Xn →d X, then the random variables (Xn)n∈N, X need not be defined on the same probability space. For the other modes of convergence, (Xn)n∈N and X have to be defined on the same probability space.
2. (Xn)n∈N in Lᵖ is Cauchy if and only if sup_{n,m≥N} ‖Xn − Xm‖_p → 0 as N → ∞.
Lemma 7.1
Convergence in probability implies almost sure convergence along a subsequence. In other words, if (Xn)n∈N converges in probability to X, with (Xn)n∈N, X ∈ mF on a probability space (Ω, F, P), then there exists a subsequence (Xkn)n∈N s.t. Xkn → X a.s.
Proof. (Idea of proof; examinable.)
Let (εn)n∈N be a decreasing sequence of positive real numbers with εn ↓ 0.
Then ∀ n ∈ N, ∃ kn ∈ N s.t. P[|Xkn − X| > εn] < 2⁻ⁿ (since Xn → X in probability).
WLOG, we can assume that kn < kn+1 ∀ n ∈ N.
Now we prove that (Xkn)n∈N tends to X almost surely, using Borel Cantelli Lemma 1 (Lemma 3.1).
Note that ∀ ω ∈ Ω, we have:
(Xkn(ω))n∈N converges to X(ω) ⇔ ω ∈ ⋂_{m∈N} lim inf_{n→∞} {|Xkn − X| ≤ 1/m}     (7.1)
Fix m; then note:
lim inf_{n→∞} {|Xkn − X| ≤ 1/m} ⊇ lim inf_{n→∞} {|Xkn − X| ≤ εn}   (since εn ↓ 0)
                              = (lim sup_{n→∞} {|Xkn − X| > εn})ᶜ
Now:
Σ_{n∈N} P[|Xkn − X| > εn] < ∞
⇒ P[lim sup_{n→∞} {|Xkn − X| > εn}] = 0   by Borel Cantelli Lemma 1 (Lemma 3.1)
⇒ P[lim inf_{n→∞} {|Xkn − X| ≤ 1/m}] = 1 ∀ m ∈ N
⇒ P[⋂_{m∈N} lim inf_{n→∞} {|Xkn − X| ≤ 1/m}] = 1
⇒ P[{Xkn → X}] = 1   by (7.1)
Remark:
If (Xn)n∈N, X are random variables on (Ω, F, P) and Xn ∈ mG ∀ n ∈ N, where G ⊆ F is a sub−σ−algebra, then Xn → X in probability implies X ∈ mG (up to modification on a P−null set).
Proof. By Lemma 7.1, ∃ a subsequence (Xkn)n∈N s.t. Xkn → X a.s. as n → ∞, which gives X ∈ mG since Xkn ∈ mG ∀ n ∈ N.
Proposition 7.1
A sequence of random variables (Xn)n∈N converges to X in distribution (equivalently, converges weakly) if and only if lim_{n→∞} E[f(Xn)] = E[f(X)] for every f : R → R continuous and bounded.
Proof. (⇒)
Let Fn(x) = P[Xn ≤ x] and F(x) = P[X ≤ x] be the cdfs of Xn and X respectively.
Let ([0, 1], B, γ_L) be a probability space and define random variables:
Yn(ω) := inf{z ∈ R : ω ≤ Fn(z)} ∀ ω ∈ [0, 1]
Y (ω) := inf{z ∈ R : ω ≤ F(z)} ∀ ω ∈ [0, 1]
Exercise: Show that Yn, Y ∈ mB.
Note: Yn(ω) ≤ y ⇔ ω ≤ Fn(y) ∀ y ∈ R. Exercise: Show this.
Therefore:
γ_L(Yn ≤ y) = Fn(y) = P[Xn ≤ y]
and E[f(Xn)] = E[f(Yn)]. A similar equality holds for X and Y .
Now:
Xn →d X ⇒ Fn(x) → F(x) ∀ x ∈ R \ {points of discontinuity of F}
        ⇒ Yn → Y γ_L−a.s.
Hence E[f(Yn)] → E[f(Y )] = E[f(X)] as n → ∞ by the Dominated Convergence Theorem (Theorem 4.2), since f(Yn) → f(Y ) (f is continuous) and |f(Yn)| ≤ sup_{x∈R} |f(x)| < ∞.
(⇐) is left as homework.
Theorem 7.1 – Modes of Convergence
The implications between the modes of convergence of random variables are:
a) almost sure convergence implies convergence in probability.
b) Lp
convergence (for p ≥ 1) implies convergence in probability.
c) convergence in probability implies convergence in distribution.
Proof.
a) Let (Xn)n∈N converge almost surely to X, and pick ε > 0. We need to prove that P[|Xn − X| > ε] → 0 as n → ∞.
Let An := {|Xn − X| > ε} and note that (An i.o.) ⊆ {Xn does not converge to X}, so P[An i.o.] = 0 by our initial assumption. Then:
0 = P[An i.o.]
  = P[lim sup_{n→∞} An]
  = P[⋂_{m∈N} Bm]   where Bm = ⋃_{n≥m} An
  = lim_{m→∞} P[Bm]   since Bm ⊇ Bm+1 ∀ m ∈ N
  = inf_{m∈N} P[Bm]   since (P[Bm])m∈N is decreasing
  ≥ inf_{m∈N} sup_{n≥m} P[An]   since Bm ⊇ An ∀ n ≥ m
  = lim_{m→∞} sup_{n≥m} P[An] = lim sup_{n→∞} P[An] ≥ 0
This implies that lim_{n→∞} P[An] = 0.
b) Let (Xn)n∈N converge in Lᵖ to X.
Pick ε > 0; then we apply Markov's inequality (Theorem 6.1) with f(x) = xᵖ, f : R⁺ → R⁺, to get:
0 ≤ εᵖ P[|Xn − X| > ε] ≤ E[|Xn − X|ᵖ]
Hence lim_{n→∞} P[|Xn − X| > ε] = 0, since lim_{n→∞} E[|Xn − X|ᵖ] = 0 by assumption.
c) Let Xn → X in probability, and pick f : R → R continuous and bounded.
We need to check that E[f(Xn)] → E[f(X)] as n → ∞ (this gives convergence in distribution by Proposition 7.1).
We argue by contradiction. Suppose not: then ∃ ε > 0 and an increasing subsequence (kn)n∈N, kn ∈ N, s.t. |E[f(Xkn)] − E[f(X)]| > ε ∀ n.
Denote Yn := Xkn ∀ n ∈ N. Then note that:
Xkn → X in probability ⇒ ∃ a subsequence (Yln)n∈N of (Yn)n∈N s.t. Yln → X a.s. as n → ∞, by Lemma 7.1
⇒ f(Yln) → f(X) a.s. as n → ∞, as f is continuous
⇒ lim_{n→∞} E[f(Yln)] = E[f(X)], by the Dominated Convergence Theorem (Theorem 4.2), as f is bounded
⇒ |E[f(Yln)] − E[f(X)]| < ε ∀ n ≥ N₀ for some N₀ ∈ N
This contradicts the choice of (kn)n∈N.
Corollary 7.1
Xn → X in probability if and only if every subsequence (Xkn )n∈N of (Xn)n∈N has a further
subsequence that converges almost surely to X.
Proof.
(⇒)
This follows from Lemma 7.1, since Xkn → X in probability.
(⇐)
We prove the contrapositive.
Assume that (Xn)n∈N does not converge to X in probability. Then:
∃ ε, δ > 0 and an increasing k : N → N s.t. P[Ak(n)] ≥ δ ∀ n ∈ N, where Ak(n) = {|Xk(n) − X| > ε}
We claim no subsequence of (Xk(n))n∈N converges to X almost surely.
Let l : N → k(N) be increasing. We must show that the subsequence (Xl(n))n∈N of (Xk(n))n∈N does not converge to X almost surely.
Note that:
P[Al(n) i.o.] = P[lim sup_{n→∞} Al(n)] ≥ lim sup_{n→∞} P[Al(n)]   (by the reverse Fatou inequality, cf. Lemma 4.1)
            ≥ δ > 0
so (Xl(n))n∈N cannot converge to X almost surely. This proves the contrapositive.
Corollary 7.2 – Continuous mapping theorem
Let (Xn)n∈N converge to X in probability (or respectively converge to X in distribution), and
let f : R → R be a continuous function. Then (f(Xn))n∈N converges to f(X) in probability
(or respectively in distribution).
Proof.
Convergence in probability:
By Corollary 7.1, f(Xn) → f(X) in probability, since every subsequence (f(Xk(n)))n∈N has a further subsequence that tends to f(X) almost surely (the corresponding subsequence of Xk(n) converges to X almost surely, and f is continuous).
Convergence in distribution:
By Proposition 7.1, f(Xn) → f(X) in distribution, since E[g(f(Xn))] → E[g(f(X))] as n → ∞ for every g : R → R continuous and bounded (g ∘ f is itself continuous and bounded).
Theorem 7.2 – Weak law of large numbers
Let {Yn : n ∈ N} be independent, identically distributed random variables with Yn ∈ L², and let µ = E[Yn]. (µ is finite since L² ⊆ L¹.) Define Xn = (1/n) Σ_{i=1}^n Yi. Then Xn → µ in probability as n → ∞.
Proof. For every ε > 0, we have:
ε² P[|Xn − µ| ≥ ε] ≤ E[(Xn − µ)²]   by Markov's inequality (Theorem 6.1)
                 = Var[Xn]
                 = (1/n²) Var[Σ_{i=1}^n Yi]
                 = Var[Y1]/n → 0 as n → ∞
using independence in the second-to-last step. Therefore Xn → µ in probability.
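A quick simulation of the weak law (Python with numpy; the uniform distribution, the tolerance ε and the number of repetitions are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(4)
mu, eps = 0.5, 0.05                        # Y_i ~ Uniform(0,1), so mu = 1/2
for n in [10, 100, 1_000, 10_000]:
    samples = rng.random((20_000, n))      # 20,000 independent copies of (Y_1, ..., Y_n)
    x_n = samples.mean(axis=1)             # X_n = (Y_1 + ... + Y_n) / n
    print(n, (np.abs(x_n - mu) > eps).mean())   # estimate of P[|X_n - mu| > eps], decreasing in n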
Theorem 7.3 – Strong law of large numbers
Let {Yn : n ∈ N} be independent, identically distributed random variables with Yn ∈ L⁴, and let µ = E[Yn]. (µ is finite since L⁴ ⊆ L¹.) With Xn = (1/n) Σ_{i=1}^n Yi as above, Xn → µ almost surely as n → ∞.
Proof. Without loss of generality, we assume µ = 0; otherwise we could consider Yn − µ. Expanding the fourth power and using independence and E[Yi] = 0 (so all terms containing a lone factor Yi vanish):
E[Xn⁴] = (1/n⁴) (Σ_{i=1}^n E[Yi⁴] + 6 Σ_{1≤i<j≤n} E[Yi² Yj²])
      ≤ (1/n⁴) (n c + 6 (n(n − 1)/2) E[Y1² Y2²])   for some constant c > 0
      ≤ d/n²   for some constant d > 0
Thus E[Σ_{n=1}^∞ Xn⁴] ≤ Σ_{n=1}^∞ d/n² < ∞, so Σ_{n=1}^∞ Xn⁴ < ∞ (P−a.s.), hence Xn⁴ → 0 (P−a.s.).
Therefore Xn → 0 almost surely as n → ∞.
Theorem 7.4 – Completeness of Lp
The space Lᵖ(Ω, F, P) is complete for any p ≥ 1. In other words, any Cauchy sequence (Xn)n∈N in Lᵖ has a limit in Lᵖ: there exists a random variable X in Lᵖ such that Xn → X in ‖·‖_p.
Proof. Exercise
Remarks:
1. The proof of this theorem uses the Borel-Cantelli lemma, Fatou's lemma, etc. See the notes for details.
2. If p = 2, we define ⟨X, Y⟩ = E[XY] for X, Y ∈ L². Pythagoras' Theorem says: if X, Y ∈ L² satisfy ⟨X, Y⟩ = 0 (i.e. they are orthogonal), then:
‖X + Y‖₂² = ‖X‖₂² + ‖Y‖₂²     (7.2)
where ‖X‖₂ = ⟨X, X⟩^(1/2) = (E[X²])^(1/2).
Proof.
‖X + Y‖₂² = ⟨X + Y, X + Y⟩ = ‖X‖₂² + ‖Y‖₂² + 2⟨X, Y⟩ = ‖X‖₂² + ‖Y‖₂²   since ⟨X, Y⟩ = 0
3. In probabilistic language, if ⟨X, Y⟩ = 0 and E[X] = E[Y] = 0, then Cov[X, Y] = 0. Furthermore, Var[X + Y] = Var[X] + Var[Y]. This is equivalent to (7.2).
4. Parallelogram Law: for all X, Y ∈ L²,
(1/2) (‖X + Y‖₂² + ‖X − Y‖₂²) = ‖X‖₂² + ‖Y‖₂²
Proof. Exercise
Theorem 7.5
Let (Ω, F, P) be a probability space and G ⊆ F a sub−σ−algebra of F. Then L²(Ω, G, P) is a complete subspace of L²(Ω, F, P), and ∀ X ∈ L²(Ω, F, P) there exists Y ∈ L²(Ω, G, P) such that the following hold:
i) ‖X − Y‖₂ = inf{‖X − Z‖₂ : Z ∈ L²(Ω, G, P)}
ii) E[(X − Y)Z] = 0 ∀ Z ∈ L²(Ω, G, P)
Furthermore, i) and ii) are equivalent, and Y′ ∈ L²(Ω, G, P) satisfies i) or ii) if and only if Y′ = Y (P−a.s.).
Note that ii) ⇔ ⟨X − Y, Z⟩ = 0 ∀ Z ∈ L²(Ω, G, P).
Proof. We need to show that L2
(Ω, G, P) is complete. Then:
Take (Xn)n∈N in L2
(Ω, G, P) Cauchy ⇒ (Xn)n∈N is Cauchy in L2
(Ω, F, P)
⇒ Xn
· 2
−→ X ∈ L2
(Ω, F, P)
⇒ Xn
P
−→ X by Theorem 7.1
⇒ ∃ subsequence Xkn −−−−→
n→∞
X a.s.
⇒ X ∈ mG
⇒ X ∈ L2
(Ω, G, P)
There ∃ (Yn)n∈N ∈ L2
(Ω, G, P) such that:
X − Yn 2 → d := inf X − Z : Z ∈ L2
(Ω, G, P)
We apply the parallelogram law to X − Ym, X − Yn:
Ym − Yn
2
2 = 2 ( Ym − X 2
2 + X − Yn
2
2) − 4 X − (Ym + Yn)/2 2
2
≤ 2 ( Ym − X 2
2 + X − Yn
2
2) − 4d2
≤ 2(d2
+ d2
) − 4d2
as n, m → ∞
= 0
Hence (Yn)n∈N is Cauchy in L2
(Ω, G, P) such that Yn − Y 2 −−−−→
n→∞
0.
Note: d ≤ X − Y 2 ≤ X − Yn 2 + Yn − Y 2.
For every n ∈ N ⇒ d ≤ X − Y ≤ d.
Hence i) holds.
Now we show that i) ⇒ ii) by contradiction.
Assume ∃Z ∈ L2
(Ω, G, P) such that < X − Y, Z > > 0 and Z 2 = 1.
Then Y + < X − Y, Z > Z ∈ L2
(Ω, G, P).
X − (Y + < X − Y, Z > Z 2
2 = X − Y 2
2+ < X − Y, Z >2
Z 2
− 2 < X − Y, Z >2
= X − Y 2
2− < X − Y, Z >2
< X − Y 2
2
This is a contradiction because of i): we know that X−Y 2 = inf { X − Z 2 : Z ∈ L2
(Ω, G, P)}.
But X − Y 2 is the smallest element, and we cannot have anything smaller than that.
Hence i) ⇒ ii).
To see that ii) ⇒ i), note that:
X − Z 2
2 = |(X − Y ) + (Y − Z) 2
2
= X − Y 2
2 + Y − Z 2
2 by Pythagoras Theorem since Y − Z ∈ L2
(Ω, G, P)
≥ X − Y 2
2
So ii) ⇒ i).
If Y satisfies ii), then:
a = X − Y 2
2
= X − Y 2
2 + Y − Y 2
2
≥ X − Y 2
2
= b
By i), a = b (since there can only be one infimum), hence Y − Y 2
2 = 0.
This implies that Y = Y (P−a.s.), because E Y − Y 2
2 = 0.
8 Characteristic Functions and the Central Limit
Theorem
Definition 8.1 – Characteristic function
Let X be a random variable taking values in R with cumulative distribution function F = F_X and law µ (i.e. µ is the measure on (R, B) such that µ((a, b]) = F(b) − F(a) ∀ a ≤ b ∈ R). The characteristic function of X is the function φ : R → C given by:
φ(θ) = E[e^{iθX}] = E[cos(θX)] + i E[sin(θX)] = ∫_R e^{iθx} µ(dx) = ∫_R e^{iθx} dF(x)
Remarks:
1. X ∼ Y ⇒ φ_X = φ_Y, where φ_X is the characteristic function of X and φ_Y is the characteristic function of Y .
2. φ(θ) is well-defined for every θ ∈ R, since |e^{iθx}| = (sin²(θx) + cos²(θx))^(1/2) = 1 ∀ x, θ ∈ R. Hence e^{iθX} ∈ L¹.
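The characteristic function can be estimated empirically by averaging e^{iθX} over samples. The sketch below (Python with numpy; the sample size and grid of θ values are arbitrary) compares the empirical characteristic function of N(0, 1) samples with the known φ(θ) = e^{−θ²/2} used later in the CLT proof.

import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(200_000)
theta = np.linspace(-3, 3, 7)

phi_hat = np.exp(1j * np.outer(theta, x)).mean(axis=1)   # empirical E[exp(i*theta*X)]
phi = np.exp(-theta**2 / 2)                              # characteristic function of N(0,1)
print(np.max(np.abs(phi_hat - phi)))                     # small Monte Carlo error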
Theorem 8.1
Let φ = φX be the characteristic function of a random variable X. Then:
1. φ(0) = 1 (by definition).
2. |φ(θ)| ≤ 1.
3. θ → φ(θ) is continuous ∀ θ ∈ R.
Exercise: Prove this using DCT.
4. φ_{−X}(θ) = φ_X(−θ) = φ_X(θ)* (the complex conjugate) ∀ θ ∈ R.
5. φ_{aX+b}(θ) = e^{iθb} φ_X(aθ) ∀ a, b ∈ R.
6. If E[|X|ⁿ] < ∞ for some n ∈ N, then φ_X⁽ⁿ⁾(0) = iⁿ E[Xⁿ].
Exercise: Prove this using DCT.
Theorem 8.2
If X and Y are independent, then φX+Y (θ) = φX(θ)φY (θ) ∀ θ ∈ R.
Remark:
If E[e^{iαX+iβY}] = E[e^{iαX}] E[e^{iβY}] ∀ α, β ∈ R, then X and Y are independent.
Theorem 8.3 – Levy's inversion formula
Let φ be the characteristic function of a random variable X with law µ and cumulative distribution function F. Then for a < b:
lim_{T→∞} (1/(2π)) ∫_{−T}^{T} ((e^{−iθa} − e^{−iθb})/(iθ)) φ(θ) dθ = (1/2) µ({a}) + µ((a, b)) + (1/2) µ({b})
                                                              = (1/2)(F_X(b) + F_X(b⁻)) − (1/2)(F_X(a) + F_X(a⁻))
where F(a⁻) = lim_{x↑a} F(x).
Proof. Elementary. Exercise.
Remarks:
If φ ∈ L¹(R, B, γ_L), then Levy's inversion formula (Theorem 8.3) implies that X has a density f_X : R → R⁺:
(1/(2π)) ∫_R ((e^{−iθa} − e^{−iθb})/(iθ)) φ_X(θ) dθ = F_X(b) − F_X(a) = ∫_a^b f_X(y) dy
Furthermore, we have f_X(x) = (1/(2π)) ∫_R e^{−iθx} φ_X(θ) dθ.
Theorem 8.4 – Levy's convergence theorem
Let Fn, n ∈ N, be a sequence of cumulative distribution functions with characteristic functions:
φn(θ) = ∫_R e^{iθx} dFn(x)
Suppose that:
• g(θ) = lim_{n→∞} φn(θ) exists ∀ θ ∈ R
• g is continuous at 0.
Then g is the characteristic function of some cumulative distribution function F, i.e. g(θ) = ∫_R e^{iθx} dF(x), and Fn →d F (i.e. Fn(x) → F(x) ∀ x ∈ R s.t. F is continuous at x).
Proof. Given in Williams: Probability with Martingales.
Theorem 8.5 – Central limit theorem
Let (Xn)n∈N be a sequence of independent identically distributed random variables such that E[X1] = 0 and E[X1²] = σ² < ∞, with σ > 0. Let Sn = Σ_{i=1}^n Xi and Gn = Sn/(σ√n). Then Gn →d N(0, 1).
Remark:
If Xi ∼ N(0, 1) for each i ∈ N, then Gn ∼ N(0, 1) ∀ n ∈ N.
Proof. Note that φ_{Gn}(θ) = E[e^{(iθ/(σ√n)) Σ_{i=1}^n Xi}] = (φ_{X1}(θ/(σ√n)))ⁿ, since the Xi are independent and identically distributed.
Since E[X1²] < ∞, we have the expansion:
φ_{X1}(θ) = 1 + iθ E[X1] + ((iθ)²/2!) E[X1²] + o(θ²) = 1 − (θ²/2) σ² + o(θ²)
(the first-order term vanishes because E[X1] = 0). Hence:
φ_{Gn}(θ) = (φ_{X1}(θ/(σ√n)))ⁿ = (1 − θ²/(2n) + o(θ²/(σ²n)))ⁿ
Using the result proved in the course that (1 + bn/n)ⁿ → e^b as n → ∞ whenever bn → b ∈ R, we have:
lim_{n→∞} φ_{Gn}(θ) = φ(θ) = e^{−θ²/2}
It is well known that ∫_R e^{iθx} (1/√(2π)) e^{−x²/2} dx = e^{−θ²/2}, i.e. this is the characteristic function of N(0, 1).
Therefore, by Levy's convergence theorem (Theorem 8.4), Gn →d N(0, 1).
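A simulation makes the statement concrete: standardised sums of i.i.d. (here uniform) random variables have an empirical distribution close to N(0, 1). A sketch (Python with numpy/scipy; the distribution, n and the number of repetitions are arbitrary choices):

import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(6)
n, reps = 500, 20_000
x = rng.random((reps, n)) - 0.5           # X_i ~ Uniform(-1/2, 1/2): mean 0, variance 1/12
sigma = np.sqrt(1 / 12)
g_n = x.sum(axis=1) / (sigma * np.sqrt(n))   # G_n = S_n / (sigma sqrt(n))

print(kstest(g_n, norm.cdf))              # Kolmogorov-Smirnov distance to N(0,1) is small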
9 Conditional Expectation & Martingales
Example 9.1
Let X be a random variable on (Ω, F, P) that takes values in A = {x1, x2, . . . , xm}, P[X ∈ A] = 1, and let Y be a random variable on (Ω, F, P) such that P[Y ∈ B] = 1, B = {y1, . . . , yn}.
In particular, we assume that P[Y = yi] > 0 ∀ i = 1, . . . , n. We have:
E[X | Y = yi] = Σ_{j=1}^m xj · P[X = xj | Y = yi] = Σ_{j=1}^m xj · P[X = xj, Y = yi]/P[Y = yi] = F(yi),   F : B → R.
In other words, E[X | Y ] = F(Y ).
Note that:
E[I_{Y=yi} F(Y )] = P[Y = yi] · Σ_{j=1}^m xj P[X = xj | Y = yi] = Σ_{j=1}^m xj · P[X = xj, Y = yi] = E[X · I_{Y=yi}]
Remarks:
1. To define E [X | Y ], X and Y have to be defined on the same probability space.
2. Note that E [X | Y ] is a random variable in mσ(Y ) such that E [E [X | Y ] · IG] =
E [X · IG] ∀ G ∈ σ(Y ).
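To make the discrete formula above concrete, the following sketch (an added illustration; the joint distribution is an arbitrary made-up example) computes F(yi) = E[X | Y = yi] from a joint probability table and verifies the identity E[I{Y=yi} F(Y)] = E[X · I{Y=yi}].

```python
# Sketch: E[X | Y] on a finite space, computed from a joint probability table.
import numpy as np

x_vals = np.array([0.0, 1.0, 2.0])                 # values of X
y_vals = np.array([10.0, 20.0])                    # values of Y
# joint[j, i] = P[X = x_j, Y = y_i]; arbitrary example, entries sum to 1
joint = np.array([[0.10, 0.15],
                  [0.20, 0.25],
                  [0.05, 0.25]])

p_y = joint.sum(axis=0)                            # P[Y = y_i]
F = (x_vals[:, None] * joint).sum(axis=0) / p_y    # F(y_i) = E[X | Y = y_i]

for i, y in enumerate(y_vals):
    lhs = p_y[i] * F[i]                            # E[ I_{Y=y_i} F(Y) ]
    rhs = (x_vals * joint[:, i]).sum()             # E[ X I_{Y=y_i} ]
    print(f"y={y}: E[X | Y=y]={F[i]:.4f}, lhs={lhs:.4f}, rhs={rhs:.4f}")
```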
Definition 9.1 – Version of conditional expectation
Let X be a random variable in L1(Ω, F, P) and let G ⊆ F be a sub−σ−algebra. If X̂ satisfies:

i) X̂ ∈ mG

ii) E[X̂ · IG] = E[X · IG] ∀ G ∈ G

then X̂ is a version of the conditional expectation E[X | G] of X given G.
We denote X̂ = E[X | G] (P−a.s.).
Remarks:
1. If X ∈ mG satisfies ii) in Definition 9.1, then X = X̂ (P−a.s.).

Proof. Note that {X > X̂}, {X̂ > X} ∈ G and that:

    0 ≤ E[(X − X̂) · I{X>X̂}] = E[X · I{X>X̂}] − E[X̂ · I{X>X̂}]
                             = E[X · I{X>X̂}] − E[X · I{X>X̂}]   by ii) in Def 9.1
                             = 0

    ⇒ X ≤ X̂ (P−a.s.)

Similarly, by looking at the event {X̂ > X}, we find that X̂ ≤ X (P−a.s.).
Therefore, this implies that X̂ = X (P−a.s.).

2. Note that for ii) in Definition 9.1, we implicitly assume that E[X · IG] is well-defined.
Hence we use the fact that X ∈ L1 (since |X · IG| ≤ |X|).

3. In Definition 9.1, we can also assume that X ∈ (mF)+ and drop X ∈ L1.
Theorem 9.1 – Conditional Expectation
Let X ∈ L1(Ω, F, P) or X ∈ (mF)+. Then the conditional expectation E[X | G] exists and is
unique (P−a.s.) (i.e. if X̃ and X̂ are both versions of E[X | G], then X̃ = X̂ (P−a.s.)).

Proof. We consider 3 cases: X ∈ L2, X ∈ (mF)+, and X ∈ L1.

Case 1: If X ∈ L2, then there exists a unique Y ∈ L2(Ω, G, P) such that
E[(X − Y) · IG] = 0 ∀ G ∈ G.
This is equivalent to E[IG · Y] = E[X · IG] ∀ G ∈ G, which implies that Y is a version of
E[X | G].

Case 2: If X ∈ (mF)+, then let Xn = min{X, n}.
Note that Xn ∈ L2 and Xn ↗ X as n → ∞ almost surely, so X̂n := E[Xn | G] exists by Case 1.
Now we have 0 ≥ E[X̂n · I{X̂n<0}] = E[Xn · I{X̂n<0}] ≥ 0 by ii) in Definition 9.1.
This implies that E[X̂n · I{X̂n<0}] = 0, which in turn implies that P[X̂n < 0] = 0.
Hence we have 0 ≤ X̂n ≤ X̂n+1.
To prove X̂n ≤ X̂n+1 (P−a.s.), note that Xn+1 − Xn ≥ 0 (P−a.s.), and that E[Xn+1 − Xn | G] =
X̂n+1 − X̂n implies that X̂n+1 − X̂n ≥ 0.
Hence X̂ := lim_{n→∞} X̂n exists and X̂ ∈ mG.
We then have E[X̂ · IG] = E[X · IG] ∀ G ∈ G by the Monotone Convergence Theorem (Theorem 4.1),
since Xn ↗ X.

Case 3: If X ∈ L1, write X = X+ − X− with X+, X− ∈ (mF)+ ∩ L1 and set X̂ := E[X+ | G] − E[X− | G];
by Case 2 and linearity, X̂ satisfies i) and ii) of Definition 9.1.
Remarks:

1. X ∈ L1 ⇒ X̂ ∈ L1(Ω, G, P).

Proof. We can write X̂ = X̂+ − X̂−, with X̂+ = max{0, X̂} and X̂− = max{0, −X̂}.
We need to show that X̂+, X̂− ∈ L1(Ω, G, P).
Note that X̂+ = X̂ · I{X̂≥0}, where {X̂ ≥ 0} ∈ G.
Then E[I{X̂≥0} · X̂] = E[X · I{X̂≥0}] ∈ R (finite, since |X · I{X̂≥0}| ≤ |X| and X ∈ L1).
A similar argument implies E[X̂−] ∈ R.

2. If X ∈ (mF)+, then X̂ ≥ 0 (P−a.s.).

Proof. Take {X̂ < 0} ∈ G and note that:

    0 ≤ E[X · I{X̂<0}] = E[X̂ · I{X̂<0}] ≤ 0

This implies that P[X̂ < 0] = 0, i.e. X̂ ≥ 0 (P−a.s.).

3. If X̃ ∈ mG satisfies E[X̃ · IG] = E[X · IG] ∀ G ∈ G, then X̃ = X̂ (P−a.s.).

Proof. To prove this, note the following (using ii) for both X̃ and X̂):

    E[(X̃ − X̂) · I{X̃>X̂}] = E[X · I{X̃>X̂}] − E[X · I{X̃>X̂}] = 0                  (9.1)

This implies that P[X̃ > X̂] = 0.
(9.1) holds if X ∈ L1(Ω, F, P).
Similarly, one can show that P[X̂ > X̃] = 0.
Therefore the statement follows if X ∈ L1(Ω, F, P).
However, if X ≥ 0 and E[X] = ∞, then an approximation argument and (9.1) yield
the statement.
Exercise.
Hint: apply (9.1) to Xn = min{X, n}.
Theorem 9.2
Let (Ω, F, P) be our probability space, and let X, Y ∈ mF. Take G, H sub−σ−algebras in
F. Then:
a) If X ∈ mG and either X ∈ L1 or X ∈ (mG)+, then E[X | G] = X (P−a.s.).

b) If X, Y ∈ L1(Ω, F, P) and a, b ∈ R, then E[aX + bY | G] = aE[X | G] + bE[Y | G].

c) If X ∈ L1(Ω, F, P) or X ∈ (mF)+, then E[E[X | G]] = E[X].

d) If X ∈ mG and either X, Y ∈ L2(Ω, F, P) or X, Y ∈ (mF)+, then E[XY | G] =
X E[Y | G] (P−a.s.).

e) If X ∈ L1(Ω, F, P) or X ∈ (mF)+, and H is independent of σ(X), then E[X | H] =
E[X] (P−a.s.).

f) (Tower Property): Let H ⊆ G and X ∈ L1 or X ∈ (mF)+. Then:

    E[E[X | G] | H] = E[X | H]

g) If X ≥ 0, then E[X | G] ≥ 0 (P−a.s.).

h) (Jensen's Inequality): If φ : R → R is convex and φ(X), X ∈ L1(Ω, F, P), then:

    E[φ(X) | G] ≥ φ(E[X | G]) (P−a.s.)

i) Let f : R² → R be B(R²)-measurable, let X ∈ mG and Y be independent of G, and suppose
f(X, Y) ∈ L1(Ω, F, P). Then g(x) := E[f(x, Y)] (for fixed x ∈ R) defines a Borel measurable
map g : R → R which satisfies E[f(X, Y) | G] = g(X) (P−a.s.).
Proof. For some parts:

c) This is clear from the definition: E[IG · E[X | G]] = E[IG · X] ∀ G ∈ G, so take G = Ω.

d) If X = IG, G ∈ G, then statement d) follows from the definition of conditional
expectation, since E[IG · Y | G] = IG · E[Y | G]:
for any A ∈ G, we need to see that E[IA · IG · E[Y | G]] = E[IA IG · Y].
However, this holds since E[I_{A∩G} · E[Y | G]] = E[I_{A∩G} · Y] ∀ A ∈ G and IA IG = I_{A∩G}.
The general statement d) then holds by approximating X ∈ mG by simple functions and proving
E[IA · X · E[Y | G]] = E[IA · XY] ∀ A ∈ G.

e) We need to prove that ∀ H ∈ H we have E[IH · E[X]] = E[IH · X].
But E[IH · X] = E[IH] · E[X] = E[IH · E[X]] since IH and X are independent.

f) Pick H ∈ H, write X̂ = E[E[X | G] | H], and note that:

    E[IH · X̂] = E[E[IH E[X | G] | H]]         by d)
              = E[E[E[IH · X | G] | H]]        by d), and the fact that H ∈ H ⊆ G
              = E[IH · X]                      by applying c) twice

This implies the Tower Property.
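On a finite probability space, conditioning on a sub-σ-algebra generated by a partition is just probability-weighted averaging over its blocks, so properties such as the Tower Property can be checked directly. The sketch below is an added illustration with an arbitrary random variable and two nested partitions (H coarser than G).

```python
# Sketch: Tower Property E[E[X|G]|H] = E[X|H] on a finite space,
# where G and H are generated by partitions and H is coarser than G.
import numpy as np

p = np.array([0.1, 0.2, 0.15, 0.05, 0.3, 0.2])     # P on Omega = {0, ..., 5}
X = np.array([1.0, -2.0, 0.5, 4.0, 3.0, -1.0])     # an arbitrary random variable

G_labels = np.array([0, 0, 1, 1, 2, 2])            # partition generating G
H_labels = np.array([0, 0, 0, 0, 1, 1])            # coarser partition generating H

def cond_exp(values, labels, probs):
    """E[values | sigma(labels)]: probability-weighted average over each block."""
    out = np.empty_like(values)
    for lab in np.unique(labels):
        block = (labels == lab)
        out[block] = np.sum(values[block] * probs[block]) / np.sum(probs[block])
    return out

lhs = cond_exp(cond_exp(X, G_labels, p), H_labels, p)   # E[ E[X | G] | H ]
rhs = cond_exp(X, H_labels, p)                          # E[X | H]
print(lhs, rhs, np.allclose(lhs, rhs))                  # the two versions agree
```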
10 Filtrations, martingales, and stopping times
Here, our time index set is T ∈ {N, Z+}, where Z+ = N ∪ {0}.
.
Definition 10.1 – Filtration
Let (Ω, F, P) be a probability space. A filtration indexed by T is a non-decreasing sequence
of σ−algebras (Ft)t∈T on (Ω, F, P) i.e. we have Fs ⊆ Ft ⊆ F ∀ s, t ∈ T s.t. s ≤ t.
Definition 10.2 – (Stochastic) Process
A process (Xt)t∈T = X is a collection of random variables on (Ω, F, P).
Definition 10.3 – Adapted
The process X = (Xt)t∈T is adapted to the filtration (Ft)t∈T if Xt ∈ mFt.
Definition 10.4 – Filtered probability space
(Ω, F, P) with filtration (Ft)t∈T is called a filtered probability space (Ω, F, (Ft)t∈T, P).
Definition 10.5 – Martingale
A process M = (Mt)t∈T is a martingale on a filtered probability space (Ω, F, (Ft)t∈T, P) if:
a) M is adapted to (Ft)t∈T, Mt ∈ mFt.
b) Mt ∈ L1
(Ω, F, P) ∀ t ∈ T, i.e. E [|Mt|] < ∞ ∀ t.
c) For any s ≤ t, s, t ∈ T, we have E [Mt | Fs] = Ms (P−a.s.).
Definition 10.6 – Submartingale
M = (Mt)t∈T is a submartingale if a) and b) hold in Definition 10.5 and
E[Mt | Fs] ≥ Ms (P−a.s.) for all s ≤ t.

Definition 10.7 – Supermartingale
M = (Mt)t∈T is a supermartingale if a) and b) hold in Definition 10.5 and
E[Mt | Fs] ≤ Ms (P−a.s.) for all s ≤ t.
Remarks:
1. Note that c) in Definition 10.5 is equivalent to E [(Mn+1 − Mn) · IA] = 0 ∀ A ∈ Fn
and all n ∈ T.
Proof. The condition E[(Mn+1 − Mn) · IA] = 0 ∀ A ∈ Fn is exactly E[Mn+1 | Fn] = Mn (P−a.s.),
so we need to show that this one-step property implies E[Mn+k | Fn] = Mn (P−a.s.) for every k.
For example:

    E[Mn+2 | Fn] = E[E[Mn+2 | Fn+1] | Fn]   by the Tower Property
                 = E[Mn+1 | Fn]
                 = Mn

and the general case follows by induction on k.
Exercise: Show that in Definition 10.7, E[Mt | Fs] ≤ Ms (P−a.s.) is equivalent to
E[(Mn+1 − Mn) · IA] ≤ 0 ∀ A ∈ Fn ∀ n ∈ T.
Example 10.1
1. Let X ∈ L1(Ω, F, P) on a filtered probability space (Ω, F, (Ft)t∈T, P). Then Mt = E[X | Ft]
is a martingale; c) from Definition 10.5 follows from the Tower Property of conditional
expectation.
2. Let Xi, i ∈ N, be iid random variables such that P[Xi = 1] = p, P[Xi = −1] = 1 − p,
with p ∈ (0, 1). Define Mk = Σ_{i=1}^k Xi. Claim: M = (Mk)k∈N is a supermartingale if and
only if p ≤ 1/2, and a submartingale if and only if p ≥ 1/2. Hence M is a martingale if and
only if p = 1/2.
Proof. Here, we use Fk = σ(X1, . . . , Xk), and show that the properties in the definition
of a martingale are satisfied.
a) M is adapted to (Fk).
Mk ∈ mFk holds; moreover Fk = σ(M1, . . . , Mk), since there exists an invertible matrix A ∈ R^{k×k}
such that

    (X1, X2, . . . , Xk)ᵀ = A (M1, M2, . . . , Mk)ᵀ   and   A^{−1} (X1, X2, . . . , Xk)ᵀ = (M1, M2, . . . , Mk)ᵀ,

with

    A^{−1} = [ 1 0 . . . 0
               1 1 . . . 0
               . . .
               1 1 . . . 1 ]

(the lower-triangular matrix of ones, since Mj = X1 + . . . + Xj).
So there is a bijection between these two vectors, which implies Fk = σ(M1, . . . , Mk).
b) Mk ∈ L1 ∀ k ∈ N.
This holds since Xi, i = 1, . . . , k, are in L1: indeed Xi = ±1, so |Mk| ≤ k < ∞, i.e. Mk is
bounded.
c) We have:

    E[Mk+1 | Fk] = E[Xk+1 + Mk | Fk] = Mk + E[Xk+1]   as Mk ∈ mFk and Xk+1 is independent of Fk
                 = Mk + p·(1) + (1 − p)·(−1)
                 = Mk + 2p − 1

So for p ∈ (0, 1/2] we have a supermartingale, and for p ∈ [1/2, 1) we have a submartingale.
This proves the equivalences above.
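A quick simulation sketch (added for illustration; the values of p and the sample size are arbitrary): the conditional increment E[Mk+1 − Mk | Fk] = 2p − 1 does not depend on the past, so the sample mean of the increments estimates 2p − 1, and the drift vanishes exactly at p = 1/2.

```python
# Sketch: increments of the +/-1 walk have (conditional) mean 2p - 1.
import numpy as np

rng = np.random.default_rng(3)
for p in (0.3, 0.5, 0.7):
    steps = rng.choice([1.0, -1.0], size=1_000_000, p=[p, 1 - p])
    print(p, steps.mean(), 2 * p - 1)   # sample mean of increments vs 2p - 1
```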
Definition 10.8 – Stopping time
Let (Ω, F, (Ft)t∈T, P) be a filtered probability space. A random variable τ : Ω → T ∪ {∞}
is a stopping time relative to the filtration (Ft)t∈T if {τ ≤ t} ∈ Ft ∀ t ∈ T.
Remark:
In the case T = Z+, τ is a stopping time if and only if {τ = t} ∈ Ft ∀ t ∈ Z+ (since
{τ = t} = {τ ≤ t} \ {τ ≤ t − 1} and {τ ≤ t} = ∪_{k=0}^{t} {τ = k}).
Example 10.2
1. Let M = (Mk)k∈N, Mk = Σ_{i=1}^k Xi as before. Then τa = inf{t ∈ N : Mt = a}, a ∈ Z, is a
stopping time: {τa ≤ t} = ∪_{k=1}^{t} Mk^{−1}({a}), where each Mk^{−1}({a}) ∈ Fk ⊆ Ft for k ≤ t,
hence {τa ≤ t} ∈ Ft.

2. Every constant time t ∈ T is a stopping time.
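The following sketch (an added illustration with arbitrary level a, horizon and number of runs) simulates the hitting time τa for the symmetric walk and estimates P[τa ≤ t]; note that whether τa ≤ t can be decided from the path up to time t, which is exactly the requirement {τa ≤ t} ∈ Ft.

```python
# Sketch: simulate the hitting time tau_a = inf{t : M_t = a} for the symmetric walk.
import numpy as np

rng = np.random.default_rng(4)
a, t_max, reps = 3, 200, 20_000
hit_count = 0
for _ in range(reps):
    m = np.cumsum(rng.choice([1, -1], size=t_max))   # path M_1, ..., M_{t_max}
    if np.any(m == a):                               # tau_a <= t_max on this path
        hit_count += 1
print("estimate of P[tau_a <= 200]:", hit_count / reps)
```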
Example 10.3
Suppose we have M0 = 0, Mk =
k
i=1
Xi, with Xi iid Bernoulli random variables, P[Xi =
1] = p, P[Xi = −1] = 1 − p, p ∈ (0, 1).
We have H1 = 1, Hk = 2k−1
I{Xi=−1:i=1,...,k−1}.
Note that Hk ∈ mFk−1.
Let Nk =
k
i=1
Hi (Mi − Mi−1)
Xi
be the gains process (N = (Nk)k∈N).
Note Nk =∈ mFk, Fk = σ(X1, . . . , Xk).
Also, Nk ∈ L1
∀ p and if p ≥ 1
2
, then E [Nk+1 | Fk] = Nk (Exercise: Check this)
Furthermore, N is a supermartingale if and only if p ≤ 1
2
.
Let:
τ = inf {t ∈ N | Mt > Mt−1}
= inf {t ∈ N | Xt = 1, Xi = −1 ∀ i = 1, . . . , t − 1}
Exercise: Show τ is a stopping time, i.e. show P[τ = n] = p(1 − p)n−1
n ∈ N.
Note that:
Nk =
k
i=1
Hi(Mi − Mi−1) =
1 − 2k
Xi = −1, i = 1, 2, . . . , k
1 ∃ i = {1, . . . , k} s.t. Xi = 1
which implies that Nτ = 1 (P−a.s.).
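A simulation sketch of this doubling strategy (added for illustration; p and the number of runs are arbitrary): on every run the stopped gains equal Nτ = 1, even though the gains just before τ, namely 1 − 2^{τ−1}, can be very negative.

```python
# Sketch: the doubling strategy always stops with gains N_tau = 1,
# at the cost of potentially very large losses before tau.
import numpy as np

rng = np.random.default_rng(5)
p = 0.5
taus, pre_tau_gains = [], []
for _ in range(10_000):
    n, stake, gains = 0, 1, 0
    while True:
        n += 1
        if rng.random() < p:          # X_n = +1: win the current stake, stop
            gains += stake
            break
        gains -= stake                # X_n = -1: lose the stake and double it
        stake *= 2
    taus.append(n)
    pre_tau_gains.append(1 - 2 ** (n - 1))   # gains just before the winning toss
print("N_tau on the last run:", gains)                     # always equals 1
print("mean tau:", np.mean(taus), "worst pre-tau gains:", min(pre_tau_gains))
```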
52

More Related Content

What's hot

Proofs by contraposition
Proofs by contrapositionProofs by contraposition
Proofs by contrapositionAbdur Rehman
 
Linear transformation.ppt
Linear transformation.pptLinear transformation.ppt
Linear transformation.pptRaj Parekh
 
21 monotone sequences x
21 monotone sequences x21 monotone sequences x
21 monotone sequences xmath266
 
Asymptotic Notations
Asymptotic NotationsAsymptotic Notations
Asymptotic NotationsRishabh Soni
 
CMSC 56 | Lecture 14: Representing Relations
CMSC 56 | Lecture 14: Representing RelationsCMSC 56 | Lecture 14: Representing Relations
CMSC 56 | Lecture 14: Representing Relationsallyn joy calcaben
 
Differential equations
Differential equationsDifferential equations
Differential equationsSeyid Kadher
 
Recurrence relations
Recurrence relationsRecurrence relations
Recurrence relationsIIUM
 
Euler and improved euler method
Euler and improved euler methodEuler and improved euler method
Euler and improved euler methodSohaib Butt
 
Unit 1: Topological spaces (its definition and definition of open sets)
Unit 1:  Topological spaces (its definition and definition of open sets)Unit 1:  Topological spaces (its definition and definition of open sets)
Unit 1: Topological spaces (its definition and definition of open sets)nasserfuzt
 
Tensor 1
Tensor  1Tensor  1
Tensor 1BAIJU V
 
Presentation on Numerical Integration
Presentation on Numerical IntegrationPresentation on Numerical Integration
Presentation on Numerical IntegrationTausif Shahanshah
 
Ordinary Differential Equations And Their Application: Modeling: Free Oscilla...
Ordinary Differential Equations And Their Application: Modeling: Free Oscilla...Ordinary Differential Equations And Their Application: Modeling: Free Oscilla...
Ordinary Differential Equations And Their Application: Modeling: Free Oscilla...jani parth
 

What's hot (20)

Proofs by contraposition
Proofs by contrapositionProofs by contraposition
Proofs by contraposition
 
Linear transformation.ppt
Linear transformation.pptLinear transformation.ppt
Linear transformation.ppt
 
21 monotone sequences x
21 monotone sequences x21 monotone sequences x
21 monotone sequences x
 
Time complexity
Time complexityTime complexity
Time complexity
 
The integral
The integralThe integral
The integral
 
Asymptotic Notations
Asymptotic NotationsAsymptotic Notations
Asymptotic Notations
 
Proof by contradiction
Proof by contradictionProof by contradiction
Proof by contradiction
 
CMSC 56 | Lecture 14: Representing Relations
CMSC 56 | Lecture 14: Representing RelationsCMSC 56 | Lecture 14: Representing Relations
CMSC 56 | Lecture 14: Representing Relations
 
Primality
PrimalityPrimality
Primality
 
Differential equations
Differential equationsDifferential equations
Differential equations
 
Presentation binomial theorem
Presentation binomial theoremPresentation binomial theorem
Presentation binomial theorem
 
Recurrence relations
Recurrence relationsRecurrence relations
Recurrence relations
 
Euler and improved euler method
Euler and improved euler methodEuler and improved euler method
Euler and improved euler method
 
Lasso regression
Lasso regressionLasso regression
Lasso regression
 
Unit 1: Topological spaces (its definition and definition of open sets)
Unit 1:  Topological spaces (its definition and definition of open sets)Unit 1:  Topological spaces (its definition and definition of open sets)
Unit 1: Topological spaces (its definition and definition of open sets)
 
Tensor 1
Tensor  1Tensor  1
Tensor 1
 
Presentation on Numerical Integration
Presentation on Numerical IntegrationPresentation on Numerical Integration
Presentation on Numerical Integration
 
Ordinary Differential Equations And Their Application: Modeling: Free Oscilla...
Ordinary Differential Equations And Their Application: Modeling: Free Oscilla...Ordinary Differential Equations And Their Application: Modeling: Free Oscilla...
Ordinary Differential Equations And Their Application: Modeling: Free Oscilla...
 
Supremum And Infimum
Supremum And InfimumSupremum And Infimum
Supremum And Infimum
 
Discrete Math Lecture 03: Methods of Proof
Discrete Math Lecture 03: Methods of ProofDiscrete Math Lecture 03: Methods of Proof
Discrete Math Lecture 03: Methods of Proof
 

Viewers also liked

Sampling, Statistics and Sample Size
Sampling, Statistics and Sample SizeSampling, Statistics and Sample Size
Sampling, Statistics and Sample Sizeclearsateam
 
Intro probability 4
Intro probability 4Intro probability 4
Intro probability 4Phong Vo
 
law of large number and central limit theorem
 law of large number and central limit theorem law of large number and central limit theorem
law of large number and central limit theoremlovemucheca
 
Lecture slides stats1.13.l09.air
Lecture slides stats1.13.l09.airLecture slides stats1.13.l09.air
Lecture slides stats1.13.l09.airatutor_te
 
Probability Theory and Mathematical Statistics
Probability Theory and Mathematical StatisticsProbability Theory and Mathematical Statistics
Probability Theory and Mathematical Statisticsmetamath
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distributionDanu Saputra
 

Viewers also liked (6)

Sampling, Statistics and Sample Size
Sampling, Statistics and Sample SizeSampling, Statistics and Sample Size
Sampling, Statistics and Sample Size
 
Intro probability 4
Intro probability 4Intro probability 4
Intro probability 4
 
law of large number and central limit theorem
 law of large number and central limit theorem law of large number and central limit theorem
law of large number and central limit theorem
 
Lecture slides stats1.13.l09.air
Lecture slides stats1.13.l09.airLecture slides stats1.13.l09.air
Lecture slides stats1.13.l09.air
 
Probability Theory and Mathematical Statistics
Probability Theory and Mathematical StatisticsProbability Theory and Mathematical Statistics
Probability Theory and Mathematical Statistics
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distribution
 

Similar to Probability theory

Ch1 sets and_logic(1)
Ch1 sets and_logic(1)Ch1 sets and_logic(1)
Ch1 sets and_logic(1)Kwonpyo Ko
 
Problems and solutions inmo-2012
Problems and solutions  inmo-2012Problems and solutions  inmo-2012
Problems and solutions inmo-2012askiitians
 
On Some Geometrical Properties of Proximal Sets and Existence of Best Proximi...
On Some Geometrical Properties of Proximal Sets and Existence of Best Proximi...On Some Geometrical Properties of Proximal Sets and Existence of Best Proximi...
On Some Geometrical Properties of Proximal Sets and Existence of Best Proximi...BRNSS Publication Hub
 
Recursive Definitions in Discrete Mathmatcs.pptx
Recursive Definitions in Discrete Mathmatcs.pptxRecursive Definitions in Discrete Mathmatcs.pptx
Recursive Definitions in Discrete Mathmatcs.pptxgbikorno
 
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurMid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurVivekananda Samiti
 
schaums-probability.pdf
schaums-probability.pdfschaums-probability.pdf
schaums-probability.pdfSahat Hutajulu
 
Andrei rusu-2013-amaa-workshop
Andrei rusu-2013-amaa-workshopAndrei rusu-2013-amaa-workshop
Andrei rusu-2013-amaa-workshopAndries Rusu
 
Residual Quotient and Annihilator of Intuitionistic Fuzzy Sets of Ring and Mo...
Residual Quotient and Annihilator of Intuitionistic Fuzzy Sets of Ring and Mo...Residual Quotient and Annihilator of Intuitionistic Fuzzy Sets of Ring and Mo...
Residual Quotient and Annihilator of Intuitionistic Fuzzy Sets of Ring and Mo...AIRCC Publishing Corporation
 
Discrete Mathematics and Its Applications 7th Edition Rose Solutions Manual
Discrete Mathematics and Its Applications 7th Edition Rose Solutions ManualDiscrete Mathematics and Its Applications 7th Edition Rose Solutions Manual
Discrete Mathematics and Its Applications 7th Edition Rose Solutions ManualTallulahTallulah
 
Assignments for class XII
Assignments for class XIIAssignments for class XII
Assignments for class XIIindu thakur
 
20200911-XI-Maths-Sets-2 of 2-Ppt.pdf
20200911-XI-Maths-Sets-2 of 2-Ppt.pdf20200911-XI-Maths-Sets-2 of 2-Ppt.pdf
20200911-XI-Maths-Sets-2 of 2-Ppt.pdfMridulDhamija
 

Similar to Probability theory (20)

Ch1 sets and_logic(1)
Ch1 sets and_logic(1)Ch1 sets and_logic(1)
Ch1 sets and_logic(1)
 
Math
MathMath
Math
 
ch3.ppt
ch3.pptch3.ppt
ch3.ppt
 
Problems and solutions inmo-2012
Problems and solutions  inmo-2012Problems and solutions  inmo-2012
Problems and solutions inmo-2012
 
Lemh1a1
Lemh1a1Lemh1a1
Lemh1a1
 
Lemh1a1
Lemh1a1Lemh1a1
Lemh1a1
 
7_AJMS_246_20.pdf
7_AJMS_246_20.pdf7_AJMS_246_20.pdf
7_AJMS_246_20.pdf
 
On Some Geometrical Properties of Proximal Sets and Existence of Best Proximi...
On Some Geometrical Properties of Proximal Sets and Existence of Best Proximi...On Some Geometrical Properties of Proximal Sets and Existence of Best Proximi...
On Some Geometrical Properties of Proximal Sets and Existence of Best Proximi...
 
Recursive Definitions in Discrete Mathmatcs.pptx
Recursive Definitions in Discrete Mathmatcs.pptxRecursive Definitions in Discrete Mathmatcs.pptx
Recursive Definitions in Discrete Mathmatcs.pptx
 
Number theory
Number theoryNumber theory
Number theory
 
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurMid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
Mid semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
 
schaums-probability.pdf
schaums-probability.pdfschaums-probability.pdf
schaums-probability.pdf
 
Andrei rusu-2013-amaa-workshop
Andrei rusu-2013-amaa-workshopAndrei rusu-2013-amaa-workshop
Andrei rusu-2013-amaa-workshop
 
Residual Quotient and Annihilator of Intuitionistic Fuzzy Sets of Ring and Mo...
Residual Quotient and Annihilator of Intuitionistic Fuzzy Sets of Ring and Mo...Residual Quotient and Annihilator of Intuitionistic Fuzzy Sets of Ring and Mo...
Residual Quotient and Annihilator of Intuitionistic Fuzzy Sets of Ring and Mo...
 
Discrete Mathematics and Its Applications 7th Edition Rose Solutions Manual
Discrete Mathematics and Its Applications 7th Edition Rose Solutions ManualDiscrete Mathematics and Its Applications 7th Edition Rose Solutions Manual
Discrete Mathematics and Its Applications 7th Edition Rose Solutions Manual
 
SETS
SETSSETS
SETS
 
SET THEORY
SET THEORYSET THEORY
SET THEORY
 
Assignments for class XII
Assignments for class XIIAssignments for class XII
Assignments for class XII
 
Chpt 2-sets v.3
Chpt 2-sets v.3Chpt 2-sets v.3
Chpt 2-sets v.3
 
20200911-XI-Maths-Sets-2 of 2-Ppt.pdf
20200911-XI-Maths-Sets-2 of 2-Ppt.pdf20200911-XI-Maths-Sets-2 of 2-Ppt.pdf
20200911-XI-Maths-Sets-2 of 2-Ppt.pdf
 

Recently uploaded

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Probability theory

  • 2. Contents 0 Measures 3 1 Axiomatic Probability Theory 10 2 Independence 12 3 Tail σ−algebra and Kolmogorov’s 0 − 1 law 16 4 Integration 24 5 Expectations 27 6 Inequalities 29 7 Convergence of Random Variables 35 8 Characteristic Functions and the Central Limit Theorem 43 9 Conditional Expectation & Martingales 46 10 Filtrations, martingales and stopping times 50 Notes on the First Edition Thanks to Pierre Tai and Nico Prokop for pointing out the many typos within. Keegan Kang Notes on the Second Edition These notes were written for the 2010-2011 course, so might not be directly relevant to our course. There have been changes made since the first edition, but these are almost exclusively cosmetic. Iain Carson 2
  • 3. 0 Measures Definition 0.1 – σ-algebra F is a σ− algebra if it satisfies the following properties: • Ω ∈ F • if A ∈ F, then AC ∈ F • if {Ai}∞ i=0 ∈ F, then ∞ i=0 ∈ F If we have F, then (Ω, F) is a measurable space. Example 0.1 – Examples of σ−algebras on a set Ω smallest σ−algebra (∅, Ω) largest σ−algebra power set 2Ω It is also possible to generate other σ−algebras on Ω. Take a subset A of Ω, i.e. A ⊆ Ω. We know {A} ∈ 2Ω . We look at σ({A}), which is the smallest σ−algebra generated by A. σ({A}) = F⊇{A} F this is non-empty because {A} ⊆ 2Ω = {∅, Ω, A, AC } (0.1) To say that (0.1) is the smallest σ−algebra generated by A, we need to check that: ˆ (0.1) fulfills the axioms of a σ−algebra ˆ (0.1) is contained in every σ−algebra containing A which are trivial. Therefore, we can take any arbitrary collection C of subsets where C ⊆ 2Ω to generate σ−algebras, and σ(C) = F⊇C F, where F are σ−algebras. Definition 0.2 – Borel σ−algebra (for R) The Borel σ−algebra is the smallest σ−algebra containing all open sets in the topological space. So when we get Ω = R, then B(R) is the smallest σ−algebra generated by open intervals in R. B(R) = σ (J : J open interval in R) = σ((−∞, x], x ∈ Q) Consider σ({m} : m ∈ Q). Is σ({m} : m ∈ Q) = B(R)? No, it is not. Proof. We know that intervals (sets) in B(R) are uncountable (and their complements are uncountable as well). So if we can show that the sets in σ({m} : m ∈ Q) are countable, or the complements of the sets are countable, then σ({m} : m ∈ Q) = B(R). 3
  • 4. But the sets of all rational points are countable. We can construct a bijection from the set of all rationals to a subset of N. Define f : Q → N as follows. • For each q ∈ Q+ , write q = m n where m, n ∈ Z, m, n > 0, hcf(m, n) = 1. • For each q ∈ Q− , write q = m n where m, n ∈ Z, m < 0, n > 0, hcf(|m|, n) = 1. Write: f(q) =    2m 3n q > 0 5|m| 7n q < 0 1 q = 0 This is an injection from Q to a subset of N, so there exists a bijection between Q and a particular subset of N, hence Q is countable. Therefore σ({m} : m ∈ Q) = B(R). We also want to show that B(R) = σ((−∞, x], x ∈ Q). Proof. To show that B(R) = σ((−∞, x], x ∈ Q), we need to show that: F = σ((−∞, x], x ∈ Q) ⊆ B(R) (†) F = σ((−∞, x], x ∈ Q) ⊇ B(R) (††) (†) It is enough to show that (−∞, x] ∈ B(R) ∀ x ∈ Q. This is true since (−∞, x]C = (x, ∞) ∈ B(R) ⇒ F ∈ B(R), as B(R) is closed under complements. (††) It is enough to show that F contains J ∀ J = (a, b) ⊆ R because B(R) is the smallest σ−algebra containing all open intervals. Then any σ−algebra containing all open intervals contains B(R). (a, b) ∈ F ⇔ R (a, b) ∈ F ⇔ (−∞, a] [b, ∞) ♥ ∈ F We just need to show that and ♥ ∈ F. It is obvious that ∈ F if a ∈ Q. Otherwise, we construct a decreasing sequence of rationals ai which tends to a, i.e. ai a, and write = (−∞, a] = i∈N (−∞, ai] ∈ F. Hence (−∞, a] ∈ F for a ∈ R. We now consider ♥. We have ♥ = i −∞, b − 1 i ∈ F if b ∈ Q. Otherwise, we construct a decreasing sequence of rationals bi which tends to b, i.e. bi b, and write: ♥ = [b, ∞) = n i −∞, bi − 1 n ∈ F 4
  • 5. Hence [b, ∞) ∈ F for all b ∈ R. We thus have shown that (a, b) ∈ F and (††) holds. Hence B(R) = σ((−∞, x], x ∈ Q). There is a fundamental question: When are two measures equal? Let (Ω, F) be a measurable space and let µ, ν be two measures on (Ω, F). When does the equality hold? i.e. When does µ(F) = ν(F) ∀ F ∈ F? Definition 0.3 – d-system Let Ω be a set. A collection of subsets D ⊆ 2Ω is a d-system if: i) Ω ∈ D ii) If A ⊆ B and A, B ∈ D, then B − A ∈ D iii) If Am ∈ D ∀ m ∈ N and Am ⊆ Am+1 ∀ m ∈ N, then m∈N Am ∈ D Remarks: i) In literature, the d−system is also called a Dynkin system, or a λ−system. ii) Every σ−algebra is a d−system. iii) If µ, ν are finite measures on (Ω, F), such that µ(Ω) = ν(Ω), then D = {F ∈ F | µ(F) = ν(F)} iii) is a d−system. iv) For any collection I ⊆ 2Ω , the smallest d−system d(I) that contains I is given by iv) d(I) = d systems D⊇I D Proof. Proof of ii) Axiom i) follows by definition. Axiom ii) is satisfied, since B − A = (B ∩ A ) and is thus in the σ−algebra. Axiom iii) is satisfied - let A1 ∪ . . . ∪ An = Bn ∀ n ∈ N. Then Bm ⊆ Bm+1 ∀ m ∈ N, and Bm = m∈N Am ∈ D. Proof of iii) Axiom i) follows since Ω ∈ F by definition of σ−algebra. To prove Axiom ii), we need to show that A ⊆ B, A, B ∈ D ⇒ µ(B − A) = ν(B − A). Rewrite µ(B − A) as µ(B) − µ(A) and similarly ν(B − A) as ν(B) − ν(A). We can do so since ˆ these are finite measures and hence µ(A), µ(B) are finite ˆ A ⊆ B therefore µ(B − A) = µ(B) − µ(A) and ν(B − A) = ν(B) − ν(A) But then we know µ(A) = ν(A) and µ(B) = ν(B). So Axiom ii) holds. To prove Axiom iii), we need to show that: Am ∈ D ∀ m ∈ N and Am ⊆ Am+1 ∀ m ∈ N ⇒ µ m∈N Am = ν m∈N Am 5
  • 6. By continuity of measures, we can write: µ m∈N Am = lim m↑∞ µ(Am) ν m∈N Am = lim m↑∞ ν(Am) But then we know lim m↑∞ µ(Am) = lim m↑∞ ν(Am). So Axiom iii) holds. Proof of iv) (to show that d(I) = d systems D⊇I D is a d−system) We need to show that d(I) is non empty, that it is the smallest d−system, and that it satisfies the axioms of a d−system. d(I) is non empty as the set B = {I, I , ∅, Ω} contains I, and B fulfills the axioms of a d−system. Furthermore, all other d−systems containing I must contain B, and hence B is the smallest d−system. To show d(I) satisfies the axioms of a d−system, let Dk be an index of d−systems containing I. Axiom 1: Ω ∈ Dk ∀ k ∈ N ⇒ Ω ∈ Dk Axiom 2: Suppose we have B ∈ Dk ∀ k ∈ N. Then A ⊆ B ⇒ A ∈ Dk ∀ k ∈ N as well. Hence (B − A) ∈ Dk ∀ k ∈ N and thus we have A ⊆ B, A, B ∈ Dk ⇒ (B − A) ∈ Dk. Axiom 3: Suppose we have A1, A2, . . . ∈ Dk ∀ k ∈ N, with Am ⊆ Am+1 ∀ m ∈ N. Then Am ∈ Dk ∀ k ∈ N. Thus we have A1, A2, . . . ∈ Dk, Am ⊆ Am+1 ∀ m ⇒ Am ∈ Dk ∀ k ∈ N. So d(I) satisfies the axioms of a d−system. Definition 0.4 – π−system Let I ⊆ 2Ω . Then I is a π−system if ∀ A, B ∈ I ⇒ A ∩ B ∈ I. Example 0.2 – Examples of π−systems on R. Consider the set R, and take I1 = {(−∞, x] : x ∈ R}. This is a π−system. I2 = {(−∞, x] : x ∈ Q} is a π−system as well. Proof. Suppose we have two sets A and B, with A, B ∈ I1. Without loss of generality, take a ≤ b. Then: A = (−∞, a] B = (−∞, b] and therefore: A ∩ B = (−∞, a] So A ∩ B = A ∈ I1. The same holds for I2. So I1 and I2 are both π−systems. 6
  • 7. The Borel σ−algebra on R is generated by the π−system(I1) (or I2). In other words, B(R) = σ(I). Remark 0.1 A collection C ⊆ 2Ω is a σ−algebra ⇔ C is a d−system and π−system. Proof. (⇒) We have proved that a σ−algebra is a d−system on Page 5. We need to show that a σ−algebra is a π−system as well, and want to show that ∀ A, B ∈ F, A ∩ B ∈ F. Take A, B ∈ F. Then: A, B ∈ F ⇒ A , B ∈ F by Axiom 2 of σ− algebra ⇒ A ∪ B ∈ F by Axiom 3 of σ−algebra ⇒ A ∪ B ∈ F by Axiom 2 of σ−algebra ⇒ A ∩ B ∈ F by De Morgan’s Laws (⇐) We then need to show that the definitions of a π−system and a d−system fulfill the axioms of a σ−algebra. Axiom 1 is satisfied due to axiom 1 of the d−system, i.e. Ω ∈ D, so Ω ∈ C. Axiom 2: Choose A ∈ C. Since A ∪ A = Ω, then A ⊆ Ω, Ω, A ∈ C, B − A = A ∈ C (by axiom 2 of d−system). So this implies that A ∈ C ⇒ A ∈ C. Axiom 3: Take A1, A2 ∈ C. Wish to show A1 ∪ A2 ∈ C. If we can do so, then by induction, A1, A2, . . . ∈ C, Ai ∈ C. We have proven that Axiom 2 of a σ−algebra is satisfied, so A1, A2 ∈ C. By definition of π−system, A1 ∩ A2 ∈ C. Again, by using Axiom 2 of a σ−algebra, we have A1 ∩ A2 = (A1 ∪ A2) ∈ C. Hence by induction, Axiom 3 is satisfied. Therefore if C is both a d−system and a π−system, then C is a σ−algebra. Theorem 0.1 – Monotone Class Theorem for Sets If I ⊆ 2Ω is a π−system, then d(I) = σ(I). In other words, the smallest d−system generated by I coincides with the σ−algebra σ(I) generated by I. Proof. Need to show that: • d(I) ⊆ σ(I) • d(I) ⊇ σ(I) To show (d(I) ⊆ σ(I)): We have proven that every σ−algebra is a d−system. So it follows that: d(I) = d systems D⊇I D ⊆ F⊇I F σ−algebra F = σ(I) 7
  • 8. Hence d(I) ⊆ σ(I). To show (d(I) ⊇ σ(I)): To prove this, we note Remark 0.1 and show that d(I) is a d−system and a π−system. Then d(I) would be a σ−algebra, and d(I) ⊇ σ(I). Define the family of sets: D1 = {B ∈ d(I) | B ∩ C ∈ d(I) ∀ C ∈ I} We wish to show that D1 is a d−system, and is in fact d(I). First note that I ⊆ D1 since B ∈ I ⊆ d(I) ⇒ B ∩C ∈ d(I) ∀ C ∈ I (this is how we defined our D1). Secondly, we show that D1 satisfies the axioms of a d−system. Axiom 1: ∀ C ∈ I, Ω ∩ C = C ∈ d(I), hence Ω ∈ D1. Axiom 2: Consider the equality (B − A) ∩ C = (B ∩ C) − (A ∩ C) which holds for every set A, B, C, given that A ⊆ B. Pick A, B ∈ D1 which satisfies A ⊆ B, and we want to show that B − A ∈ D1. Since A, B ∈ d(I), which is a d−system, then B − A ∈ d(I). It suffices to check if (B − A) ∩ C ∈ d(I) ∀ C ∈ I. Since A ⊆ B, we have (B ∩ C) ⊇ (A ∩ C), and using the above inequality, we have (B ∩ C) − (A ∩ C) ∈ d(I), and therefore (B − A) ∈ D1. Axiom 3: Give Am ∈ D1, such that Am ⊆ Am+1 ∀ m ∈ N, we wish to show that ( Am)∩C ∈ d(I) ∀ c ∈ I. Note that Am ∩ C ∈ d(I) ∀ m ∈ N, so (Am ∩ C) ⊆ (Am+1 ∩ C) ∀ m. Therefore ( Am) ∩ C ∈ d(I) ∀ c ∈ I. Now, we have satisfied the axioms for D1 to be a d−system, and since I ⊆ D1, we can write: I ⊆ D1 ⊆ d(I) ⇒ D1 = d(I) (0.2) since D1 contains I, and d(I) is the smallest d−system containing I. Now consider the family of sets: D2 = {B ∈ d(I) | B ∩ C ∈ d(I) ∀ C ∈ d(I)} This is the set of subsets in Ω which is in D1. We want to show that I ∈ D2: But B ∈ I, C ∈ d(I) ⇒ B ∩ C ∈ d(I) using (0.2). D2 is also a d−system (similar proof to above, and using the same inequality), and therefore, we can also write: I ⊆ D2 ⊆ d(I) ⇒ D2 = d(I) This implies that d(I) is a π−system. We have thus shown that d(I) is both a π−system and a d−system, and therefore a σ−algebra, hence d(I) ⊇ σ(I). Therefore we have shown: d(I) ⊆ σ(I) and d(I) ⊇ σ(I) and hence d(I) = σ(I). 8
  • 9. We can use the Monotone class theorem (Theorem 0.1) to state certain relations between measures. Proposition 0.1 1) Let µ, ν be two measures on a measurable space (Ω, F) such that µ(Ω) = ν(Ω) < ∞. If µ(C) = ν(C) ∀ C ∈ I where I is a π−system in F, then µ and ν coincide on the smallest σ−algebra generated by I, i.e. σ(I). 2) Any two probability measures that agree on a π−system must agree on the σ−algebra generated by this π−system. Proof. (of part 1) We define the set: D = {A ∈ F | µ(A) = ν(A)}. This is a d−system (using same proof for part iii) of remarks on Page 5. Since: µ(C) = ν(C) ∀ C ∈ I ⇒ I ⊆ D, this implies that the σ−algebra generated by I in D, σ(I) = d(I) ⊆ D by the Monotone class theorem (Theorem 0.1). Example 0.3 Let P and P be two probability measures on (R, B(R)). If: P(−∞, x] = P (−∞, x] ∀ x ∈ R (or ∈ Q), then P and P coincides on B(R). This follows by Proposition 0.1 and therefore: B(R) = σ({(−∞, x] | x ∈ R}). Hence a cumulative distribution function of a probability measure P on (R, B(R)): F : R → [0, 1] such that F(x) = P(−∞, x] uniquely defines the measure P. 9
  • 10. 1 Axiomatic Probability Theory We know from previous Probability courses that Ω is our sample space, i.e. all outcomes of a random experiment. Example 1.1 i) Toss a coin twice. Then Ω = {HT, TH, HH, TT}. ii) An infinite sequence of coin tosses. Then Ω = {ω : N → {T, H}}. Here, |Ω| = ∞, and is uncountable. Proof. Proof that Example 1.1 ii) is uncountable. We attempt a proof by contradiction. Suppose that Ω is countable (infinite). Then we can enumerate out all possible ω. But if we find an ω not in the list, we get a contradiction, and hence Ω is uncountable. Note: What Ω = {ω : N → {T, H}} means is simply the set of all ω. So, assume we have enumerated out all our ω, say: ω1 = ω11ω12ω13ω14 . . . ω2 = ω21ω22ω23ω24 . . . ω3 = ω31ω32ω33ω34 . . . ω4 = ω41ω42ω43ω44 . . . ... = ... with ωij = H or T. So, we construct a sequence ωk say, such that ωkk = ωii, where i = 1, 2, 3, . . .. ωk is not in the above list, and therefore Ω is uncountable. Having defined ii) in Example 1.1, we wish to know the probability of the coin landing H (or T) on the ith toss. We thus want: (Ω, F) : F = σ({ω | ω(i) = H}, {ω | ω(i) = T} : i ∈ N) We thus need a probability measure on F. It is possible to embed Ω → [0, 1] where a Lebesgue measure has been constructed such that: P[{ω | ω(i) = H}] = 1 2 , ∀ i ∈ N Definition 1.1 – Random Object / Variable / Vector Given a probability space (Ω, F, P), a random object X in a measurable space (S, Σ) is a measurable function X : (Ω, F) → (S, Σ) i.e. the pre-image X−1 (Σ) ⊆ F. If X ∈ R and X ∈ mB(R), then X is a random variable. If X : Ω → Rn , X ∈ mB(Rn ), then X is a random vector. Recall: X ∈ mB(R) means that X is a measurable function with respect to B(R). 10
  • 11. Example 1.2 – Defining a random variable Recall part ii) of Example 1.1, where we defined Ω = {ω : N → {H, T}} and our σ−algebra to be F = σ({ω(i) = T}, {ω(i) = H} : i ∈ N). We can define our random variable to be: Xi(ω) = 1 ω(i) = H 0 ω(i) = T Now Xi : Ω → {0, 1}, and it is a random variable. Proof. (that Xi is a random variable) Let F = {Ω, ∅, A, A } where A is the event that Heads occurs at ith toss (therefore A is when tails occurs). So X−1 (1) = A and X−1 (0) = A , which are both in F. We define a random variable Sn = n i=1 Xi, which is the number of heads obtained in n tosses. This is a random variable as well, since the sum of random variables is a random variable. From the notes (lecturer’s Measure Theory notes), it follows that lim n→∞ 1 n Sn = p ∈ F for any p ∈ [0, 1]. If p /∈ [0, 1], then we obviously get ∅. Definition 1.2 – Law of a random variable Given a random variable X on (Ω, F, P), the law of a random variable is the probability measure PX on (R, B(R)) given by: PX [B(R)] = P X−1 (B(R)) It is enough to know PX [(−∞, x)] = P[X ∈ (−∞, x)] ∀ x ∈ R (or Q). 11
  • 12. 2 Independence Definition 2.1 – Independence Let (Ω, F, P) be a probability space and let Gi ∈ F be sub−σ-algebras for i ∈ N. The family of sub−σ−algebras Gi, i ∈ N is independent if for any sequence of events, Gi ∈ Gi, i ∈ N, we have for a family {i1, i2, . . . , ik} ⊆ N of distinct indices: P k j=1 Gij = k j=1 P [Gij] (2.1) Remarks: 1. (2.1) has to hold for all finite subsets {i1, . . . , ik} ⊆ N. 2. If we have a finite family G1, . . . , Gn of sub−σ−algebras, the condition (2.1) collapses to: P n i=1 Gi = n i=1 P [Gi] where Gi ∈ Gi for i = 1, 2, . . . , m. 3. Random variables X and Y are independent if and only if σ(X), σ(Y ) are independent. We may wish to ask what is σ(X). While we know σ(X) = X−1 (B(R)), what does it mean intuitively? σ(X) can be intuitively thought of as “information we can obtain about the outcome of the random experiment by knowing X(ω), but not knowing ω”. For random variables X, Y , we have σ(X) and σ(Y ) independent if and only if: P[X ∈ A, X ∈ B] = P[X ∈ A] · P[X ∈ B] for A, B ∈ B(R) 4. Let E1, E2, . . . be events in F. They are independent if σ(E1), σ(E2) . . . are indepen- dent, where: σ(E1) = {Ω, ∅, E1, E1} σ(E2) = {Ω, ∅, E2, E2} ... = ... Proof. Prove that E1, E2, . . . are independent if and only if: P m j=1 Eij = m j=1 P [Eij] ∀ {i1, i2, . . . , im} (⇐) This follows from Definition 2.1, showing that σ(E1), σ(E2), . . . are independent, but remark 4 shows that this means E1, E2, . . . are independent. (⇒) Exercise 12
  • 13. 5. Pairwise independence does not imply independence. Example 2.1 – Example of above statement Take Ω = {1, 2, 3, 4}, F = 2Ω , A = {1, 2}, B = {1, 3}, C = {2, 3}, and define the prob- ability measure P[ω] = 1 4 for ω ∈ Ω. Note that A, B, A, C and B, C are independent, since: P[A ∩ B] = P[{1}] = 1 4 P[A] · P[B] = 1 2 · 1 2 = 1 4 P[A ∩ C] = P[{2}] = 1 4 P[A] · P[C] = 1 2 · 1 2 = 1 4 P[B ∩ C] = P[{3}] = 1 4 P[B] · P[C] = 1 2 · 1 2 = 1 4 However, A, B, C are not independent, since: P[A ∩ B ∩ C] = 0 = P[A] · P[B] · P[C] = 1 8 Theorem 2.1 See Probability with Martingales, Williams D.W. page 39. Let (Ω, F, P) be a probability space and sub−σ−algebras H, G ⊆ F be generated by π−systems I and J respectively. In other words: σ(I) = H, σ(J ) = G Then H and G are independent if and only if I and J are independent in the sense A ∈ I, B ∈ J ⇒ P[A ∩ B] = P[A] · P[B] Proof. Our goal is to prove: P[H ∩ G] = P[H] · P[G] ∀ H ∈ H, G ∈ G (2.2) Fix A ∈ I and consider the following two measures on F: F → P[F ∩ A] F → P[F] · P[A] These measures have equal mass given by P[A]. By assumption, we have that the two measures coincide on J . Therefore by Proposi- tion 0.1, we have P[F ∩ A] = P[F] · P[A] ∀ F ∈ σ(J ) = G (2.3) To show (2.2) , we define two measures. Fix G ∈ G and let F → P[G ∩ F] F → P[G] · P[F] These two measures coincide on the π−system I by (2.3). Hence as before, the two measure coincide on σ(I) = H. 13
  • 14. Remarks: 1. Let X, Y be random variables on (Ω, F, P). Then: X and Y are independent ⇔ P[X ∈ A, Y ∈ B] = P[X ∈ A] · P[Y ∈ B] ∀ A, B ∈ B(R) (by Theorem 2.1) ⇔ P[X ≤ x, Y ≤ y] = P[X ≤ x] · P[Y ≤ y] ∀ x, y ∈ R We claim this since {(−∞, x] : x ∈ R} is a π−system in B(R) which generates B(R). Hence {X ≤ x : x ∈ R} is a π−system in σ(X) which generates σ(X) since σ(X) = X−1 (B(R)). So the π−systems π(X), π(Y ) are independent implies σ(X), σ(Y ) independent. 2. Similarly, X1, . . . , Xn random variables are independent if and only if P [Xi ≤ xi : 1 ≤ i ≤ n] = n i=1 P [Xi ≤ xi] ∀ xi ∈ R, i = 1, 2, . . . 3. If X is independent of Y and X is independent of Z, then it does not follow that X is independent of (Y, Z). Example 2.2 Let X = IA, Y = IB and Z = IC. Let A = {2, 3}, B = {1, 2}, C = {1, 3} be subsets in Ω = {1, 2, 3, 4}, F = 2Ω . Recall that A, B independent, B, C independent. However X is not independent of (Y, Z). Definition 2.2 – Joint law (of random variables) Let X, Y be random variables on (Ω, F, P) and let B(R2 ) be the Borel σ−algebra on R2 . The joint law of X and Y is given by P(X,Y ) [A] = P [(x, y) ∈ A] ∀ A ∈ B(R2 ) Remarks: 1. Recall that: B(R2 ) = B(R) ⊗ B(R) (2.4) Prove (2.4), i.e. that B(R) ⊗ B(R) = σ (U × V : U, V ∈ B(R)) (2.4) implies that B(R2 ) is generated by the π−system {(−∞, x]×(−∞, y] : x, y ∈ R} 14
  • 15. Proposition 2.1 The following statements are equivalent: a) X and Y are independent b) PX,Y = PX ⊗ PY c) Define FXY (x, y) = P[X ≤ x, Y ≤ y] ∀ (x, y) ∈ R2 . Then FXY (xy) = FX(x)FY (y). Furthermore, if (x, y) has a density, i.e. there exists fXY :R2 → [0, ∞) such that: PX,Y [A] = A fXY (xy) dx ⊗ dy then statements a), b), c) are further equivalent to: d) fXY (xy) = fX(x)fY (y) ∀ x, y ∈ R2 where fX, fY are the densities of X and Y respec- tively. Remark: Note in d), the existence of fXY implies the existence of the densities of the factors X and Y and: fX(x) = R fXY (x, y) dy fY (y) = R fXY (x, y) dx Proof. Based on B(R2 ) = B(R) ⊗ B(R). This implies that B(R2 ) is generated by the π−system {(−∞, x] × (−∞, y] : x, y ∈ R}. Apply Theorem 2.1. 15
  • 16. 3 Tail σ−algebra and Kolmogorov’s 0 − 1 law Definition 3.1 – Tail σ−algebra Let {Fn : n ∈ N} be a collection of σ−algebras. A tail σ−algebra T is given by: T = n∈N Tn where Tn = σ (Fn, Fn+1, . . .) = σ k≥n Fk . Remarks: 1. T is a σ−algebra that depends on the tail events of a sequence of experiments where the outcome of the nth experiment is given by the σ−algebra Fn. 2. Note that the tail σ−algebra depends on the choice of {Fn : n ∈ N}. Example 3.1 Let X1, X2, . . . be a sequence of random variables on (Ω, F, P) and define Fn := σ(Xn). Then Tn = σ(Xn, Xn+1, . . .) ∀ n ∈ N and T = n∈N Tn. We define the following events: F1 = ∃ lim n→∞ Xn = ω ∈ Ω : lim n→∞ Xn(ω) exists F2 = n∈N Xn exists F3 = lim n→∞ X1 + X2 + . . . + Xn n exists F4 = n∈N Xn exists and n∈N Xn = 0 Then F1, F2, F3 are contained within the tail σ−algebra of the sequences X1, X2, . . .. Proof. It helps to intuitively think of tail events as those events whose ocurrence or not is not affected by altering any finite number of random variables in the sequence. Claim that F1 ∈ T = n∈N Tn. It is enough to show that F1 ∈ Tn ∀ n. This is clear because the limit of a sequence Xk, k ∈ N only depends on (Xn+k)k∈N ∀ n ∈ N. In other words, for a sequence to have a limit, we look at the tail of the sequence, i.e. we can first discard the first finitely many terms. Similarly, F2 ∈ Tn ∀ n ∈ N ⇒ F2 ∈ T . To show F3 ∈ T is slightly trickier. Let ξ = lim sup n→∞ Sn n . 16
  • 17. We need to show the following: i) ξ(ω) is well defined ∀ ω ∈ Ω and ξ ∈ mσ(X1, X2 . . .). ii) ξ ∈ mTn ∀ n. Consider i). We know that ξ(ω) exists in [−∞, ∞] since every sequence of real numbers has a lim sup (See Probability Theory Ex Sheet 1 Q1c). Recall that ξ = inf k∈N sup n≥k Sn n and hence: {ξ ≥ a} = sup n≥k Sn n ≥ a ∈ σ(Xi, i ∈ N) which implies ξ ∈ mσ(Xi, i ∈ N). Here we used the fact that {(−∞, a], a ∈ R} is a π−system which generates B(R). We now wish to prove ii) Let Sk := Sn+k − Sn = n+k i=n+1 Xi ∈ σ (Xn+1, . . . , Xn+k). Then: Sk k = Sn+k n + k · n + k k − Sn k lim k→∞ Sk k = Sn+k n + k · (1) − 0 = Sn+k n + k Therefore, we have lim sup k→∞ Sk k = lim sup k→∞ Sn+k n + k ∈ mTn. Now consider F4. We knew that F2 ∈ T because F2 does not depend on the first finitely many terms. F4 ∈ mG, where G = σ(X1, X2, . . .), but is not in the tail σ−algebra T . This is because the event given by F4 clearly depends on the value of X1 (and possibly the first finitely many terms). If X1 is different, Xi may or may not be 0. So F4 is not necessarily in T . Theorem 3.1 – Kolmogorov’s 0-1 Law Let {Fn : n ∈ N} be a sequence of independent sub−σ−algebras in (Ω, F, P). Then the tail σ−algebra T = n∈N Tn where Tn = σ(Fn+1, Fn+2, . . .) satisfies the following two properties: i) ∀ F ∈ T ⇒ P[F] = 0 or P[F] = 1 ii) ∀ random variables ξ ∈ mT ∃ c ∈ [−∞, ∞] such that P[ξ = c] = 1. Proof. We start by proving i). Define Hn := σ(F1, F2, . . . , Fn). 17
  • 18. Step 1: We claim that Hn and Tn are independent. We know that: In = n i=1 Fi : Fi ∈ Fi, i = 1, 2, . . . , n Jn = l i=1 Fn+i : Fn+i ∈ Fn+i, i = 1, 2, . . . l, l ∈ N Both are π−systems as they are closed under intersection. In generates Hn since Fi ⊆ In ∀ i = 1, . . . , n and In ⊆ Hn. Similarly, Jn is a π−system that generates Tn, since Jn ⊇ Fn+i ∀ i ∈ N and Jn ⊆ Tn. So it is clear that ∀ A ∈ In, ∀ B ∈ Jn ⇒ P[A ∩ B] = P[A] · P[B] since we have {Fk : k ∈ N} independent sub−σ−algebras. So Hn and Tn are independent. Step 2: We claim Hn and T are independent. Since T ⊆ Tn ∀ n ∈ N, then T is independent of Hn ∀ n. Step 3: We claim that T is independent of σ n∈N Hn . Since T is independent of Hn, then this implies that T is independent of n∈N Hn which further implies that T is independent of σ n∈N Hn by Theorem 2.1. Here, we have used the fact that n∈N Hn is a π−system since this is an increasing sequence of σ−algebras. Step 4: Claim that T is independent of T . Note that σ n∈N Hn = σ (Fi : i ∈ N) and hence T ⊆ σ n∈N Hn . So for F ∈ T ⊆ σ n∈N Hn , we must have that F is independent of itself. So: P[F] = P[F ∩ F] = (P[F])2 But (P[F])2 = P[F] for F ∈ [0, 1] → P[F] = 0 or P[F] = 1. We now prove part ii). By part i), we have P [ξ ≤ x] = 0 1 ∀ x ∈ R Let c := sup {x : P[ξ ≤ x] = 0}. Define sup ∅ == ∞, so c is well defined on [−∞, ∞]. Then there are three cases. 18
  • 19. If c = −∞, this implies that P [ξ ≤ x] = 1 ∀ x ∈ R ⇒ ξ = −∞ (P−a.s.) Similarly, if c = +∞, this implies that P [ξ ≤ x] = 0 ∀ x ∈ R ⇒ ξ = +∞ (P−a.s.) Suppose c ∈ R. We then have P ξ ≤ c − 1 n = 0 ∀ n, and hence: P n∈N ξ ≤ c − 1 n = lim n→∞ P ξ ≤ c − 1 n = P [ξ < c] = 0 We also have P ξ ≤ c + 1 n = 1 ∀ n, and hence: P n∈N ξ ≤ c + 1 n = P lim n→∞ ξ ≤ c + 1 n = P [ξ ≤ c] = 1 Therefore P [ξ = c] = 1 (P−a.s.) Definition 3.2 – Infinitely often (i.o.) Let (En)n∈N be a sequence of events in (Ω, F, P). The event that En happens for infinitely many n ∈ N is given by: lim sup n→∞ En := m∈N n≥m En = En i.o. (infinitely often) Definition 3.3 – Eventually (ev) Let (En)n∈N be a sequence of events in (Ω, F, P). The event that En happens for all n ≥ m for some m ∈ N is given as: lim inf n→∞ En := m∈N n≥m En = En ev (eventually) Remarks: 1. We can also write: lim sup n→∞ En = ω ∈ Ω, ∀ m ∈ N ∃n(ω) ≥ m s.t. ω ∈ En(ω) . 2. Similarly, lim inf n→∞ En = {ω ∈ Ω, ∃ m(ω) ∈ N s.t. ∀ n ≥ m(ω) we have ω ∈ En}. 3. (En i.o.) = En ev . To see this, note that m∈N n≥m En = m∈N n≥m En . 4. (En i.o.) , (En ev) ∈ T , where T is the tail σ−algebra of the family Fn = σ(Fn). To see this, recall that T = n∈N σ(Fn, Fn+1, . . .) and note that (En i.o.) ∈ σ(Fm, Fm+1) ∀ m ∈ N since (En i.o.) = k∈N n≥k En. This is because the sequence of events n≥k En k∈N is decreasing. 19
  • 20. Lemma 3.1 – The first Borel-Cantelli lemma Let (En)n∈N be a sequence of events in (Ω, F, P) and let n∈N P [En] < ∞. Then P [En i.o.] = P [lim sup En] = 0. Proof. We have lim sup n→∞ En = m∈N Am, where Am = m≥n En. Since Am ⊆ Am+1 ∀ m ∈ N, we find P [En i.o.] = lim m→∞ P [Am]. But note that: 0 ≤ P [Am] ≤ n≥m P[En] → 0 as m → ∞ So this concludes the proof. Remarks: 1. The first Borel-Cantelli Lemma is very important. It is for example used in the con- struction of Brownian motion. 2. Let (Ω, F, P) be a probability space and let Q be a probability measure on (Ω, F). We say that Q is absolutely continuous with respect to P(Q P) if ∀ F ∈ F such that P[F] = 0 ⇒ Q[F] = 0. We claim that if Q P, then ∀ > 0, ∃ δ > 0 s.t. ∀ F ∈ F with P[F] < δ ⇒ Q(F) < . Proof. We begin a proof by contradiction, by showing that the converse statement leads to a contradiction. Our converse statement is thus: ∃ > 0 s.t. ∀ δ, ∃ Fδ ∈ F s.t. P [Fδ] < δ and Q [Fδ] ≥ Hence ∀ n ∈ N, pick δn = 2−n and let Fn ∈ F satisfy P [Fn] < 2−n and Q[Fn] ≥ . Let F = lim sup n→∞ Fn. We have P[F] = 0 by Borel Cantelli Lemma 1 (Lemma 3.1) since n∈N P [Fn] < ∞. But Q[F] = lim m→∞ Q n≥m Fn ≥ ∀ m. This implies that Q[F] ≥ . But if P[F] = 0, then Q[F] = 0 as well since Q P. Hence we have a contradiction. 20
  • 21. Lemma 3.2 – The second Borel-Cantelli lemma Let (En)n∈N in (Ω, F, P) be a sequence of independent events with n∈N P [En] = ∞. Then P [En i.o.] = P [lim sup En] = 1. Proof. Note that (En i.o.) = m∈N n≥m En. So if we can show that this has probability 0, we are done. Then: P n≥m En = lim k→∞ P k n=m En by monotonicity of measure P[Ω] = 1 = lim k→∞ k n=m P En by independence of En = lim k→∞ k n=m (1 − P[En]) ≤ lim k→∞ e − k n=m P [En] by inequality 1 − x ≤ e−x ∀x = 0 This implies that P n≥m En = 0 ∀ m ∈ N. Therefore we have: P [En i.o.] = P m∈N n≥m En ≤ m∈N P m≥n En = 0 Therefore we are done. Remarks: 1. Let (En)n∈N be the sequence of independent events. Then P [lim sup En] is either 0 or 1 by Kolmogorov’s 0 − 1 Law. 2. Furthermore, P [lim sup En] = 1 ⇔ n∈N P[En] = ∞ by Borel Cantelli Lemma 2 (Lemma 3.2) . Example 3.2 z 1. Let X ∼ N(0, 1). Then the following inequality holds: f(x) x + x−1 < P[X > x] < f(x) x , f(x) = 1 √ 2π e−x2 2 ∀ x > 0 This follows by noting that: x ∞ x f(y) dy < ∞ x yf(y) dy and that f (x) = −xf(x) and d f(x) x dx = −f(x) 1 + 1 x2 . 21
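A quick numerical check of the tail bounds in part 1 of Example 3.2 (this sketch is not from the notes; the grid of x values is an arbitrary choice, and scipy's norm.sf is used for the exact tail probability P[X > x]).

```python
import numpy as np
from scipy.stats import norm

# Check of the standard normal tail bounds
#   f(x)/(x + 1/x) < P[X > x] < f(x)/x,   f(x) = exp(-x^2/2)/sqrt(2*pi),  x > 0.
xs = np.array([0.5, 1.0, 2.0, 3.0, 4.0])
f = norm.pdf(xs)
tail = norm.sf(xs)                       # P[X > x]
lower, upper = f / (xs + 1.0 / xs), f / xs
assert np.all(lower < tail) and np.all(tail < upper)
for x, lo, t, up in zip(xs, lower, tail, upper):
    print(f"x={x:3.1f}   {lo:.3e} < {t:.3e} < {up:.3e}")
```

The bounds become very tight as x grows, which is what makes them useful in parts 2 and 3 of the example.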
  • 22. 2. Let Xn ∼ N(0, 1) independent and let L = lim sup n→∞ Xn √ 2 log n . Show that P[L > 1] = 0. Proof. Define En(a) := Xn > (1 + a) √ 2 log n , a ∈ R. Note that L > 1 + 1 k ⊆ lim sup n→∞ En 1 2k ∀ k ∈ N. We want to show that P lim sup En 1 2k = 0 by Borel Cantelli Lemma 1 (Lemma 3.1) . P En 1 2k < 1 √ 2π exp{−1 2 1 + 1 2k 2 2 log n} 1 + 1 2k √ 2 log n using part 1 = 1 √ 2π 1 + 1 2k · exp − 1 + 1 2k 2 √ 2 log n Since n∈N n−α √ log n < ∞ for any α > 1, then Borel Cantelli Lemma 1 (Lemma 3.1) ⇒ P L > 1 + 1 k = 0. Hence {L > 1} = k∈N L > 1 + 1 k ⇒ P[L > 1] = 0. This is also equivalent to P [L ≤ 1] = 1. 3. Prove that P[L = 1] = 1. Proof. It is sufficient to show that P [L < 1 − ] = 0 ∀ > 0. Recall that {L < 1} = ∞ n=2 L < 1 − 1 n so if we can show {L < 1 − 1 n } has P = 0 a.s. then through our operations of countable union, we have {L < 1} has P = 0 a.s. We pick > 0, and consider the set: En(a) = Xn √ 2 log n > 1 + a Then {L < 1 − } = En(− ) ev = (En(− ) i.o.) . We want to show that P [lim sup En(− )] = 1. So we need to prove that n∈N P [En(− )] = ∞ by showing that P [En(− )] ≥ an for some sequence where an > 0 such that an = ∞ (using the inequalities in part 1). Exercise: Find such a sequence an. Note that En(− ) are independent since random variables Xn are independent. There- fore Borel Cantelli Lemma 2 (Lemma 3.2) ⇒ P [lim sup En(− )] = 1 ⇒ P [L < 1 − ] = 0. Exercise: Show that L ∈ mT . 22
• 23. Example 3.3
Let Xn ∼ N(0, 1) be independent random variables, and let Sn = X1 + . . . + Xn. Show that:
i) Sn/√n ∼ N(0, 1)
ii) lim inf Sn/n = lim sup Sn/n = 0 (which implies that lim Sn/n exists and equals 0)
Note that ii) is the strong Law of Large Numbers for N(0, 1) random variables.
For i): Sn is Gaussian, being a sum of independent Gaussians, and it is easy to check that E[Sn/√n] = 0 by properties of expectations. So all we need now is to check that Var[Sn/√n] = 1. This holds since Var[Sn/√n] = (1/√n)² Var[Sn] = (1/n) Σi=1,...,n Var[Xi] = 1.
For ii) we consider the set En = { |Sn| ≤ 2√(n log n) }, and claim that P[En ev] = 1. This claim is useful as it gives a bound on Sn, i.e. −2√(n log n) ≤ Sn ≤ 2√(n log n) for all large n with probability 1. Therefore we have:
−2√(n log n)/n ≤ Sn/n ≤ 2√(n log n)/n for large n (P−a.s.)
But as n → ∞ both bounds tend to 0, so Sn/n → 0 (P−a.s.), which proves ii).
It then suffices to prove the claim. To prove the claim, note that (En ev)ᶜ = (Eᶜn i.o.), where Eᶜn denotes the complement of En. We wish to apply Borel Cantelli Lemma 1 (Lemma 3.1), and hence we need to show that Σn∈N P[Eᶜn] is finite. We need to find an upper bound on P[Eᶜn] = P[ |Sn|/√n ≥ 2√(log n) ] ≤ an, say, such that Σ an < ∞. Exercise: Find this upper bound. We can then use Borel Cantelli Lemma 1 (Lemma 3.1) to show that P[Eᶜn i.o.] = 0. 23
  • 24. 4 Integration (Ω, F, µ) is a measure space, mF = {f : Ω → R s.t. f−1 (B(R)) ⊆ F}. The Lebesgue integral is first defined for f ∈ (mF)+ where: f ∈ (mF)+ ⇔ f ∈ mF and f ≥ 0 µ a.s. Let f = n i=1 aiIAi , AI ∈ F, ai ≥ 0 be a simple function with Ω f dµ = n i=1 aiµ(Ai). For general f ∈ (mF)+ , we find (fn)n∈N of simple functions such that fn(ω) f(ω) ∀ ω ∈ Ω. (Here, means that fn(ω) is a monotone increasing sequence which converges to f(ω)) We define: Ω f dµ = lim n→∞ Ω fn dµ (4.1) We need to check that (4.1) is a good definition, and hence need to check: i) lim exists in (4.1) (which is true since fn ≤ fn+1 ∀ n ⇒ Ω fn dµ ≤ Ω fn+1 dµ. ii) gn(ω) f(ω), gn simple functions, then ∀ ω ∈ Ω: Ω gn dµ −→ n→∞ Ω f dµ ii) In other words we need to check that the definition is independent of sequences (fn)n∈N. ii) Exercise: check this. Theorem 4.1 – Monotone convergence theorem Take f, fn ∈ (mF)+ such that fn(ω) ≤ fn+1(ω) ∀ n ∈ N, ω ∈ Ω and f(ω) = lim n→∞ fn(ω). Then we have Ω f dµ = lim n→∞ Ω fn dµ. Properties of Lebesgue Integral • (Linearity) - For a, b ≥ 0, g, h ∈ (mF)+ , then: Ω (ag + bh) dµ = a Ω g dµ + b Ω h dµ • f ∈ (mF)+ ⇒ Ω f dµ ≥ 0 (from (4.1)) since it is true for simple functions. 24
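The sketch below is not part of the notes: it illustrates the staircase construction fn ↗ f behind (4.1), under the illustrative assumptions f(x) = x² on [0, 1] and Lebesgue measure approximated by a fine uniform grid. The simple functions are the dyadic truncations used later (in the proof of Theorem 6.5) and their integrals increase to ∫ f dµ = 1/3.

```python
import numpy as np

# Dyadic "staircase" simple functions alpha_n(f) increasing to f, and their
# integrals increasing to the Lebesgue integral of f (assumed f(x) = x^2 on [0,1],
# Lebesgue measure approximated by averaging over a fine grid).
def alpha_n(y, n):
    # simple function: dyadic steps of size 2^-n, truncated at level n
    return np.minimum(np.floor(y * 2**n) / 2**n, n)

x = np.linspace(0.0, 1.0, 1_000_001)   # grid standing in for ([0,1], Lebesgue)
f = x**2
for n in [1, 2, 4, 8, 16]:
    print(n, alpha_n(f, n).mean())      # increases towards int_0^1 x^2 dx = 1/3
print("limit:", f.mean())
```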
  • 25. Definition 4.1 – Integrable We define: L1 (Ω, F, µ) =    f ∈ mF s.t. Ω f+ dµ, Ω f− dµ < ∞    where f+ = max{f, 0}, f− = max{−f, 0}. Then we say f ∈ mF is integrable. We have: • Ω f dµ := Ω f+ dµ − Ω f− dµ • |f| = f+ + f− Therefore we have: Ω f dµ = Ω f+ dµ − Ω f− dµ ≤ Ω f+ dµ + Ω f− dµ = Ω |f| dµ Lemma 4.1 – Fatou’s lemma Let (fn)n∈N be a sequence in (mF)+ . Then we have: Ω lim inf n→∞ fn dµ ≤ lim inf Ω fn dµ Proof. Recall that lim inf n→∞ fn = lim n→∞ gn where gn = inf{fn, fn+1, . . .}. Note that (gn)n∈N is non-decreasing and gn ≤ fn ∀ n ∈ N. By Monotone Convergence Theorem (Theorem 4.1), we have: Ω lim inf n→∞ fn dµ = Ω lim n→∞ gn dµ = lim n→∞ Ω gn dµ = lim inf n→∞ Ω gn dµ since if lim exist, then lim inf exists ≤ lim inf n→∞ Ω fn dµ using Ω f dµ ≥ Ω g dµ ∀ n ∈ N 25
  • 26. Theorem 4.2 – Dominated convergence theorem Let fn, f ∈ (mF) and assume that ∃ g ∈ L1 (Ω, F, µ) s.t. |fn| ≤ g ∀ n ∈ N and lim n→∞ fn(ω) = f(ω) ∀ ω ∈ Ω. Then Ω f dµ = lim n→∞ Ω fn dµ. Proof. Note that f ∈ L1 since |f| ≤ g. Here we use f+ ≤ g ⇒ Ω f+ dµ ≤ Ω g dµ, so f+ , f− ∈ L1 and bounded by g. We wish to show that Ω |f − fn| dµ → 0. Note that: 2g − |f − fn| Fn ≥ g − |f| + g − |fn| ≥ 0 ∀ n By Fatou’s lemma (Lemma 4.1) applied to (Fn)n∈N, we get: Ω lim inf Fn dµ ≤ lim inf Ω Fn dµ (4.2) We also know that the terms on the LHS and RHS of (4.2): Ω lim inf n→∞ Fn dµ = Ω 2g − lim inf n→∞ |f − fn| dµ Ω Fn dµ = Ω 2g dµ − Ω |f − fn|dµ = Ω 2g − 0 dµ = Ω 2g dµ Rearranging (4.2), we have: Ω 2g dµ ≤ Ω 2g dµ − lim sup n→∞ Ω |f − fn| dµ ≥0 This implies that lim sup Ω |f − fn| dµ = 0. We know that lim sup n→∞ Ω |fn−f| dµ = 0 ⇒ lim n→∞ Ω |fn−f| dµ = 0 since lim inf n→∞ Ω |fn−f| dµ = 0 as well as |f − fn| is non-negative and lim inf ≤ lim sup. Therefore Ω f dµ − Ω fn dµ ≤ Ω |f − fn| dµ → 0. 26
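A numerical contrast (not from the notes) between a dominated and an undominated sequence, with Lebesgue measure on (0, 1) approximated by a fine grid: gn(x) = xⁿ is dominated by the constant 1 and its integrals converge to 0 = ∫ lim gn dµ, while the classical example fn = n·I(0,1/n) converges to 0 pointwise but keeps ∫ fn dµ = 1, so the conclusion of the Dominated Convergence Theorem fails without an integrable dominating g.

```python
import numpy as np

# Dominated vs. undominated convergence on (0,1) (Lebesgue measure approximated
# by a fine grid): g_n = x^n is dominated by 1, f_n = n * 1_{(0,1/n)} is not.
x = np.linspace(0.0, 1.0, 2_000_001)[1:-1]   # interior points of (0,1)
for n in [1, 10, 100, 1000]:
    g_n = x**n                               # dominated by g = 1, integral -> 0
    f_n = n * (x < 1.0 / n)                  # no integrable dominating function
    print(n, g_n.mean(), f_n.mean())         # second column stays near 1
```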
  • 27. 5 Expectations We take (Ω, F, P) to be our probability space, and X a random variable, which implies that X ∈ mF. If X ≥ 0, then E[X] = Ω X dP = Ω X(ω) P[dω]. For X ∈ mF, we say X ∈ L1 (Ω, F, P) if E [X+ ] , E [X− ] < ∞ where: X+ = max{X, 0}, X− = max{−X, 0} Then expectation of X is given by E[X] = E [X+ ] − E [X− ]. Proposition 5.1 Let X be a random variable on (Ω, F, P) and let g : F → R be Borel measurable. Then g(X) is in L1 (Ω, F, P) ⇔ g ∈ L1 (R, B, PX ) where PX [A] = P[X ∈ A] ∀ A ∈ B(R). Then we have: E [g(X)] = R g(x)PX [dx] (5.1) Remarks: 1. If X is a continuous random variable, i.e. PX γL = Lebesgue measure ⇔ PX [A] = A fX(x) dx, then by Proposition 5.1, we have E [g(X)] = R g(x)fX(x) dx. 2. If X is a discrete random variable, e.g. X ∈ N with probability 1, then E [g(X)] = k∈N g(k)P[X = k] where PX [k] = P[X = k]. Proof. of Proposition 5.1. Here we want to show that this holds, starting from indicator random variable, simple random variable, non-negative random variable, to all random variables. Let g = IA, A ∈ B(R). Then (5.1) holds, since E [IA(x)] = P[X ∈ A] = PX [A]. (Indicator random variables) By linearity of integrals, and that simple random variables are finitely weighted sums of indicator functions, then (5.1) holds for simple random variables as well. (Simple random variables) Assume g ≥ 0 and let 0 ≤ gn g be a sequence of simple random variables that is monotonic and converges to g. We have E [gn(X)] = R gn(X)PX [dx]. Then Monotone Convergence Theorem (Theorem 4.1) implies (5.1) for g ≥ 0. This is because gn(X) is a simple random variable on Ω, lim n→∞ gn(x) = g(x), gn(x) g(x), so Monotone Convergence Theorem (Theorem 4.1) tells us that E [g(X)] = lim n→∞ E [gn(X)] and R gndPX R g dPX . So (5.1) holds for non-negative random variables. (Non-negative random variables) 27
• 28. Lastly, take g ∈ L1(R, B, PX), and apply (5.1) to g⁺ and g⁻. Then by linearity of the integral, (5.1) holds for all integrable g. (All random variables)
Lemma 5.1
X ∈ (mF)⁺ and E[X] = 0 ⇒ P[X = 0] = 1 (⇔ P[X > 0] = 0)
Proof. Note that {X > 0} = ∪n∈N {X > 1/n}. We attempt a proof by contradiction, and assume that P[X > 0] > 0. Then there exists n ∈ N s.t. P[X > 1/n] > 0, since otherwise countable subadditivity would give P[X > 0] = 0. Then we have:
E[X] = ∫Ω X dP = ∫Ω X·I{X>1/n} dP + ∫Ω X·I{X≤1/n} dP ≥ ∫Ω X·I{X>1/n} dP ≥ ∫Ω (1/n)·I{X>1/n} dP = (1/n) P[X > 1/n] > 0
which is a contradiction. Therefore P[X > 0] = 0, i.e. P[X = 0] = 1. 28
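Referring back to Remark 1 after Proposition 5.1, the following sketch (not part of the notes; the choices X ∼ N(0, 1) and g(x) = x² are illustrative) compares the Monte Carlo estimate of E[g(X)] with the integral ∫ g(x) fX(x) dx computed by quadrature.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# E[g(X)] two ways, for the assumed choice X ~ N(0,1), g(x) = x^2:
# a sample average of g(X) versus  int g(x) f_X(x) dx , both close to 1.
rng = np.random.default_rng(1)
X = rng.standard_normal(1_000_000)
mc = np.mean(X**2)
quadrature, _ = quad(lambda x: x**2 * norm.pdf(x), -np.inf, np.inf)
print(mc, quadrature)
```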
  • 29. 6 Inequalities Definition 6.1 – Convex function (in R) A function f : I → R, where I ⊆ R is an interval (either open or closed) is convex if ∀ p ∈ (0, 1) and x, y ∈ I, we have f(px + (1 − p)y) ≤ pf(x) + (1 − p)f(y). If f is a convex function, then f is continuous. Exercise: Prove by contradiction. Example 6.1 – Examples of convex functions 1. x → |x| 2. x → x2 3. x → eθx ∀ θ ∈ R Example 6.2 – Examples of non-convex functions 1. x → −|x| (concave function) 2. x → sin x (neither convex or concave) Exercise: Prove this. Proposition 6.1 If f is both convex and concave on R, there exists a, b ∈ R such that f(x) = ax + b ∀x ∈ R. Exercise: Prove this. Exercise: Prove that a concave function is continuous (this follows from a convex function is continuous). Proposition 6.2 If f : I → R is in C2 (I), then f is convex ⇔ f (x) ≥ 0 ∀ x ∈ I. Proof. ⇒ Using Taylor’s theorem, we can expand: f(x + ) = f(x) + f (x) + 2 2 f (ξx), where ξx ∈ (x, x + ) and f(x − ) = f(x) − f (x) + 2 2 f (ξx), where ξx ∈ (x, x + ) Then we can write: f (x) = f(x + ) + f(x − ) − 2f(x) 2 (6.1) as when 0, then ξx → x. Assume f is convex. Then we can write x = p(x − ) + (1 − p)(x + ) where p = 1 2 . 29
  • 30. By convexity, we can write: f(x) = f 1 2 (x − ) + 1 2 (x + ) ≤ 1 2 f(x − ) + 1 2 f(x + ) This gives f(x+ )+f(x− )−2f(x) ≥ 0, and since 2 > 0, this implies that (6.1) (f (x)) ≥ 0. ⇐ Exercise Theorem 6.1 – Markov’s inequality Let (Ω, F, P) be a probability space. Then take X ∈ mF, and g : I → [0, ∞] a non- decreasing B−measurable function where I ⊆ R is an interval such that P[X ∈ I] = 1. Then g(c) · P[X ≥ c] ≤ E [g(X)] ∀ c ∈ I. Note here that E [g(X)] exists (which may be +∞) since g(X) ∈ (mF)+ . Proof. g(c) · P[X ≥ c] = E [g(c) · IX≥c] ≤ E [g(X) · IX≥c] since on {X ≥ c} we have g(X) ≥ g(c) as g non-decreasing ≤ E [g(X)] this holds since g(X) ≥ 0. Example 6.3 – Examples of using Markov’s inequality Suppose x ∈ mF, > 0. Then: P [|x| ≥ ] ≤ E [|x|] (6.2) and P [|x| ≥ ] ≤ E [x2 ] 2 (6.3) (6.2) follows by applying Markov’s inequality (Theorem 6.1) to the random variable |X| and having g : [0, ∞] → [0, ∞], with x → x. (6.3) follows by applying Markov’s inequality (Theorem 6.1) to the random variable |X| and having g : [0, ∞] → [0, ∞], with x → x2 . (6.3) is also known as Chebyshev’s inequality. Theorem 6.2 – Jensen’s inequality Let (Ω, F, P) be a probability space. Let X be a random variable such that P[X ∈ I] = 1, where I ⊆ R is an interval. Let g : I → R be a convex function such that E [g(X)] < ∞ and E [|x|] < ∞. Then g (E [X]) ≤ E [g(X)]. Proof. Since g is convex, we have g(x) = sup n∈N {anx + bn} ∀ x ∈ I and some sequences (an)n∈N, (bn)n∈N. Hence: g(X) ≥ anX + bn ∀ n ∈ N ⇒ E [g(X)] ≥ anE[X] + bn ⇒ E [g(X)] ≥ sup n∈N {anE[X] + bn} = g (E[X]) 30
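A Monte Carlo check of Example 6.3 (this is an added sketch, not part of the notes; the choice X ∼ Exponential(1) and the values of ε are illustrative): the empirical tail probability should sit below both the Markov bound (6.2) and the Chebyshev-type bound (6.3).

```python
import numpy as np

# Markov (6.2) and Chebyshev-type (6.3) bounds checked by simulation,
# assuming X ~ Exponential(1), so E|X| = 1 and E[X^2] = 2.
rng = np.random.default_rng(2)
X = rng.exponential(scale=1.0, size=1_000_000)
for eps in [1.0, 2.0, 4.0]:
    p = np.mean(np.abs(X) >= eps)
    markov = np.mean(np.abs(X)) / eps
    cheby = np.mean(X**2) / eps**2
    print(eps, p, markov, cheby)
    assert p <= markov + 1e-6
```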
  • 31. Remarks: 1. If assumptions of Jensen’s inequality (Theorem 6.2) hold and g is concave, then we get the inequality g (E[X]) ≥ E [g(X)]. 2. If a random variable X takes two values, x, y ∈ I with p = P[X = x] and 1 − p = P[X = y], then Jensen’s inequality (Theorem 6.2) is just the definition of convexity of g. i.e. g (px + (1 − p)y) ≤ pg(x) + (1 − p)g(y). 3. Under assumptions above, we have E[X] ∈ I. Exercise: Prove this. Hint: If I = (a, b), then P[X < b] = 1 and P[X > a] = 1. We have E[X] < E[b] = b. Hence g (E[X]) is well defined. Definition 6.2 – Lp space We define Lp (Ω, F, P) to be {X ∈ mF : E [|x|p ] < ∞}. p = 1, 2 are the most common, but for p ≥ 1 we get a vector space. Proof. (p ≥ 1) is a vector space) We first note that: (x + y)p ≤ (2 max{x, y})p ≤ 2p (xp + yp ) ∀ x, y ≥ 0 (6.4) Take X, Y ∈ Lp . We need to show that E [|αX + βY |p ] < ∞ for α, β ∈ R. So: E [|αX + βY |p ] ≤ E [(|αX| + |βY |)p ] by inequality ≤ 2p (E [|α|p |X|p ] + E [|β|p |Y |p ]) using (6.4) < ∞ Definition 6.3 – x p We define x p := (E [|X|p ]) 1 p for X ∈ Lp , p ≥ 1. Note that this is not a norm - the first property fails. Theorem 6.3 – Cauchy-Schwarz inequality Take X, Y ∈ L2 . Then XY ∈ L1 and: |E[XY ]| ≤ E [|XY |] ≤ E[X2 ]E[Y 2 ] 1 2 Furthermore, we have equality if there exists a, b ∈ R s.t. |a| + |b| > 0 and aX + bY = 0 (P−a.s.) Proof. We first note that: 0 ≤ (X + λY )2 ∀ λ ∈ R (6.5) and X + λY ∈ L2 . 31
  • 32. Hence XY = 1 2 [(X + Y )2 − X2 − Y 2 ] ∈ L1 . From (6.5), we have: 0 ≤ X2 + 2λXY + λ2 Y 2 ⇒ E[0] ≤ E[X2 ] + 2λE[XY ] + λ2 E[Y 2 ] ∀ λ ∈ R We can differentiate the above function and find λ which gives the minimum value, which is λ = − E[XY ] E[Y 2] . Note that if E[Y 2 ] = 0 ⇒ Y = 0 (P−a.s.), then Cauchy-Schwarz inequality (Theorem 6.3) holds. So WLOG, we assume Y = 0. Substituting the value of λ, we get: 0 ≤ E[X2 ] − 2 E[XY ]2 E[Y 2] + E[XY ]2 E[Y 2] ⇒ E[XY ]2 ≤ E[X2 ]E[Y 2 ] This satisfies our theorem. However, if we have equality, then we know that: 0 = E (X + λY )2 for λ = E[XY ] E[Y 2] This implies that X + λY = 0 (P−a.s.) If Y is 0 (P−a.s.), then E[X2 ] = 0, and therefore X = 0 (P−a.s.) As L2 (Ω, F, P) is a vector space, and we define the ‘inner product’: < X, Y > = E[XY ]. This is well defined since X, Y ∈ L2 ⇒ XY ∈ L1 . Then the Cauchy-Schwarz inequality takes the form: | < X, Y > | ≤ X 2 Y 2 where X 2 = E [|X|2 ] 1 2 . Note that the inequality X+Y 2 ≤ X 2+ Y 2 holds for X, Y ∈ L2 by Cauchy-Schwarz inequality (Theorem 6.3). Proof. X + Y 2 2 = E [(X + Y )2 ] = E[X2 ] + E[Y 2 ] + 2E[XY ] ≤ E[X2 ] + E[Y 2 ] + 2E[X2 ] 1 2 E[Y 2 ] 1 2 by applying Cauchy-Schwarz inequality = ( X 2 + Y 2)2 32
  • 33. Theorem 6.4 – Monotonicity of Lp norms Given X ∈ Lp (Ω, F, P), p ≥ 1; X p = E [|X|p ] 1 p , then for 1 ≤ p ≤ r < ∞, we have for any Y ∈ Lr (Ω, F, P), Y p ≤ Y r. In particular, Lr ⊆ Lp . Proof. Note that g(x) = x r p is convex on [0, ∞). Then we have: g (E [|Y |p ]) ≤ E [g (|Y |p )] by Jensen’s inequality ⇒ E [|Y |p ] r p ≤ E [|Y |r ] ⇒ Y p ≤ Y r by taking rth root on both sides Remark: Note that Theorem 6.4 holds for probability measures only. Exercise: Find f ∈ L2 (R, B, γL) such that f /∈ L1 (R, B, γL). Recap of Definitions from probability: •Cov[X, Y ] = E [(X − E[X])(Y − E[Y ])] is well defined • Var[X] = Cov[X, X] • |Cov[X, Y ]| ≤ Var[X]Var[Y ] (by Cauchy-Schwarz inequality) • ρ(X, Y ) = Cov[X, Y ] Var[X]Var[Y ] ∈ [−1, 1] (correlation between 2 random variables) These concepts are well defined if X, Y ∈ L2 . Theorem 6.5 – Independence If random variables X, Y ∈ L1 (Ω, F, P) and X and Y are independent, then XY ∈ L1 (Ω, F, P). Furthermore E[XY ] = E[X]E[Y ]. Remarks: 1. Let f, g : R → R ∈ mB and (independent) X, Y as in Theorem 6.5. Then if E [f(X)] , E [g(Y )] are finite, we have: E [f(X)g(Y )] = E [f(X)] E [g(Y )] (6.6) Exercise: Prove that X, Y independent ⇒ f(X), g(Y ) independent. Use the fact that f(X), g(Y ) ∈ mF. To prove (6.6), we apply Theorem 6.5 to f(X) and g(Y ). 2. If X, Y are independent in L2 ⇒ Cov[X, Y ] = 0. So: Cov[X, Y ] = E [(X − E[X])(Y − E[Y ])] = E [X − E[X]] E [Y − E[Y ]] = 0 33
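A small simulation (not in the notes; the choice of two independent N(0, 1) samples and the dependent pair (X, X + Z) is illustrative) of Theorem 6.5 and Remark 2: for an independent pair the product of expectations matches the expectation of the product and the covariance is near zero, while a dependent pair shows a clearly nonzero covariance.

```python
import numpy as np

# Monte Carlo check of Theorem 6.5 and Remark 2 for assumed X, Z iid N(0,1).
rng = np.random.default_rng(3)
X, Z = rng.standard_normal(1_000_000), rng.standard_normal(1_000_000)
print(np.mean(X * Z), np.mean(X) * np.mean(Z))   # both near 0: E[XZ] = E[X]E[Z]
print(np.cov(X, Z)[0, 1])                        # near 0 for the independent pair
print(np.cov(X, X + Z)[0, 1])                    # near Var[X] = 1 for a dependent pair
```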
  • 34. Example 6.4 Take E[X] = 0, E[|X|3 ] < ∞. In other words, X ∈ L3 and E[X] = 0. If E[X3 ] = 0, then Cov[X, X2 ] = 0 and X and X2 are not independent. Prove this. Proof. Sketch proof of Theorem 6.5 Exercise: Write out full proof. Note that it is enough to prove theorem for X, Y ∈ L1 ∩ (mF)+ since X = X+ − X− , Y = Y + − Y − . Note that X+ and Y + are independent since X+ and Y + are some functions of X, Y (max{X, 0}, max{Y, 0}) and use linearity of expectation. Assume X, Y ≥ 0 and note that α(n) (X) X, α(n) (Y ) (Y ) ∀ ω ∈ Ω for α(n) : R → R given by: α(n) (x) :=    0 x = 0 (i − 1)2−n (i − 1)2−n < x ≤ i2−n ≤ r, i ∈ N n x > n Then note that: (Show this as an exercise) 1. α(n) (X) is a simple function. 2. α(n) (X), α(n) (Y ) are independent. 3. E α(n) (X)α(n) (Y ) = E α(n) (X) E α(n) (Y ) ∀ n. 4. α(n) (X)α(n) (Y ) XY as n → ∞. 5. Theorem 6.5 follows by Monotone Convergence Theorem (Theorem 4.1) on 3. 34
  • 35. 7 Convergence of Random Variables Let (Xn)n∈N be a sequence of random variables on (Ω, F, P). Definition 7.1 – Converge almost surely The sequence (Xn)n∈N converges to a random variable X almost surely if the set: lim n→∞ Xn = X = ω ∈ Ω lim n→∞ Xn(ω) = X(ω) has probability 1, i.e. P lim n→∞ Xn = X = 1. Definition 7.2 – Converges in probability The sequence (Xn)n∈N converges in probability to a random variable X if: ∀ > 0, lim n→∞ P [|Xn − X| > ] = 0 Definition 7.3 – Converges in Lp Let Xn ∈ Lp (Ω, F, P), p ≥ 1 ∀ n. Then the sequence (Xn)n∈N converges in Lp to a random variable X ∈ Lp if E [|Xn − X|p ] −−−−→ n→∞ 0. Notation: Xn · p −→ X. Definition 7.4 – Cauchy (in Lp ) A sequence (Xn)n∈N is Cauchy in Lp if: ∀ > 0 ∃ N ∈ N s.t. Xn − Xm p < ∀ n, m > N Definition 7.5 – Converges in distribution The sequence (Xn)n∈N converges in distribution to a random variable X if: lim n→∞ P [Xn ≤ x] = P [X ≤ x] ∀ x ∈ R such that the cdf FX(y) = P [X ≤ y] is continuous. Convergence in distribution is also known as weak convergence. Notation: Xn d −→ X or Xn w −→ X. Remarks: 1. Note that if Xn d −→ X, then the random variables (Xn)n∈N, X need not be defined on the same probability space. For other modes of convergence, (Xn)n∈N, X have to be defined on the same probability space. 2. (Xn)n∈N in Lp is Cauchy if and only if sup n,m≥N Xn − Xm −−−−→ n→∞ 0. 35
  • 36. Lemma 7.1 Convergence in probability implies almost sure convergence along the subsequence. In other words, if (Xn)n∈N converges in probability to X, with (Xn)n∈N, X ∈ (mF) (on probability space (Ω, F, P), then there exists a subsequence (Xkn )n∈N s.t. Xkn a.s. −→ X. Proof. Idea of proof examinable Let ( n)n∈N be a decreasing sequence of positive real numbers such that n 0. Then ∀ n ∈ N, ∃ kn ∈ N s.t. P [|Xkn − X| > n] < 2−n (since Xn P −→ X). WLOG, we can assume that kn < kn+1 ∀ n ∈ N. Now we prove that (Xkn )n∈N tends to X almost surely, using Borel Cantelli Lemma 1 (Lemma 3.1). Note that ∀ ω ∈ Ω, we have: (Xkn (ω))n∈N converges to X(ω) ⇔ ω ∈ m∈N lim inf n→∞ |Xkn − X| ≤ 1 m (7.1) Fix m, then note: lim inf n→∞ |Xkn − X| ≤ 1 m ⊇ lim inf n→∞ {|Xkn − X| ≤ n} since n 0 = lim sup n→∞ {|Xkn − X| > n} Now: n∈N P [|Xkn − X| > n] < ∞ ⇒ P lim sup n→∞ {|Xkn − X| > n} = 0 by Borel Cantelli Lemma 1 (Lemma 3.1) ⇒ P lim inf n→∞ |Xkn − X| ≤ 1 m = 1 ∀ m ∈ N ⇒ P m∈N lim inf n→∞ |Xkn − X| ≤ 1 m = 1 ⇒ P [{Xkn → X}] = 1 by (7.1) Remark: If (Xn)n∈N, X are random variables on (Ω, F, P) and Xn ∈ mG ∀ n ∈ N, where G ⊆ F, then if Xn P −→ X, we also have X ∈ mG. Proof. By Lemma 7.1, ∃ subsequence (Xkn )n∈N s.t. Xkn a.s. −−−−→ n→∞ X ⇒ X ∈ mG since Xkn ∈ mG ∀ n ∈ N. 36
  • 37. Proposition 7.1 A sequence of random variables (Xn)n∈N converges to X in distribution (or converges weakly) if and only if lim n→∞ E [f(Xn)] = E [f(X)] ∀ f : R → R continuous and bounded. Proof. (⇒) Let Fn(x) = P[Xn ≤ x], F(x) = P[X ≤ x] be the cdf of Xn and X respectively. Let ([0, 1], B, γL) be a probability space and define random variables: Yn(ω) := inf {z ∈ R : ω ≤ Fn(z)} ∀ ω ∈ [0, 1] Y (ω) := inf {z ∈ R : ω ≤ F(z)} ∀ ω ∈ [0, 1] Exercise: Show that Yn(ω), Y (ω) ∈ mB. Note: Yn(ω) ≤ y ⇔ ω ≤ Fn(y) ∀ y ∈ R. Exercise: Show this. Therefore: γL (Yn ≤ y) = Fn(y) = P [Xn ≤ y] and E [f(Xn)] = E [f(Yn)]. A similar equality holds for X and Y . Now: Xn d −→ X ⇒ Fn(x) → F(x) ∀ x ∈ R {points of discontinuity of F} ⇒ Yn → Y γL a.s. Hence E [f(Yn)] −−−−→ n→∞ E [f(Y )] = E [f(X)] by Dominated Convergence Theorem (Theo- rem 4.2) since f(Yn) → f(Y ) as f is continuous and |f(Yn)| ≤ sup x∈R |f(x)| < ∞. ⇐ as homework. Theorem 7.1 – Modes of Convergence The implications between modes of converges of random variables are: a) almost sure convergence implies convergence in probability. b) Lp convergence (for p ≥ 1) implies convergence in probability. c) convergence in probability implies convergence in distribution. Proof. a) Let (Xn)n∈N converge almost surely to X, and pick > 0. We need to prove that P [|Xn − X| > ] −−−−→ n→∞ 0. Let An := {|Xn − X| > } and note that P [An i.o.] = P [Xn does not converge to X] = 0 by our initial assumption. Then: 37
  • 38. 0 = P [An i.o.] = P lim sup n→∞ An = P m∈N Bm where Bm = n≥m An = lim m→∞ P [Bm] since Bm ⊇ Bm+1 ∀ m ∈ N = inf m∈N P[Bm] since (P[Bm])m∈N decreasing ≥ inf m∈N sup n≥m P[An] since P[Bm] ⊇ P[An] ∀ n ≥ m = lim m→∞ sup n≥m P[An] ≥ 0 This implies that lim m→∞ P[Am] = 0. b) Let (Xn)n∈N converge in Lp to X. Pick > 0, then we apply Markov’s inequality (Theorem 6.1) to f(x) = xp ; f : R+ → R+ to get: 0 ≤ p P [|Xn − X| > ] ≤ E [|Xn − X|p ] Hence lim n→∞ P [|Xn − X| > ] = 0 since lim n→∞ E [|Xn − X|p ] = 0 by assumption. c) Let Xn P −→ X, and pick f : R → R continuous and bounded. We need to check that E [f(Xn)] −−−−→ n→∞ E [f(X)]. We argue by contradiction. Contrapositive statement: ∃ > 0 and an increasing subsequence (kn)n∈N, kn ∈ N s.t. |E [f (Xkn )] − E [f(X)]| > . We denote Yn := Xkn ∀ n ∈ N. Then note that: Xkn P −→ X ⇒ ∃ subsequence of (Yn)n∈N , say (Yln )n∈N s.t. Yln −−−−→ n→∞ X a.s. by Lemma 7.1 ⇒ f (Yln ) −−−−→ n→∞ f(X) a.s. as f continuous ⇒ lim n→∞ E [f (Yln )] = E [f(X)] by Dominated Convergence Theorem (Theorem 4.2) as f bounded ⇒ |E [f (Yln )] − E [f(X)]| < ∀ n ≥ N0 ∈ N This is a contradiction. 38
  • 39. Corollary 7.1 Xn → X in probability if and only if every subsequence (Xkn )n∈N of (Xn)n∈N has a further subsequence that converges almost surely to X. Proof. (⇒) This follows from Lemma 7.1 since Xkn P −→ X. (⇐) We will prove the negation of the statement. Assume that (Xn)n∈N does not converge to X in probability, and: ∃ , δ > 0 and k : N → N s.t. P[|Xk(n) − X| > Ak(n) ] ≥ δ ∀ n ∈ N Then no subsequence of Xk(n) n∈N converges to X almost surely. Let l : N → k(N) be an increasing function. We must show that this subsequence Xl(n) n∈N of Xk(n) n∈N does not converge to X almost surely. Note that: P Al(n) i.o. = P lim sup n→∞ Al(n) ≥ lim sup n→∞ P Al(n) by Fatou’s lemma (Lemma 4.1) ≥ δ > 0 Therefore the negation is proved. Corollary 7.2 – Continuous mapping theorem Let (Xn)n∈N converge to X in probability (or respectively converge to X in distribution), and let f : R → R be a continuous function. Then (f(Xn))n∈N converges to f(X) in probability (or respectively in distribution). Proof. Converges in probability By Corollary 7.1, since f(Xn) P −−−−→ n→∞ f(X) if and only if every subsequence f Xk(n) n∈N has a further subsequence that tends to f(X) almost surely because f is continuous. Converges in distribution By Proposition 7.1, since f(Xn) d −−−−→ n→∞ f(X) if and only if E [g (f(Xn))] −−−−→ n→∞ E [g (f(X))] for every g : R → R that is continuous and bounded. 39
  • 40. Theorem 7.2 – Weak law of large numbers Let Yn ∈ L2 and {Yn : n ∈ N} independent and let µ = E[Yn] identically distributed. µ is finite since L2 ⊂ L1 . Define Xn = 1 n n i=1 Yi. Then Xn P −−−−→ n→∞ µ. Proof. For every > 0, we have: 2 P [|Xn − µ| ≥ ] ≤ E [(Xn − µ)2 ] by Markov’s inequality (Theorem 6.1) = Var [Xn] = 1 n2 Var n i=1 Yi = Var [Y1] n → 0 as n → ∞ Therefore Xn P −−−−→ n→∞ µ. Theorem 7.3 – Strong law of large numbers Let Yn ∈ L4 and {Yn : n ∈ N} independent and let µ = E[Yn] identically distributed. µ is finite since L4 ⊂ L1 . Then Xn−−−−→ n→∞ µ almost surely. Proof. Without loss of generality, we assume µ = 0. Otherwise, we could consider Yn = Yn − µ. Then: E [X4 n] = 1 n4 n i=1 E Y 4 i + 6 n 1≤i<j≤n E Y 2 i Y 2 j ≤ 1 n4 nc + 6 n(n − 1) 2 E Y 2 1 Y 2 2 for some constant c > 0 ≤ d n2 for some constant d > 0 Thus this implies E ∞ n=1 X4 n ≤ ∞ n=1 d n2 < ∞. Therefore Xn−−−−→ n→∞ 0 almost surely. Theorem 7.4 – Completeness of Lp The space Lp (Ω, F, P) is complete for any p ≥ 1. In other words, any Cauchy sequence (Xn)n∈N in Lp has a limit in Lp . In other words, there exists a random variable X in Lp such that Xn · p −→ X. Proof. Exercise 40
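A simulation of the law of large numbers in Theorems 7.2 and 7.3 (an added sketch, not part of the notes; the choice Yi ∼ Uniform(0, 1) with µ = 1/2 and the sample sizes are illustrative): the running sample mean drifts towards µ.

```python
import numpy as np

# Sample means X_n = (1/n) sum_{i<=n} Y_i for assumed Y_i ~ Uniform(0,1), mu = 1/2.
rng = np.random.default_rng(4)
Y = rng.uniform(size=1_000_000)
running_mean = np.cumsum(Y) / np.arange(1, Y.size + 1)
for n in [10, 1_000, 100_000, 1_000_000]:
    print(n, running_mean[n - 1])
print("deviation at n = 10^6:", abs(running_mean[-1] - 0.5))
```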
  • 41. Remarks: 1. Proof of this theorem uses Borel Cantelli lemma, Fatou’s lemma, etc. See notes for details. 2. If p = 2, we define < X, Y >= E[XY ] where X, Y ∈ L2 . Pythagoras Theorem says: If X, Y ∈ L2 satisfy < X, Y >= 0 (i.e. they are orthogonal), then: X + Y 2 2 = X 2 2 + Y 2 2 (7.2) where X 2 = √ < X, X > = (E[X2 ]) 1 2 . Proof. X + Y 2 2 = < X + Y, X + Y > = X 2 2 + Y 2 2 + 2 < X, Y > 0 3. In probabilistic language, if < X, Y >= 0 and E[X] = E[Y ] = 0, then Cov[X, Y ] = 0. Furthermore, Var[X + Y ] = Var[X] + Var[Y ]. This is equivalent to (7.2). 4. Parallelogram Law: 1 2 X + Y 2 2 + X − Y 2 2 = X 2 2 + Y 2 2 ∀ X, Y ∈ Lp Proof. Exercise Theorem 7.5 Let (Ω, F, P) be a probability space and G ⊆ F a sub−σ−algebra of F. Then L2 (Ω, G, P) is a complete subspace of L2 (Ω, F, P) and ∀ X ∈ L2 (Ω, F, P), there exists Y ∈ L2 (Ω, G, P) such that the following holds: i) X − Y 2 = inf { X − Z 2 : Z ∈ L2 (Ω, G, P)} ii) E [(X − Y )Z] = 0 ∀ Z ∈ L2 (Ω, G, P) Furthermore, i) and ii) are equivalent and Y ∈ L2 (Ω, G, P) satisfies i) or ii) if and only if Y = Y (P−a.s.) Note that ii) ⇔< (X − Y, Z >= 0 ∀Z ∈ L2 (Ω, G, P). Proof. We need to show that L2 (Ω, G, P) is complete. Then: Take (Xn)n∈N in L2 (Ω, G, P) Cauchy ⇒ (Xn)n∈N is Cauchy in L2 (Ω, F, P) ⇒ Xn · 2 −→ X ∈ L2 (Ω, F, P) ⇒ Xn P −→ X by Theorem 7.1 ⇒ ∃ subsequence Xkn −−−−→ n→∞ X a.s. ⇒ X ∈ mG ⇒ X ∈ L2 (Ω, G, P) There ∃ (Yn)n∈N ∈ L2 (Ω, G, P) such that: X − Yn 2 → d := inf X − Z : Z ∈ L2 (Ω, G, P) 41
  • 42. We apply the parallelogram law to X − Ym, X − Yn: Ym − Yn 2 2 = 2 ( Ym − X 2 2 + X − Yn 2 2) − 4 X − (Ym + Yn)/2 2 2 ≤ 2 ( Ym − X 2 2 + X − Yn 2 2) − 4d2 ≤ 2(d2 + d2 ) − 4d2 as n, m → ∞ = 0 Hence (Yn)n∈N is Cauchy in L2 (Ω, G, P) such that Yn − Y 2 −−−−→ n→∞ 0. Note: d ≤ X − Y 2 ≤ X − Yn 2 + Yn − Y 2. For every n ∈ N ⇒ d ≤ X − Y ≤ d. Hence i) holds. Now we show that i) ⇒ ii) by contradiction. Assume ∃Z ∈ L2 (Ω, G, P) such that < X − Y, Z > > 0 and Z 2 = 1. Then Y + < X − Y, Z > Z ∈ L2 (Ω, G, P). X − (Y + < X − Y, Z > Z 2 2 = X − Y 2 2+ < X − Y, Z >2 Z 2 − 2 < X − Y, Z >2 = X − Y 2 2− < X − Y, Z >2 < X − Y 2 2 This is a contradiction because of i): we know that X−Y 2 = inf { X − Z 2 : Z ∈ L2 (Ω, G, P)}. But X − Y 2 is the smallest element, and we cannot have anything smaller than that. Hence i) ⇒ ii). To see that ii) ⇒ i), note that: X − Z 2 2 = |(X − Y ) + (Y − Z) 2 2 = X − Y 2 2 + Y − Z 2 2 by Pythagoras Theorem since Y − Z ∈ L2 (Ω, G, P) ≥ X − Y 2 2 So ii) ⇒ i). If Y satisfies ii), then: a = X − Y 2 2 = X − Y 2 2 + Y − Y 2 2 ≥ X − Y 2 2 = b By i), a = b (since there can only be one infimum), hence Y − Y 2 2 = 0. This implies that Y = Y (P−a.s.), because E Y − Y 2 2 = 0. 42
  • 43. 8 Characteristic Functions and the Central Limit Theorem Definition 8.1 – Characteristic function Let X be a random variable taking values in R with cumulative distribution function F = FX and law µ (i.e. µ is a measure on (R, B) such that µ(a, b) = F(b) − F(a) ∀ a ≤ b ∈ R). The characteristic function of X is given by φ : R → C such that: φ(θ) = E eiθX = E [cos(θX)] + iE [sin(θX)] = R eiθx µ(dx) = R eiθx dF(x) Remarks: 1. X ∼ Y ⇒ φX = φY where φX is the characteristic function of X and φY is the characteristic function of Y . 2. φ(θ) is well-defined for every θ ∈ R since eiθx = sin2 (θx) + cos2(θx) = 1 ∀ x, θ ∈ R. Hence eiθX ∈ L1 . Theorem 8.1 Let φ = φX be the characteristic function of a random variable X. Then: 1. φ(0) = 1 (by definition). 2. |φ(θ)| ≤ 1. 3. θ → φ(θ) is continuous ∀ θ ∈ R. Exercise: Prove this using DCT. 4. φ−X(θ) = φX(−θ) = φX(θ) ∀ θ ∈ R. 5. φaX+b(θ) = eiθb φX(aθ) ∀ a, b ∈ R. 6. If E [|X|n ] < ∞ for some n ∈ N, then φ (n) X (0) = in E[Xn ]. Exercise: Prove this using DCT. Theorem 8.2 If X and Y are independent, then φX+Y (θ) = φX(θ)φY (θ) ∀ θ ∈ R. Remark: If E eiαX+iβY = E eiαX E eiβY ∀ α, β ∈ R, then X and Y are independent. 43
  • 44. Theorem 8.3 – Levy’s inversion formula Let φ be a characteristic function of a random variable X with law µ and cumulative distribution function F. Then: lim T→∞ 1 2π T −T e−iθa − e−iθb iθ φ(θ)dθ = 1 2 µ({a}) + µ(a, b) + 1 2 µ({b}) = − 1 2 FX(a) + FX(a− ) + 1 2 FX(b) + FX(b− ) where F(a− ) = lim x a F(x). Proof. Elementary. Exercise Remarks: If φ ∈ L1 (R, B, γL), then Levy’s inversion formula (Theorem 8.3) implies that X has a density fX : R → R+ : 1 2π R e−iθa − eiθb iθ φX(θ) dθ = FX(b) − FX(a) = b a fX(y) dy Furthermore, we have fX(x) = 1 2π R e−iθx φX(θ) dθ. Theorem 8.4 – Levy’s convergence theorem Let Fn, n ∈ N be a sequence of cumulative distribution functions with characteristic function: φn(θ) = R eiθx dFn(x) Suppose that: • g(θ) = lim n→∞ φn(θ) ∀ θ ∈ R •g is continuous at 0. Then g is a characteristic function of some cumulative distribution function F i.e. g(θ) = R eiθx dF(x) and Fn d −→ F (i.e. Fn(x) → F(x) ∀ x ∈ R s.t. F is continuous at x). Proof. Proof given in Williams: Probability with Martingales. Theorem 8.5 – Central limit theorem Let (Xn)n∈N be a sequence of independent identically distributed random variables such that E [X2 1 ] = σ2 < ∞ and E [X1] = 0. Let Sn = n i=1 Xi and Gn = 1 √ nσ Sn. Then Gn d −→ N(0, 1). 44
• 45. Remark: If Xi ∼ N(0, 1) for each i ∈ N, then Gn ∼ N(0, 1) ∀ n ∈ N.
Proof. Note that φGn(θ) = E[ exp( i θ/(σ√n) Σi=1,...,n Xi ) ] = ( φX1( θ/(σ√n) ) )ⁿ, since the Xi are independent and identically distributed.
Note that since E[X1²] < ∞, we have:
φX1(θ) = 1 + iθ E[X1] + (iθ)²/2! E[X1²] + o(θ²) = 1 − σ²θ²/2 + o(θ²), using E[X1] = 0.
Hence φGn(θ) = ( φX1( θ/(σ√n) ) )ⁿ = ( 1 − θ²/(2n) + o( (θ/(σ√n))² ) )ⁿ.
Using the result proved in the course that (1 + bn/n)ⁿ → e^b as n → ∞ whenever bn → b ∈ R, we have:
lim n→∞ φGn(θ) = φ(θ) = e^(−θ²/2)
It is well known that ∫R e^(iθx) (1/√(2π)) e^(−x²/2) dx = e^(−θ²/2), i.e. e^(−θ²/2) is the characteristic function of N(0, 1). Since this limit is continuous at 0, Levy's convergence theorem (Theorem 8.4) gives Gn →d N(0, 1). 45
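A simulation of the Central Limit Theorem (an added sketch, not from the notes; the choice Xi ∼ Uniform(−1, 1), for which E[X1] = 0 and σ² = 1/3, as well as n and the number of trials, are illustrative): the empirical cdf of Gn = Sn/(σ√n) should be close to the standard normal cdf Φ.

```python
import numpy as np
from scipy.stats import norm

# CLT simulation for assumed X_i ~ Uniform(-1,1): G_n = S_n / (sigma sqrt(n)) vs N(0,1).
rng = np.random.default_rng(6)
n, trials = 200, 100_000
sigma = np.sqrt(1.0 / 3.0)
G = rng.uniform(-1.0, 1.0, size=(trials, n)).sum(axis=1) / (sigma * np.sqrt(n))
for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, np.mean(G <= x), norm.cdf(x))   # empirical P[G_n <= x] vs Phi(x)
```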
  • 46. 9 Conditional Expectation & Martingales Example 9.1 Let X be a random variable on (Ω, F, P) that takes values in A = {X1, X2, . . . , Xm}, P [X ∈ A] = 1, and let Y be a random variable on (Ω, F, P) such that P [Y ∈ B] = 1, B = {y1, . . . , yn}. In particular, we assume that P [Y = Yi] > 0 ∀ i = 1, . . . , n. We have: E [X | Y = yi] = m j=1 xj · P [X = xj | Y = yi] = m j=1 xj · P [X = xj, Y = yi] P [Y = yi] = F(Yi), F : B → R. In other words, E [X | Y ] = F(Y ). Note that: E I{Y =yi}F(Y ) = P [Y = yi] · m j=1 xjP [X = xj | Y = yi] = m j=1 xj · P [X = xj, Y = yi] = E X · I{Y =yi} Remarks: 1. To define E [X | Y ], X and Y have to be defined on the same probability space. 2. Note that E [X | Y ] is a random variable in mσ(Y ) such that E [E [X | Y ] · IG] = E [X · IG] ∀ G ∈ σ(Y ). Definition 9.1 – Version of conditional expectation Let X be a random variable on L1 (Ω, F, P) and let G ⊆ F be a sub−σ−algebra. If ˆX satisfies: i) ˆX ∈ mG ii) E ˆX · IG = E [X · IG] ∀ G ∈ G then ˆX is a version of conditional expectation E [X | G] of X given G. We denote ˆX = E [X | G] (P−a.s.). 46
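A numeric check of Example 9.1 (not part of the notes; the finite value sets and the joint pmf below are invented for illustration): compute F(yi) = E[X | Y = yi] from the joint probabilities and verify the identity E[I{Y=yi} F(Y)] = E[X · I{Y=yi}].

```python
import numpy as np

# Example 9.1 with an assumed joint pmf on x-values {0,1,2} and y-values {0,1}:
# joint[j, i] = P[X = x_j, Y = y_i].
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([0.0, 1.0])
joint = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])
p_y = joint.sum(axis=0)                            # marginal of Y
F = (x_vals[:, None] * joint).sum(axis=0) / p_y    # F(y_i) = E[X | Y = y_i]
for i, y in enumerate(y_vals):
    lhs = p_y[i] * F[i]                            # E[ 1_{Y=y_i} F(Y) ]
    rhs = (x_vals[:, None] * joint)[:, i].sum()    # E[ X 1_{Y=y_i} ]
    print(y, F[i], lhs, rhs)                       # lhs == rhs for every y_i
```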
  • 47. Remarks: 1. If X ∈ mG satisfies ii) in Definition 9.1, then X = ˆX (P−a.s.). Proof. Note that X > ˆX , ˆX > X ∈ G and that: 0 ≤ E X − ˆX · I{X− ˆX} = E X · I{X− ˆX} − E ˆX · I{X− ˆX} = E X · I{X− ˆX} − E X · I{X− ˆX} by ii) in Def 9.1 = 0 ⇒ X ≤ ˆX (P − a.s.) Similarly, by looking at the event ˆX > X , we find that ˆX ≤ X (P−a.s.). Therefore, this implies that ˆX = X (P−a.s.). 2. Note that for ii) in Definition 9.1, we implicitly assume that E [X · IG] is well-defined. Hence we use the fact that X ∈ L1 . (|XIG| ≤ |X|) 3. In Definition 9.1, we can also assume that X ∈ (mF)+ and drop X ∈ L1 . Theorem 9.1 – Conditional Expectation Let X ∈ L1 (Ω, F, P) or X ∈ (mF)+ . Then conditional expectation E [X |G] exists and is unique (P−a.s.). (i.e. if X, ˆX are both versions of E [X |G], then X = ˆX (P−a.s.)) Proof. We consider 3 cases. X ∈ L2 , X ∈ L1 , and X ∈ (mF)+ . Case 1: If X ∈ L2 , then this implies there exists a unique Y ∈ L2 (Ω, G, P) such that E [(X − Y ) · IG] = 0 ∀ G ∈ G. This is equivalent to E [IG · Y ] = E [X · IG] ∀ G ∈ G, which implies that Y is a version of E [X | G]. Case 2: If X ∈ (mF)+ , then let Xn = min{X, n}. Note that Xn ∈ L2 , Xn X as n → ∞ almost surely. Now we have 0 ≥ E ˆX · I{ ˆXn<0} = E Xn · I{ ˆXn<0} ≥ 0 by ii) in Definition 9.1. This implies that E ˆXn · I{ ˆXn<0} = 0 which in turn implies that P ˆXn < 0 = 0. Hence we have ˆXn = E [Xn | G] and 0 ≤ ˆXn ≤ ˆXn+1. To prove ˆXn ≤ ˆXn+1 (P−a.s.), note that Xn+1−Xn ≥ 0 (P−a.s.), and that E [Xn+1 − Xn | G] = ˆXn+1 − ˆXn implies that ˆXn+1 − ˆXn ≥ 0. Hence there ∃ ˆX = lim n→∞ ˆXn ∈ mG since Xn X. We then have E ˆX · IG = E [X · IG] ∀ G ∈ G by Monotone Convergence Theorem (The- orem 4.1). 47
  • 48. Remarks: 1. X ∈ L1 ⇒ ˆX ∈ L1 (Ω, G, P) Proof. We can write ˆX = ˆX+ − ˆX− , with ˆX = max 0, ˆX , ˆX− = min 0, − ˆX . We need to show that ˆX+ , ˆX− ∈ L1 (Ω, G, P). Note that ˆX+ = ˆX · I{ ˆX≥0}, where ˆX ≥ 0 ∈ G. Then E I{ ˆX≥0} · ˆX = E X · I{ ˆX≥0} ∈ R (finite) A similar argument implies E ˆX− ∈ R. 2. If X ∈ (mF)+ ⇒ ˆX ≥ 0 (P−a.s.). Proof. Take ˆX < 0 ∈ G and note that: 0 ≤ E X · I{ ˆX<0} = E ˆX · I{ ˆX<0} ≤ 0 This implies that P ˆX < 0 = 0 ⇔ ˆX ≥ 0 (P−a.s.). 3. If ∃ X ∈ mG and satisfies E [X · IG] = E XIG ∀ G ∈ G, then X = ˆX (P−a.s.). Proof. To prove this, note the following: E X − ˆX · I{X> ˆX} = E X · I{X> ˆX} − E X · I{X> ˆX} = 0 (9.1) This implies that P X > ˆX = 0. (9.1) holds if X ∈ L1 (Ω, F, P). Similarly, one can show that P ˆX > X = 0. Therefore the statement follows if X ∈ L1 (Ω, F, P). However, if X ≥ 0, and E[X] > ∞, then an approximation argument and (9.1) yields the statement. Exercise. Hint: apply (9.1) to Xn = min {X, n}. 48
  • 49. Theorem 9.2 Let (Ω, F, P) be our probability space, and let X, Y ∈ mF. Take G, H sub−σ−algebras in F. Then: a) If X ∈ mG and X ∈ L1 or X ∈ (mG)+ , then E [X |G] = X (P−a.s.). b) If X, Y ∈ L1 (Ω, F, P), and a, b ∈ R, then E [aX + bY | G] = aE [X | G] + bE [Y | G]. c) X ∈ L1 (Ω, F, P) on X ∈ (mF)+ , then E [E [X | G]] = E[X]. d) X ∈ mG and assume either X, Y ∈ L2 (Ω, F, P) or X, Y ∈ (mF)+ , then E[XY | G] = XE [Y | G] (P−a.s.). e) If X ∈ L1 (Ω, F, P) or X ∈ (mF)+ and H is independent of σ(X), then E[X |H] = E[X] (P−a.s.). f) (Tower Property): Let H ⊆ G and X ∈ L1 or X ∈ (mF)+ . Then: E [E[X | G] | H] = E[X | H] g) If X ≥ 0, then E[X | G] ≥ 0 (P−a.s.). h) (Jensen’s Inequality): If φ : R → R is convex such that φ(X), X ∈ L1 (Ω, F, P), then: E [φ(X) | G] ≥ φ (E [X |G]) (P − a.s.) i) Let f : R2 → R be B(R2 ) measurable, and X ∈ mG and Y independent of G, and f(XY ) ∈ L1 (Ω, F, P). Then g(X) : E [f(x, Y )] , x ∈ R (fixed x) defines a Borel measurable map g : R → R which satisfies E [f(X, Y ) | G] = g(X) (P−a.s.). Proof. For some parts: c) This is clear from the definition. E [IG · E [X | G]] = E [IG · X] ∀ G ∈ G by taking G = Ω. d) If X = IG, G ∈ G, then our statement d) follows by the definition of conditional expectation since E [IG · Y | G] = IG · E [Y | G]. For any A ∈ G, we need to see that E [IA · IG · E [Y | G]] = E [IAIG · Y ]. However, this holds since E [IA∩G · E [Y |G]] = E [IA∩G · Y ] ∀ A ∈ G and IAIG = IA∩G. Our statement d) holds by approximating X ∈ mG by simple functions and proving E [IA · X · E [Y | G]] = E [IA · XY ] ∀ A ∈ G. e) We need to prove that ∀ H ∈ H, we have E [IH · E [X]] = E [IH · X]. But E [IH · E[X]] = E [IH] · E[X] since IH, X are independent. f) Pick H ∈ H and note that: E IH · ˆX = E [E [IHE [X | G] | H]] by d) = E [E [E [IH · X | G] | H]] by d), and that H ∈ H ⊆ G = E [IH · X] by applying c) twice This implies the Tower Property. 49
• 50. 10 Filtrations, martingales and stopping times
Here, our time index set is T ∈ {N, Z+}, where Z+ = N ∪ {0}.
Definition 10.1 – Filtration
Let (Ω, F, P) be a probability space. A filtration indexed by T is a non-decreasing family of σ−algebras (Ft)t∈T on (Ω, F, P), i.e. Fs ⊆ Ft ⊆ F ∀ s, t ∈ T s.t. s ≤ t.
Definition 10.2 – (Stochastic) Process
A process X = (Xt)t∈T is a collection of random variables on (Ω, F, P).
Definition 10.3 – Adapted
The process X = (Xt)t∈T is adapted to the filtration (Ft)t∈T if Xt ∈ mFt ∀ t ∈ T.
Definition 10.4 – Filtered probability space
(Ω, F, P) together with a filtration (Ft)t∈T is called a filtered probability space (Ω, F, (Ft)t∈T, P).
Definition 10.5 – Martingale
A process M = (Mt)t∈T is a martingale on a filtered probability space (Ω, F, (Ft)t∈T, P) if:
a) M is adapted to (Ft)t∈T, i.e. Mt ∈ mFt ∀ t ∈ T.
b) Mt ∈ L1(Ω, F, P) ∀ t ∈ T, i.e. E[|Mt|] < ∞ ∀ t.
c) For any s ≤ t, s, t ∈ T, we have E[Mt | Fs] = Ms (P−a.s.).
Definition 10.6 – Submartingale
M = (Mt)t∈T is a submartingale if a) and b) hold in Definition 10.5 and E[Mt | Fs] ≥ Ms (P−a.s.) ∀ s ≤ t.
Definition 10.7 – Supermartingale
M = (Mt)t∈T is a supermartingale if a) and b) hold in Definition 10.5 and E[Mt | Fs] ≤ Ms (P−a.s.) ∀ s ≤ t.
Remarks:
1. Note that for T = Z+, c) in Definition 10.5 is equivalent to E[(Mn+1 − Mn) · IA] = 0 ∀ A ∈ Fn and all n ∈ T.
Proof. The one-step condition says E[Mn+1 | Fn] = Mn ∀ n, so we need to show that it implies E[Mn+k | Fn] = Mn (P−a.s.) for every k ∈ N. For example:
E[Mn+2 | Fn] = E[ E[Mn+2 | Fn+1] | Fn ] by the Tower Property
= E[Mn+1 | Fn] = Mn
and the general case follows by induction on k. 50
  • 51. Exercise: Show that in Definition 10.7, E [Mt | IS] ≤ Ms (P−a.s.) is equivalent to E [(Mn+1 − Mn) · IA] ≤ 0 ∀ A ∈ Fn ∀ n ∈ T. Example 10.1 1. X ∈ L1 (Ω, F, P) on a filtered probability space (Ω, F, (Ft)t∈T, P). Then Mt = E [X | Ft] is a martingale. c) from Definition 10.5 follows from the Tower Property of condi- tional expectation. 2. Let Xi, i ∈ N be iid random variables such that P [Xi = 1] = p, P [Xi = −1] = 1 − p, with p ∈ (0, 1). Define Mk = k i=1 Xi. Claim M = (Mk)k∈N is a supermartingale if and only if p ≤ 1 2 , M is a submartingale if p ≥ 1 2 . Hence M is a martingale if and only if p = 1 2 . Proof. Here, we use Fk = σ(X1, . . . , Xk), and show that the properties in the definition of a martingale are satisfied. a) M is adapted to (Fk). Mk ∈ Fk is true since Fk = σ(M1, . . . , Mk) since there exists matrix A ∈ Rk×k such that:      X1 X2 ... Xk      = A      M1 M2 ... Mk      and A−1      X1 X2 ... Xk      =      M1 M2 ... Mk      with A−1 =      1 0 . . . 0 1 1 . . . 0 ... ... ... ... 1 1 . . . 1      So there is a bijection between these two vectors which implies Fk = σ(M1, . . . , Mk). b) Mk ∈ L1 ∀ k ∈ N. This holds since Xi, i = 1, . . . , k are in L1 . (because Xi = ±1, so Xi ≤ k < ∞ which is bounded) c) We have: E[ Mk+1 Xk+1+Mk | Fk] = Mk + E [Xk] as Mk ∈ mFk and Xk + 1 is independent on Fk = Mk + p(1) − (1 − p)(1) = Mk + 2p − 1 So for p ∈ 0, 1 2 , we have a supermartingale, and for p ∈ 1 2 , 1 , we have a submartin- gale. This proves the equivalences above. Definition 10.8 – Stopping time Let (Ω, F, (Ft)t∈T, P) be a filtered probability space. A random variable τ : Ω → T ∪ {∞} is a stopping time relative to the filtration (Ft)t∈T if {τ ≤ t} ∈ Ft ∀ t ∈ T. 51
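Returning to Example 10.1(2), the following sketch (not part of the notes; the values of p and the sample size are illustrative) estimates the one-step conditional drift E[Mk+1 − Mk | Fk]. Since Xk+1 is independent of Fk, this drift equals the unconditional mean 2p − 1, so M is a martingale exactly when p = 1/2.

```python
import numpy as np

# One-step drift of the +/-1 random walk: E[M_{k+1} - M_k | F_k] = 2p - 1,
# so M is a supermartingale for p <= 1/2, a submartingale for p >= 1/2.
rng = np.random.default_rng(7)
for p in [0.3, 0.5, 0.7]:
    steps = rng.choice([1, -1], size=1_000_000, p=[p, 1 - p])
    print(p, "empirical drift:", steps.mean(), "  2p - 1 =", 2 * p - 1)
```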
  • 52. Remark: In case T = Z+ , then τ is a stopping time if and only if {τ = t} ∈ Ft ∀ t ∈ Z+ (since {τ = t} = {τ ≤ t} {τ ≤ t − 1} and {τ ≤ t} = t k=0 {τ = k}). Example 10.2 1. Let M = (Mk)k∈N, Mk = k i=1 Xi as before. Then τa = inf {t ∈ N : Mt = a} (a ∈ Z). Note that {τa ≤ t} = t k=1 M−1 k ({a}) ∈Fk ∈ Ft and Fk ⊆ Ft ∀ k ≤ t. 2. Every constant time t ∈ T is a stopping time. Example 10.3 Suppose we have M0 = 0, Mk = k i=1 Xi, with Xi iid Bernoulli random variables, P[Xi = 1] = p, P[Xi = −1] = 1 − p, p ∈ (0, 1). We have H1 = 1, Hk = 2k−1 I{Xi=−1:i=1,...,k−1}. Note that Hk ∈ mFk−1. Let Nk = k i=1 Hi (Mi − Mi−1) Xi be the gains process (N = (Nk)k∈N). Note Nk =∈ mFk, Fk = σ(X1, . . . , Xk). Also, Nk ∈ L1 ∀ p and if p ≥ 1 2 , then E [Nk+1 | Fk] = Nk (Exercise: Check this) Furthermore, N is a supermartingale if and only if p ≤ 1 2 . Let: τ = inf {t ∈ N | Mt > Mt−1} = inf {t ∈ N | Xt = 1, Xi = −1 ∀ i = 1, . . . , t − 1} Exercise: Show τ is a stopping time, i.e. show P[τ = n] = p(1 − p)n−1 n ∈ N. Note that: Nk = k i=1 Hi(Mi − Mi−1) = 1 − 2k Xi = −1, i = 1, 2, . . . , k 1 ∃ i = {1, . . . , k} s.t. Xi = 1 which implies that Nτ = 1 (P−a.s.). 52
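To close, a simulation of the doubling strategy in Example 10.3 (an added sketch, not from the notes; p = 1/2 and the number of simulated paths are illustrative): the stake Hk = 2^(k−1) is doubled after each loss, and on every path the gain at the stopping time τ equals 1, even though the running losses before τ can be large. This is the classical illustration of why such a stopped supermartingale strategy is not a "free lunch": the intermediate exposure is unbounded.

```python
import numpy as np

# Doubling strategy of Example 10.3 with p = 1/2: stake 2^{k-1} until the first
# +1 step (time tau); the terminal gain N_tau equals 1 on every simulated path.
rng = np.random.default_rng(8)
for _ in range(5):
    N, stake, k = 0, 1, 0
    while True:
        k += 1
        X = rng.choice([1, -1])       # fair coin, p = 1/2
        N += stake * X
        if X == 1:
            break
        stake *= 2                    # double the stake after every loss
    print("tau =", k, "  N_tau =", N, "  worst running loss =", 1 - stake)
```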