1
Probability
SOLO HERMELIN
Updated: 6.06.09   http://www.solohermelin.com
2
SOLO
Table of Content
Probability
Set Theory
Probability Definitions
Theorem of Addition
Conditional Probability
Total Probability Theorem
Statistical Independent Events
Theorem of Multiplication
Conditional Probability - Bayes Formula
Random Variables
Probability Distribution and Probability Density Functions
Conditional Probability Distribution and
Conditional Probability Density Functions
Expected Value or Mathematical Expectation
Variance
Moments
Functions of one Random Variable
Jointly Distributed Random Variables
Characteristic Function and Moment-Generating Function
Existence Theorems (Theorem 1 & Theorem 2)
3
SOLO
Table of Content (continue - 1)
Probability
Law of Large Numbers (History)
Markov’s Inequality
Chebyshev’s Inequality
Bienaymé’s Inequality
Chernoff’s and Hoeffding’s Bounds
Chernoff’s Bound
Hoeffding’s Bound
Convergence Concepts
The Law of Large Numbers
Central Limit Theorem
Bernoulli Trials – The Binomial Distribution
Poisson Asymptotic Development (Law of Rare Events)
Normal (Gaussian) Distribution
De Moivre-Laplace Asymptotic Development
Laplacian Distribution
Gamma Distribution
Beta Distribution
Distributions
4
SOLO
Table of Content (continue - 2)
Probability
Cauchy Distribution
Exponential Distribution
Chi-square Distribution
Student’s t-Distribution
Uniform Distribution (Continuous)
Rayleigh Distribution
Rice Distribution
Weibull Distribution
Kinetic Theory of Gases
Maxwell’s Velocity Distribution
Molecular Models
Boltzmann Statistics
Bose-Einstein Statistics
Fermi-Dirac Statistics
Monte Carlo Method
Generating Continuous Random Variables
Importance Sampling
Generating Discrete Random Variables
Metropolis & Metropolis – Hastings Algorithms
Markov Chain Monte Carlo (MCMC)
Gibbs Sampling
Monte Carlo Integration
5
SOLO
Table of Content (continue - 3)
Probability
Appendices
Permutations
Combinations
References
Random Processes
Stationarity of a Random Process
Ergodicity
Markov Processes
White Noise
Markov Chains
Existence Theorems (Theorem 3)
6
SOLO Set Theory
A set A is a collection of objects (the elements of the set) ζ1, ζ2,…, ζn:
A = {ζ1, ζ2,…, ζn} – a set of n elements
A (x) = {x : |x| < 1} – the set of all numbers of magnitude smaller than 1
A (x,y) = {(x,y) : 0 < x < T, 0 < y < T} – a set of points (x,y) in a square
∅ – the empty set, which contains no elements
S – the set that contains all elements (the Set space)
Examples

The Set space of a die consists of six mutually exclusive elementary events:
{1}, {2}, {3}, {4}, {5}, {6}
7
SOLO Set Theory
Set Operations

Inclusion: A is included in B, A ⊂ B, if ∀ x ∈ A ⇒ x ∈ B
Equality: A = B ⇔ (A ⊂ B) and (B ⊂ A)
Addition (union): x ∈ A ∪ B if x ∈ A or x ∈ B
Multiplication (intersection): x ∈ A ∩ B if x ∈ A and x ∈ B
Associativity: (A ∪ B) ∪ C = A ∪ (B ∪ C)
A ∪ A = A   A ∪ ∅ = A   A ∪ S = S
A ∩ A = A   A ∩ ∅ = ∅   A ∩ S = A
Complement Aᶜ of A: A ∪ Aᶜ = S and A ∩ Aᶜ = ∅
Difference: A − B = A ∩ Bᶜ
(Venn diagrams: A ∪ B, A ∩ B, Aᶜ, A − B, B − A)
8
SOLO Set Theory
Set Operations

Incompatible Sets: A and B are incompatible iff A ∩ B = ∅

Decomposition of a Set
If A = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = ∅ ∀ i ≠ j
we say that A is decomposed into incompatible sets.
If A1 ∪ A2 ∪ … ∪ An = S and Ai ∩ Aj = ∅ ∀ i ≠ j
we say that the set space S is decomposed into exhaustive and incompatible sets.

De Morgan's Law: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ   (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
To find the complement of a set expression we interchange ∩ and ∪ and use the complements of the sets.

Augustus De Morgan
(1806 – 1871)

Another form of De Morgan's Law: (∪i Ai)ᶜ = ∩i Aiᶜ   (∩i Ai)ᶜ = ∪i Aiᶜ
Table of Content
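The set identities and De Morgan's laws above can be checked directly with Python's built-in set type; a minimal sketch (the universe S and the events A and B are arbitrary choices for the demonstration):

```python
# Universe S and two events A, B (arbitrary choices for the demonstration)
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def comp(X):
    """Complement of X relative to the universe S."""
    return S - X

# De Morgan: the complement of a union is the intersection of the complements
assert comp(A | B) == comp(A) & comp(B)
# ... and the complement of an intersection is the union of the complements
assert comp(A & B) == comp(A) | comp(B)

# A few of the elementary identities: A ∪ A = A, A ∩ S = A, A ∪ S = S
assert A | A == A and A & S == A and A | S == S
```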
9
SOLO Probability
Probability Axiomatic Definition

Pr (A) is the probability of the event A if:
(1) Pr (A) ≥ 0
(2) Pr (S) = 1
(3) If A = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = ∅ ∀ i ≠ j
    then Pr (A) = Pr (A1) + Pr (A2) + … + Pr (An)

Probability Geometric Definition

Assume that the probability of an event in a geometric region A ⊂ S is defined as the ratio of the surface of A to the surface of S:
Pr (A) = Surface (A) / Surface (S)
This definition satisfies the same three conditions:
(1) Pr (A) ≥ 0
(2) Pr (S) = 1
(3) If A = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = ∅ ∀ i ≠ j
    then Pr (A) = Pr (A1) + Pr (A2) + … + Pr (An)
10
SOLO Probability
From these definitions we can prove the following:

(1') Pr (∅) = 0
Proof: S = S ∪ ∅ and S ∩ ∅ = ∅, so by (3) Pr (S) = Pr (S) + Pr (∅) ⇒ Pr (∅) = 0

(2') Pr (Aᶜ) = 1 − Pr (A)
Proof: S = A ∪ Aᶜ and A ∩ Aᶜ = ∅, so by (2) and (3) 1 = Pr (S) = Pr (A) + Pr (Aᶜ)

(3') 0 ≤ Pr (A) ≤ 1
Proof: Pr (A) = 1 − Pr (Aᶜ) ≤ 1, by (2') and axiom (1) applied to Aᶜ; and Pr (A) ≥ 0 by (1)

(4') If A ⊂ B ⇒ Pr (A) ≤ Pr (B)
Proof: B = A ∪ (B − A) and A ∩ (B − A) = ∅, so by (3)
Pr (B) = Pr (A) + Pr (B − A) ≥ Pr (A) since Pr (B − A) ≥ 0

(5') Pr (A ∪ B) = Pr (A) + Pr (B) − Pr (A ∩ B)
Proof: A ∪ B = A ∪ (B − A ∩ B) with A ∩ (B − A ∩ B) = ∅
and B = (A ∩ B) ∪ (B − A ∩ B) with (A ∩ B) ∩ (B − A ∩ B) = ∅, so by (3)
Pr (A ∪ B) = Pr (A) + Pr (B − A ∩ B)
Pr (B) = Pr (A ∩ B) + Pr (B − A ∩ B)
Subtracting the second equation from the first:
Pr (A ∪ B) = Pr (A) + Pr (B) − Pr (A ∩ B)
Table of Content
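Property (5') can be verified by exact counting on a fair die; a minimal sketch (the events A and B are arbitrary choices):

```python
from fractions import Fraction

# Fair-die sample space; every outcome has probability 1/6
S = set(range(1, 7))

def P(E):
    """Exact probability of event E under equally likely outcomes."""
    return Fraction(len(E & S), len(S))

A = {2, 4, 6}        # "even roll"
B = {4, 5, 6}        # "roll greater than 3"

# (5'): Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B) == Fraction(2, 3)
```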
11
SOLO Probability
Theorem of Addition

(6') Pr (A1 ∪ A2 ∪ … ∪ An) = Σi Pr (Ai) − Σi<j Pr (Ai ∩ Aj) + Σi<j<k Pr (Ai ∩ Aj ∩ Ak) − … + (−1)^(n−1) Pr (A1 ∩ A2 ∩ … ∩ An)

Proof by induction:
For n = 2 we found in (5') that Pr (A1 ∪ A2) = Pr (A1) + Pr (A2) − Pr (A1 ∩ A2), which satisfies the equation.
Assume the equation is true for n − 1:
Pr (A1 ∪ … ∪ A(n−1)) = Σi Pr (Ai) − Σi<j Pr (Ai ∩ Aj) + … + (−1)^(n−2) Pr (A1 ∩ … ∩ A(n−1)),   all indices ≤ n − 1

Let us calculate for n. By (5'), with the union of the first n − 1 events playing the role of one event:
Pr (A1 ∪ … ∪ An) = Pr (A1 ∪ … ∪ A(n−1)) + Pr (An) − Pr [(A1 ∪ … ∪ A(n−1)) ∩ An]
but
(A1 ∪ … ∪ A(n−1)) ∩ An = (A1 ∩ An) ∪ (A2 ∩ An) ∪ … ∪ (A(n−1) ∩ An)
so the induction hypothesis can also be applied to the last term, taken over the n − 1 events Ai ∩ An:
Pr [(A1 ∩ An) ∪ … ∪ (A(n−1) ∩ An)] = Σi Pr (Ai ∩ An) − Σi<j Pr (Ai ∩ Aj ∩ An) + … + (−1)^(n−2) Pr (A1 ∩ … ∩ An),   indices ≤ n − 1

Collecting terms: the single-event sums combine to Σi≤n Pr (Ai); the pair sums over i < j ≤ n − 1 combine with the terms −Pr (Ai ∩ An) to give −Σi<j≤n Pr (Ai ∩ Aj); and so on for the higher-order intersections. Use the fact that the number of terms of each order is consistent because of the binomial identity
C(n−1, k) + C(n−1, k−1) = C(n, k),   C(n, k) := n! / [k! (n−k)!]
to obtain
Pr (A1 ∪ A2 ∪ … ∪ An) = Σi Pr (Ai) − Σi<j Pr (Ai ∩ Aj) + Σi<j<k Pr (Ai ∩ Aj ∩ Ak) − … + (−1)^(n−1) Pr (A1 ∩ A2 ∩ … ∩ An)
q.e.d.
Table of Content
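The inclusion–exclusion formula (6') can be checked numerically against a direct union count; a small sketch over an arbitrarily chosen finite sample space of equally likely outcomes:

```python
from fractions import Fraction
from itertools import combinations

# An arbitrary sample space of 12 equally likely outcomes
S = set(range(1, 13))

def P(E):
    return Fraction(len(E), len(S))

events = [{1, 2, 3, 4}, {3, 4, 5, 6, 7}, {6, 7, 8, 9}, {2, 4, 8, 10}]
n = len(events)

# Left-hand side: probability of the union, counted directly
lhs = P(set().union(*events))

# Right-hand side of (6'): alternating sum over all non-empty index subsets
rhs = Fraction(0)
for r in range(1, n + 1):
    for idx in combinations(range(n), r):
        inter = set.intersection(*(events[i] for i in idx))
        rhs += (-1) ** (r - 1) * P(inter)

assert lhs == rhs == Fraction(10, 12)
```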
13
SOLO Probability
Conditional Probability

Given two events A and B decomposed into elementary events:
A = Aα1 ∪ Aα2 ∪ … ∪ Aαn,   Aαi ∩ Aαj = ∅ ∀ i ≠ j
B = Aβ1 ∪ Aβ2 ∪ … ∪ Aβm,   Aβk ∩ Aβl = ∅ ∀ k ≠ l
A ∩ B = Aαβ1 ∪ Aαβ2 ∪ … ∪ Aαβr,   Aαβi ∩ Aαβj = ∅ ∀ i ≠ j,   r ≤ m, n
Pr (A) = Pr (Aα1) + Pr (Aα2) + … + Pr (Aαn)
Pr (B) = Pr (Aβ1) + Pr (Aβ2) + … + Pr (Aβm)
Pr (A ∩ B) = Pr (Aαβ1) + Pr (Aαβ2) + … + Pr (Aαβr)

We want to find the probability of the event A under the condition that the event B has occurred, designated as Pr (A|B):
Pr (A|B) = [Pr (Aαβ1) + Pr (Aαβ2) + … + Pr (Aαβr)] / [Pr (Aβ1) + Pr (Aβ2) + … + Pr (Aβm)] = Pr (A ∩ B) / Pr (B)

If the events A and B are statistically independent, the fact that B occurred does not affect the probability of A occurring:
Pr (A|B) = Pr (A ∩ B) / Pr (B)   and   Pr (B|A) = Pr (A ∩ B) / Pr (A)
Pr (A|B) = Pr (A)   ⇒   Pr (A ∩ B) = Pr (A|B) · Pr (B) = Pr (B|A) · Pr (A) = Pr (A) · Pr (B)

Definition:
n events Ai, i = 1, 2, …, n, are statistically independent if
Pr (Ai1 ∩ Ai2 ∩ … ∩ Air) = Pr (Ai1) · Pr (Ai2) ⋯ Pr (Air)   for every subset of r indices, ∀ r = 2, …, n
Table of Content
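Both the definition Pr (A|B) = Pr (A ∩ B)/Pr (B) and the product test for independence can be verified by exact counting on a fair die; a small sketch with arbitrarily chosen events:

```python
from fractions import Fraction

S = set(range(1, 7))                      # fair die

def P(E):
    return Fraction(len(E & S), len(S))

B = {2, 4, 6}                             # "even roll"
A = {6}                                   # "roll a six"

# Pr(A|B) = Pr(A ∩ B) / Pr(B) = (1/6) / (1/2) = 1/3
assert P(A & B) / P(B) == Fraction(1, 3)

# A and B are NOT independent: Pr(A ∩ B) = 1/6 ≠ 1/12 = Pr(A)·Pr(B)
assert P(A & B) != P(A) * P(B)

# C = {1, 2} IS independent of B: Pr(C ∩ B) = 1/6 = (1/3)(1/2) = Pr(C)·Pr(B)
C = {1, 2}
assert P(C & B) == P(C) * P(B)
```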
15
SOLO Probability
Conditional Probability - Bayes Formula

Let the events Aβ1, Aβ2, …, Aβm decompose the set space into exhaustive and incompatible sets, and let B be any event. Using the relation:
Pr (B ∩ Aβl) = Pr (B|Aβl) · Pr (Aβl) = Pr (Aβl|B) · Pr (B)
and, since
B = (B ∩ Aβ1) ∪ (B ∩ Aβ2) ∪ … ∪ (B ∩ Aβm),   (B ∩ Aβk) ∩ (B ∩ Aβl) = ∅ ∀ k ≠ l
Pr (B) = Σk Pr (B ∩ Aβk) = Σk Pr (B|Aβk) · Pr (Aβk),   k = 1, …, m
we obtain Bayes Formula:
Pr (Aβl|B) = Pr (B|Aβl) · Pr (Aβl) / Pr (B) = Pr (B|Aβl) · Pr (Aβl) / Σk Pr (B|Aβk) · Pr (Aβk)

Thomas Bayes
(1702 – 1761)

Table of Content
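Bayes formula can be illustrated with a hypothetical two-urn experiment (the urn contents and the uniform prior are invented for the example):

```python
from fractions import Fraction

# Hypothetical setup: urn 1 holds 3 white / 1 black balls, urn 2 holds
# 1 white / 3 black; an urn is chosen at random, then one ball is drawn.
prior   = {1: Fraction(1, 2), 2: Fraction(1, 2)}     # Pr(urn)
p_white = {1: Fraction(3, 4), 2: Fraction(1, 4)}     # Pr(white | urn)

# Denominator of Bayes formula: Pr(white) = Σk Pr(white | urn k)·Pr(urn k)
p_w = sum(p_white[u] * prior[u] for u in prior)
assert p_w == Fraction(1, 2)

# Bayes: Pr(urn 1 | white) = Pr(white | urn 1)·Pr(urn 1) / Pr(white)
posterior_1 = p_white[1] * prior[1] / p_w
assert posterior_1 == Fraction(3, 4)
```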
16
SOLO Probability
Total Probability Theorem

If A1 ∪ A2 ∪ … ∪ An = S and Ai ∩ Aj = ∅ ∀ i ≠ j,
we say that the set space S is decomposed into exhaustive and incompatible (exclusive) sets.

The Total Probability Theorem states that for any event B, its probability can be decomposed in terms of conditional probabilities as follows:
Pr (B) = Σi Pr (B, Ai) = Σi Pr (B|Ai) · Pr (Ai),   i = 1, …, n

Proof: Using the relation
Pr (B ∩ Al) = Pr (B|Al) · Pr (Al) = Pr (Al|B) · Pr (B)
and, for any event B,
B = (B ∩ A1) ∪ (B ∩ A2) ∪ … ∪ (B ∩ An),   (B ∩ Ak) ∩ (B ∩ Al) = ∅ ∀ k ≠ l
we obtain:
Pr (B) = Σk Pr (B ∩ Ak) = Σk Pr (B|Ak) · Pr (Ak),   k = 1, …, n

Table of Content
17
SOLO Probability
Statistical Independent Events

From the Theorem of Addition:
Pr (A1 ∪ A2 ∪ … ∪ An) = Σi Pr (Ai) − Σi<j Pr (Ai ∩ Aj) + Σi<j<k Pr (Ai ∩ Aj ∩ Ak) − … + (−1)^(n−1) Pr (A1 ∩ … ∩ An)
If the Ai are statistically independent, Pr (Ai1 ∩ … ∩ Air) = Pr (Ai1) ⋯ Pr (Air) ∀ r = 2, …, n, so
Pr (A1 ∪ A2 ∪ … ∪ An) = Σi Pr (Ai) − Σi<j Pr (Ai) Pr (Aj) + Σi<j<k Pr (Ai) Pr (Aj) Pr (Ak) − … + (−1)^(n−1) Pr (A1) ⋯ Pr (An)
Therefore
1 − Pr (A1 ∪ A2 ∪ … ∪ An) = Πi [1 − Pr (Ai)]
or
Pr (A1 ∪ A2 ∪ … ∪ An) = 1 − Πi [1 − Pr (Ai)]

Since (∪i Ai) ∪ (∪i Ai)ᶜ = S and (∪i Ai) ∩ (∪i Ai)ᶜ = ∅:
Pr [(∪i Ai)ᶜ] = 1 − Pr (∪i Ai) = Πi [1 − Pr (Ai)]
By the De Morgan law, (∪i Ai)ᶜ = ∩i Aiᶜ, and 1 − Pr (Ai) = Pr (Aiᶜ), so
Pr (∩i Aiᶜ) = Πi Pr (Aiᶜ)
If the n events Ai, i = 1, 2, …, n, are statistically independent, then the complements Aiᶜ are also statistically independent.
Table of Content
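The identity Pr (∪ Ai) = 1 − Π [1 − Pr (Ai)] for independent events can be checked exactly by enumerating all hit/miss patterns; a small sketch with three arbitrarily chosen probabilities:

```python
from fractions import Fraction
from itertools import product

# Three independent events with arbitrary probabilities
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]

# Exact Pr(at least one occurs): sum the weights of all outcome patterns
# (occur / not occur for each event) that contain at least one occurrence.
p_union = Fraction(0)
for pattern in product((0, 1), repeat=len(p)):
    if any(pattern):
        w = Fraction(1)
        for pi, hit in zip(p, pattern):
            w *= pi if hit else 1 - pi
        p_union += w

# Formula: Pr(∪ Ai) = 1 − Π (1 − Pr(Ai))
complement_product = Fraction(1)
for pi in p:
    complement_product *= 1 - pi

assert p_union == 1 - complement_product == Fraction(3, 4)
```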
18
SOLO Probability
Theorem of Multiplication

Pr (A1 A2 … An) = Pr (A1) · Pr (A2|A1) · Pr (A3|A1 A2) ⋯ Pr (An|A1 A2 … A(n−1))

Proof
Start from Pr (A ∩ B) = Pr (A) · Pr (B|A):
Pr (A1 A2 … An) = Pr (A1 A2 … A(n−1)) · Pr (An|A1 … A(n−1))
In the same way
Pr (A1 A2 … A(n−1)) = Pr (A1 … A(n−2)) · Pr (A(n−1)|A1 … A(n−2))
⋮
Pr (A1 A2) = Pr (A1) · Pr (A2|A1)
From those results we obtain:
Pr (A1 A2 … An) = Pr (A1) · Pr (A2|A1) · Pr (A3|A1 A2) ⋯ Pr (An|A1 A2 … A(n−1))
q.e.d.
Table of Content
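The Theorem of Multiplication is the usual way to compute sampling-without-replacement probabilities; a small sketch (drawing three aces in a row from a standard 52-card deck) that cross-checks the chain of conditional probabilities against a direct permutation count:

```python
from fractions import Fraction
from math import perm

# Chain rule: Pr(A1 A2 A3) = Pr(A1)·Pr(A2|A1)·Pr(A3|A1 A2)
# where Ak = "the k-th card drawn is an ace", no replacement.
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Direct count: ordered ace triples over all ordered 3-card draws
direct = Fraction(perm(4, 3), perm(52, 3))

assert chain == direct == Fraction(1, 5525)
```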
19
SOLO Review of Probability
Random Variables
Let us ascribe to each outcome or event a real number, so that we have a one-to-one correspondence between the real numbers and the Space of Events. Any function that assigns a real number to each event in the Space of Events is called a random variable (a random function would be more accurate, but this is the accepted terminology).
The random variables can be:
- Discrete random variables for discrete events
- Continuous random variables for continuous events
Table of Content
20
SOLO Review of Probability
Probability Distribution and Probability Density Functions

The random variables map the space of events X to the space of real numbers x.

The Probability Distribution Function, or Cumulative Probability Distribution Function, of x can be defined as:
PX (x) := Pr (X ≤ x),   −∞ ≤ x ≤ ∞

The Probability Distribution Function has the following properties:
(1) PX (−∞) = 0
(2) PX (+∞) = 1
(3) PX (x) is a monotonic increasing function: PX (x1) ≤ PX (x2) ⇔ x1 ≤ x2

The Probability that X lies in the interval (a,b] is given by:
Pr (a < X ≤ b) = PX (b) − PX (a) ≥ 0

If PX (x) is a continuous differentiable function of x we can define
pX (x) := lim(Δx→0) Pr (x < X ≤ x + Δx) / Δx = lim(Δx→0) [PX (x + Δx) − PX (x)] / Δx = d PX (x) / d x ≥ 0
the Probability Density Function of x.
21
SOLO Review of Probability
Probability Distribution and Probability Density Functions (continue – 1)

The Probability Distribution and Probability Density Functions of x can also be defined for discrete random variables.

Example
Set space of a die: six equally probable events {x=1}, {x=2}, {x=3}, {x=4}, {x=5}, {x=6}
pX (x) := (1/6) Σi δ (x − i),   i = 1, …, 6
where δ (x) is the Dirac delta function:
δ (x) = 0 for x ≠ 0,   δ (x) = ∞ for x = 0,   ∫ δ (x) d x = 1 over (−∞, +∞)
The distribution function is a staircase; for integer k:
PX (k) = Pr (X ≤ k) = ∫ pX (x) d x = 0 for k < 1,   k/6 for 1 ≤ k < 6,   1 for k ≥ 6
so PX takes the values 1/6, 1/3, 1/2, 2/3, 5/6, 1 at k = 1, 2, 3, 4, 5, 6.
22
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)

(1) Binomial (Bernoulli):
p (k, n) = n! / [k! (n−k)!] · p^k (1−p)^(n−k) = C(n, k) p^k (1−p)^(n−k)

(2) Poisson's Distribution:
p (k) = (k0^k / k!) exp (−k0),   k a nonnegative integer

(3) Normal (Gaussian):
p (x; μ, σ) = exp [−(x−μ)² / (2σ²)] / (σ √(2π))

(4) Laplacian Distribution:
p (x; μ, b) = (1/2b) exp (−|x − μ| / b)
23
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(5) Gamma Distribution:
p (x; k, θ) = x^(k−1) exp (−x/θ) / [Γ(k) θ^k] for x ≥ 0,   0 for x < 0

(6) Beta Distribution:
p (x; α, β) = x^(α−1) (1−x)^(β−1) / ∫ u^(α−1) (1−u)^(β−1) d u over (0,1) = [Γ(α+β) / (Γ(α) Γ(β))] x^(α−1) (1−x)^(β−1)

(7) Cauchy Distribution:
p (x; x0, γ) = (1/π) · γ / [(x − x0)² + γ²]
24
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)

(8) Exponential Distribution:
p (x; λ) = λ exp (−λ x) for x ≥ 0,   0 for x < 0

(9) Chi-square Distribution:
p (x; k) = (1 / [2^(k/2) Γ(k/2)]) x^(k/2−1) exp (−x/2) for x ≥ 0,   0 for x < 0
Γ is the gamma function: Γ (a) = ∫ t^(a−1) exp (−t) d t over (0, ∞)

(10) Student's t-Distribution:
p (x; ν) = [Γ((ν+1)/2) / (√(νπ) Γ(ν/2))] · (1 + x²/ν)^(−(ν+1)/2)
25
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)

(11) Uniform Distribution (Continuous):
p (x; a, b) = 1/(b − a) for a ≤ x ≤ b,   0 for x < a or x > b

(12) Rayleigh Distribution:
p (x; σ) = (x/σ²) exp (−x²/2σ²)

(13) Rice Distribution:
p (x; v, σ) = (x/σ²) exp [−(x² + v²)/2σ²] I0 (x v/σ²)
26
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)

(14) Weibull Distribution:
p (x; γ, μ, α) = (γ/α) [(x − μ)/α]^(γ−1) exp {−[(x − μ)/α]^γ} for x ≥ μ,   0 for x < μ,   with α, γ > 0

Table of Content
27
SOLO Review of Probability
Conditional Probability Distribution and Conditional Probability Density Functions

The random variables map the space of events X to the space of real numbers x.

The Conditional Probability Distribution Function, or Cumulative Conditional Probability Distribution Function, of x given y ∈ Y is defined as:
PX/Y (x/y) := Pr (X ≤ x / y ∈ Y),   −∞ < x < ∞

(1) PX/Y (−∞/y) = 0
(2) PX/Y (+∞/y) = 1
(3) PX/Y (x/y) is a monotonic increasing function: PX/Y (x1/y) ≤ PX/Y (x2/y) ⇔ x1 ≤ x2

The Probability that X lies in the interval (a,b] given y ∈ Y is given by:
Pr (a < X ≤ b / y) = PX/Y (b/y) − PX/Y (a/y) ≥ 0

If PX/Y (x/y) is a continuous differentiable function of x we can define
pX/Y (x/y) := lim(Δx→0) Pr (x < X ≤ x + Δx / y) / Δx = d PX/Y (x/y) / d x ≥ 0
the Conditional Probability Density Function of x.

Example 1
Given PX (x) and pX (x), find PX/Y (x / x ≤ a) and pX/Y (x / x ≤ a):
PX/Y (x / x ≤ a) = 1 for x > a,   PX (x) / PX (a) for x ≤ a
pX/Y (x / x ≤ a) = 0 for x > a,   pX (x) / PX (a) for x ≤ a

Example 2
Given PX (x) and pX (x), find PX/Y (x / b < x ≤ a) and pX/Y (x / b < x ≤ a):
PX/Y (x / b < x ≤ a) = 0 for x < b,   [PX (x) − PX (b)] / [PX (a) − PX (b)] for b ≤ x < a,   1 for x ≥ a
pX/Y (x / b < x ≤ a) = 0 for x < b or x ≥ a,   pX (x) / [PX (a) − PX (b)] for b ≤ x < a
Table of Content
29
SOLO Review of Probability
Expected Value or Mathematical Expectation

Given a Probability Density Function p (x) we define the Expected Value:

For a Continuous Random Variable: E (x) := ∫ x pX (x) d x over (−∞, +∞)

For a Discrete Random Variable: E (x) := Σk xk pX (xk)

For a general function g (x) of the Random Variable x: E [g (x)] := ∫ g (x) pX (x) d x over (−∞, +∞)

Since ∫ pX (x) d x = 1, we can also write
E (x) = ∫ x pX (x) d x / ∫ pX (x) d x
The Expected Value is the centroid of the surface enclosed between the Probability Density Function and the x axis.
Table of Content
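The defining integral E (x) = ∫ x p (x) d x can be approximated by a Riemann sum; a minimal sketch for an exponential density, whose mean is known in closed form to be 1/λ (step size and truncation point are arbitrary numerical choices):

```python
from math import exp

# Riemann-sum approximation of E(x) = ∫ x·p(x) dx for the exponential
# density p(x) = lam·exp(−lam·x), x ≥ 0, whose exact mean is 1/lam.
lam = 2.0
dx = 1e-4
mean = sum(i * dx * lam * exp(-lam * i * dx) * dx
           for i in range(int(20 / dx)))     # truncate the negligible tail at x = 20

assert abs(mean - 1 / lam) < 1e-3
```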
30
SOLO Review of Probability
Variance

Given a Probability Density Function p (x) we define the Variance:
Var (x) := E {[x − E (x)]²} = E {x² − 2 x E (x) + [E (x)]²} = E (x²) − [E (x)]²

Given a Probability Density Function p (x) we define the Moment of order k about the origin:
μ'k (x) := E {x^k}

and the Central Moment of order k about the Mean E (x):
μk := E {[x − E (x)]^k} = Σj C(k, j) (−1)^(k−j) [E (x)]^(k−j) μ'j,   j = 0, …, k
Table of Content
31
SOLO Review of Probability
Moments

Normal Distribution: pX (x; σ) = exp [−x²/(2σ²)] / (σ √(2π))

E [x^n] = 1·3 ⋯ (n−1) σ^n for n even,   0 for n odd
that is, for n = 2k:
E [x^n] = 1·3 ⋯ (2k−1) σ^(2k) = [(2k)! / (2^k k!)] σ^(2k)

Proof:
Start from ∫ exp (−a x²) d x = √(π/a) over (−∞, ∞),   a > 0
and differentiate k times with respect to a:
∫ x^(2k) exp (−a x²) d x = [1·3 ⋯ (2k−1) / (2^k a^k)] √(π/a),   a > 0
Substitute a = 1/(2σ²) to obtain E [x^(2k)]:
E [x^(2k)] = [1/(σ√(2π))] ∫ x^(2k) exp [−x²/(2σ²)] d x = 1·3 ⋯ (2k−1) σ^(2k) = [(2k)! / (2^k k!)] σ^(2k)
The odd moments vanish because x^(2k+1) exp [−x²/(2σ²)] is an odd function of x.

Now let us compute:
E [x⁴] = 3 σ⁴ = 3 (E [x²])²
Chi-square
32
SOLO Review of Probability
Moments

Gamma Distribution: p (x; k, θ) = x^(k−1) exp (−x/θ) / [Γ(k) θ^k] for x ≥ 0,   0 for x < 0

E [x^n] = [1 / (Γ(k) θ^k)] ∫ x^(n+k−1) exp (−x/θ) d x over (0, ∞) = [θ^n / Γ(k)] ∫ (x/θ)^(n+k−1) exp (−x/θ) d (x/θ) = θ^n Γ(k+n) / Γ(k)

Γ is the gamma function: Γ (a) = ∫ t^(a−1) exp (−t) d t over (0, ∞)

Beta Distribution: p (x; α, β) = x^(α−1) (1−x)^(β−1) / ∫ u^(α−1) (1−u)^(β−1) d u over (0,1) = [Γ(α+β) / (Γ(α) Γ(β))] x^(α−1) (1−x)^(β−1)
33
SOLO Review of Probability
Moments

Uniform Distribution (Continuous) on (−c, c): p (x; −c, c) = 1/(2c) for −c ≤ x ≤ c,   0 for x < −c or x > c

E [x^n] = (1/2c) ∫ x^n d x over (−c, +c) = (1/2c) x^(n+1)/(n+1) evaluated from −c to +c = c^n/(n+1) for n even,   0 for n odd

Rayleigh Distribution: p (x; σ) = (x/σ²) exp (−x²/2σ²)

E [x^n] = ∫ x^n (x/σ²) exp (−x²/2σ²) d x over (0, ∞) = (1/σ²) ∫ x^(n+1) exp (−x²/2σ²) d x
  = 1·3 ⋯ n · σ^n √(π/2) for n = 2k+1 odd,   2^k k! σ^(2k) for n = 2k even
34
SOLO Review of Probability
Example

Repeat an experiment m times to obtain X1, X2,…, Xm.
Define:
Statistical Estimation (sample mean): X̄m = (X1 + X2 + … + Xm) / m
Sample Variation: Vm = [(X1 − X̄m)² + (X2 − X̄m)² + … + (Xm − X̄m)²] / m

Assume E (Xi) = μ and E [(Xi − μ)²] = σ².
Since the experiments are uncorrelated: E [(Xi − μ)(Xj − μ)] = 0 ∀ i ≠ j

E (X̄m) = [E (X1) + E (X2) + … + E (Xm)] / m = μ

σ²(X̄m) = Var (X̄m) = E {[X̄m − E (X̄m)]²} = E {[(X1 − μ) + (X2 − μ) + … + (Xm − μ)]²} / m² = m σ² / m² = σ² / m

Let us compute E (Vm).
E [(Xi − X̄m)²] = E {[(Xi − μ) − (X̄m − μ)]²} = E [(Xi − μ)²] − 2 E [(Xi − μ)(X̄m − μ)] + E [(X̄m − μ)²]
where
E [(Xi − μ)(X̄m − μ)] = E {(Xi − μ) [(X1 − μ) + … + (Xi − μ) + … + (Xm − μ)]} / m = E [(Xi − μ)²] / m = σ² / m
since E [(Xi − μ)(Xj − μ)] = 0 ∀ i ≠ j, and E [(X̄m − μ)²] = σ² / m.
Therefore:
E [(Xi − X̄m)²] = σ² − 2 σ²/m + σ²/m = [(m − 1)/m] σ²
E (Vm) = {E [(X1 − X̄m)²] + E [(X2 − X̄m)²] + … + E [(Xm − X̄m)²]} / m = [(m − 1)/m] σ²
Table of Content
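The bias E (Vm) = [(m − 1)/m] σ² can be seen in simulation; a small sketch averaging the sample variation over many repeated experiments (the sample size, trial count, and Gaussian parameters are arbitrary choices):

```python
import random

random.seed(0)
mu, sigma, m, trials = 0.0, 2.0, 5, 20000

# Average the sample variation V_m = Σ(Xi − X̄m)²/m over many experiments
acc = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(m)]
    xbar = sum(xs) / m
    acc += sum((x - xbar) ** 2 for x in xs) / m
EV = acc / trials

# Expect E(V_m) = ((m − 1)/m)·σ² = 3.2 here, noticeably below σ² = 4
assert abs(EV - (m - 1) / m * sigma ** 2) < 0.1
assert EV < sigma ** 2
```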
36
SOLO Review of Probability
Functions of one Random Variable

Let y = g (x) be a given function of the random variable x, defined on the domain Ω, with probability density pX (x). We want to find pY (y).

Fundamental Theorem
Let x1, x2, …, xn be all the solutions of the equation
y = g (x1) = g (x2) = … = g (xn)
Then
pY (y) = pX (x1)/|g' (x1)| + pX (x2)/|g' (x2)| + … + pX (xn)/|g' (xn)|,   g' (x) := d g (x) / d x

Proof
pY (y) d y := Pr (y < Y ≤ y + d y) = Σi Pr (xi < X ≤ xi ± d xi) = Σi pX (xi) |d xi| = Σi [pX (xi)/|g' (xi)|] d y
q.e.d.

Example 1:  y = a x + b   ⇒   pY (y) = (1/|a|) pX ((y − b)/a)

Example 2:  y = a/x   ⇒   pY (y) = (a/y²) pX (a/y)

Example 3:  y = a x²   ⇒   pY (y) = [pX (√(y/a)) + pX (−√(y/a))] / (2 √(a y)) · U (y)

Example 4:  y = |x|   ⇒   pY (y) = [pX (y) + pX (−y)] U (y)
Table of Content
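The Fundamental Theorem can be spot-checked by simulation; a minimal sketch for Example 3 with a = 1 and X uniform on (0,1): the theorem gives pY (y) = 1/(2√y) on (0,1), hence Pr (Y ≤ y) = √y.

```python
import random

random.seed(1)
N = 200000
# X uniform on (0,1); Y = X² has density p_Y(y) = 1/(2√y) on (0,1),
# so the distribution function is Pr(Y ≤ y) = √y.
ys = [random.random() ** 2 for _ in range(N)]

y0 = 0.25
empirical = sum(1 for y in ys if y <= y0) / N
assert abs(empirical - y0 ** 0.5) < 0.01      # √0.25 = 0.5
```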
38
SOLO Review of Probability
Jointly Distributed Random Variables

We are interested in functions of several variables.

The Jointly Cumulative Probability Distribution of the random variables X1, X2, …, Xn is defined as:
P(X1 X2 … Xn) (x1, x2, …, xn) := Pr (X1 ≤ x1, X2 ≤ x2, …, Xn ≤ xn)

The Cumulative Probability Distribution of the random variable Xi can be obtained from P(X1 X2 … Xn) (x1, x2, …, xn):
P(Xi) (xi) = Pr (X1 ≤ ∞, …, Xi ≤ xi, …, Xn ≤ ∞) = P(X1 X2 … Xn) (∞, …, xi, …, ∞)

If the Jointly Cumulative Probability Distribution is continuous and differentiable in each of the components then we can define the Joint Probability Density Function as:
p(X1 X2 … Xn) (x1, x2, …, xn) := ∂ⁿ P(X1 X2 … Xn) (x1, x2, …, xn) / (∂x1 ∂x2 ⋯ ∂xn)
p(Xi) (xi) = ∫ ⋯ ∫ p(X1 X2 … Xn) (x1, …, xn) d x1 ⋯ d xk ⋯ d xn over (−∞, ∞),   k ≠ i

We define:
E [g (x1, x2, …, xn)] := ∫ ⋯ ∫ g (x1, …, xn) p(X1 X2 … Xn) (x1, …, xn) d x1 ⋯ d xn

Example: Given the Sum of m Variables
Sm := X1 + X2 + … + Xm = Σi Xi,   i = 1, …, m

E [Sm] = ∫ ⋯ ∫ (x1 + … + xm) p(X1 … Xm) (x1, …, xm) d x1 ⋯ d xm = Σi E (xi)

Var (Sm) = E {[Sm − E (Sm)]²} = E {[Σi (Xi − E (Xi))]²}
  = Σi E {[Xi − E (Xi)]²} + Σi Σj≠i E {[Xi − E (Xi)][Xj − E (Xj)]}
  = Σi Var (Xi) + Σi Σj≠i Cov (Xi, Xj)
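For uncorrelated variables the covariance terms vanish and the variance of the sum is the sum of the variances; a small exact check with two independent fair dice:

```python
from fractions import Fraction
from itertools import product

faces = [Fraction(f) for f in range(1, 7)]        # one fair die

def mean(vals):
    return sum(vals, Fraction(0)) / len(vals)

def var(vals):
    m = mean(vals)
    return mean([(v - m) ** 2 for v in vals])

var_X = var(faces)                                      # 35/12 for one die
var_S = var([a + b for a, b in product(faces, faces)])  # X + Y over all 36 pairs

# Independent (hence uncorrelated) dice: Var(X + Y) = Var(X) + Var(Y)
assert var_S == 2 * var_X == Fraction(35, 6)
```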
40
SOLO Review of Probability
Jointly Distributed Random Variables (continue – 2)

Given the joint density function p(X1 X2 … Xn) (x1, x2, …, xn) of n random variables X1, X2, …, Xn, we want to find the joint density function of n random variables Y1, Y2, …, Yn that are related to X1, X2, …, Xn through
Y1 = g1 (X1, X2, …, Xn)
Y2 = g2 (X1, X2, …, Xn)
⋮
Yn = gn (X1, X2, …, Xn)

The differentials are related through the matrix of partial derivatives:
[d Y1, d Y2, …, d Yn]ᵀ = [∂gi/∂Xj] [d X1, d X2, …, d Xn]ᵀ
Assuming that the Jacobian
J (X1, X2, …, Xn) := det [∂gi/∂Xj],   i, j = 1, …, n
is nonzero for each (X1, X2, …, Xn), there exists a unique solution (Y1, Y2, …, Yn).

Assume that for a given (Y1, Y2, …, Yn) we can find k solutions (X1, X2, …, Xn)1, …, (X1, X2, …, Xn)k. Then
p(Y1 … Yn) (y1, …, yn) d y1 ⋯ d yn := Pr (y1 < Y1 ≤ y1 + d y1, …, yn < Yn ≤ yn + d yn)
  = Σi Pr (x1 < X1 ≤ x1 ± d x1, …, xn < Xn ≤ xn ± d xn), over the i = 1, …, k solutions
  = Σi p(X1 … Xn) (x1, …, xn)i |d x1 ⋯ d xn|i
  = Σi [p(X1 … Xn) (x1, …, xn) / |J (x1, …, xn)|]i d y1 ⋯ d yn
Therefore
p(Y1 … Yn) (y1, …, yn) = Σi [p(X1 … Xn) (x1, …, xn) / |J (x1, …, xn)|]i,   i = 1, …, k

The relation between the differential volume in (Y1, Y2, …, Yn) and the differential volume in (X1, X2, …, Xn) is given by
d y1 ⋯ d yn = |J (x1, …, xn)| d x1 ⋯ d xn
42
SOLO Review of Probability
Jointly Distributed Random Variables (continue – 4)

Example 1
X and Y are independent gamma random variables with parameters (α, θ) and (β, θ), respectively. Compute the joint densities of U = X + Y and V = X / (X + Y).
p(X,Y) (x, y) = [x^(α−1) exp (−x/θ) / (Γ(α) θ^α)] · [y^(β−1) exp (−y/θ) / (Γ(β) θ^β)]
  = exp [−(x+y)/θ] x^(α−1) y^(β−1) / [Γ(α) Γ(β) θ^(α+β)],   x, y ≥ 0

U = g1 (X, Y) = X + Y,   V = g2 (X, Y) = X/(X + Y)   ⇔   X = U V,   Y = U (1 − V)

J (X, Y) = det [1, 1; Y/(X+Y)², −X/(X+Y)²] = −1/(X+Y) = −1/U

p(U,V) (u, v) = p(X,Y) (x, y) / |J (x, y)| = p(X,Y) (u v, u (1−v)) · u
  = exp (−u/θ) (u v)^(α−1) [u (1−v)]^(β−1) u / [Γ(α) Γ(β) θ^(α+β)]
  = {exp (−u/θ) u^(α+β−1) / [Γ(α+β) θ^(α+β)]} · {[Γ(α+β) / (Γ(α) Γ(β))] v^(α−1) (1−v)^(β−1)}
Therefore
pU (u) = exp (−u/θ) u^(α+β−1) / [Γ(α+β) θ^(α+β)]   (gamma distribution)
pV (v) = [Γ(α+β) / (Γ(α) Γ(β))] v^(α−1) (1−v)^(β−1)   (beta distribution)
Table of Content
43
SOLO Review of Probability
Characteristic Function and Moment-Generating Function

Given a Probability Density Function pX (x) we define the Characteristic Function or Moment-Generating Function:
ΦX (ω) := E [exp (j ω x)] = ∫ exp (j ω x) pX (x) d x over (−∞, +∞)   (x continuous)
ΦX (ω) := Σx exp (j ω x) pX (x)   (x discrete)

This is in fact the complex conjugate of the Fourier Transform of the Probability Density Function. This function is always defined, since the condition for the existence of a Fourier Transform,
∫ |pX (x)| d x = ∫ pX (x) d x = 1 < ∞   (because pX (x) ≥ 0)
is always fulfilled.

Given the Characteristic Function we can find the Probability Density Function pX (x) using the Inverse Fourier Transform:
pX (x) = (1/2π) ∫ ΦX (ω) exp (−j ω x) d ω over (−∞, +∞)
44
SOLO Review of Probability
Properties of the Moment-Generating Function

ΦX (ω) = ∫ exp (j ω x) pX (x) d x over (−∞, +∞)

ΦX (ω)|ω=0 = ∫ pX (x) d x = 1

d ΦX (ω)/d ω = ∫ (j x) exp (j ω x) pX (x) d x   ⇒   d ΦX (ω)/d ω |ω=0 = j ∫ x pX (x) d x = j E (x)

d² ΦX (ω)/d ω² = ∫ (j x)² exp (j ω x) pX (x) d x   ⇒   d² ΦX (ω)/d ω² |ω=0 = j² E (x²)
⋮
dⁿ ΦX (ω)/d ωⁿ = ∫ (j x)ⁿ exp (j ω x) pX (x) d x   ⇒   dⁿ ΦX (ω)/d ωⁿ |ω=0 = jⁿ E (xⁿ)

This is the reason why ΦX (ω) is also called the Moment-Generating Function.
45
SOLO Review of Probability
Properties of the Moment-Generating Function

Develop ΦX (ω) = ∫ exp (j ω x) pX (x) d x in a Taylor series about ω = 0:
ΦX (ω) = ΦX (0) + (ω/1!) d ΦX/d ω |ω=0 + (ω²/2!) d² ΦX/d ω² |ω=0 + … + (ωⁿ/n!) dⁿ ΦX/d ωⁿ |ω=0 + …
  = 1 + (j ω/1!) E (x) + [(j ω)²/2!] E (x²) + … + [(j ω)ⁿ/n!] E (xⁿ) + …
46
SOLO Review of Probability
Moment-Generating Function

Binomial Distribution: p (k, n) = [n! / (k! (n−k)!)] p^k (1−p)^(n−k)
ΦX (ω) = E [exp (j ω k)] = Σk exp (j ω k) [n! / (k! (n−k)!)] p^k (1−p)^(n−k)
  = Σk [n! / (k! (n−k)!)] [p exp (j ω)]^k (1−p)^(n−k) = [p exp (j ω) + 1 − p]ⁿ,   k = 0, …, n

Poisson Distribution: p (k; λ) = exp (−λ) λ^k / k!,   k a positive integer
ΦX (ω) = Σk exp (j ω k) exp (−λ) λ^k / k! = exp (−λ) Σk [λ exp (j ω)]^k / k!
  = exp (−λ) exp [λ exp (j ω)] = exp {λ [exp (j ω) − 1]},   k = 0, 1, 2, …

Exponential Distribution: p (x; λ) = λ exp (−λ x) for x ≥ 0,   0 for x < 0
ΦX (ω) = ∫ λ exp (−λ x) exp (j ω x) d x over (0, ∞) = λ exp [−(λ − j ω) x] / [−(λ − j ω)] evaluated from 0 to ∞ = λ / (λ − j ω)
47
SOLO Review of Probability
Moment-Generating Function

Normal Distribution: p (x; μ, σ) = [1/(σ√(2π))] exp [−(x−μ)²/(2σ²)]

ΦX (ω) = [1/(σ√(2π))] ∫ exp (j ω x) exp [−(x−μ)²/(2σ²)] d x = [1/(σ√(2π))] ∫ exp {−[(x−μ)² − 2 j ω σ² x]/(2σ²)} d x

Let us complete the square in the exponent:
(x−μ)² − 2 j ω σ² x = x² − 2 (μ + j ω σ²) x + μ²
  = [x − (μ + j ω σ²)]² − (μ + j ω σ²)² + μ²
  = [x − (μ + j ω σ²)]² − 2 j ω σ² μ + ω² σ⁴
Therefore
ΦX (ω) = exp (j ω μ − σ² ω²/2) · [1/(σ√(2π))] ∫ exp {−[x − (μ + j ω σ²)]²/(2σ²)} d x = exp (j ω μ − σ² ω²/2)
since the remaining integral equals 1.

ΦX (ω) = exp (−σ² ω²/2 + j μ ω)

Using pX (x) = (1/2π) ∫ ΦX (ω) exp (−j ω x) d ω we recover
[1/(σ√(2π))] exp [−(x−μ)²/(2σ²)] = (1/2π) ∫ exp [−σ² ω²/2 − j ω (x − μ)] d ω
48
SOLO Review of Probability
Properties of the Moment-Generating Function

Moment-Generating Function of the Sum of Independent Random Variables

Given the Sum of Independent Random Variables Sm := X1 + X2 + … + Xm:
Φ(Sm) (ω) = E [exp (j ω Sm)] = ∫ ⋯ ∫ exp [j ω (X1 + X2 + … + Xm)] p(X1 … Xm) (x1, …, xm) d x1 ⋯ d xm
Since the random variables are independent, p(X1 … Xm) (x1, …, xm) = p(X1) (x1) p(X2) (x2) ⋯ p(Xm) (xm), so
Φ(Sm) (ω) = ∫ exp (j ω x1) p(X1) (x1) d x1 · ∫ exp (j ω x2) p(X2) (x2) d x2 ⋯ ∫ exp (j ω xm) p(Xm) (xm) d xm
Φ(Sm) (ω) = Φ(X1) (ω) Φ(X2) (ω) ⋯ Φ(Xm) (ω)

Example 1: Sum of Poisson Independent Random Variables Sm := X1 + X2 + … + Xm
p(Xi) (ki; λi) = exp (−λi) λi^ki / ki!,   ki a positive integer,   i = 1, 2, …, m
Φ(Xi) (ω) = exp {λi [exp (j ω) − 1]},   i = 1, 2, …, m
Φ(Sm) (ω) = Φ(X1) (ω) Φ(X2) (ω) ⋯ Φ(Xm) (ω) = exp {(λ1 + λ2 + … + λm) [exp (j ω) − 1]}
The Sum of Poisson Independent Random Variables is a Poisson Random Variable with λ(Sm) = λ1 + λ2 + … + λm
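The closure of the Poisson family under independent sums can be seen in simulation; a small sketch using Knuth's product-of-uniforms sampler (the rates λ1, λ2 and the sample count are arbitrary choices):

```python
import random
from math import exp

random.seed(2)

def poisson(lam):
    """Knuth's sampler: count uniform factors until the product drops below e^(−λ)."""
    limit, k, prod = exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod < limit:
            return k
        k += 1

lam1, lam2, N = 1.5, 2.5, 50000
sums = [poisson(lam1) + poisson(lam2) for _ in range(N)]
mean = sum(sums) / N

# A Poisson(λ1 + λ2) variable has mean (and variance) λ1 + λ2 = 4.0
assert abs(mean - (lam1 + lam2)) < 0.05
```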
49
SOLO Review of Probability
Properties of the Moment-Generating Function

Example 2: Sum of Normal Independent Random Variables Sm := X1 + X2 + … + Xm
p(Xi) (xi; μi, σi) = [1/(σi√(2π))] exp [−(xi−μi)²/(2σi²)]
Φ(Xi) (ω) = exp (−σi² ω²/2 + j μi ω)
Φ(Sm) (ω) = Φ(X1) (ω) Φ(X2) (ω) ⋯ Φ(Xm) (ω)
  = exp (−σ1² ω²/2 + j μ1 ω) · exp (−σ2² ω²/2 + j μ2 ω) ⋯ exp (−σm² ω²/2 + j μm ω)
  = exp [−(σ1² + σ2² + … + σm²) ω²/2 + j (μ1 + μ2 + … + μm) ω]
The Sum of Normal Independent Random Variables is a Normal Random Variable with
μ(Sm) = μ1 + μ2 + … + μm
σ²(Sm) = σ1² + σ2² + … + σm²
Therefore the Sm probability distribution is:
p(Sm) (Sm; μ(Sm), σ(Sm)) = [1/(σ(Sm)√(2π))] exp [−(Sm − μ(Sm))²/(2σ²(Sm))]
Table of Content
50
SOLO Review of Probability
Existence Theorems
Existence Theorem 1
Given a function G (x) such that
G (−∞) = 0,   G (+∞) = lim(x→∞) G (x) = 1
0 ≤ G (x1) ≤ G (x2) if x1 < x2   (G (x) is monotonic non-decreasing)
lim(xn→x, xn≥x) G (xn) = G (x)   (G (x) is continuous from the right)
We can find an experiment X and a random variable x, defined on X, such that its distribution function P (x) equals the given function G (x).
its distribution function P (x) equals the given function G (x).
Proof of Existence Theorem 1
Assume that the outcome of the experiment X is any real number -∞ <x < +∞.
We consider as events all intervals, the intersection or union of intervals on the
real axis.
To specify the probability of those events we define P (x1) = Prob {x ≤ x1} = G (x1).
From our definition of G (x) it follows that P (x) is a distribution function.
Existence Theorem 2 Existence Theorem 3
51
SOLO Review of Probability
Existence Theorems
Existence Theorem 2
If a function F (x,y) is such that
F (−∞, y) = 0,   F (x, −∞) = 0,   F (+∞, +∞) = 1
F (x2, y2) − F (x1, y2) − F (x2, y1) + F (x1, y1) ≥ 0
for every x1 < x2, y1 < y2, then two random variables x and y can be found such that F (x,y) is their joint distribution function.
for every x1 < x2, y1 < y2, then two random variables x and y can be found such that
F (x,y) is their joint distribution function.
Proof of Existence Theorem 2
Assume that the outcome of the experiment X is any real number -∞ <x < +∞.
Assume that the outcome of the experiment Y is any real number -∞ <y < +∞.
We consider as events all intervals, the intersection or union of intervals on the
real axes x and y.
To specify the probability of those events we define P (x,y)=Prob { x ≤ x1, y ≤ y1, }= F (x1,y1).
From our definition of F (x,y) it follows that P (x,y) is a joint distribution function.
The proof is similar to that in the Existence Theorem 1
Histogram
A histogram is a mapping mi that counts the number
of observations that fall into various disjoint
categories (known as bins), whereas the graph of a
histogram is merely one way to represent a histogram.
Thus, if we let n be the total number of observations
and k be the total number of bins, the histogram mi
meets the following conditions:
  n = Σ_{i=1}^{k} m_i
A cumulative histogram is a mapping that counts the cumulative number of
observations in all of the bins up to the specified bin. That is, the cumulative
histogram Mi of a histogram mi is defined as:
  M_i = Σ_{j=1}^{i} m_j
[Figure: an ordinary and a cumulative histogram of the same data. The data shown is a
random sample of 10,000 points from a normal distribution with a mean of 0 and a
standard deviation of 1.]
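The two defining conditions above can be checked directly. The sketch below (not from the original slides; bin count and range are arbitrary choices) builds both histograms by hand for a normal sample like the one in the figure:

```python
import random

random.seed(9)

# 10,000 points from a normal distribution, mean 0, standard deviation 1.
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]

k = 8                       # number of bins (arbitrary choice)
lo, hi = -4.0, 4.0
width = (hi - lo) / k

# Ordinary histogram: m_i counts observations falling in bin i.
m = [0] * k
for x in data:
    i = min(max(int((x - lo) // width), 0), k - 1)  # clamp rare outliers
    m[i] += 1

# Cumulative histogram: M_i = sum_{j<=i} m_j.
M = []
run = 0
for mi in m:
    run += mi
    M.append(run)

# n = sum of all bin counts; the last cumulative count equals n.
print(sum(m), M[-1])
```

The last bin of the cumulative histogram always equals the total number of observations n, which is exactly the condition n = Σ m_i.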
Law of Large Numbers (History)
The Weak Law of Large Numbers was first proved by the Swiss
mathematician James Bernoulli in the fourth part of his work
“Ars Conjectandi” published posthumously in 1713.
Jacob Bernoulli
1654 – 1705
The Law of Large Numbers has three versions:
• Weak Law of Large Numbers (WLLN)
• Strong Law of Large Numbers (SLLN)
• Uniform Law of Large Numbers (ULLN)
The French mathematician Siméon Poisson generalized
Bernoulli’s theorem around 1800.
Siméon Denis Poisson
1781-1840
The next contribution was by Bienaymé and later,
in 1866, by Chebyshev, and is known as the
Bienaymé–Chebyshev Inequality.
Pafnuty Lvovich
Chebyshev
1821 - 1894
Irénée-Jules Bienaymé
1796 - 1878
Law of Large Numbers (History - continue)
Francesco Paolo
Cantelli
1875-1966
Félix Édouard Justin Émile
Borel
1871-1956
Andrey Nikolaevich
Kolmogorov
1903 - 1987
Borel-Cantelli Lemma
Markov's Inequality

If X is a random variable which takes only nonnegative values, then for any value a > 0

  Pr ( X ≥ a ) ≤ E (X)/a

Proof:

Suppose X is continuous with probability density function p_X (x):

  E (X) = ∫_0^∞ x p_X (x) dx = ∫_0^a x p_X (x) dx + ∫_a^∞ x p_X (x) dx
        ≥ ∫_a^∞ x p_X (x) dx ≥ a ∫_a^∞ p_X (x) dx = a Pr ( X ≥ a )

Since a > 0:

  Pr ( X ≥ a ) ≤ E (X)/a

Andrey Andreyevich Markov
1856 – 1922
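A quick Monte Carlo sanity check of Markov's inequality (not part of the original slides; the exponential distribution and the threshold a = 2 are arbitrary illustrative choices):

```python
import random

random.seed(0)

# Exponential samples are nonnegative, with E(X) = 1 for rate 1.
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]

mean_x = sum(samples) / n                           # estimate of E(X)
a = 2.0
prob_tail = sum(1 for x in samples if x >= a) / n   # estimate of Pr(X >= a)
markov_bound = mean_x / a

# Markov's inequality: Pr(X >= a) <= E(X)/a
print(prob_tail, "<=", markov_bound)
```

For this distribution the true tail is exp(−2) ≈ 0.135, well below the Markov bound of about 0.5; the inequality is valid but often loose.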
Chebyshev's Inequality

If X is a random variable with mean μ = E (X) and variance σ² = E [(X − μ)²],
then for any value k > 0

  Pr { |X − μ| ≥ k } ≤ σ²/k²

Proof:

Since (X − μ)² is a nonnegative random variable,
we can apply Markov's inequality with a = k² to obtain

  Pr { (X − μ)² ≥ k² } ≤ E [(X − μ)²]/k² = σ²/k²

But since (X − μ)² ≥ k² if and only if |X − μ| ≥ k, the above is equivalent to

  Pr { |X − μ| ≥ k } ≤ σ²/k²

Take k σ instead of k to obtain

  Pr { |X − μ| ≥ k σ } ≤ 1/k²

Pafnuty Lvovich Chebyshev
1821 – 1894
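The same empirical check can be done for Chebyshev's inequality (a sketch, not from the original slides; the uniform distribution and k = 0.4 are arbitrary choices):

```python
import random

random.seed(1)

# Uniform [0,1]: mean 0.5, variance 1/12.
n = 200_000
samples = [random.random() for _ in range(n)]

mu = 0.5
sigma2 = 1.0 / 12.0

k = 0.4
prob = sum(1 for x in samples if abs(x - mu) >= k) / n
bound = sigma2 / k**2

# Chebyshev's inequality: Pr(|X - mu| >= k) <= sigma^2 / k^2
print(prob, "<=", bound)
```

Here the exact probability is 0.2 (the mass of [0, 0.1] ∪ [0.9, 1]) while the bound is about 0.52, again valid but conservative.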
Bienaymé's Inequality

If X is a random variable, then for any values a and k > 0

  Pr { |X − a| ≥ k } ≤ E [ |X − a|ⁿ ] / kⁿ

Proof:

Let us prove first that if the random variable y takes only positive values, then for any α > 0

  Pr ( y ≥ α ) ≤ E (y)/α

i.e.

  E (y) = ∫_0^∞ y p_Y (y) dy ≥ ∫_α^∞ y p_Y (y) dy ≥ α ∫_α^∞ p_Y (y) dy = α Pr ( y ≥ α )

Define y := |X − a|ⁿ ≥ 0 and choose α = kⁿ > 0. Since

  |X − a|ⁿ ≥ kⁿ  ⇔  |X − a| ≥ k

we obtain

  Pr { |X − a| ≥ k } ≤ E [ |X − a|ⁿ ] / kⁿ

For n = 2 and a = μ we obtain Chebyshev's Inequality. For this reason
Chebyshev's Inequality is also known as the Bienaymé–Chebyshev Inequality.

[Figure: density p_X (x) with the interval (a − k, a + k) marked; the tail mass outside
it is bounded by E |X − a|ⁿ / kⁿ]

Irénée-Jules Bienaymé
1796 – 1878
Chernoff's and Hoeffding's Bounds

Markov, Chebyshev and Bienaymé inequalities use only Expectation Value
information. Let us try to obtain a tighter bound when the probability distribution
function is known.

Start from Markov's Inequality for a nonnegative random variable Z and γ > 0:

  Pr ( Z ≥ γ ) ≤ E (Z)/γ   γ > 0 ,  Z ≥ 0

Now let us take a random variable Y and define the logarithmic
generating function

  Λ_Y (t) := { ln E [exp (t Y)]   if E [exp (t Y)] < ∞
             { ∞                  otherwise

Using the fact that exp (x) is a monotonic increasing function

  Y ≥ λ  ⇒  exp (t Y) ≥ exp (t λ)   ∀ t ≥ 0

and applying Markov's inequality with Z := exp (t Y) and γ := exp (t λ)
we obtain:

  Pr ( Y ≥ λ ) = Pr [ exp (t Y) ≥ exp (t λ) ] ≤ E [exp (t Y)]/exp (t λ)
               = exp { − [ t λ − Λ_Y (t) ] }   ∀ t ≥ 0

Therefore, taking the infimum over all t ≥ 0:

  Pr ( Y ≥ λ ) ≤ inf_{t ≥ 0} exp { − [ t λ − Λ_Y (t) ] }

From this inequality, by using different Y, we obtain the Chernoff and
Hoeffding bounds.

To compute Λ_Y (t) we need to know the distribution function p_Y (y).
Chernoff's Bound

Let X1, X2, … be independent Bernoulli random variables
with Pr (Xi = 1) = p and Pr (Xi = 0) = 1 − p.

Define:  Y := (X1 + … + Xm)/m

  Λ_{Xi} (t) = ln { E [exp (t Xi)] } = ln [ exp (t·1) p + exp (t·0) (1 − p) ]
             = ln [ p exp (t) + (1 − p) ]

  Λ_Y (t) = ln { E [ exp ( (t/m) Σ_{i=1}^m Xi ) ] } = m ln [ p exp (t/m) + (1 − p) ]

Use  Pr ( Y ≥ λ ) ≤ inf_{t ≥ 0} exp { − [ t λ − Λ_Y (t) ] }  and note that

  inf_{t ≥ 0} exp { − [ t λ − Λ_Y (t) ] }  ⇔  sup_{t ≥ 0} [ t λ − Λ_Y (t) ]

  t λ − Λ_Y (t) = t λ − m ln [ p exp (t/m) + (1 − p) ]

Setting the derivative to zero:

  d/dt [ t λ − Λ_Y (t) ] = λ − p exp (t/m)/[ p exp (t/m) + (1 − p) ] = 0

  exp (t*/m) = λ (1 − p)/[ (1 − λ) p ]

and substituting back:

  t* λ − Λ_Y (t*) = m [ λ ln (λ/p) + (1 − λ) ln ( (1 − λ)/(1 − p) ) ]
Chernoff's Bound (continue – 1)

Using this in the bound:

  Pr [ (X1 + … + Xm)/m ≥ λ ]
    ≤ exp { − m [ λ ln (λ/p) + (1 − λ) ln ( (1 − λ)/(1 − p) ) ] }   0 < p < 1 ,  0 < λ < 1

Define:

  H (λ | p) := − m [ λ ln (λ/p) + (1 − λ) ln ( (1 − λ)/(1 − p) ) ]   0 < λ, p < 1

so that

  Pr ( Y ≥ λ ) ≤ inf_{t ≥ 0} exp { − [ t λ − Λ_Y (t) ] } = exp [ H (λ | p) ]

We have H (p | p) = 0, and differentiating:

  d H (λ | p)/d λ = − m [ ln (λ/p) − ln ( (1 − λ)/(1 − p) ) ]
    →  d H (λ | p)/d λ |_{λ=p} = 0

  d² H (λ | p)/d λ² = − m [ 1/λ + 1/(1 − λ) ] ≤ − 4 m

[Figure: d²H (λ|p)/dλ² versus λ on (0, 1); its maximum value − 4m is attained at λ = 0.5]
Chernoff's Bound (continue – 2)

  Pr [ (X1 + … + Xm)/m ≥ λ ] ≤ exp [ H (λ | p) ]   0 < p < 1

with  H (λ | p) := − m [ λ ln (λ/p) + (1 − λ) ln ( (1 − λ)/(1 − p) ) ] ,  0 < λ, p < 1.

Expand H (λ | p) in a Taylor series around λ = p (with θ between p and λ):

  H (λ | p) = H (p | p) + [d H (p | p)/dλ] (λ − p)/1! + [d² H (θ | p)/dλ²] (λ − p)²/2!
            = 0 + 0 + [d² H (θ | p)/dλ²] (λ − p)²/2 ≤ − 4m (λ − p)²/2 = − 2m (λ − p)²

From which we arrive at the Chernoff Bound

  Pr [ (X1 + … + Xm)/m ≥ λ ] ≤ exp [ − 2m (λ − p)² ]   0 < λ, p < 1

Define λ := p + ε :

  Pr [ (X1 + … + Xm)/m ≥ p + ε ] ≤ exp ( − 2m ε² )   0 < p < 1
Chernoff's Bound (continue – 3)

Using the Chernoff Bound we obtain

  Pr [ (X1 + … + Xm)/m ≥ p + ε ] ≤ exp ( − 2m ε² )   0 < p < 1

Define now:  Y := [ (1 − X1) + … + (1 − Xm) ]/m ,
where the (1 − Xi) are Bernoulli with success probability 1 − p, so

  Pr [ ( (1 − X1) + … + (1 − Xm) )/m ≥ (1 − p) + ε ] ≤ exp ( − 2m ε² )   0 < p < 1

or, since  ( (1 − X1) + … + (1 − Xm) )/m = 1 − (X1 + … + Xm)/m ≥ (1 − p) + ε  is the
same event as  (X1 + … + Xm)/m ≤ p − ε :

  Pr [ (X1 + … + Xm)/m ≤ p − ε ] ≤ exp ( − 2m ε² )   0 < p < 1

together with:

  Pr [ (X1 + … + Xm)/m ≥ p + ε ] ≤ exp ( − 2m ε² )   0 < p < 1

By summing those two inequalities we obtain:

  Pr [ | (X1 + … + Xm)/m − p | ≥ ε ]
    = Pr [ (X1 + … + Xm)/m ≤ p − ε ] + Pr [ (X1 + … + Xm)/m ≥ p + ε ]
    ≤ 2 exp ( − 2m ε² )   0 < p < 1

Chernoff's Bounds

Herman Chernoff
1921 –
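The two-sided Chernoff bound above is easy to probe numerically (a sketch, not from the original slides; p, m, ε are arbitrary illustrative values):

```python
import math
import random

random.seed(2)

# Two-sided Chernoff bound for Bernoulli means:
# Pr(|mean - p| >= eps) <= 2 exp(-2 m eps^2)
p, m, eps = 0.3, 100, 0.1
trials = 20_000

exceed = 0
for _ in range(trials):
    mean = sum(1 if random.random() < p else 0 for _ in range(m)) / m
    if abs(mean - p) >= eps:
        exceed += 1

prob = exceed / trials
bound = 2 * math.exp(-2 * m * eps**2)
print(prob, "<=", bound)
```

With these values the bound is about 0.27 while the observed frequency is a few percent; the bound holds with room to spare.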
Hoeffding's Bound

Let us start with a simpler problem:

Suppose that Y is a random variable with a ≤ Y ≤ b almost surely for some finite a and b,
and assume E (Y) = 0.

Define:  α := (b − Y)/(b − a) ,  a ≤ Y ≤ b  →  0 ≤ α ≤ 1

We have:  Y = [ (b − Y)/(b − a) ] a + [ (Y − a)/(b − a) ] b = α a + (1 − α) b

Since exp (·) is a convex function, for any t ≥ 0 we have:

  exp (t Y) ≤ α exp (t a) + (1 − α) exp (t b)
           = [ (b − Y)/(b − a) ] exp (t a) + [ (Y − a)/(b − a) ] exp (t b)   ∀ t ≥ 0

[Figure: the chord of exp between the points (ta, exp(ta)) and (tb, exp(tb)) lies above
exp(tY) for tY = α ta + (1 − α) tb, t ≥ 0]

Let us take the expectation of this inequality and define:

  p := − a/(b − a)   ( E (Y) = 0 ,  a ≤ 0 ≤ b  →  0 ≤ p ≤ 1 )

  E { exp (t Y) } ≤ [ (b − E (Y))/(b − a) ] exp (t a) + [ (E (Y) − a)/(b − a) ] exp (t b)
                 = (1 − p) exp (t a) + p exp (t b) =: exp [ φ (u) ]   ∀ t ≥ 0
Hoeffding's Bound (continue – 1)

  E { exp (t Y) } ≤ (1 − p) exp (t a) + p exp (t b) = exp [ φ (u) ]   ∀ t ≥ 0

where:

  u := t (b − a) ,  φ (u) := − p u + ln [ (1 − p) + p exp (u) ]  →  φ (0) = 0

Differentiating we obtain:

  d φ (u)/d u = − p + p exp (u)/[ (1 − p) + p exp (u) ]  →  d φ (0)/d u = 0

  d² φ (u)/d u² = p (1 − p) exp (u)/[ (1 − p) + p exp (u) ]²

The second derivative is maximal where

  d³ φ (u)/d u³ = p (1 − p) exp (u) [ (1 − p) − p exp (u) ]/[ (1 − p) + p exp (u) ]³ = 0
    →  exp (u*) = (1 − p)/p

  d² φ (u)/d u² ≤ d² φ (u*)/d u² = p (1 − p) [(1 − p)/p]/[ (1 − p) + (1 − p) ]² = 1/4
Hoeffding's Bound (continue – 2)

  E { exp (t Y) } ≤ exp [ φ (u) ] ,  u := t (b − a) ,  φ (0) = 0 ,  φ' (0) = 0 ,  φ'' ≤ 1/4

By a Taylor expansion (0 ≤ θ ≤ 1):

  φ (u) = φ (0) + φ' (0) u + ½ φ'' (θ u) u² ≤ u²/8 = t² (b − a)²/8

  E { exp (t Y) } ≤ exp [ t² (b − a)²/8 ]   ∀ t ≥ 0

End of the simpler problem:
Y is a random variable with a ≤ Y ≤ b almost surely for some finite a and b, and
E (Y) = 0.
Hoeffding's Bound (continue – 3)

Let us generalize the result.

Suppose X1, X2, …, Xm are independent random variables with ai ≤ Xi ≤ bi for
i = 1, 2, …, m. Define Zi := Xi − E (Xi), meaning E (Zi) = 0, and

  Z := Z1 + Z2 + … + Zm

Therefore we have

  E { exp (t Zi) } ≤ exp [ t² (bi − ai)²/8 ]   ∀ t ≥ 0

Use  Pr ( Y ≥ λ ) ≤ E [exp (t Y)]/exp (t λ)  ∀ t ≥ 0  with Y = ± Σ Zi ; by
independence the expectation of the product factors into the product of expectations:

  Pr [ | Σ_{i=1}^m Zi | ≥ λ ] = Pr [ Σ_{i=1}^m Zi ≥ λ ] + Pr [ − Σ_{i=1}^m Zi ≥ λ ]
    ≤ exp (− t λ) E [ exp ( t Σ_{i=1}^m Zi ) ] + exp (− t λ) E [ exp ( − t Σ_{i=1}^m Zi ) ]
    = exp (− t λ) Π_{i=1}^m E [exp (t Zi)] + exp (− t λ) Π_{i=1}^m E [exp (− t Zi)]
    ≤ 2 exp (− t λ) Π_{i=1}^m exp [ t² (bi − ai)²/8 ]
    = 2 exp [ − t λ + (t²/8) Σ_{i=1}^m (bi − ai)² ]   ∀ t ≥ 0
Hoeffding's Bound (continue – 4)

  Pr [ | Σ_{i=1}^m Zi | ≥ λ ] ≤ 2 exp [ − t λ + (t²/8) Σ_{i=1}^m (bi − ai)² ]   ∀ t ≥ 0
    ≤ 2 exp { inf_{t ≥ 0} [ − t λ + (t²/8) Σ_{i=1}^m (bi − ai)² ] }

but the infimum is attained at  t* = 4 λ/Σ_{i=1}^m (bi − ai)² :

  inf_{t ≥ 0} [ − t λ + (t²/8) Σ_{i=1}^m (bi − ai)² ] = − 2 λ²/Σ_{i=1}^m (bi − ai)²

We finally obtain Hoeffding's Bound

  Pr [ | Σ_{i=1}^m Zi | ≥ λ ] ≤ 2 exp [ − 2 λ²/Σ_{i=1}^m (bi − ai)² ]

Wassily Hoeffding
1914 – 1991
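Hoeffding's bound can be checked the same way (a sketch, not from the original slides; uniform variables and λ = 5 are arbitrary choices):

```python
import math
import random

random.seed(3)

# Hoeffding's bound for m uniform [0,1] variables, Zi = Xi - 0.5:
# Pr(|sum Zi| >= lam) <= 2 exp(-2 lam^2 / sum (bi - ai)^2), with bi - ai = 1.
m, lam = 50, 5.0
trials = 20_000

exceed = 0
for _ in range(trials):
    z = sum(random.random() - 0.5 for _ in range(m))
    if abs(z) >= lam:
        exceed += 1

prob = exceed / trials
bound = 2 * math.exp(-2 * lam**2 / m)   # sum of (bi - ai)^2 equals m here
print(prob, "<=", bound)
```

The observed frequency is around one percent against a bound of roughly 0.74; like the Chernoff bound, Hoeffding's bound is valid for every m but not tight.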
Convergence Concepts

Convergence Almost Everywhere (a.e.) (or with Probability 1, or Strongly)

We say that the sequence Xn converges to X with probability 1 if the set of outcomes
x such that

  lim_{n→∞} Xn (x) = X (x)

has probability 1, or

  Pr { Xn → X } = 1  for  n → ∞      (written Xn →a.e. X)

Convergence in the Mean-Square sense (m.s.)

We say that the sequence Xn converges to X in the mean-square sense if

  E { |Xn − X|² } → 0  for  n → ∞      (written Xn →m.s. X)

Convergence in Probability (p) (or Stochastic Convergence, or Convergence in Measure)

We say that the sequence Xn converges to X in the Probability sense if

  Pr { |Xn − X| > ε } → 0  for  n → ∞      (written Xn →P X)

Convergence in Distribution (d) (weak convergence)

We say that the sequence Xn converges to X in the Distribution sense if

  p_{Xn} (x) → p_X (x)  for  n → ∞      (written Xn →d X)

[Diagram: (a.e.) implies (p); (m.s.) implies (p); (p) implies (d)]
Convergence Concepts (continue – 1)

Cauchy Criterion of Convergence

According to the Cauchy Criterion of Convergence,
the sequence Xn converges to an unknown limit if

  |Xn − Xn+m| → 0  for  n → ∞  and any  m > 0

Augustin Louis Cauchy
(1789 – 1857)

Convergence Almost Everywhere (a.e.):

  Pr { |Xn − Xn+m| < ε } → 1  for  n → ∞  and any  m > 0

Convergence in the Mean-Square sense (m.s.):

  E { |Xn − Xn+m|² } → 0  for  n → ∞  and any  m > 0

Using the Chebyshev Inequality

  Pr { |Xn − X| > ε } ≤ E { |Xn − X|² }/ε²

If Xn → X in the m.s. sense, then the right-hand side,
for a given ε, tends to zero, and so does the left-hand side,
i.e. Convergence in Probability (p):

  Pr { |Xn − X| > ε } → 0  for  n → ∞

The opposite is not true: convergence in probability doesn't imply convergence in m.s.
The Laws of Large Numbers

The Law of Large Numbers is a fundamental concept in statistics and probability that
describes how the average of a large randomly selected sample of a population is likely
to be close to the average of the whole population. There are two laws of large numbers:
the Weak Law and the Strong Law.

The Weak Law of Large Numbers

The Weak Law of Large Numbers states that if X1, X2, …, Xn, … is an infinite sequence
of random variables that have the same expected value μ and variance σ², and are
uncorrelated (i.e., the correlation between any two of them is zero), then

  X̄n := (X1 + … + Xn)/n

converges in probability (a weak convergence sense) to μ. We have

  lim_{n→∞} Pr { |X̄n − μ| < ε } = 1

The Strong Law of Large Numbers

The Strong Law of Large Numbers states that if X1, X2, …, Xn, … is an infinite sequence
of random variables that have the same expected value μ and variance σ², are
uncorrelated (i.e., the correlation between any two of them is zero), and E (|Xi|) < ∞,
then

  Pr { lim_{n→∞} X̄n = μ } = 1

i.e. X̄n converges almost surely to μ.
The Law of Large Numbers
Differences between the Weak Law and the Strong Law
The Weak Law states that, for a specified large n, (X1 + ... + Xn) / n is likely to be near μ.
Thus, it leaves open the possibility that | (X1 + ... + Xn) / n − μ | > ε happens an infinite
number of times, although it happens at infrequent intervals.
The Strong Law shows that this almost surely will not occur.
In particular, it implies that with probability 1, we have for any positive value ε, the
inequality | (X1 + ... + Xn) / n − μ | > ε is true only a finite number of times (as opposed to
an infinite, but infrequent, number of times).
Almost sure convergence is also called strong convergence of random variables.
This version is called the strong law because random variables which converge
strongly (almost surely) are guaranteed to converge weakly (in probability). The
strong law implies the weak law.
Proof of the Weak Law of Large Numbers

Given

  E (Xi) = μ  ∀ i ,  Var (Xi) = σ²  ∀ i ,  E [(Xi − μ)(Xj − μ)] = 0  ∀ i ≠ j

we have:

  E (X̄n) = [ E (X1) + … + E (Xn) ]/n = n μ/n = μ

  Var (X̄n) = E { [X̄n − E (X̄n)]² } = E { [ (X1 − μ) + … + (Xn − μ) ]²/n² }
            = [ E (X1 − μ)² + … + E (Xn − μ)² ]/n²   (cross terms vanish, E [(Xi − μ)(Xj − μ)] = 0)
            = n σ²/n² = σ²/n

Using Chebyshev's inequality on X̄n we obtain:

  Pr ( |X̄n − μ| ≥ ε ) ≤ σ²/(n ε²)

Using this equation we obtain:

  Pr ( |X̄n − μ| ≤ ε ) = 1 − Pr ( |X̄n − μ| > ε ) ≥ 1 − Pr ( |X̄n − μ| ≥ ε ) ≥ 1 − σ²/(n ε²)

As n approaches infinity, the expression approaches 1.   q.e.d.
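The concentration proved above is visible in simulation (a sketch, not from the original slides; fair-coin flips with μ = 0.5 are an arbitrary choice):

```python
import random

random.seed(4)

# Weak Law of Large Numbers: the sample mean of fair-coin flips (mu = 0.5)
# concentrates around mu as n grows.
def sample_mean(n):
    return sum(random.randint(0, 1) for _ in range(n)) / n

means = {n: sample_mean(n) for n in (10, 1_000, 100_000)}
for n, m in means.items():
    print(n, m)
```

The deviation |X̄n − 0.5| shrinks roughly like σ/√n, in line with the Var (X̄n) = σ²/n step of the proof.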
Central Limit Theorem

The first version of this theorem was postulated by the
French-born English mathematician Abraham de Moivre in
1733, using the normal distribution to approximate the
distribution of the number of heads resulting from many tosses
of a fair coin. This was published in 1756 in "The Doctrine
of Chances", 3rd Ed.

Abraham de Moivre
(1667 – 1754)

This finding was forgotten until 1812, when the French
mathematician Pierre-Simon Laplace revived it in his work
"Théorie Analytique des Probabilités", in which he approximated
the binomial distribution with the normal distribution.
This is known as the De Moivre–Laplace Theorem.

Pierre-Simon Laplace
(1749 – 1827)

The present form of the Central Limit Theorem was given by the
Russian mathematician Alexandr Lyapunov in 1901.

Alexandr Mikhailovich Lyapunov
(1857 – 1918)
Central Limit Theorem (continue – 1)

Let X1, X2, …, Xm be a sequence of independent random variables with the same
probability distribution function p_X (x), mean μ and variance σ². Define the statistical mean:

  X̄m = (X1 + X2 + … + Xm)/m

We have:

  E (X̄m) = [ E (X1) + E (X2) + … + E (Xm) ]/m = μ

  σ²_{X̄m} = Var (X̄m) = E { [X̄m − E (X̄m)]² }
           = E { [ (X1 − μ) + (X2 − μ) + … + (Xm − μ) ]²/m² } = m σ²/m² = σ²/m

Define also the new random variable

  Y := [ X̄m − E (X̄m) ]/σ_{X̄m} = [ (X1 − μ) + (X2 − μ) + … + (Xm − μ) ]/(σ √m)

The probability distribution of Y tends to become gaussian (normal) as m
tends to infinity, regardless of the probability distribution of the random
variable, as long as the mean μ and the variance σ² are finite.
Central Limit Theorem (continue – 2)

Proof

  Y := [ X̄m − E (X̄m) ]/σ_{X̄m} = [ (X1 − μ) + … + (Xm − μ) ]/(σ √m)

The Characteristic Function of Y:

  Φ_Y (ω) = E [exp (jωY)]
          = E { exp [ jω ( (X1 − μ) + … + (Xm − μ) )/(σ √m) ] }
          = Π_{i=1}^m E { exp [ jω (Xi − μ)/(σ √m) ] }       (independence)
          = [ Φ_{(X−μ)/σ} ( ω/√m ) ]^m

Develop Φ_{(X−μ)/σ} (ω/√m) in a Taylor series:

  Φ_{(X−μ)/σ} ( ω/√m ) = 1 + (jω/√m)/1! · E [ (Xi − μ)/σ ]
                           + (jω/√m)²/2! · E [ ((Xi − μ)/σ)² ]
                           + (jω/√m)³/3! · E [ ((Xi − μ)/σ)³ ] + …
                        = 1 − ω²/(2m) + O ( ω³/m^{3/2} )

where  E [ (Xi − μ)/σ ] = 0 ,  E [ ((Xi − μ)/σ)² ] = 1 ,  and
m · O ( ω³/m^{3/2} ) → 0  as  m → ∞.
Central Limit Theorem (continue – 3)

Proof (continue – 1)

The Characteristic Function

  Φ_Y (ω) = [ Φ_{(X−μ)/σ} ( ω/√m ) ]^m ,
  Φ_{(X−μ)/σ} ( ω/√m ) = 1 − ω²/(2m) + O ( ω³/m^{3/2} )

Therefore

  Φ_Y (ω) = [ 1 − ω²/(2m) + O ( ω³/m^{3/2} ) ]^m  →(m→∞)  exp ( − ω²/2 )

which is the Characteristic Function of the Normal Distribution. Inverting:

  p_Y (y) = (1/2π) ∫_{−∞}^{+∞} Φ_Y (ω) exp ( − jωy ) dω
          →(m→∞)  (1/2π) ∫_{−∞}^{+∞} exp ( − ω²/2 ) exp ( − jωy ) dω
          = (1/√(2π)) exp ( − y²/2 )

The probability distribution of Y tends to become gaussian (normal) as m tends to infinity
(Convergence in Distribution).
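The standardized variable Y of the theorem can be simulated directly (a sketch, not from the original slides; uniform summands and m = 50 are arbitrary choices):

```python
import math
import random
import statistics

random.seed(5)

# CLT: standardized means of uniform [0,1] samples (mu = 0.5, sigma^2 = 1/12)
# should look standard normal for large m.
m, trials = 50, 20_000
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)

ys = []
for _ in range(trials):
    xbar = sum(random.random() for _ in range(m)) / m
    ys.append((xbar - mu) / (sigma / math.sqrt(m)))

print(statistics.mean(ys), statistics.pstdev(ys))

# Fraction of Y within one standard deviation; for a standard normal ~ 0.6827.
frac = sum(1 for y in ys if abs(y) <= 1.0) / trials
print(frac)
```

Even for a flat parent distribution, Y is already close to N(0, 1) at m = 50, which is the convergence-in-distribution statement of the theorem.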
Bernoulli Trials – The Binomial Distribution

Probability Density Function

  p (k, n) = C (n, k) p^k (1 − p)^{n−k} ,   C (n, k) := n!/[k! (n − k)!]

Cumulative Distribution Function

  P (k; n, p) = Σ_{i=0}^{k} C (n, i) p^i (1 − p)^{n−i}

Mean Value

  E (x) = n p

Variance

  Var (x) = n p (1 − p)

Moment Generating Function

  Φ (ω) = [ p exp (jω) + (1 − p) ]^n

Jacob Bernoulli
1654 – 1705
Bernoulli Trials – The Binomial Distribution (continue – 1)

Given a random event r = {0, 1}:

p – probability of success (r = 1) of a given discrete trial
q – probability of failure (r = 0) of the given discrete trial

  p + q = 1

n – number of independent trials
p (k, n) – probability of k successes in n independent trials (Bernoulli Trials)

The number of ways of obtaining k successful trials from n independent trials is

  C (n, k) = n!/[k! (n − k)!]

each occurring with probability p^k (1 − p)^{n−k}, so the probability of k successful
trials from n independent trials is given by

  p (k, n) = n!/[k! (n − k)!] p^k (1 − p)^{n−k}

Using the binomial theorem we obtain

  (p + q)^n = Σ_{k=0}^{n} C (n, k) p^k (1 − p)^{n−k} = 1

therefore the previous distribution is called the binomial distribution.

[Figure: bar plot of p (k, n) versus k = 0, 1, 2, …, 14]
Bernoulli Trials – The Binomial Distribution (continue – 2)

  p (k, n) = n!/[k! (n − k)!] p^k (1 − p)^{n−k}

Mean Value

  E (X) = Σ_{i=0}^{n} i · n!/[i! (n − i)!] p^i (1 − p)^{n−i}
        = Σ_{i=1}^{n} n!/[(i − 1)! (n − i)!] p^i (1 − p)^{n−i}
        = n p Σ_{i=1}^{n} (n − 1)!/[(i − 1)! (n − i)!] p^{i−1} (1 − p)^{n−i}
        = n p Σ_{k=0}^{n−1} (n − 1)!/[k! (n − 1 − k)!] p^k (1 − p)^{n−1−k}
        = n p [ p + (1 − p) ]^{n−1} = n p

Moment Generating Function

  Φ (ω) = E ( e^{jωX} ) = Σ_{k=0}^{n} e^{jωk} n!/[k! (n − k)!] p^k (1 − p)^{n−k}
        = Σ_{k=0}^{n} n!/[k! (n − k)!] ( p e^{jω} )^k (1 − p)^{n−k}
        = [ p e^{jω} + (1 − p) ]^n
Bernoulli Trials – The Binomial Distribution (continue – 3)

  p (k, n) = n!/[k! (n − k)!] p^k (1 − p)^{n−k}

  E (X²) = Σ_{i=0}^{n} i² · n!/[i! (n − i)!] p^i (1 − p)^{n−i}
         = Σ_{i=1}^{n} i · n!/[(i − 1)! (n − i)!] p^i (1 − p)^{n−i}

and using i = (i − 1) + 1 in the remaining factor i:

  E (X²) = Σ_{i=2}^{n} n!/[(i − 2)! (n − i)!] p^i (1 − p)^{n−i}
           + Σ_{i=1}^{n} n!/[(i − 1)! (n − i)!] p^i (1 − p)^{n−i}

The two sums give

  Σ_{i=2}^{n} n!/[(i − 2)! (n − i)!] p^i (1 − p)^{n−i}
    = n (n − 1) p² Σ_{m=0}^{n−2} (n − 2)!/[m! (n − 2 − m)!] p^m (1 − p)^{n−2−m} = n (n − 1) p²

  Σ_{i=1}^{n} n!/[(i − 1)! (n − i)!] p^i (1 − p)^{n−i}
    = n p Σ_{k=0}^{n−1} (n − 1)!/[k! (n − 1 − k)!] p^k (1 − p)^{n−1−k} = n p

so

  E (X²) = n (n − 1) p² + n p

Variance

  Var (X) = E (X²) − E (X)² = n (n − 1) p² + n p − n² p² = n p (1 − p)
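The mean and variance formulas just derived can be verified by summing the pmf directly (a sketch, not from the original slides; n = 20, p = 0.3 are arbitrary choices):

```python
from math import comb

# Direct summation of the binomial pmf p(k, n) = C(n, k) p^k (1-p)^(n-k)
# to check E(X) = n p and Var(X) = n p (1 - p).
n, p = 20, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k**2 * pk for k, pk in enumerate(pmf)) - mean**2
print(total, mean, var)
```

The sum of the pmf is 1 (binomial theorem), the mean is n p = 6, and the variance is n p (1 − p) = 4.2.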
Bernoulli Trials – The Binomial Distribution (continue – 4)

Mean Value:  E (x) = n p
Variance:  Var (x) = E { [X − E (X)]² } = n p (1 − p)

Let us apply Chebyshev's Inequality:

  Pr { [X − E (X)]² ≥ k² } ≤ E { [X − E (X)]² }/k² = σ²/k²

We obtain:

  Pr { |X − n p|² ≥ k² } ≤ n p (1 − p)/k²

An upper bound to this inequality, when p varies (0 ≤ p ≤ 1), can be obtained by
taking the derivative of p (1 − p), equating it to zero, and solving for p. The result is
p = 0.5:

  Pr { |X − n p|² ≥ k² } ≤ n/(4 k²)

We can see that when k → ∞

  Pr { |X − n p|² ≥ k² } → 0

i.e. X converges in Probability to the Mean Value n p. This is known as Bernoulli's Theorem.
Generalized Bernoulli Trials

Consider now r mutually exclusive events A1, A2, …, Ar

  Ai ∩ Aj = Ø   i ≠ j ,  i, j = 1, 2, …, r

with their sum equal to the certain event S:  A1 ∪ A2 ∪ … ∪ Ar = S

and the probabilities of occurrence  p (A1) = p1, p (A2) = p2, …, p (Ar) = pr.

Therefore  p (A1) + p (A2) + … + p (Ar) = p1 + p2 + … + pr = 1

We want to find the probability that in n trials we will obtain A1, k1 times, A2, k2
times, and so on, and Ar, kr times, such that  k1 + k2 + … + kr = n.

The number of possible combinations of k1 events A1, k2 events A2, …, kr events Ar
is  n!/(k1! k2! … kr!)  and the probability of each combination is  p1^{k1} p2^{k2} … pr^{kr}.

We obtain the probability of the Generalized Bernoulli Trials as

  p (k1, k2, …, kr; n) = n!/(k1! k2! … kr!) p1^{k1} p2^{k2} … pr^{kr}
Poisson Asymptotical Development (Law of Rare Events)

Start with the Binomial Distribution

  p (k, n) = C (n, k) p^k (1 − p)^{n−k} = n!/[k! (n − k)!] p^k (1 − p)^{n−k}

We assume that n >> 1 and  p = k0/n ,  k0/n << 1.

  p (0, n) = (1 − p)^n = (1 − k0/n)^n  →(n→∞)  e^{−k0}

  p (k, n) = [ n (n − 1) … (n − k + 1)/k! ] (k0/n)^k (1 − k0/n)^{n−k}
           = (1) (1 − 1/n) … (1 − (k − 1)/n) (1 − k0/n)^{−k} (k0^k/k!) (1 − k0/n)^n
           →(n→∞)  (k0^k/k!) p (0, n)

since each of the finitely many factors in front tends to 1. Therefore

  p (k, n) ≈ (k0^k/k!) exp (− k0)

This is the Poisson Asymptotical Development (Law of Rare Events).

Siméon Denis Poisson
1781 – 1840
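The quality of the rare-events limit can be seen numerically (a sketch, not from the original slides; n = 1000, p = 0.003 are arbitrary values in the large-n, small-p regime):

```python
import math
from math import comb

# Compare the exact binomial pmf with its Poisson limit (k0 = n p)
# in the rare-events regime.
n, p = 1_000, 0.003
k0 = n * p

max_diff = 0.0
for k in range(20):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = math.exp(-k0) * k0**k / math.factorial(k)
    max_diff = max(max_diff, abs(binom - poisson))
print(max_diff)
```

The largest pointwise difference between the two pmfs is on the order of n p² (Le Cam's bound gives the same scaling for the total variation distance), already tiny here.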
Poisson Distribution

Probability Density Function

  p (k; λ) = λ^k exp (− λ)/k!   k positive integer

Cumulative Distribution Function

  P (k; λ) = Σ_{i=0}^{k} p (i; λ) = Σ_{i=0}^{k} e^{−λ} λ^i/i! = Γ (k + 1, λ)/k!

where Γ (a, x) = ∫_x^∞ t^{a−1} exp (− t) dt is the upper incomplete gamma function.

Mean Value

  E (X) = Σ_{i=0}^{∞} i λ^i exp (− λ)/i! = λ Σ_{i=1}^{∞} λ^{i−1} exp (− λ)/(i − 1)!
        = λ Σ_{k=0}^{∞} λ^k exp (− λ)/k! = λ exp (− λ) exp (λ) = λ

  E (X²) = Σ_{i=0}^{∞} i² λ^i exp (− λ)/i! = Σ_{i=1}^{∞} i λ^i exp (− λ)/(i − 1)!
         = Σ_{i=2}^{∞} λ^i exp (− λ)/(i − 2)! + Σ_{i=1}^{∞} λ^i exp (− λ)/(i − 1)!
         = λ² + λ

Variance

  Var (x) = E (X²) − E (X)² = λ

Moment Generating Function

  Φ (ω) = E [exp (jωk)] = Σ_{m=0}^{∞} exp (jωm) λ^m exp (− λ)/m!
        = exp (− λ) Σ_{m=0}^{∞} [ λ exp (jω) ]^m/m! = exp (− λ) exp [ λ exp (jω) ]
        = exp { λ [ exp (jω) − 1 ] }

Siméon Denis Poisson
1781 – 1840
Poisson Distribution (continue)

Moment Generating Function:  Φ (ω) = exp { λ [ exp (jω) − 1 ] }

Approximation to the Gaussian Distribution

  λ [ exp (jω) − 1 ] = λ (cos ω − 1) + j λ sin ω = − 2 λ sin² (ω/2) + j λ sin ω

For λ sufficiently large, Φ (ω) is negligible for all but very small values of ω,
in which case  sin² (ω/2) ≈ ω²/4  and  sin ω ≈ ω :

  Φ (ω) = exp { λ [ exp (jω) − 1 ] } ≈ exp ( − λ ω²/2 + j λ ω )

For a normal distribution with mean μ and variance σ² we found the
Moment Generating Function:

  Φ (ω) = exp ( − σ² ω²/2 + j ω μ )

Therefore the Poisson Distribution can be approximated by a Gaussian
Distribution with mean μ = λ and variance σ² = λ :

  p (k; λ) = λ^k exp (− λ)/k! ~ 1/√(2πλ) exp [ − (k − λ)²/(2λ) ]
Poisson Distribution

Probability Density Function:  p (k; λ) = λ^k exp (− λ)/k! ,  k positive integer

Cumulative Distribution Function:  P (k; λ) = Σ_{i=0}^{k} e^{−λ} λ^i/i! = Γ (k + 1, λ)/k!
(Γ (a, x) being the upper incomplete gamma function)

Mean Value:  E (x) = λ

Variance:  Var (x) = λ

Moment Generating Function:  Φ (ω) = exp { λ [ exp (jω) − 1 ] }
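The Gaussian approximation derived above can be checked pointwise (a sketch, not from the original slides; λ = 100 is an arbitrary "large" value):

```python
import math

# Compare the Poisson pmf with its Gaussian approximation (mu = sigma^2 = lam).
lam = 100.0

max_diff = 0.0
for k in range(50, 151):
    # Poisson pmf computed in log space to avoid overflow of lam**k and k!.
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    gauss = math.exp(-(k - lam) ** 2 / (2 * lam)) / math.sqrt(2 * math.pi * lam)
    max_diff = max(max_diff, abs(poisson - gauss))
print(max_diff)
```

At λ = 100 the pmf and the Gaussian density agree to a few parts in ten thousand over the bulk of the distribution; the residual error shrinks as λ grows.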
Normal (Gaussian) Distribution

Probability Density Function

  p (x; μ, σ) = 1/(√(2π) σ) exp [ − (x − μ)²/(2σ²) ]

Cumulative Distribution Function

  P (x; μ, σ) = 1/(√(2π) σ) ∫_{−∞}^{x} exp [ − (u − μ)²/(2σ²) ] du

Mean Value:  E (x) = μ

Variance:  Var (x) = σ²

Moment Generating Function

  Φ (ω) = E [exp (jωx)] = 1/(√(2π) σ) ∫_{−∞}^{+∞} exp (jωu) exp [ − (u − μ)²/(2σ²) ] du
        = exp ( jωμ − σ²ω²/2 )

Karl Friedrich Gauss
1777 – 1855
De Moivre-Laplace Asymptotical Development

Start with the Binomial Distribution

  p (k, n) = n!/[k! (n − k)!] p^k q^{n−k} ,   q := 1 − p

Use the Stirling asymptotical approximation  n! ≈ √(2πn) n^n exp (− n) :

  p (k, n) ≈ √(2πn) n^n exp (− n) p^k q^{n−k}
             / [ √(2πk) k^k exp (− k) · √(2π(n − k)) (n − k)^{n−k} exp (− (n − k)) ]
           = √( n/(2π k (n − k)) ) (n p/k)^k ( n q/(n − k) )^{n−k}

Define  k0 := n p  and  δk := k − (n + 1) p. The ratio of successive terms is

  p (k, n)/p (k − 1, n) = [ (n − k + 1)/k ] (p/q) = 1 − δk/(k q)

so

  k < (n + 1) p  ⇒  p (k, n) > p (k − 1, n)
  k > (n + 1) p  ⇒  p (k, n) < p (k − 1, n)

i.e. p (k, n) peaks near k ≈ n p.
De Moivre-Laplace Asymptotical Development (continue – 1)

  p (k, n) ≈ √( n/(2π k (n − k)) ) (n p/k)^k ( n q/(n − k) )^{n−k}

For  n >> 1 ,  k >> 1 ,  k ≈ n p ,  and with  δk := k − n p ,  σ := √(n p q) :

  p (k, n) ≈ 1/(√(2π) σ) (n p/k)^{k + 1/2} ( n q/(n − k) )^{n − k + 1/2}

  p (k, n) ≈ 1/(√(2π) σ) ( 1 + δk/(n p) )^{− (n p + δk + 1/2)} ( 1 − δk/(n q) )^{− (n q − δk + 1/2)}
           ≈ 1/(√(2π) σ) ( 1 + δk/(n p) )^{− (n p + δk)} ( 1 − δk/(n q) )^{− (n q − δk)}

(for n p >> ½ and n q >> ½ the ½ in the exponents may be dropped).
De Moivre-Laplace Asymptotical Development (continue – 2)

  p (k, n) ≈ 1/(√(2π) σ) ( 1 + δk/(n p) )^{− (n p + δk)} ( 1 − δk/(n q) )^{− (n q − δk)}

Take the logarithm and use  ln (1 + x) ≈ x − x²/2 :

  ln [ √(2π) σ p (k, n) ] ≈ − (n p + δk) ln ( 1 + δk/(n p) ) − (n q − δk) ln ( 1 − δk/(n q) )
    ≈ − (n p + δk) [ δk/(n p) − δk²/(2 n² p²) ] + (n q − δk) [ δk/(n q) + δk²/(2 n² q²) ]
    ≈ − [ δk + δk²/(2 n p) ] + [ δk − δk²/(2 n q) ]
    = − δk² (p + q)/(2 n p q) = − δk²/(2σ²)

from which

  p (k, n) ≈ 1/(√(2π) σ) exp [ − δk²/(2σ²) ] ,   δk = k − n p ,  σ² = n p q

This result was first published by De Moivre in 1756 in "The Doctrine of Chances",
3rd Ed., and reviewed by Laplace, "Théorie Analytique des Probabilités", 1820.

Abraham de Moivre (1667 – 1754)
Pierre-Simon Laplace (1749 – 1827)
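The quality of the De Moivre–Laplace approximation is easy to measure (a sketch, not from the original slides; n = 400, p = 0.5 are arbitrary illustrative values):

```python
import math
from math import comb

# Compare the exact binomial pmf with the De Moivre-Laplace approximation
# p(k, n) ~ exp(-dk^2 / (2 sigma^2)) / (sqrt(2 pi) sigma), sigma^2 = n p q.
n, p = 400, 0.5
q = 1 - p
sigma = math.sqrt(n * p * q)

max_diff = 0.0
for k in range(150, 251):
    exact = comb(n, k) * p**k * q**(n - k)
    approx = math.exp(-(k - n * p) ** 2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
    max_diff = max(max_diff, abs(exact - approx))
print(max_diff)
```

For this symmetric case the two pmfs agree to better than one part in a thousand across the central region, consistent with the asymptotic derivation above.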
De Moivre-Laplace Asymptotical Development for Generalized Bernoulli Trials

Consider the r mutually exclusive events A1, A2, …, Ar

  Ai ∩ Aj = Ø   i ≠ j ,  i, j = 1, 2, …, r

with their sum equal to the certain event S:  A1 ∪ A2 ∪ … ∪ Ar = S

and the probabilities of occurrence  p (A1) = p1, …, p (Ar) = pr ,
p1 + p2 + … + pr = 1.

The probability that in n trials we will obtain A1, k1 times, A2, k2 times, and so on, and
Ar, kr times, such that k1 + k2 + … + kr = n, is

  p (k1, …, kr; n) = n!/(k1! … kr!) p1^{k1} … pr^{kr}

For n going to infinity and  n pi − √n ≤ ki ≤ n pi + √n  we have

  n!/(k1! … kr!) p1^{k1} … pr^{kr}
    →  [ (2πn)^{r−1} p1 p2 … pr ]^{−1/2}
       exp { − [ (k1 − n p1)²/(2 n p1) + … + (kr − n pr)²/(2 n pr) ] }
De Moivre-Laplace Asymptotical Development for Generalized Poisson Trials

Consider the r − 1 mutually exclusive events A1, A2, …, Ar−1

  Ai ∩ Aj = Ø   i ≠ j

with small probabilities of occurrence  p (A1) = p1, …, p (Ar−1) = pr−1 ,
such that  p (A1) + … + p (Ar−1) = p1 + … + pr−1 =: 1 − pr ,  1 − pr << 1.

The probability that in n trials we will obtain A1, k1 times, A2, k2 times, and so on, and
Ar−1, kr−1 times, such that k1 + k2 + … + kr−1 =: n − kr , is

  p (k1, …, kr; n) = n!/(k1! … kr!) p1^{k1} … pr^{kr}

For n going to infinity

  n!/(k1! … kr!) p1^{k1} … pr^{kr}
    →  [ (n p1)^{k1} exp (− n p1)/k1! ] … [ (n pr−1)^{k_{r−1}} exp (− n pr−1)/k_{r−1}! ]
Laplacian Distribution

Probability Density Function

  p (x; μ, b) = 1/(2b) exp ( − |x − μ|/b )

Cumulative Distribution Function

  P (x; μ, b) = 1/(2b) ∫_{−∞}^{x} exp ( − |u − μ|/b ) du

Mean Value:  E (x) = μ

Variance:  Var (x) = 2 b²

Moment Generating Function

  Φ_X (ω) = E [exp (jωx)] = 1/(2b) ∫_{−∞}^{+∞} exp (jωu) exp ( − |u − μ|/b ) du
          = exp (jωμ)/(1 + b² ω²)

Pierre-Simon Laplace
(1749 – 1827)
Gamma Distribution

Probability Density Function

  p (x; k, θ) = { x^{k−1} exp (− x/θ)/[ Γ(k) θ^k ]   x ≥ 0
               { 0                                    x < 0

Cumulative Distribution Function

  P (x; k, θ) = { γ (k, x/θ)/Γ(k)   x ≥ 0
               { 0                   x < 0

Γ is the gamma function:  Γ (a) = ∫_0^∞ t^{a−1} exp (− t) dt
γ is the incomplete gamma function:  γ (a, x) = ∫_0^x t^{a−1} exp (− t) dt

Mean Value:  E (x) = k θ

Variance:  Var (x) = k θ²

Moment Generating Function

  Φ_X (ω) = E [exp (jωx)] = (1 − jωθ)^{−k}
Beta Distribution

Probability Density Function

  p (x; α, β) = x^{α−1} (1 − x)^{β−1} / ∫_0^1 u^{α−1} (1 − u)^{β−1} du
              = Γ(α + β)/[ Γ(α) Γ(β) ] x^{α−1} (1 − x)^{β−1}   0 ≤ x ≤ 1

Cumulative Distribution Function

  P (x; α, β) = Γ(α + β)/[ Γ(α) Γ(β) ] ∫_0^x u^{α−1} (1 − u)^{β−1} du

Γ is the gamma function:  Γ (a) = ∫_0^∞ t^{a−1} exp (− t) dt

Mean Value:  E (x) = α/(α + β)

Variance:  Var (x) = α β/[ (α + β)² (α + β + 1) ]

Moment Generating Function

  Φ_X (ω) = E [exp (jωx)] = 1 + Σ_{k=1}^{∞} [ Π_{r=0}^{k−1} (α + r)/(α + β + r) ] (jω)^k/k!
Cauchy Distribution

Probability Density Function

  p (x; x0, γ) = 1/(πγ) · 1/[ 1 + ((x − x0)/γ)² ] = (1/π) γ/[ (x − x0)² + γ² ]

Cumulative Distribution Function

  P (x; x0, γ) = (1/π) arctan ( (x − x0)/γ ) + 1/2

Mean Value: not defined
Variance: not defined
Moment Generating Function: not defined

Augustin Louis Cauchy
(1789 – 1857)
97
SOLO Review of Probability
Cauchy Distribution

Example of Cauchy Distribution Derivation
Assume a particle is leaving the origin, moving
with constant velocity toward a wall situated at a
distance a from the origin. The angle θ, between the
particle velocity vector and the Ox axis, is a random
variable uniformly distributed between – θ₁ and + θ₁.
Find the probability distribution function of y, the
distance from the Ox axis at which the particle hits the
wall:  y = a tan θ

p_Θ(θ) = 1/(2 θ₁)   for −θ₁ ≤ θ ≤ θ₁;   0 elsewhere

Therefore we obtain
p_Y(y) = p_Θ(θ) / |d y/d θ| = [1/(2 θ₁)] · 1/[a (1 + tan² θ)]
       = a / [2 θ₁ (a² + y²)]   for −a tan θ₁ ≤ y ≤ a tan θ₁;   0 elsewhere

Functions of One Random Variable
Table of Content
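In the limit θ₁ → π/2 the hit point y = a tan θ is exactly Cauchy with location 0 and scale a. Since the mean does not exist, a quick check can use the median and quartiles instead (a sketch, not part of the original slides; for Cauchy(0, a) the quartiles sit at ∓a):

```python
import numpy as np

# theta uniform on (-pi/2, pi/2)  =>  y = a*tan(theta) ~ Cauchy(0, a).
rng = np.random.default_rng(2)
a = 1.5
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=400_000)
y = a * np.tan(theta)

# From the CDF above: P = 0.25 at y = -a, P = 0.5 at y = 0, P = 0.75 at y = +a.
q1, med, q3 = np.quantile(y, [0.25, 0.5, 0.75])
assert abs(med) < 0.02
assert abs(q1 + a) < 0.02 and abs(q3 - a) < 0.02
```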
98
SOLO Review of Probability
Exponential Distribution

Probability Density Function
p(x; λ) = λ exp(−λ x)   for x ≥ 0;   0 for x < 0

Cumulative Distribution Function
P(x; λ) = ∫_{−∞}^x λ exp(−λ t) dt = 1 − exp(−λ x)   for x ≥ 0;   0 for x < 0

Mean Value (integration by parts, u = x, dv = λ exp(−λ x) dx)
E(x) = ∫₀^∞ x λ exp(−λ x) dx = [−x exp(−λ x)]₀^∞ + ∫₀^∞ exp(−λ x) dx = 1/λ

Variance
Var(x) = E(x²) − [E(x)]² = 2/λ² − 1/λ² = 1/λ²

Moment Generating Function
Φ_X(ω) = E[exp(j ω x)] = ∫₀^∞ exp(j ω x) λ exp(−λ x) dx = λ/(λ − j ω) = (1 − j ω/λ)^(−1)
E(x²) = (1/j²) d²Φ_X/dω² |_{ω=0} = 2/λ²

Distributions
examples
Table of Content
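A sampling check of the mean and variance above (a sketch, not part of the original slides; NumPy parameterizes the exponential by the scale 1/λ):

```python
import numpy as np

# Monte Carlo check of E{x} = 1/lambda and Var{x} = 1/lambda^2.
rng = np.random.default_rng(3)
lam = 0.5
x = rng.exponential(scale=1.0 / lam, size=200_000)

mean_mc, var_mc = x.mean(), x.var()
assert abs(mean_mc - 1.0 / lam) < 0.03        # E{x} = 2
assert abs(var_mc - 1.0 / lam ** 2) < 0.15    # Var{x} = 4
```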
99
SOLO Review of Probability
Chi-square Distribution

Probability Density Function
p(x; k) = x^(k/2−1) exp(−x/2) / [2^(k/2) Γ(k/2)]   for x ≥ 0;   0 for x < 0

Cumulative Distribution Function
P(x; k) = γ(k/2, x/2) / Γ(k/2)   for x ≥ 0;   0 for x < 0

Mean Value      E(x) = k
Variance        Var(x) = 2 k
Moment Generating Function
Φ_X(ω) = E[exp(j ω x)] = (1 − 2 j ω)^(−k/2)

Γ is the gamma function              Γ(a) = ∫₀^∞ t^(a−1) exp(−t) dt
γ is the incomplete gamma function   γ(a, x) = ∫₀^x t^(a−1) exp(−t) dt

Distributions
examples
100
SOLO Review of Probability
Derivation of Chi and Chi-square Distributions

Given k normal, independent random variables X1, X2,…,Xk with zero mean values and
the same variance σ², their joint density is given by

p(x₁,…,x_k) = Π_{i=1}^k (2πσ²)^(−1/2) exp[−x_i²/(2σ²)]
            = (2πσ²)^(−k/2) exp[−(x₁² + … + x_k²)/(2σ²)]

Define
Chi-square:   y = χ_k² := x₁² + … + x_k² ≥ 0
Chi:          χ_k := (x₁² + … + x_k²)^(1/2) ≥ 0

Compute
p_{Χk}(χ_k) dχ = Pr{ χ_k ≤ (x₁² + … + x_k²)^(1/2) ≤ χ_k + dχ }

The region in χ_k space, where p_{Χk}(χ_k) is constant, is a hyper-shell of volume
dV = A χ^(k−1) dχ   (A to be defined; e.g. for k = 3, dV = 4π χ² dχ)

p_{Χk}(χ_k) dχ = (2πσ²)^(−k/2) exp[−χ_k²/(2σ²)] A χ_k^(k−1) dχ

p_{Χk}(χ_k) = [A / (2πσ²)^(k/2)] χ_k^(k−1) exp[−χ_k²/(2σ²)]
101
SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 1)

p_{Χk}(χ_k) = [A / (2πσ²)^(k/2)] χ_k^(k−1) exp[−χ_k²/(2σ²)] U(χ_k)

where U is the unit-step function:   U(a) := 1 for a ≥ 0;   0 for a < 0

Chi-square:   y = χ_k² = x₁² + … + x_k² ≥ 0

Changing the variable to y (Function of One Random Variable):
p_Y(y) = p_{Χk}(χ_k) / |d y/d χ_k| = p_{Χk}(√y) / (2 √y)
       = [A / (2 (2π)^(k/2) σ^k)] y^(k/2−1) exp[−y/(2σ²)]   for y ≥ 0;   0 for y < 0

A is determined from the condition ∫_{−∞}^∞ p_Y(y) dy = 1:
∫_{−∞}^∞ p_Y(y) dy = [A / (2 (2π)^(k/2) σ^k)] ∫₀^∞ y^(k/2−1) exp[−y/(2σ²)] dy
                   = [A / (2 π^(k/2))] Γ(k/2) = 1   →   A = 2 π^(k/2) / Γ(k/2)

Chi-square density:
p_Y(y; k, σ) = [1 / (2^(k/2) σ^k Γ(k/2))] y^(k/2−1) exp[−y/(2σ²)] U(y)

Chi density:
p_{Χk}(χ_k) = [1 / (2^(k/2−1) σ^k Γ(k/2))] χ_k^(k−1) exp[−χ_k²/(2σ²)] U(χ_k)

Γ is the gamma function   Γ(a) = ∫₀^∞ t^(a−1) exp(−t) dt

Function of
One Random
Variable
102
SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 2)

Chi-square:   y = χ_k² = x₁² + … + x_k² ≥ 0

Mean Value
E{χ_k²} = E{x₁²} + … + E{x_k²} = k σ²

where the x_i are Gaussian with E{x_i} = 0, E{x_i²} = σ² and, by the 4th moment of a
Gauss Distribution, E{(x_i − 0)⁴} = 3 σ⁴,   i = 1,…,k.

E{(χ_k² − k σ²)²} = E{(Σ_i x_i²)²} − k² σ⁴
                 = Σ_i E{x_i⁴} + Σ_{i≠j} E{x_i²} E{x_j²} − k² σ⁴
                 = 3 k σ⁴ + k (k−1) σ⁴ − k² σ⁴ = 2 k σ⁴

Variance
Var{χ_k²} = E{(χ_k² − k σ²)²} = 2 k σ⁴

Table of Content
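The results E{χ_k²} = kσ² and Var{χ_k²} = 2kσ⁴ can be verified directly by summing squared Gaussian samples (a sketch, not part of the original slides; seed and tolerances arbitrary):

```python
import numpy as np

# chi_k^2 = x1^2 + ... + xk^2 with xi ~ N(0, sigma^2), i.i.d.
rng = np.random.default_rng(4)
k, sigma = 5, 1.3
x = rng.normal(0.0, sigma, size=(300_000, k))
chi2 = (x ** 2).sum(axis=1)

mean_mc, var_mc = chi2.mean(), chi2.var()
assert abs(mean_mc - k * sigma ** 2) < 0.1          # E = k*sigma^2
assert abs(var_mc - 2 * k * sigma ** 4) < 0.8       # Var = 2k*sigma^4
```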
103
SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 3)

Tail probabilities of the chi-square and normal densities.
The Table presents the points on the chi-square
distribution for a given upper tail probability
Q = Pr{ y > x }
where y = χ_n² and n is the number of degrees
of freedom. This tabulated function is also
known as the complementary distribution.
An alternative way of writing the previous
equation is:   1 − Q = Pr{ χ_n² ≤ x(1−Q) }
which indicates that at the left of the point x
the probability mass is 1 – Q. This is the
100 (1 – Q) percentile point.

Examples
1. The 95 % probability region for a χ₂² variable
can be taken as the one-sided probability
region (cutting off the 5% upper tail):   [0, χ₂²(0.95)] = [0, 5.99]
2. Or the two-sided probability region (cutting off both 2.5% tails):
[χ₂²(0.025), χ₂²(0.975)] = [0.05, 7.38]
3. For a χ₁₀₀² variable, the two-sided 95% probability region (cutting off both 2.5% tails) is:
[χ₁₀₀²(0.025), χ₁₀₀²(0.975)] = [74, 130]
104
SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 4)

Note the skewness of the chi-square
distribution: the above two-sided regions are
not symmetric about the corresponding means
E{χ_n²} = n

Tail probabilities of the chi-square and normal densities.
For degrees of freedom above 100, the
following approximation of the points on the
chi-square distribution can be used:
χ_n²(1−Q) ≈ (1/2) [ G(1−Q) + (2n − 1)^(1/2) ]²
where G( ) is given in the last line of the Table
and shows the point x on the standard (zero
mean and unity variance) Gaussian distribution
for the same tail probabilities.
In the case Pr{y} = N(y; 0, 1) and with
Q = Pr{ y > x }, we have x(1−Q) := G(1−Q).

Table of Content
105
SOLO Review of Probability
Student’s t-Distribution

Probability Density Function
p(x; ν) = Γ[(ν+1)/2] / [ (νπ)^(1/2) Γ(ν/2) ] · (1 + x²/ν)^(−(ν+1)/2)

Cumulative Distribution Function
P(x; ν) = 1/2 + x Γ[(ν+1)/2] / [ (νπ)^(1/2) Γ(ν/2) ]
          · Σ_{n=0}^∞ [ (1/2)^(n) ((ν+1)/2)^(n) / ( (3/2)^(n) n! ) ] (−x²/ν)^n
where (a)^(n) := a (a+1) (a+2) … (a+n−1)

Mean Value
E(x) = 0 for ν > 1;   undefined for ν = 1

Variance
Var(x) = ν/(ν−2) for ν > 2;   ∞ otherwise

Moment Generating Function not defined

Γ is the gamma function   Γ(a) = ∫₀^∞ t^(a−1) exp(−t) dt

It got its name from W. S. Gosset, who wrote
under the pseudonym “Student”.
William Sealy Gosset
1876 - 1937

Distributions
examples
Table of Content
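A sampling check of the quoted mean and variance (a sketch, not part of the original slides; seed and tolerances arbitrary):

```python
import numpy as np

# Student's t with nu = 10 degrees of freedom: E{x} = 0, Var{x} = nu/(nu-2).
rng = np.random.default_rng(5)
nu = 10.0
x = rng.standard_t(nu, size=400_000)

mean_mc, var_mc = x.mean(), x.var()
assert abs(mean_mc) < 0.02                      # E{x} = 0 for nu > 1
assert abs(var_mc - nu / (nu - 2.0)) < 0.05     # Var = 10/8 = 1.25
```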
106
SOLO Review of Probability
Uniform Distribution (Continuous)

Probability Density Function
p(x; a, b) = 1/(b−a) for a ≤ x ≤ b;   0 for x < a or x > b

Cumulative Distribution Function
P(x; a, b) = 0 for x < a;   (x−a)/(b−a) for a ≤ x ≤ b;   1 for x > b

Mean Value      E(x) = (a+b)/2
Variance        Var(x) = (b−a)²/12
Moment Generating Function
Φ_X(ω) = E[exp(j ω x)] = [exp(j ω b) − exp(j ω a)] / [j ω (b−a)]

Distributions
examples
Moments
Table of Content
107
SOLO Review of Probability
Rayleigh Distribution

John William Strutt
Lord Rayleigh
(1842-1919)

Probability Density Function
p(x; σ) = (x/σ²) exp[−x²/(2σ²)],   x ≥ 0

Cumulative Distribution Function
P(x; σ) = 1 − exp[−x²/(2σ²)]

Mean Value      E(x) = σ (π/2)^(1/2)
Variance        Var(x) = [(4−π)/2] σ²
Moment Generating Function
Φ(ω) = 1 − σω exp(−σ²ω²/2) (π/2)^(1/2) [ erfi(σω/√2) − j ]

The Rayleigh Distribution is the chi-distribution with k = 2:
p_{Χk}(χ_k) = [1 / (2^(k/2−1) σ^k Γ(k/2))] χ_k^(k−1) exp[−χ_k²/(2σ²)] U(χ_k)

Distributions
examples
Moments
108
SOLO Review of Probability
Rayleigh Distribution

Example of Rayleigh Distribution
Given X and Y, two independent Gaussian random variables, with zero means and the
same variance σ²:
p_XY(x, y) = [1/(2πσ²)] exp[−(x² + y²)/(2σ²)]
find the distributions of R and Θ given by:   R = (X² + Y²)^(1/2)   &   Θ = tan⁻¹(Y/X)

Solution
With x = r cos θ, y = r sin θ and dx dy = r dr dθ:
p_RΘ(r, θ) dr dθ = p_XY(x, y) dx dy = [1/(2πσ²)] exp[−(x² + y²)/(2σ²)] dx dy
                 = [r/(2πσ²)] exp[−r²/(2σ²)] dr dθ = p_R(r) dr · p_Θ(θ) dθ
where:
p_Θ(θ) = 1/(2π),   0 ≤ θ ≤ 2π                  Uniform Distribution
p_R(r) = (r/σ²) exp[−r²/(2σ²)],   r ≥ 0        Rayleigh Distribution

Table of Content
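The example above can be reproduced numerically: the envelope of two i.i.d. zero-mean Gaussians should show the Rayleigh mean and variance (a sketch, not part of the original slides; seed and tolerances arbitrary):

```python
import numpy as np

# R = sqrt(X^2 + Y^2) with X, Y ~ N(0, sigma^2), independent.
rng = np.random.default_rng(6)
sigma = 2.0
x = rng.normal(0.0, sigma, size=300_000)
y = rng.normal(0.0, sigma, size=300_000)
r = np.hypot(x, y)

mean_mc, var_mc = r.mean(), r.var()
assert abs(mean_mc - sigma * np.sqrt(np.pi / 2)) < 0.02   # sigma*sqrt(pi/2)
assert abs(var_mc - (4 - np.pi) / 2 * sigma ** 2) < 0.03  # (4-pi)/2 * sigma^2
```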
109
SOLO Review of Probability
Rice Distribution

Stephen O. Rice
1907 - 1986

Probability Density Function
p(x; v, σ) = (x/σ²) exp[−(x² + v²)/(2σ²)] I₀(x v/σ²)
where:
I₀(x v/σ²) = (1/2π) ∫₀^{2π} exp[(x v/σ²) cos φ′] dφ′
is the zero-order modified Bessel function of the first kind.

Cumulative Distribution Function
P(x; v, σ) = 1 − Q₁(v/σ, x/σ)   (Q₁ is the Marcum Q-function)

Mean Value      E(x) = σ (π/2)^(1/2) L_{1/2}(−v²/(2σ²))
Variance        Var(x) = 2σ² + v² − (πσ²/2) L²_{1/2}(−v²/(2σ²))
(L_{1/2} is a Laguerre function; for v = 0 these reduce to the Rayleigh values.)

Moment Generating Function: no simple closed form.

Distributions
examples
110
SOLO Review of Probability
Rice Distribution

Example of Rice Distribution
The Rice Distribution applies to the statistics of the envelope of the output of a bandpass
filter consisting of signal plus noise:

s(t) + n(t) = A cos(ω₀ t + φ) + n_C(t) cos(ω₀ t) − n_S(t) sin(ω₀ t)
            = [A cos φ + n_C(t)] cos(ω₀ t) − [A sin φ + n_S(t)] sin(ω₀ t)

X = n_C(t) and Y = n_S(t) are Gaussian random variables, with zero mean and the same
variance σ², and φ is the unknown but constant signal phase.

Define the output envelope R and phase Θ:
R = { [n_C(t) + A cos φ]² + [n_S(t) + A sin φ]² }^(1/2)
Θ = tan⁻¹{ [n_S(t) + A sin φ] / [n_C(t) + A cos φ] }

Solution
With x = r cos θ − A cos φ, y = r sin θ − A sin φ and dx dy = r dr dθ:
p_RΘ(r, θ) dr dθ = p_XY(x, y) dx dy = [1/(2πσ²)] exp[−(x² + y²)/(2σ²)] dx dy
                 = [r/(2πσ²)] exp[−(r² + A² − 2 A r cos(θ − φ))/(2σ²)] dr dθ

p_R(r) = ∫₀^{2π} p_RΘ(r, θ) dθ
       = (r/σ²) exp[−(r² + A²)/(2σ²)] · (1/2π) ∫₀^{2π} exp[(A r/σ²) cos(φ − θ)] dθ
111
SOLO Review of Probability
Rice Distribution

Example of Rice Distribution (continue – 1)
p_R(r) = (r/σ²) exp[−(r² + A²)/(2σ²)] · (1/2π) ∫₀^{2π} exp[(A r/σ²) cos φ′] dφ′
where:
I₀(A r/σ²) = (1/2π) ∫₀^{2π} exp[(A r/σ²) cos φ′] dφ′
is the zero-order modified Bessel function of the first kind.

p_R(r; A, σ) = (r/σ²) exp[−(r² + A²)/(2σ²)] I₀(A r/σ²)        Rice Distribution

Since I₀(0) = 1, if in the Rice Distribution we take A = 0 we obtain:
p_R(r; 0, σ) = (r/σ²) exp[−r²/(2σ²)]        Rayleigh Distribution

Table of Content
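The envelope construction above can be checked by simulation; E{R²} = A² + 2σ² is a closed-form moment of the Rice distribution, and A = 0 reduces to the Rayleigh case (a sketch, not part of the original slides; seed and tolerances arbitrary):

```python
import numpy as np

# Envelope of signal plus narrowband noise, as in the example above.
rng = np.random.default_rng(7)
A, sigma, phi = 3.0, 1.0, 0.7
n_c = rng.normal(0.0, sigma, size=300_000)
n_s = rng.normal(0.0, sigma, size=300_000)
r = np.hypot(A * np.cos(phi) + n_c, A * np.sin(phi) + n_s)
m2 = (r ** 2).mean()
assert abs(m2 - (A ** 2 + 2 * sigma ** 2)) < 0.1     # E{R^2} = A^2 + 2 sigma^2

# A = 0 gives the Rayleigh envelope: E{R} = sigma * sqrt(pi/2).
r0_mean = np.hypot(n_c, n_s).mean()
assert abs(r0_mean - sigma * np.sqrt(np.pi / 2)) < 0.01
```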
112
SOLO Review of Probability
Weibull Distribution

Ernst Hjalmar
Waloddi Weibull
1887 - 1979

Probability Density Function
p(x; γ, μ, α) = (γ/α) [(x−μ)/α]^(γ−1) exp{ −[(x−μ)/α]^γ }   for x ≥ μ;   0 for x < μ
(γ, α > 0)

Cumulative Distribution Function
P(x; γ, μ, α) = ∫_{−∞}^x p(x; γ, μ, α) dx = 1 − exp{ −[(x−μ)/α]^γ }

Mean Value (for μ = 0)
E(x) = α Γ(1 + 1/γ)
Variance (for μ = 0)
Var(x) = α² [ Γ(1 + 2/γ) − Γ²(1 + 1/γ) ]

Γ is the gamma function   Γ(a) = ∫₀^∞ t^(a−1) exp(−t) dt

Distributions
examples
Table of Content
113
KINETIC THEORY OF GASES    SOLO
MAXWELL’S VELOCITY DISTRIBUTION
IN 1859 MAXWELL PROPOSED THE FOLLOWING MODEL:
ASSUME THAT THE VELOCITY COMPONENTS OF N
MOLECULES, ENCLOSED IN A CUBE WITH SIDE l, ALONG EACH
OF THE THREE COORDINATE AXES ARE INDEPENDENTLY AND
IDENTICALLY DISTRIBUTED ACCORDING TO THE DENSITY f0(α)
= f0(-α), I.E.,
JAMES CLERK
MAXWELL
(1831 – 1879)

f₀(v⃗) d³v = f₀(v_x − v₀x) f₀(v_y − v₀y) f₀(v_z − v₀z) dv_x dv_y dv_z
           = A exp[−B (v⃗ − v⃗₀)·(v⃗ − v⃗₀)] dv_x dv_y dv_z

f(v_i) d v_i = THE PROBABILITY THAT THE i VELOCITY
COMPONENT IS BETWEEN v_i AND v_i + d v_i ;   i = x, y, z
MAXWELL ASSUMED THAT THE DISTRIBUTION DEPENDS
ONLY ON THE MAGNITUDE OF THE VELOCITY.
114
KINETIC THEORY OF GASES    SOLO
MAXWELL’s VELOCITY DISTRIBUTION (CONTINUE)
LET US FIND THE CONSTANTS A, B AND v⃗₀ IN   f₀(v⃗) = A exp[−B (v⃗ − v⃗₀)²]
SINCE THE DEFINITION OF THE TOTAL NUMBER OF PARTICLES N IS:
N = ∫ d³r ∫ d³v f(r⃗, v⃗, t)
WE HAVE IN EQUILIBRIUM (TAKING v⃗₀ = 0 FOR THE MOMENT):
N/V = ∫ d³v f₀(v⃗) = A ∫∫∫ exp[−B (v_x² + v_y² + v_z²)] dv_x dv_y dv_z
    = A [ ∫_{−∞}^∞ exp(−B v²) dv ]³ = A (π/B)^(3/2)
WHERE V IS THE VOLUME OF THE CONTAINER:   V = ∫ d³r
IT FOLLOWS THAT B > 0 AND
A = (N/V) (B/π)^(3/2)
115
KINETIC THEORY OF GASES    SOLO
MAXWELL’s VELOCITY DISTRIBUTION (CONTINUE)
LET US FIND THE CONSTANTS A, B AND v⃗₀ IN   f₀(v⃗) = A exp[−B (v⃗ − v⃗₀)²]
THE AVERAGE VELOCITY IS GIVEN BY:
⟨v⃗⟩ = ∫ d³v v⃗ f₀(v⃗) / ∫ d³v f₀(v⃗) = (VA/N) ∫ d³v v⃗ exp[−B (v⃗ − v⃗₀)²]
    = (VA/N) ∫ d³v (v⃗ − v⃗₀ + v⃗₀) exp[−B (v⃗ − v⃗₀)²] = 0 + v⃗₀ = v⃗₀
THE AVERAGE KINETIC ENERGY OF THE MOLECULES ε, WHEN v⃗₀ = 0, IS
ε = ∫ d³v (m v²/2) f₀(v⃗) / ∫ d³v f₀(v⃗) = (VAm/(2N)) ∫ d³v v² exp(−B v²) = 3m/(4B)
WE FOUND ALSO THAT FOR A MONOATOMIC GAS   ε = (3/2) k T
THEREFORE
B = 3m/(4ε) = m/(2kT)
A = (N/V) (B/π)^(3/2) = (N/V) [m/(2πkT)]^(3/2)
116
KINETIC THEORY OF GASES    SOLO
MAXWELL’s VELOCITY DISTRIBUTION (CONTINUE)
MAXWELL’S VELOCITY DISTRIBUTION BECOMES
f₀(v⃗) = (N/V) [m/(2πkT)]^(3/2) exp[−(m/(2kT)) v⃗·v⃗]
OR
f₀(v⃗) d³v = f(v_x) f(v_y) f(v_z) dv_x dv_y dv_z
           = (N/V) [m/(2πkT)]^(3/2) exp[−(m/(2kT)) (v_x² + v_y² + v_z²)] dv_x dv_y dv_z
117
KINETIC THEORY OF GASES    SOLO
MAXWELL’s VELOCITY DISTRIBUTION (CONTINUE)
f₀(v⃗) = (N/V) [m/(2πkT)]^(3/2) exp[−(m/(2kT)) v⃗·v⃗]
MAXWELL’S SPEED DISTRIBUTION IS THE CHI-DISTRIBUTION WITH k = 3
(WITH σ² = kT/m):
p_{Χk}(χ_k) = [1 / (2^(k/2−1) σ^k Γ(k/2))] χ_k^(k−1) exp[−χ_k²/(2σ²)] U(χ_k)
Table of Content
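The chi-with-k=3 identification can be checked by sampling the three velocity components (a sketch, not part of the original slides; units chosen so σ = √(kT/m) is a plain number, seed and tolerances arbitrary):

```python
import numpy as np

# Speed = |v| with three i.i.d. N(0, sigma^2) components, sigma = sqrt(kT/m).
rng = np.random.default_rng(8)
sigma = 1.2
v = rng.normal(0.0, sigma, size=(300_000, 3))
speed = np.linalg.norm(v, axis=1)            # chi distribution, k = 3

# Known chi(k=3) moments: E{|v|} = 2*sigma*sqrt(2/pi), E{|v|^2} = 3*sigma^2.
mean_mc = speed.mean()
m2_mc = (speed ** 2).mean()
assert abs(mean_mc - 2 * sigma * np.sqrt(2 / np.pi)) < 0.01
assert abs(m2_mc - 3 * sigma ** 2) < 0.05
```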
118
KINETIC THEORY OF GASES    SOLO
MOLECULAR MODELS

BOLTZMANN STATISTICS                         LUDWIG BOLTZMANN
• DISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF
PARTICLES PER QUANTUM STATE.
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w = N! Π_j ( g_j^{N_j} / N_j! )

BOSE-EINSTEIN STATISTICS                     SATYENDRANATH N. BOSE   ALBERT EINSTEIN
• INDISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF
PARTICLES PER QUANTUM STATE.
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w = Π_j (g_j + N_j − 1)! / [ (g_j − 1)! N_j! ]

FERMI-DIRAC STATISTICS                       ENRICO FERMI   PAUL A.M. DIRAC
• INDISTINGUISHABLE PARTICLES
• ONE PARTICLE PER QUANTUM STATE.
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w = Π_j g_j! / [ N_j! (g_j − N_j)! ]

CONSTRAINTS:   N = Σ_j N_j     E = Σ_j ε′_j N_j
Table of Content
119
KINETIC THEORY OF GASES    SOLO
MOLECULAR MODELS
BOLTZMANN STATISTICS                         LUDWIG BOLTZMANN
• DISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF
PARTICLES PER QUANTUM STATE.
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g1, g2, …, gj AT THE ENERGY LEVELS ε′1, ε′2, …, ε′j
- NUMBER OF PARTICLES N1, N2, …, Nj IN STATES g1, g2, …, gj
NUMBER OF WAYS N DISTINGUISHABLE
PARTICLES CAN BE DIVIDED IN GROUPS
WITH N1, N2, …, Nj, … PARTICLES IS:   N! / Π_j N_j!     ( N = Σ_j N_j )
NUMBER OF WAYS Nj PARTICLES CAN BE PLACED IN THE gj STATES IS:   g_j^{N_j}
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w|Boltz = N! Π_j ( g_j^{N_j} / N_j! )
120
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
BOLTZMANN STATISTICS     w|Boltz = N! Π_j ( g_j^{N_j} / N_j! )
USING STIRLING’S FORMULA   ln(a!) ≈ a ln a − a
ln w = ln N! + Σ_j ( N_j ln g_j − ln N_j! ) ≈ N ln N + Σ_j ( N_j ln g_j − N_j ln N_j )
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST
COMPUTE THE DIFFERENTIAL
d(ln w) = Σ_j ( ln g_j − ln N_j ) d N_j = 0
CONSTRAINED BY:
N = Σ_j N_j   ⇒   d N = Σ_j d N_j = 0
E = Σ_j ε′_j N_j   ⇒   d E = Σ_j ε′_j d N_j = 0
121
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
BOLTZMANN STATISTICS (CONTINUE)     w|Boltz = N! Π_j ( g_j^{N_j} / N_j! )
WE OBTAINED
d(ln w) = −Σ_j ln( N_j / g_j ) d N_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS α, β:
α Σ_j d N_j = 0     β Σ_j ε′_j d N_j = 0
TO OBTAIN
Σ_j [ ln( N_j / g_j ) + α + β ε′_j ] d N_j = 0   ⇒   ln( N_j* / g_j ) + α + β ε′_j = 0
OR
N_j*|Boltz = g_j e^{−α} e^{−β ε′_j}
BOLTZMANN
MOST PROBABLE MACROSTATE
Table of Content
122
KINETIC THEORY OF GASES    SOLO
MOLECULAR MODELS
BOSE-EINSTEIN STATISTICS
• INDISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF
PARTICLES PER QUANTUM STATE.
SATYENDRANATH N. BOSE (1894-1974)    ALBERT EINSTEIN (1879-1955)
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g1, g2, …, gj AT THE ENERGY LEVELS ε′1, ε′2, …, ε′j
- NUMBER OF PARTICLES N1, N2, …, Nj IN STATES g1, g2, …, gj
NUMBER OF WAYS Nj INDISTINGUISHABLE PARTICLES CAN BE PLACED
IN THE gj STATES IS:   (g_j + N_j − 1)! / [ (g_j − 1)! N_j! ]     ( N = Σ_j N_j )
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w|B-E = Π_j (g_j + N_j − 1)! / [ (g_j − 1)! N_j! ]
123
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS (CONTINUE)
w|B-E = Π_j (g_j + N_j − 1)! / [ (g_j − 1)! N_j! ] ≈ Π_j (g_j + N_j)! / ( g_j! N_j! )
USING STIRLING’S FORMULA   ln(a!) ≈ a ln a − a
ln w ≈ Σ_j [ (g_j + N_j) ln(g_j + N_j) − g_j ln g_j − N_j ln N_j ]
     = Σ_j [ N_j ln( g_j/N_j + 1 ) + g_j ln( 1 + N_j/g_j ) ]
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST
COMPUTE THE DIFFERENTIAL
d(ln w) = Σ_j ln( g_j/N_j + 1 ) d N_j = 0
124
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS (CONTINUE)
w|B-E ≈ Π_j (g_j + N_j)! / ( g_j! N_j! )
WE OBTAINED
d(ln w) = Σ_j ln( g_j/N_j + 1 ) d N_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS α, β:
α Σ_j d N_j = 0     β Σ_j ε′_j d N_j = 0
TO OBTAIN
Σ_j [ ln( g_j/N_j + 1 ) − α − β ε′_j ] d N_j = 0   ⇒   ln( g_j/N_j* + 1 ) − α − β ε′_j = 0
OR
N_j*|B-E = g_j / ( e^{α} e^{β ε′_j} − 1 )
BOSE-EINSTEIN
MOST PROBABLE MACROSTATE
Table of Content
125
KINETIC THEORY OF GASES    SOLO
MOLECULAR MODELS
FERMI-DIRAC STATISTICS
• INDISTINGUISHABLE PARTICLES
• ONE PARTICLE PER QUANTUM STATE.
ENRICO FERMI (1901-1954)    PAUL A.M. DIRAC (1902-1984)
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g1, g2, …, gj AT THE ENERGY LEVELS ε′1, ε′2, …, ε′j
- NUMBER OF PARTICLES N1, N2, …, Nj IN STATES g1, g2, …, gj
NUMBER OF WAYS Nj INDISTINGUISHABLE PARTICLES CAN BE PLACED
IN THE gj STATES IS:   g_j! / [ N_j! (g_j − N_j)! ]     ( N = Σ_j N_j )
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w|F-D = Π_j g_j! / [ N_j! (g_j − N_j)! ]
126
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
FERMI-DIRAC STATISTICS (CONTINUE)
w|F-D = Π_j g_j! / [ N_j! (g_j − N_j)! ]
USING STIRLING’S FORMULA   ln(a!) ≈ a ln a − a
ln w = Σ_j [ ln g_j! − ln N_j! − ln(g_j − N_j)! ]
     ≈ Σ_j [ g_j ln g_j − N_j ln N_j − (g_j − N_j) ln(g_j − N_j) ]
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST
COMPUTE THE DIFFERENTIAL
d(ln w) = Σ_j [ ln(g_j − N_j) − ln N_j ] d N_j = Σ_j ln[ (g_j − N_j)/N_j ] d N_j = 0
127
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
FERMI-DIRAC STATISTICS (CONTINUE)
w|F-D = Π_j g_j! / [ N_j! (g_j − N_j)! ]
WE OBTAINED
d(ln w) = Σ_j ln[ (g_j − N_j)/N_j ] d N_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS α, β:
α Σ_j d N_j = 0     β Σ_j ε′_j d N_j = 0
TO OBTAIN
Σ_j { ln[ (g_j − N_j)/N_j ] − α − β ε′_j } d N_j = 0   ⇒   ln[ (g_j − N_j*)/N_j* ] − α − β ε′_j = 0
OR
N_j*|F-D = g_j / ( e^{α} e^{β ε′_j} + 1 )
FERMI-DIRAC
MOST PROBABLE MACROSTATE
128
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS    w|B-E = Π_j (g_j + N_j − 1)! / [ (g_j − 1)! N_j! ]
FERMI-DIRAC STATISTICS      w|F-D = Π_j g_j! / [ N_j! (g_j − N_j)! ]
BOLTZMANN STATISTICS        w|Boltz = N! Π_j ( g_j^{N_j} / N_j! )
FOR GASES AT LOW PRESSURES OR HIGH TEMPERATURE THE NUMBER
OF QUANTUM STATES gj AVAILABLE AT ANY LEVEL IS MUCH LARGER
THAN THE NUMBER OF PARTICLES IN THAT LEVEL Nj:   g_j >> N_j
(g_j + N_j − 1)! / (g_j − 1)! = g_j (g_j + 1) … (g_j + N_j − 1)  ≈  g_j^{N_j}   for g_j >> N_j
g_j! / (g_j − N_j)! = g_j (g_j − 1) … (g_j − N_j + 1)  ≈  g_j^{N_j}            for g_j >> N_j
THEREFORE
w|B-E ≈ w|F-D ≈ Π_j ( g_j^{N_j} / N_j! ) = w|Boltz / N!
AND
N_j*|B-E ≈ N_j*|F-D ≈ N_j*|Boltz = g_j e^{−α} e^{−β ε′_j}
129
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
w|B-E ≈ w|F-D ≈ Π_j ( g_j^{N_j} / N_j! ) = w|Boltz / N!
AND
N_j*|B-E ≈ N_j*|F-D ≈ N_j*|Boltz = g_j e^{−α} e^{−β ε′_j}
DIVIDING THE VALUE OF w FOR BOLTZMANN STATISTICS, WHICH
ASSUMED DISTINGUISHABLE PARTICLES, BY N! HAS THE EFFECT OF
DISCOUNTING THE DISTINGUISHABILITY OF THE N PARTICLES.
Table of Content
130
SOLO Review of Probability
Monte Carlo Method
Monte Carlo methods are a class of computational algorithms that
rely on repeated random sampling to compute their results. Monte
Carlo methods are often used when simulating physical and
mathematical systems. Because of their reliance on repeated
computation and random or pseudo-random numbers, Monte Carlo
methods are most suited to calculation by a computer. Monte Carlo
methods tend to be used when it is infeasible or impossible to
compute an exact result with a deterministic algorithm.
The term Monte Carlo method was coined in the 1940s by physicists Stanislaw Ulam,
Enrico Fermi, John von Neumann, and Nicholas Metropolis, working on nuclear
weapon projects in the Los Alamos National Laboratory (reference to the Monte Carlo
Casino in Monaco where Ulam's uncle would borrow money to gamble)
Stanislaw Ulam
1909 - 1984
Enrico Fermi
1901 - 1954
John von Neumann
1903 - 1957
Monte Carlo Casino
Nicholas Constantine Metropolis
(1915 –1999)
131
SOLO Review of Probability
Monte Carlo Approximation
Monte Carlo runs generate a set of random samples that approximate the distribution p (x):
x^(L) ~ p (x)   (the x^(L) are generated (drawn) samples from the distribution p (x))
So, with P samples, expectations with respect to the distribution are approximated by
∫ f(x) p(x) dx ≈ (1/P) Σ_{L=1}^P f(x^(L))
and, in the usual way for Monte Carlo, this gives all the moments etc. of the distribution
up to some degree of approximation:
μ₁ = E{x} = ∫ x p(x) dx ≈ (1/P) Σ_{L=1}^P x^(L)
μ_n = E{(x − μ₁)^n} = ∫ (x − μ₁)^n p(x) dx ≈ (1/P) Σ_{L=1}^P (x^(L) − μ₁)^n
Table of Content
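The sample-average approximation above can be sketched directly (not part of the original slides; the target density, seed, and tolerances below are arbitrary choices):

```python
import numpy as np

# Draw P samples x^(L) ~ p(x) and approximate the first two moments
# by sample averages, as on the slide.
rng = np.random.default_rng(9)
P = 200_000
x = rng.normal(2.0, 0.5, size=P)       # p(x) = N(2, 0.25), chosen arbitrarily

mu1 = x.mean()                         # (1/P) * sum of x^(L)
mu2 = ((x - mu1) ** 2).mean()          # second central moment
assert abs(mu1 - 2.0) < 0.01
assert abs(mu2 - 0.25) < 0.01
```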
132
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (Unknown Statistics)
A random variable, x, may take on any values in the range − ∞ to + ∞.
Based on a sample of k values, x_i, i = 1,2,…,k, we wish to compute the sample mean, m̂_k,
and sample variance, σ̂_k², as estimates of the population mean, m, and variance, σ².
Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples:
E{x_i} = E{x_j} = m,   E{x_i²} = E{x_j²} = σ² + m²   ∀ i, j
E{x_i x_j} = E{x_i} E{x_j} = m²   ∀ i ≠ j   (x_i, x_j independent)
Define the estimation of the population mean:
m̂_k := (1/k) Σ_{i=1}^k x_i
Compute
E{m̂_k} = (1/k) Σ_{i=1}^k E{x_i} = m        Unbiased
E{ (1/k) Σ_{i=1}^k (x_i − m̂_k)² } = E{ (1/k) Σ_{i=1}^k [(x_i − m) − (m̂_k − m)]² }
  = (1/k) Σ_{i=1}^k E{(x_i − m)²} − E{(m̂_k − m)²} = σ² − σ²/k = [(k−1)/k] σ²   Biased
133
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 1)
We found
E{m̂_k} = m        Unbiased
E{ (1/k) Σ_{i=1}^k (x_i − m̂_k)² } = [(k−1)/k] σ²        Biased
Therefore, the unbiased estimation of the sample variance of the population is defined as:
σ̂_k² := [1/(k−1)] Σ_{i=1}^k (x_i − m̂_k)²
since
E{σ̂_k²} = E{ [1/(k−1)] Σ_{i=1}^k (x_i − m̂_k)² } = σ²        Unbiased
Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
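The bias of the 1/k estimator and the unbiasedness of the 1/(k−1) estimator can be seen by averaging both over many repeated samples (a sketch, not part of the original slides; seed and tolerances arbitrary):

```python
import numpy as np

# Many independent samples of size k from N(0, 1); average each variance
# estimator over the trials to approximate its expectation.
rng = np.random.default_rng(10)
k, trials, sigma2 = 5, 200_000, 1.0
x = rng.normal(0.0, 1.0, size=(trials, k))

biased = x.var(axis=1, ddof=0).mean()      # divides by k
unbiased = x.var(axis=1, ddof=1).mean()    # divides by k-1
assert abs(biased - (k - 1) / k * sigma2) < 0.01     # E = (k-1)/k * sigma^2
assert abs(unbiased - sigma2) < 0.01                 # E = sigma^2
```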
134
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 2)
A random variable, x, may take on any values in the range − ∞ to + ∞.
Based on a sample of k values, x_i, i = 1,2,…,k, we wish to compute the sample mean, m̂_k,
and sample variance, σ̂_k², as estimates of the population mean, m, and variance, σ².
E{m̂_k} = E{ (1/k) Σ_{i=1}^k x_i } = m
E{σ̂_k²} = E{ [1/(k−1)] Σ_{i=1}^k (x_i − m̂_k)² } = σ²
Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
135
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 3)
We found:
E{m̂_k} = m        E{σ̂_k²} = σ²
Let us compute the variance of the mean estimate:
σ²_{m̂k} := E{(m̂_k − m)²} = E{ [ (1/k) Σ_{i=1}^k (x_i − m) ]² }
  = (1/k²) Σ_{i=1}^k E{(x_i − m)²} + (1/k²) Σ_{i=1}^k Σ_{j≠i} E{(x_i − m)(x_j − m)}
  = (1/k²) k σ² + 0 = σ²/k
σ²_{m̂k} := E{(m̂_k − m)²} = σ²/k
136
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)
Let us compute the variance of the variance estimate:
σ²_{σ̂k²} := E{(σ̂_k² − σ²)²}
Using the identity  Σ_{i=1}^k (x_i − m̂_k)² = Σ_{i=1}^k (x_i − m)² − k (m̂_k − m)²,
σ̂_k² − σ² = [1/(k−1)] Σ_{i=1}^k (x_i − m)² − [k/(k−1)] (m̂_k − m)² − σ²
Since (x_i − m), (x_j − m) and (m̂_k − m) are (approximately) independent for i ≠ j,
squaring, taking expectations term by term and using E{(x_i − m)⁴} = μ₄ gives
σ²_{σ̂k²} = (1/k) [ μ₄ − σ⁴ (k−3)/(k−1) ]
where μ₄ := E{(x_i − m)⁴}.
137
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)
Since (x_i − m), (x_j − m) and (m̂_k − m) are (approximately) independent for i ≠ j,
carrying out the expansion term by term yields
σ²_{σ̂k²} = (1/k) [ μ₄ − σ⁴ (k−3)/(k−1) ]
and, for large k,
σ²_{σ̂k²} ≈ (μ₄ − σ⁴)/k,      μ₄ := E{(x_i − m)⁴}
138
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 5)
We found:
E{m̂_k} = m        E{σ̂_k²} = σ²
σ²_{m̂k} := E{(m̂_k − m)²} = σ²/k
σ²_{σ̂k²} := E{(σ̂_k² − σ²)²} ≈ (μ₄ − σ⁴)/k,      μ₄ := E{(x_i − m)⁴}
Define the Kurtosis of the random variable x_i:
λ := μ₄/σ⁴
Then
σ²_{σ̂k²} ≈ (λ − 1) σ⁴ / k
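The approximation σ²_{σ̂k²} ≈ (λ−1)σ⁴/k can be checked empirically by computing the sample variance of many size-k samples and comparing its spread with the prediction (a sketch, not part of the original slides; Gaussian samples, so λ = 3, seed and tolerances arbitrary):

```python
import numpy as np

# Variance of the sample-variance estimator, compared with (lambda-1)*sigma^4/k.
rng = np.random.default_rng(11)
k, trials = 50, 100_000
x = rng.normal(0.0, 1.0, size=(trials, k))    # sigma = 1, kurtosis lambda = 3

s2 = x.var(axis=1, ddof=1)                    # unbiased sample variance per trial
lam, sigma4 = 3.0, 1.0
predicted = (lam - 1.0) * sigma4 / k          # = 0.04
assert abs(s2.var() - predicted) < 0.005
```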
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 6)
For high values of k, according to the Central Limit Theorem, the estimations of the mean
m̂_k and of the variance σ̂_k² are approximately Gaussian Random Variables:
(m̂_k − m) ~ N(0, σ²/k)   &   (σ̂_k² − σ²) ~ N(0, (λ−1)σ⁴/k)
We want to find a region around σ̂_k² that
will contain σ² with a predefined probability
φ as a function of the number of iterations k:
Prob[ 0 ≤ |σ̂_k² − σ²| ≤ n_σ σ_{σ̂k²} ] = φ
Since the σ̂_k² are approximately Gaussian Random
Variables, n_σ is given by solving:
(1/√(2π)) ∫_{−n_σ}^{+n_σ} exp(−ζ²/2) dζ = φ

Cumulative Probability within n_σ
Standard Deviations of the Mean for a
Gaussian Random Variable
n_σ       φ
1.000     0.6827
1.645     0.9000
1.960     0.9500
2.576     0.9900

This gives
σ̂_k² − n_σ √((λ−1)/k) σ² ≤ σ² ≤ σ̂_k² + n_σ √((λ−1)/k) σ²
or
[1 − n_σ √((λ−1)/k)] σ² ≤ σ̂_k² ≤ [1 + n_σ √((λ−1)/k)] σ²
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 7)
Prob[ 0 ≤ |σ̂_k² − σ²| ≤ n_σ σ_{σ̂k²} ] = φ,      σ_{σ̂k²} = √((λ−1)/k) σ²
[1 − n_σ √((λ−1)/k)] σ² ≤ σ̂_k² ≤ [1 + n_σ √((λ−1)/k)] σ²
Solving for σ²:
σ̂_k² / [1 + n_σ √((λ−1)/k)] ≤ σ² ≤ σ̂_k² / [1 − n_σ √((λ−1)/k)]
or, in terms of σ:
σ_min := σ̂_k / √(1 + n_σ √((λ−1)/k))  ≤  σ  ≤  σ̂_k / √(1 − n_σ √((λ−1)/k)) =: σ_max
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 8)
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 9)
143
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 10)
Monte-Carlo Procedure
1  Choose the Confidence Level φ and find the corresponding n_σ
using the normal (Gaussian) distribution.
n_σ       φ
1.000     0.6827
1.645     0.9000
1.960     0.9500
2.576     0.9900
2  Run a few samples k₀ > 20 and estimate λ according to
m̂_{k₀} := (1/k₀) Σ_{i=1}^{k₀} x_i
λ̂_{k₀} := [ (1/k₀) Σ_{i=1}^{k₀} (x_i − m̂_{k₀})⁴ ] / [ (1/k₀) Σ_{i=1}^{k₀} (x_i − m̂_{k₀})² ]²
3  Compute σ_max and σ_min as functions of k:
σ_max := σ̂_{k₀} / √(1 − n_σ √((λ̂−1)/k))   &   σ_min := σ̂_{k₀} / √(1 + n_σ √((λ̂−1)/k))
4  Find k for which
Prob[ 0 ≤ |σ̂_k² − σ²| ≤ n_σ σ_{σ̂k²} ] = φ
5  Run k − k₀ simulations
144
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 11)
Monte-Carlo Procedure
Example:
Assume a Gaussian distribution, λ = 3.
1  Choose the Confidence Level φ = 95%, which gives the
corresponding n_σ = 1.96.
n_σ       φ
1.000     0.6827
1.645     0.9000
1.960     0.9500
2.576     0.9900
2  The kurtosis λ = 3
3  Find k for which
Prob[ 0 ≤ |σ̂_k² − σ²| ≤ 1.96 √(2/k) σ² ] = 0.95
Assume also that we require that, with probability φ = 95 %,
|σ̂_k² − σ²| ≤ 0.1 σ²
1.96 √(2/k) = 0.1   →   k ≈ 800
4  Run k > 800 simulations
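The sample-size arithmetic of the example reads directly as code (a sketch, not part of the original slides; the slide rounds the exact value up to 800):

```python
import math

# Solve 1.96 * sqrt((lambda-1)/k) = eps for k, with lambda = 3 (Gaussian)
# and a required relative accuracy eps = 0.1 at 95% confidence.
n_sigma, lam, eps = 1.96, 3.0, 0.1
k = (n_sigma * math.sqrt(lam - 1.0) / eps) ** 2
assert 760 < k < 800          # exact value approx. 768; slide rounds to 800
```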
145
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 12)
Kurtosis of random variable xi
Kurtosis
Kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning bulging) is a measure
of the "peakedness" of the probability distribution of a real-valued random variable.
Higher kurtosis means more of the variance is due to infrequent extreme deviations, as
opposed to frequent modestly-sized deviations.
1905 Pearson defines Kurtosis,
as a measure of departure from normality in a paper published in
Biometrika. λ=3 for the normal distribution and the terms
‘leptokurtic’ (λ>3), mesokurtic (λ=3), platikurtic (λ<3) are
introduced.
( ){ } ( ){ }[ ]224
/: mxEmxE ii −−=λ
( ){ }
( ){ }[ ]22
4
:
mxE
mxE
i
i
−
−
=λ
Karl Pearson
(1857 –1936)
A leptokurtic distribution has a more acute "peak" around the mean (that is,
a higher probability than a normally distributed variable of values near the
mean) and "fat tails" (that is, a higher probability than a normally distributed
variable of extreme values).
A platykurtic distribution has a smaller "peak" around the mean (that is, a lower
probability than a normally distributed variable of values near the mean) and
"thin tails" (that is, a lower probability than a normally distributed variable of
extreme values).
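The moment-ratio definition above can be checked numerically; a minimal Python sketch (the function name is mine, not from the slides):

```python
import random

def kurtosis(xs):
    # Pearson kurtosis: E{(x-m)^4} / [E{(x-m)^2}]^2 (equals 3 for a Gaussian).
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in xs) / n   # fourth central moment
    return m4 / m2 ** 2

random.seed(0)
normal = [random.gauss(0.0, 1.0) for _ in range(100_000)]
# Laplace(0,1) as a random sign times an Exp(1) draw; its kurtosis is 6.
laplace = [random.choice((-1, 1)) * random.expovariate(1.0) for _ in range(100_000)]
# Expect ~3 for the normal (mesokurtic) and ~6 for the Laplace (leptokurtic).
```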
146
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 13)

Distribution        Functional Representation                                                              Kurtosis λ   Excess Kurtosis λ−3

Normal              $\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{\left(x-\mu\right)^2}{2\sigma^2}\right)$        3            0
Laplace             $\frac{1}{2b}\exp\left(-\frac{\left|x-\mu\right|}{b}\right)$                                 6            3
Uniform             $\frac{1}{b-a}$ for $a\le x\le b$;  0 for $x<a$ or $x>b$                                     1.8          −1.2
Wigner semicircle   $\frac{2}{\pi R^2}\sqrt{R^2-x^2}$ for $\left|x\right|\le R$;  0 for $\left|x\right|>R$       2            −1
Hyperbolic-Secant   $\frac{1}{2}\operatorname{sech}\left(\frac{\pi x}{2}\right)$                                 5            2

[The Graphical Representation column of the original slide is not reproduced.]
147
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 14)
Skewness of random variable xi
Skewness
$\gamma:=\dfrac{E\left\{\left(x_i-m\right)^3\right\}}{\left[E\left\{\left(x_i-m\right)^2\right\}\right]^{3/2}}$

Karl Pearson
(1857 –1936)
Negative skew: The left tail is longer; the mass of the distribution is concentrated on
the right of the figure. The distribution is said to be left-skewed. There is more data
in the left tail than would be expected in a normal distribution.

Positive skew: The right tail is longer; the mass of the distribution is concentrated
on the left of the figure. The distribution is said to be right-skewed. There is more data
in the right tail than would be expected in a normal distribution.
Karl Pearson suggested two simpler calculations as a measure of skewness:
• (mean - mode) / standard deviation
• 3 (mean - median) / standard deviation
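The moment skewness γ can likewise be estimated from a sample; a minimal sketch (the function name is mine, not from the slides):

```python
import random

def skewness(xs):
    # Pearson moment skewness: E{(x-m)^3} / [E{(x-m)^2}]^(3/2).
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5

random.seed(1)
symmetric = [random.gauss(0.0, 1.0) for _ in range(100_000)]       # skewness ~0
right_skewed = [random.expovariate(1.0) for _ in range(100_000)]   # Exp(1): skewness 2
```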
148
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a
Recursive Filter (Unknown Statistics)
We found that using k measurements the estimated mean and variance are given in
batch form by:

$\hat x_k:=\frac{1}{k}\sum_{i=1}^{k}x_i\qquad\qquad p_k:=\frac{1}{k-1}\sum_{i=1}^{k}\left(x_i-\hat x_k\right)^2$

A random variable, x, may take on any values in the range −∞ to +∞.
Based on a sample of k values, xi, i = 1,2,…,k, we wish to estimate the sample mean, $\hat x_k$,
and the variance pk, by a Recursive Filter.

The k+1 measurement will give:

$\hat x_{k+1}=\frac{1}{k+1}\sum_{i=1}^{k+1}x_i=\frac{1}{k+1}\left(k\,\hat x_k+x_{k+1}\right)$

Therefore the Recursive Filter form for the k+1 measurement will be:

$\hat x_{k+1}=\hat x_k+\frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$

and for the variance:

$p_{k+1}=\frac{1}{k}\sum_{i=1}^{k+1}\left(x_i-\hat x_{k+1}\right)^2$
149
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a
Recursive Filter (Unknown Statistics) (continue – 1)
We found that using k+1 measurements the estimated variance is given in
batch form by:

$p_{k+1}=\frac{1}{k}\sum_{i=1}^{k+1}\left(x_i-\hat x_{k+1}\right)^2$

with the mean recursion

$\hat x_{k+1}=\hat x_k+\frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$

Substituting $\hat x_{k+1}-\hat x_k=\frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$ and using $\sum_{i=1}^{k}\left(x_i-\hat x_k\right)=0$:

$p_{k+1}=\frac{1}{k}\sum_{i=1}^{k+1}\left[\left(x_i-\hat x_k\right)-\frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)\right]^2=\frac{1}{k}\left[\left(k-1\right)p_k+\frac{k}{k+1}\left(x_{k+1}-\hat x_k\right)^2\right]$

Therefore the Recursive Filter form for the variance at the k+1 measurement is:

$p_{k+1}=p_k+\frac{1}{k+1}\left[\left(x_{k+1}-\hat x_k\right)^2-\frac{k+1}{k}\,p_k\right]$
150
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a
Recursive Filter (Unknown Statistics) (continue – 2)
A random variable, x, may take on any values in the range −∞ to +∞.
Based on a sample of k values, xi, i = 1,2,…,k, we estimate the sample mean, $\hat x_k$,
and the variance pk, by the Recursive Filter:

$\hat x_{k+1}=\hat x_k+\frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$

$p_{k+1}=p_k+\frac{1}{k+1}\left[\left(x_{k+1}-\hat x_k\right)^2-\frac{k+1}{k}\,p_k\right]$

Using $x_{k+1}-\hat x_k=\left(k+1\right)\left(\hat x_{k+1}-\hat x_k\right)$, the variance update can also be written as:

$p_{k+1}=p_k+\left(k+1\right)\left(\hat x_{k+1}-\hat x_k\right)^2-\frac{1}{k}\,p_k$
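The two recursions can be checked against the batch formulas; a minimal one-pass Python sketch (the function name is mine, not from the slides):

```python
import random

def recursive_mean_var(samples):
    # One-pass Recursive Filter for the sample mean and variance:
    #   x^_{k+1} = x^_k + (x_{k+1} - x^_k)/(k+1)
    #   p_{k+1}  = p_k + [ (x_{k+1} - x^_k)^2 - ((k+1)/k) p_k ] / (k+1)
    x_hat = samples[0]
    p = 0.0                                   # variance undefined for one sample
    for k, x in enumerate(samples[1:], start=1):   # k samples already processed
        d = x - x_hat
        x_hat += d / (k + 1)
        p += (d * d - (k + 1) / k * p) / (k + 1)
    return x_hat, p

random.seed(2)
xs = [random.gauss(5.0, 2.0) for _ in range(50_000)]
mean, var = recursive_mean_var(xs)   # should match the batch estimates
```

The recursion reproduces the batch mean and the 1/(k−1)-normalized batch variance exactly (up to floating-point round-off), while storing only two numbers.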
151
SOLO Review of Probability
Estimate the value of a constant x, given discrete measurements of x corrupted by an
uncorrelated gaussian noise sequence with zero mean and variance r0.
The scalar equations describing this situation are:

System:        $x_{k+1}=x_k$   (general form $x_{k+1}=\Phi_k\,x_k+\Gamma_k\,w_k$, with $\Phi_k=1$, $w_k=0$)
Measurement:   $z_k=x_k+v_k$,   $v_k\sim N\left(0,r_0\right)$   (general form $z_k=H_k\,x_k+v_k$, with $H_k=1$)

The Discrete Kalman Filter is given by:

Prediction:

$\hat x_{k+1}\left(-\right)=\hat x_k\left(+\right)$

$p_{k+1}\left(-\right)=\Phi_k\,p_k\left(+\right)\Phi_k^T+\Gamma_k\,Q\,\Gamma_k^T=p_k\left(+\right)\qquad\left(\Phi_k=1,\ Q=0\right)$

Update:

$\hat x_{k+1}\left(+\right)=\hat x_{k+1}\left(-\right)+\underbrace{p_{k+1}\left(-\right)\left[p_{k+1}\left(-\right)+r_0\right]^{-1}}_{K_{k+1}}\left[z_{k+1}-\hat x_{k+1}\left(-\right)\right]$

$p_{k+1}\left(+\right)=p_{k+1}\left(-\right)-p_{k+1}\left(-\right)H^T\left[H\,p_{k+1}\left(-\right)H^T+r_0\right]^{-1}H\,p_{k+1}\left(-\right)=\frac{p_{k+1}\left(-\right)r_0}{p_{k+1}\left(-\right)+r_0}$
General Form
with Known Statistics Moments Using a Discrete Recursive Filter
Estimation of the Mean and Variance of a Random Variable
152
SOLO Review of Probability
Estimate the value of a constant x, given discrete measurements of x corrupted by an
uncorrelated gaussian noise sequence with zero mean and variance r0.
We found that the Discrete Kalman Filter is given by:

$\hat x_{k+1}\left(+\right)=\hat x_k\left(+\right)+K_{k+1}\left[z_{k+1}-\hat x_k\left(+\right)\right]$

$p_{k+1}\left(+\right)=\frac{p_k\left(+\right)r_0}{p_k\left(+\right)+r_0}\qquad K_{k+1}=\frac{p_k\left(+\right)}{p_k\left(+\right)+r_0}$

Starting from $p_0\left(+\right)=p_0$:

k = 0:   $p_1\left(+\right)=\frac{p_0\,r_0}{p_0+r_0}=\frac{p_0}{1+p_0/r_0}$

k = 1:   $p_2\left(+\right)=\frac{p_1\left(+\right)r_0}{p_1\left(+\right)+r_0}=\frac{p_0}{1+2\,p_0/r_0}$

and in general:

$p_k\left(+\right)=\frac{p_0}{1+k\,p_0/r_0}\qquad K_{k+1}=\frac{p_k\left(+\right)}{p_k\left(+\right)+r_0}=\frac{p_0/r_0}{1+\left(k+1\right)p_0/r_0}$

Therefore:

$\hat x_{k+1}\left(+\right)=\hat x_k\left(+\right)+\frac{p_0/r_0}{1+\left(k+1\right)p_0/r_0}\left[z_{k+1}-\hat x_k\left(+\right)\right]$
with Known Statistics Moments Using a Discrete Recursive Filter (continue – 1)
Estimation of the Mean and Variance of a Random Variable
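The discrete filter for a constant can be simulated directly; a minimal sketch (the function name is mine, and the chosen x, r0, p0 values are illustrative):

```python
import random

def estimate_constant(x_true, r0, p0, n, rng):
    # Discrete Kalman filter for x_{k+1} = x_k observed as z_k = x_k + v_k,
    # v_k ~ N(0, r0).  With no process noise, p_k(+) has the closed form
    # p0 / (1 + k p0 / r0).
    x_hat = 0.0
    p = p0
    for _ in range(n):
        z = x_true + rng.gauss(0.0, r0 ** 0.5)
        K = p / (p + r0)             # gain from the predicted covariance
        x_hat = x_hat + K * (z - x_hat)
        p = p * r0 / (p + r0)        # p_{k+1}(+) = p_k(+) r0 / (p_k(+) + r0)
    return x_hat, p

rng = random.Random(3)
x_hat, p = estimate_constant(x_true=7.0, r0=1.0, p0=1000.0, n=2000, rng=rng)
# p should match the closed form p0 / (1 + n p0 / r0)
```

With a large initial p0 the estimate is essentially the running sample mean of the measurements, which is exactly the recursive mean filter derived earlier.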
153
SOLO Review of Probability
Estimate the value of a constant x, given continuous measurements of x corrupted by an
uncorrelated gaussian noise process with zero mean and variance r.
The scalar equations describing this situation are:

System:        $\dot x=0$   (general form $\dot x=A\,x+\Gamma\,w$, with $A=0$, $w=0$)
Measurement:   $z=x+v$,   $v\sim N\left(0,r\right)$   (general form $z=H\,x+v$, with $H=1$)

The Continuous Kalman Filter is given by:

$\dot{\hat x}\left(t\right)=\underbrace{A}_{0}\hat x\left(t\right)+\underbrace{p\left(t\right)H^T r^{-1}}_{K\left(t\right)}\left[z\left(t\right)-\hat x\left(t\right)\right]\qquad \hat x\left(0\right)=0$

$p\left(t\right):=E\left\{\left[x\left(t\right)-\hat x\left(t\right)\right]\left[x\left(t\right)-\hat x\left(t\right)\right]^T\right\}$

$\dot p\left(t\right)=A\,p\left(t\right)+p\left(t\right)A^T+G\,Q\,G^T-p\left(t\right)H^T r^{-1}H\,p\left(t\right)=-p^2\left(t\right)r^{-1}$

or:

$\dot p=-p^2\,r^{-1}\qquad p\left(0\right)=p_0$

$\int_{p_0}^{p}\frac{d\,p}{p^2}=-\frac{1}{r}\int_0^{t}d\,t\qquad\Rightarrow\qquad p\left(t\right)=\frac{p_0}{1+\dfrac{p_0}{r}\,t}$

$K\left(t\right)=p\left(t\right)r^{-1}=\frac{p_0/r}{1+\dfrac{p_0}{r}\,t}$

$\dot{\hat x}\left(t\right)=\frac{p_0/r}{1+\dfrac{p_0}{r}\,t}\left[z-\hat x\left(t\right)\right]$
154
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
• First attempts to generate “random numbers”:
- Draw balls out of a stirred urn
- Roll dice
• 1927: L.H.C. Tippett published a table of 40,000 digits taken “at random” from
census reports.
• 1939: M.G. Kendall and B. Babington-Smith create a mechanical machine to
generate random numbers. They published a table of 100,000 digits.
• 1946: J. Von Neumann proposed the “middle square method”.
• 1948: D.H. Lehmer introduced the “linear congruential method”.
• 1955: RAND Corporation published a table of 1,000,000 random digits obtained
from electronic noise.
• 1965: M.D. MacLaren and G. Marsaglia proposed to combine two congruential
generators.
• 1989: R.S. Wikramaratna proposed the additive congruential method.
Routine RANDU (IBM Corp)
“We guarantee that each number is random individually, but we don’t guarantee
that more than one of them is random”
155
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
On a computer the “random numbers” are not random at all – they are strictly
deterministic and reproducible, but they look like a stream of random numbers.
For this reason the computer programs are called “Pseudo-Random Number Generators”.
Essential Properties of a Pseudo-Random Number Generator
Repeatability – the same sequence should be produced with the same initial values
(or seeds)
Randomness – should produce independent uniformly distributed random variables
that pass all statistical tests for randomness.
Long Period – a pseudo-random number sequence uses finite precision arithmetic,
so the sequence must repeat itself with a finite period. This should be
much longer than the amount of random numbers needed for simulation.
Insensitive to seeds – period and randomness properties should not depend on the
initial seeds.
156
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
Essential Properties of a Pseudo-Random Number Generator (continue -1)
Portability – should give the same results on different computers
Efficiency – should be fast (small number of floating point operations) and not use
much memory.
Disjoint subsequences – different seeds should produce long independent (disjoint)
subsequences so that there are no correlations between simulations
with different initial seeds.
Homogeneity – sequences of all bits should be random.
157
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
A Random Number represents the value of a random variable uniform distributed on (0,1).
Pseudo-Random Numbers constitute a sequence of values, which although are
deterministically generated, have all the appearances of being independent uniform
distributed on (0,1).
One approach: the Multiplicative congruential method

1. Define x0 = integer initial condition or seed.
2. Using integers a and m recursively compute

   $x_n=a\,x_{n-1}\ \left(\mathrm{modulo}\ m\right)$,   i.e.   $a\,x_{n-1}=m\cdot k+x_n$,   k Integer,   $x_n<m$

Therefore xn takes the values 0,1,…,m−1, and the quantity un = xn/m, called a pseudo-random
number, is an approximation to the value of a uniform (0,1) random variable.

In general the integers a and m should be chosen to satisfy three criteria:

1. For any initial seed, the resultant sequence has the “appearance” of being a sequence
of independent uniform (0,1) random variables.
2. For any initial seed, the number of variables that can be generated before repetition
begins is large.
3. The values can be computed efficiently on a digital computer.
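The recursion takes only a few lines; a minimal sketch using the classic “minimal standard” constants a = 7⁵ = 16,807, m = 2³¹ − 1 quoted on the next slide:

```python
def lcg(seed, a=16807, m=2**31 - 1):
    # Multiplicative congruential generator: x_n = a x_{n-1} mod m.
    # u_n = x_n / m approximates a uniform (0,1) random variable.
    x = seed
    while True:
        x = (a * x) % m
        yield x / m

gen = lcg(seed=12345)
us = [next(gen) for _ in range(100_000)]
mean = sum(us) / len(us)   # should be close to 1/2 for uniform (0,1)
```

The first output is deterministic: x1 = 16807 · 12345 mod (2³¹ − 1) = 207,482,415, which illustrates the “strictly deterministic and reproducible” point made on the next slide.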
158
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators (continue – 1)
A guideline is to choose m to be a large prime number close to the computer word size.

Examples:

32 bits word computer:  m = 2³¹ − 1,  a = 7⁵ = 16,807    (some IBM systems)
36 bits word computer:  m = 2³⁵ − 31,  a = 5⁵ = 3,125

Another generator of pseudo-random numbers uses recursions of the type:

$x_n=\left(a\,x_{n-1}+c\right)\ \left(\mathrm{modulo}\ m\right)$,   i.e.   $a\,x_{n-1}+c=m\cdot k+x_n$,   k Integer,   $x_n<m$

Mixed congruential method

32 bits word computer:  m = 2³²,  a = 69,069                      (VAX)
32 bits word computer:  m = 2³²,  a = 1,664,525                   (transputers)
48 bits word computer:  m = 2⁴⁸,  a = 5DEECE66D₁₆,  c = B₁₆       (UNIX, RAND 48 routine)
48 bits word computer:  m = 2⁴⁷,  a = 5¹⁵,  c = 0                 (CDC vector machines)
48 bits word computer:  m = 2⁴⁸,  a = 2875A2E7B175₁₆,  c = 0      (Cray vector machines)
64 bits word computer:  m = 2⁵⁹,  a = 13¹³,  c = 0                (Numerical Algorithms Group)
Return to Table of Content
159
SOLO Review of Probability
Generating Discrete Random Variables
Histograms
A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what
proportion of cases fall into each of several categories: it is a form of data binning. The categories
are usually specified as non-overlapping intervals of some variable. The categories (bars) must be
adjacent. The intervals (or bands, or bins) are generally of the same size.

Histograms are used to plot density of data, and often for density estimation: estimating the
probability density function of the underlying variable. The total area of a histogram always
equals 1. If the length of the intervals on the x-axis are all 1, then a histogram is identical to a
relative frequency plot.

Mathematical Definition

In a more general mathematical sense, a histogram is a mapping mi that counts the number of
observations that fall into various disjoint categories (known as bins), whereas the graph of a
histogram is merely one way to represent a histogram. Thus, if we let n be the total number of
observations and k be the total number of bins, the histogram mi meets the following condition:

$n=\sum_{i=1}^{k}m_i$

A cumulative histogram is a mapping that counts the cumulative number of observations in all
of the bins up to the specified bin. That is, the cumulative histogram Mi of a histogram mi is
defined as:

$M_i=\sum_{j=1}^{i}m_j$

[Figure: An ordinary and a cumulative histogram of the same data. The data shown is a random
sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1.]

Return to Table of Content
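A minimal sketch of a histogram and its cumulative histogram (the function names are mine, not from the slides):

```python
import random

def histogram(data, bins, lo, hi):
    # Count observations falling in each of `bins` equal bins on [lo, hi);
    # sum(m) equals the number of in-range observations.
    width = (hi - lo) / bins
    m = [0] * bins
    for x in data:
        if lo <= x < hi:
            m[int((x - lo) / width)] += 1
    return m

def cumulative(m):
    # Cumulative histogram: M_i = sum_{j<=i} m_j.
    M, total = [], 0
    for mi in m:
        total += mi
        M.append(total)
    return M

random.seed(4)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]   # as in the figure caption
m = histogram(data, bins=20, lo=-4.0, hi=4.0)
M = cumulative(m)
```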
160
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method
Suppose we want to generate a discrete random variable X
having probability density function:

$p\left(x\right)=\sum_j p_j\,\delta\left(x-x_j\right),\qquad j=0,1,\dots,\qquad \sum_j p_j=1$

To accomplish this, let us generate a random number U that is uniformly distributed
over (0,1) and set:

$X=\begin{cases}x_0 & \text{if }U<p_0\\ x_1 & \text{if }p_0\le U<p_0+p_1\\ \ \ \vdots\\ x_j & \text{if }\sum_{i=0}^{j-1}p_i\le U<\sum_{i=0}^{j}p_i\\ \ \ \vdots\end{cases}$

Since U is uniformly distributed, P (a ≤ U < b) = b − a for any a and b such that
0 < a < b < 1, and we have:

$P\left(X=x_j\right)=P\left(\sum_{i=0}^{j-1}p_i\le U<\sum_{i=0}^{j}p_i\right)=p_j$

and so X has the desired distribution.
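The cumulative-sum search above can be sketched directly (the function name is mine, and the example pmf is illustrative):

```python
import random

def draw_discrete(values, probs, u):
    # Inverse transform: return x_j where sum_{i<j} p_i <= u < sum_{i<=j} p_i.
    acc = 0.0
    for x, p in zip(values, probs):
        acc += p
        if u < acc:
            return x
    return values[-1]   # guard against floating-point round-off at u ~ 1

random.seed(5)
values, probs = [10, 20, 30], [0.2, 0.5, 0.3]
draws = [draw_discrete(values, probs, random.random()) for _ in range(50_000)]
freq = {v: draws.count(v) / len(draws) for v in values}
# Empirical frequencies should approach (0.2, 0.5, 0.3)
```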
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 1)
Suppose we want to generate a discrete random variable X
having probability density function:

$p\left(x\right)=\sum_j p_j\,\delta\left(x-x_j\right),\qquad j=0,1,\dots,\qquad \sum_j p_j=1$

Draw X, N times, from p (x), and plot the Histogram of the Results.
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 2)
Generating a Poisson Random Variable:

$p_i=P\left(X=i\right)=e^{-\lambda}\frac{\lambda^i}{i!},\qquad i=0,1,\dots,\qquad \sum_i p_i=1$

The ratio of successive probabilities gives a convenient recursion:

$\frac{p_{i+1}}{p_i}=\frac{e^{-\lambda}\lambda^{i+1}/\left(i+1\right)!}{e^{-\lambda}\lambda^{i}/\,i!}=\frac{\lambda}{i+1}$

Draw X, N times, from the Poisson Distribution, and plot the
Histogram of the Results.
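The recursion p_{i+1} = p_i λ/(i+1) avoids recomputing factorials inside the inverse-transform search; a minimal sketch (the function name is mine):

```python
import math
import random

def draw_poisson(lam, u):
    # Inverse transform using p_{i+1} = p_i * lam / (i+1), starting from
    # p_0 = exp(-lam); accumulate until the CDF exceeds u.
    i, p = 0, math.exp(-lam)
    acc = p
    while u >= acc:
        p *= lam / (i + 1)
        i += 1
        acc += p
    return i

random.seed(6)
lam = 4.0
draws = [draw_poisson(lam, random.random()) for _ in range(50_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
# For a Poisson distribution, mean and variance both equal lam
```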
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 3)
Generating a Binomial Random Variable:

$p_i=P\left(X=i\right)=\frac{n!}{i!\left(n-i\right)!}\,p^i\left(1-p\right)^{n-i},\qquad i=0,1,\dots,n,\qquad \sum_i p_i=1$

The ratio of successive probabilities gives a convenient recursion:

$\frac{p_{i+1}}{p_i}=\frac{\dfrac{n!}{\left(i+1\right)!\left(n-i-1\right)!}\,p^{i+1}\left(1-p\right)^{n-i-1}}{\dfrac{n!}{i!\left(n-i\right)!}\,p^{i}\left(1-p\right)^{n-i}}=\frac{n-i}{i+1}\cdot\frac{p}{1-p}$

[Figure: Histogram of the Results, P (k,n) for k = 0,1,…,14]

Return to Table of Content
164
SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique

Suppose we have an efficient method for simulating a random variable having a
probability density function { qj, j ≥ 0 }. We want to use this to obtain a random
variable that has the probability density function { pj, j ≥ 0 }.

Let c be a constant such that:

$\frac{p_j}{q_j}\le c\qquad\forall\,j\ \mathrm{s.t.}\ q_j\neq 0$

If such a c exists, it must satisfy:

$1=\sum_j p_j\le c\sum_j q_j=c\qquad\Rightarrow\qquad c\ge 1$

Rejection Method
Step 1: Simulate the value of Y, having probability density function qj.
Step 2: Generate a random number U (that is uniformly distributed
over (0,1) ).
Step 3: If U < pY/c qY, set X = Y and stop. Otherwise return to Step 1.
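The three steps above can be sketched as follows (the function name and the example pmfs are mine, not from the slides):

```python
import random

def accept_reject(target_p, proposal_q, draw_q, c, rng):
    # Rejection method: draw Y ~ q, accept with probability p_Y / (c q_Y),
    # otherwise draw again.
    while True:
        y = draw_q(rng)
        if rng.random() < target_p[y] / (c * proposal_q[y]):
            return y

rng = random.Random(7)
p = {0: 0.5, 1: 0.3, 2: 0.2}        # target pmf
q = {0: 1/3, 1: 1/3, 2: 1/3}        # easy-to-sample uniform proposal
c = max(p[j] / q[j] for j in p)     # smallest valid c; here c = 1.5
draws = [accept_reject(p, q, lambda r: r.randrange(3), c, rng)
         for _ in range(30_000)]
freq = [draws.count(j) / len(draws) for j in (0, 1, 2)]
```

Consistent with the theorem on the next slide, the long-run acceptance probability is 1/c, so small c (proposal close to the target) means few wasted draws.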
165
SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique (continue – 1)
Theorem
The random variable X obtained by the rejection method has probability density
function P { X=i } = pi.
Proof

Using Bayes:

$P\left\{X=i\right\}=P\left\{Y=i\,\middle|\,\mathrm{Acceptance}\right\}=\frac{P\left\{Y=i,\ \mathrm{Acceptance}\right\}}{P\left\{\mathrm{Acceptance}\right\}}$

By independence, and since U is uniformly distributed on (0,1):

$P\left\{Y=i,\ \mathrm{Acceptance}\right\}=P\left\{Y=i\right\}P\left\{U\le\frac{p_i}{c\,q_i}\right\}=q_i\,\frac{p_i}{c\,q_i}=\frac{p_i}{c}$

Therefore:

$P\left\{X=i\right\}=\frac{p_i}{c\,P\left\{\mathrm{Acceptance}\right\}}$

Summing over all i yields:

$1=\sum_i P\left\{X=i\right\}=\frac{\sum_i p_i}{c\,P\left\{\mathrm{Acceptance}\right\}}=\frac{1}{c\,P\left\{\mathrm{Acceptance}\right\}}\qquad\Rightarrow\qquad c\,P\left\{\mathrm{Acceptance}\right\}=1$

Hence:

$P\left\{X=i\right\}=p_i\qquad\mathrm{and}\qquad P\left\{\mathrm{Acceptance}\right\}=\frac{1}{c}\le 1$

q.e.d.
166
SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique (continue – 2)
Example

Generate a truncated Gaussian using the Accept-Reject method. Consider the case with

$p\left(x\right)\approx\begin{cases}\dfrac{e^{-x^2/2}}{\sqrt{2\pi}} & x\in\left[-4,4\right]\\ 0 & \mathrm{otherwise}\end{cases}$

Consider the Uniform proposal function

$q\left(x\right)=\begin{cases}1/8 & x\in\left[-4,4\right]\\ 0 & \mathrm{otherwise}\end{cases}$

In the Figure we can see the results of the Accept-Reject method using N = 10,000 samples.
Return to Table of Content
167
SOLO Review of Probability
Generating Continuous Random Variables
The Inverse Transform Algorithm
Let U be a uniform (0,1) random variable. For any continuous
distribution function F the random variable X defined by
$X=F^{-1}\left(U\right)$

has distribution F. [ F⁻¹(u) is defined to be that value of x such that F (x) = u ]
Proof
Let Px (x) denote the Probability Distribution Function of X = F⁻¹(U):

$P_x\left(x\right)=P\left\{X\le x\right\}=P\left\{F^{-1}\left(U\right)\le x\right\}$

Since F is a distribution function, it means that F (x) is a monotonic increasing
function of x, and so the inequality “a ≤ b” is equivalent to the inequality
“F (a) ≤ F (b)”; therefore

$P_x\left(x\right)=P\left\{F\left(F^{-1}\left(U\right)\right)\le F\left(x\right)\right\}\overset{F\left(F^{-1}\left(U\right)\right)=U}{=}P\left\{U\le F\left(x\right)\right\}\overset{U\sim\mathrm{uniform}\left(0,1\right)}{=}F\left(x\right),\qquad 0\le F\left(x\right)\le 1$
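A standard worked case of this algorithm is the exponential distribution, whose CDF inverts in closed form; a minimal sketch (the function name is mine, not from the slides):

```python
import math
import random

def draw_exponential(rate, u):
    # X = F^{-1}(U) with F(x) = 1 - exp(-rate*x), so x = -ln(1-u)/rate.
    return -math.log(1.0 - u) / rate

random.seed(8)
rate = 2.0
xs = [draw_exponential(rate, random.random()) for _ in range(100_000)]
mean = sum(xs) / len(xs)   # expect 1/rate = 0.5
```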
Return to Table of Content
168
SOLO Review of Probability
Generating Continuous Random Variables
The Acceptance-Rejection Technique

Suppose we have an efficient method for simulating a random variable having a
probability density function g (x). We want to use this to obtain a random
variable that has the probability density function f (x).

Let c be a constant such that:

$\frac{f\left(y\right)}{g\left(y\right)}\le c\qquad\forall\,y$

If such a c exists, it must satisfy:

$1=\int f\left(y\right)d\,y\le c\int g\left(y\right)d\,y=c\qquad\Rightarrow\qquad c\ge 1$
Rejection Method
Step 1: Simulate the value of Y, having probability density function g (Y).
Step 2: Generate a random number U (that is uniformly distributed
over (0,1) ).
Step 3: If U < f (Y)/c g (Y), set X = Y and stop. Otherwise return to Step 1.
169
SOLO Review of Probability
Generating Continuous Random Variables
The Acceptance-Rejection Technique (continue – 1)
Theorem

The random variable X obtained by the rejection method has probability density
function f (x).

Proof

Using Bayes:

$P\left\{X=y\right\}=P\left\{Y=y\,\middle|\,\mathrm{Acceptance}\right\}=\frac{P\left\{Y=y,\ \mathrm{Acceptance}\right\}}{P\left\{\mathrm{Acceptance}\right\}}$

By independence, and since U is uniformly distributed on (0,1):

$P\left\{Y=y,\ \mathrm{Acceptance}\right\}=g\left(y\right)P\left\{U\le\frac{f\left(y\right)}{c\,g\left(y\right)}\right\}=g\left(y\right)\frac{f\left(y\right)}{c\,g\left(y\right)}=\frac{f\left(y\right)}{c}$

Integrating over all y yields:

$1=\int P\left\{X=y\right\}d\,y=\frac{\int f\left(y\right)d\,y}{c\,P\left\{\mathrm{Acceptance}\right\}}=\frac{1}{c\,P\left\{\mathrm{Acceptance}\right\}}\qquad\Rightarrow\qquad c\,P\left\{\mathrm{Acceptance}\right\}=1$

Hence:

$P\left\{X=y\right\}=f\left(y\right)\qquad\mathrm{and}\qquad P\left\{\mathrm{Acceptance}\right\}=\frac{1}{c}\le 1$

q.e.d.
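The truncated-Gaussian example from a few slides back can be reproduced with this method; a minimal sketch (the function name is mine; the setup follows the slide's f on [−4,4] with uniform proposal g = 1/8, so c = 8/√(2π)):

```python
import math
import random

def truncated_gaussian(rng):
    # Accept-Reject: f(x) ~ exp(-x^2/2)/sqrt(2*pi) on [-4, 4],
    # proposal g(x) = 1/8 on [-4, 4], c = 8/sqrt(2*pi) (ratio at x = 0).
    # Then f(y)/(c g(y)) simplifies to f(y)/f(0) = exp(-y^2/2).
    f0 = 1.0 / math.sqrt(2.0 * math.pi)    # f(0) = c * g
    while True:
        y = rng.uniform(-4.0, 4.0)         # draw Y ~ g
        f = math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)
        if rng.random() < f / f0:          # accept with prob f(y)/(c g(y))
            return y

rng = random.Random(9)
xs = [truncated_gaussian(rng) for _ in range(20_000)]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean ** 2
# Truncation at +/-4 is mild, so mean ~0 and variance ~1
```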
Return to Table of Content
170
SOLO
The Bootstrap
• Popularized by Bradley Efron (1979)
• The Bootstrap is a name generically applied to statistical resampling schemes
that allow uncertainty in the data to be assessed from the data themselves, in
other words
“pulling yourself up by your bootstraps”
The disadvantage of bootstrapping is that while (under some conditions) it is
asymptotically consistent, it does not provide general finite-sample
guarantees, and has a tendency to be overly optimistic.The apparent
simplicity may conceal the fact that important assumptions are being made
when undertaking the bootstrap analysis (e.g. independence of samples)
where these would be more formally stated in other approaches.
The advantage of bootstrapping over analytical methods is its great simplicity - it is
straightforward to apply the bootstrap to derive estimates of standard errors and
confidence intervals for complex estimators of complex parameters of the
distribution, such as percentile points, proportions, odds ratio, and correlation
coefficients.
Generating Discrete Random Variables
Bradley Efron
1938
Stanford U.
Review of Probability
171
SOLO
The Bootstrap (continue -1)
• Given n observation zi i=1,…,n and a calculated statistics S, what is the uncertainty
in S?
• The Procedure:
Generating Discrete Random Variables
- Draw m values z’i i=1,…,m from the original data with replacement
- Calculate the statistic S’ from the “bootstrapped” sample
- Repeat L times to build a distribution of uncertainty in S.
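The procedure above can be sketched in a few lines (the function name and the example statistic are mine, not from the slides):

```python
import random

def bootstrap(data, statistic, n_resamples, rng):
    # Resample with replacement and recompute the statistic each time,
    # building a distribution that reflects its uncertainty.
    n = len(data)
    return [statistic([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_resamples)]

rng = random.Random(10)
data = [rng.gauss(50.0, 10.0) for _ in range(200)]
means = bootstrap(data, lambda xs: sum(xs) / len(xs), n_resamples=1000, rng=rng)
center = sum(means) / len(means)
spread = (sum((m - center) ** 2 for m in means) / len(means)) ** 0.5
# spread approximates the standard error of the mean, ~10/sqrt(200) ~ 0.71
```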
Review of Probability
Return to Table of Content
172
SOLO Review of Probability
Importance Sampling (IS)
Let Y = (Y1,…,Ym) be a vector of random variables having a joint probability density
function p (y1,…,ym), and suppose that we are interested in estimating

$\theta=E_p\left[g\left(Y_1,\dots,Y_m\right)\right]=\int\cdots\int g\left(y_1,\dots,y_m\right)p\left(y_1,\dots,y_m\right)d\,y_1\cdots d\,y_m$

Suppose that a direct generation of the random vector Y so as to compute g (Y) is
inefficient or impossible because
(a) it is difficult to generate the random vector Y, or
(b) the variance of g (Y) is large, or
(c) both of the above.

Suppose that W = (W1,…,Wm) is another random vector, which takes values in the
same domain as Y, and has a joint density function q (w1,…,wm) that can be easily
generated. The estimation θ can be expressed as:

$\theta=\int\cdots\int g\left(w_1,\dots,w_m\right)\frac{p\left(w_1,\dots,w_m\right)}{q\left(w_1,\dots,w_m\right)}\,q\left(w_1,\dots,w_m\right)d\,w_1\cdots d\,w_m=E_q\left[g\left(W\right)\frac{p\left(W\right)}{q\left(W\right)}\right]$

Therefore, we can estimate θ by generating values of the random vector W, and then
using as the estimator the resulting average of the values g (W) p (W)/q (W).
Generating Discrete Random Variables
173
SOLO Review of Probability
Importance Sampling (IS) (continue – 1)
$E_p\left[x\right]=\int x\,\frac{p\left(x\right)}{q\left(x\right)}\,q\left(x\right)d\,x=E_q\left[x\,\frac{p\left(x\right)}{q\left(x\right)}\right]\approx\frac{1}{N}\sum_{i=1}^{N}x_i\underbrace{\frac{p\left(x_i\right)}{q\left(x_i\right)}}_{w_i}$

Example: Importance Sampling for a Bi-Modal Distribution

Consider the following distribution:

$p\left(x\right)=\frac{1}{2}\,\mathcal N\left(x;0,1\right)+\frac{1}{2}\,\mathcal N\left(x;3,1/2\right)$

We want to calculate the mean value (g (x) = x) using Importance Sampling.

Use:   $g\left(x\right)=x\quad\&\quad q\left(x\right)=U\left(-5,5\right)$

For i = 1,…,N, sample (draw) xi using q (x):   $x_i\sim q\left(x\right),\quad i=1,\dots,N$

Importance Weight:   $w_i:=\frac{p\left(x_i\right)}{q\left(x_i\right)}$

For N = 10,000 samples we obtain Ep [x] = 1.4915 instead of 1.5.

In the Figure the Histogram using the Importance Weights wi is presented together
with the true PDF.
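The bi-modal example can be reproduced directly; a minimal sketch (function names are mine, and I read the second component's 1/2 as its standard deviation; the mixture mean is 1.5 either way):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p(x):
    # Bi-modal target: 0.5 N(x; 0, 1) + 0.5 N(x; 3, 1/2)
    return 0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, 3.0, 0.5)

rng = random.Random(11)
N = 100_000
q_pdf = 1.0 / 10.0                       # proposal q = U(-5, 5)
xs = [rng.uniform(-5.0, 5.0) for _ in range(N)]
ws = [p(x) / q_pdf for x in xs]          # importance weights w_i = p(x_i)/q(x_i)
estimate = sum(w * x for w, x in zip(ws, xs)) / N   # E_p[x] ~ (1/N) sum w_i x_i
# True mean is 0.5*0 + 0.5*3 = 1.5
```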
Return to Table of Content
174
SOLO
Metropolis Algorithm
• This method of generation of an arbitrary probability distribution
was invented by Metropolis, Rosenbluth and Teller (supposedly at a
Los Alamos dinner party) and published June 1953.
Generating Discrete Random Variables
Review of Probability
Procedure
• Set up a Markov Chain that has as a unique stationary solution
the required π (x) Probability Distribution Function (PDF)
• Run the chain until stationary.
• All subsequent samples are from stationary distribution π (x)
as required.
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E.,
“ Equations of state calculations by fast computing machine”,
Journal of Chemical Physics, 1953, Vol. 21(6), pp.1087-1092
Nicholas Constantine Metropolis
( 1915 – 1999)
This is also called Markov Chain Monte Carlo (MCMC) method.
[Figure: three-state Markov Chain with states X1, X2, X3 and transition probabilities]
175
SOLO
Metropolis Algorithm (continue – 1)
Generating Discrete Random Variables
Review of Probability
Nicholas Constantine Metropolis
( 1915 – 1999)
Proof of the Procedure

Pr (X,t) - the probability of being in the state X at time t.
Pr (X→Y) = Pr (Y|X) - the transition probability, per unit time, of going
from state X to state Y.

$\Pr\left(X,t+1\right)=\Pr\left(X,t\right)+\sum_Y\left[\Pr\left(X|Y\right)\Pr\left(Y,t\right)-\Pr\left(Y|X\right)\Pr\left(X,t\right)\right]$

At large t, once the arbitrary initial state is “forgotten”, we want Pr (X,t) → Pr (X).
Clearly a sufficient (but not necessary) condition for an equilibrium (time independent)
probability distribution is the so-called

Detailed Balance Condition:   $\Pr\left(Y|X\right)\Pr\left(X,t\right)=\Pr\left(X|Y\right)\Pr\left(Y,t\right)$

This method can be used for any probability distribution,
but Metropolis used:

$\Pr\left(B|A\right)=\begin{cases}e^{-\Delta E/kT} & \Delta E>0\\ 1 & \Delta E\le 0\end{cases}\qquad \Delta E:=E\left(B\right)-E\left(A\right)$

Note: E (A) is equivalent to the Energy level of state A.

$\sum_Y\Pr\left(X\to Y\right)=\sum_Y\Pr\left(Y|X\right)=1$   (sum of probabilities of all states reached from X)

[Figure: two states X and Y with transition probabilities Pr (Y|X), Pr (X|Y) and
self-transitions Pr (X→X), Pr (Y→Y)]
176
SOLO
Metropolis Algorithm (continue – 2)
Generating Discrete Random Variables
Review of Probability
Detailed Balance Condition:   $\Pr\left(Y|X\right)\Pr\left(X,t\right)=\Pr\left(X|Y\right)\Pr\left(Y,t\right)$

Metropolis defined a symmetric Q (Y|X) = Q (X|Y) as a candidate generating density
for Pr (Y|X), such that:

$\sum_Y Q\left(Y|X\right)=1$

In general Q (Y|X) will not satisfy the “Detailed Balance” condition; for example:

$Q\left(Y|X\right)\Pr\left(X,t\right)>Q\left(X|Y\right)\Pr\left(Y,t\right)$

The process moves from X to Y too often and from Y to X too rarely.

A convenient way to correct this is to reduce the number of moves from X to Y by
introducing a probability 0 < Α (Y|X) ≤ 1. This is called the Acceptance Probability.

$\Pr\left(Y|X\right)=Q\left(Y|X\right)\cdot\mathrm{A}\left(Y|X\right),\qquad Y\neq X$
177
SOLO
Metropolis Algorithm (continue – 3)
Generating Discrete Random Variables
Review of Probability
[Figure: two states X and Y with transition probabilities Pr (Y|X), Pr (X|Y) and
self-transitions Pr (X→X), Pr (Y→Y)]

Let us define the Acceptance Probabilities as:

$\mathrm{A}\left(Y|X\right)=\begin{cases}\Pr\left(Y\right)/\Pr\left(X\right) & \Pr\left(Y\right)\le\Pr\left(X\right)\\ 1 & \Pr\left(Y\right)>\Pr\left(X\right)\end{cases}\qquad \mathrm{A}\left(X|Y\right)=\begin{cases}1 & \Pr\left(Y\right)\le\Pr\left(X\right)\\ \Pr\left(X\right)/\Pr\left(Y\right) & \Pr\left(Y\right)>\Pr\left(X\right)\end{cases}$

$\Pr\left(Y|X\right)=Q\left(Y|X\right)\cdot\mathrm{A}\left(Y|X\right)\qquad \Pr\left(X|Y\right)=Q\left(X|Y\right)\cdot\mathrm{A}\left(X|Y\right)\qquad Y\neq X$

If Pr (X) ≤ Pr (Y) then A (Y|X) = 1 and A (X|Y) = Pr (X)/Pr (Y).
If Pr (X) > Pr (Y) then A (Y|X) = Pr (Y)/Pr (X) and A (X|Y) = 1.

In both cases:

$\frac{\Pr\left(Y|X\right)}{\Pr\left(X|Y\right)}=\frac{Q\left(Y|X\right)\cdot\mathrm{A}\left(Y|X\right)}{Q\left(X|Y\right)\cdot\mathrm{A}\left(X|Y\right)}\overset{Q\left(Y|X\right)=Q\left(X|Y\right)}{=}\frac{\mathrm{A}\left(Y|X\right)}{\mathrm{A}\left(X|Y\right)}=\frac{\Pr\left(Y\right)}{\Pr\left(X\right)}$

which is just the Detailed Balance condition.
178
SOLO
Metropolis Algorithm (continue – 4)
Generating Discrete Random Variables
Review of Probability
Detailed Balance Condition:   $\Pr\left(B|A\right)\Pr\left(A,t\right)=\Pr\left(A|B\right)\Pr\left(B,t\right)$

This method can be used for any probability distribution, but Metropolis used:

$\Pr\left(B|A\right)=\begin{cases}e^{-\Delta E/kT} & \Delta E>0\\ 1 & \Delta E\le 0\end{cases}\qquad \Delta E:=E\left(B\right)-E\left(A\right)$

Therefore

$\frac{\Pr\left(B|A\right)}{\Pr\left(A|B\right)}=\begin{cases}\dfrac{e^{-\left[E\left(B\right)-E\left(A\right)\right]/kT}}{1} & \Delta E=E\left(B\right)-E\left(A\right)>0\\[1ex]\dfrac{1}{e^{-\left[E\left(A\right)-E\left(B\right)\right]/kT}} & \Delta E=E\left(B\right)-E\left(A\right)\le 0\end{cases}\;=\;e^{-\Delta E/kT}=\frac{\Pr\left(B,t\right)}{\Pr\left(A,t\right)}$

[Figure: two states A and B with transitions Pr (A→B), Pr (B→A) and self-transitions
Pr (A→A) = 1 − Pr (A→B), Pr (B→B) = 1 − Pr (B→A)]
179
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Discrete Random Variables
Review of Probability
• Set up a Markov Chain T (x’|x) that has as a unique stationary solution
the required π (x’) Probability Distribution Function (PDF)
$\pi\left(x'\right)=\int T\left(x'|x\right)\pi\left(x\right)d\,x$
W. Keith Hastings improved the Metropolis algorithm by allowing a non-symmetrical
Candidate Generating Density.
Hastings, W., “Monte Carlo Simulation Methods Using Markov Chains and
Their Applications”, Biometrica, 1970, No. 57, pp. 97 - 109
Here we give the development for Continuous Random Variables
(for Discrete Random Variables the development is similar to that used for
Metropolis Algorithm).
180
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
• The problem is to find the conditional transition probability distribution T (x’|x)
of the Markov Chain, that has states converging, after a transient time, to π (x’):

$\pi\left(x'\right)=\int T\left(x'|x\right)\pi\left(x\right)d\,x$

To satisfy this requirement, a “necessary condition” (but “not sufficient”) is the
“Detailed Balance” (or “Reversibility Condition”, or “Time Reversibility”):

$T\left(x'|x\right)\pi\left(x\right)=T\left(x|x'\right)\pi\left(x'\right)$

Proof:

$\int T\left(x'|x\right)\pi\left(x\right)d\,x=\int T\left(x|x'\right)\pi\left(x'\right)d\,x=\pi\left(x'\right)\underbrace{\int T\left(x|x'\right)d\,x}_{1}=\pi\left(x'\right)$

q.e.d.

Let us define Q (x’|x) as a candidate generating density for T (x’|x), such that:

$\int Q\left(x'|x\right)d\,x'=1$

In general Q (x’|x) will not satisfy the “Detailed Balance” condition; for example:

$Q\left(x'|x\right)\pi\left(x\right)>Q\left(x|x'\right)\pi\left(x'\right)$

Loosely speaking, the process moves from x to x’ too often and from x’ to x too rarely.
181
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
In general Q (x’|x) will not satisfy the “Detailed Balance” condition; for example:

$Q\left(x'|x\right)\pi\left(x\right)>Q\left(x|x'\right)\pi\left(x'\right)$

Loosely speaking, the process moves from x to x’ too often and from x’ to x too rarely.

A convenient way to correct this is to reduce the number of moves from x to x’ by
introducing a probability 0 < α (x’|x) ≤ 1. This is called the Acceptance Probability.

$T\left(x'|x\right)=Q\left(x'|x\right)\alpha\left(x'|x\right),\qquad x'\neq x$

If the move is not made, the process again returns x as a value from the target distribution.

The Detailed Balance is:

$Q\left(x'|x\right)\alpha\left(x'|x\right)\pi\left(x\right)=Q\left(x|x'\right)\pi\left(x'\right)$

From which:

$\alpha\left(x'|x\right)=\frac{\pi\left(x'\right)Q\left(x|x'\right)}{\pi\left(x\right)Q\left(x'|x\right)}\le 1\qquad\Rightarrow\qquad \alpha\left(x'|x\right)=\min\left\{1,\ \frac{\pi\left(x'\right)Q\left(x|x'\right)}{\pi\left(x\right)Q\left(x'|x\right)}\right\}$

In the same way (by interchanging x’ with x):

$\alpha\left(x|x'\right)=\min\left\{1,\ \frac{\pi\left(x\right)Q\left(x'|x\right)}{\pi\left(x'\right)Q\left(x|x'\right)}\right\}$
182
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
Let us prove that we satisfy the “Detailed Balance” condition:

$T\left(x'|x\right)\pi\left(x\right)=Q\left(x'|x\right)\pi\left(x\right)\min\left\{1,\ \frac{\pi\left(x'\right)Q\left(x|x'\right)}{\pi\left(x\right)Q\left(x'|x\right)}\right\}$

$T\left(x|x'\right)\pi\left(x'\right)=Q\left(x|x'\right)\pi\left(x'\right)\min\left\{1,\ \frac{\pi\left(x\right)Q\left(x'|x\right)}{\pi\left(x'\right)Q\left(x|x'\right)}\right\}$

Suppose $Q\left(x|x'\right)\pi\left(x'\right)<Q\left(x'|x\right)\pi\left(x\right)$. Then:

$T\left(x'|x\right)\pi\left(x\right)=Q\left(x'|x\right)\pi\left(x\right)\frac{\pi\left(x'\right)Q\left(x|x'\right)}{\pi\left(x\right)Q\left(x'|x\right)}=Q\left(x|x'\right)\pi\left(x'\right)$

$T\left(x|x'\right)\pi\left(x'\right)=Q\left(x|x'\right)\pi\left(x'\right)\cdot 1=Q\left(x|x'\right)\pi\left(x'\right)$

Therefore

$T\left(x'|x\right)\pi\left(x\right)=T\left(x|x'\right)\pi\left(x'\right)$

q.e.d.
183
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
The Transition Kernel of the Metropolis-Hastings Algorithm is:

$T\left(x'|x\right)=Q\left(x'|x\right)\alpha\left(x'|x\right)+\left[1-\alpha\left(x'|x\right)\right]\delta_x\left(x'\right)$

where δx is the Dirac mass on {x}.
184
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
Therefore the M-H Algorithm will:

1  Use the previously generated x(t).

2  Draw a new value xnew from the candidate distribution Q (xnew|x(t)):

   $x^{new}\sim Q\left(x^{new}|x^{\left(t\right)}\right)$

3  Compute the acceptance probability α (xnew|x(t)):

   $\alpha\left(x^{new}|x^{\left(t\right)}\right)=\min\left\{1,\ \frac{\pi\left(x^{new}\right)Q\left(x^{\left(t\right)}|x^{new}\right)}{\pi\left(x^{\left(t\right)}\right)Q\left(x^{new}|x^{\left(t\right)}\right)}\right\}$

4  Use the Acceptance/Rejection method with

   $q\left(x\right)=U\left[0,1\right]=\begin{cases}1 & 0\le x\le 1\\ 0 & \text{otherwise}\end{cases}$

   (the uniform distribution between 0 and 1) and c = 1: draw U ~ U [0,1] and
   accept xnew if U ≤ α (xnew|x(t)); otherwise keep x(t).
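The four steps above can be sketched as follows; the target is the bi-modal π (x) used in the example a few slides ahead, and the symmetric Gaussian proposal makes the Q-ratio drop out of α (function names are mine):

```python
import math
import random

def target(x):
    # Unnormalized target from the slides' example:
    # 0.3 exp(-0.2 x^2) + 0.7 exp(-0.2 (x-10)^2)
    return 0.3 * math.exp(-0.2 * x * x) + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2)

def metropolis_hastings(n_steps, step_sigma, rng):
    # Random-walk M-H: Q(x'|x) = N(x, step_sigma^2) is symmetric, so the
    # acceptance probability reduces to min{1, pi(x')/pi(x)}.
    x = 0.0
    chain = []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step_sigma)
        alpha = min(1.0, target(x_new) / target(x))
        if rng.random() < alpha:     # accept the move with probability alpha
            x = x_new
        chain.append(x)              # on rejection the chain repeats x
    return chain

rng = random.Random(12)
chain = metropolis_hastings(n_steps=100_000, step_sigma=10.0, rng=rng)
burned = chain[5_000:]               # discard the transient (burn-in) stage
mean = sum(burned) / len(burned)
# Both mixture kernels have equal width, so the mixture mean is 0.7*10 = 7
```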
185
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
186
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
187
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
The convergence of the M-H Algorithm to the desired unique stationary solution
the required π (x) occurs under the following conditions:
• Irreducibility: every state is eventually reachable from any start state;
for all x there exists a t such that π (x,t) > 0.
• Aperiodicity: the chain doesn’t get caught in cycles.
The process is ergodic if it is both irreducible and aperiodic.
In the M-H algorithm the draws are used as samples from the target density π (x) only
after the Markov Chain has passed the transient stage and the effect of the chosen
starting value x0 has become so small that it can be ignored. The rate of convergence
of the Markov Chain is a function of the chosen candidate generating density Q (x’,x).
The efficiency of the algorithm depends on how close the Acceptance Probability α is to 1.
188
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
Example:

$\pi\left(x\right)=0.3\,\exp\left(-0.2\,x^2\right)+0.7\,\exp\left[-0.2\left(x-10\right)^2\right]$

Proposed Candidate Distribution:

$Q\left(x^{new}|x^{\left(t\right)}\right)=\mathcal N\left(x^{\left(t\right)},100\right)$

Ramon Sagarna, R.Sagarna@cs.bham.ac.uk,
“Lecture 19: Markov Chain Monte Carlo Methods (MCMC)”
189
SOLO
Metropolis Algorithm
Generating Continuous Random Variables
Review of Probability
If we choose a symmetric candidate generating density, Q (x’|x) = Q (x|x’) for each x’, x,
then

$\alpha\left(x'|x\right)=\min\left\{1,\ \frac{\pi\left(x'\right)}{\pi\left(x\right)}\right\}\qquad\qquad \alpha\left(x|x'\right)=\min\left\{1,\ \frac{\pi\left(x\right)}{\pi\left(x'\right)}\right\}$

We obtain the Metropolis Algorithm.

Metropolis chose:

$Q\left(x'|x\right)=\begin{cases}e^{-\Delta E/kT} & \Delta E>0\\ 1 & \Delta E\le 0\end{cases}\qquad\qquad Q\left(x|x'\right)=\begin{cases}1 & \Delta E>0\\ e^{+\Delta E/kT} & \Delta E\le 0\end{cases}\qquad \Delta E:=E\left(x'\right)-E\left(x\right)$
190
SOLO
Metropolis Algorithm
Generating Continuous Random Variables
Review of Probability
Return to Table of Content
191
SOLO
Gibbs Sampling
Generating Discrete Random Variables
Review of Probability
Stuart Geman
Brown University
Donald Geman
Johns Hopkins
University
Josiah Willard Gibbs
1839 - 1903
In mathematics and physics, Gibbs sampling is an
algorithm to generate a sequence of samples from the joint
probability distribution of two or more random variables.
The purpose of such a sequence is to approximate the joint
distribution, or to compute an integral (such as an
expected value). Gibbs sampling is a special case of the
Metropolis-Hastings algorithm, and thus an example of a
Markov chain Monte Carlo algorithm. The algorithm is
named after the physicist J. W. Gibbs, in reference to an
analogy between the sampling algorithm and statistical
physics. The algorithm was devised by Stuart Geman and
Donald Geman, some eight decades after the passing of
Gibbs, and is also called the Gibbs sampler.
Geman, S. and Geman, D., “Stochastic Relaxation, Gibbs Distributions, and the Bayesian
Restoration of Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence,
1984, 6, pp. 721 – 741
192
SOLO
Gibbs Sampling (continue – 1)
Generating Discrete Random Variables
Review of Probability
The Gibbs sampler uses what are called the full (or complete) conditional distributions:

π(x_j | x_1, …, x_{j−1}, x_{j+1}, …, x_k)
   =(Bayes)=  π(x_1, …, x_j, …, x_k) / ∫ π(x_1, …, x_j, …, x_k) dx_j
   =  π(x_j, x_{−j}) / π(x_{−j})
Suppose that x = (x_1, x_2, …, x_k) is k (≥ 2) dimensional.
The Gibbs sampler samples one variable in turn:

X_1^{t+1} ~ π(x_1 | x_2^t, x_3^t, …, x_k^t)
X_2^{t+1} ~ π(x_2 | x_1^{t+1}, x_3^t, …, x_k^t)
X_3^{t+1} ~ π(x_3 | x_1^{t+1}, x_2^{t+1}, x_4^t, …, x_k^t)
⋮
X_k^{t+1} ~ π(x_k | x_1^{t+1}, x_2^{t+1}, …, x_{k−1}^{t+1})

The Gibbs sampler always uses the most recent values.
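The cycle above can be sketched for a standard textbook case where the full conditionals are available in closed form: a zero-mean bivariate normal with correlation ρ, where x_1 | x_2 ~ N(ρ x_2, 1 − ρ²) and symmetrically for x_2 | x_1. This example and its names are illustrative assumptions, not from the slides.

```python
import random, math

def gibbs_bivariate_normal(rho, n_samples, seed=2):
    """Gibbs sampler for a zero-mean, unit-variance bivariate normal with
    correlation rho. Each variable is drawn in turn from its full
    conditional, always using the most recent value of the other."""
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0
    s = math.sqrt(1.0 - rho**2)       # conditional standard deviation
    samples = []
    for _ in range(n_samples):
        x1 = rng.gauss(rho * x2, s)   # x1 ~ pi(x1 | x2)
        x2 = rng.gauss(rho * x1, s)   # x2 ~ pi(x2 | x1), using the new x1
        samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal(0.8, 20000)
burned = samples[1000:]
corr_num = sum(a * b for a, b in burned) / len(burned)  # ~ rho for unit variances
```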
193
SOLO
Gibbs Sampling (continue – 2)
Generating Discrete Random Variables
Review of Probability
Gibbs Sampling is a special case of the Metropolis – Hastings Algorithm.

To see this, define the candidate-generating density Q(x_new | x^(t)) as

Q(x_new | x^(t)) = { Pr(x_j^new | x_{−j}^new)   if x_{−j}^new = x_{−j}^(t)
                   { 0                          otherwise

At any moment one variable x_j^new is drawn:  x_j^new ~ π(x_j | x_{−j}^new)

where  x_{−j}^new := (x_1^{(t−1)}, …, x_{j−1}^{(t−1)}, x_{j+1}^{(t−1)}, …, x_k^{(t−1)})

The new state will be  x^new = (x_1^{(t−1)}, …, x_{j−1}^{(t−1)}, x_j^new, x_{j+1}^{(t−1)}, …, x_k^{(t−1)}) = (x_j^new, x_{−j}^new)

The acceptance probability α(x_new | x^(t)) is:

α(x_new | x^(t)) = min{ 1, [Q(x^(t) | x_new) Pr(x_new)] / [Q(x_new | x^(t)) Pr(x^(t))] }
                 = min{ 1, [Pr(x_j^(t) | x_{−j}^(t)) Pr(x_new)] / [Pr(x_j^new | x_{−j}^new) Pr(x^(t))] }
194
SOLO
Gibbs Sampling (continue – 3)
Generating Discrete Random Variables
Review of Probability
candidate-generating density:

Q(x_new | x^(t)) = { Pr(x_j^new | x_{−j}^new)   if x_{−j}^new = x_{−j}^(t)
                   { 0                          otherwise

The acceptance probability α(x_new | x^(t)) is:

α(x_new | x^(t)) = min{ 1, [Q(x^(t) | x_new) Pr(x_new)] / [Q(x_new | x^(t)) Pr(x^(t))] }
                 = min{ 1, [Pr(x_j^(t) | x_{−j}^(t)) Pr(x_new)] / [Pr(x_j^new | x_{−j}^new) Pr(x^(t))] }

Using Bayes:

Pr(x_j^(t) | x_{−j}^(t)) = Pr(x_j^(t), x_{−j}^(t)) / Pr(x_{−j}^(t))
Pr(x_j^new | x_{−j}^new) = Pr(x_j^new, x_{−j}^new) / Pr(x_{−j}^new)

and, since x^(t) = (x_j^(t), x_{−j}^(t)) and x^new = (x_j^new, x_{−j}^new):

Pr(x^(t)) = Pr(x_j^(t), x_{−j}^(t))        Pr(x^new) = Pr(x_j^new, x_{−j}^new)

α(x_new | x^(t)) = min{ 1, Pr(x_{−j}^new) / Pr(x_{−j}^(t)) } = 1      (x_{−j}^new = x_{−j}^(t))

Gibbs Sampling always accepts x_j^new.
Gibbs Sampling is therefore a special case of the Metropolis – Hastings Algorithm.
195
SOLO
Gibbs Sampling (continue – 4)
Generating Discrete Random Variables
Review of Probability
Return to Table of Content
SOLO Review of Probability
Monte Carlo Integration
The Monte Carlo Method can be used to numerically evaluate multidimensional integrals

I = ∫⋯∫ g(x_1, …, x_m) dx_1 ⋯ dx_m = ∫ g(x) dx

To use Monte Carlo we factorize  g(x) = f(x)·p(x)

in such a way that p(x) is interpreted as a Probability Density Function:

p(x) ≥ 0   &   ∫ p(x) dx = 1

We assume that we can draw N_S samples x^i, i = 1, …, N_S from p(x):

x^i ~ p(x),   i = 1, …, N_S

Using Monte Carlo we can approximate

p(x) ≈ (1/N_S) Σ_{i=1}^{N_S} δ(x − x^i)

I = ∫ f(x)·p(x) dx ≈ I_{N_S} = ∫ f(x) · (1/N_S) Σ_{i=1}^{N_S} δ(x − x^i) dx = (1/N_S) Σ_{i=1}^{N_S} f(x^i)
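The factorization g = f·p can be tried on a one-dimensional integral with a known answer. This is an illustrative sketch (the test integral and names are assumptions, not from the slides): take p uniform on [0, 1] and f(x) = e^x, so I = ∫₀¹ e^x dx = e − 1.

```python
import random, math

def mc_integral(f, sample_p, n_samples, seed=3):
    """Monte Carlo estimate of I = integral of f(x) p(x) dx:
    draw x_i ~ p and average f(x_i)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += f(sample_p(rng))
    return total / n_samples

# Example: I = integral_0^1 e^x dx, with p uniform on [0,1] and f(x) = e^x
I_hat = mc_integral(math.exp, lambda rng: rng.random(), 100000)
I_exact = math.e - 1.0
```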
SOLO Review of Probability
Monte Carlo Integration
We draw N_S samples x^i, i = 1, …, N_S from p(x):   x^i ~ p(x)

I = ∫ f(x)·p(x) dx ≈ I_{N_S} = (1/N_S) Σ_{i=1}^{N_S} f(x^i)

If the samples x^i are independent, then I_{N_S} is an unbiased estimate of I.

According to the Law of Large Numbers, I_{N_S} will almost surely converge to I:

I_{N_S} → I   (almost surely, N_S → ∞)

If the variance of f(x) is finite, i.e.

σ²(f) := ∫ [f(x) − I]² p(x) dx < ∞

then the Central Limit Theorem holds and the estimation error converges in
distribution to a Normal Distribution:

lim_{N_S→∞} √N_S (I_{N_S} − I) ~ N(0, σ²(f))

The error of the MC estimate, e = I_{N_S} − I, is of the order of O(N_S^{−1/2}), meaning
that the rate of convergence of the estimate is independent of the dimension of
the integrand.

Return to Table of Content
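The O(N_S^{−1/2}) rate can be checked numerically. This sketch (all names and the choice of integrand are assumptions) estimates E[U], U ~ Uniform(0, 1), with two sample sizes differing by a factor of 100, so the mean absolute error should shrink by roughly √100 = 10:

```python
import random

def mc_mean_error(n, reps, seed):
    """Average |error| of the MC estimate of E[U], U ~ Uniform(0,1),
    over `reps` independent runs of n samples each."""
    rng = random.Random(seed)
    tot = 0.0
    for _ in range(reps):
        s = sum(rng.random() for _ in range(n)) / n
        tot += abs(s - 0.5)
    return tot / reps

e_small = mc_mean_error(100, 200, seed=4)
e_large = mc_mean_error(10000, 200, seed=5)
shrink = e_small / e_large   # expect roughly sqrt(10000/100) = 10
```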
198
Random Processes
SOLO

Random Variable:
A variable x determined by the outcome Ω of a random experiment:  x = x(Ω)

Random Process or Stochastic Process:
A function of time x determined by the outcome Ω of a random experiment:  x(t) = x(t, Ω)

This is a family, or an ensemble, of functions of time, in general different
for each outcome Ω.

Mean or Ensemble Average of the Random Process:

x̄(t) := E[x(t, Ω)] = ∫_{−∞}^{+∞} ξ p_{x(t)}(ξ) dξ

Autocorrelation of the Random Process:

R(t_1, t_2) := E[x(t_1, Ω) x(t_2, Ω)] = ∫∫ ξ η p_{x(t_1), x(t_2)}(ξ, η) dξ dη

Autocovariance of the Random Process:

C(t_1, t_2) := E{[x(t_1, Ω) − x̄(t_1)][x(t_2, Ω) − x̄(t_2)]}
            = E[x(t_1, Ω) x(t_2, Ω)] − x̄(t_1) x̄(t_2) = R(t_1, t_2) − x̄(t_1) x̄(t_2)
Table of Content
199
SOLO
Stationarity of a Random Process
1. Wide-Sense Stationarity of a Random Process:

• The Mean of the Random Process is time invariant:

x̄(t) := E[x(t, Ω)] = ∫ ξ p_{x(t)}(ξ) dξ = x̄ = const.

• The Autocorrelation of the Random Process is of the form:

R(t_1, t_2) = R(t_1 − t_2) =: R(τ),   τ := t_1 − t_2

since

R(t_1, t_2) := ∫∫ ξ η p_{x(t_1), x(t_2)}(ξ, η) dξ dη = R(t_2, t_1)

we have:  R(τ) = R(−τ)

Power Spectrum or Power Spectral Density of a Stationary Random Process:

S(ω) := ∫_{−∞}^{+∞} R(τ) exp(−jωτ) dτ

2. Strict-Sense Stationarity of a Random Process:
All probability density functions are time invariant:  p_{x(t)}(ξ, t) = p_x(ξ) = const.

Ergodicity:
A Stationary Random Process for which Time Average = Ensemble Average:

x̄ := lim_{T→∞} (1/2T) ∫_{−T}^{+T} x(t, Ω) dt = E[x(t, Ω)]
Random Processes
200
SOLO
Ergodicity (continue):

For an Ergodic Random Process define the Time Autocorrelation:

R(τ) := lim_{T→∞} (1/2T) ∫_{−T}^{+T} x(t, Ω) x(t + τ, Ω) dt

Finite Signal Energy Assumption:

R(0) = lim_{T→∞} (1/2T) ∫_{−T}^{+T} x²(t, Ω) dt < ∞

Define:

x_T(t, Ω) := { x(t, Ω)   −T ≤ t ≤ T
             { 0         otherwise

R_T(τ) := (1/2T) ∫_{−∞}^{+∞} x_T(t, Ω) x_T(t + τ, Ω) dt

Then

R_T(τ) = (1/2T) ∫_{−T}^{+T} x(t, Ω) x(t + τ, Ω) dt − (1/2T) ∫_{T−τ}^{T} x(t, Ω) x(t + τ, Ω) dt

and the second term vanishes in the limit:

lim_{T→∞} | (1/2T) ∫_{T−τ}^{T} x(t, Ω) x(t + τ, Ω) dt |
   ≤ lim_{T→∞} (τ/2T) sup_{−T ≤ t ≤ T} | x(t, Ω) x(t + τ, Ω) | → 0

therefore:  lim_{T→∞} R_T(τ) = R(τ)
Random Processes
201
SOLO
Ergodicity (continue):

Let us compute:

∫_{−∞}^{+∞} R_T(τ) exp(−jωτ) dτ
   = (1/2T) ∫∫ x_T(t, Ω) x_T(t + τ, Ω) exp(−jωτ) dt dτ
   = (1/2T) [∫ x_T(t, Ω) exp(+jωt) dt] [∫ x_T(v, Ω) exp(−jωv) dv]
   = X_T(ω) X_T*(ω) / 2T

where  X_T(ω) := ∫_{−∞}^{+∞} x_T(v, Ω) exp(−jωv) dv  and * means complex conjugate.

Define:

S_T(ω) := lim_{T→∞} E{ X_T(ω) X_T*(ω) / 2T }
        = lim_{T→∞} E{ ∫ R_T(τ) exp(−jωτ) dτ }
        = lim_{T→∞} ∫ [ (1/2T) ∫_{−T}^{+T} E{x_T(t, Ω) x_T(t + τ, Ω)} dt ] exp(−jωτ) dτ

Since the Random Process is Ergodic we can use the Wide-Sense Stationarity assumption:

E{x_T(t, Ω) x_T(t + τ, Ω)} = R(τ)

S(ω) := lim_{T→∞} E{ X_T(ω) X_T*(ω) / 2T } = ∫_{−∞}^{+∞} R(τ) exp(−jωτ) dτ
Random Processes
202
SOLO
Ergodicity (continue):

We obtained the Wiener-Khinchine Theorem (Wiener 1930):

S(ω) := lim_{T→∞} E{ X_T(ω) X_T*(ω) / 2T } = ∫_{−∞}^{+∞} R(τ) exp(−jωτ) dτ
Norbert Wiener
1894 - 1964
Alexander Yakovlevich
Khinchine
1894 - 1959
The Power Spectrum or Power Spectral Density of
a Stationary Random Process S (ω) is the Fourier
Transform of the Autocorrelation Function R (τ).
Random Processes
203
SOLO
White Noise

A (not necessarily stationary) Random Process whose Autocorrelation is zero for
any two different times is called white noise in the wide sense:

R(t_1, t_2) = E[x(t_1, Ω) x(t_2, Ω)] = σ²(t_1) δ(t_2 − t_1)

σ²(t_1) — instantaneous variance

Wide-Sense Whiteness

Strict-Sense Whiteness

A (not necessarily stationary) Random Process in which the outcomes at any two
different times are independent is called white noise in the strict sense:

p_{x(t_1), x(t_2)}(ξ_1, ξ_2) = p_{x(t_1)}(ξ_1) p_{x(t_2)}(ξ_2),   t_1 ≠ t_2

A Stationary White Noise Random Process has the Autocorrelation:

R(τ) = E[x(t, Ω) x(t + τ, Ω)] = σ² δ(τ)

Note
In general whiteness requires Strict-Sense Whiteness. In practice we have only
moments (typically up to second order) and thus only Wide-Sense Whiteness.
Random Processes
204
SOLO
White Noise

A Stationary White Noise Random Process has the Autocorrelation:

R(τ) = E[x(t, Ω) x(t + τ, Ω)] = σ² δ(τ)

The Power Spectral Density is given by performing the Fourier Transform of the
Autocorrelation:

S(ω) = ∫_{−∞}^{+∞} R(τ) exp(−jωτ) dτ = ∫_{−∞}^{+∞} σ² δ(τ) exp(−jωτ) dτ = σ²

We can see that the Power Spectral Density contains all frequencies at the same
amplitude. This is the reason it is called White Noise.

The Power of the Noise is defined as:  P := R(τ = 0) = (1/2π) ∫_{−∞}^{+∞} S(ω) dω = σ²
Random Processes
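The flat-spectrum / delta-autocorrelation picture can be checked on discrete samples. This sketch (names and parameters are illustrative assumptions) generates i.i.d. Gaussian noise with σ = 2 and verifies that the sample autocorrelation is ≈ σ² at lag 0 and ≈ 0 at a nonzero lag:

```python
import random

def sample_autocorr(x, lag):
    """Biased sample autocorrelation at the given lag."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n

rng = random.Random(6)
sigma = 2.0
x = [rng.gauss(0.0, sigma) for _ in range(100000)]
R0 = sample_autocorr(x, 0)    # ~ sigma^2 = 4
R5 = sample_autocorr(x, 5)    # ~ 0 for white noise
```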
205
SOLO
Table of Content
Markov Processes

Andrei Andreevich Markov
1856 - 1922

A Markov Process is defined by:

p(x(τ), Ω | x(t), Ω, t ≤ t_1) = p(x(τ), Ω | x(t_1), Ω),   ∀ τ > t_1

i.e. for the Random Process, the future beyond any time t_1, given the past up to t_1,
is fully defined by the process at t_1.

Examples of Markov Processes:

1. Continuous Dynamic System

ẋ(t) = f(t, x, u, v)
z(t) = h(t, x, u, w)

2. Discrete Dynamic System

x_{k+1} = f(t_k, x_k, u_k, v_k)
z_k = h(t_k, x_k, u_k, w_k)

x — state-space vector (n × 1)
u — input vector (m × 1)
v — white input noise vector (n × 1)
z — measurement vector (p × 1)
w — white measurement noise vector (p × 1)
Random Processes
206
SOLO
Table of Content
Markov Processes
Examples of Markov Processes:

3. Continuous Linear Dynamic System

ẋ(t) = A x(t) + v(t)
z(t) = C x(t)

Using the Fourier Transform we obtain:

Z(ω) = C (jωI − A)^{−1} V(ω) = H(ω) V(ω),   H(ω) := C (jωI − A)^{−1}

Using the Inverse Fourier Transform we obtain:

z(t) = (1/2π) ∫ H(ω) V(ω) exp(jωt) dω
     = (1/2π) ∫ H(ω) [ ∫ v(ξ) exp(−jωξ) dξ ] exp(jωt) dω
     = ∫ v(ξ) [ (1/2π) ∫ H(ω) exp(jω(t − ξ)) dω ] dξ      (change of order of integration)
     = ∫_{−∞}^{+∞} h(t, ξ) v(ξ) dξ,   h(t, ξ) = h(t − ξ)

v(t) → [ h(t, τ) ] → z(t)
Random Processes
207
SOLO
Table of Content
Markov Processes
Examples of Markov Processes:

3. Continuous Linear Dynamic System (continue)

ẋ(t) = A x(t) + v(t),   z(t) = C x(t),   z(t) = ∫ h(t, ξ) v(ξ) dξ

with white input noise:

R_vv(τ) = E[v(t) v(t + τ)] = σ_v² δ(τ)
S_vv(ω) = ∫ R_vv(τ) exp(−jωτ) dτ = ∫ σ_v² δ(τ) exp(−jωτ) dτ = σ_v²

The Autocorrelation of the output is:

R_zz(τ) = E[z(t) z(t + τ)]
        = ∫∫ h(t − ξ_1) h(t + τ − ξ_2) E[v(ξ_1) v(ξ_2)] dξ_1 dξ_2
        = σ_v² ∫∫ h(t − ξ_1) h(t + τ − ξ_2) δ(ξ_2 − ξ_1) dξ_1 dξ_2
        = σ_v² ∫ h(ζ) h(ζ + τ) dζ

The Power Spectral Density of the output is:

S_zz(ω) = ∫ R_zz(τ) exp(−jωτ) dτ
        = σ_v² ∫∫ h(ζ) h(ζ + τ) exp(−jωτ) dζ dτ
        = σ_v² [ ∫ h(ζ) exp(+jωζ) dζ ][ ∫ h(χ) exp(−jωχ) dχ ]      (χ := ζ + τ)
        = H(ω) H*(ω) σ_v²

S_zz(ω) = H(ω) H*(ω) S_vv(ω)
Random Processes
208
SOLO
Table of Content
Markov Processes
Examples of Markov Processes:

4. Continuous Linear Dynamic System

v(t) → [ H(ω) = K / (1 + jω/ω_x) ] → z(t)

R_vv(τ) = E[v(t) v(t + τ)] = σ_v² δ(τ),   S_vv(ω) = σ_v²

The Power Spectral Density of the output is:

S_zz(ω) = H(ω) H*(ω) S_vv(ω) = K² σ_v² / [1 + (ω/ω_x)²]

The Autocorrelation of the output is:

R_zz(τ) = (1/2π) ∫_{−∞}^{+∞} S_zz(ω) exp(jωτ) dω
        = (1/2πj) ∮ K² σ_v² exp(sτ) / [1 − (s/ω_x)²] ds      (s = σ + jω)

Closing the contour in the appropriate half-plane for τ > 0 and for τ < 0 (the
integrand has poles at s = ±ω_x, and the contribution of the large arc vanishes in
each case) and taking residues gives:

R_zz(τ) = (ω_x K² σ_v² / 2) exp(−ω_x |τ|),   R_zz(0) = ω_x K² σ_v² / 2
Random Processes
209
SOLO
Markov Processes
Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients

ẋ(t) = F(t) x(t) + G(t) w(t)

e_x(t) := x(t) − E{x(t)}   &   e_w(t) := w(t) − E{w(t)}
E{e_w(t_1) e_w^T(t_2)} = Q(t_1) δ(t_1 − t_2)

The solution of the Linear System is:

x(t) = Φ(t, t_0) x(t_0) + ∫_{t_0}^{t} Φ(t, λ) G(λ) w(λ) dλ

where:

(d/dt) Φ(t, t_0) = F(t) Φ(t, t_0)   &   Φ(t_0, t_0) = I   &   Φ(t_3, t_1) = Φ(t_3, t_2) Φ(t_2, t_1)

Taking expectations:

(d/dt) E{x(t)} = F(t) E{x(t)} + G(t) E{w(t)}

and subtracting:

ė_x(t) = F(t) e_x(t) + G(t) e_w(t)
e_x(t) = Φ(t, t_0) e_x(t_0) + ∫_{t_0}^{t} Φ(t, λ) G(λ) e_w(λ) dλ
Random Processes
210
SOLO
Markov Processes
Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 1)

e_x(t) = Φ(t, t_0) e_x(t_0) + ∫_{t_0}^{t} Φ(t, λ) G(λ) e_w(λ) dλ

Define:

V_x(t) := Var{x(t)} = E{e_x(t) e_x^T(t)}
V_x(t + τ) := Var{x(t + τ)} = E{e_x(t + τ) e_x^T(t + τ)}
R_x(t, t + τ) := E{e_x(t) e_x^T(t + τ)}   &   R_x(t + τ, t) := E{e_x(t + τ) e_x^T(t)}

Then:

V_x(t) = Φ(t, t_0) V_x(t_0) Φ^T(t, t_0) + ∫_{t_0}^{t} Φ(t, λ) G(λ) Q(λ) G^T(λ) Φ^T(t, λ) dλ

V_x(t + τ) = Φ(t + τ, t_0) V_x(t_0) Φ^T(t + τ, t_0)
           + ∫_{t_0}^{t+τ} Φ(t + τ, λ) G(λ) Q(λ) G^T(λ) Φ^T(t + τ, λ) dλ
Random Processes
211
SOLO Markov Processes
Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 2)

For τ > 0:

e_x(t + τ) = Φ(t + τ, t) e_x(t) + ∫_{t}^{t+τ} Φ(t + τ, λ) G(λ) e_w(λ) dλ

and the integral term is uncorrelated with e_x(t) (the noise acts after time t), so

R_x(t, t + τ) = E{e_x(t) e_x^T(t + τ)} = V_x(t) Φ^T(t + τ, t),   τ > 0

For τ < 0 (t + τ < t), by the same argument:

R_x(t, t + τ) = Φ(t, t + τ) V_x(t + τ),   τ < 0
Random Processes
212
SOLO Markov Processes
Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 3)

Differentiating

V_x(t) = Φ(t, t_0) V_x(t_0) Φ^T(t, t_0) + ∫_{t_0}^{t} Φ(t, λ) G(λ) Q(λ) G^T(λ) Φ^T(t, λ) dλ

with respect to t, and using (d/dt) Φ(t, t_0) = F(t) Φ(t, t_0), gives the covariance
propagation (Lyapunov) differential equation:

(d/dt) V_x(t) = F(t) V_x(t) + V_x(t) F^T(t) + G(t) Q(t) G^T(t)

and likewise:

(d/dt) V_x(t + τ) = F(t + τ) V_x(t + τ) + V_x(t + τ) F^T(t + τ) + G(t + τ) Q(t + τ) G^T(t + τ)
Random Processes
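The covariance propagation equation can be integrated numerically. The following minimal sketch (scalar case; function names, step size, and the test system are illustrative assumptions) uses Euler integration of dV/dt = 2FV + G²Q for the first-order system dx/dt = −(1/T)x + w, whose steady state solves 0 = −2V/T + Q, i.e. V_ss = QT/2:

```python
def propagate_variance(F, G, Q, V0, dt, n_steps):
    """Euler integration of the scalar covariance propagation (Lyapunov)
    equation dV/dt = 2*F*V + G*Q*G for dx/dt = F*x + G*w,
    with E[w(t)w(t')] = Q*delta(t - t')."""
    V = V0
    for _ in range(n_steps):
        V += dt * (2.0 * F * V + G * Q * G)
    return V

# dx/dt = -(1/T)x + w: steady state V_ss = Q*T/2 = 1.0 for T = 1, Q = 2
T, Q = 1.0, 2.0
V_ss = propagate_variance(F=-1.0 / T, G=1.0, Q=Q, V0=0.0, dt=0.001, n_steps=20000)
```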
213
SOLO Markov Processes
Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 4)

Differentiating R_x(t, t + τ) with respect to t gives:

(d/dt) R_x(t, t + τ) =
  { F(t) R_x(t, t + τ) + R_x(t, t + τ) F^T(t + τ) + G(t) Q(t) G^T(t) Φ^T(t + τ, t),                τ > 0
  { F(t) R_x(t, t + τ) + R_x(t, t + τ) F^T(t + τ) + Φ(t, t + τ) G(t + τ) Q(t + τ) G^T(t + τ),      τ < 0
Random Processes
214
SOLO Markov Processes
Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise

Given a Continuous Linear System:

ẋ(t) = F(t) x(t) + G(t) w(t)

we want to decide if w(t) can be approximated by a white noise.

Let us start with a first-order linear system with white noise input w′(t):

ẇ(t) = −(1/T) w(t) + (1/T) w′(t)        [ w′(t) → H(s) = 1/(1 + Ts) → w(t) ]

φ_w(t, t_0) = exp(−(t − t_0)/T),   (d/dt) φ_w(t, t_0) = −(1/T) φ_w(t, t_0)

E{[w′(t) − E{w′(t)}][w′(t − τ) − E{w′(t − τ)}]} = Q δ(τ)

R_ww(t, t + τ) := E{[w(t) − E{w(t)}][w(t + τ) − E{w(t + τ)}]},   V_ww(t) := R_ww(t, t)

From (d/dt) V_x = F V_x + V_x F^T + G Q G^T, here with F = −1/T and G = 1/T:

(d/dt) V_ww(t) = −(2/T) V_ww(t) + Q/T²
Random Processes
215
SOLO Markov Processes
Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise
(continue – 1)

(d/dt) V_ww(t) = −(2/T) V_ww(t) + Q/T²

V_ww(t) = V_ww(0) exp(−2t/T) + (Q/2T) [1 − exp(−2t/T)]

V_ww|steady-state = Q/(2T)

R_ww(t, t + τ) = { V_ww(t) exp(−τ/T),        τ > 0
                 { exp(+τ/T) V_ww(t + τ),    τ < 0

For t > 5T:

V_ww(t) ≈ V_ww(t + τ) ≈ V_ww|steady-state = Q/(2T)

R_ww(t, t + τ) ≈ R_ww(τ) ≈ (Q/2T) exp(−|τ|/T)

w′(t) → [ H(s) = 1/(1 + Ts) ] → w(t)
Random Processes
216
SOLO Markov Processes
Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise
(continue – 2)

For t > 5T:   R_ww(t, t + τ) ≈ R_ww(τ) ≈ V_ww|steady-state e^{−|τ|/T} = (Q/2T) e^{−|τ|/T}

Area = ∫_{−∞}^{+∞} V_ww(τ) dτ = 2 ∫_{0}^{+∞} (Q/2T) e^{−τ/T} dτ = Q

T is the correlation time of the noise w(t) and can be found from V_ww(τ) by
taking the time at which V_ww(τ) has dropped to V_ww|steady-state / e.

Another way to find T is by taking the two-sided Laplace Transform L₂ on τ of:

L₂{Q δ(τ)} = Q ∫_{−∞}^{+∞} δ(τ) e^{−sτ} dτ = Q

Φ_ww(s) = L₂{V_ww(τ)} = (Q/2T) ∫_{−∞}^{+∞} e^{−|τ|/T} e^{−sτ} dτ = Q / [1 − (sT)²] = H(s) Q H(−s)

Φ_ww(ω) = Q / [1 + (ω/ω_{1/2})²],   ω_{1/2} = 1/T

T can be found by taking the frequency ω_{1/2} at which the spectrum drops to
half of its peak value Q, and T = 1/ω_{1/2}.
Random Processes
217
SOLO Markov Processes
Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise
(continue – 3)

Let us return to the original system:

ẋ(t) = F(t) x(t) + G(t) w(t)

Compute the power spectrum Φ_ww(s = jω) of w(t) and define Q and T.

If

T < (1/5) · minimum time constant of F = (1/5) · 1/|maximum eigenvalue of F|

then w(t) can be approximated by the white noise w′(t) with

E{[w′(t) − E{w′(t)}][w′(t − τ) − E{w′(t − τ)}]} = Q δ(τ)

If

T > (2/5) · minimum time constant of F

then w(t) must be treated as a colored noise, obtained by passing the predefined
white noise w′(t) through the filter H(s) = 1/(1 + Ts).
Random Processes
218
SOLO Markov Processes
Examples of Markov Processes:
7. Digital Simulation of a Continuous Process

Let us start with a first-order linear system with white noise input w′(t):

ẇ(t) = −(1/T) w(t) + (1/T) w′(t)        [ w′(t) → H(s) = 1/(1 + Ts) → w(t) ]

φ_w(t, t_0) = e^{−(t−t_0)/T},   (d/dt) φ_w(t, t_0) = −(1/T) φ_w(t, t_0)

where  E{[w′(t) − E{w′(t)}][w′(t − τ) − E{w′(t − τ)}]} = Q δ(τ)

w(t) = e^{−(t−t_0)/T} w(t_0) + (1/T) ∫_{t_0}^{t} e^{−(t−τ)/T} w′(τ) dτ

Let us choose t = (k+1)ΔT and t_0 = kΔT:

w[(k+1)ΔT] = e^{−ΔT/T} w(kΔT) + (1/T) ∫_{kΔT}^{(k+1)ΔT} e^{−[(k+1)ΔT−τ]/T} w′(τ) dτ
           =: e^{−ΔT/T} w(kΔT) + w̃(kΔT)
Random Processes
219
SOLO Markov Processes
Examples of Markov Processes:
7. Digital Simulation of a Continuous Process (continue – 1)

Define:  ρ := e^{−ΔT/T}

w[(k+1)ΔT] = ρ w(kΔT) + w̃(kΔT),
w̃(kΔT) := (1/T) ∫_{kΔT}^{(k+1)ΔT} e^{−[(k+1)ΔT−τ]/T} w′(τ) dτ

The variance of the integral term is:

E{w̃²(kΔT)} = (1/T²) ∫∫ e^{−[(k+1)ΔT−τ_1]/T} e^{−[(k+1)ΔT−τ_2]/T} E{w′(τ_1) w′(τ_2)} dτ_1 dτ_2
            = (Q/T²) ∫_{kΔT}^{(k+1)ΔT} e^{−2[(k+1)ΔT−τ]/T} dτ
            = (Q/2T) (1 − e^{−2ΔT/T}) = (Q/2T) (1 − ρ²)

Define w′(k) such that:

E{w′(k) w′(k)} := Q/(2T),   w′(k) := w̃(kΔT) / √(1 − ρ²)

Therefore:

w[(k+1)ΔT] = ρ w(kΔT) + √(1 − ρ²) w′(k)
Random Processes
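The recursion above can be run directly. This sketch (function names, seed, and parameter values are illustrative assumptions) simulates the discrete first-order Gauss-Markov process and checks that the sample variance matches the steady state Q/(2T) and that the lag-one correlation matches ρ = e^{−ΔT/T}:

```python
import random, math

def simulate_gauss_markov(T, Q, dT, n_steps, seed=7):
    """Digital simulation of a first-order Gauss-Markov process:
    w[k+1] = rho*w[k] + sqrt(1 - rho^2)*w'[k], rho = exp(-dT/T),
    with Var(w') = Q/(2T), the continuous steady-state variance."""
    rng = random.Random(seed)
    rho = math.exp(-dT / T)
    std_w = math.sqrt(Q / (2.0 * T))
    w, out = 0.0, []
    for _ in range(n_steps):
        w = rho * w + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, std_w)
        out.append(w)
    return out

w = simulate_gauss_markov(T=1.0, Q=2.0, dT=0.1, n_steps=200000)
var_hat = sum(v * v for v in w) / len(w)                        # ~ Q/(2T) = 1.0
lag1 = sum(w[i] * w[i + 1] for i in range(len(w) - 1)) / (len(w) - 1)
rho_hat = lag1 / var_hat                                        # ~ exp(-0.1)
```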
220
SOLO
Markov Chains
Random Processes
[Figure: a three-state Markov chain (X1, X2, X3) with labeled transition probabilities]
A Markov chain, named after Andrey Markov, is a stochastic
process with the Markov property. Having the Markov property
means that, given the present state, future states are independent
of the past states. In other words, the description of the present
state fully captures all the information that could influence the
future evolution of the process. Being a stochastic process means
that all state transitions are probabilistic.

Andrey Andreevich Markov
1856 - 1922

At each step the system may change its state from the
current state to another state (or remain in the same
state) according to a probability distribution. The
changes of state are called transitions, and the
probabilities associated with the various state changes are
called transition probabilities.

Definition of Markov Chains

A Markov chain is a sequence of random variables X1, X2, X3, … with the Markov
property, namely that, given the present state, the future and past states are independent:

Pr(X_{n+1} = x | X_1 = x_1, …, X_n = x_n) = Pr(X_{n+1} = x | X_n = x_n)
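A Markov chain is easy to simulate: the next state depends only on the current one. This sketch uses a hypothetical 3-state row-stochastic transition matrix (the matrix and all names are assumptions for illustration, not the chain in the deck's figure) and checks that the long-run visit frequencies converge to the stationary distribution, here (11/49, 21/49, 17/49):

```python
import random

# Hypothetical 3-state transition matrix; P[i][j] = Pr(next = j | current = i),
# each row sums to 1.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.3, 0.5]]

def step(state, rng):
    """One Markov transition: sample the next state from row P[state]."""
    u, acc = rng.random(), 0.0
    for j, p in enumerate(P[state]):
        acc += p
        if u < acc:
            return j
    return len(P[state]) - 1

rng = random.Random(8)
state, visits = 0, [0, 0, 0]
for _ in range(100000):
    state = step(state, rng)
    visits[state] += 1
freqs = [v / 100000 for v in visits]   # ~ stationary distribution
```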
221
SOLO
Markov Chains
Random Processes
Properties of Markov Chains
Define the probability of going from state i to state j in m time steps as:

p_{i→j}^{(m)} = Pr(X_m = j | X_0 = i)

and the single-step transition as:

p_{i→j} = Pr(X_1 = j | X_0 = i)

[Figure: the three-state chain with labeled transitions
 p_{1→1} = 0.1, p_{2→1} = 0.6, p_{1→2} = 0.5, p_{3→2} = 0.3, p_{2→3} = 0.3,
 p_{3→1} = 0.3, p_{1→3} = 0.6, p_{3→3} = 0.1, p_{2→2} = 0.2]

For a time-homogeneous Markov Chain:

p_{i→j}^{(m)} = Pr(X_{k+m} = j | X_k = i)

and:

p_{i→j} = Pr(X_{k+1} = j | X_k = i)

so the n-step transition satisfies the Chapman-Kolmogorov equation, for any k
such that 0 < k < n:

p_{i→j}^{(n)} = Σ_{r∈S} p_{i→r}^{(k)} p_{r→j}^{(n−k)}
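The Chapman-Kolmogorov equation says n-step probabilities factor through any intermediate time, which for a finite chain is just matrix multiplication: P^(n) = P^(k) P^(n−k). This sketch (the matrix is a hypothetical example, not the deck's figure) verifies the identity for n = 3, k = 1 and k = 2:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical single-step transition matrix (rows sum to 1).
P = [[0.1, 0.5, 0.4],
     [0.6, 0.2, 0.2],
     [0.3, 0.3, 0.4]]

P2 = mat_mul(P, P)        # two-step probabilities p_{i->j}^(2)
P3 = mat_mul(P2, P)       # three-step, as P^(2) * P^(1)
P3_alt = mat_mul(P, P2)   # Chapman-Kolmogorov: also P^(1) * P^(2)
```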
222
SOLO
Markov Chains
Random Processes
Properties of Markov Chains (continue – 1)
The marginal distribution Pr(X_k = x) is the distribution over states at time k.
For a time-homogeneous Markov Chain:

Pr(X_{k+1} = j) = Σ_{r∈S} Pr(X_{k+1} = j | X_k = r) Pr(X_k = r) = Σ_{r∈S} p_{r→j} Pr(X_k = r)

In matrix form it can be written as:

| Pr(X_1) |         | p_{1→1}  p_{2→1}  ⋯  p_{N→1} | | Pr(X_1) |
| Pr(X_2) |       = | p_{1→2}  p_{2→2}  ⋯  p_{N→2} | | Pr(X_2) |
|    ⋮    |         |    ⋮        ⋮            ⋮   | |    ⋮    |
| Pr(X_N) |_{k+1}   | p_{1→N}  p_{2→N}  ⋯  p_{N→N} | | Pr(X_N) |_k

where N is the number of states of the Markov Chain. For the chain in the figure:

      | 0.1  0.5  0.6 |
K  =  | 0.6  0.2  0.3 |
      | 0.3  0.3  0.1 |

Properties of the Transition Matrix K:

1.  0 ≤ p_{i→j}^{(n)} ≤ 1

2.  Σ_{j=1}^{N} p_{i→j}^{(n)} = 1
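The marginal-distribution recursion Pr_{k+1} = K Pr_k can be iterated directly. This sketch uses the numeric K as reconstructed from the slide (column-stochastic, so that each column i holds the probabilities of leaving state i); the starting distribution and iteration count are illustrative. Repeated application converges to the stationary distribution, here (0.375, 0.375, 0.25):

```python
# Column-stochastic transition matrix as reconstructed from the slide:
# the state-probability vector evolves as Pr_{k+1} = K Pr_k.
K = [[0.1, 0.5, 0.6],
     [0.6, 0.2, 0.3],
     [0.3, 0.3, 0.1]]

def evolve(pr):
    """One step of the marginal distribution: (K pr)_j = sum_i K[j][i] pr_i."""
    return [sum(K[j][i] * pr[i] for i in range(3)) for j in range(3)]

pr = [1.0, 0.0, 0.0]          # start surely in state X1
for _ in range(100):
    pr = evolve(pr)           # converges to the stationary distribution
```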
223
SOLO
Markov Chains
Random Processes
Properties of Markov Chains (continue – 2)
A state j is said to be accessible from a state i (written i → j) if a system started in state i
has a non-zero probability of transitioning into state j at some point. Formally,
state j is accessible from state i if there exists an integer n ≥ 0 such that:

Pr(X_n = j | X_0 = i) = p_{i→j}^{(n)} > 0
Reducibility
Allowing n to be zero means that every state is defined to be accessible from itself.
A state i is said to communicate with state j (written i ↔ j) if both i → j and j → i. A set
of states C is a communicating class if every pair of states in C communicates with each
other, and no state in C communicates with any state not in C. It can be shown that
communication in this sense is an equivalence relation and thus that communicating
classes are the equivalence classes of this relation. A communicating class is closed if
the probability of leaving the class is zero, namely that if i is in C but j is not, then j is
not accessible from i.
Finally, a Markov chain is said to be irreducible if its state space is a single
communicating class; in other words, if it is possible to get to any state from any state.
224
SOLO
Markov Chains
Random Processes
Properties of Markov Chains (continue – 3)
A state i has period k if any return to state i must occur in multiples of k time steps.
Formally, the period of a state is defined as:
k := greatest common divisor { n : Pr(X_n = i | X_0 = i) > 0 }
Periodicity
Note that even though a state has period k, it may not be possible to reach the state in k
steps. For example, suppose it is possible to return to the state in {6,8,10,12,...} time
steps; then k would be 2, even though 2 does not appear in this list.
If k = 1, then the state is said to be aperiodic; otherwise (k>1), the state is said to be
periodic with period k.
It can be shown that every state in a communicating class must have the same period.
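The gcd definition of the period can be computed mechanically from a transition matrix. This sketch (function names and the example chains are illustrative assumptions) tracks which states are reachable in exactly n steps and takes the gcd of the return times:

```python
from math import gcd

def period(P, i, max_n=50):
    """Period of state i: gcd of all n <= max_n with Pr(X_n = i | X_0 = i) > 0.
    `reach[j]` marks the states reachable from i in exactly n steps."""
    n_states = len(P)
    reach = [s == i for s in range(n_states)]
    g = 0
    for n in range(1, max_n + 1):
        reach = [any(reach[r] and P[r][j] > 0 for r in range(n_states))
                 for j in range(n_states)]
        if reach[i]:
            g = gcd(g, n)   # gcd(0, n) == n, so the first return initializes g
    return g

# A deterministic 2-cycle: state 0 is revisited only at even n, so period 2.
P_cyclic = [[0.0, 1.0],
            [1.0, 0.0]]

# A self-loop makes state 0 aperiodic (period 1).
P_aperiodic = [[0.5, 0.5],
               [1.0, 0.0]]
```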
225
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Given a function S(ω) = S(−ω) or, equivalently, a positive-definite function R(τ)
(R(τ) = R(−τ), and R(0) = max R(τ), for all τ), we can find a stochastic process x(t)
having S(ω) as its power spectrum or R(τ) as its autocorrelation.

Proof of Existence Theorem 3

Define

a² := (1/π) ∫_{−∞}^{+∞} S(ω) dω   &   f(ω) := S(ω)/(π a²) = S(−ω)/(π a²) = f(−ω)

Since f(ω) ≥ 0 and ∫_{−∞}^{+∞} f(ω) dω = 1, according to Existence Theorem 1
we can find a random variable ω with the even density function f(ω), and
probability distribution function

P(ω) := ∫_{−∞}^{ω} f(τ) dτ

We now form the process x(t) := a cos(ωt + ϑ), where ϑ is a random variable
uniformly distributed in the interval (−π, +π) and independent of ω.
226
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Given a function S(ω) = S(−ω) or, equivalently, a positive-definite function R(τ)
(R(τ) = R(−τ), and R(0) = max R(τ), for all τ), we can find a stochastic process x(t)
having S(ω) as its power spectrum or R(τ) as its autocorrelation.

Proof of Existence Theorem 3 (continue – 1)

Since ϑ is uniformly distributed in the interval (−π, +π) and independent of ω:

E{x(t, Ω)} = a E{cos(ωt)} E{cos ϑ} − a E{sin(ωt)} E{sin ϑ} = 0      (E{cos ϑ} = E{sin ϑ} = 0)

Indeed, for an integer ϖ:

E{e^{jϖϑ}} = (1/2π) ∫_{−π}^{+π} e^{jϖϑ} dϑ = (e^{jϖπ} − e^{−jϖπ}) / (2πjϖ) = sin(ϖπ)/(ϖπ)

or  E{e^{jϖϑ}} = E{cos(ϖϑ)} + j E{sin(ϖϑ)} = sin(ϖπ)/(ϖπ) = 0   for ϖ = 1, 2

E{x(t, Ω) x(t + τ, Ω)} = a² E{cos(ωt + ϑ) cos(ω(t + τ) + ϑ)}
   = (a²/2) E{cos(ωτ)} + (a²/2) E{cos(ω(2t + τ) + 2ϑ)}
   = (a²/2) E{cos(ωτ)} + (a²/2) E{cos(ω(2t + τ))} E{cos 2ϑ} − (a²/2) E{sin(ω(2t + τ))} E{sin 2ϑ}
   = (a²/2) E{cos(ωτ)}      (E{cos 2ϑ} = E{sin 2ϑ} = 0)
227
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Given a function S(ω) = S(−ω) or, equivalently, a positive-definite function R(τ)
(R(τ) = R(−τ), and R(0) = max R(τ), for all τ), we can find a stochastic process x(t)
having S(ω) as its power spectrum or R(τ) as its autocorrelation.

Proof of Existence Theorem 3 (continue – 2)

We have x(t) := a cos(ωt + ϑ), with

E{x(t)} = 0
E{x(t) x(t + τ)} = (a²/2) E{cos(ωτ)} = (a²/2) ∫_{−∞}^{+∞} f(ω) cos(ωτ) dω = R_x(τ)
                                                            (definition of f(ω))

Because of those two properties x(t) is wide-sense stationary, with a power spectrum
given by the Fourier Transform:

S_x(ω) = ∫ R_x(τ) [cos(ωτ) − j sin(ωτ)] dτ = ∫ R_x(τ) cos(ωτ) dτ      (R_x(τ) = R_x(−τ))

and, by the Inverse Fourier Transform:

R_x(τ) = (1/2π) ∫ S_x(ω) [cos(ωτ) + j sin(ωτ)] dω = (1/2π) ∫ S_x(ω) cos(ωτ) dω
                                                               (S_x(ω) = S_x(−ω))

Comparing with R_x(τ) = (a²/2) ∫ f(ω) cos(ωτ) dω and the definition of f(ω):

S_x(ω) = π a² f(ω) = S(ω)

q.e.d.
228
SOLO Permutation & Combinations
Permutations
Given n objects that can be arranged in a row, how many different permutations
(new orderings of the objects) are possible?
To count the possible permutations, let us start by moving only the first object {1}.

[Figure: the n permutations obtained by moving object {1} through each position]

By moving only the first object {1}, we obtained n permutations.
229
SOLO Permutation & Combinations
Permutations (continue -1)
Since we obtained all the possible position of the first object, we will perform the same
procedure with the second object no {2}, that will change position with all other objects,
in each of the n permutations that we obtained before .
For example from the group 1 we obtain the following new permutations
Number of new permutations
Since this is true for all permutations (n-1 new permutations for each of the first n
permutations) we obtain a total of n (n-1) permutations .
1
2
n-2
n-1
230
SOLO Permutation & Combinations
Permutations (continue -2)
If we will perform the same procedure with the third object {3}, that will change position
with all other objects, besides those with objects no {1} and {2} that we already obtained,
in each of the n (n-1) permutations that we obtained before , we will obtain a total
of n (n-1) (n-2) permutations.
We continue the procedure with the objects {4}, {5}, …, {n}, to obtain finally the total
number of permutations of the n objects:
n (n-1) (n-2) (n-3)… 1 = n !
Gamma Function Γ

The gamma function Γ is defined as:

Γ(a) := ∫_0^∞ t^{a−1} exp(−t) dt

Integrating by parts (u = tⁿ, dv = exp(−t) dt):

Γ(n + 1) = ∫_0^∞ tⁿ exp(−t) dt = [−tⁿ exp(−t)]_0^∞ + n ∫_0^∞ t^{n−1} exp(−t) dt = n Γ(n)

Γ(1) = ∫_0^∞ exp(−t) dt = [−exp(−t)]_0^∞ = 1

Therefore, if a = n is an integer:

Γ(n + 1) = n Γ(n) = n (n − 1) Γ(n − 1) = ⋯ = n (n − 1) (n − 2) ⋯ 2 · 1 = n!
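The identity Γ(n + 1) = n! can be confirmed numerically from the integral definition. This is an illustrative sketch (quadrature scheme, truncation point, and names are assumptions): a simple Riemann/trapezoid sum on [0, 60], where the integrand t⁵e^{−t} for a = 6 is already negligible at the upper end.

```python
import math

def gamma_integral(a, upper=60.0, steps=200000):
    """Approximate Gamma(a) = integral_0^inf t^(a-1) e^(-t) dt by a
    trapezoid sum on [0, upper]; for a > 1 the integrand vanishes at
    t = 0 and is negligible at t = upper."""
    h = upper / steps
    total = 0.0
    for k in range(1, steps):
        t = k * h
        total += t**(a - 1.0) * math.exp(-t)
    return total * h

g6 = gamma_integral(6.0)   # Gamma(6) = 5! = 120
```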
Table of Content
231
SOLO Permutation & Combinations
Combinations
Given k boxes, box i having a maximum object capacity of nᵢ objects.
Given also n objects that must be arranged in the k boxes, each box filled to its
maximum capacity:
n₁ + n₂ + ⋯ + n_k = n
The order of the objects within a box is not important.
Example: a box with a capacity of three objects in which we arranged the objects {2}, {4}, {7}.
The 3! = 6 arrangements
{2,4,7}, {2,7,4}, {4,2,7}, {4,7,2}, {7,2,4}, {7,4,2}
are equivalent: they count as 1 outcome.
232
SOLO Permutation & Combinations
Combinations (continue - 1)
In order to count the different combinations we start with the n! different arrangements of the
n objects.
In each of the n! arrangements the first n₁ objects go to box no. 1, the next n₂
objects to box no. 2, and so on, the last n_k objects going to box no. k; since
n₁ + n₂ + ⋯ + n_k = n
all the objects are in one of the boxes.
233
SOLO Permutation & Combinations
Combinations (continue - 2)
But since the order of the objects in the boxes is not important, to obtain the number of
different combinations we must divide the total number of permutations n! by n₁!, because
of box no. 1, as seen in the example below, where we used n₁ = 2.
[Figure: the arrangements 1 2 3 4 … n−1 n distributed to Box 1 (n₁ = 2 objects), Box 2 (n₂ objects), …, Box k (n_k objects); the n₁! orderings of objects {1} and {2} inside Box 1 give the same combination, so each combination appears n₁! times]
Therefore, since the order of the objects in the boxes is not important, and because
box no. 1 contains exactly n₁ objects, the number of arrangements reduces to n!/n₁!.
234
SOLO Permutation & Combinations
Combinations (continue - 3)
Since the order of the objects in the boxes is not important, to obtain the number of
different combinations we must divide the total number of arrangements n! by n₁!, because
of box no. 1, by n₂!, because of box no. 2, and so on, up to n_k!, because of box no. k, to obtain
n! / (n₁! n₂! ⋯ n_k!)
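This multinomial count can be checked by brute force for small sizes, treating each box as an unordered set; a minimal sketch (the helper names are mine, not from the source):

```python
import math
from itertools import permutations

def multinomial(*ns):
    """Number of ways to put n = sum(ns) distinct objects into boxes of
    sizes n1, ..., nk, ignoring order inside each box: n!/(n1! n2! ... nk!)."""
    coef = math.factorial(sum(ns))
    for ni in ns:
        coef //= math.factorial(ni)
    return coef

def brute(ns):
    """Brute-force count: enumerate all n! arrangements and collapse those
    that fill every box with the same (unordered) set of objects."""
    n = sum(ns)
    seen = set()
    for perm in permutations(range(n)):
        boxes, i = [], 0
        for ni in ns:
            boxes.append(frozenset(perm[i:i + ni]))
            i += ni
        seen.add(tuple(boxes))
    return len(seen)

assert multinomial(2, 1) == 3
assert multinomial(2, 2, 1) == brute((2, 2, 1))   # 5!/(2! 2! 1!) = 30
print(multinomial(2, 2, 1))
```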
Combinations to Bernoulli Trials
To Generalized Bernoulli Trials
Table of Content
235
SOLO Review of Probability
References
[1] W.B. Davenport, Jr., and W.L. Root, "An Introduction to the Theory of Random Signals and Noise", McGraw-Hill, 1958
[2] A. Papoulis, "Probability, Random Variables and Stochastic Processes", McGraw-Hill, 1965
[3] K. Sam Shanmugan, and A.M. Breipohl, "Random Signals – Detection, Estimation and Data Analysis", John Wiley & Sons, 1988
[4] S.M. Ross, "Introduction to Probability Models", 4th Ed., Academic Press, 1989
[5] S.M. Ross, "A Course in Simulation", Macmillan & Collier Macmillan Publishers, 1990
[6] R.N. McDonough, and A.D. Whalen, "Detection of Signals in Noise", 2nd Ed., Academic Press, 1995
[7] Al. Spătaru, "Teoria Transmisiunii Informaţiei – Semnale şi Perturbaţii" ("Theory of Information Transmission – Signals and Perturbations", in Romanian), Editura Tehnică, Bucureşti, 1965
[8] http://www.york.ac.uk/depts/maths/histstat/people/welcome.htm
[9] http://en.wikipedia.org/wiki/Category:Probability_and_statistics
[10] http://www-groups.dcs.st-and.ac.uk/~history/Biographies
Table of Content
236
SOLO Review of Probability
Integrals Used in Probability
∫₀¹ uⁿ (1−u)^m du = n! m! / (n+m+1)!
∫ x exp (a x) dx = exp (a x) (x/a − 1/a²)
∫ x² exp (a x) dx = exp (a x) (x²/a − 2x/a² + 2/a³)
∫₀^∞ exp (−x²) dx = (1/2) √π
∫₀^∞ exp (−a x²) dx = (1/2) √(π/a),  a > 0
∫₋∞^∞ exp (−x²) dx = √π
∫₋∞^∞ exp (−a x²) dx = √(π/a),  a > 0
∫₀^∞ xⁿ exp (−x) dx = n!,  n = 0, 1, 2, 3, …
∫₀^∞ xⁿ exp (−a x) dx = n! / a^(n+1),  a > 0,  n = 0, 1, 2, 3, …
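These identities can be sanity-checked numerically; a crude trapezoidal-rule sketch (truncating the infinite upper limits where the integrand is negligible):

```python
import math

def integrate(f, a, b, n=100_000):
    """Composite trapezoidal rule: a crude numerical check, not a proof."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        s += f(a + i * h)
    return s * h

# Beta integral: int_0^1 u^n (1-u)^m du = n! m! / (n+m+1)!  with n=3, m=2
val1 = integrate(lambda u: u**3 * (1 - u)**2, 0.0, 1.0)
assert math.isclose(val1, math.factorial(3) * math.factorial(2) / math.factorial(6),
                    rel_tol=1e-6)

# Gaussian integral: int_0^inf exp(-x^2) dx = sqrt(pi)/2 (tail cut at x = 10)
val2 = integrate(lambda x: math.exp(-x * x), 0.0, 10.0)
assert math.isclose(val2, math.sqrt(math.pi) / 2, rel_tol=1e-6)

# int_0^inf x^n exp(-a x) dx = n!/a^(n+1) with n = 4, a = 2 (tail cut at x = 40)
val3 = integrate(lambda x: x**4 * math.exp(-2 * x), 0.0, 40.0)
assert math.isclose(val3, math.factorial(4) / 2**5, rel_tol=1e-6)
print("integrals verified")
```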
237
SOLO Review of Probability
Gamma Function
238
SOLO Review of Probability
Incomplete Gamma Function
January 6, 2015 239
SOLO
Technion – Israel Institute of Technology
1964 – 1968 BSc EE
1968 – 1971 MSc EE
Israeli Air Force
1970 – 1974
RAFAEL – Israel Armament Development Authority
1974 – 2013
Stanford University
1983 – 1986 PhD AA
240
SOLO Review of Probability
Ferdinand Georg Frobenius
(1849 –1919)
Perron–Frobenius Theorem
In linear algebra, the Perron–Frobenius Theorem, named
after Oskar Perron and Georg Frobenius, asserts that a real
square matrix with positive entries has a unique largest real
eigenvalue and that the corresponding eigenvector has
strictly positive components. This theorem has important
applications to probability theory (ergodicity of Markov
chains) and the theory of dynamical systems (subshifts of
finite type).
Oskar Perron
(1880 – 1975)
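The probabilistic application mentioned above can be observed numerically: for a Markov transition matrix with positive entries, the Perron eigenvalue is 1 and power iteration converges to the strictly positive stationary distribution. A minimal sketch (the 3×3 chain is an arbitrary example):

```python
# Perron-Frobenius in action: a Markov transition matrix P (positive entries,
# rows summing to 1) has a unique largest eigenvalue, equal to 1; iterating
# pi <- pi P converges to the strictly positive stationary distribution.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]                       # example positive chain

pi = [1.0 / 3.0] * 3                        # arbitrary starting distribution
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
    s = sum(pi)
    pi = [x / s for x in pi]                # s -> Perron eigenvalue = 1

print(pi)
```

The limit is independent of the starting distribution, which is the ergodicity property of such chains.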
SOLO Review of Probability
Monte Carlo Categories
1. Monte Carlo Calculations
Design various random or pseudo-random number generators.
2. Monte Carlo Sampling
Develop efficient (variance – reduction oriented) sampling techniques for
estimation.
3. Monte Carlo Optimization
Optimize (possibly non-convex, non-differentiable) functions by methods such as
simulated annealing, dynamic weighting, and genetic algorithms.
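A classic instance of category 2 is estimating a quantity by random sampling, e.g. π/4 as the probability that a uniform point in the unit square falls inside the quarter disc; a minimal sketch:

```python
import random

# Monte Carlo estimation: P[(u,v) in the quarter disc] = pi/4 for (u,v)
# uniform on the unit square; the sample frequency estimates it with
# standard error O(1/sqrt(N)).
rng = random.Random(42)
N = 200_000
hits = sum(1 for _ in range(N)
           if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
pi_hat = 4.0 * hits / N
print(pi_hat)
```

Variance-reduction techniques such as importance sampling, covered earlier in the deck, aim precisely at shrinking this O(1/√N) error constant.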

Introduction to Mathematical Probability

  • 1.
  • 2.
    2 SOLO Table of Content Probability SetTheory Probability Definitions Theorem of Addition Conditional Probability Total Probability Theorem Statistical Independent Events Theorem of Multiplication Conditional Probability - Bayes Formula Random Variables Probability Distribution and Probability Density Functions Conditional Probability Distribution and Conditional Probability Density Functions Expected Value or Mathematical Expectation Variance Moments Functions of one Random Variable Jointly, Distributed Random Variables Characteristic Function and Moment-Generating Function Existence Theorems (Theorem 1 & Theorem 2)
  • 3.
    3 SOLO Table of Content(continue - 1) Probability Law of Large Numbers (History) Markov’s Inequality Chebyshev’s Inequality Bienaymé’s Inequality Chernoff’s and Hoeffding’s Bounds Chernoff’s Bound Hoeffding’s Bound Convergence Concepts The Law of Large Numbers Central Limit Theorem Bernoulli Trials – The Binomial Distribution Poisson Asymptotical Development (Law of Rare Events) Normal (Gaussian) Distribution De Moivre-Laplace Asymptotical Development Laplacian Distribution Gama Distribution Beta Distribution Distributions
  • 4.
    4 SOLO Table of Content(continue - 2) Probability Cauchy Distribution Exponential Distribution Chi-square Distribution Student’s t-Distribution Uniform Distribution (Continuous) Rayleigh Distribution Rice Distribution Weibull Distribution Kinetic Theory of Gases Maxwell’s Velocity Distribution Molecular Models Boltzman Statistics Bose-Einstein Statistics Fermi-Dirac Statistics Monte Carlo Method Generating Continuous Random Variables Importance Sampling Generating Discrete Random Variables Metropolis & Metropolis – Hastings Algorithms Markov Chain Monte Carlo (MCMC) Gibbs Sampling Monte Carlo Integration
  • 5.
    5 SOLO Table of Content(continue - 3) Probability Appendices Permutations Combinations References Random Processes Stationarity of a Random Process Ergodicity Markov Processes White Noise Markov Chains Existence Theorems (Theorem 3)
  • 6.
    6 SOLO Set Theory A= (ζ1, ζ2,…, ζn) – a set of n elements A set A is a collection of objects (elements of the set) ζ1, ζ2,…, ζn A (x)= (|x| < 1) – a set of all numbers smaller than 1 A (x,y)= (0 <x <T, 0<y<T) – a set of points (x,y) in a square Ǿ - the set that contains no elements S - the set that contains all elements (Set space) S A = Set space of a die: six independent events {1}, {2}, {3}, {4}, {5}, {6} Examples
  • 7.
    7 SOLO Set Theory SetOperations Inclusion - A is included in B ifBA ⊂ BxAx ∈⇒∈∀ Equality ( ) ( )ABandBABA ⊂⊂⇔= Addition BxorAxifBAxBA ∈∈∪∈⇒∪ Multiplication BxandAxifBAxBA ∈∈∩∈⇒∩ ( ) ( )CBACBA ∪∪=∪∪ AAA =∪ AOA =/∪ SSA =∪ AAA =∩ OOA /=/∩ ASA =∩ S A B S A B BA∪ S A B BA∩ Complement ofA A OAAandSAA /=∩=∪⇒ S A A Difference BABA ∩=− S BA− B A AB −
  • 8.
    8 SOLO Set Theory SetOperations Incompatible Sets A and B are incompatible iff OBA /=∩ Decomposition of a Set jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 S OBA /=∩ B A S nAAAA ∪∪∪= 21 1A 2A nA jiOAA ji ≠∀/=∩ S 1A 2A nA If we say that A is decomposed in incompatible sets. jiOAAandSAAA jin ≠∀/=∩=∪∪∪ 21If we say that the set space S is decomposed in exhaustive and incompatible sets. De Morgan Law ( ) BABA ∩=∪ ( ) BABA ∪=∩ To find the complement of a set operations we must interchange between and , and use the complements of the sets.∩ ∪ August De Morgan (1806 – 1871) On other form of De Morgan Law AA i i i  = i i i i AA  = Table of Content
  • 9.
    9 SOLO Probability Pr (A)is the probability of the event A if S nAAAA ∪∪∪= 21 1A 2A nA jiOAA ji ≠∀/=∩ ( ) 0Pr ≥A(1) (3) If jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 ( ) 1Pr =S(2) then ( ) ( ) ( ) ( )nAAAA PrPrPrPr 21 +++=  Probability Axiomatic Definition Probability Geometric Definition Assume that the probability of an event in a geometric region A is defined as the ratio between A surface to surface of S. ( ) ( ) ( )SSurface ASurface A =Pr ( ) 0Pr ≥A(1) ( ) 1Pr =S(2) (3) If jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 then ( ) ( ) ( ) ( )nAAAA PrPrPrPr 21 +++=  S A
  • 10.
    10 SOLO Probability From thosedefinition we can prove the following:( ) 0=/OP(1’) Proof: OOSandOSS /=/∩/∪= ( ) ( ) ( ) ( ) ( ) 0PrPrPrPr 3 =/⇒/+=⇒ OOSS ( ) ( )APAP −= 1(2’) Proof: OAAandAAS /=∩∪= ( ) ( ) ( ) ( ) ( ) ( ) ( )AAAAS Pr1PrPrPr1Pr 32 −=⇒+==⇒ ( ) 1Pr0 ≤≤ A(3’) Proof: ( ) ( ) ( ) ( ) ( ) 1Pr0Pr1Pr 1'2 ≤⇒≥−= AAA ( ) ( )APr0 1 ≤ ( ) 0Pr ≥A(1) ( ) 1Pr =S(2) (3) If jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 then ( ) ( ) ( ) ( )n AAAA PrPrPrPr 21 +++=  ( ) ( )AABAIf PrPr ≤⇒⊂(4’) Proof: ( ) ( ) ( ) ( ) ( ) ( )BAAABB PrPr0PrPrPr 00 3 ≤⇒≥+−= ≥≥  ( ) ( ) OAABandAABB /=∩−∪−= ( ) ( ) ( ) ( )BABABA ∩−+=∪ PrPrPrPr(5’) Proof: ( ) ( ) ( ) ( ) ( ) ( ) OABBAandABBAB OABAandABABA /=−∩∩−∪∩= /=−∩−∪=∪ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )BABABA ABBAB ABABA ∩−+=∪⇒     −+∩= −+=∪ PrPrPrPr PrPrPr PrPrPr 3 3 Table of Content
  • 11.
    11 SOLO Probability ( )( ) ( ) ( )      −+−+−=      = −       ≠≠       ≠       == ∑∑∑   n i i n n kji kji kji n ji ji ji n i i n i i AAAAAAAA 1 1 3 ,. 2 . 1 11 Pr1PrPrPrPr(6’) Proof by induction: ( ) ( ) ( ) ( )212121 PrPrPrPr AAAAAA ∩−+=∪For n = 2 we found that satisfies the equation Assume equation true for n – 1. ( ) ( ) ( ) ( ) ( ) ( )   −+−+−=     − = −       − ≠≠       − ≠       − = − = ∑∑∑   1 1 2 3 1 ,. 2 1 . 1 1 1 1 1 Pr1PrPrPrPr n i ni n n kji kji nkji n ji ji nji n i ni n i ni AAAAAAAAAAAAA Let calculate for n but ( ) ( ) ( ) ( ) ( ) ( ) ( )   −+     −+−+−=    −+     =         =      − = − = −       − ≠≠       − ≠       − = − = − = − == ∑∑∑     1 1 1 1 2 3 1 ,. 2 1 . 1 1 1 1 1 1 1 1 11 Pr1PrPrPr PrPrPrPrPr n i nin n i i n n kji kji kji n ji ji ji n i i n n i in n i in n i i n i i AAPAPAAAAAAA AAAAAAA ( ) ( ) ( ) ( )      −+−+−=      − = −       − ≠≠       − ≠       − = − = ∑∑∑   1 1 2 3 1 ,. 2 1 . 1 1 1 1 1 Pr1PrPrPrPr n i i n n kji kji kji n ji ji ji n i i n i i AAAAAAAA Theorem of Addition
  • 12.
    12 SOLO Probability (6’) Proof byinduction (continue): ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )   −−−−+−+      −++−+−=      − = −       − ≠≠       − ≠       − = − = −       − ≠≠≠       − ≠≠       − ≠       − == ∑∑∑ ∑∑∑∑     1 1 2 3 1 ,. 2 1 . 1 1 1 1 1 2 4 1 .,. 3 1 ,. 2 1 . 1 1 11 Pr1PrPrPrPr Pr1PrPrPrPrPr n i ni n n kji kji nkji n ji ji nji n i nin n i i n n lkji lkji lkji n kji kji kji n ji ji ji n i i n i i AAAAAAAAAAAA AAAAAAAAAAAA Use the fact that ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )       − − +      − = −−− − + −− − = −− − − = − =      1 11 !1!1 !1 !!1 !1 !!1 !1 !! ! k n k n kkn n kkn n kkn n kn n kkn n k n to obtain q.e.d. ( ) ( ) ( ) ( )      −+−+−=      = −       ≠≠       ≠       == ∑∑∑   n i i n n kji kji kji n ji ji ji n i i n i i AAAAAAAA 1 1 3 ,. 2 . 1 11 Pr1PrPrPrPr ( ) ( ) ( ) ( ) ( ) ( )   −+−+−=     − = −       − ≠≠       − ≠       − = − = ∑∑∑   1 1 2 3 1 ,. 2 1 . 1 1 1 1 1 Pr1PrPrPrPr n i ni n n kji kji nkji n ji ji nji n i ni n i ni AAAAAAAAAAAAA ( ) ( ) ( ) ( )      −+−+−=      = −       ≠≠       ≠       == ∑∑∑   n i i n n kji kji kji n ji ji ji n i i n i i AAAAAAAA 1 1 3 ,. 2 . 1 11 Pr1PrPrPrPr Theorem of Addition (continue) Table of Content
  • 13.
    13 SOLO Probability Conditional Probability SnAAAA ααα ∪∪∪= 21  1αA jiOAA ji ≠∀/=∩ 1αβA mAAAB βββ ∪∪∪= 212αA 2αβA 1βA 2βA  Given two events A and B decomposed in elementary events jiOAAandAAAAA ji n i in ≠∀/=∩=∪∪∪= = αααααα  1 21 lkOAAandAAAAB lk m k km ≠∀/=∩=∪∪∪= = ββββββ  1 21 jiOAAandAAABA jir ≠∀/=∩∪∪∪=∩ αβαβαβαβαβ 21 ( ) ( ) ( ) ( )n AAAA ααα PrPrPrPr 21 +++=  ( ) ( ) ( ) ( )mAAAB βββ PrPrPrPr 21 +++=  ( ) ( ) ( ) ( ) nmrAAABA r ,PrPrPrPr 21 ≤+++=∩ βαβαβα  We want to find the probability of A event under the condition that the event B had occurred designed as P (A|B) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )B BA AAA AAA BA m r Pr Pr PrPrPr PrPrPr |Pr 21 21 ∩ = +++ +++ = βββ βαβαβα  
  • 14.
    14 SOLO Probability Conditional ProbabilityS nAAAA ααα ∪∪∪= 21  1αA jiOAA ji ≠∀/=∩ 1αβA mAAAB βββ ∪∪∪= 212αA 2αβA 1βA 2βA  If the events A and B are statistical independent, that the fact that B occurred will not affect the probability of A to occur. ( ) ( ) ( )B BA BA Pr Pr |Pr ∩ = ( ) ( ) ( )A BA AB Pr Pr |Pr ∩ = ( ) ( )ABA Pr|Pr = ( ) ( ) ( ) ( ) ( ) ( ) ( )BAAABBBABA PrPrPr|PrPr|PrPr ⋅=⋅=⋅=∩ Definition: n events Ai i = 1,2,…n are statistical independent if: ( ) nrAA r i i r i i ,,2PrPr 11  =∀=      ∏== Table of Content
  • 15.
    15 SOLO Probability Conditional Probability- Bayes Formula Using the relation: ( ) ( ) ( ) ( ) ( )llll AABBBABA ββββ Pr|PrPr|PrPr ⋅=⋅=∩ ( ) ( ) ( ) klOBABABAB lk m k k , 1 ∀/=∩∩∩∩= = βββ ( ) ( )∑ = ∩= m k k BAB 1 PrPr β we obtain: ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∑= ⋅ ⋅ = ⋅ = m k kk llll l AAB AAB B AAB BA 1 Pr|Pr Pr|Pr Pr Pr|Pr |Pr ββ ββββ β and Bayes Formula Thomas Bayes 1702 - 1761 Table of Content ( ) ( ) ( ) ( ) ( ) ( )∑∑∑ === ⋅=⋅=∩= m k kk m k k m k k AABBBABAB 111 Pr|PrPr|PrPrPr ββββ
  • 16.
    16 SOLO Probability Total ProbabilityTheorem Table of Content jiOAAandSAAA jin ≠∀/=∩=∪∪∪ 21If we say that the set space S is decomposed in exhaustive and incompatible (exclusive) sets. The Total Probability Theorem states that for any event B, its probability can be decomposed in terms of conditional probability as follows: ( ) ( ) ( ) ( )∑∑ == == n i i n i i BPBABAB 11 |Pr,PrPr Using the relation: ( ) ( ) ( ) ( ) ( )llll AABBBABA Pr|PrPr|PrPr ⋅=⋅=∩ ( ) ( ) ( ) klOBABABAB lk n k k , 1 ∀/=∩∩∩∩= =  ( ) ( )∑= ∩= n k k BAB 1 PrPr For any event B we obtain:
  • 17.
    17 SOLO Probability Statistical IndependentEvents ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∏∑∏∑∏∑ ∑∑∑ = −       ≠≠ =       ≠ =       = = −       ≠≠       ≠       == −+−+−=       −+−+−=      n i i n n kji kji i i n ji ji i i n i i tIndependen lStatisticaA n i i n n kji kji kji n ji ji ji n i i n i i AAAA AAAAAAAA i 1 1 3 ,. 3 1 2 . 2 1 1 1 1 1 3 ,. 2 . 1 11 Pr1PrPrPr Pr1PrPrPrPr    From Theorem of Addition Therefore ( )[ ]∏== −=      − n i i tIndependen lStatisticaA n i i AA i 11 Pr1Pr1  ( )[ ]∏== −−=      n i i tIndependen lStatisticaA n i i AA i 11 Pr11Pr  Since OAASAA n i i n i i n i i n i i /=               =               ====   1111 &         =      − ==  n i i n i i AA 11 PrPr1 ( )∏== =      n i i tIndependen lStatisticaA n i i AA i 11 PrPr  If the n events Ai i = 1,2,…n are statistical independent than are also statistical independentiA ( )∏= = n i iA 1 Pr      = =  n i i MorganDe A 1 Pr ( )[ ]∏= −= n i i tIndependen lStatisticaA A i 1 Pr1 ( ) nrAA r i i r i i ,,2PrPr 11  =∀=      ∏== Table of Content
  • 18.
    18 SOLO Probability Theorem ofMultiplication ( ) ( ) ( ) ( ) ( )12112312121 |Pr|Pr|PrPrPr AAAAAAAAAAAAA nnn  −⋅⋅= Proof ( ) ( ) ( )ABABA /PrPrPr ⋅=∩Start from ( )[ ] ( ) ( )12121 /PrPrPr AAAAAAA nn  ⋅= ( ) ( ) ( )2131212 /Pr/Pr/Pr AAAAAAAAA nn  ⋅= in the same way ( ) ( ) ( )12122112211 /Pr/Pr/Pr −−−−− ⋅= nnnnnnn AAAAAAAAAAAAA   From those results we obtain: ( ) ( ) ( ) ( ) ( )12112312121 |Pr|Pr|PrPrPr AAAAAAAAAAAAA nnn  −⋅⋅= q.e.d. Table of Content
  • 19.
    19 SOLO Review ofProbability Random Variables Let ascribe to each outcome or event a real number, such we have a one-to-one correspondence between the real numbers and the Space of Events. Any function that assigns a real number to each event in the Space of Events is called a random variable (a random function is more correct, but this is the used terminology). X x 0 X 1 2 3 4 5 6 x The random variables can be: - Discrete random variables for discrete events - Continuous random variables for continuous events Table of Content
  • 20.
    20 SOLO Review ofProbability Probability Distribution and Probability Density Functions The random variables map the space of events X to the space of real numbers x. ( )xP x 0 ∞+∞− 0.1 The Probability Distribution Function or Cumulative Probability Distribution Function of x can be defined as: (1) PX (x) is a monotonic increasing function ( ) ( ) ∞≤≤∞−≤= xxXxPX Pr: The Probability Distribution Function has the following properties: ( ) ∞≤≤∞−=∞− xPX 0 (2) ( ) ∞≤≤∞−=∞+ xPX 1 (3) ( ) ( ) 2121 xxxPxP XX ≤⇔≤ The Probability that X lies in the interval (a,b) is given by: ( ) ( ) ( ) 0Pr ≥−=≤< aPbPbXa XX If PX (x) is a continuous differentiable function of x we can define ( ) ( ) ( ) ( ) ( ) 0lim Pr lim: 00 ≥= ∆ −∆+ = ∆ ∆+≤< = →∆→∆ xd xPd x xPxxP x xxXx xp XXX xx X the Probability Density Function of x. ( )xp x 0 ∞+∞− 0.1
  • 21.
    21 SOLO Review ofProbability Probability Distribution and Probability Density Functions (continue – 1) The Probability Distribution and Probability Density Functions of x can be defined also for discrete random variables. ( ) ( ) ( ) ( ) ( ) integer 61 616/1 10 6 1 Pr 0 6 10 k k kk k dxixdxxpkXkxP k i k XX      ≤ <≤++ < =−==≤== ∫∑∫ = δ ( )xp x 0 6 0.1 1 2 3 4 5 ( )xP 6/1 3/1 2/1 3/2 6/5 Example Set space of a die: six independent events {x=1}, {x=2}, {x=3}, {x=4}, {x=5}, {x=6} ( ) ( )∑ = −= 6 16 1 : i X ixxp δ Where δ (x) is the Dirac delta function ( ) ( ) 1& 0 00 =    =∞ ≠ = ∫ +∞ ∞− dxx x x x δδ X 1 2 3 4 5 6 x
  • 22.
    22 SOLO Review ofProbability Probability Distribution and Probability Density Functions (Examples) (2) Poisson’s Distribution ( ) ( )0 0 exp ! , k k k nkp k −≈ (1) Binomial (Bernoulli) ( ) ( ) ( ) ( ) knkknk pp k n pp knk n nkp −− −      =− − = 11 !! ! , 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 k ( )nkP , (3) Normal (Gaussian) ( ) ( ) ( )[ ] σπ σµ σµ 2 2/exp ,; 22 −− = x xp (4) Laplacian Distribution ( )         − −= b x b bxp µ µ exp 2 1 ,;
  • 23.
    23 SOLO Review ofProbability Probability Distribution and Probability Density Functions (Examples) (5) Gama Distribution ( ) ( ) ( )      < ≥ Γ − = − 00 0 /exp ,; 1 x xx k x kxp k k θ θ θ (6) Beta Distribution ( ) ( ) ( ) ( ) ( ) ( ) ( ) 11 1 0 11 11 1 1 1 ,; −− −− −− − ΓΓ +Γ = − − = ∫ βα βα βα βα βα βα xx duuu xx xp (7) Cauchy Distribution ( ) ( )       +− = 22 0 0 1 ,; γ γ π γ xx xxp
  • 24.
    24 SOLO Review ofProbability Probability Distribution and Probability Density Functions (Examples) SOLO (8) Exponential Distribution ( ) ( )    < ≥− = 00 0exp ; x xx xp λλ λ (9) Chi-square Distribution ( ) ( ) ( ) ( )      < ≥− Γ= − 00 02/exp 2/ 2/1 ; 12/ 2/ x xxx kkxp k k Γ is the gamma function ( ) ( )∫ ∞ − −=Γ 0 1 exp dttta a (10) Student’s t-Distribution ( ) ( )[ ] ( ) ( )( ) 2/12 /12/ 2/1 ; + +Γ +Γ = ν ννπν ν ν x xp
  • 25.
    25 SOLO Review ofProbability Probability Distribution and Probability Density Functions (Examples) SOLO (11) Uniform Distribution (Continuous) ( )      >> ≤≤ −= bxxa bxa abbaxp 0 1 ,; (12) Rayleigh Distribution ( ) 2 2 2 2 exp ; σ σ σ       − = x x xp (13) Rice Distribution ( )             + − = 202 2 22 2 exp ,; σσ σ σ vx I vx x vxp
  • 26.
    26 SOLO Review ofProbability Probability Distribution and Probability Density Functions (Examples) (14) Weibull Distribution SOLO ( )      < >≥               − −      − = − 00 0,,exp ,,; 1 x x xx xp αγµ α µ α µ α γ αµγ γγ Weibull Distribution Table of Content
  • 27.
    27 SOLO Review ofProbability Conditional Probability Distribution and Conditional Probability Density Functions SOLO The Conditional Probability Distribution Function or Cumulative Conditional Probability Distribution Function of x given is defined as: ( ) ( ) ∞<<∞−∈≤= xYyxXyxP YX /Pr:// Yy ∈ (1) ( ) ∞≤≤∞−=∞− xyP YX 0// (2) ( ) ∞≤≤∞−=∞+ xyP YX 1// PX/Y (x/y) is a monotonic increasing function(3) ( ) ( ) 212/1/ // xxyxPyxP YXYX ≤⇔≤ The Probability that X lies in the interval (a,b) given is given by: ( ) ( ) ( ) 0///Pr // ≥−=≤< yaPybPYbXa YXYX If PX/Y (x/y) is a continuous differentiable function of x we can define ( ) ( ) ( ) ( ) ( ) 0 /// lim /Pr lim:/ /// 00 / ≥= ∆ −∆+ = ∆ ∆+≤< = →∆→∆ xd yxPd x yxPyxxP x YxxXx yxp YXYXYX xx YX the Conditional Probability Density Function of x. Yy ∈ The random variables map the space of events X to the space of real numbers x.
  • 28.
    28 SOLO Review ofProbability Conditional Probability Distribution and Conditional Probability Density Functions SOLO Example 1 Given PX (x) and pX (x) find PX/Y (x/x ≤ a) and pX/Y (x/x ≤ a) ( ) ( ) ( )   ≤ > =≤ axaPxP ax axxP XX YX / 1 // ( ) ( ) ( )   ≤ > =≤ axaPxp ax axxp XX YX / 0 // Example 2 Given PX (x) and pX (x) find PX/Y (x/ b <x ≤ a) and pX/Y (x/ b< x ≤ a) ( ) ( ) ( ) ( ) ( )        ≥ <≤ − − < =≤ ax axb bPaP bPxP bx axxP XX XX YX 1 0 // ( ) ( ) ( ) ( )        ≥ <≤ − < =≤ ax axb bPaP xp bx axxp XX X YX 0 0 // Table of Content
  • 29.
    29 SOLO Review ofProbability Expected Value or Mathematical Expectation Given a Probability Density Function p (x) we define the Expected Value For a Continuous Random Variable: ( ) ( )∫ +∞ ∞− = dxxpxxE X: For a Discrete Random Variable: ( ) ( )∑= k kXk xpxxE : For a general function g (x) of the Random Variable x: ( )[ ] ( ) ( )∫ +∞ ∞− = dxxpxgxgE X: ( )xp x 0 ∞+∞− 0.1 ( )xE ( ) ( ) ( )∫ ∫ ∞+ ∞− +∞ ∞− = dxxp dxxpx xE X X : The Expected Value is the center of surface enclosed between the Probability Density Function and x axis. Table of Content
  • 30.
    30 SOLO Review ofProbability Variance Given a Probability Density Functions p (x) we define the Variance ( ) ( )[ ]{ } ( ) ( )[ ] ( ) ( )22222 2: xExExExExxExExExVar −=+−=−= Central Moment ( ) { }k k xEx =:'µ Given a Probability Density Functions p (x) we define the Central Moment of order k about the origin ( ) ( )[ ]{ } ( ) ( )∑= −− −      =−= k j jk j jkk k xE j k xExEx 0 '1: µµ Given a Probability Density Functions p (x) we define the Central Moment of order k about the Mean E (x) Table of Content
  • 31.
    31 SOLO Review ofProbability Moments Normal Distribution ( ) ( ) ( )[ ] σπ σ σ 2 2/exp ; 22 x xpX − = [ ] ( )    −⋅ = oddnfor evennforn xE n n 0 131 σ [ ] ( )      += =−⋅ = + 12!2 2 2131 12 knfork knforn xE kk n n σ π σ Proof: Start from: and differentiate k time with respect to a( ) 0exp 2 >=−∫ ∞ ∞− a a dxxa π Substitute a = 1/(2σ2 ) to obtain E [xn ] ( ) ( ) 0 2 1231 exp 12 22 > −⋅ =− + ∞ ∞− ∫ a a k dxxax kk k π [ ] ( ) ( )[ ] ( ) ( )[ ] ( ) ( ) 12 ! 0 122/ 0 222221212 !2 2 exp 2 22 2/exp 2 2 2/exp 2 1 2 + ∞+ = ∞∞ ∞− ++ =−= −=−= ∫ ∫∫ kk k k k xy kkk kdyyy xdxxxdxxxxE σ πσ σ π σ σπ σ σπ σ    Now let compute: [ ] [ ]( )2244 33 xExE == σ Chi-square
  • 32.
    32 SOLO Review ofProbability Moments Gama Distribution ( ) ( ) ( )      < ≥ Γ − = − 00 0 /exp ,; 1 x xx k x kxp k k θ θ θ Beta Distribution ( ) ( ) ( ) ( ) ( ) ( ) ( ) 11 1 0 11 11 1 1 1 ,; −− −− −− − ΓΓ +Γ = − − = ∫ βα βα βα βα βα βα xx duuu xx xp [ ] ( ) ( ) ( ) ( ) ( ) ( ) n knn kn k n k kndxx x k dxxx k xE θ θθ θ θ θ θ Γ +Γ =      − Γ =− Γ = ∫∫ ∞ −+∞ −+ 0 1 0 1 /exp/exp 1 Γ is the gamma function ( ) ( )∫ ∞ − −=Γ 0 1 exp dttta a
  • 33.
    33 SOLO Review ofProbability Moments Uniform Distribution (Continuous) ( )      >>− ≤≤− =− cxxc cxc cccxp 0 2 1 ,; [ ]      += + == − + − ∫ oddnfor evennfor n c n x c dxx c xE n c c nc c nn 0 12 12 1 2 1 2 1 Rayleigh Distribution ( )       −= 2 2 2 2 exp; σσ σ xx xp [ ] ( )      = −=−⋅ = =      −=      −= ∫∫ ∞ ∞− + ∞ knfork knforn dx x xdx xx xxE kk n nnn 2!2 12131 2 2 exp 2 1 2 exp 2 2 2 1 0 22 2 2 σ σ π σσσσ 
  • 34.
    34 SOLO Review ofProbability Example Repeat an experiment m times to obtain X1, X2,…,Xm. Define: Statistical Estimation: m XXX X m m +++ = 21 Sample Variation: ( ) ( ) ( ) m XXXXXX V mmmm m 22 2 2 1 −++−+− =  ( ) ( )[ ] 22 σµµ =−= ii XEXE ( ) ( )[ ] jiXXE ji ≠∀=−− 0µµSince the experiment are uncorrelated: ( ) ( ) ( ) ( ) µ= +++ = m XEXEXE XE m m 21 ( ) ( )[ ]{ } ( ) ( ) ( ) mm m m XXX EXEXEXVar m mmmXm 2 2 22 21 22 σσµµµ σ ==               −++−+− =−== 
  • 35.
    35 SOLO Review ofProbability Example (continue) Statistical Estimation: m XXX X m m +++ = 21 Sample Variation: ( ) ( ) ( ) m XXXXXX V mmmm m 22 2 2 1 −++−+− =  Let compute: ( )mVE ( ) ( ) ( )[ ]{ } ( )[ ] ( )[ ] ( )( )[ ] m XXEXEXE m XXE m XX E mimimimi µµµµµµ −−−−+− = −−− =         − 2 2222 ( )( )[ ] ( ) ( ) ( ) ( ) ( )[ ] ( ) ( )[ ] mm XXX XEXXE jiXXE XE mi imi ji i 20 1 22 σµµµ µµµ µµ σµ ≠∀=−− =− =      −++−++− −=−−  Therefore: ( ) ( )[ ] ( )[ ] ( )( )[ ] 2 22 2 222 1 2 2 σ σσ σ µµµµ m m m mm m XXEXEXE m XX E mimimi − = −+ = −−−−+− =         − ( ) ( )[ ] ( )[ ] ( )[ ] 2 2 22 2 2 1 1 1 σ σ m m m m m m m XXEXXEXXE VE mmmm m − = − = −++−+− =  Table of Content
  • 36.
    36 SOLO Review ofProbability Functions of one Random Variable Let y = g (x) a given function of the random variable x defined o the domain Ω, with probability distribution pX (x). We want to find pY (y). Fundamental Theorem Assume x1, x2, …, xn all the solutions of the equation ( ) ( ) ( )n xgxgxgy ==== 21 ( ) ( ) ( ) ( ) ( ) ( ) ( )n nXXX Y xg xp xg xp xg xp yp ''' 2 2 1 1 +++=  ( ) ( ) xd xgd xg =:' Proof ( ) ( ) ( ) ( ) ( ) ( )∑∑∑ === ==±≤≤=+≤≤= n i i iX n i iiX n i iiiY yd xg xp xdxpxdxxxydyYyydyp 111 ' PrPr: q.e.d. Cauchy Distribution Derivation of Chi-square
  • 37.
    37 SOLO Review ofProbability Functions of one Random Variable (continue – 1) Example 1 bxay += ( )       − = a by p a yp XY 1 Example 2 x a y = ( )       = y a p y a yp XY 2 Example 3 2 xay = ( ) ( )yU a y p a y p ya yp XXY                 −+         = 2 1 Example 4 xy = ( ) ( ) ( )[ ] ( )yUypypyp XXY −+= Table of Content
  • 38.
    38 SOLO Review ofProbability Jointly, Distributed Random Variables We are interested in function of several variables. ( ) ( )nnnXXX xXxXxXxxxP n ≤≤≤= ,,,Pr:,,, 22112121  The Jointly Cumulative Probability Distribution of the random variables X1, X2, …,Xn is defined as: The Cumulative Probability Distribution of the random variable Xi, can be obtained from ( ) ( ) ( )∞∞∞=∞≤≤∞≤∞≤= ,,,,,,,,,,Pr 2121   iXXXniiiX xPXxXXXxP ni ( )nXXX xxxP n ,,, 2121  If the Jointly Cumulative Probability Distribution is continuous and differentiable in each of the components than we can define the Joint Probability Density Function as: ( ) ( ) n nXXX n nXXX xxx xxxP xxxp n n ∂∂∂ ∂ =      21 21 21 ,,, :,,, 21 21 ( ) ( )∫ ∫ ∫ ∞ ∞− ∞ ∞− ∞ ∞− ≠ ≠ = ik ik ni nknXXXiX xdxdxdxxxpxp ,,,,,,, 12121  
  • 39.
    39 SOLO Review ofProbability Jointly, Distributed Random Variables (continue – 1) We define: ( )[ ] ( ) ( )∫ ∫ ∞ ∞− ∞ ∞− = nnXXXnn xdxdxxxpxxxgxxxgE n ,,,,,,,,:,,, 1212121 21   ∑ = =+++= m i imm XXXXS 1 21 :  Example: Given the Sum of m Variables [ ] ( ) ( ) ( ) ( )∑∑ ∫ ∫ ∫ ∫ == ∞ ∞− ∞ ∞− ∞ ∞− ∞ ∞− == +++= m i i m i nnXXXi nnXXXnm xExdxdxxxpx xdxdxxxpxxxSE n n 11 121 12121 ,,,,, ,,,,,: 21 21     [ ] ( )[ ]{ } ( ) ( ) ( )[ ] ( )[ ]{ } ( )[ ] ( )[ ]{ } ( ) ( )∑∑∑ ∑∑∑ ∑ ≠ = ≠ == ≠ = ≠ == = ∑= ∑= += −−+−=             −=−= = = m ji i m ij j ji m i i m ji i m ij j jjii m i ii m i ii XS XESE mmm XXCovXVar XEXXEXEXEXE XEXESESESVar m i im m i im 1 11 1 11 2 2 1 2 ,2 2 : 1 1
  • 40.
    40 SOLO Review ofProbability Jointly, Distributed Random Variables (continue – 2) Given the joint density function of n random variables X1, X2, …, Xn: ( )nXXX xxxp n ,,, 2121  we want to find the joint density function of n random variables Y1, Y2, …, Yn that are related to X1, X2, …, Xn, through ( ) ( ) ( )nnn n n XXXgY XXXgY XXXgY ,,, ,,, ,,, 21 2122 2111     = = =                                           ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ =                     n n nnn n n n Xd Xd Xd X g X g X g X g X g X g X g X g X g Yd Yd Yd       2 1 21 2 2 2 1 2 1 2 1 1 1 2 1 Assuming that the Jacobian ( )                       ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ = n nnn n n n X g X g X g X g X g X g X g X g X g XXXJ      21 2 2 2 1 2 1 2 1 1 1 21 det,,, is nonzero for each X1, X2, …, Xn, exists a unique solution Y1, Y2, …, Yn
  • 41.
    41 SOLO Review ofProbability Jointly, Distributed Random Variables (continue – 3) Assume that for a given Y1, Y2, …, Yn we can find k solutions (X1, X2, …, Xn)1,… ( X1, X2, …, Xn)k. ( ) ( ) ( ) ( ) ( ) ( )∑ ∑∑ = == = =±≤≤±≤≤= +≤≤+≤≤= k i n n nXX k i nnXX k i innnn nnnnnnYY ydyd xxJ xxp xdxdxxpxdxXxxdxXx ydyYyydyYyydydyyp n n n 1 1 1 1 1 11 1 1111 111111 ,, ,, ,,,,Pr ,,Pr:,, 1 1 1         Therefore ( ) ( ) ( )∑ = = k i n nXX nYY xxJ xxp yyp n n 1 1 1 1 ,, ,, ,, 1 1      The relation between the differential volume in (Y1, Y2, …, Yn) and the differential volume in (X1, X2, …, Xn) is given by ( ) nnn xdxdxxJydyd  111 ,,= 1xd 2xd 3xd 1yd 2yd 3yd
  • 42.
    42 SOLO Review ofProbability Jointly, Distributed Random Variables (continue – 4) Example 1 ( ) ( ) ( ) ( ) ( ) ( )[ ] ( ) ( ) 0, /exp/exp/exp , 11 11 , ≥ ΓΓ +− = Γ − Γ − = −− + −− yxyx yxyyxx yxp YX βα βαβ β α α θβα θ θβ θ θα θ X and Y are independent gamma random variables with parameters (α,λ) and (β, λ), respectively, compute the joint densities of U= X + Y and V = X / (X + Y) ( ) ( ) ( ) ( )   −= = ⇔    +== +== VUY VUX YXXYXgV YXYXgU 1/, , 2 1 ( ) ( ) UYX YX X YX YJ 11 11 22 −= + −= + − + = ( ) ( ) ( ) ( )[ ] ( )[ ] [ ] ( ) ( ) ( ) ( )[ ] uvuuv u vuuvJ vuuvp yxJ yxp vup YXYX VU 11,, , 1 /exp 1, 1, , , , −− + − ΓΓ − = − − == βα βα θβα θ [ ] ( ) ( ) ( ) ( ) ( ) 11 1 1 /exp −− + −+ − ΓΓ +Γ +Γ − = βα βα βα βα βα θβα θ vv uu Therefore ( ) [ ] ( ) βα βα θβα θ + −+ +Γ − = /exp1 uu upU ( ) ( ) ( ) ( ) ( ) 11 1 −− − ΓΓ +Γ = βα βα βα vvvpV gamma distribution beta distribution Table of Content
  • 43.
    43 SOLO Review ofProbability Characteristic Function and Moment-Generating Function Given a Probability Density Functions pX (x) we define the Characteristic Function or Moment Generating Function ( ) ( )[ ] ( ) ( ) ( ) ( ) ( ) ( )     = ==Φ ∑ ∫∫ +∞ ∞− +∞ ∞− x X XX X discretexxpxj continuousxxPdxjdxxpxj xjE ω ωω ωω exp expexp exp: This is in fact the complex conjugate of the Fourier Transfer of the Probability Density Function. This function is always defined since the condition of the existence of a Fourier Transfer : Given the Characteristic Function we can find the Probability Density Functions pX (x) using the Inverse Fourier Transfer: ( ) ( ) ( ) ∞<== ∫∫ +∞ ∞− ≥+∞ ∞− 1 0 dxxpdxxp X xp X ( ) ( ) ( )∫ +∞ ∞− Φ−= ωωω π dxjxp XX exp 2 1 is always fulfilled.
  • 44.
    44 SOLO Review ofProbability Properties of Moment-Generating Function ( ) ( ) ( )∫ +∞ ∞− = Φ dxxpxxjj d d X X ω ω ω exp ( ) ( ) 10 ==Φ ∫ +∞ ∞− = dxxpXX ω ω ( ) ( ) ( )xEjdxxpxj d d X X == Φ ∫ +∞ ∞−=0ω ω ω ( ) ( ) ( ) ( )∫ +∞ ∞− = Φ dxxpxxjj d d X X 22 2 2 exp ω ω ω ( ) ( ) ( ) ( ) ( )2222 0 2 2 xEjdxxpxj d d X X == Φ ∫ +∞ ∞−=ω ω ω ( ) ( ) ( ) ( )∫ +∞ ∞− = Φ dxxpxxjj d d X nn n X n ω ω ω exp ( ) ( ) ( ) ( ) ( )nn X nn n X n xEjdxxpxj d d == Φ ∫ +∞ ∞−=0ω ω ω   ( ) ( ) ( )∫ +∞ ∞− =Φ dxxpxj XX ωω exp This is the reason why ΦX (ω) is also called the Moment-Generation Function.
45 SOLO Review of Probability
Properties of the Moment-Generating Function
Φ_X (ω) = ∫_{−∞}^{+∞} exp(jωx) p_X (x) dx
Develop Φ_X (ω) in a Taylor series about ω = 0:
Φ_X (ω) = Φ_X (0) + ω dΦ_X/dω |_{ω=0} + (ω²/2!) d²Φ_X/dω² |_{ω=0} + … + (ωⁿ/n!) dⁿΦ_X/dωⁿ |_{ω=0} + …
 = 1 + (jω) E(x)/1! + (jω)² E(x²)/2! + … + (jω)ⁿ E(xⁿ)/n! + …
46 SOLO Review of Probability
Moment-Generating Function
Binomial Distribution: p(k,n) = n!/(k!(n−k)!) p^k (1−p)^(n−k)
Φ(ω) = E[exp(jωk)] = Σ_{k=0}^{n} exp(jωk) n!/(k!(n−k)!) p^k (1−p)^(n−k) = Σ_{k=0}^{n} n!/(k!(n−k)!) [p exp(jω)]^k (1−p)^(n−k) = [p exp(jω) + (1−p)]^n
Poisson Distribution: p(k;λ) = exp(−λ) λ^k/k!, k a nonnegative integer
Φ(ω) = Σ_{k=0}^{∞} exp(jωk) exp(−λ) λ^k/k! = exp(−λ) Σ_{k=0}^{∞} [λ exp(jω)]^k/k! = exp(−λ) exp[λ exp(jω)] = exp{λ[exp(jω) − 1]}
Exponential Distribution: p(x;λ) = λ exp(−λx) for x ≥ 0, 0 for x < 0
Φ(ω) = ∫_0^∞ λ exp(jωx) exp(−λx) dx = λ [exp((jω−λ)x)/(jω−λ)]_0^∞ = λ/(λ − jω)
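A cross-check of the Poisson characteristic function derived above (an illustration we add; the parameter values are arbitrary): summing E[exp(jωk)] term by term should match the closed form exp{λ[exp(jω) − 1]}.

```python
import cmath
import math

lam, w = 3.0, 0.7
# Direct summation of E[exp(j w k)] over the Poisson pmf; the tail beyond k = 80
# is negligible for lambda = 3.
direct = sum(cmath.exp(1j * w * k) * math.exp(-lam) * lam**k / math.factorial(k)
             for k in range(80))
# Closed form from the slide: exp(lambda * (exp(j w) - 1))
closed = cmath.exp(lam * (cmath.exp(1j * w) - 1))
err = abs(direct - closed)
```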
47 SOLO Review of Probability
Moment-Generating Function
Normal Distribution: p(x;μ,σ) = (1/(√(2π)σ)) exp[−(x−μ)²/(2σ²)]
Φ(ω) = (1/(√(2π)σ)) ∫_{−∞}^{+∞} exp(jωx) exp[−(x−μ)²/(2σ²)] dx = (1/(√(2π)σ)) ∫_{−∞}^{+∞} exp{−[(x−μ)² − 2jωσ²x]/(2σ²)} dx
Complete the square:
(x−μ)² − 2jωσ²x = x² − 2(μ + jωσ²)x + μ² = [x − (μ + jωσ²)]² − 2jωσ²μ + σ⁴ω²
Therefore
Φ(ω) = exp(jμω − σ²ω²/2) · (1/(√(2π)σ)) ∫_{−∞}^{+∞} exp{−[x − (μ + jωσ²)]²/(2σ²)} dx
The remaining integral equals 1, hence
Φ(ω) = exp(−σ²ω²/2 + jμω)
Conversely, using the inversion formula p_X (x) = (1/2π) ∫ exp(−jωx) Φ_X (ω) dω,
(1/(√(2π)σ)) exp[−(x−μ)²/(2σ²)] = (1/2π) ∫_{−∞}^{+∞} exp(−σ²ω²/2 − jω(x−μ)) dω
48 SOLO Review of Probability
Properties of the Moment-Generating Function
Moment-Generating Function of the Sum of Independent Random Variables
Given the sum of independent random variables S_m := X₁ + X₂ + … + X_m:
Φ_{S_m}(ω) = E[exp(jω(X₁+…+X_m))] = ∫…∫ exp[jω(x₁+…+x_m)] p_{X₁,…,X_m}(x₁,…,x_m) dx₁…dx_m
For independent random variables p_{X₁,…,X_m}(x₁,…,x_m) = p_{X₁}(x₁)…p_{X_m}(x_m), so the integral factorizes:
Φ_{S_m}(ω) = Φ_{X₁}(ω) Φ_{X₂}(ω) … Φ_{X_m}(ω)
Example 1: Sum of Poisson independent random variables. With p(k_i;λ_i) = exp(−λ_i) λ_i^{k_i}/k_i!, i = 1,…,m:
Φ_{X_i}(ω) = exp{λ_i[exp(jω) − 1]}, i = 1,…,m
Φ_{S_m}(ω) = Φ_{X₁}(ω)…Φ_{X_m}(ω) = exp{(λ₁+λ₂+…+λ_m)[exp(jω) − 1]}
The sum of Poisson independent random variables is a Poisson random variable with λ_{S_m} = λ₁ + λ₂ + … + λ_m.
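Because characteristic functions multiply under independent sums, the pmf of the sum is the convolution of the pmfs. A small numerical sketch (added by us, not in the original slides) of Example 1: convolving Poisson(2) with Poisson(3) reproduces Poisson(5).

```python
import math
import numpy as np

def poisson_pmf(lam, kmax):
    # Poisson pmf p(k; lam) for k = 0..kmax
    return np.array([math.exp(-lam) * lam**k / math.factorial(k)
                     for k in range(kmax + 1)])

p1 = poisson_pmf(2.0, 60)
p2 = poisson_pmf(3.0, 60)
# pmf of X1 + X2 on k = 0..60; for these indices the truncated convolution is exact
p_sum = np.convolve(p1, p2)[:61]
p_expected = poisson_pmf(5.0, 60)
max_diff = float(np.max(np.abs(p_sum - p_expected)))
```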
49 SOLO Review of Probability
Properties of the Moment-Generating Function
Example 2: Sum of Normal independent random variables S_m := X₁ + X₂ + … + X_m, with
p(x_i;μ_i,σ_i) = (1/(√(2π)σ_i)) exp[−(x_i−μ_i)²/(2σ_i²)]  and  Φ_{X_i}(ω) = exp(−σ_i²ω²/2 + jμ_iω)
Φ_{S_m}(ω) = Φ_{X₁}(ω) Φ_{X₂}(ω) … Φ_{X_m}(ω)
 = exp(−σ₁²ω²/2 + jμ₁ω) exp(−σ₂²ω²/2 + jμ₂ω) … exp(−σ_m²ω²/2 + jμ_mω)
 = exp[−(σ₁²+σ₂²+…+σ_m²)ω²/2 + j(μ₁+μ₂+…+μ_m)ω]
The sum of Normal independent random variables is a Normal random variable with
σ_{S_m}² = σ₁² + σ₂² + … + σ_m²,  μ_{S_m} = μ₁ + μ₂ + … + μ_m
Therefore the S_m probability distribution is
p(S_m;μ_{S_m},σ_{S_m}) = (1/(√(2π)σ_{S_m})) exp[−(S_m−μ_{S_m})²/(2σ_{S_m}²)]
50 SOLO Review of Probability
Existence Theorems
Existence Theorem 1
Given a function G(x) such that
G(−∞) = 0, G(+∞) = lim_{x→∞} G(x) = 1
0 ≤ G(x₁) ≤ G(x₂) if x₁ < x₂   (G(x) is monotonic non-decreasing)
lim_{x_n→x, x_n≥x} G(x_n) = G(x⁺) = G(x)   (G(x) is continuous from the right)
we can find an experiment X and a random variable x, defined on X, such that its distribution function P(x) equals the given function G(x).
Proof of Existence Theorem 1: assume that the outcome of the experiment X is any real number −∞ < x < +∞. We consider as events all intervals and the intersections or unions of intervals on the real axis. (figure: points x₁, …, x₈ on the real axis between −∞ and +∞)
To specify the probability of those events we define P(x) = Prob{x ≤ x₁} = G(x₁). From our definition of G(x) it follows that P(x) is a distribution function.
51 SOLO Review of Probability
Existence Theorems
Existence Theorem 2
If a function F(x,y) is such that
F(−∞, y) = 0, F(x, −∞) = 0, F(+∞, +∞) = 1
F(x₂,y₂) − F(x₁,y₂) − F(x₂,y₁) + F(x₁,y₁) ≥ 0
for every x₁ < x₂, y₁ < y₂, then two random variables x and y can be found such that F(x,y) is their joint distribution function.
Proof of Existence Theorem 2: assume that the outcome of the experiment X is any real number −∞ < x < +∞, and the outcome of the experiment Y is any real number −∞ < y < +∞. We consider as events all intervals and the intersections or unions of intervals on the real axes x and y. To specify the probability of those events we define P(x,y) = Prob{x ≤ x₁, y ≤ y₁} = F(x₁,y₁). From our definition of F(x,y) it follows that P(x,y) is a joint distribution function. The proof is similar to that of Existence Theorem 1.
52 SOLO Review of Probability
Histogram
A histogram is a mapping m_i that counts the number of observations falling into various disjoint categories (known as bins); the graph of a histogram is merely one way to represent it. Thus, if we let n be the total number of observations and k be the total number of bins, the histogram m_i meets the condition
n = Σ_{i=1}^{k} m_i
A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram M_i of a histogram m_i is defined as
M_i = Σ_{j=1}^{i} m_j
(figure: an ordinary and a cumulative histogram of the same data, a random sample of 10,000 points from a normal distribution with mean 0 and standard deviation 1)
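A minimal sketch of the two definitions above (illustrative code we add; bin count and seed are arbitrary): binning a normal sample like the one in the figure and forming the cumulative counts M_i.

```python
import random

rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # sample as in the figure

k = 20                                   # number of bins
lo, hi = min(data), max(data)
width = (hi - lo) / k
m = [0] * k                              # ordinary histogram: counts per bin
for x in data:
    i = min(int((x - lo) / width), k - 1)  # clamp the right edge into the last bin
    m[i] += 1

M = []                                   # cumulative histogram: M_i = sum_{j<=i} m_j
total = 0
for count in m:
    total += count
    M.append(total)
```

By construction Σ m_i = n and the last cumulative count equals the sample size.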
53 SOLO Review of Probability
Law of Large Numbers (History)
The Law of Large Numbers has three versions:
• Weak Law of Large Numbers (WLLN)
• Strong Law of Large Numbers (SLLN)
• Uniform Law of Large Numbers (ULLN)
The Weak Law of Large Numbers was first proved by the Swiss mathematician Jacob (James) Bernoulli (1654–1705) in the fourth part of his work “Ars Conjectandi”, published posthumously in 1713.
The French mathematician Siméon Denis Poisson (1781–1840) generalized Bernoulli’s theorem around 1800.
The next contribution was by Bienaymé (Irénée-Jules Bienaymé, 1796–1878) and later, in 1866, by Chebyshev (Pafnuty Lvovich Chebyshev, 1821–1894); it is known as the Bienaymé–Chebyshev Inequality.
54 SOLO Review of Probability
Law of Large Numbers (History – continue)
Félix Édouard Justin Émile Borel, 1871–1956
Francesco Paolo Cantelli, 1875–1966
Andrey Nikolaevich Kolmogorov, 1903–1987
(Borel–Cantelli Lemma)
55 SOLO Review of Probability
Markov’s Inequality (Andrey Andreyevich Markov, 1856–1922)
If X is a random variable which takes only nonnegative values, then for any value a > 0
Pr(X ≥ a) ≤ E(X)/a
Proof: suppose X is continuous with probability density function p_X (x). Then
E(X) = ∫_0^∞ x p_X (x) dx = ∫_0^a x p_X (x) dx + ∫_a^∞ x p_X (x) dx ≥ ∫_a^∞ x p_X (x) dx ≥ a ∫_a^∞ p_X (x) dx = a Pr(X ≥ a)
Since a > 0, dividing by a gives
Pr(X ≥ a) ≤ E(X)/a
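An empirical illustration of Markov's inequality (added by us; distribution and threshold are arbitrary choices): for a nonnegative exponential sample, the observed tail frequency Pr(X ≥ a) stays below E(X)/a.

```python
import random

rng = random.Random(1)
# Exponential(1) is nonnegative with E(X) = 1
samples = [rng.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)

a = 3.0
tail = sum(s >= a for s in samples) / len(samples)  # empirical Pr(X >= a), ~exp(-3)
bound = mean / a                                    # Markov bound E(X)/a, ~1/3
```

The bound is loose (about 0.33 against a true tail of about 0.05) but always valid.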
56 SOLO Review of Probability
Chebyshev’s Inequality (Pafnuty Lvovich Chebyshev, 1821–1894)
If X is a random variable with mean μ = E(X) and variance σ² = E[(X−μ)²], then for any value k > 0
Pr{|X − μ| ≥ k} ≤ σ²/k²
Proof: since (X−μ)² is a nonnegative random variable, we can apply Markov’s inequality with a = k² to obtain
Pr{(X−μ)² ≥ k²} ≤ E[(X−μ)²]/k² = σ²/k²
But since (X−μ)² ≥ k² if and only if |X−μ| ≥ k, the above is equivalent to
Pr{|X − μ| ≥ k} ≤ σ²/k²
Take kσ instead of k to obtain
Pr{|X − μ| ≥ kσ} ≤ 1/k²
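A companion check for Chebyshev's inequality (an illustration we add; Uniform(0,1) and the threshold k = 0.45 are our choices): here μ = 1/2 and σ² = 1/12, so the true tail probability is 0.1 against a bound of about 0.41.

```python
import random

rng = random.Random(2)
samples = [rng.random() for _ in range(100_000)]  # Uniform(0,1)

mu, var = 0.5, 1.0 / 12.0
k = 0.45
# empirical Pr(|X - mu| >= k); true value is 0.1 (X <= 0.05 or X >= 0.95)
tail = sum(abs(s - mu) >= k for s in samples) / len(samples)
bound = var / k**2   # Chebyshev bound sigma^2 / k^2 ~ 0.41
```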
57 SOLO Review of Probability
Bienaymé’s Inequality (Irénée-Jules Bienaymé, 1796–1878)
If X is a random variable, then for any values a and k > 0
Pr{|X − a| ≥ k} ≤ E[|X − a|ⁿ]/kⁿ
Proof: let us first prove that if a random variable y takes only nonnegative values, then for any α > 0 (Markov’s inequality)
E(y) = ∫_0^∞ y p_Y (y) dy ≥ ∫_α^∞ y p_Y (y) dy ≥ α ∫_α^∞ p_Y (y) dy = α Pr(y ≥ α)  ⇒  Pr(y ≥ α) ≤ E(y)/α
Define y := |X − a|ⁿ ≥ 0 and choose α = kⁿ > 0. Since |X − a|ⁿ ≥ kⁿ ⇔ |X − a| ≥ k,
Pr{|X − a| ≥ k} ≤ E[|X − a|ⁿ]/kⁿ
For n = 2 and a = μ we obtain Chebyshev’s inequality. For this reason Chebyshev’s inequality is also known as the Bienaymé–Chebyshev inequality.
(figure: pdf p_X (x) with the tails |x − a| ≥ k bounded by E[|X − a|ⁿ]/kⁿ)
58 SOLO Review of Probability
Chernoff’s and Hoeffding’s Bounds
The Markov, Chebyshev and Bienaymé inequalities use only expected-value information. Let us try to obtain a tighter bound when the probability distribution function is known.
Start from Markov’s inequality for a nonnegative random variable Z and γ > 0:
Pr(Z ≥ γ) ≤ E(Z)/γ, Z ≥ 0, γ > 0
Now take a random variable Y and define the logarithmic generating function
Λ_Y (t) := ln E[exp(tY)] if E[exp(tY)] < ∞, and Λ_Y (t) := ∞ otherwise
Using the fact that exp(x) is a monotonically increasing function,
Y ≥ λ ⇒ exp(tY) ≥ exp(tλ)  ∀ t ≥ 0
and applying Markov’s inequality with Z := exp(tY) and γ := exp(tλ) we obtain
Pr(Y ≥ λ) = Pr(exp(tY) ≥ exp(tλ)) ≤ E[exp(tY)]/exp(tλ) = exp{−[tλ − Λ_Y (t)]}  ∀ t ≥ 0
Therefore (infimum over t ≥ 0):
Pr(Y ≥ λ) ≤ inf_{t≥0} exp{−[tλ − Λ_Y (t)]}
From this inequality, by using different Y, we obtain the Chernoff and Hoeffding bounds. To compute Λ_Y (t) we need to know the distribution function p_Y (y).
59 SOLO Review of Probability
Chernoff’s Bound (Herman Chernoff, 1921–)
Let X₁, X₂, … be independent Bernoulli random variables with Pr(X_i = 1) = p and Pr(X_i = 0) = 1−p.
Λ_{X_i}(t) = ln E[exp(tX_i)] = ln[exp(t·1)·p + exp(t·0)·(1−p)] = ln[p exp(t) + (1−p)]
Take Y := X̄_m = (X₁ + … + X_m)/m. Then
Λ_Y (t) = ln E[exp((t/m) Σ_{i=1}^m X_i)] = m ln[p exp(t/m) + 1 − p]
Since inf_{t≥0} exp{−[tλ − Λ_Y (t)]} corresponds to sup_{t≥0} [tλ − Λ_Y (t)]:
tλ − Λ_Y (t) = tλ − m ln[p exp(t/m) + 1 − p]
d[tλ − Λ_Y (t)]/dt = λ − p exp(t/m)/[p exp(t/m) + 1 − p] = 0  ⇒  exp(t*/m) = [(1−p)/p]·[λ/(1−λ)]
t*λ − Λ_Y (t*) = m[λ ln(λ/p) + (1−λ) ln((1−λ)/(1−p))]
60 SOLO Review of Probability
Chernoff’s Bound (continue – 1)
Using Pr(Y ≥ λ) ≤ inf_{t≥0} exp{−[tλ − Λ_Y (t)]} = exp{−[t*λ − Λ_Y (t*)]}:
Pr[(X₁+…+X_m)/m ≥ λ] ≤ exp{−m[λ ln(λ/p) + (1−λ) ln((1−λ)/(1−p))]}, 0 < p < 1, 0 < λ < 1
Define
H(λ|p) := −m[λ ln(λ/p) + (1−λ) ln((1−λ)/(1−p))], 0 < p < 1
so that Pr[(X₁+…+X_m)/m ≥ λ] ≤ exp[H(λ|p)]. Properties:
H(λ = p|p) = 0
dH(λ|p)/dλ = −m[ln(λ/p) − ln((1−λ)/(1−p))]  ⇒  dH(λ|p)/dλ |_{λ=p} = 0
d²H(λ|p)/dλ² = −m[1/λ + 1/(1−λ)] ≤ −4m
(the maximum of λ(1−λ) is 1/4, attained at λ = 0.5)
61 SOLO Review of Probability
Chernoff’s Bound (continue – 2)
Develop H(λ|p) in a Taylor series about λ = p:
H(λ|p) = H(p|p) + (λ−p) dH/dλ |_{λ=p} + ((λ−p)²/2!) d²H/dλ² |_{λ=p+θ(λ−p)} ≤ −2m(λ−p)², 0 ≤ θ ≤ 1
From which we arrive at the Chernoff bound
Pr[(X₁+…+X_m)/m ≥ λ] ≤ exp[−2m(λ−p)²], 0 < p, λ < 1
Define λ := p + ε:
Pr[(X₁+…+X_m)/m ≥ p + ε] ≤ exp(−2ε²m), 0 < p < 1
62 SOLO Review of Probability
Chernoff’s Bound (continue – 3)
Define now Y := [(1−X₁)+…+(1−X_m)]/m = 1 − (X₁+…+X_m)/m, a mean of Bernoulli variables with success probability 1−p. Using the Chernoff bound:
Pr{[(1−X₁)+…+(1−X_m)]/m ≥ (1−p) + ε} ≤ exp(−2ε²m), 0 < p < 1
Since [(1−X₁)+…+(1−X_m)]/m ≥ (1−p) + ε ⇔ (X₁+…+X_m)/m ≤ p − ε, this gives
Pr[(X₁+…+X_m)/m ≤ p − ε] ≤ exp(−2ε²m), 0 < p < 1
together with
Pr[(X₁+…+X_m)/m ≥ p + ε] ≤ exp(−2ε²m), 0 < p < 1
By summing those two inequalities we obtain the two-sided Chernoff bound:
Pr[|(X₁+…+X_m)/m − p| ≥ ε] ≤ 2 exp(−2ε²m), 0 < p < 1
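An empirical check of the two-sided Chernoff bound just derived (an illustration we add; p, m, ε and the number of repetitions are our choices): the observed frequency of large deviations of the Bernoulli sample mean stays under 2 exp(−2ε²m).

```python
import math
import random

rng = random.Random(3)
p, m, eps, trials = 0.3, 200, 0.1, 20_000

def mean_of_bernoullis():
    # one sample mean of m Bernoulli(p) trials
    return sum(rng.random() < p for _ in range(m)) / m

# empirical Pr(|mean - p| >= eps) over many repetitions
deviations = sum(abs(mean_of_bernoullis() - p) >= eps
                 for _ in range(trials)) / trials
bound = 2 * math.exp(-2 * eps**2 * m)   # two-sided Chernoff bound, ~0.037
```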
63 SOLO Review of Probability
Hoeffding’s Bound
Let us start with a simpler problem: suppose that Y is a random variable with a ≤ Y ≤ b almost surely for some finite a and b, and assume E(Y) = 0.
Define α := (b−Y)/(b−a), so that 0 ≤ α ≤ 1 and
Y = α a + (1−α) b = [(b−Y)/(b−a)] a + [(Y−a)/(b−a)] b
Since exp(·) is a convex function, for any t ≥ 0 we have
exp(tY) = exp[α ta + (1−α) tb] ≤ α exp(ta) + (1−α) exp(tb) = [(b−Y)/(b−a)] exp(ta) + [(Y−a)/(b−a)] exp(tb)
Take the expectation of this inequality (using E(Y) = 0) and define p := −a/(b−a), 0 ≤ p ≤ 1:
E[exp(tY)] ≤ [b/(b−a)] exp(ta) − [a/(b−a)] exp(tb) = (1−p) exp(ta) + p exp(tb) =: exp[φ(u)], u := t(b−a)
since ta = −pu and tb = (1−p)u give (1−p)e^{−pu} + p e^{(1−p)u} = e^{−pu}[1 − p + p e^u].
64 SOLO Review of Probability
Hoeffding’s Bound (continue – 1)
E[exp(tY)] ≤ exp[φ(u)], u := t(b−a), where
φ(u) := −pu + ln(1 − p + p exp(u)), φ(0) = 0
Differentiating we obtain:
dφ/du = −p + p exp(u)/[1 − p + p exp(u)]  ⇒  dφ/du |_{u=0} = 0
d²φ/du² = p(1−p) exp(u)/[1 − p + p exp(u)]²
d³φ/du³ = 0 at exp(u*) = (1−p)/p, where d²φ/du² attains its maximum:
d²φ/du² |_{u*} = p(1−p)·[(1−p)/p] / [1 − p + (1−p)]² = (1−p)²/[4(1−p)²] = 1/4
so d²φ/du² ≤ 1/4 for all u.
65 SOLO Review of Probability
Hoeffding’s Bound (continue – 2)
Develop φ(u) in a Taylor series:
φ(u) = φ(0) + u φ′(0) + (u²/2) φ″(θu) ≤ u²/8 = t²(b−a)²/8, 0 ≤ θ ≤ 1
End of the simpler problem: if Y is a random variable with a ≤ Y ≤ b almost surely for some finite a and b, and E(Y) = 0, then
E[exp(tY)] ≤ exp[t²(b−a)²/8]  ∀ t ≥ 0
66 SOLO Review of Probability
Hoeffding’s Bound (continue – 3)
Generalize the result: suppose X₁, X₂, …, X_m are independent random variables with a_i ≤ X_i ≤ b_i for i = 1,2,…,m. Define Z_i := X_i − E(X_i), so E(Z_i) = 0 and Z_i lies in an interval of length b_i − a_i. Therefore
E[exp(tZ_i)] ≤ exp[t²(b_i−a_i)²/8] and E[exp(−tZ_i)] ≤ exp[t²(b_i−a_i)²/8]  ∀ t ≥ 0
Let Z := Z₁ + Z₂ + … + Z_m. Using Pr(Y ≥ λ) ≤ exp(−tλ) E[exp(tY)] and independence:
Pr(|Σ_{i=1}^m Z_i| ≥ λ) = Pr(Σ Z_i ≥ λ) + Pr(−Σ Z_i ≥ λ)
 ≤ exp(−tλ) Π_{i=1}^m E[exp(tZ_i)] + exp(−tλ) Π_{i=1}^m E[exp(−tZ_i)]
 ≤ 2 exp[−tλ + (t²/8) Σ_{i=1}^m (b_i−a_i)²]  ∀ t ≥ 0
67 SOLO Review of Probability
Hoeffding’s Bound (continue – 4) (Wassily Hoeffding, 1914–1991)
Pr(|Σ_{i=1}^m Z_i| ≥ λ) ≤ 2 exp{ inf_{t≥0} [−tλ + (t²/8) Σ_{i=1}^m (b_i−a_i)²] }
The infimum is attained at t* = 4λ/Σ_{i=1}^m (b_i−a_i)², giving
inf_{t≥0} [−tλ + (t²/8) Σ (b_i−a_i)²] = −2λ²/Σ_{i=1}^m (b_i−a_i)²
We finally obtain Hoeffding’s Bound:
Pr(|Σ_{i=1}^m Z_i| ≥ λ) ≤ 2 exp[−2λ²/Σ_{i=1}^m (b_i−a_i)²]
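An empirical sketch of Hoeffding's bound (added by us; the interval list, λ and seed are arbitrary): centered sums of bounded uniform variables with varying intervals [a_i, b_i] stay under the bound 2 exp(−2λ²/Σ(b_i−a_i)²).

```python
import math
import random

rng = random.Random(4)
bounds = [(0.0, 1.0), (-1.0, 2.0), (0.5, 3.5)] * 20   # 60 bounded variables
ssq = sum((b - a) ** 2 for a, b in bounds)             # sum of (b_i - a_i)^2

def centered_sum():
    # sum of Z_i = X_i - E(X_i) with X_i ~ Uniform(a_i, b_i), E(X_i) = (a_i+b_i)/2
    return sum(rng.uniform(a, b) - (a + b) / 2 for a, b in bounds)

lam, trials = 15.0, 20_000
freq = sum(abs(centered_sum()) >= lam for _ in range(trials)) / trials
bound = 2 * math.exp(-2 * lam**2 / ssq)   # Hoeffding bound
```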
68 SOLO Review of Probability
Convergence Concepts
Convergence almost everywhere (a.e.) (or with probability 1, or strongly): the sequence X_n converges to X with probability 1 if the set of outcomes x such that lim_{n→∞} X_n(x) = X(x) has probability 1:
Pr{X_n → X} = 1 as n → ∞   (X_n →^{a.e.} X)
Convergence in the mean-square sense (m.s.): the sequence X_n converges to X in the mean-square sense if
E{|X_n − X|²} → 0 as n → ∞   (X_n →^{m.s.} X)
Convergence in probability (p) (or stochastic convergence, or convergence in measure): the sequence X_n converges to X in probability if, for every ε > 0,
Pr{|X_n − X| > ε} → 0 as n → ∞   (X_n →^{P} X)
Convergence in distribution (d) (weak convergence): the sequence X_n converges to X in distribution if
p_{X_n}(x) → p_X (x) as n → ∞   (X_n →^{d} X)
Implications: (a.e.) ⇒ (p); (m.s.) ⇒ (p); (p) ⇒ (d).
69 SOLO Review of Probability
Convergence Concepts (continue – 1)
Cauchy criterion of convergence (Augustin Louis Cauchy, 1789–1857): the sequence X_n converges to an unknown limit if X_{n+m} − X_n → 0 as n → ∞, for any m > 0.
Convergence almost everywhere (a.e.): Pr{|X_{n+m} − X_n| < ε} → 1 as n → ∞, for any m > 0.
Convergence in the mean-square sense (m.s.): E{|X_{n+m} − X_n|²} → 0 as n → ∞, for any m > 0.
Using the Chebyshev inequality:
Pr{|X_n − X| > ε} ≤ E{|X_n − X|²}/ε²
If X_n → X in the m.s. sense then, for a given ε, the right-hand side tends to zero, hence so does the left-hand side: X_n → X in probability. The opposite is not true: convergence in probability does not imply convergence in the mean-square sense.
Implications: (a.e.) ⇒ (p); (m.s.) ⇒ (p); (p) ⇒ (d).
70 SOLO Review of Probability
The Laws of Large Numbers
The Law of Large Numbers is a fundamental concept in statistics and probability that describes how the average of a large random sample drawn from a population is likely to be close to the average of the whole population. There are two laws of large numbers, the Weak Law and the Strong Law.
The Weak Law of Large Numbers: if X₁, X₂, …, X_n, … is an infinite sequence of random variables that have the same expected value μ and variance σ², and are uncorrelated (i.e., the correlation between any two of them is zero), then
X̄_n := (X₁ + … + X_n)/n
converges in probability (a weak convergence sense) to μ:
Pr{|X̄_n − μ| < ε} → 1 as n → ∞
The Strong Law of Large Numbers: if, in addition, E(|X_i|) < ∞, then X̄_n converges almost surely to μ:
Pr{lim_{n→∞} X̄_n = μ} = 1
71 SOLO Review of Probability
The Law of Large Numbers
Differences between the Weak Law and the Strong Law
The Weak Law states that, for a specified large n, (X₁ + … + X_n)/n is likely to be near μ. Thus it leaves open the possibility that |(X₁ + … + X_n)/n − μ| > ε happens an infinite number of times, although at infrequent intervals.
The Strong Law shows that this almost surely will not occur. In particular, it implies that, with probability 1, for any positive value ε the inequality |(X₁ + … + X_n)/n − μ| > ε holds only a finite number of times (as opposed to an infinite, but infrequent, number of times).
Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law implies the weak law.
72 SOLO Review of Probability
The Law of Large Numbers
Proof of the Weak Law of Large Numbers
Given E(X_i) = μ ∀i, Var(X_i) = σ² ∀i, and E[(X_i−μ)(X_j−μ)] = 0 ∀ i ≠ j, we have:
E(X̄_n) = [E(X₁) + … + E(X_n)]/n = nμ/n = μ
Var(X̄_n) = E{[(X̄_n) − E(X̄_n)]²} = E{[(X₁−μ) + … + (X_n−μ)]²/n²} = nσ²/n² = σ²/n
(the cross terms E[(X_i−μ)(X_j−μ)], i ≠ j, vanish by uncorrelatedness)
Using Chebyshev’s inequality on X̄_n we obtain
Pr(|X̄_n − μ| ≥ ε) ≤ (σ²/n)/ε²
Using this equation:
Pr(|X̄_n − μ| < ε) = 1 − Pr(|X̄_n − μ| ≥ ε) ≥ 1 − σ²/(nε²)
As n approaches infinity, the right-hand side approaches 1. q.e.d.
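A small numerical illustration of the proof above (ours, not in the deck): sample means of fair-die throws concentrate around μ = 3.5, with the variance σ²/n shrinking as n grows.

```python
import random

rng = random.Random(5)

def sample_mean(n):
    # mean of n throws of a fair die; mu = 3.5, sigma^2 = 35/12
    return sum(rng.randint(1, 6) for _ in range(n)) / n

dev_small = abs(sample_mean(100) - 3.5)      # typically a few tenths
dev_large = abs(sample_mean(100_000) - 3.5)  # typically a few thousandths
```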
73 SOLO Review of Probability
Central Limit Theorem
The first version of this theorem was postulated by the French-born English mathematician Abraham de Moivre (1667–1754) in 1733, who used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This was published in 1756 in “The Doctrine of Chances”, 3rd ed.
This finding was forgotten until 1812, when the French mathematician Pierre-Simon Laplace (1749–1827) recovered it in his work “Théorie Analytique des Probabilités”, in which he approximated the binomial distribution with the normal distribution. This is known as the De Moivre–Laplace theorem.
The present form of the Central Limit Theorem was given by the Russian mathematician Aleksandr Mikhailovich Lyapunov (1857–1918) in 1901.
74 SOLO Review of Probability
Central Limit Theorem (continue – 1)
Let X₁, X₂, …, X_m be a sequence of independent random variables with the same probability distribution function p_X (x). Define the statistical mean
X̄ = (X₁ + X₂ + … + X_m)/m
We have:
E(X̄) = [E(X₁) + E(X₂) + … + E(X_m)]/m = μ
σ_X̄² = Var(X̄) = E{[X̄ − E(X̄)]²} = E{[(X₁−μ) + (X₂−μ) + … + (X_m−μ)]²/m²} = mσ²/m² = σ²/m
Define also the new random variable
Y := [X̄ − E(X̄)]/σ_X̄ = [(X₁−μ) + (X₂−μ) + … + (X_m−μ)]/(σ√m)
The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity, regardless of the probability distribution of the random variable, as long as the mean μ and the variance σ² are finite.
75 SOLO Review of Probability
Central Limit Theorem (continue – 2)
Proof. The characteristic function:
Φ_Y (ω) = E[exp(jωY)] = E[exp(jω Σ_{i=1}^m (X_i−μ)/(σ√m))] = Π_{i=1}^m E[exp(jω(X_i−μ)/(σ√m))] = [Φ_{(X−μ)/σ}(ω/√m)]^m
Develop Φ_{(X−μ)/σ}(ω/√m) in a Taylor series:
Φ_{(X−μ)/σ}(ω/√m) = 1 + (jω/√m) E[(X−μ)/σ]/1! + (jω/√m)² E[((X−μ)/σ)²]/2! + O(m^{−3/2})
 = 1 − ω²/(2m) + O(m^{−3/2})
since E[(X−μ)/σ] = 0 and E[((X−μ)/σ)²] = 1.
76 SOLO Review of Probability
Central Limit Theorem (continue – 3)
Proof (continue – 1). The characteristic function:
Φ_Y (ω) = [Φ_{(X−μ)/σ}(ω/√m)]^m = [1 − ω²/(2m) + O(m^{−3/2})]^m → exp(−ω²/2) as m → ∞
Therefore
p_Y (y) = (1/2π) ∫_{−∞}^{+∞} exp(−jωy) Φ_Y (ω) dω → (1/2π) ∫_{−∞}^{+∞} exp(−jωy) exp(−ω²/2) dω = (1/√(2π)) exp(−y²/2)
(the characteristic function of the standard normal distribution). The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity (convergence in distribution).
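A simulation of the theorem just proved (an illustration we add; the uniform summands and m = 30 are our choices): standardized means Y = (X̄ − μ)/(σ/√m) of Uniform(0,1) variables already look close to standard normal.

```python
import random

rng = random.Random(6)
m, trials = 30, 50_000
mu, sigma = 0.5, (1.0 / 12.0) ** 0.5   # mean and std of Uniform(0,1)

ys = []
for _ in range(trials):
    xbar = sum(rng.random() for _ in range(m)) / m
    ys.append((xbar - mu) / (sigma / m**0.5))   # standardized mean

mean_y = sum(ys) / len(ys)                       # should be ~0
var_y = sum(y * y for y in ys) / len(ys)         # should be ~1
frac_within_1 = sum(abs(y) <= 1.0 for y in ys) / len(ys)  # ~0.6827 for N(0,1)
```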
77 SOLO Review of Probability
Bernoulli Trials – The Binomial Distribution (Jacob Bernoulli, 1654–1705)
Probability mass function: p(k,n) = n!/(k!(n−k)!) p^k (1−p)^(n−k) = C(n,k) p^k (1−p)^(n−k)
Cumulative distribution function: P(k;n) = Σ_{i=0}^{k} C(n,i) p^i (1−p)^(n−i)
Mean value: E(x) = n p
Variance: Var(x) = n p (1−p)
Moment-generating function: Φ(ω) = [p exp(jω) + (1−p)]^n
78 SOLO Review of Probability
Bernoulli Trials – The Binomial Distribution (continue – 1)
Given a random event r = {0,1}:
p – probability of success (r = 1) of a given discrete trial
q – probability of failure (r = 0) of the given discrete trial, p + q = 1
n – number of independent trials
p(k,n) – probability of k successes in n independent trials (Bernoulli trials):
p(k,n) = n!/(k!(n−k)!) p^k (1−p)^(n−k) = C(n,k) p^k (1−p)^(n−k)
The number of ways of obtaining k successful trials out of n independent trials is C(n,k) = n!/(k!(n−k)!), each occurring with probability p^k (1−p)^(n−k).
Using the binomial theorem we obtain
(p+q)^n = Σ_{k=0}^{n} C(n,k) p^k (1−p)^(n−k) = 1
therefore the previous distribution is called the binomial distribution.
79 SOLO Review of Probability
Bernoulli Trials – The Binomial Distribution (continue – 2)
Mean value:
E(X) = Σ_{i=0}^{n} i · n!/(i!(n−i)!) p^i (1−p)^(n−i) = np Σ_{i=1}^{n} (n−1)!/((i−1)!(n−i)!) p^(i−1) (1−p)^(n−i) = np [p + (1−p)]^(n−1) = np
Moment-generating function:
Φ(ω) = E[exp(jωk)] = Σ_{k=0}^{n} exp(jωk) n!/(k!(n−k)!) p^k (1−p)^(n−k) = Σ_{k=0}^{n} n!/(k!(n−k)!) [p exp(jω)]^k (1−p)^(n−k) = [p exp(jω) + (1−p)]^n
80 SOLO Review of Probability
Bernoulli Trials – The Binomial Distribution (continue – 3)
Second moment:
E(X²) = Σ_{i=0}^{n} i² C(n,i) p^i (1−p)^(n−i) = Σ_{i=2}^{n} i(i−1) C(n,i) p^i (1−p)^(n−i) + Σ_{i=1}^{n} i C(n,i) p^i (1−p)^(n−i)
 = n(n−1)p² Σ_{i=2}^{n} (n−2)!/((i−2)!(n−i)!) p^(i−2) (1−p)^(n−i) + np = n(n−1)p² + np
Variance:
Var(X) = E(X²) − E²(X) = n(n−1)p² + np − n²p² = np(1−p)
81 SOLO Review of Probability
Bernoulli Trials – The Binomial Distribution (continue – 4)
Let us apply Chebyshev’s inequality with the mean value E(X) = np and the variance Var(X) = E(X²) − E²(X) = np(1−p):
Pr{|X − np| ≥ k} ≤ np(1−p)/k²
An upper bound to this inequality, as p varies (0 ≤ p ≤ 1), can be obtained by taking the derivative of p(1−p), equating it to zero, and solving for p. The result is p = 0.5, giving
Pr{|X − np| ≥ k} ≤ n/(4k²)
We can see that as n → ∞ the sample fraction X/n converges in probability to p:
Pr{|X/n − p| ≥ ε} ≤ p(1−p)/(nε²) → 0
This is known as Bernoulli’s theorem.
82 SOLO Review of Probability
Generalized Bernoulli Trials
Consider now r mutually exclusive events A₁, A₂, …, A_r:
A_i ∩ A_j = ∅, i ≠ j, i, j = 1, 2, …, r
with their sum equal to the certain event S: A₁ ∪ A₂ ∪ … ∪ A_r = S
and probabilities of occurrence p(A₁) = p₁, p(A₂) = p₂, …, p(A_r) = p_r. Therefore
p(A₁) + p(A₂) + … + p(A_r) = p₁ + p₂ + … + p_r = 1
We want to find the probability that in n trials we obtain A₁ k₁ times, A₂ k₂ times, and so on, and A_r k_r times, such that k₁ + k₂ + … + k_r = n.
The number of possible combinations of k₁ events A₁, k₂ events A₂, …, k_r events A_r is n!/(k₁! k₂! … k_r!), and the probability of each combination is p₁^{k₁} p₂^{k₂} … p_r^{k_r}. We obtain the probability of the Generalized Bernoulli Trials as
p(k₁, k₂, …, k_r, n) = n!/(k₁! k₂! … k_r!) p₁^{k₁} p₂^{k₂} … p_r^{k_r}
83 SOLO Review of Probability
Poisson Asymptotical Development (Law of Rare Events) (Siméon Denis Poisson, 1781–1840)
Start with the binomial distribution p(k,n) = C(n,k) p^k (1−p)^(n−k) = n!/(k!(n−k)!) p^k (1−p)^(n−k).
We assume that n ≫ 1 and p = k₀/n, with k₀/n ≪ 1. Then
p(0,n) = (1−p)^n = (1 − k₀/n)^n → exp(−k₀) as n → ∞
For general k:
p(k,n) = [n(n−1)…(n−k+1)/k!] (k₀/n)^k (1 − k₀/n)^{n−k} = (k₀^k/k!) [n(n−1)…(n−k+1)/n^k] (1 − k₀/n)^{−k} (1 − k₀/n)^n
The bracketed factor and (1 − k₀/n)^{−k} tend to 1 as n → ∞, hence
p(k,n) ≈ (k₀^k/k!) exp(−k₀)
This is the Poisson asymptotical development (law of rare events).
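A numerical sketch of the law of rare events (our illustration; n, p and the truncation at k = 30 are arbitrary): for large n and small p the binomial pmf is close to the Poisson pmf with λ = np.

```python
import math

n, p = 1000, 0.003
lam = n * p   # Poisson parameter, lambda = 3
max_diff = max(
    abs(math.comb(n, k) * p**k * (1 - p) ** (n - k)        # exact binomial pmf
        - math.exp(-lam) * lam**k / math.factorial(k))     # Poisson approximation
    for k in range(30)
)
```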
84 SOLO Review of Probability
Poisson Distribution (Siméon Denis Poisson, 1781–1840)
Probability mass function: p(k;λ) = exp(−λ) λ^k/k!, k a nonnegative integer
Mean value:
E(X) = Σ_{k=0}^{∞} k exp(−λ) λ^k/k! = exp(−λ) Σ_{k=1}^{∞} λ^k/(k−1)! = λ exp(−λ) Σ_{i=0}^{∞} λ^i/i! = λ
Second moment:
E(X²) = Σ_{i=0}^{∞} i² exp(−λ) λ^i/i! = exp(−λ) Σ_{i=1}^{∞} i λ^i/(i−1)! = exp(−λ) [Σ_{i=2}^{∞} λ^i/(i−2)! + Σ_{i=1}^{∞} λ^i/(i−1)!] = λ² + λ
Variance: Var(x) = E(X²) − E²(X) = λ
Moment-generating function:
Φ(ω) = E[exp(jωk)] = Σ_{m=0}^{∞} exp(jωm) exp(−λ) λ^m/m! = exp(−λ) Σ_{m=0}^{∞} [λ exp(jω)]^m/m! = exp{λ[exp(jω) − 1]}
85 SOLO Review of Probability
Poisson Distribution
Moment-generating function: Φ(ω) = exp{λ[exp(jω) − 1]}
Approximation to the Gaussian distribution:
λ[exp(jω) − 1] = λ(cos ω − 1) + jλ sin ω = −2λ sin²(ω/2) + jλ sin ω
For λ sufficiently large, Φ(ω) is negligible for all but very small values of ω, in which case sin²(ω/2) ≈ ω²/4 and sin ω ≈ ω:
Φ(ω) = exp{λ[exp(jω) − 1]} ≈ exp(−λω²/2 + jλω)
For a normal distribution with mean μ and variance σ² we found the moment-generating function Φ(ω) = exp(−σ²ω²/2 + jμω).
Therefore the Poisson distribution can be approximated by a Gaussian distribution with mean μ = λ and variance σ² = λ:
p(k;λ) = exp(−λ) λ^k/k! ~ (1/√(2πλ)) exp[−(k−λ)²/(2λ)]
86 SOLO Review of Probability
Poisson Distribution (summary) (Siméon Denis Poisson, 1781–1840)
Probability mass function: p(k;λ) = exp(−λ) λ^k/k!, k a nonnegative integer
Cumulative distribution function: P(k;λ) = Σ_{i=0}^{k} exp(−λ) λ^i/i! = Γ(k+1, λ)/k!, where Γ(a,x) = ∫_x^∞ t^(a−1) exp(−t) dt is the upper incomplete gamma function
Mean value: E(x) = λ
Variance: Var(x) = λ
Moment-generating function: Φ(ω) = exp{λ[exp(jω) − 1]}
87 SOLO Review of Probability
Normal (Gaussian) Distribution (Carl Friedrich Gauss, 1777–1855)
Probability density function: p(x;μ,σ) = (1/(√(2π)σ)) exp[−(x−μ)²/(2σ²)]
Cumulative distribution function: P(x;μ,σ) = (1/(√(2π)σ)) ∫_{−∞}^{x} exp[−(u−μ)²/(2σ²)] du
Mean value: E(x) = μ
Variance: Var(x) = σ²
Moment-generating function: Φ(ω) = E[exp(jωx)] = (1/(√(2π)σ)) ∫_{−∞}^{+∞} exp(jωu) exp[−(u−μ)²/(2σ²)] du = exp(−σ²ω²/2 + jμω)
88 SOLO Review of Probability
De Moivre–Laplace Asymptotical Development
Start with the binomial distribution p(k,n) = n!/(k!(n−k)!) p^k q^{n−k}, q = 1−p.
Use the Stirling asymptotical approximation n! ≈ √(2πn) nⁿ exp(−n):
p(k,n) ≈ [√(2πn) nⁿ exp(−n)] / {[√(2πk) k^k exp(−k)] [√(2π(n−k)) (n−k)^{n−k} exp(−(n−k))]} · p^k q^{n−k}
 = √(n/(2πk(n−k))) (np/k)^k (nq/(n−k))^{n−k}
Define k := np + δk, so that n − k = nq − δk.
The ratio of successive terms is
p(k+1,n)/p(k,n) = [(n−k)/(k+1)]·(p/q)
hence p(k+1,n) > p(k,n) if k + 1 < (n+1)p, and p(k+1,n) < p(k,n) if k + 1 > (n+1)p: the pmf increases up to its maximum near k = np and decreases afterwards.
89 SOLO Review of Probability
De Moivre-Laplace Asymptotical Development (continue – 1)
p(k,n) \approx \sqrt{\frac{n}{2\pi\,k(n-k)}}\left(\frac{np}{k}\right)^k\left(\frac{nq}{n-k}\right)^{n-k}
For \delta_k = k - np \ll np (so k \approx np, \; n-k \approx nq), define \sigma := \sqrt{npq}; then
\sqrt{\frac{n}{2\pi\,k(n-k)}} \approx \frac{1}{\sqrt{2\pi npq}} = \frac{1}{\sigma\sqrt{2\pi}}
and, writing k = np + \delta_k, n - k = nq - \delta_k,
p(k,n) \approx \frac{1}{\sigma\sqrt{2\pi}}\left(1+\frac{\delta_k}{np}\right)^{-(np+\delta_k)}\left(1-\frac{\delta_k}{nq}\right)^{-(nq-\delta_k)}
90 SOLO Review of Probability
De Moivre-Laplace Asymptotical Development (continue – 2)
Take the logarithm and expand \ln(1+x) \approx x - x^2/2:
-\ln[\sigma\sqrt{2\pi}\,p(k,n)] \approx (np+\delta_k)\ln\left(1+\frac{\delta_k}{np}\right) + (nq-\delta_k)\ln\left(1-\frac{\delta_k}{nq}\right)
\approx \left(\delta_k + \frac{\delta_k^2}{2np}\right) + \left(-\delta_k + \frac{\delta_k^2}{2nq}\right) = \frac{\delta_k^2}{2npq} = \frac{\delta_k^2}{2\sigma^2}
from which
p(k,n) \approx \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{\delta_k^2}{2\sigma^2}\right)
Abraham de Moivre (1667-1754)   Pierre-Simon Laplace (1749-1827)
This result was first published by De Moivre in 1756 in "The Doctrine of Chances", 3rd Ed., and reviewed by Laplace in "Théorie Analytique des Probabilités", 1820
Central Limit Theorem
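The De Moivre-Laplace approximation above can be checked numerically. The sketch below (Python is my choice, not the deck's) compares the exact binomial pmf with the Gaussian density in δ_k = k − np and σ² = npq, for n = 1000, p = 0.3 — hypothetical illustration values.

```python
import math

def binom_pmf(n, k, p):
    # exact binomial probability via log-gamma for numerical stability
    logc = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(logc + k * math.log(p) + (n - k) * math.log(1 - p))

def de_moivre_laplace(n, k, p):
    # Gaussian approximation with delta_k = k - n*p and sigma^2 = n*p*q
    var = n * p * (1 - p)
    return math.exp(-(k - n * p) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

n, p = 1000, 0.3
max_rel_err = max(
    abs(binom_pmf(n, k, p) - de_moivre_laplace(n, k, p)) / binom_pmf(n, k, p)
    for k in range(285, 316)  # about +/- one standard deviation around np = 300
)
```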
91 SOLO Review of Probability
De Moivre-Laplace Asymptotical Development for Generalized Bernoulli Trials
Consider the r mutually exclusive events A_1, A_2, ..., A_r:
A_i \cap A_j = \emptyset, \quad i \ne j, \quad i,j = 1, 2, \ldots, r
with their sum equal to the certain event S: A_1 \cup A_2 \cup \cdots \cup A_r = S
and the probabilities of occurrence p(A_1) = p_1, \; p(A_2) = p_2, \; \ldots, \; p(A_r) = p_r
Therefore p(A_1) + p(A_2) + \cdots + p(A_r) = p_1 + p_2 + \cdots + p_r = 1
The probability that in n trials we obtain A_1 k_1 times, A_2 k_2 times, and so on, and A_r k_r times, such that k_1 + k_2 + \cdots + k_r = n, is
p(k_1, k_2, \ldots, k_r, n) = \frac{n!}{k_1!\,k_2!\cdots k_r!}\,p_1^{k_1} p_2^{k_2}\cdots p_r^{k_r}
As n goes to infinity, with np_i - \sqrt{n} \le k_i \le np_i + \sqrt{n}, we have
\frac{n!}{k_1!\cdots k_r!}\,p_1^{k_1}\cdots p_r^{k_r} \to \frac{1}{\sqrt{(2\pi n)^{r-1}\,p_1\cdots p_r}}\exp\left[-\frac{(k_1-np_1)^2}{2np_1} - \cdots - \frac{(k_r-np_r)^2}{2np_r}\right]
92 SOLO Review of Probability
De Moivre-Laplace Asymptotical Development for Generalized Poisson Trials
Consider the r-1 mutually exclusive events A_1, A_2, ..., A_{r-1}:
A_i \cap A_j = \emptyset, \quad i \ne j
with small probabilities of occurrence p(A_1) = p_1, \ldots, p(A_{r-1}) = p_{r-1}, such that
p_1 + p_2 + \cdots + p_{r-1} =: 1 - p_r \ll 1
The probability that in n trials we obtain A_1 k_1 times, A_2 k_2 times, and so on, and A_{r-1} k_{r-1} times, such that k_1 + k_2 + \cdots + k_{r-1} = n - k_r, is
p(k_1, \ldots, k_r, n) = \frac{n!}{k_1!\cdots k_r!}\,p_1^{k_1}\cdots p_r^{k_r}
As n goes to infinity
\frac{n!}{k_1!\cdots k_r!}\,p_1^{k_1}\cdots p_r^{k_r} \to \frac{(np_1)^{k_1}\exp(-np_1)}{k_1!}\cdots\frac{(np_{r-1})^{k_{r-1}}\exp(-np_{r-1})}{k_{r-1}!}
Table of Content
93 SOLO Review of Probability
Laplacian Distribution
Pierre-Simon Laplace (1749-1827)
Probability Density Function: p(x;\mu,b) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right)
Cumulative Distribution Function: P(x;\mu,b) = \int_{-\infty}^{x}\frac{1}{2b}\exp\left(-\frac{|u-\mu|}{b}\right)du
Mean Value: E(x) = \mu
Variance: Var(x) = 2b^2
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = \int_{-\infty}^{+\infty}\frac{1}{2b}\exp\left(-\frac{|u-\mu|}{b}\right)\exp(j\omega u)\,du = \frac{\exp(j\mu\omega)}{1+b^2\omega^2}
Distribution Examples
Table of Content
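A hedged sketch of inverse-CDF sampling from this Laplacian density (my own illustration in Python; the parameter values μ = 1, b = 2 are hypothetical), verifying the stated mean μ and variance 2b²:

```python
import math
import random

def sample_laplace(mu, b, rng):
    # Inverse-CDF sampling from the density (1/2b) exp(-|x - mu|/b)
    u = rng.random()
    if u < 0.5:
        return mu + b * math.log(2 * u)
    return mu - b * math.log(2 * (1 - u))

rng = random.Random(7)
mu, b, n = 1.0, 2.0, 200_000
xs = [sample_laplace(mu, b, rng) for _ in range(n)]
mean = sum(xs) / n                                  # theory: mu = 1
var = sum((x - mean) ** 2 for x in xs) / (n - 1)    # theory: 2*b^2 = 8
```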
94 SOLO Review of Probability
Gamma Distribution
Probability Density Function: p(x;k,\theta) = \frac{x^{k-1}\exp(-x/\theta)}{\Gamma(k)\,\theta^k} for x \ge 0; \quad 0 for x < 0
Cumulative Distribution Function: P(x;k,\theta) = \frac{\gamma(k, x/\theta)}{\Gamma(k)} for x \ge 0; \quad 0 for x < 0
Mean Value: E(x) = k\theta
Variance: Var(x) = k\theta^2
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = (1 - j\omega\theta)^{-k}
\Gamma is the gamma function: \Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt
\gamma(a,x) = \int_0^x t^{a-1}\exp(-t)\,dt is the incomplete gamma function
Distribution Examples
Table of Content
95 SOLO Review of Probability
Beta Distribution
Probability Density Function: p(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}
Cumulative Distribution Function: P(x;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^x u^{\alpha-1}(1-u)^{\beta-1}\,du
Mean Value: E(x) = \frac{\alpha}{\alpha+\beta}
Variance: Var(x) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = 1 + \sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{(j\omega)^k}{k!}
\Gamma is the gamma function: \Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt
Distribution Examples
Beta Distribution Example
Table of Content
96 SOLO Review of Probability
Cauchy Distribution
Augustin Louis Cauchy (1789-1857)
Probability Density Function: p(x;x_0,\gamma) = \frac{1}{\pi\gamma\left[1+\left(\frac{x-x_0}{\gamma}\right)^2\right]} = \frac{\gamma}{\pi\left[(x-x_0)^2+\gamma^2\right]}
Cumulative Distribution Function: P(x;x_0,\gamma) = \frac{1}{\pi}\arctan\left(\frac{x-x_0}{\gamma}\right) + \frac{1}{2}
Mean Value: not defined
Variance: not defined
Moment Generating Function: not defined
Distribution Examples
97 SOLO Review of Probability
Cauchy Distribution — Example of Cauchy Distribution Derivation
Assume a particle leaves the origin, moving with constant velocity toward a wall situated at a distance a from the origin. The angle θ between the particle velocity vector and the Ox axis is a random variable uniformly distributed between -θ₁ and +θ₁:
p_\Theta(\theta) = \frac{1}{2\theta_1} for -\theta_1 \le \theta \le \theta_1; \quad 0 elsewhere
Find the probability density function of y, the distance from the Ox axis at which the particle hits the wall: y = a\tan\theta
Solution: since \frac{dy}{d\theta} = a(1+\tan^2\theta) = \frac{a^2+y^2}{a},
p_Y(y) = p_\Theta(\theta)\left|\frac{d\theta}{dy}\right| = \frac{1}{2\theta_1}\,\frac{a}{a^2+y^2} for |y| \le a\tan\theta_1; \quad 0 elsewhere
Therefore we obtain, for \theta_1 = \pi/2, the Cauchy density with x_0 = 0 and \gamma = a.
Functions of One Random Variable
Table of Content
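The slide's particle model can be simulated directly. This sketch (Python, with a = 2 as a hypothetical wall distance) takes θ₁ = π/2, so the hit point should follow the Cauchy CDF arctan(y/a)/π + 1/2; it checks one quantile empirically.

```python
import math
import random

# Simulate the slide's setup with theta_1 = pi/2: theta uniform on (-pi/2, pi/2),
# y = a*tan(theta) yields a Cauchy(x0 = 0, gamma = a) distributed hit point.
rng = random.Random(3)
a, n = 2.0, 100_000
ys = [a * math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n)]

def cauchy_cdf(y, a):
    # P(y; x0 = 0, gamma = a) = arctan(y/a)/pi + 1/2
    return math.atan(y / a) / math.pi + 0.5

frac_below_a = sum(1 for y in ys if y < a) / n   # theory: cauchy_cdf(a, a) = 0.75
```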
98 SOLO Review of Probability
Exponential Distribution
Probability Density Function: p(x;\lambda) = \lambda\exp(-\lambda x) for x \ge 0; \quad 0 for x < 0
Cumulative Distribution Function: P(x;\lambda) = \int_{-\infty}^{x}p(u;\lambda)\,du = 1 - \exp(-\lambda x) for x \ge 0; \quad 0 for x < 0
Mean Value (integrating by parts with u = x, dv = \lambda\exp(-\lambda x)\,dx):
E(x) = \int_0^\infty x\,\lambda\exp(-\lambda x)\,dx = \left[-x\exp(-\lambda x)\right]_0^\infty + \int_0^\infty \exp(-\lambda x)\,dx = \frac{1}{\lambda}
Second Moment: E(x^2) = \left.j^{-2}\frac{d^2\Phi_X(\omega)}{d\omega^2}\right|_{\omega=0} = \frac{2}{\lambda^2}
Variance: Var(x) = E(x^2) - E^2(x) = \frac{1}{\lambda^2}
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = \int_0^\infty \lambda\exp(-\lambda x)\exp(j\omega x)\,dx = \frac{\lambda}{\lambda - j\omega} = \left(1 - \frac{j\omega}{\lambda}\right)^{-1}
Distributions examples
Table of Content
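Since P(x;λ) = 1 − exp(−λx) is invertible in closed form, exponential variates can be drawn by inverse transform. A minimal sketch (Python; λ = 0.5 is a hypothetical example value), checking the slide's first two moments 1/λ and 2/λ²:

```python
import math
import random

def sample_exponential(lam, rng):
    # Inverse CDF: x = -ln(1 - U)/lam, with U uniform on [0, 1)
    return -math.log(1.0 - rng.random()) / lam

rng = random.Random(11)
lam, n = 0.5, 200_000
xs = [sample_exponential(lam, rng) for _ in range(n)]
mean = sum(xs) / n                       # theory: 1/lam = 2
second = sum(x * x for x in xs) / n      # theory: 2/lam^2 = 8
```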
99 SOLO Review of Probability
Chi-square Distribution
Probability Density Function: p(x;k) = \frac{1}{2^{k/2}\Gamma(k/2)}\,x^{k/2-1}\exp(-x/2) for x \ge 0; \quad 0 for x < 0
Cumulative Distribution Function: P(x;k) = \frac{\gamma(k/2,\,x/2)}{\Gamma(k/2)} for x \ge 0; \quad 0 for x < 0
Mean Value: E(x) = k
Variance: Var(x) = 2k
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = (1 - 2j\omega)^{-k/2}
\Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt is the gamma function; \gamma(a,x) = \int_0^x t^{a-1}\exp(-t)\,dt is the incomplete gamma function
Distributions examples
100 SOLO Review of Probability
Derivation of Chi and Chi-square Distributions
Given k normal, independent random variables X_1, X_2, ..., X_k with zero mean values and the same variance σ², their joint density is
p_{X_1\cdots X_k}(x_1,\ldots,x_k) = \prod_{i=1}^{k}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x_i^2}{2\sigma^2}\right) = \frac{1}{(\sigma\sqrt{2\pi})^k}\exp\left(-\frac{x_1^2+\cdots+x_k^2}{2\sigma^2}\right)
Define
Chi-square: y := \chi_k^2 := x_1^2 + \cdots + x_k^2 \ge 0
Chi: \chi_k := \sqrt{x_1^2 + \cdots + x_k^2} \ge 0
p_{X_k}(\chi_k)\,d\chi_k = \Pr\left\{\chi_k \le \sqrt{x_1^2+\cdots+x_k^2} \le \chi_k + d\chi_k\right\}
The region in χ_k space where the joint density is constant is a hyper-shell of volume dV = A\,\chi^{k-1}\,d\chi (A to be defined; e.g., for k = 3, dV = 4\pi\chi^2\,d\chi), so
p_{X_k}(\chi_k)\,d\chi_k = \frac{1}{(\sigma\sqrt{2\pi})^k}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)A\,\chi_k^{k-1}\,d\chi_k
p_{X_k}(\chi_k) = \frac{A\,\chi_k^{k-1}}{(\sigma\sqrt{2\pi})^k}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)
101 SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 1)
p_{X_k}(\chi_k) = \frac{A\,\chi_k^{k-1}}{(\sigma\sqrt{2\pi})^k}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k), \qquad U(a) := 1 for a \ge 0, \; 0 for a < 0
Chi-square: y := \chi_k^2, so \chi_k = \sqrt{y} and
p_Y(y) = p_{X_k}(\chi_k)\left|\frac{d\chi_k}{dy}\right| = p_{X_k}(\sqrt{y})\,\frac{1}{2\sqrt{y}} = \frac{A}{2(\sigma\sqrt{2\pi})^k}\,y^{k/2-1}\exp\left(-\frac{y}{2\sigma^2}\right)U(y)
A is determined from the condition \int_{-\infty}^{\infty}p_Y(y)\,dy = 1:
\int_0^\infty \frac{A}{2(\sigma\sqrt{2\pi})^k}\,y^{k/2-1}\exp\left(-\frac{y}{2\sigma^2}\right)dy = \frac{A\,\Gamma(k/2)}{2\pi^{k/2}} = 1 \;\Rightarrow\; A = \frac{2\pi^{k/2}}{\Gamma(k/2)}
so
p_Y(y;k,\sigma) = \frac{1}{2^{k/2}\Gamma(k/2)\,\sigma^k}\,y^{k/2-1}\exp\left(-\frac{y}{2\sigma^2}\right)U(y)
p_{X_k}(\chi_k) = \frac{1}{2^{k/2-1}\Gamma(k/2)\,\sigma^k}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k)
\Gamma is the gamma function: \Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt
Function of One Random Variable
102 SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 2)
Chi-square: y := \chi_k^2 = x_1^2 + \cdots + x_k^2 \ge 0, where the x_i are Gaussian with zero mean and variance σ²
Mean Value: E\{\chi_k^2\} = E\{x_1^2\} + \cdots + E\{x_k^2\} = k\sigma^2
Using the 4th moment of a Gauss distribution, E\{x_i^4\} = 3\sigma^4, and independence, E\{x_i^2 x_j^2\} = E\{x_i^2\}E\{x_j^2\} = \sigma^4 for i \ne j:
E\{(\chi_k^2 - k\sigma^2)^2\} = E\left\{\left(\sum_{i=1}^k x_i^2\right)^2\right\} - k^2\sigma^4 = \sum_{i=1}^k E\{x_i^4\} + \sum_{i\ne j}E\{x_i^2\}E\{x_j^2\} - k^2\sigma^4 = 3k\sigma^4 + k(k-1)\sigma^4 - k^2\sigma^4 = 2k\sigma^4
Variance: E\{(\chi_k^2 - k\sigma^2)^2\} = 2k\sigma^4
Table of Content
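A small simulation can confirm the two moments just derived. This sketch (Python; k = 4, σ = 1.5 are hypothetical illustration values) builds χ_k² as the sum of k squared zero-mean Gaussians and checks E = kσ² and Var = 2kσ⁴:

```python
import random

# Empirical check of E{chi_k^2} = k*sigma^2 and Var{chi_k^2} = 2k*sigma^4
rng = random.Random(5)
k, sigma, n = 4, 1.5, 100_000
ys = [sum(rng.gauss(0.0, sigma) ** 2 for _ in range(k)) for _ in range(n)]
mean = sum(ys) / n                                 # theory: k*sigma^2 = 9
var = sum((y - mean) ** 2 for y in ys) / (n - 1)   # theory: 2*k*sigma^4 = 40.5
```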
103 SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 3)
[Figure: tail probabilities of the chi-square and normal densities]
The Table presents the points on the chi-square distribution for a given upper tail probability Q = Pr{y > x}, where y = χ_n² and n is the number of degrees of freedom. This tabulated function is also known as the complementary distribution. An alternative way of writing the previous equation is
1 - Q = Pr{y \le x}
which indicates that to the left of the point x the probability mass is 1 - Q. This is the 100(1 - Q) percentile point.
Examples
1. The 95% probability region for a χ₂² variable can be taken as the one-sided region (cutting off the 5% upper tail): [0, χ₂²(0.95)] = [0, 5.99]
2. Or the two-sided probability region (cutting off both 2.5% tails): [χ₂²(0.025), χ₂²(0.975)] = [0.05, 7.38]
3. For a χ₁₀₀² variable, the two-sided 95% probability region (cutting off both 2.5% tails) is [χ₁₀₀²(0.025), χ₁₀₀²(0.975)] = [74, 130]
104 SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 4)
Note the skewedness of the chi-square distribution: the above two-sided regions are not symmetric about the corresponding means E{χ_n²} = n.
[Figure: tail probabilities of the chi-square and normal densities]
For degrees of freedom above 100, the following approximation of the points on the chi-square distribution can be used:
\chi_n^2(1-Q) = \frac{1}{2}\left[G(1-Q) + \sqrt{2n-1}\right]^2
where G(·) is given in the last line of the Table and is the point x on the standard (zero mean and unit variance) Gaussian distribution for the same tail probabilities: with p(y) = N(y; 0, 1) and Q = Pr{y > x}, we have x(1-Q) := G(1-Q).
Table of Content
105 SOLO Review of Probability
Student's t-Distribution
Probability Density Function: p(x;\nu) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1+\frac{x^2}{\nu}\right)^{-(\nu+1)/2}
Cumulative Distribution Function: P(x;\nu) = \frac{1}{2} + x\,\frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\sum_{n=0}^{\infty}\frac{\left(\frac{1}{2}\right)^{(n)}\left(\frac{\nu+1}{2}\right)^{(n)}}{\left(\frac{3}{2}\right)^{(n)}}\,\frac{(-x^2/\nu)^n}{n!}
where a^{(n)} := a(a+1)(a+2)\cdots(a+n-1) is the rising factorial
Mean Value: E(x) = 0 for \nu > 1; undefined for \nu = 1
Variance: Var(x) = \frac{\nu}{\nu-2} for \nu > 2; \infty otherwise
Moment Generating Function: not defined
It gets its name from W. S. Gosset, who wrote under the pseudonym "Student"
William Sealy Gosset 1876-1937
Distributions examples
Table of Content
106 SOLO Review of Probability
Uniform Distribution (Continuous)
Probability Density Function: p(x;a,b) = \frac{1}{b-a} for a \le x \le b; \quad 0 for x < a or x > b
Cumulative Distribution Function: P(x;a,b) = 0 for x < a; \quad \frac{x-a}{b-a} for a \le x \le b; \quad 1 for x > b
Mean Value: E(x) = \frac{a+b}{2}
Variance: Var(x) = \frac{(b-a)^2}{12}
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = \frac{\exp(j\omega b) - \exp(j\omega a)}{j\omega(b-a)}
Distributions examples
Moments
Table of Content
107 SOLO Review of Probability
Rayleigh Distribution
John William Strutt, Lord Rayleigh (1842-1919)
Probability Density Function: p(x;\sigma) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right)
Cumulative Distribution Function: P(x;\sigma) = 1 - \exp\left(-\frac{x^2}{2\sigma^2}\right)
Mean Value: E(x) = \sigma\sqrt{\frac{\pi}{2}}
Variance: Var(x) = \frac{4-\pi}{2}\,\sigma^2
Moment Generating Function: \Phi(\omega) = 1 - \sigma\omega\exp(-\sigma^2\omega^2/2)\sqrt{\frac{\pi}{2}}\left[\mathrm{erfi}\left(\frac{\sigma\omega}{\sqrt{2}}\right) - j\right]
The Rayleigh Distribution is the chi-distribution with k = 2:
p_{X_k}(\chi_k) = \frac{1}{2^{k/2-1}\Gamma(k/2)\,\sigma^k}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k)
Distributions examples
Moments
108 SOLO Review of Probability
Rayleigh Distribution — Example of Rayleigh Distribution
Given X and Y, two independent Gaussian random variables with zero means and the same variance σ²:
p_{XY}(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)
find the distributions of R and Θ given by R = \sqrt{X^2+Y^2}, \; \Theta = \tan^{-1}(Y/X)
Solution: with x = r\cos\theta, y = r\sin\theta, dx\,dy = r\,dr\,d\theta,
p_R(r)\,dr\;p_\Theta(\theta)\,d\theta = p_{XY}(x,y)\,dx\,dy = \frac{r}{2\pi\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right)dr\,d\theta
where
p_\Theta(\theta) = \frac{1}{2\pi}, \quad 0 \le \theta \le 2\pi   (Uniform Distribution)
p_R(r) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right), \quad r \ge 0   (Rayleigh Distribution)
Table of Content
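The example above can be simulated directly: draw pairs of independent zero-mean Gaussians and check that the envelope matches the Rayleigh mean σ√(π/2) and variance (4−π)σ²/2. A minimal sketch in Python (σ = 2 is a hypothetical value):

```python
import math
import random

# Envelope R = sqrt(X^2 + Y^2) of two independent zero-mean Gaussians (same sigma)
# should follow the Rayleigh density (r/sigma^2) exp(-r^2/(2 sigma^2)).
rng = random.Random(42)
sigma, n = 2.0, 100_000
rs = [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]
mean = sum(rs) / n                                 # theory: sigma*sqrt(pi/2)
var = sum((r - mean) ** 2 for r in rs) / (n - 1)   # theory: (4 - pi)/2 * sigma^2
```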
109 SOLO Review of Probability
Rice Distribution
Stephen O. Rice 1907-1986
Probability Density Function: p(x;v,\sigma) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2+v^2}{2\sigma^2}\right)I_0\left(\frac{xv}{\sigma^2}\right)
where I_0\left(\frac{xv}{\sigma^2}\right) = \frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{xv\cos\varphi'}{\sigma^2}\right)d\varphi' is the zero-order modified Bessel function of the first kind
Cumulative Distribution Function: P(x;v,\sigma) = 1 - Q_1\left(\frac{v}{\sigma},\frac{x}{\sigma}\right), where Q_1 is the Marcum Q-function
Mean Value: E(x) = \sigma\sqrt{\frac{\pi}{2}}\,L_{1/2}\left(-\frac{v^2}{2\sigma^2}\right)
Variance: Var(x) = 2\sigma^2 + v^2 - \frac{\pi\sigma^2}{2}\,L_{1/2}^2\left(-\frac{v^2}{2\sigma^2}\right)
(L_{1/2} is a Laguerre function; for v = 0 these reduce to the Rayleigh values \sigma\sqrt{\pi/2} and \frac{4-\pi}{2}\sigma^2)
Distributions examples
110 SOLO Review of Probability
Rice Distribution — Example of Rice Distribution
The Rice Distribution applies to the statistics of the envelope of the output of a bandpass filter consisting of signal plus noise:
s(t) + n(t) = A\cos(\omega_0 t + \varphi) + n_C(t)\cos(\omega_0 t) - n_S(t)\sin(\omega_0 t)
= [n_C(t) + A\cos\varphi]\cos(\omega_0 t) - [n_S(t) + A\sin\varphi]\sin(\omega_0 t)
X = n_C(t) and Y = n_S(t) are Gaussian random variables with zero mean and the same variance σ², and φ is the unknown but constant signal phase. Define the output envelope R and phase Θ:
R = \sqrt{[n_C(t)+A\cos\varphi]^2 + [n_S(t)+A\sin\varphi]^2}
\Theta = \tan^{-1}\{[n_S(t)+A\sin\varphi]\,/\,[n_C(t)+A\cos\varphi]\}
Solution: with r\cos\theta = x + A\cos\varphi, \; r\sin\theta = y + A\sin\varphi,
p_{R\Theta}(r,\theta)\,dr\,d\theta = p_{XY}(x,y)\,dx\,dy = \frac{r}{2\pi\sigma^2}\exp\left(-\frac{r^2+A^2-2rA\cos(\theta-\varphi)}{2\sigma^2}\right)dr\,d\theta
p_R(r) = \int_0^{2\pi}p_{R\Theta}(r,\theta)\,d\theta = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right)\frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{rA\cos(\theta-\varphi)}{\sigma^2}\right)d\theta
111 SOLO Review of Probability
Rice Distribution — Example of Rice Distribution (continue – 1)
p_R(r) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right)\frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{rA\cos\varphi'}{\sigma^2}\right)d\varphi'
where \frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{rA\cos\varphi'}{\sigma^2}\right)d\varphi' = I_0\left(\frac{rA}{\sigma^2}\right) is the zero-order modified Bessel function of the first kind, so
p_R(r;A,\sigma) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right)I_0\left(\frac{Ar}{\sigma^2}\right)   (Rice Distribution)
Since I_0(0) = 1, if in the Rice Distribution we take A = 0 we obtain
p_R(r;0,\sigma) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right)   (Rayleigh Distribution)
Table of Content
112 SOLO Review of Probability
Weibull Distribution
Ernst Hjalmar Waloddi Weibull 1887-1979
Probability Density Function: p(x;\gamma,\mu,\alpha) = \frac{\gamma}{\alpha}\left(\frac{x-\mu}{\alpha}\right)^{\gamma-1}\exp\left[-\left(\frac{x-\mu}{\alpha}\right)^\gamma\right] for x \ge \mu, \; \gamma, \alpha > 0; \quad 0 for x < \mu
Cumulative Distribution Function: P(x;\gamma,\mu,\alpha) = \int_{-\infty}^{x}p(u;\gamma,\mu,\alpha)\,du = 1 - \exp\left[-\left(\frac{x-\mu}{\alpha}\right)^\gamma\right]
Mean Value (for \mu = 0): E(x) = \alpha\,\Gamma\left(1+\frac{1}{\gamma}\right)
Variance (for \mu = 0): Var(x) = \alpha^2\,\Gamma\left(1+\frac{2}{\gamma}\right) - E^2(x)
\Gamma is the gamma function: \Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt
Distributions examples
Table of Content
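Because the Weibull CDF inverts in closed form, sampling is straightforward by inverse transform: x = μ + α(−ln(1−U))^{1/γ}. A hedged sketch in Python (the parameter values γ = 2, μ = 0, α = 3 are hypothetical), checking the stated mean αΓ(1+1/γ):

```python
import math
import random

def sample_weibull(gamma_, mu, alpha, rng):
    # Inverse CDF obtained by solving P = 1 - exp(-((x-mu)/alpha)^gamma) for x
    return mu + alpha * (-math.log(1.0 - rng.random())) ** (1.0 / gamma_)

rng = random.Random(9)
g, mu, alpha, n = 2.0, 0.0, 3.0, 200_000
xs = [sample_weibull(g, mu, alpha, rng) for _ in range(n)]
mean = sum(xs) / n   # theory: mu + alpha*Gamma(1 + 1/gamma) = 3*Gamma(1.5)
```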
113 SOLO Kinetic Theory of Gases
MAXWELL'S VELOCITY DISTRIBUTION
JAMES CLERK MAXWELL (1831-1879)
IN 1859 MAXWELL PROPOSED THE FOLLOWING MODEL: ASSUME THAT THE VELOCITY COMPONENTS OF N MOLECULES, ENCLOSED IN A CUBE WITH SIDE l, ALONG EACH OF THE THREE COORDINATE AXES ARE INDEPENDENTLY AND IDENTICALLY DISTRIBUTED ACCORDING TO THE DENSITY f_0(\alpha) = f_0(-\alpha), I.E.,
f_0(\vec v)\,d^3v = f_0(v_x - v_{0x})\,f_0(v_y - v_{0y})\,f_0(v_z - v_{0z})\,dv_x\,dv_y\,dv_z = A\exp\left[-B(\vec v - \vec v_0)\cdot(\vec v - \vec v_0)\right]dv_x\,dv_y\,dv_z
f(v_i)\,dv_i = THE PROBABILITY THAT THE i-TH VELOCITY COMPONENT IS BETWEEN v_i AND v_i + dv_i; \; i = x, y, z
MAXWELL ASSUMED THAT THE DISTRIBUTION DEPENDS ONLY ON THE MAGNITUDE OF THE VELOCITY.
114 SOLO Kinetic Theory of Gases
MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)
LET US FIND THE CONSTANTS A, B AND \vec v_0 IN f_0(\vec v) = A\exp[-B(\vec v - \vec v_0)^2]
SINCE THE DEFINITION OF THE TOTAL NUMBER OF PARTICLES N IS
N = \int d^3r \int d^3v\; f(\vec r, \vec v, t)
WE HAVE IN EQUILIBRIUM
\frac{N}{V} = \int d^3v\; f_0(\vec v) = A\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\exp[-B(v_x^2+v_y^2+v_z^2)]\,dv_x\,dv_y\,dv_z = A\left(\frac{\pi}{B}\right)^{3/2}
WHERE V = \int d^3r IS THE VOLUME OF THE CONTAINER. IT FOLLOWS THAT B > 0 AND
A = \frac{N}{V}\left(\frac{B}{\pi}\right)^{3/2}
115 SOLO Kinetic Theory of Gases
MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)
LET US FIND THE CONSTANTS A, B AND \vec v_0 IN f_0(\vec v) = A\exp[-B(\vec v - \vec v_0)^2]
THE AVERAGE VELOCITY IS GIVEN BY
\langle\vec v\rangle = \frac{\int d^3v\;\vec v\,f_0(\vec v)}{\int d^3v\;f_0(\vec v)} = \frac{VA}{N}\int d^3v\,[(\vec v - \vec v_0) + \vec v_0]\exp[-B(\vec v - \vec v_0)^2] = \vec v_0
THE AVERAGE KINETIC ENERGY OF THE MOLECULES \bar\varepsilon, WHEN \vec v_0 = 0, IS
\bar\varepsilon = \frac{\int d^3v\;\frac{1}{2}mv^2 f_0(\vec v)}{\int d^3v\;f_0(\vec v)} = \frac{VAm}{2N}\int d^3v\;v^2\exp(-Bv^2) = \frac{3m}{4B}
WE FOUND ALSO THAT FOR A MONOATOMIC GAS \bar\varepsilon = \frac{3}{2}kT, THEREFORE
B = \frac{3m}{4\bar\varepsilon} = \frac{m}{2kT}, \qquad A = \frac{N}{V}\left(\frac{B}{\pi}\right)^{3/2} = \frac{N}{V}\left(\frac{m}{2\pi kT}\right)^{3/2}
116 SOLO Kinetic Theory of Gases
MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)
MAXWELL'S VELOCITY DISTRIBUTION BECOMES
f_0(\vec v) = \frac{N}{V}\left(\frac{m}{2\pi kT}\right)^{3/2}\exp\left(-\frac{m\,\vec v\cdot\vec v}{2kT}\right)
OR
f_0(\vec v)\,d^3v = f(v_x)f(v_y)f(v_z)\,dv_x\,dv_y\,dv_z = \frac{N}{V}\left(\frac{m}{2\pi kT}\right)^{3/2}\exp\left[-\frac{m(v_x^2+v_y^2+v_z^2)}{2kT}\right]dv_x\,dv_y\,dv_z
117 SOLO Kinetic Theory of Gases
MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)
f_0(\vec v) = \frac{N}{V}\left(\frac{m}{2\pi kT}\right)^{3/2}\exp\left(-\frac{m\,\vec v\cdot\vec v}{2kT}\right)
MAXWELL'S SPEED DISTRIBUTION IS THE CHI-DISTRIBUTION WITH k = 3:
p_{X_k}(\chi_k) = \frac{1}{2^{k/2-1}\Gamma(k/2)\,\sigma^k}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k)
Table of Content
118 SOLO Kinetic Theory of Gases
MOLECULAR MODELS
BOLTZMANN STATISTICS (LUDWIG BOLTZMANN)
• DISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w = N!\prod_j \frac{g_j^{N_j}}{N_j!}
BOSE-EINSTEIN STATISTICS (SATYENDRA NATH BOSE, ALBERT EINSTEIN)
• INDISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w = \prod_j \frac{(g_j+N_j-1)!}{(g_j-1)!\,N_j!}
FERMI-DIRAC STATISTICS (ENRICO FERMI, PAUL A. M. DIRAC)
• INDISTINGUISHABLE PARTICLES
• ONE PARTICLE PER QUANTUM STATE
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}
CONSTRAINTS: N = \sum_j N_j, \qquad E = \sum_j \varepsilon'_j N_j
Table of Content
119 SOLO Kinetic Theory of Gases
MOLECULAR MODELS — BOLTZMANN STATISTICS (LUDWIG BOLTZMANN)
• DISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g_1, g_2, \ldots, g_j AT THE ENERGY LEVELS \varepsilon'_1, \varepsilon'_2, \ldots, \varepsilon'_j
- NUMBER OF PARTICLES N_1, N_2, \ldots, N_j IN STATES g_1, g_2, \ldots, g_j
THE NUMBER OF WAYS N DISTINGUISHABLE PARTICLES (N = \sum_j N_j) CAN BE DIVIDED IN GROUPS WITH N_1, N_2, \ldots, N_j, \ldots PARTICLES IS \frac{N!}{\prod_j N_j!}
THE NUMBER OF WAYS N_j PARTICLES CAN BE PLACED IN THE g_j STATES IS g_j^{N_j}
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}
120 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOLTZMANN STATISTICS: w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}
USING THE STIRLING FORMULA \ln(a!) \approx a\ln a - a:
\ln w = \ln N! + \sum_j (N_j\ln g_j - \ln N_j!) \approx N\ln N - N + \sum_j (N_j\ln g_j - N_j\ln N_j + N_j)
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST COMPUTE THE DIFFERENTIAL
d(\ln w) = \sum_j (\ln g_j - \ln N_j)\,dN_j = 0
CONSTRAINED BY
N = \sum_j N_j \;\Rightarrow\; dN = \sum_j dN_j = 0
E = \sum_j \varepsilon'_j N_j \;\Rightarrow\; dE = \sum_j \varepsilon'_j\,dN_j = 0
121 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOLTZMANN STATISTICS (CONTINUE): w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}
WE OBTAINED
d(\ln w) = -\sum_j \ln\left(\frac{N_j}{g_j}\right)dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\,dN_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS \alpha, \beta TO OBTAIN
\sum_j\left[\ln\left(\frac{N_j}{g_j}\right) + \alpha + \beta\varepsilon'_j\right]dN_j = 0 \;\Rightarrow\; \ln\left(\frac{N_j^*}{g_j}\right) + \alpha + \beta\varepsilon'_j = 0
OR
N_j^*\big|_{Boltz} = g_j\,e^{-\alpha}e^{-\beta\varepsilon'_j}
BOLTZMANN MOST PROBABLE MACROSTATE
Table of Content
122 SOLO Kinetic Theory of Gases
MOLECULAR MODELS — BOSE-EINSTEIN STATISTICS
SATYENDRA NATH BOSE (1894-1974), ALBERT EINSTEIN (1879-1955)
• INDISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g_1, g_2, \ldots, g_j AT THE ENERGY LEVELS \varepsilon'_1, \varepsilon'_2, \ldots, \varepsilon'_j
- NUMBER OF PARTICLES N_1, N_2, \ldots, N_j (N = \sum_j N_j) IN STATES g_1, g_2, \ldots, g_j
THE NUMBER OF WAYS N_j INDISTINGUISHABLE PARTICLES CAN BE PLACED IN THE g_j STATES IS \frac{(g_j+N_j-1)!}{(g_j-1)!\,N_j!}
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w_{B\text{-}E} = \prod_j \frac{(g_j+N_j-1)!}{(g_j-1)!\,N_j!}
123 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS (CONTINUE)
w_{B\text{-}E} = \prod_j \frac{(g_j+N_j-1)!}{(g_j-1)!\,N_j!} \approx \prod_j \frac{(g_j+N_j)!}{g_j!\,N_j!}
USING THE STIRLING FORMULA \ln(a!) \approx a\ln a - a:
\ln w \approx \sum_j\left[(g_j+N_j)\ln(g_j+N_j) - g_j\ln g_j - N_j\ln N_j\right] = \sum_j\left[N_j\ln\left(1+\frac{g_j}{N_j}\right) + g_j\ln\left(1+\frac{N_j}{g_j}\right)\right]
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST COMPUTE THE DIFFERENTIAL
d(\ln w) = \sum_j \ln\left(1+\frac{g_j}{N_j}\right)dN_j = 0
124 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS (CONTINUE)
d(\ln w) = \sum_j \ln\left(1+\frac{g_j}{N_j}\right)dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\,dN_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS \alpha, \beta TO OBTAIN
\sum_j\left[\ln\left(1+\frac{g_j}{N_j}\right) - \alpha - \beta\varepsilon'_j\right]dN_j = 0 \;\Rightarrow\; \ln\left(1+\frac{g_j}{N_j^*}\right) - \alpha - \beta\varepsilon'_j = 0
OR
\left.\frac{N_j^*}{g_j}\right|_{B\text{-}E} = \frac{1}{e^{\alpha}e^{\beta\varepsilon'_j} - 1}
BOSE-EINSTEIN MOST PROBABLE MACROSTATE
Table of Content
125 SOLO Kinetic Theory of Gases
MOLECULAR MODELS — FERMI-DIRAC STATISTICS
ENRICO FERMI (1901-1954), PAUL A. M. DIRAC (1902-1984)
• INDISTINGUISHABLE PARTICLES
• ONE PARTICLE PER QUANTUM STATE
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g_1, g_2, \ldots, g_j AT THE ENERGY LEVELS \varepsilon'_1, \varepsilon'_2, \ldots, \varepsilon'_j
- NUMBER OF PARTICLES N_1, N_2, \ldots, N_j (N = \sum_j N_j) IN STATES g_1, g_2, \ldots, g_j
THE NUMBER OF WAYS N_j INDISTINGUISHABLE PARTICLES CAN BE PLACED IN THE g_j STATES (AT MOST ONE PER STATE) IS \frac{g_j!}{N_j!\,(g_j-N_j)!}
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w_{F\text{-}D} = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}
126 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
FERMI-DIRAC STATISTICS (CONTINUE)
w_{F\text{-}D} = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}
USING THE STIRLING FORMULA \ln(a!) \approx a\ln a - a:
\ln w \approx \sum_j\left[g_j\ln g_j - N_j\ln N_j - (g_j-N_j)\ln(g_j-N_j)\right]
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST COMPUTE THE DIFFERENTIAL
d(\ln w) = \sum_j\left[\ln(g_j-N_j) - \ln N_j\right]dN_j = \sum_j \ln\left(\frac{g_j-N_j}{N_j}\right)dN_j = 0
127 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
FERMI-DIRAC STATISTICS (CONTINUE)
d(\ln w) = \sum_j \ln\left(\frac{g_j-N_j}{N_j}\right)dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\,dN_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS \alpha, \beta TO OBTAIN
\sum_j\left[\ln\left(\frac{g_j-N_j}{N_j}\right) - \alpha - \beta\varepsilon'_j\right]dN_j = 0 \;\Rightarrow\; \ln\left(\frac{g_j-N_j^*}{N_j^*}\right) - \alpha - \beta\varepsilon'_j = 0
OR
\left.\frac{N_j^*}{g_j}\right|_{F\text{-}D} = \frac{1}{e^{\alpha}e^{\beta\varepsilon'_j} + 1}
FERMI-DIRAC MOST PROBABLE MACROSTATE
128 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOLTZMANN: w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!} \qquad BOSE-EINSTEIN: w_{B\text{-}E} = \prod_j \frac{(g_j+N_j-1)!}{(g_j-1)!\,N_j!} \qquad FERMI-DIRAC: w_{F\text{-}D} = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}
FOR GASES AT LOW PRESSURES OR HIGH TEMPERATURES THE NUMBER OF QUANTUM STATES g_j AVAILABLE AT ANY LEVEL IS MUCH LARGER THAN THE NUMBER OF PARTICLES N_j IN THAT LEVEL: g_j \gg N_j
\frac{(g_j+N_j-1)!}{(g_j-1)!} = (g_j+N_j-1)(g_j+N_j-2)\cdots g_j \approx g_j^{N_j}, \qquad \frac{g_j!}{(g_j-N_j)!} = g_j(g_j-1)\cdots(g_j-N_j+1) \approx g_j^{N_j}
SO
w_{B\text{-}E}\big|_{g_j \gg N_j} \approx w_{F\text{-}D}\big|_{g_j \gg N_j} \approx \frac{w_{Boltz}}{N!} = \prod_j \frac{g_j^{N_j}}{N_j!}
AND
N_j^*\big|_{B\text{-}E} \approx N_j^*\big|_{F\text{-}D} \approx N_j^*\big|_{Boltz} = g_j\,e^{-\alpha}e^{-\beta\varepsilon'_j}
129 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
w_{B\text{-}E}\big|_{g_j \gg N_j} \approx w_{F\text{-}D}\big|_{g_j \gg N_j} \approx \frac{w_{Boltz}}{N!} = \prod_j \frac{g_j^{N_j}}{N_j!}, \qquad N_j^*\big|_{B\text{-}E} \approx N_j^*\big|_{F\text{-}D} \approx N_j^*\big|_{Boltz} = g_j\,e^{-\alpha}e^{-\beta\varepsilon'_j}
DIVIDING THE VALUE OF w FOR BOLTZMANN STATISTICS, WHICH ASSUMED DISTINGUISHABLE PARTICLES, BY N! HAS THE EFFECT OF DISCOUNTING THE DISTINGUISHABILITY OF THE N PARTICLES.
Table of Content
130 SOLO Review of Probability
Monte Carlo Method
Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used when simulating physical and mathematical systems. Because of their reliance on repeated computation and random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm.
The term Monte Carlo method was coined in the 1940s by physicists Stanislaw Ulam, Enrico Fermi, John von Neumann, and Nicholas Metropolis, working on nuclear weapon projects in the Los Alamos National Laboratory (a reference to the Monte Carlo Casino in Monaco, where Ulam's uncle would borrow money to gamble).
Stanislaw Ulam 1909-1984; Enrico Fermi 1901-1954; John von Neumann 1903-1957; Nicholas Constantine Metropolis 1915-1999
131 SOLO Review of Probability
Monte Carlo Approximation
Monte Carlo runs generate a set of random samples x^{(L)} \sim p(x) that approximate the distribution p(x). So, with P samples, expectations with respect to the distribution are approximated by
\int f(x)\,p(x)\,dx \approx \frac{1}{P}\sum_{L=1}^{P} f(x^{(L)})
and, in the usual way for Monte Carlo, this gives all the moments etc. of the distribution up to some degree of approximation:
\mu_1 = E\{x\} = \int x\,p(x)\,dx \approx \frac{1}{P}\sum_{L=1}^{P} x^{(L)}
\mu_n = E\{(x-\mu_1)^n\} = \int (x-\mu_1)^n\,p(x)\,dx \approx \frac{1}{P}\sum_{L=1}^{P}\left(x^{(L)}-\mu_1\right)^n
x^{(L)} are samples generated (drawn) from the distribution p(x)
Table of Content
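The approximation above is easy to demonstrate numerically. In this sketch (Python; the choice p(x) = N(0, 1), f(x) = x², for which E{f} = 1, is a hypothetical example):

```python
import random

# Monte Carlo approximation of an expectation: E{f(x)} ~ (1/P) * sum_L f(x^(L)),
# with the x^(L) drawn from p(x). Here p(x) = N(0, 1) and f(x) = x^2, so E{f} = 1.
rng = random.Random(0)
P = 200_000
estimate = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(P)) / P
```

The estimate converges at the usual Monte Carlo rate O(1/√P).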
132 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (Unknown Statistics)
A random variable x may take on any value in the range -∞ to +∞. Based on a sample of k values x_i, i = 1, 2, ..., k, we wish to compute the sample mean \hat m_k and sample variance \hat\sigma_k^2 as estimates of the population mean m and variance σ². Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples:
E\{x_i\} = E\{x_j\} = m, \qquad E\{x_i^2\} = E\{x_j^2\} = \sigma^2 + m^2, \qquad E\{x_i x_j\} \overset{indep.}{=} E\{x_i\}E\{x_j\} = m^2 \;(i \ne j)
Define the estimate of the population mean: \hat m_k := \frac{1}{k}\sum_{i=1}^{k} x_i; then
E\{\hat m_k\} = \frac{1}{k}\sum_{i=1}^{k} E\{x_i\} = m   (Unbiased)
Compute
E\left\{\frac{1}{k}\sum_{i=1}^{k}(x_i-\hat m_k)^2\right\} = \frac{1}{k}\sum_{i=1}^{k}E\{x_i^2\} - E\{\hat m_k^2\} = (\sigma^2+m^2) - \left(\frac{\sigma^2}{k}+m^2\right) = \frac{k-1}{k}\,\sigma^2   (Biased)
133 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 1)
Since E\left\{\frac{1}{k}\sum_{i=1}^{k}(x_i-\hat m_k)^2\right\} = \frac{k-1}{k}\,\sigma^2 is biased, the unbiased estimate of the sample variance of the population is defined as
\hat\sigma_k^2 := \frac{1}{k-1}\sum_{i=1}^{k}(x_i-\hat m_k)^2
since
E\{\hat\sigma_k^2\} = E\left\{\frac{1}{k-1}\sum_{i=1}^{k}(x_i-\hat m_k)^2\right\} = \sigma^2   (Unbiased)
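The bias factor (k−1)/k is easy to see empirically: average both estimators over many small samples. A sketch in Python (k = 5 and N(0, 1) samples are hypothetical illustration choices):

```python
import random

# Average the 1/k (biased) and 1/(k-1) (unbiased) sample-variance estimators
# over many independent samples of size k drawn from N(0, 1), where sigma^2 = 1.
rng = random.Random(2)
k, trials = 5, 50_000
biased_avg = unbiased_avg = 0.0
for _ in range(trials):
    xs = [rng.gauss(0.0, 1.0) for _ in range(k)]
    m_hat = sum(xs) / k
    s = sum((x - m_hat) ** 2 for x in xs)
    biased_avg += s / k
    unbiased_avg += s / (k - 1)
biased_avg /= trials      # theory: (k-1)/k * sigma^2 = 0.8
unbiased_avg /= trials    # theory: sigma^2 = 1.0
```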
134 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 2)
Summary: with i.i.d. samples x_i, i = 1, ..., k,
E\{\hat m_k\} = E\left\{\frac{1}{k}\sum_{i=1}^{k} x_i\right\} = m, \qquad E\{\hat\sigma_k^2\} = E\left\{\frac{1}{k-1}\sum_{i=1}^{k}(x_i-\hat m_k)^2\right\} = \sigma^2
135 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 3)
Let us compute the variance of the sample-mean estimate:
\sigma^2_{\hat m_k} := E\{(\hat m_k - m)^2\} = E\left\{\left(\frac{1}{k}\sum_{i=1}^{k}(x_i-m)\right)^2\right\} = \frac{1}{k^2}\left[\sum_{i=1}^{k}E\{(x_i-m)^2\} + \sum_{i\ne j}\underbrace{E\{(x_i-m)(x_j-m)\}}_{=\,0}\right] = \frac{k\sigma^2}{k^2}
\sigma^2_{\hat m_k} = E\{(\hat m_k - m)^2\} = \frac{\sigma^2}{k}
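A direct empirical check of σ²_{m̂_k} = σ²/k (sketched in Python; k = 10, σ = 2 are hypothetical values): form many independent sample means of k draws and measure their spread.

```python
import random

# Empirical check that the variance of the sample mean of k draws is sigma^2 / k.
rng = random.Random(4)
k, trials, sigma = 10, 50_000, 2.0
means = [sum(rng.gauss(0.0, sigma) for _ in range(k)) / k for _ in range(trials)]
grand_mean = sum(means) / trials
var_of_mean = sum((m - grand_mean) ** 2 for m in means) / (trials - 1)
# theory: sigma^2 / k = 4/10 = 0.4
```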
136 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)
Let us compute the variance of the sample-variance estimate:
\sigma^2_{\hat\sigma_k^2} := E\{(\hat\sigma_k^2 - \sigma^2)^2\} = E\left\{\left(\frac{1}{k-1}\sum_{i=1}^{k}(x_i-\hat m_k)^2 - \sigma^2\right)^2\right\}
Write x_i - \hat m_k = (x_i - m) - (\hat m_k - m) and expand. Since (x_i - m), (x_j - m) and (\hat m_k - m) are all independent for i \ne j, the cross terms of first order vanish in expectation, and the leading terms involve the fourth central moment \mu_4 := E\{(x_i - m)^4\}.
  • 137.
137 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)
Since (x_i − m) and (x_j − m) are independent for i ≠ j, collecting the surviving terms and keeping only the leading order in 1/k gives, with $\mu_4 := E\{(x_i-m)^4\}$:
$$\sigma^2_{\hat\sigma_k^2} \approx \frac{\mu_4-\sigma^4}{k}$$
138 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 5)
We found:
$$E\{\hat m_k\} = m,\qquad E\{\hat\sigma_k^2\} = \sigma^2$$
$$\sigma^2_{\hat m_k} := E\{(m-\hat m_k)^2\} = \frac{\sigma^2}{k}$$
$$\sigma^2_{\hat\sigma_k^2} := E\{(\sigma^2-\hat\sigma_k^2)^2\} \approx \frac{\mu_4-\sigma^4}{k},\qquad \mu_4 := E\{(x_i-m)^4\}$$
Define the kurtosis of the random variable x_i:
$$\lambda := \frac{\mu_4}{\sigma^4}$$
Then:
$$\sigma^2_{\hat\sigma_k^2} := E\{(\sigma^2-\hat\sigma_k^2)^2\} \approx \frac{(\lambda-1)\,\sigma^4}{k}$$
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 6)
For high values of k, according to the Central Limit Theorem, the estimates $\hat m_k$ of the mean and $\hat\sigma_k^2$ of the variance are approximately Gaussian random variables:
$$(\hat m_k - m) \sim \mathcal N\!\left(0,\ \sigma^2/k\right) \quad\&\quad (\hat\sigma_k^2 - \sigma^2) \sim \mathcal N\!\left(0,\ (\lambda-1)\sigma^4/k\right)$$
We want to find a region around $\hat\sigma_k^2$ that will contain σ² with a predefined probability φ as a function of the number of iterations k:
$$\mathrm{Prob}\left[0 \le |\hat\sigma_k^2 - \sigma^2| \le n_\sigma\,\sigma_{\hat\sigma_k^2}\right] = \varphi$$
Since $\hat\sigma_k^2$ is approximately Gaussian, n_σ is given by solving:
$$\frac{1}{\sqrt{2\pi}}\int_{-n_\sigma}^{+n_\sigma}\exp\left(-\frac{\zeta^2}{2}\right)d\zeta = \varphi$$
Cumulative probability within n_σ standard deviations of the mean for a Gaussian random variable:
n_σ    φ
1.000  0.6827
1.645  0.9000
1.960  0.9500
2.576  0.9900
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 7)
$$\mathrm{Prob}\left[0 \le |\hat\sigma_k^2 - \sigma^2| \le n_\sigma\,\sigma_{\hat\sigma_k^2}\right] = \varphi,\qquad \sigma_{\hat\sigma_k^2} = \sqrt{\frac{\lambda-1}{k}}\,\sigma^2$$
$$\sigma^2 - n_\sigma\sqrt{\frac{\lambda-1}{k}}\,\sigma^2 \;\le\; \hat\sigma_k^2 \;\le\; \sigma^2 + n_\sigma\sqrt{\frac{\lambda-1}{k}}\,\sigma^2$$
$$\left(1 - n_\sigma\sqrt{\frac{\lambda-1}{k}}\right)\sigma^2 \;\le\; \hat\sigma_k^2 \;\le\; \left(1 + n_\sigma\sqrt{\frac{\lambda-1}{k}}\right)\sigma^2$$
Solving for σ²:
$$\overline\sigma^2 := \frac{\hat\sigma_k^2}{1 - n_\sigma\sqrt{(\lambda-1)/k}} \;\ge\; \sigma^2 \;\ge\; \frac{\hat\sigma_k^2}{1 + n_\sigma\sqrt{(\lambda-1)/k}} =: \underline\sigma^2$$
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 8)
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 9)
143 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 10)
$$\underline\sigma := \frac{\hat\sigma_{k_0}}{\sqrt{1 + n_\sigma\sqrt{(\lambda-1)/k}}} \quad\&\quad \overline\sigma := \frac{\hat\sigma_{k_0}}{\sqrt{1 - n_\sigma\sqrt{(\lambda-1)/k}}}$$
Monte-Carlo Procedure
1. Choose the Confidence Level φ and find the corresponding n_σ using the normal (Gaussian) distribution (n_σ / φ: 1.000 / 0.6827, 1.645 / 0.9000, 1.960 / 0.9500, 2.576 / 0.9900).
2. Run a few samples k_0 > 20 and estimate λ according to:
$$\hat m_{k_0} := \frac{1}{k_0}\sum_{i=1}^{k_0} x_i,\qquad \hat\lambda := \frac{\dfrac{1}{k_0}\sum_{i=1}^{k_0}\left(x_i-\hat m_{k_0}\right)^4}{\left[\dfrac{1}{k_0}\sum_{i=1}^{k_0}\left(x_i-\hat m_{k_0}\right)^2\right]^2}$$
3. Compute $\underline\sigma$ and $\overline\sigma$ as functions of k.
4. Find k for which $\mathrm{Prob}\left[0 \le |\hat\sigma_k^2-\sigma^2| \le n_\sigma\,\sigma_{\hat\sigma_k^2}\right] = \varphi$.
5. Run k − k_0 simulations.
144 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 11)
Monte-Carlo Procedure
Example: Assume a Gaussian distribution, for which λ = 3.
1. Choose the Confidence Level φ = 95 %, which gives the corresponding n_σ = 1.96.
2. The kurtosis is λ = 3, so $\sigma_{\hat\sigma_k^2} = \sqrt{2/k}\,\sigma^2$ and:
$$\mathrm{Prob}\left[0 \le |\hat\sigma_k^2-\sigma^2| \le 1.96\sqrt{\frac2k}\,\sigma^2\right] = 0.95$$
3. Assume also that we require $|\hat\sigma_k^2-\sigma^2| \le 0.1\,\sigma^2$ with probability φ = 95 %. Find k from:
$$1.96\sqrt{\frac2k} = 0.1 \quad\Rightarrow\quad k \approx 800$$
4. Run k > 800 simulations.
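The slide's k ≈ 800 figure can be reproduced by solving n_σ√((λ−1)/k) = (relative tolerance) for k; a small sketch (the function name is ours):

```python
from math import ceil

def required_samples(n_sigma, kurtosis, rel_tol):
    """Smallest k with n_sigma * sqrt((kurtosis - 1)/k) <= rel_tol,
    i.e. the variance estimate is within rel_tol * sigma^2 at the
    chosen confidence level."""
    return ceil((n_sigma / rel_tol) ** 2 * (kurtosis - 1))

# Gaussian samples (lambda = 3), 95% confidence (n_sigma = 1.96),
# 10% relative tolerance on sigma^2:
k = required_samples(1.96, 3.0, 0.1)  # exact solution is ~768, i.e. roughly 800
```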
145 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 12)
Kurtosis of a random variable x_i
Kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning bulging) is a measure of the "peakedness" of the probability distribution of a real-valued random variable:
$$\lambda := \frac{E\{(x_i-m)^4\}}{\left[E\{(x_i-m)^2\}\right]^2}$$
Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly-sized deviations.
1905: Karl Pearson (1857–1936) defined kurtosis as a measure of departure from normality in a paper published in Biometrika. λ = 3 for the normal distribution, and the terms "leptokurtic" (λ > 3), "mesokurtic" (λ = 3) and "platykurtic" (λ < 3) were introduced.
A leptokurtic distribution has a more acute "peak" around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and "fat tails" (that is, a higher probability than a normally distributed variable of extreme values). A platykurtic distribution has a smaller "peak" around the mean (that is, a lower probability than a normally distributed variable of values near the mean) and "thin tails" (that is, a lower probability than a normally distributed variable of extreme values).
146 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 13)
Distribution (Graphical and Functional Representation) | Kurtosis λ | Excess Kurtosis λ−3
Normal: $\dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$ | 3 | 0
Laplace: $\dfrac{1}{2b}\exp\left(-\dfrac{|x-\mu|}{b}\right)$ | 6 | 3
Hyperbolic secant: $\dfrac12\operatorname{sech}\left(\dfrac{\pi}{2}x\right)$ | 5 | 2
Uniform: $\dfrac{1}{b-a}$ for $a\le x\le b$, 0 for $x<a$ or $x>b$ | 1.8 | −1.2
Wigner semicircle: $\dfrac{2}{\pi R^2}\sqrt{R^2-x^2}$ for $|x|\le R$, 0 for $|x|>R$ | 2 | −1
147 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 14)
Skewness of a random variable x_i
Skewness:
$$\gamma := \frac{E\{(x_i-m)^3\}}{\left[E\{(x_i-m)^2\}\right]^{3/2}}$$
1. Negative skew: the left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed. There is more data in the left tail than would be expected in a normal distribution.
2. Positive skew: the right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed. There is more data in the right tail than would be expected in a normal distribution.
Karl Pearson (1857–1936) suggested two simpler calculations as a measure of skewness:
• (mean − mode) / standard deviation
• 3 (mean − median) / standard deviation
148 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics)
A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, x_i, i = 1,2,…,k, we wish to estimate the sample mean, $\hat x_k$, and the variance, p_k, by a Recursive Filter.
We found that using k measurements the estimated mean and variance are given in batch form by:
$$\hat x_k := \frac1k\sum_{i=1}^k x_i,\qquad p_k := \frac{1}{k-1}\sum_{i=1}^k \left(x_i-\hat x_k\right)^2$$
The k+1 measurement will give:
$$\hat x_{k+1} = \frac{1}{k+1}\sum_{i=1}^{k+1} x_i = \frac{1}{k+1}\left(k\,\hat x_k + x_{k+1}\right)$$
Therefore the Recursive Filter form for the k+1 measurement will be:
$$\hat x_{k+1} = \hat x_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$$
and for the variance:
$$p_{k+1} = \frac1k\sum_{i=1}^{k+1}\left(x_i-\hat x_{k+1}\right)^2$$
149 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics) (continue – 1)
A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, x_i, i = 1,2,…,k, we wish to estimate the sample mean, $\hat x_k$, and the variance, p_k, by a Recursive Filter.
We found that using k+1 measurements the estimated variance is given in batch form by:
$$p_{k+1} = \frac1k\sum_{i=1}^{k+1}\left(x_i-\hat x_{k+1}\right)^2,\qquad \hat x_{k+1} = \hat x_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$$
Writing $x_i-\hat x_{k+1} = (x_i-\hat x_k) - (\hat x_{k+1}-\hat x_k)$ and expanding the square (the cross term contains $\sum_{i=1}^{k}(x_i-\hat x_k) = 0$), one obtains the recursive form:
$$p_{k+1} = p_k + \frac{1}{k+1}\left[\left(x_{k+1}-\hat x_k\right)^2 - \frac{k+1}{k}\,p_k\right]$$
150 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics) (continue – 2)
A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, x_i, i = 1,2,…,k, we wish to estimate the sample mean, $\hat x_k$, and the variance, p_k, by a Recursive Filter:
$$\hat x_{k+1} = \hat x_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$$
$$p_{k+1} = p_k + \frac{1}{k+1}\left[\left(x_{k+1}-\hat x_k\right)^2 - \frac{k+1}{k}\,p_k\right]$$
Using $\left(x_{k+1}-\hat x_k\right) = (k+1)\left(\hat x_{k+1}-\hat x_k\right)$:
$$p_{k+1} = p_k + (k+1)\left(\hat x_{k+1}-\hat x_k\right)^2 - \frac{1}{k}\,p_k$$
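The recursive filter above can be sketched in a few lines; `recursive_mean_var` is an illustrative name, and the update is exactly the p_{k+1} recursion derived on the slide:

```python
def recursive_mean_var(xs):
    """One-pass recursive estimates matching the batch definitions
    x_hat_k = (1/k) sum x_i and p_k = (1/(k-1)) sum (x_i - x_hat_k)^2."""
    x_hat, p = xs[0], 0.0                  # after the first sample: k = 1
    for k, x in enumerate(xs[1:], start=1):  # processing sample k+1
        d = x - x_hat                      # innovation x_{k+1} - x_hat_k
        x_hat += d / (k + 1)               # mean update
        p += (d * d - (k + 1) / k * p) / (k + 1)  # variance update
    return x_hat, p

m_hat, p_hat = recursive_mean_var([1.0, 2.0, 4.0, 8.0])
```

For this small list the batch values are mean 3.75 and variance 28.75/3, and the recursion reproduces them exactly.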
151 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Discrete Recursive Filter
Estimate the value of a constant x, given discrete measurements of x corrupted by an uncorrelated Gaussian noise sequence with zero mean and variance r_0. The scalar equations describing this situation are:
System: $x_{k+1} = x_k$ (general form $x_{k+1} = \Phi_k x_k + \Gamma_k w_k$ with $\Phi_k = I$, $\Gamma_k = 0$)
Measurement: $z_k = x_k + v_k$, $v_k \sim \mathcal N(0, r_0)$ (general form $z_k = H_k x_k + v_k$ with $H_k = I$)
The Discrete Kalman Filter is given by:
$$\hat x_{k+1}(-) = \hat x_k(+),\qquad p_{k+1}(-) := E\left\{\left[x_{k+1}-\hat x_{k+1}(-)\right]^2\right\} = \Phi\,p_k(+)\,\Phi^T + \Gamma\,Q\,\Gamma^T = p_k(+)$$
$$\hat x_{k+1}(+) = \hat x_{k+1}(-) + \underbrace{p_{k+1}(-)\left[p_{k+1}(-) + r_0\right]^{-1}}_{K_{k+1}}\left[z_{k+1} - \hat x_{k+1}(-)\right]$$
$$p_{k+1}(+) := E\left\{\left[x_{k+1}-\hat x_{k+1}(+)\right]^2\right\} = p_{k+1}(-) - \frac{p_{k+1}(-)^2}{p_{k+1}(-)+r_0} = \frac{p_{k+1}(-)\,r_0}{p_{k+1}(-)+r_0} = \frac{p_k(+)\,r_0}{p_k(+)+r_0}$$
152 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Discrete Recursive Filter (continue – 1)
Estimate the value of a constant x, given discrete measurements of x corrupted by an uncorrelated Gaussian noise sequence with zero mean and variance r_0. We found that the Discrete Kalman Filter is given by:
$$\hat x_{k+1}(+) = \hat x_k(+) + K_{k+1}\left[z_{k+1}-\hat x_k(+)\right],\qquad K_{k+1} = \frac{p_k(+)}{p_k(+)+r_0},\qquad p_{k+1}(+) = \frac{p_k(+)\,r_0}{p_k(+)+r_0}$$
k = 0: $\;p_1(+) = \dfrac{p_0\,r_0}{p_0+r_0} = \dfrac{p_0}{1+p_0/r_0}$; k = 1: $\;p_2(+) = \dfrac{p_0}{1+2\,p_0/r_0}$; and by induction:
$$p_k(+) = \frac{p_0}{1+k\,p_0/r_0}$$
The gain becomes:
$$K_{k+1} = \frac{p_k(+)}{p_k(+)+r_0} = \frac{p_0/r_0}{1+(k+1)\,p_0/r_0}$$
so that:
$$\hat x_{k+1}(+) = \hat x_k(+) + \frac{p_0/r_0}{1+(k+1)\,p_0/r_0}\left[z_{k+1}-\hat x_k(+)\right]$$
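As an illustrative sketch (not from the slides), the scalar filter above reduces to the running average of the measurements when the prior covariance p_0 is much larger than r_0, since the gain then tends to 1/(k+1):

```python
def kf_constant(zs, p0, r0):
    """Scalar discrete Kalman filter estimating a constant from
    measurements z_k = x + v_k, v_k ~ N(0, r0)."""
    x_hat, p = 0.0, p0
    for z in zs:
        k_gain = p / (p + r0)           # K_{k+1} = p_k(+)/(p_k(+)+r0)
        x_hat += k_gain * (z - x_hat)   # measurement update
        p = p * r0 / (p + r0)           # covariance update, p_k = p0/(1+k p0/r0)
    return x_hat, p

zs = [4.1, 3.9, 4.2, 3.8, 4.0]
# With a diffuse prior (p0 >> r0) the filter reduces to the running average:
x_hat, p = kf_constant(zs, p0=1e12, r0=1.0)
```

Here the sample mean of the five measurements is 4.0, and after k = 5 steps the covariance approaches r_0/k = 0.2, as the closed-form p_k(+) predicts.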
153 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Continuous Recursive Filter
Estimate the value of a constant x, given continuous measurements of x corrupted by an uncorrelated Gaussian noise with zero mean and variance r. The scalar equations describing this situation are:
System: $\dot x = 0$; Measurement: $z = x + v$, $v \sim \mathcal N(0,r)$
The Continuous Kalman Filter is given by:
$$\dot{\hat x}(t) = \underbrace{p(t)\,r^{-1}}_{K}\left[z(t)-\hat x(t)\right],\qquad \hat x(0) = 0$$
with the covariance $p(t) := E\left\{\left[x(t)-\hat x(t)\right]^2\right\}$ satisfying the Riccati equation (here A = 0, Q = 0, H = I):
$$\dot p(t) = A\,p + p\,A^T + G\,Q\,G^T - p\,H^T r^{-1} H\,p = -p^2(t)\,r^{-1},\qquad p(0) = p_0$$
or:
$$\int_{p_0}^{p}\frac{dp}{p^2} = -\frac1r\int_0^t dt \quad\Rightarrow\quad p(t) = \frac{p_0}{1+\dfrac{p_0}{r}t}$$
$$K(t) = p(t)\,r^{-1} = \frac{p_0/r}{1+\dfrac{p_0}{r}t},\qquad \dot{\hat x} = \frac{p_0/r}{1+\dfrac{p_0}{r}t}\left[z-\hat x(t)\right]$$
154 SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
• First attempts to generate "random numbers": draw balls out of a stirred urn; roll dice.
• 1927: L.H.C. Tippett published a table of 40,000 digits taken "at random" from census reports.
• 1939: M.G. Kendall and B. Babington-Smith created a mechanical machine to generate random numbers. They published a table of 100,000 digits.
• 1946: J. von Neumann proposed the "middle square method".
• 1948: D.H. Lehmer introduced the "linear congruential method".
• 1955: RAND Corporation published a table of 1,000,000 random digits obtained from electronic noise.
• 1965: M.D. MacLaren and G. Marsaglia proposed to combine two congruential generators.
• 1989: R.S. Wikramaratna proposed the additive congruential method.
Of the routine RANDU (IBM Corp.) it was said: "We guarantee that each number is random individually, but we don't guarantee that more than one of them is random."
155 SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
On a computer the "random numbers" are not random at all: they are strictly deterministic and reproducible, but they look like a stream of random numbers. For this reason the computer programs are called "Pseudo-Random Number Generators".
Essential Properties of a Pseudo-Random Number Generator
Repeatability – the same sequence should be produced with the same initial values (or seeds).
Randomness – should produce independent uniformly distributed random variables that pass all statistical tests for randomness.
Long Period – a pseudo-random number sequence uses finite-precision arithmetic, so the sequence must repeat itself with a finite period. This should be much longer than the amount of random numbers needed for the simulation.
Insensitivity to seeds – period and randomness properties should not depend on the initial seeds.
156 SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
Essential Properties of a Pseudo-Random Number Generator (continue - 1)
Portability – should give the same results on different computers.
Efficiency – should be fast (small number of floating-point operations) and not use much memory.
Disjoint subsequences – different seeds should produce long independent (disjoint) subsequences so that there are no correlations between simulations with different initial seeds.
Homogeneity – sequences of all bits should be random.
157 SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
A Random Number represents the value of a random variable uniformly distributed on (0,1). Pseudo-Random Numbers constitute a sequence of values which, although deterministically generated, have all the appearances of being independent and uniformly distributed on (0,1).
One approach – the multiplicative congruential method:
1. Define x_0 = integer initial condition, or seed.
2. Using integers a and m, recursively compute:
$$x_n = a\,x_{n-1}\ \mathrm{modulo}\ m\qquad\left(a\,x_{n-1} = k\,m + x_n,\ \ k\in\text{Integers},\ \ x_n < m\right)$$
Therefore x_n takes the values 0,1,…,m−1, and the quantity u_n = x_n/m, called a pseudo-random number, is an approximation to the value of a uniform (0,1) random variable.
In general the integers a and m should be chosen to satisfy three criteria:
1. For any initial seed, the resultant sequence has the "appearance" of being a sequence of independent uniform (0,1) random variables.
2. For any initial seed, the number of variables that can be generated before repetition begins is large.
3. The values can be computed efficiently on a digital computer.
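A minimal sketch of the multiplicative congruential method, using the m = 2³¹−1, a = 16,807 parameters listed on the next slide (the generator name is ours):

```python
from itertools import islice

def lcg(seed, a=16807, m=2**31 - 1):
    """Multiplicative congruential generator x_n = a*x_{n-1} mod m,
    yielding the pseudo-random numbers u_n = x_n / m in (0,1)."""
    x = seed
    while True:
        x = (a * x) % m
        yield x / m

us = list(islice(lcg(seed=1), 10_000))
```

Since m is prime and the seed is nonzero, x_n never hits 0, so every u_n lies strictly inside (0,1), and the sample mean should be close to 1/2.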
158 SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators (continue – 1)
A guideline is to choose m to be a large prime number, close to the computer word size. Examples (multiplicative congruential method):
• 32-bit word computer: m = 2³¹ − 1, a = 16,807 (some IBM systems)
• 36-bit word computer: m = 2³⁵ − 1, a = 5⁵ = 3,125
Another generator of pseudo-random numbers uses recursions of the type (mixed congruential method):
$$x_n = \left(a\,x_{n-1} + c\right)\ \mathrm{modulo}\ m\qquad\left(a\,x_{n-1}+c = k\,m + x_n,\ \ k\in\text{Integers},\ \ x_n < m\right)$$
• 32-bit word computer: m = 2³², a = 69,069 (VAX)
• 32-bit word computer: m = 2³², a = 1,664,525 (transputers)
• 48-bit word computer: m = 2⁴⁸, a = 5DEECE66D₁₆, c = B₁₆ (UNIX, RAND48 routine)
• 48-bit word computer: m = 2⁴⁷, a = 5¹⁵, c = 0 (CDC vector machine)
• 48-bit word computer: m = 2⁴⁸, a = 2875A2E7B175₁₆, c = 0 (Cray vector machine)
• 64-bit word computer: m = 2⁵⁹, a = 13¹³, c = 0 (Numerical Algorithms Group)
Return to Table of Content
159 SOLO Review of Probability
Generating Discrete Random Variables
Histograms
A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what proportion of cases fall into each of several categories: it is a form of data binning. The categories are usually specified as non-overlapping intervals of some variable. The categories (bars) must be adjacent. The intervals (or bands, or bins) are generally of the same size.
Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a normalized histogram always equals 1. If the lengths of the intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.
Mathematical Definition
In a more general mathematical sense, a histogram is a mapping m_i that counts the number of observations that fall into various disjoint categories (known as bins), whereas the graph of a histogram is merely one way to represent a histogram. Thus, if we let n be the total number of observations and k be the total number of bins, the histogram m_i meets the following condition:
$$n = \sum_{i=1}^k m_i$$
A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram M_i of a histogram m_i is defined as:
$$M_i = \sum_{j=1}^i m_j$$
(Figure: an ordinary and a cumulative histogram of the same data – a random sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1.)
Return to Table of Content
160 SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method
Suppose we want to generate a discrete random variable X having probability density function:
$$p(x) = \sum_j p_j\,\delta(x-x_j),\qquad j = 0,1,\ldots,\qquad \sum_j p_j = 1$$
To accomplish this, generate a random number U that is uniformly distributed over (0,1) and set:
$$X = \begin{cases} x_0 & U < p_0\\ x_1 & p_0 \le U < p_0+p_1\\ \;\vdots & \\ x_j & \sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i\\ \;\vdots & \end{cases}$$
Since U is uniformly distributed, for any a and b such that 0 < a < b < 1 we have P(a ≤ U < b) = b − a, and therefore:
$$P(X = x_j) = P\left(\sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i\right) = p_j$$
and so X has the desired distribution.
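A sketch of the discrete inverse transform method; the value and probability lists are illustrative:

```python
import random

def inverse_transform_discrete(values, probs, u=None):
    """Return the first values[j] whose cumulative probability
    p_0 + ... + p_j exceeds U ~ uniform(0,1)."""
    if u is None:
        u = random.random()
    cum = 0.0
    for x, p in zip(values, probs):
        cum += p
        if u < cum:
            return x
    return values[-1]  # guard against floating-point round-off

random.seed(1)
draws = [inverse_transform_discrete([1, 2, 3], [0.2, 0.5, 0.3])
         for _ in range(100_000)]
```

The empirical frequencies of the drawn values should match the target probabilities 0.2, 0.5, 0.3.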
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 1)
Suppose we want to generate a discrete random variable X having probability density function:
$$p(x) = \sum_j p_j\,\delta(x-x_j),\qquad \sum_j p_j = 1$$
Draw X, N times, from p(x). (Figure: histogram of the results.)
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 2)
Generating a Poisson Random Variable:
$$p_i = P(X=i) = e^{-\lambda}\frac{\lambda^i}{i!},\qquad i = 0,1,\ldots,\qquad \sum_i p_i = 1$$
The successive probabilities satisfy the recursion:
$$\frac{p_{i+1}}{p_i} = \frac{e^{-\lambda}\lambda^{i+1}/(i+1)!}{e^{-\lambda}\lambda^{i}/i!} = \frac{\lambda}{i+1}$$
Draw X, N times, from the Poisson distribution. (Figure: histogram of the results.)
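A sketch of Poisson generation via the inverse transform, using the slide's recursion p_{i+1} = p_i λ/(i+1) to build the cumulative probabilities on the fly:

```python
import math
import random

def poisson_inverse_transform(lam, u=None):
    """Inverse-transform sampling of a Poisson(lam) variable."""
    if u is None:
        u = random.random()
    i = 0
    p = math.exp(-lam)      # p_0 = e^{-lam}
    cum = p
    while u >= cum:
        p *= lam / (i + 1)  # recursion p_{i+1} = p_i * lam/(i+1)
        cum += p
        i += 1
    return i

random.seed(2)
draws = [poisson_inverse_transform(3.0) for _ in range(50_000)]
mean = sum(draws) / len(draws)
```

The sample mean should be close to λ = 3, since the Poisson mean equals its parameter.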
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 3)
Generating a Binomial Random Variable:
$$p_i = P(X=i) = \frac{n!}{i!\,(n-i)!}\,p^i(1-p)^{n-i},\qquad i = 0,1,\ldots,n,\qquad \sum_i p_i = 1$$
The successive probabilities satisfy the recursion:
$$\frac{p_{i+1}}{p_i} = \frac{\dfrac{n!}{(i+1)!\,(n-i-1)!}\,p^{i+1}(1-p)^{n-i-1}}{\dfrac{n!}{i!\,(n-i)!}\,p^{i}(1-p)^{n-i}} = \frac{n-i}{i+1}\cdot\frac{p}{1-p}$$
(Figure: histogram of the results, P(k,n) for k = 0,1,…,14.)
Return to Table of Content
164 SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique
Suppose we have an efficient method for simulating a random variable having a probability density function { q_j, j ≥ 0 }. We want to use this to obtain a random variable that has the probability density function { p_j, j ≥ 0 }. Let c be a constant such that:
$$\frac{p_j}{q_j} \le c\qquad \forall j\ \text{s.t.}\ p_j \ne 0$$
If such a c exists, it must satisfy:
$$\underbrace{\sum_j p_j}_{1} \le c\,\underbrace{\sum_j q_j}_{1} \;\Rightarrow\; 1 \le c$$
Rejection Method
Step 1: Simulate the value of Y, having probability density function q_j.
Step 2: Generate a random number U (that is uniformly distributed over (0,1)).
Step 3: If U < p_Y/(c q_Y), set X = Y and stop. Otherwise return to Step 1.
165 SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique (continue – 1)
Theorem: The random variable X obtained by the rejection method has probability density function P{X=i} = p_i.
Proof:
$$P\{X=i\} = P\{Y=i\,|\,\text{Acceptance}\} \overset{\text{Bayes}}{=} \frac{P\{Y=i,\ \text{Acceptance}\}}{P\{\text{Acceptance}\}} = \frac{P\left\{Y=i,\ U\le\dfrac{p_i}{c\,q_i}\right\}}{P\{\text{Acceptance}\}}$$
By independence, and since U is uniformly distributed on (0,1):
$$P\left\{Y=i,\ U\le\frac{p_i}{c\,q_i}\right\} = \underbrace{P\{Y=i\}}_{q_i}\cdot P\left\{U\le\frac{p_i}{c\,q_i}\right\} = q_i\cdot\frac{p_i}{c\,q_i} = \frac{p_i}{c}$$
Summing over all i yields:
$$\underbrace{\sum_i P\{X=i\}}_{1} = \frac{\overbrace{\sum_i p_i}^{1}}{c\,P\{\text{Acceptance}\}}$$
Hence $c\,P\{\text{Acceptance}\} = 1$, so $P\{\text{Acceptance}\} = 1/c \le 1$ and $P\{X=i\} = p_i$. q.e.d.
166 SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique (continue – 2)
Example: Generate a truncated Gaussian using the Accept-Reject method. Consider the case with:
$$p(x) \approx \begin{cases} e^{-x^2/2}/\sqrt{2\pi} & x\in[-4,4]\\ 0 & \text{otherwise}\end{cases}$$
Consider the uniform proposal function:
$$q(x) = \begin{cases} 1/8 & x\in[-4,4]\\ 0 & \text{otherwise}\end{cases}$$
In the figure we can see the results of the Accept-Reject method using N = 10,000 samples.
Return to Table of Content
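The truncated-Gaussian example can be sketched as follows; with q = 1/8 on [−4,4] and c = 8/√(2π), the acceptance test U < p(Y)/(c q(Y)) simplifies to U < exp(−Y²/2):

```python
import math
import random

def truncated_gauss_ar():
    """Accept-Reject for p(x) ∝ exp(-x^2/2) on [-4,4] with uniform
    proposal q(x) = 1/8 on [-4,4]; here c = 8/sqrt(2*pi)."""
    while True:
        y = random.uniform(-4.0, 4.0)   # Step 1: Y ~ q
        u = random.random()             # Step 2: U ~ uniform(0,1)
        if u < math.exp(-y * y / 2.0):  # Step 3: accept if U < p/(c q)
            return y

random.seed(3)
draws = [truncated_gauss_ar() for _ in range(20_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

Since the truncation at ±4 removes negligible mass, the sample mean and variance should be close to 0 and 1, and every draw lies in [−4, 4].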
167 SOLO Review of Probability
Generating Continuous Random Variables
The Inverse Transform Algorithm
Let U be a uniform (0,1) random variable. For any continuous distribution function F, the random variable X defined by
$$X = F^{-1}(U)$$
has distribution F. [F⁻¹(u) is defined to be that value of x such that F(x) = u.]
Proof: Let $P_X(x)$ denote the Probability Distribution Function of $X = F^{-1}(U)$:
$$P_X(x) = P\{X \le x\} = P\{F^{-1}(U) \le x\}$$
Since F is a distribution function, F(x) is a monotonically increasing function of x, and so the inequality "a ≤ b" is equivalent to the inequality "F(a) ≤ F(b)". Therefore:
$$P_X(x) = P\left\{\underbrace{F\left(F^{-1}(U)\right)}_{U} \le F(x)\right\} = P\{U \le F(x)\} \overset{U\ \text{uniform}(0,1)}{=} F(x),\qquad 0 \le F(x) \le 1$$
Return to Table of Content
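A sketch of the continuous inverse transform for an exponential distribution, where F(x) = 1 − e^{−λx} inverts in closed form (the rate value 2.0 is illustrative):

```python
import math
import random

def exponential_inverse_transform(rate, u=None):
    """F(x) = 1 - exp(-rate*x)  =>  X = F^{-1}(U) = -ln(1-U)/rate."""
    if u is None:
        u = random.random()
    return -math.log(1.0 - u) / rate

random.seed(4)
draws = [exponential_inverse_transform(2.0) for _ in range(100_000)]
mean = sum(draws) / len(draws)   # should approach 1/rate = 0.5
```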
168 SOLO Review of Probability
Generating Continuous Random Variables
The Acceptance-Rejection Technique
Suppose we have an efficient method for simulating a random variable having a probability density function g(x). We want to use this to obtain a random variable that has the probability density function f(x). Let c be a constant such that:
$$\frac{f(y)}{g(y)} \le c\qquad \forall y$$
If such a c exists, it must satisfy:
$$\underbrace{\int f(y)\,dy}_{1} \le c\,\underbrace{\int g(y)\,dy}_{1} \;\Rightarrow\; 1 \le c$$
Rejection Method
Step 1: Simulate the value of Y, having probability density function g(Y).
Step 2: Generate a random number U (that is uniformly distributed over (0,1)).
Step 3: If U < f(Y)/(c g(Y)), set X = Y and stop. Otherwise return to Step 1.
169 SOLO Review of Probability
Generating Continuous Random Variables
The Acceptance-Rejection Technique (continue – 1)
Theorem: The random variable X obtained by the rejection method has probability density function f(y).
Proof:
$$P\{X=y\} = P\{Y=y\,|\,\text{Acceptance}\} \overset{\text{Bayes}}{=} \frac{P\{Y=y,\ \text{Acceptance}\}}{P\{\text{Acceptance}\}} = \frac{P\left\{Y=y,\ U\le\dfrac{f(y)}{c\,g(y)}\right\}}{P\{\text{Acceptance}\}}$$
By independence, and since U is uniformly distributed on (0,1):
$$P\left\{Y=y,\ U\le\frac{f(y)}{c\,g(y)}\right\} = \underbrace{P\{Y=y\}}_{g(y)}\cdot P\left\{U\le\frac{f(y)}{c\,g(y)}\right\} = g(y)\cdot\frac{f(y)}{c\,g(y)} = \frac{f(y)}{c}$$
Integrating over all y yields:
$$\underbrace{\int P\{Y=y\}\,dy}_{1} = \frac{\overbrace{\int f(y)\,dy}^{1}}{c\,P\{\text{Acceptance}\}}$$
Hence $c\,P\{\text{Acceptance}\} = 1$, so $P\{\text{Acceptance}\} = 1/c \le 1$ and $P\{X=y\} = f(y)$. q.e.d.
Return to Table of Content
170 SOLO Review of Probability
Generating Discrete Random Variables
The Bootstrap
• Popularized by Bradley Efron (1938–, Stanford U.) in 1979.
• The Bootstrap is a name generically applied to statistical resampling schemes that allow uncertainty in the data to be assessed from the data themselves, in other words "pulling yourself up by your bootstraps".
The advantage of bootstrapping over analytical methods is its great simplicity: it is straightforward to apply the bootstrap to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients.
The disadvantage of bootstrapping is that, while (under some conditions) it is asymptotically consistent, it does not provide general finite-sample guarantees, and it has a tendency to be overly optimistic. The apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bootstrap analysis (e.g. independence of samples), where these would be more formally stated in other approaches.
171 SOLO Review of Probability
Generating Discrete Random Variables
The Bootstrap (continue - 1)
• Given n observations z_i, i = 1,…,n, and a calculated statistic S, what is the uncertainty in S?
• The Procedure:
- Draw m values z'_i, i = 1,…,m, from the original data with replacement.
- Calculate the statistic S' from the "bootstrapped" sample.
- Repeat L times to build a distribution of uncertainty in S.
Return to Table of Content
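The three steps above can be sketched with m = n and an illustrative data set; `bootstrap` is our name for the resampling loop:

```python
import random

def bootstrap(data, statistic, n_resamples=1000):
    """Resample with replacement, recompute the statistic each time,
    and return the resulting distribution of statistic values."""
    stats = []
    for _ in range(n_resamples):
        resample = [random.choice(data) for _ in data]  # m = n draws
        stats.append(statistic(resample))
    return stats

random.seed(5)
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
means = bootstrap(data, lambda xs: sum(xs) / len(xs))
center = sum(means) / len(means)
spread = (sum((s - center) ** 2 for s in means) / len(means)) ** 0.5
```

For this data set (mean 5, standard deviation 2), the spread of the bootstrap means should be near the analytical standard error 2/√8 ≈ 0.71.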
172 SOLO Review of Probability
Generating Discrete Random Variables
Importance Sampling (IS)
Let Y = (Y_1,…,Y_m) be a vector of random variables having a joint probability density function p(y_1,…,y_m), and suppose that we are interested in estimating:
$$\theta = E_p\left[g\left(Y_1,\ldots,Y_m\right)\right] = \int\!\cdots\!\int g\left(y_1,\ldots,y_m\right)\,p\left(y_1,\ldots,y_m\right)\,dy_1\cdots dy_m$$
Suppose that a direct generation of the random vector Y so as to compute g(Y) is inefficient or not possible because (a) it is difficult to generate the random vector Y, or (b) the variance of g(Y) is large, or (c) both of the above.
Suppose that W = (W_1,…,W_m) is another random vector, which takes values in the same domain as Y, and has a joint density function q(w_1,…,w_m) from which samples can be easily generated. The estimate θ can be expressed as:
$$\theta = E_p\left[g(Y)\right] = \int\!\cdots\!\int g\left(w_1,\ldots,w_m\right)\frac{p\left(w_1,\ldots,w_m\right)}{q\left(w_1,\ldots,w_m\right)}\,q\left(w_1,\ldots,w_m\right)\,dw_1\cdots dw_m = E_q\left[g(W)\frac{p(W)}{q(W)}\right]$$
Therefore, we can estimate θ by generating values of the random vector W, and then using as the estimator the resulting average of the values g(W) p(W)/q(W).
173 SOLO Review of Probability
Generating Discrete Random Variables
Importance Sampling (IS) (continue – 1)
Example: Importance Sampling for a Bi-Modal Distribution. Consider the following distribution:
$$p(x) = \frac12\,\mathcal N(x;0,1) + \frac12\,\mathcal N(x;3,1/2)$$
We want to calculate the mean value (g(x) = x) using Importance Sampling. Use:
$$g(x) = x,\qquad q(x) = \mathcal U(-5,5)$$
For i = 1,…,N, sample (draw) $x_i \sim q(x)$ and form the Importance Weight:
$$w_i := \frac{p(x_i)}{q(x_i)}$$
We obtain:
$$E_p[x] = \int x\,\frac{p(x)}{q(x)}\,q(x)\,dx = E_q\left[x\,\frac{p(x)}{q(x)}\right] \approx \frac1N\sum_{i=1}^N x_i\,\frac{p(x_i)}{q(x_i)}$$
For N = 10,000 samples we obtain E_p[x] = 1.4915 instead of 1.5.
In the figure the histogram using the Importance Weights w_i is presented together with the true PDF.
Return to Table of Content
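The bi-modal example can be sketched as below (we take the slide's N(x;3,1/2) to mean variance 1/2, which does not affect the mean; the true value is ½·0 + ½·3 = 1.5):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p(x):
    # bi-modal target: 0.5*N(0,1) + 0.5*N(3, variance 1/2)
    return 0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, 3.0, math.sqrt(0.5))

random.seed(6)
N = 100_000
total = 0.0
for _ in range(N):
    x = random.uniform(-5.0, 5.0)   # x_i ~ q = U(-5,5)
    w = p(x) / 0.1                  # importance weight p(x)/q(x), q(x) = 1/10
    total += x * w
estimate = total / N                # estimate of E_p[x], true value 1.5
```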
174 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis Algorithm
• This method of generation of an arbitrary probability distribution was invented by Metropolis, Rosenbluth and Teller (supposedly at a Los Alamos dinner party) and published in June 1953:
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., "Equations of State Calculations by Fast Computing Machines", Journal of Chemical Physics, 1953, Vol. 21(6), pp. 1087–1092.
This is also called the Markov Chain Monte Carlo (MCMC) method.
Procedure
• Set up a Markov Chain that has as a unique stationary solution the required π(x) Probability Distribution Function (PDF).
• Run the chain until stationary.
• All subsequent samples are from the stationary distribution π(x), as required.
(Figure: a three-state Markov chain X₁, X₂, X₃ with its transition probabilities.)
Nicholas Constantine Metropolis (1915–1999)
175 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis Algorithm (continue – 1)
Proof of the Procedure
Pr(X,t) – the probability of being in the state X at time t.
Pr(X→Y) = Pr(Y|X) – the probability, per unit time, of transition from state X to state Y, with $\sum_Y \Pr(Y|X) = 1$ (sum of probabilities of all states reached from X).
$$\Pr(X,t+1) = \Pr(X,t) + \sum_Y\left[\Pr(X|Y)\,\Pr(Y,t) - \Pr(Y|X)\,\Pr(X,t)\right]$$
At large t, once the arbitrary initial state is "forgotten", we want Pr(X,t) → Pr(X). Clearly a sufficient (but not necessary) condition for an equilibrium (time-independent) probability distribution is the so-called Detailed Balance Condition:
$$\Pr(Y|X)\,\Pr(X,t) = \Pr(X|Y)\,\Pr(Y,t)$$
This method can be used for any probability distribution, but Metropolis used:
$$\Pr(B|A) = \begin{cases} e^{-\Delta E/kT} & \Delta E > 0\\ 1 & \Delta E \le 0\end{cases},\qquad \Delta E := E(B)-E(A)$$
Note: E(A) is equivalent to the energy level of state A.
176 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis Algorithm (continue – 2)
Detailed Balance Condition:
$$\Pr(Y|X)\,\Pr(X,t) = \Pr(X|Y)\,\Pr(Y,t)$$
Metropolis defined a symmetric Q(Y|X) = Q(X|Y) as a candidate generating density for Pr(Y|X), such that $\sum_Y Q(Y|X) = 1$.
In general Q(Y|X) will not satisfy the Detailed Balance condition; for example:
$$Q(Y|X)\,\Pr(X,t) > Q(X|Y)\,\Pr(Y,t)$$
The process moves from X to Y too often and from Y to X too rarely. A convenient way to correct this is to reduce the number of moves from X to Y by introducing a probability 0 < A(Y|X) ≤ 1. This is called the Acceptance Probability:
$$\Pr(Y|X) = Q(Y|X)\cdot A(Y|X),\qquad Y \ne X$$
177 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis Algorithm (continue – 3)
Let define the Acceptance Probability of the proposed move X → Y as:
$$A(Y|X) = \begin{cases} \Pr(Y)/\Pr(X) & \Pr(Y) \le \Pr(X)\\ 1 & \Pr(Y) > \Pr(X)\end{cases}
\qquad
A(X|Y) = \begin{cases} 1 & \Pr(Y) \le \Pr(X)\\ \Pr(X)/\Pr(Y) & \Pr(Y) > \Pr(X)\end{cases}$$
with:
$$\Pr(Y|X) = Q(Y|X)\cdot A(Y|X),\qquad \Pr(X|Y) = Q(X|Y)\cdot A(X|Y),\qquad Y \ne X$$
If Pr(X) ≤ Pr(Y), then A(Y|X) = 1 and A(X|Y) = Pr(X)/Pr(Y).
If Pr(X) > Pr(Y), then A(Y|X) = Pr(Y)/Pr(X) and A(X|Y) = 1.
In both cases:
$$\frac{\Pr(Y|X)}{\Pr(X|Y)} = \frac{Q(Y|X)\cdot A(Y|X)}{Q(X|Y)\cdot A(X|Y)} \overset{Q\ \text{symmetric}}{=} \frac{A(Y|X)}{A(X|Y)} = \frac{\Pr(Y)}{\Pr(X)}$$
which is just the Detailed Balance condition.
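A sketch of the Metropolis algorithm with a symmetric uniform proposal, targeting π(x) ∝ e^{−x²/2} (i.e. E = x²/2 with kT = 1); the function and parameter names are ours:

```python
import math
import random

def metropolis(log_target, x0, step, n_samples, burn_in=1000):
    """Metropolis sampling: propose with a symmetric Q, accept the
    move with probability min(1, pi(x')/pi(x))."""
    x = x0
    samples = []
    for i in range(burn_in + n_samples):
        x_prop = x + random.uniform(-step, step)       # symmetric proposal
        log_ratio = log_target(x_prop) - log_target(x)
        if random.random() < math.exp(min(0.0, log_ratio)):
            x = x_prop                                  # accept the move
        if i >= burn_in:
            samples.append(x)                           # post burn-in draws
    return samples

random.seed(7)
samples = metropolis(lambda x: -x * x / 2.0, x0=0.0, step=1.0, n_samples=50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

After burn-in the chain should reproduce the standard Gaussian's mean 0 and variance 1 to within Monte Carlo error.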
178 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis Algorithm (continue – 4)
Detailed Balance Condition:
$$\Pr(B|A)\,\Pr(A,t) = \Pr(A|B)\,\Pr(B,t)$$
This method can be used for any probability distribution, but Metropolis used:
$$\Pr(B|A) = \begin{cases} e^{-\Delta E/kT} & \Delta E > 0\\ 1 & \Delta E \le 0\end{cases},\qquad \Delta E := E(B)-E(A)$$
Therefore:
$$\frac{\Pr(B|A)}{\Pr(A|B)} = \frac{\begin{cases} e^{-\left[E(B)-E(A)\right]/kT} & \Delta E > 0\\ 1 & \Delta E \le 0\end{cases}}{\begin{cases} 1 & \Delta E > 0\\ e^{-\left[E(A)-E(B)\right]/kT} & \Delta E \le 0\end{cases}} = e^{-\Delta E/kT} = \frac{e^{-E(B)/kT}}{e^{-E(A)/kT}} = \frac{\Pr(B,t)}{\Pr(A,t)}$$
so Detailed Balance is satisfied by the Boltzmann distribution $\Pr(A) \propto e^{-E(A)/kT}$.
179 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis-Hastings (M-H) Algorithm
W. Keith Hastings improved the Metropolis algorithm by allowing a non-symmetric Candidate Generating Density:
Hastings, W., "Monte Carlo Sampling Methods Using Markov Chains and Their Applications", Biometrika, 1970, No. 57, pp. 97–109.
• Set up a Markov Chain T(x'|x) that has as a unique stationary solution the required π(x') Probability Distribution Function (PDF):
$$\pi(x') = \int T(x'|x)\,\pi(x)\,dx$$
Here we give the development for Continuous Random Variables (for Discrete Random Variables the development is similar to that used for the Metropolis Algorithm).
180 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
• The problem is to find the conditional transition probability T (x'|x) of the Markov Chain whose states converge, after a transient time, to π (x'):
π (x') = ∫ T (x'|x) π (x) dx
To satisfy this requirement, a sufficient (but not necessary) condition is the "Detailed Balance", "Reversibility" or "Time Reversibility" condition:
T (x'|x) π (x) = T (x|x') π (x')
Proof: ∫ T (x'|x) π (x) dx = ∫ T (x|x') π (x') dx = π (x') ∫ T (x|x') dx = π (x')  q.e.d.
Let us define Q (x'|x) as a candidate generating density for T (x'|x), such that: ∫ Q (x'|x) dx' = 1
In general Q (x'|x) will not satisfy the Detailed Balance condition; for example:
Q (x'|x) π (x) > Q (x|x') π (x')
Loosely speaking, the process moves from x to x' too often and from x' to x too rarely.
181 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
In general Q (x'|x) will not satisfy the Detailed Balance condition, for example:
Q (x'|x) π (x) > Q (x|x') π (x')
Loosely speaking, the process moves from x to x' too often and from x' to x too rarely. A convenient way to correct this is to reduce the number of moves from x to x' by introducing a probability 0 < α (x'|x) ≤ 1, called the Acceptance Probability:
T (x'|x) = Q (x'|x) α (x'|x), x' ≠ x
If the move is not made, the process again returns x as a value from the target distribution.
The Detailed Balance condition becomes: Q (x'|x) α (x'|x) π (x) = Q (x|x') α (x|x') π (x')
From which (setting α (x|x') = 1 on the side that moves too rarely):
α (x'|x) = [Q (x|x') π (x')] / [Q (x'|x) π (x)] ≤ 1
In general: α (x'|x) = min { 1, [Q (x|x') π (x')] / [Q (x'|x) π (x)] }
In the same way (by interchanging x' with x): α (x|x') = min { 1, [Q (x'|x) π (x)] / [Q (x|x') π (x')] }
182 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
Let us prove that the Detailed Balance condition is satisfied:
T (x'|x) π (x) = Q (x'|x) π (x) · min { 1, [Q (x|x') π (x')] / [Q (x'|x) π (x)] }
T (x|x') π (x') = Q (x|x') π (x') · min { 1, [Q (x'|x) π (x)] / [Q (x|x') π (x')] }
Suppose Q (x|x') π (x') < Q (x'|x) π (x). Then:
T (x'|x) π (x) = Q (x'|x) π (x) · [Q (x|x') π (x')] / [Q (x'|x) π (x)] = Q (x|x') π (x')
T (x|x') π (x') = Q (x|x') π (x') · 1 = Q (x|x') π (x')
Therefore T (x'|x) π (x) = T (x|x') π (x')  q.e.d.
183 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
The Transition Kernel of the Metropolis-Hastings Algorithm is:
T (x'|x) = α (x'|x) Q (x'|x) + [1 − a (x)] δ_x (x'),  a (x) := ∫ α (x''|x) Q (x''|x) dx''
where δ_x is the Dirac mass on {x} and a (x) is the overall probability of accepting a move away from x.
184 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
Therefore the M-H Algorithm will:
1. Use the previously generated x(t).
2. Draw a new value x_new from the candidate distribution Q (x_new | x(t)): x_new ~ Q (x_new | x(t))
3. Compute the acceptance probability α (x_new | x(t)):
α (x_new | x(t)) = min { 1, [π (x_new) Q (x(t) | x_new)] / [π (x(t)) Q (x_new | x(t))] }
4. Draw u from the uniform distribution U [0,1]: accept x(t+1) = x_new if u ≤ α (x_new | x(t)), otherwise keep x(t+1) = x(t).
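The four M-H steps above can be sketched in code. This is a minimal sketch, not the slide's own implementation: the function names (`metropolis_hastings`, `q_draw`, `q_dens`) are hypothetical, and the target density may be unnormalized since only ratios of π appear in the acceptance probability.

```python
import math
import random

def metropolis_hastings(pi, q_draw, q_dens, x0, n_steps):
    """Generic Metropolis-Hastings sampler.

    pi     -- (possibly unnormalized) target density pi(x)
    q_draw -- draws a candidate x_new given the current x
    q_dens -- evaluates the candidate density Q(a | b)
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = q_draw(x)                       # step 2: draw a candidate
        # step 3: alpha = min(1, pi(x_new) Q(x|x_new) / (pi(x) Q(x_new|x)))
        num = pi(x_new) * q_dens(x, x_new)
        den = pi(x) * q_dens(x_new, x)
        alpha = min(1.0, num / den)
        # step 4: accept with probability alpha, otherwise keep x
        if random.random() <= alpha:
            x = x_new
        samples.append(x)
    return samples
```

With a standard-normal target and a Gaussian random-walk proposal (which is symmetric, so the Q ratio cancels), the chain's long-run mean and variance approach 0 and 1.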
185 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability) (figure slide)
186 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability) (demo slide: Run This Example)
187 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
The convergence of the M-H Algorithm to the desired unique stationary solution π (x) occurs under the following conditions:
• Irreducibility: every state is eventually reachable from any start state; for all x there exists a t such that π (x,t) > 0.
• Aperiodicity: the chain does not get caught in cycles.
The process is ergodic if it is both irreducible and aperiodic.
In the M-H algorithm the draws are used as samples from the target density π (x) only after the Markov Chain has passed the transient stage, so that the effect of the chosen starting value x0 has become small enough to be ignored.
The rate of convergence of the Markov Chain is a function of the chosen candidate generating density Q (x'|x).
The efficiency of the algorithm depends on how close the Acceptance Probability α is to 1.
188 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
Example:
π (x) = 0.3 exp (−0.2 x²) + 0.7 exp (−0.2 (x − 10)²)
Proposed Candidate Distribution: Q (x_new | x(t)) = N (x(t), 100)
Ramon Sagarna, R.Sagarna@cs.bham.ac.uk, "Lecture 19: Markov Chain Monte Carlo Methods (MCMC)"
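The bimodal example above can be run as a short random-walk M-H chain. A minimal sketch, assuming the slide's proposal N (x(t), 100), i.e. a Gaussian step with standard deviation 10; the function names are hypothetical:

```python
import math
import random

def target(x):
    # pi(x) = 0.3 exp(-0.2 x^2) + 0.7 exp(-0.2 (x - 10)^2), unnormalized
    return 0.3 * math.exp(-0.2 * x * x) + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2)

def sample(n_steps, x0=0.0, seed=1):
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        x_new = rng.gauss(x, 10.0)      # proposal N(x, 100): symmetric, Q cancels
        alpha = min(1.0, target(x_new) / target(x))
        if rng.random() <= alpha:
            x = x_new
        chain.append(x)
    return chain
```

Because both components have the same width, the chain should spend roughly 30% of its time near the mode at 0 and 70% near the mode at 10.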
189 SOLO Metropolis Algorithm Generating Continuous Random Variables (Review of Probability)
If we choose a symmetric candidate generating density, Q (x'|x) = Q (x|x') for each x', x, then:
α (x'|x) = min { 1, π (x') / π (x) }
α (x|x') = min { 1, π (x) / π (x') }
We obtain the Metropolis Algorithm.
Metropolis chose the target π (x) ∝ e^(−E (x)/kT), for which the acceptance probabilities become:
α (x'|x) = { e^(−ΔE/kT) if ΔE > 0; 1 if ΔE ≤ 0 },  ΔE := E (x') − E (x)
α (x|x') = { 1 if ΔE > 0; e^(+ΔE/kT) if ΔE ≤ 0 }
190 SOLO Metropolis Algorithm Generating Continuous Random Variables (Review of Probability) Return to Table of Content
191 SOLO Gibbs Sampling Generating Discrete Random Variables (Review of Probability)
Stuart Geman (Brown University), Donald Geman (Johns Hopkins University), Josiah Willard Gibbs (1839 - 1903)
In mathematics and physics, Gibbs sampling is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables. The purpose of such a sequence is to approximate the joint distribution, or to compute an integral (such as an expected value). Gibbs sampling is a special case of the Metropolis-Hastings algorithm, and thus an example of a Markov chain Monte Carlo algorithm.
The algorithm is named after the physicist J. W. Gibbs, in reference to an analogy between the sampling algorithm and statistical physics. The algorithm was devised by Stuart Geman and Donald Geman, some eight decades after the death of Gibbs, and is also called the Gibbs sampler.
Geman, S. and Geman, D., "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984, Vol. 6, pp. 721 - 741
192 SOLO Gibbs Sampling (continue – 1) Generating Discrete Random Variables (Review of Probability)
Suppose that x = (x1, x2, …, xk) is k (≥ 2) dimensional.
The Gibbs sampler uses what are called the full (or complete) conditional distributions:
π (xj | x1, …, x(j−1), x(j+1), …, xk) = π (x1, …, xj, …, xk) / π (x1, …, x(j−1), x(j+1), …, xk)
 = π (x1, …, xk) / ∫ π (x1, …, xk) dxj   (by Bayes)
The Gibbs sampler samples one variable in turn:
X1(t+1) ~ π (x1 | x2(t), x3(t), …, xk(t))
X2(t+1) ~ π (x2 | x1(t+1), x3(t), …, xk(t))
X3(t+1) ~ π (x3 | x1(t+1), x2(t+1), x4(t), …, xk(t))
⋮
Xk(t+1) ~ π (xk | x1(t+1), x2(t+1), …, x(k−1)(t+1))
The Gibbs sampler always uses the most recent values.
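The cycle above can be sketched for a case where the full conditionals are available in closed form. A minimal sketch, assuming (not from the slide) a standard bivariate normal target with correlation rho, for which each full conditional is itself normal: x1 | x2 ~ N (rho·x2, 1 − rho²) and symmetrically for x2:

```python
import random

def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals: x1 | x2 ~ N(rho*x2, 1-rho^2), x2 | x1 ~ N(rho*x1, 1-rho^2).
    """
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x1, x2 = 0.0, 0.0
    out = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, sd)   # draw x1 from pi(x1 | x2)
        x2 = rng.gauss(rho * x1, sd)   # draw x2 from pi(x2 | x1), most recent x1
        out.append((x1, x2))
    return out
```

The sample correlation of the chain should approach rho, illustrating that cycling through the full conditionals reproduces the joint distribution.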
193 SOLO Gibbs Sampling (continue – 2) Generating Discrete Random Variables (Review of Probability)
Gibbs Sampling is a special case of the Metropolis-Hastings Algorithm. To see this, let us define the candidate generating density Q (x_new | x(t)) as:
Q (x_new | x(t)) = { Pr (x_new_j | x(t)_−j) if x_new_−j = x(t)_−j; 0 otherwise }
where x_−j := (x1, …, x(j−1), x(j+1), …, xk) denotes all components except xj.
At any moment one variable is drawn: x_new_j ~ π (xj | x_new_−j), where x_new_−j := (x1(t+1), …, x(j−1)(t+1), x(j+1)(t), …, xk(t)).
The new sample will be x_new = (x1(t+1), …, x(j−1)(t+1), x_new_j, x(j+1)(t), …, xk(t)) = (x_new_j, x_new_−j).
The acceptance probability α (x_new | x(t)) is:
α (x_new | x(t)) = min { 1, [Pr (x_new) Q (x(t) | x_new)] / [Pr (x(t)) Q (x_new | x(t))] }
 = min { 1, [Pr (x_new) Pr (x(t)_j | x(t)_−j)] / [Pr (x(t)) Pr (x_new_j | x_new_−j)] }
194 SOLO Gibbs Sampling (continue – 3) Generating Discrete Random Variables (Review of Probability)
Gibbs Sampling is a special case of the Metropolis-Hastings Algorithm, with candidate generating density:
Q (x_new | x(t)) = { Pr (x_new_j | x(t)_−j) if x_new_−j = x(t)_−j; 0 otherwise }
Using Bayes:
Pr (x(t)_j | x(t)_−j) = Pr (x(t)_j, x(t)_−j) / Pr (x(t)_−j)
Pr (x_new_j | x_new_−j) = Pr (x_new_j, x_new_−j) / Pr (x_new_−j)
and, since x(t) = (x(t)_j, x(t)_−j) and x_new = (x_new_j, x_new_−j):
Pr (x(t)) = Pr (x(t)_j, x(t)_−j), Pr (x_new) = Pr (x_new_j, x_new_−j)
The acceptance probability is:
α (x_new | x(t)) = min { 1, [Pr (x_new) Q (x(t) | x_new)] / [Pr (x(t)) Q (x_new | x(t))] }
 = min { 1, [Pr (x_new_j, x_new_−j) Pr (x(t)_j, x(t)_−j) Pr (x_new_−j)] / [Pr (x(t)_j, x(t)_−j) Pr (x_new_j, x_new_−j) Pr (x(t)_−j)] }
 = min { 1, Pr (x_new_−j) / Pr (x(t)_−j) } = 1   (since x_new_−j = x(t)_−j)
Gibbs Sampling always accepts x_new_j.
195 SOLO Gibbs Sampling (continue – 4) Generating Discrete Random Variables (Review of Probability) Return to Table of Content
SOLO Review of Probability: Monte Carlo Integration
The Monte Carlo Method can be used to numerically evaluate multidimensional integrals:
I = ∫ g (x1, …, xm) dx1 … dxm = ∫ g (x) dx
To use Monte Carlo we factorize g (x) = f (x) · p (x) in such a way that p (x) is interpreted as a Probability Density Function:
p (x) ≥ 0 and ∫ p (x) dx = 1
We assume that we can draw NS samples x^i, i = 1, …, NS, from p (x): x^i ~ p (x), i = 1, …, NS.
Using Monte Carlo we can approximate: p (x) ≈ Σ_{i=1}^{NS} δ (x − x^i) / NS
and therefore:
I = ∫ f (x) · p (x) dx ≈ I_NS = ∫ f (x) · [Σ_{i=1}^{NS} δ (x − x^i) / NS] dx = (1/NS) Σ_{i=1}^{NS} f (x^i)
SOLO Review of Probability: Monte Carlo Integration
We draw NS samples x^i, i = 1, …, NS, from p (x) and form:
I_NS = (1/NS) Σ_{i=1}^{NS} f (x^i) ≈ I = ∫ f (x) · p (x) dx
If the samples x^i are independent, then I_NS is an unbiased estimate of I.
According to the Law of Large Numbers, I_NS will almost surely converge to I: I_NS → I (a.s.) as NS → ∞.
If the variance of f (x) is finite, i.e.: σ_f² := ∫ [f (x) − I]² p (x) dx < ∞
then the Central Limit Theorem holds and the estimation error converges in distribution to a Normal Distribution:
lim_{NS→∞} √NS (I_NS − I) ~ N (0, σ_f²)
The error of the MC estimate, e = I_NS − I, is of the order O (NS^(−1/2)), meaning that the rate of convergence of the estimate is independent of the dimension of the integrand.
Return to Table of Content
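The estimator I_NS = (1/NS) Σ f (x^i) can be sketched directly. A minimal sketch with hypothetical names (`mc_integrate`, `draw_p`); as a check it estimates ∫₀¹ x² dx = 1/3, taking p = U[0,1] (so p (x) = 1 and f = g):

```python
import random

def mc_integrate(f, draw_p, n_samples, seed=0):
    """Estimate I = integral of f(x) p(x) dx by (1/N) sum f(x_i), x_i ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += f(draw_p(rng))        # one term f(x^i) per draw from p
    return total / n_samples
```

The O (NS^(−1/2)) error bound suggests roughly a tenfold accuracy gain per hundredfold increase in samples, independent of dimension.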
198 SOLO Random Processes
Random Variable: a variable x determined by the outcome Ω of a random experiment: x = x (Ω).
Random Process or Stochastic Process: a function of time x determined by the outcome Ω of a random experiment: x (t) = x (t, Ω).
This is a family, or an ensemble, of functions of time, in general different for each outcome Ω.
Mean or Ensemble Average of the Random Process:
x̄ (t) := E [x (t, Ω)] = ∫ ξ p_{x(t)} (ξ) dξ
Autocorrelation of the Random Process:
R (t1, t2) := E [x (t1, Ω) x (t2, Ω)] = ∫∫ ξ η p_{x(t1), x(t2)} (ξ, η) dξ dη
Autocovariance of the Random Process:
C (t1, t2) := E { [x (t1, Ω) − x̄ (t1)] [x (t2, Ω) − x̄ (t2)] } = R (t1, t2) − x̄ (t1) x̄ (t2)
Table of Content
199 SOLO Stationarity of a Random Process (Random Processes)
1. Wide-Sense Stationarity of a Random Process:
• The Mean Average of the Random Process is time invariant: x̄ (t) := E [x (t, Ω)] = ∫ ξ p_{x(t)} (ξ) dξ = x̄ = const.
• The Autocorrelation of the Random Process depends only on τ := t2 − t1:
R (t1, t2) = R (t2 − t1) = R (τ), and moreover R (τ) = R (−τ).
Power Spectrum or Power Spectral Density of a Stationary Random Process:
S (ω) := ∫ R (τ) exp (−j ω τ) dτ
2. Strict-Sense Stationarity of a Random Process: all probability density functions are time invariant: p_{x(t)} (ω, t) = p_x (ω) = const.
Ergodicity: a Stationary Random Process for which the Time Average equals the Ensemble Average:
⟨x (t, Ω)⟩ := lim_{T→∞} (1/2T) ∫_{−T}^{+T} x (t, Ω) dt = E [x (t, Ω)]
200 SOLO Ergodicity (continue) (Random Processes)
Time Autocorrelation: for an Ergodic Random Process define
R (τ) := ⟨x (t, Ω) x (t + τ, Ω)⟩ = lim_{T→∞} (1/2T) ∫_{−T}^{+T} x (t, Ω) x (t + τ, Ω) dt
Finite Signal Energy Assumption: R (0) = lim_{T→∞} (1/2T) ∫_{−T}^{+T} x² (t, Ω) dt < ∞
Define the truncated process: x_T (t, Ω) := x (t, Ω) for −T ≤ t ≤ T, and 0 otherwise, and
R_T (τ) := (1/2T) ∫ x_T (t, Ω) x_T (t + τ, Ω) dt
Splitting the integration interval at ±(T − τ) and bounding the end segments shows that the boundary contributions are at most (τ/2T) · sup |x_T (t, Ω) x_T (t + τ, Ω)|, which vanishes as T → ∞; therefore:
lim_{T→∞} R_T (τ) = R (τ)
201 SOLO Ergodicity (continue) (Random Processes)
Let us compute:
∫ R_T (τ) exp (−j ω τ) dτ = (1/2T) ∫∫ x_T (t, Ω) x_T (t + τ, Ω) exp (−j ω τ) dτ dt
 = (1/2T) [∫ x_T (t, Ω) exp (+j ω t) dt] [∫ x_T (v, Ω) exp (−j ω v) dv] = X_T X_T* / (2T)
where X_T := ∫ x_T (v, Ω) exp (−j ω v) dv and * means complex conjugate.
Define:
S (ω) := lim_{T→∞} E { X_T X_T* / (2T) } = lim_{T→∞} E { ∫ R_T (τ) exp (−j ω τ) dτ }
Since the Random Process is Ergodic we can use the Wide-Sense Stationarity assumption E [x_T (t, Ω) x_T (t + τ, Ω)] = R (τ), so that:
S (ω) = ∫ R (τ) exp (−j ω τ) dτ
202 SOLO Ergodicity (continue) (Random Processes)
We obtained the Wiener-Khinchine Theorem (Wiener 1930):
S (ω) := lim_{T→∞} E { X_T X_T* / (2T) } = ∫ R (τ) exp (−j ω τ) dτ
The Power Spectrum or Power Spectral Density S (ω) of a Stationary Random Process is the Fourier Transform of the Autocorrelation Function R (τ).
Norbert Wiener 1894 - 1964; Alexander Yakovlevich Khinchine 1894 - 1959
203 SOLO White Noise (Random Processes)
Wide-Sense Whiteness: a (not necessarily stationary) Random Process whose Autocorrelation is zero for any two different times is called white noise in the wide sense:
R (t1, t2) = E [x (t1, Ω) x (t2, Ω)] = σ² (t1) δ (t1 − t2)
where σ² (t1) is the instantaneous variance.
Strict-Sense Whiteness: a (not necessarily stationary) Random Process in which the outcomes at any two different times are independent is called white noise in the strict sense: the joint density factorizes,
p_{x(t1), x(t2)} (Ω) = p_{x(t1)} (Ω) · p_{x(t2)} (Ω), t1 ≠ t2
A Stationary White Noise Random Process has the Autocorrelation:
R (τ) = E [x (t, Ω) x (t + τ, Ω)] = σ² δ (τ)
Note: In general whiteness requires Strict-Sense Whiteness. In practice we have only moments (typically up to second order) and thus only Wide-Sense Whiteness.
204 SOLO White Noise (Random Processes)
A Stationary White Noise Random Process has the Autocorrelation:
R (τ) = E [x (t, Ω) x (t + τ, Ω)] = σ² δ (τ)
The Power Spectral Density is given by performing the Fourier Transform of the Autocorrelation:
S (ω) = ∫ R (τ) exp (−j ω τ) dτ = ∫ σ² δ (τ) exp (−j ω τ) dτ = σ²
We can see that the Power Spectral Density contains all frequencies at the same amplitude. This is the reason it is called White Noise.
The Power of the Noise is defined as: P = ∫ R (τ) dτ = S (ω = 0) = σ²
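The delta-autocorrelation above has a simple discrete-time analogue that can be checked numerically. A minimal sketch, assuming (not from the slide) discrete-time Gaussian white noise with variance σ² = 4: the biased sample autocorrelation should be near σ² at lag 0 and near 0 at every other lag.

```python
import random

def sample_autocorr(x, lag):
    """Biased sample autocorrelation R(lag) = (1/N) sum_t x[t] x[t+lag]."""
    n = len(x)
    return sum(x[t] * x[t + lag] for t in range(n - lag)) / n

# discrete white noise: independent draws, variance sigma^2
rng = random.Random(0)
sigma = 2.0
w = [rng.gauss(0.0, sigma) for _ in range(100000)]
# expect R(0) close to sigma^2 = 4 and R(lag != 0) close to 0
```

A flat spectrum follows since the discrete-time Fourier transform of a delta sequence is constant.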
205 SOLO Markov Processes (Random Processes)
A Markov Process is defined by:
p (x (t, Ω) | x (τ, Ω), τ ≤ t1) = p (x (t, Ω) | x (t1, Ω)), ∀ t > t1
i.e. for the Random Process, the past up to any time t1 is fully summarized by the process at t1.
Examples of Markov Processes:
1. Continuous Dynamic System:
ẋ (t) = f (x, u, v, t)
z (t) = h (x, u, w, t)
2. Discrete Dynamic System:
x (t_{k+1}) = f_k (x_k, u_k, v_k, t_k)
z (t_k) = h_k (x_k, u_k, w_k, t_k)
where x is the state-space vector (n × 1), u the input vector (m × 1), v a white input noise vector (n × 1), z the measurement vector (p × 1), and w a white measurement noise vector (p × 1).
    206 SOLO Table of Content MarkovProcesses Examples of Markov Processes: 3. Continuous Linear Dynamic System ( ) ( ) ( ) ( ) ( )txCtz tvtxAtx = += Using the Fourier Transform we obtain: ( ) ( ) ( ) ( ) ( ) ( )ωωωωω ω VHVAIjCZ H =−= −    1 Using the Inverse Fourier Transform we obtain: ( ) ( ) ( )∫ +∞ ∞− = ξξξ dvthtz , ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( )∫∫ ∫ ∫ ∫∫ ∞+ ∞− ∞+ ∞− − ∞+ ∞− +∞ ∞− +∞ ∞− +∞ ∞− −=−=         −== ξξξξωξωω π ξ ωωξξωξω π ωωωω π ξ ω dthvddtjHv dtjdjvHdtjVHtz th egrattion of order change V       exp 2 1 expexp 2 1 exp 2 1 int h (t,τ) v (t) z (t) Random Processes
    207 SOLO Table of Content MarkovProcesses Examples of Markov Processes: 3. Continuous Linear Dynamic System ( ) ( ) ( ) ( ) ( )txCtz tvtxAtx = += The Autocorrelation of the output is: ( ) ( ) ( )∫ +∞ ∞− = ξξξ dvthtz , h (t,τ) v (t) z (t) ( ) ( ) ( )[ ] ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )[ ] ( ) ( ) ( ) ( ) ( ) ( ) ( )∫∫ ∫ ∫∫ ∫ ∫∫ ∞+ ∞− −= ∞+ ∞− ∞+ ∞− ∞+ ∞− ∞+ ∞− ∞+ ∞− +∞ ∞− +∞ ∞− +=−+−= −−+−=−+−=         −+−=+= ζτζζσξξτξσ ξξξξδξτξσξξξξξτξ ξξξτξξξττ ξζ dhhdthth ddththddvvEthth dvthdvthEtztzER v t v v zz 2 111 2 212121 2 212111 222111 1 ( ) ( ) ( )[ ] ( )τδσττ 2 vvv tvtvER =+= ( ) ( ) ( ) ( ) ( ) 22 expexp vvvvvv djdjRS σττωτδσττωτω =−=−= ∫∫ +∞ ∞− +∞ ∞− ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) 2*2 22 2 expexp expexpexpexp expexp xx xx x RR zzzz HHdjhdjh djdjhhdjdjhh djdhhdjRS zzzz σωωχχωχζζωζσ χχωζζωζχσττζωζζωζτζσ ττωζτζζσττωτω χτζ ττ =                 −= −=−−−= −−=−= ∫∫ ∫ ∫∫ ∫ ∫ ∫∫ ∞+ ∞− ∞+ ∞− ∞+ ∞− ∞+ ∞− =+ ∞+ ∞− ∞+ ∞− +∞ ∞− +∞ ∞− −= +∞ ∞− ( ) ( ) ( ) ( )ωωωω vvzz SHHS * = Random Processes
    208 SOLO Table of Content MarkovProcesses Examples of Markov Processes: 4. Continuous Linear Dynamic System ( ) ( ) ( )∫ +∞ ∞− = ξξξ dvthtz , ( ) ( ) ( )[ ] ( )τδσττ 2 vvv tvtvER =+= ( ) 2 vvvS σω = v (t) z (t) ( ) xj K H ωω ω /1+ = ( ) x j K H ωω ω /1+ = The Power Spectral Density of the output is: ( ) ( ) ( ) ( ) ( )2 22 * /1 x v vvzz K SHHS ωω σ ωωωω + == ( ) ( )2 22 /1 x vv zz K S ωω σ ω + = ω x ω 22 vv K σ 2/ 22 vv K σ The Autocorrelation of the output is: ( ) ( ) ( ) ( ) ( ) ( ) ( )∫∫ ∫ ∞+ ∞− = ∞+ ∞− +∞ ∞− − − = + = = dss s K j dj K djSR x v js x v zzzz τ ω σ π ωτω ωω σ π ωτωω π τ ω exp /12 1 exp /12 1 exp 2 1 2 22 2 22 ωj xω R ( ) 0 /1 2 22 = −∫∞→R s x vv dse s K τ ω σ ( ) 0 /1 2 22 = −∫∞→R s x vv dse s K τ ω σ xω− σ ωσ js += 0<τ 0>τ ( ) τωσω ω x e K R vvx zz = = 2 22 τ 2/ 22 vvxK σω ( )τω σω x vx K −= exp 2 22 ( ) ( ) ( ) ( )               >        + − −= − − <         − − = − − = ∫ ∫ → −→ 0 exp Reexp 2 1 0 exp Reexp 2 1 222 22 222 222 22 222 τ ω τσω τ ω σω π τ ω τσω τ ω σω π ωω ωω x vx x vx x vx x vx s sK sdss s K j s sK sdss s K j x x Random Processes
    209 SOLO Markov Processes Examples ofMarkov Processes: 5. Continuous Linear Dynamic System with Time Variable Coefficients ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ){ } ( ) ( )21121 & :&: tttQteteE twEtwtetxEtxte T ww wx −= −=−= δ w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )twtGtxtFtxtx td d +==  ( ) ( ) ( ) ( ) ( )tetGtetFte wxx += ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t dwGttxtttx 0 ,, 00 λλλλ The solution of the Linear System is: where: ( ) ( ) ( ) ( ) ( ) ( ) ( )3132210000 ,,,&,&,, ttttttItttttFtt td d Φ=ΦΦ=ΦΦ=Φ ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t wxx deGttettte 0 ,, 00 λλλλ ( ){ } ( ) ( ){ } ( ) ( ){ }twEtGtxEtFtxE += Random Processes
    210 SOLO Markov Processes Examples ofMarkov Processes: 5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 1) ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ){ } ( ) ( )21121 & :&: tttQteteE twEtwtetxEtxte T ww wx −= −=−= δ w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t dwGttxtttx 0 ,, 00 λλλλ ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t wxx deGttettte 0 ,, 00 λλλλ ( ) ( ){ } ( ) ( ){ }teteEtxVartV T xxx ==: ( ) ( ){ } ( ) ( ){ }ττττ ++=+=+ teteEtxVartV T xxx : ( ) ( ) ( ){ } ( ) ( ) ( ){ }ττττ +=++=+ teteEttRteteEttR T xxx T xxx :,&:, ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∫ ΦΦ+ΦΦ== t t TTT xxx dtGQGttttVttttRtV 0 ,,,,, 000 λλλλλλ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∫ + +Φ+Φ++Φ+Φ=++=+ τ λλτλλλλττττττ t t TTT xxx dtGQGttttVttttRtV 0 ,,,,, 000 Random Processes
    211 SOLO Markov Processes Examplesof Markov Processes: 5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 2) ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ){ } ( ) ( )21121 & :&: tttQteteE twEtwtetxEtxte T ww wx −= −=−= δ w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t dwGttxtttx 0 ,, 00 λλλλ ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t wxx deGttettte 0 ,, 00 λλλλ ( ) ( ){ } ( ) ( ){ }teteEtxVartV T xxx ==: ( ) ( ){ } ( ) ( ){ }ττττ ++=+=+ teteEtxVartV T xxx : ( ) ( ) ( ){ } ( ) ( ) ( ){ }ττττ +=++=+ teteEttRteteEttR T xxx T xxx :,&:, ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )       <Φ+Φ+Φ+Φ >Φ+Φ+Φ+Φ =+ ∫ ∫ + 0,,,, 0,,,, , 0 0 000 000 τλλλλλλττ τλλλλλλττ τ τt t TTT x t t TTT x x dtGQGttttVtt dtGQGttttVtt ttR ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )            <+Φ+ >Φ+Φ−+Φ+ <Φ+Φ−+Φ >+Φ =+ ∫ ∫ + + 0, 0,,, 0,,, 0, , τττ τλλλλλλτττ τλλλλλλττ ττ τ τ τ tttV dtGQGttttV or dtGQGttVtt tVtt ttR T x t t TTT x t t TT x x x Random Processes
    212 SOLO Markov Processes Examplesof Markov Processes: 5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 3) ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ){ } ( ) ( )21121 & :&: tttQteteE twEtwtetxEtxte T ww wx −= −=−= δ w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t wxx deGttettte 0 ,, 00 λλλλ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )       <Φ+Φ+Φ+Φ >Φ+Φ+Φ+Φ =+ ∫ ∫ + 0,,,, 0,,,, , 0 0 000 000 τλλλλλλττ τλλλλλλττ τ τt t TTT x t t TTT x x dtGQGttttVtt dtGQGttttVtt ttR ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∫ ΦΦ+ΦΦ== t t TTT xxx dtGQGttttVttttRtV 0 ,,,,, 000 λλλλλλ ( ) ( ){ } ( ) ( ){ }teteEtxVartV T xxx ==: ( ) ( ) ( ){ } ( ) ( ) ( ){ }ττττ +=++=+ teteEttRteteEttR T xxx T xxx :,&:, ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )tGtQtGdtFtGQGttFtttVtt dtGQGttFtttVtttFtV td d T t t TTTTT x t t TTT xx +ΦΦ+ΦΦ+ ΦΦ+ΦΦ= ∫ ∫ 0 0 ,,,, ,,,, 000 000 λλλλλλ λλλλλλ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )tGtQtGtFtVtVtFtV td d TT xxx ++= ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )ττττττττ +++++++++=+ tGtQtGtFtVtVtFtV td d TT xxx Random Processes
    213 SOLO Markov Processes Examplesof Markov Processes: 5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 4) ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ){ } ( ) ( )21121 & :&: tttQteteE twEtwtetxEtxte T ww wx −= −=−= δ w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t wxx deGttettte 0 ,, 00 λλλλ ( ) ( ){ } ( ) ( ){ }teteEtxVartV T xxx ==: ( ) ( ) ( ){ } ( ) ( ) ( ){ }ττττ +=++=+ teteEttRteteEttR T xxx T xxx :,&:, ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )       <Φ+Φ+Φ+Φ >Φ+Φ+Φ+Φ =+ ∫ ∫ + 0,,,, 0,,,, , 0 0 000 000 τλλλλλλττ τλλλλλλττ τ τt t TTT x t t TTT x x dtGQGttttVtt dtGQGttttVtt ttR ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )   <+Φ++++++++ >+Φ+++++ =+ 0,,, 0,,, , ττττττττ τττττ τ tttGtQtGtFttRttRtF tGtQtGtttFttRttRtF ttR td d TTT xx TT xx x ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )   <++++Φ+++++ >+Φ+++++ =+ 0,,, 0,,, , ττττττττ τττττ τ tGtQtGtttFttRttRtF tttGtQtGtFttRttRtF ttR td d TT xx TTT xx x Random Processes
    214 SOLO Markov Processes Examplesof Markov Processes: 6. How to Decide if a Input Noise can be Approximated by a White or a Colored Noise w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )twtGtxtFtxtx td d +== Given a Continuous Linear System: we want to decide if can be approximated by a white noise.( )tw Let start with a first order linear system with white noise input :( )tw' ( ) ( ) ( )tw T tw T tw ' 11 +−= w (t)w' (t) ( ) Ts sH + = 1 1 ( ) ( ) Ttt w ett / 0 0 , −− =φ ( ) ( ){ }[ ] ( ) ( ){ }[ ]{ } ( )τδττ −=−− tQwEwtwEtwE '''' ( ) ( ){ }[ ] ( ) ( ){ }[ ]{ } ( )ttRtwEtwtwEtwE ww ,τττ +=−+−+ ( ) ( ){ }[ ] ( ) ( ){ }[ ]{ } ( )τττ +=+−+− ttRtwEtwtwEtwE ww , ( ) ( ){ }[ ] ( ) ( ){ }[ ]{ } ( ) ( )ttRtVwEwtwEtwE wwww ,==−− ττ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )tGtQtGtFtVtVtFtV td d TT xxx ++= ( ) ( ) Q T tV T tV td d wwww 2 12 +−= ( ) ( )00 , 1 , tt T tt td d ww φφ −= where Random Processes
    215 SOLO Markov Processes Examplesof Markov Processes: 6. How to Decide if a Input Noise can be Approximated by a White or a Colored Noise (continue – 1) ( ) ( ) Q T tV T tV td d wwww 2 12 +−= ( ) ( )       −+= −− T t T t wwww e T Q eVtV 22 1 2 0 t 2/T ( ) T t ww eV 2 0 −       − − T t e T Q 2 1 2 T Q V statesteadyww 2 =− ( )tVww ( ) ( ) ( ) ( ) ( ) ( ) ( )     <+=+Φ+ >=+Φ =+ − − 0, 0, , ττττ ττ τ τ τ tVetttV tVetVtt ttR ww TT www ww T www ww ( ) ( ) ( ) ( ) ( ) ( ) ( )     <+=++Φ >=+Φ =+ − − 0, 0, , ττττ ττ τ τ τ tVetVtt tVetttV ttR ww T www ww TT www ww For ( ) ( ) T Q VtVtV T statesteadywwwwww 2 5 ==+≈⇒> − τ τ ( ) ( ) ( ) TT statesteadywwwwwwww e T Q eVVttRttR T ττ τττ τ −− − =≈≈+≈+⇒> 2 ,,5 w (t)w' (t) ( ) Ts sH + = 1 1 Random Processes
    216 SOLO Markov Processes Examplesof Markov Processes: 6. How to Decide if a Input Noise can be Approximated by a White or a Colored Noise (continue – 2) ( ) ( ) ( ) TT statesteadywwwwwwww e T Q eVVttRttR T ττ τττ τ −− − =≈≈+≈+⇒> 2 ,,5 ( ) T ww e T Q V / 2 τ τ = = τ T Q V statesteadyww 2 =− T− T 1− − ⋅eV statesteadyww ( ) Qde T Q dVArea T ww === ∫∫ +∞ − +∞ ∞− 0 2 2 τττ τ T is the correlation time of the noise w (t) and can be found from Vww (τ) by tacking the time corresponding to Vww steady-state /e. One other way to find T is by tacking the double sides Laplace Transform L2 on τ of: ( ) ( ){ } ( ) QdetQtQs s ww =−=−=Φ ∫ +∞ ∞− − ττδτδ τ τ2'' L ( ) ( ){ } ( ) ( ) ( )sHQsH sT Q dee T Q Vs sT sswwww −= = = ==Φ ∫ +∞ ∞− −− − 2 / 2 1 2 ττ ττ τL ( ) ( )2 2/1 /1 ωω ω + = Q Qww ω T/12/1 =ω Q 2/Q T/12/1 −=−ω T can be found by tacking ω1/2 of half of the power spectrum Q/2 and T=1/ ω1/2. Random Processes
    217 SOLO Markov Processes Examplesof Markov Processes: ( ) ( )2 2/1/1 ωω ω + = Q Qww ω T/12/1 =ω Q 2/Q T/12/1 −=−ω Let return to the original system: ( ) ( ) ( ) ( ) ( ) ( )twtGtxtFtxtx td d +==  w (t) x (t) ( )tF ( )tG ∫ x (t) 6. How to Decide if a Input Noise can be Approximated by a White or a Colored Noise (continue – 3) Compute the power spectrum of and define Q and T. ( )ωjsww =Φ ( )tw then can be approximated by the white noise with( )tw ( )tw' ( ) ( ){ }[ ] ( ) ( ){ }[ ]{ } ( )τδττ −=−− tQwEwtwEtwE '''' then can be approximated by a colored noise that can be obtained by passing the predefined white noise through a filter ( )tw ( )tw' ( ) sT sH + = 1 1 If Fofeigenvaluemaximum 1 FofconstanttimeminimumT =<51 If Fofeigenvaluemaximum 1 FofconstanttimeminimumT =>52 Random Processes
218 SOLO Markov Processes (Random Processes)
Examples of Markov Processes:
7. Digital Simulation of a Continuous Process
Let us start with a first-order linear system driven by white noise w' (t):
ẇ (t) = −(1/T) w (t) + (1/T) w' (t),  H (s) = 1 / (1 + T s),  φ_w (t, t0) = e^(−(t − t0)/T)
where E { [w' (t) − E {w' (t)}] [w' (τ) − E {w' (τ)}] } = Q δ (t − τ)
The solution is:
w (t) = e^(−(t − t0)/T) w (t0) + (1/T) ∫_{t0}^{t} e^(−(t − τ)/T) w' (τ) dτ
Let us choose t = (k+1) ΔT and t0 = k ΔT:
w [(k+1) ΔT] = e^(−ΔT/T) w (k ΔT) + (1/T) ∫_{kΔT}^{(k+1)ΔT} e^(−[(k+1)ΔT − τ]/T) w' (τ) dτ
219 SOLO Markov Processes (Random Processes)
Examples of Markov Processes:
7. Digital Simulation of a Continuous Process (continue – 1)
Define: ρ := e^(−ΔT/T)
The variance of the integral term is:
E { [(1/T) ∫_{kΔT}^{(k+1)ΔT} e^(−[(k+1)ΔT − τ]/T) w' (τ) dτ]² } = (Q/T²) ∫_{kΔT}^{(k+1)ΔT} e^(−2[(k+1)ΔT − τ]/T) dτ = (Q/2T) (1 − e^(−2ΔT/T)) = (Q/2T) (1 − ρ²)
Define w' (k) such that: E { [w' (k) − E {w' (k)}]² } = Q/(2T)
Therefore:
w [(k+1) ΔT] = ρ w (k ΔT) + √(1 − ρ²) w' (k)
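The recursion above gives a direct way to simulate the colored (first-order Gauss-Markov) process on a computer. A minimal sketch with hypothetical names, assuming Gaussian driving noise; the driving samples have the steady-state variance Q/(2T), so the simulated process starts and stays in steady state:

```python
import math
import random

def simulate_gauss_markov(T, Q, dt, n_steps, seed=0):
    """Discrete simulation of a first-order Gauss-Markov (colored-noise) process.

    w(k+1) = rho * w(k) + sqrt(1 - rho^2) * w'(k), rho = exp(-dt / T),
    where w'(k) is white with variance Q / (2 T) (the steady-state variance).
    """
    rng = random.Random(seed)
    rho = math.exp(-dt / T)
    sd = math.sqrt(Q / (2.0 * T))          # steady-state standard deviation
    scale = math.sqrt(1.0 - rho * rho)
    w = sd * rng.gauss(0.0, 1.0)           # start in steady state
    out = [w]
    for _ in range(n_steps - 1):
        w = rho * w + scale * sd * rng.gauss(0.0, 1.0)
        out.append(w)
    return out
```

With T = 1, Q = 2 the steady-state variance is Q/(2T) = 1, and the lag-one correlation of the simulated sequence should be close to ρ = e^(−ΔT/T).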
220 SOLO Markov Chains (Random Processes)
(State diagram: three states X1, X2, X3, with the transition probabilities listed on the following slides.)
A Markov chain, named after Andrey Markov, is a stochastic process with the Markov property. Having the Markov property means that, given the present state, future states are independent of the past states. In other words, the description of the present state fully captures all the information that could influence the future evolution of the process. Being a stochastic process means that all state transitions are probabilistic.
Andrey Andreevich Markov 1856 - 1922
At each step the system may change its state from the current state to another state (or remain in the same state) according to a probability distribution. The changes of state are called transitions, and the probabilities associated with the various state changes are called transition probabilities.
Definition of Markov Chains
A Markov chain is a sequence of random variables X1, X2, X3, ... with the Markov property, namely that, given the present state, the future and past states are independent:
Pr (X_{n+1} = x | X_1 = x_1, …, X_n = x_n) = Pr (X_{n+1} = x | X_n = x_n)
221 SOLO Markov Chains (Random Processes)
Properties of Markov Chains
Define the probability of going from state i to state j in m time steps as:
p_{i→j}^(m) = Pr (X_m = j | X_0 = i)
and the single-step transition as:
p_{i→j} = Pr (X_1 = j | X_0 = i)
(State diagram: from X1: p_{1→1} = 0.1, p_{1→2} = 0.6, p_{1→3} = 0.3; from X2: p_{2→1} = 0.5, p_{2→2} = 0.2, p_{2→3} = 0.3; from X3: p_{3→1} = 0.6, p_{3→2} = 0.3, p_{3→3} = 0.1.)
For a time-homogeneous Markov Chain:
p_{i→j}^(m) = Pr (X_{k+m} = j | X_k = i) and p_{i→j} = Pr (X_{k+1} = j | X_k = i)
so the n-step transition satisfies the Chapman-Kolmogorov equation: for any k such that 0 < k < n,
p_{i→j}^(n) = Σ_{r∈S} p_{i→r}^(k) p_{r→j}^(n−k)
222 SOLO Markov Chains (Random Processes)
Properties of Markov Chains (continue – 1)
The marginal distribution Pr (X_k = x) is the distribution over states at time k. For a time-homogeneous Markov Chain:
Pr (X_{k+1} = j) = Σ_{r∈S} Pr (X_{k+1} = j | X_k = r) Pr (X_k = r) = Σ_{r∈S} p_{r→j} Pr (X_k = r)
In matrix form it can be written as:
[Pr (X_1), Pr (X_2), …, Pr (X_N)]ᵀ at step k+1 = K · [Pr (X_1), Pr (X_2), …, Pr (X_N)]ᵀ at step k, with K_{ij} = p_{j→i}
where N is the number of states of the Markov Chain. For the three-state chain above:
K = [ 0.1 0.5 0.6
      0.6 0.2 0.3
      0.3 0.3 0.1 ]
Properties of the Transition Matrix K:
1. 0 ≤ p_{j→i} ≤ 1
2. Σ_{i=1}^{N} p_{j→i} = 1 (each column of K sums to 1)
223 SOLO Markov Chains Random Processes
Properties of Markov Chains (continue – 2)
Reducibility
A state j is said to be accessible from a state i (written i → j) if a system started in state i has a non-zero probability of transitioning into state j at some point. Formally, state j is accessible from state i if there exists an integer n ≥ 0 such that:
Pr(X_n = j | X_0 = i) = p^(n)_{i→j} > 0
Allowing n to be zero means that every state is defined to be accessible from itself.
A state i is said to communicate with state j (written i ↔ j) if both i → j and j → i. A set of states C is a communicating class if every pair of states in C communicates with each other, and no state in C communicates with any state not in C. It can be shown that communication in this sense is an equivalence relation, and thus that communicating classes are the equivalence classes of this relation.
A communicating class is closed if the probability of leaving the class is zero, namely: if i is in C but j is not, then j is not accessible from i.
Finally, a Markov chain is said to be irreducible if its state space is a single communicating class; in other words, if it is possible to get to any state from any state.
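Accessibility is a reachability question on the graph of positive-probability edges, so irreducibility can be tested with a breadth-first search. A minimal sketch, with hypothetical example matrices:

```python
# Sketch: i -> j accessibility via BFS over edges with p_{i->j} > 0;
# the chain is irreducible iff every state reaches every state.
from collections import deque

def accessible(P, i):
    """Set of states reachable from i through positive-probability edges."""
    seen, queue = {i}, deque([i])
    while queue:
        s = queue.popleft()
        for j, p in enumerate(P[s]):
            if p > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def is_irreducible(P):
    n = len(P)
    return all(accessible(P, i) == set(range(n)) for i in range(n))

P = [[0.1, 0.6, 0.3],      # hypothetical fully-connected chain
     [0.5, 0.2, 0.3],
     [0.6, 0.3, 0.1]]
assert is_irreducible(P)

Q = [[1.0, 0.0],           # state 0 is absorbing: {0} is a closed class
     [0.5, 0.5]]
assert not is_irreducible(Q)
```

In the second chain the singleton {0} is a closed communicating class, so the state space is not a single class.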
  • 224.
224 SOLO Markov Chains Random Processes
Properties of Markov Chains (continue – 3)
Periodicity
A state i has period k if any return to state i must occur in multiples of k time steps. Formally, the period of a state is defined as:
k := greatest common divisor of { n : Pr(X_n = i | X_0 = i) > 0 }
Note that even though a state has period k, it may not be possible to reach the state in k steps. For example, suppose it is possible to return to the state in {6, 8, 10, 12, ...} time steps; then k would be 2, even though 2 does not appear in this list.
If k = 1, then the state is said to be aperiodic; otherwise (k > 1), the state is said to be periodic with period k. It can be shown that every state in a communicating class must have the same period.
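The definition translates directly into a gcd over return times with positive probability. A sketch that scans n-step transition matrices up to a finite horizon (the horizon and example chains are illustrative choices):

```python
# Sketch: period of a state as gcd{ n : p^(n)_{i->i} > 0 }, truncated
# at a finite horizon. Example chains are hypothetical.
from math import gcd

def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

def period(P, i, horizon=50):
    """gcd of the return times of state i, up to `horizon` steps."""
    k, Pn = 0, P                  # Pn holds P^n as n advances
    for n in range(1, horizon + 1):
        if Pn[i][i] > 1e-15:      # positive return probability at step n
            k = gcd(k, n)
        Pn = mat_mult(Pn, P)
    return k

# Deterministic 2-cycle: returns to state 0 only at n = 2, 4, 6, ...
cycle = [[0.0, 1.0],
         [1.0, 0.0]]
assert period(cycle, 0) == 2      # periodic with period 2

# A self-loop allows a return at n = 1, so the state is aperiodic.
loop = [[0.5, 0.5],
        [1.0, 0.0]]
assert period(loop, 0) == 1
```

Both states of the 2-cycle have period 2, as expected from the "same period within a communicating class" property.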
  • 225.
225 SOLO Review of Probability Existence Theorems
Existence Theorem 3
Given a function S(ω) = S(−ω) or, equivalently, a positive-definite function R(τ) (R(τ) = R(−τ), and R(0) = max R(τ) for all τ), we can find a stochastic process x(t) having S(ω) as its power spectrum or R(τ) as its autocorrelation.
Proof of Existence Theorem 3
Define:
a² := (1/π) ∫_{−∞}^{+∞} S(ω) dω and f(ω) := S(ω)/(a²π) = S(−ω)/(a²π) = f(−ω)
Since f(ω) ≥ 0 and ∫_{−∞}^{+∞} f(ω) dω = 1, according to Existence Theorem 1 we can find a random variable ω with the even probability density function f(ω) and probability distribution function
P(ω) := ∫_{−∞}^{ω} f(τ) dτ
We now form the process x(t) := a cos(ωt + θ), where θ is a random variable uniformly distributed in the interval (−π, +π) and independent of ω.
  • 226.
226 SOLO Review of Probability Existence Theorems
Existence Theorem 3
Proof of Existence Theorem 3 (continue – 1)
Since θ is uniformly distributed in the interval (−π, +π), its characteristic function is
E{e^{jνθ}} = (1/2π) ∫_{−π}^{+π} e^{jνθ} dθ = (e^{jνπ} − e^{−jνπ})/(2πjν) = sin(νπ)/(νπ)
or
E{e^{jνθ}} = E{cos(νθ)} + j E{sin(νθ)} = sin(νπ)/(νπ)
For ν = 1 this gives E{cos θ} = E{sin θ} = 0, so, using the independence of ω and θ:
E{x(t)} = a E{cos(ωt)} E{cos θ} − a E{sin(ωt)} E{sin θ} = 0
For ν = 2 it gives E{cos 2θ} = E{sin 2θ} = 0, so:
E{x(t) x(t+τ)} = a² E{cos(ωt + θ) cos(ω(t+τ) + θ)}
= (a²/2) E{cos(ωτ)} + (a²/2) E{cos(ω(2t+τ) + 2θ)}
= (a²/2) E{cos(ωτ)} + (a²/2) E{cos(ω(2t+τ))} E{cos 2θ} − (a²/2) E{sin(ω(2t+τ))} E{sin 2θ}
= (a²/2) E{cos(ωτ)}
  • 227.
227 SOLO Review of Probability Existence Theorems
Existence Theorem 3
Proof of Existence Theorem 3 (continue – 2)
We have x(t) := a cos(ωt + θ), with:
E{x(t)} = 0
E{x(t) x(t+τ)} = (a²/2) E{cos(ωτ)} = (a²/2) ∫_{−∞}^{+∞} f(ω) cos(ωτ) dω = R_x(τ)
Because of these two properties, x(t) is wide-sense stationary, with power spectrum given by the Fourier pair:
S_x(ω) = ∫_{−∞}^{+∞} R_x(τ) [cos(ωτ) − j sin(ωτ)] dτ = ∫_{−∞}^{+∞} R_x(τ) cos(ωτ) dτ  (since R_x(τ) = R_x(−τ))
R_x(τ) = (1/2π) ∫_{−∞}^{+∞} S_x(ω) [cos(ωτ) + j sin(ωτ)] dω = (1/2π) ∫_{−∞}^{+∞} S_x(ω) cos(ωτ) dω  (since S_x(ω) = S_x(−ω))
Comparing with the f(ω) definition, therefore:
S_x(ω) = a²π f(ω) = S(ω)
q.e.d.
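The construction in the proof can be simulated. Below, the frequency ω takes the values ±1 with equal probability (an even density), so the theorem predicts R(τ) = (a²/2)·cos τ; the parameter values are illustrative choices, not from the slides.

```python
# Sketch: Monte Carlo check of E{x(t) x(t+tau)} for x(t) = a*cos(omega*t + theta),
# theta ~ Uniform(-pi, pi) independent of omega, omega = +-1 equiprobable.
import math, random

random.seed(0)
a, tau, N = 2.0, 0.7, 200_000
acc = 0.0
for _ in range(N):
    omega = random.choice([-1.0, 1.0])        # symmetric (even) frequency law
    theta = random.uniform(-math.pi, math.pi)
    acc += (a * math.cos(theta)) * (a * math.cos(omega * tau + theta))
estimate = acc / N                            # sample E{x(0) x(tau)}
expected = (a * a / 2.0) * math.cos(tau)      # R(tau) predicted by the proof
assert abs(estimate - expected) < 0.03
```

With 200 000 samples the Monte Carlo error is a few thousandths, well inside the asserted tolerance.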
  • 228.
228 SOLO Permutations & Combinations
Permutations
Given n objects that can be arranged in a row, how many different permutations (new orderings of the objects) are possible?
To count the possible permutations, let us start by moving only the first object {1}.
[Figure: the n arrangements obtained by placing object {1} in each of the n positions]
By moving only the first object {1}, we obtain n permutations.
  • 229.
229 SOLO Permutations & Combinations
Permutations (continue – 1)
Since we have obtained all the possible positions of the first object, we now perform the same procedure with the second object {2}, which changes position with each of the other objects, in each of the n permutations obtained before. For example, from group 1 we obtain the following new permutations:
[Figure: object {2} exchanged with each of the remaining objects, giving n − 1 new permutations]
Since this is true for all permutations (n − 1 new permutations for each of the first n permutations), we obtain a total of n (n − 1) permutations.
  • 230.
230 SOLO Permutations & Combinations
Permutations (continue – 2)
If we perform the same procedure with the third object {3}, which changes position with all other objects besides objects {1} and {2} (those arrangements we already obtained), in each of the n (n−1) permutations obtained before, we obtain a total of n (n−1) (n−2) permutations.
We continue the procedure with the objects {4}, {5}, …, {n}, to obtain finally the total number of permutations of the n objects:
n (n−1) (n−2) (n−3) ⋯ 1 = n!
Gamma Function Γ
The gamma function Γ is defined as:
Γ(a) = ∫_0^∞ t^(a−1) exp(−t) dt
If a = n is an integer then, integrating by parts (u = tⁿ, dv = exp(−t) dt):
Γ(n+1) = ∫_0^∞ tⁿ exp(−t) dt = [−tⁿ exp(−t)]_0^∞ + n ∫_0^∞ t^(n−1) exp(−t) dt = n Γ(n)
Γ(1) = ∫_0^∞ exp(−t) dt = [−exp(−t)]_0^∞ = 1
Therefore:
Γ(n+1) = n Γ(n) = n (n−1) ⋯ 2 · 1 = n!
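Both facts on this slide are easy to spot-check: the identity Γ(n+1) = n! with the standard-library gamma function, and the count n! by enumerating the orderings of a small set.

```python
# Sketch: numeric check of Gamma(n+1) = n!, and brute-force count of
# the n! orderings of n objects for a small n.
import math
from itertools import permutations

# Gamma(n+1) agrees with n! for the first few integers.
for n in range(1, 10):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))

# 4 objects can be arranged in 4! = 24 different orders.
assert len(list(permutations(range(4)))) == math.factorial(4)
```

`math.gamma` evaluates the same integral definition given above, so the agreement is exactly the recursion Γ(n+1) = n Γ(n), Γ(1) = 1.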
  • 231.
231 SOLO Permutations & Combinations
Combinations
Given k boxes, each box having a maximum capacity (for box i the maximum object capacity is n_i), and given also n objects that must be arranged in the k boxes, each box must be filled to its maximum capacity:
n_1 + n_2 + ⋯ + n_k = n
The order of the objects in a box is not important.
Example: a box with a capacity of three objects, in which we arranged the objects {2}, {4}, {7}. The orderings {2,4,7}, {2,7,4}, {4,2,7}, {4,7,2}, {7,2,4}, {7,4,2} are all equivalent: 3! = 6 arrangements, 1 outcome.
  • 232.
232 SOLO Permutations & Combinations
Combinations (continue – 1)
In order to count the different combinations, we start with the n! different arrangements of the n objects.
In each of the n! arrangements the first n_1 objects go to box no. 1, the next n_2 objects to box no. 2, and so on, with the last n_k objects going to box no. k. Since
n_1 + n_2 + ⋯ + n_k = n
all the objects end up in one of the boxes.
  • 233.
233 SOLO Permutations & Combinations
Combinations (continue – 2)
But since the order of the objects in the boxes is not important, to obtain the number of different combinations we must divide the total number of permutations n! by n_1!, because of box no. 1, as seen in the example below, where we used n_1 = 2.
[Figure: with n_1 = 2, arrangements that differ only in the order of the two objects in box no. 1 collapse to the same combination, so each combination is counted n_1! times]
Therefore, since the order of the objects in the boxes is not important, and because box no. 1 can contain only n_1 objects, the number of combinations is n!/n_1!.
  • 234.
234 SOLO Permutations & Combinations
Combinations (continue – 3)
Since the order of the objects in the boxes is not important, to obtain the number of different combinations we must divide the total number of arrangements n! by n_1! (because of box no. 1), by n_2! (because of box no. 2), and so on, up to n_k! (because of box no. k), to obtain:
n! / (n_1! n_2! ⋯ n_k!)
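The multinomial count n!/(n_1!⋯n_k!) can be confirmed by brute force: enumerate all n! arrangements, forget the order inside each box, and count the distinct splittings that remain.

```python
# Sketch: multinomial coefficient n!/(n1!*...*nk!) versus a brute-force
# enumeration of box contents for a small case.
import math
from itertools import permutations

def multinomial(sizes):
    """Number of ways to fill boxes of the given sizes with n = sum(sizes) objects."""
    count = math.factorial(sum(sizes))
    for s in sizes:
        count //= math.factorial(s)   # order inside each box is irrelevant
    return count

# 4 objects into two boxes of size 2: formula gives 4!/(2!*2!) = 6.
assert multinomial((2, 2)) == 6

# Brute force: the unordered content of each box is a frozenset.
splits = {(frozenset(p[:2]), frozenset(p[2:])) for p in permutations(range(4))}
assert len(splits) == multinomial((2, 2))
```

Each distinct splitting appears 2!·2! = 4 times among the 4! = 24 arrangements, which is exactly the division performed on this slide.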
  • 235.
235 SOLO Review of Probability References
[1] W.B. Davenport, Jr., and W.I. Root, “An Introduction to the Theory of Random Signals and Noise”, McGraw-Hill, 1958
[2] A. Papoulis, “Probability, Random Variables and Stochastic Processes”, McGraw-Hill, 1965
[3] K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988
[4] S.M. Ross, “Introduction to Probability Models”, 4th Ed., Academic Press, 1989
[5] S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990
[6] R.M. McDonough, and A.W. Whalen, “Detection of Signals in Noise”, 2nd Ed., Academic Press, 1995
[7] Al. Spătaru, “Teoria Transmisiunii Informaţiei – Semnale şi Perturbaţii” (Information Transmission Theory – Signals and Perturbations, in Romanian), Editura Tehnică, Bucureşti, 1965
[8] http://www.york.ac.uk/depts/maths/histstat/people/welcome.htm
[9] http://en.wikipedia.org/wiki/Category:Probability_and_statistics
[10] http://www-groups.dcs.st-and.ac.uk/~history/Biographies
  • 236.
236 SOLO Review of Probability
Integrals Used in Probability
∫_0^1 uⁿ (1−u)ᵐ du = n! m! / (n+m+1)!
∫ x exp(ax) dx = exp(ax) (x/a − 1/a²)
∫ x² exp(ax) dx = exp(ax) (x²/a − 2x/a² + 2/a³)
∫_0^∞ exp(−x²) dx = √π / 2
∫_0^∞ exp(−a x²) dx = (1/2) √(π/a),  a > 0
∫_{−∞}^{+∞} exp(−x²) dx = √π
∫_{−∞}^{+∞} exp(−a x²) dx = √(π/a),  a > 0
∫_0^∞ xⁿ exp(−x) dx = n!,  n = 0, 1, 2, 3, …
∫_0^∞ xⁿ exp(−a x) dx = n! / a^(n+1),  a > 0, n = 0, 1, 2, 3, …
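Two entries of the table can be spot-checked with a simple trapezoid rule; the step counts and cutoff below are ad-hoc choices for illustration.

```python
# Sketch: numeric verification of two table entries with the trapezoid rule.
import math

def trapezoid(f, lo, hi, steps):
    """Composite trapezoid rule for f on [lo, hi] with `steps` subintervals."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Integral of x^n exp(-a x) over [0, inf) equals n!/a^(n+1); here n = 3, a = 2,
# so the value is 6/16 = 0.375 (the tail beyond x = 40 is negligible).
n, a = 3, 2.0
val = trapezoid(lambda x: x**n * math.exp(-a * x), 0.0, 40.0, 200_000)
assert math.isclose(val, math.factorial(n) / a**(n + 1), rel_tol=1e-6)

# Integral of u^n (1-u)^m over [0, 1] equals n! m!/(n+m+1)!; here n = 2, m = 3,
# so the value is 2*6/720 = 1/60.
n, m = 2, 3
val = trapezoid(lambda u: u**n * (1 - u)**m, 0.0, 1.0, 100_000)
assert math.isclose(val, math.factorial(n) * math.factorial(m)
                    / math.factorial(n + m + 1), rel_tol=1e-6)
```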
  • 237.
237 SOLO Review of Probability Gamma Function
  • 238.
238 SOLO Review of Probability Incomplete Gamma Function
  • 239.
239 SOLO
January 6, 2015
Technion – Israel Institute of Technology: 1964 – 1968 BSc EE, 1968 – 1971 MSc EE
Israeli Air Force: 1970 – 1974
RAFAEL – Israeli Armament Development Authority: 1974 – 2013
Stanford University: 1983 – 1986 PhD AA
  • 240.
240 SOLO Review of Probability
Perron–Frobenius Theorem
In linear algebra, the Perron–Frobenius Theorem, named after Oskar Perron (1880 – 1975) and Ferdinand Georg Frobenius (1849 – 1919), asserts that a real square matrix with positive entries has a unique largest real eigenvalue and that the corresponding eigenvector has strictly positive components. This theorem has important applications to probability theory (ergodicity of Markov chains) and to the theory of dynamical systems (subshifts of finite type).
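For a positive stochastic matrix the Perron eigenvalue is 1, and repeated application of the chain drives any initial distribution to the corresponding strictly positive eigenvector (the stationary distribution). A sketch with a hypothetical row-stochastic matrix:

```python
# Sketch: power iteration pi <- pi P converges to the Perron eigenvector
# (eigenvalue 1) of a positive row-stochastic matrix P. P is hypothetical.
def step(pi, P):
    """One update of a row distribution vector: returns pi * P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.1, 0.6, 0.3],
     [0.5, 0.2, 0.3],
     [0.6, 0.3, 0.1]]      # all entries positive, rows sum to 1

pi = [1.0, 0.0, 0.0]       # any initial distribution works
for _ in range(200):
    pi = step(pi, P)

# pi is now (numerically) a fixed point: pi P = pi, with strictly
# positive components, as Perron-Frobenius guarantees.
nxt = step(pi, P)
assert all(abs(nxt[j] - pi[j]) < 1e-12 for j in range(3))
assert all(x > 0 for x in pi)
assert abs(sum(pi) - 1.0) < 1e-9
```

Convergence is geometric at the rate of the second-largest eigenvalue modulus, which for a strictly positive matrix is below 1.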
  • 241.
SOLO Review of Probability
Monte Carlo Categories
1. Monte Carlo Calculations: design various random or pseudo-random number generators.
2. Monte Carlo Sampling: develop efficient (variance-reduction oriented) sampling techniques for estimation.
3. Monte Carlo Optimization: optimize some (non-convex, non-differentiable) functions using, for example, simulated annealing, dynamic weighting, or genetic algorithms.

Editor's Notes

  • #7 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #16 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #20 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #29 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.107-108
  • #32 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.147-148
  • #33 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp. 147-149
  • #34 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp. 147-149
  • #35 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.245-246
  • #36 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.245-246
  • #37 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.126-132
  • #38 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.126-132
  • #39 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #43 S.M. Ross,”Introduction to Probability Models”, 5th Ed., Academic Press. Pg.58
  • #51 A. Papoulis, “ Probability, Random Variables and StochasticProcesses”, McGraw-Hill, 1965, pp.99-100
  • #52 A. Papoulis, “ Probability, Random Variables and StochasticProcesses”, McGraw-Hill, 1965, pp.169
  • #53 http://en.wikipedia.org/wiki/Histogram
  • #54 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #55 http://www.math.rutgers.edu/courses/591/pchap3.pdf http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #58 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pg.151 http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Bienayme.html
  • #59 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #60 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #61 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #62 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #63 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #64 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #65 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #66 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #67 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #68 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #69 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.260-263 http://en.wikipedia.org/wiki/Convergence_in_distribution
  • #70 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.260-263
  • #71 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.263-266 http://en.wikipedia.org/wiki/Law_of_large_numbers
  • #72 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.263-266 http://en.wikipedia.org/wiki/Law_of_large_numbers
  • #73 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.260-263 http://en.wikipedia.org/wiki/Law_of_large_numbers
  • #74 http://en.wikipedia.org/wiki/Central_limit_theorem
  • #82 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pg.85
  • #83 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.74-75
  • #86 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”, pg. 23
  • #88 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #89 Development of Sterling asymptotical approximation can be found in André Angot:”Compléments de Mathématiques”, § 9.1.4
  • #92 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.76
  • #93 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.74-75
  • #94 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #99 Sheldon M.Ross, ”Introduction to Probability Models”
  • #101 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.250-251
  • #102 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.250-251
  • #103 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.250-251
  • #104 Bar-Shalom, Fortman, T.E., “Tracking and Data Association”, Academic Press, 1988, pp. 70-79 Bar-Shalom, Y, Li, X-R, “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp.235-250
  • #105 Bar-Shalom, Fortman, T.E., “Tracking and Data Association”, Academic Press, 1988, pp. 70-79 Bar-Shalom, Y, Li, X-R, “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp.235-250
  • #106 R. McDonough & A.D. Whalen, “Detection of Signals in Noise”, 2nd Ed., pg. 142 http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Gosset.html
  • #107 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #108 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #109 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”, pp. 46-47
  • #110 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”, pp. 46-47
  • #111 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”, pp. 46-47
  • #112 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”, pp. 46-47
  • #131 http://en.wikipedia.org/wiki/Monte_Carlo_sampling http://www.lanl.gov/news/pdf/Metropolis_bio.pdf
  • #133 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pg.147, Prob. 4-10
  • #134 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pg.147, Problem 4-10
  • #135 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pg.147, Problem 4-10
  • #136 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #137 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #138 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #139 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #140 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #141 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #142 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #143 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #144 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #145 Bar-Shalom, Y., Xiao-Rong, L., “Estimation and Tracking: Principles, Techniques, and Software”, Artech House, 1993, pp. 108-109
  • #146 http://www.etsu.edu/math/seier/Kurto100years.doc http://en.wikipedia.org/wiki/Kurtosis http://en.wikipedia.org/wiki/Karl_Pearson
  • #147 http://www.etsu.edu/math/seier/Kurto100years.doc http://en.wikipedia.org/wiki/Kurtosis http://en.wikipedia.org/wiki/Karl_Pearson
  • #148 http://en.wikipedia.org/wiki/Skewness http://en.wikipedia.org/wiki/Kurtosis http://en.wikipedia.org/wiki/Karl_Pearson
  • #149 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.105-106, Problem 4-1-1
  • #150 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.105-106, Problem 4-1-1
  • #151 Ross, S.,M., “A Course in Simulation”, Collier Macmillan Publishers, A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.105-106, Problem 4-1-1
  • #152 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.113-114, Example 4-2-1
  • #153 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.113-114, Example 4-2-1
  • #154 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.1243-126, Example 4-3-1
  • #155 University of Alberta “ Principles of Monte Carlo Simulation”, February 2001
  • #156 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996 (http://www.npac.syr.edu/users/paulc/montecarlo/p_montecarlo.html) Ren, J., “Pseudorandom Number Generators and the Metropolis Algorithm”, (http://www.hep.fsu.edu/~berd/teach/mcmc05/homework/Ren_RandomNumbers.ppt)
  • #157 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996
  • #158 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 36 – 37 Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996
  • #159 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 36 – 37 Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996
  • #160 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 36 – 37 Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996 http://en.wikipedia.org/wiki/Histogram
  • #161 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 44 - 50
  • #162 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 44 - 45
  • #163 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 49 - 50
  • #164 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 50 - 51
  • #165 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 51 - 52
  • #166 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 51 - 52
  • #167 Karlsson, R., “ Simulation Based Methods for Target Tracking”, Linkoping Studies in Science and Technology, Thesis No. 930, 2002, pp. 34 – 35, , http://www.control.isy.liu.se/research/reports/LicentiateThesis/Lic930.pdf
  • #168 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 59 - 60
  • #169 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 51 - 52
  • #170 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 51 - 52
  • #171 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 http://en.wikipedia.org/wiki/Bootstrapping_(statistics) Efron, B., “Bootstrap methods: another look at the jackknife”, The Annals of Statistics”, 1979, no.7, pp. 1-26 http://www-stat.stanford.edu/~ckirby/brad/ http://en.wikipedia.org/wiki/Bradley_Efron
  • #172 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 http://en.wikipedia.org/wiki/Bootstrapping_(statistics) Efron, B., “Bootstrap methods: another look at the jackknife”, The Annals of Statistics”, 1979, no.7, pp. 1-26
  • #173 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 135 - 136
  • #174 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 135 – 136 Karlsson, R., “Simulation Based Methods for Target Tracking”, Linkoping Studies in Science and Technology, Thesis No. 930, 2002, pp. 35 – 36, http://www.control.isy.liu.se/research/reports/LicentiateThesis/Lic930.pdf
  • #175 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 (http://www.ualberta.ca/~cdeutsh/MCS_course.htm) Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architecture Center, January 1996 Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., “ Equations of state calculations by fast computing machine”, Journal of Chemical Physics, 1953, Vol. 21(6), pp.1087-1092 http://en.wikipedia.org/wiki/Equations_of_State_Calculations_by_Fast_Computing_Machines http://en.wikipedia.org/wiki/Metropolis-Hastings_alorithm Damen, D., “A Tutorial on Markov Chain Monte Carlo (MCMC)”, Maths Club, December 2nd 2008, University of Leeds Zhu, Delleard and Tu, “Markov Chain Monte Carlo for Computer Vision – a Tutorial for ICCV05” (http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm) Chib, S, Greenberg, E., “Understanding the Metropolis-Hastings Algorithm”, The American Statistician, November 1995, Vol. 49, No. 4, pp. 327-335, (http://astro.temple.edu/~msobel/courses_files/firstmetropolis.pdf)
  • #176 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 (http://www.ualberta.ca/~cdeutsh/MCS_course.htm) (http://www.ualberta.ca/~cdeutsh/MCS_course.htm) Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architecture Center, January 1996 (http://www.npac.syr.edu/users/paulc/montecarlo/p_montecarlo.html) Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., “ Equations of state calculations by fast computing machine”, Journal of Chemical Physics, 1953, Vol. 21(6), pp.1087-1092 http://en.wikipedia.org/wiki/Equations_of_State_Calculations_by_Fast_Computing_Machines http://en.wikipedia.org/wiki/Metropolis-Hastings_alorithm Damen, D., “A Tutorial on Markov Chain Monte Carlo (MCMC)”, Maths Club, December 2nd 2008, University of Leeds (http://www.comp.leeds.ac.uk/dima/MCMCTutorial.htm) Zhu, Delleard and Tu, “Markov Chain Monte Carlo for Computer Vision – a Tutorial for ICCV05” (http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm)
  • #177 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 (http://www.ualberta.ca/~cdeutsh/MCS_course.htm) Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architecture Center, January 1996 Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., “ Equations of state calculations by fast computing machine”, Journal of Chemical Physics, 1953, Vol. 21(6), pp.1087-1092 http://en.wikipedia.org/wiki/Equations_of_State_Calculations_by_Fast_Computing_Machines http://en.wikipedia.org/wiki/Metropolis-Hastings_alorithm Damen, D., “A Tutorial on Markov Chain Monte Carlo (MCMC)”, Maths Club, December 2nd 2008, University of Leeds Zhu, Delleard and Tu, “Markov Chain Monte Carlo for Computer Vision – a Tutorial for ICCV05” http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm http://www.probability.ca/hastings/
  • #178–#191 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 (http://www.ualberta.ca/~cdeutsh/MCS_course.htm); Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architecture Center, January 1996; Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., “Equation of State Calculations by Fast Computing Machines”, Journal of Chemical Physics, 1953, Vol. 21(6), pp. 1087–1092; http://en.wikipedia.org/wiki/Equations_of_State_Calculations_by_Fast_Computing_Machines; http://en.wikipedia.org/wiki/Metropolis-Hastings_alorithm; Damen, D., “A Tutorial on Markov Chain Monte Carlo (MCMC)”, Maths Club, December 2nd 2008, University of Leeds; Zhu, Dellaert and Tu, “Markov Chain Monte Carlo for Computer Vision – a Tutorial for ICCV05”, http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm; http://www.probability.ca/hastings/
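The entries above all concern the Metropolis–Hastings algorithm and MCMC. As a minimal illustration of the method those sources describe, a random-walk Metropolis sampler for a one-dimensional target can be sketched as follows (the function name and parameters are illustrative, not taken from any of the cited works):

```python
import math
import random

def metropolis_hastings(log_target, x0, proposal_std=1.0, n_samples=5000, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log-density."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces to
        # the target ratio p(x') / p(x).
        x_prop = x + rng.gauss(0.0, proposal_std)
        log_alpha = log_target(x_prop) - log_target(x)
        # Accept with probability min(1, p(x')/p(x)); otherwise stay put.
        if math.log(rng.random()) < log_alpha:
            x = x_prop
        samples.append(x)
    return samples

# Target: standard normal (unnormalized), log p(x) = -x^2 / 2.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0)
mean = sum(samples) / len(samples)
```

Only the ratio of target densities is needed, which is why the method works for unnormalized densities; this is the key point of the 1953 Metropolis et al. paper.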
  • #192 http://en.wikipedia.org/wiki/Gibbs_sampling; Grenager, T., “An Introduction to Markov Chain Monte Carlo”, July 1, 2004 (http://nlp.stanford.edu/~grenager); Damen, D., “A Tutorial on Markov Chain Monte Carlo (MCMC)”, Maths Club, December 2nd, 2008 (http://www.comp.leeds.ac.uk/dima/MCMCTutorial.htm); Sahu, S., “Tutorial Lectures on MCMC I” (http://www.soe.ucsc.edu/classes/cmps290c/winter06/paps/mcmc.pdf); University of Alberta, “Principles of Monte Carlo Simulation”, February 2001; http://www.dam.brown.edu/people/geman/index.html; http://www.cis.jhu.edu/people/faculty/geman
  • #193–#196 http://en.wikipedia.org/wiki/Gibbs_sampling; University of Alberta, “Principles of Monte Carlo Simulation”, February 2001; http://www.dam.brown.edu/people/geman/index.html; http://www.cis.jhu.edu/people/faculty/geman
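The Gibbs-sampling references above describe sampling a joint distribution by drawing from each full conditional in turn. A minimal sketch for a standard bivariate normal with correlation ρ, whose full conditionals are themselves normal (the function name is illustrative, not from the cited sources):

```python
import random

def gibbs_bivariate_normal(rho, n_samples=4000, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    cond_std = (1.0 - rho * rho) ** 0.5
    out = []
    for _ in range(n_samples):
        # Each full conditional of a bivariate normal is normal:
        # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
        x = rng.gauss(rho * y, cond_std)
        y = rng.gauss(rho * x, cond_std)
        out.append((x, y))
    return out

samples = gibbs_bivariate_normal(0.8)
```

The empirical correlation of the draws should approach ρ as the chain reaches its stationary distribution.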
  • #197–#198 Ristic, B., Arulampalam, S., Gordon, N., “Beyond the Kalman Filter – Particle Filters for Tracking Applications”, Artech House, 2004, pp. 35–36
  • #203 Di Franco & Rubin, “Radar Detection”, p. 117; Sage & Melsa, “Estimation Theory with Applications to Communications and Control”, McGraw-Hill, 1971, p. 42
  • #221–#225 http://en.wikipedia.org/wiki/Markov_chain
  • #226–#227 A. Papoulis, “Probability, Random Variables and Stochastic Processes”, McGraw-Hill, 1965, p. 350
  • #228 A. Papoulis, “Probability, Random Variables and Stochastic Processes”, McGraw-Hill, 1965, pp. 303, 350
  • #236 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #238 M. Abramowitz & I. A. Stegun, eds., “Handbook of Mathematical Functions”, Dover Publications, 1965, p. 255
  • #239 M. Abramowitz & I. A. Stegun, eds., “Handbook of Mathematical Functions”, Dover Publications, 1965, p. 261
  • #242 Zhe Chen, “Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond”, 18.05.06, Manuscript, p. 17, http://www.dsi.unifi.it/users/chisci/idfric/Nonlinear_filtering_Chen.pdf