1
Probability
SOLO HERMELIN
Updated: 6.06.09   http://www.solohermelin.com
2
SOLO
Table of Content
Probability
Set Theory
Probability Definitions
Theorem of Addition
Conditional Probability
Total Probability Theorem
Statistical Independent Events
Theorem of Multiplication
Conditional Probability - Bayes Formula
Random Variables
Probability Distribution and Probability Density Functions
Conditional Probability Distribution and
Conditional Probability Density Functions
Expected Value or Mathematical Expectation
Variance
Moments
Functions of one Random Variable
Jointly Distributed Random Variables
Characteristic Function and Moment-Generating Function
Existence Theorems (Theorem 1 & Theorem 2)
3
SOLO
Table of Content (continue - 1)
Probability
Law of Large Numbers (History)
Markov’s Inequality
Chebyshev’s Inequality
Bienaymé’s Inequality
Chernoff’s and Hoeffding’s Bounds
Chernoff’s Bound
Hoeffding’s Bound
Convergence Concepts
The Law of Large Numbers
Central Limit Theorem
Bernoulli Trials – The Binomial Distribution
Poisson Asymptotic Development (Law of Rare Events)
Normal (Gaussian) Distribution
De Moivre-Laplace Asymptotic Development
Laplacian Distribution
Gamma Distribution
Beta Distribution
Distributions
4
SOLO
Table of Content (continue - 2)
Probability
Cauchy Distribution
Exponential Distribution
Chi-square Distribution
Student’s t-Distribution
Uniform Distribution (Continuous)
Rayleigh Distribution
Rice Distribution
Weibull Distribution
Kinetic Theory of Gases
Maxwell’s Velocity Distribution
Molecular Models
Boltzmann Statistics
Bose-Einstein Statistics
Fermi-Dirac Statistics
Monte Carlo Method
Generating Continuous Random Variables
Importance Sampling
Generating Discrete Random Variables
Metropolis & Metropolis – Hastings Algorithms
Markov Chain Monte Carlo (MCMC)
Gibbs Sampling
Monte Carlo Integration
5
SOLO
Table of Content (continue - 3)
Probability
Appendices
Permutations
Combinations
References
Random Processes
Stationarity of a Random Process
Ergodicity
Markov Processes
White Noise
Markov Chains
Existence Theorems (Theorem 3)
6
SOLO Set Theory
A set A is a collection of objects (the elements of the set) ζ1, ζ2,…, ζn:
A = {ζ1, ζ2,…, ζn} – a set of n elements
A (x) = {x : |x| < 1} – the set of all numbers of magnitude smaller than 1
A (x,y) = {(x,y) : 0 < x < T, 0 < y < T} – a set of points (x,y) in a square
∅ – the empty set, which contains no elements
S – the set that contains all elements (the Set space)
Examples

The Set space of a die consists of six mutually exclusive elementary events:
{1}, {2}, {3}, {4}, {5}, {6}
7
SOLO Set Theory
Set Operations

Inclusion: A is included in B, A ⊂ B, if ∀ x ∈ A ⇒ x ∈ B
Equality: A = B ⇔ (A ⊂ B) and (B ⊂ A)
Addition (union): x ∈ A ∪ B if x ∈ A or x ∈ B
Multiplication (intersection): x ∈ A ∩ B if x ∈ A and x ∈ B
Associativity: (A ∪ B) ∪ C = A ∪ (B ∪ C)
A ∪ A = A   A ∪ ∅ = A   A ∪ S = S
A ∩ A = A   A ∩ ∅ = ∅   A ∩ S = A
Complement Aᶜ of A: A ∪ Aᶜ = S and A ∩ Aᶜ = ∅
Difference: A − B = A ∩ Bᶜ
(Venn diagrams: A ∪ B, A ∩ B, Aᶜ, A − B, B − A)
8
SOLO Set Theory
Set Operations

Incompatible Sets: A and B are incompatible iff A ∩ B = ∅

Decomposition of a Set
If A = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = ∅ ∀ i ≠ j
we say that A is decomposed into incompatible sets.
If A1 ∪ A2 ∪ … ∪ An = S and Ai ∩ Aj = ∅ ∀ i ≠ j
we say that the set space S is decomposed into exhaustive and incompatible sets.

De Morgan's Law: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ   (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
To find the complement of a set expression we interchange ∩ and ∪ and use the complements of the sets.

Augustus De Morgan
(1806 – 1871)

Another form of De Morgan's Law: (∪i Ai)ᶜ = ∩i Aiᶜ   (∩i Ai)ᶜ = ∪i Aiᶜ
Table of Content
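The set identities and De Morgan's laws above can be checked directly with Python's built-in set type; a minimal sketch (the universe S and the events A and B are arbitrary choices for the demonstration):

```python
# Universe S and two events A, B (arbitrary choices for the demonstration)
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def comp(X):
    """Complement of X relative to the universe S."""
    return S - X

# De Morgan: the complement of a union is the intersection of the complements
assert comp(A | B) == comp(A) & comp(B)
# ... and the complement of an intersection is the union of the complements
assert comp(A & B) == comp(A) | comp(B)

# A few of the elementary identities: A ∪ A = A, A ∩ S = A, A ∪ S = S
assert A | A == A and A & S == A and A | S == S
```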
9
SOLO Probability
Probability Axiomatic Definition

Pr (A) is the probability of the event A if:
(1) Pr (A) ≥ 0
(2) Pr (S) = 1
(3) If A = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = ∅ ∀ i ≠ j
    then Pr (A) = Pr (A1) + Pr (A2) + … + Pr (An)

Probability Geometric Definition

Assume that the probability of an event in a geometric region A ⊂ S is defined as the ratio of the surface of A to the surface of S:
Pr (A) = Surface (A) / Surface (S)
This definition satisfies the same three conditions:
(1) Pr (A) ≥ 0
(2) Pr (S) = 1
(3) If A = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = ∅ ∀ i ≠ j
    then Pr (A) = Pr (A1) + Pr (A2) + … + Pr (An)
10
SOLO Probability
From these definitions we can prove the following:

(1') Pr (∅) = 0
Proof: S = S ∪ ∅ and S ∩ ∅ = ∅, so by (3) Pr (S) = Pr (S) + Pr (∅) ⇒ Pr (∅) = 0

(2') Pr (Aᶜ) = 1 − Pr (A)
Proof: S = A ∪ Aᶜ and A ∩ Aᶜ = ∅, so by (2) and (3) 1 = Pr (S) = Pr (A) + Pr (Aᶜ)

(3') 0 ≤ Pr (A) ≤ 1
Proof: Pr (A) = 1 − Pr (Aᶜ) ≤ 1, by (2') and axiom (1) applied to Aᶜ; and Pr (A) ≥ 0 by (1)

(4') If A ⊂ B ⇒ Pr (A) ≤ Pr (B)
Proof: B = A ∪ (B − A) and A ∩ (B − A) = ∅, so by (3)
Pr (B) = Pr (A) + Pr (B − A) ≥ Pr (A) since Pr (B − A) ≥ 0

(5') Pr (A ∪ B) = Pr (A) + Pr (B) − Pr (A ∩ B)
Proof: A ∪ B = A ∪ (B − A ∩ B) with A ∩ (B − A ∩ B) = ∅
and B = (A ∩ B) ∪ (B − A ∩ B) with (A ∩ B) ∩ (B − A ∩ B) = ∅, so by (3)
Pr (A ∪ B) = Pr (A) + Pr (B − A ∩ B)
Pr (B) = Pr (A ∩ B) + Pr (B − A ∩ B)
Subtracting the second equation from the first:
Pr (A ∪ B) = Pr (A) + Pr (B) − Pr (A ∩ B)
Table of Content
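Property (5') can be verified by exact counting on a fair die; a minimal sketch (the events A and B are arbitrary choices):

```python
from fractions import Fraction

# Fair-die sample space; every outcome has probability 1/6
S = set(range(1, 7))

def P(E):
    """Exact probability of event E under equally likely outcomes."""
    return Fraction(len(E & S), len(S))

A = {2, 4, 6}        # "even roll"
B = {4, 5, 6}        # "roll greater than 3"

# (5'): Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B) == Fraction(2, 3)
```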
11
SOLO Probability
Theorem of Addition

(6') Pr (A1 ∪ A2 ∪ … ∪ An) = Σi Pr (Ai) − Σi<j Pr (Ai ∩ Aj) + Σi<j<k Pr (Ai ∩ Aj ∩ Ak) − … + (−1)^(n−1) Pr (A1 ∩ A2 ∩ … ∩ An)

Proof by induction:
For n = 2 we found in (5') that Pr (A1 ∪ A2) = Pr (A1) + Pr (A2) − Pr (A1 ∩ A2), which satisfies the equation.
Assume the equation is true for n − 1:
Pr (A1 ∪ … ∪ A(n−1)) = Σi Pr (Ai) − Σi<j Pr (Ai ∩ Aj) + … + (−1)^(n−2) Pr (A1 ∩ … ∩ A(n−1)),   all indices ≤ n − 1

Let us calculate for n. By (5'), with the union of the first n − 1 events playing the role of one event:
Pr (A1 ∪ … ∪ An) = Pr (A1 ∪ … ∪ A(n−1)) + Pr (An) − Pr [(A1 ∪ … ∪ A(n−1)) ∩ An]
but
(A1 ∪ … ∪ A(n−1)) ∩ An = (A1 ∩ An) ∪ (A2 ∩ An) ∪ … ∪ (A(n−1) ∩ An)
so the induction hypothesis can also be applied to the last term, taken over the n − 1 events Ai ∩ An:
Pr [(A1 ∩ An) ∪ … ∪ (A(n−1) ∩ An)] = Σi Pr (Ai ∩ An) − Σi<j Pr (Ai ∩ Aj ∩ An) + … + (−1)^(n−2) Pr (A1 ∩ … ∩ An),   indices ≤ n − 1

Collecting terms: the single-event sums combine to Σi≤n Pr (Ai); the pair sums over i < j ≤ n − 1 combine with the terms −Pr (Ai ∩ An) to give −Σi<j≤n Pr (Ai ∩ Aj); and so on for the higher-order intersections. Use the fact that the number of terms of each order is consistent because of the binomial identity
C(n−1, k) + C(n−1, k−1) = C(n, k),   C(n, k) := n! / [k! (n−k)!]
to obtain
Pr (A1 ∪ A2 ∪ … ∪ An) = Σi Pr (Ai) − Σi<j Pr (Ai ∩ Aj) + Σi<j<k Pr (Ai ∩ Aj ∩ Ak) − … + (−1)^(n−1) Pr (A1 ∩ A2 ∩ … ∩ An)
q.e.d.
Table of Content
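The inclusion–exclusion formula (6') can be checked numerically against a direct union count; a small sketch over an arbitrarily chosen finite sample space of equally likely outcomes:

```python
from fractions import Fraction
from itertools import combinations

# An arbitrary sample space of 12 equally likely outcomes
S = set(range(1, 13))

def P(E):
    return Fraction(len(E), len(S))

events = [{1, 2, 3, 4}, {3, 4, 5, 6, 7}, {6, 7, 8, 9}, {2, 4, 8, 10}]
n = len(events)

# Left-hand side: probability of the union, counted directly
lhs = P(set().union(*events))

# Right-hand side of (6'): alternating sum over all non-empty index subsets
rhs = Fraction(0)
for r in range(1, n + 1):
    for idx in combinations(range(n), r):
        inter = set.intersection(*(events[i] for i in idx))
        rhs += (-1) ** (r - 1) * P(inter)

assert lhs == rhs == Fraction(10, 12)
```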
13
SOLO Probability
Conditional Probability

Given two events A and B decomposed into elementary events:
A = Aα1 ∪ Aα2 ∪ … ∪ Aαn,   Aαi ∩ Aαj = ∅ ∀ i ≠ j
B = Aβ1 ∪ Aβ2 ∪ … ∪ Aβm,   Aβk ∩ Aβl = ∅ ∀ k ≠ l
A ∩ B = Aαβ1 ∪ Aαβ2 ∪ … ∪ Aαβr,   Aαβi ∩ Aαβj = ∅ ∀ i ≠ j,   r ≤ m, n
Pr (A) = Pr (Aα1) + Pr (Aα2) + … + Pr (Aαn)
Pr (B) = Pr (Aβ1) + Pr (Aβ2) + … + Pr (Aβm)
Pr (A ∩ B) = Pr (Aαβ1) + Pr (Aαβ2) + … + Pr (Aαβr)

We want to find the probability of the event A under the condition that the event B has occurred, designated as Pr (A|B):
Pr (A|B) = [Pr (Aαβ1) + Pr (Aαβ2) + … + Pr (Aαβr)] / [Pr (Aβ1) + Pr (Aβ2) + … + Pr (Aβm)] = Pr (A ∩ B) / Pr (B)

If the events A and B are statistically independent, the fact that B occurred does not affect the probability of A occurring:
Pr (A|B) = Pr (A ∩ B) / Pr (B)   and   Pr (B|A) = Pr (A ∩ B) / Pr (A)
Pr (A|B) = Pr (A)   ⇒   Pr (A ∩ B) = Pr (A|B) · Pr (B) = Pr (B|A) · Pr (A) = Pr (A) · Pr (B)

Definition:
n events Ai, i = 1, 2, …, n, are statistically independent if
Pr (Ai1 ∩ Ai2 ∩ … ∩ Air) = Pr (Ai1) · Pr (Ai2) ⋯ Pr (Air)   for every subset of r indices, ∀ r = 2, …, n
Table of Content
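Both the definition Pr (A|B) = Pr (A ∩ B)/Pr (B) and the product test for independence can be verified by exact counting on a fair die; a small sketch with arbitrarily chosen events:

```python
from fractions import Fraction

S = set(range(1, 7))                      # fair die

def P(E):
    return Fraction(len(E & S), len(S))

B = {2, 4, 6}                             # "even roll"
A = {6}                                   # "roll a six"

# Pr(A|B) = Pr(A ∩ B) / Pr(B) = (1/6) / (1/2) = 1/3
assert P(A & B) / P(B) == Fraction(1, 3)

# A and B are NOT independent: Pr(A ∩ B) = 1/6 ≠ 1/12 = Pr(A)·Pr(B)
assert P(A & B) != P(A) * P(B)

# C = {1, 2} IS independent of B: Pr(C ∩ B) = 1/6 = (1/3)(1/2) = Pr(C)·Pr(B)
C = {1, 2}
assert P(C & B) == P(C) * P(B)
```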
15
SOLO Probability
Conditional Probability - Bayes Formula

Let the events Aβ1, Aβ2, …, Aβm decompose the set space into exhaustive and incompatible sets, and let B be any event. Using the relation:
Pr (B ∩ Aβl) = Pr (B|Aβl) · Pr (Aβl) = Pr (Aβl|B) · Pr (B)
and, since
B = (B ∩ Aβ1) ∪ (B ∩ Aβ2) ∪ … ∪ (B ∩ Aβm),   (B ∩ Aβk) ∩ (B ∩ Aβl) = ∅ ∀ k ≠ l
Pr (B) = Σk Pr (B ∩ Aβk) = Σk Pr (B|Aβk) · Pr (Aβk),   k = 1, …, m
we obtain Bayes Formula:
Pr (Aβl|B) = Pr (B|Aβl) · Pr (Aβl) / Pr (B) = Pr (B|Aβl) · Pr (Aβl) / Σk Pr (B|Aβk) · Pr (Aβk)

Thomas Bayes
(1702 – 1761)

Table of Content
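Bayes formula can be illustrated with a hypothetical two-urn experiment (the urn contents and the uniform prior are invented for the example):

```python
from fractions import Fraction

# Hypothetical setup: urn 1 holds 3 white / 1 black balls, urn 2 holds
# 1 white / 3 black; an urn is chosen at random, then one ball is drawn.
prior   = {1: Fraction(1, 2), 2: Fraction(1, 2)}     # Pr(urn)
p_white = {1: Fraction(3, 4), 2: Fraction(1, 4)}     # Pr(white | urn)

# Denominator of Bayes formula: Pr(white) = Σk Pr(white | urn k)·Pr(urn k)
p_w = sum(p_white[u] * prior[u] for u in prior)
assert p_w == Fraction(1, 2)

# Bayes: Pr(urn 1 | white) = Pr(white | urn 1)·Pr(urn 1) / Pr(white)
posterior_1 = p_white[1] * prior[1] / p_w
assert posterior_1 == Fraction(3, 4)
```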
16
SOLO Probability
Total Probability Theorem

If A1 ∪ A2 ∪ … ∪ An = S and Ai ∩ Aj = ∅ ∀ i ≠ j,
we say that the set space S is decomposed into exhaustive and incompatible (exclusive) sets.

The Total Probability Theorem states that for any event B, its probability can be decomposed in terms of conditional probabilities as follows:
Pr (B) = Σi Pr (B, Ai) = Σi Pr (B|Ai) · Pr (Ai),   i = 1, …, n

Proof: Using the relation
Pr (B ∩ Al) = Pr (B|Al) · Pr (Al) = Pr (Al|B) · Pr (B)
and, for any event B,
B = (B ∩ A1) ∪ (B ∩ A2) ∪ … ∪ (B ∩ An),   (B ∩ Ak) ∩ (B ∩ Al) = ∅ ∀ k ≠ l
we obtain:
Pr (B) = Σk Pr (B ∩ Ak) = Σk Pr (B|Ak) · Pr (Ak),   k = 1, …, n

Table of Content
17
SOLO Probability
Statistical Independent Events

From the Theorem of Addition:
Pr (A1 ∪ A2 ∪ … ∪ An) = Σi Pr (Ai) − Σi<j Pr (Ai ∩ Aj) + Σi<j<k Pr (Ai ∩ Aj ∩ Ak) − … + (−1)^(n−1) Pr (A1 ∩ … ∩ An)
If the Ai are statistically independent, Pr (Ai1 ∩ … ∩ Air) = Pr (Ai1) ⋯ Pr (Air) ∀ r = 2, …, n, so
Pr (A1 ∪ A2 ∪ … ∪ An) = Σi Pr (Ai) − Σi<j Pr (Ai) Pr (Aj) + Σi<j<k Pr (Ai) Pr (Aj) Pr (Ak) − … + (−1)^(n−1) Pr (A1) ⋯ Pr (An)
Therefore
1 − Pr (A1 ∪ A2 ∪ … ∪ An) = Πi [1 − Pr (Ai)]
or
Pr (A1 ∪ A2 ∪ … ∪ An) = 1 − Πi [1 − Pr (Ai)]

Since (∪i Ai) ∪ (∪i Ai)ᶜ = S and (∪i Ai) ∩ (∪i Ai)ᶜ = ∅:
Pr [(∪i Ai)ᶜ] = 1 − Pr (∪i Ai) = Πi [1 − Pr (Ai)]
By the De Morgan law, (∪i Ai)ᶜ = ∩i Aiᶜ, and 1 − Pr (Ai) = Pr (Aiᶜ), so
Pr (∩i Aiᶜ) = Πi Pr (Aiᶜ)
If the n events Ai, i = 1, 2, …, n, are statistically independent, then the complements Aiᶜ are also statistically independent.
Table of Content
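The identity Pr (∪ Ai) = 1 − Π [1 − Pr (Ai)] for independent events can be checked exactly by enumerating all hit/miss patterns; a small sketch with three arbitrarily chosen probabilities:

```python
from fractions import Fraction
from itertools import product

# Three independent events with arbitrary probabilities
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]

# Exact Pr(at least one occurs): sum the weights of all outcome patterns
# (occur / not occur for each event) that contain at least one occurrence.
p_union = Fraction(0)
for pattern in product((0, 1), repeat=len(p)):
    if any(pattern):
        w = Fraction(1)
        for pi, hit in zip(p, pattern):
            w *= pi if hit else 1 - pi
        p_union += w

# Formula: Pr(∪ Ai) = 1 − Π (1 − Pr(Ai))
complement_product = Fraction(1)
for pi in p:
    complement_product *= 1 - pi

assert p_union == 1 - complement_product == Fraction(3, 4)
```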
18
SOLO Probability
Theorem of Multiplication

Pr (A1 A2 … An) = Pr (A1) · Pr (A2|A1) · Pr (A3|A1 A2) ⋯ Pr (An|A1 A2 … A(n−1))

Proof
Start from Pr (A ∩ B) = Pr (A) · Pr (B|A):
Pr (A1 A2 … An) = Pr (A1 A2 … A(n−1)) · Pr (An|A1 … A(n−1))
In the same way
Pr (A1 A2 … A(n−1)) = Pr (A1 … A(n−2)) · Pr (A(n−1)|A1 … A(n−2))
⋮
Pr (A1 A2) = Pr (A1) · Pr (A2|A1)
From those results we obtain:
Pr (A1 A2 … An) = Pr (A1) · Pr (A2|A1) · Pr (A3|A1 A2) ⋯ Pr (An|A1 A2 … A(n−1))
q.e.d.
Table of Content
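The Theorem of Multiplication is the usual way to compute sampling-without-replacement probabilities; a small sketch (drawing three aces in a row from a standard 52-card deck) that cross-checks the chain of conditional probabilities against a direct permutation count:

```python
from fractions import Fraction
from math import perm

# Chain rule: Pr(A1 A2 A3) = Pr(A1)·Pr(A2|A1)·Pr(A3|A1 A2)
# where Ak = "the k-th card drawn is an ace", no replacement.
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Direct count: ordered ace triples over all ordered 3-card draws
direct = Fraction(perm(4, 3), perm(52, 3))

assert chain == direct == Fraction(1, 5525)
```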
19
SOLO Review of Probability
Random Variables
Let us ascribe to each outcome or event a real number, so that we have a one-to-one correspondence between the real numbers and the Space of Events. Any function that assigns a real number to each event in the Space of Events is called a random variable (a random function would be more accurate, but this is the accepted terminology).
The random variables can be:
- Discrete random variables for discrete events
- Continuous random variables for continuous events
Table of Content
20
SOLO Review of Probability
Probability Distribution and Probability Density Functions

The random variables map the space of events X to the space of real numbers x.

The Probability Distribution Function, or Cumulative Probability Distribution Function, of x can be defined as:
PX (x) := Pr (X ≤ x),   −∞ ≤ x ≤ ∞

The Probability Distribution Function has the following properties:
(1) PX (−∞) = 0
(2) PX (+∞) = 1
(3) PX (x) is a monotonic increasing function: PX (x1) ≤ PX (x2) ⇔ x1 ≤ x2

The Probability that X lies in the interval (a,b] is given by:
Pr (a < X ≤ b) = PX (b) − PX (a) ≥ 0

If PX (x) is a continuous differentiable function of x we can define
pX (x) := lim(Δx→0) Pr (x < X ≤ x + Δx) / Δx = lim(Δx→0) [PX (x + Δx) − PX (x)] / Δx = d PX (x) / d x ≥ 0
the Probability Density Function of x.
21
SOLO Review of Probability
Probability Distribution and Probability Density Functions (continue – 1)

The Probability Distribution and Probability Density Functions of x can also be defined for discrete random variables.

Example
Set space of a die: six equally probable events {x=1}, {x=2}, {x=3}, {x=4}, {x=5}, {x=6}
pX (x) := (1/6) Σi δ (x − i),   i = 1, …, 6
where δ (x) is the Dirac delta function:
δ (x) = 0 for x ≠ 0,   δ (x) = ∞ for x = 0,   ∫ δ (x) d x = 1 over (−∞, +∞)
The distribution function is a staircase; for integer k:
PX (k) = Pr (X ≤ k) = ∫ pX (x) d x = 0 for k < 1,   k/6 for 1 ≤ k < 6,   1 for k ≥ 6
so PX takes the values 1/6, 1/3, 1/2, 2/3, 5/6, 1 at k = 1, 2, 3, 4, 5, 6.
22
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)

(1) Binomial (Bernoulli):
p (k, n) = n! / [k! (n−k)!] · p^k (1−p)^(n−k) = C(n, k) p^k (1−p)^(n−k)

(2) Poisson's Distribution:
p (k) = (k0^k / k!) exp (−k0),   k a nonnegative integer

(3) Normal (Gaussian):
p (x; μ, σ) = exp [−(x−μ)² / (2σ²)] / (σ √(2π))

(4) Laplacian Distribution:
p (x; μ, b) = (1/2b) exp (−|x − μ| / b)
23
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(5) Gamma Distribution:
p (x; k, θ) = x^(k−1) exp (−x/θ) / [Γ(k) θ^k] for x ≥ 0,   0 for x < 0

(6) Beta Distribution:
p (x; α, β) = x^(α−1) (1−x)^(β−1) / ∫ u^(α−1) (1−u)^(β−1) d u over (0,1) = [Γ(α+β) / (Γ(α) Γ(β))] x^(α−1) (1−x)^(β−1)

(7) Cauchy Distribution:
p (x; x0, γ) = (1/π) · γ / [(x − x0)² + γ²]
24
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)

(8) Exponential Distribution:
p (x; λ) = λ exp (−λ x) for x ≥ 0,   0 for x < 0

(9) Chi-square Distribution:
p (x; k) = (1 / [2^(k/2) Γ(k/2)]) x^(k/2−1) exp (−x/2) for x ≥ 0,   0 for x < 0
Γ is the gamma function: Γ (a) = ∫ t^(a−1) exp (−t) d t over (0, ∞)

(10) Student's t-Distribution:
p (x; ν) = [Γ((ν+1)/2) / (√(νπ) Γ(ν/2))] · (1 + x²/ν)^(−(ν+1)/2)
25
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)

(11) Uniform Distribution (Continuous):
p (x; a, b) = 1/(b − a) for a ≤ x ≤ b,   0 for x < a or x > b

(12) Rayleigh Distribution:
p (x; σ) = (x/σ²) exp (−x²/2σ²)

(13) Rice Distribution:
p (x; v, σ) = (x/σ²) exp [−(x² + v²)/2σ²] I0 (x v/σ²)
26
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)

(14) Weibull Distribution:
p (x; γ, μ, α) = (γ/α) [(x − μ)/α]^(γ−1) exp {−[(x − μ)/α]^γ} for x ≥ μ,   0 for x < μ,   with α, γ > 0

Table of Content
27
SOLO Review of Probability
Conditional Probability Distribution and Conditional Probability Density Functions

The random variables map the space of events X to the space of real numbers x.

The Conditional Probability Distribution Function, or Cumulative Conditional Probability Distribution Function, of x given y ∈ Y is defined as:
PX/Y (x/y) := Pr (X ≤ x / y ∈ Y),   −∞ < x < ∞

(1) PX/Y (−∞/y) = 0
(2) PX/Y (+∞/y) = 1
(3) PX/Y (x/y) is a monotonic increasing function: PX/Y (x1/y) ≤ PX/Y (x2/y) ⇔ x1 ≤ x2

The Probability that X lies in the interval (a,b] given y ∈ Y is given by:
Pr (a < X ≤ b / y) = PX/Y (b/y) − PX/Y (a/y) ≥ 0

If PX/Y (x/y) is a continuous differentiable function of x we can define
pX/Y (x/y) := lim(Δx→0) Pr (x < X ≤ x + Δx / y) / Δx = d PX/Y (x/y) / d x ≥ 0
the Conditional Probability Density Function of x.

Example 1
Given PX (x) and pX (x), find PX/Y (x / x ≤ a) and pX/Y (x / x ≤ a):
PX/Y (x / x ≤ a) = 1 for x > a,   PX (x) / PX (a) for x ≤ a
pX/Y (x / x ≤ a) = 0 for x > a,   pX (x) / PX (a) for x ≤ a

Example 2
Given PX (x) and pX (x), find PX/Y (x / b < x ≤ a) and pX/Y (x / b < x ≤ a):
PX/Y (x / b < x ≤ a) = 0 for x < b,   [PX (x) − PX (b)] / [PX (a) − PX (b)] for b ≤ x < a,   1 for x ≥ a
pX/Y (x / b < x ≤ a) = 0 for x < b or x ≥ a,   pX (x) / [PX (a) − PX (b)] for b ≤ x < a
Table of Content
29
SOLO Review of Probability
Expected Value or Mathematical Expectation

Given a Probability Density Function p (x) we define the Expected Value:

For a Continuous Random Variable: E (x) := ∫ x pX (x) d x over (−∞, +∞)

For a Discrete Random Variable: E (x) := Σk xk pX (xk)

For a general function g (x) of the Random Variable x: E [g (x)] := ∫ g (x) pX (x) d x over (−∞, +∞)

Since ∫ pX (x) d x = 1, we can also write
E (x) = ∫ x pX (x) d x / ∫ pX (x) d x
The Expected Value is the centroid of the surface enclosed between the Probability Density Function and the x axis.
Table of Content
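The defining integral E (x) = ∫ x p (x) d x can be approximated by a Riemann sum; a minimal sketch for an exponential density, whose mean is known in closed form to be 1/λ (step size and truncation point are arbitrary numerical choices):

```python
from math import exp

# Riemann-sum approximation of E(x) = ∫ x·p(x) dx for the exponential
# density p(x) = lam·exp(−lam·x), x ≥ 0, whose exact mean is 1/lam.
lam = 2.0
dx = 1e-4
mean = sum(i * dx * lam * exp(-lam * i * dx) * dx
           for i in range(int(20 / dx)))     # truncate the negligible tail at x = 20

assert abs(mean - 1 / lam) < 1e-3
```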
30
SOLO Review of Probability
Variance

Given a Probability Density Function p (x) we define the Variance:
Var (x) := E {[x − E (x)]²} = E {x² − 2 x E (x) + [E (x)]²} = E (x²) − [E (x)]²

Given a Probability Density Function p (x) we define the Moment of order k about the origin:
μ'k (x) := E {x^k}

and the Central Moment of order k about the Mean E (x):
μk := E {[x − E (x)]^k} = Σj C(k, j) (−1)^(k−j) [E (x)]^(k−j) μ'j,   j = 0, …, k
Table of Content
31
SOLO Review of Probability
Moments

Normal Distribution: pX (x; σ) = exp [−x²/(2σ²)] / (σ √(2π))

E [x^n] = 1·3 ⋯ (n−1) σ^n for n even,   0 for n odd
that is, for n = 2k:
E [x^n] = 1·3 ⋯ (2k−1) σ^(2k) = [(2k)! / (2^k k!)] σ^(2k)

Proof:
Start from ∫ exp (−a x²) d x = √(π/a) over (−∞, ∞),   a > 0
and differentiate k times with respect to a:
∫ x^(2k) exp (−a x²) d x = [1·3 ⋯ (2k−1) / (2^k a^k)] √(π/a),   a > 0
Substitute a = 1/(2σ²) to obtain E [x^(2k)]:
E [x^(2k)] = [1/(σ√(2π))] ∫ x^(2k) exp [−x²/(2σ²)] d x = 1·3 ⋯ (2k−1) σ^(2k) = [(2k)! / (2^k k!)] σ^(2k)
The odd moments vanish because x^(2k+1) exp [−x²/(2σ²)] is an odd function of x.

Now let us compute:
E [x⁴] = 3 σ⁴ = 3 (E [x²])²
Chi-square
32
SOLO Review of Probability
Moments

Gamma Distribution: p (x; k, θ) = x^(k−1) exp (−x/θ) / [Γ(k) θ^k] for x ≥ 0,   0 for x < 0

E [x^n] = [1 / (Γ(k) θ^k)] ∫ x^(n+k−1) exp (−x/θ) d x over (0, ∞) = [θ^n / Γ(k)] ∫ (x/θ)^(n+k−1) exp (−x/θ) d (x/θ) = θ^n Γ(k+n) / Γ(k)

Γ is the gamma function: Γ (a) = ∫ t^(a−1) exp (−t) d t over (0, ∞)

Beta Distribution: p (x; α, β) = x^(α−1) (1−x)^(β−1) / ∫ u^(α−1) (1−u)^(β−1) d u over (0,1) = [Γ(α+β) / (Γ(α) Γ(β))] x^(α−1) (1−x)^(β−1)
33
SOLO Review of Probability
Moments

Uniform Distribution (Continuous) on (−c, c): p (x; −c, c) = 1/(2c) for −c ≤ x ≤ c,   0 for x < −c or x > c

E [x^n] = (1/2c) ∫ x^n d x over (−c, +c) = (1/2c) x^(n+1)/(n+1) evaluated from −c to +c = c^n/(n+1) for n even,   0 for n odd

Rayleigh Distribution: p (x; σ) = (x/σ²) exp (−x²/2σ²)

E [x^n] = ∫ x^n (x/σ²) exp (−x²/2σ²) d x over (0, ∞) = (1/σ²) ∫ x^(n+1) exp (−x²/2σ²) d x
  = 1·3 ⋯ n · σ^n √(π/2) for n = 2k+1 odd,   2^k k! σ^(2k) for n = 2k even
34
SOLO Review of Probability
Example

Repeat an experiment m times to obtain X1, X2,…, Xm.
Define:
Statistical Estimation (sample mean): X̄m = (X1 + X2 + … + Xm) / m
Sample Variation: Vm = [(X1 − X̄m)² + (X2 − X̄m)² + … + (Xm − X̄m)²] / m

Assume E (Xi) = μ and E [(Xi − μ)²] = σ².
Since the experiments are uncorrelated: E [(Xi − μ)(Xj − μ)] = 0 ∀ i ≠ j

E (X̄m) = [E (X1) + E (X2) + … + E (Xm)] / m = μ

σ²(X̄m) = Var (X̄m) = E {[X̄m − E (X̄m)]²} = E {[(X1 − μ) + (X2 − μ) + … + (Xm − μ)]²} / m² = m σ² / m² = σ² / m

Let us compute E (Vm).
E [(Xi − X̄m)²] = E {[(Xi − μ) − (X̄m − μ)]²} = E [(Xi − μ)²] − 2 E [(Xi − μ)(X̄m − μ)] + E [(X̄m − μ)²]
where
E [(Xi − μ)(X̄m − μ)] = E {(Xi − μ) [(X1 − μ) + … + (Xi − μ) + … + (Xm − μ)]} / m = E [(Xi − μ)²] / m = σ² / m
since E [(Xi − μ)(Xj − μ)] = 0 ∀ i ≠ j, and E [(X̄m − μ)²] = σ² / m.
Therefore:
E [(Xi − X̄m)²] = σ² − 2 σ²/m + σ²/m = [(m − 1)/m] σ²
E (Vm) = {E [(X1 − X̄m)²] + E [(X2 − X̄m)²] + … + E [(Xm − X̄m)²]} / m = [(m − 1)/m] σ²
Table of Content
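The bias E (Vm) = [(m − 1)/m] σ² can be seen in simulation; a small sketch averaging the sample variation over many repeated experiments (the sample size, trial count, and Gaussian parameters are arbitrary choices):

```python
import random

random.seed(0)
mu, sigma, m, trials = 0.0, 2.0, 5, 20000

# Average the sample variation V_m = Σ(Xi − X̄m)²/m over many experiments
acc = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(m)]
    xbar = sum(xs) / m
    acc += sum((x - xbar) ** 2 for x in xs) / m
EV = acc / trials

# Expect E(V_m) = ((m − 1)/m)·σ² = 3.2 here, noticeably below σ² = 4
assert abs(EV - (m - 1) / m * sigma ** 2) < 0.1
assert EV < sigma ** 2
```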
36
SOLO Review of Probability
Functions of one Random Variable

Let y = g (x) be a given function of the random variable x, defined on the domain Ω, with probability density pX (x). We want to find pY (y).

Fundamental Theorem
Let x1, x2, …, xn be all the solutions of the equation
y = g (x1) = g (x2) = … = g (xn)
Then
pY (y) = pX (x1)/|g' (x1)| + pX (x2)/|g' (x2)| + … + pX (xn)/|g' (xn)|,   g' (x) := d g (x) / d x

Proof
pY (y) d y := Pr (y < Y ≤ y + d y) = Σi Pr (xi < X ≤ xi ± d xi) = Σi pX (xi) |d xi| = Σi [pX (xi)/|g' (xi)|] d y
q.e.d.

Example 1:  y = a x + b   ⇒   pY (y) = (1/|a|) pX ((y − b)/a)

Example 2:  y = a/x   ⇒   pY (y) = (a/y²) pX (a/y)

Example 3:  y = a x²   ⇒   pY (y) = [pX (√(y/a)) + pX (−√(y/a))] / (2 √(a y)) · U (y)

Example 4:  y = |x|   ⇒   pY (y) = [pX (y) + pX (−y)] U (y)
Table of Content
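The Fundamental Theorem can be spot-checked by simulation; a minimal sketch for Example 3 with a = 1 and X uniform on (0,1): the theorem gives pY (y) = 1/(2√y) on (0,1), hence Pr (Y ≤ y) = √y.

```python
import random

random.seed(1)
N = 200000
# X uniform on (0,1); Y = X² has density p_Y(y) = 1/(2√y) on (0,1),
# so the distribution function is Pr(Y ≤ y) = √y.
ys = [random.random() ** 2 for _ in range(N)]

y0 = 0.25
empirical = sum(1 for y in ys if y <= y0) / N
assert abs(empirical - y0 ** 0.5) < 0.01      # √0.25 = 0.5
```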
38
SOLO Review of Probability
Jointly Distributed Random Variables

We are interested in functions of several variables.

The Jointly Cumulative Probability Distribution of the random variables X1, X2, …, Xn is defined as:
P(X1 X2 … Xn) (x1, x2, …, xn) := Pr (X1 ≤ x1, X2 ≤ x2, …, Xn ≤ xn)

The Cumulative Probability Distribution of the random variable Xi can be obtained from P(X1 X2 … Xn) (x1, x2, …, xn):
P(Xi) (xi) = Pr (X1 ≤ ∞, …, Xi ≤ xi, …, Xn ≤ ∞) = P(X1 X2 … Xn) (∞, …, xi, …, ∞)

If the Jointly Cumulative Probability Distribution is continuous and differentiable in each of the components then we can define the Joint Probability Density Function as:
p(X1 X2 … Xn) (x1, x2, …, xn) := ∂ⁿ P(X1 X2 … Xn) (x1, x2, …, xn) / (∂x1 ∂x2 ⋯ ∂xn)
p(Xi) (xi) = ∫ ⋯ ∫ p(X1 X2 … Xn) (x1, …, xn) d x1 ⋯ d xk ⋯ d xn over (−∞, ∞),   k ≠ i

We define:
E [g (x1, x2, …, xn)] := ∫ ⋯ ∫ g (x1, …, xn) p(X1 X2 … Xn) (x1, …, xn) d x1 ⋯ d xn

Example: Given the Sum of m Variables
Sm := X1 + X2 + … + Xm = Σi Xi,   i = 1, …, m

E [Sm] = ∫ ⋯ ∫ (x1 + … + xm) p(X1 … Xm) (x1, …, xm) d x1 ⋯ d xm = Σi E (xi)

Var (Sm) = E {[Sm − E (Sm)]²} = E {[Σi (Xi − E (Xi))]²}
  = Σi E {[Xi − E (Xi)]²} + Σi Σj≠i E {[Xi − E (Xi)][Xj − E (Xj)]}
  = Σi Var (Xi) + Σi Σj≠i Cov (Xi, Xj)
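For uncorrelated variables the covariance terms vanish and the variance of the sum is the sum of the variances; a small exact check with two independent fair dice:

```python
from fractions import Fraction
from itertools import product

faces = [Fraction(f) for f in range(1, 7)]        # one fair die

def mean(vals):
    return sum(vals, Fraction(0)) / len(vals)

def var(vals):
    m = mean(vals)
    return mean([(v - m) ** 2 for v in vals])

var_X = var(faces)                                      # 35/12 for one die
var_S = var([a + b for a, b in product(faces, faces)])  # X + Y over all 36 pairs

# Independent (hence uncorrelated) dice: Var(X + Y) = Var(X) + Var(Y)
assert var_S == 2 * var_X == Fraction(35, 6)
```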
40
SOLO Review of Probability
Jointly Distributed Random Variables (continue – 2)

Given the joint density function p(X1 X2 … Xn) (x1, x2, …, xn) of n random variables X1, X2, …, Xn, we want to find the joint density function of n random variables Y1, Y2, …, Yn that are related to X1, X2, …, Xn through
Y1 = g1 (X1, X2, …, Xn)
Y2 = g2 (X1, X2, …, Xn)
⋮
Yn = gn (X1, X2, …, Xn)

The differentials are related through the matrix of partial derivatives:
[d Y1, d Y2, …, d Yn]ᵀ = [∂gi/∂Xj] [d X1, d X2, …, d Xn]ᵀ
Assuming that the Jacobian
J (X1, X2, …, Xn) := det [∂gi/∂Xj],   i, j = 1, …, n
is nonzero for each (X1, X2, …, Xn), there exists a unique solution (Y1, Y2, …, Yn).

Assume that for a given (Y1, Y2, …, Yn) we can find k solutions (X1, X2, …, Xn)1, …, (X1, X2, …, Xn)k. Then
p(Y1 … Yn) (y1, …, yn) d y1 ⋯ d yn := Pr (y1 < Y1 ≤ y1 + d y1, …, yn < Yn ≤ yn + d yn)
  = Σi Pr (x1 < X1 ≤ x1 ± d x1, …, xn < Xn ≤ xn ± d xn), over the i = 1, …, k solutions
  = Σi p(X1 … Xn) (x1, …, xn)i |d x1 ⋯ d xn|i
  = Σi [p(X1 … Xn) (x1, …, xn) / |J (x1, …, xn)|]i d y1 ⋯ d yn
Therefore
p(Y1 … Yn) (y1, …, yn) = Σi [p(X1 … Xn) (x1, …, xn) / |J (x1, …, xn)|]i,   i = 1, …, k

The relation between the differential volume in (Y1, Y2, …, Yn) and the differential volume in (X1, X2, …, Xn) is given by
d y1 ⋯ d yn = |J (x1, …, xn)| d x1 ⋯ d xn
42
SOLO Review of Probability
Jointly Distributed Random Variables (continue – 4)

Example 1
X and Y are independent gamma random variables with parameters (α, θ) and (β, θ), respectively. Compute the joint densities of U = X + Y and V = X / (X + Y).
p(X,Y) (x, y) = [x^(α−1) exp (−x/θ) / (Γ(α) θ^α)] · [y^(β−1) exp (−y/θ) / (Γ(β) θ^β)]
  = exp [−(x+y)/θ] x^(α−1) y^(β−1) / [Γ(α) Γ(β) θ^(α+β)],   x, y ≥ 0

U = g1 (X, Y) = X + Y,   V = g2 (X, Y) = X/(X + Y)   ⇔   X = U V,   Y = U (1 − V)

J (X, Y) = det [1, 1; Y/(X+Y)², −X/(X+Y)²] = −1/(X+Y) = −1/U

p(U,V) (u, v) = p(X,Y) (x, y) / |J (x, y)| = p(X,Y) (u v, u (1−v)) · u
  = exp (−u/θ) (u v)^(α−1) [u (1−v)]^(β−1) u / [Γ(α) Γ(β) θ^(α+β)]
  = {exp (−u/θ) u^(α+β−1) / [Γ(α+β) θ^(α+β)]} · {[Γ(α+β) / (Γ(α) Γ(β))] v^(α−1) (1−v)^(β−1)}
Therefore
pU (u) = exp (−u/θ) u^(α+β−1) / [Γ(α+β) θ^(α+β)]   (gamma distribution)
pV (v) = [Γ(α+β) / (Γ(α) Γ(β))] v^(α−1) (1−v)^(β−1)   (beta distribution)
Table of Content
43
SOLO Review of Probability
Characteristic Function and Moment-Generating Function

Given a Probability Density Function pX (x) we define the Characteristic Function or Moment-Generating Function:
ΦX (ω) := E [exp (j ω x)] = ∫ exp (j ω x) pX (x) d x over (−∞, +∞)   (x continuous)
ΦX (ω) := Σx exp (j ω x) pX (x)   (x discrete)

This is in fact the complex conjugate of the Fourier Transform of the Probability Density Function. This function is always defined, since the condition for the existence of a Fourier Transform,
∫ |pX (x)| d x = ∫ pX (x) d x = 1 < ∞   (because pX (x) ≥ 0)
is always fulfilled.

Given the Characteristic Function we can find the Probability Density Function pX (x) using the Inverse Fourier Transform:
pX (x) = (1/2π) ∫ ΦX (ω) exp (−j ω x) d ω over (−∞, +∞)
44
SOLO Review of Probability
Properties of the Moment-Generating Function

ΦX (ω) = ∫ exp (j ω x) pX (x) d x over (−∞, +∞)

ΦX (ω)|ω=0 = ∫ pX (x) d x = 1

d ΦX (ω)/d ω = ∫ (j x) exp (j ω x) pX (x) d x   ⇒   d ΦX (ω)/d ω |ω=0 = j ∫ x pX (x) d x = j E (x)

d² ΦX (ω)/d ω² = ∫ (j x)² exp (j ω x) pX (x) d x   ⇒   d² ΦX (ω)/d ω² |ω=0 = j² E (x²)
⋮
dⁿ ΦX (ω)/d ωⁿ = ∫ (j x)ⁿ exp (j ω x) pX (x) d x   ⇒   dⁿ ΦX (ω)/d ωⁿ |ω=0 = jⁿ E (xⁿ)

This is the reason why ΦX (ω) is also called the Moment-Generating Function.
45
SOLO Review of Probability
Properties of the Moment-Generating Function

Develop ΦX (ω) = ∫ exp (j ω x) pX (x) d x in a Taylor series about ω = 0:
ΦX (ω) = ΦX (0) + (ω/1!) d ΦX/d ω |ω=0 + (ω²/2!) d² ΦX/d ω² |ω=0 + … + (ωⁿ/n!) dⁿ ΦX/d ωⁿ |ω=0 + …
  = 1 + (j ω/1!) E (x) + [(j ω)²/2!] E (x²) + … + [(j ω)ⁿ/n!] E (xⁿ) + …
46
SOLO Review of Probability
Moment-Generating Function

Binomial Distribution: p (k, n) = [n! / (k! (n−k)!)] p^k (1−p)^(n−k)
ΦX (ω) = E [exp (j ω k)] = Σk exp (j ω k) [n! / (k! (n−k)!)] p^k (1−p)^(n−k)
  = Σk [n! / (k! (n−k)!)] [p exp (j ω)]^k (1−p)^(n−k) = [p exp (j ω) + 1 − p]ⁿ,   k = 0, …, n

Poisson Distribution: p (k; λ) = exp (−λ) λ^k / k!,   k a positive integer
ΦX (ω) = Σk exp (j ω k) exp (−λ) λ^k / k! = exp (−λ) Σk [λ exp (j ω)]^k / k!
  = exp (−λ) exp [λ exp (j ω)] = exp {λ [exp (j ω) − 1]},   k = 0, 1, 2, …

Exponential Distribution: p (x; λ) = λ exp (−λ x) for x ≥ 0,   0 for x < 0
ΦX (ω) = ∫ λ exp (−λ x) exp (j ω x) d x over (0, ∞) = λ exp [−(λ − j ω) x] / [−(λ − j ω)] evaluated from 0 to ∞ = λ / (λ − j ω)
47
SOLO Review of Probability
Moment-Generating Function

Normal Distribution: p (x; μ, σ) = [1/(σ√(2π))] exp [−(x−μ)²/(2σ²)]

ΦX (ω) = [1/(σ√(2π))] ∫ exp (j ω x) exp [−(x−μ)²/(2σ²)] d x = [1/(σ√(2π))] ∫ exp {−[(x−μ)² − 2 j ω σ² x]/(2σ²)} d x

Let us complete the square in the exponent:
(x−μ)² − 2 j ω σ² x = x² − 2 (μ + j ω σ²) x + μ²
  = [x − (μ + j ω σ²)]² − (μ + j ω σ²)² + μ²
  = [x − (μ + j ω σ²)]² − 2 j ω σ² μ + ω² σ⁴
Therefore
ΦX (ω) = exp (j ω μ − σ² ω²/2) · [1/(σ√(2π))] ∫ exp {−[x − (μ + j ω σ²)]²/(2σ²)} d x = exp (j ω μ − σ² ω²/2)
since the remaining integral equals 1.

ΦX (ω) = exp (−σ² ω²/2 + j μ ω)

Using pX (x) = (1/2π) ∫ ΦX (ω) exp (−j ω x) d ω we recover
[1/(σ√(2π))] exp [−(x−μ)²/(2σ²)] = (1/2π) ∫ exp [−σ² ω²/2 − j ω (x − μ)] d ω
48
SOLO Review of Probability
Properties of the Moment-Generating Function

Moment-Generating Function of the Sum of Independent Random Variables

Given the Sum of Independent Random Variables Sm := X1 + X2 + … + Xm:
Φ(Sm) (ω) = E [exp (j ω Sm)] = ∫ ⋯ ∫ exp [j ω (X1 + X2 + … + Xm)] p(X1 … Xm) (x1, …, xm) d x1 ⋯ d xm
Since the random variables are independent, p(X1 … Xm) (x1, …, xm) = p(X1) (x1) p(X2) (x2) ⋯ p(Xm) (xm), so
Φ(Sm) (ω) = ∫ exp (j ω x1) p(X1) (x1) d x1 · ∫ exp (j ω x2) p(X2) (x2) d x2 ⋯ ∫ exp (j ω xm) p(Xm) (xm) d xm
Φ(Sm) (ω) = Φ(X1) (ω) Φ(X2) (ω) ⋯ Φ(Xm) (ω)

Example 1: Sum of Poisson Independent Random Variables Sm := X1 + X2 + … + Xm
p(Xi) (ki; λi) = exp (−λi) λi^ki / ki!,   ki a positive integer,   i = 1, 2, …, m
Φ(Xi) (ω) = exp {λi [exp (j ω) − 1]},   i = 1, 2, …, m
Φ(Sm) (ω) = Φ(X1) (ω) Φ(X2) (ω) ⋯ Φ(Xm) (ω) = exp {(λ1 + λ2 + … + λm) [exp (j ω) − 1]}
The Sum of Poisson Independent Random Variables is a Poisson Random Variable with λ(Sm) = λ1 + λ2 + … + λm
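The closure of the Poisson family under independent sums can be seen in simulation; a small sketch using Knuth's product-of-uniforms sampler (the rates λ1, λ2 and the sample count are arbitrary choices):

```python
import random
from math import exp

random.seed(2)

def poisson(lam):
    """Knuth's sampler: count uniform factors until the product drops below e^(−λ)."""
    limit, k, prod = exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod < limit:
            return k
        k += 1

lam1, lam2, N = 1.5, 2.5, 50000
sums = [poisson(lam1) + poisson(lam2) for _ in range(N)]
mean = sum(sums) / N

# A Poisson(λ1 + λ2) variable has mean (and variance) λ1 + λ2 = 4.0
assert abs(mean - (lam1 + lam2)) < 0.05
```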
49
SOLO Review of Probability
Properties of the Moment-Generating Function

Example 2: Sum of Normal Independent Random Variables Sm := X1 + X2 + … + Xm
p(Xi) (xi; μi, σi) = [1/(σi√(2π))] exp [−(xi−μi)²/(2σi²)]
Φ(Xi) (ω) = exp (−σi² ω²/2 + j μi ω)
Φ(Sm) (ω) = Φ(X1) (ω) Φ(X2) (ω) ⋯ Φ(Xm) (ω)
  = exp (−σ1² ω²/2 + j μ1 ω) · exp (−σ2² ω²/2 + j μ2 ω) ⋯ exp (−σm² ω²/2 + j μm ω)
  = exp [−(σ1² + σ2² + … + σm²) ω²/2 + j (μ1 + μ2 + … + μm) ω]
The Sum of Normal Independent Random Variables is a Normal Random Variable with
μ(Sm) = μ1 + μ2 + … + μm
σ²(Sm) = σ1² + σ2² + … + σm²
Therefore the Sm probability distribution is:
p(Sm) (Sm; μ(Sm), σ(Sm)) = [1/(σ(Sm)√(2π))] exp [−(Sm − μ(Sm))²/(2σ²(Sm))]
Table of Content
50
SOLO Review of Probability
Existence Theorems
Existence Theorem 1
Given a function G (x) such that
G (−∞) = 0,   G (+∞) = lim(x→∞) G (x) = 1
0 ≤ G (x1) ≤ G (x2) if x1 < x2   (G (x) is monotonic non-decreasing)
lim(xn→x, xn≥x) G (xn) = G (x)   (G (x) is continuous from the right)
We can find an experiment X and a random variable x, defined on X, such that its distribution function P (x) equals the given function G (x).
its distribution function P (x) equals the given function G (x).
Proof of Existence Theorem 1
Assume that the outcome of the experiment X is any real number -∞ <x < +∞.
We consider as events all intervals, the intersection or union of intervals on the
real axis.
To specify the probability of those events we define P (x1) = Prob {x ≤ x1} = G (x1).
From our definition of G (x) it follows that P (x) is a distribution function.
Existence Theorem 2 Existence Theorem 3
51
SOLO Review of Probability
Existence Theorems
Existence Theorem 2
If a function F (x,y) is such that
F (−∞, y) = 0,   F (x, −∞) = 0,   F (+∞, +∞) = 1
F (x2, y2) − F (x1, y2) − F (x2, y1) + F (x1, y1) ≥ 0
for every x1 < x2, y1 < y2, then two random variables x and y can be found such that F (x,y) is their joint distribution function.
for every x1 < x2, y1 < y2, then two random variables x and y can be found such that
F (x,y) is their joint distribution function.
Proof of Existence Theorem 2
Assume that the outcome of the experiment X is any real number -∞ <x < +∞.
Assume that the outcome of the experiment Y is any real number -∞ <y < +∞.
We consider as events all intervals, the intersection or union of intervals on the
real axes x and y.
To specify the probability of those events we define P (x,y)=Prob { x ≤ x1, y ≤ y1, }= F (x1,y1).
From our definition of F (x,y) it follows that P (x,y) is a joint distribution function.
The proof is similar to that in the Existence Theorem 1
Histogram
A histogram is a mapping mi that counts the number
of observations that fall into various disjoint
categories (known as bins), whereas the graph of a
histogram is merely one way to represent a histogram.
Thus, if we let n be the total number of observations
and k be the total number of bins, the histogram mi
meets the following conditions:
  n = Σ_{i=1}^{k} m_i
A cumulative histogram is a mapping that counts the cumulative number of
observations in all of the bins up to the specified bin. That is, the cumulative
histogram Mi of a histogram mi is defined as:
  M_i = Σ_{j=1}^{i} m_j
[Figure: an ordinary and a cumulative histogram of the same data. The data shown is a
random sample of 10,000 points from a normal distribution with a mean of 0 and a
standard deviation of 1.]
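The two defining conditions above can be checked directly. The sketch below (not from the original slides; bin count and range are arbitrary choices) builds both histograms by hand for a normal sample like the one in the figure:

```python
import random

random.seed(9)

# 10,000 points from a normal distribution, mean 0, standard deviation 1.
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]

k = 8                       # number of bins (arbitrary choice)
lo, hi = -4.0, 4.0
width = (hi - lo) / k

# Ordinary histogram: m_i counts observations falling in bin i.
m = [0] * k
for x in data:
    i = min(max(int((x - lo) // width), 0), k - 1)  # clamp rare outliers
    m[i] += 1

# Cumulative histogram: M_i = sum_{j<=i} m_j.
M = []
run = 0
for mi in m:
    run += mi
    M.append(run)

# n = sum of all bin counts; the last cumulative count equals n.
print(sum(m), M[-1])
```

The last bin of the cumulative histogram always equals the total number of observations n, which is exactly the condition n = Σ m_i.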
Law of Large Numbers (History)
The Weak Law of Large Numbers was first proved by the Swiss
mathematician James Bernoulli in the fourth part of his work
“Ars Conjectandi” published posthumously in 1713.
Jacob Bernoulli
1654 – 1705
The Law of Large Numbers has three versions:
• Weak Law of Large Numbers (WLLN)
• Strong Law of Large Numbers (SLLN)
• Uniform Law of Large Numbers (ULLN)
The French mathematician Siméon Poisson generalized
Bernoulli’s theorem around 1800.
Siméon Denis Poisson
1781-1840
The next contribution was by Bienaymé and later,
in 1866, by Chebyshev, and is known as the
Bienaymé–Chebyshev Inequality.
Pafnuty Lvovich
Chebyshev
1821 - 1894
Irénée-Jules Bienaymé
1796 - 1878
Law of Large Numbers (History - continue)
Francesco Paolo
Cantelli
1875-1966
Félix Édouard Justin Émile
Borel
1871-1956
Andrey Nikolaevich
Kolmogorov
1903 - 1987
Borel-Cantelli Lemma
Markov's Inequality

If X is a random variable which takes only nonnegative values, then for any value a > 0

  Pr ( X ≥ a ) ≤ E (X)/a

Proof:

Suppose X is continuous with probability density function p_X (x):

  E (X) = ∫_0^∞ x p_X (x) dx = ∫_0^a x p_X (x) dx + ∫_a^∞ x p_X (x) dx
        ≥ ∫_a^∞ x p_X (x) dx ≥ a ∫_a^∞ p_X (x) dx = a Pr ( X ≥ a )

Since a > 0:

  Pr ( X ≥ a ) ≤ E (X)/a

Andrey Andreyevich Markov
1856 – 1922
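A quick Monte Carlo sanity check of Markov's inequality (not part of the original slides; the exponential distribution and the threshold a = 2 are arbitrary illustrative choices):

```python
import random

random.seed(0)

# Exponential samples are nonnegative, with E(X) = 1 for rate 1.
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]

mean_x = sum(samples) / n                           # estimate of E(X)
a = 2.0
prob_tail = sum(1 for x in samples if x >= a) / n   # estimate of Pr(X >= a)
markov_bound = mean_x / a

# Markov's inequality: Pr(X >= a) <= E(X)/a
print(prob_tail, "<=", markov_bound)
```

For this distribution the true tail is exp(−2) ≈ 0.135, well below the Markov bound of about 0.5; the inequality is valid but often loose.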
Chebyshev's Inequality

If X is a random variable with mean μ = E (X) and variance σ² = E [(X − μ)²],
then for any value k > 0

  Pr { |X − μ| ≥ k } ≤ σ²/k²

Proof:

Since (X − μ)² is a nonnegative random variable,
we can apply Markov's inequality with a = k² to obtain

  Pr { (X − μ)² ≥ k² } ≤ E [(X − μ)²]/k² = σ²/k²

But since (X − μ)² ≥ k² if and only if |X − μ| ≥ k, the above is equivalent to

  Pr { |X − μ| ≥ k } ≤ σ²/k²

Take k σ instead of k to obtain

  Pr { |X − μ| ≥ k σ } ≤ 1/k²

Pafnuty Lvovich Chebyshev
1821 – 1894
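The same empirical check can be done for Chebyshev's inequality (a sketch, not from the original slides; the uniform distribution and k = 0.4 are arbitrary choices):

```python
import random

random.seed(1)

# Uniform [0,1]: mean 0.5, variance 1/12.
n = 200_000
samples = [random.random() for _ in range(n)]

mu = 0.5
sigma2 = 1.0 / 12.0

k = 0.4
prob = sum(1 for x in samples if abs(x - mu) >= k) / n
bound = sigma2 / k**2

# Chebyshev's inequality: Pr(|X - mu| >= k) <= sigma^2 / k^2
print(prob, "<=", bound)
```

Here the exact probability is 0.2 (the mass of [0, 0.1] ∪ [0.9, 1]) while the bound is about 0.52, again valid but conservative.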
Bienaymé's Inequality

If X is a random variable, then for any values a and k > 0

  Pr { |X − a| ≥ k } ≤ E [ |X − a|ⁿ ] / kⁿ

Proof:

Let us prove first that if the random variable y takes only positive values, then for any α > 0

  Pr ( y ≥ α ) ≤ E (y)/α

i.e.

  E (y) = ∫_0^∞ y p_Y (y) dy ≥ ∫_α^∞ y p_Y (y) dy ≥ α ∫_α^∞ p_Y (y) dy = α Pr ( y ≥ α )

Define y := |X − a|ⁿ ≥ 0 and choose α = kⁿ > 0. Since

  |X − a|ⁿ ≥ kⁿ  ⇔  |X − a| ≥ k

we obtain

  Pr { |X − a| ≥ k } ≤ E [ |X − a|ⁿ ] / kⁿ

For n = 2 and a = μ we obtain Chebyshev's Inequality. For this reason
Chebyshev's Inequality is also known as the Bienaymé–Chebyshev Inequality.

[Figure: density p_X (x) with the interval (a − k, a + k) marked; the tail mass outside
it is bounded by E |X − a|ⁿ / kⁿ]

Irénée-Jules Bienaymé
1796 – 1878
Chernoff's and Hoeffding's Bounds

Markov, Chebyshev and Bienaymé inequalities use only Expectation Value
information. Let us try to obtain a tighter bound when the probability distribution
function is known.

Start from Markov's Inequality for a nonnegative random variable Z and γ > 0:

  Pr ( Z ≥ γ ) ≤ E (Z)/γ   γ > 0 ,  Z ≥ 0

Now let us take a random variable Y and define the logarithmic
generating function

  Λ_Y (t) := { ln E [exp (t Y)]   if E [exp (t Y)] < ∞
             { ∞                  otherwise

Using the fact that exp (x) is a monotonic increasing function

  Y ≥ λ  ⇒  exp (t Y) ≥ exp (t λ)   ∀ t ≥ 0

and applying Markov's inequality with Z := exp (t Y) and γ := exp (t λ)
we obtain:

  Pr ( Y ≥ λ ) = Pr [ exp (t Y) ≥ exp (t λ) ] ≤ E [exp (t Y)]/exp (t λ)
               = exp { − [ t λ − Λ_Y (t) ] }   ∀ t ≥ 0

Therefore, taking the infimum over all t ≥ 0:

  Pr ( Y ≥ λ ) ≤ inf_{t ≥ 0} exp { − [ t λ − Λ_Y (t) ] }

From this inequality, by using different Y, we obtain the Chernoff and
Hoeffding bounds.

To compute Λ_Y (t) we need to know the distribution function p_Y (y).
Chernoff's Bound

Let X1, X2, … be independent Bernoulli random variables
with Pr (Xi = 1) = p and Pr (Xi = 0) = 1 − p.

Define:  Y := (X1 + … + Xm)/m

  Λ_{Xi} (t) = ln { E [exp (t Xi)] } = ln [ exp (t·1) p + exp (t·0) (1 − p) ]
             = ln [ p exp (t) + (1 − p) ]

  Λ_Y (t) = ln { E [ exp ( (t/m) Σ_{i=1}^m Xi ) ] } = m ln [ p exp (t/m) + (1 − p) ]

Use  Pr ( Y ≥ λ ) ≤ inf_{t ≥ 0} exp { − [ t λ − Λ_Y (t) ] }  and note that

  inf_{t ≥ 0} exp { − [ t λ − Λ_Y (t) ] }  ⇔  sup_{t ≥ 0} [ t λ − Λ_Y (t) ]

  t λ − Λ_Y (t) = t λ − m ln [ p exp (t/m) + (1 − p) ]

Setting the derivative to zero:

  d/dt [ t λ − Λ_Y (t) ] = λ − p exp (t/m)/[ p exp (t/m) + (1 − p) ] = 0

  exp (t*/m) = λ (1 − p)/[ (1 − λ) p ]

and substituting back:

  t* λ − Λ_Y (t*) = m [ λ ln (λ/p) + (1 − λ) ln ( (1 − λ)/(1 − p) ) ]
Chernoff's Bound (continue – 1)

Using this in the bound:

  Pr [ (X1 + … + Xm)/m ≥ λ ]
    ≤ exp { − m [ λ ln (λ/p) + (1 − λ) ln ( (1 − λ)/(1 − p) ) ] }   0 < p < 1 ,  0 < λ < 1

Define:

  H (λ | p) := − m [ λ ln (λ/p) + (1 − λ) ln ( (1 − λ)/(1 − p) ) ]   0 < λ, p < 1

so that

  Pr ( Y ≥ λ ) ≤ inf_{t ≥ 0} exp { − [ t λ − Λ_Y (t) ] } = exp [ H (λ | p) ]

We have H (p | p) = 0, and differentiating:

  d H (λ | p)/d λ = − m [ ln (λ/p) − ln ( (1 − λ)/(1 − p) ) ]
    →  d H (λ | p)/d λ |_{λ=p} = 0

  d² H (λ | p)/d λ² = − m [ 1/λ + 1/(1 − λ) ] ≤ − 4 m

[Figure: d²H (λ|p)/dλ² versus λ on (0, 1); its maximum value − 4m is attained at λ = 0.5]
Chernoff's Bound (continue – 2)

  Pr [ (X1 + … + Xm)/m ≥ λ ] ≤ exp [ H (λ | p) ]   0 < p < 1

with  H (λ | p) := − m [ λ ln (λ/p) + (1 − λ) ln ( (1 − λ)/(1 − p) ) ] ,  0 < λ, p < 1.

Expand H (λ | p) in a Taylor series around λ = p (with θ between p and λ):

  H (λ | p) = H (p | p) + [d H (p | p)/dλ] (λ − p)/1! + [d² H (θ | p)/dλ²] (λ − p)²/2!
            = 0 + 0 + [d² H (θ | p)/dλ²] (λ − p)²/2 ≤ − 4m (λ − p)²/2 = − 2m (λ − p)²

From which we arrive at the Chernoff Bound

  Pr [ (X1 + … + Xm)/m ≥ λ ] ≤ exp [ − 2m (λ − p)² ]   0 < λ, p < 1

Define λ := p + ε :

  Pr [ (X1 + … + Xm)/m ≥ p + ε ] ≤ exp ( − 2m ε² )   0 < p < 1
Chernoff's Bound (continue – 3)

Using the Chernoff Bound we obtain

  Pr [ (X1 + … + Xm)/m ≥ p + ε ] ≤ exp ( − 2m ε² )   0 < p < 1

Define now:  Y := [ (1 − X1) + … + (1 − Xm) ]/m ,
where the (1 − Xi) are Bernoulli with success probability 1 − p, so

  Pr [ ( (1 − X1) + … + (1 − Xm) )/m ≥ (1 − p) + ε ] ≤ exp ( − 2m ε² )   0 < p < 1

or, since  ( (1 − X1) + … + (1 − Xm) )/m = 1 − (X1 + … + Xm)/m ≥ (1 − p) + ε  is the
same event as  (X1 + … + Xm)/m ≤ p − ε :

  Pr [ (X1 + … + Xm)/m ≤ p − ε ] ≤ exp ( − 2m ε² )   0 < p < 1

together with:

  Pr [ (X1 + … + Xm)/m ≥ p + ε ] ≤ exp ( − 2m ε² )   0 < p < 1

By summing those two inequalities we obtain:

  Pr [ | (X1 + … + Xm)/m − p | ≥ ε ]
    = Pr [ (X1 + … + Xm)/m ≤ p − ε ] + Pr [ (X1 + … + Xm)/m ≥ p + ε ]
    ≤ 2 exp ( − 2m ε² )   0 < p < 1

Chernoff's Bounds

Herman Chernoff
1921 –
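The two-sided Chernoff bound above is easy to probe numerically (a sketch, not from the original slides; p, m, ε are arbitrary illustrative values):

```python
import math
import random

random.seed(2)

# Two-sided Chernoff bound for Bernoulli means:
# Pr(|mean - p| >= eps) <= 2 exp(-2 m eps^2)
p, m, eps = 0.3, 100, 0.1
trials = 20_000

exceed = 0
for _ in range(trials):
    mean = sum(1 if random.random() < p else 0 for _ in range(m)) / m
    if abs(mean - p) >= eps:
        exceed += 1

prob = exceed / trials
bound = 2 * math.exp(-2 * m * eps**2)
print(prob, "<=", bound)
```

With these values the bound is about 0.27 while the observed frequency is a few percent; the bound holds with room to spare.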
Hoeffding's Bound

Let us start with a simpler problem:

Suppose that Y is a random variable with a ≤ Y ≤ b almost surely for some finite a and b,
and assume E (Y) = 0.

Define:  α := (b − Y)/(b − a) ,  a ≤ Y ≤ b  →  0 ≤ α ≤ 1

We have:  Y = [ (b − Y)/(b − a) ] a + [ (Y − a)/(b − a) ] b = α a + (1 − α) b

Since exp (·) is a convex function, for any t ≥ 0 we have:

  exp (t Y) ≤ α exp (t a) + (1 − α) exp (t b)
           = [ (b − Y)/(b − a) ] exp (t a) + [ (Y − a)/(b − a) ] exp (t b)   ∀ t ≥ 0

[Figure: the chord of exp between the points (ta, exp(ta)) and (tb, exp(tb)) lies above
exp(tY) for tY = α ta + (1 − α) tb, t ≥ 0]

Let us take the expectation of this inequality and define:

  p := − a/(b − a)   ( E (Y) = 0 ,  a ≤ 0 ≤ b  →  0 ≤ p ≤ 1 )

  E { exp (t Y) } ≤ [ (b − E (Y))/(b − a) ] exp (t a) + [ (E (Y) − a)/(b − a) ] exp (t b)
                 = (1 − p) exp (t a) + p exp (t b) =: exp [ φ (u) ]   ∀ t ≥ 0
Hoeffding's Bound (continue – 1)

  E { exp (t Y) } ≤ (1 − p) exp (t a) + p exp (t b) = exp [ φ (u) ]   ∀ t ≥ 0

where:

  u := t (b − a) ,  φ (u) := − p u + ln [ (1 − p) + p exp (u) ]  →  φ (0) = 0

Differentiating we obtain:

  d φ (u)/d u = − p + p exp (u)/[ (1 − p) + p exp (u) ]  →  d φ (0)/d u = 0

  d² φ (u)/d u² = p (1 − p) exp (u)/[ (1 − p) + p exp (u) ]²

The second derivative is maximal where

  d³ φ (u)/d u³ = p (1 − p) exp (u) [ (1 − p) − p exp (u) ]/[ (1 − p) + p exp (u) ]³ = 0
    →  exp (u*) = (1 − p)/p

  d² φ (u)/d u² ≤ d² φ (u*)/d u² = p (1 − p) [(1 − p)/p]/[ (1 − p) + (1 − p) ]² = 1/4
Hoeffding's Bound (continue – 2)

  E { exp (t Y) } ≤ exp [ φ (u) ] ,  u := t (b − a) ,  φ (0) = 0 ,  φ' (0) = 0 ,  φ'' ≤ 1/4

By a Taylor expansion (0 ≤ θ ≤ 1):

  φ (u) = φ (0) + φ' (0) u + ½ φ'' (θ u) u² ≤ u²/8 = t² (b − a)²/8

  E { exp (t Y) } ≤ exp [ t² (b − a)²/8 ]   ∀ t ≥ 0

End of the simpler problem:
Y is a random variable with a ≤ Y ≤ b almost surely for some finite a and b, and
E (Y) = 0.
Hoeffding's Bound (continue – 3)

Let us generalize the result.

Suppose X1, X2, …, Xm are independent random variables with ai ≤ Xi ≤ bi for
i = 1, 2, …, m. Define Zi := Xi − E (Xi), meaning E (Zi) = 0, and

  Z := Z1 + Z2 + … + Zm

Therefore we have

  E { exp (t Zi) } ≤ exp [ t² (bi − ai)²/8 ]   ∀ t ≥ 0

Use  Pr ( Y ≥ λ ) ≤ E [exp (t Y)]/exp (t λ)  ∀ t ≥ 0  with Y = ± Σ Zi ; by
independence the expectation of the product factors into the product of expectations:

  Pr [ | Σ_{i=1}^m Zi | ≥ λ ] = Pr [ Σ_{i=1}^m Zi ≥ λ ] + Pr [ − Σ_{i=1}^m Zi ≥ λ ]
    ≤ exp (− t λ) E [ exp ( t Σ_{i=1}^m Zi ) ] + exp (− t λ) E [ exp ( − t Σ_{i=1}^m Zi ) ]
    = exp (− t λ) Π_{i=1}^m E [exp (t Zi)] + exp (− t λ) Π_{i=1}^m E [exp (− t Zi)]
    ≤ 2 exp (− t λ) Π_{i=1}^m exp [ t² (bi − ai)²/8 ]
    = 2 exp [ − t λ + (t²/8) Σ_{i=1}^m (bi − ai)² ]   ∀ t ≥ 0
Hoeffding's Bound (continue – 4)

  Pr [ | Σ_{i=1}^m Zi | ≥ λ ] ≤ 2 exp [ − t λ + (t²/8) Σ_{i=1}^m (bi − ai)² ]   ∀ t ≥ 0
    ≤ 2 exp { inf_{t ≥ 0} [ − t λ + (t²/8) Σ_{i=1}^m (bi − ai)² ] }

but the infimum is attained at  t* = 4 λ/Σ_{i=1}^m (bi − ai)² :

  inf_{t ≥ 0} [ − t λ + (t²/8) Σ_{i=1}^m (bi − ai)² ] = − 2 λ²/Σ_{i=1}^m (bi − ai)²

We finally obtain Hoeffding's Bound

  Pr [ | Σ_{i=1}^m Zi | ≥ λ ] ≤ 2 exp [ − 2 λ²/Σ_{i=1}^m (bi − ai)² ]

Wassily Hoeffding
1914 – 1991
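Hoeffding's bound can be checked the same way (a sketch, not from the original slides; uniform variables and λ = 5 are arbitrary choices):

```python
import math
import random

random.seed(3)

# Hoeffding's bound for m uniform [0,1] variables, Zi = Xi - 0.5:
# Pr(|sum Zi| >= lam) <= 2 exp(-2 lam^2 / sum (bi - ai)^2), with bi - ai = 1.
m, lam = 50, 5.0
trials = 20_000

exceed = 0
for _ in range(trials):
    z = sum(random.random() - 0.5 for _ in range(m))
    if abs(z) >= lam:
        exceed += 1

prob = exceed / trials
bound = 2 * math.exp(-2 * lam**2 / m)   # sum of (bi - ai)^2 equals m here
print(prob, "<=", bound)
```

The observed frequency is around one percent against a bound of roughly 0.74; like the Chernoff bound, Hoeffding's bound is valid for every m but not tight.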
Convergence Concepts

Convergence Almost Everywhere (a.e.) (or with Probability 1, or Strongly)

We say that the sequence Xn converges to X with probability 1 if the set of outcomes
x such that

  lim_{n→∞} Xn (x) = X (x)

has probability 1, or

  Pr { Xn → X } = 1  for  n → ∞      (written Xn →a.e. X)

Convergence in the Mean-Square sense (m.s.)

We say that the sequence Xn converges to X in the mean-square sense if

  E { |Xn − X|² } → 0  for  n → ∞      (written Xn →m.s. X)

Convergence in Probability (p) (or Stochastic Convergence, or Convergence in Measure)

We say that the sequence Xn converges to X in the Probability sense if

  Pr { |Xn − X| > ε } → 0  for  n → ∞      (written Xn →P X)

Convergence in Distribution (d) (weak convergence)

We say that the sequence Xn converges to X in the Distribution sense if

  p_{Xn} (x) → p_X (x)  for  n → ∞      (written Xn →d X)

[Diagram: (a.e.) implies (p); (m.s.) implies (p); (p) implies (d)]
Convergence Concepts (continue – 1)

Cauchy Criterion of Convergence

According to the Cauchy Criterion of Convergence,
the sequence Xn converges to an unknown limit if

  |Xn − Xn+m| → 0  for  n → ∞  and any  m > 0

Augustin Louis Cauchy
(1789 – 1857)

Convergence Almost Everywhere (a.e.):

  Pr { |Xn − Xn+m| < ε } → 1  for  n → ∞  and any  m > 0

Convergence in the Mean-Square sense (m.s.):

  E { |Xn − Xn+m|² } → 0  for  n → ∞  and any  m > 0

Using the Chebyshev Inequality

  Pr { |Xn − X| > ε } ≤ E { |Xn − X|² }/ε²

If Xn → X in the m.s. sense, then the right-hand side,
for a given ε, tends to zero, and so does the left-hand side,
i.e. Convergence in Probability (p):

  Pr { |Xn − X| > ε } → 0  for  n → ∞

The opposite is not true: convergence in probability doesn't imply convergence in m.s.
The Laws of Large Numbers

The Law of Large Numbers is a fundamental concept in statistics and probability that
describes how the average of a large randomly selected sample of a population is likely
to be close to the average of the whole population. There are two laws of large numbers:
the Weak Law and the Strong Law.

The Weak Law of Large Numbers

The Weak Law of Large Numbers states that if X1, X2, …, Xn, … is an infinite sequence
of random variables that have the same expected value μ and variance σ², and are
uncorrelated (i.e., the correlation between any two of them is zero), then

  X̄n := (X1 + … + Xn)/n

converges in probability (a weak convergence sense) to μ. We have

  lim_{n→∞} Pr { |X̄n − μ| < ε } = 1

The Strong Law of Large Numbers

The Strong Law of Large Numbers states that if X1, X2, …, Xn, … is an infinite sequence
of random variables that have the same expected value μ and variance σ², are
uncorrelated (i.e., the correlation between any two of them is zero), and E (|Xi|) < ∞,
then

  Pr { lim_{n→∞} X̄n = μ } = 1

i.e. X̄n converges almost surely to μ.
The Law of Large Numbers
Differences between the Weak Law and the Strong Law
The Weak Law states that, for a specified large n, (X1 + ... + Xn) / n is likely to be near μ.
Thus, it leaves open the possibility that | (X1 + ... + Xn) / n − μ | > ε happens an infinite
number of times, although it happens at infrequent intervals.
The Strong Law shows that this almost surely will not occur.
In particular, it implies that with probability 1, we have for any positive value ε, the
inequality | (X1 + ... + Xn) / n − μ | > ε is true only a finite number of times (as opposed to
an infinite, but infrequent, number of times).
Almost sure convergence is also called strong convergence of random variables.
This version is called the strong law because random variables which converge
strongly (almost surely) are guaranteed to converge weakly (in probability). The
strong law implies the weak law.
Proof of the Weak Law of Large Numbers

Given

  E (Xi) = μ  ∀ i ,  Var (Xi) = σ²  ∀ i ,  E [(Xi − μ)(Xj − μ)] = 0  ∀ i ≠ j

we have:

  E (X̄n) = [ E (X1) + … + E (Xn) ]/n = n μ/n = μ

  Var (X̄n) = E { [X̄n − E (X̄n)]² } = E { [ (X1 − μ) + … + (Xn − μ) ]²/n² }
            = [ E (X1 − μ)² + … + E (Xn − μ)² ]/n²   (cross terms vanish, E [(Xi − μ)(Xj − μ)] = 0)
            = n σ²/n² = σ²/n

Using Chebyshev's inequality on X̄n we obtain:

  Pr ( |X̄n − μ| ≥ ε ) ≤ σ²/(n ε²)

Using this equation we obtain:

  Pr ( |X̄n − μ| ≤ ε ) = 1 − Pr ( |X̄n − μ| > ε ) ≥ 1 − Pr ( |X̄n − μ| ≥ ε ) ≥ 1 − σ²/(n ε²)

As n approaches infinity, the expression approaches 1.   q.e.d.
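The concentration proved above is visible in simulation (a sketch, not from the original slides; fair-coin flips with μ = 0.5 are an arbitrary choice):

```python
import random

random.seed(4)

# Weak Law of Large Numbers: the sample mean of fair-coin flips (mu = 0.5)
# concentrates around mu as n grows.
def sample_mean(n):
    return sum(random.randint(0, 1) for _ in range(n)) / n

means = {n: sample_mean(n) for n in (10, 1_000, 100_000)}
for n, m in means.items():
    print(n, m)
```

The deviation |X̄n − 0.5| shrinks roughly like σ/√n, in line with the Var (X̄n) = σ²/n step of the proof.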
Central Limit Theorem

The first version of this theorem was postulated by the
French-born English mathematician Abraham de Moivre in
1733, using the normal distribution to approximate the
distribution of the number of heads resulting from many tosses
of a fair coin. This was published in 1756 in "The Doctrine
of Chances", 3rd Ed.

Abraham de Moivre
(1667 – 1754)

This finding was forgotten until 1812, when the French
mathematician Pierre-Simon Laplace revived it in his work
"Théorie Analytique des Probabilités", in which he approximated
the binomial distribution with the normal distribution.
This is known as the De Moivre–Laplace Theorem.

Pierre-Simon Laplace
(1749 – 1827)

The present form of the Central Limit Theorem was given by the
Russian mathematician Alexandr Lyapunov in 1901.

Alexandr Mikhailovich Lyapunov
(1857 – 1918)
Central Limit Theorem (continue – 1)

Let X1, X2, …, Xm be a sequence of independent random variables with the same
probability distribution function p_X (x), mean μ and variance σ². Define the statistical mean:

  X̄m = (X1 + X2 + … + Xm)/m

We have:

  E (X̄m) = [ E (X1) + E (X2) + … + E (Xm) ]/m = μ

  σ²_{X̄m} = Var (X̄m) = E { [X̄m − E (X̄m)]² }
           = E { [ (X1 − μ) + (X2 − μ) + … + (Xm − μ) ]²/m² } = m σ²/m² = σ²/m

Define also the new random variable

  Y := [ X̄m − E (X̄m) ]/σ_{X̄m} = [ (X1 − μ) + (X2 − μ) + … + (Xm − μ) ]/(σ √m)

The probability distribution of Y tends to become gaussian (normal) as m
tends to infinity, regardless of the probability distribution of the random
variable, as long as the mean μ and the variance σ² are finite.
Central Limit Theorem (continue – 2)

Proof

  Y := [ X̄m − E (X̄m) ]/σ_{X̄m} = [ (X1 − μ) + … + (Xm − μ) ]/(σ √m)

The Characteristic Function of Y:

  Φ_Y (ω) = E [exp (jωY)]
          = E { exp [ jω ( (X1 − μ) + … + (Xm − μ) )/(σ √m) ] }
          = Π_{i=1}^m E { exp [ jω (Xi − μ)/(σ √m) ] }       (independence)
          = [ Φ_{(X−μ)/σ} ( ω/√m ) ]^m

Develop Φ_{(X−μ)/σ} (ω/√m) in a Taylor series:

  Φ_{(X−μ)/σ} ( ω/√m ) = 1 + (jω/√m)/1! · E [ (Xi − μ)/σ ]
                           + (jω/√m)²/2! · E [ ((Xi − μ)/σ)² ]
                           + (jω/√m)³/3! · E [ ((Xi − μ)/σ)³ ] + …
                        = 1 − ω²/(2m) + O ( ω³/m^{3/2} )

where  E [ (Xi − μ)/σ ] = 0 ,  E [ ((Xi − μ)/σ)² ] = 1 ,  and
m · O ( ω³/m^{3/2} ) → 0  as  m → ∞.
Central Limit Theorem (continue – 3)

Proof (continue – 1)

The Characteristic Function

  Φ_Y (ω) = [ Φ_{(X−μ)/σ} ( ω/√m ) ]^m ,
  Φ_{(X−μ)/σ} ( ω/√m ) = 1 − ω²/(2m) + O ( ω³/m^{3/2} )

Therefore

  Φ_Y (ω) = [ 1 − ω²/(2m) + O ( ω³/m^{3/2} ) ]^m  →(m→∞)  exp ( − ω²/2 )

which is the Characteristic Function of the Normal Distribution. Inverting:

  p_Y (y) = (1/2π) ∫_{−∞}^{+∞} Φ_Y (ω) exp ( − jωy ) dω
          →(m→∞)  (1/2π) ∫_{−∞}^{+∞} exp ( − ω²/2 ) exp ( − jωy ) dω
          = (1/√(2π)) exp ( − y²/2 )

The probability distribution of Y tends to become gaussian (normal) as m tends to infinity
(Convergence in Distribution).
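The standardized variable Y of the theorem can be simulated directly (a sketch, not from the original slides; uniform summands and m = 50 are arbitrary choices):

```python
import math
import random
import statistics

random.seed(5)

# CLT: standardized means of uniform [0,1] samples (mu = 0.5, sigma^2 = 1/12)
# should look standard normal for large m.
m, trials = 50, 20_000
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)

ys = []
for _ in range(trials):
    xbar = sum(random.random() for _ in range(m)) / m
    ys.append((xbar - mu) / (sigma / math.sqrt(m)))

print(statistics.mean(ys), statistics.pstdev(ys))

# Fraction of Y within one standard deviation; for a standard normal ~ 0.6827.
frac = sum(1 for y in ys if abs(y) <= 1.0) / trials
print(frac)
```

Even for a flat parent distribution, Y is already close to N(0, 1) at m = 50, which is the convergence-in-distribution statement of the theorem.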
Bernoulli Trials – The Binomial Distribution

Probability Density Function

  p (k, n) = C (n, k) p^k (1 − p)^{n−k} ,   C (n, k) := n!/[k! (n − k)!]

Cumulative Distribution Function

  P (k; n, p) = Σ_{i=0}^{k} C (n, i) p^i (1 − p)^{n−i}

Mean Value

  E (x) = n p

Variance

  Var (x) = n p (1 − p)

Moment Generating Function

  Φ (ω) = [ p exp (jω) + (1 − p) ]^n

Jacob Bernoulli
1654 – 1705
Bernoulli Trials – The Binomial Distribution (continue – 1)

Given a random event r = {0, 1}:

p – probability of success (r = 1) of a given discrete trial
q – probability of failure (r = 0) of the given discrete trial

  p + q = 1

n – number of independent trials
p (k, n) – probability of k successes in n independent trials (Bernoulli Trials)

The number of ways of obtaining k successful trials from n independent trials is

  C (n, k) = n!/[k! (n − k)!]

each occurring with probability p^k (1 − p)^{n−k}, so the probability of k successful
trials from n independent trials is given by

  p (k, n) = n!/[k! (n − k)!] p^k (1 − p)^{n−k}

Using the binomial theorem we obtain

  (p + q)^n = Σ_{k=0}^{n} C (n, k) p^k (1 − p)^{n−k} = 1

therefore the previous distribution is called the binomial distribution.

[Figure: bar plot of p (k, n) versus k = 0, 1, 2, …, 14]
Bernoulli Trials – The Binomial Distribution (continue – 2)

  p (k, n) = n!/[k! (n − k)!] p^k (1 − p)^{n−k}

Mean Value

  E (X) = Σ_{i=0}^{n} i · n!/[i! (n − i)!] p^i (1 − p)^{n−i}
        = Σ_{i=1}^{n} n!/[(i − 1)! (n − i)!] p^i (1 − p)^{n−i}
        = n p Σ_{i=1}^{n} (n − 1)!/[(i − 1)! (n − i)!] p^{i−1} (1 − p)^{n−i}
        = n p Σ_{k=0}^{n−1} (n − 1)!/[k! (n − 1 − k)!] p^k (1 − p)^{n−1−k}
        = n p [ p + (1 − p) ]^{n−1} = n p

Moment Generating Function

  Φ (ω) = E ( e^{jωX} ) = Σ_{k=0}^{n} e^{jωk} n!/[k! (n − k)!] p^k (1 − p)^{n−k}
        = Σ_{k=0}^{n} n!/[k! (n − k)!] ( p e^{jω} )^k (1 − p)^{n−k}
        = [ p e^{jω} + (1 − p) ]^n
Bernoulli Trials – The Binomial Distribution (continue – 3)

  p (k, n) = n!/[k! (n − k)!] p^k (1 − p)^{n−k}

  E (X²) = Σ_{i=0}^{n} i² · n!/[i! (n − i)!] p^i (1 − p)^{n−i}
         = Σ_{i=1}^{n} i · n!/[(i − 1)! (n − i)!] p^i (1 − p)^{n−i}

and using i = (i − 1) + 1 in the remaining factor i:

  E (X²) = Σ_{i=2}^{n} n!/[(i − 2)! (n − i)!] p^i (1 − p)^{n−i}
           + Σ_{i=1}^{n} n!/[(i − 1)! (n − i)!] p^i (1 − p)^{n−i}

The two sums give

  Σ_{i=2}^{n} n!/[(i − 2)! (n − i)!] p^i (1 − p)^{n−i}
    = n (n − 1) p² Σ_{m=0}^{n−2} (n − 2)!/[m! (n − 2 − m)!] p^m (1 − p)^{n−2−m} = n (n − 1) p²

  Σ_{i=1}^{n} n!/[(i − 1)! (n − i)!] p^i (1 − p)^{n−i}
    = n p Σ_{k=0}^{n−1} (n − 1)!/[k! (n − 1 − k)!] p^k (1 − p)^{n−1−k} = n p

so

  E (X²) = n (n − 1) p² + n p

Variance

  Var (X) = E (X²) − E (X)² = n (n − 1) p² + n p − n² p² = n p (1 − p)
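The mean and variance formulas just derived can be verified by summing the pmf directly (a sketch, not from the original slides; n = 20, p = 0.3 are arbitrary choices):

```python
from math import comb

# Direct summation of the binomial pmf p(k, n) = C(n, k) p^k (1-p)^(n-k)
# to check E(X) = n p and Var(X) = n p (1 - p).
n, p = 20, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum(k**2 * pk for k, pk in enumerate(pmf)) - mean**2
print(total, mean, var)
```

The sum of the pmf is 1 (binomial theorem), the mean is n p = 6, and the variance is n p (1 − p) = 4.2.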
Bernoulli Trials – The Binomial Distribution (continue – 4)

Mean Value:  E (x) = n p
Variance:  Var (x) = E { [X − E (X)]² } = n p (1 − p)

Let us apply Chebyshev's Inequality:

  Pr { [X − E (X)]² ≥ k² } ≤ E { [X − E (X)]² }/k² = σ²/k²

We obtain:

  Pr { |X − n p|² ≥ k² } ≤ n p (1 − p)/k²

An upper bound to this inequality, when p varies (0 ≤ p ≤ 1), can be obtained by
taking the derivative of p (1 − p), equating it to zero, and solving for p. The result is
p = 0.5:

  Pr { |X − n p|² ≥ k² } ≤ n/(4 k²)

We can see that when k → ∞

  Pr { |X − n p|² ≥ k² } → 0

i.e. X converges in Probability to the Mean Value n p. This is known as Bernoulli's Theorem.
Generalized Bernoulli Trials

Consider now r mutually exclusive events A1, A2, …, Ar

  Ai ∩ Aj = Ø   i ≠ j ,  i, j = 1, 2, …, r

with their sum equal to the certain event S:  A1 ∪ A2 ∪ … ∪ Ar = S

and the probabilities of occurrence  p (A1) = p1, p (A2) = p2, …, p (Ar) = pr.

Therefore  p (A1) + p (A2) + … + p (Ar) = p1 + p2 + … + pr = 1

We want to find the probability that in n trials we will obtain A1, k1 times, A2, k2
times, and so on, and Ar, kr times, such that  k1 + k2 + … + kr = n.

The number of possible combinations of k1 events A1, k2 events A2, …, kr events Ar
is  n!/(k1! k2! … kr!)  and the probability of each combination is  p1^{k1} p2^{k2} … pr^{kr}.

We obtain the probability of the Generalized Bernoulli Trials as

  p (k1, k2, …, kr; n) = n!/(k1! k2! … kr!) p1^{k1} p2^{k2} … pr^{kr}
Poisson Asymptotical Development (Law of Rare Events)

Start with the Binomial Distribution

  p (k, n) = C (n, k) p^k (1 − p)^{n−k} = n!/[k! (n − k)!] p^k (1 − p)^{n−k}

We assume that n >> 1 and  p = k0/n ,  k0/n << 1.

  p (0, n) = (1 − p)^n = (1 − k0/n)^n  →(n→∞)  e^{−k0}

  p (k, n) = [ n (n − 1) … (n − k + 1)/k! ] (k0/n)^k (1 − k0/n)^{n−k}
           = (1) (1 − 1/n) … (1 − (k − 1)/n) (1 − k0/n)^{−k} (k0^k/k!) (1 − k0/n)^n
           →(n→∞)  (k0^k/k!) p (0, n)

since each of the finitely many factors in front tends to 1. Therefore

  p (k, n) ≈ (k0^k/k!) exp (− k0)

This is the Poisson Asymptotical Development (Law of Rare Events).

Siméon Denis Poisson
1781 – 1840
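The quality of the rare-events limit can be seen numerically (a sketch, not from the original slides; n = 1000, p = 0.003 are arbitrary values in the large-n, small-p regime):

```python
import math
from math import comb

# Compare the exact binomial pmf with its Poisson limit (k0 = n p)
# in the rare-events regime.
n, p = 1_000, 0.003
k0 = n * p

max_diff = 0.0
for k in range(20):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = math.exp(-k0) * k0**k / math.factorial(k)
    max_diff = max(max_diff, abs(binom - poisson))
print(max_diff)
```

The largest pointwise difference between the two pmfs is on the order of n p² (Le Cam's bound gives the same scaling for the total variation distance), already tiny here.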
Poisson Distribution

Probability Density Function

  p (k; λ) = λ^k exp (− λ)/k!   k positive integer

Cumulative Distribution Function

  P (k; λ) = Σ_{i=0}^{k} p (i; λ) = Σ_{i=0}^{k} e^{−λ} λ^i/i! = Γ (k + 1, λ)/k!

where Γ (a, x) = ∫_x^∞ t^{a−1} exp (− t) dt is the upper incomplete gamma function.

Mean Value

  E (X) = Σ_{i=0}^{∞} i λ^i exp (− λ)/i! = λ Σ_{i=1}^{∞} λ^{i−1} exp (− λ)/(i − 1)!
        = λ Σ_{k=0}^{∞} λ^k exp (− λ)/k! = λ exp (− λ) exp (λ) = λ

  E (X²) = Σ_{i=0}^{∞} i² λ^i exp (− λ)/i! = Σ_{i=1}^{∞} i λ^i exp (− λ)/(i − 1)!
         = Σ_{i=2}^{∞} λ^i exp (− λ)/(i − 2)! + Σ_{i=1}^{∞} λ^i exp (− λ)/(i − 1)!
         = λ² + λ

Variance

  Var (x) = E (X²) − E (X)² = λ

Moment Generating Function

  Φ (ω) = E [exp (jωk)] = Σ_{m=0}^{∞} exp (jωm) λ^m exp (− λ)/m!
        = exp (− λ) Σ_{m=0}^{∞} [ λ exp (jω) ]^m/m! = exp (− λ) exp [ λ exp (jω) ]
        = exp { λ [ exp (jω) − 1 ] }

Siméon Denis Poisson
1781 – 1840
Poisson Distribution (continue)

Moment Generating Function:  Φ (ω) = exp { λ [ exp (jω) − 1 ] }

Approximation to the Gaussian Distribution

  λ [ exp (jω) − 1 ] = λ (cos ω − 1) + j λ sin ω = − 2 λ sin² (ω/2) + j λ sin ω

For λ sufficiently large, Φ (ω) is negligible for all but very small values of ω,
in which case  sin² (ω/2) ≈ ω²/4  and  sin ω ≈ ω :

  Φ (ω) = exp { λ [ exp (jω) − 1 ] } ≈ exp ( − λ ω²/2 + j λ ω )

For a normal distribution with mean μ and variance σ² we found the
Moment Generating Function:

  Φ (ω) = exp ( − σ² ω²/2 + j ω μ )

Therefore the Poisson Distribution can be approximated by a Gaussian
Distribution with mean μ = λ and variance σ² = λ :

  p (k; λ) = λ^k exp (− λ)/k! ~ 1/√(2πλ) exp [ − (k − λ)²/(2λ) ]
Poisson Distribution

Probability Density Function:  p (k; λ) = λ^k exp (− λ)/k! ,  k positive integer

Cumulative Distribution Function:  P (k; λ) = Σ_{i=0}^{k} e^{−λ} λ^i/i! = Γ (k + 1, λ)/k!
(Γ (a, x) being the upper incomplete gamma function)

Mean Value:  E (x) = λ

Variance:  Var (x) = λ

Moment Generating Function:  Φ (ω) = exp { λ [ exp (jω) − 1 ] }
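The Gaussian approximation derived above can be checked pointwise (a sketch, not from the original slides; λ = 100 is an arbitrary "large" value):

```python
import math

# Compare the Poisson pmf with its Gaussian approximation (mu = sigma^2 = lam).
lam = 100.0

max_diff = 0.0
for k in range(50, 151):
    # Poisson pmf computed in log space to avoid overflow of lam**k and k!.
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    gauss = math.exp(-(k - lam) ** 2 / (2 * lam)) / math.sqrt(2 * math.pi * lam)
    max_diff = max(max_diff, abs(poisson - gauss))
print(max_diff)
```

At λ = 100 the pmf and the Gaussian density agree to a few parts in ten thousand over the bulk of the distribution; the residual error shrinks as λ grows.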
Normal (Gaussian) Distribution

Probability Density Function

  p (x; μ, σ) = 1/(√(2π) σ) exp [ − (x − μ)²/(2σ²) ]

Cumulative Distribution Function

  P (x; μ, σ) = 1/(√(2π) σ) ∫_{−∞}^{x} exp [ − (u − μ)²/(2σ²) ] du

Mean Value:  E (x) = μ

Variance:  Var (x) = σ²

Moment Generating Function

  Φ (ω) = E [exp (jωx)] = 1/(√(2π) σ) ∫_{−∞}^{+∞} exp (jωu) exp [ − (u − μ)²/(2σ²) ] du
        = exp ( jωμ − σ²ω²/2 )

Karl Friedrich Gauss
1777 – 1855
De Moivre-Laplace Asymptotical Development

Start with the Binomial Distribution

  p (k, n) = n!/[k! (n − k)!] p^k q^{n−k} ,   q := 1 − p

Use the Stirling asymptotical approximation  n! ≈ √(2πn) n^n exp (− n) :

  p (k, n) ≈ √(2πn) n^n exp (− n) p^k q^{n−k}
             / [ √(2πk) k^k exp (− k) · √(2π(n − k)) (n − k)^{n−k} exp (− (n − k)) ]
           = √( n/(2π k (n − k)) ) (n p/k)^k ( n q/(n − k) )^{n−k}

Define  k0 := n p  and  δk := k − (n + 1) p. The ratio of successive terms is

  p (k, n)/p (k − 1, n) = [ (n − k + 1)/k ] (p/q) = 1 − δk/(k q)

so

  k < (n + 1) p  ⇒  p (k, n) > p (k − 1, n)
  k > (n + 1) p  ⇒  p (k, n) < p (k − 1, n)

i.e. p (k, n) peaks near k ≈ n p.
De Moivre-Laplace Asymptotical Development (continue – 1)

  p (k, n) ≈ √( n/(2π k (n − k)) ) (n p/k)^k ( n q/(n − k) )^{n−k}

For  n >> 1 ,  k >> 1 ,  k ≈ n p ,  and with  δk := k − n p ,  σ := √(n p q) :

  p (k, n) ≈ 1/(√(2π) σ) (n p/k)^{k + 1/2} ( n q/(n − k) )^{n − k + 1/2}

  p (k, n) ≈ 1/(√(2π) σ) ( 1 + δk/(n p) )^{− (n p + δk + 1/2)} ( 1 − δk/(n q) )^{− (n q − δk + 1/2)}
           ≈ 1/(√(2π) σ) ( 1 + δk/(n p) )^{− (n p + δk)} ( 1 − δk/(n q) )^{− (n q − δk)}

(for n p >> ½ and n q >> ½ the ½ in the exponents may be dropped).
De Moivre-Laplace Asymptotical Development (continue – 2)

  p (k, n) ≈ 1/(√(2π) σ) ( 1 + δk/(n p) )^{− (n p + δk)} ( 1 − δk/(n q) )^{− (n q − δk)}

Take the logarithm and use  ln (1 + x) ≈ x − x²/2 :

  ln [ √(2π) σ p (k, n) ] ≈ − (n p + δk) ln ( 1 + δk/(n p) ) − (n q − δk) ln ( 1 − δk/(n q) )
    ≈ − (n p + δk) [ δk/(n p) − δk²/(2 n² p²) ] + (n q − δk) [ δk/(n q) + δk²/(2 n² q²) ]
    ≈ − [ δk + δk²/(2 n p) ] + [ δk − δk²/(2 n q) ]
    = − δk² (p + q)/(2 n p q) = − δk²/(2σ²)

from which

  p (k, n) ≈ 1/(√(2π) σ) exp [ − δk²/(2σ²) ] ,   δk = k − n p ,  σ² = n p q

This result was first published by De Moivre in 1756 in "The Doctrine of Chances",
3rd Ed., and reviewed by Laplace, "Théorie Analytique des Probabilités", 1820.

Abraham de Moivre (1667 – 1754)
Pierre-Simon Laplace (1749 – 1827)
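The quality of the De Moivre–Laplace approximation is easy to measure (a sketch, not from the original slides; n = 400, p = 0.5 are arbitrary illustrative values):

```python
import math
from math import comb

# Compare the exact binomial pmf with the De Moivre-Laplace approximation
# p(k, n) ~ exp(-dk^2 / (2 sigma^2)) / (sqrt(2 pi) sigma), sigma^2 = n p q.
n, p = 400, 0.5
q = 1 - p
sigma = math.sqrt(n * p * q)

max_diff = 0.0
for k in range(150, 251):
    exact = comb(n, k) * p**k * q**(n - k)
    approx = math.exp(-(k - n * p) ** 2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
    max_diff = max(max_diff, abs(exact - approx))
print(max_diff)
```

For this symmetric case the two pmfs agree to better than one part in a thousand across the central region, consistent with the asymptotic derivation above.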
De Moivre-Laplace Asymptotical Development for Generalized Bernoulli Trials

Consider the r mutually exclusive events A1, A2, …, Ar

  Ai ∩ Aj = Ø   i ≠ j ,  i, j = 1, 2, …, r

with their sum equal to the certain event S:  A1 ∪ A2 ∪ … ∪ Ar = S

and the probabilities of occurrence  p (A1) = p1, …, p (Ar) = pr ,
p1 + p2 + … + pr = 1.

The probability that in n trials we will obtain A1, k1 times, A2, k2 times, and so on, and
Ar, kr times, such that k1 + k2 + … + kr = n, is

  p (k1, …, kr; n) = n!/(k1! … kr!) p1^{k1} … pr^{kr}

For n going to infinity and  n pi − √n ≤ ki ≤ n pi + √n  we have

  n!/(k1! … kr!) p1^{k1} … pr^{kr}
    →  [ (2πn)^{r−1} p1 p2 … pr ]^{−1/2}
       exp { − [ (k1 − n p1)²/(2 n p1) + … + (kr − n pr)²/(2 n pr) ] }
De Moivre-Laplace Asymptotical Development for Generalized Poisson Trials

Consider the r − 1 mutually exclusive events A1, A2, …, Ar−1

  Ai ∩ Aj = Ø   i ≠ j

with small probabilities of occurrence  p (A1) = p1, …, p (Ar−1) = pr−1 ,
such that  p (A1) + … + p (Ar−1) = p1 + … + pr−1 =: 1 − pr ,  1 − pr << 1.

The probability that in n trials we will obtain A1, k1 times, A2, k2 times, and so on, and
Ar−1, kr−1 times, such that k1 + k2 + … + kr−1 =: n − kr , is

  p (k1, …, kr; n) = n!/(k1! … kr!) p1^{k1} … pr^{kr}

For n going to infinity

  n!/(k1! … kr!) p1^{k1} … pr^{kr}
    →  [ (n p1)^{k1} exp (− n p1)/k1! ] … [ (n pr−1)^{k_{r−1}} exp (− n pr−1)/k_{r−1}! ]
Laplacian Distribution

Probability Density Function

  p (x; μ, b) = 1/(2b) exp ( − |x − μ|/b )

Cumulative Distribution Function

  P (x; μ, b) = 1/(2b) ∫_{−∞}^{x} exp ( − |u − μ|/b ) du

Mean Value:  E (x) = μ

Variance:  Var (x) = 2 b²

Moment Generating Function

  Φ_X (ω) = E [exp (jωx)] = 1/(2b) ∫_{−∞}^{+∞} exp (jωu) exp ( − |u − μ|/b ) du
          = exp (jωμ)/(1 + b² ω²)

Pierre-Simon Laplace
(1749 – 1827)
Gamma Distribution

Probability Density Function

  p (x; k, θ) = { x^{k−1} exp (− x/θ)/[ Γ(k) θ^k ]   x ≥ 0
               { 0                                    x < 0

Cumulative Distribution Function

  P (x; k, θ) = { γ (k, x/θ)/Γ(k)   x ≥ 0
               { 0                   x < 0

Γ is the gamma function:  Γ (a) = ∫_0^∞ t^{a−1} exp (− t) dt
γ is the incomplete gamma function:  γ (a, x) = ∫_0^x t^{a−1} exp (− t) dt

Mean Value:  E (x) = k θ

Variance:  Var (x) = k θ²

Moment Generating Function

  Φ_X (ω) = E [exp (jωx)] = (1 − jωθ)^{−k}
Beta Distribution

Probability Density Function

  p (x; α, β) = x^{α−1} (1 − x)^{β−1} / ∫_0^1 u^{α−1} (1 − u)^{β−1} du
              = Γ(α + β)/[ Γ(α) Γ(β) ] x^{α−1} (1 − x)^{β−1}   0 ≤ x ≤ 1

Cumulative Distribution Function

  P (x; α, β) = Γ(α + β)/[ Γ(α) Γ(β) ] ∫_0^x u^{α−1} (1 − u)^{β−1} du

Γ is the gamma function:  Γ (a) = ∫_0^∞ t^{a−1} exp (− t) dt

Mean Value:  E (x) = α/(α + β)

Variance:  Var (x) = α β/[ (α + β)² (α + β + 1) ]

Moment Generating Function

  Φ_X (ω) = E [exp (jωx)] = 1 + Σ_{k=1}^{∞} [ Π_{r=0}^{k−1} (α + r)/(α + β + r) ] (jω)^k/k!
Cauchy Distribution

Probability Density Function

  p (x; x0, γ) = 1/(πγ) · 1/[ 1 + ((x − x0)/γ)² ] = (1/π) γ/[ (x − x0)² + γ² ]

Cumulative Distribution Function

  P (x; x0, γ) = (1/π) arctan ( (x − x0)/γ ) + 1/2

Mean Value: not defined
Variance: not defined
Moment Generating Function: not defined

Augustin Louis Cauchy
(1789 – 1857)
97
SOLO Review of Probability
Cauchy Distribution

Example of Cauchy Distribution Derivation
Assume a particle is leaving the origin, moving
with constant velocity toward a wall situated at a
distance a from the origin. The angle θ, between the
particle velocity vector and the Ox axis, is a random
variable uniformly distributed between – θ₁ and + θ₁.
Find the probability distribution function of y, the
distance from the Ox axis at which the particle hits the
wall:  y = a tan θ

p_Θ(θ) = 1/(2 θ₁)   for −θ₁ ≤ θ ≤ θ₁;   0 elsewhere

Therefore we obtain
p_Y(y) = p_Θ(θ) / |d y/d θ| = [1/(2 θ₁)] · 1/[a (1 + tan² θ)]
       = a / [2 θ₁ (a² + y²)]   for −a tan θ₁ ≤ y ≤ a tan θ₁;   0 elsewhere

Functions of One Random Variable
Table of Content
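In the limit θ₁ → π/2 the hit point y = a tan θ is exactly Cauchy with location 0 and scale a. Since the mean does not exist, a quick check can use the median and quartiles instead (a sketch, not part of the original slides; for Cauchy(0, a) the quartiles sit at ∓a):

```python
import numpy as np

# theta uniform on (-pi/2, pi/2)  =>  y = a*tan(theta) ~ Cauchy(0, a).
rng = np.random.default_rng(2)
a = 1.5
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=400_000)
y = a * np.tan(theta)

# From the CDF above: P = 0.25 at y = -a, P = 0.5 at y = 0, P = 0.75 at y = +a.
q1, med, q3 = np.quantile(y, [0.25, 0.5, 0.75])
assert abs(med) < 0.02
assert abs(q1 + a) < 0.02 and abs(q3 - a) < 0.02
```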
98
SOLO Review of Probability
Exponential Distribution

Probability Density Function
p(x; λ) = λ exp(−λ x)   for x ≥ 0;   0 for x < 0

Cumulative Distribution Function
P(x; λ) = ∫_{−∞}^x λ exp(−λ t) dt = 1 − exp(−λ x)   for x ≥ 0;   0 for x < 0

Mean Value (integration by parts, u = x, dv = λ exp(−λ x) dx)
E(x) = ∫₀^∞ x λ exp(−λ x) dx = [−x exp(−λ x)]₀^∞ + ∫₀^∞ exp(−λ x) dx = 1/λ

Variance
Var(x) = E(x²) − [E(x)]² = 2/λ² − 1/λ² = 1/λ²

Moment Generating Function
Φ_X(ω) = E[exp(j ω x)] = ∫₀^∞ exp(j ω x) λ exp(−λ x) dx = λ/(λ − j ω) = (1 − j ω/λ)^(−1)
E(x²) = (1/j²) d²Φ_X/dω² |_{ω=0} = 2/λ²

Distributions
examples
Table of Content
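A sampling check of the mean and variance above (a sketch, not part of the original slides; NumPy parameterizes the exponential by the scale 1/λ):

```python
import numpy as np

# Monte Carlo check of E{x} = 1/lambda and Var{x} = 1/lambda^2.
rng = np.random.default_rng(3)
lam = 0.5
x = rng.exponential(scale=1.0 / lam, size=200_000)

mean_mc, var_mc = x.mean(), x.var()
assert abs(mean_mc - 1.0 / lam) < 0.03        # E{x} = 2
assert abs(var_mc - 1.0 / lam ** 2) < 0.15    # Var{x} = 4
```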
99
SOLO Review of Probability
Chi-square Distribution

Probability Density Function
p(x; k) = x^(k/2−1) exp(−x/2) / [2^(k/2) Γ(k/2)]   for x ≥ 0;   0 for x < 0

Cumulative Distribution Function
P(x; k) = γ(k/2, x/2) / Γ(k/2)   for x ≥ 0;   0 for x < 0

Mean Value      E(x) = k
Variance        Var(x) = 2 k
Moment Generating Function
Φ_X(ω) = E[exp(j ω x)] = (1 − 2 j ω)^(−k/2)

Γ is the gamma function              Γ(a) = ∫₀^∞ t^(a−1) exp(−t) dt
γ is the incomplete gamma function   γ(a, x) = ∫₀^x t^(a−1) exp(−t) dt

Distributions
examples
100
SOLO Review of Probability
Derivation of Chi and Chi-square Distributions

Given k normal, independent random variables X1, X2,…,Xk with zero mean values and
the same variance σ², their joint density is given by

p(x₁,…,x_k) = Π_{i=1}^k (2πσ²)^(−1/2) exp[−x_i²/(2σ²)]
            = (2πσ²)^(−k/2) exp[−(x₁² + … + x_k²)/(2σ²)]

Define
Chi-square:   y = χ_k² := x₁² + … + x_k² ≥ 0
Chi:          χ_k := (x₁² + … + x_k²)^(1/2) ≥ 0

Compute
p_{Χk}(χ_k) dχ = Pr{ χ_k ≤ (x₁² + … + x_k²)^(1/2) ≤ χ_k + dχ }

The region in χ_k space, where p_{Χk}(χ_k) is constant, is a hyper-shell of volume
dV = A χ^(k−1) dχ   (A to be defined; e.g. for k = 3, dV = 4π χ² dχ)

p_{Χk}(χ_k) dχ = (2πσ²)^(−k/2) exp[−χ_k²/(2σ²)] A χ_k^(k−1) dχ

p_{Χk}(χ_k) = [A / (2πσ²)^(k/2)] χ_k^(k−1) exp[−χ_k²/(2σ²)]
101
SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 1)

p_{Χk}(χ_k) = [A / (2πσ²)^(k/2)] χ_k^(k−1) exp[−χ_k²/(2σ²)] U(χ_k)

where U is the unit-step function:   U(a) := 1 for a ≥ 0;   0 for a < 0

Chi-square:   y = χ_k² = x₁² + … + x_k² ≥ 0

Changing the variable to y (Function of One Random Variable):
p_Y(y) = p_{Χk}(χ_k) / |d y/d χ_k| = p_{Χk}(√y) / (2 √y)
       = [A / (2 (2π)^(k/2) σ^k)] y^(k/2−1) exp[−y/(2σ²)]   for y ≥ 0;   0 for y < 0

A is determined from the condition ∫_{−∞}^∞ p_Y(y) dy = 1:
∫_{−∞}^∞ p_Y(y) dy = [A / (2 (2π)^(k/2) σ^k)] ∫₀^∞ y^(k/2−1) exp[−y/(2σ²)] dy
                   = [A / (2 π^(k/2))] Γ(k/2) = 1   →   A = 2 π^(k/2) / Γ(k/2)

Chi-square density:
p_Y(y; k, σ) = [1 / (2^(k/2) σ^k Γ(k/2))] y^(k/2−1) exp[−y/(2σ²)] U(y)

Chi density:
p_{Χk}(χ_k) = [1 / (2^(k/2−1) σ^k Γ(k/2))] χ_k^(k−1) exp[−χ_k²/(2σ²)] U(χ_k)

Γ is the gamma function   Γ(a) = ∫₀^∞ t^(a−1) exp(−t) dt

Function of
One Random
Variable
102
SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 2)

Chi-square:   y = χ_k² = x₁² + … + x_k² ≥ 0

Mean Value
E{χ_k²} = E{x₁²} + … + E{x_k²} = k σ²

where the x_i are Gaussian with E{x_i} = 0, E{x_i²} = σ² and, by the 4th moment of a
Gauss Distribution, E{(x_i − 0)⁴} = 3 σ⁴,   i = 1,…,k.

E{(χ_k² − k σ²)²} = E{(Σ_i x_i²)²} − k² σ⁴
                 = Σ_i E{x_i⁴} + Σ_{i≠j} E{x_i²} E{x_j²} − k² σ⁴
                 = 3 k σ⁴ + k (k−1) σ⁴ − k² σ⁴ = 2 k σ⁴

Variance
Var{χ_k²} = E{(χ_k² − k σ²)²} = 2 k σ⁴

Table of Content
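The results E{χ_k²} = kσ² and Var{χ_k²} = 2kσ⁴ can be verified directly by summing squared Gaussian samples (a sketch, not part of the original slides; seed and tolerances arbitrary):

```python
import numpy as np

# chi_k^2 = x1^2 + ... + xk^2 with xi ~ N(0, sigma^2), i.i.d.
rng = np.random.default_rng(4)
k, sigma = 5, 1.3
x = rng.normal(0.0, sigma, size=(300_000, k))
chi2 = (x ** 2).sum(axis=1)

mean_mc, var_mc = chi2.mean(), chi2.var()
assert abs(mean_mc - k * sigma ** 2) < 0.1          # E = k*sigma^2
assert abs(var_mc - 2 * k * sigma ** 4) < 0.8       # Var = 2k*sigma^4
```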
103
SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 3)

Tail probabilities of the chi-square and normal densities.
The Table presents the points on the chi-square
distribution for a given upper tail probability
Q = Pr{ y > x }
where y = χ_n² and n is the number of degrees
of freedom. This tabulated function is also
known as the complementary distribution.
An alternative way of writing the previous
equation is:   1 − Q = Pr{ χ_n² ≤ x(1−Q) }
which indicates that at the left of the point x
the probability mass is 1 – Q. This is the
100 (1 – Q) percentile point.

Examples
1. The 95 % probability region for a χ₂² variable
can be taken as the one-sided probability
region (cutting off the 5% upper tail):   [0, χ₂²(0.95)] = [0, 5.99]
2. Or the two-sided probability region (cutting off both 2.5% tails):
[χ₂²(0.025), χ₂²(0.975)] = [0.05, 7.38]
3. For a χ₁₀₀² variable, the two-sided 95% probability region (cutting off both 2.5% tails) is:
[χ₁₀₀²(0.025), χ₁₀₀²(0.975)] = [74, 130]
104
SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 4)

Note the skewness of the chi-square
distribution: the above two-sided regions are
not symmetric about the corresponding means
E{χ_n²} = n

Tail probabilities of the chi-square and normal densities.
For degrees of freedom above 100, the
following approximation of the points on the
chi-square distribution can be used:
χ_n²(1−Q) ≈ (1/2) [ G(1−Q) + (2n − 1)^(1/2) ]²
where G( ) is given in the last line of the Table
and shows the point x on the standard (zero
mean and unity variance) Gaussian distribution
for the same tail probabilities.
In the case Pr{y} = N(y; 0, 1) and with
Q = Pr{ y > x }, we have x(1−Q) := G(1−Q).

Table of Content
105
SOLO Review of Probability
Student’s t-Distribution

Probability Density Function
p(x; ν) = Γ[(ν+1)/2] / [ (νπ)^(1/2) Γ(ν/2) ] · (1 + x²/ν)^(−(ν+1)/2)

Cumulative Distribution Function
P(x; ν) = 1/2 + x Γ[(ν+1)/2] / [ (νπ)^(1/2) Γ(ν/2) ]
          · Σ_{n=0}^∞ [ (1/2)^(n) ((ν+1)/2)^(n) / ( (3/2)^(n) n! ) ] (−x²/ν)^n
where (a)^(n) := a (a+1) (a+2) … (a+n−1)

Mean Value
E(x) = 0 for ν > 1;   undefined for ν = 1

Variance
Var(x) = ν/(ν−2) for ν > 2;   ∞ otherwise

Moment Generating Function not defined

Γ is the gamma function   Γ(a) = ∫₀^∞ t^(a−1) exp(−t) dt

It got its name from W. S. Gosset, who wrote
under the pseudonym “Student”.
William Sealy Gosset
1876 - 1937

Distributions
examples
Table of Content
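A sampling check of the quoted mean and variance (a sketch, not part of the original slides; seed and tolerances arbitrary):

```python
import numpy as np

# Student's t with nu = 10 degrees of freedom: E{x} = 0, Var{x} = nu/(nu-2).
rng = np.random.default_rng(5)
nu = 10.0
x = rng.standard_t(nu, size=400_000)

mean_mc, var_mc = x.mean(), x.var()
assert abs(mean_mc) < 0.02                      # E{x} = 0 for nu > 1
assert abs(var_mc - nu / (nu - 2.0)) < 0.05     # Var = 10/8 = 1.25
```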
106
SOLO Review of Probability
Uniform Distribution (Continuous)

Probability Density Function
p(x; a, b) = 1/(b−a) for a ≤ x ≤ b;   0 for x < a or x > b

Cumulative Distribution Function
P(x; a, b) = 0 for x < a;   (x−a)/(b−a) for a ≤ x ≤ b;   1 for x > b

Mean Value      E(x) = (a+b)/2
Variance        Var(x) = (b−a)²/12
Moment Generating Function
Φ_X(ω) = E[exp(j ω x)] = [exp(j ω b) − exp(j ω a)] / [j ω (b−a)]

Distributions
examples
Moments
Table of Content
107
SOLO Review of Probability
Rayleigh Distribution

John William Strutt
Lord Rayleigh
(1842-1919)

Probability Density Function
p(x; σ) = (x/σ²) exp[−x²/(2σ²)],   x ≥ 0

Cumulative Distribution Function
P(x; σ) = 1 − exp[−x²/(2σ²)]

Mean Value      E(x) = σ (π/2)^(1/2)
Variance        Var(x) = [(4−π)/2] σ²
Moment Generating Function
Φ(ω) = 1 − σω exp(−σ²ω²/2) (π/2)^(1/2) [ erfi(σω/√2) − j ]

The Rayleigh Distribution is the chi-distribution with k = 2:
p_{Χk}(χ_k) = [1 / (2^(k/2−1) σ^k Γ(k/2))] χ_k^(k−1) exp[−χ_k²/(2σ²)] U(χ_k)

Distributions
examples
Moments
108
SOLO Review of Probability
Rayleigh Distribution

Example of Rayleigh Distribution
Given X and Y, two independent Gaussian random variables, with zero means and the
same variance σ²:
p_XY(x, y) = [1/(2πσ²)] exp[−(x² + y²)/(2σ²)]
find the distributions of R and Θ given by:   R = (X² + Y²)^(1/2)   &   Θ = tan⁻¹(Y/X)

Solution
With x = r cos θ, y = r sin θ and dx dy = r dr dθ:
p_RΘ(r, θ) dr dθ = p_XY(x, y) dx dy = [1/(2πσ²)] exp[−(x² + y²)/(2σ²)] dx dy
                 = [r/(2πσ²)] exp[−r²/(2σ²)] dr dθ = p_R(r) dr · p_Θ(θ) dθ
where:
p_Θ(θ) = 1/(2π),   0 ≤ θ ≤ 2π                  Uniform Distribution
p_R(r) = (r/σ²) exp[−r²/(2σ²)],   r ≥ 0        Rayleigh Distribution

Table of Content
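The example above can be reproduced numerically: the envelope of two i.i.d. zero-mean Gaussians should show the Rayleigh mean and variance (a sketch, not part of the original slides; seed and tolerances arbitrary):

```python
import numpy as np

# R = sqrt(X^2 + Y^2) with X, Y ~ N(0, sigma^2), independent.
rng = np.random.default_rng(6)
sigma = 2.0
x = rng.normal(0.0, sigma, size=300_000)
y = rng.normal(0.0, sigma, size=300_000)
r = np.hypot(x, y)

mean_mc, var_mc = r.mean(), r.var()
assert abs(mean_mc - sigma * np.sqrt(np.pi / 2)) < 0.02   # sigma*sqrt(pi/2)
assert abs(var_mc - (4 - np.pi) / 2 * sigma ** 2) < 0.03  # (4-pi)/2 * sigma^2
```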
109
SOLO Review of Probability
Rice Distribution

Stephen O. Rice
1907 - 1986

Probability Density Function
p(x; v, σ) = (x/σ²) exp[−(x² + v²)/(2σ²)] I₀(x v/σ²)
where:
I₀(x v/σ²) = (1/2π) ∫₀^{2π} exp[(x v/σ²) cos φ′] dφ′
is the zero-order modified Bessel function of the first kind.

Cumulative Distribution Function
P(x; v, σ) = 1 − Q₁(v/σ, x/σ)   (Q₁ is the Marcum Q-function)

Mean Value      E(x) = σ (π/2)^(1/2) L_{1/2}(−v²/(2σ²))
Variance        Var(x) = 2σ² + v² − (πσ²/2) L²_{1/2}(−v²/(2σ²))
(L_{1/2} is a Laguerre function; for v = 0 these reduce to the Rayleigh values.)

Moment Generating Function: no simple closed form.

Distributions
examples
110
SOLO Review of Probability
Rice Distribution

Example of Rice Distribution
The Rice Distribution applies to the statistics of the envelope of the output of a bandpass
filter consisting of signal plus noise:

s(t) + n(t) = A cos(ω₀ t + φ) + n_C(t) cos(ω₀ t) − n_S(t) sin(ω₀ t)
            = [A cos φ + n_C(t)] cos(ω₀ t) − [A sin φ + n_S(t)] sin(ω₀ t)

X = n_C(t) and Y = n_S(t) are Gaussian random variables, with zero mean and the same
variance σ², and φ is the unknown but constant signal phase.

Define the output envelope R and phase Θ:
R = { [n_C(t) + A cos φ]² + [n_S(t) + A sin φ]² }^(1/2)
Θ = tan⁻¹{ [n_S(t) + A sin φ] / [n_C(t) + A cos φ] }

Solution
With x = r cos θ − A cos φ, y = r sin θ − A sin φ and dx dy = r dr dθ:
p_RΘ(r, θ) dr dθ = p_XY(x, y) dx dy = [1/(2πσ²)] exp[−(x² + y²)/(2σ²)] dx dy
                 = [r/(2πσ²)] exp[−(r² + A² − 2 A r cos(θ − φ))/(2σ²)] dr dθ

p_R(r) = ∫₀^{2π} p_RΘ(r, θ) dθ
       = (r/σ²) exp[−(r² + A²)/(2σ²)] · (1/2π) ∫₀^{2π} exp[(A r/σ²) cos(φ − θ)] dθ
111
SOLO Review of Probability
Rice Distribution

Example of Rice Distribution (continue – 1)
p_R(r) = (r/σ²) exp[−(r² + A²)/(2σ²)] · (1/2π) ∫₀^{2π} exp[(A r/σ²) cos φ′] dφ′
where:
I₀(A r/σ²) = (1/2π) ∫₀^{2π} exp[(A r/σ²) cos φ′] dφ′
is the zero-order modified Bessel function of the first kind.

p_R(r; A, σ) = (r/σ²) exp[−(r² + A²)/(2σ²)] I₀(A r/σ²)        Rice Distribution

Since I₀(0) = 1, if in the Rice Distribution we take A = 0 we obtain:
p_R(r; 0, σ) = (r/σ²) exp[−r²/(2σ²)]        Rayleigh Distribution

Table of Content
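The envelope construction above can be checked by simulation; E{R²} = A² + 2σ² is a closed-form moment of the Rice distribution, and A = 0 reduces to the Rayleigh case (a sketch, not part of the original slides; seed and tolerances arbitrary):

```python
import numpy as np

# Envelope of signal plus narrowband noise, as in the example above.
rng = np.random.default_rng(7)
A, sigma, phi = 3.0, 1.0, 0.7
n_c = rng.normal(0.0, sigma, size=300_000)
n_s = rng.normal(0.0, sigma, size=300_000)
r = np.hypot(A * np.cos(phi) + n_c, A * np.sin(phi) + n_s)
m2 = (r ** 2).mean()
assert abs(m2 - (A ** 2 + 2 * sigma ** 2)) < 0.1     # E{R^2} = A^2 + 2 sigma^2

# A = 0 gives the Rayleigh envelope: E{R} = sigma * sqrt(pi/2).
r0_mean = np.hypot(n_c, n_s).mean()
assert abs(r0_mean - sigma * np.sqrt(np.pi / 2)) < 0.01
```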
112
SOLO Review of Probability
Weibull Distribution

Ernst Hjalmar
Waloddi Weibull
1887 - 1979

Probability Density Function
p(x; γ, μ, α) = (γ/α) [(x−μ)/α]^(γ−1) exp{ −[(x−μ)/α]^γ }   for x ≥ μ;   0 for x < μ
(γ, α > 0)

Cumulative Distribution Function
P(x; γ, μ, α) = ∫_{−∞}^x p(x; γ, μ, α) dx = 1 − exp{ −[(x−μ)/α]^γ }

Mean Value (for μ = 0)
E(x) = α Γ(1 + 1/γ)
Variance (for μ = 0)
Var(x) = α² [ Γ(1 + 2/γ) − Γ²(1 + 1/γ) ]

Γ is the gamma function   Γ(a) = ∫₀^∞ t^(a−1) exp(−t) dt

Distributions
examples
Table of Content
113
KINETIC THEORY OF GASES    SOLO
MAXWELL’S VELOCITY DISTRIBUTION
IN 1859 MAXWELL PROPOSED THE FOLLOWING MODEL:
ASSUME THAT THE VELOCITY COMPONENTS OF N
MOLECULES, ENCLOSED IN A CUBE WITH SIDE l, ALONG EACH
OF THE THREE COORDINATE AXES ARE INDEPENDENTLY AND
IDENTICALLY DISTRIBUTED ACCORDING TO THE DENSITY f0(α)
= f0(-α), I.E.,
JAMES CLERK
MAXWELL
(1831 – 1879)

f₀(v⃗) d³v = f₀(v_x − v₀x) f₀(v_y − v₀y) f₀(v_z − v₀z) dv_x dv_y dv_z
           = A exp[−B (v⃗ − v⃗₀)·(v⃗ − v⃗₀)] dv_x dv_y dv_z

f(v_i) d v_i = THE PROBABILITY THAT THE i VELOCITY
COMPONENT IS BETWEEN v_i AND v_i + d v_i ;   i = x, y, z
MAXWELL ASSUMED THAT THE DISTRIBUTION DEPENDS
ONLY ON THE MAGNITUDE OF THE VELOCITY.
114
KINETIC THEORY OF GASES    SOLO
MAXWELL’s VELOCITY DISTRIBUTION (CONTINUE)
LET US FIND THE CONSTANTS A, B AND v⃗₀ IN   f₀(v⃗) = A exp[−B (v⃗ − v⃗₀)²]
SINCE THE DEFINITION OF THE TOTAL NUMBER OF PARTICLES N IS:
N = ∫ d³r ∫ d³v f(r⃗, v⃗, t)
WE HAVE IN EQUILIBRIUM (TAKING v⃗₀ = 0 FOR THE MOMENT):
N/V = ∫ d³v f₀(v⃗) = A ∫∫∫ exp[−B (v_x² + v_y² + v_z²)] dv_x dv_y dv_z
    = A [ ∫_{−∞}^∞ exp(−B v²) dv ]³ = A (π/B)^(3/2)
WHERE V IS THE VOLUME OF THE CONTAINER:   V = ∫ d³r
IT FOLLOWS THAT B > 0 AND
A = (N/V) (B/π)^(3/2)
115
KINETIC THEORY OF GASES    SOLO
MAXWELL’s VELOCITY DISTRIBUTION (CONTINUE)
LET US FIND THE CONSTANTS A, B AND v⃗₀ IN   f₀(v⃗) = A exp[−B (v⃗ − v⃗₀)²]
THE AVERAGE VELOCITY IS GIVEN BY:
⟨v⃗⟩ = ∫ d³v v⃗ f₀(v⃗) / ∫ d³v f₀(v⃗) = (VA/N) ∫ d³v v⃗ exp[−B (v⃗ − v⃗₀)²]
    = (VA/N) ∫ d³v (v⃗ − v⃗₀ + v⃗₀) exp[−B (v⃗ − v⃗₀)²] = 0 + v⃗₀ = v⃗₀
THE AVERAGE KINETIC ENERGY OF THE MOLECULES ε, WHEN v⃗₀ = 0, IS
ε = ∫ d³v (m v²/2) f₀(v⃗) / ∫ d³v f₀(v⃗) = (VAm/(2N)) ∫ d³v v² exp(−B v²) = 3m/(4B)
WE FOUND ALSO THAT FOR A MONOATOMIC GAS   ε = (3/2) k T
THEREFORE
B = 3m/(4ε) = m/(2kT)
A = (N/V) (B/π)^(3/2) = (N/V) [m/(2πkT)]^(3/2)
116
KINETIC THEORY OF GASES    SOLO
MAXWELL’s VELOCITY DISTRIBUTION (CONTINUE)
MAXWELL’S VELOCITY DISTRIBUTION BECOMES
f₀(v⃗) = (N/V) [m/(2πkT)]^(3/2) exp[−(m/(2kT)) v⃗·v⃗]
OR
f₀(v⃗) d³v = f(v_x) f(v_y) f(v_z) dv_x dv_y dv_z
           = (N/V) [m/(2πkT)]^(3/2) exp[−(m/(2kT)) (v_x² + v_y² + v_z²)] dv_x dv_y dv_z
117
KINETIC THEORY OF GASES    SOLO
MAXWELL’s VELOCITY DISTRIBUTION (CONTINUE)
f₀(v⃗) = (N/V) [m/(2πkT)]^(3/2) exp[−(m/(2kT)) v⃗·v⃗]
MAXWELL’S SPEED DISTRIBUTION IS THE CHI-DISTRIBUTION WITH k = 3
(WITH σ² = kT/m):
p_{Χk}(χ_k) = [1 / (2^(k/2−1) σ^k Γ(k/2))] χ_k^(k−1) exp[−χ_k²/(2σ²)] U(χ_k)
Table of Content
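The chi-with-k=3 identification can be checked by sampling the three velocity components (a sketch, not part of the original slides; units chosen so σ = √(kT/m) is a plain number, seed and tolerances arbitrary):

```python
import numpy as np

# Speed = |v| with three i.i.d. N(0, sigma^2) components, sigma = sqrt(kT/m).
rng = np.random.default_rng(8)
sigma = 1.2
v = rng.normal(0.0, sigma, size=(300_000, 3))
speed = np.linalg.norm(v, axis=1)            # chi distribution, k = 3

# Known chi(k=3) moments: E{|v|} = 2*sigma*sqrt(2/pi), E{|v|^2} = 3*sigma^2.
mean_mc = speed.mean()
m2_mc = (speed ** 2).mean()
assert abs(mean_mc - 2 * sigma * np.sqrt(2 / np.pi)) < 0.01
assert abs(m2_mc - 3 * sigma ** 2) < 0.05
```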
118
KINETIC THEORY OF GASES    SOLO
MOLECULAR MODELS

BOLTZMANN STATISTICS                         LUDWIG BOLTZMANN
• DISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF
PARTICLES PER QUANTUM STATE.
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w = N! Π_j ( g_j^{N_j} / N_j! )

BOSE-EINSTEIN STATISTICS                     SATYENDRANATH N. BOSE   ALBERT EINSTEIN
• INDISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF
PARTICLES PER QUANTUM STATE.
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w = Π_j (g_j + N_j − 1)! / [ (g_j − 1)! N_j! ]

FERMI-DIRAC STATISTICS                       ENRICO FERMI   PAUL A.M. DIRAC
• INDISTINGUISHABLE PARTICLES
• ONE PARTICLE PER QUANTUM STATE.
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w = Π_j g_j! / [ N_j! (g_j − N_j)! ]

CONSTRAINTS:   N = Σ_j N_j     E = Σ_j ε′_j N_j
Table of Content
119
KINETIC THEORY OF GASES    SOLO
MOLECULAR MODELS
BOLTZMANN STATISTICS                         LUDWIG BOLTZMANN
• DISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF
PARTICLES PER QUANTUM STATE.
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g1, g2, …, gj AT THE ENERGY LEVELS ε′1, ε′2, …, ε′j
- NUMBER OF PARTICLES N1, N2, …, Nj IN STATES g1, g2, …, gj
NUMBER OF WAYS N DISTINGUISHABLE
PARTICLES CAN BE DIVIDED IN GROUPS
WITH N1, N2, …, Nj, … PARTICLES IS:   N! / Π_j N_j!     ( N = Σ_j N_j )
NUMBER OF WAYS Nj PARTICLES CAN BE PLACED IN THE gj STATES IS:   g_j^{N_j}
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w|Boltz = N! Π_j ( g_j^{N_j} / N_j! )
120
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
BOLTZMANN STATISTICS     w|Boltz = N! Π_j ( g_j^{N_j} / N_j! )
USING STIRLING’S FORMULA   ln(a!) ≈ a ln a − a
ln w = ln N! + Σ_j ( N_j ln g_j − ln N_j! ) ≈ N ln N + Σ_j ( N_j ln g_j − N_j ln N_j )
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST
COMPUTE THE DIFFERENTIAL
d(ln w) = Σ_j ( ln g_j − ln N_j ) d N_j = 0
CONSTRAINED BY:
N = Σ_j N_j   ⇒   d N = Σ_j d N_j = 0
E = Σ_j ε′_j N_j   ⇒   d E = Σ_j ε′_j d N_j = 0
121
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
BOLTZMANN STATISTICS (CONTINUE)     w|Boltz = N! Π_j ( g_j^{N_j} / N_j! )
WE OBTAINED
d(ln w) = −Σ_j ln( N_j / g_j ) d N_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS α, β:
α Σ_j d N_j = 0     β Σ_j ε′_j d N_j = 0
TO OBTAIN
Σ_j [ ln( N_j / g_j ) + α + β ε′_j ] d N_j = 0   ⇒   ln( N_j* / g_j ) + α + β ε′_j = 0
OR
N_j*|Boltz = g_j e^{−α} e^{−β ε′_j}
BOLTZMANN
MOST PROBABLE MACROSTATE
Table of Content
122
KINETIC THEORY OF GASES    SOLO
MOLECULAR MODELS
BOSE-EINSTEIN STATISTICS
• INDISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF
PARTICLES PER QUANTUM STATE.
SATYENDRANATH N. BOSE (1894-1974)    ALBERT EINSTEIN (1879-1955)
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g1, g2, …, gj AT THE ENERGY LEVELS ε′1, ε′2, …, ε′j
- NUMBER OF PARTICLES N1, N2, …, Nj IN STATES g1, g2, …, gj
NUMBER OF WAYS Nj INDISTINGUISHABLE PARTICLES CAN BE PLACED
IN THE gj STATES IS:   (g_j + N_j − 1)! / [ (g_j − 1)! N_j! ]     ( N = Σ_j N_j )
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w|B-E = Π_j (g_j + N_j − 1)! / [ (g_j − 1)! N_j! ]
123
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS (CONTINUE)
w|B-E = Π_j (g_j + N_j − 1)! / [ (g_j − 1)! N_j! ] ≈ Π_j (g_j + N_j)! / ( g_j! N_j! )
USING STIRLING’S FORMULA   ln(a!) ≈ a ln a − a
ln w ≈ Σ_j [ (g_j + N_j) ln(g_j + N_j) − g_j ln g_j − N_j ln N_j ]
     = Σ_j [ N_j ln( g_j/N_j + 1 ) + g_j ln( 1 + N_j/g_j ) ]
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST
COMPUTE THE DIFFERENTIAL
d(ln w) = Σ_j ln( g_j/N_j + 1 ) d N_j = 0
124
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS (CONTINUE)
w|B-E ≈ Π_j (g_j + N_j)! / ( g_j! N_j! )
WE OBTAINED
d(ln w) = Σ_j ln( g_j/N_j + 1 ) d N_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS α, β:
α Σ_j d N_j = 0     β Σ_j ε′_j d N_j = 0
TO OBTAIN
Σ_j [ ln( g_j/N_j + 1 ) − α − β ε′_j ] d N_j = 0   ⇒   ln( g_j/N_j* + 1 ) − α − β ε′_j = 0
OR
N_j*|B-E = g_j / ( e^{α} e^{β ε′_j} − 1 )
BOSE-EINSTEIN
MOST PROBABLE MACROSTATE
Table of Content
125
KINETIC THEORY OF GASES    SOLO
MOLECULAR MODELS
FERMI-DIRAC STATISTICS
• INDISTINGUISHABLE PARTICLES
• ONE PARTICLE PER QUANTUM STATE.
ENRICO FERMI (1901-1954)    PAUL A.M. DIRAC (1902-1984)
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g1, g2, …, gj AT THE ENERGY LEVELS ε′1, ε′2, …, ε′j
- NUMBER OF PARTICLES N1, N2, …, Nj IN STATES g1, g2, …, gj
NUMBER OF WAYS Nj INDISTINGUISHABLE PARTICLES CAN BE PLACED
IN THE gj STATES IS:   g_j! / [ N_j! (g_j − N_j)! ]     ( N = Σ_j N_j )
NUMBER OF MICROSTATES
FOR A GIVEN MACROSTATE
w|F-D = Π_j g_j! / [ N_j! (g_j − N_j)! ]
126
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
FERMI-DIRAC STATISTICS (CONTINUE)
w|F-D = Π_j g_j! / [ N_j! (g_j − N_j)! ]
USING STIRLING’S FORMULA   ln(a!) ≈ a ln a − a
ln w = Σ_j [ ln g_j! − ln N_j! − ln(g_j − N_j)! ]
     ≈ Σ_j [ g_j ln g_j − N_j ln N_j − (g_j − N_j) ln(g_j − N_j) ]
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST
COMPUTE THE DIFFERENTIAL
d(ln w) = Σ_j [ ln(g_j − N_j) − ln N_j ] d N_j = Σ_j ln[ (g_j − N_j)/N_j ] d N_j = 0
127
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
FERMI-DIRAC STATISTICS (CONTINUE)
w|F-D = Π_j g_j! / [ N_j! (g_j − N_j)! ]
WE OBTAINED
d(ln w) = Σ_j ln[ (g_j − N_j)/N_j ] d N_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS α, β:
α Σ_j d N_j = 0     β Σ_j ε′_j d N_j = 0
TO OBTAIN
Σ_j { ln[ (g_j − N_j)/N_j ] − α − β ε′_j } d N_j = 0   ⇒   ln[ (g_j − N_j*)/N_j* ] − α − β ε′_j = 0
OR
N_j*|F-D = g_j / ( e^{α} e^{β ε′_j} + 1 )
FERMI-DIRAC
MOST PROBABLE MACROSTATE
128
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS    w|B-E = Π_j (g_j + N_j − 1)! / [ (g_j − 1)! N_j! ]
FERMI-DIRAC STATISTICS      w|F-D = Π_j g_j! / [ N_j! (g_j − N_j)! ]
BOLTZMANN STATISTICS        w|Boltz = N! Π_j ( g_j^{N_j} / N_j! )
FOR GASES AT LOW PRESSURES OR HIGH TEMPERATURE THE NUMBER
OF QUANTUM STATES gj AVAILABLE AT ANY LEVEL IS MUCH LARGER
THAN THE NUMBER OF PARTICLES IN THAT LEVEL Nj:   g_j >> N_j
(g_j + N_j − 1)! / (g_j − 1)! = g_j (g_j + 1) … (g_j + N_j − 1)  ≈  g_j^{N_j}   for g_j >> N_j
g_j! / (g_j − N_j)! = g_j (g_j − 1) … (g_j − N_j + 1)  ≈  g_j^{N_j}            for g_j >> N_j
THEREFORE
w|B-E ≈ w|F-D ≈ Π_j ( g_j^{N_j} / N_j! ) = w|Boltz / N!
AND
N_j*|B-E ≈ N_j*|F-D ≈ N_j*|Boltz = g_j e^{−α} e^{−β ε′_j}
129
KINETIC THEORY OF GASES    SOLO
THE MOST PROBABLE MACROSTATE –
THE THERMODYNAMIC EQUILIBRIUM STATE
w|B-E ≈ w|F-D ≈ Π_j ( g_j^{N_j} / N_j! ) = w|Boltz / N!
AND
N_j*|B-E ≈ N_j*|F-D ≈ N_j*|Boltz = g_j e^{−α} e^{−β ε′_j}
DIVIDING THE VALUE OF w FOR BOLTZMANN STATISTICS, WHICH
ASSUMED DISTINGUISHABLE PARTICLES, BY N! HAS THE EFFECT OF
DISCOUNTING THE DISTINGUISHABILITY OF THE N PARTICLES.
Table of Content
130
SOLO Review of Probability
Monte Carlo Method
Monte Carlo methods are a class of computational algorithms that
rely on repeated random sampling to compute their results. Monte
Carlo methods are often used when simulating physical and
mathematical systems. Because of their reliance on repeated
computation and random or pseudo-random numbers, Monte Carlo
methods are most suited to calculation by a computer. Monte Carlo
methods tend to be used when it is infeasible or impossible to
compute an exact result with a deterministic algorithm.
The term Monte Carlo method was coined in the 1940s by physicists Stanislaw Ulam,
Enrico Fermi, John von Neumann, and Nicholas Metropolis, working on nuclear
weapon projects in the Los Alamos National Laboratory (reference to the Monte Carlo
Casino in Monaco where Ulam's uncle would borrow money to gamble)
Stanislaw Ulam
1909 - 1984
Enrico Fermi
1901 - 1954
John von Neumann
1903 - 1957
Monte Carlo Casino
Nicholas Constantine Metropolis
(1915 –1999)
131
SOLO Review of Probability
Monte Carlo Approximation
Monte Carlo runs generate a set of random samples that approximate the distribution p (x):
x^(L) ~ p (x)   (the x^(L) are generated (drawn) samples from the distribution p (x))
So, with P samples, expectations with respect to the distribution are approximated by
∫ f(x) p(x) dx ≈ (1/P) Σ_{L=1}^P f(x^(L))
and, in the usual way for Monte Carlo, this gives all the moments etc. of the distribution
up to some degree of approximation:
μ₁ = E{x} = ∫ x p(x) dx ≈ (1/P) Σ_{L=1}^P x^(L)
μ_n = E{(x − μ₁)^n} = ∫ (x − μ₁)^n p(x) dx ≈ (1/P) Σ_{L=1}^P (x^(L) − μ₁)^n
Table of Content
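The sample-average approximation above can be sketched directly (not part of the original slides; the target density, seed, and tolerances below are arbitrary choices):

```python
import numpy as np

# Draw P samples x^(L) ~ p(x) and approximate the first two moments
# by sample averages, as on the slide.
rng = np.random.default_rng(9)
P = 200_000
x = rng.normal(2.0, 0.5, size=P)       # p(x) = N(2, 0.25), chosen arbitrarily

mu1 = x.mean()                         # (1/P) * sum of x^(L)
mu2 = ((x - mu1) ** 2).mean()          # second central moment
assert abs(mu1 - 2.0) < 0.01
assert abs(mu2 - 0.25) < 0.01
```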
132
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (Unknown Statistics)
A random variable, x, may take on any values in the range − ∞ to + ∞.
Based on a sample of k values, x_i, i = 1,2,…,k, we wish to compute the sample mean, m̂_k,
and sample variance, σ̂_k², as estimates of the population mean, m, and variance, σ².
Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples:
E{x_i} = E{x_j} = m,   E{x_i²} = E{x_j²} = σ² + m²   ∀ i, j
E{x_i x_j} = E{x_i} E{x_j} = m²   ∀ i ≠ j   (x_i, x_j independent)
Define the estimation of the population mean:
m̂_k := (1/k) Σ_{i=1}^k x_i
Compute
E{m̂_k} = (1/k) Σ_{i=1}^k E{x_i} = m        Unbiased
E{ (1/k) Σ_{i=1}^k (x_i − m̂_k)² } = E{ (1/k) Σ_{i=1}^k [(x_i − m) − (m̂_k − m)]² }
  = (1/k) Σ_{i=1}^k E{(x_i − m)²} − E{(m̂_k − m)²} = σ² − σ²/k = [(k−1)/k] σ²   Biased
133
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 1)
We found
E{m̂_k} = m        Unbiased
E{ (1/k) Σ_{i=1}^k (x_i − m̂_k)² } = [(k−1)/k] σ²        Biased
Therefore, the unbiased estimation of the sample variance of the population is defined as:
σ̂_k² := [1/(k−1)] Σ_{i=1}^k (x_i − m̂_k)²
since
E{σ̂_k²} = E{ [1/(k−1)] Σ_{i=1}^k (x_i − m̂_k)² } = σ²        Unbiased
Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
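The bias of the 1/k estimator and the unbiasedness of the 1/(k−1) estimator can be seen by averaging both over many repeated samples (a sketch, not part of the original slides; seed and tolerances arbitrary):

```python
import numpy as np

# Many independent samples of size k from N(0, 1); average each variance
# estimator over the trials to approximate its expectation.
rng = np.random.default_rng(10)
k, trials, sigma2 = 5, 200_000, 1.0
x = rng.normal(0.0, 1.0, size=(trials, k))

biased = x.var(axis=1, ddof=0).mean()      # divides by k
unbiased = x.var(axis=1, ddof=1).mean()    # divides by k-1
assert abs(biased - (k - 1) / k * sigma2) < 0.01     # E = (k-1)/k * sigma^2
assert abs(unbiased - sigma2) < 0.01                 # E = sigma^2
```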
134
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 2)
A random variable, x, may take on any values in the range − ∞ to + ∞.
Based on a sample of k values, x_i, i = 1,2,…,k, we wish to compute the sample mean, m̂_k,
and sample variance, σ̂_k², as estimates of the population mean, m, and variance, σ².
E{m̂_k} = E{ (1/k) Σ_{i=1}^k x_i } = m
E{σ̂_k²} = E{ [1/(k−1)] Σ_{i=1}^k (x_i − m̂_k)² } = σ²
Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
135
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 3)
We found:
E{m̂_k} = m        E{σ̂_k²} = σ²
Let us compute the variance of the mean estimate:
σ²_{m̂k} := E{(m̂_k − m)²} = E{ [ (1/k) Σ_{i=1}^k (x_i − m) ]² }
  = (1/k²) Σ_{i=1}^k E{(x_i − m)²} + (1/k²) Σ_{i=1}^k Σ_{j≠i} E{(x_i − m)(x_j − m)}
  = (1/k²) k σ² + 0 = σ²/k
σ²_{m̂k} := E{(m̂_k − m)²} = σ²/k
136
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)
Let us compute the variance of the variance estimate:
σ²_{σ̂k²} := E{(σ̂_k² − σ²)²}
Using the identity  Σ_{i=1}^k (x_i − m̂_k)² = Σ_{i=1}^k (x_i − m)² − k (m̂_k − m)²,
σ̂_k² − σ² = [1/(k−1)] Σ_{i=1}^k (x_i − m)² − [k/(k−1)] (m̂_k − m)² − σ²
Since (x_i − m), (x_j − m) and (m̂_k − m) are (approximately) independent for i ≠ j,
squaring, taking expectations term by term and using E{(x_i − m)⁴} = μ₄ gives
σ²_{σ̂k²} = (1/k) [ μ₄ − σ⁴ (k−3)/(k−1) ]
where μ₄ := E{(x_i − m)⁴}.
137
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)
Since (x_i − m), (x_j − m) and (m̂_k − m) are (approximately) independent for i ≠ j,
carrying out the expansion term by term yields
σ²_{σ̂k²} = (1/k) [ μ₄ − σ⁴ (k−3)/(k−1) ]
and, for large k,
σ²_{σ̂k²} ≈ (μ₄ − σ⁴)/k,      μ₄ := E{(x_i − m)⁴}
138
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 5)
We found:
E{m̂_k} = m        E{σ̂_k²} = σ²
σ²_{m̂k} := E{(m̂_k − m)²} = σ²/k
σ²_{σ̂k²} := E{(σ̂_k² − σ²)²} ≈ (μ₄ − σ⁴)/k,      μ₄ := E{(x_i − m)⁴}
Define the Kurtosis of the random variable x_i:
λ := μ₄/σ⁴
Then
σ²_{σ̂k²} ≈ (λ − 1) σ⁴ / k
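The approximation σ²_{σ̂k²} ≈ (λ−1)σ⁴/k can be checked empirically by computing the sample variance of many size-k samples and comparing its spread with the prediction (a sketch, not part of the original slides; Gaussian samples, so λ = 3, seed and tolerances arbitrary):

```python
import numpy as np

# Variance of the sample-variance estimator, compared with (lambda-1)*sigma^4/k.
rng = np.random.default_rng(11)
k, trials = 50, 100_000
x = rng.normal(0.0, 1.0, size=(trials, k))    # sigma = 1, kurtosis lambda = 3

s2 = x.var(axis=1, ddof=1)                    # unbiased sample variance per trial
lam, sigma4 = 3.0, 1.0
predicted = (lam - 1.0) * sigma4 / k          # = 0.04
assert abs(s2.var() - predicted) < 0.005
```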
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 6)
For high values of k, according to the Central Limit Theorem, the estimations of the mean
m̂_k and of the variance σ̂_k² are approximately Gaussian Random Variables:
(m̂_k − m) ~ N(0, σ²/k)   &   (σ̂_k² − σ²) ~ N(0, (λ−1)σ⁴/k)
We want to find a region around σ̂_k² that
will contain σ² with a predefined probability
φ as a function of the number of iterations k:
Prob[ 0 ≤ |σ̂_k² − σ²| ≤ n_σ σ_{σ̂k²} ] = φ
Since the σ̂_k² are approximately Gaussian Random
Variables, n_σ is given by solving:
(1/√(2π)) ∫_{−n_σ}^{+n_σ} exp(−ζ²/2) dζ = φ

Cumulative Probability within n_σ
Standard Deviations of the Mean for a
Gaussian Random Variable
n_σ       φ
1.000     0.6827
1.645     0.9000
1.960     0.9500
2.576     0.9900

This gives
σ̂_k² − n_σ √((λ−1)/k) σ² ≤ σ² ≤ σ̂_k² + n_σ √((λ−1)/k) σ²
or
[1 − n_σ √((λ−1)/k)] σ² ≤ σ̂_k² ≤ [1 + n_σ √((λ−1)/k)] σ²
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 7)
Prob[ 0 ≤ |σ̂_k² − σ²| ≤ n_σ σ_{σ̂k²} ] = φ,      σ_{σ̂k²} = √((λ−1)/k) σ²
[1 − n_σ √((λ−1)/k)] σ² ≤ σ̂_k² ≤ [1 + n_σ √((λ−1)/k)] σ²
Solving for σ²:
σ̂_k² / [1 + n_σ √((λ−1)/k)] ≤ σ² ≤ σ̂_k² / [1 − n_σ √((λ−1)/k)]
or, in terms of σ:
σ_min := σ̂_k / √(1 + n_σ √((λ−1)/k))  ≤  σ  ≤  σ̂_k / √(1 − n_σ √((λ−1)/k)) =: σ_max
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 8)
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 9)
143
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 10)
Monte-Carlo Procedure
1  Choose the Confidence Level φ and find the corresponding n_σ
using the normal (Gaussian) distribution.
n_σ       φ
1.000     0.6827
1.645     0.9000
1.960     0.9500
2.576     0.9900
2  Run a few samples k₀ > 20 and estimate λ according to
m̂_{k₀} := (1/k₀) Σ_{i=1}^{k₀} x_i
λ̂_{k₀} := [ (1/k₀) Σ_{i=1}^{k₀} (x_i − m̂_{k₀})⁴ ] / [ (1/k₀) Σ_{i=1}^{k₀} (x_i − m̂_{k₀})² ]²
3  Compute σ_max and σ_min as functions of k:
σ_max := σ̂_{k₀} / √(1 − n_σ √((λ̂−1)/k))   &   σ_min := σ̂_{k₀} / √(1 + n_σ √((λ̂−1)/k))
4  Find k for which
Prob[ 0 ≤ |σ̂_k² − σ²| ≤ n_σ σ_{σ̂k²} ] = φ
5  Run k − k₀ simulations
144
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 11)
Monte-Carlo Procedure
Example:
Assume a Gaussian distribution, λ = 3.
1  Choose the Confidence Level φ = 95%, which gives the
corresponding n_σ = 1.96.
n_σ       φ
1.000     0.6827
1.645     0.9000
1.960     0.9500
2.576     0.9900
2  The kurtosis λ = 3
3  Find k for which
Prob[ 0 ≤ |σ̂_k² − σ²| ≤ 1.96 √(2/k) σ² ] = 0.95
Assume also that we require that, with probability φ = 95 %,
|σ̂_k² − σ²| ≤ 0.1 σ²
1.96 √(2/k) = 0.1   →   k ≈ 800
4  Run k > 800 simulations
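The sample-size arithmetic of the example reads directly as code (a sketch, not part of the original slides; the slide rounds the exact value up to 800):

```python
import math

# Solve 1.96 * sqrt((lambda-1)/k) = eps for k, with lambda = 3 (Gaussian)
# and a required relative accuracy eps = 0.1 at 95% confidence.
n_sigma, lam, eps = 1.96, 3.0, 0.1
k = (n_sigma * math.sqrt(lam - 1.0) / eps) ** 2
assert 760 < k < 800          # exact value approx. 768; slide rounds to 800
```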
145
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 12)
Kurtosis of random variable xi
Kurtosis
Kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning bulging) is a measure
of the "peakedness" of the probability distribution of a real-valued random variable.
Higher kurtosis means more of the variance is due to infrequent extreme deviations, as
opposed to frequent modestly-sized deviations.
1905 Pearson defines Kurtosis,
as a measure of departure from normality in a paper published in
Biometrika. λ=3 for the normal distribution and the terms
‘leptokurtic’ (λ>3), mesokurtic (λ=3), platikurtic (λ<3) are
introduced.
( ){ } ( ){ }[ ]224
/: mxEmxE ii −−=λ
( ){ }
( ){ }[ ]22
4
:
mxE
mxE
i
i
−
−
=λ
Karl Pearson
(1857 –1936)
A leptokurtic distribution has a more acute "peak" around the mean (that is,
a higher probability than a normally distributed variable of values near the
mean) and "fat tails" (that is, a higher probability than a normally distributed
variable of extreme values).
A platykurtic distribution has a smaller "peak" around the mean (that is, a lower
probability than a normally distributed variable of values near the mean) and
"thin tails" (that is, a lower probability than a normally distributed variable of
extreme values).
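The moment-ratio definition above can be checked numerically; a minimal Python sketch (the function name is mine, not from the slides):

```python
import random

def kurtosis(xs):
    # Pearson kurtosis: E{(x-m)^4} / [E{(x-m)^2}]^2 (equals 3 for a Gaussian).
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in xs) / n   # fourth central moment
    return m4 / m2 ** 2

random.seed(0)
normal = [random.gauss(0.0, 1.0) for _ in range(100_000)]
# Laplace(0,1) as a random sign times an Exp(1) draw; its kurtosis is 6.
laplace = [random.choice((-1, 1)) * random.expovariate(1.0) for _ in range(100_000)]
# Expect ~3 for the normal (mesokurtic) and ~6 for the Laplace (leptokurtic).
```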
146
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 13)

Distribution        Functional Representation                                                              Kurtosis λ   Excess Kurtosis λ−3

Normal              $\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{\left(x-\mu\right)^2}{2\sigma^2}\right)$        3            0
Laplace             $\frac{1}{2b}\exp\left(-\frac{\left|x-\mu\right|}{b}\right)$                                 6            3
Uniform             $\frac{1}{b-a}$ for $a\le x\le b$;  0 for $x<a$ or $x>b$                                     1.8          −1.2
Wigner semicircle   $\frac{2}{\pi R^2}\sqrt{R^2-x^2}$ for $\left|x\right|\le R$;  0 for $\left|x\right|>R$       2            −1
Hyperbolic-Secant   $\frac{1}{2}\operatorname{sech}\left(\frac{\pi x}{2}\right)$                                 5            2

[The Graphical Representation column of the original slide is not reproduced.]
147
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 14)
Skewness of random variable xi
Skewness
$\gamma:=\dfrac{E\left\{\left(x_i-m\right)^3\right\}}{\left[E\left\{\left(x_i-m\right)^2\right\}\right]^{3/2}}$

Karl Pearson
(1857 –1936)
Negative skew: The left tail is longer; the mass of the distribution is concentrated on
the right of the figure. The distribution is said to be left-skewed. There is more data
in the left tail than would be expected in a normal distribution.

Positive skew: The right tail is longer; the mass of the distribution is concentrated
on the left of the figure. The distribution is said to be right-skewed. There is more data
in the right tail than would be expected in a normal distribution.
Karl Pearson suggested two simpler calculations as a measure of skewness:
• (mean - mode) / standard deviation
• 3 (mean - median) / standard deviation
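The moment skewness γ can likewise be estimated from a sample; a minimal sketch (the function name is mine, not from the slides):

```python
import random

def skewness(xs):
    # Pearson moment skewness: E{(x-m)^3} / [E{(x-m)^2}]^(3/2).
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5

random.seed(1)
symmetric = [random.gauss(0.0, 1.0) for _ in range(100_000)]       # skewness ~0
right_skewed = [random.expovariate(1.0) for _ in range(100_000)]   # Exp(1): skewness 2
```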
148
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a
Recursive Filter (Unknown Statistics)
We found that using k measurements the estimated mean and variance are given in
batch form by:

$\hat x_k:=\frac{1}{k}\sum_{i=1}^{k}x_i\qquad\qquad p_k:=\frac{1}{k-1}\sum_{i=1}^{k}\left(x_i-\hat x_k\right)^2$

A random variable, x, may take on any values in the range −∞ to +∞.
Based on a sample of k values, xi, i = 1,2,…,k, we wish to estimate the sample mean, $\hat x_k$,
and the variance pk, by a Recursive Filter.

The k+1 measurement will give:

$\hat x_{k+1}=\frac{1}{k+1}\sum_{i=1}^{k+1}x_i=\frac{1}{k+1}\left(k\,\hat x_k+x_{k+1}\right)$

Therefore the Recursive Filter form for the k+1 measurement will be:

$\hat x_{k+1}=\hat x_k+\frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$

and for the variance:

$p_{k+1}=\frac{1}{k}\sum_{i=1}^{k+1}\left(x_i-\hat x_{k+1}\right)^2$
149
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a
Recursive Filter (Unknown Statistics) (continue – 1)
We found that using k+1 measurements the estimated variance is given in
batch form by:

$p_{k+1}=\frac{1}{k}\sum_{i=1}^{k+1}\left(x_i-\hat x_{k+1}\right)^2$

with the mean recursion

$\hat x_{k+1}=\hat x_k+\frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$

Substituting $\hat x_{k+1}-\hat x_k=\frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$ and using $\sum_{i=1}^{k}\left(x_i-\hat x_k\right)=0$:

$p_{k+1}=\frac{1}{k}\sum_{i=1}^{k+1}\left[\left(x_i-\hat x_k\right)-\frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)\right]^2=\frac{1}{k}\left[\left(k-1\right)p_k+\frac{k}{k+1}\left(x_{k+1}-\hat x_k\right)^2\right]$

Therefore the Recursive Filter form for the variance at the k+1 measurement is:

$p_{k+1}=p_k+\frac{1}{k+1}\left[\left(x_{k+1}-\hat x_k\right)^2-\frac{k+1}{k}\,p_k\right]$
150
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a
Recursive Filter (Unknown Statistics) (continue – 2)
A random variable, x, may take on any values in the range −∞ to +∞.
Based on a sample of k values, xi, i = 1,2,…,k, we estimate the sample mean, $\hat x_k$,
and the variance pk, by the Recursive Filter:

$\hat x_{k+1}=\hat x_k+\frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$

$p_{k+1}=p_k+\frac{1}{k+1}\left[\left(x_{k+1}-\hat x_k\right)^2-\frac{k+1}{k}\,p_k\right]$

Using $x_{k+1}-\hat x_k=\left(k+1\right)\left(\hat x_{k+1}-\hat x_k\right)$, the variance update can also be written as:

$p_{k+1}=p_k+\left(k+1\right)\left(\hat x_{k+1}-\hat x_k\right)^2-\frac{1}{k}\,p_k$
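The two recursions can be checked against the batch formulas; a minimal one-pass Python sketch (the function name is mine, not from the slides):

```python
import random

def recursive_mean_var(samples):
    # One-pass Recursive Filter for the sample mean and variance:
    #   x^_{k+1} = x^_k + (x_{k+1} - x^_k)/(k+1)
    #   p_{k+1}  = p_k + [ (x_{k+1} - x^_k)^2 - ((k+1)/k) p_k ] / (k+1)
    x_hat = samples[0]
    p = 0.0                                   # variance undefined for one sample
    for k, x in enumerate(samples[1:], start=1):   # k samples already processed
        d = x - x_hat
        x_hat += d / (k + 1)
        p += (d * d - (k + 1) / k * p) / (k + 1)
    return x_hat, p

random.seed(2)
xs = [random.gauss(5.0, 2.0) for _ in range(50_000)]
mean, var = recursive_mean_var(xs)   # should match the batch estimates
```

The recursion reproduces the batch mean and the 1/(k−1)-normalized batch variance exactly (up to floating-point round-off), while storing only two numbers.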
151
SOLO Review of Probability
Estimate the value of a constant x, given discrete measurements of x corrupted by an
uncorrelated gaussian noise sequence with zero mean and variance r0.
The scalar equations describing this situation are:

System:        $x_{k+1}=x_k$   (general form $x_{k+1}=\Phi_k\,x_k+\Gamma_k\,w_k$, with $\Phi_k=1$, $w_k=0$)
Measurement:   $z_k=x_k+v_k$,   $v_k\sim N\left(0,r_0\right)$   (general form $z_k=H_k\,x_k+v_k$, with $H_k=1$)

The Discrete Kalman Filter is given by:

Prediction:

$\hat x_{k+1}\left(-\right)=\hat x_k\left(+\right)$

$p_{k+1}\left(-\right)=\Phi_k\,p_k\left(+\right)\Phi_k^T+\Gamma_k\,Q\,\Gamma_k^T=p_k\left(+\right)\qquad\left(\Phi_k=1,\ Q=0\right)$

Update:

$\hat x_{k+1}\left(+\right)=\hat x_{k+1}\left(-\right)+\underbrace{p_{k+1}\left(-\right)\left[p_{k+1}\left(-\right)+r_0\right]^{-1}}_{K_{k+1}}\left[z_{k+1}-\hat x_{k+1}\left(-\right)\right]$

$p_{k+1}\left(+\right)=p_{k+1}\left(-\right)-p_{k+1}\left(-\right)H^T\left[H\,p_{k+1}\left(-\right)H^T+r_0\right]^{-1}H\,p_{k+1}\left(-\right)=\frac{p_{k+1}\left(-\right)r_0}{p_{k+1}\left(-\right)+r_0}$
General Form
with Known Statistics Moments Using a Discrete Recursive Filter
Estimation of the Mean and Variance of a Random Variable
152
SOLO Review of Probability
Estimate the value of a constant x, given discrete measurements of x corrupted by an
uncorrelated gaussian noise sequence with zero mean and variance r0.
We found that the Discrete Kalman Filter is given by:

$\hat x_{k+1}\left(+\right)=\hat x_k\left(+\right)+K_{k+1}\left[z_{k+1}-\hat x_k\left(+\right)\right]$

$p_{k+1}\left(+\right)=\frac{p_k\left(+\right)r_0}{p_k\left(+\right)+r_0}\qquad K_{k+1}=\frac{p_k\left(+\right)}{p_k\left(+\right)+r_0}$

Starting from $p_0\left(+\right)=p_0$:

k = 0:   $p_1\left(+\right)=\frac{p_0\,r_0}{p_0+r_0}=\frac{p_0}{1+p_0/r_0}$

k = 1:   $p_2\left(+\right)=\frac{p_1\left(+\right)r_0}{p_1\left(+\right)+r_0}=\frac{p_0}{1+2\,p_0/r_0}$

and in general:

$p_k\left(+\right)=\frac{p_0}{1+k\,p_0/r_0}\qquad K_{k+1}=\frac{p_k\left(+\right)}{p_k\left(+\right)+r_0}=\frac{p_0/r_0}{1+\left(k+1\right)p_0/r_0}$

Therefore:

$\hat x_{k+1}\left(+\right)=\hat x_k\left(+\right)+\frac{p_0/r_0}{1+\left(k+1\right)p_0/r_0}\left[z_{k+1}-\hat x_k\left(+\right)\right]$
with Known Statistics Moments Using a Discrete Recursive Filter (continue – 1)
Estimation of the Mean and Variance of a Random Variable
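The discrete filter for a constant can be simulated directly; a minimal sketch (the function name is mine, and the chosen x, r0, p0 values are illustrative):

```python
import random

def estimate_constant(x_true, r0, p0, n, rng):
    # Discrete Kalman filter for x_{k+1} = x_k observed as z_k = x_k + v_k,
    # v_k ~ N(0, r0).  With no process noise, p_k(+) has the closed form
    # p0 / (1 + k p0 / r0).
    x_hat = 0.0
    p = p0
    for _ in range(n):
        z = x_true + rng.gauss(0.0, r0 ** 0.5)
        K = p / (p + r0)             # gain from the predicted covariance
        x_hat = x_hat + K * (z - x_hat)
        p = p * r0 / (p + r0)        # p_{k+1}(+) = p_k(+) r0 / (p_k(+) + r0)
    return x_hat, p

rng = random.Random(3)
x_hat, p = estimate_constant(x_true=7.0, r0=1.0, p0=1000.0, n=2000, rng=rng)
# p should match the closed form p0 / (1 + n p0 / r0)
```

With a large initial p0 the estimate is essentially the running sample mean of the measurements, which is exactly the recursive mean filter derived earlier.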
153
SOLO Review of Probability
Estimate the value of a constant x, given continuous measurements of x corrupted by an
uncorrelated gaussian noise process with zero mean and variance r.
The scalar equations describing this situation are:

System:        $\dot x=0$   (general form $\dot x=A\,x+\Gamma\,w$, with $A=0$, $w=0$)
Measurement:   $z=x+v$,   $v\sim N\left(0,r\right)$   (general form $z=H\,x+v$, with $H=1$)

The Continuous Kalman Filter is given by:

$\dot{\hat x}\left(t\right)=\underbrace{A}_{0}\hat x\left(t\right)+\underbrace{p\left(t\right)H^T r^{-1}}_{K\left(t\right)}\left[z\left(t\right)-\hat x\left(t\right)\right]\qquad \hat x\left(0\right)=0$

$p\left(t\right):=E\left\{\left[x\left(t\right)-\hat x\left(t\right)\right]\left[x\left(t\right)-\hat x\left(t\right)\right]^T\right\}$

$\dot p\left(t\right)=A\,p\left(t\right)+p\left(t\right)A^T+G\,Q\,G^T-p\left(t\right)H^T r^{-1}H\,p\left(t\right)=-p^2\left(t\right)r^{-1}$

or:

$\dot p=-p^2\,r^{-1}\qquad p\left(0\right)=p_0$

$\int_{p_0}^{p}\frac{d\,p}{p^2}=-\frac{1}{r}\int_0^{t}d\,t\qquad\Rightarrow\qquad p\left(t\right)=\frac{p_0}{1+\dfrac{p_0}{r}\,t}$

$K\left(t\right)=p\left(t\right)r^{-1}=\frac{p_0/r}{1+\dfrac{p_0}{r}\,t}$

$\dot{\hat x}\left(t\right)=\frac{p_0/r}{1+\dfrac{p_0}{r}\,t}\left[z-\hat x\left(t\right)\right]$
154
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
• First attempts to generate “random numbers”:
- Draw balls out of a stirred urn
- Roll dice
• 1927: L.H.C. Tippett published a table of 40,000 digits taken “at random” from
census reports.
• 1939: M.G. Kendall and B. Babington-Smith create a mechanical machine to
generate random numbers. They published a table of 100,000 digits.
• 1946: J. Von Neumann proposed the “middle square method”.
• 1948: D.H. Lehmer introduced the “linear congruential method”.
• 1955: RAND Corporation published a table of 1,000,000 random digits obtained
from electronic noise.
• 1965: M.D. MacLaren and G. Marsaglia proposed to combine two congruential
generators.
• 1989: R.S. Wikramaratna proposed the additive congruential method.
Routine RANDU (IBM Corp)
“We guarantee that each number is random individually, but we don’t guarantee
that more than one of them is random”
155
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
On a computer the “random numbers” are not random at all – they are strictly
deterministic and reproducible, but they look like a stream of random numbers.
For this reason the computer programs are called “Pseudo-Random Number Generators”.
Essential Properties of a Pseudo-Random Number Generator
Repeatability – the same sequence should be produced with the same initial values
(or seeds)
Randomness – should produce independent uniformly distributed random variables
that pass all statistical tests for randomness.
Long Period – a pseudo-random number sequence uses finite precision arithmetic,
so the sequence must repeat itself with a finite period. This should be
much longer than the amount of random numbers needed for simulation.
Insensitive to seeds – period and randomness properties should not depend on the
initial seeds.
156
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
Essential Properties of a Pseudo-Random Number Generator (continue -1)
Portability – should give the same results on different computers
Efficiency – should be fast (small number of floating point operations) and not use
much memory.
Disjoint subsequences – different seeds should produce long independent (disjoint)
subsequences so that there are no correlations between simulations
with different initial seeds.
Homogeneity – sequences of all bits should be random.
157
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
A Random Number represents the value of a random variable uniform distributed on (0,1).
Pseudo-Random Numbers constitute a sequence of values, which although are
deterministically generated, have all the appearances of being independent uniform
distributed on (0,1).
One approach: the Multiplicative congruential method

1. Define x0 = integer initial condition or seed.
2. Using integers a and m recursively compute

   $x_n=a\,x_{n-1}\ \left(\mathrm{modulo}\ m\right)$,   i.e.   $a\,x_{n-1}=m\cdot k+x_n$,   k Integer,   $x_n<m$

Therefore xn takes the values 0,1,…,m−1, and the quantity un = xn/m, called a pseudo-random
number, is an approximation to the value of a uniform (0,1) random variable.

In general the integers a and m should be chosen to satisfy three criteria:

1. For any initial seed, the resultant sequence has the “appearance” of being a sequence
of independent uniform (0,1) random variables.
2. For any initial seed, the number of variables that can be generated before repetition
begins is large.
3. The values can be computed efficiently on a digital computer.
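The recursion takes only a few lines; a minimal sketch using the classic “minimal standard” constants a = 7⁵ = 16,807, m = 2³¹ − 1 quoted on the next slide:

```python
def lcg(seed, a=16807, m=2**31 - 1):
    # Multiplicative congruential generator: x_n = a x_{n-1} mod m.
    # u_n = x_n / m approximates a uniform (0,1) random variable.
    x = seed
    while True:
        x = (a * x) % m
        yield x / m

gen = lcg(seed=12345)
us = [next(gen) for _ in range(100_000)]
mean = sum(us) / len(us)   # should be close to 1/2 for uniform (0,1)
```

The first output is deterministic: x1 = 16807 · 12345 mod (2³¹ − 1) = 207,482,415, which illustrates the “strictly deterministic and reproducible” point made on the next slide.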
158
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators (continue – 1)
A guideline is to choose m to be a large prime number close to the computer word size.

Examples:

32 bits word computer:  m = 2³¹ − 1,  a = 7⁵ = 16,807    (some IBM systems)
36 bits word computer:  m = 2³⁵ − 31,  a = 5⁵ = 3,125

Another generator of pseudo-random numbers uses recursions of the type:

$x_n=\left(a\,x_{n-1}+c\right)\ \left(\mathrm{modulo}\ m\right)$,   i.e.   $a\,x_{n-1}+c=m\cdot k+x_n$,   k Integer,   $x_n<m$

Mixed congruential method

32 bits word computer:  m = 2³²,  a = 69,069                      (VAX)
32 bits word computer:  m = 2³²,  a = 1,664,525                   (transputers)
48 bits word computer:  m = 2⁴⁸,  a = 5DEECE66D₁₆,  c = B₁₆       (UNIX, RAND 48 routine)
48 bits word computer:  m = 2⁴⁷,  a = 5¹⁵,  c = 0                 (CDC vector machines)
48 bits word computer:  m = 2⁴⁸,  a = 2875A2E7B175₁₆,  c = 0      (Cray vector machines)
64 bits word computer:  m = 2⁵⁹,  a = 13¹³,  c = 0                (Numerical Algorithms Group)
Return to Table of Content
159
SOLO Review of Probability
Generating Discrete Random Variables
Histograms
A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what
proportion of cases fall into each of several categories: it is a form of data binning. The categories
are usually specified as non-overlapping intervals of some variable. The categories (bars) must be
adjacent. The intervals (or bands, or bins) are generally of the same size.

Histograms are used to plot density of data, and often for density estimation: estimating the
probability density function of the underlying variable. The total area of a histogram always
equals 1. If the length of the intervals on the x-axis are all 1, then a histogram is identical to a
relative frequency plot.

Mathematical Definition

In a more general mathematical sense, a histogram is a mapping mi that counts the number of
observations that fall into various disjoint categories (known as bins), whereas the graph of a
histogram is merely one way to represent a histogram. Thus, if we let n be the total number of
observations and k be the total number of bins, the histogram mi meets the following condition:

$n=\sum_{i=1}^{k}m_i$

A cumulative histogram is a mapping that counts the cumulative number of observations in all
of the bins up to the specified bin. That is, the cumulative histogram Mi of a histogram mi is
defined as:

$M_i=\sum_{j=1}^{i}m_j$

[Figure: An ordinary and a cumulative histogram of the same data. The data shown is a random
sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1.]

Return to Table of Content
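A minimal sketch of a histogram and its cumulative histogram (the function names are mine, not from the slides):

```python
import random

def histogram(data, bins, lo, hi):
    # Count observations falling in each of `bins` equal bins on [lo, hi);
    # sum(m) equals the number of in-range observations.
    width = (hi - lo) / bins
    m = [0] * bins
    for x in data:
        if lo <= x < hi:
            m[int((x - lo) / width)] += 1
    return m

def cumulative(m):
    # Cumulative histogram: M_i = sum_{j<=i} m_j.
    M, total = [], 0
    for mi in m:
        total += mi
        M.append(total)
    return M

random.seed(4)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]   # as in the figure caption
m = histogram(data, bins=20, lo=-4.0, hi=4.0)
M = cumulative(m)
```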
160
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method
Suppose we want to generate a discrete random variable X
having probability density function:

$p\left(x\right)=\sum_j p_j\,\delta\left(x-x_j\right),\qquad j=0,1,\dots,\qquad \sum_j p_j=1$

To accomplish this, let us generate a random number U that is uniformly distributed
over (0,1) and set:

$X=\begin{cases}x_0 & \text{if }U<p_0\\ x_1 & \text{if }p_0\le U<p_0+p_1\\ \ \ \vdots\\ x_j & \text{if }\sum_{i=0}^{j-1}p_i\le U<\sum_{i=0}^{j}p_i\\ \ \ \vdots\end{cases}$

Since U is uniformly distributed, P (a ≤ U < b) = b − a for any a and b such that
0 < a < b < 1, and we have:

$P\left(X=x_j\right)=P\left(\sum_{i=0}^{j-1}p_i\le U<\sum_{i=0}^{j}p_i\right)=p_j$

and so X has the desired distribution.
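The cumulative-sum search above can be sketched directly (the function name is mine, and the example pmf is illustrative):

```python
import random

def draw_discrete(values, probs, u):
    # Inverse transform: return x_j where sum_{i<j} p_i <= u < sum_{i<=j} p_i.
    acc = 0.0
    for x, p in zip(values, probs):
        acc += p
        if u < acc:
            return x
    return values[-1]   # guard against floating-point round-off at u ~ 1

random.seed(5)
values, probs = [10, 20, 30], [0.2, 0.5, 0.3]
draws = [draw_discrete(values, probs, random.random()) for _ in range(50_000)]
freq = {v: draws.count(v) / len(draws) for v in values}
# Empirical frequencies should approach (0.2, 0.5, 0.3)
```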
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 1)
Suppose we want to generate a discrete random variable X
having probability density function:

$p\left(x\right)=\sum_j p_j\,\delta\left(x-x_j\right),\qquad j=0,1,\dots,\qquad \sum_j p_j=1$

Draw X, N times, from p (x), and plot the Histogram of the Results.
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 2)
Generating a Poisson Random Variable:

$p_i=P\left(X=i\right)=e^{-\lambda}\frac{\lambda^i}{i!},\qquad i=0,1,\dots,\qquad \sum_i p_i=1$

The ratio of successive probabilities gives a convenient recursion:

$\frac{p_{i+1}}{p_i}=\frac{e^{-\lambda}\lambda^{i+1}/\left(i+1\right)!}{e^{-\lambda}\lambda^{i}/\,i!}=\frac{\lambda}{i+1}$

Draw X, N times, from the Poisson Distribution, and plot the
Histogram of the Results.
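The recursion p_{i+1} = p_i λ/(i+1) avoids recomputing factorials inside the inverse-transform search; a minimal sketch (the function name is mine):

```python
import math
import random

def draw_poisson(lam, u):
    # Inverse transform using p_{i+1} = p_i * lam / (i+1), starting from
    # p_0 = exp(-lam); accumulate until the CDF exceeds u.
    i, p = 0, math.exp(-lam)
    acc = p
    while u >= acc:
        p *= lam / (i + 1)
        i += 1
        acc += p
    return i

random.seed(6)
lam = 4.0
draws = [draw_poisson(lam, random.random()) for _ in range(50_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
# For a Poisson distribution, mean and variance both equal lam
```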
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 3)
Generating a Binomial Random Variable:

$p_i=P\left(X=i\right)=\frac{n!}{i!\left(n-i\right)!}\,p^i\left(1-p\right)^{n-i},\qquad i=0,1,\dots,n,\qquad \sum_i p_i=1$

The ratio of successive probabilities gives a convenient recursion:

$\frac{p_{i+1}}{p_i}=\frac{\dfrac{n!}{\left(i+1\right)!\left(n-i-1\right)!}\,p^{i+1}\left(1-p\right)^{n-i-1}}{\dfrac{n!}{i!\left(n-i\right)!}\,p^{i}\left(1-p\right)^{n-i}}=\frac{n-i}{i+1}\cdot\frac{p}{1-p}$

[Figure: Histogram of the Results, P (k,n) for k = 0,1,…,14]

Return to Table of Content
164
SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique

Suppose we have an efficient method for simulating a random variable having a
probability density function { qj, j ≥ 0 }. We want to use this to obtain a random
variable that has the probability density function { pj, j ≥ 0 }.

Let c be a constant such that:

$\frac{p_j}{q_j}\le c\qquad\forall\,j\ \mathrm{s.t.}\ q_j\neq 0$

If such a c exists, it must satisfy:

$1=\sum_j p_j\le c\sum_j q_j=c\qquad\Rightarrow\qquad c\ge 1$

Rejection Method
Step 1: Simulate the value of Y, having probability density function qj.
Step 2: Generate a random number U (that is uniformly distributed
over (0,1) ).
Step 3: If U < pY/c qY, set X = Y and stop. Otherwise return to Step 1.
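The three steps above can be sketched as follows (the function name and the example pmfs are mine, not from the slides):

```python
import random

def accept_reject(target_p, proposal_q, draw_q, c, rng):
    # Rejection method: draw Y ~ q, accept with probability p_Y / (c q_Y),
    # otherwise draw again.
    while True:
        y = draw_q(rng)
        if rng.random() < target_p[y] / (c * proposal_q[y]):
            return y

rng = random.Random(7)
p = {0: 0.5, 1: 0.3, 2: 0.2}        # target pmf
q = {0: 1/3, 1: 1/3, 2: 1/3}        # easy-to-sample uniform proposal
c = max(p[j] / q[j] for j in p)     # smallest valid c; here c = 1.5
draws = [accept_reject(p, q, lambda r: r.randrange(3), c, rng)
         for _ in range(30_000)]
freq = [draws.count(j) / len(draws) for j in (0, 1, 2)]
```

Consistent with the theorem on the next slide, the long-run acceptance probability is 1/c, so small c (proposal close to the target) means few wasted draws.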
165
SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique (continue – 1)
Theorem
The random variable X obtained by the rejection method has probability density
function P { X=i } = pi.
Proof

Using Bayes:

$P\left\{X=i\right\}=P\left\{Y=i\,\middle|\,\mathrm{Acceptance}\right\}=\frac{P\left\{Y=i,\ \mathrm{Acceptance}\right\}}{P\left\{\mathrm{Acceptance}\right\}}$

By independence, and since U is uniformly distributed on (0,1):

$P\left\{Y=i,\ \mathrm{Acceptance}\right\}=P\left\{Y=i\right\}P\left\{U\le\frac{p_i}{c\,q_i}\right\}=q_i\,\frac{p_i}{c\,q_i}=\frac{p_i}{c}$

Therefore:

$P\left\{X=i\right\}=\frac{p_i}{c\,P\left\{\mathrm{Acceptance}\right\}}$

Summing over all i yields:

$1=\sum_i P\left\{X=i\right\}=\frac{\sum_i p_i}{c\,P\left\{\mathrm{Acceptance}\right\}}=\frac{1}{c\,P\left\{\mathrm{Acceptance}\right\}}\qquad\Rightarrow\qquad c\,P\left\{\mathrm{Acceptance}\right\}=1$

Hence:

$P\left\{X=i\right\}=p_i\qquad\mathrm{and}\qquad P\left\{\mathrm{Acceptance}\right\}=\frac{1}{c}\le 1$

q.e.d.
166
SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique (continue – 2)
Example

Generate a truncated Gaussian using the Accept-Reject method. Consider the case with

$p\left(x\right)\approx\begin{cases}\dfrac{e^{-x^2/2}}{\sqrt{2\pi}} & x\in\left[-4,4\right]\\ 0 & \mathrm{otherwise}\end{cases}$

Consider the Uniform proposal function

$q\left(x\right)=\begin{cases}1/8 & x\in\left[-4,4\right]\\ 0 & \mathrm{otherwise}\end{cases}$

In the Figure we can see the results of the Accept-Reject method using N = 10,000 samples.
Return to Table of Content
167
SOLO Review of Probability
Generating Continuous Random Variables
The Inverse Transform Algorithm
Let U be a uniform (0,1) random variable. For any continuous
distribution function F the random variable X defined by
$X=F^{-1}\left(U\right)$

has distribution F. [ F⁻¹(u) is defined to be that value of x such that F (x) = u ]
Proof
Let Px (x) denote the Probability Distribution Function of X = F⁻¹(U):

$P_x\left(x\right)=P\left\{X\le x\right\}=P\left\{F^{-1}\left(U\right)\le x\right\}$

Since F is a distribution function, it means that F (x) is a monotonic increasing
function of x, and so the inequality “a ≤ b” is equivalent to the inequality
“F (a) ≤ F (b)”; therefore

$P_x\left(x\right)=P\left\{F\left(F^{-1}\left(U\right)\right)\le F\left(x\right)\right\}\overset{F\left(F^{-1}\left(U\right)\right)=U}{=}P\left\{U\le F\left(x\right)\right\}\overset{U\sim\mathrm{uniform}\left(0,1\right)}{=}F\left(x\right),\qquad 0\le F\left(x\right)\le 1$
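A standard worked case of this algorithm is the exponential distribution, whose CDF inverts in closed form; a minimal sketch (the function name is mine, not from the slides):

```python
import math
import random

def draw_exponential(rate, u):
    # X = F^{-1}(U) with F(x) = 1 - exp(-rate*x), so x = -ln(1-u)/rate.
    return -math.log(1.0 - u) / rate

random.seed(8)
rate = 2.0
xs = [draw_exponential(rate, random.random()) for _ in range(100_000)]
mean = sum(xs) / len(xs)   # expect 1/rate = 0.5
```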
Return to Table of Content
168
SOLO Review of Probability
Generating Continuous Random Variables
The Acceptance-Rejection Technique

Suppose we have an efficient method for simulating a random variable having a
probability density function g (x). We want to use this to obtain a random
variable that has the probability density function f (x).

Let c be a constant such that:

$\frac{f\left(y\right)}{g\left(y\right)}\le c\qquad\forall\,y$

If such a c exists, it must satisfy:

$1=\int f\left(y\right)d\,y\le c\int g\left(y\right)d\,y=c\qquad\Rightarrow\qquad c\ge 1$
Rejection Method
Step 1: Simulate the value of Y, having probability density function g (Y).
Step 2: Generate a random number U (that is uniformly distributed
over (0,1) ).
Step 3: If U < f (Y)/c g (Y), set X = Y and stop. Otherwise return to Step 1.
169
SOLO Review of Probability
Generating Continuous Random Variables
The Acceptance-Rejection Technique (continue – 1)
Theorem

The random variable X obtained by the rejection method has probability density
function f (x).

Proof

Using Bayes:

$P\left\{X=y\right\}=P\left\{Y=y\,\middle|\,\mathrm{Acceptance}\right\}=\frac{P\left\{Y=y,\ \mathrm{Acceptance}\right\}}{P\left\{\mathrm{Acceptance}\right\}}$

By independence, and since U is uniformly distributed on (0,1):

$P\left\{Y=y,\ \mathrm{Acceptance}\right\}=g\left(y\right)P\left\{U\le\frac{f\left(y\right)}{c\,g\left(y\right)}\right\}=g\left(y\right)\frac{f\left(y\right)}{c\,g\left(y\right)}=\frac{f\left(y\right)}{c}$

Integrating over all y yields:

$1=\int P\left\{X=y\right\}d\,y=\frac{\int f\left(y\right)d\,y}{c\,P\left\{\mathrm{Acceptance}\right\}}=\frac{1}{c\,P\left\{\mathrm{Acceptance}\right\}}\qquad\Rightarrow\qquad c\,P\left\{\mathrm{Acceptance}\right\}=1$

Hence:

$P\left\{X=y\right\}=f\left(y\right)\qquad\mathrm{and}\qquad P\left\{\mathrm{Acceptance}\right\}=\frac{1}{c}\le 1$

q.e.d.
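The truncated-Gaussian example from a few slides back can be reproduced with this method; a minimal sketch (the function name is mine; the setup follows the slide's f on [−4,4] with uniform proposal g = 1/8, so c = 8/√(2π)):

```python
import math
import random

def truncated_gaussian(rng):
    # Accept-Reject: f(x) ~ exp(-x^2/2)/sqrt(2*pi) on [-4, 4],
    # proposal g(x) = 1/8 on [-4, 4], c = 8/sqrt(2*pi) (ratio at x = 0).
    # Then f(y)/(c g(y)) simplifies to f(y)/f(0) = exp(-y^2/2).
    f0 = 1.0 / math.sqrt(2.0 * math.pi)    # f(0) = c * g
    while True:
        y = rng.uniform(-4.0, 4.0)         # draw Y ~ g
        f = math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)
        if rng.random() < f / f0:          # accept with prob f(y)/(c g(y))
            return y

rng = random.Random(9)
xs = [truncated_gaussian(rng) for _ in range(20_000)]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean ** 2
# Truncation at +/-4 is mild, so mean ~0 and variance ~1
```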
Return to Table of Content
170
SOLO
The Bootstrap
• Popularized by Bradley Efron (1979)
• The Bootstrap is a name generically applied to statistical resampling schemes
that allow uncertainty in the data to be assessed from the data themselves, in
other words
“pulling yourself up by your bootstraps”
The disadvantage of bootstrapping is that while (under some conditions) it is
asymptotically consistent, it does not provide general finite-sample
guarantees, and has a tendency to be overly optimistic.The apparent
simplicity may conceal the fact that important assumptions are being made
when undertaking the bootstrap analysis (e.g. independence of samples)
where these would be more formally stated in other approaches.
The advantage of bootstrapping over analytical methods is its great simplicity - it is
straightforward to apply the bootstrap to derive estimates of standard errors and
confidence intervals for complex estimators of complex parameters of the
distribution, such as percentile points, proportions, odds ratio, and correlation
coefficients.
Generating Discrete Random Variables
Bradley Efron
1938
Stanford U.
Review of Probability
171
SOLO
The Bootstrap (continue -1)
• Given n observation zi i=1,…,n and a calculated statistics S, what is the uncertainty
in S?
• The Procedure:
Generating Discrete Random Variables
- Draw m values z’i i=1,…,m from the original data with replacement
- Calculate the statistic S’ from the “bootstrapped” sample
- Repeat L times to build a distribution of uncertainty in S.
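The procedure above can be sketched in a few lines (the function name and the example statistic are mine, not from the slides):

```python
import random

def bootstrap(data, statistic, n_resamples, rng):
    # Resample with replacement and recompute the statistic each time,
    # building a distribution that reflects its uncertainty.
    n = len(data)
    return [statistic([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_resamples)]

rng = random.Random(10)
data = [rng.gauss(50.0, 10.0) for _ in range(200)]
means = bootstrap(data, lambda xs: sum(xs) / len(xs), n_resamples=1000, rng=rng)
center = sum(means) / len(means)
spread = (sum((m - center) ** 2 for m in means) / len(means)) ** 0.5
# spread approximates the standard error of the mean, ~10/sqrt(200) ~ 0.71
```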
Review of Probability
Return to Table of Content
172
SOLO Review of Probability
Importance Sampling (IS)
Let Y = (Y1,…,Ym) be a vector of random variables having a joint probability density
function p (y1,…,ym), and suppose that we are interested in estimating

$\theta=E_p\left[g\left(Y_1,\dots,Y_m\right)\right]=\int\cdots\int g\left(y_1,\dots,y_m\right)p\left(y_1,\dots,y_m\right)d\,y_1\cdots d\,y_m$

Suppose that a direct generation of the random vector Y so as to compute g (Y) is
inefficient or impossible because
(a) it is difficult to generate the random vector Y, or
(b) the variance of g (Y) is large, or
(c) both of the above.

Suppose that W = (W1,…,Wm) is another random vector, which takes values in the
same domain as Y, and has a joint density function q (w1,…,wm) that can be easily
generated. The estimation θ can be expressed as:

$\theta=\int\cdots\int g\left(w_1,\dots,w_m\right)\frac{p\left(w_1,\dots,w_m\right)}{q\left(w_1,\dots,w_m\right)}\,q\left(w_1,\dots,w_m\right)d\,w_1\cdots d\,w_m=E_q\left[g\left(W\right)\frac{p\left(W\right)}{q\left(W\right)}\right]$

Therefore, we can estimate θ by generating values of the random vector W, and then
using as the estimator the resulting average of the values g (W) p (W)/q (W).
Generating Discrete Random Variables
173
SOLO Review of Probability
Importance Sampling (IS) (continue – 1)
$E_p\left[x\right]=\int x\,\frac{p\left(x\right)}{q\left(x\right)}\,q\left(x\right)d\,x=E_q\left[x\,\frac{p\left(x\right)}{q\left(x\right)}\right]\approx\frac{1}{N}\sum_{i=1}^{N}x_i\underbrace{\frac{p\left(x_i\right)}{q\left(x_i\right)}}_{w_i}$

Example: Importance Sampling for a Bi-Modal Distribution

Consider the following distribution:

$p\left(x\right)=\frac{1}{2}\,\mathcal N\left(x;0,1\right)+\frac{1}{2}\,\mathcal N\left(x;3,1/2\right)$

We want to calculate the mean value (g (x) = x) using Importance Sampling.

Use:   $g\left(x\right)=x\quad\&\quad q\left(x\right)=U\left(-5,5\right)$

For i = 1,…,N, sample (draw) xi using q (x):   $x_i\sim q\left(x\right),\quad i=1,\dots,N$

Importance Weight:   $w_i:=\frac{p\left(x_i\right)}{q\left(x_i\right)}$

For N = 10,000 samples we obtain Ep [x] = 1.4915 instead of 1.5.

In the Figure the Histogram using the Importance Weights wi is presented together
with the true PDF.
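The bi-modal example can be reproduced directly; a minimal sketch (function names are mine, and I read the second component's 1/2 as its standard deviation; the mixture mean is 1.5 either way):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p(x):
    # Bi-modal target: 0.5 N(x; 0, 1) + 0.5 N(x; 3, 1/2)
    return 0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, 3.0, 0.5)

rng = random.Random(11)
N = 100_000
q_pdf = 1.0 / 10.0                       # proposal q = U(-5, 5)
xs = [rng.uniform(-5.0, 5.0) for _ in range(N)]
ws = [p(x) / q_pdf for x in xs]          # importance weights w_i = p(x_i)/q(x_i)
estimate = sum(w * x for w, x in zip(ws, xs)) / N   # E_p[x] ~ (1/N) sum w_i x_i
# True mean is 0.5*0 + 0.5*3 = 1.5
```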
Return to Table of Content
174
SOLO
Metropolis Algorithm
• This method of generation of an arbitrary probability distribution
was invented by Metropolis, Rosenbluth and Teller (supposedly at a
Los Alamos dinner party) and published June 1953.
Generating Discrete Random Variables
Review of Probability
Procedure
• Set up a Markov Chain that has as a unique stationary solution
the required π (x) Probability Distribution Function (PDF)
• Run the chain until stationary.
• All subsequent samples are from stationary distribution π (x)
as required.
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E.,
“ Equations of state calculations by fast computing machine”,
Journal of Chemical Physics, 1953, Vol. 21(6), pp.1087-1092
Nicholas Constantine Metropolis
( 1915 – 1999)
This is also called Markov Chain Monte Carlo (MCMC) method.
[Figure: three-state Markov Chain with states X1, X2, X3 and transition probabilities]
175
SOLO
Metropolis Algorithm (continue – 1)
Generating Discrete Random Variables
Review of Probability
Nicholas Constantine Metropolis
( 1915 – 1999)
Proof of the Procedure

Pr (X,t) - the probability of being in the state X at time t.
Pr (X→Y) = Pr (Y|X) - the transition probability, per unit time, of going
from state X to state Y.

$\Pr\left(X,t+1\right)=\Pr\left(X,t\right)+\sum_Y\left[\Pr\left(X|Y\right)\Pr\left(Y,t\right)-\Pr\left(Y|X\right)\Pr\left(X,t\right)\right]$

At large t, once the arbitrary initial state is “forgotten”, we want Pr (X,t) → Pr (X).
Clearly a sufficient (but not necessary) condition for an equilibrium (time independent)
probability distribution is the so-called

Detailed Balance Condition:   $\Pr\left(Y|X\right)\Pr\left(X,t\right)=\Pr\left(X|Y\right)\Pr\left(Y,t\right)$

This method can be used for any probability distribution,
but Metropolis used:

$\Pr\left(B|A\right)=\begin{cases}e^{-\Delta E/kT} & \Delta E>0\\ 1 & \Delta E\le 0\end{cases}\qquad \Delta E:=E\left(B\right)-E\left(A\right)$

Note: E (A) is equivalent to the Energy level of state A.

$\sum_Y\Pr\left(X\to Y\right)=\sum_Y\Pr\left(Y|X\right)=1$   (sum of probabilities of all states reached from X)

[Figure: two states X and Y with transition probabilities Pr (Y|X), Pr (X|Y) and
self-transitions Pr (X→X), Pr (Y→Y)]
176
SOLO
Metropolis Algorithm (continue – 2)
Generating Discrete Random Variables
Review of Probability
Detailed Balance Condition:   $\Pr\left(Y|X\right)\Pr\left(X,t\right)=\Pr\left(X|Y\right)\Pr\left(Y,t\right)$

Metropolis defined a symmetric Q (Y|X) = Q (X|Y) as a candidate generating density
for Pr (Y|X), such that:

$\sum_Y Q\left(Y|X\right)=1$

In general Q (Y|X) will not satisfy the “Detailed Balance” condition; for example:

$Q\left(Y|X\right)\Pr\left(X,t\right)>Q\left(X|Y\right)\Pr\left(Y,t\right)$

The process moves from X to Y too often and from Y to X too rarely.

A convenient way to correct this is to reduce the number of moves from X to Y by
introducing a probability 0 < Α (Y|X) ≤ 1. This is called the Acceptance Probability.

$\Pr\left(Y|X\right)=Q\left(Y|X\right)\cdot\mathrm{A}\left(Y|X\right),\qquad Y\neq X$
177
SOLO
Metropolis Algorithm (continue – 3)
Generating Discrete Random Variables
Review of Probability
[Figure: two states X and Y with transition probabilities Pr (Y|X), Pr (X|Y) and
self-transitions Pr (X→X), Pr (Y→Y)]

Let us define the Acceptance Probabilities as:

$\mathrm{A}\left(Y|X\right)=\begin{cases}\Pr\left(Y\right)/\Pr\left(X\right) & \Pr\left(Y\right)\le\Pr\left(X\right)\\ 1 & \Pr\left(Y\right)>\Pr\left(X\right)\end{cases}\qquad \mathrm{A}\left(X|Y\right)=\begin{cases}1 & \Pr\left(Y\right)\le\Pr\left(X\right)\\ \Pr\left(X\right)/\Pr\left(Y\right) & \Pr\left(Y\right)>\Pr\left(X\right)\end{cases}$

$\Pr\left(Y|X\right)=Q\left(Y|X\right)\cdot\mathrm{A}\left(Y|X\right)\qquad \Pr\left(X|Y\right)=Q\left(X|Y\right)\cdot\mathrm{A}\left(X|Y\right)\qquad Y\neq X$

If Pr (X) ≤ Pr (Y) then A (Y|X) = 1 and A (X|Y) = Pr (X)/Pr (Y).
If Pr (X) > Pr (Y) then A (Y|X) = Pr (Y)/Pr (X) and A (X|Y) = 1.

In both cases:

$\frac{\Pr\left(Y|X\right)}{\Pr\left(X|Y\right)}=\frac{Q\left(Y|X\right)\cdot\mathrm{A}\left(Y|X\right)}{Q\left(X|Y\right)\cdot\mathrm{A}\left(X|Y\right)}\overset{Q\left(Y|X\right)=Q\left(X|Y\right)}{=}\frac{\mathrm{A}\left(Y|X\right)}{\mathrm{A}\left(X|Y\right)}=\frac{\Pr\left(Y\right)}{\Pr\left(X\right)}$

which is just the Detailed Balance condition.
178
SOLO
Metropolis Algorithm (continue – 4)
Generating Discrete Random Variables
Review of Probability
Detailed Balance Condition:   $\Pr\left(B|A\right)\Pr\left(A,t\right)=\Pr\left(A|B\right)\Pr\left(B,t\right)$

This method can be used for any probability distribution, but Metropolis used:

$\Pr\left(B|A\right)=\begin{cases}e^{-\Delta E/kT} & \Delta E>0\\ 1 & \Delta E\le 0\end{cases}\qquad \Delta E:=E\left(B\right)-E\left(A\right)$

Therefore

$\frac{\Pr\left(B|A\right)}{\Pr\left(A|B\right)}=\begin{cases}\dfrac{e^{-\left[E\left(B\right)-E\left(A\right)\right]/kT}}{1} & \Delta E=E\left(B\right)-E\left(A\right)>0\\[1ex]\dfrac{1}{e^{-\left[E\left(A\right)-E\left(B\right)\right]/kT}} & \Delta E=E\left(B\right)-E\left(A\right)\le 0\end{cases}\;=\;e^{-\Delta E/kT}=\frac{\Pr\left(B,t\right)}{\Pr\left(A,t\right)}$

[Figure: two states A and B with transitions Pr (A→B), Pr (B→A) and self-transitions
Pr (A→A) = 1 − Pr (A→B), Pr (B→B) = 1 − Pr (B→A)]
179
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Discrete Random Variables
Review of Probability
• Set up a Markov Chain T (x’|x) that has as a unique stationary solution
the required π (x’) Probability Distribution Function (PDF)
$\pi\left(x'\right)=\int T\left(x'|x\right)\pi\left(x\right)d\,x$
W. Keith Hastings improved the Metropolis algorithm by allowing a non-symmetrical
Candidate Generating Density.
Hastings, W., “Monte Carlo Simulation Methods Using Markov Chains and
Their Applications”, Biometrica, 1970, No. 57, pp. 97 - 109
Here we give the development for Continuous Random Variables
(for Discrete Random Variables the development is similar to that used for
Metropolis Algorithm).
180
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
• The problem is to find the conditional transition probability distribution T (x’|x)
of the Markov Chain, that has states converging, after a transient time, to π (x’):

$\pi\left(x'\right)=\int T\left(x'|x\right)\pi\left(x\right)d\,x$

To satisfy this requirement, a “necessary condition” (but “not sufficient”) is the
“Detailed Balance” (or “Reversibility Condition”, or “Time Reversibility”):

$T\left(x'|x\right)\pi\left(x\right)=T\left(x|x'\right)\pi\left(x'\right)$

Proof:

$\int T\left(x'|x\right)\pi\left(x\right)d\,x=\int T\left(x|x'\right)\pi\left(x'\right)d\,x=\pi\left(x'\right)\underbrace{\int T\left(x|x'\right)d\,x}_{1}=\pi\left(x'\right)$

q.e.d.

Let us define Q (x’|x) as a candidate generating density for T (x’|x), such that:

$\int Q\left(x'|x\right)d\,x'=1$

In general Q (x’|x) will not satisfy the “Detailed Balance” condition; for example:

$Q\left(x'|x\right)\pi\left(x\right)>Q\left(x|x'\right)\pi\left(x'\right)$

Loosely speaking, the process moves from x to x’ too often and from x’ to x too rarely.
181
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
In general Q (x’|x) will not satisfy the “Detailed Balance” condition; for example:

$Q\left(x'|x\right)\pi\left(x\right)>Q\left(x|x'\right)\pi\left(x'\right)$

Loosely speaking, the process moves from x to x’ too often and from x’ to x too rarely.

A convenient way to correct this is to reduce the number of moves from x to x’ by
introducing a probability 0 < α (x’|x) ≤ 1. This is called the Acceptance Probability.

$T\left(x'|x\right)=Q\left(x'|x\right)\alpha\left(x'|x\right),\qquad x'\neq x$

If the move is not made, the process again returns x as a value from the target distribution.

The Detailed Balance is:

$Q\left(x'|x\right)\alpha\left(x'|x\right)\pi\left(x\right)=Q\left(x|x'\right)\pi\left(x'\right)$

From which:

$\alpha\left(x'|x\right)=\frac{\pi\left(x'\right)Q\left(x|x'\right)}{\pi\left(x\right)Q\left(x'|x\right)}\le 1\qquad\Rightarrow\qquad \alpha\left(x'|x\right)=\min\left\{1,\ \frac{\pi\left(x'\right)Q\left(x|x'\right)}{\pi\left(x\right)Q\left(x'|x\right)}\right\}$

In the same way (by interchanging x’ with x):

$\alpha\left(x|x'\right)=\min\left\{1,\ \frac{\pi\left(x\right)Q\left(x'|x\right)}{\pi\left(x'\right)Q\left(x|x'\right)}\right\}$
182
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
Let us prove that we satisfy the “Detailed Balance” condition:

$T\left(x'|x\right)\pi\left(x\right)=Q\left(x'|x\right)\pi\left(x\right)\min\left\{1,\ \frac{\pi\left(x'\right)Q\left(x|x'\right)}{\pi\left(x\right)Q\left(x'|x\right)}\right\}$

$T\left(x|x'\right)\pi\left(x'\right)=Q\left(x|x'\right)\pi\left(x'\right)\min\left\{1,\ \frac{\pi\left(x\right)Q\left(x'|x\right)}{\pi\left(x'\right)Q\left(x|x'\right)}\right\}$

Suppose $Q\left(x|x'\right)\pi\left(x'\right)<Q\left(x'|x\right)\pi\left(x\right)$. Then:

$T\left(x'|x\right)\pi\left(x\right)=Q\left(x'|x\right)\pi\left(x\right)\frac{\pi\left(x'\right)Q\left(x|x'\right)}{\pi\left(x\right)Q\left(x'|x\right)}=Q\left(x|x'\right)\pi\left(x'\right)$

$T\left(x|x'\right)\pi\left(x'\right)=Q\left(x|x'\right)\pi\left(x'\right)\cdot 1=Q\left(x|x'\right)\pi\left(x'\right)$

Therefore

$T\left(x'|x\right)\pi\left(x\right)=T\left(x|x'\right)\pi\left(x'\right)$

q.e.d.
183
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
The Transition Kernel of the Metropolis-Hastings Algorithm is:

$T\left(x'|x\right)=Q\left(x'|x\right)\alpha\left(x'|x\right)+\left[1-\alpha\left(x'|x\right)\right]\delta_x\left(x'\right)$

where δx is the Dirac mass on {x}.
184
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
Therefore the M-H Algorithm will:

1  Use the previously generated x(t).

2  Draw a new value xnew from the candidate distribution Q (xnew|x(t)):

   $x^{new}\sim Q\left(x^{new}|x^{\left(t\right)}\right)$

3  Compute the acceptance probability α (xnew|x(t)):

   $\alpha\left(x^{new}|x^{\left(t\right)}\right)=\min\left\{1,\ \frac{\pi\left(x^{new}\right)Q\left(x^{\left(t\right)}|x^{new}\right)}{\pi\left(x^{\left(t\right)}\right)Q\left(x^{new}|x^{\left(t\right)}\right)}\right\}$

4  Use the Acceptance/Rejection method with

   $q\left(x\right)=U\left[0,1\right]=\begin{cases}1 & 0\le x\le 1\\ 0 & \text{otherwise}\end{cases}$

   (the uniform distribution between 0 and 1) and c = 1: draw U ~ U [0,1] and
   accept xnew if U ≤ α (xnew|x(t)); otherwise keep x(t).
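The four steps above can be sketched as follows; the target is the bi-modal π (x) used in the example a few slides ahead, and the symmetric Gaussian proposal makes the Q-ratio drop out of α (function names are mine):

```python
import math
import random

def target(x):
    # Unnormalized target from the slides' example:
    # 0.3 exp(-0.2 x^2) + 0.7 exp(-0.2 (x-10)^2)
    return 0.3 * math.exp(-0.2 * x * x) + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2)

def metropolis_hastings(n_steps, step_sigma, rng):
    # Random-walk M-H: Q(x'|x) = N(x, step_sigma^2) is symmetric, so the
    # acceptance probability reduces to min{1, pi(x')/pi(x)}.
    x = 0.0
    chain = []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step_sigma)
        alpha = min(1.0, target(x_new) / target(x))
        if rng.random() < alpha:     # accept the move with probability alpha
            x = x_new
        chain.append(x)              # on rejection the chain repeats x
    return chain

rng = random.Random(12)
chain = metropolis_hastings(n_steps=100_000, step_sigma=10.0, rng=rng)
burned = chain[5_000:]               # discard the transient (burn-in) stage
mean = sum(burned) / len(burned)
# Both mixture kernels have equal width, so the mixture mean is 0.7*10 = 7
```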
185
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
186
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
187
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
The convergence of the M-H Algorithm to the desired unique stationary solution
the required π (x) occurs under the following conditions:
• Irreducibility: every state is eventually reachable from any start state;
for all x there exists a t such that π (x,t) > 0.
• Aperiodicity: the chain doesn’t get caught in cycles.
The process is ergodic if it is both irreducible and aperiodic.
In the M-H algorithm the draws are used as samples from the target density π (x) only
after the Markov Chain has passed the transient stage and the effect of the chosen
starting value x0 has become so small that it can be ignored. The rate of convergence
of the Markov Chain is a function of the chosen candidate generating density Q (x’,x).
The efficiency of the algorithm depends on how close the Acceptance Probability α is to 1.
188
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
Example:

$\pi\left(x\right)=0.3\,\exp\left(-0.2\,x^2\right)+0.7\,\exp\left[-0.2\left(x-10\right)^2\right]$

Proposed Candidate Distribution:

$Q\left(x^{new}|x^{\left(t\right)}\right)=\mathcal N\left(x^{\left(t\right)},100\right)$

Ramon Sagarna, R.Sagarna@cs.bham.ac.uk,
“Lecture 19: Markov Chain Monte Carlo Methods (MCMC)”
189
SOLO
Metropolis Algorithm
Generating Continuous Random Variables
Review of Probability
If we choose a symmetric candidate generating density, Q (x’|x) = Q (x|x’) for each x’, x,
then

$\alpha\left(x'|x\right)=\min\left\{1,\ \frac{\pi\left(x'\right)}{\pi\left(x\right)}\right\}\qquad\qquad \alpha\left(x|x'\right)=\min\left\{1,\ \frac{\pi\left(x\right)}{\pi\left(x'\right)}\right\}$

We obtain the Metropolis Algorithm.

Metropolis chose:

$Q\left(x'|x\right)=\begin{cases}e^{-\Delta E/kT} & \Delta E>0\\ 1 & \Delta E\le 0\end{cases}\qquad\qquad Q\left(x|x'\right)=\begin{cases}1 & \Delta E>0\\ e^{+\Delta E/kT} & \Delta E\le 0\end{cases}\qquad \Delta E:=E\left(x'\right)-E\left(x\right)$
190
SOLO
Metropolis Algorithm
Generating Continuous Random Variables
Review of Probability
Return to Table of Content
191
SOLO
Gibbs Sampling
Generating Discrete Random Variables
Review of Probability
Stuart Geman
Brown University
Donald Geman
Johns Hopkins
University
Josiah Willard Gibbs
1839 - 1903
In mathematics and physics, Gibbs sampling is an
algorithm to generate a sequence of samples from the joint
probability distribution of two or more random variables.
The purpose of such a sequence is to approximate the joint
distribution, or to compute an integral (such as an
expected value). Gibbs sampling is a special case of the
Metropolis-Hastings algorithm, and thus an example of a
Markov chain Monte Carlo algorithm. The algorithm is
named after the physicist J. W. Gibbs, in reference to an
analogy between the sampling algorithm and statistical
physics. The algorithm was devised by Stuart Geman and
Donald Geman, some eight decades after the passing of
Gibbs, and is also called the Gibbs sampler.
Geman, S. and Geman, D., “Stochastic Relaxation, Gibbs Distributions, and the Bayesian
Restoration of Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence,
1984, 6, pp. 721 – 741
192
SOLO
Gibbs Sampling (continue – 1)
Generating Discrete Random Variables
Review of Probability
The Gibbs sampler uses what are called the full (or complete) conditional distributions:

π(x_j | x_1, …, x_{j−1}, x_{j+1}, …, x_k)
   =(Bayes)=  π(x_1, …, x_j, …, x_k) / ∫ π(x_1, …, x_j, …, x_k) dx_j
   =  π(x_j, x_{−j}) / π(x_{−j})
Suppose that x = (x_1, x_2, …, x_k) is k (≥ 2) dimensional.
The Gibbs sampler samples one variable in turn:

X_1^{t+1} ~ π(x_1 | x_2^t, x_3^t, …, x_k^t)
X_2^{t+1} ~ π(x_2 | x_1^{t+1}, x_3^t, …, x_k^t)
X_3^{t+1} ~ π(x_3 | x_1^{t+1}, x_2^{t+1}, x_4^t, …, x_k^t)
⋮
X_k^{t+1} ~ π(x_k | x_1^{t+1}, x_2^{t+1}, …, x_{k−1}^{t+1})

The Gibbs sampler always uses the most recent values.
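The cycle above can be sketched for a standard textbook case where the full conditionals are available in closed form: a zero-mean bivariate normal with correlation ρ, where x_1 | x_2 ~ N(ρ x_2, 1 − ρ²) and symmetrically for x_2 | x_1. This example and its names are illustrative assumptions, not from the slides.

```python
import random, math

def gibbs_bivariate_normal(rho, n_samples, seed=2):
    """Gibbs sampler for a zero-mean, unit-variance bivariate normal with
    correlation rho. Each variable is drawn in turn from its full
    conditional, always using the most recent value of the other."""
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0
    s = math.sqrt(1.0 - rho**2)       # conditional standard deviation
    samples = []
    for _ in range(n_samples):
        x1 = rng.gauss(rho * x2, s)   # x1 ~ pi(x1 | x2)
        x2 = rng.gauss(rho * x1, s)   # x2 ~ pi(x2 | x1), using the new x1
        samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal(0.8, 20000)
burned = samples[1000:]
corr_num = sum(a * b for a, b in burned) / len(burned)  # ~ rho for unit variances
```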
193
SOLO
Gibbs Sampling (continue – 2)
Generating Discrete Random Variables
Review of Probability
Gibbs Sampling is a special case of the Metropolis – Hastings Algorithm.

To see this, define the candidate-generating density Q(x_new | x^(t)) as

Q(x_new | x^(t)) = { Pr(x_j^new | x_{−j}^new)   if x_{−j}^new = x_{−j}^(t)
                   { 0                          otherwise

At any moment one variable x_j^new is drawn:  x_j^new ~ π(x_j | x_{−j}^new)

where  x_{−j}^new := (x_1^{(t−1)}, …, x_{j−1}^{(t−1)}, x_{j+1}^{(t−1)}, …, x_k^{(t−1)})

The new state will be  x^new = (x_1^{(t−1)}, …, x_{j−1}^{(t−1)}, x_j^new, x_{j+1}^{(t−1)}, …, x_k^{(t−1)}) = (x_j^new, x_{−j}^new)

The acceptance probability α(x_new | x^(t)) is:

α(x_new | x^(t)) = min{ 1, [Q(x^(t) | x_new) Pr(x_new)] / [Q(x_new | x^(t)) Pr(x^(t))] }
                 = min{ 1, [Pr(x_j^(t) | x_{−j}^(t)) Pr(x_new)] / [Pr(x_j^new | x_{−j}^new) Pr(x^(t))] }
194
SOLO
Gibbs Sampling (continue – 3)
Generating Discrete Random Variables
Review of Probability
candidate-generating density:

Q(x_new | x^(t)) = { Pr(x_j^new | x_{−j}^new)   if x_{−j}^new = x_{−j}^(t)
                   { 0                          otherwise

The acceptance probability α(x_new | x^(t)) is:

α(x_new | x^(t)) = min{ 1, [Q(x^(t) | x_new) Pr(x_new)] / [Q(x_new | x^(t)) Pr(x^(t))] }
                 = min{ 1, [Pr(x_j^(t) | x_{−j}^(t)) Pr(x_new)] / [Pr(x_j^new | x_{−j}^new) Pr(x^(t))] }

Using Bayes:

Pr(x_j^(t) | x_{−j}^(t)) = Pr(x_j^(t), x_{−j}^(t)) / Pr(x_{−j}^(t))
Pr(x_j^new | x_{−j}^new) = Pr(x_j^new, x_{−j}^new) / Pr(x_{−j}^new)

and, since x^(t) = (x_j^(t), x_{−j}^(t)) and x^new = (x_j^new, x_{−j}^new):

Pr(x^(t)) = Pr(x_j^(t), x_{−j}^(t))        Pr(x^new) = Pr(x_j^new, x_{−j}^new)

α(x_new | x^(t)) = min{ 1, Pr(x_{−j}^new) / Pr(x_{−j}^(t)) } = 1      (x_{−j}^new = x_{−j}^(t))

Gibbs Sampling always accepts x_j^new.
Gibbs Sampling is therefore a special case of the Metropolis – Hastings Algorithm.
195
SOLO
Gibbs Sampling (continue – 4)
Generating Discrete Random Variables
Review of Probability
Return to Table of Content
SOLO Review of Probability
Monte Carlo Integration
The Monte Carlo Method can be used to numerically evaluate multidimensional integrals

I = ∫⋯∫ g(x_1, …, x_m) dx_1 ⋯ dx_m = ∫ g(x) dx

To use Monte Carlo we factorize  g(x) = f(x)·p(x)

in such a way that p(x) is interpreted as a Probability Density Function:

p(x) ≥ 0   &   ∫ p(x) dx = 1

We assume that we can draw N_S samples x^i, i = 1, …, N_S from p(x):

x^i ~ p(x),   i = 1, …, N_S

Using Monte Carlo we can approximate

p(x) ≈ (1/N_S) Σ_{i=1}^{N_S} δ(x − x^i)

I = ∫ f(x)·p(x) dx ≈ I_{N_S} = ∫ f(x) · (1/N_S) Σ_{i=1}^{N_S} δ(x − x^i) dx = (1/N_S) Σ_{i=1}^{N_S} f(x^i)
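The factorization g = f·p can be tried on a one-dimensional integral with a known answer. This is an illustrative sketch (the test integral and names are assumptions, not from the slides): take p uniform on [0, 1] and f(x) = e^x, so I = ∫₀¹ e^x dx = e − 1.

```python
import random, math

def mc_integral(f, sample_p, n_samples, seed=3):
    """Monte Carlo estimate of I = integral of f(x) p(x) dx:
    draw x_i ~ p and average f(x_i)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += f(sample_p(rng))
    return total / n_samples

# Example: I = integral_0^1 e^x dx, with p uniform on [0,1] and f(x) = e^x
I_hat = mc_integral(math.exp, lambda rng: rng.random(), 100000)
I_exact = math.e - 1.0
```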
SOLO Review of Probability
Monte Carlo Integration
We draw N_S samples x^i, i = 1, …, N_S from p(x):   x^i ~ p(x)

I = ∫ f(x)·p(x) dx ≈ I_{N_S} = (1/N_S) Σ_{i=1}^{N_S} f(x^i)

If the samples x^i are independent, then I_{N_S} is an unbiased estimate of I.

According to the Law of Large Numbers, I_{N_S} will almost surely converge to I:

I_{N_S} → I   (almost surely, N_S → ∞)

If the variance of f(x) is finite, i.e.

σ²(f) := ∫ [f(x) − I]² p(x) dx < ∞

then the Central Limit Theorem holds and the estimation error converges in
distribution to a Normal Distribution:

lim_{N_S→∞} √N_S (I_{N_S} − I) ~ N(0, σ²(f))

The error of the MC estimate, e = I_{N_S} − I, is of the order of O(N_S^{−1/2}), meaning
that the rate of convergence of the estimate is independent of the dimension of
the integrand.

Return to Table of Content
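The O(N_S^{−1/2}) rate can be checked numerically. This sketch (all names and the choice of integrand are assumptions) estimates E[U], U ~ Uniform(0, 1), with two sample sizes differing by a factor of 100, so the mean absolute error should shrink by roughly √100 = 10:

```python
import random

def mc_mean_error(n, reps, seed):
    """Average |error| of the MC estimate of E[U], U ~ Uniform(0,1),
    over `reps` independent runs of n samples each."""
    rng = random.Random(seed)
    tot = 0.0
    for _ in range(reps):
        s = sum(rng.random() for _ in range(n)) / n
        tot += abs(s - 0.5)
    return tot / reps

e_small = mc_mean_error(100, 200, seed=4)
e_large = mc_mean_error(10000, 200, seed=5)
shrink = e_small / e_large   # expect roughly sqrt(10000/100) = 10
```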
198
Random Processes
SOLO

Random Variable:
A variable x determined by the outcome Ω of a random experiment:  x = x(Ω)

Random Process or Stochastic Process:
A function of time x determined by the outcome Ω of a random experiment:  x(t) = x(t, Ω)

This is a family, or an ensemble, of functions of time, in general different
for each outcome Ω.

Mean or Ensemble Average of the Random Process:

x̄(t) := E[x(t, Ω)] = ∫_{−∞}^{+∞} ξ p_{x(t)}(ξ) dξ

Autocorrelation of the Random Process:

R(t_1, t_2) := E[x(t_1, Ω) x(t_2, Ω)] = ∫∫ ξ η p_{x(t_1), x(t_2)}(ξ, η) dξ dη

Autocovariance of the Random Process:

C(t_1, t_2) := E{[x(t_1, Ω) − x̄(t_1)][x(t_2, Ω) − x̄(t_2)]}
            = E[x(t_1, Ω) x(t_2, Ω)] − x̄(t_1) x̄(t_2) = R(t_1, t_2) − x̄(t_1) x̄(t_2)
Table of Content
199
SOLO
Stationarity of a Random Process
1. Wide-Sense Stationarity of a Random Process:

• The Mean of the Random Process is time invariant:

x̄(t) := E[x(t, Ω)] = ∫ ξ p_{x(t)}(ξ) dξ = x̄ = const.

• The Autocorrelation of the Random Process is of the form:

R(t_1, t_2) = R(t_1 − t_2) =: R(τ),   τ := t_1 − t_2

since

R(t_1, t_2) := ∫∫ ξ η p_{x(t_1), x(t_2)}(ξ, η) dξ dη = R(t_2, t_1)

we have:  R(τ) = R(−τ)

Power Spectrum or Power Spectral Density of a Stationary Random Process:

S(ω) := ∫_{−∞}^{+∞} R(τ) exp(−jωτ) dτ

2. Strict-Sense Stationarity of a Random Process:
All probability density functions are time invariant:  p_{x(t)}(ξ, t) = p_x(ξ) = const.

Ergodicity:
A Stationary Random Process for which Time Average = Ensemble Average:

x̄ := lim_{T→∞} (1/2T) ∫_{−T}^{+T} x(t, Ω) dt = E[x(t, Ω)]
Random Processes
200
SOLO
Ergodicity (continue):

For an Ergodic Random Process define the Time Autocorrelation:

R(τ) := lim_{T→∞} (1/2T) ∫_{−T}^{+T} x(t, Ω) x(t + τ, Ω) dt

Finite Signal Energy Assumption:

R(0) = lim_{T→∞} (1/2T) ∫_{−T}^{+T} x²(t, Ω) dt < ∞

Define:

x_T(t, Ω) := { x(t, Ω)   −T ≤ t ≤ T
             { 0         otherwise

R_T(τ) := (1/2T) ∫_{−∞}^{+∞} x_T(t, Ω) x_T(t + τ, Ω) dt

Then

R_T(τ) = (1/2T) ∫_{−T}^{+T} x(t, Ω) x(t + τ, Ω) dt − (1/2T) ∫_{T−τ}^{T} x(t, Ω) x(t + τ, Ω) dt

and the second term vanishes in the limit:

lim_{T→∞} | (1/2T) ∫_{T−τ}^{T} x(t, Ω) x(t + τ, Ω) dt |
   ≤ lim_{T→∞} (τ/2T) sup_{−T ≤ t ≤ T} | x(t, Ω) x(t + τ, Ω) | → 0

therefore:  lim_{T→∞} R_T(τ) = R(τ)
Random Processes
201
SOLO
Ergodicity (continue):

Let us compute:

∫_{−∞}^{+∞} R_T(τ) exp(−jωτ) dτ
   = (1/2T) ∫∫ x_T(t, Ω) x_T(t + τ, Ω) exp(−jωτ) dt dτ
   = (1/2T) [∫ x_T(t, Ω) exp(+jωt) dt] [∫ x_T(v, Ω) exp(−jωv) dv]
   = X_T(ω) X_T*(ω) / 2T

where  X_T(ω) := ∫_{−∞}^{+∞} x_T(v, Ω) exp(−jωv) dv  and * means complex conjugate.

Define:

S_T(ω) := lim_{T→∞} E{ X_T(ω) X_T*(ω) / 2T }
        = lim_{T→∞} E{ ∫ R_T(τ) exp(−jωτ) dτ }
        = lim_{T→∞} ∫ [ (1/2T) ∫_{−T}^{+T} E{x_T(t, Ω) x_T(t + τ, Ω)} dt ] exp(−jωτ) dτ

Since the Random Process is Ergodic we can use the Wide-Sense Stationarity assumption:

E{x_T(t, Ω) x_T(t + τ, Ω)} = R(τ)

S(ω) := lim_{T→∞} E{ X_T(ω) X_T*(ω) / 2T } = ∫_{−∞}^{+∞} R(τ) exp(−jωτ) dτ
Random Processes
202
SOLO
Ergodicity (continue):

We obtained the Wiener-Khinchine Theorem (Wiener 1930):

S(ω) := lim_{T→∞} E{ X_T(ω) X_T*(ω) / 2T } = ∫_{−∞}^{+∞} R(τ) exp(−jωτ) dτ
Norbert Wiener
1894 - 1964
Alexander Yakovlevich
Khinchine
1894 - 1959
The Power Spectrum or Power Spectral Density of
a Stationary Random Process S (ω) is the Fourier
Transform of the Autocorrelation Function R (τ).
Random Processes
203
SOLO
White Noise

A (not necessarily stationary) Random Process whose Autocorrelation is zero for
any two different times is called white noise in the wide sense:

R(t_1, t_2) = E[x(t_1, Ω) x(t_2, Ω)] = σ²(t_1) δ(t_2 − t_1)

σ²(t_1) — instantaneous variance

Wide-Sense Whiteness

Strict-Sense Whiteness

A (not necessarily stationary) Random Process in which the outcomes at any two
different times are independent is called white noise in the strict sense:

p_{x(t_1), x(t_2)}(ξ_1, ξ_2) = p_{x(t_1)}(ξ_1) p_{x(t_2)}(ξ_2),   t_1 ≠ t_2

A Stationary White Noise Random Process has the Autocorrelation:

R(τ) = E[x(t, Ω) x(t + τ, Ω)] = σ² δ(τ)

Note
In general whiteness requires Strict-Sense Whiteness. In practice we have only
moments (typically up to second order) and thus only Wide-Sense Whiteness.
Random Processes
204
SOLO
White Noise

A Stationary White Noise Random Process has the Autocorrelation:

R(τ) = E[x(t, Ω) x(t + τ, Ω)] = σ² δ(τ)

The Power Spectral Density is given by performing the Fourier Transform of the
Autocorrelation:

S(ω) = ∫_{−∞}^{+∞} R(τ) exp(−jωτ) dτ = ∫_{−∞}^{+∞} σ² δ(τ) exp(−jωτ) dτ = σ²

We can see that the Power Spectral Density contains all frequencies at the same
amplitude. This is the reason it is called White Noise.

The Power of the Noise is defined as:  P := R(τ = 0) = (1/2π) ∫_{−∞}^{+∞} S(ω) dω = σ²
Random Processes
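The flat-spectrum / delta-autocorrelation picture can be checked on discrete samples. This sketch (names and parameters are illustrative assumptions) generates i.i.d. Gaussian noise with σ = 2 and verifies that the sample autocorrelation is ≈ σ² at lag 0 and ≈ 0 at a nonzero lag:

```python
import random

def sample_autocorr(x, lag):
    """Biased sample autocorrelation at the given lag."""
    n = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(n)) / n

rng = random.Random(6)
sigma = 2.0
x = [rng.gauss(0.0, sigma) for _ in range(100000)]
R0 = sample_autocorr(x, 0)    # ~ sigma^2 = 4
R5 = sample_autocorr(x, 5)    # ~ 0 for white noise
```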
205
SOLO
Table of Content
Markov Processes

Andrei Andreevich Markov
1856 - 1922

A Markov Process is defined by:

p(x(τ), Ω | x(t), Ω, t ≤ t_1) = p(x(τ), Ω | x(t_1), Ω),   ∀ τ > t_1

i.e. for the Random Process, the future beyond any time t_1, given the past up to t_1,
is fully defined by the process at t_1.

Examples of Markov Processes:

1. Continuous Dynamic System

ẋ(t) = f(t, x, u, v)
z(t) = h(t, x, u, w)

2. Discrete Dynamic System

x_{k+1} = f(t_k, x_k, u_k, v_k)
z_k = h(t_k, x_k, u_k, w_k)

x — state-space vector (n × 1)
u — input vector (m × 1)
v — white input noise vector (n × 1)
z — measurement vector (p × 1)
w — white measurement noise vector (p × 1)
Random Processes
206
SOLO
Table of Content
Markov Processes
Examples of Markov Processes:

3. Continuous Linear Dynamic System

ẋ(t) = A x(t) + v(t)
z(t) = C x(t)

Using the Fourier Transform we obtain:

Z(ω) = C (jωI − A)^{−1} V(ω) = H(ω) V(ω),   H(ω) := C (jωI − A)^{−1}

Using the Inverse Fourier Transform we obtain:

z(t) = (1/2π) ∫ H(ω) V(ω) exp(jωt) dω
     = (1/2π) ∫ H(ω) [ ∫ v(ξ) exp(−jωξ) dξ ] exp(jωt) dω
     = ∫ v(ξ) [ (1/2π) ∫ H(ω) exp(jω(t − ξ)) dω ] dξ      (change of order of integration)
     = ∫_{−∞}^{+∞} h(t, ξ) v(ξ) dξ,   h(t, ξ) = h(t − ξ)

v(t) → [ h(t, τ) ] → z(t)
Random Processes
207
SOLO
Table of Content
Markov Processes
Examples of Markov Processes:

3. Continuous Linear Dynamic System (continue)

ẋ(t) = A x(t) + v(t),   z(t) = C x(t),   z(t) = ∫ h(t, ξ) v(ξ) dξ

with white input noise:

R_vv(τ) = E[v(t) v(t + τ)] = σ_v² δ(τ)
S_vv(ω) = ∫ R_vv(τ) exp(−jωτ) dτ = ∫ σ_v² δ(τ) exp(−jωτ) dτ = σ_v²

The Autocorrelation of the output is:

R_zz(τ) = E[z(t) z(t + τ)]
        = ∫∫ h(t − ξ_1) h(t + τ − ξ_2) E[v(ξ_1) v(ξ_2)] dξ_1 dξ_2
        = σ_v² ∫∫ h(t − ξ_1) h(t + τ − ξ_2) δ(ξ_2 − ξ_1) dξ_1 dξ_2
        = σ_v² ∫ h(ζ) h(ζ + τ) dζ

The Power Spectral Density of the output is:

S_zz(ω) = ∫ R_zz(τ) exp(−jωτ) dτ
        = σ_v² ∫∫ h(ζ) h(ζ + τ) exp(−jωτ) dζ dτ
        = σ_v² [ ∫ h(ζ) exp(+jωζ) dζ ][ ∫ h(χ) exp(−jωχ) dχ ]      (χ := ζ + τ)
        = H(ω) H*(ω) σ_v²

S_zz(ω) = H(ω) H*(ω) S_vv(ω)
Random Processes
208
SOLO
Table of Content
Markov Processes
Examples of Markov Processes:

4. Continuous Linear Dynamic System

v(t) → [ H(ω) = K / (1 + jω/ω_x) ] → z(t)

R_vv(τ) = E[v(t) v(t + τ)] = σ_v² δ(τ),   S_vv(ω) = σ_v²

The Power Spectral Density of the output is:

S_zz(ω) = H(ω) H*(ω) S_vv(ω) = K² σ_v² / [1 + (ω/ω_x)²]

The Autocorrelation of the output is:

R_zz(τ) = (1/2π) ∫_{−∞}^{+∞} S_zz(ω) exp(jωτ) dω
        = (1/2πj) ∮ K² σ_v² exp(sτ) / [1 − (s/ω_x)²] ds      (s = σ + jω)

Closing the contour in the appropriate half-plane for τ > 0 and for τ < 0 (the
integrand has poles at s = ±ω_x, and the contribution of the large arc vanishes in
each case) and taking residues gives:

R_zz(τ) = (ω_x K² σ_v² / 2) exp(−ω_x |τ|),   R_zz(0) = ω_x K² σ_v² / 2
Random Processes
209
SOLO
Markov Processes
Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients

ẋ(t) = F(t) x(t) + G(t) w(t)

e_x(t) := x(t) − E{x(t)}   &   e_w(t) := w(t) − E{w(t)}
E{e_w(t_1) e_w^T(t_2)} = Q(t_1) δ(t_1 − t_2)

The solution of the Linear System is:

x(t) = Φ(t, t_0) x(t_0) + ∫_{t_0}^{t} Φ(t, λ) G(λ) w(λ) dλ

where:

(d/dt) Φ(t, t_0) = F(t) Φ(t, t_0)   &   Φ(t_0, t_0) = I   &   Φ(t_3, t_1) = Φ(t_3, t_2) Φ(t_2, t_1)

Taking expectations:

(d/dt) E{x(t)} = F(t) E{x(t)} + G(t) E{w(t)}

and subtracting:

ė_x(t) = F(t) e_x(t) + G(t) e_w(t)
e_x(t) = Φ(t, t_0) e_x(t_0) + ∫_{t_0}^{t} Φ(t, λ) G(λ) e_w(λ) dλ
Random Processes
210
SOLO
Markov Processes
Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 1)

e_x(t) = Φ(t, t_0) e_x(t_0) + ∫_{t_0}^{t} Φ(t, λ) G(λ) e_w(λ) dλ

Define:

V_x(t) := Var{x(t)} = E{e_x(t) e_x^T(t)}
V_x(t + τ) := Var{x(t + τ)} = E{e_x(t + τ) e_x^T(t + τ)}
R_x(t, t + τ) := E{e_x(t) e_x^T(t + τ)}   &   R_x(t + τ, t) := E{e_x(t + τ) e_x^T(t)}

Then:

V_x(t) = Φ(t, t_0) V_x(t_0) Φ^T(t, t_0) + ∫_{t_0}^{t} Φ(t, λ) G(λ) Q(λ) G^T(λ) Φ^T(t, λ) dλ

V_x(t + τ) = Φ(t + τ, t_0) V_x(t_0) Φ^T(t + τ, t_0)
           + ∫_{t_0}^{t+τ} Φ(t + τ, λ) G(λ) Q(λ) G^T(λ) Φ^T(t + τ, λ) dλ
Random Processes
211
SOLO Markov Processes
Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 2)

For τ > 0:

e_x(t + τ) = Φ(t + τ, t) e_x(t) + ∫_{t}^{t+τ} Φ(t + τ, λ) G(λ) e_w(λ) dλ

and the integral term is uncorrelated with e_x(t) (the noise acts after time t), so

R_x(t, t + τ) = E{e_x(t) e_x^T(t + τ)} = V_x(t) Φ^T(t + τ, t),   τ > 0

For τ < 0 (t + τ < t), by the same argument:

R_x(t, t + τ) = Φ(t, t + τ) V_x(t + τ),   τ < 0
Random Processes
212
SOLO Markov Processes
Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 3)

Differentiating

V_x(t) = Φ(t, t_0) V_x(t_0) Φ^T(t, t_0) + ∫_{t_0}^{t} Φ(t, λ) G(λ) Q(λ) G^T(λ) Φ^T(t, λ) dλ

with respect to t, and using (d/dt) Φ(t, t_0) = F(t) Φ(t, t_0), gives the covariance
propagation (Lyapunov) differential equation:

(d/dt) V_x(t) = F(t) V_x(t) + V_x(t) F^T(t) + G(t) Q(t) G^T(t)

and likewise:

(d/dt) V_x(t + τ) = F(t + τ) V_x(t + τ) + V_x(t + τ) F^T(t + τ) + G(t + τ) Q(t + τ) G^T(t + τ)
Random Processes
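The covariance propagation equation can be integrated numerically. The following minimal sketch (scalar case; function names, step size, and the test system are illustrative assumptions) uses Euler integration of dV/dt = 2FV + G²Q for the first-order system dx/dt = −(1/T)x + w, whose steady state solves 0 = −2V/T + Q, i.e. V_ss = QT/2:

```python
def propagate_variance(F, G, Q, V0, dt, n_steps):
    """Euler integration of the scalar covariance propagation (Lyapunov)
    equation dV/dt = 2*F*V + G*Q*G for dx/dt = F*x + G*w,
    with E[w(t)w(t')] = Q*delta(t - t')."""
    V = V0
    for _ in range(n_steps):
        V += dt * (2.0 * F * V + G * Q * G)
    return V

# dx/dt = -(1/T)x + w: steady state V_ss = Q*T/2 = 1.0 for T = 1, Q = 2
T, Q = 1.0, 2.0
V_ss = propagate_variance(F=-1.0 / T, G=1.0, Q=Q, V0=0.0, dt=0.001, n_steps=20000)
```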
213
SOLO Markov Processes
Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 4)

Differentiating R_x(t, t + τ) with respect to t gives:

(d/dt) R_x(t, t + τ) =
  { F(t) R_x(t, t + τ) + R_x(t, t + τ) F^T(t + τ) + G(t) Q(t) G^T(t) Φ^T(t + τ, t),                τ > 0
  { F(t) R_x(t, t + τ) + R_x(t, t + τ) F^T(t + τ) + Φ(t, t + τ) G(t + τ) Q(t + τ) G^T(t + τ),      τ < 0
Random Processes
214
SOLO Markov Processes
Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise

Given a Continuous Linear System:

ẋ(t) = F(t) x(t) + G(t) w(t)

we want to decide if w(t) can be approximated by a white noise.

Let us start with a first-order linear system with white noise input w′(t):

ẇ(t) = −(1/T) w(t) + (1/T) w′(t)        [ w′(t) → H(s) = 1/(1 + Ts) → w(t) ]

φ_w(t, t_0) = exp(−(t − t_0)/T),   (d/dt) φ_w(t, t_0) = −(1/T) φ_w(t, t_0)

E{[w′(t) − E{w′(t)}][w′(t − τ) − E{w′(t − τ)}]} = Q δ(τ)

R_ww(t, t + τ) := E{[w(t) − E{w(t)}][w(t + τ) − E{w(t + τ)}]},   V_ww(t) := R_ww(t, t)

From (d/dt) V_x = F V_x + V_x F^T + G Q G^T, here with F = −1/T and G = 1/T:

(d/dt) V_ww(t) = −(2/T) V_ww(t) + Q/T²
Random Processes
215
SOLO Markov Processes
Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise
(continue – 1)

(d/dt) V_ww(t) = −(2/T) V_ww(t) + Q/T²

V_ww(t) = V_ww(0) exp(−2t/T) + (Q/2T) [1 − exp(−2t/T)]

V_ww|steady-state = Q/(2T)

R_ww(t, t + τ) = { V_ww(t) exp(−τ/T),        τ > 0
                 { exp(+τ/T) V_ww(t + τ),    τ < 0

For t > 5T:

V_ww(t) ≈ V_ww(t + τ) ≈ V_ww|steady-state = Q/(2T)

R_ww(t, t + τ) ≈ R_ww(τ) ≈ (Q/2T) exp(−|τ|/T)

w′(t) → [ H(s) = 1/(1 + Ts) ] → w(t)
Random Processes
216
SOLO Markov Processes
Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise
(continue – 2)

For t > 5T:   R_ww(t, t + τ) ≈ R_ww(τ) ≈ V_ww|steady-state e^{−|τ|/T} = (Q/2T) e^{−|τ|/T}

Area = ∫_{−∞}^{+∞} V_ww(τ) dτ = 2 ∫_{0}^{+∞} (Q/2T) e^{−τ/T} dτ = Q

T is the correlation time of the noise w(t) and can be found from V_ww(τ) by
taking the time at which V_ww(τ) has dropped to V_ww|steady-state / e.

Another way to find T is by taking the two-sided Laplace Transform L₂ on τ of:

L₂{Q δ(τ)} = Q ∫_{−∞}^{+∞} δ(τ) e^{−sτ} dτ = Q

Φ_ww(s) = L₂{V_ww(τ)} = (Q/2T) ∫_{−∞}^{+∞} e^{−|τ|/T} e^{−sτ} dτ = Q / [1 − (sT)²] = H(s) Q H(−s)

Φ_ww(ω) = Q / [1 + (ω/ω_{1/2})²],   ω_{1/2} = 1/T

T can be found by taking the frequency ω_{1/2} at which the spectrum drops to
half of its peak value Q, and T = 1/ω_{1/2}.
Random Processes
217
SOLO Markov Processes
Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise
(continue – 3)

Let us return to the original system:

ẋ(t) = F(t) x(t) + G(t) w(t)

Compute the power spectrum Φ_ww(s = jω) of w(t) and define Q and T.

If

T < (1/5) · minimum time constant of F = (1/5) · 1/|maximum eigenvalue of F|

then w(t) can be approximated by the white noise w′(t) with

E{[w′(t) − E{w′(t)}][w′(t − τ) − E{w′(t − τ)}]} = Q δ(τ)

If

T > (2/5) · minimum time constant of F

then w(t) must be treated as a colored noise, obtained by passing the predefined
white noise w′(t) through the filter H(s) = 1/(1 + Ts).
Random Processes
218
SOLO Markov Processes
Examples of Markov Processes:
7. Digital Simulation of a Continuous Process

Let us start with a first-order linear system with white noise input w′(t):

ẇ(t) = −(1/T) w(t) + (1/T) w′(t)        [ w′(t) → H(s) = 1/(1 + Ts) → w(t) ]

φ_w(t, t_0) = e^{−(t−t_0)/T},   (d/dt) φ_w(t, t_0) = −(1/T) φ_w(t, t_0)

where  E{[w′(t) − E{w′(t)}][w′(t − τ) − E{w′(t − τ)}]} = Q δ(τ)

w(t) = e^{−(t−t_0)/T} w(t_0) + (1/T) ∫_{t_0}^{t} e^{−(t−τ)/T} w′(τ) dτ

Let us choose t = (k+1)ΔT and t_0 = kΔT:

w[(k+1)ΔT] = e^{−ΔT/T} w(kΔT) + (1/T) ∫_{kΔT}^{(k+1)ΔT} e^{−[(k+1)ΔT−τ]/T} w′(τ) dτ
           =: e^{−ΔT/T} w(kΔT) + w̃(kΔT)
Random Processes
219
SOLO Markov Processes
Examples of Markov Processes:
7. Digital Simulation of a Continuous Process (continue – 1)

Define:  ρ := e^{−ΔT/T}

w[(k+1)ΔT] = ρ w(kΔT) + w̃(kΔT),
w̃(kΔT) := (1/T) ∫_{kΔT}^{(k+1)ΔT} e^{−[(k+1)ΔT−τ]/T} w′(τ) dτ

The variance of the integral term is:

E{w̃²(kΔT)} = (1/T²) ∫∫ e^{−[(k+1)ΔT−τ_1]/T} e^{−[(k+1)ΔT−τ_2]/T} E{w′(τ_1) w′(τ_2)} dτ_1 dτ_2
            = (Q/T²) ∫_{kΔT}^{(k+1)ΔT} e^{−2[(k+1)ΔT−τ]/T} dτ
            = (Q/2T) (1 − e^{−2ΔT/T}) = (Q/2T) (1 − ρ²)

Define w′(k) such that:

E{w′(k) w′(k)} := Q/(2T),   w′(k) := w̃(kΔT) / √(1 − ρ²)

Therefore:

w[(k+1)ΔT] = ρ w(kΔT) + √(1 − ρ²) w′(k)
Random Processes
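The recursion above can be run directly. This sketch (function names, seed, and parameter values are illustrative assumptions) simulates the discrete first-order Gauss-Markov process and checks that the sample variance matches the steady state Q/(2T) and that the lag-one correlation matches ρ = e^{−ΔT/T}:

```python
import random, math

def simulate_gauss_markov(T, Q, dT, n_steps, seed=7):
    """Digital simulation of a first-order Gauss-Markov process:
    w[k+1] = rho*w[k] + sqrt(1 - rho^2)*w'[k], rho = exp(-dT/T),
    with Var(w') = Q/(2T), the continuous steady-state variance."""
    rng = random.Random(seed)
    rho = math.exp(-dT / T)
    std_w = math.sqrt(Q / (2.0 * T))
    w, out = 0.0, []
    for _ in range(n_steps):
        w = rho * w + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, std_w)
        out.append(w)
    return out

w = simulate_gauss_markov(T=1.0, Q=2.0, dT=0.1, n_steps=200000)
var_hat = sum(v * v for v in w) / len(w)                        # ~ Q/(2T) = 1.0
lag1 = sum(w[i] * w[i + 1] for i in range(len(w) - 1)) / (len(w) - 1)
rho_hat = lag1 / var_hat                                        # ~ exp(-0.1)
```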
220
SOLO
Markov Chains
Random Processes
[Figure: a three-state Markov chain (X1, X2, X3) with labeled transition probabilities]
A Markov chain, named after Andrey Markov, is a stochastic
process with the Markov property. Having the Markov property
means that, given the present state, future states are independent
of the past states. In other words, the description of the present
state fully captures all the information that could influence the
future evolution of the process. Being a stochastic process means
that all state transitions are probabilistic.

Andrey Andreevich Markov
1856 - 1922

At each step the system may change its state from the
current state to another state (or remain in the same
state) according to a probability distribution. The
changes of state are called transitions, and the
probabilities associated with the various state changes are
called transition probabilities.

Definition of Markov Chains

A Markov chain is a sequence of random variables X1, X2, X3, … with the Markov
property, namely that, given the present state, the future and past states are independent:

Pr(X_{n+1} = x | X_1 = x_1, …, X_n = x_n) = Pr(X_{n+1} = x | X_n = x_n)
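A Markov chain is easy to simulate: the next state depends only on the current one. This sketch uses a hypothetical 3-state row-stochastic transition matrix (the matrix and all names are assumptions for illustration, not the chain in the deck's figure) and checks that the long-run visit frequencies converge to the stationary distribution, here (11/49, 21/49, 17/49):

```python
import random

# Hypothetical 3-state transition matrix; P[i][j] = Pr(next = j | current = i),
# each row sums to 1.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.3, 0.5]]

def step(state, rng):
    """One Markov transition: sample the next state from row P[state]."""
    u, acc = rng.random(), 0.0
    for j, p in enumerate(P[state]):
        acc += p
        if u < acc:
            return j
    return len(P[state]) - 1

rng = random.Random(8)
state, visits = 0, [0, 0, 0]
for _ in range(100000):
    state = step(state, rng)
    visits[state] += 1
freqs = [v / 100000 for v in visits]   # ~ stationary distribution
```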
221
SOLO
Markov Chains
Random Processes
Properties of Markov Chains
Define the probability of going from state i to state j in m time steps as:

p_{i→j}^{(m)} = Pr(X_m = j | X_0 = i)

and the single-step transition as:

p_{i→j} = Pr(X_1 = j | X_0 = i)

[Figure: the three-state chain with labeled transitions
 p_{1→1} = 0.1, p_{2→1} = 0.6, p_{1→2} = 0.5, p_{3→2} = 0.3, p_{2→3} = 0.3,
 p_{3→1} = 0.3, p_{1→3} = 0.6, p_{3→3} = 0.1, p_{2→2} = 0.2]

For a time-homogeneous Markov Chain:

p_{i→j}^{(m)} = Pr(X_{k+m} = j | X_k = i)

and:

p_{i→j} = Pr(X_{k+1} = j | X_k = i)

so the n-step transition satisfies the Chapman-Kolmogorov equation, for any k
such that 0 < k < n:

p_{i→j}^{(n)} = Σ_{r∈S} p_{i→r}^{(k)} p_{r→j}^{(n−k)}
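The Chapman-Kolmogorov equation says n-step probabilities factor through any intermediate time, which for a finite chain is just matrix multiplication: P^(n) = P^(k) P^(n−k). This sketch (the matrix is a hypothetical example, not the deck's figure) verifies the identity for n = 3, k = 1 and k = 2:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical single-step transition matrix (rows sum to 1).
P = [[0.1, 0.5, 0.4],
     [0.6, 0.2, 0.2],
     [0.3, 0.3, 0.4]]

P2 = mat_mul(P, P)        # two-step probabilities p_{i->j}^(2)
P3 = mat_mul(P2, P)       # three-step, as P^(2) * P^(1)
P3_alt = mat_mul(P, P2)   # Chapman-Kolmogorov: also P^(1) * P^(2)
```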
222
SOLO
Markov Chains
Random Processes
Properties of Markov Chains (continue – 1)
The marginal distribution Pr(X_k = x) is the distribution over states at time k.
For a time-homogeneous Markov Chain:

Pr(X_{k+1} = j) = Σ_{r∈S} Pr(X_{k+1} = j | X_k = r) Pr(X_k = r) = Σ_{r∈S} p_{r→j} Pr(X_k = r)

In matrix form it can be written as:

| Pr(X_1) |         | p_{1→1}  p_{2→1}  ⋯  p_{N→1} | | Pr(X_1) |
| Pr(X_2) |       = | p_{1→2}  p_{2→2}  ⋯  p_{N→2} | | Pr(X_2) |
|    ⋮    |         |    ⋮        ⋮            ⋮   | |    ⋮    |
| Pr(X_N) |_{k+1}   | p_{1→N}  p_{2→N}  ⋯  p_{N→N} | | Pr(X_N) |_k

where N is the number of states of the Markov Chain. For the chain in the figure:

      | 0.1  0.5  0.6 |
K  =  | 0.6  0.2  0.3 |
      | 0.3  0.3  0.1 |

Properties of the Transition Matrix K:

1.  0 ≤ p_{i→j}^{(n)} ≤ 1

2.  Σ_{j=1}^{N} p_{i→j}^{(n)} = 1
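The marginal-distribution recursion Pr_{k+1} = K Pr_k can be iterated directly. This sketch uses the numeric K as reconstructed from the slide (column-stochastic, so that each column i holds the probabilities of leaving state i); the starting distribution and iteration count are illustrative. Repeated application converges to the stationary distribution, here (0.375, 0.375, 0.25):

```python
# Column-stochastic transition matrix as reconstructed from the slide:
# the state-probability vector evolves as Pr_{k+1} = K Pr_k.
K = [[0.1, 0.5, 0.6],
     [0.6, 0.2, 0.3],
     [0.3, 0.3, 0.1]]

def evolve(pr):
    """One step of the marginal distribution: (K pr)_j = sum_i K[j][i] pr_i."""
    return [sum(K[j][i] * pr[i] for i in range(3)) for j in range(3)]

pr = [1.0, 0.0, 0.0]          # start surely in state X1
for _ in range(100):
    pr = evolve(pr)           # converges to the stationary distribution
```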
223
SOLO
Markov Chains
Random Processes
Properties of Markov Chains (continue – 2)
A state j is said to be accessible from a state i (written i → j) if a system started in state i
has a non-zero probability of transitioning into state j at some point. Formally,
state j is accessible from state i if there exists an integer n ≥ 0 such that:

Pr(X_n = j | X_0 = i) = p_{i→j}^{(n)} > 0
Reducibility
Allowing n to be zero means that every state is defined to be accessible from itself.
A state i is said to communicate with state j (written i ↔ j) if both i → j and j → i. A set
of states C is a communicating class if every pair of states in C communicates with each
other, and no state in C communicates with any state not in C. It can be shown that
communication in this sense is an equivalence relation and thus that communicating
classes are the equivalence classes of this relation. A communicating class is closed if
the probability of leaving the class is zero, namely that if i is in C but j is not, then j is
not accessible from i.
Finally, a Markov chain is said to be irreducible if its state space is a single
communicating class; in other words, if it is possible to get to any state from any state.
224
SOLO
Markov Chains
Random Processes
Properties of Markov Chains (continue – 3)
A state i has period k if any return to state i must occur in multiples of k time steps.
Formally, the period of a state is defined as:
k := greatest common divisor { n : Pr(X_n = i | X_0 = i) > 0 }
Periodicity
Note that even though a state has period k, it may not be possible to reach the state in k
steps. For example, suppose it is possible to return to the state in {6,8,10,12,...} time
steps; then k would be 2, even though 2 does not appear in this list.
If k = 1, then the state is said to be aperiodic; otherwise (k>1), the state is said to be
periodic with period k.
It can be shown that every state in a communicating class must have the same period.
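The gcd definition of the period can be computed mechanically from a transition matrix. This sketch (function names and the example chains are illustrative assumptions) tracks which states are reachable in exactly n steps and takes the gcd of the return times:

```python
from math import gcd

def period(P, i, max_n=50):
    """Period of state i: gcd of all n <= max_n with Pr(X_n = i | X_0 = i) > 0.
    `reach[j]` marks the states reachable from i in exactly n steps."""
    n_states = len(P)
    reach = [s == i for s in range(n_states)]
    g = 0
    for n in range(1, max_n + 1):
        reach = [any(reach[r] and P[r][j] > 0 for r in range(n_states))
                 for j in range(n_states)]
        if reach[i]:
            g = gcd(g, n)   # gcd(0, n) == n, so the first return initializes g
    return g

# A deterministic 2-cycle: state 0 is revisited only at even n, so period 2.
P_cyclic = [[0.0, 1.0],
            [1.0, 0.0]]

# A self-loop makes state 0 aperiodic (period 1).
P_aperiodic = [[0.5, 0.5],
               [1.0, 0.0]]
```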
225
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Given a function S(ω) = S(−ω) or, equivalently, a positive-definite function R(τ)
(R(τ) = R(−τ), and R(0) = max R(τ), for all τ), we can find a stochastic process x(t)
having S(ω) as its power spectrum or R(τ) as its autocorrelation.

Proof of Existence Theorem 3

Define

a² := (1/π) ∫_{−∞}^{+∞} S(ω) dω   &   f(ω) := S(ω)/(π a²) = S(−ω)/(π a²) = f(−ω)

Since f(ω) ≥ 0 and ∫_{−∞}^{+∞} f(ω) dω = 1, according to Existence Theorem 1
we can find a random variable ω with the even density function f(ω), and
probability distribution function

P(ω) := ∫_{−∞}^{ω} f(τ) dτ

We now form the process x(t) := a cos(ωt + ϑ), where ϑ is a random variable
uniformly distributed in the interval (−π, +π) and independent of ω.
226
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Given a function S(ω) = S(−ω) or, equivalently, a positive-definite function R(τ)
(R(τ) = R(−τ), and R(0) = max R(τ), for all τ), we can find a stochastic process x(t)
having S(ω) as its power spectrum or R(τ) as its autocorrelation.

Proof of Existence Theorem 3 (continue – 1)

Since ϑ is uniformly distributed in the interval (−π, +π) and independent of ω:

E{x(t, Ω)} = a E{cos(ωt)} E{cos ϑ} − a E{sin(ωt)} E{sin ϑ} = 0      (E{cos ϑ} = E{sin ϑ} = 0)

Indeed, for an integer ϖ:

E{e^{jϖϑ}} = (1/2π) ∫_{−π}^{+π} e^{jϖϑ} dϑ = (e^{jϖπ} − e^{−jϖπ}) / (2πjϖ) = sin(ϖπ)/(ϖπ)

or  E{e^{jϖϑ}} = E{cos(ϖϑ)} + j E{sin(ϖϑ)} = sin(ϖπ)/(ϖπ) = 0   for ϖ = 1, 2

E{x(t, Ω) x(t + τ, Ω)} = a² E{cos(ωt + ϑ) cos(ω(t + τ) + ϑ)}
   = (a²/2) E{cos(ωτ)} + (a²/2) E{cos(ω(2t + τ) + 2ϑ)}
   = (a²/2) E{cos(ωτ)} + (a²/2) E{cos(ω(2t + τ))} E{cos 2ϑ} − (a²/2) E{sin(ω(2t + τ))} E{sin 2ϑ}
   = (a²/2) E{cos(ωτ)}      (E{cos 2ϑ} = E{sin 2ϑ} = 0)
227
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Given a function S(ω) = S(−ω) or, equivalently, a positive-definite function R(τ)
(R(τ) = R(−τ), and R(0) = max R(τ), for all τ), we can find a stochastic process x(t)
having S(ω) as its power spectrum or R(τ) as its autocorrelation.

Proof of Existence Theorem 3 (continue – 2)

We have x(t) := a cos(ωt + ϑ), with

E{x(t)} = 0
E{x(t) x(t + τ)} = (a²/2) E{cos(ωτ)} = (a²/2) ∫_{−∞}^{+∞} f(ω) cos(ωτ) dω = R_x(τ)
                                                            (definition of f(ω))

Because of those two properties x(t) is wide-sense stationary, with a power spectrum
given by the Fourier Transform:

S_x(ω) = ∫ R_x(τ) [cos(ωτ) − j sin(ωτ)] dτ = ∫ R_x(τ) cos(ωτ) dτ      (R_x(τ) = R_x(−τ))

and, by the Inverse Fourier Transform:

R_x(τ) = (1/2π) ∫ S_x(ω) [cos(ωτ) + j sin(ωτ)] dω = (1/2π) ∫ S_x(ω) cos(ωτ) dω
                                                               (S_x(ω) = S_x(−ω))

Comparing with R_x(τ) = (a²/2) ∫ f(ω) cos(ωτ) dω and the definition of f(ω):

S_x(ω) = π a² f(ω) = S(ω)

q.e.d.
228
SOLO Permutation & Combinations
Permutations
Given n objects that can be arranged in a row, how many different permutations
(new orderings of the objects) are possible?
To count the possible permutations, let us start by moving only the first object {1}.

[Figure: the n permutations obtained by moving object {1} through each position]

By moving only the first object {1}, we obtained n permutations.
229
SOLO Permutation & Combinations
Permutations (continue -1)
Since we obtained all the possible position of the first object, we will perform the same
procedure with the second object no {2}, that will change position with all other objects,
in each of the n permutations that we obtained before .
For example from the group 1 we obtain the following new permutations
Number of new permutations
Since this is true for all permutations (n-1 new permutations for each of the first n
permutations) we obtain a total of n (n-1) permutations .
1
2
n-2
n-1
230
SOLO Permutation & Combinations
Permutations (continue -2)
If we will perform the same procedure with the third object {3}, that will change position
with all other objects, besides those with objects no {1} and {2} that we already obtained,
in each of the n (n-1) permutations that we obtained before , we will obtain a total
of n (n-1) (n-2) permutations.
We continue the procedure with the objects {4}, {5}, …, {n}, to obtain finally the total
number of permutations of the n objects:
n (n-1) (n-2) (n-3)… 1 = n !
Gamma Function Γ

The gamma function Γ is defined as:

Γ(a) := ∫_0^∞ t^{a−1} exp(−t) dt

Integrating by parts (u = tⁿ, dv = exp(−t) dt):

Γ(n + 1) = ∫_0^∞ tⁿ exp(−t) dt = [−tⁿ exp(−t)]_0^∞ + n ∫_0^∞ t^{n−1} exp(−t) dt = n Γ(n)

Γ(1) = ∫_0^∞ exp(−t) dt = [−exp(−t)]_0^∞ = 1

Therefore, if a = n is an integer:

Γ(n + 1) = n Γ(n) = n (n − 1) Γ(n − 1) = ⋯ = n (n − 1) (n − 2) ⋯ 2 · 1 = n!
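The identity Γ(n + 1) = n! can be confirmed numerically from the integral definition. This is an illustrative sketch (quadrature scheme, truncation point, and names are assumptions): a simple Riemann/trapezoid sum on [0, 60], where the integrand t⁵e^{−t} for a = 6 is already negligible at the upper end.

```python
import math

def gamma_integral(a, upper=60.0, steps=200000):
    """Approximate Gamma(a) = integral_0^inf t^(a-1) e^(-t) dt by a
    trapezoid sum on [0, upper]; for a > 1 the integrand vanishes at
    t = 0 and is negligible at t = upper."""
    h = upper / steps
    total = 0.0
    for k in range(1, steps):
        t = k * h
        total += t**(a - 1.0) * math.exp(-t)
    return total * h

g6 = gamma_integral(6.0)   # Gamma(6) = 5! = 120
```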
Table of Content
231
SOLO Permutation & Combinations
Combinations
Given k boxes, box i having a maximum object capacity of nᵢ objects.
Given also n objects that must be arranged in the k boxes, each box filled to its
maximum capacity:
n₁ + n₂ + ⋯ + n_k = n
The order of the objects within a box is not important.
Example: a box with a capacity of three objects in which we arranged the objects {2}, {4}, {7}.
The 3! = 6 arrangements
{2,4,7}, {2,7,4}, {4,2,7}, {4,7,2}, {7,2,4}, {7,4,2}
are equivalent: they count as 1 outcome.
232
SOLO Permutation & Combinations
Combinations (continue - 1)
In order to count the different combinations we start with the n! different arrangements of the
n objects.
In each of the n! arrangements the first n₁ objects go to box no. 1, the next n₂
objects to box no. 2, and so on, the last n_k objects going to box no. k; since
n₁ + n₂ + ⋯ + n_k = n
all the objects are in one of the boxes.
233
SOLO Permutation & Combinations
Combinations (continue - 2)
But since the order of the objects in the boxes is not important, to obtain the number of
different combinations we must divide the total number of permutations n! by n₁!, because
of box no. 1, as seen in the example below, where we used n₁ = 2.
[Figure: the arrangements 1 2 3 4 … n−1 n distributed to Box 1 (n₁ = 2 objects), Box 2 (n₂ objects), …, Box k (n_k objects); the n₁! orderings of objects {1} and {2} inside Box 1 give the same combination, so each combination appears n₁! times]
Therefore, since the order of the objects in the boxes is not important, and because
box no. 1 contains exactly n₁ objects, the number of arrangements reduces to n!/n₁!.
234
SOLO Permutation & Combinations
Combinations (continue - 3)
Since the order of the objects in the boxes is not important, to obtain the number of
different combinations we must divide the total number of arrangements n! by n₁!, because
of box no. 1, by n₂!, because of box no. 2, and so on, up to n_k!, because of box no. k, to obtain
n! / (n₁! n₂! ⋯ n_k!)
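This multinomial count can be checked by brute force for small sizes, treating each box as an unordered set; a minimal sketch (the helper names are mine, not from the source):

```python
import math
from itertools import permutations

def multinomial(*ns):
    """Number of ways to put n = sum(ns) distinct objects into boxes of
    sizes n1, ..., nk, ignoring order inside each box: n!/(n1! n2! ... nk!)."""
    coef = math.factorial(sum(ns))
    for ni in ns:
        coef //= math.factorial(ni)
    return coef

def brute(ns):
    """Brute-force count: enumerate all n! arrangements and collapse those
    that fill every box with the same (unordered) set of objects."""
    n = sum(ns)
    seen = set()
    for perm in permutations(range(n)):
        boxes, i = [], 0
        for ni in ns:
            boxes.append(frozenset(perm[i:i + ni]))
            i += ni
        seen.add(tuple(boxes))
    return len(seen)

assert multinomial(2, 1) == 3
assert multinomial(2, 2, 1) == brute((2, 2, 1))   # 5!/(2! 2! 1!) = 30
print(multinomial(2, 2, 1))
```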
Combinations to Bernoulli Trials
To Generalized Bernoulli Trials
Table of Content
235
SOLO Review of Probability
References
[1] W.B. Davenport, Jr., and W.L. Root, "An Introduction to the Theory of Random Signals and Noise", McGraw-Hill, 1958
[2] A. Papoulis, "Probability, Random Variables and Stochastic Processes", McGraw-Hill, 1965
[3] K. Sam Shanmugan, and A.M. Breipohl, "Random Signals – Detection, Estimation and Data Analysis", John Wiley & Sons, 1988
[4] S.M. Ross, "Introduction to Probability Models", 4th Ed., Academic Press, 1989
[5] S.M. Ross, "A Course in Simulation", Macmillan & Collier Macmillan Publishers, 1990
[6] R.N. McDonough, and A.D. Whalen, "Detection of Signals in Noise", 2nd Ed., Academic Press, 1995
[7] Al. Spătaru, "Teoria Transmisiunii Informaţiei – Semnale şi Perturbaţii" ("Theory of Information Transmission – Signals and Perturbations", in Romanian), Editura Tehnică, Bucureşti, 1965
[8] http://www.york.ac.uk/depts/maths/histstat/people/welcome.htm
[9] http://en.wikipedia.org/wiki/Category:Probability_and_statistics
[10] http://www-groups.dcs.st-and.ac.uk/~history/Biographies
Table of Content
236
SOLO Review of Probability
Integrals Used in Probability
∫₀¹ uⁿ (1−u)^m du = n! m! / (n+m+1)!
∫ x exp (a x) dx = exp (a x) (x/a − 1/a²)
∫ x² exp (a x) dx = exp (a x) (x²/a − 2x/a² + 2/a³)
∫₀^∞ exp (−x²) dx = (1/2) √π
∫₀^∞ exp (−a x²) dx = (1/2) √(π/a),  a > 0
∫₋∞^∞ exp (−x²) dx = √π
∫₋∞^∞ exp (−a x²) dx = √(π/a),  a > 0
∫₀^∞ xⁿ exp (−x) dx = n!,  n = 0, 1, 2, 3, …
∫₀^∞ xⁿ exp (−a x) dx = n! / a^(n+1),  a > 0,  n = 0, 1, 2, 3, …
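These identities can be sanity-checked numerically; a crude trapezoidal-rule sketch (truncating the infinite upper limits where the integrand is negligible):

```python
import math

def integrate(f, a, b, n=100_000):
    """Composite trapezoidal rule: a crude numerical check, not a proof."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        s += f(a + i * h)
    return s * h

# Beta integral: int_0^1 u^n (1-u)^m du = n! m! / (n+m+1)!  with n=3, m=2
val1 = integrate(lambda u: u**3 * (1 - u)**2, 0.0, 1.0)
assert math.isclose(val1, math.factorial(3) * math.factorial(2) / math.factorial(6),
                    rel_tol=1e-6)

# Gaussian integral: int_0^inf exp(-x^2) dx = sqrt(pi)/2 (tail cut at x = 10)
val2 = integrate(lambda x: math.exp(-x * x), 0.0, 10.0)
assert math.isclose(val2, math.sqrt(math.pi) / 2, rel_tol=1e-6)

# int_0^inf x^n exp(-a x) dx = n!/a^(n+1) with n = 4, a = 2 (tail cut at x = 40)
val3 = integrate(lambda x: x**4 * math.exp(-2 * x), 0.0, 40.0)
assert math.isclose(val3, math.factorial(4) / 2**5, rel_tol=1e-6)
print("integrals verified")
```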
237
SOLO Review of Probability
Gamma Function
238
SOLO Review of Probability
Incomplete Gamma Function
January 6, 2015 239
SOLO
Technion – Israel Institute of Technology
1964 – 1968 BSc EE
1968 – 1971 MSc EE
Israeli Air Force
1970 – 1974
RAFAEL – Israel Armament Development Authority
1974 – 2013
Stanford University
1983 – 1986 PhD AA
240
SOLO Review of Probability
Ferdinand Georg Frobenius
(1849 –1919)
Perron–Frobenius Theorem
In linear algebra, the Perron–Frobenius Theorem, named
after Oskar Perron and Georg Frobenius, asserts that a real
square matrix with positive entries has a unique largest real
eigenvalue and that the corresponding eigenvector has
strictly positive components. This theorem has important
applications to probability theory (ergodicity of Markov
chains) and the theory of dynamical systems (subshifts of
finite type).
Oskar Perron
(1880 – 1975)
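The probabilistic application mentioned above can be observed numerically: for a Markov transition matrix with positive entries, the Perron eigenvalue is 1 and power iteration converges to the strictly positive stationary distribution. A minimal sketch (the 3×3 chain is an arbitrary example):

```python
# Perron-Frobenius in action: a Markov transition matrix P (positive entries,
# rows summing to 1) has a unique largest eigenvalue, equal to 1; iterating
# pi <- pi P converges to the strictly positive stationary distribution.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]                       # example positive chain

pi = [1.0 / 3.0] * 3                        # arbitrary starting distribution
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
    s = sum(pi)
    pi = [x / s for x in pi]                # s -> Perron eigenvalue = 1

print(pi)
```

The limit is independent of the starting distribution, which is the ergodicity property of such chains.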
SOLO Review of Probability
Monte Carlo Categories
1. Monte Carlo Calculations
Design various random or pseudo-random number generators.
2. Monte Carlo Sampling
Develop efficient (variance – reduction oriented) sampling techniques for
estimation.
3. Monte Carlo Optimization
Optimize (possibly non-convex, non-differentiable) functions by methods such as
simulated annealing, dynamic weighting, and genetic algorithms.
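A classic instance of category 2 is estimating a quantity by random sampling, e.g. π/4 as the probability that a uniform point in the unit square falls inside the quarter disc; a minimal sketch:

```python
import random

# Monte Carlo estimation: P[(u,v) in the quarter disc] = pi/4 for (u,v)
# uniform on the unit square; the sample frequency estimates it with
# standard error O(1/sqrt(N)).
rng = random.Random(42)
N = 200_000
hits = sum(1 for _ in range(N)
           if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
pi_hat = 4.0 * hits / N
print(pi_hat)
```

Variance-reduction techniques such as importance sampling, covered earlier in the deck, aim precisely at shrinking this O(1/√N) error constant.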

Introduction to Mathematical Probability

  • 1.
  • 2.
    2 SOLO Table of Content Probability SetTheory Probability Definitions Theorem of Addition Conditional Probability Total Probability Theorem Statistical Independent Events Theorem of Multiplication Conditional Probability - Bayes Formula Random Variables Probability Distribution and Probability Density Functions Conditional Probability Distribution and Conditional Probability Density Functions Expected Value or Mathematical Expectation Variance Moments Functions of one Random Variable Jointly, Distributed Random Variables Characteristic Function and Moment-Generating Function Existence Theorems (Theorem 1 & Theorem 2)
  • 3.
    3 SOLO Table of Content(continue - 1) Probability Law of Large Numbers (History) Markov’s Inequality Chebyshev’s Inequality Bienaymé’s Inequality Chernoff’s and Hoeffding’s Bounds Chernoff’s Bound Hoeffding’s Bound Convergence Concepts The Law of Large Numbers Central Limit Theorem Bernoulli Trials – The Binomial Distribution Poisson Asymptotical Development (Law of Rare Events) Normal (Gaussian) Distribution De Moivre-Laplace Asymptotical Development Laplacian Distribution Gama Distribution Beta Distribution Distributions
  • 4.
    4 SOLO Table of Content(continue - 2) Probability Cauchy Distribution Exponential Distribution Chi-square Distribution Student’s t-Distribution Uniform Distribution (Continuous) Rayleigh Distribution Rice Distribution Weibull Distribution Kinetic Theory of Gases Maxwell’s Velocity Distribution Molecular Models Boltzman Statistics Bose-Einstein Statistics Fermi-Dirac Statistics Monte Carlo Method Generating Continuous Random Variables Importance Sampling Generating Discrete Random Variables Metropolis & Metropolis – Hastings Algorithms Markov Chain Monte Carlo (MCMC) Gibbs Sampling Monte Carlo Integration
  • 5.
    5 SOLO Table of Content(continue - 3) Probability Appendices Permutations Combinations References Random Processes Stationarity of a Random Process Ergodicity Markov Processes White Noise Markov Chains Existence Theorems (Theorem 3)
  • 6.
    6 SOLO Set Theory A= (ζ1, ζ2,…, ζn) – a set of n elements A set A is a collection of objects (elements of the set) ζ1, ζ2,…, ζn A (x)= (|x| < 1) – a set of all numbers smaller than 1 A (x,y)= (0 <x <T, 0<y<T) – a set of points (x,y) in a square Ǿ - the set that contains no elements S - the set that contains all elements (Set space) S A = Set space of a die: six independent events {1}, {2}, {3}, {4}, {5}, {6} Examples
  • 7.
    7 SOLO Set Theory SetOperations Inclusion - A is included in B ifBA ⊂ BxAx ∈⇒∈∀ Equality ( ) ( )ABandBABA ⊂⊂⇔= Addition BxorAxifBAxBA ∈∈∪∈⇒∪ Multiplication BxandAxifBAxBA ∈∈∩∈⇒∩ ( ) ( )CBACBA ∪∪=∪∪ AAA =∪ AOA =/∪ SSA =∪ AAA =∩ OOA /=/∩ ASA =∩ S A B S A B BA∪ S A B BA∩ Complement ofA A OAAandSAA /=∩=∪⇒ S A A Difference BABA ∩=− S BA− B A AB −
  • 8.
    8 SOLO Set Theory SetOperations Incompatible Sets A and B are incompatible iff OBA /=∩ Decomposition of a Set jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 S OBA /=∩ B A S nAAAA ∪∪∪= 21 1A 2A nA jiOAA ji ≠∀/=∩ S 1A 2A nA If we say that A is decomposed in incompatible sets. jiOAAandSAAA jin ≠∀/=∩=∪∪∪ 21If we say that the set space S is decomposed in exhaustive and incompatible sets. De Morgan Law ( ) BABA ∩=∪ ( ) BABA ∪=∩ To find the complement of a set operations we must interchange between and , and use the complements of the sets.∩ ∪ August De Morgan (1806 – 1871) On other form of De Morgan Law AA i i i  = i i i i AA  = Table of Content
  • 9.
    9 SOLO Probability Pr (A)is the probability of the event A if S nAAAA ∪∪∪= 21 1A 2A nA jiOAA ji ≠∀/=∩ ( ) 0Pr ≥A(1) (3) If jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 ( ) 1Pr =S(2) then ( ) ( ) ( ) ( )nAAAA PrPrPrPr 21 +++=  Probability Axiomatic Definition Probability Geometric Definition Assume that the probability of an event in a geometric region A is defined as the ratio between A surface to surface of S. ( ) ( ) ( )SSurface ASurface A =Pr ( ) 0Pr ≥A(1) ( ) 1Pr =S(2) (3) If jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 then ( ) ( ) ( ) ( )nAAAA PrPrPrPr 21 +++=  S A
  • 10.
    10 SOLO Probability From thosedefinition we can prove the following:( ) 0=/OP(1’) Proof: OOSandOSS /=/∩/∪= ( ) ( ) ( ) ( ) ( ) 0PrPrPrPr 3 =/⇒/+=⇒ OOSS ( ) ( )APAP −= 1(2’) Proof: OAAandAAS /=∩∪= ( ) ( ) ( ) ( ) ( ) ( ) ( )AAAAS Pr1PrPrPr1Pr 32 −=⇒+==⇒ ( ) 1Pr0 ≤≤ A(3’) Proof: ( ) ( ) ( ) ( ) ( ) 1Pr0Pr1Pr 1'2 ≤⇒≥−= AAA ( ) ( )APr0 1 ≤ ( ) 0Pr ≥A(1) ( ) 1Pr =S(2) (3) If jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 then ( ) ( ) ( ) ( )n AAAA PrPrPrPr 21 +++=  ( ) ( )AABAIf PrPr ≤⇒⊂(4’) Proof: ( ) ( ) ( ) ( ) ( ) ( )BAAABB PrPr0PrPrPr 00 3 ≤⇒≥+−= ≥≥  ( ) ( ) OAABandAABB /=∩−∪−= ( ) ( ) ( ) ( )BABABA ∩−+=∪ PrPrPrPr(5’) Proof: ( ) ( ) ( ) ( ) ( ) ( ) OABBAandABBAB OABAandABABA /=−∩∩−∪∩= /=−∩−∪=∪ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )BABABA ABBAB ABABA ∩−+=∪⇒     −+∩= −+=∪ PrPrPrPr PrPrPr PrPrPr 3 3 Table of Content
  • 11.
    11 SOLO Probability ( )( ) ( ) ( )      −+−+−=      = −       ≠≠       ≠       == ∑∑∑   n i i n n kji kji kji n ji ji ji n i i n i i AAAAAAAA 1 1 3 ,. 2 . 1 11 Pr1PrPrPrPr(6’) Proof by induction: ( ) ( ) ( ) ( )212121 PrPrPrPr AAAAAA ∩−+=∪For n = 2 we found that satisfies the equation Assume equation true for n – 1. ( ) ( ) ( ) ( ) ( ) ( )   −+−+−=     − = −       − ≠≠       − ≠       − = − = ∑∑∑   1 1 2 3 1 ,. 2 1 . 1 1 1 1 1 Pr1PrPrPrPr n i ni n n kji kji nkji n ji ji nji n i ni n i ni AAAAAAAAAAAAA Let calculate for n but ( ) ( ) ( ) ( ) ( ) ( ) ( )   −+     −+−+−=    −+     =         =      − = − = −       − ≠≠       − ≠       − = − = − = − == ∑∑∑     1 1 1 1 2 3 1 ,. 2 1 . 1 1 1 1 1 1 1 1 11 Pr1PrPrPr PrPrPrPrPr n i nin n i i n n kji kji kji n ji ji ji n i i n n i in n i in n i i n i i AAPAPAAAAAAA AAAAAAA ( ) ( ) ( ) ( )      −+−+−=      − = −       − ≠≠       − ≠       − = − = ∑∑∑   1 1 2 3 1 ,. 2 1 . 1 1 1 1 1 Pr1PrPrPrPr n i i n n kji kji kji n ji ji ji n i i n i i AAAAAAAA Theorem of Addition
  • 12.
    12 SOLO Probability (6’) Proof byinduction (continue): ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )   −−−−+−+      −++−+−=      − = −       − ≠≠       − ≠       − = − = −       − ≠≠≠       − ≠≠       − ≠       − == ∑∑∑ ∑∑∑∑     1 1 2 3 1 ,. 2 1 . 1 1 1 1 1 2 4 1 .,. 3 1 ,. 2 1 . 1 1 11 Pr1PrPrPrPr Pr1PrPrPrPrPr n i ni n n kji kji nkji n ji ji nji n i nin n i i n n lkji lkji lkji n kji kji kji n ji ji ji n i i n i i AAAAAAAAAAAA AAAAAAAAAAAA Use the fact that ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )       − − +      − = −−− − + −− − = −− − − = − =      1 11 !1!1 !1 !!1 !1 !!1 !1 !! ! k n k n kkn n kkn n kkn n kn n kkn n k n to obtain q.e.d. ( ) ( ) ( ) ( )      −+−+−=      = −       ≠≠       ≠       == ∑∑∑   n i i n n kji kji kji n ji ji ji n i i n i i AAAAAAAA 1 1 3 ,. 2 . 1 11 Pr1PrPrPrPr ( ) ( ) ( ) ( ) ( ) ( )   −+−+−=     − = −       − ≠≠       − ≠       − = − = ∑∑∑   1 1 2 3 1 ,. 2 1 . 1 1 1 1 1 Pr1PrPrPrPr n i ni n n kji kji nkji n ji ji nji n i ni n i ni AAAAAAAAAAAAA ( ) ( ) ( ) ( )      −+−+−=      = −       ≠≠       ≠       == ∑∑∑   n i i n n kji kji kji n ji ji ji n i i n i i AAAAAAAA 1 1 3 ,. 2 . 1 11 Pr1PrPrPrPr Theorem of Addition (continue) Table of Content
  • 13.
    13 SOLO Probability Conditional Probability SnAAAA ααα ∪∪∪= 21  1αA jiOAA ji ≠∀/=∩ 1αβA mAAAB βββ ∪∪∪= 212αA 2αβA 1βA 2βA  Given two events A and B decomposed in elementary events jiOAAandAAAAA ji n i in ≠∀/=∩=∪∪∪= = αααααα  1 21 lkOAAandAAAAB lk m k km ≠∀/=∩=∪∪∪= = ββββββ  1 21 jiOAAandAAABA jir ≠∀/=∩∪∪∪=∩ αβαβαβαβαβ 21 ( ) ( ) ( ) ( )n AAAA ααα PrPrPrPr 21 +++=  ( ) ( ) ( ) ( )mAAAB βββ PrPrPrPr 21 +++=  ( ) ( ) ( ) ( ) nmrAAABA r ,PrPrPrPr 21 ≤+++=∩ βαβαβα  We want to find the probability of A event under the condition that the event B had occurred designed as P (A|B) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )B BA AAA AAA BA m r Pr Pr PrPrPr PrPrPr |Pr 21 21 ∩ = +++ +++ = βββ βαβαβα  
  • 14.
    14 SOLO Probability Conditional ProbabilityS nAAAA ααα ∪∪∪= 21  1αA jiOAA ji ≠∀/=∩ 1αβA mAAAB βββ ∪∪∪= 212αA 2αβA 1βA 2βA  If the events A and B are statistical independent, that the fact that B occurred will not affect the probability of A to occur. ( ) ( ) ( )B BA BA Pr Pr |Pr ∩ = ( ) ( ) ( )A BA AB Pr Pr |Pr ∩ = ( ) ( )ABA Pr|Pr = ( ) ( ) ( ) ( ) ( ) ( ) ( )BAAABBBABA PrPrPr|PrPr|PrPr ⋅=⋅=⋅=∩ Definition: n events Ai i = 1,2,…n are statistical independent if: ( ) nrAA r i i r i i ,,2PrPr 11  =∀=      ∏== Table of Content
  • 15.
    15 SOLO Probability Conditional Probability- Bayes Formula Using the relation: ( ) ( ) ( ) ( ) ( )llll AABBBABA ββββ Pr|PrPr|PrPr ⋅=⋅=∩ ( ) ( ) ( ) klOBABABAB lk m k k , 1 ∀/=∩∩∩∩= = βββ ( ) ( )∑ = ∩= m k k BAB 1 PrPr β we obtain: ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∑= ⋅ ⋅ = ⋅ = m k kk llll l AAB AAB B AAB BA 1 Pr|Pr Pr|Pr Pr Pr|Pr |Pr ββ ββββ β and Bayes Formula Thomas Bayes 1702 - 1761 Table of Content ( ) ( ) ( ) ( ) ( ) ( )∑∑∑ === ⋅=⋅=∩= m k kk m k k m k k AABBBABAB 111 Pr|PrPr|PrPrPr ββββ
  • 16.
    16 SOLO Probability Total ProbabilityTheorem Table of Content jiOAAandSAAA jin ≠∀/=∩=∪∪∪ 21If we say that the set space S is decomposed in exhaustive and incompatible (exclusive) sets. The Total Probability Theorem states that for any event B, its probability can be decomposed in terms of conditional probability as follows: ( ) ( ) ( ) ( )∑∑ == == n i i n i i BPBABAB 11 |Pr,PrPr Using the relation: ( ) ( ) ( ) ( ) ( )llll AABBBABA Pr|PrPr|PrPr ⋅=⋅=∩ ( ) ( ) ( ) klOBABABAB lk n k k , 1 ∀/=∩∩∩∩= =  ( ) ( )∑= ∩= n k k BAB 1 PrPr For any event B we obtain:
  • 17.
    17 SOLO Probability Statistical IndependentEvents ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∏∑∏∑∏∑ ∑∑∑ = −       ≠≠ =       ≠ =       = = −       ≠≠       ≠       == −+−+−=       −+−+−=      n i i n n kji kji i i n ji ji i i n i i tIndependen lStatisticaA n i i n n kji kji kji n ji ji ji n i i n i i AAAA AAAAAAAA i 1 1 3 ,. 3 1 2 . 2 1 1 1 1 1 3 ,. 2 . 1 11 Pr1PrPrPr Pr1PrPrPrPr    From Theorem of Addition Therefore ( )[ ]∏== −=      − n i i tIndependen lStatisticaA n i i AA i 11 Pr1Pr1  ( )[ ]∏== −−=      n i i tIndependen lStatisticaA n i i AA i 11 Pr11Pr  Since OAASAA n i i n i i n i i n i i /=               =               ====   1111 &         =      − ==  n i i n i i AA 11 PrPr1 ( )∏== =      n i i tIndependen lStatisticaA n i i AA i 11 PrPr  If the n events Ai i = 1,2,…n are statistical independent than are also statistical independentiA ( )∏= = n i iA 1 Pr      = =  n i i MorganDe A 1 Pr ( )[ ]∏= −= n i i tIndependen lStatisticaA A i 1 Pr1 ( ) nrAA r i i r i i ,,2PrPr 11  =∀=      ∏== Table of Content
  • 18.
    18 SOLO Probability Theorem ofMultiplication ( ) ( ) ( ) ( ) ( )12112312121 |Pr|Pr|PrPrPr AAAAAAAAAAAAA nnn  −⋅⋅= Proof ( ) ( ) ( )ABABA /PrPrPr ⋅=∩Start from ( )[ ] ( ) ( )12121 /PrPrPr AAAAAAA nn  ⋅= ( ) ( ) ( )2131212 /Pr/Pr/Pr AAAAAAAAA nn  ⋅= in the same way ( ) ( ) ( )12122112211 /Pr/Pr/Pr −−−−− ⋅= nnnnnnn AAAAAAAAAAAAA   From those results we obtain: ( ) ( ) ( ) ( ) ( )12112312121 |Pr|Pr|PrPrPr AAAAAAAAAAAAA nnn  −⋅⋅= q.e.d. Table of Content
  • 19.
    19 SOLO Review ofProbability Random Variables Let ascribe to each outcome or event a real number, such we have a one-to-one correspondence between the real numbers and the Space of Events. Any function that assigns a real number to each event in the Space of Events is called a random variable (a random function is more correct, but this is the used terminology). X x 0 X 1 2 3 4 5 6 x The random variables can be: - Discrete random variables for discrete events - Continuous random variables for continuous events Table of Content
  • 20.
    20 SOLO Review ofProbability Probability Distribution and Probability Density Functions The random variables map the space of events X to the space of real numbers x. ( )xP x 0 ∞+∞− 0.1 The Probability Distribution Function or Cumulative Probability Distribution Function of x can be defined as: (1) PX (x) is a monotonic increasing function ( ) ( ) ∞≤≤∞−≤= xxXxPX Pr: The Probability Distribution Function has the following properties: ( ) ∞≤≤∞−=∞− xPX 0 (2) ( ) ∞≤≤∞−=∞+ xPX 1 (3) ( ) ( ) 2121 xxxPxP XX ≤⇔≤ The Probability that X lies in the interval (a,b) is given by: ( ) ( ) ( ) 0Pr ≥−=≤< aPbPbXa XX If PX (x) is a continuous differentiable function of x we can define ( ) ( ) ( ) ( ) ( ) 0lim Pr lim: 00 ≥= ∆ −∆+ = ∆ ∆+≤< = →∆→∆ xd xPd x xPxxP x xxXx xp XXX xx X the Probability Density Function of x. ( )xp x 0 ∞+∞− 0.1
  • 21.
    21 SOLO Review ofProbability Probability Distribution and Probability Density Functions (continue – 1) The Probability Distribution and Probability Density Functions of x can be defined also for discrete random variables. ( ) ( ) ( ) ( ) ( ) integer 61 616/1 10 6 1 Pr 0 6 10 k k kk k dxixdxxpkXkxP k i k XX      ≤ <≤++ < =−==≤== ∫∑∫ = δ ( )xp x 0 6 0.1 1 2 3 4 5 ( )xP 6/1 3/1 2/1 3/2 6/5 Example Set space of a die: six independent events {x=1}, {x=2}, {x=3}, {x=4}, {x=5}, {x=6} ( ) ( )∑ = −= 6 16 1 : i X ixxp δ Where δ (x) is the Dirac delta function ( ) ( ) 1& 0 00 =    =∞ ≠ = ∫ +∞ ∞− dxx x x x δδ X 1 2 3 4 5 6 x
  • 22.
    22 SOLO Review ofProbability Probability Distribution and Probability Density Functions (Examples) (2) Poisson’s Distribution ( ) ( )0 0 exp ! , k k k nkp k −≈ (1) Binomial (Bernoulli) ( ) ( ) ( ) ( ) knkknk pp k n pp knk n nkp −− −      =− − = 11 !! ! , 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 k ( )nkP , (3) Normal (Gaussian) ( ) ( ) ( )[ ] σπ σµ σµ 2 2/exp ,; 22 −− = x xp (4) Laplacian Distribution ( )         − −= b x b bxp µ µ exp 2 1 ,;
  • 23.
    23 SOLO Review ofProbability Probability Distribution and Probability Density Functions (Examples) (5) Gama Distribution ( ) ( ) ( )      < ≥ Γ − = − 00 0 /exp ,; 1 x xx k x kxp k k θ θ θ (6) Beta Distribution ( ) ( ) ( ) ( ) ( ) ( ) ( ) 11 1 0 11 11 1 1 1 ,; −− −− −− − ΓΓ +Γ = − − = ∫ βα βα βα βα βα βα xx duuu xx xp (7) Cauchy Distribution ( ) ( )       +− = 22 0 0 1 ,; γ γ π γ xx xxp
  • 24.
    24 SOLO Review ofProbability Probability Distribution and Probability Density Functions (Examples) SOLO (8) Exponential Distribution ( ) ( )    < ≥− = 00 0exp ; x xx xp λλ λ (9) Chi-square Distribution ( ) ( ) ( ) ( )      < ≥− Γ= − 00 02/exp 2/ 2/1 ; 12/ 2/ x xxx kkxp k k Γ is the gamma function ( ) ( )∫ ∞ − −=Γ 0 1 exp dttta a (10) Student’s t-Distribution ( ) ( )[ ] ( ) ( )( ) 2/12 /12/ 2/1 ; + +Γ +Γ = ν ννπν ν ν x xp
  • 25.
    25 SOLO Review ofProbability Probability Distribution and Probability Density Functions (Examples) SOLO (11) Uniform Distribution (Continuous) ( )      >> ≤≤ −= bxxa bxa abbaxp 0 1 ,; (12) Rayleigh Distribution ( ) 2 2 2 2 exp ; σ σ σ       − = x x xp (13) Rice Distribution ( )             + − = 202 2 22 2 exp ,; σσ σ σ vx I vx x vxp
  • 26.
    26 SOLO Review ofProbability Probability Distribution and Probability Density Functions (Examples) (14) Weibull Distribution SOLO ( )      < >≥               − −      − = − 00 0,,exp ,,; 1 x x xx xp αγµ α µ α µ α γ αµγ γγ Weibull Distribution Table of Content
  • 27.
    27 SOLO Review ofProbability Conditional Probability Distribution and Conditional Probability Density Functions SOLO The Conditional Probability Distribution Function or Cumulative Conditional Probability Distribution Function of x given is defined as: ( ) ( ) ∞<<∞−∈≤= xYyxXyxP YX /Pr:// Yy ∈ (1) ( ) ∞≤≤∞−=∞− xyP YX 0// (2) ( ) ∞≤≤∞−=∞+ xyP YX 1// PX/Y (x/y) is a monotonic increasing function(3) ( ) ( ) 212/1/ // xxyxPyxP YXYX ≤⇔≤ The Probability that X lies in the interval (a,b) given is given by: ( ) ( ) ( ) 0///Pr // ≥−=≤< yaPybPYbXa YXYX If PX/Y (x/y) is a continuous differentiable function of x we can define ( ) ( ) ( ) ( ) ( ) 0 /// lim /Pr lim:/ /// 00 / ≥= ∆ −∆+ = ∆ ∆+≤< = →∆→∆ xd yxPd x yxPyxxP x YxxXx yxp YXYXYX xx YX the Conditional Probability Density Function of x. Yy ∈ The random variables map the space of events X to the space of real numbers x.
  • 28.
    28 SOLO Review ofProbability Conditional Probability Distribution and Conditional Probability Density Functions SOLO Example 1 Given PX (x) and pX (x) find PX/Y (x/x ≤ a) and pX/Y (x/x ≤ a) ( ) ( ) ( )   ≤ > =≤ axaPxP ax axxP XX YX / 1 // ( ) ( ) ( )   ≤ > =≤ axaPxp ax axxp XX YX / 0 // Example 2 Given PX (x) and pX (x) find PX/Y (x/ b <x ≤ a) and pX/Y (x/ b< x ≤ a) ( ) ( ) ( ) ( ) ( )        ≥ <≤ − − < =≤ ax axb bPaP bPxP bx axxP XX XX YX 1 0 // ( ) ( ) ( ) ( )        ≥ <≤ − < =≤ ax axb bPaP xp bx axxp XX X YX 0 0 // Table of Content
  • 29.
    29 SOLO Review ofProbability Expected Value or Mathematical Expectation Given a Probability Density Function p (x) we define the Expected Value For a Continuous Random Variable: ( ) ( )∫ +∞ ∞− = dxxpxxE X: For a Discrete Random Variable: ( ) ( )∑= k kXk xpxxE : For a general function g (x) of the Random Variable x: ( )[ ] ( ) ( )∫ +∞ ∞− = dxxpxgxgE X: ( )xp x 0 ∞+∞− 0.1 ( )xE ( ) ( ) ( )∫ ∫ ∞+ ∞− +∞ ∞− = dxxp dxxpx xE X X : The Expected Value is the center of surface enclosed between the Probability Density Function and x axis. Table of Content
  • 30.
    30 SOLO Review ofProbability Variance Given a Probability Density Functions p (x) we define the Variance ( ) ( )[ ]{ } ( ) ( )[ ] ( ) ( )22222 2: xExExExExxExExExVar −=+−=−= Central Moment ( ) { }k k xEx =:'µ Given a Probability Density Functions p (x) we define the Central Moment of order k about the origin ( ) ( )[ ]{ } ( ) ( )∑= −− −      =−= k j jk j jkk k xE j k xExEx 0 '1: µµ Given a Probability Density Functions p (x) we define the Central Moment of order k about the Mean E (x) Table of Content
  • 31.
    31 SOLO Review ofProbability Moments Normal Distribution ( ) ( ) ( )[ ] σπ σ σ 2 2/exp ; 22 x xpX − = [ ] ( )    −⋅ = oddnfor evennforn xE n n 0 131 σ [ ] ( )      += =−⋅ = + 12!2 2 2131 12 knfork knforn xE kk n n σ π σ Proof: Start from: and differentiate k time with respect to a( ) 0exp 2 >=−∫ ∞ ∞− a a dxxa π Substitute a = 1/(2σ2 ) to obtain E [xn ] ( ) ( ) 0 2 1231 exp 12 22 > −⋅ =− + ∞ ∞− ∫ a a k dxxax kk k π [ ] ( ) ( )[ ] ( ) ( )[ ] ( ) ( ) 12 ! 0 122/ 0 222221212 !2 2 exp 2 22 2/exp 2 2 2/exp 2 1 2 + ∞+ = ∞∞ ∞− ++ =−= −=−= ∫ ∫∫ kk k k k xy kkk kdyyy xdxxxdxxxxE σ πσ σ π σ σπ σ σπ σ    Now let compute: [ ] [ ]( )2244 33 xExE == σ Chi-square
  • 32.
    32 SOLO Review ofProbability Moments Gama Distribution ( ) ( ) ( )      < ≥ Γ − = − 00 0 /exp ,; 1 x xx k x kxp k k θ θ θ Beta Distribution ( ) ( ) ( ) ( ) ( ) ( ) ( ) 11 1 0 11 11 1 1 1 ,; −− −− −− − ΓΓ +Γ = − − = ∫ βα βα βα βα βα βα xx duuu xx xp [ ] ( ) ( ) ( ) ( ) ( ) ( ) n knn kn k n k kndxx x k dxxx k xE θ θθ θ θ θ θ Γ +Γ =      − Γ =− Γ = ∫∫ ∞ −+∞ −+ 0 1 0 1 /exp/exp 1 Γ is the gamma function ( ) ( )∫ ∞ − −=Γ 0 1 exp dttta a
  • 33.
    33 SOLO Review ofProbability Moments Uniform Distribution (Continuous) ( )      >>− ≤≤− =− cxxc cxc cccxp 0 2 1 ,; [ ]      += + == − + − ∫ oddnfor evennfor n c n x c dxx c xE n c c nc c nn 0 12 12 1 2 1 2 1 Rayleigh Distribution ( )       −= 2 2 2 2 exp; σσ σ xx xp [ ] ( )      = −=−⋅ = =      −=      −= ∫∫ ∞ ∞− + ∞ knfork knforn dx x xdx xx xxE kk n nnn 2!2 12131 2 2 exp 2 1 2 exp 2 2 2 1 0 22 2 2 σ σ π σσσσ 
  • 34.
    34 SOLO Review ofProbability Example Repeat an experiment m times to obtain X1, X2,…,Xm. Define: Statistical Estimation: m XXX X m m +++ = 21 Sample Variation: ( ) ( ) ( ) m XXXXXX V mmmm m 22 2 2 1 −++−+− =  ( ) ( )[ ] 22 σµµ =−= ii XEXE ( ) ( )[ ] jiXXE ji ≠∀=−− 0µµSince the experiment are uncorrelated: ( ) ( ) ( ) ( ) µ= +++ = m XEXEXE XE m m 21 ( ) ( )[ ]{ } ( ) ( ) ( ) mm m m XXX EXEXEXVar m mmmXm 2 2 22 21 22 σσµµµ σ ==               −++−+− =−== 
  • 35.
    35 SOLO Review ofProbability Example (continue) Statistical Estimation: m XXX X m m +++ = 21 Sample Variation: ( ) ( ) ( ) m XXXXXX V mmmm m 22 2 2 1 −++−+− =  Let compute: ( )mVE ( ) ( ) ( )[ ]{ } ( )[ ] ( )[ ] ( )( )[ ] m XXEXEXE m XXE m XX E mimimimi µµµµµµ −−−−+− = −−− =         − 2 2222 ( )( )[ ] ( ) ( ) ( ) ( ) ( )[ ] ( ) ( )[ ] mm XXX XEXXE jiXXE XE mi imi ji i 20 1 22 σµµµ µµµ µµ σµ ≠∀=−− =− =      −++−++− −=−−  Therefore: ( ) ( )[ ] ( )[ ] ( )( )[ ] 2 22 2 222 1 2 2 σ σσ σ µµµµ m m m mm m XXEXEXE m XX E mimimi − = −+ = −−−−+− =         − ( ) ( )[ ] ( )[ ] ( )[ ] 2 2 22 2 2 1 1 1 σ σ m m m m m m m XXEXXEXXE VE mmmm m − = − = −++−+− =  Table of Content
  • 36.
    36 SOLO Review ofProbability Functions of one Random Variable Let y = g (x) a given function of the random variable x defined o the domain Ω, with probability distribution pX (x). We want to find pY (y). Fundamental Theorem Assume x1, x2, …, xn all the solutions of the equation ( ) ( ) ( )n xgxgxgy ==== 21 ( ) ( ) ( ) ( ) ( ) ( ) ( )n nXXX Y xg xp xg xp xg xp yp ''' 2 2 1 1 +++=  ( ) ( ) xd xgd xg =:' Proof ( ) ( ) ( ) ( ) ( ) ( )∑∑∑ === ==±≤≤=+≤≤= n i i iX n i iiX n i iiiY yd xg xp xdxpxdxxxydyYyydyp 111 ' PrPr: q.e.d. Cauchy Distribution Derivation of Chi-square
  • 37.
    37 SOLO Review ofProbability Functions of one Random Variable (continue – 1) Example 1 bxay += ( )       − = a by p a yp XY 1 Example 2 x a y = ( )       = y a p y a yp XY 2 Example 3 2 xay = ( ) ( )yU a y p a y p ya yp XXY                 −+         = 2 1 Example 4 xy = ( ) ( ) ( )[ ] ( )yUypypyp XXY −+= Table of Content
  • 38.
    38 SOLO Review ofProbability Jointly, Distributed Random Variables We are interested in function of several variables. ( ) ( )nnnXXX xXxXxXxxxP n ≤≤≤= ,,,Pr:,,, 22112121  The Jointly Cumulative Probability Distribution of the random variables X1, X2, …,Xn is defined as: The Cumulative Probability Distribution of the random variable Xi, can be obtained from ( ) ( ) ( )∞∞∞=∞≤≤∞≤∞≤= ,,,,,,,,,,Pr 2121   iXXXniiiX xPXxXXXxP ni ( )nXXX xxxP n ,,, 2121  If the Jointly Cumulative Probability Distribution is continuous and differentiable in each of the components than we can define the Joint Probability Density Function as: ( ) ( ) n nXXX n nXXX xxx xxxP xxxp n n ∂∂∂ ∂ =      21 21 21 ,,, :,,, 21 21 ( ) ( )∫ ∫ ∫ ∞ ∞− ∞ ∞− ∞ ∞− ≠ ≠ = ik ik ni nknXXXiX xdxdxdxxxpxp ,,,,,,, 12121  
  • 39.
    39 SOLO Review ofProbability Jointly, Distributed Random Variables (continue – 1) We define: ( )[ ] ( ) ( )∫ ∫ ∞ ∞− ∞ ∞− = nnXXXnn xdxdxxxpxxxgxxxgE n ,,,,,,,,:,,, 1212121 21   ∑ = =+++= m i imm XXXXS 1 21 :  Example: Given the Sum of m Variables [ ] ( ) ( ) ( ) ( )∑∑ ∫ ∫ ∫ ∫ == ∞ ∞− ∞ ∞− ∞ ∞− ∞ ∞− == +++= m i i m i nnXXXi nnXXXnm xExdxdxxxpx xdxdxxxpxxxSE n n 11 121 12121 ,,,,, ,,,,,: 21 21     [ ] ( )[ ]{ } ( ) ( ) ( )[ ] ( )[ ]{ } ( )[ ] ( )[ ]{ } ( ) ( )∑∑∑ ∑∑∑ ∑ ≠ = ≠ == ≠ = ≠ == = ∑= ∑= += −−+−=             −=−= = = m ji i m ij j ji m i i m ji i m ij j jjii m i ii m i ii XS XESE mmm XXCovXVar XEXXEXEXEXE XEXESESESVar m i im m i im 1 11 1 11 2 2 1 2 ,2 2 : 1 1
  • 40.
    40 SOLO Review ofProbability Jointly, Distributed Random Variables (continue – 2) Given the joint density function of n random variables X1, X2, …, Xn: ( )nXXX xxxp n ,,, 2121  we want to find the joint density function of n random variables Y1, Y2, …, Yn that are related to X1, X2, …, Xn, through ( ) ( ) ( )nnn n n XXXgY XXXgY XXXgY ,,, ,,, ,,, 21 2122 2111     = = =                                           ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ =                     n n nnn n n n Xd Xd Xd X g X g X g X g X g X g X g X g X g Yd Yd Yd       2 1 21 2 2 2 1 2 1 2 1 1 1 2 1 Assuming that the Jacobian ( )                       ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ = n nnn n n n X g X g X g X g X g X g X g X g X g XXXJ      21 2 2 2 1 2 1 2 1 1 1 21 det,,, is nonzero for each X1, X2, …, Xn, exists a unique solution Y1, Y2, …, Yn
  • 41.
    41 SOLO Review ofProbability Jointly, Distributed Random Variables (continue – 3) Assume that for a given Y1, Y2, …, Yn we can find k solutions (X1, X2, …, Xn)1,… ( X1, X2, …, Xn)k. ( ) ( ) ( ) ( ) ( ) ( )∑ ∑∑ = == = =±≤≤±≤≤= +≤≤+≤≤= k i n n nXX k i nnXX k i innnn nnnnnnYY ydyd xxJ xxp xdxdxxpxdxXxxdxXx ydyYyydyYyydydyyp n n n 1 1 1 1 1 11 1 1111 111111 ,, ,, ,,,,Pr ,,Pr:,, 1 1 1         Therefore ( ) ( ) ( )∑ = = k i n nXX nYY xxJ xxp yyp n n 1 1 1 1 ,, ,, ,, 1 1      The relation between the differential volume in (Y1, Y2, …, Yn) and the differential volume in (X1, X2, …, Xn) is given by ( ) nnn xdxdxxJydyd  111 ,,= 1xd 2xd 3xd 1yd 2yd 3yd
  • 42.
    42 SOLO Review ofProbability Jointly, Distributed Random Variables (continue – 4) Example 1 ( ) ( ) ( ) ( ) ( ) ( )[ ] ( ) ( ) 0, /exp/exp/exp , 11 11 , ≥ ΓΓ +− = Γ − Γ − = −− + −− yxyx yxyyxx yxp YX βα βαβ β α α θβα θ θβ θ θα θ X and Y are independent gamma random variables with parameters (α,λ) and (β, λ), respectively, compute the joint densities of U= X + Y and V = X / (X + Y) ( ) ( ) ( ) ( )   −= = ⇔    +== +== VUY VUX YXXYXgV YXYXgU 1/, , 2 1 ( ) ( ) UYX YX X YX YJ 11 11 22 −= + −= + − + = ( ) ( ) ( ) ( )[ ] ( )[ ] [ ] ( ) ( ) ( ) ( )[ ] uvuuv u vuuvJ vuuvp yxJ yxp vup YXYX VU 11,, , 1 /exp 1, 1, , , , −− + − ΓΓ − = − − == βα βα θβα θ [ ] ( ) ( ) ( ) ( ) ( ) 11 1 1 /exp −− + −+ − ΓΓ +Γ +Γ − = βα βα βα βα βα θβα θ vv uu Therefore ( ) [ ] ( ) βα βα θβα θ + −+ +Γ − = /exp1 uu upU ( ) ( ) ( ) ( ) ( ) 11 1 −− − ΓΓ +Γ = βα βα βα vvvpV gamma distribution beta distribution Table of Content
  • 43.
    43 SOLO Review ofProbability Characteristic Function and Moment-Generating Function Given a Probability Density Functions pX (x) we define the Characteristic Function or Moment Generating Function ( ) ( )[ ] ( ) ( ) ( ) ( ) ( ) ( )     = ==Φ ∑ ∫∫ +∞ ∞− +∞ ∞− x X XX X discretexxpxj continuousxxPdxjdxxpxj xjE ω ωω ωω exp expexp exp: This is in fact the complex conjugate of the Fourier Transfer of the Probability Density Function. This function is always defined since the condition of the existence of a Fourier Transfer : Given the Characteristic Function we can find the Probability Density Functions pX (x) using the Inverse Fourier Transfer: ( ) ( ) ( ) ∞<== ∫∫ +∞ ∞− ≥+∞ ∞− 1 0 dxxpdxxp X xp X ( ) ( ) ( )∫ +∞ ∞− Φ−= ωωω π dxjxp XX exp 2 1 is always fulfilled.
  • 44.
    44 SOLO Review ofProbability Properties of Moment-Generating Function ( ) ( ) ( )∫ +∞ ∞− = Φ dxxpxxjj d d X X ω ω ω exp ( ) ( ) 10 ==Φ ∫ +∞ ∞− = dxxpXX ω ω ( ) ( ) ( )xEjdxxpxj d d X X == Φ ∫ +∞ ∞−=0ω ω ω ( ) ( ) ( ) ( )∫ +∞ ∞− = Φ dxxpxxjj d d X X 22 2 2 exp ω ω ω ( ) ( ) ( ) ( ) ( )2222 0 2 2 xEjdxxpxj d d X X == Φ ∫ +∞ ∞−=ω ω ω ( ) ( ) ( ) ( )∫ +∞ ∞− = Φ dxxpxxjj d d X nn n X n ω ω ω exp ( ) ( ) ( ) ( ) ( )nn X nn n X n xEjdxxpxj d d == Φ ∫ +∞ ∞−=0ω ω ω   ( ) ( ) ( )∫ +∞ ∞− =Φ dxxpxj XX ωω exp This is the reason why ΦX (ω) is also called the Moment-Generation Function.
45 SOLO Review of Probability
Properties of the Moment-Generating Function
Φ_X (ω) = ∫_{−∞}^{+∞} exp(jωx) p_X (x) dx
Develop Φ_X (ω) in a Taylor series about ω = 0:
Φ_X (ω) = Φ_X (0) + ω dΦ_X/dω |_{ω=0} + (ω²/2!) d²Φ_X/dω² |_{ω=0} + … + (ωⁿ/n!) dⁿΦ_X/dωⁿ |_{ω=0} + …
 = 1 + (jω) E(x)/1! + (jω)² E(x²)/2! + … + (jω)ⁿ E(xⁿ)/n! + …
46 SOLO Review of Probability
Moment-Generating Function
Binomial Distribution: p(k,n) = n!/(k!(n−k)!) p^k (1−p)^(n−k)
Φ(ω) = E[exp(jωk)] = Σ_{k=0}^{n} exp(jωk) n!/(k!(n−k)!) p^k (1−p)^(n−k) = Σ_{k=0}^{n} n!/(k!(n−k)!) [p exp(jω)]^k (1−p)^(n−k) = [p exp(jω) + (1−p)]^n
Poisson Distribution: p(k;λ) = exp(−λ) λ^k/k!, k a nonnegative integer
Φ(ω) = Σ_{k=0}^{∞} exp(jωk) exp(−λ) λ^k/k! = exp(−λ) Σ_{k=0}^{∞} [λ exp(jω)]^k/k! = exp(−λ) exp[λ exp(jω)] = exp{λ[exp(jω) − 1]}
Exponential Distribution: p(x;λ) = λ exp(−λx) for x ≥ 0, 0 for x < 0
Φ(ω) = ∫_0^∞ λ exp(jωx) exp(−λx) dx = λ [exp((jω−λ)x)/(jω−λ)]_0^∞ = λ/(λ − jω)
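A cross-check of the Poisson characteristic function derived above (an illustration we add; the parameter values are arbitrary): summing E[exp(jωk)] term by term should match the closed form exp{λ[exp(jω) − 1]}.

```python
import cmath
import math

lam, w = 3.0, 0.7
# Direct summation of E[exp(j w k)] over the Poisson pmf; the tail beyond k = 80
# is negligible for lambda = 3.
direct = sum(cmath.exp(1j * w * k) * math.exp(-lam) * lam**k / math.factorial(k)
             for k in range(80))
# Closed form from the slide: exp(lambda * (exp(j w) - 1))
closed = cmath.exp(lam * (cmath.exp(1j * w) - 1))
err = abs(direct - closed)
```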
47 SOLO Review of Probability
Moment-Generating Function
Normal Distribution: p(x;μ,σ) = (1/(√(2π)σ)) exp[−(x−μ)²/(2σ²)]
Φ(ω) = (1/(√(2π)σ)) ∫_{−∞}^{+∞} exp(jωx) exp[−(x−μ)²/(2σ²)] dx = (1/(√(2π)σ)) ∫_{−∞}^{+∞} exp{−[(x−μ)² − 2jωσ²x]/(2σ²)} dx
Complete the square:
(x−μ)² − 2jωσ²x = x² − 2(μ + jωσ²)x + μ² = [x − (μ + jωσ²)]² − 2jωσ²μ + σ⁴ω²
Therefore
Φ(ω) = exp(jμω − σ²ω²/2) · (1/(√(2π)σ)) ∫_{−∞}^{+∞} exp{−[x − (μ + jωσ²)]²/(2σ²)} dx
The remaining integral equals 1, hence
Φ(ω) = exp(−σ²ω²/2 + jμω)
Conversely, using the inversion formula p_X (x) = (1/2π) ∫ exp(−jωx) Φ_X (ω) dω,
(1/(√(2π)σ)) exp[−(x−μ)²/(2σ²)] = (1/2π) ∫_{−∞}^{+∞} exp(−σ²ω²/2 − jω(x−μ)) dω
48 SOLO Review of Probability
Properties of the Moment-Generating Function
Moment-Generating Function of the Sum of Independent Random Variables
Given the sum of independent random variables S_m := X₁ + X₂ + … + X_m:
Φ_{S_m}(ω) = E[exp(jω(X₁+…+X_m))] = ∫…∫ exp[jω(x₁+…+x_m)] p_{X₁,…,X_m}(x₁,…,x_m) dx₁…dx_m
For independent random variables p_{X₁,…,X_m}(x₁,…,x_m) = p_{X₁}(x₁)…p_{X_m}(x_m), so the integral factorizes:
Φ_{S_m}(ω) = Φ_{X₁}(ω) Φ_{X₂}(ω) … Φ_{X_m}(ω)
Example 1: Sum of Poisson independent random variables. With p(k_i;λ_i) = exp(−λ_i) λ_i^{k_i}/k_i!, i = 1,…,m:
Φ_{X_i}(ω) = exp{λ_i[exp(jω) − 1]}, i = 1,…,m
Φ_{S_m}(ω) = Φ_{X₁}(ω)…Φ_{X_m}(ω) = exp{(λ₁+λ₂+…+λ_m)[exp(jω) − 1]}
The sum of Poisson independent random variables is a Poisson random variable with λ_{S_m} = λ₁ + λ₂ + … + λ_m.
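Because characteristic functions multiply under independent sums, the pmf of the sum is the convolution of the pmfs. A small numerical sketch (added by us, not in the original slides) of Example 1: convolving Poisson(2) with Poisson(3) reproduces Poisson(5).

```python
import math
import numpy as np

def poisson_pmf(lam, kmax):
    # Poisson pmf p(k; lam) for k = 0..kmax
    return np.array([math.exp(-lam) * lam**k / math.factorial(k)
                     for k in range(kmax + 1)])

p1 = poisson_pmf(2.0, 60)
p2 = poisson_pmf(3.0, 60)
# pmf of X1 + X2 on k = 0..60; for these indices the truncated convolution is exact
p_sum = np.convolve(p1, p2)[:61]
p_expected = poisson_pmf(5.0, 60)
max_diff = float(np.max(np.abs(p_sum - p_expected)))
```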
49 SOLO Review of Probability
Properties of the Moment-Generating Function
Example 2: Sum of Normal independent random variables S_m := X₁ + X₂ + … + X_m, with
p(x_i;μ_i,σ_i) = (1/(√(2π)σ_i)) exp[−(x_i−μ_i)²/(2σ_i²)]  and  Φ_{X_i}(ω) = exp(−σ_i²ω²/2 + jμ_iω)
Φ_{S_m}(ω) = Φ_{X₁}(ω) Φ_{X₂}(ω) … Φ_{X_m}(ω)
 = exp(−σ₁²ω²/2 + jμ₁ω) exp(−σ₂²ω²/2 + jμ₂ω) … exp(−σ_m²ω²/2 + jμ_mω)
 = exp[−(σ₁²+σ₂²+…+σ_m²)ω²/2 + j(μ₁+μ₂+…+μ_m)ω]
The sum of Normal independent random variables is a Normal random variable with
σ_{S_m}² = σ₁² + σ₂² + … + σ_m²,  μ_{S_m} = μ₁ + μ₂ + … + μ_m
Therefore the S_m probability distribution is
p(S_m;μ_{S_m},σ_{S_m}) = (1/(√(2π)σ_{S_m})) exp[−(S_m−μ_{S_m})²/(2σ_{S_m}²)]
50 SOLO Review of Probability
Existence Theorems
Existence Theorem 1
Given a function G(x) such that
G(−∞) = 0, G(+∞) = lim_{x→∞} G(x) = 1
0 ≤ G(x₁) ≤ G(x₂) if x₁ < x₂   (G(x) is monotonic non-decreasing)
lim_{x_n→x, x_n≥x} G(x_n) = G(x⁺) = G(x)   (G(x) is continuous from the right)
we can find an experiment X and a random variable x, defined on X, such that its distribution function P(x) equals the given function G(x).
Proof of Existence Theorem 1: assume that the outcome of the experiment X is any real number −∞ < x < +∞. We consider as events all intervals and the intersections or unions of intervals on the real axis. (figure: points x₁, …, x₈ on the real axis between −∞ and +∞)
To specify the probability of those events we define P(x) = Prob{x ≤ x₁} = G(x₁). From our definition of G(x) it follows that P(x) is a distribution function.
51 SOLO Review of Probability
Existence Theorems
Existence Theorem 2
If a function F(x,y) is such that
F(−∞, y) = 0, F(x, −∞) = 0, F(+∞, +∞) = 1
F(x₂,y₂) − F(x₁,y₂) − F(x₂,y₁) + F(x₁,y₁) ≥ 0
for every x₁ < x₂, y₁ < y₂, then two random variables x and y can be found such that F(x,y) is their joint distribution function.
Proof of Existence Theorem 2: assume that the outcome of the experiment X is any real number −∞ < x < +∞, and the outcome of the experiment Y is any real number −∞ < y < +∞. We consider as events all intervals and the intersections or unions of intervals on the real axes x and y. To specify the probability of those events we define P(x,y) = Prob{x ≤ x₁, y ≤ y₁} = F(x₁,y₁). From our definition of F(x,y) it follows that P(x,y) is a joint distribution function. The proof is similar to that of Existence Theorem 1.
52 SOLO Review of Probability
Histogram
A histogram is a mapping m_i that counts the number of observations falling into various disjoint categories (known as bins); the graph of a histogram is merely one way to represent it. Thus, if we let n be the total number of observations and k be the total number of bins, the histogram m_i meets the condition
n = Σ_{i=1}^{k} m_i
A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram M_i of a histogram m_i is defined as
M_i = Σ_{j=1}^{i} m_j
(figure: an ordinary and a cumulative histogram of the same data, a random sample of 10,000 points from a normal distribution with mean 0 and standard deviation 1)
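A minimal sketch of the two definitions above (illustrative code we add; bin count and seed are arbitrary): binning a normal sample like the one in the figure and forming the cumulative counts M_i.

```python
import random

rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # sample as in the figure

k = 20                                   # number of bins
lo, hi = min(data), max(data)
width = (hi - lo) / k
m = [0] * k                              # ordinary histogram: counts per bin
for x in data:
    i = min(int((x - lo) / width), k - 1)  # clamp the right edge into the last bin
    m[i] += 1

M = []                                   # cumulative histogram: M_i = sum_{j<=i} m_j
total = 0
for count in m:
    total += count
    M.append(total)
```

By construction Σ m_i = n and the last cumulative count equals the sample size.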
53 SOLO Review of Probability
Law of Large Numbers (History)
The Law of Large Numbers has three versions:
• Weak Law of Large Numbers (WLLN)
• Strong Law of Large Numbers (SLLN)
• Uniform Law of Large Numbers (ULLN)
The Weak Law of Large Numbers was first proved by the Swiss mathematician Jacob (James) Bernoulli (1654–1705) in the fourth part of his work “Ars Conjectandi”, published posthumously in 1713.
The French mathematician Siméon Denis Poisson (1781–1840) generalized Bernoulli’s theorem around 1800.
The next contribution was by Bienaymé (Irénée-Jules Bienaymé, 1796–1878) and later, in 1866, by Chebyshev (Pafnuty Lvovich Chebyshev, 1821–1894); it is known as the Bienaymé–Chebyshev Inequality.
54 SOLO Review of Probability
Law of Large Numbers (History – continue)
Félix Édouard Justin Émile Borel, 1871–1956
Francesco Paolo Cantelli, 1875–1966
Andrey Nikolaevich Kolmogorov, 1903–1987
(Borel–Cantelli Lemma)
55 SOLO Review of Probability
Markov’s Inequality (Andrey Andreyevich Markov, 1856–1922)
If X is a random variable which takes only nonnegative values, then for any value a > 0
Pr(X ≥ a) ≤ E(X)/a
Proof: suppose X is continuous with probability density function p_X (x). Then
E(X) = ∫_0^∞ x p_X (x) dx = ∫_0^a x p_X (x) dx + ∫_a^∞ x p_X (x) dx ≥ ∫_a^∞ x p_X (x) dx ≥ a ∫_a^∞ p_X (x) dx = a Pr(X ≥ a)
Since a > 0, dividing by a gives
Pr(X ≥ a) ≤ E(X)/a
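An empirical illustration of Markov's inequality (added by us; distribution and threshold are arbitrary choices): for a nonnegative exponential sample, the observed tail frequency Pr(X ≥ a) stays below E(X)/a.

```python
import random

rng = random.Random(1)
# Exponential(1) is nonnegative with E(X) = 1
samples = [rng.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)

a = 3.0
tail = sum(s >= a for s in samples) / len(samples)  # empirical Pr(X >= a), ~exp(-3)
bound = mean / a                                    # Markov bound E(X)/a, ~1/3
```

The bound is loose (about 0.33 against a true tail of about 0.05) but always valid.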
56 SOLO Review of Probability
Chebyshev’s Inequality (Pafnuty Lvovich Chebyshev, 1821–1894)
If X is a random variable with mean μ = E(X) and variance σ² = E[(X−μ)²], then for any value k > 0
Pr{|X − μ| ≥ k} ≤ σ²/k²
Proof: since (X−μ)² is a nonnegative random variable, we can apply Markov’s inequality with a = k² to obtain
Pr{(X−μ)² ≥ k²} ≤ E[(X−μ)²]/k² = σ²/k²
But since (X−μ)² ≥ k² if and only if |X−μ| ≥ k, the above is equivalent to
Pr{|X − μ| ≥ k} ≤ σ²/k²
Take kσ instead of k to obtain
Pr{|X − μ| ≥ kσ} ≤ 1/k²
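A companion check for Chebyshev's inequality (an illustration we add; Uniform(0,1) and the threshold k = 0.45 are our choices): here μ = 1/2 and σ² = 1/12, so the true tail probability is 0.1 against a bound of about 0.41.

```python
import random

rng = random.Random(2)
samples = [rng.random() for _ in range(100_000)]  # Uniform(0,1)

mu, var = 0.5, 1.0 / 12.0
k = 0.45
# empirical Pr(|X - mu| >= k); true value is 0.1 (X <= 0.05 or X >= 0.95)
tail = sum(abs(s - mu) >= k for s in samples) / len(samples)
bound = var / k**2   # Chebyshev bound sigma^2 / k^2 ~ 0.41
```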
57 SOLO Review of Probability
Bienaymé’s Inequality (Irénée-Jules Bienaymé, 1796–1878)
If X is a random variable, then for any values a and k > 0
Pr{|X − a| ≥ k} ≤ E[|X − a|ⁿ]/kⁿ
Proof: let us first prove that if a random variable y takes only nonnegative values, then for any α > 0 (Markov’s inequality)
E(y) = ∫_0^∞ y p_Y (y) dy ≥ ∫_α^∞ y p_Y (y) dy ≥ α ∫_α^∞ p_Y (y) dy = α Pr(y ≥ α)  ⇒  Pr(y ≥ α) ≤ E(y)/α
Define y := |X − a|ⁿ ≥ 0 and choose α = kⁿ > 0. Since |X − a|ⁿ ≥ kⁿ ⇔ |X − a| ≥ k,
Pr{|X − a| ≥ k} ≤ E[|X − a|ⁿ]/kⁿ
For n = 2 and a = μ we obtain Chebyshev’s inequality. For this reason Chebyshev’s inequality is also known as the Bienaymé–Chebyshev inequality.
(figure: pdf p_X (x) with the tails |x − a| ≥ k bounded by E[|X − a|ⁿ]/kⁿ)
58 SOLO Review of Probability
Chernoff’s and Hoeffding’s Bounds
The Markov, Chebyshev and Bienaymé inequalities use only expected-value information. Let us try to obtain a tighter bound when the probability distribution function is known.
Start from Markov’s inequality for a nonnegative random variable Z and γ > 0:
Pr(Z ≥ γ) ≤ E(Z)/γ, Z ≥ 0, γ > 0
Now take a random variable Y and define the logarithmic generating function
Λ_Y (t) := ln E[exp(tY)] if E[exp(tY)] < ∞, and Λ_Y (t) := ∞ otherwise
Using the fact that exp(x) is a monotonically increasing function,
Y ≥ λ ⇒ exp(tY) ≥ exp(tλ)  ∀ t ≥ 0
and applying Markov’s inequality with Z := exp(tY) and γ := exp(tλ) we obtain
Pr(Y ≥ λ) = Pr(exp(tY) ≥ exp(tλ)) ≤ E[exp(tY)]/exp(tλ) = exp{−[tλ − Λ_Y (t)]}  ∀ t ≥ 0
Therefore (infimum over t ≥ 0):
Pr(Y ≥ λ) ≤ inf_{t≥0} exp{−[tλ − Λ_Y (t)]}
From this inequality, by using different Y, we obtain the Chernoff and Hoeffding bounds. To compute Λ_Y (t) we need to know the distribution function p_Y (y).
59 SOLO Review of Probability
Chernoff’s Bound (Herman Chernoff, 1921–)
Let X₁, X₂, … be independent Bernoulli random variables with Pr(X_i = 1) = p and Pr(X_i = 0) = 1−p.
Λ_{X_i}(t) = ln E[exp(tX_i)] = ln[exp(t·1)·p + exp(t·0)·(1−p)] = ln[p exp(t) + (1−p)]
Take Y := X̄_m = (X₁ + … + X_m)/m. Then
Λ_Y (t) = ln E[exp((t/m) Σ_{i=1}^m X_i)] = m ln[p exp(t/m) + 1 − p]
Since inf_{t≥0} exp{−[tλ − Λ_Y (t)]} corresponds to sup_{t≥0} [tλ − Λ_Y (t)]:
tλ − Λ_Y (t) = tλ − m ln[p exp(t/m) + 1 − p]
d[tλ − Λ_Y (t)]/dt = λ − p exp(t/m)/[p exp(t/m) + 1 − p] = 0  ⇒  exp(t*/m) = [(1−p)/p]·[λ/(1−λ)]
t*λ − Λ_Y (t*) = m[λ ln(λ/p) + (1−λ) ln((1−λ)/(1−p))]
60 SOLO Review of Probability
Chernoff’s Bound (continue – 1)
Using Pr(Y ≥ λ) ≤ inf_{t≥0} exp{−[tλ − Λ_Y (t)]} = exp{−[t*λ − Λ_Y (t*)]}:
Pr[(X₁+…+X_m)/m ≥ λ] ≤ exp{−m[λ ln(λ/p) + (1−λ) ln((1−λ)/(1−p))]}, 0 < p < 1, 0 < λ < 1
Define
H(λ|p) := −m[λ ln(λ/p) + (1−λ) ln((1−λ)/(1−p))], 0 < p < 1
so that Pr[(X₁+…+X_m)/m ≥ λ] ≤ exp[H(λ|p)]. Properties:
H(λ = p|p) = 0
dH(λ|p)/dλ = −m[ln(λ/p) − ln((1−λ)/(1−p))]  ⇒  dH(λ|p)/dλ |_{λ=p} = 0
d²H(λ|p)/dλ² = −m[1/λ + 1/(1−λ)] ≤ −4m
(the maximum of λ(1−λ) is 1/4, attained at λ = 0.5)
61 SOLO Review of Probability
Chernoff’s Bound (continue – 2)
Develop H(λ|p) in a Taylor series about λ = p:
H(λ|p) = H(p|p) + (λ−p) dH/dλ |_{λ=p} + ((λ−p)²/2!) d²H/dλ² |_{λ=p+θ(λ−p)} ≤ −2m(λ−p)², 0 ≤ θ ≤ 1
From which we arrive at the Chernoff bound
Pr[(X₁+…+X_m)/m ≥ λ] ≤ exp[−2m(λ−p)²], 0 < p, λ < 1
Define λ := p + ε:
Pr[(X₁+…+X_m)/m ≥ p + ε] ≤ exp(−2ε²m), 0 < p < 1
62 SOLO Review of Probability
Chernoff’s Bound (continue – 3)
Define now Y := [(1−X₁)+…+(1−X_m)]/m = 1 − (X₁+…+X_m)/m, a mean of Bernoulli variables with success probability 1−p. Using the Chernoff bound:
Pr{[(1−X₁)+…+(1−X_m)]/m ≥ (1−p) + ε} ≤ exp(−2ε²m), 0 < p < 1
Since [(1−X₁)+…+(1−X_m)]/m ≥ (1−p) + ε ⇔ (X₁+…+X_m)/m ≤ p − ε, this gives
Pr[(X₁+…+X_m)/m ≤ p − ε] ≤ exp(−2ε²m), 0 < p < 1
together with
Pr[(X₁+…+X_m)/m ≥ p + ε] ≤ exp(−2ε²m), 0 < p < 1
By summing those two inequalities we obtain the two-sided Chernoff bound:
Pr[|(X₁+…+X_m)/m − p| ≥ ε] ≤ 2 exp(−2ε²m), 0 < p < 1
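An empirical check of the two-sided Chernoff bound just derived (an illustration we add; p, m, ε and the number of repetitions are our choices): the observed frequency of large deviations of the Bernoulli sample mean stays under 2 exp(−2ε²m).

```python
import math
import random

rng = random.Random(3)
p, m, eps, trials = 0.3, 200, 0.1, 20_000

def mean_of_bernoullis():
    # one sample mean of m Bernoulli(p) trials
    return sum(rng.random() < p for _ in range(m)) / m

# empirical Pr(|mean - p| >= eps) over many repetitions
deviations = sum(abs(mean_of_bernoullis() - p) >= eps
                 for _ in range(trials)) / trials
bound = 2 * math.exp(-2 * eps**2 * m)   # two-sided Chernoff bound, ~0.037
```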
63 SOLO Review of Probability
Hoeffding’s Bound
Let us start with a simpler problem: suppose that Y is a random variable with a ≤ Y ≤ b almost surely for some finite a and b, and assume E(Y) = 0.
Define α := (b−Y)/(b−a), so that 0 ≤ α ≤ 1 and
Y = α a + (1−α) b = [(b−Y)/(b−a)] a + [(Y−a)/(b−a)] b
Since exp(·) is a convex function, for any t ≥ 0 we have
exp(tY) = exp[α ta + (1−α) tb] ≤ α exp(ta) + (1−α) exp(tb) = [(b−Y)/(b−a)] exp(ta) + [(Y−a)/(b−a)] exp(tb)
Take the expectation of this inequality (using E(Y) = 0) and define p := −a/(b−a), 0 ≤ p ≤ 1:
E[exp(tY)] ≤ [b/(b−a)] exp(ta) − [a/(b−a)] exp(tb) = (1−p) exp(ta) + p exp(tb) =: exp[φ(u)], u := t(b−a)
since ta = −pu and tb = (1−p)u give (1−p)e^{−pu} + p e^{(1−p)u} = e^{−pu}[1 − p + p e^u].
64 SOLO Review of Probability
Hoeffding’s Bound (continue – 1)
E[exp(tY)] ≤ exp[φ(u)], u := t(b−a), where
φ(u) := −pu + ln(1 − p + p exp(u)), φ(0) = 0
Differentiating we obtain:
dφ/du = −p + p exp(u)/[1 − p + p exp(u)]  ⇒  dφ/du |_{u=0} = 0
d²φ/du² = p(1−p) exp(u)/[1 − p + p exp(u)]²
d³φ/du³ = 0 at exp(u*) = (1−p)/p, where d²φ/du² attains its maximum:
d²φ/du² |_{u*} = p(1−p)·[(1−p)/p] / [1 − p + (1−p)]² = (1−p)²/[4(1−p)²] = 1/4
so d²φ/du² ≤ 1/4 for all u.
65 SOLO Review of Probability
Hoeffding’s Bound (continue – 2)
Develop φ(u) in a Taylor series:
φ(u) = φ(0) + u φ′(0) + (u²/2) φ″(θu) ≤ u²/8 = t²(b−a)²/8, 0 ≤ θ ≤ 1
End of the simpler problem: if Y is a random variable with a ≤ Y ≤ b almost surely for some finite a and b, and E(Y) = 0, then
E[exp(tY)] ≤ exp[t²(b−a)²/8]  ∀ t ≥ 0
66 SOLO Review of Probability
Hoeffding’s Bound (continue – 3)
Generalize the result: suppose X₁, X₂, …, X_m are independent random variables with a_i ≤ X_i ≤ b_i for i = 1,2,…,m. Define Z_i := X_i − E(X_i), so E(Z_i) = 0 and Z_i lies in an interval of length b_i − a_i. Therefore
E[exp(tZ_i)] ≤ exp[t²(b_i−a_i)²/8] and E[exp(−tZ_i)] ≤ exp[t²(b_i−a_i)²/8]  ∀ t ≥ 0
Let Z := Z₁ + Z₂ + … + Z_m. Using Pr(Y ≥ λ) ≤ exp(−tλ) E[exp(tY)] and independence:
Pr(|Σ_{i=1}^m Z_i| ≥ λ) = Pr(Σ Z_i ≥ λ) + Pr(−Σ Z_i ≥ λ)
 ≤ exp(−tλ) Π_{i=1}^m E[exp(tZ_i)] + exp(−tλ) Π_{i=1}^m E[exp(−tZ_i)]
 ≤ 2 exp[−tλ + (t²/8) Σ_{i=1}^m (b_i−a_i)²]  ∀ t ≥ 0
67 SOLO Review of Probability
Hoeffding’s Bound (continue – 4) (Wassily Hoeffding, 1914–1991)
Pr(|Σ_{i=1}^m Z_i| ≥ λ) ≤ 2 exp{ inf_{t≥0} [−tλ + (t²/8) Σ_{i=1}^m (b_i−a_i)²] }
The infimum is attained at t* = 4λ/Σ_{i=1}^m (b_i−a_i)², giving
inf_{t≥0} [−tλ + (t²/8) Σ (b_i−a_i)²] = −2λ²/Σ_{i=1}^m (b_i−a_i)²
We finally obtain Hoeffding’s Bound:
Pr(|Σ_{i=1}^m Z_i| ≥ λ) ≤ 2 exp[−2λ²/Σ_{i=1}^m (b_i−a_i)²]
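An empirical sketch of Hoeffding's bound (added by us; the interval list, λ and seed are arbitrary): centered sums of bounded uniform variables with varying intervals [a_i, b_i] stay under the bound 2 exp(−2λ²/Σ(b_i−a_i)²).

```python
import math
import random

rng = random.Random(4)
bounds = [(0.0, 1.0), (-1.0, 2.0), (0.5, 3.5)] * 20   # 60 bounded variables
ssq = sum((b - a) ** 2 for a, b in bounds)             # sum of (b_i - a_i)^2

def centered_sum():
    # sum of Z_i = X_i - E(X_i) with X_i ~ Uniform(a_i, b_i), E(X_i) = (a_i+b_i)/2
    return sum(rng.uniform(a, b) - (a + b) / 2 for a, b in bounds)

lam, trials = 15.0, 20_000
freq = sum(abs(centered_sum()) >= lam for _ in range(trials)) / trials
bound = 2 * math.exp(-2 * lam**2 / ssq)   # Hoeffding bound
```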
68 SOLO Review of Probability
Convergence Concepts
Convergence almost everywhere (a.e.) (or with probability 1, or strongly): the sequence X_n converges to X with probability 1 if the set of outcomes x such that lim_{n→∞} X_n(x) = X(x) has probability 1:
Pr{X_n → X} = 1 as n → ∞   (X_n →^{a.e.} X)
Convergence in the mean-square sense (m.s.): the sequence X_n converges to X in the mean-square sense if
E{|X_n − X|²} → 0 as n → ∞   (X_n →^{m.s.} X)
Convergence in probability (p) (or stochastic convergence, or convergence in measure): the sequence X_n converges to X in probability if, for every ε > 0,
Pr{|X_n − X| > ε} → 0 as n → ∞   (X_n →^{P} X)
Convergence in distribution (d) (weak convergence): the sequence X_n converges to X in distribution if
p_{X_n}(x) → p_X (x) as n → ∞   (X_n →^{d} X)
Implications: (a.e.) ⇒ (p); (m.s.) ⇒ (p); (p) ⇒ (d).
69 SOLO Review of Probability
Convergence Concepts (continue – 1)
Cauchy criterion of convergence (Augustin Louis Cauchy, 1789–1857): the sequence X_n converges to an unknown limit if X_{n+m} − X_n → 0 as n → ∞, for any m > 0.
Convergence almost everywhere (a.e.): Pr{|X_{n+m} − X_n| < ε} → 1 as n → ∞, for any m > 0.
Convergence in the mean-square sense (m.s.): E{|X_{n+m} − X_n|²} → 0 as n → ∞, for any m > 0.
Using the Chebyshev inequality:
Pr{|X_n − X| > ε} ≤ E{|X_n − X|²}/ε²
If X_n → X in the m.s. sense then, for a given ε, the right-hand side tends to zero, hence so does the left-hand side: X_n → X in probability. The opposite is not true: convergence in probability does not imply convergence in the mean-square sense.
Implications: (a.e.) ⇒ (p); (m.s.) ⇒ (p); (p) ⇒ (d).
70 SOLO Review of Probability
The Laws of Large Numbers
The Law of Large Numbers is a fundamental concept in statistics and probability that describes how the average of a large random sample drawn from a population is likely to be close to the average of the whole population. There are two laws of large numbers, the Weak Law and the Strong Law.
The Weak Law of Large Numbers: if X₁, X₂, …, X_n, … is an infinite sequence of random variables that have the same expected value μ and variance σ², and are uncorrelated (i.e., the correlation between any two of them is zero), then
X̄_n := (X₁ + … + X_n)/n
converges in probability (a weak convergence sense) to μ:
Pr{|X̄_n − μ| < ε} → 1 as n → ∞
The Strong Law of Large Numbers: if, in addition, E(|X_i|) < ∞, then X̄_n converges almost surely to μ:
Pr{lim_{n→∞} X̄_n = μ} = 1
71 SOLO Review of Probability
The Law of Large Numbers
Differences between the Weak Law and the Strong Law
The Weak Law states that, for a specified large n, (X₁ + … + X_n)/n is likely to be near μ. Thus it leaves open the possibility that |(X₁ + … + X_n)/n − μ| > ε happens an infinite number of times, although at infrequent intervals.
The Strong Law shows that this almost surely will not occur. In particular, it implies that, with probability 1, for any positive value ε the inequality |(X₁ + … + X_n)/n − μ| > ε holds only a finite number of times (as opposed to an infinite, but infrequent, number of times).
Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law implies the weak law.
72 SOLO Review of Probability
The Law of Large Numbers
Proof of the Weak Law of Large Numbers
Given E(X_i) = μ ∀i, Var(X_i) = σ² ∀i, and E[(X_i−μ)(X_j−μ)] = 0 ∀ i ≠ j, we have:
E(X̄_n) = [E(X₁) + … + E(X_n)]/n = nμ/n = μ
Var(X̄_n) = E{[(X̄_n) − E(X̄_n)]²} = E{[(X₁−μ) + … + (X_n−μ)]²/n²} = nσ²/n² = σ²/n
(the cross terms E[(X_i−μ)(X_j−μ)], i ≠ j, vanish by uncorrelatedness)
Using Chebyshev’s inequality on X̄_n we obtain
Pr(|X̄_n − μ| ≥ ε) ≤ (σ²/n)/ε²
Using this equation:
Pr(|X̄_n − μ| < ε) = 1 − Pr(|X̄_n − μ| ≥ ε) ≥ 1 − σ²/(nε²)
As n approaches infinity, the right-hand side approaches 1. q.e.d.
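A small numerical illustration of the proof above (ours, not in the deck): sample means of fair-die throws concentrate around μ = 3.5, with the variance σ²/n shrinking as n grows.

```python
import random

rng = random.Random(5)

def sample_mean(n):
    # mean of n throws of a fair die; mu = 3.5, sigma^2 = 35/12
    return sum(rng.randint(1, 6) for _ in range(n)) / n

dev_small = abs(sample_mean(100) - 3.5)      # typically a few tenths
dev_large = abs(sample_mean(100_000) - 3.5)  # typically a few thousandths
```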
73 SOLO Review of Probability
Central Limit Theorem
The first version of this theorem was postulated by the French-born English mathematician Abraham de Moivre (1667–1754) in 1733, who used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This was published in 1756 in “The Doctrine of Chances”, 3rd ed.
This finding was forgotten until 1812, when the French mathematician Pierre-Simon Laplace (1749–1827) recovered it in his work “Théorie Analytique des Probabilités”, in which he approximated the binomial distribution with the normal distribution. This is known as the De Moivre–Laplace theorem.
The present form of the Central Limit Theorem was given by the Russian mathematician Aleksandr Mikhailovich Lyapunov (1857–1918) in 1901.
74 SOLO Review of Probability
Central Limit Theorem (continue – 1)
Let X₁, X₂, …, X_m be a sequence of independent random variables with the same probability distribution function p_X (x). Define the statistical mean
X̄ = (X₁ + X₂ + … + X_m)/m
We have:
E(X̄) = [E(X₁) + E(X₂) + … + E(X_m)]/m = μ
σ_X̄² = Var(X̄) = E{[X̄ − E(X̄)]²} = E{[(X₁−μ) + (X₂−μ) + … + (X_m−μ)]²/m²} = mσ²/m² = σ²/m
Define also the new random variable
Y := [X̄ − E(X̄)]/σ_X̄ = [(X₁−μ) + (X₂−μ) + … + (X_m−μ)]/(σ√m)
The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity, regardless of the probability distribution of the random variable, as long as the mean μ and the variance σ² are finite.
75 SOLO Review of Probability
Central Limit Theorem (continue – 2)
Proof. The characteristic function:
Φ_Y (ω) = E[exp(jωY)] = E[exp(jω Σ_{i=1}^m (X_i−μ)/(σ√m))] = Π_{i=1}^m E[exp(jω(X_i−μ)/(σ√m))] = [Φ_{(X−μ)/σ}(ω/√m)]^m
Develop Φ_{(X−μ)/σ}(ω/√m) in a Taylor series:
Φ_{(X−μ)/σ}(ω/√m) = 1 + (jω/√m) E[(X−μ)/σ]/1! + (jω/√m)² E[((X−μ)/σ)²]/2! + O(m^{−3/2})
 = 1 − ω²/(2m) + O(m^{−3/2})
since E[(X−μ)/σ] = 0 and E[((X−μ)/σ)²] = 1.
76 SOLO Review of Probability
Central Limit Theorem (continue – 3)
Proof (continue – 1). The characteristic function:
Φ_Y (ω) = [Φ_{(X−μ)/σ}(ω/√m)]^m = [1 − ω²/(2m) + O(m^{−3/2})]^m → exp(−ω²/2) as m → ∞
Therefore
p_Y (y) = (1/2π) ∫_{−∞}^{+∞} exp(−jωy) Φ_Y (ω) dω → (1/2π) ∫_{−∞}^{+∞} exp(−jωy) exp(−ω²/2) dω = (1/√(2π)) exp(−y²/2)
(the characteristic function of the standard normal distribution). The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity (convergence in distribution).
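A simulation of the theorem just proved (an illustration we add; the uniform summands and m = 30 are our choices): standardized means Y = (X̄ − μ)/(σ/√m) of Uniform(0,1) variables already look close to standard normal.

```python
import random

rng = random.Random(6)
m, trials = 30, 50_000
mu, sigma = 0.5, (1.0 / 12.0) ** 0.5   # mean and std of Uniform(0,1)

ys = []
for _ in range(trials):
    xbar = sum(rng.random() for _ in range(m)) / m
    ys.append((xbar - mu) / (sigma / m**0.5))   # standardized mean

mean_y = sum(ys) / len(ys)                       # should be ~0
var_y = sum(y * y for y in ys) / len(ys)         # should be ~1
frac_within_1 = sum(abs(y) <= 1.0 for y in ys) / len(ys)  # ~0.6827 for N(0,1)
```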
77 SOLO Review of Probability
Bernoulli Trials – The Binomial Distribution (Jacob Bernoulli, 1654–1705)
Probability mass function: p(k,n) = n!/(k!(n−k)!) p^k (1−p)^(n−k) = C(n,k) p^k (1−p)^(n−k)
Cumulative distribution function: P(k;n) = Σ_{i=0}^{k} C(n,i) p^i (1−p)^(n−i)
Mean value: E(x) = n p
Variance: Var(x) = n p (1−p)
Moment-generating function: Φ(ω) = [p exp(jω) + (1−p)]^n
78 SOLO Review of Probability
Bernoulli Trials – The Binomial Distribution (continue – 1)
Given a random event r = {0,1}:
p – probability of success (r = 1) of a given discrete trial
q – probability of failure (r = 0) of the given discrete trial, p + q = 1
n – number of independent trials
p(k,n) – probability of k successes in n independent trials (Bernoulli trials):
p(k,n) = n!/(k!(n−k)!) p^k (1−p)^(n−k) = C(n,k) p^k (1−p)^(n−k)
The number of ways of obtaining k successful trials out of n independent trials is C(n,k) = n!/(k!(n−k)!), each occurring with probability p^k (1−p)^(n−k).
Using the binomial theorem we obtain
(p+q)^n = Σ_{k=0}^{n} C(n,k) p^k (1−p)^(n−k) = 1
therefore the previous distribution is called the binomial distribution.
79 SOLO Review of Probability
Bernoulli Trials – The Binomial Distribution (continue – 2)
Mean value:
E(X) = Σ_{i=0}^{n} i · n!/(i!(n−i)!) p^i (1−p)^(n−i) = np Σ_{i=1}^{n} (n−1)!/((i−1)!(n−i)!) p^(i−1) (1−p)^(n−i) = np [p + (1−p)]^(n−1) = np
Moment-generating function:
Φ(ω) = E[exp(jωk)] = Σ_{k=0}^{n} exp(jωk) n!/(k!(n−k)!) p^k (1−p)^(n−k) = Σ_{k=0}^{n} n!/(k!(n−k)!) [p exp(jω)]^k (1−p)^(n−k) = [p exp(jω) + (1−p)]^n
80 SOLO Review of Probability
Bernoulli Trials – The Binomial Distribution (continue – 3)
Second moment:
E(X²) = Σ_{i=0}^{n} i² C(n,i) p^i (1−p)^(n−i) = Σ_{i=2}^{n} i(i−1) C(n,i) p^i (1−p)^(n−i) + Σ_{i=1}^{n} i C(n,i) p^i (1−p)^(n−i)
 = n(n−1)p² Σ_{i=2}^{n} (n−2)!/((i−2)!(n−i)!) p^(i−2) (1−p)^(n−i) + np = n(n−1)p² + np
Variance:
Var(X) = E(X²) − E²(X) = n(n−1)p² + np − n²p² = np(1−p)
81 SOLO Review of Probability
Bernoulli Trials – The Binomial Distribution (continue – 4)
Let us apply Chebyshev’s inequality with the mean value E(X) = np and the variance Var(X) = E(X²) − E²(X) = np(1−p):
Pr{|X − np| ≥ k} ≤ np(1−p)/k²
An upper bound to this inequality, as p varies (0 ≤ p ≤ 1), can be obtained by taking the derivative of p(1−p), equating it to zero, and solving for p. The result is p = 0.5, giving
Pr{|X − np| ≥ k} ≤ n/(4k²)
We can see that as n → ∞ the sample fraction X/n converges in probability to p:
Pr{|X/n − p| ≥ ε} ≤ p(1−p)/(nε²) → 0
This is known as Bernoulli’s theorem.
82 SOLO Review of Probability
Generalized Bernoulli Trials
Consider now r mutually exclusive events A₁, A₂, …, A_r:
A_i ∩ A_j = ∅, i ≠ j, i, j = 1, 2, …, r
with their sum equal to the certain event S: A₁ ∪ A₂ ∪ … ∪ A_r = S
and probabilities of occurrence p(A₁) = p₁, p(A₂) = p₂, …, p(A_r) = p_r. Therefore
p(A₁) + p(A₂) + … + p(A_r) = p₁ + p₂ + … + p_r = 1
We want to find the probability that in n trials we obtain A₁ k₁ times, A₂ k₂ times, and so on, and A_r k_r times, such that k₁ + k₂ + … + k_r = n.
The number of possible combinations of k₁ events A₁, k₂ events A₂, …, k_r events A_r is n!/(k₁! k₂! … k_r!), and the probability of each combination is p₁^{k₁} p₂^{k₂} … p_r^{k_r}. We obtain the probability of the Generalized Bernoulli Trials as
p(k₁, k₂, …, k_r, n) = n!/(k₁! k₂! … k_r!) p₁^{k₁} p₂^{k₂} … p_r^{k_r}
83 SOLO Review of Probability
Poisson Asymptotical Development (Law of Rare Events) (Siméon Denis Poisson, 1781–1840)
Start with the binomial distribution p(k,n) = C(n,k) p^k (1−p)^(n−k) = n!/(k!(n−k)!) p^k (1−p)^(n−k).
We assume that n ≫ 1 and p = k₀/n, with k₀/n ≪ 1. Then
p(0,n) = (1−p)^n = (1 − k₀/n)^n → exp(−k₀) as n → ∞
For general k:
p(k,n) = [n(n−1)…(n−k+1)/k!] (k₀/n)^k (1 − k₀/n)^{n−k} = (k₀^k/k!) [n(n−1)…(n−k+1)/n^k] (1 − k₀/n)^{−k} (1 − k₀/n)^n
The bracketed factor and (1 − k₀/n)^{−k} tend to 1 as n → ∞, hence
p(k,n) ≈ (k₀^k/k!) exp(−k₀)
This is the Poisson asymptotical development (law of rare events).
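A numerical sketch of the law of rare events (our illustration; n, p and the truncation at k = 30 are arbitrary): for large n and small p the binomial pmf is close to the Poisson pmf with λ = np.

```python
import math

n, p = 1000, 0.003
lam = n * p   # Poisson parameter, lambda = 3
max_diff = max(
    abs(math.comb(n, k) * p**k * (1 - p) ** (n - k)        # exact binomial pmf
        - math.exp(-lam) * lam**k / math.factorial(k))     # Poisson approximation
    for k in range(30)
)
```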
84 SOLO Review of Probability
Poisson Distribution (Siméon Denis Poisson, 1781–1840)
Probability mass function: p(k;λ) = exp(−λ) λ^k/k!, k a nonnegative integer
Mean value:
E(X) = Σ_{k=0}^{∞} k exp(−λ) λ^k/k! = exp(−λ) Σ_{k=1}^{∞} λ^k/(k−1)! = λ exp(−λ) Σ_{i=0}^{∞} λ^i/i! = λ
Second moment:
E(X²) = Σ_{i=0}^{∞} i² exp(−λ) λ^i/i! = exp(−λ) Σ_{i=1}^{∞} i λ^i/(i−1)! = exp(−λ) [Σ_{i=2}^{∞} λ^i/(i−2)! + Σ_{i=1}^{∞} λ^i/(i−1)!] = λ² + λ
Variance: Var(x) = E(X²) − E²(X) = λ
Moment-generating function:
Φ(ω) = E[exp(jωk)] = Σ_{m=0}^{∞} exp(jωm) exp(−λ) λ^m/m! = exp(−λ) Σ_{m=0}^{∞} [λ exp(jω)]^m/m! = exp{λ[exp(jω) − 1]}
85 SOLO Review of Probability
Poisson Distribution
Moment-generating function: Φ(ω) = exp{λ[exp(jω) − 1]}
Approximation to the Gaussian distribution:
λ[exp(jω) − 1] = λ(cos ω − 1) + jλ sin ω = −2λ sin²(ω/2) + jλ sin ω
For λ sufficiently large, Φ(ω) is negligible for all but very small values of ω, in which case sin²(ω/2) ≈ ω²/4 and sin ω ≈ ω:
Φ(ω) = exp{λ[exp(jω) − 1]} ≈ exp(−λω²/2 + jλω)
For a normal distribution with mean μ and variance σ² we found the moment-generating function Φ(ω) = exp(−σ²ω²/2 + jμω).
Therefore the Poisson distribution can be approximated by a Gaussian distribution with mean μ = λ and variance σ² = λ:
p(k;λ) = exp(−λ) λ^k/k! ~ (1/√(2πλ)) exp[−(k−λ)²/(2λ)]
86 SOLO Review of Probability
Poisson Distribution (summary) (Siméon Denis Poisson, 1781–1840)
Probability mass function: p(k;λ) = exp(−λ) λ^k/k!, k a nonnegative integer
Cumulative distribution function: P(k;λ) = Σ_{i=0}^{k} exp(−λ) λ^i/i! = Γ(k+1, λ)/k!, where Γ(a,x) = ∫_x^∞ t^(a−1) exp(−t) dt is the upper incomplete gamma function
Mean value: E(x) = λ
Variance: Var(x) = λ
Moment-generating function: Φ(ω) = exp{λ[exp(jω) − 1]}
87 SOLO Review of Probability
Normal (Gaussian) Distribution (Carl Friedrich Gauss, 1777–1855)
Probability density function: p(x;μ,σ) = (1/(√(2π)σ)) exp[−(x−μ)²/(2σ²)]
Cumulative distribution function: P(x;μ,σ) = (1/(√(2π)σ)) ∫_{−∞}^{x} exp[−(u−μ)²/(2σ²)] du
Mean value: E(x) = μ
Variance: Var(x) = σ²
Moment-generating function: Φ(ω) = E[exp(jωx)] = (1/(√(2π)σ)) ∫_{−∞}^{+∞} exp(jωu) exp[−(u−μ)²/(2σ²)] du = exp(−σ²ω²/2 + jμω)
88 SOLO Review of Probability
De Moivre–Laplace Asymptotical Development
Start with the binomial distribution p(k,n) = n!/(k!(n−k)!) p^k q^{n−k}, q = 1−p.
Use the Stirling asymptotical approximation n! ≈ √(2πn) nⁿ exp(−n):
p(k,n) ≈ [√(2πn) nⁿ exp(−n)] / {[√(2πk) k^k exp(−k)] [√(2π(n−k)) (n−k)^{n−k} exp(−(n−k))]} · p^k q^{n−k}
 = √(n/(2πk(n−k))) (np/k)^k (nq/(n−k))^{n−k}
Define k := np + δk, so that n − k = nq − δk.
The ratio of successive terms is
p(k+1,n)/p(k,n) = [(n−k)/(k+1)]·(p/q)
hence p(k+1,n) > p(k,n) if k + 1 < (n+1)p, and p(k+1,n) < p(k,n) if k + 1 > (n+1)p: the pmf increases up to its maximum near k = np and decreases afterwards.
89 SOLO Review of Probability
De Moivre-Laplace Asymptotical Development (continue – 1)
p(k,n) \approx \sqrt{\frac{n}{2\pi\,k(n-k)}}\left(\frac{np}{k}\right)^k\left(\frac{nq}{n-k}\right)^{n-k}
For \delta_k = k - np \ll np (so k \approx np, \; n-k \approx nq), define \sigma := \sqrt{npq}; then
\sqrt{\frac{n}{2\pi\,k(n-k)}} \approx \frac{1}{\sqrt{2\pi npq}} = \frac{1}{\sigma\sqrt{2\pi}}
and, writing k = np + \delta_k, n - k = nq - \delta_k,
p(k,n) \approx \frac{1}{\sigma\sqrt{2\pi}}\left(1+\frac{\delta_k}{np}\right)^{-(np+\delta_k)}\left(1-\frac{\delta_k}{nq}\right)^{-(nq-\delta_k)}
90 SOLO Review of Probability
De Moivre-Laplace Asymptotical Development (continue – 2)
Take the logarithm and expand \ln(1+x) \approx x - x^2/2:
-\ln[\sigma\sqrt{2\pi}\,p(k,n)] \approx (np+\delta_k)\ln\left(1+\frac{\delta_k}{np}\right) + (nq-\delta_k)\ln\left(1-\frac{\delta_k}{nq}\right)
\approx \left(\delta_k + \frac{\delta_k^2}{2np}\right) + \left(-\delta_k + \frac{\delta_k^2}{2nq}\right) = \frac{\delta_k^2}{2npq} = \frac{\delta_k^2}{2\sigma^2}
from which
p(k,n) \approx \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{\delta_k^2}{2\sigma^2}\right)
Abraham de Moivre (1667-1754)   Pierre-Simon Laplace (1749-1827)
This result was first published by De Moivre in 1756 in "The Doctrine of Chances", 3rd Ed., and reviewed by Laplace in "Théorie Analytique des Probabilités", 1820
Central Limit Theorem
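The De Moivre-Laplace approximation above can be checked numerically. The sketch below (Python is my choice, not the deck's) compares the exact binomial pmf with the Gaussian density in δ_k = k − np and σ² = npq, for n = 1000, p = 0.3 — hypothetical illustration values.

```python
import math

def binom_pmf(n, k, p):
    # exact binomial probability via log-gamma for numerical stability
    logc = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(logc + k * math.log(p) + (n - k) * math.log(1 - p))

def de_moivre_laplace(n, k, p):
    # Gaussian approximation with delta_k = k - n*p and sigma^2 = n*p*q
    var = n * p * (1 - p)
    return math.exp(-(k - n * p) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

n, p = 1000, 0.3
max_rel_err = max(
    abs(binom_pmf(n, k, p) - de_moivre_laplace(n, k, p)) / binom_pmf(n, k, p)
    for k in range(285, 316)  # about +/- one standard deviation around np = 300
)
```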
91 SOLO Review of Probability
De Moivre-Laplace Asymptotical Development for Generalized Bernoulli Trials
Consider the r mutually exclusive events A_1, A_2, ..., A_r:
A_i \cap A_j = \emptyset, \quad i \ne j, \quad i,j = 1, 2, \ldots, r
with their sum equal to the certain event S: A_1 \cup A_2 \cup \cdots \cup A_r = S
and the probabilities of occurrence p(A_1) = p_1, \; p(A_2) = p_2, \; \ldots, \; p(A_r) = p_r
Therefore p(A_1) + p(A_2) + \cdots + p(A_r) = p_1 + p_2 + \cdots + p_r = 1
The probability that in n trials we obtain A_1 k_1 times, A_2 k_2 times, and so on, and A_r k_r times, such that k_1 + k_2 + \cdots + k_r = n, is
p(k_1, k_2, \ldots, k_r, n) = \frac{n!}{k_1!\,k_2!\cdots k_r!}\,p_1^{k_1} p_2^{k_2}\cdots p_r^{k_r}
As n goes to infinity, with np_i - \sqrt{n} \le k_i \le np_i + \sqrt{n}, we have
\frac{n!}{k_1!\cdots k_r!}\,p_1^{k_1}\cdots p_r^{k_r} \to \frac{1}{\sqrt{(2\pi n)^{r-1}\,p_1\cdots p_r}}\exp\left[-\frac{(k_1-np_1)^2}{2np_1} - \cdots - \frac{(k_r-np_r)^2}{2np_r}\right]
92 SOLO Review of Probability
De Moivre-Laplace Asymptotical Development for Generalized Poisson Trials
Consider the r-1 mutually exclusive events A_1, A_2, ..., A_{r-1}:
A_i \cap A_j = \emptyset, \quad i \ne j
with small probabilities of occurrence p(A_1) = p_1, \ldots, p(A_{r-1}) = p_{r-1}, such that
p_1 + p_2 + \cdots + p_{r-1} =: 1 - p_r \ll 1
The probability that in n trials we obtain A_1 k_1 times, A_2 k_2 times, and so on, and A_{r-1} k_{r-1} times, such that k_1 + k_2 + \cdots + k_{r-1} = n - k_r, is
p(k_1, \ldots, k_r, n) = \frac{n!}{k_1!\cdots k_r!}\,p_1^{k_1}\cdots p_r^{k_r}
As n goes to infinity
\frac{n!}{k_1!\cdots k_r!}\,p_1^{k_1}\cdots p_r^{k_r} \to \frac{(np_1)^{k_1}\exp(-np_1)}{k_1!}\cdots\frac{(np_{r-1})^{k_{r-1}}\exp(-np_{r-1})}{k_{r-1}!}
Table of Content
93 SOLO Review of Probability
Laplacian Distribution
Pierre-Simon Laplace (1749-1827)
Probability Density Function: p(x;\mu,b) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right)
Cumulative Distribution Function: P(x;\mu,b) = \int_{-\infty}^{x}\frac{1}{2b}\exp\left(-\frac{|u-\mu|}{b}\right)du
Mean Value: E(x) = \mu
Variance: Var(x) = 2b^2
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = \int_{-\infty}^{+\infty}\frac{1}{2b}\exp\left(-\frac{|u-\mu|}{b}\right)\exp(j\omega u)\,du = \frac{\exp(j\mu\omega)}{1+b^2\omega^2}
Distribution Examples
Table of Content
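A hedged sketch of inverse-CDF sampling from this Laplacian density (my own illustration in Python; the parameter values μ = 1, b = 2 are hypothetical), verifying the stated mean μ and variance 2b²:

```python
import math
import random

def sample_laplace(mu, b, rng):
    # Inverse-CDF sampling from the density (1/2b) exp(-|x - mu|/b)
    u = rng.random()
    if u < 0.5:
        return mu + b * math.log(2 * u)
    return mu - b * math.log(2 * (1 - u))

rng = random.Random(7)
mu, b, n = 1.0, 2.0, 200_000
xs = [sample_laplace(mu, b, rng) for _ in range(n)]
mean = sum(xs) / n                                  # theory: mu = 1
var = sum((x - mean) ** 2 for x in xs) / (n - 1)    # theory: 2*b^2 = 8
```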
94 SOLO Review of Probability
Gamma Distribution
Probability Density Function: p(x;k,\theta) = \frac{x^{k-1}\exp(-x/\theta)}{\Gamma(k)\,\theta^k} for x \ge 0; \quad 0 for x < 0
Cumulative Distribution Function: P(x;k,\theta) = \frac{\gamma(k, x/\theta)}{\Gamma(k)} for x \ge 0; \quad 0 for x < 0
Mean Value: E(x) = k\theta
Variance: Var(x) = k\theta^2
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = (1 - j\omega\theta)^{-k}
\Gamma is the gamma function: \Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt
\gamma(a,x) = \int_0^x t^{a-1}\exp(-t)\,dt is the incomplete gamma function
Distribution Examples
Table of Content
95 SOLO Review of Probability
Beta Distribution
Probability Density Function: p(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}
Cumulative Distribution Function: P(x;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^x u^{\alpha-1}(1-u)^{\beta-1}\,du
Mean Value: E(x) = \frac{\alpha}{\alpha+\beta}
Variance: Var(x) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = 1 + \sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{(j\omega)^k}{k!}
\Gamma is the gamma function: \Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt
Distribution Examples
Beta Distribution Example
Table of Content
96 SOLO Review of Probability
Cauchy Distribution
Augustin Louis Cauchy (1789-1857)
Probability Density Function: p(x;x_0,\gamma) = \frac{1}{\pi\gamma\left[1+\left(\frac{x-x_0}{\gamma}\right)^2\right]} = \frac{\gamma}{\pi\left[(x-x_0)^2+\gamma^2\right]}
Cumulative Distribution Function: P(x;x_0,\gamma) = \frac{1}{\pi}\arctan\left(\frac{x-x_0}{\gamma}\right) + \frac{1}{2}
Mean Value: not defined
Variance: not defined
Moment Generating Function: not defined
Distribution Examples
97 SOLO Review of Probability
Cauchy Distribution — Example of Cauchy Distribution Derivation
Assume a particle leaves the origin, moving with constant velocity toward a wall situated at a distance a from the origin. The angle θ between the particle velocity vector and the Ox axis is a random variable uniformly distributed between -θ₁ and +θ₁:
p_\Theta(\theta) = \frac{1}{2\theta_1} for -\theta_1 \le \theta \le \theta_1; \quad 0 elsewhere
Find the probability density function of y, the distance from the Ox axis at which the particle hits the wall: y = a\tan\theta
Solution: since \frac{dy}{d\theta} = a(1+\tan^2\theta) = \frac{a^2+y^2}{a},
p_Y(y) = p_\Theta(\theta)\left|\frac{d\theta}{dy}\right| = \frac{1}{2\theta_1}\,\frac{a}{a^2+y^2} for |y| \le a\tan\theta_1; \quad 0 elsewhere
Therefore we obtain, for \theta_1 = \pi/2, the Cauchy density with x_0 = 0 and \gamma = a.
Functions of One Random Variable
Table of Content
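The slide's particle model can be simulated directly. This sketch (Python, with a = 2 as a hypothetical wall distance) takes θ₁ = π/2, so the hit point should follow the Cauchy CDF arctan(y/a)/π + 1/2; it checks one quantile empirically.

```python
import math
import random

# Simulate the slide's setup with theta_1 = pi/2: theta uniform on (-pi/2, pi/2),
# y = a*tan(theta) yields a Cauchy(x0 = 0, gamma = a) distributed hit point.
rng = random.Random(3)
a, n = 2.0, 100_000
ys = [a * math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n)]

def cauchy_cdf(y, a):
    # P(y; x0 = 0, gamma = a) = arctan(y/a)/pi + 1/2
    return math.atan(y / a) / math.pi + 0.5

frac_below_a = sum(1 for y in ys if y < a) / n   # theory: cauchy_cdf(a, a) = 0.75
```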
98 SOLO Review of Probability
Exponential Distribution
Probability Density Function: p(x;\lambda) = \lambda\exp(-\lambda x) for x \ge 0; \quad 0 for x < 0
Cumulative Distribution Function: P(x;\lambda) = \int_{-\infty}^{x}p(u;\lambda)\,du = 1 - \exp(-\lambda x) for x \ge 0; \quad 0 for x < 0
Mean Value (integrating by parts with u = x, dv = \lambda\exp(-\lambda x)\,dx):
E(x) = \int_0^\infty x\,\lambda\exp(-\lambda x)\,dx = \left[-x\exp(-\lambda x)\right]_0^\infty + \int_0^\infty \exp(-\lambda x)\,dx = \frac{1}{\lambda}
Second Moment: E(x^2) = \left.j^{-2}\frac{d^2\Phi_X(\omega)}{d\omega^2}\right|_{\omega=0} = \frac{2}{\lambda^2}
Variance: Var(x) = E(x^2) - E^2(x) = \frac{1}{\lambda^2}
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = \int_0^\infty \lambda\exp(-\lambda x)\exp(j\omega x)\,dx = \frac{\lambda}{\lambda - j\omega} = \left(1 - \frac{j\omega}{\lambda}\right)^{-1}
Distributions examples
Table of Content
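Since P(x;λ) = 1 − exp(−λx) is invertible in closed form, exponential variates can be drawn by inverse transform. A minimal sketch (Python; λ = 0.5 is a hypothetical example value), checking the slide's first two moments 1/λ and 2/λ²:

```python
import math
import random

def sample_exponential(lam, rng):
    # Inverse CDF: x = -ln(1 - U)/lam, with U uniform on [0, 1)
    return -math.log(1.0 - rng.random()) / lam

rng = random.Random(11)
lam, n = 0.5, 200_000
xs = [sample_exponential(lam, rng) for _ in range(n)]
mean = sum(xs) / n                       # theory: 1/lam = 2
second = sum(x * x for x in xs) / n      # theory: 2/lam^2 = 8
```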
99 SOLO Review of Probability
Chi-square Distribution
Probability Density Function: p(x;k) = \frac{1}{2^{k/2}\Gamma(k/2)}\,x^{k/2-1}\exp(-x/2) for x \ge 0; \quad 0 for x < 0
Cumulative Distribution Function: P(x;k) = \frac{\gamma(k/2,\,x/2)}{\Gamma(k/2)} for x \ge 0; \quad 0 for x < 0
Mean Value: E(x) = k
Variance: Var(x) = 2k
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = (1 - 2j\omega)^{-k/2}
\Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt is the gamma function; \gamma(a,x) = \int_0^x t^{a-1}\exp(-t)\,dt is the incomplete gamma function
Distributions examples
100 SOLO Review of Probability
Derivation of Chi and Chi-square Distributions
Given k normal, independent random variables X_1, X_2, ..., X_k with zero mean values and the same variance σ², their joint density is
p_{X_1\cdots X_k}(x_1,\ldots,x_k) = \prod_{i=1}^{k}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x_i^2}{2\sigma^2}\right) = \frac{1}{(\sigma\sqrt{2\pi})^k}\exp\left(-\frac{x_1^2+\cdots+x_k^2}{2\sigma^2}\right)
Define
Chi-square: y := \chi_k^2 := x_1^2 + \cdots + x_k^2 \ge 0
Chi: \chi_k := \sqrt{x_1^2 + \cdots + x_k^2} \ge 0
p_{X_k}(\chi_k)\,d\chi_k = \Pr\left\{\chi_k \le \sqrt{x_1^2+\cdots+x_k^2} \le \chi_k + d\chi_k\right\}
The region in χ_k space where the joint density is constant is a hyper-shell of volume dV = A\,\chi^{k-1}\,d\chi (A to be defined; e.g., for k = 3, dV = 4\pi\chi^2\,d\chi), so
p_{X_k}(\chi_k)\,d\chi_k = \frac{1}{(\sigma\sqrt{2\pi})^k}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)A\,\chi_k^{k-1}\,d\chi_k
p_{X_k}(\chi_k) = \frac{A\,\chi_k^{k-1}}{(\sigma\sqrt{2\pi})^k}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)
101 SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 1)
p_{X_k}(\chi_k) = \frac{A\,\chi_k^{k-1}}{(\sigma\sqrt{2\pi})^k}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k), \qquad U(a) := 1 for a \ge 0, \; 0 for a < 0
Chi-square: y := \chi_k^2, so \chi_k = \sqrt{y} and
p_Y(y) = p_{X_k}(\chi_k)\left|\frac{d\chi_k}{dy}\right| = p_{X_k}(\sqrt{y})\,\frac{1}{2\sqrt{y}} = \frac{A}{2(\sigma\sqrt{2\pi})^k}\,y^{k/2-1}\exp\left(-\frac{y}{2\sigma^2}\right)U(y)
A is determined from the condition \int_{-\infty}^{\infty}p_Y(y)\,dy = 1:
\int_0^\infty \frac{A}{2(\sigma\sqrt{2\pi})^k}\,y^{k/2-1}\exp\left(-\frac{y}{2\sigma^2}\right)dy = \frac{A\,\Gamma(k/2)}{2\pi^{k/2}} = 1 \;\Rightarrow\; A = \frac{2\pi^{k/2}}{\Gamma(k/2)}
so
p_Y(y;k,\sigma) = \frac{1}{2^{k/2}\Gamma(k/2)\,\sigma^k}\,y^{k/2-1}\exp\left(-\frac{y}{2\sigma^2}\right)U(y)
p_{X_k}(\chi_k) = \frac{1}{2^{k/2-1}\Gamma(k/2)\,\sigma^k}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k)
\Gamma is the gamma function: \Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt
Function of One Random Variable
102 SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 2)
Chi-square: y := \chi_k^2 = x_1^2 + \cdots + x_k^2 \ge 0, where the x_i are Gaussian with zero mean and variance σ²
Mean Value: E\{\chi_k^2\} = E\{x_1^2\} + \cdots + E\{x_k^2\} = k\sigma^2
Using the 4th moment of a Gauss distribution, E\{x_i^4\} = 3\sigma^4, and independence, E\{x_i^2 x_j^2\} = E\{x_i^2\}E\{x_j^2\} = \sigma^4 for i \ne j:
E\{(\chi_k^2 - k\sigma^2)^2\} = E\left\{\left(\sum_{i=1}^k x_i^2\right)^2\right\} - k^2\sigma^4 = \sum_{i=1}^k E\{x_i^4\} + \sum_{i\ne j}E\{x_i^2\}E\{x_j^2\} - k^2\sigma^4 = 3k\sigma^4 + k(k-1)\sigma^4 - k^2\sigma^4 = 2k\sigma^4
Variance: E\{(\chi_k^2 - k\sigma^2)^2\} = 2k\sigma^4
Table of Content
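A small simulation can confirm the two moments just derived. This sketch (Python; k = 4, σ = 1.5 are hypothetical illustration values) builds χ_k² as the sum of k squared zero-mean Gaussians and checks E = kσ² and Var = 2kσ⁴:

```python
import random

# Empirical check of E{chi_k^2} = k*sigma^2 and Var{chi_k^2} = 2k*sigma^4
rng = random.Random(5)
k, sigma, n = 4, 1.5, 100_000
ys = [sum(rng.gauss(0.0, sigma) ** 2 for _ in range(k)) for _ in range(n)]
mean = sum(ys) / n                                 # theory: k*sigma^2 = 9
var = sum((y - mean) ** 2 for y in ys) / (n - 1)   # theory: 2*k*sigma^4 = 40.5
```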
103 SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 3)
[Figure: tail probabilities of the chi-square and normal densities]
The Table presents the points on the chi-square distribution for a given upper tail probability Q = Pr{y > x}, where y = χ_n² and n is the number of degrees of freedom. This tabulated function is also known as the complementary distribution. An alternative way of writing the previous equation is
1 - Q = Pr{y \le x}
which indicates that to the left of the point x the probability mass is 1 - Q. This is the 100(1 - Q) percentile point.
Examples
1. The 95% probability region for a χ₂² variable can be taken as the one-sided region (cutting off the 5% upper tail): [0, χ₂²(0.95)] = [0, 5.99]
2. Or the two-sided probability region (cutting off both 2.5% tails): [χ₂²(0.025), χ₂²(0.975)] = [0.05, 7.38]
3. For a χ₁₀₀² variable, the two-sided 95% probability region (cutting off both 2.5% tails) is [χ₁₀₀²(0.025), χ₁₀₀²(0.975)] = [74, 130]
104 SOLO Review of Probability
Derivation of Chi and Chi-square Distributions (continue – 4)
Note the skewedness of the chi-square distribution: the above two-sided regions are not symmetric about the corresponding means E{χ_n²} = n.
[Figure: tail probabilities of the chi-square and normal densities]
For degrees of freedom above 100, the following approximation of the points on the chi-square distribution can be used:
\chi_n^2(1-Q) = \frac{1}{2}\left[G(1-Q) + \sqrt{2n-1}\right]^2
where G(·) is given in the last line of the Table and is the point x on the standard (zero mean and unit variance) Gaussian distribution for the same tail probabilities: with p(y) = N(y; 0, 1) and Q = Pr{y > x}, we have x(1-Q) := G(1-Q).
Table of Content
105 SOLO Review of Probability
Student's t-Distribution
Probability Density Function: p(x;\nu) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1+\frac{x^2}{\nu}\right)^{-(\nu+1)/2}
Cumulative Distribution Function: P(x;\nu) = \frac{1}{2} + x\,\frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\sum_{n=0}^{\infty}\frac{\left(\frac{1}{2}\right)^{(n)}\left(\frac{\nu+1}{2}\right)^{(n)}}{\left(\frac{3}{2}\right)^{(n)}}\,\frac{(-x^2/\nu)^n}{n!}
where a^{(n)} := a(a+1)(a+2)\cdots(a+n-1) is the rising factorial
Mean Value: E(x) = 0 for \nu > 1; undefined for \nu = 1
Variance: Var(x) = \frac{\nu}{\nu-2} for \nu > 2; \infty otherwise
Moment Generating Function: not defined
It gets its name from W. S. Gosset, who wrote under the pseudonym "Student"
William Sealy Gosset 1876-1937
Distributions examples
Table of Content
106 SOLO Review of Probability
Uniform Distribution (Continuous)
Probability Density Function: p(x;a,b) = \frac{1}{b-a} for a \le x \le b; \quad 0 for x < a or x > b
Cumulative Distribution Function: P(x;a,b) = 0 for x < a; \quad \frac{x-a}{b-a} for a \le x \le b; \quad 1 for x > b
Mean Value: E(x) = \frac{a+b}{2}
Variance: Var(x) = \frac{(b-a)^2}{12}
Moment Generating Function: \Phi_X(\omega) = E[\exp(j\omega x)] = \frac{\exp(j\omega b) - \exp(j\omega a)}{j\omega(b-a)}
Distributions examples
Moments
Table of Content
107 SOLO Review of Probability
Rayleigh Distribution
John William Strutt, Lord Rayleigh (1842-1919)
Probability Density Function: p(x;\sigma) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right)
Cumulative Distribution Function: P(x;\sigma) = 1 - \exp\left(-\frac{x^2}{2\sigma^2}\right)
Mean Value: E(x) = \sigma\sqrt{\frac{\pi}{2}}
Variance: Var(x) = \frac{4-\pi}{2}\,\sigma^2
Moment Generating Function: \Phi(\omega) = 1 - \sigma\omega\exp(-\sigma^2\omega^2/2)\sqrt{\frac{\pi}{2}}\left[\mathrm{erfi}\left(\frac{\sigma\omega}{\sqrt{2}}\right) - j\right]
The Rayleigh Distribution is the chi-distribution with k = 2:
p_{X_k}(\chi_k) = \frac{1}{2^{k/2-1}\Gamma(k/2)\,\sigma^k}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k)
Distributions examples
Moments
108 SOLO Review of Probability
Rayleigh Distribution — Example of Rayleigh Distribution
Given X and Y, two independent Gaussian random variables with zero means and the same variance σ²:
p_{XY}(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)
find the distributions of R and Θ given by R = \sqrt{X^2+Y^2}, \; \Theta = \tan^{-1}(Y/X)
Solution: with x = r\cos\theta, y = r\sin\theta, dx\,dy = r\,dr\,d\theta,
p_R(r)\,dr\;p_\Theta(\theta)\,d\theta = p_{XY}(x,y)\,dx\,dy = \frac{r}{2\pi\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right)dr\,d\theta
where
p_\Theta(\theta) = \frac{1}{2\pi}, \quad 0 \le \theta \le 2\pi   (Uniform Distribution)
p_R(r) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right), \quad r \ge 0   (Rayleigh Distribution)
Table of Content
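The example above can be simulated directly: draw pairs of independent zero-mean Gaussians and check that the envelope matches the Rayleigh mean σ√(π/2) and variance (4−π)σ²/2. A minimal sketch in Python (σ = 2 is a hypothetical value):

```python
import math
import random

# Envelope R = sqrt(X^2 + Y^2) of two independent zero-mean Gaussians (same sigma)
# should follow the Rayleigh density (r/sigma^2) exp(-r^2/(2 sigma^2)).
rng = random.Random(42)
sigma, n = 2.0, 100_000
rs = [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]
mean = sum(rs) / n                                 # theory: sigma*sqrt(pi/2)
var = sum((r - mean) ** 2 for r in rs) / (n - 1)   # theory: (4 - pi)/2 * sigma^2
```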
109 SOLO Review of Probability
Rice Distribution
Stephen O. Rice 1907-1986
Probability Density Function: p(x;v,\sigma) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2+v^2}{2\sigma^2}\right)I_0\left(\frac{xv}{\sigma^2}\right)
where I_0\left(\frac{xv}{\sigma^2}\right) = \frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{xv\cos\varphi'}{\sigma^2}\right)d\varphi' is the zero-order modified Bessel function of the first kind
Cumulative Distribution Function: P(x;v,\sigma) = 1 - Q_1\left(\frac{v}{\sigma},\frac{x}{\sigma}\right), where Q_1 is the Marcum Q-function
Mean Value: E(x) = \sigma\sqrt{\frac{\pi}{2}}\,L_{1/2}\left(-\frac{v^2}{2\sigma^2}\right)
Variance: Var(x) = 2\sigma^2 + v^2 - \frac{\pi\sigma^2}{2}\,L_{1/2}^2\left(-\frac{v^2}{2\sigma^2}\right)
(L_{1/2} is a Laguerre function; for v = 0 these reduce to the Rayleigh values \sigma\sqrt{\pi/2} and \frac{4-\pi}{2}\sigma^2)
Distributions examples
110 SOLO Review of Probability
Rice Distribution — Example of Rice Distribution
The Rice Distribution applies to the statistics of the envelope of the output of a bandpass filter consisting of signal plus noise:
s(t) + n(t) = A\cos(\omega_0 t + \varphi) + n_C(t)\cos(\omega_0 t) - n_S(t)\sin(\omega_0 t)
= [n_C(t) + A\cos\varphi]\cos(\omega_0 t) - [n_S(t) + A\sin\varphi]\sin(\omega_0 t)
X = n_C(t) and Y = n_S(t) are Gaussian random variables with zero mean and the same variance σ², and φ is the unknown but constant signal phase. Define the output envelope R and phase Θ:
R = \sqrt{[n_C(t)+A\cos\varphi]^2 + [n_S(t)+A\sin\varphi]^2}
\Theta = \tan^{-1}\{[n_S(t)+A\sin\varphi]\,/\,[n_C(t)+A\cos\varphi]\}
Solution: with r\cos\theta = x + A\cos\varphi, \; r\sin\theta = y + A\sin\varphi,
p_{R\Theta}(r,\theta)\,dr\,d\theta = p_{XY}(x,y)\,dx\,dy = \frac{r}{2\pi\sigma^2}\exp\left(-\frac{r^2+A^2-2rA\cos(\theta-\varphi)}{2\sigma^2}\right)dr\,d\theta
p_R(r) = \int_0^{2\pi}p_{R\Theta}(r,\theta)\,d\theta = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right)\frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{rA\cos(\theta-\varphi)}{\sigma^2}\right)d\theta
111 SOLO Review of Probability
Rice Distribution — Example of Rice Distribution (continue – 1)
p_R(r) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right)\frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{rA\cos\varphi'}{\sigma^2}\right)d\varphi'
where \frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{rA\cos\varphi'}{\sigma^2}\right)d\varphi' = I_0\left(\frac{rA}{\sigma^2}\right) is the zero-order modified Bessel function of the first kind, so
p_R(r;A,\sigma) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right)I_0\left(\frac{Ar}{\sigma^2}\right)   (Rice Distribution)
Since I_0(0) = 1, if in the Rice Distribution we take A = 0 we obtain
p_R(r;0,\sigma) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right)   (Rayleigh Distribution)
Table of Content
112 SOLO Review of Probability
Weibull Distribution
Ernst Hjalmar Waloddi Weibull 1887-1979
Probability Density Function: p(x;\gamma,\mu,\alpha) = \frac{\gamma}{\alpha}\left(\frac{x-\mu}{\alpha}\right)^{\gamma-1}\exp\left[-\left(\frac{x-\mu}{\alpha}\right)^\gamma\right] for x \ge \mu, \; \gamma, \alpha > 0; \quad 0 for x < \mu
Cumulative Distribution Function: P(x;\gamma,\mu,\alpha) = \int_{-\infty}^{x}p(u;\gamma,\mu,\alpha)\,du = 1 - \exp\left[-\left(\frac{x-\mu}{\alpha}\right)^\gamma\right]
Mean Value (for \mu = 0): E(x) = \alpha\,\Gamma\left(1+\frac{1}{\gamma}\right)
Variance (for \mu = 0): Var(x) = \alpha^2\,\Gamma\left(1+\frac{2}{\gamma}\right) - E^2(x)
\Gamma is the gamma function: \Gamma(a) = \int_0^\infty t^{a-1}\exp(-t)\,dt
Distributions examples
Table of Content
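Because the Weibull CDF inverts in closed form, sampling is straightforward by inverse transform: x = μ + α(−ln(1−U))^{1/γ}. A hedged sketch in Python (the parameter values γ = 2, μ = 0, α = 3 are hypothetical), checking the stated mean αΓ(1+1/γ):

```python
import math
import random

def sample_weibull(gamma_, mu, alpha, rng):
    # Inverse CDF obtained by solving P = 1 - exp(-((x-mu)/alpha)^gamma) for x
    return mu + alpha * (-math.log(1.0 - rng.random())) ** (1.0 / gamma_)

rng = random.Random(9)
g, mu, alpha, n = 2.0, 0.0, 3.0, 200_000
xs = [sample_weibull(g, mu, alpha, rng) for _ in range(n)]
mean = sum(xs) / n   # theory: mu + alpha*Gamma(1 + 1/gamma) = 3*Gamma(1.5)
```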
113 SOLO Kinetic Theory of Gases
MAXWELL'S VELOCITY DISTRIBUTION
JAMES CLERK MAXWELL (1831-1879)
IN 1859 MAXWELL PROPOSED THE FOLLOWING MODEL: ASSUME THAT THE VELOCITY COMPONENTS OF N MOLECULES, ENCLOSED IN A CUBE WITH SIDE l, ALONG EACH OF THE THREE COORDINATE AXES ARE INDEPENDENTLY AND IDENTICALLY DISTRIBUTED ACCORDING TO THE DENSITY f_0(\alpha) = f_0(-\alpha), I.E.,
f_0(\vec v)\,d^3v = f_0(v_x - v_{0x})\,f_0(v_y - v_{0y})\,f_0(v_z - v_{0z})\,dv_x\,dv_y\,dv_z = A\exp\left[-B(\vec v - \vec v_0)\cdot(\vec v - \vec v_0)\right]dv_x\,dv_y\,dv_z
f(v_i)\,dv_i = THE PROBABILITY THAT THE i-TH VELOCITY COMPONENT IS BETWEEN v_i AND v_i + dv_i; \; i = x, y, z
MAXWELL ASSUMED THAT THE DISTRIBUTION DEPENDS ONLY ON THE MAGNITUDE OF THE VELOCITY.
114 SOLO Kinetic Theory of Gases
MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)
LET US FIND THE CONSTANTS A, B AND \vec v_0 IN f_0(\vec v) = A\exp[-B(\vec v - \vec v_0)^2]
SINCE THE DEFINITION OF THE TOTAL NUMBER OF PARTICLES N IS
N = \int d^3r \int d^3v\; f(\vec r, \vec v, t)
WE HAVE IN EQUILIBRIUM
\frac{N}{V} = \int d^3v\; f_0(\vec v) = A\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\exp[-B(v_x^2+v_y^2+v_z^2)]\,dv_x\,dv_y\,dv_z = A\left(\frac{\pi}{B}\right)^{3/2}
WHERE V = \int d^3r IS THE VOLUME OF THE CONTAINER. IT FOLLOWS THAT B > 0 AND
A = \frac{N}{V}\left(\frac{B}{\pi}\right)^{3/2}
115 SOLO Kinetic Theory of Gases
MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)
LET US FIND THE CONSTANTS A, B AND \vec v_0 IN f_0(\vec v) = A\exp[-B(\vec v - \vec v_0)^2]
THE AVERAGE VELOCITY IS GIVEN BY
\langle\vec v\rangle = \frac{\int d^3v\;\vec v\,f_0(\vec v)}{\int d^3v\;f_0(\vec v)} = \frac{VA}{N}\int d^3v\,[(\vec v - \vec v_0) + \vec v_0]\exp[-B(\vec v - \vec v_0)^2] = \vec v_0
THE AVERAGE KINETIC ENERGY OF THE MOLECULES \bar\varepsilon, WHEN \vec v_0 = 0, IS
\bar\varepsilon = \frac{\int d^3v\;\frac{1}{2}mv^2 f_0(\vec v)}{\int d^3v\;f_0(\vec v)} = \frac{VAm}{2N}\int d^3v\;v^2\exp(-Bv^2) = \frac{3m}{4B}
WE FOUND ALSO THAT FOR A MONOATOMIC GAS \bar\varepsilon = \frac{3}{2}kT, THEREFORE
B = \frac{3m}{4\bar\varepsilon} = \frac{m}{2kT}, \qquad A = \frac{N}{V}\left(\frac{B}{\pi}\right)^{3/2} = \frac{N}{V}\left(\frac{m}{2\pi kT}\right)^{3/2}
116 SOLO Kinetic Theory of Gases
MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)
MAXWELL'S VELOCITY DISTRIBUTION BECOMES
f_0(\vec v) = \frac{N}{V}\left(\frac{m}{2\pi kT}\right)^{3/2}\exp\left(-\frac{m\,\vec v\cdot\vec v}{2kT}\right)
OR
f_0(\vec v)\,d^3v = f(v_x)f(v_y)f(v_z)\,dv_x\,dv_y\,dv_z = \frac{N}{V}\left(\frac{m}{2\pi kT}\right)^{3/2}\exp\left[-\frac{m(v_x^2+v_y^2+v_z^2)}{2kT}\right]dv_x\,dv_y\,dv_z
117 SOLO Kinetic Theory of Gases
MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)
f_0(\vec v) = \frac{N}{V}\left(\frac{m}{2\pi kT}\right)^{3/2}\exp\left(-\frac{m\,\vec v\cdot\vec v}{2kT}\right)
MAXWELL'S SPEED DISTRIBUTION IS THE CHI-DISTRIBUTION WITH k = 3:
p_{X_k}(\chi_k) = \frac{1}{2^{k/2-1}\Gamma(k/2)\,\sigma^k}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k)
Table of Content
118 SOLO Kinetic Theory of Gases
MOLECULAR MODELS
BOLTZMANN STATISTICS (LUDWIG BOLTZMANN)
• DISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w = N!\prod_j \frac{g_j^{N_j}}{N_j!}
BOSE-EINSTEIN STATISTICS (SATYENDRA NATH BOSE, ALBERT EINSTEIN)
• INDISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w = \prod_j \frac{(g_j+N_j-1)!}{(g_j-1)!\,N_j!}
FERMI-DIRAC STATISTICS (ENRICO FERMI, PAUL A. M. DIRAC)
• INDISTINGUISHABLE PARTICLES
• ONE PARTICLE PER QUANTUM STATE
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}
CONSTRAINTS: N = \sum_j N_j, \qquad E = \sum_j \varepsilon'_j N_j
Table of Content
119 SOLO Kinetic Theory of Gases
MOLECULAR MODELS — BOLTZMANN STATISTICS (LUDWIG BOLTZMANN)
• DISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g_1, g_2, \ldots, g_j AT THE ENERGY LEVELS \varepsilon'_1, \varepsilon'_2, \ldots, \varepsilon'_j
- NUMBER OF PARTICLES N_1, N_2, \ldots, N_j IN STATES g_1, g_2, \ldots, g_j
THE NUMBER OF WAYS N DISTINGUISHABLE PARTICLES (N = \sum_j N_j) CAN BE DIVIDED IN GROUPS WITH N_1, N_2, \ldots, N_j, \ldots PARTICLES IS \frac{N!}{\prod_j N_j!}
THE NUMBER OF WAYS N_j PARTICLES CAN BE PLACED IN THE g_j STATES IS g_j^{N_j}
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}
120 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOLTZMANN STATISTICS: w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}
USING THE STIRLING FORMULA \ln(a!) \approx a\ln a - a:
\ln w = \ln N! + \sum_j (N_j\ln g_j - \ln N_j!) \approx N\ln N - N + \sum_j (N_j\ln g_j - N_j\ln N_j + N_j)
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST COMPUTE THE DIFFERENTIAL
d(\ln w) = \sum_j (\ln g_j - \ln N_j)\,dN_j = 0
CONSTRAINED BY
N = \sum_j N_j \;\Rightarrow\; dN = \sum_j dN_j = 0
E = \sum_j \varepsilon'_j N_j \;\Rightarrow\; dE = \sum_j \varepsilon'_j\,dN_j = 0
121 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOLTZMANN STATISTICS (CONTINUE): w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}
WE OBTAINED
d(\ln w) = -\sum_j \ln\left(\frac{N_j}{g_j}\right)dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\,dN_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS \alpha, \beta TO OBTAIN
\sum_j\left[\ln\left(\frac{N_j}{g_j}\right) + \alpha + \beta\varepsilon'_j\right]dN_j = 0 \;\Rightarrow\; \ln\left(\frac{N_j^*}{g_j}\right) + \alpha + \beta\varepsilon'_j = 0
OR
N_j^*\big|_{Boltz} = g_j\,e^{-\alpha}e^{-\beta\varepsilon'_j}
BOLTZMANN MOST PROBABLE MACROSTATE
Table of Content
122 SOLO Kinetic Theory of Gases
MOLECULAR MODELS — BOSE-EINSTEIN STATISTICS
SATYENDRA NATH BOSE (1894-1974), ALBERT EINSTEIN (1879-1955)
• INDISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g_1, g_2, \ldots, g_j AT THE ENERGY LEVELS \varepsilon'_1, \varepsilon'_2, \ldots, \varepsilon'_j
- NUMBER OF PARTICLES N_1, N_2, \ldots, N_j (N = \sum_j N_j) IN STATES g_1, g_2, \ldots, g_j
THE NUMBER OF WAYS N_j INDISTINGUISHABLE PARTICLES CAN BE PLACED IN THE g_j STATES IS \frac{(g_j+N_j-1)!}{(g_j-1)!\,N_j!}
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w_{B\text{-}E} = \prod_j \frac{(g_j+N_j-1)!}{(g_j-1)!\,N_j!}
123 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS (CONTINUE)
w_{B\text{-}E} = \prod_j \frac{(g_j+N_j-1)!}{(g_j-1)!\,N_j!} \approx \prod_j \frac{(g_j+N_j)!}{g_j!\,N_j!}
USING THE STIRLING FORMULA \ln(a!) \approx a\ln a - a:
\ln w \approx \sum_j\left[(g_j+N_j)\ln(g_j+N_j) - g_j\ln g_j - N_j\ln N_j\right] = \sum_j\left[N_j\ln\left(1+\frac{g_j}{N_j}\right) + g_j\ln\left(1+\frac{N_j}{g_j}\right)\right]
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST COMPUTE THE DIFFERENTIAL
d(\ln w) = \sum_j \ln\left(1+\frac{g_j}{N_j}\right)dN_j = 0
124 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS (CONTINUE)
d(\ln w) = \sum_j \ln\left(1+\frac{g_j}{N_j}\right)dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\,dN_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS \alpha, \beta TO OBTAIN
\sum_j\left[\ln\left(1+\frac{g_j}{N_j}\right) - \alpha - \beta\varepsilon'_j\right]dN_j = 0 \;\Rightarrow\; \ln\left(1+\frac{g_j}{N_j^*}\right) - \alpha - \beta\varepsilon'_j = 0
OR
\left.\frac{N_j^*}{g_j}\right|_{B\text{-}E} = \frac{1}{e^{\alpha}e^{\beta\varepsilon'_j} - 1}
BOSE-EINSTEIN MOST PROBABLE MACROSTATE
Table of Content
125 SOLO Kinetic Theory of Gases
MOLECULAR MODELS — FERMI-DIRAC STATISTICS
ENRICO FERMI (1901-1954), PAUL A. M. DIRAC (1902-1984)
• INDISTINGUISHABLE PARTICLES
• ONE PARTICLE PER QUANTUM STATE
A MACROSTATE IS DEFINED BY
- QUANTUM STATES g_1, g_2, \ldots, g_j AT THE ENERGY LEVELS \varepsilon'_1, \varepsilon'_2, \ldots, \varepsilon'_j
- NUMBER OF PARTICLES N_1, N_2, \ldots, N_j (N = \sum_j N_j) IN STATES g_1, g_2, \ldots, g_j
THE NUMBER OF WAYS N_j INDISTINGUISHABLE PARTICLES CAN BE PLACED IN THE g_j STATES (AT MOST ONE PER STATE) IS \frac{g_j!}{N_j!\,(g_j-N_j)!}
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE: w_{F\text{-}D} = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}
126 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
FERMI-DIRAC STATISTICS (CONTINUE)
w_{F\text{-}D} = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}
USING THE STIRLING FORMULA \ln(a!) \approx a\ln a - a:
\ln w \approx \sum_j\left[g_j\ln g_j - N_j\ln N_j - (g_j-N_j)\ln(g_j-N_j)\right]
TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST COMPUTE THE DIFFERENTIAL
d(\ln w) = \sum_j\left[\ln(g_j-N_j) - \ln N_j\right]dN_j = \sum_j \ln\left(\frac{g_j-N_j}{N_j}\right)dN_j = 0
127 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
FERMI-DIRAC STATISTICS (CONTINUE)
d(\ln w) = \sum_j \ln\left(\frac{g_j-N_j}{N_j}\right)dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\,dN_j = 0
LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS \alpha, \beta TO OBTAIN
\sum_j\left[\ln\left(\frac{g_j-N_j}{N_j}\right) - \alpha - \beta\varepsilon'_j\right]dN_j = 0 \;\Rightarrow\; \ln\left(\frac{g_j-N_j^*}{N_j^*}\right) - \alpha - \beta\varepsilon'_j = 0
OR
\left.\frac{N_j^*}{g_j}\right|_{F\text{-}D} = \frac{1}{e^{\alpha}e^{\beta\varepsilon'_j} + 1}
FERMI-DIRAC MOST PROBABLE MACROSTATE
128 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOLTZMANN: w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!} \qquad BOSE-EINSTEIN: w_{B\text{-}E} = \prod_j \frac{(g_j+N_j-1)!}{(g_j-1)!\,N_j!} \qquad FERMI-DIRAC: w_{F\text{-}D} = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}
FOR GASES AT LOW PRESSURES OR HIGH TEMPERATURES THE NUMBER OF QUANTUM STATES g_j AVAILABLE AT ANY LEVEL IS MUCH LARGER THAN THE NUMBER OF PARTICLES N_j IN THAT LEVEL: g_j \gg N_j
\frac{(g_j+N_j-1)!}{(g_j-1)!} = (g_j+N_j-1)(g_j+N_j-2)\cdots g_j \approx g_j^{N_j}, \qquad \frac{g_j!}{(g_j-N_j)!} = g_j(g_j-1)\cdots(g_j-N_j+1) \approx g_j^{N_j}
SO
w_{B\text{-}E}\big|_{g_j \gg N_j} \approx w_{F\text{-}D}\big|_{g_j \gg N_j} \approx \frac{w_{Boltz}}{N!} = \prod_j \frac{g_j^{N_j}}{N_j!}
AND
N_j^*\big|_{B\text{-}E} \approx N_j^*\big|_{F\text{-}D} \approx N_j^*\big|_{Boltz} = g_j\,e^{-\alpha}e^{-\beta\varepsilon'_j}
129 SOLO Kinetic Theory of Gases
THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
w_{B\text{-}E}\big|_{g_j \gg N_j} \approx w_{F\text{-}D}\big|_{g_j \gg N_j} \approx \frac{w_{Boltz}}{N!} = \prod_j \frac{g_j^{N_j}}{N_j!}, \qquad N_j^*\big|_{B\text{-}E} \approx N_j^*\big|_{F\text{-}D} \approx N_j^*\big|_{Boltz} = g_j\,e^{-\alpha}e^{-\beta\varepsilon'_j}
DIVIDING THE VALUE OF w FOR BOLTZMANN STATISTICS, WHICH ASSUMED DISTINGUISHABLE PARTICLES, BY N! HAS THE EFFECT OF DISCOUNTING THE DISTINGUISHABILITY OF THE N PARTICLES.
Table of Content
130 SOLO Review of Probability
Monte Carlo Method
Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used when simulating physical and mathematical systems. Because of their reliance on repeated computation and random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm.
The term Monte Carlo method was coined in the 1940s by physicists Stanislaw Ulam, Enrico Fermi, John von Neumann, and Nicholas Metropolis, working on nuclear weapon projects in the Los Alamos National Laboratory (a reference to the Monte Carlo Casino in Monaco, where Ulam's uncle would borrow money to gamble).
Stanislaw Ulam 1909-1984; Enrico Fermi 1901-1954; John von Neumann 1903-1957; Nicholas Constantine Metropolis 1915-1999
131 SOLO Review of Probability
Monte Carlo Approximation
Monte Carlo runs generate a set of random samples x^{(L)} \sim p(x) that approximate the distribution p(x). So, with P samples, expectations with respect to the distribution are approximated by
\int f(x)\,p(x)\,dx \approx \frac{1}{P}\sum_{L=1}^{P} f(x^{(L)})
and, in the usual way for Monte Carlo, this gives all the moments etc. of the distribution up to some degree of approximation:
\mu_1 = E\{x\} = \int x\,p(x)\,dx \approx \frac{1}{P}\sum_{L=1}^{P} x^{(L)}
\mu_n = E\{(x-\mu_1)^n\} = \int (x-\mu_1)^n\,p(x)\,dx \approx \frac{1}{P}\sum_{L=1}^{P}\left(x^{(L)}-\mu_1\right)^n
x^{(L)} are samples generated (drawn) from the distribution p(x)
Table of Content
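The approximation above is easy to demonstrate numerically. In this sketch (Python; the choice p(x) = N(0, 1), f(x) = x², for which E{f} = 1, is a hypothetical example):

```python
import random

# Monte Carlo approximation of an expectation: E{f(x)} ~ (1/P) * sum_L f(x^(L)),
# with the x^(L) drawn from p(x). Here p(x) = N(0, 1) and f(x) = x^2, so E{f} = 1.
rng = random.Random(0)
P = 200_000
estimate = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(P)) / P
```

The estimate converges at the usual Monte Carlo rate O(1/√P).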
132 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (Unknown Statistics)
A random variable x may take on any value in the range -∞ to +∞. Based on a sample of k values x_i, i = 1, 2, ..., k, we wish to compute the sample mean \hat m_k and sample variance \hat\sigma_k^2 as estimates of the population mean m and variance σ². Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples:
E\{x_i\} = E\{x_j\} = m, \qquad E\{x_i^2\} = E\{x_j^2\} = \sigma^2 + m^2, \qquad E\{x_i x_j\} \overset{indep.}{=} E\{x_i\}E\{x_j\} = m^2 \;(i \ne j)
Define the estimate of the population mean: \hat m_k := \frac{1}{k}\sum_{i=1}^{k} x_i; then
E\{\hat m_k\} = \frac{1}{k}\sum_{i=1}^{k} E\{x_i\} = m   (Unbiased)
Compute
E\left\{\frac{1}{k}\sum_{i=1}^{k}(x_i-\hat m_k)^2\right\} = \frac{1}{k}\sum_{i=1}^{k}E\{x_i^2\} - E\{\hat m_k^2\} = (\sigma^2+m^2) - \left(\frac{\sigma^2}{k}+m^2\right) = \frac{k-1}{k}\,\sigma^2   (Biased)
133 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 1)
Since E\left\{\frac{1}{k}\sum_{i=1}^{k}(x_i-\hat m_k)^2\right\} = \frac{k-1}{k}\,\sigma^2 is biased, the unbiased estimate of the sample variance of the population is defined as
\hat\sigma_k^2 := \frac{1}{k-1}\sum_{i=1}^{k}(x_i-\hat m_k)^2
since
E\{\hat\sigma_k^2\} = E\left\{\frac{1}{k-1}\sum_{i=1}^{k}(x_i-\hat m_k)^2\right\} = \sigma^2   (Unbiased)
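The bias factor (k−1)/k is easy to see empirically: average both estimators over many small samples. A sketch in Python (k = 5 and N(0, 1) samples are hypothetical illustration choices):

```python
import random

# Average the 1/k (biased) and 1/(k-1) (unbiased) sample-variance estimators
# over many independent samples of size k drawn from N(0, 1), where sigma^2 = 1.
rng = random.Random(2)
k, trials = 5, 50_000
biased_avg = unbiased_avg = 0.0
for _ in range(trials):
    xs = [rng.gauss(0.0, 1.0) for _ in range(k)]
    m_hat = sum(xs) / k
    s = sum((x - m_hat) ** 2 for x in xs)
    biased_avg += s / k
    unbiased_avg += s / (k - 1)
biased_avg /= trials      # theory: (k-1)/k * sigma^2 = 0.8
unbiased_avg /= trials    # theory: sigma^2 = 1.0
```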
134 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 2)
Summary: with i.i.d. samples x_i, i = 1, ..., k,
E\{\hat m_k\} = E\left\{\frac{1}{k}\sum_{i=1}^{k} x_i\right\} = m, \qquad E\{\hat\sigma_k^2\} = E\left\{\frac{1}{k-1}\sum_{i=1}^{k}(x_i-\hat m_k)^2\right\} = \sigma^2
135 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 3)
Let us compute the variance of the sample-mean estimate:
\sigma^2_{\hat m_k} := E\{(\hat m_k - m)^2\} = E\left\{\left(\frac{1}{k}\sum_{i=1}^{k}(x_i-m)\right)^2\right\} = \frac{1}{k^2}\left[\sum_{i=1}^{k}E\{(x_i-m)^2\} + \sum_{i\ne j}\underbrace{E\{(x_i-m)(x_j-m)\}}_{=\,0}\right] = \frac{k\sigma^2}{k^2}
\sigma^2_{\hat m_k} = E\{(\hat m_k - m)^2\} = \frac{\sigma^2}{k}
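A direct empirical check of σ²_{m̂_k} = σ²/k (sketched in Python; k = 10, σ = 2 are hypothetical values): form many independent sample means of k draws and measure their spread.

```python
import random

# Empirical check that the variance of the sample mean of k draws is sigma^2 / k.
rng = random.Random(4)
k, trials, sigma = 10, 50_000, 2.0
means = [sum(rng.gauss(0.0, sigma) for _ in range(k)) / k for _ in range(trials)]
grand_mean = sum(means) / trials
var_of_mean = sum((m - grand_mean) ** 2 for m in means) / (trials - 1)
# theory: sigma^2 / k = 4/10 = 0.4
```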
136 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)
Let us compute the variance of the sample-variance estimate:
\sigma^2_{\hat\sigma_k^2} := E\{(\hat\sigma_k^2 - \sigma^2)^2\} = E\left\{\left(\frac{1}{k-1}\sum_{i=1}^{k}(x_i-\hat m_k)^2 - \sigma^2\right)^2\right\}
Write x_i - \hat m_k = (x_i - m) - (\hat m_k - m) and expand. Since (x_i - m), (x_j - m) and (\hat m_k - m) are all independent for i \ne j, the cross terms of first order vanish in expectation, and the leading terms involve the fourth central moment \mu_4 := E\{(x_i - m)^4\}.
  • 137.
137 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)
Since (x_i − m) and (x_j − m) are independent for i ≠ j, collecting the surviving terms and keeping only the leading order in 1/k gives, with $\mu_4 := E\{(x_i-m)^4\}$:
$$\sigma^2_{\hat\sigma_k^2} \approx \frac{\mu_4-\sigma^4}{k}$$
138 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 5)
We found:
$$E\{\hat m_k\} = m,\qquad E\{\hat\sigma_k^2\} = \sigma^2$$
$$\sigma^2_{\hat m_k} := E\{(m-\hat m_k)^2\} = \frac{\sigma^2}{k}$$
$$\sigma^2_{\hat\sigma_k^2} := E\{(\sigma^2-\hat\sigma_k^2)^2\} \approx \frac{\mu_4-\sigma^4}{k},\qquad \mu_4 := E\{(x_i-m)^4\}$$
Define the kurtosis of the random variable x_i:
$$\lambda := \frac{\mu_4}{\sigma^4}$$
Then:
$$\sigma^2_{\hat\sigma_k^2} := E\{(\sigma^2-\hat\sigma_k^2)^2\} \approx \frac{(\lambda-1)\,\sigma^4}{k}$$
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 6)
For high values of k, according to the Central Limit Theorem, the estimates $\hat m_k$ of the mean and $\hat\sigma_k^2$ of the variance are approximately Gaussian random variables:
$$(\hat m_k - m) \sim \mathcal N\!\left(0,\ \sigma^2/k\right) \quad\&\quad (\hat\sigma_k^2 - \sigma^2) \sim \mathcal N\!\left(0,\ (\lambda-1)\sigma^4/k\right)$$
We want to find a region around $\hat\sigma_k^2$ that will contain σ² with a predefined probability φ as a function of the number of iterations k:
$$\mathrm{Prob}\left[0 \le |\hat\sigma_k^2 - \sigma^2| \le n_\sigma\,\sigma_{\hat\sigma_k^2}\right] = \varphi$$
Since $\hat\sigma_k^2$ is approximately Gaussian, n_σ is given by solving:
$$\frac{1}{\sqrt{2\pi}}\int_{-n_\sigma}^{+n_\sigma}\exp\left(-\frac{\zeta^2}{2}\right)d\zeta = \varphi$$
Cumulative probability within n_σ standard deviations of the mean for a Gaussian random variable:
n_σ    φ
1.000  0.6827
1.645  0.9000
1.960  0.9500
2.576  0.9900
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 7)
$$\mathrm{Prob}\left[0 \le |\hat\sigma_k^2 - \sigma^2| \le n_\sigma\,\sigma_{\hat\sigma_k^2}\right] = \varphi,\qquad \sigma_{\hat\sigma_k^2} = \sqrt{\frac{\lambda-1}{k}}\,\sigma^2$$
$$\sigma^2 - n_\sigma\sqrt{\frac{\lambda-1}{k}}\,\sigma^2 \;\le\; \hat\sigma_k^2 \;\le\; \sigma^2 + n_\sigma\sqrt{\frac{\lambda-1}{k}}\,\sigma^2$$
$$\left(1 - n_\sigma\sqrt{\frac{\lambda-1}{k}}\right)\sigma^2 \;\le\; \hat\sigma_k^2 \;\le\; \left(1 + n_\sigma\sqrt{\frac{\lambda-1}{k}}\right)\sigma^2$$
Solving for σ²:
$$\overline\sigma^2 := \frac{\hat\sigma_k^2}{1 - n_\sigma\sqrt{(\lambda-1)/k}} \;\ge\; \sigma^2 \;\ge\; \frac{\hat\sigma_k^2}{1 + n_\sigma\sqrt{(\lambda-1)/k}} =: \underline\sigma^2$$
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 8)
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 9)
143 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 10)
$$\underline\sigma := \frac{\hat\sigma_{k_0}}{\sqrt{1 + n_\sigma\sqrt{(\lambda-1)/k}}} \quad\&\quad \overline\sigma := \frac{\hat\sigma_{k_0}}{\sqrt{1 - n_\sigma\sqrt{(\lambda-1)/k}}}$$
Monte-Carlo Procedure
1. Choose the Confidence Level φ and find the corresponding n_σ using the normal (Gaussian) distribution (n_σ / φ: 1.000 / 0.6827, 1.645 / 0.9000, 1.960 / 0.9500, 2.576 / 0.9900).
2. Run a few samples k_0 > 20 and estimate λ according to:
$$\hat m_{k_0} := \frac{1}{k_0}\sum_{i=1}^{k_0} x_i,\qquad \hat\lambda := \frac{\dfrac{1}{k_0}\sum_{i=1}^{k_0}\left(x_i-\hat m_{k_0}\right)^4}{\left[\dfrac{1}{k_0}\sum_{i=1}^{k_0}\left(x_i-\hat m_{k_0}\right)^2\right]^2}$$
3. Compute $\underline\sigma$ and $\overline\sigma$ as functions of k.
4. Find k for which $\mathrm{Prob}\left[0 \le |\hat\sigma_k^2-\sigma^2| \le n_\sigma\,\sigma_{\hat\sigma_k^2}\right] = \varphi$.
5. Run k − k_0 simulations.
144 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 11)
Monte-Carlo Procedure
Example: Assume a Gaussian distribution, for which λ = 3.
1. Choose the Confidence Level φ = 95 %, which gives the corresponding n_σ = 1.96.
2. The kurtosis is λ = 3, so $\sigma_{\hat\sigma_k^2} = \sqrt{2/k}\,\sigma^2$ and:
$$\mathrm{Prob}\left[0 \le |\hat\sigma_k^2-\sigma^2| \le 1.96\sqrt{\frac2k}\,\sigma^2\right] = 0.95$$
3. Assume also that we require $|\hat\sigma_k^2-\sigma^2| \le 0.1\,\sigma^2$ with probability φ = 95 %. Find k from:
$$1.96\sqrt{\frac2k} = 0.1 \quad\Rightarrow\quad k \approx 800$$
4. Run k > 800 simulations.
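The slide's k ≈ 800 figure can be reproduced by solving n_σ√((λ−1)/k) = (relative tolerance) for k; a small sketch (the function name is ours):

```python
from math import ceil

def required_samples(n_sigma, kurtosis, rel_tol):
    """Smallest k with n_sigma * sqrt((kurtosis - 1)/k) <= rel_tol,
    i.e. the variance estimate is within rel_tol * sigma^2 at the
    chosen confidence level."""
    return ceil((n_sigma / rel_tol) ** 2 * (kurtosis - 1))

# Gaussian samples (lambda = 3), 95% confidence (n_sigma = 1.96),
# 10% relative tolerance on sigma^2:
k = required_samples(1.96, 3.0, 0.1)  # exact solution is ~768, i.e. roughly 800
```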
145 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 12)
Kurtosis of a random variable x_i
Kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning bulging) is a measure of the "peakedness" of the probability distribution of a real-valued random variable:
$$\lambda := \frac{E\{(x_i-m)^4\}}{\left[E\{(x_i-m)^2\}\right]^2}$$
Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly-sized deviations.
1905: Karl Pearson (1857–1936) defined kurtosis as a measure of departure from normality in a paper published in Biometrika. λ = 3 for the normal distribution, and the terms "leptokurtic" (λ > 3), "mesokurtic" (λ = 3) and "platykurtic" (λ < 3) were introduced.
A leptokurtic distribution has a more acute "peak" around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and "fat tails" (that is, a higher probability than a normally distributed variable of extreme values). A platykurtic distribution has a smaller "peak" around the mean (that is, a lower probability than a normally distributed variable of values near the mean) and "thin tails" (that is, a lower probability than a normally distributed variable of extreme values).
146 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 13)
Distribution (Graphical and Functional Representation) | Kurtosis λ | Excess Kurtosis λ−3
Normal: $\dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$ | 3 | 0
Laplace: $\dfrac{1}{2b}\exp\left(-\dfrac{|x-\mu|}{b}\right)$ | 6 | 3
Hyperbolic secant: $\dfrac12\operatorname{sech}\left(\dfrac{\pi}{2}x\right)$ | 5 | 2
Uniform: $\dfrac{1}{b-a}$ for $a\le x\le b$, 0 for $x<a$ or $x>b$ | 1.8 | −1.2
Wigner semicircle: $\dfrac{2}{\pi R^2}\sqrt{R^2-x^2}$ for $|x|\le R$, 0 for $|x|>R$ | 2 | −1
147 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 14)
Skewness of a random variable x_i
Skewness:
$$\gamma := \frac{E\{(x_i-m)^3\}}{\left[E\{(x_i-m)^2\}\right]^{3/2}}$$
1. Negative skew: the left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed. There is more data in the left tail than would be expected in a normal distribution.
2. Positive skew: the right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed. There is more data in the right tail than would be expected in a normal distribution.
Karl Pearson (1857–1936) suggested two simpler calculations as a measure of skewness:
• (mean − mode) / standard deviation
• 3 (mean − median) / standard deviation
148 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics)
A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, x_i, i = 1,2,…,k, we wish to estimate the sample mean, $\hat x_k$, and the variance, p_k, by a Recursive Filter.
We found that using k measurements the estimated mean and variance are given in batch form by:
$$\hat x_k := \frac1k\sum_{i=1}^k x_i,\qquad p_k := \frac{1}{k-1}\sum_{i=1}^k \left(x_i-\hat x_k\right)^2$$
The k+1 measurement will give:
$$\hat x_{k+1} = \frac{1}{k+1}\sum_{i=1}^{k+1} x_i = \frac{1}{k+1}\left(k\,\hat x_k + x_{k+1}\right)$$
Therefore the Recursive Filter form for the k+1 measurement will be:
$$\hat x_{k+1} = \hat x_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$$
and for the variance:
$$p_{k+1} = \frac1k\sum_{i=1}^{k+1}\left(x_i-\hat x_{k+1}\right)^2$$
149 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics) (continue – 1)
A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, x_i, i = 1,2,…,k, we wish to estimate the sample mean, $\hat x_k$, and the variance, p_k, by a Recursive Filter.
We found that using k+1 measurements the estimated variance is given in batch form by:
$$p_{k+1} = \frac1k\sum_{i=1}^{k+1}\left(x_i-\hat x_{k+1}\right)^2,\qquad \hat x_{k+1} = \hat x_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$$
Writing $x_i-\hat x_{k+1} = (x_i-\hat x_k) - (\hat x_{k+1}-\hat x_k)$ and expanding the square (the cross term contains $\sum_{i=1}^{k}(x_i-\hat x_k) = 0$), one obtains the recursive form:
$$p_{k+1} = p_k + \frac{1}{k+1}\left[\left(x_{k+1}-\hat x_k\right)^2 - \frac{k+1}{k}\,p_k\right]$$
150 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics) (continue – 2)
A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, x_i, i = 1,2,…,k, we wish to estimate the sample mean, $\hat x_k$, and the variance, p_k, by a Recursive Filter:
$$\hat x_{k+1} = \hat x_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)$$
$$p_{k+1} = p_k + \frac{1}{k+1}\left[\left(x_{k+1}-\hat x_k\right)^2 - \frac{k+1}{k}\,p_k\right]$$
Using $\left(x_{k+1}-\hat x_k\right) = (k+1)\left(\hat x_{k+1}-\hat x_k\right)$:
$$p_{k+1} = p_k + (k+1)\left(\hat x_{k+1}-\hat x_k\right)^2 - \frac{1}{k}\,p_k$$
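The recursive filter above can be sketched in a few lines; `recursive_mean_var` is an illustrative name, and the update is exactly the p_{k+1} recursion derived on the slide:

```python
def recursive_mean_var(xs):
    """One-pass recursive estimates matching the batch definitions
    x_hat_k = (1/k) sum x_i and p_k = (1/(k-1)) sum (x_i - x_hat_k)^2."""
    x_hat, p = xs[0], 0.0                  # after the first sample: k = 1
    for k, x in enumerate(xs[1:], start=1):  # processing sample k+1
        d = x - x_hat                      # innovation x_{k+1} - x_hat_k
        x_hat += d / (k + 1)               # mean update
        p += (d * d - (k + 1) / k * p) / (k + 1)  # variance update
    return x_hat, p

m_hat, p_hat = recursive_mean_var([1.0, 2.0, 4.0, 8.0])
```

For this small list the batch values are mean 3.75 and variance 28.75/3, and the recursion reproduces them exactly.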
151 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Discrete Recursive Filter
Estimate the value of a constant x, given discrete measurements of x corrupted by an uncorrelated Gaussian noise sequence with zero mean and variance r_0. The scalar equations describing this situation are:
System: $x_{k+1} = x_k$ (general form $x_{k+1} = \Phi_k x_k + \Gamma_k w_k$ with $\Phi_k = I$, $\Gamma_k = 0$)
Measurement: $z_k = x_k + v_k$, $v_k \sim \mathcal N(0, r_0)$ (general form $z_k = H_k x_k + v_k$ with $H_k = I$)
The Discrete Kalman Filter is given by:
$$\hat x_{k+1}(-) = \hat x_k(+),\qquad p_{k+1}(-) := E\left\{\left[x_{k+1}-\hat x_{k+1}(-)\right]^2\right\} = \Phi\,p_k(+)\,\Phi^T + \Gamma\,Q\,\Gamma^T = p_k(+)$$
$$\hat x_{k+1}(+) = \hat x_{k+1}(-) + \underbrace{p_{k+1}(-)\left[p_{k+1}(-) + r_0\right]^{-1}}_{K_{k+1}}\left[z_{k+1} - \hat x_{k+1}(-)\right]$$
$$p_{k+1}(+) := E\left\{\left[x_{k+1}-\hat x_{k+1}(+)\right]^2\right\} = p_{k+1}(-) - \frac{p_{k+1}(-)^2}{p_{k+1}(-)+r_0} = \frac{p_{k+1}(-)\,r_0}{p_{k+1}(-)+r_0} = \frac{p_k(+)\,r_0}{p_k(+)+r_0}$$
152 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Discrete Recursive Filter (continue – 1)
Estimate the value of a constant x, given discrete measurements of x corrupted by an uncorrelated Gaussian noise sequence with zero mean and variance r_0. We found that the Discrete Kalman Filter is given by:
$$\hat x_{k+1}(+) = \hat x_k(+) + K_{k+1}\left[z_{k+1}-\hat x_k(+)\right],\qquad K_{k+1} = \frac{p_k(+)}{p_k(+)+r_0},\qquad p_{k+1}(+) = \frac{p_k(+)\,r_0}{p_k(+)+r_0}$$
k = 0: $\;p_1(+) = \dfrac{p_0\,r_0}{p_0+r_0} = \dfrac{p_0}{1+p_0/r_0}$; k = 1: $\;p_2(+) = \dfrac{p_0}{1+2\,p_0/r_0}$; and by induction:
$$p_k(+) = \frac{p_0}{1+k\,p_0/r_0}$$
The gain becomes:
$$K_{k+1} = \frac{p_k(+)}{p_k(+)+r_0} = \frac{p_0/r_0}{1+(k+1)\,p_0/r_0}$$
so that:
$$\hat x_{k+1}(+) = \hat x_k(+) + \frac{p_0/r_0}{1+(k+1)\,p_0/r_0}\left[z_{k+1}-\hat x_k(+)\right]$$
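As an illustrative sketch (not from the slides), the scalar filter above reduces to the running average of the measurements when the prior covariance p_0 is much larger than r_0, since the gain then tends to 1/(k+1):

```python
def kf_constant(zs, p0, r0):
    """Scalar discrete Kalman filter estimating a constant from
    measurements z_k = x + v_k, v_k ~ N(0, r0)."""
    x_hat, p = 0.0, p0
    for z in zs:
        k_gain = p / (p + r0)           # K_{k+1} = p_k(+)/(p_k(+)+r0)
        x_hat += k_gain * (z - x_hat)   # measurement update
        p = p * r0 / (p + r0)           # covariance update, p_k = p0/(1+k p0/r0)
    return x_hat, p

zs = [4.1, 3.9, 4.2, 3.8, 4.0]
# With a diffuse prior (p0 >> r0) the filter reduces to the running average:
x_hat, p = kf_constant(zs, p0=1e12, r0=1.0)
```

Here the sample mean of the five measurements is 4.0, and after k = 5 steps the covariance approaches r_0/k = 0.2, as the closed-form p_k(+) predicts.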
153 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Continuous Recursive Filter
Estimate the value of a constant x, given continuous measurements of x corrupted by an uncorrelated Gaussian noise with zero mean and variance r. The scalar equations describing this situation are:
System: $\dot x = 0$; Measurement: $z = x + v$, $v \sim \mathcal N(0,r)$
The Continuous Kalman Filter is given by:
$$\dot{\hat x}(t) = \underbrace{p(t)\,r^{-1}}_{K}\left[z(t)-\hat x(t)\right],\qquad \hat x(0) = 0$$
with the covariance $p(t) := E\left\{\left[x(t)-\hat x(t)\right]^2\right\}$ satisfying the Riccati equation (here A = 0, Q = 0, H = I):
$$\dot p(t) = A\,p + p\,A^T + G\,Q\,G^T - p\,H^T r^{-1} H\,p = -p^2(t)\,r^{-1},\qquad p(0) = p_0$$
or:
$$\int_{p_0}^{p}\frac{dp}{p^2} = -\frac1r\int_0^t dt \quad\Rightarrow\quad p(t) = \frac{p_0}{1+\dfrac{p_0}{r}t}$$
$$K(t) = p(t)\,r^{-1} = \frac{p_0/r}{1+\dfrac{p_0}{r}t},\qquad \dot{\hat x} = \frac{p_0/r}{1+\dfrac{p_0}{r}t}\left[z-\hat x(t)\right]$$
154 SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
• First attempts to generate "random numbers": draw balls out of a stirred urn; roll dice.
• 1927: L.H.C. Tippett published a table of 40,000 digits taken "at random" from census reports.
• 1939: M.G. Kendall and B. Babington-Smith created a mechanical machine to generate random numbers. They published a table of 100,000 digits.
• 1946: J. von Neumann proposed the "middle square method".
• 1948: D.H. Lehmer introduced the "linear congruential method".
• 1955: RAND Corporation published a table of 1,000,000 random digits obtained from electronic noise.
• 1965: M.D. MacLaren and G. Marsaglia proposed to combine two congruential generators.
• 1989: R.S. Wikramaratna proposed the additive congruential method.
Of the routine RANDU (IBM Corp.) it was said: "We guarantee that each number is random individually, but we don't guarantee that more than one of them is random."
155 SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
On a computer the "random numbers" are not random at all: they are strictly deterministic and reproducible, but they look like a stream of random numbers. For this reason the computer programs are called "Pseudo-Random Number Generators".
Essential Properties of a Pseudo-Random Number Generator
Repeatability – the same sequence should be produced with the same initial values (or seeds).
Randomness – should produce independent uniformly distributed random variables that pass all statistical tests for randomness.
Long Period – a pseudo-random number sequence uses finite-precision arithmetic, so the sequence must repeat itself with a finite period. This should be much longer than the amount of random numbers needed for the simulation.
Insensitivity to seeds – period and randomness properties should not depend on the initial seeds.
156 SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
Essential Properties of a Pseudo-Random Number Generator (continue - 1)
Portability – should give the same results on different computers.
Efficiency – should be fast (small number of floating-point operations) and not use much memory.
Disjoint subsequences – different seeds should produce long independent (disjoint) subsequences so that there are no correlations between simulations with different initial seeds.
Homogeneity – sequences of all bits should be random.
157 SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
A Random Number represents the value of a random variable uniformly distributed on (0,1). Pseudo-Random Numbers constitute a sequence of values which, although deterministically generated, have all the appearances of being independent and uniformly distributed on (0,1).
One approach – the multiplicative congruential method:
1. Define x_0 = integer initial condition, or seed.
2. Using integers a and m, recursively compute:
$$x_n = a\,x_{n-1}\ \mathrm{modulo}\ m\qquad\left(a\,x_{n-1} = k\,m + x_n,\ \ k\in\text{Integers},\ \ x_n < m\right)$$
Therefore x_n takes the values 0,1,…,m−1, and the quantity u_n = x_n/m, called a pseudo-random number, is an approximation to the value of a uniform (0,1) random variable.
In general the integers a and m should be chosen to satisfy three criteria:
1. For any initial seed, the resultant sequence has the "appearance" of being a sequence of independent uniform (0,1) random variables.
2. For any initial seed, the number of variables that can be generated before repetition begins is large.
3. The values can be computed efficiently on a digital computer.
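A minimal sketch of the multiplicative congruential method, using the m = 2³¹−1, a = 16,807 parameters listed on the next slide (the generator name is ours):

```python
from itertools import islice

def lcg(seed, a=16807, m=2**31 - 1):
    """Multiplicative congruential generator x_n = a*x_{n-1} mod m,
    yielding the pseudo-random numbers u_n = x_n / m in (0,1)."""
    x = seed
    while True:
        x = (a * x) % m
        yield x / m

us = list(islice(lcg(seed=1), 10_000))
```

Since m is prime and the seed is nonzero, x_n never hits 0, so every u_n lies strictly inside (0,1), and the sample mean should be close to 1/2.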
158 SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators (continue – 1)
A guideline is to choose m to be a large prime number, close to the computer word size. Examples (multiplicative congruential method):
• 32-bit word computer: m = 2³¹ − 1, a = 16,807 (some IBM systems)
• 36-bit word computer: m = 2³⁵ − 1, a = 5⁵ = 3,125
Another generator of pseudo-random numbers uses recursions of the type (mixed congruential method):
$$x_n = \left(a\,x_{n-1} + c\right)\ \mathrm{modulo}\ m\qquad\left(a\,x_{n-1}+c = k\,m + x_n,\ \ k\in\text{Integers},\ \ x_n < m\right)$$
• 32-bit word computer: m = 2³², a = 69,069 (VAX)
• 32-bit word computer: m = 2³², a = 1,664,525 (transputers)
• 48-bit word computer: m = 2⁴⁸, a = 5DEECE66D₁₆, c = B₁₆ (UNIX, RAND48 routine)
• 48-bit word computer: m = 2⁴⁷, a = 5¹⁵, c = 0 (CDC vector machine)
• 48-bit word computer: m = 2⁴⁸, a = 2875A2E7B175₁₆, c = 0 (Cray vector machine)
• 64-bit word computer: m = 2⁵⁹, a = 13¹³, c = 0 (Numerical Algorithms Group)
Return to Table of Content
159 SOLO Review of Probability
Generating Discrete Random Variables
Histograms
A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what proportion of cases fall into each of several categories: it is a form of data binning. The categories are usually specified as non-overlapping intervals of some variable. The categories (bars) must be adjacent. The intervals (or bands, or bins) are generally of the same size.
Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a normalized histogram always equals 1. If the lengths of the intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.
Mathematical Definition
In a more general mathematical sense, a histogram is a mapping m_i that counts the number of observations that fall into various disjoint categories (known as bins), whereas the graph of a histogram is merely one way to represent a histogram. Thus, if we let n be the total number of observations and k be the total number of bins, the histogram m_i meets the following condition:
$$n = \sum_{i=1}^k m_i$$
A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram M_i of a histogram m_i is defined as:
$$M_i = \sum_{j=1}^i m_j$$
(Figure: an ordinary and a cumulative histogram of the same data – a random sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1.)
Return to Table of Content
160 SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method
Suppose we want to generate a discrete random variable X having probability density function:
$$p(x) = \sum_j p_j\,\delta(x-x_j),\qquad j = 0,1,\ldots,\qquad \sum_j p_j = 1$$
To accomplish this, generate a random number U that is uniformly distributed over (0,1) and set:
$$X = \begin{cases} x_0 & U < p_0\\ x_1 & p_0 \le U < p_0+p_1\\ \;\vdots & \\ x_j & \sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i\\ \;\vdots & \end{cases}$$
Since U is uniformly distributed, for any a and b such that 0 < a < b < 1 we have P(a ≤ U < b) = b − a, and therefore:
$$P(X = x_j) = P\left(\sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i\right) = p_j$$
and so X has the desired distribution.
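A sketch of the discrete inverse transform method; the value and probability lists are illustrative:

```python
import random

def inverse_transform_discrete(values, probs, u=None):
    """Return the first values[j] whose cumulative probability
    p_0 + ... + p_j exceeds U ~ uniform(0,1)."""
    if u is None:
        u = random.random()
    cum = 0.0
    for x, p in zip(values, probs):
        cum += p
        if u < cum:
            return x
    return values[-1]  # guard against floating-point round-off

random.seed(1)
draws = [inverse_transform_discrete([1, 2, 3], [0.2, 0.5, 0.3])
         for _ in range(100_000)]
```

The empirical frequencies of the drawn values should match the target probabilities 0.2, 0.5, 0.3.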
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 1)
Suppose we want to generate a discrete random variable X having probability density function:
$$p(x) = \sum_j p_j\,\delta(x-x_j),\qquad \sum_j p_j = 1$$
Draw X, N times, from p(x). (Figure: histogram of the results.)
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 2)
Generating a Poisson Random Variable:
$$p_i = P(X=i) = e^{-\lambda}\frac{\lambda^i}{i!},\qquad i = 0,1,\ldots,\qquad \sum_i p_i = 1$$
The successive probabilities satisfy the recursion:
$$\frac{p_{i+1}}{p_i} = \frac{e^{-\lambda}\lambda^{i+1}/(i+1)!}{e^{-\lambda}\lambda^{i}/i!} = \frac{\lambda}{i+1}$$
Draw X, N times, from the Poisson distribution. (Figure: histogram of the results.)
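A sketch of Poisson generation via the inverse transform, using the slide's recursion p_{i+1} = p_i λ/(i+1) to build the cumulative probabilities on the fly:

```python
import math
import random

def poisson_inverse_transform(lam, u=None):
    """Inverse-transform sampling of a Poisson(lam) variable."""
    if u is None:
        u = random.random()
    i = 0
    p = math.exp(-lam)      # p_0 = e^{-lam}
    cum = p
    while u >= cum:
        p *= lam / (i + 1)  # recursion p_{i+1} = p_i * lam/(i+1)
        cum += p
        i += 1
    return i

random.seed(2)
draws = [poisson_inverse_transform(3.0) for _ in range(50_000)]
mean = sum(draws) / len(draws)
```

The sample mean should be close to λ = 3, since the Poisson mean equals its parameter.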
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 3)
Generating a Binomial Random Variable:
$$p_i = P(X=i) = \frac{n!}{i!\,(n-i)!}\,p^i(1-p)^{n-i},\qquad i = 0,1,\ldots,n,\qquad \sum_i p_i = 1$$
The successive probabilities satisfy the recursion:
$$\frac{p_{i+1}}{p_i} = \frac{\dfrac{n!}{(i+1)!\,(n-i-1)!}\,p^{i+1}(1-p)^{n-i-1}}{\dfrac{n!}{i!\,(n-i)!}\,p^{i}(1-p)^{n-i}} = \frac{n-i}{i+1}\cdot\frac{p}{1-p}$$
(Figure: histogram of the results, P(k,n) for k = 0,1,…,14.)
Return to Table of Content
164 SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique
Suppose we have an efficient method for simulating a random variable having a probability density function { q_j, j ≥ 0 }. We want to use this to obtain a random variable that has the probability density function { p_j, j ≥ 0 }. Let c be a constant such that:
$$\frac{p_j}{q_j} \le c\qquad \forall j\ \text{s.t.}\ p_j \ne 0$$
If such a c exists, it must satisfy:
$$\underbrace{\sum_j p_j}_{1} \le c\,\underbrace{\sum_j q_j}_{1} \;\Rightarrow\; 1 \le c$$
Rejection Method
Step 1: Simulate the value of Y, having probability density function q_j.
Step 2: Generate a random number U (that is uniformly distributed over (0,1)).
Step 3: If U < p_Y/(c q_Y), set X = Y and stop. Otherwise return to Step 1.
165 SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique (continue – 1)
Theorem: The random variable X obtained by the rejection method has probability density function P{X=i} = p_i.
Proof:
$$P\{X=i\} = P\{Y=i\,|\,\text{Acceptance}\} \overset{\text{Bayes}}{=} \frac{P\{Y=i,\ \text{Acceptance}\}}{P\{\text{Acceptance}\}} = \frac{P\left\{Y=i,\ U\le\dfrac{p_i}{c\,q_i}\right\}}{P\{\text{Acceptance}\}}$$
By independence, and since U is uniformly distributed on (0,1):
$$P\left\{Y=i,\ U\le\frac{p_i}{c\,q_i}\right\} = \underbrace{P\{Y=i\}}_{q_i}\cdot P\left\{U\le\frac{p_i}{c\,q_i}\right\} = q_i\cdot\frac{p_i}{c\,q_i} = \frac{p_i}{c}$$
Summing over all i yields:
$$\underbrace{\sum_i P\{X=i\}}_{1} = \frac{\overbrace{\sum_i p_i}^{1}}{c\,P\{\text{Acceptance}\}}$$
Hence $c\,P\{\text{Acceptance}\} = 1$, so $P\{\text{Acceptance}\} = 1/c \le 1$ and $P\{X=i\} = p_i$. q.e.d.
166 SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique (continue – 2)
Example: Generate a truncated Gaussian using the Accept-Reject method. Consider the case with:
$$p(x) \approx \begin{cases} e^{-x^2/2}/\sqrt{2\pi} & x\in[-4,4]\\ 0 & \text{otherwise}\end{cases}$$
Consider the uniform proposal function:
$$q(x) = \begin{cases} 1/8 & x\in[-4,4]\\ 0 & \text{otherwise}\end{cases}$$
In the figure we can see the results of the Accept-Reject method using N = 10,000 samples.
Return to Table of Content
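The truncated-Gaussian example can be sketched as follows; with q = 1/8 on [−4,4] and c = 8/√(2π), the acceptance test U < p(Y)/(c q(Y)) simplifies to U < exp(−Y²/2):

```python
import math
import random

def truncated_gauss_ar():
    """Accept-Reject for p(x) ∝ exp(-x^2/2) on [-4,4] with uniform
    proposal q(x) = 1/8 on [-4,4]; here c = 8/sqrt(2*pi)."""
    while True:
        y = random.uniform(-4.0, 4.0)   # Step 1: Y ~ q
        u = random.random()             # Step 2: U ~ uniform(0,1)
        if u < math.exp(-y * y / 2.0):  # Step 3: accept if U < p/(c q)
            return y

random.seed(3)
draws = [truncated_gauss_ar() for _ in range(20_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

Since the truncation at ±4 removes negligible mass, the sample mean and variance should be close to 0 and 1, and every draw lies in [−4, 4].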
167 SOLO Review of Probability
Generating Continuous Random Variables
The Inverse Transform Algorithm
Let U be a uniform (0,1) random variable. For any continuous distribution function F, the random variable X defined by
$$X = F^{-1}(U)$$
has distribution F. [F⁻¹(u) is defined to be that value of x such that F(x) = u.]
Proof: Let $P_X(x)$ denote the Probability Distribution Function of $X = F^{-1}(U)$:
$$P_X(x) = P\{X \le x\} = P\{F^{-1}(U) \le x\}$$
Since F is a distribution function, F(x) is a monotonically increasing function of x, and so the inequality "a ≤ b" is equivalent to the inequality "F(a) ≤ F(b)". Therefore:
$$P_X(x) = P\left\{\underbrace{F\left(F^{-1}(U)\right)}_{U} \le F(x)\right\} = P\{U \le F(x)\} \overset{U\ \text{uniform}(0,1)}{=} F(x),\qquad 0 \le F(x) \le 1$$
Return to Table of Content
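A sketch of the continuous inverse transform for an exponential distribution, where F(x) = 1 − e^{−λx} inverts in closed form (the rate value 2.0 is illustrative):

```python
import math
import random

def exponential_inverse_transform(rate, u=None):
    """F(x) = 1 - exp(-rate*x)  =>  X = F^{-1}(U) = -ln(1-U)/rate."""
    if u is None:
        u = random.random()
    return -math.log(1.0 - u) / rate

random.seed(4)
draws = [exponential_inverse_transform(2.0) for _ in range(100_000)]
mean = sum(draws) / len(draws)   # should approach 1/rate = 0.5
```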
168 SOLO Review of Probability
Generating Continuous Random Variables
The Acceptance-Rejection Technique
Suppose we have an efficient method for simulating a random variable having a probability density function g(x). We want to use this to obtain a random variable that has the probability density function f(x). Let c be a constant such that:
$$\frac{f(y)}{g(y)} \le c\qquad \forall y$$
If such a c exists, it must satisfy:
$$\underbrace{\int f(y)\,dy}_{1} \le c\,\underbrace{\int g(y)\,dy}_{1} \;\Rightarrow\; 1 \le c$$
Rejection Method
Step 1: Simulate the value of Y, having probability density function g(Y).
Step 2: Generate a random number U (that is uniformly distributed over (0,1)).
Step 3: If U < f(Y)/(c g(Y)), set X = Y and stop. Otherwise return to Step 1.
169 SOLO Review of Probability
Generating Continuous Random Variables
The Acceptance-Rejection Technique (continue – 1)
Theorem: The random variable X obtained by the rejection method has probability density function f(y).
Proof:
$$P\{X=y\} = P\{Y=y\,|\,\text{Acceptance}\} \overset{\text{Bayes}}{=} \frac{P\{Y=y,\ \text{Acceptance}\}}{P\{\text{Acceptance}\}} = \frac{P\left\{Y=y,\ U\le\dfrac{f(y)}{c\,g(y)}\right\}}{P\{\text{Acceptance}\}}$$
By independence, and since U is uniformly distributed on (0,1):
$$P\left\{Y=y,\ U\le\frac{f(y)}{c\,g(y)}\right\} = \underbrace{P\{Y=y\}}_{g(y)}\cdot P\left\{U\le\frac{f(y)}{c\,g(y)}\right\} = g(y)\cdot\frac{f(y)}{c\,g(y)} = \frac{f(y)}{c}$$
Integrating over all y yields:
$$\underbrace{\int P\{Y=y\}\,dy}_{1} = \frac{\overbrace{\int f(y)\,dy}^{1}}{c\,P\{\text{Acceptance}\}}$$
Hence $c\,P\{\text{Acceptance}\} = 1$, so $P\{\text{Acceptance}\} = 1/c \le 1$ and $P\{X=y\} = f(y)$. q.e.d.
Return to Table of Content
170 SOLO Review of Probability
Generating Discrete Random Variables
The Bootstrap
• Popularized by Bradley Efron (1938–, Stanford U.) in 1979.
• The Bootstrap is a name generically applied to statistical resampling schemes that allow uncertainty in the data to be assessed from the data themselves, in other words "pulling yourself up by your bootstraps".
The advantage of bootstrapping over analytical methods is its great simplicity: it is straightforward to apply the bootstrap to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients.
The disadvantage of bootstrapping is that, while (under some conditions) it is asymptotically consistent, it does not provide general finite-sample guarantees, and it has a tendency to be overly optimistic. The apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bootstrap analysis (e.g. independence of samples), where these would be more formally stated in other approaches.
171 SOLO Review of Probability
Generating Discrete Random Variables
The Bootstrap (continue - 1)
• Given n observations z_i, i = 1,…,n, and a calculated statistic S, what is the uncertainty in S?
• The Procedure:
- Draw m values z'_i, i = 1,…,m, from the original data with replacement.
- Calculate the statistic S' from the "bootstrapped" sample.
- Repeat L times to build a distribution of uncertainty in S.
Return to Table of Content
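The three steps above can be sketched with m = n and an illustrative data set; `bootstrap` is our name for the resampling loop:

```python
import random

def bootstrap(data, statistic, n_resamples=1000):
    """Resample with replacement, recompute the statistic each time,
    and return the resulting distribution of statistic values."""
    stats = []
    for _ in range(n_resamples):
        resample = [random.choice(data) for _ in data]  # m = n draws
        stats.append(statistic(resample))
    return stats

random.seed(5)
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
means = bootstrap(data, lambda xs: sum(xs) / len(xs))
center = sum(means) / len(means)
spread = (sum((s - center) ** 2 for s in means) / len(means)) ** 0.5
```

For this data set (mean 5, standard deviation 2), the spread of the bootstrap means should be near the analytical standard error 2/√8 ≈ 0.71.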
172 SOLO Review of Probability
Generating Discrete Random Variables
Importance Sampling (IS)
Let Y = (Y_1,…,Y_m) be a vector of random variables having a joint probability density function p(y_1,…,y_m), and suppose that we are interested in estimating:
$$\theta = E_p\left[g\left(Y_1,\ldots,Y_m\right)\right] = \int\!\cdots\!\int g\left(y_1,\ldots,y_m\right)\,p\left(y_1,\ldots,y_m\right)\,dy_1\cdots dy_m$$
Suppose that a direct generation of the random vector Y so as to compute g(Y) is inefficient or not possible because (a) it is difficult to generate the random vector Y, or (b) the variance of g(Y) is large, or (c) both of the above.
Suppose that W = (W_1,…,W_m) is another random vector, which takes values in the same domain as Y, and has a joint density function q(w_1,…,w_m) from which samples can be easily generated. The estimate θ can be expressed as:
$$\theta = E_p\left[g(Y)\right] = \int\!\cdots\!\int g\left(w_1,\ldots,w_m\right)\frac{p\left(w_1,\ldots,w_m\right)}{q\left(w_1,\ldots,w_m\right)}\,q\left(w_1,\ldots,w_m\right)\,dw_1\cdots dw_m = E_q\left[g(W)\frac{p(W)}{q(W)}\right]$$
Therefore, we can estimate θ by generating values of the random vector W, and then using as the estimator the resulting average of the values g(W) p(W)/q(W).
173 SOLO Review of Probability
Generating Discrete Random Variables
Importance Sampling (IS) (continue – 1)
Example: Importance Sampling for a Bi-Modal Distribution. Consider the following distribution:
$$p(x) = \frac12\,\mathcal N(x;0,1) + \frac12\,\mathcal N(x;3,1/2)$$
We want to calculate the mean value (g(x) = x) using Importance Sampling. Use:
$$g(x) = x,\qquad q(x) = \mathcal U(-5,5)$$
For i = 1,…,N, sample (draw) $x_i \sim q(x)$ and form the Importance Weight:
$$w_i := \frac{p(x_i)}{q(x_i)}$$
We obtain:
$$E_p[x] = \int x\,\frac{p(x)}{q(x)}\,q(x)\,dx = E_q\left[x\,\frac{p(x)}{q(x)}\right] \approx \frac1N\sum_{i=1}^N x_i\,\frac{p(x_i)}{q(x_i)}$$
For N = 10,000 samples we obtain E_p[x] = 1.4915 instead of 1.5.
In the figure the histogram using the Importance Weights w_i is presented together with the true PDF.
Return to Table of Content
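The bi-modal example can be sketched as below (we take the slide's N(x;3,1/2) to mean variance 1/2, which does not affect the mean; the true value is ½·0 + ½·3 = 1.5):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p(x):
    # bi-modal target: 0.5*N(0,1) + 0.5*N(3, variance 1/2)
    return 0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, 3.0, math.sqrt(0.5))

random.seed(6)
N = 100_000
total = 0.0
for _ in range(N):
    x = random.uniform(-5.0, 5.0)   # x_i ~ q = U(-5,5)
    w = p(x) / 0.1                  # importance weight p(x)/q(x), q(x) = 1/10
    total += x * w
estimate = total / N                # estimate of E_p[x], true value 1.5
```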
174 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis Algorithm
• This method of generation of an arbitrary probability distribution was invented by Metropolis, Rosenbluth and Teller (supposedly at a Los Alamos dinner party) and published in June 1953:
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., "Equations of State Calculations by Fast Computing Machines", Journal of Chemical Physics, 1953, Vol. 21(6), pp. 1087–1092.
This is also called the Markov Chain Monte Carlo (MCMC) method.
Procedure
• Set up a Markov Chain that has as a unique stationary solution the required π(x) Probability Distribution Function (PDF).
• Run the chain until stationary.
• All subsequent samples are from the stationary distribution π(x), as required.
(Figure: a three-state Markov chain X₁, X₂, X₃ with its transition probabilities.)
Nicholas Constantine Metropolis (1915–1999)
175 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis Algorithm (continue – 1)
Proof of the Procedure
Pr(X,t) – the probability of being in the state X at time t.
Pr(X→Y) = Pr(Y|X) – the probability, per unit time, of transition from state X to state Y, with $\sum_Y \Pr(Y|X) = 1$ (sum of probabilities of all states reached from X).
$$\Pr(X,t+1) = \Pr(X,t) + \sum_Y\left[\Pr(X|Y)\,\Pr(Y,t) - \Pr(Y|X)\,\Pr(X,t)\right]$$
At large t, once the arbitrary initial state is "forgotten", we want Pr(X,t) → Pr(X). Clearly a sufficient (but not necessary) condition for an equilibrium (time-independent) probability distribution is the so-called Detailed Balance Condition:
$$\Pr(Y|X)\,\Pr(X,t) = \Pr(X|Y)\,\Pr(Y,t)$$
This method can be used for any probability distribution, but Metropolis used:
$$\Pr(B|A) = \begin{cases} e^{-\Delta E/kT} & \Delta E > 0\\ 1 & \Delta E \le 0\end{cases},\qquad \Delta E := E(B)-E(A)$$
Note: E(A) is equivalent to the energy level of state A.
176 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis Algorithm (continue – 2)
Detailed Balance Condition:
$$\Pr(Y|X)\,\Pr(X,t) = \Pr(X|Y)\,\Pr(Y,t)$$
Metropolis defined a symmetric Q(Y|X) = Q(X|Y) as a candidate generating density for Pr(Y|X), such that $\sum_Y Q(Y|X) = 1$.
In general Q(Y|X) will not satisfy the Detailed Balance condition; for example:
$$Q(Y|X)\,\Pr(X,t) > Q(X|Y)\,\Pr(Y,t)$$
The process moves from X to Y too often and from Y to X too rarely. A convenient way to correct this is to reduce the number of moves from X to Y by introducing a probability 0 < A(Y|X) ≤ 1. This is called the Acceptance Probability:
$$\Pr(Y|X) = Q(Y|X)\cdot A(Y|X),\qquad Y \ne X$$
177 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis Algorithm (continue – 3)
Let define the Acceptance Probability of the proposed move X → Y as:
$$A(Y|X) = \begin{cases} \Pr(Y)/\Pr(X) & \Pr(Y) \le \Pr(X)\\ 1 & \Pr(Y) > \Pr(X)\end{cases}
\qquad
A(X|Y) = \begin{cases} 1 & \Pr(Y) \le \Pr(X)\\ \Pr(X)/\Pr(Y) & \Pr(Y) > \Pr(X)\end{cases}$$
with:
$$\Pr(Y|X) = Q(Y|X)\cdot A(Y|X),\qquad \Pr(X|Y) = Q(X|Y)\cdot A(X|Y),\qquad Y \ne X$$
If Pr(X) ≤ Pr(Y), then A(Y|X) = 1 and A(X|Y) = Pr(X)/Pr(Y).
If Pr(X) > Pr(Y), then A(Y|X) = Pr(Y)/Pr(X) and A(X|Y) = 1.
In both cases:
$$\frac{\Pr(Y|X)}{\Pr(X|Y)} = \frac{Q(Y|X)\cdot A(Y|X)}{Q(X|Y)\cdot A(X|Y)} \overset{Q\ \text{symmetric}}{=} \frac{A(Y|X)}{A(X|Y)} = \frac{\Pr(Y)}{\Pr(X)}$$
which is just the Detailed Balance condition.
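A sketch of the Metropolis algorithm with a symmetric uniform proposal, targeting π(x) ∝ e^{−x²/2} (i.e. E = x²/2 with kT = 1); the function and parameter names are ours:

```python
import math
import random

def metropolis(log_target, x0, step, n_samples, burn_in=1000):
    """Metropolis sampling: propose with a symmetric Q, accept the
    move with probability min(1, pi(x')/pi(x))."""
    x = x0
    samples = []
    for i in range(burn_in + n_samples):
        x_prop = x + random.uniform(-step, step)       # symmetric proposal
        log_ratio = log_target(x_prop) - log_target(x)
        if random.random() < math.exp(min(0.0, log_ratio)):
            x = x_prop                                  # accept the move
        if i >= burn_in:
            samples.append(x)                           # post burn-in draws
    return samples

random.seed(7)
samples = metropolis(lambda x: -x * x / 2.0, x0=0.0, step=1.0, n_samples=50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

After burn-in the chain should reproduce the standard Gaussian's mean 0 and variance 1 to within Monte Carlo error.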
178 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis Algorithm (continue – 4)
Detailed Balance Condition:
$$\Pr(B|A)\,\Pr(A,t) = \Pr(A|B)\,\Pr(B,t)$$
This method can be used for any probability distribution, but Metropolis used:
$$\Pr(B|A) = \begin{cases} e^{-\Delta E/kT} & \Delta E > 0\\ 1 & \Delta E \le 0\end{cases},\qquad \Delta E := E(B)-E(A)$$
Therefore:
$$\frac{\Pr(B|A)}{\Pr(A|B)} = \frac{\begin{cases} e^{-\left[E(B)-E(A)\right]/kT} & \Delta E > 0\\ 1 & \Delta E \le 0\end{cases}}{\begin{cases} 1 & \Delta E > 0\\ e^{-\left[E(A)-E(B)\right]/kT} & \Delta E \le 0\end{cases}} = e^{-\Delta E/kT} = \frac{e^{-E(B)/kT}}{e^{-E(A)/kT}} = \frac{\Pr(B,t)}{\Pr(A,t)}$$
so Detailed Balance is satisfied by the Boltzmann distribution $\Pr(A) \propto e^{-E(A)/kT}$.
179 SOLO Review of Probability
Generating Discrete Random Variables
Metropolis-Hastings (M-H) Algorithm
W. Keith Hastings improved the Metropolis algorithm by allowing a non-symmetric Candidate Generating Density:
Hastings, W., "Monte Carlo Sampling Methods Using Markov Chains and Their Applications", Biometrika, 1970, No. 57, pp. 97–109.
• Set up a Markov Chain T(x'|x) that has as a unique stationary solution the required π(x') Probability Distribution Function (PDF):
$$\pi(x') = \int T(x'|x)\,\pi(x)\,dx$$
Here we give the development for Continuous Random Variables (for Discrete Random Variables the development is similar to that used for the Metropolis Algorithm).
180 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
• The problem is to find the conditional transition probability T (x'|x) of the Markov Chain whose states converge, after a transient time, to π (x'):
π (x') = ∫ T (x'|x) π (x) dx
To satisfy this requirement, a sufficient (but not necessary) condition is the "Detailed Balance", "Reversibility" or "Time Reversibility" condition:
T (x'|x) π (x) = T (x|x') π (x')
Proof: ∫ T (x'|x) π (x) dx = ∫ T (x|x') π (x') dx = π (x') ∫ T (x|x') dx = π (x')  q.e.d.
Let us define Q (x'|x) as a candidate generating density for T (x'|x), such that: ∫ Q (x'|x) dx' = 1
In general Q (x'|x) will not satisfy the Detailed Balance condition; for example:
Q (x'|x) π (x) > Q (x|x') π (x')
Loosely speaking, the process moves from x to x' too often and from x' to x too rarely.
181 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
In general Q (x'|x) will not satisfy the Detailed Balance condition, for example:
Q (x'|x) π (x) > Q (x|x') π (x')
Loosely speaking, the process moves from x to x' too often and from x' to x too rarely. A convenient way to correct this is to reduce the number of moves from x to x' by introducing a probability 0 < α (x'|x) ≤ 1, called the Acceptance Probability:
T (x'|x) = Q (x'|x) α (x'|x), x' ≠ x
If the move is not made, the process again returns x as a value from the target distribution.
The Detailed Balance condition becomes: Q (x'|x) α (x'|x) π (x) = Q (x|x') α (x|x') π (x')
From which (setting α (x|x') = 1 on the side that moves too rarely):
α (x'|x) = [Q (x|x') π (x')] / [Q (x'|x) π (x)] ≤ 1
In general: α (x'|x) = min { 1, [Q (x|x') π (x')] / [Q (x'|x) π (x)] }
In the same way (by interchanging x' with x): α (x|x') = min { 1, [Q (x'|x) π (x)] / [Q (x|x') π (x')] }
182 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
Let us prove that the Detailed Balance condition is satisfied:
T (x'|x) π (x) = Q (x'|x) π (x) · min { 1, [Q (x|x') π (x')] / [Q (x'|x) π (x)] }
T (x|x') π (x') = Q (x|x') π (x') · min { 1, [Q (x'|x) π (x)] / [Q (x|x') π (x')] }
Suppose Q (x|x') π (x') < Q (x'|x) π (x). Then:
T (x'|x) π (x) = Q (x'|x) π (x) · [Q (x|x') π (x')] / [Q (x'|x) π (x)] = Q (x|x') π (x')
T (x|x') π (x') = Q (x|x') π (x') · 1 = Q (x|x') π (x')
Therefore T (x'|x) π (x) = T (x|x') π (x')  q.e.d.
183 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
The Transition Kernel of the Metropolis-Hastings Algorithm is:
T (x'|x) = α (x'|x) Q (x'|x) + [1 − a (x)] δ_x (x'),  a (x) := ∫ α (x''|x) Q (x''|x) dx''
where δ_x is the Dirac mass on {x} and a (x) is the overall probability of accepting a move away from x.
184 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
Therefore the M-H Algorithm will:
1. Use the previously generated x(t).
2. Draw a new value x_new from the candidate distribution Q (x_new | x(t)): x_new ~ Q (x_new | x(t))
3. Compute the acceptance probability α (x_new | x(t)):
α (x_new | x(t)) = min { 1, [π (x_new) Q (x(t) | x_new)] / [π (x(t)) Q (x_new | x(t))] }
4. Draw u from the uniform distribution U [0,1]: accept x(t+1) = x_new if u ≤ α (x_new | x(t)), otherwise keep x(t+1) = x(t).
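The four M-H steps above can be sketched in code. This is a minimal sketch, not the slide's own implementation: the function names (`metropolis_hastings`, `q_draw`, `q_dens`) are hypothetical, and the target density may be unnormalized since only ratios of π appear in the acceptance probability.

```python
import math
import random

def metropolis_hastings(pi, q_draw, q_dens, x0, n_steps):
    """Generic Metropolis-Hastings sampler.

    pi     -- (possibly unnormalized) target density pi(x)
    q_draw -- draws a candidate x_new given the current x
    q_dens -- evaluates the candidate density Q(a | b)
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = q_draw(x)                       # step 2: draw a candidate
        # step 3: alpha = min(1, pi(x_new) Q(x|x_new) / (pi(x) Q(x_new|x)))
        num = pi(x_new) * q_dens(x, x_new)
        den = pi(x) * q_dens(x_new, x)
        alpha = min(1.0, num / den)
        # step 4: accept with probability alpha, otherwise keep x
        if random.random() <= alpha:
            x = x_new
        samples.append(x)
    return samples
```

With a standard-normal target and a Gaussian random-walk proposal (which is symmetric, so the Q ratio cancels), the chain's long-run mean and variance approach 0 and 1.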
185 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability) (figure slide)
186 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability) (demo slide: Run This Example)
187 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
The convergence of the M-H Algorithm to the desired unique stationary solution π (x) occurs under the following conditions:
• Irreducibility: every state is eventually reachable from any start state; for all x there exists a t such that π (x,t) > 0.
• Aperiodicity: the chain does not get caught in cycles.
The process is ergodic if it is both irreducible and aperiodic.
In the M-H algorithm the draws are used as samples from the target density π (x) only after the Markov Chain has passed the transient stage, so that the effect of the chosen starting value x0 has become small enough to be ignored.
The rate of convergence of the Markov Chain is a function of the chosen candidate generating density Q (x'|x).
The efficiency of the algorithm depends on how close the Acceptance Probability α is to 1.
188 SOLO Metropolis-Hastings (M-H) Algorithm Generating Continuous Random Variables (Review of Probability)
Example:
π (x) = 0.3 exp (−0.2 x²) + 0.7 exp (−0.2 (x − 10)²)
Proposed Candidate Distribution: Q (x_new | x(t)) = N (x(t), 100)
Ramon Sagarna, R.Sagarna@cs.bham.ac.uk, "Lecture 19: Markov Chain Monte Carlo Methods (MCMC)"
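The bimodal example above can be run as a short random-walk M-H chain. A minimal sketch, assuming the slide's proposal N (x(t), 100), i.e. a Gaussian step with standard deviation 10; the function names are hypothetical:

```python
import math
import random

def target(x):
    # pi(x) = 0.3 exp(-0.2 x^2) + 0.7 exp(-0.2 (x - 10)^2), unnormalized
    return 0.3 * math.exp(-0.2 * x * x) + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2)

def sample(n_steps, x0=0.0, seed=1):
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        x_new = rng.gauss(x, 10.0)      # proposal N(x, 100): symmetric, Q cancels
        alpha = min(1.0, target(x_new) / target(x))
        if rng.random() <= alpha:
            x = x_new
        chain.append(x)
    return chain
```

Because both components have the same width, the chain should spend roughly 30% of its time near the mode at 0 and 70% near the mode at 10.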
189 SOLO Metropolis Algorithm Generating Continuous Random Variables (Review of Probability)
If we choose a symmetric candidate generating density, Q (x'|x) = Q (x|x') for each x', x, then:
α (x'|x) = min { 1, π (x') / π (x) }
α (x|x') = min { 1, π (x) / π (x') }
We obtain the Metropolis Algorithm.
Metropolis chose the target π (x) ∝ e^(−E (x)/kT), for which the acceptance probabilities become:
α (x'|x) = { e^(−ΔE/kT) if ΔE > 0; 1 if ΔE ≤ 0 },  ΔE := E (x') − E (x)
α (x|x') = { 1 if ΔE > 0; e^(+ΔE/kT) if ΔE ≤ 0 }
190 SOLO Metropolis Algorithm Generating Continuous Random Variables (Review of Probability) Return to Table of Content
191 SOLO Gibbs Sampling Generating Discrete Random Variables (Review of Probability)
Stuart Geman (Brown University), Donald Geman (Johns Hopkins University), Josiah Willard Gibbs (1839 - 1903)
In mathematics and physics, Gibbs sampling is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables. The purpose of such a sequence is to approximate the joint distribution, or to compute an integral (such as an expected value). Gibbs sampling is a special case of the Metropolis-Hastings algorithm, and thus an example of a Markov chain Monte Carlo algorithm.
The algorithm is named after the physicist J. W. Gibbs, in reference to an analogy between the sampling algorithm and statistical physics. The algorithm was devised by Stuart Geman and Donald Geman, some eight decades after the death of Gibbs, and is also called the Gibbs sampler.
Geman, S. and Geman, D., "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984, Vol. 6, pp. 721 - 741
192 SOLO Gibbs Sampling (continue – 1) Generating Discrete Random Variables (Review of Probability)
Suppose that x = (x1, x2, …, xk) is k (≥ 2) dimensional.
The Gibbs sampler uses what are called the full (or complete) conditional distributions:
π (xj | x1, …, x(j−1), x(j+1), …, xk) = π (x1, …, xj, …, xk) / π (x1, …, x(j−1), x(j+1), …, xk)
 = π (x1, …, xk) / ∫ π (x1, …, xk) dxj   (by Bayes)
The Gibbs sampler samples one variable in turn:
X1(t+1) ~ π (x1 | x2(t), x3(t), …, xk(t))
X2(t+1) ~ π (x2 | x1(t+1), x3(t), …, xk(t))
X3(t+1) ~ π (x3 | x1(t+1), x2(t+1), x4(t), …, xk(t))
⋮
Xk(t+1) ~ π (xk | x1(t+1), x2(t+1), …, x(k−1)(t+1))
The Gibbs sampler always uses the most recent values.
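The cycle above can be sketched for a case where the full conditionals are available in closed form. A minimal sketch, assuming (not from the slide) a standard bivariate normal target with correlation rho, for which each full conditional is itself normal: x1 | x2 ~ N (rho·x2, 1 − rho²) and symmetrically for x2:

```python
import random

def gibbs_bivariate_normal(rho, n_steps, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals: x1 | x2 ~ N(rho*x2, 1-rho^2), x2 | x1 ~ N(rho*x1, 1-rho^2).
    """
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x1, x2 = 0.0, 0.0
    out = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, sd)   # draw x1 from pi(x1 | x2)
        x2 = rng.gauss(rho * x1, sd)   # draw x2 from pi(x2 | x1), most recent x1
        out.append((x1, x2))
    return out
```

The sample correlation of the chain should approach rho, illustrating that cycling through the full conditionals reproduces the joint distribution.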
193 SOLO Gibbs Sampling (continue – 2) Generating Discrete Random Variables (Review of Probability)
Gibbs Sampling is a special case of the Metropolis-Hastings Algorithm. To see this, let us define the candidate generating density Q (x_new | x(t)) as:
Q (x_new | x(t)) = { Pr (x_new_j | x(t)_−j) if x_new_−j = x(t)_−j; 0 otherwise }
where x_−j := (x1, …, x(j−1), x(j+1), …, xk) denotes all components except xj.
At any moment one variable is drawn: x_new_j ~ π (xj | x_new_−j), where x_new_−j := (x1(t+1), …, x(j−1)(t+1), x(j+1)(t), …, xk(t)).
The new sample will be x_new = (x1(t+1), …, x(j−1)(t+1), x_new_j, x(j+1)(t), …, xk(t)) = (x_new_j, x_new_−j).
The acceptance probability α (x_new | x(t)) is:
α (x_new | x(t)) = min { 1, [Pr (x_new) Q (x(t) | x_new)] / [Pr (x(t)) Q (x_new | x(t))] }
 = min { 1, [Pr (x_new) Pr (x(t)_j | x(t)_−j)] / [Pr (x(t)) Pr (x_new_j | x_new_−j)] }
194 SOLO Gibbs Sampling (continue – 3) Generating Discrete Random Variables (Review of Probability)
Gibbs Sampling is a special case of the Metropolis-Hastings Algorithm, with candidate generating density:
Q (x_new | x(t)) = { Pr (x_new_j | x(t)_−j) if x_new_−j = x(t)_−j; 0 otherwise }
Using Bayes:
Pr (x(t)_j | x(t)_−j) = Pr (x(t)_j, x(t)_−j) / Pr (x(t)_−j)
Pr (x_new_j | x_new_−j) = Pr (x_new_j, x_new_−j) / Pr (x_new_−j)
and, since x(t) = (x(t)_j, x(t)_−j) and x_new = (x_new_j, x_new_−j):
Pr (x(t)) = Pr (x(t)_j, x(t)_−j), Pr (x_new) = Pr (x_new_j, x_new_−j)
The acceptance probability is:
α (x_new | x(t)) = min { 1, [Pr (x_new) Q (x(t) | x_new)] / [Pr (x(t)) Q (x_new | x(t))] }
 = min { 1, [Pr (x_new_j, x_new_−j) Pr (x(t)_j, x(t)_−j) Pr (x_new_−j)] / [Pr (x(t)_j, x(t)_−j) Pr (x_new_j, x_new_−j) Pr (x(t)_−j)] }
 = min { 1, Pr (x_new_−j) / Pr (x(t)_−j) } = 1   (since x_new_−j = x(t)_−j)
Gibbs Sampling always accepts x_new_j.
195 SOLO Gibbs Sampling (continue – 4) Generating Discrete Random Variables (Review of Probability) Return to Table of Content
SOLO Review of Probability: Monte Carlo Integration
The Monte Carlo Method can be used to numerically evaluate multidimensional integrals:
I = ∫ g (x1, …, xm) dx1 … dxm = ∫ g (x) dx
To use Monte Carlo we factorize g (x) = f (x) · p (x) in such a way that p (x) is interpreted as a Probability Density Function:
p (x) ≥ 0 and ∫ p (x) dx = 1
We assume that we can draw NS samples x^i, i = 1, …, NS, from p (x): x^i ~ p (x), i = 1, …, NS.
Using Monte Carlo we can approximate: p (x) ≈ Σ_{i=1}^{NS} δ (x − x^i) / NS
and therefore:
I = ∫ f (x) · p (x) dx ≈ I_NS = ∫ f (x) · [Σ_{i=1}^{NS} δ (x − x^i) / NS] dx = (1/NS) Σ_{i=1}^{NS} f (x^i)
SOLO Review of Probability: Monte Carlo Integration
We draw NS samples x^i, i = 1, …, NS, from p (x) and form:
I_NS = (1/NS) Σ_{i=1}^{NS} f (x^i) ≈ I = ∫ f (x) · p (x) dx
If the samples x^i are independent, then I_NS is an unbiased estimate of I.
According to the Law of Large Numbers, I_NS will almost surely converge to I: I_NS → I (a.s.) as NS → ∞.
If the variance of f (x) is finite, i.e.: σ_f² := ∫ [f (x) − I]² p (x) dx < ∞
then the Central Limit Theorem holds and the estimation error converges in distribution to a Normal Distribution:
lim_{NS→∞} √NS (I_NS − I) ~ N (0, σ_f²)
The error of the MC estimate, e = I_NS − I, is of the order O (NS^(−1/2)), meaning that the rate of convergence of the estimate is independent of the dimension of the integrand.
Return to Table of Content
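The estimator I_NS = (1/NS) Σ f (x^i) can be sketched directly. A minimal sketch with hypothetical names (`mc_integrate`, `draw_p`); as a check it estimates ∫₀¹ x² dx = 1/3, taking p = U[0,1] (so p (x) = 1 and f = g):

```python
import random

def mc_integrate(f, draw_p, n_samples, seed=0):
    """Estimate I = integral of f(x) p(x) dx by (1/N) sum f(x_i), x_i ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += f(draw_p(rng))        # one term f(x^i) per draw from p
    return total / n_samples
```

The O (NS^(−1/2)) error bound suggests roughly a tenfold accuracy gain per hundredfold increase in samples, independent of dimension.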
198 SOLO Random Processes
Random Variable: a variable x determined by the outcome Ω of a random experiment: x = x (Ω).
Random Process or Stochastic Process: a function of time x determined by the outcome Ω of a random experiment: x (t) = x (t, Ω).
This is a family, or an ensemble, of functions of time, in general different for each outcome Ω.
Mean or Ensemble Average of the Random Process:
x̄ (t) := E [x (t, Ω)] = ∫ ξ p_{x(t)} (ξ) dξ
Autocorrelation of the Random Process:
R (t1, t2) := E [x (t1, Ω) x (t2, Ω)] = ∫∫ ξ η p_{x(t1), x(t2)} (ξ, η) dξ dη
Autocovariance of the Random Process:
C (t1, t2) := E { [x (t1, Ω) − x̄ (t1)] [x (t2, Ω) − x̄ (t2)] } = R (t1, t2) − x̄ (t1) x̄ (t2)
Table of Content
199 SOLO Stationarity of a Random Process (Random Processes)
1. Wide-Sense Stationarity of a Random Process:
• The Mean Average of the Random Process is time invariant: x̄ (t) := E [x (t, Ω)] = ∫ ξ p_{x(t)} (ξ) dξ = x̄ = const.
• The Autocorrelation of the Random Process depends only on τ := t2 − t1:
R (t1, t2) = R (t2 − t1) = R (τ), and moreover R (τ) = R (−τ).
Power Spectrum or Power Spectral Density of a Stationary Random Process:
S (ω) := ∫ R (τ) exp (−j ω τ) dτ
2. Strict-Sense Stationarity of a Random Process: all probability density functions are time invariant: p_{x(t)} (ω, t) = p_x (ω) = const.
Ergodicity: a Stationary Random Process for which the Time Average equals the Ensemble Average:
⟨x (t, Ω)⟩ := lim_{T→∞} (1/2T) ∫_{−T}^{+T} x (t, Ω) dt = E [x (t, Ω)]
200 SOLO Ergodicity (continue) (Random Processes)
Time Autocorrelation: for an Ergodic Random Process define
R (τ) := ⟨x (t, Ω) x (t + τ, Ω)⟩ = lim_{T→∞} (1/2T) ∫_{−T}^{+T} x (t, Ω) x (t + τ, Ω) dt
Finite Signal Energy Assumption: R (0) = lim_{T→∞} (1/2T) ∫_{−T}^{+T} x² (t, Ω) dt < ∞
Define the truncated process: x_T (t, Ω) := x (t, Ω) for −T ≤ t ≤ T, and 0 otherwise, and
R_T (τ) := (1/2T) ∫ x_T (t, Ω) x_T (t + τ, Ω) dt
Splitting the integration interval at ±(T − τ) and bounding the end segments shows that the boundary contributions are at most (τ/2T) · sup |x_T (t, Ω) x_T (t + τ, Ω)|, which vanishes as T → ∞; therefore:
lim_{T→∞} R_T (τ) = R (τ)
201 SOLO Ergodicity (continue) (Random Processes)
Let us compute:
∫ R_T (τ) exp (−j ω τ) dτ = (1/2T) ∫∫ x_T (t, Ω) x_T (t + τ, Ω) exp (−j ω τ) dτ dt
 = (1/2T) [∫ x_T (t, Ω) exp (+j ω t) dt] [∫ x_T (v, Ω) exp (−j ω v) dv] = X_T X_T* / (2T)
where X_T := ∫ x_T (v, Ω) exp (−j ω v) dv and * means complex conjugate.
Define:
S (ω) := lim_{T→∞} E { X_T X_T* / (2T) } = lim_{T→∞} E { ∫ R_T (τ) exp (−j ω τ) dτ }
Since the Random Process is Ergodic we can use the Wide-Sense Stationarity assumption E [x_T (t, Ω) x_T (t + τ, Ω)] = R (τ), so that:
S (ω) = ∫ R (τ) exp (−j ω τ) dτ
202 SOLO Ergodicity (continue) (Random Processes)
We obtained the Wiener-Khinchine Theorem (Wiener 1930):
S (ω) := lim_{T→∞} E { X_T X_T* / (2T) } = ∫ R (τ) exp (−j ω τ) dτ
The Power Spectrum or Power Spectral Density S (ω) of a Stationary Random Process is the Fourier Transform of the Autocorrelation Function R (τ).
Norbert Wiener 1894 - 1964; Alexander Yakovlevich Khinchine 1894 - 1959
203 SOLO White Noise (Random Processes)
Wide-Sense Whiteness: a (not necessarily stationary) Random Process whose Autocorrelation is zero for any two different times is called white noise in the wide sense:
R (t1, t2) = E [x (t1, Ω) x (t2, Ω)] = σ² (t1) δ (t1 − t2)
where σ² (t1) is the instantaneous variance.
Strict-Sense Whiteness: a (not necessarily stationary) Random Process in which the outcomes at any two different times are independent is called white noise in the strict sense: the joint density factorizes,
p_{x(t1), x(t2)} (Ω) = p_{x(t1)} (Ω) · p_{x(t2)} (Ω), t1 ≠ t2
A Stationary White Noise Random Process has the Autocorrelation:
R (τ) = E [x (t, Ω) x (t + τ, Ω)] = σ² δ (τ)
Note: In general whiteness requires Strict-Sense Whiteness. In practice we have only moments (typically up to second order) and thus only Wide-Sense Whiteness.
204 SOLO White Noise (Random Processes)
A Stationary White Noise Random Process has the Autocorrelation:
R (τ) = E [x (t, Ω) x (t + τ, Ω)] = σ² δ (τ)
The Power Spectral Density is given by performing the Fourier Transform of the Autocorrelation:
S (ω) = ∫ R (τ) exp (−j ω τ) dτ = ∫ σ² δ (τ) exp (−j ω τ) dτ = σ²
We can see that the Power Spectral Density contains all frequencies at the same amplitude. This is the reason it is called White Noise.
The Power of the Noise is defined as: P = ∫ R (τ) dτ = S (ω = 0) = σ²
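The delta-autocorrelation above has a simple discrete-time analogue that can be checked numerically. A minimal sketch, assuming (not from the slide) discrete-time Gaussian white noise with variance σ² = 4: the biased sample autocorrelation should be near σ² at lag 0 and near 0 at every other lag.

```python
import random

def sample_autocorr(x, lag):
    """Biased sample autocorrelation R(lag) = (1/N) sum_t x[t] x[t+lag]."""
    n = len(x)
    return sum(x[t] * x[t + lag] for t in range(n - lag)) / n

# discrete white noise: independent draws, variance sigma^2
rng = random.Random(0)
sigma = 2.0
w = [rng.gauss(0.0, sigma) for _ in range(100000)]
# expect R(0) close to sigma^2 = 4 and R(lag != 0) close to 0
```

A flat spectrum follows since the discrete-time Fourier transform of a delta sequence is constant.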
205 SOLO Markov Processes (Random Processes)
A Markov Process is defined by:
p (x (t, Ω) | x (τ, Ω), τ ≤ t1) = p (x (t, Ω) | x (t1, Ω)), ∀ t > t1
i.e. for the Random Process, the past up to any time t1 is fully summarized by the process at t1.
Examples of Markov Processes:
1. Continuous Dynamic System:
ẋ (t) = f (x, u, v, t)
z (t) = h (x, u, w, t)
2. Discrete Dynamic System:
x (t_{k+1}) = f_k (x_k, u_k, v_k, t_k)
z (t_k) = h_k (x_k, u_k, w_k, t_k)
where x is the state-space vector (n × 1), u the input vector (m × 1), v a white input noise vector (n × 1), z the measurement vector (p × 1), and w a white measurement noise vector (p × 1).
    206 SOLO Table of Content MarkovProcesses Examples of Markov Processes: 3. Continuous Linear Dynamic System ( ) ( ) ( ) ( ) ( )txCtz tvtxAtx = += Using the Fourier Transform we obtain: ( ) ( ) ( ) ( ) ( ) ( )ωωωωω ω VHVAIjCZ H =−= −    1 Using the Inverse Fourier Transform we obtain: ( ) ( ) ( )∫ +∞ ∞− = ξξξ dvthtz , ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( )∫∫ ∫ ∫ ∫∫ ∞+ ∞− ∞+ ∞− − ∞+ ∞− +∞ ∞− +∞ ∞− +∞ ∞− −=−=         −== ξξξξωξωω π ξ ωωξξωξω π ωωωω π ξ ω dthvddtjHv dtjdjvHdtjVHtz th egrattion of order change V       exp 2 1 expexp 2 1 exp 2 1 int h (t,τ) v (t) z (t) Random Processes
    207 SOLO Table of Content MarkovProcesses Examples of Markov Processes: 3. Continuous Linear Dynamic System ( ) ( ) ( ) ( ) ( )txCtz tvtxAtx = += The Autocorrelation of the output is: ( ) ( ) ( )∫ +∞ ∞− = ξξξ dvthtz , h (t,τ) v (t) z (t) ( ) ( ) ( )[ ] ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )[ ] ( ) ( ) ( ) ( ) ( ) ( ) ( )∫∫ ∫ ∫∫ ∫ ∫∫ ∞+ ∞− −= ∞+ ∞− ∞+ ∞− ∞+ ∞− ∞+ ∞− ∞+ ∞− +∞ ∞− +∞ ∞− +=−+−= −−+−=−+−=         −+−=+= ζτζζσξξτξσ ξξξξδξτξσξξξξξτξ ξξξτξξξττ ξζ dhhdthth ddththddvvEthth dvthdvthEtztzER v t v v zz 2 111 2 212121 2 212111 222111 1 ( ) ( ) ( )[ ] ( )τδσττ 2 vvv tvtvER =+= ( ) ( ) ( ) ( ) ( ) 22 expexp vvvvvv djdjRS σττωτδσττωτω =−=−= ∫∫ +∞ ∞− +∞ ∞− ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) 2*2 22 2 expexp expexpexpexp expexp xx xx x RR zzzz HHdjhdjh djdjhhdjdjhh djdhhdjRS zzzz σωωχχωχζζωζσ χχωζζωζχσττζωζζωζτζσ ττωζτζζσττωτω χτζ ττ =                 −= −=−−−= −−=−= ∫∫ ∫ ∫∫ ∫ ∫ ∫∫ ∞+ ∞− ∞+ ∞− ∞+ ∞− ∞+ ∞− =+ ∞+ ∞− ∞+ ∞− +∞ ∞− +∞ ∞− −= +∞ ∞− ( ) ( ) ( ) ( )ωωωω vvzz SHHS * = Random Processes
    208 SOLO Table of Content MarkovProcesses Examples of Markov Processes: 4. Continuous Linear Dynamic System ( ) ( ) ( )∫ +∞ ∞− = ξξξ dvthtz , ( ) ( ) ( )[ ] ( )τδσττ 2 vvv tvtvER =+= ( ) 2 vvvS σω = v (t) z (t) ( ) xj K H ωω ω /1+ = ( ) x j K H ωω ω /1+ = The Power Spectral Density of the output is: ( ) ( ) ( ) ( ) ( )2 22 * /1 x v vvzz K SHHS ωω σ ωωωω + == ( ) ( )2 22 /1 x vv zz K S ωω σ ω + = ω x ω 22 vv K σ 2/ 22 vv K σ The Autocorrelation of the output is: ( ) ( ) ( ) ( ) ( ) ( ) ( )∫∫ ∫ ∞+ ∞− = ∞+ ∞− +∞ ∞− − − = + = = dss s K j dj K djSR x v js x v zzzz τ ω σ π ωτω ωω σ π ωτωω π τ ω exp /12 1 exp /12 1 exp 2 1 2 22 2 22 ωj xω R ( ) 0 /1 2 22 = −∫∞→R s x vv dse s K τ ω σ ( ) 0 /1 2 22 = −∫∞→R s x vv dse s K τ ω σ xω− σ ωσ js += 0<τ 0>τ ( ) τωσω ω x e K R vvx zz = = 2 22 τ 2/ 22 vvxK σω ( )τω σω x vx K −= exp 2 22 ( ) ( ) ( ) ( )               >        + − −= − − <         − − = − − = ∫ ∫ → −→ 0 exp Reexp 2 1 0 exp Reexp 2 1 222 22 222 222 22 222 τ ω τσω τ ω σω π τ ω τσω τ ω σω π ωω ωω x vx x vx x vx x vx s sK sdss s K j s sK sdss s K j x x Random Processes
    209 SOLO Markov Processes Examples ofMarkov Processes: 5. Continuous Linear Dynamic System with Time Variable Coefficients ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ){ } ( ) ( )21121 & :&: tttQteteE twEtwtetxEtxte T ww wx −= −=−= δ w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )twtGtxtFtxtx td d +==  ( ) ( ) ( ) ( ) ( )tetGtetFte wxx += ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t dwGttxtttx 0 ,, 00 λλλλ The solution of the Linear System is: where: ( ) ( ) ( ) ( ) ( ) ( ) ( )3132210000 ,,,&,&,, ttttttItttttFtt td d Φ=ΦΦ=ΦΦ=Φ ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t wxx deGttettte 0 ,, 00 λλλλ ( ){ } ( ) ( ){ } ( ) ( ){ }twEtGtxEtFtxE += Random Processes
    210 SOLO Markov Processes Examples ofMarkov Processes: 5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 1) ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ){ } ( ) ( )21121 & :&: tttQteteE twEtwtetxEtxte T ww wx −= −=−= δ w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t dwGttxtttx 0 ,, 00 λλλλ ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t wxx deGttettte 0 ,, 00 λλλλ ( ) ( ){ } ( ) ( ){ }teteEtxVartV T xxx ==: ( ) ( ){ } ( ) ( ){ }ττττ ++=+=+ teteEtxVartV T xxx : ( ) ( ) ( ){ } ( ) ( ) ( ){ }ττττ +=++=+ teteEttRteteEttR T xxx T xxx :,&:, ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∫ ΦΦ+ΦΦ== t t TTT xxx dtGQGttttVttttRtV 0 ,,,,, 000 λλλλλλ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∫ + +Φ+Φ++Φ+Φ=++=+ τ λλτλλλλττττττ t t TTT xxx dtGQGttttVttttRtV 0 ,,,,, 000 Random Processes
    211 SOLO Markov Processes Examplesof Markov Processes: 5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 2) ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ){ } ( ) ( )21121 & :&: tttQteteE twEtwtetxEtxte T ww wx −= −=−= δ w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t dwGttxtttx 0 ,, 00 λλλλ ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t wxx deGttettte 0 ,, 00 λλλλ ( ) ( ){ } ( ) ( ){ }teteEtxVartV T xxx ==: ( ) ( ){ } ( ) ( ){ }ττττ ++=+=+ teteEtxVartV T xxx : ( ) ( ) ( ){ } ( ) ( ) ( ){ }ττττ +=++=+ teteEttRteteEttR T xxx T xxx :,&:, ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )       <Φ+Φ+Φ+Φ >Φ+Φ+Φ+Φ =+ ∫ ∫ + 0,,,, 0,,,, , 0 0 000 000 τλλλλλλττ τλλλλλλττ τ τt t TTT x t t TTT x x dtGQGttttVtt dtGQGttttVtt ttR ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )            <+Φ+ >Φ+Φ−+Φ+ <Φ+Φ−+Φ >+Φ =+ ∫ ∫ + + 0, 0,,, 0,,, 0, , τττ τλλλλλλτττ τλλλλλλττ ττ τ τ τ tttV dtGQGttttV or dtGQGttVtt tVtt ttR T x t t TTT x t t TT x x x Random Processes
    212 SOLO Markov Processes Examplesof Markov Processes: 5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 3) ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ){ } ( ) ( )21121 & :&: tttQteteE twEtwtetxEtxte T ww wx −= −=−= δ w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t wxx deGttettte 0 ,, 00 λλλλ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )       <Φ+Φ+Φ+Φ >Φ+Φ+Φ+Φ =+ ∫ ∫ + 0,,,, 0,,,, , 0 0 000 000 τλλλλλλττ τλλλλλλττ τ τt t TTT x t t TTT x x dtGQGttttVtt dtGQGttttVtt ttR ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∫ ΦΦ+ΦΦ== t t TTT xxx dtGQGttttVttttRtV 0 ,,,,, 000 λλλλλλ ( ) ( ){ } ( ) ( ){ }teteEtxVartV T xxx ==: ( ) ( ) ( ){ } ( ) ( ) ( ){ }ττττ +=++=+ teteEttRteteEttR T xxx T xxx :,&:, ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )tGtQtGdtFtGQGttFtttVtt dtGQGttFtttVtttFtV td d T t t TTTTT x t t TTT xx +ΦΦ+ΦΦ+ ΦΦ+ΦΦ= ∫ ∫ 0 0 ,,,, ,,,, 000 000 λλλλλλ λλλλλλ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )tGtQtGtFtVtVtFtV td d TT xxx ++= ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )ττττττττ +++++++++=+ tGtQtGtFtVtVtFtV td d TT xxx Random Processes
    213 SOLO Markov Processes Examplesof Markov Processes: 5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 4) ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ){ } ( ) ( )21121 & :&: tttQteteE twEtwtetxEtxte T ww wx −= −=−= δ w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )∫Φ+Φ= t t wxx deGttettte 0 ,, 00 λλλλ ( ) ( ){ } ( ) ( ){ }teteEtxVartV T xxx ==: ( ) ( ) ( ){ } ( ) ( ) ( ){ }ττττ +=++=+ teteEttRteteEttR T xxx T xxx :,&:, ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )       <Φ+Φ+Φ+Φ >Φ+Φ+Φ+Φ =+ ∫ ∫ + 0,,,, 0,,,, , 0 0 000 000 τλλλλλλττ τλλλλλλττ τ τt t TTT x t t TTT x x dtGQGttttVtt dtGQGttttVtt ttR ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )   <+Φ++++++++ >+Φ+++++ =+ 0,,, 0,,, , ττττττττ τττττ τ tttGtQtGtFttRttRtF tGtQtGtttFttRttRtF ttR td d TTT xx TT xx x ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )   <++++Φ+++++ >+Φ+++++ =+ 0,,, 0,,, , ττττττττ τττττ τ tGtQtGtttFttRttRtF tttGtQtGtFttRttRtF ttR td d TT xx TTT xx x Random Processes
    214 SOLO Markov Processes Examplesof Markov Processes: 6. How to Decide if a Input Noise can be Approximated by a White or a Colored Noise w (t) x (t) ( )tF ( )tG ∫ x (t) ( ) ( ) ( ) ( ) ( ) ( )twtGtxtFtxtx td d +== Given a Continuous Linear System: we want to decide if can be approximated by a white noise.( )tw Let start with a first order linear system with white noise input :( )tw' ( ) ( ) ( )tw T tw T tw ' 11 +−= w (t)w' (t) ( ) Ts sH + = 1 1 ( ) ( ) Ttt w ett / 0 0 , −− =φ ( ) ( ){ }[ ] ( ) ( ){ }[ ]{ } ( )τδττ −=−− tQwEwtwEtwE '''' ( ) ( ){ }[ ] ( ) ( ){ }[ ]{ } ( )ttRtwEtwtwEtwE ww ,τττ +=−+−+ ( ) ( ){ }[ ] ( ) ( ){ }[ ]{ } ( )τττ +=+−+− ttRtwEtwtwEtwE ww , ( ) ( ){ }[ ] ( ) ( ){ }[ ]{ } ( ) ( )ttRtVwEwtwEtwE wwww ,==−− ττ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )tGtQtGtFtVtVtFtV td d TT xxx ++= ( ) ( ) Q T tV T tV td d wwww 2 12 +−= ( ) ( )00 , 1 , tt T tt td d ww φφ −= where Random Processes
    215 SOLO Markov Processes Examplesof Markov Processes: 6. How to Decide if a Input Noise can be Approximated by a White or a Colored Noise (continue – 1) ( ) ( ) Q T tV T tV td d wwww 2 12 +−= ( ) ( )       −+= −− T t T t wwww e T Q eVtV 22 1 2 0 t 2/T ( ) T t ww eV 2 0 −       − − T t e T Q 2 1 2 T Q V statesteadyww 2 =− ( )tVww ( ) ( ) ( ) ( ) ( ) ( ) ( )     <+=+Φ+ >=+Φ =+ − − 0, 0, , ττττ ττ τ τ τ tVetttV tVetVtt ttR ww TT www ww T www ww ( ) ( ) ( ) ( ) ( ) ( ) ( )     <+=++Φ >=+Φ =+ − − 0, 0, , ττττ ττ τ τ τ tVetVtt tVetttV ttR ww T www ww TT www ww For ( ) ( ) T Q VtVtV T statesteadywwwwww 2 5 ==+≈⇒> − τ τ ( ) ( ) ( ) TT statesteadywwwwwwww e T Q eVVttRttR T ττ τττ τ −− − =≈≈+≈+⇒> 2 ,,5 w (t)w' (t) ( ) Ts sH + = 1 1 Random Processes
    216 SOLO Markov Processes Examplesof Markov Processes: 6. How to Decide if a Input Noise can be Approximated by a White or a Colored Noise (continue – 2) ( ) ( ) ( ) TT statesteadywwwwwwww e T Q eVVttRttR T ττ τττ τ −− − =≈≈+≈+⇒> 2 ,,5 ( ) T ww e T Q V / 2 τ τ = = τ T Q V statesteadyww 2 =− T− T 1− − ⋅eV statesteadyww ( ) Qde T Q dVArea T ww === ∫∫ +∞ − +∞ ∞− 0 2 2 τττ τ T is the correlation time of the noise w (t) and can be found from Vww (τ) by tacking the time corresponding to Vww steady-state /e. One other way to find T is by tacking the double sides Laplace Transform L2 on τ of: ( ) ( ){ } ( ) QdetQtQs s ww =−=−=Φ ∫ +∞ ∞− − ττδτδ τ τ2'' L ( ) ( ){ } ( ) ( ) ( )sHQsH sT Q dee T Q Vs sT sswwww −= = = ==Φ ∫ +∞ ∞− −− − 2 / 2 1 2 ττ ττ τL ( ) ( )2 2/1 /1 ωω ω + = Q Qww ω T/12/1 =ω Q 2/Q T/12/1 −=−ω T can be found by tacking ω1/2 of half of the power spectrum Q/2 and T=1/ ω1/2. Random Processes
    217 SOLO Markov Processes Examplesof Markov Processes: ( ) ( )2 2/1/1 ωω ω + = Q Qww ω T/12/1 =ω Q 2/Q T/12/1 −=−ω Let return to the original system: ( ) ( ) ( ) ( ) ( ) ( )twtGtxtFtxtx td d +==  w (t) x (t) ( )tF ( )tG ∫ x (t) 6. How to Decide if a Input Noise can be Approximated by a White or a Colored Noise (continue – 3) Compute the power spectrum of and define Q and T. ( )ωjsww =Φ ( )tw then can be approximated by the white noise with( )tw ( )tw' ( ) ( ){ }[ ] ( ) ( ){ }[ ]{ } ( )τδττ −=−− tQwEwtwEtwE '''' then can be approximated by a colored noise that can be obtained by passing the predefined white noise through a filter ( )tw ( )tw' ( ) sT sH + = 1 1 If Fofeigenvaluemaximum 1 FofconstanttimeminimumT =<51 If Fofeigenvaluemaximum 1 FofconstanttimeminimumT =>52 Random Processes
218 SOLO Markov Processes (Random Processes)
Examples of Markov Processes:
7. Digital Simulation of a Continuous Process
Let us start with a first-order linear system driven by white noise w' (t):
ẇ (t) = −(1/T) w (t) + (1/T) w' (t),  H (s) = 1 / (1 + T s),  φ_w (t, t0) = e^(−(t − t0)/T)
where E { [w' (t) − E {w' (t)}] [w' (τ) − E {w' (τ)}] } = Q δ (t − τ)
The solution is:
w (t) = e^(−(t − t0)/T) w (t0) + (1/T) ∫_{t0}^{t} e^(−(t − τ)/T) w' (τ) dτ
Let us choose t = (k+1) ΔT and t0 = k ΔT:
w [(k+1) ΔT] = e^(−ΔT/T) w (k ΔT) + (1/T) ∫_{kΔT}^{(k+1)ΔT} e^(−[(k+1)ΔT − τ]/T) w' (τ) dτ
219 SOLO Markov Processes (Random Processes)
Examples of Markov Processes:
7. Digital Simulation of a Continuous Process (continue – 1)
Define: ρ := e^(−ΔT/T)
The variance of the integral term is:
E { [(1/T) ∫_{kΔT}^{(k+1)ΔT} e^(−[(k+1)ΔT − τ]/T) w' (τ) dτ]² } = (Q/T²) ∫_{kΔT}^{(k+1)ΔT} e^(−2[(k+1)ΔT − τ]/T) dτ = (Q/2T) (1 − e^(−2ΔT/T)) = (Q/2T) (1 − ρ²)
Define w' (k) such that: E { [w' (k) − E {w' (k)}]² } = Q/(2T)
Therefore:
w [(k+1) ΔT] = ρ w (k ΔT) + √(1 − ρ²) w' (k)
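The recursion above gives a direct way to simulate the colored (first-order Gauss-Markov) process on a computer. A minimal sketch with hypothetical names, assuming Gaussian driving noise; the driving samples have the steady-state variance Q/(2T), so the simulated process starts and stays in steady state:

```python
import math
import random

def simulate_gauss_markov(T, Q, dt, n_steps, seed=0):
    """Discrete simulation of a first-order Gauss-Markov (colored-noise) process.

    w(k+1) = rho * w(k) + sqrt(1 - rho^2) * w'(k), rho = exp(-dt / T),
    where w'(k) is white with variance Q / (2 T) (the steady-state variance).
    """
    rng = random.Random(seed)
    rho = math.exp(-dt / T)
    sd = math.sqrt(Q / (2.0 * T))          # steady-state standard deviation
    scale = math.sqrt(1.0 - rho * rho)
    w = sd * rng.gauss(0.0, 1.0)           # start in steady state
    out = [w]
    for _ in range(n_steps - 1):
        w = rho * w + scale * sd * rng.gauss(0.0, 1.0)
        out.append(w)
    return out
```

With T = 1, Q = 2 the steady-state variance is Q/(2T) = 1, and the lag-one correlation of the simulated sequence should be close to ρ = e^(−ΔT/T).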
220 SOLO Markov Chains (Random Processes)
(State diagram: three states X1, X2, X3, with the transition probabilities listed on the following slides.)
A Markov chain, named after Andrey Markov, is a stochastic process with the Markov property. Having the Markov property means that, given the present state, future states are independent of the past states. In other words, the description of the present state fully captures all the information that could influence the future evolution of the process. Being a stochastic process means that all state transitions are probabilistic.
Andrey Andreevich Markov 1856 - 1922
At each step the system may change its state from the current state to another state (or remain in the same state) according to a probability distribution. The changes of state are called transitions, and the probabilities associated with the various state changes are called transition probabilities.
Definition of Markov Chains
A Markov chain is a sequence of random variables X1, X2, X3, ... with the Markov property, namely that, given the present state, the future and past states are independent:
Pr (X_{n+1} = x | X_1 = x_1, …, X_n = x_n) = Pr (X_{n+1} = x | X_n = x_n)
221 SOLO Markov Chains (Random Processes)
Properties of Markov Chains
Define the probability of going from state i to state j in m time steps as:
p_{i→j}^(m) = Pr (X_m = j | X_0 = i)
and the single-step transition as:
p_{i→j} = Pr (X_1 = j | X_0 = i)
(State diagram: from X1: p_{1→1} = 0.1, p_{1→2} = 0.6, p_{1→3} = 0.3; from X2: p_{2→1} = 0.5, p_{2→2} = 0.2, p_{2→3} = 0.3; from X3: p_{3→1} = 0.6, p_{3→2} = 0.3, p_{3→3} = 0.1.)
For a time-homogeneous Markov Chain:
p_{i→j}^(m) = Pr (X_{k+m} = j | X_k = i) and p_{i→j} = Pr (X_{k+1} = j | X_k = i)
so the n-step transition satisfies the Chapman-Kolmogorov equation: for any k such that 0 < k < n,
p_{i→j}^(n) = Σ_{r∈S} p_{i→r}^(k) p_{r→j}^(n−k)
222 SOLO Markov Chains (Random Processes)
Properties of Markov Chains (continue – 1)
The marginal distribution Pr (X_k = x) is the distribution over states at time k. For a time-homogeneous Markov Chain:
Pr (X_{k+1} = j) = Σ_{r∈S} Pr (X_{k+1} = j | X_k = r) Pr (X_k = r) = Σ_{r∈S} p_{r→j} Pr (X_k = r)
In matrix form it can be written as:
[Pr (X_1), Pr (X_2), …, Pr (X_N)]ᵀ at step k+1 = K · [Pr (X_1), Pr (X_2), …, Pr (X_N)]ᵀ at step k, with K_{ij} = p_{j→i}
where N is the number of states of the Markov Chain. For the three-state chain above:
K = [ 0.1 0.5 0.6
      0.6 0.2 0.3
      0.3 0.3 0.1 ]
Properties of the Transition Matrix K:
1. 0 ≤ p_{j→i} ≤ 1
2. Σ_{i=1}^{N} p_{j→i} = 1 (each column of K sums to 1)
223 SOLO Markov Chains Random Processes
Properties of Markov Chains (continue – 2)
Reducibility
A state j is said to be accessible from a state i (written i → j) if a system started in state i has a non-zero probability of transitioning into state j at some point. Formally, state j is accessible from state i if there exists an integer n ≥ 0 such that:
Pr(X_n = j | X_0 = i) = p^(n)_{i→j} > 0
Allowing n to be zero means that every state is defined to be accessible from itself.
A state i is said to communicate with state j (written i ↔ j) if both i → j and j → i. A set of states C is a communicating class if every pair of states in C communicates with each other, and no state in C communicates with any state not in C. It can be shown that communication in this sense is an equivalence relation, and thus that communicating classes are the equivalence classes of this relation.
A communicating class is closed if the probability of leaving the class is zero, namely: if i is in C but j is not, then j is not accessible from i.
Finally, a Markov chain is said to be irreducible if its state space is a single communicating class; in other words, if it is possible to get to any state from any state.
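Accessibility is a reachability question on the graph of positive-probability edges, so irreducibility can be tested with a breadth-first search. A minimal sketch, with hypothetical example matrices:

```python
# Sketch: i -> j accessibility via BFS over edges with p_{i->j} > 0;
# the chain is irreducible iff every state reaches every state.
from collections import deque

def accessible(P, i):
    """Set of states reachable from i through positive-probability edges."""
    seen, queue = {i}, deque([i])
    while queue:
        s = queue.popleft()
        for j, p in enumerate(P[s]):
            if p > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def is_irreducible(P):
    n = len(P)
    return all(accessible(P, i) == set(range(n)) for i in range(n))

P = [[0.1, 0.6, 0.3],      # hypothetical fully-connected chain
     [0.5, 0.2, 0.3],
     [0.6, 0.3, 0.1]]
assert is_irreducible(P)

Q = [[1.0, 0.0],           # state 0 is absorbing: {0} is a closed class
     [0.5, 0.5]]
assert not is_irreducible(Q)
```

In the second chain the singleton {0} is a closed communicating class, so the state space is not a single class.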
  • 224.
224 SOLO Markov Chains Random Processes
Properties of Markov Chains (continue – 3)
Periodicity
A state i has period k if any return to state i must occur in multiples of k time steps. Formally, the period of a state is defined as:
k := greatest common divisor of { n : Pr(X_n = i | X_0 = i) > 0 }
Note that even though a state has period k, it may not be possible to reach the state in k steps. For example, suppose it is possible to return to the state in {6, 8, 10, 12, ...} time steps; then k would be 2, even though 2 does not appear in this list.
If k = 1, then the state is said to be aperiodic; otherwise (k > 1), the state is said to be periodic with period k. It can be shown that every state in a communicating class must have the same period.
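The definition translates directly into a gcd over return times with positive probability. A sketch that scans n-step transition matrices up to a finite horizon (the horizon and example chains are illustrative choices):

```python
# Sketch: period of a state as gcd{ n : p^(n)_{i->i} > 0 }, truncated
# at a finite horizon. Example chains are hypothetical.
from math import gcd

def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

def period(P, i, horizon=50):
    """gcd of the return times of state i, up to `horizon` steps."""
    k, Pn = 0, P                  # Pn holds P^n as n advances
    for n in range(1, horizon + 1):
        if Pn[i][i] > 1e-15:      # positive return probability at step n
            k = gcd(k, n)
        Pn = mat_mult(Pn, P)
    return k

# Deterministic 2-cycle: returns to state 0 only at n = 2, 4, 6, ...
cycle = [[0.0, 1.0],
         [1.0, 0.0]]
assert period(cycle, 0) == 2      # periodic with period 2

# A self-loop allows a return at n = 1, so the state is aperiodic.
loop = [[0.5, 0.5],
        [1.0, 0.0]]
assert period(loop, 0) == 1
```

Both states of the 2-cycle have period 2, as expected from the "same period within a communicating class" property.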
  • 225.
225 SOLO Review of Probability Existence Theorems
Existence Theorem 3
Given a function S(ω) = S(−ω) or, equivalently, a positive-definite function R(τ) (R(τ) = R(−τ), and R(0) = max R(τ) for all τ), we can find a stochastic process x(t) having S(ω) as its power spectrum or R(τ) as its autocorrelation.
Proof of Existence Theorem 3
Define:
a² := (1/π) ∫_{−∞}^{+∞} S(ω) dω and f(ω) := S(ω)/(a²π) = S(−ω)/(a²π) = f(−ω)
Since f(ω) ≥ 0 and ∫_{−∞}^{+∞} f(ω) dω = 1, according to Existence Theorem 1 we can find a random variable ω with the even probability density function f(ω) and probability distribution function
P(ω) := ∫_{−∞}^{ω} f(τ) dτ
We now form the process x(t) := a cos(ωt + θ), where θ is a random variable uniformly distributed in the interval (−π, +π) and independent of ω.
  • 226.
226 SOLO Review of Probability Existence Theorems
Existence Theorem 3
Proof of Existence Theorem 3 (continue – 1)
Since θ is uniformly distributed in the interval (−π, +π), its characteristic function is
E{e^{jνθ}} = (1/2π) ∫_{−π}^{+π} e^{jνθ} dθ = (e^{jνπ} − e^{−jνπ})/(2πjν) = sin(νπ)/(νπ)
or
E{e^{jνθ}} = E{cos(νθ)} + j E{sin(νθ)} = sin(νπ)/(νπ)
For ν = 1 this gives E{cos θ} = E{sin θ} = 0, so, using the independence of ω and θ:
E{x(t)} = a E{cos(ωt)} E{cos θ} − a E{sin(ωt)} E{sin θ} = 0
For ν = 2 it gives E{cos 2θ} = E{sin 2θ} = 0, so:
E{x(t) x(t+τ)} = a² E{cos(ωt + θ) cos(ω(t+τ) + θ)}
= (a²/2) E{cos(ωτ)} + (a²/2) E{cos(ω(2t+τ) + 2θ)}
= (a²/2) E{cos(ωτ)} + (a²/2) E{cos(ω(2t+τ))} E{cos 2θ} − (a²/2) E{sin(ω(2t+τ))} E{sin 2θ}
= (a²/2) E{cos(ωτ)}
  • 227.
227 SOLO Review of Probability Existence Theorems
Existence Theorem 3
Proof of Existence Theorem 3 (continue – 2)
We have x(t) := a cos(ωt + θ), with:
E{x(t)} = 0
E{x(t) x(t+τ)} = (a²/2) E{cos(ωτ)} = (a²/2) ∫_{−∞}^{+∞} f(ω) cos(ωτ) dω = R_x(τ)
Because of these two properties, x(t) is wide-sense stationary, with power spectrum given by the Fourier pair:
S_x(ω) = ∫_{−∞}^{+∞} R_x(τ) [cos(ωτ) − j sin(ωτ)] dτ = ∫_{−∞}^{+∞} R_x(τ) cos(ωτ) dτ  (since R_x(τ) = R_x(−τ))
R_x(τ) = (1/2π) ∫_{−∞}^{+∞} S_x(ω) [cos(ωτ) + j sin(ωτ)] dω = (1/2π) ∫_{−∞}^{+∞} S_x(ω) cos(ωτ) dω  (since S_x(ω) = S_x(−ω))
Comparing with the f(ω) definition, therefore:
S_x(ω) = a²π f(ω) = S(ω)
q.e.d.
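The construction in the proof can be simulated. Below, the frequency ω takes the values ±1 with equal probability (an even density), so the theorem predicts R(τ) = (a²/2)·cos τ; the parameter values are illustrative choices, not from the slides.

```python
# Sketch: Monte Carlo check of E{x(t) x(t+tau)} for x(t) = a*cos(omega*t + theta),
# theta ~ Uniform(-pi, pi) independent of omega, omega = +-1 equiprobable.
import math, random

random.seed(0)
a, tau, N = 2.0, 0.7, 200_000
acc = 0.0
for _ in range(N):
    omega = random.choice([-1.0, 1.0])        # symmetric (even) frequency law
    theta = random.uniform(-math.pi, math.pi)
    acc += (a * math.cos(theta)) * (a * math.cos(omega * tau + theta))
estimate = acc / N                            # sample E{x(0) x(tau)}
expected = (a * a / 2.0) * math.cos(tau)      # R(tau) predicted by the proof
assert abs(estimate - expected) < 0.03
```

With 200 000 samples the Monte Carlo error is a few thousandths, well inside the asserted tolerance.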
  • 228.
228 SOLO Permutations & Combinations
Permutations
Given n objects that can be arranged in a row, how many different permutations (new orderings of the objects) are possible?
To count the possible permutations, let us start by moving only the first object {1}.
[Figure: the n arrangements obtained by placing object {1} in each of the n positions]
By moving only the first object {1}, we obtain n permutations.
  • 229.
229 SOLO Permutations & Combinations
Permutations (continue – 1)
Since we have obtained all the possible positions of the first object, we now perform the same procedure with the second object {2}, which changes position with each of the other objects, in each of the n permutations obtained before. For example, from group 1 we obtain the following new permutations:
[Figure: object {2} exchanged with each of the remaining objects, giving n − 1 new permutations]
Since this is true for all permutations (n − 1 new permutations for each of the first n permutations), we obtain a total of n (n − 1) permutations.
  • 230.
230 SOLO Permutations & Combinations
Permutations (continue – 2)
If we perform the same procedure with the third object {3}, which changes position with all other objects besides objects {1} and {2} (those arrangements we already obtained), in each of the n (n−1) permutations obtained before, we obtain a total of n (n−1) (n−2) permutations.
We continue the procedure with the objects {4}, {5}, …, {n}, to obtain finally the total number of permutations of the n objects:
n (n−1) (n−2) (n−3) ⋯ 1 = n!
Gamma Function Γ
The gamma function Γ is defined as:
Γ(a) = ∫_0^∞ t^(a−1) exp(−t) dt
If a = n is an integer then, integrating by parts (u = tⁿ, dv = exp(−t) dt):
Γ(n+1) = ∫_0^∞ tⁿ exp(−t) dt = [−tⁿ exp(−t)]_0^∞ + n ∫_0^∞ t^(n−1) exp(−t) dt = n Γ(n)
Γ(1) = ∫_0^∞ exp(−t) dt = [−exp(−t)]_0^∞ = 1
Therefore:
Γ(n+1) = n Γ(n) = n (n−1) ⋯ 2 · 1 = n!
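Both facts on this slide are easy to spot-check: the identity Γ(n+1) = n! with the standard-library gamma function, and the count n! by enumerating the orderings of a small set.

```python
# Sketch: numeric check of Gamma(n+1) = n!, and brute-force count of
# the n! orderings of n objects for a small n.
import math
from itertools import permutations

# Gamma(n+1) agrees with n! for the first few integers.
for n in range(1, 10):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))

# 4 objects can be arranged in 4! = 24 different orders.
assert len(list(permutations(range(4)))) == math.factorial(4)
```

`math.gamma` evaluates the same integral definition given above, so the agreement is exactly the recursion Γ(n+1) = n Γ(n), Γ(1) = 1.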
  • 231.
231 SOLO Permutations & Combinations
Combinations
Given k boxes, each box having a maximum capacity (for box i the maximum object capacity is n_i), and given also n objects that must be arranged in the k boxes, each box must be filled to its maximum capacity:
n_1 + n_2 + ⋯ + n_k = n
The order of the objects in a box is not important.
Example: a box with a capacity of three objects, in which we arranged the objects {2}, {4}, {7}. The orderings {2,4,7}, {2,7,4}, {4,2,7}, {4,7,2}, {7,2,4}, {7,4,2} are all equivalent: 3! = 6 arrangements, 1 outcome.
  • 232.
232 SOLO Permutations & Combinations
Combinations (continue – 1)
In order to count the different combinations, we start with the n! different arrangements of the n objects.
In each of the n! arrangements the first n_1 objects go to box no. 1, the next n_2 objects to box no. 2, and so on, with the last n_k objects going to box no. k. Since
n_1 + n_2 + ⋯ + n_k = n
all the objects end up in one of the boxes.
  • 233.
233 SOLO Permutations & Combinations
Combinations (continue – 2)
But since the order of the objects in the boxes is not important, to obtain the number of different combinations we must divide the total number of permutations n! by n_1!, because of box no. 1, as seen in the example below, where we used n_1 = 2.
[Figure: with n_1 = 2, arrangements that differ only in the order of the two objects in box no. 1 collapse to the same combination, so each combination is counted n_1! times]
Therefore, since the order of the objects in the boxes is not important, and because box no. 1 can contain only n_1 objects, the number of combinations is n!/n_1!.
  • 234.
234 SOLO Permutations & Combinations
Combinations (continue – 3)
Since the order of the objects in the boxes is not important, to obtain the number of different combinations we must divide the total number of arrangements n! by n_1! (because of box no. 1), by n_2! (because of box no. 2), and so on, up to n_k! (because of box no. k), to obtain:
n! / (n_1! n_2! ⋯ n_k!)
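The multinomial count n!/(n_1!⋯n_k!) can be confirmed by brute force: enumerate all n! arrangements, forget the order inside each box, and count the distinct splittings that remain.

```python
# Sketch: multinomial coefficient n!/(n1!*...*nk!) versus a brute-force
# enumeration of box contents for a small case.
import math
from itertools import permutations

def multinomial(sizes):
    """Number of ways to fill boxes of the given sizes with n = sum(sizes) objects."""
    count = math.factorial(sum(sizes))
    for s in sizes:
        count //= math.factorial(s)   # order inside each box is irrelevant
    return count

# 4 objects into two boxes of size 2: formula gives 4!/(2!*2!) = 6.
assert multinomial((2, 2)) == 6

# Brute force: the unordered content of each box is a frozenset.
splits = {(frozenset(p[:2]), frozenset(p[2:])) for p in permutations(range(4))}
assert len(splits) == multinomial((2, 2))
```

Each distinct splitting appears 2!·2! = 4 times among the 4! = 24 arrangements, which is exactly the division performed on this slide.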
  • 235.
235 SOLO Review of Probability References
[1] W.B. Davenport, Jr., and W.I. Root, “An Introduction to the Theory of Random Signals and Noise”, McGraw-Hill, 1958
[2] A. Papoulis, “Probability, Random Variables and Stochastic Processes”, McGraw-Hill, 1965
[3] K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988
[4] S.M. Ross, “Introduction to Probability Models”, 4th Ed., Academic Press, 1989
[5] S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990
[6] R.M. McDonough, and A.W. Whalen, “Detection of Signals in Noise”, 2nd Ed., Academic Press, 1995
[7] Al. Spătaru, “Teoria Transmisiunii Informaţiei – Semnale şi Perturbaţii” (Information Transmission Theory – Signals and Perturbations, in Romanian), Editura Tehnică, Bucureşti, 1965
[8] http://www.york.ac.uk/depts/maths/histstat/people/welcome.htm
[9] http://en.wikipedia.org/wiki/Category:Probability_and_statistics
[10] http://www-groups.dcs.st-and.ac.uk/~history/Biographies
  • 236.
236 SOLO Review of Probability
Integrals Used in Probability
∫_0^1 uⁿ (1−u)ᵐ du = n! m! / (n+m+1)!
∫ x exp(ax) dx = exp(ax) (x/a − 1/a²)
∫ x² exp(ax) dx = exp(ax) (x²/a − 2x/a² + 2/a³)
∫_0^∞ exp(−x²) dx = √π / 2
∫_0^∞ exp(−a x²) dx = (1/2) √(π/a),  a > 0
∫_{−∞}^{+∞} exp(−x²) dx = √π
∫_{−∞}^{+∞} exp(−a x²) dx = √(π/a),  a > 0
∫_0^∞ xⁿ exp(−x) dx = n!,  n = 0, 1, 2, 3, …
∫_0^∞ xⁿ exp(−a x) dx = n! / a^(n+1),  a > 0, n = 0, 1, 2, 3, …
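Two entries of the table can be spot-checked with a simple trapezoid rule; the step counts and cutoff below are ad-hoc choices for illustration.

```python
# Sketch: numeric verification of two table entries with the trapezoid rule.
import math

def trapezoid(f, lo, hi, steps):
    """Composite trapezoid rule for f on [lo, hi] with `steps` subintervals."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Integral of x^n exp(-a x) over [0, inf) equals n!/a^(n+1); here n = 3, a = 2,
# so the value is 6/16 = 0.375 (the tail beyond x = 40 is negligible).
n, a = 3, 2.0
val = trapezoid(lambda x: x**n * math.exp(-a * x), 0.0, 40.0, 200_000)
assert math.isclose(val, math.factorial(n) / a**(n + 1), rel_tol=1e-6)

# Integral of u^n (1-u)^m over [0, 1] equals n! m!/(n+m+1)!; here n = 2, m = 3,
# so the value is 2*6/720 = 1/60.
n, m = 2, 3
val = trapezoid(lambda u: u**n * (1 - u)**m, 0.0, 1.0, 100_000)
assert math.isclose(val, math.factorial(n) * math.factorial(m)
                    / math.factorial(n + m + 1), rel_tol=1e-6)
```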
  • 237.
237 SOLO Review of Probability Gamma Function
  • 238.
238 SOLO Review of Probability Incomplete Gamma Function
  • 239.
239 SOLO
January 6, 2015
Technion – Israel Institute of Technology: 1964 – 1968 BSc EE, 1968 – 1971 MSc EE
Israeli Air Force: 1970 – 1974
RAFAEL – Israeli Armament Development Authority: 1974 – 2013
Stanford University: 1983 – 1986 PhD AA
  • 240.
240 SOLO Review of Probability
Perron–Frobenius Theorem
In linear algebra, the Perron–Frobenius Theorem, named after Oskar Perron (1880 – 1975) and Ferdinand Georg Frobenius (1849 – 1919), asserts that a real square matrix with positive entries has a unique largest real eigenvalue and that the corresponding eigenvector has strictly positive components. This theorem has important applications to probability theory (ergodicity of Markov chains) and to the theory of dynamical systems (subshifts of finite type).
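For a positive stochastic matrix the Perron eigenvalue is 1, and repeated application of the chain drives any initial distribution to the corresponding strictly positive eigenvector (the stationary distribution). A sketch with a hypothetical row-stochastic matrix:

```python
# Sketch: power iteration pi <- pi P converges to the Perron eigenvector
# (eigenvalue 1) of a positive row-stochastic matrix P. P is hypothetical.
def step(pi, P):
    """One update of a row distribution vector: returns pi * P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.1, 0.6, 0.3],
     [0.5, 0.2, 0.3],
     [0.6, 0.3, 0.1]]      # all entries positive, rows sum to 1

pi = [1.0, 0.0, 0.0]       # any initial distribution works
for _ in range(200):
    pi = step(pi, P)

# pi is now (numerically) a fixed point: pi P = pi, with strictly
# positive components, as Perron-Frobenius guarantees.
nxt = step(pi, P)
assert all(abs(nxt[j] - pi[j]) < 1e-12 for j in range(3))
assert all(x > 0 for x in pi)
assert abs(sum(pi) - 1.0) < 1e-9
```

Convergence is geometric at the rate of the second-largest eigenvalue modulus, which for a strictly positive matrix is below 1.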
  • 241.
SOLO Review of Probability
Monte Carlo Categories
1. Monte Carlo Calculations: design various random or pseudo-random number generators.
2. Monte Carlo Sampling: develop efficient (variance-reduction oriented) sampling techniques for estimation.
3. Monte Carlo Optimization: optimize some (non-convex, non-differentiable) functions using, for example, simulated annealing, dynamic weighting, or genetic algorithms.

Editor's Notes

  • #7 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #16 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #20 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #29 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.107-108
  • #32 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.147-148
  • #33 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp. 147-149
  • #34 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp. 147-149
  • #35 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.245-246
  • #36 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.245-246
  • #37 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.126-132
  • #38 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.126-132
  • #39 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #43 S.M. Ross,”Introduction to Probability Models”, 5th Ed., Academic Press. Pg.58
  • #51 A. Papoulis, “ Probability, Random Variables and StochasticProcesses”, McGraw-Hill, 1965, pp.99-100
  • #52 A. Papoulis, “ Probability, Random Variables and StochasticProcesses”, McGraw-Hill, 1965, pp.169
  • #53 http://en.wikipedia.org/wiki/Histogram
  • #54 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #55 http://www.math.rutgers.edu/courses/591/pchap3.pdf http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #58 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pg.151 http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Bienayme.html
  • #59 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #60 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #61 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #62 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #63 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #64 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #65 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #66 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #67 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #68 K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988, pp. 78-80 http://www.math.rutgers.edu/courses/591 http://omega.albany.edu:8008/machine-learning-dir/notes-dir/vc1/vc-l.html
  • #69 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.260-263 http://en.wikipedia.org/wiki/Convergence_in_distribution
  • #70 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.260-263
  • #71 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.263-266 http://en.wikipedia.org/wiki/Law_of_large_numbers
  • #72 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.263-266 http://en.wikipedia.org/wiki/Law_of_large_numbers
  • #73 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.260-263 http://en.wikipedia.org/wiki/Law_of_large_numbers
  • #74 http://en.wikipedia.org/wiki/Central_limit_theorem
  • #82 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pg.85
  • #83 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.74-75
  • #86 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”, pg. 23
  • #88 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #89 Development of Sterling asymptotical approximation can be found in André Angot:”Compléments de Mathématiques”, § 9.1.4
  • #92 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.76
  • #93 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.74-75
  • #94 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #99 Sheldon M.Ross, ”Introduction to Probability Models”
  • #101 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.250-251
  • #102 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.250-251
  • #103 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.250-251
  • #104 Bar-Shalom, Fortman, T.E., “Tracking and Data Association”, Academic Press, 1988, pp. 70-79 Bar-Shalom, Y, Li, X-R, “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp.235-250
  • #105 Bar-Shalom, Fortman, T.E., “Tracking and Data Association”, Academic Press, 1988, pp. 70-79 Bar-Shalom, Y, Li, X-R, “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp.235-250
  • #106 R. McDonough & A.D. Whalen, “Detection of Signals in Noise”, 2nd Ed., pg. 142 http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Gosset.html
  • #107 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #108 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #109 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”, pp. 46-47
  • #110 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”, pp. 46-47
  • #111 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”, pp. 46-47
  • #112 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”, pp. 46-47
  • #131 http://en.wikipedia.org/wiki/Monte_Carlo_sampling http://www.lanl.gov/news/pdf/Metropolis_bio.pdf
  • #133 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pg.147, Prob. 4-10
  • #134 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pg.147, Problem 4-10
  • #135 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pg.147, Problem 4-10
  • #136 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #137 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #138 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #139 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #140 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #141 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #142 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #143 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #144 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #145 Bar-Shalom, Y., Xiao-Rong, L., “Estimation and Tracking: Principles, Techniques, and Software”, Artech House, 1993, pp. 108-109
  • #146 http://www.etsu.edu/math/seier/Kurto100years.doc http://en.wikipedia.org/wiki/Kurtosis http://en.wikipedia.org/wiki/Karl_Pearson
  • #147 http://www.etsu.edu/math/seier/Kurto100years.doc http://en.wikipedia.org/wiki/Kurtosis http://en.wikipedia.org/wiki/Karl_Pearson
  • #148 http://en.wikipedia.org/wiki/Skewness http://en.wikipedia.org/wiki/Kurtosis http://en.wikipedia.org/wiki/Karl_Pearson
  • #149 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.105-106, Problem 4-1-1
  • #150 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.105-106, Problem 4-1-1
  • #151 Ross, S.,M., “A Course in Simulation”, Collier Macmillan Publishers, A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.105-106, Problem 4-1-1
  • #152 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.113-114, Example 4-2-1
  • #153 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.113-114, Example 4-2-1
  • #154 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pp.1243-126, Example 4-3-1
  • #155 University of Alberta “ Principles of Monte Carlo Simulation”, February 2001
  • #156 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996 (http://www.npac.syr.edu/users/paulc/montecarlo/p_montecarlo.html) Ren, J., “Pseudorandom Number Generators and the Metropolis Algorithm”, (http://www.hep.fsu.edu/~berd/teach/mcmc05/homework/Ren_RandomNumbers.ppt)
  • #157 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996
  • #158 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 36 – 37 Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996
  • #159 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 36 – 37 Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996
  • #160 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 36 – 37 Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996 http://en.wikipedia.org/wiki/Histogram
  • #161 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 44 - 50
  • #162 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 44 - 45
  • #163 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 49 - 50
  • #164 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 50 - 51
  • #165 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 51 - 52
  • #166 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 51 - 52
  • #167 Karlsson, R., “ Simulation Based Methods for Target Tracking”, Linkoping Studies in Science and Technology, Thesis No. 930, 2002, pp. 34 – 35, , http://www.control.isy.liu.se/research/reports/LicentiateThesis/Lic930.pdf
  • #168 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 59 - 60
  • #169 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 51 - 52
  • #170 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 51 - 52
  • #171 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 http://en.wikipedia.org/wiki/Bootstrapping_(statistics) Efron, B., “Bootstrap methods: another look at the jackknife”, The Annals of Statistics”, 1979, no.7, pp. 1-26 http://www-stat.stanford.edu/~ckirby/brad/ http://en.wikipedia.org/wiki/Bradley_Efron
  • #172 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 http://en.wikipedia.org/wiki/Bootstrapping_(statistics) Efron, B., “Bootstrap methods: another look at the jackknife”, The Annals of Statistics”, 1979, no.7, pp. 1-26
  • #173 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 135 - 136
  • #174 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 135 – 136 Karlsson, R., “Simulation Based Methods for Target Tracking”, Linkoping Studies in Science and Technology, Thesis No. 930, 2002, pp. 35 – 36, http://www.control.isy.liu.se/research/reports/LicentiateThesis/Lic930.pdf
  • #175 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 (http://www.ualberta.ca/~cdeutsh/MCS_course.htm) Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architecture Center, January 1996 Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., “ Equations of state calculations by fast computing machine”, Journal of Chemical Physics, 1953, Vol. 21(6), pp.1087-1092 http://en.wikipedia.org/wiki/Equations_of_State_Calculations_by_Fast_Computing_Machines http://en.wikipedia.org/wiki/Metropolis-Hastings_alorithm Damen, D., “A Tutorial on Markov Chain Monte Carlo (MCMC)”, Maths Club, December 2nd 2008, University of Leeds Zhu, Delleard and Tu, “Markov Chain Monte Carlo for Computer Vision – a Tutorial for ICCV05” (http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm) Chib, S, Greenberg, E., “Understanding the Metropolis-Hastings Algorithm”, The American Statistician, November 1995, Vol. 49, No. 4, pp. 327-335, (http://astro.temple.edu/~msobel/courses_files/firstmetropolis.pdf)
  • #176 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 (http://www.ualberta.ca/~cdeutsh/MCS_course.htm) (http://www.ualberta.ca/~cdeutsh/MCS_course.htm) Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architecture Center, January 1996 (http://www.npac.syr.edu/users/paulc/montecarlo/p_montecarlo.html) Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., “ Equations of state calculations by fast computing machine”, Journal of Chemical Physics, 1953, Vol. 21(6), pp.1087-1092 http://en.wikipedia.org/wiki/Equations_of_State_Calculations_by_Fast_Computing_Machines http://en.wikipedia.org/wiki/Metropolis-Hastings_alorithm Damen, D., “A Tutorial on Markov Chain Monte Carlo (MCMC)”, Maths Club, December 2nd 2008, University of Leeds (http://www.comp.leeds.ac.uk/dima/MCMCTutorial.htm) Zhu, Delleard and Tu, “Markov Chain Monte Carlo for Computer Vision – a Tutorial for ICCV05” (http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm)
  • #177 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 (http://www.ualberta.ca/~cdeutsh/MCS_course.htm) Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architecture Center, January 1996 Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., “ Equations of state calculations by fast computing machine”, Journal of Chemical Physics, 1953, Vol. 21(6), pp.1087-1092 http://en.wikipedia.org/wiki/Equations_of_State_Calculations_by_Fast_Computing_Machines http://en.wikipedia.org/wiki/Metropolis-Hastings_alorithm Damen, D., “A Tutorial on Markov Chain Monte Carlo (MCMC)”, Maths Club, December 2nd 2008, University of Leeds Zhu, Delleard and Tu, “Markov Chain Monte Carlo for Computer Vision – a Tutorial for ICCV05” http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm http://www.probability.ca/hastings/
  • #178–#191 University of Alberta, “Principles of Monte Carlo Simulation”, February 2001 (http://www.ualberta.ca/~cdeutsh/MCS_course.htm); Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architecture Center, January 1996; Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., “Equation of State Calculations by Fast Computing Machines”, Journal of Chemical Physics, 1953, Vol. 21(6), pp. 1087–1092; http://en.wikipedia.org/wiki/Equations_of_State_Calculations_by_Fast_Computing_Machines; http://en.wikipedia.org/wiki/Metropolis-Hastings_alorithm; Damen, D., “A Tutorial on Markov Chain Monte Carlo (MCMC)”, Maths Club, December 2nd 2008, University of Leeds; Zhu, Dellaert and Tu, “Markov Chain Monte Carlo for Computer Vision – a Tutorial for ICCV05”, http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm; http://www.probability.ca/hastings/
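The entries above all concern the Metropolis–Hastings algorithm and MCMC. As a minimal illustration of the method those sources describe, a random-walk Metropolis sampler for a one-dimensional target can be sketched as follows (the function name and parameters are illustrative, not taken from any of the cited works):

```python
import math
import random

def metropolis_hastings(log_target, x0, proposal_std=1.0, n_samples=5000, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log-density."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces to
        # the target ratio p(x') / p(x).
        x_prop = x + rng.gauss(0.0, proposal_std)
        log_alpha = log_target(x_prop) - log_target(x)
        # Accept with probability min(1, p(x')/p(x)); otherwise stay put.
        if math.log(rng.random()) < log_alpha:
            x = x_prop
        samples.append(x)
    return samples

# Target: standard normal (unnormalized), log p(x) = -x^2 / 2.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0)
mean = sum(samples) / len(samples)
```

Only the ratio of target densities is needed, which is why the method works for unnormalized densities; this is the key point of the 1953 Metropolis et al. paper.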
  • #192 http://en.wikipedia.org/wiki/Gibbs_sampling; Grenager, T., “An Introduction to Markov Chain Monte Carlo”, July 1, 2004 (http://nlp.stanford.edu/~grenager); Damen, D., “A Tutorial on Markov Chain Monte Carlo (MCMC)”, Maths Club, December 2nd, 2008 (http://www.comp.leeds.ac.uk/dima/MCMCTutorial.htm); Sahu, S., “Tutorial Lectures on MCMC I” (http://www.soe.ucsc.edu/classes/cmps290c/winter06/paps/mcmc.pdf); University of Alberta, “Principles of Monte Carlo Simulation”, February 2001; http://www.dam.brown.edu/people/geman/index.html; http://www.cis.jhu.edu/people/faculty/geman
  • #193–#196 http://en.wikipedia.org/wiki/Gibbs_sampling; University of Alberta, “Principles of Monte Carlo Simulation”, February 2001; http://www.dam.brown.edu/people/geman/index.html; http://www.cis.jhu.edu/people/faculty/geman
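The Gibbs-sampling references above describe sampling a joint distribution by drawing from each full conditional in turn. A minimal sketch for a standard bivariate normal with correlation ρ, whose full conditionals are themselves normal (the function name is illustrative, not from the cited sources):

```python
import random

def gibbs_bivariate_normal(rho, n_samples=4000, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    cond_std = (1.0 - rho * rho) ** 0.5
    out = []
    for _ in range(n_samples):
        # Each full conditional of a bivariate normal is normal:
        # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
        x = rng.gauss(rho * y, cond_std)
        y = rng.gauss(rho * x, cond_std)
        out.append((x, y))
    return out

samples = gibbs_bivariate_normal(0.8)
```

The empirical correlation of the draws should approach ρ as the chain reaches its stationary distribution.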
  • #197–#198 Ristic, B., Arulampalam, S., Gordon, N., “Beyond the Kalman Filter – Particle Filters for Tracking Applications”, Artech House, 2004, pp. 35–36
  • #203 Di Franco & Rubin, “Radar Detection”, p. 117; Sage & Melsa, “Estimation Theory with Applications to Communications and Control”, McGraw-Hill, 1971, p. 42
  • #221–#225 http://en.wikipedia.org/wiki/Markov_chain
  • #226–#227 A. Papoulis, “Probability, Random Variables and Stochastic Processes”, McGraw-Hill, 1965, p. 350
  • #228 A. Papoulis, “Probability, Random Variables and Stochastic Processes”, McGraw-Hill, 1965, pp. 303, 350
  • #236 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #238 M. Abramowitz & I. A. Stegun, eds., “Handbook of Mathematical Functions”, Dover Publications, 1965, p. 255
  • #239 M. Abramowitz & I. A. Stegun, eds., “Handbook of Mathematical Functions”, Dover Publications, 1965, p. 261
  • #242 Zhe Chen, “Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond”, 18.05.06, Manuscript, p. 17, http://www.dsi.unifi.it/users/chisci/idfric/Nonlinear_filtering_Chen.pdf