1
Recursive
Bayesian Estimation
SOLO HERMELIN
Updated: 22.02.09
11.01.14
http://www.solohermelin.com
2
SOLO
Table of Content Recursive Bayesian Estimation
Review of Probability
Conditional Probability
Total Probability Theorem
Conditional Probability - Bayes Formula
Statistical Independent Events
Expected Value or Mathematical Expectation
Variance and Central Moments
Characteristic Function and Moment-Generating Function
Probability Distribution and Probability Density Functions (Examples)
Normal (Gaussian) Distribution
Existence Theorems 1 & 2
Monte Carlo Method
Estimation of the Mean and Variance of a Random Variable
Generating Discrete Random Variables
Existence Theorem 3
Markov Processes
Functions of one Random Variable
The Laws of Large Numbers
Central Limit Theorem
Problem Definition
Stochastic Processes
3
SOLO
Table of Content (continue -1)
Recursive Bayesian Estimation
Bayesian Estimation Introduction
Linear Gaussian Markov Systems
Closed-Form Solutions of Estimation
Kalman Filter
Extended Kalman Filter
General Bayesian Nonlinear Filters
Additive Gaussian Nonlinear Filter
Gauss – Hermite Quadrature Approximation
Unscented Kalman Filter
Monte Carlo Kalman Filter (MCKF)
Non-Additive Non-Gaussian Nonlinear Filter
Nonlinear Estimation Using Particle Filters
Importance Sampling (IS)
Sequential Importance Sampling (SIS)
Sequential Importance Resampling (SIR)
Monte Carlo Particle Filter (MCPF)
Bayesian Maximum Likelihood Estimate (Maximum Aposteriori – MAP Estimate)
4
SOLO
Table of Content (continue -2)
Recursive Bayesian Estimation
References
Nonlinear Filters based on the Fokker-Planck Equation
5
SOLO Recursive Bayesian Estimation
[Figure: hidden states x0 → x1 → x2 → … → x(k−1) → xk propagated through f(x(k−1), w(k−1)); measurements z1, z2, …, z(k−1), zk = h(xk, vk); measurement sets Z1:k−1, Z1:k]
A discrete nonlinear system is defined by
xk = f(k−1, x(k−1), w(k−1))   State vector dynamics
zk = h(k, xk, vk)             Measurements
w(k−1), vk   State and Measurement Noise Vectors, respectively
Problem Definition:
Estimate the hidden States xk of a Non-linear Dynamic Stochastic System from Noisy Measurements zk.
Since this is a probabilistic problem, we start with a reminder of Probability Theory.
Table of Content
6
SOLO
Probability Axiomatic Definition
Pr(A) is the probability of the event A if:
(1) Pr(A) ≥ 0
(2) Pr(S) = 1, where S = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = Ø ∀ i ≠ j
(3) If A = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = Ø ∀ i ≠ j, then Pr(A) = Pr(A1) + Pr(A2) + … + Pr(An)
Probability Geometric Definition
Assume that the probability of an event in a geometric region A is defined as the ratio between the surface of A and the surface of S:
Pr(A) = Surface(A) / Surface(S)
This definition satisfies the same axioms:
(1) Pr(A) ≥ 0
(2) Pr(S) = 1
(3) If A = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = Ø ∀ i ≠ j, then Pr(A) = Pr(A1) + Pr(A2) + … + Pr(An)
[Figure: region A inside the sample space S]
Review of Probability
A more detailed explanation
of the subject is given in the
“Probability” Presentation
7
SOLO
From those definitions we can prove the following:
(1') Pr(Ø) = 0
Proof: S = S ∪ Ø and S ∩ Ø = Ø ⇒ (by axiom 3) Pr(S) = Pr(S) + Pr(Ø) ⇒ Pr(Ø) = 0
(2') Pr(Ā) = 1 − Pr(A)
Proof: S = A ∪ Ā and A ∩ Ā = Ø ⇒ (by axioms 2 & 3) Pr(S) = 1 = Pr(A) + Pr(Ā) ⇒ Pr(Ā) = 1 − Pr(A)
(3') 0 ≤ Pr(A) ≤ 1
Proof: Pr(A) = 1 − Pr(Ā) ≤ 1 (by (2') and Pr(Ā) ≥ 0), and Pr(A) ≥ 0 (by axiom 1)
(Axioms: (1) Pr(A) ≥ 0; (2) Pr(S) = 1; (3) additivity over disjoint events.)
(4') If A ⊂ B ⇒ Pr(A) ≤ Pr(B)
Proof: B = A ∪ (B − A) and A ∩ (B − A) = Ø ⇒ (by axiom 3) Pr(B) = Pr(A) + Pr(B − A) ≥ Pr(A)
(5') Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
Proof: A ∪ B = A ∪ (B − A ∩ B) with A ∩ (B − A ∩ B) = Ø, and B = (A ∩ B) ∪ (B − A ∩ B) with (A ∩ B) ∩ (B − A ∩ B) = Ø
⇒ (by axiom 3) Pr(A ∪ B) = Pr(A) + Pr(B − A ∩ B) and Pr(B) = Pr(A ∩ B) + Pr(B − A ∩ B)
⇒ Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
Table of Content
Review of Probability
8
SOLO
Conditional Probability
Given two events A and B decomposed in elementary events:
A = Aα1 ∪ Aα2 ∪ … ∪ Aαn = ∪_{i=1}^{n} Aαi,   Aαi ∩ Aαj = Ø ∀ i ≠ j
B = Aβ1 ∪ Aβ2 ∪ … ∪ Aβm = ∪_{k=1}^{m} Aβk,   Aβk ∩ Aβl = Ø ∀ k ≠ l
A ∩ B = Aαβ1 ∪ Aαβ2 ∪ … ∪ Aαβr,   Aαβi ∩ Aαβj = Ø ∀ i ≠ j
Pr(A) = Pr(Aα1) + Pr(Aα2) + … + Pr(Aαn)   Pr(B) = Pr(Aβ1) + Pr(Aβ2) + … + Pr(Aβm)
Pr(A ∩ B) = Pr(Aαβ1) + Pr(Aαβ2) + … + Pr(Aαβr),   r ≤ m, n
We want to find the probability of the event A under the condition that the event B has occurred, denoted P(A|B):
Pr(A|B) = [Pr(Aαβ1) + Pr(Aαβ2) + … + Pr(Aαβr)] / [Pr(Aβ1) + Pr(Aβ2) + … + Pr(Aβm)] = Pr(A ∩ B) / Pr(B)
[Figure: sample space S decomposed into the elementary events Aαi, Aβk and their intersections Aαβi]
Review of Probability
9
SOLO
Conditional Probability
If the events A and B are statistically independent, then the fact that B occurred will not affect the probability of A to occur:
Pr(A|B) = Pr(A ∩ B) / Pr(B)    Pr(B|A) = Pr(A ∩ B) / Pr(A)
Pr(A|B) = Pr(A)  ⇒  Pr(A ∩ B) = Pr(A|B)·Pr(B) = Pr(B|A)·Pr(A) = Pr(A)·Pr(B)
Definition:
n events Ai, i = 1,2,…,n are statistically independent if:
Pr(∩_{i=1}^{r} Ai) = ∏_{i=1}^{r} Pr(Ai)   ∀ r = 2,…,n
Table of Content
Review of Probability
10
SOLO
Conditional Probability - Bayes Formula
Using the relations:
Pr(Aβl ∩ B) = Pr(B|Aβl)·Pr(Aβl) = Pr(Aβl|B)·Pr(B)
B = ∪_{k=1}^{m} (Aβk ∩ B),   (Aβk ∩ B) ∩ (Aβl ∩ B) = Ø ∀ k ≠ l
Pr(B) = Σ_{k=1}^{m} Pr(Aβk ∩ B)
we obtain:
Pr(Aβl|B) = Pr(B|Aβl)·Pr(Aβl) / Pr(B) = Pr(B|Aβl)·Pr(Aβl) / Σ_{k=1}^{m} Pr(B|Aβk)·Pr(Aβk)
Bayes Formula
Thomas Bayes
1702 - 1761
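As a small numerical illustration of the formula above, the sketch below computes the posterior Pr(Aβl | B) from an assumed prior and likelihood (the numbers are made up for the example):

```python
# Bayes formula: posterior_l = P(B|A_l) * P(A_l) / sum_k P(B|A_k) * P(A_k)
prior = [0.5, 0.3, 0.2]          # P(A_l), must sum to 1 (assumed values)
likelihood = [0.9, 0.5, 0.1]     # P(B | A_l) (assumed values)

evidence = sum(l * p for l, p in zip(likelihood, prior))            # P(B), total probability
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]   # P(A_l | B)

print(evidence)    # P(B)
print(posterior)   # posterior probabilities, they sum to 1
```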
Table of Content
Review of Probability
11
SOLO
Total Probability Theorem
Table of Content
If A1 ∪ A2 ∪ … ∪ An = S and Ai ∩ Aj = Ø ∀ i ≠ j,
we say that the sample space S is decomposed into exhaustive and incompatible (exclusive) sets.
The Total Probability Theorem states that for any event B, its probability can be decomposed in terms of conditional probabilities as follows:
Pr(B) = Σ_{i=1}^{n} Pr(B, Ai) = Σ_{i=1}^{n} Pr(B|Ai) Pr(Ai)
Using the relations:
Pr(Al ∩ B) = Pr(B|Al)·Pr(Al) = Pr(Al|B)·Pr(B)
B = ∪_{k=1}^{n} (Ak ∩ B),   (Ak ∩ B) ∩ (Al ∩ B) = Ø ∀ k ≠ l   (for any event B)
Pr(B) = Σ_{k=1}^{n} Pr(Ak ∩ B)
we obtain the result.
Review of Probability
12
SOLO
Statistical Independent Events
From the Theorem of Addition:
Pr(∪_{i=1}^{n} Ai) = Σ_{i} Pr(Ai) − Σ_{i≠j} Pr(Ai ∩ Aj) + Σ_{i≠j≠k} Pr(Ai ∩ Aj ∩ Ak) − ⋯ + (−1)^{n−1} Pr(A1 ∩ A2 ∩ ⋯ ∩ An)
If the Ai are statistically independent:
Pr(∪_{i=1}^{n} Ai) = Σ_{i} Pr(Ai) − Σ_{i≠j} Pr(Ai) Pr(Aj) + Σ_{i≠j≠k} Pr(Ai) Pr(Aj) Pr(Ak) − ⋯ + (−1)^{n−1} ∏_{i=1}^{n} Pr(Ai)
Therefore
1 − Pr(∪_{i=1}^{n} Ai) = ∏_{i=1}^{n} [1 − Pr(Ai)]     (Ai statistically independent)
Pr(∪_{i=1}^{n} Ai) = 1 − ∏_{i=1}^{n} [1 − Pr(Ai)]
Since (∪_{i=1}^{n} Ai) ∪ (∩_{i=1}^{n} Āi) = S and (∪_{i=1}^{n} Ai) ∩ (∩_{i=1}^{n} Āi) = Ø   (De Morgan),
Pr(∩_{i=1}^{n} Āi) = 1 − Pr(∪_{i=1}^{n} Ai) = ∏_{i=1}^{n} [1 − Pr(Ai)] = ∏_{i=1}^{n} Pr(Āi)     (Ai statistically independent)
If the n events Ai, i = 1,2,…,n are statistically independent, then the complementary events Āi are also statistically independent.
(Recall the definition: Pr(∩_{i=1}^{r} Ai) = ∏_{i=1}^{r} Pr(Ai)  ∀ r = 2,…,n.)
Table of Content
Review of Probability
13
SOLO Review of Probability
Expected Value or Mathematical Expectation
Given a Probability Density Function p (x) we define the Expected Value
For a Continuous Random Variable:  E(x) := ∫_{−∞}^{+∞} x pX(x) dx
For a Discrete Random Variable:  E(x) := Σ_k xk pX(xk)
For a general function g(x) of the Random Variable x:  E[g(x)] := ∫_{−∞}^{+∞} g(x) pX(x) dx
Since ∫_{−∞}^{+∞} pX(x) dx = 1:
E(x) = ∫_{−∞}^{+∞} x pX(x) dx / ∫_{−∞}^{+∞} pX(x) dx
The Expected Value is the centroid of the area enclosed between the Probability Density Function and the x axis.
[Figure: density pX(x) with its centroid E(x) marked on the x axis]
Table of Content
14
SOLO Review of Probability
Variance
Given a Probability Density Function p(x) we define the Variance
Var(x) := E{[x − E(x)]²} = E[x² − 2 x E(x) + E(x)²] = E(x²) − [E(x)]²
Moment about the origin
Given a Probability Density Function p(x) we define the Moment of order k about the origin
µ'_k(x) := E{x^k}
Central Moment
Given a Probability Density Function p(x) we define the Central Moment of order k about the Mean E(x)
µ_k(x) := E{[x − E(x)]^k} = Σ_{j=0}^{k} (k choose j) (−1)^{k−j} µ'_j(x) [E(x)]^{k−j}
Table of Content
15
SOLO Review of Probability
Moments
Normal Distribution:  pX(x; σ) = exp[−x²/(2σ²)] / (√(2π) σ)
E[x^n] = 1·3·⋯·(n−1)·σ^n   for n even
E[x^n] = 0                 for n odd
and, for the absolute moments,
E[|x|^n] = 1·3·⋯·(n−1)·σ^n              for n = 2k
E[|x|^n] = √(2/π)·2^k·k!·σ^{2k+1}       for n = 2k+1
Proof:
Start from ∫_{−∞}^{+∞} exp(−a x²) dx = √(π/a), a > 0, and differentiate k times with respect to a:
∫_{−∞}^{+∞} x^{2k} exp(−a x²) dx = [1·3·⋯·(2k−1) / (2a)^k] √(π/a),   a > 0
Substitute a = 1/(2σ²) to obtain E[x^{2k}] = 1·3·⋯·(2k−1)·σ^{2k}.
For the odd absolute moments, with the substitution y = x²/(2σ²):
E[|x|^{2k+1}] = 2 ∫_0^{∞} x^{2k+1} exp[−x²/(2σ²)] dx / (√(2π) σ) = 2^{k+1} σ^{2k+1} ∫_0^{∞} y^k exp(−y) dy / √(2π) = √(2/π)·2^k·k!·σ^{2k+1}
Now let us compute:
E[x⁴] = 3 σ⁴ = 3 (E[x²])²
Chi-square
16
SOLO Review of Probability
Functions of one Random Variable
Let y = g(x) be a given function of the random variable x defined on the domain Ω, with probability density pX(x). We want to find pY(y).
Fundamental Theorem
Assume x1, x2, …, xn are all the solutions of the equation
y = g(x1) = g(x2) = … = g(xn)
Then
pY(y) = pX(x1)/|g'(x1)| + pX(x2)/|g'(x2)| + … + pX(xn)/|g'(xn)|,   where g'(x) := dg(x)/dx
Proof
pY(y)|dy| := Pr{y ≤ Y ≤ y + dy} = Σ_{i=1}^{n} Pr{xi ≤ x ≤ xi ± dxi} = Σ_{i=1}^{n} pX(xi)|dxi| = Σ_{i=1}^{n} pX(xi) |dy| / |g'(xi)|
q.e.d.
17
SOLO Review of Probability
Functions of one Random Variable (continue – 1)
Example 1:  y = a x + b   ⇒   pY(y) = (1/|a|) pX((y − b)/a)
Example 2:  y = a/x       ⇒   pY(y) = (a/y²) pX(a/y)
Example 3:  y = a x²      ⇒   pY(y) = [1/(2a√(y/a))] [pX(√(y/a)) + pX(−√(y/a))] U(y)
Example 4:  y = |x|       ⇒   pY(y) = [pX(y) + pX(−y)] U(y)
(U(y) denotes the unit step function.)
Table of Content
18
SOLO Review of Probability
Characteristic Function and Moment-Generating Function
Given a Probability Density Function pX(x) we define the Characteristic Function or Moment-Generating Function
ΦX(ω) := E[exp(jωx)] = ∫_{−∞}^{+∞} exp(jωx) pX(x) dx    (x continuous)
        = Σ_x exp(jωx) pX(x)                             (x discrete)
This is in fact the complex conjugate of the Fourier Transform of the Probability Density Function. This function is always defined since the sufficient condition for the existence of a Fourier Transform,
∫_{−∞}^{+∞} |pX(x)| dx = ∫_{−∞}^{+∞} pX(x) dx = 1 < ∞     (pX(x) ≥ 0),
is always fulfilled.
Given the Characteristic Function we can find the Probability Density Function pX(x) using the Inverse Fourier Transform:
pX(x) = (1/2π) ∫_{−∞}^{+∞} exp(−jωx) ΦX(ω) dω
19
SOLO Review of Probability
Properties of Moment-Generating Function
ΦX(ω) = ∫_{−∞}^{+∞} exp(jωx) pX(x) dx
ΦX(ω)|_{ω=0} = ∫_{−∞}^{+∞} pX(x) dx = 1
dΦX(ω)/dω = ∫_{−∞}^{+∞} (jx) exp(jωx) pX(x) dx   ⇒   dΦX/dω|_{ω=0} = j E(x)
d²ΦX(ω)/dω² = ∫_{−∞}^{+∞} (jx)² exp(jωx) pX(x) dx   ⇒   d²ΦX/dω²|_{ω=0} = j² E(x²)
⋮
d^nΦX(ω)/dω^n = ∫_{−∞}^{+∞} (jx)^n exp(jωx) pX(x) dx   ⇒   d^nΦX/dω^n|_{ω=0} = j^n E(x^n)
This is the reason why ΦX(ω) is also called the Moment-Generating Function.
20
SOLO Review of Probability
Properties of Moment-Generating Function
Develop ΦX(ω) = ∫_{−∞}^{+∞} exp(jωx) pX(x) dx in a Taylor series about ω = 0:
ΦX(ω) = ΦX(0) + (dΦX/dω)|_{ω=0} ω + (1/2!)(d²ΦX/dω²)|_{ω=0} ω² + ⋯ + (1/n!)(d^nΦX/dω^n)|_{ω=0} ω^n + ⋯
      = 1 + j E(x) ω/1! + j² E(x²) ω²/2! + ⋯ + j^n E(x^n) ω^n/n! + ⋯
21
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(2) Poisson’s Distribution ( ) ( )0
0
exp
!
, k
k
k
nkp
k
−≈
(1) Binomial (Bernoulli) ( )
( )
( ) ( ) knkknk
pp
k
n
pp
knk
n
nkp
−−
−





=−
−
= 11
!!
!
,
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 k
( )nkP ,
(3) Normal (Gaussian)
( ) ( ) ( )[ ]
σπ
σµ
σµ
2
2/exp
,;
22
−−
=
x
xp
(4) Laplacian Distribution ( )







 −
−=
b
x
b
bxp
µ
µ exp
2
1
,;
22
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(5) Gamma Distribution:  p(x; k, θ) = x^{k−1} exp(−x/θ) / (Γ(k) θ^k)  for x ≥ 0;  0 for x < 0
(6) Beta Distribution:  p(x; α, β) = x^{α−1} (1−x)^{β−1} / ∫_0^1 u^{α−1} (1−u)^{β−1} du = [Γ(α+β)/(Γ(α)Γ(β))] x^{α−1} (1−x)^{β−1}
(7) Cauchy Distribution:  p(x; x0, γ) = (1/π) γ / [(x − x0)² + γ²]
23
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
SOLO
(8) Exponential Distribution:  p(x; λ) = λ exp(−λx)  for x ≥ 0;  0 for x < 0
(9) Chi-square Distribution:  p(x; k) = (1/2)^{k/2} x^{k/2−1} exp(−x/2) / Γ(k/2)  for x ≥ 0;  0 for x < 0
Γ is the gamma function:  Γ(a) = ∫_0^{∞} t^{a−1} exp(−t) dt
(10) Student's t-Distribution:  p(x; ν) = [Γ((ν+1)/2) / (√(νπ) Γ(ν/2))] [1 + x²/ν]^{−(ν+1)/2}
24
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
SOLO
(11) Uniform Distribution (Continuous):  p(x; a, b) = 1/(b − a)  for a ≤ x ≤ b;  0 for x < a or x > b
(12) Rayleigh Distribution:  p(x; σ) = (x/σ²) exp(−x²/(2σ²))
(13) Rice Distribution:  p(x; v, σ) = (x/σ²) exp(−(x² + v²)/(2σ²)) I0(x v/σ²)
25
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(14) Weibull Distribution
SOLO
p(x; γ, µ, α) = (γ/α) ((x − µ)/α)^{γ−1} exp[−((x − µ)/α)^γ]  for x ≥ µ, with α, γ > 0;  0 for x < µ
Table of Content
26
SOLO Review of Probability
Normal (Gaussian) Distribution
Karl Friedrich Gauss
1777-1855
Probability Density Function:
p(x; µ, σ) = exp[−(x − µ)²/(2σ²)] / (√(2π) σ) =: N(x; µ, σ)
Cumulative Distribution Function:
P(x; µ, σ) = (1/(√(2π) σ)) ∫_{−∞}^{x} exp[−(u − µ)²/(2σ²)] du
Mean Value:  E(x) = µ
Variance:  Var(x) = σ²
Moment Generating Function:
Φ(ω) = E[exp(jωx)] = (1/(√(2π) σ)) ∫_{−∞}^{+∞} exp(jωu) exp[−(u − µ)²/(2σ²)] du = exp(jµω − σ²ω²/2)
27
SOLO Review of Probability
Moments
Normal Distribution:  pX(x; 0, σ) = exp[−x²/(2σ²)] / (√(2π) σ) =: N(x; 0, σ)
E[x^n] = 1·3·⋯·(n−1)·σ^n   for n even
E[x^n] = 0                 for n odd
and, for the absolute moments,
E[|x|^n] = 1·3·⋯·(n−1)·σ^n              for n = 2k
E[|x|^n] = √(2/π)·2^k·k!·σ^{2k+1}       for n = 2k+1
Proof:
Start from ∫_{−∞}^{+∞} exp(−a x²) dx = √(π/a), a > 0, and differentiate k times with respect to a:
∫_{−∞}^{+∞} x^{2k} exp(−a x²) dx = [1·3·⋯·(2k−1) / (2a)^k] √(π/a),   a > 0
Substitute a = 1/(2σ²) to obtain E[x^{2k}] = 1·3·⋯·(2k−1)·σ^{2k}.
For the odd absolute moments, with the substitution y = x²/(2σ²):
E[|x|^{2k+1}] = 2 ∫_0^{∞} x^{2k+1} exp[−x²/(2σ²)] dx / (√(2π) σ) = √(2/π)·2^k·k!·σ^{2k+1}
Now let us compute:
E[x⁴] = 3 σ⁴ = 3 (E[x²])²
Chi-square
28
SOLO Review of Probability
Normal (Gaussian) Distribution (continue – 1)
Karl Friedrich Gauss
1777-1855
A Vector-Valued Gaussian Random Variable has the Probability Density Function
p(x; x̄, P) = |2πP|^{−1/2} exp[−½ (x − x̄)ᵀ P^{−1} (x − x̄)] =: N(x; x̄, P)
where
x̄ = E{x}   Mean Value
P = E{(x − x̄)(x − x̄)ᵀ}   Covariance Matrix
If P is diagonal, P = diag[σ1², σ2², …, σk²], then the components of the random vector x are uncorrelated, and
p(x; x̄, P) = |2πP|^{−1/2} exp[−½ Σ_{i=1}^{k} (xi − x̄i)²/σi²] = ∏_{i=1}^{k} exp[−(xi − x̄i)²/(2σi²)] / (√(2π) σi)
therefore the components of the random vector are also independent.
29
SOLO Review of Probability
The Laws of Large Numbers
The Law of Large Numbers is a fundamental concept in statistics and probability that
describes how the average of randomly selected sample of a large population is likely
to be close to the average of the whole population. There are two laws of large numbers
the Weak Law and the Strong Law.
The Weak Law of Large Numbers
The Weak Law of Large Numbers states that if X1, X2, …, Xn, … is an infinite sequence of random variables that have the same expected value µ and variance σ², and are uncorrelated (i.e., the correlation between any two of them is zero), then
X̄n := (X1 + … + Xn)/n
converges in probability (a weak convergence sense) to µ. We have
Pr{|X̄n − µ| < ε} → 1  as n → ∞     (convergence in probability)
The Strong Law of Large Numbers
The Strong Law of Large Numbers states that if X1, X2, …, Xn, … is an infinite sequence of random variables that have the same expected value µ and variance σ², are uncorrelated (i.e., the correlation between any two of them is zero), and E(|Xi|) < ∞, then
Pr{lim_{n→∞} X̄n = µ} = 1,  i.e. X̄n converges almost surely to µ.
30
SOLO Review of Probability
The Law of Large Numbers
Differences between the Weak Law and the Strong Law
The Weak Law states that, for a specified large n, (X1 + ... + Xn) / n is likely to be near μ.
Thus, it leaves open the possibility that | (X1 + ... + Xn) / n − μ | > ε happens an infinite
number of times, although it happens at infrequent intervals.
The Strong Law shows that this almost surely will not occur.
In particular, it implies that with probability 1, we have for any positive value ε, the
inequality | (X1 + ... + Xn) / n − μ | > ε is true only a finite number of times (as opposed to
an infinite, but infrequent, number of times).
Almost sure convergence is also called strong convergence of random variables.
This version is called the strong law because random variables which converge
strongly (almost surely) are guaranteed to converge weakly (in probability). The
strong law implies the weak law.
31
SOLO Review of Probability
The Law of Large Numbers
Proof of the Weak Law of Large Numbers
Given:  E(Xi) = µ ∀ i,   Var(Xi) = σ² ∀ i,   E[(Xi − µ)(Xj − µ)] = 0 ∀ i ≠ j,
we have:
E(X̄n) = E[(X1 + … + Xn)/n] = nµ/n = µ
Var(X̄n) = E{[X̄n − E(X̄n)]²} = E{[(X1 − µ) + … + (Xn − µ)]²/n²} = nσ²/n² = σ²/n
(the cross terms vanish since E[(Xi − µ)(Xj − µ)] = 0 for i ≠ j)
Using Chebyshev's inequality on X̄n we obtain:
Pr{|X̄n − µ| ≥ ε} ≤ (σ²/n)/ε²     (Chebyshev's inequality)
Using this equation we obtain:
Pr{|X̄n − µ| ≤ ε} = 1 − Pr{|X̄n − µ| > ε} ≥ 1 − Pr{|X̄n − µ| ≥ ε} ≥ 1 − σ²/(n ε²)
As n approaches infinity, the expression approaches 1.
q.e.d.
Monte Carlo Integration
Table of Content
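A minimal Monte Carlo check of the statement above (a sketch assuming NumPy; the Gaussian population is an arbitrary choice): the probability of a large deviation of the sample mean shrinks with n and is bounded by Chebyshev's inequality.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, eps = 2.0, 1.0, 0.1

for n in (10, 100, 1000, 10000):
    # many independent realizations of the sample mean X̄_n
    xbar = rng.normal(mu, sigma, size=(5000, n)).mean(axis=1)
    prob_exceed = np.mean(np.abs(xbar - mu) >= eps)
    chebyshev_bound = min(sigma**2 / (n * eps**2), 1.0)
    print(n, prob_exceed, chebyshev_bound)   # empirical probability stays below the bound
```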
32
SOLO Review of Probability
Central Limit Theorem
The first version of this theorem was postulated by the French-born English mathematician Abraham de Moivre in 1733, using the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This was published in 1756 in "The Doctrine of Chances", 3rd Ed.
Pierre-Simon Laplace
(1749-1827)
Abraham de Moivre
(1667-1754)
This finding was forgotten until 1812, when the French mathematician Pierre-Simon Laplace recovered it in his work "Théorie Analytique des Probabilités", in which he approximated the binomial distribution with the normal distribution.
This is known as the De Moivre – Laplace Theorem.
De Moivre – Laplace
Theorem
The present form of the Central Limit Theorem was given by the
Russian mathematician Alexandr Lyapunov in 1901.
Alexandr Mikhailovich
Lyapunov
(1857-1918)
33
SOLO Review of Probability
Central Limit Theorem (continue – 1)
Let X1, X2, …, Xm be a sequence of independent random variables with the same probability distribution function pX(x). Define the statistical mean:
X̄m = (X1 + X2 + … + Xm)/m
E(X̄m) = [E(X1) + E(X2) + … + E(Xm)]/m = µ
σ²_{X̄m} = Var(X̄m) = E{[X̄m − E(X̄m)]²} = E{[(X1 − µ) + (X2 − µ) + … + (Xm − µ)]²/m²} = mσ²/m² = σ²/m
Define also the new random variable
Y := [X̄m − E(X̄m)]/σ_{X̄m} = [(X1 − µ) + (X2 − µ) + … + (Xm − µ)]/(σ√m)
We have:
The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity, regardless of the probability distribution of the random variable, as long as the mean µ and the variance σ² are finite.
34
SOLO Review of Probability
Central Limit Theorem (continue – 2)
Y := [X̄m − E(X̄m)]/σ_{X̄m} = [(X1 − µ) + (X2 − µ) + … + (Xm − µ)]/(σ√m)
Proof
The Characteristic Function of Y:
ΦY(ω) = E[exp(jωY)] = E{exp[jω ((X1 − µ) + (X2 − µ) + … + (Xm − µ))/(σ√m)]}
      = ∏_{i=1}^{m} E{exp[jω (Xi − µ)/(σ√m)]} = [Φ_{(Xi−µ)/σ}(ω/√m)]^m
Develop Φ_{(Xi−µ)/σ}(ω/√m) in a Taylor series:
Φ_{(Xi−µ)/σ}(ω/√m) = 1 + (jω/√m) E{(Xi − µ)/σ}/1! + (jω/√m)² E{[(Xi − µ)/σ]²}/2! + (jω/√m)³ E{[(Xi − µ)/σ]³}/3! + ⋯
                   = 1 − ω²/(2m) + o(ω²/m),     with lim_{m→∞} o(ω²/m)/(ω²/m) = 0
(using E{(Xi − µ)/σ} = 0 and E{[(Xi − µ)/σ]²} = 1)
35
SOLO Review of Probability
Central Limit Theorem (continue – 3)
Proof (continue – 1)
The Characteristic Function
ΦY(ω) = [Φ_{(Xi−µ)/σ}(ω/√m)]^m = [1 − ω²/(2m) + o(ω²/m)]^m
Therefore
ΦY(ω) → exp(−ω²/2)  as m → ∞
which is the Characteristic Function of the Normal Distribution, and hence
pY(y) = (1/2π) ∫_{−∞}^{+∞} exp(−jωy) ΦY(ω) dω → (1/2π) ∫_{−∞}^{+∞} exp(−jωy) exp(−ω²/2) dω = exp(−y²/2)/√(2π)
The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity
(Convergence in Distribution).
Characteristic Function
of Normal Distribution
Convergence
Concepts
Monte Carlo
Integration
Table of Content
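A short simulation of the theorem (a sketch; the exponential distribution is an arbitrary non-Gaussian choice): the normalized variable Y behaves like a standard normal for large m.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0          # mean and std of Exp(1)
m, runs = 50, 100000

x = rng.exponential(scale=1.0, size=(runs, m))        # non-Gaussian samples X_1..X_m
y = (x.mean(axis=1) - mu) / (sigma / np.sqrt(m))      # Y = (X̄_m - mu) / (sigma/sqrt(m))

print(y.mean(), y.var())              # ≈ 0 and ≈ 1
print(np.mean(np.abs(y) < 1.96))      # ≈ 0.95, as for a standard normal
```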
38
SOLO Review of Probability
Existence Theorems
Existence Theorem 1
Given a function G(x) such that
G(−∞) = 0,   G(+∞) = lim_{x→∞} G(x) = 1
0 ≤ G(x1) ≤ G(x2)  if x1 < x2     (G(x) is monotonic non-decreasing)
lim_{xn→x, xn≥x} G(xn) = G(x⁺) = G(x)     (G(x) is continuous from the right)
We can find an experiment X and a random variable x, defined on X, such that its distribution function P(x) equals the given function G(x).
Proof of Existence Theorem 1
Assume that the outcome of the experiment X is any real number −∞ < x < +∞.
We consider as events all intervals, and the intersections or unions of intervals, on the real axis.
To specify the probability of those events we define P(x1) = Prob{x ≤ x1} = G(x1).
From our definition of G(x) it follows that P(x) is a distribution function.
Existence Theorem 2 Existence Theorem 3
39
SOLO Review of Probability
Existence Theorems
Existence Theorem 2
If a function F (x,y) is such that
F(−∞, y) = 0,   F(x, −∞) = 0,   F(+∞, +∞) = 1
F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1) ≥ 0
for every x1 < x2, y1 < y2, then two random variables x and y can be found such that
F (x,y) is their joint distribution function.
Proof of Existence Theorem 2
Assume that the outcome of the experiment X is any real number -∞ <x < +∞.
Assume that the outcome of the experiment Y is any real number -∞ <y < +∞.
We consider as events all intervals, the intersection or union of intervals on the
real axes x and y.
To specify the probability of those events we define P (x,y)=Prob { x ≤ x1, y ≤ y1, }= F (x1,y1).
From our definition of F (x,y) it follows that P (x,y) is a joint distribution function.
The proof is similar to that in the Existence Theorem 1
40
SOLO Review of Probability
Monte Carlo Method
Monte Carlo methods are a class of computational algorithms that
rely on repeated random sampling to compute their results. Monte
Carlo methods are often used when simulating physical and
mathematical systems. Because of their reliance on repeated
computation and random or pseudo-random numbers, Monte Carlo
methods are most suited to calculation by a computer. Monte Carlo
methods tend to be used when it is infeasible or impossible to
compute an exact result with a deterministic algorithm.
The term Monte Carlo method was coined in the 1940s by physicists Stanislaw Ulam,
Enrico Fermi, John von Neumann, and Nicholas Metropolis, working on nuclear
weapon projects in the Los Alamos National Laboratory (reference to the Monte Carlo
Casino in Monaco where Ulam's uncle would borrow money to gamble)
Stanislaw Ulam
1909 - 1984
Enrico Fermi
1901 - 1954
John von Neumann
1903 - 1957
Monte Carlo Casino
Nicholas Constantine Metropolis
(1915 –1999)
41
SOLO Review of Probability
Monte Carlo Approximation
Monte Carlo runs generate a set of random samples that approximate the distribution p(x):
x^(L) ~ p(x),   L = 1, …, P     (x^(L) are generated (drawn) samples from the distribution p(x))
So, with P samples, expectations with respect to the distribution are approximated by
∫ f(x) p(x) dx ≈ (1/P) Σ_{L=1}^{P} f(x^(L))
and, in the usual way for Monte Carlo, this can give all the moments etc. of the distribution up to some degree of approximation:
µ1 = E{x} = ∫ x p(x) dx ≈ (1/P) Σ_{L=1}^{P} x^(L)
µn = E{(x − µ1)^n} = ∫ (x − µ1)^n p(x) dx ≈ (1/P) Σ_{L=1}^{P} (x^(L) − µ1)^n
Table of Content
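A minimal sketch of the approximation above (assuming NumPy; the Gamma density is just an example target from which samples can be drawn directly):

```python
import numpy as np

rng = np.random.default_rng(2)
P = 100000
x = rng.gamma(shape=3.0, scale=2.0, size=P)     # x^(L) ~ p(x), L = 1..P

mu1 = x.mean()                                   # (1/P) sum x^(L)  ≈ E{x} = 6
mu2 = np.mean((x - mu1) ** 2)                    # second central moment ≈ 12
ef  = np.mean(np.cos(x))                         # ≈ ∫ cos(x) p(x) dx
print(mu1, mu2, ef)
```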
42
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (Unknown Statistics)
A random variable, x, may take on any values in the range −∞ to +∞.
Based on a sample of k values, xi, i = 1,2,…,k, we wish to compute the sample mean, m̂k, and sample variance, σ̂k², as estimates of the population mean, m, and variance, σ².
E{xi} = E{xj} = m ∀ i, j
Define the estimate of the population mean:
m̂k := (1/k) Σ_{i=1}^{k} xi
E{m̂k} = (1/k) Σ_{i=1}^{k} E{xi} = m     (Unbiased)
Compute, using E{xi²} = E{xj²} = σ² + m² ∀ i, j and E{xi xj} = E{xi} E{xj} = m² ∀ i ≠ j (independent samples):
E{(1/k) Σ_{i=1}^{k} (xi − m̂k)²} = E{(1/k) Σ_{i=1}^{k} xi² − m̂k²} = (σ² + m²) − (σ²/k + m²) = [(k−1)/k] σ²     (Biased)
Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
43
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 1)
We found, for i.i.d. samples with E{xi} = m and Var{xi} = σ²:
m̂k := (1/k) Σ_{i=1}^{k} xi,   E{m̂k} = m     (Unbiased)
E{(1/k) Σ_{i=1}^{k} (xi − m̂k)²} = [(k−1)/k] σ²     (Biased)
Therefore, the unbiased estimate of the sample variance of the population is defined as:
σ̂k² := (1/(k−1)) Σ_{i=1}^{k} (xi − m̂k)²,   since   E{σ̂k²} = E{(1/(k−1)) Σ_{i=1}^{k} (xi − m̂k)²} = σ²     (Unbiased)
Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
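The distinction above is the familiar 1/k versus 1/(k−1) normalization; a quick numerical check (a sketch assuming NumPy, whose ddof argument selects between the two):

```python
import numpy as np

rng = np.random.default_rng(3)
k, runs, sigma2 = 10, 200000, 4.0

x = rng.normal(0.0, np.sqrt(sigma2), size=(runs, k))
biased   = x.var(axis=1, ddof=0)    # (1/k)     * sum (x_i - m̂_k)^2
unbiased = x.var(axis=1, ddof=1)    # (1/(k-1)) * sum (x_i - m̂_k)^2

print(biased.mean())     # ≈ (k-1)/k * sigma2 = 3.6
print(unbiased.mean())   # ≈ sigma2 = 4.0
```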
44
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 2)
A random variable, x, may take on any values in the range −∞ to +∞.
Based on a sample of k values, xi, i = 1,2,…,k, we computed the sample mean, m̂k, and sample variance, σ̂k², as estimates of the population mean, m, and variance, σ²:
E{m̂k} = E{(1/k) Σ_{i=1}^{k} xi} = m
E{σ̂k²} = E{(1/(k−1)) Σ_{i=1}^{k} (xi − m̂k)²} = σ²
Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
45
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 3)
We found:
E{m̂k} = E{(1/k) Σ_{i=1}^{k} xi} = m,    E{σ̂k²} = E{(1/(k−1)) Σ_{i=1}^{k} (xi − m̂k)²} = σ²
Let us compute:
σ²_{m̂k} := E{(m̂k − m)²} = E{[(1/k) Σ_{i=1}^{k} (xi − m)]²}
          = (1/k²) [Σ_{i=1}^{k} E{(xi − m)²} + Σ_{i=1}^{k} Σ_{j≠i} E{(xi − m)(xj − m)}] = (1/k²)·k σ² = σ²/k
(the cross terms vanish because E{(xi − m)(xj − m)} = 0 for i ≠ j)
σ²_{m̂k} := E{(m̂k − m)²} = σ²/k
46
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)
Let us compute σ²_{σ̂k²} := E{(σ̂k² − σ²)²}, with σ̂k² = (1/(k−1)) Σ_{i=1}^{k} (xi − m̂k)².
Write (xi − m̂k) = (xi − m) − (m̂k − m), so that
σ̂k² = (1/(k−1)) Σ_{i=1}^{k} [(xi − m)² − 2 (xi − m)(m̂k − m) + (m̂k − m)²]
and
E{σ̂k⁴} = E{[(1/(k−1)) Σ_{i=1}^{k} ((xi − m)² − 2 (xi − m)(m̂k − m) + (m̂k − m)²)]²}
The expansion involves the fourth central moment µ4 = E{(xi − m)⁴}, terms in σ⁴, and the moments E{(m̂k − m)²} = σ²/k and E{(m̂k − m)⁴}; all odd moments of (xi − m) vanish for i ≠ j.
47
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)
Since (xi − m), (xj − m) and (m̂k − m) are all (approximately) independent for i ≠ j, carrying out the expansion term by term and keeping the leading terms in 1/k gives
σ²_{σ̂k²} = E{(σ̂k² − σ²)²} ≈ (µ4 − σ⁴)/k,     where µ4 := E{(xi − m)⁴}
48
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 5)
We found:
E{m̂k} = E{(1/k) Σ_{i=1}^{k} xi} = m
E{σ̂k²} = E{(1/(k−1)) Σ_{i=1}^{k} (xi − m̂k)²} = σ²
σ²_{m̂k} := E{(m̂k − m)²} = σ²/k
σ²_{σ̂k²} := E{(σ̂k² − σ²)²} ≈ (µ4 − σ⁴)/k,     µ4 := E{(xi − m)⁴}
µ4 is the fourth central moment of the random variable xi. Define the Kurtosis of the random variable xi:
λ := µ4/σ⁴
so that
σ²_{σ̂k²} = E{(σ̂k² − σ²)²} ≈ (λ − 1) σ⁴ / k
49
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 6)
For high values of k, according to the Central Limit Theorem, the estimates m̂k of the mean and σ̂k² of the variance are approximately Gaussian Random Variables:
(m̂k − m) ~ N(0; σ²/k)   &   (σ̂k² − σ²) ~ N(0; (λ − 1) σ⁴/k)
We want to find a region around σ̂k² that will contain σ² with a predefined probability φ, as a function of the number of iterations k:
Prob[0 ≤ |σ̂k² − σ²| ≤ nσ σ_{σ̂k²}] = φ
Since σ̂k² is approximately a Gaussian Random Variable, nσ is given by solving:
(1/√(2π)) ∫_{−nσ}^{+nσ} exp(−ζ²/2) dζ = φ
Cumulative Probability within nσ Standard Deviations of the Mean for a Gaussian Random Variable:
nσ = 1.000 → φ = 0.6827;  nσ = 1.645 → φ = 0.9000;  nσ = 1.960 → φ = 0.9500;  nσ = 2.576 → φ = 0.9900
With σ_{σ̂k²} = √((λ−1)/k) σ²:
σ̂k² − nσ √((λ−1)/k) σ² ≤ σ² ≤ σ̂k² + nσ √((λ−1)/k) σ²
50
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 7)
Prob[0 ≤ |σ̂k² − σ²| ≤ nσ σ_{σ̂k²}] = φ,   with σ²_{σ̂k²} = (λ − 1) σ⁴/k
σ̂k² − nσ √((λ−1)/k) σ² ≤ σ² ≤ σ̂k² + nσ √((λ−1)/k) σ²
[1 − nσ √((λ−1)/k)] σ² ≤ σ̂k² ≤ [1 + nσ √((λ−1)/k)] σ²
σ̂k² / [1 + nσ √((λ−1)/k)] ≤ σ² ≤ σ̂k² / [1 − nσ √((λ−1)/k)]
σ²_min := σ̂k² / [1 + nσ √((λ−1)/k)]  ≤  σ²  ≤  σ̂k² / [1 − nσ √((λ−1)/k)] =: σ²_max
51
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 8)
52
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 9)
53
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 10)
σ_max := σ̂_{k0} / √(1 − nσ √((λ−1)/k))   &   σ_min := σ̂_{k0} / √(1 + nσ √((λ−1)/k))
Monte-Carlo Procedure
1  Choose the Confidence Level φ and find the corresponding nσ using the normal (Gaussian) distribution:
   nσ = 1.000 → φ = 0.6827;  1.645 → 0.9000;  1.960 → 0.9500;  2.576 → 0.9900
2  Run a few samples k0 > 20 and estimate λ according to
   λ̂_{k0} := [(1/k0) Σ_{i=1}^{k0} (xi − m̂_{k0})⁴] / [(1/k0) Σ_{i=1}^{k0} (xi − m̂_{k0})²]²,     m̂_{k0} := (1/k0) Σ_{i=1}^{k0} xi
3  Compute σ_min and σ_max as functions of k
4  Find k for which
   Prob[0 ≤ |σ̂k² − σ²| ≤ nσ σ_{σ̂k²}] = φ
5  Run k − k0 simulations
54
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 11)
Monte-Carlo Procedure — Example:
1  Choose the Confidence Level φ = 95%, which gives the corresponding nσ = 1.96.
   (nσ = 1.000 → φ = 0.6827;  1.645 → 0.9000;  1.960 → 0.9500;  2.576 → 0.9900)
2  Assume a Gaussian distribution, so the kurtosis is λ = 3.
3  Find k for which
   Prob[0 ≤ |σ̂k² − σ²| ≤ nσ √((λ−1)/k) σ²] = φ,   i.e.   Prob[0 ≤ |σ̂k² − σ²| ≤ 1.96 √(2/k) σ²] = 0.95
   Assume also that we require |σ̂k² − σ²| ≤ 0.1 σ² with probability φ = 95%:
   1.96 √(2/k) = 0.1   ⇒   k ≈ 800
4  Run k > 800 simulations.
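A sketch of the procedure of the last two slides in Python (assumptions: the pilot run uses a standard normal population, and nσ = 1.96 for φ = 95%):

```python
import numpy as np

rng = np.random.default_rng(4)
n_sigma, rel_err = 1.96, 0.1           # confidence 95%; require |σ̂²-σ²| ≤ 0.1 σ²

# Step 2: pilot run of k0 samples to estimate the kurtosis λ
k0 = 50
pilot = rng.normal(size=k0)            # assumed Gaussian population
m_hat = pilot.mean()
lam = np.mean((pilot - m_hat) ** 4) / np.mean((pilot - m_hat) ** 2) ** 2

# Steps 3-4: choose k so that n_sigma * sqrt((λ-1)/k) = rel_err
k = int(np.ceil((n_sigma / rel_err) ** 2 * (lam - 1.0)))
print(lam, k)                          # λ ≈ 3 for a Gaussian population → k ≈ 800
```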
55
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
• First attempts to generate “random numbers”:
- Draw balls out of a stirred urn
- Roll dice
• 1927: L.H.C. Tippett published a table of 40,000 digits taken “at random” from
census reports.
• 1939: M.G. Kendall and B. Babington-Smith create a mechanical machine to
generate random numbers. They published a table of 100,000 digits.
• 1946: J. Von Neumann proposed the “middle square method”.
• 1948: D.H. Lehmer introduced the “linear congruential method”.
• 1955: RAND Corporation published a table of 1,000,000 random digits obtained
from electronic noise.
• 1965: M.D. MacLaren and G. Marsaglia proposed to combine two congruential
generators.
• 1989: R.S. Wikramaratna proposed the additive congruential method.
56
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
A Random Number represents the value of a random variable uniformly distributed on (0,1).
Pseudo-Random Numbers constitute a sequence of values which, although deterministically generated, have all the appearances of being independent and uniformly distributed on (0,1).
One approach
1. Define x0 = integer initial condition or seed
2. Using integers a and m recursively compute
   xn = a·x(n−1) modulo m     (i.e. a·x(n−1) = k·m + xn for an integer k, with 0 ≤ xn < m)
Therefore xn takes the values 0,1,…,m−1, and the quantity un = xn/m, called a pseudo-random number, is an approximation to the value of a uniform (0,1) random variable.
In general the integers a and m should be chosen to satisfy three criteria:
1. For any initial seed, the resultant sequence has the "appearance" of being a sequence of independent uniform (0,1) random variables.
2. For any initial seed, the number of variables that can be generated before repetition begins is large.
3. The values can be computed efficiently on a digital computer.
Multiplicative congruential method
Return to
Monte Carlo Approximation
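A minimal sketch of the multiplicative congruential generator, using the parameters quoted for a 32-bit machine on the next slide (a = 7⁵, m = 2³¹ − 1):

```python
def lcg(seed, a=16807, m=2**31 - 1):
    """Multiplicative congruential generator: x_n = a * x_{n-1} mod m."""
    x = seed
    while True:
        x = (a * x) % m
        yield x / m           # u_n = x_n / m approximates Uniform(0,1)

gen = lcg(seed=12345)
print([round(next(gen), 4) for _ in range(5)])
```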
57
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators (continue – 1)
A guideline is to choose m to be a large prime number compared to the computer word size.
Examples:
32-bit word computer:   m = 2³¹ − 1 = 2,147,483,647,   a = 7⁵ = 16,807
36-bit word computer:   m = 2³⁵ − 1,                    a = 5⁵ = 3,125
Another generator of pseudo-random numbers uses recursions of the type:
xn = (a·x(n−1) + c) modulo m     (i.e. a·x(n−1) + c = k·m + xn for an integer k, with 0 ≤ xn < m)
Mixed congruential method
58
SOLO Review of Probability
Generating Discrete Random Variables
Histograms
Return to Table of Content
A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what
proportion of cases fall into each of several categories: it is a form of data binning. The categories
are usually specified as non-overlapping intervals of some variable. The categories (bars) must be
adjacent. The intervals (or bands, or bins) are generally of the same size.
Histograms are used to plot density of data, and often for density estimation: estimating the
probability density function of the underlying variable. The total area of a histogram always
equals 1. If the length of the intervals on the x-axis are all 1, then a histogram is identical to a
relative frequency plot.
Mathematical Definition
In a more general mathematical sense, a histogram is a mapping mi that counts the number of observations that fall into various disjoint categories (known as bins), whereas the graph of a histogram is merely one way to represent a histogram. Thus, if we let n be the total number of observations and k be the total number of bins, the histogram mi meets the condition
n = Σ_{i=1}^{k} mi
A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram Mi of a histogram mi is defined as:
Mi = Σ_{j=1}^{i} mj
[Figure: an ordinary and a cumulative histogram of the same data; the data shown is a random sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1.]
59
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method
Suppose we want to generate a discrete random variable X
having probability density function:
p(x) = Σ_j pj δ(x − xj),   Σ_j pj = 1,   j = 0, 1, …
To accomplish this, generate a random number U that is uniformly distributed over (0,1) and set:
X = x0   if U < p0
X = x1   if p0 ≤ U < p0 + p1
⋮
X = xj   if Σ_{i=0}^{j−1} pi ≤ U < Σ_{i=0}^{j} pi
⋮
Since, for any a and b such that 0 < a < b < 1, and U uniformly distributed, P(a ≤ U < b) = b − a, we have:
P(X = xj) = P(Σ_{i=0}^{j−1} pi ≤ U < Σ_{i=0}^{j} pi) = pj
and so X has the desired distribution.
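A small sketch of the discrete inverse-transform rule above (the values and probabilities are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
values = np.array([0, 1, 2, 3])
probs  = np.array([0.1, 0.4, 0.3, 0.2])
cdf = np.cumsum(probs)

def draw(n):
    u = rng.random(n)                              # U ~ Uniform(0,1)
    idx = np.searchsorted(cdf, u, side='right')    # smallest j with U < p_0 + ... + p_j
    return values[idx]

x = draw(100000)
print([np.mean(x == v) for v in values])           # relative frequencies ≈ probs
```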
60
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 1)
Suppose we want to generate a discrete random variable X
having probability density function:  p(x) = Σ_j pj δ(x − xj),  Σ_j pj = 1
Draw X, N times,
from p (x)
Histogram of the
Results
61
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 2)
Generating a Poisson Random Variable:
pi = P(X = i) = e^{−λ} λ^i / i!,   i = 0, 1, …,   Σ_i pi = 1
pi+1 / pi = [e^{−λ} λ^{i+1}/(i+1)!] / [e^{−λ} λ^i/i!] = λ/(i+1)
(this recursion gives the successive probabilities, and hence the cumulative sums needed by the inverse-transform rule)
Draw X , N times, from
Poisson Distribution
Histogram of the Results
62
SOLO Review of Probability
Generating Discrete Random Variables
The Inverse Transform Method (continue – 3)
Generating a Binomial Random Variable:
pi = P(X = i) = [n!/(i!(n−i)!)] p^i (1−p)^{n−i},   i = 0, 1, …, n,   Σ_i pi = 1
pi+1 / pi = {n!/[(i+1)!(n−i−1)!] p^{i+1} (1−p)^{n−i−1}} / {n!/[i!(n−i)!] p^i (1−p)^{n−i}} = [(n−i)/(i+1)] · p/(1−p)
[Figure: P(k, n) for k = 0, 1, …, 14]
Histogram of the Results
63
SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique
Suppose we have an efficient method for simulating a random variable having a probability density function { qj, j ≥ 0 }. We want to use this to obtain a random variable that has the probability density function { pj, j ≥ 0 }.
Let c be a constant such that:  pj/qj ≤ c  for all j such that qj ≠ 0.
If such a c exists, it must satisfy:  pj ≤ c qj  ⇒  1 = Σ_j pj ≤ c Σ_j qj = c  ⇒  c ≥ 1
Rejection Method
Step 1: Simulate the value of Y, having probability density function qj.
Step 2: Generate a random number U (that is uniformly distributed
over (0,1) ).
Step 3: If U < pY/c qY, set X = Y and stop. Otherwise return to Step 1.
64
SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique (continue – 1)
Theorem
The random variable X obtained by the rejection method has probability density function P{X = i} = pi.
Proof
P{X = i} = P{Y = i | Acceptance} = P{Y = i, Acceptance} / P{Acceptance} = P{Acceptance | Y = i} P{Y = i} / P{Acceptance}     (Bayes)
P{Acceptance | Y = i} P{Y = i} = P{U ≤ pi/(c qi)} qi = [pi/(c qi)] qi = pi/c
(by independence, with U uniformly distributed on (0,1))
Summing over all i yields
1 = Σ_i P{X = i} = Σ_i pi / (c P{Acceptance}) = 1 / (c P{Acceptance})
c P{Acceptance} = 1   ⇒   P{Acceptance} = 1/c ≤ 1
and therefore P{X = i} = pi.
q.e.d.
65
SOLO Review of Probability
Generating Discrete Random Variables
The Acceptance-Rejection Technique (continue – 2)
Example
Generate a truncated Gaussian using the Accept-Reject method. Consider the case with
p(x) ≈ e^{−x²/2}/√(2π)   for x ∈ [−4, 4];   0 otherwise
Consider the Uniform proposal function
q(x) ≈ 1/8   for x ∈ [−4, 4];   0 otherwise
In the Figure we can see the results of the Accept-Reject method using N = 10,000 samples.
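A sketch of the Accept-Reject example above in Python (uniform proposal on [−4, 4]; the bound c is taken as the density ratio at x = 0, which is where p/q peaks):

```python
import numpy as np

rng = np.random.default_rng(6)
p = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # target density on [-4, 4]
q = 1.0 / 8.0                                           # uniform proposal density on [-4, 4]
c = p(0.0) / q                                          # bound on p(x)/q(x)

def sample(n):
    out = []
    while len(out) < n:
        y = rng.uniform(-4, 4)           # Step 1: draw Y ~ q
        u = rng.random()                 # Step 2: draw U ~ Uniform(0,1)
        if u < p(y) / (c * q):           # Step 3: accept with probability p(Y)/(c q(Y))
            out.append(y)
    return np.array(out)

x = sample(10000)
print(x.mean(), x.std())                 # ≈ 0 and ≈ 1 for the (truncated) Gaussian
print(1.0 / c)                           # acceptance probability ≈ 0.31
```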
66
SOLO Review of Probability
Generating Continuous Random Variables
The Inverse Transform Algorithm
Let U be a uniform (0,1) random variable. For any continuous distribution function F the random variable X defined by
X = F⁻¹(U)
has distribution F. [F⁻¹(u) is defined to be that value of x such that F(x) = u.]
Proof
Let PX(x) denote the Probability Distribution Function of X = F⁻¹(U):
PX(x) = P{X ≤ x} = P{F⁻¹(U) ≤ x}
Since F is a distribution function, F(x) is a monotonic increasing function of x, and the inequality "a ≤ b" is equivalent to the inequality "F(a) ≤ F(b)"; therefore
PX(x) = P{F[F⁻¹(U)] ≤ F(x)} = P{U ≤ F(x)} = F(x)
(the last step holds because U is uniform on (0,1) and 0 ≤ F(x) ≤ 1)
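A sketch applying the algorithm above to the exponential distribution, whose inverse CDF is available in closed form (F(x) = 1 − e^{−λx} ⇒ F⁻¹(u) = −ln(1 − u)/λ):

```python
import numpy as np

rng = np.random.default_rng(7)
lam = 2.0

u = rng.random(100000)               # U ~ Uniform(0,1)
x = -np.log(1.0 - u) / lam           # X = F^{-1}(U) ~ Exponential(lam)

print(x.mean(), x.var())             # ≈ 1/lam = 0.5 and ≈ 1/lam² = 0.25
```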
67
SOLO Review of Probability
Importance Sampling
Let Y = (Y1,…,Ym) be a vector of random variables having a joint probability density function f(y1,…,ym), and suppose that we are interested in estimating
θ = Ef[h(Y1,…,Ym)] = ∫ h(y1,…,ym) f(y1,…,ym) dy1 ⋯ dym
Suppose that a direct generation of the random vector Y so as to compute h(Y) is inefficient or impossible because
(a) it is difficult to generate the random vector Y, or
(b) the variance of h(Y) is large, or
(c) both of the above.
Suppose that W = (W1,…,Wm) is another random vector, which takes values in the same domain as Y, and has a joint density function g(w1,…,wm) from which samples can easily be generated. The estimate θ can be expressed as:
θ = Ef[h(Y1,…,Ym)] = ∫ [h(w1,…,wm) f(w1,…,wm) / g(w1,…,wm)] g(w1,…,wm) dw1 ⋯ dwm = Eg[h(W) f(W)/g(W)]
Therefore, we can estimate θ by generating values of the random vector W, and then using as the estimator the resulting average of the values h(W) f(W)/g(W).
Return to Particle Filters
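A one-dimensional sketch of the identity above (assumptions: f is a standard normal density, h(y) = 1{y > 3}, and the proposal g is a normal shifted into the tail, so the rare-event probability is estimated far more efficiently than by direct sampling):

```python
import numpy as np

rng = np.random.default_rng(8)
N = 100000
phi = lambda y, m: np.exp(-(y - m)**2 / 2) / np.sqrt(2 * np.pi)   # N(m,1) density

h = lambda y: (y > 3.0).astype(float)        # estimate theta = P(Y > 3), Y ~ N(0,1)

# direct Monte Carlo under f = N(0,1)
direct = h(rng.normal(0.0, 1.0, N)).mean()

# importance sampling: draw W ~ g = N(3,1), average h(W) f(W)/g(W)
w = rng.normal(3.0, 1.0, N)
is_est = np.mean(h(w) * phi(w, 0.0) / phi(w, 3.0))

print(direct, is_est)    # true value ≈ 1.35e-3; the IS estimate has much lower variance
```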
68
SOLO Review of Probability
Monte Carlo Integration
The Monte Carlo Method can be used to numerically evaluate multidimensional integrals
I = ∫ g(x1,…,xm) dx1 ⋯ dxm = ∫ g(x) dx
To use Monte Carlo we factorize
g(x) = f(x)·p(x)
in such a way that p(x) is interpreted as a Probability Density Function:
p(x) ≥ 0   &   ∫ p(x) dx = 1
We assume that we can draw NS samples x^i, i = 1,…,NS from p(x):
x^i ~ p(x),   i = 1,…,NS
Using Monte Carlo we can approximate
p(x) ≈ Σ_{i=1}^{NS} δ(x − x^i)/NS
I = ∫ f(x)·p(x) dx ≈ I_NS = ∫ f(x)·Σ_{i=1}^{NS} δ(x − x^i)/NS dx = (1/NS) Σ_{i=1}^{NS} f(x^i)
69
SOLO Review of Probability
Monte Carlo Integration
we draw NS samples x^i, i = 1,…,NS from p(x):  x^i ~ p(x)
I = ∫ f(x)·p(x) dx ≈ I_NS = (1/NS) Σ_{i=1}^{NS} f(x^i)
If the samples x^i are independent, then I_NS is an unbiased estimate of I.
According to the Law of Large Numbers, I_NS will almost surely converge to I:
I_NS → I  almost surely, as NS → ∞
If the variance of f(x) is finite, i.e.
σf² := ∫ [f(x) − I]² p(x) dx < ∞
then the Central Limit Theorem holds and the estimation error converges in distribution to a Normal Distribution:
lim_{NS→∞} √NS (I_NS − I) ~ N(0, σf²)
The error of the MC estimate, e = I_NS − I, is of the order of O(NS^{−1/2}), meaning that the rate of convergence of the estimate is independent of the dimension of the integrand.
Numerical Integration of p(xk | xk−1) and p(zk | xk)
Return to Particle Filters
70
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Given a function S (ω)= S (-ω) or, equivalently, a positive-defined function R (τ),
(R (τ) = R (-τ), and R (0)=max R (τ), for all τ ), we can find a stochastic process x (t)
having S (ω) as its power spectrum or R (τ) as its autocorrelation.
Proof of Existence Theorem 3
Define
a² := (1/π) ∫_{−∞}^{+∞} S(ω) dω   &   f(ω) := S(ω)/(π a²) = f(−ω)
Since f(ω) ≥ 0 and ∫_{−∞}^{+∞} f(ω) dω = 1, according to Existence Theorem 1 we can find a random variable ω with the even density function f(ω), and probability distribution function
P(ω) := ∫_{−∞}^{ω} f(τ) dτ
We now form the process x(t) := a cos(ωt + θ), where θ is a random variable uniformly distributed in the interval (−π, +π) and independent of ω.
71
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Proof of Existence Theorem 3 (continue – 1)
Since θ is uniformly distributed in the interval (−π, +π) and independent of ω, we have
E{x(t)} = a E{cos(ωt + θ)} = a E{cos ωt} E{cos θ} − a E{sin ωt} E{sin θ} = 0     (since E{cos θ} = E{sin θ} = 0)
Indeed
E{e^{jνθ}} = (1/2π) ∫_{−π}^{+π} e^{jνθ} dθ = (e^{jνπ} − e^{−jνπ})/(2π j ν) = sin(νπ)/(νπ)
or E{e^{jνθ}} = E{cos νθ} + j E{sin νθ} = sin(νπ)/(νπ),   which vanishes for ν = 1 and ν = 2.
E{x(t+τ) x(t)} = a² E{cos[ω(t+τ) + θ] cos(ωt + θ)}
              = (a²/2) E{cos ωτ} + (a²/2) E{cos[ω(2t+τ) + 2θ]}
              = (a²/2) E{cos ωτ} + (a²/2) [E{cos ω(2t+τ)} E{cos 2θ} − E{sin ω(2t+τ)} E{sin 2θ}]
              = (a²/2) E{cos ωτ}     (since E{cos 2θ} = E{sin 2θ} = 0 and θ is independent of ω)
72
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Proof of Existence Theorem 3 (continue – 2)
We have x(t) := a cos(ωt + θ), with
E{x(t)} = 0
E{x(t+τ) x(t)} = (a²/2) E{cos ωτ} = (a²/2) ∫_{−∞}^{+∞} cos(ωτ) f(ω) dω = Rx(τ)     (definition of f(ω))
Because of those two properties x(t) is wide-sense stationary, with a power spectrum given by
Sx(ω) = ∫_{−∞}^{+∞} Rx(τ) [cos ωτ − j sin ωτ] dτ = ∫_{−∞}^{+∞} Rx(τ) cos(ωτ) dτ           (Fourier; Rx(τ) = Rx(−τ))
Rx(τ) = (1/2π) ∫_{−∞}^{+∞} Sx(ω) [cos ωτ + j sin ωτ] dω = (1/2π) ∫_{−∞}^{+∞} Sx(ω) cos(ωτ) dω     (Inverse Fourier; Sx(ω) = Sx(−ω))
Comparing Rx(τ) = (a²/2) ∫ cos(ωτ) f(ω) dω with the Inverse Fourier expression and using the definition of f(ω):
Sx(ω) = π a² f(ω) = S(ω)
q.e.d.
73
SOLO
Markov Processes
A Markov Process is defined by:
Andrei Andreevich
Markov
1856 - 1922
p(x(t) ∈ Ω, t | x(τ), τ ≤ t1) = p(x(t) ∈ Ω, t | x(t1), t1),   ∀ t > t1
i.e., for the Random Process, the past up to any time t1 is fully defined by the process at t1.
Examples of Markov Processes:
1. Continuous Dynamic System
   ẋ(t) = f(t, x, u, w)
   z(t) = h(t, x, u, v)
2. Discrete Dynamic System
   x(tk) = f(tk−1, xk−1, uk−1, wk−1)
   z(tk) = h(tk, xk, uk, vk)
x - state space vector (n x 1)
u - input vector (m x 1)
z - measurement vector (p x 1)
w - white input noise vector (n x 1)
v - white measurement noise vector (p x 1)
Recursive Bayesian Estimation
74
Recursive Bayesian EstimationSOLO
Markov Process: the present discrete state probability depends only on the previous state:
p(xk | xk−1, xk−2, …, x0) = p(xk | xk−1)
Using this property we obtain:
p(xk, xk−1, …, x0) = p(xk | xk−1, …, x0) p(xk−1, …, x0)
                   = p(xk | xk−1) p(xk−1 | xk−2, …, x0) p(xk−2, …, x0)
                   = p(x0) ∏_{i=1}^{k} p(xi | xi−1)
The Markov Process is defined if we know p(x0) and p(xi | xi−1) for each i.
Table of Content
75
Recursive Bayesian EstimationSOLO
In a Markovian system the probability of the current true state depends only on the previous state, and is independent of the other earlier states:
p(xk | xk−1, xk−2, …, x0) = p(xk | xk−1)
Similarly the measurement at the k-th time-step depends only upon the current true state, so it is conditionally independent of all other earlier states, given the current state:
p(zk | xk, xk−1, …, x0) = p(zk | xk)
p(zk, xk) = p(zk | xk) p(xk) = p(xk | zk) p(zk)
From the definition of the Markovian system (see Figure), p(xk | xk−1) is defined by f and the statistics of x and w, and p(zk | xk) is defined by h and the statistics of x and v.
[Figure: Hidden States x0 → x1 → … → xk−1 → xk propagated through f(xk−1, uk−1, wk−1); Measurements zk = h(xk, vk); measurement sets Z1:k−1, Z1:k]
76
Recursive Bayesian EstimationSOLO
Analytic Computations of p(xk | xk−1) and p(zk | xk)
xk = fk−1(xk−1, uk−1, wk−1),   given pw(wk−1), px0(x0)
zk = hk(xk, vk),   given pv(vk)
Suppose that we can obtain all w^j_{k−1}, j = 1,…,N_{xk}, for which xk = fk−1(xk−1, uk−1, w^j_{k−1}), i.e. w^j_{k−1} = f⁻¹_{k−1}(xk; xk−1, uk−1); then, by the fundamental theorem on functions of a random variable,
p(xk | xk−1) dxk = Pr{xk ≤ Xk ≤ xk + dxk | xk−1} = Σ_{j=1}^{N_{xk}} pw(w^j_{k−1}) |dw^j_{k−1}| = Σ_{j=1}^{N_{xk}} pw(w^j_{k−1}) dxk / |∇w fk−1(xk−1, uk−1, w^j_{k−1})|
p(xk | xk−1) = Σ_{j=1}^{N_{xk}} pw(w^j_{k−1}) / |∇w fk−1(xk−1, uk−1, w^j_{k−1})|
In the same way, suppose that we can obtain all v^j_k, j = 1,…,N_{zk}, for which zk = hk(xk, v^j_k), i.e. v^j_k = h⁻¹_k(zk; xk); then
p(zk | xk) = Σ_{j=1}^{N_{zk}} pv(v^j_k) / |∇v hk(xk, v^j_k)|
This is a Conceptual, Not a Practical, Procedure.
77
Recursive Bayesian EstimationSOLO
Analytic Computations of p(xk | xk−1) and p(zk | xk) (continue – 1)
For additive noise
xk = fk−1(xk−1, uk−1) + wk−1,   given pw(wk−1), px0(x0)
zk = hk(xk) + vk,               given pv(vk)
we have
wk−1 = xk − fk−1(xk−1, uk−1)
vk = zk − hk(xk)
therefore
p(xk | xk−1) = pw[xk − fk−1(xk−1, uk−1)]
and
p(zk | xk) = pv[zk − hk(xk)]
[Figure: xk−1 → f(xk−1, uk−1) + wk−1 → xk → h(xk) + vk → zk]
78
SOLO
Numerical Computations of p(xk | xk−1) and p(zk | xk)
xk = fk−1(xk−1, wk−1)
zk = hk(xk, vk)
wk−1 & vk are system and measurement white-noise sequences, independent of past and current states and of each other, and having known PDFs p(wk−1) & p(vk).
We want to compute p(xk | Z1:k) recursively, assuming knowledge of p(xk−1 | Z1:k−1), in two stages, prediction (before) and update (after measurement):
1  Prediction (before measurement):
   p(xk | Z1:k−1) = ∫ p(xk | xk−1) p(xk−1 | Z1:k−1) dxk−1
2  Update (after measurement):
   p(xk | Z1:k) = p(xk | zk, Z1:k−1) = p(zk | xk) p(xk | Z1:k−1) / p(zk | Z1:k−1)
                = p(zk | xk) p(xk | Z1:k−1) / ∫ p(zk | xk) p(xk | Z1:k−1) dxk        (Bayes: p(a|b) = p(b|a) p(a)/p(b))
We need to evaluate the following integrals:
p(xk | xk−1) = ∫ δ(xk − f(xk−1, wk−1)) p(wk−1) dwk−1
p(zk | xk) = ∫ δ(zk − h(xk, vk)) p(vk) dvk
Analytic solutions for those integral equations do not exist in the general case. We use the numeric Monte Carlo Method to evaluate the integrals:
Generate (Draw):  w^i_{k−1} ~ p(wk−1)  &  v^i_k ~ p(vk),   i = 1,…,NS
p(xk | xk−1) ≈ Σ_{i=1}^{NS} δ(xk − f(xk−1, w^i_{k−1})) / NS
p(zk | xk) ≈ Σ_{i=1}^{NS} δ(zk − h(xk, v^i_k)) / NS
or, with x^i_k = f(xk−1, w^i_{k−1}) and z^i_k = h(xk, v^i_k):
p(xk | xk−1) ≈ Σ_{i=1}^{NS} δ(xk − x^i_k) / NS,    p(zk | xk) ≈ Σ_{i=1}^{NS} δ(zk − z^i_k) / NS
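The two-stage recursion above can be evaluated directly on a grid when the state is scalar. The sketch below is a point-mass (grid) approximation for an assumed random-walk model xk = xk−1 + wk−1 and measurement zk = xk + vk with Gaussian noises; it is illustrative only, not the filter developed later in the presentation.

```python
import numpy as np

def gauss(x, m, s):
    return np.exp(-(x - m)**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

grid = np.linspace(-10, 10, 401)
dx = grid[1] - grid[0]
sig_w, sig_v = 0.5, 1.0

# p(x_k | x_{k-1}) on the grid for the assumed additive-Gaussian model
trans = gauss(grid[:, None], grid[None, :], sig_w)      # rows: x_k, cols: x_{k-1}

post = gauss(grid, 0.0, 2.0)                            # p(x_0)
post /= post.sum() * dx

for z in [1.0, 1.5, 2.2]:                               # made-up measurement sequence
    pred = trans @ post * dx                            # prediction: ∫ p(x_k|x_{k-1}) p(x_{k-1}|Z) dx_{k-1}
    like = gauss(z, grid, sig_v)                        # p(z_k | x_k)
    post = like * pred
    post /= post.sum() * dx                             # update: Bayes normalization
    print(z, (grid * post * dx).sum())                  # posterior mean estimate
```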
79
Recursive Bayesian EstimationSOLO
Monte Carlo Computations of p(xk | xk−1) and p(zk | xk)
xk = fk−1(xk−1, uk−1, wk−1),   given pw(wk−1), px0(x0);     zk = hk(xk, vk),   given pv(vk)
0  Initialization: Generate (Draw)  x^i_0 ~ px0(x0),   i = 1,…,NS
For k ∈ {1, …, ∞}:
1  At stage k−1, Generate (Draw) NS samples  w^i_{k−1} ~ pw(wk−1),   i = 1,…,NS
2  State Update:  x^i_k = fk−1(x^i_{k−1}, uk−1, w^i_{k−1}),   i = 1,…,NS
   Compute Histograms of xk | xk−1 to obtain  p(xk | xk−1) ≈ Σ_{i=1}^{NS} δ(xk − x^i_k) / NS
3  Generate (Draw) Measurement Noise  v^i_k ~ pv(vk),   i = 1,…,NS
4  Measurement Update:  z^i_k = hk(x^i_k, v^i_k),   i = 1,…,NS
   Compute Histograms of zk | xk to obtain  p(zk | xk) ≈ Σ_{i=1}^{NS} δ(zk − z^i_k) / NS
k := k+1 & return to 1
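A minimal sketch of steps 1–2 above: drawing process-noise samples, pushing them through an assumed nonlinear f, and reading p(xk | xk−1) off a histogram. The model f(x, w) = 0.5x + 25x/(1 + x²) + w is just an arbitrary nonlinear example, not one taken from this presentation.

```python
import numpy as np

rng = np.random.default_rng(9)
Ns = 100000

f = lambda x, w: 0.5 * x + 25.0 * x / (1.0 + x**2) + w   # assumed state equation
x_prev = 0.5                                             # conditioning value x_{k-1}

w = rng.normal(0.0, 1.0, Ns)        # step 1: w^i_{k-1} ~ p_w
x_k = f(x_prev, w)                  # step 2: x^i_k = f(x_{k-1}, w^i_{k-1})

# the histogram of the samples approximates p(x_k | x_{k-1} = 0.5)
dens, edges = np.histogram(x_k, bins=60, density=True)
print(x_k.mean(), x_k.std())        # ≈ f(x_prev, 0) and ≈ σ_w for this additive-noise case
```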
SOLO
Stochastic Processes deal with systems corrupted by noise. A description of those processes is
given in “Stochastic Processes” Presentation. Here we give only one aspect of those processes.
( ) ( ) ( ) [ ]fttttwddttxftxd ,, 0∈+=
A continuous dynamic system is described by:
Stochastic Processes
( )tx - n- dimensional state vector
( )twd - n- dimensional process noise vector
Assuming system measurements at discrete time tk given by:
( ) ( )( ) [ ]fkkkkk tttvttxhtz ,,, 0∈=
kv - m- dimensional measurement noise vector at tk
We are interested in the probability of the state at time t given the set of discrete
measurements until (included) time tk < t.
x
( )kZtxp |,
{ }kk zzzZ ,,, 21 = - set of all measurements up to and including time tk.
The time evolution of the probability density function is described by the
Fokker–Planck equation.
A solution to the one-dimensional
Fokker–Planck equation, with both the
drift and the diffusion term. The initial
condition is a Dirac delta function in
x = 1, and the distribution drifts
towards x = 0.
The Fokker–Planck equation describes the time evolution of
the probability density function of the position of a particle, and
can be generalized to other observables as well. It is named after
Adriaan Fokker and Max Planck and is also known as the
Kolmogorov forward equation. The first use of the Fokker–
Planck equation was the statistical description of Brownian
motion of a particle in a fluid.
In one spatial dimension x, the Fokker–Planck equation for a
process with drift D1(x,t) and diffusion D2(x,t) is
More generally, the time-dependent probability distribution
may depend on a set of N macrovariables xi. The general
form of the Fokker–Planck equation is then
where D1
is the drift vector and D2
the diffusion tensor; the latter results from the presence of the
stochastic force.
Fokker – Planck Equation
Adriaan Fokker
1887 - 1972
Max Planck
1858 - 1947
SOLO
Adriaan Fokker
„Die mittlere Energie rotierender
elektrischer Dipole im Strahlungsfeld"
Annalen der Physik 43, (1914) 810-
820
Max Plank, „Ueber einen Satz der
statistichen Dynamik und eine
Erweiterung in der Quantumtheorie“,
Sitzungberichte der Preussischen
Akadademie der Wissenschaften
(1917) p. 324-341
Stochastic Processes
( ) ( ) ( )[ ] ( ) ( )[ ]txftxD
x
txftxD
x
txf
t
,,,,, 22
2
1
∂
∂
+
∂
∂
−=
∂
∂
( )[ ] ( )[ ]∑∑∑ = == ∂∂
∂
+
∂
∂
−=
∂
∂ N
i
N
j
Nji
ji
N
i
Ni
i
ftxxD
xx
ftxxD
x
f
t 1 1
1
2
2
1
1
1
,,,,,, 
Fokker – Planck Equation (continue – 1)
SOLO
The Fokker–Planck equation can be used for computing the probability densities of stochastic differential equations.
Consider the Itô stochastic differential equation
d X_t = μ(X_t, t) dt + σ(X_t, t) dW_t
where X_t is the state and W_t is a standard M-dimensional Wiener process. If the initial probability distribution is X_0 ~ f(x, 0), then the probability distribution f(x, t) of the state is given by the Fokker–Planck equation with the drift and diffusion terms D_1(x,t) = μ(x,t), D_2(x,t) = σ²(x,t)/2:

∂f(x,t)/∂t = −∂/∂x [D_1(x,t) f(x,t)] + ∂²/∂x² [D_2(x,t) f(x,t)]

Similarly, a Fokker–Planck equation can be derived for Stratonovich stochastic differential equations. In this case, noise-induced drift terms appear if the noise strength is state-dependent.
Fokker – Planck Equation (continue – 2)
Derivation of the Fokker–Planck Equation
SOLO
Start with
p(x_k, x_{k-1}) = p(x_k | x_{k-1}) p(x_{k-1})
and
p(x_k) = ∫ p(x_k, x_{k-1}) d x_{k-1} = ∫ p(x_k | x_{k-1}) p(x_{k-1}) d x_{k-1}
Define  t = t_k,  t − Δt = t_{k-1},  x(t) = x_k,  x(t − Δt) = x_{k-1}:
p_{x(t)}[x(t)] = ∫ p[x(t) | x(t−Δt)] p[x(t−Δt)] d x(t−Δt)
Let us use the Characteristic Function of p[x(t) | x(t−Δt)]:
Φ_{x(t)|x(t−Δt)}(s) := ∫ exp{−s [x(t) − x(t−Δt)]} p[x(t) | x(t−Δt)] d x(t)
The inverse transform is
p[x(t) | x(t−Δt)] = (1/2πj) ∫_{−j∞}^{+j∞} exp{s [x(t) − x(t−Δt)]} Φ_{x(t)|x(t−Δt)}(s) d s
Using the Chapman–Kolmogorov Equation we obtain:
p_{x(t)}[x(t)] = ∫ { (1/2πj) ∫_{−j∞}^{+j∞} exp{s [x(t) − x(t−Δt)]} Φ_{x(t)|x(t−Δt)}(s) d s } p[x(t−Δt)] d x(t−Δt)
Stochastic Processes
Fokker – Planck Equation (continue – 3)
Derivation of the Fokker–Planck Equation (continue – 1)
SOLO
The Characteristic Function can be expressed in terms of the moments about x(t−Δt) as:
Φ_{x(t)|x(t−Δt)}(s) = 1 + Σ_{i=1}^{∞} ((−s)^i / i!) E{[x(t) − x(t−Δt)]^i | x(t−Δt)}
Therefore
p_{x(t)}[x(t)] = (1/2πj) ∫∫ exp{s [x(t) − x(t−Δt)]} [ 1 + Σ_{i=1}^{∞} ((−s)^i / i!) E{[x(t) − x(t−Δt)]^i | x(t−Δt)} ] p[x(t−Δt)] d s d x(t−Δt)
Use the fact that
(1/2πj) ∫_{−j∞}^{+j∞} s^i exp{−s [x(t) − x(t−Δt)]} d s = (−1)^i ∂^i δ[x(t) − x(t−Δt)] / ∂x(t)^i,   i = 0, 1, 2, …
where δ[u] is the Dirac delta function:
δ[u] = (1/2πj) ∫_{−j∞}^{+j∞} exp{s u} d s,    ∫ F(u) δ(u) d u = F(0)  for any F continuous at u = 0
Stochastic Processes
Fokker – Planck Equation (continue – 4)
Derivation of the Fokker–Planck Equation (continue – 2)
SOLO
Useful results related to integrals involving the Delta (Dirac) function:
∫ f(u) δ(u − a) d u = f(a)
∫ f(u) (d/du) δ(u − a) d u = − d f(u)/d u |_{u=a}
∫ f(u) (d^i/du^i) δ(u − a) d u = (−1)^i d^i f(u)/d u^i |_{u=a}
(all follow from the transform representation of δ and integration by parts)
Stochastic Processes
Fokker – Planck Equation (continue – 5)
Derivation of the Fokker–Planck Equation (continue – 3)
SOLO
The i = 0 term gives
∫ δ[x(t) − x(t−Δt)] p[x(t−Δt)] d x(t−Δt) = p_{x(t−Δt)}[x(t)]
and, using the delta-derivative identity above, the remaining terms give
Σ_{i=1}^{∞} ((−1)^i / i!) (∂^i/∂x(t)^i) { E{[x(t) − x(t−Δt)]^i | x(t−Δt)} p_{x(t−Δt)}[x(t)] }
Therefore
p_{x(t)}[x(t)] = p_{x(t−Δt)}[x(t)] + Σ_{i=1}^{∞} ((−1)^i / i!) (∂^i/∂x(t)^i) { E{[x(t) − x(t−Δt)]^i | x(t−Δt)} p_{x(t−Δt)}[x(t)] }
Rearranging, dividing by Δt, and taking the limit Δt → 0, we obtain:
lim_{Δt→0} ( p_{x(t)}[x(t)] − p_{x(t−Δt)}[x(t)] ) / Δt = Σ_{i=1}^{∞} ((−1)^i / i!) (∂^i/∂x(t)^i) { lim_{Δt→0} (1/Δt) E{[x(t) − x(t−Δt)]^i | x(t−Δt)} p_{x(t−Δt)}[x(t)] }
Stochastic Processes
Fokker – Planck Equation (continue – 6)
Derivation of the Fokker–Planck Equation (continue – 4)
SOLO
Define:
m_i[x(t^−), t] := lim_{Δt→0} E{[x(t) − x(t−Δt)]^i | x(t−Δt)} / Δt,    x(t^−) := lim_{Δt→0} x(t−Δt)
Therefore
∂ p_{x(t)}[x(t)] / ∂t = Σ_{i=1}^{∞} ((−1)^i / i!) (∂^i/∂x(t)^i) { m_i[x(t^−), t] p_{x(t)}[x(t)] }
This equation is called the Stochastic Equation or Kinetic Equation.
It is a partial differential equation that we must solve, with the initial condition:
p_{x(t)}[x(t)] |_{t=0} = p_{x_0}[x(0)]
Stochastic Processes
Fokker – Planck Equation (continue – 7)
Derivation of the Fokker–Planck Equation (continue – 5)
SOLO
We want to find p_{x(t)}[x(t)] where x(t) is the solution of
d x(t)/d t = f[x(t), t] + n_g(t),   t ∈ [t_0, t_f]
where n_g(t) is a Wiener (Gauss) process:
n̂_g := E{n_g(t)} = 0,    E{[n_g(t) − n̂_g][n_g(τ) − n̂_g]} = Q(t) δ(t − τ)
Then
m_1[x(t^−), t] := lim_{Δt→0} E{[x(t) − x(t−Δt)] | x(t−Δt)} / Δt = E{ d x(t)/d t | x(t) } = f[x(t), t] + E{n_g} = f[x(t), t]
m_2[x(t^−), t] := lim_{Δt→0} E{[x(t) − x(t−Δt)]² | x(t−Δt)} / Δt = E{n_g²(t)} = Q(t)
m_i[x(t^−), t] = 0   for i > 2
Therefore we obtain:
∂ p_x[x(t), t] / ∂t = − ∂{ f[x(t), t] p_x[x(t), t] } / ∂x(t) + (1/2) Q(t) ∂² p_x[x(t), t] / ∂x(t)²
Stochastic Processes
Fokker–Planck Equation
Return to Daum
89
Recursive Bayesian EstimationSOLO
Given a nonlinear discrete stochastic Markovian system we want to use k discrete
measurements Z1:k={z1,z2,…,zk} to estimate the hidden state xk. For this we want to
compute the probability of xk given all the measurements Z1:k={z1,z2,…,zk} .
If we know p ( xk| Z1:k ) then xk is estimated using:
x̂_{k|k} := E{x_k | Z_{1:k}} = ∫ x_k p(x_k | Z_{1:k}) d x_k
P_{k|k} = E{(x_k − x̂_{k|k})(x_k − x̂_{k|k})^T | Z_{1:k}} = ∫ (x_k − x̂_{k|k})(x_k − x̂_{k|k})^T p(x_k | Z_{1:k}) d x_k
or, more generally, we can compute all moments of the probability distribution p(x_k | Z_{1:k}):
E{g(x_k) | Z_{1:k}} = ∫ g(x_k) p(x_k | Z_{1:k}) d x_k
Bayesian Estimation Introduction
Problem: Estimate the hidden States of a Non-linear Dynamic Stochastic System from Noisy Measurements.
[Figure: hidden Markov model: hidden states x_0, x_1, …, x_{k-1}, x_k evolve via f(x_{k-1}, w_{k-1}); measurements z_1, z_2, …, z_k are generated via h(x_k, v_k); measurement sets Z_{1:k-1}, Z_{1:k}]
The knowledge of p(x_k | Z_{1:k}) also allows the computation of the Maximum a Posteriori (MAP) estimate using:
x̂_k^{MAP} = arg max_{x_k} p(x_k | Z_{1:k})
90
Recursive Bayesian EstimationSOLO
To find the expression for p(x_k | Z_{1:k}) we use the theorem of joint probability (Bayes Rule):
p(x_k | Z_{1:k}) = p(x_k, Z_{1:k}) / p(Z_{1:k})
Since Z_{1:k} = {z_k, Z_{1:k-1}}:
p(x_k | Z_{1:k}) = p(x_k, z_k, Z_{1:k-1}) / p(z_k, Z_{1:k-1})
The numerator of this expression is
p(x_k, z_k, Z_{1:k-1}) = p(z_k | x_k, Z_{1:k-1}) p(x_k, Z_{1:k-1}) = p(z_k | x_k, Z_{1:k-1}) p(x_k | Z_{1:k-1}) p(Z_{1:k-1})     (Bayes Rule)
Since the knowledge of x_k supersedes the need for Z_{1:k-1} = {z_1, z_2, …, z_{k-1}}:
p(z_k | x_k, Z_{1:k-1}) ≡ p(z_k | x_k)
and the denominator is
p(z_k, Z_{1:k-1}) = p(z_k | Z_{1:k-1}) p(Z_{1:k-1})     (Bayes Rule)
Therefore:
p(x_k | Z_{1:k}) = p(z_k | x_k) p(x_k | Z_{1:k-1}) p(Z_{1:k-1}) / [ p(z_k | Z_{1:k-1}) p(Z_{1:k-1}) ]
Bayesian Estimation Introduction
91
Recursive Bayesian EstimationSOLO
The final result is:
p(x_k | Z_{1:k}) = p(z_k | x_k) p(x_k | Z_{1:k-1}) / p(z_k | Z_{1:k-1})
Since p(x_k | Z_{1:k}) is a probability distribution it must satisfy:
∫ p(x_k | Z_{1:k}) d x_k = 1
Therefore:
1 = ∫ p(x_k | Z_{1:k}) d x_k = ∫ p(z_k | x_k) p(x_k | Z_{1:k-1}) d x_k / p(z_k | Z_{1:k-1})
and:
p(z_k | Z_{1:k-1}) = ∫ p(z_k | x_k) p(x_k | Z_{1:k-1}) d x_k
p(x_k | Z_{1:k}) = p(z_k | x_k) p(x_k | Z_{1:k-1}) / ∫ p(z_k | x_k) p(x_k | Z_{1:k-1}) d x_k
This is a recursive relation that needs the value of p(x_k | Z_{1:k-1}), assuming that p(z_k | x_k) is obtained from the Markovian system definition (z_k = h(x_k, v_k)).
Bayesian Estimation Introduction
92
Recursive Bayesian EstimationSOLO
The Correction Step is:
p(x_k | Z_{1:k}) = p(z_k | x_k) p(x_k | Z_{1:k-1}) / p(z_k | Z_{1:k-1})
or:
posterior = (likelihood · prior) / evidence
likelihood: given by the observation model, p(z_k | x_k)
prior: given by the prediction equation, p(x_k | Z_{1:k-1})
evidence: the normalizing constant in the denominator,
p(z_k | Z_{1:k-1}) = ∫ p(z_k | x_k) p(x_k | Z_{1:k-1}) d x_k
Bayesian Estimation Introduction
93
Recursive Bayesian EstimationSOLO
Using (Bayes Rule):
p(x_k, x_{k-1} | Z_{1:k-1}) = p(x_k | x_{k-1}, Z_{1:k-1}) p(x_{k-1} | Z_{1:k-1})
Since for a Markov Process the knowledge of x_{k-1} supersedes the need for Z_{1:k-1} = {z_1, z_2, …, z_{k-1}}:
p(x_k | x_{k-1}, Z_{1:k-1}) = p(x_k | x_{k-1})
We obtain the Chapman – Kolmogorov Equation:
p(x_k | Z_{1:k-1}) = ∫ p(x_k, x_{k-1} | Z_{1:k-1}) d x_{k-1} = ∫ p(x_k | x_{k-1}) p(x_{k-1} | Z_{1:k-1}) d x_{k-1}
Sydney Chapman 1888 – 1970
Andrey Nikolaevich Kolmogorov 1903 – 1987
Bayesian Estimation Introduction
94
Recursive Bayesian EstimationSOLO
0  Initialize with p(x_0);  state prediction  x̂_{k|k-1} = f_{k-1}(x̂_{k-1|k-1})
At stage k:
1  Prediction phase (before z_k measurement): using p(x_{k-1} | Z_{1:k-1}) from time-step k-1 and p(x_k | x_{k-1}) of the Markov system, compute
p(x_k | Z_{1:k-1}) = ∫ p(x_k, x_{k-1} | Z_{1:k-1}) d x_{k-1} = ∫ p(x_k | x_{k-1}) p(x_{k-1} | Z_{1:k-1}) d x_{k-1}
2  Correction Step (after z_k measurement): using p(x_k | Z_{1:k-1}) from the Prediction phase and p(z_k | x_k) of the Markov system, compute
p(x_k | Z_{1:k}) = p(z_k | x_k) p(x_k | Z_{1:k-1}) / ∫ p(z_k | x_k) p(x_k | Z_{1:k-1}) d x_k
3  Filtering:
x̂_{k|k} = E{x_k | Z_{1:k}} = ∫ x_k p(x_k | Z_{1:k}) d x_k
P_{k|k} = E{(x_k − x̂_{k|k})(x_k − x̂_{k|k})^T | Z_{1:k}} = ∫ (x_k − x̂_{k|k})(x_k − x̂_{k|k})^T p(x_k | Z_{1:k}) d x_k
k := k+1
Bayesian Estimation Introduction - Summary
95
Recursive Bayesian EstimationSOLO
1  Prediction phase (before z_k measurement):
p(x_k | Z_{1:k-1}) = ∫ p(x_k, x_{k-1} | Z_{1:k-1}) d x_{k-1} = ∫ p(x_k | x_{k-1}) p(x_{k-1} | Z_{1:k-1}) d x_{k-1}
2  Correction Step (after z_k measurement):
p(x_k | Z_{1:k}) = p(z_k | x_k) p(x_k | Z_{1:k-1}) / ∫ p(z_k | x_k) p(x_k | Z_{1:k-1}) d x_k
Bayesian Estimation Introduction - Summary
This is a Conceptual Solution because the Integrals are Often Not Tractable.
An optimal solution is possible for some restricted cases (a grid-based sketch is given below):
• Linear Systems with Gaussian Noises (system and measurements)
• Grid-Based Filters
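A minimal grid-based sketch of the prediction/correction recursion above for a scalar state; the transition model, measurement model and noise levels are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(-10, 10, 401)          # discretized state space x_k
dx = grid[1] - grid[0]

def gauss(u, s):
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2 * np.pi))

f = lambda x: 0.9 * x                     # assumed state transition
h = lambda x: x                           # assumed measurement model
q, r = 1.0, 0.5                           # assumed noise standard deviations

# p(x_k | x_{k-1}) tabulated on the grid, and initial prior p(x_0)
trans = gauss(grid[:, None] - f(grid[None, :]), q)   # rows: x_k, cols: x_{k-1}
post = gauss(grid - 0.0, 2.0)

x_true = 3.0
for k in range(1, 6):
    x_true = f(x_true) + rng.normal(0, q)
    z = h(x_true) + rng.normal(0, r)

    prior = trans @ post * dx                         # Chapman-Kolmogorov (prediction)
    like = gauss(z - h(grid), r)                      # p(z_k | x_k)
    post = like * prior
    post /= np.trapz(post, grid)                      # divide by the evidence

    print(f"k={k}: truth={x_true:+.2f}, posterior mean={np.trapz(grid * post, grid):+.2f}")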
Table of Content
96
SOLO
Linear Gaussian Systems
A Linear Combination of Independent Gaussian random variables is also a Gaussian random variable:   S_m := a_1 X_1 + a_2 X_2 + … + a_m X_m
Gaussian distribution:
p_{X_i}(X_i; μ_i, σ_i) = (1/√(2π σ_i²)) exp[ −(X_i − μ_i)² / (2σ_i²) ]
Define the Moment-Generating (Characteristic) Function:
Φ_{X_i}(ω) := ∫ exp(jωX_i) p_{X_i}(X_i) d X_i = exp( jωμ_i − ω²σ_i²/2 )
Proof: define Y_i := a_i X_i  →  p_{Y_i}(Y_i) = (1/|a_i|) p_{X_i}(Y_i/a_i)
Φ_{Y_i}(ω) := ∫ exp(jωY_i) p_{Y_i}(Y_i) d Y_i = ∫ exp(jω a_i X_i) p_{X_i}(X_i) d X_i = exp( jω a_i μ_i − ω² a_i² σ_i²/2 )
Since Y_1, …, Y_m are independent:
Φ_{S_m}(ω) = E{exp[jω(Y_1 + … + Y_m)]} = Φ_{Y_1}(ω) · Φ_{Y_2}(ω) ⋯ Φ_{Y_m}(ω)
           = exp[ jω(a_1μ_1 + a_2μ_2 + … + a_mμ_m) − ω²(a_1²σ_1² + a_2²σ_2² + … + a_m²σ_m²)/2 ]
Review of Probability
97
SOLO
Linear Gaussian Systems (continue – 1)
A Linear Combination of Independent Gaussian random variables is also a Gaussian random variable:   S_m := a_1 X_1 + a_2 X_2 + … + a_m X_m
Proof (continue – 1): we found
Φ_{S_m}(ω) = exp[ jω(a_1μ_1 + … + a_mμ_m) − ω²(a_1²σ_1² + … + a_m²σ_m²)/2 ]
Therefore the Linear Combination of Independent Gaussian Random Variables is a Gaussian Random Variable with
μ_{S_m} = a_1μ_1 + a_2μ_2 + … + a_mμ_m
σ²_{S_m} = a_1²σ_1² + a_2²σ_2² + … + a_m²σ_m²
Therefore the S_m probability distribution is:
p_{S_m}(S_m; μ_{S_m}, σ_{S_m}) = (1/√(2π σ²_{S_m})) exp[ −(S_m − μ_{S_m})² / (2σ²_{S_m}) ]
q.e.d.
Review of Probability
98
Recursive Bayesian EstimationSOLO
Linear Gaussian Markov Systems (continue – 2)
A Linear Gaussian Markov System is defined as
x_k = f(k−1, x_{k-1}, u_{k-1}, w_{k-1})   →   x_k = Φ_{k-1} x_{k-1} + G_{k-1} u_{k-1} + Γ_{k-1} w_{k-1}
z_k = h(k, x_k, u_k, v_k)                 →   z_k = H_k x_k + v_k
w_{k-1} and v_k are white noises, zero mean, Gaussian, independent:
e_x(k) := x(k) − E{x(k)},    E{e_x(k) e_x^T(k)} = P_x(k)
e_w(k) := w(k) − E{w(k)} = w(k),    E{e_w(k) e_w^T(l)} = Q(k) δ_{k,l}
e_v(k) := v(k) − E{v(k)} = v(k),    E{e_v(k) e_v^T(l)} = R(k) δ_{k,l}
E{e_w(k) e_v^T(l)} = 0,     δ_{k,l} = 1 if k = l, 0 if k ≠ l
p_w(w) = N(w; 0, Q) = (1/((2π)^{n/2} |Q|^{1/2})) exp( −½ w^T Q^{-1} w )
p_v(v) = N(v; 0, R) = (1/((2π)^{p/2} |R|^{1/2})) exp( −½ v^T R^{-1} v )
p_{x_0}(x_{t=0}) = N(x_0; x̄_0, P_{0|0}) = (1/((2π)^{n/2} |P_{0|0}|^{1/2})) exp[ −½ (x_0 − x̄_0)^T P_{0|0}^{-1} (x_0 − x̄_0) ]
99
Recursive Bayesian Estimation
SOLO
Linear Gaussian Markov Systems (continue – 3)
x_k = Φ_{k-1} x_{k-1} + G_{k-1} u_{k-1} + Γ_{k-1} w_{k-1}
Prediction phase (before z_k measurement)
The expectation is
x̂_{k|k-1} := E{x_k | Z_{1:k-1}} = Φ_{k-1} E{x_{k-1} | Z_{1:k-1}} + G_{k-1} u_{k-1} + Γ_{k-1} E{w_{k-1} | Z_{1:k-1}}   (the last term is 0)
or    x̂_{k|k-1} = Φ_{k-1} x̂_{k-1|k-1} + G_{k-1} u_{k-1}
P_{k|k-1} := E{ [x_k − x̂_{k|k-1}][x_k − x̂_{k|k-1}]^T | Z_{1:k-1} }
           = E{ [Φ_{k-1}(x_{k-1} − x̂_{k-1|k-1}) + Γ_{k-1} w_{k-1}][Φ_{k-1}(x_{k-1} − x̂_{k-1|k-1}) + Γ_{k-1} w_{k-1}]^T | Z_{1:k-1} }
           = Φ_{k-1} P_{k-1|k-1} Φ_{k-1}^T + Γ_{k-1} Q_{k-1} Γ_{k-1}^T     (the cross-terms vanish since E{(x_{k-1} − x̂_{k-1|k-1}) w_{k-1}^T} = 0)
Since x_k = Φ_{k-1} x_{k-1} + G_{k-1} u_{k-1} + Γ_{k-1} w_{k-1} is a Linear Combination of Independent Gaussian Random Variables:
p(x_k | Z_{1:k-1}) = N(x_k; x̂_{k|k-1}, P_{k|k-1})
100
SOLO
For the particular vector measurement equation  z_k = H_k x_k + v_k, where the measurement noise is Gaussian (normal) with zero mean, p_v(v) = N(v; 0, R_k), and independent of x_k, the conditional probability p_{z|x}(z | x) can be written, using Bayes rule, as:
p_{z|x}(z | x) = p_{x,z}(x, z) / p_x(x)
The measurement noise can be related to x and z by the function  v = z − H x = f(x, z), whose Jacobian with respect to z is
J = ∂f/∂z = I_{p×p}
so that the joint probability of x and z is given by
p_{x,z}(x, z) = p_{x,v}(x, v) / |J J^T|^{1/2} = p_{x,v}(x, v)
Since the measurement noise v is independent of x:
p_{x,z}(x, z) = p_{x,v}(x, v) = p_x(x) · p_v(v)
Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 4)
Correction Step (after z_k measurement) - 1st Way
p(x_k | Z_{1:k}) = p(z_k | x_k) p(x_k | Z_{1:k-1}) / p(z_k | Z_{1:k-1})
101
Consider a Gaussian vector x_k, with p_x(x_k) = N(x_k; x̂_{k|k-1}, P_{k|k-1}), and the measurement z_k = H_k x_k + v_k, where the Gaussian noise v, p_v(v) = N(v; 0, R_k), is independent of x_k. Then
p_z(z_k) = ∫ p_{x,z}(x_k, z_k) d x_k = ∫ p_{z|x}(z_k | x_k) p_x(x_k) d x_k
p_z(z_k) is Gaussian with
E(z_k) = E(H_k x_k + v_k) = H_k E(x_k) + E(v_k) = H_k x̂_{k|k-1}
cov(z_k) = E{[z_k − E(z_k)][z_k − E(z_k)]^T} = E{[H_k(x_k − x̂_{k|k-1}) + v_k][H_k(x_k − x̂_{k|k-1}) + v_k]^T} = H_k P_{k|k-1} H_k^T + R_k
(the cross-terms vanish since v_k is independent of x_k), so that
p_z(z) = (1/((2π)^{p/2} |H P H^T + R|^{1/2})) exp{ −½ [z − H x̂]^T [H P H^T + R]^{-1} [z − H x̂] }
p_{x|Z}(x_k | Z_{1:k-1}) = (1/((2π)^{n/2} |P_{k|k-1}|^{1/2})) exp{ −½ (x_k − x̂_{k|k-1})^T P_{k|k-1}^{-1} (x_k − x̂_{k|k-1}) }
p_{z|x}(z_k | x_k) = p_v(z_k − H_k x_k) = (1/((2π)^{p/2} |R_k|^{1/2})) exp{ −½ (z_k − H_k x_k)^T R_k^{-1} (z_k − H_k x_k) }
Recursive Bayesian Estimation
SOLO
Linear Gaussian Markov Systems (continue – 5)
Correction Step (after z_k measurement) 1st Way (continue – 1)
102
from which
p(x_k | Z_{1:k}) = p(z_k | x_k) p(x_k | Z_{1:k-1}) / p(z_k | Z_{1:k-1})
= [ |H_k P_{k|k-1} H_k^T + R_k|^{1/2} / ((2π)^{n/2} |R_k|^{1/2} |P_{k|k-1}|^{1/2}) ] ·
  exp{ −½ [ (z_k − H_k x_k)^T R_k^{-1} (z_k − H_k x_k) + (x_k − x̂_{k|k-1})^T P_{k|k-1}^{-1} (x_k − x̂_{k|k-1}) − (z_k − H_k x̂_{k|k-1})^T [H_k P_{k|k-1} H_k^T + R_k]^{-1} (z_k − H_k x̂_{k|k-1}) ] }
Recursive Bayesian Estimation
SOLO
Linear Gaussian Markov Systems (continue – 6)
Correction Step (after z_k measurement) 1st Way (continue – 2)
103
Consider the exponent
q := (z_k − H_k x_k)^T R_k^{-1} (z_k − H_k x_k) + (x_k − x̂_{k|k-1})^T P_{k|k-1}^{-1} (x_k − x̂_{k|k-1}) − (z_k − H_k x̂_{k|k-1})^T [R_k + H_k P_{k|k-1} H_k^T]^{-1} (z_k − H_k x̂_{k|k-1})
Using the Matrix Inverse Lemma:
[R_k + H_k P_{k|k-1} H_k^T]^{-1} = R_k^{-1} − R_k^{-1} H_k [P_{k|k-1}^{-1} + H_k^T R_k^{-1} H_k]^{-1} H_k^T R_k^{-1}
Define:
P_{k|k} := [P_{k|k-1}^{-1} + H_k^T R_k^{-1} H_k]^{-1} = P_{k|k-1} − P_{k|k-1} H_k^T [R_k + H_k P_{k|k-1} H_k^T]^{-1} H_k P_{k|k-1}     (Matrix Inverse Lemma)
Completing the square in x_k, the exponent becomes
q = [x_k − x̂_{k|k-1} − P_{k|k} H_k^T R_k^{-1} (z_k − H_k x̂_{k|k-1})]^T P_{k|k}^{-1} [x_k − x̂_{k|k-1} − P_{k|k} H_k^T R_k^{-1} (z_k − H_k x̂_{k|k-1})]
so that
p_{x|z}(x_k | Z_{1:k}) = (1/((2π)^{n/2} |P_{k|k}|^{1/2})) exp{ −½ [x_k − x̂_{k|k-1} − P_{k|k} H_k^T R_k^{-1} (z_k − H_k x̂_{k|k-1})]^T P_{k|k}^{-1} [x_k − x̂_{k|k-1} − P_{k|k} H_k^T R_k^{-1} (z_k − H_k x̂_{k|k-1})] }
Recursive Bayesian Estimation
SOLO
Linear Gaussian Markov Systems (continue – 7)
Correction Step (after z_k measurement) 1st Way (continue – 3)
104
The maximum of p_{x|z}(x_k | Z_{1:k}) over x_k is attained at
x̂_{k|k} := x_k* = x̂_{k|k-1} + P_{k|k} H_k^T R_k^{-1} (z_k − H_k x̂_{k|k-1}) = E{x_k | Z_{1:k}}
where:
P_{k|k} := [P_{k|k-1}^{-1} + H_k^T R_k^{-1} H_k]^{-1} = E{(x_k − x̂_{k|k})(x_k − x̂_{k|k})^T | Z_{1:k}}
Recursive Bayesian Estimation
SOLO
Linear Gaussian Markov Systems (continue – 8)
Correction Step (after z_k measurement) 1st Way (continue – 4)
105
Recursive Bayesian Estimation
SOLO
Linear Gaussian Markov Systems (continue – 9)
Summary 1st Way – Kalman Filter
Initial Conditions:    x̂_{0|0} = E{x_0},    P_{0|0} := E{(x_0 − x̂_{0|0})(x_0 − x̂_{0|0})^T}
Prediction phase (before z_k measurement):
x̂_{k|k-1} = Φ_{k-1} x̂_{k-1|k-1} + G_{k-1} u_{k-1}
P_{k|k-1} = Φ_{k-1} P_{k-1|k-1} Φ_{k-1}^T + Γ_{k-1} Q_{k-1} Γ_{k-1}^T
ẑ_{k|k-1} = E{z_k | Z_{1:k-1}} = E{H_k x_k + v_k | Z_{1:k-1}} = H_k x̂_{k|k-1}
Correction Step (after z_k measurement):
P_{k|k} := [P_{k|k-1}^{-1} + H_k^T R_k^{-1} H_k]^{-1}
K_k := P_{k|k} H_k^T R_k^{-1}
x̂_{k|k} = E{x_k | Z_{1:k}} = x̂_{k|k-1} + K_k (z_k − H_k x̂_{k|k-1}) = x̂_{k|k-1} + K_k (z_k − ẑ_{k|k-1})
106
Recursive Bayesian EstimationSOLO
Linear Gaussian Markov Systems (continue – 10)
z_k = H_k x_k + v_k,    p_v(v) = N(v; 0, R) = (1/((2π)^{p/2} |R|^{1/2})) exp( −½ v^T R^{-1} v )
p_z(z_k) = (1/((2π)^{p/2} |H_k P_{k|k-1} H_k^T + R_k|^{1/2})) exp{ −½ [z_k − H_k x̂_{k|k-1}]^T [H_k P_{k|k-1} H_k^T + R_k]^{-1} [z_k − H_k x̂_{k|k-1}] }
from which    ẑ_{k|k-1} = E{z_k | Z_{1:k-1}} = H_k x̂_{k|k-1}
P^{zz}_{k|k-1} = E{(z_k − ẑ_{k|k-1})(z_k − ẑ_{k|k-1})^T | Z_{1:k-1}} = H_k P_{k|k-1} H_k^T + R_k =: S_k
We also have
P^{xz}_{k|k-1} = E{(x_k − x̂_{k|k-1})(z_k − ẑ_{k|k-1})^T | Z_{1:k-1}} = E{(x_k − x̂_{k|k-1})[H_k(x_k − x̂_{k|k-1}) + v_k]^T | Z_{1:k-1}} = P_{k|k-1} H_k^T
Correction Step (after z_k measurement) 2nd Way
Define the innovation:    i_k := z_k − ẑ_{k|k-1} = z_k − H_k x̂_{k|k-1}
107
Recursive Bayesian Estimation
SOLO
Joint and Conditional Gaussian Random Variables
2nd Way (continue – 1)
Define  y_k = [x_k; z_k], assumed to be Gaussian distributed. Then
E{y_k | Z_{1:k-1}} = [ E{x_k | Z_{1:k-1}}; E{z_k | Z_{1:k-1}} ] = [ x̂_{k|k-1}; ẑ_{k|k-1} ]
P^{yy}_{k|k-1} = E{ [x_k − x̂_{k|k-1}; z_k − ẑ_{k|k-1}] [x_k − x̂_{k|k-1}; z_k − ẑ_{k|k-1}]^T | Z_{1:k-1} } = [ P^{xx}_{k|k-1}  P^{xz}_{k|k-1} ; P^{zx}_{k|k-1}  P^{zz}_{k|k-1} ]
where:
P^{xx}_{k|k-1} = E{(x_k − x̂_{k|k-1})(x_k − x̂_{k|k-1})^T | Z_{1:k-1}} = P_{k|k-1}
P^{zz}_{k|k-1} = E{(z_k − ẑ_{k|k-1})(z_k − ẑ_{k|k-1})^T | Z_{1:k-1}} = H_k P_{k|k-1} H_k^T + R_k =: S_k
P^{xz}_{k|k-1} = E{(x_k − x̂_{k|k-1})(z_k − ẑ_{k|k-1})^T | Z_{1:k-1}} = P_{k|k-1} H_k^T
Linear Gaussian Markov Systems (continue – 11)
108
2nd Way (continue – 2): the joint and conditional probability density functions (pdf) of x_k and z_k are
p_{x,z}(x_k, z_k | Z_{1:k-1}) = (1/((2π)^{(n+p)/2} |P^{yy}_{k|k-1}|^{1/2})) exp{ −½ (y_k − ŷ_{k|k-1})^T (P^{yy}_{k|k-1})^{-1} (y_k − ŷ_{k|k-1}) }
p_z(z_k | Z_{1:k-1}) = (1/((2π)^{p/2} |P^{zz}_{k|k-1}|^{1/2})) exp{ −½ (z_k − ẑ_{k|k-1})^T (P^{zz}_{k|k-1})^{-1} (z_k − ẑ_{k|k-1}) }
p_{x|z}(x_k | z_k, Z_{1:k-1}) = p_{x,z}(x_k, z_k | Z_{1:k-1}) / p_z(z_k | Z_{1:k-1})
= ( |P^{zz}_{k|k-1}|^{1/2} / ((2π)^{n/2} |P^{yy}_{k|k-1}|^{1/2}) ) exp{ −½ [ (y_k − ŷ_{k|k-1})^T (P^{yy}_{k|k-1})^{-1} (y_k − ŷ_{k|k-1}) − (z_k − ẑ_{k|k-1})^T (P^{zz}_{k|k-1})^{-1} (z_k − ẑ_{k|k-1}) ] }
Linear Gaussian Markov Systems (continue – 12)
109
2nd Way (continue – 3): define   ξ_k := x_k − x̂_{k|k-1},   ζ_k := z_k − ẑ_{k|k-1}   and
[ T^{xx}  T^{xz} ; T^{zx}  T^{zz} ]_{k|k-1} := [ P^{xx}  P^{xz} ; P^{zx}  P^{zz} ]^{-1}_{k|k-1}
Then the exponent is
q := (y_k − ŷ_{k|k-1})^T (P^{yy}_{k|k-1})^{-1} (y_k − ŷ_{k|k-1}) − ζ_k^T (P^{zz}_{k|k-1})^{-1} ζ_k
   = ξ_k^T T^{xx}_{k|k-1} ξ_k + ξ_k^T T^{xz}_{k|k-1} ζ_k + ζ_k^T T^{zx}_{k|k-1} ξ_k + ζ_k^T T^{zz}_{k|k-1} ζ_k − ζ_k^T (P^{zz}_{k|k-1})^{-1} ζ_k
Linear Gaussian Markov Systems (continue – 13)
110
2nd Way (continue – 4): using the Inverse Matrix Lemma for the partitioned matrix
[ A  B ; D  C ]^{-1} = [ (A − B C^{-1} D)^{-1}   −(A − B C^{-1} D)^{-1} B C^{-1} ; −C^{-1} D (A − B C^{-1} D)^{-1}   C^{-1} + C^{-1} D (A − B C^{-1} D)^{-1} B C^{-1} ]
in  [ T^{xx}  T^{xz} ; T^{zx}  T^{zz} ]_{k|k-1} = [ P^{xx}  P^{xz} ; P^{zx}  P^{zz} ]^{-1}_{k|k-1}  gives
T^{xx}_{k|k-1} = [ P^{xx}_{k|k-1} − P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1} P^{zx}_{k|k-1} ]^{-1}
T^{xz}_{k|k-1} = −T^{xx}_{k|k-1} P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1}
(P^{zz}_{k|k-1})^{-1} = T^{zz}_{k|k-1} − T^{zx}_{k|k-1} (T^{xx}_{k|k-1})^{-1} T^{xz}_{k|k-1}
Substituting and completing the square (the remaining ζ_k terms cancel):
q = [ξ_k + (T^{xx}_{k|k-1})^{-1} T^{xz}_{k|k-1} ζ_k]^T T^{xx}_{k|k-1} [ξ_k + (T^{xx}_{k|k-1})^{-1} T^{xz}_{k|k-1} ζ_k]
Linear Gaussian Markov Systems (continue – 14)
111
2nd Way (continue – 5): from the relations above   (T^{xx}_{k|k-1})^{-1} T^{xz}_{k|k-1} = −P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1},   so with ξ_k, ζ_k as defined:
ξ_k + (T^{xx}_{k|k-1})^{-1} T^{xz}_{k|k-1} ζ_k = x_k − x̂_{k|k-1} − K_k (z_k − ẑ_{k|k-1}),       K_k := P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1}
Hence
p_{x|z}(x_k | z_k) = ( |P^{zz}_{k|k-1}|^{1/2} / ((2π)^{n/2} |P^{yy}_{k|k-1}|^{1/2}) ) exp{ −½ [x_k − x̂_{k|k-1} − K_k (z_k − ẑ_{k|k-1})]^T T^{xx}_{k|k-1} [x_k − x̂_{k|k-1} − K_k (z_k − ẑ_{k|k-1})] }
Linear Gaussian Markov Systems (continue – 15)
112
Recursive Bayesian Estimation
SOLO
Joint and Conditional Gaussian Random Variables
2nd Way (continue – 6)
From this we can see that
x̂_{k|k} = E{x_k | z_k} = x̂_{k|k-1} + K_k (z_k − ẑ_{k|k-1}),       K_k := P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1}
P^{xx}_{k|k} = E{(x_k − x̂_{k|k})(x_k − x̂_{k|k})^T | Z_{1:k}} = (T^{xx}_{k|k-1})^{-1} = P^{xx}_{k|k-1} − P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1} P^{zx}_{k|k-1} = P_{k|k-1} − K_k P^{zz}_{k|k-1} K_k^T
with
P^{xx}_{k|k-1} = P_{k|k-1},    P^{zz}_{k|k-1} = H_k P_{k|k-1} H_k^T + R_k =: S_k,    P^{xz}_{k|k-1} = P_{k|k-1} H_k^T
Linear Gaussian Markov Systems (continue – 16)
113
Substituting these expressions:
P_{k|k} = P_{k|k-1} − P_{k|k-1} H_k^T [R_k + H_k P_{k|k-1} H_k^T]^{-1} H_k P_{k|k-1} = [P_{k|k-1}^{-1} + H_k^T R_k^{-1} H_k]^{-1}
K_k = P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1} = P_{k|k-1} H_k^T [R_k + H_k P_{k|k-1} H_k^T]^{-1} = P_{k|k-1} H_k^T S_k^{-1}
or
P_{k|k} = P_{k|k-1} − K_k S_k K_k^T
Linear Gaussian Markov Systems (continue – 17)
114
Relation Between 1st and 2nd Ways
We found that the optimal K_k is (2nd Way):
K_k = P_{k|k-1} H_k^T [R_k + H_k P_{k|k-1} H_k^T]^{-1}
If R_k^{-1} and P_{k|k-1}^{-1} exist, the Matrix Inverse Lemma gives
[R_k + H_k P_{k|k-1} H_k^T]^{-1} = R_k^{-1} − R_k^{-1} H_k [P_{k|k-1}^{-1} + H_k^T R_k^{-1} H_k]^{-1} H_k^T R_k^{-1}
and carrying out the multiplication:
K_k = [P_{k|k-1}^{-1} + H_k^T R_k^{-1} H_k]^{-1} H_k^T R_k^{-1} = P_{k|k} H_k^T R_k^{-1}
which is the gain of the 1st Way, so    1st Way = 2nd Way
Linear Gaussian Markov Systems (continue – 18)
115
Innovation
The innovation is the quantity:    i_k := z_k − H_k x̂_{k|k-1} = z_k − ẑ_{k|k-1}
We found that:
E{i_k | Z_{1:k-1}} = E{(z_k − ẑ_{k|k-1}) | Z_{1:k-1}} = E{z_k | Z_{1:k-1}} − ẑ_{k|k-1} = 0
E{(z_k − ẑ_{k|k-1})(z_k − ẑ_{k|k-1})^T | Z_{1:k-1}} = E{i_k i_k^T | Z_{1:k-1}} = H_k P_{k|k-1} H_k^T + R_k =: S_k
Using the smoothing property of the expectation,
E_{X,Y}{X} = E_Y{ E_{X|Y}{X | Y} }     (since ∫∫ x p_{X|Y}(x | y) p_Y(y) d x d y = ∫ x p_X(x) d x = E{X})
we have:    E{i_k i_j^T} = E{ E{i_k i_j^T | Z_{1:k-1}} }
Assuming, without loss of generality, that k−1 ≥ j, the innovation i_j is a function of Z_{1:j} ⊆ Z_{1:k-1}, so it can be taken outside the inner expectation:
E{i_k i_j^T} = E{ E{i_k | Z_{1:k-1}} i_j^T } = 0     (k ≠ j)
Recursive Bayesian Estimation
SOLO
Linear Gaussian Markov Systems (continue – 19)
116
Innovation (continue – 1)
The innovation is the quantity:    i_k := z_k − H_k x̂_{k|k-1} = z_k − ẑ_{k|k-1}
We found that:
E{i_k | Z_{1:k-1}} = 0,     E{i_k i_k^T | Z_{1:k-1}} = H_k P_{k|k-1} H_k^T + R_k =: S_k,     E{i_k i_j^T} = 0 for k ≠ j,   i.e.   E{i_k i_j^T} = S_k δ_{kj}
Thus the innovation sequence is zero mean and white for the Kalman (Optimal) Filter.
The uncorrelatedness property of the innovations implies that, since they are Gaussian, the innovations are independent of each other and thus the innovation sequence is Strictly White.
Without the Gaussian assumption, the innovation sequence is Wide-Sense White.
Recursive Bayesian Estimation
SOLO
Linear Gaussian Markov Systems (continue – 20)
Table of Content
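A small sketch of the consistency test implied above: for a correctly tuned Kalman filter the normalized innovations should be zero-mean with unit variance and nearly uncorrelated at non-zero lags; the scalar model parameters below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
phi, hm, q, r = 0.95, 1.0, 0.1, 0.5        # assumed scalar model parameters

x_true, x_hat, P = 0.0, 0.0, 1.0
innovations, S_list = [], []
for _ in range(2000):
    x_true = phi * x_true + rng.normal(0, np.sqrt(q))
    z = hm * x_true + rng.normal(0, np.sqrt(r))
    # prediction
    x_hat, P = phi * x_hat, phi * P * phi + q
    # innovation i_k and its covariance S_k
    i_k = z - hm * x_hat
    S = hm * P * hm + r
    K = P * hm / S
    x_hat, P = x_hat + K * i_k, P - K * S * K
    innovations.append(i_k); S_list.append(S)

i = np.array(innovations)
norm = i / np.sqrt(np.array(S_list))        # normalized innovations, ideally N(0, 1)
lag1 = np.corrcoef(norm[:-1], norm[1:])[0, 1]
print("mean:", norm.mean(), "var:", norm.var(), "lag-1 corr:", lag1)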
117
Recursive Bayesian EstimationSOLO
Closed-Form Solutions of Estimation
Closed-Form solutions for the Optimal Recursive Bayesian Estimation can be derived only for special cases.
The most important case:
• Dynamic and measurement models are linear:
x_k = f(k−1, x_{k-1}, u_{k-1}, w_{k-1})   →   x_k = Φ_{k-1} x_{k-1} + G_{k-1} u_{k-1} + Γ_{k-1} w_{k-1}
z_k = h(k, x_k, u_k, v_k)                 →   z_k = H_k x_k + v_k
• Random noises are Gaussian:
p_w(w) = N(w; 0, Q) = (1/((2π)^{n/2} |Q|^{1/2})) exp( −½ w^T Q^{-1} w )
p_v(v) = N(v; 0, R) = (1/((2π)^{p/2} |R|^{1/2})) exp( −½ v^T R^{-1} v )
• Solution: KALMAN FILTER
• In other non-linear/non-Gaussian cases: USE APPROXIMATIONS
118
Recursive Bayesian Estimation
SOLO
Closed-Form Solutions of Estimation (continue – 1)
• Dynamic and measurement models are linear:
x_k = Φ_{k-1} x_{k-1} + G_{k-1} u_{k-1} + Γ_{k-1} w_{k-1}
z_k = H_k x_k + v_k
e_x(k) := x(k) − E{x(k)},    E{e_x(k) e_x^T(k)} = P_x(k)
e_w(k) := w(k) − E{w(k)},    E{e_w(k) e_w^T(l)} = Q(k) δ_{k,l}
e_v(k) := v(k) − E{v(k)},    E{e_v(k) e_v^T(l)} = R(k) δ_{k,l}
E{e_w(k) e_v^T(l)} = 0,     δ_{k,l} = 1 if k = l, 0 if k ≠ l
• The Optimal Estimator is the Kalman Filter, developed by R. E. Kalman in 1960.    Rudolf E. Kalman (1920 – )
• The Kalman Filter is an Optimal Estimator in the Minimum Mean Square Error (MMSE) sense if:
  - the state and measurement models are linear
  - the random elements are Gaussian
• Under those conditions, the covariance matrix is:
  - independent of the state (it can be calculated off-line)
  - equal to the Cramér – Rao lower bound
Table of Content
119
Kalman Filter
State Estimation in a Linear System (one cycle)
SOLO
0  Initialization:    x̂_0 = E{x_0},    P_{0|0} = E{(x_0 − x̂_0)(x_0 − x̂_0)^T};    k := k+1
1  State vector prediction:    x̂_{k|k-1} = Φ_{k-1} x̂_{k-1|k-1} + G_{k-1} u_{k-1}
2  Covariance matrix extrapolation:    P_{k|k-1} = Φ_{k-1} P_{k-1|k-1} Φ_{k-1}^T + Q_{k-1}
3  Innovation Covariance:    S_k = H_k P_{k|k-1} H_k^T + R_k
4  Gain Matrix Computation:    K_k = P_{k|k-1} H_k^T S_k^{-1}
5  Measurement & Innovation:    i_k = z_k − H_k x̂_{k|k-1} = z_k − ẑ_{k|k-1}
6  Filtering:    x̂_{k|k} = x̂_{k|k-1} + K_k i_k
7  Covariance matrix updating:
P_{k|k} = P_{k|k-1} − P_{k|k-1} H_k^T S_k^{-1} H_k P_{k|k-1}
        = P_{k|k-1} − K_k S_k K_k^T
        = (I − K_k H_k) P_{k|k-1}
        = (I − K_k H_k) P_{k|k-1} (I − K_k H_k)^T + K_k R_k K_k^T
120
Kalman Filter
State Estimation in a Linear System (one cycle)
SOLO
[Figure: tracking-system block diagram: Input Data → Sensor Data Processing and Measurement Formation → Observation-to-Track Association → Track Maintenance (Initialization, Confirmation and Deletion) → Filtering and Prediction → Gating Computations]
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
Rudolf E. Kalman (1920 – )
121
SOLO
General Bayesian Nonlinear Filters
• Additive Gaussian Noise:
  - Extended Kalman Filter (EKF)
  - Gauss Hermite Kalman Filter (GHKF)
  - Unscented Kalman Filter (UKF)
  - Monte Carlo Kalman Filter (MCKF)
• Non-Additive Non-Gaussian Noise (Particle Filters):
  - Non-Resampling Particle Filters: Gaussian Particle Filter (GPF), Gauss Hermite Particle Filter (GHPF), Unscented Particle Filter (UPF), Monte Carlo Particle Filter (MCPF)
  - Resampling Particle Filters: Sequential Importance Sampling Particle Filter (SIS PF), Bootstrap Particle Filter (BPF)
Recursive Bayesian Estimation
Table of Content
122
Extended Kalman Filter
SOLO
In the extended Kalman filter (EKF) the state transition and observation models need not be linear functions of the state but may instead be (differentiable) functions.
State vector dynamics:    x(k+1) = f[x(k), u(k), k] + w(k)
Measurements:             z(k+1) = h[x(k+1), u(k+1), k+1] + v(k+1)
e_x(k) := x(k) − E{x(k)},    E{e_x(k) e_x^T(k)} = P_x(k)
e_w(k) := w(k) − E{w(k)},    E{e_w(k) e_w^T(l)} = Q(k) δ_{k,l}
E{e_w(k) e_v^T(l)} = 0  ∀ k, l;     δ_{k,l} = 1 if k = l, 0 if k ≠ l
The function f can be used to compute the predicted state from the previous estimate and similarly the function h can be used to compute the predicted measurement from the predicted state. However, f and h cannot be applied to the covariance directly. Instead a matrix of partial derivatives (the Jacobian) is computed.
Taylor's Expansion:
e_x(k+1) = f[x(k), u(k), k] − f[E{x(k)}, u(k), k] + w(k) = (∂f/∂x)|_{E{x(k)}} e_x(k) + ½ e_x^T(k) (∂²f/∂x²)|_{E{x(k)}} e_x(k) + … + w(k)
e_z(k+1) = h[x(k+1), u(k+1), k+1] − h[E{x(k+1)}, u(k+1), k+1] + v(k+1) = (∂h/∂x)|_{E{x(k+1)}} e_x(k+1) + ½ e_x^T(k+1) (∂²h/∂x²)|_{E{x(k+1)}} e_x(k+1) + … + v(k+1)
where ∂f/∂x, ∂h/∂x are the Jacobians and ∂²f/∂x², ∂²h/∂x² the Hessians of f and h.
123
Extended Kalman Filter
State Estimation (one cycle)
SOLO
0  Initialization (k = 0):    x̂_0 = E{x_0},    P_{0|0} = E{(x_0 − x̂_0)(x_0 − x̂_0)^T}
1  State vector prediction:    x̂_{k|k-1} = f(k−1, x̂_{k-1|k-1}, u_{k-1})
2  Jacobians Computation:    Φ_{k-1} = (∂f/∂x)|_{x̂_{k-1|k-1}},     H_k = (∂h/∂x)|_{x̂_{k|k-1}}
3  Covariance matrix extrapolation:    P_{k|k-1} = Φ_{k-1} P_{k-1|k-1} Φ_{k-1}^T + Q_{k-1}
4  Innovation Covariance:    S_k = H_k P_{k|k-1} H_k^T + R_k
5  Gain Matrix Computation:    K_k = P_{k|k-1} H_k^T S_k^{-1}
6  Measurement & Innovation:    i_k = z_k − ẑ_{k|k-1},    ẑ_{k|k-1} = h(k, x̂_{k|k-1})
7  Filtering:    x̂_{k|k} = x̂_{k|k-1} + K_k i_k
8  Covariance matrix updating:
P_{k|k} = P_{k|k-1} − P_{k|k-1} H_k^T S_k^{-1} H_k P_{k|k-1} = P_{k|k-1} − K_k S_k K_k^T = (I − K_k H_k) P_{k|k-1} = (I − K_k H_k) P_{k|k-1} (I − K_k H_k)^T + K_k R_k K_k^T
k := k+1 and return to 1
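A minimal EKF cycle following the steps above, with Jacobians obtained by finite differences (the slides leave the differentiation method open, so the numerical Jacobian is an assumption); the small nonlinear example model is also illustrative.

import numpy as np

def num_jacobian(fun, x, eps=1e-6):
    """Finite-difference Jacobian of fun at x (used for Phi_{k-1} and H_k)."""
    fx = fun(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy(); xp[j] += eps
        J[:, j] = (fun(xp) - fx) / eps
    return J

def ekf_cycle(x, P, z, f, h, Q, R):
    Phi = num_jacobian(f, x)                     # 2 Jacobian of f at x_{k-1|k-1}
    x_pred = f(x)                                # 1 state prediction
    H = num_jacobian(h, x_pred)                  # 2 Jacobian of h at x_{k|k-1}
    P_pred = Phi @ P @ Phi.T + Q                 # 3 covariance extrapolation
    S = H @ P_pred @ H.T + R                     # 4 innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)          # 5 gain
    i = z - h(x_pred)                            # 6 innovation (nonlinear h)
    return x_pred + K @ i, P_pred - K @ S @ K.T  # 7, 8 filtering and update

# illustrative nonlinear range-only example (assumed)
f = lambda x: np.array([x[0] + 0.1 * x[1], x[1]])
h = lambda x: np.array([np.hypot(x[0], 10.0)])
x, P = np.array([1.0, 1.0]), np.eye(2)
x, P = ekf_cycle(x, P, np.array([10.2]), f, h, 0.01 * np.eye(2), np.array([[0.1]]))
print(x)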
124
Extended Kalman Filter
State Estimation (one cycle)
125
SOLO
Criticism of the Extended Kalman Filter
Unlike its linear counterpart, the extended Kalman filter is not an optimal estimator.
In addition, if the initial estimate of the state is wrong, or if the process is modeled
incorrectly, the filter may quickly diverge, owing to its linearization. Another problem
with the extended Kalman filter is that the estimated covariance matrix tends to
underestimate the true covariance matrix and therefore risks becoming inconsistent
in the statistical sense without the addition of "stabilizing noise".
Having stated this, the Extended Kalman filter can give reasonable performance, and
is arguably the de facto standard in navigation systems and GPS.
Extended Kalman Filter
Table of Content
126
SOLO
Additive Gaussian Nonlinear Filter
Consider the case of a Markovian process where the noise is additive and Gaussian:
x_k = f(x_{k-1}) + w_{k-1}
z_k = h(x_k) + v_k
where w_k and v_k are independent white Gaussian noises, with zero mean and covariances Q_k and R_k, respectively:
p_w(w_k) = N(w_k; 0, Q_k) = (1/((2π)^{n/2} |Q_k|^{1/2})) exp( −½ w_k^T Q_k^{-1} w_k )
p_v(v_k) = N(v_k; 0, R_k) = (1/((2π)^{p/2} |R_k|^{1/2})) exp( −½ v_k^T R_k^{-1} v_k )
Recursive Bayesian Estimation
Therefore, since f(x_{k-1}) is a deterministic function, by adding the Gaussian noise w_{k-1} we obtain that x_k, conditioned on x_{k-1}, is also a Gaussian random variable:
p(x_k | x_{k-1}, Z_{1:k-1}) = N(x_k; f(x_{k-1}), Q_{k-1})
127
SOLO
Additive Gaussian Nonlinear Filter (continue – 1)
Recursive Bayesian Estimation
p(x_k | x_{k-1}, Z_{1:k-1}) = N(x_k; f(x_{k-1}), Q_{k-1})
Using:
p(x_k, x_{k-1} | Z_{1:k-1}) = p(x_k | x_{k-1}, Z_{1:k-1}) p(x_{k-1} | Z_{1:k-1})       (Bayes)
p(x_k | Z_{1:k-1}) = ∫ p(x_k, x_{k-1} | Z_{1:k-1}) d x_{k-1} = ∫ p(x_k | x_{k-1}, Z_{1:k-1}) p(x_{k-1} | Z_{1:k-1}) d x_{k-1}
we obtain:
p(x_k | Z_{1:k-1}) = ∫ N(x_k; f(x_{k-1}), Q_{k-1}) p(x_{k-1} | Z_{1:k-1}) d x_{k-1}
x̂_{k|k-1} := E{x_k | Z_{1:k-1}} = ∫ x_k p(x_k | Z_{1:k-1}) d x_k = ∫ [ ∫ x_k N(x_k; f(x_{k-1}), Q_{k-1}) d x_k ] p(x_{k-1} | Z_{1:k-1}) d x_{k-1} = ∫ f(x_{k-1}) p(x_{k-1} | Z_{1:k-1}) d x_{k-1}
Assume that x_{k-1} is Gaussian with mean x̂_{k-1|k-1} and covariance P_{k-1|k-1}; then
p(x_{k-1} | Z_{1:k-1}) = N(x_{k-1}; x̂_{k-1|k-1}, P^{xx}_{k-1|k-1})
x̂_{k|k-1} = E{x_k | Z_{1:k-1}} = ∫ f(x_{k-1}) N(x_{k-1}; x̂_{k-1|k-1}, P^{xx}_{k-1|k-1}) d x_{k-1}
128
SOLO
Additive Gaussian Nonlinear Filter (continue – 2)
P^{xx}_{k|k-1} = E{(x_k − x̂_{k|k-1})(x_k − x̂_{k|k-1})^T | Z_{1:k-1}}
             = E{[f(x_{k-1}) + w_{k-1} − x̂_{k|k-1}][f(x_{k-1}) + w_{k-1} − x̂_{k|k-1}]^T | Z_{1:k-1}}
             = ∫ f(x_{k-1}) f^T(x_{k-1}) N(x_{k-1}; x̂_{k-1|k-1}, P^{xx}_{k-1|k-1}) d x_{k-1} + Q_{k-1} − x̂_{k|k-1} x̂_{k|k-1}^T
Let us now compute ẑ_{k|k-1} = E{z_k | Z_{1:k-1}}, using the Gaussian approximation of p(x_k | Z_{1:k-1}):
p(x_k | Z_{1:k-1}) ≈ N(x_k; x̂_{k|k-1}, P^{xx}_{k|k-1})
Since x_k and v_k are independent:
ẑ_{k|k-1} = E{z_k | Z_{1:k-1}} = ∫ [h(x_k) + E{v_k}] N(x_k; x̂_{k|k-1}, P^{xx}_{k|k-1}) d x_k = ∫ h(x_k) N(x_k; x̂_{k|k-1}, P^{xx}_{k|k-1}) d x_k
129
SOLO
Additive Gaussian Nonlinear Filter (continue – 3)
P^{zz}_{k|k-1} = E{(z_k − ẑ_{k|k-1})(z_k − ẑ_{k|k-1})^T | Z_{1:k-1}} = ∫ h(x_k) h^T(x_k) N(x_k; x̂_{k|k-1}, P^{xx}_{k|k-1}) d x_k + R_k − ẑ_{k|k-1} ẑ_{k|k-1}^T
In the same way
P^{xz}_{k|k-1} = E{(x_k − x̂_{k|k-1})(z_k − ẑ_{k|k-1})^T | Z_{1:k-1}} = ∫ x_k h^T(x_k) N(x_k; x̂_{k|k-1}, P^{xx}_{k|k-1}) d x_k − x̂_{k|k-1} ẑ_{k|k-1}^T
130
SOLO
Additive Gaussian Nonlinear Filter (continue – 4)
Recursive Bayesian Estimation
Summary
0  Initialization:    x̂_0 = E{x_0},    P_{0|0} = E{(x_0 − x̂_0)(x_0 − x̂_0)^T}
For k ∈ {1,…,∞}
1  State Prediction and its Covariance:
x̂_{k|k-1} = E{x_k | Z_{1:k-1}} = ∫ f(x_{k-1}) N(x_{k-1}; x̂_{k-1|k-1}, P^{xx}_{k-1|k-1}) d x_{k-1}
P^{xx}_{k|k-1} = ∫ f(x_{k-1}) f^T(x_{k-1}) N(x_{k-1}; x̂_{k-1|k-1}, P^{xx}_{k-1|k-1}) d x_{k-1} + Q_{k-1} − x̂_{k|k-1} x̂_{k|k-1}^T
2  Measurement Prediction and Covariances:
ẑ_{k|k-1} = ∫ h(x_k) N(x_k; x̂_{k|k-1}, P^{xx}_{k|k-1}) d x_k
P^{zz}_{k|k-1} = ∫ h(x_k) h^T(x_k) N(x_k; x̂_{k|k-1}, P^{xx}_{k|k-1}) d x_k + R_k − ẑ_{k|k-1} ẑ_{k|k-1}^T
P^{xz}_{k|k-1} = ∫ x_k h^T(x_k) N(x_k; x̂_{k|k-1}, P^{xx}_{k|k-1}) d x_k − x̂_{k|k-1} ẑ_{k|k-1}^T
[Figure: hidden Markov model with additive noise: states evolve via f(x_{k-1}) + w_{k-1}, measurements via h(x_k) + v_k]
131
SOLO
Additive Gaussian Nonlinear Filter (continue – 5)
Recursive Bayesian Estimation
Summary (continue – 1)
We showed that the Kalman Filter that uses these computations is given by:
3  Kalman Gain Computation:    K_k = P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1}
4  Update State and its Covariance:
x̂_{k|k} = E{x_k | z_k} = x̂_{k|k-1} + K_k (z_k − ẑ_{k|k-1})
P^{xx}_{k|k} = E{(x_k − x̂_{k|k})(x_k − x̂_{k|k})^T | Z_{1:k}} = P^{xx}_{k|k-1} − P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1} P^{zx}_{k|k-1} = P^{xx}_{k|k-1} − K_k P^{zz}_{k|k-1} K_k^T
k := k+1 & return to 1
132
SOLO
Additive Gaussian Nonlinear Filter (continue – 6)
Recursive Bayesian Estimation
To obtain the Kalman Filter, we must approximate integrals of the type:
I = ∫ g(x) N(x; x̂, P^{xx}) d x
Three approximations are presented:
(1) Gauss – Hermite Quadrature Approximation
(2) Unscented Transformation Approximation
(3) Monte Carlo Approximation
Table of Content
133
SOLO
Additive Gaussian Nonlinear Filter (continue – 7)
Recursive Bayesian Estimation
Gauss – Hermite Quadrature Approximation
To obtain the Kalman Filter, we must approximate integrals of the type:
I = ∫ g(x) N(x; x̂, P^{xx}) d x = (1/((2π)^{n/2} |P^{xx}|^{1/2})) ∫ g(x) exp[ −½ (x − x̂)^T (P^{xx})^{-1} (x − x̂) ] d x
Let P^{xx} = S S^T be a Cholesky decomposition, and define  z := (1/√2) S^{-1} (x − x̂). Then
I = (1/π^{n/2}) ∫ g(x̂ + √2 S z) exp( −z^T z ) d z
This integral can be approximated using the Gauss – Hermite quadrature rule (applied along each dimension):
∫ e^{−z²} f(z) d z ≈ Σ_{i=1}^{M} w_i f(z_i)
where the quadrature points z_i and weights w_i are defined as follows.
Carl Friedrich Gauss 1777-1855,  Charles Hermite 1822-1901,  Andre – Louis Cholesky 1875 - 1918
134
SOLO
Additive Gaussian Nonlinear Filter (continue – 8)
Gauss – Hermite Quadrature Approximation (continue – 1)
A set of orthonormal Hermite polynomials is generated from the recurrence relationship:
H_{-1}(z) = 0,    H_0(z) = 1/π^{1/4},    H_{j+1}(z) = z √(2/(j+1)) H_j(z) − √(j/(j+1)) H_{j-1}(z)
or, in matrix form, with h(z) := [H_0(z), H_1(z), …, H_{M-1}(z)]^T, e_M := [0, …, 0, 1]^T and β_j := √(j/2), j = 1, 2, …, M:
z h(z) = J_M h(z) + β_M H_M(z) e_M
where J_M is the M×M symmetric tridiagonal matrix with zero diagonal and off-diagonal elements β_1, β_2, …, β_{M-1}
(this follows from  z H_j(z) = β_{j+1} H_{j+1}(z) + β_j H_{j-1}(z)).
135
SOLO
Additive Gaussian Nonlinear Filter (continue – 9)
Gauss – Hermite Quadrature Approximation (continue – 2)
∫ e^{−z²} f(z) d z ≈ Σ_{i=1}^{M} w_i f(z_i)
Let us evaluate the matrix relation at the M roots z_i for which H_M(z_i) = 0, i = 1, …, M:
z_i h(z_i) = J_M h(z_i),    i = 1, …, M
From this equation we can see that the z_i and the h(z_i) = [H_0(z_i), H_1(z_i), …, H_{M-1}(z_i)]^T are the eigenvalues and eigenvectors, respectively, of the symmetric matrix J_M.
Because of the symmetry of J_M the eigenvectors are orthogonal and can be normalized. Define:
v^i_j := H_j(z_i)/W_i,     W_i² := Σ_{j=0}^{M-1} H_j²(z_i),     i = 1, …, M
We have:
(v^i)^T v^l = Σ_j H_j(z_i) H_j(z_l) / (W_i W_l) = h^T(z_i) h(z_l) / (W_i W_l) = δ_{il}
Table of Content
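A small sketch of the construction above: the quadrature points z_i are the eigenvalues of the symmetric tridiagonal matrix J_M, and the weights follow from the first components of its normalized eigenvectors (the Golub–Welsch relation, with the scaling √π = ∫ e^{−z²} dz); the weight formula is a standard result assumed here rather than stated on the slides.

import numpy as np

def gauss_hermite(M):
    """Points/weights for the rule  integral e^{-z^2} f(z) dz ~ sum_i w_i f(z_i)."""
    beta = np.sqrt(np.arange(1, M) / 2.0)        # off-diagonal elements beta_j = sqrt(j/2)
    J = np.diag(beta, 1) + np.diag(beta, -1)     # symmetric tridiagonal J_M, zero diagonal
    z, V = np.linalg.eigh(J)                     # eigenvalues -> quadrature points z_i
    w = np.sqrt(np.pi) * V[0, :] ** 2            # weights from first eigenvector components
    return z, w

z, w = gauss_hermite(5)
# sanity check against numpy's reference rule and a known integral
z_ref, w_ref = np.polynomial.hermite.hermgauss(5)
print(np.allclose(np.sort(z), z_ref), np.allclose(w[np.argsort(z)], w_ref))
print("integral of e^{-z^2} z^2 dz ~", np.sum(w * z**2), " exact:", np.sqrt(np.pi) / 2)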
136
Unscented Kalman FilterSOLO
When the state transition and observation models – that is, the predict and update functions f and h (see above) – are highly non-linear, the Extended Kalman Filter can give particularly poor performance [JU97]. This is because only the mean is propagated through the non-linearity. The Unscented Kalman Filter (UKF) [JU97] uses a deterministic sampling technique known as the unscented transformation to pick a minimal set of sample points (called "sigma points") around the mean. These "sigma points" are then propagated through the non-linear functions and the covariance of the estimate is then recovered. The result is a filter which more accurately captures the true mean and covariance. (This can be verified using Monte Carlo sampling or through a Taylor series expansion of the posterior statistics.) In addition, this technique removes the requirement to analytically calculate Jacobians, which for complex functions can be a difficult task in itself.
State vector dynamics:    x_k = f(k−1, x_{k-1}, u_{k-1}) + w_{k-1}
Measurements:             z_k = h(k, x_k) + v_k
e_x(k) := x(k) − E{x(k)},    E{e_x(k) e_x^T(k)} = P_x(k)
e_w(k) := w(k) − E{w(k)},    E{e_w(k) e_w^T(l)} = Q(k) δ_{k,l}
E{e_w(k) e_v^T(l)} = 0  ∀ k, l
The Unscented Algorithm, using  x̂(k) = E{x(k)}  and  P_x(k) = E{e_x(k) e_x^T(k)},  determines  ẑ(k) = E{z(k)}  and  P_z(k) = E{e_z(k) e_z^T(k)}.
137
Unscented Kalman FilterSOLO
Propagating Means and Covariances Through Nonlinear Transformations
Consider a nonlinear function  y = f(x).
Assume x is a random variable with a probability density function p_X(x) (known or unknown) with mean and covariance
x̂ = E{x},     P^{xx} = E{(x − x̂)(x − x̂)^T}
and write  x = x̂ + δx,  with  E{δx} = 0,  E{δx δx^T} = P^{xx}.
Develop the nonlinear function f in a Taylor series around x̂:
f(x̂ + δx) = Σ_{n=0}^{∞} (1/n!) [ (δx · ∇)^n f ]|_{x̂},     where   (δx · ∇) := Σ_{j=1}^{n_x} δx_j ∂/∂x_j
Define also the operator   D^n_{δx} f := [ (δx · ∇)^n f ]|_{x̂}
Let us compute
ŷ = E{y} = E{f(x̂ + δx)} = Σ_{n=0}^{∞} (1/n!) E{ D^n_{δx} f }
138
Unscented Kalman Filter
SOLO
Propagating Means and Covariances Through Nonlinear Transformations (continue – 1)
ŷ = E{f(x̂ + δx)} = f(x̂) + E{D_{δx} f} + ½ E{D²_{δx} f} + (1/3!) E{D³_{δx} f} + (1/4!) E{D⁴_{δx} f} + …
Since all the differentials of f are computed around the mean x̂ (non-random):
E{D_{δx} f} = E{δx}^T ∇f|_{x̂} = 0
E{D²_{δx} f} = E{ (δx · ∇)(δx · ∇) f }|_{x̂} = [ ∇^T E{δx δx^T} ∇ ] f|_{x̂} = [ ∇^T P^{xx} ∇ ] f|_{x̂}
Therefore
ŷ = E{f(x̂ + δx)} = f(x̂) + ½ [ ∇^T P^{xx} ∇ ] f|_{x̂} + (1/3!) E{D³_{δx} f} + (1/4!) E{D⁴_{δx} f} + …
139
Simon J. Julier,  Jeffrey K. Uhlmann
Unscented Kalman Filter
SOLO
Propagating Means and Covariances Through Nonlinear Transformations (continue – 2)
Consider a nonlinear function  y = f(x),  with  x̂ = E{x},  P^{xx} = E{(x − x̂)(x − x̂)^T}.
The Unscented Transformation (UT), proposed by Julier and Uhlmann, uses a set of "sigma points" to provide an approximation of the probabilistic properties through the nonlinear function.
A set of "sigma points" S consists of p+1 vectors and their associated weights,  S = { i = 0, 1, …, p :  x^{(i)}, W^{(i)} }.
(1) Compute the transformation of the "sigma points" through the nonlinear transformation f:
y^{(i)} = f(x^{(i)}),    i = 0, 1, …, p
(2) Compute the approximation of the mean:
ŷ ≈ Σ_{i=0}^{p} W^{(i)} y^{(i)}
The estimation is unbiased if
E{ Σ_{i=0}^{p} W^{(i)} y^{(i)} } = Σ_{i=0}^{p} W^{(i)} E{y^{(i)}} = ŷ,     which requires     Σ_{i=0}^{p} W^{(i)} = 1
(3) The approximation of the output covariance is given by
P^{yy} ≈ Σ_{i=0}^{p} W^{(i)} (y^{(i)} − ŷ)(y^{(i)} − ŷ)^T
140
Unscented Kalman FilterSOLO
Propagating Means and Covariances Through Nonlinear Transformations (continue – 3)
Unscented Transformation (UT) (continue – 1)
Consider a nonlinear function  y = f(x).
One set of points that satisfies the above conditions consists of a symmetric set of p = 2 n_x points that lie on the √(n_x/(1 − W^{(0)}))-th covariance contour of P^{xx}:
x^{(0)} = x̂,                                                W^{(0)}
x^{(i)} = x̂ + ( √( n_x P^{xx} / (1 − W^{(0)}) ) )_i,         W^{(i)} = (1 − W^{(0)}) / (2 n_x),       i = 1, …, n_x
x^{(i+n_x)} = x̂ − ( √( n_x P^{xx} / (1 − W^{(0)}) ) )_i,     W^{(i+n_x)} = (1 − W^{(0)}) / (2 n_x),    i = 1, …, n_x
where ( √( n_x P^{xx} / (1 − W^{(0)}) ) )_i is the i-th row or column of the matrix square root of n_x P^{xx}/(1 − W^{(0)}) (the original covariance matrix P^{xx} multiplied by the number of dimensions of x, n_x/(1 − W^{(0)})). This implies:
Σ_{i=1}^{n_x} ( √( n_x P^{xx} / (1 − W^{(0)}) ) )_i ( √( n_x P^{xx} / (1 − W^{(0)}) ) )_i^T = n_x P^{xx} / (1 − W^{(0)})
141
Unscented Kalman Filter
SOLO
Propagating Means and Covariances Through Nonlinear Transformations (continue – 4)
Unscented Transformation (UT) (continue – 2)
Unscented Algorithm: propagate the sigma points  x^{(i)} = x̂ ± δx^{(i)},  δx^{(i)} = ( √( n_x P^{xx} / (1 − W^{(0)}) ) )_i,  through f and expand each in a Taylor series about x̂:
y^{(0)} = f(x̂)
y^{(i)} = f(x̂ ± δx^{(i)}) = f(x̂) ± D_{δx^{(i)}} f + ½ D²_{δx^{(i)}} f ± (1/3!) D³_{δx^{(i)}} f + …,     i = 1, …, 2 n_x
The UT mean is then
ŷ_UT = Σ_{i=0}^{2 n_x} W^{(i)} y^{(i)} = W^{(0)} f(x̂) + ((1 − W^{(0)})/(2 n_x)) Σ_{i=1}^{2 n_x} [ f(x̂) + D_{δx^{(i)}} f + ½ D²_{δx^{(i)}} f + … ]
Since D^n_{δx^{(i)}} f is an odd function of δx^{(i)} for odd n, the odd-order terms cancel over the symmetric set, and
ŷ_UT = f(x̂) + ((1 − W^{(0)})/(2 n_x)) Σ_{i=1}^{2 n_x} [ ½ D²_{δx^{(i)}} f + (1/4!) D⁴_{δx^{(i)}} f + (1/6!) D⁶_{δx^{(i)}} f + … ]
142
Unscented Kalman Filter
SOLO
Propagating Means and Covariances Through Nonlinear Transformations (continue – 5)
Unscented Transformation (UT) (continue – 3)
Using  Σ_{i=1}^{2 n_x} δx^{(i)} δx^{(i)T} = 2 n_x P^{xx} / (1 − W^{(0)}),  the second-order term gives
((1 − W^{(0)})/(2 n_x)) Σ_{i=1}^{2 n_x} ½ D²_{δx^{(i)}} f|_{x̂} = ½ [ ∇^T P^{xx} ∇ ] f|_{x̂}
so that
ŷ_UT = f(x̂) + ½ [ ∇^T P^{xx} ∇ ] f|_{x̂} + ((1 − W^{(0)})/(2 n_x)) Σ_{i=1}^{2 n_x} [ (1/4!) D⁴_{δx^{(i)}} f + (1/6!) D⁶_{δx^{(i)}} f + … ]
We found for the true mean
ŷ = E{f(x̂ + δx)} = f(x̂) + ½ [ ∇^T P^{xx} ∇ ] f|_{x̂} + (1/3!) E{D³_{δx} f} + (1/4!) E{D⁴_{δx} f} + …
We can see that the two expressions agree exactly to the third order.
143
Unscented Kalman Filter
SOLO
Propagating Means and Covariances Through Nonlinear Transformations (continue – 6)
Unscented Transformation (UT) (continue – 4)
Accuracy of the Covariance: carrying out the same Taylor-series bookkeeping for the true covariance gives
P^{yy} = E{(y − ŷ)(y − ŷ)^T} = A P^{xx} A^T − ¼ ( [∇^T P^{xx} ∇] f ) ( [∇^T P^{xx} ∇] f )^T |_{x̂} + (fourth- and higher-order moment terms)
where A := ∇^T f|_{x̂} is the Jacobian of f at x̂.
144
Unscented Kalman Filter
SOLO
Propagating Means and Covariances Through Nonlinear Transformations (continue – 7)
Unscented Transformation (UT) (continue – 5)
Accuracy of the Covariance (continue): the same expansion applied to the UT covariance  P^{yy}_UT = Σ_{i=0}^{2 n_x} W^{(i)} (y^{(i)} − ŷ_UT)(y^{(i)} − ŷ_UT)^T  gives
P^{yy}_UT = A P^{xx} A^T − ¼ ( [∇^T P^{xx} ∇] f ) ( [∇^T P^{xx} ∇] f )^T |_{x̂} + (fourth- and higher-order terms)
so the true and UT covariances agree in their first two terms and differ only in fourth- and higher-order terms, which depend on the higher moments of δx.
Unscented Kalman Filter
SOLO
146
Unscented Kalman Filter
SOLO
[Figure: the Unscented Transformation: sigma points χ_i = { x̂, x̂ ± (√(α n_x P^x))_i } are propagated through the nonlinearity f to ψ_i = f(χ_i); the weighted sample mean ẑ = Σ_i β_i ψ_i and the weighted sample covariance P^z = Σ_i β_i (ψ_i − ẑ)(ψ_i − ẑ)^T are then recovered]
Table of Content
147
Unscented Kalman Filter
SOLO
UKF Summary
System Definition:
x_k = f(k−1, x_{k-1}, u_{k-1}) + w_{k-1},     E{w_k} = 0,  E{w_k w_l^T} = Q_k δ_{k,l}
z_k = h(k, x_k) + v_k,                         E{v_k} = 0,  E{v_k v_l^T} = R_k δ_{k,l}
0  Initialization of UKF:
x̂_0 = E{x_0},    P_{0|0} = E{(x_0 − x̂_0)(x_0 − x̂_0)^T}
x^a := [x^T  w^T  v^T]^T,    x̂^a_0 = E{x^a_0} = [x̂_0^T  0  0]^T,    P^a_{0|0} = E{(x^a_0 − x̂^a_0)(x^a_0 − x̂^a_0)^T} = diag(P_{0|0}, Q, R)
For k ∈ {1,…,∞}
1  Calculate the Sigma Points (γ = √(L + λ)):
x^{(0)}_{k-1|k-1} = x̂_{k-1|k-1}
x^{(i)}_{k-1|k-1} = x̂_{k-1|k-1} + γ (√P_{k-1|k-1})_i,          i = 1, …, L
x^{(i+L)}_{k-1|k-1} = x̂_{k-1|k-1} − γ (√P_{k-1|k-1})_i,        i = 1, …, L
2  State Prediction and its Covariance:
x^{(i)}_{k|k-1} = f(k−1, x^{(i)}_{k-1|k-1}, u_{k-1}),    i = 0, 1, …, 2L
x̂_{k|k-1} = Σ_{i=0}^{2L} W^{(m)}_i x^{(i)}_{k|k-1},         W^{(m)}_0 = λ/(L+λ),   W^{(m)}_i = 1/(2(L+λ)),  i = 1, …, 2L
P_{k|k-1} = Σ_{i=0}^{2L} W^{(c)}_i (x^{(i)}_{k|k-1} − x̂_{k|k-1})(x^{(i)}_{k|k-1} − x̂_{k|k-1})^T,     W^{(c)}_0 = λ/(L+λ) + (1 − α² + β),   W^{(c)}_i = 1/(2(L+λ)),  i = 1, …, 2L
Unscented Kalman Filter
SOLO
UKF Summary (continue – 1)
3  Measurement Prediction:
z^{(i)}_{k|k-1} = h(k, x^{(i)}_{k|k-1}),    i = 0, 1, …, 2L
ẑ_{k|k-1} = Σ_{i=0}^{2L} W^{(m)}_i z^{(i)}_{k|k-1}
4  Innovation and its Covariance:
i_k = z_k − ẑ_{k|k-1}
S_k = P^{zz}_{k|k-1} = Σ_{i=0}^{2L} W^{(c)}_i (z^{(i)}_{k|k-1} − ẑ_{k|k-1})(z^{(i)}_{k|k-1} − ẑ_{k|k-1})^T
5  Kalman Gain Computation:
P^{xz}_{k|k-1} = Σ_{i=0}^{2L} W^{(c)}_i (x^{(i)}_{k|k-1} − x̂_{k|k-1})(z^{(i)}_{k|k-1} − ẑ_{k|k-1})^T
K_k = P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1}
6  Update State and its Covariance:
x̂_{k|k} = x̂_{k|k-1} + K_k i_k
P_{k|k} = P_{k|k-1} − K_k S_k K_k^T
k := k+1 & return to 1
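A compact sketch of one UKF cycle following the summary above. It uses the non-augmented form and adds Q and R explicitly, which is a common simplification and an assumption relative to the augmented-state initialization on the slide; the scaling parameters α, β, κ and the small example model are illustrative choices.

import numpy as np

def sigma_points(x, P, alpha=1.0, beta=2.0, kappa=0.0):
    L = x.size
    lam = alpha**2 * (L + kappa) - L
    S = np.linalg.cholesky((L + lam) * P)             # matrix square root of (L+lam)P
    pts = np.vstack([x, x + S.T, x - S.T])            # 2L+1 sigma points
    Wm = np.full(2 * L + 1, 1.0 / (2 * (L + lam))); Wc = Wm.copy()
    Wm[0] = lam / (L + lam); Wc[0] = Wm[0] + (1 - alpha**2 + beta)
    return pts, Wm, Wc

def ukf_cycle(x, P, z, f, h, Q, R):
    X, Wm, Wc = sigma_points(x, P)
    Xp = np.array([f(s) for s in X])                  # propagate sigma points through f
    x_pred = Wm @ Xp
    P_pred = (Xp - x_pred).T @ np.diag(Wc) @ (Xp - x_pred) + Q
    Zp = np.array([h(s) for s in Xp])                 # propagate through h
    z_pred = Wm @ Zp
    S = (Zp - z_pred).T @ np.diag(Wc) @ (Zp - z_pred) + R
    Pxz = (Xp - x_pred).T @ np.diag(Wc) @ (Zp - z_pred)
    K = Pxz @ np.linalg.inv(S)
    return x_pred + K @ (z - z_pred), P_pred - K @ S @ K.T

# illustrative nonlinear example (assumed)
f = lambda x: np.array([x[0] + 0.1 * x[1], 0.99 * x[1]])
h = lambda x: np.array([np.hypot(x[0], 5.0)])
x, P = np.array([1.0, 0.5]), np.eye(2)
x, P = ukf_cycle(x, P, np.array([5.2]), f, h, 0.01 * np.eye(2), np.array([[0.05]]))
print(x)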
149
Unscented Kalman Filter
State Estimation (one cycle)
SOLO
Simon J. Julier,  Jeffrey K. Uhlmann
150
Unscented Kalman Filter
SOLO
Table of Content
151
Numerical Integration Using a Monte Carlo ApproximationSOLO
A Monte Carlo Approximation of the Expected Value Integrals uses Discrete
Approximation to the Gaussian PDF ( )xx
Pxx ,ˆ;N
( )xx
Pxx ,ˆ;N can be approximated by:
( ) ( ) ( ) ( )∑∑ ==
−=−≈=
ss N
i
i
s
N
i
iixx
xx
N
xxwPxxx
11
1
,ˆ; δδNp
We can see that for any x we have
( ) ( )∫∑∫∑
∞−
≤
∞− =
≈=−
x
xx
xx
i
i
x N
i
ii
dPxwdxw
i
s
ττττδ ,ˆ;
1
N
The weight w_i is not the probability of the point x_i. The probability density near x_i
is given by the density of the points in the region around x_i, which can be obtained by a
normalized histogram of all x_i.

Draw Ns samples from N(x; x̂, P_xx), where { x_i, i = 1,2,…,Ns } are a set of support
points (random samples, or particles) with weights { w_i = 1/Ns, i = 1,2,…,Ns }.
Monte Carlo Kalman Filter (MCKF)
152
Numerical Integration Using a Monte Carlo Approximation
SOLO
The Expected Value for any function g (x) can be estimated from:
   E_{p(x)}{g(x)} = ∫ g(x) p(x) dx ≈ ∫ g(x) Σ_{i=1}^{Ns} w_i δ(x − x_i) dx = Σ_{i=1}^{Ns} w_i g(x_i) = (1/Ns) Σ_{i=1}^{Ns} g(x_i)
which is the sample mean.
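A quick numerical illustration of this sample-mean approximation (a minimal sketch; the target g(x) = x² and the numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# estimate E[g(x)] for x ~ N(x_hat, P) with g(x) = x**2
x_hat, P, Ns = 1.0, 0.5, 10_000
samples = rng.normal(x_hat, np.sqrt(P), size=Ns)   # Ns support points, weights w_i = 1/Ns
g_mean = np.mean(samples**2)                        # sample mean ≈ E[g(x)] = P + x_hat**2 = 1.5
```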
Given the System

   x_k = f(k−1, x_{k−1}, u_{k−1}) + w_{k−1},   E{w_k} = 0,  E{w_k w_l^T} = Q_k δ_{kl}
   z_k = h(k, x_k) + v_k,                      E{v_k} = 0,  E{v_k v_l^T} = R_k δ_{kl}

Assuming that we computed the Mean and Covariance x̂_{k−1|k−1}, P_{k−1|k−1} at stage k−1,
let us use the Monte Carlo Approximation to compute the predicted Mean and Covariance
x̂_{k|k−1}, P_{k|k−1} at stage k:

   x̂_{k|k−1} = E_{p(x_k|Z_{1:k−1})}{x_k} ≈ (1/Ns) Σ_{i=1}^{Ns} f(k−1, x_{k−1|k−1}^i, u_{k−1})

   P_{k|k−1}^{xx} = E_{p(x_k|Z_{1:k−1})}{(x_k − x̂_{k|k−1})(x_k − x̂_{k|k−1})^T}
Monte Carlo Kalman Filter (MCKF) (continue – 1)
Draw Ns samples   x_{k−1|k−1}^i ~ p(x_{k−1}|Z_{1:k−1}) = N(x_{k−1}; x̂_{k−1|k−1}, P_{k−1|k−1}),   i = 1,…,Ns

("~" means Generate (Draw) samples from the stated distribution.)
153
Numerical Integration Using a Monte Carlo Approximation
SOLO
   P_{k|k−1}^{xx} = E{(x_k − x̂_{k|k−1})(x_k − x̂_{k|k−1})^T}
                  = Q_{k−1} + E{ [f(k−1, x_{k−1}, u_{k−1}) − x̂_{k|k−1}] [f(k−1, x_{k−1}, u_{k−1}) − x̂_{k|k−1}]^T }

Approximating the expectation with the Monte Carlo samples:

   P_{k|k−1}^{xx} ≈ Q_{k−1} + (1/Ns) Σ_{i=1}^{Ns} f(k−1, x_{k−1|k−1}^i, u_{k−1}) f(k−1, x_{k−1|k−1}^i, u_{k−1})^T
                    − [ (1/Ns) Σ_{i=1}^{Ns} f(k−1, x_{k−1|k−1}^i, u_{k−1}) ] [ (1/Ns) Σ_{i=1}^{Ns} f(k−1, x_{k−1|k−1}^i, u_{k−1}) ]^T
Using the Monte Carlo Approximation we obtain:
   ẑ_{k|k−1} = E_{p(x_k|Z_{1:k−1})}{z_k} ≈ (1/Ns) Σ_{i=1}^{Ns} h(k, x_{k|k−1}^i)

   P_{k|k−1}^{zz} ≈ R_k + (1/Ns) Σ_{i=1}^{Ns} h(k, x_{k|k−1}^i) h(k, x_{k|k−1}^i)^T
                    − [ (1/Ns) Σ_{i=1}^{Ns} h(k, x_{k|k−1}^i) ] [ (1/Ns) Σ_{i=1}^{Ns} h(k, x_{k|k−1}^i) ]^T
Monte Carlo Kalman Filter (MCKF) (continue – 2)
Now we approximate the predictive PDF p(x_k|Z_{1:k−1}) as N(x_k; x̂_{k|k−1}, P_{k|k−1})
and draw new Ns samples (not necessarily the same number as before):

   x_{k|k−1}^i ~ p(x_k|Z_{1:k−1}) = N(x_k; x̂_{k|k−1}, P_{k|k−1}),   i = 1,…,Ns
154
Numerical Integration Using a Monte Carlo Approximation
SOLO
In the same way we obtain:
   P_{k|k−1}^{xz} ≈ (1/Ns) Σ_{i=1}^{Ns} x_{k|k−1}^i h(k, x_{k|k−1}^i)^T
                    − [ (1/Ns) Σ_{i=1}^{Ns} x_{k|k−1}^i ] [ (1/Ns) Σ_{i=1}^{Ns} h(k, x_{k|k−1}^i) ]^T
Monte Carlo Kalman Filter (MCKF) (continue – 3)
The Kalman Filter Equations are:
   K_k = P_{k|k−1}^{xz} (P_{k|k−1}^{zz})^{-1}
   x̂_{k|k} = x̂_{k|k−1} + K_k (z_k − ẑ_{k|k−1})
   P_{k|k}^{xx} = P_{k|k−1}^{xx} − K_k P_{k|k−1}^{zz} K_k^T
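A minimal sketch of one MCKF cycle built from the equations above, under the simplifying assumption of additive noise (Q and R added to the sample covariances rather than carried in an augmented state); names such as mckf_step, f, h are illustrative.

```python
import numpy as np

def mckf_step(x_hat, P, z, f, h, Q, R, Ns=1000, rng=np.random.default_rng()):
    """One Monte Carlo Kalman Filter cycle (additive-noise sketch)."""
    # draw Ns particles from N(x_hat, P) and propagate through the dynamics
    X = rng.multivariate_normal(x_hat, P, size=Ns)
    Xp = np.array([f(x) for x in X])
    x_pred = Xp.mean(axis=0)
    P_pred = Q + (Xp - x_pred).T @ (Xp - x_pred) / Ns

    # redraw from the Gaussian approximation of the predictive PDF
    Xp = rng.multivariate_normal(x_pred, P_pred, size=Ns)
    Zp = np.array([h(x) for x in Xp])
    z_pred = Zp.mean(axis=0)
    Pzz = R + (Zp - z_pred).T @ (Zp - z_pred) / Ns
    Pxz = (Xp - x_pred).T @ (Zp - z_pred) / Ns

    # Kalman update
    K = Pxz @ np.linalg.inv(Pzz)
    return x_pred + K @ (z - z_pred), P_pred - K @ Pzz @ K.T
```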
155
Monte Carlo Kalman Filter (MCKF)
SOLO
MCKF Summary
0  Initialization of MCKF
      x̂_0 = E{x_0},   P_{0|0} = E{(x_0 − x̂_0)(x_0 − x̂_0)^T}
      Augment the state space to include the process and measurement noises:
         x^a := [x^T  w^T  v^T]^T
         x̂_0^a = E{x^a} = [x̂_0^T  0  0]^T,   P_{0|0}^a = E{(x^a − x̂_0^a)(x^a − x̂_0^a)^T} = diag(P_{0|0}, Q, R)

System Definition:
      x_k = f(k−1, x_{k−1}, u_{k−1}) + w_{k−1},   w_{k−1} ~ N(0; Q_{k−1}),   x_0 ~ N(x̂_0; P_{0|0})
      z_k = h(k, x_k) + v_k,                      v_k ~ N(0; R_k)

For k ∈ {1, …, ∞}

1  Assuming for k−1 a Gaussian distribution with Mean and Covariance x̂_{k−1|k−1}^a, P_{k−1|k−1}^a
   Generate (Draw) Ns samples
      x_{k−1|k−1}^{a,i} ~ N(x^a; x̂_{k−1|k−1}^a, P_{k−1|k−1}^a),   i = 1,…,Ns

2  State Prediction and its Covariance
      x_{k|k−1}^{a,i} = f(k−1, x_{k−1|k−1}^{a,i}, u_{k−1}),   i = 1,…,Ns
      x̂_{k|k−1}^a = (1/Ns) Σ_{i=1}^{Ns} x_{k|k−1}^{a,i}
      P_{k|k−1}^a = (1/Ns) Σ_{i=1}^{Ns} (x_{k|k−1}^{a,i} − x̂_{k|k−1}^a)(x_{k|k−1}^{a,i} − x̂_{k|k−1}^a)^T

3  Assuming a Gaussian distribution with Mean and Covariance x̂_{k|k−1}, P_{k|k−1}
   Generate (Draw) new Ns samples
      x_{k|k−1}^{a,j} ~ N(x^a; x̂_{k|k−1}^a, P_{k|k−1}^a),   j = 1,…,Ns
156
Monte Carlo Kalman Filter (MCKF)
SOLO
MCKF Summary (continue – 1)
4  Measure Prediction
      z_{k|k−1}^j = h(k, x_{k|k−1}^{a,j}),   j = 1,…,Ns
      ẑ_{k|k−1} = (1/Ns) Σ_{j=1}^{Ns} z_{k|k−1}^j

5  Predicted Covariances Computations
      S_k = P_{k|k−1}^{zz} = (1/Ns) Σ_{j=1}^{Ns} (z_{k|k−1}^j − ẑ_{k|k−1})(z_{k|k−1}^j − ẑ_{k|k−1})^T
      P_{k|k−1}^{xz} = (1/Ns) Σ_{j=1}^{Ns} (x_{k|k−1}^{a,j} − x̂_{k|k−1}^a)(z_{k|k−1}^j − ẑ_{k|k−1})^T

6  Kalman Gain Computations
      K_k^a = P_{k|k−1}^{xz} (P_{k|k−1}^{zz})^{-1}

7  Measurement & Innovation Computation
      i_k = z_k − ẑ_{k|k−1}

8  Kalman Filter
      x̂_{k|k}^a = x̂_{k|k−1}^a + K_k^a i_k
      P_{k|k}^a = P_{k|k−1}^a − K_k^a S_k (K_k^a)^T

k := k+1 & return to 1
157
[Block diagram: Input Data → Sensor Data Processing and Measurement Formation → Observation-to-Track Association → Track Maintenance (Initialization, Confirmation and Deletion) → Filtering and Prediction → Gating Computations]

Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
SOLO
Monte Carlo Kalman Filter (MCKF)
Table of Content
158
Nonlinear Estimation Using Particle Filters
SOLO
We assumed that p(x_k|Z_{1:k}) is a Gaussian PDF. If the true PDF is not Gaussian
(multimodal, heavily skewed, or non-standard – not represented by any standard PDF),
a Gaussian distribution can never describe it well.
Non-Additive Non-Gaussian Nonlinear Filter
   x_k = f(x_{k−1}, w_{k−1})
   z_k = h(x_k, v_k)

w_{k−1} & v_k are system and measurement white-noise sequences, independent of past and
current states and of each other, with known P.D.F.s p(w_{k−1}) & p(v_k).

We want to compute p(x_k|Z_{1:k}) recursively, assuming knowledge of p(x_{k−1}|Z_{1:k−1}),
in two stages: prediction (before) and update (after measurement).

Prediction (before measurement)
Use the Chapman – Kolmogorov Equation to obtain:

   p(x_k|Z_{1:k−1}) = ∫ p(x_k|x_{k−1}) p(x_{k−1}|Z_{1:k−1}) dx_{k−1}

where:
   p(x_k|x_{k−1}) = ∫ p(x_k|x_{k−1}, w_{k−1}) p(w_{k−1}|x_{k−1}) dw_{k−1}

By assumption  p(w_{k−1}|x_{k−1}) = p(w_{k−1}).

Since by knowing x_{k−1} & w_{k−1}, x_k is deterministically given by the system equation,
we have

   p(x_k|x_{k−1}, w_{k−1}) = δ(x_k − f(x_{k−1}, w_{k−1})) = { 1 if x_k = f(x_{k−1}, w_{k−1}); 0 otherwise }

Therefore:
   p(x_k|x_{k−1}) = ∫ δ(x_k − f(x_{k−1}, w_{k−1})) p(w_{k−1}) dw_{k−1}
159
Nonlinear Estimation Using Particle Filters
SOLO Non-Additive Non-Gaussian Nonlinear Filter
   x_k = f(x_{k−1}, w_{k−1}),   z_k = h(x_k, v_k)

w_{k−1} & v_k are system and measurement white-noise sequences, independent of past and
current states and of each other, with known P.D.F.s p(w_{k−1}) & p(v_k).

We want to compute p(x_k|Z_{1:k}) recursively, assuming knowledge of p(x_{k−1}|Z_{1:k−1}),
in two stages: prediction (before) and update (after measurement).

1  Prediction (before measurement)
   p(x_k|Z_{1:k−1}) = ∫ p(x_k|x_{k−1}) p(x_{k−1}|Z_{1:k−1}) dx_{k−1}
   p(x_k|x_{k−1}) = ∫ δ(x_k − f(x_{k−1}, w_{k−1})) p(w_{k−1}) dw_{k−1}

2  Update (after measurement)
   p(x_k|Z_{1:k}) = p(x_k|z_k, Z_{1:k−1})
                  = p(z_k|x_k) p(x_k|Z_{1:k−1}) / p(z_k|Z_{1:k−1})          (Bayes)
                  = p(z_k|x_k) p(x_k|Z_{1:k−1}) / ∫ p(z_k|x_k) p(x_k|Z_{1:k−1}) dx_k

where:
   p(z_k|x_k) = ∫ p(z_k|x_k, v_k) p(v_k|x_k) dv_k

By assumption  p(v_k|x_k) = p(v_k).

Since by knowing x_k & v_k, z_k is deterministically given by the measurement equation:

   p(z_k|x_k, v_k) = δ(z_k − h(x_k, v_k)) = { 1 if z_k = h(x_k, v_k); 0 otherwise }

Therefore:
   p(z_k|x_k) = ∫ δ(z_k − h(x_k, v_k)) p(v_k) dv_k
160
Nonlinear Estimation Using Particle Filters
SOLO Non-Additive Non-Gaussian Nonlinear Filter
   x_k = f(x_{k−1}, w_{k−1}),   z_k = h(x_k, v_k)

w_{k−1} & v_k are system and measurement white-noise sequences, independent of past and
current states and of each other, with known P.D.F.s p(w_{k−1}) & p(v_k).

We want to compute p(x_k|Z_{1:k}) recursively, assuming knowledge of p(x_{k−1}|Z_{1:k−1}),
in two stages: prediction (before) and update (after measurement).

1  Prediction (before measurement)
   p(x_k|Z_{1:k−1}) = ∫ p(x_k|x_{k−1}) p(x_{k−1}|Z_{1:k−1}) dx_{k−1}
   p(x_k|x_{k−1}) = ∫ δ(x_k − f(x_{k−1}, w_{k−1})) p(w_{k−1}) dw_{k−1}

2  Update (after measurement)
   p(x_k|Z_{1:k}) = p(z_k|x_k) p(x_k|Z_{1:k−1}) / ∫ p(z_k|x_k) p(x_k|Z_{1:k−1}) dx_k
   p(z_k|x_k) = ∫ δ(z_k − h(x_k, v_k)) p(v_k) dv_k

We need to evaluate these integrals. Analytic solutions for those integral equations do
not exist in the general case, so we use the numeric Monte Carlo Method to evaluate them.

Generate (Draw):   w_{k−1}^i ~ p(w_{k−1})  &  v_k^i ~ p(v_k),   i = 1,…,Ns

   p(x_k|x_{k−1}) ≈ (1/Ns) Σ_{i=1}^{Ns} δ(x_k − f(x_{k−1}, w_{k−1}^i))
   p(z_k|x_k)     ≈ (1/Ns) Σ_{i=1}^{Ns} δ(z_k − h(x_k, v_k^i))

or, with  x_k^i = f(x_{k−1}^i, w_{k−1}^i)  and  z_k^i = h(x_k^i, v_k^i):

   p(x_k|x_{k−1}) ≈ (1/Ns) Σ_{i=1}^{Ns} δ(x_k − x_k^i)
   p(z_k|x_k)     ≈ (1/Ns) Σ_{i=1}^{Ns} δ(z_k − z_k^i)
161
SOLO
Monte Carlo Computations of p(x_k|x_{k−1}) and p(z_k|x_k)

   x_k = f(k−1, x_{k−1}, u_{k−1}, w_{k−1}),   given p_w(w_{k−1}), p_{x0}(x_0)
   z_k = h(k, x_k, v_k),                      given p_v(v_k)

0  Initialization
   Generate (Draw)  x_0^i ~ p_{x0}(x_0),   i = 1,…,Ns

For k ∈ {1, …, ∞}

1  At stage k−1
   Generate (Draw) Ns samples  w_{k−1}^i ~ p_w(w_{k−1}),   i = 1,…,Ns

2  State Update
   x_k^i = f(x_{k−1}^i, u_{k−1}, w_{k−1}^i),   i = 1,…,Ns
   p(x_k|x_{k−1}) ≈ (1/Ns) Σ_{i=1}^{Ns} δ(x_k − x_k^i)

3  Generate (Draw) Measurement Noise   v_k^i ~ p_v(v_k),   i = 1,…,Ns

4  Measurement z_k, Update
   z_k^i = h(x_k^i, v_k^i),   i = 1,…,Ns
   p(z_k|x_k) ≈ (1/Ns) Σ_{i=1}^{Ns} δ(z_k − z_k^i)

k := k+1 & return to 1
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter
162
Nonlinear Estimation Using Particle Filters
SOLO Non-Additive Non-Gaussian Nonlinear Filter
   x_k = f(x_{k−1}, w_{k−1}),   z_k = h(x_k, v_k)

w_{k−1} & v_k are system and measurement white-noise sequences, independent of past and
current states and of each other, with known P.D.F.s p(w_{k−1}) & p(v_k).

We want to compute p(x_k|Z_{1:k}) recursively, assuming knowledge of p(x_{k−1}|Z_{1:k−1}),
in two stages: prediction (before) and update (after measurement).

1  Prediction (before measurement)
   p(x_k|Z_{1:k−1}) = ∫ p(x_k|x_{k−1}) p(x_{k−1}|Z_{1:k−1}) dx_{k−1}

2  Update (after measurement)
   p(x_k|Z_{1:k}) = p(z_k|x_k) p(x_k|Z_{1:k−1}) / ∫ p(z_k|x_k) p(x_k|Z_{1:k−1}) dx_k

We use the numeric Monte Carlo Method to evaluate the integrals.

Generate (Draw):   w_{k−1}^i ~ p(w_{k−1})  &  v_k^i ~ p(v_k),   i = 1,…,Ns

   x_k^i = f(x_{k−1}^i, w_{k−1}^i)  →  p(x_k|x_{k−1}) ≈ (1/Ns) Σ_{i=1}^{Ns} δ(x_k − x_k^i)
   z_k^i = h(x_k^i, v_k^i)          →  p(z_k|x_k)     ≈ (1/Ns) Σ_{i=1}^{Ns} δ(z_k − z_k^i)

Substituting the particle approximation into the prediction integral:

   p(x_k|Z_{1:k−1}) ≈ (1/Ns) Σ_{i=1}^{Ns} ∫ δ(x_k − x_k^i) p(x_{k−1}|Z_{1:k−1}) dx_{k−1}
                    = (1/Ns) Σ_{i=1}^{Ns} δ(x_k − x_{k|k−1}^i)

Since we use Ns points to describe the probabilities, we call those points Particles.
Table of Content
163
Nonlinear Estimation Using Particle Filters
SOLO
We assumed that p(x_k|Z_{1:k}) is a Gaussian PDF. If the true PDF is not Gaussian
(multimodal, heavily skewed, or non-standard – not represented by any standard PDF),
a Gaussian distribution can never describe it well. In such cases approximate
Grid-Based Filters and Particle Filters will yield an improvement, at the cost of
heavy computation demand.
To overcome this difficulty we use the Principle of Importance Sampling.
Suppose that p(x_k|Z_{1:k}) is a PDF from which it is difficult to draw samples.
Also suppose that q(x_k|Z_{1:k}) is another PDF from which samples can be easily drawn
(referred to as the Importance Density), for example a Gaussian PDF.
Now assume that we can find at each sample the scale factor w(x_k) between the
two densities:

   w(x_k) := p(x_k|Z_{1:k}) / q(x_k|Z_{1:k}) > 0

Using this we can write:

   E_{p(x_k|Z_{1:k})}{g(x_k)} = ∫ g(x_k) p(x_k|Z_{1:k}) dx_k
        = ∫ g(x_k) [p(x_k|Z_{1:k}) / q(x_k|Z_{1:k})] q(x_k|Z_{1:k}) dx_k / ∫ [p(x_k|Z_{1:k}) / q(x_k|Z_{1:k})] q(x_k|Z_{1:k}) dx_k
        = ∫ g(x_k) w(x_k) q(x_k|Z_{1:k}) dx_k / ∫ w(x_k) q(x_k|Z_{1:k}) dx_k

Non-Additive Non-Gaussian Nonlinear Filter:   x_k = f(x_{k−1}, w_{k−1}),  z_k = h(x_k, v_k)
Importance Sampling (IS)
164
SOLO
   E_{p(x_k|Z_{1:k})}{g(x_k)} = ∫ g(x_k) w(x_k) q(x_k|Z_{1:k}) dx_k / ∫ w(x_k) q(x_k|Z_{1:k}) dx_k

Generate (draw) Ns particle samples { x_k^i, i = 1,…,Ns } from q(x_k|Z_{1:k}):

   x_k^i ~ q(x_k|Z_{1:k}),   i = 1,…,Ns

and estimate g(x_k) using a Monte Carlo approximation:

   E_{p(x_k|Z_{1:k})}{g(x_k)} ≈ (1/Ns) Σ_{i=1}^{Ns} g(x_k^i) w(x_k^i) / [ (1/Ns) Σ_{i=1}^{Ns} w(x_k^i) ]
                              = (1/Ns) Σ_{i=1}^{Ns} g(x_k^i) w̃(x_k^i)

where the normalized weights are

   w̃(x_k^i) := w(x_k^i) / [ (1/Ns) Σ_{j=1}^{Ns} w(x_k^j) ]

Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter:   x_k = f(x_{k−1}, w_{k−1}),  z_k = h(x_k, v_k)
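A small numerical sketch of this self-normalized importance-sampling estimate; the bimodal target p(x) and the Gaussian importance density q(x) below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def gauss_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

# hypothetical bimodal target p(x) and Gaussian importance density q(x)
p = lambda x: 0.5 * gauss_pdf(x, -2.0, 0.5) + 0.5 * gauss_pdf(x, 3.0, 1.0)
q_mean, q_var, Ns = 0.0, 9.0, 50_000

x = rng.normal(q_mean, np.sqrt(q_var), size=Ns)    # draw particles from q
w = p(x) / gauss_pdf(x, q_mean, q_var)             # unnormalized weights w(x_i) = p/q
w_tilde = w / w.sum()                              # normalized weights

g_mean = np.sum(w_tilde * x)                       # E_p[g(x)] with g(x) = x, close to 0.5
```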
Importance Sampling (IS)
Table of Content
165
SOLO
It would be useful if the importance density could be generated recursively (sequentially).
   w(x_k) = p(x_k|Z_{1:k}) / q(x_k|Z_{1:k})
          = p(z_k|x_k) p(x_k|Z_{1:k−1}) / [ p(z_k|Z_{1:k−1}) q(x_k|Z_{1:k}) ]          (Bayes)
          = c · p(z_k|x_k) p(x_k|Z_{1:k−1}) / q(x_k|Z_{1:k}),        c := 1 / p(z_k|Z_{1:k−1})

Using:
   p(x_k, x_{k−1}|Z_{1:k−1}) = p(x_k|x_{k−1}, Z_{1:k−1}) p(x_{k−1}|Z_{1:k−1})          (Bayes)

we obtain:
   p(x_k|Z_{1:k−1}) = ∫ p(x_k, x_{k−1}|Z_{1:k−1}) dx_{k−1} = ∫ p(x_k|x_{k−1}, Z_{1:k−1}) p(x_{k−1}|Z_{1:k−1}) dx_{k−1}

In the same way:
   q(x_k|Z_{1:k}) = ∫ q(x_k, x_{k−1}|Z_{1:k−1}) dx_{k−1} = ∫ q(x_k|x_{k−1}, Z_{1:k−1}) q(x_{k−1}|Z_{1:k−1}) dx_{k−1}

Therefore:
   w(x_k) = c · p(z_k|x_k) ∫ p(x_k|x_{k−1}, Z_{1:k−1}) p(x_{k−1}|Z_{1:k−1}) dx_{k−1}
            / ∫ q(x_k|x_{k−1}, Z_{1:k−1}) q(x_{k−1}|Z_{1:k−1}) dx_{k−1}
Sequential Importance Sampling (SIS)
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
166
SOLO
It would be useful if the importance density could be generated recursively.
( ) ( ) ( )
( )
( ) ( ) ( )
( ) ( )∫
∫
−−−−−
−−−−−−
==
11:111:11
11:111:11
:1
1:1
|,|
|,||
|
||
kkkkkk
kkkkkkkk
kk
kkkk
k
xdZxqZxxq
xdZxpZxxpxzpc
Zxq
Zxpxzpc
xw
Suppose that at k−1 we have Ns particle samples and their probabilities
{ x_{k−1|k−1}^i, w_{k−1}^i, i = 1,…,Ns }, which constitute a random measure that characterizes
the posterior PDF for time up to t_{k−1}. Then

   p(x_{k−1}|Z_{1:k−1}) ≈ Σ_{i=1}^{Ns} w_{k−1}^i δ(x_{k−1} − x_{k−1|k−1}^i)
   q(x_{k−1}|Z_{1:k−1}) ≈ Σ_{i=1}^{Ns} w_{k−1}^i δ(x_{k−1} − x_{k−1|k−1}^i)

Substituting these into the expression for w(x_k):

   w(x_k) = c · p(z_k|x_k) ∫ p(x_k|x_{k−1}, Z_{1:k−1}) Σ_{i=1}^{Ns} w_{k−1}^i δ(x_{k−1} − x_{k−1|k−1}^i) dx_{k−1}
            / ∫ q(x_k|x_{k−1}, Z_{1:k−1}) Σ_{i=1}^{Ns} w_{k−1}^i δ(x_{k−1} − x_{k−1|k−1}^i) dx_{k−1}
Sequential Importance Sampling (SIS) (continue – 1)
We obtained:
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
167
SOLO
   w(x_k) = p(x_k|Z_{1:k}) / q(x_k|Z_{1:k}) = c · p(z_k|x_k) p(x_k|Z_{1:k−1}) / q(x_k|Z_{1:k})          (Bayes)

Evaluating w(x_k) at a particle x_{k|k−1}^i drawn from q(x_k|x_{k−1|k−1}^i, Z_{1:k−1}), the δ-function
sums in the numerator and denominator collapse onto the i-th particle:

   w(x_{k|k−1}^i) = c · p(z_k|x_{k|k−1}^i) p(x_{k|k−1}^i|x_{k−1|k−1}^i, Z_{1:k−1}) p(x_{k−1|k−1}^i|Z_{1:k−1})
                    / [ q(x_{k|k−1}^i|x_{k−1|k−1}^i, Z_{1:k−1}) q(x_{k−1|k−1}^i|Z_{1:k−1}) ]

Define

   w_k^i := w(x_{k|k−1}^i) = p(x_{k|k−1}^i|Z_{1:k}) / q(x_{k|k−1}^i|Z_{1:k})
   w_{k−1}^i := w(x_{k−1|k−1}^i) = p(x_{k−1|k−1}^i|Z_{1:k−1}) / q(x_{k−1|k−1}^i|Z_{1:k−1})

Since w_{k−1}^i is exactly the last ratio above, we obtain the weight recursion

   w_k^i = c · w_{k−1}^i · p(z_k|x_{k|k−1}^i) p(x_{k|k−1}^i|x_{k−1|k−1}^i) / q(x_{k|k−1}^i|x_{k−1|k−1}^i)
Sequential Importance Sampling (SIS) (continue – 2)
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
168
SOLO
Sequential Importance Sampling (SIS) (continue – 3)
   w̃_k^i = w̃_{k−1}^i · p(z_k|x_k^i) p(x_k^i|x_{k−1}^i) / q(x_k^i|x_{k−1}^i, z_k),      w_k^i = w̃_k^i / Σ_{j=1}^{Ns} w̃_k^j

   p(x_k|Z_{1:k}) ≈ Σ_{i=1}^{Ns} w_k^i δ(x_k − x_k^i)

SIS Algorithm ("Run This"):

0  Initialization
   Generate (Draw)  x_0^i ~ p_{x0}(x_0),   i = 1,…,Ns

For k ∈ {1, …, ∞}

1  At stage k−1
   Generate (Draw) Ns samples  w_{k−1}^i ~ p_w(w_{k−1})

2  State Update
   x_k^i = f(x_{k−1}^i, u_{k−1}, w_{k−1}^i),   i = 1,…,Ns

3  Start with the approximation  p(x_k|x_{k−1}) ≈ (1/Ns) Σ_{i=1}^{Ns} δ(x_k − x_k^i)
   Generate (Draw) Ns samples  v_k^i ~ p_v(v_k)
   Compute  z_k^i = h(x_k^i, v_k^i)
   Approximate  p(z_k|x_k^i) ≈ (1/Ns) Σ_{i=1}^{Ns} δ(z_k − z_k^i)

4  After measurement z_k we compute the weights w̃_k^i and w_k^i as above, and

   p(x_k|Z_{1:k}) ≈ { x_k^i, w̃_k^i }

k := k+1 & return to 1
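A minimal sketch of one SIS step with the bootstrap choice q(x_k|x_{k−1}, z_k) = p(x_k|x_{k−1}), for which the weight update reduces to multiplication by the likelihood. The names sis_bootstrap_step, w_sampler and likelihood are illustrative, not part of the original algorithm statement.

```python
import numpy as np

def sis_bootstrap_step(particles, weights, z, f, likelihood, w_sampler, rng):
    """One SIS step with bootstrap proposal (sketch of steps 1-4 above).
    w_sampler(rng, n) draws n process-noise samples; likelihood(z, x) = p(z_k|x_k)."""
    Ns = len(particles)
    # 1-2. propagate each particle through the dynamics with fresh process noise
    noise = w_sampler(rng, Ns)
    particles = np.array([f(x, w) for x, w in zip(particles, noise)])
    # 3-4. weight update: with q = p(x_k|x_{k-1}) the ratio reduces to the likelihood
    weights = weights * np.array([likelihood(z, x) for x in particles])
    weights /= weights.sum()                     # normalize
    return particles, weights

# filtered estimate: x_hat = np.sum(weights[:, None] * particles, axis=0)
```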
169
SOLO
The resulting sequential importance sampling (SIS) algorithm is a Monte Carlo method
that forms the basis for most sequential MC Filters.
Sequential Importance Sampling (SIS) (continue – 4)
This sequential Monte Carlo method is known variously as:
• Bootstrap Filtering
• Condensation Algorithm
• Particle Filtering
• Interacting Particle Approximation
• Survival of the Fittest
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
170
SOLO
Degeneracy Problem
Sequential Importance Sampling (SIS) (continue – 5)
A common problem with the SIS particle filter is the degeneracy phenomenon: after
a few iterations, all but one particle will have negligible weights.
It can be shown that the variance of the importance weights w_k^i of the SIS algorithm
can only increase over time, and that leads to the degeneracy problem. A suitable measure
of degeneracy is given by:

   N̂_eff = 1 / Σ_{i=1}^{N} (w_k^i)²,      where  Σ_{i=1}^{N} w_k^i = 1

To see this let us look at the following two cases:

1  w_k^i = 1/N, i = 1,…,N   ⇒   N̂_eff = 1 / Σ_{i=1}^{N} (1/N)² = N

2  w_k^i = 1 for i = j, 0 for i ≠ j   ⇒   N̂_eff = 1 / Σ_{i=1}^{N} (w_k^i)² = 1
Hence, small Neff indicates a severe degeneracy and vice versa.
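The degeneracy measure is a one-liner in practice (a minimal sketch):

```python
import numpy as np

def effective_sample_size(weights):
    """N_eff estimate from normalized weights; resample when it falls below a threshold."""
    w = np.asarray(weights)
    return 1.0 / np.sum(w ** 2)

# uniform weights give N_eff = N, a single dominant weight gives N_eff = 1
print(effective_sample_size(np.full(100, 1 / 100)))       # -> 100.0
print(effective_sample_size(np.r_[1.0, np.zeros(99)]))    # -> 1.0
```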
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
Table of Content
171
SOLO
The Bootstrap (Resampling)
• Popularized by Brad Efron (1979)
• The Bootstrap is a name generically applied to statistical
resampling schemes that allow uncertainty in the data to
be assessed from the data themselves, in other words
"pulling yourself up by your bootstraps".
The disadvantage of bootstrapping is that while (under some conditions) it is
asymptotically consistent, it does not provide general finite-sample
guarantees, and has a tendency to be overly optimistic. The apparent
simplicity may conceal the fact that important assumptions are being made
when undertaking the bootstrap analysis (e.g. independence of samples)
where these would be more formally stated in other approaches.
The advantage of bootstrapping over analytical methods is its great simplicity - it is
straightforward to apply the bootstrap to derive estimates of standard errors and
confidence intervals for complex estimators of complex parameters of the
distribution, such as percentile points, proportions, odds ratio, and correlation
coefficients.
Sequential Importance Resampling (SIR)
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
Bradley Efron
1938
Stanford U.
172
SOLO
Resampling
Sequential Importance Resampling (SIR) (continue – 1)
Whenever a significant degeneracy is observed (i.e., when N_eff falls below some
threshold N_thr) during the sampling, where we obtained

   p(x_k|Z_{1:k}) ≈ Σ_{i=1}^{N} w_k^i δ(x_k − x_k^i)

we need to resample and replace the weighted representation { x_k^i, w_k^i, i = 1,…,N }
with a random measure { x_k^{i*}, 1/N, i = 1,…,N }.

This is done by first computing the Cumulative Distribution Function (C.D.F.) of the
sampled weights w_k^i:

   Initialize the C.D.F.:  c_1 = w_k^1
   For i = 2:N
      Compute the C.D.F.:  c_i = c_{i−1} + w_k^i
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
173
SOLO
Resampling (continue – 1)
Sequential Importance Resampling (SIR) (continue – 2)
Using the Inverse Transform method, we generate N independent and
identically distributed (i.i.d.) variables from the uniform distribution u, sort them in
ascending order, and compare them with the Cumulative Distribution Function (C.D.F.)
of the normalized weights.
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
174
SOLO
Resampling Algorithm (continue – 2)
Sequential Importance Resampling (SIR) (continue – 3)
0  Initialize the C.D.F.:  c_1 = w_k^1
   For i = 2:N,  compute the C.D.F.:  c_i = c_{i−1} + w_k^i
   Start at the bottom of the C.D.F.:  i = 1
   Draw from the uniform distribution:  u_1 ~ U[0, N^{-1}]

1  For j = 1:N

2     Move along the C.D.F.:  u_j = u_1 + (j − 1) N^{-1}

3     WHILE u_j > c_i
          i := i + 1
      END WHILE

4     Assign sample:  x_k^{j*} = x_k^i
      Assign weight:  w_k^j = N^{-1}
      Assign parent:  i^j = i

5  END For
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
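A sketch of the C.D.F.-based (systematic) resampling above, with the inner WHILE loop expressed through np.searchsorted; the function name systematic_resample is illustrative.

```python
import numpy as np

def systematic_resample(particles, weights, rng=np.random.default_rng()):
    """Systematic resampling: one uniform draw u_1 ~ U[0, 1/N], then strides of 1/N
    along the cumulative weights, as in the C.D.F. algorithm above."""
    N = len(weights)
    c = np.cumsum(weights)           # C.D.F. of the normalized weights
    c[-1] = 1.0                      # guard against round-off
    u = rng.uniform(0.0, 1.0 / N) + np.arange(N) / N
    idx = np.searchsorted(c, u)      # the WHILE u_j > c_i loop, vectorized
    return particles[idx], np.full(N, 1.0 / N), idx   # samples, weights 1/N, parent indices
```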
175
SOLO
Resampling
Sequential Importance Resampling (SIR) (continue – 4)
SIR loop ("Run This"):

0  Start with the approximation
      p(x_k|Z_{1:k−1}) ≈ { x_k^i, 1/N },   i.e.  p(x_k|Z_{1:k−1}) ≈ (1/N) Σ_{i=1}^{N} δ(x_k − x_k^i)

1  After measurement z_k compute the weights
      w̃_k^i = w̃_{k−1}^i · p(z_k|x_k^i) p(x_k^i|x_{k−1}^i) / q(x_k^i|x_{k−1}^i, z_k),     w_k^i = w̃_k^i / Σ_j w̃_k^j
   to obtain
      p(x_k|Z_{1:k}) ≈ { x_k^i, w̃_k^i },      p(x_k|Z_{1:k}) ≈ Σ_{i=1}^{N} w_k^i δ(x_k − x_k^i)

2  If  N_eff = 1 / Σ_{i=1}^{N} (w_k^i)²  <  N_thr,  Resample
   to obtain  p(x_k|Z_{1:k}) ≈ { x_k^{i*}, 1/N }

3  Prediction
      x_{k+1}^i = f(x_k^{i*}, u_k, n_k^i)
   to obtain  p(x_{k+1}|Z_{1:k}) ≈ { x_{k+1}^i, 1/N }

k := k+1 & return to 1
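Putting the pieces together, a hedged sketch of one full SIR cycle (bootstrap proposal, weight update, N_eff test, systematic resampling); propagate and likelihood are assumed user-supplied callables, and the default threshold N/2 is an arbitrary illustrative choice.

```python
import numpy as np

def sir_step(particles, weights, z, propagate, likelihood, rng, n_thr=None):
    """One SIR cycle (sketch of steps 0-3 above).
    propagate(x, rng) draws x_k ~ p(x_k|x_{k-1}); likelihood(z, x) = p(z_k|x_k)."""
    N = len(particles)
    n_thr = N // 2 if n_thr is None else n_thr

    particles = np.array([propagate(x, rng) for x in particles])   # prediction
    weights = weights * np.array([likelihood(z, x) for x in particles])
    weights /= weights.sum()

    n_eff = 1.0 / np.sum(weights ** 2)
    if n_eff < n_thr:                                               # resample
        c = np.cumsum(weights)
        u = rng.uniform(0, 1.0 / N) + np.arange(N) / N
        particles = particles[np.searchsorted(c, u)]
        weights = np.full(N, 1.0 / N)
    return particles, weights
```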
176
SOLO
Resampling
Sequential Importance Resampling (SIR) (continue – 5)
Although the resampling step reduces the effect of degeneracy, it introduces other
practical problems:

1  It limits the possibility of parallel implementation.

2  The particles that have high w_k^i are statistically selected many times. This leads to
   loss of diversity among the particles (sample impoverishment).

Several other techniques for generating samples from an unknown P.D.F., besides
Importance Sampling, have been presented in the literature. If the P.D.F. is stationary,
Markov Chain Monte Carlo (MCMC) methods have been proposed:
• Metropolis – Hastings (MH)
• Gibbs sampler (a special case of MH)
(see Probability Presentation)
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
177
SOLO
Selection of Importance Density
Sequential Importance Resampling (SIR) (continue – 6)
The choice of the Importance Density q (xk|xk-1,zk) is one of the most critical issues in
the design of the Particle Filter.
The Optimal Choice
The Optimal Importance Density q (xk|xk-1,zk), that minimizes the variance of importance
weights, conditioned upon xk-1
i
and zk has been shown to be:
   q(x_k|x_{k−1}^i, z_k)_opt = p(x_k|x_{k−1}^i, z_k)
                             = p(z_k|x_k, x_{k−1}^i) p(x_k|x_{k−1}^i) / p(z_k|x_{k−1}^i)        (Bayes)

Substitution of this into:

   w_k^i = w_{k−1}^i · p(z_k|x_k^i) p(x_k^i|x_{k−1}^i) / q(x_k^i|x_{k−1}^i, z_k)

gives:

   w_k^i = w_{k−1}^i · p(z_k|x_{k−1}^i)

From this equation we can see that the importance weights at time k can be computed
(and, if necessary, resampling can be performed) before the particles are propagated to time k.

In order to use the optimal importance function we must:

1  sample from p(x_k|x_{k−1}, z_k);

2  evaluate:   p(z_k|x_{k−1}^i) = ∫ p(z_k|x_k) p(x_k|x_{k−1}^i) dx_k
In the general case either of these two tasks can be difficult.
Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter ( )
( )kkk
kkk
vxhz
wxfx
,
, 11
=
= −−
178
Sequential Importance Resampling Particle Filter (SIRPF)
SOLO
SIRPF Summary
Initialization of SIRPF
      Draw  x̂_0 ~ p_{x0}(x_0)

System Definition
      x_k = f(k−1, x_{k−1}, u_{k−1}, w_{k−1}),   w_{k−1} ~ p_w(w),   x_0 ~ p_{x0}(x_0)
      z_k = h(k, x_k, v_k),                      v_k ~ p_v(v)

For k ∈ {1, …, ∞}

1  Assuming for k−1 a Gaussian distribution with Mean and Covariance x̂_{k−1|k−1}, P_{k−1|k−1}
   Generate Ns samples
      x_{k−1|k−1}^i = N(x_{k−1}; x̂_{k−1|k−1}, P_{k−1|k−1}),   i = 1,…,Ns

2  State Prediction and its Covariance
      x_{k|k−1}^i = f(k−1, x_{k−1|k−1}^i, u_{k−1}),   i = 1,…,Ns

3  Assuming a Gaussian distribution with Mean and Covariance x̂_{k|k−1}, P_{k|k−1}
   Generate new Ns samples
      x_{k|k−1}^j = N(x_k; x̂_{k|k−1}, P_{k|k−1}),   j = 1,…,Ns
Table of Content
179
Monte Carlo Particle Filter (MCPF)
SOLO
MCPF Summary
Initialization of MCPF
      x̂_0 = E{x_0},   P_{0|0} = E{(x_0 − x̂_0)(x_0 − x̂_0)^T}
      Augmented state  x^a := [x^T  w^T  v^T]^T
      x̂_0^a = E{x^a} = [x̂_0^T  0  0]^T,   P_{0|0}^a = E{(x^a − x̂_0^a)(x^a − x̂_0^a)^T} = diag(P_{0|0}, Q, R)

System Definition
      x_k = f(k−1, x_{k−1}, u_{k−1}) + w_{k−1},   E{w_k} = 0,  E{w_k w_l^T} = Q_k δ_{kl}
      z_k = h(k, x_k) + v_k,                      E{v_k} = 0,  E{v_k v_l^T} = R_k δ_{kl}

For k ∈ {1, …, ∞}

1  Assuming for k−1 a Gaussian distribution with Mean and Covariance x̂_{k−1|k−1}, P_{k−1|k−1}
   Generate Ns samples
      x_{k−1|k−1}^i = N(x_{k−1}; x̂_{k−1|k−1}, P_{k−1|k−1}),   i = 1,…,Ns

2  State Prediction and its Covariance
      x_{k|k−1}^i = f(k−1, x_{k−1|k−1}^i, u_{k−1}),   i = 1,…,Ns
      x̂_{k|k−1} = (1/Ns) Σ_{i=1}^{Ns} x_{k|k−1}^i
      P_{k|k−1} = (1/Ns) Σ_{i=1}^{Ns} (x_{k|k−1}^i − x̂_{k|k−1})(x_{k|k−1}^i − x̂_{k|k−1})^T

3  Assuming a Gaussian distribution with Mean and Covariance x̂_{k|k−1}, P_{k|k−1}
   Generate new Ns samples
      x_{k|k−1}^j = N(x_k; x̂_{k|k−1}, P_{k|k−1}),   j = 1,…,Ns
180
Monte Carlo Particle Filter (MCPF)
SOLO
MCPF Summary (continue – 1)
4  Measure Prediction
      z_{k|k−1}^j = h(k, x_{k|k−1}^j),   j = 1,…,Ns
      ẑ_{k|k−1} = (1/Ns) Σ_{j=1}^{Ns} z_{k|k−1}^j

5  Predicted Covariances Computations
      S_k = P_{k|k−1}^{zz} = (1/Ns) Σ_{j=1}^{Ns} (z_{k|k−1}^j − ẑ_{k|k−1})(z_{k|k−1}^j − ẑ_{k|k−1})^T
      P_{k|k−1}^{xz} = (1/Ns) Σ_{j=1}^{Ns} (x_{k|k−1}^j − x̂_{k|k−1})(z_{k|k−1}^j − ẑ_{k|k−1})^T

6  Innovation and its Covariance
      i_k = z_k − ẑ_{k|k−1}

7  Kalman Gain Computations
      K_k = P_{k|k−1}^{xz} (P_{k|k−1}^{zz})^{-1}

8  Kalman Filter
      μ_{k|k}^x = x̂_{k|k−1} + K_k i_k
      Σ_{k|k}^{xx} = P_{k|k−1} − K_k S_k K_k^T

9  Importance Sampling using the Gaussian Mean and Covariance μ_{k|k}^x, Σ_{k|k}^{xx}
   Generate new Ns samples
      x_{k|k}^m = N(x; μ_{k|k}^x, Σ_{k|k}^{xx}),   m = 1,…,Ns

10 Weight Update
      w̃_k^m = p(z_k|x_{k|k}^m) N(x_{k|k}^m; x̂_{k|k−1}, P_{k|k−1}) / N(x_{k|k}^m; μ_{k|k}^x, Σ_{k|k}^{xx}),   m = 1,…,Ns
      w_k^m = w̃_k^m / Σ_{l=1}^{Ns} w̃_k^l

11 Update State and its Covariance
      x̂_{k|k} = Σ_{m=1}^{Ns} w_k^m x_{k|k}^m
      P_{k|k} = Σ_{m=1}^{Ns} w_k^m (x_{k|k}^m − x̂_{k|k})(x_{k|k}^m − x̂_{k|k})^T

k := k+1 & return to 1
181
[Block diagram: Input Data → Sensor Data Processing and Measurement Formation → Observation-to-Track Association → Track Maintenance (Initialization, Confirmation and Deletion) → Filtering and Prediction → Gating Computations]

Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
SOLO
Monte Carlo Particle Filter (MCPF)
Table of Content
182
Estimators
z = H x + v
SOLO
Maximum Likelihood Estimate (MLE)
For the particular vector measurement equation

   z = H x + v

where the measurement noise v is gaussian (normal) with zero mean, v ~ N(0, R),
and independent of x, the conditional probability p_{z|x}(z|x) can be written,
using Bayes rule, as:

   p_{z|x}(z|x) = p_{x,z}(x, z) / p_x(x)

The measurement noise v can be related to x and z by the function:

   v = z − H x = f(x, z)

   p_{x,z}(x, z) = p_{x,v}(x, v) / |J J^T|^{1/2}

where the Jacobian of the transformation is

   J = ∂f/∂z = [ ∂f_i/∂z_j ] = I_{p×p}      ⇒  |J J^T|^{1/2} = 1

Since the measurement noise v is independent of x, the joint probability of x and z is given by:

   p_{x,z}(x, z) = p_{x,v}(x, v) = p_x(x) · p_v(v)
183
Estimators    SOLO
Maximum Likelihood Estimate (continue – 1)
   p_{x,z}(x, z) = p_x(x) · p_v(v)

   p_{z|x}(z|x) = p_{x,z}(x, z) / p_x(x) = p_v(v),      v = z − H x      (gaussian, with zero mean)

   p_{z|x}(z|x) = 1/[(2π)^{p/2} |R|^{1/2}] exp{ −½ (z − H x)^T R^{-1} (z − H x) }

   max_x p_{z|x}(z|x)   ⇔   min_x (z − H x)^T R^{-1} (z − H x)       (Weighted Least Squares, WLS(R))

   ∂/∂x [ (z − H x)^T R^{-1} (z − H x) ] = −2 H^T R^{-1} (z − H x) = 0

   H^T R^{-1} z − H^T R^{-1} H x* = 0

   x̂ := x* = (H^T R^{-1} H)^{-1} H^T R^{-1} z

   ∂²/∂x² [ (z − H x)^T R^{-1} (z − H x) ] = 2 H^T R^{-1} H

This is a positive definite matrix, therefore the solution minimizes
(z − H x)^T R^{-1} (z − H x) and maximizes p_{z|x}(z|x).

   L(z, x) := p_{z|x}(z|x)

is called the Likelihood Function and is a measure of how likely the parameter x is,
given the observation z.
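The closed-form estimate is straightforward to compute; a minimal sketch with hypothetical numbers:

```python
import numpy as np

def mle_linear_gaussian(H, R, z):
    """Weighted least-squares / ML estimate x* = (H^T R^-1 H)^-1 H^T R^-1 z."""
    Rinv = np.linalg.inv(R)
    A = H.T @ Rinv @ H            # positive definite when H has full column rank
    return np.linalg.solve(A, H.T @ Rinv @ z)

# example: two noisy measurements of a scalar x; the noisier one gets less weight
H = np.array([[1.0], [1.0]])
R = np.diag([1.0, 4.0])
z = np.array([2.0, 3.0])
print(mle_linear_gaussian(H, R, z))   # -> [2.2]
```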
Table of Content
184
Estimators    SOLO

Bayesian Maximum Likelihood Estimate (Maximum Aposteriori – MAP Estimate)

Consider a gaussian vector x, where x ~ N[x̄(−), P(−)], and the measurement
z = H x + v, where the gaussian noise v ~ N(0, R) is independent of x.

   p_x(x) = 1/[(2π)^{n/2} |P(−)|^{1/2}] exp{ −½ (x − x̄(−))^T P(−)^{-1} (x − x̄(−)) }

   p_{z|x}(z|x) = p_v(z − H x) = 1/[(2π)^{p/2} |R|^{1/2}] exp{ −½ (z − H x)^T R^{-1} (z − H x) }

   p_z(z) = ∫ p_{x,z}(x, z) dx = ∫ p_{z|x}(z|x) p_x(x) dx

p_z(z) is gaussian with

   E(z) = E(H x + v) = H E(x) + E(v) = H x̄(−)

   cov(z) = E{[z − E(z)][z − E(z)]^T} = E{[H(x − x̄(−)) + v][H(x − x̄(−)) + v]^T}
          = H E{(x − x̄(−))(x − x̄(−))^T} H^T + E{v v^T}          (the cross terms vanish)
          = H P(−) H^T + R

   p_z(z) = 1/[(2π)^{p/2} |H P(−) H^T + R|^{1/2}] exp{ −½ [z − H x̄(−)]^T [H P(−) H^T + R]^{-1} [z − H x̄(−)] }

185
Estimators    SOLO

Bayesian Maximum Likelihood Estimate (Maximum Aposteriori Estimate) (continue – 1)

from which

   p_{x|z}(x|z) = p_{z|x}(z|x) p_x(x) / p_z(z)
                = |H P(−) H^T + R|^{1/2} / [(2π)^{n/2} |P(−)|^{1/2} |R|^{1/2}] ·
                  exp{ −½ [ (z − H x)^T R^{-1} (z − H x) + (x − x̄(−))^T P(−)^{-1} (x − x̄(−))
                            − (z − H x̄(−))^T [H P(−) H^T + R]^{-1} (z − H x̄(−)) ] }

186
Estimators    SOLO

Bayesian Maximum Likelihood Estimate (Maximum Aposteriori Estimate) (continue – 2)

Writing  z − H x = [z − H x̄(−)] − H [x − x̄(−)]  and expanding, the exponent becomes

   [x − x̄(−)]^T [P(−)^{-1} + H^T R^{-1} H] [x − x̄(−)]
   − [x − x̄(−)]^T H^T R^{-1} [z − H x̄(−)] − [z − H x̄(−)]^T R^{-1} H [x − x̄(−)]
   + [z − H x̄(−)]^T { R^{-1} − [H P(−) H^T + R]^{-1} } [z − H x̄(−)]

Using the matrix identity

   R^{-1} − [H P(−) H^T + R]^{-1} = R^{-1} H [P(−)^{-1} + H^T R^{-1} H]^{-1} H^T R^{-1}

and defining

   P(+) := [P(−)^{-1} + H^T R^{-1} H]^{-1}

the exponent regroups into a perfect square:

   { [x − x̄(−)] − P(+) H^T R^{-1} [z − H x̄(−)] }^T P(+)^{-1} { [x − x̄(−)] − P(+) H^T R^{-1} [z − H x̄(−)] }

187
Estimators    SOLO

Bayesian Maximum Likelihood Estimate (Maximum Aposteriori Estimate) (continue – 3)

then

   p_{x|z}(x|z) = 1/[(2π)^{n/2} |P(+)|^{1/2}] ·
                  exp{ −½ [x − x̄(−) − P(+) H^T R^{-1}(z − H x̄(−))]^T P(+)^{-1} [x − x̄(−) − P(+) H^T R^{-1}(z − H x̄(−))] }

where:

   P(+) := [P(−)^{-1} + H^T R^{-1} H]^{-1}

Maximizing the aposteriori density, max_x p_{x|z}(x|z), gives the MAP estimate

   x* = x̂(+) := x̄(−) + P(+) H^T R^{-1} [z − H x̄(−)]
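A minimal sketch of this closed-form Gaussian MAP update; the example numbers are hypothetical.

```python
import numpy as np

def map_estimate(x_prior, P_prior, H, R, z):
    """Gaussian MAP update: P(+) = (P(-)^-1 + H^T R^-1 H)^-1,
    x(+) = x(-) + P(+) H^T R^-1 (z - H x(-))."""
    Rinv = np.linalg.inv(R)
    P_post = np.linalg.inv(np.linalg.inv(P_prior) + H.T @ Rinv @ H)
    x_post = x_prior + P_post @ H.T @ Rinv @ (z - H @ x_prior)
    return x_post, P_post

# example: scalar state with prior variance 4, one measurement with variance 1
x_post, P_post = map_estimate(np.array([0.0]), np.array([[4.0]]),
                              np.array([[1.0]]), np.array([[1.0]]), np.array([2.0]))
# x_post -> [1.6], P_post -> [[0.8]]  (prior pulled toward the measurement)
```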
Table of Content
SOLO
Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation

A continuous dynamic system is described by:

   dx(t) = f(x(t), t) dt + dw(t),      t ∈ [t_0, t_f]

   x(t)  – n-dimensional state vector
   dw(t) – n-dimensional process noise vector described by the covariance matrix Q:
           E{dw(t)} =: dŵ(t) = 0
           E{[dw(t) − dŵ(t)][dw(τ) − dŵ(τ)]^T} = Q(t) δ(t − τ)
   p[x(t)] – the probability density of the state x at time t

Fred Daum, from Raytheon Company, leads methods to design Nonlinear Filters
starting from the Fokker-Planck Equation.

The time evolution of the probability density function is described by the
Fokker–Planck equation:

   ∂p[x(t)]/∂t = − ∂/∂x { f[x(t), t] p[x(t)] } + ½ ∂/∂x { Q(t) ∂p[x(t)]/∂x }

where

   ∂/∂x { f[x(t), t] p[x(t)] } = Σ_{i=1}^{n} ∂/∂x_i { f_i[x(t), t] p[x(t)] }

   ∂p[x(t)]/∂x = [ ∂p[x(t)]/∂x_1, ∂p[x(t)]/∂x_2, …, ∂p[x(t)]/∂x_n ]^T

Fred Daum

Return to Stochastic Processes
SOLO
Nonlinear Filters

Assuming system measurements at discrete times t_k given by:

   z_k = h[x(t_k), t_k, v_k],      t_k ∈ [t_0, t_f]

   v_k – m-dimensional measurement noise vector at t_k

We are interested in the probability of the state x at time t given the set of discrete
measurements up to and including time t_k < t:

   p(x, t | Z_k),      Z_k = { z_1, z_2, …, z_k } – set of all measurements up to and including time t_k

Bayes' Rule:

   p(x, t_k | Z_k) = p(x, t_k | z_k, Z_{k−1}) = p(z_k | x, t_k) p(x, t_k | Z_{k−1}) / p(z_k | Z_{k−1})

   p(x, t_k | Z_{k−1}) – probability of x at time t_k given Z_{k−1} (apriori – before measurement z_k)
   p(x, t_k | Z_k)     – probability of x at time t_k given Z_k (aposteriori – after measurement z_k)
   p(z_k | x, t_k)     – probability of measurement z_k given the state x at time t_k (likelihood of measurement)
   p(z_k | Z_{k−1})    – probability of measurement z_k given Z_{k−1} (normalization of the conditional probability)
Nonlinear Filters based on the Fokker-Planck Equation
SOLO
In the Classical Particle Filter solution
the particles are drawn using the apriori
density, which decides their distribution (see
Figure).
After the measurement, the Likelihood of
Measurement is obtained, and nothing
prevents a low density of the previously drawn
particles in the Likelihood region.
This is Particle Degeneracy, which
produces the curse of dimensionality.
Nonlinear Filters
Fred Daum
Nonlinear Filters based on the Fokker-Planck Equation
[Figure: particles drawn to represent the prior density vs. the likelihood of the
measurement – Particle Degeneracy, a cause of the Curse of Dimensionality]
The Particle Filter solutions have implementation problems.
The Number of Particles necessary to
reduce the Filter Error increases with the
System Dimension. Daum gives the
Filter Error as a function of the Number of
Particles, with the System Dimension as a
Parameter.
http://sc.enseeiht.fr/doc/Seminar_Daum_2012_2.pdf
SOLO
Nonlinear Filters    Fred Daum
Nonlinear Filters based on the Fokker-Planck Equation

By taking the natural logarithm of the conditional probability, we get on the
right side a sum of logarithms:

   ln p(x, t_k|Z_k)  =  ln p(x, t_k|Z_{k−1})  +  ln p(z_k|x, t_k)  −  ln p(z_k|Z_{k−1})
     (aposteriori)        (apriori)              (likelihood)         (normalization)

The homotopy

   ln p(x, λ)  =  ln g(x)  +  λ ln h(x)  −  ln K(λ)
   (aposteriori)  (apriori)   (likelihood)   (normalization)

[Figure: flow of the density from the apriori p.d.f. to the aposteriori p.d.f., and the
induced flow of particles for Bayes' Rule – sample from the apriori density, flow the
particles, obtain samples from the aposteriori density]

To obtain the aposteriori probability p(x, t_k|Z_k) from the apriori probability p(x, t_k|Z_{k−1}) and the
likelihood p(z_k|x, t_k), Daum uses a homotopy procedure (see next slide) by choosing a homotopy
continuous parameter λ ∈ [0, 1]. He searches for a function f(x, λ) (not related to the
filtered system) that describes the flow of the particles and is associated with p(x, t_k|Z_k):

   dx/dλ = f(x, λ) + Q(x, λ) dw/dλ          Particle Flow Equation

   Q(x, λ) – Noise Spectrum to be defined

Since p(x, λ) is the p.d.f. associated with a system defined by f(x, λ), we have the
Fokker-Planck Equation:

   ∂p(x, λ)/∂λ = − ∂/∂x { f(x, λ) p(x, λ) } + ½ ∂/∂x { Q(x, λ) ∂p(x, λ)/∂x }

Here we describe Daum's proposed methods, called Particle Flow Filters.
01/13/15 192
Homotopy
In topology, two continuous functions from one topological space to
another are called homotopic (Greek ὁμός (homós) = same, similar,
and τόπος (tópos) = place) if one can be "continuously deformed" into
the other, such a deformation being called a homotopy between the two
functions. An outstanding use of homotopy is the definition of
homotopy groups and cohomotopy groups, important invariants in
algebraic topology.

A Homotopy of a Coffee
Cup into a Doughnut

Formal definition

Formally, a homotopy between two continuous functions f and g from a
topological space X to a topological space Y is defined to be a continuous
function H : X × [0,1] → Y from the product of the space X with the unit
interval [0,1] to Y such that, if x ∈ X, then H(x,0) = f(x) and H(x,1) = g(x).
If we think of the second parameter of H as time, then H describes a
continuous deformation of f into g: at time 0 we have the function f and
at time 1 we have the function g.

An alternative notation is to say that a homotopy between two
continuous functions f, g : X → Y is a family of continuous functions h_t :
X → Y for t ∈ [0,1] such that h_0 = f and h_1 = g, and the map t ↦ h_t is
continuous from [0,1] to the space of all continuous functions X → Y.
The two versions coincide by setting h_t(x) = H(x,t).
SOLO
SOLO Nonlinear Filters
Fred Daum
Fokker-Planck Equation:

   ∂p(x, λ)/∂λ = − ∂/∂x { f(x, λ) p(x, λ) } + ½ ∂/∂x { Q(x, λ) ∂p(x, λ)/∂x }

Definition of p(x, λ):

   ln p(x, λ) = ln g(x) + λ ln h(x) − ln K(λ)

We have:

   ∂ln p(x, λ)/∂λ = ln h(x) − d ln K(λ)/dλ = [ ∂p(x, λ)/∂λ ] / p(x, λ)

so the Fokker-Planck equation becomes a Partial Differential Equation for f given p:

   p(x, λ) [ ln h(x) − d ln K(λ)/dλ ] = − ∂/∂x { f(x, λ) p(x, λ) } + ½ ∂/∂x { Q(x, λ) ∂p(x, λ)/∂x }

Dividing by p(x, λ):

   ln h(x) − d ln K(λ)/dλ = − ∂f(x, λ)/∂x − f(x, λ) ∂ln p(x, λ)/∂x + [1/p(x, λ)] ½ ∂/∂x { Q(x, λ) ∂p(x, λ)/∂x }

Nonlinear Filters based on the Fokker-Planck Equation
SOLO Nonlinear Filters
Fred Daum

Differentiate this Equation as a function of x:

   [∂ln h(x)/∂x]^T = − ∂²f(x, λ)/∂x² − [∂f(x, λ)/∂x] ∂ln p(x, λ)/∂x − [∂²ln p(x, λ)/∂x²] f(x, λ)
                     + ∂/∂x { [1/p(x, λ)] ½ ∂/∂x [ Q(x, λ) ∂p(x, λ)/∂x ] }

One option to simplify the problem is to choose Q(x, λ) such that:

   − ∂²f(x, λ)/∂x² − [∂f(x, λ)/∂x] ∂ln p(x, λ)/∂x + ∂/∂x { [1/p(x, λ)] ½ ∂/∂x [ Q(x, λ) ∂p(x, λ)/∂x ] } = 0

We obtain

   [∂ln h(x)/∂x]^T = − [∂²ln p(x, λ)/∂x²] f(x, λ)

   f(x, λ) = − [∂²ln p(x, λ)/∂x²]^{-1} [∂ln h(x)/∂x]^T
Nonlinear Filters based on the Fokker-Planck Equation
SOLO Nonlinear Filters
Fred Daum
The second option to simplify the problem is to choose  Q(x, λ) = 0:

   dx/dλ = f(x, λ)          Particle Flow Equation

   ∂p(x, λ)/∂λ = − ∂/∂x { f(x, λ) p(x, λ) }          Fokker-Planck Equation

   ln p(x, λ) = ln g(x) + λ ln h(x) − ln K(λ)          Definition of p(x, λ)

   ∂ln p(x, λ)/∂λ = ln h(x) − d ln K(λ)/dλ = − [1/p(x, λ)] ∂/∂x { f(x, λ) p(x, λ) }

Define

   η(x, λ) := − [ ln h(x) − d ln K(λ)/dλ ] p(x, λ)          (known)

We obtain the P.D.E. for f given p:

   ∂/∂x { p(x, λ) f(x, λ) } = η(x, λ)

Nonlinear Filters based on the Fokker-Planck Equation
SOLO Nonlinear Filters
Fred Daum

With  q := p f  (f the unknown function; p and η known at the random points, i.e. at the particles):

   ∂q/∂x = ∂q_1/∂x_1 + ∂q_2/∂x_2 + … + ∂q_d/∂x_d = η(x, λ)          (a divergence equation)

1. Linear PDE in the unknown f or q.
2. Constant coefficient PDE in q.
3. First Order PDE.
4. Highly undetermined PDE.
5. Same as the Gauss divergence law in Maxwell's Equations.
6. Same as Euler's Equation in Fluid Dynamics.
7. Existence of solutions if and only if the integral of η is zero.

Exact Flow Solutions for g & h Gaussian Densities:

   f(x, λ) = A(λ) x + b(λ)

   A(λ) = −½ P H^T (λ H P H^T + R)^{-1} H

   b(λ) = (I + 2λ A(λ)) [ (I + λ A(λ)) P H^T R^{-1} z + A(λ) x̄ ]

Automatically stable under very mild conditions & extremely fast
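A hedged sketch of the exact (Gaussian prior, Gaussian likelihood) particle flow above, integrating dx/dλ = A(λ)x + b(λ) with simple Euler steps over λ ∈ [0, 1]; the step count and the Euler scheme are arbitrary illustrative choices, not part of Daum's formulation.

```python
import numpy as np

def exact_flow_update(particles, x_bar, P, H, R, z, n_steps=20):
    """Move particles from the prior N(x_bar, P) to the posterior for z = H x + v,
    v ~ N(0, R), using the closed-form Daum-Huang flow A(lam), b(lam) above."""
    lambdas = np.linspace(0.0, 1.0, n_steps + 1)
    Rinv = np.linalg.inv(R)
    I = np.eye(len(x_bar))
    x = particles.copy()                          # shape (Ns, n)
    for lo, hi in zip(lambdas[:-1], lambdas[1:]):
        lam, dlam = lo, hi - lo
        A = -0.5 * P @ H.T @ np.linalg.inv(lam * H @ P @ H.T + R) @ H
        b = (I + 2 * lam * A) @ ((I + lam * A) @ P @ H.T @ Rinv @ z + A @ x_bar)
        x = x + dlam * (x @ A.T + b)              # Euler step for every particle
    return x
```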
Fred Daum
SOLO Nonlinear Filters
F. Daum, J. Huang, Particle Flow for Nonlinear Filters, Bayesian Decision and
Transport, 7 April 2014
SOLO Nonlinear Filters
Fred Daum
Table of Content
199
Recursive Bayesian Estimation
References:
SOLO
1. Sage, A.P., & Melsa, J.L., “Estimation Theory with Applications to Communications
and Control”, McGraw Hill, 1971
2. Gordon, N.J., Salmond, D.J., Smith, A.M.F., “Novel Approach to
Nonlinear/Non- Gaussian Bayesian State Estimation”, IEE Proceedings Radar
and Signal Processing, vol. 140, No. 2, April 1993, pp. 107 - 113
7. Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques
Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation,
January 2005
5. Arulampalam,S., Maskell,S., Gordon,N., Clapp,T., “A Tutorial on Particle
Filters for On-line Non-linear/Non-Gaussian Bayesian Tracking”, IEEE
Transactions on Signal Processing, Vol. 50, No. 2, February 2002
6. Ristic,B., Arulampalam,S., Gordon,N., “Beyond the Kalman Filter Particle
Filters for Tracking Applications”, Artech House, 2004
4. Karlsson, R., “Simulation Based Methods for Target Tracking”, Department of
Electrical Engineering Linköpings Universitet, 2002
3. Doucet,A., de Freitas,N., Gordon,N., Ed. “Sequential Monte Carlo Methods in
Practice”, Springer, 2001
200
Recursive Bayesian Estimation
References (continue – 1):
SOLO
Fred Daum, “Particle Flow for Nonlinear Filters”, 19 July 2012
http://sc.enseeiht.fr/doc/Seminar_Daum_2012_2.pdf
https://www.ll.mit.edu/asap/asap_06/pdf/Papers/23_Daum_Pa.pdf
Fred Daum, Misha Krichman, “Non-Particle Filters”,
F. Daum, J. Huang, Particle Flow for Nonlinear Filters, Bayesian Decision and
Transport, 7 April 2014
http://meeting.xidian.edu.cn/workshop/miis2014/uploads/files/July-5th-930am_Fred
%20Daum_Particle%20flow%20for%20nonliner%20filters,%20Bayesuan
%20Decisions%20and%20Transport%20.pdf
http://www.dsi.unifi.it/users/chisci/idfric/Nonlinear_filtering_Chen.pdf
Zhe Chen, “Bayesian Filtering From Kalman Filters to Particle Filters,
and Beyond”, 18.05.06,
Table of Content
201January 13, 2015 201
SOLO
Technion
Israeli Institute of Technology
1964 – 1968 BSc EE
1968 – 1971 MSc EE
Israeli Air Force
1970 – 1974
RAFAEL
Israeli Armament Development Authority
1974 – 2013
Stanford University
1983 – 1986 PhD AA
202
“Proceedings of the IEEE”, March 2004, Special Issue on:
“Sequential State Estimation: From Kalman Filters to Particle Filters”
Julier, S. J. and Uhlmann, J. K., “Unscented Filtering and Nonlinear Estimation”,
pp. 401 – 422
Recursive Bayesian Estimation
203
SOLO
Neil GordonM. Sanjev
Arulampalam
Tim ClappSimon MaskellNando de FreitasArnaud Doucet
Branko Ristic
Genshiro Kitagawa Christophe Andrieu
Dan Crişan Fred Daum
Recursive Bayesian Estimation
204
Markov Chain Monte Carlo (MCMC)SOLO
Some MCMC Developments Related to Vision
Nicholas Constantine Metropolis
( 1915 – 1999)
Metropolis 1946
Hastings 1970
Heat bath
Miller, Grenader, 1994
Green 1995
DDMCMC 2001 - 2005
Waltz 1972, (labeling)
Rosenfeld, Hummel, Zucker
1976 (relaxation)
Geman brothers 1984,
(Gibbs sampler)
Kirkpatrick 1983
Swendsen-Wang 1987
(clustering)
Swendsen-Wang Cut 2003
205
Markov Chain Monte Carlo (MCMC)SOLO
A Brief History of MCMC
Nicholas Constantine Metropolis
( 1915 – 1999)
1942 – 1946: Real use of Monte Carlo started during WWII
- study of the atomic bomb (neutron diffusion in fissile material)
1948: Fermi, Metropolis, Ulam obtained Monte Carlo estimates for
the eigenvalues of the Schrödinger equations.
1950: Formulation of the basic construction of MCMC, e.g. the Metropolis method
- application to statistical physics models, such as the Ising model
1960 - 80: using MCMC to study phase transitions; material growth/defects,
macro molecules (polymers), etc.
1980s: Gibbs sampler (Geman brothers), Simulated annealing, data augmentation,
Swendsen-Wang, etc.
global optimization; image and speech; quantum field theory
1990s: Applications in genetics; computational biology.
206
Rao – Blackwell TheoremSOLO
Rao-Blackwell Theorem provides a process by which a possible improvement in
efficiency of an estimator can be obtained by taking its conditional expectation with
respect to a sufficient statistic.
The result on one parameter appeared in Rao (1945)
and in Blackwell (1947). Lehmann and Scheffé (1950)
called the result the Rao-Blackwell Theorem (RBT),
and the process is described as Rao-Blackwellization
(RB) by Berkson (1955). In computational terminology
it is called the Rao-Blackwellized Filter (RBF).
Calyampudi Radhakrishna Rao
and David Blackwell.
The Rao – Blackwell Theorem states that if g (x) is
any kind of estimator of a parameter θ, then the
conditional expectation of g (x) given T (x), where
T (x) is a sufficient statistic, is typically a better
estimator of θ, and is never worse.
Let x = (x1,…,xn) be a random sample from a probability distribution p(x, θ) where
θ = (θ1,…, θq) is an unknown vector parameter. Consider an estimator
g(x) = (g1(x),…,gq(x)) of θ and the q×q mean square and product matrix C(g):

   C(g) = (c_ij) = ( E{ [g_i(x) − θ_i] [g_j(x) − θ_j] } )

Let S be a sufficient statistic, which may be vector valued, such that the conditional expectation,
E{g|S} = T(x), is independent of θ. A general version of Rao – Blackwell is:

   C(g) − C(T) is nonnegative definite
207
SOLO Non-Gaussian Distribution Approximation
208
SOLO Non-Gaussian Distribution Approximation
http://www.dsi.unifi.it/users/chisci/idfric/Nonlinear_filtering_Chen.pdf

3 recursive bayesian estimation

  • 1.
    1 Recursive Bayesian Estimation SOLO HERMELIN Updated:22.02.09 11.01.14 http://www.solohermelin.com
  • 2.
    2 SOLO Table of ContentRecursive Bayesian Estimation Review of Probability Conditional Probability Total Probability Theorem Conditional Probability - Bayes Formula Statistical Independent Events Expected Value or Mathematical Expectation Variance and Central Moments Characteristic Function and Moment-Generating Function Probability Distribution and Probability Density Functions (Examples) Normal (Gaussian) Distribution Existence Theorems 1 & 2 Monte Carlo Method Estimation of the Mean and Variance of a Random Variable Generating Discrete Random Variables Existence Theorem 3 Markov Processes Functions of one Random Variable The Laws of Large Numbers Central Limit Theorem Problem Definition Stochastic Processes
  • 3.
    3 SOLO Table of Content(continue -1) Recursive Bayesian Estimation Bayesian Estimation Introduction Linear Gaussian Markov Systems Closed-Form Solutions of Estimation Kalman Filter Extended Kalman Filter General Bayesian Nonlinear Filters Additive Gaussian Nonlinear Filter Gauss – Hermite Quadrature Approximation Unscented Kalman Filter Monte Carlo Kalman Filter (MCKF) Non-Additive Non-Gaussian Nonlinear Filter Nonlinear Estimation Using Particle Filters Importance Sampling (IS) Sequential Importance Sampling (SIS) Sequential Importance Resampling (SIR) Monte Carlo Particle Filter (MCPF) Bayesian Maximum Likelihood Estimate (Maximum Aposteriori – MAP Estimate)
  • 4.
    4 SOLO Table of Content(continue -2) Recursive Bayesian Estimation References Nonlinear Filters based on the Fokker-Planck Equation
  • 5.
    5 SOLO Recursive BayesianEstimation kx1−kx kz1−kz 0x 1x 2x 1z 2z kZ :11:1 −kZ ( )11, −− kk wxf ( )kk vxh , ( )00 ,wxf ( )11,vxh ( )11,wxf ( )22 ,vxh Since this is a probabilistic problem, we start with a remainder of Probability Theory A discrete nonlinear system is defined by ( ) ( )kkk kkk vxkhz wxkfx ,, ,,1 11 = −= −− State vector dynamics Measurements kk vw ,1− State and Measurement Noise Vectors, respectively Problem Definition: Estimate the hidden States of a Non-linear Dynamic Stochastic System from Noisy Measurements . kx kz Table of Content
  • 6.
    6 SOLO Pr (A) isthe probability of the event A if S nAAAA ∪∪∪= 21 1A 2A nA jiOAA ji ≠∀/=∩ ( ) 0Pr ≥A(1) (3) If jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 ( ) 1Pr =S(2) then ( ) ( ) ( ) ( )nAAAA PrPrPrPr 21 +++=  Probability Axiomatic Definition Probability Geometric Definition Assume that the probability of an event in a geometric region A is defined as the ratio between A surface to surface of S. ( ) ( ) ( )SSurface ASurface A =Pr ( ) 0Pr ≥A(1) ( ) 1Pr =S(2) (3) If jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 then ( ) ( ) ( ) ( )nAAAA PrPrPrPr 21 +++=  S A Review of Probability A more detailed explanation of the subject is given in the “Probability” Presentation
  • 7.
    7 SOLO From those definitionwe can prove the following:( ) 0=/OP(1’) Proof: OOSandOSS /=/∩/∪= ( ) ( ) ( ) ( ) ( ) 0PrPrPrPr 3 =/⇒/+=⇒ OOSS ( ) ( )APAP −= 1(2’) Proof: OAAandAAS /=∩∪= ( ) ( ) ( ) ( ) ( ) ( ) ( )AAAAS Pr1PrPrPr1Pr 32 −=⇒+==⇒ ( ) 1Pr0 ≤≤ A(3’) Proof: ( ) ( ) ( ) ( ) ( ) 1Pr0Pr1Pr 1'2 ≤⇒≥−= AAA ( ) ( )APr0 1 ≤ ( ) 0Pr ≥A(1) ( ) 1Pr =S(2) (3) If jiOAAandAAAA jin ≠∀/=∩∪∪∪= 21 then ( ) ( ) ( ) ( )n AAAA PrPrPrPr 21 +++=  ( ) ( )AABAIf PrPr ≤⇒⊂(4’) Proof: ( ) ( ) ( ) ( ) ( ) ( )BAAABB PrPr0PrPrPr 00 3 ≤⇒≥+−= ≥≥  ( ) ( ) OAABandAABB /=∩−∪−= ( ) ( ) ( ) ( )BABABA ∩−+=∪ PrPrPrPr(5’) Proof: ( ) ( ) ( ) ( ) ( ) ( ) OABBAandABBAB OABAandABABA /=−∩∩−∪∩= /=−∩−∪=∪ ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )BABABA ABBAB ABABA ∩−+=∪⇒     −+∩= −+=∪ PrPrPrPr PrPrPr PrPrPr 3 3 Table of Content Review of Probability
  • 8.
    8 SOLO Conditional Probability S nAAAAααα ∪∪∪= 21  1αA jiOAA ji ≠∀/=∩ 1αβA mAAAB βββ ∪∪∪= 212αA 2αβA 1βA 2βA  Given two events A and B decomposed in elementary events jiOAAandAAAAA ji n i in ≠∀/=∩=∪∪∪= = αααααα  1 21 lkOAAandAAAAB lk m k km ≠∀/=∩=∪∪∪= = ββββββ  1 21 jiOAAandAAABA jir ≠∀/=∩∪∪∪=∩ αβαβαβαβαβ 21 ( ) ( ) ( ) ( )n AAAA ααα PrPrPrPr 21 +++=  ( ) ( ) ( ) ( )mAAAB βββ PrPrPrPr 21 +++=  ( ) ( ) ( ) ( ) nmrAAABA r ,PrPrPrPr 21 ≤+++=∩ βαβαβα  We want to find the probability of A event under the condition that the event B had occurred designed as P (A|B) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )B BA AAA AAA BA m r Pr Pr PrPrPr PrPrPr |Pr 21 21 ∩ = +++ +++ = βββ βαβαβα   Review of Probability
  • 9.
    9 SOLO Conditional Probability SnAAAA ααα ∪∪∪= 21  1αA jiOAA ji ≠∀/=∩ 1αβA mAAAB βββ ∪∪∪= 212αA 2αβA 1βA 2βA  If the events A and B are statistical independent, that the fact that B occurred will not affect the probability of A to occur. ( ) ( ) ( )B BA BA Pr Pr |Pr ∩ = ( ) ( ) ( )A BA AB Pr Pr |Pr ∩ = ( ) ( )ABA Pr|Pr = ( ) ( ) ( ) ( ) ( ) ( ) ( )BAAABBBABA PrPrPr|PrPr|PrPr ⋅=⋅=⋅=∩ Definition: n events Ai i = 1,2,…n are statistical independent if: ( ) nrAA r i i r i i ,,2PrPr 11  =∀=      ∏== Table of Content Review of Probability
  • 10.
    10 SOLO Conditional Probability -Bayes Formula Using the relation: ( ) ( ) ( ) ( ) ( )llll AABBBABA ββββ Pr|PrPr|PrPr ⋅=⋅=∩ ( ) ( ) ( ) klOBABABAB lk m k k , 1 ∀/=∩∩∩∩= = βββ ( ) ( )∑ = ∩= m k k BAB 1 PrPr β we obtain: ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∑= ⋅ ⋅ = ⋅ = m k kk llll l AAB AAB B AAB BA 1 Pr|Pr Pr|Pr Pr Pr|Pr |Pr ββ ββββ β Bayes Formula Thomas Bayes 1702 - 1761 Table of Content Review of Probability
  • 11.
    11 SOLO Total Probability Theorem Tableof Content jiOAAandSAAA jin ≠∀/=∩=∪∪∪ 21If we say that the set space S is decomposed in exhaustive and incompatible (exclusive) sets. The Total Probability Theorem states that for any event B, its probability can be decomposed in terms of conditional probability as follows: ( ) ( ) ( ) ( )∑∑ == == n i i n i i BPBABAB 11 |Pr,PrPr Using the relation: ( ) ( ) ( ) ( ) ( )llll AABBBABA Pr|PrPr|PrPr ⋅=⋅=∩ ( ) ( ) ( ) klOBABABAB lk n k k , 1 ∀/=∩∩∩∩= =  ( ) ( )∑= ∩= n k k BAB 1 PrPr For any event B we obtain: Review of Probability
  • 12.
    12 SOLO Statistical Independent Events () ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∏∑∏∑∏∑ ∑∑∑ = −       ≠≠ =       ≠ =       = = −       ≠≠       ≠       == −+−+−=       −+−+−=      n i i n n kji kji i i n ji ji i i n i i tIndependen lStatisticaA n i i n n kji kji kji n ji ji ji n i i n i i AAAA AAAAAAAA i 1 1 3 ,. 3 1 2 . 2 1 1 1 1 1 3 ,. 2 . 1 11 Pr1PrPrPr Pr1PrPrPrPr    From Theorem of Addition Therefore ( )[ ]∏== −=      − n i i tIndependen lStatisticaA n i i AA i 11 Pr1Pr1  ( )[ ]∏== −−=      n i i tIndependen lStatisticaA n i i AA i 11 Pr11Pr  Since OAASAA n i i n i i n i i n i i /=               =               ====   1111 &         =      − ==  n i i n i i AA 11 PrPr1 ( )∏== =      n i i tIndependen lStatisticaA n i i AA i 11 PrPr  If the n events Ai i = 1,2,…n are statistical independent than are also statistical independentiA ( )∏= = n i iA 1 Pr      = =  n i i MorganDe A 1 Pr ( )[ ]∏= −= n i i tIndependen lStatisticaA A i 1 Pr1 ( ) nrAA r i i r i i ,,2PrPr 11  =∀=      ∏== Table of Content Review of Probability
  • 13.
    13 SOLO Review ofProbability Expected Value or Mathematical Expectation Given a Probability Density Function p (x) we define the Expected Value For a Continuous Random Variable: ( ) ( )∫ +∞ ∞− = dxxpxxE X: For a Discrete Random Variable: ( ) ( )∑= k kXk xpxxE : For a general function g (x) of the Random Variable x: ( )[ ] ( ) ( )∫ +∞ ∞− = dxxpxgxgE X: ( )xp x 0 ∞+∞− 0.1 ( )xE ( ) ( ) ( )∫ ∫ ∞+ ∞− +∞ ∞− = dxxp dxxpx xE X X : The Expected Value is the center of surface enclosed between the Probability Density Function and x axis. Table of Content
  • 14.
    14 SOLO Review ofProbability Variance Given a Probability Density Functions p (x) we define the Variance ( ) ( )[ ]{ } ( ) ( )[ ] ( ) ( )22222 2: xExExExExxExExExVar −=+−=−= Central Moment ( ) { }k k xEx =:'µ Given a Probability Density Functions p (x) we define the Central Moment of order k about the origin ( ) ( )[ ]{ } ( ) ( )∑= −− −      =−= k j jk j jkk k xE j k xExEx 0 '1: µµ Given a Probability Density Functions p (x) we define the Central Moment of order k about the Mean E (x) Table of Content
  • 15.
    15 SOLO Review ofProbability Moments Normal Distribution ( ) ( ) ( )[ ] σπ σ σ 2 2/exp ; 22 x xpX − = [ ] ( )    −⋅ = oddnfor evennforn xE n n 0 131 σ [ ] ( )      += =−⋅ = + 12!2 2 2131 12 knfork knforn xE kk n n σ π σ Proof: Start from: and differentiate k time with respect to a( ) 0exp 2 >=−∫ ∞ ∞− a a dxxa π Substitute a = 1/(2σ2 ) to obtain E [xn ] ( ) ( ) 0 2 1231 exp 12 22 > −⋅ =− + ∞ ∞− ∫ a a k dxxax kk k π [ ] ( ) ( )[ ] ( ) ( )[ ] ( ) ( ) 12 ! 0 122/ 0 222221212 !2 2 exp 2 22 2/exp 2 2 2/exp 2 1 2 + ∞+ = ∞∞ ∞− ++ =−= −=−= ∫ ∫∫ kk k k k xy kkk kdyyy xdxxxdxxxxE σ πσ σ π σ σπ σ σπ σ    Now let compute: [ ] [ ]( )2244 33 xExE == σ Chi-square
  • 16.
    16 SOLO Review ofProbability Functions of one Random Variable Let y = g (x) a given function of the random variable x defined o the domain Ω, with probability distribution pX (x). We want to find pY (y). Fundamental Theorem Assume x1, x2, …, xn all the solutions of the equation ( ) ( ) ( )n xgxgxgy ==== 21 ( ) ( ) ( ) ( ) ( ) ( ) ( )n nXXX Y xg xp xg xp xg xp yp ''' 2 2 1 1 +++=  ( ) ( ) xd xgd xg =:' Proof ( ) ( ) ( ) ( ) ( ) ( )∑∑∑ === ==±≤≤=+≤≤= n i i iX n i iiX n i iiiY yd xg xp xdxpxdxxxydyYyydyp 111 ' PrPr: q.e.d.
  • 17.
    17 SOLO Review ofProbability Functions of one Random Variable (continue – 1) Example 1 bxay += ( )       − = a by p a yp XY 1 Example 2 x a y = ( )       = y a p y a yp XY 2 Example 3 2 xay = ( ) ( )yU a y p a y p ya yp XXY                 −+         = 2 1 Example 4 xy = ( ) ( ) ( )[ ] ( )yUypypyp XXY −+= Table of Content
  • 18.
    18 SOLO Review ofProbability Characteristic Function and Moment-Generating Function Given a Probability Density Functions pX (x) we define the Characteristic Function or Moment Generating Function ( ) ( )[ ] ( ) ( ) ( ) ( ) ( ) ( )     = ==Φ ∑ ∫∫ +∞ ∞− +∞ ∞− x X XX X discretexxpxj continuousxxPdxjdxxpxj xjE ω ωω ωω exp expexp exp: This is in fact the complex conjugate of the Fourier Transfer of the Probability Density Function. This function is always defined since the sufficient condition of the existence of a Fourier Transfer : Given the Characteristic Function we can find the Probability Density Functions pX (x) using the Inverse Fourier Transfer: ( ) ( ) ( ) ∞<== ∫∫ +∞ ∞− ≥+∞ ∞− 1 0 dxxpdxxp X xp X ( ) ( ) ( )∫ +∞ ∞− Φ−= ωωω π dxjxp XX exp 2 1 is always fulfilled.
  • 19.
19 SOLO Review of Probability
Properties of the Moment-Generating Function
$\Phi_X(\omega) = \int_{-\infty}^{+\infty}\exp(j\omega x)\,p_X(x)\,dx \quad\Rightarrow\quad \Phi_X(\omega)\big|_{\omega=0} = \int_{-\infty}^{+\infty} p_X(x)\,dx = 1$
$\dfrac{d\,\Phi_X(\omega)}{d\,\omega} = \int_{-\infty}^{+\infty}(j x)\exp(j\omega x)\,p_X(x)\,dx \quad\Rightarrow\quad \dfrac{d\,\Phi_X(\omega)}{d\,\omega}\Big|_{\omega=0} = j\int_{-\infty}^{+\infty} x\,p_X(x)\,dx = j\,E(x)$
$\dfrac{d^2\Phi_X(\omega)}{d\,\omega^2} = \int_{-\infty}^{+\infty}(j x)^2\exp(j\omega x)\,p_X(x)\,dx \quad\Rightarrow\quad \dfrac{d^2\Phi_X(\omega)}{d\,\omega^2}\Big|_{\omega=0} = (j)^2\int_{-\infty}^{+\infty} x^2\,p_X(x)\,dx = (j)^2 E(x^2)$
⋮
$\dfrac{d^n\Phi_X(\omega)}{d\,\omega^n} = \int_{-\infty}^{+\infty}(j x)^n\exp(j\omega x)\,p_X(x)\,dx \quad\Rightarrow\quad \dfrac{d^n\Phi_X(\omega)}{d\,\omega^n}\Big|_{\omega=0} = (j)^n\int_{-\infty}^{+\infty} x^n\,p_X(x)\,dx = (j)^n E(x^n)$
This is the reason why Φ_X(ω) is also called the Moment-Generating Function.
20 SOLO Review of Probability
Properties of the Moment-Generating Function
Developing Φ_X(ω) in a Taylor series about ω = 0:
$\Phi_X(\omega) = \Phi_X(0) + \dfrac{d\Phi_X}{d\omega}\Big|_{0}\dfrac{\omega}{1!} + \dfrac{d^2\Phi_X}{d\omega^2}\Big|_{0}\dfrac{\omega^2}{2!} + \cdots + \dfrac{d^n\Phi_X}{d\omega^n}\Big|_{0}\dfrac{\omega^n}{n!} + \cdots = 1 + j\,E(x)\dfrac{\omega}{1!} + (j)^2 E(x^2)\dfrac{\omega^2}{2!} + \cdots + (j)^n E(x^n)\dfrac{\omega^n}{n!} + \cdots$
where $\Phi_X(\omega) = \int_{-\infty}^{+\infty}\exp(j\omega x)\,p_X(x)\,dx$.
21 SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(1) Binomial (Bernoulli): $p(k,n) = \dfrac{n!}{k!\,(n-k)!}\,p^k(1-p)^{n-k} = \binom{n}{k} p^k(1-p)^{n-k}$
(2) Poisson's Distribution: $p(k,k_0) \approx \dfrac{k_0^{\,k}}{k!}\exp(-k_0)$
(3) Normal (Gaussian): $p(x;\mu,\sigma) = \dfrac{\exp[-(x-\mu)^2/(2\sigma^2)]}{\sqrt{2\pi}\,\sigma}$
(4) Laplacian Distribution: $p(x;\mu,b) = \dfrac{1}{2b}\exp\!\left(-\dfrac{|x-\mu|}{b}\right)$
(Figure: the Binomial probabilities P(k,n) plotted against k.)
22 SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(5) Gamma Distribution: $p(x;k,\theta) = \begin{cases} x^{k-1}\dfrac{\exp(-x/\theta)}{\Gamma(k)\,\theta^k} & x \ge 0\\ 0 & x < 0\end{cases}$
(6) Beta Distribution: $p(x;\alpha,\beta) = \dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du} = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$
(7) Cauchy Distribution: $p(x;x_0,\gamma) = \dfrac{1}{\pi}\left[\dfrac{\gamma}{(x-x_0)^2 + \gamma^2}\right]$
23 SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(8) Exponential Distribution: $p(x;\lambda) = \begin{cases}\lambda\exp(-\lambda x) & x \ge 0\\ 0 & x < 0\end{cases}$
(9) Chi-square Distribution: $p(x;k) = \begin{cases}\dfrac{x^{k/2-1}\exp(-x/2)}{\Gamma(k/2)\,2^{k/2}} & x \ge 0\\ 0 & x < 0\end{cases}$, where Γ is the gamma function $\Gamma(a) = \int_0^{\infty} t^{a-1}\exp(-t)\,dt$
(10) Student's t-Distribution: $p(x;\nu) = \dfrac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + x^2/\nu\right)^{-(\nu+1)/2}$
24 SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(11) Uniform Distribution (Continuous): $p(x;a,b) = \begin{cases}\dfrac{1}{b-a} & a \le x \le b\\ 0 & x < a,\ x > b\end{cases}$
(12) Rayleigh Distribution: $p(x;\sigma) = \dfrac{x}{\sigma^2}\exp\!\left(-\dfrac{x^2}{2\sigma^2}\right)$
(13) Rice Distribution: $p(x;v,\sigma) = \dfrac{x}{\sigma^2}\exp\!\left(-\dfrac{x^2+v^2}{2\sigma^2}\right)I_0\!\left(\dfrac{x\,v}{\sigma^2}\right)$
25 SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(14) Weibull Distribution: $p(x;\gamma,\mu,\alpha) = \begin{cases}\dfrac{\gamma}{\alpha}\left(\dfrac{x-\mu}{\alpha}\right)^{\gamma-1}\exp\!\left[-\left(\dfrac{x-\mu}{\alpha}\right)^{\gamma}\right] & x \ge \mu,\ \gamma,\alpha > 0\\ 0 & x < \mu\end{cases}$
Table of Content
26 SOLO Review of Probability
Normal (Gaussian) Distribution — Karl Friedrich Gauss, 1777–1855
Probability Density Function: $p(x;\mu,\sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right) =: \mathcal{N}(x;\mu,\sigma)$
Cumulative Distribution Function: $P(x;\mu,\sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x}\exp\!\left(-\dfrac{(u-\mu)^2}{2\sigma^2}\right)du$
Mean Value: $E(x) = \mu$
Variance: $Var(x) = \sigma^2$
Moment-Generating Function: $\Phi(\omega) = E[\exp(j\omega x)] = \dfrac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty}\exp(j\omega u)\exp\!\left(-\dfrac{(u-\mu)^2}{2\sigma^2}\right)du = \exp\!\left(j\mu\omega - \dfrac{\sigma^2\omega^2}{2}\right)$
27 SOLO Review of Probability
Moments of the Normal Distribution
$p_X(x;0,\sigma) = \dfrac{\exp[-x^2/(2\sigma^2)]}{\sqrt{2\pi}\,\sigma} =: \mathcal{N}(x;0,\sigma)$
$E[x^n] = \begin{cases} 1\cdot 3\cdots(n-1)\,\sigma^n & n \text{ even}\\ 0 & n \text{ odd}\end{cases} \qquad E[|x|^n] = \begin{cases} 1\cdot 3\cdots(n-1)\,\sigma^n & n = 2k\\ \sqrt{2/\pi}\;2^k k!\,\sigma^{2k+1} & n = 2k+1\end{cases}$
Proof: Start from $\int_{-\infty}^{\infty}\exp(-a x^2)\,dx = \sqrt{\pi/a}$, $a>0$, differentiate k times with respect to a to get $\int_{-\infty}^{\infty} x^{2k}\exp(-a x^2)\,dx = \dfrac{1\cdot 3\cdots(2k-1)}{2^k a^k}\sqrt{\pi/a}$, and substitute $a = 1/(2\sigma^2)$ to obtain E[x^n]. For the odd moments of |x|, the substitution $y = x^2/(2\sigma^2)$ gives $E[|x|^{2k+1}] = \sqrt{2/\pi}\,2^k k!\,\sigma^{2k+1}$.
Now let us compute: $E[x^4] = 3\sigma^4 = 3\,(E[x^2])^2$
Chi-square
28 SOLO Review of Probability
Normal (Gaussian) Distribution (continue – 1) — Karl Friedrich Gauss, 1777–1855
A Vector-Valued Gaussian Random Variable $\vec{x}$ has the Probability Density Function
$p(\vec{x};\bar{x},P) = \left|2\pi P\right|^{-1/2}\exp\!\left[-\tfrac{1}{2}(\vec{x}-\bar{x})^T P^{-1}(\vec{x}-\bar{x})\right] =: \mathcal{N}(\vec{x};\bar{x},P)$
where $\bar{x} = E\{\vec{x}\}$ is the Mean Value and $P = E\{(\vec{x}-\bar{x})(\vec{x}-\bar{x})^T\}$ the Covariance Matrix.
If P is diagonal, $P = \mathrm{diag}[\sigma_1^2\ \sigma_2^2\ \cdots\ \sigma_k^2]$, then the components of the random vector are uncorrelated, and
$p(\vec{x};\bar{x},P) = \left|2\pi P\right|^{-1/2}\exp\!\left[-\tfrac{1}{2}\sum_{i=1}^{k}\dfrac{(x_i-\bar{x}_i)^2}{\sigma_i^2}\right] = \prod_{i=1}^{k}\dfrac{1}{\sqrt{2\pi}\,\sigma_i}\exp\!\left(-\dfrac{(x_i-\bar{x}_i)^2}{2\sigma_i^2}\right)$
therefore the components of the random vector are also independent.
29 SOLO Review of Probability
The Laws of Large Numbers
The Law of Large Numbers is a fundamental concept in statistics and probability that describes how the average of a randomly selected sample from a large population is likely to be close to the average of the whole population. There are two laws of large numbers, the Weak Law and the Strong Law.
The Weak Law of Large Numbers
The Weak Law of Large Numbers states that if X₁, X₂, …, X_n, … is an infinite sequence of random variables that have the same expected value μ and variance σ², and are uncorrelated (i.e., the correlation between any two of them is zero), then the sample mean
$\bar{X}_n := (X_1 + \cdots + X_n)/n$
converges in probability (a weak convergence sense) to μ. We have
$\Pr\{|\bar{X}_n - \mu| < \varepsilon\} \to 1 \quad \text{for } n \to \infty$   (converges in probability)
The Strong Law of Large Numbers
The Strong Law of Large Numbers states that if X₁, X₂, …, X_n, … is an infinite sequence of random variables that have the same expected value μ and variance σ², are uncorrelated, and E(|X_i|) < ∞, then
$\Pr\{\lim_{n\to\infty}\bar{X}_n = \mu\} = 1$   (converges almost surely)
i.e. $\bar{X}_n$ converges almost surely to μ.
30 SOLO Review of Probability
The Laws of Large Numbers — Differences between the Weak Law and the Strong Law
The Weak Law states that, for a specified large n, (X₁ + ... + X_n)/n is likely to be near μ. Thus, it leaves open the possibility that |(X₁ + ... + X_n)/n − μ| > ε happens an infinite number of times, although at infrequent intervals.
The Strong Law shows that this almost surely will not occur. In particular, it implies that, with probability 1, for any positive value ε the inequality |(X₁ + ... + X_n)/n − μ| > ε holds only a finite number of times (as opposed to an infinite, but infrequent, number of times).
Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law implies the weak law.
31 SOLO Review of Probability
The Laws of Large Numbers — Proof of the Weak Law of Large Numbers
Given $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$ for all i, and $E[(X_i-\mu)(X_j-\mu)] = 0$ for all $i \ne j$, we have:
$E(\bar{X}_n) = E[(X_1+\cdots+X_n)/n] = n\mu/n = \mu$
$Var(\bar{X}_n) = E\{[\bar{X}_n - E(\bar{X}_n)]^2\} = E\left\{\left[\dfrac{(X_1-\mu)+\cdots+(X_n-\mu)}{n}\right]^2\right\} = \dfrac{E[(X_1-\mu)^2]+\cdots+E[(X_n-\mu)^2]}{n^2} = \dfrac{n\sigma^2}{n^2} = \dfrac{\sigma^2}{n}$
Using Chebyshev's inequality on $\bar{X}_n$ we obtain:
$\Pr\{|\bar{X}_n - \mu| \ge \varepsilon\} \le \dfrac{\sigma^2/n}{\varepsilon^2}$
Using this equation we obtain:
$\Pr\{|\bar{X}_n - \mu| \le \varepsilon\} = 1 - \Pr\{|\bar{X}_n - \mu| > \varepsilon\} \ge 1 - \dfrac{\sigma^2}{n\,\varepsilon^2}$
As n approaches infinity, the right-hand side approaches 1.  q.e.d.
Monte Carlo Integration    Table of Content
32 SOLO Review of Probability
Central Limit Theorem
The first version of this theorem was postulated by the French-born English mathematician Abraham de Moivre (1667–1754) in 1733, using the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This was published in 1756 in "The Doctrine of Chances", 3rd Ed.
This finding was forgotten until 1812, when the French mathematician Pierre-Simon Laplace (1749–1827) recovered it in his work "Théorie Analytique des Probabilités", in which he approximated the binomial distribution with the normal distribution. This is known as the De Moivre – Laplace Theorem.
The present form of the Central Limit Theorem was given by the Russian mathematician Aleksandr Mikhailovich Lyapunov (1857–1918) in 1901.
33 SOLO Review of Probability
Central Limit Theorem (continue – 1)
Let X₁, X₂, …, X_m be a sequence of independent random variables with the same probability distribution function $p_X(x)$. Define the statistical mean:
$\bar{X}_m = \dfrac{X_1 + X_2 + \cdots + X_m}{m}$
We have:
$E(\bar{X}_m) = \dfrac{E(X_1)+E(X_2)+\cdots+E(X_m)}{m} = \mu$
$\sigma_{\bar{X}_m}^2 = Var(\bar{X}_m) = E\{[\bar{X}_m - E(\bar{X}_m)]^2\} = E\left\{\left[\dfrac{(X_1-\mu)+(X_2-\mu)+\cdots+(X_m-\mu)}{m}\right]^2\right\} = \dfrac{m\sigma^2}{m^2} = \dfrac{\sigma^2}{m}$
Define also the new random variable
$Y := \dfrac{\bar{X}_m - E(\bar{X}_m)}{\sigma_{\bar{X}_m}} = \dfrac{(X_1-\mu)+(X_2-\mu)+\cdots+(X_m-\mu)}{\sqrt{m}\,\sigma}$
The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity, regardless of the probability distribution of the random variable, as long as the mean μ and the variance σ² are finite.
34 SOLO Review of Probability
Central Limit Theorem (continue – 2)
Proof: The Characteristic Function of $Y = \dfrac{(X_1-\mu)+\cdots+(X_m-\mu)}{\sqrt{m}\,\sigma}$ is
$\Phi_Y(\omega) = E[\exp(j\omega Y)] = E\left[\exp\!\left(j\omega\dfrac{(X_1-\mu)+\cdots+(X_m-\mu)}{\sqrt{m}\,\sigma}\right)\right] = \prod_{i=1}^{m} E\left[\exp\!\left(j\dfrac{\omega}{\sqrt{m}}\dfrac{X_i-\mu}{\sigma}\right)\right] = \left[\Phi_{\frac{X_i-\mu}{\sigma}}\!\left(\dfrac{\omega}{\sqrt{m}}\right)\right]^m$
Developing $\Phi_{\frac{X_i-\mu}{\sigma}}\!\left(\dfrac{\omega}{\sqrt{m}}\right)$ in a Taylor series:
$\Phi_{\frac{X_i-\mu}{\sigma}}\!\left(\dfrac{\omega}{\sqrt{m}}\right) = 1 + \dfrac{j\omega/\sqrt{m}}{1!}\underbrace{E\!\left(\dfrac{X_i-\mu}{\sigma}\right)}_{0} + \dfrac{(j\omega/\sqrt{m})^2}{2!}\underbrace{E\!\left[\left(\dfrac{X_i-\mu}{\sigma}\right)^2\right]}_{1} + \dfrac{(j\omega/\sqrt{m})^3}{3!}E\!\left[\left(\dfrac{X_i-\mu}{\sigma}\right)^3\right] + \cdots = 1 - \dfrac{\omega^2}{2m} + \mathcal{O}\!\left(\dfrac{\omega^2}{m}\right), \qquad \lim_{m\to\infty}\mathcal{O}\!\left(\dfrac{\omega^2}{m}\right)\Big/\dfrac{\omega^2}{m} = 0$
35 SOLO Review of Probability
Central Limit Theorem (continue – 3)
Proof (continue – 1): The Characteristic Function
$\Phi_Y(\omega) = \left[\Phi_{\frac{X_i-\mu}{\sigma}}\!\left(\dfrac{\omega}{\sqrt{m}}\right)\right]^m = \left[1 - \dfrac{\omega^2}{2m} + \mathcal{O}\!\left(\dfrac{\omega^2}{m}\right)\right]^m \xrightarrow[m\to\infty]{} \exp(-\omega^2/2)$
Therefore
$p_Y(y) = \dfrac{1}{2\pi}\int_{-\infty}^{+\infty}\exp(-j\omega y)\,\Phi_Y(\omega)\,d\omega \xrightarrow[m\to\infty]{} \dfrac{1}{2\pi}\int_{-\infty}^{+\infty}\exp(-j\omega y)\exp(-\omega^2/2)\,d\omega = \dfrac{1}{\sqrt{2\pi}}\exp(-y^2/2)$
The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity (Convergence in Distribution).
Characteristic Function of Normal Distribution    Convergence Concepts    Monte Carlo Integration    Table of Content
38 SOLO Review of Probability
Existence Theorems
Existence Theorem 1: Given a function G(x) such that
$G(-\infty) = 0, \quad G(+\infty) = 1, \quad \lim_{x\to\infty} G(x) = 1$
$0 \le G(x_1) \le G(x_2) \text{ if } x_1 < x_2$   (G(x) is monotonic non-decreasing)
$\lim_{x_n\to x,\ x_n\ge x} G(x_n) = G(x^+) = G(x)$   (G(x) is continuous from the right)
we can find an experiment X and a random variable x, defined on X, such that its distribution function P(x) equals the given function G(x).
Proof of Existence Theorem 1: Assume that the outcome of the experiment X is any real number −∞ < x < +∞. We consider as events all intervals, and the intersections or unions of intervals, on the real axis. To specify the probability of those events we define P(x) = Prob{x ≤ x₁} = G(x₁). From our definition of G(x) it follows that P(x) is a distribution function.
Existence Theorem 2    Existence Theorem 3
39 SOLO Review of Probability
Existence Theorems
Existence Theorem 2: If a function F(x,y) is such that
$F(-\infty, y) = 0, \quad F(x, -\infty) = 0, \quad F(+\infty, +\infty) = 1$
$F(x_2,y_2) - F(x_1,y_2) - F(x_2,y_1) + F(x_1,y_1) \ge 0$
for every x₁ < x₂, y₁ < y₂, then two random variables x and y can be found such that F(x,y) is their joint distribution function.
Proof of Existence Theorem 2: Assume that the outcome of the experiment X is any real number −∞ < x < +∞, and that the outcome of the experiment Y is any real number −∞ < y < +∞. We consider as events all intervals, and the intersections or unions of intervals, on the real axes x and y. To specify the probability of those events we define P(x,y) = Prob{x ≤ x₁, y ≤ y₁} = F(x₁,y₁). From our definition of F(x,y) it follows that P(x,y) is a joint distribution function. The proof is similar to that of Existence Theorem 1.
40 SOLO Review of Probability
Monte Carlo Method
Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used when simulating physical and mathematical systems. Because of their reliance on repeated computation and random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm.
The term Monte Carlo method was coined in the 1940s by the physicists Stanislaw Ulam (1909–1984), Enrico Fermi (1901–1954), John von Neumann (1903–1957), and Nicholas Constantine Metropolis (1915–1999), working on nuclear weapon projects at the Los Alamos National Laboratory (a reference to the Monte Carlo Casino in Monaco, where Ulam's uncle would borrow money to gamble).
41 SOLO Review of Probability
Monte Carlo Approximation
Monte Carlo runs generate a set of random samples that approximate the distribution p(x):
$x^{(L)} \sim p(x)$   (the x^{(L)} are samples generated (drawn) from the distribution p(x))
So, with P samples, expectations with respect to the distribution are approximated by
$\int f(x)\,p(x)\,dx \approx \dfrac{1}{P}\sum_{L=1}^{P} f\!\left(x^{(L)}\right)$
and, in the usual way for Monte Carlo, this gives all the moments etc. of the distribution up to some degree of approximation:
$\mu_1 = E\{x\} = \int x\,p(x)\,dx \approx \dfrac{1}{P}\sum_{L=1}^{P} x^{(L)}$
$\mu_n = E\{(x-\mu_1)^n\} = \int (x-\mu_1)^n\,p(x)\,dx \approx \dfrac{1}{P}\sum_{L=1}^{P}\left(x^{(L)}-\mu_1\right)^n$
Table of Content
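As an illustration of the Monte Carlo approximation above, the following minimal sketch (an illustrative addition, not part of the original presentation) draws samples from an assumed Gaussian p(x) and approximates the mean, the second central moment and E{g(x)} for g(x) = x²; the sample size P and the choice of distribution are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Assumed example: p(x) = N(mu, sigma^2); any distribution we can sample from works.
mu, sigma = 1.0, 2.0
P = 100_000                               # number of Monte Carlo samples
x = rng.normal(mu, sigma, size=P)         # x^(L) ~ p(x), L = 1..P

# Monte Carlo approximations of the moments
mu1_hat = x.mean()                        # E{x}        ~ (1/P) sum x^(L)
mu2_hat = ((x - mu1_hat) ** 2).mean()     # E{(x-mu)^2} ~ central moment of order 2
Eg_hat  = (x ** 2).mean()                 # E{g(x)} with g(x) = x^2

print(mu1_hat, mu2_hat, Eg_hat)           # approx. 1.0, 4.0, 5.0 (= sigma^2 + mu^2)
```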
42 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (Unknown Statistics)
A random variable x may take on any value in the range −∞ to +∞. Based on a sample of k values xᵢ, i = 1,2,…,k, we wish to compute the sample mean $\hat{m}_k$ and sample variance $\hat{\sigma}_k^2$ as estimates of the population mean m and variance σ². Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples:
$E\{x_i\} = E\{x_j\} = m \ \ \forall i,j, \qquad E\{x_i^2\} = E\{x_j^2\} = \sigma^2 + m^2 \ \ \forall i,j, \qquad E\{x_i x_j\} \overset{x_i,x_j\ \text{indep.}}{=} E\{x_i\}\,E\{x_j\} = m^2,\ i \ne j$
Define the estimate of the population mean:
$\hat{m}_k := \dfrac{1}{k}\sum_{i=1}^{k} x_i \qquad\Rightarrow\qquad E\{\hat{m}_k\} = \dfrac{1}{k}\sum_{i=1}^{k} E\{x_i\} = m$   (Unbiased)
Compute:
$E\left\{\dfrac{1}{k}\sum_{i=1}^{k}(x_i - \hat{m}_k)^2\right\} = E\left\{\dfrac{1}{k}\sum_{i=1}^{k}\left(x_i - \dfrac{1}{k}\sum_{j=1}^{k}x_j\right)^2\right\} = \dfrac{k-1}{k}\,\sigma^2$   (Biased)
43 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 1)
We found:
$E\{\hat{m}_k\} = m$   (Unbiased), $\qquad E\left\{\dfrac{1}{k}\sum_{i=1}^{k}(x_i - \hat{m}_k)^2\right\} = \dfrac{k-1}{k}\,\sigma^2$   (Biased)
Therefore, the unbiased estimate of the sample variance of the population is defined as:
$\hat{\sigma}_k^2 := \dfrac{1}{k-1}\sum_{i=1}^{k}(x_i - \hat{m}_k)^2$
since
$E\{\hat{\sigma}_k^2\} = E\left\{\dfrac{1}{k-1}\sum_{i=1}^{k}(x_i - \hat{m}_k)^2\right\} = \sigma^2$   (Unbiased)
Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
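To make the biased/unbiased distinction above concrete, here is a minimal sketch (an illustrative addition; the sample size, the Gaussian population and the number of trials are assumptions for the example only) that computes the sample mean and the unbiased sample variance, and checks the bias empirically.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def sample_mean_and_variance(x):
    """Sample mean and the unbiased sample variance (1/(k-1) normalisation)."""
    k = len(x)
    m_hat = x.sum() / k
    var_hat = ((x - m_hat) ** 2).sum() / (k - 1)   # unbiased: E{var_hat} = sigma^2
    return m_hat, var_hat

# Assumed example: k = 50 samples from N(m = 3, sigma^2 = 4), repeated many times
# to check that the 1/(k-1) estimator is unbiased while the 1/k estimator is not.
k, trials = 50, 20_000
biased, unbiased = [], []
for _ in range(trials):
    x = rng.normal(3.0, 2.0, size=k)
    m_hat, var_hat = sample_mean_and_variance(x)
    unbiased.append(var_hat)
    biased.append(var_hat * (k - 1) / k)           # the 1/k version

print(np.mean(unbiased))   # approx. 4.0 = sigma^2
print(np.mean(biased))     # approx. 4.0 * (k-1)/k ~ 3.92
```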
44 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 2)
A random variable x may take on any value in the range −∞ to +∞. Based on a sample of k values xᵢ, i = 1,2,…,k, we wish to compute the sample mean $\hat{m}_k$ and sample variance $\hat{\sigma}_k^2$ as estimates of the population mean m and variance σ². Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples. Summarizing:
$E\{\hat{m}_k\} = E\left\{\dfrac{1}{k}\sum_{i=1}^{k}x_i\right\} = m, \qquad E\{\hat{\sigma}_k^2\} = E\left\{\dfrac{1}{k-1}\sum_{i=1}^{k}(x_i - \hat{m}_k)^2\right\} = \sigma^2$
45 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 3)
We found: $E\{\hat{m}_k\} = m$ and $E\{\hat{\sigma}_k^2\} = \sigma^2$.
Let us compute the variance of the mean estimate:
$\sigma_{\hat{m}_k}^2 := E\{(\hat{m}_k - m)^2\} = E\left\{\left[\dfrac{1}{k}\sum_{i=1}^{k}(x_i - m)\right]^2\right\} = \dfrac{1}{k^2}\left[\sum_{i=1}^{k}\underbrace{E\{(x_i-m)^2\}}_{\sigma^2} + \sum_{i=1}^{k}\sum_{j\ne i}\underbrace{E\{(x_i-m)(x_j-m)\}}_{0}\right] = \dfrac{k\,\sigma^2}{k^2}$
$\sigma_{\hat{m}_k}^2 := E\{(\hat{m}_k - m)^2\} = \dfrac{\sigma^2}{k}$
46 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 4)
Let us compute the variance of the variance estimate:
$\sigma_{\hat{\sigma}_k^2}^2 := E\{(\hat{\sigma}_k^2 - \sigma^2)^2\}, \qquad \hat{\sigma}_k^2 = \dfrac{1}{k-1}\sum_{i=1}^{k}(x_i - \hat{m}_k)^2 = \dfrac{1}{k-1}\sum_{i=1}^{k}\left[(x_i - m) - (\hat{m}_k - m)\right]^2$
Expanding the square gives terms in $(x_i-m)^2$, $(x_i-m)(\hat{m}_k-m)$ and $(\hat{m}_k-m)^2$:
$\hat{\sigma}_k^2 = \dfrac{1}{k-1}\sum_{i=1}^{k}(x_i-m)^2 - \dfrac{2}{k-1}(\hat{m}_k-m)\sum_{i=1}^{k}(x_i-m) + \dfrac{k}{k-1}(\hat{m}_k-m)^2$
and $\sigma_{\hat{\sigma}_k^2}^2$ is obtained by taking the expectation of the square of $(\hat{\sigma}_k^2 - \sigma^2)$ term by term.
47 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 4)
Since (xᵢ − m), (x_j − m) and (m̂_k − m) are all independent for i ≠ j, the cross terms either vanish or are of order 1/k² and higher. Carrying out the expansion and keeping the dominant terms in 1/k gives:
$\sigma_{\hat{\sigma}_k^2}^2 \approx \dfrac{\mu_4 - \sigma^4}{k}, \qquad \mu_4 := E\{(x_i - m)^4\}$
48 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 5)
We found:
$E\{\hat{m}_k\} = m, \qquad E\{\hat{\sigma}_k^2\} = \sigma^2, \qquad \sigma_{\hat{m}_k}^2 := E\{(\hat{m}_k - m)^2\} = \dfrac{\sigma^2}{k}, \qquad \sigma_{\hat{\sigma}_k^2}^2 := E\{(\hat{\sigma}_k^2 - \sigma^2)^2\} \approx \dfrac{\mu_4 - \sigma^4}{k}$
where $\mu_4 := E\{(x_i - m)^4\}$. Define the Kurtosis of the random variable xᵢ:
$\lambda := \dfrac{\mu_4}{\sigma^4}$
so that
$\sigma_{\hat{\sigma}_k^2}^2 := E\{(\hat{\sigma}_k^2 - \sigma^2)^2\} \approx \dfrac{(\lambda - 1)\,\sigma^4}{k}$
49 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 6)
For large k, according to the Central Limit Theorem, the estimates of the mean and of the variance are approximately Gaussian random variables:
$(\hat{m}_k - m) \sim \mathcal{N}\!\left(0,\ \sigma^2/k\right), \qquad (\hat{\sigma}_k^2 - \sigma^2) \sim \mathcal{N}\!\left(0,\ (\lambda-1)\sigma^4/k\right)$
We want to find a region around $\hat{\sigma}_k^2$ that will contain σ² with a predefined probability φ, as a function of the number of iterations k:
$\mathrm{Prob}\left[0 \le |\hat{\sigma}_k^2 - \sigma^2| \le n_\sigma\,\sigma_{\hat{\sigma}_k^2}\right] = \varphi$
Since $\hat{\sigma}_k^2$ is approximately Gaussian, n_σ is given by solving:
$\dfrac{1}{\sqrt{2\pi}}\int_{-n_\sigma}^{+n_\sigma}\exp\!\left(-\tfrac{1}{2}\zeta^2\right)d\zeta = \varphi$
Cumulative probability φ within n_σ standard deviations of the mean for a Gaussian random variable:
n_σ = 1.000 → φ = 0.6827;  n_σ = 1.645 → φ = 0.9000;  n_σ = 1.960 → φ = 0.9500;  n_σ = 2.576 → φ = 0.9900
This gives:
$-n_\sigma\sqrt{\dfrac{\lambda-1}{k}}\,\sigma^2 \le \hat{\sigma}_k^2 - \sigma^2 \le n_\sigma\sqrt{\dfrac{\lambda-1}{k}}\,\sigma^2 \qquad\Leftrightarrow\qquad \left(1 - n_\sigma\sqrt{\dfrac{\lambda-1}{k}}\right)\sigma^2 \le \hat{\sigma}_k^2 \le \left(1 + n_\sigma\sqrt{\dfrac{\lambda-1}{k}}\right)\sigma^2$
50 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 7)
From
$\mathrm{Prob}\left[0 \le |\hat{\sigma}_k^2 - \sigma^2| \le n_\sigma\,\sigma_{\hat{\sigma}_k^2}\right] = \varphi, \qquad \sigma_{\hat{\sigma}_k^2}^2 = \dfrac{\lambda-1}{k}\,\sigma^4$
we get
$\left(1 - n_\sigma\sqrt{\dfrac{\lambda-1}{k}}\right)\sigma^2 \le \hat{\sigma}_k^2 \le \left(1 + n_\sigma\sqrt{\dfrac{\lambda-1}{k}}\right)\sigma^2$
and, solving for σ², the confidence bounds are
$\underline{\sigma}^2 := \dfrac{\hat{\sigma}_k^2}{1 + n_\sigma\sqrt{(\lambda-1)/k}} \le \sigma^2 \le \dfrac{\hat{\sigma}_k^2}{1 - n_\sigma\sqrt{(\lambda-1)/k}} =: \bar{\sigma}^2$
51 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 8)
52 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 9)
53 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 10)
Monte-Carlo Procedure (a code sketch of this procedure follows this slide):
1. Choose the Confidence Level φ and find the corresponding n_σ using the normal (Gaussian) distribution:
n_σ = 1.000 → φ = 0.6827;  n_σ = 1.645 → φ = 0.9000;  n_σ = 1.960 → φ = 0.9500;  n_σ = 2.576 → φ = 0.9900
2. Run a few samples k₀ > 20 and estimate the mean and the kurtosis λ according to:
$\hat{m}_{k_0} := \dfrac{1}{k_0}\sum_{i=1}^{k_0} x_i, \qquad \hat{\lambda} := \dfrac{\dfrac{1}{k_0}\sum_{i=1}^{k_0}(x_i - \hat{m}_{k_0})^4}{\left[\dfrac{1}{k_0}\sum_{i=1}^{k_0}(x_i - \hat{m}_{k_0})^2\right]^2}$
3. Compute the bounds $\underline{\sigma}$ and $\bar{\sigma}$ as functions of k:
$\underline{\sigma}^2 := \dfrac{\hat{\sigma}_{k_0}^2}{1 + n_\sigma\sqrt{(\hat{\lambda}-1)/k}}, \qquad \bar{\sigma}^2 := \dfrac{\hat{\sigma}_{k_0}^2}{1 - n_\sigma\sqrt{(\hat{\lambda}-1)/k}}$
4. Find k for which $\mathrm{Prob}\left[0 \le |\hat{\sigma}_k^2 - \sigma^2| \le n_\sigma\,\sigma_{\hat{\sigma}_k^2}\right] = \varphi$ meets the required accuracy.
5. Run the remaining k − k₀ simulations.
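The following is a minimal sketch of the Monte-Carlo procedure above (an illustrative addition; the pilot sample, the confidence level and the 10 % relative-accuracy requirement are assumptions chosen to mirror the example that follows on the next slide). It estimates the kurtosis from a pilot run and then sizes the number of runs from the relation sigma²_{sigma-hat²} ≈ (λ−1)σ⁴/k.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def runs_needed(samples_pilot, n_sigma=1.96, rel_accuracy=0.1):
    """Number of Monte Carlo runs k so that |sigma_hat^2 - sigma^2| <= rel_accuracy*sigma^2
    with the confidence implied by n_sigma, using sigma_{sigma^2}^2 ~ (lambda-1)*sigma^4/k."""
    x = np.asarray(samples_pilot)
    m_hat = x.mean()
    s2_hat = ((x - m_hat) ** 2).mean()
    lam_hat = ((x - m_hat) ** 4).mean() / s2_hat ** 2     # kurtosis estimate
    # require  n_sigma * sqrt((lambda - 1) / k) <= rel_accuracy
    k = (n_sigma ** 2) * (lam_hat - 1.0) / rel_accuracy ** 2
    return int(np.ceil(k)), lam_hat

# Assumed example: pilot run of k0 = 50 Gaussian samples (kurtosis ~ 3),
# 95% confidence (n_sigma = 1.96) and 10% relative accuracy on sigma^2.
pilot = rng.normal(0.0, 1.0, size=50)
k_required, lam = runs_needed(pilot, n_sigma=1.96, rel_accuracy=0.1)
print(lam, k_required)   # for a Gaussian: lambda ~ 3 and k ~ (1.96/0.1)^2 * 2 ~ 768 (~800)
```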
54 SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue – 11)
Monte-Carlo Procedure — Example: Assume a Gaussian distribution, for which the kurtosis is λ = 3.
1. Choose the Confidence Level φ = 95%, which gives the corresponding n_σ = 1.96.
2. The kurtosis is λ = 3.
3. Find k for which
$\mathrm{Prob}\left[0 \le |\hat{\sigma}_k^2 - \sigma^2| \le 1.96\sqrt{\dfrac{2}{k}}\,\hat{\sigma}_k^2\right] = 0.95$
Assume also that we require $|\hat{\sigma}_k^2 - \sigma^2| \le 0.1\,\sigma^2$ with probability φ = 95%:
$1.96\sqrt{\dfrac{2}{k}} = 0.1 \quad\Rightarrow\quad k \approx 800$
4. Run k > 800 simulations.
55 SOLO Review of Probability
Generating Discrete Random Variables — Pseudo-Random Number Generators
• First attempts to generate "random numbers": draw balls out of a stirred urn, or roll dice.
• 1927: L.H.C. Tippett published a table of 40,000 digits taken "at random" from census reports.
• 1939: M.G. Kendall and B. Babington-Smith created a mechanical machine to generate random numbers. They published a table of 100,000 digits.
• 1946: J. von Neumann proposed the "middle square method".
• 1948: D.H. Lehmer introduced the "linear congruential method".
• 1955: RAND Corporation published a table of 1,000,000 random digits obtained from electronic noise.
• 1965: M.D. MacLaren and G. Marsaglia proposed to combine two congruential generators.
• 1989: R.S. Wikramaratna proposed the additive congruential method.
56 SOLO Review of Probability
Generating Discrete Random Variables — Pseudo-Random Number Generators
A Random Number represents the value of a random variable uniformly distributed on (0,1). Pseudo-Random Numbers constitute a sequence of values which, although deterministically generated, have all the appearances of being independent and uniformly distributed on (0,1).
One approach (the multiplicative congruential method):
1. Define x₀ = integer initial condition, or seed.
2. Using integers a and m, recursively compute
$x_n = a\,x_{n-1}\ \mathrm{modulo}\ m \qquad (a\,x_{n-1} = k\,m + x_n,\ k \in \mathbb{Z},\ x_n < m)$
Therefore x_n takes the values 0,1,…,m−1, and the quantity u_n = x_n/m, called a pseudo-random number, is an approximation to the value of a uniform (0,1) random variable.
In general the integers a and m should be chosen to satisfy three criteria:
1. For any initial seed, the resultant sequence has the "appearance" of being a sequence of independent uniform (0,1) random variables.
2. For any initial seed, the number of values that can be generated before repetition begins is large.
3. The values can be computed efficiently on a digital computer.
Return to Monte Carlo Approximation
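A minimal sketch of a congruential generator (an illustrative addition; the default constants a = 7⁵ and m = 2³¹ − 1 are the 32-bit word values quoted on the next slide, and setting c ≠ 0 gives the mixed variant also described there):

```python
def lcg(seed, a=7**5, m=2**31 - 1, c=0):
    """Multiplicative (c = 0) or mixed (c != 0) congruential pseudo-random generator."""
    x = seed
    while True:
        x = (a * x + c) % m        # x_n = (a*x_{n-1} + c) modulo m
        yield x / m                # u_n = x_n / m approximates Uniform(0,1)

# Usage: draw a few pseudo-random numbers from a fixed seed.
gen = lcg(seed=12345)
u = [next(gen) for _ in range(5)]
print(u)
```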
57 SOLO Review of Probability
Generating Discrete Random Variables — Pseudo-Random Number Generators (continue – 1)
A guideline is to choose m to be a large prime number comparable to the computer word size. Examples:
32-bit word computer: $m = 2^{31} - 1, \quad a = 7^5 = 16{,}807$
36-bit word computer: $m = 2^{35} - 1, \quad a = 5^5 = 3{,}125$
Another generator of pseudo-random numbers uses recursions of the type (the mixed congruential method):
$x_n = (a\,x_{n-1} + c)\ \mathrm{modulo}\ m \qquad (a\,x_{n-1} + c = k\,m + x_n,\ k \in \mathbb{Z},\ x_n < m)$
58 SOLO Review of Probability
Generating Discrete Random Variables — Histograms
A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what proportion of cases fall into each of several categories: it is a form of data binning. The categories are usually specified as non-overlapping intervals of some variable, and the categories (bars) must be adjacent. The intervals (or bands, or bins) are generally of the same size.
Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram always equals 1. If the lengths of the intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.
Mathematical Definition: In a more general mathematical sense, a histogram is a mapping mᵢ that counts the number of observations that fall into various disjoint categories (known as bins), whereas the graph of a histogram is merely one way to represent a histogram. Thus, if we let n be the total number of observations and k be the total number of bins, the histogram mᵢ meets the condition:
$n = \sum_{i=1}^{k} m_i$
A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram Mᵢ of a histogram mᵢ is defined as:
$M_i = \sum_{j=1}^{i} m_j$
(Figure: an ordinary and a cumulative histogram of the same data — a random sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1.)
Return to Table of Content
59 SOLO Review of Probability
Generating Discrete Random Variables — The Inverse Transform Method
Suppose we want to generate a discrete random variable X having probability density function:
$p(x) = \sum_j p_j\,\delta(x - x_j), \qquad p_j \ge 0,\ j = 0,1,\ldots, \qquad \sum_j p_j = 1$
To accomplish this, generate a random number U that is uniformly distributed over (0,1) and set:
$X = \begin{cases} x_0 & U < p_0\\ x_1 & p_0 \le U < p_0 + p_1\\ \ \vdots & \\ x_j & \sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i\\ \ \vdots & \end{cases}$
Since U is uniformly distributed, P(a ≤ U < b) = b − a for any a and b such that 0 < a < b < 1, and so:
$P(X = x_j) = P\left(\sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i\right) = p_j$
and so X has the desired distribution.
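A minimal sketch of the discrete inverse transform method above (an illustrative addition; the four-point distribution used here is an assumption made only for the example):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def inverse_transform_discrete(values, probs, n_samples, rng):
    """Draw n_samples of a discrete random variable with P(X = values[j]) = probs[j]
    by inverting the cumulative sum of the probabilities with U ~ Uniform(0,1)."""
    cdf = np.cumsum(probs)                        # p0, p0+p1, ..., 1
    u = rng.uniform(size=n_samples)               # U ~ Uniform(0,1)
    idx = np.searchsorted(cdf, u, side="right")   # first j with U < sum_{i<=j} p_i
    return np.asarray(values)[idx]

# Assumed example: X in {0, 1, 2, 3} with probabilities 0.1, 0.2, 0.3, 0.4.
values, probs = [0, 1, 2, 3], [0.1, 0.2, 0.3, 0.4]
x = inverse_transform_discrete(values, probs, n_samples=100_000, rng=rng)
print(np.bincount(x) / len(x))   # histogram of the results, approx. [0.1, 0.2, 0.3, 0.4]
```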
60 SOLO Review of Probability
Generating Discrete Random Variables — The Inverse Transform Method (continue – 1)
Suppose we want to generate a discrete random variable X having probability density function $p(x) = \sum_j p_j\,\delta(x - x_j)$, $p_j \ge 0$, $\sum_j p_j = 1$.
(Figure: X drawn N times from p(x), and the histogram of the results.)
61 SOLO Review of Probability
Generating Discrete Random Variables — The Inverse Transform Method (continue – 2)
Generating a Poisson Random Variable:
$p_i = P(X = i) = e^{-\lambda}\dfrac{\lambda^i}{i!}, \qquad i = 0,1,\ldots, \qquad \sum_i p_i = 1$
The successive probabilities can be computed recursively:
$\dfrac{p_{i+1}}{p_i} = \dfrac{e^{-\lambda}\lambda^{i+1}/(i+1)!}{e^{-\lambda}\lambda^{i}/i!} = \dfrac{\lambda}{i+1}$
(Figure: X drawn N times from the Poisson distribution, and the histogram of the results.)
62 SOLO Review of Probability
Generating Discrete Random Variables — The Inverse Transform Method (continue – 3)
Generating a Binomial Random Variable:
$p_i = P(X = i) = \dfrac{n!}{i!\,(n-i)!}\,p^i(1-p)^{n-i}, \qquad i = 0,1,\ldots,n, \qquad \sum_i p_i = 1$
The successive probabilities can be computed recursively:
$\dfrac{p_{i+1}}{p_i} = \dfrac{\dfrac{n!}{(i+1)!\,(n-i-1)!}\,p^{i+1}(1-p)^{n-i-1}}{\dfrac{n!}{i!\,(n-i)!}\,p^i(1-p)^{n-i}} = \dfrac{n-i}{i+1}\cdot\dfrac{p}{1-p}$
(Figure: the Binomial probabilities P(k,n) and the histogram of the results.)
63 SOLO Review of Probability
Generating Discrete Random Variables — The Acceptance-Rejection Technique
Suppose we have an efficient method for simulating a random variable having probability density function {q_j, j ≥ 0}. We want to use this to obtain a random variable that has the probability density function {p_j, j ≥ 0}. Let c be a constant such that:
$\dfrac{p_j}{q_j} \le c \qquad \forall j \text{ s.t. } p_j \ne 0$
If such a c exists, it must satisfy:
$p_j \le c\,q_j \quad\Rightarrow\quad 1 = \sum_j p_j \le c\sum_j q_j = c$
Rejection Method:
Step 1: Simulate the value of Y, having probability density function q_j.
Step 2: Generate a random number U (uniformly distributed over (0,1)).
Step 3: If U < p_Y/(c q_Y), set X = Y and stop. Otherwise return to Step 1.
64 SOLO Review of Probability
Generating Discrete Random Variables — The Acceptance-Rejection Technique (continue – 1)
Theorem: The random variable X obtained by the rejection method has probability density function P{X = i} = p_i.
Proof:
$P\{X = i\} = P\{Y = i \mid \text{Acceptance}\} \overset{\text{Bayes}}{=} \dfrac{P\{Y = i,\ \text{Acceptance}\}}{P\{\text{Acceptance}\}} = \dfrac{P\{Y = i\}\,P\left\{U \le \dfrac{p_i}{c\,q_i}\right\}}{P\{\text{Acceptance}\}}$
By independence, and since U is uniformly distributed on (0,1):
$P\{Y = i\}\,P\left\{U \le \dfrac{p_i}{c\,q_i}\right\} = q_i\,\dfrac{p_i}{c\,q_i} = \dfrac{p_i}{c}$
Summing over all i yields:
$1 = \sum_i P\{X = i\} = \dfrac{\sum_i p_i}{c\,P\{\text{Acceptance}\}} \quad\Rightarrow\quad c\,P\{\text{Acceptance}\} = 1 \quad\Rightarrow\quad P\{\text{Acceptance}\} = \dfrac{1}{c} \le 1$
Therefore $P\{X = i\} = p_i$.  q.e.d.
65 SOLO Review of Probability
Generating Discrete Random Variables — The Acceptance-Rejection Technique (continue – 2)
Example: Generate a truncated Gaussian using the Accept-Reject method. Consider the case with
$p(x) \approx \dfrac{e^{-x^2/2}}{\sqrt{2\pi}} \ \text{ for } x \in [-4, 4], \qquad 0 \text{ otherwise}$
Consider the uniform proposal function
$q(x) = \begin{cases} 1/8 & x \in [-4, 4]\\ 0 & \text{otherwise}\end{cases}$
In the figure we can see the results of the Accept-Reject method using N = 10,000 samples. A code sketch of this example follows below.
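A minimal sketch of the truncated-Gaussian Accept-Reject example above (an illustrative addition; the loop structure and the sample-by-sample acceptance are implementation choices, not prescribed by the slides):

```python
import numpy as np

rng = np.random.default_rng(seed=4)

def truncated_gaussian_ar(n_samples, rng):
    """Accept-Reject sampling of a standard Gaussian truncated to [-4, 4],
    using the uniform proposal q(x) = 1/8 on [-4, 4]."""
    p = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)   # target density (tails cut)
    q = 1.0 / 8.0                                              # uniform proposal density
    c = p(0.0) / q                                             # max of p/q, reached at x = 0
    samples = []
    while len(samples) < n_samples:
        y = rng.uniform(-4.0, 4.0)          # Step 1: draw Y ~ q
        u = rng.uniform()                   # Step 2: draw U ~ Uniform(0,1)
        if u < p(y) / (c * q):              # Step 3: accept with probability p(Y)/(c q(Y))
            samples.append(y)
    return np.array(samples)

x = truncated_gaussian_ar(10_000, rng)
print(x.mean(), x.std())   # approx. 0 and approx. 1, as expected for the (truncated) Gaussian
```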
66 SOLO Review of Probability
Generating Continuous Random Variables — The Inverse Transform Algorithm
Let U be a uniform (0,1) random variable. For any continuous distribution function F, the random variable X defined by
$X = F^{-1}(U)$
has distribution F. [F⁻¹(u) is defined to be that value of x such that F(x) = u.]
Proof: Let $P_X(x)$ denote the Probability Distribution Function of X = F⁻¹(U):
$P_X(x) = P\{X \le x\} = P\{F^{-1}(U) \le x\}$
Since F is a distribution function, F(x) is a monotonic increasing function of x, and the inequality "a ≤ b" is equivalent to the inequality "F(a) ≤ F(b)", therefore:
$P_X(x) = P\{F(F^{-1}(U)) \le F(x)\} = P\{U \le F(x)\} \overset{U\ \text{uniform (0,1)}}{=} F(x), \qquad 0 \le F(x) \le 1$
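A minimal sketch of the continuous inverse transform algorithm above (an illustrative addition; the exponential distribution is chosen only because its CDF inverts in closed form, F⁻¹(u) = −ln(1−u)/λ):

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# Assumed example: F(x) = 1 - exp(-lambda*x), so X = F^{-1}(U) = -ln(1 - U)/lambda.
lam = 2.0
u = rng.uniform(size=100_000)     # U ~ Uniform(0,1)
x = -np.log(1.0 - u) / lam        # inverse transform

print(x.mean(), x.var())          # approx. 1/lambda = 0.5 and 1/lambda^2 = 0.25
```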
67 SOLO Review of Probability
Importance Sampling
Let Y = (Y₁,…,Y_m) be a vector of random variables having a joint probability density function f(y₁,…,y_m), and suppose that we are interested in estimating
$\theta = E_f[h(Y_1,\ldots,Y_m)] = \int h(y_1,\ldots,y_m)\,f(y_1,\ldots,y_m)\,dy_1\cdots dy_m$
Suppose that a direct generation of the random vector Y so as to compute h(Y) is inefficient, possibly because (a) it is difficult to generate the random vector Y, or (b) the variance of h(Y) is large, or (c) both of the above.
Suppose that W = (W₁,…,W_m) is another random vector, which takes values in the same domain as Y, and has a joint density function g(w₁,…,w_m) from which samples can easily be generated. The estimation of θ can be expressed as:
$\theta = E_f[h(Y)] = \int h(w_1,\ldots,w_m)\dfrac{f(w_1,\ldots,w_m)}{g(w_1,\ldots,w_m)}\,g(w_1,\ldots,w_m)\,dw_1\cdots dw_m = E_g\!\left[\dfrac{h(W)\,f(W)}{g(W)}\right]$
Therefore, we can estimate θ by generating values of the random vector W, and then using as the estimator the resulting average of the values h(W) f(W)/g(W).
Return to Particle Filters
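A minimal importance-sampling sketch (an illustrative addition; the target f = N(0,1), the proposal g = N(0,2²) and the choice h(y) = y² are assumptions made only for this example):

```python
import numpy as np

rng = np.random.default_rng(seed=6)

# theta = E_f[h(Y)], estimated by sampling W ~ g and averaging h(W) f(W)/g(W).
def f_pdf(y):  return np.exp(-0.5 * y**2) / np.sqrt(2 * np.pi)                 # target density
def g_pdf(y):  return np.exp(-0.5 * (y / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi)) # proposal density
h = lambda y: y**2

n = 100_000
w = rng.normal(0.0, 2.0, size=n)          # W ~ g
weights = f_pdf(w) / g_pdf(w)             # importance weights f(W)/g(W)
theta_hat = np.mean(h(w) * weights)       # average of h(W) f(W)/g(W)

print(theta_hat)                          # approx. 1.0 = E_f[Y^2] for the standard Gaussian
```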
68 SOLO Review of Probability
Monte Carlo Integration
The Monte Carlo method can be used to numerically evaluate multidimensional integrals:
$I = \int g(x_1,\ldots,x_m)\,dx_1\cdots dx_m = \int g(x)\,dx$
To use Monte Carlo we factorize
$g(x) = f(x)\cdot p(x), \qquad p(x) \ge 0, \qquad \int p(x)\,dx = 1$
in such a way that p(x) is interpreted as a Probability Density Function. We assume that we can draw $N_S$ samples $x^i,\ i = 1,\ldots,N_S$, from p(x):
$x^i \sim p(x),\ i = 1,\ldots,N_S \qquad\Rightarrow\qquad p(x) \approx \sum_{i=1}^{N_S}\delta(x - x^i)/N_S$
Using Monte Carlo we can approximate:
$I = \int f(x)\,p(x)\,dx \approx I_{N_S} = \int f(x)\left[\sum_{i=1}^{N_S}\delta(x - x^i)/N_S\right]dx = \dfrac{1}{N_S}\sum_{i=1}^{N_S} f(x^i)$
69 SOLO Review of Probability
Monte Carlo Integration
We draw $N_S$ samples $x^i \sim p(x),\ i = 1,\ldots,N_S$, and form
$I = \int f(x)\,p(x)\,dx \approx I_{N_S} = \dfrac{1}{N_S}\sum_{i=1}^{N_S} f(x^i)$
If the samples $x^i$ are independent, then $I_{N_S}$ is an unbiased estimate of I. According to the Law of Large Numbers, $I_{N_S}$ will almost surely converge to I:
$I_{N_S} \xrightarrow[N_S\to\infty]{a.s.} I$
If the variance of f(x) is finite, i.e.
$\sigma_f^2 := \int [f(x) - I]^2\,p(x)\,dx < \infty$
then the Central Limit Theorem holds and the estimation error converges in distribution to a Normal Distribution:
$\lim_{N_S\to\infty}\sqrt{N_S}\,(I_{N_S} - I) \sim \mathcal{N}(0, \sigma_f^2)$
The error of the MC estimate, e = I_{N_S} − I, is of the order of O(N_S^{-1/2}), meaning that the rate of convergence of the estimate is independent of the dimension of the integrand.
Numerical Integration of p(x_k|x_{k-1}) and p(z_k|x_k)    Return to Particle Filters
70 SOLO Review of Probability
Existence Theorems
Existence Theorem 3: Given a function S(ω) = S(−ω) or, equivalently, a positive-definite function R(τ) (R(τ) = R(−τ), and R(0) = max R(τ), for all τ), we can find a stochastic process x(t) having S(ω) as its power spectrum or R(τ) as its autocorrelation.
Proof of Existence Theorem 3: Define
$a^2 := \dfrac{1}{\pi}\int_{-\infty}^{+\infty} S(\omega)\,d\omega \qquad\text{and}\qquad f(\omega) := \dfrac{S(\omega)}{\pi\,a^2} = f(-\omega)$
Since $f(\omega) \ge 0$ and $\int_{-\infty}^{+\infty} f(\omega)\,d\omega = 1$, according to Existence Theorem 1 we can find a random variable ω with the even density function f(ω) and probability distribution function
$P(\omega) = \int_{-\infty}^{\omega} f(\tau)\,d\tau$
We now form the process
$x(t) := a\cos(\omega t + \vartheta)$
where ϑ is a random variable uniformly distributed in the interval (−π, +π) and independent of ω.
71 SOLO Review of Probability
Existence Theorems — Existence Theorem 3, Proof (continue – 1)
Since ϑ is uniformly distributed in the interval (−π,+π) and independent of ω:
$E\{e^{j\nu\vartheta}\} = \dfrac{1}{2\pi}\int_{-\pi}^{+\pi} e^{j\nu\vartheta}\,d\vartheta = \dfrac{e^{j\nu\pi} - e^{-j\nu\pi}}{2\pi\,j\,\nu} = \dfrac{\sin(\nu\pi)}{\nu\pi}$
so that $E\{\cos(\nu\vartheta)\} = E\{\sin(\nu\vartheta)\} = 0$ for ν = 1, 2. Therefore:
$E\{x(t)\} = a\,E\{\cos(\omega t)\}\,E\{\cos\vartheta\} - a\,E\{\sin(\omega t)\}\,E\{\sin\vartheta\} = 0$
and
$E\{x(t+\tau)\,x(t)\} = a^2 E\{\cos[\omega(t+\tau)+\vartheta]\cos(\omega t+\vartheta)\} = \dfrac{a^2}{2}E\{\cos(\omega\tau)\} + \dfrac{a^2}{2}E\{\cos[\omega(2t+\tau)+2\vartheta]\} = \dfrac{a^2}{2}E\{\cos(\omega\tau)\}$
where the last term vanishes because $E\{\cos(2\vartheta)\} = E\{\sin(2\vartheta)\} = 0$.
72 SOLO Review of Probability
Existence Theorems — Existence Theorem 3, Proof (continue – 2)
We have $x(t) = a\cos(\omega t + \vartheta)$ with
$E\{x(t)\} = 0, \qquad E\{x(t+\tau)\,x(t)\} = \dfrac{a^2}{2}E_\omega\{\cos(\omega\tau)\} = \dfrac{a^2}{2}\int_{-\infty}^{+\infty}\cos(\omega\tau)\,f(\omega)\,d\omega = R_x(\tau)$
Because of those two properties x(t) is wide-sense stationary, with a power spectrum given by the Fourier transform of the autocorrelation (using $R_x(\tau) = R_x(-\tau)$ and $S_x(\omega) = S_x(-\omega)$):
$S_x(\omega) = \int_{-\infty}^{+\infty} R_x(\tau)\left[\cos(\omega\tau) - j\sin(\omega\tau)\right]d\tau = \int_{-\infty}^{+\infty} R_x(\tau)\cos(\omega\tau)\,d\tau \qquad\text{(Fourier)}$
$R_x(\tau) = \dfrac{1}{2\pi}\int_{-\infty}^{+\infty} S_x(\omega)\left[\cos(\omega\tau) + j\sin(\omega\tau)\right]d\omega = \dfrac{1}{2\pi}\int_{-\infty}^{+\infty} S_x(\omega)\cos(\omega\tau)\,d\omega \qquad\text{(Inverse Fourier)}$
Therefore, by the definition of f(ω):
$S_x(\omega) = \pi\,a^2\,f(\omega) = S(\omega)$
q.e.d.
73 SOLO Markov Processes — Andrei Andreevich Markov, 1856–1922
A Markov Process is defined by:
$p\left[x(t) \in \Omega \mid x(\tau),\ \tau \le t_1\right] = p\left[x(t) \in \Omega \mid x(t_1)\right], \qquad \forall\, t > t_1$
i.e., for the Random Process, conditioning on the whole past up to any time t₁ is equivalent to conditioning on the process at t₁ alone.
Examples of Markov Processes:
1. Continuous Dynamic System: $\dot{x} = f(t, x, u, w), \qquad z = h(t, x, u, v)$
2. Discrete Dynamic System: $x_k = f(t_k, x_{k-1}, u_{k-1}, w_{k-1}), \qquad z_k = h(t_k, x_k, u_k, v_k)$
where x is the state-space vector (n×1), u the input vector (m×1), z the measurement vector (p×1), w the white input noise vector (n×1), and v the white measurement noise vector (p×1).
Recursive Bayesian Estimation
74 SOLO Recursive Bayesian Estimation
Markov Processes
Using this property we obtain:
$p(x_k \mid x_{k-1}, x_{k-2}, \ldots, x_0) = p(x_k \mid x_{k-1})$
Markov Process: the present discrete state probability depends only on the previous state. Therefore:
$p(x_k, x_{k-1}, \ldots, x_0) = p(x_k \mid x_{k-1}, \ldots, x_0)\,p(x_{k-1}, \ldots, x_0) = p(x_k \mid x_{k-1})\,p(x_{k-1} \mid x_{k-2}, \ldots, x_0)\,p(x_{k-2}, \ldots, x_0) = \cdots = p(x_0)\prod_{i=1}^{k} p(x_i \mid x_{i-1})$
The Markov Process is defined if we know p(x₀) and p(xᵢ|xᵢ₋₁) for each i.
Table of Content
75 SOLO Recursive Bayesian Estimation
Markov Processes
In a Markovian system the probability of the current true state depends only on the previous state, and is independent of the other, earlier states:
$p(x_k \mid x_{k-1}, x_{k-2}, \ldots, x_0) = p(x_k \mid x_{k-1})$
Similarly, the measurement at the k-th time step depends only upon the current true state, so it is conditionally independent of all other earlier states, given the current state:
$p(z_k \mid x_k, x_{k-1}, \ldots, x_0) = p(z_k \mid x_k)$
$p(z_k, x_k) = p(z_k \mid x_k)\,p(x_k) = p(x_k \mid z_k)\,p(z_k)$
From the definition of the Markovian system (see Figure: hidden states x₀, x₁, …, x_k propagated by f(x_{k-1}, u_{k-1}, w_{k-1}), measurements z₁, …, z_k produced by h(x_k, v_k)), p(x_k|x_{k-1}) is defined by f and the statistics of x and w, and p(z_k|x_k) is defined by h and the statistics of x and v.
76 SOLO Recursive Bayesian Estimation
Markov Processes — Analytic Computation of p(x_k|x_{k-1}) and p(z_k|x_k)
$x_k = f(k-1, x_{k-1}, u_{k-1}, w_{k-1}), \quad w_{k-1} \sim p_w(w),\ u_{k-1}\ \text{given},\ x_0 \sim p_{x_0}(x) \qquad\qquad z_k = h(k, x_k, v_k), \quad v_k \sim p_v(v)$
Suppose that we can obtain all the roots $w_{k-1}^j$, $j = 1,\ldots,N_{x_k}$, for which $x_k = f(x_{k-1}, u_{k-1}, w_{k-1}^j)$. Then, in analogy with the Fundamental Theorem for functions of one random variable,
$p(x_k \mid x_{k-1})\,dx_k = \Pr\{x_k \le X_k \le x_k + dx_k \mid x_{k-1}\} = \sum_{j=1}^{N_{x_k}} p_w\!\left(w_{k-1}^j\right)|dw_{k-1}^j| \qquad\Rightarrow\qquad p(x_k \mid x_{k-1}) = \sum_{j=1}^{N_{x_k}} p_w\!\left(w_{k-1}^j\right)\left|\dfrac{\partial f}{\partial w}\right|^{-1}_{w = w_{k-1}^j}$
In the same way, suppose that we can obtain all the roots $v_k^j$, $j = 1,\ldots,N_{z_k}$, for which $z_k = h(x_k, v_k^j)$; then
$p(z_k \mid x_k) = \sum_{j=1}^{N_{z_k}} p_v\!\left(v_k^j\right)\left|\dfrac{\partial h}{\partial v}\right|^{-1}_{v = v_k^j}$
This is a Conceptual, Not a Practical, Procedure.
77 SOLO Recursive Bayesian Estimation
Markov Processes — Analytic Computation of p(x_k|x_{k-1}) and p(z_k|x_k) (continue – 1)
For additive noise,
$x_k = f(x_{k-1}, u_{k-1}) + w_{k-1}, \qquad z_k = h(x_k) + v_k$
we have
$w_{k-1} = x_k - f(x_{k-1}, u_{k-1}), \qquad v_k = z_k - h(x_k)$
therefore
$p(x_k \mid x_{k-1}) = p_w\!\left[x_k - f(x_{k-1}, u_{k-1})\right] \qquad\text{and}\qquad p(z_k \mid x_k) = p_v\!\left[z_k - h(x_k)\right]$
78 SOLO Recursive Bayesian Estimation
Markov Processes — Numerical Computation of p(x_k|x_{k-1}) and p(z_k|x_k)
$x_k = f(x_{k-1}, w_{k-1}), \qquad z_k = h(x_k, v_k)$
where $w_{k-1}$ and $v_k$ are system and measurement white-noise sequences, independent of past and current states and of each other, and having known PDFs $p(w_{k-1})$ and $p(v_k)$.
We want to compute p(x_k|Z_{1:k}) recursively, assuming knowledge of p(x_{k-1}|Z_{1:k-1}), in two stages: prediction (before measurement) and update (after measurement).
1. Prediction (before measurement): $p(x_k \mid Z_{1:k-1}) = \int p(x_k \mid x_{k-1})\,p(x_{k-1} \mid Z_{1:k-1})\,dx_{k-1}$
2. Update (after measurement): $p(x_k \mid Z_{1:k}) = p(x_k \mid z_k, Z_{1:k-1}) \overset{\text{Bayes}}{=} \dfrac{p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})}{p(z_k \mid Z_{1:k-1})} = \dfrac{p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})}{\int p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})\,dx_k}$
We need to evaluate the following integrals:
$p(x_k \mid x_{k-1}) = \int \delta\!\left[x_k - f(x_{k-1}, w_{k-1})\right]p(w_{k-1})\,dw_{k-1}, \qquad p(z_k \mid x_k) = \int \delta\!\left[z_k - h(x_k, v_k)\right]p(v_k)\,dv_k$
Analytic solutions for those integral equations do not exist in the general case. We use the numeric Monte Carlo method to evaluate the integrals. Generate (draw):
$w_{k-1}^i \sim p(w_{k-1}), \qquad v_k^i \sim p(v_k), \qquad i = 1,\ldots,N_S$
then
$x_k^i = f(x_{k-1}^i, w_{k-1}^i) \ \Rightarrow\ p(x_k \mid x_{k-1}) \approx \sum_{i=1}^{N_S}\delta(x_k - x_k^i)/N_S, \qquad z_k^i = h(x_k^i, v_k^i) \ \Rightarrow\ p(z_k \mid x_k) \approx \sum_{i=1}^{N_S}\delta(z_k - z_k^i)/N_S$
79 SOLO Recursive Bayesian Estimation
Markov Processes — Monte Carlo Computation of p(x_k|x_{k-1}) and p(z_k|x_k)
$x_k = f(x_{k-1}, u_{k-1}, w_{k-1}),\ w_{k-1}\sim p_w(w),\ u_{k-1}\ \text{given},\ x_0 \sim p_{x_0}(x) \qquad\qquad z_k = h(x_k, v_k),\ v_k \sim p_v(v)$
0. Initialization: Generate (draw) $x_0^i \sim p_{x_0}(x),\ i = 1,\ldots,N_S$.
For k ∈ {1, 2, …}:
1. At stage k−1, generate (draw) $N_S$ samples $w_{k-1}^i \sim p_w(w_{k-1}),\ i = 1,\ldots,N_S$.
2. State update: $x_k^i = f(x_{k-1}^i, u_{k-1}, w_{k-1}^i),\ i = 1,\ldots,N_S$. Compute the histogram of $x_k \mid x_{k-1}$ to obtain $p(x_k \mid x_{k-1}) \approx \sum_{i=1}^{N_S}\delta(x_k - x_k^i)/N_S$.
3. Generate (draw) measurement noise $v_k^i \sim p_v(v),\ i = 1,\ldots,N_S$.
4. Measurement update: $z_k^i = h(x_k^i, v_k^i),\ i = 1,\ldots,N_S$. Compute the histogram of $z_k \mid x_k$ to obtain $p(z_k \mid x_k) \approx \sum_{i=1}^{N_S}\delta(z_k - z_k^i)/N_S$.
k := k+1 and return to step 1. (A code sketch of this sampling loop follows below.)
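A minimal sketch of the Monte Carlo propagation loop above (an illustrative addition; the scalar models, the 0.9 gain, the x²/20 measurement and the noise sizes are assumptions made only for this example, not models from the slides):

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Assumed scalar models:
#   x_k = f(x_{k-1}, w_{k-1}) = 0.9*x_{k-1} + w_{k-1},   w ~ N(0, 0.5^2)
#   z_k = h(x_k, v_k)         = x_k**2 / 20 + v_k,       v ~ N(0, 1)
def f(x, w): return 0.9 * x + w
def h(x, v): return x**2 / 20.0 + v

N_S, K = 10_000, 5
x = rng.normal(0.0, 1.0, size=N_S)          # step 0: x_0^i ~ p(x_0)

for k in range(1, K + 1):
    w = rng.normal(0.0, 0.5, size=N_S)      # step 1: draw process noise samples
    x = f(x, w)                             # step 2: propagate samples of x_k | x_{k-1}
    v = rng.normal(0.0, 1.0, size=N_S)      # step 3: draw measurement noise samples
    z = h(x, v)                             # step 4: samples of z_k | x_k
    # histograms approximate p(x_k | x_{k-1}) and p(z_k | x_k)
    px, _ = np.histogram(x, bins=50, density=True)
    pz, _ = np.histogram(z, bins=50, density=True)
    print(k, x.mean(), z.mean())
```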
80 SOLO Stochastic Processes
Stochastic Processes deal with systems corrupted by noise. A description of those processes is given in the "Stochastic Processes" Presentation. Here we give only one aspect of those processes.
A continuous dynamic system is described by:
$d\,x(t) = f\left[x(t)\right]dt + d\,w(t), \qquad t \in [t_0, t_f]$
where x(t) is the n-dimensional state vector and dw(t) the n-dimensional process noise vector.
Assuming system measurements at discrete times t_k given by:
$z_k = h\left[x(t_k), v_k, t_k\right], \qquad t_k \in [t_0, t_f]$
where v_k is the m-dimensional measurement noise vector at t_k.
We are interested in the probability of the state x at time t given the set of discrete measurements up to and including time t_k < t:
$p\left[x, t \mid Z_k\right], \qquad Z_k = \{z_1, z_2, \ldots, z_k\}\ \text{— the set of all measurements up to and including time } t_k$
The time evolution of the probability density function is described by the Fokker–Planck equation.
81 SOLO Stochastic Processes
Fokker–Planck Equation — Adriaan Fokker (1887–1972), Max Planck (1858–1947)
The Fokker–Planck equation describes the time evolution of the probability density function of the position of a particle, and can be generalized to other observables as well. It is named after Adriaan Fokker and Max Planck and is also known as the Kolmogorov forward equation. The first use of the Fokker–Planck equation was the statistical description of Brownian motion of a particle in a fluid.
In one spatial dimension x, the Fokker–Planck equation for a process with drift D₁(x,t) and diffusion D₂(x,t) is
$\dfrac{\partial}{\partial t} f(x,t) = -\dfrac{\partial}{\partial x}\left[D_1(x,t)\,f(x,t)\right] + \dfrac{\partial^2}{\partial x^2}\left[D_2(x,t)\,f(x,t)\right]$
More generally, the time-dependent probability distribution may depend on a set of N macrovariables xᵢ. The general form of the Fokker–Planck equation is then
$\dfrac{\partial f}{\partial t} = -\sum_{i=1}^{N}\dfrac{\partial}{\partial x_i}\left[D_1^i(x_1,\ldots,x_N)\,f\right] + \sum_{i=1}^{N}\sum_{j=1}^{N}\dfrac{\partial^2}{\partial x_i\,\partial x_j}\left[D_2^{ij}(x_1,\ldots,x_N)\,f\right]$
where D₁ is the drift vector and D₂ the diffusion tensor; the latter results from the presence of the stochastic force.
(Figure: a solution of the one-dimensional Fokker–Planck equation, with both the drift and the diffusion term. The initial condition is a Dirac delta function at x = 1, and the distribution drifts towards x = 0.)
A. Fokker, "Die mittlere Energie rotierender elektrischer Dipole im Strahlungsfeld", Annalen der Physik 43 (1914), 810–820.
M. Planck, "Über einen Satz der statistischen Dynamik und eine Erweiterung in der Quantentheorie", Sitzungsberichte der Preussischen Akademie der Wissenschaften (1917), 324–341.
82 SOLO Stochastic Processes
Fokker–Planck Equation (continue – 1)
The Fokker–Planck equation can be used for computing the probability densities of stochastic differential equations. Consider the Itô stochastic differential equation:
$d\,X_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,d\,W_t$
where $X_t$ is the state and $W_t$ is a standard M-dimensional Wiener process. If the initial probability distribution is f(x, 0), then the probability distribution f(x, t) of the state is given by the Fokker–Planck equation with the drift and diffusion terms:
$D_1^i(x,t) = \mu_i(x,t), \qquad D_2^{ij}(x,t) = \tfrac{1}{2}\sum_k \sigma_{ik}(x,t)\,\sigma_{jk}(x,t)$
Similarly, a Fokker–Planck equation can be derived for Stratonovich stochastic differential equations. In this case, noise-induced drift terms appear if the noise strength is state-dependent.
83 SOLO Stochastic Processes
Fokker–Planck Equation (continue – 2) — Derivation of the Fokker–Planck Equation
Start with
$p_{x(t_k),x(t_{k-1})}(x_k, x_{k-1}) = p_{x(t_k)|x(t_{k-1})}(x_k \mid x_{k-1})\,p_{x(t_{k-1})}(x_{k-1})$
and
$p_{x(t_k)}(x_k) = \int_{-\infty}^{+\infty} p_{x(t_k),x(t_{k-1})}(x_k, x_{k-1})\,dx_{k-1} = \int_{-\infty}^{+\infty} p_{x(t_k)|x(t_{k-1})}(x_k \mid x_{k-1})\,p_{x(t_{k-1})}(x_{k-1})\,dx_{k-1}$
Define $t = t_k$, $t - \Delta t = t_{k-1}$, $x(t) = x_k$, $x(t-\Delta t) = x_{k-1}$; using the Chapman–Kolmogorov equation we obtain
$p_{x(t)}[x(t)] = \int_{-\infty}^{+\infty} p_{x(t)|x(t-\Delta t)}[x(t) \mid x(t-\Delta t)]\,p_{x(t-\Delta t)}[x(t-\Delta t)]\,d[x(t-\Delta t)]$
Let us use the Characteristic Function of $p_{x(t)|x(t-\Delta t)}$ with respect to the increment $x(t) - x(t-\Delta t)$:
$\Phi_{x(t)|x(t-\Delta t)}(s) := \int_{-\infty}^{+\infty}\exp\{-s\,[x(t)-x(t-\Delta t)]\}\,p_{x(t)|x(t-\Delta t)}[x(t) \mid x(t-\Delta t)]\,d[x(t)-x(t-\Delta t)]$
with the inverse transform
$p_{x(t)|x(t-\Delta t)}[x(t) \mid x(t-\Delta t)] = \dfrac{1}{2\pi j}\int_{-j\infty}^{+j\infty}\exp\{s\,[x(t)-x(t-\Delta t)]\}\,\Phi_{x(t)|x(t-\Delta t)}(s)\,ds$
Substituting into the Chapman–Kolmogorov equation:
$p_{x(t)}[x(t)] = \int_{-\infty}^{+\infty}\dfrac{1}{2\pi j}\int_{-j\infty}^{+j\infty}\exp\{s\,[x(t)-x(t-\Delta t)]\}\,\Phi_{x(t)|x(t-\Delta t)}(s)\,ds\;p_{x(t-\Delta t)}[x(t-\Delta t)]\,d[x(t-\Delta t)]$
84 SOLO Stochastic Processes
Fokker–Planck Equation (continue – 3) — Derivation of the Fokker–Planck Equation (continue – 1)
The Characteristic Function can be expressed in terms of the moments of the increment about x(t−Δt) as:
$\Phi_{x(t)|x(t-\Delta t)}(s) = 1 + \sum_{i=1}^{\infty}\dfrac{(-s)^i}{i!}\,E\left\{[x(t)-x(t-\Delta t)]^i \mid x(t-\Delta t)\right\}$
Therefore
$p_{x(t)}[x(t)] = \int_{-\infty}^{+\infty}\dfrac{1}{2\pi j}\int_{-j\infty}^{+j\infty}\exp\{s[x(t)-x(t-\Delta t)]\}\left[1 + \sum_{i=1}^{\infty}\dfrac{(-s)^i}{i!}E\left\{[x(t)-x(t-\Delta t)]^i \mid x(t-\Delta t)\right\}\right]ds\;p_{x(t-\Delta t)}[x(t-\Delta t)]\,d[x(t-\Delta t)]$
Using the fact that (with u := x(t) − x(t−Δt))
$\dfrac{1}{2\pi j}\int_{-j\infty}^{+j\infty}(-s)^i\exp(s\,u)\,ds = (-1)^i\dfrac{\partial^i\,\delta(u)}{\partial\,u^i}, \qquad i = 0, 1, 2, \ldots$
where δ[u] is the Dirac delta function, $\delta(u) = \dfrac{1}{2\pi j}\int_{-j\infty}^{+j\infty}\exp(s\,u)\,ds$ and $\int_{-\infty}^{+\infty}\delta(u)\,F(u)\,du = F(0)$, the inner integral over s is replaced by derivatives of the delta function.
85 SOLO Stochastic Processes
Fokker–Planck Equation (continue – 4) — Derivation of the Fokker–Planck Equation (continue – 2)
Useful results related to integrals involving the Delta (Dirac) function:
$\delta(u-a) = \dfrac{1}{2\pi j}\int_{-j\infty}^{+j\infty}\exp[s\,(u-a)]\,ds, \qquad \int_{-\infty}^{+\infty}\delta(u-a)\,f(u)\,du = f(a)$
$\dfrac{d\,\delta(u-a)}{d\,u} = \dfrac{1}{2\pi j}\int_{-j\infty}^{+j\infty} s\exp[s\,(u-a)]\,ds, \qquad \int_{-\infty}^{+\infty}\dfrac{d\,\delta(u-a)}{d\,u}\,f(u)\,du = -\dfrac{d\,f(u)}{d\,u}\Big|_{u=a}$
$\dfrac{d^i\,\delta(u-a)}{d\,u^i} = \dfrac{1}{2\pi j}\int_{-j\infty}^{+j\infty} s^i\exp[s\,(u-a)]\,ds, \qquad \int_{-\infty}^{+\infty}\dfrac{d^i\,\delta(u-a)}{d\,u^i}\,f(u)\,du = (-1)^i\dfrac{d^i f(u)}{d\,u^i}\Big|_{u=a}$
86 SOLO Stochastic Processes
Fokker–Planck Equation (continue – 5) — Derivation of the Fokker–Planck Equation (continue – 3)
Using the Dirac-delta results above, the zeroth-order term gives
$\int_{-\infty}^{+\infty}\delta[x(t)-x(t-\Delta t)]\,p_{x(t-\Delta t)}[x(t-\Delta t)]\,d[x(t-\Delta t)] = p_{x(t-\Delta t)}[x(t)]$
and the higher-order terms give
$\int_{-\infty}^{+\infty}\dfrac{\partial^i\,\delta[x(t)-x(t-\Delta t)]}{\partial\,x(t)^i}\,E\left\{[x(t)-x(t-\Delta t)]^i \mid x(t-\Delta t)\right\}\,p_{x(t-\Delta t)}[x(t-\Delta t)]\,d[x(t-\Delta t)] = \dfrac{\partial^i}{\partial\,x(t)^i}\left[E\left\{[x(t)-x(t-\Delta t)]^i \mid x(t-\Delta t)\right\}\,p_{x(t-\Delta t)}[x(t)]\right]$
so that
$p_{x(t)}[x(t)] = p_{x(t-\Delta t)}[x(t)] + \sum_{i=1}^{\infty}\dfrac{(-1)^i}{i!}\,\dfrac{\partial^i}{\partial\,x(t)^i}\left[E\left\{[x(t)-x(t-\Delta t)]^i \mid x(t-\Delta t)\right\}\,p_{x(t-\Delta t)}[x(t)]\right]$
Rearranging, dividing by Δt, and taking the limit Δt → 0, we obtain:
$\lim_{\Delta t\to 0}\dfrac{p_{x(t)}[x(t)] - p_{x(t-\Delta t)}[x(t)]}{\Delta t} = \sum_{i=1}^{\infty}\dfrac{(-1)^i}{i!}\,\dfrac{\partial^i}{\partial\,x(t)^i}\left[\lim_{\Delta t\to 0}\dfrac{E\left\{[x(t)-x(t-\Delta t)]^i \mid x(t-\Delta t)\right\}}{\Delta t}\,p_{x(t-\Delta t)}[x(t)]\right]$
87 SOLO Stochastic Processes
Fokker–Planck Equation (continue – 6) — Derivation of the Fokker–Planck Equation (continue – 4)
We found the limit expression above. Define:
$m_i[x(t^-), t] := \lim_{\Delta t\to 0}\dfrac{E\left\{[x(t)-x(t-\Delta t)]^i \mid x(t-\Delta t)\right\}}{\Delta t}, \qquad x(t^-) := \lim_{\Delta t\to 0} x(t-\Delta t)$
Therefore
$\dfrac{\partial\,p_{x(t)}[x(t)]}{\partial\,t} = \sum_{i=1}^{\infty}\dfrac{(-1)^i}{i!}\,\dfrac{\partial^i\left[m_i[x(t^-),t]\;p_{x(t)}[x(t)]\right]}{\partial\,x(t)^i}$
This equation is called the Stochastic Equation or Kinetic Equation. It is a partial differential equation that we must solve, with the initial condition:
$p_{x(t)}[x(t)]\big|_{t=t_0} = p_0[x(t_0)]$
88 SOLO Stochastic Processes
Fokker–Planck Equation (continue – 7) — Derivation of the Fokker–Planck Equation (continue – 5)
We want to find $p_{x(t)}[x(t)]$ where x(t) is the solution of
$\dfrac{d\,x(t)}{d\,t} = f[x(t), t] + n_g(t), \qquad t \in [t_0, t_f]$
with $n_g(t)$ a Wiener (Gauss) process: $\bar{n}_g := E\{n_g(t)\} = 0$ and $E\{[n_g(t)-\bar{n}_g][n_g(\tau)-\bar{n}_g]\} = Q(t)\,\delta(t-\tau)$. Then:
$m_1[x(t^-), t] := \lim_{\Delta t\to 0}\dfrac{E\{[x(t)-x(t-\Delta t)] \mid x(t-\Delta t)\}}{\Delta t} = E\left\{\dfrac{d\,x(t)}{d\,t}\right\} = f[x(t), t] + E\{n_g\} = f[x(t), t]$
$m_2[x(t^-), t] := \lim_{\Delta t\to 0}\dfrac{E\{[x(t)-x(t-\Delta t)]^2 \mid x(t-\Delta t)\}}{\Delta t} = E\{n_g^2\} = Q(t)$
$m_i[x(t^-), t] := \lim_{\Delta t\to 0}\dfrac{E\{[x(t)-x(t-\Delta t)]^i \mid x(t-\Delta t)\}}{\Delta t} = 0, \qquad i > 2$
Therefore we obtain the Fokker–Planck Equation:
$\dfrac{\partial\,p_{x(t)}[x(t)]}{\partial\,t} = -\dfrac{\partial\left[f(x(t),t)\;p_{x(t)}[x(t)]\right]}{\partial\,x(t)} + \dfrac{1}{2}\,Q(t)\,\dfrac{\partial^2\,p_{x(t)}[x(t)]}{\partial\,x(t)^2}$
Return to Daum
89 SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction
Problem: Estimate the hidden states of a non-linear dynamic stochastic system from noisy measurements.
Given a nonlinear discrete stochastic Markovian system, we want to use k discrete measurements Z_{1:k} = {z₁, z₂, …, z_k} to estimate the hidden state x_k. For this we want to compute the probability of x_k given all the measurements Z_{1:k}. If we know p(x_k|Z_{1:k}) then x_k is estimated using:
$\hat{x}_{k|k} := E\{x_k \mid Z_{1:k}\} = \int x_k\,p(x_k \mid Z_{1:k})\,dx_k$
$P_{k|k} = E\{(x_k - \hat{x}_{k|k})(x_k - \hat{x}_{k|k})^T \mid Z_{1:k}\} = \int (x_k - \hat{x}_{k|k})(x_k - \hat{x}_{k|k})^T\,p(x_k \mid Z_{1:k})\,dx_k$
or, more generally, we can compute all moments of the probability distribution p(x_k|Z_{1:k}):
$E\{g(x_k) \mid Z_{1:k}\} = \int g(x_k)\,p(x_k \mid Z_{1:k})\,dx_k$
The knowledge of p(x_k|Z_{1:k}) also allows the computation of the Maximum a Posteriori (MAP) estimate:
$\hat{x}_{k|k}^{MAP} = \arg\max_{x_k} p(x_k \mid Z_{1:k})$
90 SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction
To find the expression for p(x_k|Z_{1:k}) we use the theorem of joint probability (Bayes Rule):
$p(x_k \mid Z_{1:k}) \overset{\text{Bayes}}{=} \dfrac{p(x_k, Z_{1:k})}{p(Z_{1:k})}$
Since Z_{1:k} = {z_k, Z_{1:k-1}}:
$p(x_k \mid Z_{1:k}) = \dfrac{p(x_k, z_k, Z_{1:k-1})}{p(z_k, Z_{1:k-1})}$
The numerator of this expression is
$p(x_k, z_k, Z_{1:k-1}) \overset{\text{Bayes}}{=} p(z_k \mid x_k, Z_{1:k-1})\,p(x_k, Z_{1:k-1}) = p(z_k \mid x_k, Z_{1:k-1})\,p(x_k \mid Z_{1:k-1})\,p(Z_{1:k-1})$
Since the knowledge of x_k supersedes the need for Z_{1:k-1} = {z₁, z₂, …, z_{k-1}}:
$p(z_k \mid x_k, Z_{1:k-1}) \equiv p(z_k \mid x_k)$
and the denominator is
$p(z_k, Z_{1:k-1}) \overset{\text{Bayes}}{=} p(z_k \mid Z_{1:k-1})\,p(Z_{1:k-1})$
Therefore:
$p(x_k \mid Z_{1:k}) = \dfrac{p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})\,p(Z_{1:k-1})}{p(z_k \mid Z_{1:k-1})\,p(Z_{1:k-1})}$
91 SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction
The final result is:
$p(x_k \mid Z_{1:k}) = \dfrac{p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})}{p(z_k \mid Z_{1:k-1})}$
Since p(x_k|Z_{1:k}) is a probability distribution it must satisfy $\int p(x_k \mid Z_{1:k})\,dx_k = 1$:
$1 = \int p(x_k \mid Z_{1:k})\,dx_k = \int\dfrac{p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})}{p(z_k \mid Z_{1:k-1})}\,dx_k = \dfrac{\int p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})\,dx_k}{p(z_k \mid Z_{1:k-1})}$
Therefore:
$p(z_k \mid Z_{1:k-1}) = \int p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})\,dx_k$
and:
$p(x_k \mid Z_{1:k}) = \dfrac{p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})}{\int p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})\,dx_k}$
This is a recursive relation that needs the value of p(x_k|Z_{1:k-1}), assuming that p(z_k|x_k) is obtained from the Markovian system definition (z_k = h(x_k, v_k)).
92 SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction
The Correction Step is:
$p(x_k \mid Z_{1:k}) = \dfrac{p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})}{p(z_k \mid Z_{1:k-1})}$
or:
$\text{posterior} = \dfrac{\text{likelihood}\times\text{prior}}{\text{evidence}}$
prior: $p(x_k \mid Z_{1:k-1})$, given by the prediction equation
likelihood: $p(z_k \mid x_k)$, given by the observation model
evidence: $p(z_k \mid Z_{1:k-1}) = \int p(z_k \mid x_k)\,p(x_k \mid Z_{1:k-1})\,dx_k$, the normalizing constant in the denominator
93 SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction — Chapman–Kolmogorov Equation (Sydney Chapman 1888–1970, Andrey Nikolaevich Kolmogorov 1903–1987)
Using:
$p(x_k, x_{k-1} \mid Z_{1:k-1}) \overset{\text{Bayes}}{=} p(x_k \mid x_{k-1}, Z_{1:k-1})\,p(x_{k-1} \mid Z_{1:k-1})$
and, since for a Markov Process the knowledge of x_{k-1} supersedes the need for Z_{1:k-1} = {z₁, z₂, …, z_{k-1}},
$p(x_k \mid x_{k-1}, Z_{1:k-1}) = p(x_k \mid x_{k-1})$
we obtain:
$p(x_k \mid Z_{1:k-1}) = \int p(x_k, x_{k-1} \mid Z_{1:k-1})\,dx_{k-1} = \int p(x_k \mid x_{k-1})\,p(x_{k-1} \mid Z_{1:k-1})\,dx_{k-1}$
    94 Recursive Bayesian EstimationSOLO () ( ) ( ) ( )∫∫ −−−−−−−− == 11:11111:111:1 |||,| kkkkkkkkkkk xdZxpxxpxdZxxpZxp Using p (xk-1|Z1:k-1) from time-step k-1 and p (xk|xk-1) of the Markov system, compute: Initialize with p (x0) ( ) ( ) ( ) ( ) ( )∫ − − = kkkkk kkkk kk xdZxpxzp Zxpxzp Zxp 1:1 1:1 :1 || || | Using p (xk|Z1:k-1) from Prediction phase and p (zk|xk) of the Markov system, compute: { } ( )∫== kkkkkkkk xdZxpxZxEx :1:1| ||ˆ ( )( ){ } ( )( ) ( )∫ −−=−−= kkk T kkkkk T kkkkkk xdZxpxxxxZxxxxEP :1:1| |ˆˆ|ˆˆ At stage k k:=k+1 ( )1|11| ˆˆ −−− = kkkk xfx 0 Prediction phase (before zk measurement) 1 Correction Step (after zk measurement)2 Filtering3 kx1−kx kz1−kz 0x 1x 2x 1z 2z kZ :11:1 −kZ ( )11, −− kk wxf ( )kk vxh , ( )00 ,wxf ( )11,vxh ( )11,wxf ( )22 ,vxh Bayesian Estimation Introduction - Summary
95
SOLO Recursive Bayesian Estimation
Bayesian Estimation Introduction - Summary
1  Prediction phase (before the z_k measurement):
p(x_k | Z_{1:k-1}) = \int p(x_k | x_{k-1}) \, p(x_{k-1} | Z_{1:k-1}) \, dx_{k-1}
2  Correction Step (after the z_k measurement):
p(x_k | Z_{1:k}) = \frac{p(z_k | x_k) \, p(x_k | Z_{1:k-1})}{\int p(z_k | x_k) \, p(x_k | Z_{1:k-1}) \, dx_k}
This is a Conceptual Solution because the integrals are often not tractable. An optimal solution is possible for some restricted cases:
• Linear Systems with Gaussian Noises (system and measurements)
• Grid-Based Filters
Table of Content
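To make the two-step recursion concrete, the following sketch (an illustrative addition, not part of the original slides) runs the prediction/correction cycle on a fixed one-dimensional grid, the simplest grid-based filter. The scalar model, the grid limits and the noise standard deviations sig_w and sig_v are assumed for illustration only.

```python
import numpy as np

def gaussian(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (np.sqrt(2 * np.pi) * sig)

def grid_bayes_filter(zs, f, h, sig_w, sig_v, grid):
    """One-dimensional grid-based recursive Bayesian filter (illustrative sketch)."""
    dx = grid[1] - grid[0]
    # transition kernel p(x_k | x_{k-1}) evaluated on the grid
    trans = gaussian(grid[:, None], f(grid)[None, :], sig_w)
    post = np.full(grid.size, 1.0 / (grid.size * dx))     # p(x_0): uniform prior
    estimates = []
    for z in zs:
        prior = trans @ post * dx                         # Chapman-Kolmogorov (prediction)
        lik = gaussian(z, h(grid), sig_v)                 # p(z_k | x_k)
        post = lik * prior
        post /= post.sum() * dx                           # divide by the evidence
        estimates.append(np.sum(grid * post) * dx)        # MMSE estimate E{x_k | Z_1:k}
    return np.array(estimates)

# usage with an assumed scalar model x_k = 0.9 x_{k-1} + w, z_k = x_k + v
grid = np.linspace(-10.0, 10.0, 401)
est = grid_bayes_filter([1.2, 0.8, 1.1], lambda x: 0.9 * x, lambda x: x, 1.0, 0.5, grid)
```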
    96 SOLO Linear Gaussian Systems ALinear Combination of Independent Gaussian random vectors is also a Gaussian random vector mmm XaXaXaS +++= 2211: ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )    +++++++−=     +−    +−    +−= ΦΦ⋅Φ==Φ ∫ ∫ +∞ ∞− +∞ ∞− mmmm mmmm YYYm YpYp mYYmS aaajaaa ajaajaaja YdYdYYpSj m mmYY mm µµµωσσσω µωσωµωσωµωσω ωωωωω          2211 222 2 2 2 2 1 2 1 2 222 22 2 2 2 2 2 11 2 1 2 1 2 11,, 2 1 exp 2 1 exp 2 1 exp 2 1 exp ,,exp 21 11 1 ( ) ( )       − −= 2 2 2 exp 2 1 ,; i ii i iiiX X Xp i σ µ σπ σµ ( ) ( ) ( )     +−==Φ ∫ +∞ ∞− iiiiXiX jXdXpXj ii µωσωωω 22 2 1 expexp: Moment- Generating Function Gaussian distribution Define Proof: ( ) ( )iX ii i X i iYiii Xp aa Y p a YpXaY iii 11 : =      =→= ( ) ( ) ( ) ( ) ( ) ( )       +−=Φ===Φ ∫∫ +∞ ∞− +∞ ∞− iiiiiiX asign asign ii i iX iiiiYiY ajaXaXda a Xp XajYdYpYj i i ii µωσωωωω 222 2 1 expexpexp: 1 1 Review of Probability
    97 SOLO Linear Gaussian Systems(continue – 1) A Linear Combination of Independent Gaussian random vectors is also a Gaussian random vector mmm XaXaXaS +++= 2211: Therefore the Linear Combination of Independent Gaussian Random Variables is a Gaussian Random Variable with mmS mmS aaa aaa m m µµµµ σσσσ +++= +++=   2211 222 2 2 2 2 1 2 1 2 Therefore the Sm probability distribution is: ( ) ( )         − −= 2 2 2 exp 2 1 ,; m m m mm S S S SSm x Sp σ µ σπ σµ Proof (continue – 1): ( ) ( ) ( )      +++++++−=Φ mmmmS aaajaaam µµµωσσσωω  2211 222 2 2 2 2 1 2 1 2 2 1 exp We found: Review of Probability q.e.d.
    98 Recursive Bayesian EstimationSOLO LinearGaussian Markov Systems (continue – 2) ( ) ( )kkkk kkkk vuxkhz wuxkfx ,,, ,,,1 111 = −= −−− kkkk kkkkkkk vxHz wuGxx += Γ++Φ= −−−−−− 111111 wk-1 and vk, white noises, zero mean, Gaussian, independent ( ) ( ) ( ){ } ( ) ( ){ } ( )kPkekeEkxEkxke x T xxx =−= &: ( ) ( ) ( ){ } ( ) ( ){ } ( ) lk T www kQlekeEkwEkwke , 0 &: δ=−=  ( ) ( ) ( ){ } ( ) ( ){ } ( ) lk T vvv kRlekeEkvEkvke , 0 &: δ=−=  ( ) ( ){ } { }0=lekeE T vw    = ≠ = lk lk lk 1 0 ,δ ( ) ( )Qwwpw ,0;N= ( ) ( )Rvvpv ,0;N= ( ) ( )       −= − wQw Q wp T nw 1 2/12/ 2 1 exp 2 1 π ( ) ( )     −= − vRv R vp T pv 1 2/12/ 2 1 exp 2 1 π A Linear Gaussian Markov Systems is defined as ( ) ( )0|0000 ,;0 Pxxxp ttx == = N ( ) ( ) ( ) ( )    −−−= = − == 00 1 0|0002/1 0|0 2/0 2 1 exp 2 1 0 xxPxx P xp t T tntx π
    99 Recursive Bayesian EstimationSOLO LinearGaussian Markov Systems (continue – 3) 111111 −−−−−− Γ++Φ= kkkkkkk wuGxx Prediction phase (before zk measurement) { } { } { }   0 1:111111:1111:11| |||:ˆ −−−−−−−−−− Γ++Φ== kkkkkkkkkkkk ZwEuGZxEZxEx or 111|111| ˆˆ −−−−−− +Φ= kkkkkkk uGxx The expectation is { }[ ] { }[ ]{ } ( )[ ] ( )[ ]{ }1:1111|111111|111 1:11|1|1| |ˆˆ |ˆˆ: −−−−−−−−−−−−− −−−− Γ+−ΦΓ+−Φ= −−= k T kkkkkkkkkkkk k T kkkkkkkk ZwxxwxxE ZxExxExEP ( ) ( ){ } ( ){ } ( ){ } { } T k Q T kkk T k T kkkkk T k T kkkkk T k P T kkkkkkk wwExxwE wxxExxxxE kk 11111 0 1|1111 1 0 11|11111|111|111 ˆ ˆˆˆ 1|1 −−−−−−−−−− −−−−−−−−−−−−−− ΓΓ+Φ−Γ+ Γ−Φ+Φ−−Φ= −−         T kk T kkkkkk QPP 1111|111| −−−−−−− ΓΓ+ΦΦ= { } ( )1|1|1:1 ,ˆ;| −−− = kkkkkkk PxxZxP N Since is a Linear Combination of Independent Gaussian Random Variables: 111111 −−−−−− Γ++Φ= kkkkkkk wuGxx
    100 SOLO For the particularvector measurement equation where the measurement noise, is Gaussian (normal), with zero mean: ( ) ( )kkkv Rvvp ,0;N= ( ) ( ) ( )xp zxp xzp x zx xz , | , | = and independent of , the conditional probability can be written, using Bayes rule as: kx ( )xzp xz || ( )           − − ==−= 1 111 1111 1 1 , nxpp nx pxnxpxnpxpx xHz xHz zxfxHzv xn xn  ( ) ( ) 2/1 ,, /,, T vxzx JJvxpzxp = The measurement noise can be related to and by the function:v zx pxp p pp p I z f z f z f z f z f J =                 ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ =      ∂ ∂ =    1 1 1 1 ( ) ( ) ( ) ( )vpxpvxpzxp vxvxzx ⋅== ,, ,, kv Since the measurement noise is independent of :xv zThe joint probability of and is given by:x Recursive Bayesian Estimation Linear Gaussian Markov Systems (continue – 4) kkkk vxHz += Correction Step (after zk measurement) - 1st Way ( ) ( ) ( ) ( )1:1 1:1 :1 | || | − − = kk kkkk kk Zzp Zxpxzp Zxp
    101 ( ) ()kkkv Rvvp ,0;N= kkkk vxHz += Consider a Gaussian vector , where , measurement, , where the Gaussian noise is independent of and . v kx ( ) [ ]1|1| ,; −−= kkkkkkx Pxxxp  N kx ( ) ( ) ( ) ( )∫∫ +∞ ∞− +∞ ∞− == kkxkkxzkkkzxkz xdxpxzpxdzxpzp |, |, is Gaussian with( )kz zp ( ) ( ) ( ) ( ) 1| 0 −=+=+= kkkkkkkkk xHvExEHvxHEzE   ( ) ( )[ ] ( )[ ]{ } [ ][ ]{ } ( )[ ] ( )[ ]{ } [ ]{ } [ ]{ } [ ]{ } { } k T kkkk T kk T k T kkkk T kkkkk T k T kkkkkkk T kkkkkkkkkk T kkkkkkkkkkkk T kkkkk RHPHvvEHxxvEvxxEH HxxxxEHvxxHvxxHE xHvxHxHvxHEzEzzEzEz +=+−−−− −−=+−+−= −+−+=−−= −−− −−−− −− 1| 0 1| 0 1| 1|1|1|1| 1|1|cov           ( ) ( ) ( ) ( )[ ] ( )[ ] ( )[ ]       −−+−−−− +− = − xHzRHPHxHz RHPH zp TT Tpz ˆˆ 2 1 exp 2 1 1 2/12/ π ( ) ( ) ( ) ( )      −−−= − − −− − −− 1| 1 1|1|2/1 1| 2/1:1| 2 1 exp 2 1 |1:1 kkkkk T kkk kk nkkZx xxPxx P Zxp kk  π ( ) ( ) ( ) ( ) ( )    −−−=−= − kkk T kkkpkkkvkkxz xHzRxHz R xHzpxzp 1 2/12/| 2 1 exp 2 1 | π Recursive Bayesian EstimationSOLO Linear Gaussian Markov Systems (continue – 5) Correction Step (after zk measurement) 1st Way (continue – 1)
    102 Recursive Bayesian EstimationSOLO LinearGaussian Markov Systems (continue – 6) kkkk vxHz += ( ) ( )Rvvpv ,0;N= ( ) ( )       −= − vRv R vp T pv 1 2/12/ 2 1 exp 2 1 π Correction Step (after zk measurement) 1st Way (continue – 2) ( ) ( ) ( ) ( )    −−−= − − −− − −− 1| 1 1|1|2/1 1| 2/1:1| 2 1 exp 2 1 |1:1 kkkkk T kkk kk nkkZx xxPxx P Zxp kk  π ( ) ( ) ( ) ( ) ( )    −−−=−= − kkk T kkkpkkkvkkxz xHzRxHz R xHzpxzp 1 2/12/| 2 1 exp 2 1 | π ( ) ( ) [ ] [ ] [ ]       −+−− + = − − −− − 1| 1 1|1|2/1 1| 2/ ˆˆ 2 1 exp 2 1 kkkk T kkkk T kkk k T kkkk p kz xHzRHPHxHz RHPH zp π ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) [ ] [ ] [ ]      −+−+−−−−−−⋅ + == − − −−− − −− − − −− − 1| 1 1|1|1| 1 1|1| 1 2/1 1| 2/12/1 1|2/1:1 1:1 :1 ˆˆ 2 1 2 1 2 1 exp 2 1 | || | kkkkk T kkkk T kkkkkkkkk T kkkkkkk T kkk k T kkkk kkknkk kkkk kk xHzRHPHxHzxxPxxxHzRxHz RHPH RPZzp Zxpxzp Zxp  π from which
    103 ( ) () ( ) ( ) ( ) [ ] ( )1| 1 1|1|1| 1 1|1| 1 − − −−− − −− − −+−−−−+−− kkkk T kkkkk T kkkkkkkkk T kkkkkkk T kkk xHzHPHRxHzxxPxxxHzRxHz  ( )[ ] ( )[ ] ( ) ( ) ( ) [ ] ( ) ( ) [ ]{ }( ) ( ) ( ) ( ) ( ) ( ) [ ]( )1| 11 1|1|1| 1 1|1| 1 1| 1| 1 1| 1 1|1| 1 1|1| 1| 1 1|1|1|1| 1 1|1| − −− −−− − −− − − − − − − −− − −− − − −−−− − −− −+−+−−−−−− −+−−=−+−− −−+−−−−−−= kkkkk T kkk T kkkkkkkk T kkkkkkkkk T k T kkk kkkk T kkkkkk T kkkkkkkk T kkkkk T kkkk kkkkk T kkkkkkkkkkkk T kkkkkkkk xxHRHPxxxxHRxHzxHzRHxx xHzHPHRRxHzxHzHPHRxHz xxPxxxxHxHzRxxHxHz    [ ] [ ] 1111 1| 1111 1| 1 −−−− − −−−− − − ++/−/=+− k T kkk T kkkkkkk LemmaMatrixInverse T kkkkkk RHHRHPHRRRHPHRRwe have Define: [ ] [ ] 1 1| 1 1| 1 1| 1 1| 111 1|| : − − − − − − − − −−− − +−=+= kk T k T kkkkkkkkkk LemmaMatrixInverse kk T kkkkk PHHPHRHPPHRHPP ( )[ ] ( )[ ]1| 1 |1| 1 |1| 1 |1| − − − − − − − −+−−+−= kkkkk T kkkkkkkk T kkkkk T kkkkkk xHzRHPxxPxHzRHPxx  ( ) ( ) ( )[ ] ( )[ ]      −+−−+−−⋅= − − − − − − − 1| 1 |1| 1 |1| 1 |1|2/1 | 2/:1| 2 1 exp 2 1 | kkkkk T kkkkkkkk T kkkkk T kkkkkk kk nkkzx xHzRHPxxPxHzRHPxx P Zxp  π Recursive Bayesian EstimationSOLO Linear Gaussian Markov Systems (continue – 7) Correction Step (after zk measurement) 1st Way (continue – 3) then ( ) ( ) ( ) ( ) ( ) [ ] ( )1| 1 1|1|1| 1 1|1| 1 − − −−− − −− − −+−−−−+−− kkkkk T kkkk T kkkkkkkkk T kkkkkkk T kkk xHzRHPHxHzxxPxxxHzRxHz  ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( )1| 1 |1|1| 1 || 1 1| 1| 1 | 1 |1|1| 1 | 1 || 1 1| − − −− −− − − −− −− −−− − −−+−−− −−−−−= kkkkk T kkkkkkkkkkkk T kkkk kkkkk T kkkkk T kkkkkkkk T kkkkkkkkk T kkkk xxPxxxxPPHRxHz xHzRHPPxxxHzRHPPPHRxHz  
    104 then ( )kkzx x Zxp k :1| |max () { }kk kkkkk T kkkkkkkk ZxE xHzRHPxxx :1 1| 1 |1| * | | ˆˆ:ˆ = −+== − − − Recursive Bayesian EstimationSOLO Linear Gaussian Markov Systems (continue – 8) Correction Step (after zk measurement) 1st Way (continue – 4) ( ) ( ) ( )[ ] ( )[ ]      −+−−+−−⋅= − − − − − − − 1| 1 1| 1 |1| 1 1|2/1 | 2/:1| 2 1 exp 2 1 | kkkkk T kkkkkk T kkkkk T kkkk kk nkkzx xHzRHxxPxHzRHxx P Zxp  π where:[ ] ( )( ){ }k T kkkkkkkk T kkkkk ZxxxxEHRHPP :1|| 111 1|| ˆˆ: −−=+= −−− −
105
SOLO Recursive Bayesian Estimation
Linear Gaussian Markov Systems (continue – 9)
Summary 1st Way – Kalman Filter
Initial Conditions:
\hat{x}_{0|0} = E\{x_0\},   P_{0|0} := E\{(x_0 - \hat{x}_{0|0})(x_0 - \hat{x}_{0|0})^T\}
Prediction phase (before the z_k measurement):
\hat{x}_{k|k-1} = \Phi_{k-1} \hat{x}_{k-1|k-1} + G_{k-1} u_{k-1}
P_{k|k-1} = \Phi_{k-1} P_{k-1|k-1} \Phi_{k-1}^T + \Gamma_{k-1} Q_{k-1} \Gamma_{k-1}^T
Correction Step (after the z_k measurement), with z_k = H_k x_k + v_k:
\hat{z}_{k|k-1} = E\{z_k | Z_{1:k-1}\} = E\{H_k x_k + v_k | Z_{1:k-1}\} = H_k \hat{x}_{k|k-1}
P_{k|k} := [P_{k|k-1}^{-1} + H_k^T R_k^{-1} H_k]^{-1}
K_k := P_{k|k} H_k^T R_k^{-1}
\hat{x}_{k|k} = E\{x_k | Z_{1:k}\} = \hat{x}_{k|k-1} + K_k (z_k - H_k \hat{x}_{k|k-1}) = \hat{x}_{k|k-1} + K_k (z_k - \hat{z}_{k|k-1}) = \hat{x}_{k|k-1} + K_k i_k
    106 Recursive Bayesian EstimationSOLO LinearGaussian Markov Systems (continue – 10) kkkk vxHz += ( ) ( )Rvvpv ,0;N= ( ) ( )       −= − vRv R vp T pv 1 2/12/ 2 1 exp 2 1 π ( ) ( ) [ ] [ ] [ ]       −+−− + = − − −− − 1| 1 1|1|2/1 1| 2/ ˆˆ 2 1 exp 2 1 kkkkk T kkkk T kkkk k T kkkk p kz xHzRHPHxHz RHPH zp π from which { } 1|1:11| ˆ|ˆ −−− == kkkkkkk xHZzEz ( ) ( ){ } kk T kkkkk T kkkkkk zz kk SRHPHZzzzzEP =+=−−= −−−−− :ˆˆ 1|1:11|1|1| [ ][ ]{ } [ ] ( )[ ]{ } T kkkk T kkkkkkkk k T kkkkkk xz kk HPZvxxHxxE ZzzxxEP 1|1:11|1| 1:11|1|1| ˆˆ ˆˆ −−−− −−−− =+−−= −−= We also have Correction Step (after zk measurement) 2nd Way Define the innovation: 1|1| ˆˆ: −− −=−= kkkkkk xHzzzi
    107 Recursive Bayesian EstimationSOLO Jointand Conditional Gaussian Random Variables       = k k k z x yDefine: assumed that they are Gaussian distributed Prediction phase (before zk measurement) 2nd way (continue -1) { }         =             = − − − − − 1| 1| 1:1 1:1 1:1 ˆ ˆ | | | kk kk kk kk kk z x Zz Zx EZyE         =                 − −         − − = −− −− − − − − − − zz kk zx kk xz kk xx kk k T kkk kkk kkk kkkyy kk PP PP Z zz xx zz xx EP 1|1| 1|1| 1:1 1| 1| 1| 1| 1| ˆ ˆ ˆ ˆ where: [ ][ ]{ } 1|1:11|1|1| ˆˆ −−−−− =−−= kkk T kkkkkk xx kk PZxxxxEP [ ][ ]{ } kk T kkkkk T kkkkkk zz kk SRHPHZzzzzEP =+=−−= −−−−− :ˆˆ 1|1:11|1|1| [ ][ ]{ } T kkkk T kkkkkk xz kk HPZzzxxEP 1|1:11|1|1| ˆˆ −−−−− =−−= Linear Gaussian Markov Systems (continue – 11)
    108 ( ) () ( )    −−−= − − −− − − 1| 1 1|1|2/1 1| 1:1, ˆˆ 2 1 exp 2 1 |, kkk yy kk T kkk yy kk kkkzx yyPyy P Zzxp π Recursive Bayesian EstimationSOLO Joint and Conditional Gaussian Random Variables The conditional probability distribution function (pdf) of xk given zk is given by: Prediction phase (before zk measurement) 2nd Way (continue – 2) ( ) ( ) ( )      −−−= − − −− − − 1| 1 1|1|2/1 1| 1:1 ˆˆ 2 1 exp 2 1 | kkk zz kk T kkk zz kk kkz zzPzz P Zzp π ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )    −−−     −−− === − − −− − − −− − − − − − 1| 1 1|1| 1| 1 1|1| 2/1 1| 2/1 1| 1:1 1:1, |1:1| ˆˆ 2 1 exp ˆˆ 2 1 exp 2 2 | |, |,| kkk zz kk T kkk kkk yy kk T kkk yy kk zz kk kkz kkkzx kkzxkkkzx zzPzz yyPyy P P Zzp Zzxp zxpZzxp π π ( ) ( ) ( ) ( )    −−+−−−= − − −−− − −− − − 1| 1 1|1|1| 1 1|1|2/1 1| 2/1 1| ˆˆ 2 1 ˆˆ 2 1 exp 2 2 kkk zz kk T kkkkkk yy kk T kkk yy kk zz kk zzPzzyyPyy P P π π Linear Gaussian Markov Systems (continue – 12) We assumed that is Gaussian distributed:      = k k k z x y
    109 Recursive Bayesian EstimationSOLO Jointand Conditional Gaussian Random Variables Prediction phase (before zk measurement) 2nd Way (continue – 3) ( ) ( ) ( ) ( ) ( )    −−+−−−= − − −−− − −− − − 1| 1 1|1|1| 1 1|1|2/1 1| 2/1 1| | ˆˆ 2 1 ˆˆ 2 1 exp 2 2 | kkk zz kk T kkkkkk zz kk T kkk yy kk zz kk kkzx zzPzzyyPyy P P zxp π π Define: 1|1| ˆ:&ˆ: −− −=−= kkkkkkkk zzxx ςξ ( ) ( ) ( ) ( ) k zz kk T kk zz kk T kk zx kk T kk xz kk T kk xx kk T k kkkzz T k k k zz kk zx kk xz kk xx kk T k k k zz kk T k k k zz kk zx kk xz kk xx kk T k k kkk zz kk T kkkkkk yy kk T kkk PTTTT P TT TT P PP PP zzPzzyyPyyq ςςςςξςςξξξ ςς ς ξ ς ξ ςς ς ξ ς ξ 1 1|1|1|1|1| 1 1| 1|1| 1|1| 1 1| 1 1|1| 1|1| 1| 1 1|1|1| 1 1|1| ˆˆˆˆ: − −−−−− − − −− −− − − − −− −− − − −−− − −− −+++= −                    = −                    = −−−−−= Linear Gaussian Markov Systems (continue – 13)
    110 Recursive Bayesian EstimationSOLO Jointand Conditional Gaussian Random Variables Prediction phase (before zk measurement) 2nd way (continue – 4) Using Inverse Matrix Lemma: ( ) ( ) ( ) ( )         −−− −−− =      −−−−− −−−−−− 11111 111111 nxmnxnmxnmxmmxnmxmnxmnxnmxnmxm mxmnxmmxnmxmnxmnxnmxnmxmnxmnxn mxmmxn nxmnxn BADCDCBADC CBDCBADCBA CD BA         =         −− −− − −− −− zz kk zx kk xz kk xx kk zz kk zx kk xz kk xx kk TT TT PP PP 1|1| 1|1| 1 1|1| 1|1| in 1 1|1|1| 1 1| 1| 1 1|1|1| 1 1| 1| 1 1|1|1| 1 1| − −−− − − − − −−− − − − − −−− − − −= −= −= zz kk xz kk xz kk xx kk xz kk xx kk zx kk zz kk zz kk kkzxkkzzkkxzkkxxkkxx PPTT TTTTP PPPPT k zz kk T kk zz kk T kk zx kk T kk xz kk T kk xx kk T k PTTTTq ςςςςξςςξξξ 1 1|1|1|1|1| − −−−−− −+++= ( ) k zz kk T kk zz kk T k k xz kk xx kk zx kk T kk xz kk xx kk zx kk T kk xz kk T kk xx kk xx kk zx kk T k T k PT TTTTTTTTTT ςςςς ςςςςςξξςξ 1 1|1| 1| 1 1|1|1| 1 1|1|1|1| 1 1|1| − −− − − −−− − −−−− − −− −+ −+++= ( ) ( ) ( ) ( ) ( )k xz kk xx kkk xx kk T k xz kk xx kkkk zz kk xz kk xx kkkkzx zz kk T k k xz kk xx kk xx kk T k xz kk xx kkkk xx kk T k xz kk xx kkk TT TTTTTPTTTT TTTTTTTT zx kk Txz kk ςξςξςς ςςξξςξ 1| 1 1|1|1| 1 1| 0 1|1| 1 1|1|1| 1| 1 1|1|1| 1 1|1|1| 1 1| 1|1| − − −−− − −−− − −−− − − −−− − −−− − − = ++=−−+ +++= −−    Linear Gaussian Markov Systems (continue – 14)
    111 Recursive Bayesian EstimationSOLO Jointand Conditional Gaussian Random Variables Prediction phase (before zk measurement) 2nd way (continue – 5)         =         −− −− − −− −− zz kk zx kk xz kk xx kk zz kk zx kk xz kk xx kk TT TT PP PP 1|1| 1|1| 1 1|1| 1|1| 1 1|1|1| 1 1| 1| 1 1|1|1| 1 1| 1| 1 1|1|1| 1 1| − −−− − − − − −−− − − − − −−− − − −= −= −= zz kk xz kk xz kk xx kk xz kk xx kk zx kk zz kk zz kk kkzxkkzzkkxzkkxxkkxx PPTT TTTTP PPPPT ( ) ( )k xz kk xx kkk xx kk T k xz kk xx kkk TTTTTq ςξςξ 1| 1 1|1|1| 1 1| − − −−− − − ++= 1|1| ˆ:&ˆ: −− −=−= kkkkkkkk zzxx ςξ ( ) ( )[ ] ( )[ ]       −−−−−−−=       −= −−−−− − − − − 1|1|1|1|1|2/1 1| 2/1 1| 2/1 1| 2/1 1| | ˆˆˆˆ 2 1 exp 2 2 2 1 exp 2 2 | kkkkkkk xx kk T kkkkkkk yy kk zz kk yy kk zz kk kkzx zzKxxTzzKxx P P q P P zxp π π π π ( )1| 1 1|1|1| 1 1|1| ˆˆ − − −−− − −− −−−=+ kkk K zz kk xz kkkkkk xx kk xz kkk zzPPxxTT k   ςξ Linear Gaussian Markov Systems (continue – 15)
    112 Recursive Bayesian EstimationSOLO Jointand Conditional Gaussian Random Variables Prediction phase (before zk measurement) 2nd Way (continue – 6) ( ) ( )[ ] ( )[ ]      −−−−−−−= − − −−−−− − −−− 1| 1 1|1|1|1|1| 1 1|1|1|| ˆˆˆˆ 2 1 exp| kkk xx kk xz kkkkk xx kk T kkk xx kk xz kkkkkkkzx zzPPxxTzzPPxxczxp From this we can see that { } ( )1| 1 1|1|1|| ˆˆˆ| − − −−− −+== kkk K zz kk xz kkkkkkkk zzPPxxzxE k    ( )( ){ } T k zz kkk xx kk zx kk zz kk xz kk xx kk xx kkk T kkkkkk xx kk KPKP PPPPTZxxxxEP 1|1| 1| 1 1|1|1| 1 1|:1||| ˆˆ −− − − −−− − − −= −==−−= [ ][ ]{ } 1|1:11|1|1| ˆˆ −−−−− =−−= kkk T kkkkkk xx kk PZxxxxEP [ ][ ]{ } k T kkkkkk T kkkkkk zz kk SHPHRZzzzzEP =+=−−= −−−−− :ˆˆ 1|1:11|1|1| [ ][ ]{ } T kkkk T kkkkkk xz kk HPZzzxxEP 1|1:11|1|1| ˆˆ −−−−− =−−= Linear Gaussian Markov Systems (continue – 16)
    113 Recursive Bayesian EstimationSOLO Jointand Conditional Gaussian Random Variables Prediction phase (before zk measurement) 2nd Way (continue – 7) From this we can see that ( ) [ ] 111 1|1| 1 1|1|1|| −−− −− − −−− +=+−= kk T kkkkkk T kkkkk T kkkkkkk HRHPPHHPHRHPPP ( ) 1 1| 1 1|1| 1 1|1| − − − −− − −− =+== k T kkk T kkkkk T kkk zz kk xz kkk SHPHPHRHPPPK Linear Gaussian Markov Systems (continue – 17) kk T kkkkk KSKPP −= −1|| or [ ][ ]{ } 1|1:11|1|1| ˆˆ −−−−− =−−= kkk T kkkkkk xx kk PZxxxxEP [ ][ ]{ } k T kkkkkk T kkkkkk zz kk SHPHRZzzzzEP =+=−−= −−−−− :ˆˆ 1|1:11|1|1| [ ][ ]{ } T kkkk T kkkkkk xz kk HPZzzxxEP 1|1:11|1|1| ˆˆ −−−−− =−−=
    114 We found thatthe optimal Kk is [ ] 1 1|1| − −− += T kkkkk T kkkk HPHRHPK [ ] [ ] 1111 |1 11 & 1 |1 1 1| 1 −−−− + −−− + +−=+ − − − k T kkk T kkkkkk LemmaMatrixInverse existPR T kkkkk RHHRHPHRRHPHR kkk [ ] 1111 1| 1 1| 1 1| −−−− − − − − − +−= k T kkk T kkkkk T kkkk T kkkk RHHRHPHRHPRHPK [ ]{ } [ ] 1111 |1 111 |1|1 −−−− + −−− ++ +−+= k T kkk T kkkkk T kkk T kkkkk RHHRHPHRHHRHPP [ ] 1 | 1111 |1 −−−−− + =+= RHPRHHRHPK T kkk T kkk T kkkk If Rk -1 and Pk|k-1 -1 exist: Recursive Bayesian EstimationSOLO Linear Gaussian Markov Systems (continue – 18) Relation Between 1st and 2nd ways 2nd Way 1st Way = 2nd Way
    115 1|1| ˆˆ: −−−=−= kkkkkkkk zzxHzi Recursive Bayesian EstimationSOLO Linear Gaussian Markov Systems (continue – 19) Innovation The innovation is the quantity: We found that: { } ( ){ } { } 0ˆ||ˆ| 1|1:11:11|1:1 =−=−= −−−−− kkkkkkkkkk zZzEZzzEZiE [ ][ ]{ } { } k T kkkkkk T kkk T kkkkkk SHPHRZiiEZzzzzE =+==−− −−−−− :ˆˆ 1|1:11:11|1| Using the smoothing property of the expectation: { }{ } ( ) ( ) ( ) ( ) ( ) ( ) ( ) { }xEdxxpxdxdyyxpx dxdyypyxpxdyypdxyxpxyxEE x X x y YX x y yxp YYX y Y x YX YX ==         =           =      = ∫∫ ∫ ∫ ∫∫ ∫ ∞+ −∞= ∞+ −∞= ∞+ −∞= ∞+ −∞= ∞+ −∞= ∞+ −∞= ∞+ −∞= , || , , || ,    { } { }{ }1:1 −= k T jk T jk ZiiEEiiEwe have: Assuming, without loss of generality, that k-1 ≥ j, and innovation I (j) is Independent on Z1:k-1, and it can be taken outside the inner expectation: { } { }{ } { } 0 0 1:11:1 =         == −− T jkkk T jk T jk iZiEEZiiEEiiE 
    116 1|1| ˆˆ: −−−=−= kkkkkkkk zzxHzi Recursive Bayesian EstimationSOLO Linear Gaussian Markov Systems (continue – 20) Innovation (continue – 1) The innovation is the quantity: We found that: { } ( ){ } { } 0ˆ||ˆ| 1|1:11:11|1:1 =−=−= −−−−− kkkkkkkkkk zZzEZzzEZiE { } k T kkkkkk T kk SHPHRZiiE =+= −− :1|1:1 { } 0= T jk iiE { } jik T jk SiiE δ= The uncorrelated ness property of the innovation implies that since they are Gaussian, the innovation are independent of each other and thus the innovation sequence is Strictly White. Thus the innovation sequence is zero mean and white for the Kalman (Optimal) Filter. Without the Gaussian assumption, the innovation sequence is Wide Sense White. Table of Content
117
SOLO Recursive Bayesian Estimation
Closed-Form Solutions of Estimation
Closed-form solutions for the Optimal Recursive Bayesian Estimation can be derived only for special cases. The most important case:
• Dynamic and measurement models are linear
x_k = f(k-1, x_{k-1}, u_{k-1}, w_{k-1})  →  x_k = \Phi_{k-1} x_{k-1} + G_{k-1} u_{k-1} + \Gamma_{k-1} w_{k-1}
z_k = h(k, x_k, u_k, v_k)  →  z_k = H_k x_k + v_k
• Random noises are Gaussian
p_w(w) = \mathcal{N}(w; 0, Q) = \frac{1}{(2\pi)^{n/2} |Q|^{1/2}} \exp\left(-\frac{1}{2} w^T Q^{-1} w\right)
p_v(v) = \mathcal{N}(v; 0, R) = \frac{1}{(2\pi)^{p/2} |R|^{1/2}} \exp\left(-\frac{1}{2} v^T R^{-1} v\right)
• Solution: KALMAN FILTER
• In other non-linear/non-Gaussian cases: USE APPROXIMATIONS
118
SOLO Recursive Bayesian Estimation
Closed-Form Solutions of Estimation (continue – 1)
• Dynamic and measurement models are linear
x_k = \Phi_{k-1} x_{k-1} + G_{k-1} u_{k-1} + \Gamma_{k-1} w_{k-1}
z_k = H_k x_k + v_k
• The Optimal Estimator is the Kalman Filter, developed by R. E. Kalman in 1960 (Rudolf E. Kalman, 1920 – )
e_x(k) := x(k) - E\{x(k)\},   E\{e_x(k) e_x^T(k)\} = P_x(k)
e_w(k) := w(k) - E\{w(k)\} = w(k),   E\{e_w(k) e_w^T(l)\} = Q(k) \delta_{k,l}
e_v(k) := v(k) - E\{v(k)\} = v(k),   E\{e_v(k) e_v^T(l)\} = R(k) \delta_{k,l}
E\{e_w(k) e_v^T(l)\} = 0,   \delta_{k,l} = 1 if k = l, 0 if k ≠ l
• The K.F. is an Optimal Estimator, in the Minimum Mean Square Error (MMSE) sense, if:
- the state and measurement models are linear
- the random elements are Gaussian
• Under those conditions, the covariance matrix is:
- independent of the state (it can be calculated off-line)
- equal to the Cramér – Rao lower bound
Table of Content
119
SOLO Kalman Filter State Estimation in a Linear System (one cycle)
0  Initialization:  \hat{x}_0 = E\{x_0\},   P_{0|0} = E\{(x_0 - \hat{x}_0)(x_0 - \hat{x}_0)^T\}
1  State vector prediction:  \hat{x}_{k|k-1} = \Phi_{k-1} \hat{x}_{k-1|k-1} + G_{k-1} u_{k-1}
2  Covariance matrix extrapolation:  P_{k|k-1} = \Phi_{k-1} P_{k-1|k-1} \Phi_{k-1}^T + Q_{k-1}
3  Innovation covariance:  S_k = H_k P_{k|k-1} H_k^T + R_k
4  Gain matrix computation:  K_k = P_{k|k-1} H_k^T S_k^{-1}
5  Measurement & innovation:  i_k = z_k - \hat{z}_{k|k-1} = z_k - H_k \hat{x}_{k|k-1}
6  Filtering:  \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k i_k
7  Covariance matrix updating:
P_{k|k} = P_{k|k-1} - P_{k|k-1} H_k^T S_k^{-1} H_k P_{k|k-1} = P_{k|k-1} - K_k S_k K_k^T = (I - K_k H_k) P_{k|k-1} = (I - K_k H_k) P_{k|k-1} (I - K_k H_k)^T + K_k R_k K_k^T
k := k + 1 and return to 1.
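The seven steps above map directly to code. The following sketch is an illustrative addition: the matrices Phi, G, H, Q, R and the initial moments are assumed inputs, and the Joseph form of step 7 is used for numerical symmetry.

```python
import numpy as np

def kalman_cycle(x_est, P_est, z, u, Phi, G, H, Q, R):
    """One prediction/correction cycle of the linear Kalman filter."""
    # 1-2: prediction of the state and its covariance
    x_pred = Phi @ x_est + G @ u
    P_pred = Phi @ P_est @ Phi.T + Q
    # 3-5: innovation, innovation covariance and gain
    i = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # 6-7: filtering (state update and Joseph-form covariance update)
    x_upd = x_pred + K @ i
    I_KH = np.eye(P_pred.shape[0]) - K @ H
    P_upd = I_KH @ P_pred @ I_KH.T + K @ R @ K.T
    return x_upd, P_upd
```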
120
SOLO Kalman Filter State Estimation in a Linear System (one cycle)
(Figure: tracking-system block diagram – Input Data, Sensor Data Processing and Measurement Formation, Observation-to-Track Association, Gating Computations, Track Maintenance (Initialization, Confirmation and Deletion), Filtering and Prediction. Portrait: Rudolf E. Kalman, 1920 – .)
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
121
SOLO Recursive Bayesian Estimation
General Bayesian Nonlinear Filters
• Additive Gaussian Noise
  - Extended Kalman Filter (EKF)
  - Gauss Hermite Kalman Filter (GHKF)
  - Unscented Kalman Filter (UKF)
  - Monte Carlo Kalman Filter (MCKF)
• Non-Additive Non-Gaussian Noise
  - Non-Resampling Particle Filters: Gaussian Particle Filter (GPF), Gauss Hermite Particle Filter (GHPF), Unscented Particle Filter (UPF), Monte Carlo Particle Filter (MCPF)
  - Resampling Particle Filters: Sequential Importance Sampling Particle Filter (SIS PF), Bootstrap Particle Filter (BPF)
Table of Content
122
SOLO Extended Kalman Filter
In the Extended Kalman Filter (EKF), the state transition and observation models need not be linear functions of the state but may instead be (differentiable) functions:
x(k+1) = f[k, x(k), u(k)] + w(k)            State vector dynamics
z(k+1) = h[k+1, x(k+1), u(k+1)] + v(k+1)    Measurements
e_x(k) := x(k) - E\{x(k)\},   E\{e_x(k) e_x^T(k)\} = P_x(k)
e_w(k) := w(k) - E\{w(k)\} = w(k),   E\{e_w(k) e_w^T(l)\} = Q(k) \delta_{k,l}
e_v(k) := v(k) - E\{v(k)\} = v(k),   E\{e_v(k) e_v^T(l)\} = R(k) \delta_{k,l},   E\{e_w(k) e_v^T(l)\} = 0
The function f can be used to compute the predicted state from the previous estimate and, similarly, the function h can be used to compute the predicted measurement from the predicted state. However, f and h cannot be applied to the covariance directly. Instead, a matrix of partial derivatives (the Jacobian) is computed from the Taylor expansions:
e_x(k+1) = f[k, x(k), u(k)] - f[k, E\{x(k)\}, u(k)] + w(k) = \left.\frac{\partial f}{\partial x}\right|_{E\{x(k)\}} e_x(k) + \frac{1}{2} e_x^T(k) \left.\frac{\partial^2 f}{\partial x^2}\right|_{E\{x(k)\}} e_x(k) + \dots + e_w(k)
e_z(k+1) = h[k+1, x(k+1), u(k+1)] - h[k+1, E\{x(k+1)\}, u(k+1)] + v(k+1) = \left.\frac{\partial h}{\partial x}\right|_{E\{x(k+1)\}} e_x(k+1) + \frac{1}{2} e_x^T(k+1) \left.\frac{\partial^2 h}{\partial x^2}\right|_{E\{x(k+1)\}} e_x(k+1) + \dots + e_v(k+1)
where the first derivatives are the Jacobians and the second derivatives the Hessians; the EKF retains only the first-order (Jacobian) terms.
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986; Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
123
SOLO Extended Kalman Filter State Estimation (one cycle)
0  Initialization (k = 0):  \hat{x}_0 = E\{x_0\},   P_{0|0} = E\{(x_0 - \hat{x}_0)(x_0 - \hat{x}_0)^T\}
1  State vector prediction:  \hat{x}_{k|k-1} = f(k-1, \hat{x}_{k-1|k-1}, u_{k-1})
2  Jacobians computation:  \Phi_{k-1} = \left.\frac{\partial f}{\partial x}\right|_{\hat{x}_{k-1|k-1}},   H_k = \left.\frac{\partial h}{\partial x}\right|_{\hat{x}_{k|k-1}}
3  Covariance matrix extrapolation:  P_{k|k-1} = \Phi_{k-1} P_{k-1|k-1} \Phi_{k-1}^T + Q_{k-1}
4  Innovation covariance:  S_k = H_k P_{k|k-1} H_k^T + R_k
5  Gain matrix computation:  K_k = P_{k|k-1} H_k^T S_k^{-1}
6  Measurement & innovation:  \hat{z}_{k|k-1} = h(k, \hat{x}_{k|k-1}),   i_k = z_k - \hat{z}_{k|k-1}
7  Filtering:  \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k i_k
8  Covariance matrix updating:
P_{k|k} = P_{k|k-1} - P_{k|k-1} H_k^T S_k^{-1} H_k P_{k|k-1} = P_{k|k-1} - K_k S_k K_k^T = (I - K_k H_k) P_{k|k-1} = (I - K_k H_k) P_{k|k-1} (I - K_k H_k)^T + K_k R_k K_k^T
k := k + 1 and return to 1.
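A code sketch of the cycle above (illustrative only; the user supplies f, h and their Jacobians f_jac, h_jac, which are assumed names, together with Q and R):

```python
import numpy as np

def ekf_cycle(x_est, P_est, z, u, f, f_jac, h, h_jac, Q, R):
    """One cycle of the Extended Kalman Filter (first-order linearization)."""
    # 1-3: nonlinear state prediction, Jacobian, covariance extrapolation
    x_pred = f(x_est, u)
    Phi = f_jac(x_est, u)
    P_pred = Phi @ P_est @ Phi.T + Q
    # 4-6: measurement Jacobian, innovation covariance, gain, innovation through the nonlinear h
    H = h_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    i = z - h(x_pred)
    # 7-8: state and covariance update
    x_upd = x_pred + K @ i
    P_upd = P_pred - K @ S @ K.T
    return x_upd, P_upd
```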
124
SOLO Extended Kalman Filter State Estimation (one cycle)
(Figure: tracking-system block diagram – Input Data, Sensor Data Processing and Measurement Formation, Observation-to-Track Association, Gating Computations, Track Maintenance (Initialization, Confirmation and Deletion), Filtering and Prediction. Portrait: Rudolf E. Kalman, 1920 – .)
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
125
SOLO Extended Kalman Filter
Criticism of the Extended Kalman Filter
Unlike its linear counterpart, the Extended Kalman Filter is not an optimal estimator. In addition, if the initial estimate of the state is wrong, or if the process is modeled incorrectly, the filter may quickly diverge, owing to its linearization.
Another problem with the Extended Kalman Filter is that the estimated covariance matrix tends to underestimate the true covariance matrix and therefore risks becoming inconsistent in the statistical sense unless "stabilizing noise" is added.
Having stated this, the Extended Kalman Filter can give reasonable performance, and is arguably the de facto standard in navigation systems and GPS.
Table of Content
    126 SOLO Additive Gaussian NonlinearFilter Consider the case of a Markovian process where the noise is additive and Gaussian: ( ) ( ) kkk kkk vxhz wxfx += += −− 11 ( ) ( )kkkw Qwwp ,0;N= ( ) ( )kkkv Rvvp ,0;N= ( ) ( )       −= kk T k k nkw wQw Q wp 2 1 exp 2 1 2/12/ π ( ) ( )     −= − kk T k k pkv vRv R vp 1 2/12/ 2 1 exp 2 1 π where wk and vk are independent white noises Gaussian, with zero mean and covariances Qk and Rk, respectively: Recursive Bayesian Estimation Therefore, since f (xk-1) is a deterministic function, by adding the Gaussian noise wk-1, we obtain xk also a Gaussian random variable. ( ) ( )( )111:11 ,;,| −−−− = kkkkkk QxfxZxxp N
    127 SOLO Additive Gaussian NonlinearFilter (continue – 1) ( ) ( ) kkk kkk vxhz wxfx += += −− 11 Recursive Bayesian Estimation ( ) ( )( )111:11 ,;,| −−−− = kkkkkk QxfxZxxp N ( ) ( ) ( )1:111:111:11 |,||, −−−−−− = kkkkk Bayes kkk ZxpZxxpZxxp ( ) ( ) ( ) ( )∫∫ −−−−−−−−− == 11:111:1111:111:1 |,||,| kkkkkkkkkkkk xdZxpZxxpxdZxxpZxp Using: we obtain: ( ) ( )( ) ( )∫ −−−−−− = 11:11111:1 |,;| kkkkkkkk xdZxpQxfxZxp N { } ( ) ( )( ) ( )[ ]∫ ∫∫ −−−−−−−− === kkkkkkkkkkkkkkkk xdxdZxpQxfxxxdZxpxZxEx 11:11111:11:11| |,;||:ˆ N ( )( )[ ] ( ) ( ) ( )∫∫ ∫ −−−−−−−−− == 11:11111:1111 ||,; kkkkkkkkkkkk xdZxpxfxdZxpxdQxfxx N Assume that is Gaussian with mean and covariance , then1−kx 1|1 ˆ −− kkx 1|1 −− kkP ( ) ( )1|11|111:11 ,ˆ;| −−−−−−− = kkkkkkk PxxZxp N { } ( ) ( )∫ −−−−−−−−− == 11|11|1111:11| ,ˆ;|ˆ k xx kkkkkkkkkk xdPxxxfZxEx N
    128 SOLO Additive Gaussian NonlinearFilter (continue – 2) ( ) ( ) kkk kkk vxhz wxfx += += −− 11 Recursive Bayesian Estimation ( ) ( )xx kkkkkkk PxxZxp 1|11|111:11 ,ˆ;| −−−−−−− = N { } ( ) ( )∫ −−−−−−−−− == 11|11|1111:11| ,ˆ;|ˆ k xx kkkkkkkkkk xdPxxxfZxEx N ( )( ){ } ( )[ ] ( )[ ]{ } ( )[ ] ( )[ ] ( )∫ −−−−−−−−−−−− −−−−−−−−−−− −+−+= −+−+=−−= 11|11|111|111|11 1:11|111|111:11|1|1| ,ˆ;ˆˆ |ˆˆ|ˆˆ k xx kkkkk T kkkkkkkk k T kkkkkkkkk T kkkkkk xx kk xdPxxxwxfxwxf ZxwxfxwxfEZxxxxEP N ( ) ( ) ( ) T kkkkkk xx kkkkkk T k xx kk xxQxdPxxxfxfP 1|1|111|11|11111| ˆˆ.ˆ, −−−−−−−−−−−− −+= ∫ N Let compute now { } ( )∫ −−−− == kkkkkkkkk xdZxpzZxzEz 1:11:111| |,|ˆ { } ( ) ( )[ ] ( )∫∫ −−−−−−− +=== k xx kkkkkkkk xx kkkkkkkkkkk xdPxxvxhxdPxxzZxzEz 1|1|1|1|1:111| ,ˆ;,ˆ;,|ˆ NN Since xk and vk are independent { } ( ) ( )∫ −−−−− == k xx kkkkkkkkkkk xdPxxxhZxzEz 1|1|1:111| ,ˆ;,|ˆ N Using the Gaussian approximation of p (xk| Z1:k-1) given by ( ) ( )xx kkkkkkk PxxZxp 1|1|1:1 ,ˆ;| −−− ≈ N
    129 SOLO Additive Gaussian NonlinearFilter (continue – 3) ( ) ( ) kkk kkk vxhz wxfx += += −− 11 Recursive Bayesian Estimation ( ) ( )xx kkkkkkk PxxZxp 1|1|1:1 ,ˆ;| −−− ≈ N Since xk and vk are independent { } ( ) ( )∫ −−−−− == k xx kkkkkkkkkkk xdPxxxhZxzEz 1|1|1:111| ,ˆ;,|ˆ N ( )( ){ } ( )[ ] ( )[ ]{ } ( )[ ] ( )[ ] ( )∫ −−−− −−−−−−− −+−+= −+−+=−−= k xx kkkkk T kkkkkkkk k T kkkkkkkkk T kkkkkk zz kk xdPxxzvxhzvxh ZzvxhzvxhEZzzzzEP 1|1|1|1| 1:11|1|1:11|1|1| .ˆ,ˆˆ |ˆˆ|ˆˆ N ( ) ( ) ( ) T kkkkkk xx kkkkkk T k zz kk zzRxdPxxxhxhP 1|1|1|1|1| ˆˆ,ˆ; −−−−− −+= ∫ N In the same way ( )( ){ } ( ) ( )[ ]{ } ( ) ( )[ ] ( )∫ −−−− −−−−−−− −+−= −+−=−−= k xx kkkkk T kkkkkkk k T kkkkkkkk T kkkkkk zx kk xdPxxzvxhxx ZzvxhxxEZzzxxEP 1|1|1|1| 1:11|1|1:11|1|1| .ˆ,ˆˆ |ˆˆ|ˆˆ N ( ) ( ) T kkkkk xx kkkkkk T k zx kk zxxdPxxxhxP 1|1|1|1|1| ˆˆ,ˆ; −−−−− −= ∫ N
    130 SOLO Additive Gaussian NonlinearFilter (continue – 4) ( ) ( ) kkk kkk vxhz wxfx += += −− 11 Recursive Bayesian Estimation { } ( ) ( )∫ −−−−− == k xx kkkkkkkkkkk xdPxxxhZxzEz 1|1|1:111| ,ˆ;,|ˆ N ( ) ( ) ( ) T kkkkkk xx kkkkkk T k zz kk zzRxdPxxxhxhP 1|1|1|1|1| ˆˆ,ˆ; −−−−− −+= ∫ N ( ) ( ) T kkkkk xx kkkkkk T k zx kk zxxdPxxxhxP 1|1|1|1|1| ˆˆ,ˆ; −−−−− −= ∫ N { } ( ) ( )∫ −−−−−−−−− == 11|11|1111:11| .ˆ;|ˆ k xx kkkkkkkkkk xdPxxxfZxEx N ( ) ( ) ( ) T kkkkkk xx kkkkkk T k xx kk xxQxdPxxxfxfP 1|1|111|11|11111| ˆˆ,ˆ; −−−−−−−−−−−− −+= ∫ N Summary Initialization0 { } ( ) ( ){ }T xxxxEP xEx 00000|0 00 ˆˆ ˆ −−= = For { }∞∈ ,,1 k State Prediction and its Covariance1 Measure Prediction and Covariances2 kx1−kx kz1−kz 0x 1x 2x 1z 2z kZ :11:1 −kZ ( ) 11 −− + kk wxf ( ) kk vxh + ( ) 00 wxf + ( ) 11 vxh + ( ) 11 wxf + ( ) 22 vxh +
    131 SOLO Additive Gaussian NonlinearFilter (continue – 5) ( ) ( ) kkk kkk vxhz wxfx += += −− 11 Recursive Bayesian Estimation Summary (continue – 1) We showed that the Kalman Filter, that uses this computations is given by: { } ( )1| 1 1|1|1|| ˆˆ|ˆ − − −−− −+== kkk K zz kk zx kkkkkkkk zzPPxzxEx k    ( )( ){ } T k zz kkk xx kk xz kk zz kk zx kk xx kkk T kkkkkk xx kk KPKP PPPPZxxxxEP 1 1|1| 1| 1 1|1|1|:1||| ˆˆ − −− − − −−− −= −=−−= Kalman Gain Computations3 1 1|1| − −−= zz kk xz kkk PPK k := k+1 & return to 1 Update State and its Covariance4
    132 SOLO Additive Gaussian NonlinearFilter (continue – 6) ( ) ( ) kkk kkk vxhz wxfx += += −− 11 Recursive Bayesian Estimation ( ) ( )∫= xdPxxxgI xx ,ˆ;N To obtain the Kalman Filter, we must approximate integrals of the type: Three approximation are presented: (1) Gauss – Hermite Quadrature Approximation (2) Unscented Transformation Approximation (3) Monte Carlo Approximation Table of Content
    133 SOLO Additive Gaussian NonlinearFilter (continue – 7) ( ) ( ) kkk kkk vxhz wxfx += += −− 11 Recursive Bayesian Estimation ( ) ( )∫= xdPxxxgI xx,ˆ;N To obtain the Kalman Filter, we must approximate integrals of the type: Gauss – Hermite Quadrature Approximation ( ) ( )[ ] ( ) ( )∫       −−−= − xdxxPxx P xgI xx T xx n ˆˆ 2 1 exp 2 1 1 2/1 π Let Pxx = ST S a Cholesky decomposition, and define: ( )xxSz ˆ 2 1 : 1 −= − ( ) ( )∫ − = zdezgI zz n T 2/ 2 2 π This integral can be approximated using the Gauss – Hermite quadrature rule: ( ) ( )∑∫ = − ≈ M i ii z zfwzdzfe 1 2 where the quadrature points zi and weights wi are defined as follows: Carl Friedrich Gauss 1777-1855 Charles Hermite 1822-1901 Andre – Louis Cholesky 1875 - 1918
    134 SOLO Additive Gaussian NonlinearFilter (continue – 8) ( ) ( ) kkk kkk vxhz wxfx += += −− 11 Recursive Bayesian Estimation Gauss – Hermite Quadrature Approximation (continue – 1) ( ) ( )∑∫ = − ≈ M i ii z zfwzdzfe 1 2 The quadrature points zi and weights wi are defined as follows: A set of orthonormal Hermite polynomials are generated from the recurrence relationship: ( ) ( ) ( ) ( ) ( )zH j j zH j zzH zHzH jjj 11 4/1 01 11 2 /1,0 −+ − + − + = == π or in matrix form: ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )  ( ) Mj j zH zH zH zH zH zH zH z jM e M zh M J M M zh M M M ,,2,1 2 : 1 0 0 0 00 00 00 00 00 1 1 0 1 1 2 21 1 1 1 0          ==                 +                                 =             − − −− ββ β β β ββ β ( )  ( ) ( )zH j zH j zHz jjj jj 11 1 2 1 2 +− + + +=  ββ ( ) ( ) ( )zHezhJzhz MMMM β+=
    135 SOLO Additive Gaussian NonlinearFilter (continue –9) Recursive Bayesian Estimation Gauss – Hermite Quadrature Approximation (continue – 2) ( ) ( )∑∫ = − ≈ M i ii z zfwzdzfe 1 2 Orthonormal Hermitian Polynomials in matrix form: ( ) Mj j JJ j T M M M M ,,2,1 2 : 00 00 00 00 00 1 1 2 21 1   ===                     = − − β β β β ββ β ( ) ( ) ( )zHezhJzhz MMMM β+= Let evaluate this equation for the M roots zi for which ( ) MizH iM ,,2,10 == ( ) ( ) MizhJzhz iMii ,,2,1 == From this equation we can see that zi and are the eigenvalues and eigenvectors, respectively, of the symmetric matrix JM. ( ) ( ) ( ) ( )[ ] MizHzHzHzh T iMiii ,,1,,, 110  == − Because of the symmetry of JM the eigenvectors are orthogonal and can be normalized. Define: ( ) ( ) MjizHWWzHv M j ijiiij i j ,,2,1,:&/: 1 0 2 === ∑ − = We have: ( ) ( ) ( ) ( ) li li li li M j l lj i ij M j l j i j zhzh WWW zH W zH vv δ=⋅== ≠ − = − = ∑∑  0 1 0 1 0 1 : Table of Content
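The quadrature points and weights described above can be computed numerically from the symmetric tridiagonal matrix J_M (eigenvalues give the points, first eigenvector components give the weights). The sketch below is an illustrative addition: it assumes numpy, the physicists' weight e^{-z^2}, and a one-dimensional change of variables x = \hat{x} + \sqrt{2P} z for the Gaussian expectation.

```python
import numpy as np

def gauss_hermite(M):
    """Nodes z_i and weights w_i such that  int e^{-z^2} f(z) dz ~= sum_i w_i f(z_i)."""
    j = np.arange(1, M)
    beta = np.sqrt(j / 2.0)                   # recurrence coefficients of the orthonormal Hermite polynomials
    J = np.diag(beta, 1) + np.diag(beta, -1)  # symmetric tridiagonal Jacobi matrix J_M
    z, V = np.linalg.eigh(J)                  # eigenvalues = quadrature points
    w = np.sqrt(np.pi) * V[0, :] ** 2         # weights from the first component of each eigenvector
    return z, w

def gh_expectation(g, x_hat, P, M=5):
    """E{g(x)} for scalar x ~ N(x_hat, P), using the change of variables x = x_hat + sqrt(2P) z."""
    z, w = gauss_hermite(M)
    x = x_hat + np.sqrt(2.0 * P) * z
    return np.sum(w * g(x)) / np.sqrt(np.pi)
```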
136
SOLO Unscented Kalman Filter
When the state transition and observation models – that is, the predict and update functions f and h – are highly non-linear, the Extended Kalman Filter can give particularly poor performance [JU97]. This is because only the mean is propagated through the non-linearity.
The Unscented Kalman Filter (UKF) [JU97] uses a deterministic sampling technique known as the unscented transformation to pick a minimal set of sample points (called "sigma points") around the mean. These sigma points are then propagated through the non-linear functions, and the covariance of the estimate is then recovered. The result is a filter which more accurately captures the true mean and covariance. (This can be verified using Monte Carlo sampling or through a Taylor series expansion of the posterior statistics.) In addition, this technique removes the requirement to analytically calculate Jacobians, which for complex functions can be a difficult task in itself.
x_k = f(k-1, x_{k-1}, u_{k-1}) + w_{k-1}    State vector dynamics
z_k = h(k, x_k) + v_k                        Measurements
e_x(k) := x(k) - E\{x(k)\},   E\{e_x(k) e_x^T(k)\} = P_x(k);   E\{w(k) w^T(l)\} = Q(k)\delta_{k,l};   E\{v(k) v^T(l)\} = R(k)\delta_{k,l};   E\{w(k) v^T(l)\} = 0
The Unscented Algorithm uses \hat{x} = E\{x(k)\} and P_x(k) to determine \hat{z} = E\{z(k)\} and P_z(k).
    137 Unscented Kalman FilterSOLO () ( )[ ] ( ) n n j j j n x n x n x x x xx fx n xxf         ∂ ∂ =∇⋅ ∇⋅=+ ∑ ∑ = ∞ = 1 0 ˆ : ! 1 ˆ δδ δδ Develop the nonlinear function f in a Taylor series around xˆ Define also the operator ( )[ ] ( )xf x xfxfD n n j j jx n x n x x         ∂ ∂ =∇⋅= ∑=1 : δδδ Propagating Means and Covariances Through Nonlinear Transformations Consider a nonlinear function .( )xfy = Let compute Assume is a random variable with a probability density function pX (x) (known or unknown) with mean and covariance x { } ( ) ( ){ }Txx xxxxEPxEx ˆˆ,ˆ −−== ( ){ } { } ( )[ ]{ } ∑ ∑∑ ∑ ∞ = = ∞ = ∞ =                         ∂ ∂ =∇⋅= =+= 0 ˆ 10 ˆ 0 ! 1 ! 1 ! 1 ˆˆ n x n n j j j n x n x n n x f x xE n fxE n DE n xxfEy x δδ δ δ { } { } { } ( )( ){ } xxTT PxxxxExxE xxExE xxx =−−= =−= += ˆˆ 0ˆ ˆ δδ δ δ
    138 Unscented Kalman Filter SOLO PropagatingMeans and Covariances Through Nonlinear Transformations Consider a nonlinear function . (continue – 1) ( )xfy = { } { } { } ( )( ){ } xxTT PxxxxExxE xxExE xxx =−−= =−= += ˆˆ 0ˆ ˆ δδ δ δ ( ){ } ( ) +                         ∂ ∂ +                         ∂ ∂ +                         ∂ ∂ +                         ∂ ∂ +=                         ∂ ∂ =+= ∑∑∑ ∑∑ ∑ === = ∞ = = x n j j jx n j j jx n j j j x n j j j n x n n j j j f x xEf x xEf x xE f x xExff x xE n xxfEy xxx xx ˆ 4 1 ˆ 3 1 ˆ 2 1 ˆ 10 ˆ 1 !4 1 !3 1 !2 1 ˆ ! 1 ˆˆ δδδ δδδ Since all the differentials of f are computed around the mean (non-random)xˆ ( )[ ]{ } ( )[ ]{ } { }( )[ ] ( )[ ]xx xxT xxx TT xxx TT xxx fPfxxEfxxEfxE ˆˆˆˆ 2 ∇∇=∇∇=∇∇=∇⋅ δδδδδ ( )[ ]{ } { } { } 0 ˆ 1 0ˆ 1 ˆ0 ˆ =                 ∂ ∂ =                         ∂ ∂ =                 ∇⋅=∇⋅ ∑∑ == x n j j j x n j j j x xxx f x xEf x xEfxEfxE xx  δδδδ ( ){ } [ ]{ } ( ) ( )[ ] [ ]{ } [ ]{ } +++∇∇+==+= ∑ ∞ = xxxxxx xxT x n x n x fDEfDEfPxffDE n xxfEy ˆ 4 ˆ 3 ˆ 0 ˆ !4 1 !3 1 !2 1 ˆ ! 1 ˆˆ δδδδ
    139 Simon J. Julier UnscentedKalman FilterSOLO Propagating Means and Covariances Through Nonlinear Transformations Consider a nonlinear function . (continue - 2) ( )xfy = { } { } { } ( )( ){ } xxTT PxxxxExxE xxExE xxx =−−= =−= += ˆˆ 0ˆ ˆ δδ δ δ Unscented Transformation (UT), proposed by Julier and Uhlmann uses a set of “sigma points” to provide an approximation of the probabilistic properties through the nonlinear function Jeffrey K. Uhlman A set of “sigma points” S consists of p+1 vectors and their associated weights S = { i=0,1,..,p: x(i) , W(i) }. (1) Compute the transformation of the “sigma points” through the nonlinear transformation f: ( ) ( ) ( ) pixfy ii ,,1,0 == (2) Compute the approximation of the mean: ( ) ( ) ∑= ≈ p i ii yWy 0 ˆ The estimation is unbiased if: ( ) ( ) ( ) ( ) { } ( ) yWyyEWyWE p i i p i y ii p i ii ˆˆ 00 ˆ 0 ===       ∑∑∑ ===  ( ) 1 0 =∑= p i i W (3) The approximation of output covariance is given by ( ) ( ) ( ) ( ) ( )∑= −−≈ p i Tiiiyy yyyyWP 0 ˆˆ
    140 Unscented Kalman FilterSOLO PropagatingMeans and Covariances Through Nonlinear Transformations Consider a nonlinear function (continue – 3)( )xfy = One set of points that satisfies the above conditions consists of a symmetric set of symmetric p = 2nx points that lie on the covariance contour Pxx : th xn ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) x x ni x i xxxni i xxxi ni nWW nWW P W n xx P W n xx WWxx x x ,,1 2/1 2/1 1 ˆ 1 ˆ ˆ 0 0 0 0 0 00 =            −= −=         − −=         − += == + + where is the row or column of the matrix square root of nx Pxx /(1-W0) (the original covariance matrix Pxx multiplied by the number of dimensions of x, nx/(1-W0)). This implies: ( )( )i xx x WPn 01/ − xxx n i T i xxx i xxx P W n P W n P W nx 01 00 111 − =        −        − ∑= Unscented Transformation (UT) (continue – 1)
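As a concrete illustration of this symmetric sigma-point set (an added sketch, with W0 treated as a free design parameter, W0 < 1, and numpy's Cholesky factor used as the matrix square root):

```python
import numpy as np

def symmetric_sigma_points(x_mean, P, W0=1.0 / 3.0):
    """Symmetric set of 2*n_x + 1 sigma points on the sqrt(n_x) covariance contour."""
    n = x_mean.size
    S = np.linalg.cholesky(n * P / (1.0 - W0))    # columns are the square-root directions
    points, weights = [x_mean], [W0]
    for i in range(n):
        points.append(x_mean + S[:, i])
        points.append(x_mean - S[:, i])
        weights.extend([(1.0 - W0) / (2 * n)] * 2)
    return np.array(points), np.array(weights)

def unscented_transform(f, x_mean, P, W0=1.0 / 3.0):
    """Propagate the sigma points through f and recover the transformed mean and covariance."""
    X, W = symmetric_sigma_points(x_mean, P, W0)
    Y = np.array([f(x) for x in X])
    y_mean = W @ Y
    Pyy = (W[:, None] * (Y - y_mean)).T @ (Y - y_mean)
    return y_mean, Pyy
```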
    141 Unscented Kalman Filter SOLO PropagatingMeans and Covariances Through Nonlinear Transformations Consider a nonlinear function (continue – 3)( )xfy = Unscented Transformation (UT) (continue – 2) ( ) ( ) ( ) ( ) ( ) ( )         += = = == ∑ ∑ ∞ = − ∞ = 0 0 2,,1ˆ ! 1 ,,1ˆ ! 1 0ˆ n xx n x n x n x ii nnixfD n nixfD n ixf xfy i i   δ δ 1 2 Unscented Algorithm: ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∑∑ ∑ ∑ ∑∑ ∑∑ == = = ∞ = − = ∞ ==     ++ − + − +=     ++++ − += − + − +== x ii x i x iii x i x i x n i xx x n i x x n i xxx x n i n n x x n i n n x x n i ii UT xfDxfD n W xfD n W xf xfDxfDxfDxf n W xfW xfD nn W xfD nn W xfWyWy 1 640 1 20 1 6420 0 1 0 0 1 0 0 0 2 0 ˆ !6 1 ˆ !4 11 ˆ 2 11 ˆ ˆ !6 1 ˆ !4 1 ˆ !2 1 ˆ 1 ˆ ˆ ! 1 2 1 ˆ ! 1 2 1 ˆˆ   δδδ δδδ δδ ( ) i xxx i i P W n xxxx         − ±=±= 01 ˆˆ δ Since ( ) ( ) ( ) ( )    − =         ∂ ∂ −= ∑= − oddnxfD evennxfD xf x xxfD n x n x n n j j ij n x i i x i ˆ ˆ ˆˆ 1 δ δ δ δ
    142 Unscented Kalman Filter () ( ) ( ) ( )∑=     ++ − +∇∇+= x ii n i xx x xxT UT xfDxfD n W xfPxfy 1 640 ˆ !6 1 ˆ !4 11 ˆ 2 1 ˆˆ δδ ( ) i xxx i i P W n xxxx         − ±=±= 01 ˆˆ δ SOLO Propagating Means and Covariances Through Nonlinear Transformations Consider a nonlinear function (continue – 4)( )xfy = Unscented Transformation (UT) (continue – 3) Unscented Algorithm: ( ) ( ) ( ) ( ) ( )xfPxfP W n n W xfP W n P W n n W xfP W n P W n n W xfD n W xxTxxxT x n i T i xxx i xxxT x n i T i xxx i xxxT x n i x x x xx i ˆ 2 1 ˆ 12 11 ˆ 112 11 ˆ 112 11 ˆ 2 11 0 0 1 00 0 1 00 0 1 20 ∇∇=∇      − ∇ − =∇                 −        − ∇ − = ∇        −        − ∇ − = − ∑ ∑∑ = == δ Finally: We found ( ){ } [ ]{ } ( ) ( )[ ] [ ]{ } [ ]{ } +++∇∇+==+= ∑ ∞ = xxxxxx xxT x n x n x fDEfDEfPxffDE n xxfEy ˆ 4 ˆ 3 ˆ 0 ˆ !4 1 !3 1 !2 1 ˆ ! 1 ˆˆ δδδδ We can see that the two expressions agree exactly to the third order.
    143 Unscented Kalman Filter SOLO PropagatingMeans and Covariances Through Nonlinear Transformations Consider a nonlinear function (continue – 5)( )xfy = Unscented Transformation (UT) (continue – 4) Accuracy of the Covariance: ( ) ( ){ } { } ( ) ( ) ( ) ( ) ( ) ( ) ( )[ ] [ ]{ } [ ]{ } ( ) ( )[ ] [ ]{ } [ ]{ } T xxxxxx xxT x xxxxxx xxT x T m m xx n n xx TTTyy fDEfDEfPxf fDEfDEfPxf fD m xfDxffD n xfDxfE yyyyEyyyyEP       +++∇∇+⋅ ⋅      +++∇∇+−               ++      ++= −=−−= ∑∑ ∞ = ∞ =   ˆ 4 ˆ 3 ˆ ˆ 4 ˆ 3 ˆ 22 !4 1 !3 1 !2 1 ˆ !4 1 !3 1 !2 1 ˆ ! 1 ˆˆ ! 1 ˆˆ ˆˆˆˆ δδ δδ δδδδ ( ) ( ) ( ) ( ){ } ( ) ( ) ( ){ } ( ) ( ) ( ) ( )                     +       ++       ++= ∑∑ ∑∑ ∞ = ∞ = ∞ = ∞ = T m m x n n x T n n x T x T n n x T x T fD m fD n E xfxfD n ExfxfDExfD n ExfxfDExfxfxf 22 2 0 2 0 ! 1 ! 1 ˆˆ ! 1 ˆˆˆ ! 1 ˆˆˆˆˆ δδ δδδδ 
    144 Unscented Kalman Filter SOLO PropagatingMeans and Covariances Through Nonlinear Transformations Consider a nonlinear function (continue – 6)( )xfy = Unscented Transformation (UT) (continue – 5) Accuracy of the Covariance: ( ) ( ){ } { } ( )[ ] ( )[ ]{ } ( ) ( ) ( ) { } { }      0 1 1 22 0 1 1 ˆˆ !2!2 1 !! 1 4 1 ˆˆˆˆ > ∞ = ∞ = > ∞ = ∞ =       −       + ∇∇∇∇−= −=−−= ∑∑∑∑ ji i j Tj x i x ji i j Tj x i x T xx xxT xxx xxT x T x xx x TTTyy fDEfDE ji fDfD ji E fPfPP yyyyEyyyyEP δδδδ AA ( )[ ] ( )[ ]{ } ( ) ( ) ( ) ( ) ( ) { } { }       0 1 1 2 1 2 1 2 ~ 2 ~ 2 2 1 0 1 1 ~~ ˆˆ 4!2!2 1 !! 1 2 1 4 1 > ∞ = ∞ = = = = > ∞ = ∞ =       + −       + + ∇∇∇∇−= ∑∑ ∑∑ ∑ ∑∑ ji i j L k L m Tji L k ji i j Tji T xx xxT xxx xxT x T x xx x yy UT fDEfDE Lji fDfD jiL fPfPPP mk kk σσ σσ λ λ AA
146
SOLO Unscented Kalman Filter
(Figure: illustration of the unscented transformation – sigma points \chi_i = \{\hat{x}, \hat{x} \pm \alpha(\sqrt{P_x})_i\} with weights \beta_i are propagated through the nonlinearity f(\cdot) to give \psi_i = f(\chi_i); the weighted sample mean \hat{z} = \sum_i \beta_i \psi_i and the weighted sample covariance P_z = \sum_i \beta_i (\psi_i - \hat{z})(\psi_i - \hat{z})^T recover the transformed mean and covariance.)
Table of Content
147
SOLO Unscented Kalman Filter
UKF Summary
System Definition:
x_k = f(k-1, x_{k-1}, u_{k-1}) + w_{k-1},   E\{w_k\} = 0,   E\{w_k w_l^T\} = Q_k \delta_{k,l}
z_k = h(k, x_k) + v_k,   E\{v_k\} = 0,   E\{v_k v_l^T\} = R_k \delta_{k,l}
0  Initialization of the UKF. Augment the state with the process and measurement noises, x^a := [x^T  w^T  v^T]^T:
\hat{x}_{0|0} = E\{x_0\},   P_{0|0} = E\{(x_0 - \hat{x}_{0|0})(x_0 - \hat{x}_{0|0})^T\}
\hat{x}^a_0 = [\hat{x}_0^T  0  0]^T,   P^a_{0|0} = \mathrm{blockdiag}(P_{0|0}, Q, R)
For k = 1, 2, …
1  Calculate the Sigma Points (L is the augmented state dimension, \gamma = \sqrt{L + \lambda}):
\chi^0_{k-1|k-1} = \hat{x}_{k-1|k-1}
\chi^i_{k-1|k-1} = \hat{x}_{k-1|k-1} + \gamma (\sqrt{P_{k-1|k-1}})_i,   i = 1, \dots, L
\chi^{i+L}_{k-1|k-1} = \hat{x}_{k-1|k-1} - \gamma (\sqrt{P_{k-1|k-1}})_i,   i = 1, \dots, L
2  State Prediction and its Covariance:
\chi^i_{k|k-1} = f(k-1, \chi^i_{k-1|k-1}, u_{k-1}),   i = 0, 1, \dots, 2L
\hat{x}_{k|k-1} = \sum_{i=0}^{2L} W^{(m)}_i \chi^i_{k|k-1},   W^{(m)}_0 = \frac{\lambda}{L+\lambda},   W^{(m)}_i = \frac{1}{2(L+\lambda)},   i = 1, \dots, 2L
P_{k|k-1} = \sum_{i=0}^{2L} W^{(c)}_i (\chi^i_{k|k-1} - \hat{x}_{k|k-1})(\chi^i_{k|k-1} - \hat{x}_{k|k-1})^T,   W^{(c)}_0 = \frac{\lambda}{L+\lambda} + (1 - \alpha^2 + \beta),   W^{(c)}_i = \frac{1}{2(L+\lambda)},   i = 1, \dots, 2L
148
SOLO Unscented Kalman Filter
UKF Summary (continue – 1)
3  Measurement Prediction:
Z^i_{k|k-1} = h(k, \chi^i_{k|k-1}),   i = 0, 1, \dots, 2L
\hat{z}_{k|k-1} = \sum_{i=0}^{2L} W^{(m)}_i Z^i_{k|k-1}
4  Innovation and its Covariance:
i_k = z_k - \hat{z}_{k|k-1}
S_k = P^{zz}_{k|k-1} = \sum_{i=0}^{2L} W^{(c)}_i (Z^i_{k|k-1} - \hat{z}_{k|k-1})(Z^i_{k|k-1} - \hat{z}_{k|k-1})^T
5  Kalman Gain Computation:
P^{xz}_{k|k-1} = \sum_{i=0}^{2L} W^{(c)}_i (\chi^i_{k|k-1} - \hat{x}_{k|k-1})(Z^i_{k|k-1} - \hat{z}_{k|k-1})^T
K_k = P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1}
6  Update of the State and its Covariance:
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k i_k
P_{k|k} = P_{k|k-1} - K_k S_k K_k^T
k := k + 1 and return to 1.
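A compact sketch of steps 1–6 for the simpler non-augmented, additive-noise case (an illustrative simplification of the summary above; alpha, beta, kappa are the usual scaling parameters, and f, h, Q, R are assumed user inputs):

```python
import numpy as np

def ukf_cycle(x_est, P_est, z, f, h, Q, R, alpha=1e-3, beta=2.0, kappa=0.0):
    """One UKF cycle with additive Gaussian noise (non-augmented form)."""
    L = x_est.size
    lam = alpha ** 2 * (L + kappa) - L
    Wm = np.full(2 * L + 1, 1.0 / (2 * (L + lam)))
    Wm[0] = lam / (L + lam)
    Wc = Wm.copy()
    Wc[0] += 1.0 - alpha ** 2 + beta
    # 1: sigma points around the previous estimate
    S = np.linalg.cholesky((L + lam) * P_est)
    X = np.vstack([x_est, x_est + S.T, x_est - S.T])
    # 2: propagate through f, predicted mean and covariance
    Xp = np.array([f(x) for x in X])
    x_pred = Wm @ Xp
    P_pred = (Wc[:, None] * (Xp - x_pred)).T @ (Xp - x_pred) + Q
    # 3-5: predicted measurement, innovation covariance, cross covariance, gain
    Zp = np.array([h(x) for x in Xp])
    z_pred = Wm @ Zp
    S_k = (Wc[:, None] * (Zp - z_pred)).T @ (Zp - z_pred) + R
    P_xz = (Wc[:, None] * (Xp - x_pred)).T @ (Zp - z_pred)
    K = P_xz @ np.linalg.inv(S_k)
    # 6: update
    x_upd = x_pred + K @ (z - z_pred)
    P_upd = P_pred - K @ S_k @ K.T
    return x_upd, P_upd
```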
149
SOLO Unscented Kalman Filter State Estimation (one cycle)
(Figure: tracking-system block diagram – Input Data, Sensor Data Processing and Measurement Formation, Observation-to-Track Association, Gating Computations, Track Maintenance (Initialization, Confirmation and Deletion), Filtering and Prediction. Portraits: Simon J. Julier, Jeffrey K. Uhlmann.)
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
    151 Numerical Integration Usinga Monte Carlo ApproximationSOLO A Monte Carlo Approximation of the Expected Value Integrals uses Discrete Approximation to the Gaussian PDF ( )xx Pxx ,ˆ;N ( )xx Pxx ,ˆ;N can be approximated by: ( ) ( ) ( ) ( )∑∑ == −=−≈= ss N i i s N i iixx xx N xxwPxxx 11 1 ,ˆ; δδNp We can see that for any x we have ( ) ( )∫∑∫∑ ∞− ≤ ∞− = ≈=− x xx xx i i x N i ii dPxwdxw i s ττττδ ,ˆ; 1 N The weight wi is not the probability of the point xi . The probability density near xi is given by the density of the points in the region around xi , which can be obtained by a normalized histogram of all xi . Draw Ns samples from , where {xi , i = 1,2,…,Ns} are a set of support points (random samples of particles) with weights {wi = 1/Ns, i=1,2,…,Ns} ( )xx Pxx ,ˆ;N Monte Carlo Kalman Filter (MCKF)
    152 Numerical Integration Usinga Monte Carlo Approximation SOLO The Expected Value for any function g (x) can be estimated from: ( ){ } ( ) ( ) ( ) ( ) ( ) ( ) ( )∑∑∫ ∑∫ === ==−≈= sss N i i s N i ii N i ii xp xg N xgwxxwxgxdxpxgxgE 111 1 δ which is the sample mean. ( ) { } { } ( ) { } { }    ==+= ==+−= −−−−−−− lkk T lkkkkk lkk T lkkkkkk RvvEvEvxkhz QwwEwEwuxkfx , ,1111111 &0, &0,,1 δ δGiven the System Assuming that we computed the Mean and Covariance at stage k-1 let use the Monte Carlo Approximation to compute the predicted Mean and Covariance at stage k 1|11|1 ,ˆ −−−− kkkk Px 1|1| ,ˆ −− kkkk Px { } ( ) ( )∑= −−−− −== − s kk N i k i kk s Zxpkkk uxkf N xEx 1 11|1|1| ,,1 1 ˆ 1:1 ( ) ( ){ } ( ) { } ( ) T kkkkZxp T kkZxp T kkkkkk xx kk xxxxExxxxEP kkkk 1|1|||1|1|1| ˆˆˆˆ 1:11:1 −−−−− −=−−= −− Monte Carlo Kalman Filter (MCKF) (continue – 1) Draw Ns samples ( ) ( ) skkkkkkk i kk NiPxxZxpx ,,1,ˆ;|~ 1|11|111:111|1 == −−−−−−−−− N ~means Generate (Draw) samples from a predefined distribution
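A minimal illustration of the sample-mean estimate above (an added sketch; the mean, covariance and function g are arbitrary examples chosen for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
x_hat = np.array([1.0, -0.5])
P = np.array([[0.5, 0.1], [0.1, 0.2]])
samples = rng.multivariate_normal(x_hat, P, 50000)   # x_i ~ N(x_hat, P), weights w_i = 1/N_s
g = lambda x: np.sin(x[..., 0]) * x[..., 1]           # any function g(x)
estimate = np.mean(g(samples))                        # E{g(x)} ~= (1/N_s) sum_i g(x_i)
```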
    153 Numerical Integration Usinga Monte Carlo Approximation SOLO ( )( ){ } ( ) { } ( ) ( )[ ] ( )[ ]{ } ( ) ( ) ( ){ } ( ) ( ) ( ) T N i k i kk s N i k i kk s Zxpk i kk T k i kk T kkkkZxp T kk i kkkk i kk T kkkkZxp T kkZxp T kkkkkk xx kk ss kk kk kkkk uxkf N uxkf N QuxfuxfE xxwuxkfwuxkfE xxxxExxxxEP       −      −−+= −+−+−= −=−−= ∑∑ = −−− = −−−−−−−−− −−−−−−−−−− −−−−− − − −− 1 11|1 1 11|1|11|111|1 1|1||111|1111|1 1|1|||1|1|1| ,,1 1 ,,1 1 ,, ˆˆ,,1,,1 ˆˆˆˆ 1:1 1:1 1:11:1 ( ) ( ) ( ) ( )      −      −−−−+= ∑∑∑ = −−− = −−− = −−−−−−− sss N i k i kk s N i k i kk s N i k i kk T k i kk s xx kk uxkf N uxkf N uxkfuxkf N QP 1 11|1 1 11|1 1 11|111|11| ,,1 1 ,,1 1 ,,1,,1 1 Using the Monte Carlo Approximation we obtain: { } ( ) ( )∑= −− == − s kk N i i kk s Zxpkkk xkh N zEz 1 1||1| , 1 ˆ 1:1 ( ) ( ) ( ) ( )            −+= ∑∑∑ = − = − = −−− sss N i i kk s N i i kk s N i i kk Ti kk s zz kk xkh N xkh N xkhxkh N RP 1 1| 1 1| 1 1|1|1| , 1 , 1 ,, 1 Monte Carlo Kalman Filter (MCKF) (continue – 2) ( ) ( ) skkkkkkk i kk NiPxxZxpx ,,1,ˆ;|~ 1|1|1:11| == −−−− N Now we approximate the predictive PDF, , as and we draw new Ns (not necessarily the same as before) samples. ( )1:1| −kk Zxp ( )1|1| ,ˆ; −− kkkkk PxxN
    154 Numerical Integration Usinga Monte Carlo Approximation SOLO In the same way we obtain: ( ) ( )            −= ∑∑∑ = − = − = −−− sss N i i kk s N i i kk s N i i kk Ti kk s zx kk xkh N x N xkhx N P 1 1| 1 1| 1 1|1|1| , 11 , 1 Monte Carlo Kalman Filter (MCKF) (continue – 3) The Kalman Filter Equations are: ( ) 1 1|1| − −−= zz kk zx kkk PPK ( )1|1|| ˆˆˆ −− −+= kkkkkkkk zzKxx T k zz kkk xx kk xx kk KPKPP 1|1|| −− −=
155
SOLO Monte Carlo Kalman Filter (MCKF)
MCKF Summary
System Definition:
x_k = f(k-1, x_{k-1}, u_{k-1}) + w_{k-1},   p(x_0) = \mathcal{N}(x_0; \hat{x}_0, P_{0|0}),   p(w) = \mathcal{N}(w; 0, Q)
z_k = h(k, x_k) + v_k,   p(v) = \mathcal{N}(v; 0, R)
0  Initialization of the MCKF. Augment the state space to include the process and measurement noises, x^a := [x^T  w^T  v^T]^T:
\hat{x}_{0|0} = E\{x_0\},   P_{0|0} = E\{(x_0 - \hat{x}_{0|0})(x_0 - \hat{x}_{0|0})^T\}
\hat{x}^a_0 = [\hat{x}_0^T  0  0]^T,   P^a_{0|0} = \mathrm{blockdiag}(P_{0|0}, Q, R)
For k = 1, 2, …
1  Assuming for k-1 a Gaussian distribution with mean \hat{x}^a_{k-1|k-1} and covariance P^a_{k-1|k-1}, generate (draw) N_s samples
x^{a,i}_{k-1|k-1} \sim \mathcal{N}(x^a_{k-1}; \hat{x}^a_{k-1|k-1}, P^a_{k-1|k-1}),   i = 1, \dots, N_s
2  State Prediction and its Covariance:
x^{a,i}_{k|k-1} = f(k-1, x^{a,i}_{k-1|k-1}, u_{k-1}),   i = 1, \dots, N_s
\hat{x}^a_{k|k-1} = \frac{1}{N_s} \sum_{i=1}^{N_s} x^{a,i}_{k|k-1}
P^a_{k|k-1} = \frac{1}{N_s} \sum_{i=1}^{N_s} x^{a,i}_{k|k-1} (x^{a,i}_{k|k-1})^T - \hat{x}^a_{k|k-1} (\hat{x}^a_{k|k-1})^T
3  Assuming a Gaussian distribution with mean \hat{x}^a_{k|k-1} and covariance P^a_{k|k-1}, generate (draw) new N_s samples
x^{a,j}_{k|k-1} \sim \mathcal{N}(x^a_k; \hat{x}^a_{k|k-1}, P^a_{k|k-1}),   j = 1, \dots, N_s
156
SOLO Monte Carlo Kalman Filter (MCKF)
MCKF Summary (continue – 1)
4  Measurement Prediction:
z^j_{k|k-1} = h(k, x^{a,j}_{k|k-1}),   j = 1, \dots, N_s
\hat{z}_{k|k-1} = \frac{1}{N_s} \sum_{j=1}^{N_s} z^j_{k|k-1}
5  Predicted Covariances Computation:
S_k = P^{zz}_{k|k-1} = \frac{1}{N_s} \sum_{j=1}^{N_s} (z^j_{k|k-1} - \hat{z}_{k|k-1})(z^j_{k|k-1} - \hat{z}_{k|k-1})^T
P^{xz}_{k|k-1} = \frac{1}{N_s} \sum_{j=1}^{N_s} (x^{a,j}_{k|k-1} - \hat{x}^a_{k|k-1})(z^j_{k|k-1} - \hat{z}_{k|k-1})^T
6  Kalman Gain Computation:  K^a_k = P^{xz}_{k|k-1} (P^{zz}_{k|k-1})^{-1}
7  Measurement & Innovation:  i_k = z_k - \hat{z}_{k|k-1}
8  Kalman Filter update:
\hat{x}^a_{k|k} = \hat{x}^a_{k|k-1} + K^a_k i_k
P^a_{k|k} = P^a_{k|k-1} - K^a_k S_k (K^a_k)^T
k := k + 1 and return to 1.
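A minimal sketch of the sampling-based moment computations above, written for the additive-noise case instead of the augmented state (illustrative only; the process noise is sampled explicitly and the measurement noise enters through R, and f, h, Q, R are assumed inputs):

```python
import numpy as np

def mckf_cycle(x_est, P_est, z, f, h, Q, R, Ns=2000, seed=0):
    """One Monte Carlo Kalman Filter cycle: moments estimated from Gaussian samples."""
    rng = np.random.default_rng(seed)
    # 1-2: sample the k-1 posterior, propagate with process noise, estimate predicted moments
    Xprev = rng.multivariate_normal(x_est, P_est, Ns)
    W = rng.multivariate_normal(np.zeros(len(Q)), Q, Ns)
    Xpred = np.array([f(x) for x in Xprev]) + W
    x_pred = Xpred.mean(axis=0)
    P_pred = (Xpred - x_pred).T @ (Xpred - x_pred) / Ns
    # 3-5: re-draw from the predicted Gaussian, predict the measurement, form the covariances
    Xs = rng.multivariate_normal(x_pred, P_pred, Ns)
    Zs = np.array([h(x) for x in Xs])            # h must return a measurement vector
    z_pred = Zs.mean(axis=0)
    S = (Zs - z_pred).T @ (Zs - z_pred) / Ns + R  # measurement noise enters through R
    P_xz = (Xs - x_pred).T @ (Zs - z_pred) / Ns
    # 6-8: Kalman gain, innovation, update
    K = P_xz @ np.linalg.inv(S)
    x_upd = x_pred + K @ (z - z_pred)
    P_upd = P_pred - K @ S @ K.T
    return x_upd, P_upd
```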
157
SOLO Monte Carlo Kalman Filter (MCKF)
(Figure: tracking-system block diagram – Input Data, Sensor Data Processing and Measurement Formation, Observation-to-Track Association, Gating Computations, Track Maintenance (Initialization, Confirmation and Deletion), Filtering and Prediction.)
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
Table of Content
158
SOLO Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter
So far we assumed that p(x_k | Z_{1:k}) is a Gaussian PDF. If the true PDF is not Gaussian (multimodal, heavily skewed, or non-standard – not represented by any standard PDF), a Gaussian distribution can never describe it well.
x_k = f(x_{k-1}, w_{k-1}),   z_k = h(x_k, v_k)
w_{k-1} and v_k are system and measurement white-noise sequences, independent of past and current states and of each other, with known PDFs p(w_{k-1}) and p(v_k).
We want to compute p(x_k | Z_{1:k}) recursively, assuming knowledge of p(x_{k-1} | Z_{1:k-1}), in two stages: prediction (before) and update (after the measurement).
Prediction (before the measurement) – use the Chapman–Kolmogorov equation:
p(x_k | Z_{1:k-1}) = \int p(x_k | x_{k-1}) \, p(x_{k-1} | Z_{1:k-1}) \, dx_{k-1}
where
p(x_k | x_{k-1}) = \int p(x_k | x_{k-1}, w_{k-1}) \, p(w_{k-1} | x_{k-1}) \, dw_{k-1}
By assumption p(w_{k-1} | x_{k-1}) = p(w_{k-1}). Since, knowing x_{k-1} and w_{k-1}, x_k is deterministically given by the system equation,
p(x_k | x_{k-1}, w_{k-1}) = \delta(x_k - f(x_{k-1}, w_{k-1}))
Therefore:
p(x_k | x_{k-1}) = \int \delta(x_k - f(x_{k-1}, w_{k-1})) \, p(w_{k-1}) \, dw_{k-1}
    159 Nonlinear Estimation UsingParticle Filters SOLO Non-Additive Non-Gaussian Nonlinear Filter ( ) ( )kkk kkk vxhz wxfx , , 11 = = −− kk vw &1− are system and measurement white-noise sequences independent of past and current states and on each other and having known P.D.F.s ( ) ( )kk vpwp &1− We want to compute p (xk|Z1:k) recursively, assuming knowledge of p(xk-1|Z1:k-1) in two stages, prediction (before) and update (after measurement) Prediction (before measurement) ( ) ( ) ( )∫ −−−−− = 11:1111:1 ||| kkkkkkk xdZxpxxpZxp where: ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∫ − − − − = − === kkkkk kkkk kk kkkk Bayes bp apabp bap kkkkk xdZxpxzp Zxpxzp Zzp Zxpxzp ZzxpZxp 1:1 1:1 1:1 1:1 | | 1:1:1 || || | || ,|| ( ) ( ) ( )∫= kkkkkkkk vdxvpvxzpxzp |,|| By assumption ( ) ( )kkk vpxvp =| Since by knowing , is deterministically given by system equationkk vx & kz ( ) ( )( ) ( ) ( )   ≠ = =−= kkk kkk kkkkkk vxhz vxhz vxhzvxzp ,0 ,1 ,,| δ Therefore: ( ) ( )( ) ( )∫ −= kkkkkkk vdvpvxhzxzp ,| δ ( ) ( )( ) ( )∫ −−−−− −= 11111 ,| kkkkkkk wdwpwxfxxxp δ 1 Update (after measurement)2
    160 Nonlinear Estimation UsingParticle Filters SOLO Non-Additive Non-Gaussian Nonlinear Filter ( ) ( )kkk kkk vxhz wxfx , , 11 = = −− kk vw &1− are system and measurement white-noise sequences independent of past and current states and on each other and having known P.D.F.s ( ) ( )kk vpwp &1− We want to compute p (xk|Z1:k) recursively, assuming knowledge of p(xk-1|Z1:k-1) in two stages, prediction (before) and update (after measurement) Prediction (before measurement) ( ) ( ) ( )∫ −−−−− = 11:1111:1 ||| kkkkkkk xdZxpxxpZxp ( ) ( )( ) ( )∫ −−−−− −= 11111 ,| kkkkkkk wdwpwxfxxxp δ Update (after measurement) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∫ − − − − = − === kkkkk kkkk kk kkkk Bayes bp apabp bap kkkkk xdZxpxzp Zxpxzp Zzp Zxpxzp ZzxpZxp 1:1 1:1 1:1 1:1 | | 1:1:1 || || | || ,|| We need to evaluate the following integrals: ( ) ( )( ) ( )∫ −= kkkkkkk vdvpvxhzxzp ,| δ We use the numeric Monte Carlo Method to evaluate the integrals: Generate (Draw): ( ) ( ) Sk i kk i k Nivpvwpw ,,1~&~ 11 =−− ( ) ( )( ) S N i i k i k i kkk Nwxfxxxp S ∑= −−− −≈ 1 111 /,| δ ( ) ( )( ) S N i i k i k i kkk Nvxhzxzp S ∑= −≈ 1 /,| δ or ( ) ( ) ( ) S N i i kkkk i k i k i k Nxxxxpwxfx S ∑= −−− −≈→= 1 111 /|, δ ( ) ( ) ( ) S N i i kkkk i k i k i k Nzzxzpvxhz S ∑= −≈→= 1 /|, δ Analytic solutions for those integral equations do not exist in the general case. 1 2
161
SOLO Nonlinear Estimation Using Particle Filters
Non-Additive Non-Gaussian Nonlinear Filter
Monte Carlo Computations of p(x_k | x_{k-1}) and p(z_k | x_k)
x_k = f(k-1, x_{k-1}, u_{k-1}, w_{k-1}),   w_{k-1} with given p_w(w_{k-1}),   x_0 with given p_{x_0}(x_0)
z_k = h(k, x_k, v_k),   v_k with given p_v(v_k)
0  Initialization: generate (draw) x_0^i \sim p_{x_0}(x_0),   i = 1, \dots, N_S
For k = 1, 2, …
1  At stage k-1 generate (draw) N_S samples w_{k-1}^i \sim p_w(w_{k-1}),   i = 1, \dots, N_S
2  State Update: x_k^i = f(k-1, x_{k-1}^i, u_{k-1}, w_{k-1}^i),   i = 1, \dots, N_S, giving
p(x_k | x_{k-1}) \approx \frac{1}{N_S} \sum_{i=1}^{N_S} \delta(x_k - x_k^i)
3  Generate (draw) measurement noise v_k^i \sim p_v(v_k),   i = 1, \dots, N_S
4  Measurement Update: z_k^i = h(x_k^i, v_k^i),   i = 1, \dots, N_S, giving
p(z_k | x_k) \approx \frac{1}{N_S} \sum_{i=1}^{N_S} \delta(z_k - z_k^i)
k := k + 1 and return to 1.
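An illustrative sketch of steps 1–4: each particle is pushed through the system and measurement equations with its own noise draw. The names f, h, draw_w and draw_v are assumed user-supplied functions, not definitions from the slides.

```python
import numpy as np

def propagate_particles(particles, u, f, h, draw_w, draw_v, rng=np.random.default_rng()):
    """Push every particle through the system and measurement equations with fresh noise draws."""
    Ns = len(particles)
    w = draw_w(rng, Ns)                      # samples of the process noise p(w_{k-1})
    x_next = np.array([f(x, u, wi) for x, wi in zip(particles, w)])
    v = draw_v(rng, Ns)                      # samples of the measurement noise p(v_k)
    z_pred = np.array([h(x, vi) for x, vi in zip(x_next, v)])
    return x_next, z_pred                    # empirical approximations of p(x_k|x_{k-1}) and p(z_k|x_k)
```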
    162 Nonlinear Estimation UsingParticle Filters SOLO Non-Additive Non-Gaussian Nonlinear Filter ( ) ( )kkk kkk vxhz wxfx , , 11 = = −− kk vw &1− are system and measurement white-noise sequences independent of past and current states and on each other and having known P.D.F.s ( ) ( )kk vpwp &1− We want to compute p (xk|Z1:k) recursively, assuming knowledge of p(xk-1|Z1:k-1) in two stages, prediction (before) and update (after measurement) Prediction (before measurement) ( ) ( ) ( )∫ −−−−− = 11:1111:1 ||| kkkkkkk xdZxpxxpZxp Update (after measurement) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )∫ − − − − = − === kkkkk kkkk kk kkkk Bayes bp apabp bap kkkkk xdZxpxzp Zxpxzp Zzp Zxpxzp ZzxpZxp 1:1 1:1 1:1 1:1 | | 1:1:1 || || | || ,|| We use the numeric Monte Carlo Method to evaluate the integrals: Generate (Draw): ( ) ( ) Sk i kk i k Nivpvwpw ,,1~&~ 11 =−− ( ) ( ) ( ) S N i i kkkk i k i k i k Nxxxxpwxfx S ∑= −−− −≈→= 1 111 /|, δ ( ) ( ) ( ) S N i i kkkk i k i k i k Nzzxzpvxhz S ∑= −≈→= 1 /|, δ ( ) ( ) ( ) ( ) ( ) ( )∑∑ ∫∫∑ == −−−− = −−− −=−=−= SSS N i i kk S N i kkk i kk S k N i kk i kk S kk xx N xdZxpxx N xdZxpxx N Zxp 11 1 11:111 1 1:111:1 1 | 1 | 1 | δδδ    Since we use NS points to describe the probabilities we call those points, Particles. 1 2 Table of Content
163 Nonlinear Estimation Using Particle Filters SOLO Non-Additive Non-Gaussian Nonlinear Filter
x_k = f(x_{k-1}, w_{k-1}),  z_k = h(x_k, v_k)
Importance Sampling (IS)
Until now we assumed that p(x_k|Z_{1:k}) is a Gaussian PDF. If the true PDF is not Gaussian (multimodal, heavily skewed, or non-standard, i.e. not represented by any standard PDF), a Gaussian distribution can never describe it well. In such cases approximate Grid-Based Filters and Particle Filters yield an improvement, at the cost of a heavy computational demand.
To overcome this difficulty we use the Principle of Importance Sampling.
Suppose that p(x_k|Z_{1:k}) is a PDF from which it is difficult to draw samples. Also suppose that q(x_k|Z_{1:k}) is another PDF from which samples can easily be drawn (referred to as the Importance Density), for example a Gaussian PDF. Now assume that at each sample we can compute the scale factor w(x_k) between the two densities:
w(x_k) := \frac{p(x_k|Z_{1:k})}{q(x_k|Z_{1:k})} > 0
Using this we can write:
E_{p(x_k|Z_{1:k})}\{g(x_k)\} = \int g(x_k) \, p(x_k|Z_{1:k}) \, dx_k = \frac{\int g(x_k) \frac{p(x_k|Z_{1:k})}{q(x_k|Z_{1:k})} q(x_k|Z_{1:k}) \, dx_k}{\int \frac{p(x_k|Z_{1:k})}{q(x_k|Z_{1:k})} q(x_k|Z_{1:k}) \, dx_k} = \frac{\int g(x_k) \, w(x_k) \, q(x_k|Z_{1:k}) \, dx_k}{\int w(x_k) \, q(x_k|Z_{1:k}) \, dx_k}
164 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Importance Sampling (IS)
E_{p(x_k|Z_{1:k})}\{g(x_k)\} = \frac{\int g(x_k) \, w(x_k) \, q(x_k|Z_{1:k}) \, dx_k}{\int w(x_k) \, q(x_k|Z_{1:k}) \, dx_k}
Generate (draw) N_s particle samples {x_k^i, i = 1, ..., N_s} from q(x_k|Z_{1:k}):
x_k^i \sim q(x_k|Z_{1:k}), i = 1, ..., N_s
and estimate g(x_k) using a Monte Carlo approximation:
E_{p(x_k|Z_{1:k})}\{g(x_k)\} \approx \frac{\frac{1}{N_s}\sum_{i=1}^{N_s} g(x_k^i) \, w(x_k^i)}{\frac{1}{N_s}\sum_{i=1}^{N_s} w(x_k^i)} = \sum_{i=1}^{N_s} g(x_k^i) \, \tilde{w}_k^i
where the normalized weights are \tilde{w}_k^i := \frac{w(x_k^i)}{\sum_{j=1}^{N_s} w(x_k^j)}
Table of Content
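A minimal numerical sketch of this self-normalized importance-sampling estimate, assuming for illustration a standard normal target density p, a wider Gaussian importance density q, and g(x) = x^2 (all of these choices are illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng(1)
N_s = 10_000

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

p = lambda x: gauss_pdf(x, 0.0, 1.0)      # target density (plays the role of p(x_k | Z_1:k))
q_mu, q_sigma = 0.5, 2.0
q = lambda x: gauss_pdf(x, q_mu, q_sigma) # importance density, easy to sample from
g = lambda x: x ** 2                      # quantity whose expectation we want

x = rng.normal(q_mu, q_sigma, N_s)        # x^i ~ q
w = p(x) / q(x)                           # unnormalized weights w(x^i)
w_tilde = w / w.sum()                     # normalized weights
estimate = np.sum(g(x) * w_tilde)         # ≈ E_p[g(x)] = 1 for this toy example
print(estimate)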
165 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Sampling (SIS)
It would be useful if the importance density could be generated recursively (sequentially).
Using Bayes' rule, with c := 1 / p(z_k|Z_{1:k-1}):
w(x_k) = \frac{p(x_k|Z_{1:k})}{q(x_k|Z_{1:k})} = \frac{p(x_k, z_k|Z_{1:k-1})}{q(x_k|Z_{1:k}) \, p(z_k|Z_{1:k-1})} = \frac{p(z_k|x_k) \, p(x_k|Z_{1:k-1})}{q(x_k|Z_{1:k}) \, p(z_k|Z_{1:k-1})} = \frac{c \, p(z_k|x_k) \, p(x_k|Z_{1:k-1})}{q(x_k|Z_{1:k})}
Using p(x_k, x_{k-1}|Z_{1:k-1}) = p(x_k|x_{k-1}, Z_{1:k-1}) \, p(x_{k-1}|Z_{1:k-1}) (Bayes) we obtain:
p(x_k|Z_{1:k-1}) = \int p(x_k, x_{k-1}|Z_{1:k-1}) \, dx_{k-1} = \int p(x_k|x_{k-1}, Z_{1:k-1}) \, p(x_{k-1}|Z_{1:k-1}) \, dx_{k-1}
In the same way:
q(x_k|Z_{1:k-1}) = \int q(x_k, x_{k-1}|Z_{1:k-1}) \, dx_{k-1} = \int q(x_k|x_{k-1}, Z_{1:k-1}) \, q(x_{k-1}|Z_{1:k-1}) \, dx_{k-1}
Therefore:
w(x_k) = \frac{c \, p(z_k|x_k) \int p(x_k|x_{k-1}, Z_{1:k-1}) \, p(x_{k-1}|Z_{1:k-1}) \, dx_{k-1}}{\int q(x_k|x_{k-1}, Z_{1:k-1}) \, q(x_{k-1}|Z_{1:k-1}) \, dx_{k-1}}
166 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Sampling (SIS) (continue – 1)
It would be useful if the importance density could be generated recursively. We obtained:
w(x_k) = \frac{c \, p(z_k|x_k) \int p(x_k|x_{k-1}, Z_{1:k-1}) \, p(x_{k-1}|Z_{1:k-1}) \, dx_{k-1}}{\int q(x_k|x_{k-1}, Z_{1:k-1}) \, q(x_{k-1}|Z_{1:k-1}) \, dx_{k-1}}
Suppose that at k-1 we have N_s particle samples and their probabilities {x_{k-1|k-1}^i, w_{k-1}^i, i = 1, ..., N_s}, which constitute a random measure that characterizes the posterior PDF for times up to t_{k-1}. Then
p(x_{k-1}|Z_{1:k-1}) \approx \sum_{i=1}^{N_s} p(x_{k-1|k-1}^i|Z_{1:k-1}) \, \delta(x_{k-1} - x_{k-1|k-1}^i)
q(x_{k-1}|Z_{1:k-1}) \approx \sum_{i=1}^{N_s} q(x_{k-1|k-1}^i|Z_{1:k-1}) \, \delta(x_{k-1} - x_{k-1|k-1}^i)
Substituting into w(x_k) (the δ-functions collapse the integrals onto the particles):
w(x_k) = \frac{c \, p(z_k|x_k) \sum_{i=1}^{N_s} p(x_k|x_{k-1|k-1}^i, Z_{1:k-1}) \, p(x_{k-1|k-1}^i|Z_{1:k-1})}{\sum_{i=1}^{N_s} q(x_k|x_{k-1|k-1}^i, Z_{1:k-1}) \, q(x_{k-1|k-1}^i|Z_{1:k-1})}
167 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Sampling (SIS) (continue – 2)
Start from (Bayes):
w(x_k) = \frac{p(x_k|Z_{1:k})}{q(x_k|Z_{1:k})} = \frac{c \, p(z_k|x_k) \, p(x_k|Z_{1:k-1})}{q(x_k|Z_{1:k})}
Using the particle approximations of the previous slide and the substitutions
q(x_k|x_{k-1|k-1}^i, Z_{1:k-1}) = q(x_k|x_{k-1|k-1}^i, z_k),   p(x_k|x_{k-1|k-1}^i, Z_{1:k-1}) = p(x_k|x_{k-1|k-1}^i)
together with the previous-stage weight
w_{k-1}(x_{k-1}) = \frac{p(x_{k-1}|Z_{1:k-1})}{q(x_{k-1}|Z_{1:k-1})}
we obtain, evaluated at the particles x_{k|k}^i:
w_k^i = c \, w_{k-1}^i \, \frac{p(z_k|x_{k|k}^i) \, p(x_{k|k}^i|x_{k-1|k-1}^i)}{q(x_{k|k}^i|x_{k-1|k-1}^i, z_k)}
where we define
w_k^i := w_k(x_{k|k}^i) = \frac{p(x_{k|k}^i|Z_{1:k})}{q(x_{k|k}^i|Z_{1:k})},   w_{k-1}^i := w_{k-1}(x_{k-1|k-1}^i) = \frac{p(x_{k-1|k-1}^i|Z_{1:k-1})}{q(x_{k-1|k-1}^i|Z_{1:k-1})}
168 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Sampling (SIS) (continue – 3)
0 Initialization: Generate (Draw) x_0^i \sim p(x_0), i = 1, ..., N_s
For k \in {1, 2, ...} run:
1 At stage k-1, Generate (Draw) N_s samples w_{k-1}^i \sim p(w_{k-1})
2 State Update: x_k^i = f(x_{k-1}^i, u_{k-1}, w_{k-1}^i), i = 1, ..., N_s
3 Start with the approximation p(x_k|x_{k-1}) \approx \sum_{i=1}^{N_s} \delta(x_k - x_k^i)/N_s; Generate (Draw) N_s samples v_k^i \sim p(v_k), compute z_k^i = h(x_k^i, v_k^i) and approximate p(z_k|x_k^i) \approx \sum_{i=1}^{N_s} \delta(z_k - z_k^i)/N_s
4 After measurement z_k we compute the weights
\tilde{w}_k^i = \tilde{w}_{k-1}^i \, \frac{p(z_k|x_k^i) \, p(x_k^i|x_{k-1}^i)}{q(x_k^i|x_{k-1}^i, Z_{1:k})},   w_k^i = \tilde{w}_k^i / \sum_{j=1}^{N_s} \tilde{w}_k^j
so that p(x_k|Z_{1:k}) \approx \sum_{i=1}^{N_s} w_k^i \, \delta(x_k - x_k^i)
k := k+1 & return to 1
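The steps above can be condensed into a short bootstrap-style SIS sketch in Python/NumPy. Here the importance density is taken as the transition prior, q(x_k|x_{k-1}, z_k) = p(x_k|x_{k-1}), so the weight update reduces to w_k ∝ w_{k-1} p(z_k|x_k); the system functions and noise levels are illustrative assumptions, not the presentation's example.

import numpy as np

rng = np.random.default_rng(2)
N_s = 500
sigma_w, sigma_v = 1.0, 0.5          # assumed process / measurement noise std

def f_sys(x, w):                     # hypothetical dynamics x_k = f(x_{k-1}, w_{k-1})
    return 0.9 * x + w

def h_meas(x):                       # hypothetical measurement function (noise added separately)
    return x

def likelihood(z, x):                # p(z_k | x_k^i) for additive Gaussian measurement noise
    return np.exp(-0.5 * ((z - h_meas(x)) / sigma_v) ** 2)

# 0: initialization
particles = rng.normal(0.0, 1.0, N_s)
weights = np.full(N_s, 1.0 / N_s)

def sis_step(particles, weights, z):
    # 1-2: draw process noise and propagate (sampling from q = p(x_k | x_{k-1}))
    particles = f_sys(particles, rng.normal(0.0, sigma_w, N_s))
    # 4: weight update w_k ∝ w_{k-1} p(z_k | x_k), then normalize
    weights = weights * likelihood(z, particles)
    weights = weights / weights.sum()
    return particles, weights

# one cycle with a made-up measurement
particles, weights = sis_step(particles, weights, z=1.2)
x_hat = np.sum(weights * particles)   # weighted-mean estimate of x_k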
169 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Sampling (SIS) (continue – 4)
The resulting Sequential Importance Sampling (SIS) algorithm is a Monte Carlo method that forms the basis for most sequential Monte Carlo filters. This sequential Monte Carlo method is known variously as:
• Bootstrap Filtering
• Condensation Algorithm
• Particle Filtering
• Interacting Particle Approximation
• Survival of the Fittest
170 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Sampling (SIS) (continue – 5)
Degeneracy Problem
A common problem with the SIS particle filter is the degeneracy phenomenon: after a few iterations, all but one particle have negligible weights. It can be shown that the variance of the importance weights w_k^i of the SIS algorithm can only increase over time, and this leads to the degeneracy problem.
A suitable measure of degeneracy is given by:
\hat{N}_{eff} = \frac{1}{\sum_{i=1}^{N} (w_k^i)^2},   where \sum_{i=1}^{N} w_k^i = 1
To see this, consider the following two cases:
1  w_k^i = 1/N, i = 1, ..., N  =>  \hat{N}_{eff} = 1 / \sum_{i=1}^{N} (1/N)^2 = N
2  w_k^i = 1 for i = j and w_k^i = 0 for i \ne j  =>  \hat{N}_{eff} = 1
Hence a small N_eff indicates severe degeneracy, and vice versa.
Table of Content
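A short sketch of this effective-sample-size test; the threshold value below is an illustrative choice, not prescribed by the slide.

import numpy as np

def effective_sample_size(weights):
    # N_eff = 1 / sum_i (w_i^2) for normalized weights
    return 1.0 / np.sum(np.asarray(weights) ** 2)

weights = np.array([0.7, 0.1, 0.1, 0.05, 0.05])   # already normalized
N_thr = 0.5 * len(weights)                         # example threshold: half the particle count
if effective_sample_size(weights) < N_thr:
    print("severe degeneracy - resample")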
171 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Resampling (SIR)
The Bootstrap (Resampling)
• Popularized by Brad Efron (1979)
• The Bootstrap is a name generically applied to statistical resampling schemes that allow uncertainty in the data to be assessed from the data themselves – in other words, "pulling yourself up by your bootstraps".
The disadvantage of bootstrapping is that, while it is asymptotically consistent (under some conditions), it does not provide general finite-sample guarantees and has a tendency to be overly optimistic. Its apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bootstrap analysis (e.g. independence of samples), where these would be stated more formally in other approaches.
The advantage of bootstrapping over analytical methods is its great simplicity: it is straightforward to apply the bootstrap to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients.
Bradley Efron 1938 Stanford U.
172 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Resampling (SIR) (continue – 1)
Resampling
Whenever a significant degeneracy is observed during sampling (i.e., when N_eff falls below some threshold N_thr), having obtained
p(x_k|Z_{1:k}) \approx \sum_{i=1}^{N} w_k^i \, \delta(x_k - x_k^i)
we need to resample, replacing the weighted representation {x_k^i, w_k^i, i = 1, ..., N} with a random measure of equally weighted particles {x_k^{i*}, 1/N, i = 1, ..., N}.
This is done by first computing the Cumulative Distribution Function (C.D.F.) of the sampled distribution w_k^i:
Initialize the C.D.F.: c_1 = w_k^1
For i = 2:N compute the C.D.F.: c_i = c_{i-1} + w_k^i
173 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Resampling (SIR) (continue – 2)
Resampling (continue – 1)
Using the Inverse Transform method we generate N independent and identically distributed (i.i.d.) variables from the uniform distribution, sort them in ascending order, and compare them with the Cumulative Distribution Function (C.D.F.) of the normalized weights.
174 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Resampling (SIR) (continue – 3)
Resampling Algorithm (continue – 2)
0 Initialize the C.D.F.: c_1 = w_k^1; for i = 2:N compute the C.D.F.: c_i = c_{i-1} + w_k^i
1 Start at the bottom of the C.D.F.: i = 1. Draw a starting point from the uniform distribution: u_1 \sim U[0, N^{-1}]
2 For j = 1:N move along the C.D.F.: u_j = u_1 + (j - 1) N^{-1}
3 WHILE u_j > c_i: i := i + 1; END WHILE
4 Assign sample: x_k^{j*} = x_k^i; assign weight: w_k^j = N^{-1}; assign parent: i^j = i
5 END For
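A compact sketch of this systematic resampling step in Python/NumPy; np.searchsorted plays the role of the explicit WHILE loop over the C.D.F., with the same result as the pseudo-code above.

import numpy as np

def systematic_resample(particles, weights, rng=np.random.default_rng()):
    # Return N equally weighted particles drawn according to the C.D.F. of `weights`.
    N = len(weights)
    cdf = np.cumsum(weights)                 # c_i = c_{i-1} + w_k^i
    cdf[-1] = 1.0                            # guard against round-off
    u = rng.uniform(0.0, 1.0 / N) + np.arange(N) / N   # u_j = u_1 + (j-1)/N, u_1 ~ U[0, 1/N]
    parents = np.searchsorted(cdf, u)        # smallest i with u_j <= c_i
    new_particles = particles[parents]
    new_weights = np.full(N, 1.0 / N)
    return new_particles, new_weights, parents

# toy usage
particles = np.array([0.1, 0.5, 0.9, 1.3])
weights = np.array([0.1, 0.2, 0.6, 0.1])
particles, weights, parents = systematic_resample(particles, weights)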
175 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Resampling (SIR) (continue – 4)
Resampling
0 Start with the approximation p(x_k|Z_{1:k-1}) \approx {x_k^i, N^{-1}} = \sum_{i=1}^{N} \delta(x_k - x_k^i)/N
1 After measurement z_k we compute p(x_k|Z_{1:k}) \approx {x_k^i, \tilde{w}_k^i}, with
\tilde{w}_k^i = \tilde{w}_{k-1}^i \, \frac{p(z_k|x_k^i) \, p(x_k^i|x_{k-1}^i)}{q(x_k^i|x_{k-1}^i, Z_{1:k})}, normalized so that \sum_i \tilde{w}_k^i = 1, and p(x_k|Z_{1:k}) \approx \sum_{i=1}^{N} \tilde{w}_k^i \, \delta(x_k - x_k^i)
2 If N_{eff} = 1/\sum_{i=1}^{N} (w_k^i)^2 < N_{thr}, resample to obtain p(x_k|Z_{1:k}) \approx {x_k^{i*}, N^{-1}}
3 Prediction: x_{k+1}^i = f(x_k^{i*}, u_k, n_k^i) to obtain p(x_{k+1}|Z_{1:k}) \approx {x_{k+1}^i, N^{-1}}
k := k+1 & return to 1
176 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Resampling (SIR) (continue – 5)
Resampling
Although the resampling step reduces the effect of degeneracy, it introduces other practical problems:
1 It limits the possibility of parallel implementation.
2 The particles that have high w_k^i are statistically selected many times. This leads to a loss of diversity among the particles (sample impoverishment).
Several other techniques for generating samples from an unknown P.D.F., besides Importance Sampling, have been presented in the literature. If the P.D.F. is stationary, Markov Chain Monte Carlo (MCMC) methods have been proposed:
• Metropolis – Hastings (MH)
• Gibbs sampler (a special case of MH)
(see the Probability Presentation)
177 SOLO Nonlinear Estimation Using Particle Filters Non-Additive Non-Gaussian Nonlinear Filter
Sequential Importance Resampling (SIR) (continue – 6)
Selection of Importance Density
The choice of the Importance Density q(x_k|x_{k-1}, z_k) is one of the most critical issues in the design of the Particle Filter.
The Optimal Choice
The Optimal Importance Density q(x_k|x_{k-1}, z_k), which minimizes the variance of the importance weights conditioned upon x_{k-1}^i and z_k, has been shown to be (Bayes: Pr(a|b,c) = Pr(b|a,c) Pr(a|c) / Pr(b|c)):
q(x_k|x_{k-1}^i, z_k)_{opt} = p(x_k|x_{k-1}^i, z_k) = \frac{p(z_k|x_k, x_{k-1}^i) \, p(x_k|x_{k-1}^i)}{p(z_k|x_{k-1}^i)}
Substituting this into w_k^i = w_{k-1}^i \, \frac{p(z_k|x_k^i) \, p(x_k^i|x_{k-1}^i)}{q(x_k^i|x_{k-1}^i, z_k)} we obtain:
w_k^i = w_{k-1}^i \, p(z_k|x_{k-1}^i)
From this equation we can see that with the optimal importance density the weights at time k can be computed (and, if necessary, resampling can be performed) before the particles are propagated to time k.
In order to use the optimal importance function we must:
1 sample from p(x_k|x_{k-1}, z_k)
2 evaluate: p(z_k|x_{k-1}^i) = \int p(z_k|x_k) \, p(x_k|x_{k-1}^i) \, dx_k
In the general case either of these two tasks can be difficult.
178 Sequential Importance Resampling Particle Filter (SIRPF) SOLO
SIRPF Summary
System Definition: x_k = f(k-1, x_{k-1}, u_{k-1}, w_{k-1}), x_0 \sim p_x(x_0), w_{k-1} \sim p_w(w_{k-1});  z_k = h(k, x_k, v_k), v_k \sim p_v(v_k)
0 Initialization of SIRPF: \hat{x}_0 \sim p_x(x_0)
For k \in {1, 2, ...}:
1 Assuming for k-1 a Gaussian distribution with mean \hat{x}_{k-1|k-1} and covariance P_{k-1|k-1}, generate N_s samples x_{k-1|k-1}^i \sim N(x_{k-1}; \hat{x}_{k-1|k-1}, P_{k-1|k-1}), i = 1, ..., N_s
2 State Prediction and its Covariance: x_{k|k-1}^i = f(k-1, x_{k-1|k-1}^i, u_{k-1}), i = 1, ..., N_s
3 Assuming a Gaussian distribution with mean \hat{x}_{k|k-1} and covariance P_{k|k-1}, generate (draw) new N_s samples x_{k|k-1}^j \sim N(x_k; \hat{x}_{k|k-1}, P_{k|k-1}), j = 1, ..., N_s
Table of Content
179 Monte Carlo Particle Filter (MCPF) SOLO
MCPF Summary
System Definition: x_k = f(k-1, x_{k-1}, u_{k-1}) + w_{k-1}, E{w_k} = 0, E{w_k w_l^T} = Q_k \delta_{kl};  z_k = h(k, x_k) + v_k, E{v_k} = 0, E{v_k v_l^T} = R_k \delta_{kl}
0 Initialization of MCPF: \hat{x}_0 = E{x_0},  P_{0|0} = E{(x_0 - \hat{x}_0)(x_0 - \hat{x}_0)^T}
Augmented state x^a := [x^T w^T v^T]^T:  \hat{x}_0^a = E{x_0^a} = [\hat{x}_0^T 0 0]^T,  P_{0|0}^a = E{(x_0^a - \hat{x}_0^a)(x_0^a - \hat{x}_0^a)^T} = diag(P_{0|0}, Q, R)
For k \in {1, 2, ...}:
1 Assuming for k-1 a Gaussian distribution with mean \hat{x}_{k-1|k-1} and covariance P_{k-1|k-1}, generate N_s samples x_{k-1|k-1}^i \sim N(x_{k-1}; \hat{x}_{k-1|k-1}, P_{k-1|k-1}), i = 1, ..., N_s
2 State Prediction and its Covariance:
x_{k|k-1}^i = f(k-1, x_{k-1|k-1}^i, u_{k-1}),  \hat{x}_{k|k-1} = \frac{1}{N_s}\sum_{i=1}^{N_s} x_{k|k-1}^i,  P_{k|k-1} = \frac{1}{N_s}\sum_{i=1}^{N_s} x_{k|k-1}^i (x_{k|k-1}^i)^T - \hat{x}_{k|k-1} \hat{x}_{k|k-1}^T
3 Assuming a Gaussian distribution with mean \hat{x}_{k|k-1} and covariance P_{k|k-1}, generate new N_s samples x_{k|k-1}^j \sim N(x_k; \hat{x}_{k|k-1}, P_{k|k-1}), j = 1, ..., N_s
180 Monte Carlo Particle Filter (MCPF) SOLO
MCPF Summary (continue – 1)
4 Measurement Prediction: z_{k|k-1}^j = h(k, x_{k|k-1}^j), j = 1, ..., N_s,  \hat{z}_{k|k-1} = \frac{1}{N_s}\sum_{j=1}^{N_s} z_{k|k-1}^j
5 Predicted Covariances Computations:
S_k = P_{k|k-1}^{zz} = \frac{1}{N_s}\sum_{j=1}^{N_s} (z_{k|k-1}^j - \hat{z}_{k|k-1})(z_{k|k-1}^j - \hat{z}_{k|k-1})^T
P_{k|k-1}^{xz} = \frac{1}{N_s}\sum_{j=1}^{N_s} (x_{k|k-1}^j - \hat{x}_{k|k-1})(z_{k|k-1}^j - \hat{z}_{k|k-1})^T
6 Innovation and its Covariance: i_k = z_k - \hat{z}_{k|k-1},  S_k
7 Kalman Gain Computation: K_k = P_{k|k-1}^{xz} (P_{k|k-1}^{zz})^{-1}
8 Kalman Filter update: \mu_{k|k}^x = \hat{x}_{k|k-1} + K_k i_k,  \Sigma_{k|k}^{xx} = P_{k|k-1} - K_k S_k K_k^T
9 Importance Sampling using the Gaussian mean and covariance: generate new N_s samples x_{k|k}^m \sim N(x_k; \mu_{k|k}^x, \Sigma_{k|k}^{xx}), m = 1, ..., N_s
10 Weight Update:
\tilde{w}_k^m = \frac{p(z_k|x_{k|k}^m) \, N(x_{k|k}^m; \hat{x}_{k|k-1}, P_{k|k-1})}{N(x_{k|k}^m; \mu_{k|k}^x, \Sigma_{k|k}^{xx})},  w_k^m = \tilde{w}_k^m / \sum_{l=1}^{N_s} \tilde{w}_k^l, m = 1, ..., N_s
11 Update State and its Covariance:
\hat{x}_{k|k} = \frac{1}{N_s}\sum_{m=1}^{N_s} w_k^m x_{k|k}^m,  P_{k|k} = \frac{1}{N_s}\sum_{m=1}^{N_s} (x_{k|k}^m - \hat{x}_{k|k})(x_{k|k}^m - \hat{x}_{k|k})^T
k := k+1 & return to 1
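A sketch of the sample-based measurement update of steps 4-8 in Python/NumPy, assuming the predicted state particles and their predicted measurements are stacked as arrays; the toy dimensions and data at the bottom are illustrative only.

import numpy as np

def mc_measurement_update(x_pred, z_pred, z_meas, P_pred):
    # Steps 4-8: sample means, covariances, gain and Gaussian update.
    # x_pred: (N_s, n) predicted state particles; z_pred: (N_s, m) predicted measurements.
    x_hat = x_pred.mean(axis=0)
    z_hat = z_pred.mean(axis=0)
    dx = x_pred - x_hat
    dz = z_pred - z_hat
    N_s = x_pred.shape[0]
    S = dz.T @ dz / N_s                   # P^zz (innovation covariance)
    P_xz = dx.T @ dz / N_s                # cross covariance
    K = P_xz @ np.linalg.inv(S)           # Kalman gain
    mu = x_hat + K @ (z_meas - z_hat)     # updated mean
    Sigma = P_pred - K @ S @ K.T          # updated covariance
    return mu, Sigma

# toy usage with random 2-state / 1-measurement particles
rng = np.random.default_rng(3)
x_pred = rng.normal(size=(500, 2))
z_pred = x_pred[:, :1] + rng.normal(scale=0.5, size=(500, 1))
P_pred = np.cov(x_pred.T)
mu, Sigma = mc_measurement_update(x_pred, z_pred, np.array([0.3]), P_pred)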
181 SOLO Monte Carlo Particle Filter (MCPF)
[Block diagram of a tracking system with the blocks: Sensor Data Processing and Measurement Formation; Observation-to-Track Association; Input Data; Track Maintenance (Initialization, Confirmation and Deletion); Filtering and Prediction; Gating Computations]
Samuel S. Blackman, "Multiple-Target Tracking with Radar Applications", Artech House, 1986
Samuel S. Blackman, Robert Popoli, "Design and Analysis of Modern Tracking Systems", Artech House, 1999
Table of Content
182 Estimators SOLO
Maximum Likelihood Estimate (MLE)
For the particular vector measurement equation z = H x + v, where the measurement noise v \sim N(0, R) is Gaussian with zero mean and independent of x, the conditional probability p_{z|x}(z|x) can be written, using Bayes' rule, as:
p_{z|x}(z|x) = \frac{p_{z,x}(z, x)}{p_x(x)}
The joint probability of x and z is given by p_{z,x}(z, x) = p_{x,v}(x, v), and since the measurement noise v is independent of x: p_{x,v}(x, v) = p_x(x) \cdot p_v(v).
The measurement noise can be related to x and z by the function v = z - H x, and the Jacobian of this transformation is the identity:
J = \frac{\partial v}{\partial z} = I_{p \times p},  so that  p_{z,x}(z, x) = p_{x,v}(x, z - H x) / |J J^T|^{1/2} = p_x(x) \, p_v(z - H x)
183 Estimators SOLO
Maximum Likelihood Estimate (continue – 1)
p_{z|x}(z|x) = \frac{p_{z,x}(z, x)}{p_x(x)} = p_v(z - H x) = \frac{1}{(2\pi)^{p/2} |R|^{1/2}} \exp\{-\frac{1}{2}(z - H x)^T R^{-1} (z - H x)\}   (Gaussian, zero mean)
L(z, x) := p_{z|x}(z|x) is called the Likelihood Function and is a measure of how likely the parameter x is, given the observation z.
\max_x p_{z|x}(z|x) \Leftrightarrow \min_x (z - H x)^T R^{-1} (z - H x)   (Weighted Least Squares with W = R^{-1})
\frac{\partial}{\partial x}[(z - H x)^T R^{-1} (z - H x)] = -2 H^T R^{-1} (z - H x) = 0
H^T R^{-1} z - H^T R^{-1} H x^* = 0
\hat{x} = x^* := (H^T R^{-1} H)^{-1} H^T R^{-1} z
\frac{\partial^2}{\partial x^2}[(z - H x)^T R^{-1} (z - H x)] = 2 H^T R^{-1} H
This is a positive definite matrix; therefore the solution minimizes (z - H x)^T R^{-1} (z - H x) and maximizes p_{z|x}(z|x).
Table of Content
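A small numerical sketch of this MLE / weighted least squares solution x* = (H^T R^{-1} H)^{-1} H^T R^{-1} z; the matrices and measurements below are arbitrary illustrative values.

import numpy as np

H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])            # 3 measurements of a 2-dimensional parameter
R = np.diag([0.5, 1.0, 2.0])          # measurement noise covariance
z = np.array([1.1, 2.0, 2.8])         # observed measurements

R_inv = np.linalg.inv(R)
x_mle = np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ z)   # x* = (H^T R^-1 H)^-1 H^T R^-1 z
print(x_mle)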
184 Estimators SOLO
Bayesian Maximum Likelihood Estimate (Maximum Aposteriori – MAP Estimate)
Consider a Gaussian vector x \sim N(\bar{x}(-), P(-)) and the measurement z = H x + v, where the Gaussian noise v \sim N(0, R) is independent of x.
p_x(x) = \frac{1}{(2\pi)^{n/2} |P(-)|^{1/2}} \exp\{-\frac{1}{2}(x - \bar{x}(-))^T P(-)^{-1} (x - \bar{x}(-))\}
p_{z|x}(z|x) = p_v(z - H x) = \frac{1}{(2\pi)^{p/2} |R|^{1/2}} \exp\{-\frac{1}{2}(z - H x)^T R^{-1} (z - H x)\}
p_z(z) = \int_{-\infty}^{+\infty} p_{z,x}(z, x) \, dx = \int_{-\infty}^{+\infty} p_{z|x}(z|x) \, p_x(x) \, dx
p_z(z) is Gaussian with
E(z) = E(H x + v) = H E(x) + E(v) = H \bar{x}(-)
cov(z) = E\{[z - E(z)][z - E(z)]^T\} = E\{[H(x - \bar{x}(-)) + v][H(x - \bar{x}(-)) + v]^T\} = H P(-) H^T + R
p_z(z) = \frac{1}{(2\pi)^{p/2} |H P(-) H^T + R|^{1/2}} \exp\{-\frac{1}{2}[z - H \bar{x}(-)]^T [H P(-) H^T + R]^{-1} [z - H \bar{x}(-)]\}
185 Estimators SOLO
Bayesian Maximum Likelihood Estimate (Maximum Aposteriori Estimate) (continue – 1)
With p_x(x), p_{z|x}(z|x) and p_z(z) as on the previous slide, Bayes' rule gives the posterior:
p_{x|z}(x|z) = \frac{p_{z|x}(z|x) \, p_x(x)}{p_z(z)} = \frac{|H P(-) H^T + R|^{1/2}}{(2\pi)^{n/2} |R|^{1/2} |P(-)|^{1/2}} \exp\{-\frac{1}{2}[(z - H x)^T R^{-1} (z - H x) + (x - \bar{x}(-))^T P(-)^{-1} (x - \bar{x}(-)) - (z - H \bar{x}(-))^T (H P(-) H^T + R)^{-1} (z - H \bar{x}(-))]\}
186 Estimators SOLO
Bayesian Maximum Likelihood Estimate (Maximum Aposteriori Estimate) (continue – 2)
Complete the square in the exponent
(z - H x)^T R^{-1} (z - H x) + (x - \bar{x}(-))^T P(-)^{-1} (x - \bar{x}(-)) - (z - H \bar{x}(-))^T (H P(-) H^T + R)^{-1} (z - H \bar{x}(-))
by defining the posterior covariance
P(+) := [P(-)^{-1} + H^T R^{-1} H]^{-1}
and using the matrix identity R^{-1} - R^{-1} H P(+) H^T R^{-1} = [H P(-) H^T + R]^{-1}. The three quadratic forms then collect into a single one:
[x - \bar{x}(-) - P(+) H^T R^{-1}(z - H \bar{x}(-))]^T P(+)^{-1} [x - \bar{x}(-) - P(+) H^T R^{-1}(z - H \bar{x}(-))]
so that
p_{x|z}(x|z) = \frac{1}{(2\pi)^{n/2} |P(+)|^{1/2}} \exp\{-\frac{1}{2}[x - \bar{x}(-) - P(+) H^T R^{-1}(z - H \bar{x}(-))]^T P(+)^{-1} [x - \bar{x}(-) - P(+) H^T R^{-1}(z - H \bar{x}(-))]\}
187 Estimators SOLO
Bayesian Maximum Likelihood Estimate (Maximum Aposteriori Estimate) (continue – 3)
We then have
p_{x|z}(x|z) = \frac{1}{(2\pi)^{n/2} |P(+)|^{1/2}} \exp\{-\frac{1}{2}[x - \bar{x}(+)]^T P(+)^{-1} [x - \bar{x}(+)]\}
where:
P(+) := [P(-)^{-1} + H^T R^{-1} H]^{-1}
\bar{x}(+) = x^* := \arg\max_x p_{x|z}(x|z) = \bar{x}(-) + P(+) H^T R^{-1} [z - H \bar{x}(-)]
Table of Content
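A numerical sketch of this MAP update (posterior mean x̄(+) and covariance P(+)); the prior, H, R and z below are arbitrary illustrative values.

import numpy as np

# prior x ~ N(x_bar, P_minus), measurement z = H x + v, v ~ N(0, R)
x_bar = np.array([0.0, 1.0])
P_minus = np.diag([4.0, 2.0])
H = np.array([[1.0, 0.5]])
R = np.array([[0.25]])
z = np.array([1.4])

P_plus = np.linalg.inv(np.linalg.inv(P_minus) + H.T @ np.linalg.inv(R) @ H)
x_map = x_bar + P_plus @ H.T @ np.linalg.inv(R) @ (z - H @ x_bar)
print(x_map, P_plus)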
SOLO Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation
Fred Daum, from Raytheon Company, leads methods to design Nonlinear Filters starting from the Fokker-Planck Equation.
A continuous dynamic system is described by:
d x(t) = f[x(t), t] \, dt + d w(t),  t \in [t_0, t_f]
x(t) – n-dimensional state vector
d w(t) – n-dimensional process noise vector, described by the covariance matrix Q:
E\{d w(t)\} =: \hat{dw}(t) = 0,  E\{[dw(t) - \hat{dw}(t)][dw(\tau) - \hat{dw}(\tau)]^T\} = Q(t) \, \delta(t - \tau)
p[x(t)] – the probability density of the state x at time t
The time evolution of the probability density function is described by the Fokker–Planck equation:
\frac{\partial p[x(t)]}{\partial t} = -\frac{\partial (f[x(t), t] \, p[x(t)])}{\partial x} + \frac{1}{2}\frac{\partial}{\partial x}\left(Q(t) \frac{\partial p[x(t)]}{\partial x}\right)
where
\frac{\partial (f[x(t), t] \, p[x(t)])}{\partial x} = \sum_{i=1}^{n} \frac{\partial (f_i[x(t), t] \, p[x(t)])}{\partial x_i},   \frac{\partial p[x(t)]}{\partial x} = \left[\frac{\partial p[x(t)]}{\partial x_1}, \frac{\partial p[x(t)]}{\partial x_2}, ..., \frac{\partial p[x(t)]}{\partial x_n}\right]^T
Fred Daum
Return to Stochastic Processes
SOLO Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation
Assume system measurements at discrete times t_k given by:
z(t_k) = h(x(t_k), v_k, t_k),  t_k \in [t_0, t_f]
v_k – m-dimensional measurement noise vector at t_k
We are interested in the probability of the state x at time t given the set of discrete measurements up to and including time t_k < t: p(x, t|Z_k), where Z_k = {z_1, z_2, ..., z_k} is the set of all measurements up to and including time t_k.
Bayes' Rule:
p(x, t_k|Z_k) = p(x, t_k|z_k, Z_{k-1}) = \frac{p(z_k|x, t_k) \, p(x, t_k|Z_{k-1})}{p(z_k|Z_{k-1})}
p(x, t_k|Z_{k-1}) – probability of x at time t_k given Z_{k-1} (a priori – before measurement z_k)
p(x, t_k|Z_k) – probability of x at time t_k given Z_k (a posteriori – after measurement z_k)
p(z_k|x, t_k) – probability of measurement z_k given the state x at time t_k (likelihood of the measurement)
p(z_k|Z_{k-1}) – probability of measurement z_k given Z_{k-1} (a priori – before measurement z_k; normalization of the conditional probability)
SOLO Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation
The Particle Filter solutions have implementation problems. The number of particles necessary to reduce the filter error increases with the system dimension. Daum gives the filter error as a function of the number of particles, with the system dimension as a parameter.
In the Classical Particle Filter solution the particles are drawn using the a-priori density, which decides their distribution (see Figure). After the measurement, the Likelihood of the Measurement is obtained, and nothing prevents a low density of the previously drawn particles in the likelihood region. This is Particle Degeneracy, which produces the curse of dimensionality.
[Figure: prior density; particles drawn to represent the prior density; likelihood of the measurement – particle degeneracy as a cause of the curse of dimensionality]
Fred Daum
http://sc.enseeiht.fr/doc/Seminar_Daum_2012_2.pdf
SOLO Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation
Here we describe Daum's proposed methods, called Particle Flow Filters.
By taking the natural logarithm of the conditional probability we get on the right side a sum of logarithms:
\ln p(x, t_k|Z_k) [a posteriori] = \ln p(x, t_k|Z_{k-1}) [a priori] + \ln p(z_k|x, t_k) [likelihood] - \ln p(z_k|Z_{k-1}) [normalization]
To obtain the a-posteriori probability p(x, t_k|Z_k) from the a-priori probability p(x, t_k|Z_{k-1}) and the likelihood p(z_k|x, t_k), Daum uses a homotopy procedure (see next slide), choosing a continuous homotopy parameter \lambda \in [0, 1]:
\ln p(x, \lambda) = \ln g(x) [a priori] + \lambda \ln h(x) [likelihood] - \ln K(\lambda) [normalization]
He searches for a function f(x, \lambda) (not related to the filtered system) that describes the flow of the particles and is associated with p(x, t_k|Z_k):
Particle Flow Equation:  \frac{dx}{d\lambda} = f(x, \lambda) + Q(x, \lambda) \frac{dw}{d\lambda},   with Q(x, \lambda) a noise spectrum to be defined.
Since p(x, \lambda) is the p.d.f. associated with a system defined by f(x, \lambda), we have the Fokker-Planck equation:
\frac{\partial p(x, \lambda)}{\partial \lambda} = -\frac{\partial (f(x, \lambda) \, p(x, \lambda))}{\partial x} + \frac{1}{2}\frac{\partial}{\partial x}\left(Q(x, \lambda) \frac{\partial p(x, \lambda)}{\partial x}\right)
[Figure: induced flow of particles for Bayes' rule – sample from the a-priori density, flow of density / flow of particles, sample from the a-posteriori density]
192 SOLO Homotopy
In topology, two continuous functions from one topological space to another are called homotopic (Greek ὁμός (homós) = same, similar, and τόπος (tópos) = place) if one can be "continuously deformed" into the other, such a deformation being called a homotopy between the two functions. An outstanding use of homotopy is the definition of homotopy groups and cohomotopy groups, important invariants in algebraic topology.
[Figure: a homotopy of a coffee cup into a doughnut]
Formal definition
Formally, a homotopy between two continuous functions f and g from a topological space X to a topological space Y is defined to be a continuous function H : X × [0,1] → Y from the product of the space X with the unit interval [0,1] to Y such that, if x ∈ X, then H(x,0) = f(x) and H(x,1) = g(x).
If we think of the second parameter of H as time, then H describes a continuous deformation of f into g: at time 0 we have the function f and at time 1 we have the function g.
An alternative notation is to say that a homotopy between two continuous functions f, g : X → Y is a family of continuous functions h_t : X → Y for t ∈ [0,1] such that h_0 = f and h_1 = g, and the map t ↦ h_t is continuous from [0,1] to the space of all continuous functions X → Y. The two versions coincide by setting h_t(x) = H(x,t).
SOLO Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation
Fokker-Planck equation:
\frac{\partial p(x, \lambda)}{\partial \lambda} = -\frac{\partial (f(x, \lambda) \, p(x, \lambda))}{\partial x} + \frac{1}{2}\frac{\partial}{\partial x}\left(Q(x, \lambda) \frac{\partial p(x, \lambda)}{\partial x}\right)
Definition of p(x, \lambda):  \ln p(x, \lambda) = \ln g(x) + \lambda \ln h(x) - \ln K(\lambda)
From the definition, \frac{\partial \ln p(x, \lambda)}{\partial \lambda} = \ln h(x) - \frac{d \ln K(\lambda)}{d\lambda}, and since \frac{\partial p}{\partial \lambda} = p \frac{\partial \ln p}{\partial \lambda}, we have the partial differential equation for f given p:
\left(\ln h(x) - \frac{d \ln K(\lambda)}{d\lambda}\right) p(x, \lambda) = -\frac{\partial (f(x, \lambda) \, p(x, \lambda))}{\partial x} + \frac{1}{2}\frac{\partial}{\partial x}\left(Q(x, \lambda) \frac{\partial p(x, \lambda)}{\partial x}\right)
Dividing by p(x, \lambda):
\ln h(x) - \frac{d \ln K(\lambda)}{d\lambda} = -f(x, \lambda)\frac{\partial \ln p(x, \lambda)}{\partial x} - \frac{\partial f(x, \lambda)}{\partial x} + \frac{1}{2 \, p(x, \lambda)}\frac{\partial}{\partial x}\left(Q(x, \lambda) \frac{\partial p(x, \lambda)}{\partial x}\right)
Fred Daum
SOLO Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation
\ln h(x) - \frac{d \ln K(\lambda)}{d\lambda} = -f(x, \lambda)\frac{\partial \ln p(x, \lambda)}{\partial x} - \frac{\partial f(x, \lambda)}{\partial x} + \frac{1}{2 \, p(x, \lambda)}\frac{\partial}{\partial x}\left(Q(x, \lambda) \frac{\partial p(x, \lambda)}{\partial x}\right)
Differentiate this equation with respect to x:
\frac{\partial \ln h(x)}{\partial x} = -f^T(x, \lambda)\frac{\partial^2 \ln p(x, \lambda)}{\partial x^2} - \frac{\partial \ln p(x, \lambda)}{\partial x}\frac{\partial f(x, \lambda)}{\partial x} - \frac{\partial^2 f(x, \lambda)}{\partial x \, \partial x} + \frac{\partial}{\partial x}\left[\frac{1}{2 \, p(x, \lambda)}\frac{\partial}{\partial x}\left(Q(x, \lambda) \frac{\partial p(x, \lambda)}{\partial x}\right)\right]
One option to simplify the problem is to choose Q(x, \lambda) such that:
-\frac{\partial \ln p(x, \lambda)}{\partial x}\frac{\partial f(x, \lambda)}{\partial x} - \frac{\partial^2 f(x, \lambda)}{\partial x \, \partial x} + \frac{\partial}{\partial x}\left[\frac{1}{2 \, p(x, \lambda)}\frac{\partial}{\partial x}\left(Q(x, \lambda) \frac{\partial p(x, \lambda)}{\partial x}\right)\right] = 0
We then obtain:
\frac{\partial \ln h(x)}{\partial x} = -f^T(x, \lambda)\frac{\partial^2 \ln p(x, \lambda)}{\partial x^2}
f(x, \lambda) = -\left(\frac{\partial^2 \ln p(x, \lambda)}{\partial x^2}\right)^{-1}\left(\frac{\partial \ln h(x)}{\partial x}\right)^T
Fred Daum
SOLO Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation
A second option to simplify the problem is to choose Q(x, \lambda) = 0.
Particle Flow Equation:  \frac{dx}{d\lambda} = f(x, \lambda)
Fokker-Planck equation:  \frac{\partial p(x, \lambda)}{\partial \lambda} = -\frac{\partial (f(x, \lambda) \, p(x, \lambda))}{\partial x}
Definition of p(x, \lambda):  \ln p(x, \lambda) = \ln g(x) + \lambda \ln h(x) - \ln K(\lambda)
Applying d/d\lambda to the definition and combining with the Fokker-Planck equation gives the P.D.E. for f given p:
\left(\ln h(x) - \frac{d \ln K(\lambda)}{d\lambda}\right) p(x, \lambda) = -\frac{\partial}{\partial x}[f(x, \lambda) \, p(x, \lambda)]
Define (p and \ln K are known):
\eta(x, \lambda) := -\left(\ln h(x) - \frac{d \ln K(\lambda)}{d\lambda}\right) p(x, \lambda)
We obtain the P.D.E. for f given p:
\frac{\partial}{\partial x}[p(x, \lambda) \, f(x, \lambda)] = \eta(x, \lambda)
Fred Daum
SOLO Nonlinear Filters
Nonlinear Filters based on the Fokker-Planck Equation
Second option, Q(x, \lambda) = 0 (continued). With q := p f we obtain:
\frac{\partial q(x, \lambda)}{\partial x} = \frac{\partial q_1}{\partial x_1} + \frac{\partial q_2}{\partial x_2} + ... + \frac{\partial q_n}{\partial x_n} = \eta(x, \lambda)
where f is the unknown function and p & \eta are known at random points.
1. Linear PDE in the unknown f or q.
2. Constant-coefficient PDE in q.
3. First-order PDE.
4. Highly underdetermined PDE.
5. Same form as the Gauss divergence law in Maxwell's Equations.
6. Same form as Euler's Equation in Fluid Dynamics.
7. Existence of solutions if and only if the integral of \eta is zero.
Exact Flow Solution for Gaussian densities g & h:
f(x, \lambda) = A(\lambda) x + b(\lambda)
A(\lambda) = -\frac{1}{2} P H^T (\lambda H P H^T + R)^{-1} H
b(\lambda) = (I + 2\lambda A)[(I + \lambda A) P H^T R^{-1} z + A \bar{x}]
Automatically stable under very mild conditions & extremely fast.
Fred Daum
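A small sketch of this exact Gaussian particle flow, moving prior samples toward the posterior by Euler steps in λ. The prior, measurement model and step count below are illustrative assumptions, and practical Daum-Huang implementation details (e.g. re-estimating P and the linearization for nonlinear h) are not addressed here.

import numpy as np

rng = np.random.default_rng(4)
n_particles, n_lambda = 1000, 20

# prior N(x_bar, P), linear measurement z = H x + v, v ~ N(0, R)
x_bar = np.array([0.0, 0.0])
P = np.diag([4.0, 1.0])
H = np.array([[1.0, 1.0]])
R = np.array([[0.5]])
z = np.array([2.0])

particles = rng.multivariate_normal(x_bar, P, n_particles)   # samples from the prior
d_lambda = 1.0 / n_lambda

for step in range(n_lambda):
    lam = (step + 0.5) * d_lambda
    A = -0.5 * P @ H.T @ np.linalg.inv(lam * H @ P @ H.T + R) @ H
    b = (np.eye(2) + 2 * lam * A) @ ((np.eye(2) + lam * A) @ P @ H.T @ np.linalg.inv(R) @ z + A @ x_bar)
    # dx/dlambda = A(lambda) x + b(lambda), integrated with a simple Euler step
    particles = particles + d_lambda * (particles @ A.T + b)

print(particles.mean(axis=0))   # should be close to the Kalman posterior mean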
Fred Daum SOLO Nonlinear Filters
F. Daum, J. Huang, "Particle Flow for Nonlinear Filters, Bayesian Decisions and Transport", 7 April 2014
SOLO Nonlinear Filters Fred Daum Table of Content
199 SOLO Recursive Bayesian Estimation
References:
1. Sage, A.P., & Melsa, J.L., "Estimation Theory with Applications to Communications and Control", McGraw Hill, 1971
2. Gordon, N.J., Salmond, D.J., Smith, A.M.F., "Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation", IEE Proceedings Radar and Signal Processing, Vol. 140, No. 2, April 1993, pp. 107-113
3. Doucet, A., de Freitas, N., Gordon, N., Ed., "Sequential Monte Carlo Methods in Practice", Springer, 2001
4. Karlsson, R., "Simulation Based Methods for Target Tracking", Department of Electrical Engineering, Linköpings Universitet, 2002
5. Arulampalam, S., Maskell, S., Gordon, N., Clapp, T., "A Tutorial on Particle Filters for On-line Non-linear/Non-Gaussian Bayesian Tracking", IEEE Transactions on Signal Processing, Vol. 50, No. 2, February 2002
6. Ristic, B., Arulampalam, S., Gordon, N., "Beyond the Kalman Filter: Particle Filters for Tracking Applications", Artech House, 2004
7. Haug, A.J., "A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes", MITRE Corporation, January 2005
200 SOLO Recursive Bayesian Estimation
References (continue – 1):
Fred Daum, "Particle Flow for Nonlinear Filters", 19 July 2012, http://sc.enseeiht.fr/doc/Seminar_Daum_2012_2.pdf
Fred Daum, Misha Krichman, "Non-Particle Filters", https://www.ll.mit.edu/asap/asap_06/pdf/Papers/23_Daum_Pa.pdf
F. Daum, J. Huang, "Particle Flow for Nonlinear Filters, Bayesian Decisions and Transport", 7 April 2014, http://meeting.xidian.edu.cn/workshop/miis2014/uploads/files/July-5th-930am_Fred%20Daum_Particle%20flow%20for%20nonliner%20filters,%20Bayesuan%20Decisions%20and%20Transport%20.pdf
Zhe Chen, "Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond", 18.05.06, http://www.dsi.unifi.it/users/chisci/idfric/Nonlinear_filtering_Chen.pdf
Table of Content
    201January 13, 2015201 SOLO Technion Israeli Institute of Technology 1964 – 1968 BSc EE 1968 – 1971 MSc EE Israeli Air Force 1970 – 1974 RAFAEL Israeli Armament Development Authority 1974 – 2013 Stanford University 1983 – 1986 PhD AA
    202 “Proceedings of theIEEE”, March 2004, Special Issue on: “Sequential State Estimation: From Kalman Filters to Particle Filters” Julier, S.,J. and Uhlmann, J.,K., “Unscented Filtering and Nonlinear Estimation”, pp.401 - 422 Recursive Bayesian Estimation
203 SOLO Recursive Bayesian Estimation
Neil Gordon, M. Sanjeev Arulampalam, Tim Clapp, Simon Maskell, Nando de Freitas, Arnaud Doucet, Branko Ristic, Genshiro Kitagawa, Christophe Andrieu, Dan Crişan, Fred Daum
204 Markov Chain Monte Carlo (MCMC) SOLO
Some MCMC Developments Related to Vision
Nicholas Constantine Metropolis (1915 – 1999)
Metropolis 1946; Hastings 1970; Heat bath, Miller, Grenander, 1994; Green 1995; DDMCMC 2001 - 2005; Waltz 1972 (labeling); Rosenfeld, Hummel, Zucker 1976 (relaxation); Geman brothers 1984 (Gibbs sampler); Kirkpatrick 1983; Swendsen-Wang 1987 (clustering); Swendsen-Wang Cut 2003
205 Markov Chain Monte Carlo (MCMC) SOLO
A Brief History of MCMC
Nicholas Constantine Metropolis (1915 – 1999)
1942 – 1946: Real use of Monte Carlo started during WWII – study of the atomic bomb (neutron diffusion in fissile material).
1948: Fermi, Metropolis, Ulam obtained Monte Carlo estimates for the eigenvalues of the Schrödinger equation.
1950s: Formulation of the basic construction of MCMC, e.g. the Metropolis method – application to statistical physics models, such as the Ising model.
1960 - 80: Using MCMC to study phase transitions; material growth/defects, macromolecules (polymers), etc.
1980s: Gibbs sampler (Geman brothers), simulated annealing, data augmentation, Swendsen-Wang, etc.; global optimization; image and speech; quantum field theory.
1990s: Applications in genetics; computational biology.
206 Rao – Blackwell Theorem SOLO
The Rao-Blackwell Theorem provides a process by which a possible improvement in the efficiency of an estimator can be obtained by taking its conditional expectation with respect to a sufficient statistic. The result for one parameter appeared in Rao (1945) and in Blackwell (1947). Lehmann and Scheffé (1950) called the result the Rao-Blackwell Theorem (RBT), and the process is described as Rao-Blackwellization (RB) by Berkson (1955). In computational terminology it is called the Rao-Blackwellized Filter (RBF).
Calyampudi Radhakrishna Rao and David Blackwell
The Rao – Blackwell Theorem states that if g(x) is any kind of estimator of a parameter θ, then the conditional expectation of g(x) given T(x), where T(x) is a sufficient statistic, is typically a better estimator of θ, and is never worse.
Let x = (x_1, ..., x_n) be a random sample from a probability distribution p(x, θ), where θ = (θ_1, ..., θ_q) is an unknown vector parameter. Consider an estimator g(x) = (g_1(x), ..., g_q(x)) of θ and the q×q mean square and product matrix C(g):
C(g) = (c_{ij}) = ( E{[g_i(x) - θ_i][g_j(x) - θ_j]} )
Let S be a sufficient statistic, which may be vector valued, such that the conditional expectation E{g|S} = T(x) does not depend on θ. A general version of Rao – Blackwell is:
C(g) – C(T) is nonnegative definite
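For a scalar estimator, the reason the conditioned estimator can never be worse is the law of total variance; this standard identity is added here for completeness (it is not spelled out on the slide):

% Law of total variance applied to the estimator g(x) and the sufficient statistic S,
% with T(x) = E[g(x) | S] having the same bias as g(x):
\operatorname{Var}\bigl(g(x)\bigr)
  = \operatorname{Var}\bigl(\mathbb{E}[\,g(x)\mid S\,]\bigr)
  + \mathbb{E}\bigl[\operatorname{Var}\bigl(g(x)\mid S\bigr)\bigr]
  \;\ge\; \operatorname{Var}\bigl(T(x)\bigr)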
    208 SOLO Non-Gaussian DistributionApproximation http://www.dsi.unifi.it/users/chisci/idfric/Nonlinear_filtering_Chen.pdf

Editor's Notes

  • #11 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #16 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.147-148
  • #17 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.126-132
  • #18 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.126-132
  • #27 John Minkoff, “Signals, Noise, and Active Sensors - Radar, Sonar, Laser Radar”
  • #28 A. Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.147-148
  • #30 A.Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.263-266 http://en.wikipedia.org/wiki/Law_of_large_numbers
  • #31 Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.263-266 http://en.wikipedia.org/wiki/Law_of_large_numbers
  • #32 A.Papoulis, “Probability, Random Variables and Stochastic Processes”,McGraw-Hill, 1965, pp.260-263 http://en.wikipedia.org/wiki/Law_of_large_numbers
  • #33 http://en.wikipedia.org/wiki/Central_limit_theorem
  • #39 A. Papoulis, “ Probability, Random Variables and StochasticProcesses”, McGraw-Hill, 1965, pp.99-100
  • #40 A. Papoulis, “ Probability, Random Variables and StochasticProcesses”, McGraw-Hill, 1965, pp.169
  • #41 http://en.wikipedia.org/wiki/Monte_Carlo_sampling http://www.lanl.gov/news/pdf/Metropolis_bio.pdf
  • #43 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pg.147, Prob. 4-10
  • #44 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pg.147, Problem 4-10
  • #45 A. Gelb, Ed., “Applied Optimal Estimation”,MIT Press, 1974, pg.147, Problem 4-10
  • #46 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #47 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #48 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #49 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #50 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #51 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #52 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #53 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #54 Taylor, J., H., “Handbook of the Direct Statistical Analysis of Missile Guidance Systems via CADET”,“ The Analytic Sciences Corporation”, NTIS, AD-A013 397, 31 May 1975, Appendix C, “The Monte-Carlo Method: Application and Reliability”
  • #55 Bar-Shalom, Y., Xiao-Rong, L., “Estimation and Tracking: Principles, Techniques, and Software”, Artech House, 1993, pp. 108-109
  • #56 University of Alberta “ Principles of Monte Carlo Simulation”, February 2001
• #57 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 36 - 37
• #58 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 36 - 37
• #59 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 36 - 37; Coddington, P., “Monte Carlo Simulation for Statistical Physics”, CPS 713, Northeast Parallel Architectures Center, January 1996; http://en.wikipedia.org/wiki/Histogram
• #60 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 44 - 50
• #61 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 44 - 45
• #62 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 49 - 50
• #63 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 50 - 51
• #64 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 51 - 52
• #65 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 51 - 52
• #66 Karlsson, R., “Simulation Based Methods for Target Tracking”, Linkoping Studies in Science and Technology, Thesis No. 930, 2002, pp. 34 – 35, http://www.control.isy.liu.se/research/reports/LicentiateThesis/Lic930.pdf
• #67 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 59 - 60
• #68 S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990, pp. 135 - 136
  • #69 Ristic, B., Arulampalam, S., Gordon, N., “Beyond the Kalman Filter – Particle Filter for Tracking Applications”, Artech House, 2004, pp. 35-36
  • #70 Ristic, B., Arulampalam, S., Gordon, N., “Beyond the Kalman Filter – Particle Filter for Tracking Applications”, Artech House, 2004, pp. 35-36
  • #71 A. Papoulis, “ Probability, Random Variables and StochasticProcesses”, McGraw-Hill, 1965, pp.350
  • #72 A. Papoulis, “ Probability, Random Variables and StochasticProcesses”, McGraw-Hill, 1965, pp.350
  • #73 A. Papoulis, “ Probability, Random Variables and StochasticProcesses”, McGraw-Hill, 1965, pp.303, 350
  • #75 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #76 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #77 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #78 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #79 Gordon, N.J., Salmond, D.J., Smith, A.M.F., “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation”, IEE Proceedings Radar and Signal Processing, vol. 140, No. 2, April 1993, pp. 107 - 113
  • #81 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 77- 82
  • #82 http://en.wikipedia.org/wiki/Adriaan_Fokker http://en.wikipedia.org/wiki/Max_Planck http://jeff560.tripod.com/f.html http://en.wikipedia.org/wiki/Fokker-Planck_equation
  • #83 http://en.wikipedia.org/wiki/Adriaan_Fokker http://en.wikipedia.org/wiki/Max_Planck http://en.wikipedia.org/wiki/Fokker-Planck_equation
  • #84 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 77- 82
  • #85 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 77- 82
  • #86 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 77- 82
  • #87 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 77- 82
  • #88 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 77- 82
  • #89 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 77- 82
  • #90 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #91 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #92 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #93 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #94 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #95 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #96 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #97 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #98 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #99 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #100 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #103 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #107 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #108 Bar-Shalom, Y., Li, X-R., “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp. 43-44, 132
  • #109 Bar-Shalom, Y., Li, X-R., “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp. 43-44, 132
  • #110 Bar-Shalom, Y., Li, X-R., “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp. 43-44, 132
  • #111 Bar-Shalom, Y., Li, X-R., “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp. 43-44, 132
  • #112 Bar-Shalom, Y., Li, X-R., “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp. 43-44, 132
  • #113 Bar-Shalom, Y., Li, X-R., “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp. 43-44, 132
  • #114 Bar-Shalom, Y., Li, X-R., “Estimation and Tracking: Principles, Techniques and Software”, Artech House, 1993, pp. 43-44, 132
  • #115 Kailath, T., Sayed, A.H., Hassibi, B, “Linear Estimators”, Prentice Hall, 2000,pp.96
  • #116 Kailath, T., Sayed, A.H., Hassibi, B, “Linear Estimators”, Prentice Hall, 2000,pp.96
  • #117 Kailath, T., Sayed, A.H., Hassibi, B, “Linear Estimators”, Prentice Hall, 2000,pp.96
  • #118 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #119 Sage, A.P., &amp; Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw Hill, 1971, pp. 272 - 283
  • #120 http://en.wikipedia.org/wiki/Rudolf_Kalman
  • #121 http://en.wikipedia.org/wiki/Rudolf_Kalman
  • #122 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005
  • #124 http://en.wikipedia.org/wiki/Rudolf_Kalman
  • #125 http://en.wikipedia.org/wiki/Rudolf_Kalman
  • #127 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005 Ito, Kazufumi, Xiong Kaiqi, “Gaussian Filters for Nonlinear Filtering Problems”, IEEE Transactions on Automatic Control, Vol. 45, No. 5, May 2000, pp. 910 - 927
  • #128 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005 Ito, Kazufumi, Xiong Kaiqi, “Gaussian Filters for Nonlinear Filtering Problems”, IEEE Transactions on Automatic Control, Vol. 45, No. 5, May 2000, pp. 910 - 927
  • #129 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005 Ito, Kazufumi, Xiong Kaiqi, “Gaussian Filters for Nonlinear Filtering Problems”, IEEE Transactions on Automatic Control, Vol. 45, No. 5, May 2000, pp. 910 - 927
  • #130 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005 Ito, Kazufumi, Xiong Kaiqi, “Gaussian Filters for Nonlinear Filtering Problems”, IEEE Transactions on Automatic Control, Vol. 45, No. 5, May 2000, pp. 910 - 927
  • #131 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005 Ito, Kazufumi, Xiong Kaiqi, “Gaussian Filters for Nonlinear Filtering Problems”, IEEE Transactions on Automatic Control, Vol. 45, No. 5, May 2000, pp. 910 - 927
  • #132 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005 Ito, Kazufumi, Xiong Kaiqi, “Gaussian Filters for Nonlinear Filtering Problems”, IEEE Transactions on Automatic Control, Vol. 45, No. 5, May 2000, pp. 910 - 927
  • #133 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005 Ito, Kazufumi, Xiong Kaiqi, “Gaussian Filters for Nonlinear Filtering Problems”, IEEE Transactions on Automatic Control, Vol. 45, No. 5, May 2000, pp. 910 - 927
  • #134 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005 Ito, Kazufumi, Xiong Kaiqi, “Gaussian Filters for Nonlinear Filtering Problems”, IEEE Transactions on Automatic Control, Vol. 45, No. 5, May 2000, pp. 910 - 927
  • #135 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005 Ito, Kazufumi, Xiong Kaiqi, “Gaussian Filters for Nonlinear Filtering Problems”, IEEE Transactions on Automatic Control, Vol. 45, No. 5, May 2000, pp. 910 - 927
  • #136 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005 Ito, Kazufumi, Xiong Kaiqi, “Gaussian Filters for Nonlinear Filtering Problems”, IEEE Transactions on Automatic Control, Vol. 45, No. 5, May 2000, pp. 910 - 927
  • #137 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defense Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/
  • #138 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defense Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/
  • #139 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defense Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/
  • #140 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defense Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/
  • #141 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defense Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/
  • #142 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defense Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/
  • #143 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defense Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/
  • #144 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defense Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/
• #145 Wan, E.A., van der Merwe, R., “The Unscented Kalman Filter”, Ch. 7 of Haykin, S., Ed., “Kalman Filter and Neural Networks”, John Wiley & Sons, 2001, pp. 272
  • #146 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/
  • #147 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/ http://cslu.cse.ogi.edu/nsel/Doc/snow00-presentation/sld001.htm
  • #148 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/ http://cslu.cse.ogi.edu/nsel/Doc/snow00-presentation/sld001.htm
  • #149 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/ http://cslu.cse.ogi.edu/nsel/Doc/snow00-presentation/sld001.htm
  • #150 http://en.wikipedia.org/wiki/Rudolf_Kalman
  • #151 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/ http://cslu.cse.ogi.edu/nsel/Doc/snow00-presentation/sld001.htm
  • #152 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005
  • #153 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005
  • #154 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005
  • #155 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005
  • #156 Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005
  • #157 Julier, S.J., Uhlmann, J.K., “A New Extension of the Kalman Filter to Nonlinear Systems”, Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls., 1997 http://cslu.cse.ogi.edu/nsel/ukf/ http://cslu.cse.ogi.edu/nsel/Doc/snow00-presentation/sld001.htm
  • #158 http://en.wikipedia.org/wiki/Rudolf_Kalman
  • #159 Gordon, N.J., Salmond, D.J., Smith, A.M.F., “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation”, IEE Proceedings Radar and Signal Processing, vol. 140, No. 2, April 1993, pp. 107 - 113
  • #160 Gordon, N.J., Salmond, D.J., Smith, A.M.F., “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation”, IEE Proceedings Radar and Signal Processing, vol. 140, No. 2, April 1993, pp. 107 - 113
  • #161 Gordon, N.J., Salmond, D.J., Smith, A.M.F., “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation”, IEE Proceedings Radar and Signal Processing, vol. 140, No. 2, April 1993, pp. 107 - 113
  • #163 Gordon, N.J., Salmond, D.J., Smith, A.M.F., “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation”, IEE Proceedings Radar and Signal Processing, vol. 140, No. 2, April 1993, pp. 107 - 113
  • #164-#171, #173-#178:
    Haug, A.J., “A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes”, MITRE Corporation, January 2005
    http://www.ece.iastate.edu/~namrata/EE520/Gordonnovelapproach.pdf
    Arulampalam, S., Maskell, S., Gordon, N., Clapp, T., “A Tutorial on Particle Filters for On-line Non-linear/Non-Gaussian Bayesian Tracking”, IEEE Transactions on Signal Processing, Vol. 50, No. 2, February 2002
    Ristic, B., Arulampalam, S., Gordon, N., “Beyond the Kalman Filter: Particle Filters for Tracking Applications”, Artech House, 2004
    Karlsson, R., “Simulation Based Methods for Target Tracking”, Department of Electrical Engineering, Linköpings Universitet, 2002
  • #172:
    University of Alberta, “Principles of Monte Carlo Simulation”, February 2001
    http://en.wikipedia.org/wiki/Bootstrapping_(statistics)
    Efron, B., “Bootstrap Methods: Another Look at the Jackknife”, The Annals of Statistics, Vol. 7, 1979, pp. 1-26
  • #189-#192, #194-#197 Daum, F., Huang, J., “Particle Flow for Nonlinear Filters, Bayesian Decision and Transport”, 7 April 2014
  • #193 http://en.wikipedia.org/wiki/Homotopy
  • #199 Sage, A.P. & Melsa, J.L., “Estimation Theory with Applications to Communications and Control”, McGraw-Hill, 1971, pp. 77-82
  • #205:
    Zhu, Dellaert, Tu, “ICCV05 Tutorial: MCMC for Vision”, October 2005
    Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., “Equation of State Calculations by Fast Computing Machines”, Journal of Chemical Physics, Vol. 21, No. 6, 1953, pp. 1087-1092
    Hastings, W., “Monte Carlo Sampling Methods Using Markov Chains and Their Applications”, Biometrika, Vol. 57, 1970, pp. 97-109
    Geman, S. and Geman, D., “Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984, No. 6, pp. 721-741
  • #206:
    http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm
    Zhu, Dellaert, Tu, “ICCV05 Tutorial: MCMC for Vision”, October 2005
  • #207 http://en.wikipedia.org/wiki/Rao%E2%80%93Blackwell_theorem http://www.scholarpedia.org/article/Rao-Blackwell_theorem
  • #208, #209 Chen, Z., “Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond”, Manuscript, 18.05.06, p. 15, http://www.dsi.unifi.it/users/chisci/idfric/Nonlinear_filtering_Chen.pdf