2. Outline
Motivation of Particle Filters
Two commonly used Particle Filters
A Stochastic Volatility Model example
3. State-Space Processes
A general state-space model is defined as
Observation equation: y_{t+1} ~ p(y_{t+1} | x_{t+1}, θ)
Evolution equation: x_{t+1} ~ p(x_{t+1} | x_t, θ)
with initial state distribution p(x_0 | θ) and prior p(θ).
Some notation notes:
1. x_t, y_t denote a single state or observation at time t.
2. x_{1:t}, y_{1:t} denote the vectors of states or observations up to time t.
3. T is the total number of observations and N is the Monte Carlo
sample size in our SMC algorithm.
4. Inferential Goals
There are primarily four inferential goals, namely:
1. Filtering: the posterior distribution of the current latent state at
each time point, with the parameters assumed known
p(x_t | y_{1:t}, θ) for t = 1, . . . , T
2. Learning: the posterior distribution of the parameter given all
observations
p(θ | y_{1:T})
3. Smoothing: the posterior distribution of all latent states given all
the observations
p(x_{1:T} | y_{1:T})
4. Model Assessment: the marginal distribution of the observations,
used for computing Bayes Factors
p(y_{1:T})
5. Inferential Goals
Due to time constraints, we will only talk about filtering today.
The dependence on the parameters is suppressed.
1. Filtering: the posterior distribution of the current latent state at
each time point, with the parameters assumed known
p(x_t | y_{1:t}) for t = 1, . . . , T
2. Learning: the posterior distribution of the parameter given all
observations
p(θ | y_{1:T})
3. Smoothing: the posterior distribution of all latent states given all
the observations
p(x_{1:T} | y_{1:T})
4. Model Assessment: the marginal distribution of the observations,
used for computing Bayes Factors
p(y_{1:T})
6. Kalman Filters
When the state-space model is linear and Gaussian, it is
known as a Normal Dynamic Linear Model (NDLM):
y_t = F_t x_t + v_t,   v_t ~ N(0, σ_t^2)
x_t = G_t x_{t-1} + w_t,   w_t ~ N(0, τ_t^2)
in which the noises v_t and w_t are temporally and mutually
independent.
The celebrated Kalman Filter solves this model exactly via the
Kalman recursions.
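As a concrete illustration, here is a minimal sketch of the Kalman recursions for a scalar NDLM with constant F, G and constant noise variances. This is illustrative code, not from the talk; the function and variable names are my own choices.

```python
import numpy as np

def kalman_filter(y, F, G, sigma2, tau2, m0=0.0, C0=1.0):
    """Scalar Kalman recursions for the NDLM
        y_t = F x_t + v_t,      v_t ~ N(0, sigma2)
        x_t = G x_{t-1} + w_t,  w_t ~ N(0, tau2)
    Returns the filtered means and variances of x_t | y_{1:t}."""
    m, C = m0, C0
    means, variances = [], []
    for yt in y:
        # Predict: x_t | y_{1:t-1} ~ N(a, R)
        a = G * m
        R = G * C * G + tau2
        # Update with the new observation y_t
        f = F * a                 # predictive mean of y_t
        Q = F * R * F + sigma2    # predictive variance of y_t
        K = R * F / Q             # Kalman gain
        m = a + K * (yt - f)
        C = R - K * F * R
        means.append(m)
        variances.append(C)
    return np.array(means), np.array(variances)
```

The filtered variance C does not depend on the data and converges to a steady state, which is one way to sanity-check an implementation.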
7. But the real world is not NDLM
Most real-world applications fall outside the realm of the NDLM:
y_t | x_t ~ p(y_t | x_t),
x_t | x_{t-1} ~ p(x_t | x_{t-1})
We cannot get the analytical form of the recursions as we can
in the NDLM:
p(x_t | y_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1}
p(x_t | y_{1:t}) ∝ p(y_t | x_t) p(x_t | y_{1:t-1})
But the form of the above display suggests that some kind of
Monte Carlo method might work.
8. First Particle Filter: Bootstrap Filter
Gordon, Salmond and Smith (1993) proposed one of the
earliest (and still popular) particle filters: the bootstrap filter.
p(x_t | y_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | y_{1:t-1}) dx_{t-1} (1)
p(x_t | y_{1:t}) ∝ p(y_t | x_t) p(x_t | y_{1:t-1}) (2)
The rough idea: given Monte Carlo samples (particles) from
p(x_{t-1} | y_{1:t-1}), we can generate new particles from
p(x_t | y_{1:t-1}) based on (1), and then resample the new
particles with likelihood weights to get particles from
p(x_t | y_{1:t}).
9. Bootstrap Filter
1. Propagate {x_{t-1}^(i)}_{i=1}^N to {x̃_t^(i)}_{i=1}^N via p(x_t | x_{t-1})
2. Resample {x_t^(i)}_{i=1}^N from {x̃_t^(i)}_{i=1}^N with weights
w_t^i ∝ p(y_t | x̃_t^(i))
Algorithm 1: Bootstrap Filter
The Bootstrap Filter belongs to the class of Propagate-Resample
filters.
The Bootstrap Filter is incredibly easy to implement, since we only
need to be able to sample from the evolution density
p(x_t | x_{t-1}) and to evaluate the likelihood p(y_t | x_t).
Notice that the resampling step is not essential.
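Algorithm 1 is short enough to sketch in full. The following is an illustrative NumPy implementation, not code from the talk: `propagate` and `loglik` stand in for the model-specific sampler of p(x_t | x_{t-1}) and the log-likelihood log p(y_t | x_t).

```python
import numpy as np

def bootstrap_filter(y, x0, propagate, loglik, rng):
    """Bootstrap filter (Algorithm 1) for scalar states.
    y: observations; x0: (N,) initial particles;
    propagate(x, rng): samples p(x_t | x_{t-1}) elementwise;
    loglik(yt, x): log p(y_t | x) elementwise.
    Returns the filtered mean E[x_t | y_{1:t}] at each t."""
    x = x0.copy()
    N = len(x)
    filtered_means = []
    for yt in y:
        x_tilde = propagate(x, rng)        # 1. propagate via p(x_t | x_{t-1})
        logw = loglik(yt, x_tilde)         # weight by the likelihood
        w = np.exp(logw - logw.max())      # stabilize before normalizing
        w /= w.sum()
        idx = rng.choice(N, size=N, p=w)   # 2. multinomial resampling
        x = x_tilde[idx]
        filtered_means.append(np.sum(w * x_tilde))
    return np.array(filtered_means)
```

A quick usage check is to run it on a linear-Gaussian model, where the filtered means should track the true states closely.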
10. Drawback of Bootstrap Filter
However, the Bootstrap Filter suffers from severe sample
degeneracy (sample impoverishment).
In the propagation step, the information in the new observation
y_t is not incorporated, so the propagated states may not be
important, high-likelihood states; at the resampling step most
of the weight then falls on a few of the new states, and we no
longer have a good Monte Carlo sample of the target
distribution.
Of course, if we set the MC sample size N large enough, we
might reduce the degeneracy to a reasonable level, but that is
computationally expensive.
11. Another Approach: Auxiliary Particle Filter
The problem with the Bootstrap Filter is the blind propagation.
A bit of algebra gives the decomposition
p(x_t, x_{t-1} | y_{1:t}) ∝ p(x_t | x_{t-1}, y_t) p(y_t | x_{t-1}) p(x_{t-1} | y_{1:t-1})
Based on this, Pitt and Shephard (1999) proposed a
Resample-Propagate scheme.
The idea: with MC samples from p(x_{t-1} | y_{1:t-1}), we resample
to get samples from p(x_{t-1} | y_{1:t}), then propagate to get MC
samples from p(x_t | y_{1:t}).
12. Auxiliary Particle Filter
1. Resample {x̃_{t-1}^(i)}_{i=1}^N from {x_{t-1}^(i)}_{i=1}^N with weights
w_t^i ∝ p(y_t | x_{t-1}^(i))
2. Propagate {x̃_{t-1}^(i)}_{i=1}^N to {x_t^(i)}_{i=1}^N via p(x_t | x̃_{t-1}^(i), y_t)
Algorithm 2: Auxiliary Particle Filter
In the APF, we use the current observation y_t in the first
resampling step, so only “good” particles are propagated
forward. It should be more efficient than the Bootstrap Filter.
13. Drawback of APF
But nothing comes for free. In order to use the algorithm
given above, we need to (1) be able to compute the predictive
likelihood p(y_t | x_{t-1}) and (2) be able to sample from the
“evolution-posterior” density p(x_t | x_{t-1}, y_t).
Both are intractable in most cases.
A general approach is importance sampling. Pitt and
Shephard (1999) suggest:
1. Use p(y_t | g(x_{t-1})), i.e. the likelihood p(y_t | x_t) evaluated
at a point estimate g(x_{t-1}) of the next state, as proposal
weights for the resampling;
2. Use the evolution kernel p(x_t | x_{t-1}) as the proposal density
to propagate particles.
14. An Applicable APF
1. Resample {x̃_{t-1}^(i)}_{i=1}^N from {x_{t-1}^(i)}_{i=1}^N with weights
w_t^i ∝ p(y_t | g(x_{t-1}^(i)))
2. Propagate {x̃_{t-1}^(i)}_{i=1}^N to {x̃_t^(i)}_{i=1}^N via p(x_t | x̃_{t-1}^(i))
3. Resample {x_t^(i)}_{i=1}^N from {x̃_t^(i)}_{i=1}^N with weights
w_t^i ∝ p(y_t | x̃_t^(i)) / p(y_t | g(x̃_{t-1}^(i)))
Algorithm 3: An Applicable Auxiliary Particle Filter
The performance of the APF depends hugely on the choice of
proposal density. In some cases, it can perform worse than the
Bootstrap Filter.
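Algorithm 3 can be sketched with the same generic interface as the bootstrap filter above. Again this is illustrative code, not from the talk: `g` stands for the point estimate g(x_{t-1}) (e.g. the mean of the evolution density), and multinomial resampling is used at both stages.

```python
import numpy as np

def auxiliary_particle_filter(y, x0, propagate, loglik, g, rng):
    """'Applicable' APF (Algorithm 3) for scalar states.
    propagate(x, rng): samples p(x_t | x_{t-1}) elementwise;
    loglik(yt, x): log p(y_t | x) elementwise;
    g(x): point estimate of x_t given x_{t-1} = x."""
    x = x0.copy()
    N = len(x)
    filtered_means = []
    for yt in y:
        # 1. First-stage resample with weights p(y_t | g(x_{t-1}))
        logw1 = loglik(yt, g(x))
        w1 = np.exp(logw1 - logw1.max())
        w1 /= w1.sum()
        parents = x[rng.choice(N, size=N, p=w1)]
        # 2. Propagate the surviving particles via p(x_t | x_{t-1})
        x_tilde = propagate(parents, rng)
        # 3. Second-stage resample, correcting the first-stage weights
        logw2 = loglik(yt, x_tilde) - loglik(yt, g(parents))
        w2 = np.exp(logw2 - logw2.max())
        w2 /= w2.sum()
        filtered_means.append(np.sum(w2 * x_tilde))
        x = x_tilde[rng.choice(N, size=N, p=w2)]
    return np.array(filtered_means)
```

With an informative g, the first-stage weights steer the particles toward regions where y_t is likely, which is exactly what the blind bootstrap propagation misses.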
15. Example: Stochastic Volatility Model
Consider a very simple log-stochastic volatility model:
y_t = V_t ε_t
log(V_t) = α + β log(V_{t-1}) + σ η_t
log(V_1) ~ N(0, σ_0^2)
α = 0, β = 0.95, σ = σ_0 = 0.1, T = 500
[Figure: one simulated series y_t, t = 1, . . . , 500.]
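A series like the one plotted can be simulated directly from the model. A sketch, assuming ε_t and η_t are i.i.d. standard normal (the slide does not state this explicitly, but it is standard for this model); the function name and seed are my own choices.

```python
import numpy as np

def simulate_sv(T=500, alpha=0.0, beta=0.95, sigma=0.1, sigma0=0.1, seed=0):
    """Simulate the log-stochastic volatility model:
        y_t = V_t * eps_t
        log(V_t) = alpha + beta * log(V_{t-1}) + sigma * eta_t
        log(V_1) ~ N(0, sigma0^2)
    with eps_t, eta_t assumed i.i.d. standard normal."""
    rng = np.random.default_rng(seed)
    log_v = rng.normal(0.0, sigma0)          # log(V_1)
    y = np.empty(T)
    for t in range(T):
        y[t] = np.exp(log_v) * rng.normal()  # y_t = V_t * eps_t
        log_v = alpha + beta * log_v + sigma * rng.normal()
    return y
```

Since the likelihood p(y_t | V_t) is a N(0, V_t^2) density and the evolution of log(V_t) is easy to sample, this model is a natural candidate for the bootstrap filter.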