Current limitations of sequential inference in general hidden Markov models
Pierre Jacob
Department of Statistics, University of Oxford
March 5th
Pierre Jacob Sequential inference in HMM 1/ 60
Outline
1 Setting: online inference in time series
    Hidden Markov Models
    Implicit models
    Exact / sequential / online methods
2 Plug and play methods
    Approximate Bayesian Computation
    Particle Filters
3 SMC2 for sequential inference
    A sequential method for HMM
    Not online
4 Numerical experiments
5 Discussion
Hidden Markov Models

[Figure: graphical representation of a general HMM, with hidden chain X0, X1, ..., XT and observations y0, y1, ..., yT.]

(Xt): initial distribution µθ, transition fθ. (Yt) given (Xt): measurement density gθ.
Prior on the parameter θ ∈ Θ.
General questions

For each model, how much do the data inform the parameters?
For each model, how much do the data inform the latent Markov process?
How much do the data inform the choice of a model?
How to predict future observations?
Questions translated into integrals

Filtering question:
$$\int_{\mathsf{X}} \varphi(x_t)\, p(dx_t \mid y_{0:t}, \theta)
= \frac{1}{Z_t(\theta)} \int_{\mathsf{X}^{t+1}} \varphi(x_t)\, p(dx_{0:t} \mid \theta) \prod_{s=0}^{t} p(y_s \mid x_s, \theta).$$

Prediction question:
$$\int_{\mathsf{Y}} \varphi(y_{t+k})\, p(dy_{t+k} \mid y_{0:t}, \theta)
= \int_{\mathsf{Y}} \int_{\mathsf{X}} \varphi(y_{t+k})\, p(dx_{t+k} \mid y_{0:t}, \theta)\, p(dy_{t+k} \mid x_{t+k}, \theta).$$
Questions translated into integrals

Parameter estimation:
$$p(y_{0:t} \mid \theta) = \int_{\mathsf{X}^{t+1}} p(dx_0 \mid \theta) \prod_{s=1}^{t} p(dx_s \mid x_{s-1}, \theta) \prod_{s=0}^{t} p(y_s \mid x_s, \theta),$$
and eventually
$$\int_{\Theta} \varphi(\theta)\, \pi_{\theta,t}(d\theta) = \frac{1}{Z_t} \int_{\Theta} \varphi(\theta)\, p(y_{0:t} \mid \theta)\, \pi_{\theta}(d\theta).$$

If we acknowledge parameter uncertainty, then more questions:
$$\int_{\mathsf{X}} \varphi(x_t)\, p(dx_t \mid y_{0:t}) = \int_{\Theta} \int_{\mathsf{X}} \varphi(x_t)\, p(dx_t \mid y_{0:t}, \theta)\, \pi_{\theta,t}(d\theta).$$
Questions translated into integrals

Model choice:
$$\mathbb{P}\left(M = M^{(m)} \mid y_{0:t}\right) = \frac{\mathbb{P}\left(M = M^{(m)}\right) Z_t^{(m)}}{\sum_{m'=1}^{M} \mathbb{P}\left(M = M^{(m')}\right) Z_t^{(m')}}.$$

If we acknowledge model uncertainty, then more questions:
$$\int_{\mathsf{Y}} \varphi(y_{t+k})\, \mathbb{P}(dy_{t+k} \mid y_{0:t})
= \sum_{m=1}^{M} \int_{\Theta^{(m)}} \int_{\mathsf{Y}} \varphi(y_{t+k})\, p(dy_{t+k} \mid y_{0:t}, \theta, M^{(m)})\, \pi_{\theta^{(m)},t}(d\theta)\, \mathbb{P}\left(M = M^{(m)} \mid y_{0:t}\right).$$
Phytoplankton–Zooplankton model

Hidden process (x_t) = (α_t, p_t, z_t).
At each (integer) time, α_t ∼ N(µ_α, σ_α²).
Given α_t,
$$\frac{dp_t}{dt} = \alpha p_t - c\, p_t z_t, \qquad \frac{dz_t}{dt} = e c\, p_t z_t - m_l z_t - m_q z_t^2.$$
Observations: log y_t ∼ N(log p_t, σ_y²).
Set c = 0.25 and e = 0.3, and (log p_0, log z_0) ∼ N(log 2, 0.2).
Unknown parameters: θ = (µ_α, σ_α, σ_y, m_l, m_q).
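Since the model is implicit, forward simulation is all that later methods will require of it. A minimal simulator sketch in Python, assuming an Euler discretization of the ODE with step `dt` (the discretization scheme, step size, and function name are illustrative choices, not from the slides):

```python
import numpy as np

def simulate_pz(T, theta, dt=0.01, rng=None):
    """Forward-simulate the PZ model and return observations y_1, ..., y_T.
    theta = (mu_alpha, sigma_alpha, sigma_y, m_l, m_q); c, e fixed as above.
    The Euler scheme with step dt is an illustrative assumption."""
    mu_alpha, sigma_alpha, sigma_y, m_l, m_q = theta
    c, e = 0.25, 0.3
    rng = np.random.default_rng() if rng is None else rng
    # (log p0, log z0) ~ N(log 2, 0.2)
    p, z = np.exp(rng.normal(np.log(2.0), np.sqrt(0.2), size=2))
    ys = np.empty(T)
    for t in range(T):
        alpha = rng.normal(mu_alpha, sigma_alpha)   # fresh alpha_t at each integer time
        for _ in range(int(round(1.0 / dt))):       # integrate the ODE over one time unit
            dp = alpha * p - c * p * z
            dz = e * c * p * z - m_l * z - m_q * z ** 2
            p, z = p + dt * dp, z + dt * dz
        ys[t] = np.exp(rng.normal(np.log(p), sigma_y))  # log y_t ~ N(log p_t, sigma_y^2)
    return ys
```

The transition is simulable but its density is not computable pointwise, which is exactly the "implicit model" situation of the next slide.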
Implicit models

Even simple, standard scientific models are such that the implied probability distribution p(dx_{0:t} | θ) admits a density function that cannot be computed pointwise.
To cover as many models as possible, we can only assume that the hidden process can be simulated.
This covers cases where x_t = ψ(x_{t−1}, k, v_{1:k}), for some integer k, vector v_{1:k} ∈ R^k, and deterministic function ψ.
Calls for "plug and play" methods.

Time series analysis via mechanistic models,
Bretó, He, Ionides and King, 2009.
Exact methods

Consider the problem of estimating some quantity I_t.
Consider an estimator I_t^N, where N is a tuning parameter.
Hopefully N is such that I_t^N → I_t in some sense as N → ∞.
For instance E[(I_t^N − I_t)²] goes to zero when N → ∞.
Variational methods / Ensemble Kalman Filters are not exact.
Consider the estimator that always returns 29.5...
Sequential methods

Consider the problem of estimating some quantity I_t, for all t ≥ 0, e.g. upon the arrival of new data.
Assume the quantities I_t for all t ≥ 0 are related to one another.
A sequential method "updates" the estimate I_t^N into I_{t+1}^N.
MCMC methods are not sequential: they have to be re-run from scratch whenever a new observation arrives.
Therefore, sequential methods are not to be confused with iterative methods.
Online methods

Consider the problem of estimating some quantity I_t, for all t ≥ 0, e.g. upon the arrival of new data.
A method is online if it provides estimates I_t^N of I_t for all t ≥ 0, such that...
...the computational cost of obtaining each I_t^N given I_{t−1}^N is independent of t,
...the precision of the estimate does not explode over time:
$$r(I_t^N) = \frac{\left(\mathbb{E}\left[(I_t^N - I_t)^2\right]\right)^{1/2}}{|I_t|}$$
can be uniformly bounded over t.
Consider the estimator that always returns 29.5...
Approximate Bayesian Computation

1 Draw θ from the prior distribution πθ.
2 Draw x_{0:t}, a realisation of the hidden Markov chain given θ.
3 Draw ŷ_{0:t}, a realisation of the observations given x_{0:t} and θ.
4 If D(ŷ_{0:t}, y_{0:t}) ≤ ε, keep (θ, x_{0:t}).
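The four steps above only touch the model through simulation. A minimal sketch in Python (the hook names `prior_sample`, `simulate`, `distance` are illustrative, as is the toy random-walk model used to exercise it; the PZ model would be plugged in the same way):

```python
import numpy as np

def abc_rejection(y_obs, prior_sample, simulate, distance, eps, n_draws, rng):
    """ABC rejection: keep (theta, x_{0:t}) whenever the simulated
    observations fall within eps of the real ones."""
    kept = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # 1. theta from the prior
        x, y_sim = simulate(theta, rng)    # 2-3. hidden chain, then observations
        if distance(y_sim, y_obs) <= eps:  # 4. accept if close enough
            kept.append((theta, x))
    return kept

# Toy illustration (not the PZ model): random walk observed in noise,
# prior on the observation noise s.d. theta ~ Uniform(0, 2).
def prior_sample(rng):
    return rng.uniform(0.0, 2.0)

def simulate(theta, rng, T=20):
    x = np.cumsum(rng.normal(size=T))      # hidden random walk
    return x, x + theta * rng.normal(size=T)

rng = np.random.default_rng(0)
_, y_obs = simulate(1.0, rng)
rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
kept = abc_rejection(y_obs, prior_sample, simulate, rmse, eps=2.0, n_draws=500, rng=rng)
```

The accepted pairs approximate the posterior only up to the tolerance ε and the choice of D, which is the point of the next slide.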
Approximate Bayesian Computation

Plug and play: only requires simulations from the model.
Exact if D is a distance and ε is zero.
In practice, D is typically not a distance.
The tolerance ε is often chosen implicitly.
E.g., ε is chosen so that 1% of the generated samples are kept.
Better than the 29.5 estimator?
Sequential Monte Carlo for filtering

Objects of interest:
filtering distributions p(x_t | y_{0:t}, θ), for all t, for a given θ,
likelihood p(y_{0:t} | θ) = ∫ p(y_{0:t} | x_{0:t}, θ) p(x_{0:t} | θ) dx_{0:t}.

Particle filters:
propagate recursively N_x particles approximating p(x_t | y_{0:t}, θ) for all t,
give likelihood estimates p̂^{N_x}(y_{0:t} | θ) of p(y_{0:t} | θ) for all t.
Plug and play requirement

Particle filters can be implemented if
the hidden process can be simulated forward, given any θ: x_0 ∼ µθ and x_t ∼ fθ(· | x_{t−1}),
the measurement density gθ(y | x) can be evaluated pointwise, for any x, y, θ.
A bit less "plug and play" than ABC.
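These two requirements are exactly what a bootstrap particle filter consumes. A minimal sketch in Python, with a linear Gaussian toy model plugged in (all function names and the toy model are illustrative assumptions; multinomial resampling at every step is one option among several):

```python
import numpy as np

def bootstrap_pf(y, theta, Nx, sim_init, sim_transition, meas_density, rng):
    """Bootstrap particle filter: needs only forward simulation of the
    hidden process and pointwise evaluation of g_theta. Returns final
    particles and the log-likelihood estimate log p_hat^{Nx}(y_{0:T})."""
    x = sim_init(theta, Nx, rng)                 # x_0^k ~ mu_theta
    log_lik = 0.0
    for t, yt in enumerate(y):
        if t > 0:
            x = sim_transition(x, theta, rng)    # x_t^k ~ f_theta(. | x_{t-1}^k)
        w = meas_density(yt, x, theta)           # w_t^k = g_theta(y_t | x_t^k)
        log_lik += np.log(np.mean(w))            # p_hat(y_t | y_{0:t-1}, theta)
        x = x[rng.choice(Nx, size=Nx, p=w / w.sum())]  # multinomial resampling
    return x, log_lik

# Toy model: x_t = 0.9 x_{t-1} + N(0,1), y_t = x_t + N(0,1).
rng = np.random.default_rng(42)
T, phi = 30, 0.9
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = phi * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(size=T)

particles, ll = bootstrap_pf(
    y, phi, 500,
    sim_init=lambda th, N, r: r.normal(size=N),
    sim_transition=lambda x, th, r: th * x + r.normal(size=x.size),
    meas_density=lambda yt, x, th: np.exp(-0.5 * (yt - x) ** 2) / np.sqrt(2 * np.pi),
    rng=rng,
)
```

The accumulated product of average weights is the likelihood estimator whose unbiasedness is stated a few slides below.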
Sequential Monte Carlo for filtering

[Figure: sequence of slides animating a particle filter along the hidden chain X0, X1, ..., XT given θ, with observations y1, y2, ..., yT.]
Sequential Monte Carlo for filtering

Consider I(φ_t) = ∫ φ_t(x_t) p(x_t | y_{0:t}) dx_t.

L^p-bound:
$$\mathbb{E}\left[\left|I^N(\varphi_t) - I(\varphi_t)\right|^p\right]^{1/p} \leq \frac{c(p)\, \|\varphi_t\|_{\infty}}{\sqrt{N}}.$$

Central limit theorem:
$$\sqrt{N}\left(I^N(\varphi_t) - I(\varphi_t)\right) \xrightarrow[N \to \infty]{\mathcal{D}} \mathcal{N}\left(0, \sigma_t^2\right),$$
where σ_t² < σ_max² for all t.

Particle filters are fully online, plug and play, and exact... for filtering.
Sequential Monte Carlo for filtering

Properties of the likelihood estimator

The likelihood estimator is unbiased,
$$\mathbb{E}\left[\hat{p}^{N_x}(y_{0:t} \mid \theta)\right] = \mathbb{E}\left[\prod_{s=0}^{t} \frac{1}{N_x} \sum_{k=1}^{N_x} w_s^k\right] = p(y_{0:t} \mid \theta),$$
and the relative variance is bounded linearly in time,
$$\mathbb{V}\left[\frac{\hat{p}^{N_x}(y_{0:t} \mid \theta)}{p(y_{0:t} \mid \theta)}\right] \leq C \frac{t}{N_x}$$
for some constant C (under some conditions!).

Particle filters are not online for likelihood estimation.
SMC samplers

The goal is now to approximate sequentially
p(θ), p(θ | y_0), ..., p(θ | y_{0:T}).

Sequential Monte Carlo samplers:
Jarzynski 1997, Neal 2001, Chopin 2002, Del Moral, Doucet & Jasra 2006...

Propagate a number N_θ of θ-particles approximating p(θ | y_{0:t}) for all t.
Evidence estimates p̂^{N_θ}(y_{0:t}) ≈ p(y_{0:t}) for all t.
First step

[Figure: weighted samples on Θ; first distribution p(θ) in black, next distribution p(θ | y_1) in red.]
Resampling and move

[Figure: samples θ after resampling and MCMC move, overlaid on the densities p(θ) and p(θ | y_1).]
SMC samplers

1: Sample from the prior θ^(m) ∼ p(·) for m ∈ [1, N_θ].
2: Set ω^(m) ← 1/N_θ.
3: for t = 0 to T do
4:   Reweight ω^(m) ← ω^(m) × p(y_t | y_{0:t−1}, θ^(m)) for m ∈ [1, N_θ].
5:   if some degeneracy criterion is met then
6:     Resample the particles, reset the weights ω^(m) ← 1/N_θ.
7:     MCMC move for each particle, targeting p(θ | y_{0:t}).
8:   end if
9: end for
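The pseudocode above can be made concrete on a toy model in which the incremental likelihood p(y_t | y_{0:t−1}, θ) is tractable. A sketch assuming i.i.d. y_t ∼ N(θ, 1) with prior θ ∼ N(0, 10), an ESS-based degeneracy criterion, and a random-walk Metropolis move (all of these concrete choices are illustrative, not from the slides):

```python
import numpy as np

def smc_sampler(y, N_theta, rng, ess_threshold=0.5):
    """Lines 1-9 of the pseudocode, for y_t ~ N(theta, 1) i.i.d. with prior
    theta ~ N(0, 10); in an HMM the incremental likelihood used in the
    reweighting step would be intractable."""
    theta = rng.normal(0.0, np.sqrt(10.0), size=N_theta)  # 1: draw from the prior
    logw = np.zeros(N_theta)                              # 2: uniform weights (log scale)

    def log_post(th, t):  # log p(theta | y_{0:t}), up to a constant
        return -th ** 2 / 20.0 - 0.5 * np.sum((y[: t + 1, None] - th) ** 2, axis=0)

    for t in range(len(y)):
        logw += -0.5 * (y[t] - theta) ** 2                # 4: reweight by N(y_t; theta, 1)
        w = np.exp(logw - logw.max())
        if w.sum() ** 2 / (w ** 2).sum() < ess_threshold * N_theta:  # 5: ESS criterion
            theta = rng.choice(theta, size=N_theta, p=w / w.sum())   # 6: resample
            logw[:] = 0.0
            prop = theta + 0.5 * rng.normal(size=N_theta)            # 7: Metropolis move
            accept = np.log(rng.uniform(size=N_theta)) < log_post(prop, t) - log_post(theta, t)
            theta = np.where(accept, prop, theta)
    return theta, logw

rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.0, size=50)
theta, logw = smc_sampler(y, 1000, rng)
w = np.exp(logw - logw.max())
post_mean = np.sum(w * theta) / w.sum()   # close to the true posterior mean
```

Note that the move step targets p(θ | y_{0:t}) exactly because the posterior density is computable here; replacing that evaluation is what SMC2 addresses next.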
Proposed method

SMC samplers require
pointwise evaluations of p(y_t | y_{0:t−1}, θ),
MCMC moves targeting each intermediate distribution.
For hidden Markov models, the likelihood is intractable.
Particle filters provide likelihood approximations for a given θ.
Hence, we equip each θ-particle with its own particle filter.
One step of SMC2

For each θ-particle θ_t^(m), perform one step of its particle filter to obtain p̂^{N_x}(y_{t+1} | y_{0:t}, θ_t^(m)) and reweight:
$$\omega_{t+1}^{(m)} = \omega_t^{(m)} \times \hat{p}^{N_x}(y_{t+1} \mid y_{0:t}, \theta_t^{(m)}).$$
One step of SMC2

Whenever
$$\text{Effective sample size} = \frac{\left(\sum_{m=1}^{N_\theta} \omega_{t+1}^{(m)}\right)^2}{\sum_{m=1}^{N_\theta} \left(\omega_{t+1}^{(m)}\right)^2} < \text{threshold} \times N_\theta$$
(Kong, Liu & Wong, 1994),
resample the θ-particles and move them by PMCMC, i.e.
propose θ⋆ ∼ q(· | θ_t^(m)) and run PF(N_x, θ⋆) for t + 1 steps,
accept or not based on p̂^{N_x}(y_{0:t+1} | θ⋆).
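The degeneracy criterion above is cheap to compute from the weights alone; a sketch (the function name is illustrative):

```python
import numpy as np

def ess(weights):
    """Effective sample size of Kong, Liu & Wong (1994):
    ESS = (sum_m w_m)^2 / sum_m (w_m)^2, which lies in [1, N_theta]."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# Equal weights give ESS = N_theta; a single surviving weight gives ESS = 1.
print(ess(np.ones(100)))      # 100.0
print(ess([1.0, 0.0, 0.0]))   # 1.0
```

In SMC2 this is applied to the θ-weights ω_{t+1}^{(m)}, and the expensive resample-move step is triggered only when the ESS drops below threshold × N_θ.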
Exact approximation

SMC2 is a standard SMC sampler on an extended space, with target distribution:
$$\pi_t(\theta, x_{0:t}^{1:N_x}, a_{0:t-1}^{1:N_x}) = p(\theta \mid y_{0:t}) \times \frac{1}{N_x^{t+1}} \sum_{n=1}^{N_x} \left\{ p(x_{0:t}^n \mid \theta, y_{0:t}) \prod_{\substack{i=1 \\ i \neq h_t^n(1)}}^{N_x} q_{0,\theta}(x_0^i) \times \prod_{s=1}^{t} \prod_{\substack{i=1 \\ i \neq h_t^n(s)}}^{N_x} W_{s-1,\theta}^{a_{s-1}^i}\, q_{s,\theta}(x_s^i \mid x_{s-1}^{a_{s-1}^i}) \right\}.$$

Related to pseudo-marginal and PMCMC methods.
Exact approximation

From the extended target representation, we obtain
θ from p(θ | y_{1:t}),
x_{0:t}^n from p(x_{0:t} | θ, y_{1:t}),
thus allowing joint state and parameter inference.
Evidence estimates are obtained by computing the average of the θ-weights ω_t^(m).
The "extended target" argument yields consistency for any fixed N_x, when N_θ goes to infinity.
Exact method, sequential by design, but not online.
Scalability in T

Cost if MCMC move at each time step:
A single move step at time t costs O(t N_x N_θ).
If we move at every step, the total cost becomes O(t² N_x N_θ).
If N_x = Ct, the total cost becomes O(t³ N_θ).
With adaptive resampling, the cost is only O(t² N_θ). Why?
Scalability in T

[Figure: Effective Sample Size against time, for the PZ model.]
Scalability in T

[Figure: Cumulative cost per θ-particle during one run of SMC2. The cost is measured by the number of calls to the transition sampling function. N_x is fixed.]
Scalability in T

Under Bernstein–von Mises conditions, the posterior becomes Gaussian.

[Figure: densities of p(θ | y_{1:t}) and p(θ | y_{1:ct}) on Θ.]

E[ESS] from p(θ | y_{1:t}) to p(θ | y_{1:ct}) becomes independent of t.
Hence resampling times occur geometrically: τ_k ≈ c^k with c > 1.
Scalability in T

More formally... the expected ESS at time t + k, if the last resampling time was t, is related to
$$\mathbb{V}_{p(\theta \mid y_{1:t})}\left[\frac{p(\theta \mid y_{1:t+k})}{p(\theta \mid y_{1:t})}\right] = \mathbb{V}_{p(\theta \mid y_{1:t})}\left[\frac{L(\theta; y_{1:t+k})}{L(\theta; y_{1:t})} \cdot \frac{\int_{\Theta} L(\theta; y_{1:t})\, p(d\theta)}{\int_{\Theta} L(\theta; y_{1:t+k})\, p(d\theta)}\right].$$
Then Laplace expansions of L yield similar results as before, under regularity conditions.
Scalability in T

Open problem
Online exact Bayesian inference in linear time?
On one hand dim(X_{0:t}) = dim(X) × (t + 1), which grows...
...but θ itself is of fixed dimension and p(θ | y_{1:t}) ≈ N(θ⋆, v⋆/t)!

Our specific problem
Move steps at time t imply running a particle filter from time zero.
Attempts have been made at re-starting from time t − ∆, but this introduces a bias.
Phytoplankton–Zooplankton: model

(The PZ model as introduced above: hidden process (x_t) = (α_t, p_t, z_t), unknown parameters θ = (µ_α, σ_α, σ_y, m_l, m_q).)
Forgetting mechanism for hidden states

Forgetting property of a uniformly ergodic Markov chain:
$$\|p_t^{\nu} - p_t^{\mu}\|_{TV} \leq C \rho^t,$$
where ν, µ are two initial distributions, p_t^ν is the distribution of X_t after t steps, ρ < 1, C > 0.

Similarly, the filtering distribution π_t(dx_t) = p(dx_t | y_{0:t}) forgets its initial condition geometrically fast.
Introduce the operator Φ_t, taking a measure, applying a Markov kernel to it, and then a Bayes update using y_t.
Under conditions on the data generating process and the model,
$$\|\Phi_{0:t}(\mu) - \Phi_{0:t}(\nu)\|_{TV} \leq C \rho^t.$$
Forgetting mechanism for parameters

Forgetting mechanism for the Bayesian posterior distribution:
$$\|p_t^{\nu} - p_t^{\mu}\|_{TV} \leq \frac{C}{\sqrt{t}}.$$

Huge literature on prior robustness.
Posterior forgetting is much slower than Markov chain forgetting.
An error in the approximation of p(θ | y_{1:t}) damages the subsequent approximations of p(θ | y_{1:t+k}), for many k's.
SMC samplers are stable because of the added MCMC steps, whose cost increases with t.
Other challenges

Dimensionality: the other big open problem.
Particle filters' errors grow exponentially fast with dim(X).
Can local particle filters beat the curse of dimensionality?
Rebeschini, van Handel, 2013: carefully analyzed biased approximations, under an assumption of a spatial forgetting effect from the model.
Other challenges

Particle filters provide useful estimates...
...but no estimates of their associated variance.
Can we estimate the variance without having to run the algorithm many times?
Other challenges

Particle methods are more and more commonly used outside the setting of HMMs.
For instance, in the setting of long-memory processes: probabilistic programming, Bayesian non-parametric applications.
Are particle methods useful for models that do not satisfy forgetting properties?

Stability of Feynman-Kac formulae with path-dependent potentials,
Chopin, Del Moral, Rubenthaler, 2009.
Discussion

SMC2 allows sequential exact approximation in HMMs, but is not online.
Properties of posterior distributions could help achieve exact online inference, or prove that it is, in fact, impossible.
Do we want to sample from the posterior as t → ∞?
Importance of plug and play inference for time series.
Implementation in LibBi, with GPU support.
Links

Particle Markov chain Monte Carlo,
Andrieu, Doucet, Holenstein, 2010 (JRSS B).

Sequential Monte Carlo samplers: error bounds and insensitivity to initial conditions,
Whiteley, 2011 (Stoch. Analysis and Appl.).

SMC2: an algorithm for sequential analysis of HMM,
Chopin, Jacob, Papaspiliopoulos, 2013 (JRSS B).

www.libbi.org