2. Definition of HMM
Def [HMM] A Hidden Markov Model (HMM) $\{X_t : t \in \mathbb{N}\}$ is a particular kind of
dependent mixture. With $\mathbf{X}^{(t)}$ and $\mathbf{C}^{(t)}$ representing the histories from time 1 to
time $t$, one can summarize the simplest model of this kind by:
$$\Pr(C_t \mid \mathbf{C}^{(t-1)}) = \Pr(C_t \mid C_{t-1}), \qquad t = 2, 3, \ldots,$$
$$\Pr(X_t \mid \mathbf{X}^{(t-1)}, \mathbf{C}^{(t)}) = \Pr(X_t \mid C_t), \qquad t \in \mathbb{N}.$$
Remark
● The parameter process $\{C_t : t = 1, 2, \ldots\}$ satisfies the Markov property, and
● the state-dependent process $\{X_t : t = 1, 2, \ldots\}$ depends only on the current state $C_t$.
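The two defining equations translate directly into a sampling procedure: the next state is drawn from the row of the transition matrix for the current state, and each observation is drawn from the current state's distribution. Below is a minimal sketch in Python/NumPy for a 2-state Poisson-HMM; the initial distribution, transition matrix, and rates are made-up illustrative values, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters for a 2-state Poisson-HMM (illustration only).
delta = np.array([0.5, 0.5])              # initial distribution
Gamma = np.array([[0.9, 0.1],             # transition probability matrix
                  [0.2, 0.8]])
lam = np.array([1.0, 5.0])                # Poisson mean for each state

def sample_hmm(T):
    """Sample states C_1..C_T and observations X_1..X_T."""
    states, obs = [], []
    c = rng.choice(2, p=delta)                 # C_1 ~ delta
    for _ in range(T):
        states.append(c)
        obs.append(rng.poisson(lam[c]))        # X_t depends only on C_t
        c = rng.choice(2, p=Gamma[c])          # C_{t+1} depends only on C_t
    return np.array(states), np.array(obs)

states, x = sample_hmm(100)
```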
4. EM algorithm
EM algorithm
Given a statistical model which generates a set $\mathbf{X}$ of observed data, a set of unobserved latent
data or missing values $\mathbf{Z}$, and a vector of unknown parameters $\boldsymbol\theta$, along with a likelihood
function $L(\boldsymbol\theta; \mathbf{X}, \mathbf{Z}) = p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol\theta)$, the maximum likelihood estimate (MLE) of the unknown
parameters is determined by maximizing the marginal likelihood of the observed data
$$L(\boldsymbol\theta; \mathbf{X}) = p(\mathbf{X} \mid \boldsymbol\theta) = \sum_{\mathbf{Z}} p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol\theta).$$
However, this quantity is often intractable (e.g. if $\mathbf{Z}$ is a sequence of events, the number
of possible values grows exponentially with the sequence length, making the exact calculation of the sum
extremely difficult). For an HMM with $m$ states and $T$ observations, for instance, the sum runs over all $m^T$ state sequences.
Remark) EM works to improve $Q(\boldsymbol\theta \mid \boldsymbol\theta^{(t)})$ rather than
directly improving $\log p(\mathbf{X} \mid \boldsymbol\theta)$.
5. EM algorithm
EM algorithm
The EM algorithm seeks to find the MLE of the marginal likelihood by iteratively applying these
two steps:
● Expectation step (E step): Calculate the expected value of the log likelihood function, with
respect to the conditional distribution of $\mathbf{Z}$ given $\mathbf{X}$ under the current estimate of the
parameters $\boldsymbol\theta^{(t)}$:
$$Q(\boldsymbol\theta \mid \boldsymbol\theta^{(t)}) = \mathrm{E}_{\mathbf{Z} \mid \mathbf{X}, \boldsymbol\theta^{(t)}}\big[\log L(\boldsymbol\theta; \mathbf{X}, \mathbf{Z})\big]$$
● Maximization step (M step): Find the parameters that maximize this quantity:
$$\boldsymbol\theta^{(t+1)} = \arg\max_{\boldsymbol\theta}\, Q(\boldsymbol\theta \mid \boldsymbol\theta^{(t)})$$
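To make the alternation concrete, here is a minimal, runnable sketch of EM for a two-component Gaussian mixture (a simpler model than an HMM, chosen only to illustrate the E/M loop; data and starting values are made up). The Baum-Welch algorithm on the following slides is exactly this scheme specialized to HMMs.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
# Made-up data: a mixture of two normals.
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 1, 300)])

# Initial guesses for weights, means, and standard deviations.
w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E step: posterior probability that each point came from each component.
    dens = w * norm.pdf(x[:, None], mu, sd)       # shape (n, 2)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M step: weighted MLEs maximize Q(theta | theta^(t)).
    n_k = r.sum(axis=0)
    w = n_k / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)
```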
7. Baum-Welch Algorithm
E Step
Conditional expectations given the observations $\mathbf{x}^{(T)}$ and the current parameter
estimates:
● $\Pr(C_t = j \mid \mathbf{x}^{(T)}) = \frac{\Pr(C_t = j,\, \mathbf{x}^{(T)})}{\Pr(\mathbf{x}^{(T)})} = \alpha_t(j)\beta_t(j)/L_T$
● $\Pr(C_{t-1} = j, C_t = k \mid \mathbf{x}^{(T)}) = \frac{\Pr(C_{t-1} = j,\, C_t = k,\, \mathbf{x}^{(T)})}{\Pr(\mathbf{x}^{(T)})} = \alpha_{t-1}(j)\,\gamma_{jk}\,p_k(x_t)\,\beta_t(k)/L_T$
In the M step these quantities are denoted $u_j(t)$ and $v_{jk}(t)$, respectively.
8. Baum-Welch Algorithm
Forward and Backward Probabilities
where $\boldsymbol\alpha_t$ and $\boldsymbol\beta_t$ are defined as follows:
● $\boldsymbol\alpha_t = \boldsymbol\delta\mathbf{P}(x_1)\boldsymbol\Gamma\mathbf{P}(x_2)\cdots\boldsymbol\Gamma\mathbf{P}(x_t) = \boldsymbol\delta\mathbf{P}(x_1)\prod_{s=2}^{t}\boldsymbol\Gamma\mathbf{P}(x_s)$
● $\boldsymbol\beta_t' = \boldsymbol\Gamma\mathbf{P}(x_{t+1})\boldsymbol\Gamma\mathbf{P}(x_{t+2})\cdots\boldsymbol\Gamma\mathbf{P}(x_T)\mathbf{1}' = \Big(\prod_{s=t+1}^{T}\boldsymbol\Gamma\mathbf{P}(x_s)\Big)\mathbf{1}'$
Here $\boldsymbol\delta$ is the initial distribution, $\boldsymbol\Gamma = (\gamma_{jk})$ the transition probability matrix, and $\mathbf{P}(x)$ the $m \times m$ diagonal matrix with $j$-th diagonal entry $p_j(x)$.
We refer to the elements of $\boldsymbol\alpha_t$ and $\boldsymbol\beta_t$ as forward
probabilities and backward probabilities, respectively.
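In practice these matrix products are evaluated recursively. Below is a minimal NumPy sketch, reusing the made-up 2-state Poisson-HMM parameters from the sampling sketch above. (Real implementations scale $\boldsymbol\alpha_t$ and $\boldsymbol\beta_t$ to avoid numerical underflow; that is omitted here for clarity.)

```python
import numpy as np
from scipy.stats import poisson

# Made-up 2-state Poisson-HMM parameters (as in the sampling sketch).
delta = np.array([0.5, 0.5])
Gamma = np.array([[0.9, 0.1], [0.2, 0.8]])
lam = np.array([1.0, 5.0])
x = np.array([0, 1, 4, 6, 5, 1, 0, 2])        # short made-up observation sequence
T, m = len(x), len(lam)

def P(xt):
    """Diagonal matrix P(x) with j-th diagonal entry p_j(x)."""
    return np.diag(poisson.pmf(xt, lam))

# Forward: alpha_t = delta P(x_1) Gamma P(x_2) ... Gamma P(x_t)   (row vectors)
alpha = np.zeros((T, m))
alpha[0] = delta @ P(x[0])
for t in range(1, T):
    alpha[t] = alpha[t - 1] @ Gamma @ P(x[t])

# Backward: beta_t' = Gamma P(x_{t+1}) ... Gamma P(x_T) 1'        (so beta_T = 1)
beta = np.ones((T, m))
for t in range(T - 2, -1, -1):
    beta[t] = Gamma @ P(x[t + 1]) @ beta[t + 1]

L_T = alpha[-1].sum()                          # likelihood L_T = alpha_T 1'
assert np.allclose((alpha * beta).sum(axis=1), L_T)
```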
9. Baum-Welch Algorithm
Prop [Forward probability] For $t = 1, 2, \ldots, T$ and $j = 1, 2, \ldots, m$,
$$\alpha_t(j) = \Pr(X_1 = x_1, X_2 = x_2, \ldots, X_t = x_t, C_t = j).$$
Prop [Backward probability] For $t = 1, 2, \ldots, T-1$ and $i = 1, 2, \ldots, m$,
$$\beta_t(i) = \Pr(X_{t+1} = x_{t+1}, X_{t+2} = x_{t+2}, \ldots, X_T = x_T \mid C_t = i),$$
provided that $\Pr(C_t = i) > 0$.
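With these interpretations, the E-step quantities from slide 7 follow directly from the forward and backward vectors. Continuing the sketch above (same made-up parameters and the arrays alpha, beta, L_T already computed):

```python
# E step (slide 7): u[t, j] = Pr(C_t = j | x^(T)) and
# v[t, j, k] = Pr(C_{t-1} = j, C_t = k | x^(T)).
u = alpha * beta / L_T                         # shape (T, m); rows sum to 1

v = np.zeros((T, m, m))                        # v[t] is defined for t = 2, ..., T
for t in range(1, T):
    v[t] = (alpha[t - 1][:, None] * Gamma
            * poisson.pmf(x[t], lam)[None, :] * beta[t][None, :]) / L_T
```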
10. Baum-Welch Algorithm
M Step
Maximize the complete-data log-likelihood (CDLL) with respect to the three sets of parameters:
● $\delta_j = u_j(1) \big/ \sum_{j=1}^{m} u_j(1)$
● $\gamma_{jk} = f_{jk} \big/ \sum_{k=1}^{m} f_{jk}$, where $f_{jk} = \sum_{t=2}^{T} v_{jk}(t)$
● the remaining term $\sum_{j=1}^{m}\sum_{t=1}^{T} u_j(t)\log p_j(x_t)$ depends only on the state-dependent
distributions and is maximized separately for each state $j$ (see the sketches below and on the next slide)
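The first two updates are closed-form; a minimal sketch continuing from the E-step code above (using the u and v arrays computed there):

```python
# M step: closed-form updates for the initial distribution and
# the transition probability matrix.
delta_new = u[0] / u[0].sum()                  # delta_j = u_j(1) / sum_j u_j(1)

f = v[1:].sum(axis=0)                          # f_jk = sum_{t=2}^T v_jk(t)
Gamma_new = f / f.sum(axis=1, keepdims=True)   # gamma_jk = f_jk / sum_k f_jk
```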
11. Baum-Welch Algorithm
M Step
For a Poisson-HMM, $p_j(x) = e^{-\lambda_j}\lambda_j^x / x!$, and setting the derivative with respect to $\lambda_j$ to zero gives
$0 = \sum_t u_j(t)(-1 + x_t/\lambda_j)$, that is, $\lambda_j = \sum_{t=1}^{T} u_j(t)\,x_t \big/ \sum_{t=1}^{T} u_j(t)$.
For a Normal-HMM, $p_j(x) = (2\pi\sigma_j^2)^{-1/2}\exp\big(-\tfrac{1}{2\sigma_j^2}(x - \mu_j)^2\big)$, and
$\mu_j = \sum_{t=1}^{T} u_j(t)\,x_t \big/ \sum_{t=1}^{T} u_j(t)$,
$\sigma_j^2 = \sum_{t=1}^{T} u_j(t)(x_t - \mu_j)^2 \big/ \sum_{t=1}^{T} u_j(t)$.
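In code these u-weighted-average updates are one line each; a sketch continuing the running example (u from the E step; the Normal case is shown in comments since the running example is Poisson). Iterating the E and M steps until convergence is the Baum-Welch algorithm.

```python
# M step: Poisson rates as u-weighted averages of the observations.
lam_new = (u * x[:, None]).sum(axis=0) / u.sum(axis=0)

# Normal-HMM analogue (for data modeled with state-dependent normals):
# mu_new = (u * x[:, None]).sum(axis=0) / u.sum(axis=0)
# sigma2_new = (u * (x[:, None] - mu_new) ** 2).sum(axis=0) / u.sum(axis=0)
```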
12. References
● Wikipedia contributors. Expectation–maximization algorithm. Wikipedia, The Free Encyclopedia. April 5, 2018, 09:20 UTC. Available at: https://en.wikipedia.org/w/index.php?title=Expectation%E2%80%93maximization_algorithm&oldid=834359959. Accessed April 9, 2018.
● Wikipedia contributors. Baum–Welch algorithm. Wikipedia, The Free Encyclopedia. February 21, 2018, 06:13 UTC. Available at: https://en.wikipedia.org/w/index.php?title=Baum%E2%80%93Welch_algorithm&oldid=826827292. Accessed April 9, 2018.
● Walter Zucchini, Iain L. MacDonald, and Roland Langrock. Hidden Markov Models for Time Series: An Introduction Using R, Second Edition. Chapman and Hall/CRC, June 7, 2016.