2. Definition of HMM
Def [HMM] A Hidden Markov Model (HMM) $\{X_t : t \in \mathbb{N}\}$ is a particular kind of
dependent mixture. With $\mathbf{X}^{(t)}$ and $\mathbf{C}^{(t)}$ representing the histories from time 1 to
time $t$, one can summarize the simplest model of this kind by:
$$\Pr(C_t \mid \mathbf{C}^{(t-1)}) = \Pr(C_t \mid C_{t-1}), \qquad t = 2, 3, \ldots,$$
$$\Pr(X_t \mid \mathbf{X}^{(t-1)}, \mathbf{C}^{(t)}) = \Pr(X_t \mid C_t), \qquad t \in \mathbb{N}.$$
Remark
● The parameter process $\{C_t : t = 1, 2, \ldots\}$ satisfies the Markov property, and
● the state-dependent process $\{X_t : t = 1, 2, \ldots\}$ depends only on the current state $C_t$.
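The two defining equations translate directly into a sampling procedure: the next state is drawn from the row of the transition matrix for the current state, and each observation is drawn from the current state's distribution. Below is a minimal sketch in Python/NumPy for a 2-state Poisson-HMM; the initial distribution, transition matrix, and rates are made-up illustrative values, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters for a 2-state Poisson-HMM (illustration only).
delta = np.array([0.5, 0.5])              # initial distribution
Gamma = np.array([[0.9, 0.1],             # transition probability matrix
                  [0.2, 0.8]])
lam = np.array([1.0, 5.0])                # Poisson mean for each state

def sample_hmm(T):
    """Sample states C_1..C_T and observations X_1..X_T."""
    states, obs = [], []
    c = rng.choice(2, p=delta)                 # C_1 ~ delta
    for _ in range(T):
        states.append(c)
        obs.append(rng.poisson(lam[c]))        # X_t depends only on C_t
        c = rng.choice(2, p=Gamma[c])          # C_{t+1} depends only on C_t
    return np.array(states), np.array(obs)

states, x = sample_hmm(100)
```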
4. EM algorithm
EM algorithm
Given a statistical model which generates a set $\mathbf{X}$ of observed data, a set of unobserved latent
data or missing values $\mathbf{Z}$, and a vector of unknown parameters $\boldsymbol\theta$, along with a likelihood
function $L(\boldsymbol\theta; \mathbf{X}, \mathbf{Z}) = p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol\theta)$, the maximum likelihood estimate (MLE) of the unknown
parameters is determined by maximizing the marginal likelihood of the observed data
$$L(\boldsymbol\theta; \mathbf{X}) = p(\mathbf{X} \mid \boldsymbol\theta) = \sum_{\mathbf{Z}} p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol\theta).$$
However, this quantity is often intractable (e.g. if $\mathbf{Z}$ is a sequence of events, the number
of possible values grows exponentially with the sequence length, making the exact calculation of the sum
extremely difficult). For an HMM with $m$ states and $T$ observations, for instance, the sum runs over all $m^T$ state sequences.
Remark) EM works to improve $Q(\boldsymbol\theta \mid \boldsymbol\theta^{(t)})$ rather than
directly improving $\log p(\mathbf{X} \mid \boldsymbol\theta)$.
5. EM algorithm
EM algorithm
The EM algorithm seeks to find the MLE of the marginal likelihood by iteratively applying these
two steps:
● Expectation step (E step): Calculate the expected value of the log likelihood function, with
respect to the conditional distribution of $\mathbf{Z}$ given $\mathbf{X}$ under the current estimate of the
parameters $\boldsymbol\theta^{(t)}$:
$$Q(\boldsymbol\theta \mid \boldsymbol\theta^{(t)}) = \mathrm{E}_{\mathbf{Z} \mid \mathbf{X}, \boldsymbol\theta^{(t)}}\big[\log L(\boldsymbol\theta; \mathbf{X}, \mathbf{Z})\big]$$
● Maximization step (M step): Find the parameters that maximize this quantity:
$$\boldsymbol\theta^{(t+1)} = \arg\max_{\boldsymbol\theta}\, Q(\boldsymbol\theta \mid \boldsymbol\theta^{(t)})$$
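To make the alternation concrete, here is a minimal, runnable sketch of EM for a two-component Gaussian mixture (a simpler model than an HMM, chosen only to illustrate the E/M loop; data and starting values are made up). The Baum-Welch algorithm on the following slides is exactly this scheme specialized to HMMs.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
# Made-up data: a mixture of two normals.
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 1, 300)])

# Initial guesses for weights, means, and standard deviations.
w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E step: posterior probability that each point came from each component.
    dens = w * norm.pdf(x[:, None], mu, sd)       # shape (n, 2)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M step: weighted MLEs maximize Q(theta | theta^(t)).
    n_k = r.sum(axis=0)
    w = n_k / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)
```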
7. Baum-Welch Algorithm
E Step
Conditional expectations given the observations $\mathbf{x}^{(T)}$ and the current parameter
estimates:
● $\Pr(C_t = j \mid \mathbf{x}^{(T)}) = \frac{\Pr(C_t = j,\, \mathbf{x}^{(T)})}{\Pr(\mathbf{x}^{(T)})} = \alpha_t(j)\beta_t(j)/L_T$
● $\Pr(C_{t-1} = j, C_t = k \mid \mathbf{x}^{(T)}) = \frac{\Pr(C_{t-1} = j,\, C_t = k,\, \mathbf{x}^{(T)})}{\Pr(\mathbf{x}^{(T)})} = \alpha_{t-1}(j)\,\gamma_{jk}\,p_k(x_t)\,\beta_t(k)/L_T$
In the M step these quantities are denoted $u_j(t)$ and $v_{jk}(t)$, respectively.
8. Baum-Welch Algorithm
Forward and Backward Probabilities
where $\boldsymbol\alpha_t$ and $\boldsymbol\beta_t$ are defined as follows:
● $\boldsymbol\alpha_t = \boldsymbol\delta\mathbf{P}(x_1)\boldsymbol\Gamma\mathbf{P}(x_2)\cdots\boldsymbol\Gamma\mathbf{P}(x_t) = \boldsymbol\delta\mathbf{P}(x_1)\prod_{s=2}^{t}\boldsymbol\Gamma\mathbf{P}(x_s)$
● $\boldsymbol\beta_t' = \boldsymbol\Gamma\mathbf{P}(x_{t+1})\boldsymbol\Gamma\mathbf{P}(x_{t+2})\cdots\boldsymbol\Gamma\mathbf{P}(x_T)\mathbf{1}' = \Big(\prod_{s=t+1}^{T}\boldsymbol\Gamma\mathbf{P}(x_s)\Big)\mathbf{1}'$
Here $\boldsymbol\delta$ is the initial distribution, $\boldsymbol\Gamma = (\gamma_{jk})$ the transition probability matrix, and $\mathbf{P}(x)$ the $m \times m$ diagonal matrix with $j$-th diagonal entry $p_j(x)$.
We refer to the elements of $\boldsymbol\alpha_t$ and $\boldsymbol\beta_t$ as forward
probabilities and backward probabilities, respectively.
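In practice these matrix products are evaluated recursively. Below is a minimal NumPy sketch, reusing the made-up 2-state Poisson-HMM parameters from the sampling sketch above. (Real implementations scale $\boldsymbol\alpha_t$ and $\boldsymbol\beta_t$ to avoid numerical underflow; that is omitted here for clarity.)

```python
import numpy as np
from scipy.stats import poisson

# Made-up 2-state Poisson-HMM parameters (as in the sampling sketch).
delta = np.array([0.5, 0.5])
Gamma = np.array([[0.9, 0.1], [0.2, 0.8]])
lam = np.array([1.0, 5.0])
x = np.array([0, 1, 4, 6, 5, 1, 0, 2])        # short made-up observation sequence
T, m = len(x), len(lam)

def P(xt):
    """Diagonal matrix P(x) with j-th diagonal entry p_j(x)."""
    return np.diag(poisson.pmf(xt, lam))

# Forward: alpha_t = delta P(x_1) Gamma P(x_2) ... Gamma P(x_t)   (row vectors)
alpha = np.zeros((T, m))
alpha[0] = delta @ P(x[0])
for t in range(1, T):
    alpha[t] = alpha[t - 1] @ Gamma @ P(x[t])

# Backward: beta_t' = Gamma P(x_{t+1}) ... Gamma P(x_T) 1'        (so beta_T = 1)
beta = np.ones((T, m))
for t in range(T - 2, -1, -1):
    beta[t] = Gamma @ P(x[t + 1]) @ beta[t + 1]

L_T = alpha[-1].sum()                          # likelihood L_T = alpha_T 1'
assert np.allclose((alpha * beta).sum(axis=1), L_T)
```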
9. Baum-Welch Algorithm
Prop [Forward probability] For $t = 1, 2, \ldots, T$ and $j = 1, 2, \ldots, m$,
$$\alpha_t(j) = \Pr(X_1 = x_1, X_2 = x_2, \ldots, X_t = x_t, C_t = j).$$
Prop [Backward probability] For $t = 1, 2, \ldots, T-1$ and $i = 1, 2, \ldots, m$,
$$\beta_t(i) = \Pr(X_{t+1} = x_{t+1}, X_{t+2} = x_{t+2}, \ldots, X_T = x_T \mid C_t = i),$$
provided that $\Pr(C_t = i) > 0$.
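With these interpretations, the E-step quantities from slide 7 follow directly from the forward and backward vectors. Continuing the sketch above (same made-up parameters and the arrays alpha, beta, L_T already computed):

```python
# E step (slide 7): u[t, j] = Pr(C_t = j | x^(T)) and
# v[t, j, k] = Pr(C_{t-1} = j, C_t = k | x^(T)).
u = alpha * beta / L_T                         # shape (T, m); rows sum to 1

v = np.zeros((T, m, m))                        # v[t] is defined for t = 2, ..., T
for t in range(1, T):
    v[t] = (alpha[t - 1][:, None] * Gamma
            * poisson.pmf(x[t], lam)[None, :] * beta[t][None, :]) / L_T
```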
10. Baum-Welch Algorithm
M Step
Maximize the complete-data log-likelihood (CDLL) with respect to the three sets of parameters:
● $\delta_j = u_j(1) \big/ \sum_{j=1}^{m} u_j(1)$
● $\gamma_{jk} = f_{jk} \big/ \sum_{k=1}^{m} f_{jk}$, where $f_{jk} = \sum_{t=2}^{T} v_{jk}(t)$
● the remaining term $\sum_{j=1}^{m}\sum_{t=1}^{T} u_j(t)\log p_j(x_t)$ depends only on the state-dependent
distributions and is maximized separately for each state $j$ (see the sketches below and on the next slide)
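The first two updates are closed-form; a minimal sketch continuing from the E-step code above (using the u and v arrays computed there):

```python
# M step: closed-form updates for the initial distribution and
# the transition probability matrix.
delta_new = u[0] / u[0].sum()                  # delta_j = u_j(1) / sum_j u_j(1)

f = v[1:].sum(axis=0)                          # f_jk = sum_{t=2}^T v_jk(t)
Gamma_new = f / f.sum(axis=1, keepdims=True)   # gamma_jk = f_jk / sum_k f_jk
```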
11. Baum-Welch Algorithm
M Step
For a Poisson-HMM, $p_j(x) = e^{-\lambda_j}\lambda_j^x / x!$, and setting the derivative with respect to $\lambda_j$ to zero gives
$0 = \sum_t u_j(t)(-1 + x_t/\lambda_j)$, that is, $\lambda_j = \sum_{t=1}^{T} u_j(t)\,x_t \big/ \sum_{t=1}^{T} u_j(t)$.
For a Normal-HMM, $p_j(x) = (2\pi\sigma_j^2)^{-1/2}\exp\big(-\tfrac{1}{2\sigma_j^2}(x - \mu_j)^2\big)$, and
$\mu_j = \sum_{t=1}^{T} u_j(t)\,x_t \big/ \sum_{t=1}^{T} u_j(t)$,
$\sigma_j^2 = \sum_{t=1}^{T} u_j(t)(x_t - \mu_j)^2 \big/ \sum_{t=1}^{T} u_j(t)$.
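In code these u-weighted-average updates are one line each; a sketch continuing the running example (u from the E step; the Normal case is shown in comments since the running example is Poisson). Iterating the E and M steps until convergence is the Baum-Welch algorithm.

```python
# M step: Poisson rates as u-weighted averages of the observations.
lam_new = (u * x[:, None]).sum(axis=0) / u.sum(axis=0)

# Normal-HMM analogue (for data modeled with state-dependent normals):
# mu_new = (u * x[:, None]).sum(axis=0) / u.sum(axis=0)
# sigma2_new = (u * (x[:, None] - mu_new) ** 2).sum(axis=0) / u.sum(axis=0)
```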
12. References
● Wikipedia contributors. Expectation–maximization algorithm. Wikipedia, The Free Encyclopedia. April 5, 2018, 09:20 UTC. Available at: https://en.wikipedia.org/w/index.php?title=Expectation%E2%80%93maximization_algorithm&oldid=834359959. Accessed April 9, 2018.
● Wikipedia contributors. Baum–Welch algorithm. Wikipedia, The Free Encyclopedia. February 21, 2018, 06:13 UTC. Available at: https://en.wikipedia.org/w/index.php?title=Baum%E2%80%93Welch_algorithm&oldid=826827292. Accessed April 9, 2018.
● Walter Zucchini, Iain L. MacDonald, and Roland Langrock. Hidden Markov Models for Time Series: An Introduction Using R, Second Edition. Chapman and Hall/CRC, June 7, 2016.