1. Speeding Up Bayesian HMM by the Four Russians Method
Md Pavel Mahmud¹, Alexander Schliep¹,²
¹Department of Computer Science, Rutgers University
²BioMAPS Institute, Rutgers University
December 22, 2013
3–4. Motivation
Problem: Classify an observation sequence O using a Hidden Markov Model (HMM)
Example: Identifying isochore classes from a DNA sequence
Concentration of G+C content
[Figure: G+C content (0.3–0.6) along the sequence, position in Kb]
$\underbrace{ACGT}_{S_1}\;\underbrace{AAGTTCAT}_{S_2}\;\underbrace{GCGTCCGGC}_{S_3}\;\underbrace{ACGTACGTACGT}_{S_1}$
5–6. Hidden Markov Model
[Figure: three-state HMM; each state $S_i$ has transitions $a_{i,j}$ to states $S_j$ and an emission distribution $b_{i,*}$ over {A, C, G, T}]
$N$, number of states
$\Sigma$, finite alphabet
$A = \{a_{i,j}\}$, transition matrix
$B = \{b_{i,j}\}$, emission matrix
$\pi$, initial state distribution
7. Hidden Markov Model
Given the observation sequence $O = (o_1, o_2, \ldots, o_T) \in \Sigma^T$
Find the hidden state sequence $Q = (q_1, q_2, \ldots, q_T) \in S^T$
Dependency structure: $q_{t-2} \to q_{t-1} \to q_t \to q_{t+1}$, each $q_t$ emitting $o_t$
How to learn the model parameters $\theta = (A, B, \pi)$?
8. Learning
ML approach
$\theta_{ML} = \arg\max_\theta P(O \mid \theta)$
$Q_{ML} = \arg\max_Q P(Q \mid \theta_{ML}, O)$
Fast computation
Local optimization only
Bayesian computations: integrate out the model parameters
$P(Q \mid O) = \int P(Q \mid \theta, O) \, P(\theta \mid O) \, d\theta$
Computationally expensive
9. Bayesian Analysis
Our goal is to compute the distribution $P(Q \mid O)$
Prior distributions for $A_{i,*}$, $B_{i,*}$, and $\pi$
Standard conjugate priors such as the Dirichlet distribution
Gibbs sampling
Creates a Markov chain with stationary distribution $P(Q \mid O)$
The states of the chain can be used as samples from the stationary distribution
Forward-backward Gibbs sampling (FBG-sampling)
Excellent convergence characteristics
We speed up the computation by exploiting sequence repetition
10. Bayesian Analysis
Algorithm 1 FBG-Sampling(O)
1: Choose initial parameters $\theta^0 = (A^0, B^0, \pi^0)$.
2: Perform the following steps for $0 \le m \le M$:
   $Q^m$ = StateSampler($O$, $\theta^m$)
   Sample HMM parameters:
   $\theta^{m+1} \sim$ PriorDistribution(hyperparameters, $O$, $Q^m$, $\theta^m$)
3: return $Q^0, Q^1, \ldots, Q^{M-1}$.
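The parameter-sampling step of Algorithm 1 is not spelled out on the slides; with the conjugate Dirichlet priors mentioned above, it reduces to drawing each row of A and B from a Dirichlet distribution whose parameters are the prior pseudo-counts plus the transition/emission counts observed in the sampled path Q^m. A minimal sketch — `sample_parameters` and the symmetric `prior` hyper-parameter are assumed names, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_parameters(O, Q, N, S, prior=1.0):
    """One Gibbs step: draw (A, B, pi) from their posterior given the
    sampled state path Q. With Dirichlet priors the posterior of each
    row is again Dirichlet(prior pseudo-counts + observed counts)."""
    T = len(O)
    trans = np.full((N, N), prior)          # pseudo-counts from the prior
    emit = np.full((N, S), prior)
    for t in range(T - 1):
        trans[Q[t], Q[t + 1]] += 1          # transition counts along Q
    for t in range(T):
        emit[Q[t], O[t]] += 1               # emission counts along Q
    A = np.array([rng.dirichlet(row) for row in trans])
    B = np.array([rng.dirichlet(row) for row in emit])
    pi = rng.dirichlet(np.bincount([Q[0]], minlength=N) + prior)
    return A, B, pi
```

Each returned row is a valid probability distribution by construction, so the chain stays in the parameter simplex at every iteration.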
11. Bayesian Analysis
Algorithm 2 StateSampler($O$, $\theta$)
1: Forward Variables: $\alpha_t(j) = P(O_{1 \ldots t}, q_t = j \mid \theta)$
   Compute $\alpha_1(j) = \pi_j \, b_{j,o_1}$ for all $j$.
   For $2 \le t \le T$:
   Compute $\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i) \, a_{i,j} \, b_{j,o_t}$ for all $j$.
2: Backward Sampling:
   Sample $q_T$ s.t. $P(q_T = i) \propto \alpha_T(i)$.
   For $T > t \ge 1$:
   Sample $q_t$ s.t. $P(q_t = i) \propto \alpha_t(i) \, a_{i,q_{t+1}}$.
3: return Q
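A minimal NumPy sketch of Algorithm 2. The per-step rescaling of α is an addition not shown on the slide (it avoids numerical underflow on long sequences) and does not change the sampled distribution, since only ratios of α enter the sampling weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def state_sampler(O, A, B, pi):
    """Draw a state path Q ~ P(Q | O, theta) exactly:
    forward recursion, then backward sampling."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    alpha[0] /= alpha[0].sum()              # rescale to avoid underflow
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
        alpha[t] /= alpha[t].sum()
    Q = np.empty(T, dtype=int)
    Q[-1] = rng.choice(N, p=alpha[-1])      # P(q_T = i) ∝ alpha_T(i)
    for t in range(T - 2, -1, -1):
        w = alpha[t] * A[:, Q[t + 1]]       # P(q_t = i) ∝ alpha_t(i) a_{i, q_{t+1}}
        Q[t] = rng.choice(N, p=w / w.sum())
    return Q
```

This is the O(TN²)-time, O(TN)-memory baseline that the compression below improves on.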
25–36. Speeding up MCMC
Let's assume T is a multiple of k s.t. $T = dk$
Changes at the block boundaries $k, 2k, 3k, \ldots, (d-1)k, dk$
Sample the boundary states right to left: $q_k \leftarrow q_{2k} \leftarrow q_{3k} \cdots q_{(d-1)k} \leftarrow q_{dk}$
$(T/k) \cdot N$ forward variables instead of $T \cdot N$
Backward state sampling is modified accordingly
39–47. Compression and Forward Variables
Exploit sequence repetition in long sequences [Mozes'09]
Viterbi path (the most likely state sequence) computation
Baum-Welch algorithm
Let's define $M(v)$ s.t. $M_{i,j}(v) = a_{i,j} \, b_{j,v}$ and, for a substring,
$M(O_{i \ldots j}) = M(o_i) \, M(o_{i+1}) \cdots M(o_{j-1}) \, M(o_j)$
Pre-compute all possible matrices $M(X)$, where $|X| \le k$
Known as the four Russians method
Rewrite the forward variables $\alpha_t$ as a row vector:
$\alpha_t = \alpha_1 \, M(o_2) \cdots M(o_{t-1}) \, M(o_t) = \alpha_{t-1} \, M(o_t)$
$\alpha_{lk} = \alpha_{(l-1)k} \, M(O_{(l-1)k+1 \ldots lk})$
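The precomputation and the blocked forward recursion might look as follows in NumPy. `precompute` and `forward_blocked` are hypothetical names, and the table is keyed by tuples of symbols rather than the packed integer encodings a real implementation would likely use:

```python
import numpy as np
from itertools import product

def precompute(A, B, k):
    """Four Russians step: tabulate M(X) = M(x_1)...M(x_m) for every
    string X over the alphabet with |X| <= k, where M(v)_{ij} = a_ij * b_{j,v}."""
    S = B.shape[1]
    M1 = [A * B[:, v] for v in range(S)]       # M(v): scale column j by b_{j,v}
    table = {(v,): M1[v] for v in range(S)}
    for m in range(2, k + 1):                  # extend length-(m-1) products by one symbol
        for X in product(range(S), repeat=m):
            table[X] = table[X[:-1]] @ M1[X[-1]]
    return table

def forward_blocked(O, A, B, pi, k):
    """alpha_{lk} = alpha_{(l-1)k} M(O_{(l-1)k+1..lk}): one vector-matrix
    product per block of k symbols instead of k separate products."""
    table = precompute(A, B, k)
    alpha = pi * B[:, O[0]]                    # alpha_1
    for l in range(1, len(O), k):              # consume remaining symbols k at a time
        alpha = alpha @ table[tuple(O[l:l + k])]
    return alpha                               # unnormalized alpha_T
```

The last block may be shorter than k; since the table holds all strings of length up to k, no special case is needed.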
49–51. Backward-forward State Sequence
$q_k \leftarrow q_{2k} \leftarrow q_{3k} \cdots q_{(d-1)k} \leftarrow q_{dk}$
$P(Q \mid O, \theta) = \underbrace{P(Q_{1 \ldots k-1} \mid Q_{k \ldots T}, O, \theta)}_{\text{Part A}} \; \underbrace{P(Q_{k \ldots T} \mid O, \theta)}_{\text{Part B}}$
$P(Q_{k \ldots T} \mid O, \theta) = \underbrace{P(q_T \mid O, \theta)}_{\text{Part B1}} \; \prod_{i=2}^{d} \Big[ \underbrace{P(q_s \mid Q_{e \ldots T}, O, \theta)}_{\text{Part B2}} \; \underbrace{\prod_{j=s+1}^{e-1} P(q_j \mid Q_{s \ldots j-1}, Q_{e \ldots T}, O, \theta)}_{\text{Part B3}} \Big]$, where $s = (i-1)k$ and $e = ik$
Sampling from part B1:
$P(q_T \mid O, \theta) \propto P(q_T, O \mid \theta) = \alpha_T(q_T)$
52. Backward-forward State Sequence
Sampling from part B2:
$P(q_s \mid Q_{e \ldots T}, O, \theta) \propto \alpha_s(q_s) \, M_{q_s, q_e}(O_{s+1 \ldots e})$
[Figure: B2 samples $q_s$ in group $i-1$, conditioning on $q_e$ and skipping over $q_{s+1}, \ldots, q_{e-1}$]
53. Backward-forward State Sequence
Sampling from part B3:
$P(q_j \mid Q_{s \ldots j-1}, Q_{e \ldots T}, O, \theta) \propto M_{q_{j-1}, q_j}(o_j) \, M_{q_j, q_e}(O_{j+1 \ldots e})$
[Figure: B3 samples $q_j$ in group $i$, conditioning on $q_{j-1}$ and $q_e$]
54. Fast Sampling Algorithm
Algorithm 3 FastStateSampler($O$, $\theta$)
1: Precompute:
   $M(X)$ for all $X \in \bigcup_{i=1}^{k} \Sigma^i$
2: Forward Variables:
   Compute $\alpha_k = \alpha_1 \, M(O_{2 \ldots k})$
   Compute $\alpha_{ik} = \alpha_{(i-1)k} \, M(O_{(i-1)k+1 \ldots ik})$ for $1 < i \le d$
3: Backward-forward Sampling:
   Sample $q_T$. For $d \ge i \ge 2$:
   Sample $q_{(i-1)k}$ using part B2
   Sample $q_j$, for $(i-1)k < j < ik$, using part B3
   Given $q_k$, sample $q_1, q_2, \ldots, q_{k-1}$ using part A
4: return Q
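Parts B2 and B3 inside one block can be sketched as below, assuming the table of precomputed M(X) matrices from step 1 is passed in. `sample_block` and its argument names are invented for illustration, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_block(alpha_s, q_e, o_block, table):
    """Sample q_s, q_{s+1}, ..., q_{e-1} inside one block, given alpha_s
    and the already-sampled boundary state q_e. o_block holds the symbols
    o_{s+1}..o_e; table maps a symbol tuple X to the matrix M(X)."""
    N = len(alpha_s)
    # B2: P(q_s = i) ∝ alpha_s(i) * M(O_{s+1...e})_{i, q_e}
    w = alpha_s * table[tuple(o_block)][:, q_e]
    states = [rng.choice(N, p=w / w.sum())]
    # B3 (forward within the block):
    # P(q_j = i) ∝ M(o_j)_{q_{j-1}, i} * M(O_{j+1...e})_{i, q_e}
    for m in range(len(o_block) - 1):
        w = table[(o_block[m],)][states[-1]] * table[tuple(o_block[m + 1:])][:, q_e]
        states.append(rng.choice(N, p=w / w.sum()))
    return states  # [q_s, q_{s+1}, ..., q_{e-1}]
```

Every suffix of the block has length below k, so all the needed M(O_{j+1...e}) lookups hit the precomputed table.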
55–58. Running Time
Pre-compute the $2|\Sigma|^k$ matrices in $O(2|\Sigma|^k N^3)$
Forward variables in $O((T/k) \, N^2)$
State samples in $O(T \log N)$
Total running time is $O(2|\Sigma|^k N^3 + (T/k) N^2 + T \log N)$
If $k$ is set to $\frac{1}{2} \log_{|\Sigma|} T$, the running time is
$O\big(2\sqrt{T} N^3 + \frac{2 T N^2}{\log_{|\Sigma|} T} + T \log N\big)$
Assuming $N \le \frac{\sqrt{T}}{\log_{|\Sigma|} T}$, the speed-up is $\Theta(\log_{|\Sigma|} T)$
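For intuition, the suggested block length k = (1/2)·log_{|Σ|} T can be computed directly. `choose_k` is a hypothetical helper; a real implementation would presumably also cap k by the memory available for the 2|Σ|^k table:

```python
import math

def choose_k(T, sigma_size):
    """Block length k = (1/2) * log_{|Sigma|} T, balancing the
    O(2 |Sigma|^k N^3) table-building cost against the
    O((T/k) N^2) blocked forward pass."""
    return max(1, int(0.5 * math.log(T) / math.log(sigma_size)))
```

For a DNA alphabet (|Σ| = 4) and T = 10⁶ this gives k = 4, so the table holds at most 2·4⁴ = 512 matrices.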
59. DNA Segmentation: Based on Sequence Composition
[Figure: running time (seconds) vs. number of states (0–45) for four genomes: Bacteriophage Lambda (0.05 Mbp), Mycoplasma Leachii (1 Mbp), Planctomyces Brasiliensis (6 Mbp), Sorangium Cellulosum (13 Mbp)]
Dirichlet priors, non-informative hyper-parameters [Boys'00,'04]
61. Conclusion
Modified, but exact, forward-backward Gibbs sampling
Applicable to higher-order observations
Logarithmic improvement in running time
First use of sequence repetition for Bayesian HMMs
Long sequences and small alphabets, e.g. biological sequences
Future work
Other compression schemes
Complex HMM topologies