SlideShare a Scribd company logo
1 of 17
Download to read offline
Background and Motivation Methods Results
Sufficient statistics for CTMCs:
Methods, implementations and applications
Paula Tataru, Asger Hobolth
April 13, 2011
Sufficient statistics for CTMCs: Methods, implementations and applications 1/17
Background and Motivation Methods Results
HIV evolution
HIV pol gene
C02 GCA TCC . . . GAT
E01 GCA TTC . . . GAT
C99 GCA TTT . . . GAT
D01 GCA CTT . . . GAT
D99 GCA CTT . . . GAT
D02 GCA CTT . . . GAT
C94 GCA CCT . . . GAT
H98 GCA TTT . . . GAC
I93 GCA TTC . . . GAG
I95 GCA TTC . . . GAC
Figure: Phylogeny reconstructed using maximum likelihood
Sufficient statistics for CTMCs: Methods, implementations and applications 2/17
Background and Motivation Methods Results
HIV evolution
Genetic code
2nd
base
T C A G
1st
base
T
TTT Phe TCT Ser TAT Tyr TGT Cys
TTC Phe TCC Ser TAC Tyr TGC Cys
TTA Leu TCA Ser TAA Stop TGA Stop
TTG Leu TCG Ser TAG Stop TGG Trp
C
CTT Leu CCT Pro CAT His CGT Arg
CTC Leu CCC Pro CAC His CGC Arg
CTA Leu CCA Pro CAA Gln CGA Arg
CTG Leu CCG Pro CAG Gln CGG Arg
A
ATT Ile ACT Thr AAT Asn AGT Ser
ATC Ile ACC Thr AAC Asn AGC Ser
ATA Ile ACA Thr AAA Lys AGA Arg
ATG Met ACG Thr AAG Lys AGG Arg
G
GTT Val GCT Ala GAT Asp GGT Gly
GTC Val GCC Ala GAC Asp GGC Gly
GTA Val GCA Ala GAA Glu GGA Gly
GTG Val GCG Ala GAG Glu GGG Gly
Table: Genetic code
Sufficient statistics for CTMCs: Methods, implementations and applications 3/17
Background and Motivation Methods Results
HIV evolution
Transitions vs transversions
A G
C T
Sufficient statistics for CTMCs: Methods, implementations and applications 4/17
Background and Motivation Methods Results
HIV evolution
Codon substitution model
Goldman&Yang (94) codon model:
qij =



0 if i → j by more than one substitution
ακπj if i → j by a synonymous transition
απj if i → j by a synonymous transversion
αωκπj if i → j by a non-synonymous transition
αωπj if i → j by a non-synonymous transversion
Sufficient statistics for CTMCs: Methods, implementations and applications 5/17
Background and Motivation Methods Results
Continuous Time Markov Chains
Continuous Time Markov Chains (CTMCs)
A CTMC is a stochastic process {X(t)|t ≥ 0} that fulfils the
Markov property.
The process is characterized by a rate matrix Q = (qij ) that
describes the rates at which the process moves from one state
to another.
Q fulfils qii = − j=i qij .
Transition probabilities
pij (t) = P(X(t) = j|X(0) = i) = (eQt)ij .
AAA AAC
q12
q21
...
q1· q2·
TTT
q1n
qn1
q2n
qn2
qn·
Sufficient statistics for CTMCs: Methods, implementations and applications 6/17
Background and Motivation Methods Results
Continuous Time Markov Chains
Likelihood equations
The log likelihood of observing the complete data x given κ, ω, α is:
(κ, ω, α; x) = −α (κTs,ts + ωκTns,ts + ωTns,tv + Ts,tv )
+Nts log κ + Nns log ω + N log α
Let β = αk. Maximize likelihood ⇔ solving a 2nd order equation:



∂l
∂α = −Ts,tv − ωTns,tv + Ntv
α = 0
∂l
∂β = −Ts,ts − ωTns,ts + Nts
β = 0
∂l
∂ω = −αTns,tv − βTns,ts + Nns
ω = 0
Sufficient statistics for CTMCs: Methods, implementations and applications 7/17
Background and Motivation Methods Results
Continuous Time Markov Chains
Missing data problem: EM algorithm
Iterative procedure:
expectation step - computes the expectation of the likelihood
using the current estimate of the variables;
maximization step - estimates the parameters by maximizing
the likelihood found in the previous step.
In particular, it can be used to estimate the rate matrix for
discretely observed CTMCs.
Sufficient statistics for CTMCs: Methods, implementations and applications 8/17
Background and Motivation Methods Results
Continuous Time Markov Chains
E step
E [ (κ, ω, α; x)|y] = −α κE[Ts,ts
|y] + ωκE[Tns,ts
|y] + ωE[Tns,tv
|y] + E[Ts,tv
|y]
+E[Nts
|y] log κ + E[Nns
|y] log ω + E[N|y] log α
Assuming a fixed phylogeny, we have: βk
γk
tk
E[Ti |y] =
branch k a,b
P(γk = a, βk = b, tk|y)
E[Ti 1(b after tk)|a]
pab(tk)
E[Nij |y] =
branch k a,b
P(γk = a, βk = b, tk|y)
E[Nij 1(b after tk)|a]
pab(tk)
Sufficient statistics for CTMCs: Methods, implementations and applications 9/17
Background and Motivation Methods Results
Continuous Time Markov Chains
Statistics for CTMCs
We are interested in the following summary statistics:
E[Ncd 1(X(T) = b)|X(0) = a] = qcd Icd
ab (T)
where Icd
ab (T) =
T
0
pac(t)pdb(T − t) dt.
Sometimes we need sums of expected values:
N∆
ab(T) =
(c,d)∈∆
E[Ncd 1(X(T) = b)|X(0) = a] =
(c,d)∈∆
qcd Icd
ab (T)
Nts requires ∆ = {(c, d)|c → d by a transition}.
Sufficient statistics for CTMCs: Methods, implementations and applications 10/17
Background and Motivation Methods Results
Methods
In Hobolth. A & Jensen J.L. (2010), three methods are presented
to evaluate the necessary statistics:
1. eigenvalue decomposition - eigen;
2. uniformization - uni;
3. basic matrix exponentiation - expm.
Our aim is to implement them and compare their performance.
Sufficient statistics for CTMCs: Methods, implementations and applications 11/17
Background and Motivation Methods Results
eigen
1. Eigenvalue decomposition
Let Q = UΛU−1
Then Icd
ab (T) =
T
0
pac(t)pdb(T − t) dt
=
T
0
eQt
ac
eQ(T−t)
db
dt
=
T
0
UeΛt
U−1
ac
UeΛ(T−t)
U−1
db
dt
Setting Jij (T) =
T
0
eλi t
eλj (T−t)
dt
we have N∆(T) = U · J(T) U−1 · ((1∆ Q) · U) · U−1
Sufficient statistics for CTMCs: Methods, implementations and applications 12/17
Background and Motivation Methods Results
uni
2. Uniformization
Let µ = maxi (−qii ) and R = 1
µQ + I
Then P(t) = eQt = e(µ(R−I))t
=
∞
m=0
Rm (µt)m
m!
e−µt
=
∞
m=0
Rm
Pois(m; µt)
and N∆(T) =
1
µ
∞
m=0
Pois(m + 1; µT)
m
l=0
Rl
· (1∆ Q) · Rm−l
Sufficient statistics for CTMCs: Methods, implementations and applications 13/17
Background and Motivation Methods Results
uni
Truncation at s(λ)
0 50 150 250
0100300
λ
s(λ)
regression
tail
2 4 6 8 10
102030
λs(λ)
regression
tail
Figure: Comparison between the Poission tail as determined by R and
the approximation using linear regression, with s(λ) = 4 + 6
√
λ + λ.
Sufficient statistics for CTMCs: Methods, implementations and applications 14/17
Background and Motivation Methods Results
expm
3. Exponentiation
Let A =
Q 1∆ Q
0 Q
⇒ eAt =
F(T) G(T)
0 F(T)
We have d
dT eAT = AeAT
⇒
F (T) G (T)
0 F (T)
=
Q 1∆ Q
0 Q
F(T) G(T)
0 F(T)
⇒ G(T) =
T
0
eQt
(1∆ Q) eQ(T−t)
dt
Then N∆(T) = (eAT )1:n,(n+1):2n
For exponentiating a matrix, we used the R package expm.
Sufficient statistics for CTMCs: Methods, implementations and applications 15/17
Background and Motivation Methods Results
Expectation time
2 4 6 8 10 12 14 16
0.01.02.03.0
# sequences
sec
eigen
uni
expm
Figure: Time comparsion for computing expectations in the EM algorithm
Sufficient statistics for CTMCs: Methods, implementations and applications 16/17
Background and Motivation Methods Results
Summary
N∆
(T) accuracy
running time
pre main
eigen U · J(T) U−1
· ((1∆ Q) · U) · U−1
prone to
O(n3
) O(n3
)
large errors
uni
1
µ m
Pois(m; µT)
m−1
l=0
R
l
· (1∆ Q) · R
m−l sum of many
O(s(µT)n3
) O(s(µT)n3
)
small numbers
expm (eAT
)1:n,(n+1):2n most accurate 1
none O(n3
)
Table: Summary
1
Higham, J.: The Scaling and Squaring Method for the Matrix Exponential Revisited (2003)
Moler, C., Van Loan, C.: Nineteen Dubious Ways to Compute the Exponential of a Matrix,
Twenty-Five Years Later (2003)
Sufficient statistics for CTMCs: Methods, implementations and applications 17/17

More Related Content

Viewers also liked

Viewers also liked (13)

PaulaTataru
PaulaTataruPaulaTataru
PaulaTataru
 
PaulaTataruVienna
PaulaTataruViennaPaulaTataruVienna
PaulaTataruVienna
 
Mols_August2013
Mols_August2013Mols_August2013
Mols_August2013
 
AB-RNA-alignments-2010
AB-RNA-alignments-2010AB-RNA-alignments-2010
AB-RNA-alignments-2010
 
part A
part Apart A
part A
 
write_thesis
write_thesiswrite_thesis
write_thesis
 
TreeOfLife-jeopardy-2014
TreeOfLife-jeopardy-2014TreeOfLife-jeopardy-2014
TreeOfLife-jeopardy-2014
 
birc-csd2012
birc-csd2012birc-csd2012
birc-csd2012
 
PaulaTataruCSHL
PaulaTataruCSHLPaulaTataruCSHL
PaulaTataruCSHL
 
PaulaTataruAarhus
PaulaTataruAarhusPaulaTataruAarhus
PaulaTataruAarhus
 
PaulaTataruOxford
PaulaTataruOxfordPaulaTataruOxford
PaulaTataruOxford
 
PaulaTataru_PhD_defense
PaulaTataru_PhD_defensePaulaTataru_PhD_defense
PaulaTataru_PhD_defense
 
AB-RNA-Mfold&SCFGs-2011
AB-RNA-Mfold&SCFGs-2011AB-RNA-Mfold&SCFGs-2011
AB-RNA-Mfold&SCFGs-2011
 

Similar to Thiele

Computing near-optimal policies from trajectories by solving a sequence of st...
Computing near-optimal policies from trajectories by solving a sequence of st...Computing near-optimal policies from trajectories by solving a sequence of st...
Computing near-optimal policies from trajectories by solving a sequence of st...Université de Liège (ULg)
 
Continuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event SystemsContinuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event Systemsahmad bassiouny
 
Proportional-integral genetic algorithm controller for stability of TCP network
Proportional-integral genetic algorithm controller for stability of TCP network Proportional-integral genetic algorithm controller for stability of TCP network
Proportional-integral genetic algorithm controller for stability of TCP network IJECEIAES
 
A walk through the intersection between machine learning and mechanistic mode...
A walk through the intersection between machine learning and mechanistic mode...A walk through the intersection between machine learning and mechanistic mode...
A walk through the intersection between machine learning and mechanistic mode...JuanPabloCarbajal3
 
Maneuvering target track prediction model
Maneuvering target track prediction modelManeuvering target track prediction model
Maneuvering target track prediction modelIJCI JOURNAL
 
RoCo MSc Seminar
RoCo MSc SeminarRoCo MSc Seminar
RoCo MSc SeminarYazanSafadi
 
Discretizing of linear systems with time-delay Using method of Euler’s and Tu...
Discretizing of linear systems with time-delay Using method of Euler’s and Tu...Discretizing of linear systems with time-delay Using method of Euler’s and Tu...
Discretizing of linear systems with time-delay Using method of Euler’s and Tu...IJERA Editor
 
2012 mdsp pr06  hmm
2012 mdsp pr06  hmm2012 mdsp pr06  hmm
2012 mdsp pr06  hmmnozomuhamada
 
IGARSS2011_TU1.T03.2_Zhan.ppt
IGARSS2011_TU1.T03.2_Zhan.pptIGARSS2011_TU1.T03.2_Zhan.ppt
IGARSS2011_TU1.T03.2_Zhan.pptgrssieee
 
Mathematical support for preventive maintenance periodicity optimization of r...
Mathematical support for preventive maintenance periodicity optimization of r...Mathematical support for preventive maintenance periodicity optimization of r...
Mathematical support for preventive maintenance periodicity optimization of r...Alexander Lyubchenko
 
Digital Signal Processing[ECEG-3171]-Ch1_L06
Digital Signal Processing[ECEG-3171]-Ch1_L06Digital Signal Processing[ECEG-3171]-Ch1_L06
Digital Signal Processing[ECEG-3171]-Ch1_L06Rediet Moges
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationWork-Bench
 
Exploiting clustering techniques
Exploiting clustering techniquesExploiting clustering techniques
Exploiting clustering techniquesmejjagiri
 
A machine consciousness approach to urban traffic signal control
A machine consciousness approach to urban traffic signal controlA machine consciousness approach to urban traffic signal control
A machine consciousness approach to urban traffic signal controlAndré Paraense
 
Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...TELKOMNIKA JOURNAL
 

Similar to Thiele (20)

PhDretreat2011
PhDretreat2011PhDretreat2011
PhDretreat2011
 
Computing near-optimal policies from trajectories by solving a sequence of st...
Computing near-optimal policies from trajectories by solving a sequence of st...Computing near-optimal policies from trajectories by solving a sequence of st...
Computing near-optimal policies from trajectories by solving a sequence of st...
 
Continuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event SystemsContinuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event Systems
 
Proportional-integral genetic algorithm controller for stability of TCP network
Proportional-integral genetic algorithm controller for stability of TCP network Proportional-integral genetic algorithm controller for stability of TCP network
Proportional-integral genetic algorithm controller for stability of TCP network
 
Ydstie
YdstieYdstie
Ydstie
 
Ecg
EcgEcg
Ecg
 
A walk through the intersection between machine learning and mechanistic mode...
A walk through the intersection between machine learning and mechanistic mode...A walk through the intersection between machine learning and mechanistic mode...
A walk through the intersection between machine learning and mechanistic mode...
 
Maneuvering target track prediction model
Maneuvering target track prediction modelManeuvering target track prediction model
Maneuvering target track prediction model
 
922214 e002013
922214 e002013922214 e002013
922214 e002013
 
RoCo MSc Seminar
RoCo MSc SeminarRoCo MSc Seminar
RoCo MSc Seminar
 
Discretizing of linear systems with time-delay Using method of Euler’s and Tu...
Discretizing of linear systems with time-delay Using method of Euler’s and Tu...Discretizing of linear systems with time-delay Using method of Euler’s and Tu...
Discretizing of linear systems with time-delay Using method of Euler’s and Tu...
 
2012 mdsp pr06  hmm
2012 mdsp pr06  hmm2012 mdsp pr06  hmm
2012 mdsp pr06  hmm
 
CoopLoc Technical Presentation
CoopLoc Technical PresentationCoopLoc Technical Presentation
CoopLoc Technical Presentation
 
IGARSS2011_TU1.T03.2_Zhan.ppt
IGARSS2011_TU1.T03.2_Zhan.pptIGARSS2011_TU1.T03.2_Zhan.ppt
IGARSS2011_TU1.T03.2_Zhan.ppt
 
Mathematical support for preventive maintenance periodicity optimization of r...
Mathematical support for preventive maintenance periodicity optimization of r...Mathematical support for preventive maintenance periodicity optimization of r...
Mathematical support for preventive maintenance periodicity optimization of r...
 
Digital Signal Processing[ECEG-3171]-Ch1_L06
Digital Signal Processing[ECEG-3171]-Ch1_L06Digital Signal Processing[ECEG-3171]-Ch1_L06
Digital Signal Processing[ECEG-3171]-Ch1_L06
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
 
Exploiting clustering techniques
Exploiting clustering techniquesExploiting clustering techniques
Exploiting clustering techniques
 
A machine consciousness approach to urban traffic signal control
A machine consciousness approach to urban traffic signal controlA machine consciousness approach to urban traffic signal control
A machine consciousness approach to urban traffic signal control
 
Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...Design and simulation an optimal enhanced PI controller for congestion avoida...
Design and simulation an optimal enhanced PI controller for congestion avoida...
 

More from Paula Tataru

More from Paula Tataru (8)

PhDretreat2014
PhDretreat2014PhDretreat2014
PhDretreat2014
 
AB-RNA-comparison-2011
AB-RNA-comparison-2011AB-RNA-comparison-2011
AB-RNA-comparison-2011
 
AB-RNA-alignments-2011
AB-RNA-alignments-2011AB-RNA-alignments-2011
AB-RNA-alignments-2011
 
AB-RNA-Nussinov-2011
AB-RNA-Nussinov-2011AB-RNA-Nussinov-2011
AB-RNA-Nussinov-2011
 
AB-RNA-SCFGdesign=2010
AB-RNA-SCFGdesign=2010AB-RNA-SCFGdesign=2010
AB-RNA-SCFGdesign=2010
 
AB-RNA-SCFG-2010
AB-RNA-SCFG-2010AB-RNA-SCFG-2010
AB-RNA-SCFG-2010
 
AB-RNA-Nus-2010
AB-RNA-Nus-2010AB-RNA-Nus-2010
AB-RNA-Nus-2010
 
mgsa_poster
mgsa_postermgsa_poster
mgsa_poster
 

Thiele

  • 1. Background and Motivation Methods Results Sufficient statistics for CTMCs: Methods, implementations and applications Paula Tataru, Asger Hobolth April 13, 2011 Sufficient statistics for CTMCs: Methods, implementations and applications 1/17
  • 2. Background and Motivation Methods Results HIV evolution HIV pol gene C02 GCA TCC . . . GAT E01 GCA TTC . . . GAT C99 GCA TTT . . . GAT D01 GCA CTT . . . GAT D99 GCA CTT . . . GAT D02 GCA CTT . . . GAT C94 GCA CCT . . . GAT H98 GCA TTT . . . GAC I93 GCA TTC . . . GAG I95 GCA TTC . . . GAC Figure: Phylogeny reconstructed using maximum likelihood Sufficient statistics for CTMCs: Methods, implementations and applications 2/17
  • 3. Background and Motivation Methods Results HIV evolution Genetic code 2nd base T C A G 1st base T TTT Phe TCT Ser TAT Tyr TGT Cys TTC Phe TCC Ser TAC Tyr TGC Cys TTA Leu TCA Ser TAA Stop TGA Stop TTG Leu TCG Ser TAG Stop TGG Trp C CTT Leu CCT Pro CAT His CGT Arg CTC Leu CCC Pro CAC His CGC Arg CTA Leu CCA Pro CAA Gln CGA Arg CTG Leu CCG Pro CAG Gln CGG Arg A ATT Ile ACT Thr AAT Asn AGT Ser ATC Ile ACC Thr AAC Asn AGC Ser ATA Ile ACA Thr AAA Lys AGA Arg ATG Met ACG Thr AAG Lys AGG Arg G GTT Val GCT Ala GAT Asp GGT Gly GTC Val GCC Ala GAC Asp GGC Gly GTA Val GCA Ala GAA Glu GGA Gly GTG Val GCG Ala GAG Glu GGG Gly Table: Genetic code Sufficient statistics for CTMCs: Methods, implementations and applications 3/17
  • 4. Background and Motivation Methods Results HIV evolution Transitions vs transversions A G C T Sufficient statistics for CTMCs: Methods, implementations and applications 4/17
  • 5. Background and Motivation Methods Results HIV evolution Codon substitution model Goldman&Yang (94) codon model: qij =    0 if i → j by more than one substitution ακπj if i → j by a synonymous transition απj if i → j by a synonymous transversion αωκπj if i → j by a non-synonymous transition αωπj if i → j by a non-synonymous transversion Sufficient statistics for CTMCs: Methods, implementations and applications 5/17
  • 6. Background and Motivation Methods Results Continuous Time Markov Chains Continuous Time Markov Chains (CTMCs) A CTMC is a stochastic process {X(t)|t ≥ 0} that fulfils the Markov property. The process is characterized by a rate matrix Q = (qij ) that describes the rates at which the process moves from one state to another. Q fulfils qii = − j=i qij . Transition probabilities pij (t) = P(X(t) = j|X(0) = i) = (eQt)ij . AAA AAC q12 q21 ... q1· q2· TTT q1n qn1 q2n qn2 qn· Sufficient statistics for CTMCs: Methods, implementations and applications 6/17
  • 7. Background and Motivation Methods Results Continuous Time Markov Chains Likelihood equations The log likelihood of observing the complete data x given κ, ω, α is: (κ, ω, α; x) = −α (κTs,ts + ωκTns,ts + ωTns,tv + Ts,tv ) +Nts log κ + Nns log ω + N log α Let β = αk. Maximize likelihood ⇔ solving a 2nd order equation:    ∂l ∂α = −Ts,tv − ωTns,tv + Ntv α = 0 ∂l ∂β = −Ts,ts − ωTns,ts + Nts β = 0 ∂l ∂ω = −αTns,tv − βTns,ts + Nns ω = 0 Sufficient statistics for CTMCs: Methods, implementations and applications 7/17
  • 8. Background and Motivation Methods Results Continuous Time Markov Chains Missing data problem: EM algorithm Iterative procedure: expectation step - computes the expectation of the likelihood using the current estimate of the variables; maximization step - estimates the parameters by maximizing the likelihood found in the previous step. In particular, it can be used to estimate the rate matrix for discretely observed CTMCs. Sufficient statistics for CTMCs: Methods, implementations and applications 8/17
  • 9. Background and Motivation Methods Results Continuous Time Markov Chains E step E [ (κ, ω, α; x)|y] = −α κE[Ts,ts |y] + ωκE[Tns,ts |y] + ωE[Tns,tv |y] + E[Ts,tv |y] +E[Nts |y] log κ + E[Nns |y] log ω + E[N|y] log α Assuming a fixed phylogeny, we have: βk γk tk E[Ti |y] = branch k a,b P(γk = a, βk = b, tk|y) E[Ti 1(b after tk)|a] pab(tk) E[Nij |y] = branch k a,b P(γk = a, βk = b, tk|y) E[Nij 1(b after tk)|a] pab(tk) Sufficient statistics for CTMCs: Methods, implementations and applications 9/17
  • 10. Background and Motivation Methods Results Continuous Time Markov Chains Statistics for CTMCs We are interested in the following summary statistics: E[Ncd 1(X(T) = b)|X(0) = a] = qcd Icd ab (T) where Icd ab (T) = T 0 pac(t)pdb(T − t) dt. Sometimes we need sums of expected values: N∆ ab(T) = (c,d)∈∆ E[Ncd 1(X(T) = b)|X(0) = a] = (c,d)∈∆ qcd Icd ab (T) Nts requires ∆ = {(c, d)|c → d by a transition}. Sufficient statistics for CTMCs: Methods, implementations and applications 10/17
  • 11. Background and Motivation Methods Results Methods In Hobolth. A & Jensen J.L. (2010), three methods are presented to evaluate the necessary statistics: 1. eigenvalue decomposition - eigen; 2. uniformization - uni; 3. basic matrix exponentiation - expm. Our aim is to implement them and compare their performance. Sufficient statistics for CTMCs: Methods, implementations and applications 11/17
  • 12. Background and Motivation Methods Results eigen 1. Eigenvalue decomposition Let Q = UΛU−1 Then Icd ab (T) = T 0 pac(t)pdb(T − t) dt = T 0 eQt ac eQ(T−t) db dt = T 0 UeΛt U−1 ac UeΛ(T−t) U−1 db dt Setting Jij (T) = T 0 eλi t eλj (T−t) dt we have N∆(T) = U · J(T) U−1 · ((1∆ Q) · U) · U−1 Sufficient statistics for CTMCs: Methods, implementations and applications 12/17
  • 13. Background and Motivation Methods Results uni 2. Uniformization Let µ = maxi (−qii ) and R = 1 µQ + I Then P(t) = eQt = e(µ(R−I))t = ∞ m=0 Rm (µt)m m! e−µt = ∞ m=0 Rm Pois(m; µt) and N∆(T) = 1 µ ∞ m=0 Pois(m + 1; µT) m l=0 Rl · (1∆ Q) · Rm−l Sufficient statistics for CTMCs: Methods, implementations and applications 13/17
  • 14. Background and Motivation Methods Results uni Truncation at s(λ) 0 50 150 250 0100300 λ s(λ) regression tail 2 4 6 8 10 102030 λs(λ) regression tail Figure: Comparison between the Poission tail as determined by R and the approximation using linear regression, with s(λ) = 4 + 6 √ λ + λ. Sufficient statistics for CTMCs: Methods, implementations and applications 14/17
  • 15. Background and Motivation Methods Results expm 3. Exponentiation Let A = Q 1∆ Q 0 Q ⇒ eAt = F(T) G(T) 0 F(T) We have d dT eAT = AeAT ⇒ F (T) G (T) 0 F (T) = Q 1∆ Q 0 Q F(T) G(T) 0 F(T) ⇒ G(T) = T 0 eQt (1∆ Q) eQ(T−t) dt Then N∆(T) = (eAT )1:n,(n+1):2n For exponentiating a matrix, we used the R package expm. Sufficient statistics for CTMCs: Methods, implementations and applications 15/17
  • 16. Background and Motivation Methods Results Expectation time 2 4 6 8 10 12 14 16 0.01.02.03.0 # sequences sec eigen uni expm Figure: Time comparsion for computing expectations in the EM algorithm Sufficient statistics for CTMCs: Methods, implementations and applications 16/17
  • 17. Background and Motivation Methods Results Summary N∆ (T) accuracy running time pre main eigen U · J(T) U−1 · ((1∆ Q) · U) · U−1 prone to O(n3 ) O(n3 ) large errors uni 1 µ m Pois(m; µT) m−1 l=0 R l · (1∆ Q) · R m−l sum of many O(s(µT)n3 ) O(s(µT)n3 ) small numbers expm (eAT )1:n,(n+1):2n most accurate 1 none O(n3 ) Table: Summary 1 Higham, J.: The Scaling and Squaring Method for the Matrix Exponential Revisited (2003) Moler, C., Van Loan, C.: Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later (2003) Sufficient statistics for CTMCs: Methods, implementations and applications 17/17