SlideShare a Scribd company logo
1 of 30
Download to read offline
Paula Tataru
Conditional expectations
of sufficient statistics
for continuous time Markov chains
Joint work with Asger Hobolth
November 28, 2012
Motivation
Methods
Results
Paula
Tataru
2/30
Conditional
expectations
of sufficient
statistics
for CTMCs
Continuous-time Markov chains
A stochastic process {X(t) | t ≥ 0}
that fulfills the Markov property
Motivation
Methods
Results
Paula
Tataru
3/30
Conditional
expectations
of sufficient
statistics
for CTMCs
Continuous-time Markov chains
• Credit risk
• States: different types of ratings
• Chemical reactions
• States: different states of a molecule
• DNA data
• States:
• Nucleotides
• Amino acids
• Codons
Motivation
Methods
Results
Paula
Tataru
4/30
Conditional
expectations
of sufficient
statistics
for CTMCs
Modeling DNA data
1st 2nd base 3rd
base T C A G base
T
TTT
Phe
TCT
Ser
TAT
Tyr
TGT
Cys
T
TTC TCC TAC TGC C
TTA
Leu
TCA TAA Stop TGA Stop A
TTG TCG TAG Stop TGG Trp G
C
CTT CCT
Pro
CAT
His
CGT
Arg
T
CTC CCC CAC CGC C
CTA CCA CAA
Gln
CGA A
CTG CCG CAG CGG G
A
ATT
Ile
ACT
Thr
AAT
Asn
AGT
Ser
T
ATC ACC AAC AGC C
ATA ACA AAA
Lys
AGA
Arg
A
ATG Met ACG AAG AGG G
G
GTT
Val
GCT
Ala
GAT
Asp
GGT
Gly
T
GTC GCC GAC GGC C
GTA GCA GAA
Glu
GGA A
GTG GCG GAG GGG G
Motivation
Methods
Results
Paula
Tataru
5/30
Conditional
expectations
of sufficient
statistics
for CTMCs
Modeling DNA data
• Nucleotides (n = 4)
• Amino acids (n = 20)
• Codons (n = 64)
Motivation
Methods
Results
Paula
Tataru
6/30
Conditional
expectations
of sufficient
statistics
for CTMCs
Continuous-time Markov chains
• Characterized by a rate matrix
with properties
Motivation
Methods
Results
Paula
Tataru
7/30
Conditional
expectations
of sufficient
statistics
for CTMCs
Modeling DNA data
• Nucleotides (n = 4)
• Jukes Cantor (JC)
Motivation
Methods
Results
Paula
Tataru
8/30
Conditional
expectations
of sufficient
statistics
for CTMCs
Continuous-time Markov chains
• Waiting time is exponentially distributed
• Jumps are discretely distributed
Motivation
Methods
Results
Paula
Tataru
9/30
Conditional
expectations
of sufficient
statistics
for CTMCs
General EM
Motivation
Methods
Results
Paula
Tataru
10/30
Conditional
expectations
of sufficient
statistics
for CTMCs
General EM
• Full log likelihood
• E-step
• M-step
Motivation
Methods
Results
Paula
Tataru
11/30
Conditional
expectations
of sufficient
statistics
for CTMCs
EM for Jukes Cantor
• Full log likelihood
• M-step
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
12/30
Calculating expectations
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
13/30
Linear combinations
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
14/30
Linear combinations
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
15/30
Linear combinations
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
16/30
Methods
• In Hobolth&Jensen (2010), three methods are
presented to evaluate the necessary statistics
• Eigenvalue decomposition (EVD)
• Uniformization (UNI)
• Exponentiation (EXPM)
• Extend efficiently to linear combinations
• Which method is more accurate?
• Which method is faster?
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
17/30
Eigenvalue decomposition
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
18/30
Uniformization
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
19/30
Uniformization
• Choose truncation point s(μt)
• Bound error using the tail of Pois(m+1; μt)
• s(μt) = 4 + 6 √μt + μt
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
20/30
Exponentiation
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
21/30
Algorithms
EVD UNI EXPM*
1 A
Order
2 J(t)
Order
3 (C; t)Σ (C; t)Σ (C; t)Σ
Order
Influenced by Q
, U, UΛ -1
μ, s(μt), R
O(n3
) O(n2
) O(n2
)
Rm
eAt
O(n2
) O(s(μt)n3
) O(n3
)
O(n3
) O(s(μt)n2
) O(n2
)
μt —
* expm package in R
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
22/30
Accuracy
• Use two models for which ∑(C; t) is available in
analytical form
• JC, varying n
• HKY, varying t
• Measure accuracy as the normalized difference:
approximation / true - 1
• as a function of n
• as a function of t
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
23/30
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
24/30
Speed
EVD UNI EXPM*
1 A
Order
2 J(t)
Order
3 (C; t)Σ (C; t)Σ (C; t)Σ
Order
Influenced by Q
, U, UΛ -1
μ, s(μt), R
O(n3
) O(n2
) O(n2
)
Rm
eAt
O(n2
) O(s(μt)n3
) O(n3
)
O(n3
) O(s(μt)n2
) O(n2
)
μt —
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
25/30
Speed
• Partition of computation
• Precomputation
• EVD
• UNI
• Main computation
• Use three models and different time points to
asses the speed
• GY, n = 61
• GTR, n = 4
• UNR, n = 4
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
26/30
Speed
• 10 equidistant time points
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
27/30
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
28/30
Results
• Accuracy
• Similar accuracy
• EXPM is the most accurate one
• Speed
• EXPM is the slowest
• EVD vs UNI
Motivation
Methods
Results
Paula
Tataru
Conditional
expectations
of sufficient
statistics
for CTMCs
29/30
Results
• UNI has potential
• Better cut off point s(μt)
• Ameliorate effect of μt by
• Adaptive uniformization
• On-the-fly variant of adaptive uniformization
Thank you!
Comparison of methods for calculating
conditional expectations of sufficient statistics
for continuous time Markov chains
BMC Bioinformatics 12(1):465, 2011

More Related Content

Similar to Calculating Expectations for CTMCs

Computational Intelligence for Time Series Prediction
Computational Intelligence for Time Series PredictionComputational Intelligence for Time Series Prediction
Computational Intelligence for Time Series PredictionGianluca Bontempi
 
Signal fundamentals
Signal fundamentalsSignal fundamentals
Signal fundamentalsLalit Kanoje
 
rbs - presentation about applications of machine learning.
rbs - presentation about applications of machine learning.rbs - presentation about applications of machine learning.
rbs - presentation about applications of machine learning.ChellamuthuMech
 
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series DataToeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series DataDaiki Tanaka
 
LSSC2011 Optimization of intermolecular interaction potential energy paramete...
LSSC2011 Optimization of intermolecular interaction potential energy paramete...LSSC2011 Optimization of intermolecular interaction potential energy paramete...
LSSC2011 Optimization of intermolecular interaction potential energy paramete...Dragan Sahpaski
 
Continuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event SystemsContinuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event Systemsahmad bassiouny
 

Similar to Calculating Expectations for CTMCs (7)

Computational Intelligence for Time Series Prediction
Computational Intelligence for Time Series PredictionComputational Intelligence for Time Series Prediction
Computational Intelligence for Time Series Prediction
 
Signal fundamentals
Signal fundamentalsSignal fundamentals
Signal fundamentals
 
rbs - presentation about applications of machine learning.
rbs - presentation about applications of machine learning.rbs - presentation about applications of machine learning.
rbs - presentation about applications of machine learning.
 
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series DataToeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
 
LSSC2011 Optimization of intermolecular interaction potential energy paramete...
LSSC2011 Optimization of intermolecular interaction potential energy paramete...LSSC2011 Optimization of intermolecular interaction potential energy paramete...
LSSC2011 Optimization of intermolecular interaction potential energy paramete...
 
presentation_btp
presentation_btppresentation_btp
presentation_btp
 
Continuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event SystemsContinuous Systems To Discrete Event Systems
Continuous Systems To Discrete Event Systems
 

More from Paula Tataru

PaulaTataru_PhD_defense
PaulaTataru_PhD_defensePaulaTataru_PhD_defense
PaulaTataru_PhD_defensePaula Tataru
 
TreeOfLife-jeopardy-2014
TreeOfLife-jeopardy-2014TreeOfLife-jeopardy-2014
TreeOfLife-jeopardy-2014Paula Tataru
 
AB-RNA-Mfold&SCFGs-2011
AB-RNA-Mfold&SCFGs-2011AB-RNA-Mfold&SCFGs-2011
AB-RNA-Mfold&SCFGs-2011Paula Tataru
 
AB-RNA-comparison-2011
AB-RNA-comparison-2011AB-RNA-comparison-2011
AB-RNA-comparison-2011Paula Tataru
 
AB-RNA-alignments-2011
AB-RNA-alignments-2011AB-RNA-alignments-2011
AB-RNA-alignments-2011Paula Tataru
 
AB-RNA-Nussinov-2011
AB-RNA-Nussinov-2011AB-RNA-Nussinov-2011
AB-RNA-Nussinov-2011Paula Tataru
 
AB-RNA-SCFGdesign=2010
AB-RNA-SCFGdesign=2010AB-RNA-SCFGdesign=2010
AB-RNA-SCFGdesign=2010Paula Tataru
 
AB-RNA-alignments-2010
AB-RNA-alignments-2010AB-RNA-alignments-2010
AB-RNA-alignments-2010Paula Tataru
 

More from Paula Tataru (18)

PhDretreat2014
PhDretreat2014PhDretreat2014
PhDretreat2014
 
PaulaTataru_PhD_defense
PaulaTataru_PhD_defensePaulaTataru_PhD_defense
PaulaTataru_PhD_defense
 
birc-csd2012
birc-csd2012birc-csd2012
birc-csd2012
 
TreeOfLife-jeopardy-2014
TreeOfLife-jeopardy-2014TreeOfLife-jeopardy-2014
TreeOfLife-jeopardy-2014
 
AB-RNA-Mfold&SCFGs-2011
AB-RNA-Mfold&SCFGs-2011AB-RNA-Mfold&SCFGs-2011
AB-RNA-Mfold&SCFGs-2011
 
AB-RNA-comparison-2011
AB-RNA-comparison-2011AB-RNA-comparison-2011
AB-RNA-comparison-2011
 
AB-RNA-alignments-2011
AB-RNA-alignments-2011AB-RNA-alignments-2011
AB-RNA-alignments-2011
 
AB-RNA-Nussinov-2011
AB-RNA-Nussinov-2011AB-RNA-Nussinov-2011
AB-RNA-Nussinov-2011
 
AB-RNA-SCFGdesign=2010
AB-RNA-SCFGdesign=2010AB-RNA-SCFGdesign=2010
AB-RNA-SCFGdesign=2010
 
AB-RNA-SCFG-2010
AB-RNA-SCFG-2010AB-RNA-SCFG-2010
AB-RNA-SCFG-2010
 
AB-RNA-alignments-2010
AB-RNA-alignments-2010AB-RNA-alignments-2010
AB-RNA-alignments-2010
 
AB-RNA-Nus-2010
AB-RNA-Nus-2010AB-RNA-Nus-2010
AB-RNA-Nus-2010
 
PaulaTataruVienna
PaulaTataruViennaPaulaTataruVienna
PaulaTataruVienna
 
PaulaTataruCSHL
PaulaTataruCSHLPaulaTataruCSHL
PaulaTataruCSHL
 
PaulaTataruAarhus
PaulaTataruAarhusPaulaTataruAarhus
PaulaTataruAarhus
 
mgsa_poster
mgsa_postermgsa_poster
mgsa_poster
 
PaulaTataruOxford
PaulaTataruOxfordPaulaTataruOxford
PaulaTataruOxford
 
Mols_August2013
Mols_August2013Mols_August2013
Mols_August2013
 

Calculating Expectations for CTMCs