Moment closure inference for stochastic kinetic models

My talk from the MBI Workshop on Recent Advances in Statistical Inference for Mathematical Biology 2012

http://www.mbi.osu.edu/2011/rasschedule.html


1. Moment closure inference for stochastic kinetic models
Colin Gillespie, School of Mathematics & Statistics
2. Talk outline
• An introduction to moment closure
• Case study: Aphids
• Conclusion
3. Birth-death process
Birth-death model: $X \to 2X$ and $2X \to X$, which has the propensity functions $\lambda X$ and $\mu X$.
Deterministic representation: the deterministic model is
\[ \frac{dX(t)}{dt} = (\lambda - \mu) X(t), \]
which can be solved to give $X(t) = X(0) \exp[(\lambda - \mu) t]$.
5. Stochastic representation
In the stochastic framework, each reaction has a probability of occurring.
The analogous version of the birth-death process is the difference equation
\[ \frac{dp_n}{dt} = \lambda (n-1) p_{n-1} + \mu (n+1) p_{n+1} - (\lambda + \mu) n p_n \]
Usually called the forward Kolmogorov equation or chemical master equation (CME).
[Figure: population against time.]
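6. Moment equations
Multiply the CME by $e^{n\theta}$ and sum over $n$, to obtain
\[ \frac{\partial M}{\partial t} = [\lambda (e^{\theta} - 1) + \mu (e^{-\theta} - 1)] \frac{\partial M}{\partial \theta} \]
where
\[ M(\theta; t) = \sum_{n=0}^{\infty} e^{n\theta} p_n(t) \]
If we differentiate this p.d.e. w.r.t. $\theta$ and set $\theta = 0$, we get
\[ \frac{dE[N(t)]}{dt} = (\lambda - \mu) E[N(t)] \]
where $E[N(t)]$ is the mean.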
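The differentiation step can be written out explicitly. The shorthand $g(\theta)$ below is introduced only for this note and does not appear on the slides.

```latex
\begin{align*}
g(\theta) &= \lambda (e^{\theta} - 1) + \mu (e^{-\theta} - 1),
  \qquad g(0) = 0, \quad g'(0) = \lambda - \mu, \\
\frac{\partial}{\partial t} \frac{\partial M}{\partial \theta}
  &= g'(\theta) \frac{\partial M}{\partial \theta}
   + g(\theta) \frac{\partial^{2} M}{\partial \theta^{2}}, \\
\left. \frac{\partial M}{\partial \theta} \right|_{\theta = 0}
  &= \sum_{n=0}^{\infty} n \, p_n(t) = E[N(t)]
  \quad \Longrightarrow \quad
  \frac{dE[N(t)]}{dt} = (\lambda - \mu) E[N(t)].
\end{align*}
```

The term in $g(\theta)$ vanishes at $\theta = 0$, which is why the mean equation is closed for this linear model.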
7. The mean equation
\[ \frac{dE[N(t)]}{dt} = (\lambda - \mu) E[N(t)] \]
• This ODE is solvable; the associated forward Kolmogorov equation is also solvable
• The equation for the mean and the deterministic ODE are identical
• When the rate laws are linear, the stochastic mean and the deterministic solution always correspond
8. The variance equation
If we differentiate the p.d.e. w.r.t. $\theta$ twice and set $\theta = 0$, we get
\[ \frac{dE[N(t)^2]}{dt} = (\lambda + \mu) E[N(t)] + 2 (\lambda - \mu) E[N(t)^2] \]
and hence the variance $Var[N(t)] = E[N(t)^2] - E[N(t)]^2$.
Differentiating three times gives an expression for the skewness, etc.
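As a rough numerical check, the first two moment equations of the birth-death process can be integrated and compared with the known closed-form mean and variance. This is a sketch under stated assumptions: it uses the second-moment equation in the form $dE[N^2]/dt = (\lambda + \mu)E[N] + 2(\lambda - \mu)E[N^2]$, which is what differentiating the p.d.e. twice yields, a fixed initial population `n0`, and $\lambda \neq \mu$. The function names and the RK4 integrator are illustrative choices, not taken from the talk.

```python
import math


def moment_rhs(m1, m2, lam, mu):
    """Right-hand sides of the ODEs for E[N] (m1) and E[N^2] (m2)."""
    dm1 = (lam - mu) * m1
    dm2 = (lam + mu) * m1 + 2.0 * (lam - mu) * m2
    return dm1, dm2


def integrate_moments(n0, lam, mu, t_end, steps=2000):
    """Classical RK4 from a known initial population; returns (mean, variance)."""
    h = t_end / steps
    m1, m2 = float(n0), float(n0) ** 2
    for _ in range(steps):
        a = moment_rhs(m1, m2, lam, mu)
        b = moment_rhs(m1 + 0.5 * h * a[0], m2 + 0.5 * h * a[1], lam, mu)
        c = moment_rhs(m1 + 0.5 * h * b[0], m2 + 0.5 * h * b[1], lam, mu)
        d = moment_rhs(m1 + h * c[0], m2 + h * c[1], lam, mu)
        m1 += h * (a[0] + 2 * b[0] + 2 * c[0] + d[0]) / 6.0
        m2 += h * (a[1] + 2 * b[1] + 2 * c[1] + d[1]) / 6.0
    return m1, m2 - m1 ** 2


def exact_mean_var(n0, lam, mu, t):
    """Closed-form mean and variance of the linear birth-death process (lam != mu)."""
    r = lam - mu
    g = math.exp(r * t)
    return n0 * g, n0 * (lam + mu) / r * g * (g - 1.0)


if __name__ == "__main__":
    print(integrate_moments(10, 1.0, 0.5, 2.0))
    print(exact_mean_var(10, 1.0, 0.5, 2.0))
```

Because the rate laws are linear, no closure assumption is needed here: the two ODEs only involve moments up to second order.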
9. Simple dimerisation model
Dimerisation: $2X_1 \to X_2$ and $X_2 \to 2X_1$, with propensities $0.5 k_1 X_1 (X_1 - 1)$ and $k_2 X_2$.
10. Dimerisation moment equations
We formulate the dimer model in terms of moment equations:
\[ \frac{dE[X_1]}{dt} = 0.5 k_1 (E[X_1^2] - E[X_1]) - k_2 E[X_1] \]
\[ \frac{dE[X_1^2]}{dt} = k_1 (E[X_1^2 X_2] - E[X_1 X_2]) + 0.5 k_1 (E[X_1^2] - E[X_1]) + k_2 (E[X_1] - 2 E[X_1^2]) \]
where $E[X_1]$ is the mean of $X_1$ and $E[X_1^2] - E[X_1]^2$ is the variance.
The $i$th moment equation depends on the $(i+1)$th moment.
11. Deterministic approximates stochastic
Rewriting
\[ \frac{dE[X_1]}{dt} = 0.5 k_1 (E[X_1^2] - E[X_1]) - k_2 E[X_1] \]
in terms of its variance, i.e. $E[X_1^2] = Var[X_1] + E[X_1]^2$, we get
\[ \frac{dE[X_1]}{dt} = 0.5 k_1 E[X_1] (E[X_1] - 1) + 0.5 k_1 Var[X_1] - k_2 E[X_1] \qquad (1) \]
• Setting $Var[X_1] = 0$ in (1) recovers the deterministic equation
• So we can consider the deterministic model as an approximation to the stochastic one
• When we have polynomial rate laws, setting the variance to zero results in the deterministic equation
13. Simple dimerisation model
To close the equations, we assume an underlying distribution.
The easiest option is to assume an underlying Normal distribution, i.e.
\[ E[X_1^3] = 3 E[X_1^2] E[X_1] - 2 E[X_1]^3 \]
But we could also use the Poisson,
\[ E[X_1^3] = E[X_1] + 3 E[X_1]^2 + E[X_1]^3 \]
or the Log normal,
\[ E[X_1^3] = \left( \frac{E[X_1^2]}{E[X_1]} \right)^3 \]
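The three closure rules above each express the third raw moment through lower moments. A small sketch, with illustrative function names, checks each rule against a distribution for which the third moment is known exactly. The test values (a Normal with mean 5 and standard deviation 2, a Poisson with mean 4, a Log normal with log-mean 0.3 and log-sd 0.5) are arbitrary choices for the check.

```python
import math


def normal_closure(m1, m2):
    """Third raw moment implied by a Normal with mean m1 and second moment m2."""
    return 3.0 * m2 * m1 - 2.0 * m1 ** 3


def poisson_closure(m1):
    """Third raw moment of a Poisson with mean m1."""
    return m1 + 3.0 * m1 ** 2 + m1 ** 3


def lognormal_closure(m1, m2):
    """Third raw moment implied by a Log normal with moments m1 and m2."""
    return (m2 / m1) ** 3


def lognormal_raw_moment(k, log_mean, log_sd):
    """Exact k-th raw moment of a Log normal: exp(k*mu + k^2*sigma^2/2)."""
    return math.exp(k * log_mean + 0.5 * (k * log_sd) ** 2)


if __name__ == "__main__":
    # Normal(5, sd 2): E[X^2] = 29 and E[X^3] = 5^3 + 3*5*2^2 = 185.
    print(normal_closure(5.0, 29.0))
    # Poisson(4): E[X^3] = 4 + 3*16 + 64 = 116.
    print(poisson_closure(4.0))
    m1, m2, m3 = (lognormal_raw_moment(k, 0.3, 0.5) for k in (1, 2, 3))
    print(lognormal_closure(m1, m2), m3)
```

Each rule is exact for its own distribution; the approximation enters only when the true population distribution differs from the assumed one.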
15. Heat shock model
Proctor et al, 2005. Stochastic kinetic model of the heat shock system:
• twenty-three reactions
• seventeen chemical species
A single stochastic simulation up to t = 2000 takes about 35 minutes.
If we convert the model to moment equations, we get 139 equations.
[Figure: ADP and Native Protein populations against time, t = 0 to 2000. Gillespie, CS, 2009.]
16. Density plots: heat shock model
[Figure: density of the ADP population at time t = 200 and time t = 2000.]
17. P53-Mdm2 oscillation model
Proctor and Gray, 2008:
• 16 chemical species
• Around a dozen reactions
• The model contains an event: at t = 1, set X = 0
If we convert the model to moment equations, we get 139 equations.
However, in this case the moment closure approximation doesn't do too well!
[Figure: population against time, t = 0 to 30.]
20. What went wrong?
• Moment closure tends to fail when there is a large difference between the deterministic and stochastic formulations
• In this particular case, the species are strongly correlated
• Typically, when the MC approximation fails, it gives a negative variance
• The MC approximation does work well for other parameter values of the p53 model
21. Part II: Cotton aphids
22. Cotton aphids
Aphid infestation (G & Golightly, 2010)
A cotton aphid infestation of a cotton plant can result in:
• leaves that curl and pucker
• seedling plants that become stunted and may die
• stained cotton, in the case of a late season infestation
Cotton aphids have developed resistance to many chemical treatments and so can be difficult to treat.
Basically, it costs someone a lot of money.
24. Cotton aphids
The data consists of:
• five observations at each plot
• sampling times t = 0, 1.14, 2.29, 3.57 and 4.57 weeks (i.e. every 7 to 8 days)
• three blocks, each being in a distinct area
• three irrigation treatments (low, medium and high)
• three nitrogen levels (blanket, variable and none)
26. The data
[Figure: number of aphids against time in a grid of panels, with columns Zero, Variable and Block and rows Low, Medium and High.]
27. Some notation
Let
• $n(t)$ be the size of the aphid population at time $t$
• $c(t)$ be the cumulative aphid population at time $t$
1. We observe $n(t)$ at discrete time points
2. We don't observe $c(t)$
3. $c(t) \geq n(t)$
28. The model
We assume, based on previous modelling (Matis et al., 2004):
• An aphid birth rate of $\lambda n(t)$
• An aphid death rate of $\mu n(t) c(t)$
• So extinction is certain, as eventually $\mu n c > \lambda n$ for large $t$
29. The model
Deterministic representation
Previous modelling efforts have focused on deterministic models:
\[ \frac{dN(t)}{dt} = \lambda N(t) - \mu C(t) N(t) \]
\[ \frac{dC(t)}{dt} = \lambda N(t) \]
Some problems:
• Initial and final aphid populations are quite small
• No allowance for 'natural' random variation
• Solution: use a stochastic model
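The deterministic model can be explored numerically with a few lines of code. The sketch below is an illustration, not the code used for the talk: it integrates the two ODEs with a hand-written RK4 scheme and checks a conserved quantity. Dividing the two equations gives $dN/dC = 1 - (\mu/\lambda)C$, so $N - C + \frac{\mu}{2\lambda}C^2$ stays constant along a solution and $N$ peaks when $C = \lambda/\mu$. The parameter values in the example ($\lambda = 1.7$, $\mu = 0.001$, unit initial populations) are assumptions chosen for illustration.

```python
def aphid_ode(n, c, lam, mu):
    """Right-hand sides dN/dt and dC/dt of the deterministic aphid model."""
    return lam * n - mu * c * n, lam * n


def solve_aphid_ode(n0, c0, lam, mu, t_end, steps):
    """Classical RK4; returns the list of (N, C) states, including the start."""
    h = t_end / steps
    n, c = float(n0), float(c0)
    path = [(n, c)]
    for _ in range(steps):
        a = aphid_ode(n, c, lam, mu)
        b = aphid_ode(n + 0.5 * h * a[0], c + 0.5 * h * a[1], lam, mu)
        g = aphid_ode(n + 0.5 * h * b[0], c + 0.5 * h * b[1], lam, mu)
        d = aphid_ode(n + h * g[0], c + h * g[1], lam, mu)
        n += h * (a[0] + 2 * b[0] + 2 * g[0] + d[0]) / 6.0
        c += h * (a[1] + 2 * b[1] + 2 * g[1] + d[1]) / 6.0
        path.append((n, c))
    return path


def invariant(n, c, lam, mu):
    """Quantity that is constant along any solution of the ODE system."""
    return n - c + mu / (2.0 * lam) * c * c


if __name__ == "__main__":
    path = solve_aphid_ode(1.0, 1.0, 1.7, 0.001, 10.0, 10000)
    peak_n, c_at_peak = max(path)
    print("peak population:", peak_n, "cumulative size at peak:", c_at_peak)
```

The population rises, peaks and then declines towards zero, which is the deterministic counterpart of certain extinction in the stochastic model.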
31. The model
Stochastic representation
Let $p_{n,c}(t)$ denote the probability that, at time $t$, there are
• $n$ aphids in the population
• a cumulative population size of $c$
This gives the forward Kolmogorov equation
\[ \frac{dp_{n,c}(t)}{dt} = \lambda (n-1) p_{n-1,c-1}(t) + \mu c (n+1) p_{n+1,c}(t) - n (\lambda + \mu c) p_{n,c}(t) \]
Even though this equation is fairly simple, it still can't be solved exactly.
32. Some simulations
[Figure: simulated aphid population against time (days).]
Parameters: $n(0) = c(0) = 1$, $\lambda = 1.7$ and $\mu = 0.001$
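Realisations like these can be generated with Gillespie's direct method. The following is a minimal sketch, assuming a birth at rate $\lambda n$ (which increments both $n$ and $c$) and a death at rate $\mu n c$ (which decrements $n$ only). The function name, the use of Python's `random` module and the seed are illustrative choices, not the simulator used for the talk.

```python
import random


def simulate_aphids(n0, c0, lam, mu, t_end, rng):
    """One exact realisation of the aphid model; returns [(time, n, c), ...]."""
    t, n, c = 0.0, n0, c0
    path = [(t, n, c)]
    while n > 0:  # once n hits zero both rates vanish: extinction
        birth = lam * n
        death = mu * n * c
        total = birth + death
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * total < birth:
            n += 1
            c += 1  # every birth also adds to the cumulative population
        else:
            n -= 1
        path.append((t, n, c))
    return path


if __name__ == "__main__":
    rng = random.Random(2012)  # arbitrary seed, for reproducibility only
    path = simulate_aphids(1, 1, 1.7, 0.001, 10.0, rng)
    print("events:", len(path) - 1, "final state:", path[-1])
```

Starting from a single aphid, some runs die out almost immediately, which is one reason the stochastic and deterministic pictures differ for small populations.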
35. Stochastic parameter estimation
Let $X(t_u) = (n(t_u), c(t_u))$ be the vector of observed aphid counts and unobserved cumulative population size at time $t_u$.
To infer $\lambda$ and $\mu$, we need to estimate
\[ \Pr[X(t_u) \mid X(t_{u-1}), \lambda, \mu] \]
i.e. the solution of the forward Kolmogorov equation.
We will use moment closure to estimate this distribution.
37. Moment equations for the means
\[ \frac{dE[n(t)]}{dt} = \lambda E[n(t)] - \mu \left( E[n(t)] E[c(t)] + Cov[n(t), c(t)] \right) \]
\[ \frac{dE[c(t)]}{dt} = \lambda E[n(t)] \]
• The equation for $E[n(t)]$ depends on $Cov[n(t), c(t)]$
• Setting $Cov[n(t), c(t)] = 0$ gives the deterministic model
• We obtain similar equations for higher-order moments
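To make the closed system concrete, here is a sketch that integrates five ODEs for the means, variances and covariance of $(n, c)$. The two mean equations are those on the slide. The three second-order equations are not shown in the talk: they are an independent derivation from the birth rate $\lambda n$ and death rate $\mu n c$, with third central moments set to zero (a Normal closure), so treat them as an assumption. All function names are hypothetical.

```python
import math


def closure_rhs(y, lam, mu):
    """ODEs for (E[n], E[c], Var[n], Var[c], Cov[n, c]) under Normal closure."""
    m, k, vn, vc, cov = y
    dm = lam * m - mu * (m * k + cov)
    dk = lam * m
    dvn = (lam * m + mu * (m * k + cov)
           + 2.0 * (lam - mu * k) * vn - 2.0 * mu * m * cov)
    dvc = lam * m + 2.0 * lam * cov
    dcov = lam * (m + vn + cov) - mu * (k * cov + m * vc)
    return (dm, dk, dvn, dvc, dcov)


def rk4_step(y, h, lam, mu):
    """One classical RK4 step for the five-dimensional moment system."""
    def shift(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = closure_rhs(y, lam, mu)
    k2 = closure_rhs(shift(y, k1, 0.5 * h), lam, mu)
    k3 = closure_rhs(shift(y, k2, 0.5 * h), lam, mu)
    k4 = closure_rhs(shift(y, k3, h), lam, mu)
    return tuple(v + h * (a + 2 * b + 2 * c + d) / 6.0
                 for v, a, b, c, d in zip(y, k1, k2, k3, k4))


def closure_moments(n0, c0, lam, mu, dt, steps=2000):
    """Moments after time dt, starting from a known state (zero variance)."""
    y = (float(n0), float(c0), 0.0, 0.0, 0.0)
    h = dt / steps
    for _ in range(steps):
        y = rk4_step(y, h, lam, mu)
    return y


if __name__ == "__main__":
    # Illustrative values only: one sampling interval of 1.14 time units.
    print(closure_moments(50, 50, 1.75, 0.00095, 1.14))
```

With $\mu = 0$ the model reduces to a pure birth process, for which the mean $n_0 e^{\lambda t}$ and variance $n_0 e^{\lambda t}(e^{\lambda t} - 1)$ are known, giving a simple check of the implementation.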
39. Parameter inference
Given
• the parameters: $\{\lambda, \mu\}$
• the initial states: $X(t_{u-1}) = (n(t_{u-1}), c(t_{u-1}))$
we have
\[ X(t_u) \mid X(t_{u-1}), \lambda, \mu \sim N(\psi_{u-1}, \Sigma_{u-1}) \]
where $\psi_{u-1}$ and $\Sigma_{u-1}$ are calculated using the moment closure approximation.
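Evaluating this Gaussian transition density only needs a bivariate Normal log-density. A minimal sketch follows, with a hypothetical function name and made-up numbers for the mean and covariance; in practice these would come from the moment closure ODEs. It raises an error when the covariance matrix is not positive definite, since a failing closure can return a negative variance.

```python
import math


def bvn_logpdf(x, mean, cov):
    """Log-density of a bivariate Normal at x = (n, c).

    cov is ((var_n, cov_nc), (cov_nc, var_c)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix is not positive definite")
    d0 = x[0] - mean[0]
    d1 = x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean) for a 2x2 matrix.
    quad = (d * d0 * d0 - (b + c) * d0 * d1 + a * d1 * d1) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


if __name__ == "__main__":
    # Hypothetical closure output for one sampling interval.
    psi = (300.0, 340.0)
    sigma = ((2500.0, 2400.0), (2400.0, 2600.0))
    print(bvn_logpdf((320.0, 355.0), psi, sigma))
```

Summing such terms over the sampling intervals gives the log of the product in the likelihood.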
40. Parameter inference
Summarising our beliefs about $\{\lambda, \mu\}$ and the unobserved cumulative population $c(t_0)$ via priors $p(\lambda, \mu)$ and $p(c(t_0))$, the joint posterior for parameters and unobserved states (for a single data set) is
\[ p(\lambda, \mu, c \mid n) \propto p(\lambda, \mu) \, p(c(t_0)) \prod_{u=1}^{4} p(x(t_u) \mid x(t_{u-1}), \lambda, \mu) \]
For the results shown, we used a simple random walk MH step to explore the parameter and state spaces.
For more complicated models, we can use a Durham & Gallant style bridge (Milner, G & Wilkinson, 2012).
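For readers unfamiliar with the random walk Metropolis-Hastings step, here is a generic sketch. It is not the sampler used for the aphid analysis: the target below is a toy standard bivariate Normal standing in for the real log-posterior, and the function name, step sizes and seed are all illustrative assumptions.

```python
import math
import random


def rw_metropolis(log_post, start, step, n_iter, rng, thin=1):
    """Random walk MH with independent Gaussian proposals.

    log_post maps a list of floats to an unnormalised log-density; returning
    -inf for values outside the support causes the proposal to be rejected.
    Every `thin`-th state is stored.
    """
    current = list(start)
    current_lp = log_post(current)
    draws = []
    for i in range(n_iter):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(current, step)]
        proposal_lp = log_post(proposal)
        # Symmetric proposal, so accept with probability min(1, ratio).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        if (i + 1) % thin == 0:
            draws.append(list(current))
    return draws


def toy_log_post(theta):
    """Stand-in target: standard bivariate Normal, up to a constant."""
    return -0.5 * (theta[0] ** 2 + theta[1] ** 2)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = rw_metropolis(toy_log_post, [0.0, 0.0], [1.0, 1.0], 20000, rng, thin=10)
    print(len(draws), sum(d[0] for d in draws) / len(draws))
```

In the aphid setting, `log_post` would combine the log-priors with the sum of Gaussian transition log-densities, and the state vector would include the latent cumulative populations as well as the rate parameters.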
42. Simulation study
• Three treatments & two blocks
• Baseline birth and death rates: $\{\lambda = 1.75, \mu = 0.00095\}$
• Treatment 2 increases $\mu$ by 0.0004
• Treatment 3 increases $\lambda$ by 0.35
• The block effect reduces $\mu$ by 0.0003

          Treatment 1        Treatment 2        Treatment 3
Block 1   {1.75, 0.00095}    {1.75, 0.00135}    {2.1, 0.00095}
Block 2   {1.75, 0.00065}    {1.75, 0.00105}    {2.1, 0.00065}
46. Simulated data
[Figure: simulated population against time in panels Treatment 1, Treatment 2 and Treatment 3, with points distinguished by block (1 or 2).]
47. Parameter structure
Let $i, k$ represent the block and treatment levels, $i \in \{1, 2\}$ and $k \in \{1, 2, 3\}$.
For each data set, we assume birth rates of the form
\[ \lambda_{ik} = \lambda + \alpha_i + \beta_k \]
where $\alpha_1 = \beta_1 = 0$.
So for block 1, treatment 1 we have $\lambda_{11} = \lambda$, and for block 2, treatment 1 we have $\lambda_{21} = \lambda + \alpha_2$.
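The additive structure is easy to express in code. The sketch below rebuilds the six parameter pairs of the simulation study from the baseline rates and the stated effects. It assumes the death rates follow the same additive form as the birth rates; the dictionary layout and names are illustrative.

```python
def rate(base, block_effect, treatment_effect, i, k):
    """Additive rate for block i and treatment k: base + alpha_i + beta_k."""
    return base + block_effect[i] + treatment_effect[k]


# Effects from the simulation study; level 1 is the reference (effect 0).
LAMBDA_BASE, MU_BASE = 1.75, 0.00095
lam_block = {1: 0.0, 2: 0.0}
lam_treat = {1: 0.0, 2: 0.0, 3: 0.35}
mu_block = {1: 0.0, 2: -0.0003}
mu_treat = {1: 0.0, 2: 0.0004, 3: 0.0}

# Rounded to five decimal places to hide floating-point noise.
table = {
    (i, k): (
        round(rate(LAMBDA_BASE, lam_block, lam_treat, i, k), 5),
        round(rate(MU_BASE, mu_block, mu_treat, i, k), 5),
    )
    for i in (1, 2)
    for k in (1, 2, 3)
}

if __name__ == "__main__":
    for key in sorted(table):
        print("block %d, treatment %d:" % key, table[key])
```

Interaction terms, as used for the real data, would add a further effect indexed by the pair of levels.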
48. MCMC scheme
• Using the MCMC scheme described previously, we generated 2M iterates and thinned by 1K
• This took a few hours and convergence was fairly quick
• We used independent proper uniform priors for the parameters
• For the initial unobserved cumulative population, we had
\[ c(t_0) = n(t_0) + \epsilon \]
where $\epsilon$ has a Gamma distribution with shape 1 and scale 10
• This set up mirrors the scheme that we used for the real data set
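Two pieces of this set up are simple enough to sketch directly: drawing the initial cumulative population as the observed count plus a Gamma(shape 1, scale 10) increment, and thinning a chain by keeping every thousandth iterate. The function names and the observed count of 28 are hypothetical. Note that `random.gammavariate(alpha, beta)` takes the shape first and the scale second.

```python
import random


def sample_initial_cumulative(n0, rng, shape=1.0, scale=10.0):
    """Draw c(t0) = n(t0) + eps with eps ~ Gamma(shape, scale)."""
    return n0 + rng.gammavariate(shape, scale)


def thin(chain, every):
    """Keep every `every`-th element of a chain (the 1st kept is index every-1)."""
    return chain[every - 1::every]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_initial_cumulative(28, rng))
    # Two million iterates thinned by one thousand leaves 2000 draws.
    print(len(thin(range(2_000_000), 1000)))
```

With shape 1 the Gamma distribution is an exponential with mean 10, so the prior keeps $c(t_0)$ at or above $n(t_0)$ while favouring values close to it.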
49. Marginal posterior distributions for $\lambda$ and $\mu$
[Figure: marginal posterior densities for the birth rate and the death rate, each with a value marked X on the horizontal axis.]
50. Marginal posterior distributions for birth rates
[Figure: posterior densities in panels Block 2, Treatment 2 and Treatment 3; horizontal axis: birth rate.]
We obtained similar densities for the death rates.
51. Application to the cotton aphid data set
Recall that the data consists of:
• five observations on twenty randomly chosen leaves in each plot;
• three blocks, each being in a distinct area;
• three irrigation treatments (low, medium and high);
• three nitrogen levels (blanket, variable and none);
• the sampling times are t = 0, 1.14, 2.29, 3.57 and 4.57 weeks (i.e. every 7 to 8 days).
Following in the same vein as the simulated data, we are estimating 38 parameters (including interaction terms) and the latent cumulative aphid population.
52. Cotton aphid data: marginal posterior distributions
[Figure: marginal posterior densities for the birth rate and the death rate.]
53. Does the model fit the data?
We simulate predictive distributions from the MCMC output, i.e. we randomly sample parameter values $(\lambda, \mu)$ and the unobserved state $c$ and simulate forward.
We simulate forward using the Gillespie simulator, not the moment closure approximation.
54. Does the model fit the data?
Predictive distributions for 6 of the 27 aphid data sets.
[Figure: aphid population against time (1.14, 2.29, 3.57 and 4.57) in panels D 123, D 121, D 131, D 112, D 122 and D 113.]
55. Summarising the results
• Consider the additional number of aphids per treatment combination
• Set $c(0) = n(0) = 1$ and $t_{max} = 6$
• We now calculate the number of aphids we would see for each parameter combination in addition to the baseline
For example, the effect due to medium water:
\[ \lambda_{211} = \lambda + \alpha_{Water(M)} \quad \text{and} \quad \mu_{211} = \mu + \alpha^{*}_{Water(M)} \]
So
\[ \text{Additional aphids} = c^{i}_{Water(M)} - c^{i}_{baseline} \]
56. Aphids over baseline: main effects
[Figure: densities of the number of aphids over baseline (0 to 10000) in panels Nitrogen (V), Water (H), Water (M), Block 3, Block 2 and Nitrogen (Z).]
57. Aphids over baseline: interactions
[Figure: densities of the number of aphids over baseline (0 to 10000) in panels W(H) N(Z), W(M) N(Z), W(H) N(V), W(M) N(V), B3 W(H), B2 W(H), B3 W(M), B2 W(M), B3 N(Z), B2 N(Z), B3 N(V) and B2 N(V).]
58. Conclusions
• The 95% credible intervals for the baseline birth and death rates are (1.64, 1.86) and (0.00090, 0.00099)
• Main effects have little effect by themselves
• However, block 2 appears to have a very strong interaction with nitrogen
• Moment closure parameter inference is a very useful technique for estimating parameters in stochastic population models
59. Future work
Aphid model
• Other data sets suggest that there is aphid immigration in the early stages
• Model selection for stochastic models
• Incorporate measurement error
Moment closure
• Better closure techniques
• Assessing the fit
60. Acknowledgements
• Andrew Golightly
• Richard Boys
• Peter Milner
• Darren Wilkinson
• Jim Matis (Texas A & M)
References
• Gillespie, CS. Moment closure approximations for mass-action models. IET Systems Biology, 2009.
• Gillespie, CS, Golightly, A. Bayesian inference for generalized stochastic population growth models with application to aphids. Journal of the Royal Statistical Society, Series C, 2010.
• Milner, P, Gillespie, CS, Wilkinson, DJ. Moment closure approximations for stochastic kinetic models with rational rate laws. Mathematical Biosciences, 2011.
• Milner, P, Gillespie, CS, Wilkinson, DJ. Moment closure based parameter inference of stochastic kinetic models. Statistics and Computing, 2012.