Considerate Approaches to ABC Model Selection

Talk given at ISBA 2012 in the Approximate Bayesian Computation Special Topic Session

Published in: Technology, Business

Considerate Approaches to ABC Model Selection

  1. 1. Considerate Approaches to ABC Model Selection
     Michael P.H. Stumpf, Christopher Barnes, Sarah Filippi, Thomas Thorne
     Theoretical Systems Biology Group, 26/06/2012
     Considerate Approaches to ABC Model Selection, Stumpf et al. (1 of 15)
  2. 2. Evolving Networks (Model Selection, 2 of 15)
     [Figure: four models of network evolution: (a) duplication attachment; (b) duplication attachment with complementarity; (c) linear preferential attachment; (d) general scale-free attachment.]
  7. 7. Inference and Model Selection (Model Selection, 3 of 15)
     We have observed data, $D$, that were generated by some system that we seek to describe by a mathematical model. In principle we can have a model-set, $M = \{M_1, \ldots, M_\nu\}$, where each model $M_i$ has an associated parameter $\theta_i$.
     We may know the different constituent parts of the system, $X_i$, and have measurements for some or all of them under some experimental designs, $T$.
     Model posterior (likelihood times prior, normalised by the evidence):
     $$\Pr(M_i \mid T, D) = \frac{\Pr(D \mid M_i, T)\,\pi(M_i)}{\sum_{j=1}^{\nu} \Pr(D \mid M_j, T)\,\pi(M_j)}$$
     For complicated models and/or detailed data the likelihood evaluation can become prohibitively expensive.
     Approximate Inference: We can approximate the likelihood and/or the models. The “true” model is unlikely to be in $M$ anyway.
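As a rough illustration of the model posterior on this slide, the following Python sketch normalises per-model evidences against a model prior. The function name `model_posterior`, the uniform default prior and the example numbers are assumptions made for illustration; the talk does not prescribe an implementation.

```python
import numpy as np

def model_posterior(log_evidence, prior=None):
    """Return Pr(M_i | T, D) from log Pr(D | M_i, T) and the model prior pi(M_i)."""
    log_evidence = np.asarray(log_evidence, dtype=float)
    if prior is None:
        # Assumption: a uniform prior over the nu candidate models.
        prior = np.full(log_evidence.shape, 1.0 / log_evidence.size)
    log_joint = log_evidence + np.log(np.asarray(prior, dtype=float))
    log_joint -= log_joint.max()    # subtract the maximum for numerical stability
    weights = np.exp(log_joint)
    return weights / weights.sum()  # the sum plays the role of the evidence

# Hypothetical evidences in ratio 3 : 1 for two models with equal prior weight.
posterior = model_posterior(np.log([3.0, 1.0]))
```

Working in log space is a design choice: evidences for detailed data are often far too small to represent directly as floating-point numbers.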
  10. 10. Approximate Bayesian Computation (Approximate Bayesian Computation, 4 of 15)
      We can define the posterior as
      $$p(\theta_i \mid x) = \frac{f(x \mid \theta_i)\,\pi(\theta_i)}{p(x)}$$
      Here $f(x \mid \theta_i)$ is the likelihood, which is often hard to evaluate; consider for example $\tilde{y} = \max[0,\, y + g_1 + y \times g_2]$ with $g_1, g_2 \sim N(0, \sigma_{1/2})$ and $\frac{dy}{dt} = g(y; \theta)$.
      But we can still simulate from the data-generating model, whence
      $$p(\theta_i \mid x) = \int_X \frac{\mathbb{1}(y = x)\, f(y \mid \theta_i)\,\pi(\theta_i)}{p(x)}\,dy \approx \int_X \frac{\mathbb{1}(\Delta(y, x) < \epsilon)\, f(y \mid \theta_i)\,\pi(\theta_i)}{p(x)}\,dy$$
      Solutions for Complex Problems (?): Approximate (i) data, (ii) model or (iii) distance.
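The approximation on this slide can be sketched as a rejection sampler: draw a parameter from the prior, simulate data, and keep the draw if the simulated data lie within a tolerance of the observations. The sketch below is a minimal illustration, not the authors' code; the toy normal model, the uniform prior, the sorted-sample distance and all names (`abc_rejection`, `simulate`, `sample_prior`) are assumptions.

```python
import numpy as np

def abc_rejection(observed, simulate, sample_prior, distance, eps, n_draws, rng):
    """Keep prior draws whose simulated data fall within eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)            # theta ~ pi(theta)
        y = simulate(theta, rng)             # y ~ f(y | theta)
        if distance(y, observed) < eps:      # indicator 1(Delta(y, x) < eps)
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
observed = rng.normal(1.0, 1.0, size=5)      # toy data with true mean 1

accepted = abc_rejection(
    observed,
    simulate=lambda theta, r: r.normal(theta, 1.0, size=observed.size),
    sample_prior=lambda r: r.uniform(-5.0, 5.0),
    # Assumption: compare sorted samples, since the observations are exchangeable.
    distance=lambda y, x: np.sqrt(np.mean((np.sort(y) - np.sort(x)) ** 2)),
    eps=1.0,
    n_draws=20000,
    rng=rng,
)
```

The accepted draws approximate the posterior for the chosen tolerance; shrinking `eps` tightens the approximation at the cost of fewer accepted draws.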
  13. 13. ABC with Summary Statistics (ABC Summary Statistics, 5 of 15)
      If the data, $D$, are very complex and detailed, direct comparison between real and simulated data becomes prohibitive. In such situations, which originally motivated ABC approaches, summary statistics of the data are compared. We then have
      $$p_{S,\epsilon}(\theta_i \mid D) \propto \int_X \mathbb{1}\big(\Delta(S(x), S(y_\theta)) < \epsilon\big)\, f(y \mid \theta_i)\,\pi(\theta_i)\,dy$$
      Sufficient Statistics: This only works if the statistic $S(\cdot)$ is sufficient, i.e. if for $s = S(x)$ we have $p(x \mid s, \theta) = p(x \mid s)$.
      Sufficiency for Model Selection: If $S(\cdot)$ is sufficient for parameter estimation (in all models $i$ considered) it is not necessarily sufficient for model selection (Robert et al., PNAS (2011)).
  15. 15. ABC with Summary Statistics (ABC Summary Statistics, 5 of 15)
      Generate data $X \sim N(1, 1)$ and use ABC to infer $\mu$ (assuming that $\sigma^2 = 1$ is known).
      [Figure: four histograms of the ABC posterior over θ (horizontal axis from −4 to 4), one per summary statistic: mean, var, min, max.]
      Role of Summary Statistics
      • Mean (sufficient) correctly infers $\mu$.
      • Max/Min capture some information on $\mu$.
      • Var fails to capture any information on $\mu$.
      We need a way of constructing sets of statistics that together are (approximately) sufficient.
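The experiment on this slide can be imitated with a few lines of NumPy. This is a hedged sketch, not the code behind the figure: the uniform prior on $[-4, 4]$, the sample size, the tolerance and the variable names are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_draws, eps = 50, 20000, 0.1         # assumed sizes and tolerance

observed = rng.normal(1.0, 1.0, size=n_obs)  # X ~ N(1, 1)
thetas = rng.uniform(-4.0, 4.0, size=n_draws)                   # assumed prior on mu
sims = rng.normal(thetas[:, None], 1.0, size=(n_draws, n_obs))  # sigma^2 = 1 known

summaries = {
    "mean": lambda a: a.mean(axis=-1),
    "var": lambda a: a.var(axis=-1, ddof=1),
    "min": lambda a: a.min(axis=-1),
    "max": lambda a: a.max(axis=-1),
}

# Rejection ABC with one summary statistic at a time.
posterior = {}
for name, stat in summaries.items():
    keep = np.abs(stat(sims) - stat(observed)) < eps
    posterior[name] = thetas[keep]

# Posterior standard deviation of mu under each statistic.
spread = {name: draws.std() for name, draws in posterior.items()}
```

Under these assumptions the draws accepted on the mean should concentrate near the sample mean, while those accepted on the variance should remain spread over roughly the whole prior range, in line with the slide's description.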
  18. 18. A Closer Look at Summary Statistics (ABC Summary Statistics, 6 of 15)
      We interpret a summary statistic as a function, $S : \mathbb{R}^d \to \mathbb{R}^w$, $S(x) = s$. If $S$ is sufficient then (we include the model indicator variable in $\theta$)
      $$p(\theta \mid x) = p(\theta \mid s)$$
      Information Theoretical Perspective: A summary statistic is an information compression device. Now let $S$ be a set of statistics which together are sufficient. Then the mutual information is
      $$I(\Theta; X) = \int_\Omega \int_X p(\theta, x) \log \frac{p(\theta, x)}{p(\theta)\,p(x)}\,d\theta\,dx = I(\Theta; S)$$
      Constructing Minimally Sufficient Summary Statistics: We seek the set $U \subseteq S$ with minimal cardinality such that $I(\Theta; S) = I(\Theta; U)$.
  19. 19. Constructing Sufficient Statistics (ABC Summary Statistics, 7 of 15)
      Proposition: Let $X$ be a random variable generated according to $f(\cdot \mid \theta)$. Let $S$ be a summary statistic and $U$ and $T$ two subsets of $S$ such that $U = U(X)$, $T = T(X)$ and $S = S(X)$ satisfy $U \subset T \subset S$. We have
      $$I(\Theta; S \mid T) = I(\Theta; S \mid U) - I(\Theta; T \mid U).$$
      In order to construct a subset $T$ of $S$ such that $I(\Theta; S \mid T) = 0$, it is thus sufficient to add statistics from $S$ one by one until the condition holds. If we denote by $S_{(k)}$ the $k$th statistic to be added (with $k \le w$) we have $S_{(k)} = S_{(k)}(X)$, and then
      $$I(\Theta; S \mid S_{(1)}, \ldots, S_{(k+1)}) \le I(\Theta; S \mid S_{(1)}, \ldots, S_{(k)}).$$
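One way to see why the proposition holds is the chain rule for conditional mutual information. The derivation below is a sketch added for illustration and is not taken from the slides; it relies only on the nesting $U \subset T \subset S$ stated in the proposition.

```latex
% Assumes U \subset T \subset S, so the pair (T, S) carries the same information as S,
% and conditioning on (T, U) is the same as conditioning on T.
\begin{align*}
I(\Theta; S \mid U) &= I(\Theta; T, S \mid U) \\
                    &= I(\Theta; T \mid U) + I(\Theta; S \mid T, U) \\
                    &= I(\Theta; T \mid U) + I(\Theta; S \mid T),
\end{align*}
% and since mutual information is non-negative,
\[
I(\Theta; S \mid T) = I(\Theta; S \mid U) - I(\Theta; T \mid U) \le I(\Theta; S \mid U).
\]
```

Applying the last inequality with $U = \{S_{(1)}, \ldots, S_{(k)}\}$ and $T = \{S_{(1)}, \ldots, S_{(k+1)}\}$ gives the monotone decrease used when statistics are added one by one.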
  20. 20. Constructing Sufficient Statistics (ABC Summary Statistics, 7 of 15)
      $$I(\Theta; S \mid U) = \int_\Omega \int_X p(\theta, S(x), U(x)) \log \frac{p(\theta, S(x) \mid U(x))}{p(\theta \mid U(x))\,p(S(x) \mid U(x))}\,dx\,d\theta$$
      $$= \int_X p(S(x))\,\big[\mathrm{KL}\big(p(\Theta \mid S(x)) \,\|\, p(\Theta \mid U(x))\big)\big]\,dx$$
      $$= E_{p(X)}\big[\mathrm{KL}\big(p(\Theta \mid S(X)) \,\|\, p(\Theta \mid U(X))\big)\big]$$
      An Impossible Algorithm
      • for all subsets $u^* \subseteq s^*$, perform ABC to obtain estimates $p(\Theta \mid u^*)$,
      • determine the set $A = \{u^* \subset s^* \text{ such that } \mathrm{KL}(p(\Theta \mid s^*) \,\|\, p(\Theta \mid u^*)) = 0\}$,
      • the desired subset is $\arg\min_{u^* \in A} |u^*|$.
  21. 21. Constructing Sufficient Statistics (ABC Summary Statistics, 7 of 15)
      input: a sufficient set of statistics whose values on the dataset are s* = {s*_1, ..., s*_w}, a threshold δ
      output: a subset v* of s*
      choose randomly u* in s*
      v* ← u*
      q* ← s* \ v*
      repeat
          repeat
              if q* = Ø then
                  return v*
              end if
              choose randomly u* in q*
              q* ← q* \ u*
              perform ABC to obtain p(Θ|v*, u*)
          until KL(p(Θ|v*, u*) || p(Θ|v*)) ≥ δ
          optionally: v* ← OrderDependency(v*, u*)
          v* ← v* ∪ u*
          q* ← s* \ v*
      until q* = Ø
      return v*
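The pseudocode on this slide can be sketched in Python on a toy problem. This is an illustrative sketch under loud assumptions, not the authors' implementation: the posterior comparison uses a one-dimensional Gaussian approximation to the KL divergence, ABC is plain rejection on a pre-simulated table, the optional OrderDependency step is omitted, and every name (`gaussian_kl`, `abc_posterior`, `select_statistics`, `stats`) is hypothetical.

```python
import numpy as np

def gaussian_kl(p_draws, q_draws):
    """KL(P || Q) under a 1-D Gaussian approximation to both sets of draws."""
    mp, mq = p_draws.mean(), q_draws.mean()
    vp, vq = p_draws.var(), q_draws.var()
    return 0.5 * (np.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)

def abc_posterior(thetas, sim_stats, obs_stats, subset, eps):
    """Rejection ABC using only the statistics whose column indices are in subset."""
    cols = sorted(subset)
    close = np.all(np.abs(sim_stats[:, cols] - obs_stats[cols]) < eps, axis=1)
    return thetas[close]

def select_statistics(thetas, sim_stats, obs_stats, eps, delta, rng):
    """Add randomly ordered statistics while they shift the ABC posterior by >= delta."""
    w = sim_stats.shape[1]
    v = {int(rng.integers(w))}                 # choose randomly u* in s*
    q = set(range(w)) - v                      # q* <- s* \ v*
    while q:                                   # stop once no candidates remain
        u = int(rng.choice(sorted(q)))         # choose randomly u* in q*
        q.discard(u)                           # q* <- q* \ u*
        with_u = abc_posterior(thetas, sim_stats, obs_stats, v | {u}, eps)
        without_u = abc_posterior(thetas, sim_stats, obs_stats, v, eps)
        if gaussian_kl(with_u, without_u) >= delta:
            v.add(u)                           # v* <- v* union u*
            q = set(range(w)) - v              # q* <- s* \ v*
    return v

def stats(a):
    # Column 0: sample mean; column 1: sample variance.
    return np.stack([a.mean(axis=-1), a.var(axis=-1, ddof=1)], axis=-1)

rng = np.random.default_rng(2)
observed = rng.normal(1.0, 1.0, size=50)       # toy data, sigma^2 = 1 known
thetas = rng.uniform(-4.0, 4.0, size=20000)    # assumed prior on mu
sims = rng.normal(thetas[:, None], 1.0, size=(thetas.size, observed.size))

selected = select_statistics(thetas, stats(sims), stats(observed),
                             eps=0.1, delta=0.2, rng=rng)
```

On this toy problem the sample mean (index 0) should end up in `selected` regardless of which statistic is drawn first, because adding it changes the posterior for $\mu$ substantially, whereas the variance carries no information on $\mu$.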
  22. 22. Examples: Normal Distributions (ABC Summary Statistics, 8 of 15)
      $y_1, \ldots, y_d \sim N(\mu, \sigma_1^2)$ and $y_1, \ldots, y_d \sim N(\mu, \sigma_2^2)$
      [Figure: two panels; horizontal axis: statistics (mean, S2, range, max, random); vertical axis: Run.]
  23. 23. Examples: Normal Distributions (ABC Summary Statistics, 8 of 15)
      $y_1, \ldots, y_d \sim N(\mu, \sigma_1^2)$ and $y_1, \ldots, y_d \sim N(\mu, \sigma_2^2)$
      [Figure: two scatter plots; horizontal axis: log(BF) predicted; vertical axis: log(BF) ABC.]
  26. 26. Examples: Population Genetics (ABC Summary Statistics, 9 of 15)
      [Figure: three panels (Constant Population Size; Exponential Population Growth; Two-Island Model with Migration); horizontal axis: statistics S1 to S11; vertical axis: Run.]
      [S1] Number of Segregating Sites; [S2] Number of Distinct Haplotypes; [S3] Haplotype Homozygosity; [S4] Average SNP Homozygosity; [S5] Number of occurrences of most common haplotype; [S6] Mean number of pair-wise differences between haplotypes; [S7] Number of Singleton Haplotypes; [S8] Number of Singleton SNPs; [S9] Linkage Disequilibrium.
      Summary Statistic Choice: The choice of summary statistics appears to depend subtly on the true data-generating model. In light of coalescent processes this is, however, to be expected.
  27. 27. Examples: Random Walks (ABC Summary Statistics, 9 of 15)
      [Figure: three panels (Classical Random Walk; Persistent Random Walk; Biased Random Walk); horizontal axis: statistics S1 to S5; vertical axis: Run.]
      [S1] Mean square displacement; [S2] Mean x and y displacement; [S3] Mean square x and y displacement; [S4] Straightness index; [S5] Eigenvalues of gyration tensor.
      Parameter Sufficiency for Complex Problems: Here all statistics that have been chosen for parameter estimation are also chosen for model selection.
  31. 31. Conditioning on Information (Interpreting ABC, 10 of 15)
      [Figure: diagram relating Θ to the statistics s1 and s2 and the full data x.]
      What is the meaning of $p(\theta \mid s_0, s_1, \ldots, s_n)$? Let $s = (s_0, s_1, \ldots, s_n)$, and assume $I(\theta, s) < I(\theta, x)$ but $\epsilon \to 0$. This can happen for sufficient and ancillary $s$. In the latter case we obtain $p(\theta \mid s) = \pi(\theta)$.
      Statistics
      • Sufficient: Implicates same area as full data.
      • Ancillary: Implicates all values of $\theta$ equally.
      How about $p(t \mid s)$ if $s$ is not (quite) sufficient?
  33. 33. Model Selection vs. Model Checking (Interpreting ABC, 11 of 15)
      Model Selection: Several models $M \in \mathcal{M}$ are compared and one or more are chosen in light of the data: Find models which are better than others.
      Model Checking: The quality of a model $M_i$ is assessed against the available data: Determine if a model is actually ‘good’.
      Alternative Approach: ABCµ [Ratmann et al., PNAS].
      Posterior Predictive Checks: We are interested in the posterior predictive distribution,
      $$p(t(X) \mid s(X)) = \int_\Theta p(t(X) \mid \theta)\,p(\theta \mid s(X))\,d\theta.$$
      In particular we have $p(s(X) \mid s(X)) \neq p(s(X) \mid X)$ unless $t(X)$ is sufficient.
  34. 34. ABC on Network Data (Network Evolution, 12 of 15)
      [Figure: four models of network evolution: (e) duplication attachment; (f) duplication attachment with complementarity; (g) linear preferential attachment; (h) general scale-free attachment.]
  35. 35. ABC on Network Data (Network Evolution, 12 of 15)
      Summarizing Networks
      • Data are noisy and incomplete.
      • We can simulate models of network evolution, but this does not allow us to calculate likelihoods for all but very trivial models.
      • There is also no sufficient statistic that would allow us to summarize networks, so ABC approaches require some thought.
      • Many possible summary statistics of networks are expensive to calculate.
      Full likelihood: Wiuf et al., PNAS (2006).
      ABC: Ratmann et al., PLoS Comp. Biol. (2008).
      ABC (better): Thorne & Stumpf, J. Roy. Soc. Interface (2012).
      Stumpf & Wiuf, J. Roy. Soc. Interface (2010).
  36. 36. Spectral Distances (Network Evolution, 12 of 15)
      [Figure: example graph on the nodes a, b, c, d, e.] With rows and columns ordered a, b, c, d, e, its adjacency matrix is
      $$A = \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
      Graph Spectra: Given a graph $G$ with nodes $N$ and edges $(i, j) \in E$ with $i, j \in N$, the adjacency matrix, $A$, of the graph is defined by
      $$a_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$
      The eigenvalues, $\lambda$, of this matrix provide one way of defining the graph spectrum.
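For the example graph on this slide, the spectrum can be computed directly with NumPy. The sketch below is illustrative only; the variable names are assumptions, and `numpy.linalg.eigvalsh` is used because the adjacency matrix of an undirected graph is symmetric.

```python
import numpy as np

# Adjacency matrix from the slide; rows and columns are ordered a, b, c, d, e.
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
])

# eigvalsh assumes a symmetric matrix and returns real eigenvalues in ascending order.
spectrum = np.linalg.eigvalsh(A)

# Each undirected edge appears twice in a symmetric adjacency matrix.
n_edges = A.sum() // 2
```

Two quick sanity checks follow from standard linear algebra: the eigenvalues sum to the trace of $A$ (zero, since there are no self-loops), and their squares sum to twice the number of edges.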
  39. 39. Spectral Distances (Network Evolution, 13 of 15)
      A simple distance measure between graphs having adjacency matrices $A$ and $B$, known as the edit distance, is to count the number of edges that are not shared by both graphs,
      $$D(A, B) = \sum_{i,j} (a_{i,j} - b_{i,j})^2.$$
      However, for unlabelled graphs we require some mapping $h$ from $i \in N_A$ to $i' \in N_B$ that minimizes the distance,
      $$D_h(A, B) = \sum_{i,j} (a_{i,j} - b_{h(i),h(j)})^2.$$
      Given a spectrum (which is relatively cheap to compute) we have
      $$D(A, B) = \sum_l \big(\lambda_l^{(\alpha)} - \lambda_l^{(\beta)}\big)^2,$$
      where $\lambda^{(\alpha)}$ and $\lambda^{(\beta)}$ denote the spectra of $A$ and $B$.
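The contrast between the labelled edit distance and the spectral distance can be illustrated by relabelling the nodes of the example graph. This is a minimal sketch under assumptions: the function names are hypothetical, and the spectral distance is taken as the sum of squared differences between sorted eigenvalues, as in the formula on this slide.

```python
import numpy as np

def edit_distance(A, B):
    """Sum of squared entry-wise differences, for graphs with matching node labels."""
    # For symmetric matrices each unshared undirected edge is counted twice.
    return int(((A - B) ** 2).sum())

def spectral_distance(A, B):
    """Sum of squared differences between the sorted adjacency eigenvalues."""
    return float(((np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B)) ** 2).sum())

# Example graph with nodes ordered a, b, c, d, e.
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
])

perm = [4, 3, 2, 1, 0]          # relabel the nodes: a <-> e, b <-> d
B = A[np.ix_(perm, perm)]       # the same graph under different labels

labelled = edit_distance(A, B)
spectral = spectral_distance(A, B)
```

Here `labelled` is positive even though the two graphs are isomorphic, while `spectral` vanishes up to floating-point error, because eigenvalues are invariant under node relabelling. Note that a spectral distance of zero does not by itself prove isomorphism, since distinct graphs can share a spectrum.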
  42. 42. Protein Interaction Network Data (Network Evolution, 14 of 15)
      Species           Proteins   Interactions   Genome size   Sampling fraction
      S. cerevisiae     5035       22118          6532          0.77
      D. melanogaster   7506       22871          14076         0.53
      H. pylori         715        1423           1589          0.45
      E. coli           1888       7008           5416          0.35
      [Figure: bar chart of model probability (0.0 to 0.5) for the models DA, DAC, LPA, SF, DACL and DACR, with one bar per organism: S. cerevisiae, D. melanogaster, H. pylori, E. coli.]
      Model Selection
      • Inference here was based on all the data, not summary statistics.
      • Duplication models receive the strongest support from the data.
      • Several models receive support and no model is chosen unambiguously.
  43. 43. Protein Interaction Network Data (Network Evolution, 14 of 15)
      [Figure: parameter distributions for the models DA (δ, α), DAC (δ, α), DACL (δ, α, p, m) and DACR (δ, α, p, m), shown for S. cerevisiae, D. melanogaster, H. pylori and E. coli.]
  44. 44. Considerate Use of ABC (Conclusion, 15 of 15)
      • ABC is a tool for situations where conventional statistical approaches fail or are too cumbersome.
      • If all the data are used then this is (relatively) unproblematic; if the data are compressed/corrupted then caution is required.
      • Some of the issues arising in ABC mirror those also encountered in “conventional” statistics:
        “Any Bayesian inference uses the data only via the minimal sufficient statistic. This is because the calculation of the posterior distribution involves multiplying the likelihood by the prior and normalizing. Any factor of the likelihood that is a function of y alone will disappear after normalization.” D. Cox (2006).
      • In other cases it seems prudent to accept the additional (and considerable) computational cost of constructing suitable summary statistics (such as in Barnes et al., Stat&Comp 2012).
  45. 45. Acknowledgements (Conclusion, 15 of 15)
