SlideShare a Scribd company logo
1 of 45
Download to read offline
Considerate Approaches to ABC Model Selection




                                             Michael P.H. Stumpf, Christopher
                                           Barnes, Sarah Filippi, Thomas Thorne

                                                      Theoretical Systems Biology Group


                                                                 26/06/2012




 Considerate Approaches to ABC Model Selection   Stumpf et al.                            1 of 15
Evolving Networks




              (a) Duplication attachment (b) Duplication attachment
                                                   with complimentarity




                                                                       wj
            (c) Linear preferential
                                                                        wi
                                                        (d) General scale-free
            attachment
   Considerate Approaches to ABC Model Selection   Stumpf et al.   Model Selection   2 of 15
Inference and Model Selection
We have observed data, D, that was generated by some system that
we seek to describe by a mathematical model. In principle we can
have a model-set, M = {M1 , . . . , Mν }, where each model Mi has an
associated parameter θi .
We may know the different constituent parts of the system, Xi , and
have measurements for some or all of them under some experimental
designs, T .




     Considerate Approaches to ABC Model Selection   Stumpf et al.   Model Selection   3 of 15
Inference and Model Selection
We have observed data, D, that was generated by some system that
we seek to describe by a mathematical model. In principle we can
have a model-set, M = {M1 , . . . , Mν }, where each model Mi has an
associated parameter θi .
We may know the different constituent parts of the system, Xi , and
have measurements for some or all of them under some experimental
designs, T .
Model Posterior

Pr(Mi |T, D)




     Considerate Approaches to ABC Model Selection   Stumpf et al.   Model Selection   3 of 15
Inference and Model Selection
We have observed data, D, that was generated by some system that
we seek to describe by a mathematical model. In principle we can
have a model-set, M = {M1 , . . . , Mν }, where each model Mi has an
associated parameter θi .
We may know the different constituent parts of the system, Xi , and
have measurements for some or all of them under some experimental
designs, T .      Likelihood Prior
Model Posterior
                       Pr(D|Mi , T)π(Mi )
Pr(Mi |T, D)=         ν
                           Pr(D|Mj , T)π(Mj )
                    j =1

                                Evidence




     Considerate Approaches to ABC Model Selection   Stumpf et al.   Model Selection   3 of 15
Inference and Model Selection
We have observed data, D, that was generated by some system that
we seek to describe by a mathematical model. In principle we can
have a model-set, M = {M1 , . . . , Mν }, where each model Mi has an
associated parameter θi .
We may know the different constituent parts of the system, Xi , and
have measurements for some or all of them under some experimental
designs, T .      Likelihood Prior
Model Posterior
                       Pr(D|Mi , T)π(Mi )                            For complicated models and/or
Pr(Mi |T, D)=         ν                                              detailed data the likelihood
                           Pr(D|Mj , T)π(Mj )                        evaluation can become
                    j =1                                             prohibitively expensive.
                                Evidence




     Considerate Approaches to ABC Model Selection   Stumpf et al.    Model Selection         3 of 15
Inference and Model Selection
We have observed data, D, that was generated by some system that
we seek to describe by a mathematical model. In principle we can
have a model-set, M = {M1 , . . . , Mν }, where each model Mi has an
associated parameter θi .
We may know the different constituent parts of the system, Xi , and
have measurements for some or all of them under some experimental
designs, T .      Likelihood Prior
Model Posterior
                       Pr(D|Mi , T)π(Mi )                            For complicated models and/or
Pr(Mi |T, D)=         ν                                              detailed data the likelihood
                           Pr(D|Mj , T)π(Mj )                        evaluation can become
                    j =1                                             prohibitively expensive.
                                Evidence

Approximate Inference
We can approximate the likelihood and/or the models. The “true”
model is unlikely to be in M anyway.

     Considerate Approaches to ABC Model Selection   Stumpf et al.    Model Selection         3 of 15
Approximate Bayesian Computation
We can define the posterior as
                                      f (x |θi )π(θi )
                                    p(θi |x ) =
                                            p (x )
Here fi (x |θ) is the likelihood which is often hard to evaluate; consider
for example
                                                             dy
y = max[0, y +g1 +y ×g2] with g1 , g2 ∼ N(0,σ1/2 ) and
˜                                                                = g (y ; θ).
                                                             dt




      Considerate Approaches to ABC Model Selection   Stumpf et al.   Approximate Bayesian Computation   4 of 15
Approximate Bayesian Computation
We can define the posterior as
                                      f (x |θi )π(θi )
                                    p(θi |x ) =
                                            p (x )
Here fi (x |θ) is the likelihood which is often hard to evaluate; consider
for example
                                                             dy
y = max[0, y +g1 +y ×g2] with g1 , g2 ∼ N(0,σ1/2 ) and
˜                                                                = g (y ; θ).
                                                             dt
But we can still simulate from the data-generating model, whence

                                  1(y = x )f (y |θi )π(θi )
                  p(θi |x ) =                               dy
                                X          p (x )
                                  1 (∆(y , x ) < ) f (y |θi )π(θi )
                              ≈                                     dy
                                X              p (x )




      Considerate Approaches to ABC Model Selection   Stumpf et al.   Approximate Bayesian Computation   4 of 15
Approximate Bayesian Computation
We can define the posterior as
                                      f (x |θi )π(θi )
                                    p(θi |x ) =
                                            p (x )
Here fi (x |θ) is the likelihood which is often hard to evaluate; consider
for example
                                                             dy
y = max[0, y +g1 +y ×g2] with g1 , g2 ∼ N(0,σ1/2 ) and
˜                                                                = g (y ; θ).
                                                             dt
But we can still simulate from the data-generating model, whence

                                  1(y = x )f (y |θi )π(θi )
                  p(θi |x ) =                               dy
                                X          p (x )
                                  1 (∆(y , x ) < ) f (y |θi )π(θi )
                              ≈                                     dy
                                X              p (x )

Solutions for Complex Problems (?)
Approximate (i) data, (ii) model or (iii) distance.

      Considerate Approaches to ABC Model Selection   Stumpf et al.   Approximate Bayesian Computation   4 of 15
ABC with Summary Statistics
If the data, D, are very complex and detailed, direct comparison
between real and simulated data becomes prohibitive. In such
situations, which originally motivated ABC approaches, summary
statistics of the data are compared. We then have

       pS , (θi |D) ∝               1 (∆ (S (x )), S (yθ )) < ) f (y |θ)π(θi )dy
                                X




     Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics   5 of 15
ABC with Summary Statistics
If the data, D, are very complex and detailed, direct comparison
between real and simulated data becomes prohibitive. In such
situations, which originally motivated ABC approaches, summary
statistics of the data are compared. We then have

        pS , (θi |D) ∝               1 (∆ (S (x )), S (yθ )) < ) f (y |θ)π(θi )dy
                                 X
Sufficient Statistics
This only works is the statistic S (.) is sufficient, i.e. if for s = S (x ) we
have
                           p(x |s, θ) = p(x |s)




      Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics   5 of 15
ABC with Summary Statistics
If the data, D, are very complex and detailed, direct comparison
between real and simulated data becomes prohibitive. In such
situations, which originally motivated ABC approaches, summary
statistics of the data are compared. We then have

        pS , (θi |D) ∝               1 (∆ (S (x )), S (yθ )) < ) f (y |θ)π(θi )dy
                                 X
Sufficient Statistics
This only works is the statistic S (.) is sufficient, i.e. if for s = S (x ) we
have
                           p(x |s, θ) = p(x |s)

Sufficency for Model Selection
If S (.) is sufficient for parameter estimation (in all models i
considered) it is not necessarily sufficient for model selection (Robert
et al., PNAS (2011)).
      Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics   5 of 15
ABC with Summary Statistics
           Generate data X ∼ N(1, 1) and use ABC to infer µ (assuming that
           σ2 = 1 is known).

                 mean                                var
                                   30



600
                                   25                                    Role of Summary Statistics
                                   20
                                                                                      Mean (sufficient) correctly
400                                15


                                   10
                                                                                           infers µ.
200
                                      5                                         Max/Min capture some
 0
      −4    −2    0      2    4
                                      0
                                           −4   −2    0    2   4
                                                                                        information on µ.
                 min                                 max

250
                                   300                                                   Var fails to capture any
200
                                   250
                                                                                             information on µ.
                                   200
150
                                   150

100
                                   100

 50                                   50


 0                                     0
      −4    −2    0      2    4            −4   −2    0    2   4
                                  θ


                      Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics        5 of 15
ABC with Summary Statistics
           Generate data X ∼ N(1, 1) and use ABC to infer µ (assuming that
           σ2 = 1 is known).

                 mean                                var
                                   30



600
                                   25                                    Role of Summary Statistics
                                   20
                                                                                      Mean (sufficient) correctly
400                                15


                                   10
                                                                                           infers µ.
200
                                      5                                         Max/Min capture some
 0
      −4    −2    0      2    4
                                      0
                                           −4   −2    0    2   4
                                                                                        information on µ.
                 min                                 max

250
                                   300                                                   Var fails to capture any
200
                                   250
                                                                                             information on µ.
                                   200
150
                                   150

100
                                                                         We need a way of constructing
                                   100

 50                                   50
                                                                         sets of statistics that together are
 0                                     0                                 (approximately) sufficient.
      −4    −2    0      2    4            −4   −2    0    2   4
                                  θ


                      Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics        5 of 15
A Closer Look at Summary Statistics
We interpret a summary statistic as a function,

                               S : Rd −→ Rw ,                S(x ) = s.
If S is sufficient then (we include the model indicator variable in θ)

                                        p(θ|x ) = p(θ|s)




     Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics   6 of 15
A Closer Look at Summary Statistics
We interpret a summary statistic as a function,

                               S : Rd −→ Rw ,                S(x ) = s.
If S is sufficient then (we include the model indicator variable in θ)

                                        p(θ|x ) = p(θ|s)
Information Theoretical Perspective
A summary statistic is an information compression device. Now let S
be a set of statistics which together are sufficient. Then the mutual
information
                                          p(θ, x )
         I (Θ; X ) =        p(θ, x ) log           d θdx = I (θ, S)
                       Ω X               p(θ)p(x )




     Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics   6 of 15
A Closer Look at Summary Statistics
We interpret a summary statistic as a function,

                               S : Rd −→ Rw ,                S(x ) = s.
If S is sufficient then (we include the model indicator variable in θ)

                                        p(θ|x ) = p(θ|s)
Information Theoretical Perspective
A summary statistic is an information compression device. Now let S
be a set of statistics which together are sufficient. Then the mutual
information
                                          p(θ, x )
         I (Θ; X ) =        p(θ, x ) log           d θdx = I (θ, S)
                       Ω X               p(θ)p(x )

Constructing Minimally Sufficient Summary Statistics
We seek the set U ⊆ S with minimal cardinality such that
I (Θ; S) = I (Θ; U).

     Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics   6 of 15
Constructing Sufficient Statistics


Proposition
Let X be a random variable generated according to f (·|θ). Let S be a
summary statistic and U and T two subsets of S such that U = U(X ),
T = T(X ) and S = S(X ) satisfy U ⊂ T ⊂ S. We have

                        I (Θ; S |T ) = I (Θ; S |U ) − I (Θ; T |U ) .

In order to construct a subset T of S such that I (Θ; S |T ) = 0, it is thus
sufficient to add statistics from S one by one until the condition holds.
If we denote by S(k ) the kth statistic to be added (with k   w) we have
S(k ) = S(k ) (X ), and then

            I (Θ; S |S(1) , . . . , S(k +1) )            I (Θ; S |S(1) , . . . , S(k ) ) .


     Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics   7 of 15
Constructing Sufficient Statistics


                                                                  p(θ, S(x )|U(x ))
I (Θ; S |U ) =               p(θ, S(x ), U(x )) log                                     dxd θ
                    Ω X                                        p(θ|U(x ))p(S(x )|U(x ))

              =         p(S(x )) [KL(p(Θ|S(x ))||p(Θ|U(x )))] dx
                    X
              = Ep(X ) [KL(p(Θ|S(X ))||p(Θ|U(X )))]



An Impossible Algorithm
• for all subsets u ∗ ⊆ s ∗ , perform ABC to obtain estimates p (Θ|u ∗ )
• determine the set
  A = {u ∗ ⊂ s∗ such that KL (p (Θ|s∗ )||p (Θ|u ∗ )) = 0},
• the desired subset is argminu ∗ ∈A |u ∗ |


     Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics     7 of 15
Constructing Sufficient Statistics
 input: a sufficient set of statistics whose values on the dataset is s∗ =
 {s1 , . . . , sw }, a threshold δ
    ∗           ∗

 output: a subset v ∗ of s∗
 choose randomly u ∗ in s∗
 v ∗ ← u∗
 q ∗ ← s ∗ v ∗
 repeat
      repeat
            if q ∗ = Ø then return v ∗
            end if
            choose randomly u ∗ in q ∗
            q ∗ ← q ∗ u ∗
            perform ABC to obtain p (Θ|v ∗ , u ∗ )
      until KL (p (Θ|v ∗ , u ∗ )||p (Θ|v ∗ )) δ
      optionally: v ∗ ← OrderDependency (v ∗ , u ∗ )
      v ∗ ← v ∗ ∪ u∗
      q ∗ ← s ∗ v ∗
 until q ∗ = Ø
 return v ∗
    Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics   7 of 15
Examples: Normal Distributions


                        y1 , ...yd ∼ N(µ, σ2 ) and y1 , ...yd ∼ N(µ, σ2 )
                                           1                          2


      100                                                           100




      80                                                            80




      60                                                            60
Run




                                                              Run
      40                                                            40




      20                                                            20




              mean      S2      range    max     random                     mean     S2      range      max   random




            Considerate Approaches to ABC Model Selection   Stumpf et al.      ABC Summary Statistics                  8 of 15
Examples: Normal Distributions


                               y1 , ...yd ∼ N(µ, σ2 ) and y1 , ...yd ∼ N(µ, σ2 )
                                                  1                          2
              6




                                                                                      q                                                                            q           q




                                                                                                             8
                                                                  q
                                                                                                                                                                 qqq
                                                                                                                                                                 qqq
                                                                                  q                                                                                q
                                                                                                                                                         q     qq
                                                                                                                                                          q
                                                           q                                                                                           qq q qq q
                                                                                                                                                             q
                                                                                                                                                       q q q q q




                                                                                                             6
              4




                                                            q             q                                                                                q
                                                                                                                                                      q qq
                                                                                                                                                      q qq       q
                                                   q                                                                                                       q
                                                                                                                                                 q    q qq q q
                                                                                                                                                       q q q q
                                                       q              q                                                                              qq q q
                                                                          q                                                                     q q q q q
                                                                                                                                                qq  q
                                                                                                                                                   qq q q
                                                                                                                                                 q qq
log(BF) ABC




                                                                                               log(BF) ABC
                                                                                                                                             q    q q




                                                                                                             4
                                                                                                                                               q
                                                            q                                                                          q          q                q
              2




                                           q        qq        q                                                                                q qq
                                                                                                                                                q             q
                                                      q       q                                                                              q q
                                                                                                                                            qq                 q           q
                                                     q     q qq
                                                            q q
                                      qq                 q          q                                                                        q
                                                        q q
                                                      q q qqqqqq    qq                                                             q




                                                                                                             2
                                  q             q q  q qq           qq
                                                            qqq q qq q
                                                            q                             q                                       q
                                                                                                                                           q
                              q                 q q qq q q q qq q q
                                                    q q q q qq q                                                                           q
                                               qq q q q q
                                                               qq                                                             q
                    q
              0




                                                      qq                                                                                                  q
                                               q                  qq                                                        q q                                            q
                                                                    q




                                                                                                             0
                                                                q
                                  q
                                                                q q                                                                    q
                                                                      q
              −2




                                                                                                             −2
                                           q                                                                      q



                        −2        0            2           4                  6           8                           −2     0             2          4                6       8

                                      log(BF) predicted                                                                           log(BF) predicted




                   Considerate Approaches to ABC Model Selection                              Stumpf et al.           ABC Summary Statistics                                       8 of 15
Examples: Population Genetics
Constant Population
       Size
      100




      80




      60
Run




      40




      20




             S1   S2   S3    S4   S5   S6   S7   S8   S9   S10   S11




            [S1] Number of Segregating Sites; [S2] Number of Distinct Haplotypes,; [S3] Haplotype Homozygosity; [S4] Average SNP
            Homozygosity; [S5] Number of occurrences of most common haplotype; [S6] Mean number of pair-wise differences between
            haplotypes; [S7] Number of Singleton Haplotypes; [S8] Number of Singleton SNPs; [S9] Linkage Disequilibrium.




                            Considerate Approaches to ABC Model Selection   Stumpf et al.   ABC Summary Statistics                 9 of 15
Examples: Population Genetics
Constant Population                                                               Exponential                                                               Two-Island Model
       Size                                                                    Population Growth                                                             with Migration
      100
                                                                             100                                                                      100




      80
                                                                             80                                                                       80




      60
                                                                             60                                                                       60
Run




                                                                       Run




                                                                                                                                                Run
      40
                                                                             40                                                                       40




      20
                                                                             20                                                                       20




             S1   S2   S3    S4   S5   S6   S7   S8   S9   S10   S11
                                                                                   S1   S2   S3   S4   S5    S6   S7   S8   S9   S10   S11                   S1   S2   S3   S4   S5   S6   S7   S8   S9   S10   S11




            [S1] Number of Segregating Sites; [S2] Number of Distinct Haplotypes,; [S3] Haplotype Homozygosity; [S4] Average SNP
            Homozygosity; [S5] Number of occurrences of most common haplotype; [S6] Mean number of pair-wise differences between
            haplotypes; [S7] Number of Singleton Haplotypes; [S8] Number of Singleton SNPs; [S9] Linkage Disequilibrium.




                            Considerate Approaches to ABC Model Selection                                   Stumpf et al.              ABC Summary Statistics                                                    9 of 15
Examples: Population Genetics
Constant Population                                                               Exponential                                                               Two-Island Model
       Size                                                                    Population Growth                                                             with Migration
      100
                                                                             100                                                                      100




      80
                                                                             80                                                                       80




      60
                                                                             60                                                                       60
Run




                                                                       Run




                                                                                                                                                Run
      40
                                                                             40                                                                       40




      20
                                                                             20                                                                       20




             S1   S2   S3    S4   S5   S6   S7   S8   S9   S10   S11
                                                                                   S1   S2   S3   S4   S5    S6   S7   S8   S9   S10   S11                   S1   S2   S3   S4   S5   S6   S7   S8   S9   S10   S11




            [S1] Number of Segregating Sites; [S2] Number of Distinct Haplotypes,; [S3] Haplotype Homozygosity; [S4] Average SNP
            Homozygosity; [S5] Number of occurrences of most common haplotype; [S6] Mean number of pair-wise differences between
            haplotypes; [S7] Number of Singleton Haplotypes; [S8] Number of Singleton SNPs; [S9] Linkage Disequilibrium.


            Summary Statistic Choice
            The choice of summary statistics appears to depend subtely on the
            true data-generating model. In light of coalescent processes this is,
            however, to be expected.

                            Considerate Approaches to ABC Model Selection                                   Stumpf et al.              ABC Summary Statistics                                                    9 of 15
Examples: Random Walks

       Classical Random                                    Persistent Random                                          Biased Random
             Walk                                                 Walk                                                     Walk
      100                                                  100                                                  100




      80                                                   80                                                   80




      60                                                   60                                                   60
Run




                                                     Run




                                                                                                          Run
      40                                                   40                                                   40




      20                                                   20                                                   20




                 S1    S2    S3    S4    S5                      S1   S2     S3    S4      S5                          S1   S2   S3   S4   S5




            [S1] Mean square displacement; [S2] Mean x and y displacement; [S3] Mean square x and y displacement; [S4] Straightness
            index; [S5] Eigenvalues of gyration tensor.


            Parameter Sufficiency for Complex Problems
            Here all statistics that have been chosen for parameter estimation are
            also chosen for model selection.


                      Considerate Approaches to ABC Model Selection        Stumpf et al.        ABC Summary Statistics                          9 of 15
Conditioning on Information


                      Θ



   s1                 s2                   s3




    Considerate Approaches to ABC Model Selection   Stumpf et al.   Interpreting ABC   10 of 15
Conditioning on Information


                         Θ



     s1                  s2                   x

Statistics
  Sufficient: Implicates same area as
             full data.
   Ancillary: Implicates all values of θ
              equally.
       Considerate Approaches to ABC Model Selection   Stumpf et al.   Interpreting ABC   10 of 15
Conditioning on Information

                                                                 What is the meaning of
                         Θ                                       p(θ|s0 , s1 , . . . , sn )?
                                                                 Let s = (s0 , s1 , . . . , sn ), and
                                                                 assume I (θ, s) < I (θ, x ) but
                                                                   → 0.
                                                                 This can happen for sufficient
                                                                 and ancillary s. In the latter
     s1                  s2                   x                  case we obtain

                                                                               p(θ|s) = π(θ).
Statistics
  Sufficient: Implicates same area as
             full data.
   Ancillary: Implicates all values of θ
              equally.
       Considerate Approaches to ABC Model Selection   Stumpf et al.   Interpreting ABC             10 of 15
Conditioning on Information

                                                                 What is the meaning of
                         Θ                                       p(θ|s0 , s1 , . . . , sn )?
                                                                 Let s = (s0 , s1 , . . . , sn ), and
                                                                 assume I (θ, s) < I (θ, x ) but
                                                                   → 0.
                                                                 This can happen for sufficient
                                                                 and ancillary s. In the latter
     s1                  s2                   x                  case we obtain

                                                                               p(θ|s) = π(θ).
Statistics
  Sufficient: Implicates same area as                               How about
             full data.
                                                                                          p(t |s)
   Ancillary: Implicates all values of θ
              equally.                                           if s is not (quite) sufficient?
       Considerate Approaches to ABC Model Selection   Stumpf et al.   Interpreting ABC             10 of 15
Model Selection vs. Model Checking

Model Selection: Several models M ∈ M are compared and one or
              more are chosen in light of the data: Find models which
              are better than others.
Model Checking: The quality of a model Mi is assessed against the
              available data: Determine if a model is actually ‘good’.
Alternative Approach: ABCµ [Ratmann et al., PNAS].




     Considerate Approaches to ABC Model Selection   Stumpf et al.   Interpreting ABC   11 of 15
Model Selection vs. Model Checking

Model Selection: Several models M ∈ M are compared and one or
              more are chosen in light of the data: Find models which
              are better than others.
Model Checking: The quality of a model Mi is assessed against the
              available data: Determine if a model is actually ‘good’.
Alternative Approach: ABCµ [Ratmann et al., PNAS].

Posterior Predictive Checks
We are interested in the posterior predictive distribution,

                   p(t (X )|s(X )) =                  p(t (X )|θ)p(θ|s(X ))d θ.
                                                Θ
In particular we have

                                p(s(X )|s(X )) = p(s(X )|X )
unless t (X ) is sufficient.

      Considerate Approaches to ABC Model Selection    Stumpf et al.   Interpreting ABC   11 of 15
ABC on Network Data




              (e) Duplication attachment (f) Duplication attachment
                                                   with complimentarity




                                                                       wj
            (g) Linear preferential                                     wi
                                                        (h) General scale-free
            attachment
   Considerate Approaches to ABC Model Selection   Stumpf et al.   Network Evolution   12 of 15
ABC on Network Data

Summarizing Networks
• Data are noisy and incomplete.
• We can simulate models of network
  evolution, but this does not allow us to
  calculate likelihoods for all but very
  trivial models.
• There is also no sufficient statistic that
  would allow us to summarize networks,
  so ABC approaches require some
  thought.
• Many possible summary statistics of
  networks are expensive to calculate.
    Full likelihood: Wiuf et al., PNAS (2006).
             ABC: Ratman et al., PLoS Comp.Biol. (2008).
     ABC (better): Thorne & Stumpf, J.Roy.Soc. Interface (2012).
                                                                             Stumpf & Wiuf, J. Roy. Soc. Interface (2010).

           Considerate Approaches to ABC Model Selection     Stumpf et al.     Network Evolution                             12 of 15
Spectral Distances
                   c                                                    a b c d e
                                                                                                
                                                                        0    1    1      1   0       a
                                                                                                
     a                            d           e             
                                                                       1    0    1      1   0   b
                                                                                                 
                                                         A =           1    1    0      0   0   c
                                                                                                
                                                            
                                                                       1    1    0      0   1   d
                                                                                                 
                   b                                                    0    0    0      1   0       e

Graph Spectra
Given a graph G with nodes N and edges (i , j ) ∈ E with i , j ∈ N, the
adjacency matrix, A, of the graph is defined by
                                               1      if (i , j ) ∈ E ,
                                  ai ,j =
                               0 otherwise.
The eigenvalues, λ, of this matrix provide one way of defining the
graph spectrum.
     Considerate Approaches to ABC Model Selection   Stumpf et al.   Network Evolution                   12 of 15
Spectral Distances
A simple distance measure between graphs having adjacency
matrices A and B, known as the edit distance, is to count the number
of edges that are not shared by both graphs,

                               D (A, B ) =                  (ai ,j − bi ,j )2 .
                                                     i ,j




     Considerate Approaches to ABC Model Selection     Stumpf et al.   Network Evolution   13 of 15
Spectral Distances
A simple distance measure between graphs having adjacency
matrices A and B, known as the edit distance, is to count the number
of edges that are not shared by both graphs,

                               D (A, B ) =                  (ai ,j − bi ,j )2 .
                                                     i ,j

However for unlabelled graphs we require some mapping h from
i ∈ NA to i ∈ NB that minimizes the distance

                 D (A, B )         Dh (A, B ) =                     (ai ,j − bh(i ),h(j ) )2 ,
                                                             i ,j




     Considerate Approaches to ABC Model Selection     Stumpf et al.      Network Evolution      13 of 15
Spectral Distances
A simple distance measure between graphs having adjacency
matrices A and B, known as the edit distance, is to count the number
of edges that are not shared by both graphs,

                               D (A, B ) =                  (ai ,j − bi ,j )2 .
                                                     i ,j

However for unlabelled graphs we require some mapping h from
i ∈ NA to i ∈ NB that minimizes the distance

                 D (A, B )         Dh (A, B ) =                     (ai ,j − bh(i ),h(j ) )2 ,
                                                             i ,j

Given a spectrum (which is relatively cheap to compute) we have

                                                               (α)          (β) 2
                           D (A, B ) =                      λl        − λl
                                                 l


     Considerate Approaches to ABC Model Selection     Stumpf et al.      Network Evolution      13 of 15
Protein Interaction Network Data
   Species                 Proteins       Interactions        Genome size           Sampling fraction
   S.cerevisiae                5035             22118                    6532             0.77
   D. melanogaster             7506             22871                  14076              0.53
   H. pylori                     715                1423                 1589             0.45
   E. coli                     1888                 7008                 5416             0.35




    Considerate Approaches to ABC Model Selection    Stumpf et al.   Network Evolution                  14 of 15
Protein Interaction Network Data
                               Species                         Proteins        Interactions           Genome size           Sampling fraction
                               S.cerevisiae                       5035                     22118                 6532             0.77
                               D. melanogaster                    7506                     22871               14076              0.53
                               H. pylori                           715                      1423                 1589             0.45
                               E. coli                            1888                      7008                 5416             0.35



                    0.5




                    0.4
Model probability




                                                                       Organism
                    0.3                                                   S.cerevisae
                                                                          D.melanogaster
                                                                          H.pylori
                                                                          E.coli

                    0.2




                    0.1




                    0.0


                          DA      DAC    LPA       SF   DACL    DACR
                                           Model

                               Considerate Approaches to ABC Model Selection                 Stumpf et al.   Network Evolution                  14 of 15
Protein Interaction Network Data
                               Species                         Proteins        Interactions           Genome size           Sampling fraction
                               S.cerevisiae                       5035                     22118                 6532             0.77
                               D. melanogaster                    7506                     22871               14076              0.53
                               H. pylori                           715                      1423                 1589             0.45
                               E. coli                            1888                      7008                 5416             0.35



                    0.5
                                                                                                Model Selection
                                                                                                 • Inference here was based on all
                    0.4

                                                                                                   the data, not summary
Model probability




                    0.3
                                                                       Organism
                                                                          S.cerevisae              statistics.
                                                                          D.melanogaster
                                                                          H.pylori
                                                                          E.coli                 • Duplication models receive the
                    0.2
                                                                                                   strongest support from the data.
                    0.1                                                                          • Several models receive support
                                                                                                   and no model is chosen
                    0.0
                                                                                                   unambiguously.
                          DA      DAC    LPA       SF   DACL    DACR
                                           Model

                               Considerate Approaches to ABC Model Selection                 Stumpf et al.   Network Evolution                  14 of 15
Protein Interaction Network Data
                           δ                     α


          15




                                     8
                                     6
          10
   DA




                                     4
          5




                                     2
          0




                                     0
               0.0   0.4       0.8       0.0   0.4   0.8
                           δ                     α
          15




                                     8
                                     6
          10
   DAC




                                     4
          5




                                     2




                                                                                                                        S.cerevisiae
          0




                                     0




               0.0   0.4       0.8       0.0   0.4   0.8                                                                D. melanogaster
                           δ                     α                      p                              m
                                                                                                                        H. pylori




                                                                                     1.0
          10




                                                           10
                                     4




                                                                                     0.8
          8




                                                           8
                                                                                                                        E. coli
                                     3




                                                                                     0.6
          6




                                                           6
   DACL




                                     2




                                                                                     0.4
          4




                                                           4
                                     1




                                                                                     0.2
          2




                                                           2




                                                                                     0.0
          0




                                     0




                                                           0




               0.0   0.4       0.8       0.0   0.4   0.8        0.0   0.4   0.8            0   2   4       6   8   10
                           δ                     α                      p                              m


                                                                                     1.0
                                     4




                                                           5
          8




                                                                                     0.8
                                                           4
                                     3
          6




                                                                                     0.6
                                                           3
   DACR




                                     2
          4




                                                                                     0.4
                                                           2
                                     1
          2




                                                                                     0.2
                                                           1




                                                                                     0.0
          0




                                     0




                                                           0




               0.0   0.4       0.8       0.0   0.4   0.8        0.0   0.4   0.8            0   2   4       6   8   10




          Considerate Approaches to ABC Model Selection                     Stumpf et al.          Network Evolution                  14 of 15
Considerate Use of ABC

• ABC is a tool for situations where conventional statistical
  approaches fail or are too cumbersome.
• If all the data are used then this is (relatively) unproblematic; if the
  data are compressed/corrupted then caution is required.
• Some of the issues arising in ABC mirror those also encountered
  in “conventional” statistics:
          Any Bayesian inference uses the data only via the minimal
          sufficient statistic. This is because the calculation of the
          posterior distribution involves multiplying the likelihood by the
          prior and normalizing. Any factor of the likelihood that is a
          function of y alone will disappear after normalization.

  D. Cox (2006).
• In other cases it seems prudent to accept the additional (and
  considerable) computational cost of constructing suitable summary
  statistics (such as in Barnes et al., Stat&Comp 2012).
     Considerate Approaches to ABC Model Selection   Stumpf et al.   Conclusion   15 of 15
Acknowledgements




   Considerate Approaches to ABC Model Selection   Stumpf et al.   Conclusion   15 of 15

More Related Content

What's hot

ABC short course: survey chapter
ABC short course: survey chapterABC short course: survey chapter
ABC short course: survey chapterChristian Robert
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABCChristian Robert
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Christian Robert
 
Yes III: Computational methods for model choice
Yes III: Computational methods for model choiceYes III: Computational methods for model choice
Yes III: Computational methods for model choiceChristian Robert
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Umberto Picchini
 
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Christian Robert
 
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...Christian Robert
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015Christian Robert
 
Principle of Maximum Entropy
Principle of Maximum EntropyPrinciple of Maximum Entropy
Principle of Maximum EntropyJiawang Liu
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceChristian Robert
 
A Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemA Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemErika G. G.
 

What's hot (20)

ABC short course: survey chapter
ABC short course: survey chapterABC short course: survey chapter
ABC short course: survey chapter
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABC
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
ABC workshop: 17w5025
ABC workshop: 17w5025ABC workshop: 17w5025
ABC workshop: 17w5025
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Yes III: Computational methods for model choice
Yes III: Computational methods for model choiceYes III: Computational methods for model choice
Yes III: Computational methods for model choice
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
 
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Intractable likelihoods
Intractable likelihoodsIntractable likelihoods
Intractable likelihoods
 
NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015
 
Principle of Maximum Entropy
Principle of Maximum EntropyPrinciple of Maximum Entropy
Principle of Maximum Entropy
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergence
 
A Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemA Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation Problem
 

Viewers also liked

Developmental homeostasis
Developmental homeostasisDevelopmental homeostasis
Developmental homeostasisGowthami R
 
Intermediate Level - DNA and Jewish Genealogy
Intermediate Level - DNA and Jewish GenealogyIntermediate Level - DNA and Jewish Genealogy
Intermediate Level - DNA and Jewish GenealogyKeith Rothschild
 
Bioinformatics & biostatistics tools for monogenic and multifactorial disease...
Bioinformatics & biostatistics tools for monogenic and multifactorial disease...Bioinformatics & biostatistics tools for monogenic and multifactorial disease...
Bioinformatics & biostatistics tools for monogenic and multifactorial disease...Pasteur_Tunis
 
Runs of Homozygosity presentation
Runs of Homozygosity presentationRuns of Homozygosity presentation
Runs of Homozygosity presentationHadeel Abu Jamous
 
The complete genome sequence of a neanderthal article presentation
The complete genome sequence of a neanderthal article presentationThe complete genome sequence of a neanderthal article presentation
The complete genome sequence of a neanderthal article presentationSveta Jagannathan
 

Viewers also liked (7)

Developmental homeostasis
Developmental homeostasisDevelopmental homeostasis
Developmental homeostasis
 
Intermediate Level - DNA and Jewish Genealogy
Intermediate Level - DNA and Jewish GenealogyIntermediate Level - DNA and Jewish Genealogy
Intermediate Level - DNA and Jewish Genealogy
 
Bioinformatics & biostatistics tools for monogenic and multifactorial disease...
Bioinformatics & biostatistics tools for monogenic and multifactorial disease...Bioinformatics & biostatistics tools for monogenic and multifactorial disease...
Bioinformatics & biostatistics tools for monogenic and multifactorial disease...
 
Runs of Homozygosity presentation
Runs of Homozygosity presentationRuns of Homozygosity presentation
Runs of Homozygosity presentation
 
The complete genome sequence of a neanderthal article presentation
The complete genome sequence of a neanderthal article presentationThe complete genome sequence of a neanderthal article presentation
The complete genome sequence of a neanderthal article presentation
 
Non-Mendellian genetics
Non-Mendellian geneticsNon-Mendellian genetics
Non-Mendellian genetics
 
Homeostasis
HomeostasisHomeostasis
Homeostasis
 

Similar to Considerate Approaches to ABC Model Selection

Colloquium in honor of Hans Ruedi Künsch
Colloquium in honor of Hans Ruedi KünschColloquium in honor of Hans Ruedi Künsch
Colloquium in honor of Hans Ruedi KünschChristian Robert
 
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMINGChristian Robert
 
ABC in London, May 5, 2011
ABC in London, May 5, 2011ABC in London, May 5, 2011
ABC in London, May 5, 2011Christian Robert
 
ABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsUmberto Picchini
 
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Umberto Picchini
 
ABC short course: model choice chapter
ABC short course: model choice chapterABC short course: model choice chapter
ABC short course: model choice chapterChristian Robert
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Valentin De Bortoli
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep LearningRayKim51
 
Discussion cabras-robert-130323171455-phpapp02
Discussion cabras-robert-130323171455-phpapp02Discussion cabras-robert-130323171455-phpapp02
Discussion cabras-robert-130323171455-phpapp02Deb Roy
 
Approximating Bayes Factors
Approximating Bayes FactorsApproximating Bayes Factors
Approximating Bayes FactorsChristian Robert
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinChristian Robert
 
Approximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts modelApproximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts modelMatt Moores
 
ABC and empirical likelihood
ABC and empirical likelihoodABC and empirical likelihood
ABC and empirical likelihoodChristian Robert
 

Similar to Considerate Approaches to ABC Model Selection (20)

Colloquium in honor of Hans Ruedi Künsch
Colloquium in honor of Hans Ruedi KünschColloquium in honor of Hans Ruedi Künsch
Colloquium in honor of Hans Ruedi Künsch
 
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
 
ABC in London, May 5, 2011
ABC in London, May 5, 2011ABC in London, May 5, 2011
ABC in London, May 5, 2011
 
ABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space modelsABC with data cloning for MLE in state space models
ABC with data cloning for MLE in state space models
 
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
 
ABC short course: model choice chapter
ABC short course: model choice chapterABC short course: model choice chapter
ABC short course: model choice chapter
 
MUMS Opening Workshop - Quantifying Nonparametric Modeling Uncertainty with B...
MUMS Opening Workshop - Quantifying Nonparametric Modeling Uncertainty with B...MUMS Opening Workshop - Quantifying Nonparametric Modeling Uncertainty with B...
MUMS Opening Workshop - Quantifying Nonparametric Modeling Uncertainty with B...
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep Learning
 
ABC model choice
ABC model choiceABC model choice
ABC model choice
 
Discussion cabras-robert-130323171455-phpapp02
Discussion cabras-robert-130323171455-phpapp02Discussion cabras-robert-130323171455-phpapp02
Discussion cabras-robert-130323171455-phpapp02
 
Approximating Bayes Factors
Approximating Bayes FactorsApproximating Bayes Factors
Approximating Bayes Factors
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
Approximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts modelApproximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts model
 
Lecture_9.pdf
Lecture_9.pdfLecture_9.pdf
Lecture_9.pdf
 
BIRS 12w5105 meeting
BIRS 12w5105 meetingBIRS 12w5105 meeting
BIRS 12w5105 meeting
 
Intro to ABC
Intro to ABCIntro to ABC
Intro to ABC
 
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
 
ABC and empirical likelihood
ABC and empirical likelihoodABC and empirical likelihood
ABC and empirical likelihood
 

More from Michael Stumpf

Gaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory NetworksGaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory NetworksMichael Stumpf
 
Beyond Static Networks: A Bayesian Non-Parametric Approach
Beyond Static Networks: A Bayesian Non-Parametric ApproachBeyond Static Networks: A Bayesian Non-Parametric Approach
Beyond Static Networks: A Bayesian Non-Parametric ApproachMichael Stumpf
 
Approximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUsApproximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUsMichael Stumpf
 
Multi-Scale Models in Immunobiology
Multi-Scale Models in ImmunobiologyMulti-Scale Models in Immunobiology
Multi-Scale Models in ImmunobiologyMichael Stumpf
 
Time-Variable Networks in Candida Glabrata
Time-Variable Networks in Candida GlabrataTime-Variable Networks in Candida Glabrata
Time-Variable Networks in Candida GlabrataMichael Stumpf
 
Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...Michael Stumpf
 
Noisy information transmission through molecular interaction networks
Noisy information transmission through molecular interaction networksNoisy information transmission through molecular interaction networks
Noisy information transmission through molecular interaction networksMichael Stumpf
 

More from Michael Stumpf (8)

Gaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory NetworksGaining Confidence in Signalling and Regulatory Networks
Gaining Confidence in Signalling and Regulatory Networks
 
Beyond Static Networks: A Bayesian Non-Parametric Approach
Beyond Static Networks: A Bayesian Non-Parametric ApproachBeyond Static Networks: A Bayesian Non-Parametric Approach
Beyond Static Networks: A Bayesian Non-Parametric Approach
 
Are Powerlaws Useful?
Are Powerlaws Useful?Are Powerlaws Useful?
Are Powerlaws Useful?
 
Approximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUsApproximate Bayesian Computation on GPUs
Approximate Bayesian Computation on GPUs
 
Multi-Scale Models in Immunobiology
Multi-Scale Models in ImmunobiologyMulti-Scale Models in Immunobiology
Multi-Scale Models in Immunobiology
 
Time-Variable Networks in Candida Glabrata
Time-Variable Networks in Candida GlabrataTime-Variable Networks in Candida Glabrata
Time-Variable Networks in Candida Glabrata
 
Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...Statistical analysis of network data and evolution on GPUs: High-performance ...
Statistical analysis of network data and evolution on GPUs: High-performance ...
 
Noisy information transmission through molecular interaction networks
Noisy information transmission through molecular interaction networksNoisy information transmission through molecular interaction networks
Noisy information transmission through molecular interaction networks
 

Recently uploaded

Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Recently uploaded (20)

Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

Considerate Approaches to ABC Model Selection

  • 1. Considerate Approaches to ABC Model Selection Michael P.H. Stumpf, Christopher Barnes, Sarah Filippi, Thomas Thorne Theoretical Systems Biology Group 26/06/2012 Considerate Approaches to ABC Model Selection Stumpf et al. 1 of 15
  • 2. Evolving Networks (a) Duplication attachment (b) Duplication attachment with complimentarity wj (c) Linear preferential wi (d) General scale-free attachment Considerate Approaches to ABC Model Selection Stumpf et al. Model Selection 2 of 15
  • 3. Inference and Model Selection We have observed data, D, that was generated by some system that we seek to describe by a mathematical model. In principle we can have a model-set, M = {M1 , . . . , Mν }, where each model Mi has an associated parameter θi . We may know the different constituent parts of the system, Xi , and have measurements for some or all of them under some experimental designs, T . Considerate Approaches to ABC Model Selection Stumpf et al. Model Selection 3 of 15
  • 4. Inference and Model Selection We have observed data, D, that was generated by some system that we seek to describe by a mathematical model. In principle we can have a model-set, M = {M1 , . . . , Mν }, where each model Mi has an associated parameter θi . We may know the different constituent parts of the system, Xi , and have measurements for some or all of them under some experimental designs, T . Model Posterior Pr(Mi |T, D) Considerate Approaches to ABC Model Selection Stumpf et al. Model Selection 3 of 15
  • 5. Inference and Model Selection We have observed data, D, that was generated by some system that we seek to describe by a mathematical model. In principle we can have a model-set, M = {M1 , . . . , Mν }, where each model Mi has an associated parameter θi . We may know the different constituent parts of the system, Xi , and have measurements for some or all of them under some experimental designs, T . Likelihood Prior Model Posterior Pr(D|Mi , T)π(Mi ) Pr(Mi |T, D)= ν Pr(D|Mj , T)π(Mj ) j =1 Evidence Considerate Approaches to ABC Model Selection Stumpf et al. Model Selection 3 of 15
  • 6. Inference and Model Selection We have observed data, D, that was generated by some system that we seek to describe by a mathematical model. In principle we can have a model-set, M = {M1 , . . . , Mν }, where each model Mi has an associated parameter θi . We may know the different constituent parts of the system, Xi , and have measurements for some or all of them under some experimental designs, T . Likelihood Prior Model Posterior Pr(D|Mi , T)π(Mi ) For complicated models and/or Pr(Mi |T, D)= ν detailed data the likelihood Pr(D|Mj , T)π(Mj ) evaluation can become j =1 prohibitively expensive. Evidence Considerate Approaches to ABC Model Selection Stumpf et al. Model Selection 3 of 15
  • 7. Inference and Model Selection We have observed data, D, that was generated by some system that we seek to describe by a mathematical model. In principle we can have a model-set, M = {M1 , . . . , Mν }, where each model Mi has an associated parameter θi . We may know the different constituent parts of the system, Xi , and have measurements for some or all of them under some experimental designs, T . Likelihood Prior Model Posterior Pr(D|Mi , T)π(Mi ) For complicated models and/or Pr(Mi |T, D)= ν detailed data the likelihood Pr(D|Mj , T)π(Mj ) evaluation can become j =1 prohibitively expensive. Evidence Approximate Inference We can approximate the likelihood and/or the models. The “true” model is unlikely to be in M anyway. Considerate Approaches to ABC Model Selection Stumpf et al. Model Selection 3 of 15
  • 8. Approximate Bayesian Computation We can define the posterior as f (x |θi )π(θi ) p(θi |x ) = p (x ) Here fi (x |θ) is the likelihood which is often hard to evaluate; consider for example dy y = max[0, y +g1 +y ×g2] with g1 , g2 ∼ N(0,σ1/2 ) and ˜ = g (y ; θ). dt Considerate Approaches to ABC Model Selection Stumpf et al. Approximate Bayesian Computation 4 of 15
  • 9. Approximate Bayesian Computation We can define the posterior as f (x |θi )π(θi ) p(θi |x ) = p (x ) Here fi (x |θ) is the likelihood which is often hard to evaluate; consider for example dy y = max[0, y +g1 +y ×g2] with g1 , g2 ∼ N(0,σ1/2 ) and ˜ = g (y ; θ). dt But we can still simulate from the data-generating model, whence 1(y = x )f (y |θi )π(θi ) p(θi |x ) = dy X p (x ) 1 (∆(y , x ) < ) f (y |θi )π(θi ) ≈ dy X p (x ) Considerate Approaches to ABC Model Selection Stumpf et al. Approximate Bayesian Computation 4 of 15
  • 10. Approximate Bayesian Computation We can define the posterior as f (x |θi )π(θi ) p(θi |x ) = p (x ) Here fi (x |θ) is the likelihood which is often hard to evaluate; consider for example dy y = max[0, y +g1 +y ×g2] with g1 , g2 ∼ N(0,σ1/2 ) and ˜ = g (y ; θ). dt But we can still simulate from the data-generating model, whence 1(y = x )f (y |θi )π(θi ) p(θi |x ) = dy X p (x ) 1 (∆(y , x ) < ) f (y |θi )π(θi ) ≈ dy X p (x ) Solutions for Complex Problems (?) Approximate (i) data, (ii) model or (iii) distance. Considerate Approaches to ABC Model Selection Stumpf et al. Approximate Bayesian Computation 4 of 15
  • 11. ABC with Summary Statistics If the data, D, are very complex and detailed, direct comparison between real and simulated data becomes prohibitive. In such situations, which originally motivated ABC approaches, summary statistics of the data are compared. We then have pS , (θi |D) ∝ 1 (∆ (S (x )), S (yθ )) < ) f (y |θ)π(θi )dy X Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 5 of 15
  • 12. ABC with Summary Statistics If the data, D, are very complex and detailed, direct comparison between real and simulated data becomes prohibitive. In such situations, which originally motivated ABC approaches, summary statistics of the data are compared. We then have pS , (θi |D) ∝ 1 (∆ (S (x )), S (yθ )) < ) f (y |θ)π(θi )dy X Sufficient Statistics This only works is the statistic S (.) is sufficient, i.e. if for s = S (x ) we have p(x |s, θ) = p(x |s) Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 5 of 15
  • 13. ABC with Summary Statistics If the data, D, are very complex and detailed, direct comparison between real and simulated data becomes prohibitive. In such situations, which originally motivated ABC approaches, summary statistics of the data are compared. We then have pS , (θi |D) ∝ 1 (∆ (S (x )), S (yθ )) < ) f (y |θ)π(θi )dy X Sufficient Statistics This only works is the statistic S (.) is sufficient, i.e. if for s = S (x ) we have p(x |s, θ) = p(x |s) Sufficency for Model Selection If S (.) is sufficient for parameter estimation (in all models i considered) it is not necessarily sufficient for model selection (Robert et al., PNAS (2011)). Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 5 of 15
  • 14. ABC with Summary Statistics Generate data X ∼ N(1, 1) and use ABC to infer µ (assuming that σ2 = 1 is known). mean var 30 600 25 Role of Summary Statistics 20 Mean (sufficient) correctly 400 15 10 infers µ. 200 5 Max/Min capture some 0 −4 −2 0 2 4 0 −4 −2 0 2 4 information on µ. min max 250 300 Var fails to capture any 200 250 information on µ. 200 150 150 100 100 50 50 0 0 −4 −2 0 2 4 −4 −2 0 2 4 θ Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 5 of 15
  • 15. ABC with Summary Statistics Generate data X ∼ N(1, 1) and use ABC to infer µ (assuming that σ2 = 1 is known). mean var 30 600 25 Role of Summary Statistics 20 Mean (sufficient) correctly 400 15 10 infers µ. 200 5 Max/Min capture some 0 −4 −2 0 2 4 0 −4 −2 0 2 4 information on µ. min max 250 300 Var fails to capture any 200 250 information on µ. 200 150 150 100 We need a way of constructing 100 50 50 sets of statistics that together are 0 0 (approximately) sufficient. −4 −2 0 2 4 −4 −2 0 2 4 θ Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 5 of 15
  • 16. A Closer Look at Summary Statistics We interpret a summary statistic as a function, S : Rd −→ Rw , S(x ) = s. If S is sufficient then (we include the model indicator variable in θ) p(θ|x ) = p(θ|s) Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 6 of 15
  • 17. A Closer Look at Summary Statistics We interpret a summary statistic as a function, S : Rd −→ Rw , S(x ) = s. If S is sufficient then (we include the model indicator variable in θ) p(θ|x ) = p(θ|s) Information Theoretical Perspective A summary statistic is an information compression device. Now let S be a set of statistics which together are sufficient. Then the mutual information p(θ, x ) I (Θ; X ) = p(θ, x ) log d θdx = I (θ, S) Ω X p(θ)p(x ) Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 6 of 15
  • 18. A Closer Look at Summary Statistics We interpret a summary statistic as a function, S : Rd −→ Rw , S(x ) = s. If S is sufficient then (we include the model indicator variable in θ) p(θ|x ) = p(θ|s) Information Theoretical Perspective A summary statistic is an information compression device. Now let S be a set of statistics which together are sufficient. Then the mutual information p(θ, x ) I (Θ; X ) = p(θ, x ) log d θdx = I (θ, S) Ω X p(θ)p(x ) Constructing Minimally Sufficient Summary Statistics We seek the set U ⊆ S with minimal cardinality such that I (Θ; S) = I (Θ; U). Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 6 of 15
  • 19. Constructing Sufficient Statistics Proposition Let X be a random variable generated according to f (·|θ). Let S be a summary statistic and U and T two subsets of S such that U = U(X ), T = T(X ) and S = S(X ) satisfy U ⊂ T ⊂ S. We have I (Θ; S |T ) = I (Θ; S |U ) − I (Θ; T |U ) . In order to construct a subset T of S such that I (Θ; S |T ) = 0, it is thus sufficient to add statistics from S one by one until the condition holds. If we denote by S(k ) the kth statistic to be added (with k w) we have S(k ) = S(k ) (X ), and then I (Θ; S |S(1) , . . . , S(k +1) ) I (Θ; S |S(1) , . . . , S(k ) ) . Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 7 of 15
  • 20. Constructing Sufficient Statistics p(θ, S(x )|U(x )) I (Θ; S |U ) = p(θ, S(x ), U(x )) log dxd θ Ω X p(θ|U(x ))p(S(x )|U(x )) = p(S(x )) [KL(p(Θ|S(x ))||p(Θ|U(x )))] dx X = Ep(X ) [KL(p(Θ|S(X ))||p(Θ|U(X )))] An Impossible Algorithm • for all subsets u ∗ ⊆ s ∗ , perform ABC to obtain estimates p (Θ|u ∗ ) • determine the set A = {u ∗ ⊂ s∗ such that KL (p (Θ|s∗ )||p (Θ|u ∗ )) = 0}, • the desired subset is argminu ∗ ∈A |u ∗ | Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 7 of 15
  • 21. Constructing Sufficient Statistics input: a sufficient set of statistics whose values on the dataset is s∗ = {s1 , . . . , sw }, a threshold δ ∗ ∗ output: a subset v ∗ of s∗ choose randomly u ∗ in s∗ v ∗ ← u∗ q ∗ ← s ∗ v ∗ repeat repeat if q ∗ = Ø then return v ∗ end if choose randomly u ∗ in q ∗ q ∗ ← q ∗ u ∗ perform ABC to obtain p (Θ|v ∗ , u ∗ ) until KL (p (Θ|v ∗ , u ∗ )||p (Θ|v ∗ )) δ optionally: v ∗ ← OrderDependency (v ∗ , u ∗ ) v ∗ ← v ∗ ∪ u∗ q ∗ ← s ∗ v ∗ until q ∗ = Ø return v ∗ Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 7 of 15
  • 22. Examples: Normal Distributions y1 , ...yd ∼ N(µ, σ2 ) and y1 , ...yd ∼ N(µ, σ2 ) 1 2 100 100 80 80 60 60 Run Run 40 40 20 20 mean S2 range max random mean S2 range max random Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 8 of 15
  • 23. Examples: Normal Distributions y1 , ...yd ∼ N(µ, σ2 ) and y1 , ...yd ∼ N(µ, σ2 ) 1 2 6 q q q 8 q qqq qqq q q q qq q q qq q qq q q q q q q q 6 4 q q q q qq q qq q q q q q qq q q q q q q q q qq q q q q q q q q qq q qq q q q qq log(BF) ABC log(BF) ABC q q q 4 q q q q q 2 q qq q q qq q q q q q q qq q q q q qq q q qq q q q q q q q qqqqqq qq q 2 q q q q qq qq qqq q qq q q q q q q q q qq q q q qq q q q q q q qq q q qq q q q q qq q q 0 qq q q qq q q q q 0 q q q q q q −2 −2 q q −2 0 2 4 6 8 −2 0 2 4 6 8 log(BF) predicted log(BF) predicted Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 8 of 15
  • 24. Examples: Population Genetics Constant Population Size 100 80 60 Run 40 20 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 [S1] Number of Segregating Sites; [S2] Number of Distinct Haplotypes,; [S3] Haplotype Homozygosity; [S4] Average SNP Homozygosity; [S5] Number of occurrences of most common haplotype; [S6] Mean number of pair-wise differences between haplotypes; [S7] Number of Singleton Haplotypes; [S8] Number of Singleton SNPs; [S9] Linkage Disequilibrium. Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 9 of 15
  • 25. Examples: Population Genetics Constant Population Exponential Two-Island Model Size Population Growth with Migration 100 100 100 80 80 80 60 60 60 Run Run Run 40 40 40 20 20 20 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 [S1] Number of Segregating Sites; [S2] Number of Distinct Haplotypes,; [S3] Haplotype Homozygosity; [S4] Average SNP Homozygosity; [S5] Number of occurrences of most common haplotype; [S6] Mean number of pair-wise differences between haplotypes; [S7] Number of Singleton Haplotypes; [S8] Number of Singleton SNPs; [S9] Linkage Disequilibrium. Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 9 of 15
  • 26. Examples: Population Genetics Constant Population Exponential Two-Island Model Size Population Growth with Migration 100 100 100 80 80 80 60 60 60 Run Run Run 40 40 40 20 20 20 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 [S1] Number of Segregating Sites; [S2] Number of Distinct Haplotypes,; [S3] Haplotype Homozygosity; [S4] Average SNP Homozygosity; [S5] Number of occurrences of most common haplotype; [S6] Mean number of pair-wise differences between haplotypes; [S7] Number of Singleton Haplotypes; [S8] Number of Singleton SNPs; [S9] Linkage Disequilibrium. Summary Statistic Choice The choice of summary statistics appears to depend subtely on the true data-generating model. In light of coalescent processes this is, however, to be expected. Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 9 of 15
  • 27. Examples: Random Walks Classical Random Persistent Random Biased Random Walk Walk Walk 100 100 100 80 80 80 60 60 60 Run Run Run 40 40 40 20 20 20 S1 S2 S3 S4 S5 S1 S2 S3 S4 S5 S1 S2 S3 S4 S5 [S1] Mean square displacement; [S2] Mean x and y displacement; [S3] Mean square x and y displacement; [S4] Straightness index; [S5] Eigenvalues of gyration tensor. Parameter Sufficiency for Complex Problems Here all statistics that have been chosen for parameter estimation are also chosen for model selection. Considerate Approaches to ABC Model Selection Stumpf et al. ABC Summary Statistics 9 of 15
  • 28. Conditioning on Information Θ s1 s2 s3 Considerate Approaches to ABC Model Selection Stumpf et al. Interpreting ABC 10 of 15
  • 29. Conditioning on Information Θ s1 s2 x Statistics Sufficient: Implicates same area as full data. Ancillary: Implicates all values of θ equally. Considerate Approaches to ABC Model Selection Stumpf et al. Interpreting ABC 10 of 15
  • 30. Conditioning on Information What is the meaning of Θ p(θ|s0 , s1 , . . . , sn )? Let s = (s0 , s1 , . . . , sn ), and assume I (θ, s) < I (θ, x ) but → 0. This can happen for sufficient and ancillary s. In the latter s1 s2 x case we obtain p(θ|s) = π(θ). Statistics Sufficient: Implicates same area as full data. Ancillary: Implicates all values of θ equally. Considerate Approaches to ABC Model Selection Stumpf et al. Interpreting ABC 10 of 15
  • 31. Conditioning on Information What is the meaning of Θ p(θ|s0 , s1 , . . . , sn )? Let s = (s0 , s1 , . . . , sn ), and assume I (θ, s) < I (θ, x ) but → 0. This can happen for sufficient and ancillary s. In the latter s1 s2 x case we obtain p(θ|s) = π(θ). Statistics Sufficient: Implicates same area as How about full data. p(t |s) Ancillary: Implicates all values of θ equally. if s is not (quite) sufficient? Considerate Approaches to ABC Model Selection Stumpf et al. Interpreting ABC 10 of 15
  • 32. Model Selection vs. Model Checking Model Selection: Several models M ∈ M are compared and one or more are chosen in light of the data: Find models which are better than others. Model Checking: The quality of a model Mi is assessed against the available data: Determine if a model is actually ‘good’. Alternative Approach: ABCµ [Ratmann et al., PNAS]. Considerate Approaches to ABC Model Selection Stumpf et al. Interpreting ABC 11 of 15
  • 33. Model Selection vs. Model Checking Model Selection: Several models M ∈ M are compared and one or more are chosen in light of the data: Find models which are better than others. Model Checking: The quality of a model Mi is assessed against the available data: Determine if a model is actually ‘good’. Alternative Approach: ABCµ [Ratmann et al., PNAS]. Posterior Predictive Checks We are interested in the posterior predictive distribution, p(t (X )|s(X )) = p(t (X )|θ)p(θ|s(X ))d θ. Θ In particular we have p(s(X )|s(X )) = p(s(X )|X ) unless t (X ) is sufficient. Considerate Approaches to ABC Model Selection Stumpf et al. Interpreting ABC 11 of 15
  • 34. ABC on Network Data (e) Duplication attachment (f) Duplication attachment with complimentarity wj (g) Linear preferential wi (h) General scale-free attachment Considerate Approaches to ABC Model Selection Stumpf et al. Network Evolution 12 of 15
  • 35. ABC on Network Data Summarizing Networks • Data are noisy and incomplete. • We can simulate models of network evolution, but this does not allow us to calculate likelihoods for all but very trivial models. • There is also no sufficient statistic that would allow us to summarize networks, so ABC approaches require some thought. • Many possible summary statistics of networks are expensive to calculate. Full likelihood: Wiuf et al., PNAS (2006). ABC: Ratman et al., PLoS Comp.Biol. (2008). ABC (better): Thorne & Stumpf, J.Roy.Soc. Interface (2012). Stumpf & Wiuf, J. Roy. Soc. Interface (2010). Considerate Approaches to ABC Model Selection Stumpf et al. Network Evolution 12 of 15
  • 36. Spectral Distances c a b c d e   0 1 1 1 0 a   a d e   1 0 1 1 0 b  A = 1 1 0 0 0 c     1 1 0 0 1 d  b 0 0 0 1 0 e Graph Spectra Given a graph G with nodes N and edges (i , j ) ∈ E with i , j ∈ N, the adjacency matrix, A, of the graph is defined by 1 if (i , j ) ∈ E , ai ,j = 0 otherwise. The eigenvalues, λ, of this matrix provide one way of defining the graph spectrum. Considerate Approaches to ABC Model Selection Stumpf et al. Network Evolution 12 of 15
  • 37. Spectral Distances A simple distance measure between graphs having adjacency matrices A and B, known as the edit distance, is to count the number of edges that are not shared by both graphs, D (A, B ) = (ai ,j − bi ,j )2 . i ,j Considerate Approaches to ABC Model Selection Stumpf et al. Network Evolution 13 of 15
  • 38. Spectral Distances A simple distance measure between graphs having adjacency matrices A and B, known as the edit distance, is to count the number of edges that are not shared by both graphs, D (A, B ) = (ai ,j − bi ,j )2 . i ,j However for unlabelled graphs we require some mapping h from i ∈ NA to i ∈ NB that minimizes the distance D (A, B ) Dh (A, B ) = (ai ,j − bh(i ),h(j ) )2 , i ,j Considerate Approaches to ABC Model Selection Stumpf et al. Network Evolution 13 of 15
  • 39. Spectral Distances A simple distance measure between graphs having adjacency matrices A and B, known as the edit distance, is to count the number of edges that are not shared by both graphs, D (A, B ) = (ai ,j − bi ,j )2 . i ,j However for unlabelled graphs we require some mapping h from i ∈ NA to i ∈ NB that minimizes the distance D (A, B ) Dh (A, B ) = (ai ,j − bh(i ),h(j ) )2 , i ,j Given a spectrum (which is relatively cheap to compute) we have (α) (β) 2 D (A, B ) = λl − λl l Considerate Approaches to ABC Model Selection Stumpf et al. Network Evolution 13 of 15
  • 40. Protein Interaction Network Data Species Proteins Interactions Genome size Sampling fraction S.cerevisiae 5035 22118 6532 0.77 D. melanogaster 7506 22871 14076 0.53 H. pylori 715 1423 1589 0.45 E. coli 1888 7008 5416 0.35 Considerate Approaches to ABC Model Selection Stumpf et al. Network Evolution 14 of 15
  • 41. Protein Interaction Network Data Species Proteins Interactions Genome size Sampling fraction S.cerevisiae 5035 22118 6532 0.77 D. melanogaster 7506 22871 14076 0.53 H. pylori 715 1423 1589 0.45 E. coli 1888 7008 5416 0.35 0.5 0.4 Model probability Organism 0.3 S.cerevisae D.melanogaster H.pylori E.coli 0.2 0.1 0.0 DA DAC LPA SF DACL DACR Model Considerate Approaches to ABC Model Selection Stumpf et al. Network Evolution 14 of 15
  • 42. Protein Interaction Network Data Species Proteins Interactions Genome size Sampling fraction S.cerevisiae 5035 22118 6532 0.77 D. melanogaster 7506 22871 14076 0.53 H. pylori 715 1423 1589 0.45 E. coli 1888 7008 5416 0.35 0.5 Model Selection • Inference here was based on all 0.4 the data, not summary Model probability 0.3 Organism S.cerevisae statistics. D.melanogaster H.pylori E.coli • Duplication models receive the 0.2 strongest support from the data. 0.1 • Several models receive support and no model is chosen 0.0 unambiguously. DA DAC LPA SF DACL DACR Model Considerate Approaches to ABC Model Selection Stumpf et al. Network Evolution 14 of 15
  • 43. Protein Interaction Network Data δ α 15 8 6 10 DA 4 5 2 0 0 0.0 0.4 0.8 0.0 0.4 0.8 δ α 15 8 6 10 DAC 4 5 2 S.cerevisiae 0 0 0.0 0.4 0.8 0.0 0.4 0.8 D. melanogaster δ α p m H. pylori 1.0 10 10 4 0.8 8 8 E. coli 3 0.6 6 6 DACL 2 0.4 4 4 1 0.2 2 2 0.0 0 0 0 0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8 0 2 4 6 8 10 δ α p m 1.0 4 5 8 0.8 4 3 6 0.6 3 DACR 2 4 0.4 2 1 2 0.2 1 0.0 0 0 0 0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8 0 2 4 6 8 10 Considerate Approaches to ABC Model Selection Stumpf et al. Network Evolution 14 of 15
  • 44. Considerate Use of ABC • ABC is a tool for situations where conventional statistical approaches fail or are too cumbersome. • If all the data are used then this is (relatively) unproblematic; if the data are compressed/corrupted then caution is required. • Some of the issues arising in ABC mirror those also encountered in “conventional” statistics: Any Bayesian inference uses the data only via the minimal sufficient statistic. This is because the calculation of the posterior distribution involves multiplying the likelihood by the prior and normalizing. Any factor of the likelihood that is a function of y alone will disappear after normalization. D. Cox (2006). • In other cases it seems prudent to accept the additional (and considerable) computational cost of constructing suitable summary statistics (such as in Barnes et al., Stat&Comp 2012). Considerate Approaches to ABC Model Selection Stumpf et al. Conclusion 15 of 15
  • 45. Acknowledgements Considerate Approaches to ABC Model Selection Stumpf et al. Conclusion 15 of 15