Likelihood-free Methods




                           Likelihood-free Methods

                                  Christian P. Robert

                             Université Paris-Dauphine & CREST
                          http://www.ceremade.dauphine.fr/~xian


                                  February 16, 2012
Likelihood-free Methods




Outline
Introduction

The Metropolis-Hastings Algorithm

simulation-based methods in Econometrics

Genetics of ABC

Approximate Bayesian computation

ABC for model choice

ABC model choice consistency
Likelihood-free Methods
   Introduction




Approximate Bayesian computation

       Introduction
           Monte Carlo basics
           Population Monte Carlo
           AMIS

       The Metropolis-Hastings Algorithm

       simulation-based methods in Econometrics

       Genetics of ABC

       Approximate Bayesian computation
Likelihood-free Methods
   Introduction
     Monte Carlo basics


General purpose



       Given a density π known up to a normalizing constant, and an
       integrable function h, compute

              Π(h) = ∫ h(x) π(x) µ(dx) = ∫ h(x) π̃(x) µ(dx) / ∫ π̃(x) µ(dx)

       when ∫ h(x) π̃(x) µ(dx) is intractable.
Likelihood-free Methods
   Introduction
     Monte Carlo basics


Monte Carlo basics

       Generate an iid sample x_1, . . . , x_N from π and estimate Π(h) by

              Π̂_N^MC(h) = N^{−1} Σ_{i=1}^N h(x_i).

       LLN: Π̂_N^MC(h) −→ Π(h) a.s.
       If Π(h²) = ∫ h²(x) π(x) µ(dx) < ∞,

       CLT: √N (Π̂_N^MC(h) − Π(h)) → N(0, Π{[h − Π(h)]²}) in distribution.


       Caveat
       Often impossible or inefficient to simulate directly from Π
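
       A minimal R sketch of this estimator (the N(0,1) target and h(x) = x² below are
       illustrative choices, not part of the talk):

       set.seed(1)
       N <- 1e4
       x <- rnorm(N)                     # iid sample from pi = N(0,1)
       h <- function(x) x^2
       mc <- mean(h(x))                  # Monte Carlo estimate of Pi(h) = 1
       se <- sd(h(x))/sqrt(N)            # CLT-based standard error
       c(mc, se)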
Likelihood-free Methods
   Introduction
     Monte Carlo basics


Importance Sampling


       For Q a proposal distribution such that Q(dx) = q(x) µ(dx),
       alternative representation

              Π(h) = ∫ h(x) {π/q}(x) q(x) µ(dx).


       Principle
       Generate an iid sample x_1, . . . , x_N ∼ Q and estimate Π(h) by

              Π̂_{Q,N}^IS(h) = N^{−1} Σ_{i=1}^N h(x_i) {π/q}(x_i).
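
       A minimal R sketch of this estimator, assuming for illustration a N(0,1) target π,
       a Cauchy proposal Q and h(x) = x²:

       set.seed(1)
       N <- 1e4
       x <- rcauchy(N)                   # iid sample from the proposal Q
       w <- dnorm(x)/dcauchy(x)          # importance ratios pi/q
       est <- mean(x^2 * w)              # IS estimate of Pi(h) for h(x) = x^2
       est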
Likelihood-free Methods
   Introduction
     Monte Carlo basics


Importance Sampling (convergence)


       Then
       LLN: Π̂_{Q,N}^IS(h) −→ Π(h) a.s., and if Q((hπ/q)²) < ∞,

       CLT: √N (Π̂_{Q,N}^IS(h) − Π(h)) → N(0, Q{(hπ/q − Π(h))²}) in distribution.


       Caveat
       If the normalizing constant is unknown, impossible to use Π̂_{Q,N}^IS

       Generic problem in Bayesian Statistics: π(θ|x) ∝ f (x|θ) π(θ).
Likelihood-free Methods
   Introduction
     Monte Carlo basics


Self-Normalised Importance Sampling


       Self normalized version

              Π̂_{Q,N}^SNIS(h) = { Σ_{i=1}^N {π/q}(x_i) }^{−1} Σ_{i=1}^N h(x_i) {π/q}(x_i).

       LLN: Π̂_{Q,N}^SNIS(h) −→ Π(h) a.s.
       and if Π((1 + h²)(π/q)) < ∞,

       CLT: √N (Π̂_{Q,N}^SNIS(h) − Π(h)) → N(0, π{(π/q)(h − Π(h))²}) in distribution.

       © The quality of the SNIS approximation depends on the
       choice of Q
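
       A minimal R sketch of the self-normalised version, assuming the target is only known
       up to a constant (here an unnormalised N(0,1), with a Cauchy proposal and h(x) = x²):

       set.seed(1)
       N <- 1e4
       ptilde <- function(x) exp(-x^2/2)    # unnormalised target
       x <- rcauchy(N)
       w <- ptilde(x)/dcauchy(x)            # unnormalised importance weights
       snis <- sum(w * x^2)/sum(w)          # self-normalised estimate of Pi(h)
       snis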
Likelihood-free Methods
   Introduction
     Monte Carlo basics


Iterated importance sampling

       Introduction of an algorithmic temporal dimension:

              x_i^(t) ∼ q_t(x | x_i^(t−1)),      i = 1, . . . , n,   t = 1, . . .

       and
              Î_t = (1/n) Σ_{i=1}^n ϱ_i^(t) h(x_i^(t))

       is still unbiased for Π_t(h) when the importance weights are

              ϱ_i^(t) = π_t(x_i^(t)) / q_t(x_i^(t) | x_i^(t−1)),      i = 1, . . . , n
Likelihood-free Methods
   Introduction
     Population Monte Carlo



       PMC: Population Monte Carlo Algorithm
       At time t = 0
                  Generate (x_{i,0})_{1≤i≤N} iid ∼ Q_0
                  Set ω_{i,0} = {π/q_0}(x_{i,0})
                  Generate (J_{i,0})_{1≤i≤N} iid ∼ M(1, (ω̄_{i,0})_{1≤i≤N})
                  Set x̃_{i,0} = x_{J_{i,0},0}
       At time t (t = 1, . . . , T),
                  Generate x_{i,t} ind ∼ Q_{i,t}(x̃_{i,t−1}, ·)
                  Set ω_{i,t} = π(x_{i,t}) / q_{i,t}(x̃_{i,t−1}, x_{i,t})
                  Generate (J_{i,t})_{1≤i≤N} iid ∼ M(1, (ω̄_{i,t})_{1≤i≤N})
                  Set x̃_{i,t} = x_{J_{i,t},t}.

                  [Cappé, Douc, Guillin, Marin, & CPR, 2009, Stat. & Comput.]
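
       A minimal R sketch of one PMC run, assuming for illustration a univariate N(0,1)
       target, a uniform Q_0, Gaussian kernels Q_{i,t} centred at the resampled particles,
       and a crude common scale per iteration:

       set.seed(1)
       N <- 1000; Tmax <- 10
       target <- function(x) dnorm(x)                  # toy target pi
       x <- runif(N, -10, 10)                          # t = 0: draws from Q0 = U(-10,10)
       w <- target(x)/dunif(x, -10, 10)
       xt <- sample(x, N, replace = TRUE, prob = w)    # multinomial resampling step
       scale <- 2
       for (t in 1:Tmax) {
         x <- rnorm(N, mean = xt, sd = scale)          # move each resampled particle
         w <- target(x)/dnorm(x, mean = xt, sd = scale)
         xt <- sample(x, N, replace = TRUE, prob = w)
         scale <- max(sd(xt), .1)                      # crude adaptation of the kernel scale
       }
       sum(w * x)/sum(w)                               # weighted estimate at the last iteration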
Likelihood-free Methods
   Introduction
     Population Monte Carlo


Notes on PMC


       After T iterations of PMC, the PMC estimator of Π(h) is given by

              Π̄_{N,T}^PMC(h) = (1/T) Σ_{t=1}^T Σ_{i=1}^N ω̄_{i,t} h(x_{i,t}).


          1. ω̄_{i,t} means normalising over the whole sequence of simulations
          2. the Q_{i,t}'s are chosen arbitrarily under a support constraint
          3. the Q_{i,t}'s may depend on the whole sequence of simulations
          4. natural solution for ABC
Likelihood-free Methods
   Introduction
     Population Monte Carlo


Improving quality
       The efficiency of the SNIS approximation depends on the choice of
       Q, ranging from optimal

              q(x) ∝ |h(x) − Π(h)| π(x)

       to useless

              var Π̂_{Q,N}^SNIS(h) = +∞


       Example (PMC = adaptive importance sampling)
       Population Monte Carlo produces a sequence of proposals Q_t
       aiming at improving efficiency

              Kull(π, q_t) ≤ Kull(π, q_{t−1})   or   var Π̂_{Q_t,∞}^SNIS(h) ≤ var Π̂_{Q_{t−1},∞}^SNIS(h)

                          [Cappé, Douc, Guillin, Marin, Robert, 04, 07a, 07b, 08]
Likelihood-free Methods
   Introduction
     AMIS


Multiple Importance Sampling

       Recycling: given several proposals Q_1, . . . , Q_T, for 1 ≤ t ≤ T
       generate an iid sample

              x_1^t, . . . , x_N^t ∼ Q_t

       and estimate Π(h) by

              Π̂_{Q,N}^MIS(h) = T^{−1} Σ_{t=1}^T N^{−1} Σ_{i=1}^N h(x_i^t) ω_i^t

        where
              ω_i^t = π(x_i^t) / q_t(x_i^t)                              correct...
Likelihood-free Methods
   Introduction
     AMIS


Multiple Importance Sampling

       Recycling: given several proposals Q_1, . . . , Q_T, for 1 ≤ t ≤ T
       generate an iid sample

              x_1^t, . . . , x_N^t ∼ Q_t

       and estimate Π(h) by

              Π̂_{Q,N}^MIS(h) = T^{−1} Σ_{t=1}^T N^{−1} Σ_{i=1}^N h(x_i^t) ω_i^t

        where
              ω_i^t = π(x_i^t) / { T^{−1} Σ_{ℓ=1}^T q_ℓ(x_i^t) }                    still correct!
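
       A minimal R sketch contrasting the two weightings above, assuming a N(0,1) target,
       h(x) = x² and Gaussian proposals Q_t = N(0, t²), t = 1, ..., T:

       set.seed(1)
       N <- 1000; Tp <- 5
       target <- function(x) dnorm(x)
       est <- matrix(NA, Tp, 2)
       for (t in 1:Tp) {
         x <- rnorm(N, sd = t)                                   # sample from Q_t = N(0, t^2)
         w1 <- target(x)/dnorm(x, sd = t)                        # standard weights pi/q_t
         qmix <- rowMeans(sapply(1:Tp, function(l) dnorm(x, sd = l)))
         w2 <- target(x)/qmix                                    # deterministic mixture weights
         est[t, ] <- c(mean(x^2 * w1), mean(x^2 * w2))
       }
       colMeans(est)                                             # both estimate Pi(h) = 1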
Likelihood-free Methods
   Introduction
     AMIS


Mixture representation


       Deterministic mixture correction of the weights proposed by Owen
       and Zhou (JASA, 2000)
                  The corresponding estimator is still unbiased [if not
                  self-normalised]
                  All particles are on the same weighting scale rather than their
                  own
                  Large variance proposals Qt do not take over
                  Variance reduction thanks to weight stabilization & recycling
                  [K.o.] removes the randomness in the component choice
                  [=Rao-Blackwellisation]
Likelihood-free Methods
   Introduction
     AMIS


Global adaptation


       Global Adaptation
       At iteration t = 1, · · · , T ,
          1. For 1 ≤ i ≤ N_1, generate x_i^t ∼ T_3(µ̂^{t−1}, Σ̂^{t−1})
          2. Calculate the mixture importance weight of particle x_i^t

                                 ω_i^t = π(x_i^t) / δ_i^t

             where
                                 δ_i^t = Σ_{l=0}^{t−1} q_{T(3)}(x_i^t; µ̂^l, Σ̂^l)
Likelihood-free Methods
   Introduction
     AMIS




       Backward reweighting
          3. If t ≥ 2, actualize the weights of all past particles x_i^l,
             1 ≤ l ≤ t − 1,
                                 ω_i^l = π(x_i^l) / δ_i^l
             where
                                 δ_i^l ← δ_i^l + q_{T(3)}(x_i^l; µ̂^{t−1}, Σ̂^{t−1})

          4. Compute IS estimates of the target mean and variance, µ̂^t and Σ̂^t,
             where
                                 µ̂_j^t = Σ_{l=1}^t Σ_{i=1}^{N_1} ω_i^l (x_j)_i^l / Σ_{l=1}^t Σ_{i=1}^{N_1} ω_i^l . . .
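
       A hedged one-dimensional R sketch of the scheme above (the N(3, 2²) target known up
       to a constant is an illustrative choice; the t3 proposals and notation mirror the slides):

       set.seed(1)
       target <- function(x) exp(-(x - 3)^2/8)            # unnormalised N(3, 2^2)
       dst3 <- function(x, m, s) dt((x - m)/s, df = 3)/s  # t3 location-scale density
       Titer <- 10; N1 <- 500
       mu.hat <- 0; sig.hat <- 5                          # initial proposal parameters
       xs <- NULL; mus <- sigs <- numeric(0)
       for (t in 1:Titer) {
         mus <- c(mus, mu.hat); sigs <- c(sigs, sig.hat)
         xs <- c(xs, mu.hat + sig.hat * rt(N1, df = 3))   # step 1: draws from the current t3
         ## steps 2-3: deterministic-mixture denominator recomputed for ALL particles
         deltas <- sapply(xs, function(x) mean(dst3(x, mus, sigs)))
         ws <- target(xs)/deltas
         ## step 4: IS estimates of the target mean and standard deviation
         mu.hat <- sum(ws * xs)/sum(ws)
         sig.hat <- sqrt(sum(ws * (xs - mu.hat)^2)/sum(ws))
       }
       c(mu.hat, sig.hat)                                 # should approach (3, 2)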
Likelihood-free Methods
   Introduction
     AMIS


A toy example



       [Figure: contour plot of the banana-shaped benchmark target]

       Banana shape benchmark: marginal distribution of (x_1, x_2) for the parameters
       σ_1² = 100 and b = 0.03. Contours represent 60% (red), 90% (black) and 99.9%
       (blue) confidence regions in the marginal space.
Likelihood-free Methods
   Introduction
     AMIS


A toy example

       [Figure: boxplots of effective sample sizes for p = 5, 10, 20]

       Banana shape example: boxplots of 10 replicate ESSs for the AMIS scheme
       (left) and the NOT-AMIS scheme (right) for p = 5, 10, 20.
Likelihood-free Methods
   The Metropolis-Hastings Algorithm




The Metropolis-Hastings Algorithm

       Introduction

       The Metropolis-Hastings Algorithm
          Monte Carlo Methods based on Markov Chains
          The Metropolis–Hastings algorithm
          The random walk Metropolis-Hastings algorithm

       simulation-based methods in Econometrics

       Genetics of ABC

       Approximate Bayesian computation
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains


       Epiphany! It is not necessary to use a sample from the distribution
       f to approximate the integral

              I = ∫ h(x) f(x) dx ,


       Principle: Obtain X_1, . . . , X_n ∼ f (approx) without directly
       simulating from f, using an ergodic Markov chain with stationary
       distribution f
                                                     [Metropolis et al., 1953]
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     Monte Carlo Methods based on Markov Chains


Running Monte Carlo via Markov Chains (2)

       Idea
       For an arbitrary starting value x^(0), an ergodic chain (X^(t)) is
       generated using a transition kernel with stationary distribution f


              Ensures the convergence in distribution of (X^(t)) to a random
              variable from f.
              For a “large enough” T_0, X^(T_0) can be considered as
              distributed from f
              Produces a dependent sample X^(T_0), X^(T_0+1), . . ., which is
              generated from f, sufficient for most approximation purposes.
       Problem: How can one build a Markov chain with a given
       stationary distribution?
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The Metropolis–Hastings algorithm


The Metropolis–Hastings algorithm



       Basics
       The algorithm uses the objective (target) density

                                           f

       and a conditional density
                                         q(y |x)
       called the instrumental (or proposal) distribution
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The Metropolis–Hastings algorithm


The MH algorithm

       Algorithm (Metropolis–Hastings)
       Given x^(t),
         1. Generate Y_t ∼ q(y | x^(t)).
         2. Take

              X^(t+1) = Y_t      with prob. ρ(x^(t), Y_t),
                        x^(t)    with prob. 1 − ρ(x^(t), Y_t),

              where
                       ρ(x, y) = min{ [f(y)/f(x)] [q(x|y)/q(y|x)] , 1 }.
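
       A minimal R sketch of the algorithm, assuming for illustration a Gamma(2.3, 1)
       target known up to a constant and an independent Exp(1) proposal q(y|x) = q(y):

       set.seed(1)
       ltarget <- function(x) ifelse(x > 0, 1.3*log(x) - x, -Inf)   # log f, up to a constant
       niter <- 1e4
       chain <- numeric(niter); x <- 1
       for (t in 1:niter) {
         y <- rexp(1)                                               # proposal draw
         lr <- ltarget(y) - ltarget(x) + dexp(x, log = TRUE) - dexp(y, log = TRUE)
         if (log(runif(1)) < lr) x <- y                             # MH acceptance step
         chain[t] <- x
       }
       mean(chain)                                                  # approx. 2.3, the Gamma mean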
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The Metropolis–Hastings algorithm


Features



              Independent of normalizing constants for both f and q(·|x)
              (i.e., those constants independent of x)
              Never move to values with f (y ) = 0
              The chain (x (t) )t may take the same value several times in a
              row, even though f is a density wrt Lebesgue measure
              The sequence (yt )t is usually not a Markov chain
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The Metropolis–Hastings algorithm


Convergence properties

       The M-H Markov chain is reversible, with invariant/stationary
       density f since it satisfies the detailed balance condition

              f(y) K(y, x) = f(x) K(x, y)

       If
              q(y|x) > 0 for every (x, y),

       the chain is Harris recurrent and

              lim_{T→∞} (1/T) Σ_{t=1}^T h(X^(t)) = ∫ h(x) f(x) dx      a.e. f.

              lim_{n→∞} ‖ ∫ K^n(x, ·) µ(dx) − f ‖_TV = 0

       for every initial distribution µ
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The random walk Metropolis-Hastings algorithm


Random walk Metropolis–Hastings



       Use of a local perturbation as proposal

                                               Yt = X (t) + εt ,

        where εt ∼ g , independent of X (t) .
       The instrumental density is now of the form g (y − x) and the
       Markov chain is a random walk if we take g to be symmetric
       g (x) = g (−x)
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The random walk Metropolis-Hastings algorithm




       Algorithm (Random walk Metropolis)
       Given x^(t)
         1. Generate Y_t ∼ g(y − x^(t))
         2. Take

              X^(t+1) = Y_t      with prob. min{ 1, f(Y_t)/f(x^(t)) },
                        x^(t)    otherwise.
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The random walk Metropolis-Hastings algorithm


RW-MH on mixture posterior distribution

       [Figure: random walk MCMC sample in the (µ_1, µ_2) plane]

       Random walk MCMC output for .7N (µ1 , 1) + .3N (µ2 , 1)
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The random walk Metropolis-Hastings algorithm


Acceptance rate


       A high acceptance rate is not an indication of efficiency since the
       random walk may be moving “too slowly” on the target surface

       If x^(t) and y_t are “too close”, i.e. f(x^(t)) ≈ f(y_t), then y_t is accepted
       with probability

              min{ f(y_t)/f(x^(t)) , 1 } ≈ 1,

       and the acceptance rate is high

       If the average acceptance rate is low, the proposed values f(y_t) tend to
       be small w.r.t. f(x^(t)), i.e. the random walk [not the algorithm!]
       moves quickly on the target surface, often reaching its boundaries
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The random walk Metropolis-Hastings algorithm


Rule of thumb




       In small dimensions, aim at an average acceptance rate of 50%. In
       large dimensions, at an average acceptance rate of 25%.
                                        [Gelman, Gilks and Roberts, 1995]
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The random walk Metropolis-Hastings algorithm


Noisy AR(1)



       Target distribution of x given x_1, x_2 and y is

              exp{ −(1/2τ²) [ (x − ϕx_1)² + (x_2 − ϕx)² + (τ²/σ²)(y − x²)² ] } .

       For a Gaussian random walk with scale ω small enough, the
       random walk never jumps to the other mode. But if the scale ω is
       sufficiently large, the Markov chain explores both modes and gives a
       satisfactory approximation of the target distribution.
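
       A hedged R sketch of the random walk MH sampler on this target; the values of ϕ, τ,
       σ, x_1, x_2 and y below are illustrative choices, not those used in the talk:

       set.seed(1)
       phi <- .9; tau <- 1; sigma <- .1; x1 <- 0; x2 <- 0; y <- 1
       ltarget <- function(x)                                     # log target, up to a constant
         -((x - phi*x1)^2 + (x2 - phi*x)^2 + (tau^2/sigma^2)*(y - x^2)^2)/(2*tau^2)
       rwmh <- function(omega, niter = 1e4, x0 = 1) {
         chain <- numeric(niter); x <- x0
         for (t in 1:niter) {
           prop <- x + omega*rnorm(1)                              # Gaussian random walk proposal
           if (log(runif(1)) < ltarget(prop) - ltarget(x)) x <- prop
           chain[t] <- x
         }
         chain
       }
       small <- rwmh(omega = .1)          # stays trapped near one of the modes x ≈ ±1
       large <- rwmh(omega = .5)          # jumps between the two modes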
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The random walk Metropolis-Hastings algorithm


Noisy AR(2)




       Markov chain based on a random walk with scale ω = .1.
Likelihood-free Methods
   The Metropolis-Hastings Algorithm
     The random walk Metropolis-Hastings algorithm


Noisy AR(3)




       Markov chain based on a random walk with scale ω = .5.
Likelihood-free Methods
   simulation-based methods in Econometrics




Meanwhile, in a Gaulish village...



       Similar exploration of simulation-based techniques in Econometrics
              Simulated method of moments
              Method of simulated moments
              Simulated pseudo-maximum-likelihood
              Indirect inference
                                              [Gouriéroux & Monfort, 1996]
Likelihood-free Methods
   simulation-based methods in Econometrics




Simulated method of moments

       Given observations y^o_{1:n} from a model

              y_t = r(y_{1:(t−1)}, ε_t, θ) ,      ε_t ∼ g(·)

       simulate ε*_{1:n}, derive

              y_t(θ) = r(y_{1:(t−1)}, ε*_t, θ)

       and estimate θ by

              arg min_θ Σ_{t=1}^n (y^o_t − y_t(θ))²
Likelihood-free Methods
   simulation-based methods in Econometrics




Simulated method of moments

       Given observations y^o_{1:n} from a model

              y_t = r(y_{1:(t−1)}, ε_t, θ) ,      ε_t ∼ g(·)

       simulate ε*_{1:n}, derive

              y_t(θ) = r(y_{1:(t−1)}, ε*_t, θ)

       and estimate θ by

              arg min_θ { Σ_{t=1}^n y^o_t − Σ_{t=1}^n y_t(θ) }²
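
       A minimal R sketch of this criterion on a toy location model y_t = θ + ε_t (an
       illustrative choice, not the talk's example); the simulated innovations are held
       fixed across the values of θ:

       set.seed(2)
       n <- 500; theta0 <- 2.5
       yobs <- theta0 + rnorm(n)                      # observed sample
       eps.star <- rnorm(n)                           # simulated innovations, kept fixed
       ysim <- function(theta) theta + eps.star
       crit <- function(theta) (sum(yobs) - sum(ysim(theta)))^2
       optimize(crit, c(-10, 10))$minimum             # close to theta0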
Likelihood-free Methods
   simulation-based methods in Econometrics




Method of simulated moments

       Given a statistic vector K(y) with

              E_θ[K(Y_t) | y_{1:(t−1)}] = k(y_{1:(t−1)}; θ)

       find an unbiased estimator of k(y_{1:(t−1)}; θ),

              k̃(ε_t, y_{1:(t−1)}; θ)

       Estimate θ by

              arg min_θ ‖ Σ_{t=1}^n { K(y_t) − Σ_{s=1}^S k̃(ε^s_t, y_{1:(t−1)}; θ)/S } ‖

                                                                [Pakes & Pollard, 1989]
Likelihood-free Methods
   simulation-based methods in Econometrics




Indirect inference



                                                       ˆ
       Minimise (in θ) the distance between estimators β based on
       pseudo-models for genuine observations and for observations
       simulated under the true model and the parameter θ.

                                              [Gouriéroux, Monfort, & Renault, 1993;
                                              Smith, 1993; Gallant & Tauchen, 1996]
Likelihood-free Methods
   simulation-based methods in Econometrics




Indirect inference (PML vs. PSE)


       Example of the pseudo-maximum-likelihood (PML)

              β̂(y) = arg max_β Σ_t log f(y_t | β, y_{1:(t−1)})

       leading to

              arg min_θ ‖ β̂(y^o) − β̂(y_1(θ), . . . , y_S(θ)) ‖²

       when
              y_s(θ) ∼ f(y|θ) ,      s = 1, . . . , S
Likelihood-free Methods
   simulation-based methods in Econometrics




Indirect inference (PML vs. PSE)


       Example of the pseudo-score estimator (PSE)

              β̂(y) = arg min_β Σ_t [ ∂ log f/∂β (y_t | β, y_{1:(t−1)}) ]²

       leading to

              arg min_θ ‖ β̂(y^o) − β̂(y_1(θ), . . . , y_S(θ)) ‖²

       when
              y_s(θ) ∼ f(y|θ) ,      s = 1, . . . , S
Likelihood-free Methods
   simulation-based methods in Econometrics




Consistent indirect inference



                  ...in order to get a unique solution the dimension of
              the auxiliary parameter β must be larger than or equal to
              the dimension of the initial parameter θ. If the problem is
              just identified the different methods become easier...

       Consistency depending on the criterion and on the asymptotic
       identifiability of θ
                                      [Gouriéroux & Monfort, 1996, p. 66]
Likelihood-free Methods
   simulation-based methods in Econometrics




AR(2) vs. MA(1) example

       true (MA) model
                                  y_t = ε_t − θ ε_{t−1}

       and [wrong!] auxiliary (AR) model

                                  y_t = β_1 y_{t−1} + β_2 y_{t−2} + u_t

       R code
       x=eps=rnorm(250)                   # simulate an MA(1) sample with theta=0.5
       x[2:250]=x[2:250]-0.5*x[1:249]
       simeps=rnorm(250)                  # innovations reused for every candidate theta
       propeta=seq(-.99,.99,le=199)       # grid of candidate theta values
       dist=rep(0,199)
       bethat=as.vector(arima(x,c(2,0,0),incl=FALSE)$coef)   # auxiliary AR(2) fit on the data
       for (t in 1:199)                   # AR(2) fit on each simulated MA(1) path
         dist[t]=sum((as.vector(arima(c(simeps[1],simeps[2:250]-propeta[t]*
         simeps[1:249]),c(2,0,0),incl=FALSE)$coef)-bethat)^2)
Likelihood-free Methods
   simulation-based methods in Econometrics




AR(2) vs. MA(1) example
       One sample:

       [Figure: distance criterion as a function of the candidate θ]
Likelihood-free Methods
   simulation-based methods in Econometrics




AR(2) vs. MA(1) example
       Many samples:

       [Figure: results over many replications of the same experiment]
Likelihood-free Methods
   simulation-based methods in Econometrics




Choice of pseudo-model



       Pick model such that
         1. β̂(θ) not flat
            (i.e. sensitive to changes in θ)
         2. β̂(θ) not dispersed (i.e. robust against changes in y_s(θ))
                                              [Frigessi & Heggland, 2004]
Likelihood-free Methods
   simulation-based methods in Econometrics




ABC using indirect inference


                  We present a novel approach for developing summary statistics
              for use in approximate Bayesian computation (ABC) algorithms by
              using indirect inference(...) In the indirect inference approach to
              ABC the parameters of an auxiliary model fitted to the data become
              the summary statistics. Although applicable to any ABC technique,
              we embed this approach within a sequential Monte Carlo algorithm
              that is completely adaptive and requires very little tuning(...)

                                               [Drovandi, Pettitt & Faddy, 2011]

              © Indirect inference provides summary statistics for ABC...
Likelihood-free Methods
   simulation-based methods in Econometrics




ABC using indirect inference



                  ...the above result shows that, in the limit as h → 0, ABC will
              be more accurate than an indirect inference method whose auxiliary
              statistics are the same as the summary statistic that is used for
              ABC(...) Initial analysis showed that which method is more
              accurate depends on the true value of θ.

                                                   [Fearnhead and Prangle, 2012]

      © Indirect inference provides estimates rather than global inference...
Likelihood-free Methods
   Genetics of ABC




Genetic background of ABC



       ABC is a recent computational technique that only requires being
       able to sample from the likelihood f (·|θ)
       This technique stemmed from population genetics models, about
       15 years ago, and population geneticists still contribute
       significantly to methodological developments of ABC.
                                 [Griffith & al., 1997; Tavaré & al., 1999]
Likelihood-free Methods
   Genetics of ABC




Population genetics

                          [Part derived from the teaching material of Raphael Leblois, ENS Lyon, November 2010]


              Describe the genotypes, estimate the alleles frequencies,
              determine their distribution among individuals, populations
              and between populations;

              Predict and understand the evolution of gene frequencies in
              populations as a result of various factors.

        © Analyses the effect of various evolutionary forces (mutation, drift,
       migration, selection) on the evolution of gene frequencies in time
       and space.
Likelihood-free Methods
   Genetics of ABC




A[B]ctors

       Towards an explanation of the mechanisms of evolutionary
       changes, mathematical models of evolution started to be developed
       in the years 1920-1930.




             Ronald A Fisher (1890-1962)   Sewall Wright (1889-1988)   John BS Haldane (1892-1964)
Likelihood-free Methods
   Genetics of ABC




Wright-Fisher model

              A population of constant size, in which individuals
              reproduce at the same time.
              Each gene in a generation is a copy of a gene of the
              previous generation.
              In the absence of mutation and selection, allele
              frequencies drift inevitably until the fixation of an allele.
              Drift therefore leads to the loss of genetic variation
              within populations.
Likelihood-free Methods
   Genetics of ABC




Coalescent theory
                                 [Kingman, 1982; Tajima, Tavaré, &tc]




       Coalescence theory is interested in the genealogy of a sample of
       genes back in time to the common ancestor of the sample.
Likelihood-free Methods
   Genetics of ABC




Common ancestor

       [Figure: gene genealogy with the time of coalescence (T) marked]

       The different lineages merge (coalesce) as one goes back in the past,
       up to the common ancestor of a sample of genes.
Likelihood-free Methods
   Genetics of ABC
Neutral mutations

       Under the assumption of neutrality of the genetic markers, the
       mutations are independent of the genealogy, i.e. the genealogy
       only depends on the demographic processes.

       The genealogy is thus constructed according to the demographic
       parameters (e.g. N), then the mutations are added a posteriori on
       the different branches, from the MRCA down to the leaves of the tree.
       This produces polymorphism data under the demographic and
       mutational models under consideration.
Likelihood-free Methods
   Genetics of ABC




Demo-genetic inference



       Each model is characterized by a set of parameters θ that cover
       historical (divergence time, admixture time, ...), demographic
       (population sizes, admixture rates, migration rates, ...) and genetic
       (mutation rate, ...) factors
       The goal is to estimate these parameters from a dataset of
       polymorphism (DNA sample) y observed at the present time
       Problem: most of the time, we cannot calculate the likelihood of
       the polymorphism data f (y|θ).
Likelihood-free Methods
   Genetics of ABC




Untractable likelihood



       Missing (too missing!) data structure:

              f(y|θ) = ∫_G f(y|G, θ) f(G|θ) dG

       The genealogies are considered as nuisance parameters.
       This problem thus differs from the phylogenetic approach,
       where the tree is the parameter of interest.
Likelihood-free Methods
   Genetics of ABC




A genuine example of application

       [Figure: the pygmy and non-pygmy populations under study]

       Pygmy populations: do they have a common origin? Is there a
       lot of exchange between pygmy and non-pygmy populations?
Likelihood-free Methods
   Genetics of ABC

Alternative scenarios

       Different possible scenarios; scenario choice by ABC

       [Figure: the competing demographic scenarios]

                                                                  Verdu et al. 2009
Likelihood-free Methods
   Genetics of ABC




Simulation results

       Different possible scenarios; scenario choice by ABC

       [Figure: ABC comparison of the competing scenarios]

        © Scenario 1A is chosen: scenario 1a is strongly supported over the
       others, arguing for a common origin of the West African pygmy
       populations.
                                                                  Verdu et al. 2009
Likelihood-free Methods
   Genetics of ABC




Most likely scenario

       [Figure: the selected evolutionary scenario]

       Evolutionary scenario: a story is “told” from these inferences.

                                      Verdu et al. 2009
Likelihood-free Methods
   Approximate Bayesian computation




Approximate Bayesian computation

       Introduction

       The Metropolis-Hastings Algorithm

       simulation-based methods in Econometrics

       Genetics of ABC

       Approximate Bayesian computation
          ABC basics
          Alphabet soup
          Automated summary statistic selection
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Untractable likelihoods



       Cases when the likelihood function f (y|θ) is unavailable and when
       the completion step

                                      f (y|θ) =       f (y, z|θ) dz
                                                  Z

       is impossible or too costly because of the dimension of z
        © MCMC cannot be implemented!
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Untractable likelihoods
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Illustrations



       Example
       Stochastic volatility model: for t = 1, . . . , T,

              y_t = exp(z_t) ε_t ,      z_t = a + b z_{t−1} + σ η_t ,

       T very large makes it difficult to include z within the simulated
       parameters

       [Figure: highest weight trajectories]
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Illustrations



       Example
       Potts model: if y takes values on a grid Y of size k^n and

              f(y|θ) ∝ exp{ θ Σ_{l∼i} I_{y_l = y_i} }

       where l∼i denotes a neighbourhood relation, n moderately large
       prohibits the computation of the normalising constant
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Illustrations




       Example
       Inference on CMB: in cosmology, study of the Cosmic Microwave
       Background via likelihoods immensely slow to compute (e.g.
       WMAP, Planck), because of numerically costly spectral transforms
       [Data is a Fortran program]
                                        [Kilbinger et al., 2010, MNRAS]
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Illustrations


       Example
   Phylogenetic tree: in population
   genetics, reconstitution of a common
   ancestor from a sample of genes via
   a phylogenetic tree that is close to
   impossible to integrate out
   [100 processor days with 4
   parameters]
                                    [Cornuet et al., 2009, Bioinformatics]
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


The ABC method

       Bayesian setting: target is π(θ)f (x|θ)
       When the likelihood f (x|θ) is not in closed form, likelihood-free rejection
       technique:
       ABC algorithm
       For an observation y ∼ f (y|θ), under the prior π(θ), keep jointly
       simulating
              θ' ∼ π(θ) ,   z ∼ f (z|θ') ,
       until the auxiliary variable z is equal to the observed value, z = y.

                                                        [Tavaré et al., 1997]
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Why does it work?!



       The proof is trivial:

              f(θ_i) ∝ Σ_{z∈D} π(θ_i) f(z|θ_i) I_y(z)
                     ∝ π(θ_i) f(y|θ_i)
                     = π(θ_i|y) .

                                                                 [Accept–Reject 101]
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Earlier occurrence


              ‘Bayesian statistics and Monte Carlo methods are ideally
              suited to the task of passing many models over one
              dataset’
                                      [Don Rubin, Annals of Statistics, 1984]

       Note: Rubin (1984) does not promote this algorithm for
       likelihood-free simulation but as a frequentist intuition on posterior
       distributions: parameters from posteriors are more likely to be
       those that could have generated the data.
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


A as A...pproximative


       When y is a continuous random variable, the equality z = y is replaced
       with a tolerance condition,

              ρ(y, z) ≤ ε

       where ρ is a distance
       Output distributed from

              π(θ) P_θ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε)

                                                         [Pritchard et al., 1999]
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


ABC algorithm


       Algorithm 1 Likelihood-free rejection sampler
         for i = 1 to N do
           repeat
              generate θ' from the prior distribution π(·)
              generate z from the likelihood f (·|θ')
           until ρ{η(z), η(y)} ≤ ε
           set θ_i = θ'
         end for

       where η(y) defines a (not necessarily sufficient) statistic
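
       A minimal R sketch of this rejection sampler on a toy normal-mean problem; the model
       y ∼ N(θ, 1) with n = 10, the prior θ ∼ N(0, 10), η = sample mean and ε = 0.05 are
       all illustrative choices:

       set.seed(3)
       n <- 10; yobs <- rnorm(n, mean = 2)
       eta <- mean(yobs)                          # summary statistic
       eps <- .05; N <- 1e3
       post <- numeric(N)
       for (i in 1:N) {
         repeat {
           theta <- rnorm(1, 0, sqrt(10))         # generate from the prior
           z <- rnorm(n, theta)                   # generate pseudo-data from the likelihood
           if (abs(mean(z) - eta) <= eps) break   # accept when rho(eta(z), eta(y)) <= eps
         }
         post[i] <- theta
       }
       quantile(post, c(.025, .5, .975))          # ABC approximation of the posterior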
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Output

       The likelihood-free algorithm samples from the marginal in z of:

              π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ,

       where A_{ε,y} = {z ∈ D | ρ(η(z), η(y)) < ε}.
       The idea behind ABC is that the summary statistics coupled with a
       small tolerance should provide a good approximation of the
       posterior distribution:

              π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|y) .
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Convergence of ABC (first attempt)




       What happens when ε → 0?
       If f (·|θ) is continuous in y, uniformly in θ [!], given an arbitrary
       δ > 0, there exists ε_0 such that ε < ε_0 implies

              π(θ) ∫ f(z|θ) I_{A_{ε,y}}(z) dz / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ
                 ∈  [ π(θ) f(y|θ) (1 ∓ δ) µ(B_ε) ] / [ ∫_Θ π(θ) f(y|θ) dθ (1 ± δ) µ(B_ε) ]

       and the µ(B_ε) terms cancel.

                               [Proof extends to other continuous-in-0 kernels K ]
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Convergence of ABC (second attempt)

       What happens when ε → 0?
       For B ⊂ Θ, we have

              ∫_B [ ∫_{A_{ε,y}} f(z|θ) dz / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] π(θ) dθ
                 = ∫_{A_{ε,y}} [ ∫_B f(z|θ) π(θ) dθ / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] dz
                 = ∫_{A_{ε,y}} [ ∫_B f(z|θ) π(θ) dθ / m(z) ] [ m(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] dz
                 = ∫_{A_{ε,y}} π(B|z) [ m(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ] dz

       which indicates convergence for a continuous π(B|z).
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Probit modelling on Pima Indian women
       Example (R benchmark)
       200 Pima Indian women with observed variables
              plasma glucose concentration in oral glucose tolerance test
              diastolic blood pressure
              diabetes pedigree function
              presence/absence of diabetes
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Probit modelling on Pima Indian women
       Example (R benchmark)
       200 Pima Indian women with observed variables
              plasma glucose concentration in oral glucose tolerance test
              diastolic blood pressure
              diabetes pedigree function
              presence/absence of diabetes


       Probability of diabetes function of above variables
                               P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Probit modelling on Pima Indian women
       Example (R benchmark)
       200 Pima Indian women with observed variables
              plasma glucose concentration in oral glucose tolerance test
              diastolic blood pressure
              diabetes pedigree function
              presence/absence of diabetes


       Probability of diabetes function of above variables
                               P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
       Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a
       g-prior modelling:
                                  β ∼ N3(0, n(X^T X)^{−1})
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Probit modelling on Pima Indian women
       Example (R benchmark)
       200 Pima Indian women with observed variables
              plasma glucose concentration in oral glucose tolerance test
              diastolic blood pressure
              diabetes pedigree function
              presence/absence of diabetes


       Probability of diabetes function of above variables
                               P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
       Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a
       g-prior modelling:
                                  β ∼ N3(0, n(X^T X)^{−1})
       Use of importance function inspired from the MLE estimate
       distribution
                                  β ∼ N(β̂, Σ̂)
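
       A minimal R sketch of ABC rejection for this probit model, using the
       Pima.tr data shipped with the MASS package; it proposes from the
       MLE-based normal approximation above but, for brevity, ignores the
       g-prior importance-weight correction (variable names are illustrative):

        library(MASS)                              # provides Pima.tr and mvrnorm
        y=as.integer(Pima.tr$type=="Yes")
        X=as.matrix(Pima.tr[,c("glu","bp","ped")])
        fit=glm(y~X-1,family=binomial("probit"))   # MLE and its covariance
        N=1e4
        thetas=mvrnorm(N,coef(fit),vcov(fit))      # proposals from N(bhat,Sigmahat)
        sobs=crossprod(X,y)                        # summary statistic X'y
        dist=apply(thetas,1,function(b){
          z=rbinom(nrow(X),1,pnorm(X%*%b))         # pseudo-data from the probit model
          sqrt(sum((crossprod(X,z)-sobs)^2))})
        eps=quantile(dist,.01)                     # keep the closest 1%
        abc=thetas[dist<eps,]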
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Pima Indian benchmark


       Figure: Comparison between density estimates of the marginals on β1
       (left), β2 (center) and β3 (right) from ABC rejection samples (red) and
       MCMC samples (black)
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


MA example


       Back to the MA(q) model

                              x_t = ε_t + Σ_{i=1}^q ϑ_i ε_{t−i}

       Simple prior: uniform over the inverse [real and complex] roots in

                              Q(u) = 1 − Σ_{i=1}^q ϑ_i u^i

       under the identifiability conditions
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


MA example



       Back to the MA(q) model

                              x_t = ε_t + Σ_{i=1}^q ϑ_i ε_{t−i}

       Simple prior: uniform prior over the identifiability zone, e.g.
       triangle for MA(2)
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


MA example (2)
       ABC algorithm thus made of
         1. picking a new value (ϑ1 , ϑ2 ) in the triangle
         2. generating an iid sequence (ε_t)_{−q<t≤T}
         3. producing a simulated series (xt )1≤t≤T
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


MA example (2)
       ABC algorithm thus made of
         1. picking a new value (ϑ1 , ϑ2 ) in the triangle
         2. generating an iid sequence (ε_t)_{−q<t≤T}
         3. producing a simulated series (xt )1≤t≤T
       Distance: basic distance between the series

                   ρ((x_t)_{1≤t≤T}, (x'_t)_{1≤t≤T}) = Σ_{t=1}^T (x_t − x'_t)²

       or distance between summary statistics like the q autocorrelations

                              τ_j = Σ_{t=j+1}^T x_t x_{t−j}
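
       A hedged R sketch of the corresponding ABC rejection sampler for an
       MA(2) model, with the autocorrelation-type summaries above and a
       uniform prior over the invertibility triangle (under the "+" sign
       convention used by arima.sim; all names are illustrative):

        Tlen=100
        xobs=arima.sim(n=Tlen,model=list(ma=c(.6,.2)))      # stand-in observed series
        tau=function(x,q=2){n=length(x)
          sapply(1:q,function(j) sum(x[(j+1):n]*x[1:(n-j)]))}
        sobs=tau(xobs)
        N=2e4
        th1=runif(N,-2,2);th2=runif(N,-1,1)
        ok=(th1+th2>-1)&(th1-th2<1)                         # MA(2) triangle
        th1=th1[ok];th2=th2[ok]
        dist=mapply(function(a,b){
          z=arima.sim(n=Tlen,model=list(ma=c(a,b)))         # pseudo-series
          sqrt(sum((tau(z)-sobs)^2))},th1,th2)
        eps=quantile(dist,.001)                             # 0.1% tolerance
        abc=cbind(th1,th2)[dist<eps,]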
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Comparison of distance impact




       Evaluation of the tolerance on the ABC sample against both
       distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Comparison of distance impact

                [Figure: ABC output for θ1 (left) and θ2 (right)]

       Evaluation of the tolerance on the ABC sample against both
       distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


Homonymy
       The ABC algorithm is not to be confused with the ABC algorithm
              The Artificial Bee Colony algorithm is a swarm based meta-heuristic
              algorithm that was introduced by Karaboga in 2005 for optimizing
              numerical problems. It was inspired by the intelligent foraging
              behavior of honey bees. The algorithm is specifically based on the
              model proposed by Tereshko and Loengarov (2005) for the foraging
              behaviour of honey bee colonies. The model consists of three
              essential components: employed and unemployed foraging bees, and
              food sources. The first two components, employed and unemployed
              foraging bees, search for rich food sources (...) close to their hive.
              The model also defines two leading modes of behaviour (...):
              recruitment of foragers to rich food sources resulting in positive
              feedback and abandonment of poor sources by foragers causing
              negative feedback.

                                                           [Karaboga, Scholarpedia]
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


ABC advances

       Simulating from the prior is often poor in efficiency
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


ABC advances

       Simulating from the prior is often poor in efficiency
       Either modify the proposal distribution on θ to increase the density
       of x’s within the vicinity of y ...
            [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007]
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


ABC advances

       Simulating from the prior is often poor in efficiency
       Either modify the proposal distribution on θ to increase the density
       of x’s within the vicinity of y ...
            [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007]

       ...or by viewing the problem as one of conditional density estimation
       and by developing techniques that allow for a larger ε
                                                   [Beaumont et al., 2002]
Likelihood-free Methods
   Approximate Bayesian computation
     ABC basics


ABC advances

       Simulating from the prior is often poor in efficiency
       Either modify the proposal distribution on θ to increase the density
       of x’s within the vicinity of y ...
            [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007]

       ...or by viewing the problem as one of conditional density estimation
       and by developing techniques that allow for a larger ε
                                                   [Beaumont et al., 2002]

       ...or even by including ε in the inferential framework [ABCµ]
                                                           [Ratmann et al., 2009]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NP
       Better usage of [prior] simulations by
       adjustment: instead of throwing away
       θ such that ρ(η(z), η(y)) > ε, replace
       θ’s with locally regressed transforms
          (use with BIC)


               θ∗ = θ − {η(z) − η(y)}^T β̂
                                                 [Csilléry et al., TEE, 2010]

       where β̂ is obtained by [NP] weighted least square regression on
       (η(z) − η(y)) with weights

                               Kδ {ρ(η(z), η(y))}

                                              [Beaumont et al., 2002, Genetics]
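
       A hedged R sketch of this local-linear adjustment, with an
       Epanechnikov-type kernel for Kδ; theta (accepted draws, one column per
       parameter), S (matrix of simulated summaries η(z)), sobs (η(y)) and the
       bandwidth delta are assumed to exist:

        abc.adjust=function(theta,S,sobs,delta){
          d=sweep(S,2,sobs)                          # η(z)-η(y)
          rho=sqrt(rowSums(d^2))
          w=ifelse(rho<delta,1-(rho/delta)^2,0)      # kernel weights Kδ
          keep=w>0
          fit=lm.wfit(cbind(1,d[keep,,drop=FALSE]),
                      theta[keep,,drop=FALSE],w[keep])
          beta=fit$coefficients[-1,,drop=FALSE]      # local regression slopes
          theta[keep,,drop=FALSE]-d[keep,,drop=FALSE]%*%beta  # θ*=θ-(η(z)-η(y))'β̂
        }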
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NP (regression)


       Also found in the subsequent literature, e.g. in Fearnhead-Prangle (2012):
       weight each simulation directly by

                               Kδ {ρ(η(z(θ)), η(y))}

       or
                        (1/S) Σ_{s=1}^S Kδ {ρ(η(z_s(θ)), η(y))}


                                                           [consistent estimate of f (η|θ)]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NP (regression)


       Also found in the subsequent literature, e.g. in Fearnhead-Prangle (2012):
       weight each simulation directly by

                               Kδ {ρ(η(z(θ)), η(y))}

       or
                        (1/S) Σ_{s=1}^S Kδ {ρ(η(z_s(θ)), η(y))}


                                           [consistent estimate of f (η|θ)]
       Curse of dimensionality: poor estimate when d = dim(η) is large...
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NP (density estimation)


       Use of the kernel weights

                                          Kδ {ρ(η(z(θ)), η(y))}

       leads to the NP estimate of the posterior expectation

             Σ_i θ_i Kδ {ρ(η(z(θ_i)), η(y))} / Σ_i Kδ {ρ(η(z(θ_i)), η(y))}

                                                                       [Blum, JASA, 2010]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NP (density estimation)


       Use of the kernel weights

                                         Kδ {ρ(η(z(θ)), η(y))}

       leads to the NP estimate of the posterior conditional density
             Σ_i K̃_b(θ_i − θ) Kδ {ρ(η(z(θ_i)), η(y))} / Σ_i Kδ {ρ(η(z(θ_i)), η(y))}

                                                                   [Blum, JASA, 2010]
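
       In R, this weighted kernel density estimate can be sketched directly
       with density(); theta (accepted draws for one parameter) and w (the
       kernel weights above) are assumed:

        w=w/sum(w)                      # normalise: density() expects weights summing to one
        plot(density(theta,weights=w),main="ABC-NP posterior density estimate")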
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NP (density estimations)


       Other versions incorporating regression adjustments
             Σ_i K̃_b(θ_i* − θ) Kδ {ρ(η(z(θ_i)), η(y))} / Σ_i Kδ {ρ(η(z(θ_i)), η(y))}
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NP (density estimations)


       Other versions incorporating regression adjustments
             Σ_i K̃_b(θ_i* − θ) Kδ {ρ(η(z(θ_i)), η(y))} / Σ_i Kδ {ρ(η(z(θ_i)), η(y))}

       In all cases, error

            E[ĝ(θ|y)] − g(θ|y) = c b² + c δ² + O_P(b² + δ²) + O_P(1/nδ^d)

                    var(ĝ(θ|y)) = c/(n b δ^d) (1 + o_P(1))
                                                                   [Blum, JASA, 2010]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NP (density estimations)


       Other versions incorporating regression adjustments
             Σ_i K̃_b(θ_i* − θ) Kδ {ρ(η(z(θ_i)), η(y))} / Σ_i Kδ {ρ(η(z(θ_i)), η(y))}

       In all cases, error

            E[ĝ(θ|y)] − g(θ|y) = c b² + c δ² + O_P(b² + δ²) + O_P(1/nδ^d)

                    var(ĝ(θ|y)) = c/(n b δ^d) (1 + o_P(1))
                                                             [standard NP calculations]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NCH

       Incorporating non-linearities and heteroscedasticities:

                 θ∗ = m̂(η(y)) + [θ − m̂(η(z))] σ̂(η(y)) / σ̂(η(z))
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NCH

       Incorporating non-linearities and heteroscedasticities:

                 θ∗ = m̂(η(y)) + [θ − m̂(η(z))] σ̂(η(y)) / σ̂(η(z))

       where
              m̂(η) estimated by non-linear regression (e.g., neural network)
              σ̂(η) estimated by non-linear regression on the residuals

                      log{θ_i − m̂(η_i)}² = log σ²(η_i) + ξ_i

                                                    [Blum & François, 2009]
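
       A hedged R sketch of this adjustment, using the nnet package for both
       non-linear regressions; theta (a vector of accepted draws), S (their
       simulated summaries, as a matrix) and sobs (the observed summaries) are
       assumed to exist:

        library(nnet)
        mfit=nnet(S,theta,size=5,linout=TRUE,trace=FALSE)      # m̂(η)
        mz=drop(predict(mfit,S))
        my=drop(predict(mfit,matrix(sobs,nrow=1)))
        vfit=nnet(S,log((theta-mz)^2),size=5,linout=TRUE,trace=FALSE)  # log σ̂²(η)
        sz=exp(drop(predict(vfit,S))/2)                        # σ̂(η(z))
        sy=exp(drop(predict(vfit,matrix(sobs,nrow=1)))/2)      # σ̂(η(y))
        thetastar=my+(theta-mz)*sy/sz                          # adjusted draws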
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NCH (2)


       Why neural network?
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-NCH (2)


       Why neural network?
              fights curse of dimensionality
              selects relevant summary statistics
              provides automated dimension reduction
              offers a model choice capability
              improves upon multinomial logistic
                                                    [Blum & François, 2009]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-MCMC


       Markov chain (θ^(t)) created via the transition function

        θ^(t+1) = θ'      if θ' ∼ Kω(θ'|θ^(t)), x ∼ f (x|θ') is such that x = y,
                           and u ∼ U(0, 1) ≤ π(θ') Kω(θ^(t)|θ') / π(θ^(t)) Kω(θ'|θ^(t)) ,
                  θ^(t)   otherwise,
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-MCMC


       Markov chain (θ^(t)) created via the transition function

        θ^(t+1) = θ'      if θ' ∼ Kω(θ'|θ^(t)), x ∼ f (x|θ') is such that x = y,
                           and u ∼ U(0, 1) ≤ π(θ') Kω(θ^(t)|θ') / π(θ^(t)) Kω(θ'|θ^(t)) ,
                  θ^(t)   otherwise,

       has the posterior π(θ|y ) as stationary distribution
                                                      [Marjoram et al, 2003]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-MCMC (2)

       Algorithm 2 Likelihood-free MCMC sampler
           Use Algorithm 1 to get (θ^(0), z^(0))
           for t = 1 to N do
             Generate θ' from Kω(·|θ^(t−1)),
             Generate z' from the likelihood f (·|θ'),
             Generate u from U[0,1],
             if u ≤ [π(θ') Kω(θ^(t−1)|θ') / π(θ^(t−1)) Kω(θ'|θ^(t−1))] I_{A_{ε,y}}(z') then
                set (θ^(t), z^(t)) = (θ', z')
             else
                (θ^(t), z^(t)) = (θ^(t−1), z^(t−1)),
             end if
           end for
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Why does it work?

       Acceptance probability does not involve calculating the likelihood
       and

           π_ε(θ', z'|y) / π_ε(θ^(t−1), z^(t−1)|y)
               × q(θ^(t−1)|θ') f (z^(t−1)|θ^(t−1)) / q(θ'|θ^(t−1)) f (z'|θ')

             = [ π(θ') f (z'|θ') I_{A_{ε,y}}(z') / π(θ^(t−1)) f (z^(t−1)|θ^(t−1)) I_{A_{ε,y}}(z^(t−1)) ]
               × [ q(θ^(t−1)|θ') f (z^(t−1)|θ^(t−1)) / q(θ'|θ^(t−1)) f (z'|θ') ]
                                                   [the f (z'|θ') terms cancel]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Why does it work?

       Acceptance probability does not involve calculating the likelihood
       and

           π_ε(θ', z'|y) / π_ε(θ^(t−1), z^(t−1)|y)
               × q(θ^(t−1)|θ') f (z^(t−1)|θ^(t−1)) / q(θ'|θ^(t−1)) f (z'|θ')

             = [ π(θ') f (z'|θ') I_{A_{ε,y}}(z') / π(θ^(t−1)) f (z^(t−1)|θ^(t−1)) I_{A_{ε,y}}(z^(t−1)) ]
               × [ q(θ^(t−1)|θ') f (z^(t−1)|θ^(t−1)) / q(θ'|θ^(t−1)) f (z'|θ') ]
                        [the f (z'|θ') and f (z^(t−1)|θ^(t−1)) terms cancel]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Why does it work?

       Acceptance probability does not involve calculating the likelihood
       and

           π_ε(θ', z'|y) / π_ε(θ^(t−1), z^(t−1)|y)
               × q(θ^(t−1)|θ') f (z^(t−1)|θ^(t−1)) / q(θ'|θ^(t−1)) f (z'|θ')

             = [ π(θ') f (z'|θ') I_{A_{ε,y}}(z') / π(θ^(t−1)) f (z^(t−1)|θ^(t−1)) I_{A_{ε,y}}(z^(t−1)) ]
               × [ q(θ^(t−1)|θ') f (z^(t−1)|θ^(t−1)) / q(θ'|θ^(t−1)) f (z'|θ') ]

             = π(θ') q(θ^(t−1)|θ') / π(θ^(t−1)) q(θ'|θ^(t−1)) × I_{A_{ε,y}}(z')
                    [after cancelling the f terms and I_{A_{ε,y}}(z^(t−1)) = 1]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A toy example



       Case of
                           x ∼ 0.5 N (θ, 1) + 0.5 N (−θ, 1)
       under prior θ ∼ N (0, 10)
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A toy example



       Case of
                           x ∼ 0.5 N (θ, 1) + 0.5 N (−θ, 1)
       under prior θ ∼ N (0, 10)

       ABC sampler
        thetas=rnorm(N,sd=10)                       # simulate from the prior
        zed=sample(c(1,-1),N,rep=TRUE)*thetas+rnorm(N,sd=1)   # pseudo-data
        eps=quantile(abs(zed-x),.01)                # tolerance = 1% quantile
        abc=thetas[abs(zed-x)<eps]                  # accepted parameter values
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A toy example


       Case of
                           x ∼ 0.5 N (θ, 1) + 0.5 N (−θ, 1)
       under prior θ ∼ N (0, 10)

       ABC-MCMC sampler
        metas=rep(0,N)                              # reuses eps from the ABC sampler above
        metas[1]=rnorm(1,sd=10)
        zed[1]=x
        for (t in 2:N){
          metas[t]=rnorm(1,mean=metas[t-1],sd=5)               # random-walk proposal
          zed[t]=rnorm(1,mean=(1-2*(runif(1)<.5))*metas[t],sd=1)  # pseudo-data
          if ((abs(zed[t]-x)>eps)||(runif(1)>dnorm(metas[t],sd=10)/dnorm(metas[t-1],sd=10))){
           metas[t]=metas[t-1]                                 # reject: keep current state
           zed[t]=zed[t-1]}
          }
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A toy example


                                      x =2
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A toy example


                                   x = 2

                [Figure: density estimate of the ABC sample against θ]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A toy example


                                      x = 10
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A toy example


                                   x = 10

                [Figure: density estimate of the ABC sample against θ]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A toy example


                                      x = 50
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A toy example


                                   x = 50

                [Figure: density estimate of the ABC sample against θ]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABCµ


                          [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]

       Use of a joint density

                     f (θ, ε|y) ∝ ξ(ε|y, θ) × π_θ(θ) × π_ε(ε)

       where y is the data, and ξ(ε|y, θ) is the prior predictive density of
       ρ(η(z), η(y)) given θ and x when z ∼ f (z|θ)
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABCµ


                          [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]

       Use of a joint density

                     f (θ, ε|y) ∝ ξ(ε|y, θ) × π_θ(θ) × π_ε(ε)

       where y is the data, and ξ(ε|y, θ) is the prior predictive density of
       ρ(η(z), η(y)) given θ and x when z ∼ f (z|θ)
       Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel
       approximation.
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABCµ details

       Multidimensional distances ρ_k (k = 1, . . . , K ) and errors
       ε_k = ρ_k(η_k(z), η_k(y)), with

         ε_k ∼ ξ_k(ε|y, θ) ≈ ξ̂_k(ε|y, θ) = (1/Bh_k) Σ_b K[{ε_k − ρ_k(η_k(z_b), η_k(y))}/h_k]

       then used in replacing ξ(ε|y, θ) with min_k ξ̂_k(ε|y, θ)
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABCµ details

       Multidimensional distances ρ_k (k = 1, . . . , K ) and errors
       ε_k = ρ_k(η_k(z), η_k(y)), with

         ε_k ∼ ξ_k(ε|y, θ) ≈ ξ̂_k(ε|y, θ) = (1/Bh_k) Σ_b K[{ε_k − ρ_k(η_k(z_b), η_k(y))}/h_k]

       then used in replacing ξ(ε|y, θ) with min_k ξ̂_k(ε|y, θ)
       ABCµ involves acceptance probability

         [π(θ', ε') q(θ', θ) q(ε', ε) min_k ξ̂_k(ε'|y, θ')] / [π(θ, ε) q(θ, θ') q(ε, ε') min_k ξ̂_k(ε|y, θ)]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABCµ multiple errors




                                      [© Ratmann et al., PNAS, 2009]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABCµ for model choice




                                      [© Ratmann et al., PNAS, 2009]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Questions about ABCµ


       For each model under comparison, marginal posterior on ε used to
       assess the fit of the model (HPD includes 0 or not).
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Questions about ABCµ


       For each model under comparison, marginal posterior on ε used to
       assess the fit of the model (HPD includes 0 or not).
              Is the data informative about ε? [Identifiability]
              How is the prior π(ε) impacting the comparison?
              How is using both ξ(ε|x0, θ) and π_ε(ε) compatible with a
              standard probability model? [remindful of Wilkinson ]
              Where is the penalisation for complexity in the model
              comparison?
                                       [X, Mengersen & Chen, 2010, PNAS]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-PRC

       Another sequential version producing a sequence of Markov
       transition kernels Kt and of samples (θ_1^(t), . . . , θ_N^(t)) (1 ≤ t ≤ T )
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-PRC

       Another sequential version producing a sequence of Markov
       transition kernels Kt and of samples (θ_1^(t), . . . , θ_N^(t)) (1 ≤ t ≤ T )

       ABC-PRC Algorithm
         1. Select a θ' at random from the previous θ_i^(t−1)'s with probabilities
            ω_i^(t−1) (1 ≤ i ≤ N).
         2. Generate
                      θ_i^(t) ∼ Kt(θ|θ') ,   x ∼ f (x|θ_i^(t)) ,
         3. Check that ρ(x, y) < ε, otherwise start again.

                                                       [Sisson et al., 2007]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Why PRC?


       Partial rejection control: Resample from a population of weighted
       particles by pruning away particles with weights below threshold C ,
       replacing them by new particles obtained by propagating an
       existing particle by an SMC step and modifying the weights
       accordingly.
                                                                [Liu, 2001]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Why PRC?


       Partial rejection control: Resample from a population of weighted
       particles by pruning away particles with weights below threshold C ,
       replacing them by new particles obtained by propagating an
       existing particle by an SMC step and modifying the weights
       accordingly.
                                                                [Liu, 2001]
       PRC justification in ABC-PRC:
              Suppose we then implement the PRC algorithm for some
              c > 0 such that only identically zero weights are smaller
              than c
       Trouble is, there is no such c...
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-PRC weight


       Probability ω_i^(t) computed as

              ω_i^(t) ∝ π(θ_i^(t)) L_{t−1}(θ'|θ_i^(t)) {π(θ') Kt(θ_i^(t)|θ')}^{−1} ,

        where L_{t−1} is an arbitrary transition kernel.
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-PRC weight


       Probability ω_i^(t) computed as

              ω_i^(t) ∝ π(θ_i^(t)) L_{t−1}(θ'|θ_i^(t)) {π(θ') Kt(θ_i^(t)|θ')}^{−1} ,

        where L_{t−1} is an arbitrary transition kernel.
       In case
                     L_{t−1}(θ'|θ) = Kt(θ|θ') ,
       all weights are equal under a uniform prior.
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-PRC weight


       Probability ω_i^(t) computed as

              ω_i^(t) ∝ π(θ_i^(t)) L_{t−1}(θ'|θ_i^(t)) {π(θ') Kt(θ_i^(t)|θ')}^{−1} ,

        where L_{t−1} is an arbitrary transition kernel.
       In case
                     L_{t−1}(θ'|θ) = Kt(θ|θ') ,
       all weights are equal under a uniform prior.
       Inspired from Del Moral et al. (2006), who use backward kernels
       L_{t−1} in SMC to achieve unbiasedness
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-PRC bias

                            Lack of unbiasedness of the method
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-PRC bias

                                      Lack of unbiasedness of the method
       Joint density of the accepted pair (θ^(t−1), θ^(t)) proportional to

              π(θ^(t−1)|y) Kt(θ^(t)|θ^(t−1)) f (y|θ^(t)) ,

        For an arbitrary function h(θ), E[ω_t h(θ^(t))] proportional to

         ∫ h(θ^(t)) [π(θ^(t)) L_{t−1}(θ^(t−1)|θ^(t)) / π(θ^(t−1)) Kt(θ^(t)|θ^(t−1))]
               × π(θ^(t−1)|y) Kt(θ^(t)|θ^(t−1)) f (y|θ^(t)) dθ^(t−1) dθ^(t)
         ∝ ∫ h(θ^(t)) [π(θ^(t)) L_{t−1}(θ^(t−1)|θ^(t)) / π(θ^(t−1)) Kt(θ^(t)|θ^(t−1))] π(θ^(t−1)) f (y|θ^(t−1))
               × Kt(θ^(t)|θ^(t−1)) f (y|θ^(t)) dθ^(t−1) dθ^(t)
         ∝ ∫ h(θ^(t)) π(θ^(t)|y) [∫ L_{t−1}(θ^(t−1)|θ^(t)) f (y|θ^(t−1)) dθ^(t−1)] dθ^(t) .
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A mixture example (1)

       Toy model of Sisson et al. (2007): if

               θ ∼ U(−10, 10) ,       x|θ ∼ 0.5 N (θ, 1) + 0.5 N (θ, 1/100) ,

       then the posterior distribution associated with y = 0 is the normal
       mixture
                    θ|y = 0 ∼ 0.5 N (0, 1) + 0.5 N (0, 1/100)
       restricted to [−10, 10].
       Furthermore, true target available as

        π(θ | |x| < ε) ∝ Φ(ε − θ) − Φ(−ε − θ) + Φ(10(ε − θ)) − Φ(−10(ε + θ)) .
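
       For reference, this unnormalised target is immediate to code in R
       (here eps denotes the tolerance ε):

        target=function(theta,eps)
          pnorm(eps-theta)-pnorm(-eps-theta)+
          pnorm(10*(eps-theta))-pnorm(-10*(eps+theta))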
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


“Ugly, squalid graph...”
             [Figure: two rows of five posterior density estimates over θ ∈ (−3, 3)]
                         Comparison of τ = 0.15 and τ = 1/0.15 in Kt
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A PMC version
       Use of the same kernel idea as ABC-PRC but with IS correction
       [check PMC]
       Generate a sample at iteration t by

              π̂_t(θ^(t)) ∝ Σ_{j=1}^N ω_j^(t−1) Kt(θ^(t)|θ_j^(t−1))

       modulo acceptance of the associated x_t, and use an importance
       weight associated with an accepted simulation θ_i^(t)

              ω_i^(t) ∝ π(θ_i^(t)) / π̂_t(θ_i^(t)) .

        c Still likelihood free
                                                                          [Beaumont et al., 2009]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Our ABC-PMC algorithm
       Given a decreasing sequence of approximation levels ε_1 ≥ . . . ≥ ε_T ,

          1. At iteration t = 1,
                 For i = 1, ..., N
                        Simulate θ_i^(1) ∼ π(θ) and x ∼ f (x|θ_i^(1)) until ρ(x, y) < ε_1
                        Set ω_i^(1) = 1/N
                 Take τ² as twice the empirical variance of the θ_i^(1)'s
          2. At iteration 2 ≤ t ≤ T ,
                 For i = 1, ..., N, repeat
                        Pick θ_i* from the θ_j^(t−1)'s with probabilities ω_j^(t−1)
                        generate θ_i^(t)|θ_i* ∼ N (θ_i*, σ_t²) and x ∼ f (x|θ_i^(t))
                 until ρ(x, y) < ε_t
                 Set ω_i^(t) ∝ π(θ_i^(t)) / Σ_{j=1}^N ω_j^(t−1) ϕ( σ_t^{−1} (θ_i^(t) − θ_j^(t−1)) )
                 Take τ_{t+1}² as twice the weighted empirical variance of the θ_i^(t)'s
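
       A hedged R sketch of this ABC-PMC scheme for a scalar parameter;
       rprior, dprior, the simulator rdata (pseudo-data given θ), the distance
       rho (to the observed data) and the tolerance sequence eps are all
       assumed to be supplied by the user:

        abc.pmc=function(N,eps,rprior,dprior,rdata,rho){
          theta=numeric(N);w=rep(1/N,N)
          for (i in 1:N)                             # t=1: ABC rejection from the prior
            repeat{theta[i]=rprior(1);if (rho(rdata(theta[i]))<=eps[1]) break}
          tau2=2*var(theta)
          for (t in 2:length(eps)){
            old=theta;oldw=w
            for (i in 1:N){
              repeat{                                # move a picked particle until acceptance
                theta[i]=rnorm(1,old[sample(N,1,prob=oldw)],sqrt(tau2))
                if (rho(rdata(theta[i]))<=eps[t]) break}
              w[i]=dprior(theta[i])/sum(oldw*dnorm(theta[i],old,sqrt(tau2)))}
            w=w/sum(w)
            tau2=2*sum(w*(theta-sum(w*theta))^2)     # twice the weighted variance
          }
          list(theta=theta,weights=w)
        }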
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Sequential Monte Carlo

       SMC is a simulation technique to approximate a sequence of
       related probability distributions π_n with π_0 “easy” and π_T as
       target.
       Iterated IS as PMC: particles moved from time n−1 to time n via
       kernel K_n and use of a sequence of extended targets π̃_n

              π̃_n(z_{0:n}) = π_n(z_n) Π_{j=0}^{n−1} L_j(z_{j+1}, z_j)

       where the L_j’s are backward Markov kernels [check that π_n(z_n) is a
       marginal]
                               [Del Moral, Doucet & Jasra, Series B, 2006]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Sequential Monte Carlo (2)

       Algorithm 3 SMC sampler
           sample z_i^(0) ∼ γ_0(x) (i = 1, . . . , N)
           compute weights w_i^(0) = π_0(z_i^(0))/γ_0(z_i^(0))
           for t = 1 to N do
             if ESS(w^(t−1)) < N_T then
                resample N particles z^(t−1) and set weights to 1
             end if
             generate z_i^(t) ∼ Kt(z_i^(t−1), ·) and set weights to

                w_i^(t) = w_i^(t−1) π_t(z_i^(t)) L_{t−1}(z_i^(t), z_i^(t−1)) / π_{t−1}(z_i^(t−1)) Kt(z_i^(t−1), z_i^(t))

           end for
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-SMC
                                       [Del Moral, Doucet & Jasra, 2009]

       True derivation of an SMC-ABC algorithm
       Use of a kernel K_n associated with target π_{ε_n} and derivation of the
       backward kernel

              L_{n−1}(z, z') = π_{ε_n}(z') K_n(z', z) / π_{ε_n}(z)

       Update of the weights

              w_{in} ∝ w_{i(n−1)} Σ_{m=1}^M I_{A_{ε_n}}(x_{in}^m) / Σ_{m=1}^M I_{A_{ε_{n−1}}}(x_{i(n−1)}^m)

       when x_{in}^m ∼ K(x_{i(n−1)}, ·)
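
       In R, this weight update is just the ratio of acceptance proportions
       over the M replicates; rho.new and rho.old (distances of the M
       pseudo-data sets to y at levels n and n−1) are assumed:

        w.update=function(wprev,rho.new,rho.old,eps.new,eps.old)
          wprev*mean(rho.new<=eps.new)/mean(rho.old<=eps.old)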
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-SMCM



       Modification: makes M repeated simulations of the pseudo-data z
       given the parameter, rather than using a single [M = 1] simulation,
       leading to a weight proportional to the number of accepted z_i's,

                      ω(θ) = (1/M) Σ_{i=1}^M I_{ρ(η(y),η(z_i)) < ε}

       [limit in M means exact simulation from (tempered) target]
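
       In R, this weight is the acceptance frequency over the M replicate
       pseudo-data sets; rdata, the summary eta, the distance rho and the
       observed y are assumed:

        smcm.weight=function(theta,M,eps,rdata,eta,rho,y)
          mean(replicate(M,rho(eta(rdata(theta)),eta(y))<=eps))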
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Properties of ABC-SMC


       The ABC-SMC method properly uses a backward kernel L(z, z ) to
       simplify the importance weight and to remove the dependence on
       the unknown likelihood from this weight. Update of importance
       weights is reduced to the ratio of the proportions of surviving
       particles
       Major assumption: the forward kernel K is supposed to be invariant
       against the true target [tempered version of the true posterior]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Properties of ABC-SMC


       The ABC-SMC method properly uses a backward kernel L(z, z ) to
       simplify the importance weight and to remove the dependence on
       the unknown likelihood from this weight. Update of importance
       weights is reduced to the ratio of the proportions of surviving
       particles
       Major assumption: the forward kernel K is supposed to be invariant
       against the true target [tempered version of the true posterior]
       Adaptivity in ABC-SMC algorithm only found in on-line
       construction of the thresholds ε_t, slowly enough to keep a large
       number of accepted transitions
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A mixture example (2)
       Recovery of the target, whether using a fixed standard deviation of
       τ = 0.15 or τ = 1/0.15, or a sequence of adaptive τt ’s.
             [Figure: five panels of posterior density estimates over θ ∈ (−3, 3)]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Yet another ABC-PRC
       Another version of ABC-PRC called PRC-ABC with a proposal
       distribution Mt , S replications of the pseudo-data, and a PRC step:
       PRC-ABC Algorithm
         1. Select θ' at random from the θ_i^(t−1)'s with probabilities ω_i^(t−1)
         2. Generate θ_i^(t) ∼ Kt, x_1, . . . , x_S ∼ iid f (x|θ_i^(t−1)), set

                   ω_i^(t) = Σ_s Kt{η(x_s) − η(y)} / Kt(θ_i^(t)) ,

            and accept with probability p(i) = ω_i^(t)/c_{t,N}
         3. Set ω_i^(t) ∝ ω_i^(t)/p(i) (1 ≤ i ≤ N)

                              [Peters, Sisson & Fan, Stat & Computing, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Yet another ABC-PRC



       Another version of ABC-PRC called PRC-ABC with a proposal
       distribution Mt , S replications of the pseudo-data, and a PRC step:
              Use of a NP estimate Mt(θ) = Σ_i ω_i^(t−1) ψ(θ|θ_i^(t−1)) (with a
              fixed variance?!)
              S = 1 recommended for “greatest gains in sampler
              performance”
              ct,N upper quantile of non-zero weights at time t
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Wilkinson’s exact BC
       ABC approximation error (i.e. non-zero tolerance) replaced with
       exact simulation from a controlled approximation to the target,
       convolution of true posterior with kernel function

              π_ε(θ, z|y) = π(θ) f (z|θ) K_ε(y − z) / ∫ π(θ) f (z|θ) K_ε(y − z) dz dθ ,

       with K_ε kernel parameterised by bandwidth ε.
                                                        [Wilkinson, 2008]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Wilkinson’s exact BC
       ABC approximation error (i.e. non-zero tolerance) replaced with
       exact simulation from a controlled approximation to the target,
       convolution of true posterior with kernel function
                π_ε(θ, z|y) = π(θ) f(z|θ) K_ε(y − z) / ∫∫ π(θ) f(z|θ) K_ε(y − z) dz dθ ,

       with K_ε a kernel parameterised by bandwidth ε.
                                                         [Wilkinson, 2008]
       Theorem
       The ABC algorithm based on the assumption of a randomised
       observation ỹ = y + ξ, ξ ∼ K_ε, and an acceptance probability of

                                      K_ε(y − z)/M

       gives draws from the posterior distribution π(θ|y).
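
       A minimal Python sketch of this exact-BC rejection step, under assumed toy
       choices (Gaussian prior and likelihood, Gaussian kernel K_ε bounded by
       M = K_ε(0)); every name and setting below is illustrative, not part of
       Wilkinson's formulation.

       import numpy as np

       rng = np.random.default_rng(0)
       y_obs, eps, n_draws = 0.5, 0.3, 2000

       def k_eps(u):                        # kernel K_eps, here a N(0, eps^2) density
           return np.exp(-0.5 * (u / eps) ** 2) / (eps * np.sqrt(2 * np.pi))

       M = k_eps(0.0)                       # bound on K_eps, attained at u = 0
       draws = []
       while len(draws) < n_draws:
           theta = rng.normal(0.0, 1.0)     # theta ~ pi(theta)
           z = rng.normal(theta, 1.0)       # pseudo-data z ~ f(z | theta)
           if rng.uniform() < k_eps(y_obs - z) / M:   # accept w.p. K_eps(y - z)/M
               draws.append(theta)

       # draws are exact from the convolved target pi_eps(theta | y), i.e. the
       # posterior under the assumption y = z + xi with xi ~ K_eps
       print(np.mean(draws), np.std(draws))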
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


How exact a BC?




              “Using ε to represent measurement error is
              straightforward, whereas using ε to model the model
              discrepancy is harder to conceptualize and not as
              commonly used”
                                                [Richard Wilkinson, 2008]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


How exact a BC?
       Pros
              Pseudo-data from true model and observed data from noisy
              model
              Interesting perspective in that outcome is completely
              controlled
              Link with ABCµ and assuming y is observed with a
              measurement error with density K_ε
              Relates to the theory of model approximation
                                                    [Kennedy & O’Hagan, 2001]
       Cons
              Requires K to be bounded by M
              True approximation error never assessed
              Requires a modification of the standard ABC algorithm
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC for HMMs


       Specific case of a hidden         Markov model



                                      Xt+1 ∼ Qθ (Xt , ·)
                                      Yt+1 ∼ gθ (·|xt )
       where only y^0_{1:n} is observed.
                                            [Dean, Singh, Jasra & Peters, 2011]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC for HMMs


       Specific case of a hidden          Markov model



                                       Xt+1 ∼ Qθ (Xt , ·)
                                       Yt+1 ∼ gθ (·|xt )
       where only y^0_{1:n} is observed.
                                             [Dean, Singh, Jasra & Peters, 2011]
       Use of specific constraints, adapted to the Markov structure:
                              y_1 ∈ B(y^0_1, ε) × · · · × y_n ∈ B(y^0_n, ε)
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-MLE for HMMs

       ABC-MLE defined by
                      θ̂_n = arg max_θ P_θ( Y_1 ∈ B(y^0_1, ε), . . . , Y_n ∈ B(y^0_n, ε) )
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-MLE for HMMs

       ABC-MLE defined by
                      θ̂_n = arg max_θ P_θ( Y_1 ∈ B(y^0_1, ε), . . . , Y_n ∈ B(y^0_n, ε) )


       Exact MLE for the likelihood                      same basis as Wilkinson!

                                  p^ε_θ(y^0_1, . . . , y^0_n)

       corresponding to the perturbed process

                          (x_t, y_t + εz_t)_{1≤t≤n} ,      z_t ∼ U(B(0, 1))

                                               [Dean, Singh, Jasra & Peters, 2011]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-MLE is biased


              ABC-MLE is asymptotically (in n) biased with target

                             l_ε(θ) = E_{θ∗}[ log p^ε_θ(Y_1 | Y_{−∞:0}) ]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


ABC-MLE is biased


              ABC-MLE is asymptotically (in n) biased with target

                             l_ε(θ) = E_{θ∗}[ log p^ε_θ(Y_1 | Y_{−∞:0}) ]


              but ABC-MLE converges to the true value in the sense

                                      l_{ε_n}(θ_n) → l_ε(θ)

              for all sequences (θ_n) converging to θ and ε_n
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Noisy ABC-MLE


       Idea: Modify instead the data from the start
                              (y^0_1 + ζ_1, . . . , y^0_n + ζ_n)

                                                              [see Fearnhead-Prangle]
       noisy ABC-MLE estimate

              arg max_θ P_θ( Y_1 ∈ B(y^0_1 + ζ_1, ε), . . . , Y_n ∈ B(y^0_n + ζ_n, ε) )

                                                [Dean, Singh, Jasra & Peters, 2011]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Consistent noisy ABC-MLE



              Degrading the data improves the estimation performances:
                      Noisy ABC-MLE is asymptotically (in n) consistent
                      under further assumptions, the noisy ABC-MLE is
                      asymptotically normal
                      increase in variance of order ε^{−2}
              likely degradation in precision or computing time due to the
              lack of summary statistic
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


SMC for ABC likelihood

       Algorithm 4 SMC ABC for HMMs
         Given θ
         for k = 1, . . . , n do
            generate proposals (x^1_k, y^1_k), . . . , (x^N_k, y^N_k) from the model
            weigh each proposal with ω^l_k = I_{B(y^0_k + ζ_k, ε)}(y^l_k)
            renormalise the weights and sample the x^l_k's accordingly
          end for
          approximate the likelihood by

                                 ∏_{k=1}^n ( Σ_{l=1}^N ω^l_k / N )



                                           [Jasra, Singh, Martin & McCoy, 2010]
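
       A short sketch of this SMC approximation to the ABC likelihood, on an assumed
       toy Gaussian random-walk HMM (the jitter ζ_k of the noisy version is omitted
       here); the model, tolerance and particle number are illustrative only.

       import numpy as np

       rng = np.random.default_rng(1)

       def abc_loglik(theta, y_obs, eps, N=500):
           """SMC approximation of the ABC (log-)likelihood for a toy HMM."""
           x = np.zeros(N)                              # particles for the hidden state
           loglik = 0.0
           for k in range(len(y_obs)):
               x = x + rng.normal(0.0, theta, N)        # X_{k+1} ~ Q_theta(X_k, .)
               y = x + rng.normal(0.0, 1.0, N)          # Y_{k+1} ~ g_theta(. | x)
               w = (np.abs(y - y_obs[k]) < eps).astype(float)   # indicator weights
               if w.sum() == 0.0:
                   return -np.inf                       # every particle misses the ball
               loglik += np.log(w.mean())               # factor (1/N) sum_l w_k^l
               x = x[rng.choice(N, N, p=w / w.sum())]   # resample surviving x_k^l's
           return loglik

       y_obs = np.cumsum(rng.normal(0.0, 1.0, 20)) + rng.normal(0.0, 1.0, 20)
       print(abc_loglik(1.0, y_obs, eps=0.5))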
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Which summary?


       Fundamental difficulty of the choice of the summary statistic when
       there is no non-trivial sufficient statistic [except when done by the
       experimenters in the field]
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Which summary?


       Fundamental difficulty of the choice of the summary statistic when
       there is no non-trivial sufficient statistic [except when done by the
       experimenters in the field]
       Starting from a large collection of available summary statistics,
       Joyce and Marjoram (2008) consider the sequential inclusion into
       the ABC target, with a stopping rule based on a likelihood ratio
       test.
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


Which summary?


       Fundamental difficulty of the choice of the summary statistic when
       there is no non-trivial sufficient statistic [except when done by the
       experimenters in the field]
       Starting from a large collection of available summary statistics,
       Joyce and Marjoram (2008) consider the sequential inclusion into
       the ABC target, with a stopping rule based on a likelihood ratio
       test.
              Does not take into account the sequential nature of the tests
              Depends on parameterisation
              Order of inclusion matters.
Likelihood-free Methods
   Approximate Bayesian computation
     Alphabet soup


A connected Monte Carlo study



              Repeating simulations of the pseudo-data per simulated
              parameter does not improve the approximation
              Tolerance level ε does not seem to be highly influential
              Choice of distance / summary statistics / calibration factors
              are paramount to successful approximation
              ABC-SMC outperforms ABC-MCMC
                                                      [McKinley, Cook, Deardon, 2009]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Semi-automatic ABC

       Fearnhead and Prangle (2010) study ABC and the selection of the
       summary statistic in close proximity to Wilkinson’s proposal
       ABC then considered from a purely inferential viewpoint and
       calibrated for estimation purposes.
       Use of a randomised (or ‘noisy’) version of the summary statistics

                                              η̃(y) = η(y) + τε

       Derivation of a well-calibrated version of ABC, i.e. an algorithm
       that gives proper predictions for the distribution associated with
       this randomised summary statistic.
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Semi-automatic ABC

       Fearnhead and Prangle (2010) study ABC and the selection of the
       summary statistic in close proximity to Wilkinson’s proposal
       ABC then considered from a purely inferential viewpoint and
       calibrated for estimation purposes.
       Use of a randomised (or ‘noisy’) version of the summary statistics

                                              η̃(y) = η(y) + τε

       Derivation of a well-calibrated version of ABC, i.e. an algorithm
       that gives proper predictions for the distribution associated with
       this randomised summary statistic. [calibration constraint: ABC
       approximation with same posterior mean as the true randomised
       posterior.]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Summary statistics



       Optimality of the posterior expectation E[θ|y] of the parameter of
       interest as summary statistics η(y)!
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Summary statistics



       Optimality of the posterior expectation E[θ|y] of the parameter of
       interest as summary statistics η(y)!
       Use of the standard quadratic loss function

                                              (θ − θ_0)ᵀ A (θ − θ_0) .
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Details on Fearnhead and Prangle (FP) ABC

       Use of a summary statistic S(·), an importance proposal g (·), a
       kernel K(·) ≤ 1 and a bandwidth h > 0 such that

                                         (θ, ysim ) ∼ g (θ)f (ysim |θ)

       is accepted with probability (hence the bound)

                                             K [{S(ysim ) − sobs }/h]

       and the corresponding importance weight defined by

                                              π(θ)/g(θ)

                                                              [Fearnhead & Prangle, 2012]
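
       A minimal sketch of this importance-sampling form of ABC (uniform kernel K
       bounded by 1, Gaussian toy model and proposal); all names and settings are
       assumptions made for illustration only.

       import numpy as np

       rng = np.random.default_rng(2)
       s_obs, h, T = 0.8, 0.2, 20000

       def norm_pdf(x, m, s):
           return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

       thetas, weights = [], []
       for _ in range(T):
           theta = rng.normal(1.0, 1.0)                 # theta ~ g(theta), the proposal
           y_sim = rng.normal(theta, 1.0, size=10)      # y_sim ~ f(. | theta)
           s_sim = y_sim.mean()                         # summary S(y_sim)
           if abs(s_sim - s_obs) <= h:                  # accept w.p. K[{S(y_sim) - s_obs}/h]
               thetas.append(theta)
               weights.append(norm_pdf(theta, 0.0, 2.0) / norm_pdf(theta, 1.0, 1.0))  # pi/g

       thetas, weights = np.array(thetas), np.array(weights)
       print(len(thetas), np.sum(weights * thetas) / np.sum(weights))  # weighted mean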
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Errors, errors, and errors

       Three levels of approximation
              π(θ|yobs ) by π(θ|sobs ) loss of information
                                                                                [ignored]
              π(θ|sobs ) by

                 π_ABC(θ|sobs) = ∫ π(s) K[{s − sobs}/h] π(θ|s) ds  /  ∫ π(s) K[{s − sobs}/h] ds

              noisy observations
              πABC (θ|sobs ) by importance Monte Carlo based on N
              simulations, represented by var(a(θ)|sobs )/Nacc [expected
              number of acceptances]
                                                               [M. Twain/B. Disraeli]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Average acceptance asymptotics


       For the average acceptance probability/approximate likelihood

                  p(θ|sobs) = ∫ f(ysim|θ) K[{S(ysim) − sobs}/h] dysim ,

       overall acceptance probability

                    p(sobs) = ∫ p(θ|sobs) π(θ) dθ = π(sobs) h^d + o(h^d)

                                                                              [FP, Lemma 1]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Optimal importance proposal


       Best choice of importance proposal in terms of effective sample size

                                  g(θ|sobs) ∝ π(θ) p(θ|sobs)^{1/2}

                                                    [Not particularly useful in practice]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Optimal importance proposal


       Best choice of importance proposal in terms of effective sample size

                                  g(θ|sobs) ∝ π(θ) p(θ|sobs)^{1/2}

                                                    [Not particularly useful in practice]

              note that p(θ|sobs ) is an approximate likelihood
              reminiscent of parallel tempering
              could be approximately achieved by attrition of half of the
              data
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Calibration of h

              “This result gives insight into how S(·) and h affect the Monte
              Carlo error. To minimize Monte Carlo error, we need h^d to be not
              too small. Thus ideally we want S(·) to be a low dimensional
              summary of the data that is sufficiently informative about θ that
              π(θ|sobs ) is close, in some sense, to π(θ|yobs )” (FP, p.5)


              turns h into an absolute value while it should be
              context-dependent and user-calibrated
              only addresses one term in the approximation error and
              acceptance probability (“curse of dimensionality”)
              h large prevents π_ABC(θ|sobs) from being close to π(θ|sobs)
              d small prevents π(θ|sobs) from being close to π(θ|yobs) (“curse of
              [dis]information”)
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Calibrating ABC

              “If πABC is calibrated, then this means that probability statements
              that are derived from it are appropriate, and in particular that we
              can use πABC to quantify uncertainty in estimates” (FP, p.5)
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Calibrating ABC

              “If πABC is calibrated, then this means that probability statements
              that are derived from it are appropriate, and in particular that we
              can use πABC to quantify uncertainty in estimates” (FP, p.5)


       Definition
       For 0 < q < 1 and subset A, event Eq(A) made of sobs such that
       PrABC (θ ∈ A|sobs ) = q. Then ABC is calibrated if

                                             Pr(θ ∈ A|Eq (A)) = q



              unclear meaning of conditioning on Eq (A)
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Calibrated ABC



       Theorem (FP)
       Noisy ABC, where

                                  sobs = S(yobs) + hε ,     ε ∼ K(·)

       is calibrated
                                                                       [Wilkinson, 2008]
                                       no condition on h!!
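
       In practice the noisy-ABC step only requires jittering the observed summary
       once before the usual accept/reject loop; a toy sketch (Gaussian model,
       uniform kernel K, illustrative settings) follows.

       import numpy as np

       rng = np.random.default_rng(3)
       h = 0.2
       y_obs = rng.normal(1.0, 1.0, 10)
       s_obs = y_obs.mean()                               # S(y_obs)
       s_noisy = s_obs + h * rng.uniform(-1.0, 1.0)       # s_obs + h*eps with eps ~ K

       accepted = []
       for _ in range(50000):
           theta = rng.normal(0.0, 2.0)                   # theta ~ pi(theta)
           s_sim = rng.normal(theta, 1.0, 10).mean()
           if abs(s_sim - s_noisy) <= h:                  # same kernel K and bandwidth h
               accepted.append(theta)

       print(len(accepted), np.mean(accepted))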
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Calibrated ABC




       Consequence: when h = ∞

       Theorem (FP)
       The prior distribution is always calibrated

       is this a relevant property then?
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


More about calibrated ABC


              “Calibration is not universally accepted by Bayesians. It is even more
              questionable here as we care how statements we make relate to the
              real world, not to a mathematically defined posterior.” R. Wilkinson


              Same reluctance about the prior being calibrated
              Property depending on prior, likelihood, and summary
              Calibration is a frequentist property (almost a p-value!)
              More sensible to account for the simulator’s imperfections
              than using noisy-ABC against a meaningless ε-based measure
                                                                    [Wilkinson, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Converging ABC


       Theorem (FP)
       For noisy ABC, the expected noisy-ABC log-likelihood,

           E{log[p(θ|sobs)]} = ∫∫ log[p(θ|S(yobs) + ε)] π(yobs|θ0) K(ε) dyobs dε ,

       has its maximum at θ = θ0 .

       True for any choice of summary statistic? even ancillary statistics?!
                                        [Imposes at least identifiability...]
       Relevant in asymptotia and not for the data
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Converging ABC


       Corollary
       For noisy ABC, the ABC posterior converges onto a point mass on
       the true parameter value as m → ∞.

       For standard ABC, not always the case (unless h goes to zero).
       Strength of regularity conditions (c1) and (c2) in Bernardo
        Smith, 1994?
                     [out-of-reach constraints on likelihood and posterior]
       Again, there must be conditions imposed upon summary
       statistics...
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Loss motivated statistic
       Under quadratic loss function,

       Theorem (FP)
         (i) The minimal posterior error E[L(θ, θ̂)|yobs] occurs when
             θ̂ = E(θ|yobs) (!)
        (ii) When h → 0, E_ABC(θ|sobs) converges to E(θ|yobs)
       (iii) If S(yobs) = E[θ|yobs] then for θ̂ = E_ABC[θ|sobs]

                  E[L(θ, θ̂)|yobs] = trace(AΣ) + h² ∫ xᵀA x K(x) dx + o(h²).

       measure-theoretic difficulties?
       dependence of sobs on h makes me uncomfortable: inherent to noisy
       ABC
       Relevant for choice of K ?
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Optimal summary statistic
              “We take a different approach, and weaken the requirement for
              πABC to be a good approximation to π(θ|yobs ). We argue for πABC
              to be a good approximation solely in terms of the accuracy of
              certain estimates of the parameters.” (FP, p.5)

       From this result, FP
              derive their choice of summary statistic,

                                             S(y) = E(θ|y)

                                                                      [almost sufficient]
              suggest

                            h = O(N −1/(2+d) ) and h = O(N −1/(4+d) )

              as optimal bandwidths for noisy and standard ABC.
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Optimal summary statistic
              “We take a different approach, and weaken the requirement for
              πABC to be a good approximation to π(θ|yobs ). We argue for πABC
              to be a good approximation solely in terms of the accuracy of
              certain estimates of the parameters.” (FP, p.5)

       From this result, FP
              derive their choice of summary statistic,

                                             S(y) = E(θ|y)

                                                    [wow! EABC [θ|S(yobs )] = E[θ|yobs ]]
              suggest

                            h = O(N −1/(2+d) ) and h = O(N −1/(4+d) )

              as optimal bandwidths for noisy and standard ABC.
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Caveat


       Since E(θ|yobs) is usually unavailable, FP suggest
         (i) use a pilot run of ABC to determine a region of non-negligible
             posterior mass;
        (ii) simulate sets of parameter values and data;
       (iii) use the simulated sets of parameter values and data to
             estimate the summary statistic; and
       (iv) run ABC with this choice of summary statistic.
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Caveat


       Since E(θ|yobs) is usually unavailable, FP suggest
         (i) use a pilot run of ABC to determine a region of non-negligible
             posterior mass;
        (ii) simulate sets of parameter values and data;
       (iii) use the simulated sets of parameter values and data to
             estimate the summary statistic; and
       (iv) run ABC with this choice of summary statistic.
       where is the assessment of the first stage error?
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Approximating the summary statistic



       As in Beaumont et al. (2002) and Blum and François (2010), FP
       use a linear regression to approximate E(θ|yobs):

                          θ_i = β_0^(i) + β^(i) f(yobs) + ε_i

       with f made of standard transforms
                                                                 [Further justifications?]
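
       A sketch of the pilot regression behind the semi-automatic summary, with
       made-up transforms f(y) (mean, variance, third moment); the fitted linear
       predictor then serves as S(y) ≈ E[θ|y] in a second ABC run. Settings are
       illustrative only.

       import numpy as np

       rng = np.random.default_rng(4)
       M = 5000
       theta = rng.normal(0.0, 2.0, M)                    # pilot draws from the prior
       y = rng.normal(theta[:, None], 1.0, (M, 10))       # matching pseudo-data sets

       # transforms f(y): intercept, mean, variance and third empirical moment
       F = np.column_stack([np.ones(M), y.mean(1), y.var(1), (y ** 3).mean(1)])
       beta, *_ = np.linalg.lstsq(F, theta, rcond=None)   # least-squares regression

       def S(y_new):
           """Estimated summary statistic: linear approximation of E[theta | y]."""
           f = np.array([1.0, y_new.mean(), y_new.var(), (y_new ** 3).mean()])
           return f @ beta

       y_obs = rng.normal(1.0, 1.0, 10)
       print(S(y_obs))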
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


[my]questions about semi-automatic ABC

              dependence on h and S(·) in the early stage
              reduction of Bayesian inference to point estimation
              approximation error in step (i) not accounted for
              not parameterisation invariant
              practice shows that proper approximation to genuine posterior
              distributions stems from using a (much) larger number of
              summary statistics than the dimension of the parameter
              the validity of the approximation to the optimal summary
              statistic depends on the quality of the pilot run
              important inferential issues like model choice are not covered
              by this approach.
                                                              [Robert, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


More about semi-automatic ABC
                                             [End of section derived from comments on Read Paper, to appear]

              “The apparently arbitrary nature of the choice of summary statistics
              has always been perceived as the Achilles heel of ABC.” M.
              Beaumont
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


More about semi-automatic ABC
                                             [End of section derived from comments on Read Paper, to appear]

              “The apparently arbitrary nature of the choice of summary statistics
              has always been perceived as the Achilles heel of ABC.” M.
              Beaumont

              “Curse of dimensionality” linked with the increase of the
              dimension of the summary statistic
              Connection with principal component analysis
                                                                                       [Itan et al., 2010]
              Connection with partial least squares
                                                                               [Wegman et al., 2009]
              Beaumont et al. (2002) postprocessed output is used as input
              by FP to run a second ABC
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Wood’s alternative
       Instead of a non-parametric kernel approximation to the likelihood
                          (1/R) Σ_r K_ε{η(y_r) − η(yobs)}

       Wood (2010) suggests a normal approximation

                                             η(y(θ)) ∼ Nd (µθ , Σθ )

       whose parameters can be approximated based on the R simulations
       (for each value of θ).
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Wood’s alternative
       Instead of a non-parametric kernel approximation to the likelihood
                          (1/R) Σ_r K_ε{η(y_r) − η(yobs)}

       Wood (2010) suggests a normal approximation

                                             η(y(θ)) ∼ Nd (µθ , Σθ )

       whose parameters can be approximated based on the R simulations
       (for each value of θ).
              Parametric versus non-parametric rate [Uh?!]
              Automatic weighting of components of η(·) through Σθ
              Dependence on normality assumption (pseudo-likelihood?)
                                                   [Cornebise, Girolami & Kosmidis, 2012]
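
       A sketch of this normal (synthetic-likelihood) alternative: for a given θ,
       simulate R replicates, fit N_d(µ_θ, Σ_θ) to their summaries η(·), and evaluate
       the fitted log-density at η(yobs); the model and summaries below are assumed
       toy choices, not Wood's original example.

       import numpy as np

       rng = np.random.default_rng(5)

       def eta(y):                                        # d-dimensional summary
           return np.array([y.mean(), y.std(), np.abs(np.diff(y)).mean()])

       def synthetic_loglik(theta, y_obs, R=200, n=50):
           sims = np.array([eta(rng.normal(theta, 1.0, n)) for _ in range(R)])
           mu = sims.mean(axis=0)                         # estimate of mu_theta
           Sigma = np.cov(sims, rowvar=False)             # estimate of Sigma_theta
           diff = eta(y_obs) - mu
           _, logdet = np.linalg.slogdet(Sigma)
           quad = diff @ np.linalg.solve(Sigma, diff)
           return -0.5 * (logdet + quad + len(mu) * np.log(2 * np.pi))

       y_obs = rng.normal(1.0, 1.0, 50)
       for t in (0.0, 0.5, 1.0, 1.5):
           print(t, synthetic_loglik(t, y_obs))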
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Reinterpretation and extensions
       Reinterpretation of ABC output as joint simulation from

                              π̄(x, y|θ) = f(x|θ) π̄_{Y|X}(y|x)

       where
                                  π̄_{Y|X}(y|x) = K_ε(y − x)
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Reinterpretation and extensions
       Reinterpretation of ABC output as joint simulation from

                              π̄(x, y|θ) = f(x|θ) π̄_{Y|X}(y|x)

       where
                                  π̄_{Y|X}(y|x) = K_ε(y − x)

       Reinterpretation of noisy ABC
       if ȳ | yobs ∼ π̄_{Y|X}(·|yobs), then marginally

                                     ȳ ∼ π̄_{Y|θ}(·|θ0)

      c Explains the consistency of Bayesian inference based on ȳ and π̄
                                           [Lee, Andrieu & Doucet, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Reinterpretation and extensions
       Also fits within the pseudo-marginal of Andrieu and Roberts (2009)

        Algorithm 5 Rejuvenating GIMH-ABC
          At time t, given θt = θ:
          Sample θ′ ∼ g(·|θ).
          Sample z_{1:N} ∼ π^{⊗N}_{Y|Θ}(·|θ′).
          Sample x_{1:N−1} ∼ π^{⊗(N−1)}_{Y|Θ}(·|θ).
          With probability

             min{ 1, [π(θ′)g(θ|θ′) / π(θ)g(θ′|θ)] × [Σ_{i=1}^N I_{B_h(z_i)}(y)] / [1 + Σ_{i=1}^{N−1} I_{B_h(x_i)}(y)] }

          set θ_{t+1} = θ′.
          Otherwise set θ_{t+1} = θ.
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Reinterpretation and extensions
       Also fits within the pseudo-marginal of Andrieu and Roberts (2009)

        Algorithm 6 One-hit MCMC-ABC
          At time t, with θt = θ:
          Sample θ′ ∼ g(·|θ).
          With probability 1 − min{ 1, π(θ′)g(θ|θ′) / π(θ)g(θ′|θ) }, set θ_{t+1} = θ
          and go to time t + 1.
          Sample z_i ∼ π_{Y|Θ}(·|θ′) and x_i ∼ π_{Y|Θ}(·|θ) for i = 1, . . . until
          y ∈ B_h(z_i) and/or y ∈ B_h(x_i).
          If y ∈ B_h(z_i), set θ_{t+1} = θ′ and go to time t + 1.
          If y ∈ B_h(x_i), set θ_{t+1} = θ and go to time t + 1.


                                              [Lee, Andrieu & Doucet, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Reinterpretation and extensions


       Also fits within the pseudo-marginal of Andrieu and Roberts (2009)

       Validation
        c The invariant distribution of θ is

                                              π_{θ|Ȳ}(·|ȳ)

       for both algorithms.

                                                      [Lee, Andrieu & Doucet, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


ABC for Markov chains

       Rewriting the posterior as

                     π(θ)^{1−n} π(θ|x_1) ∏_{t=2}^n π(θ|x_{t−1}, x_t)

       where π(θ|x_{t−1}, x_t) ∝ f(x_t|x_{t−1}, θ) π(θ)
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


ABC for Markov chains

       Rewriting the posterior as

                     π(θ)^{1−n} π(θ|x_1) ∏_{t=2}^n π(θ|x_{t−1}, x_t)

       where π(θ|x_{t−1}, x_t) ∝ f(x_t|x_{t−1}, θ) π(θ)
              Allows for a stepwise ABC, replacing each π(θ|xt−1 , xt ) by an
              ABC approximation
              Similarity with FP’s multiple sources of data (and also with
                Dean et al., 2011 )


                                                         [White et al., 2010, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Back to sufficiency
       Difference between regular sufficiency, equivalent to
                                             π(θ|y) = π(θ|η(y))
       for all θ’s and all priors π, and
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Back to sufficiency
       Difference between regular sufficiency, equivalent to
                                             π(θ|y) = π(θ|η(y))
       for all θ’s and all priors π, and
       marginal sufficiency, stated as
                                        π(µ(θ)|y) = π(µ(θ)|η(y))
       for all θ’s, the given prior π and a subvector µ(θ)
                                                                   [Basu, 1977]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Back to sufficiency
       Difference between regular sufficiency, equivalent to
                                             π(θ|y) = π(θ|η(y))
       for all θ’s and all priors π, and
       marginal sufficiency, stated as
                                        π(µ(θ)|y) = π(µ(θ)|η(y))
       for all θ’s, the given prior π and a subvector µ(θ)
                                                             [Basu, 1977]
       Relates to F & P’s main result, but could even be reduced to
       conditional sufficiency
                                   π(µ(θ)|yobs ) = π(µ(θ)|η(yobs ))
       (if feasible at all...)
                                                                      [Dawson, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


ABC and BCa’s




       Arguments in favour of a comparison of ABC with bootstrap in the
       event π(θ) is not defined from prior information but as a
       conventional non-informative prior.
                                                            [Efron, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Predictive performances



       Instead of posterior means, other aspects of posterior to explore.
       E.g., look at minimising loss of information

            ∫ p(θ, y) log[ p(θ, y) / p(θ)p(y) ] dθ dy  −  ∫ p(θ, η(y)) log[ p(θ, η(y)) / p(θ)p(η(y)) ] dθ dη(y)

       for selection of summary statistics.
                                           [Filippi, Barnes & Stumpf, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Auxiliary variables



       Auxiliary variable method avoids computations of the intractable
       constant in the likelihood

                                    f(y|θ) = Z_θ f̃(y|θ)
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Auxiliary variables

       Auxiliary variable method avoids computations of the intractable
       constant in the likelihood

                                    f(y|θ) = Z_θ f̃(y|θ)

       Introduce pseudo-data z with artificial target g (z|θ, y)
       Generate θ′ ∼ K(θ, θ′) and z′ ∼ f(z|θ′)
       Accept with probability

            [ π(θ′) f(y|θ′) g(z′|θ′, y) K(θ′, θ) f(z|θ) ] / [ π(θ) f(y|θ) g(z|θ, y) K(θ, θ′) f(z′|θ′) ]  ∧  1

                                      [Møller, Pettitt, Berthelsen & Reeves, 2006]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Auxiliary variables

       Auxiliary variable method avoids computations of the intractable
       constant in the likelihood

                                    f(y|θ) = Z_θ f̃(y|θ)

       Introduce pseudo-data z with artificial target g (z|θ, y)
       Generate θ′ ∼ K(θ, θ′) and z′ ∼ f(z|θ′)
       Accept with probability

            [ π(θ′) f̃(y|θ′) g(z′|θ′, y) K(θ′, θ) f̃(z|θ) ] / [ π(θ) f̃(y|θ) g(z|θ, y) K(θ, θ′) f̃(z′|θ′) ]  ∧  1

                                      [Møller, Pettitt, Berthelsen & Reeves, 2006]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Auxiliary variables


       Auxiliary variable method avoids computations of the intractable
       constant in the likelihood

                                    f(y|θ) = Z_θ f̃(y|θ)

       Introduce pseudo-data z with artificial target g (z|θ, y)
       Generate θ ∼ K (θ, θ ) and z ∼ f (z|θ )

       For      Gibbs random fields    , existence of a genuine sufficient statistic η(y).
                                    [Møller, Pettitt, Berthelsen & Reeves, 2006]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Auxiliary variables and ABC


       Special case of ABC when
              g(z|θ, y) = K_ε(η(z) − η(y))
              f̃(y|θ′)f̃(z|θ) / f̃(y|θ)f̃(z′|θ′) replaced by one [or not?!]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


Auxiliary variables and ABC


       Special case of ABC when
              g(z|θ, y) = K_ε(η(z) − η(y))
              f̃(y|θ′)f̃(z|θ) / f̃(y|θ)f̃(z′|θ′) replaced by one [or not?!]
       Consequences
              likelihood-free (ABC) versus constant-free (AVM)
              in ABC, K_ε(·) should be allowed to depend on θ
              for Gibbs random fields, the auxiliary approach should be
              preferred to ABC
                                                                  [Møller, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


ABC and BIC



       Idea of applying BIC during the       local regression   :
              Run regular ABC
              Select summary statistics during local regression
              Recycle the prior simulation sample (reference table) with
              those summary statistics
              Rerun the corresponding local regression (low cost)
                                                             [Pudlo & Sedki, 2012]
Likelihood-free Methods
   Approximate Bayesian computation
     Automated summary statistic selection


A Brave New World?!
Likelihood-free Methods
   ABC for model choice




ABC for model choice

       Introduction

       The Metropolis-Hastings Algorithm

       simulation-based methods in Econometrics

       Genetics of ABC

       Approximate Bayesian computation

       ABC for model choice
         Model choice
         Gibbs random fields
Likelihood-free Methods
   ABC for model choice
     Model choice


Bayesian model choice



       Several models M1 , M2 , . . . are considered simultaneously for a
       dataset y and the model index M is part of the inference.
       Use of a prior distribution π(M = m), plus a prior distribution on
       the parameter conditional on the value m of the model index,
       πm (θ m )
       Goal is to derive the posterior distribution of M, challenging
       computational target when models are complex.
Likelihood-free Methods
   ABC for model choice
     Model choice


Generic ABC for model choice



       Algorithm 7 Likelihood-free model choice sampler (ABC-MC)
         for t = 1 to T do
           repeat
              Generate m from the prior π(M = m)
              Generate θ m from the prior πm (θ m )
              Generate z from the model fm (z|θ m )
            until ρ{η(z), η(y)} ≤ ε
           Set m(t) = m and θ (t) = θ m
         end for
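
       A sketch of this likelihood-free model choice sampler for an assumed toy pair
       of models, Poisson(λ) versus geometric(p), with η(z) = Σ z_i (the same pair
       reappears in the Poisson/geometric example later); priors, tolerance and
       sample size are illustrative.

       import numpy as np

       rng = np.random.default_rng(6)
       y_obs = rng.poisson(3.0, 30)
       eta_obs, eps, T = y_obs.sum(), 5, 1000

       kept = []
       for _ in range(T):
           while True:
               m = rng.integers(2)                        # m ~ pi(M = m), uniform on {0, 1}
               if m == 0:
                   lam = rng.exponential(5.0)             # prior on lambda
                   z = rng.poisson(lam, 30)
               else:
                   p = rng.uniform()                      # prior on p
                   z = rng.geometric(p, 30) - 1           # geometric on {0, 1, ...}
               if abs(z.sum() - eta_obs) <= eps:          # rho{eta(z), eta(y)} <= eps
                   break
           kept.append(m)

       kept = np.array(kept)
       print("P(M = Poisson | y) approx:", (kept == 0).mean())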
Likelihood-free Methods
   ABC for model choice
     Model choice


ABC estimates

       Posterior probability π(M = m|y) approximated by the frequency
       of acceptances from model m
                                  (1/T) Σ_{t=1}^T I_{m^(t) = m} .

       Issues with implementation:
              should tolerances ε be the same for all models?
              should summary statistics vary across models (incl. their
              dimension)?
              should the distance measure ρ vary as well?
Likelihood-free Methods
   ABC for model choice
     Model choice


ABC estimates



       Posterior probability π(M = m|y) approximated by the frequency
       of acceptances from model m
                                  (1/T) Σ_{t=1}^T I_{m^(t) = m} .

       Extension to a weighted polychotomous logistic regression estimate
       of π(M = m|y), with non-parametric kernel weights
                                         [Cornuet et al., DIYABC, 2009]
Likelihood-free Methods
   ABC for model choice
     Model choice


The Great ABC controversy
       On-going controversy in phylogeographic genetics about the validity
       of using ABC for testing




   Against: Templeton, 2008,
   2009, 2010a, 2010b, 2010c
   argues that nested hypotheses
   cannot have higher probabilities
   than nesting hypotheses (!)
Likelihood-free Methods
   ABC for model choice
     Model choice


The Great ABC controversy



      On-going controversy in phylogeographic genetics about the validity
      of using ABC for testing

    Against: Templeton, 2008, 2009, 2010a, 2010b, 2010c argues that
    nested hypotheses cannot have higher probabilities than nesting
    hypotheses (!)

    Replies: Fagundes et al., 2008, Beaumont et al., 2010, Berger et
    al., 2010, Csilléry et al., 2010 point out that the criticisms are
    addressed at [Bayesian] model-based inference and have nothing to
    do with ABC...
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Gibbs random fields


       Gibbs distribution
       The rv y = (y1 , . . . , yn ) is a Gibbs random field associated with
       the graph G if

                          f(y) = (1/Z) exp{ − Σ_{c∈C} V_c(y_c) } ,

       where Z is the normalising constant, C is the set of cliques of G
       and V_c is any function, also called potential; the sufficient
       statistic U(y) = Σ_{c∈C} V_c(y_c) is the energy function
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Gibbs random fields


       Gibbs distribution
       The rv y = (y1 , . . . , yn ) is a Gibbs random field associated with
       the graph G if

                          f(y) = (1/Z) exp{ − Σ_{c∈C} V_c(y_c) } ,

       where Z is the normalising constant, C is the set of cliques of G
       and V_c is any function, also called potential; the sufficient
       statistic U(y) = Σ_{c∈C} V_c(y_c) is the energy function

        c Z is usually unavailable in closed form
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Potts model
       Potts model
       V_c(y) is of the form

                          V_c(y) = θ S(y) = θ Σ_{l∼i} δ_{y_l = y_i}

       where l∼i denotes a neighbourhood structure
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Potts model
       Potts model
       V_c(y) is of the form

                          V_c(y) = θ S(y) = θ Σ_{l∼i} δ_{y_l = y_i}

       where l∼i denotes a neighbourhood structure

       In most realistic settings, the summation

                               Z_θ = Σ_{x∈X} exp{θᵀ S(x)}

       involves too many terms to be manageable and numerical
       approximations cannot always be trusted
                               [Cucala, Marin, CPR & Titterington, 2009]
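
       The sufficient statistic S(x) of the Potts model is cheap to compute even
       though Z_θ is not; a sketch for a 2D grid with a 4-nearest-neighbour
       structure (an assumed toy configuration) follows.

       import numpy as np

       rng = np.random.default_rng(7)
       y = rng.integers(0, 3, size=(20, 20))      # a 3-colour Potts configuration

       def potts_stat(y):
           """S(y) = number of neighbouring pairs sharing the same colour."""
           horiz = (y[:, :-1] == y[:, 1:]).sum()  # right neighbours
           vert = (y[:-1, :] == y[1:, :]).sum()   # down neighbours
           return int(horiz + vert)

       print(potts_stat(y))
       # Z_theta = sum_x exp{theta * S(x)} would require 3**(20*20) terms, hence ABC.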
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Bayesian Model Choice



       Comparing a model with potential S0 taking values in Rp0 versus a
       model with potential S1 taking values in Rp1 can be done through
       the Bayes factor corresponding to the priors π0 and π1 on each
       parameter space

              B_{m0/m1}(x) = ∫ exp{θ_0ᵀ S_0(x)}/Z_{θ_0,0} π_0(dθ_0)  /  ∫ exp{θ_1ᵀ S_1(x)}/Z_{θ_1,1} π_1(dθ_1)
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Bayesian Model Choice



       Comparing a model with potential S0 taking values in Rp0 versus a
       model with potential S1 taking values in Rp1 can be done through
       the Bayes factor corresponding to the priors π0 and π1 on each
       parameter space

              B_{m0/m1}(x) = ∫ exp{θ_0ᵀ S_0(x)}/Z_{θ_0,0} π_0(dθ_0)  /  ∫ exp{θ_1ᵀ S_1(x)}/Z_{θ_1,1} π_1(dθ_1)

       Use of Jeffreys’ scale to select most appropriate model
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Neighbourhood relations



       Choice to be made between M neighbourhood relations
                            i ∼_m i′       (0 ≤ m ≤ M − 1)

       with
                            S_m(x) = Σ_{i ∼_m i′} I{x_i = x_{i′}}

       driven by the posterior probabilities of the models.
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Model index



       Formalisation via a model index M that appears as a new
       parameter with prior distribution π(M = m) and
       π(θ|M = m) = πm (θm )
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Model index



       Formalisation via a model index M that appears as a new
       parameter with prior distribution π(M = m) and
       π(θ|M = m) = πm (θm )
       Computational target:

                  P(M = m|x) ∝ ∫_{Θm} f_m(x|θ_m) π_m(θ_m) dθ_m  π(M = m) ,
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Sufficient statistics



       By definition, if S(x) sufficient statistic for the joint parameters
       (M, θ0 , . . . , θM−1 ),

                          P(M = m|x) = P(M = m|S(x)) .
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Sufficient statistics



       By definition, if S(x) sufficient statistic for the joint parameters
       (M, θ0 , . . . , θM−1 ),

                          P(M = m|x) = P(M = m|S(x)) .

       For each model m, own sufficient statistic Sm (·) and
       S(·) = (S0 (·), . . . , SM−1 (·)) also sufficient.
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Sufficient statistics in Gibbs random fields


       For Gibbs random fields,
                    x|M = m ∼ f_m(x|θ_m) = f¹_m(x|S(x)) f²_m(S(x)|θ_m)
                                          = (1/n(S(x))) f²_m(S(x)|θ_m)

       where
                           n(S(x)) = ♯{x̃ ∈ X : S(x̃) = S(x)}
        c S(x) is therefore also sufficient for the joint parameters
                                       [Specific to Gibbs random fields!]
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


ABC model choice Algorithm



       ABC-MC
              Generate m∗ from the prior π(M = m).
              Generate θ∗_{m∗} from the prior π_{m∗}(·).
              Generate x∗ from the model f_{m∗}(·|θ∗_{m∗}).
              Compute the distance ρ(S(x0), S(x∗)).
              Accept (θ∗_{m∗}, m∗) if ρ(S(x0), S(x∗)) < ε.


       Note When ε = 0 the algorithm is exact
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


ABC approximation to the Bayes factor

       Frequency ratio:
                B̂F_{m0/m1}(x0) = P̂(M = m0|x0)/P̂(M = m1|x0) × π(M = m1)/π(M = m0)
                                = ♯{m_{i∗} = m0}/♯{m_{i∗} = m1} × π(M = m1)/π(M = m0) ,
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


ABC approximation to the Bayes factor

       Frequency ratio:
                B̂F_{m0/m1}(x0) = P̂(M = m0|x0)/P̂(M = m1|x0) × π(M = m1)/π(M = m0)
                                = ♯{m_{i∗} = m0}/♯{m_{i∗} = m1} × π(M = m1)/π(M = m0) ,

       replaced with

                B̂F_{m0/m1}(x0) = (1 + ♯{m_{i∗} = m0})/(1 + ♯{m_{i∗} = m1}) × π(M = m1)/π(M = m0)

       to avoid indeterminacy (also Bayes estimate).
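
       The corrected frequency ratio is a one-liner once the accepted model indices
       are stored; a sketch (uniform model prior assumed by default) follows.

       import numpy as np

       def abc_bayes_factor(kept_m, prior_m0=0.5, prior_m1=0.5):
           """Corrected frequency-ratio estimate of BF_{m0/m1} from ABC-MC output."""
           n0 = int(np.sum(kept_m == 0))
           n1 = int(np.sum(kept_m == 1))
           return (1 + n0) / (1 + n1) * (prior_m1 / prior_m0)   # avoids division by zero

       kept_m = np.array([0, 0, 1, 0, 1, 0, 0, 1])              # toy acceptance record
       print(abc_bayes_factor(kept_m))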
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Toy example


       iid Bernoulli model versus two-state first-order Markov chain, i.e.
                 f_0(x|θ_0) = exp{ θ_0 Σ_{i=1}^n I{x_i = 1} } / {1 + exp(θ_0)}^n ,

       versus

                 f_1(x|θ_1) = (1/2) exp{ θ_1 Σ_{i=2}^n I{x_i = x_{i−1}} } / {1 + exp(θ_1)}^{n−1} ,

       with priors θ0 ∼ U(−5, 5) and θ1 ∼ U(0, 6) (inspired by “phase
       transition” boundaries).
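
       A sketch of the two models and their respective sufficient statistics,
       S_0 = Σ I{x_i = 1} and S_1 = Σ I{x_i = x_{i−1}}, as they would feed the
       distance in ABC-MC; simulation settings are illustrative.

       import numpy as np

       rng = np.random.default_rng(8)

       def simulate_m0(theta0, n):     # iid Bernoulli, success prob. exp(theta0)/(1+exp(theta0))
           p = np.exp(theta0) / (1 + np.exp(theta0))
           return rng.binomial(1, p, n)

       def simulate_m1(theta1, n):     # two-state Markov chain, staying prob. exp(theta1)/(1+exp(theta1))
           p_stay = np.exp(theta1) / (1 + np.exp(theta1))
           x = np.empty(n, dtype=int)
           x[0] = rng.integers(2)
           for t in range(1, n):
               x[t] = x[t - 1] if rng.uniform() < p_stay else 1 - x[t - 1]
           return x

       def S0(x): return int(np.sum(x == 1))            # sufficient under model 0
       def S1(x): return int(np.sum(x[1:] == x[:-1]))   # sufficient under model 1

       x = simulate_m1(2.0, 100)
       print(S0(x), S1(x))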
Likelihood-free Methods
   ABC for model choice
     Gibbs random fields


Toy example (2)




       [Figure: two scatterplots of B̂F_{m0/m1}(x0) against BF_{m0/m1}(x0), both
       on the log scale.]

       (left) Comparison of the true BF_{m0/m1}(x0) with B̂F_{m0/m1}(x0) (in
       logs) over 2,000 simulations and 4·10⁶ proposals from the prior.
       (right) Same when using tolerance ε corresponding to the 1%
       quantile on the distances.
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Back to sufficiency



              ‘Sufficient statistics for individual models are unlikely to
              be very informative for the model probability.’
                                         [Scott Sisson, Jan. 31, 2011, X.’Og]
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Back to sufficiency



              ‘Sufficient statistics for individual models are unlikely to
              be very informative for the model probability.’
                                         [Scott Sisson, Jan. 31, 2011, X.’Og]

       If η1 (x) sufficient statistic for model m = 1 and parameter θ1 and
       η2 (x) sufficient statistic for model m = 2 and parameter θ2 ,
       (η1 (x), η2 (x)) is not always sufficient for (m, θm )
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Back to sufficiency



              ‘Sufficient statistics for individual models are unlikely to
              be very informative for the model probability.’
                                         [Scott Sisson, Jan. 31, 2011, X.’Og]

       If η1 (x) sufficient statistic for model m = 1 and parameter θ1 and
       η2 (x) sufficient statistic for model m = 2 and parameter θ2 ,
       (η1 (x), η2 (x)) is not always sufficient for (m, θm )
        c Potential loss of information at the testing level
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Limiting behaviour of B12 (T → ∞)

       ABC approximation
            B_12(y) = Σ_{t=1}^T I_{m_t = 1} I_{ρ{η(z_t), η(y)} ≤ ε}  /  Σ_{t=1}^T I_{m_t = 2} I_{ρ{η(z_t), η(y)} ≤ ε} ,

       where the (mt , z t )’s are simulated from the (joint) prior
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Limiting behaviour of B12 (T → ∞)

       ABC approximation
            B_12(y) = Σ_{t=1}^T I_{m_t = 1} I_{ρ{η(z_t), η(y)} ≤ ε}  /  Σ_{t=1}^T I_{m_t = 2} I_{ρ{η(z_t), η(y)} ≤ ε} ,

       where the (mt , z t )’s are simulated from the (joint) prior
       As T goes to infinity, limit

          B_12(y) = ∫ I_{ρ{η(z), η(y)} ≤ ε} π_1(θ_1) f_1(z|θ_1) dz dθ_1  /  ∫ I_{ρ{η(z), η(y)} ≤ ε} π_2(θ_2) f_2(z|θ_2) dz dθ_2

                  = ∫ I_{ρ{η, η(y)} ≤ ε} π_1(θ_1) f^η_1(η|θ_1) dη dθ_1  /  ∫ I_{ρ{η, η(y)} ≤ ε} π_2(θ_2) f^η_2(η|θ_2) dη dθ_2 ,

       where f^η_1(η|θ_1) and f^η_2(η|θ_2) are the distributions of η(z)
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Limiting behaviour of B12 ( → 0)




       When ε goes to zero,

              B^η_12(y) = ∫ π_1(θ_1) f^η_1(η(y)|θ_1) dθ_1  /  ∫ π_2(θ_2) f^η_2(η(y)|θ_2) dθ_2 ,
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Limiting behaviour of B12 ( → 0)




       When ε goes to zero,

              B^η_12(y) = ∫ π_1(θ_1) f^η_1(η(y)|θ_1) dθ_1  /  ∫ π_2(θ_2) f^η_2(η(y)|θ_2) dθ_2 ,

                  c Bayes factor based on the sole observation of η(y)
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Limiting behaviour of B12 (under sufficiency)

       If η(y) sufficient statistic for both models,

                          f_i(y|θ_i) = g_i(y) f^η_i(η(y)|θ_i)

       Thus

          B_12(y) = ∫_{Θ1} π(θ_1) g_1(y) f^η_1(η(y)|θ_1) dθ_1  /  ∫_{Θ2} π(θ_2) g_2(y) f^η_2(η(y)|θ_2) dθ_2

                  = [g_1(y)/g_2(y)] × ∫ π_1(θ_1) f^η_1(η(y)|θ_1) dθ_1 / ∫ π_2(θ_2) f^η_2(η(y)|θ_2) dθ_2  =  [g_1(y)/g_2(y)] B^η_12(y) .

                                         [Didelot, Everitt, Johansen & Lawson, 2011]
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Limiting behaviour of B12 (under sufficiency)

       If η(y) sufficient statistic for both models,

                          f_i(y|θ_i) = g_i(y) f^η_i(η(y)|θ_i)

       Thus

          B_12(y) = ∫_{Θ1} π(θ_1) g_1(y) f^η_1(η(y)|θ_1) dθ_1  /  ∫_{Θ2} π(θ_2) g_2(y) f^η_2(η(y)|θ_2) dθ_2

                  = [g_1(y)/g_2(y)] × ∫ π_1(θ_1) f^η_1(η(y)|θ_1) dθ_1 / ∫ π_2(θ_2) f^η_2(η(y)|θ_2) dθ_2  =  [g_1(y)/g_2(y)] B^η_12(y) .

                                         [Didelot, Everitt, Johansen & Lawson, 2011]
                    c No discrepancy only when cross-model sufficiency
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Poisson/geometric example

       Sample
                                 x = (x_1, \ldots, x_n)
       from either a Poisson P(λ) or from a geometric G(p). Then

                                 S = \sum_{i=1}^{n} x_i = \eta(x)

       is a sufficient statistic for either model, but not for both simultaneously.
       Discrepancy ratio

                                 \frac{g_1(x)}{g_2(x)} = \frac{S!\, n^{-S} \big/ \prod_i x_i!}{1 \big/ \binom{n+S-1}{S}}
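
       A quick numerical check of this ratio (a sketch; the log-scale evaluation through gammaln
       and the simulated sample are implementation choices, not part of the slides):

```python
import numpy as np
from scipy.special import gammaln

def log_g1_over_g2(x):
    """Log of the discrepancy ratio g1(x)/g2(x) for the Poisson/geometric pair."""
    x = np.asarray(x)
    n, S = x.size, x.sum()
    # g1(x) = S! n^{-S} / prod_i x_i!      (Poisson residual term)
    log_g1 = gammaln(S + 1) - S * np.log(n) - gammaln(x + 1).sum()
    # g2(x) = 1 / C(n+S-1, S)              (geometric residual term)
    log_g2 = -(gammaln(n + S) - gammaln(S + 1) - gammaln(n))
    return log_g1 - log_g2

x = np.random.default_rng(0).poisson(2.0, size=20)
print(np.exp(log_g1_over_g2(x)))
```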
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Poisson/geometric discrepancy

       Range of B_{12}^\eta(x) versus B_{12}(x): the values produced have
       nothing in common.
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Formal recovery



       Creating an encompassing exponential family
       f(x|\theta_1, \theta_2, \alpha_1, \alpha_2) \propto \exp\{\theta_1^T \eta_1(x) + \theta_2^T \eta_2(x) + \alpha_1 t_1(x) + \alpha_2 t_2(x)\}

       leads to a sufficient statistic (\eta_1(x), \eta_2(x), t_1(x), t_2(x))
                             [Didelot, Everitt, Johansen & Lawson, 2011]
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Formal recovery



       Creating an encompassing exponential family
       f(x|\theta_1, \theta_2, \alpha_1, \alpha_2) \propto \exp\{\theta_1^T \eta_1(x) + \theta_2^T \eta_2(x) + \alpha_1 t_1(x) + \alpha_2 t_2(x)\}

       leads to a sufficient statistic (\eta_1(x), \eta_2(x), t_1(x), t_2(x))
                             [Didelot, Everitt, Johansen & Lawson, 2011]

       In the Poisson/geometric case, if \prod_i x_i! is added to S, there is no
       discrepancy
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Formal recovery



       Creating an encompassing exponential family
       f(x|\theta_1, \theta_2, \alpha_1, \alpha_2) \propto \exp\{\theta_1^T \eta_1(x) + \theta_2^T \eta_2(x) + \alpha_1 t_1(x) + \alpha_2 t_2(x)\}

       leads to a sufficient statistic (\eta_1(x), \eta_2(x), t_1(x), t_2(x))
                             [Didelot, Everitt, Johansen & Lawson, 2011]

       Only applies in genuine sufficiency settings...
       c Inability to evaluate loss brought by summary statistics
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Meaning of the ABC-Bayes factor

              ‘This is also why focus on model discrimination typically
              (...) proceeds by (...) accepting that the Bayes Factor
              that one obtains is only derived from the summary
              statistics and may in no way correspond to that of the
              full model.’
                                        [Scott Sisson, Jan. 31, 2011, X.’Og]
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Meaning of the ABC-Bayes factor

              ‘This is also why focus on model discrimination typically
              (...) proceeds by (...) accepting that the Bayes Factor
              that one obtains is only derived from the summary
              statistics and may in no way correspond to that of the
              full model.’
                                          [Scott Sisson, Jan. 31, 2011, X.’Og]

       In the Poisson/geometric case, if E[y_i] = θ0 > 0,

                                 \lim_{n\to\infty} B_{12}^\eta(y) = \frac{(\theta_0+1)^2}{\theta_0}\, e^{-\theta_0}
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


MA(q) divergence




       [Figure: frequencies of visits to models 1 and 2 at four tolerance levels]




       Evolution [against ε] of ABC Bayes factor, in terms of frequencies of
       visits to models MA(1) (left) and MA(2) (right) when ε is equal to the
       10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample
       of 50 points from an MA(2) with θ1 = 0.6, θ2 = 0.2. True Bayes factor
       equal to 17.71.
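
       A rough sketch of the kind of experiment summarised in this figure, with the first two
       empirical autocovariances as (insufficient) summaries and the tolerance set as a distance
       quantile; the priors, distance and seeds below are assumptions of the sketch rather than
       the exact settings behind the figure:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50

def ma_sample(theta, n, rng):
    """Simulate n observations from an MA(q) model x_t = e_t + sum_j theta_j e_{t-j}."""
    q = len(theta)
    e = rng.standard_normal(n + q)
    x = e[q:].copy()
    for j, th in enumerate(theta, start=1):
        x += th * e[q - j:q - j + n]
    return x

def autocov(x, lags=(1, 2)):
    """Empirical autocovariances at the given lags (the insufficient summaries)."""
    xc = x - x.mean()
    return np.array([np.sum(xc[l:] * xc[:-l]) / len(x) for l in lags])

def prior_draw(m, rng):
    """Draw MA coefficients from (assumed) uniform priors."""
    if m == 1:
        return np.array([rng.uniform(-1, 1)])
    while True:  # uniform over the MA(2) invertibility triangle
        t1, t2 = rng.uniform(-2, 2), rng.uniform(-1, 1)
        if t1 + t2 > -1 and t1 - t2 < 1:
            return np.array([t1, t2])

y = ma_sample(np.array([0.6, 0.2]), n, rng)   # pseudo-observed MA(2) sample
s_y = autocov(y)

T = 50_000
models = np.empty(T, dtype=int)
dists = np.empty(T)
for t in range(T):
    m = int(rng.integers(1, 3))               # uniform prior on the model index
    z = ma_sample(prior_draw(m, rng), n, rng)
    models[t], dists[t] = m, np.sum((autocov(z) - s_y) ** 2)

for level in (0.10, 0.01):                    # tolerance = distance quantile
    keep = models[dists <= np.quantile(dists, level)]
    print(f"eps at {level:.0%} quantile: freq(MA(1)) = {np.mean(keep == 1):.3f}, "
          f"freq(MA(2)) = {np.mean(keep == 2):.3f}")
```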
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


MA(q) divergence




       [Figure: frequencies of visits to models 1 and 2 at four tolerance levels]




       Evolution [against ε] of ABC Bayes factor, in terms of frequencies of
       visits to models MA(1) (left) and MA(2) (right) when ε is equal to the
       10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample
       of 50 points from an MA(1) model with θ1 = 0.6. True Bayes factor B21
       equal to .004.
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Further comments


               ‘There should be the possibility that for the same model,
               but different (non-minimal) [summary] statistics (so
               different η’s: η1 and η1∗) the ratio of evidences may no
               longer be equal to one.’
                                      [Michael Stumpf, Jan. 28, 2011, ’Og]


       Using different summary statistics [on different models] may
       indicate the loss of information brought by each set but agreement
       does not lead to trustworthy approximations.
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


A population genetics evaluation


       Population genetics example with
              3 populations
               2 scenarios
              15 individuals
              5 loci
              single mutation parameter
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


A population genetics evaluation


       Population genetics example with
              3 populations
               2 scenarios
              15 individuals
              5 loci
              single mutation parameter
              24 summary statistics
               2 million ABC proposals
              importance [tree] sampling alternative
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Stability of importance sampling


       [Figure: stability of the importance sampling estimates (five panels on the 0 to 1 scale)]
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Comparison with ABC
       Use of 24 summary statistics and DIY-ABC logistic correction

       [Figure: ABC estimates of P(M=1|y) plotted against importance sampling estimates of P(M=1|y)]
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Comparison with ABC
       Use of 15 summary statistics and DIY-ABC logistic correction

       [Figure: ABC estimates of P(M=1|y) plotted against importance sampling estimates of P(M=1|y)]
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


Comparison with ABC
       Use of 24 summary statistics and DIY-ABC logistic correction

       [Figure: ABC estimates of P(M=1|y) plotted against importance sampling estimates of P(M=1|y)]
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


The only safe cases???



       Besides specific models like Gibbs random fields,
       using distances over the data itself escapes the discrepancy...
                                [Toni & Stumpf, 2010; Sousa & al., 2009]
Likelihood-free Methods
   ABC for model choice
     Generic ABC model choice


The only safe cases???



       Besides specific models like Gibbs random fields,
       using distances over the data itself escapes the discrepancy...
                                [Toni & Stumpf, 2010; Sousa & al., 2009]

       ...and so does the use of more informal model fitting measures
                                                  [Ratmann & al., 2009]
Likelihood-free Methods
   ABC model choice consistency




ABC model choice consistency

       Introduction

       The Metropolis-Hastings Algorithm

       simulation-based methods in Econometrics

       Genetics of ABC

       Approximate Bayesian computation

       ABC for model choice

       ABC model choice consistency
Likelihood-free Methods
   ABC model choice consistency
     Formalised framework


The starting point



       Central question to the validation of ABC for model choice:

       When is a Bayes factor based on an insufficient statistic T(y)
                               consistent?
Likelihood-free Methods
   ABC model choice consistency
     Formalised framework


The starting point



       Central question to the validation of ABC for model choice:

       When is a Bayes factor based on an insufficient statistic T(y)
                               consistent?

       Note: c drawn on T(y) through B_{12}^{T}(y) necessarily differs from c
       drawn on y through B_{12}(y)
Likelihood-free Methods
   ABC model choice consistency
     Formalised framework


A benchmark, if toy, example




       Comparison suggested by referee of PNAS paper [thanks]:
                                 [X, Cornuet, Marin, & Pillai, Aug. 2011]
       Model M1 : y ∼ N(θ1 , 1) opposed to model M2 : y ∼ L(θ2 , 1/√2), the
       Laplace distribution with mean θ2 and scale parameter 1/√2
       (variance one).
Likelihood-free Methods
   ABC model choice consistency
     Formalised framework


A benchmark, if toy, example


       Comparison suggested by referee of PNAS paper [thanks]:
                                 [X, Cornuet, Marin, & Pillai, Aug. 2011]
       Model M1 : y ∼ N(θ1 , 1) opposed to model M2 : y ∼ L(θ2 , 1/√2), the
       Laplace distribution with mean θ2 and scale parameter 1/√2
       (variance one).
       Four possible statistics (a simulation sketch follows below):
         1. sample mean ȳ (sufficient for M1 but not for M2 );
         2. sample median med(y) (insufficient);
         3. sample variance var(y) (ancillary);
         4. median absolute deviation mad(y) = med(|y − med(y)|);
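
       A minimal sketch of running ABC model choice with each of these four statistics in turn;
       the Laplace pseudo-data, the N(0, 4) prior on the location and the 1% tolerance quantile
       are assumptions made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
y = rng.laplace(0.0, 1 / np.sqrt(2), size=n)         # pseudo-data from the Laplace model M2

stats = {
    "mean":   np.mean,
    "median": np.median,
    "var":    np.var,
    "mad":    lambda x: np.median(np.abs(x - np.median(x))),
}

T = 20_000
draws = []
for _ in range(T):
    m = int(rng.integers(1, 3))                       # model index from a uniform prior
    theta = rng.normal(0.0, 2.0)                      # assumed N(0, 4) prior on the location
    z = rng.normal(theta, 1.0, n) if m == 1 else rng.laplace(theta, 1 / np.sqrt(2), n)
    draws.append((m, z))

for name, stat in stats.items():
    s_y = stat(y)
    d = np.array([abs(stat(z) - s_y) for _, z in draws])
    keep = np.array([m for m, _ in draws])[d <= np.quantile(d, 0.01)]
    print(f"{name:>6}: ABC estimate of P(M1 | y) = {np.mean(keep == 1):.3f}")
```

       The point is that the resulting model probability estimates can differ markedly from one
       statistic to the next, as the following figures illustrate.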
Likelihood-free Methods
   ABC model choice consistency
     Formalised framework


A benchmark, if toy, example
       Comparison suggested by referee of PNAS paper [thanks]:
                                 [X, Cornuet, Marin, & Pillai, Aug. 2011]
       Model M1 : y ∼ N(θ1 , 1) opposed to model M2 : y ∼ L(θ2 , 1/√2), the
       Laplace distribution with mean θ2 and scale parameter 1/√2
       (variance one).
       [Figure: density of the posterior probabilities]
Likelihood-free Methods
   ABC model choice consistency
     Formalised framework


A benchmark, if toy, example
       Comparison suggested by referee of PNAS paper [thanks]:
                                 [X, Cornuet, Marin, & Pillai, Aug. 2011]
       Model M1 : y ∼ N(θ1 , 1) opposed to model M2 : y ∼ L(θ2 , 1/√2), the
       Laplace distribution with mean θ2 and scale parameter 1/√2
       (variance one).
       [Figure: boxplots of posterior probabilities under the Gauss and Laplace models, n = 100]
Likelihood-free Methods
   ABC model choice consistency
     Formalised framework


A benchmark, if toy, example
       Comparison suggested by referee of PNAS paper [thanks]:
                                 [X, Cornuet, Marin, & Pillai, Aug. 2011]
       Model M1 : y ∼ N(θ1 , 1) opposed to model M2 : y ∼ L(θ2 , 1/√2), the
       Laplace distribution with mean θ2 and scale parameter 1/√2
       (variance one).
       [Figure: densities of the posterior probabilities (left panel) and probabilities (right panel)]
Likelihood-free Methods
   ABC model choice consistency
     Formalised framework


A benchmark, if toy, example
       Comparison suggested by referee of PNAS paper [thanks]:
                                 [X, Cornuet, Marin, & Pillai, Aug. 2011]
       Model M1 : y ∼ N(θ1 , 1) opposed to model M2 : y ∼ L(θ2 , 1/√2), the
       Laplace distribution with mean θ2 and scale parameter 1/√2
       (variance one).
       [Figure: boxplots under the Gauss and Laplace models, n = 100 (two panels)]
Likelihood-free Methods
   ABC model choice consistency
     Formalised framework


Framework

       Starting from the observed sample

                                          y = (y_1, \ldots, y_n),

       not necessarily iid, with true distribution

                                                y ∼ P_n

       Summary statistics

                          T(y) = Tn = (T1 (y), T2 (y), · · · , Td (y)) ∈ Rd

       with true distribution Tn ∼ Gn .
Likelihood-free Methods
   ABC model choice consistency
     Formalised framework


Framework



        c Comparison of
           – under M1 , y ∼ F1,n (·|θ1 ) where θ1 ∈ Θ1 ⊂ Rp1
           – under M2 , y ∼ F2,n (·|θ2 ) where θ2 ∈ Θ2 ⊂ Rp2
       turned into
           – under M1 , T(y) ∼ G1,n (·|θ1 ), and θ1 |T(y) ∼ π1 (·|Tn )
           – under M2 , T(y) ∼ G2,n (·|θ2 ), and θ2 |T(y) ∼ π2 (·|Tn )
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Assumptions


       A collection of asymptotic “standard” assumptions:
       [A1] There exist a sequence {v_n} converging to +∞,
       an a.c. distribution Q with continuous bounded density q(·),
       a symmetric, d × d positive definite matrix V_0
       and a vector µ_0 ∈ R^d such that

              v_n V_0^{-1/2} (T_n - \mu_0) \xrightarrow{\ n\to\infty\ } Q, \quad \text{under } G_n

       and for all M > 0,

              \sup_{v_n|t-\mu_0| \le M} \left| |V_0|^{1/2}\, v_n^{-d}\, g_n(t) - q\!\left(v_n V_0^{-1/2}\{t-\mu_0\}\right) \right| = o(1)
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Assumptions




       A collection of asymptotic “standard” assumptions:
       [A2] For i = 1, 2, there exist d × d symmetric positive definite matrices
       V_i(\theta_i) and vectors \mu_i(\theta_i) ∈ R^d such that

              v_n V_i(\theta_i)^{-1/2} (T_n - \mu_i(\theta_i)) \xrightarrow{\ n\to\infty\ } Q, \quad \text{under } G_{i,n}(\cdot|\theta_i)\,.
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Assumptions


       A collection of asymptotic “standard” assumptions:
       [A3] For i = 1, 2, there exist sets F_{n,i} ⊂ Θ_i and constants ε_i, τ_i, α_i > 0
       such that for all τ > 0,

              \sup_{\theta_i \in F_{n,i}} G_{i,n}\!\left[\, |T_n - \mu_i(\theta_i)| > \tau\, |\mu_i(\theta_i)-\mu_0| \wedge \epsilon_i \,\middle|\, \theta_i \right]
                     \lesssim v_n^{-\alpha_i} \sup_{\theta_i \in F_{n,i}} \left(|\mu_i(\theta_i)-\mu_0| \wedge \epsilon_i\right)^{-\alpha_i}

       with
              \pi_i(F_{n,i}^c) = o(v_n^{-\tau_i})\,.
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Assumptions



       A collection of asymptotic “standard” assumptions:
       [A4] For u > 0,

              S_{n,i}(u) = \left\{\theta_i \in F_{n,i};\ |\mu_i(\theta_i) - \mu_0| \le u\, v_n^{-1}\right\}

       and if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, there exist constants di < τi ∧ αi − 1
       such that
              \pi_i(S_{n,i}(u)) \sim u^{d_i} v_n^{-d_i}, \quad \forall\, u \lesssim v_n
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Assumptions

       A collection of asymptotic “standard” assumptions:
       [A5] If inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, there exists U > 0 such that for
       any M > 0,

              \sup_{v_n|t-\mu_0|\le M}\ \sup_{\theta_i \in S_{n,i}(U)} \left| |V_i(\theta_i)|^{1/2}\, v_n^{-d}\, g_i(t|\theta_i)
                     - q\!\left(v_n V_i(\theta_i)^{-1/2}(t - \mu_i(\theta_i))\right) \right| = o(1)

       and

              \lim_{M\to\infty}\ \limsup_n\ \frac{\pi_i\!\left(S_{n,i}(U) \cap \left\{\|V_i(\theta_i)^{-1}\| + \|V_i(\theta_i)\| > M\right\}\right)}{\pi_i(S_{n,i}(U))} = 0\,.
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Assumptions


       A collection of asymptotic “standard” assumptions:
       [A1]–[A2] are standard central limit theorems ([A1] redundant
       when one model is “true”)
       [A3] controls the large deviations of the estimator Tn from the
       estimand µ(θ)
       [A4] is the standard prior mass condition found in Bayesian
       asymptotics (di effective dimension of the parameter)
       [A5] controls convergence more tightly, especially when µi is not
       one-to-one
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Effective dimension




       [A4] Understanding d1 , d2 : defined only when
       µ0 ∈ {µi (θi ), θi ∈ Θi },

                           \pi_i\!\left(\theta_i : |\mu_i(\theta_i) - \mu_0| < n^{-1/2}\right) = O(n^{-d_i/2}),

       where d_i is the effective dimension of the model Θi around µ0
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Asymptotic marginals

       Asymptotically, under [A1]–[A5],

                                 m_i(t) = \int_{\Theta_i} g_i(t|\theta_i)\, \pi_i(\theta_i)\, d\theta_i

       is such that
       (i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0,

                                 C_l\, v_n^{d-d_i} \le m_i(T_n) \le C_u\, v_n^{d-d_i}

       and
       (ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0,

                                 m_i(T_n) = o_{P_n}\!\left[v_n^{d-\tau_i} + v_n^{d-\alpha_i}\right].
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Within-model consistency



       Under the same assumptions, if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, the
       posterior distribution of µi(θi) given Tn is consistent at rate 1/vn,
       provided αi ∧ τi > di.
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Within-model consistency



       Under the same assumptions, if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, the
       posterior distribution of µi(θi) given Tn is consistent at rate 1/vn,
       provided αi ∧ τi > di.
       Note: di can truly be seen as an effective dimension of the model
       under the posterior πi(·|Tn), since if µ0 ∈ {µi(θi); θi ∈ Θi},

                                 m_i(T_n) \sim v_n^{d-d_i}
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Between-model consistency




       A consequence of the above is that the asymptotic behaviour of the Bayes
       factor is driven by the asymptotic mean value of Tn under both
       models. And only by this mean value!
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Between-model consistency

       A consequence of the above is that the asymptotic behaviour of the Bayes
       factor is driven by the asymptotic mean value of Tn under both
       models. And only by this mean value!
       Indeed, if

           inf{|µ0 − µ2 (θ2 )|; θ2 ∈ Θ2 } = inf{|µ0 − µ1 (θ1 )|; θ1 ∈ Θ1 } = 0

       then

              C_l\, v_n^{-(d_1-d_2)} \le m_1(T_n) \big/ m_2(T_n) \le C_u\, v_n^{-(d_1-d_2)}\,,

       where Cl , Cu = OPn (1), irrespective of the true model.
        c Only depends on the difference d1 − d2
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Between-model consistency


       A consequence of the above is that the asymptotic behaviour of the Bayes
       factor is driven by the asymptotic mean value of Tn under both
       models. And only by this mean value!
       Else, if

           inf{|µ0 − µ2 (θ2 )|; θ2 ∈ Θ2 } > inf{|µ0 − µ1 (θ1 )|; θ1 ∈ Θ1 } = 0

       then

              m_1(T_n) \big/ m_2(T_n) \ge C_u \min\!\left(v_n^{-(d_1-\alpha_2)},\ v_n^{-(d_1-\tau_2)}\right)
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Consistency theorem



       If

            inf{|µ0 − µ2 (θ2 )|; θ2 ∈ Θ2 } = inf{|µ0 − µ1 (θ1 )|; θ1 ∈ Θ1 } = 0,

       the Bayes factor satisfies

                                 B_{12}^{T} = O\!\left(v_n^{-(d_1-d_2)}\right)

       irrespective of the true model. It is consistent iff Pn is within the
       model with the smallest dimension
Likelihood-free Methods
   ABC model choice consistency
     Consistency results


Consistency theorem



       If Pn belongs to one of the two models and if µ0 cannot be
       attained by the other one, i.e.

              0 = \min\left(\inf\{|\mu_0 - \mu_i(\theta_i)|;\ \theta_i \in \Theta_i\},\ i = 1, 2\right)
                < \max\left(\inf\{|\mu_0 - \mu_i(\theta_i)|;\ \theta_i \in \Theta_i\},\ i = 1, 2\right),

       then the Bayes factor B_{12}^{T} is consistent
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Consequences on summary statistics



       The Bayes factor is driven by the means µi (θi ) and the relative position of
       µ0 wrt both sets {µi (θi ); θi ∈ Θi }, i = 1, 2.
       For ABC, this implies the most likely statistics Tn are ancillary
       statistics with different mean values under both models.
       Else, if Tn asymptotically depends on some of the parameters of
       the models, it is quite likely that there exists θi ∈ Θi such that
       µi (θi ) = µ0 even though model M1 is misspecified
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Toy example: Laplace versus Gauss [1]


       If
              T_n = n^{-1} \sum_{i=1}^{n} X_i^4, \qquad \mu_1(\theta) = 3 + \theta^4 + 6\theta^2, \qquad \mu_2(\theta) = 6 + \cdots

        and the true distribution is Laplace with mean θ0 = 1, under the
       Gaussian model the value θ∗ = 2√3 − 3 leads to µ0 = µ(θ∗)
                                                   [here d1 = d2 = d = 1]
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Toy example: Laplace versus Gauss [1]


       If
              T_n = n^{-1} \sum_{i=1}^{n} X_i^4, \qquad \mu_1(\theta) = 3 + \theta^4 + 6\theta^2, \qquad \mu_2(\theta) = 6 + \cdots

        and the true distribution is Laplace with mean θ0 = 1, under the
       Gaussian model the value θ∗ = 2√3 − 3 leads to µ0 = µ(θ∗)
                                                   [here d1 = d2 = d = 1]
       c a Bayes factor associated with such a statistic is inconsistent
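
       A quick numerical illustration of this inconsistency (a sketch: rather than relying on a
       closed-form θ∗, whose exact value depends on conventions, the code solves µ1(θ∗) = Tn
       numerically, showing that the Gaussian model can always match the Laplace value of the
       fourth-moment statistic):

```python
import numpy as np

rng = np.random.default_rng(4)
theta0 = 1.0
y = rng.laplace(theta0, 1 / np.sqrt(2), size=10**6)   # "true" model: Laplace with unit variance
t_n = np.mean(y**4)                                    # summary T_n = empirical fourth moment

# Under N(theta, 1): mu1(theta) = 3 + 6*theta^2 + theta^4.
# Solving mu1(theta*) = T_n is a quadratic in theta^2 and succeeds whenever T_n >= 3,
# so the Gaussian model reproduces the asymptotic mean of T_n even though the data is Laplace.
theta_star = np.sqrt(-3 + np.sqrt(6 + t_n))
mu1_star = 3 + 6 * theta_star**2 + theta_star**4
print(f"T_n = {t_n:.3f},  theta* = {theta_star:.3f},  mu1(theta*) = {mu1_star:.3f}")
```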
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Toy example: Laplace versus Gauss [1]
       If
              T_n = n^{-1} \sum_{i=1}^{n} X_i^4, \qquad \mu_1(\theta) = 3 + \theta^4 + 6\theta^2, \qquad \mu_2(\theta) = 6 + \cdots

       [Figure: fourth moment, density of the probabilities]
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Toy example: Laplace versus Gauss [1]
       If
              T_n = n^{-1} \sum_{i=1}^{n} X_i^4, \qquad \mu_1(\theta) = 3 + \theta^4 + 6\theta^2, \qquad \mu_2(\theta) = 6 + \cdots

       [Figure: fourth moment, boxplots of posterior probabilities under M1 and M2, n = 1000]
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Toy example: Laplace versus Gauss [1]

       If
              T_n = n^{-1} \sum_{i=1}^{n} X_i^4, \qquad \mu_1(\theta) = 3 + \theta^4 + 6\theta^2, \qquad \mu_2(\theta) = 6 + \cdots

       Caption: Comparison of the distributions of the posterior
       probabilities that the data is from a normal model (as opposed to a
       Laplace model) with unknown mean when the data is made of
       n = 1000 observations either from a normal (M1) or Laplace (M2)
       distribution with mean one and when the summary statistic in the
       ABC algorithm is restricted to the empirical fourth moment. The
       ABC algorithm uses proposals from the prior N (0, 4) and selects
       the tolerance as the 1% distance quantile.
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Toy example: Laplace versus Gauss [0]

       When
                                T(y) = ( ȳ_n^(4) , ȳ_n^(6) )

       and the true distribution is Laplace with mean θ0 = 0, then
        µ0 = 6, µ1(θ1∗) = 6 with θ1∗ = √(2√3 − 3)
                                                    [d1 = 1 and d2 = 1/2]
       thus
                          B12 ∼ n^(−1/4) → 0 : consistent
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Toy example: Laplace versus Gauss [0]

       When
                                T(y) = ( ȳ_n^(4) , ȳ_n^(6) )

       and the true distribution is Laplace with mean θ0 = 0, then
        µ0 = 6, µ1(θ1∗) = 6 with θ1∗ = √(2√3 − 3)
                                                    [d1 = 1 and d2 = 1/2]
       thus
                          B12 ∼ n^(−1/4) → 0 : consistent
       Under the Gaussian model µ0 = 3 and µ2(θ2) ≥ 6 > 3 ∀θ2

                                  B12 → +∞ :     consistent
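
       As a check of the pseudo-true value quoted above (a computation of mine,
       using the µ1 given earlier): solving µ1(θ1∗) = 3 + θ1∗^4 + 6θ1∗^2 = 6
       gives θ1∗^2 = 2√3 − 3, hence θ1∗ = √(2√3 − 3) ≈ 0.68.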
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Toy example: Laplace versus Gauss [0]
       When
                                T(y) = ( ȳ_n^(4) , ȳ_n^(6) )



       [Figure: density of the posterior probabilities — Fourth and sixth moments]
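
       The same sketch given earlier can be reused with the consistent statistic
       of this slide by swapping in the fourth and sixth empirical moments (the
       helper name stat46 is mine, not from the slides):

       stat46 <- function(z) c(mean(z^4), mean(z^6))   # fourth and sixth empirical moments
       # e.g. abc_postprob(rnorm(1000, 0, 1), stat = stat46)   # data from M1 with mean zero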
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Embedded models




       When M1 is a submodel of M2 and the true distribution belongs to
       the smaller model M1 , the Bayes factor is of order

                                  vn^(−(d1−d2))
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Embedded models



       When M1 is a submodel of M2 and the true distribution belongs to
       the smaller model M1 , the Bayes factor is of order

                                  vn^(−(d1−d2))

       If the summary statistic is only informative about a parameter that is
       the same under both models, i.e. if d1 = d2 , then
        c the Bayes factor is not consistent
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics


Embedded models



       When M1 is a submodel of M2 and the true distribution belongs to
       the smaller model M1 , the Bayes factor is of order

                                  vn^(−(d1−d2))

       Else, d1 < d2 and the Bayes factor is consistent under M1 . If the true
       distribution is not in M1 , then
        c Bayes factor is consistent only if µ1 = µ2 = µ0
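
       For instance, with the usual vn = √n rate (an assumption, not stated on
       this slide) and (d1 , d2 ) = (1, 1/2), the order vn^(−(d1−d2)) works out
       to n^(−1/4), the same scaling as in the consistent toy example [0] above.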

ABC in Roma

  • 1.
    Likelihood-free Methods Likelihood-free Methods Christian P. Robert Universit´ Paris-Dauphine & CREST e http://www.ceremade.dauphine.fr/~xian February 16, 2012
  • 2.
    Likelihood-free Methods Outline Introduction The Metropolis-HastingsAlgorithm simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation ABC for model choice ABC model choice consistency
  • 3.
    Likelihood-free Methods Introduction Approximate Bayesian computation Introduction Monte Carlo basics Population Monte Carlo AMIS The Metropolis-Hastings Algorithm simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation
  • 4.
    Likelihood-free Methods Introduction Monte Carlo basics General purpose Given a density π known up to a normalizing constant, and an integrable function h, compute h(x)˜ (x)µ(dx) π Π(h) = h(x)π(x)µ(dx) = π (x)µ(dx) ˜ when h(x)˜ (x)µ(dx) is intractable. π
  • 5.
    Likelihood-free Methods Introduction Monte Carlo basics Monte Carlo basics Generate an iid sample x1 , . . . , xN from π and estimate Π(h) by N ΠMC (h) = N −1 ˆ N h(xi ). i=1 ˆ as LLN: ΠMC (h) −→ Π(h) N If Π(h2 ) = h2 (x)π(x)µ(dx) < ∞, √ L CLT: ˆ N ΠMC (h) − Π(h) N N 0, Π [h − Π(h)]2 .
  • 6.
    Likelihood-free Methods Introduction Monte Carlo basics Monte Carlo basics Generate an iid sample x1 , . . . , xN from π and estimate Π(h) by N ΠMC (h) = N −1 ˆ N h(xi ). i=1 ˆ as LLN: ΠMC (h) −→ Π(h) N If Π(h2 ) = h2 (x)π(x)µ(dx) < ∞, √ L CLT: ˆ N ΠMC (h) − Π(h) N N 0, Π [h − Π(h)]2 . Caveat Often impossible or inefficient to simulate directly from Π
  • 7.
    Likelihood-free Methods Introduction Monte Carlo basics Importance Sampling For Q proposal distribution such that Q(dx) = q(x)µ(dx), alternative representation Π(h) = h(x){π/q}(x)q(x)µ(dx).
  • 8.
    Likelihood-free Methods Introduction Monte Carlo basics Importance Sampling For Q proposal distribution such that Q(dx) = q(x)µ(dx), alternative representation Π(h) = h(x){π/q}(x)q(x)µ(dx). Principle Generate an iid sample x1 , . . . , xN ∼ Q and estimate Π(h) by N ΠIS (h) = N −1 ˆ Q,N h(xi ){π/q}(xi ). i=1
  • 9.
    Likelihood-free Methods Introduction Monte Carlo basics Importance Sampling (convergence) Then ˆ as LLN: ΠIS (h) −→ Π(h) Q,N and if Q((hπ/q)2 ) < ∞, √ L CLT: ˆ N(ΠIS (h) − Π(h)) Q,N N 0, Q{(hπ/q − Π(h))2 } .
  • 10.
    Likelihood-free Methods Introduction Monte Carlo basics Importance Sampling (convergence) Then ˆ as LLN: ΠIS (h) −→ Π(h) Q,N and if Q((hπ/q)2 ) < ∞, √ L CLT: ˆ N(ΠIS (h) − Π(h)) Q,N N 0, Q{(hπ/q − Π(h))2 } . Caveat ˆ If normalizing constant unknown, impossible to use ΠIS Q,N Generic problem in Bayesian Statistics: π(θ|x) ∝ f (x|θ)π(θ).
  • 11.
    Likelihood-free Methods Introduction Monte Carlo basics Self-Normalised Importance Sampling Self normalized version N −1 N ˆ ΠSNIS (h) = {π/q}(xi ) h(xi ){π/q}(xi ). Q,N i=1 i=1
  • 12.
    Likelihood-free Methods Introduction Monte Carlo basics Self-Normalised Importance Sampling Self normalized version N −1 N ˆ ΠSNIS (h) = {π/q}(xi ) h(xi ){π/q}(xi ). Q,N i=1 i=1 ˆ as LLN : ΠSNIS (h) −→ Π(h) Q,N and if Π((1 + h2 )(π/q)) < ∞, √ L CLT : ˆ N(ΠSNIS (h) − Π(h)) Q,N N 0, π {(π/q)(h − Π(h)}2 ) .
  • 13.
    Likelihood-free Methods Introduction Monte Carlo basics Self-Normalised Importance Sampling Self normalized version N −1 N ˆ ΠSNIS (h) = {π/q}(xi ) h(xi ){π/q}(xi ). Q,N i=1 i=1 ˆ as LLN : ΠSNIS (h) −→ Π(h) Q,N and if Π((1 + h2 )(π/q)) < ∞, √ L CLT : ˆ N(ΠSNIS (h) − Π(h)) Q,N N 0, π {(π/q)(h − Π(h)}2 ) . c The quality of the SNIS approximation depends on the choice of Q
  • 14.
    Likelihood-free Methods Introduction Monte Carlo basics Iterated importance sampling Introduction of an algorithmic temporal dimension : (t) (t−1) xi ∼ qt (x|xi ) i = 1, . . . , n, t = 1, . . . and n ˆ 1 (t) (t) It = i h(xi ) n i=1 is still unbiased for (t) (t) πt (xi ) i = (t) (t−1) , i = 1, . . . , n qt (xi |xi )
  • 15.
    Likelihood-free Methods Introduction Population Monte Carlo PMC: Population Monte Carlo Algorithm At time t = 0 iid Generate (xi,0 )1≤i≤N ∼ Q0 Set ωi,0 = {π/q0 }(xi,0 ) iid Generate (Ji,0 )1≤i≤N ∼ M(1, (¯ i,0 )1≤i≤N ) ω Set xi,0 = xJi ,0 ˜
  • 16.
    Likelihood-free Methods Introduction Population Monte Carlo PMC: Population Monte Carlo Algorithm At time t = 0 iid Generate (xi,0 )1≤i≤N ∼ Q0 Set ωi,0 = {π/q0 }(xi,0 ) iid Generate (Ji,0 )1≤i≤N ∼ M(1, (¯ i,0 )1≤i≤N ) ω Set xi,0 = xJi ,0 ˜ At time t (t = 1, . . . , T ), ind Generate xi,t ∼ Qi,t (˜i,t−1 , ·) x Set ωi,t = {π(xi,t )/qi,t (˜i,t−1 , xi,t )} x iid Generate (Ji,t )1≤i≤N ∼ M(1, (¯ i,t )1≤i≤N ) ω Set xi,t = xJi,t ,t . ˜ [Capp´, Douc, Guillin, Marin, & CPR, 2009, Stat.& Comput.] e
  • 17.
    Likelihood-free Methods Introduction Population Monte Carlo Notes on PMC After T iterations of PMC, PMC estimator of Π(h) given by T N ¯ 1 ΠPMC (h) = N,T ¯ ωi,t h(xi,t ). T t=1 i=1
  • 18.
    Likelihood-free Methods Introduction Population Monte Carlo Notes on PMC After T iterations of PMC, PMC estimator of Π(h) given by T N ¯ 1 ΠPMC (h) = N,T ¯ ωi,t h(xi,t ). T t=1 i=1 1. ωi,t means normalising over whole sequence of simulations ¯ 2. Qi,t ’s chosen arbitrarily under support constraint 3. Qi,t ’s may depend on whole sequence of simulations 4. natural solution for ABC
  • 19.
    Likelihood-free Methods Introduction Population Monte Carlo Improving quality The efficiency of the SNIS approximation depends on the choice of Q, ranging from optimal q(x) ∝ |h(x) − Π(h)|π(x) to useless ˆ var ΠSNIS (h) = +∞ Q,N
  • 20.
    Likelihood-free Methods Introduction Population Monte Carlo Improving quality The efficiency of the SNIS approximation depends on the choice of Q, ranging from optimal q(x) ∝ |h(x) − Π(h)|π(x) to useless ˆ var ΠSNIS (h) = +∞ Q,N Example (PMC=adaptive importance sampling) Population Monte Carlo is producing a sequence of proposals Qt aiming at improving efficiency ˆ ˆ Kull(π, qt ) ≤ Kull(π, qt−1 ) or var ΠSNIS (h) ≤ var ΠSNIS ,∞ (h) Qt ,∞ Qt−1 [Capp´, Douc, Guillin, Marin, Robert, 04, 07a, 07b, 08] e
  • 21.
    Likelihood-free Methods Introduction AMIS Multiple Importance Sampling Reycling: given several proposals Q1 , . . . , QT , for 1 ≤ t ≤ T generate an iid sample t t x1 , . . . , xN ∼ Qt and estimate Π(h) by T N ΠMIS (h) = T −1 ˆ Q,N N −1 h(xit )ωit t=1 i=1 where π(xit ) ωit = correct... qt (xit )
  • 22.
    Likelihood-free Methods Introduction AMIS Multiple Importance Sampling Reycling: given several proposals Q1 , . . . , QT , for 1 ≤ t ≤ T generate an iid sample t t x1 , . . . , xN ∼ Qt and estimate Π(h) by T N ΠMIS (h) = T −1 ˆ Q,N N −1 h(xit )ωit t=1 i=1 where π(xit ) ωit = T still correct! T −1 =1 q (xit )
  • 23.
    Likelihood-free Methods Introduction AMIS Mixture representation Deterministic mixture correction of the weights proposed by Owen and Zhou (JASA, 2000) The corresponding estimator is still unbiased [if not self-normalised] All particles are on the same weighting scale rather than their own Large variance proposals Qt do not take over Variance reduction thanks to weight stabilization & recycling [K.o.] removes the randomness in the component choice [=Rao-Blackwellisation]
  • 24.
    Likelihood-free Methods Introduction AMIS Global adaptation Global Adaptation At iteration t = 1, · · · , T , ˆ 1. For 1 ≤ i ≤ N1 , generate xit ∼ T3 (ˆt−1 , Σt−1 ) µ 2. Calculate the mixture importance weight of particle xit ωit = π xit δit where t−1 δit = ˆ ˆ qT (3) xit ; µl , Σl l=0
  • 25.
    Likelihood-free Methods Introduction AMIS Backward reweighting 3. If t ≥ 2, actualize the weights of all past particles, xil 1≤l ≤t −1 ωil = π xit δil where ˆ δil = δil + qT (3) xil ; µt−1 , Σt−1 ˆ ˆ 4. Compute IS estimates of target mean and variance µt and Σt , ˆ where t N1 t N1 µt ˆj = ωil (xj )li ωil . . . l=1 i=1 l=1 i=1
  • 26.
    Likelihood-free Methods Introduction AMIS A toy example 10 0 −10 −20 −30 −40 −20 0 20 40 Banana shape benchmark: marginal distribution of (x1 , x2 ) for the parameters 2 σ1 = 100 and b = 0.03. Contours represent 60% (red), 90% (black) and 99.9% (blue) confidence regions in the marginal space.
  • 27.
    Likelihood-free Methods Introduction AMIS A toy example p=5 p=10 p=20 q 1e+05 20000 40000 60000 80000 60000 8e+04 6e+04 20000 q 4e+04 q q q q 0 Banana shape example: boxplots of 10 replicate ESSs for the AMIS scheme (left) and the NOT-AMIS scheme (right) for p = 5, 10, 20.
  • 28.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The Metropolis-Hastings Algorithm Introduction The Metropolis-Hastings Algorithm Monte Carlo Methods based on Markov Chains The Metropolis–Hastings algorithm The random walk Metropolis-Hastings algorithm simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation
  • 29.
    Likelihood-free Methods The Metropolis-Hastings Algorithm Monte Carlo Methods based on Markov Chains Running Monte Carlo via Markov Chains Epiphany! It is not necessary to use a sample from the distribution f to approximate the integral I= h(x)f (x)dx ,
  • 30.
    Likelihood-free Methods The Metropolis-Hastings Algorithm Monte Carlo Methods based on Markov Chains Running Monte Carlo via Markov Chains Epiphany! It is not necessary to use a sample from the distribution f to approximate the integral I= h(x)f (x)dx , Principle: Obtain X1 , . . . , Xn ∼ f (approx) without directly simulating from f , using an ergodic Markov chain with stationary distribution f [Metropolis et al., 1953]
  • 31.
    Likelihood-free Methods The Metropolis-Hastings Algorithm Monte Carlo Methods based on Markov Chains Running Monte Carlo via Markov Chains (2) Idea For an arbitrary starting value x (0) , an ergodic chain (X (t) ) is generated using a transition kernel with stationary distribution f
  • 32.
    Likelihood-free Methods The Metropolis-Hastings Algorithm Monte Carlo Methods based on Markov Chains Running Monte Carlo via Markov Chains (2) Idea For an arbitrary starting value x (0) , an ergodic chain (X (t) ) is generated using a transition kernel with stationary distribution f Insures the convergence in distribution of (X (t) ) to a random variable from f . For a “large enough” T0 , X (T0 ) can be considered as distributed from f Produce a dependent sample X (T0 ) , X (T0 +1) , . . ., which is generated from f , sufficient for most approximation purposes.
  • 33.
    Likelihood-free Methods The Metropolis-Hastings Algorithm Monte Carlo Methods based on Markov Chains Running Monte Carlo via Markov Chains (2) Idea For an arbitrary starting value x (0) , an ergodic chain (X (t) ) is generated using a transition kernel with stationary distribution f Insures the convergence in distribution of (X (t) ) to a random variable from f . For a “large enough” T0 , X (T0 ) can be considered as distributed from f Produce a dependent sample X (T0 ) , X (T0 +1) , . . ., which is generated from f , sufficient for most approximation purposes. Problem: How can one build a Markov chain with a given stationary distribution?
  • 34.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The Metropolis–Hastings algorithm The Metropolis–Hastings algorithm Basics The algorithm uses the objective (target) density f and a conditional density q(y |x) called the instrumental (or proposal) distribution
  • 35.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The Metropolis–Hastings algorithm The MH algorithm Algorithm (Metropolis–Hastings) Given x (t) , 1. Generate Yt ∼ q(y |x (t) ). 2. Take Yt with prob. ρ(x (t) , Yt ), X (t+1) = x (t) with prob. 1 − ρ(x (t) , Yt ), where f (y ) q(x|y ) ρ(x, y ) = min ,1 . f (x) q(y |x)
  • 36.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The Metropolis–Hastings algorithm Features Independent of normalizing constants for both f and q(·|x) (ie, those constants independent of x) Never move to values with f (y ) = 0 The chain (x (t) )t may take the same value several times in a row, even though f is a density wrt Lebesgue measure The sequence (yt )t is usually not a Markov chain
  • 37.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The Metropolis–Hastings algorithm Convergence properties The M-H Markov chain is reversible, with invariant/stationary density f since it satisfies the detailed balance condition f (y ) K (y , x) = f (x) K (x, y )
  • 38.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The Metropolis–Hastings algorithm Convergence properties The M-H Markov chain is reversible, with invariant/stationary density f since it satisfies the detailed balance condition f (y ) K (y , x) = f (x) K (x, y ) If q(y |x) > 0 for every (x, y ), the chain is Harris recurrent and T 1 lim h(X (t) ) = h(x)df (x) a.e. f . T →∞ T t=1 lim K n (x, ·)µ(dx) − f =0 n→∞ TV for every initial distribution µ
  • 39.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The random walk Metropolis-Hastings algorithm Random walk Metropolis–Hastings Use of a local perturbation as proposal Yt = X (t) + εt , where εt ∼ g , independent of X (t) . The instrumental density is now of the form g (y − x) and the Markov chain is a random walk if we take g to be symmetric g (x) = g (−x)
  • 40.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The random walk Metropolis-Hastings algorithm Algorithm (Random walk Metropolis) Given x (t) 1. Generate Yt ∼ g (y − x (t) ) 2. Take f (Yt )  Y with prob. min 1, , (t+1) t X = f (x (t) )  (t) x otherwise.
  • 41.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The random walk Metropolis-Hastings algorithm RW-MH on mixture posterior distribution 3 2 µ2 1 0 X −1 −1 0 1 2 3 µ1 Random walk MCMC output for .7N (µ1 , 1) + .3N (µ2 , 1)
  • 42.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The random walk Metropolis-Hastings algorithm Acceptance rate A high acceptance rate is not indication of efficiency since the random walk may be moving “too slowly” on the target surface
  • 43.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The random walk Metropolis-Hastings algorithm Acceptance rate A high acceptance rate is not indication of efficiency since the random walk may be moving “too slowly” on the target surface If x (t) and yt are “too close”, i.e. f (x (t) ) f (yt ), yt is accepted with probability f (yt ) min ,1 1. f (x (t) ) and acceptance rate high
  • 44.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The random walk Metropolis-Hastings algorithm Acceptance rate A high acceptance rate is not indication of efficiency since the random walk may be moving “too slowly” on the target surface If average acceptance rate low, the proposed values f (yt ) tend to be small wrt f (x (t) ), i.e. the random walk [not the algorithm!] moves quickly on the target surface often reaching its boundaries
  • 45.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The random walk Metropolis-Hastings algorithm Rule of thumb In small dimensions, aim at an average acceptance rate of 50%. In large dimensions, at an average acceptance rate of 25%. [Gelman,Gilks and Roberts, 1995]
  • 46.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The random walk Metropolis-Hastings algorithm Noisy AR(1) Target distribution of x given x1 , x2 and y is −1 τ2 exp (x − ϕx1 )2 + (x2 − ϕx)2 + (y − x 2 )2 . 2τ 2 σ2 For a Gaussian random walk with scale ω small enough, the random walk never jumps to the other mode. But if the scale ω is sufficiently large, the Markov chain explores both modes and give a satisfactory approximation of the target distribution.
  • 47.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The random walk Metropolis-Hastings algorithm Noisy AR(2) Markov chain based on a random walk with scale ω = .1.
  • 48.
    Likelihood-free Methods The Metropolis-Hastings Algorithm The random walk Metropolis-Hastings algorithm Noisy AR(3) Markov chain based on a random walk with scale ω = .5.
  • 49.
    Likelihood-free Methods simulation-based methods in Econometrics Meanwhile, in a Gaulish village... Similar exploration of simulation-based techniques in Econometrics Simulated method of moments Method of simulated moments Simulated pseudo-maximum-likelihood Indirect inference [Gouri´roux & Monfort, 1996] e
  • 50.
    Likelihood-free Methods simulation-based methods in Econometrics Simulated method of moments o Given observations y1:n from a model yt = r (y1:(t−1) , t , θ) , t ∼ g (·) simulate 1:n , derive yt (θ) = r (y1:(t−1) , t , θ) and estimate θ by n arg min (yto − yt (θ))2 θ t=1
  • 51.
    Likelihood-free Methods simulation-based methods in Econometrics Simulated method of moments o Given observations y1:n from a model yt = r (y1:(t−1) , t , θ) , t ∼ g (·) simulate 1:n , derive yt (θ) = r (y1:(t−1) , t , θ) and estimate θ by n n 2 arg min yto − yt (θ) θ t=1 t=1
  • 52.
    Likelihood-free Methods simulation-based methods in Econometrics Method of simulated moments Given a statistic vector K (y ) with Eθ [K (Yt )|y1:(t−1) ] = k(y1:(t−1) ; θ) find an unbiased estimator of k(y1:(t−1) ; θ), ˜ k( t , y1:(t−1) ; θ) Estimate θ by n S arg min K (yt ) − ˜ t k( s , y1:(t−1) ; θ)/S θ t=1 s=1 [Pakes & Pollard, 1989]
  • 53.
    Likelihood-free Methods simulation-based methods in Econometrics Indirect inference ˆ Minimise (in θ) the distance between estimators β based on pseudo-models for genuine observations and for observations simulated under the true model and the parameter θ. [Gouri´roux, Monfort, & Renault, 1993; e Smith, 1993; Gallant & Tauchen, 1996]
  • 54.
    Likelihood-free Methods simulation-based methods in Econometrics Indirect inference (PML vs. PSE) Example of the pseudo-maximum-likelihood (PML) ˆ β(y) = arg max log f (yt |β, y1:(t−1) ) β t leading to ˆ ˆ arg min ||β(yo ) − β(y1 (θ), . . . , yS (θ))||2 θ when ys (θ) ∼ f (y|θ) s = 1, . . . , S
  • 55.
    Likelihood-free Methods simulation-based methods in Econometrics Indirect inference (PML vs. PSE) Example of the pseudo-score-estimator (PSE) 2 ˆ ∂ log f β(y) = arg min (yt |β, y1:(t−1) ) β t ∂β leading to ˆ ˆ arg min ||β(yo ) − β(y1 (θ), . . . , yS (θ))||2 θ when ys (θ) ∼ f (y|θ) s = 1, . . . , S
  • 56.
    Likelihood-free Methods simulation-based methods in Econometrics Consistent indirect inference ...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identified the different methods become easier...
  • 57.
    Likelihood-free Methods simulation-based methods in Econometrics Consistent indirect inference ...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identified the different methods become easier... Consistency depending on the criterion and on the asymptotic identifiability of θ [Gouri´roux, Monfort, 1996, p. 66] e
  • 58.
    Likelihood-free Methods simulation-based methods in Econometrics AR(2) vs. MA(1) example true (AR) model yt = t −θ t−1 and [wrong!] auxiliary (MA) model yt = β1 yt−1 + β2 yt−2 + ut R code x=eps=rnorm(250) x[2:250]=x[2:250]-0.5*x[1:249] simeps=rnorm(250) propeta=seq(-.99,.99,le=199) dist=rep(0,199) bethat=as.vector(arima(x,c(2,0,0),incl=FALSE)$coef) for (t in 1:199) dist[t]=sum((as.vector(arima(c(simeps[1],simeps[2:250]-propeta[t]* simeps[1:249]),c(2,0,0),incl=FALSE)$coef)-bethat)^2)
  • 59.
    Likelihood-free Methods simulation-based methods in Econometrics AR(2) vs. MA(1) example One sample: 0.8 0.6 distance 0.4 0.2 0.0 −1.0 −0.5 0.0 0.5 1.0
  • 60.
    Likelihood-free Methods simulation-based methods in Econometrics AR(2) vs. MA(1) example Many samples: 6 5 4 3 2 1 0 0.2 0.4 0.6 0.8 1.0
  • 61.
    Likelihood-free Methods simulation-based methods in Econometrics Choice of pseudo-model Pick model such that ˆ 1. β(θ) not flat (i.e. sensitive to changes in θ) ˆ 2. β(θ) not dispersed (i.e. robust agains changes in ys (θ)) [Frigessi & Heggland, 2004]
  • 62.
    Likelihood-free Methods simulation-based methods in Econometrics ABC using indirect inference We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms by using indirect inference(...) In the indirect inference approach to ABC the parameters of an auxiliary model fitted to the data become the summary statistics. Although applicable to any ABC technique, we embed this approach within a sequential Monte Carlo algorithm that is completely adaptive and requires very little tuning(...) [Drovandi, Pettitt & Faddy, 2011] c Indirect inference provides summary statistics for ABC...
  • 63.
    Likelihood-free Methods simulation-based methods in Econometrics ABC using indirect inference ...the above result shows that, in the limit as h → 0, ABC will be more accurate than an indirect inference method whose auxiliary statistics are the same as the summary statistic that is used for ABC(...) Initial analysis showed that which method is more accurate depends on the true value of θ. [Fearnhead and Prangle, 2012] c Indirect inference provides estimates rather than global inference...
  • 64.
    Likelihood-free Methods Genetics of ABC Genetic background of ABC ABC is a recent computational technique that only requires being able to sample from the likelihood f (·|θ) This technique stemmed from population genetics models, about 15 years ago, and population geneticists still contribute significantly to methodological developments of ABC. [Griffith & al., 1997; Tavar´ & al., 1999] e
  • 65.
    Likelihood-free Methods Genetics of ABC Population genetics [Part derived from the teaching material of Raphael Leblois, ENS Lyon, November 2010] Describe the genotypes, estimate the alleles frequencies, determine their distribution among individuals, populations and between populations; Predict and understand the evolution of gene frequencies in populations as a result of various factors. c Analyses the effect of various evolutive forces (mutation, drift, migration, selection) on the evolution of gene frequencies in time and space.
  • 66.
    Likelihood-free Methods Genetics of ABC A[B]ctors Les mécanismes de l’évolution Towards an explanation of the mechanisms of evolutionary changes, mathematical models ofl’évolution ont to be developed •! Les mathématiques de evolution started in the years 1920-1930. commencé à être développés dans les années 1920-1930 Ronald A Fisher (1890-1962) Sewall Wright (1889-1988) John BS Haldane (1892-1964)
  • 67.
    Likelihood-free Methods Genetics of ABC Wright-Fisher model de Le modèle Wright-Fisher •! En l’absence de mutation et de sélection,A population of constant les fréquences alléliquessize, in which individuals dérivent (augmentent et diminuent) inévitablement reproduce at the same time. jusqu’à la fixation d’un allèle Each gene in a generation is a copy of a gene of the •! La dérivepreviousdonc à la conduit generation. perte de variation génétique à l’intérieur In the absence of mutation des populations and selection, allele frequencies derive inevitably until the fixation of an allele.
  • 68.
    Likelihood-free Methods Genetics of ABC Coalescent theory [Kingman, 1982; Tajima, Tavar´, &tc] e !"#$%&'(('")**+$,-'".'"/010234%'".'5"*$*%()23$15"6" Coalescence theory interested in the genealogy of a sample of !!7**+$,-'",()5534%'" common ancestor of the sample. " genes back in time to the " "!"7**+$,-'"8",$)('5,'1,'"9"
  • 69.
    Likelihood-free Methods Genetics of ABC Common ancestor Les différentes lignées fusionnent (coalescent) au fur et à mesure que l’on remonte vers le Time of coalescence passé (T) Modélisation du processus de dérive génétique The different lineages mergeen “remontant dans lein the past. when we go back temps” jusqu’à l’ancêtre commun d’un échantillon de gènes 6
  • 70.
    Likelihood-free Methods Genetics of ABC Sous l’hypothèse de neutralité des marqueurs génétiques étudiés, les mutations sont indépendantes de la généalogie Neutral généalogie ne dépend que des processus démographiques i.e. la mutations On construit donc la généalogie selon les paramètres démographiques (ex. N), puis on ajoute aassumption les Under the posteriori of mutations sur les différentes neutrality, the mutations branches,independentau feuilles de are du MRCA of the l’arbregenealogy. On obtient ainsi des données de We construct the genealogy polymorphisme sous les modèles according to the démographiques et mutationnels demographic parameters, considérés then we add a posteriori the mutations. 20
  • 71.
    Likelihood-free Methods Genetics of ABC Demo-genetic inference Each model is characterized by a set of parameters θ that cover historical (time divergence, admixture time ...), demographics (population sizes, admixture rates, migration rates, ...) and genetic (mutation rate, ...) factors The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time Problem: most of the time, we can not calculate the likelihood of the polymorphism data f (y|θ).
  • 72.
    Likelihood-free Methods Genetics of ABC Untractable likelihood Missing (too missing!) data structure: f (y|θ) = f (y|G , θ)f (G |θ)dG G The genealogies are considered as nuisance parameters. This problematic thus differs from the phylogenetic approach where the tree is the parameter of interesst.
  • 73.
    Likelihood-free Methods Genetics of ABC A genuine !""#$%&'()*+,(-*.&(/+0$'"1)()&$/+2!,03! example of application 1/+*%*'"4*+56(""4&7()&$/.+.1#+4*.+8-9':*.+ 94 Pygmies populations: do they have a common origin? Is there a lot of exchanges between pygmies and non-pygmies populations?
  • 74.
    Likelihood-free Methods Genetics of ABC !""#$%&'()*+,(-*.&(/+0$'"1)()&$/+2!,03! 1/+*%*'"4*+56(""4&7()&$/.+.1#+4*.+8-9':*.+ Alternative scenarios Différents scénarios possibles, choix de scenario par ABC Verdu et al. 2009 96
  • 75.
    1/+*%*'"4*+56(""4&7()&$/.+.1#+4*.+ Likelihood-free Methods !""#$%&'()*+,(-*.&(/+0$'"1)()&$/+2!,03 Genetics of ABC 1/+*%*'"4*+56(""4&7()&$/.+.1#+4*.+8-9':*. Différentsresults Simulation scénarios possibles, choix de scenari Différents scénarios possibles, choix de scenario par ABC c Scenario 1A is chosen. largement soutenu par rapport aux Le scenario 1a est autres ! plaide pour une origine commune des Le scenario 1a est largement soutenu par populations pygmées d’Afrique de l’Ouest rap Verdu e
  • 76.
    Likelihood-free Methods Genetics of ABC !""#$%&'()*+,(-*.&(/+0$'"1)()&$/+2!,03! Most likely scenario 1/+*%*'"4*+56(""4&7()&$/.+.1#+4*.+8-9':*.+ Scénario évolutif : on « raconte » une histoire à partir de ces inférences Verdu et al. 2009 99
  • 77.
    Likelihood-free Methods Approximate Bayesian computation Approximate Bayesian computation Introduction The Metropolis-Hastings Algorithm simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation ABC basics Alphabet soup Automated summary statistic selection
  • 78.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Untractable likelihoods Cases when the likelihood function f (y|θ) is unavailable and when the completion step f (y|θ) = f (y, z|θ) dz Z is impossible or too costly because of the dimension of z c MCMC cannot be implemented!
  • 79.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Untractable likelihoods
  • 80.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Illustrations Example Stochastic volatility model: for Highest weight trajectories t = 1, . . . , T , 0.4 0.2 yt = exp(zt ) t , zt = a+bzt−1 +σηt , 0.0 −0.2 T very large makes it difficult to −0.4 include z within the simulated 0 200 400 t 600 800 1000 parameters
  • 81.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Illustrations Example Potts model: if y takes values on a grid Y of size k n and f (y|θ) ∝ exp θ Iyl =yi l∼i where l∼i denotes a neighbourhood relation, n moderately large prohibits the computation of the normalising constant
  • 82.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Illustrations Example Inference on CMB: in cosmology, study of the Cosmic Microwave Background via likelihoods immensely slow to computate (e.g WMAP, Plank), because of numerically costly spectral transforms [Data is a Fortran program] [Kilbinger et al., 2010, MNRAS]
  • 83.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Illustrations Example Phylogenetic tree: in population genetics, reconstitution of a common ancestor from a sample of genes via a phylogenetic tree that is close to impossible to integrate out [100 processor days with 4 parameters] [Cornuet et al., 2009, Bioinformatics]
  • 84.
    Likelihood-free Methods Approximate Bayesian computation ABC basics The ABC method Bayesian setting: target is π(θ)f (x|θ)
  • 85.
    Likelihood-free Methods Approximate Bayesian computation ABC basics The ABC method Bayesian setting: target is π(θ)f (x|θ) When likelihood f (x|θ) not in closed form, likelihood-free rejection technique:
  • 86.
    Likelihood-free Methods Approximate Bayesian computation ABC basics The ABC method Bayesian setting: target is π(θ)f (x|θ) When likelihood f (x|θ) not in closed form, likelihood-free rejection technique: ABC algorithm For an observation y ∼ f (y|θ), under the prior π(θ), keep jointly simulating θ ∼ π(θ) , z ∼ f (z|θ ) , until the auxiliary variable z is equal to the observed value, z = y. [Tavar´ et al., 1997] e
  • 87.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Why does it work?! The proof is trivial: f (θi ) ∝ π(θi )f (z|θi )Iy (z) z∈D ∝ π(θi )f (y|θi ) = π(θi |y) . [Accept–Reject 101]
  • 88.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Earlier occurrence ‘Bayesian statistics and Monte Carlo methods are ideally suited to the task of passing many models over one dataset’ [Don Rubin, Annals of Statistics, 1984] Note Rubin (1984) does not promote this algorithm for likelihood-free simulation but frequentist intuition on posterior distributions: parameters from posteriors are more likely to be those that could have generated the data.
  • 89.
    Likelihood-free Methods Approximate Bayesian computation ABC basics A as A...pproximative When y is a continuous random variable, equality z = y is replaced with a tolerance condition, (y, z) ≤ where is a distance
  • 90.
    Likelihood-free Methods Approximate Bayesian computation ABC basics A as A...pproximative When y is a continuous random variable, equality z = y is replaced with a tolerance condition, (y, z) ≤ where is a distance Output distributed from π(θ) Pθ { (y, z) < } ∝ π(θ| (y, z) < ) [Pritchard et al., 1999]
  • 91.
    Likelihood-free Methods Approximate Bayesian computation ABC basics ABC algorithm Algorithm 1 Likelihood-free rejection sampler 2 for i = 1 to N do repeat generate θ from the prior distribution π(·) generate z from the likelihood f (·|θ ) until ρ{η(z), η(y)} ≤ set θi = θ end for where η(y) defines a (not necessarily sufficient) statistic
  • 92.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Output The likelihood-free algorithm samples from the marginal in z of: π(θ)f (z|θ)IA ,y (z) π (θ, z|y) = , A ,y ×Θ π(θ)f (z|θ)dzdθ where A ,y = {z ∈ D|ρ(η(z), η(y)) < }.
  • 93.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Output The likelihood-free algorithm samples from the marginal in z of: π(θ)f (z|θ)IA ,y (z) π (θ, z|y) = , A ,y ×Θ π(θ)f (z|θ)dzdθ where A ,y = {z ∈ D|ρ(η(z), η(y)) < }. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution: π (θ|y) = π (θ, z|y)dz ≈ π(θ|y) .
  • 94.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Convergence of ABC (first attempt) What happens when → 0?
  • 95.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Convergence of ABC (first attempt) What happens when → 0? If f (·|θ) is continuous in y , uniformly in θ [!], given an arbitrary δ > 0, there exists 0 such that < 0 implies π(θ) f (z|θ)IA ,y (z) dz π(θ)f (y|θ)(1 δ)µ(B ) ∈ A ,y ×Θ π(θ)f (z|θ)dzdθ Θ π(θ)f (y|θ)dθ(1 ± δ)µ(B )
  • 96.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Convergence of ABC (first attempt) What happens when → 0? If f (·|θ) is continuous in y , uniformly in θ [!], given an arbitrary δ > 0, there exists 0 such that < 0 implies π(θ) f (z|θ)IA ,y (z) dz π(θ)f (y|θ)(1 δ) XX ) µ(BX ∈ A ,y ×Θ π(θ)f (z|θ)dzdθ Θ π(θ)f (y|θ)dθ(1 ± δ) µ(B ) XXX
  • 97.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Convergence of ABC (first attempt) What happens when → 0? If f (·|θ) is continuous in y , uniformly in θ [!], given an arbitrary δ 0, there exists 0 such that 0 implies π(θ) f (z|θ)IA ,y (z) dz π(θ)f (y|θ)(1 δ) XX ) µ(BX ∈ A ,y ×Θ π(θ)f (z|θ)dzdθ Θ π(θ)f (y|θ)dθ(1 ± δ) X µ(B ) X X [Proof extends to other continuous-in-0 kernels K ]
  • 98.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Convergence of ABC (second attempt) What happens when → 0?
  • 99.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Convergence of ABC (second attempt) What happens when → 0? For B ⊂ Θ, we have A f (z|θ)dz f (z|θ)π(θ)dθ ,y B π(θ)dθ = dz B A ,y ×Θ π(θ)f (z|θ)dzdθ A ,y A ,y ×Θ π(θ)f (z|θ)dzdθ B f (z|θ)π(θ)dθ m(z) = dz A ,y m(z) A ,y ×Θ π(θ)f (z|θ)dzdθ m(z) = π(B|z) dz A ,y A ,y ×Θ π(θ)f (z|θ)dzdθ which indicates convergence for a continuous π(B|z).
  • 100.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Probit modelling on Pima Indian women Example (R benchmark) 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes
  • 101.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Probit modelling on Pima Indian women Example (R benchmark) 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes Probability of diabetes function of above variables P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
  • 102.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Probit modelling on Pima Indian women Example (R benchmark) 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes Probability of diabetes function of above variables P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) , Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a g -prior modelling: β ∼ N3 (0, n XT X)−1
  • 103.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Probit modelling on Pima Indian women Example (R benchmark) 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes Probability of diabetes function of above variables P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) , Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a g -prior modelling: β ∼ N3 (0, n XT X)−1 Use of importance function inspired from the MLE estimate distribution ˆ ˆ β ∼ N (β, Σ)
  • 104.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Pima Indian benchmark 80 100 1.0 80 60 0.8 60 0.6 Density Density Density 40 40 0.4 20 20 0.2 0.0 0 0 −0.005 0.010 0.020 0.030 −0.05 −0.03 −0.01 −1.0 0.0 1.0 2.0 Figure: Comparison between density estimates of the marginals on β1 (left), β2 (center) and β3 (right) from ABC rejection samples (red) and MCMC samples (black) .
  • 105.
    Likelihood-free Methods Approximate Bayesian computation ABC basics MA example Back to the MA(q) model q xt = t + ϑi t−i i=1 Simple prior: uniform over the inverse [real and complex] roots in q Q(u) = 1 − ϑi u i i=1 under the identifiability conditions
  • 106.
    Likelihood-free Methods Approximate Bayesian computation ABC basics MA example Back to the MA(q) model q xt = t + ϑi t−i i=1 Simple prior: uniform prior over the identifiability zone, e.g. triangle for MA(2)
  • 107.
    Likelihood-free Methods Approximate Bayesian computation ABC basics MA example (2) ABC algorithm thus made of 1. picking a new value (ϑ1 , ϑ2 ) in the triangle 2. generating an iid sequence ( t )−qt≤T 3. producing a simulated series (xt )1≤t≤T
  • 108.
    Likelihood-free Methods Approximate Bayesian computation ABC basics MA example (2) ABC algorithm thus made of 1. picking a new value (ϑ1 , ϑ2 ) in the triangle 2. generating an iid sequence ( t )−qt≤T 3. producing a simulated series (xt )1≤t≤T Distance: basic distance between the series T ρ((xt )1≤t≤T , (xt )1≤t≤T ) = (xt − xt )2 t=1 or distance between summary statistics like the q autocorrelations T τj = xt xt−j t=j+1
  • 109.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Comparison of distance impact Evaluation of the tolerance on the ABC sample against both distances ( = 100%, 10%, 1%, 0.1%) for an MA(2) model
  • 110.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Comparison of distance impact 4 1.5 3 1.0 2 0.5 1 0.0 0 0.0 0.2 0.4 0.6 0.8 −2.0 −1.0 0.0 0.5 1.0 1.5 θ1 θ2 Evaluation of the tolerance on the ABC sample against both distances ( = 100%, 10%, 1%, 0.1%) for an MA(2) model
  • 111.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Comparison of distance impact 4 1.5 3 1.0 2 0.5 1 0.0 0 0.0 0.2 0.4 0.6 0.8 −2.0 −1.0 0.0 0.5 1.0 1.5 θ1 θ2 Evaluation of the tolerance on the ABC sample against both distances ( = 100%, 10%, 1%, 0.1%) for an MA(2) model
  • 112.
    Likelihood-free Methods Approximate Bayesian computation ABC basics Homonomy The ABC algorithm is not to be confused with the ABC algorithm The Artificial Bee Colony algorithm is a swarm based meta-heuristic algorithm that was introduced by Karaboga in 2005 for optimizing numerical problems. It was inspired by the intelligent foraging behavior of honey bees. The algorithm is specifically based on the model proposed by Tereshko and Loengarov (2005) for the foraging behaviour of honey bee colonies. The model consists of three essential components: employed and unemployed foraging bees, and food sources. The first two components, employed and unemployed foraging bees, search for rich food sources (...) close to their hive. The model also defines two leading modes of behaviour (...): recruitment of foragers to rich food sources resulting in positive feedback and abandonment of poor sources by foragers causing negative feedback. [Karaboga, Scholarpedia]
  • 113.
    Likelihood-free Methods Approximate Bayesian computation ABC basics ABC advances Simulating from the prior is often poor in efficiency
  • 114.
    Likelihood-free Methods Approximate Bayesian computation ABC basics ABC advances Simulating from the prior is often poor in efficiency Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y ... [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007]
  • 115.
    Likelihood-free Methods Approximate Bayesian computation ABC basics ABC advances Simulating from the prior is often poor in efficiency Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y ... [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007] ...or by viewing the problem as a conditional density estimation and by developing techniques to allow for larger [Beaumont et al., 2002]
  • 116.
    Likelihood-free Methods Approximate Bayesian computation ABC basics ABC advances Simulating from the prior is often poor in efficiency Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y ... [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007] ...or by viewing the problem as a conditional density estimation and by developing techniques to allow for larger [Beaumont et al., 2002] .....or even by including in the inferential framework [ABCµ ] [Ratmann et al., 2009]
  • 117.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NP Better usage of [prior] simulations by adjustement: instead of throwing away θ such that ρ(η(z), η(y)) , replace θ’s with locally regressed transforms (use with BIC) θ∗ = θ − {η(z) − η(y)}T β ˆ [Csill´ry et al., TEE, 2010] e ˆ where β is obtained by [NP] weighted least square regression on (η(z) − η(y)) with weights Kδ {ρ(η(z), η(y))} [Beaumont et al., 2002, Genetics]
  • 118.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NP (regression) Also found in the subsequent literature, e.g. in Fearnhead-Prangle (2012) : weight directly simulation by Kδ {ρ(η(z(θ)), η(y))} or S 1 Kδ {ρ(η(zs (θ)), η(y))} S s=1 [consistent estimate of f (η|θ)]
  • 119.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NP (regression) Also found in the subsequent literature, e.g. in Fearnhead-Prangle (2012) : weight directly simulation by Kδ {ρ(η(z(θ)), η(y))} or S 1 Kδ {ρ(η(zs (θ)), η(y))} S s=1 [consistent estimate of f (η|θ)] Curse of dimensionality: poor estimate when d = dim(η) is large...
  • 120.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NP (density estimation) Use of the kernel weights Kδ {ρ(η(z(θ)), η(y))} leads to the NP estimate of the posterior expectation i θi Kδ {ρ(η(z(θi )), η(y))} i Kδ {ρ(η(z(θi )), η(y))} [Blum, JASA, 2010]
  • 121.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NP (density estimation) Use of the kernel weights Kδ {ρ(η(z(θ)), η(y))} leads to the NP estimate of the posterior conditional density ˜ Kb (θi − θ)Kδ {ρ(η(z(θi )), η(y))} i i Kδ {ρ(η(z(θi )), η(y))} [Blum, JASA, 2010]
  • 122.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NP (density estimations) Other versions incorporating regression adjustments ˜ Kb (θi∗ − θ)Kδ {ρ(η(z(θi )), η(y))} i i Kδ {ρ(η(z(θi )), η(y))}
  • 123.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NP (density estimations) Other versions incorporating regression adjustments ˜ Kb (θi∗ − θ)Kδ {ρ(η(z(θi )), η(y))} i i Kδ {ρ(η(z(θi )), η(y))} In all cases, error E[ˆ (θ|y)] − g (θ|y) = cb 2 + cδ 2 + OP (b 2 + δ 2 ) + OP (1/nδ d ) g c var(ˆ (θ|y)) = g (1 + oP (1)) nbδ d [Blum, JASA, 2010]
  • 124.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NP (density estimations) Other versions incorporating regression adjustments ˜ Kb (θi∗ − θ)Kδ {ρ(η(z(θi )), η(y))} i i Kδ {ρ(η(z(θi )), η(y))} In all cases, error E[ˆ (θ|y)] − g (θ|y) = cb 2 + cδ 2 + OP (b 2 + δ 2 ) + OP (1/nδ d ) g c var(ˆ (θ|y)) = g (1 + oP (1)) nbδ d [standard NP calculations]
  • 125.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NCH Incorporating non-linearities and heterocedasticities: σ (η(y)) ˆ θ∗ = m(η(y)) + [θ − m(η(z))] ˆ ˆ σ (η(z)) ˆ
  • 126.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NCH Incorporating non-linearities and heterocedasticities: σ (η(y)) ˆ θ∗ = m(η(y)) + [θ − m(η(z))] ˆ ˆ σ (η(z)) ˆ where m(η) estimated by non-linear regression (e.g., neural network) ˆ σ (η) estimated by non-linear regression on residuals ˆ log{θi − m(ηi )}2 = log σ 2 (ηi ) + ξi ˆ [Blum Fran¸ois, 2009] c
  • 127.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NCH (2) Why neural network?
  • 128.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-NCH (2) Why neural network? fights curse of dimensionality selects relevant summary statistics provides automated dimension reduction offers a model choice capability improves upon multinomial logistic [Blum Fran¸ois, 2009] c
  • 129.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-MCMC Markov chain (θ(t) ) created via the transition function  θ ∼ Kω (θ |θ(t) ) if x ∼ f (x|θ ) is such that x = y   π(θ )Kω (t) |θ ) θ (t+1) = and u ∼ U(0, 1) ≤ π(θ(t) )K (θ |θ(t) ) ,  ω (θ  (t) θ otherwise,
  • 130.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-MCMC Markov chain (θ(t) ) created via the transition function  θ ∼ Kω (θ |θ(t) ) if x ∼ f (x|θ ) is such that x = y   π(θ )Kω (t) |θ ) θ (t+1) = and u ∼ U(0, 1) ≤ π(θ(t) )K (θ |θ(t) ) ,  ω (θ  (t) θ otherwise, has the posterior π(θ|y ) as stationary distribution [Marjoram et al, 2003]
  • 131.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-MCMC (2) Algorithm 2 Likelihood-free MCMC sampler Use Algorithm 1 to get (θ(0) , z(0) ) for t = 1 to N do Generate θ from Kω ·|θ(t−1) , Generate z from the likelihood f (·|θ ), Generate u from U[0,1] , π(θ )Kω (θ(t−1) |θ ) if u ≤ I π(θ(t−1) Kω (θ |θ(t−1) ) A ,y (z ) then set (θ(t) , z(t) ) = (θ , z ) else (θ(t) , z(t) )) = (θ(t−1) , z(t−1) ), end if end for
  • 132.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup Why does it work? Acceptance probability does not involve calculating the likelihood and π (θ , z |y) q(θ (t−1) |θ )f (z(t−1) |θ (t−1) ) × π (θ (t−1) , z(t−1) |y) q(θ |θ (t−1) )f (z |θ ) π(θ ) XX|θ IA ,y (z ) ) f (z X X = π(θ (t−1) ) f (z(t−1) |θ (t−1) ) IA ,y (z(t−1) ) q(θ (t−1) |θ ) f (z(t−1) |θ (t−1) ) × q(θ |θ (t−1) ) X|θ X ) f (z XX
  • 133.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup Why does it work? Acceptance probability does not involve calculating the likelihood and π (θ , z |y) q(θ (t−1) |θ )f (z(t−1) |θ (t−1) ) × π (θ (t−1) , z(t−1) |y) q(θ |θ (t−1) )f (z |θ ) π(θ ) XX|θ IA ,y (z ) ) f (z X X = hh ( π(θ (t−1) ) ((hhhhh) IA ,y (z(t−1) ) f (z(t−1) |θ (t−1) ((( h (( hh ( q(θ (t−1) |θ ) ((hhhhh) f (z(t−1) |θ (t−1) (( ((( h × q(θ |θ (t−1) ) X|θ X ) f (z XX
  • 134.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup Why does it work? Acceptance probability does not involve calculating the likelihood and π (θ , z |y) q(θ (t−1) |θ )f (z(t−1) |θ (t−1) ) × π (θ (t−1) , z(t−1) |y) q(θ |θ (t−1) )f (z |θ ) π(θ ) X|θ IA ,y (z ) X ) f (z X X = hh ( π(θ (t−1) ) ((hhhhh) XX(t−1) ) f (z(t−1) |θ (t−1) IAX (( X ((( h ,y (zX X hh ( q(θ (t−1) |θ ) ((hhhhh) f (z(t−1) |θ (t−1) (( ((( h × q(θ |θ (t−1) ) X|θ X ) f (z XX π(θ )q(θ (t−1) |θ ) = IA ,y (z ) π(θ (t−1) q(θ |θ (t−1) )
  • 135.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup A toy example Case of 1 1 x ∼ N (θ, 1) + N (θ, 1) 2 2 under prior θ ∼ N (0, 10)
  • 136.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup A toy example Case of 1 1 x ∼ N (θ, 1) + N (θ, 1) 2 2 under prior θ ∼ N (0, 10) ABC sampler thetas=rnorm(N,sd=10) zed=sample(c(1,-1),N,rep=TRUE)*thetas+rnorm(N,sd=1) eps=quantile(abs(zed-x),.01) abc=thetas[abs(zed-x)eps]
  • 137.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup A toy example Case of 1 1 x ∼ N (θ, 1) + N (θ, 1) 2 2 under prior θ ∼ N (0, 10) ABC-MCMC sampler metas=rep(0,N) metas[1]=rnorm(1,sd=10) zed[1]=x for (t in 2:N){ metas[t]=rnorm(1,mean=metas[t-1],sd=5) zed[t]=rnorm(1,mean=(1-2*(runif(1).5))*metas[t],sd=1) if ((abs(zed[t]-x)eps)||(runif(1)dnorm(metas[t],sd=10)/dnorm(metas[t-1],sd=10))){ metas[t]=metas[t-1] zed[t]=zed[t-1]} }
  • 138.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup A toy example x =2
  • 139.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup A toy example x =2 0.30 0.25 0.20 0.15 0.10 0.05 0.00 −4 −2 0 2 4 θ
  • 140.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup A toy example x = 10
  • 141.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup A toy example x = 10 0.25 0.20 0.15 0.10 0.05 0.00 −10 −5 0 5 10 θ
  • 142.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup A toy example x = 50
  • 143.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup A toy example x = 50 0.10 0.08 0.06 0.04 0.02 0.00 −40 −20 0 20 40 θ
  • 144.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABCµ [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS] Use of a joint density f (θ, |y) ∝ ξ( |y, θ) × πθ (θ) × π ( ) where y is the data, and ξ( |y, θ) is the prior predictive density of ρ(η(z), η(y)) given θ and x when z ∼ f (z|θ)
  • 145.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABCµ [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS] Use of a joint density f (θ, |y) ∝ ξ( |y, θ) × πθ (θ) × π ( ) where y is the data, and ξ( |y, θ) is the prior predictive density of ρ(η(z), η(y)) given θ and x when z ∼ f (z|θ) Warning! Replacement of ξ( |y, θ) with a non-parametric kernel approximation.
  • 146.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABCµ details Multidimensional distances ρk (k = 1, . . . , K ) and errors k = ρk (ηk (z), ηk (y)), with ˆ 1 k ∼ ξk ( |y, θ) ≈ ξk ( |y, θ) = K [{ k −ρk (ηk (zb ), ηk (y))}/hk ] Bhk b ˆ then used in replacing ξ( |y, θ) with mink ξk ( |y, θ)
  • 147.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABCµ details Multidimensional distances ρk (k = 1, . . . , K ) and errors k = ρk (ηk (z), ηk (y)), with ˆ 1 k ∼ ξk ( |y, θ) ≈ ξk ( |y, θ) = K [{ k −ρk (ηk (zb ), ηk (y))}/hk ] Bhk b ˆ then used in replacing ξ( |y, θ) with mink ξk ( |y, θ) ABCµ involves acceptance probability ˆ π(θ , ) q(θ , θ)q( , ) mink ξk ( |y, θ ) ˆ π(θ, ) q(θ, θ )q( , ) mink ξk ( |y, θ)
  • 148.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABCµ multiple errors [ c Ratmann et al., PNAS, 2009]
  • 149.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABCµ for model choice [ c Ratmann et al., PNAS, 2009]
  • 150.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup Questions about ABCµ For each model under comparison, marginal posterior on used to assess the fit of the model (HPD includes 0 or not).
  • 151.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup Questions about ABCµ For each model under comparison, marginal posterior on used to assess the fit of the model (HPD includes 0 or not). Is the data informative about ? [Identifiability] How is the prior π( ) impacting the comparison? How is using both ξ( |x0 , θ) and π ( ) compatible with a standard probability model? [remindful of Wilkinson ] Where is the penalisation for complexity in the model comparison? [X, Mengersen Chen, 2010, PNAS]
  • 152.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-PRC Another sequential version producing a sequence of Markov (t) (t) transition kernels Kt and of samples (θ1 , . . . , θN ) (1 ≤ t ≤ T )
  • 153.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-PRC Another sequential version producing a sequence of Markov (t) (t) transition kernels Kt and of samples (θ1 , . . . , θN ) (1 ≤ t ≤ T ) ABC-PRC Algorithm (t−1) 1. Select θ at random from previous θi ’s with probabilities (t−1) ωi (1 ≤ i ≤ N). 2. Generate (t) (t) θi ∼ Kt (θ|θ ) , x ∼ f (x|θi ) , 3. Check that (x, y ) , otherwise start again. [Sisson et al., 2007]
  • 154.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup Why PRC? Partial rejection control: Resample from a population of weighted particles by pruning away particles with weights below threshold C , replacing them by new particles obtained by propagating an existing particle by an SMC step and modifying the weights accordinly. [Liu, 2001]
  • 155.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup Why PRC? Partial rejection control: Resample from a population of weighted particles by pruning away particles with weights below threshold C , replacing them by new particles obtained by propagating an existing particle by an SMC step and modifying the weights accordinly. [Liu, 2001] PRC justification in ABC-PRC: Suppose we then implement the PRC algorithm for some c 0 such that only identically zero weights are smaller than c Trouble is, there is no such c...
  • 156.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-PRC weight (t) Probability ωi computed as (t) (t) (t) (t) ωi ∝ π(θi )Lt−1 (θ |θi ){π(θ )Kt (θi |θ )}−1 , where Lt−1 is an arbitrary transition kernel.
  • 157.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-PRC weight (t) Probability ωi computed as (t) (t) (t) (t) ωi ∝ π(θi )Lt−1 (θ |θi ){π(θ )Kt (θi |θ )}−1 , where Lt−1 is an arbitrary transition kernel. In case Lt−1 (θ |θ) = Kt (θ|θ ) , all weights are equal under a uniform prior.
  • 158.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-PRC weight (t) Probability ωi computed as (t) (t) (t) (t) ωi ∝ π(θi )Lt−1 (θ |θi ){π(θ )Kt (θi |θ )}−1 , where Lt−1 is an arbitrary transition kernel. In case Lt−1 (θ |θ) = Kt (θ|θ ) , all weights are equal under a uniform prior. Inspired from Del Moral et al. (2006), who use backward kernels Lt−1 in SMC to achieve unbiasedness
  • 159.
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-PRC bias Lack of unbiasedness of the method
  • 160.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-PRC bias Lack of unbiasedness of the method Joint density of the accepted pair (θ^{(t−1)}, θ^{(t)}) proportional to π(θ^{(t−1)}|y) K_t(θ^{(t)}|θ^{(t−1)}) f(y|θ^{(t)}). For an arbitrary function h(θ), E[ω_t h(θ^{(t)})] is proportional to ∫ h(θ^{(t)}) [π(θ^{(t)}) L_{t−1}(θ^{(t−1)}|θ^{(t)}) / π(θ^{(t−1)}) K_t(θ^{(t)}|θ^{(t−1)})] π(θ^{(t−1)}|y) K_t(θ^{(t)}|θ^{(t−1)}) f(y|θ^{(t)}) dθ^{(t−1)} dθ^{(t)} ∝ ∫ h(θ^{(t)}) [π(θ^{(t)}) L_{t−1}(θ^{(t−1)}|θ^{(t)}) / π(θ^{(t−1)}) K_t(θ^{(t)}|θ^{(t−1)})] π(θ^{(t−1)}) f(y|θ^{(t−1)}) × K_t(θ^{(t)}|θ^{(t−1)}) f(y|θ^{(t)}) dθ^{(t−1)} dθ^{(t)} ∝ ∫ h(θ^{(t)}) π(θ^{(t)}|y) [ ∫ L_{t−1}(θ^{(t−1)}|θ^{(t)}) f(y|θ^{(t−1)}) dθ^{(t−1)} ] dθ^{(t)}.
  • 161.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup A mixture example (1) Toy model of Sisson et al. (2007): if θ ∼ U(−10, 10), x|θ ∼ 0.5 N(θ, 1) + 0.5 N(θ, 1/100), then the posterior distribution associated with y = 0 is the normal mixture θ|y = 0 ∼ 0.5 N(0, 1) + 0.5 N(0, 1/100) restricted to [−10, 10]. Furthermore, the true target is available as π(θ | |x| < ε) ∝ Φ(ε − θ) − Φ(−ε − θ) + Φ(10(ε − θ)) − Φ(−10(ε + θ)).
  • 162.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup “Ugly, squalid graph...” [Figure: ABC-PRC output densities over θ ∈ (−3, 3) across iterations] Comparison of τ = 0.15 and τ = 1/0.15 in K_t
  • 163.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup A PMC version Use of the same kernel idea as ABC-PRC but with IS correction (check PMC). Generate a sample at iteration t by π̂_t(θ^{(t)}) ∝ Σ_{j=1}^N ω_j^{(t−1)} K_t(θ^{(t)}|θ_j^{(t−1)}), modulo acceptance of the associated x_t, and use an importance weight associated with an accepted simulation θ_i^{(t)}, ω_i^{(t)} ∝ π(θ_i^{(t)}) / π̂_t(θ_i^{(t)}). © Still likelihood free [Beaumont et al., 2009]
  • 164.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Our ABC-PMC algorithm Given a decreasing sequence of approximation levels ε_1 ≥ ... ≥ ε_T, 1. At iteration t = 1, for i = 1, ..., N: simulate θ_i^{(1)} ∼ π(θ) and x ∼ f(x|θ_i^{(1)}) until ρ(x, y) < ε_1; set ω_i^{(1)} = 1/N. Take τ_2^2 as twice the empirical variance of the θ_i^{(1)}'s. 2. At iteration 2 ≤ t ≤ T, for i = 1, ..., N: repeat (pick θ_i* from the θ_j^{(t−1)}'s with probabilities ω_j^{(t−1)}, generate θ_i^{(t)}|θ_i* ∼ N(θ_i*, τ_t^2) and x ∼ f(x|θ_i^{(t)})) until ρ(x, y) < ε_t. Set ω_i^{(t)} ∝ π(θ_i^{(t)}) / Σ_{j=1}^N ω_j^{(t−1)} φ(τ_t^{−1}(θ_i^{(t)} − θ_j^{(t−1)})). Take τ_{t+1}^2 as twice the weighted empirical variance of the θ_i^{(t)}'s.
  • 165.
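A minimal Python sketch of this ABC-PMC scheme, run on the toy normal-mixture example of Sisson et al. described above. The tolerance schedule, number of particles and prior bounds are illustrative choices, not values taken from the slides.

import numpy as np

rng = np.random.default_rng(0)

def simulate(theta):
    # toy model: x | theta ~ 0.5 N(theta, 1) + 0.5 N(theta, 1/100)
    sd = 1.0 if rng.random() < 0.5 else 0.1
    return rng.normal(theta, sd)

y_obs = 0.0
N = 1000
eps = [2.0, 0.5, 0.1, 0.025]       # decreasing tolerance schedule (illustrative)

# t = 1: plain ABC rejection from the prior U(-10, 10)
theta = np.empty(N)
for i in range(N):
    while True:
        th = rng.uniform(-10, 10)
        if abs(simulate(th) - y_obs) < eps[0]:
            theta[i] = th
            break
w = np.full(N, 1.0 / N)
tau2 = 2 * np.var(theta)           # kernel variance: twice the empirical variance

# t >= 2: move particles with a Gaussian kernel and correct by importance weights
for e in eps[1:]:
    new_theta = np.empty(N)
    for i in range(N):
        while True:
            th_star = rng.choice(theta, p=w)
            th = rng.normal(th_star, np.sqrt(tau2))
            if not -10 < th < 10:          # outside the prior support
                continue
            if abs(simulate(th) - y_obs) < e:
                new_theta[i] = th
                break
    # flat prior, so weight = 1 / (mixture-of-kernels density); constants cancel
    kern = np.exp(-(new_theta[:, None] - theta[None, :]) ** 2 / (2 * tau2))
    new_w = 1.0 / (kern * w[None, :]).sum(axis=1)
    new_w /= new_w.sum()
    theta, w = new_theta, new_w
    tau2 = 2 * float(np.cov(theta, aweights=w))

print("ABC-PMC weighted posterior mean:", float(np.sum(w * theta)))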
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Sequential Monte Carlo SMC is a simulation technique to approximate a sequence of related probability distributions π_n with π_0 “easy” and π_T as target. Iterated IS as PMC: particles moved from time n−1 to time n via kernel K_n, and use of a sequence of extended targets π̃_n, π̃_n(z_{0:n}) = π_n(z_n) ∏_{j=0}^{n−1} L_j(z_{j+1}, z_j), where the L_j's are backward Markov kernels [check that π_n(z_n) is a marginal] [Del Moral, Doucet & Jasra, Series B, 2006]
  • 166.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Sequential Monte Carlo (2) Algorithm 3 SMC sampler: sample z_i^{(0)} ∼ γ_0(x) (i = 1, ..., N); compute weights w_i^{(0)} = π_0(z_i^{(0)})/γ_0(z_i^{(0)}); for t = 1 to N do: if ESS(w^{(t−1)}) < N_T then resample N particles z^{(t−1)} and set weights to 1 end if; generate z_i^{(t)} ∼ K_t(z_i^{(t−1)}, ·) and set weights to w_i^{(t)} = w_i^{(t−1)} π_t(z_i^{(t)}) L_{t−1}(z_i^{(t)}, z_i^{(t−1)}) / [π_{t−1}(z_i^{(t−1)}) K_t(z_i^{(t−1)}, z_i^{(t)})]; end for
  • 167.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-SMC [Del Moral, Doucet & Jasra, 2009] True derivation of an SMC-ABC algorithm: use of a kernel K_n associated with target π_{ε_n} and derivation of the backward kernel L_{n−1}(z, z') = π_{ε_n}(z') K_n(z', z) / π_{ε_n}(z). Update of the weights w_{in} ∝ w_{i(n−1)} Σ_{m=1}^M I_{A_{ε_n}}(x_{in}^m) / Σ_{m=1}^M I_{A_{ε_{n−1}}}(x_{i(n−1)}^m) when x_{in}^m ∼ K(x_{i(n−1)}, ·)
  • 168.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-SMCM Modification: makes M repeated simulations of the pseudo-data z given the parameter, rather than using a single [M = 1] simulation, leading to a weight proportional to the number of accepted z_i's, ω(θ) = (1/M) Σ_{i=1}^M I_{ρ(η(y),η(z_i)) < ε} [limit in M means exact simulation from the (tempered) target]
  • 169.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Properties of ABC-SMC The ABC-SMC method properly uses a backward kernel L(z, z') to simplify the importance weight and to remove the dependence on the unknown likelihood from this weight. Update of importance weights is reduced to the ratio of the proportions of surviving particles. Major assumption: the forward kernel K is supposed to be invariant against the true target [tempered version of the true posterior]
  • 170.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Properties of ABC-SMC The ABC-SMC method properly uses a backward kernel L(z, z') to simplify the importance weight and to remove the dependence on the unknown likelihood from this weight. Update of importance weights is reduced to the ratio of the proportions of surviving particles. Major assumption: the forward kernel K is supposed to be invariant against the true target [tempered version of the true posterior] Adaptivity in the ABC-SMC algorithm is only found in the on-line construction of the thresholds ε_t, slowly enough to keep a large number of accepted transitions
  • 171.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup A mixture example (2) Recovery of the target, whether using a fixed standard deviation of τ = 0.15 or τ = 1/0.15, or a sequence of adaptive τ_t's. [Figure: estimated densities over θ ∈ (−3, 3)]
  • 172.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Yet another ABC-PRC Another version of ABC-PRC called PRC-ABC with a proposal distribution M_t, S replications of the pseudo-data, and a PRC step: PRC-ABC Algorithm 1. Select θ* at random from the θ_i^{(t−1)}'s with probabilities ω_i^{(t−1)}. 2. Generate θ_i^{(t)} ∼ M_t, x_1, ..., x_S iid ∼ f(x|θ_i^{(t)}), set ω_i^{(t)} = Σ_s K_t{η(x_s) − η(y)} / M_t(θ_i^{(t)}), and accept with probability p^{(i)} = ω_i^{(t)}/c_{t,N}. 3. Set ω_i^{(t)} ∝ ω_i^{(t)}/p^{(i)} (1 ≤ i ≤ N) [Peters, Sisson & Fan, Stat Computing, 2012]
  • 173.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Yet another ABC-PRC Another version of ABC-PRC called PRC-ABC with a proposal distribution M_t, S replications of the pseudo-data, and a PRC step: Use of a NP estimate M_t(θ) = Σ_i ω_i^{(t−1)} ψ(θ|θ_i^{(t−1)}) (with a fixed variance?!) S = 1 recommended for “greatest gains in sampler performance” c_{t,N} upper quantile of non-zero weights at time t
  • 174.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Wilkinson’s exact BC ABC approximation error (i.e. non-zero tolerance) replaced with exact simulation from a controlled approximation to the target, convolution of the true posterior with a kernel function, π_ε(θ, z|y) = π(θ) f(z|θ) K_ε(y − z) / ∫ π(θ) f(z|θ) K_ε(y − z) dz dθ, with K_ε a kernel parameterised by bandwidth ε. [Wilkinson, 2008]
  • 175.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Wilkinson’s exact BC ABC approximation error (i.e. non-zero tolerance) replaced with exact simulation from a controlled approximation to the target, convolution of the true posterior with a kernel function, π_ε(θ, z|y) = π(θ) f(z|θ) K_ε(y − z) / ∫ π(θ) f(z|θ) K_ε(y − z) dz dθ, with K_ε a kernel parameterised by bandwidth ε. [Wilkinson, 2008] Theorem The ABC algorithm based on the assumption of a randomised observation ỹ = y + ξ, ξ ∼ K_ε, and an acceptance probability of K_ε(y − z)/M gives draws from the posterior distribution π(θ|y).
  • 176.
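A minimal Python sketch of the kernel-based acceptance step underlying this reinterpretation: accept (θ, z) with probability K_ε(y − z)/M, so that the output targets the convolution π_ε(θ|y) above. The Gaussian kernel, toy normal model and numerical values are illustrative assumptions, not taken from the slides.

import numpy as np

rng = np.random.default_rng(1)

def K_eps(u, eps):
    # Gaussian kernel, bounded by 1, with bandwidth eps (illustrative choice)
    return np.exp(-0.5 * (u / eps) ** 2)

def kernel_abc(y_obs, n_draws, eps, prior_draw, simulate):
    # accept (theta, z) with probability K_eps(y_obs - z) / M, here M = 1
    kept = []
    while len(kept) < n_draws:
        theta = prior_draw()
        z = simulate(theta)
        if rng.random() < K_eps(y_obs - z, eps):
            kept.append(theta)
    return np.array(kept)

# toy setting: theta ~ N(0, 10), y | theta ~ N(theta, 1), observed y = 1.5
post = kernel_abc(
    y_obs=1.5, n_draws=2000, eps=0.3,
    prior_draw=lambda: rng.normal(0, np.sqrt(10)),
    simulate=lambda th: rng.normal(th, 1),
)
print("mean and sd of the kernel-ABC sample:", post.mean(), post.std())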
Likelihood-free Methods Approximate Bayesian computation Alphabet soup How exact a BC? “Using ε to represent measurement error is straightforward, whereas using ε to model the model discrepancy is harder to conceptualize and not as commonly used” [Richard Wilkinson, 2008]
  • 177.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup How exact a BC? Pros: pseudo-data from the true model and observed data from the noisy model; interesting perspective in that the outcome is completely controlled; link with ABCµ and with the assumption that y is observed with a measurement error with density K_ε; relates to the theory of model approximation [Kennedy & O’Hagan, 2001]. Cons: requires K_ε to be bounded by M; true approximation error never assessed; requires a modification of the standard ABC algorithm
  • 178.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC for HMMs Specific case of a hidden Markov model, X_{t+1} ∼ Q_θ(X_t, ·), Y_{t+1} ∼ g_θ(·|x_{t+1}), where only y_{1:n}^0 is observed. [Dean, Singh, Jasra & Peters, 2011]
  • 179.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC for HMMs Specific case of a hidden Markov model, X_{t+1} ∼ Q_θ(X_t, ·), Y_{t+1} ∼ g_θ(·|x_{t+1}), where only y_{1:n}^0 is observed. [Dean, Singh, Jasra & Peters, 2011] Use of specific constraints, adapted to the Markov structure: (y_1, ..., y_n) ∈ B(y_1^0, ε) × · · · × B(y_n^0, ε)
  • 180.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-MLE for HMMs ABC-MLE defined by θ̂_n = arg max_θ P_θ( Y_1 ∈ B(y_1^0, ε), ..., Y_n ∈ B(y_n^0, ε) )
  • 181.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-MLE for HMMs ABC-MLE defined by θ̂_n = arg max_θ P_θ( Y_1 ∈ B(y_1^0, ε), ..., Y_n ∈ B(y_n^0, ε) ) Exact MLE for the likelihood (same basis as Wilkinson!) p_θ^ε(y_1^0, ..., y_n^0), corresponding to the perturbed process (x_t, y_t + ε z_t)_{1 ≤ t ≤ n}, z_t ∼ U(B(0, 1)) [Dean, Singh, Jasra & Peters, 2011]
  • 182.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-MLE is biased ABC-MLE is asymptotically (in n) biased, with target l_ε(θ) = E_{θ*}[log p_θ^ε(Y_1|Y_{−∞:0})]
  • 183.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup ABC-MLE is biased ABC-MLE is asymptotically (in n) biased, with target l_ε(θ) = E_{θ*}[log p_θ^ε(Y_1|Y_{−∞:0})] but ABC-MLE converges to the true value in the sense l_{ε_n}(θ_n) → l_ε(θ) for all sequences (θ_n) converging to θ and (ε_n) converging to ε
  • 184.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Noisy ABC-MLE Idea: modify instead the data from the start, (y_1^0 + εζ_1, ..., y_n^0 + εζ_n) [see Fearnhead-Prangle]; noisy ABC-MLE estimate arg max_θ P_θ( Y_1 ∈ B(y_1^0 + εζ_1, ε), ..., Y_n ∈ B(y_n^0 + εζ_n, ε) ) [Dean, Singh, Jasra & Peters, 2011]
  • 185.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Consistent noisy ABC-MLE Degrading the data improves the estimation performances: noisy ABC-MLE is asymptotically (in n) consistent; under further assumptions, the noisy ABC-MLE is asymptotically normal; increase in variance of order ε^{−2}; likely degradation in precision or computing time due to the lack of summary statistic
  • 186.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup SMC for ABC likelihood Algorithm 4 SMC ABC for HMMs: Given θ, for k = 1, ..., n do: generate proposals (x_k^1, y_k^1), ..., (x_k^N, y_k^N) from the model; weigh each proposal with ω_k^l = I_{B(y_k^0 + ζ_k, ε)}(y_k^l); renormalise the weights and sample the x_k^l's accordingly; end for; approximate the likelihood by ∏_{k=1}^n ( Σ_{l=1}^N ω_k^l / N ) [Jasra, Singh, Martin & McCoy, 2010]
  • 187.
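A minimal Python sketch of this SMC approximation of the (noisy) ABC likelihood, following the weighting and resampling structure of Algorithm 4. The AR(1)-plus-noise hidden Markov model, the tolerance, and the perturbations ζ_k are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)

def smc_abc_loglik(theta, y0, zeta, eps, N=500):
    # Hidden chain: X_{k+1} = theta * X_k + noise; observation Y_k = X_k + noise.
    # Only the ABC weighting / resampling scheme follows the slides.
    n = len(y0)
    x = rng.normal(0, 1, N)                     # initial hidden states
    loglik = 0.0
    for k in range(n):
        x = theta * x + rng.normal(0, 1, N)     # propagate hidden states
        y = x + rng.normal(0, 1, N)             # propose pseudo-observations
        w = (np.abs(y - (y0[k] + zeta[k])) < eps).astype(float)
        if w.sum() == 0:
            return -np.inf                      # all proposals rejected
        loglik += np.log(w.mean())              # factor sum_l omega_k^l / N
        # renormalise the weights and resample the hidden states accordingly
        idx = rng.choice(N, N, p=w / w.sum())
        x = x[idx]
    return loglik

y0 = rng.normal(0, 1.5, 50)                     # pretend observed series
zeta = np.zeros(50)                             # zeta = 0 recovers plain ABC-MLE
print(smc_abc_loglik(0.5, y0, zeta, eps=1.0))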
    Likelihood-free Methods Approximate Bayesian computation Alphabet soup Which summary? Fundamental difficulty of the choice of the summary statistic when there is no non-trivial sufficient statistics [except when done by the experimenters in the field]
  • 188.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Which summary? Fundamental difficulty of the choice of the summary statistic when there is no non-trivial sufficient statistic [except when done by the experimenters in the field] Starting from a large collection of summary statistics, Joyce and Marjoram (2008) consider their sequential inclusion into the ABC target, with a stopping rule based on a likelihood ratio test.
  • 189.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup Which summary? Fundamental difficulty of the choice of the summary statistic when there is no non-trivial sufficient statistic [except when done by the experimenters in the field] Starting from a large collection of summary statistics, Joyce and Marjoram (2008) consider their sequential inclusion into the ABC target, with a stopping rule based on a likelihood ratio test. Does not take into account the sequential nature of the tests Depends on parameterisation Order of inclusion matters.
  • 190.
Likelihood-free Methods Approximate Bayesian computation Alphabet soup A connected Monte Carlo study On the number of pseudo-data per simulated parameter: repeating simulations does not improve the approximation; the tolerance level does not seem to be highly influential; the choice of distance / summary statistics / calibration factors is paramount to a successful approximation; ABC-SMC outperforms ABC-MCMC [McKinley, Cook & Deardon, 2009]
  • 191.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Semi-automatic ABC Fearnhead and Prangle (2010) study ABC and the selection of the summary statistic in close proximity to Wilkinson’s proposal. ABC is then considered from a purely inferential viewpoint and calibrated for estimation purposes. Use of a randomised (or ‘noisy’) version of the summary statistics, η̃(y) = η(y) + τε. Derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic.
  • 192.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Semi-automatic ABC Fearnhead and Prangle (2010) study ABC and the selection of the summary statistic in close proximity to Wilkinson’s proposal. ABC is then considered from a purely inferential viewpoint and calibrated for estimation purposes. Use of a randomised (or ‘noisy’) version of the summary statistics, η̃(y) = η(y) + τε. Derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic. [calibration constraint: ABC approximation with the same posterior mean as the true randomised posterior.]
  • 193.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Summary statistics Optimality of the posterior expectation E[θ|y] of the parameter of interest as summary statistics η(y)!
  • 194.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Summary statistics Optimality of the posterior expectation E[θ|y] of the parameter of interest as summary statistics η(y)! Use of the standard quadratic loss function (θ − θ0 )T A(θ − θ0 ) .
  • 195.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Details on Fearnhead and Prangle (FP) ABC Use of a summary statistic S(·), an importance proposal g(·), a kernel K(·) ≤ 1 and a bandwidth h > 0 such that (θ, y_sim) ∼ g(θ) f(y_sim|θ) is accepted with probability (hence the bound) K[{S(y_sim) − s_obs}/h], with the corresponding importance weight defined by π(θ)/g(θ) [Fearnhead & Prangle, 2012]
  • 196.
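A minimal Python sketch of this acceptance/weighting mechanism: draw (θ, y_sim) from g(θ) f(·|θ), accept with probability K[{S(y_sim) − s_obs}/h] (K bounded by 1), and keep the importance weight π(θ)/g(θ). The normal model, the uniform kernel, the proposal g, and all numerical values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)

def gauss_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def K(u):
    # uniform kernel on [-1, 1], bounded by 1 as required
    return float(abs(u) <= 1)

def fp_abc(s_obs, h, n_keep, prior_pdf, g_draw, g_pdf, simulate, S):
    # accept with probability K[{S(y_sim) - s_obs}/h]; keep weight pi(theta)/g(theta)
    thetas, weights = [], []
    while len(thetas) < n_keep:
        theta = g_draw()
        y_sim = simulate(theta)
        if rng.random() < K((S(y_sim) - s_obs) / h):
            thetas.append(theta)
            weights.append(prior_pdf(theta) / g_pdf(theta))
    w = np.array(weights)
    return np.array(thetas), w / w.sum()

# toy setting: theta ~ N(0, 4) prior, y | theta = 20 draws from N(theta, 1),
# summary S(y) = sample mean, importance proposal g = N(0, 9)
th, w = fp_abc(
    s_obs=0.8, h=0.1, n_keep=1000,
    prior_pdf=lambda t: gauss_pdf(t, 0.0, 2.0),
    g_draw=lambda: rng.normal(0.0, 3.0),
    g_pdf=lambda t: gauss_pdf(t, 0.0, 3.0),
    simulate=lambda t: rng.normal(t, 1.0, 20),
    S=lambda y: y.mean(),
)
print("ABC posterior mean of theta:", float(np.sum(w * th)))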
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Errors, errors, and errors Three levels of approximation: π(θ|y_obs) by π(θ|s_obs), loss of information [ignored]; π(θ|s_obs) by π_ABC(θ|s_obs) = ∫ π(s) K[{s − s_obs}/h] π(θ|s) ds / ∫ π(s) K[{s − s_obs}/h] ds, noisy observations; π_ABC(θ|s_obs) by importance Monte Carlo based on N simulations, represented by var(a(θ)|s_obs)/N_acc [expected number of acceptances] [M. Twain/B. Disraeli]
  • 197.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Average acceptance asymptotics For the average acceptance probability/approximate likelihood p(θ|s_obs) = ∫ f(y_sim|θ) K[{S(y_sim) − s_obs}/h] dy_sim, the overall acceptance probability is p(s_obs) = ∫ p(θ|s_obs) π(θ) dθ = π(s_obs) h^d + o(h^d) [FP, Lemma 1]
  • 198.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Optimal importance proposal Best choice of importance proposal in terms of effective sample size: g(θ|s_obs) ∝ π(θ) p(θ|s_obs)^{1/2} [Not particularly useful in practice]
  • 199.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Optimal importance proposal Best choice of importance proposal in terms of effective sample size: g(θ|s_obs) ∝ π(θ) p(θ|s_obs)^{1/2} [Not particularly useful in practice] note that p(θ|s_obs) is an approximate likelihood; reminiscent of parallel tempering; could be approximately achieved by attrition of half of the data
  • 200.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Calibration of h “This result gives insight into how S(·) and h affect the Monte Carlo error. To minimize Monte Carlo error, we need h^d to be not too small. Thus ideally we want S(·) to be a low dimensional summary of the data that is sufficiently informative about θ that π(θ|sobs ) is close, in some sense, to π(θ|yobs )” (FP, p.5) turns h into an absolute value while it should be context-dependent and user-calibrated; only addresses one term in the approximation error and acceptance probability (“curse of dimensionality”); h large prevents π_ABC(θ|s_obs) from being close to π(θ|s_obs); d small prevents π(θ|s_obs) from being close to π(θ|y_obs) (“curse of [dis]information”)
  • 201.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Calibrating ABC “If πABC is calibrated, then this means that probability statements that are derived from it are appropriate, and in particular that we can use πABC to quantify uncertainty in estimates” (FP, p.5)
  • 202.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Calibrating ABC “If πABC is calibrated, then this means that probability statements that are derived from it are appropriate, and in particular that we can use πABC to quantify uncertainty in estimates” (FP, p.5) Definition For 0 < q < 1 and a subset A, the event E_q(A) is made of the s_obs such that Pr_ABC(θ ∈ A|s_obs) = q. Then ABC is calibrated if Pr(θ ∈ A|E_q(A)) = q; unclear meaning of conditioning on E_q(A)
  • 203.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Calibrated ABC Theorem (FP) Noisy ABC, where s_obs = S(y_obs) + hε, ε ∼ K(·), is calibrated [Wilkinson, 2008] no condition on h!!
  • 204.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Calibrated ABC Consequence: when h = ∞, Theorem (FP) The prior distribution is always calibrated. Is this a relevant property then?
  • 205.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection More about calibrated ABC “Calibration is not universally accepted by Bayesians. It is even more questionable here as we care how statements we make relate to the real world, not to a mathematically defined posterior.” R. Wilkinson Same reluctance about the prior being calibrated; property depending on prior, likelihood, and summary; calibration is a frequentist property (almost a p-value!); more sensible to account for the simulator’s imperfections than to use noisy-ABC against a meaningless ε-based measure [Wilkinson, 2012]
  • 206.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Converging ABC Theorem (FP) For noisy ABC, the expected noisy-ABC log-likelihood, E{log[p(θ|s_obs)]} = ∫∫ log[p(θ|S(y_obs) + ε)] π(y_obs|θ_0) K(ε) dy_obs dε, has its maximum at θ = θ_0. True for any choice of summary statistic? even ancillary statistics?! [Imposes at least identifiability...] Relevant in asymptotia and not for the data
  • 207.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Converging ABC Corollary For noisy ABC, the ABC posterior converges onto a point mass on the true parameter value as m → ∞. For standard ABC, this is not always the case (unless h goes to zero). Strength of regularity conditions (c1) and (c2) in Bernardo & Smith, 1994? [out-of-reach constraints on likelihood and posterior] Again, there must be conditions imposed upon summary statistics...
  • 208.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Loss motivated statistic Under the quadratic loss function, Theorem (FP) (i) The minimal posterior error E[L(θ, θ̂)|y_obs] occurs when θ̂ = E(θ|y_obs) (!) (ii) When h → 0, E_ABC(θ|s_obs) converges to E(θ|y_obs) (iii) If S(y_obs) = E[θ|y_obs] then for θ̂ = E_ABC[θ|s_obs], E[L(θ, θ̂)|y_obs] = trace(AΣ) + h^2 ∫ x^T A x K(x) dx + o(h^2). measure-theoretic difficulties? dependence of s_obs on h makes me uncomfortable, inherent to noisy ABC. Relevant for the choice of K?
  • 209.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Optimal summary statistic “We take a different approach, and weaken the requirement for πABC to be a good approximation to π(θ|yobs ). We argue for πABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters.” (FP, p.5) From this result, FP derive their choice of summary statistic, S(y) = E(θ|y) [almost sufficient] suggest h = O(N −1/(2+d) ) and h = O(N −1/(4+d) ) as optimal bandwidths for noisy and standard ABC.
  • 210.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Optimal summary statistic “We take a different approach, and weaken the requirement for πABC to be a good approximation to π(θ|yobs ). We argue for πABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters.” (FP, p.5) From this result, FP derive their choice of summary statistic, S(y) = E(θ|y) [wow! EABC [θ|S(yobs )] = E[θ|yobs ]] suggest h = O(N −1/(2+d) ) and h = O(N −1/(4+d) ) as optimal bandwidths for noisy and standard ABC.
  • 211.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Caveat Since E(θ|yobs ) is most usually unavailable, FP suggest (i) use a pilot run of ABC to determine a region of non-negligible posterior mass; (ii) simulate sets of parameter values and data; (iii) use the simulated sets of parameter values and data to estimate the summary statistic; and (iv) run ABC with this choice of summary statistic.
  • 212.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Caveat Since E(θ|yobs ) is most usually unavailable, FP suggest (i) use a pilot run of ABC to determine a region of non-negligible posterior mass; (ii) simulate sets of parameter values and data; (iii) use the simulated sets of parameter values and data to estimate the summary statistic; and (iv) run ABC with this choice of summary statistic. where is the assessment of the first stage error?
  • 213.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Approximating the summary statistic As in Beaumont et al. (2002) and Blum and François (2010), FP use a linear regression to approximate E(θ|y_obs): θ^{(i)} = β_0^{(i)} + β^{(i)} f(y_obs) + ε_i, with f made of standard transforms [Further justifications?]
  • 214.
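A minimal Python sketch of the regression construction behind steps (i)-(iv): simulate (θ, y) pairs with θ drawn from a region of non-negligible posterior mass, regress θ on transforms f(y), and use the fitted linear predictor as the summary statistic S(y) ≈ E[θ|y]. The pilot region, the normal model, and the set of transforms are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)

# pilot region for theta (assumed obtained from a preliminary ABC run)
lo, hi = -2.0, 2.0

# (ii) simulate parameter values and data sets
M, n = 5000, 30
theta = rng.uniform(lo, hi, M)
data = rng.normal(theta[:, None], 1.0, (M, n))

def transforms(y):
    # f(y): standard transforms of the data (illustrative choice)
    return np.array([y.mean(), np.median(y), y.var(), (y ** 3).mean()])

# (iii) least-squares fit of theta = beta_0 + beta' f(y) + error
X = np.column_stack([np.ones(M), np.array([transforms(y) for y in data])])
beta, *_ = np.linalg.lstsq(X, theta, rcond=None)

def S(y):
    # (iv) fitted linear predictor, used as the semi-automatic summary statistic
    return beta[0] + beta[1:] @ transforms(y)

y_obs = rng.normal(0.7, 1.0, n)
print("estimated E[theta | y_obs]:", float(S(y_obs)))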
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection [my] questions about semi-automatic ABC dependence on h and S(·) in the early stage; reduction of Bayesian inference to point estimation; approximation error in step (i) not accounted for; not parameterisation invariant; practice shows that proper approximation to genuine posterior distributions stems from using a (much) larger number of summary statistics than the dimension of the parameter; the validity of the approximation to the optimal summary statistic depends on the quality of the pilot run; important inferential issues like model choice are not covered by this approach. [Robert, 2012]
  • 215.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection More about semi-automatic ABC [End of section derived from comments on Read Paper, to appear] “The apparently arbitrary nature of the choice of summary statistics has always been perceived as the Achilles heel of ABC.” M. Beaumont
  • 216.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection More about semi-automatic ABC [End of section derived from comments on Read Paper, to appear] “The apparently arbitrary nature of the choice of summary statistics has always been perceived as the Achilles heel of ABC.” M. Beaumont “Curse of dimensionality” linked with the increase of the dimension of the summary statistic Connection with principal component analysis [Itan et al., 2010] Connection with partial least squares [Wegman et al., 2009] Beaumont et al. (2002) postprocessed output is used as input by FP to run a second ABC
  • 217.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Wood’s alternative Instead of a non-parametric kernel approximation to the likelihood, (1/R) Σ_r K{η(y_r) − η(y_obs)}, Wood (2010) suggests a normal approximation η(y(θ)) ∼ N_d(µ_θ, Σ_θ) whose parameters can be approximated based on the R simulations (for each value of θ).
  • 218.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Wood’s alternative Instead of a non-parametric kernel approximation to the likelihood, (1/R) Σ_r K{η(y_r) − η(y_obs)}, Wood (2010) suggests a normal approximation η(y(θ)) ∼ N_d(µ_θ, Σ_θ) whose parameters can be approximated based on the R simulations (for each value of θ). Parametric versus non-parametric rate [Uh?!] Automatic weighting of components of η(·) through Σ_θ Dependence on normality assumption (pseudo-likelihood?) [Cornebise, Girolami & Kosmidis, 2012]
  • 219.
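A minimal Python sketch of this synthetic-likelihood idea: for each θ, simulate R replicate summary vectors, fit a multivariate normal, and evaluate the observed summary under it. The MA(1) model, the choice of summaries, and R are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(5)

def summaries(y):
    # eta(y): a few autocovariance-type summaries (illustrative)
    return np.array([y.var(), np.mean(y[1:] * y[:-1]), np.mean(y[2:] * y[:-2])])

def simulate(theta, n=200):
    # MA(1) model y_t = e_t + theta * e_{t-1} (illustrative)
    e = rng.normal(0, 1, n + 1)
    return e[1:] + theta * e[:-1]

def synthetic_loglik(theta, eta_obs, R=200):
    # normal approximation eta(y(theta)) ~ N_d(mu_theta, Sigma_theta)
    sims = np.array([summaries(simulate(theta)) for _ in range(R)])
    mu = sims.mean(axis=0)
    Sigma = np.cov(sims, rowvar=False)
    diff = eta_obs - mu
    sign, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (diff @ np.linalg.solve(Sigma, diff) + logdet)

y_obs = simulate(0.6)
eta_obs = summaries(y_obs)
grid = np.linspace(-0.9, 0.9, 19)
ll = [synthetic_loglik(t, eta_obs) for t in grid]
print("synthetic-likelihood MLE over the grid:", grid[int(np.argmax(ll))])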
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Reinterpretation and extensions Reinterpretation of the ABC output as joint simulation from π̄(x, ȳ|θ) = f(x|θ) π̄_{Y|X}(ȳ|x), where π̄_{Y|X}(ȳ|x) = K_ε(ȳ − x)
  • 220.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Reinterpretation and extensions Reinterpretation of the ABC output as joint simulation from π̄(x, ȳ|θ) = f(x|θ) π̄_{Y|X}(ȳ|x), where π̄_{Y|X}(ȳ|x) = K_ε(ȳ − x) Reinterpretation of noisy ABC: if ȳ|y_obs ∼ π̄_{Y|X}(·|y_obs), then marginally ȳ ∼ π̄_{Y|θ}(·|θ_0) © explains the consistency of Bayesian inference based on ȳ and π̄ [Lee, Andrieu & Doucet, 2012]
  • 221.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Reinterpretation and extensions Also fits within the pseudo-marginal approach of Andrieu and Roberts (2009) Algorithm 5 Rejuvenating GIMH-ABC At time t, given θ_t = θ: Sample θ' ∼ g(·|θ). Sample z_{1:N} ∼ π_{Y|Θ}^{⊗N}(·|θ'). Sample x_{1:N−1} ∼ π_{Y|Θ}^{⊗(N−1)}(·|θ). With probability min{ 1, [π(θ') g(θ|θ') / π(θ) g(θ'|θ)] × [Σ_{i=1}^N I_{B_h(z_i)}(y)] / [1 + Σ_{i=1}^{N−1} I_{B_h(x_i)}(y)] } set θ_{t+1} = θ'. Otherwise set θ_{t+1} = θ.
  • 222.
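A minimal Python sketch of the rejuvenating GIMH-ABC step above, using a symmetric random-walk proposal for g (so the g terms cancel in the ratio). The toy normal model, prior, N, h, and chain length are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(6)

def gimh_abc(y, n_iter, N, h, log_prior, simulate, theta0, prop_sd=0.5):
    # pseudo-marginal GIMH-ABC chain with symmetric random-walk proposal
    theta = theta0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        theta_p = theta + prop_sd * rng.normal()
        z = np.array([simulate(theta_p) for _ in range(N)])      # N hits for theta'
        x = np.array([simulate(theta) for _ in range(N - 1)])    # N-1 hits for theta
        num = np.sum(np.abs(z - y) < h)
        den = 1 + np.sum(np.abs(x - y) < h)
        log_ratio = (log_prior(theta_p) - log_prior(theta)
                     + np.log(num) - np.log(den)) if num > 0 else -np.inf
        if np.log(rng.random()) < log_ratio:
            theta = theta_p
        chain[t] = theta
    return chain

# toy example: theta ~ N(0, 10), y | theta ~ N(theta, 1)
chain = gimh_abc(
    y=1.2, n_iter=5000, N=50, h=0.2,
    log_prior=lambda t: -t ** 2 / 20,
    simulate=lambda t: rng.normal(t, 1),
    theta0=0.0,
)
print("posterior mean estimate:", chain[1000:].mean())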
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Reinterpretation and extensions Also fits within the pseudo-marginal approach of Andrieu and Roberts (2009) Algorithm 6 One-hit MCMC-ABC At time t, with θ_t = θ: Sample θ' ∼ g(·|θ). With probability 1 − min{1, π(θ') g(θ|θ') / π(θ) g(θ'|θ)}, set θ_{t+1} = θ and go to time t + 1. Sample z_i ∼ π_{Y|Θ}(·|θ') and x_i ∼ π_{Y|Θ}(·|θ) for i = 1, ... until y ∈ B_h(z_i) and/or y ∈ B_h(x_i). If y ∈ B_h(z_i), set θ_{t+1} = θ' and go to time t + 1. If y ∈ B_h(x_i), set θ_{t+1} = θ and go to time t + 1. [Lee, Andrieu & Doucet, 2012]
  • 223.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Reinterpretation and extensions Also fits within the pseudo-marginal approach of Andrieu and Roberts (2009) Validation © The invariant distribution of θ is π̄_{θ|Y}(·|y) for both algorithms. [Lee, Andrieu & Doucet, 2012]
  • 224.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection ABC for Markov chains Rewriting the posterior as π(θ)^{1−n} π(θ|x_1) ∏_{t=2}^n π(θ|x_{t−1}, x_t), where π(θ|x_{t−1}, x_t) ∝ f(x_t|x_{t−1}, θ) π(θ)
  • 225.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection ABC for Markov chains Rewriting the posterior as π(θ)^{1−n} π(θ|x_1) ∏_{t=2}^n π(θ|x_{t−1}, x_t), where π(θ|x_{t−1}, x_t) ∝ f(x_t|x_{t−1}, θ) π(θ) Allows for a stepwise ABC, replacing each π(θ|x_{t−1}, x_t) by an ABC approximation. Similarity with FP’s multiple sources of data (and also with Dean et al., 2011) [White et al., 2010, 2012]
  • 226.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Back to sufficiency Difference between regular sufficiency, equivalent to π(θ|y) = π(θ|η(y)) for all θ’s and all priors π, and
  • 227.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Back to sufficiency Difference between regular sufficiency, equivalent to π(θ|y) = π(θ|η(y)) for all θ’s and all priors π, and marginal sufficiency, stated as π(µ(θ)|y) = π(µ(θ)|η(y)) for all θ’s, the given prior π and a subvector µ(θ) [Basu, 1977]
  • 228.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Back to sufficiency Difference between regular sufficiency, equivalent to π(θ|y) = π(θ|η(y)) for all θ’s and all priors π, and marginal sufficiency, stated as π(µ(θ)|y) = π(µ(θ)|η(y)) for all θ’s, the given prior π and a subvector µ(θ) [Basu, 1977] Relates to FP’s main result, but could even be reduced to conditional sufficiency π(µ(θ)|y_obs) = π(µ(θ)|η(y_obs)) (if feasible at all...) [Dawson, 2012]
  • 229.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection ABC and BCa’s Arguments in favour of a comparison of ABC with bootstrap in the event π(θ) is not defined from prior information but as conventional non-informative prior. [Efron, 2012]
  • 230.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Predictive performances Instead of posterior means, other aspects of the posterior to explore. E.g., look at minimising the loss of information ∫ p(θ, y) log[ p(θ, y) / p(θ)p(y) ] dθ dy − ∫ p(θ, η(y)) log[ p(θ, η(y)) / p(θ)p(η(y)) ] dθ dη(y) for the selection of summary statistics. [Filippi, Barnes & Stumpf, 2012]
  • 231.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Auxiliary variables The auxiliary variable method avoids computation of the intractable constant in the likelihood f(y|θ) = f̃(y|θ)/Z_θ
  • 232.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Auxiliary variables The auxiliary variable method avoids computation of the intractable constant in the likelihood f(y|θ) = f̃(y|θ)/Z_θ Introduce pseudo-data z with artificial target g(z|θ, y) Generate θ' ∼ K(θ, θ') and z' ∼ f(z'|θ') Accept with probability [π(θ') f(y|θ') g(z'|θ', y) K(θ', θ) f(z|θ)] / [π(θ) f(y|θ) g(z|θ, y) K(θ, θ') f(z'|θ')] ∧ 1 [Møller, Pettitt, Berthelsen & Reeves, 2006]
  • 233.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Auxiliary variables The auxiliary variable method avoids computation of the intractable constant in the likelihood f(y|θ) = f̃(y|θ)/Z_θ Introduce pseudo-data z with artificial target g(z|θ, y) Generate θ' ∼ K(θ, θ') and z' ∼ f(z'|θ') Accept with probability [π(θ') f̃(y|θ') g(z'|θ', y) K(θ', θ) f̃(z|θ)] / [π(θ) f̃(y|θ) g(z|θ, y) K(θ, θ') f̃(z'|θ')] ∧ 1 [Møller, Pettitt, Berthelsen & Reeves, 2006]
  • 234.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Auxiliary variables The auxiliary variable method avoids computation of the intractable constant in the likelihood f(y|θ) = f̃(y|θ)/Z_θ Introduce pseudo-data z with artificial target g(z|θ, y) Generate θ' ∼ K(θ, θ') and z' ∼ f(z'|θ') For Gibbs random fields, existence of a genuine sufficient statistic η(y). [Møller, Pettitt, Berthelsen & Reeves, 2006]
  • 235.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Auxiliary variables and ABC Special case of ABC when g(z|θ, y) = K_ε(η(z) − η(y)) and f̃(y|θ') f̃(z|θ) / f̃(y|θ) f̃(z'|θ') is replaced by one [or not?!]
  • 236.
Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection Auxiliary variables and ABC Special case of ABC when g(z|θ, y) = K_ε(η(z) − η(y)) and f̃(y|θ') f̃(z|θ) / f̃(y|θ) f̃(z'|θ') is replaced by one [or not?!] Consequences: likelihood-free (ABC) versus constant-free (AVM); in ABC, K(·) should be allowed to depend on θ; for Gibbs random fields, the auxiliary approach should be preferred to ABC [Møller, 2012]
  • 237.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection ABC and BIC Idea of applying BIC during the local regression : Run regular ABC Select summary statistics during local regression Recycle the prior simulation sample (reference table) with those summary statistics Rerun the corresponding local regression (low cost) [Pudlo Sedki, 2012]
  • 238.
    Likelihood-free Methods Approximate Bayesian computation Automated summary statistic selection A Brave New World?!
  • 239.
    Likelihood-free Methods ABC for model choice ABC for model choice Introduction The Metropolis-Hastings Algorithm simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation ABC for model choice Model choice Gibbs random fields
  • 240.
Likelihood-free Methods ABC for model choice Model choice Bayesian model choice Several models M1 , M2 , . . . are considered simultaneously for a dataset y and the model index M is part of the inference. Use of a prior distribution π(M = m), plus a prior distribution on the parameter conditional on the value m of the model index, πm (θ m ). Goal is to derive the posterior distribution of M, a challenging computational target when models are complex.
  • 241.
Likelihood-free Methods ABC for model choice Model choice Generic ABC for model choice Algorithm 7 Likelihood-free model choice sampler (ABC-MC) for t = 1 to T do repeat Generate m from the prior π(M = m) Generate θ m from the prior πm (θ m ) Generate z from the model fm (z|θ m ) until ρ{η(z), η(y)} < ε Set m(t) = m and θ (t) = θ m end for
  • 242.
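A minimal Python sketch of Algorithm 7 for two generic models. The Poisson-versus-geometric pair, the priors, the summary (sample mean) and the tolerance are illustrative assumptions chosen to echo an example appearing later in these slides.

import numpy as np

rng = np.random.default_rng(7)

def abc_model_choice(y_obs, T, eps, prior_m, prior_theta, simulate, eta, rho):
    # likelihood-free model choice sampler (rejection version)
    out = []
    for _ in range(T):
        while True:
            m = rng.choice(len(prior_m), p=prior_m)   # model index from its prior
            theta = prior_theta[m]()                  # parameter from prior of model m
            z = simulate[m](theta)                    # pseudo-data from model m
            if rho(eta(z), eta(y_obs)) < eps:
                out.append((m, theta))
                break
    return out

# toy comparison: M0 = Poisson(lambda), M1 = geometric(p), summary = sample mean
n = 40
y_obs = rng.poisson(3.0, n)
draws = abc_model_choice(
    y_obs, T=2000, eps=0.2,
    prior_m=[0.5, 0.5],
    prior_theta=[lambda: rng.exponential(5.0), lambda: rng.uniform(0, 1)],
    simulate=[lambda lam: rng.poisson(lam, n), lambda p: rng.geometric(p, n) - 1],
    eta=lambda x: np.array([x.mean()]),
    rho=lambda a, b: np.abs(a - b).max(),
)
freq = np.mean([m == 0 for m, _ in draws])
print("ABC estimate of P(M = Poisson | y):", freq)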
Likelihood-free Methods ABC for model choice Model choice ABC estimates Posterior probability π(M = m|y) approximated by the frequency of acceptances from model m, (1/T) Σ_{t=1}^T I_{m^{(t)} = m}. Issues with implementation: should tolerances be the same for all models? should summary statistics vary across models (incl. their dimension)? should the distance measure ρ vary as well?
  • 243.
Likelihood-free Methods ABC for model choice Model choice ABC estimates Posterior probability π(M = m|y) approximated by the frequency of acceptances from model m, (1/T) Σ_{t=1}^T I_{m^{(t)} = m}. Extension to a weighted polychotomous logistic regression estimate of π(M = m|y), with non-parametric kernel weights [Cornuet et al., DIYABC, 2009]
  • 244.
Likelihood-free Methods ABC for model choice Model choice The Great ABC controversy On-going controversy in phylogeographic genetics about the validity of using ABC for testing Against: Templeton, 2008, 2009, 2010a, 2010b, 2010c argues that nested hypotheses cannot have higher probabilities than nesting hypotheses (!)
  • 245.
Likelihood-free Methods ABC for model choice Model choice The Great ABC controversy On-going controversy in phylogeographic genetics about the validity of using ABC for testing Against: Templeton, 2008, 2009, 2010a, 2010b, 2010c argues that nested hypotheses cannot have higher probabilities than nesting hypotheses (!) Replies: Fagundes et al., 2008, Beaumont et al., 2010, Berger et al., 2010, Csilléry et al., 2010 point out that the criticisms are addressed at [Bayesian] model-based inference and have nothing to do with ABC...
  • 246.
Likelihood-free Methods ABC for model choice Gibbs random fields Gibbs random fields Gibbs distribution The rv y = (y1 , . . . , yn ) is a Gibbs random field associated with the graph G if f(y) = (1/Z) exp{ − Σ_{c∈C} V_c(y_c) }, where Z is the normalising constant, C is the set of cliques of G and V_c is any function also called potential. U(y) = Σ_{c∈C} V_c(y_c) is the energy function
  • 247.
Likelihood-free Methods ABC for model choice Gibbs random fields Gibbs random fields Gibbs distribution The rv y = (y1 , . . . , yn ) is a Gibbs random field associated with the graph G if f(y) = (1/Z) exp{ − Σ_{c∈C} V_c(y_c) }, where Z is the normalising constant, C is the set of cliques of G and V_c is any function also called potential. U(y) = Σ_{c∈C} V_c(y_c) is the energy function © Z is usually unavailable in closed form
  • 248.
Likelihood-free Methods ABC for model choice Gibbs random fields Potts model Potts model V_c(y) is of the form V_c(y) = θ S(y) = θ Σ_{l∼i} δ_{y_l = y_i}, where l∼i denotes a neighbourhood structure
  • 249.
Likelihood-free Methods ABC for model choice Gibbs random fields Potts model Potts model V_c(y) is of the form V_c(y) = θ S(y) = θ Σ_{l∼i} δ_{y_l = y_i}, where l∼i denotes a neighbourhood structure In most realistic settings, the summation Z_θ = Σ_{x∈X} exp{θ^T S(x)} involves too many terms to be manageable and numerical approximations cannot always be trusted [Cucala, Marin, CPR & Titterington, 2009]
  • 250.
Likelihood-free Methods ABC for model choice Gibbs random fields Bayesian Model Choice Comparing a model with potential S0 taking values in R^{p0} versus a model with potential S1 taking values in R^{p1} can be done through the Bayes factor corresponding to the priors π0 and π1 on each parameter space, B_{m0/m1}(x) = ∫ exp{θ_0^T S_0(x)}/Z_{θ_0,0} π_0(dθ_0) / ∫ exp{θ_1^T S_1(x)}/Z_{θ_1,1} π_1(dθ_1)
  • 251.
Likelihood-free Methods ABC for model choice Gibbs random fields Bayesian Model Choice Comparing a model with potential S0 taking values in R^{p0} versus a model with potential S1 taking values in R^{p1} can be done through the Bayes factor corresponding to the priors π0 and π1 on each parameter space, B_{m0/m1}(x) = ∫ exp{θ_0^T S_0(x)}/Z_{θ_0,0} π_0(dθ_0) / ∫ exp{θ_1^T S_1(x)}/Z_{θ_1,1} π_1(dθ_1) Use of Jeffreys’ scale to select the most appropriate model
  • 252.
Likelihood-free Methods ABC for model choice Gibbs random fields Neighbourhood relations Choice to be made between M neighbourhood relations i ∼_m i' (0 ≤ m ≤ M − 1) with S_m(x) = Σ_{i ∼_m i'} I_{x_i = x_{i'}}, driven by the posterior probabilities of the models.
  • 253.
    Likelihood-free Methods ABC for model choice Gibbs random fields Model index Formalisation via a model index M that appears as a new parameter with prior distribution π(M = m) and π(θ|M = m) = πm (θm )
  • 254.
Likelihood-free Methods ABC for model choice Gibbs random fields Model index Formalisation via a model index M that appears as a new parameter with prior distribution π(M = m) and π(θ|M = m) = πm (θm ) Computational target: P(M = m|x) ∝ ∫_{Θ_m} f_m(x|θ_m) π_m(θ_m) dθ_m π(M = m),
  • 255.
    Likelihood-free Methods ABC for model choice Gibbs random fields Sufficient statistics By definition, if S(x) sufficient statistic for the joint parameters (M, θ0 , . . . , θM−1 ), P(M = m|x) = P(M = m|S(x)) .
  • 256.
    Likelihood-free Methods ABC for model choice Gibbs random fields Sufficient statistics By definition, if S(x) sufficient statistic for the joint parameters (M, θ0 , . . . , θM−1 ), P(M = m|x) = P(M = m|S(x)) . For each model m, own sufficient statistic Sm (·) and S(·) = (S0 (·), . . . , SM−1 (·)) also sufficient.
  • 257.
Likelihood-free Methods ABC for model choice Gibbs random fields Sufficient statistics in Gibbs random fields For Gibbs random fields, x|M = m ∼ f_m(x|θ_m) = f_m^1(x|S(x)) f_m^2(S(x)|θ_m) = [1/n(S(x))] f_m^2(S(x)|θ_m), where n(S(x)) = #{x̃ ∈ X : S(x̃) = S(x)} © S(x) is therefore also sufficient for the joint parameters [Specific to Gibbs random fields!]
  • 258.
Likelihood-free Methods ABC for model choice Gibbs random fields ABC model choice Algorithm ABC-MC Generate m∗ from the prior π(M = m). Generate θ*_{m∗} from the prior π_{m∗}(·). Generate x∗ from the model f_{m∗}(·|θ*_{m∗}). Compute the distance ρ(S(x^0), S(x∗)). Accept (θ*_{m∗}, m∗) if ρ(S(x^0), S(x∗)) < ε. Note: when ε = 0 the algorithm is exact
  • 259.
Likelihood-free Methods ABC for model choice Gibbs random fields ABC approximation to the Bayes factor Frequency ratio: BF_{m0/m1}(x^0) = [P̂(M = m0|x^0) / P̂(M = m1|x^0)] × [π(M = m1) / π(M = m0)] = [#{m_i∗ = m0} / #{m_i∗ = m1}] × [π(M = m1) / π(M = m0)],
  • 260.
Likelihood-free Methods ABC for model choice Gibbs random fields ABC approximation to the Bayes factor Frequency ratio: BF_{m0/m1}(x^0) = [P̂(M = m0|x^0) / P̂(M = m1|x^0)] × [π(M = m1) / π(M = m0)] = [#{m_i∗ = m0} / #{m_i∗ = m1}] × [π(M = m1) / π(M = m0)], replaced with BF_{m0/m1}(x^0) = [1 + #{m_i∗ = m0}] / [1 + #{m_i∗ = m1}] × [π(M = m1) / π(M = m0)] to avoid indeterminacy (also a Bayes estimate).
  • 261.
Likelihood-free Methods ABC for model choice Gibbs random fields Toy example iid Bernoulli model versus two-state first-order Markov chain, i.e. f_0(x|θ_0) = exp{ θ_0 Σ_{i=1}^n I_{x_i=1} } / {1 + exp(θ_0)}^n, versus f_1(x|θ_1) = (1/2) exp{ θ_1 Σ_{i=2}^n I_{x_i = x_{i−1}} } / {1 + exp(θ_1)}^{n−1}, with priors θ_0 ∼ U(−5, 5) and θ_1 ∼ U(0, 6) (inspired by “phase transition” boundaries).
  • 262.
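A minimal Python sketch of ABC model choice on this Bernoulli-versus-Markov toy example, using the joint statistic (S0, S1) and the regularised (1 + counts) frequency-ratio estimate of the Bayes factor described above. The observed data, the number of prior proposals, and the tolerance are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(8)

n = 100

def sim_bernoulli(th0):
    p = np.exp(th0) / (1 + np.exp(th0))        # P(x_i = 1)
    return (rng.random(n) < p).astype(int)

def sim_markov(th1):
    q = np.exp(th1) / (1 + np.exp(th1))        # P(x_i = x_{i-1})
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)
    for i in range(1, n):
        x[i] = x[i - 1] if rng.random() < q else 1 - x[i - 1]
    return x

def S(x):
    # joint sufficient statistic (S0, S1) for the two models
    return np.array([x.sum(), np.sum(x[1:] == x[:-1])])

x0 = sim_markov(1.0)                           # pretend observed data
s0 = S(x0)

T, eps = 20_000, 2
counts = np.zeros(2)
for _ in range(T):
    m = rng.integers(2)
    x = sim_bernoulli(rng.uniform(-5, 5)) if m == 0 else sim_markov(rng.uniform(0, 6))
    if np.abs(S(x) - s0).max() <= eps:
        counts[m] += 1

# regularised frequency-ratio estimate of BF_{0/1} (equal prior model weights)
print("hat BF_{0/1} =", (1 + counts[0]) / (1 + counts[1]))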
Likelihood-free Methods ABC for model choice Gibbs random fields Toy example (2) [Figure] (left) Comparison of the true BF_{m0/m1}(x^0) with the ABC approximation (in logs) over 2,000 simulations and 4·10^6 proposals from the prior. (right) Same when using a tolerance corresponding to the 1% quantile on the distances.
  • 263.
    Likelihood-free Methods ABC for model choice Generic ABC model choice Back to sufficiency ‘Sufficient statistics for individual models are unlikely to be very informative for the model probability.’ [Scott Sisson, Jan. 31, 2011, X.’Og]
  • 264.
    Likelihood-free Methods ABC for model choice Generic ABC model choice Back to sufficiency ‘Sufficient statistics for individual models are unlikely to be very informative for the model probability.’ [Scott Sisson, Jan. 31, 2011, X.’Og] If η1 (x) sufficient statistic for model m = 1 and parameter θ1 and η2 (x) sufficient statistic for model m = 2 and parameter θ2 , (η1 (x), η2 (x)) is not always sufficient for (m, θm )
  • 265.
    Likelihood-free Methods ABC for model choice Generic ABC model choice Back to sufficiency ‘Sufficient statistics for individual models are unlikely to be very informative for the model probability.’ [Scott Sisson, Jan. 31, 2011, X.’Og] If η1 (x) sufficient statistic for model m = 1 and parameter θ1 and η2 (x) sufficient statistic for model m = 2 and parameter θ2 , (η1 (x), η2 (x)) is not always sufficient for (m, θm ) c Potential loss of information at the testing level
  • 266.
Likelihood-free Methods ABC for model choice Generic ABC model choice Limiting behaviour of B12 (T → ∞) ABC approximation B̂_12(y) = Σ_{t=1}^T I_{m_t=1} I_{ρ{η(z_t),η(y)} ≤ ε} / Σ_{t=1}^T I_{m_t=2} I_{ρ{η(z_t),η(y)} ≤ ε}, where the (m_t, z_t)'s are simulated from the (joint) prior
  • 267.
Likelihood-free Methods ABC for model choice Generic ABC model choice Limiting behaviour of B12 (T → ∞) ABC approximation B̂_12(y) = Σ_{t=1}^T I_{m_t=1} I_{ρ{η(z_t),η(y)} ≤ ε} / Σ_{t=1}^T I_{m_t=2} I_{ρ{η(z_t),η(y)} ≤ ε}, where the (m_t, z_t)'s are simulated from the (joint) prior As T goes to infinity, limit B_12^ε(y) = ∫ I_{ρ{η(z),η(y)} ≤ ε} π_1(θ_1) f_1(z|θ_1) dz dθ_1 / ∫ I_{ρ{η(z),η(y)} ≤ ε} π_2(θ_2) f_2(z|θ_2) dz dθ_2 = ∫ I_{ρ{η,η(y)} ≤ ε} π_1(θ_1) f_1^η(η|θ_1) dη dθ_1 / ∫ I_{ρ{η,η(y)} ≤ ε} π_2(θ_2) f_2^η(η|θ_2) dη dθ_2, where f_1^η(η|θ_1) and f_2^η(η|θ_2) are the distributions of η(z)
  • 268.
Likelihood-free Methods ABC for model choice Generic ABC model choice Limiting behaviour of B12 (ε → 0) When ε goes to zero, B_12^η(y) = ∫ π_1(θ_1) f_1^η(η(y)|θ_1) dθ_1 / ∫ π_2(θ_2) f_2^η(η(y)|θ_2) dθ_2,
  • 269.
Likelihood-free Methods ABC for model choice Generic ABC model choice Limiting behaviour of B12 (ε → 0) When ε goes to zero, B_12^η(y) = ∫ π_1(θ_1) f_1^η(η(y)|θ_1) dθ_1 / ∫ π_2(θ_2) f_2^η(η(y)|θ_2) dθ_2, © Bayes factor based on the sole observation of η(y)
  • 270.
Likelihood-free Methods ABC for model choice Generic ABC model choice Limiting behaviour of B12 (under sufficiency) If η(y) is a sufficient statistic for both models, f_i(y|θ_i) = g_i(y) f_i^η(η(y)|θ_i). Thus B_12(y) = ∫_{Θ_1} π(θ_1) g_1(y) f_1^η(η(y)|θ_1) dθ_1 / ∫_{Θ_2} π(θ_2) g_2(y) f_2^η(η(y)|θ_2) dθ_2 = [g_1(y)/g_2(y)] × ∫ π_1(θ_1) f_1^η(η(y)|θ_1) dθ_1 / ∫ π_2(θ_2) f_2^η(η(y)|θ_2) dθ_2 = [g_1(y)/g_2(y)] B_12^η(y). [Didelot, Everitt, Johansen & Lawson, 2011]
  • 271.
Likelihood-free Methods ABC for model choice Generic ABC model choice Limiting behaviour of B12 (under sufficiency) If η(y) is a sufficient statistic for both models, f_i(y|θ_i) = g_i(y) f_i^η(η(y)|θ_i). Thus B_12(y) = ∫_{Θ_1} π(θ_1) g_1(y) f_1^η(η(y)|θ_1) dθ_1 / ∫_{Θ_2} π(θ_2) g_2(y) f_2^η(η(y)|θ_2) dθ_2 = [g_1(y)/g_2(y)] × ∫ π_1(θ_1) f_1^η(η(y)|θ_1) dθ_1 / ∫ π_2(θ_2) f_2^η(η(y)|θ_2) dθ_2 = [g_1(y)/g_2(y)] B_12^η(y). [Didelot, Everitt, Johansen & Lawson, 2011] © No discrepancy only when cross-model sufficiency
  • 272.
Likelihood-free Methods ABC for model choice Generic ABC model choice Poisson/geometric example Sample x = (x1 , . . . , xn ) from either a Poisson P(λ) or from a geometric G(p). Then S = Σ_{i=1}^n x_i = η(x) is a sufficient statistic for either model but not simultaneously. Discrepancy ratio g_1(x)/g_2(x) = [S! n^{−S} / ∏_i x_i!] / [1 / C(n+S−1, S)]
  • 273.
Likelihood-free Methods ABC for model choice Generic ABC model choice Poisson/geometric discrepancy Range of B_12^η(x) versus B_12(x): the values produced have nothing in common.
  • 274.
Likelihood-free Methods ABC for model choice Generic ABC model choice Formal recovery Creating an encompassing exponential family f(x|θ_1, θ_2, α_1, α_2) ∝ exp{ θ_1^T η_1(x) + θ_2^T η_2(x) + α_1 t_1(x) + α_2 t_2(x) } leads to a sufficient statistic (η_1(x), η_2(x), t_1(x), t_2(x)) [Didelot, Everitt, Johansen & Lawson, 2011]
  • 275.
Likelihood-free Methods ABC for model choice Generic ABC model choice Formal recovery Creating an encompassing exponential family f(x|θ_1, θ_2, α_1, α_2) ∝ exp{ θ_1^T η_1(x) + θ_2^T η_2(x) + α_1 t_1(x) + α_2 t_2(x) } leads to a sufficient statistic (η_1(x), η_2(x), t_1(x), t_2(x)) [Didelot, Everitt, Johansen & Lawson, 2011] In the Poisson/geometric case, if ∏_i x_i! is added to S, there is no discrepancy
  • 276.
Likelihood-free Methods ABC for model choice Generic ABC model choice Formal recovery Creating an encompassing exponential family f(x|θ_1, θ_2, α_1, α_2) ∝ exp{ θ_1^T η_1(x) + θ_2^T η_2(x) + α_1 t_1(x) + α_2 t_2(x) } leads to a sufficient statistic (η_1(x), η_2(x), t_1(x), t_2(x)) [Didelot, Everitt, Johansen & Lawson, 2011] Only applies in genuine sufficiency settings... © Inability to evaluate the loss brought by summary statistics
  • 277.
    Likelihood-free Methods ABC for model choice Generic ABC model choice Meaning of the ABC-Bayes factor ‘This is also why focus on model discrimination typically (...) proceeds by (...) accepting that the Bayes Factor that one obtains is only derived from the summary statistics and may in no way correspond to that of the full model.’ [Scott Sisson, Jan. 31, 2011, X.’Og]
  • 278.
Likelihood-free Methods ABC for model choice Generic ABC model choice Meaning of the ABC-Bayes factor ‘This is also why focus on model discrimination typically (...) proceeds by (...) accepting that the Bayes Factor that one obtains is only derived from the summary statistics and may in no way correspond to that of the full model.’ [Scott Sisson, Jan. 31, 2011, X.’Og] In the Poisson/geometric case, if E[y_i] = θ_0 > 0, lim_{n→∞} B_12^η(y) = [(θ_0 + 1)^2 / θ_0] e^{−θ_0}
  • 279.
Likelihood-free Methods ABC for model choice Generic ABC model choice MA(q) divergence [Figure] Evolution [against ε] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right) when ε equals the 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from a MA(2) with θ1 = 0.6, θ2 = 0.2. True Bayes factor equal to 17.71.
  • 280.
Likelihood-free Methods ABC for model choice Generic ABC model choice MA(q) divergence [Figure] Evolution [against ε] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right) when ε equals the 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from a MA(1) model with θ1 = 0.6. True Bayes factor B21 equal to .004.
  • 281.
Likelihood-free Methods ABC for model choice Generic ABC model choice Further comments ‘There should be the possibility that for the same model, but different (non-minimal) [summary] statistics (so different η’s: η_1 and η_1^∗) the ratio of evidences may no longer be equal to one.’ [Michael Stumpf, Jan. 28, 2011, ’Og] Using different summary statistics [on different models] may indicate the loss of information brought by each set, but agreement does not lead to trustworthy approximations.
  • 282.
Likelihood-free Methods ABC for model choice Generic ABC model choice A population genetics evaluation Population genetics example with 3 populations, 2 scenarios, 15 individuals, 5 loci, single mutation parameter
  • 283.
Likelihood-free Methods ABC for model choice Generic ABC model choice A population genetics evaluation Population genetics example with 3 populations, 2 scenarios, 15 individuals, 5 loci, single mutation parameter, 24 summary statistics, 2 million ABC proposals, importance [tree] sampling alternative
  • 284.
Likelihood-free Methods ABC for model choice Generic ABC model choice Stability of importance sampling [Figure]
  • 285.
Likelihood-free Methods ABC for model choice Generic ABC model choice Comparison with ABC Use of 24 summary statistics and DIY-ABC logistic correction [Figure: ABC estimates of P(M=1|y) against importance sampling estimates of P(M=1|y)]
  • 286.
Likelihood-free Methods ABC for model choice Generic ABC model choice Comparison with ABC Use of 15 summary statistics and DIY-ABC logistic correction [Figure: ABC estimates of P(M=1|y) against importance sampling estimates of P(M=1|y)]
  • 287.
Likelihood-free Methods ABC for model choice Generic ABC model choice Comparison with ABC Use of 24 summary statistics and DIY-ABC logistic correction [Figure: ABC estimates of P(M=1|y) against importance sampling estimates of P(M=1|y)]
  • 288.
Likelihood-free Methods ABC for model choice Generic ABC model choice The only safe cases??? Besides specific models like Gibbs random fields, using distances over the data itself escapes the discrepancy... [Toni & Stumpf, 2010; Sousa et al., 2009]
  • 289.
Likelihood-free Methods ABC for model choice Generic ABC model choice The only safe cases??? Besides specific models like Gibbs random fields, using distances over the data itself escapes the discrepancy... [Toni & Stumpf, 2010; Sousa et al., 2009] ...and so does the use of more informal model fitting measures [Ratmann et al., 2009]
  • 290.
    Likelihood-free Methods ABC model choice consistency ABC model choice consistency Introduction The Metropolis-Hastings Algorithm simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation ABC for model choice ABC model choice consistency
  • 291.
    Likelihood-free Methods ABC model choice consistency Formalised framework The starting point Central question to the validation of ABC for model choice: When is a Bayes factor based on an insufficient statistic T(y) consistent?
  • 292.
Likelihood-free Methods ABC model choice consistency Formalised framework The starting point Central question to the validation of ABC for model choice: When is a Bayes factor based on an insufficient statistic T(y) consistent? Note: the conclusion drawn on T(y) through B_12^T(y) necessarily differs from the conclusion drawn on y through B_12(y)
  • 293.
Likelihood-free Methods ABC model choice consistency Formalised framework A benchmark toy example Comparison suggested by a referee of the PNAS paper [thanks]: [X, Cornuet, Marin & Pillai, Aug. 2011] Model M1: y ∼ N(θ_1, 1) opposed to model M2: y ∼ L(θ_2, 1/√2), Laplace distribution with mean θ_2 and scale parameter 1/√2 (variance one).
  • 294.
Likelihood-free Methods ABC model choice consistency Formalised framework A benchmark toy example Comparison suggested by a referee of the PNAS paper [thanks]: [X, Cornuet, Marin & Pillai, Aug. 2011] Model M1: y ∼ N(θ_1, 1) opposed to model M2: y ∼ L(θ_2, 1/√2), Laplace distribution with mean θ_2 and scale parameter 1/√2 (variance one). Four possible statistics: 1. sample mean ȳ (sufficient for M1 if not M2); 2. sample median med(y) (insufficient); 3. sample variance var(y) (ancillary); 4. median absolute deviation mad(y) = med(|y − med(y)|);
  • 295.
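A minimal Python sketch of ABC model choice on this normal-versus-Laplace benchmark, running the rejection sampler separately with each of the four candidate statistics and comparing the resulting estimates of P(M = Normal | s_obs). The sample size, the common location prior, and the 1% distance-quantile tolerance are illustrative assumptions not specified on the slides.

import numpy as np

rng = np.random.default_rng(9)

n, T = 100, 50_000
y_obs = rng.normal(0.0, 1.0, n)            # data actually from the normal model

def mad(x):
    return np.median(np.abs(x - np.median(x)))

stats = {"mean": np.mean, "median": np.median, "variance": np.var, "mad": mad}

def simulate(m, theta):
    if m == 0:                              # M1: N(theta, 1)
        return rng.normal(theta, 1.0, n)
    return rng.laplace(theta, 1 / np.sqrt(2), n)   # M2: Laplace, variance one

for name, stat in stats.items():
    s_obs = stat(y_obs)
    dists, models = np.empty(T), np.empty(T, dtype=int)
    for t in range(T):
        m = rng.integers(2)
        theta = rng.normal(0.0, 2.0)        # illustrative common prior on location
        models[t] = m
        dists[t] = abs(stat(simulate(m, theta)) - s_obs)
    keep = dists <= np.quantile(dists, 0.01)   # tolerance = 1% distance quantile
    p1 = (models[keep] == 0).mean()
    print(name, "P(M = Normal | s_obs) approx", round(p1, 2))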
Likelihood-free Methods ABC model choice consistency Formalised framework A benchmark toy example Comparison suggested by a referee of the PNAS paper [thanks]: [X, Cornuet, Marin & Pillai, Aug. 2011] Model M1: y ∼ N(θ_1, 1) opposed to model M2: y ∼ L(θ_2, 1/√2), Laplace distribution with mean θ_2 and scale parameter 1/√2 (variance one). [Figure: density of the posterior probability]
  • 296.
Likelihood-free Methods ABC model choice consistency Formalised framework A benchmark toy example Comparison suggested by a referee of the PNAS paper [thanks]: [X, Cornuet, Marin & Pillai, Aug. 2011] Model M1: y ∼ N(θ_1, 1) opposed to model M2: y ∼ L(θ_2, 1/√2), Laplace distribution with mean θ_2 and scale parameter 1/√2 (variance one). [Figure: boxplots under the Gauss and Laplace models, n = 100]
  • 297.
Likelihood-free Methods ABC model choice consistency Formalised framework A benchmark toy example Comparison suggested by a referee of the PNAS paper [thanks]: [X, Cornuet, Marin & Pillai, Aug. 2011] Model M1: y ∼ N(θ_1, 1) opposed to model M2: y ∼ L(θ_2, 1/√2), Laplace distribution with mean θ_2 and scale parameter 1/√2 (variance one). [Figure: densities of the posterior probability (left) and probability (right)]
  • 298.
Likelihood-free Methods ABC model choice consistency Formalised framework A benchmark toy example Comparison suggested by a referee of the PNAS paper [thanks]: [X, Cornuet, Marin & Pillai, Aug. 2011] Model M1: y ∼ N(θ_1, 1) opposed to model M2: y ∼ L(θ_2, 1/√2), Laplace distribution with mean θ_2 and scale parameter 1/√2 (variance one). [Figure: boxplots under the Gauss and Laplace models, n = 100, two panels]
  • 299.
    Likelihood-free Methods ABC model choice consistency Formalised framework Framework Starting from sample y = (y1 , . . . , yn ) the observed sample, not necessarily iid with true distribution y ∼ Pn Summary statistics T(y) = Tn = (T1 (y), T2 (y), · · · , Td (y)) ∈ Rd with true distribution Tn ∼ Gn .
  • 300.
Framework

Comparison of
  – under M1, y ~ F1,n(·|θ1) where θ1 ∈ Θ1 ⊂ R^{p1}
  – under M2, y ~ F2,n(·|θ2) where θ2 ∈ Θ2 ⊂ R^{p2}
turned into
  – under M1, T(y) ~ G1,n(·|θ1), and θ1|T(y) ~ π1(·|T^n)
  – under M2, T(y) ~ G2,n(·|θ2), and θ2|T(y) ~ π2(·|T^n)
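Written out in this notation, the two Bayes factors being contrasted are (a restatement for later reference, with f_{i,n} and g_{i,n} denoting the densities of F_{i,n} and G_{i,n}, and m_i the marginals used in the consistency results below):
\[
B^T_{12}(y)
  = \frac{\int_{\Theta_1} g_{1,n}\{T(y)\mid\theta_1\}\,\pi_1(\theta_1)\,\mathrm d\theta_1}
         {\int_{\Theta_2} g_{2,n}\{T(y)\mid\theta_2\}\,\pi_2(\theta_2)\,\mathrm d\theta_2}
  = \frac{m_1(T^n)}{m_2(T^n)}\,,
\qquad
B_{12}(y)
  = \frac{\int_{\Theta_1} f_{1,n}(y\mid\theta_1)\,\pi_1(\theta_1)\,\mathrm d\theta_1}
         {\int_{\Theta_2} f_{2,n}(y\mid\theta_2)\,\pi_2(\theta_2)\,\mathrm d\theta_2}\,.
\]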
Likelihood-free Methods
   ABC model choice consistency
     Consistency results

Assumptions

A collection of asymptotic “standard” assumptions:

[A1] There exist a sequence {v_n} converging to +∞, an absolutely continuous distribution Q with continuous, bounded density q(·), a symmetric d × d positive definite matrix V0 and a vector µ0 ∈ R^d such that
\[
v_n\,V_0^{-1/2}(T^n - \mu_0) \;\rightsquigarrow\; Q \quad\text{under } G_n ,
\]
and, for all M > 0,
\[
\sup_{v_n|t-\mu_0|\le M}\;\Bigl|\,|V_0|^{1/2} v_n^{-d}\,g_n(t) - q\bigl(v_n V_0^{-1/2}\{t-\mu_0\}\bigr)\Bigr| = o(1) .
\]
Assumptions

[A2] For i = 1, 2, there exist d × d symmetric positive definite matrices Vi(θi) and vectors µi(θi) ∈ R^d such that
\[
v_n\,V_i(\theta_i)^{-1/2}\bigl(T^n - \mu_i(\theta_i)\bigr) \;\rightsquigarrow\; Q \quad\text{under } G_{i,n}(\cdot\mid\theta_i) .
\]
Assumptions

[A3] For i = 1, 2, there exist sets Fn,i ⊂ Θi and constants εi, τi, αi > 0 such that, for all τ > 0,
\[
\sup_{\theta_i\in F_{n,i}} G_{i,n}\Bigl[\,|T^n-\mu_i(\theta_i)| > \tau\,\{|\mu_i(\theta_i)-\mu_0|\wedge\epsilon_i\}\;\Bigm|\;\theta_i\Bigr]
\;\lesssim\; v_n^{-\alpha_i}\,\sup_{\theta_i\in F_{n,i}}\bigl(|\mu_i(\theta_i)-\mu_0|\wedge\epsilon_i\bigr)^{-\alpha_i}
\]
with
\[
\pi_i\bigl(F_{n,i}^{\,c}\bigr) = o\bigl(v_n^{-\tau_i}\bigr) .
\]
Assumptions

[A4] For u > 0, let
\[
S_{n,i}(u) = \bigl\{\theta_i\in F_{n,i}\,;\ |\mu_i(\theta_i)-\mu_0| \le u\,v_n^{-1}\bigr\} .
\]
If inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, there exist constants di < τi ∧ αi − 1 such that
\[
\pi_i\bigl(S_{n,i}(u)\bigr) \sim u^{d_i} v_n^{-d_i}, \qquad \forall\, u \lesssim v_n .
\]
Assumptions

[A5] If inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, there exists U > 0 such that, for any M > 0,
\[
\sup_{v_n|t-\mu_0|\le M}\ \sup_{\theta_i\in S_{n,i}(U)}
\Bigl|\,|V_i(\theta_i)|^{1/2} v_n^{-d}\, g_i(t\mid\theta_i)
 - q\bigl(v_n V_i(\theta_i)^{-1/2}\{t-\mu_i(\theta_i)\}\bigr)\Bigr| = o(1)
\]
and
\[
\lim_{M\to\infty}\ \limsup_n\
\frac{\pi_i\bigl(S_{n,i}(U)\cap\{\|V_i(\theta_i)^{-1}\|+\|V_i(\theta_i)\| > M\}\bigr)}
     {\pi_i\bigl(S_{n,i}(U)\bigr)} = 0 .
\]
Assumptions

Reading the collection of asymptotic “standard” assumptions:
  – [A1]–[A2] are standard central limit theorems ([A1] is redundant when one of the models is “true”);
  – [A3] controls the large deviations of the estimator T^n from the estimand µ(θ);
  – [A4] is the standard prior mass condition found in Bayesian asymptotics (di is the effective dimension of the parameter);
  – [A5] controls the convergence of the density of T^n more tightly, especially when µi is not one-to-one.
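As a sanity check of [A1]–[A2] (an illustration added here, not on the original slides), take the benchmark example with T^n = ȳ_n the sample mean, so d = 1 and v_n = √n:
\[
\sqrt n\,(\bar y_n - \theta_1) \rightsquigarrow \mathcal N(0,1) \ \text{under } M_1,
\qquad
\sqrt n\,(\bar y_n - \theta_2) \rightsquigarrow \mathcal N(0,1) \ \text{under } M_2,
\]
since both models have variance one, so that Q = N(0, 1), µ1(θ1) = θ1, µ2(θ2) = θ2 and V1 = V2 = 1. Both mean maps cover the whole real line, so µ0 can be attained under either model whatever the truth: by the results below, this is precisely the situation in which a Bayes factor based on ȳ_n alone cannot be consistent.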
Effective dimension

Understanding d1, d2 in [A4]: defined only when µ0 ∈ {µi(θi), θi ∈ Θi},
\[
\pi_i\bigl(\theta_i : |\mu_i(\theta_i)-\mu_0| < n^{-1/2}\bigr) = O\bigl(n^{-d_i/2}\bigr)
\]
so that di is the effective dimension of the model Θi around µ0.
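One way to read this definition (a sketch under smoothness assumptions not stated on the slide): if µ_i is one-to-one with a non-degenerate Jacobian near θ_i* = µ_i^{-1}(µ_0) and π_i has a positive continuous density there, then
\[
\pi_i\bigl(\theta_i : |\mu_i(\theta_i)-\mu_0| < n^{-1/2}\bigr) \asymp n^{-p_i/2},
\]
so that d_i = p_i, the dimension of the parameter. In the benchmark example with T^n = ȳ_n and µ_i(θ_i) = θ_i, this gives d_1 = d_2 = 1.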
Asymptotic marginals

Asymptotically, under [A1]–[A5], the marginal
\[
m_i(t) = \int_{\Theta_i} g_i(t\mid\theta_i)\,\pi_i(\theta_i)\,\mathrm d\theta_i
\]
is such that
(i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0,
\[
C_l\,v_n^{d-d_i} \le m_i(T^n) \le C_u\,v_n^{d-d_i}
\]
and (ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0,
\[
m_i(T^n) = o_{P_n}\bigl[v_n^{d-\tau_i} + v_n^{d-\alpha_i}\bigr] .
\]
Within-model consistency

Under the same assumptions, if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0, the posterior distribution of µi(θi) given T^n is consistent at rate 1/v_n provided αi ∧ τi > di.

Note: di can truly be seen as an effective dimension of the model under the posterior πi(·|T^n), since, if µ0 ∈ {µi(θi); θi ∈ Θi},
\[
m_i(T^n) \sim v_n^{d-d_i} .
\]
Between-model consistency

Consequence of the above: the asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value of T^n under both models. And only by this mean value!

Indeed, if
   inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0,
then
\[
C_l\,v_n^{-(d_1-d_2)} \le \frac{m_1(T^n)}{m_2(T^n)} \le C_u\,v_n^{-(d_1-d_2)} ,
\]
where Cl, Cu = O_{Pn}(1), irrespective of the true model.
Hence the behaviour only depends on the difference d1 − d2.

Else, if
   inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0,
then
\[
\frac{m_1(T^n)}{m_2(T^n)} \ge C_u\,\min\bigl(v_n^{-(d_1-\alpha_2)},\, v_n^{-(d_1-\tau_2)}\bigr) .
\]
Consistency theorem

If inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0, then the Bayes factor
\[
B^T_{12} = O\bigl(v_n^{-(d_1-d_2)}\bigr)
\]
irrespective of the true model. It is consistent iff Pn is within the model with the smallest dimension.
Consistency theorem

If Pn belongs to one of the two models and if µ0 cannot be attained by the other one, i.e.
\[
0 = \min\bigl(\inf\{|\mu_0-\mu_i(\theta_i)|;\,\theta_i\in\Theta_i\},\ i=1,2\bigr)
  < \max\bigl(\inf\{|\mu_0-\mu_i(\theta_i)|;\,\theta_i\in\Theta_i\},\ i=1,2\bigr) ,
\]
then the Bayes factor B^T_12 is consistent.
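A concrete instance of this second case on the benchmark example (a computation added here, not on the original slides): the median absolute deviation mad(y) is ancillary under both location models, so its asymptotic mean does not depend on θi, and the two limits differ,
\[
\mu_1(\theta_1) \equiv \Phi^{-1}(3/4) \approx 0.6745 \ \text{(Gauss)},
\qquad
\mu_2(\theta_2) \equiv \frac{\log 2}{\sqrt 2} \approx 0.4901 \ \text{(Laplace)} .
\]
Whichever model generated the data, µ0 equals one of these two constants and cannot be attained by the other model, so the theorem gives consistency of the Bayes factor based on mad(y).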
Likelihood-free Methods
   ABC model choice consistency
     Summary statistics

Consequences on summary statistics

The Bayes factor is driven by the means µi(θi) and by the relative position of µ0 with respect to both sets {µi(θi); θi ∈ Θi}, i = 1, 2.

For ABC, this implies that the statistics T^n most likely to work are ancillary statistics with different mean values under both models.

Else, if T^n asymptotically depends on some of the parameters of the models, it is quite likely that there exists θi ∈ Θi such that µi(θi) = µ0 even though model M1 is misspecified.
Toy example: Laplace versus Gauss [1]

If
\[
T^n = n^{-1}\sum_{i=1}^n X_i^4 ,\qquad
\mu_1(\theta) = 3 + \theta^4 + 6\theta^2 ,\qquad
\mu_2(\theta) = 6 + \cdots
\]
and the true distribution is Laplace with mean θ0 = 1, under the Gaussian model the value θ* = (2√3 − 3)^{1/2} leads to µ0 = µ(θ*) [here d1 = d2 = d = 1].
Hence a Bayes factor associated with such a statistic is inconsistent.
[Figures: density of the posterior probabilities based on the fourth moment, and boxplots of these probabilities for n = 1000 samples from M1 and M2.]
Toy example: Laplace versus Gauss [1]

Caption: Comparison of the distributions of the posterior probabilities that the data is from a normal model (as opposed to a Laplace model) with unknown mean, when the data is made of n = 1000 observations either from a normal (M1) or Laplace (M2) distribution with mean one, and when the summary statistic in the ABC algorithm is restricted to the empirical fourth moment. The ABC algorithm uses proposals from the prior N(0, 4) and selects the tolerance as the 1% distance quantile.
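The procedure in the caption can be sketched in a few lines of Python; the snippet below is an illustrative reimplementation (not the code behind the figures). The uniform prior over the two models and the 20,000 prior simulations are arbitrary choices; the N(0, 4) prior on the mean and the 1% tolerance quantile are taken from the caption. Replicating the figures would require repeating this over many simulated datasets.

import numpy as np

rng = np.random.default_rng(1)

def fourth_moment(x):
    # the only summary statistic retained here
    return np.mean(x ** 4)

def abc_prob_M1(y, n_sim=20_000, quantile=0.01):
    """ABC estimate of the posterior probability of the normal model M1,
    with model index and mean drawn from the prior and the tolerance set
    to the 1% quantile of the simulated distances."""
    n, s_obs = len(y), fourth_moment(y)
    models = rng.integers(1, 3, size=n_sim)        # uniform prior over {M1, M2}
    thetas = rng.normal(0.0, 2.0, size=n_sim)      # prior N(0, 4) on the mean
    dist = np.empty(n_sim)
    for i in range(n_sim):
        if models[i] == 1:                         # M1: Gaussian, variance one
            z = rng.normal(thetas[i], 1.0, size=n)
        else:                                      # M2: Laplace, scale 1/sqrt(2)
            z = rng.laplace(thetas[i], 1.0 / np.sqrt(2.0), size=n)
        dist[i] = abs(fourth_moment(z) - s_obs)
    keep = dist <= np.quantile(dist, quantile)
    return np.mean(models[keep] == 1)

# n = 1000 observations with mean one from each model, as in the caption
y_gauss = rng.normal(1.0, 1.0, size=1000)
y_laplace = rng.laplace(1.0, 1.0 / np.sqrt(2.0), size=1000)
print("P(M1 | Gauss data)   =", abc_prob_M1(y_gauss))
print("P(M1 | Laplace data) =", abc_prob_M1(y_laplace))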
Toy example: Laplace versus Gauss [0]

When
\[
T(y) = \bigl(\bar y_n^{(4)},\, \bar y_n^{(6)}\bigr)
\]
(the empirical fourth and sixth moments) and the true distribution is Laplace with mean θ0 = 0, then µ0 = 6, µ1(θ1*) = 6 with θ1* = (2√3 − 3)^{1/2} [d1 = 1 and d2 = 1/2], thus
\[
B^T_{12} \sim n^{-1/4} \to 0 \;:\ \text{consistent.}
\]
Under the Gaussian model, µ0 = 3 while µ2(θ2) ≥ 6 > 3 for all θ2, so
\[
B^T_{12} \to +\infty \;:\ \text{consistent.}
\]
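For the record, the n^{-1/4} rate matches the v_n^{-(d1−d2)} order given earlier, assuming the usual CLT scaling v_n = √n (arithmetic added here):
\[
v_n^{-(d_1-d_2)} = (\sqrt n)^{-(1-1/2)} = n^{-1/4} .
\]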
[Figure: density of the posterior probabilities based on the fourth and sixth moments.]
Embedded models

When M1 is a submodel of M2 and the true distribution belongs to the smaller model M1, the Bayes factor is of order
\[
v_n^{-(d_1-d_2)} .
\]
If the summary statistic is only informative on a parameter that is the same under both models, i.e. if d1 = d2, then the Bayes factor is not consistent.

Else, d1 < d2 and the Bayes factor is consistent under M1. If the true distribution is not in M1, then the Bayes factor is consistent only if µ1 = µ2 = µ0.