0
Upcoming SlideShare
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Standard text messaging rates apply

# Mcmc & lkd free II

329

Published on

Published in: Technology, Economy & Finance
2 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

Views
Total Views
329
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
5
0
Likes
2
Embeds 0
No embeds

No notes for slide

### Transcript

• 1. MCMC and likelihood-free methods Part/day II: ABC MCMC and likelihood-free methods Part/day II: ABC Christian P. Robert Universit´ Paris-Dauphine, IuF, & CREST e Monash University, EBS, July 18, 2012
• 2. MCMC and likelihood-free methods Part/day II: ABCOutlinesimulation-based methods in EconometricsGenetics of ABCApproximate Bayesian computationABC for model choiceABC model choice consistency
• 3. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsEcon’ections simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation ABC for model choice ABC model choice consistency
• 4. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsUsages of simulation in Econometrics Similar exploration of simulation-based techniques in Econometrics Simulated method of moments Method of simulated moments Simulated pseudo-maximum-likelihood Indirect inference [Gouri´roux & Monfort, 1996] e
• 5. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsSimulated method of moments o Given observations y1:n from a model yt = r (y1:(t−1) , t , θ) , t ∼ g (·) simulate 1:n , derive yt (θ) = r (y1:(t−1) , t , θ) and estimate θ by n arg min (yto − yt (θ))2 θ t=1
• 6. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsSimulated method of moments o Given observations y1:n from a model yt = r (y1:(t−1) , t , θ) , t ∼ g (·) simulate 1:n , derive yt (θ) = r (y1:(t−1) , t , θ) and estimate θ by n n 2 arg min yto − yt (θ) θ t=1 t=1
• 7. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsMethod of simulated moments Given a statistic vector K (y ) with Eθ [K (Yt )|y1:(t−1) ] = k(y1:(t−1) ; θ) ﬁnd an unbiased estimator of k(y1:(t−1) ; θ), ˜ k( t , y1:(t−1) ; θ) Estimate θ by n S arg min K (yt ) − ˜ t k( s , y1:(t−1) ; θ)/S θ t=1 s=1 [Pakes & Pollard, 1989]
• 8. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsIndirect inference ˆ Minimise (in θ) the distance between estimators β based on pseudo-models for genuine observations and for observations simulated under the true model and the parameter θ. [Gouri´roux, Monfort, & Renault, 1993; e Smith, 1993; Gallant & Tauchen, 1996]
• 9. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsIndirect inference (PML vs. PSE) Example of the pseudo-maximum-likelihood (PML) ˆ β(y) = arg max log f (yt |β, y1:(t−1) ) β t leading to ˆ ˆ arg min ||β(yo ) − β(y1 (θ), . . . , yS (θ))||2 θ when ys (θ) ∼ f (y|θ) s = 1, . . . , S
• 10. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsIndirect inference (PML vs. PSE) Example of the pseudo-score-estimator (PSE) 2 ˆ ∂ log f β(y) = arg min (yt |β, y1:(t−1) ) β t ∂β leading to ˆ ˆ arg min ||β(yo ) − β(y1 (θ), . . . , yS (θ))||2 θ when ys (θ) ∼ f (y|θ) s = 1, . . . , S
• 11. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsConsistent indirect inference ...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identiﬁed the diﬀerent methods become easier...
• 12. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsConsistent indirect inference ...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identiﬁed the diﬀerent methods become easier... Consistency depending on the criterion and on the asymptotic identiﬁability of θ [Gouri´roux, Monfort, 1996, p. 66] e
• 13. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsAR(2) vs. MA(1) example true (AR) model yt = t −θ t−1 and [wrong!] auxiliary (MA) model yt = β1 yt−1 + β2 yt−2 + ut R code x=eps=rnorm(250) x[2:250]=x[2:250]-0.5*x[1:249] simeps=rnorm(250) propeta=seq(-.99,.99,le=199) dist=rep(0,199) bethat=as.vector(arima(x,c(2,0,0),incl=FALSE)\$coef) for (t in 1:199) dist[t]=sum((as.vector(arima(c(simeps[1],simeps[2:250]-propeta[t]* simeps[1:249]),c(2,0,0),incl=FALSE)\$coef)-bethat)^2)
• 14. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsAR(2) vs. MA(1) example One sample: 0.8 0.6 distance 0.4 0.2 0.0 −1.0 −0.5 0.0 0.5 1.0
• 15. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsAR(2) vs. MA(1) example Many samples: 6 5 4 3 2 1 0 0.2 0.4 0.6 0.8 1.0
• 16. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsChoice of pseudo-model Pick model such that ˆ 1. β(θ) not ﬂat (i.e. sensitive to changes in θ) ˆ 2. β(θ) not dispersed (i.e. robust agains changes in ys (θ)) [Frigessi & Heggland, 2004]
• 17. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsABC using indirect inference (1) We present a novel approach for developing summary statistics for use in approximate Bayesian computation (ABC) algorithms by using indirect inference(...) In the indirect inference approach to ABC the parameters of an auxiliary model ﬁtted to the data become the summary statistics. Although applicable to any ABC technique, we embed this approach within a sequential Monte Carlo algorithm that is completely adaptive and requires very little tuning(...) [Drovandi, Pettitt & Faddy, 2011] c Indirect inference provides summary statistics for ABC...
• 18. MCMC and likelihood-free methods Part/day II: ABC simulation-based methods in EconometricsABC using indirect inference (2) ...the above result shows that, in the limit as h → 0, ABC will be more accurate than an indirect inference method whose auxiliary statistics are the same as the summary statistic that is used for ABC(...) Initial analysis showed that which method is more accurate depends on the true value of θ. [Fearnhead and Prangle, 2012] c Indirect inference provides estimates rather than global inference...
• 19. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABCGenetics of ABC simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation ABC for model choice ABC model choice consistency
• 20. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABCGenetic background of ABC ABC is a recent computational technique that only requires being able to sample from the likelihood f (·|θ) This technique stemmed from population genetics models, about 15 years ago, and population geneticists still contribute signiﬁcantly to methodological developments of ABC. [Griﬃth & al., 1997; Tavar´ & al., 1999] e
• 21. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABCPopulation genetics [Part derived from the teaching material of Raphael Leblois, ENS Lyon, November 2010] Describe the genotypes, estimate the alleles frequencies, determine their distribution among individuals, populations and between populations; Predict and understand the evolution of gene frequencies in populations as a result of various factors. c Analyses the eﬀect of various evolutive forces (mutation, drift, migration, selection) on the evolution of gene frequencies in time and space.
• 22. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABCA[B]ctors Les mécanismes de l’évolution Towards an explanation of the mechanisms of evolutionary changes, mathematical models ofl’évolution ont to be developed •! Les mathématiques de evolution started in the years 1920-1930. commencé à être développés dans les années 1920-1930 Ronald A Fisher (1890-1962) Sewall Wright (1889-1988) John BS Haldane (1892-1964)
• 23. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABC Le modèleWright-Fisher model de Wright-Fisher •! En l’absence de mutation et de A population of constant sélection, les fréquences size, in which individuals alléliques dérivent (augmentent reproduce at the same time. et diminuent) inévitablement jusqu’à Each gene in a generation is la fixation d’un allèle a copy of a gene of the •! La dérive conduit generation. previous donc à la perte de variation génétique à l’intérieur des populations mutation In the absence of and selection, allele frequencies derive inevitably until the ﬁxation of an allele.
• 24. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABCCoalescent theory [Kingman, 1982; Tajima, Tavar´, &tc] e !"#\$%&((")**+\$,-"."/010234%".5"*\$*%()23\$15"6" Coalescence theory interested in the genealogy of a sample of genes7**+\$,-",()5534%" common ancestor of the sample. " !! back in time to the " "!"7**+\$,-"8",\$)(5,1,"9"
• 25. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABCCommon ancestor Les différentes lignées fusionnent (coalescent) au fur et à mesure que l’on remonte vers le Time of coalescence passé (T) Modélisation du processus de dérive génétique The diﬀerent lineages mergeen “remontant dans lein the past. when we go back temps” jusqu’à l’ancêtre commun d’un échantillon de gènes 6
• 26. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABC Sous l’hypothèse de neutralité des marqueurs génétiques étudiés, les mutations sont indépendantes de la généalogieNeutral généalogie ne dépend que des processus démographiques i.e. la mutations On construit donc la généalogie selon les paramètres démographiques (ex. N), puis on ajoute aassumption les Under the posteriori of mutations sur les différentes neutrality, the mutations branches,independentau feuilles de are du MRCA of the l’arbregenealogy. On obtient ainsi des données de We construct the genealogy polymorphisme sous les modèles according to the démographiques et mutationnels demographic parameters, considérés then we add a posteriori the mutations. 20
• 27. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABCDemo-genetic inference Each model is characterized by a set of parameters θ that cover historical (time divergence, admixture time ...), demographics (population sizes, admixture rates, migration rates, ...) and genetic (mutation rate, ...) factors The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time Problem: most of the time, we can not calculate the likelihood of the polymorphism data f (y|θ).
• 28. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABCUntractable likelihood Missing (too missing!) data structure: f (y|θ) = f (y|G , θ)f (G |θ)dG G The genealogies are considered as nuisance parameters. This problematic thus diﬀers from the phylogenetic approach where the tree is the parameter of interesst.
• 29. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABCA genuine !""#\$%&()*+,(-*.&(/+0\$"1)()&\$/+2!,03! example of application 1/+*%*"4*+56(""4&7()&\$/.+.1#+4*.+8-9:*.+ 94 Pygmies populations: do they have a common origin? Is there a lot of exchanges between pygmies and non-pygmies populations?
• 30. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABC !""#\$%&()*+,(-*.&(/+0\$"1)()&\$/+2!,03!Scenarios 1/+*%*"4*+56(""4&7()&\$/.+.1#+4*.+8-9:*.+ under competition Différents scénarios possibles, choix de scenario par ABC Verdu et al. 2009 96
• 31. 1/+*%*"4*+56(""4&7()&\$/.+.1#+4*.+MCMC and likelihood-free methods Part/day II: ABC !""#\$%&()*+,(-*.&(/+0\$"1)()&\$/+2!,03 Genetics of ABC 1/+*%*"4*+56(""4&7()&\$/.+.1#+4*.+8-9:*.Différentsresults Simulation scénarios possibles, choix de scenari Différents scénarios possibles, choix de scenario par ABC c Scenario 1A is chosen. largement soutenu par rapport aux Le scenario 1a est autres ! plaide pour une origine commune des Le scenario 1a est largement soutenu par populations pygmées d’Afrique de l’Ouest rap Verdu e
• 32. MCMC and likelihood-free methods Part/day II: ABC Genetics of ABC !""#\$%&()*+,(-*.&(/+0\$"1)()&\$/+2!,03!Most likely scenario 1/+*%*"4*+56(""4&7()&\$/.+.1#+4*.+8-9:*.+ Scénario évolutif : on « raconte » une histoire à partir de ces inférences Verdu et al. 2009 99
• 33. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computationApproximate Bayesian computation simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation ABC for model choice ABC model choice consistency
• 34. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsUntractable likelihoods Cases when the likelihood function f (y|θ) is unavailable and when the completion step f (y|θ) = f (y, z|θ) dz Z is impossible or too costly because of the dimension of z c MCMC cannot be implemented!
• 35. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsUntractable likelihoods Cases when the likelihood function f (y|θ) is unavailable and when the completion step f (y|θ) = f (y, z|θ) dz Z is impossible or too costly because of the dimension of z c MCMC cannot be implemented!
• 36. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsIllustrations Example () Stochastic volatility model: for Highest weight trajectories t = 1, . . . , T , 0.4 0.2 yt = exp(zt ) t , zt = a+bzt−1 +σηt , 0.0 −0.2 T very large makes it diﬃcult to −0.4 include z within the simulated 0 200 400 t 600 800 1000 parameters
• 37. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsIllustrations Example () Potts model: if y takes values on a grid Y of size k n and f (y|θ) ∝ exp θ Iyl =yi l∼i where l∼i denotes a neighbourhood relation, n moderately large prohibits the computation of the normalising constant
• 38. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsIllustrations Example (Genesis) Phylogenetic tree: in population genetics, reconstitution of a common ancestor from a sample of genes via a phylogenetic tree that is close to impossible to integrate out [100 processor days with 4 parameters] [Cornuet et al., 2009, Bioinformatics]
• 39. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsThe ABC method Bayesian setting: target is π(θ)f (x|θ)
• 40. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsThe ABC method Bayesian setting: target is π(θ)f (x|θ) When likelihood f (x|θ) not in closed form, likelihood-free rejection technique:
• 41. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsThe ABC method Bayesian setting: target is π(θ)f (x|θ) When likelihood f (x|θ) not in closed form, likelihood-free rejection technique: ABC algorithm For an observation y ∼ f (y|θ), under the prior π(θ), keep jointly simulating θ ∼ π(θ) , z ∼ f (z|θ ) , until the auxiliary variable z is equal to the observed value, z = y. [Tavar´ et al., 1997] e
• 42. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsWhy does it work?! The proof is trivial: f (θi ) ∝ π(θi )f (z|θi )Iy (z) z∈D ∝ π(θi )f (y|θi ) = π(θi |y) . [Accept–Reject 101]
• 43. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsEarlier occurrence ‘Bayesian statistics and Monte Carlo methods are ideally suited to the task of passing many models over one dataset’ [Don Rubin, Annals of Statistics, 1984] Note Rubin (1984) does not promote this algorithm for likelihood-free simulation but frequentist intuition on posterior distributions: parameters from posteriors are more likely to be those that could have generated the data.
• 44. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsA as A...pproximative When y is a continuous random variable, equality z = y is replaced with a tolerance condition, (y, z) ≤ where is a distance
• 45. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsA as A...pproximative When y is a continuous random variable, equality z = y is replaced with a tolerance condition, (y, z) ≤ where is a distance Output distributed from π(θ) Pθ { (y, z) < } ∝ π(θ| (y, z) < ) [Pritchard et al., 1999]
• 46. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsABC algorithm Algorithm 1 Likelihood-free rejection sampler 2 for i = 1 to N do repeat generate θ from the prior distribution π(·) generate z from the likelihood f (·|θ ) until ρ{η(z), η(y)} ≤ set θi = θ end for where η(y) deﬁnes a (not necessarily suﬃcient) statistic
• 47. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsOutput The likelihood-free algorithm samples from the marginal in z of: π(θ)f (z|θ)IA ,y (z) π (θ, z|y) = , A ,y ×Θ π(θ)f (z|θ)dzdθ where A ,y = {z ∈ D|ρ(η(z), η(y)) < }.
• 48. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsOutput The likelihood-free algorithm samples from the marginal in z of: π(θ)f (z|θ)IA ,y (z) π (θ, z|y) = , A ,y ×Θ π(θ)f (z|θ)dzdθ where A ,y = {z ∈ D|ρ(η(z), η(y)) < }. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution: π (θ|y) = π (θ, z|y)dz ≈ π(θ|y) .
• 49. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsConvergence of ABC (ﬁrst attempt) What happens when → 0?
• 50. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsConvergence of ABC (ﬁrst attempt) What happens when → 0? If f (·|θ) is continuous in y , uniformly in θ [!], given an arbitrary δ > 0, there exists 0 such that < 0 implies π(θ) f (z|θ)IA ,y (z) dz π(θ)f (y|θ)(1 δ)µ(B ) ∈ A ,y ×Θ π(θ)f (z|θ)dzdθ Θ π(θ)f (y|θ)dθ(1 ± δ)µ(B )
• 51. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsConvergence of ABC (ﬁrst attempt) What happens when → 0? If f (·|θ) is continuous in y , uniformly in θ [!], given an arbitrary δ > 0, there exists 0 such that < 0 implies π(θ) f (z|θ)IA ,y (z) dz π(θ)f (y|θ)(1 δ) XX ) µ(BX ∈ A ,y ×Θ π(θ)f (z|θ)dzdθ Θ π(θ)f (y|θ)dθ(1 ± δ) µ(B ) XXX
• 52. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsConvergence of ABC (ﬁrst attempt) What happens when → 0? If f (·|θ) is continuous in y , uniformly in θ [!], given an arbitrary δ 0, there exists 0 such that 0 implies π(θ) f (z|θ)IA ,y (z) dz π(θ)f (y|θ)(1 δ) XX ) µ(BX ∈ A ,y ×Θ π(θ)f (z|θ)dzdθ Θ π(θ)f (y|θ)dθ(1 ± δ) X µ(B ) X X [Proof extends to other continuous-in-0 kernels K ]
• 53. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsConvergence of ABC (second attempt) What happens when → 0?
• 54. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsConvergence of ABC (second attempt) What happens when → 0? For B ⊂ Θ, we have A f (z|θ)dz f (z|θ)π(θ)dθ ,y B π(θ)dθ = dz B A ,y ×Θ π(θ)f (z|θ)dzdθ A ,y A ,y ×Θ π(θ)f (z|θ)dzdθ B f (z|θ)π(θ)dθ m(z) = dz A ,y m(z) A ,y ×Θ π(θ)f (z|θ)dzdθ m(z) = π(B|z) dz A ,y A ,y ×Θ π(θ)f (z|θ)dzdθ which indicates convergence for a continuous π(B|z).
• 55. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsProbit modelling on Pima Indian women Example (R benchmark) 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes
• 56. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsProbit modelling on Pima Indian women Example (R benchmark) 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes Probability of diabetes function of above variables P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
• 57. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsProbit modelling on Pima Indian women Example (R benchmark) 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes Probability of diabetes function of above variables P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) , Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a g -prior modelling: β ∼ N3 (0, n XT X)−1
• 58. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsProbit modelling on Pima Indian women Example (R benchmark) 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes Probability of diabetes function of above variables P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) , Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a g -prior modelling: β ∼ N3 (0, n XT X)−1 Use of importance function inspired from the MLE estimate distribution ˆ ˆ β ∼ N (β, Σ)
• 59. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsPima Indian benchmark 80 100 1.0 80 60 0.8 60 0.6 Density Density Density 40 40 0.4 20 20 0.2 0.0 0 0 −0.005 0.010 0.020 0.030 −0.05 −0.03 −0.01 −1.0 0.0 1.0 2.0 Figure: Comparison between density estimates of the marginals on β1 (left), β2 (center) and β3 (right) from ABC rejection samples (red) and MCMC samples (black) .
• 60. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsMA example Back to the MA(q) model q xt = t + ϑi t−i i=1 Simple prior: uniform over the inverse [real and complex] roots in q Q(u) = 1 − ϑi u i i=1 under the identiﬁability conditions
• 61. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsMA example Back to the MA(q) model q xt = t + ϑi t−i i=1 Simple prior: uniform prior over the identiﬁability zone, e.g. triangle for MA(2)
• 62. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsMA example (2) ABC algorithm thus made of 1. picking a new value (ϑ1 , ϑ2 ) in the triangle 2. generating an iid sequence ( t )−qt≤T 3. producing a simulated series (xt )1≤t≤T
• 63. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsMA example (2) ABC algorithm thus made of 1. picking a new value (ϑ1 , ϑ2 ) in the triangle 2. generating an iid sequence ( t )−qt≤T 3. producing a simulated series (xt )1≤t≤T Distance: basic distance between the series T ρ((xt )1≤t≤T , (xt )1≤t≤T ) = (xt − xt )2 t=1 or distance between summary statistics like the q autocorrelations T τj = xt xt−j t=j+1
• 64. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsComparison of distance impact Evaluation of the tolerance on the ABC sample against both distances ( = 100%, 10%, 1%, 0.1%) for an MA(2) model
• 65. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsComparison of distance impact 4 1.5 3 1.0 2 0.5 1 0.0 0 0.0 0.2 0.4 0.6 0.8 −2.0 −1.0 0.0 0.5 1.0 1.5 θ1 θ2 Evaluation of the tolerance on the ABC sample against both distances ( = 100%, 10%, 1%, 0.1%) for an MA(2) model
• 66. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsComparison of distance impact 4 1.5 3 1.0 2 0.5 1 0.0 0 0.0 0.2 0.4 0.6 0.8 −2.0 −1.0 0.0 0.5 1.0 1.5 θ1 θ2 Evaluation of the tolerance on the ABC sample against both distances ( = 100%, 10%, 1%, 0.1%) for an MA(2) model
• 67. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsHomonomy The ABC algorithm is not to be confused with the ABC algorithm The Artiﬁcial Bee Colony algorithm is a swarm based meta-heuristic algorithm that was introduced by Karaboga in 2005 for optimizing numerical problems. It was inspired by the intelligent foraging behavior of honey bees. The algorithm is speciﬁcally based on the model proposed by Tereshko and Loengarov (2005) for the foraging behaviour of honey bee colonies. The model consists of three essential components: employed and unemployed foraging bees, and food sources. The ﬁrst two components, employed and unemployed foraging bees, search for rich food sources (...) close to their hive. The model also deﬁnes two leading modes of behaviour (...): recruitment of foragers to rich food sources resulting in positive feedback and abandonment of poor sources by foragers causing negative feedback. [Karaboga, Scholarpedia]
• 68. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsABC advances Simulating from the prior is often poor in eﬃciency
• 69. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsABC advances Simulating from the prior is often poor in eﬃciency Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y ... [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007]
• 70. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsABC advances Simulating from the prior is often poor in eﬃciency Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y ... [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007] ...or by viewing the problem as a conditional density estimation and by developing techniques to allow for larger [Beaumont et al., 2002]
• 71. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation ABC basicsABC advances Simulating from the prior is often poor in eﬃciency Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y ... [Marjoram et al, 2003; Bortot et al., 2007, Sisson et al., 2007] ...or by viewing the problem as a conditional density estimation and by developing techniques to allow for larger [Beaumont et al., 2002] .....or even by including in the inferential framework [ABCµ ] [Ratmann et al., 2009]
• 72. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NP Better usage of [prior] simulations by adjustement: instead of throwing away θ such that ρ(η(z), η(y)) , replace θ’s with locally regressed transforms (use with BIC) θ∗ = θ − {η(z) − η(y)}T β ˆ [Csill´ry et al., TEE, 2010] e ˆ where β is obtained by [NP] weighted least square regression on (η(z) − η(y)) with weights Kδ {ρ(η(z), η(y))} [Beaumont et al., 2002, Genetics]
• 73. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NP (regression) Also found in the subsequent literature, e.g. in Fearnhead-Prangle (2012) : weight directly simulation by Kδ {ρ(η(z(θ)), η(y))} or S 1 Kδ {ρ(η(zs (θ)), η(y))} S s=1 [consistent estimate of f (η|θ)]
• 74. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NP (regression) Also found in the subsequent literature, e.g. in Fearnhead-Prangle (2012) : weight directly simulation by Kδ {ρ(η(z(θ)), η(y))} or S 1 Kδ {ρ(η(zs (θ)), η(y))} S s=1 [consistent estimate of f (η|θ)] Curse of dimensionality: poor estimate when d = dim(η) is large...
• 75. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NP (density estimation) Use of the kernel weights Kδ {ρ(η(z(θ)), η(y))} leads to the NP estimate of the posterior expectation i θi Kδ {ρ(η(z(θi )), η(y))} i Kδ {ρ(η(z(θi )), η(y))} [Blum, JASA, 2010]
• 76. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NP (density estimation) Use of the kernel weights Kδ {ρ(η(z(θ)), η(y))} leads to the NP estimate of the posterior conditional density ˜ Kb (θi − θ)Kδ {ρ(η(z(θi )), η(y))} i i Kδ {ρ(η(z(θi )), η(y))} [Blum, JASA, 2010]
• 77. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NP (density estimations) Other versions incorporating regression adjustments ˜ Kb (θi∗ − θ)Kδ {ρ(η(z(θi )), η(y))} i i Kδ {ρ(η(z(θi )), η(y))}
• 78. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NP (density estimations) Other versions incorporating regression adjustments ˜ Kb (θi∗ − θ)Kδ {ρ(η(z(θi )), η(y))} i i Kδ {ρ(η(z(θi )), η(y))} In all cases, error E[ˆ (θ|y)] − g (θ|y) = cb 2 + cδ 2 + OP (b 2 + δ 2 ) + OP (1/nδ d ) g c var(ˆ (θ|y)) = g (1 + oP (1)) nbδ d [Blum, JASA, 2010]
• 79. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NP (density estimations) Other versions incorporating regression adjustments ˜ Kb (θi∗ − θ)Kδ {ρ(η(z(θi )), η(y))} i i Kδ {ρ(η(z(θi )), η(y))} In all cases, error E[ˆ (θ|y)] − g (θ|y) = cb 2 + cδ 2 + OP (b 2 + δ 2 ) + OP (1/nδ d ) g c var(ˆ (θ|y)) = g (1 + oP (1)) nbδ d [standard NP calculations]
• 80. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NCH Incorporating non-linearities and heterocedasticities: σ (η(y)) ˆ θ∗ = m(η(y)) + [θ − m(η(z))] ˆ ˆ σ (η(z)) ˆ
• 81. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NCH Incorporating non-linearities and heterocedasticities: σ (η(y)) ˆ θ∗ = m(η(y)) + [θ − m(η(z))] ˆ ˆ σ (η(z)) ˆ where m(η) estimated by non-linear regression (e.g., neural network) ˆ σ (η) estimated by non-linear regression on residuals ˆ log{θi − m(ηi )}2 = log σ 2 (ηi ) + ξi ˆ [Blum Fran¸ois, 2009] c
• 82. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NCH (2) Why neural network?
• 83. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-NCH (2) Why neural network? ﬁghts curse of dimensionality selects relevant summary statistics provides automated dimension reduction oﬀers a model choice capability improves upon multinomial logistic [Blum Fran¸ois, 2009] c
• 84. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-MCMC Markov chain (θ(t) ) created via the transition function  θ ∼ Kω (θ |θ(t) ) if x ∼ f (x|θ ) is such that x = y   π(θ )Kω (t) |θ ) θ (t+1) = and u ∼ U(0, 1) ≤ π(θ(t) )K (θ |θ(t) ) ,  ω (θ  (t) θ otherwise,
• 85. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-MCMC Markov chain (θ(t) ) created via the transition function  θ ∼ Kω (θ |θ(t) ) if x ∼ f (x|θ ) is such that x = y   π(θ )Kω (t) |θ ) θ (t+1) = and u ∼ U(0, 1) ≤ π(θ(t) )K (θ |θ(t) ) ,  ω (θ  (t) θ otherwise, has the posterior π(θ|y ) as stationary distribution [Marjoram et al, 2003]
• 86. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-MCMC (2) Algorithm 2 Likelihood-free MCMC sampler Use Algorithm 1 to get (θ(0) , z(0) ) for t = 1 to N do Generate θ from Kω ·|θ(t−1) , Generate z from the likelihood f (·|θ ), Generate u from U[0,1] , π(θ )Kω (θ(t−1) |θ ) if u ≤ I π(θ(t−1) Kω (θ |θ(t−1) ) A ,y (z ) then set (θ(t) , z(t) ) = (θ , z ) else (θ(t) , z(t) )) = (θ(t−1) , z(t−1) ), end if end for
• 87. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupWhy does it work? Acceptance probability does not involve calculating the likelihood and π (θ , z |y) q(θ (t−1) |θ )f (z(t−1) |θ (t−1) ) × π (θ (t−1) , z(t−1) |y) q(θ |θ (t−1) )f (z |θ ) π(θ ) XX|θ IA ,y (z ) ) f (z X X = π(θ (t−1) ) f (z(t−1) |θ (t−1) ) IA ,y (z(t−1) ) q(θ (t−1) |θ ) f (z(t−1) |θ (t−1) ) × q(θ |θ (t−1) ) X|θ X ) f (z XX
• 88. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupWhy does it work? Acceptance probability does not involve calculating the likelihood and π (θ , z |y) q(θ (t−1) |θ )f (z(t−1) |θ (t−1) ) × π (θ (t−1) , z(t−1) |y) q(θ |θ (t−1) )f (z |θ ) π(θ ) XX|θ IA ,y (z ) ) f (z X X = hh ( π(θ (t−1) ) ((hhhhh) IA ,y (z(t−1) ) f (z(t−1) |θ (t−1) ((( h (( hh ( q(θ (t−1) |θ ) ((hhhhh) f (z(t−1) |θ (t−1) (( ((( h × q(θ |θ (t−1) ) X|θ X ) f (z XX
• 89. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupWhy does it work? Acceptance probability does not involve calculating the likelihood and π (θ , z |y) q(θ (t−1) |θ )f (z(t−1) |θ (t−1) ) × π (θ (t−1) , z(t−1) |y) q(θ |θ (t−1) )f (z |θ ) π(θ ) X|θ IA ,y (z ) X ) f (z X X = hh ( π(θ (t−1) ) ((hhhhh) XX(t−1) ) f (z(t−1) |θ (t−1) IAX (( X ((( h ,y (zX X hh ( q(θ (t−1) |θ ) ((hhhhh) f (z(t−1) |θ (t−1) (( ((( h × q(θ |θ (t−1) ) X|θ X ) f (z XX π(θ )q(θ (t−1) |θ ) = IA ,y (z ) π(θ (t−1) q(θ |θ (t−1) )
• 90. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA toy example Case of 1 1 x ∼ N (θ, 1) + N (−θ, 1) 2 2 under prior θ ∼ N (0, 10)
• 91. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA toy example Case of 1 1 x ∼ N (θ, 1) + N (−θ, 1) 2 2 under prior θ ∼ N (0, 10) ABC sampler thetas=rnorm(N,sd=10) zed=sample(c(1,-1),N,rep=TRUE)*thetas+rnorm(N,sd=1) eps=quantile(abs(zed-x),.01) abc=thetas[abs(zed-x)eps]
• 92. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA toy example Case of 1 1 x ∼ N (θ, 1) + N (−θ, 1) 2 2 under prior θ ∼ N (0, 10) ABC-MCMC sampler metas=rep(0,N) metas[1]=rnorm(1,sd=10) zed[1]=x for (t in 2:N){ metas[t]=rnorm(1,mean=metas[t-1],sd=5) zed[t]=rnorm(1,mean=(1-2*(runif(1).5))*metas[t],sd=1) if ((abs(zed[t]-x)eps)||(runif(1)dnorm(metas[t],sd=10)/dnorm(metas[t-1],sd=10))){ metas[t]=metas[t-1] zed[t]=zed[t-1]} }
• 93. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA toy example x =2
• 94. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA toy example x =2 0.30 0.25 0.20 0.15 0.10 0.05 0.00 −4 −2 0 2 4 θ
• 95. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA toy example x = 10
• 96. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA toy example x = 10 0.25 0.20 0.15 0.10 0.05 0.00 −10 −5 0 5 10 θ
• 97. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA toy example x = 50
• 98. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA toy example x = 50 0.10 0.08 0.06 0.04 0.02 0.00 −40 −20 0 20 40 θ
• 99. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABCµ [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS] Use of a joint density f (θ, |y) ∝ ξ( |y, θ) × πθ (θ) × π ( ) where y is the data, and ξ( |y, θ) is the prior predictive density of ρ(η(z), η(y)) given θ and y when z ∼ f (z|θ)
• 100. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABCµ [Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS] Use of a joint density f (θ, |y) ∝ ξ( |y, θ) × πθ (θ) × π ( ) where y is the data, and ξ( |y, θ) is the prior predictive density of ρ(η(z), η(y)) given θ and y when z ∼ f (z|θ) Warning! Replacement of ξ( |y, θ) with a non-parametric kernel approximation.
• 101. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABCµ details Multidimensional distances ρk (k = 1, . . . , K ) and errors k = ρk (ηk (z), ηk (y)), with ˆ 1 k ∼ ξk ( |y, θ) ≈ ξk ( |y, θ) = K [{ k −ρk (ηk (zb ), ηk (y))}/hk ] Bhk b ˆ then used in replacing ξ( |y, θ) with mink ξk ( |y, θ)
• 102. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABCµ details Multidimensional distances ρk (k = 1, . . . , K ) and errors k = ρk (ηk (z), ηk (y)), with ˆ 1 k ∼ ξk ( |y, θ) ≈ ξk ( |y, θ) = K [{ k −ρk (ηk (zb ), ηk (y))}/hk ] Bhk b ˆ then used in replacing ξ( |y, θ) with mink ξk ( |y, θ) ABCµ involves acceptance probability ˆ π(θ , ) q(θ , θ)q( , ) mink ξk ( |y, θ ) ˆ π(θ, ) q(θ, θ )q( , ) mink ξk ( |y, θ)
• 103. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABCµ multiple errors [ c Ratmann et al., PNAS, 2009]
• 104. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABCµ for model choice [ c Ratmann et al., PNAS, 2009]
• 105. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupQuestions about ABCµ For each model under comparison, marginal posterior on used to assess the ﬁt of the model (HPD includes 0 or not).
• 106. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupQuestions about ABCµ For each model under comparison, marginal posterior on used to assess the ﬁt of the model (HPD includes 0 or not). Is the data informative about ? [Identiﬁability] How is the prior π( ) impacting the comparison? How is using both ξ( |x0 , θ) and π ( ) compatible with a standard probability model? [remindful of Wilkinson ] Where is the penalisation for complexity in the model comparison? [X, Mengersen Chen, 2010, PNAS]
• 107. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-PRC Another sequential version producing a sequence of Markov (t) (t) transition kernels Kt and of samples (θ1 , . . . , θN ) (1 ≤ t ≤ T )
• 108. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-PRC Another sequential version producing a sequence of Markov (t) (t) transition kernels Kt and of samples (θ1 , . . . , θN ) (1 ≤ t ≤ T ) ABC-PRC Algorithm (t−1) 1. Select θ at random from previous θi ’s with probabilities (t−1) ωi (1 ≤ i ≤ N). 2. Generate (t) (t) θi ∼ Kt (θ|θ ) , x ∼ f (x|θi ) , 3. Check that (x, y ) , otherwise start again. [Sisson et al., 2007]
• 109. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupWhy PRC? Partial rejection control: Resample from a population of weighted particles by pruning away particles with weights below threshold C , replacing them by new particles obtained by propagating an existing particle by an SMC step and modifying the weights accordinly. [Liu, 2001]
• 110. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupWhy PRC? Partial rejection control: Resample from a population of weighted particles by pruning away particles with weights below threshold C , replacing them by new particles obtained by propagating an existing particle by an SMC step and modifying the weights accordinly. [Liu, 2001] PRC justiﬁcation in ABC-PRC: Suppose we then implement the PRC algorithm for some c 0 such that only identically zero weights are smaller than c Trouble is, there is no such c...
• 111. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-PRC weight (t) Probability ωi computed as (t) (t) (t) (t) ωi ∝ π(θi )Lt−1 (θ |θi ){π(θ )Kt (θi |θ )}−1 , where Lt−1 is an arbitrary transition kernel.
• 112. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-PRC weight (t) Probability ωi computed as (t) (t) (t) (t) ωi ∝ π(θi )Lt−1 (θ |θi ){π(θ )Kt (θi |θ )}−1 , where Lt−1 is an arbitrary transition kernel. In case Lt−1 (θ |θ) = Kt (θ|θ ) , all weights are equal under a uniform prior.
• 113. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-PRC weight (t) Probability ωi computed as (t) (t) (t) (t) ωi ∝ π(θi )Lt−1 (θ |θi ){π(θ )Kt (θi |θ )}−1 , where Lt−1 is an arbitrary transition kernel. In case Lt−1 (θ |θ) = Kt (θ|θ ) , all weights are equal under a uniform prior. Inspired from Del Moral et al. (2006), who use backward kernels Lt−1 in SMC to achieve unbiasedness
• 114. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-PRC bias Lack of unbiasedness of the method
• 115. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-PRC bias Lack of unbiasedness of the method Joint density of the accepted pair (θ(t−1) , θ(t) ) proportional to (t−1) (t) (t−1) (t) π(θ |y )Kt (θ |θ )f (y |θ ), For an arbitrary function h(θ), E [ωt h(θ(t) )] proportional to (t) π(θ (t) )Lt−1 (θ (t−1) |θ (t) ) (t−1) (t) (t−1) (t) (t−1) (t) h(θ ) π(θ |y )Kt (θ |θ )f (y |θ )dθ dθ π(θ (t−1) )Kt (θ (t) |θ (t−1) ) (t) π(θ (t) )Lt−1 (θ (t−1) |θ (t) ) (t−1) (t−1) ∝ h(θ ) π(θ )f (y |θ ) π(θ (t−1) )Kt (θ (t) |θ (t−1) ) (t) (t−1) (t) (t−1) (t) × Kt (θ |θ )f (y |θ )dθ dθ (t) (t) (t−1) (t) (t−1) (t−1) (t) ∝ h(θ )π(θ |y ) Lt−1 (θ |θ )f (y |θ )dθ dθ .
• 116. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA mixture example (1) Toy model of Sisson et al. (2007): if θ ∼ U(−10, 10) , x|θ ∼ 0.5 N (θ, 1) + 0.5 N (θ, 1/100) , then the posterior distribution associated with y = 0 is the normal mixture θ|y = 0 ∼ 0.5 N (0, 1) + 0.5 N (0, 1/100) restricted to [−10, 10]. Furthermore, true target available as π(θ||x| ) ∝ Φ( −θ)−Φ(− −θ)+Φ(10( −θ))−Φ(−10( +θ)) .
• 117. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soup“Ugly, squalid graph...” 1.0 1.0 1.0 1.0 1.0 0.8 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.2 0.0 0.0 0.0 0.0 0.0 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3 θ θ θ θ θ 1.0 1.0 1.0 1.0 1.0 0.8 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.2 0.0 0.0 0.0 0.0 0.0 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3 θ θ θ θ θ Comparison of τ = 0.15 and τ = 1/0.15 in Kt
• 118. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA PMC version Use of the same kernel idea as ABC-PRC but with IS correction ( check PMC ) Generate a sample at iteration t by N (t) (t−1) (t−1) πt (θ ˆ )∝ ωj Kt (θ(t) |θj ) j=1 modulo acceptance of the associated xt , and use an importance (t) weight associated with an accepted simulation θi (t) (t) (t) ωi ∝ π(θi ) πt (θi ) . ˆ c Still likelihood free [Beaumont et al., 2009]
• 119. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupOur ABC-PMC algorithm Given a decreasing sequence of approximation levels 1 ≥ ... ≥ T, 1. At iteration t = 1, For i = 1, ..., N (1) (1) Simulate θi ∼ π(θ) and x ∼ f (x|θi ) until (x, y ) 1 (1) Set ωi = 1/N (1) Take τ 2 as twice the empirical variance of the θi ’s 2. At iteration 2 ≤ t ≤ T , For i = 1, ..., N, repeat (t−1) (t−1) Pick θi from the θj ’s with probabilities ωj (t) 2 (t) generate θi |θi ∼ N (θi , σt ) and x ∼ f (x|θi ) until (x, y ) t (t) (t) N (t−1) −1 (t) (t−1) Set ωi ∝ π(θi )/ j=1 ωj ϕ σt θi − θj ) 2 (t) Take τt+1 as twice the weighted empirical variance of the θi ’s
• 120. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupSequential Monte Carlo SMC is a simulation technique to approximate a sequence of related probability distributions πn with π0 “easy” and πT as target. Iterated IS as PMC: particles moved from time n to time n via kernel Kn and use of a sequence of extended targets πn˜ n πn (z0:n ) = πn (zn ) ˜ Lj (zj+1 , zj ) j=0 where the Lj ’s are backward Markov kernels [check that πn (zn ) is a marginal] [Del Moral, Doucet Jasra, Series B, 2006]
• 121. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupSequential Monte Carlo (2) Algorithm 3 SMC sampler (0) sample zi ∼ γ0 (x) (i = 1, . . . , N) (0) (0) (0) compute weights wi = π0 (zi )/γ0 (zi ) for t = 1 to N do if ESS(w (t−1) ) NT then resample N particles z (t−1) and set weights to 1 end if (t−1) (t−1) generate zi ∼ Kt (zi , ·) and set weights to (t) (t) (t−1) (t) (t−1) πt (zi ))Lt−1 (zi ), zi )) wi = wi−1 (t−1) (t−1) (t) πt−1 (zi ))Kt (zi ), zi )) end for
• 122. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-SMC [Del Moral, Doucet Jasra, 2009] True derivation of an SMC-ABC algorithm Use of a kernel Kn associated with target π n and derivation of the backward kernel π n (z )Kn (z , z) Ln−1 (z, z ) = πn (z) Update of the weights M m m=1 IA n (xin ) win ∝ wi(n−1) M m (xi(n−1) ) m=1 IA n−1 m when xin ∼ K (xi(n−1) , ·)
• 123. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-SMCM Modiﬁcation: Makes M repeated simulations of the pseudo-data z given the parameter, rather than using a single [M = 1] simulation, leading to weight that is proportional to the number of accepted zi s M 1 ω(θ) = Iρ(η(y),η(zi )) M i=1 [limit in M means exact simulation from (tempered) target]
• 124. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupProperties of ABC-SMC The ABC-SMC method properly uses a backward kernel L(z, z ) to simplify the importance weight and to remove the dependence on the unknown likelihood from this weight. Update of importance weights is reduced to the ratio of the proportions of surviving particles Major assumption: the forward kernel K is supposed to be invariant against the true target [tempered version of the true posterior]
• 125. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupProperties of ABC-SMC The ABC-SMC method properly uses a backward kernel L(z, z ) to simplify the importance weight and to remove the dependence on the unknown likelihood from this weight. Update of importance weights is reduced to the ratio of the proportions of surviving particles Major assumption: the forward kernel K is supposed to be invariant against the true target [tempered version of the true posterior] Adaptivity in ABC-SMC algorithm only found in on-line construction of the thresholds t , slowly enough to keep a large number of accepted transitions
• 126. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA mixture example (2) Recovery of the target, whether using a ﬁxed standard deviation of τ = 0.15 or τ = 1/0.15, or a sequence of adaptive τt ’s. 1.0 1.0 1.0 1.0 1.0 0.8 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.2 0.0 0.0 0.0 0.0 0.0 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 θ θ θ θ θ
• 127. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupYet another ABC-PRC Another version of ABC-PRC called PRC-ABC with a proposal distribution Mt , S replications of the pseudo-data, and a PRC step: PRC-ABC Algorithm (t−1) (t−1) 1. Select θ at random from the θi ’s with probabilities ωi (t) iid (t−1) 2. Generate θi ∼ Kt , x1 , . . . , xS ∼ f (x|θi ), set (t) (t) ωi = Kt {η(xs ) − η(y)} Kt (θi ) , s (t) and accept with probability p (i) = ωi /ct,N (t) (t) 3. Set ωi ∝ ωi /p (i) (1 ≤ i ≤ N) [Peters, Sisson Fan, Stat Computing, 2012]
• 128. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupYet another ABC-PRC Another version of ABC-PRC called PRC-ABC with a proposal distribution Mt , S replications of the pseudo-data, and a PRC step: (t−1) (t−1) Use of a NP estimate Mt (θ) = ωi ψ(θ|θi ) (with a ﬁxed variance?!) S = 1 recommended for “greatest gains in sampler performance” ct,N upper quantile of non-zero weights at time t
• 129. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupWilkinson’s exact BC ABC approximation error (i.e. non-zero tolerance) replaced with exact simulation from a controlled approximation to the target, convolution of true posterior with kernel function π(θ)f (z|θ)K (y − z) π (θ, z|y) = , π(θ)f (z|θ)K (y − z)dzdθ with K kernel parameterised by bandwidth . [Wilkinson, 2008]
• 130. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupWilkinson’s exact BC ABC approximation error (i.e. non-zero tolerance) replaced with exact simulation from a controlled approximation to the target, convolution of true posterior with kernel function π(θ)f (z|θ)K (y − z) π (θ, z|y) = , π(θ)f (z|θ)K (y − z)dzdθ with K kernel parameterised by bandwidth . [Wilkinson, 2008] Theorem The ABC algorithm based on the assumption of a randomised observation y = y + ξ, ξ ∼ K , and an acceptance probability of ˜ K (y − z)/M gives draws from the posterior distribution π(θ|y).
• 131. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupHow exact a BC? “Using to represent measurement error is straightforward, whereas using to model the model discrepancy is harder to conceptualize and not as commonly used” [Richard Wilkinson, 2008]
• 132. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupHow exact a BC? Pros Pseudo-data from true model and observed data from noisy model Interesting perspective in that outcome is completely controlled Link with ABCµ and assuming y is observed with a measurement error with density K Relates to the theory of model approximation [Kennedy O’Hagan, 2001] Cons Requires K to be bounded by M True approximation error never assessed Requires a modiﬁcation of the standard ABC algorithm
• 133. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC for HMMs Speciﬁc case of a hidden Markov model Xt+1 ∼ Qθ (Xt , ·) Yt+1 ∼ gθ (·|xt ) 0 where only y1:n is observed. [Dean, Singh, Jasra, Peters, 2011]
• 134. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC for HMMs Speciﬁc case of a hidden Markov model Xt+1 ∼ Qθ (Xt , ·) Yt+1 ∼ gθ (·|xt ) 0 where only y1:n is observed. [Dean, Singh, Jasra, Peters, 2011] Use of speciﬁc constraints, adapted to the Markov structure: 0 0 y1 ∈ B(y1 , ) × · · · × yn ∈ B(yn , )
• 135. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-MLE for HMMs ABC-MLE deﬁned by ˆ 0 0 θn = arg max Pθ Y1 ∈ B(y1 , ), . . . , Yn ∈ B(yn , ) θ
• 136. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-MLE for HMMs ABC-MLE deﬁned by ˆ 0 0 θn = arg max Pθ Y1 ∈ B(y1 , ), . . . , Yn ∈ B(yn , ) θ Exact MLE for the likelihood same basis as Wilkinson! 0 pθ (y1 , . . . , yn ) corresponding to the perturbed process (xt , yt + zt )1≤t≤n zt ∼ U(B(0, 1) [Dean, Singh, Jasra, Peters, 2011]
• 137. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-MLE is biased ABC-MLE is asymptotically (in n) biased with target l (θ) = Eθ∗ [log pθ (Y1 |Y−∞:0 )]
• 138. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupABC-MLE is biased ABC-MLE is asymptotically (in n) biased with target l (θ) = Eθ∗ [log pθ (Y1 |Y−∞:0 )] but ABC-MLE converges to true value in the sense l n (θn ) → l (θ) for all sequences (θn ) converging to θ and n
• 139. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupNoisy ABC-MLE Idea: Modify instead the data from the start 0 (y1 + ζ1 , . . . , yn + ζn ) [ see Fearnhead-Prangle ] noisy ABC-MLE estimate 0 0 arg max Pθ Y1 ∈ B(y1 + ζ1 , ), . . . , Yn ∈ B(yn + ζn , ) θ [Dean, Singh, Jasra, Peters, 2011]
• 140. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupConsistent noisy ABC-MLE Degrading the data improves the estimation performances: Noisy ABC-MLE is asymptotically (in n) consistent under further assumptions, the noisy ABC-MLE is asymptotically normal increase in variance of order −2 likely degradation in precision or computing time due to the lack of summary statistic [curse of dimensionality]
• 141. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupSMC for ABC likelihood Algorithm 4 SMC ABC for HMMs Given θ for k = 1, . . . , n do 1 1 N N generate proposals (xk , yk ), . . . , (xk , yk ) from the model weigh each proposal with ωk l =I l B(yk + ζk , ) (yk ) 0 l renormalise the weights and sample the xk ’s accordingly end for approximate the likelihood by n N l ωk N k=1 l=1 [Jasra, Singh, Martin, McCoy, 2010]
• 142. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupWhich summary? Fundamental diﬃculty of the choice of the summary statistic when there is no non-trivial suﬃcient statistics [except when done by the experimenters in the ﬁeld]
• 143. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupWhich summary? Fundamental diﬃculty of the choice of the summary statistic when there is no non-trivial suﬃcient statistics [except when done by the experimenters in the ﬁeld] Starting from a large collection of summary statistics is available, Joyce and Marjoram (2008) consider the sequential inclusion into the ABC target, with a stopping rule based on a likelihood ratio test
• 144. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupWhich summary? Fundamental diﬃculty of the choice of the summary statistic when there is no non-trivial suﬃcient statistics [except when done by the experimenters in the ﬁeld] Starting from a large collection of summary statistics is available, Joyce and Marjoram (2008) consider the sequential inclusion into the ABC target, with a stopping rule based on a likelihood ratio test Not taking into account the sequential nature of the tests Depends on parameterisation Order of inclusion matters
• 145. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Alphabet soupA connected Monte Carlo study of the pseudo-data per simulated parameter Repeating simulations does not improve approximation Tolerance level does not seem to be highly inﬂuential Choice of distance / summary statistics / calibration factors are paramount to successful approximation ABC-SMC outperforms ABC-MCMC [Mckinley, Cook, Deardon, 2009]
• 146. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionSemi-automatic ABC Fearnhead and Prangle (2010) study ABC and the selection of the summary statistic in close proximity to Wilkinson’s proposal ABC then considered from a purely inferential viewpoint and calibrated for estimation purposes Use of a randomised (or ‘noisy’) version of the summary statistics η (y) = η(y) + τ ˜ Derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic
• 147. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionSemi-automatic ABC Fearnhead and Prangle (2010) study ABC and the selection of the summary statistic in close proximity to Wilkinson’s proposal ABC then considered from a purely inferential viewpoint and calibrated for estimation purposes Use of a randomised (or ‘noisy’) version of the summary statistics η (y) = η(y) + τ ˜ Derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic [calibration constraint: ABC approximation with same posterior mean as the true randomised posterior]
• 148. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionSummary statistics Optimality of the posterior expectation E[θ|y] of the parameter of interest as summary statistics η(y)!
• 149. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionSummary statistics Optimality of the posterior expectation E[θ|y] of the parameter of interest as summary statistics η(y)! Use of the standard quadratic loss function (θ − θ0 )T A(θ − θ0 ) .
• 150. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionDetails on Fearnhead and Prangle (FP) ABC Use of a summary statistic S(·), an importance proposal g (·), a kernel K (·) ≤ 1 and a bandwidth h 0 such that (θ, ysim ) ∼ g (θ)f (ysim |θ) is accepted with probability (hence the bound) K [{S(ysim ) − sobs }/h] and the corresponding importance weight deﬁned by π(θ) g (θ) [Fearnhead Prangle, 2012]
• 151. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionErrors, errors, and errors Three levels of approximation π(θ|yobs ) by π(θ|sobs ) loss of information [ignored] π(θ|sobs ) by π(s)K [{s − sobs }/h]π(θ|s) ds πABC (θ|sobs ) = π(s)K [{s − sobs }/h] ds noisy observations πABC (θ|sobs ) by importance Monte Carlo based on N simulations, represented by var(a(θ)|sobs )/Nacc [expected number of acceptances] [M. Twain/B. Disraeli]
• 152. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionAverage acceptance asymptotics For the average acceptance probability/approximate likelihood p(θ|sobs ) = f (ysim |θ) K [{S(ysim ) − sobs }/h] dysim , overall acceptance probability p(sobs ) = p(θ|sobs ) π(θ) dθ = π(sobs )hd + o(hd ) [FP, Lemma 1]
• 153. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionOptimal importance proposal Best choice of importance proposal in terms of eﬀective sample size g (θ|sobs ) ∝ π(θ)p(θ|sobs )1/2 [Not particularly useful in practice]
• 154. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionOptimal importance proposal Best choice of importance proposal in terms of eﬀective sample size g (θ|sobs ) ∝ π(θ)p(θ|sobs )1/2 [Not particularly useful in practice] note that p(θ|sobs ) is an approximate likelihood reminiscent of parallel tempering could be approximately achieved by attrition of half of the data
• 155. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionCalibration of h “This result gives insight into how S(·) and h aﬀect the Monte Carlo error. To minimize Monte Carlo error, we need hd to be not too small. Thus ideally we want S(·) to be a low dimensional summary of the data that is suﬃciently informative about θ that π(θ|sobs ) is close, in some sense, to π(θ|yobs )” (FP, p.5) turns h into an absolute value while it should be context-dependent and user-calibrated only addresses one term in the approximation error and acceptance probability (“curse of dimensionality”) h large prevents πABC (θ|sobs ) to be close to π(θ|sobs ) d small prevents π(θ|sobs ) to be close to π(θ|yobs ) (“curse of [dis]information”)
• 156. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionCalibrating ABC “If πABC is calibrated, then this means that probability statements that are derived from it are appropriate, and in particular that we can use πABC to quantify uncertainty in estimates” (FP, p.5)
• 157. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionCalibrating ABC “If πABC is calibrated, then this means that probability statements that are derived from it are appropriate, and in particular that we can use πABC to quantify uncertainty in estimates” (FP, p.5) Deﬁnition For 0 q 1 and subset A, event Eq (A) made of sobs such that PrABC (θ ∈ A|sobs ) = q. Then ABC is calibrated if Pr(θ ∈ A|Eq (A)) = q unclear meaning of conditioning on Eq (A)
• 158. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionCalibrated ABC Theorem (FP) Noisy ABC, where sobs = S(yobs ) + h , ∼ K (·) is calibrated [Wilkinson, 2008] no condition on h!!
• 159. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionCalibrated ABC Consequence: when h = ∞ Theorem (FP) The prior distribution is always calibrated is this a relevant property then?
• 160. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionMore about calibrated ABC “Calibration is not universally accepted by Bayesians. It is even more questionable here as we care how statements we make relate to the real world, not to a mathematically deﬁned posterior.” R. Wilkinson Same reluctance about the prior being calibrated Property depending on prior, likelihood, and summary Calibration is a frequentist property (almost a p-value!) More sensible to account for the simulator’s imperfections than using noisy-ABC against a meaningless based measure [Wilkinson, 2012]
• 161. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionConverging ABC Theorem (FP) For noisy ABC, the expected noisy-ABC log-likelihood, E {log[p(θ|sobs )]} = log[p(θ|S(yobs ) + )]π(yobs |θ0 )K ( )dyobs d , has its maximum at θ = θ0 . True for any choice of summary statistic? even ancilary statistics?! [Imposes at least identiﬁability...] Relevant in asymptotia and not for the data
• 162. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionConverging ABC Corollary For noisy ABC, the ABC posterior converges onto a point mass on the true parameter value as m → ∞. For standard ABC, not always the case (unless h goes to zero). Strength of regularity conditions (c1) and (c2) in Bernardo Smith, 1994? [out-of-reach constraints on likelihood and posterior] Again, there must be conditions imposed upon summary statistics...
• 163. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionLoss motivated statistic Under quadratic loss function, Theorem (FP) ˆ (i) The minimal posterior error E[L(θ, θ)|yobs ] occurs when ˆ = E(θ|yobs ) (!) θ (ii) When h → 0, EABC (θ|sobs ) converges to E(θ|yobs ) ˆ (iii) If S(yobs ) = E[θ|yobs ] then for θ = EABC [θ|sobs ] ˆ E[L(θ, θ)|yobs ] = trace(AΣ) + h2 xT AxK (x)dx + o(h2 ). measure-theoretic diﬃculties? dependence of sobs on h makes me uncomfortable inherent to noisy ABC Relevant for choice of K ?
• 164. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionOptimal summary statistic “We take a diﬀerent approach, and weaken the requirement for πABC to be a good approximation to π(θ|yobs ). We argue for πABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters.” (FP, p.5) From this result, FP derive their choice of summary statistic, S(y) = E(θ|y) [almost suﬃcient] suggest h = O(N −1/(2+d) ) and h = O(N −1/(4+d) ) as optimal bandwidths for noisy and standard ABC.
• 165. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionOptimal summary statistic “We take a diﬀerent approach, and weaken the requirement for πABC to be a good approximation to π(θ|yobs ). We argue for πABC to be a good approximation solely in terms of the accuracy of certain estimates of the parameters.” (FP, p.5) From this result, FP derive their choice of summary statistic, S(y) = E(θ|y) [wow! EABC [θ|S(yobs )] = E[θ|yobs ]] suggest h = O(N −1/(2+d) ) and h = O(N −1/(4+d) ) as optimal bandwidths for noisy and standard ABC.
• 166. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionCaveat Since E(θ|yobs ) is most usually unavailable, FP suggest (i) use a pilot run of ABC to determine a region of non-negligible posterior mass; (ii) simulate sets of parameter values and data; (iii) use the simulated sets of parameter values and data to estimate the summary statistic; and (iv) run ABC with this choice of summary statistic.
• 167. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionCaveat Since E(θ|yobs ) is most usually unavailable, FP suggest (i) use a pilot run of ABC to determine a region of non-negligible posterior mass; (ii) simulate sets of parameter values and data; (iii) use the simulated sets of parameter values and data to estimate the summary statistic; and (iv) run ABC with this choice of summary statistic. where is the assessment of the ﬁrst stage error?
• 168. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionApproximating the summary statistic As Beaumont et al. (2002) and Blum and Fran¸ois (2010), FP c use a linear regression to approximate E(θ|yobs ): (i) θi = β0 + β (i) f (yobs ) + i with f made of standard transforms [Further justiﬁcations?]
• 169. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selection[my]questions about semi-automatic ABC dependence on h and S(·) in the early stage reduction of Bayesian inference to point estimation approximation error in step (i) not accounted for not parameterisation invariant practice shows that proper approximation to genuine posterior distributions stems from using a (much) larger number of summary statistics than the dimension of the parameter the validity of the approximation to the optimal summary statistic depends on the quality of the pilot run important inferential issues like model choice are not covered by this approach. [Robert, 2012]
• 170. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionMore about semi-automatic ABC [End of section derived from comments on Read Paper, to appear] “The apparently arbitrary nature of the choice of summary statistics has always been perceived as the Achilles heel of ABC.” M. Beaumont
• 171. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionMore about semi-automatic ABC [End of section derived from comments on Read Paper, to appear] “The apparently arbitrary nature of the choice of summary statistics has always been perceived as the Achilles heel of ABC.” M. Beaumont “Curse of dimensionality” linked with the increase of the dimension of the summary statistic Connection with principal component analysis [Itan et al., 2010] Connection with partial least squares [Wegman et al., 2009] Beaumont et al. (2002) postprocessed output is used as input by FP to run a second ABC
• 172. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionWood’s alternative Instead of a non-parametric kernel approximation to the likelihood 1 K {η(yr ) − η(yobs )} R r Wood (2010) suggests a normal approximation η(y(θ)) ∼ Nd (µθ , Σθ ) whose parameters can be approximated based on the R simulations (for each value of θ).
• 173. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionWood’s alternative Instead of a non-parametric kernel approximation to the likelihood 1 K {η(yr ) − η(yobs )} R r Wood (2010) suggests a normal approximation η(y(θ)) ∼ Nd (µθ , Σθ ) whose parameters can be approximated based on the R simulations (for each value of θ). Parametric versus non-parametric rate [Uh?!] Automatic weighting of components of η(·) through Σθ Dependence on normality assumption (pseudo-likelihood?) [Cornebise, Girolami Kosmidis, 2012]
• 174. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionReinterpretation and extensions Reinterpretation of ABC output as joint simulation from π (x, y |θ) = f (x|θ)¯Y |X (y |x) ¯ π where πY |X (y |x) = K (y − x) ¯
• 175. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionReinterpretation and extensions Reinterpretation of ABC output as joint simulation from π (x, y |θ) = f (x|θ)¯Y |X (y |x) ¯ π where πY |X (y |x) = K (y − x) ¯ Reinterpretation of noisy ABC if y |y obs ∼ πY |X (·|y obs ), then marginally ¯ ¯ y ∼ πY |θ (·|θ0 ) ¯ ¯ c Explain for the consistency of Bayesian inference based on y and π ¯ ¯ [Lee, Andrieu Doucet, 2012]
• 176. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionReinterpretation and extensions Also ﬁts within the pseudo-marginal of Andrieu and Roberts (2009) Algorithm 5 Rejuvenating GIMH-ABC At time t, given θt = θ: Sample θ ∼ g (·|θ). ⊗N Sample z1:N ∼ πY |Θ (·|θ ). ⊗(N−1) Sample x1:N−1 ∼ πY |Θ (·|θ). With probability N π(θ )g (θ|θ ) i=1 IBh (zi ) (y ) min 1, × N−1 π(θ)g (θ |θ) 1+ i=1 IBh (xi ) (y ) set θt+1 = θ . Otherwise set θt+1 = θ.
• 177. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionReinterpretation and extensions Also ﬁts within the pseudo-marginal of Andrieu and Roberts (2009) Algorithm 6 One-hit MCMC-ABC At time t, with θt = θ: Sample θ ∼ g (·|θ). With probability 1 − min 1, π(θ )g(θ |θ)) , set θt+1 = θ and go to π(θ)g (θ|θ time t + 1. Sample zi ∼ πY |Θ (·|θ ) and xi ∼ πY |Θ (·|θ) for i = 1, . . . until y ∈ Bh (zi ) and/or y ∈ Bh (xi ). If y ∈ Bh (zi ) set θt+1 = θ and go to time t + 1. If y ∈ Bh (xi ) set θt+1 = θ and go to time t + 1. [Lee, Andrieu Doucet, 2012]
• 178. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionReinterpretation and extensions Also ﬁts within the pseudo-marginal of Andrieu and Roberts (2009) Validation c The invariant distribution of θ is πθ|Y (·|y ) ¯ for both algorithms. [Lee, Andrieu Doucet, 2012]
• 179. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionABC for Markov chains Rewriting the posterior as π(θ)1−n π(θ|x1 ) π(θ|xt−1 , xt ) where π(θ|xt−1 , xt ) ∝ f (xt |xt−1 , θ)π(θ)
• 180. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionABC for Markov chains Rewriting the posterior as π(θ)1−n π(θ|x1 ) π(θ|xt−1 , xt ) where π(θ|xt−1 , xt ) ∝ f (xt |xt−1 , θ)π(θ) Allows for a stepwise ABC, replacing each π(θ|xt−1 , xt ) by an ABC approximation Similarity with FP’s multiple sources of data (and also with Dean et al., 2011 ) [White et al., 2010, 2012]
• 181. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionBack to suﬃciency Diﬀerence between regular suﬃciency, equivalent to π(θ|y) = π(θ|η(y)) for all θ’s and all priors π, and
• 182. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionBack to suﬃciency Diﬀerence between regular suﬃciency, equivalent to π(θ|y) = π(θ|η(y)) for all θ’s and all priors π, and marginal suﬃciency, stated as π(µ(θ)|y) = π(µ(θ)|η(y)) for all θ’s, the given prior π and a subvector µ(θ) [Basu, 1977]
• 183. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionBack to suﬃciency Diﬀerence between regular suﬃciency, equivalent to π(θ|y) = π(θ|η(y)) for all θ’s and all priors π, and marginal suﬃciency, stated as π(µ(θ)|y) = π(µ(θ)|η(y)) for all θ’s, the given prior π and a subvector µ(θ) [Basu, 1977] Relates to F P’s main result, but could event be reduced to conditional suﬃciency π(µ(θ)|yobs ) = π(µ(θ)|η(yobs )) (if feasible at all...) [Dawson, 2012]
• 184. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionABC and BCa’s Arguments in favour of a comparison of ABC with bootstrap in the event π(θ) is not deﬁned from prior information but as conventional non-informative prior. [Efron, 2012]
• 185. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionPredictive performances Instead of posterior means, other aspects of posterior to explore. E.g., look at minimising loss of information p(θ, y) p(θ, η(y)) p(θ, y) log dθdy − p(θ, η(y)) log dθdη(y) p(θ)p(y) p(θ)p(η(y)) for selection of summary statistics. [Filippi, Barnes, Stumpf, 2012]
• 186. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionAuxiliary variables Auxiliary variable method avoids computations of untractable constant in likelihood ˜ f (y|θ) = Zθ f (y|θ)
• 187. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionAuxiliary variables Auxiliary variable method avoids computations of untractable constant in likelihood ˜ f (y|θ) = Zθ f (y|θ) Introduce pseudo-data z with artiﬁcial target g (z|θ, y) Generate θ ∼ K (θ, θ ) and z ∼ f (z|θ ) Accept with probability π(θ )f (y|θ )g (z |θ , y) K (θ , θ)f (z|θ) ∧1 π(θ)f (y|θ)g (z|θ, y) K (θ, θ )f (z |θ ) [Møller, Pettitt, Berthelsen, Reeves, 2006]
• 188. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionAuxiliary variables Auxiliary variable method avoids computations of untractable constant in likelihood ˜ f (y|θ) = Zθ f (y|θ) Introduce pseudo-data z with artiﬁcial target g (z|θ, y) Generate θ ∼ K (θ, θ ) and z ∼ f (z|θ ) Accept with probability ˜ ˜ π(θ )f (y|θ )g (z |θ , y) K (θ , θ)f (z|θ) ∧1 ˜ ˜ π(θ)f (y|θ)g (z|θ, y) K (θ, θ )f (z |θ ) [Møller, Pettitt, Berthelsen, Reeves, 2006]
• 189. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionAuxiliary variables Auxiliary variable method avoids computations of untractable constant in likelihood ˜ f (y|θ) = Zθ f (y|θ) Introduce pseudo-data z with artiﬁcial target g (z|θ, y) Generate θ ∼ K (θ, θ ) and z ∼ f (z|θ ) For Gibbs random ﬁelds , existence of a genuine suﬃcient statistic η(y). [Møller, Pettitt, Berthelsen, Reeves, 2006]
• 190. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionAuxiliary variables and ABC Special case of ABC when g (z|θ, y) = K (η(z) − η(y)) ˜ ˜ ˜ ˜ f (y|θ )f (z|θ)/f (y|θ)f (z |θ ) replaced by one [or not?!]
• 191. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionAuxiliary variables and ABC Special case of ABC when g (z|θ, y) = K (η(z) − η(y)) ˜ ˜ ˜ ˜ f (y|θ )f (z|θ)/f (y|θ)f (z |θ ) replaced by one [or not?!] Consequences likelihood-free (ABC) versus constant-free (AVM) in ABC, K (·) should be allowed to depend on θ for Gibbs random ﬁelds, the auxiliary approach should be prefered to ABC [Møller, 2012]
• 192. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionABC and BIC Idea of applying BIC during the local regression : Run regular ABC Select summary statistics during local regression Recycle the prior simulation sample (reference table) with those summary statistics Rerun the corresponding local regression (low cost) [Pudlo Sedki, 2012]
• 193. MCMC and likelihood-free methods Part/day II: ABC Approximate Bayesian computation Automated summary statistic selectionA Brave New World?!
• 194. MCMC and likelihood-free methods Part/day II: ABC ABC for model choiceABC for model choice simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation ABC for model choice ABC model choice consistency
• 195. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Model choiceBayesian model choice Several models M1 , M2 , . . . are considered simultaneously for a dataset y and the model index M is part of the inference. Use of a prior distribution. π(M = m), plus a prior distribution on the parameter conditional on the value m of the model index, πm (θ m ) Goal is to derive the posterior distribution of M, challenging computational target when models are complex.
• 196. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Model choiceGeneric ABC for model choice Algorithm 7 Likelihood-free model choice sampler (ABC-MC) for t = 1 to T do repeat Generate m from the prior π(M = m) Generate θ m from the prior πm (θ m ) Generate z from the model fm (z|θ m ) until ρ{η(z), η(y)} Set m(t) = m and θ (t) = θ m end for
• 197. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Model choiceABC estimates Posterior probability π(M = m|y) approximated by the frequency of acceptances from model m T 1 Im(t) =m . T t=1 Issues with implementation: should tolerances be the same for all models? should summary statistics vary across models (incl. their dimension)? should the distance measure ρ vary as well?
• 198. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Model choiceABC estimates Posterior probability π(M = m|y) approximated by the frequency of acceptances from model m T 1 Im(t) =m . T t=1 Extension to a weighted polychotomous logistic regression estimate of π(M = m|y), with non-parametric kernel weights [Cornuet et al., DIYABC, 2009]
• 199. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Model choiceThe Great ABC controversy On-going controvery in phylogeographic genetics about the validity of using ABC for testing Against: Templeton, 2008, 2009, 2010a, 2010b, 2010c argues that nested hypotheses cannot have higher probabilities than nesting hypotheses (!)
• 200. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Model choiceThe Great ABC controversy On-going controvery in phylogeographic genetics about the validity of using ABC for testing Replies: Fagundes et al., 2008, Against: Templeton, 2008, Beaumont et al., 2010, Berger et 2009, 2010a, 2010b, 2010c al., 2010, Csill`ry et al., 2010 e argues that nested hypotheses point out that the criticisms are cannot have higher probabilities addressed at [Bayesian] than nesting hypotheses (!) model-based inference and have nothing to do with ABC...
• 201. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsGibbs random ﬁelds Gibbs distribution The rv y = (y1 , . . . , yn ) is a Gibbs random ﬁeld associated with the graph G if 1 f (y) = exp − Vc (yc ) , Z c∈C where Z is the normalising constant, C is the set of cliques of G and Vc is any function also called potential suﬃcient statistic U(y) = c∈C Vc (yc ) is the energy function
• 202. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsGibbs random ﬁelds Gibbs distribution The rv y = (y1 , . . . , yn ) is a Gibbs random ﬁeld associated with the graph G if 1 f (y) = exp − Vc (yc ) , Z c∈C where Z is the normalising constant, C is the set of cliques of G and Vc is any function also called potential suﬃcient statistic U(y) = c∈C Vc (yc ) is the energy function c Z is usually unavailable in closed form
• 203. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsPotts model Potts model Vc (y) is of the form Vc (y) = θS(y) = θ δyl =yi l∼i where l∼i denotes a neighbourhood structure
• 204. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsPotts model Potts model Vc (y) is of the form Vc (y) = θS(y) = θ δyl =yi l∼i where l∼i denotes a neighbourhood structure In most realistic settings, summation Zθ = exp{θ T S(x)} x∈X involves too many terms to be manageable and numerical approximations cannot always be trusted [Cucala, Marin, CPR Titterington, 2009]
• 205. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsBayesian Model Choice Comparing a model with potential S0 taking values in Rp0 versus a model with potential S1 taking values in Rp1 can be done through the Bayes factor corresponding to the priors π0 and π1 on each parameter space exp{θ T S0 (x)}/Zθ 0 ,0 π0 (dθ 0 ) 0 Bm0 /m1 (x) = exp{θ T S1 (x)}/Zθ 1 ,1 π1 (dθ 1 ) 1
• 206. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsBayesian Model Choice Comparing a model with potential S0 taking values in Rp0 versus a model with potential S1 taking values in Rp1 can be done through the Bayes factor corresponding to the priors π0 and π1 on each parameter space exp{θ T S0 (x)}/Zθ 0 ,0 π0 (dθ 0 ) 0 Bm0 /m1 (x) = exp{θ T S1 (x)}/Zθ 1 ,1 π1 (dθ 1 ) 1 Use of Jeﬀreys’ scale to select most appropriate model
• 207. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsNeighbourhood relations Choice to be made between M neighbourhood relations m i ∼i (0 ≤ m ≤ M − 1) with Sm (x) = I{xi =xi } m i ∼i driven by the posterior probabilities of the models.
• 208. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsModel index Formalisation via a model index M that appears as a new parameter with prior distribution π(M = m) and π(θ|M = m) = πm (θm )
• 209. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsModel index Formalisation via a model index M that appears as a new parameter with prior distribution π(M = m) and π(θ|M = m) = πm (θm ) Computational target: P(M = m|x) ∝ fm (x|θm )πm (θm ) dθm π(M = m) , Θm
• 210. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsSuﬃcient statistics By deﬁnition, if S(x) suﬃcient statistic for the joint parameters (M, θ0 , . . . , θM−1 ), P(M = m|x) = P(M = m|S(x)) .
• 211. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsSuﬃcient statistics By deﬁnition, if S(x) suﬃcient statistic for the joint parameters (M, θ0 , . . . , θM−1 ), P(M = m|x) = P(M = m|S(x)) . For each model m, own suﬃcient statistic Sm (·) and S(·) = (S0 (·), . . . , SM−1 (·)) also suﬃcient.
• 212. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsSuﬃcient statistics in Gibbs random ﬁelds For Gibbs random ﬁelds, 1 2 x|M = m ∼ fm (x|θm ) = fm (x|S(x))fm (S(x)|θm ) 1 = f 2 (S(x)|θm ) n(S(x)) m where n(S(x)) = {˜ ∈ X : S(˜) = S(x)} x x c S(x) is therefore also suﬃcient for the joint parameters [Speciﬁc to Gibbs random ﬁelds!]
• 213. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsABC model choice Algorithm ABC-MC Generate m∗ from the prior π(M = m). ∗ Generate θm∗ from the prior πm∗ (·). Generate x ∗ from the model fm∗ (·|θm∗ ). ∗ Compute the distance ρ(S(x0 ), S(x∗ )). Accept (θm∗ , m∗ ) if ρ(S(x0 ), S(x∗ )) . ∗ Note When = 0 the algorithm is exact
• 214. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsABC approximation to the Bayes factor Frequency ratio: ˆ P(M = m0 |x0 ) π(M = m1 ) BF m0 /m1 (x0 ) = × ˆ P(M = m1 |x0 ) π(M = m0 ) {mi∗ = m0 } π(M = m1 ) = × , {mi∗ = m1 } π(M = m0 )
• 215. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsABC approximation to the Bayes factor Frequency ratio: ˆ P(M = m0 |x0 ) π(M = m1 ) BF m0 /m1 (x0 ) = × ˆ P(M = m1 |x0 ) π(M = m0 ) {mi∗ = m0 } π(M = m1 ) = × , {mi∗ = m1 } π(M = m0 ) replaced with 1 + {mi∗ = m0 } π(M = m1 ) BF m0 /m1 (x0 ) = × 1 + {mi∗ = m1 } π(M = m0 ) to avoid indeterminacy (also Bayes estimate).
• 216. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsToy example iid Bernoulli model versus two-state ﬁrst-order Markov chain, i.e. n f0 (x|θ0 ) = exp θ0 I{xi =1} {1 + exp(θ0 )}n , i=1 versus n 1 f1 (x|θ1 ) = exp θ1 I{xi =xi−1 } {1 + exp(θ1 )}n−1 , 2 i=2 with priors θ0 ∼ U(−5, 5) and θ1 ∼ U(0, 6) (inspired by “phase transition” boundaries).
• 217. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Gibbs random ﬁeldsToy example (2) 10 5 5 BF01 BF01 0 ^ ^ 0 −5 −5 −40 −20 0 10 −10 −40 −20 0 10 BF01 BF01 (left) Comparison of the true BF m0 /m1 (x0 ) with BF m0 /m1 (x0 ) (in logs) over 2, 000 simulations and 4.106 proposals from the prior. (right) Same when using tolerance corresponding to the 1% quantile on the distances.
• 218. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceBack to suﬃciency ‘Suﬃcient statistics for individual models are unlikely to be very informative for the model probability.’ [Scott Sisson, Jan. 31, 2011, X.’Og]
• 219. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceBack to suﬃciency ‘Suﬃcient statistics for individual models are unlikely to be very informative for the model probability.’ [Scott Sisson, Jan. 31, 2011, X.’Og] If η1 (x) suﬃcient statistic for model m = 1 and parameter θ1 and η2 (x) suﬃcient statistic for model m = 2 and parameter θ2 , (η1 (x), η2 (x)) is not always suﬃcient for (m, θm )
• 220. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceBack to suﬃciency ‘Suﬃcient statistics for individual models are unlikely to be very informative for the model probability.’ [Scott Sisson, Jan. 31, 2011, X.’Og] If η1 (x) suﬃcient statistic for model m = 1 and parameter θ1 and η2 (x) suﬃcient statistic for model m = 2 and parameter θ2 , (η1 (x), η2 (x)) is not always suﬃcient for (m, θm ) c Potential loss of information at the testing level
• 221. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceLimiting behaviour of B12 (T → ∞) ABC approximation T t=1 Imt =1 Iρ{η(zt ),η(y)}≤ B12 (y) = T , t=1 Imt =2 Iρ{η(zt ),η(y)}≤ where the (mt , z t )’s are simulated from the (joint) prior
• 222. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceLimiting behaviour of B12 (T → ∞) ABC approximation T t=1 Imt =1 Iρ{η(zt ),η(y)}≤ B12 (y) = T , t=1 Imt =2 Iρ{η(zt ),η(y)}≤ where the (mt , z t )’s are simulated from the (joint) prior As T go to inﬁnity, limit Iρ{η(z),η(y)}≤ π1 (θ 1 )f1 (z|θ 1 ) dz dθ 1 B12 (y) = Iρ{η(z),η(y)}≤ π2 (θ 2 )f2 (z|θ 2 ) dz dθ 2 Iρ{η,η(y)}≤ π1 (θ 1 )f1η (η|θ 1 ) dη dθ 1 = , Iρ{η,η(y)}≤ π2 (θ 2 )f2η (η|θ 2 ) dη dθ 2 where f1η (η|θ 1 ) and f2η (η|θ 2 ) distributions of η(z)
• 223. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceLimiting behaviour of B12 ( → 0) When goes to zero, η π1 (θ 1 )f1η (η(y)|θ 1 ) dθ 1 B12 (y) = , π2 (θ 2 )f2η (η(y)|θ 2 ) dθ 2
• 224. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceLimiting behaviour of B12 ( → 0) When goes to zero, η π1 (θ 1 )f1η (η(y)|θ 1 ) dθ 1 B12 (y) = , π2 (θ 2 )f2η (η(y)|θ 2 ) dθ 2 c Bayes factor based on the sole observation of η(y)
• 225. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceLimiting behaviour of B12 (under suﬃciency) If η(y) suﬃcient statistic for both models, fi (y|θ i ) = gi (y)fi η (η(y)|θ i ) Thus Θ1 π(θ 1 )g1 (y)f1η (η(y)|θ 1 ) dθ 1 B12 (y) = Θ2 π(θ 2 )g2 (y)f2η (η(y)|θ 2 ) dθ 2 g1 (y) π1 (θ 1 )f1η (η(y)|θ 1 ) dθ 1 g1 (y) η = η = B (y) . g2 (y) π2 (θ 2 )f2 (η(y)|θ 2 ) dθ 2 g2 (y) 12 [Didelot, Everitt, Johansen Lawson, 2011]
• 226. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceLimiting behaviour of B12 (under suﬃciency) If η(y) suﬃcient statistic for both models, fi (y|θ i ) = gi (y)fi η (η(y)|θ i ) Thus Θ1 π(θ 1 )g1 (y)f1η (η(y)|θ 1 ) dθ 1 B12 (y) = Θ2 π(θ 2 )g2 (y)f2η (η(y)|θ 2 ) dθ 2 g1 (y) π1 (θ 1 )f1η (η(y)|θ 1 ) dθ 1 g1 (y) η = η = B (y) . g2 (y) π2 (θ 2 )f2 (η(y)|θ 2 ) dθ 2 g2 (y) 12 [Didelot, Everitt, Johansen Lawson, 2011] c No discrepancy only when cross-model suﬃciency
• 227. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choicePoisson/geometric example Sample x = (x1 , . . . , xn ) from either a Poisson P(λ) or from a geometric G(p) Then n S= yi = η(x) i=1 suﬃcient statistic for either model but not simultaneously Discrepancy ratio g1 (x) S!n−S / i yi ! = g2 (x) 1 n+S−1 S
• 228. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choicePoisson/geometric discrepancy η Range of B12 (x) versus B12 (x) B12 (x): The values produced have nothing in common.
• 229. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceFormal recovery Creating an encompassing exponential family T T f (x|θ1 , θ2 , α1 , α2 ) ∝ exp{θ1 η1 (x) + θ1 η1 (x) + α1 t1 (x) + α2 t2 (x)} leads to a suﬃcient statistic (η1 (x), η2 (x), t1 (x), t2 (x)) [Didelot, Everitt, Johansen Lawson, 2011]
• 230. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceFormal recovery Creating an encompassing exponential family T T f (x|θ1 , θ2 , α1 , α2 ) ∝ exp{θ1 η1 (x) + θ1 η1 (x) + α1 t1 (x) + α2 t2 (x)} leads to a suﬃcient statistic (η1 (x), η2 (x), t1 (x), t2 (x)) [Didelot, Everitt, Johansen Lawson, 2011] In the Poisson/geometric case, if i xi ! is added to S, no discrepancy
• 231. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceFormal recovery Creating an encompassing exponential family T T f (x|θ1 , θ2 , α1 , α2 ) ∝ exp{θ1 η1 (x) + θ1 η1 (x) + α1 t1 (x) + α2 t2 (x)} leads to a suﬃcient statistic (η1 (x), η2 (x), t1 (x), t2 (x)) [Didelot, Everitt, Johansen Lawson, 2011] Only applies in genuine suﬃciency settings... c Inability to evaluate loss brought by summary statistics
• 232. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceMeaning of the ABC-Bayes factor ‘This is also why focus on model discrimination typically (...) proceeds by (...) accepting that the Bayes Factor that one obtains is only derived from the summary statistics and may in no way correspond to that of the full model.’ [Scott Sisson, Jan. 31, 2011, X.’Og]
• 233. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceMeaning of the ABC-Bayes factor ‘This is also why focus on model discrimination typically (...) proceeds by (...) accepting that the Bayes Factor that one obtains is only derived from the summary statistics and may in no way correspond to that of the full model.’ [Scott Sisson, Jan. 31, 2011, X.’Og] In the Poisson/geometric case, if E[yi ] = θ0 0, η (θ0 + 1)2 −θ0 lim B12 (y) = e n→∞ θ0
• 234. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceMA(q) divergence 1.0 1.0 1.0 1.0 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.0 0.0 0.0 0.0 1 2 1 2 1 2 1 2 Evolution [against ] of ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right) when equal to 10, 1, .1, .01% quantiles on insuﬃcient autocovariance distances. Sample of 50 points from a MA(2) with θ1 = 0.6, θ2 = 0.2. True Bayes factor equal to 17.71.
• 235. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceMA(q) divergence 1.0 1.0 1.0 1.0 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.0 0.0 0.0 0.0 1 2 1 2 1 2 1 2 Evolution [against ] of ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right) when equal to 10, 1, .1, .01% quantiles on insuﬃcient autocovariance distances. Sample of 50 points from a MA(1) model with θ1 = 0.6. True Bayes factor B21 equal to .004.
• 236. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceFurther comments ‘There should be the possibility that for the same model, but diﬀerent (non-minimal) [summary] statistics (so ∗ diﬀerent η’s: η1 and η1 ) the ratio of evidences may no longer be equal to one.’ [Michael Stumpf, Jan. 28, 2011, ’Og] Using diﬀerent summary statistics [on diﬀerent models] may indicate the loss of information brought by each set but agreement does not lead to trustworthy approximations.
• 237. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceA population genetics evaluation Population genetics example with 3 populations 2 scenari 15 individuals 5 loci single mutation parameter
• 238. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceA population genetics evaluation Population genetics example with 3 populations 2 scenari 15 individuals 5 loci single mutation parameter 24 summary statistics 2 million ABC proposal importance [tree] sampling alternative
• 239. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceStability of importance sampling 1.0 1.0 1.0 1.0 1.0 q q 0.8 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.6 q 0.4 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.2 0.0 0.0 0.0 0.0 0.0
• 240. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceComparison with ABC Use of 24 summary statistics and DIY-ABC logistic correction 1.0 q q q q q q q q q q q q q q q q qq q q 0.8 q q q q q qq q q q q q q qq q ABC estimates of P(M=1|y) q q q q q q qq qq q qq q q q q q 0.6 q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q qq 0.4 qq q q q q q q q q q 0.2 q q 0.0 0.0 0.2 0.4 0.6 0.8 1.0 Importance Sampling estimates of P(M=1|y)
• 241. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceComparison with ABC Use of 15 summary statistics and DIY-ABC logistic correction 1.0 q q q q qq q qq q qq q qqq qq q qq q q q q qq q q q q q q qq q q q q q 0.8 q q q q qq q q q q q q q q q ABC estimates of P(M=1|y) q q q q q q q q q q q q q q 0.6 q q q qq q q q q q q qq 0.4 q q q q q q q q q q q 0.2 q q q q q q q 0.0 0.0 0.2 0.4 0.6 0.8 1.0 Importance Sampling estimates of P(M=1|y)
• 242. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceComparison with ABC Use of 24 summary statistics and DIY-ABC logistic correction 1.0 q q q q q q qq q q q qq q q q q q q q q q q q 0.8 qq q q q q q q q q q q q qq qq ABC estimates of P(M=1|y) q q q q q 0.6 q q q q q q q q q q q qq q q q q q q q q q q q q q q 0.4 q q q q q q q q q q q q q q q q q q q q 0.2 q q q q q q 0.0 0.0 0.2 0.4 0.6 0.8 1.0 Importance Sampling estimates of P(M=1|y)
• 243. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceThe only safe cases??? Besides speciﬁc models like Gibbs random ﬁelds, using distances over the data itself escapes the discrepancy... [Toni Stumpf, 2010; Sousa al., 2009]
• 244. MCMC and likelihood-free methods Part/day II: ABC ABC for model choice Generic ABC model choiceThe only safe cases??? Besides speciﬁc models like Gibbs random ﬁelds, using distances over the data itself escapes the discrepancy... [Toni Stumpf, 2010; Sousa al., 2009] ...and so does the use of more informal model ﬁtting measures [Ratmann al., 2009]
• 245. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistencyABC model choice consistency simulation-based methods in Econometrics Genetics of ABC Approximate Bayesian computation ABC for model choice ABC model choice consistency
• 246. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Formalised frameworkThe starting point Central question to the validation of ABC for model choice: When is a Bayes factor based on an insuﬃcient statistic T(y) consistent?
• 247. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Formalised frameworkThe starting point Central question to the validation of ABC for model choice: When is a Bayes factor based on an insuﬃcient statistic T(y) consistent? T Note: c drawn on T(y) through B12 (y) necessarily diﬀers from c drawn on y through B12 (y)
• 248. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Formalised frameworkA benchmark if toy example Comparison suggested by referee of PNAS paper [thanks]: [X, Cornuet, Marin, Pillai, Aug. 2011] Model M1 : y ∼ N (θ1 , 1) opposed to model M2 : √ y ∼ L(θ2 , 1/ √ Laplace distribution with mean θ2 and scale 2), parameter 1/ 2 (variance one).
• 249. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Formalised frameworkA benchmark if toy example Comparison suggested by referee of PNAS paper [thanks]: [X, Cornuet, Marin, Pillai, Aug. 2011] Model M1 : y ∼ N (θ1 , 1) opposed to model M2 : √ y ∼ L(θ2 , 1/ √ Laplace distribution with mean θ2 and scale 2), parameter 1/ 2 (variance one). Four possible statistics 1. sample mean y (suﬃcient for M1 if not M2 ); 2. sample median med(y) (insuﬃcient); 3. sample variance var(y) (ancillary); 4. median absolute deviation mad(y) = med(y − med(y));
• 250. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Formalised frameworkA benchmark if toy example Comparison suggested by referee of PNAS paper [thanks]: [X, Cornuet, Marin, Pillai, Aug. 2011] Model M1 : y ∼ N (θ1 , 1) opposed to model M2 : √ y ∼ L(θ2 , 1/ √ Laplace distribution with mean θ2 and scale 2), parameter 1/ 2 (variance one). 6 5 4 Density 3 2 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 posterior probability
• 251. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Formalised frameworkA benchmark if toy example Comparison suggested by referee of PNAS paper [thanks]: [X, Cornuet, Marin, Pillai, Aug. 2011] Model M1 : y ∼ N (θ1 , 1) opposed to model M2 : √ y ∼ L(θ2 , 1/ √ Laplace distribution with mean θ2 and scale 2), parameter 1/ 2 (variance one). n=100 0.7 0.6 0.5 0.4 q 0.3 q q q 0.2 0.1 q q q q q 0.0 q q Gauss Laplace
• 252. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Formalised frameworkA benchmark if toy example Comparison suggested by referee of PNAS paper [thanks]: [X, Cornuet, Marin, Pillai, Aug. 2011] Model M1 : y ∼ N (θ1 , 1) opposed to model M2 : √ y ∼ L(θ2 , 1/ √ Laplace distribution with mean θ2 and scale 2), parameter 1/ 2 (variance one). 6 3 5 4 Density Density 2 3 2 1 1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.0 0.2 0.4 0.6 0.8 1.0 posterior probability probability
• 253. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Formalised frameworkA benchmark if toy example Comparison suggested by referee of PNAS paper [thanks]: [X, Cornuet, Marin, Pillai, Aug. 2011] Model M1 : y ∼ N (θ1 , 1) opposed to model M2 : √ y ∼ L(θ2 , 1/ √ Laplace distribution with mean θ2 and scale 2), parameter 1/ 2 (variance one). n=100 n=100 1.0 0.7 q q 0.6 q 0.8 q q 0.5 q q q 0.6 q 0.4 q q q q 0.3 q q 0.4 q q 0.2 0.2 q 0.1 q q q q q q q 0.0 0.0 q q Gauss Laplace Gauss Laplace
• 254. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Formalised frameworkFramework Starting from sample y = (y1 , . . . , yn ) the observed sample, not necessarily iid with true distribution y ∼ Pn Summary statistics T(y) = Tn = (T1 (y), T2 (y), · · · , Td (y)) ∈ Rd with true distribution Tn ∼ Gn .
• 255. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Formalised frameworkFramework c Comparison of – under M1 , y ∼ F1,n (·|θ1 ) where θ1 ∈ Θ1 ⊂ Rp1 – under M2 , y ∼ F2,n (·|θ2 ) where θ2 ∈ Θ2 ⊂ Rp2 turned into – under M1 , T(y) ∼ G1,n (·|θ1 ), and θ1 |T(y) ∼ π1 (·|Tn ) – under M2 , T(y) ∼ G2,n (·|θ2 ), and θ2 |T(y) ∼ π2 (·|Tn )
• 256. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsAssumptions A collection of asymptotic “standard” assumptions: [A1] There exist a sequence {vn } converging to +∞, an a.c. distribution Q with continuous bounded density q(·), a symmetric, d × d positive deﬁnite matrix V0 and a vector µ0 ∈ Rd such that −1/2 n→∞ vn V0 (Tn − µ0 ) Q, under Gn and for all M 0 −d −1/2 sup |V0 |1/2 vn gn (t) − q vn V0 {t − µ0 } = o(1) vn |t−µ0 |M
• 257. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsAssumptions A collection of asymptotic “standard” assumptions: [A2] For i = 1, 2, there exist d × d symmetric positive deﬁnite matrices Vi (θi ) and µi (θi ) ∈ Rd such that n→∞ vn Vi (θi )−1/2 (Tn − µi (θi )) Q, under Gi,n (·|θi ) .
• 258. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsAssumptions A collection of asymptotic “standard” assumptions: [A3] For i = 1, 2, there exist sets Fn,i ⊂ Θi and constants i , τi , αi 0 such that for all τ 0, sup Gi,n |Tn − µ(θi )| τ |µi (θi ) − µ0 | ∧ i |θi θi ∈Fn,i −α −αi vn i sup (|µi (θi ) − µ0 | ∧ i ) θi ∈Fn,i with c −τ πi (Fn,i ) = o(vn i ).
• 259. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsAssumptions A collection of asymptotic “standard” assumptions: [A4] For u 0 −1 Sn,i (u) = θi ∈ Fn,i ; |µ(θi ) − µ0 | ≤ u vn if inf{|µi (θi ) − µ0 |; θi ∈ Θi } = 0, there exist constants di τi ∧ αi − 1 such that −d πi (Sn,i (u)) ∼ u di vn i , ∀u vn
• 260. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsAssumptions A collection of asymptotic “standard” assumptions: [A5] If inf{|µi (θi ) − µ0 |; θi ∈ Θi } = 0, there exists U 0 such that for any M 0, −d sup sup |Vi (θi )|1/2 vn gi (t|θi ) vn |t−µ0 |M θi ∈Sn,i (U) −q vn Vi (θi )−1/2 (t − µ(θi ) = o(1) and πi Sn,i (U) ∩ ||Vi (θi )−1 || + ||Vi (θi )|| M lim lim sup =0. M→∞ n πi (Sn,i (U))
• 261. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsAssumptions A collection of asymptotic “standard” assumptions: [A1]–[A2] are standard central limit theorems ([A1] redundant when one model is “true”) [A3] controls the large deviations of the estimator Tn from the estimand µ(θ) [A4] is the standard prior mass condition found in Bayesian asymptotics (di eﬀective dimension of the parameter) [A5] controls more tightly convergence esp. when µi is not one-to-one
• 262. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsEﬀective dimension [A4] Understanding d1 , d2 : deﬁned only when µ0 ∈ {µi (θi ), θi ∈ Θi }, πi (θi : |µi (θi ) − µ0 | n−1/2 ) = O(n−di /2 ) is the eﬀective dimension of the model Θi around µ0
• 263. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsAsymptotic marginals Asymptotically, under [A1]–[A5] mi (t) = gi (t|θi ) πi (θi ) dθi Θi is such that (i) if inf{|µi (θi ) − µ0 |; θi ∈ Θi } = 0, Cl vn i ≤ mi (Tn ) ≤ Cu vn i d−d d−d and (ii) if inf{|µi (θi ) − µ0 |; θi ∈ Θi } 0 mi (Tn ) = oPn [vn i + vn i ]. d−τ d−α
• 264. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsWithin-model consistency Under same assumptions, if inf{|µi (θi ) − µ0 |; θi ∈ Θi } = 0, the posterior distribution of µi (θi ) given Tn is consistent at rate 1/vn provided αi ∧ τi di .
• 265. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsWithin-model consistency Under same assumptions, if inf{|µi (θi ) − µ0 |; θi ∈ Θi } = 0, the posterior distribution of µi (θi ) given Tn is consistent at rate 1/vn provided αi ∧ τi di . Note: di can truly be seen as an eﬀective dimension of the model under the posterior πi (.|Tn ), since if µ0 ∈ {µi (θi ); θi ∈ Θi }, mi (Tn ) ∼ vn i d−d
• 266. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsBetween-model consistency Consequence of above is that asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value of Tn under both models. And only by this mean value!
• 267. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsBetween-model consistency Consequence of above is that asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value of Tn under both models. And only by this mean value! Indeed, if inf{|µ0 − µ2 (θ2 )|; θ2 ∈ Θ2 } = inf{|µ0 − µ1 (θ1 )|; θ1 ∈ Θ1 } = 0 then −(d1 −d2 ) −(d1 −d2 ) Cl vn ≤ m1 (Tn ) m2 (Tn ) ≤ Cu vn , where Cl , Cu = OPn (1), irrespective of the true model. c Only depends on the diﬀerence d1 − d2
• 268. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsBetween-model consistency Consequence of above is that asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value of Tn under both models. And only by this mean value! Else, if inf{|µ0 − µ2 (θ2 )|; θ2 ∈ Θ2 } inf{|µ0 − µ1 (θ1 )|; θ1 ∈ Θ1 } = 0 then m1 (Tn ) −(d1 −α2 ) −(d1 −τ2 ) n ≥ Cu min vn , vn , m2 (T )
• 269. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsConsistency theorem If inf{|µ0 − µ2 (θ2 )|; θ2 ∈ Θ2 } = inf{|µ0 − µ1 (θ1 )|; θ1 ∈ Θ1 } = 0, Bayes factor T −(d1 −d2 ) B12 = O(vn ) irrespective of the true model. It is consistent iﬀ Pn is within the model with the smallest dimension
• 270. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Consistency resultsConsistency theorem If Pn belongs to one of the two models and if µ0 cannot be attained by the other one : 0 = min (inf{|µ0 − µi (θi )|; θi ∈ Θi }, i = 1, 2) max (inf{|µ0 − µi (θi )|; θi ∈ Θi }, i = 1, 2) , T then the Bayes factor B12 is consistent
• 271. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsConsequences on summary statistics Bayes factor driven by the means µi (θi ) and the relative position of µ0 wrt both sets {µi (θi ); θi ∈ Θi }, i = 1, 2. For ABC, this implies the most likely statistics Tn are ancillary statistics with diﬀerent mean values under both models Else, if Tn asymptotically depends on some of the parameters of the models, it is quite likely that there exists θi ∈ Θi such that µi (θi ) = µ0 even though model M1 is misspeciﬁed
• 272. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsToy example: Laplace versus Gauss [1] If n Tn = n−1 Xi4 , µ1 (θ) = 3 + θ4 + 6θ2 , µ2 (θ) = 6 + · · · i=1 and the true distribution is Laplace with mean θ0 = 1, under the √ Gaussian model the value θ∗ = 2 3 − 3 leads to µ0 = µ(θ∗ ) [here d1 = d2 = d = 1]
• 273. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsToy example: Laplace versus Gauss [1] If n Tn = n−1 Xi4 , µ1 (θ) = 3 + θ4 + 6θ2 , µ2 (θ) = 6 + · · · i=1 and the true distribution is Laplace with mean θ0 = 1, under the √ Gaussian model the value θ∗ = 2 3 − 3 leads to µ0 = µ(θ∗ ) [here d1 = d2 = d = 1] c a Bayes factor associated with such a statistic is inconsistent
• 274. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsToy example: Laplace versus Gauss [1] If n n −1 T =n Xi4 , µ1 (θ) = 3 + θ4 + 6θ2 , µ2 (θ) = 6 + · · · i=1 8 6 4 2 0 0.0 0.2 0.4 0.6 0.8 1.0 probability Fourth moment
• 275. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsToy example: Laplace versus Gauss [1] If n n −1 T =n Xi4 , µ1 (θ) = 3 + θ4 + 6θ2 , µ2 (θ) = 6 + · · · i=1 n=1000 0.45 q 0.40 0.35 q q q 0.30 q M1 M2
• 276. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsToy example: Laplace versus Gauss [1] If n n −1 T =n Xi4 , µ1 (θ) = 3 + θ4 + 6θ2 , µ2 (θ) = 6 + · · · i=1 Caption: Comparison of the distributions of the posterior probabilities that the data is from a normal model (as opposed to a Laplace model) with unknown mean when the data is made of n = 1000 observations either from a normal (M1) or Laplace (M2) distribution with mean one and when the summary statistic in the ABC algorithm is restricted to the empirical fourth moment. The ABC algorithm uses proposals from the prior N (0, 4) and selects the tolerance as the 1% distance quantile.
• 277. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsToy example: Laplace versus Gauss [0] When (4) (6) T(y) = yn , yn ¯ ¯ and the true distribution is Laplace with mean θ0 = 0, then √ ∗ ∗ µ0 = 6, µ1 (θ1 ) = 6 with θ1 = 2 3 − 3 [d1 = 1 and d2 = 1/2] thus B12 ∼ n−1/4 → 0 : consistent
• 278. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsToy example: Laplace versus Gauss [0] When (4) (6) T(y) = yn , yn ¯ ¯ and the true distribution is Laplace with mean θ0 = 0, then √ ∗ ∗ µ0 = 6, µ1 (θ1 ) = 6 with θ1 = 2 3 − 3 [d1 = 1 and d2 = 1/2] thus B12 ∼ n−1/4 → 0 : consistent Under the Gaussian model µ0 = 3 µ2 (θ2 ) ≥ 6 3 ∀θ2 B12 → +∞ : consistent
• 279. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsToy example: Laplace versus Gauss [0] When (4) (6) T(y) = yn , yn ¯ ¯ 5 4 3 Density 2 1 0 0.0 0.2 0.4 0.6 0.8 1.0 posterior probabilities Fourth and sixth moments
• 280. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsEmbedded models When M1 submodel of M2 , and if the true distribution belongs to the smaller model M1 , Bayes factor is of order −(d1 −d2 ) vn
• 281. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsEmbedded models When M1 submodel of M2 , and if the true distribution belongs to the smaller model M1 , Bayes factor is of order −(d1 −d2 ) vn If summary statistic only informative on a parameter that is the same under both models, i.e if d1 = d2 , then c the Bayes factor is not consistent
• 282. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsEmbedded models When M1 submodel of M2 , and if the true distribution belongs to the smaller model M1 , Bayes factor is of order −(d1 −d2 ) vn Else, d1 d2 and Bayes factor is consistent under M1 . If true distribution not in M1 , then c Bayes factor is consistent only if µ1 = µ2 = µ0
• 283. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsAnother toy example: Quantile distribution 1 − exp{−gz(p)} k Q(p; A, B, g , k) = A+B 1 + 1 + z(p)2 z(p) 1 + exp{−gz(p)} A, B, g and k, location, scale, skewness and kurtosis parameters Embedded models: M1 : g = 0 and k ∼ U[−1/2, 5] M2 : g ∼ U[0, 4] and k ∼ U[−1/2, 5].
• 284. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsConsistency [or not] 0.7 0.7 0.7 0.6 0.6 0.6 q 0.5 0.5 0.5 0.4 0.4 0.4 q q q q 0.3 0.3 0.3 M1 M2 M1 M2 M1 M2 1.0 1.0 0.8 q q q 0.8 0.8 q q q q q 0.6 0.6 0.6 q q q q 0.4 0.4 0.4 q q q q q q q q q q 0.2 0.2 0.2 q q q q q q 0.0 0.0 0.0 M1 M2 M1 M2 M1 M2 1.0 1.0 q q 0.8 q q 0.8 0.8 q q q q 0.6 q 0.6 q 0.6 q q 0.4 0.4 q 0.4 q q q q q q q 0.2 0.2 q 0.2 q q q q 0.0 0.0 0.0 M1 M2 M1 M2 M1 M2
• 285. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency Summary statisticsConsistency [or not] Caption: Comparison of the distributions of the posterior probabilities that the data is from model M1 when the data is made of 100 observations (left column), 1000 observations (central column) and 10,000 observations (right column) either from M1 (M1) or M2 (M2) when the summary statistics in the ABC algorithm are made of the empirical quantile at level 10% (ﬁrst row), the empirical quantiles at levels 10% and 90% (second row), and the empirical quantiles at levels 10%, 40%, 60% and 90% (third row), respectively. The boxplots rely on 100 replicas and the ABC algorithms are based on 104 proposals from the prior, with the tolerance being chosen as the 1% quantile on the distances.
• 286. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency ConclusionsConclusions • Model selection feasible with ABC • Choice of summary statistics is paramount • At best, ABC → π(. | T(y)) which concentrates around µ0
• 287. MCMC and likelihood-free methods Part/day II: ABC ABC model choice consistency ConclusionsConclusions • Model selection feasible with ABC • Choice of summary statistics is paramount • At best, ABC → π(. | T(y)) which concentrates around µ0 • For estimation : {θ; µ(θ) = µ0 } = θ0 • For testing {µ1 (θ1 ), θ1 ∈ Θ1 } ∩ {µ2 (θ2 ), θ2 ∈ Θ2 } = ∅