This document discusses approximate Bayesian computation (ABC), a family of methods that enable Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. It introduces ABC, describes its origins in population-genetics models, and outlines its limitations and recent advances, including related computational methods such as ABC with empirical likelihoods. The document also examines how ABC relates to other simulation-based statistical methods and asks to what extent ABC is genuinely Bayesian.
Image sciences, image processing, image restoration, photo manipulation. Image and video representation. Digital versus analog imagery. Quantization and sampling. Sources and models of noise in digital CCD imagery: photon, thermal and readout noise. Sources and models of blur. Convolutions and point spread functions. Overview of other standard models, problems and tasks: salt-and-pepper and impulse noise, halftoning, inpainting, super-resolution, compressed sensing, high dynamic range imagery, demosaicing. Short introduction to other types of imagery: SAR, sonar, ultrasound, CT and MRI. Linear and ill-posed restoration problems.
In the study of probabilistic integrators for deterministic ordinary differential equations, one goal is to establish the convergence (in an appropriate topology) of the random solutions to the true deterministic solution of an initial value problem defined by some operator. The challenge is to identify the right conditions on the additive noise with which one constructs the probabilistic integrator, so that the convergence of the random solutions has the same order as the underlying deterministic integrator. In the context of ordinary differential equations, Conrad et al. (Stat. Comput., 2017) established the mean-square convergence of the solutions for globally Lipschitz vector fields, under the assumptions of i.i.d., state-independent, mean-zero Gaussian noise. We extend their analysis by considering vector fields that need not be globally Lipschitz, and by considering non-Gaussian, non-i.i.d. noise that can depend on the state and can have nonzero mean. A key assumption is a uniform moment bound on the noise. We obtain convergence in the stronger topology of the uniform norm, and establish results connecting this topology to the regularity of the additive noise. Joint work with A. M. Stuart (Caltech) and T. J. Sullivan (Free University of Berlin).
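As a hedged illustration of the construction discussed above, here is a minimal stdlib-Python sketch of a randomised Euler integrator with additive, state-independent, mean-zero Gaussian noise. The noise scaling `h**(p + 0.5)` and the test equation are illustrative choices, not the exact assumptions of the papers cited.

```python
import math
import random

def probabilistic_euler(f, x0, t_end, h, sigma=1.0, p=1, rng=random):
    """One sample path of a randomised Euler integrator.

    Each Euler step is perturbed by mean-zero Gaussian noise whose
    standard deviation scales like h**(p + 0.5), so that the random
    solutions retain the order-p accuracy of the deterministic scheme.
    """
    xs = [x0]
    x, t = x0, 0.0
    while t < t_end - 1e-12:
        x = x + h * f(x) + rng.gauss(0.0, sigma * h ** (p + 0.5))
        t += h
        xs.append(x)
    return xs

# Toy problem: dx/dt = -x, x(0) = 1, whose exact solution is exp(-t).
random.seed(0)
path = probabilistic_euler(lambda x: -x, 1.0, 1.0, h=0.001)
print(abs(path[-1] - math.exp(-1.0)))  # small: noise is scaled below the Euler error order
```

The point of the scaling is visible here: the accumulated noise over the interval has the same magnitude as the deterministic truncation error, so the randomisation quantifies discretisation uncertainty without degrading the convergence rate.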
The generation of Gaussian random fields over a physical domain is a challenging problem in computational mathematics, especially when the correlation length is short and the field is rough. The traditional approach is to make use of a truncated Karhunen-Loève (KL) expansion, but the generation of even a single realisation of the field may then be effectively beyond reach (especially for 3-dimensional domains) if the need is to obtain an expected L2 error of, say, 5%, because of the potentially very slow convergence of the KL expansion. In this talk, based on joint work with Ivan Graham, Frances Kuo, Dirk Nuyens, and Rob Scheichl, a completely different approach is used, in which the field is initially generated at a regular grid on a 2- or 3-dimensional rectangle that contains the physical domain, and then possibly interpolated to obtain the field at other points. In that case there is no need for any truncation. Rather, the main problem becomes the factorisation of a large dense matrix. For this we use circulant embedding and FFT ideas. Quasi-Monte Carlo integration is then used to evaluate the expected value of some functional of the finite-element solution of an elliptic PDE with a random field as input.
The standard Galerkin formulation of acoustic wave propagation, governed by the Helmholtz partial differential equation (PDE), is indefinite for large wavenumbers. However, the Helmholtz PDE is in general not indefinite. The lack of coercivity (indefiniteness) is one of the major difficulties for the approximation and simulation of wave propagation models in heterogeneous media, including applications to quasi-Monte Carlo (QMC) analysis of stochastic wave propagation. We will present a new class of sign-definite continuous and discrete preconditioned FEM Helmholtz wave propagation models.
Subgradient Methods for Huge-Scale Optimization Problems - Yuri Nesterov, Cat... Yandex
We consider a new class of huge-scale problems: problems with sparse subgradients. The most important functions of this type are piecewise linear. For optimization problems with uniform sparsity of the corresponding linear operators, we suggest a very efficient implementation of subgradient iterations, whose total cost depends only logarithmically on the dimension. This technique is based on a recursive update of the results of matrix/vector products and of the values of symmetric functions. It works well, for example, for matrices with few nonzero diagonals and for max-type functions.
We show that the updating technique can be efficiently coupled with the simplest subgradient methods. Similar results can be obtained for a new non-smooth random variant of a coordinate descent scheme. We also present promising results of preliminary computational experiments.
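The core of the recursive-update idea can be sketched in a few lines. The column-wise sparse storage and the function `sparse_update` below are illustrative assumptions, not Nesterov's implementation: when a subgradient step changes only a few coordinates of x, the cached product r = A @ x is repaired by touching only the corresponding sparse columns.

```python
def sparse_update(r, cols, delta):
    """Update the cached product r = A @ x in place after x[j] += d
    for each (j, d) in delta, where cols[j] stores the sparse column
    A[:, j] as a {row: value} dict."""
    for j, d in delta.items():
        for i, a_ij in cols[j].items():
            r[i] += a_ij * d
    return r

# Toy 3x3 matrix A = [[2,0,3],[0,-1,0],[0,4,0]], stored column-wise.
cols = {0: {0: 2.0}, 1: {1: -1.0, 2: 4.0}, 2: {0: 3.0}}
x = [1.0, 1.0, 0.0]
r = [2.0, -1.0, 4.0]              # A @ x, computed once up front
sparse_update(r, cols, {1: 0.5})  # after the step x[1] += 0.5
# r is now A @ [1.0, 1.5, 0.0], at cost O(nnz of column 1), not O(n^2)
```

The same bookkeeping applies to the partial sums of max-type functions, which is what makes the per-iteration cost logarithmic in the dimension for uniformly sparse problems.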
We present recent results on the numerical analysis of quasi-Monte Carlo quadrature methods applied to forward and inverse uncertainty quantification for elliptic and parabolic PDEs. Particular attention will be placed on higher-order QMC, the stable and efficient generation of interlaced polynomial lattice rules, and the numerical analysis of multilevel QMC finite element discretizations with applications to computational uncertainty quantification.
Classification with mixtures of curved Mahalanobis metrics - Frank Nielsen
Presentation at ICIP 2016.
On slide 4 there is a typo: replace the absolute value by parentheses. The cross-ratio can be negative, and we use the principal complex logarithm.
"The Metropolis-adjusted Langevin algorithm for log-concave probability measures in high dimensions", talk by Andreas Eberle at the BigMC seminar, 9 June 2011, Paris
The presentation is an introduction to decision making with approximate Bayesian methods. It consists of a review of Bayesian decision theory and variational inference, along with a description of loss-calibrated variational inference.
The present study concerns the feedback stabilisation of the unstable equilibria of a two-dimensional nonlinear pool-boiling system with essentially heterogeneous temperature distributions at the fluid-heater interface. Regulation of such equilibria has great potential for application in, for instance, micro-electronics cooling. Here, as a first step, stabilisation of these equilibria is considered. To this end a control law is implemented that regulates the heat supply to the heater as a function of the Fourier-Chebyshev modes of its internal temperature distribution. These modes are intimately related to the physical eigenmodes of the system and thus admit robust and efficient regulation on the basis of the natural composition of the temperature field. Key to this modal-control strategy and its application to controller design are two equivalent and interchangeable PDE and state-space forms of the linearised pool-boiling model. Derivation of these forms is a central theme of this study. The performance of the modal controllers thus designed is demonstrated and analysed by simulations of the nonlinear closed-loop system.
How to find a cheap surrogate to approximate Bayesian Update Formula and to a... - Alexander Litvinenko
We suggest a new view of the classical Bayesian update formula. We expand all ingredients in a polynomial chaos expansion (PCE) and write out a new formula for the Bayesian update of the PCE coefficients. This formula is derived from minimum mean square estimation. One starts with the prior PCE, takes the measurements into account, and obtains the posterior PCE coefficients, without any MCMC sampling.
Non-sampling functional approximation of linear and non-linear Bayesian Update - Alexander Litvinenko
We offer a non-sampling functional approximation of a non-linear surrogate to the classical Bayesian update formula. We start with a prior polynomial chaos expansion (PCE), express the log-likelihood in a PCE basis, and obtain a new posterior PCE.
The main idea is to update not the probability density but the basis coefficients.
We present a proof of the generalized Riemann hypothesis (GRH) based on asymptotic expansions and operations on series. The advantage of our method is that it only uses undergraduate mathematics, which makes it accessible to a wider audience.
A crash course in stochastic Lyapunov theory for Markov processes (emphasis on continuous time)
See also the survey for models in discrete time,
https://netfiles.uiuc.edu/meyn/www/spm_files/MarkovTutorial/MarkovTutorialUCSB2010.html
Scientific Computing with Python Webinar 9/18/2009: Curve Fitting - Enthought, Inc.
This webinar will provide an overview of the tools that SciPy and NumPy provide for regression analysis, including linear and non-linear least-squares, and a brief look at handling other error metrics. We will also demonstrate simple GUI tools that can make some problems easier, and provide a quick overview of the new scikits package statsmodels, whose API is maturing in a separate package but should be incorporated into SciPy in the future.
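As a stdlib-only sketch of what such least-squares routines compute (NumPy's `polyfit` and SciPy's fitting helpers wrap equivalent algebra for richer models), here is the closed-form straight-line fit; the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ a*x + b via the normal equations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx        # slope
    b = my - a * mx      # intercept
    return a, b

# Noisy samples of y = 2x + 1:
a, b = fit_line([0, 1, 2, 3], [1.0, 3.1, 4.9, 7.0])
print(a, b)  # close to slope 2, intercept 1
```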
Linear Discriminant Analysis (LDA) Under f-Divergence Measures - Anmol Dwivedi
For more details, please have a look at:
1. https://www.mdpi.com/1099-4300/24/2/188
2. https://ieeexplore.ieee.org/document/9518004
Abstract:
In statistical inference, the information-theoretic performance limits can often be expressed in terms of a notion of divergence between the underlying statistical models (e.g., in binary hypothesis testing, the total error probability is equal to the total variation between the models). As the data dimension grows, computing the statistics involved in decision-making and the attendant performance limits (divergence measures) faces complexity and stability challenges. Dimensionality reduction addresses these challenges at the expense of compromising performance (divergence shrinks, by the data processing inequality). This paper considers linear dimensionality reduction such that the divergence between the models is maximally preserved. Specifically, the paper focuses on Gaussian models and characterizes an optimal projection of the data onto a lower-dimensional subspace with respect to four f-divergence measures (Kullback-Leibler, chi-squared, Hellinger, and total variation). There are two key observations. First, projections are not necessarily along the dominant modes of the covariance matrix of the data; in some situations they can even be along the least dominant modes. Second, under specific regimes, the optimal design of the subspace projection is identical under all the f-divergence measures considered, rendering a degree of universality to the design, independent of the inference problem of interest.
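A toy stdlib sketch of the abstract's theme: project two Gaussian models onto a direction and scan for the direction that maximally preserves KL divergence. The two example models (equal covariances, means differing along the low-variance axis) are hypothetical, chosen to exhibit the first observation above: the optimal projection can lie along the least dominant covariance mode.

```python
import math

def kl_gauss_1d(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for scalar Gaussians (v = variance)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def projected_kl(theta, mu1, S1, mu2, S2):
    """KL between two 2-D Gaussians after projecting onto the unit
    direction u = (cos theta, sin theta); the projection of N(mu, S)
    is the scalar Gaussian N(u.mu, u^T S u)."""
    u = (math.cos(theta), math.sin(theta))
    m = lambda mu: u[0] * mu[0] + u[1] * mu[1]
    v = lambda S: (u[0] * u[0] * S[0][0] + 2 * u[0] * u[1] * S[0][1]
                   + u[1] * u[1] * S[1][1])
    return kl_gauss_1d(m(mu1), v(S1), m(mu2), v(S2))

# Models differing only in the mean of the LOW-variance coordinate:
mu1, S1 = (0.0, 0.0), [[4.0, 0.0], [0.0, 0.1]]
mu2, S2 = (0.0, 1.0), [[4.0, 0.0], [0.0, 0.1]]
best = max(range(180), key=lambda d: projected_kl(math.radians(d), mu1, S1, mu2, S2))
print(best)  # 90 degrees: the least dominant mode of the covariance
```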
ABC: How Bayesian can it be?
1. Approximate Bayesian Computation:
how Bayesian can it be?
Christian P. Robert
ISBA LECTURES ON BAYESIAN FOUNDATIONS
ISBA 2012, Kyoto, Japan
Université Paris-Dauphine, IuF, & CREST
http://www.ceremade.dauphine.fr/~xian
June 24, 2012
2. This talk is dedicated to the memory of our dear friend
and fellow Bayesian, George Casella, 1951–2012
4. Approximate Bayesian computation
Introduction
Monte Carlo basics
simulation-based methods in Econometrics
Approximate Bayesian computation
ABC as an inference machine
5. General issue
Given a density $\pi$ known up to a normalizing constant, e.g. a posterior density $\pi \propto \tilde\pi$, and an integrable function $h$, how can one compute
$$I_h = \int h(x)\pi(x)\,\mu(dx) = \frac{\int h(x)\tilde\pi(x)\,\mu(dx)}{\int \tilde\pi(x)\,\mu(dx)}$$
when $\int h(x)\tilde\pi(x)\,\mu(dx)$ is intractable?
6. Monte Carlo basics
Generate an iid sample $x_1, \dots, x_N$ from $\pi$ and estimate $I_h$ by
$$\hat I^{MC}_N(h) = N^{-1} \sum_{i=1}^N h(x_i).$$
[LLN] $\hat I^{MC}_N(h) \xrightarrow{\text{a.s.}} I_h$
Furthermore, if $I_{h^2} = \int h^2(x)\pi(x)\,\mu(dx) < \infty$,
[CLT] $\sqrt{N}\left(\hat I^{MC}_N(h) - I_h\right) \xrightarrow{L} N\left(0, I\left[(h - I_h)^2\right]\right).$
Caveat: often impossible or inefficient to simulate directly from $\pi$
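The plain Monte Carlo estimator above can be sketched in a few lines; the standard-normal target and the test function h(x) = x² are illustrative choices, not from the slides.

```python
import random

random.seed(42)

def mc_estimate(h, sample_pi, N):
    """Plain Monte Carlo: average h over N iid draws from pi."""
    return sum(h(sample_pi()) for _ in range(N)) / N

# Estimate I_h = E[X^2] = 1 for X ~ N(0, 1).
est = mc_estimate(lambda x: x * x, lambda: random.gauss(0.0, 1.0), 50_000)
print(est)  # close to 1; the error decays like N^{-1/2} by the CLT
```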
8. Importance Sampling
For $Q$ a proposal distribution with density $q$,
$$I_h = \int h(x)\,\{\pi/q\}(x)\,q(x)\,\mu(dx).$$
Principle
Generate an iid sample $x_1, \dots, x_N \sim Q$ and estimate $I_h$ by
$$\hat I^{IS}_{Q,N}(h) = N^{-1} \sum_{i=1}^N h(x_i)\,\{\pi/q\}(x_i).$$
10. Importance Sampling (convergence)
Then
[LLN] $\hat I^{IS}_{Q,N}(h) \xrightarrow{\text{a.s.}} I_h$, and if $Q((h\pi/q)^2) < \infty$,
[CLT] $\sqrt{N}\left(\hat I^{IS}_{Q,N}(h) - I_h\right) \xrightarrow{L} N\left(0, Q\{(h\pi/q - I_h)^2\}\right).$
Caveat
If the normalizing constant is unknown, it is impossible to use $\hat I^{IS}_{Q,N}(h)$.
Generic problem in Bayesian statistics: $\pi(\theta|x) \propto f(x|\theta)\,\pi(\theta)$.
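A minimal sketch of the importance sampling estimator when the target is fully normalised; the target N(0, 1), proposal N(0, 2²), and test function are toy choices.

```python
import math
import random

random.seed(1)

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Target pi = N(0, 1), proposal q = N(0, 2^2), weight w = pi/q.
N = 50_000
draws = [random.gauss(0.0, 2.0) for _ in range(N)]
est = sum(x * x * norm_pdf(x, 0, 1) / norm_pdf(x, 0, 2) for x in draws) / N
print(est)  # unbiased estimate of E_pi[X^2] = 1
```

The wide proposal keeps the weights bounded, which is what makes the CLT variance on the slide finite here.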
12. Self-normalised importance Sampling
Self-normalised version
$$\hat I^{SNIS}_{Q,N}(h) = \left( \sum_{i=1}^N \{\pi/q\}(x_i) \right)^{-1} \sum_{i=1}^N h(x_i)\,\{\pi/q\}(x_i).$$
[LLN] $\hat I^{SNIS}_{Q,N}(h) \xrightarrow{\text{a.s.}} I_h$, and if $\pi((1 + h^2)(\pi/q)) < \infty$,
[CLT] $\sqrt{N}\left(\hat I^{SNIS}_{Q,N}(h) - I_h\right) \xrightarrow{L} N\left(0, \pi\{(\pi/q)(h - I_h)^2\}\right).$
Caveat
If $\pi$ cannot be computed, it is impossible to use $\hat I^{SNIS}_{Q,N}(h)$.
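A sketch of the self-normalised estimator when the target is only known up to a constant, so the weights are normalised by their own sum; the unnormalised Gaussian target and the wide Gaussian proposal are toy choices.

```python
import math
import random

random.seed(7)

def tilde_pi(x):
    """Unnormalised N(3, 1) density: the constant is deliberately dropped."""
    return math.exp(-0.5 * (x - 3.0) ** 2)

def q_pdf(x):
    """Proposal density: N(0, 4^2)."""
    return math.exp(-0.5 * (x / 4.0) ** 2) / (4.0 * math.sqrt(2 * math.pi))

N = 50_000
xs = [random.gauss(0.0, 4.0) for _ in range(N)]
ws = [tilde_pi(x) / q_pdf(x) for x in xs]
est = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
print(est)  # close to E_pi[X] = 3 despite the unknown normalizing constant
```

The unknown constant cancels in the ratio of the two weighted sums, which is exactly why self-normalisation rescues importance sampling for posteriors known only up to proportionality.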
15. Perspectives
What is the fundamental issue?
a mere computational issue (optimism: can be solved / pessimism: too costly in the short term)
more of an inferential issue (optimism: gathering legitimacy from the classical Bayesian approach / pessimism: lacking the coherence of the classical Bayesian approach)
calling for a new methodology (optimism: equivalent to the classical Bayesian approach / pessimism: not always convergent)
18. Econom’ections
Model choice
Similar exploration of simulation-based and approximation techniques in Econometrics:
Simulated method of moments
Method of simulated moments
Simulated pseudo-maximum-likelihood
Indirect inference
[Gouriéroux & Monfort, 1996]
19. Simulated method of moments
Given observations $y^o_{1:n}$ from a model
$$y_t = r(y_{1:(t-1)}, \epsilon_t, \theta), \qquad \epsilon_t \sim g(\cdot)$$
1. simulate $\epsilon_{1:n}$, derive $y_t(\theta) = r(y_{1:(t-1)}, \epsilon_t, \theta)$
2. and estimate $\theta$ by
$$\arg\min_\theta \sum_{t=1}^n \left( y^o_t - y_t(\theta) \right)^2$$
20. Simulated method of moments
Given observations $y^o_{1:n}$ from a model
$$y_t = r(y_{1:(t-1)}, \epsilon_t, \theta), \qquad \epsilon_t \sim g(\cdot)$$
1. simulate $\epsilon_{1:n}$, derive $y_t(\theta) = r(y_{1:(t-1)}, \epsilon_t, \theta)$
2. and estimate $\theta$ by
$$\arg\min_\theta \left( \sum_{t=1}^n y^o_t - \sum_{t=1}^n y_t(\theta) \right)^2$$
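The criterion above can be sketched for a toy location model; the model y_t = θ + ε_t, the grid search, and the frozen shocks are illustrative choices (real applications use proper optimisers over richer moment vectors).

```python
import random

random.seed(3)

n = 200
theta_true = 2.0
y_obs = [theta_true + random.gauss(0.0, 1.0) for _ in range(n)]
eps = [random.gauss(0.0, 1.0) for _ in range(n)]   # shocks simulated ONCE, then frozen

def crit(theta):
    # squared difference of the summed observed and simulated series
    return (sum(y_obs) - sum(theta + e for e in eps)) ** 2

grid = [k / 100 for k in range(0, 401)]            # candidate theta in [0, 4]
theta_hat = min(grid, key=crit)
print(theta_hat)  # close to 2
```

Freezing the simulated shocks across candidate values of θ (common random numbers) is what makes the criterion a smooth, minimisable function of θ rather than a noisy one.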
21. Method of simulated moments
Given a statistic vector $K(y)$ with
$$\mathbb{E}_\theta[K(Y_t) \mid y_{1:(t-1)}] = k(y_{1:(t-1)}; \theta),$$
find an unbiased estimator $\tilde k(\epsilon_t, y_{1:(t-1)}; \theta)$ of $k(y_{1:(t-1)}; \theta)$.
Estimate $\theta$ by
$$\arg\min_\theta \left\| \sum_{t=1}^n \left\{ K(y_t) - \sum_{s=1}^S \tilde k(\epsilon^s_t, y_{1:(t-1)}; \theta) \big/ S \right\} \right\|$$
[Pakes & Pollard, 1989]
22. Indirect inference
Minimise [in $\theta$] a distance between estimators $\hat\beta$ based on a pseudo-model for genuine observations and for observations simulated under the true model and the parameter $\theta$.
[Gouriéroux, Monfort & Renault, 1993; Smith, 1993; Gallant & Tauchen, 1996]
23. Indirect inference (PML vs. PSE)
Example of the pseudo-maximum-likelihood (PML)
$$\hat\beta(y) = \arg\max_\beta \sum_t \log f(y_t \mid \beta, y_{1:(t-1)})$$
leading to
$$\arg\min_\theta \left\| \hat\beta(y^o) - \hat\beta(y^1(\theta), \dots, y^S(\theta)) \right\|^2$$
when
$$y^s(\theta) \sim f(y \mid \theta), \qquad s = 1, \dots, S$$
24. Indirect inference (PML vs. PSE)
Example of the pseudo-score-estimator (PSE)
$$\hat\beta(y) = \arg\min_\beta \left\| \sum_t \frac{\partial \log f}{\partial \beta}(y_t \mid \beta, y_{1:(t-1)}) \right\|^2$$
leading to
$$\arg\min_\theta \left\| \hat\beta(y^o) - \hat\beta(y^1(\theta), \dots, y^S(\theta)) \right\|^2$$
when
$$y^s(\theta) \sim f(y \mid \theta), \qquad s = 1, \dots, S$$
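A hedged toy sketch of the indirect inference recipe: the auxiliary estimator β̂ = (sample mean, sample standard deviation) and the exponential true model are illustrative choices made up for this example, not any of the cited papers' settings.

```python
import math
import random
import statistics

random.seed(5)

n, S = 200, 5
theta_true = 1.5
y_obs = [random.expovariate(1.0 / theta_true) for _ in range(n)]
u = [[random.random() for _ in range(n)] for _ in range(S)]  # frozen uniform draws

def beta_hat(y):
    """Auxiliary (pseudo-model) estimator: sample mean and sample sd."""
    return (statistics.mean(y), statistics.stdev(y))

b_obs = beta_hat(y_obs)

def dist(theta):
    """Distance between auxiliary estimates on observed and simulated data."""
    total = 0.0
    for s in range(S):
        ys = [-theta * math.log(v) for v in u[s]]   # Exp(mean theta) via inverse CDF
        bs = beta_hat(ys)
        total += (b_obs[0] - bs[0]) ** 2 + (b_obs[1] - bs[1]) ** 2
    return total

theta_hat = min((k / 50 for k in range(25, 151)), key=dist)  # grid over [0.5, 3.0]
print(theta_hat)  # close to 1.5
```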
25. Consistent indirect inference
"...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identified the different methods become easier..."
Consistency depends on the criterion and on the asymptotic identifiability of θ.
[Gouriéroux & Monfort, 1996, p. 66]
27. Choice of pseudo-model
Arbitrariness of pseudo-model
Pick the pseudo-model such that
1. $\hat\beta(\theta)$ is not flat (i.e. sensitive to changes in $\theta$)
2. $\hat\beta(\theta)$ is not dispersed (i.e. robust against changes in $y^s(\theta)$)
[Frigessi & Heggland, 2004]
28. Empirical likelihood
Another approximation method (not yet related with simulation)
Definition
For a dataset $y = (y_1, \dots, y_n)$ and a parameter of interest $\theta$, pick constraints
$$\mathbb{E}[h(Y, \theta)] = 0$$
uniquely identifying $\theta$, and define the empirical likelihood as
$$L_{el}(\theta \mid y) = \max_p \prod_{i=1}^n p_i$$
for $p$ in the set $\{ p \in [0, 1]^n : \sum_i p_i = 1, \ \sum_i p_i h(y_i, \theta) = 0 \}$.
[Owen, 1988]
29. Empirical likelihood
Another approximation method (not yet related with simulation)
Example
When θ = Ef [Y ], the empirical likelihood is the maximum of
p1 · · · pn
under the constraint
p1 y1 + . . . + pn yn = θ
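The mean example above has a closed-form Lagrangian solution, pi = 1/(n(1 + λ(yi − θ))) with λ solving Σi (yi − θ)/(1 + λ(yi − θ)) = 0. A minimal sketch (my own illustration, not from the slides), solving for λ by bisection:

```python
import numpy as np

def log_empirical_likelihood_mean(y, theta, tol=1e-10):
    """Log empirical likelihood for theta = E[Y]: maximise sum(log p_i)
    subject to sum(p_i) = 1 and sum(p_i * (y_i - theta)) = 0.
    Lagrangian solution: p_i = 1 / (n * (1 + lam * (y_i - theta)))."""
    u = np.asarray(y) - theta
    n = len(u)
    if u.min() >= 0 or u.max() <= 0:   # theta outside the data's convex hull
        return -np.inf
    # valid lam keeps every 1 + lam*u_i positive
    lo, hi = -1 / u.max() + tol, -1 / u.min() - tol
    g = lambda lam: np.sum(u / (1 + lam * u))   # decreasing in lam
    while hi - lo > tol:                        # bisection for g(lam) = 0
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    p = 1 / (n * (1 + lam * u))
    return np.sum(np.log(p))
```

At θ equal to the sample mean, λ = 0 and all pi = 1/n, so the empirical likelihood peaks at n^{−n}; it decays away from the mean and vanishes outside the range of the data.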
30. ABCel
Another approximation method (now related with simulation!)
Importance sampling implementation
Algorithm 1: Raw ABCel sampler
Given observation y
for i = 1 to M do
  Generate θi from the prior distribution π(·)
  Set the weight ωi = Lel (θi |y)
end for
Proceed with pairs (θi , ωi ) as in regular importance sampling
[Mengersen, Pudlo & Robert, 2012]
31. A?B?C?
A stands for approximate [wrong likelihood]
B stands for Bayesian
C stands for computation [producing a parameter sample]
34. How much Bayesian?
asymptotically so (meaningful?)
approximation error unknown (w/o simulation)
pragmatic Bayes (there is no other solution!)
35. Approximate Bayesian computation
Introduction
Approximate Bayesian computation
Genetics of ABC
ABC basics
Advances and interpretations
Alphabet (compu-)soup
ABC as an inference machine
36. Genetic background of ABC
ABC is a recent computational technique that only requires being
able to sample from the likelihood f (·|θ)
This technique stemmed from population genetics models, about
15 years ago, and population geneticists still contribute
significantly to methodological developments of ABC.
[Griffith & al., 1997; Tavaré & al., 1999]
37. Demo-genetic inference
Each model is characterized by a set of parameters θ that cover
historical (time divergence, admixture time, ...), demographic
(population sizes, admixture rates, migration rates, ...) and genetic
(mutation rate, ...) factors
The goal is to estimate these parameters from a dataset of
polymorphism (DNA sample) y observed at the present time
Problem: most of the time, we cannot calculate the likelihood of
the polymorphism data f (y|θ).
39. A genuine example of application
Pygmy populations: do they have a common origin? Were there
many exchanges between Pygmy and non-Pygmy populations?
41. Intractable likelihood
Missing (too missing!) data structure:
f (y|θ) = ∫G f (y|G , θ)f (G |θ)dG
cannot be computed in a manageable way...
The genealogies are considered as nuisance parameters
This modelling clearly differs from the phylogenetic perspective
where the tree is the parameter of interest.
43. Intractable likelihood
So, what can we do when the likelihood function f (y|θ) is
well-defined but impossible / too costly to compute...?
MCMC cannot be implemented!
shall we give up Bayesian inference altogether?!
or settle for an almost Bayesian inference...?
45. ABC methodology
Bayesian setting: target is π(θ)f (x|θ)
When likelihood f (x|θ) not in closed form, likelihood-free rejection
technique:
Foundation
For an observation y ∼ f (y|θ), under the prior π(θ), if one keeps
jointly simulating
θ′ ∼ π(θ) , z ∼ f (z|θ′ ) ,
until the auxiliary variable z is equal to the observed value, z = y,
then the selected
θ′ ∼ π(θ|y)
[Rubin, 1984; Diggle & Gratton, 1984; Tavaré et al., 1997]
48. Why does it work?!
The proof is trivial:
f (θi ) ∝ Σz∈D π(θi )f (z|θi )Iy (z)
∝ π(θi )f (y|θi )
= π(θi |y) .
[Accept–Reject 101]
49. A as A...pproximative
When y is a continuous random variable, strict equality z = y is
replaced with a tolerance zone
ρ(y, z) ≤ ε
where ρ is a distance
Output distributed from
π(θ) Pθ {ρ(y, z) < ε} ∝ π(θ|ρ(y, z) < ε)
[Pritchard et al., 1999]
51. ABC algorithm
In most implementations, further degree of A...pproximation:
Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
  repeat
    generate θ′ from the prior distribution π(·)
    generate z from the likelihood f (·|θ′ )
  until ρ{η(z), η(y)} ≤ ε
  set θi = θ′
end for
where η(y) defines a (not necessarily sufficient) statistic
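The rejection sampler above can be sketched in a few lines. This is a toy illustration of my own (unknown normal mean, sample mean as summary, tolerance set as an empirical quantile of the distances), not a model from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: prior theta ~ N(0, 3^2), data y ~ N(theta, 1), eta = sample mean
n, theta_true = 50, 2.0
y = rng.normal(theta_true, 1.0, size=n)
eta_y = y.mean()

def abc_rejection(n_sim=100_000, eps_quantile=0.002):
    """Likelihood-free rejection: keep the prior draws whose simulated
    summary falls within tolerance eps of the observed summary."""
    thetas = rng.normal(0.0, 3.0, size=n_sim)              # theta' ~ pi(.)
    z = rng.normal(thetas[:, None], 1.0, size=(n_sim, n))  # z ~ f(.|theta')
    dist = np.abs(z.mean(axis=1) - eta_y)                  # rho{eta(z), eta(y)}
    eps = np.quantile(dist, eps_quantile)                  # tolerance as quantile
    return thetas[dist <= eps]

sample = abc_rejection()
```

Taking ε as a small quantile of the simulated distances (rather than a fixed value) is a common practical choice; the accepted θ’s then approximate π(θ | ρ{η(z), η(y)} ≤ ε).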
52. Output
The likelihood-free algorithm samples from the marginal in z of:
πε (θ, z|y) = π(θ)f (z|θ)IAε,y (z) / ∫Aε,y ×Θ π(θ)f (z|θ)dzdθ ,
where Aε,y = {z ∈ D | ρ(η(z), η(y)) < ε}.
The idea behind ABC is that the summary statistics coupled with a
small tolerance should provide a good approximation of the
posterior distribution:
πε (θ|y) = ∫ πε (θ, z|y)dz ≈ π(θ|y) .
...does it?!
55. Convergence of ABC (first attempt)
What happens when ε → 0?
If f (·|θ) is continuous in y , uniformly in θ [!], given an arbitrary
δ > 0, there exists ε0 such that ε < ε0 implies
∫ π(θ)f (z|θ)IAε,y (z)dz / ∫Aε,y ×Θ π(θ)f (z|θ)dzdθ
  ∈ π(θ)f (y|θ)(1 ± δ)µ(Bε ) / ∫Θ π(θ)f (y|θ)dθ(1 ± δ)µ(Bε )
where the µ(Bε ) terms cancel
[Proof extends to other continuous-in-0 kernels K ]
59. Convergence of ABC (second attempt)
What happens when ε → 0?
For B ⊂ Θ, we have
∫B [∫Aε,y f (z|θ)dz / ∫Aε,y ×Θ π(θ)f (z|θ)dzdθ] π(θ)dθ
= ∫Aε,y [∫B f (z|θ)π(θ)dθ / ∫Aε,y ×Θ π(θ)f (z|θ)dzdθ] dz
= ∫Aε,y [∫B f (z|θ)π(θ)dθ / m(z)] [m(z) / ∫Aε,y ×Θ π(θ)f (z|θ)dzdθ] dz
= ∫Aε,y π(B|z) [m(z) / ∫Aε,y ×Θ π(θ)f (z|θ)dzdθ] dz
which indicates convergence for a continuous π(B|z).
61. Convergence (do not attempt!)
...and the above does not apply to insufficient statistics:
If η(y) is not a sufficient statistic, the best one can hope for is
π(θ|η(y)), not π(θ|y)
If η(y) is an ancillary statistic, the whole information contained in
y is lost! The “best” one can hope for is
π(θ|η(y)) = π(θ)
Bummer!!!
65. Probit modelling on Pima Indian women
Example (R benchmark)
200 Pima Indian women with observed variables
  plasma glucose concentration in oral glucose tolerance test
  diastolic blood pressure
  diabetes pedigree function
  presence/absence of diabetes
Probability of diabetes as function of variables
P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
200 observations of Pima.tr and g -prior modelling:
β ∼ N3 (0, n (XT X)−1 )
importance function inspired from MLE estimator distribution
β ∼ N (β̂, Σ̂)
69. Pima Indian benchmark
Figure: Comparison between density estimates of the marginals on β1
(left), β2 (center) and β3 (right) from ABC rejection samples (red) and
MCMC samples (black)
70. MA example
MA(q) model
xt = εt + Σi=1..q ϑi εt−i
Simple prior: uniform over the inverse [real and complex] roots in
Q(u) = 1 − Σi=1..q ϑi u i
under the identifiability conditions
71. MA example
MA(q) model
xt = εt + Σi=1..q ϑi εt−i
Simple prior: uniform prior over the identifiability zone, i.e. triangle
for MA(2)
72. MA example (2)
ABC algorithm thus made of
1. picking a new value (ϑ1 , ϑ2 ) in the triangle
2. generating an iid sequence (εt )−q<t≤T
3. producing a simulated series (x′t )1≤t≤T
Distance: basic distance between the series
ρ((xt )1≤t≤T , (x′t )1≤t≤T ) = Σt=1..T (xt − x′t )²
or distance between summary statistics like the q = 2
autocorrelations
τj = Σt=j+1..T xt xt−j
74. Comparison of distance impact
Evaluation of the tolerance on the ABC sample against both
distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
75. Comparison of distance impact
[Figure: density estimates for θ1 (left) and θ2 (right)]
Evaluation of the tolerance on the ABC sample against both
distances (ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
77. Comments
Role of distance paramount (because ε ≠ 0)
ε matters little if “small enough”
representative of “curse of dimensionality”
the data as a whole may be paradoxically weakly informative
for ABC
78. ABC (simul’) advances
how approximative is ABC?
Simulating from the prior is often poor in efficiency
Either modify the proposal distribution on θ to increase the density
of x’s within the vicinity of y ...
[Marjoram et al, 2003; Bortot et al., 2007; Sisson et al., 2007]
...or by viewing the problem as a conditional density estimation
and by developing techniques to allow for larger ε
[Beaumont et al., 2002]
...or even by including ε in the inferential framework [ABCµ ]
[Ratmann et al., 2009]
82. ABC-NP
Better usage of [prior] simulations by adjustment: instead of
throwing away θ′ such that ρ(η(z), η(y)) > ε, replace θ’s with
locally regressed transforms (use with BIC)
θ∗ = θ − {η(z) − η(y)}T β̂
[Csilléry et al., TEE, 2010]
where β̂ is obtained by [NP] weighted least square regression on
(η(z) − η(y)) with weights
Kδ {ρ(η(z), η(y))}
[Beaumont et al., 2002, Genetics]
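A minimal sketch of this local linear adjustment, in the spirit of Beaumont et al. (2002) but on a toy setup of my own (Epanechnikov kernel weights, weighted least squares via numpy):

```python
import numpy as np

def abc_local_linear_adjust(thetas, etas, eta_obs, delta):
    """Local linear ABC adjustment: regress theta on eta(z) - eta(y) with
    kernel weights, then shift each kept theta by the fitted trend:
    theta* = theta - (eta(z) - eta(y))^T beta_hat."""
    d = etas - eta_obs                        # (n, k) summary discrepancies
    rho = np.linalg.norm(d, axis=1)
    w = np.where(rho < delta, 1 - (rho / delta) ** 2, 0.0)  # Epanechnikov
    keep = w > 0
    X = np.column_stack([np.ones(keep.sum()), d[keep]])     # intercept + d
    sw = np.sqrt(w[keep])                     # weighted least squares
    coef, *_ = np.linalg.lstsq(X * sw[:, None], thetas[keep] * sw, rcond=None)
    beta_hat = coef[1:]
    return thetas[keep] - d[keep] @ beta_hat, w[keep]
```

The adjusted values θ∗ typically concentrate more tightly around the region compatible with η(y) than the raw accepted θ’s, which is the point of the correction.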
83. ABC-NP (regression)
Also found in the subsequent literature, e.g. in Fearnhead–Prangle (2012):
weight directly simulation by
Kδ {ρ(η(z(θ)), η(y))}
or
(1/S) Σs=1..S Kδ {ρ(η(zs (θ)), η(y))}
[consistent estimate of f (η|θ)]
Curse of dimensionality: poor estimate when d = dim(η) is large...
85. ABC-NP (density estimation)
Use of the kernel weights
Kδ {ρ(η(z(θ)), η(y))}
leads to the NP estimate of the posterior expectation
Σi θi Kδ {ρ(η(z(θi )), η(y))} / Σi Kδ {ρ(η(z(θi )), η(y))}
[Blum, JASA, 2010]
86. ABC-NP (density estimation)
Use of the kernel weights
Kδ {ρ(η(z(θ)), η(y))}
leads to the NP estimate of the posterior conditional density
Σi K̃b (θi − θ)Kδ {ρ(η(z(θi )), η(y))} / Σi Kδ {ρ(η(z(θi )), η(y))}
[Blum, JASA, 2010]
87. ABC-NP (density estimations)
Other versions incorporating regression adjustments
Σi K̃b (θi∗ − θ)Kδ {ρ(η(z(θi )), η(y))} / Σi Kδ {ρ(η(z(θi )), η(y))}
In all cases, error
E[ĝ (θ|y)] − g (θ|y) = cb² + cδ² + OP (b² + δ²) + OP (1/nδ d )
var(ĝ (θ|y)) = c/(nbδ d ) (1 + oP (1))
[Blum, JASA, 2010; standard NP calculations]
90. ABC-NCH
Incorporating non-linearities and heteroscedasticities:
θ∗ = m̂(η(y)) + [θ − m̂(η(z))] σ̂(η(y))/σ̂(η(z))
where
  m̂(η) estimated by non-linear regression (e.g., neural network)
  σ̂(η) estimated by non-linear regression on residuals
log{θi − m̂(ηi )}² = log σ²(ηi ) + ξi
[Blum & François, 2009]
92. ABC-NCH (2)
Why neural network?
  fights curse of dimensionality
  selects relevant summary statistics
  provides automated dimension reduction
  offers a model choice capability
  improves upon multinomial logistic
[Blum & François, 2009]
94. ABC-MCMC
how approximative is ABC?
Markov chain (θ(t) ) created via the transition function
θ(t+1) = θ′ ∼ Kω (θ′ |θ(t) ) if x ∼ f (x|θ′ ) is such that x = y
  and u ∼ U(0, 1) ≤ π(θ′ )Kω (θ(t) |θ′ ) / π(θ(t) )Kω (θ′ |θ(t) ) ,
θ(t+1) = θ(t) otherwise,
has the posterior π(θ|y ) as stationary distribution
[Marjoram et al, 2003]
96. ABC-MCMC (2)
Algorithm 2 Likelihood-free MCMC sampler
Use Algorithm 1 to get (θ(0) , z(0) )
for t = 1 to N do
  Generate θ′ from Kω (·|θ(t−1) ),
  Generate z′ from the likelihood f (·|θ′ ),
  Generate u from U[0,1] ,
  if u ≤ [π(θ′ )Kω (θ(t−1) |θ′ ) / π(θ(t−1) )Kω (θ′ |θ(t−1) )] IAε,y (z′ ) then
    set (θ(t) , z(t) ) = (θ′ , z′ )
  else
    (θ(t) , z(t) ) = (θ(t−1) , z(t−1) ),
  end if
end for
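Algorithm 2 can be sketched on a toy normal-mean problem (my own illustration, not from the slides); with a symmetric random-walk proposal the Kω terms cancel and only the prior ratio and the tolerance indicator remain:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: data y ~ N(theta, 1), prior theta ~ N(0, 3^2), eta = sample mean
n = 30
y = rng.normal(1.5, 1.0, size=n)
eta_y, eps = y.mean(), 0.05

def abc_mcmc(n_iter=5000, theta0=0.0, step=0.5):
    """Likelihood-free MCMC: accept a move only if the simulated summary
    lands in the tolerance region AND the Metropolis ratio on the prior
    (symmetric proposal, so K_w cancels) exceeds a uniform draw."""
    prior_logpdf = lambda t: -t**2 / (2 * 9.0)    # N(0, 3^2) up to a constant
    chain = np.empty(n_iter)
    chain[0] = theta0
    for t in range(1, n_iter):
        prop = chain[t - 1] + step * rng.standard_normal()
        z = rng.normal(prop, 1.0, size=n)          # z' ~ f(.|theta')
        in_tol = abs(z.mean() - eta_y) <= eps      # I_{A_eps,y}(z')
        log_ratio = prior_logpdf(prop) - prior_logpdf(chain[t - 1])
        if in_tol and np.log(rng.uniform()) <= log_ratio:
            chain[t] = prop
        else:
            chain[t] = chain[t - 1]
    return chain

# start at the observed summary, standing in for the rejection step of Alg. 1
chain = abc_mcmc(theta0=eta_y)
```

Initialising from a rejection draw (here crudely replaced by the observed summary) matters: starting far from Aε,y can leave the chain stuck, since no move is accepted until a simulated summary falls within tolerance.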
97. Why does it work?
Acceptance probability does not involve calculating the likelihood
and
[πε (θ′ , z′ |y) / πε (θ(t−1) , z(t−1) |y)] × [q(θ(t−1) |θ′ )f (z(t−1) |θ(t−1) ) / q(θ′ |θ(t−1) )f (z′ |θ′ )]
= [π(θ′ ) f (z′ |θ′ ) IAε,y (z′ ) / π(θ(t−1) ) f (z(t−1) |θ(t−1) ) IAε,y (z(t−1) )]
  × [q(θ(t−1) |θ′ ) f (z(t−1) |θ(t−1) ) / q(θ′ |θ(t−1) ) f (z′ |θ′ )]
where the f (z′ |θ′ ), f (z(t−1) |θ(t−1) ) and IAε,y (z(t−1) ) terms cancel
[z(t−1) being accepted], leaving
= [π(θ′ )q(θ(t−1) |θ′ ) / π(θ(t−1) )q(θ′ |θ(t−1) )] IAε,y (z′ )
100. A toy example
Case of
x ∼ (1/2) N (θ, 1) + (1/2) N (−θ, 1)
under prior θ ∼ N (0, 10)
ABC sampler
thetas=rnorm(N,sd=10)
zed=sample(c(1,-1),N,rep=TRUE)*thetas+rnorm(N,sd=1)
eps=quantile(abs(zed-x),.01)
abc=thetas[abs(zed-x)<eps]
102. A toy example
Case of
x ∼ (1/2) N (θ, 1) + (1/2) N (−θ, 1)
under prior θ ∼ N (0, 10)
ABC-MCMC sampler
metas=rep(0,N)
metas[1]=rnorm(1,sd=10)
zed[1]=x
for (t in 2:N){
  metas[t]=rnorm(1,mean=metas[t-1],sd=5)
  zed[t]=rnorm(1,mean=(1-2*(runif(1)<.5))*metas[t],sd=1)
  if ((abs(zed[t]-x)>eps)||(runif(1)>dnorm(metas[t],sd=10)/dnorm(metas[t-1],sd=10))){
    metas[t]=metas[t-1]
    zed[t]=zed[t-1]}
}
108. A toy example
[Figure: density of the ABC-MCMC sample of θ over (−40, 40), for x = 50]
109. ABC-PMC
Use of a transition kernel as in population Monte Carlo with
manageable IS correction
Generate a sample at iteration t by
π̂t (θ(t) ) ∝ Σj=1..N ωj(t−1) Kt (θ(t) |θj(t−1) )
modulo acceptance of the associated xt , and use an importance
weight associated with an accepted simulation θi(t)
ωi(t) ∝ π(θi(t) ) / π̂t (θi(t) ) .
Still likelihood free
[Beaumont et al., 2009]
110. Our ABC-PMC algorithm
Given a decreasing sequence of approximation levels ε1 ≥ ... ≥ εT ,
1. At iteration t = 1,
   For i = 1, ..., N
     Simulate θi(1) ∼ π(θ) and x ∼ f (x|θi(1) ) until ρ(x, y ) < ε1
     Set ωi(1) = 1/N
   Take τ² as twice the empirical variance of the θi(1) ’s
2. At iteration 2 ≤ t ≤ T ,
   For i = 1, ..., N, repeat
     Pick θi⋆ from the θj(t−1) ’s with probabilities ωj(t−1)
     generate θi(t) |θi⋆ ∼ N (θi⋆ , σt² ) and x ∼ f (x|θi(t) )
   until ρ(x, y ) < εt
   Set ωi(t) ∝ π(θi(t) ) / Σj=1..N ωj(t−1) ϕ(σt−1 (θi(t) − θj(t−1) ))
   Take τt+1² as twice the weighted empirical variance of the θi(t) ’s
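The two stages above can be sketched as follows, on a toy normal-mean setup of my own (summary = sample mean, ρ = absolute summary discrepancy, Gaussian move kernel with variance twice the weighted particle variance):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 30
y = rng.normal(1.0, 1.0, size=n)
eta_y = y.mean()

def rho(theta):
    """Distance between a fresh pseudo-dataset's summary and the observed one."""
    return abs(rng.normal(theta, 1.0, size=n).mean() - eta_y)

def abc_pmc(eps_seq=(1.0, 0.5, 0.25), N=300):
    """ABC-PMC sketch: rejection from the prior N(0, 3^2) at the first
    tolerance, then resample-move with a Gaussian kernel, re-weighting by
    prior / kernel-mixture as in the algorithm above."""
    theta = np.empty(N)
    for i in range(N):                       # t = 1: plain rejection
        while True:
            prop = rng.normal(0.0, 3.0)
            if rho(prop) < eps_seq[0]:
                theta[i] = prop
                break
    w = np.full(N, 1.0 / N)
    for eps in eps_seq[1:]:                  # t = 2, ..., T
        tau2 = 2.0 * np.cov(theta, aweights=w)   # twice weighted variance
        new_theta = np.empty(N)
        for i in range(N):
            while True:
                pick = rng.choice(theta, p=w)    # resample by weight
                prop = rng.normal(pick, np.sqrt(tau2))
                if rho(prop) < eps:
                    new_theta[i] = prop
                    break
        kern = np.exp(-(new_theta[:, None] - theta[None, :])**2 / (2 * tau2))
        mix = (w[None, :] * kern).sum(axis=1)    # mixture proposal density
        prior = np.exp(-new_theta**2 / (2 * 9.0))
        w = prior / mix
        w /= w.sum()
        theta = new_theta
    return theta, w

particles, weights = abc_pmc()
```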
111. Sequential Monte Carlo
SMC is a simulation technique to approximate a sequence of
related probability distributions πn with π0 “easy” and πT as
target.
Iterated IS as PMC: particles moved from time n − 1 to time n via
kernel Kn and use of a sequence of extended targets π̃n
π̃n (z0:n ) = πn (zn ) Πj=0..n−1 Lj (zj+1 , zj )
where the Lj ’s are backward Markov kernels [check that πn (zn ) is a
marginal]
[Del Moral, Doucet & Jasra, Series B, 2006]
112. Sequential Monte Carlo (2)
Algorithm 3 SMC sampler
sample zi(0) ∼ γ0 (x) (i = 1, . . . , N)
compute weights wi(0) = π0 (zi(0) )/γ0 (zi(0) )
for t = 1 to N do
  if ESS(w (t−1) ) < NT then
    resample N particles z (t−1) and set weights to 1
  end if
  generate zi(t) ∼ Kt (zi(t−1) , ·) and set weights to
  wi(t) = wi(t−1) πt (zi(t) ) Lt−1 (zi(t) , zi(t−1) ) / [πt−1 (zi(t−1) ) Kt (zi(t−1) , zi(t) )]
end for
[Del Moral, Doucet & Jasra, Series B, 2006]
113. ABC-SMC
[Del Moral, Doucet & Jasra, 2009]
True derivation of an SMC-ABC algorithm
Use of a kernel Kn associated with target πεn and derivation of the
backward kernel
Ln−1 (z, z′ ) = πεn (z′ )Kn (z′ , z) / πεn (z)
Update of the weights
win ∝ wi(n−1) [Σm=1..M IAεn (xinm )] / [Σm=1..M IAεn−1 (xi(n−1)m )]
when xinm ∼ K (xi(n−1) , ·)
114. ABC-SMCM
Modification: Makes M repeated simulations of the pseudo-data z
given the parameter, rather than using a single [M = 1] simulation,
leading to weight that is proportional to the number of accepted
zi s
ω(θ) = (1/M) Σi=1..M Iρ(η(y),η(zi ))<ε
[limit in M means exact simulation from (tempered) target]
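The SMCM weight is just an acceptance proportion over M pseudo-datasets; a minimal sketch on a toy normal model of my own:

```python
import numpy as np

rng = np.random.default_rng(4)

def smcm_weight(theta, eta_y, eps, M=100, n=30):
    """ABC-SMCM-style weight: the proportion of M simulated pseudo-datasets
    whose summary (here, sample mean) falls within eps of the observed one."""
    z_means = rng.normal(theta, 1.0, size=(M, n)).mean(axis=1)  # M summaries
    return np.mean(np.abs(z_means - eta_y) < eps)               # (1/M) sum I
```

As M grows this proportion converges to Pθ {ρ(η(y), η(z)) < ε}, which is why the large-M limit amounts to exact simulation from the (tempered) ABC target.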
115. Properties of ABC-SMC
The ABC-SMC method properly uses a backward kernel L(z, z′ ) to
simplify the importance weight and to remove the dependence on
the unknown likelihood from this weight. Update of importance
weights is reduced to the ratio of the proportions of surviving
particles
Major assumption: the forward kernel K is supposed to be invariant
against the true target [tempered version of the true posterior]
Adaptivity in ABC-SMC algorithm only found in on-line
construction of the thresholds εt , slowly enough to keep a large
number of accepted transitions
117. A mixture example (1)
Toy model of Sisson et al. (2007): if
θ ∼ U(−10, 10) , x|θ ∼ 0.5 N (θ, 1) + 0.5 N (θ, 1/100) ,
then the posterior distribution associated with y = 0 is the normal
mixture
θ|y = 0 ∼ 0.5 N (0, 1) + 0.5 N (0, 1/100)
restricted to [−10, 10].
Furthermore, true target available as
π(θ| |x| < ε) ∝ Φ(ε − θ) − Φ(−ε − θ) + Φ(10(ε − θ)) − Φ(−10(ε + θ)) .
118. A mixture example (2)
Recovery of the target, whether using a fixed standard deviation of
τ = 0.15 or τ = 1/0.15, or a sequence of adaptive τt ’s.
[Figure: density estimates of θ over (−3, 3) for the five settings]
119. ABC inference machine
Introduction
Approximate Bayesian computation
ABC as an inference machine
  Error inc.
  Exact BC and approximate targets
  summary statistic
Series B discussion
120. How much Bayesian?
maybe a convergent method
of inference (meaningful?
sufficient? foreign?)
approximation error
unknown (w/o simulation)
pragmatic Bayes (there is no
other solution!)
many calibration issues
(tolerance, distance,
statistics)
...should Bayesians care?!
122. ABCµ
Idea Infer about the error as well:
Use of a joint density
f (θ, ε|y) ∝ ξ(ε|y, θ) × πθ (θ) × πε (ε)
where y is the data, and ξ(ε|y, θ) is the prior predictive density of
ρ(η(z), η(y)) given θ and y when z ∼ f (z|θ)
Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel
approximation.
[Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]
125. ABCµ details
Multidimensional distances ρk (k = 1, . . . , K ) and errors
εk = ρk (ηk (z), ηk (y)), with
εk ∼ ξk (ε|y, θ) ≈ ξ̂k (ε|y, θ) = (1/Bhk ) Σb K [{εk − ρk (ηk (zb ), ηk (y))}/hk ]
then used in replacing ξ(ε|y, θ) with mink ξ̂k (ε|y, θ)
ABCµ involves acceptance probability
π(θ′ , ε′ ) q(θ′ , θ)q(ε′ , ε) mink ξ̂k (ε′ |y, θ′ ) / [π(θ, ε) q(θ, θ′ )q(ε, ε′ ) mink ξ̂k (ε|y, θ)]
129. Questions about ABCµ [and model choice]
For each model under comparison, marginal posterior on ε used to
assess the fit of the model (HPD includes 0 or not).
  Is the data informative about ε? [Identifiability]
  How much does the prior π(ε) impact the comparison?
  How is using both ξ(ε|x0 , θ) and πε (ε) compatible with a
  standard probability model? [remindful of Wilkinson’s eABC ]
  Where is the penalisation for complexity in the model
  comparison?
[X, Mengersen & Chen, 2010, PNAS]
131. Wilkinson’s exact BC (not exactly!)
ABC approximation error (i.e. non-zero tolerance) replaced with
exact simulation from a controlled approximation to the target,
convolution of true posterior with kernel function
πε (θ, z|y) = π(θ)f (z|θ)Kε (y − z) / ∫ π(θ)f (z|θ)Kε (y − z)dzdθ ,
with Kε kernel parameterised by bandwidth ε.
[Wilkinson, 2008]
Theorem
The ABC algorithm based on the assumption of a randomised
observation ỹ = y + ξ, ξ ∼ Kε , and an acceptance probability of
Kε (y − z)/M
gives draws from the posterior distribution π(θ|y).
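A minimal sketch of this soft-acceptance scheme, on a toy normal model of my own (Gaussian kernel Kε, bounded by its mode M = Kε(0)):

```python
import numpy as np

rng = np.random.default_rng(5)

def noisy_abc_sample(y, n_draws=200, eps=0.5):
    """Wilkinson-style acceptance (sketch): accept a prior draw theta with
    probability K_eps(y - z) / M, where z ~ f(.|theta), K_eps is a Gaussian
    kernel with bandwidth eps, and M = K_eps(0) bounds the kernel."""
    kernel = lambda u: np.exp(-u**2 / (2 * eps**2))  # unnormalised Gaussian
    M = kernel(0.0)                                  # kernel's maximum (= 1)
    out = []
    while len(out) < n_draws:
        theta = rng.normal(0.0, 3.0)                 # theta ~ prior N(0, 3^2)
        z = rng.normal(theta, 1.0)                   # z ~ f(.|theta)
        if rng.uniform() <= kernel(y - z) / M:       # soft accept, prob <= 1
            out.append(theta)
    return np.array(out)

sample = noisy_abc_sample(y=1.0)
```

The hard indicator I{ρ(y, z) < ε} of standard ABC is replaced by a smooth acceptance probability, which is exactly what turns the tolerance into a measurement-error (convolution) model.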
133. How exact a BC?
“Using ε to represent measurement error is straightforward,
whereas using ε to model the model discrepancy is harder to
conceptualize and not as commonly used”
[Richard Wilkinson, 2008]
134. How exact a BC?
Pros
Pseudo-data from the true model and observed data from the
noisy model
Interesting perspective in that the outcome is completely
controlled
Link with ABCµ when assuming y is observed with a
measurement error of density Kε
Relates to the theory of model approximation
[Kennedy & O’Hagan, 2001]
Cons
Requires Kε to be bounded by M
True approximation error never assessed
Requires a modification of the standard ABC algorithm
135. ABC for HMMs
Specific case of a hidden Markov model
Xt+1 ∼ Qθ (Xt , ·)
Yt+1 ∼ gθ (·|xt+1 )
where only y_1:n^0 is observed.
[Dean, Singh, Jasra & Peters, 2011]
Use of specific constraints, adapted to the Markov structure:
y1 ∈ B(y1^0 , ε) × · · · × yn ∈ B(yn^0 , ε)
137. ABC-MLE for HMMs
ABC-MLE defined by
θ̂n = arg maxθ Pθ ( Y1 ∈ B(y1^0 , ε), . . . , Yn ∈ B(yn^0 , ε) )
Exact MLE for the likelihood — same basis as Wilkinson! —
pθ (y1^0 , . . . , yn^0 )
corresponding to the perturbed process
(xt , yt + εzt )1≤t≤n ,  zt ∼ U(B(0, 1))
[Dean, Singh, Jasra & Peters, 2011]
139. ABC-MLE is biased
ABC-MLE is asymptotically (in n) biased, with target
l(θ) = Eθ∗ [log pθ (Y1 |Y−∞:0 )]
but ABC-MLE converges to the true value in the sense
ln (θn ) → l(θ)
for all sequences (θn ) converging to θ, as n → ∞
141. Noisy ABC-MLE
Idea: Modify instead the data from the start
(y1^0 + εζ1 , . . . , yn^0 + εζn )
[see Fearnhead & Prangle]
noisy ABC-MLE estimate
arg maxθ Pθ ( Y1 ∈ B(y1^0 + εζ1 , ε), . . . , Yn ∈ B(yn^0 + εζn , ε) )
[Dean, Singh, Jasra & Peters, 2011]
142. Consistent noisy ABC-MLE
Degrading the data improves the estimation performances:
Noisy ABC-MLE is asymptotically (in n) consistent
under further assumptions, the noisy ABC-MLE is
asymptotically normal
increase in variance of order ε^−2
likely degradation in precision or computing time due to the
lack of summary statistic [curse of dimensionality]
143. SMC for ABC likelihood
Algorithm 4 SMC ABC for HMMs
Given θ
for k = 1, . . . , n do
generate proposals (xk^1 , yk^1 ), . . . , (xk^N , yk^N ) from the model
weigh each proposal with ωk^l = I_B(yk^0 + ζk , ε) (yk^l )
renormalise the weights and sample the xk^l ’s accordingly
end for
approximate the likelihood by
∏_{k=1}^n ( ∑_{l=1}^N ωk^l / N )
[Jasra, Singh, Martin & McCoy, 2010]
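The loop above can be sketched in a few lines of Python. This is a minimal illustration on an assumed toy Gaussian AR(1) HMM (not a model from the paper), using a plain ε-ball around each observation (i.e. without the noisy perturbation ζk):

```python
import math
import random

random.seed(2)

def smc_abc_likelihood(y_obs, eps, N, theta):
    """Sketch of the SMC ABC likelihood estimate on an assumed toy HMM:
    x_{k+1} = theta * x_k + N(0,1), y_k = x_k + N(0, 0.5).
    All model choices here are illustrative."""
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]   # initial particles
    log_lik = 0.0
    for y0 in y_obs:
        # generate proposals (x_k^l, y_k^l) from the model
        props = [theta * x + random.gauss(0.0, 1.0) for x in xs]
        ys = [x + random.gauss(0.0, 0.5) for x in props]
        # weigh each proposal with the indicator of the ball B(y_k^0, eps)
        w = [1.0 if abs(yl - y0) < eps else 0.0 for yl in ys]
        if sum(w) == 0:
            return float("-inf")            # every proposal rejected
        log_lik += math.log(sum(w) / N)     # factor (1/N) sum_l w_k^l
        # renormalise the weights and resample the x_k^l's accordingly
        xs = random.choices(props, weights=w, k=N)
    return log_lik

ll = smc_abc_likelihood([0.1, -0.3, 0.5, 0.2], eps=2.0, N=500, theta=0.8)
```

The returned value is the log of the product of per-step acceptance fractions, i.e. the ABC likelihood approximation of the algorithm.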
144. Which summary?
Fundamental difficulty of the choice of the summary statistic when
there is no non-trivial sufficient statistic
Starting from a large collection of available summary statistics,
Joyce and Marjoram (2008) consider their sequential inclusion into
the ABC target, with a stopping rule based on a likelihood ratio
test
Does not take into account the sequential nature of the tests
Depends on the parameterisation
Order of inclusion matters
likelihood ratio test?!
147. Which summary for model choice?
Depending on the choice of η(·), the Bayes factor based on this
insufficient statistic,
B12^η (y) = ∫ π1 (θ1 )f1^η (η(y)|θ1 ) dθ1 / ∫ π2 (θ2 )f2^η (η(y)|θ2 ) dθ2 ,
is consistent or not.
[X, Cornuet, Marin & Pillai, 2012]
Consistency only depends on the range of Ei [η(y)] under both
models.
[Marin, Pillai, X & Rousseau, 2012]
149. Semi-automatic ABC
Fearnhead and Prangle (2010) study ABC and the selection of the
summary statistic in close proximity to Wilkinson’s proposal
ABC then considered from a purely inferential viewpoint and
calibrated for estimation purposes
Use of a randomised (or ‘noisy’) version of the summary statistic
η̃(y) = η(y) + τ ε
Derivation of a well-calibrated version of ABC, i.e. an algorithm
that gives proper predictions for the distribution associated with
this randomised summary statistic [calibration constraint: ABC
approximation with the same posterior mean as the true randomised
posterior]
151. Summary [of FP/statistics]
Optimality of the posterior expectation E[θ|y] of the
parameter of interest as summary statistic η(y)!
Use of the standard quadratic loss function
(θ − θ0 )^T A(θ − θ0 ).
153. Details on Fearnhead and Prangle (FP) ABC
Use of a summary statistic S(·), an importance proposal g(·), a
kernel K(·) ≤ 1 and a bandwidth h > 0 such that
(θ, ysim ) ∼ g(θ)f (ysim |θ)
is accepted with probability (hence the bound)
K [{S(ysim ) − sobs }/h]
and the corresponding importance weight defined by
π(θ) / g(θ)
[Fearnhead & Prangle, 2012]
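This accept/weight scheme can be sketched directly. The following Python fragment is an illustrative instance only, on an assumed toy model (θ ∼ N(0, 1) prior, proposal g = N(0, 2), y|θ ∼ N(θ, 1), summary S(y) = y) with the bounded kernel K(u) = exp(−u²/2) ≤ 1:

```python
import math
import random

random.seed(3)

def norm_pdf(x, mu, sd):
    """Normal density, used for the importance weight pi(theta)/g(theta)."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fp_abc(s_obs, h, n_prop):
    """FP-style importance-sampling ABC on an assumed toy model:
    theta ~ N(0,1) prior, proposal g = N(0,2), y|theta ~ N(theta,1), S(y) = y."""
    thetas, weights = [], []
    for _ in range(n_prop):
        theta = random.gauss(0.0, 2.0)        # draw from the proposal g
        y_sim = random.gauss(theta, 1.0)      # ysim ~ f(.|theta)
        # accept with probability K[{S(ysim) - sobs}/h], K bounded by 1
        if random.random() < math.exp(-0.5 * ((y_sim - s_obs) / h) ** 2):
            thetas.append(theta)
            # importance weight pi(theta)/g(theta)
            weights.append(norm_pdf(theta, 0, 1) / norm_pdf(theta, 0, 2))
    return thetas, weights

thetas, weights = fp_abc(s_obs=0.5, h=0.3, n_prop=5000)
post_mean = sum(t * w for t, w in zip(thetas, weights)) / sum(weights)
```

Since K ≤ 1, the acceptance probability needs no extra bounding constant, which is the point of the "(hence the bound)" remark on the slide.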
154. Errors, errors, and errors
Three levels of approximation
π(θ|yobs ) by π(θ|sobs ): loss of information [ignored]
π(θ|sobs ) by
πABC (θ|sobs ) = ∫ π(s)K [{s − sobs }/h]π(θ|s) ds / ∫ π(s)K [{s − sobs }/h] ds :
noisy observations
πABC (θ|sobs ) by importance Monte Carlo based on N
simulations, with error var(a(θ)|sobs )/Nacc [Nacc the expected
number of acceptances]
155. Average acceptance asymptotics
For the average acceptance probability/approximate likelihood
p(θ|sobs ) = ∫ f (ysim |θ) K [{S(ysim ) − sobs }/h] dysim ,
overall acceptance probability
p(sobs ) = ∫ p(θ|sobs ) π(θ) dθ = π(sobs ) h^d + o(h^d )
[FP, Lemma 1]
156. Optimal importance proposal
Best choice of importance proposal in terms of effective sample size
g(θ|sobs ) ∝ π(θ) p(θ|sobs )^1/2
[Not particularly useful in practice]
note that p(θ|sobs ) is an approximate likelihood
reminiscent of parallel tempering
could be approximately achieved by attrition of half of the
data
158. Calibration of h
“This result gives insight into how S(·) and h affect the Monte
Carlo error. To minimize Monte Carlo error, we need h^d to be not
too small. Thus ideally we want S(·) to be a low dimensional
summary of the data that is sufficiently informative about θ that
π(θ|sobs ) is close, in some sense, to π(θ|yobs )” (FP, p.5)
turns h into an absolute value while it should be
context-dependent and user-calibrated
only addresses one term in the approximation error and
acceptance probability (“curse of dimensionality”)
a large h prevents πABC (θ|sobs ) from being close to π(θ|sobs )
a small d prevents π(θ|sobs ) from being close to π(θ|yobs ) (“curse
of [dis]information”)
159. Calibrating ABC
“If πABC is calibrated, then this means that probability statements
that are derived from it are appropriate, and in particular that we
can use πABC to quantify uncertainty in estimates” (FP, p.5)
Definition
For 0 < q < 1 and a subset A, the event Eq (A) is made of the sobs
such that PrABC (θ ∈ A|sobs ) = q. Then ABC is calibrated if
Pr(θ ∈ A|Eq (A)) = q
unclear meaning of conditioning on Eq (A)
161. Calibrated ABC
Theorem (FP)
Noisy ABC, where
sobs = S(yobs ) + hε ,  ε ∼ K (·)
is calibrated
[Wilkinson, 2008]
no condition on h!!
162. Calibrated ABC
Consequence: when h = ∞
Theorem (FP)
The prior distribution is always calibrated
is this a relevant property then?
163. More questions about calibrated ABC
“Calibration is not universally accepted by Bayesians. It is even more
questionable here as we care how statements we make relate to the
real world, not to a mathematically defined posterior.” R. Wilkinson
Same reluctance about the prior being calibrated
Property depending on prior, likelihood, and summary
Calibration is a frequentist property (almost a p-value!)
More sensible to account for the simulator’s imperfections
than using noisy-ABC against a meaningless ε-based measure
[Wilkinson, 2012]
164. Converging ABC
Theorem (FP)
For noisy ABC, the expected noisy-ABC log-likelihood,
E {log[p(θ|sobs )]} = ∫∫ log[p(θ|S(yobs ) + hε)] π(yobs |θ0 )K (ε) dyobs dε ,
has its maximum at θ = θ0 .
True for any choice of summary statistic? Even ancillary statistics?!
[Imposes at least identifiability...]
Relevant in asymptotia and not for the data
165. Converging ABC
Corollary
For noisy ABC, the ABC posterior converges onto a point mass on
the true parameter value as m → ∞.
For standard ABC, this is not always the case (unless h goes to zero).
Strength of regularity conditions (c1) and (c2) in Bernardo &
Smith, 1994?
[out-of-reach constraints on likelihood and posterior]
Again, there must be conditions imposed upon the summary
statistics...
166. Loss motivated statistic
Under the quadratic loss function,
Theorem (FP)
(i) The minimal posterior error E[L(θ, θ̂)|yobs ] occurs when
θ̂ = E(θ|yobs ) (!)
(ii) When h → 0, EABC (θ|sobs ) converges to E(θ|yobs )
(iii) If S(yobs ) = E[θ|yobs ] then for θ̂ = EABC [θ|sobs ]
E[L(θ, θ̂)|yobs ] = trace(AΣ) + h^2 ∫ x^T Ax K (x)dx + o(h^2 ).
measure-theoretic difficulties?
dependence of sobs on h makes me uncomfortable: inherent to noisy
ABC
Relevant for the choice of K ?
167. Optimal summary statistic
“We take a different approach, and weaken the requirement for
πABC to be a good approximation to π(θ|yobs ). We argue for πABC
to be a good approximation solely in terms of the accuracy of
certain estimates of the parameters.” (FP, p.5)
From this result, FP
derive their choice of summary statistic,
S(y) = E(θ|y)
[almost sufficient]
suggest
h = O(N^−1/(2+d) ) and h = O(N^−1/(4+d) )
as optimal bandwidths for noisy and standard ABC.
169. Caveat
Since E(θ|yobs ) is usually unavailable, FP suggest
(i) use a pilot run of ABC to determine a region of non-negligible
posterior mass;
(ii) simulate sets of parameter values and data;
(iii) use the simulated sets of parameter values and data to
estimate the summary statistic; and
(iv) run ABC with this choice of summary statistic.
where is the assessment of the first-stage error?
171. Approximating the summary statistic
As in Beaumont et al. (2002) and Blum and François (2010), FP
use a linear regression to approximate E(θ|yobs ):
θi = β0 + β^(i) f (y) + εi
with f made of standard transforms
[Further justifications?]
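The regression step can be sketched as follows. This is a toy illustration under an assumed model (θ ∼ N(0, 1), y|θ ∼ N(θ, 1), with the single transform f(y) = y), where simulated (θ, y) pairs are used to fit θi = β0 + β1 f(yi) + εi and the fitted line serves as the summary statistic S(y) ≈ E(θ|y):

```python
import random

random.seed(5)

# Pilot-stage regression of theta on f(y), as in the Beaumont et al. / FP idea.
# Toy model (an assumption for illustration): theta ~ N(0,1), y|theta ~ N(theta,1),
# single transform f(y) = y. Here E(theta|y) = y/2 exactly.
n = 20_000
pairs = []
for _ in range(n):
    th = random.gauss(0.0, 1.0)
    y = random.gauss(th, 1.0)
    pairs.append((th, y))

# least-squares fit theta_i = b0 + b1 * y_i + eps_i
my = sum(y for _, y in pairs) / n
mt = sum(t for t, _ in pairs) / n
cov = sum((t - mt) * (y - my) for t, y in pairs) / n
var = sum((y - my) ** 2 for _, y in pairs) / n
b1 = cov / var
b0 = mt - b1 * my

# the fitted summary statistic S(y) = b0 + b1 * y approximates E(theta|y)
```

In this toy case the fit recovers b1 ≈ 1/2 and b0 ≈ 0, i.e. the exact posterior expectation, which is what makes the regression a plausible stand-in for the unavailable E(θ|y).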
172. [my] questions about semi-automatic ABC
dependence on h and S(·) in the early stage
reduction of Bayesian inference to point estimation
approximation error in step (i) not accounted for
not parameterisation invariant
practice shows that a proper approximation to genuine posterior
distributions stems from using a (much) larger number of
summary statistics than the dimension of the parameter
the validity of the approximation to the optimal summary
statistic depends on the quality of the pilot run
important inferential issues like model choice are not covered
by this approach
[X, 2012]
173. More about semi-automatic ABC
[End of section derived from comments on Read Paper, just appeared]
“The apparently arbitrary nature of the choice of summary statistics
has always been perceived as the Achilles heel of ABC.” M.
Beaumont
“Curse of dimensionality” linked with the increase of the
dimension of the summary statistic
Connection with principal component analysis
[Itan et al., 2010]
Connection with partial least squares
[Wegman et al., 2009]
Beaumont et al. (2002) postprocessed output is used as input
by FP to run a second ABC
175. Wood’s alternative
Instead of the non-parametric kernel approximation to the likelihood
(1/R) ∑r K {η(yr ) − η(yobs )},
Wood (2010) suggests a normal approximation
η(y(θ)) ∼ Nd (µθ , Σθ )
whose parameters can be estimated from the R simulations
(for each value of θ).
Parametric versus non-parametric rate [Uh?!]
Automatic weighting of the components of η(·) through Σθ
Dependence on the normality assumption (pseudo-likelihood?)
[Cornebise, Girolami & Kosmidis, 2012]
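Wood's synthetic likelihood is simple to sketch for a scalar summary (d = 1): simulate R summaries at a given θ, fit their mean and variance, and evaluate the Gaussian log-density at the observed summary. The simulator below is an assumed toy (η(y(θ)) ∼ N(θ, 1)), not one from the paper:

```python
import math
import random

random.seed(6)

def synthetic_loglik(s_obs, theta, R):
    """Wood-style synthetic log-likelihood for a scalar summary:
    fit N(mu_theta, sigma_theta^2) to R simulated summaries.
    Toy simulator (an assumption): eta(y(theta)) ~ N(theta, 1)."""
    sims = [random.gauss(theta, 1.0) for _ in range(R)]
    mu = sum(sims) / R                                  # estimated mu_theta
    var = sum((s - mu) ** 2 for s in sims) / (R - 1)    # estimated Sigma_theta
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (s_obs - mu) ** 2 / var

R = 2000
ll_good = synthetic_loglik(s_obs=0.0, theta=0.0, R=R)   # theta near the data
ll_bad = synthetic_loglik(s_obs=0.0, theta=3.0, R=R)    # theta far from the data
```

In higher dimensions µθ becomes a mean vector and Σθ a covariance matrix, and it is Σθ that supplies the automatic weighting of the components of η(·) mentioned above.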
177. Reinterpretation and extensions
Reinterpretation of the ABC output as a joint simulation from
π̄(x, y|θ) = f (x|θ) π̄Y|X (y|x)
where
π̄Y|X (y|x) = Kε (y − x)
Reinterpretation of noisy ABC:
if ȳ|yobs ∼ π̄Y|X (·|yobs ), then marginally
ȳ ∼ π̄Y|θ (·|θ0 )
Explains the consistency of Bayesian inference based on ȳ and π̄
[Lee, Andrieu & Doucet, 2012]