The document discusses Approximate Bayesian Computation (ABC), which enables inference for statistical models whose likelihood function is not available in closed form. ABC works by simulating data under different parameter values and comparing the simulated data to the observed data, and it has also been used for model choice by comparing the evidence for competing models. The consistency of ABC for model choice depends on the criterion used and on the asymptotic identifiability of the parameters.
This document discusses various methods for estimating normalizing constants, the intractable integrals that must be evaluated numerically when normalizing probability densities. It begins by noting that many computational methods for approximating normalizing constants have been developed across different communities. It then lists the topics to be covered in the upcoming workshop, including estimating constants using Monte Carlo methods and Bayesian versus frequentist approaches. The document provides examples of estimating normalizing constants using Monte Carlo integration, reverse logistic regression, and Xiao-Li Meng's maximum likelihood estimation approach. It concludes by discussing some of the challenges in bringing a statistical framework to constant estimation problems.
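As a minimal illustration of the Monte Carlo route mentioned above (not taken from the slides; the unnormalized Gaussian target and the N(0, 4) proposal are illustrative choices), a normalizing constant can be estimated by importance sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    # unnormalized target: exp(-x^2/2); its true normalizing constant is sqrt(2*pi)
    return np.exp(-0.5 * x**2)

def q_pdf(x, scale=2.0):
    # proposal density: N(0, scale^2), deliberately wider than the target
    return np.exp(-0.5 * (x / scale)**2) / (scale * np.sqrt(2 * np.pi))

N = 100_000
x = rng.normal(0.0, 2.0, size=N)          # draws from the proposal
Z_hat = np.mean(p_tilde(x) / q_pdf(x))    # importance-sampling estimate of Z
# Z_hat is close to sqrt(2*pi) ~ 2.5066
```

The same estimator underlies the more refined schemes listed above; reverse logistic regression and Meng's bridge identities mainly reduce its variance when the proposal fits poorly.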
This document discusses nested sampling, a technique for Bayesian computation and evidence evaluation. It begins by introducing Bayesian inference and the evidence integral. It then shows that nested sampling transforms the multidimensional evidence integral into a one-dimensional integral over the prior mass constrained to have likelihood above a given value. The document outlines the nested sampling algorithm and shows that it provides samples from the posterior distribution. It also discusses termination criteria and choices of sample size for the algorithm. Finally, it provides a numerical example of nested sampling applied to a Gaussian model.
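The algorithm can be sketched on a toy problem of the kind the numerical example uses; everything below (uniform prior on [-5, 5], N(0, 1) likelihood, rejection from the prior for the constrained draws) is an illustrative reconstruction, not the document's own code:

```python
import numpy as np

rng = np.random.default_rng(1)

def loglike(theta):
    # Gaussian likelihood N(0, 1) evaluated at a scalar parameter
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)

# uniform prior on [-5, 5]; the evidence is then ~0.1
# (one tenth of the N(0,1) mass, which lies almost entirely in [-5, 5])
lo, hi = -5.0, 5.0
N = 500                                   # number of live points
live = rng.uniform(lo, hi, N)
live_logL = loglike(live)

logZ = -np.inf
n_iter = 3000
for i in range(1, n_iter + 1):
    worst = np.argmin(live_logL)
    logL_min = live_logL[worst]
    # deterministic prior-mass shells X_i = exp(-i/N)
    log_width = np.log(np.exp(-(i - 1) / N) - np.exp(-i / N))
    logZ = np.logaddexp(logZ, logL_min + log_width)
    # replace the worst point by a prior draw constrained to L > L_min
    while True:
        cand = rng.uniform(lo, hi)
        if loglike(cand) > logL_min:
            live[worst] = cand
            live_logL[worst] = loglike(cand)
            break

# add the contribution of the remaining live points, then exponentiate
logZ = np.logaddexp(logZ, -n_iter / N + np.log(np.mean(np.exp(live_logL))))
Z_hat = np.exp(logZ)                      # close to the true evidence 0.1
```

The rejection step is only viable in low dimensions; practical implementations replace it with a constrained MCMC or slice-sampling move.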
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms, by Christian Robert
Aggregate of three different papers on Rao-Blackwellisation, from Casella & Robert (1996), to Douc & Robert (2010), to Banterle et al. (2015), presented during an OxWaSP workshop on MCMC methods, Warwick, Nov 20, 2015
Delayed acceptance for Metropolis-Hastings algorithms, by Christian Robert
The document proposes a delayed acceptance method for accelerating Metropolis-Hastings algorithms. It begins with a motivating example of non-informative inference for mixture models where computing the prior density is costly. It then introduces the delayed acceptance approach which splits the acceptance probability into pieces that are evaluated sequentially, avoiding computing the full acceptance ratio each time. It validates that the delayed acceptance chain is reversible and provides bounds on its spectral gap and asymptotic variance compared to the original chain. Finally, it discusses optimizing the delayed acceptance approach by considering the expected square jump distance and cost per iteration to maximize efficiency.
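A minimal sketch of the two-stage acceptance, with an artificial factorisation of an N(0, 1) target into two equal pieces standing in for the cheap/expensive split (none of this is the paper's actual example):

```python
import numpy as np

rng = np.random.default_rng(2)

# split an N(0,1) target into two factors, pi ∝ f1 * f2;
# in a real use f2 would be the costly piece (e.g. an expensive prior density)
def log_f1(x):
    return -0.25 * x**2

def log_f2(x):
    return -0.25 * x**2

x = 0.0
chain = []
for _ in range(20_000):
    y = x + rng.normal(0.0, 2.0)          # symmetric random-walk proposal
    # stage 1: screen the proposal with the cheap factor only
    if np.log(rng.uniform()) < log_f1(y) - log_f1(x):
        # stage 2: evaluate the expensive factor only for surviving proposals
        if np.log(rng.uniform()) < log_f2(y) - log_f2(x):
            x = y
    chain.append(x)

chain = np.array(chain)
# the chain still targets N(0,1): mean ~ 0, variance ~ 1
```

The product of the two stage-wise acceptance probabilities preserves detailed balance, because the two factor ratios multiply back to the full Metropolis-Hastings ratio.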
1. The document proposes a method for making approximate Bayesian computation (ABC) inferences accurate by modeling the distribution of summary statistics calculated from simulated and observed data.
2. It involves constructing an auxiliary probability space (ρ-space) based on these summary values, and performing classification on ρ-space to determine whether simulated and observed data are from the same population.
3. Indirect inference is then used to link ρ-space back to the original parameter space, allowing the ABC approximation to match the true posterior distribution if the ABC tolerances and number of simulations are properly calibrated.
This document summarizes a talk given by Heiko Strathmann on using partial posterior paths to estimate expectations from large datasets without full posterior simulation. The key ideas are:
1. Construct a path of "partial posteriors" by sequentially adding mini-batches of data and computing expectations over these posteriors.
2. "Debias" the path of expectations to obtain an unbiased estimator of the true posterior expectation using a technique from stochastic optimization literature.
3. This approach allows estimating posterior expectations with sub-linear computational cost in the number of data points, without requiring full posterior simulation or imposing restrictions on the likelihood.
Experiments on synthetic and real-world examples demonstrate competitive performance versus standard MCMC.
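The debiasing idea can be illustrated on a conjugate toy model where each partial posterior expectation is available in closed form; the Gaussian model, the doubling batch schedule, and the geometric truncation below are illustrative choices, not the talk's actual setup:

```python
import numpy as np

rng = np.random.default_rng(3)

# synthetic data: x_i ~ N(mu, 1) with conjugate prior mu ~ N(0, 1)
n = 1024
data = rng.normal(1.5, 1.0, n)

def partial_post_mean(m):
    # exact posterior mean of mu given the first m observations
    return data[:m].sum() / (m + 1)

# partial-posterior path over doubling batch sizes: 1, 2, 4, ..., 1024
sizes = [2**t for t in range(11)]
phi = np.array([partial_post_mean(m) for m in sizes])
deltas = np.diff(phi, prepend=0.0)        # telescoping increments, delta_0 = phi_0

# Glynn-Rhee style debiasing: truncate at a random T, reweight by 1/P(T >= t)
p_geq = 0.5 ** np.arange(len(sizes))      # P(T >= t) = 2^{-t}

def debiased_estimate():
    T = rng.geometric(0.5) - 1            # P(T = t) = 2^{-(t+1)}, t = 0, 1, ...
    T = min(T, len(sizes) - 1)            # the path is exact once all data is used
    return np.sum(deltas[: T + 1] / p_geq[: T + 1])

estimates = np.array([debiased_estimate() for _ in range(20_000)])
full_mean = partial_post_mean(n)
# the average of the unbiased estimates recovers the full-data posterior mean
```

Each single estimate touches only a random prefix of the data, which is where the sub-linear expected cost comes from; the price is extra variance from the reweighting.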
This document discusses Bayesian model comparison in cosmology using population Monte Carlo methods. It provides background on key questions in cosmology that can be addressed using cosmic microwave background data from experiments like WMAP and Planck. Population Monte Carlo and adaptive importance sampling methods are introduced to help approximate Bayesian evidence for different cosmological models given the immense computational challenges of working with this cosmological data.
This document discusses various importance sampling methods for approximating Bayes factors, which are used for Bayesian model selection. It compares regular importance sampling, bridge sampling, harmonic means, bridge sampling with mixtures, and Chib's solution. An example application to probit modeling of diabetes in Pima Indian women illustrates regular importance sampling. Markov chain Monte Carlo methods such as the Metropolis-Hastings algorithm and Gibbs sampling can be used to sample from the probit models.
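For intuition, regular importance sampling of a marginal likelihood can be demonstrated on a conjugate toy model where the answer is known exactly; the one-observation Gaussian model below is an illustrative stand-in for the probit example:

```python
import numpy as np

rng = np.random.default_rng(4)

# toy model: x ~ N(theta, 1), prior theta ~ N(0, 1), one observation;
# the marginal likelihood is available exactly: m(x) = N(x; 0, 2)
x_obs = 1.0
m_true = np.exp(-x_obs**2 / 4) / np.sqrt(4 * np.pi)

def log_lik(theta):
    return -0.5 * (x_obs - theta)**2 - 0.5 * np.log(2 * np.pi)

def log_prior(theta):
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi)

# importance proposal: a Gaussian overdispersed relative to the posterior N(x/2, 1/2)
mu_q, sd_q = x_obs / 2, 1.0
theta = rng.normal(mu_q, sd_q, size=50_000)
log_q = -0.5 * ((theta - mu_q) / sd_q)**2 - np.log(sd_q) - 0.5 * np.log(2 * np.pi)

# average of likelihood x prior / proposal over proposal draws
m_hat = np.mean(np.exp(log_lik(theta) + log_prior(theta) - log_q))
```

The estimator is only as good as the proposal; bridge sampling and the mixture schemes compared in the slides are ways of getting a stable ratio when no single good proposal exists.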
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The historical development of, and connections between, these different approximation techniques for comparing hypotheses via Bayes factors.
"reflections on the probability space induced by moment conditions with impli...", by Christian Robert
This document discusses using moment conditions to perform Bayesian inference when the likelihood function is intractable or unknown. It outlines some approaches that have been proposed, including approximating the likelihood using empirical likelihood or pseudo-likelihoods. However, these approaches do not guarantee the same consistency as a true likelihood. Alternative approximate Bayesian methods are also discussed, such as Approximate Bayesian Computation, Integrated Nested Laplace Approximation, and variational Bayes. The empirical likelihood method constructs a likelihood from generalized moment conditions, but its use in Bayesian inference requires further analysis of consistency in each application.
Maximum likelihood estimation of regularisation parameters in inverse problem..., by Valentin De Bortoli
This document discusses an empirical Bayesian approach for estimating regularization parameters in inverse problems using maximum likelihood estimation. It proposes the Stochastic Optimization with Unadjusted Langevin (SOUL) algorithm, which uses Markov chain sampling to approximate gradients in a stochastic projected gradient descent scheme for optimizing the regularization parameter. The algorithm is shown to converge to the maximum likelihood estimate under certain conditions on the log-likelihood and prior distributions.
The document summarizes a talk given by Mark Girolami on manifold Monte Carlo methods. It discusses using stochastic diffusions and geometric concepts to improve MCMC methods. Specifically, it proposes using discretized Langevin and Hamiltonian diffusions across a Riemann manifold as an adaptive proposal mechanism. This is founded on deterministic geodesic flows on the manifold. Examples presented include a warped bivariate Gaussian, Gaussian mixture model, and log-Gaussian Cox process.
This document discusses using the Wasserstein distance for inference in generative models. It begins by introducing ABC methods that use a distance between samples to compare observed and simulated data. It then discusses using the Wasserstein distance as an alternative distance metric that has lower variance than the Euclidean distance. The document covers computational aspects of calculating the Wasserstein distance, asymptotic properties of minimum Wasserstein estimators, and applications to time series data.
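In one dimension, the Wasserstein distance between two equal-size samples reduces to sorting both samples, which keeps the computation cheap; a small illustrative check (the Gaussian locations and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

def wasserstein_1d(x, y, p=1):
    # for equal-size 1-d samples the optimal coupling matches sorted order
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y)**p) ** (1 / p)

x = rng.normal(0.0, 1.0, 5000)
y = rng.normal(1.0, 1.0, 5000)
d = wasserstein_1d(x, y)
# W1 between N(0,1) and N(1,1) equals the mean shift, so d ~ 1
```

In higher dimensions no sorting trick exists, which is why the computational aspects (exact linear programming versus entropy-regularised approximations) get their own discussion.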
This document discusses prior selection for mixture estimation. It begins by introducing mixture models and their common parameterization. It then discusses several types of weakly informative priors that can be used for mixture models, including empirical Bayes priors, hierarchical priors, and reparameterizations. It notes challenges with using improper priors for mixture models. The document also discusses saturated priors when the number of components is not known beforehand. It covers Jeffreys priors for mixtures and issues around propriety. It proposes some reparameterizations of mixtures, like using moments or a spherical reparameterization, that allow proper Jeffreys-like priors to be defined.
comments on exponential ergodicity of the bouncy particle sampler, by Christian Robert
The document summarizes recent work on establishing theoretical convergence rates for the bouncy particle sampler (BPS), a non-reversible Markov chain Monte Carlo algorithm. The main results show that under certain conditions on the target distribution, including having exponentially decaying tails, the BPS exhibits exponential ergodicity. A central limit theorem is also established. The analysis considers different cases for thin-tailed, thick-tailed, and transformed target distributions.
Approximate Bayesian Computation with Quasi-Likelihoods, by Stefano Cabras
This document describes ABC-MCMC algorithms that use quasi-likelihoods as proposals. It introduces quasi-likelihoods as approximations to true likelihoods that can be estimated from pilot runs. The ABCql algorithm uses a quasi-likelihood estimated from a pilot run as the proposal in an ABC-MCMC algorithm. Examples applying ABCql to mixture of normals, coalescent, and gamma models are provided to demonstrate its effectiveness compared to standard ABC-MCMC.
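For context, a bare-bones likelihood-free MCMC step of the kind ABCql modifies can be sketched as follows; the toy Gaussian model, tolerance, and proposal scale are all illustrative, and the quasi-likelihood proposal itself is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(8)

# toy model: y ~ N(theta, 1) with prior theta ~ N(0, 10^2); summary = sample mean
n = 100
y_obs = rng.normal(2.0, 1.0, n)
s_obs = y_obs.mean()

def log_prior(theta):
    return -0.5 * theta**2 / 100.0

eps = 0.1
theta = s_obs                 # start at the observed summary to skip burn-in
chain = []
for _ in range(20_000):
    prop = theta + rng.normal(0.0, 0.3)          # random-walk proposal
    s_sim = rng.normal(prop, 1.0, n).mean()      # simulate a dataset, summarise it
    # accept only if the prior ratio passes AND the simulation lands in the eps-ball
    if (np.log(rng.uniform()) < log_prior(prop) - log_prior(theta)
            and abs(s_sim - s_obs) < eps):
        theta = prop
    chain.append(theta)

chain = np.array(chain)
# the ABC posterior concentrates near s_obs with spread of order 1/sqrt(n)
```

Replacing the blind random walk with a proposal fitted from a pilot run is exactly the gap the quasi-likelihood construction is meant to fill.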
Bayesian hybrid variable selection under generalized linear models, by Caleb (Shiqiang) Jin
This document presents a method for Bayesian variable selection under generalized linear models. It begins by introducing the model setting and Bayesian model selection framework. It then discusses three algorithms for model search: deterministic search, stochastic search, and a hybrid search method. The key contribution is a method to simultaneously evaluate the marginal likelihoods of all neighbor models, without parallel computing. This is achieved by decomposing the coefficient vectors and estimating additional coefficients conditioned on the current model's coefficients. Newton-Raphson iterations are used to solve the system of equations and obtain the maximum a posteriori estimates for all neighbor models simultaneously in a single computation. This allows for a fast, inexpensive search of the model space.
This document discusses computational issues that arise in Bayesian statistics. It provides examples of latent variable models like mixture models that make computation difficult due to the large number of terms that must be calculated. It also discusses time series models like the AR(p) and MA(q) models, noting that they have complex parameter spaces due to stationarity constraints. The document outlines the Metropolis-Hastings algorithm, Gibbs sampler, and other methods like Population Monte Carlo and Approximate Bayesian Computation that can help address these computational challenges.
Coordinate sampler: A non-reversible Gibbs-like sampler, by Christian Robert
This document describes a new MCMC method called the Coordinate Sampler. It is a non-reversible Gibbs-like sampler based on a piecewise deterministic Markov process (PDMP). The Coordinate Sampler generalizes the Bouncy Particle Sampler by making the bounce direction partly random and orthogonal to the gradient. It is proven that, under certain conditions, the PDMP induced by the Coordinate Sampler has as its unique invariant distribution the product of the target distribution and a uniform distribution on the auxiliary velocity variable. The Coordinate Sampler is also shown to exhibit geometric ergodicity, an important convergence property, under additional regularity conditions on the target distribution.
The document discusses Approximate Bayesian Computation (ABC), a computational technique for Bayesian inference when the likelihood function is intractable. ABC allows sampling from the likelihood and making inferences based on simulated data without calculating the actual likelihood. The technique originated in population genetics models where likelihoods for genetic polymorphism data cannot be calculated in closed form. ABC is presented as both an inference machine with its own legitimacy compared to classical Bayesian approaches, as well as a way to address computational issues with intractable likelihoods.
Approximate Bayesian computation (ABC) is a computational technique for Bayesian inference when the likelihood function is intractable or impossible to compute directly. ABC approximates the likelihood by simulating data under different parameter values and comparing simulated and observed data using summary statistics. ABC produces a parameter sample without evaluating the full likelihood function, thus allowing Bayesian inference when likelihoods are unavailable or difficult to compute.
Approximate Bayesian Computation (ABC) can be used as a new empirical Bayes approach when the likelihood function is not available in closed form. ABC replaces the intractable likelihood with a non-parametric approximation and summarizes data with insufficient statistics. ABC has opened opportunities for new inference machines that are legitimate but different from classical Bayesian approaches, raising questions about how closely ABC relates to Bayesian inference. ABC originated in population genetics where likelihoods are often intractable, and population geneticists have contributed significantly to ABC methodology.
Pittsburgh and Toronto "Halloween US trip" seminars, by Christian Robert
ABC stands for approximate Bayesian computation. ABC methods are used when the likelihood function is intractable or impossible to compute directly. ABC produces approximate samples from the posterior distribution by simulating data under different parameter values and comparing them to the observed data. ABC has been widely applied in population genetics where genealogies make likelihoods difficult to calculate. While ABC provides a practical computational approach, it raises questions about how closely it relates to true Bayesian inference with the raw data.
The document discusses Approximate Bayesian Computations (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or unavailable. ABC originated in population genetics to estimate parameters of demographic and genetic models from genetic data. The ABC algorithm works by simulating data under different parameter values and accepting simulations that are close to the observed data based on a distance measure between summary statistics. Advances include using alternative distance measures and summary statistics to improve the approximation of the true posterior distribution. Questions remain about how to select optimal statistics and thresholds.
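The accept/reject mechanism described above can be sketched in a few lines; the toy Gaussian location model, flat prior range, tolerance, and simulation budget are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(6)

# observed data from N(theta_true, 1); summary statistic = sample mean
theta_true, n = 2.0, 100
y_obs = rng.normal(theta_true, 1.0, n)
s_obs = y_obs.mean()

# ABC rejection: draw from the prior, simulate one dataset per draw,
# keep the parameters whose summary falls within eps of the observed one
eps, M = 0.05, 100_000
theta = rng.uniform(-10.0, 10.0, M)              # flat prior draws
sims = rng.normal(theta, 1.0, size=(n, M))       # one synthetic dataset per column
s_sim = sims.mean(axis=0)
accepted = theta[np.abs(s_sim - s_obs) < eps]

# with a flat prior and small eps, the ABC sample is close to N(s_obs, 1/n)
```

Shrinking eps improves the approximation but lowers the acceptance rate, which is the threshold-selection trade-off the questions above refer to.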
slides of ABC talk at i-like workshop, Warwick, May 16, by Christian Robert
Approximate Bayesian computation (ABC) is a new empirical Bayes method for performing Bayesian inference when the likelihood function is intractable or unavailable in closed form. ABC replaces the likelihood with a non-parametric approximation based on simulating data under different parameter values and comparing simulated and observed data using summary statistics. This allows Bayesian inference to be performed even when direct calculation of the likelihood is not possible. However, ABC introduces an approximation error that is unknown without extensive simulation. Some view ABC as a true Bayesian approach for an estimated or noisy likelihood, while others see it as more of a computational technique that is only approximately Bayesian.
Approximate Bayesian model choice via random forests, by Christian Robert
The document describes approximate Bayesian computation (ABC) methods for model choice when likelihoods are intractable. ABC generates parameter-dataset pairs from the prior and retains those where the simulated and observed datasets are similar according to a distance measure on summary statistics. For model choice, ABC approximates posterior model probabilities by the proportion of simulations from each model that are retained. Machine learning techniques can also be used to infer the most likely model directly from the simulated summary statistics.
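The proportion-based estimate of posterior model probabilities can be sketched as follows, with two Gaussian location models under equal prior weight (all settings illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# observed data, actually generated under model 1
n = 50
y_obs = rng.normal(1.0, 1.0, n)
s_obs = y_obs.mean()

# two candidate models with equal prior probability:
#   model 0: y ~ N(0, 1)      model 1: y ~ N(1, 1)
eps, M = 0.1, 100_000
m = rng.integers(0, 2, M)                        # model index drawn from its prior
sims = rng.normal(m, 1.0, size=(n, M))           # one dataset per model draw
s_sim = sims.mean(axis=0)
kept = m[np.abs(s_sim - s_obs) < eps]

# proportion of retained simulations coming from model 1
p1_hat = kept.mean()        # close to 1 here, as the data clearly favour model 1
```

The random-forest alternative replaces this proportion with a classifier trained on the simulated (model index, summary) pairs, sidestepping the choice of eps.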
The document discusses Approximate Bayesian Computation (ABC) as a new empirical Bayes method for performing inference when the likelihood function is intractable. ABC methods replace the intractable likelihood with a non-parametric approximation by degrading the data precision to a tolerance level and summarizing or replacing the data with insufficient statistics. ABC originated in population genetics models where likelihoods for polymorphism data are often intractable. ABC allows performing inference by simulating data under different parameter values when only the ability to simulate from the likelihood is available.
WSC 2011, advanced tutorial on simulation in Statistics, by Christian Robert
This document discusses recent advances in simulation methods for statistics. It motivates the use of such methods by explaining how latent variable models can make inference computationally difficult. It introduces Monte Carlo integration and the Metropolis-Hastings algorithm as two important simulation techniques. The document also discusses how Bayesian analysis provides a framework to combine prior information with data, but computing the posterior distribution can be challenging for complex models. Simulation methods are presented as a way to approximate solutions to these computationally difficult statistical problems.
Bayesian inference for mixed-effects models driven by SDEs and other stochast..., by Umberto Picchini
An important, and well studied, class of stochastic models is given by stochastic differential equations (SDEs). In this talk, we consider Bayesian inference based on measurements from several individuals, to provide inference at the "population level" using mixed-effects modelling. We consider the case where dynamics are expressed via SDEs or other stochastic (Markovian) models. Stochastic differential equation mixed-effects models (SDEMEMs) are flexible hierarchical models that account for (i) the intrinsic random variability in the latent states dynamics, as well as (ii) the variability between individuals, and also (iii) account for measurement error. This flexibility gives rise to methodological and computational difficulties.
Fully Bayesian inference for nonlinear SDEMEMs is complicated by the typical intractability of the observed data likelihood, which motivates the use of sampling-based approaches such as Markov chain Monte Carlo. A Gibbs sampler is proposed to target the marginal posterior of all parameters of interest. The algorithm is made computationally efficient through careful use of blocking strategies, particle filters (sequential Monte Carlo) and correlated pseudo-marginal approaches. The resulting methodology is flexible, general and able to deal with a large class of nonlinear SDEMEMs [1]. In a more recent work [2], we also explored ways to make inference even more scalable to an increasing number of individuals, while also dealing with state-space models driven by stochastic dynamic models other than SDEs, e.g. Markov jump processes and nonlinear solvers typically used in systems biology.
[1] S. Wiqvist, A. Golightly, AT McLean, U. Picchini (2020). Efficient inference for stochastic differential mixed-effects models using correlated particle pseudo-marginal algorithms, CSDA, https://doi.org/10.1016/j.csda.2020.107151
[2] S. Persson, N. Welkenhuysen, S. Shashkova, S. Wiqvist, P. Reith, G. W. Schmidt, U. Picchini, M. Cvijovic (2021). PEPSDI: Scalable and flexible inference framework for stochastic dynamic single-cell models, bioRxiv doi:10.1101/2021.07.01.450748.
Considerate Approaches to ABC Model Selection — Michael Stumpf
The document discusses using approximate Bayesian computation (ABC) for model selection when directly evaluating likelihoods is computationally intractable, noting that ABC involves simulating data from models and comparing simulated and observed summary statistics, and that constructing minimally sufficient summary statistics is important for accurate ABC model selection.
This document outlines the agenda for the second part of a lecture on Approximate Bayesian Computation (ABC). It begins with a discussion of simulation-based methods in econometrics like simulated method of moments. Next, it discusses the genetic origins and applications of ABC in population genetics, including coalescent theory. The document then covers using indirect inference to provide summary statistics for ABC and estimating demographic parameters from genetic data when the likelihood is intractable.
The document discusses simulation-based methods in econometrics such as simulated method of moments, method of simulated moments, and indirect inference. It then explains how approximate Bayesian computation (ABC) can use indirect inference by treating parameters from an auxiliary model fitted to the data as summary statistics. ABC with indirect inference provides an adaptive, tuning-free approach by embedding it within a sequential Monte Carlo algorithm. Consistency of indirect inference depends on the criterion and asymptotic identifiability of the parameters.
This document discusses approximate Bayesian computation (ABC) for model choice between multiple models. It introduces the ABC algorithm for model choice, which approximates the posterior probabilities of models given the data by simulating parameters from the prior and accepting simulations based on the distance between simulated and observed sufficient statistics. Issues with choosing sufficient statistics that apply to all models are discussed. The document also examines the limiting behavior of the ABC approximation to the Bayes factor as the tolerance approaches 0 and infinity. It notes that discrepancies can arise if sufficient statistics are not cross-model sufficient. An example comparing Poisson and geometric models demonstrates this.
- Approximate Bayesian computation (ABC) is a technique used when the likelihood function is intractable or unavailable. It approximates the Bayesian posterior distribution in a likelihood-free manner.
- ABC works by simulating parameter values from the prior and simulating pseudo-data. Parameter values are accepted if the simulated pseudo-data are "close" to the observed data according to some distance measure and tolerance level.
- ABC originated in population genetics models where genealogies are considered nuisance parameters that cannot be integrated out of the likelihood. It has since been applied to other fields like econometrics for models with complex or undefined likelihoods.
Markov Chain Monte Carlo (MCMC) methods use Markov chains to sample from probability distributions for use in Monte Carlo simulations. The Metropolis-Hastings algorithm proposes transitions to new states in the chain and either accepts or rejects those states based on a probability calculation, allowing it to sample from complex, high-dimensional distributions. The Gibbs sampler is a special case of MCMC where each variable is updated conditional on the current values of the other variables, ensuring all proposed moves are accepted. These MCMC methods allow approximating integrals that are difficult to compute directly.
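As a concrete illustration of the Metropolis–Hastings step described above, here is a minimal random-walk Metropolis sampler in Python; the standard-normal target, the proposal scale, and the burn-in length are illustrative assumptions, not taken from the summarized slides.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_iter, scale=1.0, seed=0):
    """Random-walk Metropolis: propose x' = x + scale*N(0,1) and accept
    with probability min(1, pi(x')/pi(x)); the symmetric proposal cancels
    in the Hastings ratio."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    chain = []
    for _ in range(n_iter):
        prop = x + scale * rng.gauss(0.0, 1.0)
        lp_prop = log_target(prop)
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop        # accept; otherwise keep current state
        chain.append(x)
    return chain

# Target known only up to its normalizing constant: log pi(x) = -x^2/2
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=3.0, n_iter=20000, scale=1.5)
post = chain[5000:]                      # discard burn-in
mean = sum(post) / len(post)             # should sit near 0 for a N(0,1) target
```

Note that only the log-density up to an additive constant is needed, which is precisely why MCMC can bypass intractable normalizing constants.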
For the discovery of a regression relationship between y and x, a vector of p potential predictors, the flexible nonparametric nature of BART (Bayesian Additive Regression Trees) allows for a much richer set of possibilities than restrictive parametric approaches. To exploit the potential monotonicity of the predictors, we introduce mBART, a constrained version of BART that incorporates monotonicity with a multivariate basis of monotone trees, thereby avoiding the further confines of a full parametric form. Using mBART to estimate such effects yields (i) function estimates that are smoother and more interpretable, (ii) better out-of-sample predictive performance and (iii) less post-data uncertainty. By using mBART to simultaneously estimate both the increasing and the decreasing regions of a predictor, mBART opens up a new approach to the discovery and estimation of the decomposition of a function into its monotone components.
(This is joint work with H. Chipman, R. McCulloch and T. Shively).
This document discusses differentially private distributed Bayesian linear regression with Markov chain Monte Carlo (MCMC) methods. It proposes adding noise to the summaries (S) and coefficients (z) of local linear regression models on different devices to provide differential privacy. Gibbs sampling is used to simulate the genuine posterior distribution over the linear model parameters (theta, sigma_y, Sigma_x, z1:J, S1:J) in a distributed manner while maintaining privacy. Alternative approaches like exploiting approximate posteriors from all devices or learning iteratively are also mentioned.
This document discusses mixture models and approximations to computing model evidence. It contains:
1) An overview of mixtures of distributions and common priors used for mixtures.
2) Approximations to computing marginal likelihoods or model evidence using Chib's representation and Rao-Blackwellization. Permutations are used to address label switching issues.
3) Methods for more efficient sampling for computing model evidence, including iterative bridge sampling and dual importance sampling with approximations to reduce the number of permutations considered.
Sequential Monte Carlo is also briefly mentioned as an alternative approach.
This document describes the adaptive restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the restore process, which takes regenerations from an underlying diffusion or jump process to construct a reversible Markov chain with a target distribution. The adaptive restore process enriches this by allowing the regeneration distribution to adapt over time. It converges almost surely to the minimal regeneration distribution. Parameters like the initial regeneration distribution and rates are discussed. Examples are provided for the adaptive Brownian restore algorithm and calibrating the parameters.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document discusses Bayesian restricted likelihood methods for situations where the likelihood cannot be fully trusted. It presents several approaches including empirical likelihood, Bayesian empirical likelihood, using insufficient statistics, approximate Bayesian computation (ABC), and MCMC on manifolds. The key ideas are developing Bayesian tools that are robust to model misspecification by questioning the likelihood, prior, and other assumptions.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide semi-automated ABC.
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
ABC stands for approximate Bayesian computation. It is a method for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces samples from an approximate posterior distribution by simulating parameter and summary statistic values that match the observed summary statistics within a tolerance level. The choice of summary statistics is important but difficult, as there is typically no sufficient statistic. Several strategies have been developed for selecting good summary statistics, including using random forests or the Lasso to evaluate and select from a large set of potential summaries.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may incur a dimension curse.
The document describes Approximate Bayesian Computation (ABC), a technique for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to a distance measure and tolerance level. ABC provides an approximation to the posterior distribution that improves as the tolerance level decreases and more informative summary statistics are used. The document discusses the ABC algorithm, properties of the exact ABC posterior distribution, and challenges in selecting appropriate summary statistics.
The document discusses Approximate Bayesian Computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or unavailable. ABC works by simulating data from the model, accepting simulations that are close to the observed data based on a distance measure and tolerance level. This provides samples from an approximation of the posterior distribution. The document provides examples that motivate ABC and outlines the basic ABC algorithm. It also discusses extensions and improvements to the standard ABC method.
1. Approximate Bayesian Computation (ABC):
model choice and empirical likelihood
Christian P. Robert
Venezia, Oct. 8, 2012
Université Paris-Dauphine, IUF, & CREST
Joint works with J.-M. Cornuet, J.-M. Marin, K.L. Mengersen, N. Pillai, and
P. Pudlo
3. Intractable likelihood
Case of a well-defined statistical model where the likelihood function
ℓ(θ|y) = f(y1, . . . , yn |θ)
• is (really!) not available in closed form
• can (easily!) be neither completed nor demarginalised
• cannot be estimated by an unbiased estimator
© Prohibits direct implementation of a generic MCMC algorithm like Metropolis–Hastings
5. Different perspectives on abc
What is the (most) fundamental issue?
• a mere computational issue (that will eventually be solved by more powerful computers, etc., even if too costly in the short term)
• an inferential issue (opening opportunities for a new inference machine, with a different legitimacy than the classical Bayesian approach)
• a Bayesian conundrum (while inferential methods are available, how closely are they related to the Bayesian approach?)
8. Econom’ections
Similar exploration of simulation-based and approximation techniques in Econometrics
• Simulated method of moments
• Method of simulated moments
• Simulated pseudo-maximum-likelihood
• Indirect inference
[Gouriéroux & Monfort, 1996]
even though the motivation is partly-defined models rather than complex likelihoods
10. Indirect inference
Minimise [in θ] a distance between estimators β̂ based on a pseudo-model for genuine observations and for observations simulated under the true model and the parameter θ.
[Gouriéroux, Monfort, & Renault, 1993; Smith, 1993; Gallant & Tauchen, 1996]
11. Indirect inference (PML vs. PSE)
Example of the pseudo-maximum-likelihood (PML)
β̂(y) = arg max_β Σ_t log f(y_t | β, y_1:(t−1))
leading to
arg min_θ ||β̂(y^o) − β̂(y_1(θ), . . . , y_S(θ))||²
when
y_s(θ) ∼ f(y|θ), s = 1, . . . , S
12. Indirect inference (PML vs. PSE)
Example of the pseudo-score-estimator (PSE)
β̂(y) = arg min_β [ Σ_t ∂ log f/∂β (y_t | β, y_1:(t−1)) ]²
leading to
arg min_θ ||β̂(y^o) − β̂(y_1(θ), . . . , y_S(θ))||²
when
y_s(θ) ∼ f(y|θ), s = 1, . . . , S
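The indirect-inference recipe above can be sketched on a toy model: observed data from an exponential distribution, a Gaussian pseudo-model whose auxiliary estimator β̂ is the (sample mean, sample standard deviation) pair, and a grid search over θ. All numerical choices below (rate 2, S = 20 simulated datasets, common random numbers) are illustrative assumptions, not from the slides.

```python
import random
import statistics as st

def aux_stats(y):
    """Auxiliary estimator beta-hat under a Gaussian pseudo-model:
    sample mean and (population) standard deviation."""
    return (st.mean(y), st.pstdev(y))

def simulate(theta, n, rng):
    """True model: i.i.d. exponential observations with rate theta."""
    return [rng.expovariate(theta) for _ in range(n)]

rng = random.Random(1)
theta_true, n = 2.0, 500
y_obs = simulate(theta_true, n, rng)
b_obs = aux_stats(y_obs)

def distance(theta, S=20):
    """||beta-hat(y_obs) - beta-hat(y_1(theta),...,y_S(theta))||^2, using
    common random numbers across theta values to smooth the criterion."""
    rng_t = random.Random(42)
    sims = [aux_stats(simulate(theta, n, rng_t)) for _ in range(S)]
    b_sim = tuple(st.mean(c) for c in zip(*sims))
    return sum((a - b) ** 2 for a, b in zip(b_obs, b_sim))

grid = [0.5 + 0.05 * k for k in range(71)]      # theta grid on [0.5, 4.0]
theta_hat = min(grid, key=distance)             # indirect-inference estimate
```

Here the auxiliary parameter has dimension 2 and θ has dimension 1, matching the identifiability condition quoted on the next slide.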
13. Consistent indirect inference
“...in order to get a unique solution the dimension of the auxiliary parameter β must be larger than or equal to the dimension of the initial parameter θ. If the problem is just identified the different methods become easier...”
Consistency depending on the criterion and on the asymptotic identifiability of θ
[Gouriéroux & Monfort, 1996, p. 66]
15. Choice of pseudo-model
Arbitrariness of pseudo-model
Pick the model such that
1. β̂(θ) is not flat (i.e. sensitive to changes in θ)
2. β̂(θ) is not dispersed (i.e. robust against changes in y_s(θ))
[Frigessi & Heggland, 2004]
16. Approximate Bayesian computation
Introduction
ABC
Genesis of ABC
ABC basics
Advances and interpretations
ABC as knn
ABC as an inference machine
ABC for model choice
Model choice consistency
ABCel
17. Genetic background of ABC
skip genetics
ABC is a recent computational technique that only requires being able to sample from the likelihood f(·|θ)
This technique stemmed from population genetics models, about 15 years ago, and population geneticists still contribute significantly to methodological developments of ABC.
[Griffith & al., 1997; Tavaré & al., 1999]
18. Demo-genetic inference
Each model is characterized by a set of parameters θ that cover historical (divergence times, admixture times, ...), demographic (population sizes, admixture rates, migration rates, ...) and genetic (mutation rates, ...) factors
The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time
Problem: most of the time, we cannot calculate the likelihood of the polymorphism data f(y|θ)...
20. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium
Mutations according to the Simple stepwise Mutation Model (SMM)
• dates of the mutations ∼ Poisson process with intensity θ/2 over the branches
• MRCA = 100
• independent mutations: ±1 with pr. 1/2
Sample of 8 genes
21. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium
Kingman’s genealogy
When the time axis is normalized, T(k) ∼ Exp(k(k − 1)/2)
Mutations according to the Simple stepwise Mutation Model (SMM)
• dates of the mutations ∼ Poisson process with intensity θ/2 over the branches
• MRCA = 100
• independent mutations: ±1 with pr. 1/2
23. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium
Kingman’s genealogy
When the time axis is normalized, T(k) ∼ Exp(k(k − 1)/2)
Mutations according to the Simple stepwise Mutation Model (SMM)
• dates of the mutations ∼ Poisson process with intensity θ/2 over the branches
• MRCA = 100
• independent mutations: ±1 with pr. 1/2
Observations: leaves of the tree
θ̂ = ?
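The neutral SMM model of the last few slides is straightforward to simulate: forward in time, the epoch with k lineages lasts Exp(k(k − 1)/2), mutations arrive along each branch as a Poisson process with intensity θ/2, and each mutation moves the repeat number by ±1. A minimal Python sketch, where the function names are mine and the forward-splitting construction is an equivalent reformulation of the coalescent:

```python
import math
import random

def poisson(rng, lam):
    """Poisson sampler (Knuth's method); adequate for the small means used here."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def sample_smm(n, theta, mrca=100, seed=3):
    """Simulate n microsatellite alleles under Kingman's coalescent with the
    simple stepwise mutation model: forward in time, the epoch with k lineages
    lasts Exp(k(k-1)/2), and mutations arrive at rate theta/2 per branch."""
    rng = random.Random(seed)
    lineages = [mrca, mrca]                       # just below the MRCA
    for k in range(2, n + 1):
        t = rng.expovariate(k * (k - 1) / 2.0)    # duration of the k-lineage epoch
        for i in range(len(lineages)):
            m = poisson(rng, theta / 2.0 * t)     # mutations on this branch
            lineages[i] += sum(rng.choice((-1, 1)) for _ in range(m))
        if k < n:
            lineages.append(lineages[rng.randrange(k)])  # a uniform lineage splits
    return lineages

sample = sample_smm(n=8, theta=2.0)               # a sample of 8 genes
```

The ease of this simulation, against the intractability of the corresponding likelihood, is exactly what makes the model a natural candidate for ABC.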
24. Much more interesting models. . .
• several independent loci: independent gene genealogies and mutations
• different populations, linked by an evolutionary scenario made of divergences, admixtures, migrations between populations, etc.
• larger sample size: usually between 50 and 100 genes
[Figure: a typical evolutionary scenario relating POP 0, POP 1 and POP 2, with divergence times τ1 and τ2 back to the MRCA]
25. Intractable likelihood
Missing (too missing!) data structure:
f(y|θ) = ∫_G f(y|G, θ) f(G|θ) dG
cannot be computed in a manageable way...
The genealogies are considered as nuisance parameters
This modelling clearly differs from the phylogenetic perspective, where the tree is the parameter of interest.
27. not-so-obvious ancestry...
You went to school to learn, girl (. . . )
Why 2 plus 2 makes four
Now, now, now, I’m gonna teach you (. . . )
All you gotta do is repeat after me!
A, B, C!
It’s easy as 1, 2, 3!
Or simple as Do, Re, Mi! (. . . )
28. A?B?C?
• A stands for approximate [wrong likelihood / picture]
• B stands for Bayesian
• C stands for computation [producing a parameter sample]
30. A?B?C?
[Figure: a grid of posterior density estimates for θ, one panel per ABC run, each annotated with its effective sample size (ESS values ranging from roughly 73 to 134)]
• A stands for approximate [wrong likelihood / picture]
• B stands for Bayesian
• C stands for computation [producing a parameter sample]
31. How Bayesian is aBc?
Could we turn the resolution into a Bayesian answer?
• ideally so (not meaningful: requires an ∞-ly powerful computer)
• asymptotically so (when sample size goes to ∞: meaningful?)
• approximation error unknown (w/o costly simulation)
• true Bayes for the wrong model (formal and artificial)
• true Bayes for an estimated likelihood (back to econometrics?)
32. Untractable likelihood
Back to stage zero: what can we do when a likelihood function f(y|θ) is well-defined but impossible / too costly to compute...?
• MCMC cannot be implemented!
• shall we give up Bayesian inference altogether?!
• or settle for an almost Bayesian inference/picture...?
34. ABC methodology
Bayesian setting: target is π(θ)f(x|θ)
When the likelihood f(x|θ) is not in closed form, likelihood-free rejection technique:
Foundation
For an observation y ∼ f(y|θ), under the prior π(θ), if one keeps jointly simulating
θ′ ∼ π(θ), z ∼ f(z|θ′),
until the auxiliary variable z is equal to the observed value, z = y, then the selected
θ′ ∼ π(θ|y)
[Rubin, 1984; Diggle & Gratton, 1984; Tavaré et al., 1997]
37. A as A...pproximative
When y is a continuous random variable, strict equality z = y is replaced with a tolerance zone
ρ(y, z) ≤ ε
where ρ is a distance
Output distributed from
π(θ) Pθ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε)
[Pritchard et al., 1999]
39. ABC algorithm
In most implementations, a further degree of A...pproximation:
Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
repeat
generate θ′ from the prior distribution π(·)
generate z from the likelihood f(·|θ′)
until ρ{η(z), η(y)} ≤ ε
set θi = θ′
end for
where η(y) defines a (not necessarily sufficient) statistic
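Algorithm 1 translates almost line for line into code. The sketch below uses a toy Gaussian model with η(y) = the sample mean and ρ the absolute difference; the N(0, 3²) prior, the tolerance ε = 0.1, and the sample sizes are illustrative assumptions, not from the slides.

```python
import random
import statistics as st

def abc_reject(y_obs, n_keep, eps, seed=0):
    """Likelihood-free rejection sampler: keep theta' ~ prior whenever the
    simulated summary eta(z) falls within eps of the observed eta(y)."""
    rng = random.Random(seed)
    eta_obs, n = st.mean(y_obs), len(y_obs)           # eta(y): sample mean
    kept = []
    while len(kept) < n_keep:
        theta = rng.gauss(0.0, 3.0)                   # theta' ~ pi, here N(0, 3^2)
        z = [rng.gauss(theta, 1.0) for _ in range(n)] # z ~ f(.|theta')
        if abs(st.mean(z) - eta_obs) < eps:           # rho{eta(z), eta(y)} <= eps
            kept.append(theta)
    return kept

rng = random.Random(42)
y = [rng.gauss(1.5, 1.0) for _ in range(50)]          # "observed" data, true theta = 1.5
post = abc_reject(y, n_keep=500, eps=0.1)             # draws from pi_eps(theta|y)
```

Since the sample mean is sufficient for this toy model, the accepted θ′ values approximate the genuine posterior as ε shrinks, which is exactly the regime analysed on the following slides.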
40. Output
The likelihood-free algorithm samples from the marginal in z of:
πε(θ, z|y) = π(θ) f(z|θ) I_{Aε,y}(z) / ∫_{Aε,y ×Θ} π(θ) f(z|θ) dz dθ ,
where Aε,y = {z ∈ D | ρ(η(z), η(y)) < ε}.
The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution:
πε(θ|y) = ∫ πε(θ, z|y) dz ≈ π(θ|y) .
...does it?!
43. Output
The likelihood-free algorithm samples from the marginal in z of:
π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ,
where A_{ε,y} = {z ∈ D | ρ(η(z), η(y)) < ε}.
The idea behind ABC is that the summary statistics coupled with a
small tolerance should provide a good approximation of the
restricted posterior distribution:
π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|η(y)) .
Not so good..!
44. Convergence of ABC
What happens when ε → 0?
For B ⊂ Θ, we have
∫_B [∫_{A_{ε,y}} f(z|θ) dz] π(θ) dθ / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ
= ∫_{A_{ε,y}} [∫_B f(z|θ) π(θ) dθ] dz / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ
= ∫_{A_{ε,y}} [∫_B f(z|θ) π(θ) dθ / m(z)] m(z) dz / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ
= ∫_{A_{ε,y}} π(B|z) m(z) dz / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ
which indicates convergence (to π(B|y)) for a continuous π(B|z).
46. Convergence (do not attempt!)
...and the above does not apply to insufficient statistics:
If η(y) is not a sufficient statistic, the best one can hope for is
π(θ|η(y)) , not π(θ|y)
If η(y) is an ancillary statistic, the whole information contained in
y is lost: the “best” one can “hope” for is
π(θ|η(y)) = π(θ)
Bummer!!!
50. MA example
Inference on the parameters of a MA(q) model
x_t = ε_t + Σ_{i=1}^q ϑ_i ε_{t−i} ,   (ε_t) i.i.d. white noise
Simple prior: uniform over the inverse [real and complex] roots of
Q(u) = 1 − Σ_{i=1}^q ϑ_i u^i
under the identifiability conditions
51. MA example
Inference on the parameters of a MA(q) model
x_t = ε_t + Σ_{i=1}^q ϑ_i ε_{t−i} ,   (ε_t) i.i.d. white noise
Simple prior: uniform prior over the identifiability zone in the
parameter space, i.e. the triangle for MA(2)
52. MA example (2)
ABC algorithm thus made of
1. picking a new value (ϑ_1, ϑ_2) in the triangle
2. generating an iid sequence (ε_t)_{−q<t≤T}
3. producing a simulated series (x′_t)_{1≤t≤T}
Distance: basic distance between the series
ρ((x_t)_{1≤t≤T}, (x′_t)_{1≤t≤T}) = Σ_{t=1}^T (x_t − x′_t)²
or distance between summary statistics like the q = 2
autocorrelations
τ_j = Σ_{t=j+1}^T x_t x_{t−j}
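The simulation step and the two candidate distances can be sketched as follows; the Gaussian noise and every function name are illustrative stand-ins, not the deck's own code.

```python
import random

# One ABC simulation step for the MA(2) example: simulate
# x_t = e_t + th1*e_{t-1} + th2*e_{t-2} and compare series either through
# the raw sum-of-squares distance or through the tau_1, tau_2 summaries.

def simulate_ma2(th1, th2, T, rng):
    e = [rng.gauss(0, 1) for _ in range(T + 2)]            # iid noise, two warm-up terms
    return [e[t] + th1 * e[t - 1] + th2 * e[t - 2] for t in range(2, T + 2)]

def tau(x, j):
    return sum(x[t] * x[t - j] for t in range(j, len(x)))  # tau_j = sum_t x_t x_{t-j}

def dist_series(x, z):
    return sum((a - b) ** 2 for a, b in zip(x, z))         # raw-series distance

def dist_summary(x, z):
    return sum((tau(x, j) - tau(z, j)) ** 2 for j in (1, 2))

rng = random.Random(0)
y = simulate_ma2(0.6, 0.2, 100, rng)
z = simulate_ma2(0.6, 0.2, 100, rng)
```

Both distances vanish for identical series; the summary distance reduces the comparison to two numbers, which is exactly the dimension reduction at stake in the next slides.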
54. Comparison of distance impact
Impact of tolerance on ABC sample against either distance
(ε = 100%, 10%, 1%, 0.1%) for an MA(2) model
57. Comments
Role of distance paramount (because ε ≠ 0)
Scaling of components of η(y) is also determinant
ε matters little if “small enough”
representative of “curse of dimensionality”
small is beautiful!
the data as a whole may be paradoxically weakly informative
for ABC
58. ABC (simul’) advances
Simulating from the prior is often poor in efficiency
Either modify the proposal distribution on θ to increase the density
of x’s within the vicinity of y ...
[Marjoram et al., 2003; Bortot et al., 2007; Sisson et al., 2007]
...or by viewing the problem as a conditional density estimation
and by developing techniques to allow for larger ε
[Beaumont et al., 2002]
...or even by including ε in the inferential framework [ABCµ]
[Ratmann et al., 2009]
62. ABC-NP
Better usage of [prior] simulations by
adjustment: instead of throwing away
θ′ such that ρ(η(z), η(y)) > ε, replace
θ’s with locally regressed transforms
θ* = θ′ − {η(z) − η(y)}ᵀ β̂
[Csilléry et al., TEE, 2010]
where β̂ is obtained by [NP] weighted least square regression on
(η(z) − η(y)) with weights
K_δ{ρ(η(z), η(y))}
[Beaumont et al., 2002, Genetics]
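A minimal sketch of this local-linear adjustment for scalar θ and scalar summary, using an Epanechnikov kernel; the synthetic "accepted" sample and every name are illustrative, not Beaumont et al.'s actual implementation.

```python
import random

# Local-linear ABC adjustment: regress theta on d = eta(z) - eta(y) by
# weighted least squares with Epanechnikov weights, then shift each draw
# by the fitted slope: theta* = theta - {eta(z) - eta(y)} * beta_hat.

def epanechnikov(u):
    return max(0.0, 1.0 - u * u)                       # K_delta, up to a constant

def adjust(thetas, etas, eta_y, delta):
    d = [e - eta_y for e in etas]                      # eta(z_i) - eta(y)
    w = [epanechnikov(di / delta) for di in d]
    sw = sum(w)
    swd = sum(wi * di for wi, di in zip(w, d))
    swd2 = sum(wi * di * di for wi, di in zip(w, d))
    swt = sum(wi * ti for wi, ti in zip(w, thetas))
    swtd = sum(wi * ti * di for wi, ti, di in zip(w, thetas, d))
    beta = (sw * swtd - swd * swt) / (sw * swd2 - swd ** 2)  # WLS slope
    return [t - di * beta for t, di in zip(thetas, d)]

rng = random.Random(2)
thetas = [rng.gauss(0, 1) for _ in range(200)]         # synthetic accepted draws,
etas = [t + rng.gauss(0, 0.1) for t in thetas]         # with eta(z) tracking theta
adjusted = adjust(thetas, etas, eta_y=0.0, delta=3.0)
```

The adjusted draws concentrate around the value of θ compatible with η(y), which is exactly the point of the correction: simulations far from η(y) are recycled instead of discarded.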
63. ABC-NP (regression)
Also found in the subsequent literature, e.g. in Fearnhead–Prangle (2012):
weight directly simulation by
K_δ{ρ(η(z(θ)), η(y))}
or
(1/S) Σ_{s=1}^S K_δ{ρ(η(z_s(θ)), η(y))}
[consistent estimate of f(η|θ)]
Curse of dimensionality: poor estimate when d = dim(η) is large...
65. ABC-NP (density estimation)
Use of the kernel weights
K_δ{ρ(η(z(θ)), η(y))}
leads to the NP estimate of the posterior expectation
Σ_i θ_i K_δ{ρ(η(z(θ_i)), η(y))} / Σ_i K_δ{ρ(η(z(θ_i)), η(y))}
[Blum, JASA, 2010]
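The kernel-weighted estimate of the posterior expectation can be sketched directly: every prior draw contributes with weight K_δ{ρ(η(z), η(y))} instead of a hard accept/reject. The Gaussian kernel and the Normal toy model are illustrative choices, not the slide's setting.

```python
import random
import statistics

# Nadaraya-Watson type estimate of E[theta|y]:
#   sum_i theta_i * K_delta(rho_i) / sum_i K_delta(rho_i)
# on a toy N(theta, 1) model with the sample mean as summary.

def kernel_posterior_mean(y, N=4000, delta=0.2, n_obs=30, rng=None):
    rng = rng or random.Random(0)
    eta_y = statistics.mean(y)
    kern = statistics.NormalDist(0, delta)            # K_delta
    num = den = 0.0
    for _ in range(N):
        theta = rng.uniform(-5, 5)                    # theta ~ prior
        z = [rng.gauss(theta, 1) for _ in range(n_obs)]
        w = kern.pdf(statistics.mean(z) - eta_y)      # weight of this draw
        num += theta * w
        den += w
    return num / den

rng = random.Random(6)
y = [rng.gauss(1.5, 1) for _ in range(30)]
est = kernel_posterior_mean(y, rng=rng)
```

No simulation is wasted, at the price of the bandwidth δ playing the role of the tolerance ε.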
66. ABC-NP (density estimation)
Use of the kernel weights
K_δ{ρ(η(z(θ)), η(y))}
leads to the NP estimate of the posterior conditional density
Σ_i K̃_b(θ_i − θ) K_δ{ρ(η(z(θ_i)), η(y))} / Σ_i K_δ{ρ(η(z(θ_i)), η(y))}
[Blum, JASA, 2010]
67. ABC-NP (density estimations)
Other versions incorporating regression adjustments
Σ_i K̃_b(θ*_i − θ) K_δ{ρ(η(z(θ_i)), η(y))} / Σ_i K_δ{ρ(η(z(θ_i)), η(y))}
In all cases, error
E[ĝ(θ|y)] − g(θ|y) = c_b b² + c_δ δ² + O_P(b² + δ²) + O_P(1/nδᵈ)
var(ĝ(θ|y)) = c/(n b δᵈ) (1 + o_P(1))
[Blum, JASA, 2010]
70. ABC-NCH
Incorporating non-linearities and heteroscedasticities:
θ* = m̂(η(y)) + [θ − m̂(η(z))] σ̂(η(y)) / σ̂(η(z))
where
m̂(η) estimated by non-linear regression (e.g., neural network)
σ̂(η) estimated by non-linear regression on the residuals
log{θ_i − m̂(η_i)}² = log σ²(η_i) + ξ_i
[Blum & François, 2009]
72. ABC as knn
[Biau et al., 2012, arXiv:1207.6461]
Practice of ABC: determine tolerance as a quantile on observed
distances, say 10% or 1% quantile,
ε = ε_N = q_α(d_1, . . . , d_N)
Interpretation of ε as nonparametric bandwidth only an
approximation of the actual practice
[Blum & François, 2010]
ABC is a k-nearest neighbour (knn) method with k_N = αN
[Loftsgaarden & Quesenberry, 1965]
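The quantile-based practice amounts to keeping the k_N = αN draws whose simulated summaries are nearest to η(y); a sketch on the same Normal toy model, with all names and settings illustrative.

```python
import random
import statistics

# ABC as knn: simulate N (theta, distance) pairs, set the tolerance to the
# alpha-quantile of the distances, and keep the k_N = alpha*N nearest draws.

def abc_knn(y, N=2000, alpha=0.01, n_obs=30, rng=None):
    rng = rng or random.Random(0)
    eta_y = statistics.mean(y)
    draws = []
    for _ in range(N):
        theta = rng.uniform(-5, 5)
        z = [rng.gauss(theta, 1) for _ in range(n_obs)]
        draws.append((abs(statistics.mean(z) - eta_y), theta))
    draws.sort()                         # order by distance d_i
    k = max(1, int(alpha * N))           # k_N nearest neighbours
    eps = draws[k - 1][0]                # eps_N = q_alpha(d_1, ..., d_N)
    return [t for _, t in draws[:k]], eps

rng = random.Random(3)
y = [rng.gauss(1.0, 1) for _ in range(30)]
post, eps = abc_knn(y, rng=rng)
```

Here ε is an output of the simulation budget rather than an input, which is precisely the reinterpretation Biau et al. exploit.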
75. ABC consistency
Provided
k_N / log log N → ∞ and k_N / N → 0
as N → ∞, for almost all s_0 (with respect to the distribution of
S), with probability 1,
(1/k_N) Σ_{j=1}^{k_N} ϕ(θ_j) → E[ϕ(θ_j) | S = s_0]
[Devroye, 1982]
Biau et al. (2012) also recall pointwise and integrated mean square error
consistency results on the corresponding kernel estimate of the
conditional posterior distribution, under the constraints
k_N → ∞, k_N/N → 0, h_N → 0 and h_N^p k_N → ∞
77. Rates of convergence
Further assumptions (on target and kernel) allow for precise
(integrated mean square) convergence rates (as a power of the
sample size N), derived from classical k-nearest neighbour
regression, like
when m = 1, 2, 3, k_N ≈ N^{(p+4)/(p+8)} and rate N^{−4/(p+8)}
when m = 4, k_N ≈ N^{(p+4)/(p+8)} and rate N^{−4/(p+8)} log N
when m > 4, k_N ≈ N^{(p+4)/(m+p+4)} and rate N^{−4/(m+p+4)}
[Biau et al., 2012, arXiv:1207.6461]
Only applies to sufficient summary statistics
79. ABC inference machine
Introduction
ABC
ABC as an inference machine
Error inc.
Exact BC and approximate
targets
summary statistic
ABC for model choice
Model choice consistency
ABCel
80. How much Bayesian aBc is..?
maybe a convergent method of inference (meaningful?
sufficient? foreign?)
approximation error unknown (w/o simulation)
pragmatic Bayes (there is no other solution!)
many calibration issues (tolerance, distance, statistics)
...should Bayesians care?! Yes they should!!!
84. ABCµ
Idea: infer about the error as well as about the parameter:
Use of a joint density
f(θ, ε|y) ∝ ξ(ε|y, θ) × π_θ(θ) × π_ε(ε)
where y is the data, and ξ(ε|y, θ) is the prior predictive density of
ρ(η(z), η(y)) given θ and y when z ∼ f(z|θ)
Warning! Replacement of ξ(ε|y, θ) with a non-parametric kernel
approximation.
[Ratmann, Andrieu, Wiuf and Richardson, 2009, PNAS]
87. ABCµ details
Multidimensional distances ρ_k (k = 1, . . . , K) and errors
ε_k = ρ_k(η_k(z), η_k(y)), with
ε_k ∼ ξ_k(ε|y, θ) ≈ ξ̂_k(ε|y, θ) = (1/(B h_k)) Σ_b K[{ε_k − ρ_k(η_k(z_b), η_k(y))}/h_k]
then used in replacing ξ(ε|y, θ) with min_k ξ̂_k(ε|y, θ)
ABCµ involves acceptance probability
π(θ′, ε′) q(θ′, θ) q(ε′, ε) min_k ξ̂_k(ε′|y, θ′) / [π(θ, ε) q(θ, θ′) q(ε, ε′) min_k ξ̂_k(ε|y, θ)]
91. Wilkinson’s exact BC (not exactly!)
ABC approximation error (i.e. non-zero tolerance) replaced with
exact simulation from a controlled approximation to the target,
convolution of true posterior with kernel function
π_ε(θ, z|y) = π(θ) f(z|θ) K_ε(y − z) / ∫ π(θ) f(z|θ) K_ε(y − z) dz dθ ,
with K_ε kernel parameterised by bandwidth ε.
[Wilkinson, 2008]
Theorem
The ABC algorithm based on the assumption of a randomised
observation ỹ = y + ξ, ξ ∼ K_ε, and an acceptance probability of
K_ε(y − z)/M
gives draws from the posterior distribution π(θ|y).
93. How exact a BC?
“Using ε to represent measurement error is
straightforward, whereas using ε to model the model
discrepancy is harder to conceptualize and not as
commonly used”
[Richard Wilkinson, 2008]
94. How exact a BC?
Pros
Pseudo-data from true model and observed data from noisy
model
Interesting perspective in that outcome is completely
controlled
Link with ABCµ when assuming y is observed with a
measurement error with density K_ε
Relates to the theory of model approximation
[Kennedy & O’Hagan, 2001]
Cons
Requires K to be bounded by M
True approximation error never assessed
Requires a modification of the standard ABC algorithm
95. ABC for HMMs
Specific case of a hidden Markov model
X_{t+1} ∼ Q_θ(X_t, ·)
Y_{t+1} ∼ g_θ(·|x_t)
where only y⁰_{1:n} is observed.
[Dean, Singh, Jasra, & Peters, 2011]
Use of specific constraints, adapted to the Markov structure:
y_1 ∈ B(y⁰_1, ε) × · · · × y_n ∈ B(y⁰_n, ε)
97. ABC-MLE for HMMs
ABC-MLE defined by
θ̂_n = arg max_θ P_θ( Y_1 ∈ B(y⁰_1, ε), . . . , Y_n ∈ B(y⁰_n, ε) )
Exact MLE for the likelihood (same basis as Wilkinson!)
p^ε_θ(y⁰_1, . . . , y⁰_n)
corresponding to the perturbed process
(x_t, y_t + εz_t)_{1≤t≤n} ,   z_t ∼ U(B(0, 1))
[Dean, Singh, Jasra, & Peters, 2011]
99. ABC-MLE is biased
ABC-MLE is asymptotically (in n) biased with target
l_ε(θ) = E_{θ*}[log p^ε_θ(Y_1|Y_{−∞:0})]
but ABC-MLE converges to the true value in the sense
l_ε,n(θ_n) → l_ε(θ)
for all sequences (θ_n) converging to θ
101. Noisy ABC-MLE
Idea: Modify instead the data from the start
(y⁰_1 + εζ_1, . . . , y⁰_n + εζ_n)
[see Fearnhead–Prangle]
noisy ABC-MLE estimate
arg max_θ P_θ( Y_1 ∈ B(y⁰_1 + εζ_1, ε), . . . , Y_n ∈ B(y⁰_n + εζ_n, ε) )
[Dean, Singh, Jasra, & Peters, 2011]
102. Consistent noisy ABC-MLE
Degrading the data improves the estimation performance:
Noisy ABC-MLE is asymptotically (in n) consistent
under further assumptions, the noisy ABC-MLE is
asymptotically normal
increase in variance of order ε⁻²
likely degradation in precision or computing time due to the
lack of summary statistic [curse of dimensionality]
103. SMC for ABC likelihood
Algorithm 2 SMC-ABC for HMMs
Given θ
for k = 1, . . . , n do
generate proposals (x¹_k, y¹_k), . . . , (x^N_k, y^N_k) from the model
weigh each proposal with ω^l_k = I_{B(y⁰_k + ζ_k, ε)}(y^l_k)
renormalise the weights and sample the x^l_k’s accordingly
end for
approximate the likelihood by Π_{k=1}^n ( Σ_{l=1}^N ω^l_k / N )
[Jasra, Singh, Martin, & McCoy, 2010]
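Algorithm 2 can be sketched on a toy Gaussian random-walk HMM; for simplicity this sketch drops the ζ_k perturbation and uses the plain ball B(y⁰_k, ε), and all model choices and names are illustrative.

```python
import random

# SMC approximation of the ABC likelihood for a toy HMM:
#   X_{t+1} = X_t + theta * noise,   Y_t = X_t + noise.
# Hard indicator weights check whether each simulated y^l_k falls in the
# ball B(y0_k, eps); the likelihood estimate is the product of the
# per-step acceptance fractions sum_l omega^l_k / N.

def smc_abc_likelihood(theta, y_obs, eps, N=500, rng=None):
    rng = rng or random.Random(0)
    xs = [0.0] * N
    estimate = 1.0
    for y0 in y_obs:
        xs = [x + rng.gauss(0, theta) for x in xs]             # propagate hidden states
        ys = [x + rng.gauss(0, 1) for x in xs]                 # simulate observations
        w = [1.0 if abs(yk - y0) < eps else 0.0 for yk in ys]  # omega^l_k
        if sum(w) == 0:
            return 0.0                                         # all particles rejected
        estimate *= sum(w) / N                                 # factor sum_l omega^l_k / N
        xs = rng.choices(xs, weights=w, k=N)                   # resample the x^l_k's
    return estimate

rng = random.Random(4)
L = smc_abc_likelihood(1.0, [0.1, 0.2, -0.1, 0.3], eps=1.0, rng=rng)
```

Since each factor is an acceptance fraction in [0, 1], the estimate itself lies in [0, 1]; a zero estimate signals particle collapse, the usual failure mode when ε is too small.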
104. Which summary?
Fundamental difficulty of the choice of the summary statistic when
there is no non-trivial sufficient statistic
Starting from a large collection of available summary statistics,
Joyce and Marjoram (2008) consider their sequential inclusion into
the ABC target, with a stopping rule based on a likelihood ratio
test
Not taking into account the sequential nature of the tests
Depends on parameterisation
Order of inclusion matters
likelihood ratio test?!
107. Which summary for model choice?
Depending on the choice of η(·), the Bayes factor based on this
insufficient statistic,
B^η_12(y) = ∫ π_1(θ_1) f^η_1(η(y)|θ_1) dθ_1 / ∫ π_2(θ_2) f^η_2(η(y)|θ_2) dθ_2 ,
is consistent or not.
[X, Cornuet, Marin, & Pillai, 2012]
Consistency only depends on the range of E_i[η(y)] under both
models.
[Marin, Pillai, X, & Rousseau, 2012]
109. Semi-automatic ABC
Fearnhead and Prangle (2010) study ABC and the selection of the
summary statistic in close proximity to Wilkinson’s proposal
ABC considered as inferential method and calibrated as such
randomised (or ‘noisy’) version of the summary statistics
η̃(y) = η(y) + τε
derivation of a well-calibrated version of ABC, i.e. an
algorithm that gives proper predictions for the distribution
associated with this randomised summary statistic
110. Summary [of F&P/statistics]
optimality of the posterior expectation
E[θ|y]
of the parameter of interest as summary statistic η(y)!
use of the standard quadratic loss function
(θ − θ_0)ᵀ A (θ − θ_0) .
recent extension to model choice, optimality of Bayes factor
B_12(y)
[F&P, ISBA 2012 talk]
112. Conclusion
Choice of summary statistics is paramount for ABC
validation/performance
At best, ABC approximates π(· | η(y))
Model selection feasible with ABC [with caution!]
For estimation, consistency if {θ; µ(θ) = µ_0} = {θ_0}
For testing, consistency if
{µ_1(θ_1), θ_1 ∈ Θ_1} ∩ {µ_2(θ_2), θ_2 ∈ Θ_2} = ∅
[Marin et al., 2011]
117. ABC for model choice
Introduction
ABC
ABC as an inference machine
ABC for model choice
BMC Principle
Gibbs random fields (counterexample)
Generic ABC model choice
Model choice consistency
ABCel
118. Bayesian model choice
BMC Principle
Several models
M1 , M2 , . . .
are considered simultaneously for dataset y and model index M
central to inference.
Use of
prior π(M = m), plus
prior distribution on the parameter conditional on the value m
of the model index, πm (θm )
119. Bayesian model choice
BMC Principle
Several models
M1 , M2 , . . .
are considered simultaneously for dataset y and model index M
central to inference.
Goal is to derive the posterior distribution of M,
π(M = m|data)
a challenging computational target when models are complex.
120. Generic ABC for model choice
Algorithm 3 Likelihood-free model choice sampler (ABC-MC)
for t = 1 to T do
repeat
Generate m from the prior π(M = m)
Generate θm from the prior πm (θm )
Generate z from the model fm (z|θm )
until ρ{η(z), η(y)} < ε
Set m(t) = m and θ(t) = θm
end for
[Grelaud & al., 2009; Toni & al., 2009]
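Algorithm 3 can be sketched on a deliberately simple pair of models, N(θ, 1) versus N(θ, 3), with the sample variance as (insufficient) summary statistic; models, summary, and names are illustrative choices, not the slides' examples.

```python
import random
import statistics

# ABC-MC: jointly simulate (m, theta_m, z) from the prior and keep the
# model index whenever the simulated summary falls within eps of eta(y);
# the acceptance frequencies estimate pi(M = m | y).

def abc_model_choice(y, T=500, eps=0.5, rng=None):
    rng = rng or random.Random(0)
    eta_y = statistics.pvariance(y)
    counts = {1: 0, 2: 0}
    for _ in range(T):
        while True:
            m = rng.choice([1, 2])                    # m ~ pi(M = m)
            theta = rng.uniform(-5, 5)                # theta_m ~ pi_m(theta_m)
            sd = 1.0 if m == 1 else 3.0
            z = [rng.gauss(theta, sd) for _ in range(len(y))]
            if abs(statistics.pvariance(z) - eta_y) < eps:
                counts[m] += 1
                break
    return {m: c / T for m, c in counts.items()}

rng = random.Random(5)
y = [rng.gauss(0.0, 1.0) for _ in range(40)]
probs = abc_model_choice(y, rng=rng)
```

With data simulated from the unit-variance model, model 1 should dominate the acceptance frequencies; whether such a frequency is a trustworthy posterior probability is exactly the question raised in the following slides.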
122. ABC estimates
Posterior probability π(M = m|y) approximated by the frequency
of acceptances from model m
(1/T) Σ_{t=1}^T I_{m(t) = m} .
Extension to a weighted polychotomous logistic regression
estimate of π(M = m|y), with non-parametric kernel weights
[Cornuet et al., DIYABC, 2009]
123. Potts model
Distribution with an energy function of the form
θS(y) = θ Σ_{l∼i} δ_{y_l = y_i}
where l∼i denotes a neighbourhood structure
In most realistic settings, the summation
Z_θ = Σ_{x∈X} exp{θS(x)}
involves too many terms to be manageable and numerical
approximations cannot always be trusted
125. Neighbourhood relations
Setup
Choice to be made between M neighbourhood relations
i ∼_m i′   (0 ≤ m ≤ M − 1)
with
S_m(x) = Σ_{i ∼_m i′} I_{x_i = x_{i′}}
driven by the posterior probabilities of the models.
126. Model index
Computational target:
P(M = m|x) ∝ [∫_{Θ_m} f_m(x|θ_m) π_m(θ_m) dθ_m] π(M = m)
If S(x) sufficient statistic for the joint parameters
(M, θ_0, . . . , θ_{M−1}),
P(M = m|x) = P(M = m|S(x)) .
128. Sufficient statistics in Gibbs random fields
Each model m has its own sufficient statistic S_m(·) and
S(·) = (S_0(·), . . . , S_{M−1}(·)) is also (model-)sufficient.
Explanation: For Gibbs random fields,
x|M = m ∼ f_m(x|θ_m) = f¹_m(x|S(x)) f²_m(S(x)|θ_m)
= (1/n(S(x))) f²_m(S(x)|θ_m)
where
n(S(x)) = #{x̃ ∈ X : S(x̃) = S(x)}
⇒ S(x) is sufficient for the joint parameters
130. Toy example
iid Bernoulli model versus two-state first-order Markov chain, i.e.
f_0(x|θ_0) = exp( θ_0 Σ_{i=1}^n I_{x_i=1} ) / {1 + exp(θ_0)}ⁿ ,
versus
f_1(x|θ_1) = (1/2) exp( θ_1 Σ_{i=2}^n I_{x_i=x_{i−1}} ) / {1 + exp(θ_1)}ⁿ⁻¹ ,
with priors θ_0 ∼ U(−5, 5) and θ_1 ∼ U(0, 6) (inspired by “phase
transition” boundaries).
131. About sufficiency
If η_1(x) is a sufficient statistic for model m = 1 and parameter θ_1 and
η_2(x) is a sufficient statistic for model m = 2 and parameter θ_2,
(η_1(x), η_2(x)) is not always sufficient for (m, θ_m)
⇒ Potential loss of information at the testing level
133. Poisson/geometric example
Sample
x = (x_1, . . . , x_n)
from either a Poisson P(λ) or from a geometric G(p)
Sum
S = Σ_{i=1}^n x_i = η(x)
sufficient statistic for either model but not simultaneously
134. Limiting behaviour of B_12 (T → ∞)
ABC approximation
B̂_12(y) = Σ_{t=1}^T I_{m_t=1} I_{ρ{η(z_t),η(y)}≤ε} / Σ_{t=1}^T I_{m_t=2} I_{ρ{η(z_t),η(y)}≤ε} ,
where the (m_t, z_t)’s are simulated from the (joint) prior
As T goes to infinity, limit
B^ε_12(y) = ∫ I_{ρ{η(z),η(y)}≤ε} π_1(θ_1) f_1(z|θ_1) dz dθ_1 / ∫ I_{ρ{η(z),η(y)}≤ε} π_2(θ_2) f_2(z|θ_2) dz dθ_2
= ∫ I_{ρ{η,η(y)}≤ε} π_1(θ_1) f^η_1(η|θ_1) dη dθ_1 / ∫ I_{ρ{η,η(y)}≤ε} π_2(θ_2) f^η_2(η|θ_2) dη dθ_2 ,
where f^η_1(η|θ_1) and f^η_2(η|θ_2) are the distributions of η(z)
136. Limiting behaviour of B_12 (ε → 0)
When ε goes to zero,
B^η_12(y) = ∫ π_1(θ_1) f^η_1(η(y)|θ_1) dθ_1 / ∫ π_2(θ_2) f^η_2(η(y)|θ_2) dθ_2
⇒ Bayes factor based on the sole observation of η(y)
138. Limiting behaviour of B_12 (under sufficiency)
If η(y) sufficient statistic in both models,
f_i(y|θ_i) = g_i(y) f^η_i(η(y)|θ_i)
Thus
B_12(y) = ∫_{Θ_1} π(θ_1) g_1(y) f^η_1(η(y)|θ_1) dθ_1 / ∫_{Θ_2} π(θ_2) g_2(y) f^η_2(η(y)|θ_2) dθ_2
= [g_1(y)/g_2(y)] ∫ π_1(θ_1) f^η_1(η(y)|θ_1) dθ_1 / ∫ π_2(θ_2) f^η_2(η(y)|θ_2) dθ_2 = [g_1(y)/g_2(y)] B^η_12(y) .
[Didelot, Everitt, Johansen & Lawson, 2011]
⇒ No discrepancy only when cross-model sufficiency
140. Poisson/geometric example (back)
Sample
x = (x_1, . . . , x_n)
from either a Poisson P(λ) or from a geometric G(p)
Discrepancy ratio
\[
\frac{g_1(x)}{g_2(x)} = \frac{S!\, n^{-S} \big/ \prod_i x_i!}{1 \big/ \binom{n+S-1}{S}}
\]
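The discrepancy ratio can be evaluated numerically for small counts; a Python helper, assuming S = Σᵢ xᵢ as in this Poisson/geometric setup (`discrepancy_ratio` is a hypothetical name, not from the slides):

```python
from math import comb, factorial, prod

def discrepancy_ratio(x):
    """g1(x)/g2(x) for the Poisson vs geometric example.

    g1(x) = S! n^{-S} / prod_i x_i!   (Poisson factor, with S = sum_i x_i)
    g2(x) = 1 / C(n+S-1, S)           (geometric factor)
    """
    n, S = len(x), sum(x)
    g1 = factorial(S) / (n ** S * prod(factorial(xi) for xi in x))
    g2 = 1 / comb(n + S - 1, S)
    return g1 / g2

# two samples with the same S = 3 but different ratios,
# showing that the discrepancy depends on x beyond S:
print(discrepancy_ratio([2, 0, 1]))
print(discrepancy_ratio([1, 1, 1]))
```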
143. Formal recovery
Creating an encompassing exponential family
\[
f(x|\theta_1,\theta_2,\alpha_1,\alpha_2) \propto \exp\{\theta_1^{\mathrm{T}}\eta_1(x) + \theta_2^{\mathrm{T}}\eta_2(x) + \alpha_1 t_1(x) + \alpha_2 t_2(x)\}
\]
leads to a sufficient statistic (η_1(x), η_2(x), t_1(x), t_2(x))
[Didelot, Everitt, Johansen & Lawson, 2011]
In the Poisson/geometric case, if ∏_i x_i! is added to S, there is no discrepancy
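In log form, the augmented statistic for the Poisson/geometric case (S together with ∏ᵢ xᵢ!) is immediate to compute; a minimal sketch, with `encompassing_summary` a hypothetical helper name:

```python
from math import lgamma

def encompassing_summary(x):
    """Augmented summary (S, log prod_i x_i!) for the Poisson/geometric pair.

    Adding prod_i x_i! (here in log form) to S yields a statistic that is
    sufficient across both models, per the encompassing-family argument.
    """
    S = sum(x)
    log_fact = sum(lgamma(xi + 1) for xi in x)   # log(x_i!) = lgamma(x_i + 1)
    return S, log_fact
```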
144. Formal recovery
Creating an encompassing exponential family
\[
f(x|\theta_1,\theta_2,\alpha_1,\alpha_2) \propto \exp\{\theta_1^{\mathrm{T}}\eta_1(x) + \theta_2^{\mathrm{T}}\eta_2(x) + \alpha_1 t_1(x) + \alpha_2 t_2(x)\}
\]
leads to a sufficient statistic (η_1(x), η_2(x), t_1(x), t_2(x))
[Didelot, Everitt, Johansen & Lawson, 2011]
Only applies in genuine sufficiency settings...
© Inability to evaluate the information loss due to the summary statistics
145. Meaning of the ABC-Bayes factor
The ABC approximation to the Bayes factor is based solely on the summary statistics...
In the Poisson/geometric case, if E[y_i] = θ_0 > 0,
\[
\lim_{n\to\infty} B_{12}^{\eta}(y) = \frac{(\theta_0+1)^2}{\theta_0}\, e^{-\theta_0}
\]
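Since this limit is finite and strictly positive for every θ_0 > 0, the summary-based Bayes factor can never diverge to 0 or ∞, hence never separates the models consistently. Evaluating the limit is immediate (a small illustration, not from the slides):

```python
import math

def limit_abc_bf(theta0):
    """Limiting value (theta0 + 1)^2 / theta0 * exp(-theta0) of B12^eta."""
    return (theta0 + 1.0) ** 2 / theta0 * math.exp(-theta0)

# finite and strictly positive for every theta0 > 0: no consistent separation
for t0 in (0.5, 1.0, 2.0, 5.0):
    print(t0, limit_abc_bf(t0))
```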
147. MA example
[Figure: four barplots of model frequencies]
Evolution [against ε] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right), when ε is equal to the 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from an MA(2) model with θ_1 = 0.6, θ_2 = 0.2. True Bayes factor equal to 17.71.
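The experiment behind this figure can be sketched as ABC model choice between MA(1) and MA(2) with the first two autocovariances as (insufficient) summaries. A self-contained Python sketch; the uniform(−1, 1) priors on the MA coefficients and the tolerance are my assumptions, not the slides' exact setup:

```python
import random

def ma_sample(thetas, n=50):
    """Simulate an MA(q) series x_t = e_t + sum_j thetas[j] * e_{t-1-j}."""
    q = len(thetas)
    e = [random.gauss(0.0, 1.0) for _ in range(n + q)]
    return [e[t + q] + sum(th * e[t + q - 1 - j] for j, th in enumerate(thetas))
            for t in range(n)]

def autocov(x, lag):
    """Empirical autocovariance at the given lag."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / n

def abc_ma_choice(x_obs, T=20_000, eps=0.5):
    """Frequencies of visits to MA(1) vs MA(2) under an autocovariance distance."""
    s_obs = (autocov(x_obs, 1), autocov(x_obs, 2))
    counts = {1: 0, 2: 0}
    for _ in range(T):
        m = random.choice((1, 2))
        thetas = [random.uniform(-1, 1) for _ in range(m)]  # hypothetical prior
        z = ma_sample(thetas)
        s = (autocov(z, 1), autocov(z, 2))
        if (s[0] - s_obs[0]) ** 2 + (s[1] - s_obs[1]) ** 2 <= eps ** 2:
            counts[m] += 1
    total = counts[1] + counts[2]
    return {m: c / max(total, 1) for m, c in counts.items()}
```

In practice ε would be set from empirical quantiles of the simulated distances, as in the figure, rather than fixed in advance.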
148. MA example
[Figure: four barplots of model frequencies]
Evolution [against ε] of the ABC Bayes factor, in terms of frequencies of visits to models MA(1) (left) and MA(2) (right), when ε is equal to the 10, 1, .1, .01% quantiles on insufficient autocovariance distances. Sample of 50 points from an MA(1) model with θ_1 = 0.6. True Bayes factor B21 equal to .004.
149. The only safe cases??? [circa April 2011]
Besides specific models like Gibbs random fields,
using distances over the data itself escapes the discrepancy...
[Toni & Stumpf, 2010; Sousa & al., 2009]
...and so does the use of more informal model fitting measures
[Ratmann & al., 2009]
...or use another type of approximation like empirical likelihood
[Mengersen et al., 2012, see Kerrie’s ASC 2012 talk]
152. ABC model choice consistency
Introduction
ABC
ABC as an inference machine
ABC for model choice
Model choice consistency
Formalised framework
Consistency results
Summary statistics
ABCel
153. The starting point
Central question to the validation of ABC for model choice:
When is a Bayes factor based on an insufficient statistic T(y) consistent?
Note/warning: the conclusion drawn on T(y) through B_12^T(y) necessarily differs from the conclusion drawn on y through B_12(y)
[Marin, Pillai, X, & Rousseau, arXiv, 2012]
155. A benchmark, if toy, example
Comparison suggested by a referee of the PNAS paper [thanks!]:
[X, Cornuet, Marin, & Pillai, Aug. 2011]
Model M1: y ∼ N(θ_1, 1), opposed to model M2: y ∼ L(θ_2, 1/√2), the Laplace distribution with mean θ_2 and scale parameter 1/√2 (variance one).
156. A benchmark, if toy, example
Comparison suggested by a referee of the PNAS paper [thanks!]:
[X, Cornuet, Marin, & Pillai, Aug. 2011]
Model M1: y ∼ N(θ_1, 1), opposed to model M2: y ∼ L(θ_2, 1/√2), the Laplace distribution with mean θ_2 and scale parameter 1/√2 (variance one).
Four possible statistics:
1. sample mean ȳ (sufficient for M1 if not for M2);
2. sample median med(y) (insufficient);
3. sample variance var(y) (ancillary);
4. median absolute deviation mad(y) = med(|y − med(y)|);
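The four candidate statistics are easy to compute from scratch; a self-contained Python helper (a sketch, with `summaries` a hypothetical name; the slides do not prescribe this implementation):

```python
def summaries(y):
    """The four candidate statistics: mean, median, variance, and MAD."""
    n = len(y)
    s = sorted(y)
    med = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2.0
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)        # unbiased variance
    dev = sorted(abs(v - med) for v in y)                  # |y_i - med(y)|
    mad = dev[n // 2] if n % 2 else (dev[n // 2 - 1] + dev[n // 2]) / 2.0
    return mean, med, var, mad
```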
157. A benchmark, if toy, example
Comparison suggested by a referee of the PNAS paper [thanks!]:
[X, Cornuet, Marin, & Pillai, Aug. 2011]
Model M1: y ∼ N(θ_1, 1), opposed to model M2: y ∼ L(θ_2, 1/√2), the Laplace distribution with mean θ_2 and scale parameter 1/√2 (variance one).
[Figure: boxplots for the Gauss and Laplace samples, n = 100]
158. A benchmark, if toy, example
Comparison suggested by a referee of the PNAS paper [thanks!]:
[X, Cornuet, Marin, & Pillai, Aug. 2011]
Model M1: y ∼ N(θ_1, 1), opposed to model M2: y ∼ L(θ_2, 1/√2), the Laplace distribution with mean θ_2 and scale parameter 1/√2 (variance one).
[Figure: two panels of boxplots for the Gauss and Laplace samples, n = 100]
159. Framework
Starting from the observed sample
y = (y_1, . . . , y_n),
not necessarily iid, with true distribution
y ∼ Pn
Summary statistics
T(y) = Tn = (T1 (y), T2 (y), · · · , Td (y)) ∈ Rd
with true distribution Tn ∼ Gn .
160. Framework
c Comparison of
– under M1 , y ∼ F1,n (·|θ1 ) where θ1 ∈ Θ1 ⊂ Rp1
– under M2 , y ∼ F2,n (·|θ2 ) where θ2 ∈ Θ2 ⊂ Rp2
turned into
– under M1 , T(y) ∼ G1,n (·|θ1 ), and θ1 |T(y) ∼ π1 (·|Tn )
– under M2 , T(y) ∼ G2,n (·|θ2 ), and θ2 |T(y) ∼ π2 (·|Tn )
161. Assumptions
A collection of asymptotic “standard” assumptions:
[A1] is a standard central limit theorem under the true model
[A2] controls the large deviations of the estimator Tn from the
estimand µ(θ)
[A3] is the standard prior mass condition found in Bayesian
asymptotics (di effective dimension of the parameter)
[A4] restricts the behaviour of the model density against the true
density
[Think CLT!]
162. Assumptions
A collection of asymptotic “standard” assumptions:
[Think CLT!]
[A1] There exist
a sequence {vn } converging to +∞,
a distribution Q,
a symmetric, d × d positive definite matrix V0 and
a vector µ0 ∈ Rd
such that
\[
v_n\, V_0^{-1/2}\,(T^n - \mu_0) \;\longrightarrow\; Q \quad \text{under } G_n, \text{ as } n \to \infty
\]
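Assumption [A1] can be checked by simulation in a simple case; a sketch with T^n the mean of n Exp(1) variables, so that v_n = √n, µ_0 = 1, V_0 = 1, and the limit Q is the standard normal (my choice of illustration, not the slides' model):

```python
import math
import random

random.seed(0)

def standardized_summary(n, reps=2000):
    """Draws of v_n V0^{-1/2} (T^n - mu0) for T^n the mean of n Exp(1) draws.

    Illustration of [A1]: v_n = sqrt(n), mu0 = 1, V0 = 1, limit Q = N(0, 1).
    """
    draws = []
    for _ in range(reps):
        tn = sum(random.expovariate(1.0) for _ in range(n)) / n
        draws.append(math.sqrt(n) * (tn - 1.0))
    return draws

z = standardized_summary(200)
mean_z = sum(z) / len(z)
var_z = sum((x - mean_z) ** 2 for x in z) / len(z)
# mean_z should be near 0 and var_z near 1 for large n and reps
```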