This document discusses importance sampling methods for approximating Bayes factors, which are used for Bayesian model selection. It compares regular importance sampling, bridge sampling, harmonic mean estimators, mixture extensions of bridge sampling, and Chib's solution. An example application, probit modelling of diabetes in Pima Indian women, illustrates regular importance sampling; Markov chain Monte Carlo methods such as the Metropolis-Hastings algorithm and Gibbs sampling can be used to sample from the probit models.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The historical development of these approximation techniques and the connections between them when comparing hypotheses via Bayes factors.
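As a toy illustration of the basic importance sampling identity underlying these estimators (not taken from the document), the sketch below estimates the marginal likelihood of a conjugate normal-normal model, where the answer is available in closed form; the observation y, the proposal, and the sample size are all arbitrary choices:

```python
import math, random

random.seed(1)

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# model: theta ~ N(0, 1), y | theta ~ N(theta, 1); the marginal is N(y; 0, 2)
y, N = 1.3, 100_000

est = 0.0
for _ in range(N):
    th = random.gauss(y / 2, 1.0)      # proposal roughly matching the posterior
    # importance weight: prior times likelihood over proposal density
    est += norm_pdf(y, th, 1.0) * norm_pdf(th, 0.0, 1.0) / norm_pdf(th, y / 2, 1.0) / N

exact = norm_pdf(y, 0.0, math.sqrt(2.0))   # closed-form marginal likelihood
print(est, exact)
```

The same weights feed into bridge sampling estimates once a second sample, from the posterior, is available.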
comments on exponential ergodicity of the bouncy particle sampler, by Christian Robert
The document summarizes recent work on establishing theoretical convergence rates for the bouncy particle sampler (BPS), a non-reversible Markov chain Monte Carlo algorithm. The main results show that under certain conditions on the target distribution, including having exponentially decaying tails, the BPS exhibits exponential ergodicity. A central limit theorem is also established. The analysis considers different cases for thin-tailed, thick-tailed, and transformed target distributions.
This document discusses Bayesian model comparison in cosmology using population Monte Carlo methods. It provides background on key questions in cosmology that can be addressed using cosmic microwave background data from experiments like WMAP and Planck. Population Monte Carlo and adaptive importance sampling methods are introduced to help approximate Bayesian evidence for different cosmological models given the immense computational challenges of working with this cosmological data.
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may incur a dimension curse.
This document discusses prior selection for mixture estimation. It begins by introducing mixture models and their common parameterization. It then discusses several types of weakly informative priors that can be used for mixture models, including empirical Bayes priors, hierarchical priors, and reparameterizations. It notes challenges with using improper priors for mixture models. The document also discusses saturated priors when the number of components is not known beforehand. It covers Jeffreys priors for mixtures and issues around propriety. It proposes some reparameterizations of mixtures, like using moments or a spherical reparameterization, that allow proper Jeffreys-like priors to be defined.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
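A minimal sketch of the ABC rejection sampler with a quantile-calibrated tolerance, on a toy normal model with the sample mean as summary statistic (the model, prior range, and acceptance quantile are illustrative assumptions, not taken from the document):

```python
import random, statistics

random.seed(2)

n, theta0 = 50, 2.0
obs = [random.gauss(theta0, 1.0) for _ in range(n)]
s_obs = statistics.mean(obs)                  # observed summary statistic

# simulate (parameter, summary) pairs from the prior predictive
sims = []
for _ in range(10_000):
    th = random.uniform(-5.0, 5.0)            # draw from the prior
    s = statistics.mean(random.gauss(th, 1.0) for _ in range(n))
    sims.append((abs(s - s_obs), th))         # distance between summaries

# tolerance calibrated as the 1% quantile of the simulated distances
sims.sort()
accepted = [th for _, th in sims[:100]]
print(statistics.mean(accepted))              # roughly the posterior mean
```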
This document discusses various methods for estimating normalizing constants that arise when evaluating integrals numerically. It begins by noting there are many computational methods for approximating normalizing constants across different communities. It then lists the topics that will be covered in the upcoming workshop, including discussions on estimating constants using Monte Carlo methods and Bayesian versus frequentist approaches. The document provides examples of estimating normalizing constants using Monte Carlo integration, reverse logistic regression, and Xiao-Li Meng's maximum likelihood estimation approach. It concludes by discussing some of the challenges in bringing a statistical framework to constant estimation problems.
ABC convergence under well- and mis-specified models, by Christian Robert
1. Approximate Bayesian computation (ABC) is a simulation-based method for performing Bayesian inference when the likelihood function is intractable or unavailable. ABC works by simulating data from the model, accepting simulations where the simulated and observed data are close according to some distance measure.
2. Advances in ABC include modifying the proposal distribution to increase efficiency, viewing it as a conditional density estimation problem to allow for larger tolerances, and including a tolerance parameter in the inferential framework.
3. Recent studies have analyzed the asymptotic properties of ABC, showing the posterior distributions and means can be consistent under certain conditions on the summary statistics and tolerance decreasing rates.
1. The document proposes a method for making approximate Bayesian computation (ABC) inferences accurate by modeling the distribution of summary statistics calculated from simulated and observed data.
2. It involves constructing an auxiliary probability space (ρ-space) based on these summary values, and performing classification on ρ-space to determine whether simulated and observed data are from the same population.
3. Indirect inference is then used to link ρ-space back to the original parameter space, allowing the ABC approximation to match the true posterior distribution if the ABC tolerances and number of simulations are properly calibrated.
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide a semi-automated version of ABC.
This document discusses using the Wasserstein distance for inference in generative models. It begins by introducing ABC methods that use a distance between samples to compare observed and simulated data. It then discusses using the Wasserstein distance as an alternative distance metric that has lower variance than the Euclidean distance. The document covers computational aspects of calculating the Wasserstein distance, asymptotic properties of minimum Wasserstein estimators, and applications to time series data.
This document discusses using the Wasserstein distance for inference in generative models. It begins with an overview of approximate Bayesian computation (ABC) and how distances between samples are used. It then introduces the Wasserstein distance as an alternative distance that can have lower variance than the Euclidean distance. Computational aspects and asymptotics of using the Wasserstein distance are discussed. The document also covers how transport distances can handle time series data.
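In one dimension the Wasserstein distance between two equal-size samples reduces to matching sorted order statistics, which gives a cheap sketch of the distance discussed here (the Gaussian samples are an arbitrary toy choice):

```python
import random

random.seed(3)

def wasserstein1(xs, ys):
    # with equal sample sizes the optimal coupling pairs sorted order statistics
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

obs = [random.gauss(0.0, 1.0) for _ in range(5000)]
sim = [random.gauss(1.0, 1.0) for _ in range(5000)]
print(wasserstein1(obs, sim))   # near the true W1 distance of 1 (a pure mean shift)
```

In higher dimensions the sorting trick no longer applies and the distance requires solving an optimal transport problem.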
random forests for ABC model choice and parameter estimation, by Christian Robert
The document discusses Approximate Bayesian Computation (ABC). It begins by introducing ABC as a likelihood-free method for Bayesian inference when the likelihood function is unavailable or computationally intractable. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data based on a distance measure.
The document then discusses advances in ABC, including modifying the proposal distribution to increase efficiency, viewing it as a conditional density estimation problem, and including measurement error in the framework. It also discusses the consistency of ABC as the number of simulations increases and sample size grows large. Finally, it discusses applications of ABC to model selection by treating the model index as an additional parameter.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
This document summarizes approximate Bayesian computation (ABC) methods. It begins with an overview of ABC, which provides a likelihood-free rejection technique for Bayesian inference when the likelihood function is intractable. The ABC algorithm works by simulating parameters and data until the simulated and observed data are close according to some distance measure and tolerance level. The document then discusses the asymptotic properties of ABC, including consistency of ABC posteriors and rates of convergence under certain assumptions. It also notes relationships between ABC and k-nearest neighbor methods. Examples applying ABC to autoregressive time series models are provided.
Multiple estimators for Monte Carlo approximations, by Christian Robert
This document discusses multiple estimators that can be used to approximate integrals using Monte Carlo simulations. It begins by introducing concepts like multiple importance sampling, Rao-Blackwellisation, and delayed acceptance that allow combining multiple estimators to improve accuracy. It then discusses approaches like mixtures as proposals, global adaptation, and nonparametric maximum likelihood estimation (NPMLE) that frame Monte Carlo estimation as a statistical estimation problem. The document notes various advantages of the statistical formulation, like the ability to directly estimate simulation error from the Fisher information. Overall, the document presents an overview of different techniques for combining Monte Carlo simulations to obtain more accurate integral approximations.
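One concrete instance of combining several importance distributions is the deterministic-mixture (balance heuristic) weighting, where each draw is weighted against the mixture of all proposals rather than the single proposal it came from. The sketch below is a toy example with a known Gaussian target and two arbitrary proposals, not an implementation from the document:

```python
import math, random

random.seed(4)

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

target = lambda x: norm_pdf(x, 0.0, 1.0)   # known target density
h = lambda x: x * x                        # integrand, with E[h] = 1 under the target
props = [(-1.0, 1.0), (1.0, 1.0)]          # two overlapping proposal components
N = 50_000                                 # draws per proposal

total = 0.0
for mu, sd in props:
    for _ in range(N):
        x = random.gauss(mu, sd)
        # balance heuristic: weight by the equal-weight mixture of all proposals
        qmix = sum(norm_pdf(x, m, s) for m, s in props) / len(props)
        total += h(x) * target(x) / qmix
est = total / (N * len(props))
print(est)
```

The mixture weights keep the variance bounded even when a single proposal would be a poor match for the target.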
better together? statistical learning in models made of modules, by Christian Robert
The document discusses statistical models composed of modular components called modules. Each module may be developed independently and represent different data modalities or domains of knowledge. Joint Bayesian updating treats all modules simultaneously but misspecification of one module can impact the others. Alternative approaches are proposed to allow uncertainty propagation between modules while preventing feedback that could lead to misspecification. Candidate distributions for the modules are discussed, along with strategies for choosing among them based on predictive performance.
The document discusses Approximate Bayesian Computation (ABC). ABC allows inference for statistical models where the likelihood function is not available in closed form. ABC works by simulating data under different parameter values and comparing simulated to observed data. ABC has been used for model choice by comparing evidence for different models. Consistency of ABC for model choice depends on the criterion used and asymptotic identifiability of the parameters.
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
Approximate Bayesian model choice via random forests, by Christian Robert
The document describes approximate Bayesian computation (ABC) methods for model choice when likelihoods are intractable. ABC generates parameter-dataset pairs from the prior and retains those where the simulated and observed datasets are similar according to a distance measure on summary statistics. For model choice, ABC approximates posterior model probabilities by the proportion of simulations from each model that are retained. Machine learning techniques can also be used to infer the most likely model directly from the simulated summary statistics.
Delayed acceptance for Metropolis-Hastings algorithms, by Christian Robert
The document proposes a delayed acceptance method for accelerating Metropolis-Hastings algorithms. It begins with a motivating example of non-informative inference for mixture models where computing the prior density is costly. It then introduces the delayed acceptance approach which splits the acceptance probability into pieces that are evaluated sequentially, avoiding computing the full acceptance ratio each time. It validates that the delayed acceptance chain is reversible and provides bounds on its spectral gap and asymptotic variance compared to the original chain. Finally, it discusses optimizing the delayed acceptance approach by considering the expected square jump distance and cost per iteration to maximize efficiency.
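The two-stage acceptance step can be sketched on a toy factorised target (both factors are cheap here; in the motivating mixture example the second factor would be the costly prior density). The factorisation and proposal are illustrative assumptions:

```python
import math, random

random.seed(5)

# target N(0, 1) written as a product of two factors; pretend f2 is expensive
f1 = lambda x: math.exp(-0.25 * x * x)
f2 = lambda x: math.exp(-0.25 * x * x)

x, chain = 0.0, []
for _ in range(200_000):
    y = x + random.uniform(-1.0, 1.0)          # symmetric random-walk proposal
    # stage 1: cheap ratio; an early rejection skips the expensive factor entirely
    if random.random() < min(1.0, f1(y) / f1(x)):
        # stage 2: expensive ratio, evaluated only for stage-1 survivors
        if random.random() < min(1.0, f2(y) / f2(x)):
            x = y
    chain.append(x)

m = sum(chain) / len(chain)
v = sum(c * c for c in chain) / len(chain) - m * m
print(m, v)    # should approach the N(0, 1) moments 0 and 1
```

The product of the two partial acceptance probabilities preserves detailed balance, which is the reversibility property validated in the document.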
The document describes Approximate Bayesian Computation (ABC), a technique for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to a distance measure and tolerance level. ABC provides an approximation to the posterior distribution that improves as the tolerance level decreases and more informative summary statistics are used. The document discusses the ABC algorithm, properties of the exact ABC posterior distribution, and challenges in selecting appropriate summary statistics.
Coordinate sampler: A non-reversible Gibbs-like sampler, by Christian Robert
This document describes a new MCMC method called the Coordinate Sampler. It is a non-reversible Gibbs-like sampler based on a piecewise deterministic Markov process (PDMP). The Coordinate Sampler generalizes the Bouncy Particle Sampler by making the bounce direction partly random and orthogonal to the gradient. It is proven that under certain conditions, the PDMP induced by the Coordinate Sampler has a unique invariant distribution of the target distribution multiplied by a uniform auxiliary variable distribution. The Coordinate Sampler is also shown to exhibit geometric ergodicity, an important convergence property, under additional regularity conditions on the target distribution.
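In one dimension the bouncy particle, zig-zag, and coordinate samplers essentially coincide, so the piecewise deterministic dynamics can be sketched for a standard normal target, where the event rate max(0, v x) can be inverted in closed form (a toy illustration under these assumptions, not the general algorithm of the paper):

```python
import math, random

random.seed(7)

# 1-d standard normal target: the event rate along the flow is max(0, v * x(s)),
# whose integrated hazard can be inverted exactly to get the next event time
x, v = 0.0, 1.0
total_t, m2 = 0.0, 0.0
for _ in range(200_000):
    a = v * x
    E = random.expovariate(1.0)
    t = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * E)   # exact next event time
    # time integral of x(s)^2 = (x + v s)^2 over the deterministic segment
    m2 += x * x * t + x * v * t * t + t ** 3 / 3.0
    total_t += t
    x += v * t                                       # move to the event location
    v = -v                                           # bounce: flip the velocity
print(m2 / total_t)    # time average of x^2, close to the target variance 1
```

Estimates are time averages along the continuous trajectory rather than averages over discrete draws.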
This document discusses nested sampling, a technique for Bayesian computation and evidence evaluation. It begins by introducing Bayesian inference and the evidence integral. It then shows that nested sampling transforms the multidimensional evidence integral into a one-dimensional integral over the prior mass constrained to have likelihood above a given value. The document outlines the nested sampling algorithm and shows that it provides samples from the posterior distribution. It also discusses termination criteria and choices of sample size for the algorithm. Finally, it provides a numerical example of nested sampling applied to a Gaussian model.
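The algorithm can be sketched in a few lines on a toy uniform-prior, normal-likelihood problem whose evidence is known to be about 0.1; the constrained-prior draws use plain rejection, which is only viable in such low dimensions, and the number of live points and iterations are arbitrary choices:

```python
import math, random

random.seed(6)

L = lambda th: math.exp(-0.5 * th * th) / math.sqrt(2 * math.pi)  # likelihood
prior = lambda: random.uniform(-5.0, 5.0)                         # flat prior
# exact evidence: integral of L over (-5, 5) divided by 10, i.e. about 0.1

K = 300                                       # number of live points
live = [prior() for _ in range(K)]
Z, X_prev = 0.0, 1.0
for i in range(1, 1801):
    live.sort(key=L)
    L_min = L(live[0])                        # lowest likelihood among live points
    X = math.exp(-i / K)                      # deterministic prior-mass shrinkage
    Z += L_min * (X_prev - X)                 # evidence increment
    X_prev = X
    while True:                               # rejection draw constrained to L > L_min
        th = prior()
        if L(th) > L_min:
            live[0] = th
            break
Z += X_prev * sum(L(t) for t in live) / K     # contribution of remaining live points
print(Z)                                      # near the exact evidence 0.1
```

The one-dimensional quadrature over the shrinking prior mass is exactly the transformation of the evidence integral described above.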
This document discusses approximate Bayesian computation (ABC) techniques for performing Bayesian inference when the likelihood function is not available in closed form. It covers the basic ABC algorithm and discusses challenges with high-dimensional data. It also summarizes recent advances in ABC that incorporate nonparametric regression, reproducing kernel Hilbert spaces, and neural networks to help address these challenges.
The document discusses computational methods for Bayesian statistics when direct simulation from the target distribution is not possible or efficient. It introduces Markov chain Monte Carlo (MCMC) methods, including the Metropolis-Hastings algorithm and Gibbs sampler, which generate dependent samples that approximate the target distribution. The Metropolis-Hastings algorithm uses a proposal distribution to randomly walk through the parameter space. Approximate Bayesian computation (ABC) is also introduced as a method that approximates the posterior distribution when the likelihood is intractable.
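As a minimal illustration of the Gibbs sampler, the sketch below alternates the two exact full conditionals of a bivariate normal with correlation rho (a standard textbook toy, not an example from the course):

```python
import math, random

random.seed(8)

rho = 0.8
sd = math.sqrt(1.0 - rho * rho)     # conditional standard deviation

x, y, xs = 0.0, 0.0, []
for _ in range(100_000):
    x = random.gauss(rho * y, sd)   # x | y ~ N(rho * y, 1 - rho^2)
    y = random.gauss(rho * x, sd)   # y | x ~ N(rho * x, 1 - rho^2)
    xs.append(x)

m = sum(xs) / len(xs)
v = sum(t * t for t in xs) / len(xs) - m * m
print(m, v)    # the x-marginal is N(0, 1)
```

Each update samples one coordinate from its conditional given the rest, producing a dependent chain whose marginals converge to the target.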
These are the slides for the course given in the Gran Paradiso National Park on July 13-17, based on the original slides for the first three chapters of Bayesian Core. The specific datasets used to illustrate these slides are not publicly available.
ABC convergence under well- and mis-specified modelsChristian Robert
1. Approximate Bayesian computation (ABC) is a simulation-based method for performing Bayesian inference when the likelihood function is intractable or unavailable. ABC works by simulating data from the model, accepting simulations where the simulated and observed data are close according to some distance measure.
2. Advances in ABC include modifying the proposal distribution to increase efficiency, viewing it as a conditional density estimation problem to allow for larger tolerances, and including a tolerance parameter in the inferential framework.
3. Recent studies have analyzed the asymptotic properties of ABC, showing the posterior distributions and means can be consistent under certain conditions on the summary statistics and tolerance decreasing rates.
1. The document proposes a method for making approximate Bayesian computation (ABC) inferences accurate by modeling the distribution of summary statistics calculated from simulated and observed data.
2. It involves constructing an auxiliary probability space (ρ-space) based on these summary values, and performing classification on ρ-space to determine whether simulated and observed data are from the same population.
3. Indirect inference is then used to link ρ-space back to the original parameter space, allowing the ABC approximation to match the true posterior distribution if the ABC tolerances and number of simulations are properly calibrated.
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide semi-automated ABC
This document discusses using the Wasserstein distance for inference in generative models. It begins by introducing ABC methods that use a distance between samples to compare observed and simulated data. It then discusses using the Wasserstein distance as an alternative distance metric that has lower variance than the Euclidean distance. The document covers computational aspects of calculating the Wasserstein distance, asymptotic properties of minimum Wasserstein estimators, and applications to time series data.
This document discusses using the Wasserstein distance for inference in generative models. It begins with an overview of approximate Bayesian computation (ABC) and how distances between samples are used. It then introduces the Wasserstein distance as an alternative distance that can have lower variance than the Euclidean distance. Computational aspects and asymptotics of using the Wasserstein distance are discussed. The document also covers how transport distances can handle time series data.
random forests for ABC model choice and parameter estimationChristian Robert
The document discusses Approximate Bayesian Computation (ABC). It begins by introducing ABC as a likelihood-free method for Bayesian inference when the likelihood function is unavailable or computationally intractable. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data based on a distance measure.
The document then discusses advances in ABC, including modifying the proposal distribution to increase efficiency, viewing it as a conditional density estimation problem, and including measurement error in the framework. It also discusses the consistency of ABC as the number of simulations increases and sample size grows large. Finally, it discusses applications of ABC to model selection by treating the model index as an additional parameter.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
This document summarizes approximate Bayesian computation (ABC) methods. It begins with an overview of ABC, which provides a likelihood-free rejection technique for Bayesian inference when the likelihood function is intractable. The ABC algorithm works by simulating parameters and data until the simulated and observed data are close according to some distance measure and tolerance level. The document then discusses the asymptotic properties of ABC, including consistency of ABC posteriors and rates of convergence under certain assumptions. It also notes relationships between ABC and k-nearest neighbor methods. Examples applying ABC to autoregressive time series models are provided.
Multiple estimators for Monte Carlo approximationsChristian Robert
This document discusses multiple estimators that can be used to approximate integrals using Monte Carlo simulations. It begins by introducing concepts like multiple importance sampling, Rao-Blackwellisation, and delayed acceptance that allow combining multiple estimators to improve accuracy. It then discusses approaches like mixtures as proposals, global adaptation, and nonparametric maximum likelihood estimation (NPMLE) that frame Monte Carlo estimation as a statistical estimation problem. The document notes various advantages of the statistical formulation, like the ability to directly estimate simulation error from the Fisher information. Overall, the document presents an overview of different techniques for combining Monte Carlo simulations to obtain more accurate integral approximations.
better together? statistical learning in models made of modulesChristian Robert
The document discusses statistical models composed of modular components called modules. Each module may be developed independently and represent different data modalities or domains of knowledge. Joint Bayesian updating treats all modules simultaneously but misspecification of one module can impact the others. Alternative approaches are proposed to allow uncertainty propagation between modules while preventing feedback that could lead to misspecification. Candidate distributions for the modules are discussed, along with strategies for choosing among them based on predictive performance.
The document discusses Approximate Bayesian Computation (ABC). ABC allows inference for statistical models where the likelihood function is not available in closed form. ABC works by simulating data under different parameter values and comparing simulated to observed data. ABC has been used for model choice by comparing evidence for different models. Consistency of ABC for model choice depends on the criterion used and asymptotic identifiability of the parameters.
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
Approximate Bayesian model choice via random forestsChristian Robert
The document describes approximate Bayesian computation (ABC) methods for model choice when likelihoods are intractable. ABC generates parameter-dataset pairs from the prior and retains those where the simulated and observed datasets are similar according to a distance measure on summary statistics. For model choice, ABC approximates posterior model probabilities by the proportion of simulations from each model that are retained. Machine learning techniques can also be used to infer the most likely model directly from the simulated summary statistics.
Delayed acceptance for Metropolis-Hastings algorithms, by Christian Robert
The document proposes a delayed acceptance method for accelerating Metropolis-Hastings algorithms. It begins with a motivating example of non-informative inference for mixture models where computing the prior density is costly. It then introduces the delayed acceptance approach which splits the acceptance probability into pieces that are evaluated sequentially, avoiding computing the full acceptance ratio each time. It validates that the delayed acceptance chain is reversible and provides bounds on its spectral gap and asymptotic variance compared to the original chain. Finally, it discusses optimizing the delayed acceptance approach by considering the expected square jump distance and cost per iteration to maximize efficiency.
The document describes Approximate Bayesian Computation (ABC), a technique for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to a distance measure and tolerance level. ABC provides an approximation to the posterior distribution that improves as the tolerance level decreases and more informative summary statistics are used. The document discusses the ABC algorithm, properties of the exact ABC posterior distribution, and challenges in selecting appropriate summary statistics.
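The rejection scheme just described can be sketched in a few lines. Everything in this toy example is an assumption chosen for illustration: a normal location model, a standard normal prior, the sample mean as summary statistic, and an absolute-distance tolerance.

```python
import random

def abc_rejection(obs_summary, prior_sample, simulate, summary,
                  n_sims=20000, tol=0.1):
    """Basic ABC rejection: keep parameter draws whose simulated
    summary lies within `tol` of the observed summary."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample()
        if abs(summary(simulate(theta)) - obs_summary) < tol:
            accepted.append(theta)
    return accepted

# Toy setting (illustrative): data are N(theta, 1), prior theta ~ N(0, 1),
# summary statistic = sample mean.
random.seed(42)
n = 20
data = [random.gauss(1.0, 1.0) for _ in range(n)]
obs = sum(data) / n

post = abc_rejection(
    obs_summary=obs,
    prior_sample=lambda: random.gauss(0.0, 1.0),
    simulate=lambda th: [random.gauss(th, 1.0) for _ in range(n)],
    summary=lambda x: sum(x) / len(x))

print(len(post), sum(post) / len(post))  # acceptance count, ABC posterior mean
```

Shrinking `tol` trades acceptance rate for accuracy, which is the central tuning issue the document raises.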
Coordinate sampler: A non-reversible Gibbs-like sampler, by Christian Robert
This document describes a new MCMC method called the Coordinate Sampler, a non-reversible Gibbs-like sampler based on a piecewise deterministic Markov process (PDMP). The Coordinate Sampler generalizes the Bouncy Particle Sampler by making the bounce direction partly random and orthogonal to the gradient. It is proven that, under certain conditions, the PDMP induced by the Coordinate Sampler has a unique invariant distribution, namely the product of the target distribution and a uniform distribution on the auxiliary velocity variable. The Coordinate Sampler is also shown to exhibit geometric ergodicity, an important convergence property, under additional regularity conditions on the target distribution.
This document discusses nested sampling, a technique for Bayesian computation and evidence evaluation. It begins by introducing Bayesian inference and the evidence integral. It then shows that nested sampling transforms the multidimensional evidence integral into a one-dimensional integral over the prior mass constrained to have likelihood above a given value. The document outlines the nested sampling algorithm and shows that it provides samples from the posterior distribution. It also discusses termination criteria and choices of sample size for the algorithm. Finally, it provides a numerical example of nested sampling applied to a Gaussian model.
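The nested sampling loop outlined above can be sketched on a one-dimensional toy problem where the evidence is known in closed form. The uniform prior on [-5, 5], the Gaussian likelihood, the number of live points, and the use of plain rejection for the constrained-prior draws are all assumptions made for this illustration.

```python
import math, random

random.seed(1)

# Likelihood L(theta) = N(theta; 0, 1) density; prior uniform on [-5, 5].
def loglik(theta):
    return -0.5 * theta**2 - 0.5 * math.log(2 * math.pi)

N = 400                                   # number of live points
live = [random.uniform(-5, 5) for _ in range(N)]
logL = [loglik(t) for t in live]

Z, X_prev = 0.0, 1.0
for i in range(1, 2000):
    j = min(range(N), key=lambda k: logL[k])  # worst live point
    X = math.exp(-i / N)                      # deterministic prior-mass shrink
    Z += math.exp(logL[j]) * (X_prev - X)     # accumulate evidence
    X_prev = X
    # replace by a prior draw with strictly higher likelihood (rejection)
    while True:
        t = random.uniform(-5, 5)
        if loglik(t) > logL[j]:
            live[j], logL[j] = t, loglik(t)
            break
Z += X_prev * sum(math.exp(l) for l in logL) / N  # remaining live points

print(Z)  # the analytic evidence here is about 0.1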
This document discusses approximate Bayesian computation (ABC) techniques for performing Bayesian inference when the likelihood function is not available in closed form. It covers the basic ABC algorithm and discusses challenges with high-dimensional data. It also summarizes recent advances in ABC that incorporate nonparametric regression, reproducing kernel Hilbert spaces, and neural networks to help address these challenges.
The document discusses computational methods for Bayesian statistics when direct simulation from the target distribution is not possible or efficient. It introduces Markov chain Monte Carlo (MCMC) methods, including the Metropolis-Hastings algorithm and Gibbs sampler, which generate dependent samples that approximate the target distribution. The Metropolis-Hastings algorithm uses a proposal distribution to randomly walk through the parameter space. Approximate Bayesian computation (ABC) is also introduced as a method that approximates the posterior distribution when the likelihood is intractable.
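The random-walk Metropolis-Hastings step described above can be sketched as follows; the Gaussian target and the step size are illustrative assumptions, not choices made in the document.

```python
import math, random

random.seed(7)

def log_target(x):
    # unnormalised log-density of N(3, 1), an illustrative target
    return -0.5 * (x - 3.0) ** 2

def rw_metropolis(log_pi, x0, step, n_iter):
    """Random-walk Metropolis: propose x' = x + step * N(0,1) and
    accept with probability min(1, pi(x') / pi(x))."""
    x, chain = x0, []
    for _ in range(n_iter):
        prop = x + step * random.gauss(0.0, 1.0)
        if math.log(random.random()) < log_pi(prop) - log_pi(x):
            x = prop
        chain.append(x)
    return chain

chain = rw_metropolis(log_target, x0=0.0, step=1.0, n_iter=50000)
burned = chain[5000:]                  # discard burn-in
print(sum(burned) / len(burned))       # close to the target mean, 3
```

Note that only the ratio of target densities is needed, which is why the method applies when the normalising constant is unknown.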
These are the slides made for the course given in the Gran Paradiso National Park on July 13-17, based on the original slides for the first three chapters of Bayesian Core. The specific datasets used to illustrate these slides are not publicly available.
Omiros' talk on the Bernoulli factory problem (BigMC)
This document summarizes previous work on simulating events of unknown probability using reverse time martingales. It discusses von Neumann's solution to the Bernoulli factory problem where f(p)=1/2. It also summarizes the Keane-O'Brien existence result, the Nacu-Peres Bernstein polynomial approach, and issues with implementing the Nacu-Peres algorithm at large n due to the large number of strings involved. It proposes developing a reverse time martingale approach to address these issues.
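Von Neumann's solution for f(p) = 1/2 is short enough to sketch directly: toss the biased coin twice, return 1 on heads-tails, 0 on tails-heads, and start over otherwise. The bias p = 0.73 below is an arbitrary illustration; the construction works for any p in (0, 1) without knowing p.

```python
import random

random.seed(3)

def von_neumann_fair(coin):
    """Turn a coin of unknown bias p into a fair coin:
    toss twice; HT -> 1, TH -> 0, otherwise repeat."""
    while True:
        a, b = coin(), coin()
        if a != b:          # HT and TH are equally likely: p(1-p) each
            return a

biased = lambda: 1 if random.random() < 0.73 else 0  # illustrative bias
draws = [von_neumann_fair(biased) for _ in range(20000)]
print(sum(draws) / len(draws))  # close to 0.5 regardless of p
```

The expected number of tosses per output is 1/(p(1-p)), which blows up as p approaches 0 or 1; the Keane-O'Brien and Nacu-Peres results mentioned above address general functions f(p), where such simple tricks no longer suffice.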
The document discusses simulation as a tool for statistical computation. Simulation allows researchers to reproduce randomness on a computer by using deterministic pseudo-random number generators. It is useful for evaluating complex systems, determining properties of statistical procedures, validating models, approximating integrals, and maximizing functions. Simulation relies on generating uniform random variables between 0 and 1 using algorithms like congruential generators, and is widely used across many fields involving probability and statistics.
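A congruential generator of the kind mentioned above can be written in a few lines. The multiplier, increment, and modulus below are the classic Numerical Recipes constants, used here purely as an illustration of the recurrence x_{k+1} = (a x_k + c) mod m.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator returning pseudo-uniforms in [0, 1)."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)
    return out

u = lcg(seed=12345, n=10000)
print(min(u), max(u), sum(u) / len(u))  # values in [0, 1), mean near 0.5
```

Such generators are fully deterministic given the seed, which is exactly what makes simulation experiments reproducible.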
This document discusses the use of approximate Bayesian computation (ABC) to perform Bayesian inference when the likelihood function is intractable. It presents Fearnhead and Prangle's (F&P) ABC method, which uses a summary statistic S(·), an importance proposal g(·), a kernel K(·), and a bandwidth h. The document outlines the approximations and errors involved in F&P's ABC and discusses calibrating and optimizing the choice of summary statistic and bandwidth. It notes that while the optimal summary statistic is the posterior expectation E(θ|y), this is usually unavailable, so F&P suggest estimating it from data simulated in a pilot run.
This document discusses various importance sampling methods for approximating marginal likelihoods, including regular importance sampling, bridge sampling, and harmonic means. It compares these methods on a probit model example using data on diabetes in Pima Indian women. Regular importance sampling uses a distribution based on the MLE as the importance function. Bridge sampling introduces a pseudo-posterior to handle models with different parameter dimensions. The harmonic mean estimator uses the posterior sample directly, but requires a proposal distribution with lighter tails than the posterior to keep its variance finite.
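To see how regular importance sampling approximates a marginal likelihood, here is a sketch on a conjugate toy model where the answer is available in closed form. The model (normal data, normal prior), the data-generating value, and the Gaussian importance function centred near the posterior mode are all assumptions made for this example, not the probit setting of the slides.

```python
import math, random

random.seed(11)

# Toy conjugate model: x_i ~ N(theta, 1), theta ~ N(0, 1), so the
# marginal likelihood m(x) is known and the estimate can be checked.
n = 10
data = [random.gauss(1.0, 1.0) for _ in range(n)]
xbar = sum(data) / n

def log_norm(x, mu, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

def log_joint(theta):
    # log likelihood times prior (the integrand of the evidence)
    return sum(log_norm(x, theta, 1.0) for x in data) + log_norm(theta, 0.0, 1.0)

# Importance function: Gaussian centred near the posterior mode,
# slightly overdispersed so its tails dominate the integrand's.
mu_q, sd_q = n * xbar / (n + 1), 1.5 / math.sqrt(n + 1)

M = 50000
logw = []
for _ in range(M):
    t = random.gauss(mu_q, sd_q)
    logw.append(log_joint(t) - log_norm(t, mu_q, sd_q))
lmax = max(logw)  # log-sum-exp for numerical stability
m_hat = math.exp(lmax) * sum(math.exp(l - lmax) for l in logw) / M
print(m_hat)  # importance sampling estimate of the evidence
```

The overdispersed proposal is the point the harmonic mean discussion turns on: with tails lighter than the posterior, the weights would have infinite variance.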
1. The document discusses Rao-Blackwellisation, a technique that can make MCMC sampling more efficient.
2. Rao-Blackwellisation involves partitioning the state space and analytically integrating out some variables, reducing the variance of the estimates.
3. It can be applied to any Hastings-Metropolis algorithm and provides asymptotic computational improvements over standard MCMC with controlled additional computation.
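A simple instance of the idea, for a random-walk Metropolis-Hastings chain, replaces h(x_{t+1}) by its conditional expectation given the current state and the proposal, alpha * h(y) + (1 - alpha) * h(x). This is only a minimal illustration of Rao-Blackwellisation, not the full scheme of the paper; the Gaussian target and the test function h(x) = x^2 are assumptions for the example.

```python
import math, random

random.seed(5)

def log_target(x):
    return -0.5 * x * x        # unnormalised log-density of N(0, 1)

h = lambda x: x * x            # estimate E[X^2] = 1 under the target

x, step, n_iter = 0.0, 1.0, 40000
plain, rb = [], []
for _ in range(n_iter):
    y = x + step * random.gauss(0.0, 1.0)
    alpha = min(1.0, math.exp(log_target(y) - log_target(x)))
    # Rao-Blackwellised term: E[h(next state) | current state, proposal]
    rb.append(alpha * h(y) + (1 - alpha) * h(x))
    if random.random() < alpha:
        x = y
    plain.append(h(x))

print(sum(plain) / n_iter, sum(rb) / n_iter)  # both near 1
```

Both averages are consistent for E[h(X)]; the conditional-expectation version integrates out the accept/reject coin flip, which is where the variance reduction comes from.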
This document provides an overview of Markov chain Monte Carlo (MCMC) methods. It begins with motivations for using MCMC, such as dealing with latent variable models where the likelihood function is intractable. It then covers random variable generation techniques before introducing the key MCMC algorithms: the Metropolis-Hastings algorithm and the Gibbs sampler. The document outlines the remaining topics to be covered, which include Monte Carlo integration, notions of Markov chains, and further advanced topics.
This document discusses computational issues that arise in Bayesian statistics. It provides examples of latent variable models like mixture models that make computation difficult due to the large number of terms that must be calculated. It also discusses time series models like the AR(p) and MA(q) models, noting that they have complex parameter spaces due to stationarity constraints. The document outlines the Metropolis-Hastings algorithm, Gibbs sampler, and other methods like Population Monte Carlo and Approximate Bayesian Computation that can help address these computational challenges.
1. The document discusses Bayesian model choice and compares multiple computational methods for approximating model evidence, including importance sampling, bridge sampling, and nested sampling.
2. It explains that Bayesian model choice involves probabilizing models and parameters, and computing the evidence for each model, which is proportional to the marginal likelihood.
3. One method covered is bridge sampling, which uses importance functions to approximate the Bayes factor for comparing two models. Optimal bridge sampling chooses an auxiliary function that minimizes the variance of the estimate.
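The optimal bridge sampling estimator mentioned above is usually computed by the Meng-Wong fixed-point iteration, since the optimal bridge function depends on the unknown ratio itself. Here is a sketch on two Gaussian targets whose normalising constants are known, so the estimated ratio can be checked; the two unnormalised densities are illustrative assumptions.

```python
import math, random

random.seed(9)

# Estimate r = c1/c2 for q1 = unnormalised N(0,1) and q2 = unnormalised
# N(1,4); the true ratio is sqrt(2*pi) / sqrt(2*pi*4) = 0.5.
q1 = lambda x: math.exp(-0.5 * x * x)
q2 = lambda x: math.exp(-0.5 * (x - 1.0) ** 2 / 4.0)

n1 = n2 = 50000
x1 = [random.gauss(0.0, 1.0) for _ in range(n1)]   # draws from p1
x2 = [random.gauss(1.0, 2.0) for _ in range(n2)]   # draws from p2
l1 = [q1(x) / q2(x) for x in x1]
l2 = [q1(x) / q2(x) for x in x2]
s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)

r = 1.0
for _ in range(30):  # Meng-Wong fixed-point iteration, converges quickly
    num = sum(l / (s1 * l + s2 * r) for l in l2) / n2
    den = sum(1.0 / (s1 * l + s2 * r) for l in l1) / n1
    r = num / den

print(r)  # close to 0.5
```

The iteration only needs the two samples and the unnormalised density ratios, which is what makes it practical for Bayes factor computation.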
Presentation of Bassoum Abou on Stein's 1981 AoS paper, by Christian Robert
The document discusses estimation of the mean of a multivariate normal distribution. It covers basic formulas, Bayes estimates, choosing a scalar factor, applications including symmetric moving averages, and the case of unknown variance. The main results are theorems on the risk and unbiased risk estimates for different Bayes estimates of the mean.
Presentation of Birnbaum's Likelihood Principle foundational paper at the Reading Statistical Classics seminar, Jan. 20, 2013, Université Paris-Dauphine
Reading "Testing a point-null hypothesis", by Jiahuan Li, Feb. 25, 2013 (posted by Christian Robert)
The paper studies the relationship between p-values and Bayesian measures of evidence for testing point null hypotheses. It finds that p-values can be highly misleading about the evidence provided by data against the null hypothesis. Through examples, it shows that a small p-value does not necessarily mean high posterior probability for rejecting the null or a large Bayes factor against the null. The paper derives lower bounds on these Bayesian measures of evidence for several classes of distributions to demonstrate the conflicts between p-values and Bayesian analysis.
Reading the Lindley-Smith 1973 paper on linear Bayes estimators, by Christian Robert
The document outlines a seminar on Bayes estimates for the linear model. It introduces the linear model and Bayesian methods. It then discusses exchangeability, providing an example of an exchangeable distribution. It also discusses the general Bayesian linear model, including the posterior distribution of the parameters using a three stage model.
Likelihood-free methods provide techniques for approximating Bayesian computations when the likelihood function is unavailable or computationally intractable. Monte Carlo methods like importance sampling and iterated importance sampling generate samples from an approximating distribution to estimate integrals. Population Monte Carlo is an iterative Monte Carlo algorithm that propagates a population of particles over time to explore the target distribution. Approximate Bayesian computation uses simulation-based methods to approximate posterior distributions when the likelihood is unavailable.
This document discusses computational Bayesian statistics and focuses on computational basics. It covers the Bayesian toolbox including Bayes' theorem, prior and posterior distributions, improper prior distributions, testing hypotheses using Bayes factors, and Monte Carlo integration techniques like importance sampling. Examples are provided to demonstrate Monte Carlo integration for estimating integrals like the Bayes factor and posterior expectations.
This document provides an overview of chapters from Sir Harold Jeffreys' classic book "Theory of Probability". It summarizes Jeffreys' background and contributions, including developing one of the first comprehensive Bayesian frameworks. It also reviews Jeffreys' construction of noninformative prior distributions, which aim to represent a state of ignorance through being invariant to reparameterization of the model. However, the document notes some difficulties with Jeffreys' approach, such as improper priors potentially leading to inconsistent results for hypothesis testing of point nulls.
The document outlines the theory of the most efficient tests of statistical hypotheses as proposed by Neyman and Pearson. It discusses the two types of errors in hypothesis testing and the likelihood principle, and provides solutions for finding the most efficient critical regions for simple and composite hypotheses. Examples are given to illustrate the theory; a key point is that the most efficient critical region maximizes the probability of rejecting the null when the alternative holds, for a given probability of type I error.
The document discusses several computational methods for Bayesian model choice, including importance sampling, bridge sampling, nested sampling, and approximate Bayesian computation (ABC) model choice. Importance sampling can be used to approximate Bayes factors by sampling from an importance function. Bridge sampling provides an alternative estimator that performs better when the models being compared have little overlap. Nested sampling and ABC can also be applied to Bayesian model choice by approximating the marginal likelihood of each model.
1. The document discusses various importance sampling methods for Bayesian model comparison and discrimination between embedded models.
2. It compares regular importance sampling, bridge sampling, mixtures to bridge sampling, harmonic means, Chib's solution, and the Savage-Dickey ratio as solutions for approximating Bayes factors for model comparison.
3. As an example, it applies regular importance sampling to test whether a diabetes pedigree function variable is significant in a probit model for predicting diabetes in Pima Indian women, using a g-prior specification.
The document discusses several computational methods for Bayesian model choice, including importance sampling, bridge sampling, harmonic means, Chib's solution, cross-model solutions, nested sampling, and ABC model choice. Importance sampling can be used to approximate Bayes factors by sampling from an importance distribution. Bridge sampling is a special case of importance sampling that uses the posterior distribution under one model as the importance distribution for another model.
These are the slides I gave during three lectures at the YES III meeting in Eindhoven, October 5-7, 2009. They include a presentation of the Savage-Dickey paradox and a comparison of the major importance sampling solutions, both obtained in collaboration with Jean-Michel Marin.
Considerate Approaches to ABC Model Selection, by Michael Stumpf
The document discusses using approximate Bayesian computation (ABC) for model selection when directly evaluating likelihoods is computationally intractable. ABC involves simulating data from the candidate models and comparing simulated and observed summary statistics; constructing minimally sufficient summary statistics is important for accurate ABC model selection.
This document discusses approximate Bayesian computation (ABC) methods for Bayesian model choice. It begins by introducing ABC as a technique for performing Bayesian inference when the likelihood function is intractable. It then describes how ABC can be used for model choice by sampling models from their prior and parameters from the prior conditional on the model. The document outlines the generic ABC algorithm for model choice and discusses how it estimates posterior model probabilities. It also discusses applications of ABC for model choice in Gibbs random fields and Potts models.
The document discusses computational methods for Bayesian model choice and model comparison. It introduces Bayes factors and the evidence as central quantities for model comparison. It then describes various computational methods for approximating the evidence, including importance sampling solutions like bridge sampling, harmonic mean approximations using posterior samples, and approximating the evidence using mixture representations.
This document discusses approximate Bayesian computation (ABC) methods for Bayesian model choice. It begins by explaining the challenges of performing regular Bayesian computation when likelihoods are intractable. The ABC algorithm is then introduced as a likelihood-free rejection technique that approximates the posterior distribution. ABC can be applied to model choice problems by generating parameters and data from each model. The document discusses issues with implementing ABC for model choice and extensions like weighting schemes. Gibbs random field models are also discussed where the goal is to choose between different neighborhood structures based on posterior model probabilities.
ABC methods can be used for Bayesian model choice between multiple models. The ABC model choice algorithm samples models from their prior probabilities and parameters from the corresponding prior distributions. Simulated data is generated and accepted if the distance between summary statistics of the simulated and observed data is below a tolerance level. As the number of simulations increases, the ABC approximation of the Bayes factor converges to the true Bayes factor. Sufficient statistics for individual models may not be sufficient for the joint model and parameters.
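The ABC model choice algorithm described above can be sketched as follows. The two count models, their priors, and the sample mean as summary statistic are illustrative assumptions; note that the sample mean is exactly the kind of within-model statistic that may fail to discriminate between models, echoing the sufficiency caveat above.

```python
import math, random

random.seed(2)

# M1: Poisson(lam), lam ~ Exp(1);  M2: Geometric(p), p ~ U(0.05, 1)
# (the prior on p is truncated away from 0 purely to keep simulation cheap).
def sim_poisson(lam, n):
    out = []
    for _ in range(n):
        L, k, p = math.exp(-lam), 0, 1.0   # Knuth's Poisson sampler
        while True:
            p *= random.random()
            if p <= L:
                break
            k += 1
        out.append(k)
    return out

def sim_geometric(p, n):
    out = []
    for _ in range(n):
        k = 0
        while random.random() > p:   # count failures before first success
            k += 1
        out.append(k)
    return out

n = 30
obs = sim_poisson(2.0, n)            # pretend these are the observed data
s_obs = sum(obs) / n                 # summary statistic: sample mean

counts, tol, n_sims = {1: 0, 2: 0}, 0.3, 20000
for _ in range(n_sims):
    m = random.choice([1, 2])        # uniform prior over models
    if m == 1:
        data = sim_poisson(random.expovariate(1.0), n)
    else:
        data = sim_geometric(random.uniform(0.05, 1.0), n)
    if abs(sum(data) / n - s_obs) < tol:
        counts[m] += 1

total = counts[1] + counts[2]
print(counts[1] / total, counts[2] / total)  # approximate P(M | data)
```

The accepted proportions estimate the posterior model probabilities, and their ratio (corrected for the prior odds) estimates the Bayes factor.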
This document discusses approximate Bayesian computation (ABC) methods for Bayesian model choice. It begins with an overview of ABC and how it can be used for model choice by simulating parameters from each model's prior and accepting simulations that meet a tolerance threshold based on summary statistics. It then discusses how ABC can estimate posterior model probabilities and issues that can arise, such as whether tolerances and summary statistics should vary across models. The document also discusses how ABC performs in limiting cases and when summary statistics are sufficient. ABC provides a way to perform Bayesian model choice and estimation when likelihoods are intractable.
- Approximate Bayesian computation (ABC) is a technique used when the likelihood function is intractable or unavailable. It approximates the Bayesian posterior distribution in a likelihood-free manner.
- ABC works by simulating parameter values from the prior and simulating pseudo-data. Parameter values are accepted if the simulated pseudo-data are "close" to the observed data according to some distance measure and tolerance level.
- ABC originated in population genetics models where genealogies are considered nuisance parameters that cannot be integrated out of the likelihood. It has since been applied to other fields like econometrics for models with complex or undefined likelihoods.
This document discusses several perspectives and solutions to Bayesian hypothesis testing. It outlines issues with Bayesian testing such as the dependence on prior distributions and difficulties interpreting Bayesian measures like posterior probabilities and Bayes factors. It discusses how Bayesian testing compares models rather than identifying a single true model. Several solutions to challenges are discussed, like using Bayes factors which eliminate the dependence on prior model probabilities but introduce other issues. The document also discusses testing under specific models like comparing a point null hypothesis to alternatives. Overall it presents both Bayesian and frequentist views on hypothesis testing and some of the open controversies in the field.
This document discusses some uncertainties and difficulties that can arise when applying Bayesian concepts such as Bayes factors and model selection. Specifically, it notes that Bayes factors can depend on the choice of prior distributions and may not correspond to the "full model" when using approximate Bayesian computation. It also discusses issues with assigning probability to point null hypotheses and notes that improper priors cannot be used when computing Bayes factors, as the normalization would be undefined. The document aims to bring to light some of the subtleties and open problems within the Bayesian framework.
For the discovery of a regression relationship between y and x, a vector of p potential predictors, the flexible nonparametric nature of BART (Bayesian Additive Regression Trees) allows for a much richer set of possibilities than restrictive parametric approaches. To exploit the potential monotonicity of the predictors, we introduce mBART, a constrained version of BART that incorporates monotonicity with a multivariate basis of monotone trees, thereby avoiding the further confines of a full parametric form. Using mBART to estimate such effects yields (i) function estimates that are smoother and more interpretable, (ii) better out-of-sample predictive performance and (iii) less post-data uncertainty. By using mBART to simultaneously estimate both the increasing and the decreasing regions of a predictor, mBART opens up a new approach to the discovery and estimation of the decomposition of a function into its monotone components.
(This is joint work with H. Chipman, R. McCulloch and T. Shively).
This document summarizes a presentation on testing hypotheses as mixture estimation and the challenges of Bayesian testing. The key points are:
1) Bayesian hypothesis testing faces challenges including the dependence on prior distributions, difficulties interpreting Bayes factors, and the inability to use improper priors in most situations.
2) Testing via mixtures is proposed as a paradigm shift that frames hypothesis testing as a model selection problem involving mixture models rather than distinct hypotheses.
3) Traditional Bayesian testing using Bayes factors and posterior probabilities depends strongly on prior distributions and choices that are difficult to justify, while not providing measures of uncertainty around decisions. Alternative approaches are needed to address these issues.
Bayesian Nonparametrics: Models Based on the Dirichlet Process, by Alessandro Panella
This document summarizes an introduction to Bayesian nonparametric models presented by Alessandro Panella. It discusses Bayesian learning and De Finetti's theorem, which shows that any exchangeable sequence of random variables can be represented as conditionally independent given a random variable. Finite mixture models are introduced as a Bayesian approach to clustering. Dirichlet process mixture models provide a nonparametric generalization that allows for an unbounded number of clusters.
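The "unbounded number of clusters" behaviour of the Dirichlet process can be illustrated by simulating its induced partition, the Chinese restaurant process; the concentration parameter alpha = 2 and the sample size below are arbitrary choices for the illustration.

```python
import random

random.seed(4)

def crp(n, alpha):
    """Chinese restaurant process: customer i joins an existing table
    with probability proportional to its size, or opens a new table
    with probability proportional to alpha. Returns table sizes."""
    tables = []
    for i in range(n):
        r = random.random() * (i + alpha)
        acc = 0.0
        for t in range(len(tables)):
            acc += tables[t]
            if r < acc:
                tables[t] += 1
                break
        else:
            tables.append(1)      # open a new table (new cluster)
    return tables

sizes = crp(n=500, alpha=2.0)
print(len(sizes), sum(sizes))  # cluster count grows ~ alpha*log(n); sizes sum to n
```

Unlike a finite mixture, the number of occupied tables is not fixed in advance, which is precisely the nonparametric generalization the document describes.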
This document discusses differentially private distributed Bayesian linear regression with Markov chain Monte Carlo (MCMC) methods. It proposes adding noise to the summaries (S) and coefficients (z) of local linear regression models on different devices to provide differential privacy. Gibbs sampling is used to simulate the genuine posterior distribution over the linear model parameters (theta, sigma_y, Sigma_x, z1:J, S1:J) in a distributed manner while maintaining privacy. Alternative approaches like exploiting approximate posteriors from all devices or learning iteratively are also mentioned.
This document discusses mixture models and approximations to computing model evidence. It contains:
1) An overview of mixtures of distributions and common priors used for mixtures.
2) Approximations to computing marginal likelihoods or model evidence using Chib's representation and Rao-Blackwellization. Permutations are used to address label switching issues.
3) Methods for more efficient sampling for computing model evidence, including iterative bridge sampling and dual importance sampling with approximations to reduce the number of permutations considered.
Sequential Monte Carlo is also briefly mentioned as an alternative approach.
This document describes the adaptive restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the restore process, which uses regenerations from an underlying diffusion or jump process to construct a Markov chain with the desired target distribution. The adaptive restore process enriches this by allowing the regeneration distribution to adapt over time; it converges almost surely to the minimal regeneration distribution. Parameters such as the initial regeneration distribution and the regeneration rates are discussed, and examples are provided for the adaptive Brownian restore algorithm and for calibrating the parameters.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document discusses Bayesian restricted likelihood methods for situations where the likelihood cannot be fully trusted. It presents several approaches including empirical likelihood, Bayesian empirical likelihood, using insufficient statistics, approximate Bayesian computation (ABC), and MCMC on manifolds. The key ideas are developing Bayesian tools that are robust to model misspecification by questioning the likelihood, prior, and other assumptions.
ABC stands for approximate Bayesian computation. It is a method for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces samples from an approximate posterior distribution by simulating parameter and summary statistic values that match the observed summary statistics within a tolerance level. The choice of summary statistics is important but difficult, as there is typically no sufficient statistic. Several strategies have been developed for selecting good summary statistics, including using random forests or the Lasso to evaluate and select from a large set of potential summaries.
The document discusses Approximate Bayesian Computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or unavailable. ABC works by simulating data from the model, accepting simulations that are close to the observed data based on a distance measure and tolerance level. This provides samples from an approximation of the posterior distribution. The document provides examples that motivate ABC and outlines the basic ABC algorithm. It also discusses extensions and improvements to the standard ABC method.
A discussion of Chib, Shin, and Simoni (2017-8) on Bayesian moment models, by Christian Robert
This document discusses Bayesian estimation of conditional moment models. It presents several approaches for completing conditional moment models for Bayesian processing, including using non-parametric parts, empirical likelihood Bayesian tools, or maximum entropy alternatives. It also discusses simplistic ABC alternatives and innovative aspects of introducing tolerance parameters for misspecification and cancelling conditional aspects. Unconditional and conditional model comparison using empirical likelihoods and Bayes factors is proposed.
Poster for the Bayesian Statistics in the Big Data Era conference, by Christian Robert
The document proposes a new version of Hamiltonian Monte Carlo (HMC) sampling that is essentially calibration-free. It achieves this by learning the optimal leapfrog scale from the distribution of integration times using the No-U-Turn Sampler algorithm. Compared to the original NUTS algorithm on benchmark models, this new enhanced HMC (eHMC) exhibits significantly improved efficiency with no hand-tuning of parameters required. The document tests eHMC on a Susceptible-Infected-Recovered model of disease transmission.
Short course at CIRM, Bayesian Masterclass, October 2018, by Christian Robert
Markov Chain Monte Carlo (MCMC) methods generate dependent samples from a target distribution using a Markov chain. The Metropolis-Hastings algorithm constructs a Markov chain with a desired stationary distribution by proposing moves to new states and accepting or rejecting them probabilistically. The algorithm is used to approximate integrals that are difficult to compute directly. It has been shown to converge to the target distribution as the number of iterations increases.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Approximating Bayes Factors
1. On approximating Bayes factors
Christian P. Robert
Université Paris Dauphine & CREST-INSEE
http://www.ceremade.dauphine.fr/~xian
CRiSM workshop on model uncertainty
U. Warwick, May 30, 2010
Joint works with N. Chopin, J.-M. Marin, K. Mengersen & D. Wraith
3. On approximating Bayes factors
Introduction
Model choice
Model choice as model comparison
Choice between models
Several models available for the same observation
$$M_i : x \sim f_i(x|\theta_i), \qquad i \in I,$$
where I can be finite or infinite
Replace hypotheses with models
5. On approximating Bayes factors
Introduction
Model choice
Bayesian model choice
Probabilise the entire model/parameter space
allocate probabilities pi to all models Mi
define priors πi(θi) for each parameter space Θi
compute
$$\pi(M_i|x) = \frac{p_i \int_{\Theta_i} f_i(x|\theta_i)\,\pi_i(\theta_i)\,\mathrm{d}\theta_i}{\sum_j p_j \int_{\Theta_j} f_j(x|\theta_j)\,\pi_j(\theta_j)\,\mathrm{d}\theta_j}$$
take the largest π(Mi|x) to determine the “best” model
9. On approximating Bayes factors
Introduction
Bayes factor
Definition (Bayes factors)
For comparing model M0 with θ ∈ Θ0 vs. M1 with θ ∈ Θ1, under priors π0(θ) and π1(θ), the central quantity is
$$B_{01} = \frac{\pi(\Theta_0|x)}{\pi(\Theta_1|x)} \bigg/ \frac{\pi(\Theta_0)}{\pi(\Theta_1)} = \frac{\int_{\Theta_0} f_0(x|\theta_0)\,\pi_0(\theta_0)\,\mathrm{d}\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,\mathrm{d}\theta_1}$$
[Jeffreys, 1939]
10. On approximating Bayes factors
Introduction
Evidence
Evidence
Problems using a similar quantity, the evidence
$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,\mathrm{d}\theta_k,$$
aka the marginal likelihood.
[Jeffreys, 1939]
11. On approximating Bayes factors
Importance sampling solutions compared
A comparison of importance sampling solutions
1 Introduction
2 Importance sampling solutions compared
Regular importance
Bridge sampling
Harmonic means
Mixtures to bridge
Chib’s solution
The Savage–Dickey ratio
3 Nested sampling
[Marin & Robert, 2010]
12. On approximating Bayes factors
Importance sampling solutions compared
Regular importance
Bayes factor approximation
When approximating the Bayes factor
$$B_{01} = \frac{\int_{\Theta_0} f_0(x|\theta_0)\,\pi_0(\theta_0)\,\mathrm{d}\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,\mathrm{d}\theta_1}$$
use of importance functions ϕ0 and ϕ1 and
$$\widehat{B}_{01} = \frac{n_0^{-1}\sum_{i=1}^{n_0} f_0(x|\theta_0^i)\,\pi_0(\theta_0^i)\big/\varphi_0(\theta_0^i)}{n_1^{-1}\sum_{i=1}^{n_1} f_1(x|\theta_1^i)\,\pi_1(\theta_1^i)\big/\varphi_1(\theta_1^i)}$$
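This estimator can be checked on a toy problem where the evidence is available in closed form. The Python sketch below (illustrative only, not part of the talk's R code) compares M0: x ∼ N(0, 1), which has no free parameter, against M1: x ∼ N(θ, 1) with prior θ ∼ N(0, 1), for which Z1 = N(x; 0, √2) exactly; the importance function is a normal centred at the posterior mean with inflated variance, in the spirit of the MLE-based proposal used later in the slides.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = 1.5  # a single observation

# M0: x ~ N(0, 1), no free parameter, so Z0 is just the density at x
z0 = norm.pdf(x, 0, 1)

# M1: x ~ N(theta, 1), theta ~ N(0, 1); Z1 = N(x; 0, sqrt(2)) in closed form
z1_exact = norm.pdf(x, 0, np.sqrt(2))

# importance function: normal centred at the posterior mean x/2, variance inflated to 1
n1 = 100_000
theta = rng.normal(x / 2, 1.0, n1)
# n^{-1} sum of f(x|theta^i) pi(theta^i) / phi(theta^i)
w = norm.pdf(x, theta, 1) * norm.pdf(theta, 0, 1) / norm.pdf(theta, x / 2, 1.0)
z1_hat = w.mean()

b01_hat, b01_exact = z0 / z1_hat, z0 / z1_exact
print(b01_hat, b01_exact)
```

Because the proposal dominates the integrand (heavier tails than the posterior), the weights have finite variance and the estimate settles close to the exact Bayes factor.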
13. On approximating Bayes factors
Importance sampling solutions compared
Regular importance
Diabetes in Pima Indian women
Example (R benchmark)
“A population of women who were at least 21 years old, of Pima
Indian heritage and living near Phoenix (AZ), was tested for
diabetes according to WHO criteria. The data were collected by
the US National Institute of Diabetes and Digestive and Kidney
Diseases.”
200 Pima Indian women with observed variables
plasma glucose concentration in oral glucose tolerance test
diastolic blood pressure
diabetes pedigree function
presence/absence of diabetes
14. On approximating Bayes factors
Importance sampling solutions compared
Regular importance
Probit modelling on Pima Indian women
Probability of diabetes as a function of the above variables
$$P(y = 1|x) = \Phi(x_1\beta_1 + x_2\beta_2 + x_3\beta_3),$$
Test of H0 : β3 = 0 for the 200 observations of Pima.tr based on a g-prior modelling:
$$\beta \sim \mathcal{N}_3\!\left(0,\, n\,(X^{\mathsf T}X)^{-1}\right)$$
[Marin & Robert, 2007]
16. On approximating Bayes factors
Importance sampling solutions compared
Regular importance
MCMC 101 for probit models
Use of either a random walk proposal
$$\beta' = \beta + \epsilon$$
in a Metropolis-Hastings algorithm (since the likelihood is available) or of a Gibbs sampler that takes advantage of the missing/latent variable
$$z\,|\,y, x, \beta \sim \mathcal{N}(x^{\mathsf T}\beta, 1)\; I_{z\ge 0}^{\,y}\; I_{z\le 0}^{\,1-y}$$
(since β|y, X, z is distributed as a standard normal)
[Gibbs three times faster]
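The latent-variable Gibbs sampler can be sketched in a few lines. The Python below is an illustrative stand-in for the talk's R functions, using simulated covariates rather than the Pima data, and the g-prior β ∼ N3(0, n(XᵀX)⁻¹) from the previous slide; it alternates truncated-normal draws of z given β with a conjugate normal draw of β given z.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# simulated probit data (stand-in for the Pima variables)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.8, -0.5, 0.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

XtX_inv = np.linalg.inv(X.T @ X)
shrink = 1.0 / (1.0 + 1.0 / n)  # from the g-prior beta ~ N(0, n (X'X)^{-1})

def gibbs_probit(niter):
    beta = np.zeros(p)
    draws = np.empty((niter, p))
    for t in range(niter):
        mu = X @ beta
        # z_i | y, beta: N(mu_i, 1) truncated to z >= 0 if y_i = 1, z <= 0 otherwise
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # beta | z: conjugate normal update under the g-prior
        beta = rng.multivariate_normal(shrink * (XtX_inv @ (X.T @ z)),
                                       shrink * XtX_inv)
        draws[t] = beta
    return draws

draws = gibbs_probit(1000)
print(draws[200:].mean(axis=0))  # posterior means, after burn-in
```

The posterior means recover the signs and rough magnitudes of the generating coefficients; on the real data the same two-block scheme is what `gibbsprobit` in the later slides implements.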
19. On approximating Bayes factors
Importance sampling solutions compared
Regular importance
Importance sampling for the Pima Indian dataset
Use of the importance function inspired from the MLE estimate distribution
$$\beta \sim \mathcal{N}(\hat\beta, \hat\Sigma)$$
R importance sampling code
model1=summary(glm(y~-1+X1,family=binomial(link="probit")))
# model2: fit of the larger model, defined analogously to model1
model2=summary(glm(y~-1+X2,family=binomial(link="probit")))
# simulate from the (variance-inflated) normal approximations to each posterior
is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)
is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)
# ratio of the two importance sampling evidence estimates
bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1],
  sigma=2*model1$cov.unscaled)))/
  mean(exp(probitlpost(is2,y,X2)-dmvlnorm(is2,mean=model2$coeff[,1],
  sigma=2*model2$cov.unscaled)))
21. On approximating Bayes factors
Importance sampling solutions compared
Regular importance
Diabetes in Pima Indian women
Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations from the prior and the above MLE importance sampler
[Boxplots of the Monte Carlo and importance sampling approximations, ranging over roughly 2 to 5]
22. On approximating Bayes factors
Importance sampling solutions compared
Bridge sampling
Bridge sampling
Special case:
If
$$\pi_1(\theta_1|x) \propto \tilde\pi_1(\theta_1|x), \qquad \pi_2(\theta_2|x) \propto \tilde\pi_2(\theta_2|x)$$
live on the same space (Θ1 = Θ2), then
$$B_{12} \approx \frac{1}{n}\sum_{i=1}^{n}\frac{\tilde\pi_1(\theta_i|x)}{\tilde\pi_2(\theta_i|x)}, \qquad \theta_i \sim \pi_2(\theta|x)$$
[Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
23. On approximating Bayes factors
Importance sampling solutions compared
Bridge sampling
Bridge sampling variance
The bridge sampling estimator does poorly if
$$\frac{\mathrm{var}(B_{12})}{B_{12}^2} \approx \frac{1}{n}\,E\!\left[\left(\frac{\pi_1(\theta) - \pi_2(\theta)}{\pi_2(\theta)}\right)^{\!2}\,\right]$$
is large, i.e. if π1 and π2 have little overlap...
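A minimal numerical check of the same-space identity, with two hand-picked unnormalised densities (chosen for illustration) whose normalising constants are known, so the true ratio Z1/Z2 = 1/2 is available. The two densities overlap substantially, so by the variance criterion above the estimator behaves well:

```python
import numpy as np

rng = np.random.default_rng(2)

# two unnormalised densities on the same space
pi1 = lambda t: np.exp(-t**2 / 2)                # Z1 = sqrt(2*pi)
pi2 = lambda t: 2.0 * np.exp(-(t - 0.5)**2 / 2)  # Z2 = 2*sqrt(2*pi)

n = 200_000
theta = rng.normal(0.5, 1.0, n)       # exact draws from the normalised pi2
b12_hat = np.mean(pi1(theta) / pi2(theta))
print(b12_hat)                        # should approach Z1/Z2 = 0.5
```

Moving the second density's centre far from the first would leave the identity intact but blow up the variance of the ratio, which is the failure mode the slide describes.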
29. On approximating Bayes factors
Importance sampling solutions compared
Bridge sampling
Extension to varying dimensions
When dim(Θ1) ≠ dim(Θ2), e.g. θ2 = (θ1, ψ), introduction of a pseudo-posterior density, ω(ψ|θ1, x), augmenting π1(θ1|x) into the joint distribution
$$\pi_1(\theta_1|x) \times \omega(\psi|\theta_1, x)$$
on Θ2 so that
$$B_{12} = \frac{\int \tilde\pi_1(\theta_1|x)\,\omega(\psi|\theta_1, x)\,\alpha(\theta_1, \psi)\,\pi_2(\theta_1, \psi|x)\,\mathrm{d}\theta_1\,\mathrm{d}\psi}{\int \tilde\pi_2(\theta_1, \psi|x)\,\alpha(\theta_1, \psi)\,\pi_1(\theta_1|x)\,\omega(\psi|\theta_1, x)\,\mathrm{d}\theta_1\,\mathrm{d}\psi}$$
$$= E_{\pi_2}\!\left[\frac{\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)}{\tilde\pi_2(\theta_1, \psi)}\right] = \frac{E_{\varphi}\!\left[\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)\big/\varphi(\theta_1, \psi)\right]}{E_{\varphi}\!\left[\tilde\pi_2(\theta_1, \psi)\big/\varphi(\theta_1, \psi)\right]}$$
for any conditional density ω(ψ|θ1) and any joint density ϕ.
31. On approximating Bayes factors
Importance sampling solutions compared
Bridge sampling
Illustration for the Pima Indian dataset
Use of the MLE induced conditional of β3 given (β1, β2) as a pseudo-posterior and mixture of both MLE approximations on β3 in the bridge sampling estimate
R bridge sampling code
cova=model2$cov.unscaled
expecta=model2$coeff[,1]
# conditional variance of beta3 given (beta1,beta2) under the MLE normal approximation
covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]
probit1=hmprobit(Niter,y,X1)
probit2=hmprobit(Niter,y,X2)
pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))
probit1p=cbind(probit1,pseudo)
bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]),
  sqrt(covw),log=T))/(dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],
  cova[3,3])))/
  mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+
  dnorm(pseudo,expecta[3],cova[3,3])))
33. On approximating Bayes factors
Importance sampling solutions compared
Bridge sampling
Diabetes in Pima Indian women (cont’d)
Comparison of the variation of the Bayes factor approximations based on 100 × 20,000 simulations from the prior (MC), the above bridge sampler and the above importance sampler
[Boxplots of the MC, bridge sampling and IS approximations, ranging over roughly 2 to 5]
34. On approximating Bayes factors
Importance sampling solutions compared
Harmonic means
The original harmonic mean estimator
When θk(t) ∼ πk(θ|x),
$$\frac{1}{T}\sum_{t=1}^{T}\frac{1}{L(\theta_k^{(t)}|x)}$$
is an unbiased estimator of 1/mk(x)
[Newton & Raftery, 1994]
Highly dangerous: most often leads to an infinite variance!
36. On approximating Bayes factors
Importance sampling solutions compared
Harmonic means
“The Worst Monte Carlo Method Ever”
“The good news is that the Law of Large Numbers guarantees that this estimator is consistent, i.e., it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution.
The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it’s easy for people to not realize this, and to naïvely accept estimates that are nowhere close to the correct value of the marginal likelihood.”
[Radford Neal’s blog, Aug. 23, 2008]
38. On approximating Bayes factors
Importance sampling solutions compared
Harmonic means
Approximating Zk from a posterior sample
Use of the [harmonic mean] identity
$$E^{\pi_k}\!\left[\frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\bigg|\,x\right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\frac{\pi_k(\theta_k)\,L_k(\theta_k)}{Z_k}\,\mathrm{d}\theta_k = \frac{1}{Z_k}$$
no matter what the proposal ϕ(·) is.
[Gelfand & Dey, 1994; Bartolucci et al., 2006]
Direct exploitation of the MCMC output
40. On approximating Bayes factors
Importance sampling solutions compared
Harmonic means
Comparison with regular importance sampling
Harmonic mean: constraint opposed to the usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than πk(θk)Lk(θk) for the approximation
$$\widehat{Z}_{1k} = 1\bigg/\frac{1}{T}\sum_{t=1}^{T}\frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}$$
to have a finite variance.
E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
42. On approximating Bayes factors
Importance sampling solutions compared
Harmonic means
Comparison with regular importance sampling (cont’d)
Compare Ẑ1k with a standard importance sampling approximation
$$\widehat{Z}_{2k} = \frac{1}{T}\sum_{t=1}^{T}\frac{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}{\varphi(\theta_k^{(t)})}$$
where the θk(t)'s are generated from the density ϕ(·) (with fatter tails, like t's)
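The contrast can be seen on a conjugate toy model where Zk is known in closed form. In the Python sketch below (illustrative, not from the talk), the Gelfand–Dey version uses a ϕ with lighter tails than the posterior and is stable; the original harmonic mean (ϕ = prior) sits at the infinite-variance boundary for this parameter choice and is far noisier.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

x = 1.0
prior = lambda t: norm.pdf(t, 0, 1)   # theta ~ N(0, 1)
lik = lambda t: norm.pdf(x, t, 1)     # x ~ N(theta, 1)
z_true = norm.pdf(x, 0, np.sqrt(2))   # closed-form evidence

T = 100_000
theta = rng.normal(0.5, np.sqrt(0.5), T)   # exact posterior draws, N(0.5, 0.5)

# Gelfand-Dey: phi with *lighter* tails than prior x likelihood -> bounded ratio
phi = lambda t: norm.pdf(t, 0.5, 0.5)
z_gd = 1.0 / np.mean(phi(theta) / (prior(theta) * lik(theta)))

# original harmonic mean (phi = prior): heavy-tailed weights, unreliable
z_hm = 1.0 / np.mean(1.0 / lik(theta))
print(z_gd, z_hm, z_true)
```

Rerunning with different seeds, `z_gd` barely moves while `z_hm` jumps around, which is exactly the finite- versus infinite-variance distinction the slide draws.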
43. On approximating Bayes factors
Importance sampling solutions compared
Harmonic means
HPD indicator as ϕ
Use the convex hull of the MCMC simulations corresponding to the 10% HPD region (easily derived!) and ϕ as indicator:
$$\varphi(\theta) = \frac{10}{T}\sum_{t\in \mathrm{HPD}} I_{d(\theta,\theta^{(t)})\le \epsilon}$$
44. On approximating Bayes factors
Importance sampling solutions compared
Harmonic means
Diabetes in Pima Indian women (cont’d)
Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations from the above harmonic mean sampler and importance samplers
[Boxplots of the harmonic mean and importance sampling approximations, both concentrated between roughly 3.102 and 3.116]
45. On approximating Bayes factors
Importance sampling solutions compared
Mixtures to bridge
Approximating Zk using a mixture representation
Bridge sampling redux
Design a specific mixture for simulation [importance sampling] purposes, with density
$$\varphi_k(\theta_k) \propto \omega_1\, \pi_k(\theta_k)\,L_k(\theta_k) + \varphi(\theta_k),$$
where ϕ(·) is arbitrary (but normalised)
Note: ω1 is not a probability weight
[Chopin & Robert, 2010]
47. On approximating Bayes factors
Importance sampling solutions compared
Mixtures to bridge
Approximating Z using a mixture representation (cont’d)
Corresponding MCMC (=Gibbs) sampler
At iteration t
1. Take δ(t) = 1 with probability
$$\omega_1\, \pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)}) \Big/ \left\{ \omega_1\, \pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)}) + \varphi(\theta_k^{(t-1)}) \right\}$$
and δ(t) = 2 otherwise;
2. If δ(t) = 1, generate θk(t) ∼ MCMC(θk(t−1), θk) where MCMC(θk, θk′) denotes an arbitrary MCMC kernel associated with the posterior πk(θk|x) ∝ πk(θk)Lk(θk);
3. If δ(t) = 2, generate θk(t) ∼ ϕ(θk) independently
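A compact rendition of this two-component sampler, in Python for illustration, on the unnormalised target π(θ)L(θ) = exp(−θ²/2), so Zk = √(2π). Since ϕ is normalised, the stationary frequency of δ = 1 is ω1Zk/(ω1Zk + 1), which already yields a crude estimate of Zk from the δ sequence (the Rao-Blackwellised estimator of Chopin & Robert refines this):

```python
import numpy as np

rng = np.random.default_rng(4)

target = lambda t: np.exp(-t * t / 2)              # unnormalised pi_k L_k; Z_k = sqrt(2*pi)
phi = lambda t: np.exp(-t * t / 8) / np.sqrt(8 * np.pi)  # normalised N(0, 4) density
w1 = 1.0                                           # omega_1, not a probability weight

T = 50_000
theta = 0.0
delta1 = np.zeros(T, dtype=bool)
for t in range(T):
    # step 1: attribute the current value to one of the two mixture components
    p1 = w1 * target(theta) / (w1 * target(theta) + phi(theta))
    if rng.random() < p1:
        delta1[t] = True
        # step 2: one random-walk Metropolis step leaving pi_k(.|x) invariant
        prop = theta + rng.normal(0, 1)
        if rng.random() < target(prop) / target(theta):
            theta = prop
    else:
        # step 3: independent refreshment from phi
        theta = rng.normal(0, 2)

p_hat = delta1.mean()
z_hat = p_hat / (w1 * (1 - p_hat))  # crude evidence estimate from the delta frequencies
print(z_hat)                        # should be near sqrt(2*pi) ~ 2.5066
```

The independent refreshments from ϕ in step 3 are what give the chain its mixing, while the δ frequencies carry the information about Zk.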
52. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
Chib’s representation
Direct application of Bayes’ theorem: given x ∼ fk(x|θk) and θk ∼ πk(θk),
$$Z_k = m_k(x) = \frac{f_k(x|\theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k|x)}$$
Use of an approximation to the posterior
$$\widehat{Z}_k = \widehat{m}_k(x) = \frac{f_k(x|\theta_k^*)\,\pi_k(\theta_k^*)}{\hat\pi_k(\theta_k^*|x)}\,.$$
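Chib's identity holds at every point θ*; only the estimated posterior density in the denominator introduces error. A closed-form check in Python on the conjugate model x ∼ N(θ, 1), θ ∼ N(0, 1) (chosen for illustration), where both posterior and evidence are known:

```python
import numpy as np
from scipy.stats import norm

x = 1.3
# x ~ N(theta, 1), theta ~ N(0, 1)  =>  theta | x ~ N(x/2, 1/2), m(x) = N(x; 0, sqrt(2))
evidence = norm.pdf(x, 0, np.sqrt(2))

# the ratio f(x|theta*) pi(theta*) / pi(theta*|x) equals m(x) at *any* theta*
for theta_star in (0.0, x / 2, 2.0):
    z = norm.pdf(x, theta_star, 1) * norm.pdf(theta_star, 0, 1) \
        / norm.pdf(theta_star, x / 2, np.sqrt(0.5))
    print(z, evidence)   # identical for every theta_star
```

In practice θ* is taken at a high-posterior-density point (e.g. the MAP, as on the Galaxy slides) because the estimated denominator is most accurate where the posterior mass is.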
54. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
Case of latent variables
For missing variable z as in mixture models, natural Rao-Blackwell estimate
$$\hat\pi_k(\theta_k^*|x) = \frac{1}{T}\sum_{t=1}^{T}\pi_k(\theta_k^*|x, z_k^{(t)}),$$
where the zk(t)'s are Gibbs sampled latent variables
55. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
Label switching
A mixture model [special case of missing variable model] is invariant under permutations of the indices of the components.
E.g., the mixtures
$$0.3\,\mathcal{N}(0, 1) + 0.7\,\mathcal{N}(2.3, 1)$$
and
$$0.7\,\mathcal{N}(2.3, 1) + 0.3\,\mathcal{N}(0, 1)$$
are exactly the same!
⇒ The component parameters θi are not identifiable marginally since they are exchangeable
57. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
Connected difficulties
1. Number of modes of the likelihood of order O(k!):
⇒ Maximization and even [MCMC] exploration of the posterior surface harder
2. Under exchangeable priors on (θ, p) [prior invariant under permutation of the indices], all posterior marginals are identical:
⇒ Posterior expectation of θ1 equal to posterior expectation of θ2
59. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
License
Since the Gibbs output does not produce exchangeability, the Gibbs sampler has not explored the whole parameter space: it lacks the energy to switch enough component allocations simultaneously
[Gibbs output for a three-component mixture: traces of µi, pi and σi against iteration n, plus pairwise scatterplots, showing no label switching]
60. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
Label switching paradox
We should observe the exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler.
If we observe it, then we do not know how to estimate the parameters.
If we do not, then we are uncertain about the convergence!
63. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
Compensation for label switching
For mixture models, the zk(t)'s usually fail to visit all configurations in a balanced way, despite the symmetry predicted by the theory
$$\pi_k(\theta_k|x) = \pi_k(\sigma(\theta_k)|x) = \frac{1}{k!}\sum_{\sigma \in S_k} \pi_k(\sigma(\theta_k)|x)$$
for all σ’s in Sk, the set of all permutations of {1, . . . , k}.
Consequences on the numerical approximation, biased by an order k!
Recover the theoretical symmetry by using
$$\hat\pi_k(\theta_k^*|x) = \frac{1}{T\,k!}\sum_{\sigma \in S_k}\sum_{t=1}^{T}\pi_k(\sigma(\theta_k^*)|x, z_k^{(t)})\,.$$
[Berkhof, Mechelen, & Gelman, 2003]
65. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
Galaxy dataset
n = 82 galaxies as a mixture of k normal distributions with both mean and variance unknown.
[Roeder, 1992]
[Histogram of the (standardised) galaxy data over (−2, 3) with the average estimated mixture density overlaid]
66. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
Galaxy dataset (k)
Using only the original estimate, with $\theta_k^*$ taken as the MAP estimator,
$$\log \hat m_k(x) = -105.1396$$
for $k = 3$ (based on $10^3$ simulations), while introducing the permutations leads to
$$\log \hat m_k(x) = -103.3479$$
Note that
$$-105.1396 + \log(3!) = -103.3479$$

k                 2        3        4        5        6        7        8
log m̂k(x)    −115.68  −103.35  −102.66  −101.93  −102.88  −105.48  −108.44

Estimates of the marginal likelihoods by the symmetrised Chib's approximation (based on $10^5$ Gibbs iterations and, for $k > 5$, 100 permutations selected at random in $S_k$).
[Lee, Marin, Mengersen & Robert, 2008]
69. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
Case of the probit model
For the completion by $z$,
$$\hat\pi(\theta\mid x) = \frac{1}{T}\sum_t \pi(\theta\mid x, z^{(t)})$$
is a simple average of normal densities.
R code (gibbsprobit, probitlpost and dmvlnorm are helper functions from the course code):
gibbs1=gibbsprobit(Niter,y,X1)
gibbs2=gibbsprobit(Niter,y,X2)
bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),
      sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/
      mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),
      sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))
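Chib's basic identity (the "candidate's formula") behind such estimates can be sanity-checked on a conjugate toy model, sketched here in Python rather than the slides' R; the model and data below are illustrative choices:

```python
import math

def norm_pdf(y, mu, var):
    # univariate normal density, variance parameterisation
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# illustrative data: x_i ~ N(theta, 1), prior theta ~ N(0, 1)
x = [0.3, -0.5, 1.2]
n, xbar = len(x), sum(x) / len(x)

# conjugate posterior: theta | x ~ N(n*xbar/(n+1), 1/(n+1))
post_mean, post_var = n * xbar / (n + 1), 1 / (n + 1)

theta_star = post_mean                     # any theta* works; a MAP-type choice
prior = norm_pdf(theta_star, 0, 1)
lik = math.prod(norm_pdf(xi, theta_star, 1) for xi in x)
post = norm_pdf(theta_star, post_mean, post_var)

chib = prior * lik / post                  # candidate's formula for m(x)

# closed-form marginal likelihood for comparison
exact = (2 * math.pi) ** (-n / 2) / math.sqrt(n + 1) * math.exp(
    -0.5 * sum(xi ** 2 for xi in x) + (n * xbar) ** 2 / (2 * (n + 1)))
print(chib, exact)
```

In the probit case the exact posterior density is unavailable, which is why the slides replace it by the Rao–Blackwellised average over the Gibbs sample.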
71. On approximating Bayes factors
Importance sampling solutions compared
Chib’s solution
Diabetes in Pima Indian women (cont’d)
Comparison of the variability of the Bayes factor approximations, based on 100 replicates of 20,000 simulations each, from the above Chib's and importance samplers.
[Figure: boxplots of the Bayes factor approximations (range roughly 0.0240–0.0255) for Chib's method and importance sampling.]
72. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
The Savage–Dickey ratio
Special representation of the Bayes factor used for simulation
Original version (Dickey, AoMS, 1971)
73. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Savage’s density ratio theorem
Given a test $H_0:\ \theta = \theta_0$ in a model $f(x\mid\theta,\psi)$ with a nuisance parameter $\psi$, under priors $\pi_0(\psi)$ and $\pi_1(\theta,\psi)$ such that
$$\pi_1(\psi\mid\theta_0) = \pi_0(\psi)\,,$$
then
$$B_{01} = \frac{\pi_1(\theta_0\mid x)}{\pi_1(\theta_0)}\,,$$
with the obvious notations
$$\pi_1(\theta) = \int \pi_1(\theta,\psi)\,d\psi\,,\qquad \pi_1(\theta\mid x) = \int \pi_1(\theta,\psi\mid x)\,d\psi\,.$$
[Dickey, 1971; Verdinelli & Wasserman, 1995]
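The theorem can be checked in closed form on a nuisance-free toy case (a Python sketch; the model is an illustrative choice, not from the slides): for $x \sim N(\theta, 1)$, $H_0:\theta = 0$ and prior $\theta \sim N(0,1)$ under $H_1$, the posterior-to-prior density ratio at $\theta_0 = 0$ reproduces $B_{01} = m_0(x)/m_1(x)$.

```python
import math

def norm_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x = 1.0
# marginal likelihoods: under H0, x ~ N(0, 1); under H1, x ~ N(0, 2)
B01_direct = norm_pdf(x, 0, 1) / norm_pdf(x, 0, 2)

# Savage-Dickey: posterior theta | x ~ N(x/2, 1/2) under H1,
# so B01 = pi1(0 | x) / pi1(0)
B01_sd = norm_pdf(0, x / 2, 0.5) / norm_pdf(0, 0, 1)
print(B01_direct, B01_sd)
```

The two computations agree to floating-point precision, which is exactly the content of the density ratio theorem in this setting.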
74. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Rephrased
"Suppose that $f_0(\theta) = f_1(\theta\mid\phi=\phi_0)$. As $f_0(x\mid\theta) = f_1(x\mid\theta,\phi=\phi_0)$,
$$f_0(x) = \int f_1(x\mid\theta,\phi=\phi_0)\, f_1(\theta\mid\phi=\phi_0)\,d\theta = f_1(x\mid\phi=\phi_0)\,,$$
i.e., the numerator of the Bayes factor is the value of $f_1(x\mid\phi)$ at $\phi=\phi_0$, while the denominator is an average of the values of $f_1(x\mid\phi)$, weighted by the prior distribution $f_1(\phi)$ under the augmented model. Applying Bayes' theorem to the right-hand side of [the above] we get
$$f_0(x) = f_1(\phi_0\mid x)\, f_1(x)\big/ f_1(\phi_0)$$
and hence the Bayes factor is given by
$$B = f_0(x)\big/ f_1(x) = f_1(\phi_0\mid x)\big/ f_1(\phi_0)\,,$$
the ratio of the posterior to prior densities at $\phi=\phi_0$ under the augmented model."
[O’Hagan & Forster, 1996]
76. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Measure-theoretic difficulty
The representation depends on the choice of versions of conditional densities:
$$\begin{aligned}
B_{01} &= \frac{\int \pi_0(\psi)\, f(x\mid\theta_0,\psi)\,d\psi}{\int \pi_1(\theta,\psi)\, f(x\mid\theta,\psi)\,d\psi\,d\theta} &&\text{[by definition]}\\
&= \frac{\int \pi_1(\psi\mid\theta_0)\, f(x\mid\theta_0,\psi)\,d\psi}{\int \pi_1(\theta,\psi)\, f(x\mid\theta,\psi)\,d\psi\,d\theta}\,\frac{\pi_1(\theta_0)}{\pi_1(\theta_0)} &&\text{[specific version of $\pi_1(\psi\mid\theta_0)$ and arbitrary version of $\pi_1(\theta_0)$]}\\
&= \frac{\int \pi_1(\theta_0,\psi)\, f(x\mid\theta_0,\psi)\,d\psi}{m_1(x)\,\pi_1(\theta_0)} &&\text{[specific version of $\pi_1(\theta_0,\psi)$]}\\
&= \frac{\pi_1(\theta_0\mid x)}{\pi_1(\theta_0)} &&\text{[version dependent]}
\end{aligned}$$
78. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Choice of density version
Dickey's (1971) condition is not a condition: if
$$\frac{\pi_1(\theta_0\mid x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\, f(x\mid\theta_0,\psi)\,d\psi}{m_1(x)}$$
is chosen as a version, then the Savage–Dickey representation holds
80. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Savage–Dickey paradox
The Verdinelli–Wasserman extension
$$B_{01} = \frac{\pi_1(\theta_0\mid x)}{\pi_1(\theta_0)}\, \mathbb{E}^{\pi_1(\psi\mid x,\theta_0)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi\mid\theta_0)}\right]$$
similarly depends on choices of versions...
...but the Monte Carlo implementation relies on specific versions of all densities without mentioning it
[Chen, Shao & Ibrahim, 2000]
82. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Computational implementation
Starting from the (new) prior
$$\tilde\pi_1(\theta,\psi) = \pi_1(\theta)\,\pi_0(\psi)$$
define the associated posterior
$$\tilde\pi_1(\theta,\psi\mid x) = \pi_0(\psi)\,\pi_1(\theta)\, f(x\mid\theta,\psi)\big/\tilde m_1(x)$$
and impose
$$\frac{\tilde\pi_1(\theta_0\mid x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\, f(x\mid\theta_0,\psi)\,d\psi}{\tilde m_1(x)}$$
to hold. Then
$$B_{01} = \frac{\tilde\pi_1(\theta_0\mid x)}{\pi_1(\theta_0)}\,\frac{\tilde m_1(x)}{m_1(x)}$$
84. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
First ratio
If $(\theta^{(1)},\psi^{(1)}),\ldots,(\theta^{(T)},\psi^{(T)}) \sim \tilde\pi_1(\theta,\psi\mid x)$, then
$$\frac{1}{T}\sum_t \tilde\pi_1(\theta_0\mid x, \psi^{(t)})$$
converges to $\tilde\pi_1(\theta_0\mid x)$ (if the right version is used in $\theta_0$), where
$$\tilde\pi_1(\theta_0\mid x,\psi) = \frac{\pi_1(\theta_0)\, f(x\mid\theta_0,\psi)}{\int \pi_1(\theta)\, f(x\mid\theta,\psi)\,d\theta}\,.$$
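This Rao–Blackwell average can be checked numerically on a hypothetical conjugate toy (a Python sketch; the model $f(x\mid\theta,\psi) = N(x;\theta+\psi,1)$ with standard normal priors is an illustrative choice, not from the slides), where both the conditional and the marginal posterior densities are available in closed form:

```python
import math, random

random.seed(2)

def norm_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# illustrative model: f(x|theta,psi) = N(x; theta+psi, 1),
# pi1(theta) = pi0(psi) = N(0, 1), so pi1-tilde = N(0,1) x N(0,1)
x, theta0, T = 1.0, 0.0, 200_000

# under pi1-tilde, (theta, psi) | x is Gaussian with mean (x/3, x/3)
# and covariance (1/3)[[2,-1],[-1,2]]; marginally psi | x ~ N(x/3, 2/3)
acc = 0.0
for _ in range(T):
    psi = random.gauss(x / 3, math.sqrt(2 / 3))
    # pi1-tilde(theta0 | x, psi) = pi1(theta0) f(x|theta0,psi) / N(x; psi, 2)
    acc += norm_pdf(theta0, 0, 1) * norm_pdf(x, theta0 + psi, 1) / norm_pdf(x, psi, 2)

estimate = acc / T
exact = norm_pdf(theta0, x / 3, 2 / 3)    # marginal pi1-tilde(theta0 | x)
print(estimate, exact)
```

The Monte Carlo average converges to the exact marginal posterior density at $\theta_0$, as the slide asserts.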
85. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Rao–Blackwellisation with latent variables
When $\tilde\pi_1(\theta_0\mid x,\psi)$ is unavailable, replace it with
$$\frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0\mid x, z^{(t)}, \psi^{(t)})$$
via data completion by a latent variable $z$ such that
$$f(x\mid\theta,\psi) = \int \tilde f(x,z\mid\theta,\psi)\,dz$$
and such that $\tilde\pi_1(\theta,\psi,z\mid x) \propto \pi_0(\psi)\,\pi_1(\theta)\,\tilde f(x,z\mid\theta,\psi)$ is available in closed form, including the normalising constant, based on the version
$$\frac{\tilde\pi_1(\theta_0\mid x,z,\psi)}{\pi_1(\theta_0)} = \frac{\tilde f(x,z\mid\theta_0,\psi)}{\int \tilde f(x,z\mid\theta,\psi)\,\pi_1(\theta)\,d\theta}\,.$$
87. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Bridge revival (1)
Since $m_1(x)/\tilde m_1(x)$ is unknown, apparent failure!
Use the bridge identity
$$\mathbb{E}^{\tilde\pi_1(\theta,\psi\mid x)}\!\left[\frac{\pi_1(\theta,\psi)\, f(x\mid\theta,\psi)}{\pi_0(\psi)\,\pi_1(\theta)\, f(x\mid\theta,\psi)}\right] = \mathbb{E}^{\tilde\pi_1(\theta,\psi\mid x)}\!\left[\frac{\pi_1(\psi\mid\theta)}{\pi_0(\psi)}\right] = \frac{m_1(x)}{\tilde m_1(x)}$$
to (biasedly) estimate $m_1(x)/\tilde m_1(x)$ by
$$\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_1(\psi^{(t)}\mid\theta^{(t)})}{\pi_0(\psi^{(t)})}$$
based on the same sample from $\tilde\pi_1$.
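The bridge identity can be verified on a small hypothetical conjugate example (a Python sketch; the model and priors below are illustrative choices, not the slides'): with $f(x\mid\theta,\psi) = N(x;\theta+\psi,1)$, $\pi_1(\theta) = \pi_0(\psi) = N(0,1)$ and $\pi_1(\psi\mid\theta) = N(\psi;\theta,1)$, both marginals are Gaussian, so the Monte Carlo average of $\pi_1(\psi\mid\theta)/\pi_0(\psi)$ under $\tilde\pi_1(\theta,\psi\mid x)$ can be compared with the exact ratio $m_1(x)/\tilde m_1(x)$.

```python
import math, random

random.seed(1)

def norm_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# illustrative model: f(x|theta,psi) = N(x; theta+psi, 1),
# pi1(theta) = pi0(psi) = N(0,1) and pi1(psi|theta) = N(psi; theta, 1)
x, T = 1.0, 200_000

# marginals: m1-tilde(x) = N(x; 0, 3) and m1(x) = N(x; 0, 6)
ratio_true = norm_pdf(x, 0, 6) / norm_pdf(x, 0, 3)

# posterior pi1-tilde(theta, psi | x): mean (x/3, x/3), cov (1/3)[[2,-1],[-1,2]]
acc = 0.0
for _ in range(T):
    theta = random.gauss(x / 3, math.sqrt(2 / 3))
    psi = random.gauss(x / 3 - (theta - x / 3) / 2, math.sqrt(1 / 2))
    acc += norm_pdf(psi, theta, 1) / norm_pdf(psi, 0, 1)  # pi1(psi|theta)/pi0(psi)

ratio_mc = acc / T
print(ratio_mc, ratio_true)
```

The average recovers the unknown normalising-constant ratio using only samples from $\tilde\pi_1$, which is the whole point of the bridge step.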
89. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Bridge revival (2)
The alternative identity
$$\mathbb{E}^{\pi_1(\theta,\psi\mid x)}\!\left[\frac{\pi_0(\psi)\,\pi_1(\theta)\, f(x\mid\theta,\psi)}{\pi_1(\theta,\psi)\, f(x\mid\theta,\psi)}\right] = \mathbb{E}^{\pi_1(\theta,\psi\mid x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi\mid\theta)}\right] = \frac{\tilde m_1(x)}{m_1(x)}$$
suggests using a second sample $(\bar\theta^{(1)},\bar\psi^{(1)},\bar z^{(1)}),\ldots,(\bar\theta^{(T)},\bar\psi^{(T)},\bar z^{(T)}) \sim \pi_1(\theta,\psi\mid x)$ and the ratio estimate
$$\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}\mid\bar\theta^{(t)})}$$
Resulting unbiased estimate:
$$\widehat{B}_{01} = \left\{\frac{1}{T}\sum_t \frac{\tilde\pi_1(\theta_0\mid x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\right\} \left\{\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}\mid\bar\theta^{(t)})}\right\}$$
91. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Difference with Verdinelli–Wasserman representation
The above leads to the representation
$$B_{01} = \frac{\tilde\pi_1(\theta_0\mid x)}{\pi_1(\theta_0)}\, \mathbb{E}^{\pi_1(\theta,\psi\mid x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi\mid\theta)}\right]$$
which shows how our approach differs from Verdinelli and Wasserman's
$$B_{01} = \frac{\pi_1(\theta_0\mid x)}{\pi_1(\theta_0)}\, \mathbb{E}^{\pi_1(\psi\mid x,\theta_0)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi\mid\theta_0)}\right]$$
92. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Difference with Verdinelli–Wasserman approximation
In terms of implementation,
$$\widehat{B}^{\mathrm{MR}}_{01}(x) = \left\{\frac{1}{T}\sum_t \frac{\tilde\pi_1(\theta_0\mid x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\right\} \left\{\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}\mid\bar\theta^{(t)})}\right\}$$
formally resembles
$$\widehat{B}^{\mathrm{VW}}_{01}(x) = \left\{\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_1(\theta_0\mid x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\right\} \left\{\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\tilde\psi^{(t)})}{\pi_1(\tilde\psi^{(t)}\mid\theta_0)}\right\}\,.$$
But the simulated sequences differ: the first average involves simulations from $\tilde\pi_1(\theta,\psi,z\mid x)$ and from $\pi_1(\theta,\psi,z\mid x)$, while the second relies on simulations from $\pi_1(\theta,\psi,z\mid x)$ and from $\pi_1(\psi,z\mid x,\theta_0)$.
94. On approximating Bayes factors
Importance sampling solutions compared
The Savage–Dickey ratio
Diabetes in Pima Indian women (cont’d)
Comparison of the variability of the Bayes factor approximations, based on 100 replicates of 20,000 simulations each, from the above importance, Chib's, Savage–Dickey and bridge samplers.
[Figure: boxplots of the Bayes factor approximations (range roughly 2.8–3.4) for IS, Chib, Savage–Dickey and bridge sampling.]
95. On approximating Bayes factors
Nested sampling
Properties of nested sampling
1 Introduction
2 Importance sampling solutions compared
3 Nested sampling
Purpose
Implementation
Error rates
Impact of dimension
Constraints
Importance variant
A mixture comparison
[Chopin & Robert, 2010]
96. On approximating Bayes factors
Nested sampling
Purpose
Nested sampling: Goal
Skilling's (2007) technique uses the one-dimensional representation
$$Z = \mathbb{E}^{\pi}[L(\theta)] = \int_0^1 \varphi(x)\,dx$$
with
$$\varphi^{-1}(l) = P^{\pi}(L(\theta) > l)\,.$$
Note: $\varphi(\cdot)$ is intractable in most cases.
97. On approximating Bayes factors
Nested sampling
Implementation
Nested sampling: First approximation
Approximate $Z$ by a Riemann sum
$$\widehat Z = \sum_{i=1}^{j} (x_{i-1} - x_i)\,\varphi(x_i)$$
where the $x_i$'s are either
deterministic: $x_i = e^{-i/N}$,
or random:
$$x_0 = 1,\qquad x_{i+1} = t_i x_i,\qquad t_i \sim \mathcal{B}e(N,1)\,,$$
so that $\mathbb{E}[\log x_i] = -i/N$.
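The deterministic version of this Riemann sum is easy to sanity-check when $\varphi$ is known (a Python sketch with the toy choice $\varphi(x) = x$, so that the integral is $1/2$; this example is illustrative, not from the slides):

```python
import math

N = 1000                      # controls the geometric grid x_i = e^{-i/N}
phi = lambda x: x             # toy tractable phi, integral over (0,1) is 1/2

Z, x_prev, i = 0.0, 1.0, 1
while x_prev > 1e-8:          # stop once the remaining interval is negligible
    x_i = math.exp(-i / N)
    Z += (x_prev - x_i) * phi(x_i)   # rectangle on (x_i, x_{i-1})
    x_prev = x_i
    i += 1

print(Z)                      # close to 0.5
```

The quadrature error of the deterministic grid is $O(1/N)$; the real difficulty, addressed next, is that $\varphi(x_i)$ itself is intractable.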
98. On approximating Bayes factors
Nested sampling
Implementation
Extraneous white noise
Take
$$Z = \int_0^\infty e^{-\theta}\,d\theta = \int_0^\infty \frac{1}{\delta}\, e^{-(1-\delta)\theta}\,\delta e^{-\delta\theta}\,d\theta = \mathbb{E}_{\delta}\!\left[\frac{1}{\delta}\, e^{-(1-\delta)\theta}\right]$$
$$\widehat Z = \frac{1}{N}\sum_{i=1}^{N} \delta^{-1} e^{-(1-\delta)\theta_i}(x_{i-1} - x_i)\,,\qquad \theta_i \sim \mathcal{E}(\delta)\,\mathbb{I}(\theta_i \le \theta_{i-1})$$
Comparison of variances and MSEs:

N      deterministic   random
50     4.64            10.5
       4.65            10.5
100    2.47            4.9
       2.48            5.02
500    .549            1.01
       .550            1.14
101. On approximating Bayes factors
Nested sampling
Implementation
Nested sampling: Second approximation
Replace the (intractable) $\varphi(x_i)$ by $\varphi_i$, obtained by
Nested sampling:
Start with $N$ values $\theta_1,\ldots,\theta_N$ sampled from $\pi$.
At iteration $i$,
1. take $\varphi_i = L(\theta_k)$, where $\theta_k$ is the point with smallest likelihood in the pool of $\theta_i$'s;
2. replace $\theta_k$ with a sample from the prior constrained to $L(\theta) > \varphi_i$: the current $N$ points are then sampled from the prior constrained to $L(\theta) > \varphi_i$.
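Putting the two approximations together gives a minimal nested sampler. The sketch below (Python, not Skilling's implementation) uses an illustrative toy problem where the evidence is known: prior $N(0,1)$ and likelihood $L(\theta) = N(0;\theta,1)$, so $Z = N(0;0,2) = 1/\sqrt{4\pi}$; here the constrained-prior draws are exact by rejection.

```python
import math, random

random.seed(0)
SQ2PI = math.sqrt(2 * math.pi)

def lik(theta):
    # toy likelihood L(theta) = N(0; theta, 1)
    return math.exp(-theta ** 2 / 2) / SQ2PI

def sample_constrained(r):
    # exact draw from the N(0,1) prior restricted to |theta| < r,
    # via a uniform proposal accepted with probability exp(-theta^2/2) <= 1
    while True:
        theta = random.uniform(-r, r)
        if random.random() < math.exp(-theta ** 2 / 2):
            return theta

N = 400                                         # pool size
j = 12 * N                                      # stop once x_j = e^{-12}
pool = [random.gauss(0, 1) for _ in range(N)]   # start from the prior
liks = [lik(t) for t in pool]

Zhat, x_prev = 0.0, 1.0
for i in range(1, j + 1):
    k = min(range(N), key=liks.__getitem__)
    phi_i = liks[k]                             # smallest likelihood in pool
    x_i = math.exp(-i / N)
    Zhat += (x_prev - x_i) * phi_i              # Riemann sum, deterministic x_i
    x_prev = x_i
    # for this likelihood, L(theta) > phi_i is equivalent to |theta| < r
    r = math.sqrt(-2 * math.log(min(1.0, phi_i * SQ2PI)))
    pool[k] = sample_constrained(r)
    liks[k] = lik(pool[k])

Ztrue = 1 / math.sqrt(4 * math.pi)              # closed-form evidence N(0; 0, 2)
print(Zhat, Ztrue)
```

The stochastic error of the estimate scales as $O_P(N^{-1/2})$, in line with the CLT discussed below; in realistic problems the constrained-prior step is the hard part, as the later slides stress.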
104. On approximating Bayes factors
Nested sampling
Implementation
Nested sampling: Third approximation
Iterate the above steps until a given stopping iteration $j$ is reached: e.g.,
- observe very small changes in the approximation $\widehat Z$;
- reach the maximal value of $L(\theta)$ when the likelihood is bounded and its maximum is known;
- truncate the integral $Z$ at level $\varepsilon$, i.e. replace
$$\int_0^1 \varphi(x)\,dx \quad\text{with}\quad \int_{\varepsilon}^1 \varphi(x)\,dx$$
105. On approximating Bayes factors
Nested sampling
Error rates
Approximation error
$$\begin{aligned}
\text{Error} &= \widehat Z - Z\\
&= \sum_{i=1}^{j}(x_{i-1}-x_i)\varphi_i - \int_0^1 \varphi(x)\,dx\\
&= -\int_0^{\varepsilon} \varphi(x)\,dx &&\text{(Truncation Error)}\\
&\quad + \left\{\sum_{i=1}^{j}(x_{i-1}-x_i)\varphi(x_i) - \int_{\varepsilon}^1 \varphi(x)\,dx\right\} &&\text{(Quadrature Error)}\\
&\quad + \sum_{i=1}^{j}(x_{i-1}-x_i)\{\varphi_i - \varphi(x_i)\} &&\text{(Stochastic Error)}
\end{aligned}$$
[Dominated by the Stochastic (Monte Carlo) Error!]
106. On approximating Bayes factors
Nested sampling
Error rates
A CLT for the Stochastic Error
The (dominating) stochastic error is $O_P(N^{-1/2})$:
$$N^{1/2}\{\text{Stochastic Error}\} \overset{\mathcal{D}}{\longrightarrow} \mathcal{N}(0, V)$$
with
$$V = -\int_{s,t\in[\varepsilon,1]} s\varphi'(s)\, t\varphi'(t)\, \log(s\vee t)\,ds\,dt\,.$$
[Proof based on Donsker's theorem]
The number of simulated points equals the number of iterations $j$, and is a multiple of $N$: if one stops at the first iteration $j$ such that $e^{-j/N} < \varepsilon$, then $j = N\lceil -\log\varepsilon\rceil$.
108. On approximating Bayes factors
Nested sampling
Impact of dimension
Curse of dimension
For a simple Gaussian–Gaussian model of dimension $\dim(\theta) = d$, the following three quantities are $O(d)$:
1. the asymptotic variance of the NS estimator;
2. the number of iterations necessary to reach a given truncation error;
3. the cost of one simulated sample.
Therefore, the CPU time necessary for achieving error level $e$ is $O(d^3/e^2)$.
112. On approximating Bayes factors
Nested sampling
Constraints
Sampling from constrained priors
Exact simulation from the constrained prior is intractable in most cases!
Skilling (2007) proposes to use MCMC instead, but:
- this introduces a bias (through the stopping rule);
- if the MCMC stationary distribution is the unconstrained prior, it becomes more and more difficult to sample points such that $L(\theta) > l$ as $l$ increases.
If exact simulation is implementable, then a slice sampler can be devised at the same cost!
[Thanks, Gareth!]
115. On approximating Bayes factors
Nested sampling
Importance variant
An IS variant of nested sampling
Consider an instrumental prior $\tilde\pi$ and likelihood $\tilde L$, the weight function
$$w(\theta) = \frac{\pi(\theta)\, L(\theta)}{\tilde\pi(\theta)\,\tilde L(\theta)}$$
and the weighted NS estimator
$$\widehat Z = \sum_{i=1}^{j} (x_{i-1} - x_i)\,\tilde\varphi_i\, w(\theta_i)\,.$$
Then choose $(\tilde\pi, \tilde L)$ so that sampling from $\tilde\pi$ constrained to $\tilde L(\theta) > l$ is easy; e.g. $\mathcal{N}(c, I_d)$ constrained to $\|c - \theta\| < r$.
117. On approximating Bayes factors
Nested sampling
A mixture comparison
Benchmark: Target distribution
Posterior distribution on $(\mu, \sigma)$ associated with the mixture
$$p\,\mathcal{N}(0,1) + (1-p)\,\mathcal{N}(\mu, \sigma)\,,$$
when $p$ is known.
118. On approximating Bayes factors
Nested sampling
A mixture comparison
Experiment
- $n$ observations with $\mu = 2$ and $\sigma = 3/2$;
- use of a uniform prior both on $(-2, 6)$ for $\mu$ and on $(.001, 16)$ for $\log\sigma^2$;
- occurrences of posterior bursts for $\mu = x_i$;
- computation of the various estimates of $Z$.
119. On approximating Bayes factors
Nested sampling
A mixture comparison
Experiment (cont’d)
[Figure: left, MCMC sample for n = 16 observations from the mixture; right, nested sampling sequence with M = 1000 starting points.]
120. On approximating Bayes factors
Nested sampling
A mixture comparison
Experiment (cont’d)
[Figure: left, MCMC sample for n = 50 observations from the mixture; right, nested sampling sequence with M = 1000 starting points.]
121. On approximating Bayes factors
Nested sampling
A mixture comparison
Comparison
Monte Carlo and MCMC (= Gibbs) outputs based on $T = 10^4$ simulations, and numerical integration based on a $850 \times 950$ grid in the $(\mu, \sigma)$ parameter space.
Nested sampling approximation based on a starting sample of $M = 1000$ points, followed by at least $10^3$ further simulations from the constrained prior and a stopping rule at 95% of the observed maximum likelihood.
Constrained-prior simulation based on 50 values simulated by random walk, accepting only steps leading to a likelihood higher than the bound.