The document discusses computational methods for Bayesian model choice and model comparison. It introduces Bayes factors and the evidence as central quantities for model comparison. It then describes various computational methods for approximating the evidence, including importance sampling solutions like bridge sampling, harmonic mean approximations using posterior samples, and approximating the evidence using mixture representations.
These are the slides I gave during three lectures at the YES III meeting in Eindhoven, October 5-7, 2009. They include a presentation of the Savage-Dickey paradox and a comparison of the major importance sampling solutions, both obtained in collaboration with Jean-Michel Marin.
Computational tools for Bayesian model choice
1. On some computational methods for Bayesian model choice
On some computational methods for Bayesian model choice
Christian P. Robert
CREST-INSEE and Université Paris Dauphine
http://www.ceremade.dauphine.fr/~xian
Joint work with Nicolas Chopin and Jean-Michel Marin
2. On some computational methods for Bayesian model choice
Outline
1. Introduction
2. Importance sampling solutions
3. Cross-model solutions
4. Nested sampling
5. Mixture example
3. On some computational methods for Bayesian model choice
Introduction
Bayes factor
Bayes factor
Definition (Bayes factors)
For testing hypotheses H0 : θ ∈ Θ0 vs. Ha : θ ∈ Θ0^c, under the prior
$$ \pi(\Theta_0)\,\pi_0(\theta) + \pi(\Theta_0^c)\,\pi_1(\theta), $$
the central quantity is
$$ B_{01} = \frac{\pi(\Theta_0 \mid x)}{\pi(\Theta_0^c \mid x)} \bigg/ \frac{\pi(\Theta_0)}{\pi(\Theta_0^c)} = \frac{\int_{\Theta_0} f(x \mid \theta)\,\pi_0(\theta)\,\mathrm{d}\theta}{\int_{\Theta_0^c} f(x \mid \theta)\,\pi_1(\theta)\,\mathrm{d}\theta} $$
[Jeffreys, 1939]
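A minimal numerical illustration of this definition, under an assumed toy setting that is not part of the slides: a single observation x ∼ N(θ, 1), a point null Θ0 = {0}, and a N(0, 1) prior π1 on the alternative; the denominator integral is approximated by Monte Carlo over π1.

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed toy setting (not from the slides): one observation x ~ N(theta, 1),
# H0: theta = 0 versus Ha: theta != 0 with pi_1 = N(0, 1) on the alternative
x = 1.5
f = lambda theta: np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2 * np.pi)   # f(x|theta)

numerator = f(0.0)                              # the integral over Theta_0 collapses to f(x|0)
theta = rng.normal(0.0, 1.0, size=200_000)      # Monte Carlo draws from pi_1
denominator = f(theta).mean()                   # approximates the integral of f(x|theta) pi_1(theta)
print("B01 =", numerator / denominator)         # ~0.8 here, i.e. B10 ~ 1.2: weak evidence against H0
```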
4. On some computational methods for Bayesian model choice
Introduction
Bayes factor
Self-contained concept
Outside a decision-theoretic environment:
- eliminates the impact of π(Θ0) but depends on the choice of (π0, π1)
- Bayesian/marginal equivalent to the likelihood ratio
- Jeffreys' scale of evidence:
  - if log10(B10^π) is between 0 and 0.5, the evidence against H0 is weak,
  - if log10(B10^π) is between 0.5 and 1, the evidence is substantial,
  - if log10(B10^π) is between 1 and 2, the evidence is strong, and
  - if log10(B10^π) is above 2, the evidence is decisive
- requires the computation of the marginal/evidence under both hypotheses/models
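As a small illustrative sketch (not part of the slides), the scale can be encoded as a lookup; the function name and the handling of log10(B10^π) below 0 are assumptions.

```python
import numpy as np

def jeffreys_scale(B10):
    """Qualitative reading of a Bayes factor B10 on Jeffreys' scale (sketch)."""
    log10_B = np.log10(B10)
    if log10_B < 0:
        return "evidence in favour of H0"   # the slide only states the scale against H0
    if log10_B <= 0.5:
        return "weak evidence against H0"
    if log10_B <= 1:
        return "substantial evidence against H0"
    if log10_B <= 2:
        return "strong evidence against H0"
    return "decisive evidence against H0"

print(jeffreys_scale(12.0))   # log10(12) ~ 1.08, i.e. strong evidence against H0
```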
5. On some computational methods for Bayesian model choice
Introduction
Model choice
Model choice and model comparison
Choice between models
Several models available for the same observation
Mi : x ∼ fi(x|θi),   i ∈ I
where I can be finite or infinite
6. On some computational methods for Bayesian model choice
Introduction
Model choice
Bayesian resolution
Probabilise the entire model/parameter space:
- allocate probabilities pi to all models Mi
- define priors πi(θi) for each parameter space Θi
- compute
$$ \pi(M_i \mid x) = \frac{p_i \int_{\Theta_i} f_i(x \mid \theta_i)\,\pi_i(\theta_i)\,\mathrm{d}\theta_i}{\sum_j p_j \int_{\Theta_j} f_j(x \mid \theta_j)\,\pi_j(\theta_j)\,\mathrm{d}\theta_j} $$
- take the largest π(Mi|x) to determine the “best” model, or use the averaged predictive
$$ \sum_j \pi(M_j \mid x) \int_{\Theta_j} f_j(x' \mid \theta_j)\,\pi_j(\theta_j \mid x)\,\mathrm{d}\theta_j $$
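As a hedged sketch (not part of the slides), the displayed posterior-probability formula reduces to a normalisation once each per-model evidence is available; the function name and the numbers below are purely illustrative, and the computation is done on the log scale to avoid underflow.

```python
import numpy as np

def posterior_model_probs(log_evidence, prior_prob):
    """Turn per-model log-evidences log Z_i and prior weights p_i into pi(M_i | x) (sketch)."""
    log_w = np.log(prior_prob) + np.asarray(log_evidence, dtype=float)
    log_w -= log_w.max()              # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# illustrative made-up numbers: three models, equal prior weights
print(posterior_model_probs(log_evidence=[-104.2, -102.9, -108.0],
                            prior_prob=[1/3, 1/3, 1/3]))
```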
10. On some computational methods for Bayesian model choice
Introduction
Evidence
Evidence
All these problems end up with a similar quantity, the evidence
$$ Z = \int \pi(\theta)\,L(\theta)\,\mathrm{d}\theta, $$
aka the marginal likelihood.
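A naive Monte Carlo sketch of this quantity, under an assumed conjugate toy model (not from the slides) chosen so that Z is available in closed form for checking: x ∼ N(θ, 1) with θ ∼ N(0, 1), for which Z = N(x; 0, 2).

```python
import numpy as np

rng = np.random.default_rng(1)

# assumed toy model: x ~ N(theta, 1), theta ~ N(0, 1), one observation x = 1.5
x = 1.5
L = lambda theta: np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2 * np.pi)   # likelihood L(theta)

theta = rng.normal(0.0, 1.0, size=200_000)   # draws from the prior pi(theta)
Z_hat = L(theta).mean()                      # naive Monte Carlo estimate of the evidence
Z_true = np.exp(-0.25 * x ** 2) / np.sqrt(4 * np.pi)
print(Z_hat, Z_true)                         # both ~0.16
```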
11. On some computational methods for Bayesian model choice
Importance sampling solutions
Regular importance
Bridge sampling
If
$$ \tilde\pi_1(\theta_1 \mid x) \propto \pi_1(\theta_1 \mid x) \qquad \text{and} \qquad \tilde\pi_2(\theta_2 \mid x) \propto \pi_2(\theta_2 \mid x) $$
live on the same space, then
$$ B_{12} \approx \frac{1}{n} \sum_{i=1}^{n} \frac{\tilde\pi_1(\theta_i \mid x)}{\tilde\pi_2(\theta_i \mid x)}, \qquad \theta_i \sim \pi_2(\theta \mid x) $$
[Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
13. On some computational methods for Bayesian model choice
Importance sampling solutions
Regular importance
Optimal bridge sampling
The optimal choice of auxiliary function α,
$$ \alpha^\star = \frac{n_1 + n_2}{n_1\,\tilde\pi_1(\theta \mid x) + n_2\,\tilde\pi_2(\theta \mid x)}, $$
leads to
$$ B_{12} \approx \left. \frac{1}{n_1} \sum_{i=1}^{n_1} \frac{\tilde\pi_2(\theta_{1i} \mid x)}{n_1\,\tilde\pi_1(\theta_{1i} \mid x) + n_2\,\tilde\pi_2(\theta_{1i} \mid x)} \right/ \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{\tilde\pi_1(\theta_{2i} \mid x)}{n_1\,\tilde\pi_1(\theta_{2i} \mid x) + n_2\,\tilde\pi_2(\theta_{2i} \mid x)} $$
Back later!
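A sketch of a bridge estimator along these lines, on the same assumed toy pair as above. It plugs the unnormalised π̃'s into α ∝ 1/(n1 π̃1 + n2 π̃2) as a stand-in for the optimal choice (any fixed α yields a consistent bridge estimator), and it is arranged to agree with the earlier identity B12 ≈ E_{π2}[π̃1/π̃2]; this is an illustration, not the slides' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def bridge_estimator(tilde_pi1, tilde_pi2, draws1, draws2):
    """Bridge sampling estimate of B12 with alpha ∝ 1/(n1*tilde_pi1 + n2*tilde_pi2)  (sketch)."""
    n1, n2 = len(draws1), len(draws2)
    alpha = lambda t: 1.0 / (n1 * tilde_pi1(t) + n2 * tilde_pi2(t))
    num = np.mean(tilde_pi1(draws2) * alpha(draws2))   # draws from pi2, weighted by tilde_pi1 * alpha
    den = np.mean(tilde_pi2(draws1) * alpha(draws1))   # draws from pi1, weighted by tilde_pi2 * alpha
    return num / den

# same assumed toy pair as above: B12 = sqrt(2*pi)/sqrt(8*pi) = 0.5
tilde_pi1 = lambda t: np.exp(-0.5 * t ** 2)
tilde_pi2 = lambda t: np.exp(-t ** 2 / 8.0)
draws1 = rng.normal(0.0, 1.0, size=100_000)            # exact draws from the normalised pi1
draws2 = rng.normal(0.0, 2.0, size=100_000)            # exact draws from the normalised pi2
print(bridge_estimator(tilde_pi1, tilde_pi2, draws1, draws2))   # ~0.5
```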
14. On some computational methods for Bayesian model choice
Importance sampling solutions
Harmonic means
Approximating Z from a posterior sample
Use of the identity
$$ \mathbb{E}^{\pi}\!\left[ \left. \frac{\varphi(\theta)}{\pi(\theta)\,L(\theta)} \,\right|\, x \right] = \int \frac{\varphi(\theta)}{\pi(\theta)\,L(\theta)} \, \frac{\pi(\theta)\,L(\theta)}{Z} \,\mathrm{d}\theta = \frac{1}{Z} $$
no matter what the proposal ϕ(θ) is.
[Gelfand & Dey, 1994; Bartolucci et al., 2006]
Direct exploitation of MCMC output
RB-RJ
16. On some computational methods for Bayesian model choice
Importance sampling solutions
Harmonic means
Comparison with regular importance sampling
Harmonic mean: constraint opposed to the usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than π(θ)L(θ) for the approximation
$$ Z_1 = 1 \bigg/ \frac{1}{T} \sum_{t=1}^{T} \frac{\varphi(\theta^{(t)})}{\pi(\theta^{(t)})\,L(\theta^{(t)})} $$
to have a finite variance.
E.g., use finite-support kernels (like Epanechnikov’s kernel) for ϕ
18. On some computational methods for Bayesian model choice
Importance sampling solutions
Harmonic means
Comparison with regular importance sampling (cont'd)
Compare \hat Z_1 with a standard importance sampling approximation
   \hat Z_2 = \frac{1}{T}\sum_{t=1}^{T} \frac{\pi(\theta^{(t)})\,L(\theta^{(t)})}{\varphi(\theta^{(t)})}
where the \theta^{(t)}'s are generated from the density \varphi(\theta) (with fatter tails, like t's)
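To make the contrast concrete, here is an illustrative sketch (not from the slides) computing both \hat Z_1 and \hat Z_2 on a conjugate toy model x_j ~ N(theta, 1), theta ~ N(0, 1), where Z is known in closed form: \hat Z_1 uses a finite-support uniform \varphi inside the posterior bulk, \hat Z_2 a fat-tailed t proposal. All settings are assumptions made for illustration.

import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(1)
x = rng.normal(1.0, 1.0, size=30)          # synthetic data
n, xbar = x.size, x.mean()

def log_prior_lik(theta):                  # log{pi(theta) L(theta)} for theta ~ N(0,1), x_j ~ N(theta,1)
    return (stats.norm.logpdf(theta, 0.0, 1.0)
            + stats.norm.logpdf(x[:, None], theta, 1.0).sum(axis=0))

# exact posterior N(m, s2) and exact evidence, used only as a reference
s2 = 1.0 / (1.0 + n)
m = s2 * n * xbar
logZ_exact = stats.multivariate_normal.logpdf(x, mean=np.zeros(n), cov=np.eye(n) + 1.0)

T = 20_000
# --- Z1 hat: posterior sample plus a finite-support (hence lighter-tailed) phi
theta_post = rng.normal(m, np.sqrt(s2), size=T)
lo, hi = m - np.sqrt(s2), m + np.sqrt(s2)              # support inside the posterior bulk
log_phi = np.where((theta_post > lo) & (theta_post < hi), -np.log(hi - lo), -np.inf)
logZ1 = np.log(T) - logsumexp(log_phi - log_prior_lik(theta_post))

# --- Z2 hat: regular importance sampling with a fat-tailed t proposal
prop = stats.t(df=3, loc=m, scale=2.0 * np.sqrt(s2))
theta_is = prop.rvs(size=T, random_state=rng)
logZ2 = logsumexp(log_prior_lik(theta_is) - prop.logpdf(theta_is)) - np.log(T)

print(logZ_exact, logZ1, logZ2)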
Importance sampling solutions
Harmonic means
Approximating Z using a mixture representation
Bridge sampling redux
Design a specific mixture for simulation [importance sampling] purposes, with density
   \tilde\varphi(\theta) \propto \omega_1\,\pi(\theta)L(\theta) + \varphi(\theta) ,
where \varphi(\theta) is arbitrary (but normalised)
Note: \omega_1 is not a probability weight
Importance sampling solutions
Harmonic means
Approximating Z using a mixture representation (cont'd)
Corresponding MCMC (= Gibbs) sampler
At iteration t
1. Take \delta^{(t)} = 1 with probability
      \omega_1\,\pi(\theta^{(t-1)})L(\theta^{(t-1)}) \big/ \left\{ \omega_1\,\pi(\theta^{(t-1)})L(\theta^{(t-1)}) + \varphi(\theta^{(t-1)}) \right\}
   and \delta^{(t)} = 2 otherwise;
2. If \delta^{(t)} = 1, generate \theta^{(t)} \sim \mathrm{MCMC}(\theta^{(t-1)}, \theta^{(t)}), where \mathrm{MCMC}(\theta, \theta') denotes an arbitrary MCMC kernel associated with the posterior \pi(\theta \mid x) \propto \pi(\theta)L(\theta);
3. If \delta^{(t)} = 2, generate \theta^{(t)} \sim \varphi(\theta) independently
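An illustrative implementation of this sampler on the same toy conjugate model as above, assuming a random-walk Metropolis kernel for the \delta = 1 step and a normal \varphi; the heuristic choice of \omega_1 (balancing the two mixture components at the posterior mode) is an assumption made here, not part of the slides.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, size=30)

def log_prior_lik(theta):                        # log{pi(theta) L(theta)}, unnormalised posterior
    return stats.norm.logpdf(theta, 0, 1) + stats.norm.logpdf(x, theta, 1).sum()

phi = stats.norm(loc=x.mean(), scale=1.0)        # arbitrary but normalised phi
mode = x.mean() * x.size / (x.size + 1.0)        # posterior mode of the toy model
omega1 = phi.pdf(mode) / np.exp(log_prior_lik(mode))   # heuristic: balance the two components

def mixture_gibbs(T=20_000, rw_scale=0.5):
    theta = np.empty(T)
    delta = np.empty(T, dtype=int)
    theta[0], delta[0] = x.mean(), 1
    for t in range(1, T):
        cur = theta[t - 1]
        w_post = omega1 * np.exp(log_prior_lik(cur))
        p1 = w_post / (w_post + phi.pdf(cur))
        if rng.uniform() < p1:                   # delta = 1: MCMC step targeting pi(theta|x)
            prop = cur + rw_scale * rng.normal()
            if np.log(rng.uniform()) < log_prior_lik(prop) - log_prior_lik(cur):
                cur = prop
            theta[t], delta[t] = cur, 1
        else:                                    # delta = 2: independent draw from phi
            theta[t], delta[t] = phi.rvs(random_state=rng), 2
    return theta, delta

theta_sim, delta_sim = mixture_gibbs()
print("fraction of delta = 1 moves:", (delta_sim == 1).mean())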
Importance sampling solutions
Harmonic means
Evidence approximation by mixtures
Rao-Blackwellised estimate
   \hat\xi = \frac{1}{T}\sum_{t=1}^{T} \frac{\omega_1\,\pi(\theta^{(t)})L(\theta^{(t)})}{\omega_1\,\pi(\theta^{(t)})L(\theta^{(t)}) + \varphi(\theta^{(t)})} ,
which converges to \omega_1 Z / \{\omega_1 Z + 1\}
Deduce \hat Z_3 from \omega_1 \hat Z_3 / \{\omega_1 \hat Z_3 + 1\} = \hat\xi, i.e.
   \hat Z_3 = \frac{\sum_{t=1}^{T} \omega_1\,\pi(\theta^{(t)})L(\theta^{(t)}) \big/ \{\omega_1\,\pi(\theta^{(t)})L(\theta^{(t)}) + \varphi(\theta^{(t)})\}}{\omega_1 \sum_{t=1}^{T} \varphi(\theta^{(t)}) \big/ \{\omega_1\,\pi(\theta^{(t)})L(\theta^{(t)}) + \varphi(\theta^{(t)})\}}
[Bridge sampler]
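A corresponding sketch of \hat Z_3, written as a function of the simulated \theta^{(t)} (for instance the output of the previous mixture sampler); the inputs are log{\pi(\theta^{(t)})L(\theta^{(t)})} and log \varphi(\theta^{(t)}), and the arithmetic is carried out on the log scale. The function name is an illustration, not a fixed API.

import numpy as np

def evidence_mixture(log_post_u, log_phi_vals, omega1):
    """Z3 hat from the identity omega1*Z3/(omega1*Z3 + 1) = xi hat.
       log_post_u[t]  = log{pi(theta^(t)) L(theta^(t))}
       log_phi_vals[t] = log phi(theta^(t))."""
    log_w_post = np.log(omega1) + np.asarray(log_post_u)
    log_phi_vals = np.asarray(log_phi_vals)
    log_denom = np.logaddexp(log_w_post, log_phi_vals)         # log(omega1*pi*L + phi)
    num = np.exp(log_w_post - log_denom).sum()                  # sum of omega1*pi*L/(...)
    den = np.exp(log_phi_vals - log_denom).sum()                # sum of phi/(...)
    return num / (omega1 * den)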
Importance sampling solutions
Chib's solution
Chib's representation
Direct application of Bayes' theorem: given x \sim f_k(x \mid \theta_k) and \theta_k \sim \pi_k(\theta_k),
   m_k(x) = \frac{f_k(x \mid \theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k \mid x)} ,
which holds for any value of \theta_k.
Use of an approximation to the posterior:
   \hat m_k(x) = \frac{f_k(x \mid \theta_k^*)\,\pi_k(\theta_k^*)}{\hat\pi_k(\theta_k^* \mid x)} .
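As a sanity check of the representation, the following sketch (not from the slides) evaluates Chib's identity on a conjugate Beta-Binomial model, where \pi_k(\theta_k^* \mid x) is available exactly and the result can be compared with the closed-form marginal; all numbers are illustrative.

import numpy as np
from scipy import stats
from scipy.special import betaln, comb

a, b, n, x = 2.0, 2.0, 20, 14                     # prior Beta(a, b), x successes out of n
theta_star = (a + x) / (a + b + n)                # any theta* works; a high-density point is preferred

log_m_chib = (stats.binom.logpmf(x, n, theta_star)                # log f(x|theta*)
              + stats.beta.logpdf(theta_star, a, b)               # + log pi(theta*)
              - stats.beta.logpdf(theta_star, a + x, b + n - x))  # - log pi(theta*|x)

log_m_exact = np.log(comb(n, x)) + betaln(a + x, b + n - x) - betaln(a, b)
print(log_m_chib, log_m_exact)                    # identical up to floating-point error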
Importance sampling solutions
Chib's solution
Case of latent variables
For missing variables z, as in mixture models, a natural Rao-Blackwell estimate is
   \hat\pi_k(\theta_k^* \mid x) = \frac{1}{T}\sum_{t=1}^{T} \pi_k(\theta_k^* \mid x, z_k^{(t)}) ,
where the z_k^{(t)}'s are the latent variables simulated by a Gibbs sampler.
Importance sampling solutions
Chib's solution
Compensation for label switching
For mixture models, the z_k^{(t)}'s usually fail to visit all configurations in a
balanced way, despite the symmetry predicted by the theory:
   \pi_k(\theta_k \mid x) = \pi_k(\sigma(\theta_k) \mid x) = \frac{1}{k!}\sum_{\sigma\in\mathfrak{S}_k} \pi_k(\sigma(\theta_k) \mid x)
for all \sigma's in \mathfrak{S}_k, the set of all permutations of \{1, \ldots, k\}.
Consequence: the numerical approximation is biased, by a factor of order k!
Recover the theoretical symmetry by using
   \tilde\pi_k(\theta_k^* \mid x) = \frac{1}{T\,k!}\sum_{\sigma\in\mathfrak{S}_k}\sum_{t=1}^{T} \pi_k(\sigma(\theta_k^*) \mid x, z_k^{(t)}) .
[Berkhof, van Mechelen & Gelman, 2003]
Cross-model solutions
Reversible jump
Reversible jump
Idea: set up a proper measure-theoretic framework for designing moves between models M_k
[Green, 1995]
Create a reversible kernel K on H = \bigcup_k \{k\} \times \Theta_k such that
   \int_A \int_B K(x, \mathrm{d}y)\,\pi(x)\,\mathrm{d}x = \int_B \int_A K(y, \mathrm{d}x)\,\pi(y)\,\mathrm{d}y
for the invariant density \pi  [x is of the form (k, \theta^{(k)})]
Cross-model solutions
Reversible jump
Local moves
For a move between two models, M_1 and M_2, the Markov chain being in state
\theta_1 \in M_1, denote by K_{1\to 2}(\theta_1, \mathrm{d}\theta) and K_{2\to 1}(\theta_2, \mathrm{d}\theta)
the corresponding kernels, under the detailed balance condition
   \pi(\mathrm{d}\theta_1)\,K_{1\to 2}(\theta_1, \mathrm{d}\theta) = \pi(\mathrm{d}\theta_2)\,K_{2\to 1}(\theta_2, \mathrm{d}\theta) ,
and take, without loss of generality, \dim(M_2) > \dim(M_1).
The proposal is expressed as
   \theta_2 = \Psi_{1\to 2}(\theta_1, v_{1\to 2})
where v_{1\to 2} is a random variable of dimension \dim(M_2) - \dim(M_1), generated as
   v_{1\to 2} \sim \varphi_{1\to 2}(v_{1\to 2}) .
Cross-model solutions
Reversible jump
Local moves (2)
In this case, q_{1\to 2}(\theta_1, \mathrm{d}\theta_2) has density
   \varphi_{1\to 2}(v_{1\to 2}) \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right|^{-1} ,
by the Jacobian rule.
If \varpi_{1\to 2} is the probability of choosing a move to M_2 while in M_1, the
acceptance probability reduces to
   \alpha(\theta_1, v_{1\to 2}) = 1 \wedge \frac{\pi(M_2, \theta_2)\,\varpi_{2\to 1}}{\pi(M_1, \theta_1)\,\varpi_{1\to 2}\,\varphi_{1\to 2}(v_{1\to 2})} \left| \frac{\partial \Psi_{1\to 2}(\theta_1, v_{1\to 2})}{\partial(\theta_1, v_{1\to 2})} \right| .
Caution: difficult calibration
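A toy reversible-jump sketch, not taken from the slides: it moves between M1: y ~ N(theta, 1) and M2: y ~ N(theta1, exp(theta2)), using the identity completion Psi_{1->2}(theta1, v) = (theta1, v) with v ~ N(0, 1), so the Jacobian equals 1 and the acceptance probability reduces to the expression above. Priors, proposal scales and move probabilities are illustrative assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.normal(0.5, 1.3, size=40)

def log_joint(k, theta):
    """log pi_k(theta_k) f_k(y|theta_k); equal model weights cancel and are omitted."""
    if k == 1:
        (mu,) = theta
        return stats.norm.logpdf(mu, 0, 10) + stats.norm.logpdf(y, mu, 1.0).sum()
    mu, log_var = theta
    return (stats.norm.logpdf(mu, 0, 10) + stats.norm.logpdf(log_var, 0, 1)
            + stats.norm.logpdf(y, mu, np.exp(0.5 * log_var)).sum())

def rj_sampler(T=20_000, rw=0.2, p_jump=0.5):
    k, theta = 1, np.array([y.mean()])
    ks = np.empty(T, dtype=int)
    for t in range(T):
        if rng.uniform() < p_jump:                     # between-model move
            if k == 1:                                 # 1 -> 2: draw the completion v
                v = rng.normal()
                prop = np.array([theta[0], v])
                # varpi ratio = 1 (symmetric jump probabilities), |Jacobian| = 1
                log_alpha = log_joint(2, prop) - log_joint(1, theta) - stats.norm.logpdf(v)
                if np.log(rng.uniform()) < log_alpha:
                    k, theta = 2, prop
            else:                                      # 2 -> 1: drop the extra component
                prop = theta[:1]
                log_alpha = (log_joint(1, prop) + stats.norm.logpdf(theta[1])
                             - log_joint(2, theta))
                if np.log(rng.uniform()) < log_alpha:
                    k, theta = 1, prop
        else:                                          # within-model random-walk move
            prop = theta + rw * rng.normal(size=theta.size)
            if np.log(rng.uniform()) < log_joint(k, prop) - log_joint(k, theta):
                theta = prop
        ks[t] = k
    return ks

ks = rj_sampler()
print("estimated pi(M = 2 | y):", (ks == 2).mean())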
Cross-model solutions
Saturation schemes
Alternative
Saturation of the parameter space H = \bigcup_k \{k\} \times \Theta_k by creating
a model index M and
pseudo-priors \pi_j(\theta_j \mid M = k) for j \neq k
[Carlin & Chib, 1995]
Validation by
   \pi(M = k \mid y) = \int P(M = k \mid y, \theta)\,\pi(\theta \mid y)\,\mathrm{d}\theta = Z_k
where the (marginal) posterior is
   \pi(\theta \mid y) = \sum_{k=1}^{D} \pi(\theta, M = k \mid y) = \sum_{k=1}^{D} \varrho_k\, m_k(y)\, \pi_k(\theta_k \mid y) \prod_{j\neq k} \pi_j(\theta_j \mid M = k) .
Cross-model solutions
Saturation schemes
MCMC implementation
Run a Markov chain (M^{(t)}, \theta_1^{(t)}, \ldots, \theta_D^{(t)}) with stationary
distribution \pi(\theta, M = k \mid y) by
1. Pick M^{(t)} = k with probability P(M = k \mid y, \theta^{(t-1)})
2. Generate \theta_k^{(t)} from the posterior \pi_k(\theta_k \mid y) [or by an MCMC step]
3. Generate \theta_j^{(t)} (j \neq k) from the pseudo-prior \pi_j(\theta_j \mid M = k)
Approximate \pi(M = k \mid y) = Z_k by
   \check\varrho_k(y) \propto \varrho_k \sum_{t=1}^{T} \frac{f_k(y \mid \theta_k^{(t)})\,\pi_k(\theta_k^{(t)}) \prod_{j\neq k} \pi_j(\theta_j^{(t)} \mid M = k)}{\sum_{\ell=1}^{D} \varrho_\ell\, f_\ell(y \mid \theta_\ell^{(t)})\,\pi_\ell(\theta_\ell^{(t)}) \prod_{j\neq \ell} \pi_j(\theta_j^{(t)} \mid M = \ell)}
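An illustrative sketch of this scheme for two small conjugate models, with the pseudo-priors simply set equal to the priors (a valid if inefficient choice; a pseudo-prior close to each model's posterior mixes much better). The Rao-Blackwellised accumulation of the conditional weights corresponds to the approximation above; data, priors and weights are all assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
y = rng.exponential(scale=1.5, size=30)          # synthetic positive data
n, sy, slog = y.size, y.sum(), np.log(y).sum()
rho = np.array([0.5, 0.5])                       # prior model probabilities

def log_f(k, th):                                # log f_k(y | theta_k)
    if k == 0:                                   # M1: y_i ~ Exp(lambda)
        return stats.expon.logpdf(y, scale=1.0 / th).sum()
    return stats.norm.logpdf(np.log(y), th, 1).sum() - slog   # M2: log y_i ~ N(mu, 1)

log_prior = [lambda th: stats.gamma.logpdf(th, 2, scale=0.5),  # lambda ~ Gamma(2, 2)
             lambda th: stats.norm.logpdf(th, 0, 1)]           # mu ~ N(0, 1)

def draw_post(k):                                # conjugate posterior draw for theta_k | y
    if k == 0:
        return rng.gamma(2 + n, 1.0 / (2 + sy))
    s2 = 1.0 / (1 + n)
    return rng.normal(s2 * slog, np.sqrt(s2))

def draw_pseudo(k):                              # pseudo-prior = prior in this sketch
    return rng.gamma(2, 0.5) if k == 0 else rng.normal(0, 1)

T = 10_000
theta = np.array([draw_pseudo(0), draw_pseudo(1)])
probs = np.zeros(2)
for t in range(T):
    # full conditional P(M = k | y, theta): model-k likelihood and prior of theta_k,
    # times the pseudo-prior of the other component
    logw = np.array([np.log(rho[k]) + log_f(k, theta[k]) + log_prior[k](theta[k])
                     + log_prior[1 - k](theta[1 - k]) for k in (0, 1)])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    k = rng.choice(2, p=w)
    theta[k] = draw_post(k)                      # theta_k | y from its posterior
    theta[1 - k] = draw_pseudo(1 - k)            # theta_j, j != k, from its pseudo-prior
    probs += w                                   # Rao-Blackwellised accumulation
print("estimated (pi(M=1|y), pi(M=2|y)):", probs / T)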
Cross-model solutions
Implementation error
Scott's (2002) proposal
Suggests estimating P(M = k \mid y) by
   \tilde\varrho_k(y) \propto \varrho_k \sum_{t=1}^{T} \frac{f_k(y \mid \theta_k^{(t)})}{\sum_{j=1}^{D} \varrho_j\, f_j(y \mid \theta_j^{(t)})} ,
based on D simultaneous and independent MCMC chains
   (\theta_k^{(t)})_t , \qquad 1 \le k \le D ,
with stationary distributions \pi_k(\theta_k \mid y)  [instead of the above joint]
Cross-model solutions
Implementation error
Congdon's (2006) extension
Selecting flat [hence improper, prohibited!] pseudo-priors, Congdon (2006) uses instead
   \hat\varrho_k(y) \propto \varrho_k \sum_{t=1}^{T} \frac{f_k(y \mid \theta_k^{(t)})\,\pi_k(\theta_k^{(t)})}{\sum_{j=1}^{D} \varrho_j\, f_j(y \mid \theta_j^{(t)})\,\pi_j(\theta_j^{(t)})} ,
where again the \theta_k^{(t)}'s are MCMC chains with stationary distributions \pi_k(\theta_k \mid y)
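For concreteness, a small sketch of the two within-chain approximations: both take D independent per-model chains and average per-sweep model weights, using f_k alone (Scott) or f_k \pi_k (Congdon). The input arrays and function names are assumptions made for illustration.

import numpy as np
from scipy.special import logsumexp

def scott_estimate(log_lik, rho):
    """log_lik[k, t] = log f_k(y | theta_k^(t)) from D independent chains; rho[k] = prior weight."""
    logw = np.log(np.asarray(rho))[:, None] + np.asarray(log_lik)
    logw -= logsumexp(logw, axis=0)              # normalise across models at each sweep t
    probs = np.exp(logw).sum(axis=1)             # accumulate the per-sweep weights
    return probs / probs.sum()

def congdon_estimate(log_lik, log_prior, rho):
    """Same construction with f_k pi_k in place of f_k (flat pseudo-priors dropped)."""
    return scott_estimate(np.asarray(log_lik) + np.asarray(log_prior), rho)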
Cross-model solutions
Implementation error
Examples
Example (Model choice)
Model M_1: y \mid \theta \sim \mathcal{U}(0, \theta) with prior \theta \sim \mathrm{Exp}(1), versus model
M_2: y \mid \theta \sim \mathrm{Exp}(\theta) with prior \theta \sim \mathrm{Exp}(1). Equal prior weights on
both models: \varrho_1 = \varrho_2 = 0.5.
[Figure: approximations of \pi(M = 1 \mid y) by Scott's (2002) estimate (green) and
Congdon's (2006) estimate (brown), based on N = 10^6 simulations.]
Cross-model solutions
Implementation error
Examples (2)
Example (Model choice (2))
Normal model M_1: y \sim \mathcal{N}(\theta, 1) with \theta \sim \mathcal{N}(0, 1) vs. normal
model M_2: y \sim \mathcal{N}(\theta, 1) with \theta \sim \mathcal{N}(5, 1)
[Figure: comparison of both approximations with \pi(M = 1 \mid y): Scott's (2002)
(green, mixed dashes) and Congdon's (2006) (brown, long dashes), N = 10^4 simulations.]
Cross-model solutions
Implementation error
Examples (3)
Example (Model choice (3))
Model M_1: y \sim \mathcal{N}(0, 1/\omega) with \omega \sim \mathrm{Exp}(a) vs.
M_2: \exp(y) \sim \mathrm{Exp}(\lambda) with \lambda \sim \mathrm{Exp}(b).
[Figure: comparison of Congdon's (2006) approximation (brown, dashed lines) with
\pi(M = 1 \mid y) when (a, b) equals (.24, 8.9), (.56, .7), (4.1, .46) and (.98, .081),
respectively; N = 10^4 simulations.]
Nested sampling
Purpose
Nested sampling: Goal
Skilling's (2007) technique, using the one-dimensional representation:
   Z = \mathbb{E}^\pi[L(\theta)] = \int_0^1 \varphi(x)\,\mathrm{d}x
with
   \varphi^{-1}(l) = P^\pi(L(\theta) > l) .
Note: \varphi(\cdot) is intractable in most cases.
Nested sampling
Implementation
Nested sampling: First approximation
Approximate Z by a Riemann sum:
   Z \approx \sum_{i=1}^{j} (x_{i-1} - x_i)\,\varphi(x_i)
where the x_i's are either
   deterministic:  x_i = e^{-i/N} ,
or random:
   x_0 = 1, \quad x_{i+1} = t_i\, x_i , \quad t_i \sim \mathrm{Be}(N, 1) ,
so that E[\log x_i] = -i/N.
Nested sampling
Implementation
Extraneous white noise
Take
   Z = \int e^{-\theta}\,\mathrm{d}\theta = \int \delta^{-1} e^{-(1-\delta)\theta}\,\delta e^{-\delta\theta}\,\mathrm{d}\theta = \mathbb{E}_\delta\!\left[\delta^{-1} e^{-(1-\delta)\theta}\right]
   \hat Z = \frac{1}{N}\sum_{i=1}^{N} \delta^{-1} e^{-(1-\delta)\theta_i}(x_{i-1} - x_i) , \qquad \theta_i \sim \mathcal{E}(\delta)\,\mathbb{I}(\theta_i \le \theta_{i-1})
Comparison of variances and MSEs (two entries per N):
   N     deterministic   random
   50    4.64            10.5
         4.65            10.5
   100   2.47            4.9
         2.48            5.02
   500   0.549           1.01
         0.550           1.14
Nested sampling
Implementation
Nested sampling: Second approximation
Replace the (intractable) \varphi(x_i) by \varphi_i, obtained by
Nested sampling
Start with N values \theta_1, \ldots, \theta_N sampled from \pi
At iteration i,
1. take \varphi_i = L(\theta_k), where \theta_k is the point with the smallest likelihood in the current pool of \theta's;
2. replace \theta_k with a sample from the prior constrained to L(\theta) > \varphi_i: the current N points are then all sampled from the prior constrained to L(\theta) > \varphi_i.
Nested sampling
Implementation
Nested sampling: Third approximation
Iterate the above steps until a given stopping iteration j is reached: e.g.,
observe very small changes in the approximation \hat Z;
reach the maximal value of L(\theta), when the likelihood is bounded and its maximum is known;
truncate the integral Z at level \epsilon, i.e. replace
   \int_0^1 \varphi(x)\,\mathrm{d}x \quad\text{with}\quad \int_\epsilon^1 \varphi(x)\,\mathrm{d}x
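Putting the three approximations together, here is an illustrative nested-sampling sketch (not from the slides) on a two-dimensional Gaussian-Gaussian toy model where Z is known in closed form. The constrained-prior step is done by naive rejection from the prior, which is only viable because the example is tiny; the pool size and truncation level are illustrative.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
d, sigma = 2, 0.7
y = np.array([0.8, -0.4])                         # a single d-dimensional observation

def log_L(theta):                                 # log N(y; theta, sigma^2 I)
    return stats.norm.logpdf(y, theta, sigma).sum(axis=-1)

def nested_sampling(N=100, eps=1e-3):
    pool = rng.normal(size=(N, d))                # N particles from the prior N(0, I)
    logLs = log_L(pool)
    j_max = int(np.ceil(N * (-np.log(eps))))      # stop once x_j = exp(-j/N) < eps
    Z, x_prev = 0.0, 1.0
    for i in range(1, j_max + 1):
        k = np.argmin(logLs)                      # worst particle: phi_i = L(theta_k)
        x_i = np.exp(-i / N)
        Z += (x_prev - x_i) * np.exp(logLs[k])
        x_prev = x_i
        while True:                               # replace theta_k by a prior draw with L > phi_i
            cand = rng.normal(size=d)             # naive rejection: fine only for this toy
            if log_L(cand) > logLs[k]:
                pool[k], logLs[k] = cand, log_L(cand)
                break
    return Z

logZ_exact = stats.norm.logpdf(y, 0.0, np.sqrt(1 + sigma**2)).sum()
print(nested_sampling(), np.exp(logZ_exact))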
Nested sampling
Error rates
Approximation error
   \text{Error} = \hat Z - Z
   = \sum_{i=1}^{j} (x_{i-1} - x_i)\,\varphi_i - \int_0^1 \varphi(x)\,\mathrm{d}x
   = -\int_0^\epsilon \varphi(x)\,\mathrm{d}x
   \; + \; \left[ \sum_{i=1}^{j} (x_{i-1} - x_i)\,\varphi(x_i) - \int_\epsilon^1 \varphi(x)\,\mathrm{d}x \right]  (Quadrature Error)
   \; + \; \sum_{i=1}^{j} (x_{i-1} - x_i)\,\{\varphi_i - \varphi(x_i)\}  (Stochastic Error)
[Dominated by the Monte Carlo (stochastic) error!]
Nested sampling
Error rates
A CLT for the Stochastic Error
The (dominating) stochastic error is O_P(N^{-1/2}):
   N^{1/2}\,\{\text{Stochastic Error}\} \xrightarrow{D} \mathcal{N}(0, V)
with
   V = -\int_{s,t\in[\epsilon,1]} s\,\varphi'(s)\, t\,\varphi'(t)\, \log(s \vee t)\,\mathrm{d}s\,\mathrm{d}t .
[Proof based on Donsker's theorem]
The number of simulated points equals the number of iterations j, and is a multiple
of N: if one stops at the first iteration j such that e^{-j/N} < \epsilon, then
j = N \lceil -\log \epsilon \rceil.
Nested sampling
Impact of dimension
Curse of dimension
For a simple Gaussian-Gaussian model of dimension dim(\theta) = d, the following
three quantities are O(d):
1. the asymptotic variance of the NS estimator;
2. the number of iterations (necessary to reach a given truncation error);
3. the cost of one simulated sample.
Therefore, the CPU time necessary to achieve error level e is O(d^3 / e^2).
Nested sampling
Constraints
Sampling from constrained priors
Exact simulation from the constrained prior is intractable in most cases!
Skilling (2007) proposes to use MCMC instead, but:
this introduces a bias (stopping rule);
if the MCMC stationary distribution is the unconstrained prior, it becomes more and
more difficult to sample points such that L(\theta) > l as l increases.
If exact constrained simulation is implementable, then a slice sampler can be devised
at the same cost!
Nested sampling
Constraints
Illustration of MCMC bias
[Figure: log-relative error (left) and average number of iterations (right) against
the dimension d, for a Gaussian-Gaussian model with d parameters, when using T = 10
iterations of the Gibbs sampler.]
Nested sampling
Importance variant
An IS variant of nested sampling
Consider an instrumental prior \tilde\pi and likelihood \tilde L, the weight function
   w(\theta) = \frac{\pi(\theta)\,L(\theta)}{\tilde\pi(\theta)\,\tilde L(\theta)}
and the weighted NS estimator
   \hat Z = \sum_{i=1}^{j} (x_{i-1} - x_i)\,\varphi_i\, w(\theta_i) .
Then choose (\tilde\pi, \tilde L) so that sampling from \tilde\pi constrained to
\tilde L(\theta) > l is easy; e.g. \mathcal{N}(c, I_d) constrained to \|c - \theta\| < r.
Mixture example
Benchmark: Target distribution
Posterior distribution on (\mu, \sigma) associated with the mixture
   p\,\mathcal{N}(0, 1) + (1 - p)\,\mathcal{N}(\mu, \sigma) ,
when p is known
Mixture example
Experiment
n observations with \mu = 2 and \sigma = 3/2;
use of a uniform prior both on (-2, 6) for \mu and on (.001, 16) for \log\sigma^2;
occurrences of posterior bursts for \mu = x_i;
computation of the various estimates of Z
Mixture example
Experiment (cont'd)
[Figure: nested sampling sequence with M = 1000 starting points (left) and MCMC
sample (right), for n = 16 observations from the mixture.]
[Figure: nested sampling sequence with M = 1000 starting points (left) and MCMC
sample (right), for n = 50 observations from the mixture.]
Mixture example
Comparison
Monte Carlo and MCMC (= Gibbs) outputs based on T = 10^4 simulations, and numerical
integration based on a 850 × 950 grid in the (\mu, \sigma) parameter space.
Nested sampling approximation based on a starting sample of M = 1000 points, followed
by at least 10^3 further simulations from the constrained prior, and a stopping rule
at 95% of the observed maximum likelihood.
Constrained-prior simulation based on 50 values simulated by random walk, accepting
only steps leading to a likelihood higher than the bound.
Mixture example
Comparison (cont'd)
[Figure: graph based on a sample of 10 observations, for \mu = 2 and \sigma = 3/2 (150 replicas).]
[Figure: graph based on a sample of 50 observations, for \mu = 2 and \sigma = 3/2 (150 replicas).]
[Figure: graph based on a sample of 100 observations, for \mu = 2 and \sigma = 3/2 (150 replicas).]
Mixture example
Comparison (cont'd)
Nested sampling gets less reliable as the sample size increases.
The most reliable approach is the mixture estimate \hat Z_3, although the harmonic
solution \hat Z_1 remains close to Chib's solution [taken as the gold standard].
The regular Monte Carlo method \hat Z_2 also produces poor approximations of Z.
(The kernel \varphi used in \hat Z_2 is a t nonparametric kernel estimate with
standard bandwidth estimation.)