This document provides an introduction to advanced Markov chain Monte Carlo (MCMC) methods. It begins with a motivating example using mixture models that have latent variables, making the likelihood intractable. This introduces challenges for Bayesian computation. The document then describes the Metropolis-Hastings algorithm, which allows generating samples from a target distribution using an ergodic Markov chain, even when direct sampling is impossible. Several extensions and properties of the Metropolis-Hastings algorithm are discussed.
This document provides an overview of Markov chain Monte Carlo (MCMC) methods. It begins with motivations for using MCMC, such as computational difficulties that arise in models with latent variables like mixture models. It then discusses likelihood-based and Bayesian approaches, noting limitations of maximum likelihood methods. Conjugate priors are described that allow tractable Bayesian inference for some simple models. However, conjugate priors are not available for more complex models, motivating the use of MCMC methods which can approximate integrals and distributions of interest for more complex models.
This document provides an overview of Markov chain Monte Carlo (MCMC) methods. It begins with motivations for using MCMC, such as dealing with latent variable models where the likelihood function is intractable. It then covers random variable generation techniques before introducing the key MCMC algorithms: the Metropolis-Hastings algorithm and the Gibbs sampler. The document outlines the remaining topics to be covered, which include Monte Carlo integration, notions of Markov chains, and further advanced topics.
Those are the slides for my Master course on Monte Carlo Statistical Methods given in conjunction with the Monte Carlo Statistical Methods book with George Casella.
This document discusses Markov chain Monte Carlo (MCMC) methods. It begins with an outline of the Metropolis-Hastings algorithm, which is a generic MCMC method for obtaining a sequence of random samples from a probability distribution when direct sampling is difficult. The document then provides details on the Metropolis-Hastings algorithm, including its convergence properties. It also discusses the independent Metropolis-Hastings algorithm as a special case and provides an example to illustrate it.
This document discusses computational issues in Bayesian statistics and introduces Markov chain Monte Carlo (MCMC) methods. It covers the following key points:
MCMC methods like the Metropolis-Hastings algorithm and Gibbs sampler are introduced as techniques to sample from posterior distributions when direct calculation is intractable, as is often the case with latent variable models and mixture models. Latent variable and mixture models are provided as examples where the likelihood function contains too many terms to compute directly, necessitating approximate Bayesian computational methods. The autoregressive (AR) model is also presented as involving intractable likelihoods due to the inclusion of multiple lag terms that increase the number of parameters and terms to compute as the lag length grows.
This document discusses Markov chain Monte Carlo (MCMC) and likelihood-free methods. It begins with an introduction to computational issues in Bayesian statistics and an overview of MCMC methods like the Metropolis-Hastings algorithm and Gibbs sampler. Latent variable models are discussed as an example where computation becomes difficult due to integration over latent variables. Mixture models are provided as a specific example where the likelihood involves a prohibitive number of terms for direct computation in large samples.
Monte Carlo Simulations, Sampling and Markov Chain Monte Carlo (Xin-She Yang)
Pseudorandom
The document discusses Monte Carlo methods and Markov chain Monte Carlo (MCMC). It provides examples of using Monte Carlo simulations to estimate pi and solve Buffon's needle problem. It also discusses random walks in Markov chains, the PageRank algorithm used by Google, and challenges with high-dimensional integrals and distributions that do not have a closed-form inverse. MCMC methods are presented as a way to address these challenges.
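The pi estimate mentioned in the summary above can be sketched in a few lines. A minimal illustration (the function name and sample count are our own, not from the deck): sample points uniformly in the unit square and count the fraction landing inside the quarter circle, whose area is pi/4.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi by uniform sampling in the unit square and counting
    the fraction of points inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # (points inside) / (total points) approximates pi/4
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))  # close to 3.14159 for large n
```

The error of this estimator shrinks as 1/sqrt(n), which is exactly the slow convergence that motivates the variance-reduction and MCMC techniques discussed elsewhere on this page.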
The document provides an introduction to Markov Chain Monte Carlo (MCMC) methods. It discusses using MCMC to sample from distributions when direct sampling is difficult. Specifically, it introduces Gibbs sampling and the Metropolis-Hastings algorithm. Gibbs sampling updates variables one at a time based on their conditional distributions. Metropolis-Hastings proposes candidate samples and accepts or rejects them to converge to the target distribution. The document provides examples and outlines the algorithms to construct Markov chains that sample distributions of interest.
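The Gibbs update described above ("one variable at a time from its conditional distribution") can be made concrete with a standard toy example, not taken from the deck itself: a bivariate normal with correlation rho, where each full conditional is N(rho * other, 1 - rho^2).

```python
import random

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Gibbs sampler for a bivariate standard normal with correlation rho.
    Each coordinate is redrawn from its conditional N(rho * other, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)   # draw x | y
        y = rng.gauss(rho * x, sd)   # draw y | x
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(0.8, 50_000)
xs = [s[0] for s in samples[1_000:]]  # discard burn-in
print(sum(xs) / len(xs))  # sample mean of x, near 0
```

Because the conditionals are exact, every draw is accepted; the cost is serial dependence between successive samples, which is why a burn-in portion is discarded.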
This document provides an introduction to Bayesian analysis and Metropolis-Hastings Markov chain Monte Carlo (MCMC). It explains the foundations of Bayesian analysis and how MCMC sampling methods like Metropolis-Hastings can be used to draw samples from posterior distributions that are intractable. The Metropolis-Hastings algorithm works by constructing a Markov chain with the target distribution as its stationary distribution. The document provides an example of using MCMC to perform linear regression in a Bayesian framework.
Why should you care about Markov Chain Monte Carlo methods?
→ They are in the list of "Top 10 Algorithms of 20th Century"
→ They allow you to make inference with Bayesian Networks
→ They are used everywhere in Machine Learning and Statistics
Markov Chain Monte Carlo methods are a class of algorithms used to sample from complicated distributions. Typically, this is the case of posterior distributions in Bayesian Networks (Belief Networks).
These slides cover the following topics.
→ Motivation and Practical Examples (Bayesian Networks)
→ Basic Principles of MCMC
→ Gibbs Sampling
→ Metropolis–Hastings
→ Hamiltonian Monte Carlo
→ Reversible-Jump Markov Chain Monte Carlo
Short course at CIRM, Bayesian Masterclass, October 2018 (Christian Robert)
Markov Chain Monte Carlo (MCMC) methods generate dependent samples from a target distribution using a Markov chain. The Metropolis-Hastings algorithm constructs a Markov chain with a desired stationary distribution by proposing moves to new states and accepting or rejecting them probabilistically. The algorithm is used to approximate integrals that are difficult to compute directly. It has been shown to converge to the target distribution as the number of iterations increases.
The document discusses computational methods for Bayesian statistics when direct simulation from the target distribution is not possible or efficient. It introduces Markov chain Monte Carlo (MCMC) methods, including the Metropolis-Hastings algorithm and Gibbs sampler, which generate dependent samples that approximate the target distribution. The Metropolis-Hastings algorithm uses a proposal distribution to randomly walk through the parameter space. Approximate Bayesian computation (ABC) is also introduced as a method that approximates the posterior distribution when the likelihood is intractable.
International Conference on Monte Carlo techniques
Closing conference of thematic cycle
Paris July 5-8th 2016
Campus les Cordeliers
Jere Koskela's slides
Markov chain Monte Carlo methods and some attempts at parallelizing them (Pierre Jacob)
Markov chain Monte Carlo (MCMC) methods are commonly used to approximate properties of target probability distributions. However, MCMC estimators are generally biased for any fixed number of samples. The document discusses various techniques for constructing unbiased estimators from MCMC output, including regeneration, sequential Monte Carlo samplers, and coupled Markov chains. Specifically, running two Markov chains in parallel and taking the difference in their values at meeting times can yield an unbiased estimator, though certain conditions must hold.
Differential analyses of structures in Hi-C data (tuxette)
When Hi-C matrices are collected from two different conditions, methods can compare the matrices to identify regions with significant structural differences between conditions. TADpole and TADcompare are two available methods. TADpole represents hierarchical TAD structures and detects differences by computing a difference index between normalized binarized matrices. TADcompare represents Hi-C matrices as networks and uses the eigenvectors of the graph Laplacian and gap scores to define boundaries and detect differential boundaries between conditions. Both methods were shown to recover known breakpoints and have boundaries enriched for biological marks.
Slides of Richard Everitt's presentation
This document summarizes a talk given by Heiko Strathmann on using partial posterior paths to estimate expectations from large datasets without full posterior simulation. The key ideas are:
1. Construct a path of "partial posteriors" by sequentially adding mini-batches of data and computing expectations over these posteriors.
2. "Debias" the path of expectations to obtain an unbiased estimator of the true posterior expectation using a technique from stochastic optimization literature.
3. This approach allows estimating posterior expectations with sub-linear computational cost in the number of data points, without requiring full posterior simulation or imposing restrictions on the likelihood.
Experiments on synthetic and real-world examples demonstrate competitive performance versus standard MCMC.
A short and naive introduction to using network in prediction models (tuxette)
The document provides an introduction to using network information in prediction models. It discusses representing a network as a graph with a Laplacian matrix. The Laplacian captures properties like random walks on the graph and heat diffusion. Eigenvectors of the Laplacian related to small eigenvalues are strongly tied to graph structure. The document discusses using the Laplacian in prediction models by working in the feature space defined by the Laplacian eigenvectors or directly regularizing a linear model with the Laplacian. This introduces network information and encourages similar contributions from connected nodes. The approaches are applied to problems like predicting phenotypes from gene expression using a known gene network.
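The Laplacian properties summarized above can be shown with a small sketch, not taken from the slides themselves: the unnormalized Laplacian L = D - A has zero row sums (so constant vectors lie in its null space), and the quadratic form x^T L x equals the sum of squared differences across edges, which is the smoothness penalty used when regularizing a linear model with the Laplacian.

```python
def graph_laplacian(n_nodes, edges):
    """Unnormalized graph Laplacian L = D - A for an undirected graph."""
    L = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        L[i][i] += 1   # degree contributions on the diagonal
        L[j][j] += 1
        L[i][j] -= 1   # minus adjacency off the diagonal
        L[j][i] -= 1
    return L

def smoothness(L, x):
    """x^T L x = sum over edges (i, j) of (x_i - x_j)^2."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2
L = graph_laplacian(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
print([sum(row) for row in L])            # [0, 0, 0, 0]: rows sum to zero
print(smoothness(L, [1.0, 1.0, 1.0, 1.0]))  # 0.0: constant signals unpenalized
```

A signal that varies little across edges gets a small penalty, which is how the regularizer encourages similar contributions from connected nodes.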
From RNN to neural networks for cyclic undirected graphs (tuxette)
This document discusses different neural network methods for processing graph-structured data. It begins by describing recurrent neural networks (RNNs) and their limitations for graphs, such as an inability to handle undirected or cyclic graphs. It then summarizes two alternative approaches: one that uses contraction maps to allow recurrent updates on arbitrary graphs, and one that employs a constructive architecture with frozen neurons to avoid issues with cycles. Both methods aim to make predictions at the node or graph level on relational data like molecules or web pages.
Sequential quasi-Monte Carlo (SQMC) is a quasi-Monte Carlo (QMC) version of sequential Monte Carlo (or particle filtering), a popular class of Monte Carlo techniques used to carry out inference in state space models. In this talk I will first review the SQMC methodology as well as some theoretical results. Although SQMC converges faster than the usual Monte Carlo error rate, its performance deteriorates quickly as the dimension of the hidden variable increases. However, I will show with an example that SQMC may perform well for some "high" dimensional problems. I will conclude this talk with some open problems and potential applications of SQMC in complicated settings.
The Metropolis-Hastings algorithm is an MCMC method for obtaining a sequence of samples from a probability distribution when direct sampling is difficult. It constructs a Markov chain that has the desired target distribution as its stationary distribution. At each step, a candidate sample is generated and either accepted, replacing the current state, or rejected, keeping the current state. The acceptance ratio is determined by the ratio of probabilities of the candidate and current states. The algorithm is a generalization of the Metropolis algorithm that allows for non-symmetric proposal distributions. When the chain satisfies ergodicity conditions, the sample distribution will converge to the target distribution as the number of samples increases.
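The accept/reject step described above can be sketched as a random-walk sampler. This is a minimal illustration with names of our choosing, not code from any of the decks; with a symmetric Gaussian proposal, the acceptance ratio reduces to the ratio of target densities (the Metropolis special case), so the target only needs to be known up to a normalizing constant.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal.
    log_target is the log of the (unnormalized) target density."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_iter):
        candidate = x + rng.gauss(0.0, step)
        # Accept with probability min(1, pi(candidate) / pi(x))
        log_alpha = log_target(candidate) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = candidate        # accept the proposed move
        # otherwise the current state is repeated in the chain
        chain.append(x)
    return chain

# Target: standard normal, known only up to a constant
chain = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 50_000)
kept = chain[5_000:]  # discard burn-in
print(sum(kept) / len(kept))                   # near 0
print(sum(v * v for v in kept) / len(kept))    # near 1
```

Rejected proposals repeat the current state, which is what makes the stationary distribution correct; the chain is dependent, so its effective sample size is smaller than its length.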
The document summarizes a talk given by Mark Girolami on manifold Monte Carlo methods. It discusses using stochastic diffusions and geometric concepts to improve MCMC methods. Specifically, it proposes using discretized Langevin and Hamiltonian diffusions across a Riemann manifold as an adaptive proposal mechanism. This is founded on deterministic geodesic flows on the manifold. Examples presented include a warped bivariate Gaussian, Gaussian mixture model, and log-Gaussian Cox process.
On Meme Self-Adaptation in Spatially-Structured Multimemetic Algorithms (Rafael Nogueras)
This document summarizes a paper that examines meme self-adaptation in spatially-structured multimemetic algorithms. It introduces key concepts like memes, memetic algorithms, and multimemetic algorithms. It then describes the model used, which represents memes as rewriting rules of variable length and uses a spatial structure with neighborhoods. The document outlines the experimental setup, benchmark problems, and presents results showing that the spatially-structured approach finds better solutions and the optimum more often than a panmictic approach.
This document discusses macrocanonical models for texture synthesis. It begins by introducing the goal of texture synthesis and providing a brief history. It then describes the parametric question of combining randomness and structure in images. Specifically, it discusses maximizing entropy under geometric constraints. The document goes on to discuss links to statistical physics, defining microcanonical and macrocanonical models. It focuses on studying the macrocanonical model, describing how to find optimal parameters through gradient descent and how to sample from the model using Langevin dynamics. The document provides examples of texture synthesis and compares results to other methods.
Supervised Planetary Unmixing with Optimal Transport (Sina Nakhostin)
This document presents a supervised planetary unmixing method using optimal transport. It introduces using the Wasserstein distance as a metric for comparing spectral signatures, which is defined over probability distributions and can account for shifts in frequency. The method formulates unmixing as an optimization problem that matches spectra to a dictionary using the Wasserstein distance, while also incorporating an abundance prior. Preliminary experiments on Vesta asteroid data show the abundance maps produced with this optimal transport approach.
1. The document presents Plug-and-Play priors for Bayesian imaging using Langevin-based sampling methods.
2. It introduces the Bayesian framework for image restoration and discusses challenges in modeling the prior.
3. A Plug-and-Play approach is proposed that uses an implicit prior defined by a denoising network in conjunction with Langevin sampling, termed PnP-ULA. Experiments demonstrate its effectiveness on image deblurring and inpainting tasks.
1. Rao-Blackwellisation can be applied to any Hastings-Metropolis algorithm to produce a more efficient Markov chain Monte Carlo (MCMC) method.
2. It works by breaking the state space into two components and analytically integrating over one of the components to reduce variance.
3. This approach takes advantage of parallel computing capabilities like GPUs in a basic way.
This document discusses computational issues that arise in Bayesian statistics. It provides examples of latent variable models like mixture models that make computation difficult due to the large number of terms that must be calculated. It also discusses time series models like the AR(p) and MA(q) models, noting that they have complex parameter spaces due to stationarity constraints. The document outlines the Metropolis-Hastings algorithm, Gibbs sampler, and other methods like Population Monte Carlo and Approximate Bayesian Computation that can help address these computational challenges.
This document provides information about a computational stochastic processes course, including lecture details, prerequisites, syllabus, and examples. The key points are:
- Lectures will cover Monte Carlo simulation, stochastic differential equations, Markov chain Monte Carlo methods, and inference for stochastic processes.
- Prerequisites include probability, stochastic processes, and programming.
- Assessments will include a coursework and exam. The coursework will involve computational problems in Python, Julia, R, or similar languages.
- Motivating examples discussed include using Monte Carlo methods to evaluate high-dimensional integrals and simulating Langevin dynamics in statistical physics.
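The Langevin dynamics mentioned in the last bullet can be sketched with the unadjusted Langevin algorithm (ULA), an Euler discretization of the overdamped Langevin SDE dX = (1/2) grad log pi(X) dt + dW. This is a generic illustration under our own naming, not material from the course itself.

```python
import math
import random

def ula(grad_log_target, x0, n_iter, h=0.1, seed=0):
    """Unadjusted Langevin algorithm: Euler step of the overdamped
    Langevin diffusion whose stationary law is the target pi."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_iter):
        # drift toward high-density regions plus Gaussian noise
        x = x + 0.5 * h * grad_log_target(x) + math.sqrt(h) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain

# Target: standard normal, so grad log pi(x) = -x
chain = ula(lambda x: -x, 0.0, 50_000, h=0.1)
kept = chain[1_000:]
print(sum(kept) / len(kept))  # near 0
```

Because the discretization is never corrected by an accept/reject step, ULA carries an O(h) bias in its stationary distribution; adding a Metropolis-Hastings correction yields the Metropolis-adjusted Langevin algorithm (MALA).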
Subproblem-Tree Calibration: A Unified Approach to Max-Product Message Passin...Varad Meru
Max-product message passing algorithms are commonly used for MAP inference in MRFs. Recent work showed these algorithms can be viewed as performing block coordinate descent in a dual objective. However, existing algorithms are limited by the restricted ways they select blocks to update. The paper proposes a "Subproblem-Tree Calibration" framework that subsumes MPLP, MSD, and TRW-S as special cases and allows more flexible block selection. The algorithm represents the problem as a subproblem multi-graph and calibrates potentials on randomly selected subproblem trees via message passing, achieving dual optimality with respect to the tree's block of variables. Experimental results show the approach converges to different dual objectives than existing methods.
Markov Chain Monte Carlo (MCMC) methods use Markov chains to sample from probability distributions for use in Monte Carlo simulations. The Metropolis-Hastings algorithm proposes transitions to new states in the chain and either accepts or rejects those states based on a probability calculation, allowing it to sample from complex, high-dimensional distributions. The Gibbs sampler is a special case of MCMC where each variable is updated conditional on the current values of the other variables, ensuring all proposed moves are accepted. These MCMC methods allow approximating integrals that are difficult to compute directly.
An investigation of inference of the generalized extreme value distribution b...Alexander Decker
This document presents an investigation of parameter estimation for the generalized extreme value distribution based on record values. Maximum likelihood estimation is used to estimate the parameters β (scale parameter) and ξ (shape parameter). Likelihood equations are derived and solved numerically. Bootstrap and Markov chain Monte Carlo methods are proposed to construct confidence intervals for the parameters since intervals based on asymptotic normality may not perform well due to small sample sizes of records. Bayesian estimation of the parameters using MCMC is also investigated. An illustrative example involving simulated records is provided.
Delayed acceptance for Metropolis-Hastings algorithmsChristian Robert
The document proposes a delayed acceptance method for accelerating Metropolis-Hastings algorithms. It begins with a motivating example of non-informative inference for mixture models where computing the prior density is costly. It then introduces the delayed acceptance approach which splits the acceptance probability into pieces that are evaluated sequentially, avoiding computing the full acceptance ratio each time. It validates that the delayed acceptance chain is reversible and provides bounds on its spectral gap and asymptotic variance compared to the original chain. Finally, it discusses optimizing the delayed acceptance approach by considering the expected square jump distance and cost per iteration to maximize efficiency.
Stability of adaptive random-walk Metropolis algorithmsBigMC
The document discusses adaptive MCMC algorithms and their stability. It introduces the stochastic approximation framework that is commonly used to construct adaptive MCMC algorithms. It then discusses issues with stability as the adaptive parameters are updated, and how enforced stability or adaptive reprojections can help address this. Finally, it provides examples of the adaptive Metropolis algorithm and adaptive scaling Metropolis algorithm, which aim to automatically tune the proposal distribution scale parameter.
This document outlines the key concepts and objectives for understanding vibration of continuous structures like strings and cables. It derives the wave equation for a string under tension as a second order PDE. It then shows how to solve for natural frequencies and mode shapes by separating variables. Examples are given of the first mode shape and calculating the natural frequency of a piano wire. The assignment asks students to solve cable problems with different boundary conditions.
Approximation in Stochastic Integer ProgrammingSSA KPI
This document discusses approximation algorithms for stochastic integer programming problems. It begins by introducing stochastic programming models, including recourse models and hierarchical planning models. It describes the mathematical properties of continuous and mixed-integer recourse models, noting that mixed-integer recourse problems are harder than continuous recourse and most combinatorial optimization problems. The document focuses on studying approximation algorithms for stochastic integer programming that are similar in nature to approximations for combinatorial optimization problems.
Multi Model Ensemble (MME) predictions are a popular ad-hoc technique for improving predictions of high-dimensional, multi-scale dynamical systems. The heuristic idea behind MME framework is simple: given a collection of models, one considers predictions obtained through the convex superposition of the individual probabilistic forecasts in the hope of mitigating model error. However, it is not obvious if this is a viable strategy and which models should be included in the MME forecast in order to achieve the best predictive performance. I will present an information-theoretic approach to this problem which allows for deriving a sufficient condition for improving dynamical predictions within the MME framework; moreover, this formulation gives rise to systematic and practical guidelines for optimising data assimilation techniques which are based on multi-model ensembles. Time permitting, the role and validity of “fluctuation-dissipation” arguments for improving imperfect predictions of externally perturbed non-autonomous systems - with possible applications to climate change considerations - will also be addressed.
1. Rao-Blackwellisation can be applied to any Hastings-Metropolis algorithm to produce a more efficient Markov chain Monte Carlo (MCMC) method.
2. It works by breaking the state space into two components and analytically integrating over one of the components to reduce variance.
3. This approach takes advantage of parallel computing capabilities like GPUs in a basic way.
Stratified sampling and resampling for approximate Bayesian computationUmberto Picchini
Stratified Monte Carlo is proposed as a method to accelerate ABC-MCMC by reducing its computational cost. It involves partitioning the summary statistic space into strata and estimating the ABC likelihood using a stratified Monte Carlo approach based on resampling. This reduces the variance compared to using a single resampled dataset, without introducing significant bias as resampling alone would. The method is tested on a simple Gaussian example where it provides a posterior approximation closer to the true posterior than standard ABC-MCMC.
This document provides an overview of linear models for classification. It discusses discriminant functions including linear discriminant analysis and the perceptron algorithm. It also covers probabilistic generative models that model class-conditional densities and priors to estimate posterior probabilities. Probabilistic discriminative models like logistic regression directly model posterior probabilities using maximum likelihood. Iterative reweighted least squares is used to optimize logistic regression since there is no closed-form solution.
This document summarizes a talk given by Mark Girolami on manifold Monte Carlo methods. The talk discusses using concepts from Riemannian geometry to improve Markov chain Monte Carlo (MCMC) methods. It presents manifold Langevin and Hamiltonian Monte Carlo as methods that use stochastic diffusions and deterministic geodesic flows on a manifold to propose moves in MCMC. Examples applying these methods to warped distributions, Gaussian mixtures, and log-Gaussian Cox processes are also discussed. The goal is to develop more efficient MCMC techniques by exploiting the geometric structure of target distributions.
This document discusses likelihood methods for continuous-time models in finance. It describes approximating the transition density function pX of a continuous-time process through a series of transformations to get closer to a normal distribution. This allows representing pX as a series expansion involving Hermite polynomials. Computing the expansion coefficients allows obtaining an explicit closed-form approximation to pX. Maximizing the approximate likelihood results in an estimator that converges to the true MLE as the number of terms increases.
ABC with data cloning for MLE in state space modelsUmberto Picchini
An application of the "data cloning" method for parameter estimation via MLE aided by Approximate Bayesian Computation. The relevant paper is http://arxiv.org/abs/1505.06318
Characterization of Subsurface Heterogeneity: Integration of Soft and Hard In...Amro Elfeki
Park, E., Elfeki, A. M. M., Dekking, F.M. (2003). Characterization of subsurface heterogeneity: Integration of soft and hard information using multi-dimensional Coupled Markov chain approach. Underground Injection Science and Technology Symposium, Lawrence Berkeley National Lab., October 22-25, 2003. p.49. Eds. Tsang, Chin.-Fu and Apps, John A.
http://www.lbl.gov/Conferences/UIST/index.html#topics
Research internship on optimal stochastic theory with financial application u...Asma Ben Slimene
This is a presntation of my second year intership on optimal stochastic theory and how we can apply it on some financial application then how we can solve such problems using finite differences methods!
Enjoy it !
Similar to Introduction to advanced Monte Carlo methods (20)
This document discusses differentially private distributed Bayesian linear regression with Markov chain Monte Carlo (MCMC) methods. It proposes adding noise to the summaries (S) and coefficients (z) of local linear regression models on different devices to provide differential privacy. Gibbs sampling is used to simulate the genuine posterior distribution over the linear model parameters (theta, sigma_y, Sigma_x, z1:J, S1:J) in a distributed manner while maintaining privacy. Alternative approaches like exploiting approximate posteriors from all devices or learning iteratively are also mentioned.
This document discusses mixture models and approximations to computing model evidence. It contains:
1) An overview of mixtures of distributions and common priors used for mixtures.
2) Approximations to computing marginal likelihoods or model evidence using Chib's representation and Rao-Blackwellization. Permutations are used to address label switching issues.
3) Methods for more efficient sampling for computing model evidence, including iterative bridge sampling and dual importance sampling with approximations to reduce the number of permutations considered.
Sequential Monte Carlo is also briefly mentioned as an alternative approach.
This document describes the adaptive restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the restore process, which takes regenerations from an underlying diffusion or jump process to construct a reversible Markov chain with a target distribution. The adaptive restore process enriches this by allowing the regeneration distribution to adapt over time. It converges almost surely to the minimal regeneration distribution. Parameters like the initial regeneration distribution and rates are discussed. Examples are provided for the adaptive Brownian restore algorithm and calibrating the parameters.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document discusses Bayesian restricted likelihood methods for situations where the likelihood cannot be fully trusted. It presents several approaches including empirical likelihood, Bayesian empirical likelihood, using insufficient statistics, approximate Bayesian computation (ABC), and MCMC on manifolds. The key ideas are developing Bayesian tools that are robust to model misspecification by questioning the likelihood, prior, and other assumptions.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The document discusses various methods for approximating marginal likelihoods and comparing hypotheses using Bayes factors. It outlines the historical development and connections between different approximation techniques.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide semi-automated ABC
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
ABC stands for approximate Bayesian computation. It is a method for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces samples from an approximate posterior distribution by simulating parameter and summary statistic values that match the observed summary statistics within a tolerance level. The choice of summary statistics is important but difficult, as there is typically no sufficient statistic. Several strategies have been developed for selecting good summary statistics, including using random forests or the Lasso to evaluate and select from a large set of potential summaries.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may incur a dimension curse.
1. An introduction to advanced (?) MCMC methods
An introduction to advanced (?) MCMC methods
Christian P. Robert
Université Paris-Dauphine and CREST-INSEE
http://www.ceremade.dauphine.fr/~xian
Royal Statistical Society, October 13, 2010
2. An introduction to advanced (?) MCMC methods
Motivating example
Motivating example
1 Motivating example
2 The Metropolis-Hastings Algorithm
3. An introduction to advanced (?) MCMC methods
Motivating example
Latent structures make life harder!
Even simple models may lead to computational complications, as in latent variable models
f(x|θ) = ∫ f⋆(x, x⋆|θ) dx⋆
If (x, x⋆) is observed, fine!
If only x is observed, trouble!
6. An introduction to advanced (?) MCMC methods
Motivating example
Example (Mixture models)
Models of mixtures of distributions:
X ∼ fj with probability pj,
for j = 1, 2, . . . , k, with overall density
X ∼ p1 f1(x) + · · · + pk fk(x).
For a sample of independent random variables (X1, . . . , Xn), the sample density is
∏_{i=1}^n {p1 f1(xi) + · · · + pk fk(xi)}.
Expanding this product involves k^n elementary terms: prohibitive to compute in large samples.
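The k^n blow-up only concerns the expanded product; the sample density itself is cheap to evaluate directly. A minimal numpy sketch (Gaussian components and the parameter values are assumed purely for illustration):

```python
import numpy as np

def mixture_log_density(x, weights, means, sds):
    # Evaluate the log sample density of a k-component Gaussian mixture in
    # O(n k) operations: for each observation sum the weighted component
    # densities, then sum the logs -- the k^n expansion is never formed.
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    pdf = np.exp(-0.5 * ((x - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))
    return float(np.log((weights * pdf).sum(axis=1)).sum())

weights = np.array([0.3, 0.7])
means = np.array([0.0, 2.0])
sds = np.array([1.0, 1.0])
ll = mixture_log_density([-1.0, 0.5, 2.0], weights, means, sds)
```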
10. An introduction to advanced (?) MCMC methods
Motivating example
A typology of Bayes computational problems
(i) use of a complex parameter space, as for instance in constrained parameter sets like those resulting from imposing stationarity constraints in dynamic models;
(ii) use of a complex sampling model with an intractable likelihood, as for instance in missing data and graphical models;
(iii) use of a huge dataset;
(iv) use of a complex prior distribution (which may be the posterior distribution associated with an earlier sample);
(v) use of a complex inferential procedure, as for instance Bayes factors
B₀₁^π(x) = [P^π(θ ∈ Θ0 | x) / P^π(θ ∈ Θ1 | x)] / [π(θ ∈ Θ0) / π(θ ∈ Θ1)].
15. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
The Metropolis-Hastings Algorithm
1 Motivating example
2 The Metropolis-Hastings Algorithm
Monte Carlo Methods based on Markov Chains
The Metropolis–Hastings algorithm
A collection of Metropolis-Hastings algorithms
Extensions
Convergence assessment
16. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
Monte Carlo Methods based on Markov Chains
Running Monte Carlo via Markov Chains
Fact: It is not necessary to use a sample from the distribution f to approximate the integral
I = ∫ h(x) f(x) dx.
We can obtain X1, . . . , Xn ∼ f (approx) without directly simulating from f, using an ergodic Markov chain with stationary distribution f.
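The plain Monte Carlo baseline that the Markov-chain construction generalises is simply an average of h over draws from f. A small sketch, with h and f chosen purely for illustration (the slides that follow replace this iid sample with an ergodic chain):

```python
import numpy as np

rng = np.random.default_rng(0)
# Estimate I = integral of h(x) f(x) dx for h(x) = x^2 and f the standard
# normal density, using an ordinary iid Monte Carlo average (true value: 1).
x = rng.standard_normal(100_000)
I_hat = float(np.mean(x ** 2))
```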
18. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
Monte Carlo Methods based on Markov Chains
Running Monte Carlo via Markov Chains (2)
Idea
For an arbitrary starting value x(0), an ergodic chain (X(t)) is generated using a transition kernel with stationary distribution f.
This ensures the convergence in distribution of (X(t)) to a random variable from f.
For a “large enough” T0, X(T0) can be considered as distributed from f.
This produces a dependent sample X(T0), X(T0+1), . . ., which is generated from f, sufficient for most approximation purposes.
20. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
The Metropolis–Hastings algorithm
Problem:
How can one build a Markov chain with a given stationary distribution?
MH basics
An algorithm that converges to the objective (target) density f using an arbitrary transition kernel density q(x, y), called the instrumental (or proposal) distribution.
22. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
The Metropolis–Hastings algorithm
The MH algorithm
Algorithm (Metropolis–Hastings)
Given x(t),
1. Generate Yt ∼ q(x(t), y).
2. Take
X(t+1) = Yt with prob. ρ(x(t), Yt),
X(t+1) = x(t) with prob. 1 − ρ(x(t), Yt),
where
ρ(x, y) = min{ [f(y) q(y, x)] / [f(x) q(x, y)], 1 }.
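The two steps above translate directly into code. A minimal one-dimensional sketch (the target and proposal here are stand-ins chosen for illustration; the log scale avoids numerical underflow):

```python
import numpy as np

def metropolis_hastings(log_f, q_sample, log_q, x0, n_iter, rng):
    # Generic Metropolis-Hastings: propose y ~ q(x, .), accept with
    # probability rho(x, y) = min{ f(y) q(y, x) / [f(x) q(x, y)], 1 }.
    chain = np.empty(n_iter + 1)
    chain[0] = x = x0
    for t in range(1, n_iter + 1):
        y = q_sample(x, rng)
        log_rho = (log_f(y) + log_q(y, x)) - (log_f(x) + log_q(x, y))
        if np.log(rng.uniform()) < log_rho:
            x = y
        chain[t] = x
    return chain

rng = np.random.default_rng(1)
chain = metropolis_hastings(
    log_f=lambda x: -0.5 * x ** 2,                  # target: N(0,1), unnormalised
    q_sample=lambda x, rng: x + rng.standard_normal(),
    log_q=lambda x, y: -0.5 * (y - x) ** 2,         # log q(x, y), unnormalised
    x0=0.0, n_iter=20_000, rng=rng)
```

Note that only ratios of f and q enter the acceptance probability, which is why normalising constants can be dropped.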
23. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
The Metropolis–Hastings algorithm
Features
Independent of normalizing constants for both f and q(x, ·) (i.e., those constants independent of x)
Never moves to values with f(y) = 0
The chain (x(t))t may take the same value several times in a row, even though f is a density wrt Lebesgue measure
The sequence (yt)t is usually not a Markov chain
Satisfies the detailed balance condition
f(x) K(x, y) = f(y) K(y, x)
[Green, 1995]
25. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
The Metropolis–Hastings algorithm
Convergence properties
1. The M-H Markov chain is reversible, with invariant/stationary density f.
2. As f is a probability measure, the chain is positive recurrent.
3. If
Pr[ f(Yt) q(Yt, X(t)) / { f(X(t)) q(X(t), Yt) } ≥ 1 ] < 1,   (1)
i.e., if the event {X(t+1) = X(t)} occurs with positive probability, then the chain is aperiodic.
28. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
The Metropolis–Hastings algorithm
Convergence properties (2)
4. If
q(x, y) > 0 for every (x, y),   (2)
the chain is irreducible.
5. For M-H, f-irreducibility implies Harris recurrence.
6. Thus, under conditions (1) and (2):
(i) For h with Ef|h(X)| < ∞,
lim_{T→∞} (1/T) ∑_{t=1}^T h(X(t)) = ∫ h(x) f(x) dx   a.e. f.
(ii) and
lim_{n→∞} ‖ ∫ K^n(x, ·) µ(dx) − f ‖_TV = 0
for every initial distribution µ, where K^n(x, ·) denotes the kernel for n transitions.
31. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
A collection of Metropolis-Hastings algorithms
The Independent Case
The instrumental distribution q(x, ·) is independent of x and is denoted g.
Algorithm (Independent Metropolis-Hastings)
Given x(t),
1. Generate Yt ∼ g(y)
2. Take
X(t+1) = Yt with prob. min{ [f(Yt) g(x(t))] / [f(x(t)) g(Yt)], 1 },
X(t+1) = x(t) otherwise.
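A minimal sketch of the independent case, with target and proposal chosen purely for illustration. The proposal g is deliberately wider than the target, so f(x) ≤ M g(x) holds on the whole support:

```python
import numpy as np

rng = np.random.default_rng(2)
log_f = lambda x: -0.5 * x ** 2            # target: N(0,1), unnormalised
log_g = lambda x: -0.5 * (x / 2.0) ** 2    # proposal g: N(0, 4), unnormalised

x, chain = 0.0, []
for _ in range(20_000):
    y = 2.0 * rng.standard_normal()        # Yt ~ g, independent of x
    # accept with prob min{ f(y) g(x) / [f(x) g(y)], 1 }
    if np.log(rng.uniform()) < (log_f(y) + log_g(x)) - (log_f(x) + log_g(y)):
        x = y
    chain.append(x)
chain = np.asarray(chain)
```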
33. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
A collection of Metropolis-Hastings algorithms
Properties
The resulting sample is not iid, but there exist strong convergence properties:
Theorem (Ergodicity)
The algorithm produces a uniformly ergodic chain if there exists a constant M such that
f(x) ≤ M g(x),   x ∈ supp f.
In this case,
‖K^n(x, ·) − f‖_TV ≤ (1 − 1/M)^n.
[Mengersen & Tweedie, 1996]
35. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
A collection of Metropolis-Hastings algorithms
Example (Noisy AR(1))
Hidden Markov chain from a regular AR(1) model,
xt+1 = ϕ xt + ǫt+1,   ǫt ∼ N(0, τ²),
and observables
yt | xt ∼ N(xt², σ²).
The distribution of xt given xt−1, xt+1 and yt is proportional to
exp{ −(1/2τ²) [ (xt − ϕxt−1)² + (xt+1 − ϕxt)² + (τ²/σ²)(yt − xt²)² ] }.
37. An introduction to advanced (?) MCMC methods
The Metropolis-Hastings Algorithm
A collection of Metropolis-Hastings algorithms
Example (Noisy AR(1) too)
Use for proposal the N(µt, ωt²) distribution, with
µt = ϕ (xt−1 + xt+1) / (1 + ϕ²)   and   ωt² = τ² / (1 + ϕ²).
The ratio
π(x)/q_ind(x) ∝ exp{ −(yt − xt²)² / (2σ²) }
is bounded
(top) Last 500 realisations of the chain {X_k} out of 10,000
iterations; (bottom) histogram of the chain, compared with
the target distribution.
Random walk Metropolis–Hastings
Instead, use a local perturbation as proposal
Y_t = X^{(t)} + ε_t ,
where ε_t ∼ g, independent of X^{(t)}.
The instrumental density is now of the form g(y − x) and the
Markov chain is a random walk if g is symmetric:
g(x) = g(−x)
Algorithm (Random walk Metropolis)
Given x^{(t)}
1 Generate Y_t ∼ g(y − x^{(t)})
2 Take
X^{(t+1)} = Y_t      with prob. min{ 1, f(Y_t)/f(x^{(t)}) },
          = x^{(t)}  otherwise.
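A minimal Python sketch of the random walk Metropolis step; the one-dimensional target f(x) ∝ exp(−x⁴) (used later in these slides) and the Gaussian perturbation are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                       # unnormalized target, f(x) ∝ exp(-x^4)
    return np.exp(-x**4)

def rw_metropolis(n_iter, scale=1.0, x0=0.0):
    chain = np.empty(n_iter)
    accepted = 0
    x = x0
    for i in range(n_iter):
        y = x + scale * rng.standard_normal()   # Y_t = X^(t) + eps_t, eps_t ~ N(0, scale^2)
        if rng.uniform() < min(1.0, f(y) / f(x)):
            x, accepted = y, accepted + 1
        chain[i] = x
    return chain, accepted / n_iter

chain, acc_rate = rw_metropolis(20000, scale=1.0)
```

The `scale` parameter plays the role of the perturbation spread and drives the acceptance rate, as discussed below in the slides.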
Probit illustration
Likelihood and posterior given by
π(β|y, X) ∝ ℓ(β|y, X) ∝ ∏_{i=1}^n Φ(x_i^T β)^{y_i} (1 − Φ(x_i^T β))^{n_i − y_i}
under the flat prior
A random walk proposal works well for a small number of
predictors. Use the maximum likelihood estimate β̂ as starting
value and the asymptotic (Fisher) covariance matrix of the MLE, Σ̂,
as scale
MCMC algorithm
Probit random-walk Metropolis-Hastings
Initialization: Set β^{(0)} = β̂ and compute Σ̂
Iteration t:
1 Generate β̃ ∼ N_{k+1}(β^{(t−1)}, τ Σ̂)
2 Compute
ρ(β^{(t−1)}, β̃) = min{ 1, π(β̃|y) / π(β^{(t−1)}|y) }
3 With probability ρ(β^{(t−1)}, β̃) set β^{(t)} = β̃;
otherwise set β^{(t)} = β^{(t−1)}.
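A Python sketch of this scheme on synthetic probit data; the data, the two-predictor design and the numerical MLE are stand-ins (hypothetical, not the bank benchmark below):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# synthetic probit data (hypothetical stand-in for a real data set)
n, beta_true = 200, np.array([-1.0, 1.5])
X = rng.standard_normal((n, 2))
y = (rng.uniform(size=n) < norm.cdf(X @ beta_true)).astype(float)

def log_post(beta):             # flat prior => log posterior = log likelihood
    p = np.clip(norm.cdf(X @ beta), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# MLE beta_hat and approximate Fisher covariance Sigma_hat (BFGS inverse Hessian)
res = minimize(lambda b: -log_post(b), np.zeros(2))
beta_hat, Sigma_hat = res.x, res.hess_inv

def probit_rw_mh(n_iter, tau=1.0):
    L = np.linalg.cholesky(tau * Sigma_hat)
    beta, lp = beta_hat.copy(), log_post(beta_hat)
    chain = np.empty((n_iter, 2))
    for i in range(n_iter):
        prop = beta + L @ rng.standard_normal(2)   # beta~ ~ N(beta^(t-1), tau * Sigma_hat)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # rho = min(1, pi(prop|y)/pi(beta|y))
            beta, lp = prop, lp_prop
        chain[i] = beta
    return chain

chain = probit_rw_mh(5000)
```

Working on the log scale avoids underflow in the posterior ratio.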
R bank benchmark
Probit modelling with no intercept over the four measurements.
Three different scales τ = 1, 0.1, 10: best mixing behavior is
associated with τ = 1.
Average of the parameters over 9,000 MCMC iterations gives plug-in
estimate
p̂_i = Φ(−1.2193 x_{i1} + 0.9540 x_{i2} + 0.9795 x_{i3} + 1.1481 x_{i4}) .
Example (Mixture models)
π(θ|x) ∝ ∏_{j=1}^n { Σ_{ℓ=1}^k p_ℓ f(x_j |µ_ℓ, σ_ℓ) } π(θ)
Metropolis-Hastings proposal:
θ^{(t+1)} = θ^{(t)} + ωε^{(t)}  if u^{(t)} < ρ^{(t)}
          = θ^{(t)}             otherwise
where
ρ^{(t)} = π(θ^{(t)} + ωε^{(t)} |x) / π(θ^{(t)} |x) ∧ 1
and ω scaled for good acceptance rate
Random walk MCMC output for .7N(µ1, 1) + .3N(µ2, 1) and scale 1:
(µ1, µ2) sample shown at iterations 1, 10, 100, 500 and 1000.
Random walk MCMC output for .7N(µ1, 1) + .3N(µ2, 1) and scale √.1:
(µ1, µ2) sample shown at iterations 10, 100, 500, 1000, 5000 and 10,000.
Convergence properties
Uniform ergodicity prohibited by random walk structure
At best, geometric ergodicity:
Theorem (Sufficient ergodicity)
For a symmetric density f , log-concave in the tails, and a positive
and symmetric density g, the chain (X^{(t)}) is geometrically ergodic.
[Mengersen & Tweedie, 1996]
no tail effect
Example (Comparison of tail effects)
Random-walk Metropolis–Hastings algorithms based on a N(0, 1)
instrumental for the generation of (a) a N(0, 1) distribution and
(b) a distribution with density ψ(x) ∝ (1 + |x|)^{−3}.
90% confidence envelopes of the means, derived from 500 parallel
independent chains.
Extensions
There are many other families of MH algorithms
Adaptive Rejection Metropolis Sampling
Reversible Jump
Langevin algorithms
to name just a few...
Langevin Algorithms
Proposal based on the Langevin diffusion L_t, defined by the
stochastic differential equation
dL_t = dB_t + (1/2) ∇ log f(L_t) dt ,
where B_t is the standard Brownian motion
Theorem
The Langevin diffusion is the only non-explosive diffusion which is
reversible with respect to f .
Discretization
Because continuous time cannot be simulated, consider the
discretised sequence
x^{(t+1)} = x^{(t)} + (σ²/2) ∇ log f(x^{(t)}) + σ ε_t ,   ε_t ∼ N_p(0, I_p)
where σ² corresponds to the discretisation step
Example of f(x) = exp(−x⁴): histograms of the discretised chain
against the target density, for σ² = .1, .01, .001 and .0001.
Unfortunately, the discretized chain may be transient, for instance
when
lim_{x→±∞} σ² ∇ log f(x) |x|^{−1} > 1
Example of f(x) = exp(−x⁴) when σ² = .2
MH correction
Accept the new value Y_t with probability
f(Y_t)/f(x^{(t)}) · exp{ −‖Y_t − x^{(t)} − (σ²/2) ∇ log f(x^{(t)})‖² / 2σ² }
                  / exp{ −‖x^{(t)} − Y_t − (σ²/2) ∇ log f(Y_t)‖² / 2σ² } ∧ 1 .
Choice of the scaling factor σ
Should lead to an acceptance rate of 0.574 to achieve optimal
convergence rates (when the components of x are uncorrelated)
[Roberts & Rosenthal, 1998]
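A Python sketch of the resulting Metropolis-adjusted Langevin step (MALA), again with the illustrative target f(x) ∝ exp(−x⁴), for which ∇ log f(x) = −4x³:

```python
import numpy as np

rng = np.random.default_rng(3)

def log_f(x):                   # log target (unnormalized), f(x) ∝ exp(-x^4)
    return -x**4

def grad_log_f(x):
    return -4.0 * x**3

def mala(n_iter, sigma2=0.1, x0=0.0):
    sigma = np.sqrt(sigma2)
    x = x0
    chain = np.empty(n_iter)
    for i in range(n_iter):
        mean_x = x + 0.5 * sigma2 * grad_log_f(x)   # Langevin drift from x^(t)
        y = mean_x + sigma * rng.standard_normal()
        mean_y = y + 0.5 * sigma2 * grad_log_f(y)   # drift from Y_t, for the reverse move
        # log of the MH correction: log f(Y_t) - log f(x) + log q(x|Y_t) - log q(Y_t|x)
        log_ratio = (log_f(y) - log_f(x)
                     - (x - mean_y)**2 / (2 * sigma2)
                     + (y - mean_x)**2 / (2 * sigma2))
        if np.log(rng.uniform()) < log_ratio:
            x = y
        chain[i] = x
    return chain

chain = mala(20000, sigma2=0.1)
```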
Optimizing the Acceptance Rate
Problem of choice of the transition kernel from a practical point of
view
Most common alternatives:
1 a fully automated algorithm like ARMS;
2 an instrumental density g which approximates f, such that
f /g is bounded for uniform ergodicity to apply;
3 a random walk
In cases (2) and (3), the choice of g is critical.
Case of the random walk
Different approach to acceptance rates
A high acceptance rate does not indicate that the algorithm is
moving correctly: it may simply mean that the random walk is moving
too slowly on the surface of f.
If x^{(t)} and y_t are close, i.e. f(x^{(t)}) ≃ f(y_t), then y_t is accepted
with probability
min{ f(y_t)/f(x^{(t)}), 1 } ≃ 1 .
For multimodal densities with well separated modes, the negative
effect of limited moves on the surface of f clearly shows.
Case of the random walk (2)
If the average acceptance rate is low, the successive values of f (yt )
tend to be small compared with f (x(t) ), which means that the
random walk moves quickly on the surface of f since it often
reaches the “borders” of the support of f
Rule of thumb
In small dimensions, aim at an average acceptance rate of 50%. In
large dimensions, at an average acceptance rate of 25%.
[Gelman, Gilks and Roberts, 1995]
This rule is to be taken with a pinch of salt!
Example (Noisy AR(1) continued)
For a Gaussian random walk with scale ω small enough, the
random walk never jumps to the other mode. But if the scale ω is
sufficiently large, the Markov chain explores both modes and gives a
satisfactory approximation of the target distribution.
Markov chain based on a random walk with scale ω = .1
Markov chain based on a random walk with scale ω = .5
Where do we stand?
MCMC in a nutshell:
Running a sequence X_{t+1} = Ψ(X_t, Y_t) provides an approximation
to the target density f when the detailed balance condition holds:
f(x)K(x, y) = f(y)K(y, x)
Easiest implementation of the principle is random walk
Metropolis-Hastings:
Y_t = X^{(t)} + ε_t
Practical convergence requires sufficient energy from the
proposal, which is calibrated by trial and error.
Convergence assessment
Convergence diagnostics
How many iterations?
Rule # 1 There is no absolute number of simulations, i.e.
1,000 is neither large, nor small.
Rule # 2 It takes [much] longer to check for convergence
than for the chain itself to converge.
Rule # 3 MCMC is a “what-you-get-is-what-you-see”
algorithm: it fails to tell about unexplored parts of the space.
Rule # 4 When in doubt, run MCMC chains in parallel and
check for consistency.
Many “quick-&-dirty” solutions in the literature, but not
necessarily 100% trustworthy.
Example (Bimodal target)
f(x) = exp(−x²/2) {4(x − .3)² + .01} / ( √(2π) {4(1 + (.3)²) + .01} )
and use of random walk Metropolis–Hastings algorithm with
variance .04
Evaluation of the missing mass by
Σ_{t=1}^{T−1} [θ^{(t+1)} − θ^{(t)}] f(θ^{(t)})
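A Python sketch of this Riemann-sum check; sorting the chain into its order statistics is how the Philippe & Robert estimator is usually computed (the demo sample is synthetic, not the bimodal example above):

```python
import numpy as np

def riemann_mass(chain, f):
    """Riemann sum over the ordered sample of [theta_(t+1) - theta_(t)] f(theta_(t)).

    Approaches the total mass 1 as the chain covers the support of f;
    the shortfall estimates the missing mass."""
    s = np.sort(np.asarray(chain, dtype=float))
    return np.sum(np.diff(s) * f(s[:-1]))

# demo on a synthetic sample from a standard normal density
rng = np.random.default_rng(6)
demo = rng.standard_normal(5000)
phi = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
mass = riemann_mass(demo, phi)
```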
Sequence [in blue] and mass evaluation [in brown]
[Philippe & Robert, 2001]
Effective sample size
How many iid simulations from π are equivalent to N simulations
from the MCMC algorithm?
Based on the estimated k-th order auto-correlation,
ρ_k = cov( x^{(t)}, x^{(t+k)} ) ,
effective sample size
N̂^{ess} = n ( 1 + 2 Σ_{k=1}^{T_0} ρ̂_k )^{−1/2} ,
Only partial indicator that fails to signal chains stuck in one
mode of the target
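A Python sketch of this estimator; note the slides' −1/2 exponent is kept as written (the more common definition uses −1), and truncating the autocorrelation sum at the first negative estimate is a hypothetical choice for T_0:

```python
import numpy as np

def ess(chain, max_lag=None):
    """Effective sample size, n * (1 + 2 * sum_k rho_hat_k)^(-1/2), as on the slide."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    if max_lag is None:
        max_lag = min(n // 10, 200)
    var = np.dot(x, x) / n
    rho = np.array([np.dot(x[:-k], x[k:]) / (n * var) for k in range(1, max_lag + 1)])
    neg = np.where(rho < 0)[0]          # truncate at first negative estimate (choice of T_0)
    if neg.size:
        rho = rho[:neg[0]]
    return n * (1.0 + 2.0 * rho.sum()) ** -0.5

# demo: iid draws versus a strongly autocorrelated AR(1) chain
rng = np.random.default_rng(4)
iid = rng.standard_normal(5000)
ar = np.empty(5000)
ar[0] = 0.0
for t in range(1, 5000):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()
```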
Tempering
Facilitate exploration of π by flattening the target: simulate from
π_α(x) ∝ π(x)^α for α > 0 small enough
Determine where the modal regions of π are (possibly with
parallel versions using different α’s)
Recycle simulations from π(x)^α into simulations from π by
importance sampling
Simple modification of the Metropolis–Hastings algorithm,
with new acceptance
{ π(θ′|x) / π(θ|x) }^α × q(θ|θ′) / q(θ′|θ) ∧ 1
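A Python sketch of random walk Metropolis on the flattened target π^α; the bimodal mixture target is a hypothetical example, and with a symmetric proposal the q-ratio cancels:

```python
import numpy as np

rng = np.random.default_rng(5)

def pi(x):                      # bimodal target: equal mixture of N(-2,1) and N(2,1)
    return 0.5 * np.exp(-0.5 * (x + 2)**2) + 0.5 * np.exp(-0.5 * (x - 2)**2)

def tempered_rw_mh(n_iter, alpha, scale=1.0, x0=-2.0):
    """Random walk MH on pi^alpha: acceptance (pi(y)/pi(x))^alpha ∧ 1."""
    x = x0
    chain = np.empty(n_iter)
    for i in range(n_iter):
        y = x + scale * rng.standard_normal()
        if rng.uniform() < min(1.0, (pi(y) / pi(x)) ** alpha):
            x = y
        chain[i] = x
    return chain

hot = tempered_rw_mh(20000, alpha=0.2)   # flattened target: easy mode hopping
cold = tempered_rw_mh(20000, alpha=1.0)  # original target
```

The flattened chain crosses the low-density region between the modes much more readily, which is exactly what makes tempering useful for locating modal regions.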
Tempering with the mean mixture
(µ1, µ2) samples from the tempered targets π^α for α = 1, 0.5
and 0.2.