The stochastic analysis and modelling technique known as Ito's Lemma is widely applied as an approach that makes an outstanding contribution to the solution of stochastic models.
Financial analysis is generally carried out with the techniques of Stochastic Processes, the theory of dynamic probabilities. The simplest of these is Brownian Motion, also known as the Wiener Process. Financial models are usually built around the model known as Geometric Brownian Motion.
This document discusses various methods for estimating normalizing constants that arise when evaluating integrals numerically. It begins by noting there are many computational methods for approximating normalizing constants across different communities. It then lists the topics that will be covered in the upcoming workshop, including discussions on estimating constants using Monte Carlo methods and Bayesian versus frequentist approaches. The document provides examples of estimating normalizing constants using Monte Carlo integration, reverse logistic regression, and Xiao-Li Meng's maximum likelihood estimation approach. It concludes by discussing some of the challenges in bringing a statistical framework to constant estimation problems.
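The crude Monte Carlo estimate of a normalizing constant mentioned here can be sketched in a few lines. The integrand, integration interval, and sample size below are illustrative choices, not taken from the workshop material:

```python
import math
import random

def mc_normalizing_constant(q, a, b, n, rng):
    """Crude Monte Carlo estimate of Z = integral of q over [a, b]:
    Z is approximated by (b - a) times the average of q at n uniform draws."""
    total = 0.0
    for _ in range(n):
        total += q(rng.uniform(a, b))
    return (b - a) * total / n

rng = random.Random(42)
q = lambda x: math.exp(-0.5 * x * x)  # unnormalised Gaussian kernel
z_hat = mc_normalizing_constant(q, -8.0, 8.0, 100_000, rng)
# true value: sqrt(2 * pi), about 2.5066 (tail mass beyond [-8, 8] is negligible)
```

The estimator is unbiased for the truncated integral; its standard error shrinks as 1/sqrt(n), which is the baseline the more refined methods in the document (reverse logistic regression, bridge-type estimators) try to improve on.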
The document discusses simulation as a tool for statistical computation. Simulation allows researchers to reproduce randomness on a computer by using deterministic pseudo-random number generators. It is useful for evaluating complex systems, determining properties of statistical procedures, validating models, approximating integrals, and maximizing functions. Simulation relies on generating uniform random variables between 0 and 1 using algorithms like congruential generators, and is widely used across many fields involving probability and statistics.
Phylogenetic models and MCMC methods for the reconstruction of language history, by Robin Ryder
The document summarizes a phylogenetic model and Markov chain Monte Carlo (MCMC) methods for reconstructing language history from linguistic data. The model treats languages as diverging over time like species in a phylogenetic tree. MCMC is used to infer rates of language change and divergence times. Analysis of Indo-European language data strongly supported an Anatolian root dating to around 8000 years ago, rather than the alternative Kurgan hypothesis. The methods were shown to be robust even with simulated borrowing between languages.
Omiros' talk on the Bernoulli factory problem (BigMC)
This document summarizes previous work on simulating events of unknown probability using reverse time martingales. It discusses von Neumann's solution to the Bernoulli factory problem where f(p)=1/2. It also summarizes the Keane-O'Brien existence result, the Nacu-Peres Bernstein polynomial approach, and issues with implementing the Nacu-Peres algorithm at large n due to the large number of strings involved. It proposes developing a reverse time martingale approach to address these issues.
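von Neumann's solution for f(p) = 1/2 is simple enough to sketch directly; the bias p = 0.8 below is an arbitrary illustration, treated by the algorithm as unknown:

```python
import random

def fair_flip(biased_coin, rng):
    """von Neumann's trick for f(p) = 1/2: flip the p-coin twice and
    return the first flip if the two differ, otherwise start over.
    HT and TH are equally likely, so the output is an exact fair coin."""
    while True:
        a = biased_coin(rng)
        b = biased_coin(rng)
        if a != b:
            return a

rng = random.Random(0)
coin = lambda r: 1 if r.random() < 0.8 else 0  # bias p = 0.8, treated as unknown
draws = [fair_flip(coin, rng) for _ in range(20_000)]
freq = sum(draws) / len(draws)
```

The expected number of flips is 1/(p(1 - p)) pairs, which hints at why general f(p), the subject of the Keane-O'Brien and Nacu-Peres results, is so much harder than this special case.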
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrap, by Christian Robert
The document discusses the bootstrap method and its applications in statistical inference. It introduces the bootstrap as a technique for estimating properties of estimators like variance and distribution when the true sampling distribution is unknown. This is done by treating the observed sample as if it were the population and resampling with replacement to create new simulated samples. The bootstrap then approximates characteristics of the sampling distribution, allowing inferences like confidence intervals to be constructed.
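The resampling scheme described above fits in a short sketch; the Gaussian data, statistic, and number of bootstrap replicates are illustrative assumptions:

```python
import random
import statistics

def bootstrap_se(sample, stat, n_boot, rng):
    """Bootstrap standard error: resample the observed data with
    replacement, recompute the statistic on each resample, and take the
    spread of the replicates as an estimate of the estimator's standard error."""
    n = len(sample)
    reps = []
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        reps.append(stat(resample))
    return statistics.stdev(reps)

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
se_mean = bootstrap_se(data, statistics.mean, 2000, rng)
# theory gives SE roughly sd / sqrt(n), about 0.1 here
```

For the sample mean the answer is known in closed form, which makes it a convenient sanity check; the method's value lies in statistics whose sampling distribution is intractable.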
This document discusses Bayesian hypothesis testing and some of the challenges associated with it. It makes three key points:
1) There is tension between using posterior probabilities from a loss function approach versus Bayes factors, which eliminate prior dependence but have no direct connection to the posterior.
2) Bayesian hypothesis testing relies on choosing prior probabilities for hypotheses and prior distributions for parameters, which can strongly impact results and are often arbitrary.
3) Common Bayesian testing procedures like using Bayes factors can produce paradoxical results in some cases, like Lindley's paradox where the Bayes factor favors the null hypothesis as sample size increases despite evidence against it.
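Lindley's paradox can be reproduced numerically. The sketch below assumes the textbook normal setting (H0: mu = 0 versus H1: mu ~ N(0, tau2), unit-variance observations), which is not spelled out in the summary:

```python
import math

def bayes_factor_01(xbar, n, tau2):
    """B01 for H0: mu = 0 against H1: mu ~ N(0, tau2), where xbar is the
    mean of n unit-variance normal observations. B01 > 1 favours H0."""
    s = 1.0 + n * tau2
    return math.sqrt(s) * math.exp(-0.5 * n * xbar ** 2 * (1.0 - 1.0 / s))

# hold the z-statistic at 1.96 (p-value about 0.05) while n grows
bfs = [bayes_factor_01(1.96 / math.sqrt(n), n, tau2=1.0) for n in (10, 100, 10_000)]
# B01 increases with n: despite borderline-significant evidence against H0,
# the Bayes factor increasingly favours the null (Lindley's paradox)
```

The sqrt(1 + n tau2) factor dominates as n grows while the exponential term stays bounded for a fixed z-statistic, which is exactly the mechanism behind the paradox.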
This document discusses approximate Bayesian computation (ABC) for model choice between multiple models. It introduces the ABC algorithm for model choice, which approximates the posterior probabilities of models given the data by simulating parameters from the prior and accepting simulations based on the distance between simulated and observed sufficient statistics. Issues with choosing sufficient statistics that apply to all models are discussed. The document also examines the limiting behavior of the ABC approximation to the Bayes factor as the tolerance approaches 0 and infinity. It notes that discrepancies can arise if sufficient statistics are not cross-model sufficient. An example comparing Poisson and geometric models demonstrates this.
This document summarizes a presentation on testing hypotheses as mixture estimation and the challenges of Bayesian testing. The key points are:
1) Bayesian hypothesis testing faces challenges including the dependence on prior distributions, difficulties interpreting Bayes factors, and the inability to use improper priors in most situations.
2) Testing via mixtures is proposed as a paradigm shift that frames hypothesis testing as a model selection problem involving mixture models rather than distinct hypotheses.
3) Traditional Bayesian testing using Bayes factors and posterior probabilities depends strongly on prior distributions and choices that are difficult to justify, while not providing measures of uncertainty around decisions. Alternative approaches are needed to address these issues.
Approximate Bayesian model choice via random forests, by Christian Robert
The document describes approximate Bayesian computation (ABC) methods for model choice when likelihoods are intractable. ABC generates parameter-dataset pairs from the prior and retains those where the simulated and observed datasets are similar according to a distance measure on summary statistics. For model choice, ABC approximates posterior model probabilities by the proportion of simulations from each model that are retained. Machine learning techniques can also be used to infer the most likely model directly from the simulated summary statistics.
The document discusses Monte Carlo, an Indian clothing brand. It describes how Monte Carlo was launched in 1984 and has since grown to have a strong presence in the woolen segment. The company is now entering the kids' segment, which accounts for 43% of the market share. To tap into this opportunity, Monte Carlo launched a new brand called "Tween Monte-Carlo" focused on kids' fashion. While big competitors exist, the document suggests Monte Carlo's strategies of unique pricing, strong positioning, innovative products, and focused marketing have allowed it to succeed in the kids' segment where other brands have failed.
The document discusses approximate Bayesian computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or impossible to compute directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to some distance measure. The document covers the basic ABC algorithm, convergence properties as the tolerance approaches zero, examples of ABC for probit models and MA time series models, and advances such as modifying the proposal distribution to increase efficiency.
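The basic algorithm described above can be sketched as a rejection sampler. The normal toy model, uniform prior, summary statistic, and tolerance below are illustrative assumptions, not the document's examples:

```python
import random
import statistics

def abc_rejection(obs_summary, simulate, prior_draw, tol, n_prop, rng):
    """Vanilla ABC: draw theta from the prior, simulate a dataset, and
    keep theta when the simulated summary falls within tol of the observed one."""
    accepted = []
    for _ in range(n_prop):
        theta = prior_draw(rng)
        if abs(simulate(theta, rng) - obs_summary) < tol:
            accepted.append(theta)
    return accepted

rng = random.Random(7)
n_data = 50
# observed data: 50 draws from N(2, 1), summarised by the sample mean
obs = statistics.mean(rng.gauss(2.0, 1.0) for _ in range(n_data))
sim = lambda th, r: statistics.mean(r.gauss(th, 1.0) for _ in range(n_data))
post = abc_rejection(obs, sim, lambda r: r.uniform(-5.0, 5.0), 0.2, 20_000, rng)
```

The accepted values approximate draws from the posterior; shrinking the tolerance tightens the approximation at the price of a lower acceptance rate, which is the trade-off the document's convergence discussion formalises.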
This document provides an overview of ABC methodology and applications. It begins with examples from population genetics and econometrics that are well-suited for ABC. It then describes the basic ABC algorithm for Bayesian inference using simulation: specifying prior distributions, simulating data under different parameter values, and accepting simulations that best match the observed data. Indirect inference is also discussed as a method for choosing informative summary statistics for ABC. The document traces the origins of ABC to population genetics models from the late 1990s and highlights ongoing contributions from that field to ABC methodology.
Uniform and non-uniform pseudo random numbers generators for high dimensional..., by LEBRUN Régis
This document outlines various topics related to pseudo-random number generators (PRNGs). It begins by discussing uniform PRNGs and the goal of approximating independent and uniformly distributed random variables. It then discusses linear congruential generators and multiplicative congruential generators as examples of uniform PRNGs. It notes some weaknesses of these generators, such as short periods and poor distribution in high dimensions. Finally, it briefly discusses statistical tests that can be used to validate PRNGs, such as the gap test and spectral test.
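A linear congruential generator of the kind discussed above takes only a few lines. The constants below are the widely used Numerical Recipes choices, picked here purely for illustration:

```python
def lcg(seed, a=1664525, c=1013904223, m=2 ** 32):
    """Minimal linear congruential generator, x_{k+1} = (a * x_k + c) mod m,
    with each state scaled to [0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

gen = lcg(seed=12345)
u = [next(gen) for _ in range(10_000)]
mean_u = sum(u) / len(u)  # should sit near 0.5 if the stream is roughly uniform
```

A sample-mean check like this is far weaker than the gap and spectral tests the document mentions; in particular, the lattice structure that the spectral test detects in high dimensions is invisible to such one-dimensional summaries.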
The document discusses using random forests for approximate Bayesian computation (ABC) model choice. It proposes:
1. Using random forests to infer a model from summary statistics, as random forests can handle a large number of statistics and find efficient combinations.
2. Replacing estimates of posterior model probabilities, which are poorly approximated, with posterior predictive expected losses to evaluate models.
3. An example comparing MA(1) and MA(2) time series models using two autocorrelations as summaries, finding embedded models and that random forests perform similarly to other methods on small problems.
This document discusses Markov chain Monte Carlo (MCMC) methods. It begins with an outline of the Metropolis-Hastings algorithm, which is a generic MCMC method for obtaining a sequence of random samples from a probability distribution when direct sampling is difficult. The document then provides details on the Metropolis-Hastings algorithm, including its convergence properties. It also discusses the independent Metropolis-Hastings algorithm as a special case and provides an example to illustrate it.
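A minimal random-walk Metropolis-Hastings sampler illustrates the generic algorithm; the standard-normal target and proposal scale are illustrative choices, not reproduced from the document:

```python
import math
import random

def rw_metropolis(logpi, x0, scale, n_iter, rng):
    """Random-walk Metropolis: propose y = x + scale * N(0, 1) and accept
    with probability min(1, pi(y) / pi(x)), computed on the log scale."""
    chain = [x0]
    x, lp = x0, logpi(x0)
    for _ in range(n_iter):
        y = x + scale * rng.gauss(0.0, 1.0)
        ly = logpi(y)
        if math.log(rng.random()) < ly - lp:
            x, lp = y, ly
        chain.append(x)
    return chain

rng = random.Random(3)
# illustrative target: standard normal, log-density known only up to a constant
chain = rw_metropolis(lambda x: -0.5 * x * x, 0.0, 2.0, 50_000, rng)
m = sum(chain) / len(chain)
v = sum((x - m) ** 2 for x in chain) / len(chain)
```

Because the acceptance ratio only involves a ratio of target densities, the normalizing constant cancels, which is why MH applies when direct sampling is impossible.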
The document discusses the history and importance of chocolate in human civilization. It notes that chocolate originated in Mesoamerica over 3000 years ago and was prized by the Aztecs and Mayans for its taste. Cocoa beans were used as currency and their cultivation was tightly regulated. The Spanish brought cocoa to Europe in the 16th century, starting its global spread and the development of the chocolate industry.
This document provides an overview of Markov chain Monte Carlo (MCMC) methods. It begins with motivations for using MCMC, such as computational difficulties that arise in models with latent variables like mixture models. It then discusses likelihood-based and Bayesian approaches, noting limitations of maximum likelihood methods. Conjugate priors are described that allow tractable Bayesian inference for some simple models. However, conjugate priors are not available for more complex models, motivating the use of MCMC methods which can approximate integrals and distributions of interest for more complex models.
Monte Carlo methods use random sampling to solve problems numerically. They work by setting up probabilistic models and running simulations using random numbers. This allows approximating solutions to problems in physics, finance, optimization, and other fields. Examples include estimating pi by simulating dart throws, and using a "drunken wino" random walk simulation to approximate the solution to a partial differential equation on a grid. The accuracy of Monte Carlo methods increases with more simulation iterations, requiring truly random numbers for best results.
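The dart-throwing estimate of pi mentioned above can be sketched directly; the sample size is an arbitrary illustration:

```python
import random

def estimate_pi(n, rng):
    """Throw n darts uniformly at the unit square; the fraction landing
    inside the quarter disc x^2 + y^2 <= 1 estimates pi / 4."""
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n

rng = random.Random(2024)
pi_hat = estimate_pi(200_000, rng)
```

The error decreases like 1/sqrt(n), matching the document's point that accuracy improves with more simulation iterations, but only slowly.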
Monte Carlo simulation is a technique used to approximate probability distributions of potential outcomes by conducting multiple trial runs, called simulations, using random variables. It allows professionals to account for risk and uncertainty in fields like finance, engineering, and insurance. The technique works by simulating a system many times, each with randomly generated values for uncertain variables, to build probability distributions of possible results. It provides probabilistic, graphical, and sensitivity analysis advantages over deterministic models.
This document discusses differentially private distributed Bayesian linear regression with Markov chain Monte Carlo (MCMC) methods. It proposes adding noise to the summaries (S) and coefficients (z) of local linear regression models on different devices to provide differential privacy. Gibbs sampling is used to simulate the genuine posterior distribution over the linear model parameters (theta, sigma_y, Sigma_x, z1:J, S1:J) in a distributed manner while maintaining privacy. Alternative approaches like exploiting approximate posteriors from all devices or learning iteratively are also mentioned.
This document discusses mixture models and approximations to computing model evidence. It contains:
1) An overview of mixtures of distributions and common priors used for mixtures.
2) Approximations to computing marginal likelihoods or model evidence using Chib's representation and Rao-Blackwellization. Permutations are used to address label switching issues.
3) Methods for more efficient sampling for computing model evidence, including iterative bridge sampling and dual importance sampling with approximations to reduce the number of permutations considered.
Sequential Monte Carlo is also briefly mentioned as an alternative approach.
This document describes the adaptive restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the restore process, which takes regenerations from an underlying diffusion or jump process to construct a reversible Markov chain with a target distribution. The adaptive restore process enriches this by allowing the regeneration distribution to adapt over time. It converges almost surely to the minimal regeneration distribution. Parameters like the initial regeneration distribution and rates are discussed. Examples are provided for the adaptive Brownian restore algorithm and calibrating the parameters.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document discusses Bayesian restricted likelihood methods for situations where the likelihood cannot be fully trusted. It presents several approaches including empirical likelihood, Bayesian empirical likelihood, using insufficient statistics, approximate Bayesian computation (ABC), and MCMC on manifolds. The key ideas are developing Bayesian tools that are robust to model misspecification by questioning the likelihood, prior, and other assumptions.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The document discusses various methods for approximating marginal likelihoods and comparing hypotheses using Bayes factors. It outlines the historical development and connections between different approximation techniques.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide semi-automated ABC.
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
ABC stands for approximate Bayesian computation. It is a method for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces samples from an approximate posterior distribution by simulating parameter and summary statistic values that match the observed summary statistics within a tolerance level. The choice of summary statistics is important but difficult, as there is typically no sufficient statistic. Several strategies have been developed for selecting good summary statistics, including using random forests or the Lasso to evaluate and select from a large set of potential summaries.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may incur a dimension curse.
1. Folding Markov chains: the origaMCMC
Christian P. Robert
Université Paris-Dauphine PSL and University of Warwick
bayesianstatistics@gmail.com
Joint on-going work with R. Douc and G. Roberts
4. motivating example
Consider the target
π(x) = 1 / (π (1 + x²))
the standard Cauchy distribution.
The basic Metropolis-Hastings algorithm with uniform proposal z_t ∼ U(x_t − ε, x_t + ε) cannot be geometrically ergodic
[Mengersen and Tweedie (1996)]
[Figure: dynamics of a standard random-walk Metropolis–Hastings algorithm when targeting a Cauchy distribution, based on 10^4 iterations and a uniform scale of ε = .1]
6. new proposal
Metropolis-Hastings alternative:
1. the current value x_t of the Markov chain is first inverted into y_t = 1/x_t if found outside (−1, 1),
2. then moved by a random walk on (−1, 1) to z_t ∼ U(y_t − ε, y_t + ε), this value being accepted or not according to the standard Metropolis-Hastings ratio,
3. and the outcome inverted into x_{t+1} = 1/y_{t+1} with probability 1/2
This is a simple version of the folding algorithm, with folding set the unit interval (−1, 1).
9. validation
For the simple version of the folding algorithm, with folding set the unit interval (−1, 1):
the Cauchy target is still stationary for this distribution
the probability 1/2 results from the Jacobian rather than from P(|X| < 1) = 1/2
not-so-simple [but still-manageable] probability if choosing the folding interval (−2, 2) and inversion y_t = 4/x_t
the fundamental reason is that the Cauchy distribution is invariant by inversion
the resulting Markov chain is uniformly ergodic
11. simulation outcome
[Figure: (Left) Folded Markov chain for the Cauchy target with the same scale of the random walk. (Right) Empirical distribution of the Markov chain and fit to the Cauchy target]
12. folding the Markov chain
Consider a target π on the state space X.
Let A0, A1, ..., AM be a finite partition of the state space and create differentiable bijections g1, ..., gM from A0 to A1, ..., AM, respectively. Set X′ = A0 as the folded space.
Define the distribution
π′(x′) = π(x′) + π(g1(x′)) |g1′(x′)| + ... + π(gM(x′)) |gM′(x′)|
on X′
⇒ π′(·) is a proper density on X′
14. unfolding the folded Markov chain
Simulating from π′ is equivalent to simulating from π:
Lemma: If x′ ∼ π′, then
x = x′ with probability π(x′)/π′(x′)
x = g1(x′) with probability π(g1(x′)) |g1′(x′)| / π′(x′)
...
x = gM(x′) with probability π(gM(x′)) |gM′(x′)| / π′(x′)
is distributed from the target π.
⇒ build an MCMC sampler aiming at π′
16. Cauchy example validated
For the Cauchy example:
A0 = (−1, 1), A1 = (−1, 1)^c, g1(x) = 1/x
and
π′(x′) = π(x′) + π(g1(x′)) |g1′(x′)|
       = 1/(π(1 + x′²)) + [1/(π(1 + 1/x′²))] · (1/x′²)
       = 2/(π(1 + x′²))
unfolding by x = x′ w.p. 1/2 and x = 1/x′ w.p. 1/2
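The simple folded chain for the Cauchy target can be sketched as follows. This is an illustrative reimplementation under the slides' setup, not the authors' code: the chain targets the folded density 2/(π(1 + y²)) on (−1, 1) and each state is unfolded by inverting with probability 1/2.

```python
import random

def folded_cauchy_mh(n_iter, eps, rng):
    """Folded Metropolis-Hastings for a standard Cauchy target: the chain
    lives on the folded space (-1, 1), where the folded density is
    proportional to 1/(1 + y^2); each state is then unfolded to y or 1/y
    with probability 1/2 each, as in the Cauchy folding example."""
    def folded_density(y):
        return 1.0 / (1.0 + y * y) if -1.0 < y < 1.0 else 0.0

    y = 0.5  # arbitrary starting point inside the folding interval
    out = []
    for _ in range(n_iter):
        z = rng.uniform(y - eps, y + eps)  # uniform random-walk proposal
        if rng.random() * folded_density(y) < folded_density(z):
            y = z  # Metropolis acceptance; proposals outside (-1, 1) are rejected
        out.append(y if rng.random() < 0.5 else 1.0 / y)  # unfolding step
    return out

rng = random.Random(11)
draws = folded_cauchy_mh(100_000, eps=0.5, rng=rng)
inside = sum(1 for x in draws if -1.0 < x < 1.0) / len(draws)
negative = sum(1 for x in draws if x < 0.0) / len(draws)
# for a standard Cauchy, P(|X| < 1) = 1/2 and P(X < 0) = 1/2
```

Because the folded space is compact and the folded density is bounded away from zero on it, the random walk on (−1, 1) avoids the heavy-tail excursions that ruin geometric ergodicity for the unfolded chain.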
17. Cauchy example validated
For the alternative
A0 = (−2, 2), A1 = (−2, 2)^c, g1(x) = 4/x
and
π′(x′) = π(x′) + π(g1(x′)) |g1′(x′)|
       = 1/(π(1 + x′²)) + [1/(π(1 + 16/x′²))] · (4/x′²)
       = 1/(π(1 + x′²)) + 4/(π(16 + x′²))
unfolding by x = x′ w.p. π(x′)/π′(x′) and x = 4/x′ w.p. 4π(4/x′)/(x′² π′(x′))
19. improving the acceptance rate
Define the folded transition kernel K′(x′, dy′) through its density
k′(x′, y′) = Σ_{i=0}^{M} [ π(g_i(x′)) |g_i′(x′)| / π′(x′) ] Σ_{j=0}^{M} k(g_i(x′), g_j(y′)) |g_j′(y′)|
(with g_0 the identity map)
The kernel considers all possible images in the original space X and brings them into the folded space X′.
[Diagram: the original chain on X and the folded chain on X′, linked by the folding kernel Q and the kernels K, π]
21. improving the acceptance rate
Proposition
If α(x, y), resp. α̃(x̃, ỹ), is the Metropolis–Hastings acceptance
probability for the original, resp. folded, proposal kernel, then
E[α̃(X̃, Ỹ)] ≥ E[α(X, Y)]
when the expectations are computed under the respective stationary
distributions, π̃(x̃)K̃(x̃, ỹ) and π(x)K(x, y)
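To make the scheme concrete, here is a minimal random-walk Metropolis sampler on the folded Cauchy space (−1, 1) followed by the unfolding coin flip — a sketch under our own naming, not the authors' code.

```python
import math, random

def pi_folded(x):
    """pi~(x) = 2 / ((1 + x^2) pi) on the folded space A0 = (-1, 1)."""
    return 2.0 / (math.pi * (1.0 + x * x)) if abs(x) < 1.0 else 0.0

def folded_mh(n, step=0.5, seed=1):
    """Random-walk Metropolis targeting pi~, each draw then unfolded."""
    rng = random.Random(seed)
    xt, draws = 0.5, []
    for _ in range(n):
        prop = xt + rng.uniform(-step, step)
        if rng.random() * pi_folded(xt) < pi_folded(prop):  # MH acceptance
            xt = prop
        # unfold: pi(x~)/pi~(x~) = 1/2 exactly here, so a fair coin
        draws.append(xt if rng.random() < 0.5 else 1.0 / xt)
    return draws

draws = folded_mh(100_000)
inside = sum(abs(x) < 1.0 for x in draws) / len(draws)
assert abs(inside - 0.5) < 0.02   # the Cauchy puts mass 1/2 on (-1, 1)
median = sorted(draws)[len(draws) // 2]
assert abs(median) < 0.1          # the Cauchy median is 0
```

Proposals falling outside A0 are rejected, so the folded chain never leaves (−1, 1); heavy tails only appear through the unfolding step.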
24. asymptotic variance
Given (X, 𝒳) and (X̃, 𝒳̃), define the folding mapping φ : X → X̃
and write
Q(x, dx̃) = δ_{φ(x)}(dx̃)
Set π̃ = π ∘ φ⁻¹ and define the kernel Q̃ on X̃ × 𝒳 by
π(dx) Q(x, dx̃) = π̃(dx̃) Q̃(x̃, dx)
Lemma
For all (f, g) ∈ L²(π) × L²(π̃),
⟨f; Qg⟩_π = ⟨Q̃f; g⟩_π̃   where ⟨f; g⟩_µ = µ(fg)
If K is π-reversible, then Q̃KQ is π̃-reversible, since
⟨f; Q̃KQ g⟩_π̃ = ⟨Qf; KQg⟩_π = ⟨KQf; Qg⟩_π = ⟨Q̃KQ f; g⟩_π̃
25. comparing two π-reversible Markov chains
Let P0 and P1 be two π-reversible Markov kernels.
Goal: easy-to-check conditions on P0 and P1 ensuring that, for all f in
some "class of functions",
v(f, P0) ≥ v(f, P1)
where we have defined, for a Markov chain (X_k)_{k∈N} with
π-reversible transition kernel P and initial distribution π,
v(f, P) := lim_{n→∞} (1/n) Var(Σ_{k=0}^{n−1} f(X_k)) = lim_{n→∞} n Var(π̂_n(f))
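The quantity v(f, P) can be estimated from a single run by the standard batch-means method; a generic sketch (function names ours, not part of the talk).

```python
import random, statistics

def batch_means(xs, n_batches=50):
    """Batch-means estimate of v(f, P) = lim_n n Var(pi_hat_n(f)):
    split the run into batches and rescale the variance of batch means."""
    b = len(xs) // n_batches
    means = [sum(xs[i * b:(i + 1) * b]) / b for i in range(n_batches)]
    return b * statistics.variance(means)

# sanity check on an iid sequence: v(f, P) reduces to Var f = 1/12
rng = random.Random(0)
xs = [rng.random() for _ in range(100_000)]
v = batch_means(xs)
assert v > 0
assert abs(v - 1.0 / 12.0) < 0.06
```

For a correlated π-reversible chain the same estimator picks up the autocovariance inflation, which is what the domination results below compare.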
26. two notions
Definition
1. P1 dominates P0 on the off-diagonal, i.e. P1 ⪰₁ P0, if
∀(x, A), P1(x, A \ {x}) ≥ P0(x, A \ {x}).
2. P1 dominates P0 in the covariance ordering, i.e. P1 ⪰₂ P0, if
∀f ∈ L²(π), ⟨f; P1 f⟩_π ≤ ⟨f; P0 f⟩_π
where ⟨f; g⟩_π = ∫ π(dx) f(x) g(x).
Theorem
P1 ⪰₁ P0 ⇒ P1 ⪰₂ P0 ⇒ v(f, P0) ≥ v(f, P1) ∀f ∈ L²(π).
[Peskun (1973) and Tierney (1998)]
29. induced kernel
Given a proposal kernel K and the target distribution π, write
H(K, π) for the Metropolis–Hastings kernel defined by:
H(K, π)(x, A \ {x}) = ∫ K(x, dy) α(x, y) 1_{A\{x}}(y)
where
α(x, y) = 1 ∧ r(x, y)
r(x, y) = (dµ/dν)(x, y)
µ(dx dy) = π(dy) K(y, dx)
ν(dx dy) = π(dx) K(x, dy)
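A direct transcription of H(K, π): one transition proposes from K and accepts with α(x, y) = 1 ∧ r(x, y). A generic sketch; the helper names are ours.

```python
import math, random

def mh_step(x, pi, propose, q, rng):
    """One transition of the Metropolis-Hastings kernel H(K, pi):
    r(x, y) = dmu/dnu = pi(y) K(y, x) / (pi(x) K(x, y))."""
    y = propose(x, rng)
    r = (pi(y) * q(y, x)) / (pi(x) * q(x, y))
    return y if rng.random() < r else x

# demo: standard normal target, Gaussian random-walk proposal
pi = lambda x: math.exp(-0.5 * x * x)            # unnormalised target
propose = lambda x, rng: x + rng.gauss(0.0, 1.0)
q = lambda x, y: math.exp(-0.5 * (y - x) ** 2)   # symmetric: cancels in r

rng, x, chain = random.Random(2), 0.0, []
for _ in range(50_000):
    x = mh_step(x, pi, propose, q, rng)
    chain.append(x)
assert abs(sum(chain) / len(chain)) < 0.1        # N(0, 1) has mean 0
```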
30. two approximate expectations
1. Let {X_k^{(1)}} be a Markov chain with transition kernel H(K, π). The
Rao–Blackwellised approximation is defined by
π̂_n^{(1)}(h) = (1/n) Σ_{k=1}^{n} QQ̃ h(X_k^{(1)})   (1)
2. Let {X̃_k^{(0)}} be a Markov chain with transition kernel H(Q̃KQ, π̃)
and consider the Rao–Blackwellised approximation
π̂_n^{(0)}(h) = (1/n) Σ_{k=1}^{n} Q̃ h(X̃_k^{(0)})   (2)
32. comparison
Theorem
For h a real-valued measurable function on (X, 𝒳) such that
πh² < ∞, let
{X_k^{(1)}, k ∈ N} be a Markov chain with kernel H(K, π) starting
from π,
{X̃_k^{(0)}, k ∈ N} be a Markov chain with kernel H(Q̃KQ, π̃)
starting from π̃.
Then
lim_{n→∞} n Var(π̂_n^{(0)}(h)) ≤ lim_{n→∞} n Var(π̂_n^{(1)}(h))
with π̂_n^{(0)}(h) and π̂_n^{(1)}(h) defined in (2) and (1)
35. folding set
Unless the target distribution is simple enough for an informed choice,
a natural choice for A0 is the HPD region
Hα = {x ∈ X; π(x) ≥ α}
as
π [and hence π̃] is lower bounded on Hα
the resulting Hα is compact
some transition kernels produce uniformly ergodic chains
partition of X into A0, A0^c with the natural stereoscopic projection
[provided A0 is star-convex]
g1(x̃) = (2/|x̃|²) x̃
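On our reading of the last formula, the projection is an involution exchanging the inside and outside of the ball of radius √2 (since |g1(x)| = 2/|x|); a quick check:

```python
import math

def g1(x):
    """Stereoscopic-type projection g1(x) = 2 x / |x|^2."""
    s = sum(v * v for v in x)
    return tuple(2.0 * v / s for v in x)

x = (0.3, -0.8, 0.5)
y = g1(x)
nx = math.sqrt(sum(v * v for v in x))
ny = math.sqrt(sum(v * v for v in y))
# |g1(x)| = 2/|x|: the ball |x| < sqrt(2) is mapped onto its complement
assert abs(ny - 2.0 / nx) < 1e-12
# g1 is an involution: unfolding is just applying g1 again
z = g1(y)
assert all(abs(a - b) < 1e-12 for a, b in zip(x, z))
```

For a ball of a different radius or centre, the draw would first be rescaled and recentred before applying the projection.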
36. practical implementation
While Hα is usually unavailable, approximations can be found from
preliminary MCMC runs whenever π(x), or an unnormalised version of it,
can be computed:
preliminary run produces simulations with [relative] values of
π: π(x1), . . . , π(xN)
derivation of the higher density values [and potential clustering]
choice of an HPD approximation as a ball and g1 as the natural
projection
reevaluation of the folding set after further simulations
note: black-box compatibility with existing MCMC code
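One way the steps above could look in code, for a one-dimensional chain: keep the highest-density part of a preliminary run and enclose it in a ball. A rough sketch with hypothetical helper names, not the authors' implementation.

```python
import random

def hpd_ball(samples, log_pi, level=0.5):
    """Approximate the HPD region H_alpha by a ball: threshold the
    [relative] density values at the `level` quantile, then enclose
    the retained draws in a ball around their mean."""
    vals = sorted(log_pi(x) for x in samples)
    thresh = vals[int(level * len(vals))]
    kept = [x for x in samples if log_pi(x) >= thresh]
    centre = sum(kept) / len(kept)
    radius = max(abs(x - centre) for x in kept)
    return centre, radius

# preliminary-run stand-in: iid N(0, 1) draws, unnormalised log-density
rng = random.Random(3)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
centre, radius = hpd_ball(samples, lambda x: -0.5 * x * x)
assert abs(centre) < 0.1
assert radius < 1.0    # the 50% HPD interval of N(0,1) has half-width ~0.67
```

The same black-box idea applies in higher dimension with a Euclidean ball and the stereoscopic projection as g1.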
39. Cauchy illustration
preliminary run produces simulations with values of π(x)
derivation of the higher density values and clustering
choice of an HPD approximation as a ball and g1 as the natural projection
potential reevaluation of the folding set after further simulations
[figure: histogram of the preliminary draws (x vs. density)]
41. Cauchy illustration
[figure: histogram of the preliminary draws (x vs. density)]
42. Cauchy illustration
[figure: histogram of the preliminary draws (x vs. density)]
qMC version using sobol(1e5, 3)
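The sobol(1e5, 3) call above is R (randtoolbox); assuming SciPy is available, an equivalent Python sketch uses scipy.stats.qmc (note that Sobol' sequences prefer power-of-two sample sizes):

```python
import numpy as np
from scipy.stats import qmc

# 3-dimensional scrambled Sobol' points, analogous to R's sobol(1e5, 3)
sampler = qmc.Sobol(d=3, scramble=True, seed=0)
u = sampler.random_base2(m=16)          # 2^16 points in [0, 1)^3
assert u.shape == (65536, 3)
assert u.min() >= 0.0 and u.max() < 1.0
# the qMC estimate of E[U1] = 1/2 is far tighter than plain Monte Carlo
assert abs(u[:, 0].mean() - 0.5) < 1e-3
```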
45. Gaussian sugarloaf
Target
π(x) ∝ ϕ(x; µ, Σ) × exp{−α/||x − x0||²}
Same steps as before: preliminary run producing values of π(x),
derivation of the higher density values and clustering, choice of an
HPD approximation as a ball with g1 the natural projection, and
potential reevaluation of the folding set after further simulations
[figures: four histograms of the simulated components (x vs. density)]
47. keep folding: the origaMCMC
When A0 shows too much variability of π̃, it can be folded again:
the procedure can be iterated, or a more elaborate partition can
be constructed by clustering
cost of unfolding possibly a deterrent
over-concentration not an issue with the projected proposal
plus, other proposals may be included
possible connection with the Wang–Landau flat-histogram
algorithm, although the Jacobian may prevent flatness (and
become a liability)
[Jacob & Ryder, 2014]
49. further questions
1. Folding increases the acceptance probabilities and improves the
asymptotic variance. What about achieving geometric ergodicity?
2. Folding is only workable if folding and unfolding the Markov
chain is not costly. What about a computing-time criterion?
3. Domination results are obtained with the kernel induced on
the folded space [black box]. What about selecting more
appropriate [black/white box] folded kernels?