Those are the slides for my conference talk at 2013 WSC, in the "Jacob Bernoulli's "Ars Conjectandi" and the emergence of probability" session organised by Adam Jakubowski
This document discusses prior selection for mixture estimation. It begins by introducing mixture models and their common parameterization. It then discusses several types of weakly informative priors that can be used for mixture models, including empirical Bayes priors, hierarchical priors, and reparameterizations. It notes challenges with using improper priors for mixture models. The document also discusses saturated priors when the number of components is not known beforehand. It covers Jeffreys priors for mixtures and issues around propriety. It proposes some reparameterizations of mixtures, like using moments or a spherical reparameterization, that allow proper Jeffreys-like priors to be defined.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
The document provides an overview of normed and Banach spaces. It begins by defining normed vector spaces and Banach spaces, noting that Hilbert spaces are Banach spaces with additional properties. It then discusses several key theorems regarding Banach spaces, including the Banach-Steinhaus theorem, open mapping theorem, and Hahn-Banach theorem. The document concludes by noting that Banach spaces play a central role in functional analysis and topology.
This article continues the study of concrete algebra-like structures in our polyadic approach, where the arities of all operations are initially taken as arbitrary, but the relations between them, the arity shapes, are to be found from some natural conditions ("arity freedom principle"). In this way, generalized associative algebras, coassociative coalgebras, bialgebras and Hopf algebras are defined and investigated. They have many unusual features in comparison with the binary case. For instance, both the algebra and its underlying field can be zeroless and nonunital, the existence of the unit and counit is not obligatory, and the dimension of the algebra is not arbitrary, but "quantized". The polyadic convolution product and bialgebra can be defined, and when the algebra and coalgebra have unequal arities, the polyadic version of the antipode, the querantipode, has different properties. As a possible application to quantum group theory, we introduce the polyadic version of braidings, almost co-commutativity, quasitriangularity and the equations for the R-matrix (which can be treated as a polyadic analog of the Yang-Baxter equation). Finally, we propose another concept of deformation which is governed not by the twist map, but by the medial map, where only the latter is unique in the polyadic case. We present the corresponding braidings, almost co-mediality and M-matrix, for which the compatibility equations are found.
Some fundamental theorems in Banach spaces and Hilbert spacesSanjay Sharma
This document provides an overview of functional analysis and some fundamental theorems in Banach and Hilbert spaces. It discusses how functional analysis studies topological-algebraic structures and their applications in mathematics and sciences. It also summarizes key definitions like normed linear spaces and Hilbert spaces. Some fundamental theorems covered include the Hahn-Banach theorem, open mapping theorem, closed graph theorem, Banach-Steinhaus theorem, and Riesz representation theorem.
better together? statistical learning in models made of modulesChristian Robert
The document discusses statistical models composed of modular components called modules. Each module may be developed independently and represent different data modalities or domains of knowledge. Joint Bayesian updating treats all modules simultaneously but misspecification of one module can impact the others. Alternative approaches are proposed to allow uncertainty propagation between modules while preventing feedback that could lead to misspecification. Candidate distributions for the modules are discussed, along with strategies for choosing among them based on predictive performance.
1. The document proposes a method for making approximate Bayesian computation (ABC) inferences accurate by modeling the distribution of summary statistics calculated from simulated and observed data.
2. It involves constructing an auxiliary probability space (ρ-space) based on these summary values, and performing classification on ρ-space to determine whether simulated and observed data are from the same population.
3. Indirect inference is then used to link ρ-space back to the original parameter space, allowing the ABC approximation to match the true posterior distribution if the ABC tolerances and number of simulations are properly calibrated.
This document discusses prior selection for mixture estimation. It begins by introducing mixture models and their common parameterization. It then discusses several types of weakly informative priors that can be used for mixture models, including empirical Bayes priors, hierarchical priors, and reparameterizations. It notes challenges with using improper priors for mixture models. The document also discusses saturated priors when the number of components is not known beforehand. It covers Jeffreys priors for mixtures and issues around propriety. It proposes some reparameterizations of mixtures, like using moments or a spherical reparameterization, that allow proper Jeffreys-like priors to be defined.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
The document provides an overview of normed and Banach spaces. It begins by defining normed vector spaces and Banach spaces, noting that Hilbert spaces are Banach spaces with additional properties. It then discusses several key theorems regarding Banach spaces, including the Banach-Steinhaus theorem, open mapping theorem, and Hahn-Banach theorem. The document concludes by noting that Banach spaces play a central role in functional analysis and topology.
This article continues the study of concrete algebra-like structures in our polyadic approach, where the arities of all operations are initially taken as arbitrary, but the relations between them, the arity shapes, are to be found from some natural conditions ("arity freedom principle"). In this way, generalized associative algebras, coassociative coalgebras, bialgebras and Hopf algebras are defined and investigated. They have many unusual features in comparison with the binary case. For instance, both the algebra and its underlying field can be zeroless and nonunital, the existence of the unit and counit is not obligatory, and the dimension of the algebra is not arbitrary, but "quantized". The polyadic convolution product and bialgebra can be defined, and when the algebra and coalgebra have unequal arities, the polyadic version of the antipode, the querantipode, has different properties. As a possible application to quantum group theory, we introduce the polyadic version of braidings, almost co-commutativity, quasitriangularity and the equations for the R-matrix (which can be treated as a polyadic analog of the Yang-Baxter equation). Finally, we propose another concept of deformation which is governed not by the twist map, but by the medial map, where only the latter is unique in the polyadic case. We present the corresponding braidings, almost co-mediality and M-matrix, for which the compatibility equations are found.
Some fundamental theorems in Banach spaces and Hilbert spacesSanjay Sharma
This document provides an overview of functional analysis and some fundamental theorems in Banach and Hilbert spaces. It discusses how functional analysis studies topological-algebraic structures and their applications in mathematics and sciences. It also summarizes key definitions like normed linear spaces and Hilbert spaces. Some fundamental theorems covered include the Hahn-Banach theorem, open mapping theorem, closed graph theorem, Banach-Steinhaus theorem, and Riesz representation theorem.
better together? statistical learning in models made of modulesChristian Robert
The document discusses statistical models composed of modular components called modules. Each module may be developed independently and represent different data modalities or domains of knowledge. Joint Bayesian updating treats all modules simultaneously but misspecification of one module can impact the others. Alternative approaches are proposed to allow uncertainty propagation between modules while preventing feedback that could lead to misspecification. Candidate distributions for the modules are discussed, along with strategies for choosing among them based on predictive performance.
1. The document proposes a method for making approximate Bayesian computation (ABC) inferences accurate by modeling the distribution of summary statistics calculated from simulated and observed data.
2. It involves constructing an auxiliary probability space (ρ-space) based on these summary values, and performing classification on ρ-space to determine whether simulated and observed data are from the same population.
3. Indirect inference is then used to link ρ-space back to the original parameter space, allowing the ABC approximation to match the true posterior distribution if the ABC tolerances and number of simulations are properly calibrated.
comments on exponential ergodicity of the bouncy particle samplerChristian Robert
The document summarizes recent work on establishing theoretical convergence rates for the bouncy particle sampler (BPS), a non-reversible Markov chain Monte Carlo algorithm. The main results show that under certain conditions on the target distribution, including having exponentially decaying tails, the BPS exhibits exponential ergodicity. A central limit theorem is also established. The analysis considers different cases for thin-tailed, thick-tailed, and transformed target distributions.
Fixed point theorem in fuzzy metric space with e.a propertyAlexander Decker
This document presents a theorem proving the existence of a common fixed point for four self-mappings (A, B, S, T) on a fuzzy metric space under certain conditions. Specifically:
1) The mappings satisfy containment and weakly compatible conditions, as well as property (E.A).
2) There exists a contractive inequality relating the mappings.
3) The range of one mapping (T) is a closed subspace.
Under these assumptions, the theorem proves the mappings have a unique common fixed point. The proof constructs sequences to show the mappings share a single fixed point. References at the end provide background on fuzzy metric spaces and related fixed point results.
This document discusses Bayesian model comparison in cosmology using population Monte Carlo methods. It provides background on key questions in cosmology that can be addressed using cosmic microwave background data from experiments like WMAP and Planck. Population Monte Carlo and adaptive importance sampling methods are introduced to help approximate Bayesian evidence for different cosmological models given the immense computational challenges of working with this cosmological data.
Multiple estimators for Monte Carlo approximationsChristian Robert
This document discusses multiple estimators that can be used to approximate integrals using Monte Carlo simulations. It begins by introducing concepts like multiple importance sampling, Rao-Blackwellisation, and delayed acceptance that allow combining multiple estimators to improve accuracy. It then discusses approaches like mixtures as proposals, global adaptation, and nonparametric maximum likelihood estimation (NPMLE) that frame Monte Carlo estimation as a statistical estimation problem. The document notes various advantages of the statistical formulation, like the ability to directly estimate simulation error from the Fisher information. Overall, the document presents an overview of different techniques for combining Monte Carlo simulations to obtain more accurate integral approximations.
This document discusses using the Wasserstein distance for inference in generative models. It begins with an overview of approximate Bayesian computation (ABC) and how distances between samples are used. It then introduces the Wasserstein distance as an alternative distance that can have lower variance than the Euclidean distance. Computational aspects and asymptotics of using the Wasserstein distance are discussed. The document also covers how transport distances can handle time series data.
Bayesian hybrid variable selection under generalized linear modelsCaleb (Shiqiang) Jin
This document presents a method for Bayesian variable selection under generalized linear models. It begins by introducing the model setting and Bayesian model selection framework. It then discusses three algorithms for model search: deterministic search, stochastic search, and a hybrid search method. The key contribution is a method to simultaneously evaluate the marginal likelihoods of all neighbor models, without parallel computing. This is achieved by decomposing the coefficient vectors and estimating additional coefficients conditioned on the current model's coefficients. Newton-Raphson iterations are used to solve the system of equations and obtain the maximum a posteriori estimates for all neighbor models simultaneously in a single computation. This allows for a fast, inexpensive search of the model space.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The document discusses various methods for approximating marginal likelihoods and comparing hypotheses using Bayes factors. It outlines the historical development and connections between different approximation techniques.
This document discusses various importance sampling methods for approximating Bayes factors, which are used for Bayesian model selection. It compares regular importance sampling, bridge sampling, harmonic means, mixtures to bridge sampling, and Chib's solution. An example application to probit modeling of diabetes in Pima Indian women is presented to illustrate regular importance sampling. Markov chain Monte Carlo methods like the Metropolis-Hastings algorithm and Gibbs sampling can be used to sample from the probit models.
This document presents an asymptotic expansion of the posterior density in high dimensional generalized linear models. The main results are:
1) The authors prove a third order correct asymptotic expansion of the posterior density for generalized linear models with canonical link functions when the number of regressors grows with sample size.
2) This asymptotic expansion is then used to derive moment matching priors in the generalized linear model setting.
3) The expansion assumes the number of regressors grows such that p6+εn /n → 0 as n → ∞ for some small ε > 0, which is stronger than prior work requiring only p4n log(pn)/n → 0.
This document summarizes results on analyzing stochastic gradient descent (SGD) algorithms for minimizing convex functions. It shows that a continuous-time version of SGD (SGD-c) can strongly approximate the discrete-time version (SGD-d) under certain conditions. It also establishes that SGD achieves the minimax optimal convergence rate of O(t^-1/2) for α=1/2 by using an "averaging from the past" procedure, closing the gap between previous lower and upper bound results.
This document discusses several perspectives and solutions to Bayesian hypothesis testing. It outlines issues with Bayesian testing such as the dependence on prior distributions and difficulties interpreting Bayesian measures like posterior probabilities and Bayes factors. It discusses how Bayesian testing compares models rather than identifying a single true model. Several solutions to challenges are discussed, like using Bayes factors which eliminate the dependence on prior model probabilities but introduce other issues. The document also discusses testing under specific models like comparing a point null hypothesis to alternatives. Overall it presents both Bayesian and frequentist views on hypothesis testing and some of the open controversies in the field.
ABC convergence under well- and mis-specified modelsChristian Robert
1. Approximate Bayesian computation (ABC) is a simulation-based method for performing Bayesian inference when the likelihood function is intractable or unavailable. ABC works by simulating data from the model, accepting simulations where the simulated and observed data are close according to some distance measure.
2. Advances in ABC include modifying the proposal distribution to increase efficiency, viewing it as a conditional density estimation problem to allow for larger tolerances, and including a tolerance parameter in the inferential framework.
3. Recent studies have analyzed the asymptotic properties of ABC, showing the posterior distributions and means can be consistent under certain conditions on the summary statistics and tolerance decreasing rates.
A common fixed point theorem for six mappings in g banach space with weak-com...Alexander Decker
The document presents a theorem proving the existence of a common fixed point for six mappings (P, Q, A, B, S, T) in a G-Banach space under certain conditions. Some key points:
- Defines concepts of G-Banach space, which generalizes the ordinary Banach space.
- States a theorem that proves four mappings have a unique common fixed point in a G-Banach space if they satisfy certain contraction conditions.
- The main result extends this to prove that six mappings (P, Q, A, B, S, T) have a unique common fixed point in a G-Banach space if they satisfy new generalized contraction conditions and are weakly compatible.
The document summarizes research on threshold network models, which generate scale-free networks without growth by assigning intrinsic weights to nodes based on a given distribution and connecting nodes based on whether their total weight exceeds a threshold. The model has been extended to spatial networks by incorporating distance between nodes and to include homophily. Analytical results show the degree distribution and other properties depend on the weight distribution and thresholding function used. Several open problems are also discussed.
The document describes Approximate Bayesian Computation (ABC), a technique for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to a distance measure and tolerance level. ABC provides an approximation to the posterior distribution that improves as the tolerance level decreases and more informative summary statistics are used. The document discusses the ABC algorithm, properties of the exact ABC posterior distribution, and challenges in selecting appropriate summary statistics.
This document discusses approximate inference techniques for probabilistic models. It begins with an introduction to variational inference and how it can be used to approximate intractable distributions. It then discusses applying variational inference to mixture of Gaussian models and exponential family distributions. Finally, it briefly introduces expectation propagation as another approximate inference method before concluding with a summary.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
This document discusses priors for mixture models. It introduces weakly informative priors like symmetric empirical Bayes priors and dependent priors. Improper independent priors are problematic for mixtures. Reparameterization techniques are discussed to define proper Jeffreys priors, including expressing components as local perturbations, using moments, and spherical reparameterization. Specific examples for Gaussian and Poisson mixtures show valid reparameterizations that lead to proper posteriors.
This document discusses challenges and actions needed for the European tourism sector regarding disability policy through 2010. It reviews policy developments from 1990-2007 related to accessible tourism, including initiatives by the UN, EU, and organizations like ENAT. The key points are: (1) Accessible tourism is important for Europe's global competitiveness and sustainability as demographic trends increase the proportion of older and disabled travelers. (2) While progress has been made in policies, more work is needed to implement accessible tourism and coordinate efforts across Europe. (3) Standards and sharing of good practices can help tourism providers meet growing demand from rights of disabled persons to equal participation in society.
Consumer Sentinel Network - Federal Trade Commission 2013- Mark - Fullbright
The document is a 102-page report from the Federal Trade Commission (FTC) summarizing data from the Consumer Sentinel Network (CSN) for calendar year 2013. Some key details:
- The CSN is a secure online database containing over 9 million consumer complaints available to law enforcement. It receives data from FTC and numerous state/federal agencies and organizations.
- In 2013, the CSN received over 2 million complaints - 55% related to fraud, 14% to identity theft, and 31% other. The top complaint categories were identity theft, debt collection, and banks/lenders.
- For fraud complaints, consumers reported paying over $1.6 billion. Identity theft most
comments on exponential ergodicity of the bouncy particle samplerChristian Robert
The document summarizes recent work on establishing theoretical convergence rates for the bouncy particle sampler (BPS), a non-reversible Markov chain Monte Carlo algorithm. The main results show that under certain conditions on the target distribution, including having exponentially decaying tails, the BPS exhibits exponential ergodicity. A central limit theorem is also established. The analysis considers different cases for thin-tailed, thick-tailed, and transformed target distributions.
Fixed point theorem in fuzzy metric space with e.a propertyAlexander Decker
This document presents a theorem proving the existence of a common fixed point for four self-mappings (A, B, S, T) on a fuzzy metric space under certain conditions. Specifically:
1) The mappings satisfy containment and weakly compatible conditions, as well as property (E.A).
2) There exists a contractive inequality relating the mappings.
3) The range of one mapping (T) is a closed subspace.
Under these assumptions, the theorem proves the mappings have a unique common fixed point. The proof constructs sequences to show the mappings share a single fixed point. References at the end provide background on fuzzy metric spaces and related fixed point results.
This document discusses Bayesian model comparison in cosmology using population Monte Carlo methods. It provides background on key questions in cosmology that can be addressed using cosmic microwave background data from experiments like WMAP and Planck. Population Monte Carlo and adaptive importance sampling methods are introduced to help approximate Bayesian evidence for different cosmological models given the immense computational challenges of working with this cosmological data.
Multiple estimators for Monte Carlo approximationsChristian Robert
This document discusses multiple estimators that can be used to approximate integrals using Monte Carlo simulations. It begins by introducing concepts like multiple importance sampling, Rao-Blackwellisation, and delayed acceptance that allow combining multiple estimators to improve accuracy. It then discusses approaches like mixtures as proposals, global adaptation, and nonparametric maximum likelihood estimation (NPMLE) that frame Monte Carlo estimation as a statistical estimation problem. The document notes various advantages of the statistical formulation, like the ability to directly estimate simulation error from the Fisher information. Overall, the document presents an overview of different techniques for combining Monte Carlo simulations to obtain more accurate integral approximations.
This document discusses using the Wasserstein distance for inference in generative models. It begins with an overview of approximate Bayesian computation (ABC) and how distances between samples are used. It then introduces the Wasserstein distance as an alternative distance that can have lower variance than the Euclidean distance. Computational aspects and asymptotics of using the Wasserstein distance are discussed. The document also covers how transport distances can handle time series data.
Bayesian hybrid variable selection under generalized linear modelsCaleb (Shiqiang) Jin
This document presents a method for Bayesian variable selection under generalized linear models. It begins by introducing the model setting and Bayesian model selection framework. It then discusses three algorithms for model search: deterministic search, stochastic search, and a hybrid search method. The key contribution is a method to simultaneously evaluate the marginal likelihoods of all neighbor models, without parallel computing. This is achieved by decomposing the coefficient vectors and estimating additional coefficients conditioned on the current model's coefficients. Newton-Raphson iterations are used to solve the system of equations and obtain the maximum a posteriori estimates for all neighbor models simultaneously in a single computation. This allows for a fast, inexpensive search of the model space.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The document discusses various methods for approximating marginal likelihoods and comparing hypotheses using Bayes factors. It outlines the historical development and connections between different approximation techniques.
This document discusses various importance sampling methods for approximating Bayes factors, which are used for Bayesian model selection. It compares regular importance sampling, bridge sampling, harmonic means, mixtures to bridge sampling, and Chib's solution. An example application to probit modeling of diabetes in Pima Indian women is presented to illustrate regular importance sampling. Markov chain Monte Carlo methods like the Metropolis-Hastings algorithm and Gibbs sampling can be used to sample from the probit models.
This document presents an asymptotic expansion of the posterior density in high dimensional generalized linear models. The main results are:
1) The authors prove a third order correct asymptotic expansion of the posterior density for generalized linear models with canonical link functions when the number of regressors grows with sample size.
2) This asymptotic expansion is then used to derive moment matching priors in the generalized linear model setting.
3) The expansion assumes the number of regressors grows such that p6+εn /n → 0 as n → ∞ for some small ε > 0, which is stronger than prior work requiring only p4n log(pn)/n → 0.
This document summarizes results on analyzing stochastic gradient descent (SGD) algorithms for minimizing convex functions. It shows that a continuous-time version of SGD (SGD-c) can strongly approximate the discrete-time version (SGD-d) under certain conditions. It also establishes that SGD achieves the minimax optimal convergence rate of O(t^-1/2) for α=1/2 by using an "averaging from the past" procedure, closing the gap between previous lower and upper bound results.
This document discusses several perspectives and solutions to Bayesian hypothesis testing. It outlines issues with Bayesian testing such as the dependence on prior distributions and difficulties interpreting Bayesian measures like posterior probabilities and Bayes factors. It discusses how Bayesian testing compares models rather than identifying a single true model. Several solutions to challenges are discussed, like using Bayes factors which eliminate the dependence on prior model probabilities but introduce other issues. The document also discusses testing under specific models like comparing a point null hypothesis to alternatives. Overall it presents both Bayesian and frequentist views on hypothesis testing and some of the open controversies in the field.
ABC convergence under well- and mis-specified modelsChristian Robert
1. Approximate Bayesian computation (ABC) is a simulation-based method for performing Bayesian inference when the likelihood function is intractable or unavailable. ABC works by simulating data from the model, accepting simulations where the simulated and observed data are close according to some distance measure.
2. Advances in ABC include modifying the proposal distribution to increase efficiency, viewing it as a conditional density estimation problem to allow for larger tolerances, and including a tolerance parameter in the inferential framework.
3. Recent studies have analyzed the asymptotic properties of ABC, showing the posterior distributions and means can be consistent under certain conditions on the summary statistics and tolerance decreasing rates.
A common fixed point theorem for six mappings in g banach space with weak-com...Alexander Decker
The document presents a theorem proving the existence of a common fixed point for six mappings (P, Q, A, B, S, T) in a G-Banach space under certain conditions. Some key points:
- Defines concepts of G-Banach space, which generalizes the ordinary Banach space.
- States a theorem that proves four mappings have a unique common fixed point in a G-Banach space if they satisfy certain contraction conditions.
- The main result extends this to prove that six mappings (P, Q, A, B, S, T) have a unique common fixed point in a G-Banach space if they satisfy new generalized contraction conditions and are weakly compatible.
The document summarizes research on threshold network models, which generate scale-free networks without growth by assigning intrinsic weights to nodes based on a given distribution and connecting nodes based on whether their total weight exceeds a threshold. The model has been extended to spatial networks by incorporating distance between nodes and to include homophily. Analytical results show the degree distribution and other properties depend on the weight distribution and thresholding function used. Several open problems are also discussed.
The document describes Approximate Bayesian Computation (ABC), a technique for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to a distance measure and tolerance level. ABC provides an approximation to the posterior distribution that improves as the tolerance level decreases and more informative summary statistics are used. The document discusses the ABC algorithm, properties of the exact ABC posterior distribution, and challenges in selecting appropriate summary statistics.
This document discusses approximate inference techniques for probabilistic models. It begins with an introduction to variational inference and how it can be used to approximate intractable distributions. It then discusses applying variational inference to mixture of Gaussian models and exponential family distributions. Finally, it briefly introduces expectation propagation as another approximate inference method before concluding with a summary.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
This document discusses priors for mixture models. It introduces weakly informative priors like symmetric empirical Bayes priors and dependent priors. Improper independent priors are problematic for mixtures. Reparameterization techniques are discussed to define proper Jeffreys priors, including expressing components as local perturbations, using moments, and spherical reparameterization. Specific examples for Gaussian and Poisson mixtures show valid reparameterizations that lead to proper posteriors.
This document discusses challenges and actions needed for the European tourism sector regarding disability policy through 2010. It reviews policy developments from 1990-2007 related to accessible tourism, including initiatives by the UN, EU, and organizations like ENAT. The key points are: (1) Accessible tourism is important for Europe's global competitiveness and sustainability as demographic trends increase the proportion of older and disabled travelers. (2) While progress has been made in policies, more work is needed to implement accessible tourism and coordinate efforts across Europe. (3) Standards and sharing of good practices can help tourism providers meet growing demand from rights of disabled persons to equal participation in society.
Consumer Sentinel Network - Federal Trade Commission 2013- Mark - Fullbright
The document is a 102-page report from the Federal Trade Commission (FTC) summarizing data from the Consumer Sentinel Network (CSN) for calendar year 2013. Some key details:
- The CSN is a secure online database containing over 9 million consumer complaints available to law enforcement. It receives data from FTC and numerous state/federal agencies and organizations.
- In 2013, the CSN received over 2 million complaints - 55% related to fraud, 14% to identity theft, and 31% other. The top complaint categories were identity theft, debt collection, and banks/lenders.
- For fraud complaints, consumers reported paying over $1.6 billion. Identity theft most
This document discusses listening and active listening. It describes the listening process as involving receiving information, understanding it, remembering it, evaluating it, and responding. It also discusses barriers to listening like distractions and biases. It outlines different listening styles such as empathic vs objective and surface vs depth. It defines active listening as a style that helps check understanding, acknowledges feelings, and encourages further speaking. Key active listening techniques include paraphrasing, expressing understanding, and asking questions.
NYA Spirituality and Spiritual Development in Youth WorkDiocese of Exeter
This document discusses spirituality and spiritual development in youth work. It provides context on:
1) The historical roots of youth work being intertwined with faith and values often presented in spiritual/religious frameworks.
2) Definitions of spirituality focusing on meaning, values, transcendence rather than organized religion. Spiritual development involves exploring identity, beliefs and place in the world.
3) The role of spirituality and spiritual development in youth work being to provide opportunities for young people to explore their spirituality, to partner with faith communities, and to foster spiritual development in secular settings through social justice work.
4) Key issues that need further discussion including how to provide spiritual exploration opportunities for
This document provides motivation for using a circular kernel density estimator for nonparametric density estimation of circular data. It describes how a simple approximation theory from linear kernel estimation can be adapted to the circular case by replacing the kernel with a sequence of periodic densities on [-π,π] that converge to a degenerate distribution at θ=0. It shows that the wrapped Cauchy density satisfies the conditions to serve as such a kernel, resulting in the circular kernel density estimator proposed in equation 1.12. This estimator is shown to converge uniformly to the true density f(θ) as the sample size increases, providing theoretical justification for its use in smooth nonparametric density estimation for circular variables.
This document discusses various methods for estimating normalizing constants that arise when evaluating integrals numerically. It begins by noting there are many computational methods for approximating normalizing constants across different communities. It then lists the topics that will be covered in the upcoming workshop, including discussions on estimating constants using Monte Carlo methods and Bayesian versus frequentist approaches. The document provides examples of estimating normalizing constants using Monte Carlo integration, reverse logistic regression, and Xiao-Li Meng's maximum likelihood estimation approach. It concludes by discussing some of the challenges in bringing a statistical framework to constant estimation problems.
Causal set theory is an approach to quantum gravity that represents spacetime as a locally finite partially ordered set of points with causal relations. It is a minimalist approach that does not assume an underlying spacetime continuum. There are two main methods to reconstruct a manifold from a causal set: 1) extracting manifold properties like dimension from causal sets that can be embedded in a manifold, and 2) sprinkling points randomly into an existing manifold to produce an embedded causal set. To study dynamics, an action must be defined on causal sets that reproduces the Einstein-Hilbert action in the continuum limit. Several proposals have been made to define nonlocal operators on causal sets that approach the d'Alembertian operator in the limit. Overall causal set
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide semi-automated ABC
The document discusses smoothing parameter selection for density estimation from length-biased data using asymmetric kernels. It reviews recent work on applying Bayesian criteria for this purpose. The key points are: 1) Asymmetric kernels are better for density estimation of non-negative data as they avoid positive mass in negative regions. 2) Length-biased data arises in situations where observations are weighted by their values. 3) Estimating the underlying density from length-biased data requires adjusting for the bias. 4) Bayesian methods provide an approach for selecting the smoothing parameter for asymmetric kernel density estimators applied to length-biased data.
The document summarizes 18 important mathematical problems for the next century as identified by Steve Smale. Some of the key problems discussed include:
1) The Riemann Hypothesis concerning the distribution of primes.
2) The Poincaré Conjecture regarding classifying 3-dimensional spaces.
3) The famous P vs. NP problem about the difference between solving and verifying solutions to problems.
A Review Article on Fixed Point Theory and Its Applicationijtsrd
The theory of fixed point is one of the most important and powerful tools of the modern mathematics not only it is used on a daily bases in pure and applied mathematics but it is also solving a bridge between analysis and topology and provide a very fruitful are of interaction between the two. The theory of fixed points belongs to topology, a part of mathematics created at the end of the nineteenth century. The famous French mathematician H. Poincare 1854 1912 was the founder of the fixed point approach. He had deep insight into its future importance for problems of mathematical analysis and celestial mechanics and took an active part in its development. Dr. Brajraj Singh Chauhan "A Review Article on Fixed Point Theory & Its Application" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26431.pdfPaper URL: https://www.ijtsrd.com/mathemetics/applied-mathematics/26431/a-review-article-on-fixed-point-theory-and-its-application/dr-brajraj-singh-chauhan
The document discusses limitations of classical significance testing and advantages of Bayesian statistics for information retrieval (IR) evaluation. It proposes that the IR community should adopt the Bayesian approach to directly discuss the probability that a hypothesis is true given observed data. Bayesian methods allow estimating this probability for any hypothesis, using tools like Markov chain Monte Carlo sampling and Hamiltonian Monte Carlo. The document recommends always reporting effect sizes alongside probabilities to provide full understanding of results.
A Family Of Extragradient Methods For Solving Equilibrium ProblemsYasmine Anino
The document discusses using variational inequalities and bilevel programming models to analyze the optimal pollution emission price problem. Specifically, it presents a continuous-time central planning model where the government chooses the optimal price of pollution emissions considering how manufacturers in a supply chain will respond to the price. The lower-level problem involves the manufacturers determining their optimal production levels given the emission price, while the upper-level problem involves the government selecting the price to maximize social welfare. Existence of solutions is analyzed using variational inequality theory.
Monte Carlo methods can be used to estimate sums and integrals by approximating them as expectations under a probability distribution. Samples are drawn from the distribution and the average of the function evaluated at each sample is calculated. This provides an unbiased estimate with variance that decreases as more samples are taken. Importance sampling improves upon this by drawing samples from a different distribution that puts more weight on important areas, which can reduce variance. Markov chain Monte Carlo methods like Gibbs sampling are used to draw samples from distributions that cannot be directly sampled, like those represented by undirected graphs, by iteratively updating variables conditioned on others.
Quaternions, Alexander Armstrong, Harold Baker, Owen WilliamsHarold Baker
Quaternions are a mathematical structure used to represent rotations and orientations in 3D space. The document discusses the history, theory, and applications of quaternions. It was invented in 1843 by Sir William Rowan Hamilton and has found modern applications in computer graphics, where it is used for 3D animation and rotations due to advantages over other representations like Euler angles. The theory section covers properties like multiplication and identities. Applications discussed include physics, group theory, and using quaternions in linear interpolation algorithms for smooth 3D animation.
This document summarizes Yogendra Chaubey's upcoming talk on nonparametric density estimation for size biased data. It will highlight recent developments in this area, with an emphasis on density estimation when the data is subject to constraints that traditional estimators may not satisfy. It describes how Hille's approximation lemma can be used to propose alternative smooth density estimators. It will also present the results of a simulation study comparing various nonparametric density estimators and their asymptotic properties.
This dissertation consists of three chapters that study identification and inference in econometric models.
Chapter 1 considers identification robust inference when the moment variance matrix is singular. It develops a novel asymptotic approach based on higher order expansions of the eigensystem to show that the Generalized Anderson-Rubin statistic possesses a chi-squared limit under additional regularity conditions. When these conditions are violated, the statistic is shown to be Op(n) and exhibit "moment-singularity bias".
Chapter 2 provides a method called "Normalized Principal Components" to minimize many weak instrument bias in linear IV settings. It derives an asymptotically valid ranking of instruments in terms of correlation and selects instruments to minimize MSE approximations.
Chapter
Pattern learning and recognition on statistical manifolds: An information-geo...Frank Nielsen
This document provides an overview of Frank Nielsen's talk on pattern learning and recognition using information geometry and statistical manifolds. The talk focuses on departing from vector space representations and dealing with (dis)similarities that do not have Euclidean or metric properties. This poses new theoretical and computational challenges for pattern recognition. The talk describes using exponential family mixture models defined on dually flat statistical manifolds induced by convex functions. On these manifolds, dual coordinate systems and dual affine geodesics allow for computing-friendly representations of divergences and similarities between probabilistic patterns. The techniques aim to achieve statistical invariance and enable algorithmic approaches to problems like Gaussian mixture modeling, shape retrieval, and diffusion tensor imaging analysis.
This document summarizes a research article that proposes a new methodology for optimal state-space reconstruction from time series data using non-uniform time delays. The methodology aims to minimize redundancy between coordinates by using derivatives on a projected manifold. It is shown to achieve a better reconstruction compared to methods using multiples of the first minimum mutual information delay. The methodology is also more reliable for determining embedding dimension.
Markov Chain Monte Carlo (MCMC) methods use Markov chains to sample from probability distributions for use in Monte Carlo simulations. The Metropolis-Hastings algorithm proposes transitions to new states in the chain and either accepts or rejects those states based on a probability calculation, allowing it to sample from complex, high-dimensional distributions. The Gibbs sampler is a special case of MCMC where each variable is updated conditional on the current values of the other variables, ensuring all proposed moves are accepted. These MCMC methods allow approximating integrals that are difficult to compute directly.
A Survey On The Weierstrass Approximation TheoremMichele Thomas
The document provides a survey of the Weierstrass approximation theorem and related results in approximation theory over the past century. It begins with an introduction to the theorem proved by Weierstrass in 1885, which showed that continuous functions can be uniformly approximated by polynomials on compact intervals. The document then discusses several improvements, generalizations, and ramifications of the theorem developed in subsequent decades, including results on approximating functions by trigonometric polynomials, Bernstein polynomials, and rational functions. It concludes by mentioning several influential theorems in approximation theory from the 20th century, such as Stone's theorem on uniform approximation by collections of functions.
Similar to Talk at 2013 WSC, ISI Conference in Hong Kong, August 26, 2013 (20)
This document discusses differentially private distributed Bayesian linear regression with Markov chain Monte Carlo (MCMC) methods. It proposes adding noise to the summaries (S) and coefficients (z) of local linear regression models on different devices to provide differential privacy. Gibbs sampling is used to simulate the genuine posterior distribution over the linear model parameters (theta, sigma_y, Sigma_x, z1:J, S1:J) in a distributed manner while maintaining privacy. Alternative approaches like exploiting approximate posteriors from all devices or learning iteratively are also mentioned.
This document discusses mixture models and approximations to computing model evidence. It contains:
1) An overview of mixtures of distributions and common priors used for mixtures.
2) Approximations to computing marginal likelihoods or model evidence using Chib's representation and Rao-Blackwellization. Permutations are used to address label switching issues.
3) Methods for more efficient sampling for computing model evidence, including iterative bridge sampling and dual importance sampling with approximations to reduce the number of permutations considered.
Sequential Monte Carlo is also briefly mentioned as an alternative approach.
This document describes the adaptive restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the restore process, which takes regenerations from an underlying diffusion or jump process to construct a reversible Markov chain with a target distribution. The adaptive restore process enriches this by allowing the regeneration distribution to adapt over time. It converges almost surely to the minimal regeneration distribution. Parameters like the initial regeneration distribution and rates are discussed. Examples are provided for the adaptive Brownian restore algorithm and calibrating the parameters.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document discusses Bayesian restricted likelihood methods for situations where the likelihood cannot be fully trusted. It presents several approaches including empirical likelihood, Bayesian empirical likelihood, using insufficient statistics, approximate Bayesian computation (ABC), and MCMC on manifolds. The key ideas are developing Bayesian tools that are robust to model misspecification by questioning the likelihood, prior, and other assumptions.
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
ABC stands for approximate Bayesian computation. It is a method for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces samples from an approximate posterior distribution by simulating parameter and summary statistic values that match the observed summary statistics within a tolerance level. The choice of summary statistics is important but difficult, as there is typically no sufficient statistic. Several strategies have been developed for selecting good summary statistics, including using random forests or the Lasso to evaluate and select from a large set of potential summaries.
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may incur a dimension curse.
The document discusses Approximate Bayesian Computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or unavailable. ABC works by simulating data from the model, accepting simulations that are close to the observed data based on a distance measure and tolerance level. This provides samples from an approximation of the posterior distribution. The document provides examples that motivate ABC and outlines the basic ABC algorithm. It also discusses extensions and improvements to the standard ABC method.
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsChristian Robert
This document discusses Bayesian estimation of conditional moment models. It presents several approaches for completing conditional moment models for Bayesian processing, including using non-parametric parts, empirical likelihood Bayesian tools, or maximum entropy alternatives. It also discusses simplistic ABC alternatives and innovative aspects of introducing tolerance parameters for misspecification and cancelling conditional aspects. Unconditional and conditional model comparison using empirical likelihoods and Bayes factors is proposed.
Poster for Bayesian Statistics in the Big Data Era conferenceChristian Robert
The document proposes a new version of Hamiltonian Monte Carlo (HMC) sampling that is essentially calibration-free. It achieves this by learning the optimal leapfrog scale from the distribution of integration times using the No-U-Turn Sampler algorithm. Compared to the original NUTS algorithm on benchmark models, this new enhanced HMC (eHMC) exhibits significantly improved efficiency with no hand-tuning of parameters required. The document tests eHMC on a Susceptible-Infected-Recovered model of disease transmission.
short course at CIRM, Bayesian Masterclass, October 2018Christian Robert
Markov Chain Monte Carlo (MCMC) methods generate dependent samples from a target distribution using a Markov chain. The Metropolis-Hastings algorithm constructs a Markov chain with a desired stationary distribution by proposing moves to new states and accepting or rejecting them probabilistically. The algorithm is used to approximate integrals that are difficult to compute directly. It has been shown to converge to the target distribution as the number of iterations increases.
This document discusses using the Wasserstein distance for inference in generative models. It begins by introducing ABC methods that use a distance between samples to compare observed and simulated data. It then discusses using the Wasserstein distance as an alternative distance metric that has lower variance than the Euclidean distance. The document covers computational aspects of calculating the Wasserstein distance, asymptotic properties of minimum Wasserstein estimators, and applications to time series data.
Coordinate sampler: A non-reversible Gibbs-like samplerChristian Robert
This document describes a new MCMC method called the Coordinate Sampler. It is a non-reversible Gibbs-like sampler based on a piecewise deterministic Markov process (PDMP). The Coordinate Sampler generalizes the Bouncy Particle Sampler by making the bounce direction partly random and orthogonal to the gradient. It is proven that under certain conditions, the PDMP induced by the Coordinate Sampler has a unique invariant distribution of the target distribution multiplied by a uniform auxiliary variable distribution. The Coordinate Sampler is also shown to exhibit geometric ergodicity, an important convergence property, under additional regularity conditions on the target distribution.
How to Setup Default Value for a Field in Odoo 17Celine George
In Odoo, we can set a default value for a field during the creation of a record for a model. We have many methods in odoo for setting a default value to the field.
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.pptHenry Hollis
The History of NZ 1870-1900.
Making of a Nation.
From the NZ Wars to Liberals,
Richard Seddon, George Grey,
Social Laboratory, New Zealand,
Confiscations, Kotahitanga, Kingitanga, Parliament, Suffrage, Repudiation, Economic Change, Agriculture, Gold Mining, Timber, Flax, Sheep, Dairying,
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
How to Manage Reception Report in Odoo 17Celine George
A business may deal with both sales and purchases occasionally. They buy things from vendors and then sell them to their customers. Such dealings can be confusing at times. Because multiple clients may inquire about the same product at the same time, after purchasing those products, customers must be assigned to them. Odoo has a tool called Reception Report that can be used to complete this assignment. By enabling this, a reception report comes automatically after confirming a receipt, from which we can assign products to orders.
Talk at 2013 WSC, ISI Conference in Hong Kong, August 26, 2013
1. An [under]view of Monte Carlo methods, from
importance sampling to MCMC, to ABC
(& kudos to Bernoulli)
Christian P. Robert
Universit´e Paris-Dauphine, University of Warwick, & CREST, Paris
2013 WSC, Hong Kong
bayesianstatistics@gmail.com
3. Bernoulli as founding father of Monte Carlo methods
The weak law of large numbers (or Bernoulli’s [Golden] theorem)
provides the justification for Monte Carlo approximations:
if x1, . . . , xn are i.i.d. rv’s with density f ,
lim
n→∞
h(x1) + . . . + h(xn)
n
=
X
h(x)f (x) dx
Stigler’s Law of Eponimy: Cardano (1501–1576) first stated the
result
4. Bernoulli as founding father of Monte Carlo methods
...and indeed
h(x1) + . . . + h(xn)
n
converges to
I =
X
h(x)f (x) dx
5. Bernoulli as founding father of Monte Carlo methods
...and indeed
h(x1) + . . . + h(xn)
n
converges to
I =
X
h(x)f (x) dx
...meaning that provided we can simulate xi ∼ f (·) long and fast
“enough”, the empirical mean will be a good “enough”
approximation to I
6. Early implementations of the LLN
While Jakob Bernoulli
himself apparently did not
engage in simulation,
Buffon (1707–1788) resorted
to a (not-yet-Monte-Carlo)
experiment in 1735 to
estimate the value of the
Saint Petersburg game
(even though he did not
perform a similar experiment
for estimating π)
[Stigler, STS, 1991; Stigler, JRSS A, 2010]
7. Early implementations of the LLN
While Jakob Bernoulli
himself apparently did not
engage in simulation,
De Forest (1834–1888)
found the median of a
log-Cauchy distribution,
using normal simulations
approximated to the second
digit (in 1876)
[Stigler, STS, 1991; Stigler, JRSS A, 2010]
8. Early implementations of the LLN
While Jakob Bernoulli
himself apparently did not
engage in simulation,
followed closely by the
ubuquitous Galton using
“normal” dice in 1890, after
developping the Quincunx,
used both for checking the
CLT and simulating from a
posterior distribution as
early as 1877
[Stigler, STS, 1991; Stigler, JRSS A, 2010]
9. Importance Sampling
When focussing on integral approximation, very loose principle in
that proposal distribution with pdf q(·) leads to alternative
representation
I =
X
h(x){f /q}(x) q(x) dx
Principle of importance
Generate an iid sample x1, . . . , xn ∼ q(·) and estimate I by
^IIS
= n−1
n
i=1
h(xi ){f /q}(xi ).
...provided q is positive on the right set
10. Importance Sampling
When focussing on integral approximation, very loose principle in
that proposal distribution with pdf q(·) leads to alternative
representation
I =
X
h(x){f /q}(x) q(x) dx
Principle of importance
Generate an iid sample x1, . . . , xn ∼ q(·) and estimate I by
^IIS
= n−1
n
i=1
h(xi ){f /q}(xi ).
...provided q is positive on the right set
11. Importance Sampling
When focussing on integral approximation, very loose principle in
that proposal distribution with pdf q(·) leads to alternative
representation
I =
X
h(x){f /q}(x) q(x) dx
Principle of importance
Generate an iid sample x1, . . . , xn ∼ q(·) and estimate I by
^IIS
= n−1
n
i=1
h(xi ){f /q}(xi ).
...provided q is positive on the right set
12. things aren’t all rosy...
LLN not sufficient to justify Monte
Carlo methods: if
n−1
n
i=1
h(xi ){f /q}(xi )
has an infinite variance, the estimator
^IIS is useless Importance sampling estimation of
P(2 Z 6) Z is Cauchy and
importance is normal, compared
with exact value, 0.095.
13. The harmonic mean estimator
Bayesian posterior distribution defined as
π(θ|x) = π(θ)L(θ|x)/m(x)
When θi ∼ π(θ|x),
1
T
T
t=1
1
L(θt|x)
is an unbiased estimator of 1/m(x)
[Gelfand & Dey, 1994; Newton & Raftery, 1994]
Highly hazardous material: Most often leads to an infinite
variance!!!
14. The harmonic mean estimator
Bayesian posterior distribution defined as
π(θ|x) = π(θ)L(θ|x)/m(x)
When θi ∼ π(θ|x),
1
T
T
t=1
1
L(θt|x)
is an unbiased estimator of 1/m(x)
[Gelfand & Dey, 1994; Newton & Raftery, 1994]
Highly hazardous material: Most often leads to an infinite
variance!!!
15. “The Worst Monte Carlo Method Ever”
“The good news is that the Law of Large Numbers guarantees that this
estimator is consistent ie, it will very likely be very close to the correct
answer if you use a sufficiently large number of points from the posterior
distribution.
The bad news is that the number of points required for this estimator to
get close to the right answer will often be greater than the number of
atoms in the observable universe. The even worse news is that it’s easy
for people to not realize this, and to na¨ıvely accept estimates that are
nowhere close to the correct value of the marginal likelihood.”
[Radford Neal’s blog, Aug. 23, 2008]
16. Comparison with regular importance sampling
Harmonic mean: Constraint opposed to usual importance sampling
constraints: proposal ϕ(·) must have lighter (rather than fatter)
tails than π(·)L(·) for the approximation
1
1
T
T
t=1
ϕ(θt)
πk(θt)L(θt)
θt ∼ ϕ(·)
to have a finite variance.
E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
17. Comparison with regular importance sampling
Harmonic mean: Constraint opposed to usual importance sampling
constraints: proposal ϕ(·) must have lighter (rather than fatter)
tails than π(·)L(·) for the approximation
1
1
T
T
t=1
ϕ(θt)
πk(θt)L(θt)
θt ∼ ϕ(·)
to have a finite variance.
E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
18. HPD indicator as ϕ
Use the convex hull of MCMC simulations (θt)t=1,...,T
corresponding to the 10% HPD region (easily derived!) and ϕ as
indicator:
ϕ(θ) =
10
T
t∈HPD
Id(θ,θt )
[X & Wraith, 2009]
20. computational jam
In the 1970’s and early 1980’s, theoretical foundations of Bayesian
statistics were sound, but methodology was lagging for lack of
computing tools.
restriction to conjugate priors
limited complexity of models
small sample sizes
The field was desperately in need of a new computing paradigm!
[X & Casella, STS, 2012]
21. MCMC as in Markov Chain Monte Carlo
Notion that i.i.d. simulation is definitely not necessary, all that
matters is the ergodic theorem
Realization that Markov chains could be used in a wide variety of
situations only came to mainstream statisticians with Gelfand and
Smith (1990) despite earlier publications in the statistical literature
like Hastings (1970) and growing awareness in spatial statistics
(Besag, 1986)
Reasons:
lack of computing machinery
lack of background on Markov chains
lack of trust in the practicality of the method
22. pre-Gibbs/pre-Hastings era
Early 1970’s, Hammersley, Clifford, and Besag were working on the
specification of joint distributions from conditional distributions
and on necessary and sufficient conditions for the conditional
distributions to be compatible with a joint distribution.
[Hammersley and Clifford, 1971]
23. pre-Gibbs/pre-Hastings era
Early 1970’s, Hammersley, Clifford, and Besag were working on the
specification of joint distributions from conditional distributions
and on necessary and sufficient conditions for the conditional
distributions to be compatible with a joint distribution.
“What is the most general form of the conditional
probability functions that define a coherent joint
function? And what will the joint look like?”
[Besag, 1972]
24. Hammersley-Clifford[-Besag] theorem
Theorem (Hammersley-Clifford)
Joint distribution of vector associated with a dependence graph
must be represented as product of functions over the cliques of the
graphs, i.e., of functions depending only on the components
indexed by the labels in the clique.
[Cressie, 1993; Lauritzen, 1996]
25. Hammersley-Clifford[-Besag] theorem
Theorem (Hammersley-Clifford)
A probability distribution P with positive and continuous density f
satisfies the pairwise Markov property with respect to an
undirected graph G if and only if it factorizes according to G, i.e.,
(F) ≡ (G)
[Cressie, 1993; Lauritzen, 1996]
26. Hammersley-Clifford[-Besag] theorem
Theorem (Hammersley-Clifford)
Under the positivity condition, the joint distribution g satisfies
g(y1, . . . , yp) ∝
p
j=1
g j
(y j
|y 1 , . . . , y j−1
, y j+1
, . . . , y p
)
g j
(y j
|y 1 , . . . , y j−1
, y j+1
, . . . , y p
)
for every permutation on {1, 2, . . . , p} and every y ∈ Y.
[Cressie, 1993; Lauritzen, 1996]
27. Clicking in
After Peskun (1973), MCMC mostly dormant in mainstream
statistical world for about 10 years, then several papers/books
highlighted its usefulness in specific settings:
Geman and Geman (1984)
Besag (1986)
Strauss (1986)
Ripley (Stochastic Simulation, 1987)
Tanner and Wong (1987)
Younes (1988)
28. [Re-]Enters the Gibbs sampler
Geman and Geman (1984), building on
Metropolis et al. (1953), Hastings (1970), and
Peskun (1973), constructed a Gibbs sampler
for optimisation in a discrete image processing
problem with a Gibbs random field without
completion.
Back to Metropolis et al., 1953: the Gibbs
sampler is already in use therein and ergodicity
is proven on the collection of global maxima
29. [Re-]Enters the Gibbs sampler
Geman and Geman (1984), building on
Metropolis et al. (1953), Hastings (1970), and
Peskun (1973), constructed a Gibbs sampler
for optimisation in a discrete image processing
problem with a Gibbs random field without
completion.
Back to Metropolis et al., 1953: the Gibbs
sampler is already in use therein and ergodicity
is proven on the collection of global maxima
30. Removing the jam
In early 1990s, researchers found that Gibbs and then Metropolis -
Hastings algorithms would crack almost any problem!
Flood of papers followed applying MCMC:
linear mixed models (Gelfand & al., 1990; Zeger & Karim, 1991;
Wang & al., 1993, 1994)
generalized linear mixed models (Albert & Chib, 1993)
mixture models (Tanner & Wong, 1987; Diebolt & X., 1990, 1994;
Escobar & West, 1993)
changepoint analysis (Carlin & al., 1992)
point processes (Grenander & Møller, 1994)
&tc
31. Removing the jam
In early 1990s, researchers found that Gibbs and then Metropolis -
Hastings algorithms would crack almost any problem!
Flood of papers followed applying MCMC:
genomics (Stephens & Smith, 1993; Lawrence & al., 1993;
Churchill, 1995; Geyer & Thompson, 1995; Stephens & Donnelly,
2000)
ecology (George & X, 1992)
variable selection in regression (George & mcCulloch, 1993; Green,
1995; Chen & al., 2000)
spatial statistics (Raftery & Banfield, 1991; Besag & Green, 1993))
longitudinal studies (Lange & al., 1992)
&tc
32. MCMC and beyond
reversible jump MCMC which impacted considerably Bayesian model
choice (Green, 1995)
adaptive MCMC algorithms (Haario & al., 1999; Roberts & Rosenthal,
2009)
exact approximations to targets (Tanner & Wong, 1987; Beaumont,
2003; Andrieu & Roberts, 2009)
comp’al stats catching up with comp’al physics: free energy sampling
(e.g., Wang-Landau), Hamilton Monte Carlo (Girolami & Calderhead,
2011)
sequential Monte Carlo (SMC) for non-sequential problems (Chopin,
2002; Neal, 2001; Del Moral et al 2006)
retrospective sampling
intractability: EP – GIMH – PMCMC – SMC2
– INLA
QMC[MC] (Owen, 2011)
33. Particles
Iterating/sequential importance sampling is about as old as Monte
Carlo methods themselves!
[Hammersley and Morton,1954; Rosenbluth and Rosenbluth, 1955]
Found in the molecular simulation literature of the 50’s with
self-avoiding random walks and signal processing
[Marshall, 1965; Handschin and Mayne, 1969]
Use of the term “particle” dates back to Kitagawa (1996), and Carpenter
et al. (1997) coined the term “particle filter”.
34. Particles
Iterating/sequential importance sampling is about as old as Monte
Carlo methods themselves!
[Hammersley and Morton,1954; Rosenbluth and Rosenbluth, 1955]
Found in the molecular simulation literature of the 50’s with
self-avoiding random walks and signal processing
[Marshall, 1965; Handschin and Mayne, 1969]
Use of the term “particle” dates back to Kitagawa (1996), and Carpenter
et al. (1997) coined the term “particle filter”.
35. pMC & pMCMC
Recycling of past simulations legitimate to build better
importance sampling functions as in population Monte Carlo
[Iba, 2000; Capp´e et al, 2004; Del Moral et al., 2007]
synthesis by Andrieu, Doucet, and Hollenstein (2010) using
particles to build an evolving MCMC kernel ^pθ(y1:T ) in state
space models p(x1:T )p(y1:T |x1:T )
importance sampling on discretely observed diffusions
[Beskos et al., 2006; Fearnhead et al., 2008, 2010]
36. Metropolis-Hastings revisited
Bernoulli, Jakob (1654–1705)
MCMC connected steps
Metropolis-Hastings revisited
Reinterpretation and
Rao-Blackwellisation
Russian roulette
Approximate Bayesian computation
(ABC)
37. Metropolis Hastings algorithm
1. We wish to approximate
I =
h(x)π(x)dx
π(x)dx
= h(x)¯π(x)dx
2. π(x) is known but not π(x)dx.
3. Approximate I with δ = 1
n
n
t=1 h(x(t)) where (x(t)) is a
Markov chain with limiting distribution ¯π.
4. Convergence obtained from Law of Large Numbers or CLT for
Markov chains.
38. Metropolis Hasting Algorithm
Suppose that x(t) is drawn.
1. Simulate yt ∼ q(·|x(t)).
2. Set x(t+1) = yt with probability
α(x(t)
, yt) = min 1,
π(yt)
π(x(t))
q(x(t)|yt)
q(yt|x(t))
Otherwise, set x(t+1) = x(t).
3. α is such that the detailed balance equation is satisfied: ¯π
is the stationary distribution of (x(t)).
The accepted candidates are simulated with the rejection
algorithm.
39. Metropolis Hasting Algorithm
Suppose that x(t) is drawn.
1. Simulate yt ∼ q(·|x(t)).
2. Set x(t+1) = yt with probability
α(x(t)
, yt) = min 1,
π(yt)
π(x(t))
q(x(t)|yt)
q(yt|x(t))
Otherwise, set x(t+1) = x(t).
3. α is such that the detailed balance equation is satisfied: ¯π
is the stationary distribution of (x(t)).
The accepted candidates are simulated with the rejection
algorithm.
40. Metropolis Hasting Algorithm
Suppose that x(t) is drawn.
1. Simulate yt ∼ q(·|x(t)).
2. Set x(t+1) = yt with probability
α(x(t)
, yt) = min 1,
π(yt)
π(x(t))
q(x(t)|yt)
q(yt|x(t))
Otherwise, set x(t+1) = x(t).
3. α is such that the detailed balance equation is satisfied:
π(x)q(y|x)α(x, y) = π(y)q(x|y)α(y, x).
¯π is the stationary distribution of (x(t)).
The accepted candidates are simulated with the rejection
algorithm.
41. Metropolis Hasting Algorithm
Suppose that x(t) is drawn.
1. Simulate yt ∼ q(·|x(t)).
2. Set x(t+1) = yt with probability
α(x(t)
, yt) = min 1,
π(yt)
π(x(t))
q(x(t)|yt)
q(yt|x(t))
Otherwise, set x(t+1) = x(t).
3. α is such that the detailed balance equation is satisfied:
π(x)q(y|x)α(x, y) = π(y)q(x|y)α(y, x).
¯π is the stationary distribution of (x(t)).
The accepted candidates are simulated with the rejection
algorithm.
42. Some properties of the HM algorithm
Alternative representation of the estimator δ is
δ =
1
n
n
t=1
h(x(t)
) =
1
n
Mn
i=1
ni h(zi ) ,
where
zi ’s are the accepted yj ’s,
Mn is the number of accepted yj ’s till time n,
ni is the number of times zi appears in the sequence (x(t))t.
43. The ”accepted candidates”
˜q(·|zi ) =
α(zi , ·) q(·|zi )
p(zi )
q(·|zi )
p(zi )
,
where p(zi ) = α(zi , y) q(y|zi )dy. To simulate from ˜q(·|zi ):
1. Propose a candidate y ∼ q(·|zi )
2. Accept with probability
˜q(y|zi )
q(y|zi )
p(zi )
= α(zi , y)
Otherwise, reject it and starts again.
this is the transition of the HM algorithm.The transition kernel
˜q enjoys ˜π as a stationary distribution:
˜π(x)˜q(y|x) = ˜π(y)˜q(x|y) ,
44. The ”accepted candidates”
˜q(·|zi ) =
α(zi , ·) q(·|zi )
p(zi )
q(·|zi )
p(zi )
,
where p(zi ) = α(zi , y) q(y|zi )dy. To simulate from ˜q(·|zi ):
1. Propose a candidate y ∼ q(·|zi )
2. Accept with probability
˜q(y|zi )
q(y|zi )
p(zi )
= α(zi , y)
Otherwise, reject it and starts again.
this is the transition of the HM algorithm.The transition kernel
˜q enjoys ˜π as a stationary distribution:
˜π(x)˜q(y|x) = ˜π(y)˜q(x|y) ,
45. ”accepted” Markov chain
Lemma (Douc & X., AoS, 2011)
The sequence (zi , ni ) satisfies
1. (zi , ni )i is a Markov chain;
2. zi+1 and ni are independent given zi ;
3. ni is distributed as a geometric random variable with
probability parameter
p(zi ) := α(zi , y) q(y|zi ) dy ; (1)
4. (zi )i is a Markov chain with transition kernel
˜Q(z, dy) = ˜q(y|z)dy and stationary distribution ˜π such that
˜q(·|z) ∝ α(z, ·) q(·|z) and ˜π(·) ∝ π(·)p(·) .
46. ”accepted” Markov chain
Lemma (Douc & X., AoS, 2011)
The sequence (zi , ni ) satisfies
1. (zi , ni )i is a Markov chain;
2. zi+1 and ni are independent given zi ;
3. ni is distributed as a geometric random variable with
probability parameter
p(zi ) := α(zi , y) q(y|zi ) dy ; (1)
4. (zi )i is a Markov chain with transition kernel
˜Q(z, dy) = ˜q(y|z)dy and stationary distribution ˜π such that
˜q(·|z) ∝ α(z, ·) q(·|z) and ˜π(·) ∝ π(·)p(·) .
47. ”accepted” Markov chain
Lemma (Douc & X., AoS, 2011)
The sequence (zi , ni ) satisfies
1. (zi , ni )i is a Markov chain;
2. zi+1 and ni are independent given zi ;
3. ni is distributed as a geometric random variable with
probability parameter
p(zi ) := α(zi , y) q(y|zi ) dy ; (1)
4. (zi )i is a Markov chain with transition kernel
˜Q(z, dy) = ˜q(y|z)dy and stationary distribution ˜π such that
˜q(·|z) ∝ α(z, ·) q(·|z) and ˜π(·) ∝ π(·)p(·) .
48. ”accepted” Markov chain
Lemma (Douc & X., AoS, 2011)
The sequence (zi , ni ) satisfies
1. (zi , ni )i is a Markov chain;
2. zi+1 and ni are independent given zi ;
3. ni is distributed as a geometric random variable with
probability parameter
p(zi ) := α(zi , y) q(y|zi ) dy ; (1)
4. (zi )i is a Markov chain with transition kernel
˜Q(z, dy) = ˜q(y|z)dy and stationary distribution ˜π such that
˜q(·|z) ∝ α(z, ·) q(·|z) and ˜π(·) ∝ π(·)p(·) .
51. Importance sampling perspective
1. A natural idea:
δ∗
Mn
i=1
h(zi )
p(zi )
Mn
i=1
1
p(zi )
=
Mn
i=1
π(zi )
˜π(zi )
h(zi )
Mn
i=1
π(zi )
˜π(zi )
.
2. But p not available in closed form.
52. Importance sampling perspective
1. A natural idea:
δ∗
Mn
i=1
h(zi )
p(zi )
Mn
i=1
1
p(zi )
=
Mn
i=1
π(zi )
˜π(zi )
h(zi )
Mn
i=1
π(zi )
˜π(zi )
.
2. But p not available in closed form.
3. The geometric ni is the replacement, an obvious solution that
is used in the original Metropolis–Hastings estimate since
E[ni ] = 1/p(zi ).
53. The Bernoulli factory
The crude estimate of 1/p(zi ),
ni = 1 +
∞
j=1 j
I {u α(zi , y )} ,
can be improved:
Lemma (Douc & X., AoS, 2011)
If (yj )j is an iid sequence with distribution q(y|zi ), the quantity
^ξi = 1 +
∞
j=1 j
{1 − α(zi , y )}
is an unbiased estimator of 1/p(zi ) which variance, conditional on
zi , is lower than the conditional variance of ni , {1 − p(zi )}/p2(zi ).
54. Rao-Blackwellised, for sure?
^ξi = 1 +
∞
j=1 j
{1 − α(zi , y )}
1. Infinite sum but finite with at least positive probability:
α(x(t)
, yt) = min 1,
π(yt)
π(x(t))
q(x(t)|yt)
q(yt|x(t))
For example: take a symmetric random walk as a proposal.
2. What if we wish to be sure that the sum is finite?
Finite horizon k version:
^ξk
i = 1 +
∞
j=1 1 k∧j
1 − α(zi , yj )
k+1 j
I {u α(zi , y )}
55. Rao-Blackwellised, for sure?
^ξi = 1 +
∞
j=1 j
{1 − α(zi , y )}
1. Infinite sum but finite with at least positive probability:
α(x(t)
, yt) = min 1,
π(yt)
π(x(t))
q(x(t)|yt)
q(yt|x(t))
For example: take a symmetric random walk as a proposal.
2. What if we wish to be sure that the sum is finite?
Finite horizon k version:
^ξk
i = 1 +
∞
j=1 1 k∧j
1 − α(zi , yj )
k+1 j
I {u α(zi , y )}
56. which Bernoulli factory?!
Not the spice warehouse of Leon Bernoulli!
Query:
Given an algorithm delivering iid B(p) rv’s, is it possible to derive
an algorithm delivering iid B(p) rv’s when f is known and p
unknown?
[von Neumann, 1951; Keane & O’Brien, 1994]
existence (e.g., impossible for f (p) = min(2p, 1))
condition: for some n,
min{f (p), 1 − f (p)} min{p, 1 − p}n
implementation (polynomial vs. exponential time)
use of sandwiching polynomials/power series
57. which Bernoulli factory?!
Not the spice warehouse of Leon Bernoulli!
Query:
Given an algorithm delivering iid B(p) rv’s, is it possible to derive
an algorithm delivering iid B(p) rv’s when f is known and p
unknown?
[von Neumann, 1951; Keane & O’Brien, 1994]
existence (e.g., impossible for f (p) = min(2p, 1))
condition: for some n,
min{f (p), 1 − f (p)} min{p, 1 − p}n
implementation (polynomial vs. exponential time)
use of sandwiching polynomials/power series
58. Variance improvement
Theorem (Douc & X., AoS, 2011)
If (yj )j is an iid sequence with distribution q(y|zi ) and (uj )j is an
iid uniform sequence, for any k 0, the quantity
^ξk
i = 1 +
∞
j=1 1 k∧j
1 − α(zi , yj )
k+1 j
I {u α(zi , y )}
is an unbiased estimator of 1/p(zi ) with an almost sure finite
number of terms.
59. Variance improvement
Theorem (Douc & X., AoS, 2011)
If (yj )j is an iid sequence with distribution q(y|zi ) and (uj )j is an
iid uniform sequence, for any k 0, the quantity
^ξk
i = 1 +
∞
j=1 1 k∧j
1 − α(zi , yj )
k+1 j
I {u α(zi , y )}
is an unbiased estimator of 1/p(zi ) with an almost sure finite
number of terms. Moreover, for k 1,
V ^ξk
i zi =
1 − p(zi )
p2(zi )
−
1 − (1 − 2p(zi ) + r(zi ))k
2p(zi ) − r(zi )
2 − p(zi )
p2(zi )
(p(zi ) − r(zi )) ,
where p(zi ) := α(zi , y) q(y|zi ) dy. and r(zi ) := α2(zi , y) q(y|zi ) dy.
60. Variance improvement
Theorem (Douc & X., AoS, 2011)
If (yj )j is an iid sequence with distribution q(y|zi ) and (uj )j is an
iid uniform sequence, for any k 0, the quantity
^ξk
i = 1 +
∞
j=1 1 k∧j
1 − α(zi , yj )
k+1 j
I {u α(zi , y )}
is an unbiased estimator of 1/p(zi ) with an almost sure finite
number of terms. Therefore, we have
V ^ξi zi V ^ξk
i zi V ^ξ0
i zi = V [ni | zi ] .
61. B motivation for Russian roulette
drior π(θ), data density p(y|θ) = f (y; θ)/Z(θ) with
Z(θ) = f (x; θ)dx
intractable (e.g., Ising spin model, MRF, diffusion processes,
networks, &tc)
doubly-intractable posterior follows as
π(θ|y) = p(y|θ) × π(θ) ×
1
Z(y)
=
f (y; θ)
Z(θ)
× π(θ) ×
1
Z(y)
where Z(y) = p(y|θ)π(θ)dθ
both Z(θ) and Z(y) are intractable with massively different
consequences
[thanks to Mark Girolami for his Russian slides!]
62. B motivation for Russian roulette
drior π(θ), data density p(y|θ) = f (y; θ)/Z(θ) with
Z(θ) = f (x; θ)dx
intractable (e.g., Ising spin model, MRF, diffusion processes,
networks, &tc)
doubly-intractable posterior follows as
π(θ|y) = p(y|θ) × π(θ) ×
1
Z(y)
=
f (y; θ)
Z(θ)
× π(θ) ×
1
Z(y)
where Z(y) = p(y|θ)π(θ)dθ
both Z(θ) and Z(y) are intractable with massively different
consequences
[thanks to Mark Girolami for his Russian slides!]
63. B motivation for Russian roulette
If Z(θ) is intractable, Metropolis–Hasting acceptance
probability
α(θ , θ) = min 1,
f (y; θ )π(θ )
f (y; θ)π(θ)
×
q(θ|θ )
q(θ |θ)
×
Z(θ)
Z(θ )
is not available
Use instead biased approximations e.g. pseudo-likelihoods,
plugin ^Z(θ ) estimates without sacrificing exactness of MCMC
64. B motivation for Russian roulette
If Z(θ) is intractable, Metropolis–Hasting acceptance
probability
α(θ , θ) = min 1,
f (y; θ )π(θ )
f (y; θ)π(θ)
×
q(θ|θ )
q(θ |θ)
×
Z(θ)
Z(θ )
is not available
Use instead biased approximations e.g. pseudo-likelihoods,
plugin ^Z(θ ) estimates without sacrificing exactness of MCMC
65. Existing solution
Unbiased plugin estimate
Z(θ)
Z(θ )
≈
f (x; θ)
f (x; θ )
where x ∼
f (x; θ )
Z(θ )
[Møller et al, Bka, 2006; Murray et al 2006]
auxiliary variable method
removes Z(θ) Z(θ ) from the picture
require simulations from the model (e.g., via perfect sampling)
66. Exact approximate methods
Pseudo-Marginal construction that allows for the use of unbiased,
positive estimates of target in acceptance probability
α(θ , θ) = min 1,
^π(θ |y)
^π(θ|y)
×
q(θ|θ )
q(θ |θ)
[Beaumont, 2003; Andrieu and Roberts, 2009; Doucet et al, 2012]
Transition kernel has invariant distribution with exact target
density π(θ|y)
67. Exact approximate methods
Pseudo-Marginal construction that allows for the use of unbiased,
positive estimates of target in acceptance probability
α(θ , θ) = min 1,
^π(θ |y)
^π(θ|y)
×
q(θ|θ )
q(θ |θ)
[Beaumont, 2003; Andrieu and Roberts, 2009; Doucet et al, 2012]
Transition kernel has invariant distribution with exact target
density π(θ|y)
68. Infinite series estimator
For each (θ, y), construct rv’s {V
(j)
θ , j 0} such that
^π(θ, {V
(j)
θ }|y) :=
∞
j=0
V
(j)
θ
is a.s. finite with finite expectation
E ^π(θ, {V
(j)
θ } |y) = π(θ|y)
69. Infinite series estimator
For each (θ, y), construct rv’s {V
(j)
θ , j 0} such that
^π(θ, {V
(j)
θ }|y) :=
∞
j=0
V
(j)
θ
is a.s. finite with finite expectation
E ^π(θ, {V
(j)
θ } |y) = π(θ|y)
Introduce a random stopping time τθ, such that with
ξ := (τθ, {V
(j)
θ , 0 j τθ}) the estimate
^π(θ, ξ|y) :=
τθ
j=0
V
(j)
θ
satisfies
E ^π(θ, ξ|y)|{V
(j)
θ , j 0} = ^π(θ, {V
(j)
θ }|y)
70. Infinite series estimator
For each (θ, y), construct rv’s {V
(j)
θ , j 0} such that
^π(θ, {V
(j)
θ }|y) :=
∞
j=0
V
(j)
θ
is a.s. finite with finite expectation
E ^π(θ, {V
(j)
θ } |y) = π(θ|y)
Warning: unbiased estimate ^π(θ, ξ|y) using series
construction no general guarantee of positivity
71. Russian roulette
Method that requires unbiased truncation of a series
S(θ) =
∞
i=0
φi (θ)
Russian roulette employed extensively in simulation of neutron
scattering and computer graphics
Assign probabilities {qj , j 1} qj ∈ (0, 1] and generate
U(0, 1) i.i.d. r.v’s {Uj , j 1}
Find the first time k 1 such that Uk qk
Russian roulette estimate of S(θ) is
^S(θ) =
k
j=0
φj (θ)
j−1
i=1
qi ,
[Girolami, Lyne, Strathman, Simpson, & Atchad´e, arXiv:1306.4032]
72. Russian roulette
Method that requires unbiased truncation of a series
S(θ) =
∞
i=0
φi (θ)
Russian roulette employed extensively in simulation of neutron
scattering and computer graphics
Assign probabilities {qj , j 1} qj ∈ (0, 1] and generate
U(0, 1) i.i.d. r.v’s {Uj , j 1}
Find the first time k 1 such that Uk qk
Russian roulette estimate of S(θ) is
^S(θ) =
k
j=0
φj (θ)
j−1
i=1
qi ,
[Girolami, Lyne, Strathman, Simpson, & Atchad´e, arXiv:1306.4032]
73. Russian roulette
Method that requires unbiased truncation of a series
S(θ) =
∞
i=0
φi (θ)
Russian roulette employed extensively in simulation of neutron
scattering and computer graphics
Assign probabilities {qj , j 1} qj ∈ (0, 1] and generate
U(0, 1) i.i.d. r.v’s {Uj , j 1}
Find the first time k 1 such that Uk qk
Russian roulette estimate of S(θ) is
^S(θ) =
k
j=0
φj (θ)
j−1
i=1
qi ,
If limn→∞
n
j=1 qj = 0, Russian roulette terminates with
probability one
[Girolami, Lyne, Strathman, Simpson, & Atchad´e, arXiv:1306.4032]
74. Russian roulette
Method that requires unbiased truncation of a series
S(θ) =
∞
i=0
φi (θ)
Russian roulette employed extensively in simulation of neutron
scattering and computer graphics
Assign probabilities {qj , j 1} qj ∈ (0, 1] and generate
U(0, 1) i.i.d. r.v’s {Uj , j 1}
Find the first time k 1 such that Uk qk
Russian roulette estimate of S(θ) is
^S(θ) =
k
j=0
φj (θ)
j−1
i=1
qi ,
E{^S(θ)} = S(θ)
variance finite under certain known conditions
[Girolami, Lyne, Strathman, Simpson, & Atchad´e, arXiv:1306.4032]
75. towards ever more complexity
Bernoulli, Jakob (1654–1705)
MCMC connected steps
Metropolis-Hastings revisited
Approximate Bayesian computation
(ABC)
76. New challenges
Novel statisticial issues that forces a different Bayesian answer:
very large datasets
complex or unknown dependence structures with maybe p n
multiple and involved random effects
missing data structures containing most of the information
sequential structures involving most of the above
77. New paradigm?
“Surprisingly, the confident prediction of the previous
generation that Bayesian methods would ultimately supplant
frequentist methods has given way to a realization that Markov
chain Monte Carlo (MCMC) may be too slow to handle
modern data sets. Size matters because large data sets stress
computer storage and processing power to the breaking point.
The most successful compromises between Bayesian and
frequentist methods now rely on penalization and
optimization.”
[Lange at al., ISR, 2013]
78. New paradigm?
sad reality constraint that
size does matter
focus on much smaller
dimensions and on sparse
summaries
many (fast if non-Bayesian)
ways of producing those
summaries
Bayesian inference can kick
in almost automatically at
this stage
79. Approximate Bayesian computation (ABC)
Case of a well-defined statistical model where the likelihood
function
(θ|y) = f (y1, . . . , yn|θ)
is out of reach!
Empirical approximations to the original
Bayesian inference problem
Degrading the data precision down
to a tolerance ε
Replacing the likelihood with a
non-parametric approximation
Summarising/replacing the data
with insufficient statistics
80. Approximate Bayesian computation (ABC)
Case of a well-defined statistical model where the likelihood
function
(θ|y) = f (y1, . . . , yn|θ)
is out of reach!
Empirical approximations to the original
Bayesian inference problem
Degrading the data precision down
to a tolerance ε
Replacing the likelihood with a
non-parametric approximation
Summarising/replacing the data
with insufficient statistics
81. Approximate Bayesian computation (ABC)
Case of a well-defined statistical model where the likelihood
function
(θ|y) = f (y1, . . . , yn|θ)
is out of reach!
Empirical approximations to the original
Bayesian inference problem
Degrading the data precision down
to a tolerance ε
Replacing the likelihood with a
non-parametric approximation
Summarising/replacing the data
with insufficient statistics
82. Approximate Bayesian computation (ABC)
Case of a well-defined statistical model where the likelihood
function
(θ|y) = f (y1, . . . , yn|θ)
is out of reach!
Empirical approximations to the original
Bayesian inference problem
Degrading the data precision down
to a tolerance ε
Replacing the likelihood with a
non-parametric approximation
Summarising/replacing the data
with insufficient statistics
83. ABC methodology
Bayesian setting: target is π(θ)f (x|θ)
When likelihood f (x|θ) not in closed form, likelihood-free rejection
technique:
Foundation
For an observation y ∼ f (y|θ), under the prior π(θ), if one keeps
jointly simulating
θ ∼ π(θ) , z ∼ f (z|θ ) ,
until the auxiliary variable z is equal to the observed value, z = y,
then the selected
θ ∼ π(θ|y)
[Rubin, 1984; Diggle & Gratton, 1984; Griffith et al., 1997]
84. ABC methodology
Bayesian setting: target is π(θ)f (x|θ)
When likelihood f (x|θ) not in closed form, likelihood-free rejection
technique:
Foundation
For an observation y ∼ f (y|θ), under the prior π(θ), if one keeps
jointly simulating
θ ∼ π(θ) , z ∼ f (z|θ ) ,
until the auxiliary variable z is equal to the observed value, z = y,
then the selected
θ ∼ π(θ|y)
[Rubin, 1984; Diggle & Gratton, 1984; Griffith et al., 1997]
85. ABC methodology
Bayesian setting: target is π(θ)f (x|θ)
When likelihood f (x|θ) not in closed form, likelihood-free rejection
technique:
Foundation
For an observation y ∼ f (y|θ), under the prior π(θ), if one keeps
jointly simulating
θ ∼ π(θ) , z ∼ f (z|θ ) ,
until the auxiliary variable z is equal to the observed value, z = y,
then the selected
θ ∼ π(θ|y)
[Rubin, 1984; Diggle & Gratton, 1984; Griffith et al., 1997]
86. ABC algorithm
In most implementations, degree of approximation:
Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
repeat
generate θ from the prior distribution π(·)
generate z from the likelihood f (·|θ )
until ρ{η(z), η(y)}
set θi = θ
end for
where η(y) defines a (not necessarily sufficient) statistic
87. Comments
role of distance paramount
(because = 0)
scaling of components of η(y) also
capital
matters little if “small enough”
representative of “curse of
dimensionality”
small is beautiful!, i.e. data as a
whole may be weakly informative
for ABC
non-parametric method at core
88. ABC simulation advances
Simulating from the prior is often poor in efficiency
Either modify the proposal distribution on θ to increase the density
of x’s within the vicinity of y...
[Marjoram et al, 2003; Beaumont et al., 2009, Del Moral et al., 2012]
...or by viewing the problem as a conditional density estimation
and by developing techniques to allow for larger
[Beaumont et al., 2002; Blum & Fran¸cois, 2010; Biau et al., 2013]
.....or even by including in the inferential framework [ABCµ]
[Ratmann et al., 2009]
89. ABC simulation advances
Simulating from the prior is often poor in efficiency
Either modify the proposal distribution on θ to increase the density
of x’s within the vicinity of y...
[Marjoram et al, 2003; Beaumont et al., 2009, Del Moral et al., 2012]
...or by viewing the problem as a conditional density estimation
and by developing techniques to allow for larger
[Beaumont et al., 2002; Blum & Fran¸cois, 2010; Biau et al., 2013]
.....or even by including in the inferential framework [ABCµ]
[Ratmann et al., 2009]
90. ABC simulation advances
Simulating from the prior is often poor in efficiency
Either modify the proposal distribution on θ to increase the density
of x’s within the vicinity of y...
[Marjoram et al, 2003; Beaumont et al., 2009, Del Moral et al., 2012]
...or by viewing the problem as a conditional density estimation
and by developing techniques to allow for larger
[Beaumont et al., 2002; Blum & Fran¸cois, 2010; Biau et al., 2013]
.....or even by including in the inferential framework [ABCµ]
[Ratmann et al., 2009]
91. ABC simulation advances
Simulating from the prior is often poor in efficiency
Either modify the proposal distribution on θ to increase the density
of x’s within the vicinity of y...
[Marjoram et al, 2003; Beaumont et al., 2009, Del Moral et al., 2012]
...or by viewing the problem as a conditional density estimation
and by developing techniques to allow for larger
[Beaumont et al., 2002; Blum & Fran¸cois, 2010; Biau et al., 2013]
.....or even by including in the inferential framework [ABCµ]
[Ratmann et al., 2009]
92. ABC as an inference machine
Starting point is summary statistic
η(y), either chosen for computational
realism or imposed by external
constraints
ABC can produce a distribution on the parameter of interest
conditional on this summary statistic η(y)
inference based on ABC may be consistent or not, so it needs
to be validated on its own
the choice of the tolerance level is dictated by both
computational and convergence constraints
93. ABC as an inference machine
Starting point is summary statistic
η(y), either chosen for computational
realism or imposed by external
constraints
ABC can produce a distribution on the parameter of interest
conditional on this summary statistic η(y)
inference based on ABC may be consistent or not, so it needs
to be validated on its own
the choice of the tolerance level is dictated by both
computational and convergence constraints
94. How Bayesian aBc is..?
At best, ABC approximates π(θ|η(y)):
approximation error unknown (w/o massive simulation)
pragmatic or empirical Bayes (there is no other solution!)
many calibration issues (tolerance, distance, statistics)
the NP side should be incorporated into the whole Bayesian
picture
the approximation error should also be part of the Bayesian
inference
95. Noisy ABC
ABC approximation error (under non-zero tolerance ) replaced
with exact simulation from a controlled approximation to the
target, convolution of true posterior with kernel function
π (θ, z|y) =
π(θ)f (z|θ)K (y − z)
π(θ)f (z|θ)K (y − z)dzdθ
,
with K kernel parameterised by bandwidth .
[Wilkinson, 2013]
Theorem
The ABC algorithm based on a randomised observation y = ˜y + ξ,
ξ ∼ K , and an acceptance probability of
K (y − z)/M
gives draws from the posterior distribution π(θ|y).
96. Noisy ABC
ABC approximation error (under non-zero tolerance ) replaced
with exact simulation from a controlled approximation to the
target, convolution of true posterior with kernel function
π (θ, z|y) =
π(θ)f (z|θ)K (y − z)
π(θ)f (z|θ)K (y − z)dzdθ
,
with K kernel parameterised by bandwidth .
[Wilkinson, 2013]
Theorem
The ABC algorithm based on a randomised observation y = ˜y + ξ,
ξ ∼ K , and an acceptance probability of
K (y − z)/M
gives draws from the posterior distribution π(θ|y).
97. Which summary?
Fundamental difficulty of the choice of the summary statistic when
there is no non-trivial sufficient statistics [except when done by the
experimenters in the field]
98. Which summary?
Fundamental difficulty of the choice of the summary statistic when
there is no non-trivial sufficient statistics [except when done by the
experimenters in the field]
Loss of statistical information balanced against gain in data
roughening
Approximation error and information loss remain unknown
Choice of statistics induces choice of distance function
towards standardisation
borrowing tools from data analysis (LDA) machine learning
[Estoup et al., ME, 2012]
99. Which summary?
Fundamental difficulty of the choice of the summary statistic when
there is no non-trivial sufficient statistics [except when done by the
experimenters in the field]
may be imposed for external/practical reasons
may gather several non-B point estimates
we can learn about efficient combination
distance can be provided by estimation techniques
100. Which summary for model choice?
‘This is also why focus on model discrimination typically (...)
proceeds by (...) accepting that the Bayes Factor that one obtains
is only derived from the summary statistics and may in no way
correspond to that of the full model.’
[S. Sisson, Jan. 31, 2011, xianblog]
Depending on the choice of η(·), the Bayes factor based on this
insufficient statistic,
Bη
12(y) =
π1(θ1)f η
1 (η(y)|θ1) dθ1
π2(θ2)f η
2 (η(y)|θ2) dθ2
,
is either consistent or not
[X et al., PNAS, 2012]
101. Which summary for model choice?
Depending on the choice of η(·), the Bayes factor based on this
insufficient statistic,
Bη
12(y) =
π1(θ1)f η
1 (η(y)|θ1) dθ1
π2(θ2)f η
2 (η(y)|θ2) dθ2
,
is either consistent or not
[X et al., PNAS, 2012]
q
q
q
q
q
q
q
q
q
q
q
Gauss Laplace
0.00.10.20.30.40.50.60.7
n=100
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
Gauss Laplace
0.00.20.40.60.81.0
n=100
102. Selecting proper summaries
Consistency only depends on the range of
µi (θ) = Ei [η(y)]
under both models against the asymptotic mean µ0 of η(y)
Theorem
If Pn belongs to one of the two models and if µ0 cannot be
attained by the other one :
0 = min (inf{|µ0 − µi (θi )|; θi ∈ Θi }, i = 1, 2)
< max (inf{|µ0 − µi (θi )|; θi ∈ Θi }, i = 1, 2) ,
then the Bayes factor Bη
12 is consistent
[Marin et al., JRSS B, 2013]
103. Selecting proper summaries
Consistency only depends on the range of
µi (θ) = Ei [η(y)]
under both models against the asymptotic mean µ0 of η(y)
q
M1 M2
0.30.40.50.60.7
q
q
q
q
M1 M2
0.30.40.50.60.7
M1 M2
0.30.40.50.60.7
q
q
q
q
q
q
q
q
M1 M2
0.00.20.40.60.8
q
qq
q
q
q
q
q
q
q
q
qq
q
q
q
q
q
q
M1 M2
0.00.20.40.60.81.0
q
q
q
q
q
q
q
q
M1 M2
0.00.20.40.60.81.0
q
q
q
q
q
q
q
M1 M2
0.00.20.40.60.8
q
q
qq
q
q
q
q
qq
q
q
q
q
q
q
q
M1 M2
0.00.20.40.60.81.0
q
q
qq
q
qq
M1 M2
0.00.20.40.60.81.0
[Marin et al., JRSS B, 2013]