Many mathematical models use a large number of poorly known parameters as inputs. Quantifying the influence of each of these parameters is one of the aims of sensitivity analysis. Global Sensitivity Analysis is an important paradigm for understanding model behavior, characterizing uncertainty, improving model calibration, etc. Input uncertainty is modeled by a probability distribution, and various sensitivity measures have been built within that paradigm. This tutorial focuses on the so-called Sobol' indices, based on functional variance analysis. Estimation procedures will be presented, and the choice of the designs of experiments these procedures are based on will be discussed. As Sobol' indices have no clear interpretation in the presence of statistical dependence between inputs, it also seems promising to measure sensitivity with Shapley effects, based on the notion of Shapley value, a solution concept in cooperative game theory.
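To make the variance-based idea concrete, here is a minimal pick-freeze sketch (not from the tutorial) of a first-order Sobol' index estimator; the function names, the particular estimator variant, and the sample size are illustrative assumptions.

```python
import numpy as np

def first_order_sobol(model, sample_inputs, i, n=10_000, rng=None):
    """Pick-freeze estimate of the first-order Sobol' index S_i = Var(E[Y|X_i]) / Var(Y).

    model         : maps an (n, d) array of inputs to n scalar outputs
    sample_inputs : draws an (n, d) array of independent inputs from their distribution
    i             : index of the input whose influence is measured
    """
    rng = np.random.default_rng() if rng is None else rng
    A = sample_inputs(n, rng)
    B = sample_inputs(n, rng)
    AB = B.copy()
    AB[:, i] = A[:, i]              # keep ("freeze") input i, resample all the others
    yA, yAB = model(A), model(AB)
    numer = np.mean(yA * yAB) - yA.mean() * yAB.mean()
    return numer / yA.var()
```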
The generation of Gaussian random fields over a physical domain is a challenging problem in computational mathematics, especially when the correlation length is short and the field is rough. The traditional approach is to make use of a truncated Karhunen-Loeve (KL) expansion, but the generation of even a single realisation of the field may then be effectively beyond reach (especially for 3-dimensional domains) if the need is to obtain an expected L2 error of say 5%, because of the potentially very slow convergence of the KL expansion. In this talk, based on joint work with Ivan Graham, Frances Kuo, Dirk Nuyens, and Rob Scheichl, a completely different approach is used, in which the field is initially generated at a regular grid on a 2- or 3-dimensional rectangle that contains the physical domain, and then possibly interpolated to obtain the field at other points. In that case there is no need for any truncation. Rather the main problem becomes the factorisation of a large dense matrix. For this we use circulant embedding and FFT ideas. Quasi-Monte Carlo integration is then used to evaluate the expected value of some functional of the finite-element solution of an elliptic PDE with a random field as input.
We will describe and analyze accurate and efficient numerical algorithms to interpolate multivariate functions and approximate their integrals. The algorithms can be applied when we are given function values at an arbitrarily positioned, and usually small, existing sparse set of sample points, and additional samples are impossible or difficult (e.g. expensive) to obtain. The methods are based on local and global tensor-product sparse quasi-interpolation methods that are exact for a class of sparse multivariate orthogonal polynomials.
A fundamental numerical problem in many sciences is to compute integrals. These integrals can often be expressed as expectations and then approximated by sampling methods. Monte Carlo sampling is very competitive in high dimensions, but has a slow rate of convergence. One reason for this slowness is that the MC points form clusters and gaps. Quasi-Monte Carlo methods greatly reduce such clusters and gaps, and under modest smoothness demands on the integrand they can greatly improve accuracy. This can even take place in problems of surprisingly high dimension. This talk will introduce the basics of QMC and randomized QMC. It will include discrepancy and the Koksma-Hlawka inequality, some digital constructions and some randomized QMC methods that allow error estimation and sometimes bring improved accuracy.
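As a small illustration of the contrast between plain MC and randomized QMC described above, here is a sketch (not part of the tutorial) using SciPy's scrambled Sobol' sequences; the test integrand and the number of randomizations are illustrative assumptions.

```python
import numpy as np
from scipy.stats import qmc

d, n = 8, 2 ** 12
# Toy integrand on [0,1]^d whose exact integral is 1.
f = lambda x: np.prod(1.0 + (x - 0.5) / np.arange(1, d + 1), axis=1)

rng = np.random.default_rng(0)
mc_estimate = f(rng.random((n, d))).mean()          # plain Monte Carlo

# Randomized QMC: independently scrambled Sobol' point sets give several
# unbiased estimates, whose spread yields an error estimate.
reps = [f(qmc.Sobol(d, scramble=True, seed=s).random(n)).mean() for s in range(10)]
rqmc_estimate = np.mean(reps)
rqmc_stderr = np.std(reps, ddof=1) / np.sqrt(len(reps))
```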
Sequential quasi-Monte Carlo (SQMC) is a quasi-Monte Carlo (QMC) version of sequential Monte Carlo (or particle filtering), a popular class of Monte Carlo techniques used to carry out inference in state space models. In this talk I will first review the SQMC methodology as well as some theoretical results. Although SQMC converges faster than the usual Monte Carlo error rate, its performance deteriorates quickly as the dimension of the hidden variable increases. However, I will show with an example that SQMC may perform well for some "high" dimensional problems. I will conclude this talk with some open problems and potential applications of SQMC in complicated settings.
We present recent results on the numerical analysis of Quasi-Monte Carlo quadrature methods applied to forward and inverse uncertainty quantification for elliptic and parabolic PDEs. Particular attention will be placed on higher-order QMC, the stable and efficient generation of interlaced polynomial lattice rules, and the numerical analysis of multilevel QMC finite element discretizations with applications to computational uncertainty quantification.
In this talk, we discuss some recent advances in probabilistic schemes for high-dimensional PIDEs. It is known that traditional PDE solvers, e.g., finite element and finite difference methods, do not scale well as the dimension increases. The idea of probabilistic schemes is to link a wide class of nonlinear parabolic PIDEs to stochastic Lévy processes via a nonlinear version of the Feynman-Kac theory. As such, the solution of the PIDE can be represented by a conditional expectation (i.e., a high-dimensional integral) with respect to a stochastic dynamical system driven by Lévy processes. In other words, we can solve the PIDEs by performing high-dimensional numerical integration, and a variety of quadrature methods can be applied, including MC, QMC, sparse grids, etc. The probabilistic schemes have been used in many application problems, e.g., particle transport in plasmas (e.g., Vlasov-Fokker-Planck equations), nonlinear filtering (e.g., Zakai equations), and option pricing.
The standard Galerkin formulation of acoustic wave propagation, governed by the Helmholtz partial differential equation (PDE), is indefinite for large wavenumbers, even though the Helmholtz PDE itself is in general not indefinite. The lack of coercivity (indefiniteness) is one of the major difficulties for approximation and simulation of heterogeneous-media wave propagation models, including applications to Quasi-Monte Carlo (QMC) analysis of stochastic wave propagation. We will present a new class of sign-definite continuous and discrete preconditioned FEM Helmholtz wave propagation models.
Multidimensional integrals may be approximated by weighted averages of integrand values. Quasi-Monte Carlo (QMC) methods are more accurate than simple Monte Carlo methods because they carefully choose where to evaluate the integrand. This tutorial focuses on how quickly QMC methods converge to the correct answer as the number of integrand values increases. The answer may depend on the smoothness of the integrand and the sophistication of the QMC method. QMC error analysis may assume that the integrand belongs to a reproducing kernel Hilbert space or that the integrand is an instance of a stochastic process with known covariance structure. These two approaches have interesting parallels. This tutorial also explores how the computational cost of achieving a good approximation to the integral depends on the dimension of the domain of the integrand. Finally, this tutorial explores methods for determining how many integrand values are needed to satisfy the error tolerance. Relevant software is described.
One of the central tasks in computational mathematics and statistics is to accurately approximate unknown target functions. This is typically done with the help of data — samples of the unknown functions. The emergence of Big Data presents both opportunities and challenges. On one hand, big data introduces more information about the unknowns and, in principle, allows us to create more accurate models. On the other hand, data storage and processing become highly challenging. In this talk, we present a set of sequential algorithms for function approximation in high dimensions with large data sets. The algorithms are of iterative nature and involve only vector operations. They use one data sample at each step and can handle dynamic/stream data. We present both the numerical algorithms, which are easy to implement, as well as rigorous analysis for their theoretical foundation.
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop, Parallel Markov Chain Monte Carlo - Scott Schmidler, Dec 11, 2017
1. Parallel Markov Chain Monte Carlo
Scott C. Schmidler∗
Department of Statistical Science
Duke University
SAMSI Workshop
December 11, 2017
∗ joint work with Doug VanDerwerken
2. Markov chain Monte Carlo integration
A general problem in (esp Bayesian) statistics and statistical
mechanics is calculation of integrals of the form:
h̄_π = E_π[h(X)] = ∫_X h(x) π(dx)
A common, powerful approach is Monte Carlo integration:
h̄ ≈ (1/n) ∑_{i=1}^n h(X_i)   for X_1, X_2, . . . , X_n ∼ π
When sampling from π directly is difficult, one can construct a Markov chain with stationary distribution π (MCMC).
Many ways to do so: Metropolis-Hastings, Gibbs sampling, Langevin & Hamiltonian methods, etc.
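A minimal random-walk Metropolis sketch of this idea (illustrative only; the proposal scale and function names are assumptions, not from the slides):

```python
import numpy as np

def rw_metropolis(log_pi, theta0, n_steps, scale=1.0, rng=None):
    """Random-walk Metropolis chain targeting pi (sketch).

    log_pi : log of the (possibly unnormalized) target density
    scale  : standard deviation of the Gaussian proposal increments
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.atleast_1d(np.asarray(theta0, dtype=float)).copy()
    logp = log_pi(theta)
    draws = np.empty((n_steps, theta.size))
    for t in range(n_steps):
        proposal = theta + scale * rng.standard_normal(theta.size)
        logp_prop = log_pi(proposal)
        if np.log(rng.random()) < logp_prop - logp:   # Metropolis accept/reject
            theta, logp = proposal, logp_prop
        draws[t] = theta
    return draws

# The ergodic average of h over the draws approximates E_pi[h(X)]:
# h_bar = h(draws).mean(axis=0)
```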
3. Problem: MCMC can be slow
When X_0, X_1, X_2, . . . , X_n come from a Markov chain, the ergodic averages
µ̂_h = (1/n) ∑_{i=1}^n h(X_i)
can converge very slowly.
Mixing time:
τ(ε) = sup_{π_0} min{ n : ||π_{n'} − π||_TV < ε for all n' ≥ n },
where
||π_n − π||_TV = sup_{A ⊂ X} |π_n(A) − π(A)|.
In problems with multimodality, high dimensions, or simply strong dependence, mixing times can be very, very long.
4. Rapid and slow mixing
One way to characterize this is rapid mixing.
Let (X^(d), F^(d), λ^(d)) be a sequence of measure spaces, and π^(d) densities w.r.t. λ^(d), for d ∈ N the problem size.
P is rapidly mixing if τ(d) is bounded above by a polynomial in d.
P is torpidly mixing if τ(d) is bounded below by an exponential in d.
Even if the chain is “rapidly” mixing, τ may be impractically large.
5. Computation is changing
At the same time, the computing landscape has shifted
dramatically.
Moore’s law (exponential growth of processor speed) is dead.
Future growth must come through parallelism:
Multi-core platforms
Cluster computing
Massive parallelism (GPUs)
Cloud computing
6. Parallel algorithms
Basic idea: Break a problem into pieces that can be solved
independently - preferably asynchronously - and recombined into a
full solution.
Integration (w.r.t. probability measure π):
∫_Θ h(θ) π(dθ)
One possibility:
partition the space: Θ = ∪_{j=1}^J Θ_j
integrate within each element Θ_j : µ_j = ∫_{Θ_j} h(θ) π(θ) dθ
sum the results: µ = ∑_j µ_j
Easily done for grid-based quadrature, but . . .
For fixed accuracy ε, # evals grows exponentially in dim(Θ).
In contrast, Monte Carlo integration “spends” function evals only in relevant parts of Θ. (Hence preferred for d > 8.)
7. Parallelization
Our goal: Combine the best of both worlds: expend computation
only in regions of significant probability, while enabling parallel
evaluation in distinct regions.
Quandary: MCMC is an inherently serial algorithm, and number of
steps may be exponential in dim(Θ).
8. MCMC is a serial algorithm
MCMC is inherently serial:
Cannot compute X_t without first computing X_1, X_2, . . . , X_{t−1}.
⇒ incompatible with parallelization
What we can do:
Parallelize individual steps (e.g. expensive likelihood calcs)
Propose moves in parallel, or precompute acceptance ratios, for individual steps
Markov chains with natural parallel structure
Parallel tempering
Population MCMC
but . . . such chains have inherent limitations on number of
processors; cannot parallelize component chains
Split ’big data’ and recombine results in ad hoc ways
Particle filtering/SMC
9. MCMC is a serial algorithm
Moreover, these approaches all require processor synchronization.
Achievable only on dedicated clusters, with high-speed
connectivity
Without this, parallelization may slow down compared to a single processor.
Finally, all require the component (or joint) chains to reach equilibrium for valid inference.
⇒ Cannot reduce the number of serial steps required.
e.g. Parallel Tempering:
may speed convergence vs single-temperature, but . . .
increasing # processors > # temps doesn’t help.
When mixing is slow, e.g. in the presence of multimodality, may not help (e.g. Woodard, S., Huber 2009).
These algorithms are fundamentally limited by the mixing time of the joint process.
10. Goal of this work
Goal: A procedure that can be applied to any Markov chain Monte
Carlo algorithm (including above methods) to make it parallel, with
the ability to take advantage of as many processors as available:
Asynchronously parallel.
Ideally, linear speedup in # processors.
Not limited by the mixing time of the component chain(s).
11. Basic idea (not quite what we do)
Given a partition Θ = ∪_{j=1}^J Θ_j.
For each j, run an MCMC chain θ_1^(j), . . . , θ_{n_j}^(j) on the target distribution restricted to Θ_j :
π_j(θ) := π(θ) 1_{Θ_j}(θ) / w_j ,   where w_j = ∫_{Θ_j} π(θ) ν(dθ).
Then for ergodic averages µ̂_{j,n} = n_j^{-1} ∑_{i=1}^{n_j} f(θ_j^(i)) we have
µ̂_{j,n} −→ E_{π_j}(f) = ∫_{Θ_j} f(θ) π_j(θ) ν(dθ)
as n_j → ∞, for each j ∈ {1, . . . , J}.
12. Combining the chains
If we can also construct estimators for the weights:
ŵ_{j,n} → w_j
Then the combined estimator
µ̂_n = ∑_{j=1}^J ŵ_{j,n} µ̂_{j,n} −→ µ = E_π(f)
If the µ̂_{j,n}'s and ŵ_{j,n}'s are unbiased and independent, then µ̂_n is unbiased.
Notice:
Need only the µ̂_{j,n}'s and ŵ_{j,n}'s to converge, not the chains!
Requires only that each chain mix locally
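A minimal sketch of this combination step (illustrative; the argument names and data layout are assumptions): given per-element MCMC output and estimated weights, the pooled estimate is simply the weighted sum of within-element averages.

```python
import numpy as np

def combined_estimate(draws_by_element, weight_estimates, f):
    """Combine per-element MCMC output into one estimate of E_pi[f].

    draws_by_element : list of arrays; entry j holds the draws associated with
                       partition element Theta_j
    weight_estimates : estimated element weights w_hat_j (should sum to ~1)
    f                : vectorized function whose expectation is wanted
    """
    mu_hat = np.array([np.mean(f(theta)) for theta in draws_by_element])
    return float(np.dot(weight_estimates, mu_hat))
```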
13. Estimating the weights
Let g(θ) be the unnormalized target density, i.e. π(θ) = g(θ)/c.
Estimating c_j = ∫_{Θ_j} g(θ) ν(dθ) is equivalent to estimating the normalizing constant of the target density g_j(θ) = g(θ) 1_{Θ_j}(θ).
Many techniques available (but requires care).
Then form
ŵ_{j,n} = ∑_{i=1}^n ĉ_j^(i) / ∑_{i=1}^n ∑_{k=1}^J ĉ_k^(i)
which is consistent (but not unbiased) for w_j.
Other ratio estimators may improve efficiency (Tin 1965), allowing reduction in n.
14. Estimating the weights
Approach 1: Markov chain output
Estimate c_j directly from MCMC trajectories.
HME (Newton & Raftery 1994, Raftery et al. 2007)
Chib’s method (Chib 1995, Chib & Jeliazkov 2001)
Bridge/path sampling (Meng & Wong 1996, Gelman & Meng 1998, Meng & Schilling 2002).
Note: restriction to Θ_j helps avoid problems (e.g. Wolpert & S. 2012).
15. Estimating the weights
Approach 2: Adaptive importance sampling
Construct approximation q_j to π_j from MCMC draws:
t(m_j , S_j) distn for sample mean m_j , covar S_j
Adaptive mixture of t-distributions (Ji & S. 2013, Wang & S. 2013)
Draw θ_t ∼ q_j i.i.d. to get unbiased IS estimate
ĉ_j = T^{-1} ∑_{t=1}^T g(θ_t) 1_{Θ_j}(θ_t) / q_j(θ_t)
Again, q_j need only approximate π locally on Θ_j , so λ*_j = sup π_j / q_j much smaller
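A minimal sketch of this importance-sampling estimate of the ĉ_j's, together with the normalization into weights from the earlier slide (illustrative; the proposal object and function names are assumptions — any fitted local distribution exposing rvs/logpdf, e.g. a multivariate t, would do):

```python
import numpy as np

def c_hat(log_g, in_element, proposal, T=1000, rng=None):
    """One unbiased IS estimate of c_j = integral of g over Theta_j.

    log_g      : log of the unnormalized target density g (vectorized)
    in_element : indicator of Theta_j, returning 0/1 per draw
    proposal   : fitted local proposal q_j with .rvs and .logpdf methods
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = proposal.rvs(size=T, random_state=rng)
    log_ratio = log_g(theta) - proposal.logpdf(theta)
    return float(np.mean(np.exp(log_ratio) * in_element(theta)))

def weight_estimates(c_hats):
    """Normalize element-wise estimates c_hat_j into weights w_hat_j."""
    c = np.asarray(c_hats, dtype=float)
    return c / c.sum()
```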
16. Estimating the weights
Approach 2: Adaptive importance sampling (cont’d)
More generally, may use a sequence of distributions q_{j,t}:
Markov chain θ_t | θ_{t−1} ∼ q_j(θ_t | θ_{t−1})
Adaptive MIS chain (Ji & S. 2013, Wang & S. 2013)
‘Sample’ (‘trajectory’) denotes independent (conditional) draws.
Averaging n independent ĉ_j's decreases variance as n^{-1}.
Pseudo-marginal approach (Andrieu & Roberts 2009) using these techniques is significantly less efficient.
17. Mixture of normals
Consider a simple mixture of two normals:
π(z) = (1/2) N_M(z; −1_M, σ_1^2 I_M) + (1/2) N_M(z; 1_M, σ_2^2 I_M)
Upper bounds on the spectral gap (WSH07a,b) yield:
Thm: RW-MH is torpidly mixing.
Thm: Tempering is torpidly mixing for σ_1 ≠ σ_2.
Lower bounds on hitting times obtained by (SW10) yield:
Thm: Equi-energy sampler torpidly mixing for σ_1 ≠ σ_2.
Thm: Haario adaptive RW kernel torpidly mixing for σ_1 ≠ σ_2.
18. Towards some theory
However, if the partition Θ = ∪_{j=1}^J Θ_j is such that:
the Θ_j's are convex
π_j is log-concave for j = 1, . . . , J,
then
π_j can be sampled in polynomial time (Frieze, Kannan, et al.)
c_j can be estimated in polynomial time (Lovasz, Vempala)
+ some additional technical restrictions gives:
⇒ we can sample π and approximate E_π(h(x)) in polynomial time
. . . assuming we can initialize within the basins of attraction in poly time!
(VanDerwerken & S., 2015)
19. FPRAS for mixture-of-normals
Theorem
Under the above conditions, the PMCMC algorithm returns a sample in time O(poly(d)) from a distribution π̂ for which ||π̂ − π*||_TV ≤ ε, with probability at least 1 − δ.
HPD region of modes sampled in poly-time
Use samples to estimate an HPD hyperellipsoid B_j at each mode, where π is log-concave on B_j.
Apply log-concave integration
A similar result allows construction of a rapidly mixing MIS chain using adaptive mixture IS instead (VanDerwerken & S., 2015).
20. FPRAS for mixture-of-normals
Note: exponentially faster than estimating transition matrix as
in MD
Shows problem difficulty is finding modes, not mixing between
them. (Hard even in normal problem?)
Currently exploring limits of generalizability.
21. Problems with Approach #1
This approach has some shortcomings:
1. Requires # chains (processors) equal to the partition size, which could be exponential in dim(Θ).
2. Where does the partition come from?
3. Restriction to π_j requires rejection; makes evaluating the transition density hard for the ŵ_j's.
4. Restriction could slow down mixing of the chains.
22. Solution
No need for 1-to-1 correspondence between chains and estimators.
For L independent chains, let
µ̂_{j,n} = n_j^{-1} ∑_{l=1}^L ∑_{k=1}^{K_l} f(θ_{lk}) 1_{Θ_j}(θ_{lk})
where n_j = ∑_{l=1}^L ∑_{k=1}^{K_l} 1_{Θ_j}(θ_{lk}) is the # of draws in Θ_j from any chain.
L can be much smaller; need not be exponential in dim(Θ).
⇒ Chains unrestricted, can cross between partition elements.
Partition imposed on samples after the fact.
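A minimal sketch of this "partition after the fact" bookkeeping (illustrative; names are assumptions): pool the draws from all unrestricted chains, label each draw by its partition element, and form per-element averages and counts.

```python
import numpy as np

def element_averages(pooled_draws, element_of, f, J):
    """Per-element ergodic averages mu_hat_j and draw counts n_j from pooled chains.

    pooled_draws : draws from all L unrestricted chains, stacked together
    element_of   : maps each draw to its partition element label in {0, ..., J-1}
    f            : vectorized function whose expectation is wanted
    J            : number of partition elements
    """
    labels = element_of(pooled_draws)
    fx = np.asarray(f(pooled_draws), dtype=float)
    mu_hat = np.full(J, np.nan)
    counts = np.zeros(J, dtype=int)
    for j in range(J):
        mask = labels == j
        counts[j] = mask.sum()
        if counts[j] > 0:
            mu_hat[j] = fx[mask].mean()
    return mu_hat, counts
```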
24. Adaptive partitioning
Still need a partition.
Key: Must not grow exponentially in dim(Θ).
PACE clustering algorithm (VanDerwerken & S., 2013)
Let x_t^(j) denote draw t from chain j, and X_i the set of draws available at iteration i.
1. Define x*_i = argmax_{x_t^(j) ∈ X_i} { log π(x_t^(j)) }
2. Assign all draws lying in B_ε(x*_i) to C_i, and set X_{i+1} = X_i \ C_i.
3. Repeat (1)-(2) until 1 − α of the draws are clustered (e.g. 98%).
4. Reallocate all draws to the nearest cluster (Voronoi).
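A minimal sketch of this greedy clustering step (illustrative; Euclidean ε-balls and the stopping fraction are assumptions about details not fully legible on the slide):

```python
import numpy as np

def pace_partition(draws, log_pi, eps, alpha=0.02):
    """Greedy PACE-style clustering of pooled MCMC draws (sketch).

    draws  : (n, d) array of pooled draws from all chains
    log_pi : vectorized log target density
    eps    : ball radius used to absorb draws around each density peak
    alpha  : stop once a fraction 1 - alpha of the draws has been clustered
    """
    n = draws.shape[0]
    logp = log_pi(draws)
    unassigned = np.ones(n, dtype=bool)
    centers = []
    while unassigned.mean() > alpha:
        # 1. highest-density unassigned draw becomes a new cluster center
        idx = np.flatnonzero(unassigned)
        i_star = idx[np.argmax(logp[idx])]
        centers.append(draws[i_star])
        # 2. absorb all unassigned draws within the eps-ball of the center
        dist = np.linalg.norm(draws[idx] - draws[i_star], axis=1)
        unassigned[idx[dist <= eps]] = False
    centers = np.array(centers)
    # 4. reallocate every draw to its nearest center (Voronoi partition)
    d2 = ((draws[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return centers, labels
```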
26. Multimodal example
Langevin diffusion:
dθ_t = (σ^2/2) ∇ log π(θ_t) dt + σ dW_t
10 chains initialized uniformly
25k iterations each, in parallel on 10 processors
Cluster first 1k draws after 250 burn-in ⇒ 7-element partition.
In parallel, 1 processor per element (7 total) each generated:
n ≈ 5000 trajectories of length T = 5, and corresponding ĉ_j's,
initialized i.i.d. ∼ t_4(m_j, J^{-1}(θ))
t_4 perturbations instead of Gaussian to ensure var(ĉ_j) < ∞
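The slides simulate this diffusion directly; as an illustration, a simple Euler-Maruyama discretization looks like the following sketch (the step size and function names are assumptions, and the discretization introduces a bias that the exact diffusion does not have):

```python
import numpy as np

def langevin_trajectory(grad_log_pi, theta0, n_steps, step=0.01, sigma=1.0, rng=None):
    """Euler-Maruyama discretization of d(theta_t) = (sigma^2/2) grad log pi dt + sigma dW_t."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.atleast_1d(np.asarray(theta0, dtype=float)).copy()
    path = np.empty((n_steps, theta.size))
    for t in range(n_steps):
        drift = 0.5 * sigma ** 2 * grad_log_pi(theta)
        theta = theta + drift * step + sigma * np.sqrt(step) * rng.standard_normal(theta.size)
        path[t] = theta
    return path
```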
27. Multimodal example
Clustering of 10 chains initialized uniformly within dashed lines.
Ellipses show 95% contours for component densities of target.
28. Multimodal example
Estimated weights: [0.02, 0.23, 0.20, 0.55].
(True weights: [0.02, 0.20, 0.20, 0.58])
Using the AIS approach instead:
5000 ĉ_j's from samples of size T = 5 from a t_4 distn
requires 18s vs. 90s for simulating diffusions.
Clustering + IS takes < 1/2 the time of the parallel 25k chains, so weights are estimated in parallel before sampling is complete.
Estimated partition weights:
ŵ_7 = [.378, .201, .201, .105, .020, .093, .002]
Estimated component weights (nearly exact):
ŵ = [.020, .201, .201, .578]
29. Multimodal example: higher dimensions
Harder example:
p = 10 dimensions
4 component means drawn uniformly on (−10, 10)^p
Random covar matrices L^T L with L ∼ MN_{p×p}(0, I_p, I_p)
Weights ∼ Dirichlet(1, 1, 1, 1)
- 20 parallel r.w. Metropolis chains, 100k iterations each
- Proposal scales tuned adaptively during the first 1k iterations
- Next 49k draws clustered ⇒ 4 partition elements.
- IS using t_4(m_j, S_j) for cluster center m_j, empirical covar S_j, T = 100, n = 1000
Results:
d_TV(ŵ, w) = .0024,   ||µ̂ − E(X)||_{L1} = 0.17
30. Multimodal example: higher dimensions
Sensitivity to partitioning:
Repeating with a different clustering radius gives an 8-element partition:
d_TV(ŵ, w) = .0074,   ||µ̂ − E(X)||_{L1} = 0.12
More, smaller weights to estimate, but better mixing within (smaller) partition elements.
Since then, successfully repeated in 50 and 100 dimensions.
31. Multimodal example: higher dimensions
p = 50:
2 components: w = [0.1, 0.9]
Random means ∼ U(−10, 10)^p; correlations L^T L for L ∼ MN_{p×p}(0, I_p, I_p).
Parallel MCMC:
14 chains, initialized uniformly
Normal RW-MH with adaptive covar tuned during 100k-iteration burn-in
2M post-burn-in draws each, thinned to 1000 draws.
Partition size: 2 ( 2 = 2p)
AIS using t_4(m_j, Σ_j) (5M draws): ŵ = [.101, .899]
Pooling chains directly gives w̃ = [.210, .790], as 3 chains happen to get stuck in mode 1 and 11 in mode 2.
32. Beyond multimodality
Parallelization easily visualized for multimodal problems, but our
approach is completely general.
What about other types of slowly-mixing chains?
E.g. component-wise chains with strong dependence between dimensions (such as correlated Gibbs samplers).
33. Example: Probit regression
Probit regression model:
Assigns probs 1 − Φ(βX), Φ(βX) to response Y ∈ {0, 1} for covariate X.
Posterior:
π_0(β) ∏_{i=1}^n Φ(βX_i)^{y_i} {1 − Φ(βX_i)}^{1−y_i}
Data: N = 2000 pairs simulated, X ∼ Bern(1/2), β = 5/√2.
Diffuse prior (π_0(β) = N(0, 10^2)).
Model also studied by Nobile (1998), Imai & van Dyk (2005).
Traditional Gibbs sampler (Albert & Chib 1993) mixes slowly: autocorr ρ > 0.999.
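For reference, a minimal sketch of the Albert & Chib (1993) data-augmentation Gibbs sampler mentioned above, for this one-coefficient model (illustrative; the prior variance and function names are assumptions, and this is the slowly mixing serial sampler, not the parallel scheme):

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(x, y, n_iter=10_000, prior_var=100.0, rng=None):
    """Data-augmentation Gibbs sampler for probit regression with one coefficient,
    prior beta ~ N(0, prior_var)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    beta = 0.0
    draws = np.empty(n_iter)
    post_var = 1.0 / (1.0 / prior_var + np.sum(x ** 2))
    for t in range(n_iter):
        mu = x * beta
        # latent utilities z_i: truncated normals with sign constrained by y_i
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # conjugate Gaussian update for beta given the latent z
        post_mean = post_var * np.sum(x * z)
        beta = rng.normal(post_mean, np.sqrt(post_var))
        draws[t] = beta
    return draws
```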
34. Probit regression: Parallel MCMC
10 parallel chains initialized U(0, 20), run for 50k iterations
Partition formed by Voronoi cells with centers at the deciles of the 500k pooled draws.
Weights estimated via AIS with:
q_j = N(m_j, 2s_j) for mean m_j and s.d. s_j of the draws in Θ_j
n = 500, T = 10
TV distance to “truth” (200k independent rejection-sampling draws), calculated on a fine discretization, gives d_TV = .075.
d_TV for the serial Gibbs sampler reaches .075 at ∼1.2 million iterations.
⇒ Parallelized Gibbs sampler: same accuracy with < 1/2 as many draws, and more than 20× speedup due to parallelization.
35. Probit regression: higher dimensional
N = 500 points for p = 8 covariates drawn from:
(1, Bern(1/2), U(0,1), N(0,1), Exp(1), N(5,1), Pois(10), N(20,25))
with β = (0.25, 5, 1, −1.5, −0.1, 0, 0, 0).
Compare:
1M iterations of the serial Gibbs sampler
300k iterations each for 10 parallel chains
Partitioning: 2 = p for normalized dimensions
Weights: AIS with q_j = t_4(m*_j, S_j) for empirical mode m*_j and covariance S_j in element j, using n = 500, T = 50.
β_2 is much slower to converge (ρ > 0.999) than the others (ρ < 0.95).
So compare the marginal distribution for β_2 with “truth” (5M MH samples, ρ < 0.95) using d_TV calculated by discretization.
36. Multivariate Probit Regression: Parallel vs serial Gibbs
[Figure: total variation distance versus thousands of iterations for the parallel and serial Gibbs samplers.]
Using PACE convergence threshold 0.10 (VDW & S., 2013),
parallel Gibbs sampler converges ∼ 20× faster.
37. Example: Loss of Heterozygosity
Data from Seattle Barrett Esophagus project.
LOH is a genetic change undergone by cancer cells; chromosomal regions with high loss rates may contain regulatory genes.
Loss frequencies modeled by mixture (Desai & Emond, 2004):
(also studied by Craiu et al. 2009, 2011)
X_i ∼ η Bin(N_i, π_1) + (1 − η) Beta-Bin(N_i, π_2, γ),
where γ controls the beta-binomial overdispersion.
Likelihood:
∏_{i=1}^{40} [ η (n_i choose x_i) π_1^{x_i} (1 − π_1)^{n_i − x_i} + (1 − η) (n_i choose x_i) B(x_i + π_2/ω_2, n_i − x_i + (1 − π_2)/ω_2) / B(π_2/ω_2, (1 − π_2)/ω_2) ],
for ω_2 = e^γ / (2(1 + e^γ)) and beta function B.
38. Example: Loss of Heterozygosity
8 parallel chains initialized at logit(u) for u ∼ U[0, 1]^4
Clustering in logistic space (to choose 2 = 0.1) yields 7 clusters.
Weight estimation via AIS using:
t_4(m_j, S_j) for cluster mean m_j, empirical covar S_j
n = 10000 and T = 100.
Results agree with previous analyses, except γ slightly smaller.
Our results confirmed 4 times by i.i.d. importance sampling using a 3-component t_4 mixture, overdispersed covariances, n = 500,000.

               η            π_1          π_2          γ
Parallel MCMC  .816 (.001)  .299 (.001)  .678 (.002)  9.49 (.51)
IS             .814 (.001)  .299 (.001)  .676 (.001)  9.84 (.06)
39. Conclusions
A general scheme for parallelizing any MCMC algorithm
Requires approximating normalizing constants, but only on local regions
Requires MCMC to mix locally only
Doesn’t solve all problems, e.g. hitting modes in the first
place (which can be provably intractable)
Potentially powerful. Bigger applications in progress
40. References
VanDerwerken, D. N. and Schmidler, S. C. (2013). Parallel Markov Chain Monte Carlo. arXiv:1312.7479.
VanDerwerken, D. N. and Schmidler, S. C. (2017). Parallel Markov Chain Monte Carlo (revised and expanded version).