We present recent results on the numerical analysis of Quasi-Monte Carlo quadrature methods, applied to forward and inverse uncertainty quantification for elliptic and parabolic PDEs. Particular attention will be placed on higher-order QMC, the stable and efficient generation of interlaced polynomial lattice rules, and the numerical analysis of multilevel QMC finite element discretizations with applications to computational uncertainty quantification.
In this talk, we discuss some recent advances in probabilistic schemes for high-dimensional PIDEs. It is known that traditional PDE solvers, e.g., finite element and finite difference methods, do not scale well as the dimension increases. The idea of probabilistic schemes is to link a wide class of nonlinear parabolic PIDEs to stochastic Lévy processes via a nonlinear version of the Feynman-Kac theory. As such, the solution of the PIDE can be represented by a conditional expectation (i.e., a high-dimensional integral) with respect to a stochastic dynamical system driven by Lévy processes. In other words, we can solve the PIDE by performing high-dimensional numerical integration. A variety of quadrature methods can be applied, including MC, QMC, and sparse grids. Probabilistic schemes have been used in many application problems, e.g., particle transport in plasmas (Vlasov-Fokker-Planck equations), nonlinear filtering (Zakai equations), and option pricing.
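The "solution = expectation" link can be illustrated in the simplest linear setting. The sketch below is an assumption for illustration only, not the talk's scheme (which treats nonlinear PIDEs driven by Lévy processes): it solves the heat equation u_t = u_xx by sampling the underlying Brownian motion.

```python
import numpy as np

# Feynman-Kac Monte Carlo sketch for the linear heat equation
# u_t = u_xx, u(0, x) = g(x); then u(t, x) = E[g(x + sqrt(2 t) Z)], Z ~ N(0, 1).
# The PDE solution at one point is a (here one-dimensional) integral,
# approximated by plain Monte Carlo sampling.

def heat_solution_mc(g, t, x, n_samples=200_000, rng=None):
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(n_samples)
    return g(x + np.sqrt(2.0 * t) * z).mean()

g = lambda y: y**2          # initial condition with a known exact solution
t, x = 0.5, 1.0
approx = heat_solution_mc(g, t, x, rng=0)
exact = x**2 + 2.0 * t      # E[(x + sqrt(2t) Z)^2] = x^2 + 2t
```

In higher dimensions the same recipe applies path-wise; the MC average can then be replaced by QMC or sparse-grid quadrature, as the abstract notes.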
The generation of Gaussian random fields over a physical domain is a challenging problem in computational mathematics, especially when the correlation length is short and the field is rough. The traditional approach is to use a truncated Karhunen-Loève (KL) expansion, but the generation of even a single realisation of the field may then be effectively beyond reach (especially for 3-dimensional domains) if the aim is to obtain an expected L2 error of, say, 5%, because of the potentially very slow convergence of the KL expansion. In this talk, based on joint work with Ivan Graham, Frances Kuo, Dirk Nuyens, and Rob Scheichl, a completely different approach is used, in which the field is initially generated at a regular grid on a 2- or 3-dimensional rectangle that contains the physical domain, and then possibly interpolated to obtain the field at other points. In that case there is no need for any truncation. Rather, the main problem becomes the factorisation of a large dense matrix. For this we use circulant embedding and FFT ideas. Quasi-Monte Carlo integration is then used to evaluate the expected value of some functional of the finite-element solution of an elliptic PDE with a random field as input.
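A minimal 1D sketch of the circulant embedding idea (the talk treats 2- and 3-dimensional rectangles; the grid size, kernel, and function name below are illustrative assumptions):

```python
import numpy as np

# 1D circulant embedding sketch: sample a stationary Gaussian field with
# exponential covariance C(h) = exp(-|h|/ell) on a regular grid, exactly and
# in O(m log m) time via the FFT.  The covariance matrix on the grid is
# Toeplitz; mirroring its first row embeds it into a circulant whose
# eigenvalues are the FFT of that extended row.

def sample_gaussian_field_1d(n, ell=0.3, L=1.0, rng=None):
    rng = np.random.default_rng(rng)
    h = L / (n - 1)
    row = np.exp(-np.abs(np.arange(n) * h) / ell)   # first Toeplitz row
    circ = np.concatenate([row, row[-2:0:-1]])      # circulant of size 2(n-1)
    lam = np.fft.fft(circ).real                     # circulant eigenvalues
    if lam.min() < 0:                               # would need padding in general
        raise ValueError("embedding not positive definite; enlarge the circulant")
    m = circ.size
    # complex-normal trick: one FFT yields two independent real samples
    xi = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    z = np.fft.fft(np.sqrt(lam / m) * xi)
    return z.real[:n]                               # restrict to the physical grid

field = sample_gaussian_field_1d(257, ell=0.3, rng=42)
```

In 2D/3D the same construction uses a nested (block-circulant) embedding and multidimensional FFTs; short correlation lengths may require padding the embedding until all eigenvalues are nonnegative.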
A fundamental numerical problem in many sciences is to compute integrals. These integrals can often be expressed as expectations and then approximated by sampling methods. Monte Carlo sampling is very competitive in high dimensions, but has a slow rate of convergence. One reason for this slowness is that the MC points form clusters and gaps. Quasi-Monte Carlo methods greatly reduce such clusters and gaps, and under modest smoothness demands on the integrand they can greatly improve accuracy. This can even take place in problems of surprisingly high dimension. This talk will introduce the basics of QMC and randomized QMC. It will include discrepancy and the Koksma-Hlawka inequality, some digital constructions and some randomized QMC methods that allow error estimation and sometimes bring improved accuracy.
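As a small illustration of the accuracy gain, the following sketch compares plain MC with scrambled Sobol' points (via `scipy.stats.qmc`) on a smooth product integrand whose exact integral is 1; the integrand and sample sizes are illustrative choices, not taken from the talk.

```python
import numpy as np
from scipy.stats import qmc

# Randomized QMC (scrambled Sobol' points) vs plain Monte Carlo on [0,1]^d.
# Scrambling preserves the low-discrepancy structure while allowing error
# estimation from independent randomized replicates.

d, n = 8, 2**12
# product integrand; each factor integrates to 1, so the exact value is 1
f = lambda x: np.prod(1.0 + (x - 0.5) / (1 + np.arange(d)), axis=1)

rng = np.random.default_rng(7)
mc_est = f(rng.random((n, d))).mean()            # plain MC average

sob = qmc.Sobol(d=d, scramble=True, seed=7)
qmc_est = f(sob.random(n)).mean()                # scrambled Sobol' average
```

Replicating the scrambled estimate with independent seeds gives a spread that serves as a practical error estimate, which unscrambled QMC lacks.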
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms (Christian Robert)
An aggregate of three papers on Rao-Blackwellisation, from Casella & Robert (1996), to Douc & Robert (2010), to Banterle et al. (2015), presented at an OxWaSP workshop on MCMC methods, Warwick, Nov 20, 2015.
Many mathematical models use a large number of poorly known parameters as inputs. Quantifying the influence of each of these parameters is one of the aims of sensitivity analysis. Global sensitivity analysis is an important paradigm for understanding model behavior, characterizing uncertainty, and improving model calibration. The inputs' uncertainty is modeled by a probability distribution. Various measures have been built within that paradigm. This tutorial focuses on the so-called Sobol' indices, based on functional variance analysis. Estimation procedures will be presented, and the choice of the designs of experiments on which these procedures are based will be discussed. As Sobol' indices have no clear interpretation in the presence of statistical dependence between inputs, it also seems promising to measure sensitivity with Shapley effects, based on the notion of the Shapley value, a solution concept in cooperative game theory.
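A minimal sketch of one standard estimation procedure for first-order Sobol' indices, the pick-freeze scheme, applied to the classical Ishigami test function; the function and sample size are illustrative choices, not taken from the tutorial.

```python
import numpy as np

# Pick-freeze estimator of first-order Sobol' indices:
#   S_i = Cov(f(X), f(X^i)) / Var(f(X)),
# where X and X^i share only coordinate i.  The Ishigami function has
# closed-form indices (S1 ~ 0.314, S2 ~ 0.442, S3 = 0), so the estimate
# can be checked.

def ishigami(x, a=7.0, b=0.1):
    return np.sin(x[:, 0]) + a * np.sin(x[:, 1])**2 + b * x[:, 2]**4 * np.sin(x[:, 0])

rng = np.random.default_rng(0)
n, d = 2**15, 3
A = rng.uniform(-np.pi, np.pi, (n, d))
B = rng.uniform(-np.pi, np.pi, (n, d))
yA = ishigami(A)

S = np.empty(d)
for i in range(d):
    C = B.copy()
    C[:, i] = A[:, i]          # freeze coordinate i, resample the rest
    yC = ishigami(C)
    S[i] = (np.mean(yA * yC) - np.mean(yA) * np.mean(yC)) / np.var(yA)
```

The choice of the design (here two independent uniform samples) is exactly the kind of decision the tutorial discusses; QMC designs can be substituted for the random samples.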
The standard Galerkin formulation of acoustic wave propagation, governed by the Helmholtz partial differential equation (PDE), is indefinite for large wavenumbers. However, the Helmholtz PDE itself is in general not indefinite. The lack of coercivity (indefiniteness) is one of the major difficulties in the approximation and simulation of wave propagation models in heterogeneous media, including applications to quasi-Monte Carlo (QMC) analysis of stochastic wave propagation. We will present a new class of sign-definite continuous and discrete preconditioned FEM Helmholtz wave propagation models.
Multidimensional integrals may be approximated by weighted averages of integrand values. Quasi-Monte Carlo (QMC) methods are more accurate than simple Monte Carlo methods because they carefully choose where to evaluate the integrand. This tutorial focuses on how quickly QMC methods converge to the correct answer as the number of integrand values increases. The answer may depend on the smoothness of the integrand and the sophistication of the QMC method. QMC error analysis may assume that the integrand belongs to a reproducing kernel Hilbert space, or that the integrand is an instance of a stochastic process with known covariance structure. These two approaches have interesting parallels. This tutorial also explores how the computational cost of achieving a good approximation to the integral depends on the dimension of the domain of the integrand. Finally, this tutorial explores methods for determining how many integrand values are needed to satisfy the error tolerance. Relevant software is described.
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
We combine low-rank tensor techniques and the FFT to compute kriging estimates, estimate the variance, and compute the conditional covariance. We are able to solve 3D problems at very high resolution.
Computing f-Divergences and Distances of High-Dimensional Probability Densi... (Alexander Litvinenko)
Talk presented at the SIAM IS 2022 conference.
Very often, in the course of uncertainty quantification tasks or
data analysis, one has to deal with high-dimensional random variables (RVs)
(with values in $\Rd$). Just like any other RV,
a high-dimensional RV can be described by its probability density (\pdf) and/or
by the corresponding probability characteristic functions (\pcf),
or a more general representation as
a function of other, known, random variables.
Here the interest is mainly to compute characterisations like the entropy, the Kullback-Leibler divergence, or more general
$f$-divergences. These are all computed from the \pdf, which is often not available directly,
and it is a computational challenge to even represent it in a numerically
feasible fashion in case the dimension $d$ is even moderately large. It
is an even stronger numerical challenge to then actually compute said characterisations
in the high-dimensional case.
In this regard, in order to achieve a computationally feasible task, we propose
to approximate the density by a low-rank tensor.
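As a two-dimensional illustration of the proposed low-rank idea (the grid, the Gaussian \pdf, and the rank below are assumptions for illustration; in higher dimensions a tensor format replaces the matrix SVD):

```python
import numpy as np

# Low-rank surrogate for a 2D density on a tensor grid: evaluate a
# correlated Gaussian pdf on an n x n grid and truncate its SVD.  A small
# rank already reproduces the density accurately, which is the effect a
# low-rank tensor representation exploits when d is large.

n, rho = 200, 0.5
x = np.linspace(-4, 4, n)
X, Y = np.meshgrid(x, x, indexing="ij")
P = np.exp(-(X**2 - 2*rho*X*Y + Y**2) / (2*(1 - rho**2)))
P /= P.sum() * (x[1] - x[0])**2          # normalize on the grid

U, s, Vt = np.linalg.svd(P, full_matrices=False)
r = 10
P_r = (U[:, :r] * s[:r]) @ Vt[:r]        # rank-r approximation

rel_err = np.linalg.norm(P - P_r) / np.linalg.norm(P)
```

Characterisations such as the entropy can then be computed from the factors without ever forming the full $d$-dimensional array.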
Tucker tensor analysis of Matérn functions in spatial statistics (Alexander Litvinenko)
1. Motivation: improve statistical models
2. Motivation: disadvantages of matrices
3. Tools: Tucker tensor format
4. Tensor approximation of the Matérn covariance function via FFT
5. Typical statistical operations in Tucker tensor format
6. Numerical experiments
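Items 3-5 can be sketched in a few lines: a minimal HOSVD computes the Tucker factors and core of a covariance-like tensor on a 3D grid. The kernel and ranks below are illustrative assumptions, not the talk's Matérn setting.

```python
import numpy as np

# Higher-order SVD (HOSVD) sketch of the Tucker format for a 3D tensor.
# Factor matrices come from SVDs of the mode unfoldings; the core is the
# tensor contracted with their transposes.

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mult(T, M, mode):
    # multiply tensor T by matrix M along the given mode
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def hosvd(T, ranks):
    Us = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
          for m, r in enumerate(ranks)]
    core = T
    for mode, U in enumerate(Us):
        core = mode_mult(core, U.T, mode)
    return core, Us

# a separable exponential-kernel tensor (exactly rank 1, so a tiny Tucker
# rank reconstructs it up to rounding)
x = np.linspace(0, 1, 30)
e = np.exp(-x)
T = e[:, None, None] * e[None, :, None] * e[None, None, :]

core, Us = hosvd(T, (2, 2, 2))
T_hat = core
for mode, U in enumerate(Us):
    T_hat = mode_mult(T_hat, U, mode)
err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```

The point of the format is storage: an $n^3$ array is replaced by an $r^3$ core plus three $n \times r$ factors, and typical statistical operations act on the factors directly.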
Accelerating Pseudo-Marginal MCMC using Gaussian Processes (Matt Moores)
The grouped independence Metropolis-Hastings (GIMH) and Markov chain within Metropolis (MCWM) algorithms are pseudo-marginal methods used to perform Bayesian inference in latent variable models. These methods replace intractable likelihood calculations with unbiased estimates within Markov chain Monte Carlo algorithms. The GIMH method has the posterior of interest as its limiting distribution, but suffers from poor mixing if it is too computationally intensive to obtain high-precision likelihood estimates. The MCWM algorithm has better mixing properties, but less theoretical support. In this paper we accelerate the GIMH method by using a Gaussian process (GP) approximation to the log-likelihood and train this GP using a short pilot run of the MCWM algorithm. Our new method, GP-GIMH, is illustrated on simulated data from a stochastic volatility and a gene network model. Our approach produces reasonable estimates of the univariate and bivariate posterior distributions, and the posterior correlation matrix in these examples with at least an order of magnitude improvement in computing time.
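A minimal pseudo-marginal sketch of the GIMH idea on a toy latent-variable model; the model, proposal, and estimator size are illustrative assumptions, and no GP acceleration is included.

```python
import numpy as np

# Pseudo-marginal Metropolis-Hastings (GIMH-style): the intractable
# likelihood is replaced by an unbiased importance-sampling estimate, and
# the estimate is carried along with the state when a proposal is rejected,
# so the chain still targets the exact posterior.
# Toy model: y ~ N(theta + x, 1) with latent x ~ N(0, 1), flat prior on theta,
# so the exact posterior is N(y, 2).

rng = np.random.default_rng(1)
y = 2.0 + rng.standard_normal() + rng.standard_normal()   # one observation

def loglik_hat(theta, m=64):
    x = rng.standard_normal(m)                  # draws of the latent variable
    w = np.exp(-0.5 * (y - theta - x)**2) / np.sqrt(2 * np.pi)
    return np.log(w.mean())                     # unbiased on the likelihood scale

theta, ll = 0.0, loglik_hat(0.0)
chain = []
for _ in range(5000):
    prop = theta + 0.8 * rng.standard_normal()
    ll_prop = loglik_hat(prop)
    if np.log(rng.random()) < ll_prop - ll:     # flat prior cancels
        theta, ll = prop, ll_prop               # keep the estimate with the state
    chain.append(theta)
post_mean = np.mean(chain[1000:])
```

The paper's contribution is to replace the expensive `loglik_hat` calls with a Gaussian process surrogate trained on a short MCWM pilot run; the acceptance mechanics stay as above.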
On maximal and variational Fourier restriction (Vjekoslav Kovac)
Workshop talk slides from the follow-up workshop to the trimester program "Harmonic Analysis and Partial Differential Equations", Hausdorff Institute, Bonn, May 2019.
Low rank tensor approximation of probability density and characteristic funct... (Alexander Litvinenko)
Very often one has to deal with high-dimensional random variables (RVs). A high-dimensional RV can be described by its probability density (\pdf) and/or by the corresponding probability characteristic functions (\pcf), or by a function representation. Here the interest is mainly to compute characterisations like the entropy, or
relations between two distributions, like their Kullback-Leibler divergence, or more general measures such as $f$-divergences,
among others. These are all computed from the \pdf, which is often not available directly, and it is a computational challenge to even represent it in a numerically feasible fashion in case the dimension $d$ is even moderately large. It is an even stronger numerical challenge to then actually compute said characterisations in the high-dimensional case.
In this regard, in order to achieve a computationally feasible task, we propose to represent the density as a high-order tensor product and to approximate it in a low-rank format.
We study an elliptic eigenvalue problem, with a random coefficient that can be parametrised by infinitely-many stochastic parameters. The physical motivation is the criticality problem for a nuclear reactor: in steady state the fission reaction can be modeled by an elliptic eigenvalue
problem, and the smallest eigenvalue provides a measure of how close the reaction is to equilibrium -- in terms of production/absorption of neutrons. The coefficients are allowed to be random to model the uncertainty of the composition of materials inside the reactor, e.g., the
control rods, reactor structure, fuel rods etc.
The randomness in the coefficient also results in randomness in the eigenvalues and corresponding eigenfunctions. As such, our quantity of interest is the expected value, with
respect to the stochastic parameters, of the smallest eigenvalue, which we formulate as an integral over the infinite-dimensional parameter domain. Our approximation involves three steps: truncating the stochastic dimension, discretizing the spatial domain using finite elements and approximating the now finite but still high-dimensional integral.
To approximate the high-dimensional integral we use quasi-Monte Carlo (QMC) methods. These are deterministic or quasi-random quadrature rules that can be proven to be very efficient for the numerical integration of certain classes of high-dimensional functions. QMC methods have previously been applied to linear functionals of the solution of a similar elliptic source problem; however, because of the nonlinearity of eigenvalues the existing analysis of the integration error
does not hold in our case.
We show that the smallest eigenvalue belongs to the function spaces required by QMC theory, outline the approximation algorithm, and provide numerical results.
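The three steps can be sketched on a 1D model problem; the coefficient expansion, discretization, and QMC rule below are illustrative assumptions, not the paper's setting.

```python
import numpy as np
from scipy.stats import qmc

# Sketch of the three-step approximation: truncate the random coefficient
# to s parameters, discretize -(a(x, y) u')' = lambda u on (0, 1) with
# Dirichlet conditions by finite differences, and average the smallest
# eigenvalue over QMC points.

def smallest_eig(y, n=64):
    h = 1.0 / n
    xm = (np.arange(n) + 0.5) * h                     # cell midpoints
    # truncated affine coefficient a(x, y) = 1 + sum_j y_j sin(j pi x) / j^2
    a = 1.0 + sum(yj * np.sin((j + 1) * np.pi * xm) / (j + 1)**2
                  for j, yj in enumerate(y))
    A = (np.diag(a[:-1] + a[1:]) - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h**2
    return np.linalg.eigvalsh(A)[0]                   # smallest eigenvalue

s, N = 4, 128
pts = qmc.Sobol(d=s, scramble=True, seed=3).random(N) - 0.5   # y_j ~ U(-1/2, 1/2)
expected_lambda = np.mean([smallest_eig(y) for y in pts])
```

For the unperturbed coefficient a = 1 the smallest eigenvalue is pi^2, so the QMC average should land nearby; the eigenvalue is a nonlinear functional of the solution, which is exactly why the existing linear-functional QMC analysis does not apply directly.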
a rather old set of slides, to test why slideshare does not accept my most recent slides
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy and direct recordings with electrodes. Each of these measurement modes has its advantages and disadvantages concerning the resolution of the data in space and time, the directness of measurement of the neural activity and which organisms it can be applied to. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
Bayesian Additive Regression Trees (BART) has been shown to be an effective framework for modeling nonlinear regression functions, with strong predictive performance in a variety of contexts. The BART prior over a regression function is defined by independent prior distributions on tree structure and leaf or end-node parameters. In observational data settings, Bayesian Causal Forests (BCF) has successfully adapted BART for estimating heterogeneous treatment effects, particularly in cases where standard methods yield biased estimates due to strong confounding.
We introduce BART with Targeted Smoothing, an extension which induces smoothness over a single covariate by replacing independent Gaussian leaf priors with smooth functions. We then introduce a new version of the Bayesian Causal Forest prior, which incorporates targeted smoothing for modeling heterogeneous treatment effects which vary smoothly over a target covariate. We demonstrate the utility of this approach by applying our model to a timely women's health and policy problem: comparing two dosing regimens for an early medical abortion protocol, where the outcome of interest is the probability of a successful early medical abortion procedure at varying gestational ages, conditional on patient covariates. We discuss the benefits of this approach in other women’s health and obstetrics modeling problems where gestational age is a typical covariate.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
The world of health care is full of policy interventions: a state expands eligibility rules for its Medicaid program, a medical society changes its recommendations for screening frequency, a hospital implements a new care coordination program. After a policy change, we often want to know, “Did it work?” This is a causal question; we want to know whether the policy CAUSED outcomes to change. One popular way of estimating causal effects of policy interventions is a difference-in-differences study. In this controlled pre-post design, we measure the change in outcomes of people who are exposed to the new policy, comparing average outcomes before and after the policy is implemented. We contrast that change to the change over the same time period in people who were not exposed to the new policy. The differential change in the treated group’s outcomes, compared to the change in the comparison group’s outcomes, may be interpreted as the causal effect of the policy. To do so, we must assume that the comparison group’s outcome change is a good proxy for the treated group’s (counterfactual) outcome change in the absence of the policy. This conceptual simplicity and wide applicability in policy settings makes difference-in-differences an appealing study design. However, the apparent simplicity belies a thicket of conceptual, causal, and statistical complexity. In this talk, I will introduce the fundamentals of difference-in-differences studies and discuss recent innovations including key assumptions and ways to assess their plausibility, estimation, inference, and robustness checks.
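The controlled pre-post comparison described above reduces, in the simplest 2x2 case, to a difference of two differences. A simulated sketch (all parameters are illustrative assumptions):

```python
import numpy as np

# Minimal 2x2 difference-in-differences on simulated panel data: under
# parallel trends, DID recovers the true policy effect even though the
# treated group starts from a different baseline, while a naive post-period
# comparison is contaminated by that baseline gap.

rng = np.random.default_rng(0)
n, effect, trend = 20_000, 1.5, 0.7
treated = rng.random(n) < 0.5
base = np.where(treated, 3.0, 1.0)                 # group-specific baseline
y_pre = base + 0.3 * rng.standard_normal(n)
y_post = base + trend + effect * treated + 0.3 * rng.standard_normal(n)

did = (y_post[treated].mean() - y_pre[treated].mean()) \
    - (y_post[~treated].mean() - y_pre[~treated].mean())

naive = y_post[treated].mean() - y_post[~treated].mean()   # biased by baseline gap
```

The identifying assumption is visible in the simulation: both groups share the same `trend`. If the untreated group followed a different trend, `did` would absorb that difference, which is what the robustness checks in the talk probe.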
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component for precision medicine and personalized health care. Specific topics covered in this talk include several recent projects with robust and flexible methods developed for the above research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs. However, ACWL seems more robust against tree-type misspecification than T-RL when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method called ST-RL for evaluating optimal DTRs.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented information about how causal inference is being used to determine the cost/benefit of the two most common surgical treatments for women: hysterectomy and myomectomy.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tool which enables use of observational data to learn how clinicians balance competing priorities for treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus on estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
We discuss a general roadmap for generating causal inference based on observational studies used to generate real-world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for the construction of asymptotically efficient plug-in estimators of a target estimand for realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure that first involves using ensemble machine learning, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates, and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric-model-based estimators) to build a single most powerful ensemble machine learning algorithm. We present the Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second step, the TMLE involves maximizing a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data. This second step bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also review recent advances in collaborative TMLE, in which the fit of the treatment and censoring mechanism is tailored w.r.t. the performance of the TMLE, and discuss asymptotically valid bootstrap-based inference. Simulations and data analyses are provided as demonstrations.
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when treatment effect size and treatment effect variation are small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This seminar discussed ways in which to produce professional academic writing, from academic papers to research proposals or technical writing in general.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most notable technologies in recent years. The first is the foundation of artificial intelligence and big data, and the second has significantly disrupted the financial industry. Both technologies are data-driven, and thus there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. In this paper, we review the research on combining blockchain and machine learning technologies and demonstrate that they can collaborate efficiently and effectively. In the end, we point out some future directions and expect more research on deeper integration of the two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop, About Infinite-Dimensional Geometric MCMC - Shiwei Lan, Dec 14, 2017
Department of Computing + Mathematical Sciences, California Institute of Technology

About ∞-Dimensional Geometric MCMC

Shiwei Lan
Dec 14, 2017, SAMSI, Duke

a. Beskos, Alexandros, Mark Girolami, Shiwei Lan, Patrick E. Farrell, and Andrew M. Stuart (2017). Geometric MCMC for infinite-dimensional inverse problems. Journal of Computational Physics, 335:327–351.
b. Holbrook, Andrew, Shiwei Lan, Jeffrey Streets, and Babak Shahbaba. The non-parametric Fisher information geometry and the chi-square process density prior. arXiv:1707.03117.
Table of contents

1. Introduction
2. Geometric Monte Carlo on Infinite Dimensions
3. Dimension Reduction
4. ∞-dimensional Spherical Hamiltonian Monte Carlo
5. Conclusion

S.Lan | ∞-Dimensional Geometric MCMC
Bayesian Inverse Problems

Let X, Y be separable Banach spaces equipped with their Borel σ-algebras, and let G : X → Y be measurable. We want to find u from y, where

    y = G(u) + \eta

Prior: u ∼ µ0 on X. Noise: η ∼ Q0 on Y with η ⊥ u. Assume y|u ∼ Q_u ≪ Q0 for µ0-a.s. u, and define the likelihood via

    \frac{dQ_u}{dQ_0}(y) = \exp(-\Phi(u; y))

Theorem (Bayes' Theorem)
Assume for Q0-a.s. y that Z := \int_X \exp(-\Phi(u; y))\, \mu_0(du) > 0. Then u|y exists under ν, denoted by µ^y. Furthermore, µ^y ≪ µ0 and, for ν-a.s. y,

    \frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z} \exp(-\Phi(u; y))
Metropolis-Hastings in general state space (Tierney, 1998)

Target µ(du), e.g. µ^y(du) ∝ exp(−Φ(u; y)) µ0(du) with µ0 = N(0, C). Denote by Q(u, du′) the proposal probability kernel, and define the transition measure and its transpose

    \nu(du, du') = \mu(du)\, Q(u, du'), \qquad \nu^T(du, du') = \nu(du', du)

When ν and ν^T are equivalent (i.e. ν ≪ ν^T and ν^T ≪ ν) with density

    r(u, u') = \frac{d\nu^T}{d\nu}(u, u'),

the acceptance probability is

    a(u, u') = 1 \wedge r(u, u')
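The abstract acceptance rule above has a familiar finite-dimensional analogue, where r(u, u′) reduces to the usual ratio of target-times-proposal densities. A minimal sketch (the function names are ours, not from the talk):

```python
import numpy as np

def mh_step(u, log_target, propose, log_q, rng):
    """One Metropolis-Hastings step. In finite dimensions the density
    r(u, u') becomes [target(u') q(u', u)] / [target(u) q(u, u')], and
    the move is accepted with probability a(u, u') = 1 ^ r(u, u')."""
    u_new = propose(u, rng)
    log_r = (log_target(u_new) + log_q(u_new, u)) \
          - (log_target(u) + log_q(u, u_new))
    if np.log(rng.uniform()) < log_r:
        return u_new, True
    return u, False

# Example: standard normal target, Gaussian random-walk proposal
# (symmetric, so log_q cancels in log_r).
rng = np.random.default_rng(0)
log_target = lambda u: -0.5 * u**2
log_q = lambda a, b: -0.5 * (b - a)**2
propose = lambda u, r: u + r.normal()
u, accepted = mh_step(0.0, log_target, propose, log_q, rng)
```

A useful sanity check is the detailed-balance identity log r(u → u′) + log r(u′ → u) = 0, which holds for any target and proposal pair.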
preconditioned Crank-Nicolson (Cotter et al., 2013)

Crank-Nicolson is a finite difference method used for numerically solving PDEs (Richtmyer and Morton, 1994); here it motivates a modified Random Walk Metropolis (RWM).

Given u, sample ξ ∼ N(0, C) and make the proposal

    u' = \sqrt{1 - \beta^2}\, u + \beta \xi    (1)

Accept u' with probability

    a(u, u') = 1 \wedge r(u, u'), \qquad r(u, u') = \exp(-\Phi(u') + \Phi(u))    (2)

The resulting algorithm is independent of dimension and mixes faster than RWM.
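In finite dimensions, one pCN transition can be sketched as follows (a minimal illustration, not the talk's implementation; `chol_C` is a Cholesky factor of the prior covariance C):

```python
import numpy as np

def pcn_step(u, phi, beta, chol_C, rng):
    """One pCN step for a target ~ exp(-Phi(u)) N(0, C)(du):
    propose u' = sqrt(1 - beta^2) u + beta xi with xi ~ N(0, C), then
    accept with probability 1 ^ exp(Phi(u) - Phi(u')); the Gaussian
    prior terms cancel exactly, as in (2)."""
    xi = chol_C @ rng.standard_normal(u.shape)
    u_prop = np.sqrt(1.0 - beta**2) * u + beta * xi
    log_a = min(0.0, phi(u) - phi(u_prop))
    if rng.uniform() < np.exp(log_a):
        return u_prop, True
    return u, False
```

With Φ ≡ 0 the prior N(0, C) is exactly invariant and every proposal is accepted, which is the source of the scheme's robustness to dimension.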
∞-dimensional MALA (Beskos et al., 2008)

Consider the Langevin SDE

    \frac{du}{dt} = -\frac{1}{2} K \big(C^{-1} u + D\Phi(u)\big) + \sqrt{K}\, \frac{dW}{dt}    (3)

A semi-implicit scheme to discretize the above SDE,

    \frac{u' - u}{h} = -\frac{1}{2} K C^{-1} \big[(1-\theta) u + \theta u'\big] - \frac{\alpha}{2} K D\Phi(u) + \sqrt{\frac{K}{h}}\, \xi, \qquad \xi \sim N(0, I),

may be simplified to give

    u' = A_\theta u + B_\theta v, \qquad v = \sqrt{C}\, \xi - \frac{\alpha}{2} \sqrt{h}\, K C\, D\Phi(u),
    A_\theta = \Big(I + \frac{\theta}{2} h K C^{-1}\Big)^{-1} \Big(I - \frac{1-\theta}{2} h K C^{-1}\Big), \qquad B_\theta = \Big(I + \frac{\theta}{2} h K C^{-1}\Big)^{-1} \sqrt{h}\, K C^{-1}    (4)

where α = 1, and α = 0 ⇒ pCN. Choose K = I (IA) or K = C (PIA).
∞-dimensional HMC (Beskos et al., 2011)

Consider the Hamiltonian differential equation

    \frac{d^2 u}{dt^2} + K \big(C^{-1} u + D\Phi(u)\big) = 0, \qquad \Big(v := \frac{du}{dt}\Big)\Big|_{t=0} \sim N(0, K)    (5)

Let K = C and f(u) := −DΦ(u). A Störmer-Verlet/splitting scheme (Verlet, 1967; Neal, 2010) is used to discretize (5):

    v^- = v + \frac{t}{2}\, C f(u),
    \begin{pmatrix} u' \\ v^+ \end{pmatrix} = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix} \begin{pmatrix} u \\ v^- \end{pmatrix},
    v' = v^+ + \frac{t}{2}\, C f(u')    (6)

This defines a mapping Ψ_t : (u, v) → (u', v').
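In finite dimensions (and in suitably whitened coordinates) scheme (6) is a kick-rotate-kick map. A minimal sketch, with `dphi` standing in for DΦ:

```python
import numpy as np

def psi_t(u, v, t, dphi, C):
    """One step of the splitting scheme: half kick with f(u) = -DPhi(u),
    exact rotation of (u, v) by angle t (the free dynamics), half kick."""
    v = v - 0.5 * t * (C @ dphi(u))
    u, v = (np.cos(t) * u + np.sin(t) * v,
            -np.sin(t) * u + np.cos(t) * v)
    v = v - 0.5 * t * (C @ dphi(u))
    return u, v
```

The map is time-reversible (a step of size −t undoes a step of size t), and with Φ ≡ 0 it is the exact rotation, which is what makes the Metropolis correction for Ψ_t tractable.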
∞-dimensional manifold MALA (Beskos, 2014)

Choose K = G(u)^{-1} in (3) (Girolami and Calderhead, 2011):

    \frac{du}{dt} = -\frac{1}{2} G(u)^{-1} \big(C^{-1} u + D\Phi(u)\big) + \sqrt{G(u)^{-1}}\, \frac{dW}{dt}    (7)

Define G : X → L(X, X) and g : X → X by

    g(u) = -G(u)^{-1} \big( (C^{-1} - G(u))\, u + D\Phi(u) \big)    (8)

A similar semi-implicit scheme (θ = 1/2) yields

    u' = a_{1/2}\, u + b_{1/2}\, v, \qquad v = \frac{\sqrt{h}}{2}\, g(u) + G(u)^{-1/2} \xi,
    a_{1/2} = \frac{1 - h/4}{1 + h/4}, \qquad b_{1/2} = \frac{\sqrt{h}}{1 + h/4}    (9)

Note that a_{1/2}^2 + b_{1/2}^2 = 1 follows the Crank-Nicolson scheme (Richtmyer and Morton, 1994).
∞-dimensional manifold MALA (Beskos, 2014)

Assumption (1)
N(0, G(u)^{-1}) is equivalent to N(0, C) for µ0-a.s. u.

Assumption (2)
For µ0-a.s. u, we have g(u) ∈ Im(C^{1/2}), i.e. N(g(u), C) is equivalent to N(0, C).
∞-dimensional manifold MALA (Beskos, 2014)

Theorem (Beskos (2014))
Denote ν(du, du') = µ(du) Q(u, du') with Q(u, du') being the transition kernel of (9). Under Assumptions 1 and 2, ν and ν^T are equivalent, and the acceptance probability has the following form:

    a(u, u') = 1 \wedge \frac{d\nu^T}{d\nu}(u, u'), \qquad \frac{d\nu^T}{d\nu}(u, u') = \frac{\exp\{-\Phi(u')\}\, \lambda(\rho^{-1}(u; u'); u')}{\exp\{-\Phi(u)\}\, \lambda(\rho^{-1}(u'; u); u)}    (10)

where ρ^{-1}(u'; u) = [(1 + h/4) u' − (1 − h/4) u] / \sqrt{h} and λ(w; u) is calculated as follows:

    \lambda(w; u) = \frac{dN\big(\frac{\sqrt{h}}{2} g(u),\, G(u)^{-1}\big)}{dN(0, C)}
                  = \frac{dN\big(\frac{\sqrt{h}}{2} g(u),\, G(u)^{-1}\big)}{dN(0, G(u)^{-1})} \cdot \frac{dN(0, G(u)^{-1})}{dN(0, C)}
                  = \exp\Big\{ \Big\langle \frac{\sqrt{h}}{2} G^{1/2}(u) g(u),\, G^{1/2}(u) w \Big\rangle - \frac{h}{8} \big|G^{1/2}(u) g(u)\big|^2 \Big\}
                    \cdot \exp\Big\{ -\frac{1}{2} \big|G^{1/2}(u) w\big|^2 + \frac{1}{2} \big|C^{-1/2} w\big|^2 \Big\} \cdot \big|C^{1/2} G^{1/2}(u)\big|
∞-dimensional manifold HMC

Let K = G(u)^{-1} in (5) (Girolami and Calderhead, 2011):

    \frac{d^2 u}{dt^2} + u = g(u), \qquad \Big(v := \frac{du}{dt}\Big)\Big|_{t=0} \sim N(0, G(u)^{-1})    (11)

Discretize (11) by a splitting method of the form

    v^- = v + \frac{t}{2}\, g(u),
    \begin{pmatrix} u' \\ v^+ \end{pmatrix} = \begin{pmatrix} \cos \tau & \sin \tau \\ -\sin \tau & \cos \tau \end{pmatrix} \begin{pmatrix} u \\ v^- \end{pmatrix},
    v' = v^+ + \frac{t}{2}\, g(u')    (12)

where τ(t)/t → 1 as t → 0. Evolving (u, v) through k folds of (12) defines Ψ^k_t : (u_0, v_0) → (u_k, v_k).
∞-dimensional manifold HMC

Theorem
Denote ν(du, du') = µ(du) Q(u, du') with Q(u, du') being the transition kernel of Ψ^k_t. Under Assumptions 1 and 2, ν and ν^T are equivalent, and the acceptance probability has the following form:

    1 \wedge \exp(-\Delta H(u, v))    (13)

where

    \Delta H(u, v) = H(\Psi^{(k)}_t(u, v)) - H(u, v)
      = \Phi(u_k) - \Phi(u_0)
        + \frac{1}{2} \Big( \big|(G(u_k) - C^{-1})^{1/2} v_k\big|^2 - \big|(G(u_0) - C^{-1})^{1/2} v_0\big|^2 - \log|G(u_k)| + \log|G(u_0)| \Big)
        + \frac{t^2}{8} \Big( \big|C^{-1/2} g(u_0)\big|^2 - \big|C^{-1/2} g(u_k)\big|^2 \Big)
        + \frac{t}{2} \sum_{\ell=0}^{k-1} \Big( \big\langle v_\ell,\, C^{-1} g(u_\ell) \big\rangle + \big\langle v_{\ell+1},\, C^{-1} g(u_{\ell+1}) \big\rangle \Big)
Connections between ∞-dimensional MCMC

Remark
As a direct corollary, when k = 1, ∞-(m)HMC reduces to ∞-(m)MALA.

    ∞-MALA  --[position-dependent preconditioner K(u)]-->  ∞-mMALA  --[h = 4]-->  SN
       |                                                      |
       [multiple steps (k > 1)]                               [multiple steps (k > 1)]
       v                                                      v
    ∞-HMC   --[position-dependent preconditioner K(u)]-->  ∞-mHMC
Dimension Reduction
intrinsic low-dimensional subspace

A particular choice of K(u) = G(u)^{-1} could be

    K(u) = (C^{-1} + H(u))^{-1}, \qquad H(u) = E^{Y|u}\big[ D\Phi(u) \otimes D\Phi(u) \big]    (14)

It is computationally infeasible to update the Fisher metric H(u) at each u. Given an eigenbasis {φ_i(x)}, we define the projection operator P_r as follows:

    P_r : X \to X_r, \qquad u \mapsto u_r := \sum_{i=1}^{r} \varphi_i\, \langle \varphi_i, u \rangle    (15)

Truncate H(u) on the r-dimensional subspace X_r ⊂ X (X = X_r + X_⊥):

    H_r(u)(v, w) = \big\langle P_r v,\; E^{Y|u}\big[D_r\Phi(u)\, D_r\Phi(u)^T\big] P_r w \big\rangle, \qquad \forall v, w \in X    (16)

Then K(u)^{-1} can be approximated:

    K(u)^{-1} \approx H_r(u) + C^{-1}    (17)
Prior-Based Dimension Reduction

With the eigen-pairs {λ_i, u_i(x)}, the prior covariance operator C can be written and approximated as

    C = U \Lambda U^* \approx U_r \Lambda_r U_r^*    (18)

Then we can approximate the posterior covariance:

    K(u) = (C^{-1} + H(u))^{-1} \approx C + U_r \Lambda_r^{1/2} (D_r - I_r) \Lambda_r^{1/2} U_r^*    (19)

where D_r := (\hat{H}_r(u) + I_r)^{-1} and \hat{H}_r(u) := \Lambda_r^{1/2} U_r^*\, H(u)\, U_r \Lambda_r^{1/2}.

By applying U_r^* and U_⊥^* to the Langevin SDE (3) respectively, we get

    du_r = -\frac{1}{2} D_r u_r\, dt - \frac{\gamma_r}{2} D_r \nabla_{u_r} \Phi(u; y)\, dt + \sqrt{D_r}\, dW_r    (20a)
    du_\perp = -\frac{1}{2} u_\perp\, dt - \frac{\gamma_\perp}{2} \nabla_{u_\perp} \Phi(u; y)\, dt + dW_\perp    (20b)

where u_r = \Lambda_r^{-1/2} U_r^* u and u_\perp = \Lambda_\perp^{-1/2} U_\perp^* u.
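The low-rank update (19) is exact whenever H acts only within span(U_r), which makes it easy to sanity-check numerically. The sketch below is our own small construction (not from the talk): a random prior covariance C = UΛU*, an operator H of rank r supported on the leading prior eigenspace, and a comparison of (19) against the exact inverse (C⁻¹ + H)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 3

# Prior covariance C = U Λ U* with orthonormal U and a decaying spectrum.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.sort(rng.uniform(0.1, 2.0, n))[::-1]
C = U @ np.diag(lam) @ U.T

# A Fisher-information-like operator H of rank r supported on span(U_r).
Ur, Lr = U[:, :r], np.diag(lam[:r])
M = rng.standard_normal((r, r)); M = M @ M.T            # symmetric PSD
H = Ur @ M @ Ur.T

# Low-rank update (19): K ≈ C + U_r Λ_r^{1/2} (D_r - I_r) Λ_r^{1/2} U_r*
Hr_hat = np.sqrt(Lr) @ Ur.T @ H @ Ur @ np.sqrt(Lr)      # Λ_r^{1/2} U_r* H U_r Λ_r^{1/2}
Dr = np.linalg.inv(Hr_hat + np.eye(r))
K_approx = C + Ur @ np.sqrt(Lr) @ (Dr - np.eye(r)) @ np.sqrt(Lr) @ Ur.T

# Exact posterior-covariance-style inverse for comparison.
K_exact = np.linalg.inv(np.linalg.inv(C) + H)
```

Here `np.allclose(K_approx, K_exact)` holds because H is exactly rank r in the prior eigenbasis; for a general H the formula is only an approximation whose error lives in the discarded subspace.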
Karhunen-Loève Expansion

Let X be the Hilbert space H = L^2(D; R) on a bounded open D ⊂ R^d with Lipschitz boundary, with inner product ⟨·, ·⟩ and norm ‖·‖. Consider the following covariance operator C:

    C := \sigma^2 (\alpha I - \Delta)^{-s}    (21)

Let {λ_i^2} and {φ_i(x)} denote the eigenvalues and eigenfunctions of C. If s > d/2 and λ_i ≍ i^{-s/d}, then C defines a Gaussian measure N(0, C) such that each draw u(·) ∼ N(0, C) admits the Karhunen-Loève (K-L) expansion (Adler, 1981; Bogachev, 1998; Dashti and Stuart, 2015):

    u(x) = \sum_{i=0}^{+\infty} u_i\, \lambda_i\, \varphi_i(x), \qquad u_i \overset{iid}{\sim} N(0, 1)    (22)
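A truncated K-L sampler is straightforward once the eigen-pairs of C are known in closed form. The sketch below is an illustration under assumptions of our choosing (not from the talk): D = (0, 1) with a Dirichlet Laplacian, so φ_i(x) = √2 sin(iπx) and λ_i² = σ²(α + (iπ)²)^{-s}; other domains or boundary conditions change only those two lines.

```python
import numpy as np

def kl_sample(x, n_terms, sigma=1.0, alpha=1.0, s=1.0, rng=None):
    """Draw u ~ N(0, C), C = sigma^2 (alpha*I - Laplacian)^{-s}, via the
    truncated K-L expansion u(x) = sum_i u_i * lambda_i * phi_i(x),
    u_i ~ N(0,1) i.i.d.  Assumes D = (0,1) with Dirichlet Laplacian,
    so phi_i(x) = sqrt(2) sin(i*pi*x)."""
    rng = np.random.default_rng() if rng is None else rng
    i = np.arange(1, n_terms + 1)
    lam = sigma * (alpha + (i * np.pi) ** 2) ** (-s / 2)   # lambda_i
    phi = np.sqrt(2.0) * np.sin(np.outer(x, i) * np.pi)    # phi_i at each x
    return phi @ (lam * rng.standard_normal(n_terms))
```

With d = 1 this gives λ_i ≍ i^{-s}, so larger s yields smoother draws and faster convergence of the truncation.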
Laminar Jet
formulation of the inverse problem

We consider the following 2d Navier-Stokes equations.

Momentum equation:

    -\operatorname{div}\big(\nu (\nabla u + \nabla u^T)\big) + u \cdot \nabla u + \nabla p = 0    (23)

where u = (u, v) is the velocity field and p the pressure.

Continuity equation:

    \operatorname{div} u = 0    (24)

Denote by σ_n = −p n + ν(∇u + ∇u^T) · n the boundary traction. The boundary conditions are

    u · n = -θ(y),  σ_n × n = 0   on I := {x = 0, y ∈ (-L_y/2, L_y/2)}
    σ_n + β (u · n)_- u = 0        on O := {x = L_x, y ∈ (-L_y/2, L_y/2)}
    u · n = 0,  σ_n × n = 0        on B := {x ∈ (0, L_x), y = ±L_y/2}

where (u · n)_- = (u · n − |u · n|)/2, and β ∈ (0, 1] is the backflow stabilization parameter.
Laminar Jet
results

[Figure] Laminar Jet problem: the location of observations (left) and the forward PDE solutions with the true unknown (right). Fluid viscosity ν = 3 × 10^{-2}.

[Figure] Laminar Jet test problem: true inflow velocity profiles with different numbers of modes (left) and posterior estimates given by different algorithms (right). The shaded region shows the 95% credible band estimated with samples from ∞-mHMC.

[Figure] Laminar Jet problem: trace plots of the data-misfit function evaluated at each sample (left; values have been offset for easier comparison) and the autocorrelation of data-misfits as a function of lag (right).
Likelihood-Informed Dimension Reduction

By whitening the coordinates, v_i(x) = C^{-1/2} u_i(x), the generalized eigenproblem

    H(u)\, u_i(x) = \lambda_i\, C^{-1} u_i(x)    (25)

can be shown to be equivalent to the eigenproblem for the ppGNH H(v):

    H(v)\, v_i(x) = C^{1/2} H(u)\, C^{1/2} v_i(x) = \lambda_i\, v_i(x)    (26)

The local posterior covariance is approximated (with D_r := (I_r + \Lambda_r)^{-1}) by

    K(v) = (I + H(v))^{-1} \approx I + V_r (D_r - I_r) V_r^*    (27)

By applying V_r^* and V_⊥^* to the whitened Langevin SDE respectively,

    dv_r = -\frac{1}{2} D_r v_r\, dt - \frac{\gamma_r}{2} D_r \nabla_{v_r} \Phi(v; y)\, dt + \sqrt{D_r}\, dW_r    (28a)
    dv_\perp = -\frac{1}{2} v_\perp\, dt - \frac{\gamma_\perp}{2} \nabla_{v_\perp} \Phi(v; y)\, dt + dW_\perp    (28b)

where v_r = V_r^* v and v_\perp = V_\perp^* v.
Connection to DILI (Cui et al., 2016)

By considering the approximation H(v) ≈ V_r Λ_r V_r^*, we have the posterior covariance projected onto the r-dimensional subspace:

    K_r = \mathrm{Cov}_\mu[V_r^* v] = V_r^* (I + H(v))^{-1} V_r \approx D_r := (I_r + \Lambda_r)^{-1}    (29)

while DILI computes the posterior covariance in the subspace, K_r := \mathrm{Cov}_\mu[V_r^* v], empirically. Both lead to the following approximate posterior covariance:

    K_v = \mathrm{Cov}_\mu[v] \approx V_r K_r V_r^* + I - V_r V_r^*    (30)

Since we work directly with (29), the empirical calculation of K_r is avoided; the approximation (29) is already in diagonal form, so the rotation Ψ_r = V_r W_r used in DILI is not needed. Both capture similar geometric features of the subspace.
Elliptic Inverse Problem
formulation of the inverse problem

Consider the elliptic inverse problem, as in DILI (Cui et al., 2016), defined on the unit square domain Ω = [0, 1]^2:

    -\nabla \cdot \big(k(s)\, \nabla p(s)\big) = f(s), \quad s \in \Omega
    \big\langle k(s)\, \nabla p(s),\, n(s) \big\rangle = 0, \quad s \in \partial\Omega
    \int_{\partial\Omega} p(s)\, dl(s) = 0    (31)

where k(s) is the transmissivity field, p(s) is the potential function, f(s) is the forcing term, and n(s) is the outward normal to the boundary. The inverse problem is to infer u = log k from observations {y_n}. The 25 observations arise from the solutions (solved on an 80 × 80 mesh) contaminated by additive Gaussian error:

    y_n = p(x_n) + \varepsilon_n, \qquad \varepsilon_n \sim N(0, \sigma_y^2), \qquad \mathrm{SNR} := \max_s \{u(s)\} / \sigma_y = 100
Elliptic Inverse Problem
results

[Figure] Elliptic inverse problem (SNR = 100): Bayesian posterior mean estimates of the log-transmissivity field u(s) based on 2000 samples from various MCMC algorithms; the upper-left corner shows the MAP estimate.
Elliptic Inverse Problem
results

Method        | h          | AP   | s/iter | ESS (min, med, max)      | minESS/s | spdup | PDEsolns
pCN           | 0.01       | 0.57 | 0.99   | (2.67, 6.95, 37.79)      | 0.0013   | 1.00  | 2501
∞-MALA        | 0.04       | 0.61 | 1.62   | (4.32, 15.34, 51.45)     | 0.0013   | 0.99  | 5002
∞-HMC         | 0.04       | 0.59 | 3.52   | (24.36, 92.13, 184.84)   | 0.0035   | 2.57  | 12342
DR-∞-mMALA    | 0.52       | 0.67 | 8.85   | (127.25, 210.84, 460.07) | 0.0072   | 5.34  | 80032
DR-∞-mHMC     | 0.25       | 0.56 | 22.97  | (190.2, 322.29, 687.11)  | 0.0041   | 3.08  | 198176
DILI          | (0.1, 0.2) | 0.69 | 1.59   | (30.52, 133.67, 221.97)  | 0.0096   | 7.13  | 6612
aDR-∞-mMALA   | 0.25       | 0.71 | 1.61   | (12.09, 89.17, 174.36)   | 0.0037   | 2.79  | 6612
aDR-∞-mHMC    | 0.10       | 0.69 | 3.63   | (70.99, 234.42, 364.31)  | 0.0098   | 7.26  | 14056

Sampling efficiency in the elliptic inverse problem (SNR = 100). Column labels: h, step size(s) used for making the MCMC proposal; AP, average acceptance probability; s/iter, average seconds per iteration; ESS (min, med, max), minimum, median, and maximum effective sample size across all posterior coordinates; minESS/s, minimum ESS per second; spdup, speed-up relative to the baseline pCN algorithm; PDEsolns, number of PDE solutions during execution.
Representation of Probability Densities

Consider probability distributions over a smooth manifold D. Having fixed a background measure µ, let

    \mathcal{P} := \Big\{ p : D \to \mathbb{R} \;\Big|\; p \geq 0,\ \int_D p(x)\, \mu(dx) = 1 \Big\}    (32)

Define the following nonparametric Fisher metric on the tangent space T_p\mathcal{P} := \{ \varphi \in C^\infty(D) \mid \int_D \varphi(x)\, \mu(dx) = 0 \}:

    g_F(\varphi, \psi)_p := \int_D \frac{\varphi(x)\, \psi(x)}{p(x)}\, \mu(dx)    (33)

The square-root mapping S : (\mathcal{P}, g_F) \to (\mathcal{Q}, \langle \cdot, \cdot \rangle_2), S(p) = q = 2\sqrt{p}, is a Riemannian isometry, where \mathcal{Q} is the ∞-dimensional sphere in L^2(D):

    \mathcal{Q} := \Big\{ q : D \to \mathbb{R} \;\Big|\; \int_D q(x)^2\, \mu(dx) = 1 \Big\}, \qquad \langle f, h \rangle_2 = \int_D f h\, d\mu(x)    (34)
Nonparametric Density Modeling

It is easier to work with the root density q ∈ Q (e.g. clean geodesic flow). Restrict the Gaussian process prior q(·) ∼ GP(0, K(·)) to Q, where the covariance operator K = σ^2(α − Δ)^{−s} has eigen-pairs {λ_i^2, φ_i(x)}_{i=1}^∞. Then

    \Big\| q(x) \Big\|^2 = \Big\| \sum_{i=1}^{\infty} q_i\, \varphi_i(x) \Big\|^2 = 1 \quad \text{with } q_i \sim N(0, \lambda_i^2)

implies

    \|q\|_2^2 := \sum_{i=1}^{\infty} q_i^2 = 1, \quad \text{i.e. } q := (q_i) \in S^\infty    (35)

Given data x = \{x_n \in D\}_{n=1}^N, we have the posterior density

    \pi(q \mid x) \propto \pi(q)\, \pi(x \mid q) = \prod_{i=1}^{\infty} \exp\big(-q_i^2 / (2\lambda_i^2)\big)\; \delta_{\|q\|_2}(1) \prod_{n=1}^{N} q^2(x_n)    (36)

Sampling q = (q_i) can be done by spherical HMC (Lan et al., 2014).
Spherical Hamiltonian Monte Carlo (Lan et al., 2014)

Truncate q := (q_i) at I: q := (q_i)_{i=1}^I. Define the total energy

    E(q, v) := U(q) + K(v; q) = \tilde{U}(q) + K_0(v; q)    (37)
    \tilde{U}(q) := U(q) - \frac{1}{2} \log|G(q_{-I})| = -\log \pi(q \mid x) + \log|q_I|    (38)
    K_0(v; q) := \frac{1}{2} v_{-I}^T\, G(q_{-I})\, v_{-I} = \frac{1}{2} v^T v    (39)

Discretizing the Hamiltonian equation results in

    v^- = v - \frac{h}{2} P(q)\, g(q),
    \begin{pmatrix} q' \\ v^+ \end{pmatrix} = \begin{pmatrix} r & 0 \\ 0 & \|v^-\|_2 \end{pmatrix} \begin{pmatrix} \cos(\|v^-\|_2\, r^{-1} h) & \sin(\|v^-\|_2\, r^{-1} h) \\ -\sin(\|v^-\|_2\, r^{-1} h) & \cos(\|v^-\|_2\, r^{-1} h) \end{pmatrix} \begin{pmatrix} r^{-1} & 0 \\ 0 & \|v^-\|_2^{-1} \end{pmatrix} \begin{pmatrix} q \\ v^- \end{pmatrix},
    v' = v^+ - \frac{h}{2} P(q')\, g(q')    (40)

where g(q) := \nabla_q \tilde{U}(q) and P(q) := I_D - r^{-2} q q^T.
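The update (40) is a kick-geodesic-kick step that stays exactly on the sphere. A finite-dimensional sketch of this structure (our own illustration, assuming a tangent initial velocity q·v = 0; `grad_U` plays the role of g(q)):

```python
import numpy as np

def sphere_leapfrog(q, v, h, grad_U, r=1.0):
    """One kick-geodesic-kick step on the sphere |q|_2 = r: half kick
    with the projected gradient P(q) g(q), exact geodesic flow (a
    rotation in the (q, v) plane), then another half kick."""
    P = lambda w: np.eye(w.size) - np.outer(w, w) / r**2   # P(q) = I - q q^T / r^2
    v = v - 0.5 * h * P(q) @ grad_U(q)
    nv = np.linalg.norm(v)
    a = nv * h / r                                         # rotation angle |v|_2 r^{-1} h
    q, v = (np.cos(a) * q + np.sin(a) * (r / nv) * v,
            -np.sin(a) * (nv / r) * q + np.cos(a) * v)
    v = v - 0.5 * h * P(q) @ grad_U(q)
    return q, v
```

With a zero gradient the step reduces to the exact geodesic flow: ‖q‖, q·v, and hence ‖v‖ are preserved up to rounding, which is why no reprojection onto the sphere is needed.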
Conclusion

Geometric information (gradient, metric) helps MCMC mix, but comes at a computational cost, which can be alleviated by dimension reduction.
Key: find an intrinsic finite-dimensional subspace where most of the information concentrates and manifold MCMC can be applied.
Light-computation algorithms, e.g. pCN and preconditioned MALA, can be applied in the less informative complementary subspace.
MCMC defined on manifolds can help solve challenging statistical problems, e.g. density estimation and modeling covariance/correlation matrices.
References I

Adler, R. J. (1981). The Geometry of Random Fields, volume 62. SIAM.
Beskos, A. (2014). A stable manifold MCMC method for high dimensions. Statistics & Probability Letters, 90:46–52.
Beskos, A., Pinski, F. J., Sanz-Serna, J. M., and Stuart, A. M. (2011). Hybrid Monte Carlo on Hilbert spaces. Stochastic Processes and their Applications, 121:2201–2230.
Beskos, A., Roberts, G., Stuart, A., and Voss, J. (2008). MCMC methods for diffusion bridges. Stochastics and Dynamics, 8(03):319–350.
Bogachev, V. I. (1998). Gaussian Measures. Number 62. American Mathematical Society.
Cotter, S. L., Roberts, G. O., Stuart, A., and White, D. (2013). MCMC methods for functions: modifying old algorithms to make them faster. Statistical Science, 28(3):424–446.
Cui, T., Law, K. J., and Marzouk, Y. M. (2016). Dimension-independent likelihood-informed MCMC. Journal of Computational Physics, 304:109–137.
Dashti, M. and Stuart, A. M. (2015). The Bayesian approach to inverse problems. ArXiv e-prints.
Girolami, M. and Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society, Series B (with discussion), 73(2):123–214.
Lan, S., Zhou, B., and Shahbaba, B. (2014). Spherical Hamiltonian Monte Carlo for constrained target distributions. In Proceedings of The 31st International Conference on Machine Learning, pages 629–637, Beijing, China.
Neal, R. M. (2010). MCMC using Hamiltonian dynamics. In Brooks, S., Gelman, A., Jones, G., and Meng, X. L., editors, Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC.
References II

Richtmyer, R. D. and Morton, K. W. (1994). Difference Methods for Initial-Value Problems, 2nd ed. Malabar, FL: Krieger Publishing Co.
Tierney, L. (1998). A note on Metropolis-Hastings kernels for general state spaces. Annals of Applied Probability, pages 1–9.
Verlet, L. (1967). Computer "experiments" on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys. Rev., 159(1):98–103.