We examine the effectiveness of randomized quasi-Monte Carlo (RQMC) in improving the convergence rate of the mean integrated squared error, compared with crude Monte Carlo (MC), when estimating the density of a random variable X defined as a function over the s-dimensional unit cube (0,1)^s. We consider histograms and kernel density estimators. We show both theoretically and empirically that RQMC estimators can achieve faster convergence rates in some situations.
This is joint work with Amal Ben Abdellah, Art B. Owen, and Florian Puchhammer.
A fundamental numerical problem in many sciences is to compute integrals. These integrals can often be expressed as expectations and then approximated by sampling methods. Monte Carlo sampling is very competitive in high dimensions, but has a slow rate of convergence. One reason for this slowness is that the MC points form clusters and gaps. Quasi-Monte Carlo methods greatly reduce such clusters and gaps, and under modest smoothness demands on the integrand they can greatly improve accuracy. This can even take place in problems of surprisingly high dimension. This talk will introduce the basics of QMC and randomized QMC. It will include discrepancy and the Koksma-Hlawka inequality, some digital constructions and some randomized QMC methods that allow error estimation and sometimes bring improved accuracy.
In this talk we consider the question of how to use QMC with an empirical dataset, such as a set of points generated by MCMC. Using ideas from partitioning for parallel computing, we apply recursive bisection to reorder the points, and then interleave the bits of the QMC coordinates to select the appropriate point from the dataset. Numerical tests show that in the case of known distributions this is almost as effective as applying QMC directly to the original distribution. The same recursive bisection can also be used to thin the dataset, by recursively bisecting down to many small subsets of points, and then randomly selecting one point from each subset. This makes it possible to reduce the size of the dataset greatly without significantly increasing the overall error. Co-author: Fei Xie
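A minimal sketch of the thinning variant described at the end of this abstract, assuming the dataset is a NumPy array; the function name, the median split along the widest coordinate, and the parameters are illustrative rather than the authors' exact scheme.

    import numpy as np

    def bisection_thin(points, levels, rng=None):
        """Recursively bisect `points` (an (n, d) array) `levels` times, splitting
        each subset at the median of its widest coordinate, then keep one randomly
        chosen representative per final subset (2**levels points in total)."""
        rng = np.random.default_rng() if rng is None else rng
        subsets = [points]
        for _ in range(levels):
            refined = []
            for s in subsets:
                axis = int(np.argmax(s.max(axis=0) - s.min(axis=0)))  # widest coordinate
                order = np.argsort(s[:, axis])
                half = len(s) // 2
                refined += [s[order[:half]], s[order[half:]]]
            subsets = refined
        return np.array([s[rng.integers(len(s))] for s in subsets if len(s) > 0])

    # Thin 10 000 MCMC-like draws down to 2**7 = 128 representatives.
    draws = np.random.default_rng(0).normal(size=(10_000, 2))
    thinned = bisection_thin(draws, levels=7)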
One of the central tasks in computational mathematics and statistics is to accurately approximate unknown target functions. This is typically done with the help of data: samples of the unknown functions. The emergence of Big Data presents both opportunities and challenges. On one hand, big data introduces more information about the unknowns and, in principle, allows us to create more accurate models. On the other hand, data storage and processing become highly challenging. In this talk, we present a set of sequential algorithms for function approximation in high dimensions with large data sets. The algorithms are of an iterative nature and involve only vector operations. They use one data sample at each step and can handle dynamic/streaming data. We present both the numerical algorithms, which are easy to implement, and rigorous analysis of their theoretical foundation.
The generation of Gaussian random fields over a physical domain is a challenging problem in computational mathematics, especially when the correlation length is short and the field is rough. The traditional approach is to use a truncated Karhunen-Loeve (KL) expansion, but the generation of even a single realisation of the field may then be effectively beyond reach (especially for 3-dimensional domains) if one needs an expected $L^2$ error of, say, 5%, because of the potentially very slow convergence of the KL expansion. In this talk, based on joint work with Ivan Graham, Frances Kuo, Dirk Nuyens, and Rob Scheichl, a completely different approach is used, in which the field is initially generated at a regular grid on a 2- or 3-dimensional rectangle that contains the physical domain, and then possibly interpolated to obtain the field at other points. In that case there is no need for any truncation. Rather, the main problem becomes the factorisation of a large dense matrix. For this we use circulant embedding and FFT ideas. Quasi-Monte Carlo integration is then used to evaluate the expected value of some functional of the finite-element solution of an elliptic PDE with a random field as input.
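For intuition, a minimal 1D sketch of the circulant embedding idea, with an assumed exponential covariance on a regular grid; the covariance, grid, and function name are illustrative, and a production code must handle embeddings whose FFT eigenvalues come out negative (e.g., by padding further).

    import numpy as np

    def sample_field_1d(n, length_scale, rng=None):
        """Sample a stationary Gaussian field with covariance exp(-|t|/length_scale)
        at n grid points on [0, 1], via circulant embedding and the FFT."""
        rng = np.random.default_rng() if rng is None else rng
        t = np.arange(n) / (n - 1)
        c = np.exp(-t / length_scale)            # first row of the Toeplitz covariance
        col = np.concatenate([c, c[-2:0:-1]])    # embed into a circulant of size 2n - 2
        lam = np.fft.fft(col).real               # eigenvalues of the circulant
        if lam.min() < 0:
            raise ValueError("embedding not positive semidefinite; pad further")
        m = len(col)
        z = rng.normal(size=m) + 1j * rng.normal(size=m)
        y = np.fft.fft(np.sqrt(lam / m) * z)     # real and imaginary parts are two
        return y.real[:n]                        # independent realisations; keep one

    sample = sample_field_1d(1024, length_scale=0.1)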
Multidimensional integrals may be approximated by weighted averages of integrand values. Quasi-Monte Carlo (QMC) methods are more accurate than simple Monte Carlo methods because they carefully choose where to evaluate the integrand. This tutorial focuses on how quickly QMC methods converge to the correct answer as the number of integrand values increases. The answer may depend on the smoothness of the integrand and the sophistication of the QMC method. QMC error analysis may assume that the integrand belongs to a reproducing kernel Hilbert space, or may assume that the integrand is an instance of a stochastic process with known covariance structure. These two approaches have interesting parallels. This tutorial also explores how the computational cost of achieving a good approximation to the integral depends on the dimension of the domain of the integrand. Finally, this tutorial explores methods for determining how many integrand values are needed to satisfy the error tolerance. Relevant software is described.
We present recent results on the numerical analysis of Quasi-Monte Carlo quadrature methods applied to forward and inverse uncertainty quantification for elliptic and parabolic PDEs. Particular attention will be placed on higher-order QMC, the stable and efficient generation of interlaced polynomial lattice rules, and the numerical analysis of multilevel QMC finite element discretizations, with applications to computational uncertainty quantification.
Optimal interval clustering: Application to Bregman clustering and statistica... (Frank Nielsen)
We present a generic dynamic programming method to compute the optimal clustering of n scalar elements into k pairwise disjoint intervals. This case includes 1D Euclidean k-means, k-medoids, k-medians, k-centers, etc. We extend the method to incorporate cluster size constraints and show how to choose the appropriate k by model selection. Finally, we illustrate and refine the method on two case studies: Bregman clustering and statistical mixture learning maximizing the complete likelihood.
http://arxiv.org/abs/1403.2485
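As a concrete instance of the dynamic program, a minimal O(n^2 k) sketch for the 1D k-means case with squared-error cost; the paper's method is more general (arbitrary Bregman divergences, size constraints), and all names here are illustrative.

    import numpy as np

    def optimal_interval_kmeans(x, k):
        """Optimal clustering of 1D data into k contiguous intervals, minimizing the
        within-cluster sum of squared deviations, by dynamic programming."""
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        s1 = np.concatenate([[0.0], np.cumsum(x)])       # prefix sums
        s2 = np.concatenate([[0.0], np.cumsum(x ** 2)])  # prefix sums of squares

        def cost(i, j):  # SSE of the cluster x[i:j] (half-open), in O(1)
            return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

        dp = np.full((k + 1, n + 1), np.inf)  # dp[c, j]: best cost of x[:j] with c clusters
        dp[0, 0] = 0.0
        cut = np.zeros((k + 1, n + 1), dtype=int)
        for c in range(1, k + 1):
            for j in range(c, n + 1):
                for i in range(c - 1, j):
                    v = dp[c - 1, i] + cost(i, j)
                    if v < dp[c, j]:
                        dp[c, j], cut[c, j] = v, i
        bounds, j = [], n                      # recover the intervals by backtracking
        for c in range(k, 0, -1):
            bounds.append((cut[c, j], j))
            j = cut[c, j]
        return dp[k, n], bounds[::-1]

    sse, intervals = optimal_interval_kmeans([1.0, 1.1, 5.0, 5.2, 9.9], k=2)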
MVPA with SpaceNet: sparse structured priors (Elvis Dohmatob)
The GraphNet (aka S-Lasso), as well as other "sparsity + structure" priors like TV (Total Variation), TV-L1, etc., are not easily applicable to brain data because of technical problems relating to the selection of the regularization parameters. Also, in their own right, such models lead to challenging high-dimensional optimization problems. In this manuscript, we present some heuristics for speeding up the overall optimization process: (a) early stopping, whereby one halts the optimization process when the test score (performance on left-out data) for the internal cross-validation for model selection stops improving, and (b) univariate feature screening, whereby irrelevant (non-predictive) voxels are detected and eliminated before the optimization problem is entered, thus reducing the size of the problem. Empirical results with GraphNet on real MRI (Magnetic Resonance Imaging) datasets indicate that these heuristics are a win-win strategy, as they add speed without sacrificing the quality of the predictions. We expect the proposed heuristics to work on other models like TV-L1, etc.
The standard Galerkin formulation of acoustic wave propagation, governed by the Helmholtz partial differential equation (PDE), is indefinite for large wavenumbers. However, the Helmholtz PDE is in general not indefinite. The lack of coercivity (indefiniteness) is one of the major difficulties for the approximation and simulation of heterogeneous-media wave propagation models, including applications to stochastic wave propagation quasi-Monte Carlo (QMC) analysis. We will present a new class of sign-definite continuous and discrete preconditioned FEM Helmholtz wave propagation models.
QMC algorithms usually rely on a choice of N evenly distributed integration nodes in $[0,1)^d$. A common means to assess such an equidistribution property for a point set or sequence is the so-called discrepancy function, which compares the actual number of points to the expected number of points (assuming a uniform distribution on $[0,1)^{d}$) that lie within an arbitrary axis-parallel rectangle anchored at the origin. The dependence of the integration error of QMC rules on various norms of the discrepancy function is made precise by the well-known Koksma--Hlawka inequality and its variations. In many cases, such as the $L^{p}$ spaces with $1<p<\infty$, the best growth rate in terms of the number of points N, as well as corresponding explicit constructions, are known. In the classical setting $p=\infty$, sharp results are absent already for $d\geq3$ and appear to be intriguingly hard to obtain. This talk shall serve as a survey of discrepancy theory with a special emphasis on the $L^{\infty}$ setting. Furthermore, it highlights the evolution of recent techniques and presents the latest results.
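For a small numerical illustration of the discrepancy function itself, the sketch below evaluates it on a grid of anchored boxes; such a grid search only lower-bounds the star discrepancy, whose exact computation is hard already for d >= 3, as the abstract notes. All names are illustrative.

    import numpy as np

    def local_discrepancy(points, u):
        """Discrepancy function at corner u: fraction of points in [0, u) minus vol[0, u)."""
        return np.all(points < u, axis=1).mean() - np.prod(u)

    def star_discrepancy_lower_bound(points, grid=20):
        """Maximum |discrepancy function| over a grid of corners: a lower bound on D*."""
        ticks = np.linspace(0.05, 1.0, grid)
        d = points.shape[1]
        corners = np.stack(np.meshgrid(*([ticks] * d)), axis=-1).reshape(-1, d)
        return max(abs(local_discrepancy(points, u)) for u in corners)

    rng = np.random.default_rng(1)
    print(star_discrepancy_lower_bound(rng.random((256, 2))))  # plain MC points, d = 2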
Computational Information Geometry: A quick review (ICMS) (Frank Nielsen)
From the workshop "Computational information geometry for image and signal processing", Sep 21-25, 2015, ICMS, 15 South College Street, Edinburgh. http://www.icms.org.uk/workshop.php?id=343
In the study of probabilistic integrators for deterministic ordinary differential equations, one goal is to establish the convergence (in an appropriate topology) of the random solutions to the true deterministic solution of an initial value problem defined by some operator. The challenge is to identify the right conditions on the additive noise with which one constructs the probabilistic integrator, so that the convergence of the random solutions has the same order as the underlying deterministic integrator. In the context of ordinary differential equations, Conrad et al. (Stat. Comput., 2017) established the mean square convergence of the solutions for globally Lipschitz vector fields, under the assumptions of i.i.d., state-independent, mean-zero Gaussian noise. We extend their analysis by considering vector fields that need not be globally Lipschitz, and by considering non-Gaussian, non-i.i.d. noise that can depend on the state and can have nonzero mean. A key assumption is a uniform moment bound condition on the noise. We obtain convergence in the stronger topology of the uniform norm, and establish results that connect this topology to the regularity of the additive noise. Joint work with A. M. Stuart (Caltech) and T. J. Sullivan (Free University of Berlin).
We combine low-rank tensor techniques and the FFT to compute kriging estimates, estimate variance, and compute conditional covariance. We are able to solve 3D problems at very high resolution.
Sequential quasi-Monte Carlo (SQMC) is a quasi-Monte Carlo (QMC) version of sequential Monte Carlo (or particle filtering), a popular class of Monte Carlo techniques used to carry out inference in state space models. In this talk I will first review the SQMC methodology as well as some theoretical results. Although SQMC converges faster than the usual Monte Carlo error rate, its performance deteriorates quickly as the dimension of the hidden variable increases. However, I will show with an example that SQMC may perform well for some "high" dimensional problems. I will conclude this talk with some open problems and potential applications of SQMC in complicated settings.
In this tutorial I will provide a survey of recent research efforts on the application of QMC methods to PDEs with random coefficients. Such PDE problems occur in the area of uncertainty quantification. A prime example is the flow of water through a disordered porous medium. There is a huge body of literature on this topic using a variety of methods. QMC methods are relatively new to this application area. The aim of this tutorial is to provide an entry point for QMC experts wanting to start research in this direction, for PDE analysts and practitioners wanting to tap into contemporary QMC theory and methods, and for anyone else who sees how to cross-fertilize the ideas to other application areas.
Implicit schemes are needed in order to achieve fast runtimes in wave models. Parallelization using the Message Passing Interface is needed in order to run on computers with thousands of processors. Implicit schemes rely on preconditioners in order for the iterative schemes to converge fast. Thus we need fast preconditioners, and we present some here.
My talk at the International Conference on Computational Finance 2019 (ICCF2019). The talk is about designing new efficient methods for option pricing under the rough Bergomi model.
Minimum mean square error estimation and approximation of the Bayesian update (Alexander Litvinenko)
We develop a surrogate for the Bayesian update. Our formula allows us to update polynomial chaos expansion (PCE) coefficients: in contrast to the classical Bayesian approach, we suggest updating the PCE coefficients directly. We show that the classical Kalman filter is a particular case of our update.
We compute a low-rank surrogate (response surface) approximation to the solution of a stochastic PDE. This is a Karhunen-Loeve/polynomial chaos approximation. After that, to compute the required statistics, we sample this cheap surrogate, avoiding the very expensive solution of the deterministic problem.
We compute polynomial-based surrogates for all components of the solution of the Navier-Stokes equations. We compress the surrogate on the fly to reduce the cubic computational complexity to almost linear. All these surrogates are used to quantify uncertainties in numerical aerodynamics.
Talk presented at the workshop "Imaging With Uncertainty Quantification (IUQ)", September 2022: https://people.compute.dtu.dk/pcha/CUQI/IUQworkshop.html
We consider a weakly supervised classification problem: a classification problem where the target variable can be unknown or uncertain for some subset of samples. This problem appears when the labeling is impossible, time-consuming, or expensive. Noisy measurements and lack of data may prevent accurate labeling. Our task is to build an optimal classification function. For this, we construct and minimize a specific objective function, which includes the fitting error on labeled data and a smoothness term. Next, we use covariance and radial basis functions to define the degree of similarity between points. The further process involves the repeated solution of an extensive linear system with the graph Laplacian operator. To speed up this solution process, we introduce low-rank approximation techniques. We call the resulting algorithm WSC-LR. We then use the WSC-LR algorithm to analyze CT brain scans to recognize ischemic stroke disease. We also compare WSC-LR with other well-known machine learning algorithms.
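A minimal sketch of the kind of objective described above: a fitting error on labeled points plus a graph-Laplacian smoothness term built from RBF similarities. The function name and parameters are illustrative, and the dense solve shown here is exactly the step that the low-rank techniques in WSC-LR are meant to accelerate.

    import numpy as np

    def propagate_labels(X, y, labeled_mask, gamma=1.0, lam=0.1):
        """Minimize ||M(f - y)||^2 + lam * f^T L f, where L is the graph Laplacian
        of an RBF similarity graph and M selects the labeled samples; the solution
        solves (M + lam * L) f = M y."""
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-gamma * sq)                  # RBF similarity between all pairs
        L = np.diag(W.sum(axis=1)) - W           # graph Laplacian
        M = np.diag(labeled_mask.astype(float))  # fitting term acts on labeled data only
        return np.linalg.solve(M + lam * L, M @ y)

    X = np.random.default_rng(2).normal(size=(50, 3))
    y = (X[:, 0] > 0).astype(float)
    mask = np.zeros(50, dtype=bool)
    mask[:10] = True                             # only 10 labeled samples
    scores = propagate_labels(X, y * mask, mask)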
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy, and direct recordings with electrodes. Each of these measurement modes has its advantages and disadvantages concerning the resolution of the data in space and time, the directness of the measurement of neural activity, and the organisms to which it can be applied. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
Bayesian Additive Regression Trees (BART) has been shown to be an effective framework for modeling nonlinear regression functions, with strong predictive performance in a variety of contexts. The BART prior over a regression function is defined by independent prior distributions on tree structure and leaf or end-node parameters. In observational data settings, Bayesian Causal Forests (BCF) has successfully adapted BART for estimating heterogeneous treatment effects, particularly in cases where standard methods yield biased estimates due to strong confounding.
We introduce BART with Targeted Smoothing, an extension which induces smoothness over a single covariate by replacing independent Gaussian leaf priors with smooth functions. We then introduce a new version of the Bayesian Causal Forest prior, which incorporates targeted smoothing for modeling heterogeneous treatment effects which vary smoothly over a target covariate. We demonstrate the utility of this approach by applying our model to a timely women's health and policy problem: comparing two dosing regimens for an early medical abortion protocol, where the outcome of interest is the probability of a successful early medical abortion procedure at varying gestational ages, conditional on patient covariates. We discuss the benefits of this approach in other women’s health and obstetrics modeling problems where gestational age is a typical covariate.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
The world of health care is full of policy interventions: a state expands eligibility rules for its Medicaid program, a medical society changes its recommendations for screening frequency, a hospital implements a new care coordination program. After a policy change, we often want to know, “Did it work?” This is a causal question; we want to know whether the policy CAUSED outcomes to change. One popular way of estimating causal effects of policy interventions is a difference-in-differences study. In this controlled pre-post design, we measure the change in outcomes of people who are exposed to the new policy, comparing average outcomes before and after the policy is implemented. We contrast that change to the change over the same time period in people who were not exposed to the new policy. The differential change in the treated group’s outcomes, compared to the change in the comparison group’s outcomes, may be interpreted as the causal effect of the policy. To do so, we must assume that the comparison group’s outcome change is a good proxy for the treated group’s (counterfactual) outcome change in the absence of the policy. This conceptual simplicity and wide applicability in policy settings make difference-in-differences an appealing study design. However, the apparent simplicity belies a thicket of conceptual, causal, and statistical complexity. In this talk, I will introduce the fundamentals of difference-in-differences studies and discuss recent innovations including key assumptions and ways to assess their plausibility, estimation, inference, and robustness checks.
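The basic two-group, two-period estimator described here takes one line; the outcome means below are made up purely for illustration.

    # Made-up average outcomes: treated and control groups, before and after the policy.
    treated_pre, treated_post = 4.0, 6.5
    control_pre, control_post = 3.8, 4.9

    # Under parallel trends, the control change (1.1) proxies the treated group's
    # counterfactual change, so the estimated effect is 2.5 - 1.1 = 1.4.
    did = (treated_post - treated_pre) - (control_post - control_pre)
    print(did)  # 1.4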
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component for precision medicine and personalized health care. Specific topics covered in this talk include several recent projects with robust and flexible methods developed for the above research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs. However, ACWL seems more robust against tree-type misspecification than T-RL when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method called ST-RL for evaluating optimal DTRs.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented information about how causal inference is being used to determine the cost/benefit of the two most common surgical treatments for women: hysterectomy and myomectomy.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health, to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tool, which enables the use of observational data to learn how clinicians balance competing priorities for treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus on estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
We discuss a general roadmap for generating causal inference based on observational studies used to generate real-world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for the construction of asymptotically efficient plug-in estimators of a target estimand for realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure that first involves using ensemble machine learning, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates, and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric-model-based estimators) to build a single most powerful ensemble machine learning algorithm. We present the Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second step, TMLE involves maximizing a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data. This second step bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also review recent advances in collaborative TMLE, in which the fit of the treatment and censoring mechanism is tailored with respect to the performance of the TMLE. We also discuss asymptotically valid bootstrap-based inference. Simulations and data analyses are provided as demonstrations.
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when treatment effect size and treatment effect variation is small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This seminar discussed ways in which to produce professional academic writing, from academic papers to research proposals or technical writing in general.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most notable technologies in recent years. The first is the foundation of artificial intelligence and big data, and the second has significantly disrupted the financial industry. Both technologies are data-driven, and thus there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. In this paper, we review the research on combining blockchain and machine learning technologies and demonstrate that they can collaborate efficiently and effectively. In the end, we point out some future directions and expect more research on deeper integration of the two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
QMC: Transition Workshop - Density Estimation by Randomized Quasi-Monte Carlo - Pierre L'Ecuyer, May 7, 2018
1. Density estimation by Randomized Quasi-Monte Carlo
Pierre L’Ecuyer
Joint work with Amal Ben Abdellah, Art B. Owen, and Florian Puchhammer
SAMSI, Raleigh-Durham, North Carolina, May 2018
2. What this talk is about
Quasi-Monte Carlo (QMC) and randomized QMC (RQMC) methods have been studied extensively for estimating an integral, E[X]. Can they be useful for estimating the entire distribution of X? E.g., estimating a density, a cdf, some quantiles, etc.
When hours or days of computing time are required to perform simulation runs, reporting only a confidence interval on the mean is a waste of information! People routinely look at empirical distributions via histograms, for example. More refined methods: kernel density estimators (KDEs).
Can RQMC improve such density estimators, and by how much?
3. Setting
Classical density estimation was developed in the context where independent observations $X_1, \dots, X_n$ are given and one wishes to estimate the density f from which they come. Here we assume that $X_1, \dots, X_n$ are generated by simulation from a stochastic model. We can choose n and we have some freedom in how the simulation is performed.
The $X_i$'s are realizations of a random variable $X = g(U) \in \mathbb{R}$ with density f, where $U = (U_1, \dots, U_s) \sim U(0,1)^s$ and $g(u)$ can be computed easily for any $u \in (0,1)^s$.
Can we obtain a better estimate of f with RQMC instead of MC? How much better?
4. Density Estimation
Suppose we estimate the density f over a finite interval (a, b). Let $\hat f_n(x)$ denote the density estimator at x, with sample size n. We use the following measures of error:
$$\mathrm{MISE} = \text{mean integrated squared error} = \int_a^b \mathbb{E}\big[(\hat f_n(x) - f(x))^2\big]\,dx = \mathrm{IV} + \mathrm{ISB},$$
$$\mathrm{IV} = \text{integrated variance} = \int_a^b \mathrm{Var}[\hat f_n(x)]\,dx,$$
$$\mathrm{ISB} = \text{integrated squared bias} = \int_a^b \big(\mathbb{E}[\hat f_n(x)] - f(x)\big)^2\,dx.$$
5. Density Estimation
Simple histogram: Partition [a, b] into m intervals of size h = (b − a)/m and define
$$\hat f_n(x) = \frac{n_j}{nh} \quad \text{for } x \in I_j = [a + (j-1)h,\; a + jh), \quad j = 1, \dots, m,$$
where $n_j$ is the number of observations $X_i$ that fall in interval j.
Kernel density estimator (KDE): Select a kernel k (a unimodal symmetric density centered at 0) and a bandwidth h > 0 (which serves as a horizontal stretching factor for the kernel). The KDE is defined by
$$\hat f_n(x) = \frac{1}{nh} \sum_{i=1}^n k\!\left(\frac{x - X_i}{h}\right).$$
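Both estimators on this slide are straightforward to code; a minimal sketch with a Gaussian kernel, all names and parameters illustrative:

    import numpy as np

    def histogram_density(X, a, b, m):
        """Histogram density estimator on [a, b) with m bins of width h = (b - a)/m."""
        h = (b - a) / m
        counts, _ = np.histogram(X, bins=m, range=(a, b))
        def f_hat(x):                       # returns n_j / (n h) for the bin containing x
            j = np.clip(((np.asarray(x) - a) // h).astype(int), 0, m - 1)
            return counts[j] / (len(X) * h)
        return f_hat

    def kde(X, h):
        """Kernel density estimator with a Gaussian kernel and bandwidth h."""
        k = lambda w: np.exp(-0.5 * w ** 2) / np.sqrt(2.0 * np.pi)
        return lambda x: k((np.asarray(x)[:, None] - X[None, :]) / h).mean(axis=1) / h

    X = np.random.default_rng(3).normal(size=4096)
    xs = np.linspace(-3.0, 3.0, 64)
    f_hist = histogram_density(X, -3.0, 3.0, m=32)(xs)
    f_kde = kde(X, h=0.2)(xs)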
6. Asymptotic convergence with Monte Carlo for smooth f
For $g : \mathbb{R} \to \mathbb{R}$, define
$$R(g) = \int_a^b (g(x))^2\,dx, \qquad \mu_r(g) = \int_{-\infty}^{\infty} x^r g(x)\,dx \quad \text{for } r = 0, 1, 2, \dots$$
For histograms and KDEs, when $n \to \infty$ and $h \to 0$:
$$\mathrm{AMISE} = \mathrm{AIV} + \mathrm{AISB} \sim \frac{C}{nh} + B h^{\alpha},$$
with:
Histogram: $C = 1$, $B = R(f')/12$, $\alpha = 2$.
KDE: $C = \mu_0(k^2)$, $B = (\mu_2(k))^2 R(f'')/4$, $\alpha = 4$.
7. The asymptotically optimal h is
$$h^* = \left(\frac{C}{B\alpha n}\right)^{1/(\alpha+1)},$$
and it gives $\mathrm{AMISE} = K n^{-\alpha/(1+\alpha)}$:
Histogram: $h^* = (nR(f')/6)^{-1/3}$, $\mathrm{AMISE} = O(n^{-2/3})$.
KDE: $h^* = \left(\mu_0(k^2) / \big((\mu_2(k))^2 R(f'')\, n\big)\right)^{1/5}$, $\mathrm{AMISE} = O(n^{-4/5})$.
To estimate $h^*$, one can estimate $R(f')$ and $R(f'')$ via a KDE (plug-in).
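As a worked instance of the h* formula: for a Gaussian kernel the constants are mu_0(k^2) = 1/(2*sqrt(pi)) and mu_2(k) = 1, so only R(f'') must be estimated; below it is taken as known, purely for illustration.

    import numpy as np

    def h_star_kde(n, roughness_f2):
        """h* = (C / (B * alpha * n))^(1/(alpha+1)) for a Gaussian-kernel KDE:
        C = mu0(k^2) = 1/(2 sqrt(pi)), B = mu2(k)^2 * R(f'') / 4, alpha = 4."""
        C = 1.0 / (2.0 * np.sqrt(np.pi))
        B = roughness_f2 / 4.0
        alpha = 4
        return (C / (B * alpha * n)) ** (1.0 / (alpha + 1))

    # For a standard normal density, R(f'') = 3 / (8 sqrt(pi)), which recovers the
    # familiar rule h* = (4 / (3 n))^(1/5), about 1.06 * n^(-1/5).
    print(h_star_kde(n=2 ** 16, roughness_f2=3.0 / (8.0 * np.sqrt(np.pi))))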
8.–9. Asymptotic convergence with RQMC for smooth f
Idea: Replace $U_1, \dots, U_n$ by RQMC points. RQMC does not change the bias.
For a KDE with smooth k, one could hope (perhaps) to get
$$\mathrm{AIV} = C n^{-\beta} h^{-1} \text{ for some } \beta > 1, \quad \text{instead of } C n^{-1} h^{-1}.$$
If the IV is reduced, the optimal h can be taken smaller to reduce the ISB as well (re-balance) and then reduce the MISE.
Unfortunately, things are not so simple. Roughly, decreasing h increases the variation of the function inside the estimator, so we may have something like
$$\mathrm{AIV} = C n^{-\beta} h^{-\delta}, \quad \text{or } \mathrm{IV} \approx C n^{-\beta} h^{-\delta} \text{ in some bounded region.}$$
9
Elementary QMC Bounds (Recall)
Integration error for g : [0, 1)s → R with point set Pn = {u0, . . . , un−1} ⊂ [0, 1)s:
En =
1
n
n−1
i=0
g(ui ) −
[0,1)s
g(u)du.
Koksma-Hlawka inequality: |En| ≤ VHK(g)D∗(Pn) where
VHK(g) =
∅=v⊆S [0,1)s
∂|v|g
∂v
du, (Hardy-Krause (HK) variation)
D∗
(Pn) = sup
u∈[0,1)s
vol[0, u) −
|Pn ∩ [0, u)|
n
(star-discrepancy).
There are explicit point sets for which D∗(Pn) = O((log n)s−1/n) = O(n−1+ ).
Explicit RQMC constructions for which E[En] = 0 and Var[En] = O(n−2+ ).
11. Also
$$|E_n| \le V_2(g)\, D_2(P_n),$$
where
$$V_2^2(g) = \sum_{\emptyset \neq v \subseteq S} \int_{[0,1)^s} \left( \frac{\partial^{|v|} g}{\partial u_v} \right)^2 du \quad \text{(square } L^2 \text{ variation)},$$
$$D_2^2(P_n) = \int_{[0,1)^s} \left( \mathrm{vol}[0, u) - \frac{|P_n \cap [0, u)|}{n} \right)^2 du \quad \text{(square } L^2 \text{ star discrepancy)}.$$
There are explicit constructions for which $D_2(P_n) = O(n^{-1+\epsilon})$.
Moreover, if $P_n$ is a digital net randomized by a nested uniform scramble (NUS) and $V_2(g) < \infty$, then $E[E_n] = 0$ and $\mathrm{Var}[E_n] = O(V_2^2(g)\, n^{-3+\epsilon})$ for all $\epsilon > 0$.
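A small experiment illustrating these rates, using SciPy's qmc module for scrambled Sobol' points (SciPy applies a left matrix scramble plus a digital shift rather than NUS, so the variance rate to expect here is the $O(n^{-2+\epsilon})$ one); the smooth test integrand is an assumption chosen for illustration, with exact integral 1.

    import numpy as np
    from scipy.stats import qmc

    s, n, reps = 3, 2 ** 12, 50
    g = lambda u: np.prod(1.5 * (1.0 - u ** 2), axis=1)  # E[g(U)] = 1 on (0,1)^3

    rng = np.random.default_rng(4)
    mc = [g(rng.random((n, s))).mean() for _ in range(reps)]
    rqmc = [g(qmc.Sobol(d=s, scramble=True, seed=r).random(n)).mean()
            for r in range(reps)]
    print("MC variance:  ", np.var(mc))    # roughly of order 1/n
    print("RQMC variance:", np.var(rqmc))  # much smaller for this smooth integrand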
12. Draft
11
Bounding the AIV under RQMC for a KDE
KDE density estimator at a single point x:
ˆfn(x) =
1
n
n
i=1
1
h
k
x − g(Ui )
h
=
1
n
n
i=1
˜g(Ui ).
With RQMC points Ui , this is an RQMC estimator of E[˜g(U)] = [0,1)s ˜g(u)du = E[fn(x)].
RQMC does not change the bias, but may reduce Var[ˆfn(x)], and then the IV.
To get RQMC variance bounds, we need bounds on the variation of ˜g.
The partial derivatives are:
    ∂^{|v|} g̃(u)/∂u_v = (1/h) ∂^{|v|}/∂u_v k((x − g(u))/h).
We assume they exist and are uniformly bounded; e.g., take a Gaussian kernel k.
Expanding via the chain rule gives terms in h^{−j} for j = 2, . . . , |v| + 1.
One of the terms for v = S grows as h^{−s−1} k^{(s)}((g(u) − x)/h) ∏_{j=1}^s g_j(u) = O(h^{−s−1}) when h → 0, so this AIV bound grows as h^{−2s−2}. Not so good!
Improvement by a Change of Variable, in One Dimension
Change of variable: w = (x − g(u))/h.
In one dimension (s = 1), we have dw/du = −g′(u)/h, so
    V_HK(g̃) = (1/h) ∫_0^1 |k′((x − g(u))/h)| |−g′(u)/h| du = (1/h) ∫_{−∞}^{∞} |k′(w)| dw = O(h^{−1}).
Then, if k and g are continuously differentiable, with RQMC points having D*(P_n) = O(n^{−1+ε}), we obtain AIV = O(n^{−2+ε} h^{−2}).
With h = Θ(n^{−1/3}), this gives AMISE = O(n^{−4/3}).
A similar argument gives
    V_2²(g̃) = (1/h²) ∫_0^1 [k′((x − g(u))/h) (−g′(u)/h)]² du ≤ (1/h³) L_g ∫_{−∞}^{∞} (k′(w))² dw = O(h^{−3})
if |g′| ≤ L_g, and then with NUS: AIV = O(n^{−3+ε} h^{−3}).
With h = Θ(n^{−3/7}), this gives AMISE = O(n^{−12/7}).
Higher Dimensions
Let s = 2 and v = {1, 2}. With the change of variable (u1, u2) → (w, u2), the Jacobian is |dw/du1| = |g1(u1, u2)/h|, where g_j = ∂g/∂u_j. If |g2| and |g12/g1| are bounded by a constant L, then
    ∫_{[0,1)²} |∂^{|v|} g̃ / ∂u_v| du
      = (1/h) ∫_{[0,1)²} | ∂²/∂u1∂u2 k((x − g(u))/h) | du1 du2
      = (1/h) ∫_{[0,1)²} | k″((x − g(u))/h) (g1(u)/h)(g2(u)/h) + k′((x − g(u))/h) g12(u)/h | du1 du2
      ≤ (1/h) ∫_0^1 ∫_{−∞}^{∞} [ |k″(w)| |g2(u)|/h + |k′(w)| |g12(u)/g1(u)| ] dw du2
      ≤ (L/h) [µ0(|k″|)/h + µ0(|k′|)] = O(h^{−2}).
This provides a bound of O(h^{−2}) for V_HK(g̃), and then AIV = O(n^{−2+ε} h^{−4}).
Generalizing to s ≥ 2 gives V_HK(g̃) = O(h^{−s}), AIV = O(n^{−2+ε} h^{−2s}), MISE = O(n^{−4/(2+s)}).
This beats MC for s < 3 and gives the same rate for s = 3. Not very satisfactory.
Empirical Evaluation with Linear Model in a limited region
Regardless of the asymptotic bounds, the true IV may behave better than for MC for the pairs (n, h) of interest. We consider the model
    MISE = IV + ISB ≈ C n^{−β} h^{−δ} + B h^α.
This model is intended only for a limited region of interest, not for everywhere, and not necessarily asymptotically. The optimal h for this model satisfies
    h^{α+δ} = (Cδ/(Bα)) n^{−β},
and it gives MISE ≈ K n^{−αβ/(α+δ)}.
We can take the asymptotic α (known) and B (estimated as for MC).
To estimate C, β, and δ, estimate the IV over a grid of values of (n, h) and fit a linear regression model:
    log IV ≈ log C − β log n − δ log h.
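A least-squares fit of this regression model takes a few lines; a minimal sketch (function and variable names are ours):

```python
import numpy as np

def fit_iv_model(ns, hs, iv):
    """Fit log2(IV) ≈ log2(C) - beta*log2(n) - delta*log2(h) by least squares.
    ns, hs, iv: 1-d arrays over the grid of (n, h) pairs (iv > 0)."""
    A = np.column_stack([np.ones_like(iv), -np.log2(ns), -np.log2(hs)])
    coef, *_ = np.linalg.lstsq(A, np.log2(iv), rcond=None)
    log2C, beta, delta = coef
    return 2.0**log2C, beta, delta

# Hypothetical check: data generated exactly from the model IV = C n^{-beta} h^{-delta}.
n_grid, h_grid = np.meshgrid([2**k for k in range(14, 20)], 2.0**np.arange(-6, -3))
ns, hs = n_grid.ravel(), h_grid.ravel()
iv = 0.5 * ns**-1.6 * hs**-2.9
print(fit_iv_model(ns, hs, iv))   # ≈ (0.5, 1.6, 2.9)
```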
For each (n, h), we estimate the IV by making n_r independent replications of the RQMC density estimator, computing the empirical variance at n_e evaluation points (stratified) over [a, b], and multiplying by (b − a)/n_e. We use logs in base 2, since n is a power of 2.
After estimating the model parameters, we can test out-of-sample with independent simulation experiments at pairs (n, h) with h = ĥ*(n).
For test cases in which the density is known, we can compute a MISE estimate at each point and obtain new parameter estimates K̃ and ν̃ for the model MISE ≈ K n^{−ν}.
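In code, the IV estimate for one pair (n, h) could look as follows (a sketch under our own conventions; `estimate_density` is a hypothetical callable returning one randomized replicate of the density estimator at the evaluation points):

```python
import numpy as np

def estimate_iv(estimate_density, n_r, n_e, a, b, rng):
    """Estimate IV = integral of Var[f_n(x)] over [a, b] from n_r independent
    replicates, with n_e stratified evaluation points (one per subinterval)."""
    x = a + (b - a) * (np.arange(n_e) + rng.random(n_e)) / n_e
    reps = np.array([estimate_density(x, rng) for _ in range(n_r)])  # (n_r, n_e)
    return reps.var(axis=0, ddof=1).mean() * (b - a)  # = (b-a)/n_e * sum of variances

# Hypothetical replicate: plain MC KDE for N(0,1) data, n = 2^14, h = 0.1.
def mc_kde_replicate(x, rng, n=2**14, h=0.1):
    data = rng.standard_normal(n)
    z = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

iv_hat = estimate_iv(mc_kde_replicate, n_r=100, n_e=64, a=-2.0, b=2.0,
                     rng=np.random.default_rng(0))
```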
Numerical illustrations
For each example, we first estimate the model parameters by regression using a grid of pairs (n, h) with n = 2^14, 2^15, . . . , 2^19 and (for KDE) h = h_0, . . . , h_5 with h_j = h_0 2^{j/2} = 2^{−ℓ_0 + j/2}. For histograms, m = (b − a)/h must be an integer.
For each n and each RQMC method, we make n_r = 100 independent replications and take n_e = 64 evaluation points over the bounded interval [a, b]. We also tried larger n_e.
RQMC Point sets:
Independent points (Crude Monte Carlo),
Stratification: stratified unit cube,
Sobol+LMS: Sobol’ points with left matrix scrambling (LMS) + digital random shift,
Sobol+NUS: Sobol’ points with NUS.
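For illustration, independent points and scrambled Sobol' points can be generated with SciPy's `scipy.stats.qmc` module (a sketch; this is not necessarily the software used for the experiments in the talk):

```python
import numpy as np
from scipy.stats import qmc

s, m = 2, 14                                   # dimension s, n = 2^m points

# Independent points (crude Monte Carlo).
u_mc = np.random.default_rng(0).random((2**m, s))

# Sobol' points: SciPy's scramble=True applies a linear matrix scramble
# plus a digital random shift, close to the Sobol+LMS method above.
u_sobol = qmc.Sobol(d=s, scramble=True, seed=0).random_base2(m=m)
```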
Simple test example with standard normal density
Let Z_1, . . . , Z_s be i.i.d. standard normals generated by inversion, and let X = (Z_1 + ··· + Z_s)/√s. Then X ∼ N(0, 1).
Here we can estimate the IV, ISB, and MISE accurately.
We can compute ∫_a^b f(x) dx exactly.
Take (a, b) = (−2, 2). We have B = 0.04754 with α = 4 for the KDE.
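Putting the pieces together for this test case, one randomized replicate of the RQMC density estimator could be sketched as follows (our illustration):

```python
import numpy as np
from scipy.stats import norm, qmc

def rqmc_kde_replicate(x_eval, s, m, h, seed):
    """One randomized replicate: scrambled Sobol' points -> X_i -> Gaussian KDE."""
    u = qmc.Sobol(d=s, scramble=True, seed=seed).random_base2(m=m)  # n = 2^m points
    data = norm.ppf(u).sum(axis=1) / np.sqrt(s)  # X = (Z_1+...+Z_s)/sqrt(s) ~ N(0,1)
    z = (x_eval[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

x_eval = np.linspace(-2.0, 2.0, 64)
f_hat = rqmc_kde_replicate(x_eval, s=2, m=14, h=2.0**-5, seed=123)
```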
Estimates of model parameters for KDE: IV ≈ C n^{−β} h^{−δ}, MISE ≈ κ n^{−ν}.

    method   MC      NUS
    s        1       1       2       3       5       20
    R²       0.999   0.999   1.000   0.995   0.979   0.993
    β        1.017   2.787   2.110   1.788   1.288   1.026
    δ        1.144   2.997   3.195   3.356   2.293   1.450
    α        3.758   3.798   3.846   3.860   3.782   3.870
    ν̃        0.770   1.600   1.172   0.981   0.827   0.730
    LGM      16.96   34.05   24.37   20.80   17.91   17.07

LGM = −log₂(MISE) for n = 2^19.
[Figure: Convergence of the MISE in log-log scale for the one-dimensional example: log₂(MISE) vs log₂(n) for independent points, stratification, Sobol+LMS, and Sobol+NUS.]
[Figure: Convergence of the MISE (log₂(MISE) vs log₂(n)) for s = 2, for histograms (left) and KDE (right), with independent points, stratification, Sobol+LMS, and Sobol+NUS.]
[Figure: LGM (n = 2^19) vs s, for histogram (left) and KDE (right), for estimation over (−2, 2), with independent points, stratification, Sobol+LMS, and Sobol+NUS.]
KDE Estimate in the tail
KDE (blue) vs true density (red) with RQMC point sets, n = 2^19:
lattice + shift, Sobol + digital shift, Sobol + LMS-19bits + shift, Sobol + LMS-31bits + shift.
[Figure: four panels showing the KDE vs the true density in the left tail, over x ∈ (−5.1, −3.6), with densities on the order of 10^{−5}.]
Displacement of a cantilever beam
Displacement D of a cantilever beam with horizontal and vertical loads:
    D = (4L³/(Ewt)) √(Y²/t⁴ + X²/w⁴),
where L = 100, w = 4, t = 2 (in inches), and X, Y, and E are independent and normally distributed with means and standard deviations:

    Description       Symbol   Mean        St. dev.
    Young's modulus   E        2.9 × 10⁷   1.45 × 10⁶
    Horizontal load   X        500         100
    Vertical load     Y        1000        100

We want to estimate the density of D over (a, b) = (0.336, 1.561) (about 99.5% of the density).
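One Monte Carlo run of this model is a few lines (a sketch, assuming the square-root form of D written above; for RQMC, the three normals would instead be generated by inversion from a 3-dimensional RQMC point set):

```python
import numpy as np

L, w, t = 100.0, 4.0, 2.0            # beam dimensions (inches)
rng = np.random.default_rng(0)
n = 2**14
E = rng.normal(2.9e7, 1.45e6, n)     # Young's modulus
X = rng.normal(500.0, 100.0, n)      # horizontal load
Y = rng.normal(1000.0, 100.0, n)     # vertical load

# Displacement D = 4L^3/(E w t) * sqrt((Y/t^2)^2 + (X/w^2)^2).
D = 4 * L**3 / (E * w * t) * np.sqrt((Y / t**2)**2 + (X / w**2)**2)
# D is then fed to the histogram or KDE over (a, b) = (0.336, 1.561).
```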
Parameter estimates of the linear regression model for IV and MISE:
IV ≈ C n^{−β} h^{−δ}, MISE ≈ κ n^{−ν}.

    Point set     Ĉ         β̂       δ̂       ν̂
    Histogram, α = 2
    Independent   0.888     1.001   1.021   0.797
    Sobol+LMS     0.134     1.196   1.641   0.848
    Sobol+NUS     0.136     1.194   1.633   0.848
    KDE with Gaussian kernel, α = 4
    Independent   0.210     0.993   1.037   0.789
    Sobol+LMS     5.28E-4   1.619   2.949   0.932
    Sobol+NUS     5.24E-4   1.621   2.955   0.932

Good fit: we have R² > 0.99 in all cases.
[Figure: log₂(IV) vs log₂(n) for the cantilever beam with KDE, for independent points and Sobol+NUS at h = 2^{−9.5}, 2^{−7.5}, and 2^{−5.5}.]
A weighted sum of lognormals
X = Σ_{j=1}^s w_j exp(Y_j),   where Y = (Y_1, . . . , Y_s)ᵗ ∼ N(µ, C).
Let C = AAᵗ. To generate Y, generate Z ∼ N(0, I) and put Y = µ + AZ.
We will use the principal component decomposition (PCA).
This has several applications. In one of them, with w_j = s_0(s − j + 1)/s, e^{−ρ} max(X − K, 0) is the payoff of a financial option based on an average price at s observation times, under a GBM process. We want to estimate the density of the positive payoffs.
Numerical experiment: Take s = 12, ρ = 0.037, s_0 = 100, K = 101, and C defined indirectly via Y_j = (µ − σ²/2) j/s + σ B(j/s) (so Y_0 = 0), where σ = 0.12136, µ = 0.1, and B is a standard Brownian motion.
We will estimate the density of e^{−ρ}(X − K) over (a, b) = (0, 50).
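A sketch of the PCA-based generation for this example (our illustration; the covariance Cov(Y_i, Y_j) = σ² min(t_i, t_j) follows from the Brownian-motion construction above, and the weights w_j are taken as stated in the talk):

```python
import numpy as np
from scipy.stats import norm, qmc

s, s0, K, rho = 12, 100.0, 101.0, 0.037
sigma, mu = 0.12136, 0.1

tj = np.arange(1, s + 1) / s
C = sigma**2 * np.minimum.outer(tj, tj)      # Cov(Y_i, Y_j) = sigma^2 min(t_i, t_j)
mean = (mu - sigma**2 / 2) * tj              # E[Y_j]
lam, V = np.linalg.eigh(C)                   # eigendecomposition of C
A = V[:, ::-1] * np.sqrt(lam[::-1])          # PCA: columns by decreasing eigenvalue

w = s0 * (s - np.arange(1, s + 1) + 1) / s   # w_j = s0 (s - j + 1)/s, as in the talk
u = qmc.Sobol(d=s, scramble=True, seed=7).random_base2(m=16)
Y = mean + norm.ppf(u) @ A.T                 # Y = mu + A Z, with Z ~ N(0, I)
X = (w * np.exp(Y)).sum(axis=1)
payoff = np.exp(-rho) * (X - K)              # estimate the density of the positive values
```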
Histogram of the positive payoff values from n = 10^6 independent simulation runs:
[Figure: histogram of the payoff over (0, 150); frequencies in units of 10³.]
[Figure: log₂(IV) vs log₂(n) for the KDE, for independent points and Sobol+NUS at h = 2^{−3.5}, 2^{−1.5}, and 2^{0.5}.]
Example: A stochastic activity network
The network gives the precedence relations between activities. Activity k has a random duration Y_k (also the length of arc k) with known cumulative distribution function (cdf) F_k(y) := P[Y_k ≤ y].
The project duration T is the (random) length of the longest path from the source to the sink.
We may want to estimate E[T], P[T > x], a quantile, the density of T, etc.
[Figure: the activity network, with source node 0, sink node 8, and arcs with durations Y_0, . . . , Y_12.]
Simulation
Algorithm to generate T:
    for k = 0, . . . , 12:
        generate U_k ∼ U(0, 1) and let Y_k = F_k^{−1}(U_k);
    compute X = T = h(Y_0, . . . , Y_12) = f(U_0, . . . , U_12).
Monte Carlo: repeat n times independently to obtain n realizations X_1, . . . , X_n of T.
Estimate E[T] = ∫_{(0,1)^s} f(u) du by X̄_n = (1/n) Σ_{i=1}^n X_i.
To estimate P(T > x), take X = I[T > x] instead.
RQMC: replace the n independent points by an RQMC point set of size n.
Numerical illustration from Elmaghraby (1977):
Y_k ∼ N(µ_k, σ_k²) for k = 0, 1, 3, 10, 11, and Y_k ∼ Expon(1/µ_k) otherwise, with
µ_0, . . . , µ_12 = 13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5.
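A sketch of this simulation (the arc list `ARCS` below is a hypothetical stand-in for the topology shown in the figure, and the σ_k of the normal arcs are not given in the slides; the longest-path recursion itself is standard):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical arc list (tail, head): the true topology is in the figure above.
ARCS = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (4, 6),
        (4, 5), (5, 7), (6, 7), (6, 8), (7, 8)]
MU = [13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5]
NORMAL_ARCS = {0, 1, 3, 10, 11}                # arcs with normal durations

def sample_T(u, sigma):
    """One realization of T from u in (0,1)^13, each Y_k generated by inversion.
    sigma[k]: st. dev. of arc k's normal duration (values not in the slides)."""
    longest = np.zeros(9)                      # longest path from source to each node
    for k, (a, b) in enumerate(ARCS):          # arcs ordered so tails precede heads
        if k in NORMAL_ARCS:
            y = norm.ppf(u[k], loc=MU[k], scale=sigma[k])
        else:
            y = -MU[k] * np.log1p(-u[k])       # Expon with mean MU[k], by inversion
        longest[b] = max(longest[b], longest[a] + y)
    return longest[8]                          # T = longest path to the sink (node 8)
```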
Naive idea: replace each Y_k by its expectation. This gives T = 48.2.
Results of an experiment with n = 100 000: the histogram of the values of T is a density estimator that gives more information than a confidence interval on E[T] or P[T > x].
[Figure: histogram of T over (0, 200), with markers at T = 48.2 (deterministic approximation), the mean 64.2, x = 90, and the 0.99 quantile ξ̂_0.99 = 131.8.]
Results for the SAN Network
Parameter estimates of the regression models for the SAN network, n = 2^19.

    Point set       C       β       δ       γ̂*      ν̂*
    Histogram, α = 2
    Independent     0.892   0.999   1.005   0.333   0.665
    Stratif.        0.897   1.001   1.006   0.333   0.666
    Lattice+shift   0.841   0.988   0.988   0.331   0.662
    Sobol+LMS       0.894   1.006   1.024   0.333   0.665
    Sobol+NUS       0.888   1.005   1.026   0.332   0.665
    KDE with Gaussian kernel, α = 4
    Independent     0.254   1.001   1.004   0.199   0.799
    Stratif.        0.248   1.001   1.012   0.199   0.799
    Lattice+shift   0.222   0.969   0.947   0.196   0.784
    Sobol+LMS       0.242   1.021   1.089   0.201   0.803
    Sobol+NUS       0.246   1.023   1.087   0.201   0.804
The SAN example, Sobol+NUS vs independent points, summary for n = 2^19 = 524288.

                 Independent points     Sobol+NUS
    m or h       log₂ IV    IV rate     log₂ IV    IV rate
    Histogram
    64           −19.32     −1.003      −19.78     −1.039
    256          −17.28     −0.999      −17.40     −1.011
    1024         −15.27     −1.001      −15.30     −1.003
    4096         −13.27     −0.998      −13.27     −1.000
    Kernel
    0.10         −16.64     −0.999      −16.71     −1.006
    0.13         −17.31     −0.999      −17.42     −1.007
    0.18         −17.96     −0.999      −18.18     −1.015
    0.24         −18.64     −0.999      −18.92     −1.029
    0.32         −19.33     −0.998      −19.79     −1.035
    0.43         −19.99     −0.998      −20.71     −1.064
Conditional Monte Carlo for Derivative Estimation
The density is f(x) = F′(x), so could we just take the derivative of a cdf estimator?
The derivative of the empirical cdf F̂_n(x) is zero almost everywhere... it does not work!
We need a smooth cdf estimator.
Let X = X(θ, ω) with parameter θ ∈ R. We want to estimate g′(θ) = (d/dθ) E[X(θ, ω)].
When X(·, ω) is not continuous in θ (+ other conditions), we cannot interchange the derivative and the expectation, and cannot take (d/dθ) X(θ, ω) as an estimator of g′(θ).
It is often possible to replace X by a conditional expectation X_e = E[X | G], where G contains partial information: not enough to reveal X, but enough to compute X_e.
If X_e is smooth enough in θ, we may have
    g′(θ) = E[(d/dθ) X_e(θ, ω)] = E[X_e′].
This is gradient estimation by IPA + CMC; see L'Ecuyer and Perron (1994) and Asmussen (2018).
Then, we can simulate X_e′ with RQMC instead of MC.
Application to the SAN Example
We want a smooth estimator of P[T ≤ t] whose sample derivative will be an unbiased estimator of the density at t.
Naive estimator: generate T and compute X = I[T ≤ t]. Repeat n times and average.
Conditional Monte Carlo estimator of P[T ≤ t]: Generate the Y_j's only for the 8 arcs that do not belong to the cut L = {4, 5, 6, 8, 9}, and replace I[T ≤ t] by its conditional expectation given those Y_j's,
    X_e = P[T ≤ t | {Y_j, j ∉ L}].
This makes the integrand continuous in the U_j's and in t.
To compute X_e: for each l ∈ L, say going from node a_l to node b_l, compute the length α_l of the longest path from the source to a_l, and the length β_l of the longest path from b_l to the sink.
The longest path that passes through link l does not exceed t iff α_l + Y_l + β_l ≤ t, which occurs with probability P[Y_l ≤ t − α_l − β_l] = F_l[t − α_l − β_l].
Since the Y_l for l ∈ L are independent, we obtain
    X_e = ∏_{l∈L} F_l[t − α_l − β_l].
To estimate the density of T, just take the derivative with respect to t:
    X_e′ = (d/dt) X_e(t, ω) = Σ_{j∈L} f_j[t − α_j − β_j] ∏_{l∈L, l≠j} F_l[t − α_l − β_l]   (with probability 1).
One can prove that
    E[X_e′] = (d/dt) E[X_e] = (d/dt) P[T ≤ t] = f_T(t)
via the dominated convergence theorem; see L'Ecuyer (1990).
Here, with MC, the IV converges as O(1/n) and there is no bias, so MISE = IV.
Now we can apply RQMC to simulate X_e′. It is a smooth function of the uniforms if each inverse cdf F_j^{−1} and each density f_j are smooth.
Then we can get a convergence rate near O(n^{−2}) for the IV and the MISE.
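A sketch of this CMC density estimator for one realization (our illustration; `alpha` and `beta` would come from the longest-path computations described above, and `F`, `f` are the cdfs and densities of the Y_l for l ∈ L):

```python
import numpy as np

def cmc_density(t, alpha, beta, F, f):
    """CMC density estimate X_e' = sum_j f_j[t-a_j-b_j] * prod_{l != j} F_l[t-a_l-b_l],
    for one realization of the non-cut Y_j's (which determine alpha and beta)."""
    z = t - alpha - beta                       # arguments t - alpha_l - beta_l, l in L
    Fz = np.array([F[l](z[l]) for l in range(len(z))])
    fz = np.array([f[l](z[l]) for l in range(len(z))])
    prod = Fz.prod()
    out = 0.0
    for j in range(len(z)):
        # prod over l != j of F_l; guard against F_j = 0 before dividing.
        rest = prod / Fz[j] if Fz[j] > 0 else np.prod(np.delete(Fz, j))
        out += fz[j] * rest
    return out

# Averaging cmc_density over n (R)QMC realizations of the non-cut Y_j's
# gives an unbiased, smooth estimator of the density f_T(t).
```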
Conclusion
We saw that RQMC can improve the convergence rate of the IV and MISE when
estimating a density.
With histograms and KDEs, the convergence rates observed in small examples are much
better than those that we have proved based on standard QMC theory. There are
opportunities for QMC theoreticians here.
This also applies in the context of Array-RQMC for Markov chains.
The combination of CMC with QMC for density estimation is very promising!
Lots of potential applications! We are working on this.
Some references
S. Asmussen. Conditional Monte Carlo for sums, with applications to insurance and finance. Annals of Actuarial Science, prepublication, 1–24, 2018.
J. Dick and F. Pillichshammer. Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge, U.K., 2010.
P. L'Ecuyer. A unified view of the IPA, SF, and LR gradient estimation techniques. Management Science, 36:1364–1383, 1990.
P. L'Ecuyer. Quasi-Monte Carlo methods with applications in finance. Finance and Stochastics, 13(3):307–349, 2009.
P. L'Ecuyer. Randomized quasi-Monte Carlo: An introduction for practitioners. In P. W. Glynn and A. B. Owen, editors, Monte Carlo and Quasi-Monte Carlo Methods 2016, 2017.
P. L'Ecuyer, C. Lécot, and B. Tuffin. A randomized quasi-Monte Carlo simulation method for Markov chains. Operations Research, 56(4):958–975, 2008.
P. L'Ecuyer and G. Perron. On the convergence rates of IPA and FDC derivative estimators for finite-horizon stochastic simulations. Operations Research, 42(4):643–656, 1994.
D. W. Scott. Multivariate Density Estimation. Wiley, 2015.