MUMS Opening Workshop - Panel Discussion: Facts About Some Statistical Models in Calibrating Imperfect Mathematical Models - Mengyang Gu, August 21, 2018
This document summarizes some statistical models used for calibrating imperfect mathematical models. It discusses three main approaches:
1. Gaussian stochastic process (GaSP) calibration, which models bias as a Gaussian process. This is commonly used but can produce inconsistent parameter estimates.
2. L2 calibration, which first estimates reality nonparametrically and then estimates the parameters by minimizing the L2 distance between the estimated reality and the mathematical model. However, it does not use the mathematical model when predicting reality.
3. Scaled Gaussian stochastic process (S-GaSP) calibration, which constrains the GaSP through the L2 norm of the discrepancy function. This approach satisfies both criteria: predicting reality well and converging to the L2-minimizing calibrated parameters. The S-GaSP is equivalent to a penalized kernel ridge regression.
The document also analyzes the nonparametric regression setting to establish convergence rates for the S-GaSP estimator.
1. Facts about some statistical models in
calibrating imperfect mathematical models
Mengyang Gu
Department of Applied Mathematics and Statistics
Johns Hopkins University
2. The calibration problem
Consider a mathematical model $f^M(x, u)$, where $x \in \mathcal{X}$ is a $p$-dimensional vector of the observable inputs and $u$ is a $q$-dimensional vector of the unobservable parameters. The experimental/field data are denoted as
$$\mathbf{y}^O = (y^O(x_1), \ldots, y^O(x_n))^T.$$
When the mathematical model is imperfect, it is usual to model
$$y^O(x) = f^M(x, u) + b(x) + \epsilon,$$
with $\epsilon$ being random errors. The reality is
$$y^R(x) = f^M(x, u) + b(x).$$
How to model the bias function $b(\cdot)$?
3. Outline
1. Gaussian stochastic process (GaSP) calibration
2. L2 calibration
3. Scaled Gaussian stochastic process (S-GaSP) calibration
5. Gaussian stochastic process calibration
Assume the trend and intercept are properly modeled in the mathematical model.
Kennedy and O'Hagan (2001) modeled $b(\cdot)$ via a stationary Gaussian stochastic process (GaSP), meaning that any marginal distribution satisfies
$$(b(x_1), \ldots, b(x_n))^T \sim MN(\mathbf{0}, \sigma^2 \mathbf{R}),$$
where $\mathbf{R}_{i,j} = K(x_i, x_j)$, with $K(\cdot, \cdot)$ being a kernel function.
This statistical model for the bias function is followed by many works in calibration (e.g. Bayarri et al. (2007b,a); Higdon et al. (2008); Liu et al. (2009); Paulo et al. (2012); Gu (2018)).
The predictive accuracy is improved by combining the mathematical model and the discrepancy function.
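As a concrete illustration (a minimal sketch added here, not from the slides; the choice $f^M(x, u) = ux$, the kernel, and all settings are illustrative), the Kennedy-O'Hagan data-generating model can be simulated directly by drawing the bias from a GaSP and adding it to the mathematical model:

```python
import numpy as np

# Minimal sketch of the Kennedy-O'Hagan model
#   y^O(x) = f^M(x, u) + b(x) + eps,  b ~ GaSP(0, sigma^2 K).
# The model f^M(x, u) = u*x and all settings are illustrative.
rng = np.random.default_rng(0)
n, u_true, sigma, sigma0, gamma = 50, 1.5, 1.0, 0.05, 0.1

x = np.linspace(0, 1, n)
R = np.exp(-np.abs(x[:, None] - x[None, :]) / gamma)     # exponential kernel
b = rng.multivariate_normal(np.zeros(n), sigma**2 * R)   # bias realization
y_obs = u_true * x + b + rng.normal(0, sigma0, n)        # field data
```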
6. Two types of inconsistency in Gaussian stochastic process calibration
(Random bias function.) When the bias is generated from the Gaussian stochastic process, some usual estimators for the calibration parameters $u$ are inconsistent as the sample size goes to infinity ("Adding spatially-correlated errors can mess up the fixed effect you love", Reich et al. (2006); Hodges and Reich (2010); Hughes and Haran (2013)).
(Deterministic bias function.) When the bias is a fixed deterministic function in some functional space (e.g. a Sobolev space), the estimator of $u$ does not minimize some frequently used norms between the reality and the mathematical model, e.g. the $L_2$ norm $||y^R(\cdot) - f^M(\cdot, \hat{u})||_{L_2(\mathcal{X})}$ (Arendt et al. (2012b,a); Tuo and Wu (2015, 2016); Wong et al. (2017); Plumlee (2016)).
7. Inconsistency when data is from the Gaussian stochastic process
Example 1
Assume $f^M(x, u) = u$ and the experimental data is noise-free, i.e. $y^O(x) = f^M(x, u) + b(x)$ and $b(\cdot) \sim GaSP(0, \sigma^2 K(\cdot, \cdot))$, with $K(x_i, x_j) = \exp(-|x_i - x_j|/\gamma)$, the exponential correlation function. $n$ observations are obtained, equally spaced at $x_i \in [0, 1]$. Assume both $\sigma^2$ and $\gamma$ are known.
Lemma 2
Assume $\sigma^2 > 0$ and $\gamma > 0$ are both finite. When $n \to \infty$, the maximum likelihood estimator $\hat{u}_{MLE} = (\mathbf{1}_n^T \mathbf{R}^{-1} \mathbf{1}_n)^{-1} \mathbf{1}_n^T \mathbf{R}^{-1} \mathbf{y}^O$ in Example 1 has the limiting distribution
$$\hat{u}_{MLE} \sim N\left(u, \frac{2\sigma^2\gamma}{2\gamma + 1}\right).$$
The variance of the estimator does not go to zero when the sample size increases to infinity.
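A quick Monte Carlo check of Lemma 2 (a sketch with illustrative settings) shows the non-vanishing variance: for $\gamma = 0.1$ and $\sigma^2 = 1$, the empirical variance of $\hat{u}_{MLE}$ stays near $2\sigma^2\gamma/(2\gamma + 1) = 1/6$ as $n$ grows.

```python
import numpy as np

# Monte Carlo check of Lemma 2 (sketch): the variance of the MLE of u
# does not vanish as n grows.  Settings are illustrative.
rng = np.random.default_rng(1)
sigma2, gamma, u = 1.0, 0.1, 0.0

for n in (20, 100, 500):
    x = np.linspace(0, 1, n)
    R = np.exp(-np.abs(x[:, None] - x[None, :]) / gamma)
    L = np.linalg.cholesky(sigma2 * R)
    Rinv_one = np.linalg.solve(R, np.ones(n))
    est = []
    for _ in range(2000):
        y = u + L @ rng.standard_normal(n)           # noise-free data
        est.append(Rinv_one @ y / Rinv_one.sum())    # (1'R^-1 1)^-1 1'R^-1 y
    print(n, np.var(est))   # stays near 2*sigma2*gamma/(2*gamma+1) = 1/6
```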
8. Figure 1: Mean squared error (MSE) of the MLE of $u$ in Example 1 when the data is from a zero-mean GaSP (red triangles) and from a zero-mean S-GaSP (blue dots), for different numbers of observations. $10^5$ simulations are implemented for each point. The left panel is for $\gamma = 0.1$ and the right panel is for $\gamma = 0.02$, both assuming $\sigma^2 = 1$.
9. The equivalence to the kernel ridge regression in GaSP calibration
After marginalizing out $b$, the marginal distribution of $\mathbf{y}^O$ follows
$$[\mathbf{y}^O \mid u, \sigma_0^2, \lambda] \sim MN(\mathbf{f}^M_u, \sigma_0^2((n\lambda)^{-1}\mathbf{R} + \mathbf{I}_n)), \qquad (1)$$
where $\mathbf{f}^M_u := (f^M(x_1, u), \ldots, f^M(x_n, u))^T$. Denote $L(u)$ the likelihood of $u$ in (1) and the regularization parameter $\lambda := \sigma_0^2/(n\sigma^2)$.
Lemma 3
The maximum likelihood estimator $\hat{u}_{\lambda,n} := \mathrm{argmax}_{u \in \mathcal{U}} L(u)$ and predictive mean estimator $\hat{b}_{\lambda,n}(\cdot) := E[b(\cdot) \mid \mathbf{y}^O, \hat{u}_{\lambda,n}, \lambda]$ can be expressed as the estimator of the kernel ridge regression (KRR):
$$(\hat{u}_{\lambda,n}, \hat{b}_{\lambda,n}(\cdot)) = \mathrm{argmin}_{b(\cdot) \in \mathcal{H},\, u \in \mathcal{U}} \ell_{\lambda,n}(u, b),$$
$$\ell_{\lambda,n}(u, b) = \frac{1}{n}\sum_{i=1}^n \left(y^O(x_i) - f^M(x_i, u) - b(x_i)\right)^2 + \lambda ||b||^2_{\mathcal{H}}, \qquad (2)$$
where $||\cdot||_{\mathcal{H}}$ is the native norm, or reproducing kernel Hilbert space norm. The $||\cdot||_{\mathcal{H}}$ norm is quite different from the $L_2$ norm $||\cdot||_{L_2(\mathcal{X})}$.
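In practice, Lemma 3 gives a direct recipe for GaSP calibration (a hedged sketch below; the model $f^M(x, u) = ux$, the synthetic data, and the fixed $\lambda$ are illustrative): with $\lambda$ fixed, the MLE of $u$ minimizes the quadratic form implied by (1), and $\hat{b}$ is the usual kernel ridge regression fit to the residuals.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch of Lemma 3 in practice: with lambda fixed, the MLE of u minimizes
# (y - f_u)' ((n*lam)^{-1} R + I)^{-1} (y - f_u); b-hat is the KRR fit.
# The model f^M(x, u) = u*x, the data, and lam are illustrative.
rng = np.random.default_rng(2)
n, lam, gamma = 50, 0.01, 0.1
x = np.linspace(0, 1, n)
R = np.exp(-np.abs(x[:, None] - x[None, :]) / gamma)
y_obs = 1.5 * x + np.sin(2 * np.pi * x) + rng.normal(0, 0.05, n)  # "field" data

Sigma = R / (n * lam) + np.eye(n)
def qform(u):
    r = y_obs - u * x
    return r @ np.linalg.solve(Sigma, r)

u_hat = minimize_scalar(qform, bounds=(-5.0, 5.0), method="bounded").x
resid = y_obs - u_hat * x
b_hat = R @ np.linalg.solve(R + n * lam * np.eye(n), resid)  # fitted bias at x
```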
12. L2 calibration
The $L_2$ approach in Tuo and Wu (2015, 2016): first they use a GaSP to estimate $y^R(\cdot)$ based on $\mathbf{y}^O$, and then they estimate $u^{L_2}$ by
$$\hat{u}^{L_2} = \mathrm{argmin}_{u \in \mathcal{U}} \int_{x \in \mathcal{X}} \left(\hat{y}^R(x) - f^M(x, u)\right)^2 dx.$$
The $L_2$ approach does not use the mathematical model to predict the reality. However, the mathematical model is often developed by experts, meaning that it often contains information about the reality.
14. Criteria
Denote the $L_2$ loss $||f(\cdot)||^2_{L_2(\mathcal{X})} := \int_{x \in \mathcal{X}} f^2(x) dx$ for any square integrable function $f$. We focus on two types of predictions.
i. The $L_2$ loss between the reality and the estimator of the reality: $L_2(\hat{y}^R(\cdot, \hat{u})) = ||y^R(\cdot) - \hat{y}^R(\cdot, \hat{u})||^2_{L_2(\mathcal{X})}$.
ii. The $L_2$ loss between the reality and the calibrated mathematical model: $L_2(\hat{u}) = ||y^R(\cdot) - f^M(\cdot, \hat{u})||^2_{L_2(\mathcal{X})} = ||b_{\hat{u}}(\cdot)||^2_{L_2(\mathcal{X})}$, where $\hat{u}$ is the estimator of the calibration parameter.
Is it possible to satisfy both criteria when the sample size is finite and infinite?
16. The scaled Gaussian stochastic process
Consider the following process:
$$y^O(x) = f^M(x, u) + b_z(x) + \epsilon,$$
$$b_z(x) = b(x) \,\Big|\, \left\{\int_{\xi \in \mathcal{X}} b^2(\xi)\, d\xi = Z\right\},$$
$$b(\cdot) \sim GaSP(0, \sigma^2 K(\cdot, \cdot)), \quad Z \sim p_Z(\cdot), \quad \epsilon \sim N(0, \sigma_0^2). \qquad (3)$$
The $b_z(\cdot)$ is called the scaled Gaussian stochastic process (S-GaSP). Given $Z = z$, the S-GaSP becomes a GaSP constrained at the space related to the $L_2$ norm of the discrepancy function, $\int_{x \in \mathcal{X}} b^2(x) dx = z$.
Conditional on all the parameters, the default choice of $p_Z(\cdot)$ is
$$p_Z(z) = \frac{g_Z(z)\, p_b(Z = z)}{\int_0^\infty g_Z(t)\, p_b(Z = t)\, dt}, \qquad (4)$$
with
$$g_Z(z) = \frac{\lambda_z}{2\sigma^2} \exp\left(-\frac{\lambda_z z}{2\sigma^2}\right). \qquad (5)$$
17. Fact about the S-GaSP
Starting from a GaSP with any reasonable kernel, the S-GaSP is a GaSP with a transformed kernel. For a GaSP with zero mean and covariance function $\sigma^2 K(\cdot, \cdot)$, one has
$$K(x_i, x_j) = \sum_{k=1}^\infty \rho_k \phi_k(x_i) \phi_k(x_j),$$
where $\rho_k$ and $\phi_k(\cdot)$ are the ordered eigenvalues and orthonormal eigenfunctions, respectively.
Lemma 4 (Mercer theorem for S-GaSP)
Any marginal distribution of the S-GaSP defined in (3) is a multivariate normal distribution,
$$(b_z(x_1), \ldots, b_z(x_n))^T \sim MN(\mathbf{0}, \sigma^2 \mathbf{R}_z),$$
where the $(i, j)$ entry of $\mathbf{R}_z$ is $K_z(x_i, x_j)$ as follows:
$$K_z(x_i, x_j) = \sum_{k=1}^\infty \frac{\rho_k}{1 + \lambda_z \rho_k} \phi_k(x_i) \phi_k(x_j). \qquad (6)$$
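A finite-sample heuristic for Lemma 4 (a sketch; strictly, (6) shrinks the Mercer eigenvalues of $K$, and the discretized S-GaSP introduced later uses $\lambda_z/n$ in place of $\lambda_z$ on the Gram matrix): eigendecompose the Gram matrix and damp each eigenvalue.

```python
import numpy as np

# Heuristic finite-sample analogue of Lemma 4 (illustrative settings):
# shrink each eigenvalue rho_k of the Gram matrix R to rho_k/(1 + lam_z*rho_k)
# to obtain the Gram matrix of the transformed S-GaSP kernel K_z.
n, gamma, lam_z = 50, 0.1, 25.0
x = np.linspace(0, 1, n)
R = np.exp(-np.abs(x[:, None] - x[None, :]) / gamma)

rho, Phi = np.linalg.eigh(R)                        # R = Phi diag(rho) Phi'
Rz = (Phi * (rho / (1 + lam_z * rho))) @ Phi.T      # eigenvalue shrinkage
# Large eigenvalues are damped the most; small ones are nearly unchanged.
```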
18. The equivalence to the penalized kernel ridge regression in S-GaSP
After marginalizing out $b_z$ in (3),
$$[\mathbf{y}^O \mid u, \sigma_0^2, \lambda, \lambda_z] \sim MN(\mathbf{f}^M_u, \sigma_0^2((n\lambda)^{-1}\mathbf{R}_z + \mathbf{I}_n)). \qquad (7)$$
Denote $L_z(u)$ the likelihood for $u$ in (7).
Lemma 5
The maximum likelihood estimator $\hat{u}_{\lambda,\lambda_z,n} := \mathrm{argmax}_u L_z(u)$ and predictive mean $\hat{b}_{\lambda,\lambda_z,n}(\cdot) := E[b_z(\cdot) \mid \mathbf{y}^O, \hat{u}_{\lambda,\lambda_z,n}, \lambda, \lambda_z]$ are the same as the estimator of the penalized kernel ridge regression (KRR):
$$(\hat{u}_{\lambda,\lambda_z,n}, \hat{b}_{\lambda,\lambda_z,n}(\cdot)) = \mathrm{argmin}_{b(\cdot) \in \mathcal{H},\, u \in \mathcal{U}} \ell_{\lambda,\lambda_z,n}(u, b),$$
$$\ell_{\lambda,\lambda_z,n}(u, b) = \frac{1}{n}\sum_{i=1}^n \left(y^O(x_i) - f^M(x_i, u) - b(x_i)\right)^2 + \lambda ||b||^2_{\mathcal{H}_z}, \qquad (8)$$
where $||b||^2_{\mathcal{H}_z} = ||b||^2_{\mathcal{H}} + \lambda_z ||b||^2_{L_2(\mathcal{X})}$.
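By Lemma 5, S-GaSP calibration differs from the GaSP sketch above only in replacing $\mathbf{R}$ with the transformed $\mathbf{R}_z$ (a self-contained sketch with illustrative settings as before):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch of Lemma 5 (illustrative): S-GaSP calibration is the GaSP
# calibration of Lemma 3 with R replaced by the transformed Gram matrix Rz.
rng = np.random.default_rng(3)
n, lam, lam_z, gamma = 50, 0.01, 25.0, 0.1
x = np.linspace(0, 1, n)
R = np.exp(-np.abs(x[:, None] - x[None, :]) / gamma)
y_obs = 1.5 * x + np.sin(2 * np.pi * x) + rng.normal(0, 0.05, n)

rho, Phi = np.linalg.eigh(R)
Rz = (Phi * (rho / (1 + lam_z * rho))) @ Phi.T       # S-GaSP Gram matrix

Sigma_z = Rz / (n * lam) + np.eye(n)                 # covariance as in (7)
qform = lambda u: (y_obs - u * x) @ np.linalg.solve(Sigma_z, y_obs - u * x)
u_hat = minimize_scalar(qform, bounds=(-5.0, 5.0), method="bounded").x
```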
21. The nonparametric regression setting
Let us first consider the nonparametric regression model,
$$y(x_i) = f(x_i) + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma_0^2), \qquad (9)$$
where $f$ is assumed to follow a zero-mean S-GaSP prior with the default choices of $p_Z(\cdot)$ and $g_Z(\cdot)$ in Equations (4) and (5). For simplicity, we assume $x_i \overset{i.i.d.}{\sim} \mathrm{Unif}([0, 1]^p)$.
(Function space of the reality.) We assume the underlying truth $f_0(\cdot) := E_y[y(\cdot)]$ resides in the $p$-dimensional Sobolev space of order $m > p/2$:
$$W_2^m(\mathcal{X}) = \left\{f(\cdot) = \sum_{k=1}^\infty f_k \phi_k(\cdot) \in L_2(\mathcal{X}) : \sum_{k=1}^\infty k^{2m/p} f_k^2 < \infty\right\}.$$
(Choice of kernel.) Denote $\{(\rho_j, \phi_j)\}_{j=1}^\infty$ the eigenvalues and eigenfunctions of the reproducing kernel $K(\cdot, \cdot)$. For all $j$, assume
$$c_\rho\, j^{-2m/p} \leq \rho_j \leq C_\rho\, j^{-2m/p} \qquad (10)$$
for some constants $c_\rho, C_\rho > 0$. For all $j$ and $x \in \mathcal{X}$, we assume the eigenfunctions are bounded. As an example, the widely used Matérn kernel satisfies this assumption.
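As a concrete instance (an added note; the identification $m = \nu + p/2$ for the Matérn kernel with roughness parameter $\nu$ is an assumption consistent with the standard Sobolev correspondence in this literature):
$$\rho_j \asymp j^{-2m/p} \text{ with } m = \nu + \tfrac{p}{2}, \quad \text{so for } p = 1,\ \nu = \tfrac{5}{2}: \quad m = 3, \quad \rho_j \asymp j^{-6},$$
which satisfies (10) and matches the smoothness $m = 3$ of Example 2 below.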
22. Convergence rate of the S-GaSP
Theorem 1
Assume the eigenvalues of $K(\cdot, \cdot)$ satisfy (10) and the eigenfunctions are bounded. Assume $f_0 \in W_2^m(\mathcal{X})$ and denote $\beta = \frac{(2m-p)^2}{2m(2m+p)}$. For the nonparametric regression model (9), for sufficiently large $n$ and any $\alpha > 2$, with probability at least $1 - \exp\left(-\frac{\alpha-2}{3}\right)\left(1 - 2\exp(-n^{\beta})\right)$,
$$||\hat{f}_{\lambda,\lambda_z,n} - f_0||_{L_2(\mathcal{X})} \leq \left\{2\sqrt{2}\left(||f_0||_{L_2(\mathcal{X})} + ||f_0||_{\mathcal{H}}\right) + C_K\, \alpha\, \sigma_0\right\} n^{-\frac{m}{2m+p}}, \qquad (11)$$
by choosing $\lambda = n^{-2m/(2m+p)}$ and $\lambda_z = \lambda^{-1/2}$, where $C_K$ is a constant that only depends on the kernel $K(\cdot, \cdot)$.
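A worked instance of the tuning choice (added for concreteness), with $p = 1$ and $m = 3$ as in Example 2 below:
$$\lambda = n^{-2m/(2m+p)} = n^{-6/7}, \qquad \lambda_z = \lambda^{-1/2} = n^{3/7}, \qquad n^{-\frac{m}{2m+p}} = n^{-3/7},$$
so with $n = 10^4$ observations the right-hand side of (11) is smaller by a factor of about $10^{-12/7} \approx 0.02$ relative to $n = 1$.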
24. Convergence rate of the S-GaSP in calibration
Define the estimator for the reality in the S-GaSP by the penalized KRR, for any $x \in \mathcal{X}$:
$$\hat{y}^R_{\lambda,\lambda_z,n}(x, \hat{u}) := f^M(x, \hat{u}_{\lambda,\lambda_z,n}) + \hat{b}_{\lambda,\lambda_z,n}(x).$$
Corollary 6
Assume $y^R(\cdot) - f^M(\cdot, u) \in W_2^m(\mathcal{X})$ for any $u \in \mathcal{U}$ and $\sup_{u \in \mathcal{U}} ||y^R(\cdot) - f^M(\cdot, u)||_{\mathcal{H}} < \infty$. Let the eigenvalues of $K(\cdot, \cdot)$ satisfy (10). For sufficiently large $n$ and any $\alpha > 2$ and $C_\beta \in (0, 1)$, with probability at least $1 - \exp\{-(\alpha - 2)/3\} - \exp(-n^{C_\beta \beta})$,
$$||y^R(\cdot) - \hat{y}^R_{\lambda,\lambda_z,n}(\cdot, \hat{u})||_{L_2(\mathcal{X})} \leq \left\{2\sqrt{2} \sup_{u \in \mathcal{U}} ||y^R(\cdot) - f^M(\cdot, u)||_{L_2(\mathcal{X})} + \sup_{u \in \mathcal{U}} ||y^R(\cdot) - f^M(\cdot, u)||_{\mathcal{H}} + C_K \sigma_0 \sqrt{\alpha}\right\} n^{-\frac{m}{2m+p}},$$
by choosing $\lambda = n^{-2m/(2m+p)}$ and $\lambda_z = \lambda^{-1/2}$, where $C_K$ is a constant depending on the kernel $K(\cdot, \cdot)$ and $\beta = (2m - p)^2/(2m(2m + p))$.
This addresses the first criterion: $L_2(\hat{y}^R(\cdot, \hat{u})) = ||y^R(\cdot) - \hat{y}^R(\cdot, \hat{u})||^2_{L_2(\mathcal{X})}$.
27. Convergence to the L2 minimizer in S-GaSP
Denote the $L_2$ minimizer in calibration, which minimizes the distance between the reality and the mathematical model:
$$u^{L_2} = \mathrm{argmin}_{u \in \mathcal{U}} \int_{x \in \mathcal{X}} \left(y^R(x) - f^M(x, u)\right)^2 dx. \qquad (12)$$
Theorem 2
Under some additional regularity conditions, the estimator of the penalized kernel ridge regression by the S-GaSP calibration model satisfies
$$\hat{u}_{\lambda,\lambda_z,n} = u^{L_2} + O_p\left(n^{-\frac{m}{2m+p}}\right),$$
by choosing $\lambda = O(n^{-\frac{2m}{2m+p}})$ and $\lambda_z = O(\lambda^{-1/2})$.
This addresses the second criterion: $L_2(\hat{u}) = ||y^R(\cdot) - f^M(\cdot, \hat{u})||^2_{L_2(\mathcal{X})}$.
The GaSP calibration does not have this property.
28. Example 2
Let $y^O(x) = y^R(x) + \epsilon$, where
$$y^R(x) = 2\sum_{j=1}^\infty j^{-3} \cos(\pi(j - 0.5)x)\sin(j)$$
and $\epsilon \sim N(0, 0.05^2)$ is independent Gaussian noise. Let the mathematical model be a mean parameter, i.e. $f^M(x) = u$. The goal is to predict $y^R(x)$ at $x \in [0, 1]$ and estimate $u$. This function is in the Sobolev space of order $m = 3$.
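Since $f^M(x) = u$ is a constant, the $L_2$-optimal value is simply $u^{L_2} = \int_0^1 y^R(x)\,dx$, which can be computed term by term using $\int_0^1 \cos(\pi(j - 0.5)x)\,dx = (-1)^{j+1}/(\pi(j - 0.5))$. A short numerical sketch (added for illustration):

```python
import numpy as np

# u_L2 = int_0^1 y^R(x) dx for Example 2, summed term by term (sketch).
# Each term uses int_0^1 cos(pi(j-1/2)x) dx = (-1)^(j+1) / (pi(j-1/2)).
j = np.arange(1, 200001)
u_L2 = np.sum(2 * j**-3.0 * np.sin(j) * (-1.0)**(j + 1) / (np.pi * (j - 0.5)))
print(u_L2)  # the target a consistent L2 calibration should recover
```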
Figure 2: Calibration and prediction by the GaSP and discretized S-GaSP calibration models for Example 2. In the left panel (average RMSE of $f^M + \delta$ against $\log(n)$), the black curve is the theoretical upper bound from Corollary 6 (up to a constant); the blue and red circles overlap. In the right panel ($\log(\mathrm{RMSE}_\theta)$ against $\log(n)$), the black curve is the theoretical upper bound from Theorem 2. $\lambda = n^{-2m/(2m+p)} \times 10^{-4}$ with $m = 3$, $p = 1$, and $\lambda_z = \lambda^{-1/2}$ are assumed.
29. MLE for Example 2
Figure 3: Calibration and prediction for Example 2 when the parameters $(\theta, \sigma_0^2, \gamma, \lambda)$ are estimated by MLE. In the left panel (average RMSE of $f^M + \delta$ against $\log(n)$), the black curve represents the theoretical upper bound from Corollary 6 (up to a constant). In the right panel ($\log(\mathrm{RMSE}_\theta)$ against $\log(n)$), the black curve represents the theoretical upper bound from Theorem 2 (up to a constant). $\lambda_z = 1/\sqrt{\lambda}$ is assumed for the S-GaSP calibration.
30. Orthogonality
Corollary 7
Under some regularity conditions, the penalized KRR estimator for the calibration parameters in the S-GaSP calibration in (8) satisfies
$$\left\langle \hat{b}_{\lambda,\lambda_z,n}(\cdot), \frac{\partial f^M(\cdot, \hat{u}_{\lambda,\lambda_z,n})}{\partial u_j} \right\rangle_{\mathcal{H}} + \lambda_z \left\langle \hat{b}_{\lambda,\lambda_z,n}(\cdot), \frac{\partial f^M(\cdot, \hat{u}_{\lambda,\lambda_z,n})}{\partial u_j} \right\rangle_{L_2(\mathcal{X})} = 0.$$
Further assuming the mathematical model is differentiable at $\hat{u}_{\lambda,n}$, the KRR estimator of the calibration parameters in the GaSP calibration in (2) satisfies
$$\left\langle \hat{b}_{\lambda,n}(\cdot), \frac{\partial f^M(\cdot, \hat{u}_{\lambda,n})}{\partial u_j} \right\rangle_{\mathcal{H}} = 0,$$
for any $u_j$, $j = 1, \ldots, q$.
31. The discretized scaled Gaussian stochastic process
One can select $N_C$ distinct points to discretize the input space $[0, 1]^p$, replacing $\int_{\xi \in \mathcal{X}} b^2(\xi)\, d\xi$ by $\frac{1}{N_C}\sum_{i=1}^{N_C} b^2(x_i^C)$ in the S-GaSP model in (3). More specifically, we let the discretized points be the observed variable inputs, i.e. $x_i^C = x_i$ for $i = 1, \ldots, N_C$ and $N_C = n$. The discretized S-GaSP is then defined as
$$y^O(x) = f^M(x, u) + b_{z_d}(x) + \epsilon,$$
$$b_{z_d}(x) = b(x) \,\Big|\, \left\{\frac{1}{n}\sum_{i=1}^n b^2(x_i) = Z_d\right\},$$
$$b(\cdot) \sim GaSP(0, \sigma^2 K(\cdot, \cdot)), \quad Z_d \sim p_{Z_d}(\cdot), \quad \epsilon \sim N(0, \sigma_0^2). \qquad (13)$$
We still assume the default choices of $p_{Z_d}(\cdot)$ and $g_{Z_d}(\cdot)$ defined in (4) and (5), respectively.
32. Theorem 3 (Predictive distribution of the discretized S-GaSP)
Assume $b_{z_d}(\cdot)$ in (13) with $p_{Z_d}(\cdot)$ and $g_{Z_d}(\cdot)$ defined in (4) and (5), respectively. The predictive distribution of the field data at any $x \in \mathcal{X}$ by the discretized S-GaSP model in (13) is
$$y^O(x) \mid \mathbf{y}^O, u, \sigma_0^2, \lambda, \lambda_z \sim N\left(\hat{\mu}_{z_d}(x),\; \sigma_0^2\left((n\lambda)^{-1} K^*_{z_d}(x, x) + 1\right)\right),$$
where
$$\hat{\mu}_{z_d}(x) = f^M(x, u) + \frac{\mathbf{r}^T(x)}{1 + \lambda\lambda_z}\left(\mathbf{R} + \frac{n\lambda}{1 + \lambda\lambda_z}\mathbf{I}_n\right)^{-1}\left(\mathbf{y}^O - \mathbf{f}^M_u\right),$$
$$K^*_{z_d}(x, x) = K(x, x) - \mathbf{r}^T(x)\left[\mathbf{I}_n + \frac{n}{(1 + \lambda\lambda_z)\lambda_z}\left(\mathbf{R} + \frac{n\lambda}{1 + \lambda\lambda_z}\mathbf{I}_n\right)^{-1}\right]\tilde{\mathbf{R}}^{-1}\mathbf{r}(x),$$
for any $x \in \mathcal{X}$, where $\mathbf{r}(x) = (K(x, x_1), \ldots, K(x, x_n))^T$ and $\tilde{\mathbf{R}} = \mathbf{R} + \frac{n}{\lambda_z}\mathbf{I}_n$, with the $(i, j)$ entry of $\mathbf{R}$ being $K(x_i, x_j)$ and $\lambda = \sigma_0^2/(n\sigma^2)$.
A more interesting result is that the predictive mean and variance from the discretized S-GaSP are exactly the same as those from the GaSP when the data is noise-free.
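A quick numerical check of the final remark (a sketch with illustrative settings; a synthetic residual vector stands in for $\mathbf{y}^O - \mathbf{f}^M_u$): as $\lambda \to 0$, which is the noise-free case, the discretized S-GaSP predictive mean reduces to the GaSP interpolant.

```python
import numpy as np

# Check: as lam -> 0 (noise-free data), the discretized S-GaSP predictive mean
#   r(x)'/(1+lam*lam_z) (R + n*lam/(1+lam*lam_z) I)^{-1} (y - f)
# reduces to the GaSP interpolant r(x)' R^{-1} (y - f).  Settings illustrative.
n, gamma, lam_z = 30, 0.2, 10.0
x = np.linspace(0, 1, n)
xs = 0.37                                      # prediction point
R = np.exp(-np.abs(x[:, None] - x[None, :]) / gamma)
r = np.exp(-np.abs(xs - x) / gamma)
y = np.sin(3 * x)                              # stands in for y^O - f^M_u

gasp = r @ np.linalg.solve(R, y)
for lam in (1e-2, 1e-5, 1e-8):
    c = 1 + lam * lam_z
    sgasp = (r / c) @ np.linalg.solve(R + n * lam / c * np.eye(n), y)
    print(lam, abs(sgasp - gasp))              # difference vanishes
```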
33. Some of our works in calibration and identifiability
Gu, M., Xie, F. and Wang, L. (2018). A theoretical framework of the scaled Gaussian stochastic process. arXiv:1807.03829.
Gu, M. and Wang, L. (2017). Scaled Gaussian stochastic process for computer model calibration and prediction. arXiv:1707.08215.
Gu, M. (2018). "RobustCalibration", available on CRAN: an R package for robust calibration of imperfect mathematical models. R package version 0.5.1.
Gu, M. (2018). Jointly robust prior for emulation, variable selection and calibration. arXiv:1804.09329.
Gu, M. and Shen, W. (2018). Generalized probabilistic principal component analysis (GPPCA) for correlated data.
35. Kyle R Anderson and Michael P Poland. Bayesian estimation of magma supply, storage, and eruption rates using a multiphysical volcano model: Kīlauea volcano, 2000–2012. Earth and Planetary Science Letters, 447:161–171, 2016.
Kyle R Anderson and Michael P Poland. Abundant carbon in the mantle beneath Hawai'i. Nature Geoscience, 10(9):704–708, 2017.
Paul D Arendt, Daniel W Apley, and Wei Chen. Quantification of model uncertainty: Calibration, model discrepancy, and identifiability. Journal of Mechanical Design, 134(10):100908, 2012a.
Paul D Arendt, Daniel W Apley, Wei Chen, David Lamb, and David Gorsich. Improving identifiability in model calibration using multiple responses. Journal of Mechanical Design, 134(10):100909, 2012b.
Maria J Bayarri, James O Berger, Rui Paulo, Jerry Sacks, John A Cafeo, James Cavendish, Chin-Hsu Lin, and Jian Tu. A framework for validation of computer models. Technometrics, 49(2):138–154, 2007a.
MJ Bayarri, JO Berger, J Cafeo, G Garcia-Donato, F Liu, J Palomo, RJ Parthasarathy, R Paulo, J Sacks, and D Walsh. Computer model validation with functional output. The Annals of Statistics, 35(5):1874–1906, 2007b.
Mengyang Gu. Jointly robust prior for Gaussian stochastic process in emulation, calibration and variable selection. arXiv preprint arXiv:1804.09329, 2018.
Dave Higdon, James Gattiker, Brian Williams, and Maria Rightley. Computer model calibration using high-dimensional output. Journal of the American Statistical Association, 103(482):570–583, 2008.
James S Hodges and Brian J Reich. Adding spatially-correlated errors can mess up the fixed effect you love. The American Statistician, 64(4):325–334, 2010.
John Hughes and Murali Haran. Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1):139–159, 2013.
Marc C Kennedy and Anthony O'Hagan. Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3):425–464, 2001.
Fei Liu, MJ Bayarri, and JO Berger. Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis, 4(1):119–150, 2009.
Rui Paulo, Gonzalo García-Donato, and Jesús Palomo. Calibration of computer models with multivariate output. Computational Statistics and Data Analysis, 56(12):3959–3974, 2012.
Matthew Plumlee. Bayesian calibration of inexact computer models. Journal of the American Statistical Association, (just-accepted), 2016.
Brian J Reich, James S Hodges, and Vesna Zadnik. Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics, 62(4):1197–1206, 2006.
Rui Tuo and CF Jeff Wu. Efficient calibration for imperfect computer models. The Annals of Statistics, 43(6):2331–2352, 2015.
Rui Tuo and CF Jeff Wu. A theoretical framework for calibration in computer models: parametrization, estimation and convergence properties. SIAM/ASA Journal on Uncertainty Quantification, 4(1):767–795, 2016.
Raymond KW Wong, Curtis B Storlie, and Thomas Lee. A frequentist approach to computer model calibration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79:635–648, 2017.
39. Application: ground deformation and Kīlauea Volcano (which just had its biggest eruption in 100 years...)
Figure 4: Two satellite interferograms used in Anderson and Poland (2016, 2017).
A geophysical/mathematical model $f^M(x, u)$ is used to model the ground displacement. The calibration parameters are the spatial location of the magma chamber, depth of the chamber, magma storage rate, host rock properties, etc.
"All models are wrong."
40. Calibrating the geophysical model for Kīlauea Volcano
Table 1: Input variables and calibration parameters of the geophysical model for Kīlauea Volcano in 2011 to 2012.
Input variable (x) Name Description
x1 Latitude Spatial coordinate
x2 Longitude Spatial coordinate
Parameter (u) Name Description
u1 ∈ [−2000, 3000] Chamber east (m) Spatial coordinate for the chamber
u2 ∈ [−2000, 5000] Chamber north (m) Spatial coordinate for the chamber
u3 ∈ [500, 6000] Chamber depth (m) Depth of the chamber
u4 ∈ [0, 0.15] Res. vol. change rate (m3/s) Volume change rate of the reservoir
u5 ∈ [0.25, 0.33] Poisson’s ratio Host rock property
We compare GaSP and S-GaSP for calibrating the geophysical model
in Anderson and Poland (2016) – the displacement of the ground’s
surface caused by addition of magma to a spherical reservoir.
41. Posterior distribution of the calibration parameters
Figure 5: Marginal posterior densities of $u$ by the GaSP (red curves) and S-GaSP (blue curves).
The marginal posterior of $u_3$, the chamber depth, is centered at the large value of approximately 5000 m in the GaSP calibration.
42. Prediction for the first interferogram
Figure 6: Prediction of the first interferogram by the different models (the calibrated computer model alone, and the computer model plus discrepancy, for both GaSP and S-GaSP).
The predictive mean squared error using the calibrated computer model is $2.70 \times 10^{-5}$ for the GaSP calibration and $2.40 \times 10^{-5}$ for the S-GaSP calibration.
43. Prediction for the second interferogram
Figure 7: Prediction of the second interferogram by the different models (the calibrated computer model alone, and the computer model plus discrepancy, for both GaSP and S-GaSP).
The predictive mean squared error using the calibrated computer model is $1.43 \times 10^{-4}$ for the GaSP calibration and $1.21 \times 10^{-4}$ for the S-GaSP calibration.