Final Presentation given at the conclusion of the 2018 IMSM by the US EPA Student Working Group.
Group Members: Elizabeth Herman, Jeonghwa Lee, Kartik Lovekar, Dorcas Ofori-Boateng, Fatemeh Norouzi, Benazir Rowe and Jianhui Sun
2018 IMSM: Splicing of Multi-Scale Downscaler Air Quality Surfaces - US EPA Working Group, July 25, 2018
1. Splicing of Multi-Scale Downscaler Air Quality Surfaces
Elizabeth Herman, Jeonghwa Lee, Kartik Lovekar, Dorcas Ofori-Boateng, Fatemeh Norouzi, Benazir Rowe, and Jianhui Sun
Industrial Math/Stat Modeling Workshop 2018
July 25, 2018
2. Motivation
In 2016, 122.5 million people lived in counties with high levels of air pollutant concentrations.
12.1 million people lived in counties with high levels of PM2.5.
7 million premature deaths are caused by ambient air pollution.
http://www.who.int/gho/phe/air_pollution_mortality/en/
https://www.epa.gov/air-trends/air-quality-national-summary
3. Data
Air Quality System (AQS): point-source measurements, usually near large cities.
IMPROVE sites: point-source measurements, usually near rural areas.
Downscaler Model (DS): fuses pollutant estimates from a numerical model based on current knowledge of the atmosphere with AQS readings, using a spatially-varying weighted model.
4. Data
Old method: run DS on the national surface.
New method: run DS over regional surfaces.
DS has a single range parameter, so regional fits can adapt it locally.
Regions can be run in parallel and perform better.
5. Data
Run the DS on the NOAA climate regions with an overlap area.
Question: How to deal with the multiple values in the overlap region?
6. Regions: Overlap
Question: How to deal with the multiple values in the overlap region?
8. Exploratory Data Analysis: Relative Discrepancy
Let IMPROVE_s be the air pollutant reading from the IMPROVE station at location s, and DS_k be the DS output from the k-th grid cell, which contains station s. The relative discrepancy is then measured by the fractional bias:

$$\mathrm{FB}(\mathrm{IMPROVE}_s, \mathrm{DS}_k) = \frac{\mathrm{DS}_k - \mathrm{IMPROVE}_s}{(\mathrm{IMPROVE}_s + \mathrm{DS}_k)/2}$$
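This relative discrepancy is the standard fractional bias. As a minimal sketch of the computation in Python/NumPy (array names are illustrative, not from the working group's code):

```python
import numpy as np

def fractional_bias(improve, ds):
    """FB between station readings IMPROVE_s and the DS output DS_k
    for the grid cell containing each station."""
    improve = np.asarray(improve, dtype=float)
    ds = np.asarray(ds, dtype=float)
    return (ds - improve) / ((improve + ds) / 2.0)

# Toy values: DS overpredicts at the first site, underpredicts at the second.
print(fractional_bias([10.0, 20.0], [12.0, 18.0]))  # approx. [ 0.182 -0.105]
```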
9. Downscaler and IMPROVE Discrepancy
10. Downscaler and AQS Discrepancy
11. Methodology: Horizontal Mixed Density (HMD)
Model assumption: for site s,

$$f_s = w_1(s)\, f_{1,s} + w_2(s)\, f_{2,s}$$

where $f_{i,s}$ is a normal density with $\mu = \hat{\mu}_{i,s}$ (the estimated DS mean at s) and $\sigma = \hat{\sigma}_{i,s}$ (the estimated DS standard error at s) from region i,

$$w_i(s) = \frac{e^{-\phi\, d(s,i)}}{e^{-\phi\, d(s,1)} + e^{-\phi\, d(s,2)}},$$

and $d(s,i)$ is the distance of point s to region i, for i = 1, 2.
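To make the mixing concrete, here is a minimal sketch of the HMD weight and density computation in Python (assuming SciPy for the normal density; function and argument names are ours, not from the slides):

```python
import numpy as np
from scipy.stats import norm

def region_weights(d1, d2, phi):
    """Distance-decay weights: w_i(s) = exp(-phi*d(s,i)) / sum_j exp(-phi*d(s,j))."""
    e1, e2 = np.exp(-phi * d1), np.exp(-phi * d2)
    total = e1 + e2
    return e1 / total, e2 / total

def hmd_density(x, mu1, sd1, mu2, sd2, d1, d2, phi):
    """Mixed density f_s(x) = w1(s)*f_{1,s}(x) + w2(s)*f_{2,s}(x)."""
    w1, w2 = region_weights(d1, d2, phi)
    return w1 * norm.pdf(x, loc=mu1, scale=sd1) + w2 * norm.pdf(x, loc=mu2, scale=sd2)

# Example: a site closer to region 1 than to region 2 leans toward region 1's fit.
print(hmd_density(x=8.0, mu1=7.5, sd1=1.0, mu2=9.0, sd2=1.5, d1=5.0, d2=15.0, phi=0.1))
```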
12. Methodology: Horizontal Mixed Density (HMD)
Figure 1: Distance from a site to the boundary
13. Methodology: Horizontal Mixed Density (HMD)
Figure 2: Weight functions with different φ values
14. Results
HMD
Figure 3: HMD applied on the intersection of NR and NW
15. Methodology: Horizontal Mixed Variable (HMV)
For a site s, the DS random variable from region i is

$$X_{i,s} \sim N(\hat{\mu}_{i,s}, \hat{\sigma}_{i,s}), \quad i = 1, 2.$$

Our new variable at site s is

$$X_s = w_1(s)\, X_{1,s} + w_2(s)\, X_{2,s},$$

where the weight $w_i(s)$ is defined as before:

$$w_i(s) = \frac{e^{-\phi\, d(s,i)}}{e^{-\phi\, d(s,1)} + e^{-\phi\, d(s,2)}}.$$
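Because X_s is a weighted sum of normal variables, its mean and standard error are available in closed form; the sketch below additionally assumes the two regional fits are independent (the slides do not state this), so treat the variance line as an assumption:

```python
import numpy as np

def hmv_prediction(mu1, sd1, mu2, sd2, d1, d2, phi):
    """Mean and standard error of X_s = w1*X_{1,s} + w2*X_{2,s}.
    The variance formula ASSUMES X_{1,s} and X_{2,s} are independent."""
    e1, e2 = np.exp(-phi * d1), np.exp(-phi * d2)
    w1, w2 = e1 / (e1 + e2), e2 / (e1 + e2)
    mean = w1 * mu1 + w2 * mu2
    sd = np.sqrt((w1 * sd1) ** 2 + (w2 * sd2) ** 2)
    return mean, sd

# A site equidistant from both regions gets the simple average of the two means.
print(hmv_prediction(mu1=7.5, sd1=1.0, mu2=9.0, sd2=1.5, d1=10.0, d2=10.0, phi=0.1))
```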
16. Results
HMV
Figure 4: HMV applied on the intersection of NR and NW
17. Methodology: Adaptive Horizontal Mixed Variable (AHMV)
Our new variable at site s is

$$X_s = w_1(s)\, X_{1,s} + w_2(s)\, X_{2,s}$$

with

$$w_i(s) = \frac{e^{-\phi\, d(s,i)}}{e^{-\phi\, d(s,1)} + e^{-\phi\, d(s,2)}}$$

and

$$\phi(d(s,c)) = \beta_0 + \beta_1\, d(s,c),$$

where $d(s,c)$ is the horizontal distance of s to the vertical center line.
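AHMV changes only how φ is set: instead of a single fixed value, φ varies linearly with the site's distance to the center line. A sketch with β0 and β1 taken as given (the slides do not say how they are chosen):

```python
import numpy as np

def ahmv_weights(d1, d2, dc, beta0, beta1):
    """Adaptive weights: phi = beta0 + beta1*d(s,c), where dc is the horizontal
    distance of the site to the vertical center line of the overlap."""
    phi = beta0 + beta1 * dc
    e1, e2 = np.exp(-phi * d1), np.exp(-phi * d2)
    return e1 / (e1 + e2), e2 / (e1 + e2)

# With beta1 > 0, sites far from the center line get a larger phi (a sharper blend).
print(ahmv_weights(d1=5.0, d2=15.0, dc=8.0, beta0=0.05, beta1=0.02))
```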
18. Methodology: Adaptive Horizontal Mixed Variable (AHMV)
Figure 5: Distance from a site to the center
19. Results
AHMV
Figure 6: AHMV applied on the intersection of NR and NW
20. Results
Table 1: Mean Square Error for AQS and IMPROVE sites (NW & NR)

    Data source    HMD      HMV      AHMV
    AQS             2.596    2.829    2.823
    IMPROVE        47.913   42.588   42.250

Table 2: Mean Square Error for DS (NW & NR)

    Data source    NW      NR
    AQS             3.89    3.21
    IMPROVE        65.00   24.00
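For reference, both tables report the mean square error of each surface's values against the station readings (AQS or IMPROVE); a one-function sketch:

```python
import numpy as np

def mse(predicted, observed):
    """Mean square error of surface predictions against station readings."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((predicted - observed) ** 2))
```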
21. Conclusion and Future Work
Conclusion: the proposed methods produce a smooth surface across the region boundary.
Future work: extend to multiple zones and include latitude.
22. THANK YOU!
Elizabeth Mannshardt, Barron Henderson, and Brett Gantt
Brian Reich
Organizers of IMSM
SAMSI
QUESTIONS
23. References
Berrocal, V. J., Gelfand, A. E. and Holland, D. M. (2010). A spatio-temporal downscaler for outputs from numerical models. J. Agric. Biol. Environ. Stat. 15, 176–197. doi:10.1007/s13253-009-0004-z