Final Presentation given at the conclusion of the 2018 IMSM by the US EPA Student Working Group.
Group Members: Elizabeth Herman, Jeonghwa Lee, Kartik Lovekar, Dorcas Ofori-Boateng, Fatemeh Norouzi, Benazir Rowe and Jianhui Sun
2018 IMSM: Splicing of Multi-Scale Downscaler Air Quality Surfaces - US EPA Working Group, July 25, 2018
1. Splicing of Multi-Scale Downscaler Air Quality Surfaces
Elizabeth Herman, Jeonghwa Lee, Kartik Lovekar, Dorcas Ofori-Boateng, Fatemeh Norouzi, Benazir Rowe, and Jianhui Sun
Industrial Math/Stat Modeling Workshop 2018
July 25, 2018
2. Motivation
In 2016, 122.5 million people lived in counties with high levels of air pollutant concentrations.
12.1 million people lived in counties with high levels of PM2.5.
7 million premature deaths are caused by ambient air pollution.
http://www.who.int/gho/phe/air_pollution_mortality/en/
https://www.epa.gov/air-trends/air-quality-national-summary
3. Data
Air Quality System (AQS): point-source measurements, usually near large cities.
IMPROVE sites: point-source measurements, usually near rural areas.
Downscaler Model (DS): fuses pollutant estimates from a numerical model based on current knowledge of the atmosphere with AQS readings, using a spatially-varying weighted model.
4. Data
Old method: run DS on the national surface.
New method: run DS over regional surfaces.
DS has a single range parameter, so regional fits can adapt it locally.
Regions can be run in parallel and perform better.
5. Data
Run the DS on the NOAA climate regions with an overlap area.
Question: How to deal with the multiple values in the overlap region?
6. Regions: Overlap
Question: How to deal with the multiple values in the overlap region?
8. Exploratory Data Analysis: Relative Discrepancy
Let IMPROVE_s be the air pollutant reading from the IMPROVE station at location s, and DS_k be the DS output from the k-th grid cell, which contains station s. The relative discrepancy is then measured by the fractional bias:

$$\mathrm{FB}(\mathrm{IMPROVE}_s, \mathrm{DS}_k) = \frac{\mathrm{DS}_k - \mathrm{IMPROVE}_s}{(\mathrm{IMPROVE}_s + \mathrm{DS}_k)/2}$$
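This relative discrepancy is the standard fractional bias. As a minimal sketch of the computation in Python/NumPy (array names are illustrative, not from the working group's code):

```python
import numpy as np

def fractional_bias(improve, ds):
    """FB between station readings IMPROVE_s and the DS output DS_k
    for the grid cell containing each station."""
    improve = np.asarray(improve, dtype=float)
    ds = np.asarray(ds, dtype=float)
    return (ds - improve) / ((improve + ds) / 2.0)

# Toy values: DS overpredicts at the first site, underpredicts at the second.
print(fractional_bias([10.0, 20.0], [12.0, 18.0]))  # approx. [ 0.182 -0.105]
```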
9. Downscaler and IMPROVE Discrepancy
10. Downscaler and AQS Discrepancy
11. Methodology: Horizontal Mixed Density (HMD)
Model assumption: for site s,

$$f_s = w_1(s)\, f_{1,s} + w_2(s)\, f_{2,s}$$

where $f_{i,s}$ is a normal density with $\mu = \hat{\mu}_{i,s}$ (the estimated DS mean at s) and $\sigma = \hat{\sigma}_{i,s}$ (the estimated DS standard error at s) from region i,

$$w_i(s) = \frac{e^{-\phi\, d(s,i)}}{e^{-\phi\, d(s,1)} + e^{-\phi\, d(s,2)}},$$

and $d(s,i)$ is the distance of point s to region i, for i = 1, 2.
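To make the mixing concrete, here is a minimal sketch of the HMD weight and density computation in Python (assuming SciPy for the normal density; function and argument names are ours, not from the slides):

```python
import numpy as np
from scipy.stats import norm

def region_weights(d1, d2, phi):
    """Distance-decay weights: w_i(s) = exp(-phi*d(s,i)) / sum_j exp(-phi*d(s,j))."""
    e1, e2 = np.exp(-phi * d1), np.exp(-phi * d2)
    total = e1 + e2
    return e1 / total, e2 / total

def hmd_density(x, mu1, sd1, mu2, sd2, d1, d2, phi):
    """Mixed density f_s(x) = w1(s)*f_{1,s}(x) + w2(s)*f_{2,s}(x)."""
    w1, w2 = region_weights(d1, d2, phi)
    return w1 * norm.pdf(x, loc=mu1, scale=sd1) + w2 * norm.pdf(x, loc=mu2, scale=sd2)

# Example: a site closer to region 1 than to region 2 leans toward region 1's fit.
print(hmd_density(x=8.0, mu1=7.5, sd1=1.0, mu2=9.0, sd2=1.5, d1=5.0, d2=15.0, phi=0.1))
```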
12. Methodology: Horizontal Mixed Density (HMD)
Figure 1: Distance from a site to the boundary
13. Methodology: Horizontal Mixed Density (HMD)
Figure 2: Weight functions with different φ values
14. Results
HMD
Figure 3: HMD applied on the intersection of NR and NW
15. Methodology: Horizontal Mixed Variable (HMV)
For a site s, the DS random variable from region i is

$$X_{i,s} \sim N(\hat{\mu}_{i,s}, \hat{\sigma}_{i,s}), \quad i = 1, 2.$$

Our new variable at site s is

$$X_s = w_1(s)\, X_{1,s} + w_2(s)\, X_{2,s},$$

where the weight $w_i(s)$ is defined as before:

$$w_i(s) = \frac{e^{-\phi\, d(s,i)}}{e^{-\phi\, d(s,1)} + e^{-\phi\, d(s,2)}}.$$
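Because X_s is a weighted sum of normal variables, its mean and standard error are available in closed form; the sketch below additionally assumes the two regional fits are independent (the slides do not state this), so treat the variance line as an assumption:

```python
import numpy as np

def hmv_prediction(mu1, sd1, mu2, sd2, d1, d2, phi):
    """Mean and standard error of X_s = w1*X_{1,s} + w2*X_{2,s}.
    The variance formula ASSUMES X_{1,s} and X_{2,s} are independent."""
    e1, e2 = np.exp(-phi * d1), np.exp(-phi * d2)
    w1, w2 = e1 / (e1 + e2), e2 / (e1 + e2)
    mean = w1 * mu1 + w2 * mu2
    sd = np.sqrt((w1 * sd1) ** 2 + (w2 * sd2) ** 2)
    return mean, sd

# A site equidistant from both regions gets the simple average of the two means.
print(hmv_prediction(mu1=7.5, sd1=1.0, mu2=9.0, sd2=1.5, d1=10.0, d2=10.0, phi=0.1))
```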
16. Results
HMV
Figure 4: HMV applied on the intersection of NR and NW
17. Methodology: Adaptive Horizontal Mixed Variable (AHMV)
Our new variable at site s is

$$X_s = w_1(s)\, X_{1,s} + w_2(s)\, X_{2,s}$$

with

$$w_i(s) = \frac{e^{-\phi\, d(s,i)}}{e^{-\phi\, d(s,1)} + e^{-\phi\, d(s,2)}}$$

and

$$\phi(d(s,c)) = \beta_0 + \beta_1\, d(s,c),$$

where $d(s,c)$ is the horizontal distance of s to the vertical center line.
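AHMV changes only how φ is set: instead of a single fixed value, φ varies linearly with the site's distance to the center line. A sketch with β0 and β1 taken as given (the slides do not say how they are chosen):

```python
import numpy as np

def ahmv_weights(d1, d2, dc, beta0, beta1):
    """Adaptive weights: phi = beta0 + beta1*d(s,c), where dc is the horizontal
    distance of the site to the vertical center line of the overlap."""
    phi = beta0 + beta1 * dc
    e1, e2 = np.exp(-phi * d1), np.exp(-phi * d2)
    return e1 / (e1 + e2), e2 / (e1 + e2)

# With beta1 > 0, sites far from the center line get a larger phi (a sharper blend).
print(ahmv_weights(d1=5.0, d2=15.0, dc=8.0, beta0=0.05, beta1=0.02))
```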
18. Methodology: Adaptive Horizontal Mixed Variable (AHMV)
Figure 5: Distance from a site to the center
19. Results
AHMV
Figure 6: AHMV applied on the intersection of NR and NW
20. Results
Table 1: Mean Square Error for AQS and IMPROVE sites (NW & NR)

    Data source    HMD      HMV      AHMV
    AQS             2.596    2.829    2.823
    IMPROVE        47.913   42.588   42.250

Table 2: Mean Square Error for DS (NW & NR)

    Data source    NW      NR
    AQS             3.89    3.21
    IMPROVE        65.00   24.00
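For reference, both tables report the mean square error of each surface's values against the station readings (AQS or IMPROVE); a one-function sketch:

```python
import numpy as np

def mse(predicted, observed):
    """Mean square error of surface predictions against station readings."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((predicted - observed) ** 2))
```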
21. Conclusion and Future Work
Conclusion: the proposed methods produce a smooth surface across the region boundary.
Future work: extend to multiple zones and include latitude.
22. THANK YOU!
Elizabeth Mannshardt, Barron Henderson, and Brett Gantt
Brian Reich
Organizers of IMSM
SAMSI
QUESTIONS
23. References
Berrocal, V. J., Gelfand, A. E. and Holland, D. M. (2010). A spatio-temporal downscaler for outputs from numerical models. J. Agric. Biol. Environ. Stat. 15, 176–197. doi:10.1007/s13253-009-0004-z