We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy and direct recordings with electrodes. Each of these measurement modes has its advantages and disadvantages concerning the spatial and temporal resolution of the data, how directly it measures neural activity, and which organisms it can be applied to. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
Bayesian Additive Regression Trees (BART) has been shown to be an effective framework for modeling nonlinear regression functions, with strong predictive performance in a variety of contexts. The BART prior over a regression function is defined by independent prior distributions on tree structure and leaf or end-node parameters. In observational data settings, Bayesian Causal Forests (BCF) has successfully adapted BART for estimating heterogeneous treatment effects, particularly in cases where standard methods yield biased estimates due to strong confounding.
We introduce BART with Targeted Smoothing, an extension which induces smoothness over a single covariate by replacing independent Gaussian leaf priors with smooth functions. We then introduce a new version of the Bayesian Causal Forest prior, which incorporates targeted smoothing for modeling heterogeneous treatment effects which vary smoothly over a target covariate. We demonstrate the utility of this approach by applying our model to a timely women's health and policy problem: comparing two dosing regimens for an early medical abortion protocol, where the outcome of interest is the probability of a successful early medical abortion procedure at varying gestational ages, conditional on patient covariates. We discuss the benefits of this approach in other women’s health and obstetrics modeling problems where gestational age is a typical covariate.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
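To fix ideas, the two estimators can be written out for the canonical two-period, two-group setup (the notation below is ours, a sketch consistent with the abstract rather than the authors' exact formulation):

```latex
% DID: difference of before-after changes across treated and control groups.
\[
\hat{\tau}_{\mathrm{DID}}
  = \bigl(\bar{Y}^{\mathrm{post}}_{\mathrm{trt}} - \bar{Y}^{\mathrm{pre}}_{\mathrm{trt}}\bigr)
  - \bigl(\bar{Y}^{\mathrm{post}}_{\mathrm{ctl}} - \bar{Y}^{\mathrm{pre}}_{\mathrm{ctl}}\bigr)
\]
% LDV: coefficient on treatment D_i in a regression of the post-period
% outcome on treatment and the lagged (pre-period) outcome.
\[
Y^{\mathrm{post}}_i = \alpha + \tau_{\mathrm{LDV}} D_i + \beta\, Y^{\mathrm{pre}}_i + \varepsilon_i
\]
% Bracketing: for a true positive effect, if ignorability holds then LDV is
% consistent and DID biases upward; if parallel trends holds then DID is
% consistent and LDV biases downward. Under either assumption, the truth
% lies between the two estimates in expectation.
```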
The world of health care is full of policy interventions: a state expands eligibility rules for its Medicaid program, a medical society changes its recommendations for screening frequency, a hospital implements a new care coordination program. After a policy change, we often want to know, “Did it work?” This is a causal question; we want to know whether the policy CAUSED outcomes to change. One popular way of estimating causal effects of policy interventions is a difference-in-differences study. In this controlled pre-post design, we measure the change in outcomes of people who are exposed to the new policy, comparing average outcomes before and after the policy is implemented. We contrast that change to the change over the same time period in people who were not exposed to the new policy. The differential change in the treated group’s outcomes, compared to the change in the comparison group’s outcomes, may be interpreted as the causal effect of the policy. To do so, we must assume that the comparison group’s outcome change is a good proxy for the treated group’s (counterfactual) outcome change in the absence of the policy. This conceptual simplicity and wide applicability in policy settings make difference-in-differences an appealing study design. However, the apparent simplicity belies a thicket of conceptual, causal, and statistical complexity. In this talk, I will introduce the fundamentals of difference-in-differences studies and discuss recent innovations including key assumptions and ways to assess their plausibility, estimation, inference, and robustness checks.
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component of precision medicine and personalized health care. This talk covers several recent projects that develop robust and flexible methods for this research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly, through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs; however, ACWL appears more robust against tree-type misspecification when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method, ST-RL, for evaluating optimal DTRs.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented information about how causal inference is being used to determine the cost/benefit of the two most common surgical treatments for women: hysterectomy and myomectomy.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tool, which enables the use of observational data to learn how clinicians balance competing priorities when treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus on estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
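Mechanically, the bounds are simple to form. Below is a minimal sketch with simulated data (all numbers and group labels are hypothetical, not the authors' code): partition the controls into two groups, compute a DID estimate against each, and take the interval spanned by the two estimates.

```python
import numpy as np

def did(pre_t, post_t, pre_c, post_c):
    """Standard DID estimate: change in treated minus change in controls."""
    return (post_t.mean() - pre_t.mean()) - (post_c.mean() - pre_c.mean())

# Hypothetical outcome data: treated units plus two control partitions
# (e.g., controls judged more/less exposed to the differential shock).
rng = np.random.default_rng(0)
pre_t,  post_t  = rng.normal(10, 1, 200), rng.normal(12.0, 1, 200)
pre_c1, post_c1 = rng.normal(10, 1, 200), rng.normal(10.5, 1, 200)
pre_c2, post_c2 = rng.normal(10, 1, 200), rng.normal(11.0, 1, 200)

est1 = did(pre_t, post_t, pre_c1, post_c1)
est2 = did(pre_t, post_t, pre_c2, post_c2)

# The two estimates bracket the treatment effect when one control group
# over-adjusts and the other under-adjusts for the confounding event.
lower, upper = min(est1, est2), max(est1, est2)
print(f"DID bracketing bounds: [{lower:.2f}, {upper:.2f}]")
```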
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
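The final step, plugging the gradient estimates into a stochastic first-order method, is generic. Here is a minimal sketch in which `noisy_gradient` is a hypothetical stand-in for the local-experimentation gradient estimator the abstract describes:

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_gradient(p):
    """Stand-in for a noisy gradient estimate of the platform's utility U(p);
    here the toy utility is U(p) = -(p - 3)^2, so the gradient is -2(p - 3)."""
    return -2.0 * (p - 3.0) + rng.normal(scale=0.5)

p = 0.0                        # initial supply-side payment
for t in range(1, 501):
    step = 0.5 / np.sqrt(t)    # diminishing step size for convergence
    p += step * noisy_gradient(p)   # stochastic gradient ascent on U(p)

print(f"payment after optimization: {p:.2f}  (optimum of the toy utility: 3.00)")
```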
We discuss a general roadmap for generating causal inference from observational studies used to generate real-world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for the construction of asymptotically efficient plug-in estimators of a target estimand for realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure that first uses ensemble machine learning, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric-model-based estimators) to build a single most powerful ensemble machine learning algorithm. We present the Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second stage, TMLE maximizes a parametric likelihood along a so-called least favorable parametric submodel through the super-learner fit of the relevant stochastic relations in the observed data. This second step bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also review recent advances in collaborative TMLE, in which the fit of the treatment and censoring mechanism is tailored with respect to the performance of the TMLE, and we discuss asymptotically valid bootstrap-based inference. Simulations and data analyses are provided as demonstrations.
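For illustration, here is a minimal TMLE sketch for the average treatment effect with a binary treatment and a bounded outcome, using single off-the-shelf learners in place of a super learner and simulated data (an editorial sketch of the two-stage recipe, not the authors' software):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2000
W = rng.normal(size=(n, 3))                          # measured covariates
A = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))      # confounded treatment
Y = 1 / (1 + np.exp(-(W[:, 1] + A)))                 # bounded outcome in (0, 1)
Y = np.clip(Y + rng.normal(scale=0.05, size=n), 1e-3, 1 - 1e-3)

logit = lambda p: np.log(p / (1 - p))
expit = lambda x: 1 / (1 + np.exp(-x))

# Stage 1: initial fits of the outcome regression Q(a, W) and propensity g(W).
Q_fit = GradientBoostingRegressor().fit(np.column_stack([A, W]), Y)
Q1 = np.clip(Q_fit.predict(np.column_stack([np.ones(n), W])), 1e-3, 1 - 1e-3)
Q0 = np.clip(Q_fit.predict(np.column_stack([np.zeros(n), W])), 1e-3, 1 - 1e-3)
QA = np.where(A == 1, Q1, Q0)
g = np.clip(LogisticRegression().fit(W, A).predict_proba(W)[:, 1], 0.01, 0.99)

# Stage 2 (targeting): fluctuate Q along the least favorable submodel via a
# logistic regression of Y on the "clever covariate" H with offset logit(Q).
H = (A / g - (1 - A) / (1 - g)).reshape(-1, 1)
eps = sm.GLM(Y, H, family=sm.families.Binomial(), offset=logit(QA)).fit().params[0]
Q1_star = expit(logit(Q1) + eps / g)
Q0_star = expit(logit(Q0) - eps / (1 - g))

print(f"TMLE ATE estimate: {np.mean(Q1_star - Q0_star):.3f}")
```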
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when the treatment effect size and treatment effect variation are small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This seminar discussed ways in which to produce professional academic writing, from academic papers to research proposals or technical writing in general.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most noticeable technologies in recent years. The first is the foundation of artificial intelligence and big data, and the second has significantly disrupted the financial industry. Both technologies are data-driven, and thus there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. In this paper, we review research on combining blockchain and machine learning technologies and demonstrate that they can collaborate efficiently and effectively. We conclude by pointing out some future directions and anticipating further research on deeper integration of the two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
This talk builds on recent empirical work addressing the extent to which the transaction graph serves as an early-warning indicator for large financial losses. By identifying certain sub-graphs ('chainlets') with a causal effect on price movements, we demonstrate the impact of extreme transaction graph activity on the intraday volatility of the Bitcoin price series. In particular, we infer the loss distributions conditional on extreme chainlet activity. Armed with this empirical representation, we propose a modeling approach to explore conditions under which the market is stabilized by transaction-graph-aware agents.
Bitcoin is a crypto-currency that enables peer-to-peer payments in a way that is partially anonymous and which requires no central authorities. These features are attractive to many criminals, as it is difficult to track and shut down Bitcoin payments — difficult, but not impossible. In this talk, I present novel approaches to trace the flow of bitcoins in two types of criminal operations: (i) human traffickers that advertised sex services on Backpage [KDD ’17]; and (ii) ransomware, which encrypts files and demands bitcoins in exchange for decryption keys [IEEE S&P ’18].
Cryptocurrency (crypto) is a digital asset whose value is a function of the algorithm it is based on. This algorithm, or protocol, is the core of the crypto. It defines, among many other things, whether new crypto units can be created, the circumstances under which new units can be created, and the ownership of these new units. The protocol is coded and distributed to users as software. Being software, crypto is prone to bugs, requires upgrades, and is subject to hacking.
We use this variation in holders’ type and in crypto type to study two questions: (1) Do announcements that provide important information about the future of the crypto (e.g., protocol upgrades) increase the information asymmetry between sophisticated investors and pure users? and (2) Do announcements (e.g., on Twitter) with wider spread decrease the value lost by pure users? That is, does a medium like Twitter allow for reach that is wide and effective enough to also inform pure users and thus decrease the information asymmetry?
Cryptocurrency exchanges offer challenges to even the most seasoned data scientists, security engineers and financial analysts. Everything from customer due diligence to predictive analytics to cybersecurity to illicit activity tracking requires both techniques and data that are unique to cryptocurrency and often hard to obtain. The ease with which dishonest exchanges can inflate trading volumes and the difficulty of obtaining approval for custody solutions present hurdles when seeking formal regulatory approval. Advancements in Deep Fake videos and highly targeted phishing campaigns have tipped the scales in favor of attackers, keeping security teams constantly trying to stay one step ahead of adversaries and leaving enforcement bodies wondering what precisely to collect in terms of forensic evidence. My talk will describe how a regulated cryptocurrency spot and futures exchange went from an idea to a fully regulated entity, the challenges we encountered along the way with regard to the design of custody and unique data analytics, and the open research challenges remaining.
The goal of this work is to advance our understanding of what new can be learned about crypto-tokens by analyzing the topological structure of the Ethereum transaction network. By introducing a novel combination of tools from topological data analysis and functional data depth into blockchain data analytics, we show that the Ethereum network can provide critical insights into price strikes of crypto-tokens that are otherwise largely inaccessible with conventional data sources and traditional analytic methods.
In this talk I will show that standard graph features, such as the degree distribution of the transaction graph, may not be sufficient to capture network dynamics and their potential impact on fluctuations of the Bitcoin price. In contrast, new topological features of the graph, computed using the tools of persistent homology, exhibit high utility for predicting Bitcoin price dynamics. Using the proposed persistent-homology-based techniques, I will present the ChainNet platform, a new, elegant, easily extendable and computationally light approach for graph representation learning on blockchains.
In this talk, we present the research around “Cryptocurrency and blockchain systems”. In particular, we analyse three different sources of data, originating from (i) blockchains, (ii) exchange offices, and (iii) news.
In the first part, we study the possibility of inferring early warning indicators for periods of extreme bitcoin price volatility using features obtained from the non-negative decomposition of Bitcoin daily transaction graphs.
In the second part, we present temporal mixture models capable of adaptively exploiting both volatility history and order book features.
Our temporal mixture model makes it possible to decipher the time-varying effect of order book features on volatility.
In the last part, we focus on cryptocurrency news. In order to track popular news in real time, we (a) match news articles from the web with tweets from social media, (b) track their intraday tweet activity and (c) explore different machine learning models for predicting the number of mentions an article receives on Twitter after its publication.
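Step (c) is a standard supervised-learning task; a minimal sketch with entirely hypothetical features and synthetic labels (for illustration only, not the authors' pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 1000
# Hypothetical per-article features: tweet activity in the first hour after
# publication, outlet follower count, and publication hour of day.
X = np.column_stack([
    rng.poisson(5, n),            # early tweet count
    rng.lognormal(10, 1, n),      # outlet follower count
    rng.integers(0, 24, n),       # publication hour
])
y = X[:, 0] * 3 + rng.poisson(10, n)   # synthetic 24h mention counts

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out R^2: {model.score(X_te, y_te):.2f}")
```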
The definition of infrastructure in the digital age is changing, asset ownership is becoming blurred, and new data-driven operating and financial models are emerging. By intertwining infrastructure into digital assets, informational inefficiencies can be uncovered that change how we design, value, price and invest in infrastructure assets. Tokenization of information from IoT, coupled with smart contracts, offers opportunities to improve efficiencies throughout the financing and operation phases. We are working with incubators, public finance managers, investors and public officials to explore blockchain-enabled financing models for transportation, water, energy and real estate infrastructure. The integration of highly variable time series of data impacts valuation and pricing, and a lack of standardization currently challenges emerging applications and scalability.
We describe three approaches: fully collateralized custodial tokens, partially collateralized custodial tokens, and dynamically stabilized tokens, and demonstrate that only fully collateralized tokens can be stable, even under extreme circumstances. To conclude, we discuss in detail the Digital Trade Coin, a stablecoin backed by either fiat or real assets, and argue that such a coin can serve as a much-needed counterpoint to the US dollar. We also briefly discuss the merits and demerits of Facebook's proposed Libra.
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observational Studies - Colin Fogarty, December 11, 2019
1. Testing Weak Nulls in Matched Observational Studies
Colin Fogarty
Massachusetts Institute of Technology
December 11, 2019
2. Robustness of Randomization Tests
Randomization tests provide exact tests for sharp null hypotheses, the most common of which is Fisher’s sharp null: $H_F : y_i(0) = y_i(1)$ for all $i$.
3. Robustness of Randomization Tests
Randomization tests provide exact tests for sharp null hypotheses, the most common of which is Fisher’s sharp null: $H_F : y_i(0) = y_i(1)$ for all $i$.
Concern: are randomization tests of sharp nulls prone to misinterpretation?
4. Robustness of Randomization Tests
Randomization tests provide exact tests for sharp null hypotheses, the most common of which is Fisher’s sharp null: $H_F : y_i(0) = y_i(1)$ for all $i$.
Concern: are randomization tests of sharp nulls prone to misinterpretation?
Perhaps a researcher will use a randomization test, but then think that she/he has evidence for the existence of a positive average effect if it rejects...
5. Robustness of Randomization Tests
Randomization tests provide exact tests for sharp null hypotheses, the most common of which is Fisher’s sharp null: $H_F : y_i(0) = y_i(1)$ for all $i$.
Concern: are randomization tests of sharp nulls prone to misinterpretation?
Perhaps a researcher will use a randomization test, but then think that she/he has evidence for the existence of a positive average effect if it rejects...
Related issue: permutation tests are often viewed by practitioners as nonparametric alternatives to t-tests when the two samples don’t look normally distributed.
6. Randomization Tests for Weak Nulls
Consider a completely randomized experiment under the finite population model. Suppose I want to use a randomization test to test the weak null hypothesis $H_N : \bar{y}(1) - \bar{y}(0) = 0$.
7. Randomization Tests for Weak Nulls
Consider a completely randomized experiment under the finite population model. Suppose I want to use a randomization test to test the weak null hypothesis $H_N : \bar{y}(1) - \bar{y}(0) = 0$.
Cannot use the randomization test based upon the observed treated-minus-control difference in means, $\hat{\tau}$.
8. Randomization Tests for Weak Nulls
Consider a completely randomized experiment under the finite population model. Suppose I want to use a randomization test to test the weak null hypothesis $H_N : \bar{y}(1) - \bar{y}(0) = 0$.
Cannot use the randomization test based upon the observed treated-minus-control difference in means, $\hat{\tau}$.
Can use the randomization test with the studentized difference in means,
$$\frac{\hat{\tau}}{\sqrt{s_T^2/n_T + s_C^2/n_C}},$$
where $s_T^2$ and $s_C^2$ are the sample variances of the observed outcomes in the treated and control groups.
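To make the procedure concrete, here is a minimal editorial sketch of the studentized randomization test in a simulated completely randomized experiment (illustrative code, not the speaker's):

```python
import numpy as np

rng = np.random.default_rng(4)

def studentized_stat(y, z):
    """Treated-minus-control difference in means, divided by the
    standard error sqrt(s_T^2/n_T + s_C^2/n_C)."""
    yt, yc = y[z == 1], y[z == 0]
    tau_hat = yt.mean() - yc.mean()
    se = np.sqrt(yt.var(ddof=1) / len(yt) + yc.var(ddof=1) / len(yc))
    return tau_hat / se

# Simulated experiment with heterogeneous unit-level effects averaging
# exactly to zero: Neyman's weak null holds, Fisher's sharp null fails.
n = 200
z = rng.permutation(np.repeat([1, 0], n // 2))
y0 = rng.normal(size=n)
tau_i = rng.normal(0.0, 2.0, size=n)
tau_i -= tau_i.mean()                # enforce the weak null exactly
y = np.where(z == 1, y0 + tau_i, y0)

obs = studentized_stat(y, z)
# Reference distribution: recompute the statistic over re-randomizations.
perm = np.array([studentized_stat(y, rng.permutation(z)) for _ in range(2000)])
p_value = np.mean(np.abs(perm) >= np.abs(obs))
print(f"studentized statistic: {obs:.2f}, randomization p-value: {p_value:.3f}")
```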
9. Randomization Tests for Weak Nulls
The studentized randomization test yields a single, unified mode of inference under the finite population model that is
1. Exact under Fisher’s sharp null
10. Randomization Tests for Weak Nulls
The studentized randomization test yields a single, unified mode of inference under the finite population model that is
1. Exact under Fisher’s sharp null
2. Asymptotically conservative under Neyman’s weak null
Conservative inference is a fundamental property of inference under the finite population model with heterogeneous effects.
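An editorial note on point 2: the conservatism stems from a standard finite-population variance identity, recalled here for reference (a known result, not new material from the talk):

```latex
% Finite-population variance of the difference in means (Neyman, 1923):
\[
\operatorname{Var}(\hat{\tau})
  = \frac{S_1^2}{n_T} + \frac{S_0^2}{n_C} - \frac{S_\tau^2}{N},
\]
% where S_1^2 and S_0^2 are the finite-population variances of the potential
% outcomes y_i(1) and y_i(0), and S_tau^2 is the variance of the unit-level
% effects y_i(1) - y_i(0). Since S_tau^2 is unidentifiable, the usual plug-in
% estimator drops it, giving an upward-biased variance estimate:
\[
\mathbb{E}\!\left[\frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}\right]
  \;\ge\; \operatorname{Var}(\hat{\tau}),
\]
% with equality when the unit-level effects are constant (S_tau^2 = 0).
```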
11. Randomization Tests for Weak Nulls
The studentized randomization test yields a single, unified mode of inference under the finite population model that is
1. Exact under Fisher’s sharp null
2. Asymptotically conservative under Neyman’s weak null
Conservative inference is a fundamental property of inference under the finite population model with heterogeneous effects.
While the sharp and weak nulls are surely different, the studentized randomization test obviates the distinction for practitioners.
See Loh et al. (2017); Wu and Ding (2018). Also see Chung and Romano (2013) for related developments for permutation tests.
12. What’s Different for Observational Studies?
For randomized experiments, $E(\hat{\tau}) = 0$ under both the sharp and weak nulls. But regardless of assuming the sharp or weak null, we would not expect $E(\hat{\tau}) = 0$ in a matched observational study. Why? Unmeasured confounding.
13. What’s Different for Observational Studies?
For randomized experiments, $E(\hat{\tau}) = 0$ under both the sharp and weak nulls. But regardless of assuming the sharp or weak null, we would not expect $E(\hat{\tau}) = 0$ in a matched observational study. Why? Unmeasured confounding.
In observational studies, we assess the robustness of our study’s findings to unmeasured confounding through a sensitivity analysis.
Colin Fogarty Sensitivity Analysis for Weak Nulls December 11, 2019
14. What’s Different for Observational Studies?
For randomized experiments, E(ˆτ) = 0 under both the sharp and
weak nulls. But regardless of assuming the sharp or weak null, we
would not expect E(ˆτ) = 0 in a matched observational study.
Why? Unmeasured confounding.
In observational studies, we assess the robustness of our study’s
findings to unmeasured confounding through a sensitivity analysis
We’ll review a model proposed by Rosenbaum for such an
analysis in matched observational studies
Existing methods under the model primarily focus on testing
sharp null hypotheses.
Colin Fogarty Sensitivity Analysis for Weak Nulls December 11, 2019
15. What’s Different for Observational Studies?
For randomized experiments, E(ˆτ) = 0 under both the sharp and
weak nulls. But regardless of assuming the sharp or weak null, we
would not expect E(ˆτ) = 0 in a matched observational study.
Why? Unmeasured confounding.
In observational studies, we assess the robustness of our study’s
findings to unmeasured confounding through a sensitivity analysis
We’ll review a model proposed by Rosenbaum for such an
analysis in matched observational studies
Existing methods under the model primarily focus on testing
sharp null hypotheses.
What if effect heterogeneity induces larger discrepancies between
¯y(1) − ¯y(0) and the worst-case expectation of ˆτ?
Colin Fogarty Sensitivity Analysis for Weak Nulls December 11, 2019
Notation for Matched Studies

- The $i$th of $B$ matched sets has $n_i$ individuals; $N = \sum_{i=1}^{B} n_i$.
- Individual $j$ in set $i$ has observed covariates $x_{ij}$ and an unobserved covariate $0 \le u_{ij} \le 1$.
- Using an optimal, without-replacement matching algorithm, treated and control individuals are placed into matched sets such that $x_{ij} \approx x_{ij'}$.
- Consider post-strata with one treated individual and $n_i - 1$ controls (what follows immediately extends to full matching).
- $y_{ij}(1)$ and $y_{ij}(0)$ are the potential outcomes under treatment and control for individual $j$ in set $i$.
- $\tau_{ij} = y_{ij}(1) - y_{ij}(0)$ is the treatment effect for each individual.
What do we observe?
- $Z_{ij}$ is the treatment indicator (1 treated, 0 control).
- $Y_{ij} = Z_{ij} y_{ij}(1) + (1 - Z_{ij}) y_{ij}(0)$ is the observed response.
- $\hat{\tau}_i = \sum_{j=1}^{n_i} Z_{ij} Y_{ij} - \sum_{j=1}^{n_i} (1 - Z_{ij}) Y_{ij}/(n_i - 1)$ is the treated-minus-control difference in means in set $i$.
- $\bar{\tau}_i = \sum_{j=1}^{n_i} \tau_{ij}/n_i$ is the average of the treatment effects in set $i$.
- $\bar{\tau} = \sum_{i=1}^{B} (n_i/N)\,\bar{\tau}_i$ is the sample average treatment effect.

Let $\mathcal{F} = \{y_{ij}(1), y_{ij}(0), x_{ij}, u_{ij}\}$, $\Omega = \{z : \sum_{j=1}^{n_i} z_{ij} = 1 \text{ for all } i\}$, and $\mathcal{Z} = \{Z \in \Omega\}$. Inference moving forwards will condition upon $\mathcal{F}$ and $\mathcal{Z}$.
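For concreteness, a small sketch (my own notation, not from the talk) computing the stratum-wise differences in means and an $(n_i/N)$-weighted combination across sets:

```python
import numpy as np

def tau_hat_i(Y, Z):
    """tau_hat_i for one matched set with one treated unit: the treated
    response minus the mean of the n_i - 1 control responses."""
    Y, Z = np.asarray(Y, float), np.asarray(Z)
    return Y[Z == 1].sum() - Y[Z == 0].mean()

def weighted_average(tau_hats, sizes):
    """sum_i (n_i / N) * tau_hat_i across the B matched sets."""
    n = np.asarray(sizes, float)
    return float(np.sum(n * np.asarray(tau_hats)) / n.sum())
```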
A Simple Model for Hidden Bias

Let $\pi_{ij} = \mathrm{pr}(Z_{ij} = 1 \mid \mathcal{F})$. A simple model for hidden bias states that $\pi_{ij} = \mathrm{pr}(Z_{ij} = 1 \mid x_{ij}, u_{ij})$, with $0 \le u_{ij} \le 1$ and
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \kappa_i + \log(\Gamma)\,u_{ij}.$$
- $\Gamma \ge 1$ controls the impact of hidden bias on treatment assignment.
- Equivalently, for individuals $j, j'$ in the same matched set $i$,
$$\frac{1}{\Gamma} \le \frac{\pi_{ij}(1 - \pi_{ij'})}{\pi_{ij'}(1 - \pi_{ij})} \le \Gamma.$$
Let $\wp_{ij} = \mathrm{pr}(Z_{ij} = 1 \mid \mathcal{F}, \mathcal{Z})$.
- Conditions on the matched structure, removing dependence on the nuisance parameters $\kappa_i$.
- At $\Gamma = 1$, $\wp_{ij} = 1/n_i$.
  - Recovers a finely stratified experiment (one treated, $n_i - 1$ controls, $n_i$ equiprobable assignments).
  - Entitles one to modes of inference justified under those designs.
- $\Gamma > 1$ allows for departures from the idealized finely stratified experiment that matching seeks to emulate,
$$\wp_{ij} = \frac{\exp\{\log(\Gamma)\,u_{ij}\}}{\sum_{k=1}^{n_i} \exp\{\log(\Gamma)\,u_{ik}\}}.$$
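A minimal numerical sketch (assumed names, not from the talk) of these conditional assignment probabilities:

```python
import numpy as np

def conditional_probs(u, gamma):
    """wp_ij = Gamma**u_ij / sum_k Gamma**u_ik for one matched set,
    since exp{log(Gamma) * u} = Gamma**u."""
    w = gamma ** np.asarray(u, float)
    return w / w.sum()

# At Gamma = 1 every unit in the set is equally likely to be treated:
print(conditional_probs([0.3, 0.9, 0.0], 1.0))  # [1/3, 1/3, 1/3]

# Gamma > 1 tilts assignment toward units with larger u:
print(conditional_probs([1.0, 0.0, 0.0], 2.0))  # [0.5, 0.25, 0.25]
```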
Sensitivity Analysis Under the Sharp Null

Let $\delta_{ij}$ be the observed value of the treated-minus-control difference in means in set $i$ if individual $j$ receives the treatment,
$$\delta_{ij} = y_{ij}(1) - \sum_{j' \ne j} y_{ij'}(0)/(n_i - 1).$$
Under the sharp null, $y_{ij}(1) = y_{ij}(0)$, so the observed responses $Y_{ij}$ impute the $y_{ij}(0)$, and hence the $\delta_{ij}$.

Consider the randomization distribution based upon $\hat{\tau}$,
$$\mathrm{pr}(\hat{\tau} \ge a \mid \mathcal{F}, \mathcal{Z}) = \sum_{z \in \Omega} 1\{\hat{\tau} \ge a\}\,\mathrm{pr}(Z = z \mid \mathcal{F}, \mathcal{Z}).$$
Even under the sharp null, we can't directly use this randomization distribution in observational studies because $\mathrm{pr}(Z = z \mid \mathcal{F}, \mathcal{Z})$ is unknown.
For a given value of $\Gamma$, a sensitivity analysis tries to construct a random variable $T_\Gamma$ such that under the sharp null
$$\mathrm{pr}(\hat{\tau} \ge a \mid \mathcal{F}, \mathcal{Z}) \le \mathrm{pr}(T_\Gamma \ge a \mid \mathcal{F}, \mathcal{Z}).$$
- This bounding random variable is used to upper bound p-values.
- Can be done exactly and tractably in a paired design.
- Not tractable for general matched designs, so one typically resorts to asymptotic approximations (Gastwirth 2000).

Iteratively increase $\Gamma$ until one can no longer reject the null; the value of $\Gamma$ at which rejection first fails attests to the robustness of the study's finding to hidden bias. (A minimal sketch of this outer loop follows.)
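Here `worst_case_p_value` is a placeholder for whatever procedure returns the worst-case p-value bound at a given $\Gamma$; the step size and cap are arbitrary illustrative choices.

```python
def sensitivity_value(worst_case_p_value, alpha=0.05,
                      gamma_max=20.0, step=0.05):
    """Increase Gamma from 1 until the test no longer rejects at level
    alpha; the returned Gamma summarizes robustness to hidden bias."""
    gamma = 1.0
    while gamma <= gamma_max and worst_case_p_value(gamma) <= alpha:
        gamma += step
    return gamma
```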
A Unified Procedure?

Constructing the worst-case random variable makes heavy use of the sharp null: finding worst-case treatment assignment probabilities requires knowledge of what the outcomes would have been for all assignments.

Suppose we assume that the sensitivity model holds at $\Gamma \ge 1$. Can we find a single, unified non-randomized hypothesis test $\varphi(\alpha, \Gamma)$ (1 if reject, 0 otherwise) such that, under suitable regularity conditions and for any $u$,
1. Under the sharp null, $\limsup E\{\varphi(\alpha, \Gamma) \mid \mathcal{F}, \mathcal{Z}\} \le \alpha$, with equality possible
2. Under the weak null, $\limsup E\{\varphi(\alpha, \Gamma) \mid \mathcal{F}, \mathcal{Z}\} \le \alpha$, with equality possible
Success for Paired Designs at Γ > 1

Define $n$ new random variables
$$D_{\Gamma i} = \hat{\tau}_i - \frac{\Gamma - 1}{1 + \Gamma}|\hat{\tau}_i|.$$
For any $\Gamma$, $\hat{\tau}_i$ has worst-case expectation $\{(\Gamma - 1)/(1 + \Gamma)\}|\hat{\tau}_i|$ under the sharp null.

Consider the standard error estimate
$$se(\bar{D}_\Gamma) = \sqrt{\frac{1}{n(n-1)} \sum_{i=1}^{n} (D_{\Gamma i} - \bar{D}_\Gamma)^2},$$
and consider the test
$$\varphi(\alpha, \Gamma) = 1\left\{\bar{D}_\Gamma / se(\bar{D}_\Gamma) \ge \Phi^{-1}(1 - \alpha)\right\},$$
where $\Phi(\cdot)$ is the standard normal CDF.
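A minimal sketch of this paired-design test, assuming the paired differences `tau_hats` have already been computed:

```python
import numpy as np
from scipy.stats import norm

def studentized_sensitivity_test(tau_hats, gamma, alpha=0.05):
    """Reject when Dbar_Gamma / se(Dbar_Gamma) >= Phi^{-1}(1 - alpha)."""
    t = np.asarray(tau_hats, float)
    n = t.size
    D = t - (gamma - 1.0) / (1.0 + gamma) * np.abs(t)  # D_{Gamma, i}
    Dbar = D.mean()
    se = np.sqrt(np.sum((D - Dbar) ** 2) / (n * (n - 1)))
    return bool(Dbar / se >= norm.ppf(1.0 - alpha))
```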
An Asymptotic Sensitivity Analysis for Neyman's Null
Under mild regularity conditions, if the sensitivity model holds at $\Gamma$ and the weak null holds,
$$\limsup_{n \to \infty} E\{\varphi(\alpha, \Gamma) \mid \mathcal{F}, \mathcal{Z}\} \le \alpha.$$

Furthermore, one can replace the standard normal reference distribution with a studentized randomization distribution (with biased assignment probabilities governed by $\Gamma$) such that
1. Under the sharp null, $E\{\varphi(\alpha, \Gamma) \mid \mathcal{F}, \mathcal{Z}\} \le \alpha$ for any sample size, with equality possible
2. Under the weak null, $\limsup E\{\varphi(\alpha, \Gamma) \mid \mathcal{F}, \mathcal{Z}\} \le \alpha$, with equality possible

See Fogarty (2019+) for more details.
Γ > 1, General Matched Designs

For a given $\Gamma$, consider as a test statistic any monotone increasing function of the stratumwise differences in means, $h_{\Gamma, n_i}(\hat{\tau}_i)$.
- May depend upon $\Gamma$ and $n_i$.

Let $\mu_{\Gamma i}$ be the worst-case expectation for $h_{\Gamma, n_i}(\hat{\tau}_i)$ under the sharp null hypothesis.
- Easy to compute through asymptotic separability (could also just solve the LP).
- $\mu_{\Gamma i}$ is random under the weak null, varying with $Z$.
An Impossibility Result

Consider
$$E\left[N^{-1} \sum_{i=1}^{B} \{h_{\Gamma, n_i}(\hat{\tau}_i) - \mu_{\Gamma i}\}\right].$$
This is bounded above by zero under the sharp null if the model holds at $\Gamma$, with equality possible. What about the weak null?

Failure to Control the Worst-Case Expectation
Suppose the sensitivity model holds at $\Gamma$. Then, for any choice of functions $\{h_{\Gamma, n_i}\}_{n_i \ge 2}$, there exist combinations of stratum sizes, potential outcomes satisfying the weak null, and unmeasured confounders $u$ such that
$$\liminf_{B \to \infty} N^{-1} \sum_{i=1}^{B} E\{h_{\Gamma, n_i}(\hat{\tau}_i) - \mu_{\Gamma i}\} > 0.$$
Implications

The expectations cannot be simultaneously tightly controlled:
- If a sensitivity analysis is to be asymptotically correct (rather than conservative) under the sharp null, it must only be valid for a subset of the weak null.
- If a sensitivity analysis is valid for the entirety of the weak null, it must be unnecessarily conservative if only the sharp null holds.
- Using a studentized test statistic does nothing to help! This has to do with misalignment of the worst-case expectations.
Valid Inference for the Weak Null

For a given $\Gamma$, the conditional probabilities $\wp_{ij}$ are bounded as
$$\frac{1}{\Gamma(n_i - 1) + 1} \le \wp_{ij} \le \frac{\Gamma}{n_i - 1 + \Gamma},$$
and are further constrained by $\sum_{j=1}^{n_i} \wp_{ij} = 1$.

If we knew $\wp_{ij}$, an unbiased estimator for $\bar{\tau}_i$ given $\mathcal{F}, \mathcal{Z}$ would be
$$IPW_i = \frac{1}{n_i} \cdot \frac{\hat{\tau}_i}{\sum_{j=1}^{n_i} Z_{ij}\,\wp_{ij}}.$$
- At $\Gamma = 1$, $IPW_i = \hat{\tau}_i$.
- For $\Gamma > 1$, we can't use $IPW_i$ for any $i$, as the $\wp_{ij}$ are unknown.
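In code, the bounds and the (infeasible) oracle IPW estimator look like this; a sketch under my own naming, not the paper's implementation:

```python
def wp_bounds(n_i, gamma):
    """Range of the conditional treatment probability in a set of size n_i."""
    return 1.0 / (gamma * (n_i - 1) + 1.0), gamma / (n_i - 1.0 + gamma)

def ipw_i(tau_hat_i, n_i, wp_treated):
    """Oracle estimator of tau_bar_i: requires the (unknown) conditional
    probability wp_treated of the unit that was actually treated."""
    return tau_hat_i / (n_i * wp_treated)
```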
Worst-Case Weighting

We can't use $IPW_i$ as the $\wp_{ij}$ are unknown. Consider instead
$$W_{\Gamma i} = \frac{1}{n_i} \cdot \frac{\hat{\tau}_i}{\frac{\Gamma}{n_i - 1 + \Gamma}\,1(\hat{\tau}_i \ge 0) + \frac{1}{1 + \Gamma(n_i - 1)}\,1(\hat{\tau}_i < 0)}.$$
This weights $\hat{\tau}_i$ by the worst-case assignment probability; if the sensitivity model holds at $\Gamma$,
$$E(W_{\Gamma i} \mid \mathcal{F}, \mathcal{Z}) \le \bar{\tau}_i, \qquad E\left\{\sum_{i=1}^{B} (n_i/N)\,W_{\Gamma i} \,\Big|\, \mathcal{F}, \mathcal{Z}\right\} \le \bar{\tau}.$$
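A direct transcription of this worst-case weighting (a sketch, with assumed names):

```python
def W_gamma(tau_hat_i, n_i, gamma):
    """Divide tau_hat_i by n_i times the most adverse feasible assignment
    probability for its sign: the upper bound when tau_hat_i >= 0, the
    lower bound otherwise."""
    if tau_hat_i >= 0:
        wp = gamma / (n_i - 1.0 + gamma)
    else:
        wp = 1.0 / (gamma * (n_i - 1) + 1.0)
    return tau_hat_i / (n_i * wp)
```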
Why Might This Be Conservative?

Suppose $n_i = 3$, the potential values for $\hat{\tau}_i$ given $\mathcal{F}, \mathcal{Z}$ are $\delta_{i1} = 5$, $\delta_{i2} = -2$, $\delta_{i3} = -3$, and we conduct inference at $\Gamma = 2$. What would the worst-case probabilities employed by $W_{2i}$ be?
$$W_{2i} = \frac{1}{3} \cdot \frac{\hat{\tau}_i}{\frac{1}{2}\,1(\hat{\tau}_i \ge 0) + \frac{1}{5}\,1(\hat{\tau}_i < 0)}.$$
- $\delta_{i1} = 5 \Rightarrow \wp^*_{i1} = 1/2$
- $\delta_{i2} = -2 \Rightarrow \wp^*_{i2} = 1/5$
- $\delta_{i3} = -3 \Rightarrow \wp^*_{i3} = 1/5$

$\sum_{j=1}^{n_i} \wp^*_{ij} = 9/10$. Not a probability distribution! If the model holds at $\Gamma = 2$, $E(W_{2i} \mid \mathcal{F}, \mathcal{Z}) < 0$.
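The example can be verified in a few lines:

```python
# n_i = 3, Gamma = 2: worst-case probabilities implied by W_{2i}
deltas = [5.0, -2.0, -3.0]
gamma, n_i = 2.0, 3
wp_star = [gamma / (n_i - 1 + gamma) if d >= 0
           else 1.0 / (gamma * (n_i - 1) + 1.0) for d in deltas]
print(wp_star)       # [0.5, 0.2, 0.2]
print(sum(wp_star))  # 0.9 -- the weights do not sum to one
```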
Incompatibility

In general, the worst-case IPW estimator $W_{\Gamma i}$ does not generate worst-case probabilities corresponding to a valid probability distribution.
- Under the sharp null, we can find the worst-case probability distribution because we know $\delta_{ij}$ for $j = 1, \ldots, n_i$.
- $W_{\Gamma i}$ is unduly conservative under the sharp null, and can be improved upon by weighting with the worst-case distribution.
- Under the weak null, we can't impose the constraint! We only know $\delta_{ij}$ for one of the $n_i$ individuals.
- Any attempt at adjusting the $\wp^*_{ij}$ runs the risk of yielding liberal inference.

Incompatibility is not a deficiency of matching; it is prevalent in many modes of sensitivity analysis.
An Alternative Weighting

Consider the alternative weighting
$$\tilde{W}_{\Gamma i} = \frac{1}{n_i} \cdot \frac{\hat{\tau}_i}{\frac{2\Gamma}{n_i(1 + \Gamma)}\,1(\hat{\tau}_i \ge 0) + \frac{2}{n_i(1 + \Gamma)}\,1(\hat{\tau}_i < 0)}.$$
Relative to before,
$$\frac{1}{\Gamma(n_i - 1) + 1} \Rightarrow \frac{2}{n_i(1 + \Gamma)} \ \text{(larger)}, \qquad \frac{\Gamma}{n_i - 1 + \Gamma} \Rightarrow \frac{2\Gamma}{n_i(1 + \Gamma)} \ \text{(smaller)}.$$
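A sketch of the alternative weighting, mirroring the `W_gamma` sketch above:

```python
def W_tilde(tau_hat_i, n_i, gamma):
    """Same form as W_gamma, but with the moderated denominators
    2*Gamma / (n_i*(1+Gamma)) and 2 / (n_i*(1+Gamma))."""
    if tau_hat_i >= 0:
        wp = 2.0 * gamma / (n_i * (1.0 + gamma))
    else:
        wp = 2.0 / (n_i * (1.0 + gamma))
    return tau_hat_i / (n_i * wp)
```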
Expectation under the Sharp Null

This weighting scheme has nice properties under the sharp null.

The Worst-Case Expectation Under the Sharp Null
Suppose the sensitivity model holds at $\Gamma$. Then,
$$E(\tilde{W}_{\Gamma i} \mid \mathcal{F}, \mathcal{Z}) \le 0,$$
and equality holds when $u_{ij} = 1\{\delta_{ij} \ge 0\}$.

So, taking a weighted average, under the sharp null
$$E\left\{\sum_{i=1}^{B} (n_i/N)\,\tilde{W}_{\Gamma i} \,\Big|\, \mathcal{F}, \mathcal{Z}\right\} \le 0.$$
Because it sharply bounds the expectation under the sharp null, it must only do so for a subset of the weak null!
The Expectation Under the Weak Null

For other matched designs, if the weak null holds but the sharp null does not,
$$E\left\{\sum_{i=1}^{B} (n_i/N)\,\tilde{W}_{\Gamma i}\right\} \le C_\Gamma \sum_{i=1}^{B} (n_i/N)\left\{1 + \frac{1 - \Gamma}{\Gamma}\,\mathrm{pr}(\hat{\tau}_i \ge \bar{\tau}_i \mid \mathcal{F}, \mathcal{Z})\right\}\bar{\tau}_i.$$
A sufficient condition for the worst-case expectation to be controlled under the weak null is that, for the worst-case confounder,
$$\mathrm{cov}\{\mathrm{pr}(\hat{\tau}_i \ge \bar{\tau}_i \mid \mathcal{F}, \mathcal{Z}),\; n_i \bar{\tau}_i\} \ge 0.$$
Reasonable?

Consider $u_{ij} = 1\{\delta_{ij} \ge \bar{\tau}_i\}$. The sufficient condition can be rewritten as
$$\mathrm{cov}\left(\frac{n_i}{\sum_{j=1}^{n_i}\{1(\delta_{ij} < \bar{\tau}_i) + \Gamma\,1(\delta_{ij} \ge \bar{\tau}_i)\}},\; n_i \bar{\tau}_i\right) \le 0.$$
- The first term is a function of the stratum size $n_i$ and $\sum_{j=1}^{n_i} 1(\delta_{ij} - \bar{\tau}_i \ge 0)$; the second term is $n_i \bar{\tau}_i$.
- $\sum_{j=1}^{n_i} (\delta_{ij} - \bar{\tau}_i)\,\bar{\tau}_i = 0$ for all $i$!
- The covariance may be nonzero only because residuals being uncorrelated with fitted values does not imply that functions of the residuals are uncorrelated with the fitted values.

Do we care? (A conceptual check of this condition is sketched below.)
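As a conceptual check only (the $\delta_{ij}$ are only partially observed under the weak null, so this applies to hypothetical potential outcomes), the rewritten condition can be evaluated directly; names are my own:

```python
import numpy as np

def sufficient_condition_holds(delta_sets, gamma):
    """delta_sets: list of arrays, the delta_ij for each stratum i.
    Checks cov(n_i / sum_j [1(d < tbar) + Gamma * 1(d >= tbar)],
               n_i * tbar_i) <= 0, using that mean_j(delta_ij) = tbar_i."""
    first, second = [], []
    for d in delta_sets:
        d = np.asarray(d, float)
        n_i, tbar = d.size, d.mean()
        denom = np.sum(np.where(d >= tbar, gamma, 1.0))
        first.append(n_i / denom)
        second.append(n_i * tbar)
    return bool(np.cov(first, second)[0, 1] <= 0)
```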
Are we worried about hidden bias acting in this way? To be anticonservative, the skewness of $\delta_{ij} - \bar{\tau}_i$ must relate to the values of $\bar{\tau}_i$.

If we decide this doesn't matter, a sensitivity analysis using the test statistic $\sum_{i=1}^{B} (n_i/N)\,\tilde{W}_{\Gamma i}$ yields a single procedure that...
- Controls the worst-case expectation under the sharp null, with equality possible
- Controls the worst-case expectation under a (potentially benign?) subset of the weak null

Variance estimation and asymptotic normality follow naturally from Fogarty (2018); Pashley and Miratrix (2019+).
Conclusions

- In randomized experiments, randomization tests using suitably studentized differences-in-means typically yield a single mode of inference for both the sharp (exact) and weak (asymptotic) nulls.
- This continues to be true in a sensitivity analysis in paired observational studies with a careful choice of test statistic.
- Modes of inference cannot be unified in the same way when conducting a sensitivity analysis for general forms of matching.
- If we truly require correctness over the entirety of the weak null, we must be conservative if the sharp null is true.
- We presented a test statistic that can be exact for the sharp null, and valid over a potentially innocuous subset of the weak null.
Thanks!

Fogarty, C.B. Testing weak nulls in matched observational studies. arXiv.
Fogarty, C.B. (2019+). Studentized sensitivity analysis for the sample average treatment effect in paired observational studies. Journal of the American Statistical Association.
Fogarty, C.B. (2018). On mitigating the analytical limitations of finely stratified experiments. Journal of the Royal Statistical Society, Series B.
Paired Designs

For paired designs, $\tilde{W}_{\Gamma i}$ equals $W_{\Gamma i}$, and is equivalent to the unifying procedure described in Fogarty (2019+).

Recall that $D_{\Gamma i} = \hat{\tau}_i - \{(\Gamma - 1)/(1 + \Gamma)\}|\hat{\tau}_i|$. $\tilde{W}_{\Gamma i}$ and $D_{\Gamma i}$ are proportional,
$$\tilde{W}_{\Gamma i} = \frac{(1 + \Gamma)^2}{4\Gamma}\,D_{\Gamma i},$$
such that $E\left\{B^{-1}\sum_{i=1}^{B}\tilde{W}_{\Gamma i} \mid \mathcal{F}, \mathcal{Z}\right\} \le 0$ under the weak null.
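A quick numeric check of the proportionality (illustrative values only):

```python
gamma = 3.0
for tau in (1.7, -0.4):
    D = tau - (gamma - 1.0) / (1.0 + gamma) * abs(tau)
    wp = (2.0 * gamma if tau >= 0 else 2.0) / (2.0 * (1.0 + gamma))
    W = tau / (2.0 * wp)                      # W_tilde with n_i = 2
    assert abs(W - (1.0 + gamma) ** 2 / (4.0 * gamma) * D) < 1e-12
```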
The Propensity Score as a Nuisance Parameter

For inference on average treatment effects under strong ignorability, the propensity score, $e(x) = \mathrm{pr}(Z = 1 \mid X = x)$, is a nuisance parameter: not of primary interest, but important for inference.

Inverse Propensity Weighting, AIPW, ...
- Use a plug-in estimator, $\hat{e}(x)$
- Pay attention to the rate of convergence of $\hat{e}(x)$ and its impact on the resulting inference

How does matching try to handle nuisance parameters?
- A mix of conditioning (dealing with observed covariates) and optimizing (sensitivity to unmeasured confounders)
- Imperfections in this strategy will be investigated here
A Simple Model with Hidden Bias

Suppose treatment is strongly ignorable given $(X, U)$ where $U$ is unobserved, and consider the following model for the probability of assignment to treatment:
$$\mathrm{logit}\{\mathrm{pr}(Z = 1 \mid X = x, U = u)\} = x\beta + \log(\Gamma)\,u$$
- $x$ are observed covariates
- $u \in [0, 1]$ is an unobserved scalar covariate
- $\Gamma$ is a sensitivity parameter
Conditional Permutation Tests

Let $\Omega = \{z : z^T x = a\}$, where $a$ is the observed value of $Z^T x$ in the study, and consider conditioning upon $Z \in \Omega$.

Rosenbaum (1984) with hidden bias:
$$\mathrm{pr}(Z = z \mid x, u, Z \in \Omega) = \frac{\exp\{\log(\Gamma)\,z^T u\}}{\sum_{b \in \Omega} \exp\{\log(\Gamma)\,b^T u\}}$$
- $Z^T x$ is sufficient for $\beta$, so conditioning removes dependence on $\beta$
- Inference still depends on $u$

Example: Suppose $x \in \mathbb{R}^{N \times B}$, where the $i$th of $B$ columns contains an indicator for membership in the $i$th stratum.
- $\beta_i$ = stratum-specific slope
- $(Z^T x)_i$ = number of treated in the $i$th stratum
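For a single stratum, this biased permutation distribution can be enumerated directly; a brute-force sketch under my own naming (exponential in the stratum size, for illustration only):

```python
import itertools
import numpy as np

def biased_assignment_probs(u, m, gamma):
    """Probabilities proportional to Gamma**(z @ u), since
    exp{log(Gamma) * z'u} = Gamma**(z'u), over all assignments z in
    Omega = {z : sum(z) = m} for one stratum."""
    u = np.asarray(u, float)
    omega = [np.array(z) for z in itertools.product([0, 1], repeat=u.size)
             if sum(z) == m]
    w = np.array([gamma ** float(z @ u) for z in omega])
    return omega, w / w.sum()
```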
Similarities with Matching

Consider optimal matching without replacement.
- The $i$th matched set is of size $n_i$
- $m_i$ is the number of treated individuals in matched set $i$

Letting $\Omega = \{z : \sum_{j=1}^{n_i} z_{ij} = m_i \text{ for all } i\}$, Rosenbaum (2002) suggests using the following model for biased treatment assignments,
$$\mathrm{pr}(Z = z \mid x, u, Z \in \Omega) = \frac{\exp\{\log(\Gamma)\,z^T u\}}{\sum_{b \in \Omega} \exp\{\log(\Gamma)\,b^T u\}},$$
aligning with the randomization distribution above.
Matching as a Search for Sufficiency

Modify the logit form to allow for nonlinearities in $x$:
$$\mathrm{logit}\{\mathrm{pr}(Z = 1 \mid X = x, U = u)\} = \phi(x) + \log(\Gamma)\,u$$
In using the conditional inference described in Rosenbaum (2002), we assume
1. $\phi(x_{ij}) = \phi(x_{ik})$ for individuals $j$ and $k$ in the same matched set $i$
2. The matched structure would be invariant over $z \in \Omega$

Of course, these won't hold in practice:
1. Discrepancies in $\phi(x_{ij})$ are inevitable (inducing bias)
2. Had I observed a different $z \in \Omega$, I may have attained a different match (affecting variance)

How much of a difference can this make? Working paper...