Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatment Effect Heterogeneity: Model Parameterization, Prior Choice, and Posterior Summarization - Jared Murray, December 9, 2019
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when treatment effect size and treatment effect variation are small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Bayesian Additive Regression Trees (BART) has been shown to be an effective framework for modeling nonlinear regression functions, with strong predictive performance in a variety of contexts. The BART prior over a regression function is defined by independent prior distributions on tree structure and leaf or end-node parameters. In observational data settings, Bayesian Causal Forests (BCF) has successfully adapted BART for estimating heterogeneous treatment effects, particularly in cases where standard methods yield biased estimates due to strong confounding.
We introduce BART with Targeted Smoothing, an extension which induces smoothness over a single covariate by replacing independent Gaussian leaf priors with smooth functions. We then introduce a new version of the Bayesian Causal Forest prior, which incorporates targeted smoothing for modeling heterogeneous treatment effects which vary smoothly over a target covariate. We demonstrate the utility of this approach by applying our model to a timely women's health and policy problem: comparing two dosing regimens for an early medical abortion protocol, where the outcome of interest is the probability of a successful early medical abortion procedure at varying gestational ages, conditional on patient covariates. We discuss the benefits of this approach in other women’s health and obstetrics modeling problems where gestational age is a typical covariate.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tool that enables the use of observational data to learn how clinicians balance competing priorities when treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component of precision medicine and personalized health care. This talk covers several recent projects that develop robust and flexible methods for this problem. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly, through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient, and easy to interpret for the identification of optimal DTRs. However, ACWL appears more robust than T-RL against tree-type misspecification when the true optimal DTR is not tree-type. At the end of this talk, we will also present a new stochastic tree search method, ST-RL, for evaluating optimal DTRs.
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
In general, a factorial experiment involves several variables.
One variable is the response variable, which is sometimes called the outcome variable or the dependent variable.
The other variables are called factors.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus on estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
2008 JSM - Meta Study Data vs Patient Data - Terry Liao
Hsini (Terry) Liao, Ph.D., Yun Lu, and Hong Wang, "Comparison of Individual Patient-Level and Study-Level Meta-Analyses Using Time-to-Event Analysis in Drug-Eluting Stent Data," Abstract No. 301037, Joint Statistical Meetings, Session No. 90, Denver, CO, August 2008
An extended GEE model for estimating marginal associations across multiple informants on the same outcome and formally comparing those associations in a hierarchical data set
EE184405 Statistics and Stochastics - Descriptive Statistics 1: Graphs - yusufbf
Statistics is a field of science that studies how to collect data so that it can be described and processed, and then how to perform induction/inference in order to draw conclusions, so that decisions can be made based on the available data.
DATA =============> STATISTICAL PROCESSING ===========> INFORMATION
Descriptive statistics is a way of describing a problem based on the available data, namely by organizing the data so that its characteristics can be easily understood, making it useful for subsequent purposes.
Individualized treatment rules (ITR) assign treatments according to different patients' characteristics. Despite recent advances in the estimation of ITRs, much less attention has been given to uncertainty assessments for the estimated rules. We propose a hypothesis testing procedure for estimated ITRs from a general framework that directly optimizes overall treatment benefit equipped with sparse penalties. Specifically, we construct a local test for testing low-dimensional components of high-dimensional linear decision rules. The procedure can apply to observational studies by taking into account the additional variability from the estimation of the propensity score. Theoretically, our test extends the decorrelated score test proposed in Ning and Liu (2017, Ann. Stat.) and is valid whether or not model selection consistency for the true parameters holds. The proposed methodology is illustrated with numerical studies and a real data example on electronic health records of patients with Type II diabetes.
Using the Breeder GA to Optimize a Multiple Regression Analysis Model - infopapers
Florin Stoica and Cornel Gheorghe Boitor, "Using the Breeder genetic algorithm to optimize a multiple regression analysis model used in prediction of the mesiodistal width of unerupted teeth," International Journal of Computers, Communications & Control, Vol. 9, No. 1, pp. 62-70, ISSN 1841-9836, February 2014
Many decision problems in business and social systems can be modeled using mathematical optimization, which seeks to maximize or minimize some objective that is a function of the decisions.
Stochastic optimization problems are mathematical programs where some of the data incorporated into the objective or constraints are uncertain, whereas deterministic optimization problems are formulated with known parameters.
Similar to Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatment Effect Heterogeneity: Model Parameterization, Prior Choice, and Posterior Summarization - Jared Murray, December 9, 2019
Inferring cause-effect relationships between variables is of primary importance in many sciences. In this talk, I will discuss two approaches for making valid inference on treatment effects when a large number of covariates are present. The first approach is to perform model selection and then to deliver inference based on the selected model. If the inference is made ignoring the randomness of the model selection process, then there could be severe biases in estimating the parameters of interest. While the estimation bias in an under-fitted model is well understood, I will address a lesser known bias that arises from an over-fitted model. The over-fitting bias can be eliminated through data splitting at the cost of statistical efficiency, and I will propose a repeated data splitting approach to mitigate the efficiency loss. The second approach concerns the existing methods for debiased inference. I will show that the debiasing approach is an extension of OLS to high dimensions, and that a careful bias analysis leads to an improvement to further control the bias. The comparison between these two approaches provides insights into their intrinsic bias-variance trade-off, and I will show that the debiasing approach may lose efficiency in observational studies.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy and direct recordings with electrodes. Each of these measurement modes has its advantages and disadvantages concerning the resolution of the data in space and time, the directness of measurement of the neural activity, and which organisms it can be applied to. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
The world of health care is full of policy interventions: a state expands eligibility rules for its Medicaid program, a medical society changes its recommendations for screening frequency, a hospital implements a new care coordination program. After a policy change, we often want to know, “Did it work?” This is a causal question; we want to know whether the policy CAUSED outcomes to change. One popular way of estimating causal effects of policy interventions is a difference-in-differences study. In this controlled pre-post design, we measure the change in outcomes of people who are exposed to the new policy, comparing average outcomes before and after the policy is implemented. We contrast that change to the change over the same time period in people who were not exposed to the new policy. The differential change in the treated group’s outcomes, compared to the change in the comparison group’s outcomes, may be interpreted as the causal effect of the policy. To do so, we must assume that the comparison group’s outcome change is a good proxy for the treated group’s (counterfactual) outcome change in the absence of the policy. This conceptual simplicity and wide applicability in policy settings makes difference-in-differences an appealing study design. However, the apparent simplicity belies a thicket of conceptual, causal, and statistical complexity. In this talk, I will introduce the fundamentals of difference-in-differences studies and discuss recent innovations including key assumptions and ways to assess their plausibility, estimation, inference, and robustness checks.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented information about how causal inference is being used to determine the cost/benefit of the two most common surgical treatments for women - hysterectomy and myomectomy.
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
We discuss a general roadmap for generating causal inference based on observational studies used to generate real-world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for the construction of asymptotically efficient plug-in estimators of a target estimand for realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure that first involves using ensemble machine learning, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric model based estimators) to build a single most powerful ensemble machine learning algorithm. We present the Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second step, the TMLE involves maximizing a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data. This second step bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also review recent advances in collaborative TMLE, in which the fit of the treatment and censoring mechanism is tailored w.r.t. the performance of the TMLE. We also discuss asymptotically valid bootstrap-based inference. Simulations and data analyses are provided as demonstrations.
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This seminar discussed ways in which to produce professional academic writing, from academic papers to research proposals or technical writing in general.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most noticeable technologies in recent years. The first is the foundation of artificial intelligence and big data, and the second has significantly disrupted the financial industry. Both technologies are data-driven, and there is thus rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. In this paper, we review the research on combining blockchain and machine learning technologies and demonstrate that they can collaborate efficiently and effectively. In the end, we point out some future directions and expect more research on deeper integration of the two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
This talk builds on recent empirical work addressing the extent to which the transaction graph serves as an early-warning indicator for large financial losses. By identifying certain sub-graphs ('chainlets') with a causal effect on price movements, we demonstrate the impact of extreme transaction graph activity on the intraday volatility of the Bitcoin price series. In particular, we infer the loss distributions conditional on extreme chainlet activity. Armed with this empirical representation, we propose a modeling approach to explore conditions under which the market is stabilized by transaction-graph-aware agents.
Bitcoin is a crypto-currency that enables peer-to-peer payments in a way that is partially anonymous and which requires no central authorities. These features are attractive to many criminals, as it is difficult to track and shut down Bitcoin payments — difficult, but not impossible. In this talk, I present novel approaches to trace the flow of bitcoins in two types of criminal operations: (i) human traffickers that advertised sex services on Backpage [KDD ’17]; and (ii) ransomware, which encrypts files and demands bitcoins in exchange for decryption keys [IEEE S&P ’18].
Cryptocurrency (crypto) is a digital asset whose value is a function of the algorithm it is based on. This algorithm, or protocol, is the core of the crypto. It defines, among many other things, whether new crypto units can be created, the circumstances under which new units can be created, and the ownership of these new units. The protocol is coded and distributed to users as software. Being software, crypto is prone to bugs, requires upgrades, and is subject to hacking.
We use this variation in holders' type and in crypto type to study two questions: (1) Do announcements that provide important information about the future of the crypto (e.g., protocol upgrades) increase the information asymmetry between sophisticated investors and pure users? and (2) Do announcements (e.g., on Twitter) with wider spread decrease the value lost by pure users? That is, does a medium like Twitter allow for wide reach that is effective enough to also inform pure users and thus decrease the information asymmetry?
Cryptocurrency exchanges offer challenges to even the most seasoned data scientists, security engineers and financial analysts. Everything from customer due diligence to predictive analytics to cybersecurity to illicit activity tracking requires both techniques and data that are unique to cryptocurrency and often hard to obtain. The ease by which dishonest exchanges can inflate trading volumes and the difficulty of obtaining approval for custody solutions presents hurdles when seeking formal regulatory approval. Advancements in Deep Fake videos and highly targeted phishing campaigns have tipped the scales in favor of attackers, keeping security teams constantly trying to stay one step ahead of adversaries and leaving enforcement bodies wondering about what precisely to collect in terms of forensic evidence. My talk will describe how a regulated cryptocurrency spot and futures exchange went from an idea to a fully regulated entity, the challenges we encountered along the way with regard to the design of custody and unique data analytics, and the open research challenges remaining.
The goal of this work is to advance our understanding of what new information can be learned about crypto-tokens by analyzing the topological structure of the Ethereum transaction network. By introducing a novel combination of tools from topological data analysis and functional data depth into blockchain data analytics, we show that the Ethereum network can provide critical insights on price strikes of crypto-tokens that are otherwise largely inaccessible with conventional data sources and traditional analytic methods.
In this talk I will show that standard graph features, such as the degree distribution of the transaction graph, may not be sufficient to capture network dynamics and their potential impact on fluctuations of the Bitcoin price. In contrast, new graph-associated topological features computed using the tools of persistent homology are found to exhibit high utility for predicting Bitcoin price dynamics. Using the proposed persistent-homology-based techniques, I will present the ChainNet platform, a new, elegant, easily extendable and computationally light approach for graph representation learning on Blockchain.
1. Bayesian Nonparametric Models for Treatment Effect Heterogeneity: Parameterization, Prior Choice, and Posterior Summarization
Jared S. Murray – The University of Texas at Austin
Joint work with Carlos Carvalho, P. Richard Hahn, David Yeager, et al.
December 2019
3. National Study of Learning Mindsets
• National Study of Learning Mindsets (Yeager et al., 2019): randomized controlled trial of a low-cost mindset intervention
• Probability sample; 65 schools in this analysis (> 11,000 9th grade students)
• Specifically designed to assess treatment effect heterogeneity by factors including baseline levels of growth mindset beliefs and overall school quality/achievement
4. National Study of Learning Mindsets
Desiderata for our analysis:
• Avoid model selection/specification search
• School-level effect heterogeneity explained and unexplained by covariates
• Individual-level effect heterogeneity explained by covariates
• Interpretable model summaries for communicating results
• Uncertainty!
We can get all of these by modifying Bayesian causal forests (Hahn, Murray, and Carvalho (2017))
5. Our (generic) assumptions
Strong ignorability: Yi(0), Yi(1) ⊥⊥ Zi | Xi = xi
Positivity: 0 < Pr(Zi = 1 | Xi = xi) < 1 for all i.
Then P(Y(z) | x) = P(Y | Z = z, x), and the conditional average treatment effect (CATE) is
τ(xi) := E(Yi(1) − Yi(0) | xi) = E(Yi | xi, Zi = 1) − E(Yi | xi, Zi = 0).
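Spelling out the identification step between these two displays (standard algebra under the stated assumptions, added here for completeness; not on the original slide):

```latex
\begin{aligned}
\tau(x) &= E[Y(1) \mid X = x] - E[Y(0) \mid X = x] \\
        &= E[Y(1) \mid Z = 1, X = x] - E[Y(0) \mid Z = 0, X = x] \quad \text{(strong ignorability)} \\
        &= E[Y \mid Z = 1, X = x] - E[Y \mid Z = 0, X = x] \quad \text{(consistency; positivity ensures both conditional means are well defined)}
\end{aligned}
```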
6. Parameterizing Nonparametric Models of Causal Effects
Let’s forget covariates for a second and pretend we have a simple RCT. A simple model:
(Yi | Zi = 0) ∼ N(µ0, σ²) (iid)
(Yi | Zi = 1) ∼ N(µ1, σ²) (iid)
where the estimand of interest is τ ≡ µ1 − µ0.
If µj ∼ N(ϕj, δj) independently, then τ ∼ N(ϕ1 − ϕ0, δ0 + δ1).
Often we have stronger prior information about τ than about µ1 or µ0 – in particular, we expect it to be small.
7. Parameterizing Nonparametric Models of Causal Effects
A more natural parameterization:
(Yi | Zi = 0) ∼ N(µ, σ²) (iid)
(Yi | Zi = 1) ∼ N(µ + τ, σ²) (iid)
where the estimand of interest is still τ. Now we can express prior beliefs on τ directly.
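A minimal numpy sketch (my own illustration, not from the talk; the hyperparameter values are arbitrary assumptions) of why the parameterization matters: independent priors on the two arm means induce a diffuse prior on τ, while the direct parameterization lets us shrink τ itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 100_000

# Parameterization 1: independent priors on the arm means mu0, mu1.
phi0, phi1, delta0, delta1 = 0.0, 0.0, 1.0, 1.0   # illustrative hyperparameters
mu0 = rng.normal(phi0, np.sqrt(delta0), n_draws)
mu1 = rng.normal(phi1, np.sqrt(delta1), n_draws)
tau_induced = mu1 - mu0                           # induced prior on tau
print(tau_induced.var())                          # ~ delta0 + delta1 = 2.0

# Parameterization 2: prior on (mu, tau) directly; tau can be shrunk toward 0.
tau_direct = rng.normal(0.0, 0.1, n_draws)        # informative prior: effects are small
print(tau_direct.var())                           # ~ 0.01
```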
8. Parameterizing Nonparametric Models of Causal Effects
How does this relate to models for heterogeneous treatment effects? Consider (mostly) separate models for the treatment arms:
yi = f_{zi}(xi) + ϵi, ϵi ∼ P
Independent priors on f0, f1 → the prior on τ(x) ≡ f1(x) − f0(x) has larger variance than the prior on f0 or f1.
No direct prior control – simple forms for f0, f1 can compose to a complex τ.
Every variable in x is necessarily a potential effect modifier.
9. Parameterizing Nonparametric Models of Causal Effects
What about the “just another covariate” parameterization?
yi = f(xi, zi) + ϵi, ϵi ∼ P
Then the heterogeneous treatment effects are given by τ(x) ≡ f(x, 1) − f(x, 0), every variable in x is still a potential effect modifier, and we have no direct prior control for most priors on f...
10. Parameterizing Nonparametric Models of Causal Effects
Set f(xi, zi) = µ(xi) + τ(wi)zi, where w is a subset of x:
yi = µ(xi) + τ(wi)zi + ϵi, ϵi ∼ P
The heterogeneous treatment effects are given by τ(w) directly, and can be easily regularized.
In Hahn et al. (2017) we use independent Bayesian additive regression tree priors on µ and τ (“Bayesian causal forests”).
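A toy simulation of this parameterization (illustrative only; the particular µ, τ, and choice of moderator are made up): the outcome surface is µ(x) plus an effect τ(w) that depends only on the chosen moderators w ⊂ x, so a prior can shrink effect heterogeneity directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
x = rng.normal(size=(n, 3))          # covariates; suppose w = x[:, 0] is the lone moderator
z = rng.binomial(1, 0.5, n)          # randomized treatment

mu  = lambda x: np.sin(x[:, 0]) + 0.5 * x[:, 1] - 0.25 * x[:, 2] ** 2  # prognostic surface
tau = lambda w: 0.2 + 0.1 * w                                          # small, smooth effect

y = mu(x) + tau(x[:, 0]) * z + rng.normal(0.0, 1.0, n)

# tau(w) is a named model component, not a difference of two independently
# estimated surfaces, so a prior can regularize it directly.
print(y[:5])
```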
11. Building Functions with Regression Trees
Represent each function µ and τ as the sum of many small regression trees: ∑h g(x, Th, Mh)
[Figure: building regression functions by summing trees]
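A toy sketch of evaluating a sum-of-trees function ∑h g(x, Th, Mh) (my own illustration; each “tree” is reduced to a single split, and this is evaluation only, not the BART sampler):

```python
import numpy as np

# Each "tree" here is a stump: (feature index, split value, left leaf, right leaf).
trees = [(0, 0.0, -0.1, 0.1), (1, 0.5, 0.05, -0.05), (0, 1.0, 0.02, 0.08)]

def g(x, tree):
    j, c, left, right = tree
    return np.where(x[:, j] <= c, left, right)   # leaf parameter for each row

def sum_of_trees(x, trees):
    return sum(g(x, t) for t in trees)           # f(x) = sum_h g(x, T_h, M_h)

x = np.random.default_rng(2).normal(size=(5, 2))
print(sum_of_trees(x, trees))
```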
12. Tweaking priors for causal effects
Several adjustments to the BART prior on τ:
• Higher probability on smaller τ trees (than BART defaults)
• Higher probability on “stumps”, trees that never split (all stumps = homogeneous effects)
• N⁺(0, v) hyperprior on the scale of the leaf parameters in τ
13. What about observational data?
What changes with (measured) confounding? Not much, but we should include an estimated propensity score as a covariate:
yi = µ(xi, π̂(xi)) + τ(wi)zi + ϵi, ϵi ∼ P
This mitigates regularization-induced confounding (RIC); see Hahn et al. (2017) for details.
Lots of independent empirical evidence that this is important (cf. Dorie et al. (2019), Wendling et al. (2019)).
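A hedged sketch of the mechanical step (assuming scikit-learn’s LogisticRegression for the propensity fit; any reasonable classifier could stand in): estimate π̂(x) = Pr(Z = 1 | x) and append it to the covariates passed to the outcome model µ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2_000
x = rng.normal(size=(n, 4))
z = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))    # confounded treatment assignment

# Step 1: estimate the propensity score pi_hat(x) = Pr(Z = 1 | x).
pi_hat = LogisticRegression().fit(x, z).predict_proba(x)[:, 1]

# Step 2: include pi_hat as an extra covariate in mu's design matrix,
# i.e. fit y ~ mu([x, pi_hat]) + tau(w) * z with your BCF/BART software.
x_mu = np.column_stack([x, pi_hat])
print(x_mu.shape)                                  # (2000, 5)
```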
15. Analysis of National Study of Learning Mindsets
• A new analysis of effects on math GPA in the overall population of students
• Interesting moderators are baseline level of mindset norms, school achievement, and minority composition
• Many, many other controls
The usual approach is the multilevel linear model...
16. Multilevel BCF
Replaces the linear pieces of a multilevel model with functions that have BART priors:
yij = αj + µ(xij) + [ϕj + τ(wij)]zij + ϵij, ϵij ∼ N(0, σ²)
αj, ϕj have standard random effect priors (normal with half-t hyperpriors on scale)
w = school achievement, baseline mindset norms, minority composition
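An illustrative simulation of this multilevel structure (toy scales of my own choosing; the sizes echo the study’s 65 schools and > 11,000 students): school intercepts αj and school-level effect offsets ϕj sit on top of covariate-driven µ and τ.

```python
import numpy as np

rng = np.random.default_rng(4)
J, n_per = 65, 170                      # schools, students per school (toy sizes)
school = np.repeat(np.arange(J), n_per)

alpha = rng.normal(0.0, 0.3, J)         # school intercepts
phi   = rng.normal(0.0, 0.05, J)        # unexplained school-level effect variation

x = rng.normal(size=school.size)        # student covariate (enters mu)
w = rng.normal(size=school.size)        # moderator (enters tau)
z = rng.binomial(1, 0.5, school.size)

mu  = 0.5 * x
tau = 0.04 + 0.02 * w                   # small effects relative to the noise
y = alpha[school] + mu + (phi[school] + tau) * z + rng.normal(0, 1, school.size)
print(y.mean())
```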
17. We fit this model, now what?
How do we present our results? Average treatment effect (ATE), school ATEs, plots of (school) ATE by xj...
Two questions from our collaborators:
• For which groups was the treatment most/least effective?
• What is the impact of posited treatment effect modifiers, holding other modifiers constant?
19. Subgroup finding as a decision problem
The action γ is choosing subgroups, here represented by a recursive partition (tree).
• Minimize the posterior expected loss
γ̂ = arg min_{γ̃ ∈ Γ} E_τ[d(τ, γ̃) + p(γ̃) | Y, x]
where p(·) is a complexity penalty and d(·) is squared error averaged over a distribution for x.
• With γ̂ in hand we can look at the joint posterior distribution of subgroup ATEs.
Relaxing complexity penalties in the loss function → growing a deeper tree.
(Hahn et al. (2017), Sivaganesan et al. (2017))
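One concrete way to carry out this optimization, sketched below under stated assumptions (the τ draws are synthetic stand-ins, and the complexity penalty is expressed as a CART depth limit via scikit-learn): with squared-error d, minimizing the posterior expected loss amounts to fitting a shallow tree to the posterior mean of τ(x); the fitted partition is then held fixed while the posterior draws of τ give the joint posterior of subgroup ATEs.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(5)
n, n_draws = 1_000, 500
x = rng.uniform(size=(n, 2))                       # (achievement, norms), say

# Stand-in for posterior draws of tau(x_i): one row per posterior draw.
tau_draws = 0.03 + 0.04 * (x[:, 1] > 0.5) * (x[:, 0] < 0.6) \
            + rng.normal(0, 0.01, (n_draws, n))

tau_bar = tau_draws.mean(axis=0)                   # posterior mean of tau(x_i)

# Squared-error d + depth penalty: fit a shallow CART to the posterior mean.
subgroups = DecisionTreeRegressor(max_depth=2).fit(x, tau_bar)
print(export_text(subgroups, feature_names=["achievement", "norms"]))

# Push each posterior draw through the fixed partition to get the
# joint posterior of subgroup ATEs.
leaf = subgroups.apply(x)
for g in np.unique(leaf):
    print(g, tau_draws[:, leaf == g].mean(axis=1).mean())
```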
20. [Figure: fitted subgroup tree. Splitting on school achievement (≤ 0.67 vs > 0.67) and baseline mindset norms (≤ 0.53 vs > 0.53) yields three subgroups: Lower Achieving / Low Norm (CATE = 0.032, n = 3253), Lower Achieving / High Norm (CATE = 0.073, n = 3265), and High Achieving (CATE = 0.016, n = 5023).]
21. [Figure: the same subgroup tree, with the posterior of the difference in subgroup ATEs between the two lower-achieving subgroups (High Norm minus Low Norm): Pr(diff > 0) = 0.93.]
22. [Figure: growing a deeper tree. Comparing Lower Achieving / High Norm with High Achieving gives Pr(diff > 0) = 0.81, and the Low Norm branch splits further on achievement (≤ 0.38 vs > 0.38) into Low Achieving / Low Norm (CATE = 0.010, n = 1208) and Mid Achieving / Low Norm (CATE = 0.045, n = 2045).]
23. We fit this model, now what?
How do we present our results? ATE, school ATEs, plots of school ATE by variables...
Two questions we got from our collaborators:
• For which groups was the treatment most/least effective?
• What was the impact of posited treatment effect modifiers, holding other modifiers constant?
We could generate posterior distributions of individual-level partial effects manually and try to summarize these, or try to approximate τ by a proxy with simple partial effects (i.e., an additive function).
24. Counterfactual treatment effect predictions
• How do estimated treatment effects change in lower achieving / low norm schools if norms increase, holding constant school minority composition & achievement?
• Not a formal causal mediation/moderation analysis (roughly, we would need “no unmeasured moderators correlated with norms”)
[Figure: the subgroup tree from above, highlighting the Lower Achieving / Low Norm subgroup (CATE = 0.032, n = 3253).]
25. [Figure: posterior densities of the Lower Achieving / Low Norm subgroup CATE under counterfactual increases in baseline norms, compared with the original group and the High Norm / Lower Achieving group.]
Original: 0.032 (−0.011, 0.076)
+10%: 0.050 (0.005, 0.097)
+Half IQR: 0.051 (0.005, 0.099)
+Full IQR: 0.059 (0.009, 0.114)
(1 IQR = 0.6 extra problems on the worksheet task)
26. An Alternative: Interpretable Summaries
The goal: examine the “best” (in a user-defined sense) simple approximation to the “true” τ(x).
Given samples of τ:
1. Consider a class of simple/interpretable approximations Γ to τ
2. Make inference on γ = arg min_{γ̃ ∈ Γ} d(τ, γ̃) + p(γ̃) for an appropriate distance function d and complexity penalty p(γ)
Get draws of γ by solving the optimization for each draw of τ.
Also get discrepancy metrics, like pseudo-R²: Cor²[γ(xi), τ(xi)]
(See Woody, Carvalho, and Murray (2019) for this approach in predictive models.)
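A minimal sketch of this recipe (my own stand-in example: Γ is the class of linear summaries, d is squared error, p ≡ 0, and the τ draws are synthetic): project each posterior draw of τ onto Γ to get draws of γ and of the pseudo-R².

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_draws = 500, 400
x = rng.normal(size=(n, 3))

# Stand-in posterior draws of tau(x_i); a real analysis takes these from BCF.
tau_draws = 0.05 + 0.03 * x[:, 0] + 0.02 * np.tanh(x[:, 1]) \
            + rng.normal(0, 0.01, (n_draws, n))

X = np.column_stack([np.ones(n), x])             # Gamma = linear summaries
coef_draws, r2_draws = [], []
for tau in tau_draws:                            # solve the projection per draw
    beta, *_ = np.linalg.lstsq(X, tau, rcond=None)
    gamma = X @ beta
    coef_draws.append(beta)
    r2_draws.append(np.corrcoef(gamma, tau)[0, 1] ** 2)   # pseudo-R^2

print(np.mean(coef_draws, axis=0))               # posterior mean summary coefficients
print(np.quantile(r2_draws, [0.05, 0.5, 0.95]))  # how faithful is the summary?
```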
27. Approximate partial effects of treatment effect modifiers
Here we use:
• An additive approximation γ(x)
• d(τ, γ) = Σᵢ (τ(xi) − γ(xi))²
• A smoothness penalty p(γ)
Don’t average draws to get a point estimate! Treat
d(τ, γ̃) + p(γ̃) = Σᵢ (τ(xi) − γ̃(xi))² + p(γ̃)
as a loss function, and minimize the posterior expected loss:
γ̂ = arg min_{γ̃ ∈ Γ} E_τ[d(τ, γ̃) + p(γ̃) | Y, x]
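One step left implicit here (standard algebra, not specific to the talk): with squared-error d, the posterior expected loss decomposes so that the pointwise posterior variance of τ drops out as a constant, and γ̂ is a penalized fit to the posterior mean τ̄ rather than an average of per-draw fits.

```latex
E_\tau\!\left[ \sum_{i=1}^n (\tau(x_i) - \gamma(x_i))^2 + p(\gamma) \,\middle|\, Y, x \right]
  = \underbrace{\sum_{i=1}^n \operatorname{Var}[\tau(x_i) \mid Y, x]}_{\text{constant in } \gamma}
  + \sum_{i=1}^n \bigl( \bar\tau(x_i) - \gamma(x_i) \bigr)^2 + p(\gamma),
\qquad \bar\tau(x_i) = E[\tau(x_i) \mid Y, x].
```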
28. [Figure: approximate additive partial effects γ of school achievement and baseline mindset norms, with the posterior density of the approximation R². Partial effect of minority composition not shown.]
29. [Figure: a second version of the same additive summary, with a more concentrated posterior density for the approximation R². Partial effect of minority composition not shown.]
30. Thank you!
• Yeager, David S., Paul Hanselman, Gregory M. Walton, Jared S. Murray, Robert Crosnoe, Chandra Muller, Elizabeth Tipton, et al. “A national experiment reveals where a growth mindset improves achievement.” Nature 573, no. 7774 (2019): 364-369.
• Hahn, P. Richard, Jared S. Murray, and Carlos M. Carvalho. “Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects.” 2017; arXiv:1706.09523.
• Woody, Spencer, Carlos M. Carvalho, and Jared S. Murray. “Model interpretation through lower-dimensional posterior summarization.” 2019; arXiv:1905.07103.