Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tools which enables use of observational data to learn how clinicians balance competing priorities for treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component for precision medicine and personalized health care. Specific topics covered in this talk include several recent projects with robust and flexible methods developed for the above research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs. However, ACWL seems more robust against tree-type misspecification than T-RL when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method called ST-RL for evaluating optimal DTRs.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tools which enables use of observational data to learn how clinicians balance competing priorities for treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component for precision medicine and personalized health care. Specific topics covered in this talk include several recent projects with robust and flexible methods developed for the above research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs. However, ACWL seems more robust against tree-type misspecification than T-RL when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method called ST-RL for evaluating optimal DTRs.
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when treatment effect size and treatment effect variation is small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Bayesian Additive Regression Trees (BART) has been shown to be an effective framework for modeling nonlinear regression functions, with strong predictive performance in a variety of contexts. The BART prior over a regression function is defined by independent prior distributions on tree structure and leaf or end-node parameters. In observational data settings, Bayesian Causal Forests (BCF) has successfully adapted BART for estimating heterogeneous treatment effects, particularly in cases where standard methods yield biased estimates due to strong confounding.
We introduce BART with Targeted Smoothing, an extension which induces smoothness over a single covariate by replacing independent Gaussian leaf priors with smooth functions. We then introduce a new version of the Bayesian Causal Forest prior, which incorporates targeted smoothing for modeling heterogeneous treatment effects which vary smoothly over a target covariate. We demonstrate the utility of this approach by applying our model to a timely women's health and policy problem: comparing two dosing regimens for an early medical abortion protocol, where the outcome of interest is the probability of a successful early medical abortion procedure at varying gestational ages, conditional on patient covariates. We discuss the benefits of this approach in other women’s health and obstetrics modeling problems where gestational age is a typical covariate.
In this slide, variables types, probability theory behind the algorithms and its uses including distribution is explained. Also theorems like bayes theorem is also explained.
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
Concepts include decision tree with its examples. Measures used for splitting in decision tree like gini index, entropy, information gain, pros and cons, validation. Basics of random forests with its example and uses.
This Logistic Regression Presentation will help you understand how a Logistic Regression algorithm works in Machine Learning. In this tutorial video, you will learn what is Supervised Learning, what is Classification problem and some associated algorithms, what is Logistic Regression, how it works with simple examples, the maths behind Logistic Regression, how it is different from Linear Regression and Logistic Regression applications. At the end, you will also see an interesting demo in Python on how to predict the number present in an image using Logistic Regression.
Below topics are covered in this Machine Learning Algorithms Presentation:
1. What is supervised learning?
2. What is classification? what are some of its solutions?
3. What is logistic regression?
4. Comparing linear and logistic regression
5. Logistic regression applications
6. Use case - Predicting the number in an image
What is Machine Learning: Machine Learning is an application of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
- - - - - - - -
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars.This Machine Learning course prepares engineers, data scientists and other professionals with knowledge and hands-on skills required for certification and job competency in Machine Learning.
- - - - - - -
Why learn Machine Learning?
Machine Learning is taking over the world- and with that, there is a growing need among companies for professionals to know the ins and outs of Machine Learning
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
- - - - - -
What skills will you learn from this Machine Learning course?
By the end of this Machine Learning course, you will be able to:
1. Master the concepts of supervised, unsupervised and reinforcement learning concepts and modeling.
2. Gain practical mastery over principles, algorithms, and applications of Machine Learning through a hands-on approach which includes working on 28 projects and one capstone project.
3. Acquire thorough knowledge of the mathematical and heuristic aspects of Machine Learning.
4. Understand the concepts and operation of support vector machines, kernel SVM, naive bayes, decision tree classifier, random forest classifier, logistic regression, K-nearest neighbors, K-means clustering and more.
5. Be able to model a wide variety of robust Machine Learning algorithms including deep learning, clustering, and recommendation systems
- - - - - - -
Causal Inference in Data Science and Machine LearningBill Liu
Event: https://learn.xnextcon.com/event/eventdetails/W20042010
Video: https://www.youtube.com/channel/UCj09XsAWj-RF9kY4UvBJh_A
Modern machine learning techniques are able to learn highly complex associations from data, which has led to amazing progress in computer vision, NLP, and other predictive tasks. However, there are limitations to inference from purely probabilistic or associational information. Without understanding causal relationships, ML models are unable to provide actionable recommendations, perform poorly in new, but related environments, and suffer from a lack of interpretability.
In this talk, I provide an introduction to the field of causal inference, discuss its importance in addressing some of the current limitations in machine learning, and provide some real-world examples from my experience as a data scientist at Brex.
Intro and maths behind Bayes theorem. Bayes theorem as a classifier. NB algorithm and examples of bayes. Intro to knn algorithm, lazy learning, cosine similarity. Basics of recommendation and filtering methods.
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Chapter 6: Normal Probability Distribution
6.5: Assessing Normality
Susie Bayarri Plenary Lecture given in the ISBA (International Society of Bayesian Analysis) World Meeting in Montreal, Canada on June 30, 2022, by Pierre E, Jacob (https://sites.google.com/site/pierrejacob/)
Min-based qualitative possibilistic networks are one of the effective tools for a compact representation of
decision problems under uncertainty. The exact approaches for computing decision based on possibilistic
networks are limited by the size of the possibility distributions. Generally, these approaches are based on
possibilistic propagation algorithms. An important step in the computation of the decision is the
transformation of the DAG (Direct Acyclic Graph) into a secondary structure, known as the junction trees
(JT). This transformation is known to be costly and represents a difficult problem. We propose in this paper
a new approximate approach for the computation of decision under uncertainty within possibilistic
networks. The computing of the optimal optimistic decision no longer goes through the junction tree
construction step. Instead, it is performed by calculating the degree of normalization in the moral graph
resulting from the merging of the possibilistic network codifying knowledge of the agent and that codifying
its preferences.
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when treatment effect size and treatment effect variation is small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Bayesian Additive Regression Trees (BART) has been shown to be an effective framework for modeling nonlinear regression functions, with strong predictive performance in a variety of contexts. The BART prior over a regression function is defined by independent prior distributions on tree structure and leaf or end-node parameters. In observational data settings, Bayesian Causal Forests (BCF) has successfully adapted BART for estimating heterogeneous treatment effects, particularly in cases where standard methods yield biased estimates due to strong confounding.
We introduce BART with Targeted Smoothing, an extension which induces smoothness over a single covariate by replacing independent Gaussian leaf priors with smooth functions. We then introduce a new version of the Bayesian Causal Forest prior, which incorporates targeted smoothing for modeling heterogeneous treatment effects which vary smoothly over a target covariate. We demonstrate the utility of this approach by applying our model to a timely women's health and policy problem: comparing two dosing regimens for an early medical abortion protocol, where the outcome of interest is the probability of a successful early medical abortion procedure at varying gestational ages, conditional on patient covariates. We discuss the benefits of this approach in other women’s health and obstetrics modeling problems where gestational age is a typical covariate.
In this slide, variables types, probability theory behind the algorithms and its uses including distribution is explained. Also theorems like bayes theorem is also explained.
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
Concepts include decision tree with its examples. Measures used for splitting in decision tree like gini index, entropy, information gain, pros and cons, validation. Basics of random forests with its example and uses.
This Logistic Regression Presentation will help you understand how a Logistic Regression algorithm works in Machine Learning. In this tutorial video, you will learn what is Supervised Learning, what is Classification problem and some associated algorithms, what is Logistic Regression, how it works with simple examples, the maths behind Logistic Regression, how it is different from Linear Regression and Logistic Regression applications. At the end, you will also see an interesting demo in Python on how to predict the number present in an image using Logistic Regression.
Below topics are covered in this Machine Learning Algorithms Presentation:
1. What is supervised learning?
2. What is classification? what are some of its solutions?
3. What is logistic regression?
4. Comparing linear and logistic regression
5. Logistic regression applications
6. Use case - Predicting the number in an image
What is Machine Learning: Machine Learning is an application of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
- - - - - - - -
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars.This Machine Learning course prepares engineers, data scientists and other professionals with knowledge and hands-on skills required for certification and job competency in Machine Learning.
- - - - - - -
Why learn Machine Learning?
Machine Learning is taking over the world- and with that, there is a growing need among companies for professionals to know the ins and outs of Machine Learning
The Machine Learning market size is expected to grow from USD 1.03 Billion in 2016 to USD 8.81 Billion by 2022, at a Compound Annual Growth Rate (CAGR) of 44.1% during the forecast period.
- - - - - -
What skills will you learn from this Machine Learning course?
By the end of this Machine Learning course, you will be able to:
1. Master the concepts of supervised, unsupervised and reinforcement learning concepts and modeling.
2. Gain practical mastery over principles, algorithms, and applications of Machine Learning through a hands-on approach which includes working on 28 projects and one capstone project.
3. Acquire thorough knowledge of the mathematical and heuristic aspects of Machine Learning.
4. Understand the concepts and operation of support vector machines, kernel SVM, naive bayes, decision tree classifier, random forest classifier, logistic regression, K-nearest neighbors, K-means clustering and more.
5. Be able to model a wide variety of robust Machine Learning algorithms including deep learning, clustering, and recommendation systems
- - - - - - -
Causal Inference in Data Science and Machine LearningBill Liu
Event: https://learn.xnextcon.com/event/eventdetails/W20042010
Video: https://www.youtube.com/channel/UCj09XsAWj-RF9kY4UvBJh_A
Modern machine learning techniques are able to learn highly complex associations from data, which has led to amazing progress in computer vision, NLP, and other predictive tasks. However, there are limitations to inference from purely probabilistic or associational information. Without understanding causal relationships, ML models are unable to provide actionable recommendations, perform poorly in new, but related environments, and suffer from a lack of interpretability.
In this talk, I provide an introduction to the field of causal inference, discuss its importance in addressing some of the current limitations in machine learning, and provide some real-world examples from my experience as a data scientist at Brex.
Intro and maths behind Bayes theorem. Bayes theorem as a classifier. NB algorithm and examples of bayes. Intro to knn algorithm, lazy learning, cosine similarity. Basics of recommendation and filtering methods.
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Chapter 6: Normal Probability Distribution
6.5: Assessing Normality
Susie Bayarri Plenary Lecture given in the ISBA (International Society of Bayesian Analysis) World Meeting in Montreal, Canada on June 30, 2022, by Pierre E, Jacob (https://sites.google.com/site/pierrejacob/)
Min-based qualitative possibilistic networks are one of the effective tools for a compact representation of
decision problems under uncertainty. The exact approaches for computing decision based on possibilistic
networks are limited by the size of the possibility distributions. Generally, these approaches are based on
possibilistic propagation algorithms. An important step in the computation of the decision is the
transformation of the DAG (Direct Acyclic Graph) into a secondary structure, known as the junction trees
(JT). This transformation is known to be costly and represents a difficult problem. We propose in this paper
a new approximate approach for the computation of decision under uncertainty within possibilistic
networks. The computing of the optimal optimistic decision no longer goes through the junction tree
construction step. Instead, it is performed by calculating the degree of normalization in the moral graph
resulting from the merging of the possibilistic network codifying knowledge of the agent and that codifying
its preferences.
Knowledge of cause-effect relationships is central to the field of climate science, supporting mechanistic understanding, observational sampling strategies, experimental design, model development and model prediction. While the major causal connections in our planet's climate system are already known, there is still potential for new discoveries in some areas. The purpose of this talk is to make this community familiar with a variety of available tools to discover potential cause-effect relationships from observed or simulation data. Some of these tools are already in use in climate science, others are just emerging in recent years. None of them are miracle solutions, but many can provide important pieces of information to climate scientists. An important way to use such methods is to generate cause-effect hypotheses that climate experts can then study further. In this talk we will (1) introduce key concepts important for causal analysis; (2) discuss some methods based on the concepts of Granger causality and Pearl causality; (3) point out some strengths and limitations of these approaches; and (4) illustrate such methods using a few real-world examples from climate science.
I explore ways to combine complex network science with the Rubin model of conference inference. In broad strokes, I discuss the difference between exogenous shocks and endogenous process, and how granularity in time can be used to tease causality out of a complex system.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
I am using DL & Actor critic tools for solving Variational inference problem. The intriguing part from my hand is that the likelihood has a Beta distribution.Thus we handle both VI issues and a non common distributions
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy and direct recordings with electrodes. Each of these measurement modes have their advantages and disadvantages concerning the resolution of the data in space and time, the directness of measurement of the neural activity and which organisms they can be applied to. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
The world of health care is full of policy interventions: a state expands eligibility rules for its Medicaid program, a medical society changes its recommendations for screening frequency, a hospital implements a new care coordination program. After a policy change, we often want to know, “Did it work?” This is a causal question; we want to know whether the policy CAUSED outcomes to change. One popular way of estimating causal effects of policy interventions is a difference-in-differences study. In this controlled pre-post design, we measure the change in outcomes of people who are exposed to the new policy, comparing average outcomes before and after the policy is implemented. We contrast that change to the change over the same time period in people who were not exposed to the new policy. The differential change in the treated group’s outcomes, compared to the change in the comparison group’s outcomes, may be interpreted as the causal effect of the policy. To do so, we must assume that the comparison group’s outcome change is a good proxy for the treated group’s (counterfactual) outcome change in the absence of the policy. This conceptual simplicity and wide applicability in policy settings makes difference-in-differences an appealing study design. However, the apparent simplicity belies a thicket of conceptual, causal, and statistical complexity. In this talk, I will introduce the fundamentals of difference-in-differences studies and discuss recent innovations including key assumptions and ways to assess their plausibility, estimation, inference, and robustness checks.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented information about how causal inference is being used to determine the cost/benefit of the two most common surgical surgical treatments for women - hysterectomy and myomectomy.
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
We discuss a general roadmap for generating causal inference based on observational studies used to general real world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for the construction of asymptotically efficient plug-in estimators of a target estimand for realistic (i.e, infinite dimensional) statistical models. TMLE is a two stage procedure that first involves using ensemble machine learning termed super-learning to estimate the relevant stochastic relations between the treatment, censoring, covariates and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric model based estimators) to build a single most powerful ensemble machine learning algorithm. We present Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second step, the TMLE involves maximizing a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data. This second step bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e, confidence intervals, p-values etc). We also review recent advances in collaborative TMLE in which the fit of the treatment and censoring mechanism is tailored w.r.t. performance of TMLE. We also discuss asymptotically valid bootstrap based inference. Simulations and data analyses are provided as demonstrations.
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This seminar discussed ways in which to produce professional academic writing, from academic papers to research proposals or technical writing in general.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most noticeable technologies in recent years. The first one is the foundation of artificial intelligence and big data, and the second one has significantly disrupted the financial industry. Both technologies are data-driven, and thus there are rapidly growing interests in integrating them for more secure and efficient data sharing and analysis. In this paper, we review the research on combining blockchain and machine learning technologies and demonstrate that they can collaborate efficiently and effectively. In the end, we point out some future directions and expect more researches on deeper integration of the two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
This talk builds on recent empirical work addressing the extent to which the transaction graph serves as an early-warning indicator for large financial losses. By identifying certain sub-graphs ('chainlets') with causal effect on price movements, we demonstrate the impact of extreme transaction graph activity on the intraday volatility of the Bitcoin prices series. In particular, we infer the loss distributions conditional on extreme chainlet activity. Armed with this empirical representation, we propose a modeling approach to explore conditions under which the market is stabilized by transaction graph aware agents.
Bitcoin is a crypto-currency that enables peer-to-peer payments in a way that is partially anonymous and which requires no central authorities. These features are attractive to many criminals, as it is difficult to track and shut down Bitcoin payments — difficult, but not impossible. In this talk, I present novel approaches to trace the flow of bitcoins in two types of criminal operations: (i) human traffickers that advertised sex services on Backpage [KDD ’17]; and (ii) ransomware, which encrypts files and demands bitcoins in exchange for decryption keys [IEEE S&P ’18].
Cryptocurrency (crypto) is a digital asset whose value is the function of the algorithm it is based on. This algorithm, or protocol, is the core of the crypto. It defines, among many other things, whether new crypto units can be created, the circumstances under which new units can be created, and the ownership of these new units. The protocol is coded and distributed to users as software. Being software, crypto is prone to bugs, requires upgrades, and subject to hacking.
We use this variation in holders’ type and in crypto type to study two questions: (1) Do announcements that provide important information about the future of the crypto (e.g., protocol upgrades) increase the information asymmetry across sophisticated investors and pure users? and (2) Do announcements (e.g., on Twitter) with wider spread decrease the value lost by pure users? That is, does a medium like Twitter allows for wide reach that is effective enough to also inform pure users and thus decrease the information asymmetry?
Cryptocurrency exchanges offer challenges to even the most seasoned data scientists, security engineers and financial analysts. Everything from customer due diligence to predictive analytics to cybersecurity to illicit activity tracking requires both techniques and data that are unique to cryptocurrency and often hard to obtain. The ease by which dishonest exchanges can inflate trading volumes and the difficulty of obtaining approval for custody solutions presents hurdles when seeking formal regulatory approval. Advancements in Deep Fake videos and highly targeted phishing campaigns have tipped the scales in favor of attackers, keeping security teams constantly trying to stay one step ahead of adversaries and leaving enforcement bodies wondering about what precisely to collect in terms of forensic evidence. My talk will describe how a regulated cryptocurrency spot and futures exchange went from an idea to a fully regulated entity, the challenges we encountered along the way with regard to the design of custody and unique data analytics, and the open research challenges remaining.
The goal of this work is to advance our understanding of what new can be learned about crypto-tokens by analyzing the topological structure of the Ethereum transaction network. By introducing a novel combination of tools from topological data analysis and functional data depth into blockchain data analytics, we show that Ethereum network can provide critical insights on price strikes of crypto-token prices that are otherwise largely inaccessible with conventional data sources and traditional analytic methods.
In this talk I will show that standard graph features such as degree distribution of the transaction graph may not be sufficient to capture network dynamics and its potential impact on fluctuations of Bitcoin price. In contrast, the new graph associated topological features computed using the tools of persistent homology, are found to exhibit a high utility for predicting Bitcoin price dynamics. Using the proposed persistent homology-based techniques, I will present the ChainNet platform, a new elegant, easily extendable and computationally light approach for graph representation learning on Blockchain.
More from The Statistical and Applied Mathematical Sciences Institute (20)
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink on data can be made machine and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple, but effective semantic and latent representations, and to make these available into standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and those of others in the field, creates a baseline for building trustworthy and easy to deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Cancer cell metabolism: special Reference to Lactate PathwayAADYARAJPANDEY1
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy we need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cell utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules - a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Kreb's cycle. The Kreb's cycle allows cells to “burn” the pyruvates made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Kreb's - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELL:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
introduction to WARBERG PHENOMENA:
WARBURG EFFECT Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose than do normal cells from outside.
Otto Heinrich Warburg (; 8 October 1883 – 1 August 1970) In 1931 was awarded the Nobel Prize in Physiology for his "discovery of the nature and mode of action of the respiratory enzyme.
WARNBURG EFFECT : cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
A brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for the ultra-fast high-resolution imaging of cellular processes over time and space and were studied in its natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provide insights into the progression of disease, response to treatments or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is highly conserved process of posttranscriptional gene silencing by which double stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) is reported in a wide range of eukaryotes ranging from worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non- coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the
development of the worm C. elegans.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi ( non coding RNA)
MiRNA
Length (23-25 nt)
Trans acting
Binds with target MRNA in mismatch
Translation inhibition
Si RNA
Length 21 nt.
Cis acting
Bind with target Mrna in perfect complementary sequence
Piwi-RNA
Length ; 25 to 36 nt.
Expressed in Germ Cells
Regulates trnasposomes activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is large(>500kD) RNA multi- protein Binding complex which triggers MRNA degradation in response to MRNA
Unwinding of double stranded Si RNA by ATP independent Helicase
Active component of RISC is Ago proteins( ENDONUCLEASE) which cleave target MRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN :
1.PAZ(PIWI/Argonaute/ Zwille)- Recognition of target MRNA
2.PIWI (p-element induced wimpy Testis)- breaks Phosphodiester bond of mRNA.)RNAse H activity.
MiRNA:
The Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression .
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a groundbased telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference, and Sensitivity - Alex D'Amour, December 9, 2019
1. Latent Variable Models,
Causal Inference,
and Sensitivity Analysis
Alex D’Amour
Google Brain
SAMAI Causal Inference Opening Workshop
Dec 8, 2019
2. Goal
Summarize a recent conversation at the intersection of ML
and Causal Inference communities.
Advocate for sensitivity analysis as a constructive way
forward.
17. Apply standard causal adjustment
Lots of ML tools for fitting the factor
model.
Factor Model as “Soft” Observation
18. Methods of this type appear across science, and are
standard procedure (e.g., Price et al 2006 in GWAS).
This paper brings them under the causal
microscope.
19. Methods of this type appear across science, and are
standard procedure (e.g., Price et al 2006 in GWAS).
This paper finally brings them under the causal
microscope.
20. Methods of this type appear across science, and are
standard procedure (e.g., Price et al 2006 in GWAS).
This paper makes the causal argument explicit.
21. The factor model setting buys useful information,
but not sufficient for identification.
23. Knowing the Audience
The ML community is optimistic.
Presumption: Identified until proven otherwise (this is changing).
24. Knowing the Audience
The ML community is optimistic.
Presumption: Identified until proven otherwise (this is changing).
Proofs of identification.
25. Knowing the Audience
The ML community is optimistic.
Presumption: Identified until proven otherwise (this is changing).
Proofs of identification.
Counter-examples with non-identification.
27. Causal Inference Optimism
In every badly identified model,
there lies a great* sensitivity analysis.
*Here, “great” means a sensitivity analysis that maps out a true
region of partial identification. See, e.g.,
https://arxiv.org/abs/1809.00399 and cited work.
28. Sensitivity Analysis
Constructive proof of non-identification.
Identify indeterminacy in the model for U, A, Y.
Ignorance region: the set of causal parameters compatible
with observed data distribution.
37. How much do we know about the confounding?
1. P(U | A) ambiguous.MoreinformationaboutU
38. How much do we know about the confounding?
1. P(U | A) ambiguous.
2. P(U | A) identified.
MoreinformationaboutU
39. How much do we know about the confounding?
1. P(U | A) ambiguous.
2. P(U | A) identified.
3. P(U | A) degenerate.
MoreinformationaboutU
40. Rosenbaum and Rubin 1983
Single cause setting.
All binary.
Parameterize violations of
ignorability by 𝛿 and 𝛼.
Examine range of conclusions
as parameters vary.
Rosenbaum and Rubin 1983, Assessing Sensitivity to an Unobserved Binary
Covariate in an Observational Study with Binary Outcome. JRSS-B.
𝛿𝛼
41. Binary with Multiple Causes
Note: m ≥ 3 and pA
(U) non-trivial ⇒ P(U, A) unique.
42. Rosenbaum and Rubin 1983
Rosenbaum and Rubin 1983, Assessing Sensitivity to an Unobserved Binary
Covariate in an Observational Study with Binary Outcome. JRSS-B.
𝛿𝛼
43. Rosenbaum and Rubin 1983
Rosenbaum and Rubin 1983, Assessing Sensitivity to an Unobserved Binary
Covariate in an Observational Study with Binary Outcome. JRSS-B.
𝛿𝛼
80. Where do the assumptions live?
Parameterize for
sensitivity analysis
81. Where do the assumptions live?
Interpretation
hinges on this
82. Going Forward
In every badly identified model, there lies a great sensitivity analysis.
Many threats to identification not covered here.
E.g., partitioning variation between confounders and mediators;
measurement error models (for some of these concerns, see Ogburn et al
comment).
86. Two Approaches to “Downsizing”
Effects on the treated
Subset of causes
87. “Consistent” Confounder Recovery (no overlap)
Require
Then missing outcome distributions are identifiable for values of A that map to
same value of Z. Specifically, for the above estimands
88. “Consistent” Confounder Recovery
Require
Then missing outcome distributions are identifiable for values of A that map to
same value of Z. Specifically, for the above estimands
Where’d the factor
model go?
89. The Deconfounder as Detector
Suppose we obtain a function
That renders the causes A mutually
conditionally independent. Then,
Note: Requires overlap in hat z(A, X) to be useful.
91. Strategy 1: Change the Estimand
Give up on estimating effect of all causes simultaneously.
Set some causes aside. Call these W.
Simpler estimand,
harder to interpret.
92. Strategy 1: Change the Estimand
If P(U | W) degenerate, identified! (Theorem 7).
Technically, no unobserved confounding.
Factor model provides useful
structure.
(Also approach in Price et al 2006)
93. Add a negative control exposure Z.
Strategy 2: Augment the Data
Miao W, Geng Z, Tchetgen Tchetgen EJ. Identifying causal effects with proxy
variables of an unmeasured confounder. Biometrika. 2018.
94. Strategy 2: Augment the Data
Miao W, Geng Z, Tchetgen Tchetgen EJ. Identifying causal effects with proxy
variables of an unmeasured confounder. Biometrika. 2018.
Under completeness conditions, conditional independence
between Z and Y identifies the copula.
Completeness conditions
put strong constraints on
how Z, W represent U.
95. An Identification Result
Kong, Yang, and Wang 2019. Multi-causal causal inference
with unmeasured confounding and binary outcome.
https://arxiv.org/abs/1907.13323.
96. An Identification Result
Kong, Yang, and Wang 2019. Multi-causal causal inference
with unmeasured confounding and binary outcome.
https://arxiv.org/abs/1907.13323.
✔Identified!
97. An Identification Result?
Kong, Yang, and Wang 2019. Multi-causal causal inference
with unmeasured confounding and binary outcome.
https://arxiv.org/abs/1907.13323.
98. An Identification Result?
Kong, Yang, and Wang 2019. Multi-causal causal inference
with unmeasured confounding and binary outcome.
https://arxiv.org/abs/1907.13323.
✘Not identified!