Estimating Causal Effects from Observations
Advanced Data Analysis from an Elementary Point of View
Credits Team
The slides below are derived from Chapter 27 of the book “Advanced Data Analysis from an Elementary Point of View” by Cosma Shalizi of Carnegie Mellon University, written to support the “Advanced Data Analysis” course at CMU.
Antigoni-Maria Founta,
UIM: 647
Ioannis Athanasiadis,
UIM: 607
Overview
Identification & Estimation
➔ Back-door Criteria & Estimators
◆ Estimating Average Causal Effects
◆ Avoiding Estimating Marginal Distributions
◆ Matching
◆ Propensity Scores
◆ Propensity Score Matching
➔ Instrumental Variables Estimates
Conclusion
➔ Uncertainty & Inference
➔ Comparison of Methods
Identification of Causal Effects
● Causal Conditioning: Pr(Y|do(X = x))
//Counter-factual, not always identifiable even if X & Y are observable
● Probabilistic Conditioning: Pr(Y|X = x)
//Factual, always identifiable if X & Y are observable
Our goal is to identify when quantities like the former are
functions of the distribution of observable variables!
And once identified, the next question is how we can actually estimate it...
Identification/Estimation of Causal Effects
Each of the following techniques provides both identification of causal effects and formulas to estimate them:
1. Back-door Criterion
2. Front-door Criterion
3. Instrumental Variables
* Since estimating with the front-door criterion amounts to doing two rounds of
back-door adjustment, we will continue this lecture working only with the back-door
criterion.
Back-door Criteria
Estimations
Back-door criterion
CRITERIA:
I. S blocks every back-door path between X and Y
II. No node in S is a descendant of X
S is any subset of variables that meets the back-door criterion.
E.g. (for the DAG on the slide):
● S={S1,S2}, S={S3} and S={S1,S2,S3} meet the criteria
● S={S1}, S={S2} and S={S3,B} fail the criteria
Back-door Criteria Formula
When S satisfies the back-door criterion, the formula to estimate the Causal Conditioning becomes:
Pr(Y | do(X = x)) = Σₛ Pr(Y | X = x, S = s) Pr(S = s)
...so we have transformed the unidentifiable Causal Conditioning into a distribution of observables.
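A minimal numerical sketch of this adjustment on synthetic data, where every variable is binary and S is the only confounder (the DAG and all numbers here are illustrative, not taken from the slides):

```python
import random

random.seed(0)

# Toy DAG: S -> X, S -> Y, X -> Y; S satisfies the back-door criterion.
data = []
for _ in range(100_000):
    s = random.random() < 0.5
    x = random.random() < (0.8 if s else 0.2)        # S confounds X
    y = random.random() < 0.2 + 0.5 * x + 0.2 * s    # true effect of X: +0.5
    data.append((s, x, y))

def pr(pred, rows):
    """Empirical probability that pred holds over the given rows."""
    rows = list(rows)
    return sum(pred(r) for r in rows) / len(rows)

def backdoor(x_val):
    """Pr(Y=1 | do(X=x)) = sum_s Pr(Y=1 | X=x, S=s) * Pr(S=s)."""
    total = 0.0
    for s_val in (False, True):
        p_s = pr(lambda r: r[0] == s_val, data)
        stratum = [r for r in data if r[0] == s_val and r[1] == x_val]
        total += p_s * pr(lambda r: r[2], stratum)
    return total

ate = backdoor(True) - backdoor(False)   # close to the true +0.5
```

On the same data, the naive contrast Pr(Y=1 | X=1) − Pr(Y=1 | X=0) comes out near 0.62, illustrating the confounding bias that the adjustment removes.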
Estimations in the Back-Door Criteria
There are some special cases and tricks which are worth knowing about,
that will be analysed below.
1. Estimating Average Causal Effects
2. Avoiding Estimating Marginal Distributions
3. Matching
4. Propensity Scores
5. Propensity Score Matching
1. Estimating average causal effects I
The average causal effect summarizes much of what there is to know about the effect X has on Y. We don’t learn everything from it, but it is still useful!
● Average Effect (or sometimes just the effect of do(X=x)): E[Y | do(X = x)] *
* When it makes sense for Y to have an expectation value
1. Estimating average causal effects II
● If we apply the Back-Door Criterion Formula with control variables S:
E[Y | do(X = x)] = Σₛ Pr(S = s) µ(x, s), where µ(x, s) = E[Y | X = x, S = s]
(the inner conditional expectation is just a regression function)
We can estimate µ(x, s) by regression, but we still need to know Pr(S=s)...
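A sketch of this two-step recipe on synthetic data with a binary confounder, where a linear regression for µ happens to be well specified (all coefficients and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Binary confounder S opens a back-door path S -> X, S -> Y.
s = (rng.random(n) < 0.5).astype(float)
x = (rng.random(n) < 0.2 + 0.6 * s).astype(float)
y = 1.0 + 0.5 * x + 0.8 * s + rng.normal(0.0, 1.0, n)   # true effect: 0.5

# Step 1: estimate the regression function mu(x, s) = E[Y | X=x, S=s].
design = np.column_stack([np.ones(n), x, s])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

def mu(x_val, s_val):
    return beta @ np.array([1.0, x_val, s_val])

# Step 2: average mu(x, s) over the marginal distribution Pr(S = s).
def effect(x_val):
    return sum(mu(x_val, s_val) * np.mean(s == s_val)
               for s_val in (0.0, 1.0))

ace = effect(1.0) - effect(0.0)   # recovers roughly the true 0.5
```

Step 2 is exactly the slide's formula Σₛ Pr(S=s) µ(x, s), with Pr(S=s) replaced by sample frequencies.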
2. Avoiding Estimating Marginal Distributions
● The demands of computing, enumerating and summarizing the various distributions in the Back-Door Criteria Formula are very high, so we need to reduce these burdens. One shortcut is to smartly enumerate the possible values of S.
● How? Law of Large Numbers!
With a large sample:
E[Y | do(X = x)] ≈ (1/n) Σᵢ µ(x, sᵢ)
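This shortcut matters most when S is continuous, where Pr(S = s) would require a density estimate; averaging µ(x, sᵢ) over the observed sample sidesteps that entirely. A hedged sketch with illustrative synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Continuous confounder: estimating Pr(S = s) would need a density
# estimate, but averaging mu(x, s_i) over the sample does not.
s = rng.normal(0.0, 1.0, n)
x = (rng.random(n) < 1 / (1 + np.exp(-s))).astype(float)
y = 0.5 * x + 1.0 * s + rng.normal(0.0, 1.0, n)         # true effect: 0.5

design = np.column_stack([np.ones(n), x, s])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

def mu(x_val, s_arr):
    return beta[0] + beta[1] * x_val + beta[2] * s_arr

# E[Y | do(X=x)] ~ (1/n) sum_i mu(x, s_i): the sample stands in for Pr(S).
def effect(x_val):
    return np.mean(mu(x_val, s))

ace = effect(1.0) - effect(0.0)
```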
3. Matching (Binary Levels)
Combination of the above methods:
● From the Estimation of Average Causal Effects we got:
E[Y | do(X = x)] = Σₛ Pr(S = s) µ(x, s)
● From the Law of Large Numbers we got:
E[Y | do(X = x)] ≈ (1/n) Σᵢ µ(x, sᵢ)
When the treatment variable is binary, we are interested in the expectations under X = 0 and X = 1.
So we want the difference: E[Y | do(X = 1)] − E[Y | do(X = 0)]
This is called the Average Treatment Effect, or ATE!
3. Matching (Binary Levels)
ATE = E[Y | do(X = 1)] − E[Y | do(X = 0)] → ATE = Σₛ Pr(S = s) [µ(1, s) − µ(0, s)]
OR, using the sample shortcut: ATE ≈ (1/n) Σᵢ [µ(1, sᵢ) − µ(0, sᵢ)]
3. Matching (Binary Levels)
● But we can never observe µ(x, s) directly - we need to consider the errors!
● What we actually see is Yᵢ = µ(xᵢ, sᵢ) + εᵢ!
● So if we compute the difference Yᵢ − Yⱼ between two units, we actually have:
Yᵢ − Yⱼ = µ(xᵢ, sᵢ) − µ(xⱼ, sⱼ) + (εᵢ − εⱼ)
3. Matching (Binary Levels)
If we assume there are two units i and i* with the same value of s, where Xᵢ = 1 and Xᵢ* = 0, then:
Yᵢ − Yᵢ* = µ(1, s) − µ(0, s) + (εᵢ − εᵢ*)
So, if we can find a match i* for every treated unit i, averaging these matched differences estimates the ATE.
When the model for µ is mis-specified, matching can work vastly better than estimating µ through a linear model!
3. Matching (Binary Levels) - Conclusion
Matching is really just Nearest Neighbor Regression!
● Many technicalities arise. What if we must match one unit against multiple units, or can’t find an exact match?
If there is no exact match → match each treated unit against the control-group unit with the closest values of the covariates.
● Matching does not solve the identification problem. We still need S to satisfy the back-door criterion!
● As with any nearest-neighbor method, many issues arise. A small, fixed number of matches K keeps bias low but variance high, and prioritizing low bias is not always a good idea.
The Bias-Variance tradeoff is a tradeoff!
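A minimal sketch of nearest-neighbor matching on a one-dimensional covariate (synthetic data; for simplicity it matches only the treated units, so it estimates the effect on the treated, which coincides with the ATE here because the effect is homogeneous):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4_000

# One-dimensional covariate s; treatment probability depends on s.
s = rng.normal(0.0, 1.0, n)
x = (rng.random(n) < 1 / (1 + np.exp(-s))).astype(bool)
y = 0.5 * x + s + rng.normal(0.0, 0.3, n)       # true effect: 0.5

treated, control = s[x], s[~x]
y_t, y_c = y[x], y[~x]

# Match each treated unit to the control unit with the closest
# covariate value (1-nearest-neighbour matching, with replacement).
idx = np.abs(treated[:, None] - control[None, :]).argmin(axis=1)
att = np.mean(y_t - y_c[idx])    # average effect on the treated
```

No model for µ is fit anywhere: the matched control plays the role of µ(0, s) for each treated unit, which is exactly why mis-specification of a regression model cannot bias this estimate.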
4. Propensity Scores
● The curse of dimensionality over data or conditional distributions is only manageable if the control variables have few dimensions.
● If R and S are both sets of control variables that satisfy the back-door criterion, all else being equal we should choose the one with fewer variables (say R).
● This holds especially when R = f(S): if S satisfies the back-door criterion, then so does R. Since R is a function of S, both the computational and the statistical problems which come from using R are no worse than those of using S, and possibly much better, if R has much lower dimension.
4. Propensity Scores
● All that is required is that some combination of the variables in S carries the same information about X as the whole of S does - a summary score!
● We (hope we) can reduce a p-dimensional set of control variables to a one-dimensional summary.
● In general this is difficult, since in a non-parametric setting it depends on the unknown functional form. However, there is a special case where a one-dimensional summary always exists: when X is binary! In that case:
Propensity score: f(S) = Pr(X=1|S=s), which is our summary R
● However, there is still no analytical formula for f(S), so it needs to be modeled and estimated. The most common model used is Logistic Regression, though with a high-dimensional S that would also suffer from the curse of dimensionality.
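A sketch of estimating the propensity score by logistic regression, with the model fit by plain gradient ascent so the example stays self-contained (any standard logistic-regression routine would do; the five-variable setup and all coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20_000, 5

# Five control variables; treatment depends on them through a logit.
S = rng.normal(0.0, 1.0, (n, p))
true_w = np.array([0.8, -0.5, 0.3, 0.0, 0.2])
x = (rng.random(n) < 1 / (1 + np.exp(-(S @ true_w)))).astype(float)

# Fit the propensity model f(s) = Pr(X=1 | S=s) by logistic regression,
# maximising the average log-likelihood with gradient ascent.
w = np.zeros(p)
for _ in range(500):
    pred = 1 / (1 + np.exp(-(S @ w)))
    w += 0.1 * S.T @ (x - pred) / n

scores = 1 / (1 + np.exp(-(S @ w)))   # the 1-D summary R = f(S)
```

The p-dimensional S has been collapsed into the single column `scores`, which is all that later matching or stratification needs.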
5. Propensity Score Matching
● Even if X is a binary variable, if S is high-dimensional then matching is a really tough task → the more levels the variables in S take, the harder it is to find matches.
● Solution: Matching on R = f(S) - the propensity score!
● The gain here is in computational tractability and (perhaps) statistical
efficiency, not in fundamental identification. Que sera, sera…
Incredibly popular method ever since, huge number of implementations.
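A sketch of the idea: match treated and control units on the one-dimensional score instead of the full five-dimensional S. For clarity this matches on the true propensity score; in practice f(S) would itself be estimated, e.g. by logistic regression (all data and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 4_000, 5

# Five control variables confound X and Y through the same linear index.
S = rng.normal(0.0, 1.0, (n, p))
w_true = np.array([0.8, -0.5, 0.3, 0.0, 0.2])
prop = 1 / (1 + np.exp(-(S @ w_true)))      # propensity score f(S)
x = rng.random(n) < prop
y = 0.5 * x + S @ w_true + rng.normal(0.0, 0.5, n)   # true effect: 0.5

# Nearest-neighbour matching on the SCALAR score, not on 5-D covariates.
idx = np.abs(prop[x][:, None] - prop[~x][None, :]).argmin(axis=1)
att = np.mean(y[x] - y[~x][idx])            # average effect on the treated
```

Matching on one number rather than five covariates is the whole computational gain; the identification assumption (S blocks the back-doors) is unchanged.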
Overview
Identification & Estimation
➔ Back-door Criteria & Estimators
◆ Estimating Average Causal Effects
◆ Avoiding Estimating Marginal Distributions
◆ Matching
◆ Propensity Scores
◆ Propensity Score Matching
➔ Instrumental Variables Estimates
Conclusion
➔ Uncertainty & Inference
➔ Comparison of Methods
Instrumental Variables
Instrumental Variables
● I is an instrument for identifying the effect of X on Y if:
○ I is a cause of X
○ I is associated with Y only through X
○ Given some controls S, I is a valid instrument when every path from I to Y left open by S has an
arrow into X
● A one-unit change in I → an α-unit change in X → an αβ-unit change in Y
● So, β is how we estimate the causal coefficient of X on Y!
● We need to estimate β in a way that satisfies all the above criteria. However, things
get very complicated when we have many instruments and/or are interested in the
causal effects of multiple variables.
Instrumental Variables
● In general, we need two steps * //2-stage Regression / 2-stage Least Squares
1. Regress X on I and S. Call the fitted values xˆ.
//We see how much changes in the instruments affect X.
2. Regress Y on xˆ and S, but not on I. The coefficient of Y on xˆ is a consistent estimate
of β.
//We see how much these I-caused changes in X change Y
● In the simplest case, when everything is linear, the above procedure reduces to
the Wald estimator:
β̂ = Cov(I, Y) / Cov(I, X)
* This works, provided that the linearity assumption holds.
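A sketch of both routes on synthetic linear data with an unobserved confounder u (all coefficients are illustrative); the two-stage slope and the Wald ratio agree, as they must in this all-linear case:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000

# Unobserved confounder u; instrument i affects y only through x.
u = rng.normal(0.0, 1.0, n)
i = rng.normal(0.0, 1.0, n)
x = 1.0 * i + 1.0 * u + rng.normal(0.0, 1.0, n)   # alpha = 1
y = 0.5 * x + 1.0 * u + rng.normal(0.0, 1.0, n)   # beta = 0.5 (target)

# Stage 1: regress X on I; keep the fitted values x_hat.
a = np.polyfit(i, x, 1)
x_hat = np.polyval(a, i)

# Stage 2: regress Y on x_hat; the slope consistently estimates beta.
beta_2sls = np.polyfit(x_hat, y, 1)[0]

# Wald estimator: Cov(I, Y) / Cov(I, X) gives the same answer here.
beta_wald = np.cov(i, y)[0, 1] / np.cov(i, x)[0, 1]
```

Ordinary regression of Y on X on this data gives a slope near 0.83 rather than 0.5, because of the confounder u; the instrument removes that bias.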
Instrumental Variables
● There are circumstances where it is possible to use instrumental variables in
nonlinear and even nonparametric models, BUT the technique becomes far
more complicated.
● It requires solving integral equations of the form E[Y | I = i] = ∫ g(x) dPr(X = x | I = i) for the unknown function g:
It is not impossible, BUT it’s painful :-(
The techniques needed are really complicated.
Overview
Identification & Estimation
➔ Back-door Criteria & Estimators
◆ Estimating Average Causal Effects
◆ Avoiding Estimating Marginal Distributions
◆ Matching
◆ Propensity Scores
◆ Propensity Score Matching
➔ Instrumental Variables Estimates
Conclusion
➔ Uncertainty & Inference
➔ Comparison of Methods
Uncertainty & Inference
We can reduce the problem of causal inference → ordinary statistical inference.
So, in general, with the use of analytical formulas we can assess our uncertainty
about any of our estimates of causal effects the same way we would assess any
other statistical inference (e.g. standard errors).
However, in two-stage least squares, taking standard errors etc. for β from the
usual formulas for the second regression neglects the fact that this estimate of β
comes from regressing Y on x̂, which is itself an estimate and therefore uncertain.
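One standard way to respect that first-stage uncertainty is to bootstrap whole units and redo both stages inside every resample; a hedged sketch on synthetic data, using the Wald form of the estimator for brevity:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000

u = rng.normal(size=n)                    # unobserved confounder
i = rng.normal(size=n)                    # instrument
x = i + u + rng.normal(size=n)
y = 0.5 * x + u + rng.normal(size=n)

def wald(i_s, x_s, y_s):
    return np.cov(i_s, y_s)[0, 1] / np.cov(i_s, x_s)[0, 1]

# Resample whole units and rerun BOTH stages each time, so the standard
# error reflects the uncertainty in x_hat as well as in stage two.
boots = []
for _ in range(500):
    idx = rng.integers(0, n, n)
    boots.append(wald(i[idx], x[idx], y[idx]))
se = np.std(boots)
```

The spread of the bootstrap replicates is a standard error for β that does not pretend the first-stage fitted values were known exactly.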
Comparison of Methods
Both methods are (+) clever, but (-) rest on assumptions about the underlying DAG.
Crucial Point
● Matching: the covariates block all back-door pathways between X and Y.
● Instrumental Variables: the instrument is an indirect cause of Y, but only through X (no other unblocked paths connecting I & Y).
According to Practitioners, the usual objection
● Matching: the covariates used are not enough to block all the back-door paths.
● Instrumental Variables: analysts are too quick to discount the possibility that the instruments are connected to Y through unmeasured pathways.
Thank You!

More Related Content

What's hot

Confidence Intervals: Basic concepts and overview
Confidence Intervals: Basic concepts and overviewConfidence Intervals: Basic concepts and overview
Confidence Intervals: Basic concepts and overview
Rizwan S A
 
Bayesian statistics
Bayesian statisticsBayesian statistics
Bayesian statistics
Sagar Kamble
 
Bayesian decision making in clinical research
Bayesian decision making in clinical researchBayesian decision making in clinical research
Bayesian decision making in clinical research
Bhaswat Chakraborty
 
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
nszakir
 
Spss software
Spss softwareSpss software
Spss software
ManonmaniA3
 
Sampling and statistical inference
Sampling and statistical inferenceSampling and statistical inference
Sampling and statistical inference
Bhavik A Shah
 
Z test, f-test,etc
Z test, f-test,etcZ test, f-test,etc
Z test, f-test,etc
Juan Apolinario Reyes
 
Regression analysis in R
Regression analysis in RRegression analysis in R
Regression analysis in R
Alichy Sowmya
 
Autocorrelation (1)
Autocorrelation (1)Autocorrelation (1)
Autocorrelation (1)
Manokamna Kochar
 
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random foresthands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
Jaey Jeong
 
Chapter13
Chapter13Chapter13
Chapter13
rwmiller
 
Machine Learning and Causal Inference
Machine Learning and Causal InferenceMachine Learning and Causal Inference
Machine Learning and Causal Inference
NBER
 
Statistical significance
Statistical significanceStatistical significance
Statistical significance
Mai Ngoc Duc
 
Parametric & non parametric
Parametric & non parametricParametric & non parametric
Parametric & non parametric
ANCYBS
 
Poisson Distribution, Poisson Process & Geometric Distribution
Poisson Distribution, Poisson Process & Geometric DistributionPoisson Distribution, Poisson Process & Geometric Distribution
Poisson Distribution, Poisson Process & Geometric Distribution
DataminingTools Inc
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
Jeremy Lane
 
Causal inference in practice
Causal inference in practiceCausal inference in practice
Causal inference in practice
Amit Sharma
 
Analysis of Time Series
Analysis of Time SeriesAnalysis of Time Series
Analysis of Time Series
Manu Antony
 
Basics of Hypothesis Testing
Basics of Hypothesis TestingBasics of Hypothesis Testing
Basics of Hypothesis Testing
Long Beach City College
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
Shameer P Hamsa
 

What's hot (20)

Confidence Intervals: Basic concepts and overview
Confidence Intervals: Basic concepts and overviewConfidence Intervals: Basic concepts and overview
Confidence Intervals: Basic concepts and overview
 
Bayesian statistics
Bayesian statisticsBayesian statistics
Bayesian statistics
 
Bayesian decision making in clinical research
Bayesian decision making in clinical researchBayesian decision making in clinical research
Bayesian decision making in clinical research
 
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...Chapter 6 part2-Introduction to Inference-Tests of Significance,  Stating Hyp...
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp...
 
Spss software
Spss softwareSpss software
Spss software
 
Sampling and statistical inference
Sampling and statistical inferenceSampling and statistical inference
Sampling and statistical inference
 
Z test, f-test,etc
Z test, f-test,etcZ test, f-test,etc
Z test, f-test,etc
 
Regression analysis in R
Regression analysis in RRegression analysis in R
Regression analysis in R
 
Autocorrelation (1)
Autocorrelation (1)Autocorrelation (1)
Autocorrelation (1)
 
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random foresthands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
 
Chapter13
Chapter13Chapter13
Chapter13
 
Machine Learning and Causal Inference
Machine Learning and Causal InferenceMachine Learning and Causal Inference
Machine Learning and Causal Inference
 
Statistical significance
Statistical significanceStatistical significance
Statistical significance
 
Parametric & non parametric
Parametric & non parametricParametric & non parametric
Parametric & non parametric
 
Poisson Distribution, Poisson Process & Geometric Distribution
Poisson Distribution, Poisson Process & Geometric DistributionPoisson Distribution, Poisson Process & Geometric Distribution
Poisson Distribution, Poisson Process & Geometric Distribution
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
 
Causal inference in practice
Causal inference in practiceCausal inference in practice
Causal inference in practice
 
Analysis of Time Series
Analysis of Time SeriesAnalysis of Time Series
Analysis of Time Series
 
Basics of Hypothesis Testing
Basics of Hypothesis TestingBasics of Hypothesis Testing
Basics of Hypothesis Testing
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 

Viewers also liked

Τweetfix: Data Analytics on Match Fixing
Τweetfix: Data Analytics on Match FixingΤweetfix: Data Analytics on Match Fixing
Τweetfix: Data Analytics on Match Fixing
Antigoni-Maria Founta
 
Exploring Language Communities on Github
Exploring Language Communities on GithubExploring Language Communities on Github
Exploring Language Communities on Github
Antigoni-Maria Founta
 
Experimental Causal Inference
Experimental Causal InferenceExperimental Causal Inference
Experimental Causal Inference
Antigoni-Maria Founta
 
Social Media Fraud Metrics
Social Media Fraud MetricsSocial Media Fraud Metrics
Social Media Fraud Metrics
Antigoni-Maria Founta
 
Transitivity of Trust
Transitivity of TrustTransitivity of Trust
Transitivity of Trust
Antigoni-Maria Founta
 
Opinion mining
Opinion miningOpinion mining
Opinion mining
Antigoni-Maria Founta
 
Periscope: A Content-based Image Retrieval Engine
Periscope: A Content-based Image Retrieval EnginePeriscope: A Content-based Image Retrieval Engine
Periscope: A Content-based Image Retrieval Engine
Antigoni-Maria Founta
 
Semantic Linked Data
Semantic Linked DataSemantic Linked Data
Linked data and Graph properties
Linked data and Graph propertiesLinked data and Graph properties
Linked data and Graph properties
Praxitelis Nikolaos Kouroupetroglou
 
Incremental clustering in search engines
Incremental clustering in search enginesIncremental clustering in search engines
Incremental clustering in search engines
Praxitelis Nikolaos Kouroupetroglou
 

Viewers also liked (10)

Τweetfix: Data Analytics on Match Fixing
Τweetfix: Data Analytics on Match FixingΤweetfix: Data Analytics on Match Fixing
Τweetfix: Data Analytics on Match Fixing
 
Exploring Language Communities on Github
Exploring Language Communities on GithubExploring Language Communities on Github
Exploring Language Communities on Github
 
Experimental Causal Inference
Experimental Causal InferenceExperimental Causal Inference
Experimental Causal Inference
 
Social Media Fraud Metrics
Social Media Fraud MetricsSocial Media Fraud Metrics
Social Media Fraud Metrics
 
Transitivity of Trust
Transitivity of TrustTransitivity of Trust
Transitivity of Trust
 
Opinion mining
Opinion miningOpinion mining
Opinion mining
 
Periscope: A Content-based Image Retrieval Engine
Periscope: A Content-based Image Retrieval EnginePeriscope: A Content-based Image Retrieval Engine
Periscope: A Content-based Image Retrieval Engine
 
Semantic Linked Data
Semantic Linked DataSemantic Linked Data
Semantic Linked Data
 
Linked data and Graph properties
Linked data and Graph propertiesLinked data and Graph properties
Linked data and Graph properties
 
Incremental clustering in search engines
Incremental clustering in search enginesIncremental clustering in search engines
Incremental clustering in search engines
 

Similar to Estimating Causal Effects from Observations

Py data19 final
Py data19   finalPy data19   final
Py data19 final
Maria Navarro Jiménez
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
Amit Sharma
 
chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.ppt
ShayanChowdary
 
Anomaly detection Full Article
Anomaly detection Full ArticleAnomaly detection Full Article
Anomaly detection Full Article
MenglinLiu1
 
Detail Study of the concept of Regression model.pptx
Detail Study of the concept of  Regression model.pptxDetail Study of the concept of  Regression model.pptx
Detail Study of the concept of Regression model.pptx
truptikulkarni2066
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
Avjinder (Avi) Kaler
 
Statistical tests
Statistical testsStatistical tests
Statistical tests
martyynyyte
 
Lecture 3.1_ Logistic Regression.pptx
Lecture 3.1_ Logistic Regression.pptxLecture 3.1_ Logistic Regression.pptx
Lecture 3.1_ Logistic Regression.pptx
ajondaree
 
Representing and generating uncertainty effectively presentatıon
Representing and generating uncertainty effectively presentatıonRepresenting and generating uncertainty effectively presentatıon
Representing and generating uncertainty effectively presentatıon
Azdeen Najah
 
2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data 2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data
Malik Hassan Qayyum 🕵🏻‍♂️
 
15 anomaly detection
15 anomaly detection15 anomaly detection
15 anomaly detection
TanmayVijay1
 
SAMPLING MEAN DEFINITION The term sampling mean .docx
SAMPLING MEAN DEFINITION The term sampling mean .docxSAMPLING MEAN DEFINITION The term sampling mean .docx
SAMPLING MEAN DEFINITION The term sampling mean .docx
anhlodge
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Maninda Edirisooriya
 
statistical estimation
statistical estimationstatistical estimation
statistical estimation
Amish Akbar
 
Simple egression.pptx
Simple egression.pptxSimple egression.pptx
Simple egression.pptx
AbdalrahmanTahaJaya
 
Simple Linear Regression.pptx
Simple Linear Regression.pptxSimple Linear Regression.pptx
Simple Linear Regression.pptx
AbdalrahmanTahaJaya
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms I
Machine Learning Valencia
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learned
weka Content
 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been Learned
DataminingTools Inc
 
Chapter 14 Part Ii
Chapter 14 Part IiChapter 14 Part Ii
Chapter 14 Part Ii
Matthew L Levy
 

Similar to Estimating Causal Effects from Observations (20)

Py data19 final
Py data19   finalPy data19   final
Py data19 final
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
 
chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.ppt
 
Anomaly detection Full Article
Anomaly detection Full ArticleAnomaly detection Full Article
Anomaly detection Full Article
 
Detail Study of the concept of Regression model.pptx
Detail Study of the concept of  Regression model.pptxDetail Study of the concept of  Regression model.pptx
Detail Study of the concept of Regression model.pptx
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Statistical tests
Statistical testsStatistical tests
Statistical tests
 
Lecture 3.1_ Logistic Regression.pptx
Lecture 3.1_ Logistic Regression.pptxLecture 3.1_ Logistic Regression.pptx
Lecture 3.1_ Logistic Regression.pptx
 
Representing and generating uncertainty effectively presentatıon
Representing and generating uncertainty effectively presentatıonRepresenting and generating uncertainty effectively presentatıon
Representing and generating uncertainty effectively presentatıon
 
2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data 2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data
 
15 anomaly detection
15 anomaly detection15 anomaly detection
15 anomaly detection
 
SAMPLING MEAN DEFINITION The term sampling mean .docx
SAMPLING MEAN DEFINITION The term sampling mean .docxSAMPLING MEAN DEFINITION The term sampling mean .docx
SAMPLING MEAN DEFINITION The term sampling mean .docx
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
 
statistical estimation
statistical estimationstatistical estimation
statistical estimation
 
Simple egression.pptx
Simple egression.pptxSimple egression.pptx
Simple egression.pptx
 
Simple Linear Regression.pptx
Simple Linear Regression.pptxSimple Linear Regression.pptx
Simple Linear Regression.pptx
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms I
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learned
 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been Learned
 
Chapter 14 Part Ii
Chapter 14 Part IiChapter 14 Part Ii
Chapter 14 Part Ii
 

Recently uploaded

一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024
facilitymanager11
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
UofT毕业证如何办理
UofT毕业证如何办理UofT毕业证如何办理
UofT毕业证如何办理
exukyp
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 

Recently uploaded (20)

一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样

Estimating Causal Effects from Observations

  • 1. Estimating Causal Effects from Observations Advanced Data Analysis from an Elementary Point of View
  • 2. Credits Team The slides below are derived from Chapter 27 of the book “Advanced Data Analysis from an Elementary Point of View” by Cosma Shalizi of Carnegie Mellon University, written to accompany CMU's “Advanced Data Analysis” course. Antigoni-Maria Founta, UIM: 647 Ioannis Athanasiadis, UIM: 607
  • 3. Overview Identification & Estimation ➔ Back-door Criteria & Estimators ◆ Estimating Average Causal Effects ◆ Avoiding Estimating Marginal Distributions ◆ Matching ◆ Propensity Scores ◆ Propensity Score Matching ➔ Instrumental Variables Estimates Conclusion ➔ Uncertainty & Inference ➔ Comparison of Methods
  • 4. Identification of Causal Effects ● Causal Conditioning: Pr(Y|do(X = x)) //Counter-factual, not always identifiable even if X & Y are observable ● Probabilistic Conditioning: Pr(Y|X = x) //Factual, always identifiable if X & Y are observable Our goal is to identify when quantities like the former are functions of the distribution of observable variables! And once identified, the next question is how we can actually estimate it...
  • 5. Identification/Estimation of Causal Effects Each of the following techniques provides both identification of causal effects and a formula to estimate them: 1. Back-door Criterion 2. Front-door Criterion 3. Instrumental Variables * Since estimating with the front-door criterion amounts to doing two rounds of back-door adjustment, we will continue this lecture working only with the back-door criterion.
  • 7. Back-door criterion CRITERIA: I. S blocks every back-door path between X and Y II. No node in S is a descendant of X S is any set of variables that meets the back-door criteria. E.g. ● S={S1,S2} or S={S3} or S={S1,S2,S3} meet the criteria ● S={S1} or S={S2} or S={S3,B} fail the criteria
  • 8. Back-door Criteria Formula When S satisfies the back-door criterion, the formula to estimate the Causal Conditioning becomes: Pr(Y|do(X=x)) = Σs Pr(Y|X=x, S=s) Pr(S=s) ...so we have transformed the unidentifiable Causal Conditioning into a distribution of observables.
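On fully discrete data the adjustment sum can be computed by direct counting. A minimal sketch (the toy records and the helper name `backdoor_adjust` are ours, purely for illustration):

```python
from collections import Counter

def backdoor_adjust(data, x, y_vals):
    """Estimate Pr(Y=y | do(X=x)) by the back-door formula:
    sum over s of Pr(Y=y | X=x, S=s) * Pr(S=s)."""
    n = len(data)
    s_counts = Counter(s for (_, s, _) in data)
    out = {}
    for y in y_vals:
        total = 0.0
        for s, ns in s_counts.items():
            # outcomes observed in the stratum X=x, S=s
            cell = [yy for (xx, ss, yy) in data if xx == x and ss == s]
            if cell:  # skip strata with no X=x observations
                total += (sum(1 for yy in cell if yy == y) / len(cell)) * (ns / n)
        out[y] = total
    return out

# toy records: (X, S, Y), all binary, invented for the example
data = [(1, 0, 1), (1, 0, 0), (0, 0, 0), (0, 0, 0),
        (1, 1, 1), (1, 1, 1), (0, 1, 1), (0, 1, 0)]
print(backdoor_adjust(data, x=1, y_vals=[0, 1]))
```

Here Pr(Y=1|X=1, S=0) = 1/2, Pr(Y=1|X=1, S=1) = 1, and Pr(S=0) = Pr(S=1) = 1/2, so the adjusted Pr(Y=1|do(X=1)) comes out to 0.75.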
  • 9. Estimations in the Back-Door Criteria There are some special cases and tricks which are worth knowing about, analysed below. 1. Estimating Average Causal Effects 2. Avoiding Estimating Marginal Distributions 3. Matching 4. Propensity Scores 5. Propensity Score Matching
  • 10. 1. Estimating average causal effects I The average causal effect can summarize much of what there is to know about the effect X has on Y. We don't always learn all there is to know this way, but it's still useful! ● Average Effect (or sometimes just the effect) of do(X=x): E[Y|do(X=x)] * *When it makes sense for Y to have an expectation value
  • 11. 1. Estimating average causal effects II ● If we apply the Back-Door Criterion Formula with control variables S: E[Y|do(X=x)] = Σs E[Y|X=x, S=s] Pr(S=s) = Σs µ(x, s) Pr(S=s) (the inner conditional expectation µ(x, s) = E[Y|X=x, S=s] is a regression function) We can estimate µ(x, s) by regression, but we still need to know Pr(S=s)...
  • 12. 2. Avoiding Estimating Marginal Distributions ● The demands of computing, enumerating and summarizing the various distributions in the Back-Door Criteria Formula are very high, so we need to reduce these burdens. One shortcut is to let the sample itself enumerate the possible values of S. ● How? Law of Large Numbers! With a large sample: (1/n) Σi µ(x, si) → E[Y|do(X=x)]
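The shortcut can be sketched in a few lines. Here the regression function µ is a hypothetical linear one we made up for illustration; in practice it would be fitted from data:

```python
import random

random.seed(0)

# Hypothetical regression function mu(x, s) = E[Y | X=x, S=s];
# in practice this would be estimated by regression.
def mu(x, s):
    return 2.0 * x + 0.5 * s

# Observed covariate values s_i (a made-up Gaussian sample).
sample_s = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Law-of-large-numbers shortcut: average mu(x, s_i) over the sample
# instead of estimating and summing over Pr(S=s).
def avg_effect(x):
    return sum(mu(x, s) for s in sample_s) / len(sample_s)

print(avg_effect(1) - avg_effect(0))
```

For this particular µ the s-terms cancel, so the difference of the two sample averages equals the true effect of 2.0.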
  • 13. 3. Matching (Binary Levels) Combination of the above methods: ● By the Estimation of Average Causal Effects we got: E[Y|do(X=x)] = Σs Pr(S=s) µ(x, s) ● By the Law of Large Numbers we got: E[Y|do(X=x)] ≈ 1/n Σi µ(x, si) When X is binary, we are interested in the expectations under X=0 and X=1. So we want the difference: E[Y|do(X=1)] - E[Y|do(X=0)] This is called the Average Treatment Effect, or ATE!
  • 14. 3. Matching (Binary Levels) ATE = E[Y|do(X=1)] - E[Y|do(X=0)] ≈ 1/n Σi µ(1, si) - 1/n Σi µ(0, si) OR, equivalently, 1/n Σi [µ(1, si) - µ(0, si)]
  • 15. 3. Matching (Binary Levels) ● But we can never observe µ(x, s) directly; we need to consider the errors! ● What we can see is Yi = µ(xi, si) + εi! ● So if we want to compute Yi - Yj we will actually have: Yi - Yj = µ(xi, si) - µ(xj, sj) + εi - εj
  • 16. 3. Matching (Binary Levels) If we assume there are two units i and i* with the same value of s, where Xi=1 and Xi*=0, then we can say that: Yi - Yi* = µ(1, si) - µ(0, si) + εi - εi* So, if we can find a match i* for every treated unit i, then (with n1 the number of treated units): ATE ≈ 1/n1 Σi: Xi=1 (Yi - Yi*) Matching works vastly better than estimating µ through a linear model!
  • 17. 3. Matching (Binary Levels) - Conclusion Matching is really just Nearest Neighbor Regression! ● Many technicalities arise. What if we match a unit against multiple units, or can't find an exact match? If there is no exact match -> match each treated unit against the control-group unit with the closest values of the covariates. ● Matching does not solve the identification problem. We still need S to satisfy the back-door criterion! ● As with any NN method, many issues arise. Fixed K → low bias, but not always what we want. Prioritizing low bias is not always a good idea. Bias - Variance tradeoff is a tradeoff!
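The matching estimate of the ATE can be sketched as one-nearest-neighbour matching on a single covariate. All numbers below are invented, with a treatment effect of about +2 built in:

```python
# (s_i, y_i) pairs for treated (X=1) and control (X=0) units;
# the values are made up for illustration.
treated = [(0.1, 2.2), (0.5, 2.9), (0.9, 3.5)]
controls = [(0.15, 0.1), (0.45, 0.8), (0.95, 1.4)]

def match_ate(treated, controls):
    """Average of Y_i - Y_{i*} over treated units, where i* is the
    control unit with the closest covariate value."""
    diffs = []
    for s_i, y_i in treated:
        # nearest neighbour in covariate space
        s_j, y_j = min(controls, key=lambda c: abs(c[0] - s_i))
        diffs.append(y_i - y_j)
    return sum(diffs) / len(diffs)

print(match_ate(treated, controls))
```

Each treated unit pairs with the control whose s is closest, and the average difference recovers the built-in effect of 2.1.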
  • 18. 4. Propensity Scores ● The curse of dimensionality over data or conditional distributions is actually manageable if the control variables have few dimensions. ● If R and S are sets of control variables that both satisfy the back-door criterion and are alike in all else but dimension, we should choose the one with fewer variables (e.g. R). ● Especially when we can write R = f(S): if S satisfies the back-door criterion, then so does R. Since R is a function of S, both the computational and the statistical problems which come from using R are no worse than those of using S, and possibly much better if R has much lower dimension.
  • 19. 4. Propensity Scores ● All that is required is that some combination of the variables in S carries the same information about X as the whole of S does - a summary score! ● We (hope we) can reduce a p-dimensional set of control variables to a one-dimensional one. ● In reality this is difficult, as it depends on the functional form, which is unknown in the non-parametric case. However, there is a special case where we can possibly use a one-dimensional summary: when X is binary! In that case: Propensity score: f(S) = Pr(X=1|S=s), which is our summary R ● However, there is still no analytical formula for f(S), so it needs to be modeled and estimated. The most common model used is Logistic Regression, but with a high-dimensional S that would also suffer from the curse of dimensionality.
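When S is discrete with few levels, the propensity score Pr(X=1|S=s) can be estimated directly by stratum frequencies, with no logistic model needed. A minimal sketch on made-up (x, s) pairs:

```python
from collections import defaultdict

def propensity_scores(data):
    """Estimate f(s) = Pr(X=1 | S=s) from (x, s) pairs by counting."""
    counts = defaultdict(lambda: [0, 0])  # s -> [n_total, n_treated]
    for x, s in data:
        counts[s][0] += 1
        counts[s][1] += x
    return {s: n1 / n for s, (n, n1) in counts.items()}

# invented binary-treatment records with a two-level covariate
data = [(1, 'a'), (0, 'a'), (1, 'a'), (1, 'a'),
        (1, 'b'), (0, 'b'), (0, 'b'), (0, 'b')]
print(propensity_scores(data))  # → {'a': 0.75, 'b': 0.25}
```

A fitted logistic regression would play the same role when S is continuous or high-dimensional.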
  • 20. 5. Propensity Score Matching ● Even when X is a binary variable, if S is high-dimensional then matching is a really tough task → the more levels the variables in S have, the harder it is to find matches. ● Solution: match on R = f(S) - the propensity score! ● The gain here is in computational tractability and (perhaps) statistical efficiency, not in fundamental identification. Que sera, sera… An incredibly popular method ever since, with a huge number of implementations.
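Propensity score matching is then just one-nearest-neighbour matching on the scalar score instead of the full covariate vector. The scores and outcomes below are invented for illustration:

```python
# (treated?, propensity score, outcome) for each unit; values made up,
# with an effect of roughly +2 built in.
units = [
    (1, 0.80, 5.0), (1, 0.55, 4.1), (1, 0.30, 3.2),
    (0, 0.78, 2.9), (0, 0.50, 2.0), (0, 0.35, 1.2),
]

def ps_match_ate(units):
    """Match each treated unit to the control with the closest
    propensity score and average the outcome differences."""
    treated = [(p, y) for t, p, y in units if t == 1]
    controls = [(p, y) for t, p, y in units if t == 0]
    diffs = []
    for p_i, y_i in treated:
        _, y_j = min(controls, key=lambda c: abs(c[0] - p_i))
        diffs.append(y_i - y_j)
    return sum(diffs) / len(diffs)

print(ps_match_ate(units))
```

Matching on one number sidesteps the curse of dimensionality in the matching step, though identification still rests on S blocking the back-door paths.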
  • 21. Overview Identification & Estimation ➔ Back-door Criteria & Estimators ◆ Estimating Average Causal Effects ◆ Avoiding Estimating Marginal Distributions ◆ Matching ◆ Propensity Scores ◆ Propensity Score Matching ➔ Instrumental Variables Estimates Conclusion ➔ Uncertainty & Inference ➔ Comparison of Methods
  • 23. Instrumental Variables ● I is an instrument for identifying the effect of X on Y if: ○ I is a cause of X ○ I is associated with Y only through X ○ Given some controls S, I is a valid instrument when every path from I to Y left open by S has an arrow into X ● A one-unit change in I → an α-unit change in X → an αβ-unit change in Y ● So, β is the causal coefficient of X on Y that we want to estimate! ● We need to estimate β in a way that satisfies all the above criteria. However, things get very complicated when we have many instruments and/or care about the causal effects of multiple variables.
  • 24. Instrumental Variables ● In general, we need two steps * //2-stage Regression / 2-stage Least Squares 1. Regress X on I and S. Call the fitted values xˆ. //We see how much changes in the instruments affect X. 2. Regress Y on xˆ and S, but not on I. The coefficient of Y on xˆ is a consistent estimate of β. //We see how much these I-caused changes in X change Y ● In the simplest case, when everything is linear, the above procedure reduces to the Wald estimator: βˆ = Cov(I, Y) / Cov(I, X) * This works, provided that the linearity assumption holds.
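The Wald estimator can be checked on simulated data. The structural coefficients below are made up: X = I + U + noise and Y = 3X + 2U + noise, with U an unobserved confounder, so the naive regression of Y on X is biased while the instrument recovers β = 3:

```python
import random

random.seed(42)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# Simulated linear system with an unobserved confounder U.
n = 50_000
I = [random.gauss(0, 1) for _ in range(n)]
U = [random.gauss(0, 1) for _ in range(n)]
X = [i + u + random.gauss(0, 0.1) for i, u in zip(I, U)]
Y = [3.0 * x + 2.0 * u + random.gauss(0, 0.1) for x, u in zip(X, U)]

beta_iv = cov(I, Y) / cov(I, X)   # Wald/IV estimate, close to 3
beta_ols = cov(X, Y) / cov(X, X)  # naive regression, biased upward by U
print(round(beta_iv, 2), round(beta_ols, 2))
```

The instrument I is correlated with X but reaches Y only through X, which is exactly what lets the ratio of covariances isolate β.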
  • 25. Instrumental Variables ● There are circumstances where it is possible to use instrumental variables in nonlinear and even nonparametric models, BUT the technique becomes far more complicated: it requires solving an integral equation for the unknown causal function. It is not impossible, BUT it's painful :-( The techniques needed are really complicated.
  • 26. Overview Identification & Estimation ➔ Back-door Criteria & Estimators ◆ Estimating Average Causal Effects ◆ Avoiding Estimating Marginal Distributions ◆ Matching ◆ Propensity Scores ◆ Propensity Score Matching ➔ Instrumental Variables Estimates Conclusion ➔ Uncertainty & Inference ➔ Comparison of Methods
  • 27. Uncertainty & Inference We can reduce the problem of causal inference → ordinary statistical inference. So, in general, with the analytical formulas we can assess our uncertainty about any of our estimates of causal effects the same way we would assess any other statistical inference (e.g. with standard errors). However, in two-stage least squares, taking standard errors etc. for β from the usual formulas for the second regression neglects the fact that this estimate of β comes from regressing Y on xˆ, which is itself an estimate and therefore uncertain.
  • 28. Comparison of Methods (+) Both are clever; (-) both rest on assumptions about the underlying DAG. Crucial point for Matching: the covariates block all back-door paths between X and Y. Crucial point for Instrumental Variables: the instrument is an indirect cause of Y, but only through X (no other unblocked paths connecting I & Y). According to practitioners, the typical failures are: for matching, the covariates used are not enough to block all the back-door paths; for instrumental variables, being too quick to discount the possibility that the instruments are connected to Y through unmeasured pathways.