Experimental Causal Inference
Advanced Data Analysis from an Elementary Point of View
Credits Team
The slides below are derived from Chapter 26 of the book “Advanced Data
Analysis from an Elementary Point of View” by Cosma Shalizi of Carnegie
Mellon University, which was written to support CMU's “Advanced Data
Analysis” course.
The example we use is derived from the notes of Prof. Rosenbaum et al. for the
Department of Statistics of the University of Pennsylvania.
Antigoni-Maria Founta,
UID: 647
Ioannis Athanasiadis,
UID: 607
Overview
➔ CI vs ECI
➔ Why ECI
➔ Example-Driven ECI
➔ Basic Idea
➔ Randomization
◆ Jargon
◆ Causal Identification & Linearity
➔ Open Issues
◆ Randomization Issues
◆ Choice of Levels
◆ Other Issues
CI vs ECI
Causal Inference (CI) is the undertaking of trying to
answer causal questions from empirical data.
Experimental Causal Inference (ECI) is CI that is based on
experiments rather than observations.
“You can only prove causality with statistics.”
F. Mosteller
Why ECI?
Experimental CI is very useful to answer particular questions!
Observations suffer from hidden bias.
Using experiments to prove causality is very powerful,
...but...
Things are much more complicated (need to design the experiments).
Example-driven ECI
● At age 45, Ms. Smith is diagnosed with stage II breast cancer.
● Her oncologist discusses with her two possible treatments: (i) lumpectomy
alone, or (ii) lumpectomy plus irradiation. They decide on (ii).
● Ten years later, Ms. Smith is alive and the tumor has not recurred.
● Her surgeon, Steve, and her radiologist, Rachael debate:
Rachael says: “The irradiation prevented the recurrence – without it, the tumor
would have recurred.”
Steve says: “You can’t know that. It’s a fantasy – you’re making it up. We’ll never
know.”
Basic Idea behind Experimental Design
1. Maximize Useful Variation
2. Eliminate Unhelpful Variation
3. Randomize what we cannot Eliminate
1. Maximize Useful Variation
● If treatments are identified as important regarding causation, then we want to
maximize the possible manipulations in order to spot any interesting behaviour.
● That idea applies even if we want to show that a treatment has no effect.
Basically: we can only learn anything about how Y relates
to X if X varies.
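A small simulation can make this point concrete. The sketch below is illustrative only: the linear outcome model, the noise level, and the names `ols_slope` and `slope_spread` are assumptions, not part of the source material. It compares how precisely the slope of Y on X is estimated when X barely varies versus when X is spread over its whole range.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Closed-form least-squares slope of ys on xs."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def slope_spread(x_levels, n_reps=1000, true_slope=2.0, noise_sd=1.0, seed=0):
    """Standard deviation of the slope estimate over repeated experiments.

    Assumed (hypothetical) world: Y = true_slope * X + Gaussian noise.
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_reps):
        ys = [true_slope * x + rng.gauss(0.0, noise_sd) for x in x_levels]
        slopes.append(ols_slope(x_levels, ys))
    return statistics.stdev(slopes)

narrow = [0.4, 0.6] * 10  # X barely varies
wide = [0.0, 1.0] * 10    # X varies as much as the range allows

sd_narrow = slope_spread(narrow)
sd_wide = slope_spread(wide)
print(f"spread of slope estimate, narrow X: {sd_narrow:.3f}")
print(f"spread of slope estimate, wide X:   {sd_wide:.3f}")
```

With the same number of units and the same noise, the design with wider variation in X should give a visibly tighter slope estimate.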
2. Eliminate Unhelpful Variation
A. Precision of Measurement
// Easy to say and often the right thing to do, but typically reaches limits.
B. Homogenization of Units
// Can raise concerns about generalization to a less-homogeneous population.
C. Limiting comparison to similar units
// The principle behind doing a paired t-test rather than an unpaired one and, more generally,
behind matching to eliminate the consequences of uncontrolled variation.
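To illustrate why limiting comparison to similar units helps, here is a minimal sketch with made-up data: every unit has a widely varying baseline and a small treatment effect of +1. The data, the effect size, and the helper names `unpaired_t` and `paired_t` are assumptions for illustration; the t statistics are computed by hand from their textbook formulas.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical data: baselines differ a lot between units,
# while the treatment effect (+1) is small by comparison.
baselines = [rng.gauss(50.0, 10.0) for _ in range(30)]
control = [b + rng.gauss(0.0, 1.0) for b in baselines]
treated = [b + 1.0 + rng.gauss(0.0, 1.0) for b in baselines]

def unpaired_t(a, b):
    """Welch t statistic: treats the two samples as unrelated."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(a, b):
    """Paired t statistic: works on within-unit differences."""
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

t_unpaired = unpaired_t(treated, control)
t_paired = paired_t(treated, control)
print(f"unpaired t: {t_unpaired:.2f}")
print(f"paired t:   {t_paired:.2f}")
```

Both statistics share the same numerator (the mean difference); pairing only shrinks the standard error, because the between-unit baseline variation cancels out of the within-unit differences.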
3. Randomize what can’t be eliminated
The great trick of Ronald Fisher!*
// Makes the distribution of uncontrolled variables the same across treatments, so they are
statistically homogeneous.
*Author of the paper “The Arrangement of Field Experiments” (1926), precursor of the book
“The Design of Experiments”!
Important: randomly assigned Z!
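The effect of randomization on an uncontrolled variable can be sketched with a short simulation. Everything here is hypothetical: the covariate (age), the self-selection rule, and the function name `covariate_gap` are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical uncontrolled covariate (age) for 10,000 units.
ages = [rng.gauss(45.0, 12.0) for _ in range(10_000)]

# Randomized assignment: a fair coin per unit, ignoring age entirely.
random_x = [rng.random() < 0.5 for _ in ages]

# Self-selected assignment: older units are more likely to be treated.
selected_x = [rng.random() < (0.8 if age > 45.0 else 0.2) for age in ages]

def covariate_gap(assignment):
    """Mean age of treated units minus mean age of control units."""
    treated = [a for a, x in zip(ages, assignment) if x]
    control = [a for a, x in zip(ages, assignment) if not x]
    return statistics.mean(treated) - statistics.mean(control)

gap_random = covariate_gap(random_x)
gap_selected = covariate_gap(selected_x)
print(f"age gap under randomization:  {gap_random:.2f}")
print(f"age gap under self-selection: {gap_selected:.2f}")
```

Under randomization the two groups should have nearly the same age distribution, so age cannot explain a difference in outcomes; under self-selection the groups differ systematically before any treatment is applied.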
Randomization
Jargon
Unit (instance): X = 0, Y = 1, Z = 0
Variables (features): Observations + Treatments
Treatments: Variables X, Y, Z
Levels of X: e.g. 0, 1, 2, 3
Control condition: 0
Manipulation for X=0, Y=1, Z=0
Jargon
Patient: X = 0, Y = 1
Treatments: X - Irradiation Usage
Levels of X:
0 → Lumpectomy with Irradiation
1 → Lumpectomy without Irradiation
Control condition: 0
Manipulation of X
Observable Var: Y - Cancer Recurrence
Values: 0 → Yes / 1 → No
Jargon
Unit Examples
Randomization & Linear Models
In all the cases below, linear models (e.g. linear regression) can suffice to estimate
the expected causal effects, either unconditionally or under the stated conditions.
● Randomize one treatment
○ Binary Values
Coefficient on X: E[Y|X=1]-E[Y|X=0]
○ Discrete Values
Coefficients on X: E[Y|X=x]-E[Y|X=0] //for all x
● Randomize multiple treatments
E[Y|do(X=x,Z=z)] = μ + f_X(x) + f_Z(z) + f_XZ(x,z) // only if levels of X and Z are discrete
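For the binary case, the claim that the regression coefficient on X equals E[Y|X=1]-E[Y|X=0] can be checked numerically. The sketch below uses simulated data with an assumed true effect of 1.5; the data-generating process and variable names are illustrative assumptions, and the least-squares slope is computed from its closed-form formula rather than from any particular library.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical randomized experiment with a binary treatment X.
xs = [rng.randint(0, 1) for _ in range(500)]
ys = [3.0 + 1.5 * x + rng.gauss(0.0, 1.0) for x in xs]

# Difference in group means: sample estimate of E[Y|X=1] - E[Y|X=0].
mean1 = statistics.mean(y for x, y in zip(xs, ys) if x == 1)
mean0 = statistics.mean(y for x, y in zip(xs, ys) if x == 0)
diff_in_means = mean1 - mean0

# Least-squares slope of Y on X (closed form).
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))

print(f"difference in means: {diff_in_means:.4f}")
print(f"regression slope:    {slope:.4f}")
```

The two numbers agree up to floating-point rounding, which is why a linear model loses nothing when the treatment has only two levels.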
Randomization & Non-Linear Models
● If the levels of the treatments are continuous and have been discretized for the
purpose of the experiment, then a linear model fitted to those levels may not generalize well.
Why? Because we cannot generalize to other values without accounting for the continuous
nature of the treatment!
● It is better to use non-linear models (like a spline or a kernel).
● Important: at least three levels are needed, since two levels can only determine a straight line!
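Why a third level matters can be seen in a toy simulation. The curved dose-response function below is an assumption invented for illustration, as are the names `true_response` and `level_means`. With only the two extreme levels the treatment looks like it does nothing; the middle level reveals the curvature.

```python
import random
import statistics

rng = random.Random(3)

def true_response(x):
    """Hypothetical curved dose-response, unknown to the experimenter."""
    return 4.0 * x * (1.0 - x)  # peaks at x = 0.5

def level_means(levels, n_per_level=2000, noise_sd=0.5):
    """Simulate a randomized experiment; return the mean of Y at each level."""
    return {
        x: statistics.mean(true_response(x) + rng.gauss(0.0, noise_sd)
                           for _ in range(n_per_level))
        for x in levels
    }

means = level_means([0.0, 0.5, 1.0])

# Straight line through the two extreme levels, evaluated at the middle level.
line_at_mid = (means[0.0] + means[1.0]) / 2.0
curvature_gap = means[0.5] - line_at_mid

print(f"mean Y at X=0.0: {means[0.0]:.3f}")
print(f"mean Y at X=0.5: {means[0.5]:.3f}")
print(f"mean Y at X=1.0: {means[1.0]:.3f}")
print(f"gap between middle level and straight line: {curvature_gap:.3f}")
```

A clearly non-zero gap signals that a straight line is the wrong shape; two levels alone could never show this, since a line passes exactly through any two points.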
Linear vs Non-Linear
In a randomized experiment with discrete levels of a treatment X, linear
models can be perfectly adequate to estimate the expected causal effects
for those levels.
When we instead need to generalize to arbitrary values of X, we should use
a regression model suited to a continuous treatment, such as a spline or a kernel.
Open Issues
Randomization Issues
● Modes of Randomization: Assignment of Treatments
○ IID Assignment: Independent assignment of treatments to each unit
// easy; may lead to lack of balance & issues with constraints
○ Planned Assignment: Assignment according to a fixed schedule applied independently of the
units’ attributes
// more complex; guarantees balance and respects constraints
● Perspectives: Units vs Treatments
○ Unit Perspective: fixed units, randomly varying treatments
○ Treatment Perspective: fixed treatment levels, randomly sampled units
// The second is more useful (though harder to understand), because we care about the consequences of
treatments, not units!
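The two modes of randomization can be contrasted in a few lines. This is a minimal sketch assuming 20 units and a binary treatment; the variable names and the 10/10 schedule are illustrative choices, not taken from the source.

```python
import random

rng = random.Random(11)
n_units = 20

# IID assignment: an independent fair coin flip for every unit.
# Simple, but the group sizes are themselves random.
iid = [rng.randint(0, 1) for _ in range(n_units)]

# Planned assignment: fix a balanced schedule in advance
# (10 treated, 10 control), then shuffle it without looking
# at any attribute of the units.
planned = [1] * (n_units // 2) + [0] * (n_units // 2)
rng.shuffle(planned)

print(f"IID assignment:     {sum(iid)} treated of {n_units}")
print(f"planned assignment: {sum(planned)} treated of {n_units}")
```

The planned schedule always treats exactly half of the units, whereas the coin flips can leave the groups noticeably unequal in a small experiment.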
Choice of Levels
Discretization of continuous values depends on the goal of the experiment.
Goals:
1. Parameter Estimation or Prediction
2. Maximizing Yield
3. Model Discrimination
4. Multiple Goals
Other Issues
● Multiple Manipulated Variables: we want to consider all combinations of all variables.
To achieve that: factorial design!
○ Advantages: can detect all possible interactions
○ Disadvantages: cost!
→ Solution: Partial factorial design!
● Blocking: Divide experimental units into relatively-homogeneous “blocks”.
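The cost argument for partial factorial designs is easy to see by enumerating the runs. The sketch below assumes three hypothetical two-level treatments A, B, C coded as -1/+1, and builds one standard half-fraction by keeping the runs whose coded levels multiply to +1. This is only one possible choice of fraction, shown for illustration.

```python
from itertools import product
from math import prod

factors = ["A", "B", "C"]  # hypothetical treatment names

# Full factorial design: every combination of every level (2^3 = 8 runs).
full = [dict(zip(factors, combo))
        for combo in product([-1, 1], repeat=len(factors))]

# Half-fraction: keep only runs where A * B * C == +1 (4 runs).
# Each treatment still appears equally often at each level, but the
# three-way interaction can no longer be estimated, and each main effect
# is mixed up with a two-way interaction.
half = [run for run in full if prod(run.values()) == 1]

print(f"full factorial runs: {len(full)}")
print(f"half-fraction runs:  {len(half)}")
for run in half:
    print(run)
```

Halving the number of runs halves the cost, at the price of no longer being able to separate all interactions from one another.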
Other Issues
● “What the experiments died of” aka failures of randomization:
○ Subjectivity of influence (placebo effect, expectations, Hawthorne effect)
○ Threat to generalization to other populations
e.g. experimentation on a school vs generalizing on all schools
○ Non-compliance
○ Sample too small to support generalization
○ Interference between units
Thank You!

Experimental Causal Inference

  • 2. Credits Team The slides below are derived from the Chapter 26 of the book “Advanced Data Analysis from an Elementary Point of View“ by Cosma Shalizi of the Carnegie Mellon University, which was created in order to assist the “Advanced Data Analysis” course of the CMU. The example we used is derived from the notes of Prof. Rosenbaum et al for the Department of Statistics, of the University of Pennsylvania Antigoni-Maria Founta, UID: 647 Ioannis Athanasiadis, UID: 607
  • 3. Overview ➔ CI vs ECI ➔ Why ECI ➔ Example-Driver ECI ➔ Basic Idea ➔ Randomization ◆ Jargon ◆ Causal Identification & Linearity ➔ Open Issues ◆ Randomization Issues ◆ Choice of Levels ◆ Other Issues
  • 4. CI vs ECI Causal Inference (CI) is the undertaking of trying to answer causal questions from empirical data. Experimental Causal Inference (ECI) is CI that is based on experiments rather than observations. “You can only prove causality with statistics.” F. Mosteller
  • 5. Why ECI? Experimental CI is very useful to answer particular questions! Observations suffer from hidden bias. Using experiments to prove causality is very powerful, ...but... Things are much more complicated (need to design the experiments).
  • 6. Example-driven ECI ● At age 45, Ms. Smith is diagnosed with stage II breast cancer. ● Her oncologist discusses with her two possible treatments: (i) lumpectomy alone, or (ii) lumpectomy plus irradiation. They decide on (ii). ● Ten years later, Ms. Smith is alive and the tumor has not recurred. ● Her surgeon, Steve, and her radiologist, Rachael debate: Rachael says: “The irradiation prevented the recurrence – without it, the tumor would have recurred.” Steve says: “You can’t know that. It’s a fantasy – you’re making it up. We’ll never know.”
  • 7. Overview ➔ CI vs ECI ➔ Why ECI ➔ Example-Driver ECI ➔ Basic Idea ➔ Randomization ◆ Jargon ◆ Causal Identification & Linearity ➔ Open Issues ◆ Randomization Issues ◆ Choice of Levels ◆ Other Issues
  • 8. Basic Idea behind Experimental Design 1. Maximize Useful Variation 2. Eliminate Unhelpful Variation 3. Randomize what we cannot Eliminate
  • 9. 1. Maximize Useful Variation ● If treatments are identified as important regarding causation, then we want to maximize the possible manipulations in order to spot any interesting behaviour. ● That idea applies even if we want to show that a treatment has no effect. Basically: we can only learn anything about how Y relates to X if X varies.
  • 10. 2. Eliminate Unhelpful Variation A. Precision of Measurement // Easy to say and often the right thing to do, but it typically reaches limits. B. Homogenization of Units // Can raise concerns about generalization to a less homogeneous population. C. Limiting comparison to similar units // The principle behind doing a paired t-test rather than an unpaired one, and more generally behind trying to eliminate the consequences of uncontrolled variation by matching.
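The point about paired versus unpaired comparisons can be illustrated with a small simulation. This is only a sketch under assumed numbers: the sample size, the unit-to-unit spread, the noise level, and the true effect of 1.0 are all made up for illustration and do not come from the slides or the book.

```python
import math
import random
import statistics

def unpaired_t(a, b):
    """Welch-style t statistic treating the two samples as independent."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(a, b):
    """t statistic computed on within-pair differences."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

rng = random.Random(0)
n = 30
# Assumed setup: units differ a lot from each other (sd 10),
# while measurement noise within a unit is small (sd 1).
baseline = [rng.gauss(0, 10) for _ in range(n)]
control = [b + rng.gauss(0, 1) for b in baseline]
treated = [b + 1.0 + rng.gauss(0, 1) for b in baseline]  # assumed true effect: 1.0

t_unpaired = unpaired_t(treated, control)
t_paired = paired_t(treated, control)
print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

Both statistics share the same numerator (the difference in means); pairing only shrinks the standard error, because the large between-unit variation cancels inside each pair.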
  • 11. 3. Randomize what can’t be eliminated The great trick of Ronald Fisher!* // Makes the distribution of uncontrolled variables the same across treatments, so they are statistically homogeneous. *Author of the paper “The Arrangement of Field Experiments” (1926), a precursor of his book “The Design of Experiments”!
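A minimal sketch of why randomization homogenizes uncontrolled variables, assuming a single hidden covariate (here called `age`, a hypothetical name) and a made-up self-selection rule in which older units are more likely to take the treatment:

```python
import random
import statistics

rng = random.Random(1)
n = 2000
# Hypothetical uncontrolled variable that we cannot eliminate.
age = [rng.uniform(30, 70) for _ in range(n)]

def covariate_gap(assignment):
    """Mean age of treated units minus mean age of control units."""
    treated = [a for a, t in zip(age, assignment) if t]
    control = [a for a, t in zip(age, assignment) if not t]
    return statistics.mean(treated) - statistics.mean(control)

# Assumed self-selection: probability of treatment rises linearly with age.
self_selected = [rng.random() < (a - 30) / 40 for a in age]
# Randomization: a fair coin flip that ignores age entirely.
randomized = [rng.random() < 0.5 for _ in age]

gap_self = covariate_gap(self_selected)
gap_rand = covariate_gap(randomized)
print(f"age gap, self-selected: {gap_self:.1f}; randomized: {gap_rand:.1f}")
```

Under self-selection the treated group is systematically older, so any effect of age would be confounded with the treatment; under the coin flip the two groups have nearly the same age distribution.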
  • 15. Jargon Unit: X = 0, Y = 1, Z = 0. Treatments: variables X, Y, Z. Levels of X: e.g. 0, 1, 2, 3; control condition: 0. Manipulation: X = 0, Y = 1, Z = 0. Features, Instances. Variables: Observations + Treatments.
  • 16. Jargon Patient (unit): X = 0, Y = 1. Treatment: X, irradiation usage. Levels of X: 0 → lumpectomy with irradiation, 1 → lumpectomy without irradiation; control condition: 0. Manipulation of X. Observable variable: Y, cancer recurrence; values: 0 → yes, 1 → no.
  • 18. Randomization & Linear Models In all the cases below, linear models (e.g. linear regression) can suffice for estimating the expected causal effects, either entirely or under conditions. ● Randomize one treatment ○ Binary values: coefficient on X = E[Y|X=1] - E[Y|X=0] ○ Discrete values: coefficients on X = E[Y|X=x] - E[Y|X=0] // for all x ● Randomize multiple treatments: E[Y|do(X=x,Z=z)] = μ + f_X(x) + f_Z(z) + f_XZ(x,z) // only if the levels of X and Z are discrete
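For the binary case, the claim that the coefficient on X equals E[Y|X=1] - E[Y|X=0] can be checked numerically. The sketch below assumes a simulated data set with an invented true effect of 1.5 and a hand-written helper `ols_slope`; neither comes from the slides.

```python
import random
import statistics

rng = random.Random(2)
n = 1000
x = [rng.randint(0, 1) for _ in range(n)]            # randomized binary treatment
y = [2.0 + 1.5 * xi + rng.gauss(0, 1) for xi in x]   # assumed true effect: 1.5

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (simple regression)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

slope = ols_slope(x, y)
mean_treated = statistics.mean([b for a, b in zip(x, y) if a == 1])
mean_control = statistics.mean([b for a, b in zip(x, y) if a == 0])
diff_means = mean_treated - mean_control
print(f"OLS slope = {slope:.4f}, difference in means = {diff_means:.4f}")
```

The two numbers agree up to floating-point error, which is an algebraic identity for a 0/1 regressor; randomization is what lets that difference be read as a causal effect.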
  • 19. Randomization & Non-Linear Models ● If the levels of the treatments are continuous and have been discretized for the purpose of the experiment, then linear models do not fit well. Why? Because we can’t generalize to other values without taking the continuous nature of the treatment into account! ● It is better to use non-linear models (like a spline or a kernel). ● Important: at least three levels are needed!
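Why at least three levels are needed can be seen in a noise-free toy case. The quadratic `true_response` below is a hypothetical dose-response curve chosen only for illustration: with two levels a straight line reproduces both level means exactly, so curvature is invisible until a third level is measured.

```python
def true_response(x):
    """Hypothetical curved dose-response (illustration only)."""
    return x ** 2

def line_through(x0, y0, x1, y1):
    """Return the straight line passing through two points."""
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + slope * (x - x0)

# Two levels: the line matches both observed means, so the fit looks perfect.
low, high = 0.0, 2.0
line = line_through(low, true_response(low), high, true_response(high))
residual_low = true_response(low) - line(low)
residual_high = true_response(high) - line(high)

# A third level in between exposes the non-linearity.
middle = 1.0
gap_at_middle = true_response(middle) - line(middle)
print(residual_low, residual_high, gap_at_middle)
```

In a real experiment the level means are noisy, so the gap at the middle level would have to be judged against its standard error rather than against zero.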
  • 20. Linear vs Non-Linear In a randomized experiment with discrete levels of a treatment X, linear models can be perfectly adequate for estimating the expected causal effects at those levels. When we instead need to generalize to arbitrary values of X, we should use a flexible non-linear regression model, such as a spline or a kernel smoother.
  • 23. Randomization Issues ● Modes of Randomization: Assignment of Treatments ○ IID Assignment: treatments are assigned independently to each unit // easy, but may lead to lack of balance and trouble meeting constraints ○ Planned Assignment: treatments are assigned according to a fixed schedule that is applied independently of the units’ attributes // more complex, but guarantees balance and respects constraints ● Perspectives: Units vs Treatments ○ Unit Perspective: units are fixed, treatments vary ○ Treatment Perspective: treatment levels are fixed, the sampled units vary // The second is more useful (though harder to understand), because we care about the consequences of treatments, not of units!
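The two modes of randomization can be contrasted in a few lines. This sketch assumes one particular way of realizing a planned assignment, namely a balanced schedule that is randomly permuted over the units; the slides do not prescribe this mechanism, and the number of units is arbitrary.

```python
import random

rng = random.Random(3)
n_units = 20

# IID assignment: an independent fair coin flip per unit.
# Nothing forces the two groups to have equal size.
iid = [rng.randint(0, 1) for _ in range(n_units)]

# Planned assignment (assumed form): a fixed, balanced schedule
# shuffled without looking at any attribute of the units.
planned = [0] * (n_units // 2) + [1] * (n_units // 2)
rng.shuffle(planned)

print("IID treated count:    ", sum(iid))
print("Planned treated count:", sum(planned))
```

The planned schedule always treats exactly half of the units, whereas the IID count fluctuates from run to run, which is the lack of balance mentioned above.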
  • 24. Choice of Levels Discretization of continuous values depends on the goal of the experiment. Goals: 1. Parameter Estimation or Prediction 2. Maximizing Yield 3. Model Discrimination 4. Multiple Goals
  • 25. Other Issues ● Multiple Manipulated Variables: we want to consider all combinations of all variables. To achieve that: factorial design! ○ Advantages: can detect all possible interactions ○ Disadvantages: cost! → Solution: Partial factorial design! ● Blocking: Divide experimental units into relatively-homogeneous “blocks”.
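A small sketch of full versus partial factorial designs, assuming three hypothetical two-level treatments named X, Z and W. The half fraction shown uses the standard construction in which the third factor is set equal to the exclusive-or of the first two; this halves the cost but means the effect of W can no longer be separated from the X-by-Z interaction.

```python
from itertools import product

# Hypothetical treatments, each with two levels.
levels = {"X": [0, 1], "Z": [0, 1], "W": [0, 1]}

# Full factorial: every combination of every level (2 * 2 * 2 = 8 runs).
full = list(product(*levels.values()))

# Half-fraction partial factorial: keep only runs with W == X xor Z.
half = [run for run in full if run[2] == (run[0] ^ run[1])]

print(len(full), "runs in the full design")
print(len(half), "runs in the half fraction:", half)
```

The number of runs in the full design grows multiplicatively with each added treatment, which is the cost disadvantage noted on the slide.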
  • 26. Other Issues ● “What the experiments died of”, aka failures of randomization: ○ Subjectivity of influence (placebo effect, expectations, Hawthorne effect) ○ Threats to generalization to other populations, e.g. experimenting on one school vs generalizing to all schools ○ Non-compliance ○ A sample that is inadequate for generalization ○ Interference between units