SlideShare a Scribd company logo
D A T A D A Y T E X A S 2 0 1 8
Making Causal Claims as a Data
Scientist: Tips and Tricks Using R
Lucy D’Agostino McGowan
D A T A D A Y T E X A S 2 0 1 8
http://bit.ly/LucyStatsDDTX18
D A T A D A Y T E X A S 2 0 1 8
h-ps://xkcd.com/552/
D A T A D A Y T E X A S 2 0 1 8
h-ps://xkcd.com/552/
D A T A D A Y T E X A S 2 0 1 8
D A T A D A Y T E X A S 2 0 1 8
D A T A D A Y T E X A S 2 0 1 8
✌ parts
1. Discuss some ways to strengthen a causal
argument: Hill’s criteria
2. Discuss a specific causal inference method:
propensity scores + sensitivity analyses
D A T A D A Y T E X A S 2 0 1 8
Hill’s Criteria
h-ps://commons.wikimedia.org/wiki/File:AusBn_Bradford_Hill.jpg
D A T A D A Y T E X A S 2 0 1 8
http://nssdeviations.com
h<p://livefreeordichotomize.com/2016/12/15/hill-for-the-data-scienDst-an-xkcd-story/
D A T A D A Y T E X A S 2 0 1 8
Strength
h-ps://xkcd.com/539/
D A T A D A Y T E X A S 2 0 1 8
h-ps://xkcd.com/242/
Consistency
D A T A D A Y T E X A S 2 0 1 8
h-ps://xkcd.com/1217/
Specificity
D A T A D A Y T E X A S 2 0 1 8
h-ps://xkcd.com/925/
Temporality
D A T A D A Y T E X A S 2 0 1 8
h-ps://xkcd.com/323/
Biological
gradient
D A T A D A Y T E X A S 2 0 1 8
h-ps://xkcd.com/605/
Plausibility
D A T A D A Y T E X A S 2 0 1 8
Coherence
h-ps://xkcd.com/1170/
D A T A D A Y T E X A S 2 0 1 8
Analogy
h-ps://xkcd.com/882/
D A T A D A Y T E X A S 2 0 1 8
Experiment
h-ps://xkcd.com/1462/
D A T A D A Y T E X A S 2 0 1 8
What if you can’t
do a controlled
experiment?
D A T A D A Y T E X A S 2 0 1 8
Observational Studies
Goal: Answer a research question
exposure outcome
D A T A D A Y T E X A S 2 0 1 8
“Gold Standard” – Randomized Controlled Trials
Treatment
Control
P=0.5
P=0.5
Observational Studies
D A T A D A Y T E X A S 2 0 1 8
“Gold Standard” – Randomized Controlled Trials
A
B
P=0.5
P=0.5
Observational Studies
D A T A D A Y T E X A S 2 0 1 8
Treatment
Control
P=0.9
P=0.1
Observational Studies
D A T A D A Y T E X A S 2 0 1 8
Exposed
Unexposed
P=0.8
P=0.2
Observational Studies
D A T A D A Y T E X A S 2 0 1 8
Exposed
Unexposed
P=0.8
P=0.2
Observational Studies
D A T A D A Y T E X A S 2 0 1 8
Exposed
Unexposed
P=0.9
P=0.1
Observational Studies
D A T A D A Y T E X A S 2 0 1 8
D A T A D A Y T E X A S 2 0 1 8
D A T A D A Y T E X A S 2 0 1 8
Confounding
Exposure Outcome
Confounder
D A T A D A Y T E X A S 2 0 1 8
Confounding
Exposure Outcome
Confounder
D A T A D A Y T E X A S 2 0 1 8
Meaningful confounders
1. How imbalanced is the confounder between the
exposure groups?
2. How predictive is the confounder of the outcome?
D A T A D A Y T E X A S 2 0 1 8
Confounding
Exposure Outcome
Confounder
D A T A D A Y T E X A S 2 0 1 8
Meaningful confounders
1. How imbalanced is the confounder between the
exposure groups?
2. How predictive is the confounder of the outcome?
D A T A D A Y T E X A S 2 0 1 8
Confounding
Exposure Outcome
Confounder
D A T A D A Y T E X A S 2 0 1 8
D A T A D A Y T E X A S 2 0 1 8
Rosenbaum and Rubin showed in observational studies, conditioning on
propensity scores can lead to unbiased estimates of the exposure effect
1. There are no unmeasured confounders
2. Every subject has a nonzero probability of
receiving either exposure
Propensity scores
D A T A D A Y T E X A S 2 0 1 8
Rosenbaum and Rubin showed in observational studies, conditioning on
propensity scores can lead to unbiased estimates of the exposure effect
1. There are no unmeasured confounders
2. Every subject has a nonzero probability of
receiving either exposure
Propensity scores
D A T A D A Y T E X A S 2 0 1 8
• Fit a logistic regression predicting exposure using known
covariates
Pr exposure = 1 =
1
1 + exp −X.
• Each individuals' predicted values are the propensity
scores
Propensity scores
D A T A D A Y T E X A S 2 0 1 8
Propensity scores
glm(exposure ~ smoker + cowboy_hat + sick . . .,
data = df,
family = binomial)
D A T A D A Y T E X A S 2 0 1 8
Propensity scores
glm(exposure ~ smoker + cowboy_hat + sick . . .,
data = df,
family = binomial) %>%
predict(type = “response”)
D A T A D A Y T E X A S 2 0 1 8
Propensity scores
glm(exposure ~ smoker + cowboy_hat + sick . . .,
data = df,
family = binomial) %>%
predict(type = “response”)
#> 1 2 3 4 5
#> 0.6967404 0.6370019 0.7195434 0.9079320 0.5905542
D A T A D A Y T E X A S 2 0 1 8
Propensity scores
p <- glm(exposure ~ smoker + cowboy_hat + sick . . .,
data = df,
family = binomial) %>%
predict(type = “response”)
D A T A D A Y T E X A S 2 0 1 8
0.6
0.9
0.8
0.2
0.7
0.1
0.2
0.3
0.2
0.7
0.8
0.2
0.3
0.7
0.2
0.1
0.1
0.6
0.1
0.3
0.9
Propensity scores
DA T A DA Y T E X A S 2 0 1 8
0.6
0.9
0.1
0.3
0.8
0.3
0.7
0.6
0.9
0.1
0.8
0.2
0.2
0.2
0.2
0.7
Propensity scores
DA T A DA Y T E X A S 2 0 1 8
0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9
0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9
Propensity scores
DA T A DA Y T E X A S 2 0 1 8
Propensity Scores
• Weighting
• Matching
• Stratification
• Direct Adjustment
• …
DA T A DA Y T E X A S 2 0 1 8
Propensity Scores
• Weighting
• Matching
• Stratification
• Direct Adjustment
• …
DA T A DA Y T E X A S 2 0 1 8
Weighting
!"#$ =
min{*+, 1 − *+}
0+*+ + (1 − 0+)(1 − *+)
DA T A DA Y T E X A S 2 0 1 8
Weighting
p_0 <- 1 - p
df$weight <- pmin(p, p_0) /
ifelse(exposure == 1, p, p_0)
DA T A DA Y T E X A S 2 0 1 8
0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9
0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9
Imbalance between
exposures
D A T A D A Y T E X A S 2 0 1 8
0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9
0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9
Propensity Scores
Imbalance between
exposures
DA T A DA Y T E X A S 2 0 1 8
Standardized mean
difference
! =
#$%#&'(%! − #$*+%#&'(%!
(%#&'(%!
,
+ (*+%#&'(%!
,
,
DA T A DA Y T E X A S 2 0 1 8
Standardized Mean Difference
library(survey)
svy_des <- svydesign(
ids = ~ 1,
data = df,
weights = ~ weight)
DA T A DA Y T E X A S 2 0 1 8
Standardized Mean Difference
library(tableone)
smd_table <- svyCreateTableOne(
vars = c(“smoker”, “cowboy_hat”, “sick”, . . .),
strata = “exposure”,
data = svy_des,
test = FALSE)
print(smd_table, smd = TRUE)
DA T A DA Y T E X A S 2 0 1 8
Exposed Unexposed SMD
n 1897.07 1900.24
mustache 60.71 (16.95) 60.90 (15.61) 0.012
cowboy hat 204.59 (101.45) 202.90 (106.44) 0.016
glasses 70.77 (27.00) 70.51 (27.50) 0.01
smoker 37.46 (10.93) 37.40 (11.28) 0.005
crazy hair 2.27 (2.46) 2.27 (1.82) 0.003
smile 2.49 (5.48) 2.47 (4.80) 0.003
sick 30.92 (8.23) 30.95 (7.54) 0.003
harry potter 3.04 (0.69) 3.03 (0.93) 0.01
healthy 145.4 (7.7) 152.2 (8.0) 0.013
Standardized Mean Difference
DA T A DA Y T E X A S 2 0 1 8
Standardized Mean Difference
library(tableone)
smd_table_unweighted <- CreateTableOne(
vars = c(“smoker”, “cowboy_hat”, “sick”, . . .),
strata = “exposure”,
data = df,
test = FALSE)
print(smd_table_unweighted, smd = TRUE)
DA T A DA Y T E X A S 2 0 1 8
Love Plots
DA T A DA Y T E X A S 2 0 1 8
Love Plots
Imbalance between
exposures
DA T A DA Y T E X A S 2 0 1 8
Standardized Mean Difference
library(tidyr)
plot_df <- data.frame(
var = names(ExtractSmd(smd_table)),
Unadjusted = ExtractSmd(smd_table_unweighted ),
Weighted = ExtractSmd(smd_table)) %>%
gather("Method", "SMD", -var)
DA T A DA Y T E X A S 2 0 1 8
Standardized Mean Difference
library(ggplot2)
ggplot(
data = plot_df,
mapping = aes(x = var, y = SMD, group = Method, color = Method)
) +
geom_line() +
geom_point() +
geom_hline(yintercept = 0.1, color = "black", size = 0.1) +
coord_flip()
DA T A DA Y T E X A S 2 0 1 8
Love Plots
DA T A DA Y T E X A S 2 0 1 8
Outcome model
svyglm(
outcome ~ exposure,
data = svy_des,
family = binomial)
DA T A DA Y T E X A S 2 0 1 8
Outcome model
svyglm(
outcome ~ exposure,
data = svy_des,
family = binomial) %>%
confint(parm = “exposure”)
DA T A DA Y T E X A S 2 0 1 8
Outcome model
svyglm(
outcome ~ exposure,
data = svy_des,
family = binomial) %>%
confint(parm = “exposure”) %>%
exp()
#> 2.5% 97.5%
#> exposure 1.51 2.01
DA T A DA Y T E X A S 2 0 1 8
unmeasured confounding
DA T A DA Y T E X A S 2 0 1 8
All you need
ü Exposure-outcome effect
ü Exposure-unmeasured confounder effect
ü Outcome-unmeasured confounder effect
DA T A DA Y T E X A S 2 0 1 8
All you need
ü Exposure-outcome effect generally estimated from a
model, for example an odds ratio, hazard ratio, or risk
ratio
ü Exposure-unmeasured confounder effect
ü Outcome-unmeasured confounder effect
Odds Ratio
Confidence Interval:
(1.5, 2)
DA T A DA Y T E X A S 2 0 1 8
All you need
ü Exposure-outcome effect generally estimated from a
model, for example an odds ratio, hazard ratio, or risk
ratio
ü Exposure-unmeasured confounder effect
ü Outcome-unmeasured confounder effect
DA T A DA Y T E X A S 2 0 1 8
Tipping point analyses
what will tip our confidence bound to cross 1
DA T A DA Y T E X A S 2 0 1 8
All you need
ü Exposure-outcome effect generally estimated from a
model, for example an odds ratio, hazard ratio, or risk
ratio
ü Exposure-unmeasured confounder effect can use my
observed variables to estimate this
ü Outcome-unmeasured confounder effect
DA T A DA Y T E X A S 2 0 1 8
All you need
ü Exposure-outcome effect generally estimated from a
model, for example an odds ratio, hazard ratio, or risk
ratio
ü Exposure-unmeasured confounder effect can use my
observed variables to estimate this
ü Outcome-unmeasured confounder effect
DA T A DA Y T E X A S 2 0 1 8
Exposure-unmeasured
confounder effect
•Prevalence of the unmeasured confounder
among the exposed
•Prevalence of the unmeasured confounder
among the unexposed
DA T A DA Y T E X A S 2 0 1 8
Love Plots
!" = 0.3
!' = 0.1
DA T A DA Y T E X A S 2 0 1 8
Confounding
Exposure Outcome
Confounder
!" = 0.3
!' = 0.1
?
Odds Ratio Confidence
Interval: (1.5, 2)
DA T A DA Y T E X A S 2 0 1 8
Standardized Mean Difference
library(tipr)
tip_with_binary(
p0 = 0.1,
p1 = 0.3,
lb = 1.5,
ub = 2)
DA T A DA Y T E X A S 2 0 1 8
Standardized Mean Difference
library(tipr)
tip_with_binary(
p0 = 0.1,
p1 = 0.3,
lb = 1.5,
ub = 2)
#> [1] 4.333333
DA T A DA Y T E X A S 2 0 1 8
A hypothetical unobserved binary
confounder that is prevalent in 30% of
the exposed population and 10% of the
unexposed population would need to
have an association with Y of 4.33 to
tip this analysis at the 5% level,
rendering it inconclusive.
”
“
DA T A DA Y T E X A S 2 0 1 8
✌ parts
1. Discuss some ways to strengthen a causal
argument: Hill’s criteria
2. Discuss a specific causal inference method:
propensity scores + sensitivity analyses
DA T A DA Y T E X A S 2 0 1 8
h.ps://xkcd.com/552/
DA T A DA Y T E X A S 2 0 1 8
R
• survey
• tableone
• 9dyr
• ggplot2
• 9pr
DA T A DA Y T E X A S 2 0 1 8
References
1.Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., & Wynder, E. L.
(2009). Smoking and lung cancer: recent evidence and a discussion of some quesQons. 1959.
Interna'onal journal of epidemiology (Vol. 38, pp. 1175–1191). Oxford University Press.
hp://doi.org/10.1093/ije/dyp289
2.Schlesselman, J. J. (1978). Assessing effects of confounding variables. American Journal of
Epidemiology, 108(1), 3–8.
3.Rosenbaum, P. R., & Rubin, D. B. (1983). The Central Role of the Propensity Score in ObservaQonal
Studies for Causal Effects. Biometrika, 70(1), 41. hp://doi.org/10.2307/2335942
4.Lin, D. Y., Psaty, B. M., & Kronmal, R. A. (1998). Assessing the sensiQvity of regression results to
unmeasured confounders in observaQonal studies. Biometrics, 54(3), 948–963.
5.hps://cran.r-project.org/web/packages/tableone/vignees/smd.html
DA T A DA Y T E X A S 2 0 1 8
Thank you!
@LucyStats
http://bit.ly/LucyStatsDDTX18
lucymcgowan.com

More Related Content

Similar to Making Causal Claims as a Data Scientist: Tips and Tricks Using R

Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
Ryan Herzog
 
The two sample t-test
The two sample t-testThe two sample t-test
The two sample t-test
Christina K J
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
Michael770443
 
variance ( STAT).pptx
variance ( STAT).pptxvariance ( STAT).pptx
variance ( STAT).pptx
MarkMontederamos
 
Estimation by c.i
Estimation by c.iEstimation by c.i
Estimation by c.i
Shahab Yaseen
 
Chapter 9
Chapter 9Chapter 9
Chapter 9
MaryWall14
 
Priliminary Research on Multi-Dimensional Panel Data Modeling
Priliminary Research on Multi-Dimensional Panel Data Modeling Priliminary Research on Multi-Dimensional Panel Data Modeling
Priliminary Research on Multi-Dimensional Panel Data Modeling
Parthasarathi E
 
Statistical inference: Probability and Distribution
Statistical inference: Probability and DistributionStatistical inference: Probability and Distribution
Statistical inference: Probability and Distribution
Eugene Yan Ziyou
 
Math for 800 06 statistics, probability, sets, and graphs-charts
Math for 800   06 statistics, probability, sets, and graphs-chartsMath for 800   06 statistics, probability, sets, and graphs-charts
Math for 800 06 statistics, probability, sets, and graphs-charts
Edwin Lapuerta
 
Basic Concept Of Probability
Basic Concept Of ProbabilityBasic Concept Of Probability
Basic Concept Of Probabilityguest45a926
 
Probability 4
Probability 4Probability 4
Probability 4herbison
 
Probability 4.1
Probability 4.1Probability 4.1
Probability 4.1herbison
 
Strategies Without Frontiers
Strategies Without FrontiersStrategies Without Frontiers
Strategies Without Frontiers
maradydd
 
12.pdf
12.pdf12.pdf
1Bivariate RegressionStraight Lines¾ Simple way to.docx
1Bivariate RegressionStraight Lines¾ Simple way to.docx1Bivariate RegressionStraight Lines¾ Simple way to.docx
1Bivariate RegressionStraight Lines¾ Simple way to.docx
aulasnilda
 
Presentation1group b
Presentation1group bPresentation1group b
Presentation1group bAbhishekDas15
 
lecture6.ppt
lecture6.pptlecture6.ppt
lecture6.ppt
Temporary57
 
continuous probability distributions.ppt
continuous probability distributions.pptcontinuous probability distributions.ppt
continuous probability distributions.ppt
LLOYDARENAS1
 
Two Means, Two Dependent Samples, Matched Pairs
Two Means, Two Dependent Samples, Matched PairsTwo Means, Two Dependent Samples, Matched Pairs
Two Means, Two Dependent Samples, Matched Pairs
Long Beach City College
 

Similar to Making Causal Claims as a Data Scientist: Tips and Tricks Using R (20)

Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
 
The two sample t-test
The two sample t-testThe two sample t-test
The two sample t-test
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Normal Distribution
Normal DistributionNormal Distribution
Normal Distribution
 
variance ( STAT).pptx
variance ( STAT).pptxvariance ( STAT).pptx
variance ( STAT).pptx
 
Estimation by c.i
Estimation by c.iEstimation by c.i
Estimation by c.i
 
Chapter 9
Chapter 9Chapter 9
Chapter 9
 
Priliminary Research on Multi-Dimensional Panel Data Modeling
Priliminary Research on Multi-Dimensional Panel Data Modeling Priliminary Research on Multi-Dimensional Panel Data Modeling
Priliminary Research on Multi-Dimensional Panel Data Modeling
 
Statistical inference: Probability and Distribution
Statistical inference: Probability and DistributionStatistical inference: Probability and Distribution
Statistical inference: Probability and Distribution
 
Math for 800 06 statistics, probability, sets, and graphs-charts
Math for 800   06 statistics, probability, sets, and graphs-chartsMath for 800   06 statistics, probability, sets, and graphs-charts
Math for 800 06 statistics, probability, sets, and graphs-charts
 
Basic Concept Of Probability
Basic Concept Of ProbabilityBasic Concept Of Probability
Basic Concept Of Probability
 
Probability 4
Probability 4Probability 4
Probability 4
 
Probability 4.1
Probability 4.1Probability 4.1
Probability 4.1
 
Strategies Without Frontiers
Strategies Without FrontiersStrategies Without Frontiers
Strategies Without Frontiers
 
12.pdf
12.pdf12.pdf
12.pdf
 
1Bivariate RegressionStraight Lines¾ Simple way to.docx
1Bivariate RegressionStraight Lines¾ Simple way to.docx1Bivariate RegressionStraight Lines¾ Simple way to.docx
1Bivariate RegressionStraight Lines¾ Simple way to.docx
 
Presentation1group b
Presentation1group bPresentation1group b
Presentation1group b
 
lecture6.ppt
lecture6.pptlecture6.ppt
lecture6.ppt
 
continuous probability distributions.ppt
continuous probability distributions.pptcontinuous probability distributions.ppt
continuous probability distributions.ppt
 
Two Means, Two Dependent Samples, Matched Pairs
Two Means, Two Dependent Samples, Matched PairsTwo Means, Two Dependent Samples, Matched Pairs
Two Means, Two Dependent Samples, Matched Pairs
 

Recently uploaded

办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 

Recently uploaded (20)

办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 

Making Causal Claims as a Data Scientist: Tips and Tricks Using R

  • 1. D A T A D A Y T E X A S 2 0 1 8 Making Causal Claims as a Data Scientist: Tips and Tricks Using R Lucy D’Agostino McGowan
  • 2. D A T A D A Y T E X A S 2 0 1 8 http://bit.ly/LucyStatsDDTX18
  • 3. D A T A D A Y T E X A S 2 0 1 8 h-ps://xkcd.com/552/
  • 4. D A T A D A Y T E X A S 2 0 1 8 h-ps://xkcd.com/552/
  • 5. D A T A D A Y T E X A S 2 0 1 8
  • 6. D A T A D A Y T E X A S 2 0 1 8
  • 7. D A T A D A Y T E X A S 2 0 1 8 ✌ parts 1. Discuss some ways to strengthen a causal argument: Hill’s criteria 2. Discuss a specific causal inference method: propensity scores + sensitivity analyses
  • 8. D A T A D A Y T E X A S 2 0 1 8 Hill’s Criteria h-ps://commons.wikimedia.org/wiki/File:AusBn_Bradford_Hill.jpg
  • 9. D A T A D A Y T E X A S 2 0 1 8 http://nssdeviations.com h<p://livefreeordichotomize.com/2016/12/15/hill-for-the-data-scienDst-an-xkcd-story/
  • 10. D A T A D A Y T E X A S 2 0 1 8 Strength h-ps://xkcd.com/539/
  • 11. D A T A D A Y T E X A S 2 0 1 8 h-ps://xkcd.com/242/ Consistency
  • 12. D A T A D A Y T E X A S 2 0 1 8 h-ps://xkcd.com/1217/ Specificity
  • 13. D A T A D A Y T E X A S 2 0 1 8 h-ps://xkcd.com/925/ Temporality
  • 14. D A T A D A Y T E X A S 2 0 1 8 h-ps://xkcd.com/323/ Biological gradient
  • 15. D A T A D A Y T E X A S 2 0 1 8 h-ps://xkcd.com/605/ Plausibility
  • 16. D A T A D A Y T E X A S 2 0 1 8 Coherence h-ps://xkcd.com/1170/
  • 17. D A T A D A Y T E X A S 2 0 1 8 Analogy h-ps://xkcd.com/882/
  • 18. D A T A D A Y T E X A S 2 0 1 8 Experiment h-ps://xkcd.com/1462/
  • 19. D A T A D A Y T E X A S 2 0 1 8 What if you can’t do a controlled experiment?
  • 20. D A T A D A Y T E X A S 2 0 1 8 Observational Studies Goal: Answer a research question exposure outcome
  • 21. D A T A D A Y T E X A S 2 0 1 8 “Gold Standard” – Randomized Controlled Trials Treatment Control P=0.5 P=0.5 Observational Studies
  • 22. D A T A D A Y T E X A S 2 0 1 8 “Gold Standard” – Randomized Controlled Trials A B P=0.5 P=0.5 Observational Studies
  • 23. D A T A D A Y T E X A S 2 0 1 8 Treatment Control P=0.9 P=0.1 Observational Studies
  • 24. D A T A D A Y T E X A S 2 0 1 8 Exposed Unexposed P=0.8 P=0.2 Observational Studies
  • 25. D A T A D A Y T E X A S 2 0 1 8 Exposed Unexposed P=0.8 P=0.2 Observational Studies
  • 26. D A T A D A Y T E X A S 2 0 1 8 Exposed Unexposed P=0.9 P=0.1 Observational Studies
  • 27. D A T A D A Y T E X A S 2 0 1 8
  • 28. D A T A D A Y T E X A S 2 0 1 8
  • 29. D A T A D A Y T E X A S 2 0 1 8 Confounding Exposure Outcome Confounder
  • 30. D A T A D A Y T E X A S 2 0 1 8 Confounding Exposure Outcome Confounder
  • 31. D A T A D A Y T E X A S 2 0 1 8 Meaningful confounders 1. How imbalanced is the confounder between the exposure groups? 2. How predictive is the confounder of the outcome?
  • 32. D A T A D A Y T E X A S 2 0 1 8 Confounding Exposure Outcome Confounder
  • 33. D A T A D A Y T E X A S 2 0 1 8 Meaningful confounders 1. How imbalanced is the confounder between the exposure groups? 2. How predictive is the confounder of the outcome?
  • 34. D A T A D A Y T E X A S 2 0 1 8 Confounding Exposure Outcome Confounder
  • 35. D A T A D A Y T E X A S 2 0 1 8
  • 36. D A T A D A Y T E X A S 2 0 1 8 Rosenbaum and Rubin showed in observational studies, conditioning on propensity scores can lead to unbiased estimates of the exposure effect 1. There are no unmeasured confounders 2. Every subject has a nonzero probability of receiving either exposure Propensity scores
  • 37. D A T A D A Y T E X A S 2 0 1 8 Rosenbaum and Rubin showed in observational studies, conditioning on propensity scores can lead to unbiased estimates of the exposure effect 1. There are no unmeasured confounders 2. Every subject has a nonzero probability of receiving either exposure Propensity scores
  • 38. D A T A D A Y T E X A S 2 0 1 8 • Fit a logistic regression predicting exposure using known covariates Pr exposure = 1 = 1 1 + exp −X. • Each individuals' predicted values are the propensity scores Propensity scores
  • 39. D A T A D A Y T E X A S 2 0 1 8 Propensity scores glm(exposure ~ smoker + cowboy_hat + sick . . ., data = df, family = binomial)
  • 40. D A T A D A Y T E X A S 2 0 1 8 Propensity scores glm(exposure ~ smoker + cowboy_hat + sick . . ., data = df, family = binomial) %>% predict(type = “response”)
  • 41. D A T A D A Y T E X A S 2 0 1 8 Propensity scores glm(exposure ~ smoker + cowboy_hat + sick . . ., data = df, family = binomial) %>% predict(type = “response”) #> 1 2 3 4 5 #> 0.6967404 0.6370019 0.7195434 0.9079320 0.5905542
  • 42. D A T A D A Y T E X A S 2 0 1 8 Propensity scores p <- glm(exposure ~ smoker + cowboy_hat + sick . . ., data = df, family = binomial) %>% predict(type = “response”)
  • 43. D A T A D A Y T E X A S 2 0 1 8 0.6 0.9 0.8 0.2 0.7 0.1 0.2 0.3 0.2 0.7 0.8 0.2 0.3 0.7 0.2 0.1 0.1 0.6 0.1 0.3 0.9 Propensity scores
  • 44. DA T A DA Y T E X A S 2 0 1 8 0.6 0.9 0.1 0.3 0.8 0.3 0.7 0.6 0.9 0.1 0.8 0.2 0.2 0.2 0.2 0.7 Propensity scores
  • 45. DA T A DA Y T E X A S 2 0 1 8 0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9 0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9 Propensity scores
  • 46. DA T A DA Y T E X A S 2 0 1 8 Propensity Scores • Weighting • Matching • Stratification • Direct Adjustment • …
  • 47. DA T A DA Y T E X A S 2 0 1 8 Propensity Scores • Weighting • Matching • Stratification • Direct Adjustment • …
  • 48. DA T A DA Y T E X A S 2 0 1 8 Weighting !"#$ = min{*+, 1 − *+} 0+*+ + (1 − 0+)(1 − *+)
  • 49. DA T A DA Y T E X A S 2 0 1 8 Weighting p_0 <- 1 - p df$weight <- pmin(p, p_0) / ifelse(exposure == 1, p, p_0)
  • 50. DA T A DA Y T E X A S 2 0 1 8 0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9 0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9 Imbalance between exposures
  • 51. D A T A D A Y T E X A S 2 0 1 8 0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9 0.1 0.2 0.2 0.3 0.6 0.7 0.8 0.9 Propensity Scores Imbalance between exposures
  • 52. DA T A DA Y T E X A S 2 0 1 8 Standardized mean difference ! = #$%#&'(%! − #$*+%#&'(%! (%#&'(%! , + (*+%#&'(%! , ,
  • 53. DA T A DA Y T E X A S 2 0 1 8 Standardized Mean Difference library(survey) svy_des <- svydesign( ids = ~ 1, data = df, weights = ~ weight)
  • 54. DA T A DA Y T E X A S 2 0 1 8 Standardized Mean Difference library(tableone) smd_table <- svyCreateTableOne( vars = c(“smoker”, “cowboy_hat”, “sick”, . . .), strata = “exposure”, data = svy_des, test = FALSE) print(smd_table, smd = TRUE)
  • 55. DA T A DA Y T E X A S 2 0 1 8 Exposed Unexposed SMD n 1897.07 1900.24 mustache 60.71 (16.95) 60.90 (15.61) 0.012 cowboy hat 204.59 (101.45) 202.90 (106.44) 0.016 glasses 70.77 (27.00) 70.51 (27.50) 0.01 smoker 37.46 (10.93) 37.40 (11.28) 0.005 crazy hair 2.27 (2.46) 2.27 (1.82) 0.003 smile 2.49 (5.48) 2.47 (4.80) 0.003 sick 30.92 (8.23) 30.95 (7.54) 0.003 harry potter 3.04 (0.69) 3.03 (0.93) 0.01 healthy 145.4 (7.7) 152.2 (8.0) 0.013 Standardized Mean Difference
  • 56. DA T A DA Y T E X A S 2 0 1 8 Standardized Mean Difference library(tableone) smd_table_unweighted <- CreateTableOne( vars = c(“smoker”, “cowboy_hat”, “sick”, . . .), strata = “exposure”, data = df, test = FALSE) print(smd_table_unweighted, smd = TRUE)
  • 57. DA T A DA Y T E X A S 2 0 1 8 Love Plots
  • 58. DA T A DA Y T E X A S 2 0 1 8 Love Plots Imbalance between exposures
  • 59. DA T A DA Y T E X A S 2 0 1 8 Standardized Mean Difference library(tidyr) plot_df <- data.frame( var = names(ExtractSmd(smd_table)), Unadjusted = ExtractSmd(smd_table_unweighted ), Weighted = ExtractSmd(smd_table)) %>% gather("Method", "SMD", -var)
  • 60. DA T A DA Y T E X A S 2 0 1 8 Standardized Mean Difference library(ggplot2) ggplot( data = plot_df, mapping = aes(x = var, y = SMD, group = Method, color = Method) ) + geom_line() + geom_point() + geom_hline(yintercept = 0.1, color = "black", size = 0.1) + coord_flip()
  • 61. DA T A DA Y T E X A S 2 0 1 8 Love Plots
  • 62. DA T A DA Y T E X A S 2 0 1 8 Outcome model svyglm( outcome ~ exposure, data = svy_des, family = binomial)
  • 63. DA T A DA Y T E X A S 2 0 1 8 Outcome model svyglm( outcome ~ exposure, data = svy_des, family = binomial) %>% confint(parm = “exposure”)
  • 64. DA T A DA Y T E X A S 2 0 1 8 Outcome model svyglm( outcome ~ exposure, data = svy_des, family = binomial) %>% confint(parm = “exposure”) %>% exp() #> 2.5% 97.5% #> exposure 1.51 2.01
  • 65. DA T A DA Y T E X A S 2 0 1 8 unmeasured confounding
  • 66. DA T A DA Y T E X A S 2 0 1 8 All you need ü Exposure-outcome effect ü Exposure-unmeasured confounder effect ü Outcome-unmeasured confounder effect
  • 67. DA T A DA Y T E X A S 2 0 1 8 All you need ü Exposure-outcome effect generally estimated from a model, for example an odds ratio, hazard ratio, or risk ratio ü Exposure-unmeasured confounder effect ü Outcome-unmeasured confounder effect Odds Ratio Confidence Interval: (1.5, 2)
  • 68. DA T A DA Y T E X A S 2 0 1 8 All you need ü Exposure-outcome effect generally estimated from a model, for example an odds ratio, hazard ratio, or risk ratio ü Exposure-unmeasured confounder effect ü Outcome-unmeasured confounder effect
  • 69. DA T A DA Y T E X A S 2 0 1 8 Tipping point analyses what will tip our confidence bound to cross 1
  • 70. DA T A DA Y T E X A S 2 0 1 8 All you need ü Exposure-outcome effect generally estimated from a model, for example an odds ratio, hazard ratio, or risk ratio ü Exposure-unmeasured confounder effect can use my observed variables to estimate this ü Outcome-unmeasured confounder effect
  • 71. DA T A DA Y T E X A S 2 0 1 8 All you need ü Exposure-outcome effect generally estimated from a model, for example an odds ratio, hazard ratio, or risk ratio ü Exposure-unmeasured confounder effect can use my observed variables to estimate this ü Outcome-unmeasured confounder effect
  • 72. DA T A DA Y T E X A S 2 0 1 8 Exposure-unmeasured confounder effect •Prevalence of the unmeasured confounder among the exposed •Prevalence of the unmeasured confounder among the unexposed
  • 73. DA T A DA Y T E X A S 2 0 1 8 Love Plots !" = 0.3 !' = 0.1
  • 74. DA T A DA Y T E X A S 2 0 1 8 Confounding Exposure Outcome Confounder !" = 0.3 !' = 0.1 ? Odds Ratio Confidence Interval: (1.5, 2)
  • 75. DA T A DA Y T E X A S 2 0 1 8 Standardized Mean Difference library(tipr) tip_with_binary( p0 = 0.1, p1 = 0.3, lb = 1.5, ub = 2)
  • 76. DA T A DA Y T E X A S 2 0 1 8 Standardized Mean Difference library(tipr) tip_with_binary( p0 = 0.1, p1 = 0.3, lb = 1.5, ub = 2) #> [1] 4.333333
  • 77. DA T A DA Y T E X A S 2 0 1 8 A hypothetical unobserved binary confounder that is prevalent in 30% of the exposed population and 10% of the unexposed population would need to have an association with Y of 4.33 to tip this analysis at the 5% level, rendering it inconclusive. ” “
  • 78. DA T A DA Y T E X A S 2 0 1 8 ✌ parts 1. Discuss some ways to strengthen a causal argument: Hill’s criteria 2. Discuss a specific causal inference method: propensity scores + sensitivity analyses
  • 79. DA T A DA Y T E X A S 2 0 1 8 h.ps://xkcd.com/552/
  • 80. DA T A DA Y T E X A S 2 0 1 8 R • survey • tableone • 9dyr • ggplot2 • 9pr
  • 81. DA T A DA Y T E X A S 2 0 1 8 References 1.Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., & Wynder, E. L. (2009). Smoking and lung cancer: recent evidence and a discussion of some quesQons. 1959. Interna'onal journal of epidemiology (Vol. 38, pp. 1175–1191). Oxford University Press. hp://doi.org/10.1093/ije/dyp289 2.Schlesselman, J. J. (1978). Assessing effects of confounding variables. American Journal of Epidemiology, 108(1), 3–8. 3.Rosenbaum, P. R., & Rubin, D. B. (1983). The Central Role of the Propensity Score in ObservaQonal Studies for Causal Effects. Biometrika, 70(1), 41. hp://doi.org/10.2307/2335942 4.Lin, D. Y., Psaty, B. M., & Kronmal, R. A. (1998). Assessing the sensiQvity of regression results to unmeasured confounders in observaQonal studies. Biometrics, 54(3), 948–963. 5.hps://cran.r-project.org/web/packages/tableone/vignees/smd.html
  • 82. DA T A DA Y T E X A S 2 0 1 8 Thank you! @LucyStats http://bit.ly/LucyStatsDDTX18 lucymcgowan.com