Repurposing Classification & Regression Trees
for Causal Research with High-Dimensional Data
Galit Shmueli 徐茉莉
Institute of Service Science
WOMBAT 2019
Monash University
We tackle 2 key issues in causal research:
Self Selection
Identifying Confounders
A Tree-Based Approach
for Addressing Self-Selection
in Impact Studies with Big Data
Yahav, Shmueli & Mani (2016), A Tree-Based Approach for Addressing Self-Selection
in Impact Studies with Big Data, MIS Quarterly, vol 40 no 4, pp. 819-848.
With Inbal Yahav (Tel Aviv U) & Deepa Mani (Indian School of Business)
The Challenge in Impact Studies
• Individuals/firms self-select intervention
group/duration (quasi-experiment)
• Even in randomized experiments, some variables
might remain unbalanced in sample
How to identify and adjust for self-selection?
Randomized Experiment: manipulation only
Quasi-Experiment (self-selection or administrator selection): manipulation + self-selection
Common Approaches
for Addressing Self-Selection
Two steps:
1. Selection model: T = f(X)
2. Performance analysis on matched samples
Y = performance measure(s)
T = treatment
X = pre-intervention variables
Self-selection:
P(T|X) ≠ P(T)
• 2SLS modeling (Heckman correction) -- econometrics
• Propensity Score Approach (PS) -- statistics
Propensity Score Approach
Step 1: Estimate selection model logit(T) = f(X) to compute propensity scores P(T|X)
Step 2: Use scores to create matched samples
  PSM = use a matching algorithm
  PSS = divide scores into bins
Step 3: Estimate effect (compare matched groups), e.g., t-test or Y = b0 + b1·T + b2·X + b3·PS + e
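A minimal sketch of this pipeline in R using the MatchIt package (one common implementation; the data frame df and variables treat, y, x1–x3 are hypothetical placeholders):

  library(MatchIt)

  # Step 1: logistic selection model T = f(X); matchit() estimates
  # the propensity scores P(T|X) internally
  m <- matchit(treat ~ x1 + x2 + x3, data = df, method = "nearest")

  # Step 2: extract the matched samples; match.data() appends the
  # propensity score as a 'distance' column
  matched <- match.data(m)

  # Step 3: estimate the effect on the matched data
  fit <- lm(y ~ treat + x1 + x2 + x3 + distance, data = matched)
  summary(fit)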
Challenges of PS in Big Data
1. Matching leads to severe data loss
2. PS methods suffer from “data dredging”
3. No variable selection (what drives selection?)
4. Assumes constant intervention effect
5. Sequential process is computationally costly
6. Logistic model requires specifying exact form
of selection model
Our Proposed Solution:
Trees
Standard route: (Y, T, X) → propensity scores P(T|X) → E(Y|T)
Tree route: (Y, T, X) → E(Y|T), even E(Y|T,X)
“Kill the Intermediary”
Proposed Method: Tree
Output: T (treat/control)
Inputs: X’s (income, education, family…)
Records in each terminal node
share same profile (X) and same
propensity score P(T=1|X)
Tree-Based Approach
Four steps:
1. Run selection model: fit tree T = f(X)
2. Visualize tree; see unbalanced X’s
3. Treat each terminal node as sub-sample;
conduct terminal-node-level performance
analysis
4. Present terminal-node-analyses visually
5. [optional]: combine analyses from nodes with
homogeneous effects
Like PS, assumes observable self-selection
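A minimal sketch of steps 1–3 in R with the rpart package (df, treat, y, and x1–x3 are hypothetical; any CART implementation would do):

  library(rpart)

  # Step 1: selection model as a classification tree T = f(X)
  sel_tree <- rpart(factor(treat) ~ x1 + x2 + x3, data = df, method = "class")

  # Step 2: visualize; the splitting variables flag unbalanced X's
  plot(sel_tree); text(sel_tree)

  # Step 3: records in the same terminal node share a profile and a
  # propensity score; analyze performance node by node
  df$node <- sel_tree$where            # leaf id per record (assumes no rows dropped)
  node_effects <- lapply(split(df, df$node), function(d) {
    if (length(unique(d$treat)) < 2) return(NULL)  # node contains one group only
    t.test(y ~ treat, data = d)                    # node-level effect on outcome y
  })
  node_effects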
Three Applications (MISQ 2016)
1. Impact of labor training on earnings
(Famous) randomized experiment by US gov
2. Impact of new online passport service on bribery, efficiency, …
Quasi-experiment by India gov
3. Impact of outsourcing contract pricing &
duration on financial performance
Study 1: Impact of training on financial gains
In the mid-1970s, a US government program randomly assigned eligible candidates to a labor training program
• Goal: increase future earnings
• LaLonde (1986) showed:
  Groups statistically equal in terms of demographics & pre-training earnings
  → ATE = $1,794 (p < 0.004)
Tree on LaLonde’s RCT data
If groups are completely
balanced, we expect…
Y = Earnings in 1978
T = Received NSW training (T = 1) or not (T = 0)
X = Demographic information and prior earnings
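As a hedged illustration, a version of LaLonde’s data ships with the MatchIt R package (note: that version pairs the NSW treated group with an observational control group, so it is closer to the later slides than to the RCT sample; nodegree indicates lacking a high-school degree):

  library(rpart)
  data("lalonde", package = "MatchIt")   # treat, earnings (re74/re75/re78), demographics

  sel_tree <- rpart(factor(treat) ~ age + educ + married + nodegree + re74 + re75,
                    data = lalonde, method = "class")
  plot(sel_tree); text(sel_tree)         # inspect which X's drive selection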
Tree reveals…
(tree splits on High school degree: no / yes)

                      LaLonde’s naïve          Tree: HS dropout    Tree: HS degree
                      approach (experiment)    (n=348)             (n=97)
Not trained (n=260)   $4,554                   $4,495              $4,855
Trained (n=185)       $6,349                   $5,649              $8,047
Training effect       $1,794 (p=0.004)         $1,154 (p=0.063)    $3,192 (p=0.015)
                                               Tree overall: $1,598 (p=0.017)

1. Unbalanced variable (HS degree)
2. Heterogeneous effect
Labor Training effect:
Observational control group
• LaLonde also compared with observational
control groups (PSID, CPS)
– experimental training group vs. obs control
– showed training effect not estimated correctly with
structural equations
• Dehejia & Wahba (1999,2002) re-analyzed CPS
control group (n=15,991), using PSM
– Effects in range [$1,122, $1,681], depending on settings
– “Best” setting effect: $1,360
– Uses only 119 control group members (out of 15,991)
Tree for obs control group reveals…
• Unemployed prior to training in 1974 (u74 = 0) → negative effect
• An outlier
• An eligibility issue: some profiles are rare in the trained group but common in the control group

1. Unbalanced variables
2. Heterogeneous effect in u74
3. Outlier
4. Eligibility issue
Study 2:
Impact of eGov Initiative
(India)
Survey commissioned by Govt of India in 2006
• >9500 individuals who used passport services
• Representative sample of 13 Passport Offices
• “Quasi-experimental, non-equivalent groups design”
• Equal number of offline and online users, matched by
geography and demographics
Naïve Approach
Assess impact by
comparing
online/offline
performance stats
[Charts: % bribe RPO, % use agent, % prefer online, % bribe police for online vs. offline users, naïve (overall) and split by awareness of electronic services provided by Government of India]
Simpson’s Paradox
1. Demographics properly balanced
2. Unbalanced variable (Awareness)
3. Heterogeneous effects on various y’s
+ even Simpson’s paradox
PSM: Awareness of electronic services provided by Government of India
Would we detect this heterogeneous effect with PSM?
Scaling Up to Big Data
• We inflated eGov dataset by bootstrap
• Up to 9,000,000 records and 360 variables
• 10 runs for each configuration; runtime for the tree: 20 sec
Big Data Simulation

Sample sizes (n): 10K, 100K, 1M
# Pre-intervention variables (p): 4, 50 (+ interactions)
Pre-intervention variable types: binary, Likert-scale, continuous
Outcome variable types: binary, continuous
Selection models:
  #1: logit(P(T=1)) = b0 + b1·x1 + … + bp·xp
  #2: logit(P(T=1)) = b0 + b1·x1 + … + bp·xp + interactions
Intervention effects:
  Binary intervention, T = {0, 1}:
    1. Homogeneous: control E(Y | T=0) = 0.5; intervention E(Y | T=1) = 0.7
    2. Heterogeneous: control E(Y | T=0) = 0.5; intervention E(Y | T=1, X1=0) = 0.7, E(Y | T=1, X1=1) = 0.3
  Continuous intervention, T ~ N:
    1. Homogeneous: control E(Y | T=0) = 0; intervention E(Y | T=1) = 1
    2. Heterogeneous: control E(Y | T=0) = 0; intervention E(Y | T=1, X1=0) = 1, E(Y | T=1, X1=1) = −1
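A sketch of one simulation cell (selection model #1, binary intervention, heterogeneous effect); the coefficients below are illustrative, not the paper’s values:

  set.seed(1)
  n <- 1e5; p <- 4
  X  <- matrix(rbinom(n * p, 1, 0.5), n, p)      # binary pre-intervention variables
  b  <- c(-0.5, rep(0.5, p))                     # illustrative coefficients
  pT <- plogis(cbind(1, X) %*% b)                # selection model #1 (inverse logit)
  Tr <- rbinom(n, 1, pT)                         # self-selected treatment

  # heterogeneous effect: E(Y|T=1) is 0.7 when X1 = 0 but 0.3 when X1 = 1
  pY <- ifelse(Tr == 0, 0.5, ifelse(X[, 1] == 0, 0.7, 0.3))
  Y  <- rbinom(n, 1, pY)                         # binary outcome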
Example: Heterogeneous Effect
Big Data Scalability
Theoretical complexity:
• O(mn/p) for binary X
• O((m/p)·n·log(n)) for continuous X
Runtime as a function of sample size and dimension
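A rough way to reproduce the runtime curve (illustrative, with rpart as the tree learner and simulated data as above):

  library(rpart)
  for (n in c(1e4, 1e5, 1e6)) {
    X  <- matrix(rbinom(n * 50, 1, 0.5), n, 50)          # 50 binary covariates
    Tr <- rbinom(n, 1, plogis(X %*% rep(0.2, 50) - 5))   # simulated selection
    d  <- data.frame(Tr = factor(Tr), X)
    print(system.time(rpart(Tr ~ ., data = d, method = "class")))
  }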
Scaling Trees Even Further
• “Big Data” in research vs. industry
• Industrial scaling
– Sequential trees: efficient data structure, access
(SPRINT, SLIQ, RainForest)
– Parallel computing (parallel SPRINT, ScalParC,
SPARK, PLANET) “as long as split metric can be
computed on subsets of the training data and
later aggregated, PLANET can be easily extended”
Tree Approach Benefits
1. Data-driven selection model
2. Scales up to Big Data
3. Fewer user choices (less data dredging)
4. Nuanced insights
• Detect unbalanced variables
• Detect heterogeneous effects and deviations from anticipated outcomes
5. Simple to communicate
6. Automatic variable selection
7. Missing values do not force removal of a record
8. Binary, multiple, continuous interventions
9. Post-analysis of RCTs, quasi-experiments & observational studies
Tree Approach Limits
1. Assumes selection on observables
2. Need sufficient data
3. Continuous variables can lead to a large tree
4. Instability
[possible solution: use variable importance scores (forest)]
Detecting
Simpson’s Paradox
in Big Data
Using Trees
Shmueli & Yahav (2017), The Forest or the Trees? Tackling Simpson’s Paradox with Classification Trees, Production and Operations Management, forthcoming
With Inbal Yahav, Tel Aviv University
Simpson’s Paradox
The direction of a cause’s effect appears reversed when examining the aggregate vs. the disaggregated sample (or population)

“Simpson’s Paradox is the reversal of an association between two variables after a third variable is taken into account” (Schield, 1999)

“The phenomenon whereby an event B increases the probability of A in a given population p and, at the same time, decreases the probability of A in every subpopulation of p” (Pearl, 2009)
Death Sentence and Race
(Agresti, 1984)
Does defendant's race (X) affect
chance of death sentence (Y)?
Causal explanation:
Black murderers tend to kill blacks;
hence lower overall death sentence rates
Causal effect seems
to reverse when
disaggregating by
victim race (Z)
Goal: Does a dataset exhibit SP?
A = cause, C = confounder, E = effect
Cornfield et al.’s criterion: the confounder–effect association must be at least as strong as the cause–effect association:
P(E|C) − P(E|C′) ≥ P(E|A) − P(E|A′)
“If Cornfield’s minimum effect size is not reached, [you] can assume no causality” (Schield, 1999)
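A minimal sketch of the difference-of-risks form of the criterion shown above (A, C, E are 0/1 vectors; this helper is hypothetical, not from the paper):

  cornfield_holds <- function(A, C, E) {
    risk_diff <- function(g) mean(E[g == 1]) - mean(E[g == 0])
    abs(risk_diff(C)) >= abs(risk_diff(A))  # confounder association must dominate
  }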
Translate Cornfield’s Criterion
into a Tree
Y = outcome of interest
X = causal variable
Z = confounding variable(s)
Tree predictors:
#1 If cause → effect, then the cause should appear in the tree
#2 If Z is a confounder, then Z should appear in the tree
5 potential tree structures
- single causal variable (X)
- single confounding variable (Z)
Which might exhibit Simpson’s Paradox?
P(E|C) − P(E|C′) ≥ P(E|A) − P(E|A′)
Simpson’s Paradox on a Tree
#1 If cause → effect, then the cause should appear in the tree
#2 If Z is a confounder, then Z should appear in the tree
#3 Z should appear before the cause (Cornfield criterion)
Death Sentence and Race:
Tree Approach #1: full tree
[Full tree; terminal nodes show P(death)]
Accounting for Sampling Error
Logistic/linear regression: is the interaction X*Z significant?
  No → no paradox
  Yes → ?
Trees: tree structure + significance of interaction = conditional-inference tree
Tree splits based on statistical tests (χ², F, permutation tests)
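The regression route, as a sketch in R (binary outcome y, cause x, and confounder z assumed in data frame df):

  fit <- glm(y ~ x * z, family = binomial, data = df)
  summary(fit)   # a significant x:z term flags effect modification -> the "?" case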
Tree Approach #2:
Conditional Inference tree
(Hothorn et al., JCGS 2006)
Variable selection based on a statistical test (χ²)
• Recursive partitioning with early stopping
• Separate steps for variable selection and split search
• R packages party, partykit (function ctree)
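A sketch for the death-sentence example with partykit (the cell counts from Agresti (1984) are omitted here; enter them as case weights in a hypothetical data frame df with one row per cell):

  library(partykit)
  # df: columns sentence (factor), defendant, victim, and cell count n
  ct <- ctree(sentence ~ defendant + victim, data = df, weights = df$n,
              control = ctree_control(alpha = 0.05))
  plot(ct)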
Cornfield’s criterion + sampling error:
Conditional Inference Trees
Proof for trees that use a concave impurity measure (Gini, entropy) as well as χ²:
CART, CHAID, Conditional-Inference Trees
Accounting for Sampling Error:
Conditional-Inference Tree
[Conditional-inference tree; terminal nodes show P(death)]
Seatbelts and Injuries (Agresti 2012)
Does use of seat belts (X) reduce the chance of injury (Y)?
Z = passenger gender and accident location
n = 68,694 passengers involved in accidents in Maine
Potential paradox (by location)
How about logistic regression?
[Chart: % injuries]
Simpson’s Paradox in Big Data
Large n, high-dimensional Z
Multiple Potential Confounders (Z)
The Challenge
Statistical significance of Simpson’s paradox
≠
Significance threshold of tree splits in a CI tree
[Figures: CI tree vs. full tree]
Solution: X-Terminal Tree
Paradox Detection in Big Data
(Tree Approach #3):
X-Terminal Trees
X-Terminal Tree:
Grow tree only
until X-splits
Tree paths with terminal X nodes
can indicate…
• Full paradox, statistically significant
• Partial paradox, statistically significant
• Statistically insignificant paradox
• No paradox
Pivot table equivalence:
Filter by Z variables above terminal X node
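One way to approximate an X-terminal tree with off-the-shelf tools (a sketch, not the paper’s exact algorithm): partition on the confounders Z only, then test the X–Y association inside each terminal node and compare its direction with the overall one. A data frame df with binary y and x, and hypothetical confounder names in z_vars, are assumed:

  library(partykit)
  z_vars  <- c("z1", "z2")                               # hypothetical confounders
  z_tree  <- ctree(y ~ ., data = df[, c("y", z_vars)])   # Z-only partition
  df$node <- predict(z_tree, type = "node")              # terminal node per record
  overall <- mean(df$y[df$x == 1]) - mean(df$y[df$x == 0])
  by(df, df$node, function(d) {
    eff <- mean(d$y[d$x == 1]) - mean(d$y[d$x == 0])     # X effect within node
    p   <- tryCatch(prop.test(table(d$x, d$y))$p.value, error = function(e) NA)
    c(effect = eff, reversed = sign(eff) != sign(overall), p = p)
  })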
Impact of eGov Initiative
(India)
Survey commissioned by Govt of India in 2006
• >9500 individuals who used passport services
• Representative sample of 13 Passport Offices
• “Quasi-experimental, non-equivalent groups design”
• Equal number of offline and online users, matched by
geography and demographics
Y = police bribe (0/1)
X = online/offline
Z = {demographics; survey Qs}
[X-terminal tree: split p = 0.32; paths show paradox (p = 0.003), paradox (p = 0.16, not significant), and no paradox]
Kidney Allocation in USA
(104,000 patients, 19 confounders)
Is the kidney allocation system racist?
Type 4 tree, but no significant Simpson’s paradox
detected!
Y = waiting time (days)
X = patient race
Z = {patient demog, health, bio}
Summary & Challenges
Full tree: eliminate non-type-4 trees
Conditional-inference trees: for a single Z
X-terminal trees: for multiple Z’s

Challenges:
• Greediness of tree: a weak paradox, or one in a small subset of the data, can go undetected
• Highly correlated Z’s might lead to a “wrong” Z choice

Strengths:
• More efficient than stepwise regression
• Tree structure more informative than interaction terms
• Extends to continuous Y, >2 subpopulations
We tackle 2 key issues in causal research:
Self Selection
Identifying Confounders
Analytics · Humanity · Responsibility
Galit Shmueli 徐茉莉
Institute of Service Science

Editor’s Notes
• #8: The Heckman correction builds the selection model based on economic theory
• #37: Proof for entropy
• #53: Blue: no paradox (same as overall direction); splitting stops when it reaches online/offline (X). Orange: although the p-value is very large, we still have a split.