Adapting neural networks for the estimation of treatment effects

Authors: Claudia Shi, David M. Blei, and Victor Veitch
Presented by Viswanath Gangavaram

Estimation of Average Treatment Effect
ATE_hat = (1/n) Σ_i [ Q_hat(1, x_i) - Q_hat(0, x_i) ]
where Q_hat is an estimate of the conditional outcome Q(t,x) = E[ Y | x, t]

● Conditional Outcome Model (COM)
○ S-Learner (single-model approach)
■ Q(t,x) = E[ Y | x, t]
○ Advantages:
■ Because it is a single model fit on all units, it uses the data more efficiently than two-model approaches
○ Disadvantages:
■ The estimated ATE is biased towards zero when the true ATE is small
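A minimal sketch of the S-learner, assuming a linear outcome model fit by ordinary least squares in NumPy rather than a neural network. The helper names `fit_linear` and `s_learner_ate` and the simulated data are hypothetical; the estimate is the plug-in average of Q_hat(1, x_i) - Q_hat(0, x_i).

```python
import numpy as np

def fit_linear(features, y):
    """Fit ordinary least squares with an intercept; return a predictor."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda f: np.column_stack([np.ones(len(f)), f]) @ coef

def s_learner_ate(x, t, y):
    """S-learner: one model Q(t, x), with the treatment as an extra input column."""
    q = fit_linear(np.column_stack([x, t]), y)
    q1 = q(np.column_stack([x, np.ones(len(x))]))   # Q_hat(1, x_i)
    q0 = q(np.column_stack([x, np.zeros(len(x))]))  # Q_hat(0, x_i)
    return np.mean(q1 - q0)                         # plug-in ATE estimate

# Hypothetical simulated data: treatment confounded by x0, constant effect of 1.5.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))
t = (rng.random(n) < 1 / (1 + np.exp(-x[:, 0]))).astype(float)
y = 1.5 * t + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)
ate_s = s_learner_ate(x, t, y)
```

Because this linear model has no treatment-covariate interactions, it can only represent a constant effect; a flexible model such as a neural network can learn interactions, but is the setting in which the shrinkage towards zero noted above arises.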
Various Estimation Techniques

● Grouped Conditional Outcome Model (GCOM)
○ T-Learner (two-model approach)
■ T(X) = E[Y | T=1, X ]
■ C(X) = E[Y | T=0, X ]
○ Advantages:
■ Can model small ATEs
○ Disadvantages:
■ Less data-efficient: each model is fit on only one treatment group
■ Not good at handling local sparsity
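The T-learner can be sketched the same way, again assuming linear models fit by least squares; the helpers `fit_linear` and `t_learner_ate` are hypothetical, and the simulated data use an effect that varies with the first covariate.

```python
import numpy as np

def fit_linear(features, y):
    """Fit ordinary least squares with an intercept; return a predictor."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda f: np.column_stack([np.ones(len(f)), f]) @ coef

def t_learner_ate(x, t, y):
    """T-learner: separate outcome models for the treated and control groups."""
    treated, control = t == 1, t == 0
    model_t = fit_linear(x[treated], y[treated])   # T(X) = E[Y | T=1, X]
    model_c = fit_linear(x[control], y[control])   # C(X) = E[Y | T=0, X]
    return np.mean(model_t(x) - model_c(x))        # average over all units

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))
t = (rng.random(n) < 1 / (1 + np.exp(-x[:, 0]))).astype(float)
# Heterogeneous effect 1 + x0, so the true ATE is E[1 + x0] = 1.
y = t * (1 + x[:, 0]) + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)
ate_t = t_learner_ate(x, t, y)
```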

● X-Learner
○ Two-model, two-stage approach
■ T(X) = E[Y | T=1, X ]
■ C(X) = E[Y | T=0, X ]
■ DT(X) = E[Y - C(X) | X], fit on the treated units
■ DC(X) = E[T(X) - Y | X], fit on the control units
■ Causal effect = DC(X) * g(X) + DT(X) * (1-g(X))
■ where g(X) = P(T=1|X) is the propensity score
○ Advantages:
■ Like the T-learner, can model small ATEs
■ Can handle local sparsity
○ Disadvantages:
■ Like the T-learner, not very data-efficient
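A sketch of the X-learner's three stages, assuming linear models for every regression and, for simplicity, a known propensity score g(X); in practice g would be estimated. The function name `x_learner_cate` and the data are hypothetical.

```python
import numpy as np

def fit_linear(features, y):
    """Fit ordinary least squares with an intercept; return a predictor."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda f: np.column_stack([np.ones(len(f)), f]) @ coef

def x_learner_cate(x, t, y, g):
    treated, control = t == 1, t == 0
    # Stage 1: the T-learner's outcome models.
    model_t = fit_linear(x[treated], y[treated])                     # T(X)
    model_c = fit_linear(x[control], y[control])                     # C(X)
    # Stage 2: imputed individual effects, regressed on X within each group.
    dt = fit_linear(x[treated], y[treated] - model_c(x[treated]))    # DT(X)
    dc = fit_linear(x[control], model_t(x[control]) - y[control])    # DC(X)
    # Stage 3: propensity-weighted combination.
    return dc(x) * g + dt(x) * (1 - g)

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))
g = 1 / (1 + np.exp(-x[:, 0]))          # true propensity, assumed known here
t = (rng.random(n) < g).astype(float)
y = t * (1 + x[:, 0]) + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)
cate = x_learner_cate(x, t, y, g)       # per-unit effect estimates
ate_x = cate.mean()                     # true ATE is 1
```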

● TARNET
○ Jointly learns T(X) and C(X) as two heads on a shared intermediate representation
○ Somewhat more data-efficient, since the shared representation is learned from all units

● Generally, estimation proceeds in two stages. First, we fit models for the
expected outcome and the probability of treatment (propensity score) for each unit. Second,
we plug these fitted models into a downstream estimator of the causal effect.
● Neural networks are a natural choice for the first-stage models
● This paper addresses how to adapt the design and training of these first-stage
neural networks to the task of estimating causal effects.
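Stage two can be sketched as follows, assuming the stage-one predictions Q_hat(0, x_i), Q_hat(1, x_i), and g_hat(x_i) are already available as arrays `q0`, `q1`, and `g`. Two common downstream estimators are shown: the plug-in estimator and the augmented inverse-propensity-weighted (A-IPTW) estimator. The data, the deliberately crude outcome model (arm means that ignore x), and the use of the true propensity are assumptions for illustration.

```python
import numpy as np

def plug_in_ate(q0, q1):
    """Plug-in estimator: average of Q_hat(1, x_i) - Q_hat(0, x_i)."""
    return np.mean(q1 - q0)

def aipw_ate(y, t, q0, q1, g):
    """A-IPTW: plug-in term plus an inverse-propensity-weighted residual correction."""
    q_t = np.where(t == 1, q1, q0)       # prediction for the observed arm
    h = t / g - (1 - t) / (1 - g)
    return np.mean(q1 - q0 + h * (y - q_t))

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)
g = 1 / (1 + np.exp(-x))                 # true propensity, assumed known here
t = (rng.random(n) < g).astype(float)
y = 2.0 * t + x + rng.normal(size=n)     # true ATE = 2
# Deliberately crude stage-one outcome model: arm means that ignore x.
q1 = np.full(n, y[t == 1].mean())
q0 = np.full(n, y[t == 0].mean())
ate_plug = plug_in_ate(q0, q1)
ate_aipw = aipw_ate(y, t, q0, q1, g)
```

With this crude outcome model the plug-in estimate reduces to the naive difference in means and stays confounded by x, while A-IPTW uses the propensity score to correct it.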

● Proposes the Dragonnet architecture, which takes advantage of the propensity
score theorem. Dragonnet learns Q_hat(1,.), Q_hat(0,.), and g(x) jointly, with a
representation shared by the outcome models and the propensity score model.
● Proposes a regularization procedure, targeted regularization, that biases the
fit towards models with non-parametrically optimal asymptotic properties

Intuition behind the Dragonnet architecture
● Propensity score theorem (PST) in words: the propensity score is a balancing score
○ It suffices to adjust for only the information in X that is relevant for predicting the treatment.
○ The parts of X that are relevant for predicting the outcome but not the treatment are
irrelevant for estimating the causal effect.
○ The authors posit that conditioning on these irrelevant parts hurts finite-sample
performance
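A small simulation, under a hypothetical linear data-generating process with a known propensity score, illustrates the point: `x2` predicts the outcome but not the treatment, and stratifying on the propensity score alone (20 quantile strata, an arbitrary choice) recovers the true ATE of 2, while the unadjusted difference in means does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
x1 = rng.normal(size=n)                 # confounder: affects treatment and outcome
x2 = rng.normal(size=n)                 # affects the outcome only
g = 1 / (1 + np.exp(-x1))               # propensity score, assumed known here
t = rng.random(n) < g                   # boolean treatment indicator
y = 2.0 * t + x1 + 3.0 * x2 + rng.normal(size=n)   # true ATE = 2

naive = y[t].mean() - y[~t].mean()      # unadjusted: confounded by x1

# Adjust for g only (x2 is never used): stratify on quantiles of g, then
# average the within-stratum differences in means, weighted by stratum size.
n_strata = 20
edges = np.quantile(g, np.linspace(0, 1, n_strata + 1))
strata = np.clip(np.searchsorted(edges, g, side="right") - 1, 0, n_strata - 1)
diffs, sizes = [], []
for s in range(n_strata):
    in_s = strata == s
    diffs.append(y[in_s & t].mean() - y[in_s & ~t].mean())
    sizes.append(in_s.sum())
stratified = np.average(diffs, weights=sizes)
```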
● Two-stage model (Transfer learning)
○ Learn propensity score model
○ Remove the output layer & freeze weights
○ Learn the conditional outcome model on top of the frozen representation
○ Estimate ATE
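A minimal NumPy sketch of this two-stage procedure, assuming a one-hidden-layer tanh network for the propensity score trained by full-batch gradient descent, and linear outcome heads (one per treatment arm) fit by least squares on the frozen hidden layer. The architecture, hyperparameters, function names, and data are hypothetical simplifications.

```python
import numpy as np

def train_propensity_net(x, t, hidden=16, lr=0.5, steps=2000, seed=0):
    """Stage 1: one-hidden-layer tanh network predicting T from X,
    trained by full-batch gradient descent on mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(scale=1 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=1 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    for _ in range(steps):
        z = np.tanh(x @ w1 + b1)                     # hidden representation
        g = 1 / (1 + np.exp(-(z @ w2 + b2)))         # predicted propensity
        dlogit = (g - t) / n                         # gradient w.r.t. the logits
        dpre = np.outer(dlogit, w2) * (1 - z ** 2)   # backprop through tanh
        w2 = w2 - lr * (z.T @ dlogit)
        b2 = b2 - lr * dlogit.sum()
        w1 = w1 - lr * (x.T @ dpre)
        b1 = b1 - lr * dpre.sum(axis=0)
    return w1, b1                                    # output layer is discarded

def two_stage_ate(x, t, y):
    w1, b1 = train_propensity_net(x, t)
    z = np.tanh(x @ w1 + b1)                         # frozen representation
    design = np.column_stack([np.ones(len(z)), z])
    # Stage 2: linear outcome heads on the frozen features, one per arm.
    coef1, *_ = np.linalg.lstsq(design[t == 1], y[t == 1], rcond=None)
    coef0, *_ = np.linalg.lstsq(design[t == 0], y[t == 0], rcond=None)
    return np.mean(design @ coef1 - design @ coef0)  # plug-in ATE

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=(n, 2))
t = (rng.random(n) < 1 / (1 + np.exp(-x[:, 0]))).astype(float)
y = 2.0 * t + x[:, 0] + rng.normal(size=n)           # true ATE = 2
ate_two_stage = two_stage_ate(x, t, y)
```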

Dragonnet
● An end-to-end procedure for predicting the propensity
score and conditional outcomes from covariates and
treatment
● Z(X) is a shared representation layer
● Two-layer neural networks for the outcome models, but
only a simple linear map (followed by a sigmoid) for the
propensity score. This simple map forces the representation
layer to couple tightly to the estimated propensity scores
● Trades off prediction quality to achieve a good
representation of the propensity score
● This trade-off improves ATE estimation even when we
use a downstream estimator that does not use the
estimated propensity scores.
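A NumPy sketch of a Dragonnet-style forward pass and training objective, the mean of (Q(t_i, x_i) - y_i)^2 + alpha * CrossEntropy(g(x_i), t_i). ReLU activations, the layer sizes, and `alpha = 1.0` are hypothetical choices (the paper's exact configuration may differ), and training, which would need automatic differentiation, is omitted. Dropping the `g` head and its cross-entropy term leaves a TARNET-style model.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(d_in, d_out):
    """Randomly initialised weights and zero biases for one fully connected layer."""
    return rng.normal(scale=1 / np.sqrt(d_in), size=(d_in, d_out)), np.zeros(d_out)

def init_dragonnet(d, rep=32, head=16):
    return {
        "shared": [dense(d, rep), dense(rep, rep)],   # representation Z(X)
        "q0": [dense(rep, head), dense(head, 1)],     # two-layer head for Q(0, x)
        "q1": [dense(rep, head), dense(head, 1)],     # two-layer head for Q(1, x)
        "g": dense(rep, 1),                           # linear map for g(x)
    }

def relu(a):
    return np.maximum(a, 0.0)

def forward(params, x):
    z = x
    for w, b in params["shared"]:
        z = relu(z @ w + b)
    outputs = []
    for name in ("q0", "q1"):
        (w1, b1), (w2, b2) = params[name]
        outputs.append((relu(z @ w1 + b1) @ w2 + b2)[:, 0])
    w, b = params["g"]
    g = 1 / (1 + np.exp(-(z @ w + b)[:, 0]))          # sigmoid of a linear map of Z(X)
    return outputs[0], outputs[1], g

def dragonnet_loss(params, x, t, y, alpha=1.0):
    """Squared error of the observed-arm head plus alpha times cross-entropy for g."""
    q0, q1, g = forward(params, x)
    q_t = np.where(t == 1, q1, q0)
    g = np.clip(g, 1e-7, 1 - 1e-7)
    bce = -(t * np.log(g) + (1 - t) * np.log(1 - g))
    return np.mean((y - q_t) ** 2 + alpha * bce)

params = init_dragonnet(d=5)
x = rng.normal(size=(8, 5))
t = (rng.random(8) < 0.5).astype(float)
y = rng.normal(size=8)
q0, q1, g = forward(params, x)
loss = dragonnet_loss(params, x, t, y)
```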

Targeted Regularization: Non-Parametric Estimation theory at work

Targeted Minimum Loss Estimation (TMLE)
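In the paper, targeted regularization adds to the Dragonnet objective a term beta * (1/n) Σ_i gamma(y_i, t_i, x_i), where Q_tilde(t, x) = Q(t, x) + eps * [t/g(x) - (1-t)/(1-g(x))], gamma = (y - Q_tilde(t, x))^2, and eps is a learned scalar; the ATE estimate is the average of Q_tilde(1, x_i) - Q_tilde(0, x_i). The NumPy sketch below holds Q and g fixed and solves for eps in closed form (the linear-fluctuation TMLE step), which is an assumption made for illustration; the paper instead learns eps jointly with the network. Function names and data are hypothetical.

```python
import numpy as np

def clever_covariate(t, g):
    # H(t, x) = t / g(x) - (1 - t) / (1 - g(x))
    return t / g - (1 - t) / (1 - g)

def targeted_regularizer(y, t, q0, q1, g, eps):
    """Mean of gamma = (y - Q_tilde(t, x))^2 with Q_tilde = Q + eps * H."""
    q_t = np.where(t == 1, q1, q0)
    return np.mean((y - q_t - eps * clever_covariate(t, g)) ** 2)

def targeted_estimate(y, t, q0, q1, g):
    """Minimise the regularizer over eps (Q, g fixed), then average Q_tilde(1,x) - Q_tilde(0,x)."""
    h = clever_covariate(t, g)
    q_t = np.where(t == 1, q1, q0)
    eps = np.sum(h * (y - q_t)) / np.sum(h ** 2)     # least-squares minimiser in eps
    q1_tilde = q1 + eps / g                          # H(1, x) = 1 / g(x)
    q0_tilde = q0 - eps / (1 - g)                    # H(0, x) = -1 / (1 - g(x))
    return np.mean(q1_tilde - q0_tilde), eps

rng = np.random.default_rng(4)
n = 20000
x = rng.normal(size=n)
g = 1 / (1 + np.exp(-x))                 # true propensity, assumed known here
t = (rng.random(n) < g).astype(float)
y = 2.0 * t + x + rng.normal(size=n)     # true ATE = 2
q1 = np.full(n, y[t == 1].mean())        # crude outcome model that ignores x
q0 = np.full(n, y[t == 0].mean())
psi_treg, eps = targeted_estimate(y, t, q0, q1, g)
```

Because eps is chosen so that the mean of H * (y - Q_tilde) is zero, the targeted estimate coincides with the A-IPTW estimate computed from Q_tilde, which is how targeting enforces the non-parametric estimating equation even when the initial Q is poor.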

Results & Comments
Let’s go to the main paper