This year marks the 70th anniversary of the Medical Research Council randomised clinical trial (RCT) of streptomycin in tuberculosis led by Bradford Hill, which is widely regarded as a landmark in clinical research. Despite its widespread use in drug regulation and in clinical research more widely, and its high standing with the evidence-based medicine movement, the RCT continues to attract criticism. I show that many of these criticisms are traceable to a failure to understand two key concepts in statistics: probabilistic inference and design efficiency. To these methodological misunderstandings can be added the practical one of failing to appreciate that entry into clinical trials is not simultaneous but sequential.
I conclude that although randomisation should not be used as an excuse for ignoring prognostic variables, it is valuable and that many standard criticisms of RCTs are invalid.
The Rothamsted school meets Lord's paradox
Stephen Senn
Lord's ‘paradox’ is a notoriously difficult puzzle that is guaranteed to provoke discussion, dissent and disagreement. Two statisticians analyse some observational data and come to radically different conclusions, each of which has acquired defenders over the years since Lord first proposed his puzzle in 1967. It features in the recent Book of Why by Pearl and Mackenzie, who use it to demonstrate the power of Pearl’s causal calculus, obtaining a solution they claim is unambiguously right. They also claim that statisticians have failed to get to grips with causal questions for well over a century, in fact ever since Karl Pearson developed Galton’s idea of correlation and warned the scientific world that correlation is not causation.
However, only two years before Lord published his paradox, John Nelder outlined a powerful causal calculus for analysing designed experiments based on a careful distinction between block and treatment structure. This represented an important advance in formalising the approach to analysing complex experiments that started with Fisher 100 years ago, when he proposed splitting variability using the square of the standard deviation, which he called the variance, continued with Yates and has been developed since the 1960s by Rosemary Bailey, amongst others. This tradition might be referred to as the Rothamsted School. It is fully implemented in Genstat® but, as far as I am aware, not in any other package.
With the help of Genstat®, I demonstrate how the Rothamsted School would approach Lord’s paradox and come to a solution that is not the same as the one reached by Pearl and Mackenzie, although given certain strong but untestable assumptions it would reduce to it. I conclude that the statistical tradition may have more to offer in this respect than has been supposed.
President's invited lecture, ISCB Vigo 2017
Discusses various issues to do with how randomised clinical trials should be analysed. See also https://errorstatistics.com/2017/07/01/s-senn-fishing-for-fakes-with-fisher-guest-post/
The Seven Habits of Highly Effective Statisticians
Stephen Senn
If you know why the title of this talk is extremely stupid, then you clearly know something about control, data and reasoning: in short, you have most of what it takes to be a statistician. If you have studied statistics then you will also know that a large amount of anything, and this includes successful careers, is luck.
In this talk I shall try to share some of my experiences of being a statistician in the hope that it will help you make the most of whatever luck life throws at you. In so doing, I shall try my best to overcome the distorting influence of that easiest of sciences, hindsight. Without giving too much away, I shall be recommending that you read, listen, think, calculate, understand, communicate, and do. I shall give you some examples of what I think works and what I think doesn't.
In all of this you should never forget the power of negativity and also the joy of being able to wake up every day and say to yourself ‘I love the smell of data in the morning’.
Talk given at ISCB 2016 Birmingham
For indications and treatments where their use is possible, n-of-1 trials represent a promising means of investigating potential treatments for rare diseases. Each patient permits repeated comparison of the treatments being investigated and this both increases the number of observations and reduces their variability compared to conventional parallel group trials.
However, whether the framework used for analysis is randomisation-based or model-based produces puzzling differences in inferences. This can easily be shown by starting on the one hand with the randomisation philosophy associated with the Rothamsted school of inference and building up the analysis through the block + treatment structure approach associated with John Nelder’s theory of general balance (as implemented in GenStat®), or starting on the other hand with a plausible variance component approach through a mixed model. However, it can be shown that these differences are related not so much to the modelling approach per se as to the questions one attempts to answer: ranging from testing whether there was a difference between treatments in the patients studied, to predicting the true difference for a future patient, via making inferences about the effect in the average patient.
This in turn yields interesting insight into the long-run debate over the use of fixed or random effect meta-analysis.
Some practical issues of analysis will also be covered in R and SAS®, in which languages some functions and macros to facilitate analysis have been written. It is concluded that n-of-1 trials hold great promise in investigating chronic rare diseases but that careful consideration of matters of purpose, design and analysis is necessary to make best use of them.
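By way of illustration, a minimal sketch in R of the model-based side of this comparison; the lme4 package is assumed, and the layout (8 patients, 3 cycles, 2 treatments) and all numbers are invented:

# Minimal illustrative sketch of a model-based analysis of a series of
# n-of-1 trials. Toy data, so convergence warnings are possible.
library(lme4)
set.seed(1)
d <- expand.grid(patient = factor(1:8), cycle = factor(1:3), treat = c(0, 1))
tau <- rnorm(8, mean = 1, sd = 0.5)          # patient-specific true effects
d$y <- rnorm(nrow(d), mean = 10 + tau[d$patient] * d$treat, sd = 1)
# Random treatment-by-patient effect: aims at the effect for a future patient
fit_random <- lmer(y ~ treat + (treat | patient), data = d)
# Fixed patient effects: aims at the average effect in the patients studied
fit_fixed <- lm(y ~ treat + patient, data = d)
summary(fit_random)$coefficients["treat", ]
summary(fit_fixed)$coefficients["treat", ]
# Similar point estimates, but the standard errors answer different
# questions: the random-effects one includes between-patient variation
# in the treatment effect, the fixed-effects one does not.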
Acknowledgement
This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552 (“IDEAL”).
How to combine results from randomised clinical trials on the additive scale with real world data to provide predictions on the clinically relevant scale for individual patients
Unfortunately, some have interpreted Numbers Needed to Treat as indicating the proportion of patients on whom the treatment has had a causal effect. This interpretation is very rarely, if ever, necessarily correct. It is certainly inappropriate if based on a responder dichotomy. I shall illustrate the problem using simple causal models.
One also sometimes encounters the claim that the extent to which two distributions of outcomes overlap from a clinical trial indicates how many patients benefit. This is also false and can be traced to a similar causal confusion.
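A minimal sketch in R of both confusions, with all effect sizes invented: a treatment that benefits every patient by the same amount nevertheless yields a ‘response rate’ difference, and hence an NNT, that seems to say it works for only a few:

# All numbers invented for illustration.
set.seed(2)
n <- 100000
control <- rnorm(n)                  # outcome without treatment
treated <- control + 0.3             # the SAME benefit of 0.3 for everyone
p_t <- mean(treated > 1)             # 'responders' on treatment (cut-off 1)
p_c <- mean(control > 1)             # 'responders' on control
1 / (p_t - p_c)                      # NNT of about 12, yet all n benefit
# Overlap of the two outcome distributions is equally uninformative:
mean(rnorm(n, 0.3) > rnorm(n))       # about 0.58, despite universal benefit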
What should we expect from reproducibility?
Stephen Senn
Is there really a reproducibility crisis and, if so, are P-values to blame? Choose any statistic you like, carry out two identical independent studies and report this statistic for each. In advance of collecting any data, you ought to expect that it is just as likely that statistic 1 will be smaller than statistic 2 as vice versa. Once you have seen statistic 1, things are not so simple, but if they are not so simple it is because you have other information in some form. However, it is at least instructive that you need to be careful in jumping to conclusions about what to expect from reproducibility. Furthermore, the forecasts of good Bayesians ought to obey a martingale property. On average you should be in the future where you are now but, of course, your inferential random walk may lead to some peregrination before it homes in on “the truth”. But you certainly can’t generally expect that a probability will get smaller as you continue. P-values, like other statistics, are a position, not a movement. Although often claimed, there is no such thing as a trend towards significance.
Using these and other philosophical considerations I shall try to establish what it is we want from reproducibility. I shall conclude that we statisticians should probably be paying more attention to checking that standard errors are being calculated appropriately and rather less to the inferential framework.
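A small simulation in R of the opening point, under an invented design: whatever the true effect, the P-value of the first of two identical studies is as likely as not to be smaller than that of the second:

# Design (n = 50 per arm, true effect 0.2) invented for illustration.
set.seed(3)
one_study <- function(n = 50, delta = 0.2)
  t.test(rnorm(n, delta), rnorm(n))$p.value
p1 <- replicate(5000, one_study())
p2 <- replicate(5000, one_study())
mean(p1 < p2)   # close to 0.5, whatever the value of delta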
It is argued that when it comes to nuisance parameters an assumption of ignorance is harmful. On the other hand, this raises problems as to how far one should go in searching for further data when combining evidence.
Views of the role of hypothesis falsification in statistical testing do not divide as cleanly between frequentist and Bayesian views as is commonly supposed. This can be shown by considering the two major variants of the Bayesian approach to statistical inference and the two major variants of the frequentist one.
A good case can be made that the Bayesian, de Finetti, just like Popper, was a falsificationist. A thumbnail view, which is not just a caricature, of de Finetti’s theory of learning, is that your subjective probabilities are modified through experience by noticing which of your predictions are wrong, striking out the sequences that involved them and renormalising.
On the other hand, in the formal frequentist Neyman-Pearson approach to hypothesis testing, you can, if you wish, shift the conventional null and alternative hypotheses, making the latter the straw man and, by ‘disproving’ it, assert the former.
The frequentist, Fisher, however, at least in his approach to the testing of hypotheses, seems to have taken a strong view that the null hypothesis was quite different from any other and that there was a strong asymmetry in the inferences that followed from the application of significance tests.
Finally, to complete a quartet, the Bayesian geophysicist Jeffreys, inspired by Broad, specifically developed his approach to significance testing in order to be able to ‘prove’ scientific laws.
By considering the controversial case of equivalence testing in clinical trials, where the object is to prove that ‘treatments’ do not differ from each other, I shall show that there are fundamental differences between ‘proving’ and falsifying a hypothesis and that this distinction does not disappear by adopting a Bayesian philosophy. I conclude that falsificationism is important for Bayesians also, although it is an open question as to whether it is enough for frequentists.
When estimating sample sizes for clinical trials there are several different views that might be taken as to what definition and meaning should be given to the sought-for treatment effect. However, if the concept of a ‘minimally important difference’ (MID) does have relevance to interpreting clinical trials (which can be disputed) then its value cannot be the same as the ‘clinically relevant difference’ (CRD) that would be used for planning them.
A doubly pernicious use of the MID is as a means of classifying patients as responders and non-responders. Not only does such an analysis lead to an increase in the necessary sample size but it misleads trialists into making causal distinctions that the data cannot support and has been responsible for exaggerating the scope for personalised medicine.
In this talk these statistical points will be explained using a minimum of technical detail.
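As a rough illustration of the sample-size point, a sketch in R comparing the power of an analysis of the continuous outcome with a responder analysis dichotomised at an ‘MID’; the effect size, cut-off and sample size are all invented:

# Power comparison: continuous analysis vs responder dichotomy.
set.seed(4)
delta <- 0.3; n <- 100; mid <- 0.5
sim <- function() {
  a <- rnorm(n, delta); b <- rnorm(n)
  c(continuous   = t.test(a, b)$p.value,
    dichotomised = prop.test(c(sum(a > mid), sum(b > mid)), c(n, n))$p.value)
}
res <- replicate(5000, sim())
rowMeans(res < 0.05)   # power is markedly lower for the dichotomised analysis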
An early and overlooked causal revolution in statistics was the development of the theory of experimental design, initially associated with the "Rothamsted School". An important stage in the evolution of this theory was the experimental calculus developed by John Nelder in the 1960s with its clear distinction between block and treatment factors in designed experiments. This experimental calculus produced appropriate models automatically from more basic formal considerations but was, unfortunately, only ever implemented in Genstat®, a package widely used in agriculture but rarely so in medical research. In consequence its importance has not been appreciated and the approach of many statistical packages to designed experiments is poor. A key feature of the Rothamsted School approach is that identification of the appropriate components of variation for judging treatment effects is simple and automatic.
The impressive, more recent causal revolution in epidemiology, associated with Judea Pearl, seems to have no place for components of variation, however. By considering the application of Nelder’s experimental calculus to Lord’s Paradox, I shall show that solutions that have been proposed using the more modern causal calculus are problematic. I shall also show that lessons from designed clinical trials have important implications for the use of historical data and big data more generally.
Clinical trials: quo vadis in the age of COVID?
Stephen Senn
A discussion of the role of clinical trials in the age of COVID. My contribution to the Phastar 2020 Life Science Summit: https://phastar.com/phastar-life-science-summit
The statistical revolution of the 20th century was largely concerned with developing methods for analysing small datasets. Student’s paper of 1908 was the first in the English literature to address seriously the problem of second-order uncertainty (uncertainty about the measures of uncertainty) and was hailed by Fisher as heralding a new age of statistics. Much of what Fisher did was concerned with problems of what might be called ‘small data’, not only as regards efficient analysis but also as regards efficient design, in addition paying close attention to what was necessary to measure uncertainty validly.
I shall consider the history of some of these developments, in particular those that are associated with what might be called the Rothamsted School, starting with Fisher and having its apotheosis in John Nelder’s theory of General Balance and see what lessons they hold for the supposed ‘big data’ revolution of the 21st century.
There are many questions one might ask of a clinical trial, ranging from ‘what was the effect in the patients studied?’ to ‘what might the effect be in future patients?’, via ‘what was the effect in individual patients?’. The extent to which the answers to these questions are similar depends on the various assumptions made, and in some cases the design used may not permit any meaningful answer to be given at all.
A related issue is confusion between randomisation, random sampling, linear modelling and true multivariate modelling. These distinctions don’t matter much for some purposes and under some circumstances, but for others they do.
The history of P-values is covered to try to shed light on a mystery: why did Student and Fisher agree numerically but disagree in terms of interpretation?
Personalised medicine: a sceptical view
Stephen Senn
Some grounds for believing that the current enthusiasm about personalised medicine is exaggerated, founded on poor statistics and represents a disappointing loss of ambition.
Minimisation is an approach to allocating patients to treatment in clinical trials that forces a greater degree of balance than does randomisation. Here I explain why I dislike it.
There are many valid criticisms of P-values but the criticism that they are largely responsible for the reproducibility crisis has been accepted rather lightly in some quarters. Whatever the inferential statistic that is used, it is quite illogical to assume that as the sample size increases it will tend to show more evidence against the null hypothesis. This applies to Bayesian posterior probabilities as much as it does to P-values. In the context of P-values it can be referred to as the trend towards significance fallacy but more generally, for reasons I shall explain, it could be referred to as the anticipated evidence fallacy.
The anticipated evidence fallacy is itself an example of the overstated evidence fallacy. I shall also discuss this fallacy and other relevant matters affecting reproducible science including the problem of false negatives.
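A small R sketch of the simplest version of this point: when the null hypothesis is true, increasing the sample size does nothing whatever to the distribution of the P-value, so there is no ‘trend towards significance’ to anticipate (one-sample designs with invented sizes, true mean zero throughout):

set.seed(5)
p_small <- replicate(5000, t.test(rnorm(50))$p.value)    # n = 50
p_large <- replicate(5000, t.test(rnorm(500))$p.value)   # n = 500
quantile(p_small, c(0.1, 0.5, 0.9))
quantile(p_large, c(0.1, 0.5, 0.9))   # essentially identical distributions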
Sample size determination in clinical trials is considered from various ethical and practical perspectives. It is concluded that cost is a missing dimension and that the value of information is key.
The two statistical cornerstones of replicability: addressing selective infer...
jemille6
Tukey’s last published work in 2020 was an obscure entry on multiple comparisons in the Encyclopedia of Behavioral Sciences, addressing the two topics in the title. Replicability was not mentioned at all, nor was any other connection made between the two topics. I shall demonstrate how these two topics critically affect replicability using recently completed studies. I shall review how these have been addressed in the past. I shall review in more detail the available ways to address selective inference. My conclusion is that conducting many small replicability studies without strict standardization is the way to assure replicability of results in science, and we should introduce policies to make this happen.
The ASA president Task Force Statement on Statistical Significance and Replic...
jemille6
Yoav Benjamini's slides “The ASA President's Task Force Statement on Statistical Significance and Replicability” for the Special Session of the (remote) Phil Stat Forum: “Statistical Significance Test Anxiety” on 11 January 2022.
Introduces and explains the use of multiple linear regression, a multivariate correlational statistical technique. For more info, see the lecture page at http://goo.gl/CeBsv. See also the slides for the MLR II lecture http://www.slideshare.net/jtneill/multiple-linear-regression-ii
Quantitative Analysis for Empirical Research
Amit Kamble
Overview of approaches and methods for quantitative analysis, including:
1) Planning of experiments
2) Data generation
3) Presentation of reports
together with some numerical methods, data modelling and hypothesis testing.
There are many questions one might ask of a clinical trial, ranging from ‘what was the effect in the patients studied?’ to ‘what might the effect be in future patients?’, via ‘what was the effect in individual patients?’. The extent to which the answers to these questions are similar depends on the various assumptions made, and in some cases the design used may not permit any meaningful answer to be given at all.
A related issue is confusion between randomisation, random sampling, linear modelling and true multivariate modelling. These distinctions don’t matter much for some purposes and under some circumstances, but for others they do.
A yet further issue is that causal analysis in epidemiology, which has brought valuable insights in many cases, has tended to stress point estimates and ignore standard errors. This has potentially misleading consequences.
An understanding of components of variation is key. Unfortunately, the development of two particular topics in recent years, evidence synthesis by the evidence-based medicine movement and personalised medicine by bench scientists, has paid scant attention to components of variation, or to the questions being asked, or to both, resulting in confusion about many issues.
For instance, it is often claimed that numbers needed to treat indicate the proportion of patients for whom treatments work, that inclusion criteria determine the generalisability of results and that heterogeneity means that a random effects meta-analysis is required. None of these is true. The scope for personalised medicine has very plausibly been exaggerated and an important cause of variation in the healthcare system, physicians, is often overlooked.
I shall argue that thinking about questions is important.
The response to the COVID-19 crisis by various vaccine developers has been extraordinary, both in terms of speed of response and the delivered efficacy of the vaccines. It has also raised some fascinating issues of design, analysis and interpretation. I shall consider some of these issues, taking as my examples five vaccines: Pfizer/BioNTech, AstraZeneca/Oxford, Moderna, Novavax, and J&J Janssen, but concentrating mainly on the first two. Among matters covered will be concurrent control, efficient design, issues of measurement raised by two-shot vaccines and implications for roll-out, and the surprising effectiveness of simple analyses. Differences between the five development programmes as they affect statistics will be covered but some essential similarities will also be discussed.
In Search of Lost Infinities: What is the “n” in big data?
Stephen Senn
In designing complex experiments, agricultural scientists, with the help of their statistician collaborators, soon came to realise that variation at different levels had very different consequences for estimating different treatment effects, depending on how the treatments were mapped onto the underlying block structure. This was a key feature of the Rothamsted approach to design and analysis and a strong thread running through the work of Fisher, Yates and Nelder, being expressed in topics such as split-plot designs, recovering inter-block information and fractional factorials. The null block-structure of an experiment is key to this philosophy of design and analysis. However, modern techniques for analysing experiments stress models rather than symmetries, and this modelling approach requires much greater care in analysis, with the consequence that you can easily make mistakes and often will.
In this talk I shall underline the obvious, but often unintentionally overlooked, fact that understanding variation at the various levels at which it occurs is crucial to analysis. I shall take three examples, an application of John Nelder’s theory of general balance to Lord’s Paradox, the use of historical data in drug development and a hybrid randomised non-randomised clinical trial, the TARGET study, to show that the data that many, including those promoting a so-called causal revolution, assume to be ‘big’ may actually be rather ‘small’. The consequence is that there is a danger that the size of standard errors will be underestimated or even that the appropriate regression coefficients for adjusting for confounding may not be identified correctly.
I conclude that an old but powerful experimental design approach holds important lessons for observational data about limitations in interpretation that mere numbers cannot overcome. Small may be beautiful, after all.
History of how and why a complex cross-over trial was designed to prove the equivalence of two formulations of a beta-agonist, and what the eventual results were. Presented at the Newton Institute 28 July 2008. Warning: following the important paper by Kenward & Roger (Biostatistics, 2010), I no longer think the random effects analysis is appropriate although, in fact, the results are pretty much the same as for the fixed effects analysis.
Confounding, politics, frustration and knavish tricks
1. Confounding, politics, frustration and knavish tricks
Stephen Senn
Bradford Hill lecture 2008
2. "If when the tide is falling you take out water with a twopenny pail, you and the moon together can do a great deal"
Bradford Hill, A., and Hill, I. D. (1990), Principles of Medical Statistics (12th edition), p. 247
3. The Central Problem of Epidemiology
• This is generally recognised to be confounding
• Where experiments cannot be conducted we must make do with observational studies
• There is also the risk that due to hidden confounders we will conclude causation when all we have is association
• Hill was a (the) key figure in promoting randomised controlled trials (RCTs)
• But he also recognised that RCTs were not enough and was a pioneer of observational studies
– Case control as in Doll and Hill (1950)
– Cohort as in Doll and Hill (1954)
4. Outline
• Some statistics of the propensity score
• An explanation of the propensity score
• Comparison to ANCOVA
• Some criticisms
• Conclusions
Acknowledgement: this is based on joint work with Erika Graf and Angelika Caputo. Senn, S., Graf, E., and Caputo, A. (2007), "Stratification for the Propensity Score Compared with Linear Regression Techniques to Assess the Effect of Treatment or Exposure," Statistics in Medicine, 26, 5529-5544.
5. A Question for you to Consider
• Consider these two experiments
– A completely randomised trial: patients allocated with 50% probability to A or B
– Randomised matched pairs: member of any pair randomised with 50% probability to A or B
• In analysing, would you ignore the matching in the second case?
6. Propensity score: background
• Due to Rosenbaum and Rubin, Biometrika 1983
• Has been cited over 1000 times since first published
• Citation rate has grown rapidly since 1995 and is now more than 200 per year
7. [Figure: annual citations of Rosenbaum and Rubin, 1985–2005 (Year vs Citations, with Data and Fit series). This model predicts more than 300 citations in 2008.]
8. [Figure: cumulative citations of Rosenbaum and Rubin, 1985–2005 (Year vs Cumulative citations, rising to about 1200).]
9. [Figure: citations by subject category] ECONOMICS (19.45%), STATISTICS & PROBABILITY (17.24%), PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH (12.75%), CARDIAC & CARDIOVASCULAR SYSTEMS (10.69%), SOCIAL SCIENCES, MATHEMATICAL METHODS (8.99%), MATHEMATICAL & COMPUTATIONAL BIOLOGY (6.63%), SURGERY (6.48%), HEALTH CARE SCIENCES & SERVICES (6.34%), RESPIRATORY SYSTEM (5.75%), MEDICINE, GENERAL & INTERNAL (5.67%)
10. Propensity Score Explanation
• We consider two ‘treatments’ or exposures a subject might have received
• The assignment indicator is X
– X = 0, if subject receives exposure 0
– X = 1, if subject receives exposure 1
• There is a vector of covariates W
11. Counterfactual responses
• For every subject we have two responses, r0 and r1
• One of these will be observed
• One of these is unobserved
– Counterfactual
12. Propensity score: definition
e(W) = P(X = 1 | W)
This is a form of balancing score b(W). A balancing score is defined as follows. If r0 is the response given by a subject that is unexposed (indexed by 0) and r1 is the response when the same subject is exposed (indexed by 1), and
(r0, r1) ⊥ X | W and (r0, r1) ⊥ X | b(W)
then b(W) is a balancing score. R & R show that the finest such score is W itself and the coarsest is the propensity score.
13. Propensity score uses
• Calculate the propensity score for each subject
• Stratify by the propensity score
– In practice fifths are used
• The resulting estimator is unbiased
– The possible confounding influence of W has been eliminated
14. Propensity Score: An Example
Disposition of subjects in a study
                        Exposure
                    A       B     Total
Male    Young     240      80       320
        Old        60      20        80
Female  Young      80     240       320
        Old        20      60        80
Total             400     400       800
15.
Exposure by sex:
Sex       A      B    Total
Male    300    100      400
Female  100    300      400
Total   400    400      800

Exposure by age:
Age       A      B    Total
Young   320    320      640
Old      80     80      160
Total   400    400      800

Sex is predictive of exposure but age is not.
16.
Class           Relative frequency (A)    ‘Probability’ of disposition (to A)
Young males     240/320                   3/4
Old males       60/80                     3/4
Young females   80/320                    1/4
Old females     20/80                     1/4

The philosophy of the propensity score is to stratify by probability of allocation. In this case this is equivalent to stratifying by sex.
17. Response
Mean response by sex:
Sex       A      B    Difference
Male     96    136    40
Female   96    136    40

Mean response by age:
Age       A      B    Difference
Young   100    140    40
Old      80    120    40

Age is predictive of outcome but sex is not.
18. The Difference to Conventional Approaches
• Conventional approaches correct for covariates if they are predictive of outcome
– Analysis of covariance
– Stratification
• The propensity score corrects if covariates are predictive of assignment (allocation)
• In this example correcting either for sex (propensity score) or age (ANCOVA) will produce an “unbiased” estimate
19. In terms of linear regression
To define some general notation, let β_UV be the marginal regression of U on V and let β_UV.T be the conditional regression of U on V given T. Then
β_YX = β_YX.W + β_YW.X β_WX
∴ β_YX = β_YX.W if β_YW.X = 0 (1) or β_WX = 0 (2)
(1) is the analysis of covariance condition for not including something in the model and (2) is the propensity score condition. Now consider a specific implementation where Y is outcome, X is treatment and W is covariate.
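This identity is exact for least-squares coefficients and can be checked numerically; a minimal sketch in R, with an invented data-generating model:

# Numerical check of beta_YX = beta_YX.W + beta_YW.X * beta_WX.
set.seed(6)
n <- 1000
x <- rnorm(n)
w <- 0.5 * x + rnorm(n)              # W associated with X
y <- x + 2 * w + rnorm(n)            # Y depends on both
b_yx <- coef(lm(y ~ x))["x"]         # marginal regression of Y on X
fit <- lm(y ~ x + w)                 # conditional regressions given W
b_yx.w <- coef(fit)["x"]; b_yw.x <- coef(fit)["w"]
b_wx <- coef(lm(w ~ x))["x"]
c(b_yx, b_yx.w + b_yw.x * b_wx)      # identical, as the identity requires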
20. Some myths of the propensity score
• Collinearity of predictors makes traditional regression adjustments unusable
• Quintile stratification on the propensity score eliminates bias more effectively than ANCOVA
• The propensity score can be more efficient than ANCOVA
• The coarsening property of the propensity score benefits efficiency
21. Collinearity of Predictors
Consider a simple example in which the following predictor pattern is repeated a number of times:
Covariate/Confounder    Exposure
W1    W2                X
0     0                 0
0     0                 1
1     1                 0
1     1                 1
Clearly the effects of W1 and W2 are not identifiable, but the effect of X is, and any decent statistical package should be able to estimate the effect even if W1 and W2 are in the model. In the following example it is supposed that
Y = W1 + W2 + X + ε,  ε ~ N(0, 1)
and that we have the same basic pattern of predictors for 1000 observations.
22. Analysis with GenStat 1
Case where W1 and W2 are completely collinear.
Message: term W2 cannot be included in the model because it is aliased with terms already in the model. (W2) = (W1)
Regression analysis; estimates of parameters:
Parameter   estimate   s.e.     t(997)   t pr.
Constant    -0.0266    0.0542   -0.49    0.624
W1           2.0067    0.0626   32.05    <.001
X            1.0377    0.0626   16.57    <.001

23. Analysis with GenStat 2
Case where W1 and W2 are strongly collinear (a small bit of noise added to W2).
Regression analysis; estimates of parameters:
Parameter   estimate   s.e.     t(996)   t pr.
Constant    -0.0270    0.0542   -0.50    0.619
W1          -0.82      3.16     -0.26    0.795
W2e          2.83      3.16      0.89    0.372
X            1.0372    0.0626   16.56    <.001
Message: the variance of some parameter estimates is seriously inflated, due to near collinearity or aliasing between the following parameters, listed with their variance inflation factors: W1 2553.00, W2e 2553.00.
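The same example can be reproduced in most packages; a sketch in R (rather than GenStat) under the same invented model:

# The effect of X is estimable even when W1 and W2 are aliased.
set.seed(7)
base <- data.frame(w1 = c(0, 0, 1, 1), w2 = c(0, 0, 1, 1), x = c(0, 1, 0, 1))
d <- base[rep(1:4, 250), ]                     # the pattern repeated: 1000 rows
d$y <- d$w1 + d$w2 + d$x + rnorm(1000)
coef(lm(y ~ w1 + w2 + x, data = d))            # w2 is NA (aliased); x is fine
d$w2e <- d$w2 + rnorm(1000, sd = 0.01)         # a small bit of noise added
coef(summary(lm(y ~ w1 + w2e + x, data = d)))  # huge s.e. for w1, w2e; not for x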
24. Better at eliminating bias?
• Some papers have purported to show this
• Claims have been demonstrated using simulation
• But the simulations have been unfair
– For example using models of different implicit complexity
• It is trivial to produce examples where quintile stratification does not work
– Suppose a baseline covariate differs by one standard deviation between exposures and outcome is a linear function of this
• ANCOVA works perfectly, the propensity score is biased
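A sketch in R of this counter-example, with the set-up invented to match the description above; since the propensity score is monotone in the covariate here, fifths of the covariate are fifths of the score:

# Covariate differs by one SD between groups; true exposure effect is zero.
set.seed(8)
n <- 5000
x <- rep(0:1, each = n)                        # exposure
w <- rnorm(2 * n, mean = x)                    # shifted by 1 SD when x = 1
y <- w + rnorm(2 * n)                          # outcome depends on w only
coef(lm(y ~ x + w))["x"]                       # ANCOVA: close to the true 0
q <- cut(w, quantile(w, 0:5 / 5), include.lowest = TRUE)   # fifths of w
# simple average of within-fifth differences: residual confounding remains
mean(tapply(y[x == 1], q[x == 1], mean) - tapply(y[x == 0], q[x == 0], mean))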
25. More efficient than ANCOVA?
• Stratification is by probability of assignment
• But ANCOVA stratifies by predictors of outcome, not assignment
• By definition, residual variance is less for ANCOVA
• By definition, loss of orthogonality is greater for propensity
• Consequence: variance of estimators is higher for the propensity score
• Propensity score incoherent?
26. Furthermore
• The coarseness property of the propensity score is completely irrelevant
• There is no gain in efficiency through this property
• The loss in orthogonality is equivalent to fitting all covariates and their interactions with each other
• You might as well just use (multivariate) W
27. A Regression Reminder
To define some notation, let P = [X, W] be the matrix of regressors. Then
var(β̂) = σ² (P′P)⁻¹
and let a_XX denote the diagonal element of (P′P)⁻¹ corresponding to X, so that var(β̂_X) = σ² a_XX. The propensity score philosophy chooses the members of W in such a way that a_XX is maximised. Analysis of covariance chooses the members so that σ² is minimised.
28. Another Example
              Young            Old             All ages
          X = 0  X = 1    X = 0  X = 1    X = 0  X = 1   Total
Male        3      7       80     30       83     37      120
Female      8     42        9     21       17     63       80
Total      11     49       89     51      100    100      200
(Totals by age: young 60, old 140; overall 200)

29. The same table, noting that young males and old females share the same propensity score, e(W) = 0.7.
30. Propensity score stratification
Stratum or strata             Propensity score   X = 0   X = 1   Total
Old males                     e(W) = 0.27          80      30     110
Young males + old females     e(W) = 0.70          12      28      40
Young females                 e(W) = 0.84           8      42      50
Total                                              100     100     200
31. For our Second Example
Factors in model in addition to exposure    Variance multiplier, a_XX
None                                        0.0200
Age                                         0.0242
Sex                                         0.0257
Sex + age                                   0.0267
Sex + age + sex × age                       0.0271
The last of these is the same as for the propensity score.
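The variance multipliers in this table can be checked by rebuilding the 200 subjects from the table on slide 28 and reading off the X diagonal element of the inverse cross-product matrix; a sketch in R:

cell <- expand.grid(x = 0:1, age = c("young", "old"), sex = c("M", "F"))
cell$n <- c(3, 7, 80, 30, 8, 42, 9, 21)        # counts from slide 28
d <- cell[rep(1:8, cell$n), ]                  # 200 subjects
axx <- function(f) {
  P <- model.matrix(f, data = d)               # regressors for a given model
  solve(crossprod(P))["x", "x"]                # the a_XX element of (P'P)^-1
}
axx(~ x)              # 0.0200
axx(~ x + age)        # 0.0242
axx(~ x + sex)        # 0.0257
axx(~ x + sex + age)  # 0.0267
axx(~ x + sex * age)  # 0.0271, the same as for the propensity score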
32. Conditional Distributions and the Propensity Score
• The appropriateness of the propensity score is always illustrated in terms of the expectation of the treatment estimate
– Unbiasedness in a linear framework
• Its suitability when looked at in terms of the full conditional distribution is less obvious, as will now be demonstrated
33. Suppose that we are interested in the conditional distribution of an outcome variable Y given a putative causal variable X and a further covariate W. We wish to investigate the circumstances under which W can be ignored. That is to say, we wish to know the conditions under which f(Y | W ∩ X) = f(Y | X).
Now,
f(W ∩ Y ∩ X) = f(X) f(W | X) f(Y | W ∩ X)   (1)
and
f(W ∩ Y ∩ X) = f(X) f(Y | X) f(W | Y ∩ X)   (2)
Equating the right-hand sides of (1) and (2):
f(W | X) f(Y | W ∩ X) = f(Y | X) f(W | Y ∩ X)
∴ f(Y | W ∩ X) = f(Y | X) f(W | Y ∩ X) / f(W | X)   (3)
Now, (3) is not in general equivalent to f(Y | W ∩ X) = f(Y | X)   (4)
unless f(W | Y ∩ X) = f(W | X), which implies W ⊥ Y | X and hence Y ⊥ W | X.
34. Conclusion
• The claims that are made for the propensity score are true in terms of conditional expectation (at least for the linear model)
• However, they are not true in terms of the full conditional model
• For W to be ignorable in that sense requires Y ⊥ W | X
• This is the ANCOVA condition
35. Implications for Modelling
• It is not true that ignoring a covariate that is predictive of outcome but not assignment is acceptable
• In the linear case estimators are unbiased but their variances are “incorrect”
• More generally, however, conditional and unconditional estimators are different
– Logistic regression, survival analysis
Y
Z
X4
X2
X3
X1
X5
X6
What should join Z
in the model?
37. [The same causal diagram with inappropriate terms removed]
38. [The same causal diagram showing propensity score adjustment]
40. Non-linear example
Simulation as before but with a binary response defined by Y > 1.5, with balanced covariates.
With W1 and W2e in the model:
Parameter   estimate   s.e.     t(*)     t pr.   antilog of estimate
Constant    -2.442     0.185    -13.18   <.001   0.08696
W1           4.98      8.51       0.59   0.558   146.2
W2e         -1.73      8.51      -0.20   0.839   0.1768
X            1.689     0.192      8.78   <.001   5.413
Omitting W1 and W2e:
Parameter   estimate   s.e.     t(*)     t pr.   antilog of estimate
Constant    -0.4642    0.0918    -5.06   <.001   0.6287
X            0.962     0.130      7.40   <.001   2.617
41. Not convinced? An Example
• An open trial of the effect of alcohol consumption on the ability to memorize word lists
• Volunteers to be drawn at random and divided into two groups
• One lot to be given a glass of wine, the other a glass of water

42. Two Possible Approaches
Experiment 1
• A subject has name drawn at random
• If chosen for control group, given blue ball
• If chosen for treatment group, given red ball
• “All you who have a blue ball please come to receive your glass of water, red ball to receive your glass of wine”
Experiment 2
• A subject has name drawn at random
• If chosen for control group, given glass of beer to drink
• Otherwise given nothing
• “All you who have had a beer come to receive your glass of water; if you had nothing, to receive your glass of wine.”
43. Experiment 1
• Probability of receiving wine if ball blue = 0
• Probability of receiving wine if ball red = 1
• The propensity score takes on the values 0 and 1
• Do you have to stratify by the propensity score?

44. Experiment 2
• Probability of receiving wine if beer = 0
• Probability of receiving wine if no beer = 1
• The propensity score takes on the values 0 and 1
• Do you have to stratify by the propensity score?
45. The Difference?
• The difference between these two experiments is not the propensity score
• This is 0 and 1 in both cases and all subjects in both cases have a score of 0 or 1
• The difference is that in the first case the covariate used to construct the score is predictive of outcome and in the second it is not
46. Consequence
• It is association with outcome that is important
– ANCOVA tradition
• Not association with assignment
– Propensity point of view
47. And that Question
• Consider these two experiments
– A completely randomised trial: patients allocated with 50% probability to A or B
– Randomised matched pairs: member of any pair randomised with 50% probability to A or B
• In analysing, would you ignore the matching in the second case?
• The propensity score philosophy says you can!
48. Finally
"All scientific work is incomplete - whether it be observational or experimental. All scientific work is liable to be upset or modified by advancing knowledge. That does not confer upon us a freedom to ignore the knowledge we already have, or to postpone the action that it appears to demand at a given time."
Sir Austin Bradford Hill, 1965
Editor's Notes
Lecture given at the London School of Hygiene and Tropical Medicine 3 June 2008
The dangers of concluding that subsequence is consequence
Doll R, Hill AB. (1950) Smoking and carcinoma of the lung. Preliminary report, British Medical Journal, 2: 739-748.
Doll R, Hill AB. (1954) The mortality of doctors in relation to their smoking habits. British Medical Journal, 228: 1451-5.
It was hearing Erika Graf lecture on this in the mid 1990s that first got me interested in this topic.
See Graf, E. (1997). "The propensity score in the analysis of therapeutic studies." Biometrical Journal 39: 297-307.
This is an elementary issue in applied statistics that we teach students to understand
ECONOMICS: 264
STATISTICS & PROBABILITY: 234
PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH: 173
CARDIAC & CARDIOVASCULAR SYSTEMS: 145
SOCIAL SCIENCES, MATHEMATICAL METHODS: 122
MATHEMATICAL & COMPUTATIONAL BIOLOGY: 90
SURGERY: 88
HEALTH CARE SCIENCES & SERVICES: 86
RESPIRATORY SYSTEM: 78
MEDICINE, GENERAL & INTERNAL: 77
A given subject receives at most one.
One of responses is realised the other is not
The first of these conditions involving r0 and r1 is the assumption of no unmeasured confounders
“This means that the counterfactual responses and treatment assignment are conditionally independent given the vector of covariates.” (Senn, Graf and Caputo)
The second is a condition for some function of the covariates to be ‘enough’ to stratify on.
Stratification by fifths is often referred to as quintile stratification
Example to illustrate the propensity score.
We have two exposures (A and B) and two explanatory factors age (old and young) and sex (male and female).
The outcome is not related to sex but is definitely related to age.
The point about these marginal tables is that they show that the treatment groups are imbalanced by sex but are not imbalanced by age. The philosophy of the propensity score is to stratify by the probability of allocation to one of the treatment groups (say group A). This is, in fact, equivalent to stratifying by sex, since this is the factor that affects the probability of allocation.
In the example here, relative frequencies rather than probabilities are used. In fact the propensity score is defined in terms of the latter and this can be seen as a weakness. The distinction is ignored here.
We can define the strata by the probability of exposure (the propensity score). In this example, this is equivalent to stratifying by sex.
This, on the other hand, shows a more relevant stratification from the point of view of traditional ANCOVA.
When looked at in terms of variance the propensity score appears in a less satisfactory light.
These two groups have the same propensity score: P(X = 0 | W) = 3/10 = 9/30 = 0.3, or equivalently e(W) = 0.7.
In fact although we can classify subjects by four covariate combinations, there are only three strata in the propensity score. The score is coarsened.
In other words the propensity score has gained nothing in terms of efficiency compared to fitting the full model.
An indicator of exposure, Z, an outcome variable Y and some potential confounders, X1-X6.
With inappropriate confounders removed from the model.
W1 and W2e are almost orthogonal to X. However their omission in a non-linear model leads to a huge bias in the estimate of the effect of X.
In this example the response is 1 if Y > 1.5 and 0 otherwise. Other details are as before.