The Rothamsted school meets Lord's paradox

Stephen Senn
(C) Stephen Senn 2018

Outline
Topic (number of slides)
• Adjusting for baseline in clinical trials (12)
• Lord’s Paradox (6)
• The Book of Why versus Lord’s Paradox (2)
• The Rothamsted School (8)
• Genstat® versus Lord’s paradox (11)
• Conclusions (2)

Disclaimer
• I shall be criticising one particular claim made in The Book of Why
• This should not be taken as a criticism of the causal calculus
• In fact, I regard the causal calculus as being important for statisticians
• I freely admit that my work would benefit from my being more familiar with it

Adjusting for baseline in clinical trials
Some standard and not-so-standard theory

SACS and ANCOVA
Consider a simple randomised clinical trial in which there are two treatment groups and only two measurements per patient: a baseline measurement, X, and an outcome measurement, Y.
Popular choices of outcome measure are
1) raw outcomes, Y
2) change scores, d = Y - X, the basis of the simple analysis of change scores (SACS)
3) covariance-adjusted outcomes, Y - βX (where β is chosen appropriately), the basis of analysis of covariance (ANCOVA)
NB As Laird (Am Stat., 37, 329-330, 1983) has shown, covariate-adjusted change scores are the same as 3)
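
Laird's equivalence can be checked in a few lines. The sketch below assumes that the "appropriately chosen" β is the slope of the regression of Y on X; that assumption and the operator notation are added here for illustration.

```latex
\begin{align*}
\beta &= \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}, \\
\frac{\operatorname{Cov}(d, X)}{\operatorname{Var}(X)}
  &= \frac{\operatorname{Cov}(Y, X) - \operatorname{Var}(X)}{\operatorname{Var}(X)} = \beta - 1, \\
d - (\beta - 1)X &= (Y - X) - \beta X + X = Y - \beta X .
\end{align*}
```

So adjusting the change score d for baseline, with its own slope β - 1, returns exactly the covariance-adjusted outcome of 3).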

Which to use?
• ANCOVA has a variance that is always less than or equal to the other two
  • Provided the slope (adjustment) parameter is known
  • The Gauss-Markov theorem does not apply to random regressors so one could do slightly better in theory
    • Analogous to recovering inter-block information
• ANCOVA is conditionally unbiased
  • It exhausts the information in the baselines
  • If an additive model applies
• Nevertheless, it is usually better and most commentators have concluded it is the approach to use

Here the variances at outcome and baseline are assumed to be the same, in which case the regression coefficient is just the correlation
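
Under that equal-variance assumption the variance comparison can be written out directly. This is a sketch only: σ² (the common variance of X and Y) and ρ (their correlation) are symbols introduced here, and the variances shown are per patient, with the slope treated as known.

```latex
\begin{align*}
\operatorname{Var}(Y) &= \sigma^2, \\
\operatorname{Var}(Y - X) &= 2\sigma^2(1 - \rho), \\
\operatorname{Var}(Y - \rho X) &= \sigma^2(1 - \rho^2), \\
2(1 - \rho) - (1 - \rho^2) &= (1 - \rho)^2 \ge 0, \qquad 1 - \rho^2 \le 1 .
\end{align*}
```

The last line shows that the ANCOVA variance is never larger than either of the other two. It also follows that the change score beats the raw outcome only when ρ exceeds 1/2.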

Counter-Claims
• There is a significant minority of papers arguing against ANCOVA as a means of dealing with bias
  • E.g. Liang and Zeger (2000), Sankhya; Samuelson (1986), American Statistician
• The variance claims are accepted
• However, claims are made that unless there is balance at baseline ANCOVA is biased

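Justification of the Counter-Claim
$E(X_c) = \mu_c$
$E(X_t) = \mu_c + \theta$
$E(Y_c) = \mu_c + \delta$
$E(Y_t) = \mu_c + \theta + \delta + \tau$
Hence
$E(X_t - X_c) = \theta$
$E(Y_t - Y_c) = \theta + \tau$
$E[(Y_t - Y_c) - (X_t - X_c)] = \tau$
$E[(Y_t - Y_c) - \beta(X_t - X_c)] = \tau + (1 - \beta)\theta$
(θ: expected baseline difference; δ: expected change under control; τ: treatment effect; β: ANCOVA slope)
SACS is unbiased
ANCOVA is biased unless $\theta = 0$
This just proves how misleading models can be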

A Counter Counter-Example
• Suppose we design a bizarre clinical trial
• Only persons with diastolic blood pressure (dbp) at baseline equal to 95 mmHg or 105 mmHg may enter
• In the first stratum they are randomised 3 to 1 (A to B) and in the second 1 to 3
• The situation is as follows

A Stupid Trial
Numbers of Patients by dbp and Treatment

Baseline diastolic      Treatment
blood pressure          A       B       Total
95 mmHg                 300     100     400
105 mmHg                100     300     400
Total                   400     400     800

Approach to Analysis
• Stratify by baseline dbp
• Produce treatment estimate for each stratum
• Overall estimate is average of the two estimates
• Stratification deals with the imbalance

An Equivalent Approach
• Create dummy variable stratum, S
  S = -1 if baseline dbp, X = 95 mmHg
  S = 1 if baseline dbp, X = 105 mmHg
• Regress dbp at outcome, Y, on treatment indicator T and on stratum indicator S
• The estimate will be the same as that from stratification
  • If you want the variance estimate to be exactly the same you need to include the interaction also

An Equivalent Equivalent Approach
• Regress Y on T and X rather than on T and S
• This is called ANCOVA!
• Note that S = (X - 100)/5
• Hence, this approach is equivalent to the previous one, which is equivalent to stratification, which is unbiased
• On the other hand SACS is biased
• Hence we have produced a counter-example
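
The chain of equivalences can be checked numerically. The following is a minimal sketch in Python (standard library only), not the author's analysis: it uses the patient numbers from the table above, but the data-generating model (outcome linear in baseline with slope 0.6, a treatment effect of -5 mmHg for B versus A, residual SD 4) and all function names are assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def first_slope(u, v, y):
    """Coefficient of u when y is regressed on u and v with an intercept."""
    mu_u, mu_v, mu_y = mean(u), mean(v), mean(y)
    s_uu = sum((a - mu_u) ** 2 for a in u)
    s_vv = sum((b - mu_v) ** 2 for b in v)
    s_uv = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    s_uy = sum((a - mu_u) * (c - mu_y) for a, c in zip(u, y))
    s_vy = sum((b - mu_v) * (c - mu_y) for b, c in zip(v, y))
    det = s_uu * s_vv - s_uv ** 2
    return (s_vv * s_uy - s_uv * s_vy) / det


def stupid_trial(seed=1, tau=-5.0, beta=0.6, sd=4.0):
    """Simulate the 'stupid trial' and return four treatment estimates (B minus A)."""
    rng = random.Random(seed)
    counts = {95.0: (300, 100), 105.0: (100, 300)}  # baseline dbp -> (n on A, n on B)
    x, t, y = [], [], []
    for dbp, (n_a, n_b) in counts.items():
        for arm, n in ((0.0, n_a), (1.0, n_b)):
            for _ in range(n):
                x.append(dbp)
                t.append(arm)
                # Assumed model: outcome depends linearly on baseline plus treatment.
                y.append(100.0 + beta * (dbp - 100.0) + tau * arm + rng.gauss(0.0, sd))

    def arm_diff(values, keep):
        b = [val for val, ti, xi in zip(values, t, x) if ti == 1.0 and keep(xi)]
        a = [val for val, ti, xi in zip(values, t, x) if ti == 0.0 and keep(xi)]
        return mean(b) - mean(a)

    # Stratification: one estimate per baseline stratum, then a simple average.
    stratified = mean([arm_diff(y, lambda xi, d=dbp: xi == d) for dbp in counts])
    s = [(xi - 100.0) / 5.0 for xi in x]          # stratum dummy, S = (X - 100)/5
    change = [yi - xi for yi, xi in zip(y, x)]    # change scores for SACS
    return {
        "stratified": stratified,
        "regress_on_T_and_S": first_slope(t, s, y),
        "ancova_on_T_and_X": first_slope(t, x, y),
        "sacs": arm_diff(change, lambda xi: True),
    }


if __name__ == "__main__":
    for name, value in stupid_trial().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the first three estimates agree to rounding error and sit near the assumed effect, while the SACS estimate is shifted by roughly (0.6 - 1) × 5 = -2 mmHg, because the mean baseline differs by 5 mmHg between arms.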

Conclusion
• Contrary to what is often claimed there are cases where ANCOVA is
unbiased but SACS is biased.
• No simple statement of the form “ANCOVA is more efficient but SACS
is unbiased” is possible.
• In fact it is very difficult to imagine cases where SACS is the preferred
analysis

Lord’s Paradox
Baffling statisticians for over half a century

Lord’s Paradox
Lord, F.M. (1967) “A paradox in the interpretation of group comparisons”, Psychological Bulletin, 68, 304-305.
“A large university is interested in investigating the effects on
the students of the diet provided in the university dining
halls….Various types of data are gathered. In particular the
weight of each student at the time of his arrival in September
and his weight in the following June are recorded”
We shall consider this in the Wainer and Brown version (also
considered by Pearl) in which there are two halls each
assigned a different one of two diets being compared.

Two Statisticians
Statistician One (Say John)
• Calculates difference in weight
(outcome-baseline) for each hall
• No significant difference
between diets as regards this
‘change score’
• Concludes no evidence of
difference between diets
Statistician Two (Say Jane)
• Adjusts for initial weight as a
covariate
• Finds significant diet effect on
adjusted weight
• Concludes there is a difference
between diets

[Figure: John’s analysis, comparing change scores]

[Figure: Jane’s analysis, comparing covariate-adjusted scores]
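
The disagreement between John and Jane is easy to reproduce. The following is a minimal Python sketch under assumed values that are not in the source: 100 students per hall, mean weights of 60 and 70 in the two halls, a common SD of 5, a within-hall correlation of 0.5 between September and June weights, and no average change in either hall. The function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def centred_cross(u, v):
    """Sum of cross-products of u and v about their own means."""
    mu_u, mu_v = mean(u), mean(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))


def simulate_hall(rng, n, mean_weight, sd, rho):
    """September (x) and June (y) weights: same mean and SD, correlation rho."""
    x = [rng.gauss(mean_weight, sd) for _ in range(n)]
    noise_sd = sd * (1.0 - rho ** 2) ** 0.5
    y = [mean_weight + rho * (xi - mean_weight) + rng.gauss(0.0, noise_sd) for xi in x]
    return x, y


def lords_paradox(seed=2018, n=100, hall_means=(60.0, 70.0), sd=5.0, rho=0.5):
    rng = random.Random(seed)
    halls = [simulate_hall(rng, n, m, sd, rho) for m in hall_means]
    (x1, y1), (x2, y2) = halls

    # John: difference between halls in mean change score.
    john = (mean(y2) - mean(x2)) - (mean(y1) - mean(x1))

    # Jane: ANCOVA with a common slope, pooled within halls.
    slope = sum(centred_cross(x, y) for x, y in halls) / sum(
        centred_cross(x, x) for x, _ in halls
    )
    jane = (mean(y2) - mean(y1)) - slope * (mean(x2) - mean(x1))
    return {"john": john, "jane": jane, "slope": slope}


if __name__ == "__main__":
    for name, value in lords_paradox().items():
        print(f"{name}: {value:.3f}")
```

With these settings John's estimate is close to zero while Jane's is close to (1 - 0.5) × 10 = 5, although the same data are used for both.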

Pearl’s causal calculus versus Lord’s Paradox
Is expectation enough? What about variance?

Judea Pearl, born 1936
• Israeli-American computer scientist and philosopher
• Has developed a powerful causal calculus based on distinguishing between seeing and doing
• Explains Simpson’s paradox
• Causality: Models, Reasoning and Inference (2000)
• Has recently co-authored a popular book with Dana Mackenzie, The Book of Why (2018)

Pearl & Mackenzie, 2018
[Diagram: causal graph with nodes D (Diet), W1 and WF]
“In this diagram, W1, is a confounder of D and WF and not a mediator. Therefore, the second statistician would be unambiguously right here.” (The Book of Why, p216)
“However, for statisticians who are trained in “conventional” (i.e. model-blind) methodology and avoid using causal lenses, it is deeply paradoxical” (The Book of Why, p217)

The Rothamsted School
A century of variance from ANOVA to Genstat® and back via General Balance

The Rothamsted School
• RA Fisher (1890-1962): variance, ANOVA, randomisation, design, significance tests
• Frank Yates (1902-1994): factorials, recovering inter-block information
• John Nelder (1924-2010): general balance, computing, Genstat®
• And Frank Anscombe, David Finney, Rosemary Bailey, Roger Payne etc.

General Balance
• An idea of John Nelder’s
• Two papers in the Proceedings of the Royal Society, 1965, concerning “The analysis of randomized experiments with orthogonal block structure”
  • I. Block structure and the null analysis of variance
  • II. Treatment structure and the general analysis of variance

Basic Idea
• Splits an experiment into two radically different components
  • The block structure, which describes the way that the experimental units are organised
    • The way that variation amongst units can be described
    • Null ANOVA: an idea of Anscombe’s
  • The treatment structure, which reflects the way that treatments are combined for the scientific purpose of the experiment

Design Driven Modelling
• Together with a third piece of information, the design matrix, these determine the analysis of variance
  • Note that because both block and treatment structures can be hierarchical, such a design matrix is not, on its own, sufficient to derive an ANOVA
  • But together with John Nelder’s block and treatment structure it is
    • For designs exhibiting general balance
• This approach is incorporated in Genstat®

Genstat® Help File Example

Block  Plot  S   N    Yield
1      1     0   0    0.750
1      4     0   180  1.204
1      3     0   230  0.799
1      12    10  0    0.925
1      5     10  180  1.648
1      8     10  230  1.463
1      7     20  0    0.654
1      2     20  180  1.596
1      10    20  230  1.594
1      11    40  0    0.526
1      9     40  180  1.672
1      6     40  230  1.804
2      8     0   0    0.503
2      10    0   180  0.489
etc

"This is a field experiment to study the effects of nitrogen and sulphur on the yield of wheat with a randomized block design."
BLOCKSTRUCTURE Block / Plot
TREATMENTSTRUCTURE N * S
ANOVA [PRINT=aov; FPROBABILITY=yes] Yield

How R is unsatisfactory

Genstat® versus Lord’s paradox
Rothamsted makes it simple

Start with the randomised equivalent
• We suppose that the diets had been randomised to the two halls
• Let us suppose there are 100 students per hall
• Generate some data
• See what Genstat® says about analysis
  • Note that it is a particular feature of Genstat® that it does not have to have outcome data to do this
  • Given the block and treatment structure alone it will give us a skeleton ANOVA
• We start by ignoring the covariate

Code
BLOCKSTRUCTURE Hall/Student
TREATMENTSTRUCTURE Diet
ANOVA

Output
Analysis of variance
Source of variation      d.f.
Hall stratum
  Diet                   1
Hall.Student stratum     198
Total                    199

Genstat® points out the obvious (which, however, has been universally overlooked). There are no degrees of freedom to estimate the variability of the Diet estimate, which appears in the Hall and not the Hall.Student stratum

Consequences and further considerations
• Using outcomes only we cannot analyse this experiment
  • We have no degrees of freedom to estimate the variance of any treatment estimate
  • We will return to baselines in due course
• Let’s first consider how we could fix this ‘experiment’
• Let’s increase the number of halls, while keeping fixed the total number of students we shall follow
  • 20 halls
  • 10 halls per diet
  • 10 students followed per hall

Analysis of variance
Source of variation      d.f.
Hall stratum
  Diet                   1
  Residual               18
Hall.Student stratum     180
Total                    199

We now see that this experiment is analysable. Had we carried out an experiment of this form we would not need to use baseline values, but we could do so. Let’s consider John’s and Jane’s estimators again. Would they produce valid analyses?
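
The degrees-of-freedom bookkeeping behind the two skeleton ANOVA tables can be sketched in a few lines of Python. This is only the arithmetic for a balanced Hall/Student block structure with diets applied to whole halls, not a reimplementation of Genstat®; the function name is hypothetical.

```python
def skeleton_anova_df(n_halls, students_per_hall, n_diets=2):
    """Degrees of freedom for diets randomised to whole halls (balanced design)."""
    total_units = n_halls * students_per_hall
    return {
        "Hall stratum: Diet": n_diets - 1,
        # Halls left over after fitting diets: the only basis for the diet variance.
        "Hall stratum: Residual": n_halls - n_diets,
        "Hall.Student stratum": n_halls * (students_per_hall - 1),
        "Total": total_units - 1,
    }


# 2 halls x 100 students: Diet 1, Residual 0, Hall.Student 198, Total 199
print(skeleton_anova_df(2, 100))
# 20 halls x 10 students: Diet 1, Residual 18, Hall.Student 180, Total 199
print(skeleton_anova_df(20, 10))
```

The zero residual in the Hall stratum for the two-hall design is the point of the first table: the 198 within-hall degrees of freedom say nothing about variation between halls.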

The two estimators compared
John
• Type: Change score
• Formula: $(\bar{Y}_t - \bar{Y}_c) - (\bar{X}_t - \bar{X}_c)$
• Consistent? Yes
• Correct variance? Not without strong assumptions
Jane
• Type: ANCOVA
• Formula: $(\bar{Y}_t - \bar{Y}_c) - r(\bar{X}_t - \bar{X}_c)$
• Consistent? Yes
• Correct variance? Not without strong assumptions
NB
1. As the number of halls goes to infinity, the second term for either estimator goes to zero.
2. Since the first term is the same, asymptotically they give the same answer.
3. The expectation of the first term, over all randomisations, is the effect of diet.
4. Thus, the two estimators are consistent.
5. The question is, which has the correct variance?

Adding covariates
Parameter settings
• Students per hall: 10
• Number of halls per diet: 10
• γ², variance between halls: 25.00
• σ², variance within halls: 16.00
• μ, average student weight: 75.00
• Δ, effect of diet: 3.00
• ρh, correlation between halls: 0.70
• ρs, correlation within halls: 0.50
Analysis code
Correct
BLOCKSTRUCTURE Hall/Student
TREATMENTSTRUCTURE Diet
COVARIATE Base
ANOVA Weight
Or incorrect
BLOCKSTRUCTURE Student
etc

Correct block structure
367.337 / 29.856 = 12.3
(2.73 / 0.779)² = 12.3

Incorrect block structure
376.84 / 6.253 = 60.3
(… / …)² = 60.4
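
Why the block structure matters can be seen by simulation. The following is a minimal Python sketch, not the Genstat® analysis above: it uses outcomes only (no baseline covariate), with 10 halls per diet, 10 students per hall, variance 25 between halls and 16 within halls, average weight 75 and diet effect 3. It compares the standard error obtained by treating students as independent with one based on hall means, and both with the actual spread of the estimate over repeated trials. All function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sample_var(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def one_trial(rng, halls_per_diet=10, students=10,
              var_between=25.0, var_within=16.0, mu=75.0, delta=3.0):
    """Simulate outcomes only; return (estimate, naive SE, hall-level SE)."""
    hall_means = {0: [], 1: []}
    weights = {0: [], 1: []}
    for diet in (0, 1):
        for _ in range(halls_per_diet):
            hall_effect = rng.gauss(0.0, var_between ** 0.5)
            ys = [mu + delta * diet + hall_effect + rng.gauss(0.0, var_within ** 0.5)
                  for _ in range(students)]
            hall_means[diet].append(mean(ys))
            weights[diet].extend(ys)
    estimate = mean(weights[1]) - mean(weights[0])
    # Incorrect: treat the students in each diet group as independent units.
    n = halls_per_diet * students
    student_var = (sample_var(weights[0]) + sample_var(weights[1])) / 2.0
    naive_se = (2.0 * student_var / n) ** 0.5
    # Correct for this design: halls are the randomised units.
    hall_var = (sample_var(hall_means[0]) + sample_var(hall_means[1])) / 2.0
    hall_se = (2.0 * hall_var / halls_per_diet) ** 0.5
    return estimate, naive_se, hall_se


def compare_standard_errors(n_sims=300, seed=47):
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(n_sims)]
    return {
        "empirical_sd": sample_var([r[0] for r in results]) ** 0.5,
        "mean_naive_se": mean([r[1] for r in results]),
        "mean_hall_se": mean([r[2] for r in results]),
    }


if __name__ == "__main__":
    for name, value in compare_standard_errors().items():
        print(f"{name}: {value:.3f}")
```

Under these settings the hall-level standard error tracks the empirical spread of the diet estimate, whereas the student-level one is far too small, which is the same direction of error as the inflated ratio under the incorrect block structure.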

We now understand the situation well enough to return to the two hall case
Change-score (John)
• The between-hall component of variance must be zero having subtracted the baseline
• The between-hall regression must be equal to 1
ANCOVA (Jane)
• The between-hall component of variance must be zero having conditioned on the baselines
• The regression between halls must be as predicted by the regression within
The minimal requirement for the analyses to be valid is the following

The Necessary Condition for ANCOVA to be Unbiased
$E[(Y_t - Y_c) - \beta(X_t - X_c)] = \tau$
$E(Y_t - Y_c) - \beta E(X_t - X_c) = \tau$
$E(Y_t - Y_c) - \tau = \beta E(X_t - X_c)$
(τ: the true effect of treatment)
Or in everyday language that the bias in the raw comparison at outcome should be β times the bias at baseline, where β is the individual regression effect.
This requires a strong assumption that is untestable in the two-hall case.
But in any case, the fact that the estimate is unbiased is not a guarantee that the estimate of the variance of the estimate is unbiased

Conclusions
Both particular and general

Lord’s Paradox
• It is not true that ‘the second statistician would be unambiguously right’
  • Additional untestable assumptions would be needed
  • This does not mean that the first statistician would be right
• A lesson is that we need to consider the probability distribution of an inference
  • At least the variance and not just the expectation
  • I note, by the by, that this is a mistake made in developing the propensity score approach (See Senn, Graf and Caputo, 2007)

More generally
• The Rothamsted approach is valuable but sadly neglected
  • Only implemented in Genstat®
  • An R package is in development by Cullis and Smith
• All too often we take completely randomised designs as being the default analogy to observational data-sets
  • More complex designs may be appropriate
    • Such as cluster randomised
  • Even where we have identified the ‘correct’ confounders (perhaps with the help of causal calculus) we may be getting the standard errors wrong
• Lessons for epidemiology?
  • Variances matter
• It is an open question for me whether the causal calculus in its current form is adequate to deal with complex data-sets
  • Can it deal adequately with hierarchical structures?
References
1. Nelder JA. The analysis of randomized experiments with orthogonal block structure I. Block structure and the null analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:147-62.
2. Nelder JA. The analysis of randomized experiments with orthogonal block structure II. Treatment structure and the general analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:163-78.
3. Lord FM. A paradox in the interpretation of group comparisons. Psychological Bulletin. 1967;68:304-5.
4. Holland PW, Rubin DB. On Lord's Paradox. In: Wainer H, Messick S, editors. Principals of
Modern Psychological Measurement. Hillsdale, NJ: Lawrence Erlbaum Associates; 1983.
5. Liang KY, Zeger SL. Longitudinal data analysis of continuous and discrete responses for pre-post
designs. Sankhya-the Indian Journal of Statistics Series B. 2000;62:134-48.
6. Wainer H, Brown LM. Two statistical paradoxes in the interpretation of group differences:
Illustrated with medical school admission and licensing data. American Statistician. 2004;58(2):117-23.
7. Senn SJ. Change from baseline and analysis of covariance revisited. Statistics in Medicine.
2006;25(24):4334–44.
8. Senn SJ, Graf E, Caputo A. Stratification for the propensity score compared with linear regression
techniques to assess the effect of treatment or exposure. Statistics in Medicine. 2007;26(30):5529-44.
9. Van Breukelen GJ. ANCOVA versus change from baseline had more power in randomized studies and more bias in nonrandomized studies. Journal of Clinical Epidemiology. 2006;59(9):920-5.
10. Pearl J, Mackenzie D. The Book of Why: Basic Books; 2018.
