The Rothamsted school meets Lord's paradox


Lord’s ‘paradox’ is a notoriously difficult puzzle that is guaranteed to provoke discussion, dissent and disagreement. Two statisticians analyse some observational data and come to radically different conclusions, each of which has acquired defenders over the years since Lord first proposed his puzzle in 1967. It features in the recent Book of Why by Pearl and Mackenzie, who use it to demonstrate the power of Pearl’s causal calculus, obtaining a solution they claim is unambiguously right. They also claim that statisticians have failed to get to grips with causal questions for well over a century, in fact ever since Karl Pearson developed Galton’s idea of correlation and warned the scientific world that correlation is not causation.
However, only two years before Lord published his paradox, John Nelder outlined a powerful causal calculus for analysing designed experiments, based on a careful distinction between block and treatment structure. This represents an important advance in formalising an approach to analysing complex experiments that started with Fisher 100 years ago, when he proposed splitting variability using the square of the standard deviation, which he called the variance. The approach continued with Yates and has been developed since the 1960s by Rosemary Bailey, amongst others. This tradition might be referred to as the Rothamsted School. It is fully implemented in Genstat® but, as far as I am aware, not in any other package.
With the help of Genstat®, I demonstrate how the Rothamsted School would approach Lord’s paradox and come to a solution that is not the same as the one reached by Pearl and Mackenzie, although given certain strong but untestable assumptions it would reduce to it. I conclude that the statistical tradition may have more to offer in this respect than has been supposed.

Published in: Data & Analytics

  1. 1. The Rothamsted School meets Lord’s Paradox Stephen Senn (C) Stephen Senn 2018 1
  2. 2. Outline (topic: number of slides) • Adjusting for baseline in clinical trials: 12 • Lord’s Paradox: 6 • The Book of Why versus Lord’s Paradox: 2 • The Rothamsted School: 8 • Genstat® versus Lord’s paradox: 11 • Conclusions: 2
  3. 3. Disclaimer • I shall be criticising one particular claim made in The Book of Why • This should not be taken as a criticism of the causal calculus • In fact, I regard the causal calculus as being important for statisticians • I freely admit that my work would benefit from my being more familiar with it
  4. 4. Adjusting for baseline in clinical trials: Some standard and not-so-standard theory
  5. 5. SACS (simple analysis of change scores) and ANCOVA • A simple randomised clinical trial in which there are two treatment groups and only two measurements per patient: a baseline measurement, X, and an outcome measurement, Y • Popular choices of outcome measure are 1) raw outcomes Y, 2) change score d = Y - X, 3) covariance adjusted outcomes $Y - \beta X$ (where $\beta$ is chosen appropriately) • NB As Laird (Am Stat., 37, 329-330, 1983) has shown, covariate adjusted change scores are the same as 3)
  6. 6. Which to use? • ANCOVA has a variance that is always less than or equal to the other two • Provided the slope (adjustment) parameter is known • The Gauss-Markov theorem does not apply to random regressors so one could do slightly better in theory • Analogous to recovering inter-block information • ANCOVA is conditionally unbiased • It exhausts the information in the baselines • If an additive model applies • Nevertheless, it is usually better and most commentators have concluded it is the approach to use
  7. 7. Here the variances at outcome and baseline are assumed to be the same, in which case the regression coefficient is just the correlation
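The variance claim for the three outcome measures can be checked with a few lines of arithmetic. This is a minimal sketch, assuming equal variance `sigma2` at baseline and outcome, a baseline-outcome correlation `rho`, and a known ANCOVA slope equal to `rho`; the function name and the returned labels are illustrative, not taken from the slides.

```python
def outcome_variances(rho, sigma2=1.0):
    """Per-patient variance of each outcome measure.

    Assumes Var(X) = Var(Y) = sigma2 and Corr(X, Y) = rho, with the
    ANCOVA slope known and equal to rho (the equal-variance case).
    """
    return {
        "raw": sigma2,                         # Var(Y)
        "change": 2.0 * (1.0 - rho) * sigma2,  # Var(Y - X)
        "ancova": (1.0 - rho ** 2) * sigma2,   # Var(Y - rho * X)
    }


if __name__ == "__main__":
    for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(rho, outcome_variances(rho))
```

Under these assumptions the change score beats the raw outcome only when `rho` exceeds 0.5, whereas the ANCOVA variance is never larger than either alternative.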
  8. 8. Counter-Claims • There is a significant minority of papers arguing against ANCOVA as a means of dealing with bias • E.g. Liang and Zeger (2000), Sankhya; Samuelson (1986), American Statistician • The variance claims are accepted • However, claims are made that unless there is balance at baseline ANCOVA is biased
  9. 9. Justification of the Counter-Claim • Model: $E[X_c] = \mu$, $E[X_t] = \mu + \theta$, $E[Y_c] = \mu + \delta$, $E[Y_t] = \mu + \theta + \delta + \tau$ • Hence $E[X_t - X_c] = \theta$, $E[Y_t - Y_c] = \theta + \tau$, $E[(Y_t - Y_c) - (X_t - X_c)] = \tau$ and $E[(Y_t - Y_c) - \beta(X_t - X_c)] = \tau + (1 - \beta)\theta$ • SACS is unbiased • ANCOVA is biased unless $\theta = 0$ • This just proves how misleading models can be
  10. 10. (C) Stephen Senn 2005 10 A Counter Counter-Example • Suppose we design a bizarre clinical trial • Only persons with diastolic blood pressure at baseline equal to 95mmHg or 105mmHg may enter • In the first stratum they are randomised 3 to 1 and in the second 1 to 3 • Situation as follows
  11. 11. A Stupid Trial: numbers of patients by baseline diastolic blood pressure (dbp) and treatment
       Baseline dbp    Treatment A    Treatment B    Total
       95 mmHg                 300            100      400
       105 mmHg                100            300      400
       Total                   400            400      800
  12. 12. Approach to Analysis • Stratify by baseline dbp • Produce treatment estimate for each stratum • Overall estimate is average of the two estimates • Stratification deals with the imbalance
  13. 13. An Equivalent Approach • Create a dummy stratum variable: S = -1 if baseline dbp X = 95 mmHg, S = 1 if baseline dbp X = 105 mmHg • Regress dbp at outcome, Y, on treatment indicator T and on stratum indicator S • Estimate will be same as by stratification • If you want variance estimate to be exactly the same you need to include interaction also
  14. 14. An Equivalent Equivalent Approach • Regress Y on T and X rather than on T and S • This is called ANCOVA! • Note that S = (X - 100)/5 • Hence, this approach is equivalent to the previous one, which is equivalent to stratification, which is unbiased • On the other hand SACS is biased • Hence we have produced a counter-example
  15. 15. Conclusion • Contrary to what is often claimed there are cases where ANCOVA is unbiased but SACS is biased • No simple statement of the form “ANCOVA is more efficient but SACS is unbiased” is possible • In fact it is very difficult to imagine cases where SACS is the preferred analysis
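The counter-example can be reproduced with noise-free expected outcomes for the four cells of the trial. The sketch below assumes an additive model Y = alpha + beta·X + tau·T; the values of `alpha`, `beta` and `tau` are made up for illustration, and only the cell counts and baseline values come from the slides.

```python
import numpy as np


def stupid_trial_estimates(alpha=20.0, beta=0.6, tau=-5.0):
    """Stratified, ANCOVA and SACS estimates from expected cell outcomes.

    alpha, beta and tau are illustrative assumptions, not values from the talk.
    """
    # (baseline dbp X, treatment T with A=0 and B=1, number of patients)
    cells = [(95.0, 0.0, 300.0), (95.0, 1.0, 100.0),
             (105.0, 0.0, 100.0), (105.0, 1.0, 300.0)]
    x = np.array([c[0] for c in cells])
    t = np.array([c[1] for c in cells])
    n = np.array([c[2] for c in cells])
    y = alpha + beta * x + tau * t  # expected outcome in each cell

    # Stratification: B minus A within each stratum, then a simple average.
    strat = np.mean([y[(x == s) & (t == 1)][0] - y[(x == s) & (t == 0)][0]
                     for s in (95.0, 105.0)])

    # ANCOVA: least squares of Y on (1, T, X), weighting cells by patient count.
    w = np.sqrt(n)
    design = np.column_stack([np.ones_like(x), t, x])
    coef, *_ = np.linalg.lstsq(design * w[:, None], y * w, rcond=None)
    ancova = coef[1]

    # SACS: difference between arms in the mean change score Y - X.
    d = y - x
    sacs = (np.average(d[t == 1], weights=n[t == 1])
            - np.average(d[t == 0], weights=n[t == 0]))
    return strat, ancova, sacs


if __name__ == "__main__":
    print(stupid_trial_estimates())
```

With these assumed values the arms differ by 5 mmHg in mean baseline (102.5 versus 97.5), so the change-score comparison is off by (beta - 1) × 5 = -2, while stratification and ANCOVA both return tau.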
  16. 16. Lord’s Paradox: Baffling statisticians for over half a century
  17. 17. Lord’s Paradox • Lord, F.M. (1967) “A paradox in the interpretation of group comparisons”, Psychological Bulletin, 68, 304-305 • “A large university is interested in investigating the effects on the students of the diet provided in the university dining halls….Various types of data are gathered. In particular the weight of each student at the time of his arrival in September and his weight in the following June are recorded” • We shall consider this in the Wainer and Brown version (also considered by Pearl) in which there are two halls, each assigned a different one of the two diets being compared
  18. 18. Two Statisticians • Statistician One (say John): calculates the difference in weight (outcome minus baseline) for each hall; finds no significant difference between diets as regards this ‘change score’; concludes there is no evidence of a difference between diets • Statistician Two (say Jane): adjusts for initial weight as a covariate; finds a significant diet effect on adjusted weight; concludes there is a difference between diets
  20. 20. John’s analysis: comparing change scores
  21. 21. Jane’s analysis: comparing covariate adjusted scores
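The disagreement between the two statisticians can be reproduced on simulated data. This is a sketch under assumed conditions in the spirit of Lord's description: within each hall the weight distribution is the same in September and June, the halls differ in mean weight, and the within-hall regression slope of final on initial weight is `rho`. All numerical values and the function name are illustrative assumptions.

```python
import numpy as np


def lords_paradox(n=20000, means=(60.0, 70.0), sd=10.0, rho=0.5, seed=0):
    """Return (John's change-score estimate, Jane's ANCOVA estimate).

    Hypothetical setup: hall h has mean weight means[h] at both time points,
    common standard deviation sd and within-hall correlation rho.
    """
    rng = np.random.default_rng(seed)
    xs, ys, hs = [], [], []
    for hall, m in enumerate(means):
        x = m + sd * rng.standard_normal(n)
        # Final weight regresses towards the hall mean with slope rho.
        y = m + rho * (x - m) + sd * np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
        xs.append(x)
        ys.append(y)
        hs.append(np.full(n, float(hall)))
    x, y, h = np.concatenate(xs), np.concatenate(ys), np.concatenate(hs)

    # John: difference between halls in mean change score.
    d = y - x
    john = d[h == 1].mean() - d[h == 0].mean()

    # Jane: hall coefficient from regressing final weight on hall and initial weight.
    design = np.column_stack([np.ones_like(x), h, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    jane = coef[1]
    return john, jane


if __name__ == "__main__":
    print(lords_paradox())
```

Under these assumptions John's estimate is close to zero, whereas Jane's is close to (1 - rho) times the difference in hall means. Both are computed correctly from the same data; the question is which, if either, estimates the effect of diet.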
  22. 22. Pearl’s causal calculus versus Lord’s Paradox: Is expectation enough? What about variance?
  23. 23. Judea Pearl, born 1936 • Israeli-American computer scientist and philosopher • Has developed powerful causal calculus based on distinguishing between seeing and doing • Explains Simpson’s paradox • Causality: Models, Reasoning and Inference (2000) • Has recently co-authored a popular book with Dana Mackenzie, The Book of Why, 2018
  24. 24. Pearl & Mackenzie, 2018 • Causal diagram with nodes D (Diet), W1 and WF • “However, for statisticians who are trained in ‘conventional’ (i.e. model-blind) methodology and avoid using causal lenses, it is deeply paradoxical” (The Book of Why, p217) • “In this diagram, W1, is a confounder of D and WF and not a mediator. Therefore, the second statistician would be unambiguously right here.” (The Book of Why, p216)
  25. 25. The Rothamsted School: A century of variance from ANOVA to Genstat® and back via General Balance
  26. 26. The Rothamsted School • RA Fisher (1890-1962): variance, ANOVA, randomisation, design, significance tests • Frank Yates (1902-1994): factorials, recovering inter-block information • John Nelder (1924-2010): general balance, computing, Genstat® • And Frank Anscombe, David Finney, Rosemary Bailey, Roger Payne etc
  27. 27. General Balance • An idea of John Nelder’s • Two papers in the Proceedings of the Royal Society, 1965, concerning “The analysis of randomized experiments with orthogonal block structure” • Block structure and the null analysis of variance • Treatment structure and the general analysis of variance
  28. 28. Basic Idea • Splits an experiment into two radically different components • The block structure, which describes the way that the experimental units are organised • The way that variation amongst units can be described • Null ANOVA, an idea of Anscombe’s • The treatment structure, which reflects the way that treatments are combined for the scientific purpose of the experiment
  29. 29. Design Driven Modelling • Together with a third piece of information, the design matrix, these determine the analysis of variance • Note that because both block and treatment structures can be hierarchical, such a design matrix is not, on its own, sufficient to derive an ANOVA • But together with Nelder’s block and treatment structure it is • For designs exhibiting general balance • This approach is incorporated in Genstat®
  30. 30. Genstat® Help File Example • "This is a field experiment to study the effects of nitrogen and sulphur on the yield of wheat with a randomized block design."
       Block  Plot    S    N   Yield
           1     1    0    0   0.750
           1     4    0  180   1.204
           1     3    0  230   0.799
           1    12   10    0   0.925
           1     5   10  180   1.648
           1     8   10  230   1.463
           1     7   20    0   0.654
           1     2   20  180   1.596
           1    10   20  230   1.594
           1    11   40    0   0.526
           1     9   40  180   1.672
           1     6   40  230   1.804
           2     8    0    0   0.503
           2    10    0  180   0.489
       etc
       BLOCKSTRUCTURE Block / Plot
       TREATMENTSTRUCTURE N * S
       ANOVA [PRINT=aov; FPROBABILITY=yes] Yield
  32. 32. How R is unsatisfactory
  33. 33. Genstat® versus Lord’s paradox: Rothamsted makes it simple
  34. 34. Start with the randomised equivalent • We suppose that the diets had been randomised to the two halls • Let us suppose there are 100 students per hall • Generate some data • See what Genstat® says about analysis • Note that it is a particular feature of Genstat® that it does not have to have outcome data to do this • Given the block and treatment structure alone it will give us a skeleton ANOVA • We start by ignoring the covariate
  35. 35. Code and output
       Code:
         BLOCKSTRUCTURE Hall/Student
         TREATMENTSTRUCTURE Diet
         ANOVA
       Output:
         Analysis of variance
         Source of variation       d.f.
         Hall stratum
           Diet                       1
         Hall.Student stratum       198
         Total                      199
       Genstat® points out the obvious (which, however, has been universally overlooked). There are no degrees of freedom to estimate the variability of the Diet estimate, which appears in the Hall stratum and not in the Hall.Student stratum
  36. 36. Consequences and further considerations • Using outcomes only we cannot analyse this experiment • We have no degrees of freedom to estimate the variance of any treatment estimate • We will return to baselines in due course • Let’s first consider how we could fix this ‘experiment’ • Let’s increase the number of halls, while keeping fixed the total number of students we shall follow • 20 halls • 10 halls per diet • 10 students followed per hall
  37. 37. Analysis of variance
         Source of variation       d.f.
         Hall stratum
           Diet                       1
           Residual                  18
         Hall.Student stratum       180
         Total                      199
       We now see that this experiment is analysable. Had we carried out an experiment of this form we would not need to use baseline values, but we could do. Let’s consider John’s and Jane’s estimators again. Would they produce valid analyses?
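The degrees of freedom in the two skeleton ANOVAs follow from simple counting, which the sketch below reproduces for a Hall/Student block structure with Diet applied to whole halls. It is a hand calculation for this one design, not a description of how Genstat® derives the table; the function name and dictionary keys are illustrative.

```python
def skeleton_anova(n_halls, students_per_hall, n_diets=2):
    """Degrees of freedom by stratum for Hall/Student blocks, Diet on halls."""
    diet_df = n_diets - 1
    hall_stratum_df = n_halls - 1                    # variation between halls
    hall_residual_df = hall_stratum_df - diet_df     # left over to estimate error
    student_df = n_halls * (students_per_hall - 1)   # students within halls
    total_df = n_halls * students_per_hall - 1
    return {
        "Diet": diet_df,
        "Hall residual": hall_residual_df,
        "Hall.Student": student_df,
        "Total": total_df,
    }


if __name__ == "__main__":
    print(skeleton_anova(2, 100))   # two halls, 100 students each
    print(skeleton_anova(20, 10))   # twenty halls, 10 students each
```

With two halls the Hall stratum has a single degree of freedom, which Diet uses up, leaving none for error. With twenty halls, 18 remain.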
  38. 38. The two estimators compared • John: change score. Formula: $(\bar{Y}_t - \bar{Y}_c) - (\bar{X}_t - \bar{X}_c)$. Consistent? Yes. Correct variance? Not without strong assumptions • Jane: ANCOVA. Formula: $(\bar{Y}_t - \bar{Y}_c) - \beta(\bar{X}_t - \bar{X}_c)$. Consistent? Yes. Correct variance? Not without strong assumptions • NB 1. As the number of halls goes to infinity, the second term for either estimator goes to zero. 2. Since the first term is the same, asymptotically they give the same answer. 3. The expectation of the first term, over all randomisations, is the effect of diet. 4. Thus, the two estimators are consistent. 5. The question is, which has the correct variance?
  39. 39. Adding covariates
       Parameter settings:
         Students per hall: 10
         Number of halls per diet: 10
         γ², variance between halls: 25.00
         σ², variance within halls: 16.00
         μ, average student weight: 75.00
         Δ, effect of diet: 3.00
         ρh, between halls: 0.70
         ρs, within halls: 0.50
       Analysis code, correct:
         BLOCKSTRUCTURE Hall/Student
         TREATMENTSTRUCTURE Diet
         COVARIATE Base
         ANOVA Weight
       Or incorrect:
         BLOCKSTRUCTURE Student
         etc
  40. 40. Correct block structure • 367.337 / 29.856 = 12.3 • (2.73 / 0.779)² = 12.3
  41. 41. Incorrect block structure • 376.84 / 6.253 = 60.3 • second ratio (operands not recoverable) = 60.4
  42. 42. We now understand the situation well enough to return to the two-hall case • The minimal requirements for the analyses to be valid are the following • Change-score (John): the between-hall component of variance must be zero having subtracted the baseline; the between-hall regression must be equal to 1 • ANCOVA (Jane): the between-hall component of variance must be zero having conditioned on the baselines; the regression between halls must be as predicted by the regression within
  43. 43. The Necessary Condition for ANCOVA to be Unbiased • $E[(Y_t - Y_c) - \beta(X_t - X_c)] = \tau$, that is, $E[Y_t - Y_c] - \tau = \beta\,E[X_t - X_c]$, where $\tau$ is the treatment effect • Or, in everyday language, the bias in the raw comparison at outcome should be $\beta$ times the bias at baseline, where $\beta$ is the individual regression effect • This requires a strong assumption that is untestable in the two-hall case • But in any case, the fact that the estimate is unbiased is not a guarantee that the estimate of the variance of the estimate is unbiased
  44. 44. Conclusions: Both particular and general
  45. 45. Lord’s Paradox • It is not true that ‘the second statistician would be unambiguously right’ • Additional untestable assumptions would be needed • This does not mean that the first statistician would be right • A lesson is that we need to consider the probability distribution of an inference • At least the variance and not just the expectation • I note, by the by, that this is a mistake made in developing the propensity score approach (see Senn, Graf and Caputo, 2007)
  46. 46. More generally • The Rothamsted approach is valuable but sadly neglected • Only implemented in Genstat® • An R package is in development by Cullis and Smith • All too often we take completely randomised designs as being the default analogy to observational data-sets • More complex designs may be appropriate • Such as cluster randomised • Even where we have identified the ‘correct’ confounders (perhaps with the help of causal calculus) we may be getting the standard errors wrong • Lessons for epidemiology? • Variances matter • It is an open question for me whether the causal calculus in its current form is adequate to deal with complex data-sets • Can it deal adequately with hierarchical structures?
  47. 47. References
       1. Nelder JA. The analysis of randomized experiments with orthogonal block structure I. Block structure and the null analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:147-62.
       2. Nelder JA. The analysis of randomized experiments with orthogonal block structure II. Treatment structure and the general analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:163-78.
       3. Lord FM. A paradox in the interpretation of group comparisons. Psychological Bulletin. 1967;68:304-5.
       4. Holland PW, Rubin DB. On Lord's Paradox. In: Wainer H, Messick S, editors. Principals of Modern Psychological Measurement. Hillsdale, NJ: Lawrence Erlbaum Associates; 1983.
       5. Liang KY, Zeger SL. Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhya: The Indian Journal of Statistics, Series B. 2000;62:134-48.
       6. Wainer H, Brown LM. Two statistical paradoxes in the interpretation of group differences: Illustrated with medical school admission and licensing data. American Statistician. 2004;58(2):117-23.
       7. Senn SJ. Change from baseline and analysis of covariance revisited. Statistics in Medicine. 2006;25(24):4334-44.
       8. Senn SJ, Graf E, Caputo A. Stratification for the propensity score compared with linear regression techniques to assess the effect of treatment or exposure. Statistics in Medicine. 2007;26(30):5529-44.
       9. Van Breukelen GJ. ANCOVA versus change from baseline had more power in randomized studies and more bias in nonrandomized studies. Journal of Clinical Epidemiology. 2006;59(9):920-5.
       10. Pearl J, Mackenzie D. The Book of Why. Basic Books; 2018.