The document discusses lessons from small experiments and the Rothamsted School approach to experimental design and analysis. It provides three key lessons:
1) Variances matter - if you cannot estimate variances precisely, you do not know how to interpret your results or make inferences. The Rothamsted approach matches the analysis to the experimental design to properly account for variances.
2) Experimental designs should eliminate sources of variation that can be controlled, like blocking by centers. This allows the analysis to focus on remaining uncontrolled variations.
3) Lord's paradox arises because some analyses, like comparing change scores, do not adjust for important baseline covariates, while other analyses do adjust and find significant effects. Proper analysis depends on recognising the design's block structure: with diets assigned at the hall level, the effective sample size is the number of halls rather than the number of students, and the covariate-adjusted answer rests on strong, untestable assumptions.
Clinical trials: quo vadis in the age of covid? Stephen Senn
A discussion of the role of clinical trials in the age of COVID. My contribution to the phastar 2020 life sciences summit https://phastar.com/phastar-life-science-summit
The statistical revolution of the 20th century was largely concerned with developing methods for analysing small datasets. Student’s paper of 1908 was the first in the English literature to address the problem of second order uncertainty (uncertainty about the measures of uncertainty) seriously and was hailed by Fisher as heralding a new age of statistics. Much of what Fisher did was concerned with problems of what might be called ‘small data’, not only as regards efficient analysis but also as regards efficient design and in addition paying close attention to what was necessary to measure uncertainty validly.
I shall consider the history of some of these developments, in particular those that are associated with what might be called the Rothamsted School, starting with Fisher and having its apotheosis in John Nelder’s theory of General Balance and see what lessons they hold for the supposed ‘big data’ revolution of the 21st century.
The Rothamsted school meets Lord's paradox. Stephen Senn
Lord's 'paradox' is a notoriously difficult puzzle that is guaranteed to provoke discussion, dissent and disagreement. Two statisticians analyse some observational data and come to radically different conclusions, each of which has acquired defenders over the years since Lord first proposed his puzzle in 1967. It features in the recent Book of Why by Pearl and Mackenzie, who use it to demonstrate the power of Pearl's causal calculus, obtaining a solution they claim is unambiguously right. They also claim that statisticians have failed to get to grips with causal questions for well over a century, in fact ever since Karl Pearson developed Galton's idea of correlation and warned the scientific world that correlation is not causation.
However, only two years before Lord published his paradox John Nelder outlined a powerful causal calculus for analyzing designed experiments based on a careful distinction between block and treatment structure. This represents an important advance in formalizing the approach to analysing complex experiments that started with Fisher 100 years ago, when he proposed splitting variability using the square of the standard deviation, which he called the variance, continued with Yates and has been developed since the 1960s by Rosemary Bailey, amongst others. This tradition might be referred to as The Rothamsted School. It is fully implemented in Genstat® but, as far as I am aware, not in any other package.
With the help of Genstat®, I demonstrate how the Rothamsted School would approach Lord’s paradox and come to a solution that is not the same as the one reached by Pearl and McKenzie, although given certain strong but untestable assumptions it would reduce to it. I conclude that the statistical tradition may have more to offer in this respect than has been supposed.
Personalised medicine: a sceptical view. Stephen Senn
Some grounds for believing that the current enthusiasm about personalised medicine is exaggerated, is founded on poor statistics and represents a disappointing loss of ambition.
Unfortunately, some have interpreted Numbers Needed to Treat as indicating the proportion of patients on whom the treatment has had a causal effect. This interpretation is very rarely, if ever, necessarily correct. It is certainly inappropriate if based on a responder dichotomy. I shall illustrate the problem using simple causal models.
One also sometimes encounters the claim that the extent to which two distributions of outcomes overlap from a clinical trial indicates how many patients benefit. This is also false and can be traced to a similar causal confusion.
Talk given at ISCB 2016 Birmingham
For indications and treatments where their use is possible, n-of-1 trials represent a promising means of investigating potential treatments for rare diseases. Each patient permits repeated comparison of the treatments being investigated and this both increases the number of observations and reduces their variability compared to conventional parallel group trials.
However, whether the framework used for analysis is randomisation-based or model-based produces puzzling differences in inference. This can easily be shown by starting on the one hand with the randomisation philosophy associated with the Rothamsted school of inference and building up the analysis through the block + treatment structure approach associated with John Nelder's theory of general balance (as implemented in Genstat®), or starting on the other hand with a plausible variance component approach through a mixed model. However, it can be shown that these differences are related not so much to the modelling approach per se as to the questions one attempts to answer: ranging from testing whether there was a difference between treatments in the patients studied, to predicting the true difference for a future patient, via making inferences about the effect in the average patient.
This in turn yields interesting insight into the long-run debate over the use of fixed or random effect meta-analysis.
Some practical issues of analysis will also be covered in R and SAS®, in which languages some functions and macros to facilitate analysis have been written. It is concluded that n-of-1 trials hold great promise in investigating chronic rare diseases but that careful consideration of matters of purpose, design and analysis is necessary to make best use of them.
Acknowledgement
This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552. “IDEAL”
The Seven Habits of Highly Effective Statisticians. Stephen Senn
If you know why the title of this talk is extremely stupid, then you clearly know something about control, data and reasoning: in short, you have most of what it takes to be a statistician. If you have studied statistics then you will also know that a large amount of anything, and this includes successful careers, is luck.
In this talk I shall try to share some of my experiences of being a statistician in the hope that it will help you make the most of whatever luck life throws you. In so doing, I shall try my best to overcome the distorting influence of that easiest of sciences, hindsight. Without giving too much away, I shall be recommending that you read, listen, think, calculate, understand, communicate, and do. I shall give you some examples of what I think works and what I think doesn't.
In all of this you should never forget the power of negativity and also the joy of being able to wake up every day and say to yourself 'I love the smell of data in the morning'.
In Search of Lost Infinities: What is the "n" in big data? Stephen Senn
In designing complex experiments, agricultural scientists, with the help of their statistician collaborators, soon came to realise that variation at different levels had very different consequences for estimating different treatment effects, depending on how the treatments were mapped onto the underlying block structure. This was a key feature of the Rothamsted approach to design and analysis and a strong thread running through the work of Fisher, Yates and Nelder, being expressed in topics such as split-plot designs, recovering inter-block information and fractional factorials. The null block-structure of an experiment is key to this philosophy of design and analysis. However, modern techniques for analysing experiments stress models rather than symmetries, and this modelling approach requires much greater care in analysis, with the consequence that you can easily make mistakes and often will.
In this talk I shall underline the obvious, but often unintentionally overlooked, fact that understanding variation at the various levels at which it occurs is crucial to analysis. I shall take three examples, an application of John Nelder’s theory of general balance to Lord’s Paradox, the use of historical data in drug development and a hybrid randomised non-randomised clinical trial, the TARGET study, to show that the data that many, including those promoting a so-called causal revolution, assume to be ‘big’ may actually be rather ‘small’. The consequence is that there is a danger that the size of standard errors will be underestimated or even that the appropriate regression coefficients for adjusting for confounding may not be identified correctly.
I conclude that an old but powerful experimental design approach holds important lessons for observational data about limitations in interpretation that mere numbers cannot overcome. Small may be beautiful, after all.
When estimating sample sizes for clinical trials there are several different views that might be taken as to what definition and meaning should be given to the sought-for treatment effect. However, if the concept of a ‘minimally important difference’ (MID) does have relevance to interpreting clinical trials (which can be disputed) then its value cannot be the same as the ‘clinically relevant difference’ (CRD) that would be used for planning them.
A doubly pernicious use of the MID is as a means of classifying patients as responders and non-responders. Not only does such an analysis lead to an increase in the necessary sample size but it misleads trialists into making causal distinctions that the data cannot support and has been responsible for exaggerating the scope for personalised medicine.
In this talk these statistical points will be explained using a minimum of technical detail.
How to combine results from randomised clinical trials on the additive scale with real world data to provide predictions on the clinically relevant scale for individual patients
Views of the role of hypothesis falsification in statistical testing do not divide as cleanly between frequentist and Bayesian views as is commonly supposed. This can be shown by considering the two major variants of the Bayesian approach to statistical inference and the two major variants of the frequentist one.
A good case can be made that the Bayesian, de Finetti, just like Popper, was a falsificationist. A thumbnail view, which is not just a caricature, of de Finetti’s theory of learning, is that your subjective probabilities are modified through experience by noticing which of your predictions are wrong, striking out the sequences that involved them and renormalising.
On the other hand, in the formal frequentist Neyman-Pearson approach to hypothesis testing, you can, if you wish, shift conventional null and alternative hypotheses, making the latter the strawman and by ‘disproving’ it, assert the former.
The frequentist, Fisher, however, at least in his approach to the testing of hypotheses, seems to have taken a strong view that the null hypothesis was quite different from any other and that there was a strong asymmetry in the inferences that followed from the application of significance tests.
Finally, to complete a quartet, the Bayesian geophysicist Jeffreys, inspired by Broad, specifically developed his approach to significance testing in order to be able to ‘prove’ scientific laws.
By considering the controversial case of equivalence testing in clinical trials, where the object is to prove that ‘treatments’ do not differ from each other, I shall show that there are fundamental differences between ‘proving’ and falsifying a hypothesis and that this distinction does not disappear by adopting a Bayesian philosophy. I conclude that falsificationism is important for Bayesians also, although it is an open question as to whether it is enough for frequentists.
This year marks the 70th anniversary of the Medical Research Council randomised clinical trial (RCT) of streptomycin in tuberculosis led by Bradford Hill. This is widely regarded as a landmark in clinical research. Despite its widespread use in drug regulation and in clinical research more widely, and its high standing with the evidence based medicine movement, the RCT continues to attract criticism. I show that many of these criticisms are traceable to a failure to understand two key concepts in statistics: probabilistic inference and design efficiency. To these methodological misunderstandings can be added the practical one of failing to appreciate that entry into clinical trials is not simultaneous but sequential.
I conclude that although randomisation should not be used as an excuse for ignoring prognostic variables, it is valuable and that many standard criticisms of RCTs are invalid.
The response to the COVID-19 crisis by various vaccine developers has been extraordinary, both in terms of speed of response and the delivered efficacy of the vaccines. It has also raised some fascinating issues of design, analysis and interpretation. I shall consider some of these issues, taking as my examples five vaccines: Pfizer/BioNTech, AstraZeneca/Oxford, Moderna, Novavax and J&J/Janssen, but concentrating mainly on the first two. Among the matters covered will be concurrent control, efficient design, issues of measurement raised by two-shot vaccines and implications for roll-out, and the surprising effectiveness of simple analyses. Differences between the five development programmes as they affect statistics will be covered but some essential similarities will also be discussed.
What should we expect from reproducibility? Stephen Senn
Is there really a reproducibility crisis and, if so, are P-values to blame? Choose any statistic you like, carry out two identical independent studies and report this statistic for each. In advance of collecting any data, you ought to expect that it is just as likely that statistic 1 will be smaller than statistic 2 as vice versa. Once you have seen statistic 1, things are not so simple, but if they are not so simple it is because you have other information in some form. However, it is at least instructive that you need to be careful in jumping to conclusions about what to expect from reproducibility. Furthermore, the forecasts of good Bayesians ought to obey a martingale property: on average you should be in the future where you are now, but, of course, your inferential random walk may lead to some peregrination before it homes in on "the truth". But you certainly can't generally expect that a probability will get smaller as you continue. P-values, like other statistics, are a position, not a movement. Although often claimed, there is no such thing as a trend towards significance.
Using these and other philosophical considerations I shall try to establish what it is we want from reproducibility. I shall conclude that we statisticians should probably be paying more attention to checking that standard errors are being calculated appropriately and rather less to the inferential framework.
Presidents' invited lecture ISCB Vigo 2017
Discusses various issues to do with how randomised clinical trials should be analysed. See also https://errorstatistics.com/2017/07/01/s-senn-fishing-for-fakes-with-fisher-guest-post/
Sample size determination in clinical trials is considered from various ethical and practical perspectives. It is concluded that cost is a missing dimension and that the value of information is key.
Talk given at RSS 2016 Manchester
I consider the problems that the ASA faced in getting a P-value statement together, not in terms of the process, but by looking at the opinions expressed in 21 published commentaries on the agreed statement. I then trace the history of the development of P-values. I show that the perceived problem with P-values is not just one of a supposed inadequacy of frequentist statistics but reflects a struggle at the very heart of Bayesian inference. I conclude that replacing P-values by automatic Bayesian approaches is unlikely to abolish controversy. It may be better to try and embrace diversity than to pretend it is not there.
The Rothamsted School & the analysis of designed experiments. Stephen Senn
A historical account is given of the approach of "The Rothamsted School" to the analysis of designed experiments. The link between the way that experiments are designed and how they should be analysed is fundamental to this approach. The key figures are RA Fisher, Frank Yates and John Nelder.
There are many questions one might ask of a clinical trial, ranging from what was the effect in the patients studied to what might the effect be in future patients via what was the effect in individual patients? The extent to which the answer to these questions is similar depends on various assumptions made and in some cases the design used may not permit any meaningful answer to be given at all.
A related issue is confusion between randomisation, random sampling, linear model and true multivariate based modelling. These distinctions don’t matter much for some purposes and under some circumstances but for others they do.
A yet further issue is that causal analysis in epidemiology, which has brought valuable insights in many cases, has tended to stress point estimates and ignore standard errors. This has potentially misleading consequences.
An understanding of components of variation is key. Unfortunately, the development of two particular topics in recent years, evidence synthesis by the evidence based medicine movement and personalised medicine by bench scientists, has paid scant attention to components of variation, to the questions being asked, or to both, resulting in confusion about many issues.
For instance, it is often claimed that numbers needed to treat indicate the proportion of patients for whom treatments work, that inclusion criteria determine the generalisability of results and that heterogeneity means that a random effects meta-analysis is required. None of these is true. The scope for personalised medicine has very plausibly been exaggerated and an important cause of variation in the healthcare system, physicians, is often overlooked.
I shall argue that thinking about questions is important.
It is argued that when it comes to nuisance parameters an assumption of ignorance is harmful. On the other hand this raises problems as to how far one should go in searching for further data when combining evidence.
History of how and why a complex cross-over trial was designed to prove the equivalence of two formulations of a beta-agonist and what the eventual results were. Presented at the Newton Institute, 28 July 2008. Warning: following the important paper by Kenward & Roger (Biostatistics, 2010), I no longer think the random effects analysis is appropriate, although, in fact, the results are pretty much the same as for the fixed effects analysis.
Minimisation is an approach to allocating patients to treatment in clinical trials that forces a greater degree of balance than does randomisation. Here I explain why I dislike it.
The history of p-values is covered to try and shed light on a mystery: why did Student and Fisher agree numerically but disagree in terms of interpretation?
1. To Infinity and Beyond: lessons for big data from small experiments
Stephen Senn, Consultant Statistician, Edinburgh
Norwich, 20 September 2019
(c) Stephen Senn 2019
2. Acknowledgements
My thanks for the kind invitation.
This work is partly supported by the European Union's 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552, "IDEAL".
Work on historical controls is joint with Olivier Collignon, Anna Schritz and Riccardo Spezia and will appear in Statistical Methods in Medical Research.
3. An anniversary and an excuse
This is the 100th anniversary of Fisher's arrival at Rothamsted research station.
• RA Fisher 1890-1962
• Statistician at Rothamsted agricultural station 1919-1933 (started in October)
• Most influential statistician ever
• Also a major figure in evolutionary biology
• Developed theory of small sample inference and many modern concepts
  • Likelihood, variance, sufficiency, ANOVA
• Developed theory of experimental design
  • Blocking, replication, randomisation
• Genstat® is the only statistical package, as far as I know, that reflects block structure properly
4. Outline
• The Rothamsted School
• Why block structure matters
• An example analysed with the help of the Rothamsted School and Genstat®
• The TARGET study – a practical example
• Historical controls – a warning
• Lord’s paradox – a misunderstanding
• Conclusions and lessons
5. The Rothamsted School
RA Fisher (1890-1962): variance, ANOVA, randomisation, design, significance tests
Frank Yates (1902-1994): factorials, recovering inter-block information
John Nelder (1924-2010): general balance, computing, Genstat®
and Frank Anscombe, David Finney, Rosemary Bailey, Roger Payne etc.
6. A quote from John Nelder 1965, pp. 147-148 (quotation not reproduced in this extract)
7. Some jargon 1
• Outcomes
  • What we measure at the end of a trial and regard as being relevant to judging the effect of treatment
• Treatment
  • What the experimenter varies
  • Caution: sometimes we refer to treatment as being a factor that has two or more levels (for example beta-blocker or placebo) but sometimes, confusingly, we may refer to one of the levels as a treatment (for example, treatment versus placebo)
  • Analogy: geneticists sometimes use gene to mean locus and sometimes to mean allele (the gene for earwax or the gene for wet-type earwax)
• Covariate
  • Something else that may predict outcomes and can be measured before the trial starts
8. Some jargon 2
• Unit
  • That which is treated from the experimental point of view: usually patients, but it could be centres or it might be episodes in the life of a patient
• Allocation algorithm
  • The way that treatments are allocated to units (for example to patients)
• Blocking factor (or sometimes block)
  • A particular type of covariate that can be recognised and accounted for in the allocation process
  • For example, centre. We can choose to 'block' treatments by centre: we try to make sure that (say) equal numbers of patients within a given centre receive two treatments that are being compared
9. What does the Rothamsted approach do?
• Matches the allocation procedure to the analysis. You can either regard this as meaning
  • the randomisation you carried out guides the analysis
  • the analysis you intend guides the randomisation
  • or both
• Either way, the idea is to avoid inconsistency
  • Regarding something as being very important at the allocation stage but not at the analysis stage is inconsistent
• Permits you not only to take account of things seen but also to make an appropriate allowance for things unseen
• The way the treatment structure maps onto the block structure is key
10. Trial in asthma
Basic situation
• Two beta-agonists compared: Zephyr (Z) and Mistral (M)
• Block structure has several levels
• Different designs will be investigated
  • Cluster
  • Parallel group
  • Cross-over trial
• Each design will be blocked at a different level
• NB Each design will collect 6 x 4 x 2 x 7 = 336 measurements of Forced Expiratory Volume in one second (FEV1)

Block structure
Level          Number within higher level   Total number
Centre         6                            6
Patient        4                            24
Episodes       2                            48
Measurements   7                            336
11. Block structure
• Patients are nested within centres
• Episodes are nested within patients
• Measurements are nested within episodes
• Centres/Patients/Episodes/Measurements
(Diagram on slide; measurements not shown.)
12. Possible designs
• Cluster randomised
  • In any given centre all the patients receive either Zephyr (Z) or Mistral (M) in both episodes
  • Three centres are chosen at random to receive Z and three to receive M
• Parallel group trial
  • In each centre half the patients receive Z and half M in both episodes
  • Two patients per centre are randomly chosen to receive Z and two to receive M
• Cross-over design
  • Each patient is given both treatments
  • M is received in one episode and Z in another
  • The order of allocation, ZM or MZ, is random
(A code sketch of these three schemes follows.)
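As an illustration, here is a minimal Python sketch (mine, not from the original slides) of the three allocation schemes for the 6-centre, 4-patients-per-centre, 2-episode structure above:

import random

centres, patients, episodes = range(6), range(4), range(2)

def cluster_randomised(rng):
    # Whole centres are randomised: three get Z and three get M, in both episodes.
    alloc = ["Z"] * 3 + ["M"] * 3
    rng.shuffle(alloc)
    return {(c, p, e): alloc[c] for c in centres for p in patients for e in episodes}

def parallel_group(rng):
    # Within each centre, two patients get Z and two get M, in both episodes.
    result = {}
    for c in centres:
        alloc = ["Z", "Z", "M", "M"]
        rng.shuffle(alloc)
        result.update({(c, p, e): alloc[p] for p in patients for e in episodes})
    return result

def cross_over(rng):
    # Each patient receives both treatments, one per episode, in random order.
    result = {}
    for c in centres:
        for p in patients:
            order = ["Z", "M"] if rng.random() < 0.5 else ["M", "Z"]
            result.update({(c, p, e): order[e] for e in episodes})
    return result

rng = random.Random(2019)   # seed for reproducibility
for design in (cluster_randomised, parallel_group, cross_over):
    print(design.__name__, list(design(rng).items())[:2])

Each scheme randomises at a different level of the block structure, which is exactly what determines which variance components the analysis must carry.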
16. Null (skeleton) analysis of variance with Genstat®
Code:
BLOCKSTRUCTURE Centre/Patient/Episode/Measurement
ANOVA
(Output shown on the slide but not reproduced in this extract.)
17. Full (skeleton) analysis of variance with Genstat®
Additional code:
TREATMENTSTRUCTURE Design[]
ANOVA
(Here Design[] is a pointer with values corresponding to each of the three designs; output shown on the slide but not reproduced in this extract.)
18. The bottom line
• The approach recognises that things vary: centres, patients, episodes
• It does not require everything to be balanced
• Things that can be eliminated will be eliminated by design
  • The cross-over trial eliminates patients and centres
  • The parallel group trial eliminates centres
  • The cluster randomised trial eliminates none of these
• The measure of uncertainty produced by the analysis will reflect what cannot be eliminated
• This requires matching the analysis to the design
19. A key lesson from the Rothamsted school
• Variances matter
• If you can't say how precise your estimates are
  • You don't know what to do with them
  • You don't know if you need more information
  • You can't combine them with other information
  • You cannot make useful inferences
    • For example, Bayesians cannot update their prior distributions
  • You are in danger of mistaking the best supported for the probable
• To think about 'point estimates' only is a fundamental mistake
• Starting from this position leads to error
20. Variance matters
Points
• Which variances apply depends on the design
  • All three for the cluster trial
  • The last two for the parallel trial
  • The third only for the cross-over trial
• It is possible for the number of observations to go to infinity without the variance going to zero
• There is no 'design-free' n
• There is no design-free asymptotic inference

Variances
With between-centre variance $\sigma_C^2$ and $n_C$ centres, between-patient variance $\sigma_P^2$ and $n_P$ patients per centre, and within-patient variance $\sigma_E^2$ and $n_E$ episodes per patient:
1) $\sigma_C^2 / n_C$: between-centre contribution
2) $\sigma_P^2 / (n_C n_P)$: between-patient contribution
3) $\sigma_E^2 / (n_C n_P n_E)$: within-patient contribution
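To make this concrete, here is a minimal Python sketch; the layout numbers are the deck's (6 centres, 4 patients per centre, 2 episodes) but the variance-component values are illustrative assumptions of mine:

n_C, n_P, n_E = 6, 4, 2                     # centres, patients/centre, episodes/patient
var_C, var_P, var_E = 4.0, 2.0, 1.0         # assumed variance components (illustrative)

between_centre = var_C / n_C                # 1) carried by the cluster trial only
between_patient = var_P / (n_C * n_P)       # 2) carried by cluster and parallel trials
within_patient = var_E / (n_C * n_P * n_E)  # 3) carried by all three designs

print("cluster:   ", between_centre + between_patient + within_patient)
print("parallel:  ", between_patient + within_patient)
print("cross-over:", within_patient)

# However large n_P and n_E become, the cluster trial's variance stays
# above var_C / n_C: there is no 'design-free' n.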
21. What about the measurement level?
• I put this in to remind us that not everything you measure brings exploitable information to the same degree
• Randomisation between measurements was not possible in any of the schemes
• This makes it difficult to exploit measurements except in a summary way
  • For example, by averaging
• Warning: some repeated measures analyses are very strongly reliant on assumed model structure
With seven measurements $Y_{i1}, \ldots, Y_{i7}$ in episode $i$, summarised by their mean $\bar{Y}_i = \frac{1}{7}\sum_{m=1}^{7} Y_{im}$, we have $\operatorname{Var}(\bar{Y}_i) \le \operatorname{Var}(Y_{im})$, but usually $\operatorname{Var}(\bar{Y}_i) > \operatorname{Var}(Y_{im})/7$, because measurements within an episode are correlated.
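One way to see why averaging does not buy a full factor of seven: under an assumed exchangeable correlation $\rho$ between the seven measurements (my illustrative assumption, not a figure from the slides), $\operatorname{Var}(\bar{Y}) = \sigma^2(1 + 6\rho)/7$, which a short simulation confirms:

import numpy as np

rng = np.random.default_rng(2019)
sigma2, rho, m = 1.0, 0.5, 7    # illustrative variance and correlation

# Exchangeable covariance matrix for the 7 within-episode measurements
cov = sigma2 * (rho * np.ones((m, m)) + (1 - rho) * np.eye(m))
means = rng.multivariate_normal(np.zeros(m), cov, size=200_000).mean(axis=1)

print("empirical Var(mean):       ", means.var())
print("theory s2*(1+(m-1)*rho)/m: ", sigma2 * (1 + (m - 1) * rho) / m)
print("naive s2/m:                ", sigma2 / m)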
22. The TARGET study
• One of the largest studies ever run in osteoarthritis: 18,000 patients
• Randomisation took place in two sub-studies of equal size
  • Lumiracoxib versus ibuprofen
  • Lumiracoxib versus naproxen
• Purpose: to investigate the CV and GI tolerability of lumiracoxib
23. Baseline demographics, n (%)

                              Sub-Study 1                   Sub-Study 2
Demographic                   Lumiracoxib    Ibuprofen      Lumiracoxib    Naproxen
characteristic                n = 4376       n = 4397       n = 4741       n = 4730
Use of low-dose aspirin       975 (22.3)     966 (22.0)     1195 (25.1)    1193 (25.2)
History of vascular disease   393 (9.0)      340 (7.7)      588 (12.4)     559 (11.8)
Cerebrovascular disease       69 (1.6)       65 (1.5)       108 (2.3)      107 (2.3)
Dyslipidaemias                1030 (23.5)    1025 (23.3)    799 (16.9)     809 (17.1)
Nitrate use                   105 (2.4)      79 (1.8)       181 (3.8)      165 (3.5)
24. Baseline chi-square P-values

Demographic                   Sub-study    Treatment given     Treatment
characteristic                (DF=1)       Sub-study (DF=2)    (DF=2)
Use of low-dose aspirin       < 0.0001     0.94                0.0012
History of vascular disease   < 0.0001     0.07                < 0.0001
Cerebrovascular disease       0.0002       0.93                0.0208
Dyslipidaemias                < 0.0001     0.92                < 0.0001
Nitrate use                   < 0.0001     0.10                < 0.0001
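As an illustration of the Sub-study column, here is a minimal Python sketch using the low-dose aspirin counts from the previous table; that the DF=1 test compares the two sub-studies with treatment arms pooled is my assumption about how it was formed:

import numpy as np
from scipy.stats import chi2_contingency

# Low-dose aspirin users vs non-users, arms pooled within each sub-study
# (counts from the baseline demographics table; the pooling is an assumption).
sub1_users, sub1_n = 975 + 966, 4376 + 4397
sub2_users, sub2_n = 1195 + 1193, 4741 + 4730

table = np.array([[sub1_users, sub1_n - sub1_users],
                  [sub2_users, sub2_n - sub2_users]])
chi2, p, dof, _ = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p:.2e}")  # p well below 0.0001

The patients differ markedly between sub-studies, which is what makes pooling them against external data hazardous.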
25. Outcome variables, lumiracoxib only, n (%)

                              Sub-Study 1     Sub-Study 2
Outcome                       Lumiracoxib     Lumiracoxib
variable                      n = 4376        n = 4741
Total of discontinuations     1751 (40.01)    1719 (36.26)
CV events                     33 (0.75)       52 (1.10)
At least one AE               699 (15.97)     710 (14.98)
Any GI                        1855 (42.39)    1785 (37.65)
Dyspepsia                     1230 (28.11)    1037 (21.87)
26. Deviances and P-values for outcomes, lumiracoxib only, fitting Sub-study

Outcome variable              Deviance chi-square    P-value
Total of discontinuations     37.43                  < 0.0001
CV events                     0.92                   0.33
At least one AE               0.005                  0.94
Any GI                        0.004                  0.95
Dyspepsia                     16.85                  < 0.0001
27. Lessons from TARGET
• If you want to use historical controls you will have to work very hard
• You need at least two components of variation in your model
  • Between centre
  • Between trial
• And possibly a third
  • Between eras
• What seems like a lot of information may not be much
28. With $k$ historical studies, $n_h$ patients per historical study, $n_c$ patients in the current study, between-study variance $\gamma^2$ and between-patient variance $\sigma^2$, the variance of a comparison against historical controls takes the form
$$\frac{\sigma^2}{n_c} + \frac{\sigma^2}{k\,n_h} + \gamma^2\left(1 + \frac{1}{k}\right),$$
so that its limit as $n_c, n_h \to \infty$ is $\gamma^2(1 + 1/k) > 0$: adding patients cannot drive the variance to zero.
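A minimal numerical sketch of this limit, using the formula as reconstructed above with illustrative values of my own for $\gamma^2$, $\sigma^2$ and $k$:

# However many patients are recruited, the variance cannot fall below the
# between-study floor gamma2 * (1 + 1/k). Values below are illustrative only.
gamma2, sigma2, k = 0.5, 4.0, 3      # k historical studies

for n in (10, 100, 1_000, 10_000, 100_000):     # patients per study
    var = sigma2 / n + sigma2 / (k * n) + gamma2 * (1 + 1 / k)
    print(f"n = {n:>6d}: variance = {var:.4f}")

print("limit:", gamma2 * (1 + 1 / k))           # floor as n -> infinity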
29. Lord's Paradox
Lord, F.M. (1967) "A paradox in the interpretation of group comparisons", Psychological Bulletin, 68, 304-305.
"A large university is interested in investigating the effects on the students of the diet provided in the university dining halls... Various types of data are gathered. In particular the weight of each student at the time of his arrival in September and his weight in the following June are recorded."
We shall consider this in the Wainer and Brown version (also considered by Pearl & Mackenzie) in which there are two halls, each assigned a different one of two diets being compared.
30. Two Statisticians
(c) Stephen Senn 2019 31
Statistician One (say John)
• Calculates difference in weight (outcome − baseline) for each hall
• No significant difference between diets as regards this 'change score'
• Concludes no evidence of difference between diets
Statistician Two (say Jane)
• Adjusts for initial weight as a covariate
• Finds significant diet effect on adjusted weight
• Concludes there is a difference between diets
34. Pearl & Mackenzie, 2018
(c) Stephen Senn 2019 35
[Causal diagram: W1 (initial weight) → D (Diet); W1 → WF (final weight); D → WF]
"However, for statisticians who are trained in 'conventional' (i.e. model-blind) methodology and avoid using causal lenses, it is deeply paradoxical." The Book of Why, p. 217
"In this diagram, W1 is a confounder of D and WF and not a mediator. Therefore, the second statistician would be unambiguously right here." The Book of Why, p. 216
NB This diagram is adapted from theirs, which covers change rather than final weight.
35. Start with the randomised equivalent
• We suppose that the diets had been randomised to the two halls
• Let us suppose there are 100 students per hall
• Generate some data (a sketch follows after this slide)
• See what Genstat® says about the analysis
• Note that (as we have seen) it is a particular feature of Genstat® that it does not have to have outcome data to do this
• Given the block and treatment structure alone it will give us a skeleton ANOVA
• We start by ignoring the covariate
(c) Stephen Senn 2019 36
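A data-generation sketch for this randomised equivalent (Python; the weight distributions and variance components are my own illustrative choices): two halls of 100 students, September and June weights, a hall-level random effect and, for the null case, no diet effect.

import numpy as np

rng = np.random.default_rng(2019)
n = 100                       # students per hall

# Illustrative components: between-hall, between-student and within-student SDs
tau, omega, sigma = 2.0, 5.0, 1.5

hall_effect = rng.normal(0, tau, size=(2, 1))        # one random shift per hall
student = rng.normal(70, omega, size=(2, n))         # underlying student weights
base = student + rng.normal(0, sigma, (2, n))        # September weight
final = student + hall_effect + rng.normal(0, sigma, (2, n))  # June weight, no diet effect

diet = np.repeat([1, 2], n)   # diet 1 in hall 1, diet 2 in hall 2 (confounded by design)

The Genstat® analyses on the following slides need only the block and treatment structure of such data, not the outcome values themselves.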
36. Skeleton ANOVA
(c) Stephen Senn 2019 37
BLOCKSTRUCTURE Hall/Student
TREATMENTSTRUCTURE Diet
ANOVA
Analysis of variance
Source of variation d.f.
Hall stratum
Diet 1
Hall.Student stratum 198
Total 199
Genstat® points out the obvious (which, however, has been universally overlooked): there are no degrees of freedom to estimate the variability of the Diet estimate, which appears in the Hall and not the Hall.Student stratum.
37. Adding initial weight as a covariate
(c) Stephen Senn 2019 38
BLOCKSTRUCTURE Hall/Student
TREATMENTSTRUCTURE Diet
COVARIATE Base
ANOVA
Analysis of variance (adjusted for covariate)
Covariate: Base
Source of variation d.f.
Hall stratum
Diet 0
Covariate 1
Residual 0
Hall.Student stratum
Covariate 1
Residual 197
Total 199
Again Genstat® points out the obvious (which, however, has been universally overlooked): there are no degrees of freedom to estimate the treatment effect, because the single between-hall degree of freedom is needed to estimate the between-hall slope.
38. Conclusion
• The solution of The Book of Why is only correct if the between-hall regression is the same as the within-hall regression
• Even if that assumption is correct, the calculated standard error would almost certainly be wrong
• It would require the assumption that there is no variance between halls above and beyond that predicted by the variance between students
• To treat the n as the number of students rather than the number of halls is to commit the fallacy of pseudoreplication (Hurlbert, 1984)
• This is precisely what the Rothamsted School approach is designed to avoid
(c) Stephen Senn 2019 39
39. A simulation to illustrate the problem
• Generate 20 Lord's paradox cases (see the sketch after this slide)
• Each of the sort that is addressed in The Book of Why
• Each consists of two halls with diet 1 in one hall and diet 2 in the other
• Impose the strong assumption that The Book of Why implicitly
assumes
• Set between-hall variance to zero
• See if a consistent message is repeated from case to case among the 20 cases
• Repeat the exercise, violating the strong assumption
• Make between-hall variance large and allow for a between-hall covariance
• See if a consistent message is repeated from case to case among the 20 cases
(c) Stephen Senn 2019 40
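A sketch of such a simulation (Python; the distributional choices follow the description above but the specific values are mine): each case has two halls of 100 students and no true diet effect; the ANCOVA (Jane-style) estimate is computed case by case, first with zero between-hall variance and then with a large one.

import numpy as np

rng = np.random.default_rng(40)
n_cases, n = 20, 100

def one_case(tau: float) -> float:
    """One Lord's-paradox case with no true diet effect; tau = between-hall SD."""
    hall = rng.normal(0, tau, size=(2, 1))            # hall-level shifts of final weight
    student = rng.normal(70, 5, size=(2, n))
    base = student + rng.normal(0, 1.5, (2, n))
    final = student + hall + rng.normal(0, 1.5, (2, n))
    # Jane's analysis: regress final weight on a diet contrast and the baseline
    d = np.repeat([0.5, -0.5], n)                     # diet coded +/- 1/2 (one diet per hall)
    X = np.column_stack([np.ones(2 * n), d, base.ravel() - base.mean()])
    beta = np.linalg.lstsq(X, final.ravel(), rcond=None)[0]
    return beta[1]                                    # estimated diet effect

for tau, label in [(0.0, "assumption true "), (4.0, "assumption false")]:
    est = np.array([one_case(tau) for _ in range(n_cases)])
    print(f"{label}: estimates from {est.min():+.2f} to {est.max():+.2f}, SD {est.std(ddof=1):.2f}")

With tau = 0 the estimates agree from case to case; with a large between-hall component they lurch from one apparently 'significant' finding to another.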
41. The two cases compared
[Figure: simulation results for the two cases compared side by side. Simulation 1: the implicit assumption is true. Simulation 2: the implicit assumption is false.]
(c) Stephen Senn 2019 42
Critical values reflect Bonferroni values (one-sided) of $\tfrac{1}{40 \times 20} = \tfrac{1}{800}$.
42. Making ‘Hall’ part of the treatment structure
BLOCKSTRUCTURE Student
TREATMENTSTRUCTURE Hall+Diet
ANOVA
Null ANOVA with Hall as part of the
treatment structure
Analysis of variance
Source of variation d.f.
Student stratum
Hall 1
Residual 198
Total 199
Information summary
Aliased model terms
Diet
(c) Stephen Senn 2019 43
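The aliasing Genstat® reports can be verified directly with a little linear algebra (a numpy sketch of the hypothetical two-hall design): because each hall receives exactly one diet, the Diet column already lies in the column space of the Hall indicators, so appending it adds no rank.

import numpy as np

n = 100
hall = np.repeat([0, 1], n)                 # two halls of 100 students
diet = hall                                 # one diet per hall: completely confounded

H = np.column_stack([hall == 0, hall == 1]).astype(float)   # Hall indicator columns
D = (diet == 1).astype(float).reshape(-1, 1)                # Diet indicator column

# Rank is unchanged when the Diet column is appended: Diet is aliased with Hall
print(np.linalg.matrix_rank(H), np.linalg.matrix_rank(np.hstack([H, D])))  # -> 2 2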
43. A simple way of looking at it
Any effect of diet must be on the final weight. Thus the causal message of the diet is transmitted via $Y_2 - Y_1$. This estimates the diet effect. The question is, what else does it estimate? We correct for anything else by using the counterfactual estimate: "what would this difference show if there were no effect of diets?"

Statistician | Counterfactual            | Assumption
John         | $X_2 - X_1$               | Tracking over time
Jane         | $\beta_{YX}(X_2 - X_1)$   | Between-hall regression = within-hall regression

Senn 2006
(c) Stephen Senn 2019 44
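A small numerical sketch of the two counterfactuals (Python; the halls, slope and weights are invented for illustration, with the between-hall regression deliberately set equal to the within-hall one and no true diet effect): John subtracts the baseline difference itself, Jane subtracts $\beta_{YX}$ times it.

import numpy as np

rng = np.random.default_rng(44)
n = 100
hall = np.repeat([0, 1], n)

# Halls differ at baseline; final weight follows the SAME slope between and within halls
base = np.where(hall == 0, rng.normal(68, 5, 2 * n), rng.normal(72, 5, 2 * n))
final = 0.6 * base + rng.normal(28, 2, 2 * n)     # no diet effect

d_final = final[hall == 1].mean() - final[hall == 0].mean()
d_base = base[hall == 1].mean() - base[hall == 0].mean()

# Pooled within-hall slope of final on baseline
xc = base - np.where(hall == 1, base[hall == 1].mean(), base[hall == 0].mean())
yc = final - np.where(hall == 1, final[hall == 1].mean(), final[hall == 0].mean())
beta = (xc @ yc) / (xc @ xc)

print(f"John (change score): {d_final - d_base:+.2f}")          # counterfactual X2 - X1
print(f"Jane (ANCOVA)      : {d_final - beta * d_base:+.2f}")   # counterfactual beta*(X2 - X1)

Under these assumptions Jane's answer is near zero (the truth here) while John's is not; if the between-hall regression differed from the within-hall one, it would be Jane's estimate that drifted from the truth.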
44. Conclusions
• Local control is valuable
• Design matters
• Components of variation matter
• The Rothamsted approach brings insight
• Causal analysis needs to be developed further to include components of
variation
• Greenland and Mansournia (2015)
• McLaren and Nicholson (2019)
• Kim and Steiner (2019)
• Just because you are rich in data does not mean you are rich in information
• Be sceptical about “big data”
(c) Stephen Senn 2019 45
45. A warning as regards observational studies
(c) Stephen Senn 2019 46
Things that are a problem for controlled clinical trials are very rarely less of a problem for observational studies.
Propensity scores, Mendelian randomisation, causal analysis, blah, blah, blah are all very well, but if you aren't thinking about components of variation, you should be.
Variances matter. Assuming that, once confounders are adjusted for, your uncertainty is proportional to $1/\sqrt{n}$ is an all too standard error.
46. Finally, I leave you with this thought
(c) Stephen Senn 2019 47
A big-data analyst is an expert at producing misleading conclusions from huge datasets.
It is much more efficient to use a statistician, who can do the same with small ones.
47. References
(c) Stephen Senn 2019 48
1. Greenland S, Mansournia MA. Limitations of individual causal models, causal graphs, and ignorability assumptions, as illustrated by random confounding and design unfaithfulness. European Journal of Epidemiology. 2015;30(10):1101-1110.
2. Holland PW, Rubin DB. On Lord's paradox. In: Wainer H, Messick S, editors. Principles of Modern Psychological Measurement. Hillsdale, NJ: Lawrence Erlbaum Associates; 1983.
3. Hurlbert SH. Pseudoreplication and the design of ecological field experiments. Ecological Monographs. 1984;54(2):187-211.
4. Lord FM. A paradox in the interpretation of group comparisons. Psychological Bulletin. 1967;68:304-305.
5. Nelder JA. The analysis of randomised experiments with orthogonal block structure I. Block structure and the null analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:147-162.
6. Nelder JA. The analysis of randomised experiments with orthogonal block structure II. Treatment structure and the general analysis of variance. Proceedings of the Royal Society of London Series A. 1965;283:163-178.
7. Pearl J, Mackenzie D. The Book of Why. Basic Books; 2018.
8. Senn SJ. Change from baseline and analysis of covariance revisited. Statistics in Medicine. 2006;25(24):4334-4344.
9. Wainer H, Brown LM. Two statistical paradoxes in the interpretation of group differences: illustrated with medical school admission and licensing data. American Statistician. 2004;58(2):117-123.
48. Blogposts related to the talk
(c) Stephen Senn 2019 49
https://errorstatistics.com/2019/03/09/s-senn-to-infinity-and-beyond-how-big-are-your-data-really-guest-post/
To infinity and beyond: how big are your data, really?
https://errorstatistics.com/2018/11/11/stephen-senn-rothamsted-statistics-meets-lords-paradox-guest-post/
Stephen Senn: Rothamsted Statistics meets Lord’s Paradox
https://errorstatistics.com/2018/11/22/stephen-senn-on-the-level-why-block-structure-matters-and-its-relevance-to-lords-paradox-guest-post/
On the level. Why block structure matters and its relevance to Lord's paradox
https://errorstatistics.com/2019/08/02/s-senn-red-herrings-and-the-art-of-cause-fishing-lords-paradox-revisited-guest-post/
Red herrings and the art of cause fishing: Lord's Paradox revisited