This document provides 10 recommendations for transparent and consistent scientific research and publication. It recommends (1) stating the research question and purpose, (2) describing the data source and sample size, (3) presenting individual observations when possible, (4) reporting numbers, averages, and variation when data are aggregated, (5) describing the statistical methods used, (6) addressing the assumptions of those methods, (7) distinguishing statistical from practical significance and clarifying the hypotheses tested, (8) presenting confidence intervals rather than just p-values, (9) explaining departures from conventional significance levels, and (10) ensuring conclusions are consistent with the statistical rigor employed. The document emphasizes using clear and precise language to describe methodology, results, and conclusions.
Talk given at ISCB 2016 Birmingham
For indications and treatments where their use is possible, n-of-1 trials represent a promising means of investigating potential treatments for rare diseases. Each patient permits repeated comparison of the treatments being investigated and this both increases the number of observations and reduces their variability compared to conventional parallel group trials.
However, whether the framework used for analysis is randomisation-based or model-based produces puzzling differences in inferences. This can easily be shown by starting, on the one hand, with the randomisation philosophy associated with the Rothamsted school of inference and building up the analysis through the block + treatment structure approach associated with John Nelder’s theory of general balance (as implemented in GenStat®), or starting, on the other hand, with a plausible variance component approach through a mixed model. It can be shown, however, that these differences are related not so much to the modelling approach per se as to the questions one attempts to answer: ranging from testing whether there was a difference between treatments in the patients studied, to predicting the true difference for a future patient, via making inferences about the effect in the average patient.
This in turn yields interesting insight into the long-running debate over the use of fixed-effect or random-effects meta-analysis.
Some practical issues of analysis will also be covered in R and SAS®, in which languages some functions and macros to facilitate analysis have been written. It is concluded that n-of-1 trials hold great promise for investigating chronic rare diseases, but that careful consideration of matters of purpose, design and analysis is necessary to make best use of them.
Acknowledgement
This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552 (“IDEAL”).
Statistical Methods for Removing Selection Bias In Observational Studies
Nathan Taback
The slide deck is from a talk I delivered at a Dana Farber / Harvard Cancer Center outcomes seminar. It presents an overview of currently available statistical methods to remove bias in observational studies.
Medical research relies heavily on statistical inference for the generalization of findings and for assessing the uncertainty in applying these findings to new patients. SPSS and similar packages have made complex statistical calculations possible with little or no understanding of statistical inference. As a consequence, research findings are misunderstood, their presentation is confusing, and their reliability is massively overestimated.
Personalised medicine: a sceptical view
Stephen Senn
Some grounds for believing that the current enthusiasm about personalised medicine is exaggerated, is founded on poor statistics, and represents a disappointing loss of ambition.
There are many questions one might ask of a clinical trial, ranging from “what was the effect in the patients studied?” to “what might the effect be in future patients?”, via “what was the effect in individual patients?”. The extent to which the answers to these questions are similar depends on various assumptions made, and in some cases the design used may not permit any meaningful answer to be given at all.
A related issue is confusion between randomisation, random sampling, linear models and true multivariate-based modelling. These distinctions do not matter much for some purposes and under some circumstances, but for others they do.
Clinical trials: quo vadis in the age of COVID?
Stephen Senn
A discussion of the role of clinical trials in the age of COVID. My contribution to the phastar 2020 life sciences summit https://phastar.com/phastar-life-science-summit
An early and overlooked causal revolution in statistics was the development of the theory of experimental design, initially associated with the "Rothamsted School". An important stage in the evolution of this theory was the experimental calculus developed by John Nelder in the 1960s with its clear distinction between block and treatment factors in designed experiments. This experimental calculus produced appropriate models automatically from more basic formal considerations but was, unfortunately, only ever implemented in Genstat®, a package widely used in agriculture but rarely so in medical research. In consequence its importance has not been appreciated and the approach of many statistical packages to designed experiments is poor. A key feature of the Rothamsted School approach is that identification of the appropriate components of variation for judging treatment effects is simple and automatic.
The impressive, more recent causal revolution in epidemiology, associated with Judea Pearl, seems to have no place for components of variation, however. By considering the application of Nelder’s experimental calculus to Lord’s Paradox, I shall show that solutions that have been proposed using the more modern causal calculus are problematic. I shall also show that lessons from designed clinical trials have important implications for the use of historical data, and of big data more generally.
The Rothamsted school meets Lord's paradox
Stephen Senn
Lord’s ‘paradox’ is a notoriously difficult puzzle that is guaranteed to provoke discussion, dissent and disagreement. Two statisticians analyse some observational data and come to radically different conclusions, each of which has acquired defenders over the years since Lord first proposed his puzzle in 1967. It features in the recent Book of Why by Pearl and McKenzie, who use it to demonstrate the power of Pearl’s causal calculus, obtaining a solution they claim is unambiguously right. They also claim that statisticians have failed to get to grips with causal questions for well over a century, in fact ever since Karl Pearson developed Galton’s idea of correlation and warned the scientific world that correlation is not causation.
However, only two years before Lord published his paradox, John Nelder outlined a powerful causal calculus for analysing designed experiments based on a careful distinction between block and treatment structure. This represents an important advance in formalizing the approach to analysing complex experiments that started with Fisher 100 years ago, when he proposed splitting variability using the square of the standard deviation, which he called the variance, continued with Yates and has been developed since the 1960s by Rosemary Bailey, amongst others. This tradition might be referred to as The Rothamsted School. It is fully implemented in Genstat® but, as far as I am aware, not in any other package.
With the help of Genstat®, I demonstrate how the Rothamsted School would approach Lord’s paradox and come to a solution that is not the same as the one reached by Pearl and McKenzie, although given certain strong but untestable assumptions it would reduce to it. I conclude that the statistical tradition may have more to offer in this respect than has been supposed.
The statistical revolution of the 20th century was largely concerned with developing methods for analysing small datasets. Student’s paper of 1908 was the first in the English literature to address the problem of second order uncertainty (uncertainty about the measures of uncertainty) seriously and was hailed by Fisher as heralding a new age of statistics. Much of what Fisher did was concerned with problems of what might be called ‘small data’, not only as regards efficient analysis but also as regards efficient design and in addition paying close attention to what was necessary to measure uncertainty validly.
I shall consider the history of some of these developments, in particular those that are associated with what might be called the Rothamsted School, starting with Fisher and having its apotheosis in John Nelder’s theory of General Balance and see what lessons they hold for the supposed ‘big data’ revolution of the 21st century.
The response to the COVID-19 crisis by various vaccine developers has been extraordinary, both in terms of speed of response and the delivered efficacy of the vaccines. It has also raised some fascinating issues of design, analysis and interpretation. I shall consider some of these issues, taking as my examples five vaccines: Pfizer/BioNTech, AstraZeneca/Oxford, Moderna, Novavax, and J&J Janssen, but concentrating mainly on the first two. Among the matters covered will be concurrent control, efficient design, issues of measurement raised by two-shot vaccines and implications for roll-out, and the surprising effectiveness of simple analyses. Differences between the five development programmes as they affect statistics will be covered, but some essential similarities will also be discussed.
Biostatistics is widely used in clinical trials to collect, organize, describe, and interpret results, providing the evidence needed to take appropriate clinical decisions.
A basic lecture on literature types, the importance of primary literature (papers, articles), study designs, and the organization of a scientific paper. The p-value and the assessment of a new test are additional topics.
Published Research, Flawed, Misleading, Nefarious - Use of Reporting Guidelin...
John Hoey
Much published health sciences literature is misleading and biased. Efforts to correct this include the use of reporting guidelines: criteria for doing science and reporting the results properly. Conflicts of interest, and how to report them, are also discussed.
Practical Methods To Overcome Sample Size Challenges
nQuery
Watch the video at: https://www.statsols.com/webinars/practical-methods-to-overcome-sample-size-challenges
In this webinar hosted by Ronan Fitzpatrick - Head of Statistics and nQuery Lead Researcher at Statsols - we will examine some of the most common practical challenges you will experience while calculating sample size for your study. These challenges will be split into two categories:
1. Overcoming Sample Size Calculation Challenges
(Survival Analysis Example)
We will examine practical methods to overcome common sample size calculation issues by focusing on one of the more complex areas for sample size determination: survival analysis. We will cover difficulties and potential issues surrounding challenges such as:
Drop Out: How to deal with expected dropouts or censoring. We compare the simple loss-to-follow-up adjustment with integrating a dropout process into the sample size model.
Planning Uncertainty: How best to deal with the inevitable uncertainty at the planning stage? We examine how best to apply sensitivity analyses and Bayesian approaches to explore the uncertainty in your sample size calculations.
Choosing the Effect Size: Various approaches and interpretations exist for how to choose the effect size. We examine those contrasting interpretations, determine the best method, and consider how to deal with parameterization options.
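To make the dropout challenge concrete, the simple loss-to-follow-up adjustment mentioned above just inflates the computed sample size so that enough evaluable subjects remain. This is a minimal sketch under that assumption (the function name is mine, and this is not nQuery's implementation; for survival endpoints, modelling the dropout process explicitly usually gives a more realistic figure):

```python
import math

def inflate_for_dropout(n_required: int, dropout_rate: float) -> int:
    """Simple loss-to-follow-up adjustment: inflate a computed sample
    size so that, after the expected fraction of dropouts, the number
    of evaluable subjects still meets the target."""
    if not 0.0 <= dropout_rate < 1.0:
        raise ValueError("dropout_rate must be in [0, 1)")
    # divide by the expected retention fraction and round up
    return math.ceil(n_required / (1.0 - dropout_rate))

# e.g. 100 evaluable subjects needed, 20% expected dropout
print(inflate_for_dropout(100, 0.20))
```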
2. Overcoming Study Design Challenges
(Vaccine Efficacy Example)
The Randomised Controlled Trial (RCT) is considered the gold standard in trial design in drug development. However, there are often practical impediments which mean that adjustments or pragmatic approaches are needed for some trials and studies.
We will examine practical methods to overcome common study design challenges and how these affect your sample size calculations. In this webinar, we will use common issues in vaccine study design to examine difficulties such as:
Case-Control Analysis: We will examine how to deal with study constraints and with analyses conducted during an observational study.
Alternative Randomization Methods: How best to address randomization in your vaccine trial design when full randomization is difficult, expensive or impractical. We examine how sample size calculations are affected with cluster or Mendelian randomization.
Rare Events: How does an outcome being rare affect the types of study design and statistical methods chosen in your study?
Scientific research
A systematic investigation ... designed to develop or contribute to generalizable knowledge¹.
Generalizable: having predictive and reliable results.
When sampling errors don't exist or are irrelevant, qualitative research methods (e.g. case reporting) can be used. If sampling errors do exist, the unavoidable sampling uncertainty must be quantified (quantitative research) and presented, usually in terms of p-values and confidence intervals.
¹ The US National Science Foundation
Statistics
Medical researchers rely as never before on statistics for generating and testing hypotheses and for estimating the risks and benefits of old and new therapies. Journals can facilitate the writing and reading of research reports by implementing clear guidelines for manuscript preparation.
Milestones in scientific publication
1665 – the first scientific journals
1858 – the IMRAD structure
1957 – the abstract
1978 – the Vancouver convention (ICMJE)
1987 – the structured abstract
1997 – the CONSORT guidelines
2007 – the STROBE guidelines
1. Purpose
State the research question and the purpose of the study. Is the ambition to describe an observation, to generate hypotheses, or to test a pre-specified hypothesis?

Bad
We have shown that the success rate differs between two common techniques for autologous chondrocyte implantation.

Good
We designed an experiment to test the hypothesis of identical success rates of two common techniques for autologous chondrocyte implantation.
2. Data source
Describe the source of subjects, cadavers, animals, tissues, cell lines, etc., and how many of these units have been included in the study.

Bad
We collected 36 pieces of human cartilage.

Good
Three pieces of cartilage from each of twelve physically active men between 25 and 75 years of age, previously included as healthy controls in a clinical trial (ref.), were collected for this study.
3. Observations
When observations can be presented individually, either numerically or graphically, this should be preferred. With fewer than four observations it should be the rule.

[Bad and Good examples were shown graphically.]
4. Descriptions
When presenting data in aggregated form, always present the number of included observations as well as their average and dispersion. If repeated measurements or replicates are included, present both the number of independent samples and the total number of observations.

Bad
The mean change in total knee cartilage volume was 0.62 ml.

Good
The mean change in total knee cartilage volume was 0.62 ± 1.3 ml (n = 24).
5. Methods
Describe all statistical methods used in a statistics section. Use the original names of the methods; these are not always the same as the names used in software packages.

Bad
We used the independent groups t-test in the group comparison.

Good
We used Satterthwaite's t-test in the group comparison.
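The naming point can be made concrete: what many packages label only as the "unequal-variance t-test" is Satterthwaite's (Welch's) method. A minimal pure-Python sketch of the statistic and its approximate degrees of freedom (illustrative only; in practice use a vetted statistical library):

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch/Satterthwaite t statistic and approximate degrees of
    freedom for two independent samples with unequal variances."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)   # sample variances (n - 1 denominator)
    se2 = vx / nx + vy / ny             # squared standard error of the difference
    t = (mean(x) - mean(y)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df
```

With equal group sizes and equal variances, df reduces to the familiar nx + ny - 2 of the pooled (independent groups) t-test, which is why the two tests are so easily conflated.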
6. Assumptions
The validity of statistical results relies on certain assumptions being fulfilled. Were they?

The man of science has learned to believe in justification, not by faith, but by verification.
Thomas Huxley, 1866

Good
The ANOVA residuals were examined using a normal probability plot, which indicated a Gaussian distribution.
The homogeneity of variance was tested using Levene's test.
The assumption of proportional hazards was investigated using hypothesis tests of Schoenfeld residuals.
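Levene's test, cited in the Good example, reduces to a one-way ANOVA on the absolute deviations of each observation from its group mean. A minimal sketch of the test statistic, assuming the mean-centred version of the test (illustrative only; a statistical package also supplies the reference F distribution and p-value):

```python
from statistics import mean

def levene_W(groups):
    """Levene's statistic (mean-centred version) for homogeneity of
    variance: the one-way ANOVA F statistic computed on the absolute
    deviations of each observation from its group mean."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    # absolute deviations from each group's own mean
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    zbar_i = [mean(zi) for zi in z]            # per-group means of the deviations
    zbar = sum(sum(zi) for zi in z) / N        # grand mean of the deviations
    between = sum(len(zi) * (m - zbar) ** 2 for zi, m in zip(z, zbar_i))
    within = sum((v - m) ** 2 for zi, m in zip(z, zbar_i) for v in zi)
    return ((N - k) / (k - 1)) * between / within
```

Large values of the statistic, compared with the F(k - 1, N - k) distribution, indicate heterogeneous variances.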
7. Significance
A p-value describes the uncertainty in the generalization (the outcome of a hypothesis test); it has no relevance for the observed sample itself. Distinguish between practical and statistical significance. Clarify what hypotheses are tested.

Bad
There was no difference in mean systolic blood pressure between treated patients (190 mmHg) and controls (135 mmHg) (p = 0.06).

Good
In this study, treated patients had higher mean systolic blood pressure than controls, 190 vs. 135 mmHg. The observation, even if not statistically significant (p = 0.06), raises concern for future treatment.
8. Confidence
The uncertainty in the generalization of a finding is often better presented using the two limits of a confidence interval, indicating plausible values, than using one probability of a false positive conclusion.

Bad
The reproducibility was high (ICC = 0.91; p < 0.0001).

Good
The reproducibility was high (ICC = 0.91; 95% CI: 0.64 to 0.98).
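When no closed-form interval is at hand (as for some reproducibility coefficients), a percentile bootstrap gives a serviceable interval. A minimal stdlib-only sketch (the function name and defaults are mine; bias-corrected variants are preferable for skewed statistics):

```python
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_boot=5000, alpha=0.05, seed=1):
    """Percentile-bootstrap confidence interval for any statistic:
    resample the data with replacement, recompute the statistic,
    and take the alpha/2 and 1 - alpha/2 quantiles of the replicates."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    reps = sorted(
        stat([rng.choice(data) for _ in data])   # one same-size resample
        for _ in range(n_boot)
    )
    lo = reps[int(n_boot * (alpha / 2))]
    hi = reps[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lo, hi
```

Reporting both limits, rather than a lone p-value, shows the reader which effect sizes remain plausible.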
9. Multiplicity
All departures from the conventional levels of 5% significance and 95% confidence, like those arising from one-sided tests, Bonferroni corrections, and simultaneous confidence intervals, should be explained and motivated.

Bad
We have in this randomized trial shown that patients born under the astrological sign of Gemini benefit more from aspirin treatment than others.

Good
When multiplicity issues were taken into account, we were unable to find any interaction between astrological sign and benefit from aspirin treatment.
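The Bonferroni correction mentioned above is mechanically simple: with m hypotheses, test each at level alpha/m, or equivalently multiply each p-value by m. A minimal sketch (the function name is mine; less conservative procedures such as Holm's exist):

```python
def bonferroni(pvals, alpha=0.05):
    """Bonferroni correction: adjusted p-values (capped at 1) and
    per-hypothesis rejection decisions at family-wise level alpha."""
    m = len(pvals)
    adjusted = [min(1.0, p * m) for p in pvals]   # multiply each p by m
    reject = [p <= alpha / m for p in pvals]      # test each at alpha / m
    return adjusted, reject
```

Simultaneous confidence intervals follow the same logic: each interval is widened to level 1 - alpha/m so that all m intervals cover jointly with probability 1 - alpha.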
10. Claims
The level of statistical rigor (precision and addressed uncertainty issues) should be consistent with the author's purpose and conclusions.
What is all this fuss about confidence intervals and clinical significance?
Questions that can be answered using p-values:
- Can I be sure that there is an effect?
Questions that can be answered using confidence intervals:
- Can I be sure that there is an effect?
- Can I be sure that there isn't an effect?
- What effect is there?
P-values vs. confidence intervals
[Diagram: a p-value conveys statistical significance only (p < 0.05 or n.s.); a confidence interval conveys both statistical and clinical significance, since its limits can be compared with zero and with the smallest clinically significant effect.]
Statements that should be avoided
- “Statistical difference”
- “Significant difference”
- “There was no difference”
- “ns” and “p > 0.05”
- “p < 0.03”