The document summarizes recent criticisms of the use of p-values and significance testing in research. It discusses how influential researchers like Ioannidis and replicability studies have questioned whether most published findings are true. It also summarizes the American Statistical Association's statement on p-values, which advises that scientific conclusions should not be based solely on p-value thresholds. The document then provides background on the history of p-values, noting that the debate is not just between frequentist and Bayesian approaches, but between different Bayesian views. It concludes by advising that p<0.05 is a weak standard, replication is important, and researchers should follow the ASA's recommendations to avoid overreliance on p-values alone.
A comment in Nature, signed by over 800 researchers, called on scientists to rise up against statistical significance. This was followed by a special issue of The American Statistician aimed at halting the use of the term "statistically significant", and new guidelines for statistical reporting in the New England Journal of Medicine. These slides discuss the broader context of the "p-value crisis" and alternatives for communicating conclusions after statistical analyses.
Target audience: Medical researchers; Scientists involved in conducting or interpreting analyses and communicating the results of scientific research, as well as readers of scientific publications.
Learning objectives:
To understand the context of the reproducibility crisis in medical research.
To learn about problems with p-values and alternatives to report findings.
To understand how (not) to interpret significant and insignificant findings.
To learn how to communicate research findings in a modest, thoughtful, and transparent way.
The Seven Habits of Highly Effective Statisticians (Stephen Senn)
If you know why the title of this talk is extremely stupid, then you clearly know something about control, data and reasoning: in short, you have most of what it takes to be a statistician. If you have studied statistics then you will also know that a large amount of anything, and this includes successful careers, is luck.
In this talk I shall try to share some of my experiences of being a statistician, in the hope that it will help you make the most of whatever luck life throws at you. In so doing, I shall try my best to overcome the distorting influence of that easiest of sciences, hindsight. Without giving too much away, I shall be recommending that you read, listen, think, calculate, understand, communicate, and do. I shall give you some examples of what I think works and what I think doesn't.
In all of this you should never forget the power of negativity and also the joy of being able to wake up every day and say to yourself 'I love the smell of data in the morning'.
Personalised medicine: a sceptical view (Stephen Senn)
Some grounds for believing that the current enthusiasm about personalised medicine is exaggerated, is founded on poor statistics, and represents a disappointing loss of ambition.
Take it to the Limit: quantitation, likelihood, modelling and other matters (Stephen Senn)
Taking the example of estimating parameter values when some observations are below the limit of quantitation, I illustrate the utility of likelihood as a general framework for modelling. Some geometrical representations of this simple problem are considered, and the way that the method of maximum likelihood delivers not only an estimate but also an estimate of the reliability of that estimate is examined. I also discuss the importance of assumptions about how data come to be missing, and touch on some difficulties and limitations for more complex cases.
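The idea the abstract describes can be sketched in a few lines. This is my own minimal illustration, not the talk's method: it assumes normally distributed measurements, treats values below a limit of quantitation (LOQ) as left-censored, and lets the values below the LOQ contribute a probability term rather than a density term to the likelihood. The inverse Hessian from the optimiser illustrates how maximum likelihood also delivers a measure of the reliability of the estimate.

```python
# Sketch: maximum likelihood with observations below a limit of quantitation.
# Assumption: data are Normal(mu, sigma); values below the LOQ are left-censored.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(42)
loq = 1.0
raw = rng.normal(loc=1.5, scale=1.0, size=200)
observed = raw[raw >= loq]          # quantified values
n_censored = int(np.sum(raw < loq))  # only the count is known below the LOQ

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)        # parameterise to keep sigma positive
    # density contribution of the quantified values
    ll = norm.logpdf(observed, mu, sigma).sum()
    # probability contribution of each censored value: P(X < LOQ)
    ll += n_censored * norm.logcdf(loq, mu, sigma)
    return -ll

fit = minimize(neg_log_lik, x0=[np.mean(observed), 0.0])
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
# The inverse Hessian approximates the variance of the estimates:
# the likelihood gives us both the estimate and its reliability.
se_mu = np.sqrt(fit.hess_inv[0, 0])
```

Ignoring the censored values, or substituting LOQ/2 for them, would bias the estimates; the likelihood framework handles the censoring directly.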
How to combine results from randomised clinical trials on the additive scale with real world data to provide predictions on the clinically relevant scale for individual patients
When estimating sample sizes for clinical trials there are several different views that might be taken as to what definition and meaning should be given to the sought-for treatment effect. However, if the concept of a ‘minimally important difference’ (MID) does have relevance to interpreting clinical trials (which can be disputed) then its value cannot be the same as the ‘clinically relevant difference’ (CRD) that would be used for planning them.
A doubly pernicious use of the MID is as a means of classifying patients as responders and non-responders. Not only does such an analysis lead to an increase in the necessary sample size but it misleads trialists into making causal distinctions that the data cannot support and has been responsible for exaggerating the scope for personalised medicine.
In this talk these statistical points will be explained using a minimum of technical detail.
This module has been prepared for postgraduate medical students in any specialty. The last 10 questions are MCQs, which are very important for FCPS Part 1 (all subjects).
The replication crisis: are P-values the problem and are Bayes factors the solution? (StephenSenn2)
Today's posterior is tomorrow's prior. (Dennis Lindley [1])
It has been claimed that science is undergoing a replication crisis and that when looking for culprits, the cult of significance is the chief suspect. It has also been claimed that Bayes factors might provide a solution.
In my opinion, these claims are misleading and part of the problem is our understanding of the purpose and nature of replication, which has only recently been subject to formal analysis [2]. What we are or should be interested in is truth. Replication is a coherence, not a correspondence, requirement [3] and one that has a strong dependence on the size of the replication study [4].
Consideration of Bayes factors raises a puzzling question. Should the Bayes factor for a replication study be calculated as if it were the initial study? If the answer is yes, the approach is not fully Bayesian and furthermore the Bayes factors will be subject to exactly the same replication ‘paradox’ as P-values. If the answer is no, then in what sense can an initially found Bayes factor be replicated and what are the implications for how we should view replication of P-values?
A further issue is that little attention has been paid to false negatives and, by extension, to true negatives. Yet, as is well known from the theory of diagnostic tests, it is meaningless to consider the performance of a test in terms of false positives alone.
I shall argue that we are in danger of confusing evidence with the conclusions we draw, and that any reforms of scientific practice should concentrate on producing evidence that is as reliable as it can be qua evidence. There are many basic scientific practices in need of reform. Pseudoreplication [5], for example, and the routine destruction of information through dichotomisation [6] are far more serious problems than many matters of inferential framing that seem to have excited statisticians.
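The point about diagnostic tests can be made concrete with a hypothetical example (mine, not the abstract's): the positive predictive value of a test depends jointly on sensitivity, specificity, and prevalence, so a fixed false-positive rate on its own tells you almost nothing about how useful a positive result is.

```python
# Hypothetical illustration: identical false-positive rate, very different value
# of a positive result, depending on prevalence.
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same test (sensitivity 0.8, specificity 0.95, i.e. 5% false positives):
common = ppv(sensitivity=0.8, specificity=0.95, prevalence=0.20)  # about 0.80
rare = ppv(sensitivity=0.8, specificity=0.95, prevalence=0.01)    # about 0.14
```

In the rare-condition setting, most positives are false despite the unchanged false-positive rate, which is why judging a test by false positives alone is meaningless.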
References
1. Lindley DV. Bayesian statistics: A review. SIAM; 1972.
2. Devezer B, Navarro DJ, Vandekerckhove J, Ozge Buzbas E. The case for formal methodology in scientific reform. R Soc Open Sci. Mar 31 2021;8(3):200805. doi:10.1098/rsos.200805
3. Walker RCS. Theories of Truth. In: Hale B, Wright C, Miller A, eds. A Companion to the Philosophy of Language. John Wiley & Sons; 2017:532-553: chap 21.
4. Senn SJ. A comment on replication, p-values and evidence by S.N.Goodman, Statistics in Medicine 1992; 11:875-879. Letter. Statistics in Medicine. 2002;21(16):2437-44.
5. Hurlbert SH. Pseudoreplication and the design of ecological field experiments. Ecological monographs. 1984;54(2):187-211.
6. Senn SJ. Being Efficient About Efficacy Estimation. Statistics in Biopharmaceutical Research. 2013;5(3):204-210. doi:10.1080/19466315.2012.754726
Critical Appraisal of systematic review and meta-analysis articles (Dr. Majdi Al Jasim)
Critique of systematic review and meta analysis articles
This presentation is made to educate health care providers on how to do critical appraisal of systematic review and meta-analysis articles.
Minimisation is an approach to allocating patients to treatment in clinical trials that forces a greater degree of balance than does randomisation. Here I explain why I dislike it.
Superiority, Equivalence, and Non-Inferiority Trial Designs (Kevin Clauson)
http://bit.ly/bQKcGz This lecture was presented as part of the Drug Literature Evaluation course at Nova Southeastern University. Guided notes and an audience response system were used to augment the lecture. Context for my decision to share these slides can be found at the provided link.
How should one practice medicine: by providing ordinary care or the best available care? Cochrane systematic reviews help with this issue. This talk illustrates how Cochrane reviews help, with special focus on reproductive medicine.
Clinical trials are about comparability, not generalisability (StephenSenn2)
Lecture delivered at the September 2022 EFSPI meeting in Basle in which I argued that the patients in a clinical trial should not be viewed as being a representative sample of some target population.
Talk given at RSS 2016 Manchester
I consider the problems that the ASA faced in getting a P-value statement together, not in terms of the process, but by looking at the expressed opinions of 21 published commentaries on the agreed statement. I then trace the history of the development of P-values. I show that the perceived problem with P-values is not just one of a supposed inadequacy of frequentist statistics but reflects a struggle at the very heart of Bayesian inference. I conclude that replacing P-values by automatic Bayesian approaches is unlikely to abolish controversy. It may be better to try and embrace diversity than to pretend it is not there.
P Values and Replication: the problem is not what you think
Lecture at MRC Brain Science & Cognition, Cambridge 16 December 2015
Abstract
It has been claimed that there is a crisis of replication in science. Prominent amongst the many factors that have been fingered as being responsible is the humble and ubiquitous P-value. One journal has even gone so far as to ban all inferential statistics. However, it is one thing to banish measures of uncertainty and another to banish uncertainty from your measures. I shall claim that the apparent discrepancy between P-values and posterior probabilities is as much a discrepancy between two approaches to Bayesian inference as it is between frequentist and Bayesian frameworks, and that a further problem has been misunderstandings regarding predictive probabilities. I conclude that banning P-values won't make all published results repeatable, and that it is possibly undesirable that it should.
Stephen Senn slides: "'Repligate': reproducibility in statistical studies. What does it mean and in what sense does it matter?", presented May 23 at the session "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference" at the 2015 APS Annual Convention in NYC.
Views of the role of hypothesis falsification in statistical testing do not divide as cleanly between frequentist and Bayesian views as is commonly supposed. This can be shown by considering the two major variants of the Bayesian approach to statistical inference and the two major variants of the frequentist one.
A good case can be made that the Bayesian, de Finetti, just like Popper, was a falsificationist. A thumbnail view, which is not just a caricature, of de Finetti’s theory of learning, is that your subjective probabilities are modified through experience by noticing which of your predictions are wrong, striking out the sequences that involved them and renormalising.
On the other hand, in the formal frequentist Neyman-Pearson approach to hypothesis testing, you can, if you wish, shift conventional null and alternative hypotheses, making the latter the strawman and by ‘disproving’ it, assert the former.
The frequentist, Fisher, however, at least in his approach to testing of hypotheses, seems to have taken a strong view that the null hypothesis was quite different from any other, and that there was a strong asymmetry in the inferences that followed from the application of significance tests.
Finally, to complete a quartet, the Bayesian geophysicist Jeffreys, inspired by Broad, specifically developed his approach to significance testing in order to be able to ‘prove’ scientific laws.
By considering the controversial case of equivalence testing in clinical trials, where the object is to prove that ‘treatments’ do not differ from each other, I shall show that there are fundamental differences between ‘proving’ and falsifying a hypothesis and that this distinction does not disappear by adopting a Bayesian philosophy. I conclude that falsificationism is important for Bayesians also, although it is an open question as to whether it is enough for frequentists.
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science (jemille6)
D. Mayo discusses various disputes, notably the replication crisis in science, in the context of her just-released book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars.
Chapter 5, part 2: Sampling Distributions for Counts and Proportions (Binomial Distributions) (nszakir)
Topics: sampling distributions for counts and proportions; binomial distributions for sample counts; binomial distributions in statistical sampling; binomial mean and standard deviation; sample proportions; normal approximation for counts and proportions; the binomial formula.
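The quantities listed above can be shown in a short worked example (my own numbers, not taken from the slides): the mean and standard deviation of a binomial count and of the corresponding sample proportion, with the usual rule-of-thumb check for the normal approximation.

```python
# Worked example: binomial count X ~ Bin(n, p) and sample proportion p_hat = X/n.
import math

n, p = 400, 0.25
mean_count = n * p                      # E[X] = np = 100
sd_count = math.sqrt(n * p * (1 - p))   # sqrt(np(1-p)) ~ 8.66
mean_prop = p                           # E[p_hat] = p
sd_prop = math.sqrt(p * (1 - p) / n)    # sqrt(p(1-p)/n) ~ 0.0217
# Rule of thumb: the normal approximation is reasonable when
# np >= 10 and n(1-p) >= 10; here both are 100 and 300.
ok_to_approximate = (n * p >= 10) and (n * (1 - p) >= 10)
```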
Slides given for Deborah G. Mayo talk at Minnesota Center for Philosophy of Science at University of Minnesota on the ASA 2016 statement on P-values and Error Statistics
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods (NUI Galway)
Dr Milovan Krnjajic, School of Mathematics, NUI Galway, presented this inaugural workshop on Modern Statistical Analysis using Bayesian Methods as part of the launch of the Social Sciences Computing Hub at the Whitaker Institute, NUI Galway on 26th March 2013.
What is the significance of the p-value when reporting statistical analysis? Is there an alternative to Fisher's approach, and if so, what is it? These are some of the issues addressed here.
Making Sense of Statistics in HCI: part 2 - doing it (Alan Dix)
Doing it – if not p then what
http://alandix.com/statistics/course/
In this part we will look at the major kinds of statistical analysis methods:
* Hypothesis testing (the dreaded p!) – robust but confusing
* Confidence intervals – powerful but underused
* Bayesian stats – mathematically clean but fragile
None of these is a magic bullet; all need care and a level of statistical understanding to apply.
We will discuss how these are related, including the relationship between 'likelihood' in hypothesis testing and conditional probability as used in Bayesian analysis. There are common issues, including the need to clearly report numbers and tests/distributions used, avoiding cherry-picking, dealing with outliers, non-independent effects, and correlated features. However, there are also specific issues for each method.
Classic statistical methods used in hypothesis testing and confidence intervals depend on ideas of ‘worse’ for measures, which are sometimes obvious, sometimes need thought (one vs. two tailed test), and sometimes outright confusing. In addition, care is needed in hypothesis testing to avoid classic fails such as treating non-significant as no-effect and inflated effect sizes.
In Bayesian statistics different problems arise: the need to decide, in a robust and defensible manner, the expected likelihoods of different hypotheses before an experiment; and the dangers of common causes leading to inflated probability estimates due to a single initial fluke event or optimistic prior.
Crucially, while all methods have problems that need to be avoided, we will see how not using statistics at all can be far worse.
There are many questions one might ask of a clinical trial, ranging from what was the effect in the patients studied, to what might the effect be in future patients, via what was the effect in individual patients? The extent to which the answers to these questions are similar depends on various assumptions made, and in some cases the design used may not permit any meaningful answer to be given at all.
A related issue is confusion between randomisation, random sampling, linear model and true multivariate based modelling. These distinctions don’t matter much for some purposes and under some circumstances but for others they do.
A yet further issue is that causal analysis in epidemiology, which has brought valuable insights in many cases, has tended to stress point estimates and ignore standard errors. This has potentially misleading consequences.
An understanding of components of variation is key. Unfortunately, the development of two particular topics in recent years, evidence synthesis by the evidence-based medicine movement and personalised medicine by bench scientists, has paid scant attention to components of variation, to the questions being asked, or to both, resulting in confusion about many issues.
For instance, it is often claimed that numbers needed to treat indicate the proportion of patients for whom treatments work, that inclusion criteria determine the generalisability of results and that heterogeneity means that a random effects meta-analysis is required. None of these is true. The scope for personalised medicine has very plausibly been exaggerated and an important cause of variation in the healthcare system, physicians, is often overlooked.
I shall argue that thinking about questions is important.
The response to the COVID-19 crisis by various vaccine developers has been extraordinary, both in terms of speed of response and the delivered efficacy of the vaccines. It has also raised some fascinating issues of design, analysis and interpretation. I shall consider some of these issues, taking as my example, five vaccines: Pfizer/BioNTech, AstraZeneca/Oxford, Moderna, Novavax, and J&J Janssen but concentrating mainly on the first two. Among matters covered will be concurrent control, efficient design, issues of measurement raised by two-shot vaccines and implications for roll-out, and the surprising effectiveness of simple analyses. Differences between the five development programmes as they affect statistics will be covered but some essential similarities will also be discussed.
The statistical revolution of the 20th century was largely concerned with developing methods for analysing small datasets. Student’s paper of 1908 was the first in the English literature to address the problem of second order uncertainty (uncertainty about the measures of uncertainty) seriously and was hailed by Fisher as heralding a new age of statistics. Much of what Fisher did was concerned with problems of what might be called ‘small data’, not only as regards efficient analysis but also as regards efficient design and in addition paying close attention to what was necessary to measure uncertainty validly.
I shall consider the history of some of these developments, in particular those that are associated with what might be called the Rothamsted School, starting with Fisher and having its apotheosis in John Nelder’s theory of General Balance and see what lessons they hold for the supposed ‘big data’ revolution of the 21st century.
Talk given at ISCB 2016 Birmingham
For indications and treatments where their use is possible, n-of-1 trials represent a promising means of investigating potential treatments for rare diseases. Each patient permits repeated comparison of the treatments being investigated and this both increases the number of observations and reduces their variability compared to conventional parallel group trials.
However, depending on whether the framework used for analysis is randomisation-based or model-based, puzzling differences in inference arise. This can easily be shown by starting on the one hand with the randomisation philosophy associated with the Rothamsted school of inference and building up the analysis through the block + treatment structure approach associated with John Nelder's theory of general balance (as implemented in GenStat®), or starting on the other hand with a plausible variance component approach through a mixed model. However, it can be shown that these differences are related not so much to modelling approach per se as to the questions one attempts to answer: ranging from testing whether there was a difference between treatments in the patients studied, to predicting the true difference for a future patient, via making inferences about the effect in the average patient.
This in turn yields interesting insight into the long-run debate over the use of fixed or random effect meta-analysis.
Some practical issues of analysis will also be covered in R and SAS®, in which languages some functions and macros to facilitate analysis have been written. It is concluded that n-of-1 trials hold great promise in investigating chronic rare diseases but that careful consideration of matters of purpose, design and analysis is necessary to make best use of them.
Acknowledgement
This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552. “IDEAL”
Clinical trials: quo vadis in the age of covid? (Stephen Senn)
A discussion of the role of clinical trials in the age of COVID. My contribution to the phastar 2020 life sciences summit https://phastar.com/phastar-life-science-summit
It is argued that when it comes to nuisance parameters an assumption of ignorance is harmful. On the other hand this raises problems as to how far one should go in searching for further data when combining evidence.
What should we expect from reproducibility? (Stephen Senn)
Is there really a reproducibility crisis and, if so, are P-values to blame? Choose any statistic you like and carry out two identical independent studies and report this statistic for each. In advance of collecting any data, you ought to expect that it is just as likely that statistic 1 will be smaller than statistic 2 as vice versa. Once you have seen statistic 1, things are not so simple, but if they are not so simple, it is because you have other information in some form. However, it is at least instructive that you need to be careful in jumping to conclusions about what to expect from reproducibility. Furthermore, the forecasts of good Bayesians ought to obey a Martingale property. On average you should be in the future where you are now but, of course, your inferential random walk may lead to some peregrination before it homes in on "the truth". But you certainly can't generally expect that a probability will get smaller as you continue. P-values, like other statistics, are a position, not a movement. Although often claimed, there is no such thing as a trend towards significance.
Using these and other philosophical considerations I shall try to establish what it is we want from reproducibility. I shall conclude that we statisticians should probably be paying more attention to checking that standard errors are being calculated appropriately and rather less to the inferential framework.
Sample size determination in clinical trials is considered from various ethical and practical perspectives. It is concluded that cost is a missing dimension and that the value of information is key.
An early and overlooked causal revolution in statistics was the development of the theory of experimental design, initially associated with the "Rothamsted School". An important stage in the evolution of this theory was the experimental calculus developed by John Nelder in the 1960s, with its clear distinction between block and treatment factors in designed experiments. This experimental calculus produced appropriate models automatically from more basic formal considerations but was, unfortunately, only ever implemented in Genstat®, a package widely used in agriculture but rarely so in medical research. In consequence its importance has not been appreciated and the approach of many statistical packages to designed experiments is poor. A key feature of the Rothamsted School approach is that identification of the appropriate components of variation for judging treatment effects is simple and automatic.
The impressive and more recent causal revolution in epidemiology, associated with Judea Pearl, seems to have no place for components of variation, however. By considering the application of Nelder's experimental calculus to Lord's Paradox, I shall show that solutions that have been proposed using the more modern causal calculus are problematic. I shall also show that lessons from designed clinical trials have important implications for the use of historical data and big data more generally.
In Search of Lost Infinities: What is the "n" in big data? (Stephen Senn)
In designing complex experiments, agricultural scientists, with the help of their statistician collaborators, soon came to realise that variation at different levels had very different consequences for estimating different treatment effects, depending on how the treatments were mapped onto the underlying block structure. This was a key feature of the Rothamsted approach to design and analysis and a strong thread running through the work of Fisher, Yates and Nelder, being expressed in topics such as split-plot designs, recovering inter-block information and fractional factorials. The null block-structure of an experiment is key to this philosophy of design and analysis. However, modern techniques for analysing experiments stress models rather than symmetries, and this modelling approach requires much greater care in analysis, with the consequence that you can easily make mistakes and often will.
In this talk I shall underline the obvious, but often unintentionally overlooked, fact that understanding variation at the various levels at which it occurs is crucial to analysis. I shall take three examples, an application of John Nelder’s theory of general balance to Lord’s Paradox, the use of historical data in drug development and a hybrid randomised non-randomised clinical trial, the TARGET study, to show that the data that many, including those promoting a so-called causal revolution, assume to be ‘big’ may actually be rather ‘small’. The consequence is that there is a danger that the size of standard errors will be underestimated or even that the appropriate regression coefficients for adjusting for confounding may not be identified correctly.
I conclude that an old but powerful experimental design approach holds important lessons for observational data about limitations in interpretation that mere numbers cannot overcome. Small may be beautiful, after all.
Unfortunately, some have interpreted Numbers Needed to Treat as indicating the proportion of patients on whom the treatment has had a causal effect. This interpretation is very rarely, if ever, necessarily correct. It is certainly inappropriate if based on a responder dichotomy. I shall illustrate the problem using simple causal models.
One also sometimes encounters the claim that the extent to which two distributions of outcomes overlap from a clinical trial indicates how many patients benefit. This is also false and can be traced to a similar causal confusion.
This year marks the 70th anniversary of the Medical Research Council randomised clinical trial (RCT) of streptomycin in tuberculosis led by Bradford Hill. This is widely regarded as a landmark in clinical research. Despite its widespread use in drug regulation and in clinical research more widely, and its high standing with the evidence based medicine movement, the RCT continues to attract criticism. I show that many of these criticisms are traceable to failure to understand two key concepts in statistics: probabilistic inference and design efficiency. To these methodological misunderstandings can be added the practical one of failing to appreciate that entry into clinical trials is not simultaneous but sequential.
I conclude that although randomisation should not be used as an excuse for ignoring prognostic variables, it is valuable and that many standard criticisms of RCTs are invalid.
The Rothamsted school meets Lord's paradox (Stephen Senn)
Lord's 'paradox' is a notoriously difficult puzzle that is guaranteed to provoke discussion, dissent and disagreement. Two statisticians analyse some observational data and come to radically different conclusions, each of which has acquired defenders over the years since Lord first proposed his puzzle in 1967. It features in the recent Book of Why by Pearl and Mackenzie, who use it to demonstrate the power of Pearl's causal calculus, obtaining a solution they claim is unambiguously right. They also claim that statisticians have failed to get to grips with causal questions for well over a century, in fact ever since Karl Pearson developed Galton's idea of correlation and warned the scientific world that correlation is not causation.
However, only two years before Lord published his paradox, John Nelder outlined a powerful causal calculus for analysing designed experiments based on a careful distinction between block and treatment structure. This represents an important advance in formalising the approach to analysing complex experiments that started with Fisher 100 years ago, when he proposed splitting variability using the square of the standard deviation, which he called the variance, continued with Yates, and has been developed since the 1960s by Rosemary Bailey, amongst others. This tradition might be referred to as The Rothamsted School. It is fully implemented in Genstat® but, as far as I am aware, not in any other package.
With the help of Genstat®, I demonstrate how the Rothamsted School would approach Lord's paradox and come to a solution that is not the same as the one reached by Pearl and Mackenzie, although given certain strong but untestable assumptions it would reduce to it. I conclude that the statistical tradition may have more to offer in this respect than has been supposed.
Presidents' invited lecture ISCB Vigo 2017
Discusses various issues to do with how randomised clinical trials should be analysed. See also https://errorstatistics.com/2017/07/01/s-senn-fishing-for-fakes-with-fisher-guest-post/
History of how and why a complex cross-over trial was designed to prove the equivalence of two formulations of a beta-agonist, and what the eventual results were. Presented at the Newton Institute, 28 July 2008. Warning: following the important paper by Kenward & Roger (Biostatistics, 2010), I no longer think the random effects analysis is appropriate, although, in fact, the results are pretty much the same as for the fixed effects analysis.
2. Acknowledgements
Thanks to the EMA for inviting me and to Olivier Collignon for organizing it.
This work is partly supported by the European Union's 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552, "IDEAL".
(c) Stephen Senn 2017
P value wars
3. Outline
• Basics
– The difference between probability and statistics
– A simple example
• Recent criticisms of P-values/significance
– Ioannidis
– Repligate and other psychology fights
– ASA statement
• A brief history of P-values
– The real (hidden) reason why P-values are so controversial
• Conclusions
5. Probability versus Statistics: an example
Probability theory
• Q. This die is fair; what is the probability that I will get two sixes in two rolls of the die?
• A. (1/6)² = 1/36
Statistical theory
• Q. I rolled this die twice and got two sixes. What is the probability that the die is fair?
• A. Nobody knows.
6. Probabilists versus Statisticians: the difference
Probabilists
• Are mathematicians
• Deal with direct probability statements
• Play a formal mathematical game
• Are involved in the divine
– God knows that the die is fair; let's work out the consequences
Statisticians
• Are scientists
• Deal with inverse probability statements
• Are dealing with the real world
• Are involved in the human
– We are down here trying to work out the mind of God
7. An Example
My compact disc (CD) player* allowed me to play tracks in sequential order by pressing play, or in random order by pressing shuffle.
One day I was playing the CD Hysteria by Def Leppard. This CD has 12 tracks. I thought that I had pressed the shuffle button, but the first track played was 'Women', which is the first track on the CD.
Q. What is the probability that I did, in fact, press the shuffle button as intended?
*I now have an iPod nano
8. That Heavy Metal Problem: The Bayesian Solution
We have two basic hypotheses:
1) I pressed shuffle.
2) I pressed play.
First we have to establish a so-called prior probability for these hypotheses: a probability before seeing the evidence.
Suppose that the probability that I press the shuffle button when I mean to press the shuffle button is 9/10. The probability of making a mistake and pressing the play button is then 1/10.
9. Next we establish probabilities of events given theories. These particular sorts of probabilities are referred to as likelihoods, a term due to RA Fisher (1890-1962).
If I pressed shuffle, then the probability that the first track will be 'Women' (W) is 1/12. If I pressed play, then the probability that the first track is W is 1.
For completeness (although it is not necessary for the solution) we consider the likelihoods had any other track apart from 'Women' (say X) been played.
If I pressed shuffle, then the probability of X is 11/12. If I pressed play, then this probability is 0.
10. We can put this together as follows

Hypothesis | Prior Probability P | Evidence | Likelihood L | P x L
Shuffle    | 9/10                | W        | 1/12         | 9/120
Shuffle    | 9/10                | X        | 11/12        | 99/120
Play       | 1/10                | W        | 1            | 12/120
Play       | 1/10                | X        | 0            | 0
TOTAL      |                     |          |              | 120/120 = 1
11. After seeing (hearing) the evidence, however, only two rows remain

Hypothesis | Prior Probability P | Evidence | Likelihood L | P x L
Shuffle    | 9/10                | W        | 1/12         | 9/120
Play       | 1/10                | W        | 1            | 12/120
TOTAL      |                     |          |              | 21/120
12. The probabilities of the two cases which remain do not add up to 1.
However, since these two cases cover all the possibilities which remain, their combined probability must be 1.
Therefore we rescale the individual probabilities to make them add to 1. We can do this without changing their relative value by dividing by their total, 21/120.
This has been done in the table below.
13. So we rescale by dividing by the total probability

Hypothesis | Prior Probability P | Evidence | Likelihood L | P x L  | Posterior Probability
Shuffle    | 9/10                | W        | 1/12         | 9/120  | (9/120)/(21/120) = 9/21
Play       | 1/10                | W        | 1            | 12/120 | (12/120)/(21/120) = 12/21
TOTAL      |                     |          |              | 21/120 | 21/21 = 1
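The calculation on the last few slides can be reproduced in a few lines. This is a minimal sketch (my own illustration, not part of the original slides) using exact fractions so that the numbers match the tables:

```python
from fractions import Fraction

# Prior probabilities for the two hypotheses
prior = {"shuffle": Fraction(9, 10), "play": Fraction(1, 10)}
# Likelihood of the evidence actually observed: first track was 'Women' (W)
likelihood = {"shuffle": Fraction(1, 12), "play": Fraction(1)}

# Unnormalised posterior: prior times likelihood
joint = {h: prior[h] * likelihood[h] for h in prior}
total = sum(joint.values())  # 21/120 (= 7/40)

# Rescale so that the remaining probabilities add to 1
posterior = {h: joint[h] / total for h in joint}

print(posterior["shuffle"])  # 3/7, i.e. 9/21
print(posterior["play"])     # 4/7, i.e. 12/21
```

Because Fraction keeps the arithmetic exact, the posterior probability of shuffle comes out as 9/21 = 3/7, agreeing with the table rather than a rounded decimal.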
14. What is the difference between the Bayesian probability and the P-value?
Bayesian
• It is the probability that the null hypothesis is true given the evidence
– It is my posterior probability that I pressed shuffle
• The value is 9/21
P-value
• It is the probability of the evidence given the null hypothesis
– Usually more extreme evidence must be included
• The value is 1/12
15. Extremism
• For many practical applications every result is unlikely
• Example: the probability that the mean difference in blood pressure is exactly 5 mmHg is effectively zero
• This has led to calculating the probability of a result as extreme or more extreme to act as the 'p-value'
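The tail-area idea is easy to make concrete. A minimal sketch (an illustration, not from the slides) computing a two-sided p-value for a z-statistic from the normal tail area, using only the standard library:

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic: the probability, under the
    null hypothesis, of a result as extreme or more extreme in either
    direction. math.erfc gives the normal tail area directly."""
    return math.erfc(abs(z) / math.sqrt(2))

print(round(two_sided_p(1.96), 3))  # 0.05
```

No single exact value of the statistic has appreciable probability under a continuous model, so it is this tail area, not the probability of the observed result itself, that is reported.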
16. NB Probability statements are not reversible
• Is the Pope a Catholic?
– Yes
• Is a Catholic the Pope?
– Probably not
• The probability of the evidence given the hypothesis is not the same as the probability of the hypothesis given the evidence
– As we saw from the CD player example
17. To sum up
• The Bayesian approach provides a complete and logical solution
• But it requires prior probabilities
• You can't just use arguments of symmetry
• Different people will come to different conclusions
19. Ioannidis (2005)
• Claimed that most published research findings are wrong
– By 'finding' he means a 'positive' result
• 4380 citations by 16 February 2017 according to Google Scholar

TPR = [R(1−β)/(1+R)] / [(α + R(1−β))/(1+R)] = R(1−β) / (α + R(1−β))
(TPR = True Positive Rate; R is the pre-study odds that the relationship is true)
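The formula is easy to explore numerically. In a minimal sketch (my own illustration; the parameter values below are hypothetical, not Ioannidis's), with pre-study odds R that a relationship is true, significance level α, and type II error rate β, the proportion of 'positive' findings that are true is R(1−β)/(α + R(1−β)):

```python
def true_positive_rate(R: float, alpha: float, beta: float) -> float:
    """Proportion of significant findings that are true:
    R(1 - beta) / (alpha + R(1 - beta)), where R is the pre-study
    odds that the investigated relationship is real."""
    power = 1 - beta
    return R * power / (alpha + R * power)

# Hypothetical scenario: 1-in-10 pre-study odds, alpha = 0.05, 50% power.
# Half of all 'positive' findings would then be false.
print(round(true_positive_rate(0.1, 0.05, 0.5), 6))  # 0.5
```

Even with conventional α = 0.05, low pre-study odds or low power quickly drive the proportion of true positives below one half, which is the core of Ioannidis's claim.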
21. The Crisis of Replication
In countless tweets… The "replication police" were described as "shameless little bullies," "self-righteous, self-appointed sheriffs" engaged in a process "clearly not designed to find truth," "second stringers" who were incapable of making novel contributions of their own to the literature, and—most succinctly—"assholes."
"Why Psychologists' Food Fight Matters: 'Important findings' haven't been replicated, and science may have to change its ways."
By Michelle N. Meyer and Christopher Chabris, Science
22. Many Labs Replication Project
Klein et al, Social Psychology, 2014
25. The ASA statement
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
27. Sympathy for Ron Wasserstein
28. In summary
• There has been mounting criticism of 'null-hypothesis significance testing' (NHST) and P-values in particular
• Some people think that 'science is broke'
• Some people claim that P-values are to blame
– They give 'significance' far too easily
• There is a lot of advice, much of it conflicting, flying around
29. An Example of the Problem
"If you want to avoid making a fool of yourself very often, do not regard anything greater than p < 0.001 as a demonstration that you have discovered something. Or, slightly less stringently, use a three-sigma rule."
David Colquhoun, Royal Society Open Science, 2014
"In general, P values larger than 0.01 should be reported to two decimal places, those between 0.01 and 0.001 to three decimal places; P values smaller than 0.001 should be reported as P < 0.001."
New England Journal of Medicine guidelines to authors
30. A BRIEF HISTORY OF P-VALUES
Did Fisher really teach scientists to fish for significance?
31. The collective noun for statisticians is "a quarrel"
John Tukey
32. A Common Story
• Scientists were treading the path of Bayesian reason
• Along came RA Fisher and persuaded them into a path of P-value madness
• This is responsible for a lot of unrepeatable nonsense
• We need to return them to the path of Bayesian virtue
• In fact the history is not like this, and understanding this is a key to understanding the problem
33. Fisher, Statistical Methods for Research Workers, 1925
37. The real history
• Scientists before Fisher were using tail area probabilities to calculate posterior probabilities
– This was following Laplace's use of uninformative prior distributions
• Fisher pointed out that this interpretation was unsafe and offered a more conservative one
• Jeffreys, influenced by CD Broad's criticism, was unsatisfied with the Laplacian framework and used a lump prior probability on a point hypothesis being true
– Etz and Wagenmakers have claimed that Haldane 1932 anticipated Jeffreys
• It is Bayesian Jeffreys versus Bayesian Laplace that makes the dramatic difference, not frequentist Fisher versus Bayesian Laplace
39. CD Broad 1887*-1971
• Graduated Cambridge 1910
• Fellow of Trinity 1911
• Lectured at St Andrews & Bristol
• Returned to Cambridge 1926
• Knightbridge Professor of Philosophy 1933-1953
• Interested in epistemology and psychic research
*NB Harold Jeffreys born 1891
41. What Jeffreys concluded
• If you have an uninformative prior distribution, the probability of a precise hypothesis is very low
• It will remain low even if you have lots of data consistent with it
• You need to allocate a solid lump of probability that it is true
• Nature has decided, other things being equal, that simpler hypotheses are more likely
43. Why the difference?
• Imagine a point estimate of two standard errors
• Now consider the likelihood ratio for a given value of the parameter under the alternative to one under the null
– Dividing hypothesis (smooth prior): for any given value θ = θ′, compare to θ = −θ′
– Plausible hypothesis (lump prior): for any given value θ = θ′, compare to θ = 0
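The contrast can be made concrete with a small numerical illustration (my own, not from the slides), assuming a normal likelihood with the estimate expressed in standard-error units:

```python
import math

def normal_likelihood(x: float, theta: float) -> float:
    """Density of the observed estimate x when the parameter is theta
    (standard error taken as 1)."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)

x = 2.0  # a point estimate two standard errors from zero

# Dividing hypothesis (smooth prior): theta = 2 versus theta = -2
lr_dividing = normal_likelihood(x, 2.0) / normal_likelihood(x, -2.0)
# Plausible hypothesis (lump prior): theta = 2 versus theta = 0
lr_lump = normal_likelihood(x, 2.0) / normal_likelihood(x, 0.0)

print(round(lr_dividing))  # e^8, about 2981
print(round(lr_lump, 2))   # e^2, about 7.39
```

Against θ = −θ′ the evidence looks overwhelming; against the sharp null θ = 0 the same data yield only a modest likelihood ratio, which is why the lump prior leads to far more conservative conclusions than the smooth one.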
44. In summary
• The major disagreement is not between P-values and Bayes using informative prior distributions
• It's between two Bayesian approaches
– Using uninformative prior distributions
– Using highly informative ones
• The conflict is not going to go away by banning P-values
• There is no automatic Bayesianism
– You have to do it for real
46. What you should know
• P = 0.05 is a weak standard of evidence
• Requiring two trials is a good idea
– NB If you replace them by a single larger trial or a meta-analysis, ask for P = 1/800
– Variation in results from trial to trial is natural
• Don't just rely on P-values
• Bayes is harder to do than many people think
• Pay attention to the ASA advice
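The 1/800 figure can be checked directly, on the reading that the two-trials rule means two independent trials, each significant two-sided at 0.05, with effects in the same direction. A quick sketch of that arithmetic:

```python
# Each trial: two-sided alpha = 0.05, i.e. 0.025 in each tail.
# Probability, under the null, that two independent trials are both
# significant with effects in the same direction:
p_two_trials = 0.025 ** 2 + 0.025 ** 2  # both positive or both negative
print(round(p_two_trials, 10))  # 0.00125, i.e. 1/800
```

So a single trial or meta-analysis offered in place of the conventional two-trial requirement should be held to roughly P = 0.00125, not P = 0.05.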
47. In Conclusion: The Solid Six?
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
48. However
"Proponents of the 'Bayesian revolution' should be wary of chasing yet another chimera: an apparently universal inference procedure. A better path would be to promote both an understanding of the various devices in the 'statistical toolbox' and informed judgment to select among these."
Gigerenzer and Marewski, Journal of Management, Vol. 41 No. 2, February 2015, 421-440