SlideShare a Scribd company logo
1 of 45
Download to read offline
The role of background
assumptions in severity appraisal
@Lakens
If we have a statistics war,
what are the underlying
causes? Economic gain,
territorial gain, religion, or
nationalism?
If we have a statistics war,
what are the underlying
causes? Economic gain,
territorial gain, religion, or
(or philosophy of science)
Thoughts about how people
should use statistics follow
from a philosophy of science.
So what is the goal of
statistical inferences?
[Rosnow & Rosenthal, 1989]
[Neyman & Pearson, 1933]
[Frick, 1996]
[Taper & Lele, 2011]
Researchers like to make claims
in titles and conclusions. It’s how
we draw attention to our work
(and get others annoyed enough
so that they will try to prove us
wrong).
Claims can be correct or wrong.
If we would use a coin flip as a
methodological procedure to
make claims, we would be right
50% of the time.
To be right a bit more often,
hypothesis testing procedures
were developed that control
error rates in the long run.
[Cox, 2020]
After p < alpha, we should formally conclude:
“We claim there is an effect, while
acknowledging that if scientists make claims
using this methodological procedure, they will
be misled at most alpha% or beta% of the
time, which we deem acceptable. Let’s for the
foreseeable future (until new data emerges that
proves us wrong) assume our claim is correct.”
[Cox, 1958]
[Neyman, 1957]
When we “accept” or “reject” a
hypothesis in a Neyman-Pearson
approach to statistical inferences,
we do not communicate any belief
or conclusion about the substantive
hypothesis.
Instead, we utter a Popperian basic
statement, based on a prespecified
decision rule and given background
assumptions, that the observed data
reflect a certain state of the world. This
basic statement corroborates our
prediction, or not.
The alternative to using a
methodological procedure that makes
claims with a known error rate, is to
use a methodological procedure that
does not make claims, or makes
claims with unknown error rates.
Without p-values we either do
not build a collectively agreed
upon corpus of findings, or we
build a corpus of findings that,
given bias, is wrong more often
than we want.
One attractive property of this
methodological approach to make
scientific claims is that the scientific
community can collectively agree
upon the severity with which a
claim has been tested.
If we design a study with 99.9%
power for the smallest effect size
of interest and use a 0.1% alpha
level, everyone agrees the risk
of an erroneous claim is low.
Or can we? Is the evaluation of
severity always simply a matter of
quantified error rates? In
practice, it seems not.
As researchers have started to
preregister, it turns out they
often preregister uninformed
predictions, and change their
analysis plan.
Deviations can be improvements
(as Meehl says: Don’t make a
mockery of honest ad-hoccery).
Deviations trade guaranteed
error control against a subjective
evaluation of higher validity.
From a methodological
falsificationist philosophy, high
validity is conceptually strongly
related to severity. They might
even be the same thing.
The following is based on work with:
Aline Claessen
Wolf Vanpaemel
Noah van Dongen
Many of the crises in psychology
(measurement crisis,
generalizability crisis, theory
crisis) are specific instances of
problems with severity.
Let’s define severity as the ratio of the
probability of observing evidence (e)
given the hypothesis (H) and
background knowledge (B), to
observing evidence (e) given only the
background knowledge (B).
S(e,H,B) = Pr(e|H,B) / Pr(e|B).
[Popper, 1963]
Colloquially: Getting it right
when you are right and getting it
wrong when you are wrong.
In the last decade most of the focus
has been on statistics. Problems
related to low statistical power,
flexibility in the data analysis, and
publication bias all impact severity.
The severity of statistical tests
can be quantified. But error rates
depend on assumptions, and
different researchers can
disagree on these assumptions.
For example, researcher A
preregisters a Student’s t-test, while
researcher B believe the
homogeneity assumption will be
violated and wants to see a Welch’s
t-test.
The severity of statistical tests
can be quantified (if we agree on
assumptions). But how severely a
claim is tested does not only
depend on the test result.
The theory crisis is caused by a
lack of a clear specification of a
theory’s universality and
precision.
[Glöckner & Betsch, 2011; Oberauer & Lewandowsky, 2019]
Just as flexibility in the Type 1 error
rate reduces the probability of
being wrong when you are wrong,
lack of specificity reduces the
possibility of a theory being proven
wrong.
Performing a direct replication in
such cases is rather uninteresting
because the results would have
little to no consequence for our
evaluation of the theory.
The measurement/validity crisis
is a problem with the severity of
claims.
For example, lack of measurement
invariance increases Pr(e|B) relative
to Pr(e|H,B). It can become
relatively easy to observe a
difference, even if the theory is
false.
Similarly, measurement error and
lack of construct validity reduce the
severity of tests as you are less
likely to get it right when you’re
right or wrong when you’re wrong.
If we want severely tested claims,
researchers need to reach
agreement on assumptions
about data, measurement, and
theories.
This requires an extremely
systematic approach to
hypothesis testing (a systematic
replications framework) to narrow
down all auxiliary hypotheses.
[Uygun-Tunç & Tunç 2020]
Summary
For some researchers, science is
about making claims.
If we make claims, we want these
to be severely tested.
Summary
How severely as statistical test is
can be quantified (under
assumptions).
But claims depend on more than
the statistical result.
Summary
All of the crises in psychology (and
other sciences) can be viewed from a
severity perspective.
The solution is to systematically test
auxiliary hypotheses to increase test
severity.
Thanks!

More Related Content

Similar to The role of background assumptions in severity appraisal (

Replication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden ControversiesReplication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden Controversiesjemille6
 
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)jemille6
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testingpraveen3030
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxkarlhennesey
 
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy jemille6
 
Hypothesis TestingThe Right HypothesisIn business, or an.docx
Hypothesis TestingThe Right HypothesisIn business, or an.docxHypothesis TestingThe Right HypothesisIn business, or an.docx
Hypothesis TestingThe Right HypothesisIn business, or an.docxadampcarr67227
 
Types of research design experiments
Types of research design   experimentsTypes of research design   experiments
Types of research design experimentsrozy_kalsi
 
Answer questions Minimum 100 words each and reference (questions.docx
Answer questions Minimum 100 words each and reference (questions.docxAnswer questions Minimum 100 words each and reference (questions.docx
Answer questions Minimum 100 words each and reference (questions.docxamrit47
 
COVID and Causal Inference -- NAS CATS 6/2020
COVID and Causal Inference -- NAS CATS 6/2020COVID and Causal Inference -- NAS CATS 6/2020
COVID and Causal Inference -- NAS CATS 6/2020Ellie Murray
 
D. Lakens: Preregistration as a Tool to Evaluate the Severity of a Test
D. Lakens: Preregistration  as a Tool to Evaluate the Severity of a TestD. Lakens: Preregistration  as a Tool to Evaluate the Severity of a Test
D. Lakens: Preregistration as a Tool to Evaluate the Severity of a Testjemille6
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...jemille6
 
Causal inference for complex exposures: asking questions that matter, getting...
Causal inference for complex exposures: asking questions that matter, getting...Causal inference for complex exposures: asking questions that matter, getting...
Causal inference for complex exposures: asking questions that matter, getting...Ellie Murray
 

Similar to The role of background assumptions in severity appraisal ( (20)

Rm 3 Hypothesis
Rm   3   HypothesisRm   3   Hypothesis
Rm 3 Hypothesis
 
Replication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden ControversiesReplication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden Controversies
 
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
Frequentist Statistics as a Theory of Inductive Inference (2/27/14)
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Module7_RamdomError.pptx
Module7_RamdomError.pptxModule7_RamdomError.pptx
Module7_RamdomError.pptx
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
 
Russo Vub Seminar
Russo Vub SeminarRusso Vub Seminar
Russo Vub Seminar
 
Russo Vub Seminar
Russo Vub SeminarRusso Vub Seminar
Russo Vub Seminar
 
WF ED 540 Hypothesis Testing - 2018
WF ED 540 Hypothesis Testing - 2018WF ED 540 Hypothesis Testing - 2018
WF ED 540 Hypothesis Testing - 2018
 
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Hypothesis TestingThe Right HypothesisIn business, or an.docx
Hypothesis TestingThe Right HypothesisIn business, or an.docxHypothesis TestingThe Right HypothesisIn business, or an.docx
Hypothesis TestingThe Right HypothesisIn business, or an.docx
 
Types of research design experiments
Types of research design   experimentsTypes of research design   experiments
Types of research design experiments
 
Answer questions Minimum 100 words each and reference (questions.docx
Answer questions Minimum 100 words each and reference (questions.docxAnswer questions Minimum 100 words each and reference (questions.docx
Answer questions Minimum 100 words each and reference (questions.docx
 
COVID and Causal Inference -- NAS CATS 6/2020
COVID and Causal Inference -- NAS CATS 6/2020COVID and Causal Inference -- NAS CATS 6/2020
COVID and Causal Inference -- NAS CATS 6/2020
 
D. Lakens: Preregistration as a Tool to Evaluate the Severity of a Test
D. Lakens: Preregistration  as a Tool to Evaluate the Severity of a TestD. Lakens: Preregistration  as a Tool to Evaluate the Severity of a Test
D. Lakens: Preregistration as a Tool to Evaluate the Severity of a Test
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
 
Hypothesis Testing.pptx
Hypothesis Testing.pptxHypothesis Testing.pptx
Hypothesis Testing.pptx
 
Causal inference for complex exposures: asking questions that matter, getting...
Causal inference for complex exposures: asking questions that matter, getting...Causal inference for complex exposures: asking questions that matter, getting...
Causal inference for complex exposures: asking questions that matter, getting...
 
Russo Ihpst Seminar
Russo Ihpst SeminarRusso Ihpst Seminar
Russo Ihpst Seminar
 

More from jemille6

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”jemille6
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilismjemille6
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfjemille6
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfjemille6
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022jemille6
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inferencejemille6
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?jemille6
 
What's the question?
What's the question? What's the question?
What's the question? jemille6
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metasciencejemille6
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...jemille6
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Twojemille6
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...jemille6
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testingjemille6
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredgingjemille6
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probabilityjemille6
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)jemille6
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)jemille6
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...jemille6
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...jemille6
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...jemille6
 

More from jemille6 (20)

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdf
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdf
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inference
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?
 
What's the question?
What's the question? What's the question?
What's the question?
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metascience
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Two
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testing
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredging
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probability
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
 

Recently uploaded

MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 

Recently uploaded (20)

MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 

The role of background assumptions in severity appraisal (

  • 1. The role of background assumptions in severity appraisal @Lakens
  • 2. If we have a statistics war, what are the underlying causes? Economic gain, territorial gain, religion, or nationalism?
  • 3. If we have a statistics war, what are the underlying causes? Economic gain, territorial gain, religion, or (or philosophy of science)
  • 4. Thoughts about how people should use statistics follow from a philosophy of science. So what is the goal of statistical inferences?
  • 9. Researchers like to make claims in titles and conclusions. It’s how we draw attention to our work (and get others annoyed enough so that they will try to prove us wrong).
  • 10. Claims can be correct or wrong. If we would use a coin flip as a methodological procedure to make claims, we would be right 50% of the time.
  • 11. To be right a bit more often, hypothesis testing procedures were developed that control error rates in the long run.
  • 13. After p < alpha, we should formally conclude: “We claim there is an effect, while acknowledging that if scientists make claims using this methodological procedure, they will be misled at most alpha% or beta% of the time, which we deem acceptable. Let’s for the foreseeable future (until new data emerges that proves us wrong) assume our claim is correct.”
  • 16. When we “accept” or “reject” a hypothesis in a Neyman-Pearson approach to statistical inferences, we do not communicate any belief or conclusion about the substantive hypothesis.
  • 17. Instead, we utter a Popperian basic statement, based on a prespecified decision rule and given background assumptions, that the observed data reflect a certain state of the world. This basic statement corroborates our prediction, or not.
  • 18. The alternative to using a methodological procedure that makes claims with a known error rate, is to use a methodological procedure that does not make claims, or makes claims with unknown error rates.
  • 19. Without p-values we either do not build a collectively agreed upon corpus of findings, or we build a corpus of findings that, given bias, is wrong more often than we want.
  • 20. One attractive property of this methodological approach to make scientific claims is that the scientific community can collectively agree upon the severity with which a claim has been tested.
  • 21. If we design a study with 99.9% power for the smallest effect size of interest and use a 0.1% alpha level, everyone agrees the risk of an erroneous claim is low.
  • 22. Or can we? Is the evaluation of severity always simply a matter of quantified error rates? In practice, it seems not.
  • 23. As researchers have started to preregister, it turns out they often preregister uninformed predictions, and change their analysis plan.
  • 24. Deviations can be improvements (as Meehl says: Don’t make a mockery of honest ad-hoccery). Deviations trade guaranteed error control against a subjective evaluation of higher validity.
  • 25. From a methodological falsificationist philosophy, high validity is conceptually strongly related to severity. They might even be the same thing.
  • 26. The following is based on work with: Aline Claessen Wolf Vanpaemel Noah van Dongen
  • 27. Many of the crises in psychology (measurement crisis, generalizability crisis, theory crisis) are specific instances of problems with severity.
  • 28. Let’s define severity as the ratio of the probability of observing evidence (e) given the hypothesis (H) and background knowledge (B), to observing evidence (e) given only the background knowledge (B). S(e,H,B) = Pr(e|H,B) / Pr(e|B). [Popper, 1963]
  • 29. Colloquially: Getting it right when you are right and getting it wrong when you are wrong.
  • 30. In the last decade most of the focus has been on statistics. Problems related to low statistical power, flexibility in the data analysis, and publication bias all impact severity.
  • 31. The severity of statistical tests can be quantified. But error rates depend on assumptions, and different researchers can disagree on these assumptions.
  • 32. For example, researcher A preregisters a Student’s t-test, while researcher B believe the homogeneity assumption will be violated and wants to see a Welch’s t-test.
  • 33. The severity of statistical tests can be quantified (if we agree on assumptions). But how severely a claim is tested does not only depend on the test result.
  • 34. The theory crisis is caused by a lack of a clear specification of a theory’s universality and precision. [Glöckner & Betsch, 2011; Oberauer & Lewandowsky, 2019]
  • 35. Just as flexibility in the Type 1 error rate reduces the probability of being wrong when you are wrong, lack of specificity reduces the possibility of a theory being proven wrong.
  • 36. Performing a direct replication in such cases is rather uninteresting because the results would have little to no consequence for our evaluation of the theory.
  • 37. The measurement/validity crisis is a problem with the severity of claims.
  • 38. For example, lack of measurement invariance increases Pr(e|B) relative to Pr(e|H,B). It can become relatively easy to observe a difference, even if the theory is false.
  • 39. Similarly, measurement error and lack of construct validity reduce the severity of tests as you are less likely to get it right when you’re right or wrong when you’re wrong.
  • 40. If we want severely tested claims, researchers need to reach agreement on assumptions about data, measurement, and theories.
  • 41. This requires an extremely systematic approach to hypothesis testing (a systematic replications framework) to narrow down all auxiliary hypotheses. [Uygun-Tunç & Tunç 2020]
  • 42. Summary For some researchers, science is about making claims. If we make claims, we want these to be severely tested.
  • 43. Summary How severely as statistical test is can be quantified (under assumptions). But claims depend on more than the statistical result.
  • 44. Summary All of the crises in psychology (and other sciences) can be viewed from a severity perspective. The solution is to systematically test auxiliary hypotheses to increase test severity.