Critically Interpreting Statistics
Statistics for Experimental Research
Simon Columbus
simon@simoncolumbus.com
2
Today
I. Interpreting statistical results
II. Evaluating experimental evidence
a. Not all published results are true
b. Making sense of the mess
III. Doing good and open research
a. Honest methods
b. Open research
3
INTERPRETING STATISTICAL
RESULTS
4
An Example: Gender Bias
Van der Lee & Ellemers, 2015
Applicants   Applications   Success rate
Male         1635           17.7%
Female       1188           14.9%
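A quick check of the aggregate table is a two-proportion z-test. This is a minimal sketch and not the authors' analysis. It assumes the number of successful applications can be reconstructed by rounding applications × success rate (about 289 men and 177 women), and it uses the normal approximation.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Assumption: success counts reconstructed from the slide's percentages
men_success = round(1635 * 0.177)    # about 289
women_success = round(1188 * 0.149)  # about 177
z, p = two_proportion_z(men_success, 1635, women_success, 1188)
print(f"z = {z:.2f}, p = {p:.3f}")
```

On these reconstructed counts the aggregate gap lands right around the conventional .05 threshold. Whether that aggregate comparison answers the right question is a separate matter.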
5
Simpson’s Paradox
Kievit et al., 2013; Bickel, Hammel, & O’Connell, 1975
6
Simpson’s Paradox
Kievit et al., 2013; Bickel, Hammel, & O’Connell, 1975
7
Simpson’s Paradox
• Women apply more in
some disciplines than in
others
• Disciplines with more
female applicants have
lower success rates
• Per discipline, women are
no less successful than
men
• But: gender bias may lie
elsewhere
– Why are “female”
disciplines less successful?
Van der Lee & Ellemers, 2015; Albers, 2015
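The mechanism behind Simpson's paradox can be shown with a small invented example. The numbers below are hypothetical and are not the NWO data. In each discipline women do slightly better, but most women apply to the discipline with the low success rate.

```python
# Hypothetical data: (discipline, gender) -> (applications, successes).
# Invented for illustration only.
data = {
    ("A", "men"): (800, 240),
    ("A", "women"): (200, 62),
    ("B", "men"): (200, 20),
    ("B", "women"): (800, 88),
}

def rate(applications, successes):
    return successes / applications

# Stratified comparison: success rate within each discipline
for discipline in ("A", "B"):
    men = rate(*data[(discipline, "men")])
    women = rate(*data[(discipline, "women")])
    print(f"Discipline {discipline}: men {men:.0%}, women {women:.0%}")

# Aggregate comparison: pool all disciplines for one gender
def pooled_rate(gender):
    apps = sum(a for (_, g), (a, _) in data.items() if g == gender)
    succ = sum(s for (_, g), (_, s) in data.items() if g == gender)
    return succ / apps

print(f"Pooled: men {pooled_rate('men'):.0%}, women {pooled_rate('women'):.0%}")
```

Women lead within both disciplines (31% vs 30%, 11% vs 10%) and still trail in the pooled figures (15% vs 26%). The pooled comparison mostly reflects where people apply.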
8
A Matter of Life and Death
• The case of Lucia de B.
– Dutch nurse
– Suspected of murdering
up to 9 patients
– Sentenced in 2004 for
five murders and two
attempts
• Statistical evidence
– Probability of that many
chance deaths during
one nurse’s shifts
http://www.kennislink.nl/publicaties/toch-statistiek-in-de-zaak-lucia-de-b; Gill, Groeneboom, & de Jong, 2010
9
p = .000000003
10
Interpreting statistical results
                     Shifts with incident   Shifts without incident   Total
Lucia on shift       9                      133                       142
Lucia not on shift   0                      887                       887
Total                9                      1020                      1029
• Data accuracy
– Corrected data: 5 incidents during Lucia’s shifts, 2 at other times
• Confounding variables
– Deaths cluster in time
– Stratify by day to account for time
http://www.kennislink.nl/publicaties/toch-statistiek-in-de-zaak-lucia-de-b; Gill, Groeneboom, & de Jong, 2010
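A one-sided Fisher exact test on the 2×2 table shows how much the data-accuracy point alone matters. This is a minimal sketch. It is not the calculation used in court or by Gill et al. and does not reproduce p = .000000003, which combined several wards. The "corrected" table is a hypothetical that keeps the slide's shift totals and only moves incidents.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left count under the
    hypergeometric null with all row and column margins held fixed.
    """
    row1 = a + b       # shifts worked by the nurse
    col1 = a + c       # shifts with an incident
    n = a + b + c + d  # all shifts
    total = comb(n, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / total

# Table as printed on the slide: all 9 incidents on Lucia's 142 shifts
p_slide = fisher_one_sided(9, 133, 0, 887)

# Hypothetical correction: 5 incidents on her shifts, 2 on other shifts
p_corrected = fisher_one_sided(5, 137, 2, 885)

print(f"slide table: p = {p_slide:.2e}")
print(f"corrected:   p = {p_corrected:.2e}")
```

Correcting the counts raises p by several orders of magnitude before any adjustment for clustering in time. Stratifying by day would require a stratified analysis that this sketch does not attempt.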
11
p = .038
12
First Conclusion
• Watch out for
confounding variables
• Be wary of Simpson’s
paradox
– Stratify data
• Statistics may not be
“wrong”, but they may
answer the wrong
question
13
EVALUATING EXPERIMENTAL
EVIDENCE, PART I: PROBLEMS
14
Ioannidis, 2005
15
Ioannidis, 2005; McNeil, 2011
16
Bem, 2011; Wagenmakers et al., 2011
17
Questionable Research Practices
• File drawer problem
– Publication bias
• HARK-ing
– “Hypothesising after
results are known”
• Optional stopping
– Collecting more data
until a result is
significant
“I confess, Oprah… I was doping
when writing my international
publications…”
Image: KU Leuven
18
The File Drawer
• Negative results are often
not published
– E.g., Bem (2011) did not
report all measured
variables
• Suppresses evidence
against the effect
Rosenthal, 1979; Bakker, van Dijk, & Wicherts, 2012; Franco, Malhotra, & Simonovits, 2014
19
The File Drawer: Political Science
• Time-sharing Experiments in the Social
Sciences
– Political science studies funded by the US National Science Foundation (NSF)
Franco, Malhotra, & Simonovits, 2014
                Never published   Written, unpublished   Published   % Published
Null result     31                7                      11          22
Mixed result    10                32                     43          51
Strong result   4                 31                     57          62
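The "% Published" column follows directly from the counts. Recomputing it is a quick sanity check on the table and makes the size of the gap explicit. The counts below are taken from the slide.

```python
# Counts from the slide (Franco, Malhotra, & Simonovits, 2014):
# (never published, written but unpublished, published)
tess = {
    "Null result": (31, 7, 11),
    "Mixed result": (10, 32, 43),
    "Strong result": (4, 31, 57),
}

def percent_published(never, unpublished, published):
    return 100 * published / (never + unpublished + published)

for outcome, counts in tess.items():
    print(f"{outcome:<14} {percent_published(*counts):5.1f}% published")

# Relative likelihood of publication: strong vs. null results
ratio = percent_published(*tess["Strong result"]) / percent_published(*tess["Null result"])
print(f"Strong vs. null: {ratio:.1f} times as likely to be published")
```

Strong results are roughly 2.8 times as likely as null results to be published.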
20
HARK-ing
• Researchers may run
studies, but only come up
with hypotheses after
looking at the data
– “Fishing” for p-values
– Many small, underpowered studies
– Inflates the type I error
rate
– Confuses exploratory and
confirmatory research
Kerr, 1998
21
Recap: Type I Error Rate
• The type I error rate is set by α
– α = .05: a 1 in 20 chance of falsely rejecting a true null hypothesis
• Multiple testing inflates the type I error rate
– 1 in 20 chance of a false positive per test
– With two independent tests, the chance of at least one false positive is almost 1 in 10 (1 − .95² ≈ .098), and it keeps growing with every additional test
• Questionable research practices increase the chance of
false positive findings
http://prefrontal.org/files/posters/Bennett-Salmon-2009.pdf
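The growth of the familywise error rate can be computed directly. The simulation checks the formula. This sketch assumes independent tests of true null hypotheses, for which p-values are uniform on [0, 1]. With correlated tests the inflation is smaller.

```python
import random

def familywise_error_rate(k, alpha=0.05):
    """P(at least one false positive) across k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 5, 10, 20):
    print(f"{k:2d} tests: {familywise_error_rate(k):.3f}")

def simulated_fwer(k, alpha=0.05, n_sims=100_000, seed=1):
    """Monte Carlo check: draw k uniform p-values per 'study' and count
    studies with at least one p < alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k))
        for _ in range(n_sims)
    )
    return hits / n_sims

print(f"simulated, 2 tests: {simulated_fwer(2):.3f}")
```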
22
HARK-ing: The Baby Factory
• “When a clear and interesting story could be
told about significant findings, the original
motivation was often abandoned. […] ‘You
want to know how it works? We have a bunch
of half-baked ideas. We run a bunch of
experiments. Whatever data we get, we
pretend that’s what we were looking for.’”
Peterson, 2016
23
Why HARK-ing?
• Small studies often have
low power
– Even if an effect exists, the
chance to identify it is low
– Average power in social
psychology around 50%
– I.e., even if there is an
effect, about every second
study will fail to detect it
• Gaining more power is
expensive
– Power does not increase
linearly with sample size
Cohen, 1962; Rossi, 1990; http://rpsychologist.com/d3/NHST/
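The diminishing returns of larger samples can be sketched with the normal approximation to the power of a two-sided, two-sample test. Both the approximation and the "medium" effect size d = 0.5 are assumptions for illustration. An exact t-test power calculation gives slightly lower values for small samples.

```python
from statistics import NormalDist

def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test for effect size d.

    Normal approximation: ignores the t distribution's heavier tails and
    the negligible chance of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    return z.cdf(noncentrality - z_crit)

for n in (10, 20, 40, 80, 160):
    print(f"n = {n:3d} per group: power = {approx_power_two_sample(0.5, n):.2f}")
```

Going from 40 to 80 participants per group raises power much more than going from 80 to 160, even though the second step costs twice as many participants.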
24
Optional Stopping
• Continue collecting data
until a significant result
is obtained
– n = 30, test, not
significant; n = 35, test,
not significant; n = 40,
test, significant, stop.
– Inflates the type I error
rate
– Can be done correctly
(Schönbrodt, 2016)
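A small simulation shows how peeking at the data inflates the type I error rate. It is a sketch under simplifying assumptions. The data come from N(0, 1), so the null is true, and a one-sample z-test with known SD is run instead of a t-test. The look schedule (30, 35, …, 100) mirrors the slide's example. The seed and number of simulations are arbitrary.

```python
import random

def false_positive_rate(looks, n_sims=2000, z_crit=1.96, seed=2016):
    """Share of simulated null experiments that reach |z| > z_crit at any look.

    Each experiment draws N(0, 1) data (true mean 0, known SD 1) and runs a
    one-sample z-test after each sample size in `looks`, stopping at the
    first significant result.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        total = 0.0
        n = 0
        for target in sorted(looks):
            while n < target:
                total += rng.gauss(0.0, 1.0)
                n += 1
            z = (total / n) * n ** 0.5  # mean / (SD / sqrt(n)) with SD = 1
            if abs(z) > z_crit:
                false_positives += 1
                break
    return false_positives / n_sims

fixed = false_positive_rate(looks=[100])
peeking = false_positive_rate(looks=range(30, 101, 5))
print(f"fixed n = 100:            {fixed:.3f}")
print(f"peek at 30, 35, ..., 100: {peeking:.3f}")
```

The fixed-n design stays near the nominal .05. Stopping at the first significant look pushes the false positive rate well above it. Correct sequential designs avoid this by adjusting the threshold for each look.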
25
Optional Stopping: The Baby Factory
“Rather than waiting for the results from a set
number of infants, experimenters began
‘eyeballing’ the data as soon as babies were run
and often began looking for statistical
significance after just 5 or 10 subjects. […] When
the preliminary data looked good, the test
continued. […] But when, after just a few
subjects, no significance was found, the original
protocol was abandoned and new variations
were developed.”
Peterson, 2016
26
Measuring Reproducibility
• Reproducibility Project:
Psychology
– 100 papers published in
3 journals in 2008
– One result from each
paper replicated once
– High-powered
replications
Open Science Collaboration, 2015
27
Measuring Reproducibility
About 40 out of 100 results were successfully replicated.
Open Science Collaboration, 2015
28
Measuring Reproducibility
Replication project in behavioural economics
• 11 out of 18 studies were replicated successfully.
Camerer et al., 2016
29
Interpreting (Non-) Replications
• A failure to replicate does
not mean the effect is not
real
– Replication may have low
power
– Even with high power, non-
replication is possible
– There may be differences
between original and
replication, e.g. cultural
variation
• Reproducibility projects
estimate the proportion
of non-replicable findings
Open Science Collaboration, 2015
30
Second Conclusion
• Many (key) results are
unreliable
– p-hacking and
publication bias distort
the scientific literature
• File drawer
• HARK-ing
• Optional stopping
– Reproducibility projects
indicate many false
positives in psychology
and economics
31
EVALUATING EXPERIMENTAL
EVIDENCE, PART II: SOLUTIONS
32
Second Conclusion
• Many (key) results are
unreliable
– p-hacking and
publication bias distort
the scientific literature
• File drawer
• HARK-ing
• Optional stopping
– Reproducibility projects
indicate many false
positives in psychology
and economics
• Science is becoming
more open and honest
– Aggregation of results
– Incentives for replication
33
Meta-Analyses
• Statistically summarise
results from many
separate studies
– Combine evidence for
and against an effect
Flore & Wicherts, 2015
34
Meta-Analyses
• Statistically summarise
results from many
separate studies
– Combine evidence for
and against an effect
• Susceptible to
publication bias
– Inflated effect size
estimates
– Can be detected with
funnel plots
Flore & Wicherts, 2015
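The simplest way to statistically summarise studies is a fixed-effect, inverse-variance weighted average. Random-effects models are more common in psychology, so this is one illustrative choice. The effect sizes and standard errors below are hypothetical. They mimic the pattern in which small, imprecise studies report larger effects.

```python
import math

def fixed_effect_meta(estimates, standard_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical studies: smaller (less precise) studies report larger effects
effects = [0.60, 0.45, 0.30, 0.12]
ses = [0.30, 0.22, 0.15, 0.06]
pooled, se = fixed_effect_meta(effects, ses)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled d = {pooled:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Suppose small studies with null results stayed in the file drawer. Then only the small studies with large effects enter the average, and the pooled estimate is pulled upward. Plotting each estimate against its standard error (a funnel plot) is how that asymmetry is usually spotted.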
35
The File Drawer, Unlocked
• PsychFileDrawer
– Repository of
unpublished replications
– Online repositories make
publication easier
http://www.psychfiledrawer.org/chart.php?target_article=33
36
Registered Replication Reports
• Replication of specific
effects
– Important effects
– Prior doubt about effects
• Independent replication
– Not involving the original
authors
– Often involving multiple
labs
• Pre-registered
– No publication bias
http://www.psychologicalscience.org/index.php/replication/ongoing-projects
37
Curate Science
http://curatescience.org/#sbh2008a
38
Students Can Contribute
• Student projects are
particularly suitable for
replication efforts
– Opportunity to learn
research practices
– Contribute to
improvement of science
Grahe et al., 2012; Frank & Saxe, 2012; King, 2006
39
Third Conclusion
• Meta-analyses can
summarise studies
statistically
– File drawer problem
– Online repositories make
publication of null results
easier
• Registered replication
reports
– Provide reliable estimates
– Eliminate publication bias
40
HONEST RESEARCH AND OPEN
SCIENCE
41
Honest Research and Open Science
• Purely confirmatory
research
– Pre-register all statistical
analyses
– Only claim registered
analyses as hypothesis
tests
– Use strong statistical
tests
– Openly share methods
and data
Wagenmakers et al., 2012
42
Pre-registration
• Pre-register analyses
– How are data going to be
collected?
– How many subjects are
going to be recruited?
– When are outliers
excluded?
https://www.socialscienceregistry.org/; http://egap.org/content/registration; http://ridie.3ieimpact.org/; http://osf.io
43
Exclusion Rules
When one subject makes
all the difference…
http://www.ted.com/talks/dan_ariely_beware_conflicts_of_interest
44
Pre-registration
• Pre-register analyses
– How are data going to be
collected?
– How many subjects are
going to be recruited?
– When are outliers
excluded?
– What statistical
techniques are going to
be used?
https://www.socialscienceregistry.org/; http://egap.org/content/registration; http://ridie.3ieimpact.org/; http://osf.io
45
Statistical Diversity
• 1 data set
– Do darker-skinned
football players get more
red cards?
– Four different leagues
• 29 teams of analysts
Silberzahn et al., 2015; http://www.nature.com/news/crowdsourced-research-many-hands-make-tight-work-1.18508
46
Statistical Diversity
Silberzahn et al., 2015; http://www.nature.com/news/crowdsourced-research-many-hands-make-tight-work-1.18508
47
Pre-registration
• Pre-register analyses
– How are data going to be
collected?
– How many subjects are
going to be recruited?
– When are outliers
excluded?
– What statistical
techniques are going to
be used?
• Several platforms
https://www.socialscienceregistry.org/; http://egap.org/content/registration; http://ridie.3ieimpact.org/; http://osf.io
48
Registration and Exploration
• We need both
exploratory and
confirmatory research
– Pre-registration does not
prevent exploratory
research
– Exploratory and
confirmatory must be
labelled as such
Tukey, 1980
49
Cross-validation
• So you’ve found a
significant result…
– … through exploration
• Cross-validate analyses
– New data set
– Split data set
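Splitting the data set can be sketched on pure noise. Search many predictors in one half, pick the strongest correlation, and re-estimate only that one in the held-out half. The sample size, number of predictors and seed are arbitrary assumptions. Because the simulated observations are independent, taking the first and second half is equivalent to a random split.

```python
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def explore_then_confirm(n=100, n_predictors=20, rng=None):
    """Pick the predictor most correlated with the outcome in one half,
    then re-estimate that correlation in the held-out half.
    All variables are pure noise, so every true correlation is 0."""
    rng = rng or random.Random()
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_predictors)]
    half = n // 2
    # Exploration: search all predictors in the first half
    explore_r = [pearson_r(p[:half], outcome[:half]) for p in predictors]
    best = max(range(n_predictors), key=lambda i: abs(explore_r[i]))
    # Confirmation: test only the chosen predictor in the second half
    confirm_r = pearson_r(predictors[best][half:], outcome[half:])
    return explore_r[best], confirm_r

rng = random.Random(42)
results = [explore_then_confirm(rng=rng) for _ in range(200)]
mean_explore = sum(abs(e) for e, _ in results) / len(results)
mean_confirm = sum(abs(c) for _, c in results) / len(results)
print(f"mean |r| in exploration half:  {mean_explore:.2f}")
print(f"mean |r| in confirmation half: {mean_confirm:.2f}")
```

The "best" predictor looks convincing in the half where it was found and shrinks toward zero in the half where it is tested. The same logic underlies the warning against double dipping.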
50
In Neuroscience: Double Dipping
• Using the same data
twice
– First, to set the
parameters of the
analysis
– Second, to run the
analysis
• Overfitting the model
– The model fits the noise
in this data set, not just
the signal
Kriegeskorte et al., 2009
51
Registration and Exploration
• We need both
exploratory and
confirmatory research
– Pre-registration does not
prevent exploratory
research
– Exploratory and
confirmatory must be
labelled as such
Tukey, 1980
52
Publishing Pre-registered Research
• Open-practice badges
– E.g., Psychological Science
• Registered reports
– About two dozen
journals in psychology,
medicine, and political science
53
Pre-registration Works
• In medicine, pre-registration
of clinical trials is
mandatory
– When outcomes must be
pre-registered, null
results become more
common
Kaplan & Irvin, 2015
54
Sharing Data
• Sharing data openly
– For re-analysis
– For meta-analysis
– For archiving
– For teaching
• Sharing materials
openly
– For replication
http://re3data.org; http://osf.io; http://figshare.com
55
Why Sharing Data Matters
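• Growth in a Time of
Debt
– Key study used to justify
austerity policies
– Re-analysed by a 28-year-old
graduate student
– An Excel coding error, together
with selective data exclusion and
unconventional weighting,
exaggerated the drop in growth
at high debt levels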
Reinhart & Rogoff, 2010; Herndon, Ash, & Pollin, 2013
56
Fourth Conclusion
• Honest research
– Explicit hypotheses
– Pre-registered methods
– Separating exploratory
and confirmatory
• Open Science
– Detailed methods
sections
– Open data sharing
57
Six Lessons for a Critical Reader
1. Consider methods, not just p-values
2. Be wary of small studies, even if they are many
3. Appreciate meta-analyses, but watch out for
publication bias
4. Independent replication is key – and you can
contribute
5. Value pre-registered analyses
6. Use open data
58
ENJOY YOUR BREAK!

More Related Content

Recently uploaded

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxPooja Bhuva
 

Recently uploaded (20)

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 

Featured

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 

Featured (20)

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

Critically Interpreting Statistics

  • 1. Critically Interpreting Statistics Statistics for Experimental Research Simon Columbus simon@simoncolumbus.com
  • 2. 2 Today I. Interpreting statistical results II. Evaluating experimental evidence a. Not all published results are true b. Making sense of the mess III. Doing good and open research I. Honest methods II. Open research
  • 4. 4 An Example: Gender Bias Van der Lee & Ellemers, 2015 Applications Successful Male 1635 17.7% Female 1188 14.9%
  • 5. 5 Simpson’s Paradox Kievit et al., 2013; Bickel, Hammel, & O’Connell, 1975
  • 6. 6 Simpson’s Paradox Kievit et al., 2013; Bickel, Hammel, & O’Connell, 1975
  • 7. 7 Simpson’s Paradox • Women apply more in some disciplines than in others • Disciplines with more female applicants have lower success rates • Per discipline, women are no less successful than men • But: gender bias may lie elsewhere – Why are “female” disciplines less successful? Van der Lee & Ellemers, 2015; Albers, 2015
  • 8. 8 A Matter of Life and Death • The case Lucia de B. – Dutch nurse – Suspected of murdering up to 9 patients – Sentenced in 2004 for five murders and two attempts • Statistical evidence – Probability of that many chance deaths during one nurse’s shifts http://www.kennislink.nl/publicaties/toch-statistiek-in-de-zaak-lucia-de-b; Gill, Groeneboom, & de Jong, 2010
  • 10. 10 Interpreting statistical results Shifts with incident Shifts without incident Total Lucia on shift 9 133 142 Lucia not on shift 0 887 887 Total 9 1020 1029 • Data accuracy – Actually, 5 during Lucia’s shift, 2 at other times • Confounding variables – Deaths cluster in time – Stratify by day to account for time http://www.kennislink.nl/publicaties/toch-statistiek-in-de-zaak-lucia-de-b; Gill, Groeneboom, & de Jong, 2010
  • 12. 12 First Conclusion • Watch out for confounding variables • Be wary of Simpson’s paradox – Stratify data • Statistics may not be “wrong”, but they may answer the wrong question
  • 17. 17 Questionable Research Practices • File drawer problem – Publication bias • HARK-ing – “Hypothesising after results are known” • Optional stopping – Collecting more data until a result is significant “I confess, Oprah… I was doping when writing my international publications…” Image: KU Leuven
  • 18. 18 The File Drawer • Negative results often are not published – E.g., Bem (2011) did not report all measured variables • Suppresses evidence against the effect Rosenthal, 1979; Bakker, van Dijk, & Wicherts, 2012; Franco, Malhotra, & Simonovits, 2014
  • 19. 19 The File Drawer: Political Science • Time-sharing Experiments in the Social Sciences – Political science studies run by the American NSF Franco, Malhotra, & Simonovits, 2014 Never published Written, unpublished Published % Published Null result 31 7 11 22 Mixed result 10 32 43 51 Strong result 4 31 57 62
  • 20. HARK-ing
    • Researchers may run studies and only come up with hypotheses after looking at the data
      – “Fishing” for p-values
      – Many small, underpowered studies
      – Inflates the type I error rate
      – Confuses exploratory and confirmatory research
    Kerr, 1998
  • 21. Recap: Type I Error Rate
    • The type I error rate is set by α
      – α = .05: a 1 in 20 chance of falsely rejecting a true null hypothesis
    • Multiple testing inflates the type I error rate
      – 1 in 20 chance of a false positive per test
      – With two independent tests, the chance of at least one false positive is almost 1 in 10 (1 − .95² ≈ .098), and it keeps growing with every further test
    • Questionable research practices increase the chance of false positive findings
    http://prefrontal.org/files/posters/Bennett-Salmon-2009.pdf
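Assuming independent tests and a true null hypothesis in every case, the chance of at least one false positive among k tests is 1 − (1 − α)^k. A minimal Python sketch (the function name is illustrative):

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false positive among k independent tests
    when every null hypothesis is true: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k


for k in (1, 2, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

At α = .05 this gives about 40% after 10 tests and about 64% after 20. Correlated tests inflate the rate less than this formula suggests, but they still inflate it.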
  • 22. HARK-ing: The Baby Factory
    “When a clear and interesting story could be told about significant findings, the original motivation was often abandoned. […] ‘You want to know how it works? We have a bunch of half-baked ideas. We run a bunch of experiments. Whatever data we get, we pretend that’s what we were looking for.’”
    Peterson, 2016
  • 23. Why HARK-ing?
    • Small studies often have low power
      – Even if an effect exists, the chance of detecting it is low
      – Average power in social psychology is around 50%
      – I.e., even when there is a real effect, about every second study will fail to detect it
    • Gaining more power is expensive
      – Power does not increase linearly with sample size
    Cohen, 1962; Rossi, 1990; http://rpsychologist.com/d3/NHST/
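To see why more power is expensive, here is a sketch using the normal approximation to a two-sided, two-sample test. This is an assumption: exact t-test power is slightly lower, especially for small samples. `d` is the standardised effect size (Cohen's d), and the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test for standardised
    effect size d (normal approximation; slightly optimistic vs. the t-test)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)   # expected test statistic under the alternative
    # Reject in either tail
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def n_for_power(d, power=0.80, alpha=0.05):
    """Per-group sample size needed for the target power (same approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


for n in (20, 40, 80, 160):
    print(f"d = 0.5, n = {n:3d} per group: power = {power_two_sample(0.5, n):.2f}")
print("n per group for 80% power, d = 0.5: ", n_for_power(0.5))
print("n per group for 80% power, d = 0.25:", n_for_power(0.25))
```

With d = 0.5, power rises from about .35 at 20 per group to about .61 at 40, but only from about .89 to .99 when going from 80 to 160. The required sample size scales with 1/d²: halving the effect size from 0.5 to 0.25 raises the per-group n for 80% power from 63 to 252.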
  • 24. Optional Stopping
    • Continue collecting data until a significant result is obtained
      – n = 30, test, not significant; n = 35, test, not significant; n = 40, test, significant, stop.
      – Inflates the type I error rate
      – Can be done correctly (Schönbrodt, 2016)
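A small simulation in Python makes the inflation concrete. It assumes a one-sample z-test with known standard deviation and a hypothetical peeking schedule (a test after every 5 subjects from n = 10 to n = 100), and the null hypothesis is true in every simulated study. It does not implement the correct sequential procedures cited on the slide; `false_positive_rate` is an illustrative name.

```python
import random
from math import sqrt


def false_positive_rate(looks, n_sims=4000, z_crit=1.959964, seed=1):
    """Share of simulated studies that reach p < .05 (two-sided z-test,
    known sigma = 1) at ANY of the given interim sample sizes, when the
    true mean is 0, i.e. the null hypothesis is true."""
    rng = random.Random(seed)
    looks = sorted(looks)
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for target in looks:
            while n < target:               # collect more data up to the next look
                total += rng.gauss(0.0, 1.0)
                n += 1
            z = total / sqrt(n)             # z = mean / (sigma / sqrt(n)) with sigma = 1
            if abs(z) > z_crit:             # "significant": stop and report
                hits += 1
                break
    return hits / n_sims


fixed = false_positive_rate([100])                  # one test at n = 100
peeking = false_positive_rate(range(10, 101, 5))    # test at n = 10, 15, ..., 100
print(f"fixed n: {fixed:.3f}   optional stopping: {peeking:.3f}")
```

With a single test at n = 100 the false positive rate stays close to the nominal 5%; with peeking it ends up well above it.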
  • 25. Optional Stopping: The Baby Factory
    “Rather than waiting for the results from a set number of infants, experimenters began ‘eyeballing’ the data as soon as babies were run and often began looking for statistical significance after just 5 or 10 subjects. […] When the preliminary data looked good, the test continued. […] But when, after just a few subjects, no significance was found, the original protocol was abandoned and new variations were developed.”
    Peterson, 2016
  • 26. Measuring Reproducibility
    • Reproducibility Project: Psychology
      – 100 papers published in 3 journals in 2008
      – One result from each paper replicated once
      – High-powered replications
    Open Science Collaboration, 2015
  • 27. Measuring Reproducibility
    About 40 out of 100 results were successfully replicated.
    Open Science Collaboration, 2015
  • 28. Measuring Reproducibility
    • Replication project in behavioural economics
      – 11 out of 18 studies were replicated successfully
    Camerer et al., 2016
  • 29. Interpreting (Non-)Replications
    • A failure to replicate does not mean the effect is not real
      – The replication may have low power
      – Even with high power, a non-replication is possible
      – There may be differences between the original and the replication, e.g. cultural variation
    • Reproducibility projects estimate the proportion of non-replicable findings
    Open Science Collaboration, 2015
  • 32. Second Conclusion
    • Many (key) results are unreliable
      – p-hacking and publication bias distort the scientific literature
        • File drawer
        • HARK-ing
        • Optional stopping
      – Reproducibility projects indicate many false positives in psychology and economics
    • Science is becoming more open and honest
      – Aggregation of results
      – Incentives for replication
  • 34. Meta-Analyses
    • Statistically summarise results from many separate studies
      – Combine evidence for and against an effect
    • Susceptible to publication bias
      – Inflated effect size estimates
      – The bias can often be detected with funnel plots
    Flore & Wicherts, 2015
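A minimal sketch of both ideas in Python: an inverse-variance (fixed-effect) pooled estimate, and Egger's regression intercept, one common formal check for the funnel-plot asymmetry that publication bias tends to produce. The study data are hypothetical, and a real analysis would use dedicated meta-analysis software, a random-effects model where appropriate, and a significance test for the intercept, all of which are omitted here.

```python
from math import sqrt


def fixed_effect(effects, ses):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return pooled, sqrt(1 / sum(weights))


def egger_intercept(effects, ses):
    """Egger's regression: regress d/se on 1/se by ordinary least squares.
    An intercept far from 0 suggests funnel-plot asymmetry."""
    x = [1 / se for se in ses]                     # precision
    y = [d / se for d, se in zip(effects, ses)]    # standardised effect
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx


# Hypothetical studies: the smaller the study (larger SE), the larger the effect,
# the pattern a file drawer full of small null results would leave behind.
effects = [0.10, 0.15, 0.35, 0.55]
ses = [0.05, 0.10, 0.20, 0.30]
print("pooled d, SE:", fixed_effect(effects, ses))
print("Egger intercept:", egger_intercept(effects, ses))
```

In the hypothetical data the smallest studies report the largest effects, and the Egger intercept is clearly positive; with identical effects across studies it would be zero.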
  • 35. The File Drawer, Unlocked
    • PsychFileDrawer
      – Repository of unpublished replications
      – Online repositories make it easier to publish null results
    http://www.psychfiledrawer.org/chart.php?target_article=33
  • 36. Registered Replication Reports
    • Replication of specific effects
      – Important effects
      – Prior doubt about effects
    • Independent replication
      – Not involving the original authors
      – Often involving multiple labs
    • Pre-registered
      – Accepted before data collection, so results are published regardless of outcome: no publication bias
    http://www.psychologicalscience.org/index.php/replication/ongoing-projects
  • 38. Students Can Contribute
    • Student projects are particularly suitable for replication efforts
      – Opportunity to learn research practices
      – Contribute to improving science
    Grahe et al., 2012; Frank & Saxe, 2012; King, 2006
  • 39. Third Conclusion
    • Meta-analyses can summarise studies statistically
      – But they are vulnerable to the file drawer problem
      – Online repositories make publication of null results easier
    • Registered replication reports
      – Provide reliable estimates
      – Eliminate publication bias
  • 40. HONEST RESEARCH AND OPEN SCIENCE
  • 41. Honest Research and Open Science
    • Purely confirmatory research
      – Pre-register all statistical analyses
      – Only claim registered analyses as hypothesis tests
      – Use strong statistical tests
      – Openly share methods and data
    Wagenmakers et al., 2012
  • 43. Exclusion Rules
    When one subject makes all the difference…
    http://www.ted.com/talks/dan_ariely_beware_conflicts_of_interest
  • 45. Statistical Diversity
    • One data set
      – Do darker-skinned football players get more red cards?
      – Four different leagues
    • 29 teams of analysts
    Silberzahn et al., 2015; http://www.nature.com/news/crowdsourced-research-many-hands-make-tight-work-1.18508
  • 46. Statistical Diversity
    Silberzahn et al., 2015; http://www.nature.com/news/crowdsourced-research-many-hands-make-tight-work-1.18508
  • 47. Pre-registration
    • Pre-register analyses
      – How are data going to be collected?
      – How many subjects are going to be recruited?
      – When are outliers excluded?
      – What statistical techniques are going to be used?
    • Several platforms
    https://www.socialscienceregistry.org/; http://egap.org/content/registration; http://ridie.3ieimpact.org/; http://osf.io
  • 48. Registration and Exploration
    • We need both exploratory and confirmatory research
      – Pre-registration does not prevent exploratory research
      – Exploratory and confirmatory analyses must be labelled as such
    Tukey, 1980
  • 49. Cross-validation
    • So you’ve found a significant result…
      – … through exploration
    • Cross-validate the analyses
      – On a new data set
      – On a split of the existing data set
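The logic of split-sample cross-validation can be sketched with a simulation in Python, under a hypothetical setup of 20 pure-noise predictors and 200 observations split in half. Exploration picks the predictor most correlated with the outcome in the first half; validation re-tests only that predictor in the untouched second half. Because there is no real effect, the exploratory correlation is inflated by the selection, and the holdout correlation falls back towards zero. The function names are illustrative.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def explore_then_validate(rng, n=200, n_predictors=20):
    """One simulated study with no true effects; returns (exploratory r, holdout r)."""
    y = [rng.gauss(0, 1) for _ in range(n)]
    xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_predictors)]
    half = n // 2
    # Exploration: choose the predictor most correlated with y in the first half
    explore_r = [pearson(x[:half], y[:half]) for x in xs]
    best = max(range(n_predictors), key=lambda i: abs(explore_r[i]))
    # Validation: re-test only that predictor on the untouched second half
    holdout_r = pearson(xs[best][half:], y[half:])
    return explore_r[best], holdout_r


rng = random.Random(42)
results = [explore_then_validate(rng) for _ in range(100)]
mean_explore = sum(abs(e) for e, _ in results) / len(results)
mean_holdout = sum(abs(h) for _, h in results) / len(results)
print(f"mean |r| exploration: {mean_explore:.3f}   holdout: {mean_holdout:.3f}")
```

The same selection effect is what double dipping exploits: the data used to choose an analysis cannot also be used to test it.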
  • 50. In Neuroscience: Double Dipping
    • Using the same data twice
      – First, to set the parameters of the analysis
      – Second, to run the analysis
    • Over-fitting the model
      – The model fits this particular data set too closely, including its noise
    Kriegeskorte et al., 2009
  • 52. Publishing Pre-registered Research
    • Badges
      – E.g., Psychological Science
    • Registered reports
      – About two dozen journals in psychology, medicine, and political science
  • 53. Pre-registration Works
    • In medicine, pre-registration of clinical trials is mandatory
      – When outcomes must be pre-registered, null results become more common
    Kaplan & Irvin, 2015
  • 54. Sharing Data
    • Sharing data openly
      – For re-analysis
      – For meta-analysis
      – For archiving
      – For teaching
    • Sharing materials openly
      – For replication
    http://re3data.org; http://osf.io; http://figshare.com
  • 55. Why Sharing Data Matters
    • Growth in a Time of Debt
      – Key study used to justify austerity policies
      – Re-analysed by a 28-year-old graduate student
      – An Excel coding error, along with selective data exclusion and unconventional weighting, distorted the headline result
    Reinhart & Rogoff, 2010; Herndon, Ash, & Pollin, 2013
  • 56. Fourth Conclusion
    • Honest research
      – Explicit hypotheses
      – Pre-registered methods
      – Separating exploratory and confirmatory research
    • Open Science
      – Detailed methods sections
      – Open data sharing
  • 57. Six Lessons for a Critical Reader
    1. Consider methods, not just p-values
    2. Be wary of small studies, even if there are many of them
    3. Appreciate meta-analyses, but watch out for publication bias
    4. Independent replication is key, and you can contribute
    5. Value pre-registered analyses
    6. Use open data