
Improve your study with pre-registration

Four major threats to reproducibility are publication bias, low power, p-hacking, and HARKing. In this talk I explain these terms and show how study pre-registration can address them.

  1. Improving your project through pre-registration. Dorothy V. M. Bishop, Professor of Developmental Neuropsychology, University of Oxford. @deevybee
  2. The Reproducibility Crisis. "There is increasing concern about the reliability of biomedical research, with recent articles suggesting that up to 85% of research funding is wasted." Bustin, S. A. (2015). The reproducibility of biomedical research: Sleepers awake! Biomolecular Detection and Quantification, 2. Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. doi: 10.1371/journal.pmed.0020124
  3. Four key factors leading to poor reproducibility: publication bias, low power, p-hacking, and HARKing
  4. Thought experiment: you have submitted a paper to Current Biology evaluating the effect of computer games on dyslexia. How likely is your paper to be accepted if you report: • 20 participants; beneficial effect of intervention, p < .05 • 20 participants; group difference is non-significant • 200 participants; group difference is non-significant
  5. Thought experiment, continued: there is ample evidence that many journals – especially 'high impact' journals – prioritise newsworthiness over methodological quality and are reluctant to publish null results = PUBLICATION BIAS. http://deevybee.blogspot.co.uk/2013/03/high-impact-journals-where.html
  6. Publication bias. 1956 De Groot; 1975 Greenwald, prejudice against the null; 1979 Rosenthal, the "file drawer" problem. "As it is functioning in at least some areas of behavioral science research, the research-publication system may be regarded as a device for systematically generating and propagating anecdotal information." (Greenwald, 1975)
  7. Low power. 1956 De Groot; 1975 Greenwald; 1979 Rosenthal; 1987 Newcombe. The POWER problem: journals are typically willing to publish a significant finding from a very small sample, even if they would not think of doing so for a null result. "Small studies continue to be carried out with little more than a blind hope of showing the desired effect. Nevertheless, papers based on such work are submitted for publication, especially if the results turn out to be statistically significant." (Newcombe, 1987)
  8. P-hacking and HARKing
  9. Historical timeline: concerns about reproducibility. 1956 De Groot: failure to distinguish between hypothesis-testing and hypothesis-generating (exploratory) research -> misuse of statistical tests. De Groot, A. D. (1956/2014). The meaning of "significance" for different types of research [translated and annotated by Eric-Jan Wagenmakers et al.]. Acta Psychologica, 148, 188-194. doi: 10.1016/j.actpsy.2014.02.001
  10. Here is a correlation matrix from a study that measured various perceptual skills in relation to reading ability in children (N = 20 subjects). How should you report/interpret the results? Created from the script at https://osf.io/skz3j/. * p < .05, ** p < .01
  11. Key question: did the researcher specifically predict this association?
  12. • Probability that a specific prespecified pair of variables will be correlated at the p < .05 level when the null is true = .05. • Probability that at least one of 21 correlations will meet p < .05 when the null is true = 1 - .95^21 = .66 (i.e. one minus the probability that NONE is significant). • Bonferroni-corrected significance level = .05/21 ≈ .002. • With N = 20 subjects, reaching the .002 significance level requires r = .61. If you didn't predict this specific association and you report uncorrected p-values, this is P-hacking
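A sanity check on the arithmetic above, as a minimal R sketch (the one-tailed convention for the critical r is my assumption; the slide does not state it):

```r
# Chance that at least one of 21 independent correlations reaches p < .05
# when the null is true everywhere
1 - 0.95^21                  # ~0.66

# Bonferroni-corrected alpha for 21 tests
alpha <- 0.05 / 21           # ~0.002

# Critical r for that alpha with N = 20 (df = N - 2), via the t-to-r conversion
df <- 18
t_crit <- qt(1 - alpha, df)  # one-tailed critical t (assumption)
t_crit / sqrt(t_crit^2 + df) # ~0.60, close to the slide's r = .61
```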
  13. P-hacking -> huge risk of false positives
  14. You run a study investigating how a drug, X, affects anxiety. You plot the results by age and see this: no significant effect of X on anxiety overall. [Figure: treatment effect by age; symptom improvement (-1 to 1) vs. age 16-60 yr]
  15. But you notice that there is a treatment effect for those aged over 36. [Figure: treatment effect by age, as above]
  16. How should you analyse/report this result? • We tested whether X affects anxiety • We tested whether X affects anxiety in people aged over 36 years • We tested whether age affects the impact of X on anxiety [Figure: treatment effect by age, as above]
  17. How should you analyse/report this result? • We tested whether X affects anxiety – TRUE • We tested whether X affects anxiety in people aged over 36 years – UNTRUE, and most would agree unacceptable • We tested whether age affects the impact of X on anxiety – UNTRUE, but many would think acceptable, given the results [Figure: treatment effect by age, as above]
  18. Close link between p-hacking and HARKing. You are HARKing if you have no prior predictions but, on seeing the results, write up the paper as if you had planned to look at the effect of age on the drug effect. This kind of thing is endemic in psychology. • It is OK to say that this association was observed in exploratory analysis, and that it suggests a new hypothesis that needs to be tested in a new sample. • It is NOT OK to pretend that you predicted the association if you didn't. • And it is REALLY REALLY NOT OK to report only the data that support your new hypothesis (e.g. dropping those aged below 36 from the analysis). [Figure: treatment effect by age, as above]
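The resulting false-positive inflation is easy to demonstrate. Below is a minimal simulation sketch (my own illustration, not from the talk): the drug has no effect at any age, but the analyst tests the whole sample and then each half of the age range, reporting whichever test "works".

```r
# Null simulation: no true drug effect anywhere, but the subgroup split
# (age 36) is chosen after seeing the data
set.seed(1)
false_pos <- replicate(5000, {
  age   <- runif(60, 16, 60)
  treat <- rep(0:1, each = 30)
  improvement <- rnorm(60)                # pure noise: no effect of treat
  p_all   <- t.test(improvement ~ treat)$p.value
  p_young <- t.test(improvement[age <= 36] ~ treat[age <= 36])$p.value
  p_old   <- t.test(improvement[age >  36] ~ treat[age >  36])$p.value
  min(p_all, p_young, p_old) < .05        # report the best-looking test
})
mean(false_pos)                           # well above the nominal .05
```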
  19. HARKing capitalises on chance and produces a huge risk of false positives. It is widespread in many fields – and even explicitly encouraged by some influential people.
  20. "Which Article Should You Write? There are two possible articles you can write: (a) the article you planned to write when you designed your study or (b) the article that makes the most sense now that you have seen the results. They are rarely the same, and the correct answer is (b)." On data analysis: "Examine them from every angle. Analyze the sexes separately. Make up new composite indexes. If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don't like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something – anything – interesting." Bem, D. J. (2004). Writing the empirical journal article. In The Compleat Academic: A Practical Guide for the Beginning Social Scientist (2nd ed.). Washington, DC: American Psychological Association. (Blurb: "This book provides invaluable guidance that will help new academics plan, play, and ultimately win the academic career game.") This explicitly advises HARKing!
  21. HARKing seems innocuous, but it fills the literature with dross
  22. One solution to protect against HARKing: suggested wording in the write-up to keep researchers honest. "We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study." Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2012). A 21 word solution. SPSP Dialogue, 26(2), Fall 2012.
  23. A more comprehensive solution: pre-registration
  24. Classic publishing: Plan study -> Do study -> Submit to journal -> Respond to reviewer comments -> Acceptance! -> Publish paper
  25. Classic publishing: Plan study -> Do study -> Submit to journal -> Respond to reviewer comments -> Acceptance! -> Publish paper. Registered Reports: Plan study -> Submit to journal -> Respond to reviewer comments -> Acceptance! -> Do study -> Publish paper
  26. Registered Reports solve issues of: • Publication bias: the publication decision is made on the basis of the quality of the introduction/methods, before results are known • Low power: researchers are required to have 90% power • P-hacking: the analysis plan is specified up-front • HARKing: hypotheses are specified up-front; unanticipated findings can be reported, but are clearly demarcated as 'exploratory'
  27. Registered Reports are problematic for student projects, however, because: • the time-scale means a delay before data are collected • the power requirements are often hard to meet
  28. An alternative, 'pre-registration lite': the Open Science Framework
  29. Pre-registration on OSF: Plan study -> Submit plan to OSF -> Check by OSF statistician -> Do study -> Submit to journal -> Respond to reviewer comments -> Acceptance! -> Publish paper • Similar to the regular publication route • No guarantee of publication • But reviewers are generally positive about preregistered papers, because preregistration prevents p-hacking and HARKing • And a well-worked-out plan brings its own benefits: less stress when it comes to making sense of the data
  30. Advantages of pre-registration through OSF • Free methodological/statistical consulting: http://cos.io/stats_consulting • A date-stamped record of what was planned, which can be referred to when publishing • Encourages open data and scripts • Work has greater impact • Errors get detected • Improves reproducibility • Possibility of winning $1000!
  31. Pre-registration template: https://osf.io/jea94/ Even if you decide not to formally pre-register, this template is likely to be useful for planning your project. I won't go through the template here, but I will note points that crop up when trying to complete it.
  32. Study information. What is your research question? Points to consider: • What type of question is it: yes/no, why, how? • How could it be improved: is it too general or too precise? Hypotheses: • Can you formulate specific predictions? E.g. X will be bigger than Y; X will be bigger than zero; X will vary systematically with Y • Are the predictions directional?
  33. What is your study type? Are you systematically manipulating a variable to see its effect? = Experiment. Or are you looking at relationships between variables that occur naturally? = Observational study
  34. Sample size. Rationale for the proposed sample size: OSF note that this could include a power analysis, but also constraints such as time, money, or the availability of a particular group. Power analysis: you can use G*Power (http://www.gpower.hhu.de/en.html). For more complex designs, simulate data with a given effect size and repeatedly run them through the analysis to see how often you can detect the effect of interest (see Lazic, 2016). N.B. Power analysis is not the only way to rationalise sample size, but it is the most common.
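For a simple two-group design, base R's power.t.test gives the same answer as G*Power; the simulation approach can be sketched as below (the effect size d = 0.5 and the candidate n are placeholder assumptions):

```r
# Analytic power calculation: n per group for 90% power to detect d = 0.5
power.t.test(delta = 0.5, sd = 1, sig.level = .05, power = .90)

# Simulation-based power: generate data with the assumed effect,
# analyse repeatedly, and count how often the effect is detected
set.seed(1)
n <- 86                          # candidate sample size per group
hits <- replicate(2000, {
  x <- rnorm(n, mean = 0)
  y <- rnorm(n, mean = 0.5)      # assumed effect size d = 0.5
  t.test(x, y)$p.value < .05
})
mean(hits)                       # estimated power, ~.90
```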
  35. Making sense of null results: 1. Positive controls. When we talk of 'control' we usually mean 'negative' controls, where we compare the effect of X with a situation that is identical except for X, to isolate the specific effect. With a positive control, the aim is to rule out trivial explanations for a null result, e.g. the manipulation didn't work, participants didn't attend, etc. Examples: You are comparing autistic and typically developing children on a false memory task: does the paradigm yield a false memory effect in the typically developing group? You are interested in whether there is suppression of the mu frequency band in EEG (regarded as 'mirror neuron' activity) when participants view hand gestures: is there mu suppression when participants perform hand gestures?
  36. Making sense of null results: 2. Bayes factors. "Bayes factors provide a coherent approach to determining whether non-significant results support a null hypothesis over a theory, or whether the data are just insensitive." (Dienes, 2014)
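For a two-group comparison, this can be computed with, e.g., the BayesFactor R package (the package choice and its default prior are my assumptions; the talk does not name a tool). A BF01 above about 3 supports the null; a value near 1 means the data are insensitive.

```r
# install.packages("BayesFactor")
library(BayesFactor)

set.seed(1)
x <- rnorm(30)                 # group 1
y <- rnorm(30)                 # group 2: no true difference
bf <- ttestBF(x, y)            # BF10 under the package's default prior
1 / extractBF(bf)$bf           # BF01: evidence for the null over the alternative
```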
  37. Designing an analysis script using simulated data • Aim: create a process that is completely transparent and increases the likelihood that your analysis can be replicated • The analysis script reads in raw data as input, does all the analysis, and generates tables/figures/summary statistics as output • This avoids the common scenario where the researcher cannot remember how results were generated • Scripts need to be extensively 'commented' to explain what the code is doing • Increasingly, researchers use scripts written in R, Matlab or Python, but you can also create scripts easily in SPSS (a minimal R skeleton follows below)
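Here is a minimal skeleton of such a script in R (file and variable names are placeholders): develop it on simulated data with the planned structure, then point the same script, unchanged, at the real raw data.

```r
# analysis.R: reads raw data, runs the planned analysis, writes all outputs
simulate <- TRUE                         # flip to FALSE for the real data

if (simulate) {
  # Null-hypothesis data with the planned structure: 2 groups of 20
  dat <- data.frame(group = rep(c("A", "B"), each = 20),
                    score = rnorm(40))
} else {
  dat <- read.csv("raw_data.csv")        # placeholder file name
}

# Planned analysis: two-sample t-test on score by group
result <- t.test(score ~ group, data = dat)

# Every table and figure is regenerated from the raw data by this one script
capture.output(result, file = "ttest_summary.txt")
png("group_scores.png")
boxplot(score ~ group, data = dat)
dev.off()
```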
  38. Building a script in SPSS. We start by simulating some data (see last week's lecture). This can be either random data (the null hypothesis) or data with an effect of interest added. Here, random numbers (Score) are allocated to groups 1 and 2.
  39. Building a script in SPSS. Now do whatever analysis steps you usually do via the GUI. Here we have selected the options for a t-test. But instead of hitting OK, we hit Paste.
  40. Building a script in SPSS. Hitting Paste opens a new window with a script in it. This shows the code behind the analysis done in the GUI. You can run the analysis by pressing the big green button. You can add to the script, add comments, and save it. Then you can rerun it at any time with a new dataset. Benefit: you have a complete record of the analysis.
  41. Further suggestions for study. Free Coursera lectures: https://www.coursera.org/learn/statistical-inferences
  42. https://www.slideshare.net/deevybishop/bishop-reproducibility-references-nov2016 https://www.slideshare.net/deevybishop/what-is-the-reproducibility-crisis-in-science-and-what-can-we-do-about-it Experimental Design for Laboratory Biologists: Maximising Information and Improving Reproducibility, Stanley E. Lazic. The Seven Deadly Sins of Psychology, Chris Chambers.
  43. http://christophergandrud.github.io/RepResR-RStudio/ "Treat all of your research files as if someone who has not worked on the project will, in the future, try to understand them."
  44. Going further: author/reviewer guidelines for Registered Reports in Nature Human Behaviour: https://images.nature.com/original/nature-cms/uploads/ckeditor/attachments/4127/RegisteredReportsGuidelines_NatureHumanBehaviour.pdf These require either 95% power or a Bayesian equivalent. For Bayes, they recommend https://osf.io/d4dcu/
