
Talk on reproducibility in EEG research

Talk at Cutting EEG meeting, 3 July 2018


  1. 1. Reproducible EEG: obtaining replicable results by avoiding the garden of forking paths and taking measurement seriously Dorothy V. M. Bishop Professor of Developmental Neuropsychology University of Oxford @deevybee
  2. 2. What is the replication crisis? “There is increasing concern about the reliability of biomedical research, with recent articles suggesting that up to 85% of research funding is wasted.” Bustin, S. A. (2015). The reproducibility of biomedical research: Sleepers awake! Biomolecular Detection and Quantification. Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. doi:10.1371/journal.pmed.0020124
  3. 3. Historical timeline: concerns about reproducibility. Publication bias: 1975, Greenwald: “As it is functioning in at least some areas of behavioral science research, the research-publication system may be regarded as a device for systematically generating and propagating anecdotal information.” 1979, Rosenthal: the “file drawer” problem.
  4. 4. Weak statistical power: 1987, Newcombe: “Small studies continue to be carried out with little more than a blind hope of showing the desired effect. Nevertheless, papers based on such work are submitted for publication, especially if the results turn out to be statistically significant.” (Timeline so far: 1975 Greenwald, 1979 Rosenthal, 1987 Newcombe.)
  5. 5. 1956, De Groot: failure to distinguish between hypothesis-testing and hypothesis-generating (exploratory) research -> misuse of statistical tests. (Timeline so far: 1956 De Groot, 1975 Greenwald, 1979 Rosenthal, 1987 Newcombe.)
  6. 6. Large population database used to explore the link between ADHD and handedness. 1 contrast: probability of at least one ‘significant’ p-value (< .05) = .05
  7. 7. Focus just on the Young subgroup: 2 contrasts at this level; probability of at least one ‘significant’ p-value = .10
  8. 8. Focus just on Young, on the measure of hand skill: 4 contrasts at this level; probability of at least one ‘significant’ p-value = .19
  9. 9. Focus just on Young Females, on the measure of hand skill: 8 contrasts at this level; probability of at least one ‘significant’ p-value = .34
  10. 10. Focus just on Young, Urban Females, on the measure of hand skill: 16 contrasts at this level; probability of at least one ‘significant’ p-value = .56
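A minimal sketch of the arithmetic behind slides 6-10, assuming the contrasts are independent and each has a 5% false-positive rate (in a real population database the contrasts would be correlated, so this is purely illustrative):

```python
# Familywise false-positive rate for k independent contrasts with no true effect.
alpha = 0.05
for k in (1, 2, 4, 8, 16):
    fwe = 1 - (1 - alpha) ** k
    print(f"{k:2d} contrasts: P(at least one p < .05) = {fwe:.2f}")
# Prints .05, .10, .19, .34, .56 -- the figures quoted on the slides.
```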
  11. 11. Terrible twins: p-hacking – searching for significant p-values in exploratory data HARKing – then retrofitting a hypothesis
  12. 12. The four horsemen of the Apocalypse P-hacking Publication bias Low power HARKing
  13. 13. Problems identified in social/cognitive psychology also affect EEG research: failure to understand how flexible data analysis produces false positives. http://deevybee.blogspot.co.uk/2016/01/the-amazing-significo-why-researchers.html
  14. 14. Precise replication is very rare in EEG research. Flexible choice of: time window, frequency band, filtering, electrode, referencing, measurement, artifact rejection, outlier exclusion
  15. 15. Intervals used for MMN measurement in studies reviewed by Bishop (2007) (frequency deviants) Bishop, D. V. M. (2007). Using mismatch negativity to study central auditory processing in developmental language and literacy impairments: where are we, and where should we be going? Psychological Bulletin, 133, 651-672. doi:10.1037/0033-2909.133.4.651
  16. 16. Frequency bands used for mu suppression: first 25 studies from the meta-analysis by Fox et al. Variation in: • Method of frequency analysis • Baseline • Duration of epoch • Electrodes. Fox, N. A., et al. (2016). Assessing human mirror activity with EEG mu rhythm: a meta-analysis. Psychological Bulletin, 142(3), 291-313. doi:10.1037/bul0000031
  17. 17. Double-dipping is common in ERP research: inspect data to look for an effect, then analyse. • This is widely accepted and even encouraged • But only OK if the analyses are independent of the method of selecting the data. Examples: ‘Inspection of the numerical values of the time–frequency spectra averaged across electrodes and participants indicated that power changes were most prominent at 9.38 Hz (σf = 1.56), 13.13 Hz (σf = 1.64), and 19.69 Hz (σf = 1.71). Accordingly, these three frequency ranges representing the alpha (8–12 Hz) and lower beta frequency bands (13–21 Hz) were selected for further statistical analysis’ ‘To determine the frequency band that corresponded to mu rhythm for each subject, the perform condition was subtracted from the rest condition and the resulting 2 Hz bandwidth that best resembled mu suppression was chosen as the participant's mu frequency band’
  18. 18. Simulation 1: Double dipping can create effects out of noise. 16 simulated noisy autocorrelated functions – no true signal; mean of 30 trials. Dotted lines show the window for MMN; red line is the peak in that interval. Single-sample t-test on mean amplitude: t (15) = 1.36, p = 0.194. Single-sample t-test on peak amplitude: t (15) = -8.86, p < .0001
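The deck's own simulation code is in the linked GitHub repository (slide 35); the sketch below is a simplified stand-in, assuming Gaussian noise smoothed with a moving average to mimic autocorrelation, 30 trials per subject, and an arbitrary 50-sample analysis window. It illustrates the same point: taking the peak within a window biases a one-sample t-test, whereas the window mean does not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sub, n_trials, n_time = 16, 30, 200   # illustrative sizes
win = slice(100, 150)                   # arbitrary 'MMN' analysis window

def fake_erp():
    """Noise-only 'ERP': average of smoothed white-noise trials (no true signal)."""
    trials = rng.normal(size=(n_trials, n_time))
    kernel = np.ones(10) / 10           # moving average to induce autocorrelation
    trials = np.apply_along_axis(lambda x: np.convolve(x, kernel, mode="same"), 1, trials)
    return trials.mean(axis=0)

erps = np.array([fake_erp() for _ in range(n_sub)])

mean_amp = erps[:, win].mean(axis=1)    # mean amplitude in the fixed window
peak_amp = erps[:, win].min(axis=1)     # most negative point in the window (double dipping)

print(stats.ttest_1samp(mean_amp, 0))   # typically non-significant
print(stats.ttest_1samp(peak_amp, 0))   # typically 'significant' despite pure noise
```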
  19. 19. Simulation 2: Double dipping with electrodes can create a false group effect. Simulated functions all based on a true MMN signal; noise and autocorrelation across time points added to simulate 16 subjects
  20. 20. Simulation 2: Double dipping can create a false group effect. All electrodes simulated in the same way: two groups of 8, no true group effect. Repeated runs find at least one electrode with a group effect at p < .05 in around 25% of runs
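The sketch below is not the deck's actual simulation (which used correlated ERP waveforms); it assumes a handful of independent noise-only electrodes, with the electrode count an arbitrary choice, to show why "report whichever electrode shows a group difference" inflates the false-positive rate to roughly the level reported on the slide.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_per_group, n_elec, n_runs = 8, 6, 2000   # electrode count is an arbitrary assumption

false_pos = 0
for _ in range(n_runs):
    # Both groups drawn from the same distribution: no true group effect anywhere.
    g1 = rng.normal(size=(n_per_group, n_elec))
    g2 = rng.normal(size=(n_per_group, n_elec))
    p = stats.ttest_ind(g1, g2, axis=0).pvalue
    false_pos += (p < .05).any()           # count the run if ANY electrode 'works'

print(f"At least one 'significant' electrode in {false_pos / n_runs:.0%} of runs")
```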
  21. 21. Simulation 3: Problems with multiway ANOVA • Often see analyses with 4- or 5-way ANOVA (group x side x site x condition x interval) • Standard stats packages correct p-values for N levels WITHIN a factor, but not for the overall number of factors and interactions • A 3-way ANOVA has three main effects, three 2-way interactions and one 3-way interaction. Bonferroni-corrected p-value for an exploratory 3-way ANOVA is .05/7 = .007; Bonferroni-corrected p-value for an exploratory 4-way ANOVA is .05/15 = .003. Cramer, A. O. J., et al. (2016). Hidden multiplicity in exploratory multiway ANOVA: Prevalence and remedies. Psychonomic Bulletin & Review, 23, 640-647.
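A quick way to see the slide's arithmetic: a k-way ANOVA tests 2^k - 1 effects (main effects plus all interactions), so a Bonferroni correction for the whole family divides alpha by that count. A minimal sketch:

```python
# Number of effects tested in an exploratory k-way ANOVA and the
# corresponding Bonferroni-corrected alpha (cf. Cramer et al., 2016).
alpha = 0.05
for k in (3, 4, 5):
    n_effects = 2 ** k - 1              # main effects + all interactions
    print(f"{k}-way ANOVA: {n_effects} effects, corrected alpha = {alpha / n_effects:.3f}")
# 3-way: 7 effects, alpha = .007; 4-way: 15 effects, alpha = .003
```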
  22. 22. Results from repeated runs of simulated random data: no true effect
  23. 23. Individual differences research. The Holy Grail: EEG components/indices as biomarkers. The reality: often too unreliable to consistently index a trait. From Bishopblog, Sunday, 28 May 2017, “Which neuroimaging measures are useful for individual differences research?” The tl;dr version: a neuroimaging measure is potentially useful for individual differences research if variation between people is substantially greater than variation within the same person tested on different occasions. This means that we need to know about the reliability of our measures before launching into studies of individual differences. High reliability is not sufficient to ensure a good measure, but it is necessary.
  24. 24. RELIABILITY: individual differences or unmodeled noise? “Once validity is maximized (inasmuch as current technology allows), it is also crucial to ensure that individual differences measured with fMRI are not merely attributable to unaccounted-for noise in the measurements.” Dubois, J., & Adolphs, R. (2016). Building a science of individual differences from fMRI. Trends in Cognitive Sciences. http://dx.doi.org/10.1016/j.tics.2016.03.014 Writing about fMRI, but the same points apply to EEG
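One concrete way to quantify "variation between people versus variation within the same person" is a test-retest intraclass correlation. The sketch below is a generic illustration with made-up numbers (two hypothetical sessions, equal trait and noise variance), not an analysis from the talk:

```python
import numpy as np

rng = np.random.default_rng(3)
n_sub, n_sessions = 30, 2

# Hypothetical test-retest scores: stable trait plus session-specific noise.
trait = rng.normal(0, 1.0, size=(n_sub, 1))            # between-person differences
noise = rng.normal(0, 1.0, size=(n_sub, n_sessions))   # within-person noise
scores = trait + noise

# One-way random-effects ICC computed from the usual mean squares.
ms_between = n_sessions * scores.mean(axis=1).var(ddof=1)
ms_within = ((scores - scores.mean(axis=1, keepdims=True)) ** 2).sum() / (n_sub * (n_sessions - 1))
icc = (ms_between - ms_within) / (ms_between + (n_sessions - 1) * ms_within)
print(f"ICC(1,1) = {icc:.2f}")   # around .5 here: half the variance is measurement noise
```

A measure with an ICC this low would be a weak candidate biomarker for individual differences, however impressive its group-level effects.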
  25. 25. Fancy-pants statistics alone is not the solution to p-hacking and low power; better understanding of statistics is the solution.
  26. 26. Solutions a. Using simulated datasets to give insight into statistical methods: https://www.slideshare.net/deevybishop/introduction-to-simulating-data-to-improve-your-research
  27. 27. Solutions b. Study design: include ‘dummy’ effects. Blue and red lines show mismatch ERPs for large and small deviants; grey shows the mismatch for the ‘dummy’ deviant, which is identical to the standard. Bishop, D. V. M., et al. (2010). Lower-frequency event-related desynchronization: a signature of late mismatch responses to sounds. Journal of Neuroscience, 30(46), 15578-15584.
  28. 28. Solutions c. Replication • Subdivide data into exploration and replication sets. • Or replicate in another dataset Standard in genetics
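A minimal sketch of the exploration/replication idea, assuming a simple noise-only dataset with electrode choice as the flexible step (sizes and names are illustrative, not from the talk): the 'best' electrode is chosen on one half of the subjects and then tested, with no further selection, on the held-out half.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_sub, n_elec = 32, 6
data = rng.normal(size=(n_sub, n_elec))      # noise-only data for illustration

explore, replicate = data[:16], data[16:]    # split subjects into two sets

# Exploration half: free to pick the electrode with the strongest apparent effect.
best = np.argmax(np.abs(explore.mean(axis=0)) / explore.std(axis=0, ddof=1))

# Replication half: test ONLY that pre-selected electrode.
print(stats.ttest_1samp(replicate[:, best], 0))   # honest p-value; usually n.s. for noise
```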
  29. 29. http://deevybee.blogspot.co.uk/ Reproducibility problem exacerbated by journals • More concern for newsworthiness than methods • Won’t publish replications (or failures to replicate) • Won’t publish ‘negative’ findings
  30. 30. Solutions d. Preregistration of analyses
  31. 31. Classic publishing: Plan study -> Do study -> Submit to journal -> Respond to reviewer comments -> Acceptance! -> Publish paper. Registered reports: Plan study -> Submit to journal -> Respond to reviewer comments -> Acceptance! -> Do study -> Publish paper
  32. 32. Classic publishing: Plan study -> Do study -> Submit to journal -> Respond to reviewer comments -> Acceptance! -> Publish paper. Registered reports: Plan study -> Submit to journal -> Respond to reviewer comments -> Acceptance! -> Do study -> Publish paper
  33. 33. Registered Reports solve issues of: • Publication bias: publication decision made on the basis of the quality of the introduction/methods, before results are known • Low power: researchers required to have 90% power • P-hacking: analysis plan specified up-front • HARKing: hypotheses specified up-front. Unanticipated findings can be reported but are clearly demarcated as ‘exploratory’
  34. 34. Pre-registration on OSF • Similar to the regular publication route • No guarantee of publication • But reviewers are generally positive about preregistered papers because preregistration prevents p-hacking and HARKing • And there are benefits to having a well-worked-out plan – less stress when it comes to making sense of the data. Flow: Plan study -> Submit plan to OSF -> Check by OSF statistician -> Do study -> Submit to journal -> Respond to reviewer comments -> Acceptance! -> Publish paper
  35. 35. Useful links • Code for simulated MMN data: https://github.com/oscci/miscellaneous • Blogposts – see the Reproducibility section of Bishopblog: http://deevybee.blogspot.com/2012/11/bishopblog-catalogue-updated-24th-nov.html • Slides on pre-registration & other related topics: https://www.slideshare.net/deevybishop
