
CuttingEEG - Open Science, Open Data and BIDS for EEG




From education and the inception of research questions through planning, acquisition, analysis and reporting, there are multiple points in the research cycle where Open Science should play a role. In my presentation at the CuttingEEG conference in Paris, I argue that we should not only share primary outcomes as Open Access publications, but that openness involves the full research cycle. Specifically, I share my experience with Open Data, the privacy challenges and possibilities under the GDPR, Open Source for sharing analysis methods, dealing with imperfections in science, and versioning of data, code and results. Finally, I introduce BIDS for EEG, a new effort to increase the impact of shared and well-documented EEG data.


  1. Open Science, Open Data and BIDS for EEG. Robert Oostenveld, Donders Institute, Radboud University, Nijmegen, NL; Karolinska Institutet, Stockholm, SE. These slides will be shared online on SlideShare.
  2. Outline of this session: Dorothy Bishop (simulate and pre-register for more reproducible EEG); Aina Puce (better and more detailed reporting of results); Robert Oostenveld (sharing of data and analysis details).
  3. What is Open Science? Open educational resources, open access publications, pre-registration, open peer review, open methodology, open source, open hardware, open data.
  4. What is Open Science?
  5. Library of Charcot, ICM
  6. Science: methods and tools are changing
  7. Open Science: infrastructure and tools. Git and GitHub, GitLab, Bitbucket: work together on code for analyses. Open Science Framework: work together on documenting. DataVerse, Zenodo, etc.: sharing of data. Code Ocean, Microsoft Azure, Anaconda Cloud: cloud-based computational reproducibility platforms. Past: a black-and-white version of the article printed on dead trees. Present: a PDF for download, sometimes with online supplementary material. Future: online notebooks that reproduce the results in detail, like a lab notebook. Science is getting more exciting, but also harder in some ways.
  8. Open Science: planning ahead. Planning your analysis. Planning and publishing primary outcomes: writing your scientific papers, writing your PhD thesis, public outreach. Planning and publishing secondary outcomes: publishing details on the methods, the data management plan, publishing your data.
  9. Sharing primary and secondary outcomes. Publication with the primary findings: to the wider audience, to your scientific peers. Methods: protocol, stimulus material, analysis methods. Original data. Details on the results.
  10. Open Science: planning ahead. Planning your analysis. Planning and publishing primary outcomes: writing your scientific papers, writing your PhD thesis, public outreach. Planning and publishing secondary outcomes: publishing details on the methods, the data management plan, publishing your data.
  11. Share/publish your methods. Your analysis contains more details than fit in your “Methods” section, and these details cannot be described in human-oriented text. Batch scripting: MATLAB, Python, R, SPSS, Bash, … The analysis script corresponds to computer code. Version management tools for source code: Git, Subversion, Mercurial; GitHub, GitLab, Bitbucket.
  12. Version control – linear. [Figure: V1 (2018-02-24) → V2 (2018-03-16) → V3 (2018-05-30) → V4 (2018-06-05)]
  13. Version control – branching and merging. [Figure: V1 → V2, which branches into V3-Yours, V3-CoAuth1 and V3-CoAuth2; these are merged into V4-Merged.]
  14. Version control – branching multiple analyses. [Figure: V1 → V2, which branches into V3-a, V3-b and V3-failed; V3-a → V4-a and V3-b → V4-b; these lead to V5.]
  15. Version control – collaborating. [Figure: the version history from V1 to V5 (with branches V3-a, V3-b, V3-failed, V4-a and V4-b) in three copies: your copy on your computer, your copy on GitHub, and someone else’s copy on GitHub. Further versions V5-a and V6-a appear in all three copies, followed by V7.]
  16. Open Science: version control. Multiple versions/editions of your analysis scripts: release when you think it is ready, i.e. upon publication, and make a new revision when it has been improved. Versions of software … at a time scale of years/months/weeks. Editions of books … at a time scale of decades. Original scientific data stays constant, but its interpretation may change over time.
  17. Data Management Plan. Think about the data that you will collect and how to document it, and document the details of your data, e.g. in a “codebook”. Do this because you want others to re-use your data; because you want to use data collected by others (to learn new analysis skills, as a pilot, for (re)analysis); and because you want to re-use your own data. Write documentation for your “future self”.
  18. Open Data (FAIR). Findable: data and supplementary materials have sufficiently rich metadata and a unique and persistent identifier. Accessible: data is deposited in a trusted repository, with an authentication and authorization procedure where necessary. Interoperable: (meta)data uses a formal, shared, and broadly applicable language or format. Reusable: data is described with clear and understandable attributes, and there should be a clear and acceptable license for re-use.
  19. Open Science: data. Shared data allows for improved reproducibility; pooling, for small effects that require large group sizes; and data mining, discovery science and generating new hypotheses. It also results in methodological opportunities: improving algorithms, estimating effect and group sizes, making informed decisions on the analysis pipeline, and preventing HARKing and p-hacking.
  20. Data from human participants: the General Data Protection Regulation (GDPR). Challenges: explicit and strict protection of personal data. Opportunities: less influence of differences in national legislation; learning from each other; developing best practices.
  21. Open data versus privacy
  22. Personal data: name, address, date of birth, phone number, license plate number, IP address, ... (Crime Scene Investigation)
  23. (Biometric) data: facial details, dental record, fingerprint, genetics, cortical folding pattern, clinical data, cortical response to stimulation, responses to a questionnaire
  24. Personal data is needed and should be managed. It is required for administration: contacting your participants, paying your participants, following up on incidental findings. It is often not required to address the research question, although it is sometimes used as a confound or to check whether the sample is representative, and it is possibly required to assess scientific integrity.
  25. Personal data. Personal data: name, address, date of birth. Special personal data (“bijzondere persoonsgegevens” in NL): race, religion or beliefs, health, sexual activities, political preference, membership of a union, criminal record. Indirect personal data, which identifies someone when linked to another database: fingerprint, DNA, facial details, anatomical MRI, a specific pattern of data (e.g. answers on a questionnaire or interview).
  26. Gradient between personal and research data. [Figure: personal data is easy (keep private and don’t share); a lot of research data is easy (share as it is with others); indirect personal data is hard (?).]
  27. Limit possible identification. Anonymous: nobody is able to identify the participant. Pseudonymization: use a code instead of the participant’s name. De-identification: remove (indirectly) identifying features and blur the indirect personal data, e.g. deface the anatomical MRI, use the age at the time of acquisition instead of the date of birth, use age bins instead of years, use questionnaire outcomes rather than individual item scores, …
  28. Appropriate blurring depends on the situation, for example for the age of the participant. [Figure contrasting 1-month bins with 10-year bins]
  29. Personal and research data. [Figure: personal data, indirect personal data, and a lot of research data.]
  30. Personal and research data. [Figure: personal data: data minimization and pseudonymization; keep safe and private. Indirect personal data and a lot of research data: data minimization, de-identifying and blurring; share responsibly with legal constraints on reuse.]
  31. Legal constraints: a contract between the researcher and the funding agency, the ethics committee, the participants/patients, the publisher of the results, and the recipient of the data upon sharing.
  32. Legal constraints: Data Use Agreement.
      CC0 (Public Domain): “No copyright. The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.”
      Donders Institute Data Use Agreement for identifiable human data: “I will comply with all relevant rules and regulations imposed by my institution and my government … I will not attempt to establish the identity of or attempt to contact any of the included human subjects. I will not link this data to any other database in a way that could provide identifying information … I will not redistribute or share the data with others, including individuals in my research group, unless they have independently applied and been granted access to this data. I will acknowledge the use of the data and data derived from the data when publicly presenting … Failure to abide by these guidelines will result in termination of my privileges to access to these data.”
      [Figure: participant → you → recipient]
  33. Brain Imaging Data Structure (BIDS)
  34. What is it? BIDS is a way to organize your existing raw data, to improve consistent and complete documentation, and to facilitate re-use by your future self and others. BIDS is not a new file format, a search engine, or a data sharing tool.
  35. BIDS for MRI, MEG, EEG … in future also iEEG, PET, eye-tracker, etc.
      data/
        README
        CHANGES
        dataset_description.json
        participants.tsv
        sub-01/anat/…
        sub-01/meg/…
        sub-01/eeg/sub-01_task-auditory_eeg.edf
        sub-01/eeg/sub-01_task-auditory_eeg.json
        sub-01/eeg/sub-01_task-auditory_channels.tsv
        sub-01/eeg/sub-01_task-auditory_events.tsv
        sub-01/eeg/sub-01_electrodes.tsv
        sub-01/eeg/sub-01_coordinates.json
      EEG data formats: EDF, BrainVision, Neuroscan, Biosemi, EEGLAB .set
  36. Metadata in “sidecar” files. Participants: demographics, questionnaire outcomes. Equipment: amplifier, cap, electrode type and placement, filter settings, reference. Design, task and conditions: instructions, stimulus material, responses, trigger codes. Also some details from the EEG data, to make querying easier.
  37. Why use BIDS? It was developed with open community discussion and the involvement of experienced researchers. Neuroinformatics and analysis tools are available for it: EEGLAB, FieldTrip, MNE-Python, Brainstorm. It increases the chance of your data being indexed and reused, with (future) applications for searching, automated analyses, … But it is more important that you share, and what you share, than how you share it.
  38. Summary: new tools to be adopted for Open Science; planning ahead for analysis and data; version control and release of analysis details; a data management plan; responsible sharing, considering your participants’ rights; organizing EEG data according to BIDS.
  39. Suggested further reading: This presentation on
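The version histories on slides 12 to 15 form a graph in which each version points to the version(s) it was derived from. As a minimal Python sketch, assuming (our reading of slide 14) that V5 merges V4-a and V4-b, the hypothetical `PARENTS` table encodes that graph, and `merge_base` finds the most recent common ancestor of two branches, which is the idea behind `git merge-base`. It illustrates the concept only; it is not how Git stores its history.

```python
# Hypothetical encoding of the slide 14 history: version -> versions it was derived from.
# Assumption: V5 combines (merges) the two successful branches V4-a and V4-b.
PARENTS = {
    "V1": [],
    "V2": ["V1"],
    "V3-a": ["V2"],
    "V3-b": ["V2"],
    "V3-failed": ["V2"],  # an abandoned analysis branch; it stays in the history
    "V4-a": ["V3-a"],
    "V4-b": ["V3-b"],
    "V5": ["V4-a", "V4-b"],
}


def ancestors(version, parents):
    """Return `version` plus every version it was (indirectly) derived from."""
    seen = set()
    stack = [version]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_base(a, b, parents):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in common
        if not any(c != d and c in ancestors(d, parents) for d in common)
    }


print(merge_base("V4-a", "V4-b", PARENTS))      # {'V2'}
print("V3-failed" in ancestors("V5", PARENTS))  # False
```

The failed branch remains recorded in the history but does not contribute to the final version, which is the point slide 14 makes about keeping track of multiple analyses.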
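Slide 17 recommends documenting your data in a “codebook”. One machine-readable way to do that, borrowing the convention BIDS uses for a `participants.json` sidecar (column name mapped to `Description`, `Levels`, `Units`), is sketched below in Python. The column names, levels and example table are hypothetical; the two checks flag columns that the codebook does not describe and categorical values that it does not list.

```python
import csv
import io
import json

# Hypothetical codebook in the style of a BIDS participants.json sidecar.
CODEBOOK = {
    "participant_id": {"Description": "pseudonymized participant code"},
    "age": {"Description": "age at the time of acquisition", "Units": "years"},
    "handedness": {
        "Description": "self-reported handedness",
        "Levels": {"L": "left", "R": "right", "A": "ambidextrous"},
    },
}


def undocumented_columns(tsv_text, codebook):
    """Column names in the TSV header that the codebook does not describe."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return [column for column in header if column not in codebook]


def undefined_levels(tsv_text, codebook):
    """(row, column, value) for categorical values missing from the codebook's Levels."""
    problems = []
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            levels = codebook.get(column, {}).get("Levels")
            if levels is not None and value not in levels:
                problems.append((row_number, column, value))
    return problems


table = "participant_id\tage\thandedness\tgroup\nsub-01\t24\tR\tcontrol\nsub-02\t31\tX\tpatient\n"
print(json.dumps(CODEBOOK, indent=2))         # content for a participants.json file
print(undocumented_columns(table, CODEBOOK))  # ['group']
print(undefined_levels(table, CODEBOOK))      # [(2, 'handedness', 'X')]
```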
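The FAIR criteria on slide 18 can partly be checked automatically. The Python sketch below inspects the content of a `dataset_description.json` as used by BIDS: `Name` and `BIDSVersion` are required there, whereas also demanding `License` (a clear license for re-use) and `DatasetDOI` (a persistent identifier) is our own, stricter assumption. The example values are hypothetical.

```python
import json

REQUIRED_BY_BIDS = ("Name", "BIDSVersion")
# Our own stricter, FAIR-oriented additions: a license and a persistent identifier.
FAIR_ADDITIONS = ("License", "DatasetDOI")


def missing_fields(description):
    """Return the expected fields that are absent or empty in dataset_description.json."""
    return [key for key in REQUIRED_BY_BIDS + FAIR_ADDITIONS if not description.get(key)]


# Hypothetical dataset_description.json content.
description = json.loads("""
{
  "Name": "Auditory EEG example",
  "BIDSVersion": "1.2.1",
  "License": "CC0"
}
""")
print(missing_fields(description))  # ['DatasetDOI']
```

Such a check covers only the metadata side of FAIR; whether the repository is trusted or the attributes are understandable still needs human judgement.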
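Slide 19 notes that small effects require large group sizes, which pooling shared data can provide. A standard normal-approximation formula for a two-sided, two-sample comparison of means makes this concrete: n per group ≈ 2(z₁₋α/₂ + z₁₋β)² / d², where d is the standardized effect size. The Python sketch below uses α = 0.05 and 80% power as assumed defaults; an exact t-test calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group to detect standardized effect size d
    in a two-sided, two-sample comparison (normal approximation)."""
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


for d in (0.8, 0.5, 0.2):  # conventionally "large", "medium" and "small" effects
    print(f"d = {d}: {n_per_group(d)} per group")
# d = 0.8: 25 per group
# d = 0.5: 63 per group
# d = 0.2: 393 per group
```

Because n scales with 1/d², halving the effect size roughly quadruples the required group size, which is why small effects often only become detectable in pooled data.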
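Two of the de-identification steps on slide 27 replace a directly identifying value with a derived one: the age at the time of acquisition instead of the date of birth, and questionnaire outcomes instead of individual item scores. The Python sketch below assumes that the questionnaire outcome is a plain sum of item scores (real instruments may use subscales or reverse-scored items); the record is hypothetical.

```python
from datetime import date


def age_at_acquisition(date_of_birth, acquisition_date):
    """Age in completed years on the day of the recording."""
    birthday_passed = (acquisition_date.month, acquisition_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return acquisition_date.year - date_of_birth.year - (0 if birthday_passed else 1)


def questionnaire_total(item_scores):
    """Assumed scoring rule: the outcome is the plain sum of the item scores."""
    return sum(item_scores)


# Hypothetical raw record as collected in the lab.
record = {
    "name": "Jane Doe",
    "date_of_birth": date(1990, 7, 15),
    "acquisition_date": date(2018, 3, 16),
    "item_scores": [3, 4, 2, 5, 1],
}

# Only derived values go into the shared data; the name and dates stay behind.
shared = {
    "age": age_at_acquisition(record["date_of_birth"], record["acquisition_date"]),
    "questionnaire_total": questionnaire_total(record["item_scores"]),
}
print(shared)  # {'age': 27, 'questionnaire_total': 15}
```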
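Slide 28 makes the point that the appropriate amount of blurring depends on the situation: 1-month bins may be needed in an infant study, while 10-year bins may suffice for adults. A minimal Python sketch of age binning, assuming ages are stored in whole months and bins are half-open intervals [lower, upper):

```python
def age_bin(age_months, bin_months):
    """Return the half-open bin [lower, upper), in months, that contains age_months."""
    if bin_months <= 0:
        raise ValueError("bin width must be a positive number of months")
    lower = (age_months // bin_months) * bin_months
    return lower, lower + bin_months


# Infant study: 1-month bins keep the developmentally relevant detail.
print(age_bin(7, 1))  # (7, 8)

# Adult study: 10-year bins (120 months) blur the age much more strongly.
lower, upper = age_bin(27 * 12 + 4, 120)
print(f"[{lower // 12}, {upper // 12}) years")  # [20, 30) years
```

Keeping the bin width as a parameter reflects the slide's point that there is no single correct amount of blurring.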
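Slide 30 separates what is kept safe and private from what is shared responsibly. The Python sketch below applies data minimization and pseudonymization to hypothetical records: a private key table links each pseudonym to the identity, and the shareable table keeps only the pseudonym and the columns the research question needs. The `sub-01` style codes follow the BIDS `participant_id` convention; sequential codes are a simplification, and in practice the key table must be stored separately from the shared data.

```python
import csv
import io

# Hypothetical administration records mixing personal and research data.
records = [
    {"name": "Jane Doe", "email": "jane@example.org", "age": 27, "handedness": "R"},
    {"name": "John Roe", "email": "john@example.org", "age": 34, "handedness": "L"},
]
RESEARCH_COLUMNS = ["age", "handedness"]  # data minimization: only what is needed


def split_records(records, research_columns):
    """Return (key_table, shared_rows): the key table links pseudonyms to
    identities and stays private; shared rows hold only pseudonym + research data."""
    key_table, shared_rows = [], []
    for i, rec in enumerate(records, start=1):
        code = f"sub-{i:02d}"  # pseudonym used instead of the participant's name
        key_table.append({"participant_id": code, "name": rec["name"], "email": rec["email"]})
        shared_rows.append({"participant_id": code, **{c: rec[c] for c in research_columns}})
    return key_table, shared_rows


def to_tsv(rows):
    """Serialize a list of dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


key_table, shared_rows = split_records(records, RESEARCH_COLUMNS)
print(to_tsv(shared_rows), end="")  # participants.tsv-style content without names or e-mails
```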
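The file names on slide 35 are built from key-value entities (`sub-01`, `task-auditory`) followed by a suffix (`eeg`, `channels`, `events`, `electrodes`) and an extension. A minimal Python sketch of that naming pattern; the function name is hypothetical, the optional `ses-` (session) level is our addition, and the function does not validate names against the full BIDS specification.

```python
from pathlib import PurePosixPath


def bids_eeg_path(subject, suffix, extension, task=None, session=None):
    """Build a relative path such as sub-01/eeg/sub-01_task-auditory_eeg.edf."""
    folder = PurePosixPath(f"sub-{subject}")
    entities = [f"sub-{subject}"]
    if session is not None:
        folder = folder / f"ses-{session}"
        entities.append(f"ses-{session}")
    if task is not None:
        entities.append(f"task-{task}")
    filename = "_".join(entities + [suffix]) + extension
    return folder / "eeg" / filename


# The EEG files listed on slide 35 for subject 01.
for suffix, extension, task in [
    ("eeg", ".edf", "auditory"),
    ("eeg", ".json", "auditory"),
    ("channels", ".tsv", "auditory"),
    ("events", ".tsv", "auditory"),
    ("electrodes", ".tsv", None),
    ("coordinates", ".json", None),
]:
    print(bids_eeg_path("01", suffix, extension, task=task))
```

Generating names from the entities, rather than typing them by hand, keeps the data files and their sidecar files consistently paired.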
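Slide 36 lists what goes into the sidecar files, including the reference, filter settings and trigger codes. The Python sketch below builds a minimal `*_eeg.json` with key names from the BIDS-EEG specification (`TaskName`, `SamplingFrequency`, `EEGReference`, `PowerLineFrequency`, `SoftwareFilters`) and converts trigger codes into `*_events.tsv` rows with `onset` and `duration` in seconds. All values and the trigger-to-condition mapping are hypothetical, as is the assumption that trigger positions are zero-based sample indices counted from the start of the recording.

```python
import json


def eeg_sidecar(task, sampling_frequency, reference, power_line_frequency, software_filters="n/a"):
    """Minimal content of a *_eeg.json sidecar (BIDS-EEG key names)."""
    return {
        "TaskName": task,
        "SamplingFrequency": sampling_frequency,  # Hz
        "EEGReference": reference,
        "PowerLineFrequency": power_line_frequency,  # Hz
        "SoftwareFilters": software_filters,  # "n/a" when this information is not available
    }


def events_tsv(trigger_samples, trigger_codes, sampling_frequency, conditions):
    """Turn trigger sample indices and codes into *_events.tsv text.
    Triggers are treated as instantaneous, so their duration is written as 0."""
    lines = ["onset\tduration\ttrial_type\tvalue"]
    for sample, code in zip(trigger_samples, trigger_codes):
        onset = sample / sampling_frequency  # seconds since the start of the recording
        lines.append(f"{onset:.3f}\t0\t{conditions.get(code, 'n/a')}\t{code}")
    return "\n".join(lines) + "\n"


print(json.dumps(eeg_sidecar("auditory", 500, "Cz", 50), indent=2))
print(events_tsv([1000, 1750], [1, 2], 500, {1: "standard", 2: "deviant"}), end="")
```

Mapping trigger codes to readable condition names in `trial_type` is what lets someone else query the data without the original stimulus script.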

Editor's Notes

  • Questions at the end
    Request from a participant to delete their data -> informed consent procedure
    Description of metadata -> linking to external ontologies
  • Dorothy: simulations, dummy conditions, replicate yourself, pre-registration
  • Review and critical evaluation beyond publication – open methods, tools and data
  • Open Science touches upon each aspect of the research cycle: you start using it when thinking and planning, and continue all the way through to maximizing your impact
  • These have already been discussed in detail by Adrienn in the previous lecture
  • OSF is the service provided by the Center for Open Science
  • Introduction, Methods, Results and Discussion
    The closer your scientific peers are, the more interest they will have in the middle section
  • This is something you know, not only from code but also from manuscripts that you write
  • Personal data is what CSI will search for, and if they cannot find it, they will look at biometric data