The Donders Repository

  1. A short overview of the Donders Repository https://data.donders.ru.nl Presentation for the Erwin Hahn Institute - 3 February 2021 Robert Oostenveld r.oostenveld@donders.ru.nl
  2. The Donders Repository Outline: The aims of the Donders Repository; Procedural design; Technical architecture; Fitting it into the researchers' daily work; What data goes where? The timeline of a project and its data; Closing collections, review and versions; Making open data FAIR; BIDS as specific standard for neuroimaging data; Demonstration; Automatic data flow; Informed consent and GDPR; Data Use Agreements
  3. About myself Member of the Research Infrastructure Committee since 2003. MEG physicist since 2009. Affiliated researcher at Karolinska Institutet since 2015. Associate PI at DCCN since 2020. Main research interest in the development of data analysis methods for MEG/EEG, such as source reconstruction, spectral analysis methods and stats. Also strong interest in Open Science and Team Science. Shortly after the “F.C. Donders Centre” started, I initiated the FieldTrip project (together with colleagues). In 2010 I got involved in the Human Connectome Project. In 2014 I got involved with the Donders- and RU-wide RDM efforts, together with Eric Maris, Erik van der Boogert, and Hurng-Chung Lee (the DR “core team”).
  4. Aims for the Donders Repository Research initiation Data acquisition Data analysis & documentation Data sharing Secure the original research data Document the research process Make the data accessible to the right people Your PI, your current collaborators, your future successors Your audience, i.e. sharing of open data Keep the data accessible in the future (beyond your contract)
  5. The Donders Repository: summary of the development timeline. Active contributors: University board, Legal department, Security officer, central IT department, Research Information Services (library etc.), other interested institutes. 2014: initial planning. 2015: design of data management protocols and specification of IT requirements. 2016: first implementation accessible to researchers. 2016-2018: refinements and improvements, increased adoption. 2019-now: scale-up to the whole Radboud University, improvements to scalability. The Donders Repository is presently used by 1873 researchers to manage their research data, organized in approximately 1500 collections with about 150 TB of data. There are 144 published (open access) data sharing collections. https://data.donders.ru.nl https://data.ru.nl
  6. The Donders Repository: procedural design. Different roles: administration, managers, contributors, viewers. Different collections: for raw data (Data Acquisition Collection, DAC), for processed data (Research Documentation Collection, RDC), for publicly shared data (Data Sharing Collection, DSC). Collection states: Open/editable (read-write), Internal/external review (read-only), Archived or Published (permanent read-only, DOI). It should allow for large data (1000s of files, 100s of GB) that are organized per collection by the researcher. No zip files required, no limits. Authenticated access from inside and outside the institute. Not a replacement for the (much faster) work-in-progress storage systems. Suitable as a long-term (>10 year) archive. It should be scalable and grow along with our needs, also when IT subsystems change. Research initiation, Data acquisition, Data analysis & documentation, Data sharing
  7. The Donders Repository Technical architecture IRODS/ICAT Low-level (meta)data management software JAVA middleware JAVA frontend web access Elastic stack (ELK) WebDAV future … file access Scalable network-attached storage system Isilon, Compellent, CephFS, future … Replication storage IRODS/ICAT future … Federated IDP Surfconext & ORCID (SAML) past … DataCite DOI
  8. The Donders Repository: thinking along with the researchers’ struggles with their data. What to do when? - While preparing the project… - While acquiring the data… - When implementing the analysis pipeline… - When finalizing the manuscript for submission… Making it attractive for our (junior) researchers. This is part of the embedding that researchers get at the DCCN; it also includes support for ethics, experimental design, acquisition, analysis, high-performance computing, etc. The typical PhD student or postdoc goes through this cycle a few times while at the DCCN. Research initiation, Data acquisition, Data analysis & documentation, Data sharing
  9. Scanners and labs Donders Repository Central storage /project/30xx0yy.zz DICOM, physiology, eyetracking, MEG, presentation log files, questionnaires, … “raw” analysis scripts and intermediate results “shared” DAC DSC RDC convert, deface, etc. de-identified data and some results
  10. The Donders Repository: the timeline of a project and its data. The researcher presents a project proposal and gets a PPM number. The administrator creates a data acquisition collection; the PI and the researcher are usually both managers. The researcher collects and archives raw data. The researcher analyzes the data. The researcher writes and submits a manuscript. The administrator creates a research documentation collection and a data sharing collection. The researcher moves the processed data to the archive. The researcher moves the to-be-published data to the archive. The three collections are closed. Research initiation, Data acquisition, Data analysis & documentation, Data sharing
  11. Reviewing and archiving data using the Donders Repository Collection states for internal and for raw data Editable/ open Internal review Archived This creates a new version with the same DOI, the old version remains available as well
  12. Reviewing and publishing data using the Donders Repository Collection states for shared/published data Editable/ open Internal review External review Published This creates a new version with the same DOI, the old version remains available as well
  13. The Donders Repository Making Open Data FAIR. Findable: make your data available on a repository with a persistent identifier (DOI, handle) and metadata. Accessible: be explicit about data usage terms (agreement with downloader). Interoperable: make your data human and machine readable, e.g. BIDS. Reusable: make sure you document enough details, e.g. as a “data descriptor” paper which can be cited, along with citing your data -> measurable impact!
  14. The Donders Repository Making Open Data FAIR. The Donders Repository takes care of: storage; procedures and protocols; roles and responsibilities; long-term management; authentication and authorization; internal data flow/access; data use agreements; external data access; pushing metadata to RIS, NARCIS and Google.
  15. The Donders Repository Making Open Data FAIR The Donders Institute is very broad, with 4 centres over 3 faculties, 80 principal investigators and some 800 researchers. We don’t impose explicit standards for how to organize and store generic neuroscience data or metadata. Only minimal metadata at the collection level. Multiple domain-specific standards needed for I and R.
  16. Making human neuroimaging data FAIR. BIDS is a way to organize your existing raw data: to improve consistent and complete documentation, and to facilitate re-use by your future self and others. BIDS is not: a new file format, a search engine, or a data sharing tool. https://bids-standard.org https://github.com/Donders-Institute/bidscoin
  17. Making human neuroimaging data FAIR https://bids-standard.org https://github.com/Donders-Institute/bidscoin
  18. The Donders Repository Summary: The aims of the Donders Repository; Procedural design; Technical architecture; Fitting it into the researchers' daily work; What data goes where? The timeline of a project and its data; Closing collections, review and versions; Making open data FAIR; BIDS as specific standard for neuroimaging data; Demonstration; Automatic data flow; Informed consent and GDPR; Data Use Agreements
  19. www.ru.nl/donders https://data.donders.ru.nl r.oostenveld@donders.ru.nl
  20. Demo datasets: https://doi.org/10.34973/3jk5-6j57 : mouse data, CC-BY https://doi.org/10.34973/j05g-fr58 : “Dr Who” MRI, RU-DI-HD
  21. The Donders Repository Data Use Agreement - Data Sharing Collections Data use agreement for identifiable human data Version RU-DI-HD-1.0 I request access to the data collected in the digital repository of the Donders Institute for Brain, Cognition and Behaviour, part of the Radboud University, established at Nijmegen, the Netherlands (hereinafter referred to as the Donders Institute), and I agree to the following: 1. I will comply with all relevant rules and regulations imposed by my institution and my government. This may mean that I need my research to be approved or declared exempt by a committee that oversees research on human subjects, e.g. my Institutional Review Board or Ethics Committee. 2. I will not attempt to establish the identity of or attempt to contact any of the included human subjects. I will not link this data to any other database in a way that could provide identifying information. I understand that under no circumstances will the code that would link these data to an individual's personal information be given to me, nor will any additional information about individual subjects be released to me under these Data Use Terms. 3. I will not redistribute or share the data with others, including individuals in my research group, unless they have independently applied and been granted access to this data. 4. I will acknowledge the use of the data and data derived from the data when publicly presenting any results or algorithms that benefitted from their use. (a) Papers, book chapters, books, posters, oral presentations, and all other presentations of results derived from the data should acknowledge the origin of the data as follows: "Data were provided (in part) by the Donders Institute for Brain, Cognition and Behaviour". (b) Authors of publications or presentations using the data should cite relevant publications describing the methods developed and used by the Donders Institute to acquire and process the data. The specific publications that are appropriate to cite in any given study will depend on what the data were used and for what purposes. When applicable, a list of publications will be included in the collection. (c) Neither the Donders Institute or Radboud University, nor the researchers that provide this data should be included as an author of publications or presentations if this authorship would be based solely on the use of this data. 5. Failure to abide by these guidelines will result in termination of my privileges to access to these data. I will not attempt to establish the identity of or attempt to contact any of the included human subjects. I will not link this data to any other database that could provide identifying …
  22. Typical (inefficient) reuse of raw data acquisition PPM DAC analysis publication DSC RDC analysis publication DSC RDC analysis publication DSC RDC Year 0 Year N Never
  23. More efficient reuse of shared/published data acquisition PPM DAC specific analysis publication RDC common preproc. DSC specific analysis publication RDC specific analysis publication RDC
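
The collection states on slides 6, 11 and 12 can be read as a small state machine. The following is a minimal sketch of that reading, not the repository's actual implementation: the class and function names are hypothetical, and the assumption that reopening a closed collection is what "creates a new version with the same DOI" is an interpretation of the slide text.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    EDITABLE = "editable/open"
    INTERNAL_REVIEW = "internal review"
    EXTERNAL_REVIEW = "external review"
    ARCHIVED = "archived"
    PUBLISHED = "published"


# Forward paths as listed on slides 11 and 12.
# DAC and RDC end in "archived"; DSC adds external review and ends in "published".
_INTERNAL = [State.EDITABLE, State.INTERNAL_REVIEW, State.ARCHIVED]
_SHARED = [State.EDITABLE, State.INTERNAL_REVIEW,
           State.EXTERNAL_REVIEW, State.PUBLISHED]
PATHS = {"DAC": _INTERNAL, "RDC": _INTERNAL, "DSC": _SHARED}


@dataclass
class Collection:
    """Hypothetical model of a collection; not the repository's data model."""
    kind: str                      # "DAC", "RDC" or "DSC"
    state: State = State.EDITABLE
    version: int = 1

    def is_read_only(self) -> bool:
        # Slide 6: only the open/editable state is read-write.
        return self.state is not State.EDITABLE

    def advance(self) -> State:
        """Move one step along the path for this kind of collection."""
        path = PATHS[self.kind]
        position = path.index(self.state)
        if position == len(path) - 1:
            raise ValueError(f"{self.kind} is already {self.state.value}")
        self.state = path[position + 1]
        return self.state

    def reopen(self) -> int:
        """Assumption: reopening a closed collection starts a new version.

        The DOI is not modelled here because, per the slides, it stays
        the same across versions.
        """
        if self.state is not PATHS[self.kind][-1]:
            raise ValueError("only archived/published collections can be reopened")
        self.version += 1
        self.state = State.EDITABLE
        return self.version
```

In this sketch a DSC needs three calls to `advance()` to reach the published state, whereas a DAC or RDC is archived after two; a further `advance()` raises `ValueError`.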
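
Slide 16 describes BIDS as a way to organize existing raw data rather than a new file format. As a rough illustration of what that organization looks like, the sketch below builds a BIDS-style relative path from a few entities (subject, session, task, run). The function name and its parameters are hypothetical; this is not part of bidscoin or of any BIDS tool, and it covers only a small subset of the entities defined by the standard.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

_LABEL = re.compile(r"[A-Za-z0-9]+")


def _check(label: str) -> str:
    # BIDS labels are alphanumeric; reject anything else early.
    if not _LABEL.fullmatch(label):
        raise ValueError(f"invalid BIDS label: {label!r}")
    return label


def bids_path(sub: str, datatype: str, suffix: str,
              ses: Optional[str] = None, task: Optional[str] = None,
              run: Optional[str] = None, ext: str = ".nii.gz") -> PurePosixPath:
    """Return sub-<label>/[ses-<label>/]<datatype>/<entities>_<suffix><ext>."""
    entities = [f"sub-{_check(sub)}"]
    if ses is not None:
        entities.append(f"ses-{_check(ses)}")
    if task is not None:
        entities.append(f"task-{_check(task)}")
    if run is not None:
        entities.append(f"run-{_check(run)}")

    folder = PurePosixPath(f"sub-{sub}")
    if ses is not None:
        folder = folder / f"ses-{ses}"
    filename = "_".join(entities + [suffix]) + ext
    return folder / datatype / filename


if __name__ == "__main__":
    print(bids_path("01", "anat", "T1w"))
    print(bids_path("01", "func", "bold", ses="mri01", task="rest", run="1"))
```

The point of the example is that the subject, session and modality are all recoverable from the path alone, which is what makes a dataset organized this way both human and machine readable.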

Editor's Notes

  1. Relevant to mention: many data management systems have a web interface, but you cannot reliably download or upload many files and nested directories in the browser, only zip files. And that imposes constraints on the size and organization.
  2. Web access is mainly for metadata and management. File access is for moving data in and out.
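
To illustrate why file access matters for many files and nested directories, here is a minimal sketch that walks a local directory tree and pairs every file with a destination path that preserves the nesting. It only builds the transfer plan and does no network I/O; the function name and the remote root used in the example are hypothetical and do not refer to an actual repository endpoint.

```python
import os
from pathlib import Path, PurePosixPath
from typing import List, Tuple


def transfer_plan(local_root: str, remote_root: str) -> List[Tuple[Path, PurePosixPath]]:
    """Pair each local file with a remote path that keeps the directory layout."""
    root = Path(local_root)
    plan = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            source = Path(dirpath) / name
            relative = source.relative_to(root)
            target = PurePosixPath(remote_root).joinpath(*relative.parts)
            plan.append((source, target))
    return plan


if __name__ == "__main__":
    # "/hypothetical/collection" is a placeholder, not a real collection path.
    for source, target in transfer_plan(".", "/hypothetical/collection"):
        print(source, "->", target)
```

A plan like this could be handed to whatever file-access client is in use, so that thousands of files keep their original organization without first being packed into a zip file.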