EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016


1st Session: July 7, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition.

Published in: Data & Analytics

  1. 1. How to write a Data Management Plan Sarah Jones (DCC) Marjan Grootveld (DANS) both involved in EUDAT and OpenAIRE This work is licensed under the Creative Commons CC-BY 4.0 licence
  2. 2. Open Access Infrastructure for Research in Europe Who we are Research Data Services, Expertise & Technology
  3. 3. Joint webinar held on 26 May 2016 covering: • Reasons to manage data • Horizon 2020 Open Research Data Pilot • How to manage and share data • EUDAT & OpenAIRE services Slides, webinar recording and Q&A document online introductory-webinar-from-openaire-and-eudat Introduction to RDM
  4. 4. • What is a DMP and why write one? • Requirements under Horizon 2020 • Example plans • Lessons and guidance Overview
  5. 5. WHAT IS A DMP & WHY WRITE ONE? Image CC-BY-NC-SA by Leo Reynolds
  6. 6. A DMP is a brief plan to define: • how the data will be created • how it will be documented • who will be able to access it • where it will be stored • who will back it up • whether (and how) it will be shared & preserved DMPs are often submitted as part of grant applications, but are useful whenever researchers are creating data. Data Management Plans
  7. 7. Why manage data? NON PECUNIAE INVESTIGATIONIS CURATORE SED VITAE FACIMUS PROGRAMMAS DATORUM PROCURATIONIS (Not for the research funder, but for life we make data management plans) • Make your research easier • Stop yourself drowning in irrelevant stuff • Save data for later • Avoid accusations of fraud or bad science • Write a data paper • Share your data for re-use • Get credit for it
  8. 8. Research data lifecycle: CREATING DATA, PROCESSING DATA, ANALYSING DATA, PRESERVING DATA, GIVING ACCESS TO DATA, RE-USING DATA. CREATING DATA: designing research, DMPs, planning consent, locating existing data, data collection and management, capturing and creating metadata. PROCESSING DATA: entering, transcribing, checking, validating and cleaning data, anonymising data, describing data, managing and storing data. ANALYSING DATA: interpreting and deriving data, producing outputs, authoring publications, preparing for sharing. PRESERVING DATA: data storage, back-up & archiving, migrating to best format & medium, creating metadata and documentation. ACCESS TO DATA: distributing data, sharing data, controlling access, establishing copyright, promoting data. RE-USING DATA: follow-up research, new research, undertaking research reviews, scrutinising findings, teaching & learning. Ref: UK Data Archive
  9. 9. What data organisation would a re-user like? Planning trick 1: think backwards CREATING DATA PROCESSING DATA PRESERVING DATA GIVING ACCESS TO DATA RE-USING DATA
  10. 10. DMP and data organisation exercises Design a data organisation for the project (folder structure, file naming convention, …) Research Data Netherlands data support training:
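As an illustration of the exercise above, a file naming convention can be written down as a small function so that everyone in the project produces the same names. This is only a sketch: the pattern used here (project, ISO date, short description, two-digit version) and the function name `build_filename` are assumptions made for the example, not a convention prescribed in the webinar.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Return a name like 'myproj_2016-07-07_interview-notes_v02.csv'."""

    def slug(text: str) -> str:
        # Lowercase the text and replace runs of other characters with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".").lower()
    return (f"{slug(project)}_{created.isoformat()}_"
            f"{slug(description)}_v{version:02d}.{ext}")


# Hypothetical project and file, for illustration only.
print(build_filename("MyProj", "Interview notes", date(2016, 7, 7), 2, ".CSV"))
```

Putting the date in ISO format (year first) means files sort chronologically in a plain directory listing, and the zero-padded version number keeps v02 ahead of v10.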
  11. 11. Data organisation
  12. 12. Planning trick 2: include RDM stakeholders Institution RDM policy Facilities €$£ Research funders Publishers Data Availability policy Commercial partners
  13. 13. Responsibilities in RDM
  14. 14. A DMP is about ‘keeping’ data • Storing data ≠ archiving data • Archived data ≠ findable data • Findable ≠ accessible • Accessible ≠ understandable • Understandable ≠ usable • A USB stick is not safe • A persistent ID is essential but no guarantee for usability • Data in a proprietary format is not sustainable
  15. 15. • Findable – Assign persistent IDs, provide rich metadata, register in a searchable resource,... • Accessible – Retrievable by their ID using a standard protocol, metadata remain accessible even if data aren’t... • Interoperable – Use formal, broadly applicable languages, use standard vocabularies, qualified references... • Reusable – Rich, accurate metadata, clear licences, provenance, use of community standards... Making data FAIR
  16. 16. How to deal with data and context? • Versioning, back-up, storage and archiving – During the project and in the long term • Ethics, consent forms, legal access • Security and technical access • Usage licences
  17. 17. What should be preserved and shared? • The data needed to validate results in scientific publications (minimally!). • The associated metadata: the dataset’s creator, title, year of publication, repository, identifier etc. – Follow a metadata standard in your line of work, or a generic standard, e.g. Dublin Core or DataCite, and be FAIR. – The repository will assign a persistent ID to the dataset: important for discovering and citing the data. • Documentation: code books, lab journals, informed consent forms – domain-dependent, and important for understanding the data and combining them with other data sources. • Software, hardware, tools, syntax queries, machine configurations – domain-dependent, and important for using the data. (Alternative: information about the software etc.) Basically, everything that is needed to replicate a study should be available. Plus everything that is potentially useful for others. Research Data Alliance (RDA) FAIR Guiding Principles for scientific data management & stewardship How to select and appraise research
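The metadata elements named on this slide (creator, title, year of publication, repository, identifier) can be checked for completeness before a dataset is deposited. The sketch below assumes a plain dictionary per dataset; the field names and the helper `missing_fields` are hypothetical and do not follow the exact element names of Dublin Core or DataCite.

```python
# Field names mirror the elements listed on the slide; they are illustrative,
# not the official property names of any metadata standard.
REQUIRED_FIELDS = ("creator", "title", "publication_year", "repository", "identifier")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical record: the identifier is still empty because, as the slide
# notes, the repository assigns the persistent ID at deposit time.
record = {
    "creator": "Example, A.",
    "title": "Interview transcripts, wave 1",
    "publication_year": 2016,
    "repository": "Example Data Archive",
    "identifier": "",
}
print(missing_fields(record))
```

For this record the check reports only `identifier` as missing, which is the expected state before deposit.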
  18. 18. DMPS IN HORIZON 2020 Image “Open Data” CC BY 2.0 by
  19. 19. Some funders that require DMPs
  20. 20. Common themes in DMPs 1. Description of data to be collected / created (i.e. content, type, format, volume...) 2. Standards / methodologies for data collection & management 3. Ethics and Intellectual Property (highlight restrictions on data sharing e.g. embargoes, confidentiality) 4. Plans for data sharing and access (i.e. how, when, to whom) 5. Strategy for long-term preservation Start planning and communicating early
  21. 21. Horizon 2020: Open Research Data Pilot a_pilot/h2020-hi-oa-data-mgt_en.pdf • Open access to research data refers to the right to access and re-use digital research data. Openly accessible research data can typically be accessed, mined, exploited, reproduced and disseminated free of charge for the user. • The use of a Data Management Plan (DMP) is required for projects participating in the Open Research Data Pilot, detailing what data the project will generate, whether and how they will be exploited or made accessible for verification and re-use, and how they will be curated and preserved.
  22. 22. H2020 - Open Data by Default from 2017
  23. 23. The RDM basics, tuned to Horizon 2020 • The EC’s goal is Open Access to research data: as open as possible, as closed as necessary. • In H2020 the Data Management Plan (DMP) is a regular project deliverable, due by month 6. • A DMP is a living document: to be used, updated and shared. • You can use the H2020 template in DMPonline. • Deposit the data in a research data repository. Look early for a research data repository for sharing and preserving the data long term. • If (part of your) data cannot be shared with everyone, you may (partially) opt out of the pilot.
  24. 24. Timing the DMP • Note that the Commission does NOT require applicants to submit a DMP at the proposal stage. • A DMP is therefore NOT part of the evaluation. • DMPs are a deliverable for those in the pilot. • Note that the Commission requires updates. A DMP is a living or “active” document.
  25. 25. Initial DMP (at 6 months) The DMP should address the points below on a dataset by dataset basis: • Dataset reference and name • Data set description • Standards and metadata • Data sharing • Archiving and preservation (including storage and backup)
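One way to keep the initial DMP organised on a dataset-by-dataset basis is to hold the five headings above in a small data structure and list which ones are still empty. This is a sketch under assumptions: the class `DatasetEntry`, the helper names and the sample text are made up for illustration and are not part of the H2020 template or DMPonline.

```python
from dataclasses import dataclass, fields

# Headings as listed on the slide, keyed by the attribute that stores them.
HEADINGS = {
    "reference_and_name": "Dataset reference and name",
    "description": "Data set description",
    "standards_and_metadata": "Standards and metadata",
    "data_sharing": "Data sharing",
    "archiving_and_preservation": "Archiving and preservation (including storage and backup)",
}


@dataclass
class DatasetEntry:
    reference_and_name: str
    description: str = ""
    standards_and_metadata: str = ""
    data_sharing: str = ""
    archiving_and_preservation: str = ""


def open_points(entry: DatasetEntry) -> list[str]:
    """Return the headings that have not been filled in yet."""
    return [HEADINGS[f.name] for f in fields(entry)
            if not getattr(entry, f.name).strip()]


def render(entry: DatasetEntry) -> str:
    """Render one dataset section as plain text, flagging empty headings."""
    lines = []
    for f in fields(entry):
        value = getattr(entry, f.name).strip() or "TO BE COMPLETED"
        lines.append(f"{HEADINGS[f.name]}: {value}")
    return "\n".join(lines)


# Hypothetical dataset, for illustration only.
entry = DatasetEntry("DS01 survey responses",
                     description="Self-reported answers from an online survey.")
print(render(entry))
print(open_points(entry))
```

Because the DMP is a living document, the list of open points gives a simple indication of what to address in the next version.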
  26. 26. More elaborate DMP Scientific research data should be easily: 1. Discoverable Are the data discoverable and identifiable by a standard mechanism e.g. DOIs? 2. Accessible Are the data accessible and under what conditions e.g. licenses, embargoes? 3. Assessable and intelligible Are the data and software assessable and intelligible to third parties for peer-review? E.g. can judgements be made about their reliability and the competence of those who created them? 4. Useable beyond the original purpose for which it was collected Are the data properly curated and stored together with the minimum software and documentation to be useful by third parties in the long-term? 5. Interoperable to specific quality standards Are the data and software interoperable, allowing data exchange? E.g. were common formats and standards for metadata used?
  27. 27. DMPonline A web-based tool to help researchers write DMPs Includes a template for Horizon 2020 Guidance from EUDAT and OpenAIRE being added
  28. 28. How the tool works Click to write a generic DMP Or choose your funder to get their specific template Pick your uni to add local guidance and to get their template if no funder applies Choose any additional optional guidance
  29. 29. EUDAT guidance
  30. 30. OpenAIRE support • Summary on the Open Research Data pilot • Brief guide on developing a DMP • Selecting a data repository • Developing guidance to add to DMPonline • Will be adding an ‘export to Zenodo’ feature in early 2017 to allow DMPs to be published and assigned a DOI
  31. 31. Deliver the DMP and keep it up to date • EC: “Since DMPs are expected to mature during the project, more developed versions of the plan can be included as additional deliverables at later stages. (…) New versions of the DMP should be created whenever important changes to the project occur due to inclusion of new data sets, changes in consortium policies or external factors.” Focus on how you will ensure your data are “FAIR”
  32. 32. Active DMPs • Interested in ways to support this active quality, where “active” is understood as “able to evolve and be monitored”? • Join the RDA’s Active Data Management Plans interest group management-plans.html • And see recordings, slides and notes of the international and interdisciplinary ADMP Workshop 28-30 June 2016
  33. 33. Option: add SSI template for software projects Two templates available for Software Management Plans in DMPonline courtesy of SSI
  35. 35. Example plans • 108 DMPs from the National Endowment for the Humanities grant-applications-2011-2014-now-available • 20+ scientific DMPs submitted to the NSF (USA) provided by UCSD – dmp- samples.html • Example DMP collection from Leeds University • Further examples:
  36. 36. Example: OpenMinTeD OpenMinTeD aims to create an infrastructure for Text and Data Mining (TDM) of scientific and scholarly content Have adopted their own structure to create a ‘Data and Software Management Plan’
  37. 37. Example: OpenMinTeD – Data chapter Six high-level datasets identified: 1. Scholarly publications 2. Language and knowledge resources 3. Services and workflows 4. Automatically and manually generated annotations 5. Consortium publications 6. Metadata Described in a table per dataset (see illustration)
  38. 38. OpenMinTeD – Software examples
  39. 39. Example: CAPSELLA CAPSELLA aims to develop ICT solutions for farmers and other actors engaged in agrobiodiversity Devised a questionnaire to collate dataset information from project partners Identified 13 datasets, 6 of which are imported as is, 3 aggregated, 3 transformed and 1 generated
  40. 40. 4 types of data • Core Datasets - datasets related to the main project activities. The majority pre-exist CAPSELLA and are publicly available. • Produced Datasets - datasets resulting from CAPSELLA’s pilot applications. These include sensor data, field data and user related datasets. • Project Related Data - datasets resulting from the operation of the project. They are collections of standard material e.g. deliverables, dissemination material, training material, scientific publications. • Software - datasets resulting from the software developed in the frame of CAPSELLA. These datasets are mainly software artefacts and source code and can be used for various purposes including research tasks or the development of new software components.
  41. 41. Example dataset record
  42. 42. Differing priorities?
  43. 43. Data description examples The final dataset will include self-reported demographic and behavioural data from interviews with the subjects and laboratory data from urine specimens provided. From NIH data sharing statements Every two days, we will subsample E. affinis populations growing under our treatment conditions. We will use a microscope to identify the life stage and sex of the subsampled individuals. We will document the information first in a laboratory notebook and then copy the data into an Excel spreadsheet. The Excel spreadsheet will be saved as a comma separated value (.csv) file. From DataOne – E. affinis DMP example
  44. 44. Metadata examples Metadata will be tagged in XML using the Data Documentation Initiative (DDI) format. The codebook will contain information on study design, sampling methodology, fieldwork, variable-level detail, and all information necessary for a secondary analyst to use the data accurately and effectively. From ICPSR Framework for Creating a DMP We will first document our metadata by taking careful notes in the laboratory notebook that refer to specific data files and describe all columns, units, abbreviations, and missing value identifiers. These notes will be transcribed into a .txt document that will be stored with the data file. After all of the data are collected, we will then use EML (Ecological Metadata Language) to digitize our metadata. EML is one of the accepted formats used in ecology, and works well for the types of data we will be producing. We will create these metadata using Morpho software, available through KNB. The metadata will fully describe the data files and the context of the measurements. From DataOne – E. affinis DMP example
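The second example above describes a plain-text note, stored with the data file, that documents all columns, units, abbreviations and missing value identifiers. A minimal sketch of that step is shown below. The column names, the `_README.txt` suffix and the helper functions are assumptions made for illustration; the quoted DMP does not specify them.

```python
from pathlib import Path


def sidecar_path(data_file: Path) -> Path:
    """Place the description next to the data file, e.g. x.csv -> x_README.txt."""
    return data_file.with_name(data_file.stem + "_README.txt")


def describe_columns(columns: list[tuple[str, str, str]], missing_value: str) -> str:
    """Build the text of the note from (name, unit, meaning) triples."""
    lines = [f"Missing values are coded as: {missing_value}", "", "Columns:"]
    for name, unit, meaning in columns:
        lines.append(f"- {name} [{unit}]: {meaning}")
    return "\n".join(lines) + "\n"


# Hypothetical columns for a subsampling spreadsheet exported as .csv.
COLUMNS = [
    ("sample_date", "YYYY-MM-DD", "date the population was subsampled"),
    ("life_stage", "category", "developmental stage identified under the microscope"),
    ("sex", "F/M/U", "sex of the individual; U = undetermined"),
]

data_file = Path("data") / "subsample.csv"
print(sidecar_path(data_file))
print(describe_columns(COLUMNS, missing_value="NA"))
# To store the note with the data: sidecar_path(data_file).write_text(...)
```

Keeping the note in a predictable location next to the data makes it straightforward to transcribe later into a formal standard such as EML.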
  45. 45. Data sharing examples We will make the data and associated documentation available to users under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed. From NIH data sharing statements The videos will be made available via the website (both as streaming media and downloads). HD and SD versions will be provided to accommodate those with lower bandwidth. Videos will also be made available via Vimeo, a platform that is already well used by research students at Bristol. Appropriate metadata will also be provided to the existing Vimeo standard. All video will also be available for download and re-editing by third parties. To facilitate this, Creative Commons licenses will be assigned to each item. In order to ensure this usage is possible, the required permissions will be gathered from participants (using a suitable release form) before recording commences. From University of Bristol Kitchen Cosmology DMP
  46. 46. Example restrictions Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement. From NIH data sharing statements 1. Share data privately within 1 year. Data will be held in Private Repository, but metadata will be public. 2. Release data to public within 2 years. Encouraged after one year to release data for public access. 3. Request, in writing, data privacy up to 4 years. Extensions beyond 3 years will only be granted for compelling cases. 4. Consult with creators of private CZO datasets prior to use. PIs are required to seek consent before using private data they can access. From Boulder Creek Critical Zone Observatory DMP
  47. 47. Archiving examples The investigators will work with staff at the UKDA to determine what to archive and how long the deposited data should be retained. Future long-term use of the data will be ensured by placing a copy of the data into the repository. From ICPSR Framework for Creating a DMP Data will be provided in file formats considered appropriate for long-term access, as recommended by the UK Data Service. For example, SPSS Portable format and tab-delimited text for quantitative tabular data and RTF and PDF/A for interview transcripts. Appropriate documentation necessary to understand the data will also be provided. Anonymised data will be held for a minimum of 10 years following project completion, in compliance with LSHTM’s Records Retention and Disposal Schedule. Biological samples (output 3) will be deposited with the UK BioBank for future use. From Writing a Wellcome Trust Data Management and Sharing Plan
  48. 48. Share your example DMPs! Send us links to your DMPs We will add them to the DCC list Aim to cover wide range of disciplines and funders share-DMPs
  49. 49. LESSONS AND RESOURCES Image ‘Energy Resources | Energie Quelle’ CC-BY-NC by K. H. Reichert
  50. 50. Tips for writing DMPs • Seek advice - consult and collaborate • Consider good practice for your field • Base plans on available skills & support • Make sure implementation is feasible • Think about things early…
  51. 51. Plan to share data from the outset • Licence negotiations and consent agreements may preclude later sharing if they are not handled carefully • Costings can’t be included retrospectively • Useful to consider data issues at the consortium negotiation stage to make sure potential issues are identified and sorted asap Decisions made early on affect what you can do later
  52. 52. Sharing data: what is meant? With collaborators while research is active Data are mutable (Open) data sharing Data are stable, searchable, citable, clearly licensed
  53. 53. Storing data: what is meant? Storing and backing up files while research is active Likely to be on a networked filestore or hard drive Easy to change or delete Archiving or preserving data in the long-term Likely to be deposited in a digital repository Safeguarded and preserved
  54. 54. Archiving, repositories, ehm? • Horizon 2020 ORD pilot participants are asked to “deposit your data in a research data repository”: a digital archive collecting and displaying datasets and their metadata. • Select a data repository that will preserve your data, metadata and possibly tools in the long term. • It is advisable to contact the repository of your choice when writing the first version of your DMP. • Repositories may offer guidelines for sustainable data formats and metadata standards, as well as support for dealing with sensitive data and licensing.
  55. 55. Where to find a repository? • More information: • Zenodo:
  56. 56. Searching with
  57. 57. How to select a repository? 1/2 • Main criteria for choosing a data repository: Certification as a ‘Trustworthy Digital Repository’, with an explicit ambition to keep the data available in the long term. • Three common certification standards for TDRs: Data Seal of Approval: nestor seal: Siegel/siegel_node.html ISO 16363:
  58. 58. How to select a repository? 2/2 • Main criteria for choosing a data repository: Certification as a ‘Trustworthy Digital Repository’, with an explicit ambition to keep the data available in the long term. • Matches your particular data needs: e.g. formats accepted; mixture of Open and Restricted Access. • Provides guidance on how to cite the data that has been deposited. • Gives your submitted dataset a persistent and globally unique identifier: for sustainable citations – both for data and publications – and to link back to particular researchers and grants.
  59. 59. Licensing research data • Horizon 2020 guidelines point to CC-BY or CC-0 • EUDAT licensing wizard helps you pick a licence for data & software • DCC How-to guide helps you to license data
  60. 60. • How to develop a DMP • RDM brochure and template material?set_language=en • OpenAIRE guidelines • • ICPSR framework for a DMP ework.html Guidelines on DMPs
  61. 61. • Guidelines on Data Management in Horizon 2020 • Provides summary of requirements • Includes templates for DMPs ef/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf EC guidance
  62. 62. KEY MESSAGES Image “Fishbone” CC BY-NC-ND 2.0 by ttps://
  63. 63. Key messages • The principles of good research conduct hold for all of us, across disciplinary boundaries. • Data management is all in a day’s work. • Planning and reflection are more important than the plan – but write the DMP and keep it up to date. • Planning data management is team work. • Think about the desired end result and plan for this. • Decisions made early affect what you can do later.
  64. 64. Thanks – any questions? Contact us: Marjan Grootveld: Sarah Jones: Acknowledgements: Thanks to DANS and DCC for reuse of slides, and to the OpenMinTeD and CAPSELLA projects for sharing their Data Management Plans