
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016



2nd Session: July 14, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition.

Published in: Data & Analytics


  1. 1. How to write a Data Management Plan Sarah Jones (DCC) Marjan Grootveld (DANS) both involved in EUDAT and OpenAIRE This work is licensed under the Creative Commons CC-BY 4.0 licence
  2. 2. Open Access Infrastructure for Research in Europe Who we are Research Data Services, Expertise & Technology
  3. 3. Joint webinar held on 26 May 2016 covering: • Reasons to manage data • Horizon 2020 Open Research Data Pilot • How to manage and share data • EUDAT & OpenAIRE services Slides, webinar recording and Q&A document online introductory-webinar-from-openaire-and-eudat Introduction to RDM
  4. 4. • What is a DMP and why write one? • Requirements under Horizon 2020 • Example plans • Lessons and guidance Overview
  5. 5. WHAT IS A DMP & WHY WRITE ONE? Image CC-BY-NC-SA by Leo Reynolds
  6. 6. A DMP is a brief plan to define: • how the data will be created • how it will be documented • who will be able to access it • where it will be stored • who will back it up • whether (and how) it will be shared & preserved DMPs are often submitted as part of grant applications, but are useful whenever researchers are creating data. Data Management Plans
  7. 7. Why manage data? NON PECUNIAE INVESTIGATIONIS CURATORE SED VITAE FACIMUS PROGRAMMAS DATORUM PROCURATIONIS (Not for the research funder, but for life we make data management plans) • Make your research easier • Stop yourself drowning in irrelevant stuff • Save data for later • Avoid accusations of fraud or bad science • Write a data paper • Share your data for re-use • Get credit for it
  8. 8. Research data lifecycle: CREATING DATA, PROCESSING DATA, ANALYSING DATA, PRESERVING DATA, GIVING ACCESS TO DATA, RE-USING DATA. CREATING DATA: designing research, DMPs, planning consent, locate existing data, data collection and management, capturing and creating metadata. PROCESSING DATA: entering, transcribing, checking, validating and cleaning data, anonymising data, describing data, manage and store data. ANALYSING DATA: interpreting & deriving data, producing outputs, authoring publications, preparing for sharing. PRESERVING DATA: data storage, back-up & archiving, migrating to best format & medium, creating metadata and documentation. ACCESS TO DATA: distributing data, sharing data, controlling access, establishing copyright, promoting data. RE-USING DATA: follow-up research, new research, undertake research reviews, scrutinising findings, teaching & learning. Ref: UK Data Archive:
  9. 9. What data organisation would a re-user like? Planning trick 1: think backwards CREATING DATA PROCESSING DATA PRESERVING DATA GIVING ACCESS TO DATA RE-USING DATA
  10. 10. Data organisation exercises Design a data organisation for the project (folder structure, file naming convention, …) Research Data Netherlands data support training:
  11. 11. Data organisation
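The data organisation exercise above (folder structure, file naming convention) can be tried out in a few lines of Python. This is only a sketch: the folder names, the naming pattern `project_description_YYYYMMDD_vNN.ext` and the example values are assumptions made for illustration, not a convention recommended in the webinar.

```python
from datetime import date
from pathlib import Path
import re
import tempfile

# Hypothetical folder layout; the webinar does not prescribe one.
FOLDERS = ["data/raw", "data/processed", "documentation", "scripts", "outputs"]


def create_layout(root):
    """Create the project folders under root and return their paths."""
    created = []
    for name in FOLDERS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def slug(text):
    """Lower-case the text and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project, description, day, version, extension):
    """Assumed convention: project_description_YYYYMMDD_vNN.ext"""
    return "{}_{}_{}_v{:02d}.{}".format(
        slug(project),
        slug(description),
        day.strftime("%Y%m%d"),
        version,
        extension.lstrip("."),
    )


# Build the layout in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    folders = create_layout(tmp)
    layout_ok = all(folder.is_dir() for folder in folders)

example_name = build_file_name(
    "Demo Project", "Interview transcripts", date(2016, 7, 14), 3, ".csv"
)
print(example_name)
```

Putting the date in year-month-day order and zero-padding the version number means that an alphabetical file listing is also a chronological one, which is the kind of thing a later re-user tends to appreciate.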
  12. 12. Planning trick 2: include stakeholders Institution RDM policy Facilities €$£ Research funders Publishers Data Availability policy Commercial partners
  13. 13. Responsibilities in RDM
  14. 14. A DMP is about ‘keeping’ data • Storing data ≠ archiving data • Archived data ≠ findable data • Findable ≠ accessible • Accessible ≠ understandable • Understandable ≠ usable • A USB stick is not safe • A persistent ID is essential but no guarantee for usability • Data in a proprietary format is not sustainable
  15. 15. • Findable – Assign persistent IDs, provide rich metadata, register in a searchable resource,... • Accessible – Retrievable by their ID using a standard protocol, metadata remain accessible even if data aren’t... • Interoperable – Use formal, broadly applicable languages, use standard vocabularies, qualified references... • Reusable – Rich, accurate metadata, clear licences, provenance, use of community standards... Making data FAIR
  16. 16. How to deal with data and context? • Versioning, back-up, storage and archiving – During the project and in the long term • Ethics, consent forms, legal access • Security and technical access • Usage licences
  17. 17. What should be preserved and shared? • The data needed to validate results in scientific publications (minimally!). • The associated metadata: the dataset’s creator, title, year of publication, repository, identifier etc. – Follow a metadata standard in your line of work, or a generic standard, e.g. Dublin Core or DataCite, and be FAIR. – The repository will assign a persistent ID to the dataset: important for discovering and citing the data. • Documentation: code books, lab journals, informed consent forms – domain-dependent, and important for understanding the data and combining them with other data sources. • Software, hardware, tools, syntax queries, machine configurations – domain-dependent, and important for using the data. (Alternative: information about the software etc.) Basically, everything that is needed to replicate a study should be available. Plus everything that is potentially useful for others. Research Data Alliance (RDA) FAIR Guiding Principles for scientific data management & stewardship How to select and appraise research
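As a rough illustration of the "associated metadata" bullet above, the following Python sketch checks a dataset record for the five elements the slide names (creator, title, year of publication, repository, identifier). The field names and the example values are assumptions; this is not the Dublin Core or DataCite schema, and a real deposit should follow the repository's own metadata form.

```python
# Fields named on the slide; this is NOT an official Dublin Core or DataCite schema.
REQUIRED_FIELDS = ("creator", "title", "publication_year", "repository", "identifier")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record with placeholder values.
record = {
    "creator": "Example, Researcher",
    "title": "Example survey dataset",
    "publication_year": 2016,
    "repository": "Example Repository",
    "identifier": "",  # left empty: the repository assigns the persistent ID on deposit
}

print(missing_fields(record))
```

Run before deposit, a check like this flags the identifier as still missing, which matches the slide: the persistent ID only exists once the repository has assigned it.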
  18. 18. DMPS IN HORIZON 2020 Image “Open Data” CC BY 2.0 by
  19. 19. Some funders that require DMPs
  20. 20. Common themes in DMPs 1. Description of data to be collected / created (i.e. content, type, format, volume...) 2. Standards / methodologies for data collection & management 3. Ethics and Intellectual Property (highlight restrictions on data sharing e.g. embargoes, confidentiality) 4. Plans for data sharing and access (i.e. how, when, to whom) 5. Strategy for long-term preservation Start planning and communicating early
  21. 21. Horizon 2020: Open Research Data Pilot a_pilot/h2020-hi-oa-data-mgt_en.pdf • Open access to research data refers to the right to access and re-use digital research data. Openly accessible research data can typically be accessed, mined, exploited, reproduced and disseminated free of charge for the user. • The use of a Data Management Plan (DMP) is required for projects participating in the Open Research Data Pilot, detailing what data the project will generate, whether and how they will be exploited or made accessible for verification and re-use, and how they will be curated and preserved.
  22. 22. Who’s involved in this pilot? Current situation: • Researchers funded by Horizon 2020 within 9 specified call areas • Opt out and opt in are possible. • A DMP per dataset As of 2017: • European Cloud Initiative to give Europe a global lead in the data-driven economy. • For new projects open data will become the default option. The pilot will be extended to cover all call areas. Opting out remains possible.
  23. 23. Open, unless… • The EC’s goal is Open Access to research data: as open as possible, as closed as necessary. • Grant Agreement, Art. 29.3, Open Access to research data: • When applicable: explain in the DMP why you need to (partially) opt out.
  24. 24. Timing the DMP • Note that the Commission does NOT require applicants to submit a DMP at the proposal stage (see next slide). • A DMP is therefore NOT part of the evaluation. • DMPs are a deliverable for those in the pilot (due by month 6). • Note that the Commission requires updates. A DMP is a living or “active” document.
  25. 25. Proposal phase Where relevant*, H2020 proposals can include a section on data management which is evaluated under the criterion ‘Impact’. • What types of data will the project generate/collect? • What standards will be applied? • How will this data be exploited &/or shared/made accessible for verification and reuse? • If data cannot be made available, why not? • How will this data be curated and preserved? Your data management policy should reflect the current state of consortium agreements on RDM. * For “Research and Innovation actions” and “Innovation Actions”
  26. 26. Initial DMP (at 6 months) The DMP should address the points below on a dataset by dataset basis: • Dataset reference and name • Data set description • Standards and metadata • Data sharing • Archiving and preservation (including storage and backup) See Annex 1 at: -hi-oa-data-mgt_en.pdf
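The five points of the initial DMP can be kept as one structured record per dataset, which makes it easy to see what is still unanswered when the plan is updated. The sketch below is one possible way to do that in Python; the class name, helper functions and example text are hypothetical and are not part of the Horizon 2020 template.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetEntry:
    """One DMP entry, with one attribute per point listed on the slide."""
    reference_and_name: str
    description: str
    standards_and_metadata: str
    data_sharing: str
    archiving_and_preservation: str


def render(entry):
    """Format the entry as 'Heading: text' lines, in template order."""
    lines = []
    for f in fields(entry):
        heading = f.name.replace("_", " ").capitalize()
        lines.append("{}: {}".format(heading, getattr(entry, f.name)))
    return "\n".join(lines)


def unanswered(entry):
    """List the points that are still blank."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


# Hypothetical entry; the archiving point has not been decided yet.
entry = DatasetEntry(
    reference_and_name="DS1 Example interview data",
    description="Transcripts of semi-structured interviews",
    standards_and_metadata="Dublin Core",
    data_sharing="Open after anonymisation",
    archiving_and_preservation="",
)

print(render(entry))
print(unanswered(entry))
```

Because the DMP is a living document, a blank point is not an error at month 6; listing the blanks simply shows what the next version should address.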
  27. 27. More elaborate DMP Scientific research data should be easily: 1. Discoverable Are the data discoverable and identifiable by a standard mechanism e.g. DOIs? 2. Accessible Are the data accessible and under what conditions e.g. licenses, embargoes? 3. Assessable and intelligible Are the data and software assessable and intelligible to third parties for peer-review? E.g. can judgements be made about their reliability and the competence of those who created them? 4. Useable beyond the original purpose for which it was collected Are the data properly curated and stored together with the minimum software and documentation to be useful by third parties in the long-term? 5. Interoperable to specific quality standards Are the data and software interoperable, allowing data exchange? E.g. were common formats and standards for metadata used? See Annex 2 at: hi-oa-data-mgt_en.pdf
  28. 28. DMPonline A web-based tool to help researchers write DMPs Includes a template for Horizon 2020 Guidance from EUDAT and OpenAIRE being added
  29. 29. How the tool works Click to write a generic DMP Or choose your funder to get their specific template Pick your uni to add local guidance and to get their template if no funder applies Choose any additional optional guidance
  30. 30. EUDAT guidance
  31. 31. OpenAIRE support • Summary on the Open Research Data pilot • Brief guide on developing a DMP • Selecting a data repository • Developing guidance to add to DMPonline • Will be adding an ‘export to Zenodo’ feature in early 2017 to allow DMPs to be published and assigned a DOI
  32. 32. Deliver the DMP and keep it up to date • EC: “Since DMPs are expected to mature during the project, more developed versions of the plan can be included as additional deliverables at later stages. (…) New versions of the DMP should be created whenever important changes to the project occur due to inclusion of new data sets, changes in consortium policies or external factors.” Focus on how you will ensure your data are “FAIR”
  33. 33. Active DMPs • Interested in ways to support this active quality, where “active” is understood as “able to evolve and be monitored”? • Join the RDA’s Active Data Management Plans interest group management-plans.html • And see recordings, slides and notes of the international and interdisciplinary ADMP Workshop 28-30 June 2016
  34. 34. Option: add SSI template for software projects Two templates available for Software Management Plans in DMPonline courtesy of SSI
  36. 36. Example plans • 108 DMPs from the National Endowment for the Humanities grant-applications-2011-2014-now-available • 20+ scientific DMPs submitted to the NSF (USA) provided by UCSD – dmp- samples.html • Example DMP collection from Leeds University • Further examples:
  37. 37. Example: OpenMinTed OpenMinTed aims to create an infrastructure for Text and Data Mining (TDM) of scientific and scholarly content Have adopted their own structure to create a ‘Data and Software Management Plan’
  38. 38. Example: OpenMinTed – Data chapter Six high-level datasets identified: 1. Scholarly publications 2. Language and knowledge resources 3. Services and workflows 4. Automatically and manually generated annotations 5. Consortium publications 6. Metadata Described in a table per dataset (see illustration)
  39. 39. OpenMinTed – Software examples
  40. 40. Example: CAPSELLA CAPSELLA aims to develop ICT solutions for farmers and other actors engaged in agrobiodiversity Devised a questionnaire to collate dataset information from project partners Identified 13 datasets, 6 of which are imported as is, 3 aggregated, 3 transformed and 1 generated
  41. 41. Example dataset record
  42. 42. Data description examples The final dataset will include self-reported demographic and behavioural data from interviews with the subjects and laboratory data from urine specimens provided. From NIH data sharing statements Every two days, we will subsample E. affinis populations growing under our treatment conditions. We will use a microscope to identify the life stage and sex of the subsampled individuals. We will document the information first in a laboratory notebook and then copy the data into an Excel spreadsheet. The Excel spreadsheet will be saved as a comma separated value (.csv) file. From DataOne – E. affinis DMP example
  43. 43. Metadata examples Metadata will be tagged in XML using the Data Documentation Initiative (DDI) format. The codebook will contain information on study design, sampling methodology, fieldwork, variable-level detail, and all information necessary for a secondary analyst to use the data accurately and effectively. From ICPSR Framework for Creating a DMP We will first document our metadata by taking careful notes in the laboratory notebook that refer to specific data files and describe all columns, units, abbreviations, and missing value identifiers. These notes will be transcribed into a .txt document that will be stored with the data file. After all of the data are collected, we will then use EML (Ecological Metadata Language) to digitize our metadata. EML is one of the accepted formats used in ecology, and works well for the types of data we will be producing. We will create these metadata using Morpho software, available through KNB. The metadata will fully describe the data files and the context of the measurements. From DataOne – E. affinis DMP example
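The second metadata example describes notes on columns, units and missing value identifiers that are stored as a .txt document next to the data file. A minimal Python sketch of that idea follows; the column names, units, the `NA` missing-value code and the example rows are invented for illustration and do not come from the E. affinis plan.

```python
import csv
import io

# Hypothetical column descriptions for a small tabular dataset.
COLUMNS = [
    {"name": "sample_date", "unit": "ISO 8601 date", "description": "Day the subsample was taken"},
    {"name": "life_stage", "unit": "category", "description": "Life stage identified under the microscope"},
    {"name": "count", "unit": "individuals", "description": "Number of individuals counted"},
]
MISSING_VALUE = "NA"  # assumed missing value identifier


def render_data_dictionary(columns, missing_value):
    """Build the plain-text notes that travel with the data file."""
    lines = ["Column descriptions"]
    for col in columns:
        lines.append("{} [{}]: {}".format(col["name"], col["unit"], col["description"]))
    lines.append("Missing values are coded as: {}".format(missing_value))
    return "\n".join(lines)


def rows_to_csv(columns, rows, missing_value):
    """Serialise rows as CSV text, filling absent cells with the missing value code."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[col["name"] for col in columns],
        restval=missing_value,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"sample_date": "2016-07-14", "life_stage": "adult", "count": 12},
    {"sample_date": "2016-07-16", "life_stage": "juvenile"},  # count not recorded
]

csv_text = rows_to_csv(COLUMNS, rows, MISSING_VALUE)
notes_text = render_data_dictionary(COLUMNS, MISSING_VALUE)
print(csv_text)
print(notes_text)
```

Driving both the CSV header and the notes from the same column list keeps the two in step, so the documentation cannot silently drift away from the data file it describes.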
  44. 44. Data sharing examples We will make the data and associated documentation available to users under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed. From NIH data sharing statements The videos will be made available via the website (both as streaming media and downloads). HD and SD versions will be provided to accommodate those with lower bandwidth. Videos will also be made available via Vimeo, a platform that is already well used by research students at Bristol. Appropriate metadata will also be provided to the existing Vimeo standard. All video will also be available for download and re-editing by third parties. To facilitate this, Creative Commons licenses will be assigned to each item. In order to ensure this usage is possible, the required permissions will be gathered from participants (using a suitable release form) before recording commences. From University of Bristol Kitchen Cosmology DMP
  45. 45. Example restrictions Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement. From NIH data sharing statements 1. Share data privately within 1 year. Data will be held in Private Repository, but metadata will be public 2. Release data to public within 2 years. Encouraged after one year to release data for public access. 3. Request, in writing, data privacy up to 4 years. Extensions beyond 3 years will only be granted for compelling cases. 4. Consult with creators of private CZO datasets prior to use. PIs are required to seek consent before using private data they can access From Boulder Creek Critical Zone Observatory DMP
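A staged release policy such as the second example (private sharing within 1 year, public release within 2 years, privacy on request up to 4 years) can be turned into concrete dates. The Python sketch below assumes the clock starts on the day the data were collected, which the quoted plan does not state, and it uses the 4-year limit from the policy's own wording; the function names are hypothetical.

```python
from datetime import date


def add_years(day, years):
    """Return the same calendar day a number of years later."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return day.replace(year=day.year + years, day=28)


def release_schedule(collected_on):
    """Deadlines implied by the example policy, counted from collection (an assumption)."""
    return {
        "share_privately_by": add_years(collected_on, 1),
        "release_publicly_by": add_years(collected_on, 2),
        "latest_privacy_extension": add_years(collected_on, 4),
    }


schedule = release_schedule(date(2016, 7, 14))
for milestone, deadline in schedule.items():
    print(milestone, deadline.isoformat())
```

Writing the dates into the DMP, rather than only the durations, makes it easier to monitor the plan as the project progresses.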
  46. 46. Archiving examples The investigators will work with staff at the UKDA to determine what to archive and how long the deposited data should be retained. Future long-term use of the data will be ensured by placing a copy of the data into the repository. From ICPSR Framework for Creating a DMP Data will be provided in file formats considered appropriate for long-term access, as recommended by the UK Data Service. For example, SPSS Portable format and tab-delimited text for qualitative tabular data and RTF and PDF/A for interview transcripts. Appropriate documentation necessary to understand the data will also be provided. Anonymised data will be held for a minimum of 10 years following project completion, in compliance with LSHTM’s Records Retention and Disposal Schedule. Biological samples (output 3) will be deposited with the UK BioBank for future use. From Writing a Wellcome Trust Data Management and Sharing Plan
  47. 47. Share your example DMPs! Send us links to your DMPs We will add them to the DCC list Aim to cover wide range of disciplines and funders share-DMPs
  48. 48. LESSONS AND RESOURCES Image ‘Energy Resources | Energie Quelle’ CC-BY-NC by K. H. Reichert
  49. 49. Tips for writing DMPs • Seek advice - consult and collaborate • Consider good practice for your field • Base plans on available skills & support • Make sure implementation is feasible • Think about things early…
  50. 50. Plan to share data from the outset • Negotiation on licenses and consent agreement may preclude later sharing if not careful • Costings can’t be included retrospectively • Useful to consider data issues at the consortium negotiation stage to make sure potential issues are identified and sorted asap Decisions made early on affect what you can do later
  51. 51. Sharing data: what is meant? With collaborators while research is active Data are mutable (Open) data sharing Data are stable, searchable, citable, clearly licensed
  52. 52. Storing data: what is meant? Storing and backing up files while research is active Likely to be on a networked filestore or hard drive Easy to change or delete Archiving or preserving data in the long-term Likely to be deposited in a digital repository Safeguarded and preserved
  53. 53. Archiving, repositories, ehm? • Horizon 2020 ORD pilot participants are asked to “deposit your data in a research data repository”: a digital archive collecting and displaying datasets and their metadata. • Select a data repository that will preserve your data, metadata and possibly tools in the long term. • It is advisable to contact the repository of your choice when writing the first version of your DMP. • Repositories may offer guidelines for sustainable data formats and metadata standards, as well as support for dealing with sensitive data and licensing.
  54. 54. Where to find a repository? • More information: • Zenodo: •
  55. 55. Searching with
  56. 56. How to select a repository? • Certification as a ‘Trustworthy Digital Repository’ with an explicit ambition to keep the data available in the long term. • Matches your particular data needs: e.g. formats accepted; mixture of Open and Restricted Access. • Provides guidance on how to cite the deposited data. • Gives your submitted dataset a persistent and globally unique identifier for sustainable citations and to link back to particular researchers and grants. Data Seal of Approval nestor seal ISO 16363
  57. 57. Keep everything? For always? • When regenerating data would be cheaper than archiving, don’t archive. Select what data you’ll need and want to retain. • 10 years is often stated in data policies and academic codes, but data can be valuable for ages, in climatology, sociology, health sciences, astronomy, linguistics, … Look beyond minimal retention periods where relevant. • Explain your selection criteria in the DMP. DCC How-to guide: RDNL Selection criteria: management/selecting-research-data/
  58. 58. Licensing research data • Horizon 2020 guidelines point to CC-BY or CC-0 • EUDAT licensing wizard helps you pick a licence for data & software • DCC How-to guide helps you to license data
  59. 59. Metadata standards Metadata Standards Directory • Broad, disciplinary listing of standards and tools • Maintained by RDA group directory Biosharing • A portal of data standards, databases, and policies • Focused on life, environmental and biomedical sciences
  60. 60. • How to develop a DMP • RDM brochure and template material?set_language=en • OpenAIRE guidelines • • ICPSR framework for a DMP ework.html Guidelines on DMPs
  61. 61. • Guidelines on Data Management in Horizon 2020 • Provides summary of requirements • Includes templates for DMPs ef/h2020/grants_manual/hi/oa_pilot/h2020-hi- oa-data-mgt_en.pdf EC guidance
  62. 62. KEY MESSAGES Image “Fishbone” CC BY-NC-ND 2.0 by ttps://
  63. 63. Key messages • Data management is part of good research practice whether you plan to make the data open or not – it benefits you! • The process of planning and reflecting is most important. Think about the desired end result and plan for this. • Approach the DMP in whatever way best fits your project – adopt a different template to suit – add sections / elements e.g. ethics, software – decide whether to describe each dataset in detail – focus effort on datasets you’ll create rather than reuse…
  64. 64. Thanks – any questions? Contact us: Marjan Grootveld: Sarah Jones: Acknowledgements: Thanks to DANS and DCC for reuse of slides, and to the OpenMinTeD and CAPSELLA projects for sharing their Data Management Plans
  65. 65. Please let us know what you thought of the webinar how-to-write-a-data-management-plan