
Preparing your own data for future re-use: data management and the FAIR principles



Sheffield Hallam, 5 April 2017

Published in: Education


  1. 1. Preparing your own data for future re-use: data management and the FAIR principles. Martin Donnelly, Digital Curation Centre, University of Edinburgh. Sheffield Hallam University, 5 April 2017
  2. 2. The Digital Curation Centre (DCC) • UK national centre of expertise in digital preservation and data management, established 2004 • Principal audience is the UK higher education sector, but we increasingly work further afield (continental Europe, North America, South Africa, Asia…) • Provide guidance, training, tools (e.g. DMPonline) and other services on all aspects of research data management and Open Science • Now offering tailored consultancy/training • Organise national and international events and webinars (International Digital Curation Conference, Research Data Management Forum)
  3. 3. Contents 1. Overview 2. Recap: why does data need managing? 3. The FAIR principles 4. Principles into practice: data in Horizon 2020 5. FAIR data, step-by-step 6. References/resources
  4. 4. Overview • As Open Access to publications became normal (if not ubiquitous), scholarly attention turned to the data underpinning the written outputs of research, and it is now considered a first-class research output in its own right. The development of OA and research data management (RDM) are closely linked as part of a broader trend in research, sometimes termed ‘Open Science’ or ‘Open Research’ • “The European Commission is now moving beyond open access towards the more inclusive area of open science. Elements of open science will gradually feed into the shaping of a policy for Responsible Research and Innovation and will contribute to the realisation of the European Research Area and the Innovation Union, the two main flagship initiatives for research and innovation” http://ec.europa.eu/research/swafs/index.cfm?pg=policy&lib=science • The EC’s data expectations are based on the framework of the FAIR principles, which state that data (and metadata) should ideally be Findable, Accessible, Interoperable and Reusable
  6. 6. The old way of doing research 1. Researcher collects data (information) 2. Researcher interprets/synthesises data 3. Researcher writes paper based on data 4. Paper is published (and preserved) 5. Data is left to benign neglect, and eventually ceases to be accessible
  7. 7. Without intervention, data + time = no data. Vines et al. “examined the availability of data from 516 studies between 2 and 22 years old”: - The odds of a data set being reported as extant fell by 17% per year - Broken e-mails and obsolete storage devices were the main obstacles to data sharing - Policies mandating data archiving at publication are clearly needed. “The current system of leaving data with authors means that almost all of it is lost over time, unavailable for validation of the original results or to use for entirely new purposes” according to Timothy Vines, one of the researchers. This underscores the need for intentional management of data from all disciplines and opened our conversation on potential roles for librarians in this arena. (“80 Percent of Scientific Data Gone in 20 Years”, HNGN, Dec. 20, 2013, http://www.hngn.com/articles/20083/20131220/80-percent-of-scientific-data-gone-in-20-years.htm) Vines et al., “The Availability of Research Data Declines Rapidly with Article Age”, Current Biology (2014), http://dx.doi.org/10.1016/j.cub.2013.11.014
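To give a feel for what a 17% yearly fall in the odds means over time, here is a minimal sketch that compounds that decline. Only the 17% figure comes from the slide; the starting probability of 0.9 and all function names are assumptions chosen purely for illustration, and a constant yearly decline is a simplification.

```python
def odds_from_probability(p):
    """Convert a probability (0 <= p < 1) into odds."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)


def odds_after(years, initial_odds, yearly_decline=0.17):
    """Odds that a data set is still extant after `years`.

    Assumes the odds fall by the same fraction every year;
    0.17 is the per-year figure quoted on the slide.
    """
    return initial_odds * (1.0 - yearly_decline) ** years


if __name__ == "__main__":
    p0 = 0.9  # hypothetical starting probability, not from Vines et al.
    for years in (2, 5, 10, 20):
        odds = odds_after(years, odds_from_probability(p0))
        print(f"{years:>2} years: probability {probability_from_odds(odds):.2f}")
```

Note that the decline applies to the odds, not to the probability itself, which is why the sketch converts between the two.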
  8. 8. Baker, M. (2016) “1,500 scientists lift the lid on reproducibility”, Nature, 533:7604, http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
  9. 9. (Aside: from data to research objects?) • ‘Research object’ is a term that is gaining in popularity, not least in the humanities where the relevance of the term ‘data’ is not always recognised… • Research objects can comprise any supporting material which underpins or otherwise enriches the (written) outputs of research • Data (numeric, written, audiovisual….) • Software code and algorithms • Workflows and methodologies • Slides, logs, lab books, sketchbooks, notebooks, etc • See http://www.researchobject.org/ for more info
  10. 10. The new way of doing research [Figure: the DataONE lifecycle model: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze; plus DEPOSIT …and RE-USE]
  11. 11. N.B. other models are available… Ellyn Montgomery, US Geological Survey
  12. 12. Data sharing isn’t entirely new… from Philosophical Transactions of the Royal Society, (MDCCCLXI) (or 1861 if you’d prefer)
  13. 13. …but what’s “normal” is shifting Data management is a part of good research practice. - RCUK Policy and Code of Conduct on the Governance of Good Research Conduct
  14. 14. The benefits of Open / managed data • SPEED: The research process becomes faster • EFFICIENCY: Data collection can be funded once, and used many times for a variety of purposes • ACCESSIBILITY: Interested third parties can (where appropriate) access and build upon publicly-funded research resources with minimal barriers to access • IMPACT and LONGEVITY: Open publications and data receive more citations, over longer periods (see for example recent DCC/SPARC-Europe paper, “The Open Data Citation Advantage”) • TRANSPARENCY and QUALITY: The evidence that underpins research can be made open, so that anyone can scrutinise it and attempt to replicate the findings. This leads to a more robust scholarly record • SECURITY: Not all data should be made available to everyone. Careful management reduces the risk of inappropriate disclosure.
  15. 15. MANAGEMENT ≠ SHARING
  16. 16. Open and/or Managed? • Taking a managed and planned approach to research is not the same as making everything open to everyone • Research data management serves one of two purposes: • To ensure that data remains accessible and understandable; or • To ensure that data is not accessible or understandable (in its raw state, by the wrong people, or at the wrong time) • Which of these applies will depend on the nature of the research. It is increasingly expected that publications and data (and software, algorithms, workflows etc) will be made Open by default, unless… • There is an ethical reason to restrict access • There is a public safety reason to restrict access • There is a commercial or contractual reason to restrict access • In some cases, data can be made partially-open (i.e. anonymised, aggregated or redacted) in order to protect these interests
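The “Open by default, unless…” reasoning on this slide can be sketched as a small decision function. This is an illustration only: the reason labels, the `AccessLevel` names and the `can_be_protected` flag are hypothetical, and real decisions rest on ethical review and funder policy rather than on a lookup.

```python
from enum import Enum


class AccessLevel(Enum):
    OPEN = "open"
    PARTIALLY_OPEN = "partially open"  # anonymised, aggregated or redacted
    RESTRICTED = "restricted"


# The three grounds for restriction listed on the slide (labels are hypothetical).
RESTRICTION_REASONS = {"ethical", "public_safety", "commercial_or_contractual"}


def decide_access(reasons, can_be_protected=False):
    """Return an access level for a data set.

    reasons: iterable of labels from RESTRICTION_REASONS that apply.
    can_be_protected: True if anonymising, aggregating or redacting
    the data would be enough to protect the interests at stake.
    """
    reasons = set(reasons)
    unknown = reasons - RESTRICTION_REASONS
    if unknown:
        raise ValueError(f"unrecognised reason(s): {sorted(unknown)}")
    if not reasons:
        return AccessLevel.OPEN  # open by default
    if can_be_protected:
        return AccessLevel.PARTIALLY_OPEN
    return AccessLevel.RESTRICTED


if __name__ == "__main__":
    print(decide_access([]))
    print(decide_access(["ethical"], can_be_protected=True))
    print(decide_access(["commercial_or_contractual"]))
```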
  17. 17. Unanticipated data re-use. Ships’ log books build picture of climate change, 14 October 2010: “You can now help scientists understand the climate of the past and unearth new historical information by revisiting the voyages of First World War Royal Navy warships. Visitors to OldWeather.org will be able to retrace the routes taken by any of 280 Royal Navy ships. These include historic vessels such as HMS Caroline, the last survivor of the 1916 Battle of Jutland still afloat. By transcribing information about the weather and interesting events from images of each ship's logbook, web volunteers will help scientists build a more accurate picture of how our climate has changed over the last century.” http://www.nationalarchives.gov.uk/news/503.htm [Images: detail from Royal Navy Recruitment poster, RNVR Signals branch, 1917 (Catalogue reference: ADM 1/8331); Endeavour, 1768-71 (Captain Cook); HMS Beagle, 1830-34; HMS Torch, 1918]
  18. 18. Unanticipated data mis-use? Controversial FOI requests to… - University of East Anglia - Queen’s University Belfast - University of Stirling
  20. 20. The FAIR Data Principles (0/4) One of the grand challenges of data-intensive science is to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, integration and analysis of, task-appropriate scientific data and their associated algorithms and workflows. FAIR is a set of guiding principles to make data • Findable • Accessible • Interoperable, and • Re-usable
  21. 21. The FAIR Data Principles (1/4) To be Findable: F1. (meta)data are assigned a globally unique and eternally persistent identifier. F2. data are described with rich metadata. F3. (meta)data are registered or indexed in a searchable resource. F4. metadata specify the data identifier.
  22. 22. The FAIR Data Principles (2/4) To be Accessible: A1. (meta)data are retrievable by their identifier using a standardized communications protocol. A1.1. the protocol is open, free, and universally implementable. A1.2. the protocol allows for an authentication and authorization procedure, where necessary. A2. metadata are accessible, even when the data are no longer available.
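One common way to satisfy A1 is a DOI resolved over HTTPS. The sketch below builds the resolver URL for the DOI of the Vines et al. paper cited earlier in this deck, and prepares (but does not send) a request for machine-readable metadata. The function names are hypothetical, and whether a given DOI's registration agency honours the `Accept` media type used here is an assumption to check against that agency's documentation.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Turn a bare DOI into a resolvable HTTPS URL."""
    doi = doi.strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep "/" unescaped; percent-encode anything unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


def build_metadata_request(doi, media_type="application/vnd.citationstyles.csl+json"):
    """Prepare an HTTP request asking for metadata rather than the landing page.

    The request is only constructed here, never sent.
    """
    return Request(doi_to_url(doi), headers={"Accept": media_type})


if __name__ == "__main__":
    request = build_metadata_request("10.1016/j.cub.2013.11.014")
    print(request.full_url)
    print(request.get_header("Accept"))
```

The identifier stays the same even if the publisher moves the paper; only the resolver's redirect target changes, which is the point of a persistent identifier.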
  23. 23. The FAIR Data Principles (3/4) To be Interoperable: I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. I2. (meta)data use vocabularies that follow FAIR principles. I3. (meta)data include qualified references to other (meta)data.
  24. 24. The FAIR Data Principles (4/4) To be Re-usable: R1. (meta)data have a plurality of accurate and relevant attributes. R1.1. (meta)data are released with a clear and accessible data usage license. R1.2. (meta)data are associated with their provenance. R1.3. (meta)data meet domain-relevant community standards.
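A few of the principles above lend themselves to a simple mechanical check on a metadata record. The sketch below is a minimal illustration under stated assumptions: the field names and the sample record are hypothetical and do not follow any particular metadata standard, and a non-empty field is only a necessary condition for meeting a principle, not proof that it is met.

```python
# Hypothetical mapping from a FAIR principle to the record field that evidences it.
REQUIRED_FIELDS = {
    "F1": "identifier",    # globally unique, persistent identifier
    "F2": "description",   # rich descriptive metadata
    "R1.1": "license",     # clear and accessible usage license
    "R1.2": "provenance",  # where the data came from
}


def missing_principles(record):
    """Return the principles whose corresponding field is absent or blank."""
    missing = []
    for principle, field in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(principle)
    return missing


if __name__ == "__main__":
    # Hypothetical record; the identifier is a placeholder, not a real dataset.
    record = {
        "identifier": "https://example.org/dataset/1",
        "description": "Interview transcripts, anonymised",
        "license": "",
    }
    print(missing_principles(record))
```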
  26. 26. FAIR in practice: European data policy • The EC is currently midway through an extended pilot for Horizon 2020. Other projects can participate voluntarily, and opting in has been more popular than opting out • The pilot applies, at a minimum, to research data underlying publications, plus any other data as decided by the project • Participants must: • Create and maintain a DMP as a project deliverable • Deposit data in a repository • Make it possible for others to access, mine, exploit and reuse the data • Share information on the tools needed …unless there are compelling reasons not to do so. (And these reasons should be recorded in the DMP.) “As open as possible, as closed as necessary”
  27. 27. Horizon 2020 – extended pilot (i) The DMP should include information on: • the handling of research data during and after the end of the project • what data will be collected, processed and/or generated • which methodologies and standards will be applied • whether data will be shared/made open access, and • how data will be curated and preserved (including after the end of the project)
  28. 28. Horizon 2020 – extended pilot (ii) • Once funding is approved and the project gets underway, the first version of the DMP is submitted (as a deliverable) within the first 6 months • The EC provides a template (in the Guidelines), use of which is recommended but voluntary • The DMP needs to be updated over the course of the project whenever significant changes arise (e.g. new datasets created; changes in consortium policies; changes in consortium members, etc.) • The DMP should be updated for each periodic evaluation/assessment of the project, and at minimum in time for the final review.
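The “within the first 6 months” rule can be turned into a date with a little calendar arithmetic. This is a rough sketch: it assumes the period is counted in calendar months from the project start date, and the function names are hypothetical. The binding deadline is whatever the grant agreement states.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`.

    If the target month is shorter than the start day (e.g. 31 August
    plus 6 months), the day is clamped to the end of the target month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def first_dmp_deadline(project_start):
    """Latest date for the first DMP, assuming 6 calendar months from start."""
    return add_months(project_start, 6)


if __name__ == "__main__":
    print(first_dmp_deadline(date(2017, 4, 5)))
```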
  30. 30. Making your data FAIR, step-by-step 1. Understand your funder’s policies (e.g. the EC Guidelines) 2. Create a data management plan (e.g. with DMPonline) 3. Decide which data to preserve using the DCC How-To guide and checklist, “Five Steps to Decide what Data to Keep” 4. Identify a long-term home for your data (e.g. via re3data.org) 5. Link your data to your publications with a persistent identifier (e.g. via DataCite) • N.B. Many repositories will do this for you 6. Investigate infrastructure services and resources, e.g. EUDAT, OpenAIRE, FOSTER, etc…
  31. 31. Tools and resources
  33. 33. References/resources • FORCE11, “Guiding principles for findable, accessible, interoperable and re-usable data publishing”, https://www.force11.org/fairprinciples • Guidelines on FAIR Data Management in Horizon 2020, v3.0, 26 July 2016, http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf • DMPonline, https://dmponline.dcc.ac.uk/ • DCC guide, “Five Steps to Decide what Data to Keep” (2014), http://www.dcc.ac.uk/resources/how-guides/five-steps-decide-what-data-keep • DCC/SPARC-Europe report, “The Open Data Citation Advantage”, http://sparceurope.org/open-data-citation-advantage/ • Registry of Research Data Repositories, http://www.re3data.org/ • DataCite, https://www.datacite.org/ • Workshop materials from “How EUDAT services support FAIR data” at IDCC 2017, Edinburgh, https://www.eudat.eu/events/trainings/eudat-workshop-how-eudat-services-support-fair-data-at-idcc-2017-edinburgh • OpenAIRE, https://www.openaire.eu/ • FOSTER, https://www.fosteropenscience.eu/
  34. 34. Thank you: any questions? • For more information about the DCC: • Website: www.dcc.ac.uk • Director: Kevin Ashley (kevin.ashley@ed.ac.uk) • General enquiries: Alex Delipalta (alexandra.delipalta@ed.ac.uk) • Twitter: @digitalcuration • My contact details: • Email: martin.donnelly@ed.ac.uk • Twitter: @mkdDCC • Slideshare: http://www.slideshare.net/martindonnelly This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License.
