
Open Data - strategies for research data management & impact of best practices


Slides from a webinar given for the NCP Academy, 16 June 2017


  1. 1. Facilitate Open Science Training for European Research • Open Data: Strategies for Research Data Management, and impact of best practices • Martin Donnelly, Digital Curation Centre, University of Edinburgh • NCP Academy Webinar, 16 June 2017
  2. 2. Overview 1. Background 2. Context: Open Access + Open Data (+ Open Source) = Open Science (or Open Research) 3. What is good RDM practice? 4. What are the benefits of good RDM? 5. What are the risks of poor RDM? 6. A step by step approach 7. Do’s and don’ts / Rules of thumb 8. About the FOSTER project 9. About the DCC / contact details
  4. 4. Background (me) • Academic background in cultural heritage computing… • Which led me to work in digital preservation… • Which led to my current involvement in research data management and the broader topic of Open Science • I’ve been involved to various degrees in the development of early DMP resources (DCC Checklist, DMPonline, DMPTool, book chapter on DMP…) • Member of the original FOSTER consortium • Also involved in consultancy, advocacy, events, training etc, e.g. as external expert reviewer of Horizon 2020 DMPs
  6. 6. Open Access + Open Data = Open Science • Openness in research is situated within a context of ever greater transparency, accessibility and accountability • As Open Access to publications became normal (if not yet ubiquitous), the scholarly community turned its attention to the data which underpins the research outputs, and eventually came to consider it a first-class output in its own right. The Open Access (OA) and research data management (RDM) agendas have developed in close connection, as part of a broader trend in research sometimes termed ‘Open Science’ or ‘Open Research’ • “The European Commission is now moving beyond open access towards the more inclusive area of open science. Elements of open science will gradually feed into the shaping of a policy for Responsible Research and Innovation and will contribute to the realisation of the European Research Area and the Innovation Union, the two main flagship initiatives for research and innovation” http://ec.europa.eu/research/swafs/index.cfm?pg=policy&lib=science
  8. 8. Good practice in RDM • RDM is “the active management and appraisal of data over the lifecycle of scholarly and scientific interest” • What sorts of activities? - Planning and describing data-related work before it takes place - Documenting your data so that others can find and understand it - Storing it safely during the project - Depositing it in a trusted archive at the end of the project - Linking publications to the datasets that underpin them
  10. 10. Benefits • IMPACT and LONGEVITY: Open data (and publications) receive more citations, over longer periods • SPEED: The research process becomes faster • ACCESSIBILITY: Interested third parties can (where appropriate) access and build upon publicly-funded research outputs with minimal barriers to access • EFFICIENCY: Data collection can be funded once, and used many times for a variety of purposes • TRANSPARENCY and QUALITY: The evidence that underpins research can be made open for anyone to scrutinise and to use in attempts to replicate findings. This leads to a more robust scholarly record and, for example, reduces academic fraud • DURABILITY: Simply put, fewer important datasets will be lost
  11. 11. Benefits: Impact and Longevity • “In genomics research, a large-scale analysis of data sharing shows that studies that made data available in repositories received 9% more citations, when controlling for other variables; and that whilst self-reuse citation declines steeply after two years, reuse by third parties increases even after six years.” (Piwowar and Vision, 2013) • Van den Eynden, V. and Bishop, L. (2014). Incentives and motivations for sharing research data, a researcher’s perspective. A Knowledge Exchange Report, http://repository.jisc.ac.uk/5662/1/KE_report-incentives-for-sharing-researchdata.pdf
  12. 12. Benefits: Quality • “Data is necessary for reproducibility of computational research” • Victoria Stodden, “Innovation and Growth through Open Access to Scientific Research: Three Ideas for High-Impact Rule Changes” in Litan, Robert E. et al. Rules for Growth: Promoting Innovation and Growth Through Legal Reform. SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, February 8, 2011. http://papers.ssrn.com/abstract=1757982
  13. 13. Baker, M. (2016) “1,500 scientists lift the lid on reproducibility”, Nature, 533:7604, http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
  14. 14. Benefits: Financial • “Conservatively, we estimate that the value of data in Australia’s public research to be at least $1.9 billion and possibly up to $6 billion a year at current levels of expenditure and activity. Research data curation and sharing might be worth at least $1.8 billion and possibly up to $5.5 billion a year, of which perhaps $1.4 billion to $4.9 billion annually is yet to be realized.” • “Open Research Data”, Report to the Australian National Data Service (ANDS), November 2014 - John Houghton, Victoria Institute of Strategic Economic Studies & Nicholas Gruen, Lateral Economics
  15. 15. J. Manyika et al. "Open data: Unlocking innovation and performance with liquid information" McKinsey Global Institute, October 2013
  16. 16. Benefits: Speed • “If we are going to wait five years for data to be released, the Arctic is going to be a very different place.” Bryn Nelson, Nature, 10 Sept 2009 http://www.nature.com/nature/journal/v461/n7261/index.html • Image: https://www.flickr.com/photos/gsfc/7348953774/ - CC-BY
  17. 17. Benefits: Durability • Vines et al. “examined the availability of data from 516 studies between 2 and 22 years old” - The odds of a data set being reported as extant fell by 17% per year - Broken e-mails and obsolete storage devices were the main obstacles to data sharing - Policies mandating data archiving at publication are clearly needed • “The current system of leaving data with authors means that almost all of it is lost over time, unavailable for validation of the original results or to use for entirely new purposes” according to Timothy Vines, one of the researchers. This underscores the need for intentional management of data from all disciplines and opened our conversation on potential roles for librarians in this arena. (“80 Percent of Scientific Data Gone in 20 Years” HNGN, Dec. 20, 2013, http://www.hngn.com/articles/20083/20131220/80-percent-of-scientific-data-gone-in-20-years.htm.) • Vines et al., The Availability of Research Data Declines Rapidly with Article Age, Current Biology (2014), http://dx.doi.org/10.1016/j.cub.2013.11.014
  19. 19. Risks of getting this wrong • Legal – sensitive data is covered by law (and contracts) and must be protected accordingly • Financial – non-compliance with funder policies can lead to reduced access to income streams • Scientific – potential discoveries may be hidden away in drawers or on USB drives • Opportunity cost – reduced visibility for research leads to lost opportunities for collaboration • Quality – the scholarly record becomes less robust • Reputational – responsible data management is increasingly considered a core element of good scholarly practice in the 21st century
  20. 20. Growing momentum and ubiquity… Data management is a part of good research practice. - RCUK Policy and Code of Conduct on the Governance of Good Research Conduct
  22. 22. Step 1. Be clear about who is involved • RDM is a hybrid activity, involving multiple stakeholder groups… • The researchers themselves • Research support personnel • Partners based in other institutions, funders, data centres, commercial partners, etc • No single person does everything, and it makes no sense to duplicate effort or reinvent wheels • Data Management Planning (DMP) underpins and pulls together different strands of data management activities. DMP is the process of planning, describing and communicating the activities carried out during the research lifecycle in order to… • Keep sensitive data safe • Maximise data’s re-use potential • Support longer-term preservation • Data Management Plans are a means of communication, with contemporaries and future re-users alike
  23. 23. Step 2. Write things down • In a data management plan / record • In metadata to describe the data and help others to understand it • In workflows and README files • In version management • In justifying decisions re. access, embargo, selection and appraisal… the list can be very long… Communication is crucial!
  24. 24. Step 3. Don’t try to do everything yourself • See Step 1 ;)
  25. 25. RDM / Open Data in practice: key points 1. Understand your funder’s policies (and perhaps national policy initiatives – see recent SPARC-Europe reports) 2. Create a data management plan (e.g. with DMPonline) 3. Decide which data to preserve (e.g. using the DCC How-To guide and checklist, “Five Steps to Decide what Data to Keep”) 4. Identify a long-term home for your data (e.g. via re3data.org) 5. Link your data to your publications with a persistent identifier (e.g. via DataCite) • N.B. Many archives, including Zenodo, will do this for you 6. Investigate EU infrastructure services and resources
  27. 27. A few do’s and don’ts for RDM • DO have a plan for your data; DON’T make it up as you go along • DO keep backups, and make this easy with automated syncing services like Dropbox, provided your data isn’t too sensitive; DON’T carry the only copy around on a memory card, your laptop, your phone, etc. • DO describe your data as you collect it, which makes it possible for others to interpret it, and for you to do the same a few years down the line; DON’T leave this till the end: the quality of metadata decreases with time, and the best metadata is created at the moment of data capture • DO save your work in open file formats, where possible, and use accepted metadata standards to enable like-with-like comparison; DON’T invent new ‘standards’ where community norms already exist • DO deposit your data in a data centre or repository, and link it to your publications; DON’T be afraid to ask for help, which is available both within your institution and via national / European support organisations
  28. 28. Rules of thumb • Without intervention, data + time = no data • See Vines, above • Prioritise: could anyone die or go to jail? • Legal issues (e.g. protecting vulnerable subjects) are the most important • Storage is not the same as management • Think of data as plants and the servers as a greenhouse • The plants still need to be fed, watered, pruned, etc… and sometimes disposed of • Management is not the same as sharing • Not all data should be shared • Approach: “As open as possible, as closed as necessary”
  30. 30. The project: Facilitate Open Science Training for European Research, http://fosteropenscience.eu • Phase 1 (2014-2016): Spread the Seeds of Open Science and Open Access • Creation of Open Science Taxonomy • 2000+ training materials, categorized in the FOSTER Portal • More than 100 f2f training events in 28 countries and 25 online courses, totalling more than 6300 participants
  31. 31. The project: Facilitate Open Science Training for European Research, http://fosteropenscience.eu • Phase 2 (2017-2019): Let the Flowers of Open Science Bloom • Focus on: • Training for the practical implementation of Open Science (face to face and online) including RDM and Open Data • Developing intermediate/advanced level/discipline-specific training resources in collaboration with three disciplinary communities (and related RIs): Life Sciences (ELIXIR), Social Sciences (CESSDA) and Humanities (DARIAH) • Updating the FOSTER Portal to support moderated learning, badges and gamification • In concrete terms: • 150 new training resources • Over 50 training events (outcome-oriented, providing participants with tangible skills) and 20 e-learning courses • Multi-module Open Science Toolkit • Trainers Network, Open Science Bootcamp, Open Science Training Handbook, and more…
  33. 33. The Digital Curation Centre (DCC) • UK national centre of expertise in digital preservation and data management, est. 2004 • Principal audience is the UK higher education sector, but we increasingly work further afield (continental Europe, North America, South Africa, Asia…) • Provide guidance, training, tools (e.g. DMPonline) and other services on all aspects of research data management and Open Science • Tailored consultancy/training • Organise national and international events and webinars (International Digital Curation Conference, Research Data Management Forum)
  34. 34. Contact details • For more information about the FOSTER project: • Website: www.fosteropenscience.eu • Principal investigator: Eloy Rodrigues (eloy@sdum.uminho.pt) • General enquiries: Gwen Franck (gwen.franck@eifl.net) • Twitter: @fosterscience • My contact details: • Email: martin.donnelly@ed.ac.uk • Twitter: @mkdDCC • Slideshare: http://www.slideshare.net/martindonnelly • This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License.
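
A worked illustration of the durability figure on slide 17, offered as a rough sketch rather than a reproduction of the analysis by Vines et al.: if the odds of a dataset being extant fall by 17% per year, they are multiplied by 0.83 each year. The Python below compounds that factor and converts odds to a probability. The function names and the starting odds of 9:1 are illustrative assumptions, not figures from the paper.

```python
# Sketch: compound the reported 17% yearly fall in the odds that a dataset is extant.
# The 17% figure is from the slide (Vines et al.); everything else is an assumption.

YEARLY_ODDS_FACTOR = 1 - 0.17  # odds are multiplied by 0.83 each year


def odds_after(initial_odds: float, years: int) -> float:
    """Odds of a dataset being extant after `years`, assuming a constant yearly decline."""
    return initial_odds * YEARLY_ODDS_FACTOR ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds, defined as p / (1 - p), back to a probability p."""
    return odds / (1 + odds)


if __name__ == "__main__":
    # ASSUMPTION: starting odds of 9:1 (90% availability) are purely illustrative.
    initial_odds = 9.0
    for years in (2, 10, 20):
        p = odds_to_probability(odds_after(initial_odds, years))
        print(f"after {years:2d} years: probability extant ~ {p:.2f}")
```

Because the decline applies to odds rather than to probabilities, the probability falls slowly while the odds are high and most steeply when the odds are near 1:1.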