
Practical Research Data Management: tools and approaches, pre- and post-award


Presentation given at an ARMA training event, London, 10th February 2016



  1. 1. Practical Research Data Management: tools and approaches, pre- and post-award Martin Donnelly Digital Curation Centre University of Edinburgh Research Data Management: a good practice exchange London, 10 February 2016
  2. 2. The Digital Curation Centre (DCC) • The UK’s national centre of expertise in digital preservation and data management, established 2004 • Provide guidance, training, tools and other services on all aspects of research data management • Organise national and international events and webinars (International Digital Curation Conference, Research Data Management Forum) • Principal audience is the UK higher education sector, but we increasingly work further afield (Europe, North America, South Africa…) • Now offering tailored consultancy/training services
  3. 3. Overview 1. What is Research Data Management, and why? 2. Who’s involved and what does it mean for them? 3. Data management planning: a shared activity (including short interactive exercise) 4. Case study: The Horizon 2020 data pilot a. Pre-project b. In-project c. Post-project 5. Links and resources
  5. 5. The old way of doing research 1. Researcher collects data (information) 2. Researcher interprets/synthesises data 3. Researcher writes paper based on data 4. Paper is published (and preserved) 5. The data is left to benign neglect, and eventually ceases to be accessible
  6. 6. Without intervention, data + time = no data Vines et al. “examined the availability of data from 516 studies between 2 and 22 years old” - The odds of a data set being reported as extant fell by 17% per year - Broken e-mails and obsolete storage devices were the main obstacles to data sharing - Policies mandating data archiving at publication are clearly needed “The current system of leaving data with authors means that almost all of it is lost over time, unavailable for validation of the original results or to use for entirely new purposes” according to Timothy Vines, one of the researchers. This underscores the need for intentional management of data from all disciplines and opened our conversation on potential roles for librarians in this arena. (“80 Percent of Scientific Data Gone in 20 Years” HNGN, Dec. 20, 2013, http://www.hngn.com/articles/20083/20131220/80-percent-of-scientific-data-gone-in-20-years.htm.) Vines et al., The Availability of Research Data Declines Rapidly with Article Age, Current Biology (2014), http://dx.doi.org/10.1016/j.cub.2013.11.014
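
To make the 17% figure on slide 6 concrete, here is a minimal sketch of the compounding arithmetic. It assumes the decline in odds is constant from year to year. The function names and the 90% starting probability are illustrative assumptions, not values taken from Vines et al.

```python
def odds_multiplier(years, yearly_drop=0.17):
    """Factor by which the odds of a data set being extant shrink after `years`."""
    return (1.0 - yearly_drop) ** years


def probability_after(initial_probability, years, yearly_drop=0.17):
    """Convert a starting probability (below 1.0) to odds, apply the decline, convert back."""
    initial_odds = initial_probability / (1.0 - initial_probability)
    odds = initial_odds * odds_multiplier(years, yearly_drop)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # 0.9 is a hypothetical starting probability, chosen only for illustration.
    for years in (2, 10, 20):
        print(years, round(odds_multiplier(years), 3), round(probability_after(0.9, years), 3))
```

Under these assumptions the odds after 20 years are roughly 2.4% of their starting value, which is why the decline looks so steep even though 17% per year sounds modest.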
  7. 7. The new way of doing research • The DataONE lifecycle model: Plan → Collect → Assure → Describe → Preserve → Discover → Integrate → Analyze • DEPOSIT … and RE-USE
  8. 8. What is RDM? “the active management and appraisal of data over the lifecycle of scholarly and scientific interest” What sorts of activities? - Planning and describing data-related work before it takes place - Documenting your data so that others can find and understand it - Storing it safely during the project - Depositing it in a trusted archive at the end of the project - Linking publications to the datasets that underpin them
  9. 9. (Aside: from data to research objects?) • ‘Research object’ is a term that is gaining in popularity, not least in the humanities where the relevance of the term ‘data’ is not always recognised… • Research objects can comprise any supporting material which underpins or otherwise enriches the (written) outputs of research • Data (numeric, written, audiovisual…) • Software code and algorithms • Workflows and methodologies • Slides, logs, lab books, sketchbooks, notebooks, etc. • See http://www.researchobject.org/ for more info
  10. 10. Helicopter view: benefits of RDM • SPEED: The research process becomes faster • EFFICIENCY: Data collection can be funded once, and used many times for a variety of purposes • ACCESSIBILITY: Interested third parties can (where appropriate) access and build upon publicly-funded research resources with minimal barriers to access • IMPACT and LONGEVITY: Publications with open data receive more citations, over longer periods • TRANSPARENCY and QUALITY: The evidence that underpins research can be made open for anyone to scrutinise, and attempt to replicate findings. This leads to a more robust scholarly record
  11. 11. Growing momentum and ubiquity… Data management is a part of good research practice. - RCUK Policy and Code of Conduct on the Governance of Good Research Conduct
  12. 12. (Aside: To share or not to share?) • Data management and data sharing are not the same thing! • Sensitive data, whether commercially or ethically sensitive, must be protected in line with the laws of the land, research funder policies, and institutional ethical approval processes • Some sensitive data can be shared after certain kinds of processing, e.g. anonymisation, pseudonymisation, aggregation, etc. • Other datasets may be subject to rigorous access / clearance controls, or embargos • The real experts in sensitive data are the UK Data Archive, based at the University of Essex
  14. 14. Who and how? • RDM is a hybrid activity, involving multiple stakeholder groups… • The researchers themselves • Research support personnel • Partners based in other institutions, commercial partners, etc • Other stakeholders in the modern research process include governments, public services, and the general public (who fund lots of research via their taxes)
  15. 15. What does it mean in practice? (i) • For research institutions, there are three principal areas of focus… 1. Developing and integrating technical infrastructure (repositories/CRIS systems, storage space, data catalogues and registries, etc) 2. Developing human infrastructure (creating policies, assessing current data management capabilities, identifying areas of good practice, DMP templates, tailored training and guidance materials…) 3. Developing business plans for sustainable service • Many have formed cross-function (hybrid) working groups, advisory groups, task forces, etc http://blog.soton.ac.uk/keepit/2010/01/28/aida-and-institutional-wobbliness/
  16. 16. What does it mean in practice? (ii) • For researchers it is… • A disruption to previous working processes • Additional expectations / requirements from the funders (and sometimes home institutions) • But! It provides opportunities for new types of investigation • And leads to a more robust scholarly record
  17. 17. What does it mean in practice? (iii) • Research administrators and other support professionals: • Need to understand the key elements in the process, as well as roles and responsibilities • Should understand the key points of the funders’ requirements • Should expect questions from researchers… and perhaps some resistance!
  19. 19. Data management planning (DMP) • Data management planning is the process of planning, describing and communicating activities carried out during the research lifecycle in order to… • Keep sensitive data safe • Maximise data’s reuse potential • Support longer-term preservation • Data management planning underpins and pulls together different strands of data management activities, often across multiple project partners • A data management plan (DMP) is usually a short document detailing specifics of the data that will be created during a research project, together with information on how it can be accessed and utilised • Research funders often ask for DMPs to be submitted alongside grant applications and/or developed over the course of the research project. (HEIs are increasingly asking their researchers to do this too…)
  20. 20. Why plan? • It is intuitive that planned activities stand a better chance of meeting their goals than unplanned ones. The process of planning is also a process of communication, increasingly important in interdisciplinary/multi-partner research. Collaboration will be more harmonious if project partners (in industry, other universities, other countries…) are in accord • In terms of data security, if there are good reasons not to publish/share data, in whole or in part, you will be on more solid ground with funders if you flag these up early in the process • DMP also provides an ideal opportunity to engender good practice with regard to (e.g.) file formats, metadata standards, storage and risk management practices, leading to greater longevity of data, and improved quality standards…
  21. 21. (Aside: limits of data management planning) What can a plan not do? It can’t do the work for you. The map is not the territory (Korzybski) or Chalk’s no shears (Scottish saying) It is important to remember that the human challenges in data management are often more difficult to meet than the technological ones. Communication is vital!
  22. 22. What does a data management plan look like? It is usually a couple of pages outlining: • how data will be captured/created • how it will be documented • who will be able to access it • where it will be stored • how it will be backed up, and • whether (and how) it will be shared and preserved long-term • etc DMPs are often submitted as part of funding applications – and requirements vary from funder to funder – but they are useful whenever researchers are creating (or reusing) data, especially where the research involves multiple partners, countries, etc…
  23. 23. Roles and responsibilities Like RDM in general, data management planning is a hybrid activity, involving multiple stakeholder groups… • The principal investigator (usually ultimately responsible for data) • Research assistants (may be more involved in day-to-day data management) • The institution’s funding office (may have a compliance role) • Library/IT/Legal (The library may issue PIDs, or liaise with an external service who do this, e.g. DataCite.) • Partners based in other institutions • Commercial partners • etc
  24. 24. Interactive exercise: data management planning • Select one of the DMP Checklist headings (left), and brainstorm all the internal and external stakeholders you think might be involved (and how/why) – be as specific as you like • Remember to consider the different stages of research: pre-award, in-project, post-project • We’ll have a short reporting/discussion session at the end • http://www.dcc.ac.uk/resources/data-management-plans/checklist §1. Administrative Data [basic details about the project] §2. Data Collection • What data will you collect or create? • How will the data be collected or created? §3. Documentation and Metadata • What documentation and metadata will accompany the data? §4. Ethics and Legal Compliance • How will you manage any ethical issues? • How will you manage copyright and Intellectual Property Rights (IPR) issues? §5. Storage and Backup • How will the data be stored and backed up during the research? • How will you manage access and security? §6. Selection and Preservation • Which data should be retained, shared, and/or preserved? • What is the long-term preservation plan for the dataset? §7. Data Sharing • How will you share the data? • Are any restrictions on data sharing required? §8. Responsibilities and Resources • Who will be responsible for data management? • What resources will you require to deliver your plan?
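
The checklist on slide 24 can also be treated as a simple data structure, which is one way a team might track which questions a draft plan still has to answer. The sketch below is illustrative only: the dictionary layout and the `unanswered` helper are assumptions made for this example, not part of the DCC Checklist or of DMPonline. The section headings and questions are copied from the slide; §1 (Administrative Data) is omitted because the slide lists no questions for it.

```python
# Section headings and questions as listed on the slide (sections 2 to 8).
DMP_CHECKLIST = {
    "Data Collection": [
        "What data will you collect or create?",
        "How will the data be collected or created?",
    ],
    "Documentation and Metadata": [
        "What documentation and metadata will accompany the data?",
    ],
    "Ethics and Legal Compliance": [
        "How will you manage any ethical issues?",
        "How will you manage copyright and Intellectual Property Rights (IPR) issues?",
    ],
    "Storage and Backup": [
        "How will the data be stored and backed up during the research?",
        "How will you manage access and security?",
    ],
    "Selection and Preservation": [
        "Which data should be retained, shared, and/or preserved?",
        "What is the long-term preservation plan for the dataset?",
    ],
    "Data Sharing": [
        "How will you share the data?",
        "Are any restrictions on data sharing required?",
    ],
    "Responsibilities and Resources": [
        "Who will be responsible for data management?",
        "What resources will you require to deliver your plan?",
    ],
}


def unanswered(plan):
    """Return (section, question) pairs that have no non-blank answer in `plan`.

    `plan` is a hypothetical dict mapping question text to the answer text.
    """
    missing = []
    for section, questions in DMP_CHECKLIST.items():
        for question in questions:
            if not plan.get(question, "").strip():
                missing.append((section, question))
    return missing
```

For example, `unanswered({"How will you share the data?": "Via an institutional repository"})` would list every question except that one, grouped by the section it belongs to.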
  25. 25. Data management planning exercise: outcomes • It’s not necessary – or even desirable – for every researcher (or research administrator, or librarian, or IT person…) to become an expert in every aspect of data management • Universities have an increasing obligation to provide infrastructure and support • Specific expertise may be available from the research office, library, IT, departmental support staff, legal services etc, as well as academic colleagues with particular areas of expertise • The trick is to make this appear seamless - communication and coordination are ever more important
  27. 27. Case study: The Horizon 2020 data pilot • Horizon 2020 includes a data management (planning) pilot… • http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf • Proposals covered • “Innovation actions” and “Research and innovation actions” • DMP contents • Data types; Standards used; Sharing/making available; Curation and preservation • Multi-phase approach • Initial DMP due within first 6 months • Mid-term DMP • Final review stage DMP • There are opt-out conditions. A detailed description and scope of the Open Research Data Pilot requirements is provided on the Participants’ Portal
  28. 28. The Horizon 2020 DMP requirements (i) v1: Within Six Months For each data set specify the following: • Data set reference and name • Data set description • Standards and metadata • Data sharing • Archiving and preservation (including storage and backup) .docx output from DMPonline
  29. 29. The Horizon 2020 DMP requirements (i) v1: Within Six Months For each data set specify the following: • Data set reference and name • Data set description • Standards and metadata • Data sharing • Archiving and preservation (including storage and backup) Tools/resources • DOI or other identifier • Example plans • RDA | Metadata Directory • UKDA resources • Data archive, e.g. Zenodo or a funding council mandated archive
  31. 31. The Horizon 2020 DMP requirements (ii) v2 and v3: Mid-Term and Final Reviews Scientific research data should be easily: 1. Discoverable • Are the data and associated software produced and/or used in the project discoverable (and readily located), identifiable by means of a standard identification mechanism (e.g. Digital Object Identifier)? 2. Accessible • Are the data and associated software produced and/or used in the project accessible and in what modalities, scope, licenses? 3. Assessable and intelligible • Are the data and associated software produced and/or used in the project assessable for and intelligible to third parties in contexts such as scientific scrutiny and peer review? Continues… Tools/resources 1. DataCite / Zenodo 2. DCC guidance on licensing data 3. Software Sustainability Institute
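
As a small illustration of the “standard identification mechanism” mentioned under Discoverable, the sketch below checks whether a string looks like a DOI and turns it into a resolvable link through the doi.org resolver. The regular expression is a loose, commonly used heuristic and an assumption of this example, not an official validation rule. The function names are hypothetical, and a successful match does not prove that the DOI has actually been registered.

```python
import re

# Loose syntactic check only: "10.", a numeric registrant prefix, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(value):
    """Return True if `value` has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(value.strip()))


def resolver_url(doi):
    """Build a link that the doi.org resolver can redirect to the landing page."""
    if not looks_like_doi(doi):
        raise ValueError("not a DOI-shaped string: %r" % doi)
    return "https://doi.org/" + doi.strip()
```

Using the DOI of the Vines et al. article cited earlier in the talk, `resolver_url("10.1016/j.cub.2013.11.014")` gives `https://doi.org/10.1016/j.cub.2013.11.014`.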
  33. 33. The Horizon 2020 DMP requirements (iii) v2 and v3: Mid-Term and Final Reviews Scientific research data should be easily: 4. Usable beyond the original purpose for which it was collected • Are the data and associated software produced and/or used in the project useable by third parties even long time after the collection of the data? 5. Interoperable to specific quality standards • Are the data and associated software produced and/or used in the project interoperable allowing data exchange between researchers, institutions, organisations, countries, etc? Tools/resources 4. Guidance on data description, e.g. LSHTM 5. Using open formats, e.g. Five Stars of Open Data
  35. 35. Resources mentioned in the talk • DataCite: https://www.datacite.org/ • Zenodo: https://zenodo.org/ • DCC guidance on licensing data: http://www.dcc.ac.uk/resources/how-guides/license-research-data • Software Sustainability Institute: http://www.software.ac.uk/ • LSHTM guidance on describing data: http://www.lshtm.ac.uk/research/researchdataman/describe/describe_data.html • Five Star Open Data: http://5stardata.info/en/#costs-benefits
  36. 36. DCC resources on data management planning • Guidance, e.g. “How-To Develop a Data Management and Sharing Plan” • DCC Checklist for a Data Management Plan: http://www.dcc.ac.uk/resources/data-management-plans/checklist • DMPonline tool: https://dmponline.dcc.ac.uk/ • Links to all DCC DMP resources via http://www.dcc.ac.uk/resources/data-management-plans
  37. 37. DMPonline: overview • Helps researchers write DMPs • Provides funder questions and guidance • Includes a template DMP for Horizon 2020 • Provides help from universities • Examples and suggested answers • Free to use • Mature (v1 launched April 2010) • Code is Open Source (on GitHub) https://dmponline.dcc.ac.uk
  38. 38. Registration Sign up with your email address, organisation and password Select ‘other organisation’ if yours is not listed
  39. 39. Creating a plan Select funder (if any) Select organisation for additional questions and guidance Select other sources of guidance
  40. 40. Plan details: summary Summary of the sections and questions in your DMP
  41. 41. Answering questions Notes who has answered the question and when Progress bar updates how many questions remain
  42. 42. Sharing plans Allow colleagues to read-only, read-write, or become co-owners
  43. 43. Co-writing DMPs Sections are locked for editing when they’re being worked on by colleagues
  44. 44. Exporting DMPs Can export as plain text, docx, PDF, html...
  45. 45. Institutions can customise the tool by… • Adding templates • Adding custom guidance • Providing example or suggested answers • Monitoring usage within their organisation • Offering non-English language versions www.dcc.ac.uk/news/customising-dmponline-admin-interface-launches
  46. 46. More information Customising DMPonline www.dcc.ac.uk/news/customising-dmponline-admin-interface-launches http://www.screenr.com/PJHN Get the code, amend it, run a local instance, flag issues, request features... https://github.com/DigitalCurationCentre/DMPonline_v4
  47. 47. Sample plans, and last words of advice • There are lots of data management plans available on the Web. The DCC provides links to a number of sample DMPs via http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples • The US National Endowment for the Humanities (NEH) recently released over 100 of its DMPs. These are available via: http://www.neh.gov/divisions/odh/grant-news/data-management-plans-successful-grant-applications-2011-2014-now-available • Remember that there is no magic bullet, and no one-size-fits-all solution! Much of the benefit of data management planning lies in the process of planning, above and beyond the plans produced at the end of the process • DMP is above all a communication activity, between the data collectors and their contemporaries (project partners and funders) and with future data re-users…
  48. 48. Thank you – any questions? • For more information about the DCC: • Website: www.dcc.ac.uk • Director: Kevin Ashley (kevin.ashley@ed.ac.uk) • General enquiries: Lorna Brown (lorna.brown@ed.ac.uk) • Twitter: @digitalcuration • My contact details: • Email: martin.donnelly@ed.ac.uk • Twitter: @mkdDCC • Slideshare: http://www.slideshare.net/martindonnelly This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License.
