Research Data Management: a gentle introduction


Slides from a workshop at University of Huddersfield, June 2014

Published in: Education, Technology
  • First cohort of institutional engagements, 2011-2013
  • Painting in broad strokes here, of course…
  • Share = deposit, link, publish, etc
  • Will unpack these over the course of the presentation, but first
  • Think about what you do in your own research
  • …and as the worlds of business and academia continue to merge… Interest in data is not limited to academia: the business world sees data as a valuable and potentially lucrative resource, a real game-changer…
  • Earliest academic scientific journal is Journal des sçavans, published on 5 Jan 1665
  • We can now publish and re-use data in a much more structured way, automating the process and crunching more data via computers than we could when it was only available on paper.
  • https://www.youtube.com/watch?v=n603rEnEGXA
  • Philip Morris International vs University of Stirling (2011) - another example of unanticipated data re-use!

    There’s a delicate balance between the rights of researchers, of human research subjects, of funders, and other interested stakeholders to enable or prevent access to research data…
  • So, those are the benefits, but there are still barriers to this utopia…
  • Forming cross-function (hybrid) working groups, advisory groups, task forces, etc
  • IT departments in particular tend to think of data management as primarily a hardware/technical problem. It’s not – the human side is bigger
  • The two main goals of data management are (1) to make data more widely accessible, and (2) to prevent access to sensitive data
  • 2. Prioritise based on relationship with publications, e.g. data that underpins the scientific record (cf. Sarah Callaghan, PREPARDE)
    5. Privilege irreproducible data…
  • A DMP is a basic statement of how you will create, manage, share and preserve your data

    Funders expect these decisions to be justified, particularly where they are not in line with the funder’s policy (e.g. limits on data sharing)


  • Research Data Management: a gentle introduction

    1. Research Data Management: a gentle introduction
        Martin Donnelly, Digital Curation Centre, University of Edinburgh
        CLS Live, University of Huddersfield, 3 June 2014
    2. OVERVIEW
        1. Introductions and definitions
            • The Digital Curation Centre
            • Research data management
            • What do we mean by ‘data’, exactly?
        2. Data as a hot topic: politics and practical concerns
        3. Barriers and current activities
            • Quick interactive session
        4. Support and resources
            • A few rules of thumb / do’s and don’ts
            • Take-home messages
    3. 1. INTRODUCTIONS AND DEFINITIONS
    4. The Digital Curation Centre
        • The DCC (est. 2004) is…
            • A UK centre of expertise in digital preservation, with a particular focus on research data management (RDM)
            • Based across three sites: Universities of Edinburgh, Glasgow and Bath
            • Working with a number of UK universities to identify gaps in RDM provision and raise capabilities across the sector
            • Also involved in a variety of international collaborations
    5. Working with UK universities
    6. DCC networks and partnerships
    7. What is RD(M)?
        “the active management and appraisal of data over the lifecycle of scholarly and scientific interest”
        Data management is a part of good research practice. – RCUK Policy and Code of Conduct on the Governance of Good Research Conduct
    8. The old way of doing things
        1. Researcher collects data (information)
        2. Researcher interprets/synthesises data
        3. Researcher writes paper based on data
        4. Paper is published (and preserved)
        5. Data is left to benign neglect, and eventually ceases to be accessible
    9. The new way of doing things
        • Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
        • SHARE …and RE-USE
        The DataONE lifecycle model
    10. Helicopter view: What are the benefits of RDM?
        • TRANSPARENCY: The data that underpins research can be made open for anyone to scrutinise, and attempt to replicate findings.
        • EFFICIENCY: Data collection can be funded once, and used many times for a variety of purposes.
        • RISK MANAGEMENT: A pro-active approach to data management reduces the risk of inappropriate disclosure of sensitive data, whether commercial or personal.
        • PRESERVATION: Lots of data is unique, and can only be captured once. If lost, it can’t be replaced.
    11. Okay, but what is ‘data’ exactly?
        • Definitions vary from discipline to discipline, and from funder to funder…
        • Here’s a science-centric definition:
            • “The recorded factual material commonly accepted in the scientific community as necessary to validate research findings.” (US Office of Management and Budget, Circular A-110)
            • [Addendum: This policy applies to scientific collections, known in some disciplines as institutional collections, permanent collections, archival collections, museum collections, or voucher collections, which are assets with long-term scientific value. (US Office of Science and Technology Policy, Memorandum, 20 March 2014)]
        • And another from the visual arts:
            • “Evidence which is used or created to generate new knowledge and interpretations. ‘Evidence’ may be intersubjective or subjective; physical or emotional; persistent or ephemeral; personal or public; explicit or tacit; and is consciously or unconsciously referenced by the researcher at some point during the course of their research.” (Leigh Garrett, KAPTUR project: see http://kaptur.wordpress.com/2013/01/23/what-is-visual-arts-research-data-revisited/)
    12. 2. POLITICS AND PRACTICAL CONCERNS
    13. A hot topic: 5 years of front pages…
        Nature, 09/08; Economist, 02/10; Popular Science, Science, 02/11; Nature, 09/09; ACM, 12/08; InformationWeek, 08/10; Computerworld
    14. Technology
        • Developments in sensor technology, networking and digital storage enable new research and scientific paradigms
        • As costs also fall, possibilities for data sharing, citation and re-use become much more widespread
        • Journals dedicated solely to publishing data have even started to appear. That’s not to say it’s an entirely new thing: journals have always published data, just never before at such scale…
    15. Rosse, from Philosophical Transactions of the Royal Society, MDCCCLXI (or 1861 if you’d prefer)
    16. Repurposing / VfM via data re-use
        Ships’ log books build picture of climate change (14 October 2010)
        You can now help scientists understand the climate of the past and unearth new historical information by revisiting the voyages of First World War Royal Navy warships. Visitors to OldWeather.org will be able to retrace the routes taken by any of 280 Royal Navy ships. These include historic vessels such as HMS Caroline, the last survivor of the 1916 Battle of Jutland still afloat. By transcribing information about the weather and interesting events from images of each ship's logbook, web volunteers will help scientists build a more accurate picture of how our climate has changed over the last century.
        http://www.nationalarchives.gov.uk/news/503.htm
        • Detail from Royal Navy Recruitment poster, RNVR Signals branch, 1917 (Catalogue reference: ADM 1/8331)
        • Endeavour, 1768-71 (Captain Cook)
        • HMS Beagle, 1830-34
        • HMS Torch, 1918
    17. Government pressure/support
        6.9 The Research Councils expect the researchers they fund to deposit published articles or conference proceedings in an open access repository at or around the time of publication. But this practice is unevenly enforced. Therefore, as an immediate step, we have asked the Research Councils to ensure the researchers they fund fulfil the current requirements. Additionally, the Research Councils have now agreed to invest £2 million in the development, by 2013, of a UK ‘Gateway to Research’. In the first instance this will allow ready access to Research Council funded research information and related data but it will be designed so that it can also include research funded by others in due course. The Research Councils will work with their partners and users to ensure information is presented in a readily reusable form, using common formats and open standards.
        http://www.bis.gov.uk/assets/biscore/innovation/docs/i/11-1387-innovation-and-research-strategy-for-growth.pdf
    18. Funder principles/expectations
        1. Public good
        2. Preservation
        3. Discovery
        4. Confidentiality
        5. First use
        6. Recognition
        7. Public funding
        Six of the seven RCUK councils require data management plans (or equivalent), as do Wellcome Trust, Cancer Research UK, and more…
    19. Meanwhile, in the USA…
    20. (Aside: Open Data)
        • Open Data is a philosophy, underpinned by pragmatism… transparency + utility.
        • “Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.” – Wikipedia
        • Governments, cities etc are all getting onboard
        • Open Knowledge Foundation is basically the political / activist wing: http://okfn.org/
        • From the government / industry side, we have the Open Data Institute: http://theodi.org/
    21. Risk management
        Controversial FOI requests to…
        - University of East Anglia
        - Queen’s University Belfast
        - University of Stirling
    22. Research quality and integrity
        - Reinhart & Rogoff (2010) “Growth in a Time of Debt” - paper not peer-reviewed, data not initially made available…
        - Very influential and repeatedly cited by politicians to lend weight to economic strategy
        - Multiple issues (selective exclusions, unconventional weightings, coding error) identified by a postgrad researcher attempting to replicate the paper’s findings
        - Widespread embarrassment, but at least the errors were discovered!
    23. 3. BARRIERS AND CURRENT ACTIVITIES
    24. Why don’t we live in a data sharing utopia?
        • Four main reasons…
            • Lack of understanding of the fundamental issues
            • Lack of joined-up thinking within institutions, countries, internationally…
            • Issues around ownership / privacy
            • Technical/financial limitations and the need for appraisal
    25. What are UK HEIs doing about it?
        • Three principal areas of focus
            • Developing and integrating their technical infrastructure (storage space, repositories / CRIS systems, data catalogues, etc)
            • Developing human infrastructure (creating policies, assessing current data management capabilities, identifying areas of good practice, data management plan templates, tailoring training and guidance materials…)
            • Developing business plans for sustainable services / roles
            • Forming cross-function (hybrid) working groups, advisory groups, task forces, etc…
        http://blog.soton.ac.uk/keepit/2010/01/28/aida-and-institutional-wobbliness/
    26. Quick interactive session: data management planning
        • Checklist for a Data Management Plan, v4.0 (2013), www.dcc.ac.uk/resources/data-management-plans
        • Questions
            • How confident would you be about completing each section?
            • What help or advice is available in the university?
        DMP SECTIONS
        1. Administrative Data, e.g. project name, description, PI, funder, etc
        2. Data Collection, e.g. description, capture methods, etc
        3. Documentation and Metadata, e.g. what information is needed for the data to be accessed and understood in the future?
        4. Ethics and Legal Compliance, e.g. consent, sensitivity, copyright/IPR
        5. Storage and Backup, e.g. where will data be held and backed up? Security and access issues
        6. Selection and Preservation, e.g. keep it all or just some? How long should it be kept?
        7. Data Sharing, e.g. how will data be found and accessed, any restrictions?
        8. Responsibilities and Resources, e.g. who will do it and who will pay?
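    As a rough illustration of the checklist structure above, the eight section headings can be held in a list and a draft plan checked for gaps. This is only a minimal Python sketch: `DraftPlan`, `missing_sections` and `example_plan` are hypothetical names, and nothing here is part of the DCC checklist or of DMPonline.

```python
from dataclasses import dataclass, field

# Section headings as listed on the slide (DCC Checklist for a DMP, v4.0).
DMP_SECTIONS = [
    "Administrative Data",
    "Data Collection",
    "Documentation and Metadata",
    "Ethics and Legal Compliance",
    "Storage and Backup",
    "Selection and Preservation",
    "Data Sharing",
    "Responsibilities and Resources",
]


@dataclass
class DraftPlan:
    """Hypothetical container for a draft plan: section heading -> free text."""
    answers: dict = field(default_factory=dict)

    def missing_sections(self):
        # A section counts as missing if it has no answer or only whitespace.
        return [s for s in DMP_SECTIONS if not self.answers.get(s, "").strip()]


def example_plan():
    """Build an illustrative draft with one real answer and one blank one."""
    plan = DraftPlan()
    plan.answers["Administrative Data"] = "Project name, PI and funder recorded"
    plan.answers["Storage and Backup"] = "   "  # blank, so still missing
    return plan
```

    Calling `example_plan().missing_sections()` lists the headings that still need attention, which mirrors the first question on the slide: how confident would you be about completing each section?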
    27. Quick interactive session: data management planning
        • Outcomes
            • It’s not necessary – or even desirable – for every researcher to become expert in every aspect of data management
            • Universities have an increasing obligation to provide infrastructure and support
            • Huddersfield have developed a dedicated web area at https://www.hud.ac.uk/cls/researchdata/
            • Specific expertise may also be available from the research office, library, IT, departmental support staff, legal services, etc…
    28. 4. SUPPORT
    29. i. DCC resources
        • Publications: The DCC publishes a series of themed Briefing Papers, How-To Guides and Case Studies, pitched at different audiences / levels of detail
            • http://www.dcc.ac.uk/resources/briefing-papers
            • http://www.dcc.ac.uk/resources/how-guides
            • http://www.dcc.ac.uk/resources/developing-rdm-services
        • Training
            • e.g. DC101 courses and Curation Reference Manual
        • Advice
            • e.g. Disciplinary metadata, www.dcc.ac.uk/resources/metadata-standards
        • Tools
            • DMPonline, CARDIO, Data Asset Framework, DRAMBORA
        • Events
            • International Digital Curation Conference (most recent was in San Francisco, February 2014)
            • Research Data Management Forum (themed events – next one is on Workflows and Lifecycle Models, London, 20 June 2014)
    30. ii. Other resources
        • Jisc services and resources
            • RDM resources, www.jisc.ac.uk/guides/research-data-management
            • EDINA and Mimas (national data centres)
            • JISCMRD projects – Phase 1 (2009-2011) and Phase 2 (2011-2013)
                1) Research Data Management Infrastructure (RDMI)
                2) Research Data Management Planning (RDMP)
                3) Support and Tools
                4) Citing, Linking, Integrating and Publishing Research Data (CLIP)
                5) Research Data Management Training Materials
                6) Enhancing DMPonline
                7) Events
        • Universities
            • Good materials are available from Edinburgh, Cambridge, Oxford, Glasgow, Bristol, and many others
    31. A few rules of thumb…
    32. STORAGE ≠ MANAGEMENT
    33. Greenhouse = storage
        Horticulture = management
    34. DATA MANAGEMENT ≠ SHARING
    35. But! You generally need a reason NOT to share, e.g.
        - Commercial interests
        - Ethical concerns
        - Data Protection Act
        So… don’t share it all
    36. Why not?
        1. We probably can’t afford the costs of storage: increasing volumes outpace declining storage hardware costs
        and
        2. We probably can’t afford the time it will take to ensure it remains accessible/discoverable
        According to: John Gantz and David Reinsel (2011), Extracting Value from Chaos, http://www.emc.com/digital_universe
        And… don’t keep it all
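    The first reason above is a compounding argument, and a rough worked example may make it concrete. The sketch below assumes that total stored volume grows by a fixed fraction each year while the unit cost of storage falls by a fixed fraction. The function name and the rates in the usage note are illustrative assumptions, not figures taken from Gantz and Reinsel.

```python
def relative_storage_spend(years, volume_growth, cost_decline):
    """Annual storage spend relative to year 0 if everything is kept.

    volume_growth: assumed yearly growth in total stored volume (0.60 = +60%).
    cost_decline:  assumed yearly fall in cost per unit stored (0.25 = -25%).
    """
    # Each year the bill is multiplied by (more data) x (cheaper storage).
    yearly_factor = (1 + volume_growth) * (1 - cost_decline)
    return yearly_factor ** years
```

    With the hypothetical rates of 60% growth and 25% cost decline, the yearly factor is 1.2, so `relative_storage_spend(5, 0.60, 0.25)` comes to roughly 2.5 times the starting bill after five years. Spend only shrinks when the yearly factor drops below 1, that is, when costs fall faster than volumes grow.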
    37. http://blog.dshr.org/2012/05/lets-just-keep-everything-forever-in.html
        “Keeping 2018’s data in S3 would cost the entire global GDP”
    38. How to decide?
        1. Relevance to Mission – including any legal/funder requirement to retain the data beyond its immediate use.
        2. Scientific or Historical Value – significance and relationship to publications etc.
        3. Uniqueness – can it be found elsewhere / if we don’t preserve it, who will?
        4. Potential for Redistribution – quality / IP / ethical concerns are addressed.
        5. Non-Replicability – either impossible to replicate (e.g. atmospheric or social science data) or not financially viable.
        6. Economic Case – costs of managing and preserving the resource stack up well against potential future benefits.
        7. Full Documentation – surrounding / contextual information necessary to facilitate future discovery, access, and reuse is adequate.
        How to Appraise & Select Research Data for Curation. Angus Whyte, Digital Curation Centre, and Andrew Wilson, Australian National Data Service (2010)
    39. A few do’s and don’ts
        • DO: Have a plan for your data
          DON’T: Make it up as you go along
        • DO: Keep backups. Make this easy with automated syncing services like Dropbox, provided your data isn’t too sensitive
          DON’T: Carry the only copy around on a memory card, your laptop, your phone, etc
        • DO: Describe your data as you collect it. This makes it possible for others to interpret it, and for you to do the same a few years down the line
          DON’T: Leave this till later. The quality of metadata decreases with time, and the best metadata is created at the moment of data capture
        • DO: Save your work in open file formats, where possible, and use accepted metadata standards to enable like-with-like comparison
          DON’T: Invent new ‘standards’ where community norms already exist
        • DO: Deposit your data in a data centre or repository, and link it to your publications
          DON’T: Be afraid to ask for help. This will exist both within your institution, and via national support organisations like the DCC
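    Two of the do’s above, describing data at the moment of capture and keeping backups you can trust, can be combined in a few lines. The following is a minimal Python sketch using only the standard library. The sidecar file name, the field names and both function names are assumptions made for illustration, not a recognised metadata standard; where a disciplinary standard exists, it should be preferred.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(data_path, description, creator):
    """Write <file>.meta.json next to a data file and return the sidecar path."""
    data_path = Path(data_path)
    # The checksum lets any later copy be compared with what was captured.
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    record = {
        "file": data_path.name,
        "description": description,
        "creator": creator,
        "captured": datetime.now(timezone.utc).isoformat(),
        "sha256": digest,
    }
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def verify_copy(copy_path, sidecar_path):
    """Return True if a (backup) copy still matches the recorded checksum."""
    record = json.loads(Path(sidecar_path).read_text(encoding="utf-8"))
    current = hashlib.sha256(Path(copy_path).read_bytes()).hexdigest()
    return current == record["sha256"]
```

    Running `write_sidecar` when a file is first saved, and `verify_copy` against each backup from time to time, gives an early warning if a copy has been altered or corrupted. The whole file is read into memory, so very large files would need chunked hashing instead.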
    40. Last slide: take-home messages
        • Research data management (RDM) is…
            • An integral part of doing quality research in the 21st century
            • Increasingly expected / mandated by funders, publishers and others
            • An opportunity for new discoveries and different approaches to research
            • A safeguard against inappropriate data disclosure
            • An activity that requires careful planning and consideration, and – ideally – coordination and support across many stakeholder types
    41. Thank you
        Questions?
        Image credits
            • Slide 2 (forest) – http://assets.worldwildlife.org/photos/934/images/hero_small/forest-overview-HI_115486.jpg?1345533675
            • Slide 3 (dictionary) – http://www.flickr.com/photos/dougbelshaw/
            • Slide 12 (politics) – https://www.flickr.com/photos/junglearctic/
            • Slide 23 (barriers) – http://www.flickr.com/photos/thetrapezium/
            • Slide 24 (utopia) – http://www.flickr.com/photos/burningmax/
            • Slide 28 (Thierry) – https://twitter.com/AFC_Fisher/
            • Slide 33 (greenhouse) – http://www.flickr.com/photos/mykl/
            • Slide 41 (love note) – http://www.edawax.de/wp-content/uploads/2013/01/Metadata_love250.jpg
        Thanks to Sarah Callaghan, PREPARDE, for the Rosse example
        This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License.
        For more about DCC services see www.dcc.ac.uk or follow us on twitter @digitalcuration and #ukdcc
        Martin Donnelly, Digital Curation Centre, University of Edinburgh
        martin.donnelly@ed.ac.uk
        @mkdDCC
