Research data management: a tale of two paradigms



Presentation I was supposed to give at "Scotland’s Collections and the Digital Humanities" workshop in Edinburgh on May 2nd 2014. Illness prevented it, but my heroic DCC colleague Jonathan Rans stepped up and delivered the presentation on my behalf.


  1. Research Data Management: a tale of two paradigms
     ‘Scotland's National Collections and the Digital Humanities’ workshop series #2
     Martin Donnelly, Digital Curation Centre, University of Edinburgh
     Edinburgh, 2 May 2014
     (Image: Tom Phillips, A Humument (1970, 1986, 1998, 2004, 2012…))
  2. Overview
     1. Introductions and definitions
        - The Digital Curation Centre
        - Research data management
        - What do we mean by ‘data’, exactly?
     2. Data as a hot topic: politics and practical concerns
     3. Data in/and the Arts and Humanities
        - How the Arts and Humanities differ
        - Strengths and weaknesses
        - Reflections on opportunities for exploration at national level
     4. Resources
  4. The Digital Curation Centre
     The DCC (est. 2004) is…
     - A UK centre of expertise in digital preservation. It emerged from the e-Journal preservation field, and now has a particular focus on research data management (RDM)
     - Based across three sites: the Universities of Edinburgh, Glasgow and Bath
     - Working with a number of UK universities to identify gaps in RDM provision and raise capabilities across the sector
     - Also involved in a variety of national and international collaborations…
  5. DCC networks and partnerships
  6. What is research data management?
     “the active management and appraisal of data over the lifecycle of scholarly and scientific interest”
     Data management is a part of good research practice. (RCUK Policy and Code of Conduct on the Governance of Good Research Conduct)
  7. The old way of doing things
     1. Researcher collects data (information)
     2. Researcher interprets/synthesises data
     3. Researcher writes paper based on data
     4. Paper is published (and preserved)
     5. Data is left to benign neglect, and eventually ceases to be accessible
  8. The new way of doing things: the DataONE lifecycle model
     Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze… SHARE …and RE-USE
  9. Other models are available… (Ellyn Montgomery, US Geological Survey)
  10. Helicopter view: what are the benefits of RDM?
      - TRANSPARENCY: The data that underpins research can be made open for anyone to scrutinise, and attempt to replicate findings.
      - EFFICIENCY: Data collection can be funded once, and used many times for a variety of purposes.
      - RISK MANAGEMENT: A pro-active approach to data management reduces the risk of inappropriate disclosure of sensitive data, whether commercial or personal.
      - PRESERVATION: Lots of data is unique, and can only be captured once. If lost, it can’t be replaced.
  11. Okay, but what is ‘data’ exactly?
      - Definitions vary from discipline to discipline, and from funder to funder…
      - Here’s a science-centric definition: “The recorded factual material commonly accepted in the scientific community as necessary to validate research findings.” (US Office of Management and Budget, Circular A-110)
        [Addendum: This policy applies to scientific collections, known in some disciplines as institutional collections, permanent collections, archival collections, museum collections, or voucher collections, which are assets with long-term scientific value. (US Office of Science and Technology Policy, Memorandum, 20 March 2014)]
      - And another from the visual arts: “Evidence which is used or created to generate new knowledge and interpretations. ‘Evidence’ may be intersubjective or subjective; physical or emotional; persistent or ephemeral; personal or public; explicit or tacit; and is consciously or unconsciously referenced by the researcher at some point during the course of their research.” (Leigh Garrett, KAPTUR project: see 2013/01/23/what-is-visual-arts-research-data-revisited/)
  12. A few questions around data in the Arts and Humanities
      - Are the goals (or indeed the concepts) of evidence, facts, validation and replication still central in disciplines reliant on subjectivity, interpretation, argument and qualities of expression?
      - How do we identify, preserve and share ephemera, emotions, the unconscious…? How do we protect rights around creative data? What are the financial/ownership issues accompanying creative/Arts research?
      - Is it clear where creative research begins and ends? How can we differentiate between funded research and unfunded personal work?
      - What problems are introduced by practice-driven research?
      - To what extent is non-digital material a problem? Can we share approaches to this with other subject areas (e.g. biology, geology)?
      - What other characteristics do Arts and Humanities data have in common with those of the Sciences? Which other disciplines share these issues more generally?
  14. A hot topic: 5 years of front pages…
      (Magazine covers: Nature, 09/08; Economist, 02/10; Popular Science; Science, 02/11; Nature, 09/09; ACM, 12/08; InformationWeek, 08/10; Computerworld)
  15. Technology
      - Developments in sensor technology, networking and digital storage enable new research and scientific paradigms
      - As costs also fall, possibilities for data sharing, citation and re-use become much more widespread
      - Journals dedicated solely to publishing data have even started to appear. That’s not to say it’s an entirely new thing: journals have always published data, just never before at such scale…
  16. Rosse, from Philosophical Transactions of the Royal Society (MDCCCLXI, or 1861 if you’d prefer)
  17. Repurposing/VfM (value for money) via data re-use
      “Ships’ log books build picture of climate change” (14 October 2010):
      You can now help scientists understand the climate of the past and unearth new historical information by revisiting the voyages of First World War Royal Navy warships. Visitors to will be able to retrace the routes taken by any of 280 Royal Navy ships. These include historic vessels such as HMS Caroline, the last survivor of the 1916 Battle of Jutland still afloat. By transcribing information about the weather and interesting events from images of each ship's logbook, web volunteers will help scientists build a more accurate picture of how our climate has changed over the last century.
      htm
      Images: Detail from Royal Navy Recruitment poster, RNVR Signals branch, 1917 (Catalogue reference: ADM 1/8331); Endeavour, 1768-71 (Captain Cook); HMS Beagle, 1830-34; HMS Torch, 1918
  18. Government pressure/support
      “6.9 The Research Councils expect the researchers they fund to deposit published articles or conference proceedings in an open access repository at or around the time of publication. But this practice is unevenly enforced. Therefore, as an immediate step, we have asked the Research Councils to ensure the researchers they fund fulfil the current requirements. Additionally, the Research Councils have now agreed to invest £2 million in the development, by 2013, of a UK ‘Gateway to Research’. In the first instance this will allow ready access to Research Council funded research information and related data but it will be designed so that it can also include research funded by others in due course. The Research Councils will work with their partners and users to ensure information is presented in a readily reusable form, using common formats and open standards.”
      (e/innovation/docs/i/11-1387-innovation-and-research-strategy-for-growth.pdf)
  19. (Aside: Open Data)
      - Open Data is a philosophy, underpinned by pragmatism… transparency + utility.
      - “Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.” (Wikipedia)
      - Governments, cities etc. are all getting on board
      - The Open Knowledge Foundation is basically the political/activist wing:
      - From the government/industry side, we have the Open Data Institute:
  20. Risk management
      Controversial FOI requests to…
      - University of East Anglia
      - Queen’s University Belfast
      - University of Stirling
  21. Research quality and integrity
      - Reinhart & Rogoff (2010), “Growth in a Time of Debt”: paper not peer-reviewed, data not initially made available…
      - Very influential, and repeatedly cited by politicians to lend weight to economic strategy
      - Multiple issues (selective exclusions, unconventional weightings, a coding error) identified by a postgrad researcher attempting to replicate the paper’s findings
      - Widespread embarrassment, but at least the errors were discovered!
  22. Why don’t we live in a data-sharing utopia?
      Four main reasons…
      - Lack of understanding of the fundamental issues
      - Lack of joined-up thinking within institutions, countries, internationally…
      - Issues around ownership/privacy
      - Technical/financial limitations, and the need for appraisal
      National bodies may be well placed to address some of these.
  23. What do research funders have to say? (i)
      - RCUK’s seven “Common Principles on Data Policy”: Data as a public good; Preservation; Discovery; Confidentiality; Right of first use; Recognition; Public funding for RDM
      - Six of the seven RCUK councils require data management plans, or equivalent, at the application stage
      - The seventh (EPSRC) requires nothing short of an institutional data infrastructure
  24. 3. DATA IN THE ARTS AND HUMANITIES
      (Image: Kailie Parrish, “In My Dreams”)
  25. What do research funders have to say? (ii)
      - AHRC requires that significant electronic resources or datasets are made available in an accessible repository for at least three years after the end of the grant
      - AHRC used to run several data services. Most stopped being funded in 2008, but the Archaeology Data Service remains at York, and the Visual Arts Data Service at UCA
      - ESRC applicants submit a statement on data sharing in the relevant section of the Je-S form, and provide a two-page data management and sharing plan addressing nine distinct themes
      - Datasets must be offered to the UK Data Archive on conclusion of the project
  26. Problems re. data in the Arts and Humanities
      - Some characteristics of Arts and Humanities data are likely to require a different kind of handling from that afforded to other disciplines
      - Arts ‘data’ is often personal, and creative data in particular may not be factual in nature. Furthermore, it may be quite valuable or precious to its creator. What matters most may not be the content itself, but rather the presentation, the arrangement, the quality of expression…
      - This tends to be why Open Access embargoes are often longer in the Arts and Humanities than in other areas
      - Digital ‘data’ emerging in the Arts is as likely to be an outcome of the creative research process as an input to a workflow. This is at odds with the scientific method, and with how most RDM resources are described.
  27. Scientific and other methods…
      - The scientific method is a body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge.
      - To be termed scientific, a method of inquiry must be based on empirical and measurable evidence subject to specific principles of reasoning.
      - The Oxford English Dictionary defines the scientific method as: “a method or procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses.”
      - Source:
      An art methodology differs from a science methodology, perhaps mainly insofar as the artist is not always after the same goal as the scientist. In art it is not necessarily all about establishing the exact truth so much as making the most effective form (painting, drawing, poem, novel, performance, sculpture, video, etc.) through which ideas, feelings, perceptions can be communicated to a public. With this purpose in mind, some artists will exhibit preliminary sketches and notes which were part of the process leading to the creation of a work. Sometimes, in Conceptual art, the preliminary process is the only part of the work which is exhibited, with no visible end result displayed. In such a case the “journey” is being presented as more important than the destination.
      Source:
  28. Strengths and weaknesses re. data in the Arts and Humanities
      - There’s nothing new about data re-use in the Arts and Humanities; it’s an integral part of the culture, and always has been
      - Think Kristeva’s intertextuality, Barthes’ ‘galaxy of signifiers’, Shakespeare’s plots, Lanark’s assorted ‘plagiarisms’, Edwin Morgan’s ‘found’ newspaper poems, Marcel Duchamp, variations on a theme, collage and intermedia art, T.S. Eliot, sampling/hip-hop, etc.
      - However, it’s often more fraught than data re-use in other areas (such as the Sciences)
      - For starters, people tend not to think of their sources or influences as ‘data’, and the value and referencing systems are quite different
      - Furthermore, practice-/praxis-based research is pretty much the sole preserve of the Humanities, and research/production methods are not always rigorously methodical/linear…
  29. National roles around Arts and Humanities data?
      - REFUGE: Many universities are developing data repositories for their funded research data, but a comparatively high proportion of Arts research does not receive external funding, so there’s less incentive for the institutions to provide support (no stick, and little demand from researchers)
      - APPRAISAL, STEWARDSHIP AND DISCOVERY: Furthermore, it is (probably/usually) preferable for data to be deposited in discipline- or domain-specific repositories. There’s a gap in the market, and national bodies are already experienced in managing large digital collections.
      - SUPPORT AND ADVOCACY: Humanities scholars are entirely comfortable with the use of primary and secondary sources. It just requires a little translation for the core concepts of RDM to become meaningful in an Arts and Humanities context. The trust is already there.
  30. 4. RESOURCES
  31. i. Arts-centric resources
      - DCC and University of the Arts London were both involved in the KAPTUR project:
      - DCC subsequently ran an institutional engagement with UAL between 2011 and 2013, which developed…
        - A data management guidance web area: management/data-management/
        - An institutional policy: Research-Data-Management-Policy.pdf
        - A UAL data management planning template in
      - A UAL data community of practice is being launched, with the support of senior management
      - Events
        - RDMF10: “Research data management in the Arts and Humanities”, Oxford, September 2013
        - UoE Digital Humanities workshop: “Managing Humanities Research Data”, Edinburgh, January 2014
  32. ii. Other DCC resources
      - Publications: Briefing Papers and How-To Guides
      - Training: e.g. DC101 events and Curation Reference Manual
      - Advice: e.g. disciplinary metadata, standards
      - Tools: DMPonline, CARDIO, Data Asset Framework, DRAMBORA
  33. iii. Further resources
      - JISC Services
        - RDM resources
        - EDINA and Mimas (national data centres)
        - JISCMRD projects (Phase 1, 2009-2011; Phase 2, 2011-2013) covered a wide range of topics, including infrastructure, planning, training, support and guidance, events and tools
      - Universities
        - Great RDM materials are available from Edinburgh, Cambridge, Oxford, Glasgow, Bristol, and many other places
      - Alliance of Digital Humanities Organizations (ADHO)
  34. iv. Further reading
      - “Ten recommendations for libraries to get started with research data management: Final report of the LIBER working group on E-Science / Research Data Management”, Christensen-Dalsgaard et al. (LIBER, 2012)
      - “Curating research data: the potential roles of libraries and information professionals”, Nielsen & Hjørland (2014), Journal of Documentation, Vol. 70, Iss. 2, pp. 221-240
      - For more on potential future roles for librarians, see slides from the Open Repositories 2013 workshop:
      - Two recent surveys about libraries and data…
        - USA & Canada: “Academic Libraries and Research Data Services: Current practices and plans for the future”, Tenopir, Birch & Allard, University of Tennessee (Association of College & Research Libraries, June 2012)
        - UK: “Research data management and libraries: Current activities and future priorities”, Cox & Pinfield, Information School, University of Sheffield (Journal of Librarianship and Information Science, June 2013)
  35. Last slide: take-home messages
      Research data management (RDM) is…
      - An integral part of doing quality research in the 21st century
      - Increasingly expected/required by funders, publishers and others
      - An opportunity for new discoveries and different approaches to research
      - A safeguard against inappropriate data disclosure
      - Sometimes complicated in the Arts and Humanities!
      And hence… an activity that requires careful planning and consideration, and – ideally – coordination and support at many levels
  36. 36. Thank you Questions? Image credits Slide 2 (forest) – Slide 3 (dictionary) – Slide 13 (politics) – Slide 22 (utopia) – Slide 30 (Thierry) – Slide 36 (love note) – Thanks to Sarah Callaghan, PREPARDE, for the Rosse example This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License. For more about DCC services see or follow us on twitter @digitalcuration and #ukdcc Martin Donnelly Digital Curation Centre University of Edinburgh @mkdDCC