Developing institutional RDM services


Slides from a presentation given at a Digital Curation Centre workshop, Cardiff University, 14 May 2013


  1. Developing institutional RDM services. Michael Day, Digital Curation Centre (DCC), UKOLN, University of Bath. DCC Workshop, Cardiff University, 14 May 2013.
  2. Session outline: managing active data (storage options); long-term retention of data (selection criteria); data repositories; finding and citing data (data registries and metadata). Presentation based on: Sarah Jones, Graham Pryor and Angus Whyte, How to Develop Research Data Management Services – a guide for HEIs (DCC, 2013). Some slides reused from RDMRose training materials.
  3. Managing active data
  4. Managing active data: key tasks. Researchers have a duty to ensure that research data is stored securely and backed up on a regular basis; have choices (e.g. network drives, laptops, external storage devices, online/cloud-based storage); and need to take data security seriously. This should be considered as part of the data management planning process. Institutions need to review data holdings and RDM practices constantly in order to evaluate whether current storage infrastructures are sufficient; may need to make a case for investing in the provision of additional data storage capability; need procedures for the allocation and management of storage; and need to be flexible, taking account of a diverse range of research contexts and data storage requirements.
  5. Research data storage. There is a trend for some HEIs to enhance the capacity of their research data storage facilities: extending the capacity of existing filestores (e.g. Bath), exploring secure cloud storage, and utilising High Performance Computing facilities. Managing storage: at the University of Bristol (data.bris), registered researchers (data stewards) are allocated 5TB of storage to manage, e.g. deciding how long data should be kept and who has access.
  6. Options for managing active data. Cloud storage options: there may be benefits in terms of costs and expertise, but there may also be risks (e.g. loss of control, jurisdictional issues); Janet Brokerage promotes the use of cloud and off-site data centre facilities. Academic Dropbox-like services: Dropbox is often used for sharing and synching data between machines, but institutions are keen to retain control. Systems developed in-house: typically developed with a disciplinary focus, e.g. BRISSkit (biomedicine).
  7. Selection for the long-term retention of data
  8. Selecting data for retention. RCUK, Common Principles on Data Policy (2011): “Data with acknowledged long-term value should be preserved and remain accessible and usable for future research”. Institutions will need to establish clear criteria to guide decisions on what should be kept. It will not be possible to retain everything, so carefully considered selection processes are essential to help prioritise the data that has long-term value. Institutional selection processes will need to take account of: data that institutions are legally obliged to retain (or destroy), e.g. for contractual or regulatory reasons; different disciplinary practices (e.g. some disciplines will have mature data sharing infrastructures and will already deposit data with third-party services); and researcher sensitivities about losing control of data (deposit agreements).
  9. Developing guidance on selection. Establishing guidelines, processes and good practice for data selection and deposit can be one of the more challenging aspects of an RDM service. There is a need for buy-in from researchers, and for clarity on what kinds of data are within the remit of an institutional RDM service. There may be a need to apply different levels of curation, e.g. depending on the perceived value of the data accepted.
  10. DCC selection categories. The DCC guide How to Select and Appraise Research Data for Curation (Whyte and Wilson, 2010) proposes seven main criteria: relevance to mission; scientific or historic value; uniqueness; potential for redistribution; non-replicability; economic case; full documentation.
  11. Data repositories
  12. 12. Data repositories Focusing on how data will be preserved andmade available for others Main options: Developing an institutional data repository Building, where possible, on existing systems, e.g. IR, CRIS,etc. Essex Research Data demo: Liaising with external research data repositories (or datacentres) Often subject based, some UK data centres supported byfunding bodies Providing researchers with information on external services
  13. Data catalogues
  14. RCUK Common Principles. RCUK, Common Principles on Data Policy (2011): “To enable research data to be discoverable and effectively re-used by others, sufficient metadata should be recorded and made openly available to enable other researchers to understand the research and re-use potential of the data. Published results should always include information on how to access the supporting data”. Also EPSRC Principle 6.
  15. EPSRC Expectation V: “Research organisations will ensure that appropriately structured metadata describing the research data they hold is published (normally within 12 months of the data being generated) and made freely accessible on the internet; in each case the metadata must be sufficient to allow others to understand what research data exists, why, when and how it was generated, and how to access it. Where the research data referred to in the metadata is a digital object it is expected that the metadata will include use of a robust digital object identifier (For example as available through the DataCite organisation …)”. Learning material produced by RDMRose.
  16. Some questions to consider. What metadata is required to adequately record datasets? What is “sufficient metadata” for discovery and re-use? Does any of this metadata already exist? If so, where might it be found? If not, how can the appropriate metadata be generated or captured? Will there be a need to share this metadata, e.g. with third-party discovery services or national data services? If so, what standards exist to support metadata sharing?
  17. Examples: UKOLN Scoping Study. The Scientific Data Application Profile Scoping Study (UKOLN, 2009), building on work undertaken on the Scholarly Works Application Profile (SWAP), analysed the metadata used by UK data centres and repositories and selected domain models (e.g. DDI, CCLRC Metadata Model, CIDOC CRM). It concluded that: simple Dublin Core (e.g. as mandated by OAI-PMH) would be insufficient; there was sufficient convergence between the different schemas to suggest that a generic metadata profile could be constructed; and a generic metadata profile would benefit interdisciplinary research and institution-based services (e.g. IRs).
  18. Examples: DataCite metadata (1). DataCite: an organisation aiming to facilitate easier access to (and citation of) research data, e.g. through the use of persistent identifiers (DOIs). The DataCite Metadata Schema (currently v. 2.2, 2011) defines core metadata properties, broadly based on Dublin Core concepts.
  19. Examples: DataCite metadata (2). Mandatory properties: Identifier, Creator, Title, Publisher, PublicationYear. Administrative metadata: LastMetadataUpdate, MetadataVersionNumber. Optional properties: Subject, Contributor, Date, Language, ResourceType, AlternateIdentifier, RelatedIdentifier, Size, Format, Version, Rights, Description.
  20. Examples: University of Oxford. The DaMaRO project at the University of Oxford is developing a metadata schema for its DataFinder (Rumsey, 2012), using a three-tier metadata approach: mandatory minimal metadata to enable basic discovery, such as Creator, Title, Publisher, Date, Location, Access terms & conditions; mandatory contextual metadata (mostly administrative and partly based on EPSRC expectations), such as Funding Agency, Grant Number, Last access request date, Project Information, Data Generation Process, Why the data was generated, Date (range) of data collection, Reasons for embargoes; and optional metadata (including discipline-specific metadata) to enable reuse, such as machine settings and the experimental conditions under which the data were gathered.
  21. Examples: University of Essex. The RDE Metadata Profile for EPrints is based on DataCite, INSPIRE, DDI 2.1 and DataShare: a mixture of generic schemas and standards specific to social science data. There seems to be convergence on a layered approach.
  22. Some practical questions (1). Technical choices for institutions: developing new institutional services, e.g. the approach taken by ANDS, which defines metadata stores by their coverage, the granularity of data that they describe, and the specialisation of their descriptions (e.g. collection-level, object-level, local, institutional, national and discipline-specific); or building upon existing infrastructures, e.g. institutional repositories or CRIS (e.g. Pure, Symplectic, Converis).
  23. Some practical questions (2). Research Information Management interaction? There is interest in what RIM standards like CERIF can offer RDM (e.g. potentially richer metadata structures for linking research outputs with organisational groupings and funding streams, some level of buy-in from funding bodies), but implementation … CERIF for Datasets (C4D). We need to think about how metadata can be shared with: discipline-based repositories and data centres; and emerging national (and international) discovery infrastructures. The Australian National Data Service uses the RIF-CS schema (based on ISO 2146:2010) as a data interchange format. Jisc and the DCC are currently exploring the options for collating metadata about research data at national level.
  24. Data citation
  25. 25. Data Citation Issues include (Ball & Duke, 2011a and b): At what granularity should data be made citeable? How to credit each contributor in a dataset that isassembled from very many contributions? Where in a research paper should a data citation begiven (e.g. a paper describing a dataset versussubsequent papers using it)? What to do with frequently updated data?May-13Learning material produced by RDMRose
  26. 26. DataCite DataCite ( is a not-for-profitorganisation that aims to promote and support thesharing of research data They are developing an infrastructure that supportsmethods of data citation, discovery, and access They are currently leveraging the DOI (Digital ObjectIdentifier) infrastructure, which is also used for researcharticles They can provide DOIs for datasets DataCite DOIs have to resolve to a public landing pagewith information about the dataset and a direct link to itMay-13Learning material produced by RDMRose
  27. 27. DataCite Basic form: Creator (PublicationYear): Title. Publisher. Identifier Version and ResourceType are optional extra elements For citation purposes, DataCite recommends that DOInames are displayed as linkable, permanent URLs More info in DataCite (2011) University of Poppleton (2011): Precipitationmeasurements 1905-2010 taken at Western Bankweather station. Meteorological service, The Universityof Poppleton. material produced by RDMRose
  28. References: Ball, A. (2009). Scientific Data Application Profile Scoping Study Report. Bath: UKOLN, University of Bath; Ball, A., & Duke, M. (2011a). Data Citation and Linking. DCC Briefing Papers. Edinburgh: Digital Curation Centre; Ball, A., & Duke, M. (2011b). How to Cite Datasets and Link to Publications. DCC How-To Guides. Edinburgh: Digital Curation Centre; DataCite (2011). DataCite Metadata Schema for the Publication and Citation of Research Data. Version 2.2. London: DataCite; Rumsey, S. (2012). Just enough metadata: Metadata for research datasets in institutional data repositories [PowerPoint presentation]. Oxford: The University of Oxford.
  29. Questions?
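Slide 4 asks researchers to store data securely and back it up on a regular basis. One way to confirm that a backup is actually usable is to compare checksums of the working copy and the backup copy. The following is a minimal Python sketch, assuming both copies are plain directory trees on locally mounted storage and that SHA-256 is an adequate fixity check; `sha256_of` and `verify_backup` are hypothetical helper names, not part of any service mentioned in the slides.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """List files that are missing from, or differ in, the backup copy."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue  # directories are implied by the files they contain
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"changed: {rel.as_posix()}")
    return problems
```

An empty list means every file in the working copy has an identical counterpart in the backup; files that exist only in the backup are deliberately ignored.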
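Slide 5 notes that data.bris allocates 5TB of storage to each registered data steward. A minimal sketch of how such an allocation might be tracked, assuming decimal terabytes and a simple rule that a request is granted only if it fits in the remaining quota; `StorageAllocation` is a hypothetical class and does not describe how Bristol actually implements its service.

```python
from dataclasses import dataclass

TB = 10**12  # assumption: decimal terabytes


@dataclass
class StorageAllocation:
    """Hypothetical record of one data steward's storage allocation."""

    steward: str
    quota_bytes: int = 5 * TB
    used_bytes: int = 0

    def remaining(self) -> int:
        return self.quota_bytes - self.used_bytes

    def request(self, size_bytes: int) -> bool:
        """Reserve space if it fits within the quota; return whether it was granted."""
        if size_bytes < 0:
            raise ValueError("size must be non-negative")
        if size_bytes > self.remaining():
            return False  # the steward would need to free space or ask for more
        self.used_bytes += size_bytes
        return True
```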
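Slide 10 lists the seven main criteria from the DCC appraisal guide (Whyte and Wilson, 2010). The guide does not prescribe a scoring formula, so the following Python sketch only records a decision and a note for each criterion and reports which criteria are still unassessed; the final decision to retain or discard is left to people. `Appraisal` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field

# The seven main criteria listed on slide 10.
DCC_CRITERIA = (
    "Relevance to mission",
    "Scientific or historic value",
    "Uniqueness",
    "Potential for redistribution",
    "Non-replicability",
    "Economic case",
    "Full documentation",
)


@dataclass
class Appraisal:
    """Hypothetical record of one dataset's appraisal against the criteria."""

    dataset: str
    notes: dict[str, str] = field(default_factory=dict)
    met: set[str] = field(default_factory=set)

    def assess(self, criterion: str, satisfied: bool, note: str = "") -> None:
        if criterion not in DCC_CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        self.notes[criterion] = note  # presence in notes marks it as assessed
        if satisfied:
            self.met.add(criterion)
        else:
            self.met.discard(criterion)

    def outstanding(self) -> list[str]:
        """Criteria not yet assessed, in the order listed on slide 10."""
        return [c for c in DCC_CRITERIA if c not in self.notes]
```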
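Slide 15 quotes EPSRC's expectation that metadata is published "normally within 12 months of the data being generated". A minimal Python sketch of that deadline check, assuming "12 months" means 12 calendar months and that a start date with no counterpart in the target month (such as 29 February) is clamped to the month's last day. Because EPSRC says "normally", the computed date is a nominal target rather than a hard cut-off; `add_months` and `metadata_overdue` are hypothetical helpers.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def metadata_overdue(generated: date, today: date, months: int = 12) -> bool:
    """True if `today` is past the nominal metadata publication date."""
    return today > add_months(generated, months)
```

For example, data generated on 29 February 2012 would have a nominal metadata deadline of 28 February 2013.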
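Slide 19 lists the mandatory, administrative and optional properties of the DataCite Metadata Schema v2.2. A minimal Python sketch of a pre-submission check, assuming the record is held as a flat dictionary of property name to text value. This is a simplification: the real schema is XML with nested elements, attributes and repeatable properties, so this is not a DataCite validator. `check_record` is a hypothetical helper, and the four-digit-year rule is an assumption.

```python
# Property names as listed on slide 19.
MANDATORY = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")
ADMINISTRATIVE = ("LastMetadataUpdate", "MetadataVersionNumber")
OPTIONAL = (
    "Subject", "Contributor", "Date", "Language", "ResourceType",
    "AlternateIdentifier", "RelatedIdentifier", "Size", "Format",
    "Version", "Rights", "Description",
)
KNOWN = set(MANDATORY) | set(ADMINISTRATIVE) | set(OPTIONAL)


def check_record(record: dict[str, str]) -> list[str]:
    """Return human-readable problems with a flat metadata record (empty if none)."""
    problems = [
        f"missing mandatory property: {name}"
        for name in MANDATORY
        if not record.get(name, "").strip()
    ]
    problems += [
        f"unrecognised property: {name}" for name in record if name not in KNOWN
    ]
    year = record.get("PublicationYear", "").strip()
    # Assumption: PublicationYear is expected as a four-digit year such as "2011".
    if year and not (len(year) == 4 and year.isdigit()):
        problems.append("PublicationYear should be a four-digit year")
    return problems
```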
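Slide 20 describes Oxford's three-tier approach: two mandatory tiers (minimal and contextual) and one optional tier. A minimal Python sketch of a completeness check, assuming a flat dictionary record and using a hypothetical subset of the field names on the slide. Conditional fields such as "Last access request date" and "Reasons for embargoes" are left out, because they only apply in some cases; the real DataFinder profile will differ in naming and detail.

```python
# Hypothetical subset of the field names listed on slide 20.
TIERS = {
    "minimal": ("Creator", "Title", "Publisher", "Date", "Location",
                "Access terms and conditions"),
    "contextual": ("Funding Agency", "Grant Number", "Project Information",
                   "Data Generation Process", "Why the data was generated",
                   "Date range of data collection"),
    "optional": ("Machine settings", "Experimental conditions"),
}


def missing_by_tier(record: dict[str, str]) -> dict[str, list[str]]:
    """For each tier, list the fields that are absent or blank in the record."""
    return {
        tier: [name for name in fields if not record.get(name, "").strip()]
        for tier, fields in TIERS.items()
    }


def ready_to_publish(record: dict[str, str]) -> bool:
    """Both mandatory tiers must be complete; the optional tier never blocks."""
    gaps = missing_by_tier(record)
    return not gaps["minimal"] and not gaps["contextual"]
```

Keeping the tiers as data rather than code means a layered profile, like the convergence noted on slide 21, can be adjusted without touching the checking logic.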
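Slide 27 gives DataCite's basic citation form, "Creator (PublicationYear): Title. Publisher. Identifier", with the DOI displayed as a linkable URL. A minimal Python sketch of that form, assuming the current `https://doi.org/` resolver prefix. Where the optional Version and ResourceType elements go is an assumption, not taken from the slide, and `10.5072/example` is a placeholder DOI, not a registered one. `DatasetCitation` is a hypothetical class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    creator: str
    publication_year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None
    resource_type: Optional[str] = None

    def doi_url(self) -> str:
        # Display the DOI name as a linkable URL via the doi.org resolver.
        return f"https://doi.org/{self.doi}"

    def render(self) -> str:
        # Basic form: Creator (PublicationYear): Title. Publisher. Identifier
        parts = [f"{self.creator} ({self.publication_year}): {self.title}"]
        if self.version:
            parts.append(f"Version {self.version}")  # assumed placement
        parts.append(self.publisher)
        if self.resource_type:
            parts.append(self.resource_type)  # assumed placement
        parts.append(self.doi_url())
        return ". ".join(parts)
```

Applied to the slide's Poppleton example with the placeholder DOI, `render()` yields "University of Poppleton (2011): Precipitation measurements 1905-2010 taken at Western Bank weather station. Meteorological service, The University of Poppleton. https://doi.org/10.5072/example".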