Facing the Research Data Challenge:
Developing Data Policy and Services

                                        Marieke Guy
                                  Digital Curation Centre
                                    m.guy@ukoln.ac.uk
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of
this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard
Street, 5th Floor, San Francisco, California, 94105, USA.



DCC London, Imperial College, 22 May 2012 #dcc_london
Outline
• Who is responsible for RDM?

• What are the components of a data service?

• Learning lessons from other HEIs

• Developing policies and roadmaps




Who is Responsible for RDM?
• Funders
• Advisory bodies
• Data centres
• Research organisations
• Support services
• Publishers
• Researchers



Components of a Research Data Service?
• Research environment & systems
• Tools
• Storage, back-up and access
• Metadata and documentation
• Archive: preserve & share
• Support staff & services
• RDM policies
• Advocacy (senior mgmt & researcher)


Data Storage – Bristol Example
• £2m funding to date; further investment planned
• Available to all researchers for research data
• Petascale facility – expandable
• 3 machine rooms – resilience (tape archive 2012)
• 1st 5TB free per Data Steward, then £400 per TB p.a. for disk storage; tape backup £40 per TB
(Image: Blue Peta at Bristol)
http://data.bris.ac.uk
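The pricing above can be turned into a quick estimate when writing storage costs into a proposal. The sketch below is illustrative only. The function and constant names are hypothetical, and it assumes that the free 5TB allowance applies once per Data Steward, that disk is charged on the volume above that allowance, and that tape backup is a flat per-TB charge on whatever volume is backed up (the slide does not say whether the £40 is annual).

```python
# Figures taken from the Bristol slide; how they combine is an assumption.
FREE_TB_PER_STEWARD = 5      # first 5TB free per Data Steward
DISK_GBP_PER_TB_YEAR = 400   # disk storage, per TB per annum
TAPE_GBP_PER_TB = 40         # tape backup, per TB


def annual_disk_cost(total_tb: float) -> float:
    """Yearly disk charge in GBP for one Data Steward's allocation."""
    chargeable_tb = max(0.0, total_tb - FREE_TB_PER_STEWARD)
    return chargeable_tb * DISK_GBP_PER_TB_YEAR


def tape_backup_cost(backed_up_tb: float) -> float:
    """Tape backup charge in GBP for the volume backed up (assumed flat rate)."""
    return backed_up_tb * TAPE_GBP_PER_TB


if __name__ == "__main__":
    # Hypothetical project needing 8TB of disk, all of it backed up to tape.
    print("Disk per year (GBP):", annual_disk_cost(8))
    print("Tape backup (GBP):", tape_backup_cost(8))
```

Under these assumptions, an allocation at or below 5TB attracts no disk charge, which is why the allowance is clamped at zero rather than allowed to go negative.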
Archiving – Institutional Data Repositories
• Not intended to replace national, subject or other established data collections
• Acknowledge hybrid environment
• Examples:
  http://datashare.is.ed.ac.uk
  www.dspace.cam.ac.uk/
  https://databank.ora.ox.ac.uk
  Essex-RDR and DataPool at Southampton
Archiving – External Data Centres
• Research funders’ data centres…
• Structured databases
• Disciplinary & community initiatives
• List of repositories & data centres: http://datacite.org/repolist

Data Registries (metadata)
• CERIF for Datasets: develop an extension to the research information standard
  http://cerif4datasets.wordpress.com
• RADAR: Researching a Data Asset Registry
  http://radar.blogs.edina.ac.uk
• Can we learn lessons from overseas?
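To make the idea of a data registry concrete, here is a minimal sketch of what one registry entry might hold. It is not the CERIF for Datasets schema or the RADAR design. Every field and function name is a hypothetical choice made for illustration, picked to cover knowing what is held, where it lives, on what terms it can be accessed, and which access requests have been received.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetRecord:
    """One entry in a hypothetical institutional data registry."""
    title: str
    creator: str
    department: str
    location: str                  # where the data is held, e.g. a repository URL
    access_terms: str              # how and on what terms the data may be accessed
    doi: Optional[str] = None      # persistent identifier, if one has been assigned
    access_requests: List[date] = field(default_factory=list)

    def log_access_request(self, when: date) -> None:
        """Record the date of a third-party access request."""
        self.access_requests.append(when)


def holdings_without_doi(records: List[DatasetRecord]) -> List[str]:
    """Titles of registered datasets that still lack a DOI."""
    return [r.title for r in records if r.doi is None]
```

A flat record like this is only a starting point. A real registry would need to agree its fields with existing research information systems.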




Guidance and training
• Collate guidance
  www.gla.ac.uk/datamanagement
• Online training
  http://datalib.edina.ac.uk/mantra and others from JISC RDMTrain
• Embed into curriculum via Doctoral Training Centres, e.g. Research360@Bath
  http://blogs.bath.ac.uk/research360

Disciplinary Training (RDMTrain)
 • The training materials they created are mapped to the
   lifecycle model below.
 • The projects were:
      • CAIRO – performing arts (Uni of Bristol)
      • DataTrain – archaeology and social anthropology (Uni of Cambridge)
      • DATUM for Health – health sciences (Northumbria Uni)
      • DMTpsych – psychology (Uni of York, Sheffield Unis)
      • Research Data MANTRA – geosciences, social sciences (Uni of Edinburgh)




Existing Research Data Policies
 • University of Oxford
   Statement of commitment until infrastructure is in place
 • University of Edinburgh
   10 short principles, described as ‘aspirational’
 • University of Northampton
   brief policy on RCUK Code, detailing procedures & support
 • University of Hertfordshire
   part of wider data management policy – guide as appendix
 • University of East London
   newest policy, based on Edinburgh’s
 www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies

How are Others Developing Policies?
 • Towards a RDM policy at Manchester
    Reviewed existing policies, collated funder
    requirements, drafted policy for discussion

 • Driving institutional data policy at Southampton
    Draft policy and series of user guides put forward to
    University Advisory/Executive groups for ratification

 www.dcc.ac.uk/news/developing-institutional-data-policies-trend-2012


JISC MRD Leeds Workshop
 • Programme workshop on institutional research data management
   policy development and implementation
 • Themes/thoughts:
      • Institutions are all still at different stages with their research data management
        policies.
      • Having a policy in place without any real buy-in from staff can be more harmful
        over time.
      • Consider whether your policy is aspirational or a working document.
      • Policy and infrastructure need to evolve together.
      • Consider the other policies – both internal and external – with which your new
        research data management policy should work in concert.
      • Retain awareness of the different roles and legislation for research data and
        administrative data.
      • Try to avoid taking the view that researchers will automatically resist
        implementation of a research data management policy.
      http://bit.ly/jiscwestwood

Slide courtesy of Robin Rice, University of Edinburgh
Lots to think about and develop,
             so where to start?




Make a plan!

       “EPSRC expects all those it funds to have
   developed a clear roadmap to align their policies
    and processes with EPSRC’s expectations by 1st
    May 2012, and to be fully compliant with these
           expectations by 1st May 2015.”


www.epsrc.ac.uk/about/standards/researchdata/Pages/impact.aspx
What is a Roadmap?
 • a plan made up of stages

 • a guideline which it is necessary to follow during
   the entire project

 • a visual showing the key streams of activity that
   a person, team, or organisation needs to
   complete to achieve set objectives, usually
   keyed to a specific timeline

Key Elements in EPSRC Requirements
 • Ensure published research papers state how and on what terms any
   supporting research data may be accessed (ii)

 • Have policies and processes to maintain effective internal awareness
   of research data holdings and third-party access requests (iii)

 • Publish appropriately structured metadata (normally within 12
   months of the data being generated) including DOIs (v)

 • Securely preserve research data for a minimum of 10 years from end
   of embargo or last third-party access request (vii)

 • Ensure effective data curation throughout the full data lifecycle (viii)
 www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
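The two time limits above (metadata normally within 12 months, preservation for at least 10 years) can be worked out per dataset. The sketch below is an interpretation, not EPSRC's own wording. The function names are hypothetical, "12 months" is read as the same calendar day one year later, and the 10-year clock is assumed to start from whichever is later of the embargo end and the most recent third-party access request.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only raised for 29 February when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def metadata_due(generated: date) -> date:
    """Latest normal date for publishing metadata (12 months after generation)."""
    return add_years(generated, 1)


def preserve_until(embargo_end: date, last_access_request: Optional[date] = None) -> date:
    """Earliest date the 10-year minimum preservation period could end."""
    start = embargo_end
    if last_access_request is not None and last_access_request > embargo_end:
        start = last_access_request
    return add_years(start, 10)
```

Because each new access request can restart the clock under this reading, the preservation date needs recalculating whenever a request is logged.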
What is the EPSRC Looking For?
 • Know what you hold – publish metadata – record
   access requests
 • Link publications and data
 • Share data whenever possible
 • Curate and preserve valuable data

 The same as other funders (i.e. good research
 practice), so think broadly when you develop your
 strategy – where does it fit in?
RDM Infrastructure
Diagram elements: RDM Strategy (includes EPSRC Roadmap); Institutional Policy; Guidelines; DMP (departmental); DMP (project)

•   Institutional policy – This is what the institution is committed to doing.
•   Strategy/action plan/roadmap – This is the institution’s response to
    expectations placed on it by research councils etc.
•   Guidelines – This is what the institution expects of staff (plus the services
    available and where responsibilities lie).
•   Data management plans – This is what staff are going to do at a departmental or
    project level.
Roles & Responsibilities




Questions?

 • Slides are available from the DCC Roadshow web site




Exercise: Developing a Roadmap for RDM

Think about the potential components of an RDM service

Based on the strengths/weaknesses you identified in the quiz:
• Draft a list of actions needed at your institution

• Attempt to prioritise your list and pencil in timeframes (consider quick
  wins!)

• Decide who needs to be involved to make this happen

• Discuss how to make these plans public
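One way to capture the output of this exercise is as a small structured list that can be sorted so quick wins surface first. The sketch below is just one possible layout. The field names, the 1 to 5 scoring scale and the quick-win thresholds are all assumptions to adapt locally, not part of any DCC or EPSRC guidance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    description: str
    owner: str       # who needs to be involved
    impact: int      # 1 (low) to 5 (high), judged locally
    effort: int      # 1 (low) to 5 (high), judged locally
    target: str      # pencilled-in timeframe, e.g. "2012-Q4"

    @property
    def quick_win(self) -> bool:
        # Assumed threshold: high impact for little effort.
        return self.impact >= 4 and self.effort <= 2


def prioritise(actions: List[Action]) -> List[Action]:
    """Quick wins first, then highest impact, then lowest effort."""
    return sorted(actions, key=lambda a: (not a.quick_win, -a.impact, a.effort))


if __name__ == "__main__":
    # Placeholder actions, invented for illustration only.
    roadmap = [
        Action("Draft institutional RDM policy", "Research office", 5, 4, "2013-Q2"),
        Action("Collate existing guidance on one web page", "Library", 4, 1, "2012-Q3"),
        Action("Pilot a data catalogue", "IT services", 3, 5, "2014-Q1"),
    ]
    for a in prioritise(roadmap):
        print(a.target, a.owner, a.description, "(quick win)" if a.quick_win else "")
```

Keeping the owner and timeframe on each action also gives a natural starting point for the published version of the roadmap.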



Editor's Notes

  • #2 This talk pulls together the lessons from the DCC roadshow to consider how to develop policies and services for Research Data Management (RDM).
  • #3 We’ll cover who is responsible for RDM and what the potential components of a research data service are. The main part of the talk will focus on how other universities are addressing certain aspects, to see where you can learn lessons. At the end we’ll touch on developing roadmaps in light of the EPSRC policy requirement and do an exercise on this.
  • #4 There are lots of stakeholders with varied roles, both within organisations and external to them. Requirements and support can be external (e.g. from funders, publishers, data centres) but in terms of developing infrastructure, research organisations are taking a central role. Ensuring clarity of responsibility across stakeholders and bringing people together is key.
  • #5 *Animated slide – components come in separately* This isn’t definitive. It’s just an idea of the building blocks involved and how they might be put together.
      - Storage is often thought of first. It should be properly backed up, with appropriate access controls and the ability to access it from anywhere.
      - We also need an appropriate environment for research (instruments, hardware, software, VREs), plus tools and systems, e.g. for grants.
      - Aside from current work environments, we also need to consider facilities for archiving to preserve and share data.
      - There’s an inherent need to access/share data, so we need standards, tools and approaches for metadata across the lifecycle.
      - We have the basics of a system, but none of this works without people to keep things running and provide guidance and training.
      - We also need policies to provide overarching governance.
      - And to ensure uptake and maintenance you need buy-in across the board, incentives and financial backing.
    We’ll now consider how different institutions are addressing certain aspects of this.
  • #6 The data.bris team gave a case study at the DCC Roadshow in Cardiff in December 2011. The details here are taken from that talk. They are building research data services around their High Performance Computing facility to provide all researchers with adequate storage for their research data. The key thing to note is the cost model: they provide a clear, up-front cost so additional storage can be written into proposals. Other universities (Oxford, Leicester) have produced similar figures.
  • #7 A few institutions already run data repositories, e.g. Edinburgh and Cambridge (both DSpace). Others are piloting them, e.g. Essex and Southampton (extending existing ePrints repositories as part of the JISC MRD02 programme) and Databank at Oxford. The key thing is that none of these services intends to replace established data services. Where there are more appropriate disciplinary data centres, for example, the data should be submitted there.
  • #8 There are many external services – dedicated data centres supported by research funders and various structured databases and community initiatives. The list of data centres provided by DataCite is a useful reference for institutions and researchers to identify the most appropriate place of deposit.
  • #9 This is the area most in its infancy. No institutions appear to have a handle on exactly what research data they hold in order to systematically register & manage data, and expose appropriate metadata to facilitate sharing. However, several UK institutions have flagged a desire to develop institutional data catalogues, so models are likely to emerge. A pertinent project to look at is C4D, which is developing an extension to the CERIF standard to record information on research data. Research Data Australia – a discovery service for research data from Australian universities supported by ANDS – is a model the DCC is looking at to see how a similar service could be provided in the UK.
  • #10 There are many examples of guidance and training – most are Creative Commons licensed so you can repurpose them. At the University of Glasgow, the Incremental project pulled together details of existing support to raise awareness of services that tended to be missed or misunderstood. MANTRA provided excellent online training modules, as did other JISC RDMTrain projects. A current trend is to embed RDM into existing curricula, e.g. core PhD skills courses. The Research360 project is collaborating with a Doctoral Training Centre and reflecting on this in its blog.
  • #12 There are four institutional RDM policies at present (Feb 2012). These differ in approach: Oxford University doesn’t have a policy per se. They collaborated with the University of Melbourne on the EIDCSR project (c.2009) and realised that implementation is a stumbling block so first introduced a Statement of Commitment until infrastructure was developed. A proper policy is being developed on the DaMaRO project. The University of Edinburgh’s policy is exemplary and seems to be the biggest influence on policy development at other institutions. It was written by an external consultant (Chris Rusbridge) and is described as aspirational as they know there’s some way to go to make it a reality. The University of Northampton reiterates the RCUK Code as its guiding principle and usefully provides guidance on procedures and support to explain how the policy should be implemented. The University of Hertfordshire has RDM requirements as part of a wider data management policy. The language/style is more legal, however an appendix provides much more practical guidance on data management.
  • #13 Other universities are sharing lessons about how they are developing policy. The University of Manchester has released a document which explains how they’ve reviewed existing policies and funder requirements and what they’ve taken from each. The draft policy is included in this. The University of Southampton has blogged about developing its policy but has not yet shared the text. They’re developing a series of user guides to accompany the policy and usefully outline the ratification process, as they have good experience of this from passing open access policies.
  • #15 Many thanks to Robin Rice for this slide, which she presented at the JISC MRD launch event. She spoke about the importance of knowing what your drivers are and getting lots of people involved to help develop your policy. There are also some practical decisions to make, e.g. in terms of style and who will write the policy. The really key thing is to know your current situation (service gap analysis) and where you want to be (postcard from the future) so you can plan the transition between these stages.
  • #17 Uppermost on many minds at the moment is the requirement to develop a roadmap in response to the EPSRC.
  • #18 A question the DCC is often asked is ‘What is a roadmap?’ Here are some basic definitions found online. The key thing isn’t the outcome (i.e. the plan) but rather the process of getting there: taking stock of your current position and realising what you need to do to be in a position to comply with the EPSRC policy in 3 years, so you can plan for that activity.
  • #19 This slide pulls out some of the key EPSRC policy requirements which have an impact on service development. You need to know what you hold so you can systematically manage data, particularly access requests (iii). You need to make others aware of data holdings by publishing appropriate metadata (ii & v). And you need to proactively manage data throughout its lifecycle, i.e. for 10+ years (vii & viii). Other requirements cover specific points about implementation, i.e. what metadata to include, where data can be stored (not in a jurisdiction with lower legal safeguards), expectations for curation, and how to fund all of this work.
  • #24 In the exercise, please consider the potential components of an RDM service which we’ve covered here and the strengths and weaknesses you identified earlier in the CARDIO quiz to decide what you need to do, when and how.