EDINA & Data Library@Edinburgh

Stuart Macdonald
Visiting CISER Data Services Librarian

srm262@cornell.edu
EDINA & Data Library (EDL)
• EDINA and University Data Library (EDL) together
are a division within Information Services of the
University of Edinburgh.
• EDINA is a JISC-funded National Data Centre
providing national online resources for education
and research.
• The Data Library assists Edinburgh University
users in the discovery, access, use and
management of research datasets.
EDINA National Data Centre
• Mission statement: “..to enhance the productivity of research,
learning and teaching in UK higher and further education ..”

• Networked access to a range of online resources for UK
FE and HE
• Services are free at the point of use for staff and
students in learning, teaching and research, through
institutional subscription
• Focus is on service, but EDINA also undertakes R&D (projects →
services)
• Delivers about 20 online services
• ~10 major projects (incl. services in development)
• Employs about 80 staff (Edinburgh & St Helens)
Data Library services and projects
• Data Library & Consultancy
• Edinburgh DataShare
• JISC-funded projects
– DISC-UK DataShare (2007-2009)
– Data Audit Framework Implementation (2008)
– Research Data MANTRA (2010-2011)
– AddressingHistory
– STEEV
– UK Repository Net +
– AQMeN
Data Library & Consultancy
• finding …
• accessing …
• using …
• teaching …
• managing

Building relationships with researchers via postgraduate
teaching activities, research support projects, IS Skills
workshops, Research Data Management training and
through traditional reference interviews.
Edinburgh DataShare was built as an output of
the JISC-funded DISC-UK DataShare project (2007-2009)
Edinburgh DataShare
An online institutional repository of multi-disciplinary
research datasets produced at the University of
Edinburgh, hosted by the Data Library
Researchers producing research data associated with a
publication, or which has potential use for other
researchers, can upload their dataset for sharing and
safekeeping. A persistent identifier and suggested citation
will be provided.
DataShare is a customised DSpace instance with a
selection of standards-compliant metadata fields useful
for discovery of datasets through Google and other
search engines, and for metadata harvesting via OAI-PMH.
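As a rough illustration of how a harvester might ask a repository such as DataShare for its metadata, the sketch below builds OAI-PMH request URLs using only the Python standard library. The base URL is a placeholder and not DataShare's actual endpoint, and the helper name `build_oai_request` is made up for this example. `verb`, `metadataPrefix` and `from` are standard OAI-PMH request arguments, and `oai_dc` is the unqualified Dublin Core format that OAI-PMH repositories are required to support.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a placeholder, not the real DataShare address.
BASE_URL = "https://repository.example.org/oai/request"


def build_oai_request(base_url, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            # 'from' is a Python keyword, so callers pass 'from_';
            # the trailing underscore is stripped here.
            params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


# Ask the repository to describe itself.
identify_url = build_oai_request(BASE_URL, "Identify")

# Ask for Dublin Core records added or changed since a given date.
records_url = build_oai_request(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", from_="2013-01-01"
)

print(identify_url)
print(records_url)
```

A real harvester would then fetch these URLs, parse the XML responses, and follow any `resumptionToken` the repository returns in order to page through large result sets.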
Research Data MANTRA
Partnership between:
Data Library & Institute for Academic Development
Funded by JISC Managing Research Data Programme (Sept.
2010 – Aug. 2011)
Grounded in three disciplinary contexts: social science,
clinical psychology and geoscience
Aim was to develop online interactive open learning resources
for PhD students and early career researchers that will:
• Raise awareness of the key issues related to research data
management
• Contribute to culture change & good research practice
Online learning module
Eight units with activities, scenarios and videos:
• Research data explained
• Data management plans
• Organising data
• File formats and transformation
• Documentation and metadata
• Storage and security
• Data protection, rights and access
• Preservation, sharing and licensing

Four data handling practicals: SPSS, NVivo, R, ArcGIS
Xerte Online Toolkits – University of Nottingham
• CC licence to allow manipulation of content for re-use with attribution
• Portable content in open standard formats (e.g. SCORM)
RDM Roadmap@Edinburgh
- an institutional approach
Background
Edinburgh Data Audit Framework (DAF) Implementation Project
(May – Dec 2008)

A JISC-funded pilot project produced 6 case studies from research
units across the University, identifying research data assets and
assessing their management using the DAF methodology developed
by the Digital Curation Centre.
4 main outcomes:
• Develop online RDM guidance
• Develop RDM training
• Develop university research data management policy
• Develop services & support for RDM (in partnership with IS)
Drivers
• RDM policy containing 10 aspirational statements affirming both the
researchers’ and the University’s responsibilities, e.g.
– PI responsible for RDM
– University will provide RDM support and training
– University will provide RDM services (such as back-up, storage,
deposit)
– Data retained elsewhere will be registered with the University
– RDM plans must ensure that data are available for access and re-use
under appropriate safeguards

• RCUK’s Common Principles on Data Policy state:
– Publicly funded research data are a public good produced in the
public interest
– Data with long-term value should be preserved & accessible
for re-use

• UK research funders have all issued research data management
policies demanding that institutions take care of and preserve data for
re-use
Committees
An RDM Policy Implementation Committee was set up by the
Vice Principal Knowledge Management, Professor Jeff Haywood
to implement recommendations:
• Membership from across IS
• Charged with delivering services that will meet RDM policy
objectives
• Iterate with researchers to ensure services meet the needs of
researchers

The Vice Principal also established a Steering Committee led by
Prof. Peter Clarke comprising members of Research Committee
from the 3 colleges, IS, DCC and Edinburgh Research and
Innovation (ERI).
Their role is to:
• Provide oversight to the activity of the Implementation
Committee
• Ensure services meet researcher requirements without
harming research competitiveness
Roadmap
• EPSRC expects the institutions it funds to have developed a roadmap
aligned with EPSRC’s RDM expectations by 1st May 2012, and to be
fully compliant with these expectations by 1st May 2015.
• The Executive Summary of the Information Services Plan, 2012-13
lists “Research data management & storage – policies, training,
curation, preservation, baseline 0.5Tb/user” as a major IS-led
project for the year.
• The Edinburgh RDM roadmap was set out as a high-level plan for its
delivery, noting objectives, outcomes, deliverables and target dates
for an 18-month period, consisting of a Phase 0 planning period (May
– Sept. 2013) followed by 3 six-month phases up to April 2015.
Costs!
• The roadmap follows up a business case, submitted to the University IT
Committee in Summer 2012 by Jeff Haywood, which estimated one-off
and recurrent costs.
• In May 2013 the Vice Principal announced that funding would be in the
order of £2 million, split between infrastructure and RDM support and
technical personnel.
• The Roadmap does not yet include itemised costs. Details are
currently being discussed by the Implementation Committee.
• Freemium model
Support and services for planning activities that are performed
before research data is collected or created – addresses policy
points 3, 4
Services to include:
• Tailored DMP assistance for PIs submitting research proposals
• Customised Data Management Planning tool
Facilities to store data that is actively used in current research
activities, to provide access to that storage, and tools to assist in
working with the data – addresses policy points 5, 8

Example services might include:
• Accessible cross-platform Data Store
• File Access Services (e.g. Dropbox-like)
• Data Synchronisation (e.g. mobile devices)
• Web-based Collaboration tools
• Structured Data Version Control (WebDAV)
• Central Database service
Tools and services to aid in the description, deposit, and on-going
management of completed research data outputs – addresses policy
points 6, 7, 9, 10
Services to include:
• DataShare - repository for enhanced deposit and discovery of open
data collections generated by University researchers
• Data archive service (Vault) - service to ensure integrity and long term
retention of golden copy research data
• Data asset registry – a registry of research data assets (recording
location and description of data)
• PURE Current Research Information System (central system holding
General consultancy and support service throughout the research
process – addresses policy points 1, 2, 4

Example services might include:
• Tailored awareness and advocacy activities
• Online Data Management guidance
• Training (online / F-2-F)
• Data Management consultancy
Potential pitfalls a University could fall into:
• To wrongly assume there can be a one-size-fits-all requirements
checklist
– Requirements are too diverse and, in detail, incompatible across
disciplines / sub-disciplines (size, volume, format, time)
• To wrongly assume that all data is Edinburgh-centric and needs to be
stored in a university repository
– For much large-scale science this is inappropriate & impossible
• To wrongly assume that all data is shareable
– Raw data – probably needs to be preserved, but maybe not as open
access, as it would be practically unusable, e.g. 100 Pbytes from the SKA
telescope
– Processed data – probably more useful to make open access
Current Activity
• A proactive programme to raise awareness of University and
funder policies and data management planning across the 3
Colleges
• DataShare Pilots - 3 sets of “communities and collections” set up to
determine how the data repository meets the needs of Edinburgh
researchers
• RDM Training for Liaison Librarians
• Data Asset Register
• Overall Architecture & Workflow (storage, hardware, security…)
• RDM Blog: http://datablog.is.ed.ac.uk
THANK YOU!
URLs:
Data Library services: http://www.ed.ac.uk/is/data-library
EDINA: http://edina.ac.uk/
Research data management guidance pages:
http://www.ed.ac.uk/is/research-data-management
Edinburgh University Data Policy:
http://www.ed.ac.uk/is/research-data-policy
Edinburgh Data Audit Framework (DAF) Implementation:
http://ie-repository.jisc.ac.uk/283/
Research Data MANTRA course: http://datalib.edina.ac.uk/mantra
Edinburgh University RDM Roadmap: http://tinyurl.com/km99a9l


Editor's Notes

  • #2 All urls and links will be available on the last slide
  • #3 25 years ago disk storage was expensive. Researchers interested in working with data came together to petition the PLU and the University’s Library, wanting a university-wide provision for files that were too large to be stored on individual computing accounts. Early holdings were research data from the universities of Edinburgh, Glasgow and Strathclyde.
  • #4 UKBORDERS Digimap Collection Go-Geo! Agcensus Moving pictures and sound services - EIG, newsFilm Online Plus A&I databases The Depot HILT GetRef LOCKSS PePRS
  • #6 Primarily social sciences but not exclusively so: large-scale government surveys (micro data), macro-economic time series data (country-level data), election studies, geospatial data, financial datasets, population census data. Free on internet / subscription / through national data centres/archives / resource discovery portals. Registration / authorisation and authentication / special conditions / budget to pay for data. SPSS, STATS, SAS, R, ArcGIS – interpret documentation/codebooks, merge and match users’ data with other data (via look-up tables), subset data. Data Catalogue.
  • #9 Funded by JISC as part of its UK programme, Managing Research Data to develop online learning materials to assist researchers manage their digital assets. IAD – set up to deliver training and development for postgraduate students and staff – via online course, Virtual Learning Environments, transferable skills training
  • #12 Training for postgraduates and early career researchers. These were the School of Divinity, School of History, Classics and Archaeology, School of Biomedical Sciences, School of Molecular and Clinical Medicine, School of Physics and Astronomy. Also, the School of Geosciences.
  • #13 Publicly funded research data are a public good produced in the public interest; data with long-term value should be preserved & accessible for re-use; metadata for searching and description/context; embargo for privileged use by researcher; data citation/attribution
  • #14 Funders have policies; responsibilities fall to the university as well as the researcher. Researchers are mobile. Institution and researcher must work together and define the responsibilities. Awareness raising within the university of practicalities.
  • #15 1.6PB storage – allocated to Schools / research institutes / research groups
  • #19 To support most research use-cases, provide off-site back-up ECDF – cross-platform storage, HPC file storage, back-up services, version control & software source code store
  • #20 Golden Copy data
  • #21 IT Business Support (IT Consultancy) Library Academic Support (Liaison Librarians)
  • #23 School of Psychology Philosophy and Language Sciences Roslin Institute (human and animal genetics) Centre for Speech Technology Research