Because good research needs good data
Supporting Research Data Management at the University of Stirling
Graham Pryor and Martin Donnelly, Digital Curation Centre
27 April 2012
This work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License
The Digital Curation Centre is
• a consortium comprising units from the Universities of Bath (UKOLN), Edinburgh (DCC Centre) and Glasgow (HATII)
• launched 1st March 2004 as a national centre for solving challenges in digital curation that could not be tackled by any single institution or discipline
• funded by JISC
• with additional HEFCE funding from 2011 for
  – the provision of support to national cloud services
  – targeted institutional development
The DCC Mission Helping to build capacity, capability and skills in data management and curation across the UK’s higher education research community – DCC Phase 3 Business Plan
DCC institutional stakeholders
University managers
Researchers
Research support staff with a role to play in data management, particularly those from
• University libraries
• IT services
• The research and innovation office
• Digital repositories
Why manage research data?
The impact of e-Science and the global network
• “Research data is a form of infrastructure, the basis for data intensive research across many domains” – EC Riding the Wave report, 2010
• “Funders expect research to be international in scope. A third of all articles published are internationally collaborative” – Royal Society, 2011
The governmental and funder imperative
• “Publicly-funded research data must be made available for secondary scientific research” – ESRC research data policy
Why manage research data?
The researcher incentive
• “By making their data available via licensed platforms researchers stand to improve their status as researchers through the mandatory citing and attribution of their original work” – Mark Hahnel, FigShare, IDCC 2011
The same demanding, sometimes competing community of perspectives that the Digital Curation Centre was created to unravel…
Where is the data in research? The six datacentric phases of the research lifecycle
Three perspectives
Scale and complexity
– Volume and pace
– Infrastructure
– Open science
Policy
– Funders
– Institutions
– Ethics & IP
Management
– Storage
– Incentives
– Costs & Sustainability
http://www.nonsolotigullio.com/effettiottici/images/escher.jpg/
The data deluge “Surfing the Tsunami” Science: 11 February 2011
Challenges of scale and complexity – transformation and globalisation
“For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open”
http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/publications.html#november-2009
Open to all? Case studies of openness in research
Choices are made according to context, with degrees of openness reached according to:
• The kinds of data to be made available
• The stage in the research process
• The groups to whom data will be made available
• On what terms and conditions it will be provided
Default position of most:
• YES to protocols, software, analysis tools, methods and techniques
• NO to making research data content freely available to everyone
After all, where is the incentive?
Angus Whyte, RIN/NESTA, 2010
“While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice. ..... using these data to publish results before anyone else is the primary way of gaining prestige in nearly all disciplines.”
“Data sharing was more readily discussed by early career researchers.”
INCREMENTAL Project
Rules and regulations…
Compliance
• Data Protection Act 1998: Rights, Exemptions, Enforcement
• Freedom of Information Act 2000: Climategate, Tree Rings, Tobacco and… (what’s next?)
• Computer Misuse Act 1990: etc. etc. etc.
Policy• Public good• Preservation• Discovery• Confidentiality• First use• Recognition• Public funding
RCUK Policy and Code of Conduct on the Governance of Good Research Conduct (updated Oct 2011)
UNACCEPTABLE RESEARCH CONDUCT includes mismanagement or inadequate preservation of data and/or primary materials, including failure to:
• keep clear and accurate records of the research procedures followed and the results obtained, including interim results;
• hold records securely in paper or electronic form;
• make relevant primary data and research evidence accessible to others for reasonable periods after the completion of the research: data should normally be preserved and accessible for 10 yrs (in some cases 20 yrs or longer);
• manage data according to the research funder’s data policy and all relevant legislation;
• wherever possible, deposit data permanently within a national collection.
Responsibility for proper management and preservation of data and primary materials is shared between the researcher and the research organisation.
EPSRC’s nine expectations and a roadmap - implications for HEIs
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
Regulation, regulation…
…addressing where European copyright and database law poses flaws and obstacles to the access to research data
Intellectual Property Rights and Digital Preservation, 21.11.2011 at the Clifton Hill House, Bristol University
“a poor fit between technology, processes and regulations constrains preservation actions and significantly inhibits the benefits which long-term access ought to deliver”
Management – infrastructure and data storage challenges...
• Scalable
• Cost-effective (rent on-demand)
• Secure (privacy and IPR)
• Robust and resilient
• Low entry barrier / ease-of-use
• Has data-handling / transfer / analysis capability
Cloud services?
The case for cloud computing in genome informatics. Lincoln D Stein, May 2010
“Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss”
Incremental Project Report, June 2010
http://www.flickr.com/photos/mattimattila/3003324844/
Management - incentivisation, recognition and reward
DCC Institutional Support: Tools and Services
Martin Donnelly, Digital Curation Centre, University of Edinburgh
University of Stirling, 27 April 2012
Institutional Engagements
With funding from HEFCE we’re:
• Working intensively with 18 HEIs to increase RDM capability
  – 60 days of effort per HEI drawn from a mix of DCC staff
  – Deploy DCC & external tools, approaches & best practice
• Support varies based on what each institution wants/needs
• Lessons & examples to be shared with the community
www.dcc.ac.uk/community/institutional-engagements
Some current IE activities
• Assessing needs
• Piloting tools, e.g. DataFlow
• RDM roadmaps
• Policy development
• Policy implementation
Support offered by the DCC
Assess needs
• DAF & CARDIO assessments
• Workflow assessment
• Institutional data catalogues
Develop support and services
• Pilot RDM tools
• DCC support team
• Guidance and training
• RDM policy development
• Customised Data Management Plans
Make the case
• Advocacy to senior management
…and support policy implementation
DATA MANAGEMENT STRATEGY (Research and Admin) Five components: • Policy • Advocacy • Planning • Tools • Training
Your Data as Assets: DAF
• What are the characteristics of research data assets?
  – Number?
  – Scale?
  – Complexity?
  – Dependencies?
  – Liabilities?
• Why do researchers act the way they do with respect to data?
• What do they need to do research?
IN BRIEF
The Data Asset Framework provides a methodology and online tool to identify research data assets and find out how they are being managed. This information will enable institutions to develop a data strategy so their assets are preserved and remain accessible in the long term. It is usually applied at research group / department level to ensure the scope is manageable.
URL: http://www.data-audit.eu
Data Management Planning: DMP Online• A growing requirement from funders, publishers and HEIs, in the UK and internationally• Supportive of good research practice, according to RCUK• A cross-cutting activity involving multiple stakeholder types (researchers, librarians, IT managers, support staff)
IN BRIEF
DMP Online is the DCC’s web-based data management planning tool. It allows you to build and edit DMPs according to the requirements of the major UK funders.
The tool also contains helpful guidance and links for researchers and other data professionals. The structure of the tool is based on the DCC’s Checklist for a Data Management Plan.
URL: http://www.dcc.ac.uk/dmponline
Capacity Assessment and Building: CARDIO• How well does an institution (or department, School, etc) manage its data?• Depends on: – Finances – Technology – Policy management – Organisational will• Demands acknowledgement of many perspectives
IN BRIEF
An online tool which helps departments or research groups to identify and communicate their current data management capabilities, and subsequently identify coordinated pathways for future enhancement via a dedicated knowledge base.
CARDIO emphasises a collaborative, consensus-driven approach, and enables benchmarking with other groups and institutions.
URL: http://cardio.dcc.ac.uk/
Risk Management: DRAMBORA
• A variety of risk factors, both internal and external, affect the management of digital objects such as research data
• Risks can be tangible (fire/flood) or intangible (accidental data loss leading to reputational impact)
• They may exist in isolation, or lead to other risks if not adequately managed
IN BRIEF
DRAMBORA is an audit methodology and tool for identifying and planning for the management of risks which may threaten the availability and/or usability of content in a digital repository or archive.
URL: http://www.repositoryaudit.eu
DCC Services• Policy• Strategy• Training• Other services…
Policy (i)
The DCC has a number of guidance resources related to research data policy. We can guide institutions on their requirements to manage/share data, and offer practical steps to help them develop data policies by:
- Providing templates and examples to demonstrate what aspects could be incorporated into a data policy;
- Coordinating / contributing to meetings of relevant stakeholders to ensure all activities and perspectives are addressed;
- Reviewing and feeding back on draft policies;
- Assisting with communications to launch and implement the policy.
Policy (ii)
Benefits of developing a data policy:
- Compliance with funder guidelines, e.g. the EPSRC expectation that HEIs have an RDM roadmap in place by May 2012, and be fully compliant by May 2015;
- Assuring the good conduct of research in line with Research Integrity guidelines (see RCUK & UKRIO docs);
- Clarity for researchers and demonstrable institutional commitment to RDM;
- The prestige of joining a small but growing group of leading institutions with a data policy: http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies
Strategy (i)
We offer a half-day workshop in which key stakeholders from an institution (e.g. librarians, senior IT staff, research administration, repository staff, researchers, etc) convene to discuss and develop an institutional strategy for RDM.
Benefits:
- Coherence across service providers and agreed direction for RDM services;
- Ability to reference strategy / commitment to RDM (the University of Oxford policy may be a useful example of this - http://www.admin.ox.ac.uk/rdm);
- A move towards more efficient management of data.
Strategy (ii)
Through practical breakout sessions, senior DCC staff can lead and mediate discussion to help the institution determine its priorities and define practical next steps. These might include the development of infrastructure (e.g. data repositories), new services (e.g. DMP support), policy development, improved guidance or data management training provision.
Suggested actions will depend on gaps/areas for improvement as perceived by the institution.
Training (i)
We offer a variety of training courses:
- DC101 introduction to data management
- Tools of the Trade courses which give practical overviews and hands-on exercises using DCC tools
- Train-the-Trainer, which equips information professionals to teach RDM courses.
We also organise regional data management roadshow events which can incorporate a training element.
Generic training materials are available online, and hardcopy packs can be produced.
Training (ii)
The DCC can:
- Run courses, tailoring content to institutional needs;
- Assist in the development of online learning materials (screencasts, audio-synced slides);
- Develop resources such as guidance documents, case studies and manuals.
Key benefits of training provision are:
- Improved data management capacity;
- The opportunity to profile and raise awareness of institutional support services.
Other services...
• CARDIO: Used at research group or department level to assess activity and data management infrastructure and contribute to an institution-wide view
• Data Asset Framework: DAF is a structured mechanism used to identify what data exists and understand how research data are being managed and shared
• Customised DMP: We can work with you to develop an institution-specific instance of DMP Online for developing data management plans that fit funder requirements before and after an award of grant
• Policy development: We can assist in the development of institutional policy
• Workflow assessment: Using tested methodologies we can analyse current research data workflows
• Training: We can train people in the use of many of the above tools and in generic skills such as data quality assessment
• Costing: We can assist with the development of costing and pricing for data management services
• Risk management: Working with you to identify risks in current or planned research data management practice, we will make recommendations on mitigation and the elimination of those risks
• Institutional data catalogues: We can recommend options for exposing metadata about your research data via CRIS systems, repositories, or a mix of these
Recap: support offered by the DCC
Assess needs
• DAF & CARDIO assessments
• Workflow assessment
• Institutional data catalogues
Develop support and services
• Pilot RDM tools
• DCC support team
• Guidance and training
• RDM policy development
• Customised Data Management Plans
Make the case
• Advocacy with senior management
…and support policy implementation
Practicalities• University Modernisation Fund provides resource for 18 “institutional engagements” between DCC and HEIs• Up to 60 days of effort available per institution, between now and March 2013• Institution agrees a schedule of work with the DCC, and each assigns a primary contact / programme manager