Meeting the Research Data Challenge
12th October 2011, 12.00 – 13.00
#jiscmrd #jiscwebinar
10/11/2011 JISC Webinar slide 1
Meeting the Research Data Challenge
Sarah Porter
Presentation outline
The role of JISC in research support
The data challenge
Drivers for good management of research data
Why JISC? JISC’s role in research support
National infrastructure services for research, such as the JANET network, data centres to host published resources, and repository infrastructure
– This provision complements that of other stakeholders: research funders (Research Councils, Funding Councils) and data hosts (e.g. National Data Centres)
JISC supports universities and colleges to make effective and efficient use of technology, both in research and in the management of research
Key themes:
Increasing the impact and visibility of research
Increasing research competitiveness
Management of research information
Collaboration with business and the community
Improved management of research data.
The challenge of data
– Volume
– Diversity (what is data, anyway?)
– The ‘long tail’
– Drivers for data management are not well understood: a complex picture due to the range of funders’ policies and other policies at multiple levels (European, UK, each Research Council, each institution)
– Good management practice is not yet well understood
  • so it is not embedded into research practice
– Institutional roles and responsibilities may be unclear
– Responsibility for meeting costs is not yet established.
Drivers to improve research data management
Considerations for research integrity
Research funder policies
Freedom of Information / Environmental Information Regulations
Benefits of data reuse and improved research data management (including the Research Excellence Framework)
Drivers: Research Integrity
UK Research Integrity Office, Code of Practice for Research: Promoting good practice and preventing misconduct, September 2009
Data management planning is an essential part of research design [3.4.1.c; also 3.12.6]
Section 3.12 covers collection AND RETENTION of research data.
Organisations and researchers should ensure that research data relating to publications is available for discussion with other researchers, subject to any existing agreements on confidentiality. [3.12.1]
Organisations should have in place procedures, resources (including physical space) and administrative support to assist researchers in the accurate and efficient collection of data and its storage in a secure and accessible form. [3.12.5]
Due regard to privacy, confidentiality and ethical issues.
Research integrity requires addressing these issues in order to make data as ‘shareable’ as possible.
Drivers: Funders’ Policies
Research funders’ policies form an important part of the research data ecology. In common with international developments, requirements are becoming increasingly exacting.
Many policy statements reference the OECD Principles and Guidelines for Access to Research Data from Public Funding: http://www.oecd.org/dataoecd/9/61/38500813.pdf
NSF recently added the requirement of a data management plan to grant proposals: http://www.arl.org/rtl/eresearch/escien/nsf/index.shtml
Health Research Funders’ ‘Joint Statement of Purpose: Sharing research data to improve public health’: http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-and-epidemiology/WTDV030690.htm
– making research data sets available to investigators beyond the original research team in a timely and responsible manner, subject to appropriate safeguards, will generate three key benefits:
  • faster progress in improving health;
  • better value for money;
  • higher quality science.
Joint RCUK Policy
RCUK Common Principles on Data Policy: http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
1. Publicly funded research data are a public good, produced in the public interest and therefore should be made as openly available as possible;
2. Institutional and project specific data management policies and plans should be in accordance with relevant standards and community best practice;
3. Sufficient metadata should be recorded and made openly available to enable other researchers to understand the research and re-use potential of the data;
4. Legal, ethical and commercial constraints on release of research data must be recognised;
5. Recognition and ‘reward’ for managing and sharing research data are essential, and so limited embargo periods on the release of data are acceptable;
6. All users of research data should acknowledge the sources of their data and abide by the terms and conditions under which they are accessed;
7. It is appropriate to use public funds to support the management and sharing of publicly-funded research data, but this should be done in as cost-effective and efficient a way as possible.
Infrastructure implications to be inferred rather than directly stated?
Drivers: Funders’ Policies
New MRC policies on research data management and sharing are being prepared, tested and refined, and guidance produced, as part of a JISC-funded project.
BBSRC Statement, April 2007, updated June 2010: http://www.bbsrc.ac.uk/web/FILES/Policies/data-sharing-policy.pdf
– Requires a statement on data sharing.
New ESRC policy now in force: http://www.esrc.ac.uk/about-esrc/information/data-policy.aspx
– Introduces the requirement of a data management and sharing statement (Je-S) and a data management and sharing plan as part of the grant submission.
Drivers: Funders’ Policies (EPSRC)
Responsibility: EPSRC has a Policy Framework stating its expectations concerning the management of and access to EPSRC-funded research data. It places responsibility with the institutions, departments and centres in receipt of EPSRC funding to show that they can manage and preserve data to adequate standards.
Appropriate division of costs: EPSRC believes that where research has been publicly funded, it is reasonable and appropriate also to use public funds to meet the associated data management costs. EPSRC therefore expects research organisations to make appropriate provision from within the public research funding they receive, making use of both direct and indirect funding streams as appropriate.
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/responsibility.aspx
Drivers: Freedom of Information and Environmental Information requests
Research data can be subject to Freedom of Information / Environmental Information requests: see the UEA and Queen’s University Belfast cases.
Guidance is available in the JISC Q&A on ‘Freedom of Information and Research Data’: http://www.jisc.ac.uk/publications/programmerelated/2010/foiresearchdata.aspx
Indicative research on numbers of FoI requests for research data: a sample of 21 universities received a total of 40 FoI requests for research data in 2007–10.
– Wide variance in distribution: 12 universities received none; one received 8; another received 9.
– All but six were from 2009 and 2010,
  • indicating a growing trend.
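The request counts on this slide can be combined to make the ‘growing trend’ point more concrete. The following is a minimal Python sketch using only the figures reported here (the variable names are illustrative): it shows that 34 of the 40 requests, or 85%, fell in 2009–10, and that the seven universities not covered by the quoted figures received 23 requests between them.

```python
# Figures as reported on the slide (sample of 21 universities, 2007-10).
total_universities = 21
total_requests = 40
universities_with_none = 12
quoted_counts = [8, 9]       # the two individual university counts quoted
requests_before_2009 = 6     # "all but six were from 2009 and 2010"

# Requests that fell in 2009-10, and their share of all requests.
recent = total_requests - requests_before_2009
recent_share = recent / total_requests

# Universities and requests not accounted for by the zero group
# or the two individually quoted universities.
remaining_universities = (
    total_universities - universities_with_none - len(quoted_counts)
)
remaining_requests = total_requests - sum(quoted_counts)

print(f"2009-10 requests: {recent} of {total_requests} ({recent_share:.0%})")
print(f"Other {remaining_universities} universities: {remaining_requests} requests")
```

The slide gives no year-by-year breakdown, so this shows how concentrated the requests were in the last two years rather than a measured growth rate.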
Driver: preparation for Research Excellence Framework submissions in 2013
Good data management practice improves data collection for institutions and reduces its burden
– Need to embed practices into key roles: researchers, research managers, administrators.
Demonstrate the contribution that research makes to the economy and society (impact)
Opening up data provides one level of increased opportunity for ‘citizen science’, etc.
REF preparation can be aided by research information management systems
JISC has funded universities to demonstrate the benefits of using the Common European Research Information Format (CERIF) to manage research information
– The cost of use is more than offset by efficiency savings.
Research management ‘Shared Service’ being developed for April 2012.
Wednesday 12 October 2011
JISC Webinar
Meeting the Research Data Challenge
Simon Hodson
Programme Manager, Managing Research Data, Digital Infrastructure Team
Responding to the drivers?
How can universities respond to these drivers?
What is JISC doing to help?
Supporting the Research Data Lifecycle
[Figure: research data lifecycle diagram. Stages shown: Plan, Create, Describe, Annotate, Identify, Store, Select, Appraise, Discard, Hand Over?, Publish, Access, Discover, Use, Reuse]
Supporting the Research Data Lifecycle
[Figure: the research data lifecycle diagram (Plan, Create, Describe, Annotate, Identify, Store, Select and Appraise, Discard, Hand Over?, Publish, Access, Discover, Use, Reuse), overlaid with areas of JISC support: Guidance and Policy Development; Training and Information; Support for Data Management Planning; RDM Systems and Infrastructure; Publication and Citation Mechanisms]
Wednesday 12 October 2011
JISC Webinar
Meeting the Research Data Challenge
Advice and guidance.
Training materials.
Data management planning.
Research data management systems and infrastructure.
Making the case: recognition, rewards, benefits.
DCC’s Data Management Roadshows
Regional Data Management Roadshows: http://www.dcc.ac.uk/events/data-management-roadshows
Next: Cambridge, 9–11 November: http://www.dcc.ac.uk/events/data-management-roadshows/dcc-roadshow-cambridge
Then: Cardiff, 14–16 December
Blog on Oxford Roadshow: http://www.dcc.ac.uk/news/review-dcc-roadshow-oxford
Institutional Research Data Management Policies
University of Edinburgh Research Data Management Policy: http://www.ed.ac.uk/schools-departments/information-services/about/policies-and-regulations/research-data-policy
University of Oxford Commitment to Research Data Management: http://www.ict.ox.ac.uk/odit/projects/datamanagement/
University of Hertfordshire: http://research-data-toolkit.herts.ac.uk/?p=11
See DCC on institutional data management policies: http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies
Guidance Materials (JISCMRD Programme)
Sudamih Project: http://sudamih.oucs.ox.ac.uk/
Oxford Research Data Management Pages (EIDCSR Project): http://www.admin.ox.ac.uk/rdm/
Training Materials for Humanities Scholars, delivered as part of central Humanities Division IT training courses: http://sudamih.oucs.ox.ac.uk/documents.xml
Guidance Materials (JISCMRD Programme)
Incremental Project, a collaboration between Glasgow and Cambridge, concentrated on providing guidance and training materials at an institutional level, with a focus on arts and humanities, social sciences, archaeology and social anthropology: http://www.lib.cam.ac.uk/preservation/incremental/index.html
Cambridge Website: www.lib.cam.ac.uk/dataman/
Glasgow Website: www.gla.ac.uk/datamanagement/
Workshops and Seminars: http://www.lib.cam.ac.uk/preservation/incremental/seminars.html
– Series at CRASSH covering: ethics, FoI, IPR, new technologies.
– Series at Glasgow covering: performing arts and archaeology.
Interviews from Seminars:
– http://www.lib.cam.ac.uk/dataman/training.html#Interviews
– http://www.gla.ac.uk/services/datamanagement/training/videos/
Incremental Project Blog: http://incrementalproject.wordpress.com/
DCC How-To Guides
DCC How-To Guides: http://www.dcc.ac.uk/resources/how-guides
– Appraise and select research data for curation
– How to license research data
– How to develop a data management and sharing plan
Further guides in preparation.
JISCMRD Training Projects
Need for subject-focussed research data management / curation training, integrated with PG studies
Five projects to design and pilot (reusable) discipline-focussed training units for postgraduate courses: http://www.jisc.ac.uk/whatwedo/programmes/mrd/rdmtrain.aspx
– Health studies: http://www.northumbria.ac.uk/sd/academic/ceis/re/isrc/themes/rmarea/datum/
– Creative arts: http://www.projectcairo.org/
– Archaeology, social anthropology: http://www.lib.cam.ac.uk/preservation/datatrain/
– Psychological sciences: http://www.dmtpsych.york.ac.uk/
– Social sciences, geographical sciences, clinical psychology: http://bit.ly/RDMantra
DaMSSI Support Project: http://www.rin.ac.uk/our-work/researcher-development-and-skills/data-management-and-information-literacy
ERIM Project
Data Management Planning for engineering and manufacturing research, IdMRC and UKOLN, Bath: http://www.ukoln.ac.uk/projects/erim/
Data are very heterogeneous in data type, conditions of use, etc.
Review of the State of the Art of the Digital Curation of Research Data.
Report on Understanding and Characterizing Engineering Research Data for its Better Management, including detailed Research Activity Information Development (RAID) modelling.
Draft IdMRC Projects Data Management Plan; Requirements for a RAID associative tool.
Principle: interventions should result in ‘a zero net resource requirement increase’, i.e. data management needs to be supported by appropriate tools or balanced by immediate benefits.
The role of data manager in research centres needs to be examined closely.
DMP-ESRC Project
Led by UK Data Archive: http://www.data-archive.ac.uk/create-manage/projects/jisc-dmp
Study of data management practices in ESRC funded Centres and Programmes.
Data Management Recommendations for Research Centres and Programmes: http://www.data-archive.ac.uk/media/257765/ukdadatamanagementrecommendations_centresprogrammes.pdf
– Clear roles and responsibilities; RDM coordinator; Data Inventory; Data Management Resources Library.
– Recommendations and guidelines on anonymisation, security, backup, etc.
Data Management Costing Tool: http://www.data-archive.ac.uk/media/257647/ukda_jiscdmcosting.pdf
RDM Platforms and Infrastructure
FISHnet Project, freshwater biology: http://www.fishnetonline.org/
MaDAM Project, biomedical research in an institutional context: http://www.merc.ac.uk/?q=MaDAM
JISC UMF Shared Services and Cloud Programme
Strand A: Shared IT Infrastructure: http://www.jisc.ac.uk/whatwedo/programmes/umf.aspx
– JANET(UK) brokerage to create trusted cloud(s) for HE.
– Pilot cloud provided by Eduserv.
– Augment the role of DCC (in part to deploy tools in the cloud).
– ‘Killer RDM Apps’ developed to be deployed as Software as a Service.
RDM SaaS Applications
VIDaaS (Virtual Infrastructure for Database as a Service), University of Oxford: http://vidaas.oucs.ox.ac.uk/
DataFlow, University of Oxford: http://www.dataflow.ox.ac.uk/
Smart Research Framework, University of Southampton: http://www.mylabnotebook.ac.uk/
Biomedical Research Infrastructure (BRISSkit), University of Leicester
Financial Savings
OXREP case study:
– Estimated research savings during 2010 = 21%
– Estimated data hosting savings during 2010 = 37% (just central VI, not cloud hosted)
Comparison of DaaS hosting costs:
– Single physical server running 30 2GB database instances = £125
– Oxford VM running on local VI with 100 2GB instances = £79
– Oxford VM running on local VI with 100 8GB instances = £109
– Eduserv VM running on VI with 500 8GB instances = £76–98
– Amazon VM with 8GB instances = £660–744
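To make the hosting comparison easier to read, each cost can be expressed relative to one option. The Python sketch below assumes the single physical server as the baseline (an illustrative choice, not one made on the slide) and treats the quoted ranges as low/high bounds. The slide states neither the billing period nor whether the figures are per instance, so only the ratios are meaningful, and the 2GB and 8GB options are not like-for-like.

```python
# Hosting costs as listed on the slide, in GBP. The billing period is not
# stated, so only the relative figures computed below are meaningful.
# Ranges are stored as (low, high); single figures as (x, x).
costs = {
    "Single physical server, 30 x 2GB instances": (125, 125),
    "Oxford VM on local VI, 100 x 2GB instances": (79, 79),
    "Oxford VM on local VI, 100 x 8GB instances": (109, 109),
    "Eduserv VM on VI, 500 x 8GB instances": (76, 98),
    "Amazon VM, 8GB instances": (660, 744),
}

# Assumed baseline for the comparison (illustrative choice).
BASELINE = "Single physical server, 30 x 2GB instances"
base = costs[BASELINE][0]


def relative_change(cost: float, baseline: float) -> float:
    """Fractional change versus the baseline (negative means cheaper)."""
    return (cost - baseline) / baseline


for name, (low, high) in costs.items():
    low_pct = relative_change(low, base)
    high_pct = relative_change(high, base)
    if low == high:
        print(f"{name}: {low_pct:+.0%}")
    else:
        print(f"{name}: {low_pct:+.0%} to {high_pct:+.0%}")
```

With these inputs the Oxford 2GB VM comes out about 37% below the baseline, close to the estimated OXREP hosting saving, although the slide does not say that figure was derived this way.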
Making the Case: recognition, rewards, benefits
Data Citation
– DCC how-to guide on data citation (in preparation)
– DCC Briefing Paper on Data Citation and Linking: http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking
– The British Library (BL) is a founding member of DataCite
– BL currently has a DataCite user group; it will be extending this and working with JISCMRD projects
Dryad: a repository for supporting research data
Joint declarations, Feb 2010, in American Naturalist, Evolution, the Journal of Evolutionary Biology, Molecular Ecology, Heredity, and other key journals in evolution and ecology: http://www.journals.uchicago.edu/doi/full/10.1086/650340
– This journal requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network for Biocomplexity.
– Allows embargoes of up to one year; allows exceptions for, e.g., sensitive information such as human subject data or the location of endangered species.
– Data that have an established standard repository, such as DNA sequences, should continue to be archived in the appropriate repository, such as GenBank. For more idiosyncratic data, the data can be placed in a more flexible digital data library such as the National Science Foundation-sponsored Dryad archive at http://datadryad.org.
Dryad-UK: a repository for supporting research data
Dryad-UK:
– Expand the number of journals: BMJ Open, titles from PLoS and BioMed Central.
– Prepare a business model for long-term funding of the data repository, supported by payments from journals, in turn recouped from subscription or author-pays OA fees.
Benefits?
Benefits for researchers: indications that publishing data is associated with increased citation rates
– Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308 (cancer microarray clinical trial publications).
– Piwowar, ongoing work, e.g. http://researchremix.wordpress.com/2011/02/18/early_results/ (citation, reuse of data from Gene Expression Omnibus).
Incentives and Benefits
Research Data Management Forum, 2–3 November, University of Warwick: http://www.dcc.ac.uk/events/research-data-management-forum/rdmf7-incentivising-data-management-sharing
Making the Case for RDM, DCC Briefing Paper: http://www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
Report on the Benefits from the Infrastructure Projects in the JISC Managing Research Data Programme: http://www.jisc.ac.uk/whatwedo/programmes/mrd/outputs/benefitsreport.aspx
JISC Managing Research Data Programme
JISC Managing Research Data Programme, Outputs: http://www.jisc.ac.uk/whatwedo/programmes/mrd/outputs.aspx
Second JISC Managing Research Data Programme, Google Map of funded projects: http://maps.google.co.uk/maps/ms?msid=210493456856136057364.0004ab687f5a25636a285&msa=0
A Call for Proposals on research data publication/citation and on training is planned for the New Year.