EPSRC research data expectations and research software management
Associate Data Librarian
University of Edinburgh
Research Committee Meeting, School of Mathematics, University of Edinburgh, 25 Jan. 2016
What is research data?
Research data is defined by EPSRC as recorded factual material commonly used in
the scientific community as necessary to validate research findings.
Although the majority of such data is created in digital format, all research data is
included irrespective of the format in which it is created.
Note that EPSRC does not expect every piece of data produced during a project to
be retained – decisions about what to keep should be taken on a case-by-case basis.
There is, however, a clear expectation that data which underpins published research
outputs will be retained and managed.
• EPSRC have introduced a policy framework concerning the management
and provision of access to publicly-funded research data.
• EPSRC Principal Investigators and the University must demonstrate to
EPSRC that their expectations are being met. The 9 expectations are
detailed at: http://www.epsrc.ac.uk/about/standards/researchdata/
• EPSRC began monitoring compliance on 1 May 2015 on a case-by-case
basis. If it judges that sharing of research data is being obstructed, it
reserves the right to impose sanctions.
The expectations arise from 7 core principles which align with
the core RCUK principles on data sharing, namely:
• EPSRC-funded research data is a public good produced in the public
interest and should be made freely and openly available with as few
restrictions as possible in a timely and responsible manner.
• EPSRC recognises that there are legal, ethical and commercial constraints
on the release of research data.
• Sharing research data is an important contributor to the impact of publicly
funded research.
• EPSRC-funded researchers should be entitled to a limited period of
privileged access to the data they collect to allow them to work on and
publish their results.
• Data management policies and plans should be in accordance with
relevant standards and community best practice and should exist for all …
• Sufficient metadata should be recorded and made openly available to
enable other researchers to understand the potential for further research
and re-use of the data.
• It is appropriate to use public funds to support the preservation and
management of publicly-funded research data.
What do PIs and researchers need to do?
• All researchers or research students funded by EPSRC will be required to
comply with these expectations.
• Data that is not generated in digital format will be stored in a manner to
facilitate it being shared in the event of a valid request for access.
• A link to digital research data is expected to be included in the metadata.
• Where access to data is restricted, the published metadata should give the
reason and summarise the conditions which must be satisfied for access
to be granted.
• Key expectation 1: The data should be securely stored for at least 10 years.
• Key expectation 2: An online record describing the research data and how it
may be accessed should be created within 12 months of the data being generated.
• Key expectation 3: Published research papers should include a short
statement describing how and on what terms any supporting research
data may be accessed.
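The time limits in key expectations 1 and 2 reduce to simple date arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes (following the later slide on software) that the 10-year retention clock runs from the later of the generation date and the last third-party access. Check the EPSRC expectations for the exact rule before relying on it.

```python
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    # First day of the following month, used to find the length of the target month.
    following = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    days_in_month = (following - date(year, month, 1)).days
    return date(year, month, min(d.day, days_in_month))


def metadata_due(generated: date) -> date:
    """Key expectation 2: online record normally within 12 months of generation."""
    return add_months(generated, 12)


def retain_until(generated: date, last_third_party_access: Optional[date] = None) -> date:
    """Key expectation 1 (simplified assumption): keep for at least 10 years,
    counted from the later of generation and the last third-party access."""
    start = generated
    if last_third_party_access is not None and last_third_party_access > generated:
        start = last_third_party_access
    return add_months(start, 120)
```

For example, data generated on 1 June 2015 and last requested by a third party on 15 March 2018 would, under this assumption, be kept until at least 15 March 2028.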
What do PIs and researchers need to do?
Key expectation 1: store data securely
• Research data that underpins a publication must be stored safely and securely, and …
• Data may already be managed by a trusted domain archive outside the University,
in which case it may not need to be stored locally.
• If not, the data must be stored in a suitable University of Edinburgh storage
solution. Minimal compliance is achieved by keeping your data on DataStore and
then making a secure copy of it in the Data Vault (this service is currently in …).
• For those who wish to openly publish data (and a snapshot of their research
software), Edinburgh DataShare is the University’s open online digital repository for
data produced by local researchers (policies, DOI, licence, citation).
• Datasets added to DataShare will be allocated persistent identifiers (DOIs) for …
Key expectation 2: a record describing
the data must be freely available online
• The University is using PURE to record descriptive data (metadata) about
the research data in order to meet this expectation. Research staff are
therefore expected to add a metadata record for any EPSRC-funded
research data into PURE, normally within 12 months of the data being generated.
• To enter a new dataset description in PURE, click on the green ‘Add new’
button, and select ‘Dataset’.
• Once added to PURE via the dataset content type, the resulting record
should link to the funding source and also link to any associated …
• If the dataset is available online, for example in DataShare, the URL (or
DOI) of that dataset should also be added.
• Where access to the data is to be restricted, the published dataset
metadata in PURE should give the reason and summarise the conditions
which must be satisfied to grant access.
• Dataset metadata added to PURE will ultimately be publicly accessible via
the Edinburgh Research Explorer, subject to confidentiality and other such constraints.
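The rules above about what a dataset record should contain can be expressed as a simple checklist. The sketch below is hypothetical: the class and field names are illustrative and do not reflect PURE's actual data model, and treating a missing URL or DOI as a gap for unrestricted data is an assumption based on the expectation that metadata link to digital research data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical, simplified dataset description (not PURE's schema)."""
    title: str
    funder: str
    access_url: Optional[str] = None          # URL or DOI if the data is online
    restricted: bool = False
    restriction_reason: Optional[str] = None  # e.g. commercial, legal or ethical
    access_conditions: Optional[str] = None   # what must be satisfied for access
    related_outputs: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """List gaps against the expectations described on these slides."""
        issues = []
        if not self.funder:
            issues.append("no funding source")
        if self.restricted:
            # Restricted data: the record must say why, and on what terms.
            if not self.restriction_reason:
                issues.append("restricted access without a stated reason")
            if not self.access_conditions:
                issues.append("restricted access without conditions for access")
        elif not self.access_url:
            # Assumption: unrestricted digital data should be linked.
            issues.append("no URL or DOI for the data")
        return issues
```

A record marked as restricted for commercial reasons but with no access conditions would be flagged with a single problem.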
Key expectation 3: include a statement in
published papers underpinned by EPSRC-funded data
• EPSRC state that this expectation ‘could be satisfied by citing such data in the
published research and including in such citations direct links to the data or to
supporting documentation that describes the data in detail, how it may be
accessed and any constraints that may apply.’ Such links should be persistent URLs,
such as DOIs.
• An example of a basic data citation would be of the form: ‘Creator (Publication
Year): Title. Publisher. DOI’. Further details can be found at:
• If commercial, legal or ethical reasons exist to restrict access to the data, these
should be noted in a statement included in the published research paper. A simple
direction to interested parties to ‘contact the author for access’ may not be
sufficient.
• The paper must also be made Open Access in PURE.
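The basic citation form ‘Creator (Publication Year): Title. Publisher. DOI’ is easy to generate consistently. The function below is a hypothetical helper, not part of DataShare or PURE; joining several creators with semicolons is an assumption, and the values in the usage note are invented for illustration.

```python
from typing import Sequence


def format_data_citation(creators: Sequence[str], year: int, title: str,
                         publisher: str, doi: str) -> str:
    """Return a citation of the form 'Creator (Publication Year): Title. Publisher. DOI'."""
    # Strip trailing full stops so the template does not produce '..'.
    clean_title = title.strip().rstrip(".")
    clean_publisher = publisher.strip().rstrip(".")
    # Assumption: multiple creators are separated by semicolons.
    creator_text = "; ".join(c.strip() for c in creators)
    return f"{creator_text} ({year}): {clean_title}. {clean_publisher}. {doi.strip()}"
```

With hypothetical values, format_data_citation(["Smith, J.", "Jones, A."], 2016, "Simulation outputs", "University of Edinburgh", "10.1234/example") produces "Smith, J.; Jones, A. (2016): Simulation outputs. University of Edinburgh. 10.1234/example".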
Does research data include software?
This depends on the research which is being carried out.
As noted in the definition of research data, the deciding factor is whether the software is
necessary to validate research findings, such as those published in a journal paper.
As a rule of thumb, if your research can’t be replicated without your code, then the code
should be included and shared as part of the research data.
Often the software, script or simulation may be the research output and the data
produced considered ancillary content. In this case it is more important to store and share
your code than the actual data.
Additionally, even if you don’t need to preserve software, it is good practice to make
software available with adequate documentation to enable others to validate your
research findings, and to access and reuse your research data.
Who should make the decision about
what research data should be preserved?
Each research organisation will have specific policies and associated processes
to determine which publicly-funded research data will be stored and made available, and how.
Normally it will be the PI of the research project and/or Head of
Department/School who will make the decision about what research data
should be preserved and made available.
It is important to recognise that not all research data can or should be freely
shared: ethical, legal or confidentiality issues may constrain who may have access to it.
What about software not produced by
my project, but required to validate my research findings?
It is prudent, in terms of providing access to your research data and of enabling
your own future research, to take reasonable steps to assure continued availability
of the software you use.
This may include taking a copy of open-source software and preserving it if the
licence allows, or using commercial software where a multi-year support
agreement is available.
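One reasonable step towards assuring the continued availability of third-party software is to record exactly which versions you used, alongside the data. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); the function names and the idea of writing a JSON file next to the data are illustrative assumptions, not a University or EPSRC requirement.

```python
import json
import platform
from importlib import metadata
from typing import Dict, Iterable


def software_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Return the Python version and the installed version of each named package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def write_versions(path: str, packages: Iterable[str]) -> None:
    """Save the version record as JSON, e.g. next to the dataset it describes."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(software_versions(packages), f, indent=2, sort_keys=True)
```

For instance, write_versions("software-versions.json", ["numpy", "scipy"]) would capture the versions in the current environment; the file name is hypothetical.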
The requirement to preserve research data for 10 years from the date of last
access by a third party provides a compelling reason to use open data
formats and open-source software.
What licence should I choose for my
data and software?
Following the principle that publicly funded research data should be made
openly available with as few restrictions as possible, you should consider
applying an appropriate open licence to the data and software generated by
your research (for example the GNU General Public License, MIT, Creative Commons (CC)
or Open Data Commons (ODC) licences).
The Digital Curation Centre and Software Sustainability Institute have written
guides to help you license research data and choose an open-source licence for
your software.
How to license research data (DCC) - http://www.dcc.ac.uk/resources/how-guides/license-research-
Choosing an open-source licence - http://www.software.ac.uk/resources/guides/adopting-open-source-
Where should I deposit my research data and software?
EPSRC does not provide a publications repository, research data repository, or
software repository.
Researchers are expected to use institutional or disciplinary/domain repositories
available to them (e.g. GitHub or Subversion for software).
It is important that deposited objects can be referenced and accessed via a persistent
identifier (e.g. a DOI) and that appropriately structured metadata describing the
objects is easily discoverable.
A good way to make data and software discoverable is to cite it in research
publications, and to include the persistent identifier (DOI) in the citation.
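Persistent identifiers are most useful when cited in a consistent, resolvable form. The helper below is hypothetical, not part of any repository's tooling: it normalises a DOI written as a bare identifier, with a "doi:" prefix, or as a doi.org/dx.doi.org link into an https://doi.org/ URL. The pattern check is a deliberately loose assumption about the usual "10.<registrant>/<suffix>" shape, not a full validation.

```python
import re

# Loose check for the common "10.<registrant>/<suffix>" DOI shape.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Normalise a DOI string to an https://doi.org/ link, or raise ValueError."""
    value = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    if not DOI_PATTERN.match(value):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return "https://doi.org/" + value
```

Using one canonical URL form in papers and metadata records makes links easier to check and less likely to break.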
Writing and using a software
management plan
At present, software management plans are relatively new in research.
The EPSRC Software for the Future call explicitly requires software
management plans as part of the Pathways to Impact.
NSF SI2 funding requires software to be addressed as part of the mandatory
data management plan.
A software management plan is a way of formalising a set of structures and
goals to ensure your software is accessible and reusable in the short, medium
and long term.
Software management plans should include:
• information on what software outputs (including documentation and other related
material) are expected to be produced
• who is responsible for releasing the software (e.g. PI / lead developer)
• the revision control process to be used [Note: it is important to choose a revision
control / configuration management system that all members of the team will use]
• what licence will be used for each output
A software management plan could also:
• identify the software development model
• identify external software and any associated licences
• identify dependencies and risks
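The checklist above can be kept as a small structured record so that gaps are easy to spot. The sketch below is a hypothetical structure that mirrors the listed items; it is not an official EPSRC or Software Sustainability Institute template, and the field names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SoftwareManagementPlan:
    """Hypothetical plan structure mirroring the checklist on this slide."""
    # Items a plan should include
    outputs: List[str] = field(default_factory=list)        # software, docs, related material
    release_owner: Optional[str] = None                     # e.g. PI or lead developer
    revision_control: Optional[str] = None                  # system the whole team will use
    licences: Dict[str, str] = field(default_factory=dict)  # output name -> licence
    # Items a plan could also include
    development_model: Optional[str] = None
    external_software: Dict[str, str] = field(default_factory=dict)  # name -> licence
    risks: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Report required items that have not been filled in yet."""
        missing = []
        if not self.outputs:
            missing.append("expected software outputs")
        if not self.release_owner:
            missing.append("person responsible for releases")
        if not self.revision_control:
            missing.append("revision control process")
        unlicensed = [o for o in self.outputs if o not in self.licences]
        if unlicensed:
            missing.append("licence for: " + ", ".join(unlicensed))
        return missing
```

For example, a plan listing a solver and a user guide, with only the solver licensed, would report that the user guide still needs a licence.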
Implementation of the EPSRC Policy at Edinburgh is being supported by the Research
Data Service delivered by Information Services
For help with meeting this policy requirement, contact:
• Email: IS.Helpline@ed.ac.uk with “Help with EPSRC data policy framework” in
your subject line.
• Email: PURE@ed.ac.uk if you have questions about PURE.
For help with research data management in general, contact:
• Email: IS.Helpline@ed.ac.uk with “Help with Research Data Management in
general” in your subject line.
• Email: IS.Helpline@ed.ac.uk if you would like to arrange an RDM training or
awareness raising session.
For help with research software contact:
• Email: email@example.com at the Software Sustainability Institute
• EPSRC Policy Framework on Research Data:
• DataStore: https://www.wiki.ed.ac.uk/x/Np9FD
• Edinburgh DataShare: http://datashare.is.ed.ac.uk/
• Data Catalogue in PURE: http://www.pure.ed.ac.uk
• Digital preservation and curation - the danger of overlooking software -
• Choosing a repository for your software project - http://software.ac.uk/resources/guides/choosing-
• How to cite and describe software - http://software.ac.uk/so-exactly-what-software-did-you-use
• Writing and using a software management plan - http://www.software.ac.uk/resources/guides/software-