EPSRC research data expectations and research software management

Invited presentation given to the School of Mathematics Research Committee on the EPSRC policy framework on research data and on research software management

  1. EPSRC research data expectations and research software management. Stuart Macdonald, Associate Data Librarian, University of Edinburgh. Research Committee Meeting, School of Mathematics, University of Edinburgh, 25 Jan. 2016
  2. What is research data? Research data is defined by EPSRC as recorded factual material commonly used in the scientific community as necessary to validate research findings. Although the majority of such data is created in digital format, all research data is included irrespective of the format in which it is created. Note that EPSRC does not expect every piece of data produced during a project to be retained; decisions about what to keep should be taken on a case-by-case basis. There is, however, a clear expectation that data which underpins published research outputs will be retained and managed.
  3. • EPSRC has introduced a policy framework concerning the management and provision of access to publicly-funded research data. • EPSRC Principal Investigators and the University must demonstrate to EPSRC that its expectations are being met. The 9 expectations are detailed in the EPSRC Policy Framework on Research Data. • EPSRC began monitoring compliance on 1st May 2015 on a case-by-case basis. If it judges that sharing of research data is being obstructed, it reserves the right to impose sanctions.
  4. EPSRC policy framework on research data
  5. The expectations arise from 7 core principles which align with the core RCUK principles on data sharing, namely: • EPSRC-funded research data is a public good produced in the public interest and should be made freely and openly available with as few restrictions as possible in a timely and responsible manner. • EPSRC recognises that there are legal, ethical and commercial constraints on the release of research data. • Sharing research data is an important contributor to the impact of publicly funded research.
  6. • EPSRC-funded researchers should be entitled to a limited period of privileged access to the data they collect to allow them to work on and publish their results. • Data management policies and plans should be in accordance with relevant standards and community best practice and should exist for all data. • Sufficient metadata should be recorded and made openly available to enable other researchers to understand the potential for further research and re-use of the data. • It is appropriate to use public funds to support the preservation and management of publicly-funded research data.
  7. What do PIs and researchers need to know? • All researchers or research students funded by EPSRC will be required to comply with these expectations. • Data that is not generated in digital format will be stored in a manner that facilitates sharing in the event of a valid request for access. • A link to digital research data is expected to be included in the metadata. • Where access to data is restricted, published metadata should give the reason and summarise the conditions which must be satisfied for access to be granted.
  8. What do PIs and researchers need to do? • Key expectation 1: The data should be securely stored for at least 10 years. • Key expectation 2: An online record should be created within 12 months of the data being generated that describes the research data and how to access it. • Key expectation 3: Published research papers should include a short statement describing how and on what terms any supporting research data may be accessed.
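The two time limits in key expectations 1 and 2 can be turned into concrete dates. The following is a minimal sketch only: the function names are hypothetical, and it assumes the caller supplies the date from which the 10-year retention period is counted, since the slide itself does not fix a start date.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reachable for 29 Feb when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def record_deadline(generated: date) -> date:
    """Key expectation 2: online record within 12 months of data generation."""
    return add_years(generated, 1)


def earliest_disposal(retention_start: date) -> date:
    """Key expectation 1: data stored for at least 10 years.

    `retention_start` is an assumption supplied by the caller.
    """
    return add_years(retention_start, 10)


generated = date(2016, 1, 25)  # illustrative date only
print(record_deadline(generated))    # 2017-01-25
print(earliest_disposal(generated))  # 2026-01-25
```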
  9. Key expectation 1: store data securely • Research data that underpins a publication must be stored safely and securely, and made accessible. • Data may already be managed by a trusted domain archive outside of the university, in which case it may not need to be stored locally. • If not, the data must be stored in a suitable University of Edinburgh storage solution. Minimal compliance is achieved by having your data on DataStore and then making a secure copy of it in the Data Vault (this service is currently in development). • For those who wish to openly publish data (and a snapshot of their research software), Edinburgh DataShare is the university’s open online digital repository of data produced by local researchers (policies, DOI, licence, citation). • Datasets added to DataShare will be allocated persistent identifiers (DOIs) for citations.
  10. Key expectation 2: a record describing the data must be freely available online • The University is using PURE to record descriptive data (metadata) about the research data in order to meet this expectation. Research staff are therefore expected to add a metadata record for any EPSRC-funded research data into PURE, normally within 12 months of the data being generated. • To enter a new dataset description in PURE, click on the green ‘Add new’ button, and select ‘Dataset’. • Once added to PURE via the dataset content type, the resulting record should link to the funding source and also link to any associated publications.
  11. Data Catalogue in PURE
  12. • If the dataset is available online, for example in DataShare, the URL (or DOI) of that dataset should also be added. • Where access to the data is to be restricted, the published dataset metadata in PURE should give the reason and summarise the conditions which must be satisfied for access to be granted. • Dataset metadata added to PURE will ultimately be publicly accessible via the Edinburgh Research Explorer, subject to confidentiality and other such restrictions.
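The points a dataset record is expected to cover (funding source, location when the data is online, and reason plus conditions when access is restricted) can be expressed as a simple pre-submission check. This is a sketch under stated assumptions: `DatasetRecord` and `missing_items` are hypothetical names for illustration and do not correspond to PURE's own data model or any PURE API, and the example DOI is made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical local model of a dataset description (not PURE's schema)."""
    title: str
    funder: str
    available_online: bool = False
    location: Optional[str] = None            # URL or DOI when online
    access_restricted: bool = False
    restriction_reason: Optional[str] = None
    access_conditions: Optional[str] = None


def missing_items(rec: DatasetRecord) -> List[str]:
    """List the items from the slides that the record does not yet cover."""
    problems = []
    if not rec.funder:
        problems.append("link to the funding source")
    if rec.available_online and not rec.location:
        problems.append("URL or DOI of the online dataset")
    if rec.access_restricted:
        if not rec.restriction_reason:
            problems.append("reason for restricting access")
        if not rec.access_conditions:
            problems.append("conditions for access to be granted")
    return problems


rec = DatasetRecord(
    title="Example simulation outputs",   # illustrative values only
    funder="EPSRC",
    available_online=True,
    location="10.1234/example",           # made-up DOI
    access_restricted=True,
    restriction_reason="commercial confidentiality",
)
print(missing_items(rec))  # ['conditions for access to be granted']
```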
  13. Key expectation 3: include a statement in published papers underpinned by EPSRC-funded data • EPSRC states that this expectation ‘could be satisfied by citing such data in the published research and including in such citations direct links to the data or to supporting documentation that describes the data in detail, how it may be accessed and any constraints that may apply.’ Such links should be persistent URLs such as DOIs. • An example of a basic data citation would be of the form: ‘Creator (Publication Year): Title. Publisher. DOI’ Further details can be found at: • If commercial, legal or ethical reasons exist to protect access to the data, these should be noted in a statement included in the published research paper. A simple direction to interested parties to ‘contact the author for access’ may not be considered sufficient. • The paper must also be made Open Access in PURE.
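The basic citation form ‘Creator (Publication Year): Title. Publisher. DOI’ can be produced mechanically from a dataset's details. The sketch below is illustrative: the function name and the example values (including the DOI) are hypothetical, and rendering the DOI as an `https://doi.org/` link is an assumption made so that the citation carries a direct persistent link.

```python
def format_data_citation(creator: str, year: int, title: str,
                         publisher: str, doi: str) -> str:
    """Build a citation of the form 'Creator (Publication Year): Title. Publisher. DOI'."""
    # Assumption: the DOI is shown as a resolvable https://doi.org/ link.
    return f"{creator} ({year}): {title}. {publisher}. https://doi.org/{doi}"


citation = format_data_citation(
    creator="Smith, J.",                    # illustrative values only
    year=2016,
    title="Example simulation outputs",
    publisher="University of Edinburgh",
    doi="10.1234/example",                  # made-up DOI
)
print(citation)
# Smith, J. (2016): Example simulation outputs. University of Edinburgh. https://doi.org/10.1234/example
```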
  14. Does research data include software? This depends on the research which is being carried out. As noted in the definition of research data, the deciding factor is whether the software is necessary to validate research findings, such as those published in a journal paper. As a rule of thumb, if your research can’t be replicated without your code then the code should be included and shared as part of the research data. Often the software, script or simulation may be the research output and the data produced considered ancillary content. In this case it is more important to store and share your code than the actual data. Additionally, even if you don’t need to preserve software, it is good practice to make software available with adequate documentation to enable others to validate your research findings, and to access and reuse your research data.
  15. Who should make the decision about what research data should be preserved? Each research organisation will have specific policies and associated processes to determine what and how publicly-funded research data will be stored and managed. Normally it will be the PI of the research project and/or Head of Department/School who will make the decision about what research data should be preserved and made available. It is important to recognise that not all research data can or should be freely shared: ethical, legal or confidentiality issues may constrain who may have access.
  16. What about software that is not produced by my project but is required to validate my research results? It is prudent, in terms of providing access to your research data and of enabling your own future research, to take reasonable steps to assure continued availability of the software you use. This may include taking a copy of open-source software and preserving it if the licence allows, or using commercial software where a multi-year support agreement is available. The requirement to preserve research data for 10 years from the date of last access by a third party provides a compelling reason to use open-data formats and open-source software.
  17. What licence should I choose for my data and software? Following the principle that publicly funded research data should be made openly available with as few restrictions as possible, you should consider applying an appropriate open licence to the data and software generated by your research (e.g. GNU General Public Licence, MIT, CC, ODC). The Digital Curation Centre and Software Sustainability Institute have written guides to help you license research data and choose an open-source licence for your software. • How to licence research data (DCC) - data • Choosing an open-source licence - licence
  18. Where should I deposit my research data and software? EPSRC does not provide a publications repository, research data repository, or software repository. Researchers are expected to use institutional or disciplinary/domain repositories available to them (e.g. GitHub, Subversion). It is important that deposited objects can be referenced and accessed via a persistent identifier (e.g. a DOI) and that appropriately structured metadata describing the objects is easily discoverable. A good way to make data and software discoverable is to cite them in research publications, and to include the persistent identifier (DOI) in the citation.
  19. Writing and using a software management plan. At present, software management plans are relatively new for research proposals. The EPSRC Software for the Future call explicitly requires software management plans as part of the Pathways to Impact. NSF SI2 funding requires software to be addressed as part of the mandatory data management plan. A software management plan is a way of formalising a set of structures and goals to ensure your software is accessible and reusable in the short, medium and long term.
  20. Software management plans should minimally include: • information on what software outputs (including documentation and other related material) are expected to be produced • who is responsible for releasing the software (e.g. PI / lead developer) • the revision control process to be used [Note: it is important to choose a revision control / configuration management system that all members of the team will use] • what licence will be used for each output A software management plan could also: • identify the software development model • identify external software and any associated licences • identify dependencies and risks
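The minimal contents listed above lend themselves to a short completeness check before a plan is submitted. The following is a sketch only: the class and field names are hypothetical, and the example values are illustrative rather than a recommended plan.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SoftwareManagementPlan:
    """Hypothetical container for the four minimal items on the slide."""
    outputs: List[str]                  # expected software outputs, docs included
    release_owner: str                  # e.g. PI or lead developer
    revision_control: str               # system the whole team will use
    licences: Dict[str, str] = field(default_factory=dict)  # output -> licence


def plan_gaps(plan: SoftwareManagementPlan) -> List[str]:
    """Return the minimal items that are still missing from the plan."""
    gaps = []
    if not plan.outputs:
        gaps.append("no software outputs listed")
    if not plan.release_owner.strip():
        gaps.append("no one named as responsible for releases")
    if not plan.revision_control.strip():
        gaps.append("no revision control process chosen")
    for output in plan.outputs:
        if output not in plan.licences:
            gaps.append(f"no licence chosen for {output}")
    return gaps


plan = SoftwareManagementPlan(
    outputs=["solver library", "user guide"],   # illustrative values only
    release_owner="PI",
    revision_control="Git",
    licences={"solver library": "MIT"},
)
print(plan_gaps(plan))  # ['no licence chosen for user guide']
```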
  21. Support. Implementation of the EPSRC Policy at Edinburgh is being supported by the Research Data Service delivered by Information Services. For help with meeting this policy requirement contact: • Email: with “Help with EPSRC data policy framework” in your subject line. • Email: if you have questions about PURE. For help with research data management in general contact: • Email: with “Help with Research Data Management in general” in your subject line. • Email: if you would like to arrange an RDM training or awareness raising session. For help with research software contact: • Email: at the Software Sustainability Institute
  22. Thanks! URLs • EPSRC Policy Framework on Research Data: • DataStore: • Edinburgh DataShare: • Data Catalogue in PURE: • Digital preservation and curation - the danger of overlooking software - software • Choosing a repository for your software project - repository-your-software-project • How to cite and describe software - • Writing and using a software management plan - management-plans