Supporting the development of a national Research Data Discovery Service – a Pilot Project

In order to be reused, research data must be discoverable.

The EPSRC Research Data Expectations* requires research organisations to maintain a data catalogue to record metadata about research data generated by EPSRC-funded research projects.

Universities are increasingly making research data assets available through repositories or other data portals.

The requirement for a UK research data discovery service has grown as universities become more involved in RDM and capacity develops.

Published in: Education


  1. 1. Supporting the development of a national Research Data Discovery Service – a Pilot Project Stuart Macdonald EDINA & Data Library University of Edinburgh stuart.macdonald@ed.ac.uk
  2. 2. • University of Edinburgh • Background and context • UK Research Data Discovery Service • PhD Interns • Observations • Closing remarks
  3. 3. University of Edinburgh • Founded in 1582 - 6th oldest university in the English-speaking world and one of Scotland's 4 ancient universities. • 3 Colleges (MVM, CSE, CHSS) , 22 Schools • Over 60 disciplinary/cross-disciplinary Institutes and Research Centres • 34000 students, 4500 researchers, 6000 research students
  4. 4. Background • EDINA and Data Library are a division within Information Services (IS) of the University of Edinburgh. • EDINA is a Jisc-funded centre for digital expertise providing national online resources for education and research. • Data Library & Consultancy assists Edinburgh University users in the discovery, access, use and management of research datasets. • The Data Library is part of the new Research Data Service – the culmination of a 36 month RDM Roadmap (phase 1 and 2) to implement the University’s RDM Policy and develop a suite of RDM Services that map onto the research lifecycle Data Library Services: http://www.ed.ac.uk/is/data-library EDINA: http://edina.ac.uk/
  5. 5. • In order to be reused, research data must be discoverable. • The EPSRC Research Data Expectations* requires research organisations to maintain a data catalogue to record metadata about research data generated by EPSRC-funded research projects. • Universities are increasingly making research data assets available through repositories or other data portals. • The requirement for a UK research data discovery service has grown as universities become more involved in RDM and capacity develops. * https://www.epsrc.ac.uk/about/standards/researchdata/expectations/ Context
  6. 6. UK Research Data Discovery Service (RDDS) In 2013, the Digital Curation Centre (DCC) and the UK Data Service piloted a registry service to aggregate metadata for research data held within a sample of UK universities and national, discipline-specific data centres. This 6-month pilot tested an existing data registry architecture developed by the Australian National Data Service (ANDS). This was followed up with Phase 2 funding from Jisc to evaluate technical solutions and further develop a national Research Data Discovery Service • https://www.jisc.ac.uk/rd/projects/uk-research-data-discovery • http://ckan.data.alpha.jisc.ac.uk/
  7. 7. As part of Phase 2 University of Edinburgh received funding from Jisc to support the development of UKRDDS. PhD interns from the 3 Colleges were hired through a ‘streamlined’ e-recruitment* process - As part of IS’s plan to recruit 500 PhD interns per academic year – complete with formal eligibility to work checks, inductions, probation reports, end of employment /continuation of employment processes !! To engage with local researchers in order to make metadata and full data sets available for harvest into the pilot service for discovery and potential reuse. This work was co-ordinated jointly by EDINA & Data Library and Library & University Collections. Progress was reported back to Jisc via monthly UKRDDS meetings and F-2-F workshops as well as representation on UKRDDS Technical and Metadata Advisory Groups.
  8. 8. PhD Interns: responsibilities To develop plans for getting researchers in schools engaged with recording or sharing their data To work closely with researchers and School administrators to assist in the description and upload of research data into: • PURE, the University’s proprietary Current Research Information System, used as a data catalogue where descriptive metadata about datasets can be added to link to related research outputs, publications or projects. • Work needed to convert PURE ver. 5 API into OAI-PMH end-point
  9. 9. Edinburgh DataShare - the University’s OA multi-disciplinary data repository hosted by the Data Library • It allows University researchers to upload, share, and license their data resources for online discovery and re-use by others. • OAI-PMH compliant • Built on the DSpace platform • http://datashare.is.ed.ac.uk
  10. 10. Other responsibilities: To validate and quality control metadata records ingested into both PURE and DataShare for the purpose of being harvested by UKRDDS To develop or enhance the quality of metadata records to the standard set for UKRDDS To assist in the identification and deposit of research datasets deemed suitable or appropriate for open publication and long-term preservation into DataShare To record their own observations and provide periodic reports on data sharing and cataloguing practices within respective Schools.
  11. 11. Observations 1st tranche of PhD interns (Dec. 15 – April 16) • School of Literatures, Languages and Cultures • Roslin Institute • School of Social and Political Science 2nd tranche (Mar. 15 – Sept. 16) • Division of Infection and Pathway Medicine, School of Medicine • School of Literatures, Languages and Cultures (2nd intern) 3rd tranche (June 16 – Sept. 16) • School of Divinity • School of Engineering
  12. 12. Literatures, Languages, and Cultures (LLC) • 3 datasets described in PURE, 2 datasets deposited into DataShare and described in PURE • 14 researchers interviewed for LLC (+ 7 researchers for Philosophy, Psychology and Language Science) • LLC has dedicated RDM webpages • Communications with researchers within the two Schools were conducted via Research Administrators • Research Administrators and researchers happy to talk once the intern is not seen as an ‘enforcing figure’
  13. 13. • Researchers expressed discomfort or unfamiliarity concerning online distribution of data and unease about upsetting publishers by making their data available online • Due to the nature of humanities research, where interpretation of existing artefacts (books, historic texts, manuscripts) is itself the research output, researchers did not tend to regard this as data • Copyright was seen as one of the main issues hindering dataset deposit – a limiting factor when researchers’ data is based on texts and other archival material. • Also, some documents no longer under copyright are restricted from imaging due to preservation efforts • When texts themselves are a researcher’s own ‘data’ (as is often the case in Humanities) there is still a reluctance to share
  14. 14. Roslin Institute 67 researchers interviewed belonging to 4 divisions (70% of total) • Infection and Immunity • Genetics and genomics • Neurobiology and Developmental Biology • Clinical researchers from Veterinary School 0 datasets deposited in DataShare. Linking data in e.g. NCBI to PURE unrealistic (see next slide) • PhD interns worked closely with dedicated Data Manager, PURE Administrator and Research Administrator. • Roslin have dedicated RDM webpages • c. 60% of researchers kept their research outputs up-to-date in PURE though very few had updated research data metadata or were aware that they could. • c. 90% of researchers submitted data to journals and open access domain repositories e.g. 50% submitted to NCBI, 20% submitted to EBI
  15. 15. The numbers of datasets deposited into NCBI from the Roslin Institute are large (e.g. over 55,000 expressed sequence tags, over 73,000 protein sequences, over 132,000 genome survey sequences) Unwieldy proposition to record metadata from NCBI into PURE Currently no automated processes in place.
  16. 16. • The main reasons stated for using these repositories were: • Funder requirement • Default repository within their discipline • Recommendation by peers • c. 40% of researchers were confident about the safety of their data and long term guarantees provided by the domain repositories, whereas c. 60% did not know or were not sure • Researchers working with industry partners indicated that due to the confidential nature of the data they do not upload data to open access repositories • Only one third of researchers had heard about DataShare (with only one researcher who had used it). Two thirds hadn’t heard of it. • In general there was no interest in using DataShare due to well established domain repositories
  17. 17. Social and Political Science • 19 datasets held in the UK Data Archive described in PURE, 0 datasets deposited into DataShare • 15 researchers identified as having made data available via the UK Data Archive were sent a questionnaire – only 2 knew about DataShare • 12 ESRC funded PhD students interviewed (about making their data available in UKDA / DataShare) - No Data Management Plans written by ESRC funded PhDs at start of research (this is now mandatory) • 10 researchers interviewed (different from those that answered the questionnaire)
  18. 18. • Research Assistants are regularly employed to manage, clean and publish datasets. The temporary nature of contracts often means that the knowledge and practice of curating datasets is not retained within the School • Among the challenges cited by researchers for making datasets available both in a quantitative and qualitative sense, the most common is that of ethics and anonymisation • Of c. 300 researchers in the School between 2008-2016 only 19 had deposited data in the UK Data Archive • This confirmed (in the eyes of the PhD intern) that making datasets available in open access or domain repositories is not necessarily a widespread practice nor of primary importance
  19. 19. Closing remarks • Internships instrumental in starting RDM conversations within Schools • Mixed economy of research culture, practice and behaviour • Speed and process of data generation, description and deposit varies • Are we surprised? Old habits die hard. • Build it and they will come! • From a service provision perspective there is no one-size-fits-all solution • With more emphasis placed on ‘as required’ service solutions
  20. 20. • Greater understanding needed of disciplinary and sub-disciplinary practice • Rethink outreach, formal and informal training strategies • Targeted approach, local data managers, 6FTEs • OA has taken c. 10 years to become embedded as common practice within the scholarly communication process • Arguably it is early days for RDM • We’ll await observations from other Schools with interest !
  21. 21. Questions! Special thanks to: Rodrigo Bacigalupe Cleo Davies James Jafali Natalie Lankester-Carthy Bridget Moynihan
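
Slides 8 to 10 refer to OAI-PMH end-points and to metadata being harvested by UKRDDS without showing what a harvest involves. The following is a minimal sketch of that step, not the project's actual harvester. It relies only on the standard OAI-PMH `ListRecords` verb and the `oai_dc` metadata format; the base URL, the sample response and the function names are hypothetical and chosen for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 protocol and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH end-point.

    A follow-up request carries only the resumption token; the protocol
    does not allow metadataPrefix to be sent alongside it.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text or "" for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)


# Hypothetical response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # Placeholder base URL, not the real DataShare or PURE end-point.
    print(list_records_url("https://repository.example.org/oai/request"))
    print(parse_records(SAMPLE))
```

A harvester would fetch each URL, parse the response, and repeat with the returned resumption token until none is returned. A record with an empty `titles` list is the kind of gap the interns' metadata quality checks were meant to catch before harvest.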
