Why does research data matter to libraries?


Presented at the Research Data Network workshop, St Andrews, 30 Nov 2016

Published in: Education
  1. 1. Why does Research Data Matter to Libraries? • John MacColl, University Librarian & Director of Library Services, University of St Andrews • Jisc Research Data Network Meeting, University of St Andrews, 30 November 2016
  2. 2. Happy St Andrew’s Day!
  3. 3. Three sources for this talk
  4. 4. Back to basics • Is research data an output? Does it validate a researcher? • What is wrong with the status quo? Peer review is sufficient? • Libraries deal in graspable objects. • We have more than enough to do, reinventing ourselves as managers of study spaces. • But – research data is in the library! Why?
  5. 5. Two new requirements • Need on campuses for research profile management (pragmatic; cynical?). • The ‘integrity of science’ argument (purist; Utopian?).
  6. 6. Ithaka UK Survey of Academics 2015 • Substantial increase in number of academics who preserve their data in an institutional or other online repository. • Corresponding decrease in number preserving their data themselves using commercially or freely available software. • Humanists and social scientists more likely to build up qualitative data.
  7. 7. Ithaka UK Survey of Academics 2015 • Scientists more likely to collect scientific, quantitative or computational data. • RLUK academics more likely to build up scientific or computational data. • Non-RLUK academics more frequently build up qualitative data. • 80% organise qualitative data on their own computer. • 30% organise or manage data using institutional or cloud storage.
  8. 8. Ithaka UK Survey of Academics 2015 • Medics and vets use institutional storage the most (53%). • < 5% of respondents utilise their library in organising or managing data. • 20% find it difficult to preserve data long-term. • 50% of respondents find freely available software highly valuable in managing or preserving research data, media or images.
  9. 9. Ithaka UK Survey of Academics 2015 • Next most valuable is a disciplinary or departmental repository at their institution. • Then their IT dept. • Then their library. • When projects end, 60% preserve their research data themselves using commercially or freely available software.
  10. 10. Ithaka UK Survey of Academics 2015 • This has substantially decreased since 2012, while the number using an institutional or other online repository has substantially increased. • Number using their library for preservation has substantially increased. • Humanists most likely to do preservation themselves.
  11. 11. Ithaka UK Survey of Academics 2015 • RLUK respondents more likely to use their university’s or another online repository. • UK respondents more likely than their US peers to use a repository, and less likely to self-preserve. • EPSRC mandate?
  12. 12. Funder influence • A good thing? • Librarians say yes! Drives behaviours that we know don’t come from the ‘integrity of science’ argument. • Academics find it less welcome: this is where we come in? • But there is a danger of being the compliance police. • Alliance with research policy offices critical.
  13. 13. Integrity of science • Public scrutiny of evidence. • Data must be accessible, intelligible, assessable, usable. • Therefore requires metadata. • ‘We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable.’
  14. 14. Recommendations • Universities should recognise data communication for progression & reward. • Develop a data strategy and curation capacity. • Be open by default. • Research Councils should include costs of data preparation and metadata in costs of research. • Journals should require that underpinning data is accessible as a condition of publication.
  15. 15. However … Jim Scott • Professor in the schools of Physics & Astronomy and Chemistry. • Pioneering research on nano-memories used in smartcards (industry worth £100m). • Unesco medal in October. • ‘I thought scientists were marching together into the future to help humanity. That is not true; we are as competitive and unscrupulous as used-car dealers.’
  16. 16. Back to the library • Now we manage from the inside out, as well as the outside in. • Why? Research reputation, competitiveness, accountability (research profile management). • Libraries now have a potential relationship with all academic and research staff. • New role: to capture the local. • Libraries used to reflect scholarly disciplines back to themselves. • Still do so, but ensuring that our own institutions’ contributions are maximised. Much less than before is left to chance.
  17. 17. Change is gonna come … • Public accountability is driving the changes to the system that scientific integrity should be making, or should already have made. • We see this also in OA publishing – whether RCUK (gold) or Hefce/REF (green). • Disappointing? Let’s not forget: • The academy is intensely conservative. • How many years has the OA community striven for change? • Whatever the reason, we need systems to help us achieve capture. • So Jisc’s work on shared services for research data management is good news! • Libraries are in this business, and need robust infrastructure for data storage and management, as well as expertise.
  18. 18. The challenges of data for librarians • Are books and articles ‘data’? • Librarians consider data as a primary source; books and articles as secondary (unless constituted as ‘corpora’); catalogues, indexes and search engines as tertiary. • Data are difficult to separate from the software, equipment, documentation, and knowledge required to use them. There is an elephant in the room. • When is data data? • The reusability conundrum: experimental data can be recreated, whereas observational data cannot.
  19. 19. The challenges of data for librarians • Hidden data: in paper and digital form, in both public and private hands (offices, labs, freezers). • Uncovering historic data is of use for digital humanities. • Disciplines have their own ontologies: how to cross-search? • Disciplines resist simplistic mirroring behaviour. • Chemistry mixes open and closed models. • Biotechnologists patent first and publish later.
  20. 20. The challenges of data for librarians • High-energy physicists are open with publications but do not make data publicly available. • Biotechnologists restrict publication but openly deposit genome and protein data. • Scholars in fields that replicate experiments or draw on observational data are positive about data sharing. • Scholars who only work with their own data won’t standardise their data management practices. • Where data preparation is labour-intensive, scholars are less inclined to share.
  21. 21. The challenges of data for librarians • Scarce or novel data (eg new cell lines) less likely to be shared because they are labour-intensive and may yield subsequent data and publications. • Market value affects willingness to share. • So does sensitivity (eg human subject records). • And so does the decision on when data is data (verification). • Graduate students in high-paradigm fields are more likely to use the same tools and information resources as their advisors. • Students in low-paradigm fields are more likely to seek new information tools and techniques.
  22. 22. The challenges of data for librarians • Documents don’t always tally with their underpinning data: terminology is simplified; data collection may be compromised; reporting can be accelerated due to pressure from funders; formatting requirements can affect presentation. • The relationship can be fuzzy. • Is this falsification?
  23. 23. Libraries and research data • Librarians have had an over-simplified view of data and its infrastructure. • They don’t know what data is (are); but nor do academics outside of their own discipline. • Royal Society: ‘A realistic means of making data open to the wider public needs to ensure that the data that are most relevant to the public are accessible, intelligible, assessable and usable for the likely purposes of non-specialists. The effort required to do this is far greater than making data available to fellow specialists.’ • A role here for libraries?
  24. 24. Evolving libraries • Research libraries - a four-element brand: • Publications • Space • Special • Capture • Can libraries develop a layer of meta-understanding, sufficient to describe types and scopes and methods and characteristics at a general level, and so do for data what they have done for many years for the world of publications? • Can they move from an essentially passive role as capture agents in response to funder requirements, to becoming active, trusted managers of scholarly data?
  25. 25. Thank you! Have a good meeting.