Transcript of "Recommendation to the EU Hearing on Access to and Preservation of Scientific Information"
Thank you for this invitation to contribute to the formation of policy on this topic. Let me begin by quoting two scientists.
The first is the Spanish Nobel Prize winner Santiago Ramón y Cajal. He wrote: “A scholar’s positive contribution is measured by the sum of the original data that he contributes. Hypotheses come and go but data remain.” He said that in 1897 in his work Advice to a Young Investigator.
The second, the 21st-century data scientist Mark Parsons, advises us: “You are not finished until you have done the research, published the results, and published the data, receiving formal credit for everything.”
This highlights two key concepts for the preservation of scientific data: making data public and gaining recognition.
<<As an aside: on the matter of publishing data, my lawyers tell me to use the phrase ‘to make data available’ in order not to imply a new role for present-day publishers.>>
Two key challenges are therefore:
• how to make data available into the future, and for the future <<data need not be digital; not all that is digital is data, but it might become so>>, and
• how to provide the reward of recognition, to add motivation by carrot, not just the stick of compliance.
I would like to make three recommendations:
1. First, in seeking to preserve the record of science for the future, we should include research literature as an important part of that record. Both literature and data have evidential value for research, and the relationship between the two is also important.
** In order to keep to time, I would like to submit a separate written note on the relationship between research literature and research data, in which I contrast three types of data [reversing the labels I have used elsewhere, to give prominence to the data originating close to the instrument by which the data were generated]:
A. the source and reference databases that are curated in data centres and large-scale research ‘data factories’, from which datasets are often extracted and analysed by researchers
B. the datasets upon which the conclusions published in the literature are based
C. the supplementary data files that increasingly accompany enhanced e-publication in research literature.
Responsibilities for these different types of data differ.
2. My second recommendation is that ‘future-proofing’ requires that we make data available as though for researchers beyond our immediate peer group and for the machine-as-user, thereby ensuring that future researchers can use their software on these data for what can only be called ‘unimaginable purposes’. This means opening up the knowledge now locked in document formats like PDF so that the scientific literature becomes scientific data.
3. Third, when the Commission revisits the grand societal challenges that research can and should address, it should regard ‘assured and continuing access to digital content’ itself as a grand societal challenge, one to which Europe’s scientific and scholarly community can make, and is making, a globally significant and leading contribution.
It follows that we should not have a narrow view of science and scientific data.
What then of the preservation of research literature? In the days of print no one expected the publishers to have the last copy; it was for the libraries to exercise stewardship on behalf of future researchers. But with digital anytime/anyplace access, libraries do not easily have that opportunity – and it is not necessary for every library to have every copy on its digital shelf. There are better ways of behaving.
Fortunately several organisations are stepping forward to act as archiving agencies: LOCKSS, CLOCKSS, Portico and national libraries such as the BL and the Dutch KB are all working with publishers to take stewardship of e-journal and other digital content.
I’m pleased to report that the ISSN International Centre in Paris and EDINA have been working with those leading agencies in a JISC-funded project to create an online facility, peprs.org, to act as a monitor establishing who is looking after which e-journal, how, and with what terms of access. peprs.org is available now, in Beta form, as an online source about the ‘keepers’, and we are seeking help on establishing how it should be governed.
Research literature is of international concern. It requires international action. Our experience is that relying upon legal deposit legislation is not enough. For example, as one of 12 steward libraries, the University of Edinburgh is one of three secure Archive Nodes in Europe (*) on behalf of CLOCKSS, which has reached direct and international agreement with publishers. The EU and the Commission have an important part to play in ensuring that Europe has a lead role.
* The other two are Humboldt University (Berlin, Germany) and Università Cattolica del Sacro Cuore (Milan, Italy).
In closing I would like to say a few words about the ways in which the University of Edinburgh has been involved and the contribution we have been attempting to make, over the long term and for the long term.
The University is a research-led seat of learning, set in Scotland’s capital, renowned for the flourishing of the Scottish Enlightenment, and now contributing internationally to the UK and European research base. Its commitment to stewardship of research content was signalled from the start, as the Library came first, three years ahead of the start of what became the first civic university in 1583.
What now of its digital stewardship? In 1983, Edinburgh decided to set up the first University Data Library in the UK, having studied the growth of national social science data archives in Europe and institutional data libraries in North America. I was at the University at the time as a research statistician, designing and supervising sample surveys in a research centre that had begun to make its data available for others to use, engaging with practitioners. I was recruited to take charge of this new Data Library.
I learnt much about data archiving, but a great deal more about how to help researchers and students discover and obtain access to data produced by others. I learnt how to be demand-focussed.
That has helped in realising the plans of policy agencies like JISC, which work to serve research needs across the UK. This is done via a range of content and infrastructure services deployed by EDINA as a national academic data centre, and via the Digital Curation Centre, which takes the lead internationally in combining the two approaches of value-added data curation and long-term digital preservation. [David and I worked together during that set-up phase for the DCC.]
Edinburgh is the venue for the INSPIRE Conference next month, to which my colleagues are contributing five papers, including one on ‘continuing access’ for the spatially referenced data produced by public sector bodies across Europe.
David spoke of mandates. I am delighted to be able to announce that, earlier this month, the University approved an institutional policy to guide researchers and support staff in their management of digital research data, and it can now claim to be among the first to have done so.
http://www.ed.ac.uk/schools-departments/information-services/about/news/research-policy-news
Three of the policy measures are as follows:
· Research data of future historical interest, and all research data that represent records of the University, including data that substantiate research findings, will be offered and assessed for deposit and retention in an appropriate national or international data service or domain repository, or a University repository.
· Any data which is retained elsewhere, for example in an international data service or domain repository, should be registered with the University.
· Exclusive rights to reuse or publish research data should not be handed over to commercial publishers or agents without retaining the rights to make the data openly available for re-use, unless this is a condition of funding.
This policy recognizes that archival responsibility and digital preservation are something to think about at the outset of a project, not just at the end. It sets standards and defines the different responsibilities of the institution and the researcher, in particular the all-important PIs. It is being followed through with implementation via the training and services that many researchers will need, including provision of a central resilient data storage service.
***********
To re-state those three recommendations:
• include research literature as part of the record of science
• make data available for the machine-as-user
• propose ‘assured and continuing access to digital content’ as the next grand societal challenge