eScience and Digital Preservation, a presentation to the Association for Information Science and Technology (ASIST) conference in November 2004 in Rhode Island, USA, is the sixth of 12 presentations I have selected to mark 20 years in Digital Preservation.

It is closely related to the previous slideshare for May on the Jisc continuing access and digital preservation strategy but focuses just on the science component.

This is one I wasn’t able to present in person but it was kindly delivered by Gail Hodge.

My brief for the presentation was "thoughts or citations you have for the impact of e-science, particularly the GRID, on information management, particularly archiving, preservation and long-term access."

It is a short presentation of 15 slides covering collection-based science, the Grid, data publishing, and the background and rationale for the Digital Curation Centre (launched in the UK just two weeks earlier).

It is a snapshot of the key issues in 2004. It is interesting to contrast it with what one would write 10 years on and to ponder the progress made.

  1. Supporting further and higher education • E-science and Digital Preservation • Neil Beagrie, BL/JISC Partnership Manager • ASIST Annual Conference, Nov 2004, E-science Panel
  2. Overview • Apologies for absence (and thanks to Gail for presenting!) • Trends and implications – Data growth – e-Research and collection-based science – The Grid – New publishing roles for datasets – Digital preservation • Digital Curation Centre – What the funders are looking for
  3. Growth of Scientific Data and Data Curation • In the next 5 years e-Science will produce more data than has been collected in the whole of human history • Data growth – Protein Data Bank (1972-2003)
  4. Implications • Core funding for institutions will not grow in line with information growth • Need for more automation and tools • Need for new shared services – lower the curation cost for disciplines – accelerate knowledge transfer • Significant need for R&D and investment now to prepare for this
  5. Collection-based Science (1) • National Science Foundation Advisory Panel on Cyberinfrastructure – “The importance of data in science and engineering continues on a path of exponential growth; some even assert that the major science driver of high end computing will soon be data… Collecting, organizing, storing, and providing access to vast quantities of data and other information (such as scholarly publications) is becoming as important as simulation has been and will likely grow faster over the next decade.”
  6. Collection-based Science (2) • NSF Advisory Panel on Cyberinfrastructure • “To succeed NSF must… ensure that the exponentially growing amounts of data are collected, curated, managed, and stored for broad long-term access by scientists everywhere.” • “Data Repositories... Providing access to observational and other data entails far more than attaching a lot of disks to a server that is on the Internet.” • “R&D centers could be established for addressing common issues… there may be advantages to grouping applied research, development, and operations within a common organization and geographic location.”
  7. The Grid • ‘The Grid is a software infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources’ (Foster, Kesselman and Tuecke) • Includes computational systems, data storage resources, digital libraries and specialized facilities
  8. e-Government and the Grid • ‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’ (Tony Blair, 2002) • Implications for digital preservation – Grids could enable better replication, preservation and access
  9. Data Publishing • In some subjects databases are wholly or partly replacing journal publications as a medium of communication – These databases are built and maintained with a great deal of human effort – Scale of effort and supporting infrastructure varies – may have discipline-wide scope and dedicated “curators” – They may not contain primary data. Sometimes just value-added annotation/metadata – They borrow/exchange extensively, and refer to other databases and journal articles – May have evolved from supporting/internal facing role to publishing to external audiences
  10. Ordnance Survey • Publication in paper editions at different scales since 1791 • Computerisation first designed to assist in workflow of paper publication • OS National Topographic Database (NTD) • For large-scale mapping, paper editions now discontinued. NTD is the map – continuously updated and printed remotely on demand.
  11. Digital Preservation • “digital documents last forever – or five years, whichever comes first” (Jeff Rothenberg 1997) • BBC Domesday System
  12. Organisational and technical challenges • “…I have data files from projects from years ago which are on disks I no longer have a drive for on computers I no longer have access to or are no longer made or the software/operating system changes would make it extremely difficult to access any more… the nature of research work means a lot of short-term researchers over the years… Also as PIs move around and collaborate with many people in other organisations it is pretty difficult to go back more than a few years with confidence that data will be adequately archived.” (Interview quote from UK-based Professor cited in JISC Audit of e-Science Curation report)
  13. Digital Curation Centre • Joint funding JISC and e-science core programme • Three year initial funding – $6m • Awarded to Consortium of Edinburgh, Glasgow, CCLRC, UKOLN • Not a data centre – will provide generic support services and research • DCC officially launched 5th November 2004
  14. What the DCC funders are looking for • Research into data curation and preservation issues • Advisory services in best practice and a repository for tools, software and documentation • DCC is not being funded to set up its own data repository • DCC will need to work with key data centres, repositories and libraries to engage the relevant communities
  15. Further information • Digital Curation Centre • The Continuing Access and Digital Preservation Strategy for the UK Joint Information Systems Committee (JISC)