Library Preservation Challenge - Gatenby


By Janifer Gatenby. Presented 30 June 2008 to the ACRL Western European Studies Section

Published in: Education, Technology

    1. 1. A Library Preservation Challenge Managing the collective collection over time Janifer Gatenby, OCLC Western European Studies Section (ACRL WESS) June 30 2008
    2. 2. Fundamental Questions <ul><li>What are the challenges? </li></ul><ul><li>What are the responses to these challenges / problems? </li></ul><ul><li>Are they adequate in themselves? </li></ul><ul><li>Does the ensemble make coherent sense? </li></ul>
    3. 3. Challenge: We lose valuables <ul><li>Natural disaster, Deterioration, War, </li></ul><ul><ul><li>Lack of care or will </li></ul></ul><ul><li>We also lose valuable digital things </li></ul><ul><ul><li>Broken links, broken code, buried data, lack of description (context) </li></ul></ul>Only 20% of silent films survive. Theda Bara: 3.5 of 40 films. Fire
    4. 4. <ul><li>“When scores of books packed into shelves or in piles are set ablaze, the pages fuse and carbonize, turning into clinkers in the intense heat due to the lack of oxygen.” </li></ul><ul><li>Bosnia's National Library in Sarajevo, shelled and burned August 25-26, 1992 </li></ul>
    5. 5. Looking at the Challenges <ul><li>That which cannot be found…. is lost </li></ul><ul><ul><li>Increasing part of library collection is outside the building – digital and physical </li></ul></ul><ul><ul><li>Broken digital links </li></ul></ul><ul><ul><li>Obscurity – failure to rank highly </li></ul></ul><ul><li>That which is not regularly accessed…. may be lost </li></ul><ul><ul><li>E.g. PDFs that will not open because they contain outdated fonts </li></ul></ul><ul><ul><li>Openers may get lost </li></ul></ul><ul><ul><li>Little used resources may be weeded, in ignorance of their rare value </li></ul></ul>
    6. 6. Looking at the Challenges <ul><li>Limited space in library buildings </li></ul><ul><ul><li>Yet physical collection continues to grow </li></ul></ul><ul><ul><li>British Library storage increases by 12 kilometres a year </li></ul></ul><ul><ul><li>“ in 2006 291,920 new titles were published in the U.S., and the number of new books has increased nearly every year for the last decade, despite the spread of electronic publishing” Robert Darnton </li></ul></ul><ul><li>Increase in grey material </li></ul><ul><ul><li>Self publishing; no peer review </li></ul></ul><ul><ul><li>Increase in material of dubious merit </li></ul></ul><ul><ul><li>More difficult to detect resources of worth </li></ul></ul><ul><ul><li>How much should we collect and preserve? </li></ul></ul>
    7. 7. Preservation Level <ul><li>Regional & National </li></ul><ul><ul><li>Individually not unique but if lost would destroy regional landscape </li></ul></ul><ul><ul><ul><li>Streetscape Leiden, Netherlands </li></ul></ul></ul><ul><li>Global </li></ul><ul><ul><li>Works of international significance in their own right </li></ul></ul><ul><ul><ul><li>Rijnland Gemeenschaft, Leiden </li></ul></ul></ul><ul><ul><ul><li>Ship’s Carpenter’s building, Leiden </li></ul></ul></ul>
    8. 8. Responses <ul><li>Collective physical stores and services </li></ul><ul><li>Digitisation </li></ul><ul><li>Collective digital stores </li></ul><ul><ul><li>Beyond simple storage to true preservation </li></ul></ul><ul><li>Exposure </li></ul>
    9. 9. Collective physical stores <ul><li>Central independent physical stores </li></ul><ul><ul><li>National Repository Library of Finland, Kuopio (400 km NE of Helsinki) 1989- </li></ul></ul><ul><ul><li>Norwegian Repository Library </li></ul></ul><ul><ul><li>CTLes (Centre Technique du Livre de l’Enseignement Supérieur) </li></ul></ul><ul><ul><li>CASS (Cooperative Academic Store of Scotland) </li></ul></ul><ul><li>One library nominated for region </li></ul><ul><ul><li>University of Regensburg, Bavaria (following failures in North Rhine Westphalia and Baden-Württemberg) </li></ul></ul><ul><li>Central / Distributed hybrid </li></ul><ul><ul><li>UK Research Reserve Project (based on BL but retaining 2 to 3 other copies among the cooperative) </li></ul></ul>
    10. 10. Physical Stores – an adequate response? <ul><li>Rely on union catalogues – for collection management and exposure </li></ul><ul><ul><li>The Finnish repository receives uncatalogued material and catalogues it </li></ul></ul><ul><ul><li>UK Research Reserve: known problems </li></ul></ul><ul><ul><ul><li>Serial holdings in union catalogues are “patchy” </li></ul></ul></ul><ul><ul><ul><li>And not updated regularly enough </li></ul></ul></ul><ul><ul><ul><li>And don’t include retention policy (new ISO standard 20775 – Schema for Holdings – includes dynamic and policy information for query responses) </li></ul></ul></ul><ul><li>Rely on effective delivery systems & Digitisation on Demand (DOD) </li></ul><ul><ul><li>For DOD need copyright evidence </li></ul></ul><ul><ul><li>Lack comprehensive, simple international request and delivery </li></ul></ul>
    11. 11. Digitisation <ul><li>We cannot digitise everything in the foreseeable future </li></ul><ul><ul><li>CENL 2007 est. – only 1% digitised </li></ul></ul><ul><ul><li>2005 – Google Books 10 yr plan is only covering 35% of WorldCat </li></ul></ul><ul><ul><li>Jilovsky 2008 – 12% </li></ul></ul><ul><li>Digitising for indexing versus digitising for preservation </li></ul><ul><ul><li>Parts need re-doing </li></ul></ul><ul><li>Jean Noël Jeanneney & Google Books </li></ul><ul><ul><li>Cultural, commercial bias </li></ul></ul><ul><ul><li>Questionable quality </li></ul></ul><ul><ul><li>Access and re-use restrictions </li></ul></ul><ul><ul><li>Preservation not guaranteed </li></ul></ul>“Too many – often small – institutions or institutions with relevant collections of too modest a volume want to digitise too little at too high a price without being able to justify distributed costs of investment and management…” Erland Kolding Nielsen. RLG Programs comparison of digitisation contracts
    12. 12. Digitisation and Copyright <ul><li>OCLC Registry of Copyright Evidence </li></ul><ul><ul><li>Need more author information – nationality </li></ul></ul><ul><ul><li>Need definition of “due diligence” </li></ul></ul>“ The extension of the copyright limit from 50 to 70 years… was simply a catastrophe and an enormous obstacle to developing a relevant, adequate and comprehensive EDL with 20th century material of sufficient importance….The legal demands of investigating and finding the heirs..prohibitive” Erland Kolding Nielsen
    13. 13. Centralised and Collective Digital Stores <ul><li>The assets of the libraries </li></ul><ul><ul><li>Gallica (90,000 works, 80,000 images) </li></ul></ul><ul><ul><li>KB, Netherlands – 7 projects; 41 million pages </li></ul></ul><ul><ul><li>EU Digital Library </li></ul></ul><ul><ul><li>OCLC Digital Archive </li></ul></ul><ul><ul><li>Harvestable Institutional Repositories </li></ul></ul><ul><li>Saving the community’s assets </li></ul><ul><ul><li>CLOCKSS (licensed content on publisher’s sites) </li></ul></ul><ul><ul><li>Web archiving – saving the community’s web presence </li></ul></ul>
    14. 14. European Digital Library <ul><li>EU has funded infrastructure </li></ul><ul><li>Now warming to funding digitisation itself </li></ul><ul><li>Needs serious funding </li></ul>“..nothing wrong with the vision (EDL with > 12 million by 2012) but [financially] unrealistic” Erland Kolding Nielsen. “..A better balance between quality, quantity and costs has to be struck if libraries wish to digitise on a large scale…” Astrid Verheusen
    15. 15. Archiving licensed and research content <ul><li>Licensed digital content – e.g. CLOCKSS </li></ul><ul><li>Research data </li></ul><ul><ul><li>EU 7th Framework Programme (FP7) – taking research data to the forefront </li></ul></ul><ul><ul><li>JISC study: Keeping Research Data Safe: a cost model and guidance for UK Universities. Neil Beagrie, Julia Chruszcz and Brian Lavoie </li></ul></ul><ul><ul><li>RLG Programs, led from Scotland “New modes of scholarship” </li></ul></ul>
    16. 16. Lots of Copies Keep Stuff Safe <ul><li>“...let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.” Thomas Jefferson, February 18, 1791 (CLOCKSS web page) </li></ul><ul><li>CLOCKSS – US/UK cooperative </li></ul><ul><ul><li>Monitoring, testing, emulating </li></ul></ul><ul><ul><li>Indiana U, NYPL, Rice U, Stanford U, U. Virginia, OCLC, U. Edinburgh </li></ul></ul><ul><ul><li>Am. Chem. Assoc, AMA, Am. Physiol. Assoc, Nature, Sage, Taylor & Francis, Elsevier, Oxford UP, Springer, Wiley Blackwell </li></ul></ul>
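The "lots of copies" idea on the slide above can be illustrated with a small sketch: several sites hold a copy of the same object, the copies are compared by checksum, and a damaged minority copy is replaced from the majority. This is only loosely inspired by the principle; it is not the actual LOCKSS or CLOCKSS polling protocol, and the site names and the `audit_and_repair` function are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict


def digest(data: bytes) -> str:
    """Return a SHA-256 checksum used to compare copies."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(copies: Dict[str, bytes]) -> Dict[str, bytes]:
    """Replace any minority copy with the version held by a strict majority.

    Raises ValueError when no version is held by more than half of the
    sites, because a repair would then be a guess.
    """
    votes = Counter(digest(content) for content in copies.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(copies) / 2:
        raise ValueError("no majority version; cannot repair safely")
    good = next(c for c in copies.values() if digest(c) == winner)
    return {site: good for site in copies}


# Hypothetical sites; site_c holds a corrupted copy.
original = b"Volume 12, issue 3, article 7"
copies = {
    "site_a": original,
    "site_b": original,
    "site_c": b"Volume 12, issue 3, art\x00cle 7",
}
repaired = audit_and_repair(copies)
```

With three copies, one damaged copy can be outvoted; with only two, a disagreement cannot be resolved, which is one way to read the case for "lots" of copies.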
    17. 17. Long Term Archiving Systems <ul><li>e-Depot (KB Dutch National Library) </li></ul><ul><li>kopal (DNB German National Library) </li></ul><ul><li>LOCKSS (Stanford University) </li></ul><ul><li>Portico (Ithaka / JSTOR) </li></ul><ul><li>Outdated data formats </li></ul><ul><ul><li>Migration or emulation </li></ul></ul>“..migration during ingest adds significantly to costs with no guarantee that the chosen formats will solve future problems” Michael Seadle quoting LOCKSS
    18. 18. WARC – Web Archiving ISO 28500 <ul><li>Allows archiving of </li></ul><ul><ul><li>all control information from the harvesting protocol (e.g., request headers), not just response information. </li></ul></ul><ul><ul><li>payload content </li></ul></ul><ul><ul><li>Arbitrary metadata linked to other stored data (e.g., subject classifier, discovered language, encoding) </li></ul></ul><ul><ul><li>Data transformations linked to other stored data </li></ul></ul><ul><ul><li>Overly long records by truncation or segmentation </li></ul></ul><ul><li>Allows duplicate removal (to reduce storage) </li></ul><ul><li>Support for data compression and maintenance of data record integrity. </li></ul><ul><li>Standardisation led by Christian Lupovici of the Bibliothèque Nationale de France </li></ul>
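To make the list above concrete, here is a minimal sketch of how a harvested request and its response might be serialised as two WARC records, so that the control information is kept alongside the payload. It assumes the WARC/1.0 record layout (a version line, named header fields, a blank line, the content block, then two CRLFs). The helper `warc_record` and the `example.org` URL are illustrative, and real crawlers use dedicated libraries rather than hand-built records.

```python
import uuid
from datetime import datetime, timezone


def warc_record(record_type, target_uri, content_type, block, extra=None):
    """Serialise one WARC/1.0 record as bytes (illustrative helper)."""
    fields = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(block))),
    ]
    fields += list((extra or {}).items())
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in fields) + "\r\n"
    # Each record ends with two CRLFs after the content block.
    return head.encode("utf-8") + block + b"\r\n\r\n"


uri = "http://example.org/"  # hypothetical harvested page
request_block = b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"
response_block = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"

# The request record keeps the harvesting control information,
# the response record keeps the server's reply and payload.
request = warc_record("request", uri, "application/http; msgtype=request", request_block)
response = warc_record("response", uri, "application/http; msgtype=response", response_block)
archive = request + response
```

Other record types named in the standard, such as `metadata` and `conversion`, would carry the linked metadata and data transformations mentioned in the list.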
    19. 19. WARC Implementations <ul><li>Bibliothèque Nationale de France </li></ul><ul><ul><li>Coverage – covered by legal deposit – everything that would be consumed by the French public (not just the .fr domain) </li></ul></ul><ul><ul><li>120 terabytes; > 10 billion files </li></ul></ul><ul><li>Norway, Sweden </li></ul><ul><ul><li>Everything in the national domain </li></ul></ul><ul><li>International Internet Preservation Consortium (IIPC) </li></ul><ul><ul><li>Australia, Canada, Denmark, Finland, France, Iceland, Italy, Norway, Sweden, The British Library (UK), The Library of Congress (USA) and the Internet Archive (USA) </li></ul></ul>
    20. 20. Exposure <ul><li>Importance of maximum exposure </li></ul><ul><ul><li>Users’ expectations – users are playing internationally </li></ul></ul><ul><ul><li>Need web presence to achieve page rank in search engines </li></ul></ul><ul><ul><li>Growth of harvesting: OAI-PMH, sitemaps </li></ul></ul><ul><ul><li>Need to act on two fronts for maximum exposure </li></ul></ul><ul><ul><ul><li>Feeding external search engines and sites </li></ul></ul></ul><ul><ul><ul><li>Enriching the metadata at home </li></ul></ul></ul>“Visibility in a way can be seen to be indispensable in the survival of the item” … Werner Schwartz
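As an illustration of the harvesting mentioned above, the sketch below builds an OAI-PMH `ListRecords` request and reads record identifiers and the resumption token from one page of a response. It assumes OAI-PMH 2.0 conventions (the `verb`, `metadataPrefix` and `resumptionToken` arguments and the `http://www.openarchives.org/OAI/2.0/` namespace). The base URL and the sample XML are made up, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return ids, (token.strip() or None)


# Made-up response page from a hypothetical repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
ids, token = parse_page(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned token until a page arrives without one.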
    21. 21. Catalogue and Union catalogues <ul><li>Cataloguing backlogs still real </li></ul><ul><ul><li>NUKAT in Poland – just celebrated 1 million records </li></ul></ul><ul><ul><li>University of Warsaw </li></ul></ul><ul><ul><li>Bavarian State Library finished retrospective cataloguing 2004 </li></ul></ul><ul><li>“ Millions of digital documents and resources are uncatalogued and even unmapped and difficult to retrieve” Michael Gorman </li></ul>
    22. 22. Catalogues becoming richer <ul><li>Supporting improved discovery and exposure </li></ul><ul><ul><li>Replacing the browsing experience </li></ul></ul><ul><li>Supporting selection </li></ul><ul><li>Learning from online shopping experience (Amazon) </li></ul><ul><li>Importance of user contribution </li></ul><ul><ul><li>Need large sites to attract users </li></ul></ul><ul><li>Importance of data mining rather than data crafting </li></ul><ul><ul><li>Scalability </li></ul></ul>
    23. 23. VIAF links
    24. 24. Evaluative information <ul><li>Reading lists </li></ul><ul><ul><li>Ester (Estonia) </li></ul></ul><ul><ul><li>BibSys </li></ul></ul><ul><li>Statistics </li></ul><ul><ul><li>COBISS – Slovenian “Best Read Books” </li></ul></ul>Over 70,000 lists
    25. 25. Exposure of Library Resources <ul><li>End User interfaces to Union catalogues - examples </li></ul><ul><ul><li>NUKAT </li></ul></ul><ul><ul><li>SUDOC </li></ul></ul><ul><ul><li>GBV </li></ul></ul><ul><ul><li>BibSys </li></ul></ul><ul><ul><li>Libris </li></ul></ul><ul><ul><li>COBISS </li></ul></ul><ul><ul><li>Ester </li></ul></ul>
    26. 26. The same copy can be retrieved from: the Royal Library of the Netherlands web site; the Dutch National Union Catalogue web site; the European Library web site; WorldCat; Google Books
    27. 27. WorldCat - Statistics <ul><ul><li>Bibliographic records: 106.938.448 </li></ul></ul><ul><ul><li>Holdings: 1.284.762.384 </li></ul></ul><ul><ul><li>Articles: 54+ million (Medline, ERIC, GPO, British Library, ArticleFirst) </li></ul></ul><ul><ul><ul><li>New – agreements with H. W. Wilson and MLA </li></ul></ul></ul><ul><ul><li>Libraries: 60.000+ </li></ul></ul><ul><ul><li>National catalogues loaded: > 40 </li></ul></ul><ul><ul><ul><li>Australia, Czech Republic, Denmark, Finland, Germany, Poland, Sweden, United Kingdom public libraries ++ </li></ul></ul></ul>
    28. 28. WorldCat: Visits per month
    29. 29. Provenance and destination 2007: 110 million; 1 million referrals per month from Google Book Search
    30. 30. <ul><li>Regional & National </li></ul><ul><ul><li>Management of the collective collection </li></ul></ul><ul><ul><ul><li>Selection, weeding, claiming </li></ul></ul></ul><ul><ul><li>Provision of delivery services </li></ul></ul><ul><ul><li>Global physical delivery architecture </li></ul></ul><ul><li>Global </li></ul><ul><ul><li>Exposure of collections </li></ul></ul><ul><ul><ul><li>bibliographic metadata, </li></ul></ul></ul><ul><ul><ul><li>holdings, issue level holdings </li></ul></ul></ul><ul><ul><ul><li>statistics </li></ul></ul></ul><ul><ul><ul><li>reference query and answer pairs </li></ul></ul></ul><ul><ul><li>Exposure of services </li></ul></ul><ul><ul><li>Management of exposure data </li></ul></ul><ul><ul><ul><li>Links, mining, user contribution </li></ul></ul></ul><ul><ul><li>Curation of rare & priceless </li></ul></ul><ul><ul><li>Electronic delivery architecture </li></ul></ul>
    31. 31. Fundamental Questions <ul><li>What are the responses to these challenges / problems? </li></ul><ul><li>Are they adequate in themselves? </li></ul><ul><li>Does the ensemble make coherent sense? </li></ul><ul><li>? ? ? ? </li></ul><ul><li>Collective physical stores and services </li></ul><ul><li>Digitisation </li></ul><ul><li>Collective digital stores </li></ul><ul><ul><li>Beyond simple storage to true preservation </li></ul></ul><ul><li>Exposure </li></ul>
    32. 32. <ul><li>“..Electronic enterprises come and go. Research libraries last for centuries. Better to fortify them than to declare them obsolete, because obsolescence is built into the electronic media..” Robert Darnton </li></ul><ul><li>New York Review of Books 55, no. 10, June 12, 2008 </li></ul>
    33. 33. Thank you
    34. 34. Collection Analysis <ul><li>Compare collection – globally or with selected libraries </li></ul><ul><ul><li>By subject, classification, format, publication date range </li></ul></ul><ul><ul><li>Supports re-location, purchase, digitisation decisions </li></ul></ul>
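The comparison described on this slide can be sketched as plain set arithmetic over title identifiers. This is only an illustration of the idea, not how any particular collection analysis service works; the library names, identifiers and the `compare_collections` function are hypothetical.

```python
def compare_collections(mine, others):
    """Compare one library's titles with the holdings of selected libraries.

    mine:   set of title identifiers held locally
    others: dict mapping a library name to its set of title identifiers
    """
    held_elsewhere = set().union(*others.values())
    return {
        "unique": mine - held_elsewhere,   # held only here: candidates for retention or digitisation
        "shared": mine & held_elsewhere,   # held elsewhere too: candidates for re-location
        "gaps": held_elsewhere - mine,     # held only elsewhere: candidates for purchase
    }


# Hypothetical identifiers and libraries.
mine = {"t1", "t2", "t3"}
others = {"library_a": {"t2", "t4"}, "library_b": {"t3", "t5"}}
report = compare_collections(mine, others)
```

Restricting the input sets by subject, classification, format or publication date range before comparing would give the narrower views the slide lists.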
    35. 35. Registries of digital content <ul><li>“Catalogue information will increase the use of digitised works and their chance to be preserved” … Werner Schwartz </li></ul><ul><li>OCLC / DLF Registry of Digital Masters </li></ul><ul><ul><li>Agreement with LIBER / CERL – Consortium of European Research Libraries </li></ul></ul><ul><ul><li>Scope of harvesting in Europe is still insufficient </li></ul></ul><ul><ul><li>Data not in compliance with DLF guidelines – being converted </li></ul></ul>