Trm Introduction


DPE Training materials

Published in: Business

  1. 1. Digital Preservation – An Introduction. DPE/Planets/nestor training event, October 1st–5th, 2007, Vilnius, Lithuania. Stefan Strathmann, Göttingen State and University Library. nestor - Network of Expertise in Long-Term Storage of Digital Resources
  2. 2. Session Outline <ul><li>10:00 – 10:45 Lecture </li></ul><ul><li>10:45 – 11:00 Discussion </li></ul><ul><li>11:00 – 11:30 Coffee Break </li></ul><ul><li>11:30 – 12:30 Group Work </li></ul><ul><li>12:30 – 13:00 Groups present their results </li></ul><ul><li>13:00 – 13:15 Summary discussion </li></ul>
  3. 3. Key Questions <ul><li>What is digital preservation? </li></ul><ul><li>Why is digital preservation important? </li></ul><ul><li>What are the big challenges? </li></ul><ul><li>What are the relevant standards, initiatives, programs? </li></ul>
  4. 4. Table of Contents <ul><li>General Introduction </li></ul><ul><li>Relevant Aspects </li></ul>
  5. 5. Digital Preservation – The Challenge <ul><li>Hardware and software become obsolete within very short periods of time </li></ul><ul><li>Incompatibility between different versions of hardware and software </li></ul><ul><li>Fading knowledge of how to use older hardware and software </li></ul><ul><li>Aging and decaying storage media </li></ul><ul><li>Loss of information </li></ul>
  6. 6. Example – Loss of Information [Figure: comparison of Acrobat 5 and Acrobat 7]
  7. 7. UNESCO <ul><li>Charter on the Preservation of Digital Heritage, October 15th, 2003 </li></ul><ul><li>Article 1: </li></ul><ul><li>“The digital heritage consists of unique resources of human knowledge and expression.” </li></ul><ul><li>“Many of these resources have lasting value and significance, and therefore constitute a heritage that should be protected and preserved for current and future generations.” </li></ul>
  8. 8. UNESCO Charter: Articles <ul><li>Article 2 – Access to the digital heritage </li></ul><ul><li>Article 3 – The threat of loss </li></ul><ul><li>Article 4 – Need for action </li></ul><ul><li>Article 5 – Digital continuity </li></ul><ul><li>Article 6 – Developing strategies and policies </li></ul><ul><li>Article 7 – Selecting what should be kept </li></ul><ul><li>Article 8 – Protecting the digital heritage </li></ul><ul><li>Article 9 – Preserving cultural heritage </li></ul><ul><li>Article 10 – Roles and responsibilities </li></ul><ul><li>Article 11 – Partnerships and cooperation </li></ul>
  9. 9. Digital Resources <ul><li>New forms of information: </li></ul><ul><ul><li>digital production (digitization, born digital, only digital) </li></ul></ul><ul><ul><li>digital publication (only digital, object features like retrieval) </li></ul></ul><ul><ul><li>digital distribution (portal, value chain) </li></ul></ul><ul><li>Rapid change of technology </li></ul>
  10. 10. Digital Long-Term Preservation <ul><li>Digital preservation consists of processes that ensure that digital objects remain </li></ul><ul><ul><li>accessible, </li></ul></ul><ul><ul><li>(re-)usable and </li></ul></ul><ul><ul><li>understandable </li></ul></ul><ul><li>in the future. </li></ul><ul><li>Digital preservation has to ensure that future software and hardware tools retain the authenticity, integrity, and reliability of the digital object. </li></ul>
  11. 11. Digital Preservation – A Definition <ul><li>What is meant by “digital long-term preservation” or “digital preservation”? </li></ul><ul><li>Definition by Ute Schwens / Hans Liegmann (DNB/nestor): </li></ul><ul><li>“In terms of preserving digital resources, ‘long-term’ does not mean issuing a guarantee for five or fifty years, rather the responsible development of strategies which can cope with the constant changes brought about by the information market.” </li></ul>
  12. 12. Preservation Approaches <ul><li>Migration </li></ul><ul><li>Emulation </li></ul><ul><li>Normalisation </li></ul><ul><li>Refreshing </li></ul><ul><li>Digital Archaeology </li></ul><ul><li>Hardware Museum/Technology Preservation </li></ul><ul><li>Print to Paper or Microfilm/fiche or barcode </li></ul><ul><li>... </li></ul>
  13. 13. Digital Information – An Estimate <ul><li>UC Berkeley's School of Information Management and Systems: How much Information? 2003 </li></ul><ul><li>Analysis of the year 2002 to estimate the yearly increase in new (digital and analog) information. </li></ul><ul><li>Finding: 30% increase in digital information per year </li></ul><ul><li>See: </li></ul>
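
To give a feeling for what the 30% figure above implies, the following worked calculation assumes (this is an assumption, not a claim from the study) that the rate stays constant over time. The symbols $V_0$ (volume today), $V(t)$ (volume after $t$ years) and $t_2$ (doubling time) are introduced here only for illustration.

```latex
\begin{align*}
V(t) &= V_0 \cdot 1.3^{\,t} \\
t_2  &= \frac{\ln 2}{\ln 1.3} \approx \frac{0.693}{0.262} \approx 2.6 \text{ years} \\
V(10) &= V_0 \cdot 1.3^{10} \approx 13.8\, V_0
\end{align*}
```

Under that assumption the volume of digital information would double roughly every two and a half years and grow almost fourteenfold within a decade.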
  14. 14. Heterogeneity - Materials <ul><li>Journals and monographs </li></ul><ul><ul><li>retrodigitized material </li></ul></ul><ul><ul><li>genuine digital material </li></ul></ul><ul><li>Web Documents, Web Server </li></ul><ul><li>Preprint-Server, theses, e-Proceedings, etc. </li></ul><ul><li>Primary data, research data, raw data </li></ul><ul><li>Emails, blogs, etc. </li></ul><ul><li>Film, Music, Multimedia etc. </li></ul><ul><li>... </li></ul>
  15. 15. Heterogeneity: Formats <ul><li>Depends on subject, e.g. </li></ul><ul><ul><li>Mathematics (TEX, PS, ...) </li></ul></ul><ul><ul><li>Geography (GIS) </li></ul></ul><ul><ul><li>... </li></ul></ul><ul><li>Multimedia, e.g. </li></ul><ul><ul><li>Animated WWW pages </li></ul></ul><ul><ul><li>Interactive objects in e-Learning </li></ul></ul><ul><ul><li>... </li></ul></ul><ul><li>Different versions in e.g. PDF, TEX, ... </li></ul><ul><li>Presentation Format / Preservation Format </li></ul>
  16. 16. Heterogeneity - General <ul><li>Metadata formats (Dublin Core, MODS, PREMIS, MIX, ...) </li></ul><ul><li>Exchange formats (XML, METS, XML/RDF, SOAP, ...) </li></ul><ul><li>Controlled vocabulary systems (Ontologies, Taxonomies, ...) </li></ul><ul><li>Architecture, Protocols </li></ul><ul><li>... </li></ul><ul><li>Standardisation & Interoperability </li></ul>
  17. 17. Dealing with the Heterogeneity <ul><li>Preservation policy </li></ul><ul><li>Cooperation: international/national </li></ul><ul><li>Cooperation: cross-domain (e.g. museums, archives, research institutes, commercial organisations, ...) </li></ul><ul><li>Redundancy of digital repositories explicitly desired </li></ul><ul><li>Cooperative management/administration of distributed digital archives/repositories </li></ul>
  18. 18. ... <ul><li>Coordinated cooperation needed between: </li></ul><ul><ul><li>producers of digital objects (e.g. scientists) </li></ul></ul><ul><ul><li>providers (e.g. libraries) </li></ul></ul><ul><ul><li>distributors (e.g. publishers, database hosts) </li></ul></ul><ul><li>Use of international standards (e.g. DC, OAI, OAIS, METS) </li></ul>
  19. 19. OAIS Model – Example for a Standard [Diagram: a producer submits a SIP to Ingest; AIPs are held in Archival storage; Access delivers a DIP to the consumer; Data management, Administration and Preservation Planning are the remaining functional entities. SIP = Submission Information Package, AIP = Archival Information Package, DIP = Dissemination Information Package]
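
The package flow named on the OAIS slide (SIP in, AIP stored, DIP out) can be sketched in a few lines. This is a toy illustration only: the class `Archive`, its methods, the identifier scheme and the metadata keys are hypothetical names chosen for this example and are not defined by the OAIS standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict


@dataclass
class SIP:
    """Submission Information Package: what the producer hands over."""
    producer: str
    files: Dict[str, bytes]


@dataclass
class AIP:
    """Archival Information Package: what the archive keeps in storage."""
    identifier: str
    files: Dict[str, bytes]
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class DIP:
    """Dissemination Information Package: what the consumer receives."""
    identifier: str
    files: Dict[str, bytes]


class Archive:
    """Hypothetical in-memory archive with an ingest and an access function."""

    def __init__(self) -> None:
        self._storage: Dict[str, AIP] = {}

    def ingest(self, sip: SIP) -> str:
        # Turn the SIP into an AIP by adding an identifier and basic metadata.
        identifier = f"aip-{len(self._storage) + 1:04d}"
        metadata = {
            "producer": sip.producer,
            "ingested": datetime.now(timezone.utc).isoformat(),
        }
        self._storage[identifier] = AIP(identifier, dict(sip.files), metadata)
        return identifier

    def access(self, identifier: str) -> DIP:
        # Derive a DIP from the stored AIP; internal metadata stays in the archive.
        aip = self._storage[identifier]
        return DIP(aip.identifier, dict(aip.files))
```

A producer would call `ingest` with a `SIP`, and a consumer would later call `access` with the returned identifier. The other functional entities on the slide (Data management, Administration, Preservation Planning) are deliberately left out of this sketch.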
  20. 20. Relevant Aspects <ul><li>Technical Issues / Obsolescence </li></ul><ul><li>Identification & Validation of Formats </li></ul><ul><li>Preservation Metadata </li></ul><ul><li>Preservation Policy </li></ul><ul><li>Legal Aspects </li></ul><ul><li>Trusted Repositories </li></ul>
  21. 21. Technical Issues / Obsolescence <ul><li>Digital information is stored as a bit stream on physical media => Preservation of the bit stream! </li></ul><ul><ul><li>Storage media types change quickly and are subject to obsolescence </li></ul></ul><ul><ul><li>Storage media are unstable and can degrade quickly </li></ul></ul><ul><li>Keeping the bit stream accessible </li></ul><ul><ul><li>Migration (Medium and Format) </li></ul></ul><ul><ul><li>Emulation (Hardware and Software) </li></ul></ul><ul><ul><li>... </li></ul></ul>
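
One common way to check that a bit stream has survived storage or a medium migration unchanged is to compare checksums. The sketch below assumes SHA-256 as the checksum (the slides do not name one) and works on in-memory bytes to stay self-contained; the function names `sha256_of` and `verify_copies` are hypothetical.

```python
import hashlib
from typing import Dict


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_copies(reference_digest: str, copies: Dict[str, bytes]) -> Dict[str, bool]:
    """Report for each named copy whether its digest matches the reference."""
    return {name: sha256_of(data) == reference_digest for name, data in copies.items()}


original = b"digital object"
reference = sha256_of(original)  # recorded once, at ingest time

copies = {
    "tape": original,            # intact copy
    "disk": b"digital 0bject",   # one character silently changed
}
print(verify_copies(reference, copies))  # {'tape': True, 'disk': False}
```

A matching checksum only shows that the bits are unchanged. It says nothing about whether the format can still be rendered, which is where migration and emulation come in.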
  22. 22. Formats: Identification & Validation <ul><li>Examples: </li></ul><ul><ul><li>Document - DOC, HTML </li></ul></ul><ul><ul><li>Raster Images - TIFF, PNG, JPEG </li></ul></ul><ul><ul><li>Structured graphics - CAD, VSD </li></ul></ul><ul><ul><li>Audio - WAV, MP3, MIDI </li></ul></ul><ul><ul><li>Video - MPEG, AVI </li></ul></ul><ul><ul><li>Databases - DBF, MDB </li></ul></ul><ul><ul><li>Raw data </li></ul></ul><ul><ul><li>Collections - tar, zip </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>We are dealing with lots of different formats! </li></ul><ul><li>File format registries may help to handle the heterogeneity. </li></ul>
  23. 23. File Format Registries: Use Cases <ul><li>Identification: I have a digital object; what format is it? </li></ul><ul><li>Validation: I have an object purportedly of format F; is it? </li></ul><ul><li>Transformation: I have an object of format F, but need G; how can I produce it? </li></ul><ul><li>Characterization: I have an object of format F; what are its features? </li></ul><ul><li>Risk assessment: I have an object of format F; is it at risk of obsolescence? </li></ul><ul><li>Delivery: I have an object of format F; how can I render it? </li></ul><ul><li>(Abrams, Seaman: Towards a global digital format registry. IFLA 2003) </li></ul>
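
The identification use case can be illustrated with a deliberately small sketch that matches the leading bytes of an object against a few well-known file signatures. The table `SIGNATURES` and the functions `identify` and `signature_matches` are hypothetical names for this example. A real format registry or validation tool holds far more information and checks far more than the first few bytes.

```python
from typing import Optional

# A handful of well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
    (b"PK\x03\x04", "ZIP"),
]


def identify(data: bytes) -> Optional[str]:
    """Identification: return a format name, or None if no signature matches."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def signature_matches(data: bytes, claimed_format: str) -> bool:
    """A very weak first step towards validation: does the signature fit the claim?"""
    return identify(data) == claimed_format


print(identify(b"%PDF-1.4\n..."))                 # PDF
print(signature_matches(b"%PDF-1.4\n...", "PNG"))  # False
```

Full validation in the sense of the slide means checking an object against the complete format specification, which this signature test does not attempt.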
  24. 24. Format validation with JHOVE <ul><li>JSTOR/Harvard Object Validation Environment </li></ul><ul><ul><li>see: </li></ul></ul><ul><li>The concept of representation format, or type, permeates all technical areas of digital repositories. Policy and processing decisions regarding object ingest, storage, access, and preservation are frequently conditioned on a per-format basis. In order to achieve necessary operational efficiencies, repositories need to be able to automate these procedures to the fullest extent possible. </li></ul><ul><li>How much technical metadata do I need? </li></ul>
  25. 25. Preservation Metadata <ul><li>All preservation strategies (migration, emulation, etc.) depend on the creation, capture and maintenance of suitable metadata: </li></ul><ul><ul><li>"Preserving the right metadata is key to preserving digital objects" (ERPANET Briefing Paper, 2003) </li></ul></ul><ul><ul><li>"It's all about metadata" (Cedars project manager, ca. 2000) </li></ul></ul>
  26. 26. Preservation Metadata <ul><li>Specific preservation metadata are necessary to ensure that information can be accessed in the future, e.g. metadata about: </li></ul><ul><ul><li>Provenance </li></ul></ul><ul><ul><li>Structure </li></ul></ul><ul><ul><li>File Format(s) </li></ul></ul><ul><ul><li>Technical Environment </li></ul></ul><ul><ul><li>Rights </li></ul></ul><ul><li>Much of the necessary metadata can be extracted automatically, e.g. via tools like JHOVE </li></ul>
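
To make the five metadata categories listed above concrete, here is a minimal sketch of a record that holds them and serialises to JSON. The class `PreservationMetadata`, its field names and the example values are all invented for illustration. They do not follow PREMIS or any other schema mentioned in these slides, and in practice several fields would be filled by extraction tools rather than by hand.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class PreservationMetadata:
    """Hypothetical record covering the categories named on the slide."""
    identifier: str
    provenance: str              # where the object came from, and what happened to it
    structure: List[str]         # how the parts of the object fit together
    file_formats: List[str]      # format(s) of the files in the object
    technical_environment: str   # what is needed to render the object
    rights: str                  # what the archive is allowed to do with it


record = PreservationMetadata(
    identifier="example-0001",
    provenance="Deposited by the author; migrated once from an older medium",
    structure=["title page", "chapter 1", "chapter 2"],
    file_formats=["PDF"],
    technical_environment="Any PDF viewer",
    rights="Archive may copy and migrate for preservation purposes",
)

print(json.dumps(asdict(record), indent=2))
```

Keeping such a record next to the object, in a plain and well-documented serialisation, is what lets later migration or emulation steps find out what they are dealing with.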
  27. 27. Preservation Policy <ul><li>What do you want to preserve? </li></ul><ul><li>Why do you want to preserve? </li></ul><ul><li>How do you want to render an object in the future? </li></ul><ul><li>Furthermore ... </li></ul><ul><ul><li>Documentation </li></ul></ul><ul><ul><li>Policy for short-term preservation </li></ul></ul><ul><ul><li>Policy for long-term preservation </li></ul></ul><ul><ul><li>… </li></ul></ul>
  28. 28. Preservation Policy <ul><li>What kind of digital objects is the repository responsible for? </li></ul><ul><ul><li>Fixed format texts, images, web resources, complex digital objects, datasets, … </li></ul></ul><ul><li>What do you want to render in the future? </li></ul><ul><ul><li>Keep the original? </li></ul></ul><ul><ul><li>What is the original? </li></ul></ul><ul><ul><li>Offer extended functionalities? </li></ul></ul>
  29. 29. Preservation Policy <ul><li>What are the significant properties of the object? </li></ul><ul><ul><li>Appearance (layout, colour, font size, etc) </li></ul></ul><ul><ul><li>Behaviour (functionality, interaction, etc) </li></ul></ul><ul><ul><li>Structure (chapter, section, etc) </li></ul></ul><ul><ul><li>Content (text, video, audio, etc) </li></ul></ul><ul><ul><li>Context (cross-references, etc) </li></ul></ul><ul><li>How do you want to provide access? </li></ul><ul><ul><li>Designated User Community </li></ul></ul><ul><ul><li>Options for the user? </li></ul></ul>
  30. 30. But Policies/Strategies are not enough ... <ul><li>… we need tools that </li></ul><ul><ul><li>help choose & perform a strategy </li></ul></ul><ul><ul><li>make the strategy possible (emulators, migration tools) </li></ul></ul><ul><ul><li>maintain the link between originals and conversions </li></ul></ul><ul><ul><li>enable interoperability and co-operation between different repositories/archives </li></ul></ul><ul><li>Tools have to be implemented in the archiving system and archiving workflow. </li></ul><ul><li>Preservation has to come to practice! </li></ul>
  31. 31. Legal Aspects <ul><li>Copyright and other intellectual property rights (IPR) have a substantial impact on digital preservation </li></ul><ul><li>Preservation of digital materials depends on a range of strategies, each of which has implications for IPR in those materials </li></ul><ul><li>Consideration may need to be given not only to content but also to any associated software </li></ul><ul><li>Obtaining specific permissions may be very challenging, e.g. for web archiving or digital art </li></ul>
  32. 32. Legal Aspects: Examples <ul><li>What will be covered by legal deposit? </li></ul><ul><ul><li>How much is served from within the country? </li></ul></ul><ul><li>Strategy </li></ul><ul><ul><li>The national publication archive </li></ul></ul><ul><ul><li>How are roles/responsibilities shared? </li></ul></ul><ul><ul><li>Web archiving initiatives (e.g. European Web Archive) </li></ul></ul><ul><ul><li>Development of electronic deposit systems </li></ul></ul><ul><li>International collaboration </li></ul><ul><ul><li>Other international repositories </li></ul></ul><ul><ul><li>Levels of redundancy </li></ul></ul><ul><li>Access restrictions </li></ul>
  33. 33. ..., but <ul><li>Digital preservation is often a legal grey area not yet understood or considered by legislators </li></ul><ul><li>Lack of legal certainty should not prevent digital preservation actions </li></ul><ul><li>Take action to manage risks </li></ul>
  34. 34. Trusted Repositories <ul><li>Why trusted repositories? </li></ul><ul><ul><li>It is very easy to manipulate digital information </li></ul></ul><ul><ul><li>The users need to trust the accessed information </li></ul></ul><ul><ul><li>Nobody is able to preserve everything – distributed preservation management </li></ul></ul><ul><li>Criteria of trusted repositories (e.g. TRAC, nestor) </li></ul><ul><ul><li>Administrative responsibility </li></ul></ul><ul><ul><li>Financial sustainability </li></ul></ul><ul><ul><li>Technical security </li></ul></ul><ul><ul><li>... </li></ul></ul>
  35. 35. Thank you very much for your attention! Comments? Questions? Stefan Strathmann Göttingen State and University Library [email_address]
  36. 36. Exercise <ul><li>Which digital preservation issues are relevant in the context of your digital collection? How are they relevant? </li></ul><ul><li>Data creation? </li></ul><ul><li>Data management (collection management)? </li></ul><ul><li>Data storage? </li></ul><ul><li>Data documentation and description? </li></ul><ul><li>Data preservation? </li></ul><ul><li>Data use? </li></ul><ul><li>Rights management? </li></ul><ul><li>... </li></ul><ul><li>Try to describe a digital preservation framework for your institution. </li></ul>