Jeff Rothenberg Digital Preservation Perspective

Digital Preservation in Perspective:
How far have we come, and what's next?
Jeff Rothenberg

Published in: Technology, Education


  1. Digital Preservation in Perspective: How far have we come, and what's next?
      Jeff Rothenberg, March 26, 2012
      Color photo by Jeff Rothenberg
      Jeff Rothenberg   Future Perfect 3/26/2012   Rev: 2012-03-24
  2. A brief history of digital preservation
      • Early statements of the problem
        – Jay Bolter, Margaret Hedstrom, David Bearman
        – Avra Michelson’s & my 1992 American Archivist paper
        – My 1995 Scientific American article
        – Into the Future film (CLIR, 1997; shown on PBS)
        – Tora Bikson’s & my 1999 report for the Dutch National Archives
      • Gradual recognition of the problem
        – By librarians, archivists, modern museum curators
        – But without much technological depth of understanding in most cases
        – OAIS Preservation Planning assumed migration, though it admits problems
      • Some experiments & demonstrations
        – U. Leeds & U. Mich: CEDARS & CAMiLEON projects; BBC Domesday Book
        – Dutch National Archives Testbed: migration & UVC “data archiving”
        – UCSD Supercomputing Center & NARA: formalisms (e-mail only)
        – Guggenheim “ErlKing” renewal project
        – Dutch Royal Library (KB): Dioscuri emulator & eDepot
      • Few serious attempts at implementation
        – Most implementations essentially ignore long-term preservation
  5. Color photo by Jeff Rothenberg
  6. Outline
      • What should we mean by digital preservation?
      • Levels of awareness of the problem
      • Responses
      • Distinctions across disciplines
      • Remaining challenges
  7. What should preservation mean?
      “The goal of digital preservation is the accurate rendering of authenticated content over time.” (ALA “medium” definition)
  8. Preserve originals as well as “vernacular renditions”
      The Canterbury Tales

      Original:
        Whan that Aprill, with his shoures soote
        The droghte of March hath perced to the roote
        And specially from every shires ende
        Of Engelond, to Caunterbury they wende,
        The hooly blisful martir for to seke
        That hem hath holpen, whan that they were seeke.
      • Used by scholars for serious research
      • Used to generate & evaluate vernacular renditions
      • Accessed by non-scholars for aesthetic purposes (with help, e.g., see below)

      Vernacular Rendition:
        When in April the sweet showers fall
        That pierce March’s drought to the root and all
        And specially from every shire’s end
        Of England they to Canterbury went,
        The holy blessed martyr there to seek
        Who helped them when they lay so ill and weak
      • Used by non-scholars for casual research
      • May be used by scholars for research as well
      • Not thought of as a preservation copy
      • Not used as a source for later vernacular renditions
  9. A particular “view” of information may be crucial
      Example: Space Shuttle O-ring damage vs. temperature, prior to Challenger
      [Chart: levels of O-ring damage per launch, listed against launch temperature (53 to 81 °F)]
  10. Revealing View of Space Shuttle O-ring Data
      Extrapolation of damage curve to the 31 °F temperature forecast for Challenger’s launch on January 28, 1986. Dots indicate temperature and O-ring damage for 24 successful launches prior to Challenger. Curve shows that increasing damage is related to cooler temperature.
      [Chart: O-ring damage (0 to 3) vs. temperature (30 to 85 °F)]
  11. Furthermore, many digital artifacts are inherently digital
      • Inherently digital artifacts are those whose perceptibility, meaning, or usability arise from and rely on their being encoded in digital form
      • They cannot be meaningfully represented as page images
        – Doing so loses essential aspects of their contents and/or behavior
      • Examples include dynamic, active or interactive artifacts
        – Multimedia (e.g., web pages, CD-ROM publications, Ph.D. dissertations)
        – Dynamically generated (e.g., JavaScript, CGI, ASP or PHP web pages, servlets)
        – Active presentation (e.g., animation, simulation, virtual reality)
        – Interactive (e.g., applets, interactive virtual reality, games)
        – Digital artwork
  12. What you see is not what you get
      V2.24 ERwin

        if %JoinPKPK(oldrows,newrows," <> "," or ") then
          select count(*) into numrows from %Child
            where %JoinFKPK(%Child,oldrows," = "," and");
          if (numrows > 0) then
            signal parent_updrstrct_err
          end if;
        end if;
        if %JoinPKPK(oldrows,newrows," <> "," or ") then
          update %Child set %JoinFKPK(%Child,newrows," = ",",")
            where %JoinFKPK(%Child,oldrows," = "," and");
  13. Render unto seer...
  14. In fact, every digital artifact is a program
      • A program
        – Is a sequence of commands in some formal language
        – That is intended to be interpreted
        – By an interpreter that understands that language
      • An interpreter
        – Is an active process
        – That knows how to perform commands
        – Specified in a given formal language
      • Interpretation ultimately involves hardware
        – ASCII codes are rendered by a printer or display
        – More complex entities are interpreted by software (applications)
        – But all software is ultimately interpreted by hardware
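The point that a digital artifact only means something relative to an interpreter can be sketched in a few lines of Python. This is an illustration only, not something from the talk: the byte values, the function names, and the `AVAILABLE_INTERPRETERS` registry are all hypothetical, chosen to show one bitstream yielding different results under different interpreters, and nothing at all once the needed interpreter is gone.

```python
# Hypothetical illustration: one bitstream, several interpreters.
# All names and byte values here are assumptions made for the example.
BITSTREAM = bytes([0x48, 0x69, 0x21])


def as_ascii_text(bits: bytes) -> str:
    """Interpret each byte as an ASCII character code."""
    return bits.decode("ascii")


def as_byte_values(bits: bytes) -> list:
    """Interpret each byte as a small unsigned integer."""
    return list(bits)


def as_big_endian_integer(bits: bytes) -> int:
    """Interpret the whole sequence as one unsigned big-endian number."""
    return int.from_bytes(bits, "big")


# The interpreters that still "exist". Deleting an entry models obsolescence:
# the bits are unchanged, but they can no longer be rendered that way.
AVAILABLE_INTERPRETERS = {
    "ascii-text": as_ascii_text,
    "byte-values": as_byte_values,
    "big-endian-integer": as_big_endian_integer,
}


def render(bits: bytes, interpreter_name: str):
    """Render a bitstream with the named interpreter, if one is available."""
    interpreter = AVAILABLE_INTERPRETERS.get(interpreter_name)
    if interpreter is None:
        raise LookupError(f"no interpreter available for {interpreter_name!r}")
    return interpreter(bits)


if __name__ == "__main__":
    for name in AVAILABLE_INTERPRETERS:
        print(name, "->", render(BITSTREAM, name))
```

The same three bytes render as text, as a list of numbers, or as a single integer; asking for an interpreter that is not in the registry raises an error even though the bits themselves are intact.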
  15. Digital information promises to last better than analog
      • Digital objects do not decay, fade, tear, crumble, dissolve, etc.
        – Their media may, but not the bits themselves
      • A bitstream lasts forever
        – Producing exactly the same behavior, without loss (at least in principle)
        – So long as it can be interpreted correctly
      • But interpreting a bitstream correctly requires software
        – And software must be run on hardware (a computer)
        – A computer is (ultimately) an analog device that does decay
        – And both hardware and software become obsolete long before they decay
  16. So the best we can say is...
      “Digital objects last forever, or five years, whichever comes first”
  17. So the best we can say is...
      “Digital objects last forever, or five years, whichever comes first”
      min(∞, 5)
  18. Outline
      • What should we mean by digital preservation?
      • Levels of awareness of the problem
      • Responses
      • Distinctions across disciplines
      • Remaining challenges
  19. Levels of awareness of the problem (by disciplines/institutions/individuals)
      • Innocence
      • Awakening
      • Analysis
      • Looking under the streetlamp
      • Experimentation/Demonstration
      • Where are we now?
  20. Innocence
      • Why should digital artifacts be any different?
        – Preservation is preservation, isn’t it?
      • Except for media obsolescence
        – Isn’t this just analogous to medieval monks copying manuscripts?
      • Digital artifacts don’t decay or change
        – Isn’t this a dream come true for preservationists?
  21. Awakening
      • Digital poses unique problems
        – Media obsolescence
        – Description (unique and complex attributes)
        – Cataloging (ephemeral reference, links)
        – Metadata (unique requirements)
        – Format/encoding (interpretation, conversion, corruption)
        – Future rendering (in the face of obsolete software and hardware)
      • Digital preservation must be proactive
        – Over relatively short timeframes (5 years?)
        – Otherwise artifacts are likely to be irretrievably lost
  22. Analysis
      • Digital artifacts
        – What are their essential characteristics for preservation?
      • Authenticity
        – What does this mean for digital artifacts?
      • Rendering
        – How can we guarantee proper (or any) rendering in the future?
      • Preservation
        – What does (should) this mean for digital artifacts in various disciplines?
      • Costs
        – What are the up-front and long-term costs of digital preservation?
        – How should these costs be paid and by whom?
  23. Looking under the streetlamp
      • Metadata
        – Dublin Core, etc.
        – Depends on the nature of digital artifacts & technical preservation schemes
      • Reference models
        – OAIS
        – Premature in the absence of viable technical preservation schemes
      • Institutional process models
        – Premature in the absence of defined, viable technical preservation schemes
        – May tend to lock in approaches that are not viable
  24. The Open Archival Information System Reference Model (OAIS)
  25. Experimentation/Demonstration
      • BBC Domesday Book / CAMiLEON Project
        – Early warning of the need for timely, extreme action
        – Demonstrated the potential of hardware emulation
      • Dutch Archives Testbed
        – “Discovered” that migration is very hard (duh!)
      • Other emulation examples
        – Apple’s M68000 emulator for PowerPC
        – U. Warwick’s EDSAC emulator
        – Emory U’s MARBL collection
        – Guggenheim: Renewing the ErlKing
        – KB’s Dioscuri Emulator
      • PLANETS, KEEP
        – Continuing to explore technically viable approaches
  26. The BBC Domesday / CAMiLEON Project
      Emulated at the University of Leeds, U.K. (2002)
  27. EDSAC: the first electronic digital computer
  30. Renewing the ErlKing
      • An interactive mixed-media video experience
        – By Roberta Friedman and Grahame Weinbren
        – That overlays text and graphics on video content
        – And branches in response to user touchscreen input
      • Highly innovative when created in 1982
        – Pushed the limits of affordable computers and video display
        – Included a custom-built “authoring” environment
        – Widely exhibited in major museums and other venues
  31. The ErlKing in the Guggenheim’s “Seeing Double” Show (March 18, 2004)
  32. KB’s Dioscuri Emulator
      Running my 1982 Calendar/1 Program
  33. Where are we now?
      • Somewhere between 4 and 5
        – Looking under the streetlamp
        – Experimentation/Demonstration
      • Few end-to-end implementations
        – Except for page-image artifacts (e.g., LOCKSS, Portico)
        – And KB eDepot
  34. Outline
      • What should we mean by digital preservation?
      • Levels of awareness of the problem
      • Responses
      • Distinctions across disciplines
      • Remaining challenges
  35. Responses
      • Denial
        – What problem?
      • Wishful thinking
        – Deus ex machina
      • Misguided efforts (IMHO)
        – Digital garden paths
      • Facing reality
        – What will it take?
      • Where are we now?
  36. Denial
      • Just save bits
        – And hope for the best (let our grandchildren worry about it)
      • Expect commercial sector solutions
        – Microsoft, IBM, etc. will save us
      • Popular formats will live forever or auto-migrate
        – (What the ancient Egyptians thought)
      • Convergent formats like HTML and XML solve everything
        – But these are really just “scaffold” formats embedding others
  37. Preservation approaches
      • Save and run obsolete hardware and software
        – In “computer museums”
        – To read documents by running the original programs that created them
      • Rely on universal, formal description of logical formats
        – To allow interpreting those formats in the future
        – Thereby correctly rendering saved digital artifacts
      • Rely on standards and migration
        – Expect new programs to read old documents in enduring standard forms
        – Convert documents from old standards to new ones as standards evolve
      • Rely on emulation of obsolete hardware to run saved software
        – Requires no migration or conversion (aside from media)
        – Saves originals in original form
  38. Wishful thinking
      • Metadata is all we need
        – Describe formats, behavior, etc.
      • Format migration
        – The game of “telephone”
      • Formal encoding (UCSD/NARA-ERA)
        – Maybe someday
      • Rely on future cryptography
        – Counterexample: Hieroglyphics
      • Digitize to preserve
        – e.g., Shoah
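The “telephone” comparison for format migration can be made concrete with a small Python sketch. It is only an analogy under stated assumptions: character encodings stand in for file formats, and the sample string, the `migrate` helper, and the chain of target encodings are all hypothetical. Each migration step silently replaces whatever the target cannot represent, and a later move to a richer target does not bring the lost content back.

```python
# Hypothetical analogy: character encodings stand in for document formats.
# The sample text, helper names, and migration chain are assumptions.

def migrate(text: str, target_encoding: str) -> str:
    """Convert text to a target encoding, replacing anything it cannot hold."""
    return text.encode(target_encoding, errors="replace").decode(target_encoding)


def migrate_through(text: str, encodings: list) -> list:
    """Apply successive migrations and keep every intermediate result."""
    history = [text]
    for encoding in encodings:
        text = migrate(text, encoding)
        history.append(text)
    return history


ORIGINAL = "Café €5 naïve"
CHAIN = ["latin-1", "ascii", "utf-8"]
HISTORY = migrate_through(ORIGINAL, CHAIN)

if __name__ == "__main__":
    for step, version in zip(["original"] + CHAIN, HISTORY):
        print(f"{step:>8}: {version}")
```

The final step targets an encoding that could have represented the original perfectly, but by then the characters dropped in earlier steps are already gone.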
  39. Misguided efforts (IMHO)
      • Focus on short-term preservation
        – Urgent enough to preclude long-term focus (e.g., JSTOR?)
      • Reject emulation without understanding it
        – Seems like smoke and mirrors
      • LC, NARA-ERA
        – Full speed ahead and damn the technical realities
  40. Facing reality
      • Technological issues
        – For “inherently digital” artifacts (which will become more prevalent)
      • Defining/preserving “digital originals”
        – Retaining original rendering & behavior
        – Enabling repeated “vernacular extraction” of surrogates
      • Comparative cost analyses
        – Informed by technological understanding
        – Looking at overall lifecycle costs
      • Realistic process models
        – Based on technologically viable approaches
      • Facing long-term issues (KB/IBM-NL eDepot)
        – Loss of metadata
        – Partial loss or corruption of archival information package indexes
  41. Current implementation efforts
      • NARA’s ERA project
        – Ill-conceived: assumed a solution would magically appear
      • LC still seems somewhat aimless
        – Lost half their NDIIPP funding after 2006 (some since restored)
      • Most so-called “archiving” efforts ignore preservation
        – LOCKSS, Portico (journal archiving) offer no real preservation
        – Internet Archive seems based on wishful thinking
      • BL proceeding rationally
        – Pursuing a broadly-based, intelligent strategy
      • KB may still be in the lead
        – eDepot designed to address long-term preservation
        – Using a two-pronged migration/emulation approach
        – Planets & KEEP projects continuing to explore longer-term issues
  42. Where are we now?
      • Still at 1?
        – Denial
      • Somewhere between 2 and 4?
        – Misguided efforts
        – Facing reality
  43. Outline
      • What should we mean by digital preservation?
      • Levels of awareness of the problem
      • Responses
      • Distinctions across disciplines
      • Remaining challenges
  44. Distinctions across contexts
      • Disciplines: Libraries, Archives, Museums
        – Archives: preserve “record” value
        – Libraries: preserve[/contextualize] content/rendering
        – Museums: preserve/recreate/contextualize experience
      • Institutions: National, Commercial, NGO
        – Commercial: film industry, petrochemical, pharma (core vs. ancillary assets)
        – Shoah Foundation (Spielberg): http://dornsife.usc.edu/vhi/preservation
      • Individuals
        – Mostly not yet begun
  45. Outline
      • What should we mean by digital preservation?
      • Levels of awareness of the problem
      • Responses
      • Distinctions across disciplines
      • Remaining challenges
  46. Remaining challenges
      • Integrate true long-term perspective
        – Render “inherently digital” artifacts
        – Recognize the executability of all digital artifacts
        – Preserve digital originals and facilitate “vernacular renditions”
      • Engage the Computer Science (ICT) field
        – Conference sessions, working groups, etc.
      • Perform serious cost and process analyses
        – Based on viable technological approaches
      • Try some small-scale “end-to-end” demonstrations
        – Long-term focus
        – Inherently digital artifacts
        – Preserve digital originals and produce “vernacular renditions”
        – Develop and test realistic process models
        – Instrument, measure, and evaluate:
          - Authenticity, quality, accessibility, usability, cost
          - Effort, scalability, reproducibility (of process)
  47. Expected cost & effectiveness comparisons
      Column headings (rotated on the slide): archaeology, formalization, standards, emulation, migration, viewers
      H, M, L: High, Med, Low;  +, -: Frequent, Rare

      Cost:
        Per-approach (x 1)
          Create EVM or formalism     0     H/-   0     0     0     H/-
        Per-platform (x 10)
          Create H/W emulators        0     0     0     0     0     H/-
          Port to new platforms       0     L/-   M/-   H/-   M/-   M/-
        Per-format (x 1000)
          Reverse-engineer            0     H/-   H/-   H/+   H/+   0
          Obtain necessary S/W        0     0     0     M/+   M/-   L/+
        Per-artifact (x 100,000,000)
          Process at Ingest           0     H     H     0     0     L
          Convert over time           0     M/-   H/-   H/+   H/+   0
          Access                      H     M     L     L     L     L
      Effectiveness:
          On each artifact            L     M     M     M     M     H
          % of formats handled        L     L     L     M     L     H
  48. References for Jeff Rothenberg
      http://www.JeffRothenberg.org
      jeff@JeffRothenberg.org
