Preservation for digital materials: A. Heritage is obligatory for social continuity. B. Maintenance of assets is a business requisite.
The Library of Alexandria continues to be a platform for democracy and has, since its inception, opened its doors to all Egyptians to participate in open dialogues about reform […]. … I have no doubt that once life returns to its norms, the Library of Alexandria will continue its peaceful dialogue and civic responsibility as an institution of learning. Ismail Serageldin (via telephone), Director of the Library of Alexandria, 29 January 2011
• Myriads of ebook formats • Digital rights management • No national policy in place. No one is in charge of the preservation of our growing cultural heritage in digital books.
Digital storage means it is easy to preserve in multiple locations, in different formats. More redundancy than paper could ever afford. Storage costs trending toward insignificant on a per-byte basis.
High density storage: Each rack stores between 0.5 and 0.75 petabytes.
A book is a complex assembly of content: text, video, foreword, illustrations, photos. Each item needs careful management.
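A minimal sketch of what "preserve in multiple locations" can look like in practice: copy a file to several storage locations, record a checksum, and periodically audit each copy against it. The file name, the three "site" directories, and the function names below are hypothetical, chosen only for illustration; a real archive would use separate machines or institutions rather than local folders.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def replicate(source: Path, locations: list[Path]) -> str:
    """Copy source into every location and return the reference digest."""
    digest = sha256_of(source)
    for loc in locations:
        loc.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, loc / source.name)
    return digest


def audit(name: str, digest: str, locations: list[Path]) -> dict[str, str]:
    """Report 'ok', 'missing', or 'corrupt' for the copy in each location."""
    report = {}
    for loc in locations:
        copy = loc / name
        if not copy.exists():
            report[str(loc)] = "missing"
        elif sha256_of(copy) != digest:
            report[str(loc)] = "corrupt"
        else:
            report[str(loc)] = "ok"
    return report


def demo() -> list[str]:
    """Replicate one file to three hypothetical sites, damage two, audit."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "book.epub"  # hypothetical asset
        source.write_bytes(b"example ebook bytes")
        sites = [root / "site_a", root / "site_b", root / "site_c"]
        digest = replicate(source, sites)
        (sites[1] / "book.epub").write_bytes(b"bit rot")  # simulate corruption
        (sites[2] / "book.epub").unlink()                 # simulate loss
        report = audit("book.epub", digest, sites)
        return [report[str(s)] for s in sites]


if __name__ == "__main__":
    print(demo())
```

Redundancy only helps if the copies are checked: the audit step is what turns "many copies" into a known-good copy that damaged ones can be repaired from.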
Digital production workflow requires: - content management system - sophisticated rights / use tracking - skilled personnel esp. engineering
Possibility of loss should motivate preservation. Loss can arise for oneself, or one’s partners, via: 1. complex, tightly coupled systems 2. struggle of memory vs. forgetting
From an engineering perspective, complexity in workflow systems raises the chance of catastrophic loss. Interactions become “tightly coupled” as system efficiency increases. Tightly coupled systems are prone to routine accidents with unforeseen cascading effects.
- Charles Perrow, Sociology Dept., Yale University. Ex.: Apollo 13: “[The] accident was not the result of a chance malfunction in a statistical sense, but rather resulted from an unusual combination of mistakes, coupled with a somewhat deficient and unforgiving design.” Source: Wikipedia.
Traditional (print) publishers are constructing baroque and complex systems to maintain heritage production while transitioning to newer business systems. Potential trigger events are ubiquitous.
“We’re working quickly to recover from a major issue in one of our database clusters. We’re incredibly sorry for the inconvenience.”
“Unfortunately, I have mixed up the accounts and accidentally deleted yours. I am terribly sorry for this grave error and hope that this mistake can be reconciled.” … “Our teams are currently working hard to try to restore the contents of this user’s account. We are working on a process that would allow us to easily restore deleted accounts and we plan on rolling this functionality out soon.” (emphasis added)
Many archives are lost by simply forgetting where or how they were stored. Canisters of film, shelves of books, or backup tapes. “Those books are in a warehouse in Jersey.” “The servers are in one of our datacenters.”
Commonly, companies get acquired or shutter business operations; records are abandoned. Servers are redeployed without an audit. Databases run with no backup routines.
Preserving without metadata is not helpful. Metadata provides necessary context. Need to point backwards (provenance) and forward (to find superseded content). Without necessary metadata infrastructure, preservation architectures are rickety.
If you are a publisher, do you safeguard your digital assets? Has that process been audited against security threats? Technical accidents? Are your workflows well understood? Have you conducted trial asset recovery exercises? Does your insurance company know? Do you mind if I give them a call?
For 100s of years, libraries preserved books. Sort of anyway (witness Alexandria). By-product of the inefficiency of distribution: Lots of libraries wound up with same books. Lots of copies keep stuff safe.
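One way to picture metadata that points both backwards and forwards is a record carrying two links: one to the item it was derived from (provenance) and one to whatever has superseded it. The sketch below is an illustration only; the `Record` fields, the identifiers, and the helper names are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    identifier: str
    title: str
    derived_from: Optional[str] = None   # backward pointer (provenance)
    superseded_by: Optional[str] = None  # forward pointer (newer version)


def add_revision(catalog: dict[str, Record], old_id: str, new: Record) -> None:
    """Register a new version and link it to the one it replaces."""
    new.derived_from = old_id
    catalog[old_id].superseded_by = new.identifier
    catalog[new.identifier] = new


def provenance(catalog: dict[str, Record], identifier: str) -> list[str]:
    """Walk backwards and list every ancestor, nearest first."""
    chain = []
    current = catalog[identifier].derived_from
    while current is not None:
        chain.append(current)
        current = catalog[current].derived_from
    return chain


def latest(catalog: dict[str, Record], identifier: str) -> str:
    """Walk forwards to the version that nothing has superseded."""
    current = identifier
    while catalog[current].superseded_by is not None:
        current = catalog[current].superseded_by
    return current


def build_demo_catalog() -> dict[str, Record]:
    """Three hypothetical editions of one title, linked in sequence."""
    catalog = {"ed1": Record("ed1", "Example Title, 1st ed.")}
    add_revision(catalog, "ed1", Record("ed2", "Example Title, 2nd ed."))
    add_revision(catalog, "ed2", Record("ed3", "Example Title, 3rd ed."))
    return catalog


if __name__ == "__main__":
    demo = build_demo_catalog()
    print(provenance(demo, "ed3"))
    print(latest(demo, "ed1"))
```

With both links in place, an archive holding only an old edition can still tell a reader that something newer exists, and a reader holding the newest edition can trace where it came from.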
Books were once bespoke: monks with pens. Over around 500 years, books became industrialized and mass produced. Although ebooks are the ultimate industrial product, we have come full circle: digital books stand on the threshold of tremendous mutability.
It is easy to store objects now, arguably, but it is also easy to save objects that cannot be easily recovered. E.g., a PDF can be a stew of non-standard formats and arbitrary data; an EPUB with DRM locks data, with the risk that the key will be lost.
We may be able to preserve digital editions. Can we preserve the book of the future? • Scripted and Interactive • Networked and Distributed • Personalized and Mobile
In the U.S., beyond public and academic library collections, the Library of Congress plays a unique role: § 407. The required copies or phonorecords shall be deposited in the Copyright Office for the use or disposition of the Library of Congress.
Current U.S. code privileges print as best edition for preservation. Slowly being changed. (Congress approved e-journal demand deposit for LoC in the 2010 legislative session).
The Copyright Office historically held that transmission to the LoC of deposited digital books would be an infringing faithful copy, despite § 704: In the case of published works, all copies, phonorecords, and identifying material deposited are available to the Library of Congress for its collections ….
I led an initiative to confront this issue in Summer 2008. Fizzled. Convened meeting in NYC via Digital Library Federation with LoC, Mellon, Portico, NISO, IDPF, BISG, and publisher consultants and representatives to discuss digital archives that would also serve as escrow.
The group decided that we should attempt a trial project with a small set of publishers who would deposit sample books into Portico’s repository; the pilot would inform our understanding of the business, legal, and policy issues. With that in hand, LoC would be in a stronger position to solicit rule changes by Congress.
Portico is a not-for-profit jointly representing publishers and libraries, preserving a growing volume of digital content with high reliability in a rights-respecting, secure archive. Because it serves as an escrow, access to the archive by its members is triggered only under carefully defined and contractually specified circumstances. Initial funding came from Mellon and LoC.
The AAP expressed support. Ed McCoyd, Director of Digital Policy: “Thank you for speaking at the AAP Digital Issues Working Group meeting on [11 June 2008], regarding your interest in bringing parties together to develop digital book archives for preservation.
“Your points about preserving cultural patrimony, and assuring permanent access by libraries and other digital content customers as well as by the publishers themselves, certainly resonated with the group. “I look forward to talking further about your initiative in the coming months, and will keep the publishers apprised of the additional details as they develop.”
It fizzled due to the inability of the LoC to determine whether it had the capacity to pursue a deposit initiative, and if so, which department of the Library should proceed.
Portico decided to continue the pursuit of ebook deposits on behalf of its members. Members are primarily research libraries, ’cuz who else has a mandate to care about preservation? With inevitable focus on academic ebook publishers, e.g. Elsevier, starting in 2008.
Portico has been unable to penetrate the trade publishing sector through private initiative. No national policy requiring digital deposit.
Google Book Search (GBS) has emerged in the absence of international rulemaking as the default archive for some institutions. GBS does not do preservation-quality imaging, and there is a requirement for comprehensive publisher participation. This is NOT a good solution.
Europeana digital library seeking to build a collection preserving Europe’s vast cultural heritage. In September 2010, Ghent Univ. Library became first in Europe to deposit public domain books scanned by Google into Europeana.
GBS partners’ collection management of older public domain books is a pale shadow of the comprehensive international policy framework needed to mandate preservation of our cultural heritage in digital books. Preservation must mandate participation for libraries and publishers in a legal framework.
Widespread recognition of need for new copyright regime supporting digital use. An internationally interoperable network of rights registries, holding rights assertions and capable of automated status queries across geographic and national domains, would be a very useful support.
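To make "automated status query" concrete, here is a toy sketch of several registries answering the same question: what rights status has been asserted for this work in this territory? Everything in it is hypothetical, including the registry names, the work identifier, the status labels, and the statuses assigned; it illustrates only the shape of an interoperable lookup, not any existing registry or legal determination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RightsAssertion:
    work_id: str
    territory: str  # e.g. a two-letter country code
    status: str     # hypothetical labels: "in_copyright", "public_domain"
    registry: str   # which registry made the assertion


class Registry:
    """A hypothetical rights registry holding per-territory assertions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._assertions: dict[tuple[str, str], RightsAssertion] = {}

    def assert_rights(self, work_id: str, territory: str, status: str) -> None:
        self._assertions[(work_id, territory)] = RightsAssertion(
            work_id, territory, status, self.name
        )

    def lookup(self, work_id: str, territory: str) -> Optional[RightsAssertion]:
        return self._assertions.get((work_id, territory))


def query_status(
    registries: list[Registry], work_id: str, territory: str
) -> list[RightsAssertion]:
    """Ask every registry and collect whatever assertions exist."""
    found = []
    for registry in registries:
        assertion = registry.lookup(work_id, territory)
        if assertion is not None:
            found.append(assertion)
    return found


def demo_registries() -> list[Registry]:
    """Two made-up registries with made-up assertions for one work."""
    us = Registry("example-us-registry")
    eu = Registry("example-eu-registry")
    us.assert_rights("work-001", "US", "public_domain")
    eu.assert_rights("work-001", "FR", "in_copyright")
    return [us, eu]


if __name__ == "__main__":
    for a in query_status(demo_registries(), "work-001", "FR"):
        print(a.registry, a.territory, a.status)
```

The same work can carry different assertions in different territories, which is why the query is keyed on both the work and the jurisdiction; an empty result means no registry has spoken, not that the work is free to use.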
Mandating the deposit of digital works for preservation purposes to ensure adequate representation of copyright assertions for a content manifest would be one approach. Arguably avoids Berne / TRIPS.
Most publishers are demonstrably willing to engage in discussion of efforts supporting the development of persistent heritage archives. Requires hard political work conjoining some subset of copyright policies, the imperatives of cultural heritage, technical architecture.
If nothing is done, and we cannot solve this problem for the simple digital books of the 20th Century, what are we going to do with the books of the 21st Century?
peter brantley director, bookserver project internet archive san francisco, ca (twitter) @naypinya (slideshare) peter at archive.org