The document discusses the need for digital preservation due to the risk of e-resources disappearing or becoming obsolete. It defines digital preservation as active content management to ensure long-term usability, authenticity and accessibility, rather than just backup storage. Core requirements for digital preservation include a sustainable economic model, standards compliance, and clear legal agreements. Emerging models of digital preservation include national libraries, community archives like Portico, and networked library efforts. The Portico case study highlights its mission to preserve scholarly literature, its methodology of migration and byte storage, and its funding model, which draws support from publishers and libraries.
Digital Preservation: New Issues and Responsibilities
1. Digital Preservation:
New Issues and Responsibilities
Eileen Fenton
Executive Director, Portico
Society for Scholarly Publishing
May 28-30, 2008
2. New Issues and Responsibilities
1. Why worry about digital preservation?
2. What digital preservation is (and is not)
3. Core requirements
4. Emerging models and case study
5. Insights from operations
3. 1. Why Worry About Digital Preservation?
E-resources can and do disappear.
• Removal: 27 months after publication up to 13% of
online cited sources are irretrievable*
• Obsolescence: Tapes of U.S. census data from
the 1960s are now inaccessible
• Loss: Location of NASA’s original moon landing
recordings is (currently) unknown
• Funding: Funding for the long standing UK Arts and
Humanities Data Service discontinued April 2008
• Orphans: When ownership or other rights become
uncertain, availability is threatened
*Dellavalle, Robert P. et al. “Information Science: Going, Going, Gone.”
Science 302, no. 5646 (Oct. 31, 2003), 787-8.
4. 1. Why Worry About Digital Preservation?
• The shift to reliance upon e-resources is accelerating.
• E-resources consume a growing portion of total
library materials expenditures.
• Libraries typically license access to, rather than own
outright, e-resources.
[Chart: Average E-Resource Expenditure as Percent of Total LME; vertical axis 0 to 40 percent, horizontal axis 1994-95 through 2004-05]
Mark Young and Martha Kyrillidou, ARL Statistics 2004-05 (Washington: Association of Research Libraries, 2005).
5. 2. What Digital Preservation Is (and Is Not)
Digital preservation is not:
• Reformatting from print to digital for access
surrogates or product line expansion
• Back-up or byte storage on various media
• Mirror sites or networks designed for reliable delivery
• Carried out within delivery systems
6. 2. What Digital Preservation Is (and Is Not)
• Active content management designed to ensure
enduring usability, authenticity and accessibility
over the very long-term
– See Trusted Digital Repositories: Attributes and
Responsibilities. An RLG-OCLC Report, May 2002.
– See The Preservation Management of Digital
Material Handbook
7. 3. Core Requirements for Digital Preservation
• Third-party with an organizational mission to carry
out preservation
• A sustainable economic model able to support
preservation activities over the targeted timeframe
• Technological infrastructure able to support selected
preservation strategy and best practices
8. 3. Core Requirements for Digital Preservation
• Clear legal rights and relationships with content
providers and (eventual) users
• Compliance with digital preservation standards and
best practices
– OAIS: Open Archival Information System
– TRAC: Trustworthy Repositories Audit and
Certification
– DRAMBORA: Digital Repository Audit Method Based
on Risk Assessment
9. 4. Emerging Models
• Models for e-journal preservation are emerging
• E-Journal Archiving Metes and Bounds: A survey of
the landscape published by the Council on Library and
Information Resources (CLIR), September 2006
reports on current approaches*
– A survey of 12 e-journal initiatives
– All efforts are described as young and requiring
ongoing evaluation
*http://www.clir.org/pubs/reports/pub138/pub138.pdf
10. 4. Emerging Models
1. National libraries
– To support mission or legal deposit
– Content scope and access terms vary
– Government funded
– Ex: National Library of the Netherlands, British
Library
2. Community supported third-party preservation
archives
– Provides a focused point of accountability
– Costs shared across participating publishers and
libraries
– Ex: Portico, ICPSR
11. 4. Emerging Models
3. Networked library efforts
– Responsibility shared across a group of institutions
– May (or may not) use specialized software
– Ex: C/LOCKSS (Lots of Copies Keep Stuff Safe
and Controlled LOCKSS), National Digital
Information Infrastructure and Preservation Program
(NDIIPP)
12. 4. Case Study: Portico
Mission
To preserve scholarly literature published in electronic form
and to ensure that these materials remain available
to future generations of scholars, researchers, and students.
13. 4. Case Study: Portico
Content Scope
In scope:
• Electronic scholarly literature, initially e-journals;
other genres under active discussion
• Intellectual content including text, tables, images,
supplemental files
• Limited functionality such as internal linking
Out of scope:
• Full functionality of publisher’s delivery platform
• Today’s ephemeral HTML rendition
14. 4. Case Study: Portico
Methodology: Migration and Byte Storage
• Publishers deliver “source files” (SGML, XML, PDF,
etc.) to Portico.
• Portico converts proprietary source files from multiple
publishers to archival formats suitable for long-term
preservation.
• 7.1 million+ journal articles preserved to date;
hardware systems capacity supports ingest of 1-2
million articles / month
• Portico migrates files to new formats as technology
changes.
15. 4. Case Study: Portico
Access to the Preservation Archive
• Only participating libraries and publishers may access
the archive.
• Access is offered when specific trigger event
conditions prevail and when titles are no longer
available from the publisher or other sources.
• Trigger events initiate campus-wide access for all
libraries supporting the archive regardless of previous
subscriber status.
• Libraries may rely upon the Portico archive for
post-cancellation access, if a publisher chooses to name
Portico as one mechanism to meet this obligation.
For approximately 85% of preserved titles Portico is
so named.
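The access conditions on this slide can be read as a small decision rule. The sketch below is only an illustration of that reading, not Portico's actual system: the class name, field names, and function name are all hypothetical, and the rule assumes that post-cancellation access applies to a participating library whose publisher has named Portico for that purpose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical record of the facts needed to decide archive access."""
    is_participant: bool                    # library or publisher supports the archive
    trigger_event: bool                     # a specific trigger event condition prevails
    available_elsewhere: bool               # title still offered by publisher or other source
    cancelled_subscription: bool = False    # library has cancelled its subscription
    publisher_named_portico: bool = False   # publisher named Portico for post-cancellation access


def may_access(req: AccessRequest) -> bool:
    """Return True if the slide's stated conditions for access are met."""
    # Only participating libraries and publishers may access the archive.
    if not req.is_participant:
        return False
    # Trigger-event access: event has occurred and the title is unavailable elsewhere.
    if req.trigger_event and not req.available_elsewhere:
        return True
    # Post-cancellation access: only if the publisher has named Portico.
    return req.cancelled_subscription and req.publisher_named_portico


if __name__ == "__main__":
    triggered = AccessRequest(is_participant=True, trigger_event=True,
                              available_elsewhere=False)
    print(may_access(triggered))
```

Note that previous subscriber status does not appear in the trigger-event branch, which matches the slide's point that trigger events open access to all supporting libraries.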
16. 4. Case Study: Portico
Sources of Support
• Early support provided by The Andrew W. Mellon Foundation,
Ithaka, JSTOR and the Library of Congress
• Ongoing support for the archive comes from the primary
beneficiaries of the archive.
• Contributing publishers supply content and make an annual
financial contribution ($250 to $75,000).
• More than 7,550 journals (~14M articles) from 55 publishers are
committed to the archive to date.
• Libraries make an Annual Archive Support (AAS) payment based
upon total library materials expenditures ($1,500 to $24,000).
• More than 430 libraries from 13 countries participate in the
archive
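As a rough worked example, the figures quoted in the case study (about 14 million articles committed, 7.1 million preserved to date, ingest capacity of 1 to 2 million articles per month) imply a backlog that could be ingested in well under a year. The sketch below assumes the committed total includes the articles already preserved and that the stated capacity could be sustained; neither assumption is confirmed by the slides, and the function name is hypothetical.

```python
def months_to_ingest(remaining_articles: int, articles_per_month: int) -> float:
    """Months needed to ingest a backlog at a constant monthly rate."""
    return remaining_articles / articles_per_month


# Figures taken from the slides; the committed total is approximate (~14M).
committed = 14_000_000
preserved = 7_100_000

# Assumption: the committed total includes what is already preserved.
remaining = committed - preserved

slow = months_to_ingest(remaining, 1_000_000)  # low end of stated capacity
fast = months_to_ingest(remaining, 2_000_000)  # high end of stated capacity

print(f"Backlog: {remaining:,} articles")
print(f"Estimated ingest time: {fast:.2f} to {slow:.2f} months")
```

Under these assumptions the remaining backlog is about 6.9 million articles, or roughly three and a half to seven months of ingest at the stated capacity.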
17. 5. Insights from Operations: Publishers
• Publishers are developing multi-layered strategies to
mitigate risk and meet library requirements.
• Cooperative interaction with archival partners is
required to establish data flows and respond to
questions.
• Third-party preservation archives can supply feedback
regarding data consistency and standards
conformance.
18. 5. Insights from Operations: Libraries
• Libraries are actively evaluating options for meeting
preservation obligations and needs.
• Multi-layered strategies to preserve a wide array of
print and e-content are being developed.
• Breadth of preservation strategy varies with
institutional size.
19. 5. Insights from Operations: Preservation
Archives
• Archives must be prepared to respond to complex,
still emerging e-publishing best practices.
• File usability vs. validity creates special preservation
challenges.
• Gathering and communicating holdings information is
critical and challenging.
Removal – we have all encountered this experience
Obsolescence – we all have 5.25 inch floppy disks we can no longer read
Loss – unless well managed, data can and does go missing
Funding – as administrations (government or academic) come and go, priorities change and funds are reallocated
E-journals are vulnerable
Libraries are allocating significant funds to e-resources. Since they do not hold a local copy, they cannot provide hands-on preservation as they did with print.
Explain who RLG and OCLC are and their leading role in digital preservation standards development
Third party = independent of the content creator. Examples: national libraries, Portico
Economic model: financial support must be diverse enough to minimize risk – link to AHDS government funding model
Tech infrastructure: must be refreshed as technology and preservation strategies and tools evolve
OAIS – an ISO standard (ISO 14721)
TRAC – developed with input from RLG, OCLC and the Center for Research Libraries with Mellon funding. Note Portico’s participation.
The Inter-university Consortium for Political and Social Research (ICPSR) provides access to an extensive collection of downloadable data.
Portico’s approach builds from its preservation-oriented mission.