Archiving: What is it and why should it be important to me?
John Shaw, Director, Publishing Technologies, SAGE Publications, U.S.
I. Archiving Overview
II. Types of Archives
III. A SAGE Example
IV. Risks, Questions, and More Questions
Archiving Part I: Archiving Overview
What is an Archive?
  An authoritative collection
  Preserved and professionally managed in perpetuity
  History, institutional commitment & policy, integrity re: preservation
  “…information needed for society’s memory.” ("Schellenberg in Cyberspace," American Archivist 61:2 (Fall 1998), pp. 309-327.)
  Preservation first
What is a Repository?
  “A place where things can be stored and maintained; a storehouse.” [Society of American Archivists Glossary]
  “Depository” means the same; it also refers to a library that receives government documents for public access
  Not all repositories are archives
Why Care?
  “Preserving information for decades or even centuries has proved important. Shang dynasty (12th century BC) Chinese astronomers inscribed eclipse observations on ‘oracle bones’ (animal bones and tortoise shells). About 3200 years later researchers used these records, together with one from 1302 BC, to estimate that the accumulated clock error was just over 7 hours, and from this derived a value for the viscosity of the Earth’s mantle as it rebounds from the weight of the glaciers.”
Why Care?
  “These timescales of many decades, even centuries, contrast with the typical 5-year lifetime for computing hardware and digital media.”
  Baker, Mary, et al. “A Fresh Look at the Reliability of Long-term Digital Storage.” EuroSys ’06, April 18-21, 2006.
Why Care?
  Preservation: Digital information is impermanent
  Publisher: Safety to ensure ongoing availability of your content
  Your library customers: Custodianship to ensure continuity of the record of scientific progress
  Very long view: epistemology, history of science and culture
What Should be Preserved?
  Scholarly content
  Research materials
  Web-based, digitally born content
How e-Archives Differ
  Mission: collection v. preservation
  Access control: dark v. light
  Deposits
    Why: voluntary v. mandated
    Who: author v. publisher
    What: manuscripts v. final work
    When: backfile v. current content
  Future format migration
  Rights transfer
  Costs
Archiving Part II: Types of Archives
Types of Archives
  National archives
  Institutional repositories
  Community-based archives
  Product solution archives
Types of Archives: National
  Dutch National Library: Koninklijke Bibliotheek (KB)
  British Library
  NIH – PubMed Central? “NIH’s digital repository for biomedical research”
  Library of Congress?
KB: Dutch National Library
  Mission: Legal deposit library
    “…collect, catalogue and preserve all publications appearing in the Netherlands.”
    Capable of ingesting 60,000 articles/day
  Deposits: Source files from publishers
    Automated, strict
  Costs?
  Access Control
    Local patron access
    Publisher sets remote access rules
KB: Dutch National Library
  Migration
    Preservation research leader
    Committed to format migration
  Archiving agreements with: OUP, SAGE, Blackwell, Elsevier, Kluwer Academic, etc.
The British Library Legal Deposit Pilot
  Mission: Legal deposit library
    UK-published (to start)
  Pilot: Legal deposit for e-journals
    23 volunteer publishers
    Secure infrastructure
    Uses DigiTool by Ex Libris
    Shared with the other UK legal deposit libraries
    To “scope and test” ingest, storage, retrieval
  Cost?
The British Library: Preservation and Migration
  BL’s future for managing digital assets
    Preserve any type of digital material in perpetuity
  Migration
    Ensure that users can view the material with contemporary applications
    Preserve the original look-and-feel where possible
  Access Control
    “appropriate permissions”
PMC: US National Library of Medicine Journal Archive
  Mission: Make research more accessible
    Free full-text archive of 230 journals
  Deposit: Publishers submit source files
  Migration
  Access Control
  Cost?
PMC: Depository for NIH-Funded Research Articles
  Authors of NIH-funded articles “encouraged” to deposit final manuscript
    “After all modifications due to …peer review”
    MS Word, PDF, etc.
    With supplementary information
  Publisher can replace with published version
  To be required soon?
Library of Congress
  National Digital Information Infrastructure and Preservation Program (NDIIPP) – formed in 2000
  Members: National Library of Medicine, the National Agricultural Library, the National Institute of Standards and Technology, the Research Libraries Group, the OCLC Online Computer Library Center, and the Council on Library and Information Resources
  Preliminary investigation and software development phase
  Primarily e-journal deposit
  Future …???
Types of Archives: Institutional
  University with expansive focus
    Stanford Digital Repository
  Automated
    LOCKSS
Stanford Digital Repository
  Stanford University Libraries initiative
  Digital preservation serving
    Stanford University
    Broader academic community
    Publishers
  Principles: Trust, Security, Transparency
  Costs?
LOCKSS
  Technology to preserve local library collection
  Automated, self-correcting cache servers
  Requires LOCKSS server at library
  Requires publisher participation
  Builds collection of all resources which the institution licenses
  Goes online to users if data source becomes unavailable
  Provides access to static “HTML images” of source
  Costs
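The idea of "self-correcting" caches can be illustrated with a toy example. The sketch below is not the LOCKSS polling protocol; it only assumes that several caches hold a copy of the same item and that a damaged copy can be detected and repaired by comparing content digests and taking the majority. The function names and the cache names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def repair_by_majority(copies: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return a mapping in which every cache holds the majority copy.

    `copies` maps a (hypothetical) cache name to the bytes it stores.
    Raises ValueError when no digest is held by more than half the caches.
    """
    if not copies:
        return {}
    votes = Counter(digest(c) for c in copies.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        raise ValueError("no majority; cannot decide which copy is good")
    # Pick any copy whose digest matches the winning digest.
    good = next(c for c in copies.values() if digest(c) == winner)
    return {name: good for name in copies}


if __name__ == "__main__":
    caches = {"lib_a": b"article v1", "lib_b": b"article v1", "lib_c": b"artXcle v1"}
    print(repair_by_majority(caches)["lib_c"])
```

In this toy model the damaged copy at `lib_c` is overwritten with the copy the other two caches agree on; with only two caches that disagree, the function refuses to guess.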
Types of Archives: Product Solution
  Non-profit organization
    Portico
Portico
  Mission: scholarly preservation
  Standalone archive
  Initiated by JSTOR, with grant funding
  Deposits: source files from publisher
  Migration: planned
  Costs
    Publishers: annual fee of $250 to $75,000, based on annual revenue
    Libraries: annual fee of $1,500 to $24,000, based on Library Materials Expenditure
Portico: Access Control
  Member libraries get access: “when specific trigger events occur, and when titles are no longer available from the publisher or other source.”
  Trigger events include:
    Publisher stops operations
    Publisher ceases to publish a title
    Publisher no longer offers back issues
    Catastrophic and sustained failure of a publisher’s delivery platform
  Can also fulfill “perpetual access” subscription obligations
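The access rule on this slide can be written down as a small decision function. This is a minimal sketch of the rule as quoted, not Portico's actual system: the enum names, the `TitleStatus` record, and `archive_access_open` are all hypothetical, and the "perpetual access" case is not modeled.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Set


class TriggerEvent(Enum):
    """Trigger events listed on the slide (enum names are illustrative)."""
    PUBLISHER_STOPS_OPERATIONS = auto()
    TITLE_CEASES_PUBLICATION = auto()
    BACK_ISSUES_NO_LONGER_OFFERED = auto()
    PLATFORM_CATASTROPHIC_FAILURE = auto()


@dataclass
class TitleStatus:
    """Hypothetical record of what is known about one archived title."""
    events: Set[TriggerEvent]
    available_elsewhere: bool  # still offered by the publisher or another source


def archive_access_open(status: TitleStatus, is_member_library: bool) -> bool:
    """Return True when the archive would release the title to this library.

    Mirrors the quoted rule: a trigger event has occurred AND the title is
    no longer available from the publisher or another source.
    """
    if not is_member_library:
        return False
    return bool(status.events) and not status.available_elsewhere


if __name__ == "__main__":
    status = TitleStatus({TriggerEvent.PUBLISHER_STOPS_OPERATIONS}, available_elsewhere=False)
    print(archive_access_open(status, is_member_library=True))
```

Keeping both conditions in one function makes the "dark until triggered" behavior explicit: with no trigger event, or while the title is still available elsewhere, the archive stays closed even to member libraries.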
Types of Archives: Community
  Community based and openly run
    CLOCKSS
CLOCKSS (Controlled LOCKSS)
  Long-term global archiving solution
  Community-managed, failsafe repository for scholarly content
  Serves libraries & publishers in the event of a long-term business interruption
  Publisher participation is voluntary
  A small number of library participants maintain the archive on behalf of the larger community
  Libraries preserve member publisher content whether they subscribe or not
  Release only after a trigger event
    Collaborative decision to release by publishers, libraries, and societies
  “Cost sharing” for system, not access
  Costs?
Summary: How Repositories Differ
  Stated purpose
  Dark v. light
  Complete backfile v. current only
  Deposits
    Who: author v. publisher
    What: manuscripts v. final work
    Why: voluntary v. mandated
  Rights transfer
  Access control
  Costs
Archiving Part III: A SAGE Example
Why Archive?
  SAGE’s commitment to customers and partners
  Critical to society arrangements
  Essential for new e-sales (consortia + single institutions) – Perpetual access
  Business continuity
  Long-term preservation
  We are not archiving experts!
Where to Archive?
  Dutch KB
  CLOCKSS
  LOCKSS
  Portico
  Library of Congress
  British Library
How to Archive?
  Provide details of digital availability
  Provide sample of content
  Provide details of content format (DTD)
  Send all backfile content for loading
  Set up content flow for ongoing content
SAGE Experience with Dutch KB
  Contract and negotiation
  Contact with technical team
  Delivery of samples and details of scope
  Follow-up questions
  Visit KB – Find out what’s happening
  Delivery of back content
  Delivery of ongoing issues
  Ongoing issue discrepancies
Archiving Part IV: Questions, Questions and More Questions
Measurements of Success
  Who is overseeing the archiving process and governance?
  Compliance?
  Accuracy and legitimacy?
  Financial stability?