Digital Preservation Policy Development at the Library of Congress
Emily Reynolds & Chelcie Rowell, 2012 OSI Junior Fellows

The Library’s Strategic Plan sets an outcome for 2014 that “The Library has identified and proposed criteria for the preservation of the Library’s digital materials.” We’re working to effect that outcome by gathering information about existing digital preservation practices and comparing those practices to the life cycle framework developed by the Library’s Preservation Working Group. Ultimately, our comparative analysis may contribute to revision of the life cycle framework.
PLAN: Life cycle framework
1. Prioritize and select digital collection.
2. Determine user group.
3. Verify content, type, and metadata.
4. Determine requirements for access, storage space, and server space.
5. Determine resource needs.

PLAN: Manuscript Division
1. Unpublished born-digital records received as part of manuscript collections of personal papers and organizational records.
2. Discoverable through finding aids served publicly on-site at the Manuscript Reading Room. Access restrictions may require more granular access controls or enforceable triggers that make the collection available at a later date.
3. Sufficient metadata to record context of digital materials within archival collection. Bagged media + reports + collection metadata = SIP.
4. Access restrictions will determine reading room or Web access by collection. Storage requirements are currently small but expected to grow.
5. Resource needs still being determined.

PLAN: Serial & Government Publications Division
1. Historic newspapers in the public domain (high demand and value but low risk) on high quality microfilm.
2. Publicly served immediately on the Web through Chronicling America site and API.
3. METS/ALTO objects and associated images.
4. Storage and access requirements
• 4 preservation bags (2 copies per 2 preservation server) + access bag
• online and offline storage
• 54 Mb/page at approx. 10,000 pages per partner per month for 20 years → tens of millions of pages
5. Responsible for ingesting and sustaining digital materials rather than digitizing and describing.

PLAN: Geography & Map Division
1. Data is acquired via Federal deposit, donation, or Acquisitions; CCP project data is downloaded from the Internet or acquired directly from client.
2. Most content is publicly accessible, with some private data used for specific Congressional research projects.
3. Accept any formats. Some content delivered on CD may have little existing metadata. Data requested by clients is verified for accuracy and completeness, as are data sources.
4. Server space is requested based on projected need, datasets kept up in archival storage.
5. Resource needs relate to the variety of formats taken in; management and access can be difficult, as most are proprietary and complex.

GET: Life cycle framework
1. Assign or confirm unique identifiers.
2. Pre-process items as appropriate.
3. Transfer, package, and inventory items.
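The storage figures quoted for the newspaper program (54 Mb/page, approx. 10,000 pages per partner per month, 20 years) can be turned into a rough per-partner estimate. The sketch below is illustrative only: it assumes "Mb" means megabytes and uses decimal units, and the function names and the `copies` parameter are hypothetical, not taken from the poster. The number of partners is not stated, so the calculation stays at the level of a single partner.

```python
# Rough storage estimate for one partner, using the figures quoted in the PLAN row.
MB_PER_PAGE = 54           # poster says "54 Mb/page"; read here as megabytes (assumption)
PAGES_PER_MONTH = 10_000   # approx. pages per partner per month
YEARS = 20


def pages_per_partner(pages_per_month=PAGES_PER_MONTH, years=YEARS):
    """Total pages one partner delivers over the whole period."""
    return pages_per_month * 12 * years


def storage_tb(pages, mb_per_page=MB_PER_PAGE, copies=1):
    """Storage in decimal terabytes (1 TB = 1,000,000 MB) for the given number of copies."""
    return pages * mb_per_page * copies / 1_000_000


if __name__ == "__main__":
    pages = pages_per_partner()
    print(pages)                        # 2400000 pages per partner
    print(storage_tb(pages))            # 129.6 TB for a single copy
    print(storage_tb(pages, copies=4))  # 518.4 TB if four full-size copies are kept
```

At 2.4 million pages per partner, roughly ten partners would be needed to reach the "tens of millions of pages" mentioned above; that partner count is an inference from the arithmetic, not a figure from the poster.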
GET: Manuscript Division
1. Unique identifiers are assigned for tangible media in archival collection.
2. Accessioning: Discovery of media → Remove from original location and note context → Physical custody transfer to shelf → Collection catalog record → Registration and bagging prep
Bagging: Receive and log media → Virus scan → Disk image → Bag media separately using BagIt → Create preliminary reports including directory structure, file identification, & validation → QR of bags and media logs → Return media to registrar
3. Registrar appraises flagged media & deletes as appropriate → Create SIP bag → Create tangible media backup stored in stacks → Update tracking database with # bytes ingested, # bytes disposed → Transfer to long-term storage (TBD)

GET: Serial & Government Publications Division
1. Partners assign files unique identifiers according to NDNP file naming conventions, in addition to generating digital signatures and fixity values using the NDNP Digital Viewer & Validation Toolkit.
2. Batches of approx. 10,000 pages/month from partner institutions arrive at Serials aggregated on external hard drives.
3. NDNP CTS workflow: Register batch delivery → Verify batch → QR → Mount drive to Bronze → Inventory → Malware scan → Copy to RDC staging → Bag in place → Ingest into ChronAm staging → Accept batch and copy to tape Sun 29, tape Frontier, and access Sun 11 → Ingest into ChronAm production → Return drive

GET: Geography & Map Division
1. LC control numbers used as identifiers. Project data are filed by year, organization, client, and topic.
2. If purchased, data is checked for completeness against purchase order. Byte count is checked for files delivered by FTP. Data is sorted by geographic location to prepare for cataloging. Original copies of data are transferred to storage separately from copies used in projects.
3. NGA CDs and DVDs (excluding series CB01) are logged in an inventory. Files are checked against an inventory if one was received from the creator; otherwise BagIt serves as the inventory. Some files zipped for storage. MS Access database is being developed to inventory CCP project files.

DESCRIBE: Life cycle framework
1. Describe or catalog digital material at collection, object, or file level according to LOC best practices.
2. Provide metadata to identify, characterize, or place the digital object in context.

DESCRIBE: Manuscript Division
1. During processing, original media are noted within collection-level finding aids encoded in EAD.
2. Before ingest, sufficient metadata is created to record context of digital materials within archival collection.

DESCRIBE: Serial & Government Publications Division
1. METS objects are produced at the level of title, reel, and issue (including pages).
2. Metadata relates each page-level digital object to the issue, reel, and title. Selected standards include METS, MODS, PREMIS, MIX, MARC XML, and Z39.87. Once objects are ingested, ITS periodically supplies technical, administrative, and preservation metadata to Serials.

DESCRIBE: Geography & Map Division
1. Bibliographic catalog record is created if necessary (National Geospatial Intelligence CDs and DVDs have an existing record).
2. Metadata is extracted from the data if possible.

SUSTAIN: Life cycle framework
1. Educate.
2. Survey.
3. Determine needs.
4. Develop preservation action plan.
5. Use existing LOC processes, tools, and practices.
6. Review preservation actions, plans, and policies.

MAKE AVAILABLE: Life cycle framework
1. Determine and manage access.
2. Monitor preservation status.
3. State missing or corrupted file.
4. Ensure authenticity.
5. Enable search across data sets.
6. Enable search for derivative content.
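Several of the GET practices rest on the same idea: record a byte count and a fixity value for every file at delivery, then compare later copies against that inventory. The following is a minimal sketch of that idea using only the Python standard library. It is not the NDNP toolkit or the BagIt tooling named on the poster; the function names (`sha256_of`, `build_inventory`, `compare_inventories`) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a SHA-256 fixity value, reading the file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """Map each file's relative path to (byte count, SHA-256 digest)."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): (p.stat().st_size, sha256_of(p))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_inventories(expected, actual):
    """Report paths that are missing, unexpected, or changed in size or digest."""
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "changed": sorted(p for p in shared if expected[p] != actual[p]),
    }
```

In use, an inventory built when media are received (or supplied by the creator) would be kept alongside the files, and `compare_inventories(saved, build_inventory(copy_dir))` would be run after each transfer; three empty lists mean the copy matches the delivery.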