Digital Preservation Policy Development at the Library of Congress
Emily Reynolds & Chelcie Rowell, 2012 OSI Junior Fellows

The Library’s Strategic Plan sets an outcome for 2014 that “The Library has identified and proposed criteria for the preservation of the Library’s digital materials.” We’re working to effect that outcome by gathering information about existing digital preservation practices and comparing those practices to the life cycle framework developed by the Library’s Preservation Working Group. Ultimately, our comparative analysis may contribute to revision of the life cycle framework.

PLAN

Life cycle framework
1. Prioritize and select digital collection.
2. Determine user group.
3. Verify content, type, and metadata.
4. Determine requirements for access, storage space, and server space.
5. Determine resource needs.

Manuscript Division
1. Unpublished born-digital records received as part of manuscript collections of personal papers and organizational records.
2. Discoverable through finding aids served publicly on-site at the Manuscript Reading Room. Access restrictions may require more granular access controls or enforceable triggers that make the collection available at a later date.
3. Sufficient metadata to record context of digital materials within archival collection. Bagged media + reports + collection metadata = SIP.
4. Access restrictions will determine reading room or Web access by collection. Storage requirements are currently small but expected to grow.
5. Resource needs still being determined.

Serial & Government Publications Division
1. Historic newspapers in the public domain (high demand and value but low risk) on high quality microfilm.
2. Publicly served immediately on the Web through Chronicling America site and API.
3. METS/ALTO objects and associated images.
4. Storage and access requirements
   • 4 preservation bags (2 copies per 2 preservation server) + access bag
   • online and offline storage
   • 54 Mb/page at approx. 10,000 pages per partner per month for 20 years → tens of millions of pages
5. Responsible for ingesting and sustaining digital materials rather than digitizing and describing.

Geography & Map Division
1. Data is acquired via Federal deposit, donation, or Acquisitions; CCP project data is downloaded from the Internet or acquired directly from client.
2. Most content is publicly accessible, with some private data used for specific Congressional research projects.
3. Accept any formats. Some content delivered on CD may have little existing metadata. Data requested by clients is verified for accuracy and completeness, as are data sources.
4. Server space is requested based on projected need, datasets kept up in archival storage.
5. Resource needs relate to the variety of formats taken in; management and access can be difficult, as most are proprietary and complex.

GET

Life cycle framework
1. Assign or confirm unique identifiers.
2. Pre-process items as appropriate.
3. Transfer, package, and inventory items.

Manuscript Division
1. Unique identifiers are assigned for tangible media in archival collection.
2. Accessioning: Discovery of media → Remove from original location and note context → Physical custody transfer to shelf → Collection catalog record → Registration and bagging prep
   Bagging: Receive and log media → Virus scan → Disk image → Bag media separately using BagIt → Create preliminary reports including directory structure, file identification, & validation → QR of bags and media logs → Return media to registrar
3. Registrar appraises flagged media & deletes as appropriate → Create SIP bag → Create tangible media backup stored in stacks → Update tracking database with # bytes ingested, # bytes disposed → Transfer to long-term storage (TBD)

Serial & Government Publications Division
1. Partners assign files unique identifiers according to NDNP file naming conventions, in addition to generating digital signatures and fixity values using the NDNP Digital Viewer & Validation Toolkit.
2. Batches of approx. 10,000 pages/month from partner institutions arrive at Serials aggregated on external hard drives.
3. NDNP CTS workflow: Register batch delivery → Verify batch → QR → Mount drive to Bronze → Inventory → Malware scan → Copy to RDC staging → Bag in place → Ingest into ChronAm staging → Accept batch and copy to tape Sun 29, tape Frontier, and access Sun 11 → Ingest into ChronAm production → Return drive

Geography & Map Division
1. LC control numbers used as identifiers. Project data are filed by year, organization, client, and topic.
2. If purchased, data is checked for completeness against purchase order. Byte count is checked for files delivered by FTP. Data is sorted by geographic location to prepare for cataloging. Original copies of data are transferred to storage separately from copies used in projects.
3. NGA CDs and DVDs (excluding series CB01) are logged in an inventory. Files are checked against an inventory if one was received from the creator; otherwise BagIt serves as the inventory. Some files zipped for storage. MS Access database is being developed to inventory CCP project files.

DESCRIBE

Life cycle framework
1. Describe or catalog digital material at collection, object, or file level according to LOC best practices.
2. Provide metadata to identify, characterize, or place the digital object in context.

Manuscript Division
1. During processing, original media are noted within collection-level finding aids encoded in EAD.
2. Before ingest, sufficient metadata is created to record context of digital materials within archival collection.

Serial & Government Publications Division
1. METS objects are produced at the level of title, reel, and issue (including pages).
2. Metadata relates each page-level digital object to the issue, reel, and title. Selected standards include METS, MODS, PREMIS, MIX, MARC XML, and Z39.87. Once objects are ingested, ITS periodically supplies technical, administrative, and preservation metadata to Serials.

Geography & Map Division
1. Bibliographic catalog record is created if necessary (National Geospatial Intelligence CDs and DVDs have an existing record).
2. Metadata is extracted from the data if possible.

SUSTAIN

Life cycle framework
1. Educate.
2. Survey.
3. Determine needs.
4. Develop preservation action plan.
5. Use existing LOC processes, tools, and practices.
6. Review preservation actions, plans, and policies.

MAKE AVAILABLE

Life cycle framework
1. Determine and manage access.
2. Monitor preservation status.
3. State missing or corrupted file.
4. Ensure authenticity.
5. Enable search across data sets.
6. Enable search for derivative content.
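
Several of the practices described here depend on fixity information: partners generate fixity values for NDNP batches, and media and batches are packaged with BagIt. As an illustration only, the Python sketch below shows the general idea behind a BagIt-style payload manifest, which records a checksum for every file under a bag's data directory so that later changes or losses can be detected. It is not the Library's tooling or the NDNP Digital Viewer & Validation Toolkit. The function names are hypothetical, SHA-256 is an assumed algorithm choice, and a complete bag would also need the bagit.txt declaration and tag files, which are omitted.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bag_dir: Path, algorithm: str = "sha256") -> dict[str, str]:
    """Map each payload file under bag_dir/data (path relative to the bag) to its digest."""
    payload = bag_dir / "data"
    return {
        path.relative_to(bag_dir).as_posix(): file_digest(path, algorithm)
        for path in sorted(payload.rglob("*"))
        if path.is_file()
    }


def write_manifest(bag_dir: Path, algorithm: str = "sha256") -> Path:
    """Write manifest-<algorithm>.txt with one 'digest  path' line per payload file."""
    manifest_path = bag_dir / f"manifest-{algorithm}.txt"
    entries = build_manifest(bag_dir, algorithm)
    lines = [f"{digest}  {name}" for name, digest in entries.items()]
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest_path


def verify_manifest(bag_dir: Path, algorithm: str = "sha256") -> list[str]:
    """Return the listed paths that are missing or whose digest no longer matches."""
    manifest_path = bag_dir / f"manifest-{algorithm}.txt"
    failures = []
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. an empty payload
        recorded, name = line.split(maxsplit=1)
        target = bag_dir / name
        if not target.is_file() or file_digest(target, algorithm) != recorded:
            failures.append(name)
    return failures


if __name__ == "__main__":
    # Hypothetical demonstration with throwaway files, not real collection data.
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        (bag / "data").mkdir()
        (bag / "data" / "page-0001.txt").write_text("sample page", encoding="utf-8")
        write_manifest(bag)
        print("failures after bagging:", verify_manifest(bag))
        (bag / "data" / "page-0001.txt").write_text("altered", encoding="utf-8")
        print("failures after alteration:", verify_manifest(bag))
```

Recording the manifest at transfer time and re-running the verification after each copy or ingest step is what lets a workflow state which files are missing or corrupted.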