Digital Preservation Policy at the Library of Congress

Emily Reynolds & Chelcie Rowell
Digital Preservation Policy Development
2012 OSI Junior Fellows at the Library of Congress

The Library’s Strategic Plan sets an outcome for 2014 that “The Library has identified and proposed criteria for the preservation of the Library’s digital materials.” We’re working to effect that outcome by gathering information about existing digital preservation practices and comparing those practices to the life cycle framework developed by the Library’s Preservation Working Group. Ultimately, our comparative analysis may contribute to revision of the life cycle framework.

PLAN

Framework:
1. Prioritize and select digital collection.
2. Determine user group.
3. Verify content, type, and metadata.
4. Determine requirements for access, storage space, and server space.
5. Determine resource needs.

Serial & Government Publications Division:
1. Historic newspapers in the public domain (high demand and value but low risk) on high-quality microfilm.
2. Publicly served immediately on the Web through the Chronicling America site and API.
3. METS/ALTO objects and associated images.
4. Storage and access requirements:
   • 4 preservation bags (2 copies per preservation server) + access bag
   • Online and offline storage
   • 54 Mb/page at approx. 10,000 pages per partner per month for 20 years → tens of millions of pages
5. Responsible for ingesting and sustaining digital materials rather than digitizing and describing.

Manuscript Division:
1. Unpublished born-digital records received as part of manuscript collections of personal papers and organizational records.
2. Discoverable through finding aids served publicly on-site at the Manuscript Reading Room. Access restrictions may require more granular access controls or enforceable triggers that make the collection available at a later date.
3. Sufficient metadata to record the context of digital materials within the archival collection. Bagged media + reports + collection metadata = SIP.
4. Access restrictions will determine reading room or Web access by collection. Storage requirements are currently small but expected to grow.
5. Resource needs are still being determined.

Geography & Map Division:
1. Data is acquired via Federal deposit, donation, or Acquisitions; CCP project data is downloaded from the Internet or acquired directly from the client.
2. Most content is publicly accessible, with some private data used for specific Congressional research projects.
3. Any format is accepted. Some content delivered on CD may have little existing metadata. Data requested by clients is verified for accuracy and completeness, as are data sources.
4. Server space is requested based on projected need; datasets are maintained in archival storage.
5. Resource needs relate to the variety of formats taken in; management and access can be difficult, as most formats are proprietary and complex.

GET

Framework:
1. Assign or confirm unique identifiers.
2. Pre-process items as appropriate.
3. Transfer, package, and inventory items.

Serial & Government Publications Division:
1. Partners assign files unique identifiers according to NDNP file naming conventions, in addition to generating digital signatures and fixity values using the NDNP Digital Viewer & Validation Toolkit.
2. Batches of approx. 10,000 pages/month from partner institutions arrive at Serials aggregated on external hard drives.
3. NDNP CTS workflow: Register batch delivery → Verify batch → QR → Mount drive to Bronze → Inventory → Malware scan → Copy to RDC staging → Bag in place → Ingest into ChronAm staging → Accept batch and copy to tape Sun 29, tape Frontier, and access Sun 11 → Ingest into ChronAm production → Return drive

Manuscript Division:
1. Unique identifiers are assigned for tangible media in the archival collection.
2. Accessioning: Discovery of media → Remove from original location and note context → Physical custody transfer to shelf → Collection catalog record → Registration and bagging prep
   Bagging: Receive and log media → Virus scan → Disk image → Bag media separately using BagIt → Create preliminary reports incl. directory structure, file identification, & validation → QR of bags and media logs → Return media to registrar
3. Registrar appraises flagged media & deletes as appropriate → Create SIP bag → Create tangible media backup stored in stacks → Update tracking database with # bytes ingested, # bytes disposed → Transfer to long-term storage (TBD)

Geography & Map Division:
1. LC control numbers are used as identifiers. Project data are filed by year, organization, client, and topic.
2. If purchased, data is checked for completeness against the purchase order. Byte count is checked for files delivered by FTP. Data is sorted by geographic location to prepare for cataloging. Original copies of data are transferred to storage separately from copies used in projects.
3. NGA CDs and DVDs (excluding series CB01) are logged in an inventory. Files are checked against an inventory if one was received from the creator; otherwise BagIt serves as the inventory. Some files are zipped for storage. An MS Access database is being developed to inventory CCP project files.

DESCRIBE

Framework:
1. Describe or catalog digital material at the collection, object, or file level according to LOC best practices.
2. Provide metadata to identify, characterize, or place the digital object in context.

Serial & Government Publications Division:
1. METS objects are produced at the level of title, reel, and issue (including pages).
2. Metadata relates each page-level digital object to the issue, reel, and title. Selected standards include METS, MODS, PREMIS, MIX, MARC XML, and Z39.87. Once objects are ingested, ITS periodically supplies technical, administrative, and preservation metadata to Serials.

Manuscript Division:
1. During processing, original media are noted within collection-level finding aids encoded in EAD.
2. Before ingest, sufficient metadata is created to record the context of digital materials within the archival collection.

Geography & Map Division:
1. A bibliographic catalog record is created if necessary (National Geospatial Intelligence CDs and DVDs have an existing record).
2. Metadata is extracted from the data if possible.

SUSTAIN

Framework:
1. Educate.
2. Survey.
3. Determine needs.
4. Develop preservation action plan.
5. Use existing LOC processes, tools, and practices.
6. Review preservation actions, plans, and policies.

MAKE AVAILABLE

Framework:
1. Determine and manage access.
2. Monitor preservation status.
3. State missing or corrupted file.
4. Ensure authenticity.
5. Enable search across data sets.
6. Enable search for derivative content.
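The Serial & Government Publications storage requirement (54 Mb/page at approx. 10,000 pages per partner per month for 20 years) can be sanity-checked with simple arithmetic. This is a minimal sketch; the function names and the `partners` parameter are hypothetical, since the poster gives neither a total size nor a partner count. It computes one copy only and keeps the poster's "Mb" unit without converting it.

```python
def ndnp_page_estimate(pages_per_month: int = 10_000, years: int = 20, partners: int = 1) -> int:
    """Total pages over the period, using the figures from the PLAN row."""
    return pages_per_month * 12 * years * partners


def storage_estimate_mb(pages: int, mb_per_page: float = 54.0) -> float:
    """Raw size of one copy of the page images, in the poster's 'Mb' unit."""
    return pages * mb_per_page


if __name__ == "__main__":
    per_partner = ndnp_page_estimate()
    print(per_partner)  # 2400000 pages per partner
    print(storage_estimate_mb(per_partner) / 1_000_000)  # 129.6 (millions of Mb)
```

Each partner contributes about 2.4 million pages over 20 years, so "tens of millions of pages" implies several partners contributing in parallel. One copy for one partner comes to roughly 129.6 TB if "Mb" means megabytes, or about 16.2 TB if it means megabits; the multiple preservation bags listed above would multiply whichever figure applies.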
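The Manuscript Division notes that access restrictions may require "enforceable triggers that make the collection available at a later date," and that restrictions determine reading room or Web access. One way to make such a trigger enforceable is to store it as data and check it at request time. The sketch below assumes a date-based trigger and two channels; `AccessRule`, `is_available`, and the rule that Web-approved material may also be viewed on-site are all hypothetical, not a described Library system.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical access rule for one born-digital collection."""
    channel: str                    # "reading_room" or "web"
    open_on: Optional[date] = None  # trigger date; None means no time-based restriction


def is_available(rule: AccessRule, channel: str, today: date) -> bool:
    """Return True if the collection may be served through `channel` on `today`."""
    if rule.open_on is not None and today < rule.open_on:
        return False  # the trigger has not fired yet, so the collection stays closed
    if rule.channel == "web":
        # Assumption: Web-approved material may also be viewed in the reading room.
        return channel in ("web", "reading_room")
    return channel == "reading_room"
```

Keeping the trigger as a stored date, rather than as a note in a finding aid, means the restriction lifts automatically without a manual review on the opening day.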
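The NDNP CTS workflow for Serials is a strict sequence: a batch should not, for example, be bagged before its malware scan. A minimal sketch of enforcing that order is shown below. The step names are transcribed from the poster, but the `BatchTracker` class is hypothetical and is not how CTS itself is implemented.

```python
from __future__ import annotations

# Steps transcribed from the poster's NDNP CTS workflow, in order.
CTS_STEPS = (
    "Register batch delivery",
    "Verify batch",
    "QR",
    "Mount drive to Bronze",
    "Inventory",
    "Malware scan",
    "Copy to RDC staging",
    "Bag in place",
    "Ingest into ChronAm staging",
    "Accept batch and copy to tape Sun 29, tape Frontier, and access Sun 11",
    "Ingest into ChronAm production",
    "Return drive",
)


class BatchTracker:
    """Hypothetical tracker that lets a batch advance only one step at a time, in order."""

    def __init__(self, batch_name: str) -> None:
        self.batch_name = batch_name
        self.completed: list[str] = []

    @property
    def next_step(self) -> str | None:
        """The step the batch must complete next, or None once the workflow is finished."""
        done = len(self.completed)
        return CTS_STEPS[done] if done < len(CTS_STEPS) else None

    def complete(self, step: str) -> None:
        """Record `step` as done, refusing any step taken out of order."""
        if step != self.next_step:
            raise ValueError(f"{self.batch_name}: expected {self.next_step!r}, got {step!r}")
        self.completed.append(step)
```

Rejecting out-of-order steps with an exception, rather than silently reordering them, leaves an explicit record whenever a batch skips verification.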
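The Manuscript Division bags each piece of media separately using BagIt. The sketch below uses only the Python standard library to show the basic bag layout: payload files under `data/`, a `bagit.txt` declaration, and a SHA-256 payload manifest in the "checksum  path" form described in the BagIt specification (RFC 8493). It illustrates the format and is not the Library's bagging tool. The helper `make_minimal_bag` is hypothetical, and it omits `bag-info.txt` and tag manifests, which real bags usually carry.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large disk images are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data, then write bagit.txt and manifest-sha256.txt."""
    payload = bag_dir / "data"
    shutil.copytree(source_dir, payload)  # raises FileExistsError if bag_dir/data exists

    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    lines = []
    for f in sorted(p for p in payload.rglob("*") if p.is_file()):
        rel = f.relative_to(bag_dir).as_posix()  # e.g. "data/disk01.img"
        lines.append(f"{sha256_of(f)}  {rel}")
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return bag_dir
```

Usage with hypothetical paths: `make_minimal_bag(Path("media_001"), Path("bags/media_001"))`. Because the manifest lists every payload file with its checksum, it can also serve as the inventory, which is how the Geography & Map Division uses BagIt when no creator inventory exists.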
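The Geography & Map Division checks purchased data for completeness against the purchase order and checks byte counts for files delivered by FTP. A minimal sketch of both checks follows, assuming the expected files and sizes have already been transcribed into a dictionary. `check_delivery` and the `expected_sizes` structure are hypothetical; the poster does not say how these checks are actually recorded.

```python
from __future__ import annotations

from pathlib import Path


def check_delivery(received_dir: Path, expected_sizes: dict[str, int]) -> dict[str, list[str]]:
    """Compare a flat FTP delivery folder with the expected file names and byte counts.

    expected_sizes (hypothetical) maps file name -> byte count, as it might be
    transcribed from a purchase order or the provider's file listing.
    """
    received = {p.name: p.stat().st_size for p in received_dir.iterdir() if p.is_file()}
    return {
        # Ordered but never arrived: the delivery is incomplete.
        "not_delivered": sorted(set(expected_sizes) - set(received)),
        # Arrived, but the byte count differs, e.g. a truncated transfer.
        "size_mismatch": sorted(
            name for name, size in expected_sizes.items()
            if name in received and received[name] != size
        ),
        # Arrived but not on the order: worth querying with the provider.
        "not_ordered": sorted(set(received) - set(expected_sizes)),
    }
```

A byte-count comparison is cheap and catches truncated transfers, but it cannot detect corruption that preserves file size; that requires checksums like those in a BagIt manifest.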
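Several steps above depend on checking files against an inventory: Geography & Map checks files against the creator's inventory or the BagIt manifest, Serials verifies each batch, and the framework's MAKE AVAILABLE stage calls for monitoring preservation status and ensuring authenticity. A minimal fixity audit against a SHA-256 manifest might look like the sketch below. It assumes the "checksum  path" manifest layout of a BagIt bag. The function `check_against_manifest` and its report keys are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_against_manifest(bag_dir: Path, manifest_name: str = "manifest-sha256.txt") -> dict:
    """Compare the payload under bag_dir/data with the manifest and report discrepancies."""
    expected = {}
    for line in (bag_dir / manifest_name).read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, rel = line.split(maxsplit=1)
            expected[rel.strip()] = checksum.lower()

    present = {
        p.relative_to(bag_dir).as_posix()
        for p in (bag_dir / "data").rglob("*")
        if p.is_file()
    }
    return {
        "missing": sorted(set(expected) - present),    # listed but absent
        "unlisted": sorted(present - set(expected)),   # present but not inventoried
        "corrupted": sorted(                           # present, but checksum changed
            rel for rel in set(expected) & present
            if sha256_of(bag_dir / rel) != expected[rel]
        ),
    }
```

Running such an audit on a schedule, rather than only at ingest, is what turns a one-time inventory into ongoing monitoring of preservation status.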
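The Manuscript Division's final GET step updates a tracking database with the number of bytes ingested and disposed after the registrar's appraisal. The poster does not describe that database. The sketch below shows one plausible shape for it in SQLite (via Python's built-in `sqlite3` module); the table name, its columns, and the helper functions are all hypothetical.

```python
from __future__ import annotations

import sqlite3


def open_tracking_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical tracking table with one row per media item."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS media_tracking (
               media_id TEXT PRIMARY KEY,
               collection TEXT NOT NULL,
               bytes_ingested INTEGER NOT NULL DEFAULT 0,
               bytes_disposed INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def record_media(conn: sqlite3.Connection, media_id: str, collection: str,
                 bytes_ingested: int, bytes_disposed: int = 0) -> None:
    """Log byte totals for one media item after appraisal."""
    with conn:  # commits on success, rolls back if the INSERT fails
        conn.execute(
            "INSERT INTO media_tracking VALUES (?, ?, ?, ?)",
            (media_id, collection, bytes_ingested, bytes_disposed),
        )


def collection_totals(conn: sqlite3.Connection, collection: str) -> tuple[int, int]:
    """Return (bytes ingested, bytes disposed) summed over one collection."""
    row = conn.execute(
        "SELECT COALESCE(SUM(bytes_ingested), 0), COALESCE(SUM(bytes_disposed), 0) "
        "FROM media_tracking WHERE collection = ?",
        (collection,),
    ).fetchone()
    return row[0], row[1]
```

Per-collection totals of this kind would also feed the PLAN-stage question of storage growth, since the division expects its currently small requirements to increase.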