METS-Bagger Tool
Normalizing existing digitized content into standardized
packages for robust long-term management.

Marcus Emmanuel Barnes
#c4lbc
2013-11-28
Background
● SFU Library holds about 15 TB of content
○ the Library has created high-quality master versions
of content it has digitized using ‘preservation-friendly’ formats.
○ descriptive metadata exists for almost all of it.

However, this content was not previously
managed with generally accepted digital
preservation practice.
Solution
● SFU Library Digitized Content Packaging
Specification
● METS-Bagger tool for normalizing existing
digitized content based on this specification
for robust long-term management.
METS-Bagger Tool
● Two components:
○ Collection normalization script
○ Integrity scripts based on collection
manifest
Collection Normalization
● Processes existing collections of files into a format
compliant with the SFU Library Digitized Content
Packaging Specification
● Packaging Formats:
○ METS (http://www.loc.gov/standards/mets/)
○ BagIt (http://tools.ietf.org/html/draft-kunze-bagit)
How Collection Normalization Works
1. Configuration file for settings
2. Script walks the directory tree of a collection, compiles
list of files to be preserved
3. Files are collated into items (e.g., newspaper issue),
METS file is generated
4. Item files and the associated METS file are bagged (and
serialized)
5. Future: A collection manifest is created for the collection
for integrity checking (automatic or manual).
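Steps 2 to 4 above can be sketched roughly as follows. This is a minimal illustration, not the METS-Bagger code itself: the function names (collate_items, md5_of, write_bag), the rule "one directory = one item", the default file extensions, and the choice of MD5 are all assumptions made for the example. METS generation and serialization are left out, and the bag layout is written by hand with only the two required BagIt files.

```python
import hashlib
import os
import shutil
import uuid
from collections import defaultdict


def collate_items(collection_root, extensions=(".tif", ".xml")):
    """Walk the collection tree and group files of interest by parent directory.

    Assumption: each directory holds exactly one item (e.g., a newspaper issue).
    """
    items = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(collection_root):
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                items[dirpath].append(os.path.join(dirpath, name))
    return dict(items)


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_bag(item_files, output_root):
    """Copy one item's files into a minimal BagIt layout named by a UUID."""
    bag_dir = os.path.join(output_root, str(uuid.uuid4()))
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir)
    manifest_lines = []
    for source in item_files:
        name = os.path.basename(source)
        target = os.path.join(data_dir, name)
        shutil.copy2(source, target)
        manifest_lines.append("%s  data/%s" % (md5_of(target), name))
    # Bag declaration required by the BagIt specification.
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as out:
        out.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    # Payload manifest: one "checksum  path" line per file under data/.
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w", encoding="utf-8") as out:
        out.write("\n".join(manifest_lines) + "\n")
    return bag_dir
```

Calling collate_items on a collection directory and passing each resulting file list to write_bag yields one UUID-named bag per item.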
Before and After Processing
Design Principles
● a minimalist implementation - uses as few METS and
BagIt options as possible.
● incorporates three widely implemented and understood
standards: METS, BagIt and UUID (Universally Unique
Identifiers)
● Technical metadata included in METS should include at
a minimum bit-level checksums, file type identification,
creating application, and where possible format validity
● Whenever possible, include descriptive metadata for the
item in the METS file.
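A minimalist METS file along these lines might be built as in the sketch below. It is an illustration only: build_mets and the shape of its input dictionaries are hypothetical, and the packaging specification may require a different structure. Only checksums and file type are recorded here; creating application, format validity, and descriptive metadata would go in additional METS sections that this sketch omits.

```python
import uuid
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(files):
    """Return a minimal METS document as a string.

    files: list of dicts with 'path', 'md5' and 'mimetype' keys
    (a hypothetical input shape chosen for this example).
    """
    # The item is identified by a freshly generated UUID.
    mets = ET.Element("{%s}mets" % METS_NS, {"OBJID": str(uuid.uuid4())})
    file_sec = ET.SubElement(mets, "{%s}fileSec" % METS_NS)
    file_grp = ET.SubElement(file_sec, "{%s}fileGrp" % METS_NS, {"USE": "master"})
    # METS requires at least one structMap; a single div represents the item.
    struct_map = ET.SubElement(mets, "{%s}structMap" % METS_NS)
    item_div = ET.SubElement(struct_map, "{%s}div" % METS_NS, {"TYPE": "item"})
    for index, info in enumerate(files, start=1):
        file_id = "FILE%04d" % index
        file_el = ET.SubElement(file_grp, "{%s}file" % METS_NS, {
            "ID": file_id,
            "CHECKSUM": info["md5"],
            "CHECKSUMTYPE": "MD5",
            "MIMETYPE": info["mimetype"],
        })
        ET.SubElement(file_el, "{%s}FLocat" % METS_NS, {
            "LOCTYPE": "URL",
            "{%s}href" % XLINK_NS: info["path"],
        })
        # Link the file into the structural map.
        ET.SubElement(item_div, "{%s}fptr" % METS_NS, {"FILEID": file_id})
    return ET.tostring(mets, encoding="unicode")
```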
Script Details
● Configuration file, main script, log file, processed
collection output directory
● Written in Python so the tool can run on multiple platforms
● Plugins for technical metadata (FITS) and descriptive
metadata.
● Configuration options include:
○ test run (limited run size)
○ skipping technical metadata creation
○ file types of interest
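A configuration file covering the options listed above could look something like the following sketch. The section name, the option names, the paths, and the load_settings helper are all invented for illustration; the actual tool's configuration format is not shown in these slides.

```python
import configparser

# Hypothetical settings; every key name and path below is an assumption.
SAMPLE_CONFIG = """
[normalization]
collection_dir = /path/to/collection
output_dir = /path/to/output
test_run = true
test_run_limit = 10
skip_technical_metadata = false
file_types = .tif, .jp2, .pdf
"""


def load_settings(text):
    """Parse the INI text into a plain dictionary of typed settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["normalization"]
    return {
        "collection_dir": section["collection_dir"],
        "output_dir": section["output_dir"],
        # Test run: process only a limited number of items.
        "test_run": section.getboolean("test_run"),
        "test_run_limit": section.getint("test_run_limit"),
        # Allows skipping the (slow) technical metadata step.
        "skip_technical_metadata": section.getboolean("skip_technical_metadata"),
        # File types of interest, as a tuple of lower-case extensions.
        "file_types": tuple(
            part.strip().lower() for part in section["file_types"].split(",")
        ),
    }
```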
Future
● Addition of manifest and integrity checking
tools that check a collection against its
manifest
● Additional plugins
● Sharing code on GitHub
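The planned manifest and integrity check could work roughly as sketched below. Since these tools are future work, everything here is an assumption: the function names, the use of SHA-256, and the idea of a manifest as a mapping from relative path to checksum.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(collection_root):
    """Map each file's path (relative, '/'-separated) to its checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(collection_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, collection_root)
            manifest[relative.replace(os.sep, "/")] = sha256_of(full_path)
    return manifest


def check_collection(collection_root, manifest):
    """Compare the collection on disk with a previously built manifest."""
    current = build_manifest(collection_root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            path for path in manifest
            if path in current and current[path] != manifest[path]
        ),
    }
```

An empty result in all three lists would indicate that the collection still matches its manifest.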
Thank You
This work was made possible by the support of:
● Simon Fraser University Library
● SFU Library Systems group
● Mark Jordan @mjordan
