METS-Bagger Tool
Normalizing existing digitized content into standardized
packages for robust long-term management.

Normalizing existing digitized content into standardized packages for robust long-term management. A report on SFU Library's METS-Bagger tool, with a discussion of the benefits, design principles used for the packaging specification, and potential next steps.

Presented at Code4Lib BC, November 28, 2013.


METS-Bagger Tool
Normalizing existing digitized content into standardized packages for robust long-term management.
Marcus Emmanuel Barnes
#c4lbc 2013-11-28

Background
● SFU Library holds about 15 TB of content
○ The Library has created high-quality master versions of the content it has digitized, using ‘preservation-friendly’ formats.
○ Descriptive metadata exists for almost all of it.
However, this content was not previously managed according to generally accepted digital preservation practice.

Solution
● SFU Library Digitized Content Packaging Specification
● METS-Bagger tool for normalizing existing digitized content, based on this specification, for robust long-term management

METS-Bagger Tool
● Two components:
○ Collection normalization script
○ Integrity scripts based on a collection manifest

Collection Normalization
● Processes existing collections of files into a format compliant with the SFU Library Digitized Content Packaging Specification
● Packaging formats:
○ METS (http://www.loc.gov/standards/mets/)
○ BagIt (http://tools.ietf.org/html/draft-kunze-bagit)
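To make the BagIt side of the packaging concrete, here is a minimal Python sketch of how one item's files could be bagged and then serialized. It is not the METS-Bagger code: the function names `make_bag` and `serialize_bag`, the use of SHA-256, and the `BagIt-Version: 0.97` declaration (one of the draft versions in use around 2013) are assumptions. A production tool would more likely rely on an existing BagIt library and could also write optional tag files such as `bag-info.txt`.

```python
import hashlib
import shutil
from pathlib import Path

# Illustrative sketch, not the METS-Bagger implementation.
# Assumption: target BagIt draft version 0.97 with a SHA-256 payload manifest.
BAGIT_DECLARATION = "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(files: list[Path], bag_dir: Path) -> Path:
    """Copy files into bag_dir/data, then write bagit.txt and the manifest.

    Assumes the item's files have unique base names (true when one item
    corresponds to one source folder).
    """
    payload = bag_dir / "data"
    payload.mkdir(parents=True)  # raises FileExistsError if the payload dir exists
    manifest_lines = []
    for src in files:
        dest = payload / src.name
        shutil.copy2(src, dest)
        # Manifest entries: "<checksum>  <path relative to the bag root>".
        manifest_lines.append(f"{sha256_of(dest)}  data/{src.name}")
    (bag_dir / "bagit.txt").write_text(BAGIT_DECLARATION, encoding="utf-8")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def serialize_bag(bag_dir: Path) -> Path:
    """Write bag_dir as <bag_dir>.tar.gz with the bag folder at the top level."""
    archive = shutil.make_archive(
        str(bag_dir), "gztar", root_dir=bag_dir.parent, base_dir=bag_dir.name
    )
    return Path(archive)
```

For example, `serialize_bag(make_bag(item_files, Path("output/issue-0001")))` (a hypothetical output path) would leave both the bag folder and `output/issue-0001.tar.gz`, whose single top-level folder is the bag itself.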

How Collection Normalization Works
1. A configuration file holds the settings.
2. The script walks the directory tree of a collection and compiles a list of files to be preserved.
3. Files are collated into items (e.g., a newspaper issue), and a METS file is generated for each item.
4. Each item's files and its associated METS file are bagged (and serialized).
5. Future: a manifest is created for the collection for integrity checking (automatic or manual).
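Steps 2 and 3 could look roughly like the following Python sketch. The grouping rule (every folder that directly contains files of interest becomes one item) and the example extensions are assumptions made for illustration; the slides do not describe the tool's actual collation logic or configuration keys.

```python
import os
from collections import defaultdict
from pathlib import Path

# Hypothetical defaults: in the real tool, the file types of interest come
# from the configuration file; these extensions are only examples.
FILE_TYPES_OF_INTEREST = {".tif", ".jp2", ".pdf", ".xml"}


def collate_items(
    collection_root: Path, file_types: set[str] = FILE_TYPES_OF_INTEREST
) -> dict[str, list[Path]]:
    """Walk a collection and group files of interest into items.

    Assumption: each folder that directly contains files of interest is one
    item (e.g., one newspaper issue per folder). Keys are folder paths
    relative to the collection root.
    """
    items: dict[str, list[Path]] = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(collection_root):
        dirnames.sort()  # walk subfolders in a stable order
        for name in sorted(filenames):
            path = Path(dirpath) / name
            # Skip anything that is not a file type of interest (e.g., Thumbs.db).
            if path.suffix.lower() in file_types:
                item_id = path.parent.relative_to(collection_root).as_posix()
                items[item_id].append(path)
    return dict(items)
```

Each resulting list would then be handed to the METS-generation and bagging steps, one item at a time.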

Before and After Processing

Design Principles
● A minimalist implementation: uses as few METS and BagIt options as possible.
● Incorporates three widely implemented and understood standards: METS, BagIt, and UUID (Universally Unique Identifiers).
● Technical metadata included in METS should include, at a minimum, bit-level checksums, file type identification, the creating application, and, where possible, format validity.
● Whenever possible, include descriptive metadata for the item in the METS file.
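As a rough illustration of these principles on the METS side, the sketch below builds a minimal METS document for one item with Python's standard `xml.etree.ElementTree`: a `urn:uuid:` object identifier, a `fileSec` whose `file` elements carry SHA-256 `CHECKSUM`/`CHECKSUMTYPE` attributes and a `MIMETYPE`, and the `structMap` that METS requires. Much of it is assumed: the function names and `FILE0001`-style IDs are hypothetical, guessing MIME types from extensions stands in for FITS-based identification, and the creating application, format validity, and descriptive metadata (`dmdSec`) are left out.

```python
import hashlib
import mimetypes
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets_tag(name: str) -> str:
    """Return a tag name qualified with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets(item_files: list[Path], label: str) -> ET.Element:
    """Build a minimal METS tree for one item (sketch, not the tool's output)."""
    root = ET.Element(mets_tag("mets"), {"OBJID": f"urn:uuid:{uuid.uuid4()}", "LABEL": label})
    file_grp = ET.SubElement(ET.SubElement(root, mets_tag("fileSec")), mets_tag("fileGrp"))
    # METS requires at least one structMap; here it simply lists the files in order.
    div = ET.SubElement(ET.SubElement(root, mets_tag("structMap")), mets_tag("div"), {"LABEL": label})
    for n, path in enumerate(item_files, start=1):
        file_id = f"FILE{n:04d}"  # hypothetical ID scheme
        # Extension-based guess; a FITS plugin would supply real format identification.
        mimetype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        # Reads the whole file at once, which is acceptable for a sketch.
        checksum = hashlib.sha256(path.read_bytes()).hexdigest()
        file_el = ET.SubElement(file_grp, mets_tag("file"), {
            "ID": file_id,
            "MIMETYPE": mimetype,
            "CHECKSUM": checksum,
            "CHECKSUMTYPE": "SHA-256",
        })
        # Location of the file relative to the METS document.
        ET.SubElement(file_el, mets_tag("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path.name})
        ET.SubElement(div, mets_tag("fptr"), {"FILEID": file_id})
    return root
```

Writing the result with `ET.ElementTree(root).write("mets.xml", encoding="utf-8", xml_declaration=True)` (a hypothetical file name) would produce a file that could be bagged alongside the item's files.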

Script Details
● Configuration file, main script, log file, processed collection output directory
● Written in Python so the tool can be used on multiple platforms
● Plugins for technical metadata (FITS) and descriptive metadata
● Configuration options include:
○ test run (limited run size)
○ skipping technical metadata creation
○ file types of interest
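The configuration options listed above might be read with Python's standard `configparser`, as in this sketch. The section and key names (`input_dir`, `test_run_limit`, `skip_technical_metadata`, `file_types`, and so on) and the log file name are hypothetical; the slides do not give the actual configuration file format.

```python
import configparser
from dataclasses import dataclass

# Hypothetical configuration layout; the tool's real option names may differ.
EXAMPLE_CONFIG = """
[collection]
input_dir = /path/to/collection
output_dir = /path/to/processed
log_file = mets_bagger.log

[options]
test_run = true
test_run_limit = 10
skip_technical_metadata = false
file_types = tif, jp2, pdf
"""


@dataclass
class Settings:
    input_dir: str
    output_dir: str
    log_file: str
    test_run: bool
    test_run_limit: int
    skip_technical_metadata: bool
    file_types: set[str]


def load_settings(text: str) -> Settings:
    """Parse INI-style settings, normalizing file types to '.ext' form."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    col, opts = parser["collection"], parser["options"]
    raw_types = opts.get("file_types", fallback="")
    return Settings(
        input_dir=col["input_dir"],
        output_dir=col["output_dir"],
        log_file=col["log_file"],
        test_run=opts.getboolean("test_run", fallback=False),
        test_run_limit=opts.getint("test_run_limit", fallback=10),
        skip_technical_metadata=opts.getboolean("skip_technical_metadata", fallback=False),
        file_types={"." + t.strip().lower().lstrip(".") for t in raw_types.split(",") if t.strip()},
    )


def select_files(paths: list[str], settings: Settings) -> list[str]:
    """Honour the test-run option by keeping only the first N files."""
    return paths[: settings.test_run_limit] if settings.test_run else paths
```

Keeping the test-run limit in configuration rather than in code lets the same script be tried on a small sample of a collection before a full run.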

Future
● Manifest and integrity checking tools that check a collection against its manifest
● Additional plugins
● Sharing the code on GitHub
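Because the collection manifest is still future work, its format is not specified. Assuming it uses the same `<sha256>  <relative path>` line format as a BagIt `manifest-sha256.txt` file, with paths relative to the manifest's folder, an integrity check could be sketched as follows. It reports files that still match, files whose checksums have changed, and files that are missing; detecting unlisted extra files is left out. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path) -> dict[str, list[str]]:
    """Check every manifest entry; paths are relative to the manifest's folder."""
    base = manifest_path.parent
    expected: dict[str, str] = {}
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        checksum, rel_path = line.split(maxsplit=1)
        expected[rel_path] = checksum.lower()

    report: dict[str, list[str]] = {"ok": [], "changed": [], "missing": []}
    for rel_path, checksum in sorted(expected.items()):
        target = base / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)
        elif file_sha256(target) != checksum:
            report["changed"].append(rel_path)
        else:
            report["ok"].append(rel_path)
    return report
```

For the "automatic" case, a scheduled job could run this over each bag and flag any non-empty `changed` or `missing` list for review.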

Thank You
This work was made possible by the support of:
● Simon Fraser University Library
● SFU Library Systems group
● Mark Jordan @mjordan