Metadata For Preservation Delos
 


    • Metadata for Preservation: Priscilla Caplan, Florida Center for Library Automation
    • Outline
      • Do it yourself: let’s invent some preservation metadata
      • The OAIS Information Model
      • Metadata standards for preservation
        • general preservation metadata standards
        • format-specific technical metadata
        • packaging standards
      • Problems/issues/interesting things
    • first things first
      • what is metadata?
      • how do we normally classify different types of metadata?
      • what is preservation metadata?
        • metadata related to the preservation management of information resources; for example, metadata used to document, or created as a result of, preservation processes performed on information resources.
        • information that supports and documents the long-term preservation of information materials.
    • Preservation Pyramid (diagram labeled “preservation metadata”): Fixity, Viability, Renderability, Description, Secure storage, Media management, Availability, Identity, Capture, Selection, Understandability, Authenticity, Format strategies (migration, emulation...), Authentication, Documentation
    • fixity
        • the quality of not being altered or deleted
        • threatened by insecure storage and media degradation
    • metadata supporting fixity
        • a message digest (checksum)
        • the algorithm used to generate it
        • when it was last calculated
        • who did the calculation
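The four fixity facts above belong together: a later audit can only re-verify a digest if it knows which algorithm produced it. The Python sketch below is a minimal illustration under assumptions: the record layout, field names, and the agent string `fixity-checker v0.1` are hypothetical, and SHA-256 is just one reasonable default.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FixityRecord:
    """Hypothetical record holding the four fixity facts from the slide."""
    digest: str         # the message digest (checksum) itself
    algorithm: str      # the algorithm used to generate it
    calculated_at: str  # when it was last calculated (ISO 8601, UTC)
    calculated_by: str  # who (or what) did the calculation


def compute_fixity(data: bytes, agent: str, algorithm: str = "sha256") -> FixityRecord:
    """Hash the bytes and record the context needed to re-verify them later."""
    digest = hashlib.new(algorithm, data).hexdigest()
    now = datetime.now(timezone.utc).isoformat()
    return FixityRecord(digest, algorithm, now, agent)


def verify_fixity(data: bytes, record: FixityRecord) -> bool:
    """Recompute with the *recorded* algorithm and compare to the stored digest."""
    return hashlib.new(record.algorithm, data).hexdigest() == record.digest


rec = compute_fixity(b"hello, archive", agent="fixity-checker v0.1")
print(rec.algorithm, rec.digest[:16])
print(verify_fixity(b"hello, archive", rec))   # True
print(verify_fixity(b"hello, archive!", rec))  # False
```

Storing the algorithm name alongside the digest also lets a repository move to a stronger algorithm later without losing the ability to check old values.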
    • viability
        • the quality of being readable from media
        • threatened by media degradation and media obsolescence
    • metadata supporting viability
        • the type of medium used to store the object
        • the age of the specific unit
        • the date the object was written to the unit
        • performance metrics for the medium (e.g., mean time to failure, MTTF)
        • usage metrics for the unit
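Most of these fields are plain facts about a storage unit, but an MTTF figure is easier to reason about once converted to an annualized failure rate (AFR). The sketch below assumes a constant failure rate (the exponential model), under which AFR = 1 - exp(-8760 / MTTF); the class, its field names, and the sample disk values are made up for illustration.

```python
import math
from dataclasses import dataclass
from datetime import date

HOURS_PER_YEAR = 8760  # 365 days x 24 hours


@dataclass
class StorageUnit:
    """Hypothetical viability record for one physical storage unit."""
    medium_type: str         # type of medium used to store the object
    in_service_since: date   # used to derive the age of the specific unit
    object_written_on: date  # date the object was written to the unit
    mttf_hours: float        # performance metric for the medium (vendor MTTF)
    power_on_hours: float    # usage metric for this particular unit


def unit_age_years(unit: StorageUnit, today: date) -> float:
    """Age of the unit in average-length years."""
    return (today - unit.in_service_since).days / 365.25


def annualized_failure_rate(mttf_hours: float) -> float:
    """Chance of failure within one year of continuous operation, assuming
    a constant failure rate (exponential model): 1 - exp(-8760 / MTTF)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


disk = StorageUnit("hard disk", date(2005, 3, 1), date(2006, 1, 15), 1_000_000, 20_000)
print(round(unit_age_years(disk, date(2008, 3, 1)), 1))    # 3.0
print(round(annualized_failure_rate(disk.mttf_hours), 4))  # 0.0087
```

A lower MTTF gives a higher yearly risk; combining that with unit age and usage to schedule media refresh would be a local policy decision, not something the metric settles on its own.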
    • renderability
        • the quality of being displayable, playable, or otherwise usable
        • threatened by format obsolescence
    • authenticity
        • the object is what it purports to be; both the source and the content are verifiable
        • threatened by unknown provenance, undocumented alterations
    • metadata supporting authenticity
        • the source of the object
        • a history of the custody of the object
        • a record of any changes to the object
        • a digital signature (maybe)
    • OAIS
    • OAIS Information Model
    • representation information
      • the information that is needed to make a Content Data Object understandable to a Designated Community
      • Structural:
        • the format is BIFF8 (the Excel 97-2003 binary format)
        • column 1 is a date yyyy-mm-dd, column 2 is a decimal
      • Semantic:
        • this is a daily business log for XYZ Corp.
        • col. 1 is the date of business, col. 2 is gross take in Euros
    • representation information
      • may be recursive
      • Structural:
        • the format is BIFF8
          • format specification for BIFF8 (in PDF)
            • format specification for PDF
          • rules for rendering as a spreadsheet
        • column 1 is a date yyyy-mm-dd, column 2 is a decimal
      • Semantic:
        • this is a daily business log for XYZ Corp.
        • col. 1 is the date of business, col. 2 is gross in Euros
          • currency equivalence chart
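The recursion can be modeled as a tree that is walked until each branch reaches something the Designated Community already understands (OAIS calls this its Knowledge Base). The Python sketch below uses a hypothetical `RepInfo` class and the XYZ Corp. example; it also guards against cycles, since a PDF format specification distributed as a PDF would otherwise lead back to itself forever.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepInfo:
    description: str
    # Representation information needed in turn to understand this item.
    needs: list[RepInfo] = field(default_factory=list)


def required_rep_info(obj: RepInfo, knowledge_base: set[str]) -> list[str]:
    """Depth-first walk of everything needed to understand `obj`, pruning
    branches the community already understands and skipping revisits."""
    collected: list[str] = []
    seen: set[int] = set()

    def walk(node: RepInfo) -> None:
        if id(node) in seen or node.description in knowledge_base:
            return
        seen.add(id(node))
        collected.append(node.description)
        for child in node.needs:
            walk(child)

    for child in obj.needs:
        walk(child)
    return collected


pdf_spec = RepInfo("format specification for PDF")
pdf_spec.needs.append(pdf_spec)  # the PDF spec is itself a PDF: a cycle
log = RepInfo("daily business log for XYZ Corp.", [
    RepInfo("the format is BIFF8", [
        RepInfo("format specification for BIFF8 (in PDF)", [pdf_spec]),
        RepInfo("rules for rendering as a spreadsheet"),
    ]),
    RepInfo("col. 2 is gross take in Euros", [RepInfo("currency equivalence chart")]),
])
print(required_rep_info(log, knowledge_base=set()))
print(required_rep_info(log, knowledge_base={"format specification for PDF"}))
```

The second call shows why the Designated Community matters: a community that already reads PDF needs one item less.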
    • preservation description information
      • The information necessary to preserve the Content Information
      • reference = identifier(s)
      • context = relation to other Content Information
      • provenance = history of creation, modification, custody
      • fixity = checksums and similar mechanisms
    • packaging information
      • the information which, either actually or logically, binds, identifies and relates the Content Information and Preservation Description Information
    • Standards
    • metadata standards for preservation
      • General preservation metadata standards
        • PREMIS (Preservation Metadata: Implementation Strategies)
        • LMER (Long-term Preservation Metadata for Electronic Resources)
      • Format-specific technical metadata
        • Z39.87 NISO/AIIM Technical metadata for digital still images
        • AES-X098B core audio metadata
      • Packaging standards
        • METS (Metadata Encoding and Transmission Standard)
        • MPEG-21 Digital Item Declaration Language
    • general standards
        • PREMIS (Preservation Metadata: Implementation Strategies)
        • LMER (Long-term Preservation Metadata for Electronic Resources)
    • PREMIS
        • an implementable core set of preservation metadata
        • defines preservation metadata as “the information a repository uses to support the digital preservation process”
        • defines core as what most repositories need to know most of the time
        • but what is implementable?
    • Implementable preservation metadata ...
        • is precisely defined
        • can be automatically supplied
        • can be automatically processed
          • e.g. prefer coded values from authority lists
        • is implementation independent
        • is based on a rigorous data model
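"Prefer coded values from authority lists" is what makes automatic processing possible: software can branch on a fixed set of values, whereas free text cannot be checked. The Python sketch below normalizes input against a small authority list and rejects anything else. The list itself is hypothetical and for illustration only; it is not an official PREMIS vocabulary.

```python
from enum import Enum


class EventType(Enum):
    """Hypothetical authority list of event types, for illustration only."""
    VALIDATION = "validation"
    INGESTION = "ingestion"
    MIGRATION = "migration"
    FIXITY_CHECK = "fixity check"


def parse_event_type(value: str) -> EventType:
    """Map free text onto the authority list, or reject it."""
    try:
        return EventType(value.strip().lower())
    except ValueError:
        allowed = ", ".join(e.value for e in EventType)
        raise ValueError(f"{value!r} is not a controlled value; expected one of: {allowed}") from None


print(parse_event_type(" Migration "))  # EventType.MIGRATION
```

A value such as "converted it to PDF/A" raises `ValueError` instead of silently entering the repository as uncheckable prose.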
    • PREMIS Data Model
    • Intellectual entity
        • Set of content that is considered a single intellectual unit for purposes of management and description (e.g., a book, a photograph, a map, a database)
        • May include other Intellectual Entities (e.g. a website that includes a web page)
        • Has one or more digital representations
        • Not described in PREMIS – use descriptive metadata
        • Examples:
          • Planets Newsletter, Issue 3
          • “Identical Twins” by Diane Arbus (a photograph)
          • Digital Curation Centre website
    • Object
        • What the repository actually preserves
        • Three types of object:
          • FILE: named and ordered sequence of bytes that is known by an operating system
          • REPRESENTATION: set of files, including structural metadata, that, taken together, constitute a complete rendering of an Intellectual Entity
          • BITSTREAM: data within a file with properties relevant for preservation purposes (but needs additional structure or reformatting to be stand-alone file)
    • Example: an IE with two representations
      • Intellectual Entity: “My dog Ace”
        • Representation 1: TIFF version (File 1: dog.TIFF)
        • Representation 2: JPEG2000 version (File 2: dog.JP2)
        • Bitstream 1: embedded metadata
    • Example 2: another IE with two representations
      • Intellectual Entity: Da Vinci Code by Dan Brown
        • Representation 1: page image version (File 1: page1.tiff, File 2: page2.tiff, …, File N: pageN.tiff)
        • Representation 2: ebook version (File 1: book.lit)
        • File N+1: METS.xml
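The containment of Intellectual Entity, Representation, File, and Bitstream can be sketched with plain Python dataclasses. This is not the PREMIS XML schema, only an illustration of the relationships; the class names are hypothetical, and because the original diagram does not survive, placing the embedded-metadata bitstream inside the TIFF file is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Bitstream:
    description: str


@dataclass
class File:
    name: str  # a named, ordered sequence of bytes known to the operating system
    bitstreams: list[Bitstream] = field(default_factory=list)


@dataclass
class Representation:
    label: str
    files: list[File]  # taken together, a complete rendering of the entity


@dataclass
class IntellectualEntity:
    title: str  # described with descriptive metadata, not with PREMIS
    representations: list[Representation] = field(default_factory=list)


ace = IntellectualEntity("My dog Ace", [
    # Assumption: the embedded-metadata bitstream sits inside the TIFF file.
    Representation("TIFF version", [File("dog.TIFF", [Bitstream("embedded metadata")])]),
    Representation("JPEG2000 version", [File("dog.JP2")]),
])

for rep in ace.representations:
    print(rep.label, [f.name for f in rep.files])
```

The Intellectual Entity carries only a title here because its full description lives in descriptive metadata outside PREMIS.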
    • Event
        • An action that involves or impacts at least one Object or Agent associated with or known by the preservation repository
        • Helps document digital provenance. Can track the history of an Object through the chain of Events that occur during the Object's lifecycle
        • Examples:
          • Validation Event: verify that chapter1.pdf is a valid PDF file
          • Ingest Event: transform an OAIS SIP into an AIP (one Event or multiple Events?)
          • Migration Event: create a new version of a file in a more current format
    • Agent
        • Person, organization, software program associated with an Event or a Right
        • Not defined in detail in PREMIS
        • Examples:
          • Seamus Ross (a person)
          • British Library (an organization)
          • DAITSS (a system)
          • dioscuri (a software program)
    • Rights
      • Rights statement describes one or more rights or permissions granted to the repository
        • What is the basis for claiming the right? – statute, copyright, license
        • What can the repository do?
        • Examples:
          • because copyright status is public domain, repository can give unrestricted access, make copies and make derivative works
          • because of license terms, repository can make up to 10 copies
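The two examples suggest a simple check: a rights statement names its basis, the acts it grants, and any limits on them. The Python sketch below encodes both examples. The act names `disseminate`, `replicate`, and `modify` are illustrative labels, not a quoted controlled vocabulary, and the function and class names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RightsStatement:
    basis: str                        # why the right is claimed: statute, copyright, license
    granted_acts: frozenset[str]      # what the repository may do
    max_copies: Optional[int] = None  # a limit imposed by the basis, if any


def may_perform(statements: list[RightsStatement], act: str, copies_made: int = 0) -> bool:
    """True if any statement grants `act` and its limits are not exhausted."""
    for s in statements:
        if act not in s.granted_acts:
            continue
        if act == "replicate" and s.max_copies is not None and copies_made >= s.max_copies:
            continue
        return True
    return False


public_domain = RightsStatement("copyright: public domain",
                                frozenset({"disseminate", "replicate", "modify"}))
licensed = RightsStatement("license", frozenset({"replicate"}), max_copies=10)

print(may_perform([public_domain], "modify"))                # True
print(may_perform([licensed], "replicate", copies_made=9))   # True
print(may_perform([licensed], "replicate", copies_made=10))  # False
print(may_perform([licensed], "disseminate"))                # False
```

Checking every statement, rather than the first match only, means an object covered by both a restrictive license and a more generous basis is not blocked by the stricter one.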
    • some things we say about Objects
        • object identifier
        • general technical characteristics, e.g.
          • size, format, fixity, inhibitors, creating application
          • composition level
        • format-specific technical characteristics (use extension)
        • original name
        • storage
        • environment
        • digital signature
        • relationships to other objects
        • relationships to agents, events, and rights statements
        • significant properties
    • significant properties
        • the characteristics of digital objects which must be preserved over time in order to ensure the continued accessibility, usability, and meaning of the objects, and their capacity to be accepted as evidence of what they purport to record. (Andrew Wilson)
        • the characteristics of a particular object subjectively determined to be important to maintain through preservation actions
    • how could you preserve this apple?
    • significant properties
        • performance model: a source file is interpreted through a process to create a performance; in other words, the object is meaningful only as it is perceived
        • often faceted as content, context, appearance (rendering), structure, and behavior
        • InSPECT (Investigating the Significant Properties of Electronic Content over Time)
        • can apply to all objects of a given format, or individual objects
        • may be in the eye of the beholder
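One practical use of significant properties is checking a preservation action, such as a migration, against them. The Python sketch below compares characteristics recorded before and after a hypothetical TIFF-to-JPEG 2000 migration. The property names and values are made up. The list of significant properties is passed in, so it can reflect either a format-wide policy or a judgment about one object.

```python
def changed_significant_properties(before: dict, after: dict, significant: list[str]) -> dict:
    """Return {property: (before, after)} for each significant property that differs."""
    return {
        p: (before.get(p), after.get(p))
        for p in significant
        if before.get(p) != after.get(p)
    }


# Hypothetical characteristics of a TIFF master and its JPEG 2000 migration.
tiff = {"width": 2400, "height": 1800, "color_space": "sRGB", "compression": "none"}
jp2 = {"width": 2400, "height": 1800, "color_space": "sRGB", "compression": "JPEG 2000 lossless"}

# Two curators, two judgments about what is significant.
print(changed_significant_properties(tiff, jp2, ["width", "height", "color_space"]))  # {}
print(changed_significant_properties(tiff, jp2, ["width", "compression"]))
# {'compression': ('none', 'JPEG 2000 lossless')}
```

The same migration passes under one definition of significance and fails under the other, which is the "eye of the beholder" problem in miniature.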
    • some things we say about Events
        • event identifier
        • event type
        • date and time
        • detail
        • outcome information
        • agents and their roles
        • objects and their roles
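These seven facts map naturally onto one record per event. The Python sketch below is an illustration, not the PREMIS XML binding: field names paraphrase the list above, the roles `validator` and `source` are hypothetical, and a UUID URN is just one convenient way to mint event identifiers. It records the validation event from the Event slide, with JHOVE assumed as the agent.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Link:
    identifier: str
    role: str  # the agent's or object's role in this event


@dataclass
class Event:
    event_type: str
    detail: str
    outcome: str
    agents: list[Link] = field(default_factory=list)
    objects: list[Link] = field(default_factory=list)
    event_identifier: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")
    date_time: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


validation = Event(
    event_type="validation",
    detail="verify that chapter1.pdf is a valid PDF file",
    outcome="valid",
    agents=[Link("JHOVE", "validator")],
    objects=[Link("chapter1.pdf", "source")],
)
print(json.dumps(asdict(validation), indent=2))
```

Linking agents and objects by identifier and role, rather than embedding them, lets one agent or object appear in many events, which is how the chain of events documents provenance.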
    • Sample Data Dictionary entry
    • PREMIS Maintenance Activity
    • LMER
      • Authored by Die Deutsche Bibliothek, used in kopal
      • Explicitly for exchange
      • Based on the National Library of New Zealand’s data model
    • a quick aside on archives
    • format-specific technical metadata
      • What kinds of properties are format-specific?
        • number of tracks
        • character set
        • height, width
        • color space
        • fonts
    • format-specific metadata “standards”
        • NISO/AIIM Z39.87-2006, Data Dictionary - Technical metadata for digital still images
        • AES-X098B, Core audio metadata XML definition (draft)
        • textMD (now maintained by the Library of Congress)
        • JHOVE and metadata extraction tools
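Extraction tools such as JHOVE automate this kind of header parsing across many formats. As a minimal hand-rolled illustration (not JHOVE's API), the Python sketch below reads height, width, bit depth, and color type from a PNG header. PNG is chosen because its header layout is fixed: an 8-byte signature followed by the IHDR chunk. The test header is synthetic, with no CRC or image data, so it is not a complete PNG file.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {
    0: "greyscale",
    2: "truecolour (RGB)",
    3: "indexed-colour",
    4: "greyscale with alpha",
    6: "truecolour with alpha",
}


def png_technical_metadata(header: bytes) -> dict:
    """Read format-specific properties from the first 26 bytes of a PNG file.

    Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then big-endian
    width (4 bytes), height (4 bytes), bit depth (1 byte), color type (1 byte).
    """
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file: missing signature or IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return {
        "format": "PNG",
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_space": COLOR_TYPES.get(color_type, f"unknown ({color_type})"),
    }


# Synthetic header: signature + IHDR length + type + 640x480, 8-bit, RGB, then
# compression, filter, and interlace bytes (CRC and image data omitted).
fake = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0)
print(png_technical_metadata(fake))
```

Because these values can be re-read from the file at any time, the sketch also frames one of the questions that follow: whether storing them separately adds anything.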
    • Z39.87
        • revised second edition
        • defers to PREMIS where elements overlap
        • XML binding is MIX, maintained by the Library of Congress
    • issues with format-specific metadata
        • how much of it is useful for preservation?
        • what would you use it for?
        • if you can always extract it from the file header on demand, do you need to extract and store it at all?
        • what to do when the schema for format-specific metadata also defines general technical metadata?
        • what is the proper role of registries?
    • packaging standards
        • METS (Metadata Encoding and Transmission Standard)
        • MPEG-21 Digital Item Declaration Language
        • IMS Global Learning Consortium Content Packaging Standards
        • Sharable Content Object Reference Model (SCORM)
        • CCSDS XML Packaging scheme
    • METS
    • structure of a METS document
        • amdSec can include source, provenance, rights, and technical metadata
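A skeleton of that structure can be generated with Python's standard `xml.etree.ElementTree`. This is a sketch, not a valid METS document: the administrative sections are empty placeholders (a real document would put the metadata in an `mdWrap` or point to it with `mdRef`), and the IDs and the example.org URL are hypothetical. It shows how `amdSec` groups the four kinds of administrative metadata and how a file points back to them through `ADMID`.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(file_id: str, href: str) -> ET.Element:
    root = ET.Element(m("mets"))
    ET.SubElement(root, m("dmdSec"), ID="DMD1")  # descriptive metadata

    # Administrative metadata: technical, rights, source, digital provenance.
    amd = ET.SubElement(root, m("amdSec"), ID="AMD1")
    for section in ("techMD", "rightsMD", "sourceMD", "digiprovMD"):
        ET.SubElement(amd, m(section), ID=f"{section.upper()}1")

    # Inventory of content files; ADMID links the file to its admin metadata.
    grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    f = ET.SubElement(grp, m("file"), ID=file_id, ADMID="TECHMD1 DIGIPROVMD1")
    ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})

    # Structural map: the only section METS requires.
    div = ET.SubElement(ET.SubElement(root, m("structMap")), m("div"), DMDID="DMD1")
    ET.SubElement(div, m("fptr"), FILEID=file_id)
    return root


doc = build_mets_skeleton("FILE1", "http://example.org/dog.jp2")
print(ET.tostring(doc, encoding="unicode"))
```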
    • Issues, problems, and interesting things
      • does preservation metadata actually work?
      • how best to store in working repositories?
      • role of centralized registries
      • what can be automated
      • best practices for interoperability
    • references
      • Priscilla Caplan, Preservation Metadata (DCC Digital Curation Manual) http://www.dcc.ac.uk/resource/curation-manual/chapters/preservation-metadata/
      • Brian Lavoie, Technology Watch Report: Preservation Metadata http://www.dpconline.org/docs/reports/dpctw05-01.pdf
      • PREMIS Maintenance Activity http://www.loc.gov/standards/premis/
      • METS Maintenance Activity http://www.loc.gov/standards/mets/
    • Creative Commons Licence
      • This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.