a centre of expertise in data curation and preservation




          Preservation metadata

                       Michael Day
                 Digital Curation Centre
                UKOLN, University of Bath
                  m.day@ukoln.ac.uk




DCC Information Day, University College London, January 16, 2007




                 Session overview
      – The need for preservation metadata
      – Some definitions and roles
      – Some influential standards
         • The OAIS Information Model
         • The PREMIS Data Dictionary
      – Some final challenges








       The need for preservation
              metadata








      Digital preservation options (1)
  •   Option 1: Retaining the original media
      – Sometimes known as "technology preservation" (also
        keeping all necessary hardware and software)
      – However, media will decay and become obsolete (as will
        associated hardware and software)
      – There may be some ways to rescue content, e.g. digital
        archaeology or forensics (expensive and unproven)








      Digital preservation options (2)
  •   Option 2: Retaining and maintaining the (original) bit
      stream
      – Known as bit-level preservation
         • An essential part of any long-term digital preservation
            strategy
      – However, keeping bits safe is insufficient by itself
      – There remains a need for additional information (e.g.
        software, hardware emulators, documentation, descriptive
        and contextual metadata) that supports both the authenticity
        of objects and their continued rendering
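A minimal sketch of one bit-level check, assuming the repository records a SHA-256 digest at ingest (the function names are illustrative, not taken from any particular repository system). It shows why bit safety is necessary but not sufficient: a matching digest confirms the bits are unchanged, but says nothing about whether they can still be rendered or understood.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def fixity_ok(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at ingest."""
    return sha256_digest(data) == recorded_digest


original = b"example bit stream"
recorded = sha256_digest(original)  # stored as metadata at ingest

print(fixity_ok(original, recorded))            # True: unchanged copy
print(fixity_ok(original + b"\x00", recorded))  # False: altered copy
```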








     Digital preservation options (3)
  • Option 3: The periodic (and ongoing) transformation
    of bit streams
      – Associated with migration strategies
      – Bit stream can always (?) be rendered within the current
        hardware and software environment
      – There still remains a need for descriptive and contextual
        metadata, also for additional information on the change-
        history of the object itself (provenance and stewardship)








                          Argument
  • Metadata is required to support these preservation
    strategies:
      – All digital preservation approaches depend (to some extent)
        on the creation, capture and maintenance of suitable
        metadata
          • "Preserving the right metadata is key to preserving digital
            objects" (ERPANET Briefing Paper, 2003)
      – An important area of ongoing research and development
        (and increasingly implementation)








If we start with an object ...




          ... we will need to answer some questions about it ...





  • Who created this object?
  • What intellectual property rights are vested in the object?
  • What format is it?
  • What software will I need to render it?
  • Is this software currently available?
  • Is there a codebook?
  • Are there any other dependencies?
  • Is the object related to other objects located within this (or any
    other) repository?
  • Has the object been changed in any way since ingest?
  • Who made these changes?
  • Can I be sure that the object is what it claims to be?
  • Who has previously had custody of this object?
  • Who currently has custody of this object?




      Some definitions and roles








    Roles of preservation metadata
      – The information needed "… to find, manage, control,
        understand or preserve … information over time" (Adrian
        Cunningham, 2000)
      – The various types of data that will allow the re-creation and
        interpretation of the structure and content of digital data over
        time (Ludäscher, Marciano & Moore, ACM SIGMOD Record,
        2001)
      – The "information a repository uses to support the digital
        preservation process," specifically: "the functions of
        maintaining viability, renderability, understandability,
        authenticity, and identity in a preservation context" (PREMIS
        Data Dictionary, 2005)






                     PREMIS roles
  • No specific definitions in PREMIS Data Dictionary:
      – Viability - bit streams (and systems) should be managed in a
        way that ensures that they continue to be available over time
      – Renderability - preservation strategies should be adopted in
        order to ensure that objects can be rendered in appropriate
        ways, e.g. within the current computing environment or
        through emulation
      – Understandability - Objects should be understandable (at
        various different levels, e.g. structure and semantics)
      – Authenticity - Objects should be what they claim to be ... bit
        integrity is not enough
      – Identity - Objects should be identifiable and able to be
        discovered in appropriate ways




      Activity                     Goal
                                   Authenticity, Understandability
      Preservation strategies      Renderability
      Media management             Viability
      Secure storage               Integrity
      Description                  Identity
      Capture, Selection           Availability

      Priscilla Caplan's revised Preservation Pyramid (2005), top to bottom





           Sources of metadata (1)
  • Embedded within objects themselves
      – Typical examples include TIFF headers, file properties in
        Office programs
      – Tools have been developed to capture some of this
        metadata automatically, e.g.:
         • New Zealand National Library preservation metadata
            extraction tool
         • JHOVE (JSTOR/Harvard Object Validation Environment)
            for the identification and validation of formats
      – However, can we always trust embedded metadata?
         • Do we regularly update file properties?
         • What do we do if there are conflicts?
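As an illustration of reading metadata embedded in the object itself, the sketch below parses the fixed 8-byte header of a classic TIFF file: a byte-order mark ("II" or "MM"), the magic number 42, and the offset of the first image file directory. It is a minimal example using only the standard library, not a substitute for tools such as JHOVE, and the function name is hypothetical.

```python
import struct


def parse_tiff_header(data: bytes) -> dict:
    """Read byte order, magic number and first IFD offset from a TIFF header."""
    if len(data) < 8:
        raise ValueError("a TIFF header is 8 bytes long")
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("no TIFF byte-order mark found")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("magic number is not 42, so not a classic TIFF")
    return {
        "byte_order": order.decode("ascii"),
        "magic": magic,
        "first_ifd_offset": ifd_offset,
    }


# A hand-built little-endian header, standing in for the start of a real file.
sample_header = b"II" + struct.pack("<HI", 42, 8)
header_info = parse_tiff_header(sample_header)
print(header_info)
```

A header that parses cleanly only shows that the file claims to be a TIFF; the questions above about trusting embedded metadata still apply.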





           Sources of metadata (2)
  • Associated with objects, e.g.:
      – Readme files or documentation
      – Databases (e.g., bibliographic catalogues, e-journal
        systems)
      – Documentation standards or codebooks (e.g. XML)
  • Created by the preservation repository itself
      – Part of ingest process
      – Automatically captured from the ongoing management of
        objects, recording, e.g.:
         • Custodial history
         • Format transformations
         • Usage




      Some influential standards








                  The OAIS Model
  • Reference Model for an Open Archival Information
    System (OAIS)
      – Development managed by the Consultative Committee for
        Space Data Systems (CCSDS)
      – CCSDS Blue Book 650.0-B-1 (2002)
      – ISO 14721:2003 (currently under review)
      – Has established a common framework of terms and
        concepts
      – The information model has been very relevant to the design
        of preservation metadata schemas
      – Question of OAIS "conformance"






    OAIS mandatory responsibilities
      – Negotiating and accepting information
      – Obtaining sufficient control of the information to ensure long-
        term preservation
      – Determining the "designated community"
      – Ensuring that information is independently
        understandable, i.e. without the assistance of those who
        produced it
      – Following documented policies and procedures
      – Making the preserved information available








          OAIS functional model (1)
      – Six entities
         • Ingest
         • Archival Storage
         • Data Management
         • Administration
         • Preservation Planning
         • Access
      – Described using UML diagrams ...








              OAIS functional model (2)
      – Producer -> SIP -> Ingest
      – Ingest -> AIP -> Archival Storage -> AIP -> Access
      – Ingest -> Descriptive info. -> Data Management -> Descriptive
        info. -> Access
      – Consumer -> queries, orders -> Access
      – Access -> result sets, DIP -> Consumer
      – Also shown: Preservation Planning, Administration, Management

      OAIS Functional Entities (Figure 4-1)






          OAIS information objects
      – Information Object (basic concept)
          • Data Object (bit-stream)
          • Representation Information
              – Permits “the full interpretation of Data Object into
                meaningful information”
              – Includes documentation, software, metadata, etc.
      – Information Object Classes
          • Content Information
          • Preservation Description Information (PDI)
          • Packaging Information
          • Descriptive Information






         OAIS information packages
  •   Information package:
       – Container that encapsulates Content Information and
          Preservation Description Information (PDI)
       – Different packages defined for submission to an archive
          (SIP), archival storage (AIP) and dissemination (DIP)
            • AIP = “... a concise way of referring to a set of
              information that has, in principle, all of the qualities
              needed for permanent, or indefinite, Long Term
              Preservation of a designated Information Object”
       – PDI = other information “which will allow the understanding
          of the Content Information over an indefinite period of time”
            • Reference, Provenance, Context, Fixity
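The package structure described above can be sketched as a few data classes. This is only an illustration of the concepts: OAIS is a reference model and does not prescribe any implementation, so every class, field and function name here is an assumption made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentInformation:
    data_object: bytes                     # the bit stream
    representation_information: List[str]  # documentation, software, metadata, etc.


@dataclass
class PDI:
    """Preservation Description Information: the four categories named above."""
    reference: str
    provenance: List[str] = field(default_factory=list)
    context: str = ""
    fixity: str = ""


@dataclass
class InformationPackage:
    kind: str  # "SIP", "AIP" or "DIP"
    content: ContentInformation
    pdi: PDI

    def __post_init__(self) -> None:
        if self.kind not in ("SIP", "AIP", "DIP"):
            raise ValueError(f"unknown package kind: {self.kind!r}")


def to_aip(sip: InformationPackage) -> InformationPackage:
    """Derive an AIP from a SIP, adding fixity and a provenance note."""
    pdi = PDI(
        reference=sip.pdi.reference,
        provenance=sip.pdi.provenance + ["ingested into archive"],
        context=sip.pdi.context,
        fixity=hashlib.sha256(sip.content.data_object).hexdigest(),
    )
    return InformationPackage("AIP", sip.content, pdi)


sip = InformationPackage(
    kind="SIP",
    content=ContentInformation(b"survey data", ["codebook.txt"]),
    pdi=PDI(reference="example-id-001", provenance=["deposited by producer"]),
)
aip = to_aip(sip)
print(aip.kind, aip.pdi.provenance)
```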






        PREMIS Working Group (1)
      – PREMIS WG = Preservation Metadata: Implementation
        Strategies
         • Sponsored by OCLC and RLG
         • Established 2003
         • International working group and advisory committee
            (practical focus)
              – Members from the US, the UK, the Netherlands,
                Germany, Australia and New Zealand
         • Chaired by Priscilla Caplan and Rebecca Guenther








        PREMIS Working Group (2)
  • Main objectives:
      – A 'core' set of preservation metadata elements (Data
        Dictionary)
      – Strategies for encoding, packaging, storing, managing, and
        exchanging metadata
  • Outputs:
      – Implementation Survey report (September 2004)
      – PREMIS Data Dictionary (May 2005)
         • The data dictionary is a translation of the OAIS-based
           2002 Framework into a set of implementable semantic
           units
         • Based on data model ...





               PREMIS data model
      – Five entities:
         • Intellectual entities
         • Objects
         • Events
         • Agents
         • Rights








           PREMIS Data Dictionary
      – Defined various "semantic units" for:
         • Objects ... at three levels of entity (representation, file,
            bitstream)
         • Events ... metadata about actions
         • Agents ... but are not the main focus
         • Rights ... primarily those that relate to preservation
      – Also:
         • An XML implementation
      – Maintenance activity (led by the Library of Congress)
      – PREMIS Implementors' Group (PIG)
      – Already thinking about lessons for version 2.0
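A rough sketch of how Objects, Events and Agents relate, assuming a much simplified model. The field names are illustrative and are not the semantic units defined in the Data Dictionary; Rights and Intellectual entities are left out for brevity. The point is that recording events against an object is what later answers questions such as "Has the object been changed since ingest?" and "Who made these changes?".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Event:
    event_type: str  # e.g. "ingest", "migration"
    date_time: str   # ISO 8601 timestamp
    agent_id: str    # links the event to the agent responsible
    outcome: str


@dataclass
class PreservedObject:
    identifier: str
    level: str        # "representation", "file" or "bitstream"
    format_name: str
    events: List[Event] = field(default_factory=list)

    def record(self, event_type: str, agent: Agent, outcome: str) -> Event:
        """Append an event describing an action performed on this object."""
        event = Event(
            event_type,
            datetime.now(timezone.utc).isoformat(),
            agent.identifier,
            outcome,
        )
        self.events.append(event)
        return event


ingest_service = Agent("agent-01", "Repository ingest service")
curator = Agent("agent-02", "Curator")

obj = PreservedObject("obj-001", "file", "TIFF")
obj.record("ingest", ingest_service, "success")
obj.record("migration", curator, "success")

# Who made changes to this object, and what were they?
print([(e.event_type, e.agent_id) for e in obj.events])
```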






                   Other standards
      – Standards developed from many different perspectives:
         • PREMIS Data Dictionary
         • OCLC/RLG Preservation Metadata Framework, Cedars,
           NEDLIB, NLA, NLNZ ... OAIS influence has been
           strongest in this area
         • METS, NISO Z39.87 (emerged from digitisation contexts)
      – Other standards have also been developed with other
        aspects of object management in mind:
         • Records management (VERS, RKMS, ISO 23081-1
           Metadata for Records)
         • Multimedia (MPEG-7, SMPTE Metadata Dictionary)
         • Rights management (MPEG-21)






           Some final challenges








          Is metadata sustainable?
      – Metadata is expensive to create and maintain:
         • There is a need to balance the risks of data loss (or costs
           of recovery) with the costs of creating metadata
         • Automatic capture of some types of metadata
             – Metadata already embedded in objects or in
                secondary databases; also need to capture event
                metadata from archive processes
         • Sharing information via registries of format information
           (Representation Information)
         • Avoid imposing unnecessary costs:
             – Need to identify the right metadata (or 'core
                metadata')





       The role of shared registries
  • For the sharing of information about formats (and
    metadata)
      – There is "… a pressing need to establish reliable, sustained
        repositories of file format specifications, documentation, and
        related software" (Lawrence, et al., 2000)
      – Examples:
          • Global Digital Format Registry (GDFR)
              – Harvard University Library
              – Funded by the Andrew W. Mellon Foundation
          • PRONOM technical registry (The National Archives)
          • DCC Representation Information Registry and
            Repository (demo)
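To make the idea of a format registry concrete, the toy example below identifies a few formats from their leading "magic" bytes. The signature table is a hypothetical local stand-in: it does not reproduce the data or interfaces of GDFR, PRONOM or the DCC registry, which hold far richer information maintained and shared between repositories.

```python
from typing import Optional

# Well-known leading byte signatures for a handful of formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def identify_format(data: bytes) -> Optional[str]:
    """Return a format name if the data starts with a known signature."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


print(identify_format(b"%PDF-1.4 ..."))
print(identify_format(b"plain text with no signature"))
```

A shared registry spares each repository from maintaining such a table on its own, which is one way of containing metadata costs.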





                     Further reading
  •   Three chapters from the DCC Digital Curation Manual:
      http://www.dcc.ac.uk/resource/curation-manual/chapters/
       – Priscilla Caplan, "Preservation metadata" (July 2006)
       – Wendy Duff & Marlene van Ballegooie, "Archival Metadata" (May
          2006)
       – Michael Day, "Metadata" (November 2005)
  •   Brian Lavoie & Richard Gartner, "Preservation metadata." DPC
      Technology Watch Report 05-01 (September 2005):
      http://www.dpconline.org/docs/reports/dpctw05-01.pdf
  •   PREMIS Data Dictionary for Preservation Metadata (May 2005):
      http://www.oclc.org/research/projects/pmwg/
  •   Reference Model for an Open Archival Information System (OAIS),
      CCSDS 650.0-B-1 (January 2002):
      http://public.ccsds.org/publications/archive/650x0b1.pdf





                 Acknowledgements
  UKOLN is funded by the Museums, Libraries and Archives Council, the
  Joint Information Systems Committee (JISC) of the UK higher and further
  education funding councils, as well as by project funding from the JISC,
  the European Union and other sources. UKOLN also receives support from
  the University of Bath, where it is based: http://www.ukoln.ac.uk/




  The Digital Curation Centre is funded by the JISC and the UK e-Science
  Programme: http://www.dcc.ac.uk/




Digital preservation from a records management perspective
 
The Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiativesThe Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiatives
 
Repositories and digital preservation
Repositories and digital preservationRepositories and digital preservation
Repositories and digital preservation
 

Recently uploaded

Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
Scholarhat
 
Marketing internship report file for MBA
Marketing internship report file for MBAMarketing internship report file for MBA
Marketing internship report file for MBA
gb193092
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Chapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdfChapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdf
Kartik Tiwari
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Multithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race conditionMultithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race condition
Mohammed Sikander
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
deeptiverma2406
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 

Recently uploaded (20)

Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
Marketing internship report file for MBA
Marketing internship report file for MBAMarketing internship report file for MBA
Marketing internship report file for MBA
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Chapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdfChapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdf
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Multithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race conditionMultithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race condition
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 

Preservation metadata

  • 1. a centre of expertise in data curation and preservation Preservation metadata Michael Day Digital Curation Centre UKOLN, University of Bath m.day@ukoln.ac.uk DCC Information Day, University College London, January 16, 2007
  • 2. Session overview – The need for preservation metadata – Some definitions and roles – Some influential standards • The OAIS Information Model • The PREMIS Data Dictionary – Some final challenges
  • 3. The need for preservation metadata
  • 4. Digital preservation options (1) • Option 1: Retaining the original media – Sometimes known as "technology preservation" (also keeping all necessary hardware and software) – However, media will decay and become obsolete (as will associated hardware and software) – There may be some ways to rescue content, e.g. digital archaeology or forensics (expensive and unproven)
  • 5. Digital preservation options (2) • Option 2: Retaining and maintaining the (original) bit stream – Known as bit-level preservation • An essential part of any long-term digital preservation strategy – However, keeping bits safe is insufficient by itself – There remains a need for additional information (e.g. software, hardware emulators, documentation, descriptive and contextual metadata) that supports both the authenticity of objects and their continued rendering
  • 6. Digital preservation options (3) • Option 3: The periodic (and ongoing) transformation of bit streams – Associated with migration strategies – Bit stream can always (?) be rendered within the current hardware and software environment – There still remains a need for descriptive and contextual metadata, also for additional information on the change-history of the object itself (provenance and stewardship)
  • 7. Argument • Metadata is required to support these preservation strategies: – All digital preservation approaches depend (to some extent) on the creation, capture and maintenance of suitable metadata • "Preserving the right metadata is key to preserving digital objects" (ERPANET Briefing Paper, 2003) – An important area of ongoing research and development (and increasingly implementation)
  • 8. If we start with an object ... ... we will need to answer some questions about it ...
  • 9. Who created this object? What intellectual property rights are vested in the object? What format is it? What software will I need to render it? Is this software currently available? Is there a codebook? Are there any other dependencies? Has the object been changed in any way since ingest? Who made these changes? Is the object related to other objects located within this (or any other) repository? Can I be sure that the object is what it claims to be? Who currently has custody of this object? Who has previously had custody of this object?
  • 10. Some definitions and roles
  • 11. Roles of preservation metadata – The information needed "… to find, manage, control, understand or preserve … information over time" (Adrian Cunningham, 2000) – The various types of data that will allow the re-creation and interpretation of the structure and content of digital data over time (Ludäscher, Marciano & Moore, ACM SIGMOD Record, 2001) – The "information a repository uses to support the digital preservation process," specifically: "the functions of maintaining viability, renderability, understandability, authenticity, and identity in a preservation context" (PREMIS Data Dictionary, 2005)
  • 12. PREMIS roles • No specific definitions in PREMIS Data Dictionary: – Viability - bit streams (and systems) should be managed in a way that ensures that they continue to be available over time – Renderability - preservation strategies should be adopted in order to ensure that objects can be rendered in appropriate ways, e.g. within the current computing environment or through emulation – Understandability - objects should be understandable (at various different levels, e.g. structure and semantics) – Authenticity - objects should be what they claim to be ... bit integrity is not enough – Identity - objects should be identifiable and able to be discovered in appropriate ways
  • 13. [Figure: Priscilla Caplan's revised Preservation Pyramid (2005). Labels as extracted: Authenticity, Understandability, Preservation strategies, Renderability, Media management, Viability, Secure storage, Integrity, Description, Identity, Capture, Availability, Selection]
  • 14. Sources of metadata (1) • Embedded within objects themselves – Typical examples include TIFF headers, file properties in Office programs – Tools have been developed to capture some of this metadata automatically, e.g.: • New Zealand National Library preservation metadata extraction tool • JHOVE (JSTOR/Harvard Object Validation Environment) for the identification and validation of formats – However, can we always trust embedded metadata? • Do we regularly update file properties? • What do we do if there are conflicts?
  • 17. Sources of metadata (2) • Associated with objects, e.g.: – Readme files or documentation – Databases (e.g., bibliographic catalogues, e-journal systems) – Documentation standards or codebooks (e.g. XML) • Created by the preservation repository itself – Part of ingest process – Automatically captured from the ongoing management of objects, recording, e.g.: • Custodial history • Format transformations • Usage
  • 18. Some influential standards
  • 19. The OAIS Model • Reference Model for an Open Archival Information System (OAIS) – Development managed by the Consultative Committee for Space Data Systems (CCSDS) – CCSDS Blue Book 650.0-B-1 (2002) – ISO 14721:2003 (currently under review) – Has established a common framework of terms and concepts – The information model has been very relevant to the design of preservation metadata schemas – Question of OAIS "conformance"
  • 20. OAIS mandatory responsibilities – Negotiating and accepting information – Obtaining sufficient control of the information to ensure long-term preservation – Determining the "designated community" – Ensuring that information is independently understandable, i.e. without the assistance of those who produced it – Following documented policies and procedures – Making the preserved information available
  • 21. OAIS functional model (1) – Six entities • Ingest • Archival Storage • Data Management • Administration • Preservation Planning • Access – Described using UML diagrams ...
  • 22. OAIS functional model (2) [Figure: OAIS Functional Entities (Figure 4-1). External roles: Producer, Consumer, Management. Entities: Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access. Labelled flows: SIP, AIP, DIP, descriptive info., queries, result sets, orders]
  • 23. OAIS information objects – Information Object (basic concept) • Data Object (bit-stream) • Representation Information – Permits “the full interpretation of Data Object into meaningful information” – Includes documentation, software, metadata, etc. – Information Object Classes • Content Information • Preservation Description Information (PDI) • Packaging Information • Descriptive Information
  • 24. OAIS information packages • Information package: – Container that encapsulates Content Information and Preservation Description Information (PDI) – Different packages defined for submission to an archive (SIP), archival storage (AIP) and dissemination (DIP) • AIP = “... a concise way of referring to a set of information that has, in principle, all of the qualities needed for permanent, or indefinite, Long Term Preservation of a designated Information Object” – PDI = other information “which will allow the understanding of the Content Information over an indefinite period of time” • Reference, Provenance, Context, Fixity
  • 25. PREMIS Working Group (1) – PREMIS WG = Preservation Metadata: Implementation Strategies • Sponsored by OCLC and RLG • Established 2003 • International working group and advisory committee (practical focus) – Members from the US, the UK, the Netherlands, Germany, Australia and New Zealand • Chaired by Priscilla Caplan and Rebecca Guenther
  • 26. PREMIS Working Group (2) • Main objectives: – A 'core' set of preservation metadata elements (Data Dictionary) – Strategies for encoding, packaging, storing, managing, and exchanging metadata • Outputs: – Implementation Survey report (September 2004) – PREMIS Data Dictionary (May 2005) • The data dictionary is a translation of the OAIS-based 2002 Framework into a set of implementable semantic units • Based on data model ...
  • 27. PREMIS data model [Figure: entities in the data model: Intellectual entities, Objects, Events, Agents, Rights]
  • 28. PREMIS Data Dictionary – Defined various "semantic units" for: • Objects ... at three levels of entity (representation, file, bitstream) • Events ... metadata about actions • Agents ... but are not the main focus • Rights ... primarily those that relate to preservation – Also: • An XML implementation – Maintenance activity (led by the Library of Congress) – PREMIS Implementors' Group (PIG) – Already thinking about lessons for version 2.0
  • 30. Other standards – Standards developed from many different perspectives: • PREMIS Data Dictionary • OCLC/RLG Preservation Metadata Framework, Cedars, NEDLIB, NLA, NLNZ ... OAIS influence has been strongest in this area • METS, NISO Z39.87 (emerged from digitisation contexts) – Other standards have also been developed with other aspects of object management in mind: • Records management (VERS, RKMS, ISO 23081-1 Metadata for Records) • Multimedia (MPEG-7, SMPTE Metadata Dictionary) • Rights management (MPEG-21)
  • 31. Some final challenges
  • 32. Is metadata sustainable? – Metadata is expensive to create and maintain: • There is a need to balance the risks of data loss (or costs of recovery) with the costs of creating metadata • Automatic capture of some types of metadata – Metadata already embedded in objects or in secondary databases; also need to capture event metadata from archive processes • Sharing information via registries of format information (Representation Information) • Avoid imposing unnecessary costs: – Need to identify the right metadata (or 'core metadata')
  • 33. The role of shared registries • For the sharing of information about formats (and metadata) – There is "… a pressing need to establish reliable, sustained repositories of file format specifications, documentation, and related software" (Lawrence, et al., 2000) – Examples: • Global Digital Format Registry (GDFR) – Harvard University Library – Funded by the Andrew W. Mellon Foundation • PRONOM technical registry (The National Archives) • DCC Representation Information Registry and Repository (demo)
  • 35. Further reading • Three chapters from the DCC Digital Curation Manual: http://www.dcc.ac.uk/resource/curation-manual/chapters/ – Priscilla Caplan, "Preservation metadata" (July 2006) – Wendy Duff & Marlene van Ballegooie, "Archival Metadata" (May 2006) – Michael Day, "Metadata" (November 2005) • Brian Lavoie & Richard Gartner, "Preservation metadata." DPC Technology Watch Report 05-01 (September 2005): http://www.dpconline.org/docs/reports/dpctw05-01.pdf • PREMIS Data Dictionary for Preservation Metadata (May 2005): http://www.oclc.org/research/projects/pmwg/ • Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1 (January 2002): http://public.ccsds.org/publications/archive/650x0b1.pdf
  • 36. Acknowledgements UKOLN is funded by the Museums, Libraries and Archives Council, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC, the European Union and other sources. UKOLN also receives support from the University of Bath, where it is based: http://www.ukoln.ac.uk/ The Digital Curation Centre is funded by the JISC and the UK e-Science Programme: http://www.dcc.ac.uk/
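
The slides note that a repository can capture event metadata automatically from its own processes, and that OAIS Preservation Description Information includes Fixity. A minimal sketch of how those two ideas might combine is given here in Python, assuming a SHA-256 digest recorded at ingest. The function names, the record layout and the agent label "repository-system" are illustrative assumptions and are not taken from the PREMIS Data Dictionary or the OAIS model.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a bit stream."""
    return hashlib.sha256(data).hexdigest()


def fixity_check(data: bytes, recorded_digest: str, agent: str) -> dict:
    """Compare the current digest with the one recorded at ingest and
    return an event record (hypothetical layout, for illustration only)."""
    outcome = "pass" if sha256_of(data) == recorded_digest else "fail"
    return {
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventOutcome": outcome,
        "agent": agent,
    }


# Digest captured when the object enters the repository.
original = b"example bit stream"
digest_at_ingest = sha256_of(original)

# Later checks: one on the unchanged object, one on an altered copy.
ok_event = fixity_check(original, digest_at_ingest, "repository-system")
bad_event = fixity_check(original + b"!", digest_at_ingest, "repository-system")

print(ok_event["eventOutcome"], bad_event["eventOutcome"])
```

A passing check shows only that the bits are unchanged; as the slides stress, bit integrity alone does not establish authenticity, so a record like this would sit alongside provenance and custody information rather than replace it.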