Preservation metadata

Andrew Waugh
Senior Manager, Standards and Policy
Public Record Office Victoria
Structure of the talk
•   What is preservation metadata?
•   Recordkeeping metadata in theory
•   NAA/ANZ recordkeeping metadata standard
•   PREMIS – standard for preservation metadata
•   Practical reading and implementing tips
•   Conclusions
What is preservation?
• The ability to access content for as
  long as it is required
• Access means
  – Being able to find the content
  – Extract information from the content
  – Understand the context of the content
  – Be confident of the history of the content
Preservation metadata
• Preservation metadata is the information
  necessary to maintain access to content
• Difference between short and long term
  access is one of degree of metadata, not kind
• As preservation professionals, we are rarely
  interested in the content, just managing it.
  Preservation metadata is the basic information
  that we use to do our job
Examples of preservation metadata
•   Identifier
•   Creation date
•   Title
•   History information
•   Relationship between objects
•   Data formats
Recordkeeping Metadata
• The archival profession has been developing
  recordkeeping (=preservation) metadata for
  around a decade
• This work provides a useful framework to think
  about preservation and metadata
RK Metadata Standards
• ISO 23081 Information and documentation –
  Records management processes – Metadata for
  records
  – Part 1: Principles
  – Part 2: Conceptual and implementation issues
• National Archives of Australia (and Archives New
  Zealand) - Recordkeeping Metadata Standard
  Version 2.0
  – http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
• Forthcoming Australian/New Zealand Standard
Metadata from a records view
• Records are content, context, and structure
• Record management metadata is data
  describing the context, content, and structure
  of records and their management through time
  (ISO 15489-1:2001, 3.12)
• Recordkeeping metadata is the key to
  providing access (and hence preservation)
• In practice, metadata is everything except the
  actual content of the record
Purpose of recordkeeping metadata
• The purpose of recordkeeping metadata includes
  – Protecting records as evidence
  – Ensuring their accessibility and usability through time
  – Facilitating the ability to understand records
  – Helping ensure the authenticity, reliability and integrity of
    records
  – Supporting and managing access, privacy, and rights
  – Supporting the migration of records from one
    (preservation) system to another
Metadata at record capture
• Records are captured into a system, and
  metadata is created/captured with them
• This metadata documents
  – Environment in which records were created
  – Purpose or business activity being undertaken
  – Relationship with other records or aggregations
  – Physical or technical structure of the record
  – Logical structure of the record
Metadata after record capture
• Metadata captured after record creation
  documents what happened to a record over
  time
  – Demonstrates authenticity, reliability, usability, and
    integrity
• Answers the basic questions of who, what,
  when, where, why
Metadata after disposal
• Metadata is a record itself, and some parts
  may need to be kept after the record has been
  disposed of to account for its existence,
  management, and disposition
Four entity model
• Modern Australian recordkeeping metadata
  models normally are expressed in terms of
  entities
  – Records (the objects to be preserved: record, file,
    series…)
  – Agents (people who create and use the records)
  – The business transacted
  – Mandates (the rules governing the business)
Four entity model




• [Figure: four entity model, from ISO 23081-2, s6.1]
One, two, three, four entity models
• The four entity model can be flattened to
  facilitate implementation
  – A system could store only one entity (record)
    which contains metadata for agents, business,
    and mandates
  – Practical because most metadata is captured at
    creation; subsequent changes in relationships or
    information are less relevant
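The flattening described above can be sketched in a few lines of Python. This is only an illustration: the class names, field names, and the flatten helper are hypothetical and are not taken from ISO 23081 or the NAA/ANZ standard.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Business:
    identifier: str
    name: str


@dataclass
class Mandate:
    identifier: str
    name: str


@dataclass
class Record:
    identifier: str
    title: str
    agents: list = field(default_factory=list)
    business: list = field(default_factory=list)
    mandates: list = field(default_factory=list)


def flatten(record: Record) -> dict:
    """Collapse a record and its related entities into a single record entity."""
    return {
        "identifier": record.identifier,
        "title": record.title,
        # Related entities are copied in as plain values at capture time,
        # so later changes to an agent or mandate are not reflected here.
        "agent_names": [a.name for a in record.agents],
        "business_names": [b.name for b in record.business],
        "mandate_names": [m.name for m in record.mandates],
    }


record = Record(
    identifier="REC-001",
    title="Minutes of meeting",
    agents=[Agent("AG-1", "Records Officer")],
    business=[Business("BU-1", "Committee administration")],
    mandates=[Mandate("MA-1", "Records policy")],
)
flat = flatten(record)
```

The trade-off is the one noted on the slide: the flat form is simple to store, but it freezes agent, business, and mandate information as it stood at capture.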
Metadata associated with an entity




• [Figure: metadata associated with an entity, from ISO 23081-2, s6.1]
Identity metadata
• Distinguishes entity from all other entities in
  the domain
  – Entity type (e.g. record, agent)
  – Aggregation (e.g. file, record)
  – Registration Identifier (the actual identifier)
Description metadata
• Describes the entity to allow determination if
  this is the entity sought
  – Title
  – Classification
  – Abstract
  – Place
  – External Identifiers
• WARNING – description elements are
  normally business specific
Use metadata
• Assists long-term access to the entity
  – Technical environment
  – Rights (who may legally use it & under what
    conditions)
  – Access (access control)
  – Language
  – Integrity
  – Documentary form
Event plan
• Allows the entity to be managed
• Consists of management actions that are
  planned to occur in the future
  – Appraisal (To keep or not)
  – Disposal (Implementation of appraisal decision)
  – Preservation
  – Access Control (Changes to)
  – Rights (Changes to)
Event history
• Documents the trail of past events
• Who, what, when, why
  – Event identifier
  – Event date/time
  – Event type
  – Event description
  – Event relation (mandate, agent)
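As a rough sketch, an event history can be kept as an append-only list of events, each carrying the elements listed above. The class names, field names, and the identifier format below are assumptions made for illustration, not part of any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    identifier: str      # event identifier
    when: datetime       # event date/time
    event_type: str      # event type (what)
    description: str     # event description
    agent_id: str        # event relation: the agent responsible (who)
    mandate_id: str      # event relation: the authorising mandate (why)


class EventHistory:
    """Append-only trail of past events for one entity."""

    def __init__(self):
        self._events = []

    def record(self, event_type, description, agent_id, mandate_id, when=None):
        # Default to the current UTC time when no explicit time is given.
        when = when or datetime.now(timezone.utc)
        identifier = f"EV-{len(self._events) + 1:04d}"
        event = Event(identifier, when, event_type, description, agent_id, mandate_id)
        self._events.append(event)
        return event

    def trail(self):
        """Return the events in chronological order."""
        return sorted(self._events, key=lambda e: e.when)


history = EventHistory()
history.record("migration", "Converted to a new format", "AG-2", "MA-1",
               when=datetime(2009, 3, 1, tzinfo=timezone.utc))
history.record("capture", "Captured into the system", "AG-1", "MA-1",
               when=datetime(2008, 7, 16, tzinfo=timezone.utc))
```

Events are frozen once recorded, which reflects the idea that the history documents what happened rather than being edited afterwards.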
Relation
• Links two (or more) entities
• Implicitly bi-directional, but need not be
  implemented this way
• Relationships often have a time span
  – Entity Identifiers (from, to)
  – Relationship type
  – Relationship description
  – Relationship date range
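One possible way to model this, assuming a relation is stored once and queried from either end, is sketched below. The Relation class, its field names, and the related_to helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Relation:
    from_id: str
    to_id: str
    relation_type: str
    description: str = ""
    start: Optional[date] = None   # open-ended if None
    end: Optional[date] = None     # open-ended if None

    def active_on(self, day: date) -> bool:
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


def related_to(relations, entity_id, day):
    """Return identifiers linked to entity_id on the given day, in either direction."""
    linked = []
    for r in relations:
        if not r.active_on(day):
            continue
        if r.from_id == entity_id:
            linked.append(r.to_id)
        elif r.to_id == entity_id:
            linked.append(r.from_id)
    return linked


relations = [
    Relation("AG-1", "REC-001", "created", start=date(2008, 7, 16)),
    Relation("AG-2", "REC-001", "controls",
             start=date(2009, 1, 1), end=date(2009, 12, 31)),
]
```

Each relation is stored in one direction only; the lookup makes it behave as bi-directional, and the date range limits which relations apply on a given day.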
NAA/ANZ metadata standard
• Same content, two standards
• NAA version
  – Recordkeeping Metadata Standard Version 2.0
  – http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
  – Based on five entities (Record, Agent, Business,
    Mandate, Relationship)
  – Defines 26 elements with 44 sub-elements
  – Includes extensive element schemes
NAA/ANZ Elements
All Entities
Entity Type
Category
Identifier*
Name*
Date Range
Description

Legend: Mandatory Element, Conditional Element, Optional Element




Record             Agent           Business          Mandate              Relationship
Jurisdiction*      Jurisdiction*   Jurisdiction*     Jurisdiction*        Related Entity*
Security Class*    Permissions*    Security Class*   Security Class*      Change History*
Security Caveat*   Contact*        Permissions*      Security Caveat*
Rights*            Position*                         Coverage*
Language*          Language*
Coverage*
Keyword*
Disposal*
Format
Extent*
Medium
Integrity Check
Location*
Document Form
Precedence
Future Australian Standard
• Work is in progress on an Australian Standard
  for recordkeeping metadata
• Based on the NAA/ANZ metadata standard
• Focus on relationships
PREMIS
• Preservation metadata is the information a
  repository uses to support the digital preservation
  process
• Supports the viability, renderability,
  understandability, authenticity, and identity of digital
  objects
• Built on OAIS reference model
• Data dictionary & supporting materials
   – http://www.loc.gov/standards/premis/
PREMIS scope
• Not intended to define all preservation elements,
  only those that most repositories are likely to need to
  know in order to support digital preservation
• Excludes
   – Format specific metadata (even for a class of format)
   – Repository specific metadata and business rules
   – Descriptive metadata
   – Detailed information about media or hardware
   – Information about agents, apart from minimum required for
     identification
   – Information about rights and permissions, except those
     that directly affect preservation functions
PREMIS Data Model




• [Figure: PREMIS data model, from Understanding PREMIS, http://www.loc.gov/standards/premis/understanding-premis.pdf]
PREMIS Entities
• Intellectual Entity – set of content that is a single
  intellectual unit – has no metadata in PREMIS
• Object Entity – things actually stored in a repository
   – Representation Object – collection of all file objects
     necessary to represent an intellectual entity
   – File Object – discrete object on a computer file system
   – Bitstream Object – portion of a file
• Event Entity – contains the history of an Object
• Rights Entity – rights and permissions about object
• Agent Entity – actors involved in events or rights
Elements for Object Entities
•   Unique Identifier
•   Fixity information
•   Size
•   Format
•   Original Name
•   Creators
•   Inhibitors (things designed to prevent use)
•   Significant properties (aspects that must be preserved)
•   Environment (infrastructure required to use)
•   Storage media
•   Digital signatures
•   Relationship with other entities
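Fixity information is typically a message digest that lets a repository check that an object has not changed. A minimal sketch using Python's standard hashlib module follows; the dictionary keys and function names are illustrative assumptions and are not PREMIS element names.

```python
import hashlib


def fixity(data: bytes, algorithm: str = "sha256") -> dict:
    """Compute fixity information (digest and size) for an object's bytes."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return {"algorithm": algorithm, "digest": digest, "size": len(data)}


def verify(data: bytes, stored: dict) -> bool:
    """Recompute the digest and compare it with the stored fixity information."""
    current = fixity(data, stored["algorithm"])
    return current["digest"] == stored["digest"] and current["size"] == stored["size"]


stored = fixity(b"example content")
```

Storing the algorithm name alongside the digest means a later check does not have to guess how the digest was produced.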
NAA/ANZ vs PREMIS
• NAA/ANZ
  – Recordkeeping is about relationships, so includes
    the context of objects which is often necessary
    to understand the object
  – Documents the management plan for the object
• PREMIS
  – Deliberately focuses on preserving the files that
    form a digital object – context is important, but
    not documented
  – Documents critical information necessary to
    use objects
Reading metadata schemas

   Don’t panic at the length…
General observations
• Most metadata schemes are lengthy, but
  contain relatively little information
• If you understand the typical structure, it is
  easy to quickly pick out the information you
  need
• Metadata schemes tend to be aspirational –
  what the drafters thought you should do, often
  beyond what you can do or have to do
Metadata schemes
• Typical metadata schemes contain
  – Entities (i.e. objects modelled)
     • Definition
     • Lists valid elements
  – Elements (i.e. specific pieces of information)
     •   Definition
     •   Mandatory, optional, conditional flag
     •   Repeatable or not
     •   Structure (child elements)
  – Element schemas (i.e. controls over the values that can be
    used)
     • Lists of valid values (e.g. States)
     • Format controls (e.g. dates)
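The structure above maps naturally onto a small validator. The sketch below is an assumption-laden illustration: the ElementDef class, the obligation strings, and the sample scheme are all hypothetical and do not reproduce any published standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class ElementDef:
    name: str
    obligation: str = "optional"            # "mandatory", "conditional" or "optional"
    repeatable: bool = False
    allowed: Optional[set] = None           # element scheme: list of valid values
    check: Optional[Callable] = None        # element scheme: format control
    required_if: Optional[Callable] = None  # condition for conditional elements


def is_iso_date(value) -> bool:
    """Format control: accept only values that parse as ISO 8601 dates."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def validate(entity: dict, definitions: list) -> list:
    """Return a list of problems found when checking entity against the scheme."""
    errors = []
    for d in definitions:
        values = entity.get(d.name, [])
        if not isinstance(values, list):
            values = [values]
        required = d.obligation == "mandatory" or (
            d.obligation == "conditional"
            and d.required_if is not None
            and d.required_if(entity)
        )
        if required and not values:
            errors.append(f"{d.name}: missing")
        if len(values) > 1 and not d.repeatable:
            errors.append(f"{d.name}: not repeatable")
        for v in values:
            if d.allowed is not None and v not in d.allowed:
                errors.append(f"{d.name}: invalid value {v!r}")
            if d.check is not None and not d.check(v):
                errors.append(f"{d.name}: bad format {v!r}")
    return errors


# Hypothetical scheme for a record entity.
RECORD_SCHEME = [
    ElementDef("identifier", "mandatory"),
    ElementDef("created", "mandatory", check=is_iso_date),
    ElementDef("state", "optional", allowed={"VIC", "NSW"}),
    ElementDef("keyword", "optional", repeatable=True),
    ElementDef("medium", "conditional",
               required_if=lambda e: e.get("format") == "physical"),
]
```

Reading a scheme with this shape in mind usually comes down to extracting, for each element, its obligation, whether it repeats, and what constrains its values.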
Implementation
• Metadata schemes are information models, not
  implementation instructions
• Adopting a scheme means that your implementation
  has
  –   the mandatory elements
  –   the conditional elements (if relevant)
  –   (perhaps) some of the optional elements
  –   the correct element structure
• Metadata schemes are often associated with a
  representation standard (e.g. in XML)
  – Still not an implementation – often just for exchange
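To illustrate the distinction between internal storage and an exchange representation, the sketch below keeps metadata in a plain dictionary and converts to and from XML only at the boundary. The element names are hypothetical and do not follow any published XML schema; the code uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def to_exchange_xml(record: dict) -> str:
    """Serialise an internal record (a dict) to a simple XML exchange form."""
    root = ET.Element("record")
    for name, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            # Repeatable elements become repeated child elements.
            child = ET.SubElement(root, name)
            child.text = str(v)
    return ET.tostring(root, encoding="unicode")


def from_exchange_xml(text: str) -> dict:
    """Parse the exchange form back into a dict of element name -> list of values."""
    root = ET.fromstring(text)
    record = {}
    for child in root:
        record.setdefault(child.tag, []).append(child.text)
    return record


internal = {"identifier": "REC-001", "keyword": ["minutes", "committee"]}
exchange = to_exchange_xml(internal)
```

The internal form could equally be database rows or objects; only the import and export functions need to know about the XML.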
Conclusions
• Preservation metadata is simply the
  information that preservation professionals
  use to ensure continued access to objects
• What is viewed as essential depends on your
  discipline (what features is it necessary to
  preserve?)
  – E.g. archivists are concerned about context,
    librarians less so
Conclusions (2)
• Typical preservation metadata
  – Identity information
  – Technical details and organisation of the
    objects to be preserved
  – Rights and access
  – History of object
• Other common metadata
  – Description
  – Management Plans
  – Relationships between objects
Conclusions (3)
• You only have to implement the logical model
  and the mandatory elements
• Standards are usually aspirational – include
  metadata that is nice to have, but not essential
• Specific representations (e.g. XML) are for
  data exchange, not how you must implement
  them internally
