Presentation, 16 May keynote, Karin Bredenberg
1. Preservation Metadata: an introduction to PREMIS and its application in audiovisual archives
Karin Bredenberg, The National Archives of Sweden
Member of PREMIS Editorial Committee
2013-05-16
2. The Challenge of Digital Preservation
3. How to access the material 1 month, 1 year, 10 years from now?
• Information about the material
  – Intellectual information (Descriptive Metadata)
    • Who
    • Why
    • And so on
  – “Physical” information (Digital Preservation Metadata)
    • Which kind of file am I?
    • What has happened to me during the years?
    • Who can look at me?
    • And so on
Metadata = data about data
Digital Preservation Metadata = metadata that is essential to ensure long-term accessibility of digital resources
4. What Digital Preservation Metadata to store?
• A best guess on the future
  – little experience validating the longevity of digital objects
  – uncertain future technical possibilities
  – uncertain future legal framework
• Digital objects must be self-descriptive
• Must be able to exist independently from the systems which were used to create them
  – XML (machine and human readable)
5. OAIS
Open Archival Information System, also known as the ISO Reference Model for an OAIS
(A simple OAIS explanation by Richard Pearce-Moses and more)
6. The PREMIS Data Dictionary
• Information you need to know for preserving digital objects
• Available online through the PREMIS website
• Preservation Metadata: Implementation Strategies
  – Includes PREMIS Data Dictionary, context/assumptions, data model, usage examples
  – XML schema to support implementation
7. PREMIS Web and PREMIS EC
• Web site:
  – Permanent Web presence (http://www.loc.gov/standards/premis/), hosted by Library of Congress
  – Central destination for PREMIS-related info, announcements, resources
  – Home of the PREMIS Implementers’ Group (PIG) discussion list (email@example.com)
• PREMIS Editorial Committee:
  – Sets directions/priorities for PREMIS development
  – Considers proposals for changes
  – Coordinates revisions of Data Dictionary and XML schema
  – Consists of members with different affiliations from all over the world
  – Meets once a month (sometimes more)
  – Hosts PREMIS events, e.g. the PREMIS Implementation Fair at iPRES
8. OAIS Reference Model and PREMIS
• OAIS reference model specifies the Preservation Description Information (PDI)
• PREMIS used the OAIS information model as a starting point
• PREMIS Data Dictionary consolidated and further developed the conceptual types of information objects into more than 100 structured and logically integrated semantic units.
• PREMIS Data Dictionary provided detailed descriptions and guidelines to implement these semantic units.
• PREMIS Data Dictionary does not provide semantic units for Intellectual Entities, but provides semantic units to link to other metadata sources for Intellectual Entities (this will change in version 3)
• All entities have reference (identification) information.
• No “packaging information” that links content with metadata, but PREMIS can be used with container schemas
• PREMIS deals mostly with representation, context, provenance, and fixity information, in keeping with the PREMIS definition of preservation metadata.
9. The PREMIS data model: 5 interacting entities
[Diagram: Intellectual Entity, Object, Event, Agent, Rights; diagram label: identifier]
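The entities and their identifier-based links can be sketched in Python roughly as follows. This is an illustrative model only: the class names, field names and sample values are assumptions chosen for readability, not the semantic unit names of the Data Dictionary, and the Intellectual Entity is left out because version 2 only links to external descriptions of it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Identifier:
    """Typed identifier used to reference an entity (hypothetical shape)."""
    type: str
    value: str


@dataclass
class PremisObject:
    identifier: Identifier
    category: str  # "representation", "file" or "bitstream"


@dataclass
class Agent:
    identifier: Identifier
    name: str


@dataclass
class Event:
    identifier: Identifier
    event_type: str
    date_time: str
    # Links hold identifiers of other entities, not embedded copies of them.
    linked_objects: List[Identifier] = field(default_factory=list)
    linked_agents: List[Identifier] = field(default_factory=list)


@dataclass
class Rights:
    identifier: Identifier
    basis: str
    linked_objects: List[Identifier] = field(default_factory=list)


# Sample records with made-up identifiers.
tiff = PremisObject(Identifier("local", "obj-0001"), "file")
archivist = Agent(Identifier("local", "agent-01"), "Example Archivist")
ingest = Event(
    Identifier("local", "event-0001"),
    event_type="ingestion",
    date_time="2013-05-16T09:00:00Z",
    linked_objects=[tiff.identifier],
    linked_agents=[archivist.identifier],
)
```

Storing links as identifiers keeps each record independent, so the records can be stored or exchanged separately and joined again later.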
11. Sample Data Dictionary Entry
Semantic unit: size
Semantic components: None
Definition: The size in bytes of the file or bitstream stored in the repository.
Rationale: Size is useful for ensuring the correct number of bytes from storage have been retrieved and that an application has enough room to move or process files. It might also be used when billing for storage.
Data constraint: Integer
Object category   Representation   File             Bitstream
Applicability     Not applicable   Applicable       Applicable
Repeatability                      Not repeatable   Not repeatable
Obligation                         Optional         Optional
Examples: 2038927
Creation/Maintenance notes: Automatically obtained by the repository.
Usage notes: Defining this semantic unit as size in bytes makes it unnecessary to record a unit of measurement. However, for the purpose of data exchange the unit of measurement should be stated or understood by both partners.
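As a minimal sketch of how a repository might obtain this value automatically, the following Python reads the size of a file in bytes and writes it into a size element. It assumes the PREMIS 2.x XML namespace; in the XML schema the element sits inside objectCharacteristics, which is omitted here, and the helper name and temporary file are purely illustrative.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumes PREMIS 2.x
ET.register_namespace("premis", PREMIS_NS)


def size_element(path):
    """Return a premis:size element holding the file size in bytes."""
    elem = ET.Element(f"{{{PREMIS_NS}}}size")
    # Plain integer; the unit (bytes) is implied by the definition.
    elem.text = str(os.path.getsize(path))
    return elem


# Create a throwaway 2048-byte file to stand in for a stored object.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 2048)
    sample_path = tmp.name

size = size_element(sample_path)
size_xml = ET.tostring(size, encoding="unicode")
os.remove(sample_path)
print(size_xml)
```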
12. Scope
• What PREMIS DD is:
  – Common data model for organizing/thinking about preservation metadata
  – Implementable
  – Standard for exchanging information packages between repositories
  – Technically neutral
  – Core metadata
13. Scope
• What PREMIS DD is not:
  – Out-of-the-box solution
  – All needed metadata
  – Lifecycle management of objects outside repository
  – Rights management
14. Technology Dependence
[Figure: files 0001.tiff, 0002.tiff, 0003.tiff, 0004.tiff, 0005.tiff, 0006.tiff, 000156.tiff]
• No direct access
• Not self-descriptive
• Complex formats
• Complex environments
digital…
15. Information packages
• Information about owner; what the package is and more
• The files, checksum, filename, use and more
• Technical information like Digital Preservation Metadata, what has happened to the files and more
  – need for detailed rendering information
    » Software
    » Hardware
    » Other dependencies: schemas, style sheets, encodings, etc.
  – need for format information
• Information about structure, how are the files related?
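The file-level part of such a package (filename, size, checksum) could be gathered along the following lines. This is a sketch under assumptions: the function names, the choice of SHA-256 and the dictionary layout are illustrative and not prescribed by PREMIS or by any packaging standard.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Compute a checksum by reading the file in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_files(directory, algorithm="sha256"):
    """List filename, size in bytes and checksum for each file."""
    entries = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            entries.append({
                "filename": name,
                "size": os.path.getsize(path),
                "algorithm": algorithm,
                "checksum": file_checksum(path, algorithm),
            })
    return entries


# Build a tiny throwaway package with two placeholder files.
with tempfile.TemporaryDirectory() as package_dir:
    for name, data in [("0001.tiff", b"first"), ("0002.tiff", b"second")]:
        with open(os.path.join(package_dir, name), "wb") as handle:
            handle.write(data)
    manifest = describe_files(package_dir)

for entry in manifest:
    print(entry["filename"], entry["size"], entry["checksum"])
```

Recomputing the checksum later and comparing it with the stored value is the basis of a fixity check.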
16. Standards for Information Packages
• One commonly used standard is METS (Metadata Encoding and Transmission Standard)
• PREMIS can be used together with METS
Sections of a <mets> document:
  <metsHdr>      METS header
  <dmdSec>       descriptive metadata section
  <amdSec>       administrative metadata section
  <fileSec>      file section
  <structMap>    structural map section
  <structLink>   structural link section
  <behaviorSec>  behavior section
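A skeleton of a METS document that carries PREMIS object metadata in its administrative section might be built as shown here. It is a sketch only: it assumes the PREMIS 2.x namespace, uses made-up ID values, leaves the PREMIS object empty, and has not been validated against either schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumes PREMIS 2.x
ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


mets = ET.Element(m("mets"))
ET.SubElement(mets, m("metsHdr"))

# Administrative metadata: PREMIS object information wrapped in techMD.
amd = ET.SubElement(mets, m("amdSec"), ID="amd1")
tech = ET.SubElement(amd, m("techMD"), ID="tech1")
wrap = ET.SubElement(tech, m("mdWrap"), MDTYPE="PREMIS:OBJECT")
xml_data = ET.SubElement(wrap, m("xmlData"))
# Placeholder: a real record needs identifier, characteristics, etc.
ET.SubElement(xml_data, f"{{{PREMIS_NS}}}object")

# File section: the file points back to its metadata through ADMID.
file_sec = ET.SubElement(mets, m("fileSec"))
group = ET.SubElement(file_sec, m("fileGrp"))
file_el = ET.SubElement(group, m("file"), ID="file1", ADMID="tech1")

# Structural map: how the files relate to each other.
struct = ET.SubElement(mets, m("structMap"))
div = ET.SubElement(struct, m("div"))
ET.SubElement(div, m("fptr"), FILEID="file1")

section_order = [child.tag.split("}")[1] for child in mets]
print(section_order)
```

The ID, ADMID and FILEID attributes are what tie the file, its preservation metadata and its place in the structure together.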
17. Technical metadata for audio and video
• A “new” need: objects are now created digitally and digitization has increased
• Has not developed as fast as other technical metadata schemes
• Complexities of file formats require expertise to develop and implement these
• Few standards available for metadata about audio and video
  – AES (will be briefly introduced)
  – audioMD and videoMD (will be briefly introduced)
  – Material Exchange Format (MXF)
  – Technical metadata in EBUCore, PBCore
  – In the US, the Federal Agencies Digitization Guidelines Initiative (FADGI)
  – MPEG-7 and MPEG-21 for video
• Programs creating audio and/or video often can export metadata.
Question: Is this exported information sufficient?
Answer: Needs to be evaluated at the archives and a decision taken!
18. AES
• Audio Engineering Society (http://www.aes.org/)
• AES-X098B superseded by:
  – AES57-2011-f (2011)
    AES standard for audio metadata - Audio object structures for preservation and restoration
  – AES60-2011-f (2011)
    AES standard for audio metadata - Core audio metadata
• Two XML schemas available
• According to earlier known information, 98C (video) was planned to be made after 98B had been established
• Some educationally oriented presentations can be found.
19. audioMD and videoMD (AMD and VMD)
• Hosted by Library of Congress (http://www.loc.gov/standards/amdvmd/index.html)
• Simple schemas developed over 10 years
• Current version published during spring 2011
• Information about one use case together with METS
• Mailing list exists, but rarely used
• Of interest to archives that want schemas that are not too complex for preservation purposes
20. Tools
• PREMIS in METS toolbox
• The controlled vocabularies database
• Some institutions are making repository software available that implements PREMIS
  – DAITSS Digital Preservation Repository Software
  – Archivematica
21. The controlled vocabularies database
• Library of Congress is establishing databases with controlled vocabulary values for standards that it maintains
• http://id.loc.gov
• Now also specific vocabularies for PREMIS semantic units: preservationLevelRole, cryptographicHashAlgorithm, eventType
• Additional PREMIS controlled lists to be made available with the PREMIS OWL ontology
22. PREMIS Web Ontology Language (OWL) ontology
• Initiated by the Archipel project to use PREMIS in Open Archives Initiative Object Reuse and Exchange (OAI-ORE) (description/exchange of Web resources)
• Resource Description Framework (RDF) serialization of preservation metadata as a data management function in a preservation repository
• Interoperate with other preservation Linked Data efforts such as UDFR (Unified Digital Formats Registry)
• Interoperate with PREMIS controlled vocabularies at http://id.loc.gov
23. PREMIS OWL ontology in a nutshell
• Purpose
  – Providing the community with an RDF serialization of the PREMIS data model and dictionary
  – While remaining as close as possible to the data dictionary’s clearly defined semantics
RDF modelling in 3 words:
• Everything modelled under the form of subject-verb-object
• But what subjects? what verbs? what objects?
  Role of vocabularies & ontologies
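The subject-verb-object form can be illustrated with plain Python tuples, without any RDF library. All URIs and property names here are hypothetical placeholders under http://example.org/ and are not terms from the PREMIS OWL ontology; choosing the real terms is exactly the role of vocabularies and ontologies.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    verb: str  # called "predicate" in RDF terminology
    obj: str


EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

triples: List[Triple] = [
    Triple(EX + "file/0001.tiff", EX + "vocab/hasSize", "2038927"),
    Triple(EX + "file/0001.tiff", EX + "vocab/hasEvent", EX + "event/1"),
    Triple(EX + "event/1", EX + "vocab/hasEventType", "ingestion"),
]


def objects_of(subject, verb, graph):
    """Return every object stated for a given subject and verb."""
    return [t.obj for t in graph if t.subject == subject and t.verb == verb]


# Follow the link from the file to its event, then read the event type.
event_uri = objects_of(EX + "file/0001.tiff", EX + "vocab/hasEvent", triples)[0]
event_types = objects_of(event_uri, EX + "vocab/hasEventType", triples)
print(event_types)
```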
24. Implementation issues: Conformance
• Conformant Implementation of the PREMIS Data Dictionary
  http://www.loc.gov/standards/premis/premis-conformance-oct2010.pdf
• What does "being conformant to PREMIS" mean?
• Conformant at which level?
  – semantic unit: conformant implementation of the information defined in a particular semantic unit
  – data dictionary: conformant implementation of all semantic units
• Conformant from what perspective?
  – internal: conformant implementation at semantic units and data dictionary levels
  – external (exchanging PREMIS descriptions):
    import = the repository can manage PREMIS conformant information
    export = the repository can provide others with PREMIS
25. Implementation issues: Technical
• Which semantic units to use besides the mandatory ones?
• Create own vocabularies?
• Where to store the metadata?
  – In an XML document?
  – In one or more databases?
• Which events to store?
• How to store agents, rights management?
• In short:
  A lot of decision making needs to be performed!
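One way to make the "which events to store" decision explicit is to keep it as configuration and let the repository record only the configured event types. This is a sketch under assumptions: the policy set, the function name and the default agent value are hypothetical, and the dictionary keys merely borrow the names of PREMIS event semantic units.

```python
import uuid
from datetime import datetime, timezone

# Local policy decision: only these event types are recorded.
EVENTS_TO_STORE = {"ingestion", "fixity check", "migration"}


def record_event(event_type, object_id, agent_id="repository-software"):
    """Return an event record, or None if policy says not to store it."""
    if event_type not in EVENTS_TO_STORE:
        return None
    return {
        "eventIdentifierType": "UUID",
        "eventIdentifierValue": str(uuid.uuid4()),
        "eventType": event_type,
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "linkingObjectIdentifierValue": object_id,
        "linkingAgentIdentifierValue": agent_id,
    }


stored = record_event("migration", "obj-0001")
skipped = record_event("virus check", "obj-0001")
print(stored["eventType"], skipped)
```

The same records could be written to an XML document or a database; that choice is independent of the policy itself.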
26. Conclusion
• PREMIS is widely implemented as the basis for digital preservation metadata
• IT and the archives need to work together; each brings a different kind of expertise
• Complexities of audio and video increase the need for technical and structural metadata
• Increasing use of digital preservation metadata for archiving audio and video is expected
• Examples of use of PREMIS together with audio and video metadata are needed
27. Thank you!
Karin Bredenberg, The National Archives of Sweden
karin.firstname.lastname@example.org
Presentation made with the help of:
Angela Dappert
Sébastien Peyrard
Rebecca Guenther