Day 2, workshop 4, Inge Van Nieuwerburgh


Published in: Education, Technology

  • 100,000 tapes from the 1970s and 1980s at the Toulouse Space Center
  • Selection!
  • See also the DANS Data Seal of Approval. An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.
  • Submission Information Package (SIP), Archival Information Package (AIP), Dissemination Information Package (DIP)

    1. 1. (meta)data standards for digital archiving DISH 2009 @ Rotterdam Universiteitsbibliotheek Gent – MMLab UGent
    2. 2. Summary <ul><ul><li>Introduction </li></ul></ul><ul><ul><li>Defining the problem </li></ul></ul><ul><ul><li>State of the art: </li></ul></ul><ul><ul><ul><li>OAIS </li></ul></ul></ul><ul><ul><ul><li>Data formats </li></ul></ul></ul><ul><ul><ul><li>Metadata schemas </li></ul></ul></ul><ul><ul><ul><li>Declarative containers </li></ul></ul></ul><ul><ul><li>Layered Metadata Model </li></ul></ul><ul><ul><li>Best practices </li></ul></ul>
    3. 3. Introduction
    4. 4. BOM Vl: Preservation and disclosure of multimedia data in Flanders <ul><ul><li>Flemish project – 1.5 years </li></ul></ul><ul><ul><li>Cross sectoral: broadcasters, archival institutions, cultural sector and the libraries. </li></ul></ul><ul><ul><li>Studies: </li></ul></ul><ul><ul><li>Needs for preservation </li></ul></ul><ul><ul><li>Selection </li></ul></ul><ul><ul><li>Metadata standards & exchange formats </li></ul></ul><ul><ul><li>Digital rights </li></ul></ul><ul><ul><li>Supply and distribution models </li></ul></ul>
    5. 5. Defining the problem
    6. 6. Problems when archiving digital information <ul><li>Problem 1. </li></ul><ul><ul><li>Analogue formats are disappearing and have to be replaced by digital alternatives. </li></ul></ul><ul><ul><li>Rapid growth of data. </li></ul></ul><ul><ul><li>Discrepancy between the short life span of digital technology and the need for long-term archiving. </li></ul></ul>
    7. 7. Problems when archiving digital information <ul><li>Problem 2. </li></ul><ul><ul><li>In digital form, information is abstract, independent of the storage medium. The abstract information has to be preserved, not the medium. </li></ul></ul>
    8. 8. Problems when archiving digital information <ul><li>But also consider… </li></ul>
    9. 9. Growth Storage capacity of desktop computers (HanKwang 2008)
    10. 10. Evolution of used file formats (PRONOM), timeline 1980 to 2000: 1984 TGA, GEM Raster; 1985 BMP; 1986 TIFF 3; 1987 TIFF 4, GIF87; 1988 TIFF 5; 1989 GIF89; 1992 TIFF 6, JPEG, MrSID; 1996 PNG 1.0; 1999 PNG 1.2; 2000 JPEG 2000; 2003 SVG
    11. 11. Evolution of format derivatives <ul><li>MIME type image/tiff: </li></ul><ul><li>TIFF (all versions) </li></ul><ul><li>TIFF/IT </li></ul><ul><li>TIFF G4/LZW/UNC </li></ul><ul><li>Digital Negative Format (DNG) </li></ul><ul><li>GeoTIFF </li></ul><ul><li>Pyramid TIFF </li></ul><ul><li>… </li></ul>Source: PRONOM Technical Registry
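The list above shows why a MIME type alone is too coarse for preservation: many distinct formats share image/tiff. The following is a minimal sketch, assuming only the published TIFF header layout (a byte-order mark, "II" or "MM", followed by the version number 42, or 43 for BigTIFF). The function name `tiff_signature` is hypothetical, and format registries such as PRONOM rely on much richer signatures than this.

```python
import struct


def tiff_signature(header: bytes) -> str:
    """Classify the first four bytes of a file as a TIFF variant.

    Illustrative only: it separates byte order and classic TIFF from
    BigTIFF; it cannot tell derivatives such as GeoTIFF or DNG apart.
    """
    if len(header) < 4:
        return "not TIFF"
    order = header[:2]
    if order == b"II":
        fmt, label = "<H", "little-endian"
    elif order == b"MM":
        fmt, label = ">H", "big-endian"
    else:
        return "not TIFF"
    # The two bytes after the byte-order mark hold the version number.
    (version,) = struct.unpack(fmt, header[2:4])
    if version == 42:
        return f"classic TIFF ({label})"
    if version == 43:
        return f"BigTIFF ({label})"
    return "not TIFF"


little = tiff_signature(b"II*\x00")
big = tiff_signature(b"MM\x00\x2b")
other = tiff_signature(b"\x89PNG")
```

Distinguishing the derivatives listed on the slide would require inspecting the tags inside the file, which is why technical metadata should record the precise format and version, not just the MIME type.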
    12. 12. Long-term risks [figure: timeline 1980, 1990, 2000] Bit errors/bugs; file format changes; changing technology; organizational changes; interpretation of the format
    13. 13. Study: state of the art (meta)data standards
    14. 14. What is a digital archive? <ul><li>What a digital archive is NOT: </li></ul><ul><ul><li>mass storage for active applications and data </li></ul></ul><ul><ul><li>a networked backup solution </li></ul></ul><ul><li>What a digital archive is: </li></ul><ul><ul><li>Long-term storage of digital information with historical, scientific, financial or legal value. </li></ul></ul><ul><ul><li>Platform-independent access to digital information for 50, 100 years or longer. </li></ul></ul>
    15. 15. OAIS
    16. 16. Open Archival Information System (OAIS) <ul><li>Reference model for the description of digital archives. </li></ul><ul><li>Developed by the CCSDS (Consultative Committee for Space Data Systems, founded in 1982), with member agencies including: </li></ul><ul><ul><li>NASA (US) </li></ul></ul><ul><ul><li>ESA (Europe) </li></ul></ul><ul><ul><li>RSA (USSR) </li></ul></ul><ul><ul><li>NASDA (Japan) </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>Issued as a CCSDS recommendation in 2002 and published as ISO standard 14721 (ISO 14721:2003) </li></ul>
    17. 17. OAIS model <ul><li>Consists of 3 parts: </li></ul><ul><ul><li>Description of an archival system: responsibilities, procedures and a common terminology. </li></ul></ul><ul><ul><li>Functional model: all processes needed for the long-term preservation of digital information. </li></ul></ul><ul><ul><li>Information model: describes the stored digital information. </li></ul></ul>
    18. 18. OAIS functional model
    19. 19. Standards <ul><li>Need to explore the necessary, recommended and generally used standards </li></ul><ul><ul><li>technical schemas </li></ul></ul><ul><ul><li>descriptive schemas </li></ul></ul><ul><ul><li>preservation schemas </li></ul></ul><ul><ul><li>structural schemas </li></ul></ul><ul><li>What are the different metadata schemas (if any) used in the different cultural sectors? </li></ul>
    20. 20. Data formats
    21. 21. What <ul><li>Raw data consumes ever more storage </li></ul><ul><li>Need to compress: compression standards </li></ul><ul><ul><li>video: MPEG-2, H.264/MPEG-4 AVC, Motion JPEG 2000 </li></ul></ul><ul><ul><li>audio: MP3, AAC </li></ul></ul><ul><ul><li>images: JPEG, TIFF </li></ul></ul><ul><li>Need for container formats for exchange of A/V material </li></ul><ul><ul><li>MXF, AVI, WMA, MP4 </li></ul></ul>
    22. 22. Metadata schemas
    23. 23. What <ul><li>Descriptive metadata </li></ul><ul><li>Administrative metadata </li></ul><ul><li>Preservation metadata </li></ul><ul><li>Technical metadata </li></ul><ul><li>Usage data </li></ul>
    24. 24. Standards <ul><li>Especially for descriptive metadata: differences in sectors </li></ul><ul><ul><li>=> Preferred standard per sector? </li></ul></ul><ul><ul><li>Differences in detail </li></ul></ul><ul><ul><li>Differences in structure </li></ul></ul><ul><ul><li>Differences in relations </li></ul></ul><ul><li>Preservation metadata: PREMIS </li></ul><ul><li>Conceptual models </li></ul>
    25. 25. Declarative containers
    26. 26. What <ul><li>Compound information objects, combining descriptive, administrative and/or structural metadata </li></ul><ul><li>Advantage: they are easy to exchange and reuse </li></ul><ul><li>Some examples: </li></ul><ul><ul><li>METS </li></ul></ul><ul><ul><li>MPEG-21 DIDL: describes complex digital objects </li></ul></ul><ul><ul><li>LOM: learning objects </li></ul></ul><ul><ul><li>ORE: model to describe aggregations </li></ul></ul>
    27. 27. Layered Metadata Model
    28. 28. How to proceed? <ul><li>Need for a layered metadata model to manage a digital archive </li></ul><ul><li>Why? Too many differences between data models </li></ul><ul><li>Need a common ground </li></ul>
    29. 29. Solution: layered metadata model <ul><li>Model in different layers: </li></ul><ul><ul><li>A generic top level descriptive metadata schema (DC) </li></ul></ul><ul><ul><li>A refined standard per sector for detail, to preserve the metadata in detail </li></ul></ul><ul><ul><li>+ Preservation metadata, technical metadata and rights metadata </li></ul></ul>
    30. 30. Layered metadata model [figure: source records and files (MARCXML, TIFF, PSD) with their own standards (MARC Standard, TIFF Standard), surrounded by the metadata layers] Descriptive metadata: Dublin Core. Preservation metadata: PREMIS. Rights metadata: PREMIS, MPEG-21/REL, INDECS, ODRL, XrML. Technical metadata: PREMIS, MPEG-7, Z39.87, AudioMD, VideoMD, TextMD.
    31. 31. Layered Metadata Model <ul><li>Descriptive Model: Dublin Core </li></ul><ul><li>Most interoperable, cross-sectoral. </li></ul><ul><li>Greatest common denominator of all metadata models. </li></ul><ul><li>All fields are repeatable and optional. </li></ul><ul><ul><ul><li>Mapping from one's own metadata model. </li></ul></ul></ul><ul><li>Dublin Core as pidgin: </li></ul><ul><ul><li>DC as common layer above one's own metadata. </li></ul></ul><ul><ul><li>DC as model for querying. </li></ul></ul><ul><ul><li>Discovery and identification of digital objects. </li></ul></ul>
    32. 32. Layered Metadata Model <ul><li>Descriptive Model: Dublin Core </li></ul><ul><li>How to disseminate as DC? </li></ul><ul><li>Crosswalk to DC is made for the most important metadata models used in the sectors: </li></ul><ul><li>Libraries: MARC21 </li></ul><ul><li>A/V Sector: P/Meta </li></ul><ul><li>Arts sector and museums: CDWA and SPECTRUM </li></ul><ul><li>Archiving sector: ISAD(G) and EAD </li></ul><ul><li>Crosswalks can be used to disseminate the DC records via OAI-PMH, GRDDL(XSLT), mapping API (D2RQ), or ontology linking. </li></ul>
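The crosswalk idea on this slide can be sketched as a simple lookup from sector-specific fields to Dublin Core elements. This is a minimal sketch under stated assumptions: the `MARC_TO_DC` table is an illustrative subset, records are simplified to a dictionary of MARC tag to values (no subfields or indicators), and the function name `crosswalk` and the sample record are hypothetical.

```python
from collections import defaultdict

# Illustrative subset of a MARC21 -> Dublin Core crosswalk (assumption:
# a production crosswalk also handles subfields, indicators and many
# more tags).
MARC_TO_DC = {
    "041": "language",
    "100": "creator",
    "245": "title",
    "520": "description",
    "650": "subject",
}


def crosswalk(marc_record: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map a simplified MARC record onto repeatable Dublin Core elements."""
    dc = defaultdict(list)
    for tag, values in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            # No DC equivalent: the detail stays in the sector-specific layer.
            continue
        dc[element].extend(values)
    return dict(dc)


# Hypothetical sample record, not taken from any real catalogue.
example_marc = {
    "245": ["Example title"],
    "100": ["Example, Author"],
    "650": ["Digital preservation", "Metadata"],
    "999": ["Local note without a DC equivalent"],
}
example_dc = crosswalk(example_marc)
```

The mapping is lossy by design: fields without a Dublin Core equivalent are dropped from the common layer, which is why the layered model keeps the original sector-specific record alongside it.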
    33. 33. Layered Metadata Model <ul><li>Preservation Model: PREMIS </li></ul><ul><li>Administrative metadata + Rights Metadata </li></ul><ul><li> assisting in the management of the digital objects. </li></ul><ul><li>Technical metadata </li></ul><ul><li> assisting the access (conversions or emulation). </li></ul><ul><li>Preservation Metadata </li></ul><ul><li> Tracking the provenance – history of all actions on an object. </li></ul>
    34. 34. Layered Metadata Model Preservation Model: PREMIS
    35. 35. Layered Metadata Model <ul><li>Preservation Model: PREMIS </li></ul><ul><li>Objects: Describes the objects to be preserved in a technical manner. </li></ul><ul><li>3 subclasses: </li></ul><ul><ul><ul><li>Bitstream </li></ul></ul></ul><ul><ul><ul><li>File </li></ul></ul></ul><ul><ul><ul><li>Representation </li></ul></ul></ul><ul><li>Facilitates the conversion or emulation process. </li></ul>
    36. 36. Layered Metadata Model <ul><li>Preservation Model: PREMIS </li></ul><ul><li>Objects: </li></ul><ul><ul><ul><li>Describes the objects to be preserved in a technical manner. </li></ul></ul></ul><ul><ul><ul><li>3 subclasses: </li></ul></ul></ul><ul><ul><ul><ul><li>Bitstream </li></ul></ul></ul></ul><ul><ul><ul><ul><li>File </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Representation </li></ul></ul></ul></ul><ul><ul><ul><li>Facilitates the conversion or emulation process. </li></ul></ul></ul>
    37. 37. Layered Metadata Model <ul><li>Preservation Model: PREMIS </li></ul><ul><li>Agents: </li></ul><ul><ul><ul><li>Aggregates information about agents (persons, organisations, software) associated with rights management and preservation events in the life of a data object . </li></ul></ul></ul><ul><ul><ul><li>No direct relation between Agent and Object: </li></ul></ul></ul><ul><ul><ul><ul><li>May hold or grant one or more rights </li></ul></ul></ul></ul><ul><ul><ul><ul><li>May carry out, authorize, or compel one or more events. </li></ul></ul></ul></ul><ul><ul><ul><li>Identify agents uniquely. </li></ul></ul></ul>
    38. 38. Layered Metadata Model <ul><li>Preservation Model: PREMIS </li></ul><ul><li>Events: </li></ul><ul><ul><ul><li>Actions that modify objects should always be recorded. Other actions such as copying an object for backup purposes may be recorded in an Event entity. </li></ul></ul></ul><ul><ul><ul><li>Stored separately from the digital object. </li></ul></ul></ul>
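The relationship between Objects, Agents and Events described on the preceding slides can be sketched with a few data classes. This is a simplified illustration, not the PREMIS data dictionary: all class names, field names and sample values are hypothetical. It shows the point that agents are linked to objects only through events, and that the stored events give the history of an object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    identifier: str
    name: str  # person, organisation or software


@dataclass(frozen=True)
class ArchivedObject:
    identifier: str
    category: str  # "bitstream", "file" or "representation"


@dataclass(frozen=True)
class Event:
    identifier: str
    event_type: str
    date_time: str  # ISO 8601, so string order equals time order
    object_ids: tuple[str, ...]
    agent_ids: tuple[str, ...]


def provenance(object_id: str, events: list[Event]) -> list[Event]:
    """Return the events that touched an object, oldest first."""
    related = [e for e in events if object_id in e.object_ids]
    return sorted(related, key=lambda e: e.date_time)


# Hypothetical sample data.
events = [
    Event("e2", "migration", "2009-06-01T10:00:00", ("obj-1",), ("agent-1",)),
    Event("e1", "ingestion", "2009-01-15T09:30:00", ("obj-1",), ("agent-1",)),
    Event("e3", "fixity check", "2009-03-01T00:00:00", ("obj-2",), ("agent-1",)),
]
history = [e.event_type for e in provenance("obj-1", events)]
```

Keeping events in their own list, separate from the objects, mirrors the slide's advice that events are stored separately from the digital object.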
    39. 39. Layered Metadata Model <ul><li>Preservation Model: PREMIS </li></ul><ul><li>Rights: </li></ul><ul><ul><ul><li>The minimum core rights information that a preservation repository must know is which rights or permissions it has to carry out actions on objects within the repository. </li></ul></ul></ul><ul><ul><ul><li>These may be granted by copyright law, by statute, or by a license agreement with the rightsholder. </li></ul></ul></ul>
    40. 40. Layered Metadata Model <ul><li>Preservation Model: PREMIS </li></ul><ul><li>Intellectual Entity: </li></ul><ul><ul><ul><li>Descriptive metadata: out of scope for PREMIS. </li></ul></ul></ul><ul><ul><ul><li>Dublin Core </li></ul></ul></ul>
    41. 41. Layered Metadata Model <ul><li>Preservation Model: PREMIS </li></ul><ul><li>PREMIS OWL: </li></ul><ul><ul><ul><li>Semantic (OWL) ontology following the data dictionary of PREMIS 2.0. </li></ul></ul></ul><ul><ul><ul><li>Published Online: </li></ul></ul></ul><ul><ul><ul><li> </li></ul></ul></ul><ul><ul><ul><li>Documentation Online: </li></ul></ul></ul><ul><ul><ul><li> </li></ul></ul></ul>
    42. 42. Best practices, or how to minimize risks
    43. 43. Best Practice #1: Store technical metadata Source: Adrian Brown, National Archives UK; “Developing Practical Approaches to Active Preservation”
    44. 44. Bitrot/Software errors <ul><li>No storage device is perfect and eternal. </li></ul><ul><li>David Rosenthal, Stanford University: “Bit Preservation: A Solved Problem?” </li></ul><ul><li>A bit half-life of 8 x 10^17 years is needed to give a 50% chance that 1 petabyte survives a century without errors. </li></ul><ul><li>Comparable studies by Carnegie Mellon University, Google and CERN </li></ul>
    45. 45. Bitrot/Software errors <ul><li>Volker Heydegger, University of Cologne </li></ul><ul><li>“Analysing the Impact of File Formats on Digital Integrity” </li></ul>
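The half-life figure on this slide can be checked with a short calculation. This is a sketch under stated assumptions: each bit decays independently with the same half-life, and 1 petabyte is taken as 10^15 bytes. The function names are hypothetical. If each bit survives `t` years with probability 0.5^(t/T), then all `N` bits survive with probability 0.5^(N·t/T), so a 50% target requires T = N·t.

```python
import math

BITS_PER_PETABYTE = 8 * 10**15  # assumption: 1 PB = 10**15 bytes


def survival_probability(n_bits: float, years: float, half_life_years: float) -> float:
    """Probability that none of n_bits flips within the given time,
    assuming independent bits with a common half-life."""
    return 0.5 ** (n_bits * years / half_life_years)


def required_half_life(n_bits: float, years: float, target_probability: float) -> float:
    """Bit half-life (in years) needed to reach the target survival probability."""
    return n_bits * years * math.log(0.5) / math.log(target_probability)


half_life = required_half_life(BITS_PER_PETABYTE, 100, 0.5)  # about 8e17 years
check = survival_probability(BITS_PER_PETABYTE, 100, 8e17)   # about 0.5
```

For comparison, 8 x 10^17 years is many orders of magnitude longer than the age of the universe, which is why bit preservation at this scale depends on replication and regular integrity checks, not on a single perfect medium.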
    46. 46. Best Practice #2: Preserve preservation metadata <ul><li>Checksums </li></ul><ul><li>Digital Signatures </li></ul><ul><li>Provenance </li></ul><ul><li>… </li></ul>
    47. 47. Interpretation risks: one of the coolest and oldest dwarf stars ever found.
    48. 48. Best Practice #3: Representation metadata <ul><li>Time </li></ul><ul><li>Place </li></ul><ul><li>Wavelengths/Calibration data </li></ul><ul><li>Provenance </li></ul>
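Checksums are the simplest of the preservation metadata listed above. The following is a minimal sketch of a fixity check, assuming SHA-256 as the digest (the slides do not name an algorithm); the function names `sha256_of` and `verify_fixity` and the sample file are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: str, recorded_checksum: str) -> bool:
    """Compare the current checksum with the one stored at ingest."""
    return sha256_of(path) == recorded_checksum


# Demonstration on a temporary file: record a checksum, then corrupt one byte.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as handle:
        handle.write(b"archived content")
    recorded = sha256_of(sample)
    intact = verify_fixity(sample, recorded)

    with open(sample, "wb") as handle:
        handle.write(b"archived contenT")
    damaged = verify_fixity(sample, recorded)
```

The recorded checksum only helps if it is stored and preserved with the same care as the object itself, which is the point of this best practice.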
    49. 49. Technology Changes [figure: "+" and "=" diagram with the labels Documentation, Information, Syntax, Semantics; sample hex dump: 4b50 0403 0014 0000 0008 0cdb 282e 7d22 ddaa 0243 0001 ab00 0002 000f 0000 6341 5f65 666f; sample assembly listing: INC $D020, DEC $D020, JMP $2000, LDX $D020, INX, STX $D020, JMP $2000, LDA $5000]
    50. 50. Best Practice #4: Do not trust software <ul><li>It is an illusion to think that software will always offer access to the archived data. </li></ul><ul><li>Computer software is an active component in the archive and it has only two possible states: </li></ul><ul><ul><li>It works and is maintained. </li></ul></ul><ul><ul><li>It does not work and is not maintained. </li></ul></ul>
    51. 51. Best Practice #4: Do not trust software (cont.) <ul><li>Case 1: Software works, is maintained: </li></ul><ul><ul><li>The archive has the software. </li></ul></ul><ul><ul><li>The user has the software. </li></ul></ul><ul><li>Case 2: Software does not work, is not maintained: </li></ul><ul><ul><li>Documentation metadata has to contain the source code of the original software. </li></ul></ul><ul><ul><li>Emulation has to be provided for; metadata has to contain all the emulation parameters. </li></ul></ul><ul><li>Both cases need a dynamic metadata layer with all the software aspects needed to access the data. </li></ul>
    52. 52. Descriptive metadata <ul><li>Are descriptive metadata (or other access tools like thumbnails, previews) data or metadata? </li></ul><ul><li>Non-discussion: ‘metadata’ is a relative term. </li></ul><ul><li>As data: </li></ul><ul><ul><li>Advantage: descriptive metadata are ‘core business’, too valuable not to be archived. </li></ul></ul><ul><ul><li>Disadvantage: this type of data is very dynamic. </li></ul></ul><ul><li>As metadata: </li></ul><ul><ul><li>Advantage: metadata are dynamic; can be adapted to the needs of the archive. </li></ul></ul><ul><ul><li>Disadvantage: which descriptive model has to be used: MARC, EAD, P/Meta, …? </li></ul></ul>
    53. 53. Best Practice #5: Store descriptive metadata as data. Provide a broadly accepted descriptive model like Dublin Core <ul><li>Dublin Core describes the ‘Who’, ‘What’, ‘Where’, ‘When’ and ‘How’. </li></ul><ul><li>Sector-specific descriptive metadata models have finer granularity. </li></ul><ul><li>Use international standards (MARC, EAD, P/Meta). </li></ul>
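Storing the Dublin Core layer as data means serializing it into a file that is archived next to the object. The following is a minimal sketch, assuming the `oai_dc` XML container used by OAI-PMH and Python's standard `xml.etree.ElementTree` module; the function name `to_oai_dc` and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def to_oai_dc(record: dict[str, list[str]]) -> str:
    """Serialize repeatable, optional Dublin Core elements as oai_dc XML."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        for value in values:
            # Every value becomes its own element, so fields can repeat.
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record.
xml_text = to_oai_dc({
    "title": ["Example title"],
    "creator": ["Example, Author"],
    "date": ["2009"],
})
parsed = ET.fromstring(xml_text)
```

Because every Dublin Core field is optional and repeatable, the serializer needs no schema-specific logic; sector-specific records (MARC, EAD, P/Meta) would be archived as separate files in their own formats.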
    54. 54. Want to know more? Book (in Dutch, full text available): “(Meta)datastandaarden voor digitale archieven”, Bastijns, Paul; Coppens, Sam; Corneillie, Siska; Hochstenbach, Patrick et al. (2009); Deliverable Layered metadata model:
    55. 55. Q & A