Digital Preservation Best Practices: Lessons Learned From Across the Pond
 


Digital Preservation Best Practices: Lessons Learned From Across the Pond. Slavko Manojlovich (Associate University Librarian (IT) / Manager, Digital Archives Initiative, Memorial University, St. John's, Canada) and Benoit Pauwels (Head, Library Automation Team, Université libre de Bruxelles, Belgium)

    Presentation Transcript

    • Digital Preservation Best Practices: Lessons Learned From Across the Pond
      Slavko Manojlovich
      Associate University Librarian (IT) / Manager, Digital Archives Initiative
      and
      Benoit Pauwels
      Head, Library Automation Team
      Université Libre de Bruxelles
      [with input from Michael J. Bennett, Digital Projects Librarian and Institutional Repository Coordinator, University of Connecticut]
    • Outline
      What is digital preservation?
      Best practices information resources
      Open Archival Information System (OAIS)
      Preservation Planning
      Digital Preservation in Action (Archivematica)
      Digital preservation @ ULB
      Our issues
    • What is digital preservation?
      Digital preservation is NOT digitization!!!!!!!!
    • What is digital preservation?
      Digital preservation is the series of actions and interventions required to ensure continued and reliable access to authentic digital objects for as long as they are deemed to be of value. This encompasses not just technical activities, but also all of the strategic and organisational considerations that relate to the survival and management of digital material. Source
    • What is digital preservation?
      Disaster recovery strategies and backup systems are not sufficient to ensure survival and access to authentic digital resources over time. Source
    • Digital preservation includes:
      Digitized analogue content (easy)
      Born-digital content (more difficult)
      What is digital preservation?
    • Recent example from Memorial University
      Preserve faculty member’s research outputs from 1977 – present stored in a variety of formats.
      “All of the above represents a vast resource which cannot be lost from the University”.
      What is digital preservation?
    • Best practices may not always be the best option for your organization:
      British Library Microsoft Live Book Data Project
      The DPT [Digital Preservation Team] have taken the view that since the budget for hard drive storage for this project has already been allocated, it would be impractical to recommend a change in the specifics as far as file format is concerned for this project...... JPEG 2000 files compressed to 70 dB PSNR for the preservation copy.
      Source
      Digital Preservation Best Practices
    • The National Gallery (UK) Preservation of Digital Photographs of the Collection
      The National Gallery has photographed their entire collection using a high-end digital MARC camera capable of capturing and rendering colour accuracy which is at least 5 times better than traditional photography. They have selected the proprietary raw camera output format for long-term preservation because it supports an advanced level of colour management. The company supporting the camera and associated software is very small and is not a market leader. Source: Site Visit to National Gallery Photography Department, April 2010.
      Digital Preservation Best Practices
    • Eighth European Conference on Digital Archiving
      Geneva, Switzerland / April 28-30, 2010
      Source
      Archiving 2010
      The Hague, Netherlands / June 1-4, 2010
      Note: Archiving 2011 – Salt Lake City (May 16-19, 2011)
      Source
      Best Practices Information Sources: Conferences
    • OR2010: The 5th International Conference on Open Repositories
      Madrid, Spain / July 6-9, 2010
      Note: OR2011 – Austin, Texas (June 7-11, 2011)
      Source
      iPRES2010: 7th International Conference on Preservation of Digital Objects
      Vienna, Austria / September 19-24, 2010
      Source
      Best Practices Information Sources: Conferences
    • Digital Preservation – The Planets Way
      London, UK / February 9, 2010
      Source
      Digital Futures London 2010: From digitization to delivery
      King’s Digital Consultancy Services (KDCS), King’s College, London, UK / April 19-23, 2010
      Source
      Best Practices Information Sources: Workshops
    • Digital Preservation Management: Implementing Short-term Solutions for Long-term Problems
      Cambridge, MA, USA / June 13-18, 2010
      Note: Albany, New York / June 5-10, 2011
      Source
      Short digital preservation workshops are typically offered in conjunction with most digital preservation conferences.
      Best Practices Information Sources: Workshops
    • Open Planets Foundation
      Source
      Digital Curation Centre
      Source
      Library of Congress National Digital Information Infrastructure and Preservation Program
      Source
      Best Practices Information Sources: Web Sites/Listservs/Blogs
    • JISC Digital Preservation and Records Management Programme
      Source
      PrestoPRIME: Keeping Audiovisual Contents Alive
      Source
      International Internet Preservation Consortium
      Source
      Best Practices Information Sources: Web Sites/Listservs/Blogs
    • Best Practices Information Sources: Web Sites/Listservs/Blogs
      Source
    • International Journal of Digital Curation
      Source
      ARIADNE
      Source
      D-Lib Magazine
      Source
      Best Practices Information Sources: Journals
    • International Journal of Digital Curation
      Source
    • Best Practices Information Sources: Education
      Source
    • Best Practices Information Sources: Employment
      Source
    • Open Archival Information System (OAIS)
      Developed by the Consultative Committee for Space Data Systems in 2002; it became an ISO standard in 2003 (ISO 14721:2003).
      148 pages of heavy reading
      “Those who will implement OAIS archives or administer them on a daily basis should read the entire document.” Source
    • Open Archival Information System
      OCLC claims OAIS compliance for their “Digital Archive”. Source
      Library and Archives Canada’s Trusted Digital Repository is based on OAIS. Source
      National Library of the Netherlands’ e-Depot is an exemplary world-class OAIS-based digital repository. Source
    • Open Archival Information System
      “GPO’s world-class preservation repository [Fdsys] went live in March 2009. The repository was built upon the Open Archival Information System (OAIS) model and provides sufficient control to ensure long-term preservation and access.”
      Source
    • Open Archival Information System
      “The use of this reference model as the basis of any archive implementation is recommended as it allows practitioners to use common language and potentially common tools to address common problems.”
      Tessella Technology & Consulting White Paper. Source
    • OAIS Reference Model
      Source
    • OAIS Reference Model - Actors
      Source
    • OAIS Reference Model - Objects
      Source
    • OAIS Reference Model - Actions
      Source
    • Monitor designated community (consumer needs and expectations)
      Monitor technology
      Develop preservation strategies and standards
      Develop packaging designs and migration plans
      Preservation Planning
      Source
    • Monitor Technology: Internet Archive Wayback Machine
      Wayback for www.unb.ca
    • Monitor Technology: Cross-Platform Access Video Format
      2005: wmv (Windows Media Video) format using Windows Media Player (or other players) for Windows and the Flip4Mac QuickTime extension for Macintosh.
      2005 – 2009: swf (Adobe Flash) format, with Adobe Flash plug-ins available for Windows and Macintosh browsers, becomes the flavour of the day for web delivery of video content.
    • Monitor Technology: Cross-Platform Access Video Format
      Fast forward to April 2010: mp4 (H.264) format with players/support for Windows, Macintosh and iPad.
      The iPad does not support wmv or swf video formats.
      Video conversion history: wmv → swf → mp4, from original DVD vobs.
      DVD vob files are being preserved with a goal of converting them to MXF Motion JPEG 2000 for long-term preservation.
    • Monitor Technology: Google Drops H.264 Support (Jan 11, 2011)
      Source
    • Monitor Technology: Microsoft Adds H.264 Support (Feb 2, 2011)
      Source
    • Plato: The PLANETS Preservation Planning Tool
      Source
    • Plato: The PLANETS Preservation Planning Tool
      Developed by the PLANETS Consortium
      Source
    • Plato: The PLANETS Preservation Planning Tool
      A preservation plan defines a series of preservation actions to be taken by a responsible institution due to an identified risk for a given set of digital objects or records (called collection).
      The preservation plan takes into account the preservation policies, legal obligations, organisational and technical constraints, user requirements and preservation goals and describes the preservation context, the evaluated preservation strategies and the resulting decision for one strategy, including the reasoning for the decision.
      It also specifies a series of steps or actions (called preservation action plan) along with responsibilities and rules and conditions for execution on the collection.
      Provided that the actions and their deployment as well as the technical environment allow it, this action plan is an executable workflow definition.
      Access to a library of preservation plans.
      Source
    • Plato: The PLANETS Preservation Planning Tool
      Source
    • Plato: TIFF to JPEG 2000 Case Study
      Source
      YouTube Video
    • Plato: TIFF to JPEG 2000 Case Study
      British Library’s 2 million newspaper pages in TIFF-5 uncompressed and high quality. File size is 40 MB/ page.
      The Plato experiment compares the image quality and size of TIFF-5 images converted to JPEG 2000 lossless.
      Experiment results: JPEG 2000 lossless image quality is as good as TIFF-5 uncompressed, and image file size is reduced by 25-30 percent. JPEG derivatives from TIFF-5 are as good as JPEG derivatives from JPEG 2000 lossless.
      Source
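The storage impact of those figures can be worked out directly. The sketch below is only arithmetic on the numbers quoted on the slide (2 million pages, 40 MB per page, a 25-30 percent reduction); it assumes decimal units (1 TB = 1,000,000 MB), and the function names are illustrative.

```python
PAGES = 2_000_000      # newspaper pages, from the slide
MB_PER_PAGE = 40       # uncompressed TIFF size per page, from the slide


def collection_size_tb(pages, mb_per_page):
    """Total collection size in terabytes (assumes 1 TB = 1,000,000 MB)."""
    return pages * mb_per_page / 1_000_000


def size_after_reduction(size_tb, reduction):
    """Size remaining after a fractional reduction (0.25 means 25 percent)."""
    return size_tb * (1 - reduction)


if __name__ == "__main__":
    tiff_tb = collection_size_tb(PAGES, MB_PER_PAGE)
    best = size_after_reduction(tiff_tb, 0.30)
    worst = size_after_reduction(tiff_tb, 0.25)
    print(f"TIFF masters: {tiff_tb:.0f} TB")
    print(f"JPEG 2000 lossless: about {best:.0f} to {worst:.0f} TB")
```

Under these assumptions the uncompressed masters come to about 80 TB, and lossless JPEG 2000 would bring that down to roughly 56 to 60 TB.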
    • Planets Time Capsule
      Source
    • E-Prints: Integration of Bit-Level and Logical Preservation (New)
      Source
    • E-Prints: Integration of Bit-Level and Logical Preservation (New)
      GIF files will be migrated to PNG with the ImageMagick utility
      Source
    • E-Prints: Integration of Bit-Level and Logical Preservation (New)
      Upload Plato preservation plan to E-Prints
      Prescribed preservation plan action applied to each set of identified “at risk” classified files
      E-Prints creates provenance metadata for all preservation actions (i.e. File was migrated from “file format A” to “file format B” on this date according to preservation plan NNN).
      Source
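A minimal sketch of a migration step that also records provenance, assuming ImageMagick's `convert` command is on the path (ImageMagick 7 installs it as `magick`). This is not the E-Prints implementation: the function names, the record fields and the plan identifier are hypothetical.

```python
import subprocess
from datetime import date
from pathlib import Path


def build_migration_command(src):
    """ImageMagick infers the output format from the target file extension."""
    src = Path(src)
    return ["convert", str(src), str(src.with_suffix(".png"))]


def provenance_record(src, dst, plan_id, when):
    """Hypothetical provenance entry: what was migrated, when, under which plan."""
    return {
        "event": "migration",
        "source_file": str(src),
        "source_format": "GIF",
        "target_file": str(dst),
        "target_format": "PNG",
        "date": when.isoformat(),
        "preservation_plan": plan_id,
    }


def migrate(src, plan_id, run=subprocess.run, today=None):
    """Run the conversion, then return the provenance record for it."""
    cmd = build_migration_command(src)
    run(cmd, check=True)  # raises CalledProcessError if the conversion fails
    return provenance_record(cmd[1], cmd[2], plan_id, today or date.today())
```

Passing `run` as a parameter keeps the command construction testable without ImageMagick installed; a call such as `migrate("logo.gif", "plan-001")` would perform the real conversion.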
    • Sample Media Type Preservation Plan
      Source
    • Trustworthy Repositories Audit & Certification (TRAC) Checklist
      Source
    • Trustworthy Repositories Audit & Certification (TRAC) Checklist
      Source
      The repository commits to continuing maintenance of digital objects for identified community/communities.
      Demonstrates organizational fitness (including financial, staffing structure, and processes) to fulfill its commitment.
    • Trustworthy Repositories Audit & Certification (TRAC) Checklist
      Source
      Acquires and maintains requisite contractual and legal rights and fulfills responsibilities.
      Has an effective and efficient policy framework.
      Acquires and ingests digital objects based upon stated criteria that correspond to its commitments and capabilities.
    • Trustworthy Repositories Audit & Certification (TRAC) Checklist
      Maintains/ensures the integrity, authenticity and usability of digital objects it holds over time.
      Creates and maintains requisite metadata about actions taken on digital objects during preservation as well as about the relevant production, access support, and usage process contexts before preservation.
      Source
    • Trustworthy Repositories Audit & Certification (TRAC) Checklist
      Fulfills requisite dissemination requirements.
      Has a strategic program for preservation planning and action.
      Has technical infrastructure adequate to continuing maintenance and security of its digital objects.
      Complete TRAC Document
      Source
    • Digital Curation Micro-Services
      “Micro-services are an approach to digital curation based on
      devolving curation function into a set of independent, but
      interoperable, services that embody curation values and strategies.
      Since each of the services is small and self-contained, they are
      collectively easier to develop, deploy, maintain, and enhance.
      Equally as important, they are more easily replaced when they have
      outlived their usefulness. Although the individual services are
      narrowly scoped, the complex function needed for effective
      curation emerges from the strategic combination of
      individual services.”
      Source
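The idea of small, independent, replaceable services can be sketched as plain functions chained in sequence. The three services below are illustrative stand-ins written for this example (their names borrow from common curation steps); the package is assumed to be a simple dictionary.

```python
import hashlib
import re
import uuid


def verify_checksum(package):
    """Fail loudly if the content does not match the checksum supplied with it."""
    actual = hashlib.md5(package["content"]).hexdigest()
    if actual != package["expected_md5"]:
        raise ValueError("checksum mismatch")
    return {**package, "checksum_verified": True}


def assign_identifier(package):
    """Attach a random UUID to the package."""
    return {**package, "identifier": str(uuid.uuid4())}


def clean_filename(package):
    """Replace characters outside a conservative safe set with underscores."""
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", package["name"])
    return {**package, "name": safe}


def run_pipeline(package, services):
    """Each service takes a package and returns a new one; order is the only coupling."""
    for service in services:
        package = service(package)
    return package
```

Because every service has the same shape (package in, package out), one can be swapped or dropped without touching the others, which is the property the quotation emphasises.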
    • Archivematica http://archivematica.org is an open source software toolkit that takes the OAIS model and turns its various conceptual entities into actionable functionalities.
      Take SIPs and turn them into AIPs and DIPs.
      In v. 0.7 alpha this is accomplished through a Unix pipeline design which makes use of various open-source utilities to perform designated actions.
      Digital Preservation in Action Archivematica (version 0.7 alpha)
    • Open source software developed by Artefactual Systems (Vancouver, Canada)
      Development partners include:
      UNESCO Memory of the World Programme
      International Monetary Fund
      Vancouver City Archives
      University of British Columbia
      University of Virginia (Rubymatica)
      Many alpha installations
      Digital Preservation in Action Archivematica (version 0.7 alpha)
    • Archivematica & OAIS: SIP > AIP > DIP
      Source
    • Archivematica & OAIS
      SIP>AIP>DIP
      Micro-services
      Open source tools employed
      Source
    • Archivematica & OAIS: Curation Micro-services
      Receive SIP
      verifyChecksum
      Review SIP
      extractPackage
      assignIdentifier
      parseManifest
      cleanFilename
      Source
    • Archivematica & OAIS: Curation Micro-services
      Quarantine SIP
      lockAccess
      virusCheck
      Appraise SIP
      identifyFormat
      validateFormat
      extractMetadata
      decidePreservationAction
      Source
    • Archivematica & OAIS: Curation Micro-services
      Prepare AIP
      gatherMetadata
      normalizeFiles
      createPackage
      Review AIP
      decideStorageAction
      Source
    • Archivematica & OAIS: Curation Micro-services
      Store AIP
      writePackage
      replicatePackage
      auditFixity
      readPackage
      updatePackage
      Provide DIP
      uploadPackage
      updateMetadata
      Source
    • Archivematica & OAIS: Curation Micro-services
      Monitor Preservation
      checkFormatRegistry
      updatePreservationPlanPolicies
      migrateFormat
      synchronizeAIPsandDIPs
      Source
    • Digital Curation Software Tools
      PRONOM File Format Registry
      PRONOM is a resource for anyone requiring impartial and definitive information about the 320+ file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value. It is maintained by The National Archives (UK).
      Source
      Source
    • Digital Curation Software Tools
      Pronom File Format Registry (Excel 2.1)
      Source
    • Digital Curation Software Tools
    • Digital Curation Software Tools
      FITS (Developed by Harvard University)
      The File Information Tool Set (FITS) identifies, validates, and extracts technical metadata for various file formats. It wraps several third-party open source tools, normalizes and consolidates their output, and reports any errors.
      Current tools are: JHOVE, ExifTool, National Library of New Zealand Metadata Extractor, DROID, FFIdent, File Utility, Fileinfo and XMLMetadata.
      Source
    • Digital Curation Software Tools
      FITS (Developed by Harvard University)
      File identification using DROID
      File validation using Jhove
      Metadata extraction using NZ Metadata Extractor
      Metadata normalization and consolidation using XMLMetadata
      Source
    • Digital Curation Software Tools
      FITS (Developed by Harvard University)
      Not every tool supports every digital file format, as illustrated in the latest FITS release notes:
      Improved support for audio formats
      Better identification of JP2 and JPx images
      Improved identification of EXIF and JFIF JPEGs
      Fixed DROID format output for SVG files
      Source
    • Digital Curation Software Tools
      FITS (DROID Tool – file identification)
      DROID (Digital Record Object Identification) uses internal and external signatures, maintained in the PRONOM technical registry, to identify and report the specific file format versions of digital files.
      Source
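A minimal sketch of the two kinds of signature, assuming a tiny hand-written table rather than the PRONOM signature file that DROID actually reads. Internal signatures are byte patterns inside the file (here, only at the start); external signatures are cues such as the file extension. The function and table names are illustrative.

```python
import os

# Internal signatures: leading "magic" bytes of a few well-known formats.
INTERNAL_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF 87a"),
    (b"GIF89a", "GIF 89a"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]

# External signatures: file extensions, a weaker hint used as a fallback.
EXTERNAL_SIGNATURES = {
    ".png": "PNG",
    ".gif": "GIF",
    ".pdf": "PDF",
    ".tif": "TIFF",
    ".tiff": "TIFF",
}


def identify(filename, head):
    """Return (format, evidence) from the first bytes of a file and its name."""
    for magic, name in INTERNAL_SIGNATURES:
        if head.startswith(magic):
            return name, "internal signature"
    extension = os.path.splitext(filename)[1].lower()
    if extension in EXTERNAL_SIGNATURES:
        return EXTERNAL_SIGNATURES[extension], "external signature"
    return None, "unidentified"
```

Note that the internal match can report a specific version (GIF 87a versus 89a), whereas the extension alone cannot; that is why byte-level signatures are preferred when both are available.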
    • Digital Curation Software Tools
      FITS (JHOVE Tool – file identification, validation and characterization)
      File identification as per DROID
      File validation
      A file is well-formed if it meets the purely syntactic requirements for a format. For example, a TIFF object is well-formed if it starts with an 8-byte header followed by a sequence of Image File Directories (IFDs), each composed of a 2-byte entry count and a series of 12-byte tagged entries.
      Source
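The syntactic check described above can be sketched in a few lines. This is not JHOVE's code; it assumes the classic TIFF 6.0 layout, in which each IFD entry occupies 12 bytes and each IFD ends with a 4-byte offset to the next IFD (0 for the last one). It checks structure only and does not inspect the tags themselves.

```python
import struct

ENTRY_SIZE = 12  # tag (2) + type (2) + count (4) + value/offset (4), per TIFF 6.0


def tiff_is_well_formed(data):
    """Walk the header and the IFD chain; return False on any structural problem."""
    if len(data) < 8:
        return False
    if data[:2] == b"II":
        endian = "<"   # little-endian byte order
    elif data[:2] == b"MM":
        endian = ">"   # big-endian byte order
    else:
        return False
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42 or offset == 0:
        return False
    seen = set()
    while offset != 0:
        # Reject loops in the IFD chain and offsets outside the file body.
        if offset in seen or offset < 8 or offset + 2 > len(data):
            return False
        seen.add(offset)
        (count,) = struct.unpack(endian + "H", data[offset:offset + 2])
        end_of_entries = offset + 2 + ENTRY_SIZE * count
        if count == 0 or end_of_entries + 4 > len(data):
            return False
        (offset,) = struct.unpack(endian + "I", data[end_of_entries:end_of_entries + 4])
    return True
```

A file that passes this check is only well-formed; whether it is also valid depends on semantic rules about the tag values.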
    • Digital Curation Software Tools
      FITS (JHOVE Tool – file identification, validation and characterization)
      File validation (continued)
      A well-formed file is also valid if it meets additional semantic level requirements. For example, an RGB file must have at least three sample values per pixel.
      Source
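The RGB rule can be expressed as a check on already-parsed tag values. The sketch assumes the tags have been read into a dictionary keyed by TIFF tag number; the tag numbers and the default of one sample per pixel follow the TIFF 6.0 specification, while the function name is illustrative.

```python
PHOTOMETRIC_INTERPRETATION = 262  # TIFF tag number
SAMPLES_PER_PIXEL = 277           # TIFF tag number; defaults to 1 when absent
RGB = 2                           # PhotometricInterpretation value for RGB


def rgb_rule_satisfied(tags):
    """True unless the image claims to be RGB with fewer than three samples per pixel."""
    if tags.get(PHOTOMETRIC_INTERPRETATION) != RGB:
        return True  # the rule only constrains RGB images
    return tags.get(SAMPLES_PER_PIXEL, 1) >= 3
```

A real validator applies many such rules; this one shows why validity is a separate question from well-formedness, since the tags can be laid out correctly and still contradict each other.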
    • Digital Curation Software Tools
      FITS (JHOVE Tool – file identification, validation and characterization)
      File characterization
      The process of determining the format-specific significant properties of an object of a given format.
      JHOVE can report the file pathname or URI, last modification date, byte size, format, format version, MIME type, format profiles and, optionally, a checksum.
      Source
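The general properties listed above can be gathered with the Python standard library alone. This is a rough analogue for illustration, not JHOVE: it guesses the MIME type from the file name and does not determine format version or profiles. The field names are assumptions.

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def characterize(path, with_checksum=False):
    """Collect basic, format-independent properties of a file."""
    info = os.stat(path)
    report = {
        "path": os.path.abspath(path),
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "size": info.st_size,
        "mime_type": mimetypes.guess_type(path)[0],  # guessed from the name only
    }
    if with_checksum:
        with open(path, "rb") as handle:
            report["md5"] = hashlib.md5(handle.read()).hexdigest()
    return report
```

The checksum is optional here for the same practical reason it is optional in most tools: it requires reading the whole file, which is slow for large collections.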
    • Digital Curation Software Tools
      FITS (JHOVE Tool – sample output)
      Source
    • Digital Curation Software Tools
      FITS (JHOVE Tool – supported file formats)
      Source
      AIFF
      ASCII
      BYTESTREAM
      GIF
      HTML
      JPEG
      JPEG 2000
      PDF
      TIFF
      UTF-8
      WAVE
      XML
    • Digital Curation Software Tools
      FITS (New Zealand Metadata Extraction Tool)
      Automatically extracts preservation-related metadata from digital files.
      Supported file formats:
      Images: BMP, GIF, JPEG and TIFF.
      Office documents: MS Word (version 2, 6), Word Perfect, Open Office (version 1), MS Works, MS Excel,
      MS PowerPoint, and PDF.
      Audio and Video: WAV and MP3.
      Markup languages: HTML and XML.
      Source
    • Digital Curation Software Tools
      FITS (New Zealand Metadata Extraction Tool)
      Potential metadata elements which can be extracted from an audio file header include:
      Resolution
      Duration
      Bitrate
      Compression
      Encapsulation
      Channels
      Source
    • Digital Curation Software Tools
      BagIt
      A specification for the packaging of digital content for transfer. Content is packaged (the bag) along with a small amount of machine-readable text (the tag) to help automate the content's receipt, storage and retrieval. There is no software to install. A bag consists of a base directory containing the tag and a subdirectory that holds the content files. The tag is a simple text-file manifest, like a packing slip, that consists of two elements:
      An inventory of the content files in the bag
      A checksum for each file
      Source
      Source
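Since a bag is just a directory layout plus text files, one can be assembled with a few lines of code. The sketch below writes a `data/` subdirectory, an MD5 manifest and a `bagit.txt` declaration; the version string 0.96 is the one used in this presentation's example, and the function name and in-memory `files` mapping are assumptions made for brevity.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir, files):
    """Create a minimal bag. `files` maps a file name to its content as bytes."""
    bag = Path(bag_dir)
    data_dir = bag / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(files.items()):
        (data_dir / name).write_bytes(content)
        checksum = hashlib.md5(content).hexdigest()
        manifest_lines.append(f"{checksum}  data/{name}")

    # The bag declaration: specification version and tag-file encoding.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # The manifest: one "checksum path" line per payload file.
    (bag / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag
```

For example, `make_bag("my-bag", {"scan.tif": tiff_bytes})` would produce a directory that a receiving system can check file by file against the manifest.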
    • Digital Curation Software Tools
      BagIt: bag directory contents
      /6‐1999‐06‐07
      bagit.txt
      bag‐info.txt
      manifest‐md5.txt
      /data
      6‐1999‐06‐07.tif
      6‐1999‐06‐07_general_metadata.xml
      6‐1999‐06‐07_technical_metadata.xml
      Source
      Source
    • Digital Curation Software Tools
      BagIt: bagit.txt
      BagIt‐Version: 0.96
      Tag‐File‐Character‐Encoding: UTF‐8
      Source
    • Digital Curation Software Tools
      BagIt: bag‐info.txt
      Source‐organization: Simon Fraser University Library
      Organization‐URL: http://www.lib.sfu.ca
      Bagging‐Date: 2009‐06‐26
      External‐Description: TIFF master files and associated metadata for item 6‐1999‐06‐07 in the SFU Editorial Cartoons Collection.
      Source
    • Digital Curation Software Tools
      BagIt: manifest‐md5.txt
      91a6ce58ad2628b81c46c034d434816f data/6‐1999‐06‐07.tif
      8c2712026f0f54c4ad156674e87f573b data/6‐1999‐06‐07_general_metadata.xml
      28fa197bbfd61e4da0f6119ed7420bff data/6‐1999‐06‐07_technical_metadata.xml
      Source
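The manifest is what lets a recipient confirm that nothing was lost or altered in transfer. A minimal verification sketch, assuming an MD5 manifest named `manifest-md5.txt` with one "checksum path" pair per line; the function name is illustrative.

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir):
    """Return the manifest paths that are missing or whose checksum differs."""
    bag = Path(bag_dir)
    failures = []
    manifest = (bag / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        # Checksum and path are separated by whitespace; the path may contain spaces.
        expected, relative_path = line.split(None, 1)
        relative_path = relative_path.strip()
        target = bag / relative_path
        if not target.is_file():
            failures.append(relative_path)
            continue
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        if actual != expected:
            failures.append(relative_path)
    return failures
```

An empty list means every file listed in the manifest is present and unchanged. This sketch does not detect extra files in `data/` that the manifest omits.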
    • Digital Curation Software Tools
      BagIt: 1999‐06‐07.tif
      Ingrid Rice, June 7, 1999
      Source
    • Digital Curation Software Tools
      BagIt: General metadata file
      Source
    • Digital Curation Software Tools
      BagIt: Technical metadata file
      /6‐1999‐06‐07
      bagit.txt
      bag‐info.txt
      manifest‐md5.txt
      /data
      6‐1999‐06‐07.tif
      6‐1999‐06‐07_general_metadata.xml
      6‐1999‐06‐07_technical_metadata.xml
      Source
    • DSpace 1.7 (New Features)
      AIP Backup and Restore
      Outputs metadata and bitstreams into zipped self-contained Archival Information Packages which can be loaded into another instance of DSpace or another institutional repository platform (Fedora, CONTENTdm, etc.)
      DSpace AIPs can function as SIPs or DIPs.
      Possible to load Archivematica AIPs into DSpace.
      Source
    • DSpace 1.7 (New Features)
      Curation System
      Infrastructure to support the implementation of digital curation micro-services for the long-term preservation of your DSpace content.
      Initial Services include:
      Bitstream format profiler: examines all the bitstreams and generates a count and support level for each type of bitstream format. Useful tool for format migration. Note: this is not identifying and validating bitstreams.
      Required metadata: checks to see if required metadata is present in all records.
      Virus scan: Virus check using ClamAV tool.
      Source
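The idea behind a format profiler is simply a tally of formats across a collection. The sketch below is not the DSpace task (which is part of DSpace's Java code base and consults its bitstream format registry); it is a rough stand-in that groups file names by extension, which, like the DSpace profiler, identifies nothing and validates nothing.

```python
import os
from collections import Counter


def profile_formats(filenames):
    """Count files per lower-cased extension; files without one are grouped as '(none)'."""
    return Counter(
        os.path.splitext(name)[1].lower().lstrip(".") or "(none)"
        for name in filenames
    )
```

A tally like this is useful for migration planning because it shows at a glance which formats dominate a collection and which occur only a handful of times.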
    • Archivematica 0.7 Alpha Demo
      Objectives
      Show complete process of ingest/archival/dissemination chain for one SIP
      Our demo SIP contains object files of various image formats: TIFF, BMP, SVG, PNG, JP2, EPS, GIF, JPG, TGA
      Check the contents of the Archivematica SIP throughout the process, as it is transformed into a self-contained AIP and DIP
    • Archivematica 0.7 Alpha Demo
      Normalization paths used in this demo
      (*) PNG and JPEG2000 are not normalized to a preservation format
    • Archivematica 0.7 Alpha Demo
      Archivematica Release 0.7 Alpha
      YouTube Video 1 and 2, along with
      step by step instructions.
    • Archivematica 0.7 Alpha Demo
      Boot your PC with the bootable Archivematica DVD.
      Login as: demo Password: demo
      You see the File Manager
      Shortcuts
      Directories used through the archiving process
      Imagine you’re an archivist and you have a set of object files sitting in demo/testFiles
      structured into a number of directories
      each directory corresponds to a logical unit of resources, be it a distinctive item or a complete fonds
      each directory in testFiles = one SIP
      You could also drag/drop or copy/paste from a USB stick
    • Archivematica 0.7 Alpha Demo
      Launch dashboard and resize so that it can be viewed as you navigate through the Archivematica processes.
      Firefox: uncheck File/Work Offline
      Web-based administration for the archivist
      Tracks various stages of the archival process
      (In this demo setup of Archivematica) manual approval is required from the archivist at various stages in the process:
      we’ll have a look at contents of SIP, AIP and DIP at each of these stages
    • Archivematica 0.7 Alpha Demo
      Archivematica SIP
      Folder structure, containing metadata, checksums, object files
      logs
      logs/fileMeta
      metadata: checksum and descriptive metadata
      objects: digital objects to be preserved
      Content changes as SIP is moved through the different stages of the archiving process
      Demo SIP = ImagesSIP directory
    • Archivematica 0.7 Alpha Demo
      Start the archival process
      Drag and drop the ImagesSIP directory into the receiveSIP watched directory
      Rename the SIP
      The SIP appears in the dashboard
    • Archivematica 0.7 Alpha Demo
      First approval: appraise SIP for submission
      click on Micro-Services to look at actions performed by Archivematica so far
      SIP backup, SIP compliant, assign UUIDs (package and object files), check delivered checksums (if any delivered)
      click on Browse to see contents of SIP at this stage
      logs/fileUUIDs.log
      logs/fileMeta/*.xml for each object file: PREMIS-formatted metadata
      file name, uuid, sha256 hash
      events that occurred on the object file
    • Archivematica 0.7 Alpha Demo
      First approval: appraise SIP for submission
      submitted SIP should be in accordance with institution’s submission agreements
      delete any unwanted files or directories
      File Manager/appraiseSIPForSubmission
      add descriptive metadata about the SIP in metadata/dublincore.xml
      click on Approve
    • Archivematica 0.7 Alpha Demo
      SIP quarantined
      SIP is placed in quarantine for virus checking
      Why quarantine?
      Give ClamAV a chance to pick up the latest version of its virus database
      How long?
      demo: preset to one minute
      National Archives of Australia: 1 month
      archivist can manually remove SIP from quarantine
    • Archivematica 0.7 Alpha Demo
      Second approval: appraise SIP for preservation
      zipped/tarred/… files are extracted
      check directory and file names
      scan for viruses
      using FITS:
      identify and validate format of object files
      extract technical metadata – PREMIS
    • Archivematica 0.7 Alpha Demo
      Second approval: appraise SIP for preservation
      logs/clamAVScan.txt: report on virus checking
      logs/extraction.log: report on extracted zip
      logs/fileMeta/*.xml: augmented PREMIS-formatted metadata
      format designation (PRONOM PUID identifier)
      events
      technical metadata
    • Archivematica 0.7 Alpha Demo
      Second approval: appraise SIP for preservation
      technical metadata: object characteristics
      <fits_output> XML formatted metadata
      <fits/identification>
      <fits/fileinfo>
      <fits/filestatus>: well-formed / valid
      <fits/metadata>: technical metadata of object
      <fits/toolOutput>: output results of used tools Jhove, File Utility, Exiftool, Droid, NLNZ Metadata Extractor, ffident File Information, XML Metadata
    • Archivematica 0.7 Alpha Demo
      Second approval: appraise SIP for preservation
      delete any unwanted files or directories from the SIP
      FileManager/appraiseSIPForPreservation
      click on Approve
      Archivematica now creates an AIP and a DIP for this SIP
      normalization based on format identified
    • Archivematica 0.7 Alpha Demo
      Third approval: push AIP to archival storage
      storeAIP contains one zip file for the AIP
      containing a bag (according to the BagIt specification)
      Click on Browse next to Store AIP micro-service
      Look in the bag
    • Archivematica 0.7 Alpha Demo
      Archivematica AIP
      data/
      logs/normalizationLog.txt
      metadata: the dublincore.xml
      checksum.sha256 for the AIP
      objects: all original formats + preservation formats
      METS.xml: METS XML container with structural, descriptive, administrative metadata of AIP
    • Archivematica 0.7 Alpha Demo
      Source
    • Archivematica 0.7 Alpha Demo
      Archivematica AIP / METS.xml
      <structMap>: structure of the AIP
      <fileSec>: list of files included in the AIP
      <dmdSec>: descriptive metadata for the AIP (the dublincore.xml)
      <amdSec>: administrative metadata
      <digiprovMD>: PREMIS-formatted digital provenance metadata
      most of it is taken from the logs/fileMeta files
      object identification and characteristics
      events
      agents
      relation between original and preservation copies
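      The METS sections named above can be counted with the standard library. This is a sketch: the sample METS document is a hand-written skeleton, not actual Archivematica output, and only the METS namespace and element names are taken from the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical skeleton of an AIP METS file, reduced to its section structure.
SAMPLE_METS = f"""<mets xmlns="{METS_NS}">
  <dmdSec ID="dmd1"/>
  <amdSec ID="amd1"><digiprovMD ID="dp1"/><digiprovMD ID="dp2"/></amdSec>
  <fileSec>
    <fileGrp USE="original"><file ID="f1"/></fileGrp>
    <fileGrp USE="preservation"><file ID="f2"/></fileGrp>
  </fileSec>
  <structMap><div TYPE="directory" LABEL="objects"/></structMap>
</mets>"""


def outline_mets(xml_text):
    """Count the main METS sections so the shape of an AIP can be checked quickly."""
    root = ET.fromstring(xml_text)

    def count(tag):
        return len(list(root.iter(f"{{{METS_NS}}}{tag}")))

    sections = ("dmdSec", "amdSec", "digiprovMD", "file", "structMap")
    return {tag: count(tag) for tag in sections}


print(outline_mets(SAMPLE_METS))
```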
    • Archivematica 0.7 Alpha Demo
      Third approval: push AIP to archival storage
      If desired, check the contents of the AIP; note that you cannot make any changes to an AIP
      click on Approve
      AIP is pushed into archival storage
      our demo setup: the AIPsStore directory
      real life: cloud storage, Amazon S3, your own network storage device, CLOCKSS, …
    • Archivematica 0.7 Alpha Demo
      Fourth approval: upload DIP to public access system
      directory created for this DIP under uploadDIP
      objects: normalized access copies of the object files
      objectsBackup: idem
      METS.xml: identical to the one in the AIP
      If wanted, check and change contents of the DIP
      File Manager / uploadDIP
      click on Approve
      removed from SIPbackups
      copied to DIPbackups
      our demo setup: DIP is pushed to an ICA-AtoM public access system
    • Archivematica 0.7 Alpha Demo
      ICA-AtoM public access system
      Fully web-based archival description application based on International Council on Archives standards
      AtoM = Access to Memory
      Point Firefox to http://localhost/ica-atom
      Uploaded DIPs are in draft status by default. Change the status to ‘published’ for them to become visible in public access
      Log in: demo@example.com / demo
      Choose from archival descriptions
      Edit: change publication status to ‘published’
      Log out
      Selected archive is now publicly visible
    • Digital preservation @ ULB
      Context: multiple digital archives
      DI-pot
      All academic output (except PhD theses)
      Most digital born / some digitized by library staff
      Self-submission by academic staff
      Extensively modified DSpace 1.4.2
      Metadata granularity
      Semi-automated metadata ingest from PubMed, Scopus, Web of Science, BibTex and RIS files
      Integrated with central administration databases (staff, departments, controlled vocabulary, ...)
      55K descriptions
      8K full-text [ PDF ]
    • Digital preservation @ ULB
      Context: multiple digital archives
      Bictel
      PhD theses (since 2004)
      Most digital born / some digitized by library staff
      Self-submission, with some support from faculty staff
      ETD software from Virginia Tech
      Metadata per object file: access restrictions, deposit dates, mime type, location
      1300 descriptions
      Typically multiple object files per thesis [ PDF ]
    • Digital preservation @ ULB
      Context: multiple digital archives
      Iconothèque
      Audiovisual material as support for courses
      Most digital born / some digitized by faculty staff
      Self-submission by faculty staff
      ContentDM 5.4
      12K descriptions
      [ JPEG ]
    • Digital preservation @ ULB
      Context: multiple digital archives
      Digithèque
      Out of print / public domain books and journals
      Digitized by library staff
      Submission by library staff
      Symphony + file system (available over SMB, HTTP)
      100K pages / 344 publications
      [ TIFF + PDF ]
    • Digital preservation @ ULB
      Context: multiple digital archives
      Near future: archives of ULB
      (our ISAD(G)-enabled) DSpace
    • Digital preservation @ ULB
      All our digital archives:
      Talk OAI-PMH
      Expose identical exchange format
      Based on MPEG21-DIDL
      Compound object of item and associated object files
      “Globally unique persistent identifier” (GUPI) for item and each object file
      Descriptive metadata for item expressed in MODS
      Metadata for object files: descriptive, version, access restrictions, deposit/embargo dates, mime type, location
    • Digital preservation @ ULB
      DIDL[1]
        Item[1]
          Descriptor/Identifier (persistent identifier)
          Descriptor/modified
          Item[1..∞] (of type descriptiveMetadata)
            Descriptor/type (« descriptiveMetadata »)
            Descriptor/Identifier (persistent identifier)
            Descriptor/modified
            Component/Resource -- representation by value (XML)
          Item[0..∞] (of type objectFile)
            Descriptor/type (« objectFile »)
            Descriptor/Identifier (persistent identifier)
            Descriptor/modified
            Component/Resource -- representation by ref. (URL)
          Item[0..1] (of type humanStartPage)
            Descriptor/type (« humanStartPage »)
            Component/Resource -- representation by ref. (URL)
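      The cardinalities of the exchange format (one or more descriptiveMetadata items, any number of objectFile items, at most one humanStartPage) can be checked with a small model. This is a sketch only: the class and field names are hypothetical, and the model is a plain Python stand-in rather than real MPEG21-DIDL XML.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ALLOWED_TYPES = {"descriptiveMetadata", "objectFile", "humanStartPage"}


@dataclass
class SubItem:
    """One nested item; names are illustrative, not taken from the DIDL schema."""
    item_type: str
    identifier: Optional[str] = None   # persistent identifier, where applicable
    modified: Optional[str] = None
    resource: str = ""                 # inline XML (by value) or a URL (by reference)


@dataclass
class DidlRecord:
    identifier: str
    modified: str
    items: List[SubItem] = field(default_factory=list)

    def problems(self):
        """Return a list of cardinality violations; empty means the record is acceptable."""
        counts = {}
        for item in self.items:
            counts[item.item_type] = counts.get(item.item_type, 0) + 1
        found = []
        if counts.get("descriptiveMetadata", 0) < 1:
            found.append("at least one descriptiveMetadata item is required")
        if counts.get("humanStartPage", 0) > 1:
            found.append("at most one humanStartPage item is allowed")
        for name in sorted(set(counts) - ALLOWED_TYPES):
            found.append(f"unknown item type: {name}")
        return found


record = DidlRecord("gupi:1", "2011-03-01", [
    SubItem("descriptiveMetadata", "gupi:1-md", "2011-03-01", "<mods/>"),
    SubItem("objectFile", "gupi:1-f1", "2011-03-01", "http://localhost/files/1.pdf"),
])
print(record.problems())
```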
    • Digital preservation @ ULB
      One dissemination platform
      SAMBURU: harvest and index
      DIDL records are harvested from the digital archives
      DIDL record is stored as-is in MySQL database
      DIDL record is transformed into SOLR document and stored in Lucene indexes
      DI-fusion: web portal
      Based on VuFind
      Search/retrieve records through SOLR
      Use XSLT to transform DIDL into HTML
      Additional 2.0 functionality with AJAX technology
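      The harvest step can be sketched as parsing an OAI-PMH ListRecords response into documents ready for storage and indexing. This is not SAMBURU's code: the sample response and the field names of the resulting documents are hypothetical, and only the OAI-PMH element names and namespace come from the protocol.

```python
import xml.etree.ElementTree as ET

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response: one live record, one deleted record, one resumption token.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier><datestamp>2011-03-01</datestamp></header>
      <metadata><DIDL xmlns="urn:mpeg:mpeg21:2002:02-DIDL-NS"/></metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier><datestamp>2011-03-02</datestamp></header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (documents, resumption_token) for one page of a harvest."""
    root = ET.fromstring(xml_text)
    docs = []
    for record in root.iterfind("oai:ListRecords/oai:record", OAI):
        header = record.find("oai:header", OAI)
        if header.get("status") == "deleted":
            continue  # deleted records carry no metadata
        metadata = record.find("oai:metadata", OAI)
        docs.append({
            "id": header.findtext("oai:identifier", namespaces=OAI),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI),
            # The record is kept as-is, mirroring the "stored as-is" step above.
            "raw_record": ET.tostring(metadata[0], encoding="unicode"),
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=OAI)
    return docs, (token or None)


documents, next_token = parse_list_records(SAMPLE_RESPONSE)
print(len(documents), next_token)
```

      A harvester would repeat the request with the returned token until no token comes back.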
    • Digital preservation @ ULB
      [Architecture diagram. Components shown: DI-pot, BicTel, Icono, Digi, UMons; OAI-PMH links; Samburu (Harvester, MySQL Metadata Store, Indexer, SOLR, Lucene indexes); DI-fusion web portal; Metadata Enrichment]
    • Digital preservation @ ULB
      Enrichment process
      Fetch DIDL records from the SAMBURU metadata store, plus object files (depending on the enrichment type)
      Calculate enrichment and create DIDL-formatted enrichment record
      Make enrichment record available over OAI-PMH
      SAMBURU harvests and merges original DIDL record with enrichment DIDL record, before re-indexing into Lucene
      End user sees enrichment through DI-fusion
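      The merge step can be illustrated with plain dictionaries standing in for DIDL records. This is a sketch under assumptions: the dictionary layout, the "source" marker and the sample JEL code are hypothetical, and the real merge operates on DIDL XML.

```python
def merge_enrichment(original, enrichment):
    """Return a new record: the original items followed by the enrichment items."""
    if original["identifier"] != enrichment["identifier"]:
        raise ValueError("enrichment belongs to a different record")
    merged = dict(original)
    merged["items"] = list(original["items"]) + [
        dict(item, source="enrichment") for item in enrichment["items"]
    ]
    return merged


# Hypothetical records; identifiers and values are made up for illustration.
original = {
    "identifier": "gupi:42",
    "items": [{"type": "descriptiveMetadata", "value": "<mods/>"}],
}
enrichment = {
    "identifier": "gupi:42",
    "items": [{"type": "descriptiveMetadata", "value": "JEL:C61"}],
}

merged = merge_enrichment(original, enrichment)
print(len(merged["items"]))
```

      Marking enrichment items with their origin keeps the original record recoverable, so an enrichment can be recomputed or withdrawn without touching the source archive.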
    • Digital preservation @ ULB
      Enrichment: 3 prototype setups
      Enrichment service at Erasmus University in Rotterdam fetches publications in economics from the metadata store, and determines JEL classification codes based on text analysis
      Enrichment service @ ULB extracts texts from PDFs and indexes on all words. DI-fusion permits end user to do a full-text search
      Enrichment service @ ULB enriches with JCR impact factors (based on ISSN and publication year)
    • Digital preservation @ ULB
      Back to digital preservation
      SUBMISSION
      metadata and object files (through 4 submission interfaces)
      DISSEMINATION
      through DI-fusion
      ARCHIVAL
      we need a PAS: “Perpetual Archiving System”
      based on the idea of enrichment
    • Digital preservation @ ULB
      [Architecture diagram. Components shown: DI-pot, BicTel, Icono, Digi, UMons; OAI-PMH links; Samburu (Harvester, MySQL Metadata Store, Indexer, SOLR, Lucene indexes); DI-fusion web portal; PAS (SIPs, AIPs, DIPs, Admin); LOCKSS]
    • Digital preservation @ ULB
      PAS-SIP
      Retrieve DIDL records over OAI-PMH from SAMBURU metadata store
      Fetch object files, based on references included in the DIDL record
      Make and store ArchiveMatica-SIP
      Alternative to OAI-PMH + web grabbing:
      Prepare ArchiveMatica-SIPs on a network-attached filesystem
      More practical for bulk ingest into AM: less network traffic
      We would probably try a combined approach: bulk + incremental
      Specific package information registered in PAS-Admin
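      The incremental half of a combined bulk + incremental approach maps onto the OAI-PMH "from" argument. The sketch below only builds request URLs; the base URL and the metadataPrefix value "didl" are assumptions, not values from the SAMBURU setup.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, since=None, resumption_token=None):
    """Build a ListRecords request, either a fresh one or a continuation."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if since:
            params["from"] = since  # only records changed on or after this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and prefix.
print(list_records_url("http://localhost/oai", "didl", since="2011-03-01"))
```

      Storing the date of the last successful harvest and passing it as since keeps each run limited to records changed in the meantime.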
    • Digital preservation @ ULB
      PAS-AIP
      Use ArchiveMatica micro-services to create and store ArchiveMatica-AIP, according to media type preservation plan
      Fully automated, at least for certain media types (PDF, JPEG, TIFF)
      Update package information in PAS-Admin
    • Digital preservation @ ULB
      PAS-DIP
      Use ArchiveMatica micro-services to create and store ArchiveMatica-DIP, according to media type preservation plan
      DIPped object files made available through web service
      Update package information in PAS-Admin
    • Digital preservation @ ULB
      PAS-Admin
      Information on the digital preservation status of packages is accessible over a web service:
      Original digital archive wants to find out the archival status of its items, based on the GUPI of an item or object file
      End user accesses DIPped object files through the web service: these are not publicly available, since access depends on the restrictions set by the IPR owner in the original digital archive
      AIPs are pushed into outer preservation space, e.g. LOCKSS, and registered as such in PAS-Admin
    • Digital preservation @ ULB
      PAS-Admin
      Throughout SIP/AIP/DIP processing, relevant information about the packages should be registered in a database
      For each SIP, AIP, DIP:
      (I) GUPI of item and all object files
      uuid of package
      (I) identifier of original digital archive
      (I) date of creation/modification
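      A registry holding these fields, with a lookup of archival status by GUPI, could look like the following. This is a sketch with SQLite standing in for whatever database PAS-Admin would use; the table and column names are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per package, one row per GUPI contained in it.
SCHEMA = """
CREATE TABLE package (
    uuid        TEXT PRIMARY KEY,
    kind        TEXT NOT NULL CHECK (kind IN ('SIP', 'AIP', 'DIP')),
    archive_id  TEXT NOT NULL,
    modified    TEXT NOT NULL
);
CREATE TABLE package_gupi (
    uuid  TEXT NOT NULL REFERENCES package(uuid),
    gupi  TEXT NOT NULL
);
CREATE INDEX idx_package_gupi ON package_gupi(gupi);
"""


def open_registry():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register(conn, uuid, kind, archive_id, modified, gupis):
    """Record one package together with the GUPIs of its item and object files."""
    with conn:
        conn.execute("INSERT INTO package VALUES (?, ?, ?, ?)",
                     (uuid, kind, archive_id, modified))
        conn.executemany("INSERT INTO package_gupi VALUES (?, ?)",
                         [(uuid, gupi) for gupi in gupis])


def status_for(conn, gupi):
    """Return the kinds of package (SIP, AIP, DIP) that contain this GUPI."""
    rows = conn.execute(
        "SELECT p.kind FROM package p JOIN package_gupi g ON g.uuid = p.uuid "
        "WHERE g.gupi = ? ORDER BY p.kind", (gupi,))
    return [row[0] for row in rows]


registry = open_registry()
register(registry, "uuid-sip-1", "SIP", "DI-pot", "2011-03-01", ["gupi:42", "gupi:42-f1"])
register(registry, "uuid-aip-1", "AIP", "DI-pot", "2011-03-02", ["gupi:42", "gupi:42-f1"])
print(status_for(registry, "gupi:42"))
```

      The index on the GUPI column is what makes the status question from the original digital archive cheap to answer.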
    • Digital preservation @ ULB
      PAS-Admin
      Relevant metadata of DIPs is made available as DIDL-structured (enrichment) records over OAI-PMH for SAMBURU to pick up
      Parse/extract from METS.xml:
      Essentially mime type and location
      Sum of original metadata and PAS-created metadata is available to DI-fusion
      DI-fusion could for example decide to only show DIP version of an object file, and inform end user of the existence of the original object file format
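      Extracting mime type and location from a METS file can be sketched as below. The sample document is hand-written; whether the DIP's METS.xml carries MIMETYPE attributes and FLocat elements in exactly this form is an assumption based on the METS schema, not on inspected Archivematica output.

```python
import xml.etree.ElementTree as ET

M = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical fileSec with one access copy.
SAMPLE_DIP_METS = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="access">
      <file ID="f1" MIMETYPE="application/pdf">
        <FLocat LOCTYPE="URL" xlink:href="objects/thesis.pdf"/>
      </file>
    </fileGrp>
  </fileSec>
</mets>"""


def file_locations(xml_text):
    """Return (mime type, location) pairs for every file listed in the METS document."""
    root = ET.fromstring(xml_text)
    found = []
    for file_el in root.iter(f"{M}file"):
        flocat = file_el.find(f"{M}FLocat")
        location = flocat.get(XLINK_HREF) if flocat is not None else None
        found.append((file_el.get("MIMETYPE"), location))
    return found


print(file_locations(SAMPLE_DIP_METS))
```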
    • Open Discussion
      Alternative options for integrating Archivematica or a subset of digital curation micro-services into your digitization workflow.
    • Issues
      Institutional repositories are also used to maintain an institution’s bibliography, with frequent updates of descriptive metadata and object files.
      When should digital objects from an IR be preserved?
    • Issues
      Dappert, A. & Enders, M. Using METS, PREMIS and MODS for archiving eJournals. D-Lib Magazine, Volume 14, Number 9/10. http://www.dlib.org/dlib/september08/dappert/09dappert.html
      “AIP per generation”, where a generation is a change in metadata and/or object file
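      One way to detect a new generation is to fingerprint the metadata together with the object file checksums and compare it with the fingerprint of the last archived generation. This is a sketch of that idea only; it is not the method described by Dappert and Enders, and the function names and data layout are hypothetical.

```python
import hashlib
import json


def generation_fingerprint(metadata, object_checksums):
    """Hash the metadata and the (order-independent) set of object file checksums."""
    canonical = json.dumps(
        {"metadata": metadata, "objects": sorted(object_checksums)},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_new_aip(last_fingerprint, metadata, object_checksums):
    """True when metadata or object files differ from the last archived generation."""
    return generation_fingerprint(metadata, object_checksums) != last_fingerprint


# Hypothetical item: a title plus the checksums of two object files.
archived = generation_fingerprint({"title": "Thesis"}, ["aaa", "bbb"])
print(needs_new_aip(archived, {"title": "Thesis"}, ["bbb", "aaa"]))
print(needs_new_aip(archived, {"title": "Thesis (revised)"}, ["aaa", "bbb"]))
```

      For a repository whose descriptive metadata changes frequently, a check like this makes the cost of the "AIP per generation" policy visible: every changed fingerprint is another AIP.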
    • Issues
      Both ArchiveMatica and LOCKSS are looking into solutions for the normalization of objects and packaging. Both systems seem redundant at first.
      How does ArchiveMatica interact with LOCKSS?
    • Issues
      ArchiveMatica-AIPs, DSpace-AIPs, exchange of packages between digital archives, nationwide preservation solution.
      Need for interoperability standards?
      TIPR: Towards Interoperable Preservation Repositories
      RXP: Repository eXchange Package
    • AIP Repository Interoperability
      “For reasons of redundancy, succession planning and software migration, repositories must be able to exchange copies of archival information packages with each other. Every different repository application, however, describes and structures its archival packages differently. Therefore each system produces dissemination packages that are rarely understandable or usable as submission packages by other repositories.”
    • AIP Repository Interoperability
      One possible solution: the RXP (Repository eXchange Package), developed by the Towards Interoperable Preservation Repositories (TIPR) project. The RXP is a standards-based package of metadata files that acts as an intermediary information package, a lingua franca that all repositories can read and write.
      Another option: create AIPs following the HathiTrust specification for digital objects.
    • Issues
      AIPs are intended for perpetual access and therefore only contain objects that comply with an open, documented format. Anyone should be able to re-read the contents of the object files 50 years from now, given textual documentation of the format.
      So, why migrate AIPs into a new(er) format?
    • Issues
      Archivematica normalizes moving pictures into MPEG-2, which means a loss of quality
      A lossless alternative would be Motion JPEG 2000
      However, no open-source CLI-based tool for conversion into the Motion JPEG 2000 format is available
    • Issues
      The more copies of a digital object are stored in different places, the harder it becomes to control copyright.
      Is geo-independent perpetual archiving in contradiction with IPR issues?
    • Issues
      Packages are self-contained: if you find an AIP, you know what it is about, and you can read, view or listen to it. But how do you find the AIP in a sea of billions of AIPs?
      Don’t forget to preserve finding aids! How?
    • Contact
      Slavko Manojlovich
      Associate University Librarian (IT)
      Manager, Digital Archives Initiative
      Memorial University of Newfoundland, St. John’s
      slavko@mun.ca
      &
      Benoit Pauwels
      Head, Library Automation Team
      Université Libre de Bruxelles
      Benoit.Pauwels@ulb.ac.be
      *This presentation may be downloaded at:
      http://dl.dropbox.com/u/18652253/phoenix%20presentation.pptx