An Infrastructure for Preservation. Claudio Prandoni, Marlis Valentini (MetaWare SpA & CASPAR)
Programme: Digital preservation threats and requisites; Summary of the OAIS model; From OAIS to CASPAR; CASPAR key components; Ex. 1: Preservation step by step; Demo: A simple web application; Ex. 2: CASPAR answers to preservation threats; A preservable architecture; Interviews: Two case studies
Introduction: How can digital data still be used and understood in the future when systems, software, and everyday knowledge continue to change? This is the CASPAR challenge.
Preservation Issue 1: Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved. How can we guarantee that digital information can be accessed and understood in the future? How can we guarantee retrieval of Archival Information? How can we guarantee intelligibility of digital information within heterogeneous Designated Communities?
Preservation Issue 2: Non-maintainability of essential hardware, software or support environment may make the information inaccessible. How can we guarantee that preservation actors are informed about change events? How can we guarantee that appropriate actions are undertaken to preserve Archival Information against change events?
Preservation Issue 3: The chain of evidence may be lost, and there may be a lack of certainty about provenance or authenticity. How can we guarantee adequate integrity and identity for any Archival Information?
Preservation Issue 4: Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future. How can we guarantee adequately secure access, with the proper rights, to any resource and functionality within an Archive?
Preservation Issue 5: The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future. How can we guarantee proper information package management within an Archive? How can we guarantee long-term preservation maintenance of any information package?
The CASPAR Project: The CASPAR project is mainly based on the OAIS standard, ISO 14721:2003. Accordingly, its architecture is designed to manage the key concepts of the OAIS reference model and to support the main functionality identified in the OAIS functional model. Moreover, the CASPAR project aims to define and implement interfaces and functionally independent components.
OAIS Information Model (diagram): An Information Package contains Content Information, the primary focus of archival preservation: a Data Object interpreted using Representation Information, which is itself interpreted using the Designated Community Knowledge Base. The package also carries Preservation Description Information, needed for long-term preservation, and Descriptive Information, needed for discovery.
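The information model above can be pictured as a small data structure. The following is only a minimal sketch: the class and field names are illustrative assumptions, not CASPAR's actual interfaces, and the identifiers in the example are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ContentInformation:
    """Data Object plus the Representation Information needed to interpret it."""
    data_object: bytes
    representation_info: list[str]  # hypothetical RepInfo identifiers


@dataclass
class InformationPackage:
    """Hypothetical container mirroring the OAIS Information Package."""
    content: ContentInformation
    # Preservation Description Information: needed for long-term preservation
    pdi: dict[str, str] = field(default_factory=dict)
    # Descriptive Information: needed for discovery
    descriptive_info: dict[str, str] = field(default_factory=dict)


package = InformationPackage(
    content=ContentInformation(b"\x00\x01", ["format:example-binary"]),
    pdi={"provenance": "produced by an example instrument"},
    descriptive_info={"title": "Example measurement"},
)
```

Keeping the three kinds of information in separate fields makes it explicit which part serves interpretation, which serves preservation and which serves discovery.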
OAIS Functional Model (diagram; external actors: Producer, Consumer, Manager)
CASPAR Implementation (diagram). Functions shown: Monitoring OAIS Environment; Detect Changes/Impacts in DCKB; Mapping out Preservation Strategy; Provide Recommendations; AIP Storage; AIP Maintenance; AIP Retrieval; Populate Descriptive Info; Maintain Descriptive Info; Access Descriptive Info; Receive SIP; Q-check on SIP; Generate AIP; Extract DescInfo; Coordinate updates; Query Processing; Retrieval Delivery; Perform Transformation; Security; Access Control. Functional entities: Storage, Data Management, Ingest, Access.
CASPAR Implementation (diagram; functional entities: Storage, Data Management, Ingest, Access)
CASPAR key components: Creation, maintenance and reuse of OAIS Representation Information; Search for an object using either a related measurable parameter or a linkage to remote values; Construction and unpackaging of OAIS Information Packages; Centralised and persistent storage and retrieval of OAIS Representation Information, including PDI; OAIS-based Preservation Aware Storage, providing built-in support for bit and logical preservation
CASPAR key components: Information discovery services; Definition and enforcement of access control policies; Registration of provenance information on digital works and retrieval of rights-holding information; Maintenance and verification of authenticity in terms of identity and integrity of the digital objects; Reception of notifications from Publishers for a specific “topic” and sending of alerts to Subscribers; Definition of Designated Communities and identification of missing Representation Information
The CASPAR Workflow
Preservation step by step: 1) The digital content object has to be “prepared” and “packed” properly so that it can be “ingested” into the digital archive system that will manage and maintain it for a long time. 2) The digital content object has to be “retrieved” from the digital archive through its descriptive information, and “checked” against any access rights policy that restricts it. 3) The digital content object within the digital archive needs to be maintained so that it can still be accessed, used and understood whatever changes occur during its long-term lifecycle.
Ingestion steps
Ingestion Phase (Information Packaging Components): ingest Content Information; create the Information Package (Representation Info, Descriptive Info, Preservation Description Info); check the Information Package; store the Information Package for the long term. OAIS functional entities: Ingest, Data Management, Archival Storage, Preservation Planning, Administration, Access.
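The create, check and store steps of the ingestion phase can be sketched roughly as follows. This is an illustration under stated assumptions: the function names, the dictionary layout and the in-memory `archive` are hypothetical and stand in for the real packaging and storage components.

```python
# Parts that a package must carry before it is stored (assumed for this sketch).
REQUIRED_PARTS = ("content", "representation_info", "descriptive_info", "pdi")


def create_package(content, representation_info, descriptive_info, pdi):
    """Bundle the Content Information with its accompanying information."""
    return {
        "content": content,
        "representation_info": representation_info,
        "descriptive_info": descriptive_info,
        "pdi": pdi,
    }


def check_package(package):
    """Return the names of parts that are missing or empty."""
    return [part for part in REQUIRED_PARTS if not package.get(part)]


def ingest(archive, package_id, package):
    """Store the package only if the check finds no missing parts."""
    problems = check_package(package)
    if problems:
        raise ValueError(f"package {package_id} is incomplete: {problems}")
    archive[package_id] = package


archive = {}  # stands in for long-term archival storage
good = create_package(b"bits", ["format:example"], {"title": "Example"}, {"provenance": "example"})
ingest(archive, "pkg-1", good)
```

A package without Representation Information, for instance, would be rejected by `ingest` before it reaches storage.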
Access steps
Access Phase (Information Access Components): search Content Information; obtain Information Packages and their related contents and descriptions; check content access permissions. OAIS functional entities: Ingest, Data Management, Archival Storage, Preservation Planning, Administration, Access.
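The access phase combines a search over Descriptive Information with a permission check before a package is handed out. The sketch below assumes a very simple policy table (package id mapped to the set of allowed users); the names `search`, `obtain` and `policies` are hypothetical and not taken from CASPAR.

```python
def search(archive, **criteria):
    """Return ids of packages whose Descriptive Information matches every criterion."""
    return sorted(
        package_id
        for package_id, package in archive.items()
        if all(package["descriptive_info"].get(k) == v for k, v in criteria.items())
    )


def obtain(archive, policies, user, package_id):
    """Return the package only if the policy table allows this user."""
    if user not in policies.get(package_id, set()):
        raise PermissionError(f"{user} may not access {package_id}")
    return archive[package_id]


# Illustrative in-memory archive and access policies.
archive = {
    "pkg-1": {"content": b"score", "descriptive_info": {"domain": "music"}},
    "pkg-2": {"content": b"image", "descriptive_info": {"domain": "earth-observation"}},
}
policies = {"pkg-1": {"alice"}}

hits = search(archive, domain="music")
```

Here `obtain(archive, policies, "alice", "pkg-1")` returns the package, while the same call for a user who is not listed raises `PermissionError`.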
Preservation steps
Preservation Phase (Communication Components): notify and alert about change events impacting long-term preservation; trigger the preservation process. OAIS functional entities: Ingest, Data Management, Archival Storage, Preservation Planning, Administration, Access.
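Notification of change events follows a publish/subscribe pattern: publishers send a message on a topic and subscribers to that topic are alerted. The following is a minimal in-process sketch; the `NotificationBroker` class, the topic name and the event fields are assumptions made for illustration only.

```python
from collections import defaultdict


class NotificationBroker:
    """Hypothetical topic-based broker: publishers notify, subscribers are alerted."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be called for every event on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver the event to each subscriber; return how many were alerted."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


broker = NotificationBroker()
alerts = []  # a preservation actor collecting alerts
broker.subscribe("format-obsolescence", alerts.append)
delivered = broker.publish(
    "format-obsolescence",
    {"format": "example-format", "change": "viewer no longer maintained"},
)
```

In a preservation workflow, the subscriber callback would be the place where the preservation process is triggered.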
CASPAR innovations: CASPAR aims to preserve not only the bits of digital objects but also the information and knowledge encoded in them. CASPAR also aims to preserve digital rights on contents and to identify mechanisms that ensure the maintenance and verification of the authenticity of digital objects throughout the whole preservation process.
Phaistos disk (1700 BC): We still cannot understand it (the meaning has not been preserved). We can only tell that it is a “sequence of symbols”…
Rosetta Stone (196 BC): … just a “sequence of symbols”… but… Ancient Hieroglyphic Egyptian, Demotic Egyptian, Greek
Additional components. Designated Community & Knowledge Management: deal with the Designated Community Profile and its own Knowledge Base; identify and provide the Knowledge Gap for understanding a Content Information. Provenance Management: deal with Digital Rights; guarantee Authenticity.
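Identifying a Knowledge Gap can be read as comparing the Representation Information an object needs against what a Designated Community already knows. The sketch below is one possible reading, not CASPAR's algorithm: the function name, the dependency map and all identifiers are hypothetical. An item the community already knows needs no further explanation; an unknown item is part of the gap, and so is anything unknown that it depends on.

```python
def knowledge_gap(required, dependencies, knowledge_base):
    """Return the RepInfo items needed but absent from the community's knowledge base."""
    gap = set()
    pending = list(required)
    while pending:
        item = pending.pop()
        if item in gap or item in knowledge_base:
            continue  # already counted, or already understood by the community
        gap.add(item)
        # An unknown item may itself need further Representation Information.
        pending.extend(dependencies.get(item, ()))
    return gap


# Illustrative data: what each item needs in order to be understood.
dependencies = {
    "data-format": ["format-spec", "ascii"],
    "format-spec": ["english", "pdf"],
}
community_knowledge = {"ascii", "english", "pdf"}

gap = knowledge_gap(["data-format"], dependencies, community_knowledge)
```

For this community the gap is the data format and its specification; a community that already knows the data format would have an empty gap.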
Web Application
CASPAR answers: So… is the CASPAR solution able to provide an answer to the digital preservation issues identified at the beginning?
Preservation Issue 1: Users may be unable to understand or use the data, e.g. the semantics, format, processes or algorithms involved. You need the ability to create and maintain adequate Representation Information.
Preservation Issue 1: To guarantee that digital information can be accessed and understood in the future, you need adequate OAIS Representation Information. To guarantee retrieval of Archival Information, you need OAIS Finding Aids. To guarantee intelligibility of digital information within heterogeneous Designated Communities, you need to manage DC Profiles and their Knowledge Bases.
Preservation Issue 2: Non-maintainability of essential hardware, software or support environment may make the information inaccessible. You need the ability to share information about the availability of hardware and software and their replacements/substitutes.
Preservation Issue 2: To guarantee that preservation actors are informed about change events, you need adequate management of message exchange. To guarantee that appropriate actions are undertaken to preserve Archival Information against change events, you need to identify the information to be added or modified.
Preservation Issue 3: The chain of evidence may be lost, and there may be a lack of certainty about provenance or authenticity. You need the ability to bring together evidence from diverse sources about the Authenticity of a digital object.
Preservation Issue 3: To guarantee adequate integrity and identity for any Archival Information, you need an Authenticity Tool.
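The integrity half of authenticity is commonly checked with a fixity digest recorded at ingest and recomputed later. The sketch below uses SHA-256 from Python's standard `hashlib`; the choice of SHA-256 and the function names are assumptions for illustration, and the source does not say how CASPAR's Authenticity component works. Identity and provenance evidence are outside this sketch.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Compute a SHA-256 digest to record alongside the object at ingest."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    return fixity(data) == recorded_digest


original = b"example digital object"
recorded = fixity(original)  # would be kept with the preservation information

unchanged = verify_integrity(original, recorded)
tampered = verify_integrity(b"example digital object!", recorded)
```

`unchanged` is true and `tampered` is false: any change to the bits, however small, yields a different digest.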
Preservation Issue 4: Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future. You need the ability to deal with Digital Rights correctly in a changing and evolving environment.
Preservation Issue 4: To guarantee adequately secure access, with the proper rights, to any resource and functionality within an OAIS Archive, you need Security and DRM Management.
Preservation Issue 5: The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future. You need brokering of organisations to hold data, and the ability to package together the information needed to transfer it between organisations, ready for long-term preservation.
Preservation Issue 5: To guarantee proper information package management within an OAIS Archive, you need to create an adequate OAIS Information Package. To guarantee long-term preservation maintenance of any information package, you need an implementation of OAIS Archival Storage.
Conclusion: The CASPAR Foundation (diagram). Platform: Operating System (Linux, Unix, Windows, Mac); Java Platform; DBMS (H2, Postgres). Framework: Development Framework (JAX-WS, GWT, Ant); Application Server (Tomcat, Glassfish, WASCE). Key Components: GapManager, Orchestration, DataAccess&Security, RepInfoToolbox, Registry, Packaging, DataStores, Virtualisation, CASPAR Service Factory, Authenticity, SemanticWeb, DigitalRights, FindingAids. Development Management: Hudson and JTrac.
Preservable Equation: Self-Contained + Well Described + Adaptable + Replaceable = Preservable. Self-Contained (no dependencies, loosely coupled, distributed): a pure service-oriented design guarantees that a component can provide its functionality without requiring the cooperation of other components. Well Described (sharing know-how, open specification, open source, open documentation): component analysis, design and development are strongly based on complete, shared and open documentation at every level. Adaptable (flexibility, scalability): design and implementation choices allow each component to be adapted and configured so that it always provides at least a minimal set of functionality, independently of the deployment framework and conditions. Replaceable (interoperability, maintainability): design and implementation choices allow any component in the framework to be replaced with a compliant one.
The Developer Community (http://developers.casparpreserves.eu:8080): Shared and cooperative development community based on CASPAR Best Practices; Development Management based on a detailed D1302 Overall Master Plan Refinement Specifications; Development Control based on a Continuous Integration Engine (Hudson + JTrac); Specification, Software and Documentation available for developers & practitioners
CASPAR Preservation Nodes
Use cases: Artistic Testbed – IRCAM; Scientific Testbed – ESA
This work is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Infrastructure Training Session
