Archiving as a Service - A Model for the Provision of Shared Archiving Services Using Cloud Computing



Presentation held at iConference 2011

Speaker notes:
  • Greet; Japan. Late in the day for a talk on archiving, so get started right away.
  • Hot topic, and here to stay. What is cloud computing?
  • Software hosting, virtualization. Abstracted: complexity reduced. Pay for the amount of service used and trust the service provider to deliver. This is a general definition; what about a specific area such as archiving?
  • What we are afraid of…
  • To what extent is this possible with existing models?
  • Chosen because it is the de facto standard for archival systems
  • Incompatibilities with cloud computing: Data Management and Archival Storage; the process starts with the SIP. On a shared platform it shouldn't be necessary to wait for the SIP; references to digital objects can be used instead. In the OAIS model, descriptive information and administrative data are handled by Data Management. What does a cloud model look like?
  • Layered: SaaS, PaaS, IaaS… HaaS in the case of crowdsourcing. Bottom to top.
  • (2) With a defined API, classes and properties, it is possible to exchange one service for another as long as both support them. (3) Sharing: different programs share and take advantage of similar services in the same layer. API: a particular set of rules and specifications that a software program can follow to access and make use of the services and resources provided by another software program that implements that API. What would a layered cloud archive system look like?
  • Simple two-layer model: two systems using a shared cloud repository as a storage backend. All well and good, but to offer preservation…
  • To provide this information, we have expanded the simple model.
  • More detail about the information provided by each layer.
  • Place the different OAIS Information Types in the layers just described. The Preservation Layer needs many different information types to generate an Information Package. Where do they come from?
  • Putting it all together.
  • Moving on from theory to practice – Application of the model
  • Number of problems raised

    1. iConference 2011
       Archiving as a Service - A Model for the Provision of Shared Archiving Services Using Cloud Computing
       Jan Askhoj (janaskhoej[at]), Shigeo Sugimoto (sugimoto[at]), Mitsuharu Nagamori (nagamori[at])
       University of Tsukuba, Japan
    2. The Rise of Cloud Computing
       • Big business: the cloud computing market is reported to grow to more than $150 billion by 2013.*
       • Gartner listed cloud computing as one of the most hyped technologies in 2009.
       • Many benefits: reduced cost, increased storage, no software deployment, flexibility, mobility, and allowing IT to shift focus.
       • Cloud computing is increasingly being used for content creation and storage.
       * Global Industry Analysts, 2010
    3. A Cloud Definition (One of Many)
       • Cloud computing is an abstracted, scalable platform for service delivery.
       • Cloud computing makes use of existing technologies that can be described via a layered model.
       • Access to both platform and services is available via the internet.
       • Availability, quality and number of services are offered according to agreements with a provider.
       (Vaquero et al. 2009)
    4. Cloud Computing from an Archiving Perspective
       • In the cloud, archives may not know the hardware and software used to create records. How do we document such formats?
       • Cloud providers are good at managing data and hosting software. But what if something happens?
       • There are providers of services for backup, but not for preservation.
       • Will we be able to find and read documents created and stored in the cloud 10 years from now?
    5. I found the document... If only I knew how to access it!
    6. Object of Research
       • Providing a reference model for cloud-based archiving that makes possible:
          • Offering trusted storage and long-term preservation as a cloud-based service.
          • Automatically providing preservation metadata and information packages for the transfer of digital records.
          • Extending preservation to as early in the records lifecycle as possible.
    7. Current Archive Model: OAIS
       • Reference Model for an Open Archival Information System (OAIS).
       • Defines Entities, Relationships and Information Types in digital archives.
       (Consultative Committee for Space Data Systems, 2002)
    8. OAIS and the Cloud
       • The OAIS model does not cover the use of a shared storage platform outside the control of an archive. Such functionality overlaps with several OAIS functional entities.
       • An OAIS archive does not cover the early stages of the document lifecycle. With a shared platform, Digital Objects can be immediately accessible to an archive for early preservation planning.
       • In OAIS, Digital Objects and metadata are included in Information Packages. If the Producer and the Archive share a common platform, this is not necessary.
    9. A General Layered Model for Cloud Computing Services
       [Figure: stack of layers: Hardware/Facilities, Connectivity, Abstraction, OS, Virtualization, Data, Metadata, Content, Applications, APIs, Presentation (user facing)]
       • SaaS (Software as a Service): users access applications via user-facing software or APIs.
       • PaaS (Platform as a Service): a virtualized platform for executing applications and providing storage.
       • IaaS (Infrastructure as a Service): hardware and infrastructure.
    10. Some Characteristics of the Layered Model
       • In a layered model, each layer offers defined services to the layers above.
       • Services are abstracted and interchangeable.
       • Benefits:
          • Makes it easy to offer and take advantage of defined levels of service.
          • Facilitates resource sharing.
          • Facilitates migration.
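The interchangeability described on slide 10 can be illustrated with a minimal Python sketch, assuming a hypothetical storage API with only two operations, `put` and `get`. The names `StorageService`, `InMemoryStorage`, `DirectoryStorage` and `migrate` are illustrative inventions, not part of the authors' model; the sketch only shows that code written against the abstract layer can switch providers, or migrate between them, without changes.

```python
import pathlib
import tempfile
from abc import ABC, abstractmethod


class StorageService(ABC):
    """Hypothetical lower-layer API: the only contract the upper layers depend on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store a bit-string under a key."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the bit-string stored under a key."""


class InMemoryStorage(StorageService):
    """One provider: keeps bit-strings in a dictionary."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class DirectoryStorage(StorageService):
    """Another provider: keeps each bit-string as a file in a directory."""

    def __init__(self, root: str) -> None:
        self._root = pathlib.Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def migrate(source: StorageService, target: StorageService, keys: list[str]) -> None:
    """Copy the listed objects to another provider through the shared API only."""
    for key in keys:
        target.put(key, source.get(key))


# Usage: move two records from one provider to another.
old_provider = InMemoryStorage()
old_provider.put("record-001", b"minutes of meeting")
old_provider.put("record-002", b"budget report")

with tempfile.TemporaryDirectory() as tmp:
    new_provider = DirectoryStorage(tmp)
    migrate(old_provider, new_provider, ["record-001", "record-002"])
    migrated = new_provider.get("record-002")
```

Because `migrate` only sees the abstract interface, adding a third provider means writing one more subclass; nothing above the storage layer has to change.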
    11. Simple Layered Cloud Archiving System
       [Figure: Interaction Layer with a Business System and an Archive, each working with Digital Objects; Storage Layer: trusted repository (bit-level integrity)]
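Slide 11 puts a trusted repository responsible for bit-level integrity beneath both the business system and the archive. A common way to demonstrate bit-level integrity is a fixity check: record a checksum when an object is stored and recompute it later. The sketch below assumes SHA-256 from Python's standard `hashlib`; the `TrustedRepository` class and its methods are hypothetical names, and the slide does not say which mechanism the authors intend.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a bit-string."""
    return hashlib.sha256(data).hexdigest()


class TrustedRepository:
    """Hypothetical shared storage layer that records a checksum for every stored object."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._checksums: dict[str, str] = {}

    def store(self, object_id: str, data: bytes) -> str:
        """Store the bits and remember their checksum at ingest time."""
        digest = sha256_hex(data)
        self._blobs[object_id] = data
        self._checksums[object_id] = digest
        return digest

    def retrieve(self, object_id: str) -> bytes:
        """Return the bits only if they still match the recorded checksum."""
        data = self._blobs[object_id]
        if sha256_hex(data) != self._checksums[object_id]:
            raise ValueError(f"fixity check failed for {object_id}")
        return data

    def audit(self) -> list[str]:
        """List the ids of all objects whose stored bits no longer match their checksum."""
        return [oid for oid, data in self._blobs.items()
                if sha256_hex(data) != self._checksums[oid]]


# Usage: a business system and an archive share the repository; one object is later corrupted.
repo = TrustedRepository()
repo.store("business/contract.pdf", b"%PDF-1.4 contract")
repo.store("archive/letter.txt", b"letter text")
repo._blobs["archive/letter.txt"] = b"letter txet"  # simulate bit rot, for demonstration only
damaged = repo.audit()
```

A periodic `audit` detects silent corruption but does not repair it; a real trusted repository would also need redundant copies to restore from.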
    12. Expanding the Simple Model
       • Storage does not equal preservation.
       • Information is needed to support the "Viability, Renderability, Understandability, Authenticity, and Identity of Digital Objects" (known in OAIS as an Information Package).
    13. Proposed Four-Layer Model
       • Interaction Layer: user-facing Archives/Records Management Systems and Business Systems.
       • Preservation Layer: adds preservation information. Turns Digital Objects into Information Packages for use by Archives/Records Management Systems.
       • SaaS Layer: applications represent bit-strings as Digital Objects used by systems and users.
       • PaaS Layer: application platform and trusted repository for storing bit-strings.
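The four layers on slide 13 can be read as a chain in which each layer builds on the one below. The following is a minimal Python sketch of that chain under our own assumptions: the class names, the single `format` entry used as Representation Information, and the two Preservation Description Information fields (reference and provenance, two of the OAIS PDI categories) are illustrative choices, not a specification from the slides.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    """SaaS Layer view: a bit-string plus the format the application interprets it as."""
    object_id: str
    bits: bytes
    format_name: str


@dataclass
class InformationPackage:
    """Preservation Layer output: the Digital Object plus information needed to keep it usable."""
    content: DigitalObject
    representation_information: dict[str, str]
    preservation_description: dict[str, str]


class PaaSLayer:
    """Application platform and repository: stores bit-strings only."""

    def __init__(self) -> None:
        self._bits: dict[str, bytes] = {}

    def put(self, object_id: str, bits: bytes) -> None:
        self._bits[object_id] = bits

    def get(self, object_id: str) -> bytes:
        return self._bits[object_id]


class SaaSLayer:
    """Applications: know which format each stored bit-string is in."""

    def __init__(self, platform: PaaSLayer) -> None:
        self._platform = platform
        self._formats: dict[str, str] = {}

    def save(self, object_id: str, bits: bytes, format_name: str) -> None:
        self._platform.put(object_id, bits)
        self._formats[object_id] = format_name

    def open(self, object_id: str) -> DigitalObject:
        return DigitalObject(object_id, self._platform.get(object_id), self._formats[object_id])


class PreservationLayer:
    """Adds preservation information, turning Digital Objects into Information Packages."""

    def __init__(self, saas: SaaSLayer) -> None:
        self._saas = saas

    def package(self, object_id: str, provenance: str) -> InformationPackage:
        obj = self._saas.open(object_id)
        return InformationPackage(
            content=obj,
            representation_information={"format": obj.format_name},
            preservation_description={"reference": obj.object_id, "provenance": provenance},
        )


# Usage: a business system saves a document; an archive system (Interaction Layer) requests a package.
platform = PaaSLayer()
saas = SaaSLayer(platform)
saas.save("doc-17", b"<memo>...</memo>", "text/xml")
package = PreservationLayer(saas).package("doc-17", provenance="created in business system A")
```

Each layer only calls the layer directly beneath it, which is what lets the Preservation Layer be added on top of existing SaaS and PaaS services without modifying them.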
    14. [Figure: the OAIS Information Package (Preservation Description Information; Information Object, Data Object, Representation Information, Digital Object, Bit Sequence, with 1+ multiplicities) set beside the Layered Model (Interaction, Preservation, SaaS and PaaS Layers)]
    15. Where Does Preservation Metadata Come From?
       • Business System Metadata: generated at the time of document creation or records export.
       • Registry Information: pre-provided (semi-static) information about registered Entities and Information Types.
       • Event-Related Information: information describing changes to Digital Objects and metadata that take place during the preservation process.
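Slide 15 names three sources of preservation metadata. Here is a minimal sketch of how a Preservation Layer might combine them into one record, assuming a hypothetical in-memory `REGISTRY` for the semi-static registry information and a simple event list loosely modeled on the PREMIS notion of events. All field names and sample values are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical registry: semi-static information about registered formats and agents.
REGISTRY = {
    "formats": {"application/pdf": {"name": "Portable Document Format"}},
    "agents": {"agency-a": {"name": "Example Agency A", "type": "government agency"}},
}
UNREGISTERED = {"name": "unregistered"}


@dataclass
class PreservationRecord:
    object_id: str
    business_metadata: dict  # exported by the business system
    registry_info: dict  # looked up from the registry
    events: list = field(default_factory=list)  # appended during preservation

    def log_event(self, event_type: str, outcome: str) -> None:
        """Record an action or change affecting the object, with a UTC timestamp."""
        self.events.append({
            "type": event_type,
            "outcome": outcome,
            "time": datetime.now(timezone.utc).isoformat(),
        })


def build_record(object_id: str, business_metadata: dict) -> PreservationRecord:
    """Combine business system metadata with registry information for one object."""
    registry_info = {
        "format": REGISTRY["formats"].get(business_metadata.get("format"), UNREGISTERED),
        "agent": REGISTRY["agents"].get(business_metadata.get("creator"), UNREGISTERED),
    }
    return PreservationRecord(object_id, dict(business_metadata), registry_info)


# Usage: metadata exported at records export, enriched from the registry, then events logged.
record = build_record("doc-17", {"title": "Meeting minutes", "format": "application/pdf",
                                 "creator": "agency-a"})
record.log_event("ingestion", "success")
record.log_event("format validation", "success")
```

Only the event list grows over time; business system metadata is captured once at export and registry information changes rarely, which matches the "semi-static" label on the slide.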
    16. Layered Model: Applications, Information and Provided Services
       [Figure: the Layered Model (PaaS, SaaS, Preservation and Interaction Layers) with labels including Storage/Hosting Platform, Bitstream Storage & API, Bit-stream; Business Software, Application Service, Digital Object, Type & Metadata, Information Type; Package Creator, Preservation Information; Archive System, Information Package]
    17. Case Study: Japanese Government
       • Problems with system incompatibility and insufficient records management have led to a new Archives Policy and a new IT Strategy.
       • One part is a cloud computing project: the Kasumigaseki Cloud (霞が関クラウド). This is still in the early stages of planning.
       • We focus on three archiving problem areas to see how they could be resolved using our model.
    18. Current Workflow
       [Figure: each Agency's Business Systems on separate platforms, the Common Document Registration System and the Records Mgmt. System; Agency Records Mgmt. with Retention Schedule, Transfer Plan and Preservation Plan; records are either destroyed or transferred as historic records to the National Archives]
    19. Problem Areas
       • Lack of system integration: individual government offices use different systems. Preparing records is a time-consuming task.
       • Lack of resources: the burden of transferring records to the National Archives lies with government agencies. The limited size of the National Archives of Japan (NAJ) makes it hard to provide assistance.
       • Preservation: records held in government agency systems are not being preserved.
    20. Applying the Model
       • Assumption: the Kasumigaseki Cloud will offer both a storage/hosting platform (PaaS) and software services (SaaS).
       • Added functionality in the Preservation Layer:
          • Registration
          • Harvesting
          • Preservation
          • Reporting
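Slide 20 adds four functions to the Preservation Layer. The sketch below shows one way they could fit together, assuming hypothetical agency identifiers and treating the shared platform as a plain list of stored objects. The preservation step here only records that an object has been packaged, standing in for real Information Package creation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    object_id: str
    agency: str  # agency whose business system created the object


class PreservationService:
    """Hypothetical Preservation Layer service with the four added functions."""

    def __init__(self) -> None:
        self.registered: set[str] = set()
        self.packaged: dict[str, str] = {}  # object_id -> agency

    def register(self, agency: str) -> None:
        """Registration: record which agencies' systems the layer serves."""
        self.registered.add(agency)

    def harvest(self, platform: list[StoredObject]) -> list[StoredObject]:
        """Harvesting: find objects from registered agencies that are not yet packaged."""
        return [obj for obj in platform
                if obj.agency in self.registered and obj.object_id not in self.packaged]

    def preserve(self, objects: list[StoredObject]) -> None:
        """Preservation: placeholder that only marks objects as packaged."""
        for obj in objects:
            self.packaged[obj.object_id] = obj.agency

    def report(self) -> dict[str, int]:
        """Reporting: number of packaged objects per agency."""
        return dict(Counter(self.packaged.values()))


# Usage: two registered agencies and one unregistered source share the platform.
platform = [StoredObject("a-1", "agency-a"), StoredObject("a-2", "agency-a"),
            StoredObject("b-1", "agency-b"), StoredObject("x-1", "agency-x")]
service = PreservationService()
service.register("agency-a")
service.register("agency-b")
first_run = service.harvest(platform)
service.preserve(first_run)
second_run = service.harvest(platform)
summary = service.report()
```

Because harvesting skips objects that are already packaged, it can run repeatedly against the shared platform, which fits the idea of adding automation while agencies keep their current workflow.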
    21. [Figure: the model applied to the case. Layers: User Facing Systems (Archive System, RMS, Business System), ARM Layer, Package Layer, SaaS Layer (SaaS Business Systems -> Digital Objects), PaaS Layer (Platform -> Bit-sequences). Functionality: Registration, Harvesting, Conversion, Reporting. Package contents: Preservation Description Information, Representation Information, Package Information, Package Desc. Agency Records Mgmt. (Retention Schedule, Transfer Plan, Preservation Plan), Business System back-end, transfers to the National Archives]
    22. Benefits and Limitations in the Case
       • Benefits:
          • Automatic package creation, simplifying records transfer.
          • Early and consistent addition of preservation metadata.
          • Allows keeping the current workflow, but adds automation.
       • Limitations/Requirements:
          • The cloud platform must be truly trustworthy, with no unexpected change or loss of service.
          • Good export of content and metadata from SaaS business systems is needed.
          • Semantic or community-specific information must be provided.
    23. Concluding Remarks
       • We believe our model has a number of advantages when developing a cloud archive framework:
          • Builds on OAIS model concepts and information types.
          • Adds trusted storage and preservation to early stages in the document lifecycle.
          • Simplifies archive system design by allowing organizations to choose different levels of service.
       • Current status: work on defining information classes and properties; designing a test system using the model.
    24. Thank you! ありがとうございました! University of Tsukuba, Japan
    25. References
       • ISO 15489-1:2001. Information and documentation - Records management - Part 1: General. 2001.
       • Requirements for Electronic Records Management Systems. 2002.
       • Consultative Committee for Space Data Systems. Reference Model for an Open Archival Information System (OAIS). 2002.
       • Electronic Records Archives ERA Lifecycle. 2004.
       • National Archives of Japan. National Archives Law. 2007.
       • Outline of the National Archives. 2007.
       • Chan, T. Japan to build massive cloud infrastructure for e-government. Green Telecom.
       • Guenther, R. Understanding and Implementing the PREMIS Data Dictionary for Preservation Metadata. 2009.
       • Koga, T. Recent development of the government information policy in Japan. International Federation of Library Associations and Institutions, Government Information and Official Publications Section (GIOPS) Newsletter, 8 (2010), 8-11.
       • Kulovits, H., Becker, C., and Kraxner, M. Plato: A Preservation Planning Tool Integrating Preservation Action Services. 5173/2008 (2008), 413-414.
       • Okamoto, S. New Developments in Managing Records in Japan - The Establishment, Direction and Structure of the Archive Law. 2010.
       • Sugimoto, S. Ensuring the Preservation and Use of Electronic Records. 2007.
       • Vaquero, L.M., Rodero-Merino, L., and Caceres, J. A Break in the Clouds: Towards a Cloud Definition. ACM SIGCOMM Computer Communication Review 39, 1 (2009), 50-55.
       • Youseff, L., Butrico, M., and DaSilva, D. Toward a Unified Ontology of Cloud Computing. Grid Computing Environments Workshop (2008), 1-10.