DLF 2011
Baltimore, MD


  Digital Preservation Cloud
  Services for Libraries and
           Archives


                Quyen L. Nguyen – NARA
Outline
     Introduction
     LDPaaS
     Levels of Service and Cost Model
     Related Work
     Conclusion




Oct. 31, 2011           2011 DLF Forum   2
Functional Requirements

 Need for Long-Term Digital Preservation

     – Policy mandates: retention of government records

     – Knowledge function: preserve digitized books and born-digital
         materials

     – History-oriented mandates: preservation of cultural heritage



 Challenges

     – Rapid growth of digital objects that require archiving.

     – Data heterogeneity


Desired System Characteristics

 Dynamic Scalability
     – Increase as well as decrease

 Cost-effective Maintainability
     – Operation cost

     – Patches: COTS, security.

 Evolvability
     – Technology refresh

     – New features and services



Cloud Computing Characteristics
     Elasticity
          – Computing and storage resources

          – Three levels of cloud services: IaaS, PaaS, and SaaS.

          – Quick Provisioning (e.g. Cloud Market [3])

          – Pay-as-you-go

     Cost-efficient Maintenance
          – Economies of scale

          – Maximizing utilization of computing resources

     Evolvability by configuration
OAIS Reference Model




LDPaaS
 Long-term Digital Preservation as a Cloud Service
   – Encompasses major OAIS functionalities
   – Not only a storage service,
   – But also a preservation service according to the customer’s
     policies: retention period, preservation level, and access
     level.
 Beneficial to Cloud Service Consumer
   – Relieves records owners of the burden of engineering
     and provisioning preservation infrastructure
 Beneficial to Cloud Service Provider
   – Realizes economies of scale by sharing unused
     computing resources
Ingest Provisioning Challenges
 Unpredictability due to business policies
      – Uneven flow of transfer volume

      – Varying object sizes, and hence varying object counts

      – Various object types

 Cloud Computing benefits:

      – Computation resources
             File format identification and Application of Integrity Seal

      – Storage resources: Ingest processing Buffer Space


Access Provisioning Challenges
 Unpredictability of publishing
     – Volume of publishable data sets

 Spikiness of Access request load

 Access types: Storage Delivery Networks vs Content
    Delivery Networks.

 Cloud Computing benefits
     – Computation: access-time visualization, zooming, conversion to
         access format

     – Storage: High-efficiency Access disk cache

Preservation Provisioning Challenges
 Prominent preservation methods:

     Bit-level: error detection and correction capabilities

     Transformation
           Computing resources for transformation processes

           Storage serves as a scratchpad for transformation.

     Emulation: virtual machine requirements.

 Cloud Computing benefits

    – Computation: Execution of Preservation Algorithms

    – Storage: Preservation Processing Buffer Space
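
The bit-level method above (error detection and correction) can be sketched as a fixity check over replicas. This is a minimal illustration, not the LDPaaS implementation: the use of SHA-256, the replica list, and the function names `audit_replicas` and `repair_replicas` are all assumptions made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: list, recorded_digest: str) -> list:
    """Error detection: True where a replica still matches the recorded digest."""
    return [sha256_hex(r) == recorded_digest for r in replicas]


def repair_replicas(replicas: list, recorded_digest: str) -> list:
    """Error correction: overwrite every copy with a verified-good copy."""
    good = next((r for r in replicas if sha256_hex(r) == recorded_digest), None)
    if good is None:
        raise ValueError("no intact replica left; object cannot be repaired")
    return [good for _ in replicas]


# Hypothetical object stored as three replicas, one of them damaged.
original = b"archival record"
digest = sha256_hex(original)          # recorded at ingest time
copies = [original, b"archival recXrd", original]

status = audit_replicas(copies, digest)
repaired = repair_replicas(copies, digest)
print(status)  # [True, False, True]
```

Detection needs only compute cycles, while correction needs at least one intact copy, which is where the storage side of the provisioning problem comes in.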
Storage Provisioning Challenges


 It is all about Storage capacity

      Scale of Storage Requirement   May be Best Suited to Function as

     Hyper Large-Scale               Cloud Provider

     Moderate-to-Small-Scale         Cloud Consumer




   Could there be a Community Cloud?


Software Paradigms




 [Figure: software paradigms shown: Structural, Object-oriented, SOA, Virtualization, Cloud]




System Architecture




SOA-based Ingest Process
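 • Ingest Process implemented as composite service
 • Could be implemented by BPEL.

 [Figure: Ingest composite service, top to bottom: Virus Scan, File Format
 Identification, Metadata Extraction, Integrity Seal, Move to
 Preservation Storage. Tools shown beside the identification and
 extraction steps: DROID, JHOVE.]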
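
A composite ingest service of this kind can be sketched as a chain of small step functions. The sketch below is only an illustration in Python rather than BPEL. The step bodies are stand-ins: the signature table is a toy substitute for a tool such as DROID, the virus scan is a placeholder check, and every name here (`IngestItem`, `compose`, `make_store_step`) is hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IngestItem:
    name: str
    payload: bytes
    metadata: dict = field(default_factory=dict)


Step = Callable[[IngestItem], IngestItem]


def virus_scan(item: IngestItem) -> IngestItem:
    # Placeholder check only; a real service would call a scanner.
    if b"EICAR" in item.payload:
        raise ValueError(f"{item.name}: rejected by virus scan")
    return item


def identify_format(item: IngestItem) -> IngestItem:
    # Toy magic-number table standing in for a format identification tool.
    signatures = {b"%PDF": "application/pdf", b"\x89PNG": "image/png"}
    item.metadata["format"] = next(
        (mime for magic, mime in signatures.items()
         if item.payload.startswith(magic)),
        "application/octet-stream",
    )
    return item


def extract_metadata(item: IngestItem) -> IngestItem:
    item.metadata["size_bytes"] = len(item.payload)
    return item


def integrity_seal(item: IngestItem) -> IngestItem:
    # Assumption: the seal is modeled as a SHA-256 digest of the payload.
    item.metadata["sha256"] = hashlib.sha256(item.payload).hexdigest()
    return item


def make_store_step(storage: dict) -> Step:
    def move_to_preservation_storage(item: IngestItem) -> IngestItem:
        storage[item.name] = item
        return item
    return move_to_preservation_storage


def compose(*steps: Step) -> Step:
    """Build one composite service that runs the steps in order."""
    def run(item: IngestItem) -> IngestItem:
        for step in steps:
            item = step(item)
        return item
    return run


preservation_storage = {}
ingest = compose(virus_scan, identify_format, extract_metadata,
                 integrity_seal, make_store_step(preservation_storage))
result = ingest(IngestItem("report.pdf", b"%PDF-1.4 example"))
print(result.metadata["format"])  # application/pdf
```

Because each step has the same signature, a lower ingest level could be offered by composing fewer steps, for example leaving out `extract_metadata`.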



LDPaaS Levels of Service
 Service                                            Levels
 Ingest           IL1: Transfer Only
                  IL2: With Format Identification
                  IL3: Metadata Extraction

 Preservation     PL1: Bit
                  PL2: Content
                  PL3: Content, Behavior & Formatting

 Discovery        DL1: Metadata search
                  DL2: Full content search

 Access           AL1: Passive Viewer
                  AL2: Interactive Viewer
                  AL3: Content Mining

 Storage          SL1: Delayed Access - Near-Line Storage
                  SL2: Rapid Access - High Performance Storage

 Content Server   CL1: Just-in-Time Active
                  CL2: Always Active

Level of Service Definitions
  Definition 1.
  Each Content Server has a set of LoS formalized by the following 6-
  tuple:

  C = (CL, IL, PL, DL, AL, SL).


  Definition 2.
  Since a customer can have one or more Content Servers, a customer’s
  SLA is specified by the n-tuple:

  L = (C1, …, Cn), if the customer has signed up for n Content Servers,
  with each Ci being a 6-tuple defined according to Definition 1.
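
Definitions 1 and 2 map naturally onto typed tuples. The following is a minimal sketch: the level names and descriptions come from the Levels of Service table, while the class names, the `content_server` helper, and the choice of Python enums are assumptions for illustration. The two sample tuples are the C1 and C2 of the research-publications example in this deck.

```python
from enum import Enum
from typing import NamedTuple, Tuple


class CL(Enum):
    CL1 = "Just-in-Time Active"
    CL2 = "Always Active"


class IL(Enum):
    IL1 = "Transfer Only"
    IL2 = "With Format Identification"
    IL3 = "Metadata Extraction"


class PL(Enum):
    PL1 = "Bit"
    PL2 = "Content"
    PL3 = "Content, Behavior & Formatting"


class DL(Enum):
    DL1 = "Metadata search"
    DL2 = "Full content search"


class AL(Enum):
    AL1 = "Passive Viewer"
    AL2 = "Interactive Viewer"
    AL3 = "Content Mining"


class SL(Enum):
    SL1 = "Delayed Access - Near-Line Storage"
    SL2 = "Rapid Access - High Performance Storage"


class ContentServer(NamedTuple):
    """Definition 1: C = (CL, IL, PL, DL, AL, SL)."""
    cl: CL
    il: IL
    pl: PL
    dl: DL
    al: AL
    sl: SL


def content_server(cl: str, il: str, pl: str,
                   dl: str, al: str, sl: str) -> ContentServer:
    # Lookup by name raises KeyError for a level that is not offered.
    return ContentServer(CL[cl], IL[il], PL[pl], DL[dl], AL[al], SL[sl])


# Definition 2: an SLA is the tuple of a customer's Content Servers.
c1 = content_server("CL1", "IL2", "PL3", "DL1", "AL1", "SL2")
c2 = content_server("CL2", "IL2", "PL3", "DL2", "AL1", "SL1")
sla: Tuple[ContentServer, ...] = (c1, c2)
print(len(sla), sla[1].sl.value)
```

Using enums means an undefined level such as "CL3" is rejected when the tuple is built rather than later at billing or provisioning time.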

LoS - Example 1
                              Digital Library Repository
  Define Content Server C1 by C1 = (CL1, IL2, PL2, DL1, AL2, SL2)
   Content Server: CL1 - Active Just-in-Time - this repository is sporadically used
   Ingest Service: IL2 - File Format Identification
   Preservation Service: PL2 - Preservation at the Content Level
   Discovery Service: DL1 - Metadata Search
   Access Service: AL2 - Interactive Viewer is provided for access
   Storage Service: SL2 - Rapid Access, High Performance Disk - the volume is static


LoS - Example 2
         Digital Library Repository for Research Publications
Two Sets of Records Stored in Two Different Content Servers: C1 and C2
C1 - Relatively Small Volume of High-Demand Digital Assets
                      C1 = (CL1, IL2, PL3, DL1, AL1, SL2)
CL1 - Active Just-in-Time Content Server
IL2 - File Format Identification
PL3 - Preservation at the Content and Formatting Level
DL1 - Metadata Search
AL1 - Passive Viewer
SL2 - High Performance, Rapid Access Storage

C2 - Backend Repository, Volume Increasing with Time
                  C2 = (CL2, IL2, PL3, DL2, AL1, SL1)
CL2 - Always Active Content Server
IL2 - File Format Identification
PL3 - Preservation at the Content and Formatting Level
DL2 - Full Content Search
AL1 - Passive Viewer
SL1 - Delayed Access Storage

LoS - Example 3
           Sarbanes-Oxley Act Compliance Business Archive
  Retain and Preserve Records in a Sliding Time Window of Seven Years

                      C1 = (CL1, IL2, PL1, DL2, AL1, SL1)

  PL1 - Preservation Service at the Bit Level
  Retention Period of Seven Years – Elaborate Preservation not Needed
  SL1 - Delayed Access Storage
  Archive Intended for Audit Purposes Only - Rapid Access to Data not Essential
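
The sliding seven-year window can be sketched as a simple date test. This is an assumption-laden illustration: the slide does not say how the window boundary is computed, so the calendar-year arithmetic, the February 29 handling, and the function names below are all hypothetical.

```python
from datetime import date

RETENTION_YEARS = 7  # from the example; the boundary rule below is assumed


def window_start(today: date, years: int = RETENTION_YEARS) -> date:
    """Oldest creation date still inside the retention window."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - years, day=28)


def in_retention_window(created: date, today: date) -> bool:
    """True if a record created on `created` must still be retained."""
    return window_start(today) <= created <= today


today = date(2011, 10, 31)
print(in_retention_window(date(2004, 10, 31), today))  # True
print(in_retention_window(date(2004, 10, 30), today))  # False
```

Records that fall out of the window become candidates for disposal, which is why bit-level preservation (PL1) is enough for this archive.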




Cost Model
 Cost is one of the crucial elements in Cloud Computing
 Let O = (V, N) be a body of N digital objects with total
  volume V
 Cost (O, Service) depends on the level of service.
   – Function of V or N or both.
 Examples:
      fIL1 - Utilization cost for digital object transfer; varies with V
      fIL2 - File type identification; varies with N
      fIL3 - Metadata extraction; varies with N

 $\mathrm{TotalCost}(O, C) = \sum_{\mathrm{Service}} \mathrm{Cost}(O, \mathrm{Service})$,
  where Service ∈ {Ingest, Preservation, Discovery, Access, Storage}

Cost Model Example
 Let C1 = (CL2, IL2, PL1, DL1, AL1, SL1). Assume :
    fCL2 (V,N) = 20V + 100 N;
     fIL2 (N) = 10 N;
    fPL1 (V) = 20 V;
    fDL1 (N) = 30 N;
    fAL1 (V) = 30 V;
    fSL1 (V) = 40 V.
 For Set O1 of Objects with V1 = 10 GB and N1 = 10^6
  totalCost(O1,C1)       = 140,001,100
 For Set O2 of Objects with V2 = 10^3 GB and N2 = 10^2
  totalCost(O2,C1)       = 124,000

 Note : totalCost(O2,C1) < totalCost(O1,C1) , although V2 > V1
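
The example can be checked with a short script. Summing the six assumed functions exactly as listed gives 110·V + 140·N. The dictionary layout and the name `total_cost` are illustrative choices, and the coefficients are the notional ones from this slide, not real prices.

```python
# Per-level cost functions, copied from the assumptions on this slide.
# Each takes (V, N): total volume in GB and number of objects.
COST_FUNCTIONS = {
    "CL2": lambda v, n: 20 * v + 100 * n,
    "IL2": lambda v, n: 10 * n,
    "PL1": lambda v, n: 20 * v,
    "DL1": lambda v, n: 30 * n,
    "AL1": lambda v, n: 30 * v,
    "SL1": lambda v, n: 40 * v,
}


def total_cost(volume_gb, object_count, levels):
    """Sum the cost of every level of service in a Content Server tuple."""
    return sum(COST_FUNCTIONS[level](volume_gb, object_count)
               for level in levels)


C1 = ("CL2", "IL2", "PL1", "DL1", "AL1", "SL1")

cost_o1 = total_cost(10, 10**6, C1)      # small volume, many objects
cost_o2 = total_cost(10**3, 10**2, C1)   # large volume, few objects

print(cost_o1)  # 140001100
print(cost_o2)  # 124000
```

The object count dominates here because three of the six functions charge per object, so a set with a hundred times the volume but far fewer objects still costs less.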

Related Work
 CiteSeer study by Teregowda [2]:
     – Examine each service in the architecture stack in terms of feasibility
       and cost of migrating and hosting in the Cloud.
     – Possible integration with Cloud Storage thanks to current virtualized
       storage component.
 DuraCloud [5]:
     – Open source platform for digital libraries and archives
     – Adapters to commercially available Cloud Storage services
 Strategies and SLAs for bit-level preservation by Zierau [6]:
     – Various sub-levels of bit-preservation.
 www.cloudpreservation.com: archives and indexes data
  from websites and social networks.
 www.ltdprm.org/ - Long-Term Digital Retention and
  Preservation Reference Model: cloud-based digital archive.
Conclusion
 Proposed LDPaaS concept: why is it useful?
   – Beneficial to large organizations
   – Beneficial to small organizations
 Notional cost model useful for establishing a price
  model associated with published SLA set.
 Contend that Cloud Storage Service vendors can
  augment their portfolios to provide LDPaaS.
 Community Cloud for Preservation
     – Environment for more collaboration and sharing

References
 1. Michael Armbrust et al. “A View of Cloud Computing”. Communications of the ACM,
    Volume 53, No 4, April 2010.
 2. P. Teregowda, B. Urgaonkar, and C. L. Giles. “Cloud Computing: A Digital
    Libraries Perspective”. 2010 IEEE 3rd International Conference on Cloud Computing,
    Miami, FL, July 2010.
 3. Stephen Abrams, Patricia Cruse, and John Kunze. “Preservation Is Not a Place”.
    The International Journal of Digital Curation, Issue 1, Volume 4, 2009.
 4. Steve Hitchcock, David Tarrant, Adrian Brown, Ben O’Steen, Neil Jefferies, and
    Leslie Carr. “Towards Smart Storage for Repository Preservation Services”. The
    International Journal of Digital Curation, Issue 1, Volume 5, 2010.
 5. DuraCloud. Available: http://www.duraspace.org/duracloud.php.
 6. Eld Zierau, Ulla Bogvad Kejser, and Hannes Kulovits. “Evaluation of Bit Preservation
    Strategies”. 7th International Conference on Preservation of Digital Objects
    (iPRES2010), Sep. 19-24, 2010, Vienna, Austria.




Disclaimer
 The content of this presentation is the personal opinion of
  the author and does not necessarily reflect any position of
  the U.S. Government or the National Archives and Records
  Administration.




Thank You!

                   Any questions?

                mailto:quyen.nguyen@nara.gov




More Related Content

Similar to Digital Preservation Cloud Services for Libraries and Archives

Cncf storage-final-filip
Cncf storage-final-filipCncf storage-final-filip
Cncf storage-final-filipJuraj Hantak
 
Z39.50: Information Retrieval protocol ppt
Z39.50: Information Retrieval protocol pptZ39.50: Information Retrieval protocol ppt
Z39.50: Information Retrieval protocol pptSUNILKUMARSINGH
 
Red Hat® Ceph Storage and Network Solutions for Software Defined Infrastructure
Red Hat® Ceph Storage and Network Solutions for Software Defined InfrastructureRed Hat® Ceph Storage and Network Solutions for Software Defined Infrastructure
Red Hat® Ceph Storage and Network Solutions for Software Defined InfrastructureIntel® Software
 
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...KCDItaly
 
Introduction to Kubernetes
Introduction to KubernetesIntroduction to Kubernetes
Introduction to KubernetesSamuel Dratwa
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital LibrariesJack Eapen
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital LibrariesJack Eapen
 
SDN, OpenFlow, NFV, and Virtual Network
SDN, OpenFlow, NFV, and Virtual NetworkSDN, OpenFlow, NFV, and Virtual Network
SDN, OpenFlow, NFV, and Virtual NetworkTim4PreStartup
 
Putting it all together for digital assets
Putting it all together for digital assetsPutting it all together for digital assets
Putting it all together for digital assetsJon Morley
 
M.Sc. Research Proposal
M.Sc. Research ProposalM.Sc. Research Proposal
M.Sc. Research ProposalLighton Phiri
 
LoCloud - D2.1: Core Infrastructure Specifications (including Business Proces...
LoCloud - D2.1: Core Infrastructure Specifications (including Business Proces...LoCloud - D2.1: Core Infrastructure Specifications (including Business Proces...
LoCloud - D2.1: Core Infrastructure Specifications (including Business Proces...locloud
 
Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.Menzo Windhouwer
 
S016825 ibm-cos-nola-v1710d
S016825 ibm-cos-nola-v1710dS016825 ibm-cos-nola-v1710d
S016825 ibm-cos-nola-v1710dTony Pearson
 
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UKThe Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UKAndy Powell
 
Fedora Overview
Fedora OverviewFedora Overview
Fedora Overvieweposthumus
 
IT Press Tour #19 Slides OpenIO June 2016
IT Press Tour #19 Slides OpenIO June 2016IT Press Tour #19 Slides OpenIO June 2016
IT Press Tour #19 Slides OpenIO June 2016OpenIO Object Storage
 
Logging using ELK Stack for Microservices
Logging using ELK Stack for MicroservicesLogging using ELK Stack for Microservices
Logging using ELK Stack for MicroservicesVineet Sabharwal
 

Similar to Digital Preservation Cloud Services for Libraries and Archives (20)

Cncf storage-final-filip
Cncf storage-final-filipCncf storage-final-filip
Cncf storage-final-filip
 
Z39.50: Information Retrieval protocol ppt
Z39.50: Information Retrieval protocol pptZ39.50: Information Retrieval protocol ppt
Z39.50: Information Retrieval protocol ppt
 
Red Hat® Ceph Storage and Network Solutions for Software Defined Infrastructure
Red Hat® Ceph Storage and Network Solutions for Software Defined InfrastructureRed Hat® Ceph Storage and Network Solutions for Software Defined Infrastructure
Red Hat® Ceph Storage and Network Solutions for Software Defined Infrastructure
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
Multi-Clusters Made Easy with Liqo:
Getting Rid of Your Clusters Keeping Them...
 
Introduction to Kubernetes
Introduction to KubernetesIntroduction to Kubernetes
Introduction to Kubernetes
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital Libraries
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital Libraries
 
SDN, OpenFlow, NFV, and Virtual Network
SDN, OpenFlow, NFV, and Virtual NetworkSDN, OpenFlow, NFV, and Virtual Network
SDN, OpenFlow, NFV, and Virtual Network
 
Putting it all together for digital assets
Putting it all together for digital assetsPutting it all together for digital assets
Putting it all together for digital assets
 
M.Sc. Research Proposal
M.Sc. Research ProposalM.Sc. Research Proposal
M.Sc. Research Proposal
 
LoCloud - D2.1: Core Infrastructure Specifications (including Business Proces...
LoCloud - D2.1: Core Infrastructure Specifications (including Business Proces...LoCloud - D2.1: Core Infrastructure Specifications (including Business Proces...
LoCloud - D2.1: Core Infrastructure Specifications (including Business Proces...
 
Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.
 
S016825 ibm-cos-nola-v1710d
S016825 ibm-cos-nola-v1710dS016825 ibm-cos-nola-v1710d
S016825 ibm-cos-nola-v1710d
 
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UKThe Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
The Open Archives Initiative Protocol for Metadata Harvesting and ePrints UK
 
Fedora Overview
Fedora OverviewFedora Overview
Fedora Overview
 
IT Press Tour #19 Slides OpenIO June 2016
IT Press Tour #19 Slides OpenIO June 2016IT Press Tour #19 Slides OpenIO June 2016
IT Press Tour #19 Slides OpenIO June 2016
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Free Webinar: LOD2 Stack - 1st release
Free Webinar: LOD2 Stack - 1st releaseFree Webinar: LOD2 Stack - 1st release
Free Webinar: LOD2 Stack - 1st release
 
Logging using ELK Stack for Microservices
Logging using ELK Stack for MicroservicesLogging using ELK Stack for Microservices
Logging using ELK Stack for Microservices
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Digital Preservation Cloud Services for Libraries and Archives

  • 1. DLF 2011 Baltimore, MD Digital Preservation Cloud Services for Libraries and Archives Quyen L. Nguyen – NARA
  • 2. Outline  Introduction  LDPaaS  Levels of Service and Cost Model  Related Work  Conclusion Oct. 31, 2011 2011 DLF Forum 2
  • 3. Functional Requirements  Need for Long-Term Digital Preservation – Policy mandates: retention of governments’ records – Knowledge function: preserve digitized books and digital born materials – History-oriented mandates: preservation of cultural heritage  Challenges – Rapid growth of digital objects that require archiving. – Data heterogeneity Oct. 31, 2011 2011 DLF Forum 3
  • 4. Desired System Characteristics  Dynamic Scalability – Increase as well as decrease  Cost-effective Maintainability – Operation cost – Patches: COTS, security.  Evolvability – Technology refresh – New features and services Oct. 31, 2011 2011 DLF Forum 4
  • 5. Cloud Computing Characteristics  Elasticity – Computing and storage resources – Three levels of cloud services: IaaS, PaaS, and SaaS. – Quick Provisioning (e.g. Cloud Market [3]) – Pay-as-you-go  Cost-efficient Maintenance – Economies of scale – Maximizing utilization of computing resources  Evolvability by configuration Oct. 31, 2011 2011 DLF Forum 5
  • 6. OAIS Reference Model Oct. 31, 2011 2011 DLF Forum 6
  • 7. LDPaaS  Long-term Digital Preservation as a Cloud Service – Encompass major OAIS functionalities – Not only storage service, – But also preservation service according to customer’s policies: retention period, preservation level, and access level.  Beneficial to Cloud Service Consumer – Relieve records owners from the burden of engineering and provisioning preservation infrastructure  Beneficial to Cloud Service Provider – Realize economies of scales by sharing unused computing resources Oct. 31, 2011 2011 DLF Forum 7
  • 8. Ingest Provisioning Challenges  Unpredictability due to business policies – Uneven flow of transfer volume – Various object sizes, hence object numbers – Various object types  Cloud Computing benefits: – Computation resources  File format identification and Application of Integrity Seal – Storage resources: Ingest processing Buffer Space Oct. 31, 2011 2011 DLF Forum 8
  • 9. Access Provisioning Challenges
      ▪ Unpredictability of publishing
          – Volume of publishable data sets
      ▪ Spikiness of access request load
      ▪ Access types: Storage Delivery Networks vs Content Delivery Networks
      ▪ Cloud Computing benefits
          – Computation: access-time visualization, zooming, conversion to access format
          – Storage: high-efficiency access disk cache
  • 10. Preservation Provisioning Challenges
      ▪ Prominent preservation methods:
          – Bit-level: error detection and correction capabilities
          – Transformation: computing resources for transformation processes; storage serving as a scratchpad for transformation
          – Emulation: virtual machine requirements
      ▪ Cloud Computing benefits
          – Computation: execution of preservation algorithms
          – Storage: preservation processing buffer space
  • 11. Storage Provisioning Challenges
      ▪ It is all about storage capacity
          Scale of Storage Requirement | May be Best Suited to Function as
          Hyper Large-Scale | Cloud Provider
          Moderate-to-Small-Scale | Cloud Consumer
      ▪ Could there be a Community Cloud?
  • 12. Software Paradigms
      [Figure: software paradigms shown as Structural, Object-oriented, SOA, and Cloud, with Virtualization]
  • 13. System Architecture
  • 14. SOA-based Ingest Process
      ▪ Ingest Process implemented as a composite service
      ▪ Could be implemented by BPEL
      ▪ Ingest steps, in order:
          – Virus Scan
          – File Format Identification (DROID)
          – Metadata Extraction (JHOVE)
          – Integrity Seal
          – Move to Preservation Storage
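The composite ingest service can be pictured as a chain of independent steps. The Python sketch below is illustrative only and is not taken from the slides: it does not call DROID, JHOVE, or a BPEL engine, every step function is a hypothetical stand-in, and SHA-256 is assumed as the integrity seal, which the slides do not specify.

```python
import hashlib
from typing import Any, Callable, Dict, List

# Each step takes and returns a record describing the object being ingested.
Record = Dict[str, Any]
Step = Callable[[Record], Record]


def virus_scan(rec: Record) -> Record:
    # Placeholder: a real service would invoke a virus scanner here.
    return {**rec, "virus_free": True}


def identify_format(rec: Record) -> Record:
    # Placeholder for a DROID-style lookup: guesses from the file extension only.
    name = str(rec["name"])
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else "unknown"
    return {**rec, "format": ext}


def extract_metadata(rec: Record) -> Record:
    # Placeholder for JHOVE-style extraction: records only the size in bytes.
    return {**rec, "size": len(rec["data"])}


def integrity_seal(rec: Record) -> Record:
    # Assumption: the seal is a SHA-256 digest of the content.
    return {**rec, "seal": hashlib.sha256(rec["data"]).hexdigest()}


def move_to_preservation_storage(rec: Record) -> Record:
    # Placeholder: marks the record as stored instead of copying it anywhere.
    return {**rec, "stored": True}


INGEST_STEPS: List[Step] = [
    virus_scan,
    identify_format,
    extract_metadata,
    integrity_seal,
    move_to_preservation_storage,
]


def ingest(rec: Record, steps: List[Step] = INGEST_STEPS) -> Record:
    """Run the steps in order, as a composite service would."""
    for step in steps:
        rec = step(rec)
    return rec


result = ingest({"name": "report.pdf", "data": b"example bytes"})
print(result["format"], result["size"], result["stored"])
```

Passing a shorter `steps` list is one way such a sketch could model a lower ingest level, for example a transfer-only ingest that skips format identification and metadata extraction.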
  • 15. LDPaaS Levels of Service
      Service | Levels
      Ingest | IL1: Transfer Only; IL2: With Format Identification; IL3: Metadata Extraction
      Preservation | PL1: Bit; PL2: Content; PL3: Content, Behavior & Formatting
      Discovery | DL1: Metadata search; DL2: Full content search
      Access | AL1: Passive Viewer; AL2: Interactive Viewer; AL3: Content Mining
      Storage | SL1: Delayed Access - Near-Line Storage; SL2: Rapid Access - High Performance Storage
      Content Server | CL1: Just-in-Time Active; CL2: Always Active
  • 16. Level of Service Definitions
      Definition 1. Each Content Server has a set of LoS formalized by the following 6-tuple: C = (CL, IL, PL, DL, AL, SL).
      Definition 2. Since a customer can have one or more Content Servers, a customer’s SLA is specified by the n-tuple L = (C1, …, Cn) if the customer has signed up for n Content Servers, with each Ci being a 6-tuple defined according to Definition 1.
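Definitions 1 and 2 map naturally onto a small data structure. The following Python sketch is one possible encoding, not part of the original slides: the class names `ContentServerLoS` and `SLA` are illustrative assumptions, and the valid ranges are taken from the levels listed on the Levels of Service slide.

```python
from dataclasses import dataclass
from typing import Tuple

# Highest level defined for each service on the Levels of Service slide.
MAX_LEVEL = {"CL": 2, "IL": 3, "PL": 3, "DL": 2, "AL": 3, "SL": 2}


@dataclass(frozen=True)
class ContentServerLoS:
    """Definition 1: C = (CL, IL, PL, DL, AL, SL), each stored as an integer level."""
    CL: int
    IL: int
    PL: int
    DL: int
    AL: int
    SL: int

    def __post_init__(self):
        # Reject levels that the service catalogue does not define.
        for name, top in MAX_LEVEL.items():
            level = getattr(self, name)
            if not 1 <= level <= top:
                raise ValueError(f"{name}{level} is not a defined level (1..{top})")

    def labels(self) -> Tuple[str, ...]:
        """Return the tuple in slide notation, e.g. ('CL1', 'IL2', ...)."""
        return tuple(f"{name}{getattr(self, name)}" for name in MAX_LEVEL)


@dataclass(frozen=True)
class SLA:
    """Definition 2: L = (C1, ..., Cn) for a customer with n Content Servers."""
    servers: Tuple[ContentServerLoS, ...]


# A Content Server with C1 = (CL1, IL2, PL2, DL1, AL2, SL2)
c1 = ContentServerLoS(CL=1, IL=2, PL=2, DL=1, AL=2, SL=2)
sla = SLA(servers=(c1,))
print(c1.labels())
```

Keeping the levels as integers makes it straightforward to validate an SLA at sign-up time and to look up per-level pricing later.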
  • 17. LoS - Example 1
      Digital Library Repository
      Define Content Server C1 by C1 = (CL1, IL2, PL2, DL1, AL2, SL2)
      ▪ Content Server CL1 - Active Just-in-Time - this repository is sporadically used
      ▪ Ingest Service IL2 - File Format Identification
      ▪ Preservation Service PL2 - Preservation at the Content Level
      ▪ Discovery Service DL1 - Metadata Search
      ▪ Access Service AL2 - Interactive Viewer is provided for access
      ▪ Storage Service SL2 - Rapid Access, High Performance Disk - the volume is static
  • 18. LoS - Example 2
      Digital Library Repository for Research Publications
      Two Sets of Records Stored in Two Different Content Servers: C1 and C2
      C1 - Relatively Small Volume of High-Demand Digital Assets
      C1 = (CL1, IL2, PL3, DL1, AL1, SL2)
          – CL1 - Active Just-in-Time Content Server
          – IL2 - File Format Identification
          – PL3 - Preservation at the Content and Formatting Level
          – DL1 - Metadata Search
          – AL1 - Passive Viewer
          – SL2 - High Performance, Rapid Access Storage
      C2 - Backend Repository, Volume Increasing with Time
      C2 = (CL2, IL2, PL3, DL2, AL1, SL1)
          – CL2 - Always Active Content Server
          – IL2 - File Format Identification
          – PL3 - Preservation at the Content and Formatting Level
          – DL2 - Full Content Search
          – AL1 - Passive Viewer
          – SL1 - Delayed Access Storage
  • 19. LoS - Example 3
      Sarbanes-Oxley Act Compliance Business Archive
      Retain and Preserve Records in a Sliding Time Window of Seven Years
      C1 = (CL1, IL2, PL1, DL2, AL1, SL1)
          – PL1 - Preservation Service at the Bit Level: with a retention period of seven years, elaborate preservation is not needed
          – SL1 - Delayed Access Storage: the archive is intended for audit purposes only, so rapid access to data is not essential
  • 20. Cost Model
      ▪ Cost is one of the crucial elements in Cloud Computing
      ▪ Let O = (V, N) be the body of N digital objects with total volume V
      ▪ Cost(O, Service) depends on the level of service
          – Function of V or N or both
      ▪ Examples:
          – f_IL1 (utilization cost for digital object transfer) varies with V
          – f_IL2 (file type identification) and f_IL3 (metadata extraction) vary with N
      ▪ Total Cost(O, C) = Σ Cost(O, Service), summed over Service ∈ {Ingest, Preservation, Discovery, Access, Storage}
  • 21. Cost Model Example
      Let C1 = (CL2, IL2, PL1, DL1, AL1, SL1).
      Assume: f_CL2(V, N) = 20 V + 100 N; f_IL2(N) = 10 N; f_PL1(V) = 20 V; f_DL1(N) = 30 N; f_AL1(V) = 30 V; f_SL1(V) = 40 V.
      Summing these functions gives totalCost(O, C1) = 110 V + 140 N.
      For set O1 of objects with V1 = 10 GB and N1 = 10^6: totalCost(O1, C1) = 140,001,100
      For set O2 of objects with V2 = 10^3 GB and N2 = 10^2: totalCost(O2, C1) = 124,000
      Note: totalCost(O2, C1) < totalCost(O1, C1), although V2 > V1
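The cost model example can be checked with a few lines of Python. This sketch evaluates the per-service functions exactly as assumed on the slide, taking N1 = 10^6, V2 = 10^3 GB, and N2 = 10^2; the names `COST_FUNCTIONS` and `total_cost` are illustrative and do not come from the original. With these functions the total reduces to 110 V + 140 N.

```python
# Per-service cost functions from the example; V is volume in GB, N is object count.
COST_FUNCTIONS = {
    "CL2": lambda V, N: 20 * V + 100 * N,
    "IL2": lambda V, N: 10 * N,
    "PL1": lambda V, N: 20 * V,
    "DL1": lambda V, N: 30 * N,
    "AL1": lambda V, N: 30 * V,
    "SL1": lambda V, N: 40 * V,
}


def total_cost(V, N, levels):
    """Sum the cost of every service level in the Content Server tuple."""
    return sum(COST_FUNCTIONS[level](V, N) for level in levels)


C1 = ("CL2", "IL2", "PL1", "DL1", "AL1", "SL1")
cost_o1 = total_cost(10, 10**6, C1)      # O1: small volume, many objects
cost_o2 = total_cost(10**3, 10**2, C1)   # O2: large volume, few objects
print(cost_o1, cost_o2, cost_o2 < cost_o1)
```

The comparison illustrates the point of the example: because several services are priced per object, a collection of many small objects can cost more than a much larger collection of few objects.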
  • 22. Related Work
      ▪ CiteSeer study by Teregowda [2]:
          – Examines each service in the architecture stack in terms of feasibility and cost of migrating to and hosting in the Cloud
          – Possible integration with Cloud Storage thanks to the current virtualized storage component
      ▪ DuraCloud [5]:
          – Open source platform for digital libraries and archives
          – Adapters to commercially available Cloud Storage services
      ▪ Strategies and SLAs for bit-level preservation by Zierau [6]:
          – Various sub-levels of bit preservation
      ▪ www.cloudpreservation.com: archives and indexes data from websites and social networks
      ▪ www.ltdprm.org/ - Long-Term Digital Retention and Preservation Reference Model: cloud-based digital archive
  • 23. Conclusion
      ▪ Proposed LDPaaS concept: why is it useful?
          – Beneficial to large organizations
          – Beneficial to small organizations
      ▪ Notional cost model useful for establishing a price model associated with a published SLA set
      ▪ Contend that Cloud Storage Service vendors can augment their portfolios to provide LDPaaS
      ▪ Community Cloud for Preservation
          – Environment for more collaboration and sharing
  • 24. References
      1. Michael Armbrust et al. “A View of Cloud Computing”. Communications of the ACM, Volume 53, No. 4, April 2010.
      2. P. Teregowda, B. Urgaonkar, and C. L. Giles. “Cloud Computing: A Digital Libraries Perspective”. 2010 IEEE 3rd International Conference on Cloud Computing, Miami, FL, July 2010.
      3. Stephen Abrams, Patricia Cruse, and John Kunze. “Preservation Is Not a Place”. The International Journal of Digital Curation, Issue 1, Volume 4, 2009.
      4. Steve Hitchcock, David Tarrant, Adrian Brown, Ben O’Steen, Neil Jefferies, and Leslie Carr. “Towards Smart Storage for Repository Preservation Services”. The International Journal of Digital Curation, Issue 1, Volume 5, 2010.
      5. DuraCloud. Available: http://www.duraspace.org/duracloud.php.
      6. Eld Zierau, Ulla Bogvad Kejser, and Hannes Kulovits. “Evaluation of Bit Preservation Strategies”. 7th International Conference on Preservation of Digital Objects (iPRES2010), Sep. 19-24, 2010, Vienna, Austria.
  • 25. Disclaimer
      ▪ The content of this presentation is the personal opinion of the author and does not necessarily reflect any position of the U.S. Government or the National Archives and Records Administration.
  • 26. Thank You! Any questions? quyen.nguyen@nara.gov