Digital Preservation Cloud Services for Libraries and Archives

Presentation Transcript

  • Digital Preservation Cloud Services for Libraries and Archives
    Quyen L. Nguyen – NARA
    DLF 2011, Baltimore, MD
  • Outline
    – Introduction
    – LDPaaS
    – Levels of Service and Cost Model
    – Related Work
    – Conclusion
    Oct. 31, 2011, 2011 DLF Forum
  • Functional Requirements
    Need for Long-Term Digital Preservation
    – Policy mandates: retention of governments’ records
    – Knowledge function: preserve digitized books and born-digital materials
    – History-oriented mandates: preservation of cultural heritage
    Challenges
    – Rapid growth in the number of digital objects that require archiving
    – Data heterogeneity
  • Desired System Characteristics
    Dynamic Scalability
    – Increase as well as decrease
    Cost-effective Maintainability
    – Operation cost
    – Patches: COTS, security
    Evolvability
    – Technology refresh
    – New features and services
  • Cloud Computing Characteristics
    Elasticity
    – Computing and storage resources
    – Three levels of cloud services: IaaS, PaaS, and SaaS
    – Quick provisioning (e.g. Cloud Market [3])
    – Pay-as-you-go
    Cost-efficient Maintenance
    – Economies of scale
    – Maximizing utilization of computing resources
    Evolvability by configuration
  • OAIS Reference Model
  • LDPaaS Long-term Digital Preservation as a Cloud Service – Encompass major OAIS functionalities – Not only storage service, – But also preservation service according to customer’s policies: retention period, preservation level, and access level. Beneficial to Cloud Service Consumer – Relieve records owners from the burden of engineering and provisioning preservation infrastructure Beneficial to Cloud Service Provider – Realize economies of scales by sharing unused computing resourcesOct. 31, 2011 2011 DLF Forum 7
  • Ingest Provisioning Challenges
    Unpredictability due to business policies
    – Uneven flow of transfer volume
    – Various object sizes, hence object numbers
    – Various object types
    Cloud Computing benefits
    – Computation resources: file format identification and application of integrity seal
    – Storage resources: ingest processing buffer space
  • Access Provisioning Challenges
    Unpredictability of publishing
    – Volume of publishable data sets
    Spikiness of access request load
    Access types: Storage Delivery Networks vs Content Delivery Networks
    Cloud Computing benefits
    – Computation: access-time visualization, zooming, conversion to access format
    – Storage: high-efficiency access disk cache
  • Preservation Provisioning Challenges
    Prominent preservation methods
    – Bit-level: error detection and correction capabilities
    – Transformation
        – Computing resources for transformation processes
        – Storage serving as a scratchpad for transformation
    – Emulation: virtual machine requirements
    Cloud Computing benefits
    – Computation: execution of preservation algorithms
    – Storage: preservation processing buffer space
  • Storage Provisioning Challenges
    It is all about storage capacity.
    Scale of Storage Requirement | May Be Best Suited to Function as
    Hyper Large-Scale            | Cloud Provider
    Moderate-to-Small-Scale      | Cloud Consumer
    Could there be a Community Cloud?
  • Software Paradigms
    Diagram labels: Structural, Object-oriented, SOA, Cloud; Virtualization
  • System Architecture
  • SOA-based Ingest Process
    – Ingest process implemented as a composite service
    – Could be implemented by BPEL
    Ingest steps (diagram): Virus Scan; File Format Identification (DROID); Metadata Extraction (JHOVE); Integrity Seal; Move to Preservation Storage
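    The composite-service idea on this slide can be illustrated with a minimal Python sketch, which is not the presenter's implementation. Every function name, the dictionary-based package, and the magic-number table are hypothetical stand-ins: a real deployment would call out to services such as an antivirus scanner, DROID, and JHOVE. SHA-256 is likewise an assumption, since the slides do not name a digest algorithm for the integrity seal.

```python
import hashlib
from typing import Any, Callable, Dict, List

Package = Dict[str, Any]             # hypothetical in-memory ingest package
Step = Callable[[Package], Package]  # each step plays the role of one service

# Tiny magic-number table standing in for a real identifier such as DROID.
MAGIC = {b"%PDF": "application/pdf", b"\x89PNG": "image/png"}

def virus_scan(pkg: Package) -> Package:
    # Placeholder only: a real service would scan pkg["content"] here.
    pkg["virus_free"] = True
    return pkg

def identify_format(pkg: Package) -> Package:
    # Match the leading bytes against the known signatures.
    data = pkg["content"]
    pkg["format"] = next(
        (mime for magic, mime in MAGIC.items() if data.startswith(magic)),
        "application/octet-stream",
    )
    return pkg

def extract_metadata(pkg: Package) -> Package:
    # Stand-in for a characterization tool such as JHOVE.
    pkg["metadata"] = {"size_bytes": len(pkg["content"])}
    return pkg

def integrity_seal(pkg: Package) -> Package:
    # Assumption: the seal is a SHA-256 digest of the content.
    pkg["seal"] = hashlib.sha256(pkg["content"]).hexdigest()
    return pkg

def move_to_preservation_storage(pkg: Package) -> Package:
    # Placeholder: records a logical location instead of moving bytes.
    pkg["location"] = "preservation-storage"
    return pkg

def compose(steps: List[Step]) -> Step:
    """Build one composite service that runs the given steps in order."""
    def composite(pkg: Package) -> Package:
        for step in steps:
            pkg = step(pkg)
        return pkg
    return composite

ingest = compose([
    virus_scan,
    identify_format,
    extract_metadata,
    integrity_seal,
    move_to_preservation_storage,
])

result = ingest({"content": b"%PDF-1.7 example"})
print(result["format"], result["location"])
```

    Because the composite is just an ordered list of steps, a lower ingest level (for example, transfer only) could in principle be expressed by composing fewer steps.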
  • LDPaaS Levels of Service
    Service        | Levels
    Ingest         | IL1: Transfer Only; IL2: With Format Identification; IL3: Metadata Extraction
    Preservation   | PL1: Bit; PL2: Content; PL3: Content, Behavior & Formatting
    Discovery      | DL1: Metadata Search; DL2: Full Content Search
    Access         | AL1: Passive Viewer; AL2: Interactive Viewer; AL3: Content Mining
    Storage        | SL1: Delayed Access (Near-Line Storage); SL2: Rapid Access (High Performance Storage)
    Content Server | CL1: Just-in-Time Active; CL2: Always Active
  • Level of Service Definitions
    Definition 1. Each Content Server has a set of LoS formalized by the following 6-tuple: C = (CL, IL, PL, DL, AL, SL).
    Definition 2. Since a customer can have one or more Content Servers, a customer’s SLA is specified by the n-tuple L = (C1, …, Cn) if the customer has signed up for n Content Servers, with each Ci being a 6-tuple defined according to Definition 1.
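    One possible way to encode Definitions 1 and 2 as data types is sketched below in Python. The class and field names are illustrative choices, not part of the presentation; the level labels are taken from the Levels of Service slide, and the sample tuple is the one used in LoS Example 1.

```python
from enum import Enum
from typing import NamedTuple, Tuple

class CL(Enum):
    CL1 = "Just-in-Time Active"
    CL2 = "Always Active"

class IL(Enum):
    IL1 = "Transfer Only"
    IL2 = "With Format Identification"
    IL3 = "Metadata Extraction"

class PL(Enum):
    PL1 = "Bit"
    PL2 = "Content"
    PL3 = "Content, Behavior & Formatting"

class DL(Enum):
    DL1 = "Metadata Search"
    DL2 = "Full Content Search"

class AL(Enum):
    AL1 = "Passive Viewer"
    AL2 = "Interactive Viewer"
    AL3 = "Content Mining"

class SL(Enum):
    SL1 = "Delayed Access"
    SL2 = "Rapid Access"

class ContentServerLoS(NamedTuple):
    """Definition 1: the 6-tuple C = (CL, IL, PL, DL, AL, SL)."""
    cl: CL
    il: IL
    pl: PL
    dl: DL
    al: AL
    sl: SL

# Definition 2: an SLA is an n-tuple of Content Server tuples.
SLA = Tuple[ContentServerLoS, ...]

# The tuple from LoS Example 1: (CL1, IL2, PL2, DL1, AL2, SL2).
c1 = ContentServerLoS(CL.CL1, IL.IL2, PL.PL2, DL.DL1, AL.AL2, SL.SL2)
sla: SLA = (c1,)

print(tuple(level.name for level in c1))
```

    Using a separate enumeration per service keeps each position of the tuple restricted to the levels defined for that service.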
  • LoS - Example 1
    Digital Library Repository
    Define Content Server C1 by C1 = (CL1, IL2, PL2, DL1, AL2, SL2)
    – Content Server CL1 - Active Just-in-Time - this repository is used sporadically
    – Ingest Service IL2 - File Format Identification
    – Preservation Service PL2 - Preservation at the Content Level
    – Discovery Service DL1 - Metadata Search
    – Access Service AL2 - Interactive Viewer is provided for access
    – Storage Service SL2 - Rapid Access, High Performance Disk - the volume is static
  • LoS - Example 2
    Digital Library Repository for Research Publications
    Two sets of records stored in two different Content Servers: C1 and C2
    C1 - Relatively small volume of high-demand digital assets
    C1 = (CL1, IL2, PL3, DL1, AL1, SL2)
    – CL1 - Active Just-in-Time Content Server
    – IL2 - File Format Identification
    – PL3 - Preservation at the Content and Formatting Level
    – DL1 - Metadata Search
    – AL1 - Passive Viewer
    – SL2 - High Performance, Rapid Access Storage
    C2 - Backend repository, volume increasing with time
    C2 = (CL2, IL2, PL3, DL2, AL1, SL1)
    – CL2 - Always Active Content Server
    – IL2 - File Format Identification
    – PL3 - Preservation at the Content and Formatting Level
    – DL2 - Full Content Search
    – AL1 - Passive Viewer
    – SL1 - Delayed Access Storage
  • LoS - Example 3
    Sarbanes-Oxley Act Compliance Business Archive
    Retain and preserve records in a sliding time window of seven years
    C1 = (CL1, IL2, PL1, DL2, AL1, SL1)
    – PL1 - Preservation Service at the Bit Level
        Retention period of seven years; elaborate preservation not needed
    – SL1 - Delayed Access Storage
        Archive intended for audit purposes only; rapid access to data not essential
  • Cost Model Cost is one of the crucial elements in Cloud Computing Let O = (V, N) be the Body of N Digital Objects and total volume V Cost (O, Service) depends on the level of service. – Function of V or N or both. Examples: fIL1 - Utilization Cost for Digital Object Transfer, varies with V fIL2 - File Type Identification Vary with N fIL3 - Metadata Extraction TOTAL COST (O,C) = Cost (O, Service), wherewhere Service = {Ingest, Preservation, Discovery, Access, Storage}Oct. 31, 2011 2011 DLF Forum 20
  • Cost Model Example
    Let C1 = (CL2, IL2, PL1, DL1, AL1, SL1).
    Assume: f_CL2(V, N) = 20 V + 100 N; f_IL2(N) = 10 N; f_PL1(V) = 20 V; f_DL1(N) = 30 N; f_AL1(V) = 30 V; f_SL1(V) = 40 V.
    For set O1 of objects with V1 = 10 GB and N1 = 10^6: totalCost(O1, C1) = 140,000,740
    For set O2 of objects with V2 = 10^3 GB and N2 = 10^2: totalCost(O2, C1) = 88,000
    Note: totalCost(O2, C1) < totalCost(O1, C1), although V2 > V1
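    The arithmetic of this example can be checked with a short Python sketch. The dictionary layout and function names are illustrative assumptions; only the coefficients, volumes, and object counts come from the slide. Summing the six functions exactly as printed gives 140,001,100 for O1 and 124,000 for O2, which differs from the totals stated on the slide. The stated totals (140,000,740 and 88,000) are reproduced if the storage coefficient is 4 rather than 40; whether that was the intended value is only a guess. The slide's conclusion holds in both cases.

```python
from typing import Callable, Dict, Sequence

# (V in GB, N objects) -> cost in unspecified units
CostFn = Callable[[int, int], int]

# Coefficients exactly as printed on the slide.
COST_FUNCTIONS: Dict[str, CostFn] = {
    "CL2": lambda v, n: 20 * v + 100 * n,
    "IL2": lambda v, n: 10 * n,
    "PL1": lambda v, n: 20 * v,
    "DL1": lambda v, n: 30 * n,
    "AL1": lambda v, n: 30 * v,
    "SL1": lambda v, n: 40 * v,
}

def total_cost(volume_gb: int, num_objects: int, levels: Sequence[str],
               cost_functions: Dict[str, CostFn] = COST_FUNCTIONS) -> int:
    """Sum the per-service cost functions selected by the LoS tuple."""
    return sum(cost_functions[level](volume_gb, num_objects) for level in levels)

C1 = ("CL2", "IL2", "PL1", "DL1", "AL1", "SL1")

cost_o1 = total_cost(10, 10**6, C1)      # O1: V1 = 10 GB, N1 = 10^6
cost_o2 = total_cost(10**3, 10**2, C1)   # O2: V2 = 10^3 GB, N2 = 10^2
print(cost_o1, cost_o2)                  # 140001100 124000

# Hypothetical variant: storage coefficient 4 instead of 40.
# This reproduces the totals stated on the slide.
ALT_FUNCTIONS = dict(COST_FUNCTIONS, SL1=lambda v, n: 4 * v)
alt_o1 = total_cost(10, 10**6, C1, ALT_FUNCTIONS)
alt_o2 = total_cost(10**3, 10**2, C1, ALT_FUNCTIONS)
print(alt_o1, alt_o2)                    # 140000740 88000
```

    In both variants the object count N dominates the total, which is why the larger-volume set O2 is cheaper than O1.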
  • Related Work
    CiteSeer study by Teregowda [2]:
    – Examines each service in the architecture stack in terms of feasibility and cost of migrating to and hosting in the Cloud
    – Possible integration with Cloud Storage thanks to the current virtualized storage component
    DuraCloud [5]:
    – Open source platform for digital libraries and archives
    – Adapters to commercially available Cloud Storage services
    Strategies and SLAs for bit-level preservation by Zierau [6]:
    – Various sub-levels of bit preservation
    Other items:
    – archives and indexes data from websites and social networks
    – Long-Term Digital Retention and Preservation Reference Model: cloud-based digital archive
  • Conclusion
    Proposed LDPaaS concept: why is it useful?
    – Beneficial to large organizations
    – Beneficial to small organizations
    Notional cost model useful for establishing a price model associated with a published SLA set.
    Contend that Cloud Storage Service vendors can augment their portfolios to provide LDPaaS.
    Community Cloud for Preservation
    – Environment for more collaboration and sharing
  • References
    1. Michael Armbrust et al. “A View of Cloud Computing”. Communications of the ACM, Volume 53, No. 4, April 2010.
    2. P. Teregowda, B. Urgaonkar, and C. L. Giles. “Cloud Computing: A Digital Libraries Perspective”. 2010 IEEE 3rd International Conference on Cloud Computing, Miami, FL, July 2010.
    3. Stephen Abrams, Patricia Cruse, and John Kunze. “Preservation Is Not a Place”. The International Journal of Digital Curation, Issue 1, Volume 4, 2009.
    4. Steve Hitchcock, David Tarrant, Adrian Brown, Ben O’Steen, Neil Jefferies, and Leslie Carr. “Towards Smart Storage for Repository Preservation Services”. The International Journal of Digital Curation, Issue 1, Volume 5, 2010.
    5. DuraCloud. Available:
    6. Eld Zierau, Ulla Bogvad Kejser, and Hannes Kulovits. “Evaluation of Bit Preservation Strategies”. 7th International Conference on Preservation of Digital Objects (iPRES2010), Sep. 19-24, 2010, Vienna, Austria.
  • Disclaimer
    The content of this presentation is the personal opinion of the author and does not necessarily reflect any position of the U.S. Government or the National Archives and Records Administration.
  • Thank You! Any questions?
    mailto:quyen.nguyen@nara.gov