Digital Preservation Cloud Services for Libraries and Archives

  1. DLF 2011, Baltimore, MD
     Digital Preservation Cloud Services for Libraries and Archives
     Quyen L. Nguyen, NARA
  2. Outline
     - Introduction
     - LDPaaS
     - Levels of Service and Cost Model
     - Related Work
     - Conclusion
     Oct. 31, 2011, 2011 DLF Forum
  3. Functional Requirements
     Need for Long-Term Digital Preservation
     - Policy mandates: retention of government records
     - Knowledge function: preserve digitized books and born-digital materials
     - History-oriented mandates: preservation of cultural heritage
     Challenges
     - Rapid growth of digital objects that require archiving
     - Data heterogeneity
  4. Desired System Characteristics
     Dynamic Scalability
     - Increase as well as decrease
     Cost-effective Maintainability
     - Operation cost
     - Patches: COTS, security
     Evolvability
     - Technology refresh
     - New features and services
  5. Cloud Computing Characteristics
     Elasticity
     - Computing and storage resources
     - Three levels of cloud services: IaaS, PaaS, and SaaS
     - Quick provisioning (e.g. Cloud Market [3])
     - Pay-as-you-go
     Cost-efficient Maintenance
     - Economies of scale
     - Maximizing utilization of computing resources
     Evolvability by configuration
  6. OAIS Reference Model
  7. 7. LDPaaS Long-term Digital Preservation as a Cloud Service – Encompass major OAIS functionalities – Not only storage service, – But also preservation service according to customer’s policies: retention period, preservation level, and access level. Beneficial to Cloud Service Consumer – Relieve records owners from the burden of engineering and provisioning preservation infrastructure Beneficial to Cloud Service Provider – Realize economies of scales by sharing unused computing resourcesOct. 31, 2011 2011 DLF Forum 7
  8. Ingest Provisioning Challenges
     Unpredictability due to business policies
     - Uneven flow of transfer volume
     - Various object sizes, hence object numbers
     - Various object types
     Cloud Computing benefits
     - Computation resources: file format identification and application of integrity seal
     - Storage resources: ingest processing buffer space
  9. Access Provisioning Challenges
     Unpredictability of publishing
     - Volume of publishable data sets
     Spikiness of access request load
     Access types: Storage Delivery Networks vs. Content Delivery Networks
     Cloud Computing benefits
     - Computation: access-time visualization, zooming, conversion to access format
     - Storage: high-efficiency access disk cache
  10. Preservation Provisioning Challenges
      Prominent preservation methods
      - Bit-level: error detection and correction capabilities
      - Transformation: computing resources for transformation processes; storage serves as a scratchpad for transformation
      - Emulation: virtual machine requirements
      Cloud Computing benefits
      - Computation: execution of preservation algorithms
      - Storage: preservation processing buffer space
  11. Storage Provisioning Challenges
      It is all about storage capacity
      Scale of storage requirement and the role it may be best suited to:
      - Hyper large-scale: Cloud Provider
      - Moderate-to-small-scale: Cloud Consumer
      Could there be a Community Cloud?
  12. Software Paradigms
      Diagram labels: Structural, Object-oriented, SOA, Cloud, Virtualization
  13. System Architecture
  14. SOA-based Ingest Process
      - Ingest process implemented as a composite service
      - Could be implemented by BPEL
      Ingest steps:
      - Virus Scan
      - File Format Identification (DROID)
      - Metadata Extraction (JHOVE)
      - Integrity Seal
      - Move to Preservation Storage
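The composite ingest service on the slide above can be pictured as a chain of independent steps applied to each transferred object. The following is a minimal Python sketch under stated assumptions, not the presenter's implementation: the `IngestItem` type, the step functions, and the extension-based format guess are hypothetical stand-ins. The slide names DROID and JHOVE for identification and extraction and suggests BPEL for orchestration; none of them is invoked here. SHA-256 is an assumed choice for the integrity seal.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class IngestItem:
    """One transferred object moving through the ingest chain (hypothetical type)."""
    name: str
    payload: bytes
    metadata: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


Step = Callable[[IngestItem], IngestItem]


def virus_scan(item: IngestItem) -> IngestItem:
    # Placeholder: a real service would call an antivirus engine here.
    item.log.append("virus_scan")
    return item


def identify_format(item: IngestItem) -> IngestItem:
    # Placeholder for a format identification service (the slide names DROID);
    # here the format is naively guessed from the file extension.
    ext = item.name.rsplit(".", 1)[-1].lower() if "." in item.name else "unknown"
    item.metadata["format"] = ext
    item.log.append("identify_format")
    return item


def extract_metadata(item: IngestItem) -> IngestItem:
    # Placeholder for a metadata extraction service (the slide names JHOVE);
    # only the payload size is recorded in this sketch.
    item.metadata["size_bytes"] = str(len(item.payload))
    item.log.append("extract_metadata")
    return item


def apply_integrity_seal(item: IngestItem) -> IngestItem:
    # Assumption: the integrity seal is a SHA-256 digest of the payload.
    item.metadata["sha256"] = hashlib.sha256(item.payload).hexdigest()
    item.log.append("apply_integrity_seal")
    return item


def move_to_preservation_storage(item: IngestItem) -> IngestItem:
    # Placeholder: a real service would write to preservation storage here.
    item.log.append("move_to_preservation_storage")
    return item


# Step order follows the slide.
INGEST_STEPS: Sequence[Step] = (
    virus_scan,
    identify_format,
    extract_metadata,
    apply_integrity_seal,
    move_to_preservation_storage,
)


def ingest(item: IngestItem, steps: Sequence[Step] = INGEST_STEPS) -> IngestItem:
    """Run each step in order, passing the item along the chain."""
    for step in steps:
        item = step(item)
    return item


result = ingest(IngestItem("report.pdf", b"example bytes"))
```

Because each step has the same signature, a lower ingest level could in principle be expressed by passing a shorter `steps` sequence.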
  15. LDPaaS Levels of Service
      Service: Levels
      - Ingest: IL1: Transfer Only; IL2: With Format Identification; IL3: Metadata Extraction
      - Preservation: PL1: Bit; PL2: Content; PL3: Content, Behavior & Formatting
      - Discovery: DL1: Metadata search; DL2: Full content search
      - Access: AL1: Passive Viewer; AL2: Interactive Viewer; AL3: Content Mining
      - Storage: SL1: Delayed Access - Near-Line Storage; SL2: Rapid Access - High Performance Storage
      - Content Server: CL1: Just-in-Time Active; CL2: Always Active
  16. Level of Service Definitions
      Definition 1. Each Content Server has a set of LoS formalized by the following 6-tuple: C = (CL, IL, PL, DL, AL, SL).
      Definition 2. Since a customer can have one or more Content Servers, a customer's SLA is specified by the n-tuple L = (C1, …, Cn) if the customer has signed up for n Content Servers, with each Ci being a 6-tuple defined according to Definition 1.
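The two definitions above map naturally onto typed tuples. As a rough illustration only, the Python sketch below encodes Definition 1 as a six-field named tuple and Definition 2 as a tuple of such servers. The class names, field names, and the `describe` helper are hypothetical; the level codes and labels are taken from the Levels of Service slide, and the sample server is the one from LoS Example 1.

```python
from enum import Enum
from typing import NamedTuple, Tuple


class CL(Enum):
    CL1 = "Just-in-Time Active"
    CL2 = "Always Active"


class IL(Enum):
    IL1 = "Transfer Only"
    IL2 = "With Format Identification"
    IL3 = "Metadata Extraction"


class PL(Enum):
    PL1 = "Bit"
    PL2 = "Content"
    PL3 = "Content, Behavior & Formatting"


class DL(Enum):
    DL1 = "Metadata search"
    DL2 = "Full content search"


class AL(Enum):
    AL1 = "Passive Viewer"
    AL2 = "Interactive Viewer"
    AL3 = "Content Mining"


class SL(Enum):
    SL1 = "Delayed Access - Near-Line Storage"
    SL2 = "Rapid Access - High Performance Storage"


class ContentServer(NamedTuple):
    """Definition 1: C = (CL, IL, PL, DL, AL, SL)."""
    cl: CL
    il: IL
    pl: PL
    dl: DL
    al: AL
    sl: SL


# Definition 2: an SLA is the n-tuple L = (C1, ..., Cn).
SLA = Tuple[ContentServer, ...]


def describe(server: ContentServer) -> str:
    """Render a server in the slide's notation, e.g. (CL1, IL2, ...)."""
    return "(" + ", ".join(level.name for level in server) + ")"


# Content Server from LoS Example 1.
c1 = ContentServer(CL.CL1, IL.IL2, PL.PL2, DL.DL1, AL.AL2, SL.SL2)
sla: SLA = (c1,)
print(describe(c1))
```

Using one enum per service keeps a level from one service from being placed in another service's slot when a type checker is run over the code.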
  17. LoS - Example 1: Digital Library Repository
      Define Content Server C1 by C1 = (CL1, IL2, PL2, DL1, AL2, SL2)
      - Content Server CL1: Just-in-Time Active; this repository is used sporadically
      - Ingest Service IL2: File Format Identification
      - Preservation Service PL2: preservation at the content level
      - Discovery Service DL1: Metadata Search
      - Access Service AL2: an Interactive Viewer is provided for access
      - Storage Service SL2: Rapid Access, High Performance Disk; the volume is static
  18. LoS - Example 2: Digital Library Repository for Research Publications
      Two sets of records stored in two different Content Servers: C1 and C2
      C1: relatively small volume of high-demand digital assets
      C1 = (CL1, IL2, PL3, DL1, AL1, SL2)
      - CL1: Active Just-in-Time Content Server
      - IL2: File Format Identification
      - PL3: Preservation at the Content and Formatting Level
      - DL1: Metadata Search
      - AL1: Passive Viewer
      - SL2: High Performance, Rapid Access Storage
      C2: backend repository, volume increasing with time
      C2 = (CL2, IL2, PL3, DL2, AL1, SL1)
      - CL2: Always Active Content Server
      - IL2: File Format Identification
      - PL3: Preservation at the Content and Formatting Level
      - DL2: Full Content Search
      - AL1: Passive Viewer
      - SL1: Delayed Access Storage
  19. LoS - Example 3: Sarbanes-Oxley Act Compliance Business Archive
      Retain and preserve records in a sliding time window of seven years
      C1 = (CL1, IL2, PL1, DL2, AL1, SL1)
      - PL1: preservation service at the bit level. With a retention period of seven years, elaborate preservation is not needed.
      - SL1: Delayed Access Storage. The archive is intended for audit purposes only, so rapid access to data is not essential.
  20. Cost Model
      Cost is one of the crucial elements in Cloud Computing
      Let O = (V, N) be the body of N digital objects with total volume V
      Cost(O, Service) depends on the level of service
      - Function of V or N or both
      Examples:
      - fIL1: utilization cost for digital object transfer, varies with V
      - fIL2: file type identification, varies with N
      - fIL3: metadata extraction
      TOTAL COST(O, C) = Σ Cost(O, Service), summed over Service in {Ingest, Preservation, Discovery, Access, Storage}
  21. Cost Model Example
      Let C1 = (CL2, IL2, PL1, DL1, AL1, SL1).
      Assume:
      - fCL2(V, N) = 20 V + 100 N
      - fIL2(N) = 10 N
      - fPL1(V) = 20 V
      - fDL1(N) = 30 N
      - fAL1(V) = 30 V
      - fSL1(V) = 40 V
      For set O1 of objects with V1 = 10 GB and N1 = 10^6: totalCost(O1, C1) = 140,000,740
      For set O2 of objects with V2 = 10^3 GB and N2 = 10^2: totalCost(O2, C1) = 88,000
      Note: totalCost(O2, C1) < totalCost(O1, C1), although V2 > V1
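The comparison in the cost example can be checked with a few lines of Python. This is a sketch under two assumptions: the total is the plain sum of the six listed functions, and the flattened exponents are read as N1 = 10^6, V2 = 10^3, and N2 = 10^2. The names `COST_FUNCTIONS` and `total_cost` are hypothetical. Summed as listed, the six functions give 110 V + 140 N, which does not reproduce the totals printed on the slide (those match 74 V + 140 N), so the sketch checks only the ordering, which holds in either case because the per-object terms dominate.

```python
from typing import Callable, Dict, Sequence

# Per-level cost functions of (V, N), using the coefficients assumed on the slide.
COST_FUNCTIONS: Dict[str, Callable[[int, int], int]] = {
    "CL2": lambda v, n: 20 * v + 100 * n,
    "IL2": lambda v, n: 10 * n,
    "PL1": lambda v, n: 20 * v,
    "DL1": lambda v, n: 30 * n,
    "AL1": lambda v, n: 30 * v,
    "SL1": lambda v, n: 40 * v,
}


def total_cost(v: int, n: int, server: Sequence[str]) -> int:
    """Sum the cost of every service level in the content server tuple."""
    return sum(COST_FUNCTIONS[level](v, n) for level in server)


C1 = ("CL2", "IL2", "PL1", "DL1", "AL1", "SL1")

cost_o1 = total_cost(10, 10**6, C1)      # small volume, many objects
cost_o2 = total_cost(10**3, 10**2, C1)   # large volume, few objects

# The larger-volume set is still the cheaper one.
print(cost_o2 < cost_o1)
```

The point of the example survives the coefficient question: a collection with many small objects can cost more to preserve than a much larger collection with few objects.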
  22. Related Work
      CiteSeer study by Teregowda [2]:
      - Examines each service in the architecture stack in terms of feasibility and cost of migrating and hosting in the Cloud
      - Possible integration with Cloud Storage thanks to the current virtualized storage component
      DuraCloud [5]:
      - Open source platform for digital libraries and archives
      - Adapters to commercially available Cloud Storage services
      Strategies and SLAs for bit-level preservation by Zierau [6]:
      - Various sub-levels of bit preservation
      www.cloudpreservation.com: archives and indexes data from websites and social networks
      www.ltdprm.org/ - Long-Term Digital Retention and Preservation Reference Model: cloud-based digital archive
  23. Conclusion
      Proposed LDPaaS concept: why is it useful?
      - Beneficial to large organizations
      - Beneficial to small organizations
      The notional cost model is useful for establishing a price model associated with a published SLA set.
      We contend that Cloud Storage Service vendors can augment their portfolios to provide LDPaaS.
      Community Cloud for Preservation
      - Environment for more collaboration and sharing
  24. References
      1. Michael Armbrust et al. “A View of Cloud Computing”. Communications of the ACM, Volume 53, No. 4, April 2010.
      2. P. Teregowda, B. Urgaonkar, and C. L. Giles. “Cloud Computing: A Digital Libraries Perspective”. 2010 IEEE 3rd International Conference on Cloud Computing, Miami, FL, July 2010.
      3. Stephen Abrams, Patricia Cruse, and John Kunze. “Preservation Is Not a Place”. The International Journal of Digital Curation, Issue 1, Volume 4, 2009.
      4. Steve Hitchcock, David Tarrant, Adrian Brown, Ben O’Steen, Neil Jefferies, and Leslie Carr. “Towards Smart Storage for Repository Preservation Services”. The International Journal of Digital Curation, Issue 1, Volume 5, 2010.
      5. DuraCloud. Available: http://www.duraspace.org/duracloud.php.
      6. Eld Zierau, Ulla Bogvad Kejser, and Hannes Kulovits. “Evaluation of Bit Preservation Strategies”. 7th International Conference on Preservation of Digital Objects (iPRES2010), Sep. 19-24, 2010, Vienna, Austria.
  25. Disclaimer
      The content of this presentation is the personal opinion of the author and does not necessarily reflect any position of the U.S. Government or the National Archives and Records Administration.
  26. Thank You! Any questions?
      mailto:quyen.nguyen@nara.gov