(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

This session explores some of the key features of Amazon Glacier, including security, durability, and configuration for storing compliance and regulatory data. It covers best practices for managing your cold data, including ingest, retrieval, and security controls. Other topics include: how to optimize storage, upload, and retrieval costs; how to identify the most applicable workloads; and recommended optimizations based on a few sample use cases from a number of industry verticals.

Published in: Technology

  1. 1. Amazon Glacier Deep Dive (STG312). Henry Zhang, Senior Product Manager, Amazon Glacier. October 2015. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. 2. Audio archives – SoundCloud • World’s leading social sound platform • Audio files transcoded and stored in multiple formats • Stores PBs of data • Transcoded files served from Amazon S3 • Originals moved to Amazon Glacier for long-term retention
  3. 3. Video archives – Sony Media Cloud (Ci) Amazon Glacier
  4. 4. Tape replacement – King County • Most populous county in Washington State • Replaced the tape backup solution used by 17 agencies • Meets compliance requirements • Saved $1 million in the first year, with no more tape refresh or management churn
  5. 5. Archive: data retained for the long term, for compliance or potential future reference. Data archiving needs are growing everywhere: • Media assets, 4K, 8K • Health care / Life sciences • Financial services • Regulated industries • Oil and gas / Geospatial • Digital preservation • Long-term backups • Logs
  6. 6. Traditional archiving approaches • Tape silos / Tape libraries • Tape drives (LTO-X / DLT / etc.) • Virtual tape libraries (VTLs) • Tape out / Vaulting • Specialized software & personnel
  7. 7. How can Amazon Glacier help with your archival? • Metered usage: pay as you go • No capital investment • No commitment • No risky capacity planning • Avoid the risks of physical media handling • Control your geographic locality for performance and compliance
  8. 8. Amazon Glacier is a low-cost storage service for archival data with long-term retention requirements. • $0.007/GB per month • 3-5 hour data retrieval • Example workloads: financial records, medical PACS images, high-resolution media assets
  9. 9. How can Amazon Glacier help with your archival? • Extremely low-cost archive storage service, starting at $0.007/GB per month • Allows you to retrieve data within 3-5 hours • 99.999999999% durability (7 orders of magnitude higher than 2 copies of tape) • No data migration, no hardware/infrastructure investments • Virtually unlimited scale; pay only for what you use • Access to on-demand compute resources on AWS
  10. 10. Getting started – key concepts • Account – Access AWS services, view billing/usage, manage security • Vaults – Containers for archives, up to 1,000 vaults per account per region • Archives – Files and records, write-once, 40 TB max per archive, unlimited number of archives • Inventory – Cold index of archive properties, refreshed every 24 hours
  11. 11. Amazon Glacier – 3 ways to access • Direct Glacier API/SDK • S3 lifecycle integration • Third-party tools and gateways
  12. 12. Amazon Glacier concepts: Uploading data • Step 1: Create vault (Films) • Step 2: Configure access policies (ArchiveApp user policy: Effect: Allow; Action: glacier:UploadArchive; Resource: arn:aws:glacier:<region>:<accountId>:vaults/Films) • Step 3: Upload archives: UploadArchive(data) -> Archive ID
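The ArchiveApp user policy from this slide can be written out as a JSON policy document. The following is a minimal sketch, assuming the usual Glacier ARN layout `arn:aws:glacier:<region>:<accountId>:vaults/<name>`; the region and account ID used here are placeholders, not values from the talk.

```python
import json


def upload_policy(region: str, account_id: str, vault: str) -> dict:
    """Build a policy document that allows uploading archives to one vault."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "glacier:UploadArchive",
                "Resource": f"arn:aws:glacier:{region}:{account_id}:vaults/{vault}",
            }
        ],
    }


# Placeholder region and account ID, for illustration only.
policy = upload_policy("us-west-2", "123456789012", "Films")
print(json.dumps(policy, indent=2))
```

The printed JSON is what would be attached to the uploading user or role; nothing in the sketch calls AWS.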
  13. 13. Amazon Glacier concepts: Retrieving data • Step 1: Initiate job (ArchiveId: AE99F…, Vault: Films) -> Job ID • Step 2: Wait 3-5 hours for job completion • Step 3: Receive job completion notification • Step 4: Download output
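The retrieval flow on this slide could be sketched as below. This assumes `client` behaves like the boto3 Glacier client (`initiate_job`, `describe_job`, `get_job_output`); polling stands in for the notification step to keep the sketch short, and `poll_seconds` and the injectable `sleep` are illustrative choices rather than anything from the talk.

```python
import time


def retrieve_archive(client, vault, archive_id, poll_seconds=900, sleep=time.sleep):
    """Initiate an archive-retrieval job, wait for it to finish, return the bytes.

    `client` is assumed to behave like boto3.client("glacier").
    """
    # Step 1: initiate the job and remember its ID.
    job_id = client.initiate_job(
        accountId="-",
        vaultName=vault,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )["jobId"]

    # Steps 2-3: wait for completion. A notification topic on the vault
    # would avoid this polling loop.
    while True:
        status = client.describe_job(
            accountId="-", vaultName=vault, jobId=job_id
        )["StatusCode"]
        if status == "Succeeded":
            break
        if status == "Failed":
            raise RuntimeError(f"retrieval job {job_id} failed")
        sleep(poll_seconds)

    # Step 4: download the job output.
    output = client.get_job_output(accountId="-", vaultName=vault, jobId=job_id)
    return output["body"].read()
```

With boto3 installed and credentials configured, a call might look like `retrieve_archive(boto3.client("glacier"), "Films", archive_id)`.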
  14. 14. Amazon Glacier – Amazon S3 lifecycle archival • Seamlessly move data from Amazon S3 to Amazon Glacier • Automated lifecycle rules • Transition based on object age or predefined date
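An age-based lifecycle rule of the kind described here could be expressed as follows. This is a sketch that only builds the rule document; the prefix, rule ID, and 90-day threshold are made-up values, and the dictionary follows the shape accepted by the S3 `put_bucket_lifecycle_configuration` call.

```python
def glacier_transition_rule(rule_id: str, prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that moves objects to Glacier by age."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Hypothetical values: archive everything under originals/ after 90 days.
lifecycle = glacier_transition_rule("archive-originals", "originals/", 90)
```

The result would be passed as `LifecycleConfiguration` when applying the rule to a bucket. A date-based rule would use a `Date` entry in `Transitions` instead of `Days`.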
  15. 15. Amazon Glacier – Backup software integration • CommVault – Native integration with Amazon S3 & Amazon Glacier • Deduplication & encryption • Single console management
  16. 16. Amazon Glacier – Third-party tools and gateways • Consumer grade: less than $50 (examples: Cloudberry, FastGlacier, Arq from Haystack Software) • Small / medium business: $500 - $1,000 (examples: Synology, Veeam, QNap) • Enterprise-grade gateway, price varies (example: NetApp AltaVault)
  17. 17. Best practices – Prepare your data
  18. 18. Use archive descriptions • Use the archive description field for metadata • If the local index is corrupted or destroyed, use the archive descriptions to reconstruct critical mappings • For example, when you create an index entry, add its primary key to the archive description on upload
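One way to apply this practice is to serialize the index key into the description at upload time. The sketch below assumes a JSON layout and a field name (`pk`) that are purely illustrative; the 1,024-byte limit reflects Glacier's documented cap on archive descriptions, which must be printable ASCII.

```python
import json

MAX_DESCRIPTION_BYTES = 1024  # documented limit for an archive description


def make_description(primary_key: str, **extra: str) -> str:
    """Encode the local index key (plus optional fields) as a description."""
    # json.dumps escapes non-ASCII and control characters by default,
    # so the result is printable ASCII.
    text = json.dumps({"pk": primary_key, **extra},
                      sort_keys=True, separators=(",", ":"))
    if len(text) > MAX_DESCRIPTION_BYTES:
        raise ValueError("description exceeds 1,024 bytes")
    return text


def parse_description(text: str) -> dict:
    """Recover the fields, e.g. when rebuilding a lost index from an inventory."""
    return json.loads(text)


description = make_description("film-0001", sha256="abc123")
```

Because the vault inventory lists each archive's description, parsing those strings is enough to rebuild the key-to-archive mapping.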
  19. 19. Small objects and object size overhead • Every archive carries 32 KB of associated overhead, and some operations are charged per request • For a 3.2 MB archive, overhead is about 1% of the cost • For a 1 KB archive, about 97% of the cost goes to overhead • The solution is aggregation: the recommended minimum archive size is on the order of megabytes
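The percentages on this slide follow from dividing the fixed 32 KB overhead by the total billed size. A short worked check, treating 3.2 MB as 3,200 KB:

```python
OVERHEAD_KB = 32  # per-archive overhead quoted on the slide


def overhead_fraction(archive_kb: float) -> float:
    """Fraction of billed storage that is overhead rather than data."""
    return OVERHEAD_KB / (archive_kb + OVERHEAD_KB)


print(f"1 KB archive:   {overhead_fraction(1):.1%}")     # 97.0%
print(f"3.2 MB archive: {overhead_fraction(3200):.1%}")  # 1.0%
```

Per-request charges are not modeled here; they add to the case for aggregating small files.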
  20. 20. Archive aggregation (diagram) • The archive packs File 1, File 2, File 3, ... together, each with its checksum and metadata, plus an index/directory • A local index records the offset of each file within the archive
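The aggregation layout in this diagram might be implemented along these lines. It is a sketch under simplifying assumptions: files are held in memory, SHA-256 is used as the per-file checksum, and the index is kept only locally rather than also being embedded in the archive.

```python
import hashlib


def aggregate(files: dict) -> tuple:
    """Concatenate files into one archive body and build an offset index."""
    blob = bytearray()
    index = []
    for name, data in files.items():
        index.append({
            "name": name,
            "offset": len(blob),  # where this file starts in the archive
            "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
        blob += data
    return bytes(blob), index


def extract(blob: bytes, entry: dict) -> bytes:
    """Slice one file back out of the archive and verify its checksum."""
    data = blob[entry["offset"]:entry["offset"] + entry["length"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for {entry['name']}")
    return data


archive, index = aggregate({"a.txt": b"hello", "b.txt": b"world!"})
```

Storing the index with the upload as well as locally would let it be recovered if the local copy is lost.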
  21. 21. Best practices – Optimize upload
  22. 22. Best practices: Multipart uploads • Improve throughput and reliability, and get idempotency, with multipart uploads • Step 1: InitiateMultipartUpload(partSize) → uploadId • Step 2: UploadPart(uploadId, data), with parts uploaded in parallel • Step 3: CompleteMultipartUpload(uploadId) → archiveId
  23. 23. Best practices: Data ingestion options • AWS Direct Connect: dedicated bandwidth between your site and AWS • Internet: transfer data in a secure SSL tunnel over the public Internet • AWS Import/Export Snowball: physical transfer of media into and out of AWS
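Planning the parts for a multipart upload could look like the sketch below. It relies on Glacier's documented constraints (part size is 1 MiB times a power of two, at most 4 GiB, with at most 10,000 parts) and produces the byte-range strings each part upload carries. The checksum (tree hash) that each part also needs is left out to keep the example short.

```python
MIB = 1024 * 1024
MAX_PART_SIZE = 4096 * MIB  # 4 GiB
MAX_PARTS = 10_000


def choose_part_size(archive_size: int) -> int:
    """Smallest allowed part size that fits the archive in 10,000 parts."""
    part = MIB
    while part * MAX_PARTS < archive_size:
        part *= 2  # part sizes must be 1 MiB times a power of two
    if part > MAX_PART_SIZE:
        raise ValueError("archive too large for a single multipart upload")
    return part


def part_ranges(archive_size: int, part_size: int):
    """Yield the byte range of each part, in upload order."""
    for start in range(0, archive_size, part_size):
        end = min(start + part_size, archive_size) - 1
        yield f"bytes {start}-{end}/*"


ranges = list(part_ranges(5 * MIB // 2, MIB))  # a 2.5 MiB archive, 1 MiB parts
```

Because every part is identified by its byte range, parts can be sent in parallel and a failed part can be retried on its own.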
  24. 24. Best practices – Cost management
  25. 25. Amazon Glacier – Data retrieval policies • Provides transparency and cost control for data retrievals • Governs all retrieval activities for an account in a region • Synchronously accept/reject each retrieval request • Accounts for inflight retrieval operations
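A data retrieval policy is a small rule document set per account and region. The sketch below builds one in the shape accepted by the Glacier `SetDataRetrievalPolicy` operation; the helper name, its parameters, and the 1 GiB/hour figure are illustrative.

```python
def retrieval_policy(max_bytes_per_hour=None, free_tier_only=False) -> dict:
    """Build a data retrieval policy with a single rule."""
    if free_tier_only:
        rule = {"Strategy": "FreeTier"}
    elif max_bytes_per_hour is not None:
        rule = {"Strategy": "BytesPerHour", "BytesPerHour": int(max_bytes_per_hour)}
    else:
        rule = {"Strategy": "None"}  # no retrieval limit
    return {"Rules": [rule]}


# Hypothetical cap: at most 1 GiB of retrievals per hour.
capped = retrieval_policy(max_bytes_per_hour=1024 ** 3)
```

With boto3, the document would be applied through `set_data_retrieval_policy(accountId="-", Policy=capped)`; retrieval requests that would exceed the limit are then rejected synchronously.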
  26. 26. Amazon Glacier – Data retrieval policies
  27. 27. Amazon Glacier – Data retrieval policies
  28. 28. Amazon Glacier – Data retrieval policies
  29. 29. Amazon Glacier – Data retrieval policies
  30. 30. Cost allocation with vault tags
  31. 31. Best practices – Security and compliance
  32. 32. Amazon Glacier – Audit logging with AWS CloudTrail • Enable AWS CloudTrail in console • Control plane events – Vault activities • Data plane events – Archive activities
  33. 33. Vault access policies • Manage access to a vault in a single location, with a single policy – Grant/revoke access to internal business units/teams – “Marketing_Vault” has a different access policy from “DevOps_Vault” • Easily manage cross-account access for your business partner – Simply add a section for your business partner in the same policy
  34. 34. Compliance storage with Vault Lock • Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy • Time-based retention • MFA authentication • Controls govern all records in a vault • Immutable policy • Two-step locking
  35. 35. Vault Lock for compliance storage • Non-overwrite, non-erasable records • Time-based retention with “ArchiveAgeInDays” control • Policy lockdown (strong governance) • Legal hold with vault-level tags • Configure optional designated third-party access and grant temporary access
  36. 36. Example control: 1 year record retention
  37. 37. Example control: 1 year record retention
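The slide images are not part of this transcript, but a one-year retention control is typically written as a Deny on `glacier:DeleteArchive` conditioned on `glacier:ArchiveAgeInDays`. A sketch, with a placeholder vault ARN and an illustrative statement ID:

```python
import json


def retention_lock_policy(vault_arn: str, retention_days: int = 365) -> str:
    """Return Vault Lock policy text that denies deleting young archives."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention-ends",  # illustrative name
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_arn,
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(retention_days)
                    }
                },
            }
        ],
    }
    return json.dumps(policy)


# Placeholder ARN, for illustration only.
retention_json = retention_lock_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
)
```

Once locked, the policy blocks deletion of any archive younger than the retention period, for every principal.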
  38. 38. Vault Lock: Two-step locking
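Two-step locking means the policy is first attached in an in-progress state, tested, and only then made permanent. The sketch below assumes `client` behaves like the boto3 Glacier client (`initiate_vault_lock`, `complete_vault_lock`, `abort_vault_lock`); the `verify` callback is a hypothetical hook for whatever checks you run while the lock is still in progress.

```python
def lock_vault(client, vault: str, policy_json: str, verify) -> bool:
    """Initiate a vault lock, then complete it only if verification passes."""
    # Step 1: attach the policy; the lock is in progress and can still be aborted.
    lock_id = client.initiate_vault_lock(
        accountId="-", vaultName=vault, policy={"Policy": policy_json}
    )["lockId"]

    # Step 2: make the policy immutable, or back out.
    if verify():
        client.complete_vault_lock(accountId="-", vaultName=vault, lockId=lock_id)
        return True
    client.abort_vault_lock(accountId="-", vaultName=vault)
    return False
```

The in-progress state expires after 24 hours if the lock is never completed, so verification has to finish inside that window.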
  39. 39. Legal hold with vault-level tags
  40. 40. Example control: Legal hold
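A legal hold driven by a vault-level tag can be sketched as a Deny statement conditioned on the tag's value. The tag key `LegalHold` and the statement ID are assumptions for illustration, since the slide image is not in this transcript.

```python
def legal_hold_statement(vault_arn: str, tag_key: str = "LegalHold") -> dict:
    """Policy statement denying archive deletion while the hold tag is set."""
    return {
        "Sid": "deny-delete-while-on-legal-hold",  # illustrative name
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": vault_arn,
        "Condition": {
            # Matches the tag set to "true" or left empty.
            "StringLike": {f"glacier:ResourceTag/{tag_key}": ["true", ""]}
        },
    }


# Placeholder ARN, for illustration only.
hold = legal_hold_statement(
    "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
)
```

Placing and lifting the hold then comes down to changing the tag on the vault, without touching the locked policy itself.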
  41. 41. Vault lock best practices
  42. 42. Using vault lock policy with vault access policy • Vault access policy (flexibility): can be updated/deleted • Vault lock policy (compliance/governance): lockable/immutable policy; cannot be updated/deleted after lockdown • Use vault access policy to: designate third-party access; grant temporary read permissions when necessary • Use vault lock policy to: deploy regulatory controls such as records retention; enforce data access through multi-factor authentication only
  43. 43. Vault Lock in the Glacier Console
  44. 44. Vault Lock in the Glacier Console
  45. 45. Vault Lock in the Glacier Console
  46. 46. Vault Lock in the Glacier Console
  47. 47. Vault Lock in the Glacier Console
  48. 48. Vault Lock in the Glacier Console
  49. 49. Vault Lock in the Glacier Console
  50. 50. Vault Lock in the Glacier Console
  51. 51. Vault Lock in the Glacier Console
  52. 52. Vault Lock in the Glacier Console
  53. 53. Vault Lock in the Glacier Console
  54. 54. Vault Lock in the Glacier Console
  55. 55. Vault Lock in the Glacier Console
  56. 56. Vault Lock in the Glacier Console
  57. 57. Vault Lock in the Glacier Console
  58. 58. Vault Lock in the Glacier Console
  59. 59. Vault Lock in the Glacier Console
  60. 60. Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).
  61. 61. Thank you!
  62. 62. Remember to complete your evaluations!
