Deep Dive on Amazon Glacier Covering New Retrieval Features - December 2016 Monthly Webinar Series


With Expedited, Standard, and Bulk retrievals, you can leverage Amazon Glacier’s extremely low-cost storage service to support the full spectrum of archive use cases. These range from deep archives that are never retrieved to active workloads with minute-level access, such as media broadcasting, to petabyte-scale content distribution or big data analytics use cases. This session will dive deep into the recently launched retrieval features, review Amazon Glacier’s current feature set, and share use cases from customers leveraging Glacier’s latest features.

Learning Objectives:
• Dive deep on Amazon Glacier and the new retrieval features
• Learn about the benefits of Amazon Glacier and the new retrieval features
• Learn about the different use cases
• Learn how to get started using Amazon Glacier

Published in: Technology

  1. 1. Deep Dive on Amazon Glacier. Mas Kubo, Senior Product Manager, Amazon Glacier. December 12, 2016. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. 2. Media content distribution: Sony DADC
      • Stores 20 PB and 1M+ hours of motion picture and television content, growing by 1 PB per quarter
      • Single copy on Glacier
      • Over $10MM in savings
      • Replaced a legacy tape solution
      • Higher performance, higher durability, lower cost
  3. 3. Patient data: Philips Healthcare
      • HealthSuite digital platform powered by AWS
      • 15 PB of patient data
      • Archives patient records and medical images produced across over 1,500 hospitals
      • Securely stored for decades (lifetime of patients)
      • Uses HIPAA-eligible AWS services
  4. 4. AWS storage and data transfer options
      • Batches and streams: Direct Connect; Snowball, Snowball Edge, Snowmobile; 3rd party connectors; Transfer Acceleration; Storage Gateway; Kinesis Firehose
      • File: Amazon EFS
      • Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
      • Object: Amazon S3, Amazon Glacier
  5. 5. Data storage demand and solution requirements
      Data storage demand:
      • Media assets, 4K, 8K
      • Healthcare/life sciences
      • Financial services
      • Regulated industries
      • Oil and gas/geospatial
      • Digital preservation
      • Long-term backups
      • Logs
      Solution requirements:
      • Secure and durable
      • Scalable
      • Cost-effective
      • Flexible data access
      • Compliant
  6. 6. Amazon Glacier
      • Flexible data access: three retrieval options, from minutes to hours
      • Durable: 11 9s of durability (5 orders of magnitude better than 2 copies on tape)
      • Management features: Vault Lock, retrieval policies, CloudTrail
      • Cost-effective: starting at $0.004 per GB per month
      • Secure: all data encrypted at rest
      • Scalable: from gigabytes to exabytes
  7. 7. Amazon Glacier
      • Metered usage: pay as you go
      • No capital investment
      • No commitment
      • No risky capacity planning
      • Avoid the risks of physical media handling
      • Control your geographic locality for performance and compliance
  8. 8. Key terms and concepts
      • Vaults: containers for archives, up to 1,000 vaults per account (per region)
      • Archives: the basic unit of storage; write-once, 40 TB max, unlimited number of archives
      • Inventory: cold index of a vault's archives, refreshed every 24 hours
      Topics covered:
      1. Access: three ways to access Amazon Glacier
      2. Uploads: multipart, lifecycle, cost optimizations, AWS Snowball
      3. Data management: Vault Lock, tagging, audit logs
      4. Retrievals: retrieval policies, range retrievals, new retrieval features
  9. 9. Accessing Amazon Glacier
      1. Direct Amazon Glacier API/SDK
      2. Amazon S3 lifecycle integration
      3. Third-party tools and gateways (for example, FastGlacier)
  10. 10. Uploading data: Internet or sneaker-net
      • AWS Direct Connect: dedicated bandwidth between your site and AWS
      • Internet: transfer data in a secure SSL tunnel over the public Internet
      • Snowball, Snowball Edge, Snowmobile: physical transfer of media into and out of AWS
  11. 11. Uploading data: archive descriptions
      • Use the archive description field for metadata
      • If the local index is corrupted or destroyed, use the archive descriptions to reconstruct critical mappings
      • For example, create the index entry, then add its primary key to the archive description on upload
      Local index entry: Primary key: 12345 | Description: 2014Audit | Dept: FinanceDept | ArchiveID: 9FG23…..
      UploadArchive(data, ArchiveDescription="12345, 2014Audit,FinanceDept") -> Archive ID = 9FG23…..
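The archive-description idea above can be sketched as a round trip: pack the index fields into the description on upload, then rebuild the local index from inventory records. This is a minimal sketch, not the presenter's implementation. The JSON encoding, the field names `pk`, `label`, and `dept`, the placeholder archive ID, and the 1,024-character limit are assumptions; the `ArchiveId` and `ArchiveDescription` keys follow the shape of a vault inventory listing.

```python
import json

MAX_DESCRIPTION_LEN = 1024  # assumed limit on the archive description field


def build_description(primary_key: str, label: str, dept: str) -> str:
    """Pack the index fields into a compact ASCII description string."""
    text = json.dumps({"pk": primary_key, "label": label, "dept": dept},
                      separators=(",", ":"), ensure_ascii=True)
    if len(text) > MAX_DESCRIPTION_LEN:
        raise ValueError("description too long")
    return text


def rebuild_index(inventory: list[dict]) -> dict[str, dict]:
    """Reconstruct {primary_key: entry} from inventory-style records."""
    index = {}
    for item in inventory:
        fields = json.loads(item["ArchiveDescription"])
        index[fields["pk"]] = {
            "label": fields["label"],
            "dept": fields["dept"],
            "archive_id": item["ArchiveId"],
        }
    return index


# Hypothetical inventory record; the archive ID is a placeholder.
description = build_description("12345", "2014Audit", "FinanceDept")
inventory = [{"ArchiveId": "EXAMPLE-ARCHIVE-ID", "ArchiveDescription": description}]
print(rebuild_index(inventory))
```

A structured encoding such as JSON makes the description unambiguous to parse later, at the cost of a few extra characters per archive.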
  12. 12. Uploading data: optimizing costs
      • Every archive has 32 KB of associated overhead, and some operations are charged per request
      • For a 3.2 MB archive, overhead is about 1% of the storage cost
      • For a 1 KB archive, 97% of the storage cost goes to overhead
      • The solution is aggregation: the recommended minimum archive size is on the order of megabytes
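The overhead percentages on this slide follow from dividing the 32 KB overhead by the total billed size (data plus overhead). A minimal sketch of that arithmetic, assuming decimal units (3.2 MB = 3,200 KB) and ignoring per-request charges:

```python
OVERHEAD_KB = 32  # per-archive overhead quoted on the slide


def overhead_fraction(archive_kb: float) -> float:
    """Share of billed storage that is overhead rather than data."""
    return OVERHEAD_KB / (archive_kb + OVERHEAD_KB)


for size_kb in (1, 3_200, 1_000_000):
    print(f"{size_kb:>9} KB archive: {overhead_fraction(size_kb):.2%} overhead")
```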
  13. 13. Uploading data: aggregating archives
      [Diagram: one archive holds File 1, File 2, ... each followed by its checksum and metadata, plus an index/directory listing the File 1, File 2, and File 3 offsets. A local index records the same file offsets and checksums (Checksum 1, Checksum 2, Checksum 3).]
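One way to picture the aggregation layout is a builder that concatenates files into a single blob while recording each file's offset, size, and checksum in an index. This is a sketch under assumptions: the function names and the choice of SHA-256 are illustrative, and a real range retrieval may require the byte range to be widened to alignment boundaries, which is not handled here.

```python
import hashlib


def build_archive(files: dict[str, bytes]) -> tuple[bytes, dict[str, dict]]:
    """Concatenate files into one blob and record offset, size, and checksum."""
    blob = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = {
            "offset": len(blob),
            "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        blob += data
    return bytes(blob), index


def extract(blob: bytes, entry: dict) -> bytes:
    """Slice one file back out and verify it against the stored checksum."""
    data = blob[entry["offset"]: entry["offset"] + entry["size"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError("checksum mismatch")
    return data


blob, index = build_archive({"a.txt": b"hello", "b.bin": b"\x00\x01\x02"})
print(index["b.bin"]["offset"], extract(blob, index["b.bin"]))
```

Keeping a copy of the index inside the archive as well as locally means the mapping survives the loss of either one.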
  14. 14. Best practices: multipart uploads
      Improve throughput and reliability, and get idempotency, by uploading an archive as parts in parallel:
      1. InitiateMultipartUpload(partSize) → uploadId
      2. UploadPart(uploadId, data)
      3. CompleteMultipartUpload(uploadId) → archiveId
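Before step 2 can run in parallel, the archive has to be split into byte ranges, one per part. The sketch below plans those ranges. It assumes the part size must be 1 MiB multiplied by a power of two (up to 4 GiB) and that each part is identified by a `bytes start-end/*` range string; both should be checked against the Glacier API reference before relying on them.

```python
MIB = 1024 * 1024


def is_valid_part_size(part_size: int) -> bool:
    """True if part_size is 1 MiB times a power of two, up to 4 GiB."""
    if part_size < MIB or part_size > 4096 * MIB or part_size % MIB:
        return False
    multiple = part_size // MIB
    return (multiple & (multiple - 1)) == 0


def plan_parts(archive_size: int, part_size: int) -> list[str]:
    """Return the range string for each part, in upload order."""
    if not is_valid_part_size(part_size):
        raise ValueError("part size must be 1 MiB times a power of two")
    ranges = []
    for start in range(0, archive_size, part_size):
        end = min(start + part_size, archive_size) - 1  # inclusive end byte
        ranges.append(f"bytes {start}-{end}/*")
    return ranges


for part_range in plan_parts(2 * MIB + 10, MIB):
    print(part_range)
```

Because each part is addressed by its byte range, a failed part can be retried on its own without restarting the whole upload.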
  15. 15. Amazon Glacier: Amazon S3 lifecycle policies
      • Seamlessly move data from Amazon S3 to Amazon Glacier
      • Automated lifecycle rules
      • Transition based on object age
  16. 16. Amazon Glacier: Amazon S3 lifecycle policies
      • Object-level tagging for S3 objects
      • Apply lifecycle rules based on object tags
      • Example: transition objects to Amazon Glacier when they are 1 year old and carry the object tags 'Project=Delta' and 'Data type=HPI'
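The tag-based example on this slide could be expressed as a lifecycle configuration like the one built below. This is a sketch: the rule ID is hypothetical, and the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. Nothing here contacts AWS.

```python
def glacier_transition_rule(rule_id: str, days: int, tags: dict[str, str]) -> dict:
    """Build one lifecycle rule that moves tagged objects to Glacier."""
    if not tags:
        raise ValueError("at least one tag is required")
    tag_list = [{"Key": key, "Value": value} for key, value in tags.items()]
    # A single tag uses "Tag"; several tags must be wrapped in "And".
    if len(tag_list) == 1:
        rule_filter = {"Tag": tag_list[0]}
    else:
        rule_filter = {"And": {"Tags": tag_list}}
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": rule_filter,
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


# "delta-hpi-to-glacier" is a hypothetical rule name.
config = {"Rules": [glacier_transition_rule(
    "delta-hpi-to-glacier", 365, {"Project": "Delta", "Data type": "HPI"})]}
print(config)
```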
  17. 17. Management features: vault tagging
  18. 18. Management features: AWS CloudTrail
      • Enable AWS CloudTrail in the console
      • Control plane events: vault activities
      • Data plane events: archive activities
  19. 19. Management features: vault access policies
      • Manage access to a vault in a single location: a single AWS Identity and Access Management (IAM) policy
      • Grant or revoke access for internal business units and teams
      • "Marketing_Vault" has an access policy that is distinct from "DevOps_Vault"
      • Easily manage cross-account access for your business partner
      • Simply add a section for your business partner in the same policy
  20. 20. Management features: Vault Lock
      • Non-overwritable, non-erasable records
      • Time-based retention with the "ArchiveAgeInDays" control
      • Policy lockdown (strong governance)
      • Legal hold with vault-level tags
      • Configure optional designated third-party access and grant temporary access
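A time-based retention control of the kind listed above can be written as a policy that denies archive deletion until the archive reaches a given age. The sketch below builds such a policy document; the vault ARN, account number, and statement ID are hypothetical, and the `glacier:ArchiveAgeInDays` condition key is used as recalled from AWS's published Vault Lock examples.

```python
import json


def retention_policy(vault_arn: str, retention_days: int) -> str:
    """Build a policy that denies deletes on archives younger than retention_days."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-delete-before-retention-ends",  # hypothetical ID
            "Effect": "Deny",
            "Principal": "*",
            "Action": "glacier:DeleteArchive",
            "Resource": vault_arn,
            "Condition": {
                "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
            },
        }],
    }
    return json.dumps(policy, indent=2)


# Hypothetical vault ARN for a 1-year retention vault.
print(retention_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault", 365))
```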
  21. 21. Management features: Vault Lock: two-step locking
      • InitiateVaultLock
          • Puts a retention policy into effect for testing (in-progress state)
          • Returns a unique lock ID (expires after 24 hours)
      • AbortVaultLock
          • Deletes an in-progress policy
          • Lets you modify a policy before locking it down
      • CompleteVaultLock
          • Locks down the vault with the appropriate lock ID
          • A Vault Lock policy cannot be aborted once locked
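The two-step flow can be sketched as a small helper that initiates the lock, lets the caller test the in-progress policy, and then either completes or aborts. The `client` is assumed to be a boto3 Glacier client (its `initiate_vault_lock`, `complete_vault_lock`, and `abort_vault_lock` methods are used); the `approve` callback is a hypothetical stand-in for whatever testing you do during the 24-hour window.

```python
import json
from typing import Callable


def lock_vault(client, vault_name: str, policy: dict,
               approve: Callable[[], bool]) -> bool:
    """Initiate a vault lock, test it via approve(), then complete or abort."""
    response = client.initiate_vault_lock(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        policy={"Policy": json.dumps(policy)},
    )
    lock_id = response["lockId"]

    if approve():
        # Irreversible: the policy can no longer be changed or aborted.
        client.complete_vault_lock(
            accountId="-", vaultName=vault_name, lockId=lock_id)
        return True

    # Still in progress, so the policy can be discarded and revised.
    client.abort_vault_lock(accountId="-", vaultName=vault_name)
    return False
```

Passing the client in keeps the helper easy to exercise against a stub before it is pointed at a production vault.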
  22. 22. Management features: Vault Lock: legal hold with vault-level tags
      • Set up a legal hold tag
          • Configure a vault-level tag "LegalHold"
          • Set its initial value to "False"
      • Add a compliance control for legal hold in a vault lock policy
          • Deny the delete archive operation
          • From anybody (root, administrators, users, business partners)
          • When the LegalHold tag = "True"
      • Place or lift a legal hold by updating the tag value
  23. 23. Example control: legal hold Management features: Vault Lock
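The example on this slide did not survive as text. A legal hold control along the lines of the preceding slide might look like the policy built below. This is a sketch, not the slide's actual policy: the vault ARN and statement ID are hypothetical, and the `glacier:ResourceTag/LegalHold` condition key is used as recalled from AWS's published Vault Lock examples.

```python
import json


def legal_hold_policy(vault_arn: str, hold_value: str = "True") -> str:
    """Build a policy that denies archive deletes while the LegalHold tag is set."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-delete-while-on-legal-hold",  # hypothetical ID
            "Effect": "Deny",
            "Principal": "*",  # everyone, including root and administrators
            "Action": "glacier:DeleteArchive",
            "Resource": vault_arn,
            "Condition": {
                "StringLike": {"glacier:ResourceTag/LegalHold": [hold_value]}
            },
        }],
    }
    return json.dumps(policy, indent=2)


print(legal_hold_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"))
```

Because the condition reads a vault tag, placing or lifting the hold only requires changing the tag value, not the locked policy.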
  24. 24. Management features: Vault Lock best practices
      • Map one vault to a single retention range
          • Group regulatory data by retention: 1-year vault, 6-year vault, etc.
      • Create a new vault and lock it before storing production data
          • Enforce the full ArchiveAgeInDays on all new archives
          • Leave no "gap" on existing archives
      • Thoroughly test a vault lock policy before locking it down (Abort/Initiate)
      • Implement only the most restrictive controls with Vault Lock
          • Leave the flexible controls to the vault access policy
  25. 25. Management features: Vault Lock: third-party assessment
      Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).
  26. 26. Data retrievals: basic concepts
      1. Initiate job (ArchiveId: AE99F…, Vault: Films) -> Job ID
      2. Retrieval processing (minutes or hours, depending on the retrieval option)
      3. Job completion notification
      4. Download output
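The four steps can be sketched against a boto3 Glacier client using its `initiate_job`, `describe_job`, and `get_job_output` methods. This is a simplified sketch: polling stands in for the completion notification in step 3, the whole output is read into memory, and the function name and defaults are hypothetical.

```python
import time


def retrieve_archive(client, vault: str, archive_id: str,
                     tier: str = "Standard", poll_seconds: float = 60.0) -> bytes:
    """Initiate a retrieval job, wait for it to finish, and return the data."""
    # Step 1: initiate the job and keep the returned job ID.
    job = client.initiate_job(
        accountId="-",
        vaultName=vault,
        jobParameters={"Type": "archive-retrieval",
                       "ArchiveId": archive_id,
                       "Tier": tier},
    )
    job_id = job["jobId"]

    # Steps 2 and 3: wait while Glacier processes the retrieval.
    while True:
        status = client.describe_job(accountId="-", vaultName=vault, jobId=job_id)
        if status["Completed"]:
            break
        time.sleep(poll_seconds)
    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"retrieval job {job_id} ended as {status['StatusCode']}")

    # Step 4: download the job output.
    output = client.get_job_output(accountId="-", vaultName=vault, jobId=job_id)
    return output["body"].read()
```

For retrievals that take hours, a notification-driven design avoids keeping a poller running; the polling loop here only keeps the sketch self-contained.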
  27. 27. Data retrievals: restoring via lifecycle (screenshots, steps 1 and 2)
  28. 28. Data retrievals: restoring via lifecycle (screenshots, steps 3 and 4)
  29. 29. Data retrievals: data retrieval policies
      • Provide transparency and cost control for data retrievals
      • Govern all retrieval activities for an account in a region
      • Synchronously accept or reject each retrieval request
      • Account for in-flight retrieval operations
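A retrieval policy that caps the hourly retrieval rate could be expressed as below. This is a sketch: the helper name and the use of binary gigabytes are assumptions, and the dictionary follows the shape that boto3's `set_data_retrieval_policy` accepts as its `Policy` argument. Nothing here contacts AWS.

```python
GIB = 1024 ** 3


def bytes_per_hour_policy(max_gib_per_hour: float) -> dict:
    """Build a retrieval policy that caps retrievals at a fixed hourly rate."""
    if max_gib_per_hour <= 0:
        raise ValueError("rate must be positive")
    return {
        "Rules": [{
            "Strategy": "BytesPerHour",
            "BytesPerHour": int(max_gib_per_hour * GIB),
        }]
    }


print(bytes_per_hour_policy(10))
```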
  30. 30. Data retrievals: expedited and bulk retrievals
      Three flexible and powerful retrieval options to access any of your Amazon Glacier data:

                            Expedited            Standard                   Bulk
      Data access time      1 - 5 minutes        3 - 5 hours                5 - 12 hours
      Data retrievals       $0.03 per GB         $0.01 per GB               $0.0025 per GB
      Retrieval requests    $0.01 per request    $0.05 per 1,000 requests   $0.025 per 1,000 requests

      • Expedited: designed for occasional urgent access to a small number of archives
      • Standard: low-cost option for retrieving data in just a few hours
      • Bulk: lowest-cost option, optimized for large retrievals of up to petabytes of data in 12 hours
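To compare the three options for a given workload, the prices in the table can be turned into a small estimator. This is a sketch using only the per-GB and per-request prices quoted on the slide (December 2016); it ignores data transfer and any other charges.

```python
TIERS = {
    # tier: (USD per GB retrieved, USD per request)
    "Expedited": (0.03, 0.01),
    "Standard": (0.01, 0.05 / 1000),
    "Bulk": (0.0025, 0.025 / 1000),
}


def retrieval_cost(tier: str, gigabytes: float, requests: int) -> float:
    """Estimated retrieval charge in USD for one tier."""
    per_gb, per_request = TIERS[tier]
    return gigabytes * per_gb + requests * per_request


for tier in TIERS:
    print(f"{tier:<10} ${retrieval_cost(tier, 1000, 1000):.3f}")
```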
  31. 31. Data retrievals: expedited retrievals
      • Expedited: two types of requests
          • On-demand: like EC2 On-Demand Instances, available the vast majority of the time
          • Provisioned: guaranteed capacity
      • Provisioned capacity
          • Guarantees that expedited retrieval capacity is available when needed
          • Each unit ensures at least 3 expedited retrievals every 5 minutes and provides up to 150 MB/s of retrieval throughput
          • $100 per month per unit
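The per-unit figures above suggest a rough sizing calculation. The sketch below assumes that capacity scales linearly with the number of units purchased, which the slide does not state; treat the result as a planning estimate only.

```python
import math

REQUESTS_PER_UNIT = 3      # expedited retrievals per 5 minutes, per unit
THROUGHPUT_PER_UNIT = 150  # MB/s, per unit
USD_PER_UNIT_MONTH = 100


def provisioned_units(requests_per_5_min: int, throughput_mb_s: float) -> int:
    """Units needed to cover both the request rate and the throughput target."""
    by_requests = math.ceil(requests_per_5_min / REQUESTS_PER_UNIT)
    by_throughput = math.ceil(throughput_mb_s / THROUGHPUT_PER_UNIT)
    return max(1, by_requests, by_throughput)


def monthly_cost(units: int) -> int:
    """Monthly charge in USD for the given number of units."""
    return units * USD_PER_UNIT_MONTH


units = provisioned_units(requests_per_5_min=7, throughput_mb_s=100)
print(units, monthly_cost(units))
```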
  32. 32. Thank you!
  33. 33. Q&A