
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)



Amazon Glacier is not just for archiving or compliance use cases: it also accommodates customers who simply want to replace their on-premises long-term storage with a cost-efficient, durable cloud option from which they can easily and quickly access their data when they need to. This session will introduce newly launched features for Amazon Glacier, review the current service feature set, and share the global data center shutdown and storage strategy of Sony DADC New Media Solutions (NMS). NMS is Sony’s digital servicing division, providing global digital distribution, linear playout, and white-label OTT/commerce solutions for clients such as BBC Worldwide, NBCUniversal, Sony PlayStation, and Funimation Entertainment.

Hear from Andy Shenkler, NMS’s Chief Technology and Solutions Officer, as he talks about the key factors that drove the organization’s decision to move away from tape, go to the cloud, and get out of the infrastructure business overall. Learn about the impact on operational practices inside a world-class digital supply chain as NMS moved over 20 petabytes of data (over 1M hours of video) to the cloud and never looked back.


  1. Deep Dive on Amazon Glacier (STG302)
      • Mas Kubo, Senior Product Manager, Amazon Glacier
      • Andy Shenkler, EVP and Chief Solutions & Technology Officer, Sony DADC New Media Solutions (NMS)
      • November 30, 2016
      • © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Audio archives – SoundCloud
      • World’s leading social sound platform
      • Audio files transcoded and stored in multiple formats
      • Stores petabytes of data
      • Transcoded files served from Amazon S3
      • Originals moved to Amazon Glacier for long-term retention
  3. Patient data – Philips Healthcare
      • HealthSuite digital platform powered by AWS
      • 15 petabytes of patient data
      • Securely stored for decades (beyond the lifetime of patients)
      • Uses HIPAA-eligible AWS services
  4. Tape replacement – King County
      • Most populous county in Washington State
      • Replaced tape solution for backups from 17 agencies
      • Meets compliance requirements
      • Saved $1MM in the first year; no more tape refresh or management churn
  5. Storage services and data transfer options (diagram)
      • Data transfer, batches and streams: AWS Direct Connect; AWS Snowball, Snowball Edge, Snowmobile; third-party connectors; Amazon S3 Transfer Acceleration; AWS Storage Gateway; Amazon Kinesis Firehose
      • File: Amazon EFS
      • Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)
      • Object: Amazon S3, Amazon Glacier
  6. Data storage demand
      • Media assets (4K, 8K)
      • Healthcare/life sciences
      • Financial services
      • Regulated industries
      • Oil and gas/geospatial
      • Digital preservation
      • Long-term backups
      • Logs
      Archive: secure and durable, low cost, flexible data access, compliant
  7. Amazon Glacier
      • Extremely low-cost archive storage service, starting at $0.004 per GB per month
      • New! Three retrieval options ranging from minutes to hours (covered later)
      • 99.999999999% durability (5–6 orders of magnitude higher than 2 copies of tape)
      • All data is encrypted at rest
      • Features: compliance, data management, cost management, audit logging
  8. Amazon Glacier
      • Metered usage: pay as you go
      • No capital investment
      • No commitment
      • No risky capacity planning
      • Avoid risks of physical media handling
      • Control your geographic locality for performance and compliance
  9. Key terms and concepts
      • Vaults – container for archives, up to 1,000 vaults per account per region
      • Archives – basic unit, write-once, 40 TB max, unlimited archives
      • Inventory – cold index of archives, refreshed every 24 hours
      • Access – three ways to access Amazon Glacier
      • Uploads – multipart, lifecycle, cost optimizations, AWS Snowball
      • Data management – Vault Lock, tagging, audit logs
      • Retrievals – retrieval policies, range retrievals, new retrieval features
  10. Accessing Amazon Glacier
      1. Direct Amazon Glacier API/SDK
      2. Amazon S3 lifecycle integration
      3. Third-party tools and gateways (e.g., FastGlacier)
  11. Uploading data: Internet or sneaker-net
      • AWS Direct Connect: dedicated bandwidth between your site and AWS
      • Internet: transfer data in a secure SSL tunnel over the public Internet
      • AWS Import/Export (AWS Snowball): physical transfer of media into and out of AWS
  12. Uploading data: archive descriptions
      • Use the archive description field for metadata
      • If the local index is corrupted or destroyed, use archive descriptions to reconstruct critical mappings
      • For example, create an index entry and add its primary key to the archive description on upload
      Local index entry: Primary key: 12345 | Description: 2014Audit | Dept: FinanceDept | ArchiveID: 9FG23…
      UploadArchive(data, ArchiveDescription="12345, 2014Audit, FinanceDept") -> ArchiveID = 9FG23…
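The description-as-backup idea on slide 12 can be sketched in Python. This assumes descriptions are written as comma-separated `primary_key, description, dept` strings, as in the slide's example; `make_description`, `rebuild_index`, and `IndexEntry` are hypothetical helpers, not part of any AWS SDK. Glacier limits an archive description to 1,024 printable ASCII characters, so the sketch checks that before upload. Rebuilding assumes you have the `ArchiveId` and `ArchiveDescription` fields that a vault inventory (JSON format) reports for each archive.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_DESCRIPTION_LEN = 1024  # Glacier's limit on ArchiveDescription length


@dataclass
class IndexEntry:
    primary_key: str
    description: str
    dept: str
    archive_id: str


def make_description(primary_key: str, description: str, dept: str) -> str:
    """Encode critical index fields into an archive description (hypothetical format)."""
    text = ", ".join([primary_key, description, dept])
    if len(text) > MAX_DESCRIPTION_LEN:
        raise ValueError("archive description exceeds 1,024 characters")
    if not all(32 <= ord(ch) <= 126 for ch in text):
        raise ValueError("archive description must be printable ASCII")
    return text


def rebuild_index(inventory_archives: list[dict]) -> dict[str, IndexEntry]:
    """Reconstruct primary key -> IndexEntry from vault inventory records."""
    index: dict[str, IndexEntry] = {}
    for record in inventory_archives:
        parts = [p.strip() for p in record["ArchiveDescription"].split(",")]
        if len(parts) != 3:
            continue  # not written by make_description; skip it
        key, description, dept = parts
        index[key] = IndexEntry(key, description, dept, record["ArchiveId"])
    return index
```

A plain comma-separated format breaks if any field can itself contain a comma; a real scheme would escape fields or store a compact JSON string instead.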
  13. Uploading data: optimizing costs
      • Every archive has 32 KB of associated overhead, and some operations are charged per request
      • For a 3.2 MB archive, overhead adds ~1% to cost
      • For a 1 KB archive, 97% of the cost would go to overhead
      • Solution is aggregation: recommended minimum archive size is on the order of megabytes
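The percentages on slide 13 follow from a simple ratio: overhead divided by (archive size plus overhead). The sketch below reproduces them, assuming the 32 KB overhead is billed like the data itself and treating 3.2 MB as 3,200 KB; it ignores the per-request charges the slide also mentions. The function names are illustrative only.

```python
OVERHEAD_KB = 32  # per-archive overhead quoted on the slide


def overhead_fraction(archive_kb: float, overhead_kb: float = OVERHEAD_KB) -> float:
    """Share of billed storage that is per-archive overhead."""
    return overhead_kb / (archive_kb + overhead_kb)


def min_archive_kb(max_fraction: float, overhead_kb: float = OVERHEAD_KB) -> float:
    """Smallest archive size that keeps overhead at or below max_fraction."""
    # Solve overhead / (size + overhead) <= f  =>  size >= overhead * (1 - f) / f
    return overhead_kb * (1 - max_fraction) / max_fraction
```

`overhead_fraction(1)` is about 0.97 and `overhead_fraction(3200)` about 0.01, matching the slide; `min_archive_kb(0.01)` gives 3,168 KB, which is why the recommendation lands at archives of a few megabytes or more.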
  14. Uploading data: aggregating archives
      [Diagram: File 1, File 2, File 3, … are combined into a single archive that holds each file with its checksum and metadata, plus an index/directory; a local index records each file’s offset (File 1 offset, File 2 offset, File 3 offset, …) and checksum (Checksum 1, Checksum 2, Checksum 3, …).]
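The aggregation diagram on slide 14 can be sketched as follows: small files are concatenated into one archive body while a local index records each file's offset, length, and SHA-256 checksum. The helper names and the in-memory `dict` index are assumptions for illustration. This sketch keeps only the local index; embedding a copy of the directory inside the archive, as the diagram also shows, would let the archive describe itself if the local index were lost.

```python
from __future__ import annotations

import hashlib
import io


def aggregate(files: dict[str, bytes]) -> tuple[bytes, dict[str, dict]]:
    """Concatenate small files into one archive body and build a local index."""
    buf = io.BytesIO()
    index: dict[str, dict] = {}
    for name, data in files.items():
        index[name] = {
            "offset": buf.tell(),  # where this file starts inside the archive
            "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        buf.write(data)
    return buf.getvalue(), index


def extract(archive: bytes, index: dict[str, dict], name: str) -> bytes:
    """Slice one file back out of the archive and verify its checksum."""
    entry = index[name]
    data = archive[entry["offset"]: entry["offset"] + entry["length"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for {name}")
    return data
```

The offsets in the index are also what make range retrievals useful later: they tell you which bytes of the archive hold a given file.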
  15. Best practices: multipart uploads
      Improve throughput and reliability, and get idempotency
      1. InitiateMultipartUpload(partSize) → uploadId
      2. UploadPart(uploadId, data), with parts uploaded in parallel
      3. CompleteMultipartUpload(uploadId) → archiveId
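Two pieces of local work sit behind the three calls on slide 15: choosing a part size and computing SHA-256 tree hashes. Per the Glacier documentation, a part size must be 1 MiB multiplied by a power of two (up to 4 GiB), an upload can have at most 10,000 parts, and each part, as well as the complete archive, is identified by a SHA-256 tree hash (1 MiB chunks hashed, then hashed pairwise up to a single root). The sketch below is a local helper under those assumptions; it does not call AWS.

```python
from __future__ import annotations

import hashlib

MIB = 1024 * 1024
MAX_PARTS = 10_000
MAX_PART_SIZE = 4 * 1024 * MIB  # 4 GiB


def choose_part_size(archive_size: int) -> int:
    """Smallest valid part size (1 MiB x power of two) that fits in 10,000 parts."""
    part_size = MIB
    while part_size * MAX_PARTS < archive_size:
        part_size *= 2
        if part_size > MAX_PART_SIZE:
            raise ValueError("archive too large for a multipart upload")
    return part_size


def part_ranges(archive_size: int, part_size: int) -> list[tuple[int, int]]:
    """Inclusive (start, end) byte ranges; every part but the last is full size."""
    return [(start, min(start + part_size, archive_size) - 1)
            for start in range(0, archive_size, part_size)]


def tree_hash(data: bytes) -> bytes:
    """SHA-256 tree hash: hash each 1 MiB chunk, then hash pairs level by level."""
    level = [hashlib.sha256(data[i:i + MIB]).digest()
             for i in range(0, len(data), MIB)] or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        paired = [hashlib.sha256(level[i] + level[i + 1]).digest()
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # an odd digest is promoted unchanged
        level = paired
    return level[0]
```

Because each part is independently addressed by its byte range, a failed `UploadPart` can simply be retried, and parts can be sent in parallel, which is where the throughput, reliability, and idempotency benefits on the slide come from.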
  16. Amazon Glacier: Amazon S3 lifecycle policies
      • Seamlessly move data from Amazon S3 to Amazon Glacier
      • Automated lifecycle rules
      • Transition based on object age
  17. Amazon Glacier: Amazon S3 lifecycle policies
      • Object-level tagging for S3 objects
      • Apply lifecycle rules based on object tags
      • Example: transition objects to Amazon Glacier when they are 1 year old and have the object tags ‘Project=Delta’ and ‘Data type=HPI’
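The tag-based example on slide 17 maps onto an S3 lifecycle configuration. The dictionary below follows the shape of the `LifecycleConfiguration` argument to boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`; the rule ID is a hypothetical name. `would_transition` is a hypothetical local helper that only approximates S3's evaluation (S3 counts age from object creation and applies transitions asynchronously).

```python
from __future__ import annotations


def tag_lifecycle_rule(days: int, tags: dict[str, str], rule_id: str) -> dict:
    """Lifecycle configuration: move tagged objects to Glacier after `days`."""
    tag_list = [{"Key": k, "Value": v} for k, v in tags.items()]
    if not tag_list:
        raise ValueError("at least one tag is required")
    # A single tag can go straight into Filter; two or more need an "And" block.
    flt = {"Tag": tag_list[0]} if len(tag_list) == 1 else {"And": {"Tags": tag_list}}
    return {
        "Rules": [{
            "ID": rule_id,
            "Filter": flt,
            "Status": "Enabled",
            "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
        }]
    }


def would_transition(config: dict, age_days: int, object_tags: dict[str, str]) -> bool:
    """Local approximation of whether a rule above would apply to an object."""
    for rule in config["Rules"]:
        flt = rule["Filter"]
        wanted = flt["And"]["Tags"] if "And" in flt else [flt["Tag"]]
        tags_match = all(object_tags.get(t["Key"]) == t["Value"] for t in wanted)
        old_enough = age_days >= rule["Transitions"][0]["Days"]
        if rule["Status"] == "Enabled" and tags_match and old_enough:
            return True
    return False
```

For the slide's example, `tag_lifecycle_rule(365, {"Project": "Delta", "Data type": "HPI"}, "delta-hpi-to-glacier")` produces a rule that matches only objects carrying both tags.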
  18. Management features: vault tagging
  19. Management features: audit logging via AWS CloudTrail
      • Enable AWS CloudTrail in the console
      • Control plane events: vault activities
      • Data plane events: archive activities
  20. Management features: vault access policies
      • Manage access to a vault in a single location
        – Single AWS Identity and Access Management (IAM) policy
        – Grant/revoke access to internal business units/teams
        – “Marketing_Vault” has an access policy that is distinct from “DevOps_Vault”
      • Easily manage cross-account access for your business partners
        – Simply add a section for your business partner in the same policy
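A vault access policy like the one described on slide 20 is a resource-based IAM policy attached to a single vault. The sketch builds one statement that lets a partner account retrieve data from `Marketing_Vault`; the region, both account IDs, and the choice of actions are hypothetical. The resulting JSON string is what boto3's `set_vault_access_policy(vaultName=..., policy={"Policy": ...})` expects.

```python
import json


def vault_access_policy(region: str, account_id: str, vault: str,
                        partner_account_id: str) -> str:
    """Vault access policy granting a partner account read-style access to one vault."""
    vault_arn = f"arn:aws:glacier:{region}:{account_id}:vaults/{vault}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "partner-read-access",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{partner_account_id}:root"},
                "Action": [
                    "glacier:InitiateJob",
                    "glacier:GetJobOutput",
                    "glacier:DescribeVault",
                    "glacier:ListJobs",
                ],
                "Resource": vault_arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Granting an internal team or another partner means appending another statement to the same document, which is the "single location" benefit the slide describes.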
  21. Management features: Vault Lock
      • Non-overwritable, non-erasable records
      • Time-based retention with the “ArchiveAgeInDays” control
      • Policy lockdown (strong governance)
      • Legal hold with vault-level tags
      • Configure optional designated third-party access and grant temporary access
  22. Management features: Vault Lock – two-step locking
      • InitiateVaultLock
        – Puts a retention policy into effect for testing (in-progress state)
        – Returns a unique lock ID (expires after 24 hours)
      • AbortVaultLock
        – Deletes an in-progress policy
        – Lets you modify a policy before locking it down
      • CompleteVaultLock
        – Locks down the vault using the corresponding lock ID
        – A Vault Lock policy cannot be aborted once locked
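The two-step locking flow on slide 22 behaves like a small state machine. The class below is a hypothetical toy model, not an AWS client: it mirrors the InitiateVaultLock, AbortVaultLock, and CompleteVaultLock transitions, including the 24-hour lock ID expiry, after which (per the Glacier documentation) the in-progress policy is removed.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

LOCK_ID_TTL = timedelta(hours=24)


class VaultLockSimulator:
    """Toy model of the two-step Vault Lock flow; it does not call AWS."""

    def __init__(self) -> None:
        self.state = "Unlocked"
        self.policy: Optional[str] = None
        self._lock_id: Optional[str] = None
        self._initiated_at: Optional[datetime] = None

    def initiate(self, policy: str, now: datetime) -> str:
        """InitiateVaultLock: policy takes effect in the in-progress state."""
        if self.state == "Locked":
            raise RuntimeError("vault is already locked")
        self.state, self.policy = "InProgress", policy
        self._lock_id, self._initiated_at = uuid.uuid4().hex, now
        return self._lock_id

    def abort(self) -> None:
        """AbortVaultLock: only possible before the lock is completed."""
        if self.state != "InProgress":
            raise RuntimeError("only an in-progress policy can be aborted")
        self.state, self.policy, self._lock_id = "Unlocked", None, None

    def complete(self, lock_id: str, now: datetime) -> None:
        """CompleteVaultLock: requires a matching, unexpired lock ID."""
        if self.state != "InProgress":
            raise RuntimeError("no in-progress policy to lock")
        if now - self._initiated_at > LOCK_ID_TTL:
            self.abort()  # expired: the in-progress policy is dropped
            raise RuntimeError("lock ID expired; initiate again")
        if lock_id != self._lock_id:
            raise RuntimeError("wrong lock ID")
        self.state = "Locked"  # permanent: abort() now raises
```

The test-then-lock loop on the slide corresponds to calling `initiate`, checking behavior, calling `abort` to revise, and only then calling `complete` within 24 hours.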
  23. Management features: Vault Lock – legal hold with vault-level tags
      • Set up a legal hold tag
        – Configure a vault-level tag “LegalHold”
        – Set its initial value to “False”
      • Add a compliance control for legal hold in a Vault Lock policy
        – Deny the delete archive operation
        – From anybody (root, administrators, users, business partners)
        – When the LegalHold tag = “True”
      • Place or lift a legal hold by updating the tag value
  24. Management features: Vault Lock – example control: legal hold
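The legal-hold control described on slides 23–24 can be expressed as a Vault Lock policy statement. The sketch assumes Glacier's `glacier:ResourceTag/<key>` condition key and matches the tag value "True" exactly as the slide writes it (`StringEquals` is case-sensitive); the vault ARN is a placeholder. `delete_denied` is a hypothetical local check of this one condition, not a full IAM policy evaluator.

```python
from __future__ import annotations


def legal_hold_policy(vault_arn: str) -> dict:
    """Vault Lock policy: deny DeleteArchive to everyone while LegalHold is "True"."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-delete-under-legal-hold",
            "Principal": "*",  # applies to root, admins, users, and partners alike
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": vault_arn,
            "Condition": {"StringEquals": {"glacier:ResourceTag/LegalHold": "True"}},
        }],
    }


def delete_denied(policy: dict, vault_tags: dict[str, str]) -> bool:
    """Rough local check of the StringEquals tag condition above."""
    for stmt in policy["Statement"]:
        cond = stmt["Condition"]["StringEquals"]
        tag_matches = all(
            vault_tags.get(key.split("/", 1)[1]) == value for key, value in cond.items()
        )
        if stmt["Effect"] == "Deny" and tag_matches:
            return True
    return False
```

Once locked (for example via boto3's `initiate_vault_lock` with the JSON-serialized policy, then `complete_vault_lock`), a hold is placed or lifted only by changing the vault tag, e.g. with `add_tags_to_vault(vaultName=..., Tags={"LegalHold": "True"})`.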
  25. Management features: Vault Lock best practices
      • Map one vault to a single retention range
        – Group regulatory data by retention: 1-year vault, 6-year vault, etc.
      • Create a new vault and lock it before storing production data
        – Enforce the full ArchiveAgeInDays on all new archives
        – Leave no “gap” on existing archives
      • Thoroughly test a Vault Lock policy before locking it down (Abort/Initiate)
      • Implement only the most restrictive controls with Vault Lock
        – Leave the flexible controls to the vault access policy
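The "one vault per retention range" practice on slide 25 can be sketched as a routing table plus one retention statement per vault. The vault names and the two-entry layout are hypothetical, and six years is approximated as 6 × 365 days (ignoring leap days). The statement uses the `glacier:ArchiveAgeInDays` condition key to deny deletion of archives younger than the vault's retention period.

```python
import bisect

# Hypothetical layout: (retention in days, vault name), sorted by retention.
RETENTION_VAULTS = [(365, "retain-1y"), (6 * 365, "retain-6y")]


def vault_for(required_days: int) -> str:
    """Pick the vault with the shortest retention that still covers the requirement."""
    days = [d for d, _ in RETENTION_VAULTS]
    i = bisect.bisect_left(days, required_days)
    if i == len(RETENTION_VAULTS):
        raise ValueError(f"no vault retains data for {required_days} days")
    return RETENTION_VAULTS[i][1]


def retention_statement(vault_arn: str, days: int) -> dict:
    """Deny DeleteArchive for archives younger than `days` (ArchiveAgeInDays control)."""
    return {
        "Sid": f"deny-delete-before-{days}-days",
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": vault_arn,
        "Condition": {"NumericLessThan": {"glacier:ArchiveAgeInDays": str(days)}},
    }
```

Because each vault carries exactly one retention rule, a record routed by `vault_for` never ends up under a stricter or looser lock than it needs.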
  26. Management features: Vault Lock – third-party assessment
      • Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c)
  27. Data retrievals: basic concepts
      1. Initiate job (ArchiveId: AE99F…, Vault: Films) → Job ID
      2. 3–5 hours for job completion
      3. Job completion notification
      4. Download output
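The four steps on slide 27 map onto three Glacier API calls. The sketch assumes `glacier` behaves like a boto3 Glacier client (`boto3.client("glacier")`) and uses its `initiate_job`, `describe_job`, and `get_job_output` methods; polling with `describe_job` stands in for the SNS completion notification, and the 15-minute default poll interval is an arbitrary choice.

```python
import time


def retrieve_archive(glacier, vault: str, archive_id: str,
                     tier: str = "Standard", poll_seconds: float = 900) -> bytes:
    """Initiate an archive-retrieval job, wait for it, and download the output."""
    # Step 1: initiate the job and keep its ID.
    job = glacier.initiate_job(
        accountId="-",
        vaultName=vault,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier},
    )
    job_id = job["jobId"]
    # Steps 2-3: wait for completion (an SNS topic could notify instead).
    while True:
        status = glacier.describe_job(accountId="-", vaultName=vault, jobId=job_id)
        if status["Completed"]:
            break
        time.sleep(poll_seconds)
    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"job {job_id} ended with {status['StatusCode']}")
    # Step 4: download the job output.
    output = glacier.get_job_output(accountId="-", vaultName=vault, jobId=job_id)
    return output["body"].read()
```

Taking the client as a parameter keeps the function testable with a stub and lets you choose `tier="Expedited"` or `"Bulk"` per request.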
  28. Data retrievals: restoring via lifecycle (diagram, steps 1–2)
  29. Data retrievals: restoring via lifecycle (diagram, steps 3–4)
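Slides 28–29 show the lifecycle restore steps only as diagrams. As a sketch of the S3 side of that flow, the helpers below assume `s3` behaves like a boto3 S3 client: `restore_object` requests a temporary copy of an object that lifecycle moved to Amazon Glacier, and `head_object` reports progress in its `Restore` field. The helper names and the 7-day default are illustrative choices.

```python
def request_restore(s3, bucket: str, key: str, days: int = 7,
                    tier: str = "Standard") -> None:
    """Ask S3 for a temporary copy of an object that lifecycle moved to Glacier."""
    s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": tier}},
    )


def restore_finished(s3, bucket: str, key: str) -> bool:
    """True once HeadObject reports the restore is no longer in progress."""
    header = s3.head_object(Bucket=bucket, Key=key).get("Restore")
    # 'ongoing-request="true"' while restoring;
    # 'ongoing-request="false", expiry-date="..."' once the copy is readable.
    return header is not None and 'ongoing-request="false"' in header
```

The restored copy stays readable through a normal GET only for the requested number of days; the object itself remains in the Glacier storage class.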
  30. Data retrievals: data retrieval policies
      • Provide transparency and cost control for data retrievals
      • Govern all retrieval activities for an account in a region
      • Synchronously accept or reject each retrieval request
      • Account for in-flight retrieval operations
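A data retrieval policy is set per account and region with Glacier's SetDataRetrievalPolicy operation, whose rules use a strategy such as `BytesPerHour`, `FreeTier`, or `None`; the first function builds that document in the shape boto3's `set_data_retrieval_policy(Policy=...)` accepts. `RetrievalGate` is a deliberately simplified, hypothetical model of the behavior on slide 30: it treats the hourly limit as a cap on in-flight bytes and accepts or rejects each request immediately. Glacier's real accounting is more detailed and signals rejection with a `PolicyEnforcedException`.

```python
from __future__ import annotations


def bytes_per_hour_policy(limit: int) -> dict:
    """Policy document in the shape SetDataRetrievalPolicy accepts."""
    return {"Rules": [{"Strategy": "BytesPerHour", "BytesPerHour": limit}]}


class RetrievalGate:
    """Simplified model: admit a request only if in-flight bytes stay under the limit."""

    def __init__(self, policy: dict) -> None:
        self.limit = policy["Rules"][0]["BytesPerHour"]
        self.in_flight: dict[str, int] = {}

    def request(self, job_id: str, size: int) -> bool:
        """Synchronous decision that counts retrievals still in flight."""
        if sum(self.in_flight.values()) + size > self.limit:
            return False  # rejected immediately rather than queued
        self.in_flight[job_id] = size
        return True

    def finish(self, job_id: str) -> None:
        """A completed retrieval no longer counts against the limit."""
        self.in_flight.pop(job_id, None)
```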
  31. Data retrievals: range retrievals
      [Diagram: the same aggregated-archive layout as slide 14; the local index of file offsets and checksums identifies the byte range that holds each file, so only that range needs to be retrieved.]
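Range retrievals pair naturally with the local index of offsets. According to the Glacier InitiateJob documentation, a `RetrievalByteRange` ("StartByte-EndByte") must be megabyte-aligned: the start divisible by 1 MiB, and the end plus one either divisible by 1 MiB or equal to the archive size. The sketch widens a file's span to such a range and then cuts the file back out of the downloaded bytes; both helper names are hypothetical.

```python
MIB = 1024 * 1024


def aligned_range(offset: int, length: int, archive_size: int) -> str:
    """Megabyte-aligned RetrievalByteRange covering [offset, offset + length)."""
    if length <= 0 or offset < 0 or offset + length > archive_size:
        raise ValueError("requested bytes fall outside the archive")
    start = (offset // MIB) * MIB                       # round start down
    end_exclusive = -(-(offset + length) // MIB) * MIB  # round end up
    end = min(end_exclusive, archive_size) - 1          # may stop at archive end
    return f"{start}-{end}"


def slice_file(downloaded: bytes, byte_range: str, offset: int, length: int) -> bytes:
    """Cut the wanted file out of the larger aligned range that was downloaded."""
    start = int(byte_range.split("-")[0])
    return downloaded[offset - start: offset - start + length]
```

The string from `aligned_range` would go in the `RetrievalByteRange` job parameter; the extra bytes retrieved are at most just under 2 MiB per file.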
  32. Data retrievals: expedited and bulk retrievals
      • Three flexible and powerful retrieval options to access any of your Amazon Glacier data

                            Expedited            Standard                  Bulk
      Data access time      1–5 minutes          3–5 hours                 5–12 hours
      Data retrievals       $0.03 per GB         $0.01 per GB              $0.0025 per GB
      Retrieval requests    $0.01 per request    $0.05 per 1,000 requests  $0.025 per 1,000 requests

      • Expedited: designed for occasional urgent access to a small number of archives
      • Standard: low-cost option for retrieving data in just a few hours
      • Bulk: lowest-cost option, optimized for large retrievals of up to petabytes of data in 12 hours
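The pricing table on slide 32 can be turned into a small cost comparison: per-GB retrieval charge plus per-request charge for each tier. The prices are copied from the slide (2016) and will differ from current pricing; the function names are illustrative, and the model ignores storage charges and access-time requirements.

```python
# Prices as listed on the slide (2016); check current pricing before relying on them.
TIERS = {
    "Expedited": {"per_gb": 0.03, "per_request": 0.01},
    "Standard": {"per_gb": 0.01, "per_request": 0.05 / 1000},
    "Bulk": {"per_gb": 0.0025, "per_request": 0.025 / 1000},
}


def retrieval_cost(tier: str, gigabytes: float, requests: int) -> float:
    """Data retrieval charge plus per-request charge for one tier, in USD."""
    price = TIERS[tier]
    return gigabytes * price["per_gb"] + requests * price["per_request"]


def cheapest(gigabytes: float, requests: int) -> str:
    """Tier with the lowest cost, ignoring how quickly the data is needed."""
    return min(TIERS, key=lambda tier: retrieval_cost(tier, gigabytes, requests))
```

For 100 GB in 10 requests this gives about $3.10 (Expedited), $1.0005 (Standard), and $0.25025 (Bulk), so the choice comes down to whether the data is needed in minutes, hours, or within half a day.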
  33. Accelerated Media Lifecycle @SonyDADCNMS
  34. “If physical deliveries can happen within one hour based on unpredictable requests, surely we are able to exceed such expectations digitally”
  35. Our migration
      The challenge:
      • Seamlessly migrate a platform that enables content delivery across all devices and more than 1,200 distribution points worldwide
      • Store 20 petabytes of motion picture and television content
      • Equating to more than 1,000,000 hours of content
      • At a growth rate of ~1 petabyte every quarter
      Desired goals:
      • One-hour delivery turnaround time
      • Agile, scalable infrastructure with a predictable cost model
      • Investing in innovation vs. hardware
  36. On-premises asset storage workflow (diagram)
  37. AWS cloud-based asset storage workflow (diagram, centered on Amazon Glacier)
  38. Amazon Glacier vs. on-premises cost comparison
  39. Thank you!
  40. Storage “Office Hours”: Meet the People who Build AWS Storage
      • Who: Lead Software Development Engineers, Architects, and Technical PMs
      • Where: Storage Booth Walk-up Bar
      • When: Exhibit hours (Tues 5–7pm, Wed & Thurs 10:30am–6:00pm)
      • What: Architecture best practices, code reviews, feature requests
  41. Remember to complete your evaluations!