AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier

Join our webinar to learn more about how to build a cost-effective archive application using Amazon Glacier, an extremely low-cost, secure, highly durable, and easy-to-use storage service in the AWS cloud.

We will explain how Amazon Glacier works and walk through some best practices to get the most out of the service.

We will also highlight how to choose between Amazon Glacier and Amazon S3’s Glacier storage option.

Learn more: http://aws.amazon.com/glacier/

Transcript

  • 1. Archiving in the Cloud: Best Practices for Amazon Glacier. Colin Lazier & Henry Zhang
  • 2. What We’ll Cover Today: Overview of Amazon Glacier; Amazon Glacier Key Concepts; Key Use Cases and Benefits; Best Practices with Amazon Glacier; Q&A
  • 3. Overview of Amazon Glacier
  • 4. With Amazon Glacier, You Can: achieve extremely low storage costs for archive data; pay only for what you use; stop maintaining your own physical storage infrastructure; increase durability and geographic redundancy; secure your data; access on-demand computing with Amazon EC2.
  • 5. What is Archival Data? Most stored data is infrequently accessed (cold data). It is often older data that is still important for future reference. It is typically long-lived (months or years). There are business and regulatory reasons to retain it.
  • 6. What is Amazon Glacier? An extremely low-cost archive storage service. Allows you to retrieve any amount of data within 3-5 hours. Provides high-durability storage. Makes it easy to retain data safely and securely for months, years, or decades.
  • 7. Benefits with Amazon Glacier. Low cost: As little as $0.01/GB/month with no up-front capital commitments. Durable: Designed to provide an average annual durability of 99.999999999% per archive. Flexible: Store any amount of data on demand. Eliminate the need for capacity planning. Secure: Leverage AWS’ robust security platform. Control access to your data. Simple: Eliminate your operational overhead. Focus your resources on your core business. Use multiple services: Easily leverage other AWS services once your data is in the AWS cloud.
  • 8. Customer Data Archiving Examples. Enterprise Archives: Enterprise information archiving includes archiving email, business documents, and other unstructured content. It is driven by business needs, compliance requirements, and the need to reduce primary storage costs. Media Archives: Media companies’ core assets (books, movies, music, TV, etc.) can grow to hundreds of petabytes. Amazon Glacier reduces the cost of storing these assets while simultaneously increasing the durability, ease of use, and accessibility of the content. Scientific Archives: Research and scientific organizations, such as pharmaceutical and biotech companies, as well as universities, store many large but rarely accessed data sets.
  • 9. Amazon Glacier Key Concepts
  • 10. High-level Amazon Glacier Architecture (diagram). Archive application (search, policy-based data management, eDiscovery) with an index of your archived data. The application sends and receives data to and from Amazon Glacier via HTTP / REST APIs or AWS Import/Export. Amazon IAM controls access to your data.
  • 11. Amazon Glacier Concepts. Archives: An archive is a durably stored block of information. You store your data in Amazon Glacier as archives. You may upload a single file as an archive, but your request costs will be lower if you aggregate your data. TAR and ZIP are common formats that customers use to aggregate multiple files into a single file before uploading to Amazon Glacier. Vaults: You use vaults to organize the data you store in Amazon Glacier. Each archive is stored in a vault of your choice. You may control access to your data by setting vault-level access policies.
  • 12. Uploading Data to Amazon Glacier (diagram). (1) Create Vault. (2) Upload Archives. (3) Configure Access Policies (optional) via Amazon Identity and Access Management. Retrieve Archives (archives are retrieved 3-5 hours after being requested): Initiate Job, Track Job, Download Job Output.
  • 13. Retrieving Data from Amazon Glacier (same diagram). (1) Create Vault. (2) Upload Archives. (3) Configure Access Policies (optional) via Amazon Identity and Access Management. Retrieve Archives (archives are retrieved 3-5 hours after being requested): Initiate Job, Track Job, Download Job Output.
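
The three retrieval steps on this slide (Initiate Job, Track Job, Download Job Output) can be sketched roughly as follows. This is an illustration rather than code from the webinar: it assumes a boto3-style Glacier client is passed in as `glacier`, and the function name `retrieve_archive`, the polling interval, and the injectable `sleep` parameter are hypothetical choices made for this example.

```python
import time


def retrieve_archive(glacier, vault_name, archive_id,
                     poll_seconds=900, sleep=time.sleep):
    """Initiate a retrieval job, wait for it to finish, return the archive bytes.

    `glacier` is assumed to behave like boto3.client("glacier").
    """
    # Step 1: Initiate Job. Glacier starts staging the archive for download.
    job = glacier.initiate_job(
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )
    job_id = job["jobId"]

    # Step 2: Track Job. Retrieval typically takes hours, so poll slowly.
    while True:
        status = glacier.describe_job(vaultName=vault_name, jobId=job_id)
        if status["Completed"]:
            break
        sleep(poll_seconds)
    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"retrieval job {job_id} did not succeed")

    # Step 3: Download Job Output. The staged data is returned as a stream.
    output = glacier.get_job_output(vaultName=vault_name, jobId=job_id)
    return output["body"].read()
```

Polling is the simplest way to track a job; the deck also mentions notifications via Amazon SNS, which would avoid the polling loop.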
  • 14. Sending / Retrieving Data: Glacier REST-based APIs to send and retrieve data; Direct Connect; Amazon S3 lifecycle archival to Amazon Glacier.
  • 15. Additional Amazon Glacier / AWS Concepts. Vault Inventory: For a real-time view of the contents of your vaults, refer to your own index. For disaster recovery purposes, in case you lose or corrupt your index, Amazon Glacier maintains an inventory of all your archives in a vault. The vault inventory is updated approximately once a day. Amazon Simple Notification Service (Amazon SNS): Amazon SNS is a web service that makes it easy to set up, operate, and send notifications from the cloud.
  • 16. Amazon Glacier Key Concepts (diagram). AWS Management Console operations (also accessible via Amazon Glacier APIs or SDKs): (1) Create Vault; (2) Configure Access Policies (optional) via Amazon Identity and Access Management, and Configure Notification Policies (optional) via Amazon Simple Notification Service. Amazon Glacier API operations (also accessible via Amazon Glacier SDKs): (3) Upload Archives, Download Archives, and Retrieve Archives (archives retrieved 3-5 hours after being requested): Initiate Job, Track Job, Download Job Output. Notifications are sent via Amazon SNS to your application.
  • 17. Best Practices with Amazon Glacier
  • 18. Aggregate Large Numbers of Smaller Files: reduce overhead costs; reduce request costs; find the ideal archive size for your use case.
  • 19. Uploading Large Files – Multipart Upload: Internet weather; distance between your application and Amazon Glacier; cost of retrying failed transmissions; improve upload throughput.
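
One way to aggregate many small files before upload is to pack them into a single TAR, as the deck suggests earlier. The sketch below uses Python's standard `tarfile` module; the function name `bundle_files` and the shape of the returned index are assumptions made for illustration, not something the slides specify.

```python
import tarfile
from pathlib import Path


def bundle_files(paths, bundle_path):
    """Pack many small files into one gzip-compressed TAR and return an index.

    The index records what went into the bundle so the application can later
    find a file without retrieving the archive. Files are stored under their
    base names, so this sketch assumes the names are unique.
    """
    index = []
    with tarfile.open(bundle_path, "w:gz") as bundle:
        for path in map(Path, paths):
            bundle.add(path, arcname=path.name)
            index.append({"name": path.name, "bytes": path.stat().st_size})
    return index
```

The resulting bundle would then be uploaded as one archive, so the per-request and per-archive overhead is paid once instead of once per file.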
  • 20. Multipart Upload: Improve speed and reliability with multipart upload. (1) InitiateMultipartUpload(partSize) -> uploadId; (2) UploadPart(uploadId, data); (3) CompleteMultipartUpload(uploadId) -> archiveId.
  • 21. Optimize Data Retrieval and Download. Retrieval vs. download. Ranged retrieval: reduce cost and control the retrieval rate; retrieve only what you need. Ranged download (Get): improve download speed; be aware of your download speed, as data is only staged for 24 hours.
  • 22. Ranged Retrieval Example: a 12 GB archive. Retrieved using a single 4-hour job = 3 GB/hour peak retrieval. Retrieved over 24 hours using 6 consecutive jobs = 0.5 GB/hour peak retrieval.
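
The three multipart calls on this slide might look like the following sketch against a boto3-style Glacier client. Several things here are assumptions for illustration: the client is passed in as `glacier`, the whole payload is held in memory as `data`, and the helper names `tree_hash` and `multipart_upload` are made up for this example. Glacier expects a SHA-256 tree hash checksum per part and for the whole archive, and a part size that is 1 MiB multiplied by a power of two.

```python
import hashlib

MIB = 1024 * 1024


def tree_hash(data):
    """SHA-256 tree hash: hash each 1 MiB chunk, then combine digests pairwise."""
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                paired.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                paired.append(level[i])  # an odd digest is promoted unchanged
        level = paired
    return level[0].hex()


def multipart_upload(glacier, vault_name, data, part_size=MIB):
    """Upload `data` in parts and return the new archive ID."""
    # Step 1: InitiateMultipartUpload(partSize) -> uploadId
    upload = glacier.initiate_multipart_upload(
        vaultName=vault_name, partSize=str(part_size))
    upload_id = upload["uploadId"]

    # Step 2: UploadPart(uploadId, data), once per part. Each part could be
    # retried on its own if its transmission fails.
    for start in range(0, len(data), part_size):
        part = data[start:start + part_size]
        end = start + len(part) - 1
        glacier.upload_multipart_part(
            vaultName=vault_name, uploadId=upload_id,
            range=f"bytes {start}-{end}/*", body=part,
            checksum=tree_hash(part))

    # Step 3: CompleteMultipartUpload(uploadId) -> archiveId
    done = glacier.complete_multipart_upload(
        vaultName=vault_name, uploadId=upload_id,
        archiveSize=str(len(data)), checksum=tree_hash(data))
    return done["archiveId"]
```

A real uploader would stream parts from disk and upload several in parallel; the sequential loop here only shows the order of the calls.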
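
The arithmetic in this example can be reproduced with a small planning helper. This is a sketch under stated assumptions: the function name `plan_ranged_retrieval` is hypothetical, 1 GB is treated as 2^30 bytes, and, as on the slide, each job's data is counted as retrieved over roughly 4 hours. Ranges are kept megabyte-aligned, which Glacier requires for ranged retrievals.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def plan_ranged_retrieval(archive_bytes, jobs, hours_per_job=4):
    """Split an archive into megabyte-aligned byte ranges, one per job.

    Returns (ranges, peak_gib_per_hour). Each range is an inclusive
    (start, end) pair that could be passed to a retrieval job as the
    string f"{start}-{end}".
    """
    total_mib = -(-archive_bytes // MIB)   # ceiling division
    mib_per_job = -(-total_mib // jobs)
    ranges = []
    start = 0
    while start < archive_bytes:
        end = min(start + mib_per_job * MIB, archive_bytes) - 1
        ranges.append((start, end))
        start = end + 1
    # The first range is never smaller than the others, so it sets the peak.
    largest = ranges[0][1] - ranges[0][0] + 1
    return ranges, largest / GIB / hours_per_job


# One job for the whole 12 GB archive versus six consecutive 2 GB jobs.
_, single_job_peak = plan_ranged_retrieval(12 * GIB, jobs=1)
six_ranges, six_job_peak = plan_ranged_retrieval(12 * GIB, jobs=6)
```

With these inputs, `single_job_peak` works out to 3.0 and `six_job_peak` to 0.5, matching the figures on the slide.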
  • 23. Amazon Glacier Benefits. Low cost: As little as $0.01/GB/month with no up-front capital commitments. Durable: Designed to provide an average annual durability of 99.999999999% per archive. Flexible: Store any amount of data on demand. Eliminate the need for capacity planning. Secure: Leverage AWS’ robust security platform. Control access to your data. Simple: Eliminate your operational overhead. Focus your resources on your core business. Use multiple services: Easily leverage other AWS services once your data is in the AWS cloud.
  • 24. Thank You. Q&A with Colin Lazier & Henry Zhang
  • 25. http://aws.amazon.com/glacier