S3 and Glacier
This is a presentation that I gave at the AWS Meetup in Ann Arbor, Michigan back in January. It recounts some experiences that I had while working on a project with RightBrain Networks that involved moving millions of small files around between S3, Glacier and an NFS NAS volume. A good time was had by all.



S3 and Glacier Presentation Transcript

  • 1. Glacier and S3 | Dave Thompson | AWS Meetup Michigan, Jan 2014
  • 2. Who the @#%^ is Dave Thompson? • DevOps/SRE/Systems guy from MI by way of San Francisco • Current Employer: MuleSoft Inc • Past Employers: Netflix, Domino’s Pizza, U of M • Also contributing to the madness at RBN
  • 3. … and what is he talking about? • Today, we’ll talk about a case study using Glacier with S3, and the various surprises that I encountered on the way.
  • 4. Act 1: A New Project
  • 5. Our Story So Far • Client’s datacenter is going dark in a few months. • Their app is data heavy… a little less than 1 BN small files.
  • 6. Our Story So Far (cont.) • Client has migrated app servers to EC2 • Data has been uploaded to S3
  • 7. Everything Goes According to Plan! • Files are uploaded to S3 • App updated to use S3 data
  • 8. Act 2: The Public Cloud Strikes Back
  • 9. Things take a dark turn… S3 latency is too high for the app.
  • 10. Enter RBN! The proposal: migrate the data from S3 to a cloud storage solution (Zadara), and archive the files to Glacier.
  • 11. Everything Goes According to Plan (Again)! • Files are copied to Zadara share • S3 lifecycle configured to archive objects to Glacier
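The lifecycle option mentioned in slide 11 can be expressed as a small piece of configuration. Here is a minimal sketch using boto3 (the talk predates boto3 and used boto); the bucket name and the 30-day transition window are assumptions, not details from the talk:

```python
def glacier_lifecycle_rule(days: int = 30) -> dict:
    """Build an S3 lifecycle configuration that transitions every
    object in a bucket to the GLACIER storage class after `days` days."""
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: match all objects
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }

# Applying it (requires AWS credentials; bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-client-data",
#     LifecycleConfiguration=glacier_lifecycle_rule(days=30),
# )
```

Note that once this rule is in place it applies to every object matching the filter, automatically and continuously, which is exactly how ~100MM small files ended up in Glacier in this story.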
  • 12. Except… the Zadara share becomes corrupted after the data is migrated.
  • 13. Amazon Glacier: a Primer • Glacier is an archival solution provided by AWS. • It’s closely integrated with S3. • Use cases for Glacier and S3 are different, though…
  • 14. S3 vs Glacier • Unlike an S3 GET, a Glacier RETRIEVAL takes ~4 hours • UPLOAD and RETRIEVAL API requests are 10x more expensive on Glacier than comparable S3 requests • Bandwidth charges for RETRIEVAL requests apply, even inside us-east-1
  • 15. S3 vs Glacier (cont.) • This means that Glacier is optimized for compressed archives (i.e. tarball data) • S3 is about equally suited for smaller or larger files • Automatically archiving S3 objects to Glacier can thus lead to great sadness.
  • 16. What a Twist! ~100MM files had already been automatically archived to Glacier.
  • 17. Act 3: Return of the Data
  • 18. The New Plan • Restore files from Glacier back to S3 • Migrate data from S3 to Zadara share • Archive files back to Glacier in tar.gz chunks • Create DynamoDB index from file name to Glacier archive for future restore
  • 19. but wait… How much was this restore going to cost?
  • 20. Task 0: Calculating Cost • Glacier pricing model is… interesting • Costs are fixed per UPLOAD and RETRIEVAL request • Cost for bandwidth based on the peak outbound bandwidth consumed in a monthly billing period • Monthly bandwidth equal to 5% of your total Glacier usage is permitted free of charge
  • 21. The Equation (Oh, boy. Okay, let’s do this.) • Let X equal the number of RETRIEVE API calls made. • Let Y equal the amount to restore in GB. • Let Z equal the total amount of data archived in GB. • Let T equal the time to restore the data in hours. • Then the cost can be expressed as: (0.05 * (X / 1000)) + (((Y / T) - (Z * 0.05 / 30)) * 0.01 * 720)
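For concreteness, the slide's equation can be transcribed directly into Python. This is a sketch of the pricing model exactly as stated on the slide, with one added guard (the billable rate clamped at zero, since AWS does not issue credits when you retrieve less than the free allowance); the sample numbers in the usage line are invented:

```python
def glacier_restore_cost(x: float, y: float, z: float, t: float) -> float:
    """Estimate a Glacier restore cost from the slide's equation.

    x: number of RETRIEVE API calls made
    y: amount of data to restore, in GB
    z: total amount of data archived in Glacier, in GB
    t: time taken to restore the data, in hours
    """
    request_cost = 0.05 * (x / 1000.0)   # fixed cost per 1,000 RETRIEVE calls
    peak_rate = y / t                    # GB/hour actually retrieved
    free_rate = z * 0.05 / 30            # 5% of storage, pro-rated over the month
    billable_rate = max(peak_rate - free_rate, 0.0)
    # The peak billable rate is charged for every hour of the month (720).
    bandwidth_cost = billable_rate * 0.01 * 720
    return request_cost + bandwidth_cost

# e.g. 100M API calls, restoring 10 TB out of 50 TB archived, over 3 days:
estimate = glacier_restore_cost(x=100_000_000, y=10_240, z=51_200, t=72)
```

The structure of the formula is the real lesson: the bandwidth term scales with your *peak hourly rate*, so restoring the same data more slowly (larger T) can dramatically reduce the bill.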
  • 22. Task 1: Restore from Glacier • Two m2.large instances running a Python daemon • Multiple iterations, from single-threaded to multi-threaded to multiprocessing with threading After iterating several times to get the speed we needed, I started the process for the ‘last time’ on a Sunday evening. ETA: ~5 days
  • 23. This Page Intentionally Left Blank
  • 24. Protip: Glacier is not optimized for RPS
  • 25. Task 1: Restore from Glacier (cont.) Glacier team was not amused.
  • 26. Task 1: Restore from Glacier (cont.) Restore continued at the ‘suggested’ rate, and thereafter completed successfully a couple of weeks later. Task 1 complete!
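A restore worker along the lines described in Task 1, chastened by the lesson of slide 24, might look like the sketch below: restore requests issued from a thread pool, throttled to a fixed aggregate request rate. This is not the original daemon; boto3 stands in for the boto used at the time, and the rate, worker count, and restore window are assumptions:

```python
import time
import threading
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Allow at most `rate` calls per second across all threads."""

    def __init__(self, rate: float):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait(self) -> None:
        with self.lock:
            now = time.monotonic()
            if self.next_slot < now:
                self.next_slot = now
            delay = self.next_slot - now
            self.next_slot += self.interval
        if delay > 0:
            time.sleep(delay)


def restore_keys(s3, bucket: str, keys, rate_per_sec: float = 100, workers: int = 16):
    """Issue restore-from-Glacier requests for each key, rate-limited."""
    limiter = RateLimiter(rate_per_sec)

    def restore_one(key):
        limiter.wait()
        # Ask S3 to thaw the Glacier-archived object for 7 days.
        s3.restore_object(Bucket=bucket, Key=key, RestoreRequest={"Days": 7})

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(restore_one, keys))

# Usage (requires AWS credentials; names are hypothetical):
# import boto3
# restore_keys(boto3.client("s3"), "example-client-data", key_list,
#              rate_per_sec=100)
```

The throttle is the point: as the talk found out the hard way, an unthrottled multiprocess restore looks a lot like a DoS to the Glacier team.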
  • 27. Task 2: Migrate and Archive Data Now we just needed to migrate the data from S3 to Zadara (again), create tarballs of the files, archive them to Glacier, and create a DynamoDB index so you can look up individual files. Easy!
  • 28. Task 2: Migrate and Archive Data (cont.) Back to iPython and Boto. Recent experience with Python threading and multiprocessing was to prove helpful.
  • 29. This Page Intentionally Left Blank
  • 30. Great Success! And the whole thing only took about 10x as long as the client initially estimated!
  • 31. Lessons Learned • Glacier is optimized for large, compressed files and lower request rates. • Be very careful about the S3 -> Glacier lifecycle option. • If you DoS an Amazon service, you get special attention!
  • 32. Questions have you?