Glacier and S3
Dave Thompson
AWS Meetup Michigan, Jan 2014
Who the @#%^ is Dave Thompson?
• DevOps/SRE/Systems guy from MI by way of San Francisco
• Current Employer: MuleSoft Inc
• Past Employers: Netflix, Domino’s Pizza, U of M
• Also contributing to the madness at RBN
… and what is he talking about?
• Today, we’ll talk about a case study using Glacier with S3, and the various surprises that I encountered on the way.
Act 1: A New Project
Our Story So Far
• Client’s datacenter is going dark in a few months.
• Their app is data heavy… a little less than 1 billion small files.
Our Story So Far (cont.)
• Client has migrated app servers to EC2
• Data has been uploaded to S3
Everything Goes According to Plan!
• Files are uploaded to S3
• App updated to use S3 data
Act 2: The Public Cloud Strikes Back
Things take a dark turn…
S3’s latency is too high for the app.
Enter RBN!
The proposal: migrate the data from S3 to a cloud storage solution (Zadara), and archive the files to Glacier.
Everything Goes According to Plan (Again)!
• Files are copied to Zadara share
• S3 lifecycle configured to archive objects to Glacier
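For reference, the kind of lifecycle rule described above can be sketched with boto3, the current AWS SDK for Python (the deck's own tooling was the older Boto library). The rule ID, prefix and 30-day delay are hypothetical. Building the configuration in a separate pure function makes it easy to review exactly which objects a rule will touch before it is applied.

```python
def glacier_lifecycle_config(rule_id: str, prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that transitions every object under
    `prefix` to the GLACIER storage class `days` days after creation."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                # An empty prefix matches every object in the bucket.
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle_config(bucket: str, config: dict) -> None:
    """Apply the configuration to a bucket (needs boto3 and AWS credentials).

    Caution: this PUT replaces the bucket's entire existing lifecycle configuration.
    """
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    # Hypothetical values: archive everything in the bucket after 30 days.
    print(glacier_lifecycle_config("archive-to-glacier", prefix="", days=30))
```

An empty prefix is what lets a rule like this sweep up every small object in the bucket; scoping it to a prefix that holds only large archives avoids that.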
Except…
The Zadara share becomes corrupted after the data is migrated.
Amazon Glacier: a Primer
• Glacier is an archival solution provided by AWS.
• It’s closely integrated with S3.
• Use cases for Glacier and S3 are different, though…
S3 vs Glacier
• Unlike an S3 GET, a Glacier RETRIEVAL takes ~4 hours
• UPLOAD and RETRIEVAL API requests are 10x more expensive on Glacier than comparable S3 requests
• Bandwidth charges for RETRIEVAL requests apply, even inside us-east-1
S3 vs Glacier (cont.)
• This means that Glacier is optimized for compressed archives (e.g., tarballs)
• S3 is about equally suited to small and large files
• Automatically archiving large numbers of small S3 objects to Glacier can thus lead to great sadness.
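A rough worked example shows why object count dominates. Request fees scale with the number of Glacier objects rather than with bytes, so the same data bundled into tarballs costs far less to upload or retrieve. The sketch uses the $0.05 per 1,000 requests from the cost equation later in this deck (2014 pricing); the 10,000-files-per-tarball figure is hypothetical.

```python
# 2014-era Glacier fee in USD per 1,000 UPLOAD or RETRIEVAL requests, as used
# in this deck's cost equation (current pricing differs).
FEE_PER_1000_REQUESTS = 0.05


def request_cost(num_requests: int) -> float:
    """Glacier request fees in USD; data-retrieval fees are not included."""
    return FEE_PER_1000_REQUESTS * num_requests / 1000


def num_archives(num_files: int, files_per_archive: int) -> int:
    """Tarballs needed to hold num_files (ceiling division)."""
    return -(-num_files // files_per_archive)


if __name__ == "__main__":
    files = 100_000_000  # ~100 million small files, one Glacier object each
    per_file = request_cost(files)
    # Hypothetical chunk size of 10,000 files per tarball.
    per_tarball = request_cost(num_archives(files, 10_000))
    print(f"one request per file:    ${per_file:,.2f}")
    print(f"one request per tarball: ${per_tarball:,.2f}")
```

With these inputs the request fees come to about $5,000 for individual objects versus about $0.50 for 10,000-file tarballs, before any retrieval fees.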
What a Twist!
~100 million files had already been automatically archived to Glacier.
Act 3: Return of the Data
The New Plan
• Restore files from Glacier back to S3
• Migrate data from S3 to Zadara share
• Archive files back to Glacier in tar.gz chunks
• Create DynamoDB index from file name to Glacier archive for future restore
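The file-name index in the plan above can be sketched as one DynamoDB item per file, keyed on the file name and pointing at the archive that holds it. The table name `glacier-file-index` and the attribute names (`file_name`, `archive_id`, `tarball`) are hypothetical; the write uses boto3's `Table.batch_writer`, which batches `PutItem` requests.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical table and attribute names; the table's partition key is assumed
# to be the string attribute "file_name".
TABLE_NAME = "glacier-file-index"


def index_items(archive_id: str, tarball: str, file_names: Iterable[str]) -> List[Dict[str, str]]:
    """One index item per file, mapping the file name to the Glacier archive
    (and tarball) that contains it."""
    return [
        {"file_name": name, "archive_id": archive_id, "tarball": tarball}
        for name in file_names
    ]


def write_index(items: Iterable[Dict[str, str]], table_name: str = TABLE_NAME) -> None:
    """Write items with boto3's batch_writer, which groups them into
    BatchWriteItem requests and resends unprocessed items."""
    import boto3  # imported lazily so index_items works without boto3

    table = boto3.resource("dynamodb").Table(table_name)
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)


def find_archive(file_name: str, table_name: str = TABLE_NAME) -> Optional[Dict[str, str]]:
    """Look up which archive holds a file; returns None if it is not indexed."""
    import boto3

    table = boto3.resource("dynamodb").Table(table_name)
    return table.get_item(Key={"file_name": file_name}).get("Item")
```

Keying on the file name makes a future single-file restore one `GetItem` followed by one Glacier retrieval of the containing archive.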
but wait…
How much was this restore going to cost?
Task 0: Calculating Cost
• The Glacier pricing model is… interesting
• A fixed fee is charged per UPLOAD and RETRIEVAL request
• Bandwidth (retrieval) fees are based on the peak hourly retrieval rate in a monthly billing period
• Retrieving up to 5% of your total Glacier storage each month is free of charge (pro-rated daily)
The Equation (Oh, boy. Okay, let’s do this.)
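• Let X equal the number of RETRIEVAL API calls made.
• Let Y equal the amount of data to restore, in GB.
• Let Z equal the total amount of data archived, in GB.
• Let T equal the time to restore the data, in hours.
• Then the cost C can be expressed as:
$$C = 0.05 \cdot \frac{X}{1000} + \left( \frac{Y}{T} - \frac{0.05\,Z}{30} \right) \cdot 0.01 \cdot 720$$
• where 0.05 is the dollar fee per 1,000 RETRIEVAL requests, 0.05 Z / 30 is the daily share of the 5% free allowance, 0.01 is the dollar fee per GB retrieved, and 720 is the number of hours in a 30-day month.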
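The equation can be sketched as a small Python function for trying out scenarios. It reproduces the slide's terms as written, with one added assumption: the billable rate is clamped at zero, since otherwise a restore slower than the free allowance would produce a negative fee. Note that the equation subtracts a daily free allowance (Z × 0.05 / 30) from an hourly rate (Y / T), so treat the result as a rough estimate under the deck's 2014 pricing.

```python
def glacier_restore_cost(x_requests: int, y_gb: float, z_gb: float, t_hours: float) -> float:
    """Estimate a Glacier restore's cost in USD with the deck's 2014 equation.

    x_requests: number of RETRIEVAL API calls (X)
    y_gb:       amount of data to restore, in GB (Y)
    z_gb:       total amount of data archived, in GB (Z)
    t_hours:    time taken to restore the data, in hours (T)
    """
    request_fees = 0.05 * (x_requests / 1000)  # $0.05 per 1,000 requests
    free_allowance = z_gb * 0.05 / 30  # 5% of storage per month, pro-rated daily
    # Assumption not in the slide: clamp at zero so a slow restore cannot
    # yield a negative fee.
    billable_rate = max(0.0, y_gb / t_hours - free_allowance)
    retrieval_fees = billable_rate * 0.01 * 720  # $0.01/GB, 720 hours in 30 days
    return request_fees + retrieval_fees


if __name__ == "__main__":
    # Hypothetical inputs: ~100 million requests, 10,000 GB restored over
    # 120 hours (the deck's ~5-day ETA), 20,000 GB archived in total.
    print(round(glacier_restore_cost(100_000_000, 10_000, 20_000, 120), 2))
```

With those hypothetical inputs the estimate is roughly $5,000 in request fees plus roughly $360 in retrieval fees. Stretching T lowers only the second term, which is why the restore rate matters as much as the object count.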
Task 1: Restore from Glacier
• Two m2.large instances running a Python daemon
• Multiple iterations, from single-threaded to multithreaded to multiprocessing with threading
After iterating several times to get the speed we needed, I started the process for the ‘last time’ on a Sunday evening.
ETA: ~5 days
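The restore loop might look roughly like the following thread-pool sketch, written with boto3 (the current AWS SDK for Python) rather than the Boto library the deck mentions. For objects that a lifecycle rule moved to Glacier, S3's `restore_object` call requests a temporary restored copy. The worker count, the seven-day `days` default, and the injectable `restore` callable (there so the loop can run without AWS) are assumptions, not details of the original daemon.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def make_s3_restore(bucket: str, days: int = 7) -> Callable[[str], None]:
    """Return a function that asks S3 to restore one Glacier-archived object."""
    import boto3  # imported lazily so restore_all can be exercised without boto3

    s3 = boto3.client("s3")  # low-level clients are generally thread-safe

    def restore(key: str) -> None:
        # Request a temporary restored copy, kept available for `days` days.
        s3.restore_object(Bucket=bucket, Key=key, RestoreRequest={"Days": days})

    return restore


def restore_all(keys: Iterable[str], restore: Callable[[str], None],
                workers: int = 16) -> List[str]:
    """Issue restore requests from a thread pool; return keys whose request failed."""

    def attempt(key: str) -> Optional[str]:
        try:
            restore(key)
            return None
        except Exception:  # e.g. throttling; keep the key for a later retry pass
            return key

    # Note: pool.map submits every key up front, so very large key lists
    # should be fed in batches.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [key for key in pool.map(attempt, keys) if key is not None]
```

Returning the failed keys, rather than retrying immediately, lets a later pass retry them at a lower request rate.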
This Page Intentionally Left Blank
Protip:
Glacier is not optimized for high request rates (RPS).
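Given that protip, one cheap safeguard is a client-side limiter shared by all worker threads. This minimal sketch spaces calls at least `1 / rate` seconds apart: each caller reserves the next time slot under a lock, then sleeps outside it. The rate is a placeholder, since the deck does not say what rate the Glacier team suggested; the injectable clock and sleep functions exist only to make the behavior easy to check.

```python
import threading
import time
from typing import Callable


class RateLimiter:
    """Space calls at least 1/rate seconds apart across all threads (no bursts)."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until this caller's reserved time slot arrives."""
        with self.lock:
            now = self.clock()
            slot = max(now, self.next_slot)  # earliest free slot
            self.next_slot = slot + self.interval  # reserve it
            wait = slot - now
        if wait > 0:
            self.sleep(wait)  # sleep outside the lock so others can reserve slots
```

A worker would call `limiter.acquire()` immediately before each restore request. Unlike a token bucket, this limiter allows no bursts, which keeps the request rate flat.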
Task 1: Restore from Glacier (cont.)
The Glacier team was not amused.
Task 1: Restore from Glacier (cont.)
The restore continued at the ‘suggested’ rate and completed successfully a couple of weeks later.
Task 1 complete!
Task 2: Migrate and Archive Data
Now we just needed to migrate the data from S3 to Zadara (again), create tarballs of the files, archive them to Glacier, and create a DynamoDB index so individual files can be looked up.
Easy!
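The bundling step could be sketched with Python's standard `tarfile` module: pack a batch of files into one `.tar.gz` chunk and record which chunk each file went into. The chunk size and naming scheme are hypothetical. The Glacier archive ID is not part of this sketch, since it only exists once the upload returns; in the pipeline described above, it would be recorded alongside the tarball before writing the DynamoDB index.

```python
import os
import tarfile
from typing import Dict, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_tarballs(root: str, rel_paths: Sequence[str], out_dir: str,
                   files_per_chunk: int = 10_000) -> Dict[str, str]:
    """Pack files (given relative to `root`) into gzip-compressed tarballs in
    `out_dir` and return an index of {relative path: tarball path}.

    The chunk size and the chunk-NNNNN.tar.gz naming are hypothetical choices.
    """
    index: Dict[str, str] = {}
    for n, batch in enumerate(chunks(rel_paths, files_per_chunk)):
        tar_path = os.path.join(out_dir, f"chunk-{n:05d}.tar.gz")
        with tarfile.open(tar_path, "w:gz") as tar:
            for rel in batch:
                # Store each file under its relative path so the index key
                # matches the member name inside the tarball.
                tar.add(os.path.join(root, rel), arcname=rel)
                index[rel] = tar_path
    return index
```

Each resulting tarball would then be uploaded with a single Glacier request, which is where the request-count savings come from.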
Task 2: Migrate and Archive Data (cont.)
Back to IPython and Boto. Recent experience with Python threading and multiprocessing would prove helpful.
This Page Intentionally Left Blank
Great Success!
And the whole thing only took about 10x as long as the client initially estimated!
Lessons Learned
• Glacier is optimized for large, compressed files and lower request rates.
• Be very careful about the S3 -> Glacier lifecycle option.
• If you DoS an Amazon service, you get special attention!
Questions have you?

More Related Content

What's hot

AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
Amazon Web Services
 
AWS 201 Webinar: Introduction to Amazon Glacier
AWS 201 Webinar: Introduction to Amazon GlacierAWS 201 Webinar: Introduction to Amazon Glacier
AWS 201 Webinar: Introduction to Amazon Glacier
Amazon Web Services
 
An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...
An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...
An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...
Amazon Web Services
 
Amazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage Overview
Amazon Web Services
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryGetting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Amazon Web Services
 
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
Amazon Web Services
 
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Amazon Web Services
 
AWS Storage Gateway
AWS Storage GatewayAWS Storage Gateway
AWS Storage Gateway
zekeLabs Technologies
 
(STG311) AWS Storage Gateway: Secure, Cost-Effective Backup & Archive
(STG311) AWS Storage Gateway: Secure, Cost-Effective Backup & Archive(STG311) AWS Storage Gateway: Secure, Cost-Effective Backup & Archive
(STG311) AWS Storage Gateway: Secure, Cost-Effective Backup & Archive
Amazon Web Services
 
Backup to the Cloud
Backup to the CloudBackup to the Cloud
Backup to the Cloud
2nd Watch
 
Best Practices for Architecting Cloud Backup and Recovery Solutions - AWS Mar...
Best Practices for Architecting Cloud Backup and Recovery Solutions - AWS Mar...Best Practices for Architecting Cloud Backup and Recovery Solutions - AWS Mar...
Best Practices for Architecting Cloud Backup and Recovery Solutions - AWS Mar...
Amazon Web Services
 
SRG302 Archiving in the Cloud using Amazon Glacier - AWS re: Invent 2012
SRG302 Archiving in the Cloud using Amazon Glacier - AWS re: Invent 2012SRG302 Archiving in the Cloud using Amazon Glacier - AWS re: Invent 2012
SRG302 Archiving in the Cloud using Amazon Glacier - AWS re: Invent 2012
Amazon Web Services
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
Amazon Web Services
 
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
Amazon Web Services
 
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & SnowmobileData Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
Amazon Web Services
 
(SOV203) Understanding AWS Storage Options | AWS re:Invent 2014
(SOV203) Understanding AWS Storage Options | AWS re:Invent 2014(SOV203) Understanding AWS Storage Options | AWS re:Invent 2014
(SOV203) Understanding AWS Storage Options | AWS re:Invent 2014
Amazon Web Services
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
Caserta
 
Understanding AWS Storage Options (STG101) | AWS re:Invent 2013
Understanding AWS Storage Options (STG101) | AWS re:Invent 2013Understanding AWS Storage Options (STG101) | AWS re:Invent 2013
Understanding AWS Storage Options (STG101) | AWS re:Invent 2013
Amazon Web Services
 
AWS Storage Options
AWS Storage OptionsAWS Storage Options
AWS Storage Options
Amazon Web Services
 
EC2 and S3 Level 100
EC2 and S3 Level 100EC2 and S3 Level 100
EC2 and S3 Level 100
AWS Riyadh User Group
 

What's hot (20)

AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon GlacierAWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
AWS Webcast - Archiving in the Cloud - Best Practices for Amazon Glacier
 
AWS 201 Webinar: Introduction to Amazon Glacier
AWS 201 Webinar: Introduction to Amazon GlacierAWS 201 Webinar: Introduction to Amazon Glacier
AWS 201 Webinar: Introduction to Amazon Glacier
 
An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...
An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...
An Overview of AWS Services for Data Storage and Migration - SRV205 - Atlanta...
 
Amazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage Overview
 
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and RecoveryGetting Started with the Hybrid Cloud: Enterprise Backup and Recovery
Getting Started with the Hybrid Cloud: Enterprise Backup and Recovery
 
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
AWS Storage Services - AWS Presentation - AWS Cloud Storage for the Enterpris...
 
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
 
AWS Storage Gateway
AWS Storage GatewayAWS Storage Gateway
AWS Storage Gateway
 
(STG311) AWS Storage Gateway: Secure, Cost-Effective Backup & Archive
(STG311) AWS Storage Gateway: Secure, Cost-Effective Backup & Archive(STG311) AWS Storage Gateway: Secure, Cost-Effective Backup & Archive
(STG311) AWS Storage Gateway: Secure, Cost-Effective Backup & Archive
 
Backup to the Cloud
Backup to the CloudBackup to the Cloud
Backup to the Cloud
 
Best Practices for Architecting Cloud Backup and Recovery Solutions - AWS Mar...
Best Practices for Architecting Cloud Backup and Recovery Solutions - AWS Mar...Best Practices for Architecting Cloud Backup and Recovery Solutions - AWS Mar...
Best Practices for Architecting Cloud Backup and Recovery Solutions - AWS Mar...
 
SRG302 Archiving in the Cloud using Amazon Glacier - AWS re: Invent 2012
SRG302 Archiving in the Cloud using Amazon Glacier - AWS re: Invent 2012SRG302 Archiving in the Cloud using Amazon Glacier - AWS re: Invent 2012
SRG302 Archiving in the Cloud using Amazon Glacier - AWS re: Invent 2012
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier | AWS Public Sector...
 
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & SnowmobileData Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
 
(SOV203) Understanding AWS Storage Options | AWS re:Invent 2014
(SOV203) Understanding AWS Storage Options | AWS re:Invent 2014(SOV203) Understanding AWS Storage Options | AWS re:Invent 2014
(SOV203) Understanding AWS Storage Options | AWS re:Invent 2014
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
 
Understanding AWS Storage Options (STG101) | AWS re:Invent 2013
Understanding AWS Storage Options (STG101) | AWS re:Invent 2013Understanding AWS Storage Options (STG101) | AWS re:Invent 2013
Understanding AWS Storage Options (STG101) | AWS re:Invent 2013
 
AWS Storage Options
AWS Storage OptionsAWS Storage Options
AWS Storage Options
 
EC2 and S3 Level 100
EC2 and S3 Level 100EC2 and S3 Level 100
EC2 and S3 Level 100
 

Similar to S3 and Glacier

Ingest and storage options
Ingest and storage optionsIngest and storage options
Ingest and storage options
Amazon Web Services
 
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB DayChoosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Amazon Web Services Korea
 
Store stream data on Data Lake
Store stream data on Data LakeStore stream data on Data Lake
Store stream data on Data Lake
Marcos Rebelo
 
Migrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the CloudMigrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the Cloud
Amazon Web Services
 
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
Amazon Web Services
 
AWS for Start-ups - Case Study - PeoplePerHour
AWS for Start-ups - Case Study - PeoplePerHour AWS for Start-ups - Case Study - PeoplePerHour
AWS for Start-ups - Case Study - PeoplePerHour
Amazon Web Services
 
Bluecat Iceberg Journey by Cory Darby
Bluecat Iceberg Journey by Cory DarbyBluecat Iceberg Journey by Cory Darby
Bluecat Iceberg Journey by Cory Darby
Brian Olsen
 
Intro to Joyent's Manta Object Storage Service
Intro to Joyent's Manta Object Storage ServiceIntro to Joyent's Manta Object Storage Service
Intro to Joyent's Manta Object Storage Service
Rod Boothby
 
DevOps throughout time
DevOps throughout timeDevOps throughout time
DevOps throughout time
Hany Fahim
 
(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data
Amazon Web Services
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
Igor Sfiligoi
 
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon GlacierSRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
Amazon Web Services
 
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon GlacierSRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
Amazon Web Services
 
AWS Customer Highlight - Craftsy
AWS Customer Highlight - CraftsyAWS Customer Highlight - Craftsy
AWS Customer Highlight - Craftsy
Amazon Web Services
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
Codemotion
 
Server-less solution for moving Millions of Images in Cloud - Brett Sutter, ...
 Server-less solution for moving Millions of Images in Cloud - Brett Sutter, ... Server-less solution for moving Millions of Images in Cloud - Brett Sutter, ...
Server-less solution for moving Millions of Images in Cloud - Brett Sutter, ...
AWS Chicago
 
[AWS LA Media & Entertainment Event 2015]: Digital Media Ingest & Storage Opt...
[AWS LA Media & Entertainment Event 2015]: Digital Media Ingest & Storage Opt...[AWS LA Media & Entertainment Event 2015]: Digital Media Ingest & Storage Opt...
[AWS LA Media & Entertainment Event 2015]: Digital Media Ingest & Storage Opt...
Amazon Web Services
 
Case Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of DataCase Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of Data
Schubert Zhang
 
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Amazon Web Services
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
Databricks
 

Similar to S3 and Glacier (20)

Ingest and storage options
Ingest and storage optionsIngest and storage options
Ingest and storage options
 
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB DayChoosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
 
Store stream data on Data Lake
Store stream data on Data LakeStore stream data on Data Lake
Store stream data on Data Lake
 
Migrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the CloudMigrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the Cloud
 
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
MED201 Media Ingest and Storage Solutions with AWS - AWS re: Invent 2012
 
AWS for Start-ups - Case Study - PeoplePerHour
AWS for Start-ups - Case Study - PeoplePerHour AWS for Start-ups - Case Study - PeoplePerHour
AWS for Start-ups - Case Study - PeoplePerHour
 
Bluecat Iceberg Journey by Cory Darby
Bluecat Iceberg Journey by Cory DarbyBluecat Iceberg Journey by Cory Darby
Bluecat Iceberg Journey by Cory Darby
 
Intro to Joyent's Manta Object Storage Service
Intro to Joyent's Manta Object Storage ServiceIntro to Joyent's Manta Object Storage Service
Intro to Joyent's Manta Object Storage Service
 
DevOps throughout time
DevOps throughout timeDevOps throughout time
DevOps throughout time
 
(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data
 
Using commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobsUsing commercial Clouds to process IceCube jobs
Using commercial Clouds to process IceCube jobs
 
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon GlacierSRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon GlacierSRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
SRV403 Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
AWS Customer Highlight - Craftsy
AWS Customer Highlight - CraftsyAWS Customer Highlight - Craftsy
AWS Customer Highlight - Craftsy
 
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Be...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be...
 
Server-less solution for moving Millions of Images in Cloud - Brett Sutter, ...
 Server-less solution for moving Millions of Images in Cloud - Brett Sutter, ... Server-less solution for moving Millions of Images in Cloud - Brett Sutter, ...
Server-less solution for moving Millions of Images in Cloud - Brett Sutter, ...
 
[AWS LA Media & Entertainment Event 2015]: Digital Media Ingest & Storage Opt...
[AWS LA Media & Entertainment Event 2015]: Digital Media Ingest & Storage Opt...[AWS LA Media & Entertainment Event 2015]: Digital Media Ingest & Storage Opt...
[AWS LA Media & Entertainment Event 2015]: Digital Media Ingest & Storage Opt...
 
Case Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of DataCase Study - How Rackspace Query Terabytes Of Data
Case Study - How Rackspace Query Terabytes Of Data
 
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
 

Recently uploaded

Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
Madan Karki
 
Introduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptxIntroduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptx
MiscAnnoy1
 
artificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptxartificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptx
GauravCar
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
gowrishankartb2005
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
gerogepatton
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
TaghreedAltamimi
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
Seminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptxSeminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptx
Madan Karki
 
AI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptxAI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptx
architagupta876
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
jpsjournal1
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 

Recently uploaded (20)

Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 
Introduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptxIntroduction to AI Safety (public presentation).pptx
Introduction to AI Safety (public presentation).pptx
 
artificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptxartificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptx
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
Seminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptxSeminar on Distillation study-mafia.pptx
Seminar on Distillation study-mafia.pptx
 
AI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptxAI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptx
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 

S3 and Glacier

  • 1. Glacier and S3 Dave Thompson AWS Meetup Michigan, Jan 2014
  • 2. Who the @#%^ is Dave Thompson? • DevOps/SRE/Systems guy from MI by way of San Francisco • Current Employer: MuleSoft Inc • Past Employers: Netflix, Domino’s Pizza, U of M • Also contributing to the madness at RBN
  • 3. … and what is he talking about? • Today, we’ll talk about a case study using Glacier with S3, and the various surprises that I encountered on the way.
  • 4. Act 1: A New Project
  • 5. Our Story So Far • Client’s datacenter is going dark in a few months. • Their app is data heavy… just under 1 billion small files.
  • 6. Our Story So Far (cont.) • Client has migrated app servers to EC2 • Data has been uploaded to S3
  • 7. Everything Goes According to Plan! • Files are uploaded to S3 • App updated to use S3 data
  • 8. Act 2: The Public Cloud Strikes Back
  • 9. Things take a dark turn… S3 latency is too high for the app.
  • 10. Enter RBN! The proposal: migrate the data from S3 to a cloud storage solution (Zadara), and archive the files to Glacier.
  • 11. Everything Goes According to Plan (Again)! • Files are copied to Zadara share • S3 lifecycle configured to archive objects to Glacier
  • 12. Except… The Zadara share becomes corrupted after the data is migrated.
  • 13. Amazon Glacier: a Primer • Glacier is an archival solution provided by AWS. • It’s closely integrated with S3. • Use cases for Glacier and S3 are different, though…
  • 14. S3 vs Glacier • Unlike an S3 GET, a Glacier RETRIEVAL takes ~4 hours • UPLOAD and RETRIEVAL API requests are 10x more expensive on Glacier than comparable S3 requests • Bandwidth charges for RETRIEVAL requests apply, even inside us-east-1
  • 15. S3 vs Glacier (cont.) • This means that Glacier is optimized for compressed archives (e.g. tarballs) • S3 is about equally suited to small and large files • Automatically archiving S3 objects to Glacier can thus lead to great sadness.
  • 16. What a Twist! ~100 million files had already been automatically archived to Glacier.
  • 17. Act 3: Return of the Data
  • 18. The New Plan • Restore files from Glacier back to S3 • Migrate data from S3 to Zadara share • Archive files back to Glacier in tar.gz chunks • Create DynamoDB index from file name to Glacier archive for future restore
  • 19. But wait… How much was this restore going to cost?
  • 20. Task 0: Calculating Cost • Glacier pricing model is… interesting • Costs are fixed per UPLOAD and RETRIEVAL request • Bandwidth cost is based on the peak outbound bandwidth consumed in a monthly billing period • Monthly bandwidth equal to 5% of your total Glacier storage is permitted free of charge
  • 21. The Equation (Oh, boy. Okay, let’s do this.) • Let X equal the number of RETRIEVAL API calls made. • Let Y equal the amount of data to restore, in GB. • Let Z equal the total amount of data archived, in GB. • Let T equal the time to restore the data, in hours. • Then the cost C, in USD, can be expressed as: $C = 0.05 \cdot \frac{X}{1000} + \left(\frac{Y}{T} - \frac{0.05\,Z}{30}\right) \cdot 0.01 \cdot 720$
  • 22. Task 1: Restore from Glacier • Two m2.large instances running a Python daemon • Multiple iterations, from single-threaded to multithreaded to multiprocessing with threading. After iterating several times to get the speed we needed, I started the process for the ‘last time’ on a Sunday evening. ETA: ~5 days
  • 24. Protip: Glacier is not optimized for high RPS (requests per second)
  • 25. Task 1: Restore from Glacier (cont.) Glacier team was not amused.
  • 26. Task 1: Restore from Glacier (cont.) Restore continued at the ‘suggested’ rate, and thereafter completed successfully a couple of weeks later. Task 1 complete!
  • 27. Task 2: Migrate and Archive Data Now we just needed to migrate the data from S3 to Zadara (again), create tarballs of the files, archive them to Glacier, and create a DynamoDB index so you can look up individual files. Easy!
  • 28. Task 2: Migrate and Archive Data (cont.) Back to IPython and Boto. Recent experience with Python threading and multiprocessing proved helpful.
  • 30. Great Success! And the whole thing only took about 10x as long as the client initially estimated!
  • 31. Lessons Learned • Glacier is optimized for large, compressed files and lower request rates. • Be very careful about the S3 -> Glacier lifecycle option. • If you DoS an Amazon service, you get special attention!
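Slide 11's "S3 lifecycle configured to archive objects to Glacier" step is just a bucket-level rule. Here is a minimal sketch of such a rule, written as the Python dict that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the `data/` prefix, the 30-day threshold and the bucket name are all hypothetical; the talk doesn't show the client's actual rule.

```python
import json


def glacier_lifecycle_rule(prefix: str, days: int,
                           rule_id: str = "archive-to-glacier") -> dict:
    """Build an S3 lifecycle configuration that transitions every object
    under `prefix` to the GLACIER storage class `days` after creation.
    Rule ID, prefix and threshold are hypothetical examples."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    config = glacier_lifecycle_rule(prefix="data/", days=30)
    print(json.dumps(config, indent=2))
    # Applying it needs AWS credentials; the bucket name is hypothetical:
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="example-bucket", LifecycleConfiguration=config)
```

A rule like this transitions matching objects one by one, so a bucket of small files becomes an equally large number of individual Glacier objects. That is the root of the per-request cost problem on slides 14 through 16.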
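Slides 14 and 15 argue that per-request pricing makes Glacier a poor fit for many small objects. The arithmetic is easy to sketch. This assumes the $0.05 per 1,000 requests rate that also appears in the slide 21 equation. The 10,000-files-per-tarball chunk size is hypothetical, and storage and bandwidth charges are ignored.

```python
import math

# Assumed request price (USD per 1,000 UPLOAD or RETRIEVAL requests),
# matching the 0.05 * (X / 1000) term in the slide 21 equation.
GLACIER_PRICE_PER_1000_REQUESTS = 0.05


def request_cost(num_files: int, files_per_archive: int = 1) -> float:
    """Request charge (USD) to upload or retrieve `num_files` files when they
    are grouped `files_per_archive` to a single Glacier archive."""
    num_requests = math.ceil(num_files / files_per_archive)
    return num_requests / 1000 * GLACIER_PRICE_PER_1000_REQUESTS


if __name__ == "__main__":
    print(request_cost(100_000_000))          # one request per file: prints 5000.0
    print(request_cost(100_000_000, 10_000))  # one per 10,000-file tarball: prints 0.5
```

The request charge scales with the number of archives rather than the number of bytes. This is why bundling small files into compressed tarballs before archiving changes the economics so sharply.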
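The slide 21 equation transcribes directly into a helper for comparing restore windows. This sketch follows the slide term by term: 0.05 USD per 1,000 requests, 0.01 USD per GB of billable peak hourly rate, and 720 hours in a 30-day month. It makes one addition, which is an assumption: the billable rate is clamped at zero when the free-allowance term exceeds it. The example inputs are hypothetical, not the client's figures.

```python
def glacier_restore_cost(x_requests: int, y_restore_gb: float,
                         z_archived_gb: float, t_hours: float) -> float:
    """Estimated restore cost in USD, following the slide 21 equation."""
    # Request fee: 0.05 USD per 1,000 RETRIEVAL calls.
    request_fee = 0.05 * (x_requests / 1000)
    # Peak retrieval rate, assuming the restore is spread evenly over t_hours.
    rate_gb_per_hour = y_restore_gb / t_hours
    # Free-allowance term as written on the slide: 5% of archived data over 30 days.
    free_allowance = z_archived_gb * 0.05 / 30
    # Assumption (not on the slide): never bill a negative rate.
    billable_rate = max(0.0, rate_gb_per_hour - free_allowance)
    # 0.01 USD per GB of billable rate, charged for all 720 hours of the month.
    retrieval_fee = billable_rate * 0.01 * 720
    return request_fee + retrieval_fee


if __name__ == "__main__":
    # Hypothetical inputs: 1M requests, 10,000 GB restored over 100 hours,
    # 30,000 GB archived in total.
    print(round(glacier_restore_cost(1_000_000, 10_000, 30_000, 100), 2))  # prints 410.0
```

Because the bandwidth term depends on Y / T, stretching the same restore over more hours lowers the peak rate and therefore the retrieval fee. The request fee stays fixed regardless of T.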
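Slide 22's restore daemon went through single-threaded, multithreaded, and multiprocessing-plus-threading versions. Here is a minimal sketch of just the threaded layer. It assumes a caller-supplied `restore_one(key)` function, which is hypothetical. With boto3 it would most likely wrap `s3.restore_object(Bucket=..., Key=..., RestoreRequest={"Days": ...})`. The worker count is arbitrary, and nothing here limits the request rate, which slides 24 through 26 show matters.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def restore_all(keys: Iterable[str], restore_one: Callable[[str], None],
                max_workers: int = 16) -> List[str]:
    """Submit one restore request per key on a thread pool.

    `restore_one` is a hypothetical callable that issues a single restore
    request. Returns the keys whose request raised, so they can be retried.
    """
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(key, pool.submit(restore_one, key)) for key in keys]
        for key, future in futures:
            # exception() waits for the future and returns None on success.
            if future.exception() is not None:
                failed.append(key)
    return failed
```

This version submits every key up front. At ~100 million keys you would feed the pool in batches to bound memory, and spread batches across processes or instances as the talk describes.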
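After the Glacier team got involved, the restore ran at a "suggested" rate (slide 26). The talk doesn't say what that rate was or how it was enforced. One simple way to hold a ceiling across threads is a shared fixed-interval pacer, sketched below; the rate value is hypothetical.

```python
import threading
import time


class RequestPacer:
    """Thread-safe pacer: callers of acquire() are spaced so that at most
    `rate_per_second` requests start per second within this process."""

    def __init__(self, rate_per_second: float):
        self._interval = 1.0 / rate_per_second
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def acquire(self) -> None:
        # Reserve the next free time slot under the lock...
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        # ...then sleep outside the lock until that slot arrives.
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


if __name__ == "__main__":
    pacer = RequestPacer(rate_per_second=5)  # hypothetical rate
    for i in range(3):
        pacer.acquire()
        print("request", i, "at", round(time.monotonic(), 2))
```

Each worker would call `pacer.acquire()` just before issuing a request. The pacer only coordinates threads within one process, so with several processes or instances the overall budget has to be split between them.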
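Slide 27's Task 2 boils down to grouping files into tar.gz chunks and recording which chunk holds each file. Here is a minimal in-memory sketch using the standard `tarfile` module. The chunk size, the sorting by name, and the index attribute names `file_name` and `archive_id` are hypothetical. The archive ID would really come back from the Glacier upload, which is not shown.

```python
import io
import tarfile
from typing import Dict, Iterator, List, Tuple


def pack_chunks(files: Dict[str, bytes],
                chunk_size: int) -> Iterator[Tuple[bytes, List[str]]]:
    """Yield (tar.gz bytes, member names) for successive groups of
    `chunk_size` files, taken in sorted name order."""
    names = sorted(files)
    for start in range(0, len(names), chunk_size):
        group = names[start:start + chunk_size]
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w:gz") as tar:
            for name in group:
                data = files[name]
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        yield buf.getvalue(), group


def index_items(archive_id: str, names: List[str]) -> List[Dict[str, str]]:
    """One index record per file, mapping its name to the archive holding it.
    Attribute names are hypothetical."""
    return [{"file_name": name, "archive_id": archive_id} for name in names]


if __name__ == "__main__":
    sample = {"logs/a.txt": b"alpha", "logs/b.txt": b"beta", "logs/c.txt": b"gamma"}
    for n, (blob, names) in enumerate(pack_chunks(sample, chunk_size=2)):
        archive_id = f"hypothetical-archive-{n}"  # would come from the Glacier upload
        print(len(blob), index_items(archive_id, names))
```

Each record from `index_items` would become one DynamoDB item keyed on the file name. A later restore can then go straight to the single archive that holds a given file, instead of retrieving many archives.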