(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014


47,399 views


This session drills deep into the Amazon S3 technical best practices that help you maximize storage performance for your use case. We provide real-world examples and discuss the impact of object naming conventions and parallelism on Amazon S3 performance, and describe the best practices for multipart uploads and byte-range downloads.

Published in: Technology

  1. in data transfer from S3 not including Amazon Web Services use
  2. Architecture; Choosing a region; Building a naming scheme; Considering LISTs; Optimizing PUTs; Multipart upload; Demo; Optimizing GETs; Using CloudFront; Range-based GETs; Demo; Customer Case: BigDataCorp
  3. TIP: Request Rate and Performance Considerations, http://amzn.to/18oF5LC
  4. 1 2 5 8; 100/8 = 12.5 events/sec; 100,000 users @ 10 events an hour = 224 TPS
  5. Keys that begin with the date:
     <my_bucket>/2013_11_13-164533125.jpg
     <my_bucket>/2013_11_13-164533126.jpg
     <my_bucket>/2013_11_13-164533127.jpg
     <my_bucket>/2013_11_13-164533128.jpg
     <my_bucket>/2013_11_12-164533129.jpg
     <my_bucket>/2013_11_12-164533130.jpg
     <my_bucket>/2013_11_12-164533131.jpg
     <my_bucket>/2013_11_12-164533132.jpg
     <my_bucket>/2013_11_11-164533133.jpg
     <my_bucket>/2013_11_11-164533134.jpg
     <my_bucket>/2013_11_11-164533135.jpg
     <my_bucket>/2013_11_11-164533136.jpg
  6. [Diagram] 1 2 N / 1 2 N / Partition, Partition, Partition, Partition
  7. Keys that end with the date:
     <my_bucket>/521335461-2013_11_13.jpg
     <my_bucket>/465330151-2013_11_13.jpg
     <my_bucket>/987331160-2013_11_13.jpg
     <my_bucket>/465765461-2013_11_13.jpg
     <my_bucket>/125631151-2013_11_13.jpg
     <my_bucket>/934563160-2013_11_13.jpg
     <my_bucket>/532132341-2013_11_13.jpg
     <my_bucket>/565437681-2013_11_13.jpg
     <my_bucket>/234567460-2013_11_13.jpg
     <my_bucket>/456767561-2013_11_13.jpg
     <my_bucket>/345565651-2013_11_13.jpg
     <my_bucket>/431345660-2013_11_13.jpg
  8. [Diagram] 1 2 N / 1 2 N / Partition, Partition, Partition, Partition
  9. • Store objects as a hash of their name
     – add the original name as metadata
     • “deadmau5_mix.mp3” -> 0aa316fb000eae52921aab1b4697424958a53ad9
     – prepend key name with short hash
     • 0aa3-deadmau5_mix.mp3
     • Epoch time (reverse)
     – 5321354831-deadmau5_mix.mp3
  10. Keys grouped under a prefix:
     <my_bucket>/images/521335461-2013_11_13.jpg
     <my_bucket>/images/465330151-2013_11_13.jpg
     <my_bucket>/movies/293924440-2013_11_13.jpg
     <my_bucket>/movies/987331160-2013_11_13.jpg
     <my_bucket>/thumbs-small/838434842-2013_11_13.jpg
     <my_bucket>/thumbs-small/342532454-2013_11_13.jpg
     <my_bucket>/thumbs-small/345233453-2013_11_13.jpg
     <my_bucket>/thumbs-small/345453454-2013_11_13.jpg
  11. TIP: Request Rate and Performance Considerations, http://amzn.to/18oF5LC
  12. faster; flexible; set of parts; presents all parts as a single object; parallel; pausing; resuming; beginning uploads before you know the total object size
  13. DEMO: Multipart Uploads
  14. DEMO: Amazon CloudFront vs. Amazon S3 download performance
  15. • Align your ranges with your parts!
  16. DEMO: Range-based GETs
  17. DynamoDB, Amazon RDS, Amazon CloudSearch, Amazon EC2
  18. Maestro (Reserved Instance): list of crawl URLs; Main workers (Spot Instances): execute crawling and process data; Secondary workers (queue listeners, Spot Instances): reprocess data, query additional services, store data on MongoDB; Secondary work queues: processed data; MongoDB cluster; Command and Control Queue
  19. Architecture; Choosing a region; Building a naming scheme; Considering LISTs; Optimizing PUTs; Multipart upload; Demo; Optimizing GETs; Using CloudFront; Range-based GETs; Demo; Customer Case: BigDataCorp
  20. gfelipe@amazon.com, thoran@bigdatacorp.com.br
  21. Please give us your feedback on this presentation
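
Slide 4 quotes two request-rate figures. As a rough check, the sketch below reproduces the first one and computes the plain hourly average for the second. The function name `average_tps` is made up for illustration, and the even spread over the hour is an assumption. That average comes out near 278 requests per second rather than the 224 TPS on the slide, so the slide's figure presumably rests on assumptions that the transcript does not show.

```python
def average_tps(users: int, events_per_user_per_hour: float) -> float:
    """Average requests per second, assuming events are spread evenly over the hour."""
    return users * events_per_user_per_hour / 3600.0


if __name__ == "__main__":
    print(100 / 8)                          # 12.5, matching the slide's first figure
    print(round(average_tps(100_000, 10)))  # 278
```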
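
Slides 5 to 8 contrast two naming schemes against a partition diagram. One way to see the difference is to compare how much leading text the keys share, using a few keys copied from slides 5 and 7. The helper names below are hypothetical, and reading a long shared prefix as "these keys tend to land together" is an interpretation of the slides rather than something the transcript states.

```python
import os.path

# Three keys from slide 5 (date first) and three from slide 7 (date last).
date_first_keys = [
    "2013_11_13-164533125.jpg",
    "2013_11_12-164533129.jpg",
    "2013_11_11-164533133.jpg",
]
date_last_keys = [
    "521335461-2013_11_13.jpg",
    "465330151-2013_11_13.jpg",
    "987331160-2013_11_13.jpg",
]


def shared_prefix(keys):
    """Longest string that every key starts with (character-wise)."""
    return os.path.commonprefix(keys)


def leading_chars(keys):
    """Sorted set of first characters, a crude measure of key spread."""
    return sorted({key[0] for key in keys})


if __name__ == "__main__":
    print(repr(shared_prefix(date_first_keys)))  # '2013_11_1'
    print(repr(shared_prefix(date_last_keys)))   # ''
    print(leading_chars(date_last_keys))         # ['4', '5', '9']
```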
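
The two naming ideas on slide 9 can be sketched in a few lines. The transcript does not say which hash function produced the digest on the slide. SHA-1 is used here only because it also yields 40 hex characters, so the prefix this code prints is not claimed to match the slide's `0aa3`. The function names are hypothetical. The epoch value 1384531235 is an assumption, chosen because its digits reversed give the slide's `5321354831`.

```python
import hashlib


def hashed_key(name: str, prefix_len: int = 4) -> str:
    """Prepend the first few hex characters of a digest of the name (SHA-1 assumed)."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


def reversed_epoch_key(epoch_seconds: int, name: str) -> str:
    """Prepend the epoch timestamp with its digits reversed, so the
    fastest-changing digit comes first in the key."""
    return f"{str(epoch_seconds)[::-1]}-{name}"


if __name__ == "__main__":
    print(hashed_key("deadmau5_mix.mp3"))
    print(reversed_epoch_key(1384531235, "deadmau5_mix.mp3"))
```

As slide 9 notes, the original name can be kept as object metadata when the key itself is replaced by a hash.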
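
Slide 12 lists what multipart upload offers: a set of parts, sent in parallel, presented as a single object. A minimal sketch of that flow follows. It assumes `client` is a boto3 S3 client (or any object exposing the same four methods), and the helper names `split_parts` and `multipart_upload` are made up for illustration. It holds the whole payload in memory and does no retries, so it is a teaching sketch rather than a production uploader.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum size for every part except the last


def split_parts(data: bytes, part_size: int):
    """Yield (part_number, chunk) pairs; S3 part numbers start at 1."""
    for number, offset in enumerate(range(0, len(data), part_size), start=1):
        yield number, data[offset:offset + part_size]


def multipart_upload(client, bucket, key, data,
                     part_size=MIN_PART_SIZE, workers=4):
    """Upload `data` as parallel parts, then ask S3 to assemble them."""
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

    def send(part):
        number, chunk = part
        response = client.upload_part(Bucket=bucket, Key=key, PartNumber=number,
                                      UploadId=upload_id, Body=chunk)
        return {"PartNumber": number, "ETag": response["ETag"]}

    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map keeps input order, so parts stay sorted by part number.
            parts = list(pool.map(send, split_parts(data, part_size)))
        return client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts})
    except Exception:
        # Abort so the already-uploaded parts are not left behind.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

Passing the client in as a parameter keeps the function easy to exercise with a stand-in object before pointing it at a real bucket.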
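
Slide 15's advice to align ranges with parts can be illustrated by generating byte ranges on the same boundaries that were used for the upload. The function name `aligned_ranges` is hypothetical, and the sketch assumes every part except the last was uploaded with one fixed `part_size`. Each resulting string has the form of an HTTP `Range` header value, which a GET request for the object can carry.

```python
def aligned_ranges(object_size: int, part_size: int):
    """Return Range header values whose boundaries match the upload parts."""
    ranges = []
    for start in range(0, object_size, part_size):
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + part_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


if __name__ == "__main__":
    print(aligned_ranges(25, 10))  # ['bytes=0-9', 'bytes=10-19', 'bytes=20-24']
```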
