Deep Dive on Amazon S3

Amazon Web Services
Jul. 14, 2016

  1. Deep Dive on Amazon S3. Susan Chan, Senior Product Manager, Amazon S3. July 13, 2016. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Recent innovations on S3. New storage offering: • Standard - Infrequent Access. Visibility & control of your data: • Amazon CloudWatch integration • AWS CloudTrail integration • New lifecycle policies • Event notifications • Bucket limit increases • Read-after-write consistency. More data ingestion options: • AWS Snowball (80 TB) • S3 Transfer Acceleration • Amazon Kinesis Firehose • Partner integration
  3. Choice of storage classes on S3: Standard (active data), Standard - Infrequent Access (infrequently accessed data), Amazon Glacier (archive data)
  4. Use cases for Standard - Infrequent Access: • File sync and share + consumer file storage • Backup and archive + disaster recovery • Long-retained data
  5. Standard - Infrequent Access storage. Durable: 11 9s of durability. Available: designed for 99.9% availability. High performance: same as Standard storage. Secure: • Bucket policies • AWS Identity and Access Management (IAM) policies • Many encryption options. Integrated: • Lifecycle management • Versioning • Event notifications • Metrics. Easy to use: • No impact on user experience • Simple REST API
  6. Standard - Infrequent Access storage. Integrated: Lifecycle management • Directly PUT to Standard - IA • Transition Standard to Standard - IA • Transition Standard - IA to Amazon Glacier storage • Expiration lifecycle policy • Versioning support
  7. Transition older objects to Standard - IA
  8. Standard - Infrequent Access storage: Lifecycle policy. Standard Storage -> Standard - IA: <LifecycleConfiguration> <Rule> <ID>sample-rule</ID> <Prefix>documents/</Prefix> <Status>Enabled</Status> <Transition> <Days>30</Days> <StorageClass>STANDARD_IA</StorageClass> </Transition> <Transition> <Days>365</Days> <StorageClass>GLACIER</StorageClass> </Transition> </Rule> </LifecycleConfiguration>
  9. Standard - Infrequent Access storage: Lifecycle policy. Standard Storage -> Standard - IA, then Standard - IA Storage -> Amazon Glacier: <LifecycleConfiguration> <Rule> <ID>sample-rule</ID> <Prefix>documents/</Prefix> <Status>Enabled</Status> <Transition> <Days>30</Days> <StorageClass>STANDARD_IA</StorageClass> </Transition> <Transition> <Days>365</Days> <StorageClass>GLACIER</StorageClass> </Transition> </Rule> </LifecycleConfiguration>
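The two-step tiering rule on these slides can also be generated programmatically. The following is a minimal sketch using only the Python standard library; the function name `build_tiering_rule` and its parameters are hypothetical, and the 30-day minimum before a Standard - IA transition is my understanding of the S3 constraint rather than something stated in the deck. Note that the storage class value is spelled `STANDARD_IA`, with an underscore.

```python
import xml.etree.ElementTree as ET


def build_tiering_rule(prefix: str, ia_days: int = 30, glacier_days: int = 365) -> str:
    """Return a LifecycleConfiguration XML string with two transitions.

    Hypothetical helper for illustration; the element names mirror the
    slide's example policy.
    """
    # Assumption: S3 rejects Standard - IA transitions earlier than 30 days.
    if ia_days < 30:
        raise ValueError("Standard - IA transition is assumed to need >= 30 days")
    # The archive step has to come after the infrequent-access step.
    if glacier_days <= ia_days:
        raise ValueError("Glacier transition must come after the Standard - IA one")

    root = ET.Element("LifecycleConfiguration")
    rule = ET.SubElement(root, "Rule")
    ET.SubElement(rule, "ID").text = "sample-rule"
    ET.SubElement(rule, "Prefix").text = prefix
    ET.SubElement(rule, "Status").text = "Enabled"
    for days, storage_class in ((ia_days, "STANDARD_IA"), (glacier_days, "GLACIER")):
        transition = ET.SubElement(rule, "Transition")
        ET.SubElement(transition, "Days").text = str(days)
        ET.SubElement(transition, "StorageClass").text = storage_class
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_tiering_rule("documents/"))
```

Generating the document from parameters makes it harder to mistype a storage class name or to put the transitions in the wrong order.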
  10. Data ingestion into S3
  11. S3 Transfer Acceleration [Diagram: Uploader, AWS Edge Location, optimized throughput, S3 Bucket] • Typically 50%-400% faster • Change your endpoint, not your code • No firewall exceptions • No client software required • 56 global edge locations
  12. How fast is S3 Transfer Acceleration? [Chart: time in hours for a 500 GB upload to a bucket in Singapore from edge locations in Rio de Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, and Singapore; S3 Transfer Acceleration vs. public Internet]
  13. Getting started 1. Enable S3 Transfer Acceleration on your S3 bucket. 2. Update your endpoint to <bucket-name>.s3-accelerate.amazonaws.com. 3. Done!
  14. How much will it help me? s3speedtest.com
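The getting-started steps can be sketched in Python as follows. The helper names are hypothetical; the two functions that talk to AWS assume the third-party `boto3` package and configured credentials, and import it lazily so the rest of the sketch runs without it. The check for periods in the bucket name reflects my understanding of a Transfer Acceleration restriction and is not stated on the slides.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Step 2: build the accelerate hostname for a bucket."""
    # Assumption: acceleration is unavailable for bucket names with periods.
    if "." in bucket:
        raise ValueError("bucket names containing periods are assumed unsupported")
    return f"{bucket}.s3-accelerate.amazonaws.com"


def enable_acceleration(bucket: str) -> None:
    """Step 1: enable Transfer Acceleration on the bucket (needs boto3)."""
    import boto3  # third-party; imported lazily on purpose

    boto3.client("s3").put_bucket_accelerate_configuration(
        Bucket=bucket,
        AccelerateConfiguration={"Status": "Enabled"},
    )


def accelerated_client():
    """Return an S3 client that sends requests to the accelerate endpoint."""
    import boto3
    from botocore.config import Config

    return boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))


if __name__ == "__main__":
    print(accelerate_endpoint("my-bucket"))
```

With an SDK, switching the client configuration is usually all that "change your endpoint, not your code" amounts to; the upload calls themselves stay the same.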
  15. Tip: Parallelizing PUTs with multipart uploads • Increase aggregate throughput by parallelizing PUTs on high-bandwidth networks • Move the bottleneck to the network, where it belongs • Increase resiliency to network errors; fewer large restarts on error-prone networks Best Practice
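To make the parallel-PUT tip concrete, here is a minimal sketch that splits an object into parts and uploads them concurrently. `plan_parts` and `upload_in_parallel` are hypothetical names, and `upload_part` is a caller-supplied callable (for example, a thin wrapper around an SDK's upload-part call) rather than a real API. The 5 MiB minimum part size and 10,000-part limit are S3 multipart limits as I understand them.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_PART_SIZE = 5 * 1024 * 1024   # assumed S3 minimum for all but the last part
MAX_PARTS = 10_000                # assumed S3 maximum number of parts


def plan_parts(total_size: int, part_size: int = 8 * 1024 * 1024):
    """Return a list of (part_number, offset, length) tuples covering the object."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the assumed 5 MiB minimum")
    count = -(-total_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return [
        (n + 1, n * part_size, min(part_size, total_size - n * part_size))
        for n in range(count)
    ]


def upload_in_parallel(parts, upload_part, max_workers: int = 8):
    """Call upload_part(part_number, offset, length) concurrently.

    Results come back in part order because Executor.map preserves
    the order of its inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda part: upload_part(*part), parts))


if __name__ == "__main__":
    parts = plan_parts(20 * 1024 * 1024)
    # Stand-in uploader that only reports what it would send.
    print(upload_in_parallel(parts, lambda n, off, length: (n, length)))
```

Because each part is an independent request, a network error only forces a retry of the affected part instead of the whole object.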
  16. Best practice: Incomplete multipart upload expiration policy • Partial uploads do incur storage charges • Set a lifecycle policy to automatically expire incomplete multipart uploads after a predefined number of days
  17. Enable policy with the Management Console
  18. Or enable a policy with the API. Example lifecycle policy: <LifecycleConfiguration> <Rule> <ID>sample-rule</ID> <Prefix>MyKeyPrefix/</Prefix> <Status>rule-status</Status> <AbortIncompleteMultipartUpload> <DaysAfterInitiation>7</DaysAfterInitiation> </AbortIncompleteMultipartUpload> </Rule> </LifecycleConfiguration>
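For readers using an SDK rather than raw XML, the same rule can be expressed as a request dictionary. This is a minimal sketch assuming the third-party `boto3` package; the helper names are hypothetical, and `"Enabled"` is filled in for the `rule-status` placeholder as an illustrative choice.

```python
def abort_incomplete_uploads_config(prefix: str, days: int = 7) -> dict:
    """Build a lifecycle configuration that aborts stale multipart uploads."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "sample-rule",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",  # illustrative value for rule-status
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Send the configuration to S3 (needs boto3 and credentials).

    Note: this call replaces the bucket's existing lifecycle configuration,
    so include any rules that should be kept.
    """
    import boto3  # third-party; imported lazily on purpose

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config,
    )


if __name__ == "__main__":
    print(abort_incomplete_uploads_config("MyKeyPrefix/"))
```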
  19. Rob Hruska Engineering Director
  20. 3k collegiate/pro teams • 125k amateur teams
  21. 8+ petabytes • 4,000,000,000+ objects • 35 hours of video / minute
  22. • Video Upload/Storage Pipeline • S3 Transfer Acceleration • Standard – Infrequent Access
  23. Hudl video lifecycle [Diagram components: Browsers, Mobile Apps, Hudl Uploader, Amazon CloudFront, Application, Upload-Only Bucket, Amazon SQS, Video Processing, Permanent Bucket; flows labeled Video and Metadata / Object Locations (HTTP)]
  24. Hudl video lifecycle [same diagram as slide 23]
  25. Hudl video lifecycle – Transfer Acceleration [same diagram, with the video upload path labeled Transfer Acceleration]
  26. Hudl video lifecycle [same diagram as slide 23]
  27. Hudl video lifecycle – Standard - IA [same diagram as slide 23]
  28. Standard - IA: > 50% transitioned [Chart: Standard - IA (GB) vs. Standard (GB)]
  29. Finding an ideal Standard - IA transition lifecycle [Chart: video views/week; axes: Video Views, Weeks After Created]
  30. Bucket organization [Diagram labels: Game Video, Highlight Video, Recruit Video; Teams N-N under each]
  31. Future thoughts – Amazon Glacier [Diagram: Amazon S3 Standard, Amazon S3 Standard - IA, Amazon Glacier]
  32. Tip #1: Use versioning (best practice) • Protects from accidental overwrites and deletes • New version with every upload • Easy retrieval of deleted objects and rollback to previous versions
  33. Tip #2: Use lifecycle policies • Automatic tiering and cost controls • Includes two possible actions: • Transition: archives to Standard - IA or Amazon Glacier based on the object age you specify • Expiration: deletes objects after a specified time • Actions can be combined • Set policies at the bucket or prefix level • Set policies for current or noncurrent versions
  34. Versioning + lifecycle policies
  35. Expired object delete marker policy • Deleting a versioned object makes a delete marker the current version of the object • Removing expired object delete markers can improve list performance • The lifecycle policy automatically removes the current-version delete marker when no previous versions of the object remain
  36. Enable policy with the console [console screenshot placeholder]
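The delete-marker behavior described on this slide is easier to follow with a toy model. The class below is an in-memory illustration only, not how S3 is implemented; `VersionedKey` and its method names are hypothetical. It shows a delete marker becoming the current version, and the marker counting as "expired" once it is the only version left.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Each version is ("object", data) or ("delete-marker", None).
Version = Tuple[str, Optional[bytes]]


@dataclass
class VersionedKey:
    versions: List[Version] = field(default_factory=list)  # oldest first

    def put(self, data: bytes) -> None:
        """Every upload adds a new current version."""
        self.versions.append(("object", data))

    def delete(self) -> None:
        """A delete adds a marker on top; older versions are kept."""
        self.versions.append(("delete-marker", None))

    def get(self) -> Optional[bytes]:
        """Return current data, or None if missing or hidden by a marker."""
        if not self.versions or self.versions[-1][0] == "delete-marker":
            return None
        return self.versions[-1][1]

    def expire_noncurrent_versions(self) -> None:
        """Stand-in for a lifecycle rule that removes all previous versions."""
        self.versions = self.versions[-1:]

    def remove_expired_delete_marker(self) -> bool:
        """Remove the marker only when no other versions remain."""
        if len(self.versions) == 1 and self.versions[0][0] == "delete-marker":
            self.versions.clear()
            return True
        return False


if __name__ == "__main__":
    key = VersionedKey()
    key.put(b"v1")
    key.delete()
    print(key.remove_expired_delete_marker())  # marker still has a previous version
    key.expire_noncurrent_versions()
    print(key.remove_expired_delete_marker())  # marker is now the only version
```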
  37. Tip #3: Restrict deletes • Bucket policies can restrict deletes • For additional security, enable MFA (multi-factor authentication) delete, which requires additional authentication to: • Change the versioning state of your bucket • Permanently delete an object version • MFA delete requires both your security credentials and a code from an approved authentication device Best Practice
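As an illustration of restricting deletes with a bucket policy, the sketch below builds a policy document that denies object deletion. The function name and the `Sid` value are hypothetical, and the policy is deliberately blunt: it denies deletes for every principal, whereas a real policy would usually narrow the principal or add conditions.

```python
import json


def deny_delete_policy(bucket: str) -> dict:
    """Build a bucket policy that denies deleting objects and object versions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyObjectDeletes",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",            # applies to every caller
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    # The JSON text is what would be attached to the bucket as its policy.
    print(json.dumps(deny_delete_policy("my-bucket"), indent=2))
```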
  38. Tip #4: Distribute key names. Use a key-naming scheme with randomness at the beginning for high TPS • Most important if you regularly exceed 100 TPS on a bucket • Avoid starting with a date • Consider adding a hash or reversed timestamp (ssmmhhddmmyy) Don’t do this… <my_bucket>/2013_11_13-164533125.jpg <my_bucket>/2013_11_13-164533126.jpg <my_bucket>/2013_11_13-164533127.jpg <my_bucket>/2013_11_13-164533128.jpg <my_bucket>/2013_11_12-164533129.jpg <my_bucket>/2013_11_12-164533130.jpg <my_bucket>/2013_11_12-164533131.jpg <my_bucket>/2013_11_12-164533132.jpg <my_bucket>/2013_11_11-164533133.jpg <my_bucket>/2013_11_11-164533134.jpg <my_bucket>/2013_11_11-164533135.jpg <my_bucket>/2013_11_11-164533136.jpg
  39. Distributing key names Add randomness to the beginning of the key name… <my_bucket>/521335461-2013_11_13.jpg <my_bucket>/465330151-2013_11_13.jpg <my_bucket>/987331160-2013_11_13.jpg <my_bucket>/465765461-2013_11_13.jpg <my_bucket>/125631151-2013_11_13.jpg <my_bucket>/934563160-2013_11_13.jpg <my_bucket>/532132341-2013_11_13.jpg <my_bucket>/565437681-2013_11_13.jpg <my_bucket>/234567460-2013_11_13.jpg <my_bucket>/456767561-2013_11_13.jpg <my_bucket>/345565651-2013_11_13.jpg <my_bucket>/431345660-2013_11_13.jpg
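Both suggestions from the key-naming tip, a hash prefix and a reversed timestamp, can be sketched in a few lines. The function names and the four-character prefix length are illustrative choices, and MD5 is used here only to spread names evenly, not for security.

```python
import hashlib
from datetime import datetime


def hashed_key(name: str, prefix_len: int = 4) -> str:
    """Prepend a short hex prefix derived from the name itself.

    The prefix is deterministic, so the full key can be recomputed
    from the original name when reading the object back.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


def reversed_timestamp(moment: datetime) -> str:
    """Format a time as ssmmhhddmmyy so the fastest-changing digits lead."""
    return moment.strftime("%S%M%H%d%m%y")


if __name__ == "__main__":
    print(hashed_key("2013_11_13-164533125.jpg"))
    print(reversed_timestamp(datetime(2013, 11, 13, 16, 45, 33)))
```

A hash derived from the name keeps lookups simple, while a purely random prefix would have to be stored somewhere alongside the object's metadata.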
  40. Thank you!