Learn about new and existing Amazon S3 features that can help you better protect your data, save on cost, and improve usability, security, and performance. We will cover a wide variety of Amazon S3 features and go into depth on several newer features with configuration and code snippets, so you can apply what you learn to your object storage workloads.
2. AWS storage services
File
• Amazon EFS
Block
• Amazon EBS
• Amazon EC2 instance store
Object
• Amazon S3
• Amazon Glacier
Data transfer
• AWS Direct Connect
• Snowball
• ISV connectors
• Amazon Kinesis Firehose
• Transfer Acceleration
• AWS Storage Gateway
3. Innovation for Amazon S3 (1/2)
• Cross-region replication
• Amazon CloudWatch metrics for Amazon S3 & AWS CloudTrail support
• VPC endpoint for Amazon S3
• Read-after-write consistency in all regions
• Event notifications
• Amazon S3 bucket limit increase
4. Innovation for Amazon S3 (2/2)
• Amazon S3 Standard-IA
• Transfer Acceleration
• Incomplete multipart upload expiration
• Expired object delete marker
5. Choice of storage classes on Amazon S3
Storage classes, ordered from active data to archive data:
• Standard
• Standard - Infrequent Access
• Amazon Glacier
6. Some use cases have different requirements
• File sync and share / consumer file storage
• Backup and archive / disaster recovery
• Long retained data
7. Standard-Infrequent Access storage
Durable
• 11 9s of durability
Available
• Designed for 99.9% availability
High performance
• Same throughput as Amazon S3 Standard storage
Secure
• Server-side encryption
• Use your encryption keys
• KMS-managed encryption keys
Integrated
• Lifecycle management
• Versioning
• Event notifications
• Metrics
Easy to use
• No impact on user experience
• Simple REST API
• Single bucket
9. Lifecycle policies
Automatic tiering and cost controls
Includes two possible actions:
• Transition: moves objects to Standard-IA or Glacier after a specified time
• Expiration: deletes objects after a specified time
Allows for actions to be combined
Set policies at the key prefix level
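The two actions can be combined in a single rule scoped to a key prefix. The following is a minimal sketch that builds such a configuration with the Python standard library; the rule ID, the `logs/` prefix, and the day counts are illustrative assumptions, not values from the talk.

```python
import xml.etree.ElementTree as ET


def build_lifecycle_xml(prefix, ia_days, glacier_days, expire_days):
    """Build a lifecycle configuration that tiers objects, then expires them."""
    # S3 requires objects to be at least 30 days old before a Standard-IA
    # transition, and each later action must come after the previous one.
    if not 30 <= ia_days < glacier_days < expire_days:
        raise ValueError("expected 30 <= ia_days < glacier_days < expire_days")

    root = ET.Element("LifecycleConfiguration")
    rule = ET.SubElement(root, "Rule")
    ET.SubElement(rule, "ID").text = "tier-then-expire"  # hypothetical rule name
    ET.SubElement(rule, "Prefix").text = prefix           # rule applies to this key prefix
    ET.SubElement(rule, "Status").text = "Enabled"

    # Transition actions: Standard -> Standard-IA -> Glacier.
    for days, storage_class in ((ia_days, "STANDARD_IA"), (glacier_days, "GLACIER")):
        transition = ET.SubElement(rule, "Transition")
        ET.SubElement(transition, "Days").text = str(days)
        ET.SubElement(transition, "StorageClass").text = storage_class

    # Expiration action: delete the object after expire_days.
    expiration = ET.SubElement(rule, "Expiration")
    ET.SubElement(expiration, "Days").text = str(expire_days)
    return ET.tostring(root, encoding="unicode")


lifecycle_xml = build_lifecycle_xml("logs/", ia_days=30, glacier_days=90, expire_days=365)
print(lifecycle_xml)
```

The resulting XML has the same shape as the lifecycle examples later in this deck and would still need to be applied to a bucket through the console, CLI, or an SDK.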
11. Versioning S3 buckets
Protects against accidental overwrites and deletes
New version with every upload
Easy retrieval of deleted objects and rollback to earlier versions
Three states of an Amazon S3 bucket
• Unversioned (Default)
• Versioning-enabled
• Versioning-suspended
13. Expired object delete marker policy
Deleting a versioned object makes a delete marker the current version of the object
No storage charge for delete marker
Removing delete marker can improve list performance
Lifecycle policy to automatically remove the current version delete marker when previous versions of the object no longer exist
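To make the delete marker behavior concrete, here is a toy in-memory model of a versioning-enabled bucket. It is a sketch for illustration only and does not call or imitate the S3 API; the class and method names are assumptions.

```python
class ToyVersionedBucket:
    """In-memory stand-in for a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        # key -> list of (body, is_delete_marker) tuples, newest last
        self._versions = {}

    def put(self, key, body):
        # Every upload adds a new version instead of overwriting.
        self._versions.setdefault(key, []).append((body, False))

    def delete(self, key):
        # A plain delete keeps the data and adds a delete marker on top.
        self._versions.setdefault(key, []).append((None, True))

    def get(self, key):
        stack = self._versions.get(key, [])
        if not stack or stack[-1][1]:
            return None  # key is absent or its current version is a delete marker
        return stack[-1][0]

    def expire_noncurrent(self, key):
        # Mimics noncurrent version expiration: only the current version survives.
        self._versions[key] = self._versions[key][-1:]

    def has_expired_delete_marker(self, key):
        # "Expired" here means the delete marker is the only version left.
        stack = self._versions.get(key, [])
        return len(stack) == 1 and stack[0][1]


bucket = ToyVersionedBucket()
bucket.put("a.jpg", b"v1")
bucket.put("a.jpg", b"v2")
bucket.delete("a.jpg")
print(bucket.get("a.jpg"))                        # None: hidden by the delete marker
print(bucket.has_expired_delete_marker("a.jpg"))  # False: older versions still exist
bucket.expire_noncurrent("a.jpg")
print(bucket.has_expired_delete_marker("a.jpg"))  # True: only the marker remains
```

Once only the marker remains, it serves no purpose, which is the situation the expired object delete marker policy cleans up.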
14. Example lifecycle policy to remove current versions
<LifecycleConfiguration>
  <Rule>
    ...
    <Expiration>
      <Days>60</Days>
    </Expiration>
    <NoncurrentVersionExpiration>
      <NoncurrentDays>30</NoncurrentDays>
    </NoncurrentVersionExpiration>
  </Rule>
</LifecycleConfiguration>
Leverage lifecycle to expire current and non-current versions
S3 Lifecycle will automatically remove any expired object delete markers
Expired object delete marker policy
15. Example lifecycle policy for non-current version expiration
The NoncurrentVersionExpiration action removes all the noncurrent versions.
<LifecycleConfiguration>
  <Rule>
    ...
    <Expiration>
      <ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>
    </Expiration>
    <NoncurrentVersionExpiration>
      <NoncurrentDays>30</NoncurrentDays>
    </NoncurrentVersionExpiration>
  </Rule>
</LifecycleConfiguration>
The ExpiredObjectDeleteMarker element removes expired object delete markers.
Expired object delete marker policy
16. Restricting deletes with MFA
Bucket policies can restrict deletes
For additional security, enable MFA (multi-factor authentication) delete, which requires additional authentication to:
• Change the versioning state of your bucket
• Permanently delete an object version
MFA delete requires both your security credentials and a code from an approved authentication device
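As one possible way for a bucket policy to restrict deletes, the sketch below builds a policy document that denies delete actions unless the request was authenticated with MFA. This is a policy-level control and is separate from the MFA delete bucket setting described above; the bucket name and statement ID are hypothetical.

```python
import json


def deny_delete_without_mfa_policy(bucket_name):
    """Return a bucket policy that denies object deletes made without MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDeleteWithoutMFA",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # aws:MultiFactorAuthAge is absent (null) when the caller
                # did not authenticate with an MFA device.
                "Condition": {"Null": {"aws:MultiFactorAuthAge": "true"}},
            }
        ],
    }


policy_json = json.dumps(deny_delete_without_mfa_policy("example-bucket"), indent=2)
print(policy_json)
```

Test a policy like this on a non-production bucket first, since an explicit Deny also applies to administrators who are not using MFA.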
18. Parallel PUTs with Multipart Uploads
Increase throughput by parallelizing PUTs
Increase resiliency to network errors
Fewer large restarts on error-prone networks
A balance between part size & number of parts:
• Small parts increase connection overhead
• Large parts provide fewer of the benefits of multipart
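The trade-off between part size and part count can be sketched as a small planning function. It assumes the documented S3 limits of a 5 MiB minimum part size (except for the last part) and at most 10,000 parts per upload; the 16 MiB default target is an arbitrary starting point, not a recommendation from the talk.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PARTS = 10_000       # S3 maximum number of parts in one multipart upload


def ceil_div(a, b):
    """Integer ceiling division without floating point rounding."""
    return (a + b - 1) // b


def plan_parts(object_size, target_part_size=16 * MIB):
    """Return (part_size, part_count) for uploading object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Never go below the S3 minimum part size.
    part_size = max(target_part_size, MIN_PART_SIZE)
    # Grow the part size if the target would need more than MAX_PARTS parts.
    part_size = max(part_size, ceil_div(object_size, MAX_PARTS))
    return part_size, ceil_div(object_size, part_size)


print(plan_parts(100 * MIB))  # a 100 MiB object with the default 16 MiB target
print(plan_parts(1024 ** 4))  # a 1 TiB object forces larger parts
```

The sketch ignores the upper bound on individual part size, which only matters for objects in the multi-terabyte range.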
19. Incomplete multipart upload expiration policy
Multipart upload feature improves PUT performance
Partial upload does not appear in bucket list
Partial upload does incur storage charges
Set a lifecycle policy to automatically expire incomplete multipart uploads after a predefined number of days
20. Example lifecycle policy
Abort incomplete multipart uploads seven days after initiation
<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Prefix>SomeKeyPrefix/</Prefix>
    <Status>rule-status</Status>
    <AbortIncompleteMultipartUpload>
      <DaysAfterInitiation>7</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
  </Rule>
</LifecycleConfiguration>
Incomplete multipart upload expiration policy
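To illustrate what such a rule selects, the sketch below filters a list of in-progress uploads by age. The record fields (`Key`, `UploadId`, `Initiated`) mirror the names returned when listing multipart uploads, but the data here is made up, and the real lifecycle engine may evaluate timing differently from this simple cutoff.

```python
from datetime import datetime, timedelta, timezone


def uploads_to_abort(uploads, now, days_after_initiation=7):
    """Return the UploadIds of uploads started at least N days before now."""
    cutoff = now - timedelta(days=days_after_initiation)
    return [u["UploadId"] for u in uploads if u["Initiated"] <= cutoff]


now = datetime(2016, 6, 15, tzinfo=timezone.utc)  # fixed clock for the example
in_progress = [  # hypothetical uploads under SomeKeyPrefix/
    {"Key": "SomeKeyPrefix/a.mp4", "UploadId": "old-1",
     "Initiated": now - timedelta(days=10)},
    {"Key": "SomeKeyPrefix/b.mp4", "UploadId": "new-1",
     "Initiated": now - timedelta(days=2)},
]
print(uploads_to_abort(in_progress, now))
```

With the lifecycle rule in place, S3 performs this cleanup itself, so no scheduled script is needed.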
21. Parallel GETs
Use range-based GETs to get multithreaded performance when downloading objects
Compensates for unreliable networks
Benefits of multithreaded parallelism
Align your ranges with your parts!
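A minimal sketch of range-based parallel download follows. It computes byte ranges aligned to a given part size and fetches them on a thread pool; `fetch_range` is a hypothetical callable standing in for an HTTP GET with a `Range` header, and the demo uses an in-memory byte string instead of S3.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(object_size, part_size):
    """Return inclusive (start, end) offsets that tile the object part by part."""
    return [
        (start, min(start + part_size, object_size) - 1)
        for start in range(0, object_size, part_size)
    ]


def range_header(start, end):
    """Format an HTTP Range header value for an inclusive byte range."""
    return f"bytes={start}-{end}"


def parallel_get(fetch_range, object_size, part_size, max_workers=4):
    """Fetch all ranges concurrently and reassemble them in order."""
    ranges = byte_ranges(object_size, part_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the input ranges.
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(chunks)


# Demo against an in-memory "object"; a real fetch_range would issue a GET
# with the header produced by range_header(start, end).
data = bytes(range(256)) * 10
downloaded = parallel_get(lambda s, e: data[s:e + 1], len(data), part_size=1000)
print(downloaded == data)
```

Using the same size for ranges as was used for upload parts keeps each request aligned with a part boundary, as the slide advises.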
22. Parallel LISTs
Parallelize LIST when you need a sequential list of your keys
Secondary index to get a faster alternative to LIST
• Sorting by metadata
• Searchability
• Objects by timestamp
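One way to parallelize LIST is to split the keyspace by prefix and list each prefix concurrently. The sketch below assumes keys begin with a lowercase hexadecimal character (for example, a hash prefix); `list_prefix` is a hypothetical callable that returns all keys under one prefix, replaced here by an in-memory fake.

```python
from concurrent.futures import ThreadPoolExecutor

HEX_PREFIXES = "0123456789abcdef"  # assumes keys start with a hex character


def parallel_list(list_prefix, prefixes=HEX_PREFIXES, max_workers=8):
    """List every prefix concurrently and return one sorted key list."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(list_prefix, prefixes))
    keys = [key for batch in batches for key in batch]
    # Sorting restores the sequential order a single LIST would return.
    return sorted(keys)


# In-memory stand-in for a bucket; a real list_prefix would page through
# LIST results for the given prefix.
all_keys = ["0a/x.jpg", "3c/v.jpg", "0b/y.jpg", "a1/z.jpg", "f9/w.jpg"]


def fake_list_prefix(prefix):
    return [key for key in all_keys if key.startswith(prefix)]


print(parallel_list(fake_list_prefix))
```

For queries such as sorting by metadata or finding objects by timestamp, a secondary index kept outside S3 avoids listing altogether.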
23. Distributing object keys
Most important if you regularly exceed 100 TPS on a bucket
Distribute keys uniformly across keyspace
Use a key-naming scheme with randomness at the beginning
26. Distributing object keys
Add randomness to the beginning of the key name…
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
27. Distributing object keys
…so your transactions can be distributed across the partitions
[Diagram: keys 1, 2, ... N spread across multiple partitions]
28. Techniques for distributing keys
Store as a hash:
• 83d02a66a0fee41b5767e4f4dd377d29
Prepend with short hash:
• 83d02013_11_13-164533125.jpg
Reverse:
• 521335461-31_11_3102.jpg
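The three techniques can be sketched in a few lines. The slide does not say which hash function produced its examples, so MD5 and the four-character prefix length are assumptions chosen to match the shape of the keys shown.

```python
import hashlib


def hashed_key(name):
    """Store as a hash: replace the name with its hex digest (MD5 assumed)."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


def short_hash_prefixed_key(name, length=4):
    """Prepend a short hash so the original name stays readable."""
    return hashed_key(name)[:length] + name


def reversed_key(name):
    """Reverse the name but keep the file extension in place."""
    stem, dot, extension = name.rpartition(".")
    if not dot:
        return name[::-1]  # no extension to preserve
    return stem[::-1] + dot + extension


original = "2013_11_13-164533125.jpg"
print(hashed_key(original))
print(short_hash_prefixed_key(original))
print(reversed_key(original))
```

Prepending a short hash is often the most practical of the three because the original name remains visible. Fully hashed keys need a separate lookup to map back to names, and reversal only helps when the end of the name varies more than the start.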
30. Data ingestion into Amazon S3
AWS Import/Export Snowball
• Accelerate PBs with AWS-provided appliances
• 80TB and global availability
AWS Storage Gateway
• Up to 120 MB/s cloud upload rate (4x improvement), and
• 10 Gb networking for VMware
Amazon Kinesis Firehose
• Ingest data streams directly into AWS data stores
AWS Direct Connect
ISV connectors
Transfer Acceleration
• Move data up to 300% faster using the AWS network
32. Introducing Amazon S3 Transfer Acceleration
Up to 300% faster
Change your endpoint, not your code
56 global edge locations
No firewall exceptions
No client software required
[Diagram: uploader to AWS edge location to S3 bucket, with optimized throughput]
33. How fast is Transfer Acceleration?
[Chart: time in hours for a 500 GB upload from these edge locations to a bucket in Singapore, comparing Public Internet with S3 Transfer Acceleration. Locations: Rio De Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, Singapore]
34. Getting Started
1. Enable S3 Transfer Acceleration on your S3 bucket
2. Update your application/destination URL to <bucket-name>.s3-accelerate.amazonaws.com
3. Done!
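Step 2 amounts to swapping the host name, which the following sketch shows. The `https` scheme and the example bucket and key are assumptions. The check for periods reflects the Transfer Acceleration requirement that bucket names be DNS-compliant and contain no periods.

```python
from urllib.parse import quote


def accelerate_url(bucket_name, key):
    """Build an object URL on the Transfer Acceleration endpoint."""
    if "." in bucket_name:
        # Transfer Acceleration does not support bucket names with periods.
        raise ValueError("bucket name must not contain periods")
    # quote() percent-encodes unsafe characters but leaves "/" intact.
    return f"https://{bucket_name}.s3-accelerate.amazonaws.com/{quote(key)}"


print(accelerate_url("my-bucket", "videos/a b.mp4"))
```

Requests to this URL still need normal S3 authentication; only the endpoint changes.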
35. How much will it help me?
Use the Amazon S3 Transfer Acceleration Speed Comparison page:
http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
37. Y-cam Solutions Ltd
Confidential and proprietary
Who we are...
• Initially used S3 just to store videos and thumbnails, 6 years ago
• But now we also use S3 for so much more
• 120 million objects
• 2 million videos
39. Challenges
• Handling the expiration of videos
• Legacy scripts
• Reducing servers, cutting costs
40. Video Expiration
• Create multiple buckets with different lifecycle policies
• Improve code to decide which bucket to save each video in
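The routing decision could look something like the sketch below. The bucket names and retention periods are entirely hypothetical and are not taken from the Y-cam deployment; the idea is only that each bucket carries its own lifecycle expiration and the application picks the bucket that matches a video's retention.

```python
# Hypothetical buckets, each configured with a lifecycle rule that expires
# objects after the given number of days.
RETENTION_BUCKETS = {
    7: "videos-7d",
    30: "videos-30d",
    365: "videos-365d",
}


def bucket_for_retention(retention_days):
    """Pick the bucket with the shortest expiration that still covers the video."""
    for days in sorted(RETENTION_BUCKETS):
        if retention_days <= days:
            return RETENTION_BUCKETS[days]
    raise ValueError(f"no bucket retains objects for {retention_days} days")


print(bucket_for_retention(7))
print(bucket_for_retention(10))
```

With this arrangement, expiration is handled by S3 lifecycle rules rather than by application code that tracks and deletes videos.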
41. Legacy Script
• Move thumbnail creation and DynamoDB updates from the script to a Lambda function
• Lambda triggered by S3 event notification
• Extra benefits of using Lambda
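A Lambda function triggered by an S3 event notification receives the bucket name and object key in the event payload. The sketch below shows only that parsing step; `make_thumbnail` and `update_index` are hypothetical placeholders for the thumbnail and DynamoDB work, and the thumbnail key layout is an assumption.

```python
from urllib.parse import unquote_plus


def objects_from_event(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded, with spaces encoded as "+".
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def make_thumbnail(bucket, key):
    """Placeholder: a real version would read the video and write an image."""
    return f"thumbnails/{key}.jpg"  # hypothetical thumbnail key layout


def update_index(key, thumbnail_key):
    """Placeholder: a real version would write a record to DynamoDB."""


def handler(event, context):
    """Lambda entry point: process every object named in the event."""
    processed = []
    for bucket, key in objects_from_event(event):
        thumbnail_key = make_thumbnail(bucket, key)
        update_index(key, thumbnail_key)
        processed.append((bucket, key, thumbnail_key))
    return processed


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-videos"},
                "object": {"key": "videos/my+clip.mp4"}}}
    ]
}
print(handler(sample_event, None))
```

Because the function runs only when an object is created, no server has to poll for new videos.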
42. Future Plans
Reducing number of servers
• Servers only serving web app JS code
• Moved this to be hosted by S3
• Reduced cost
Moving towards serverless architecture