In this session, storage experts will walk you through Amazon S3 and Amazon Glacier, bulk data repositories that can deliver 99.999999999% durability and scale past trillions of objects worldwide - with cost points competitive against tape archives. Learn about the different ways you can accelerate data transfer into S3 and get a close look at new tools to secure and manage your data more efficiently. See how Amazon Athena runs "query in place" analytics on your data and hear about the new expedited and bulk retrievals from Amazon Glacier. Learn how AWS customers have built solutions that turn their data from a cost into a strategic asset, and bring your toughest questions straight to our experts. Learn More: https://aws.amazon.com/government-education/
2. Cloud Data Migration
• AWS Direct Connect
• AWS Snow* data transport family
• Third-party connectors
• S3 Transfer Acceleration
• AWS Storage Gateway
• Amazon Kinesis Firehose

The AWS Storage Portfolio
• Object: Amazon S3, Amazon Glacier
• Block: Amazon EBS (persistent), Amazon EC2 instance store (ephemeral)
• File: Amazon EFS
3. What to expect from the session
• Pick the right storage class for your use cases
• Automate management tasks
• Best practices to optimize S3 performance
• Tools to help you manage storage
• Storage migration, tiering, bursting
4. Choice of storage classes on S3
• Standard – active data
• Standard - Infrequent Access – infrequently accessed data
• Amazon Glacier – archive data
5. Storage classes designed for your use case
S3 Standard
• Big data analysis
• Content distribution
• Static website hosting
Standard - IA
• Backup & archive
• Disaster recovery
• File sync & share
• Long-retained data
Amazon Glacier
• Long-term archives
• Digital preservation
• Magnetic tape replacement
7. When should you move to Standard-IA?
S3 Analytics – storage class analysis
• Visualize access patterns on your data over time
• Measure the object age at which data becomes infrequently accessed
• Dive deep by bucket, prefix, or specific object tags
• Easily create a lifecycle policy based on the analysis
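Enabling storage class analysis can be scripted with boto3's `put_bucket_analytics_configuration`. A minimal sketch follows; the bucket names, configuration Id, and `logs/` prefix are hypothetical placeholders, not from the session.

```python
# Sketch: enable S3 storage class analysis on a prefix with boto3.
# Bucket names, the configuration Id, and the "logs/" prefix are hypothetical.
def analytics_config(export_bucket_arn, prefix="logs/"):
    """Build the AnalyticsConfiguration payload for storage class analysis,
    exporting daily results as CSV for lifecycle-policy planning."""
    return {
        "Id": "logs-access-pattern",
        "Filter": {"Prefix": prefix},
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": export_bucket_arn,
                        "Prefix": "analysis-export/",
                    }
                },
            }
        },
    }

def enable_analysis(bucket):
    import boto3  # requires AWS credentials; not executed here
    boto3.client("s3").put_bucket_analytics_configuration(
        Bucket=bucket,
        Id="logs-access-pattern",
        AnalyticsConfiguration=analytics_config("arn:aws:s3:::my-report-bucket"),
    )
```

The exported CSV lands in the destination bucket, where the analysis-driven lifecycle policy from the console maps onto the same prefix.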
8. Amazon Glacier
Archival storage for infrequently accessed data
• Amazon Glacier is optimized for infrequent retrieval
• Stop managing physical media
• Even lower cost than Amazon S3; same high durability
9. Amazon Glacier – Data Retrieval Tiers
Standard Retrieval – $0.01/GB
• Current model
• 3-5 hours
• Disaster recovery
Bulk Retrieval – $0.0025/GB
• Batch/bulk access
• 5-12 hours
• PB-scale re-transcoding or video/image analysis
Expedited Retrieval – $0.03/GB
• Emergency access
• 1-5 minutes
• Last-minute play-out schedule swap
On-site and off-site tape replacement
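For objects that S3 lifecycle policies have archived to Glacier, the retrieval tier is chosen per restore request. A minimal sketch with boto3's `restore_object` (bucket and key names are hypothetical):

```python
# Sketch: request a retrieval tier when restoring an object that an S3
# lifecycle policy archived to Glacier. Bucket/key names are hypothetical.
def restore_request(days=1, tier="Expedited"):
    """Tier is "Expedited" (1-5 min), "Standard" (3-5 h), or "Bulk" (5-12 h)."""
    assert tier in ("Expedited", "Standard", "Bulk")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}

def restore(bucket, key, tier="Bulk"):
    import boto3  # requires AWS credentials; not executed here
    boto3.client("s3").restore_object(
        Bucket=bucket, Key=key, RestoreRequest=restore_request(days=7, tier=tier)
    )
```

`Days` controls how long the restored copy stays available in Standard-RRS alongside the archived original before it is removed again.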
10. S3 Data Lake Example: FINRA
Use Case
• Quickly ingest data from many sources, and store it efficiently in one location
• Multiple analytics/processing tools (e.g., Amazon Athena, Amazon EMR, Amazon Redshift) to
speed time to value
• Migrate on-premises data warehouses, Hadoop & big data clusters to AWS
Value Proposition
• Decouple storage and compute—scale optimally and cost efficiently
• Eliminate silos—centrally manage, govern and access all data
• Use the right analytic tools for the job - evolve as requirements change - no data migration
FINRA TCO
• Analyzes & stores 75 billion events per day with S3, EMR & Amazon Redshift
• Securely stores 5PB of historical data on S3 for deeper ad hoc analytics
• Increased agility and speed to results
• Estimated savings of $10-20M per year over previous on-premises solution
11. Amazon Glacier Example:
Satellite Image Archive
• DigitalGlobe takes satellite imagery of the Earth
• 100 PB image library = 6 billion square kilometers
• 1 PB of new imagery every year
• Images to be archived and retained for decades
12. Topics
Pick the right storage class for your use cases
Automate management tasks
• Best practices to optimize S3 performance
• Tools to help you manage storage
• Storage migration, tiering, bursting
16. Automate data management
Lifecycle policies
• Automatic tiering and cost controls
• Include two possible actions:
• Transition: archives objects to Standard - IA or Amazon Glacier based on the object age you specify
• Expiration: deletes objects after a specified time
• Actions can be combined
• Set policies by bucket, prefix, or tags
• Set policies for current or noncurrent versions
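A lifecycle policy combining transition and expiration can also be applied programmatically with boto3. In this sketch the `logs/` prefix and day counts are hypothetical choices:

```python
# Sketch: a lifecycle policy combining transition and expiration actions.
# The "logs/" prefix and day counts are hypothetical.
def lifecycle_config():
    return {
        "Rules": [{
            "ID": "tier-then-expire",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            # Transition: Standard -> Standard-IA at 30 days, -> Glacier at 90
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Expiration: delete after one year
            "Expiration": {"Days": 365},
        }]
    }

def apply(bucket):
    import boto3  # requires AWS credentials; not executed here
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lifecycle_config()
    )
```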
17. Set up a lifecycle policy on the AWS Management Console
20. Protect your data from accidental deletes
Best practice: versioning
• Protects from unintended user deletes or application logic failures
• New version with every upload
• Easy retrieval of deleted objects and rollback to previous versions
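Turning on versioning is a one-call bucket setting. A minimal boto3 sketch (bucket name hypothetical):

```python
# Sketch: turn on bucket versioning with boto3; bucket name is hypothetical.
def versioning_config(status="Enabled"):
    assert status in ("Enabled", "Suspended")
    return {"Status": status}

def enable_versioning(bucket):
    import boto3  # requires AWS credentials; not executed here
    boto3.client("s3").put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration=versioning_config()
    )
    # After this, every PUT creates a new version, and a simple DELETE
    # inserts a delete marker instead of removing data, so objects can
    # be recovered by deleting the marker or fetching an older version.
```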
21. Automate with trigger-based workflows
Amazon S3 event notifications
• Notification when objects are created (via PUT, POST, Copy, or Multipart Upload) or deleted (via DELETE)
• Filter on prefixes and suffixes
• Trigger workflows with Amazon SNS topics, Amazon SQS queues, and AWS Lambda functions
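Wiring an event notification to a Lambda function can be sketched with boto3's `put_bucket_notification_configuration`. The bucket name and Lambda ARN below are hypothetical placeholders:

```python
# Sketch: invoke a Lambda function when .jpg objects are created. The
# bucket name and Lambda ARN are hypothetical placeholders.
def notification_config(lambda_arn, suffix=".jpg"):
    return {
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": lambda_arn,
            "Events": ["s3:ObjectCreated:*"],  # PUT, POST, Copy, Multipart
            "Filter": {"Key": {"FilterRules": [
                {"Name": "suffix", "Value": suffix},
            ]}},
        }]
    }

def subscribe(bucket, lambda_arn):
    import boto3  # requires AWS credentials; not executed here
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=notification_config(lambda_arn)
    )
```

SNS topics and SQS queues hang off the same configuration via `TopicConfigurations` and `QueueConfigurations`.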
22. Cross-region replication
Automated, fast, and reliable asynchronous replication of data across AWS regions
Use cases:
• Compliance – store data hundreds of miles apart
• Lower latency – distribute data to regional customers
• Security – create remote replicas managed by separate AWS accounts
How it works:
• Replicates only new PUTs; once configured, all new uploads to the source bucket are replicated
• Applies to the entire bucket or a prefix
• 1:1 replication between any two regions
• Versioning required
• Deletes and lifecycle actions are not replicated
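The replication rule itself is a small payload. A sketch with boto3's `put_bucket_replication` (bucket and role ARNs are hypothetical, and both buckets must already have versioning enabled):

```python
# Sketch: replicate a whole versioned bucket to another region. Bucket and
# role ARNs are hypothetical; versioning must be on for both buckets.
def replication_config(role_arn, dest_bucket_arn):
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate on your behalf
        "Rules": [{
            "Prefix": "",  # empty prefix = entire bucket
            "Status": "Enabled",
            "Destination": {"Bucket": dest_bucket_arn},
        }],
    }

def enable_replication(source_bucket, role_arn, dest_bucket_arn):
    import boto3  # requires AWS credentials; not executed here
    boto3.client("s3").put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=replication_config(role_arn, dest_bucket_arn),
    )
```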
23. Summary – automate management tasks
• Automate transition and expiration with lifecycle policies
• Replicate across regions with cross-region replication
• Build trigger-based workflows with event notifications
• Easily recover from accidental deletes with versioning
24. Topics
Pick the right storage class for your use cases
Automate management tasks
Best practices to optimize S3 performance
• Tools to help you manage storage
• Storage migration, tiering, bursting
25. Faster upload of large objects
Best practice: parallelize PUTs with multipart upload
• Increase aggregate throughput by parallelizing PUTs on high-bandwidth networks
• Move the bottleneck to the network, where it belongs
• Increase resiliency to network errors; fewer large restarts on error-prone networks
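boto3's transfer manager does the parallel multipart upload automatically once an object crosses a size threshold. In this sketch the 64 MB thresholds and concurrency are hypothetical tuning choices:

```python
# Sketch: multipart parallel upload via boto3's transfer manager. The
# thresholds, concurrency, and file/bucket names are hypothetical choices.
import math

def part_count(size_bytes, chunk_bytes=64 * 1024 * 1024):
    """Number of parts a multipart upload will use for a given object size."""
    return max(1, math.ceil(size_bytes / chunk_bytes))

def upload(path, bucket, key):
    import boto3  # requires AWS credentials; not executed here
    from boto3.s3.transfer import TransferConfig
    config = TransferConfig(
        multipart_threshold=64 * 1024 * 1024,  # use multipart above 64 MB
        multipart_chunksize=64 * 1024 * 1024,  # 64 MB parts
        max_concurrency=10,                    # 10 parallel PUTs
    )
    boto3.client("s3").upload_file(path, bucket, key, Config=config)
```

Failed parts are retried individually, which is where the "fewer large restarts" benefit comes from.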
26. Faster download
You can parallelize GETs as well as PUTs

GET /example-object HTTP/1.1
Host: example-bucket.s3.amazonaws.com
x-amz-date: Fri, 28 Jan 2016 21:32:02 GMT
Range: bytes=0-9
Authorization: AWS AKIAIOSFODNN7EXAMPLE:Yxg83MZaEgh3OZ3l0rLo5RTX11o=

For large objects, use range-based GETs and align your GET ranges with your multipart parts
For content distribution, enable Amazon CloudFront
• Caches objects at the edge
• Low-latency data transfer to end users
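Building the `Range` headers and fetching them concurrently can be sketched as follows; the range splitter is pure Python, while the download helper (with hypothetical bucket/key) uses boto3:

```python
# Sketch: parallel range-based GETs. byte_ranges is pure; parallel_get
# (hypothetical bucket/key) fetches the ranges concurrently with boto3.
from concurrent.futures import ThreadPoolExecutor

def byte_ranges(size, part=8 * 1024 * 1024):
    """Split an object of `size` bytes into HTTP Range header values."""
    return [f"bytes={start}-{min(start + part, size) - 1}"
            for start in range(0, size, part)]

def parallel_get(bucket, key, size):
    import boto3  # requires AWS credentials; not executed here
    s3 = boto3.client("s3")

    def fetch(rng):
        return s3.get_object(Bucket=bucket, Key=key, Range=rng)["Body"].read()

    with ThreadPoolExecutor(max_workers=8) as pool:
        return b"".join(pool.map(fetch, byte_ranges(size)))
```

Matching `part` to the multipart chunk size used at upload time keeps each GET aligned with a stored part, per the best practice above.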
28. Distributing key names
Add randomness to the beginning of the key name with a hash or reversed timestamp (ssmmhhddmmyy):
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
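Key names like those above can be generated by prepending a short hash of the object name, as in this sketch (the date-based file names mirror the slide's examples; the 8-character width is a hypothetical choice):

```python
# Sketch: prepend a short hash so lexically similar key names spread
# across the key space. The 8-character prefix width is hypothetical.
import hashlib

def distributed_key(name, width=8):
    """Derive a fixed-width hash prefix for an object name, so sequential
    or date-based names do not cluster in one key-space partition."""
    prefix = hashlib.md5(name.encode("utf-8")).hexdigest()[:width]
    return f"{prefix}-{name}"
```

The prefix is deterministic, so the full key can always be recomputed from the original name at read time.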
29. Best practices – performance
• Faster upload over long distances with S3 Transfer Acceleration
• Faster upload for large objects with S3 multipart upload
• Optimize GET performance with range GETs and CloudFront
• SQL query on S3 with Athena
• Distribute key names for high-TPS workloads
• TCP window scaling for long, fat networks
• TCP SACK for fast, lossy connections like mobile
30. Topics
Pick the right storage class for your use cases
Automate management tasks
Best practices to optimize S3 performance
Tools to help you manage storage
• Storage migration, tiering, bursting
31. Organize your data with object tags
Manage data based on what it is, as opposed to where it's located
• Classify your data with up to 10 tags per object
• Tag your objects with key-value pairs
• Write policies once based on the type of data
• Put objects with tags, or add tags to existing objects
• Tags integrate with storage metrics & analytics, lifecycle policies, and access control
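Tagging an existing object can be sketched with boto3's `put_object_tagging`; the tag keys/values and names below are hypothetical:

```python
# Sketch: tag an existing object. Tag keys/values and names are hypothetical.
def tag_set(**tags):
    """Convert keyword arguments into the TagSet shape S3 expects
    (S3 allows at most 10 tags per object)."""
    assert len(tags) <= 10
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}

def tag_object(bucket, key):
    import boto3  # requires AWS credentials; not executed here
    boto3.client("s3").put_object_tagging(
        Bucket=bucket, Key=key,
        Tagging=tag_set(project="phoenix", classification="confidential"),
    )
```

New objects can carry tags from the start by passing a `Tagging="project=phoenix"` query-string argument to `put_object`.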
33. Manage access: restrict deletes
Best practice
• Bucket policies can restrict deletes
• For additional security, enable MFA (multi-factor authentication) delete, which requires additional authentication to:
• Change the versioning state of your bucket
• Permanently delete an object version
• MFA delete requires both your security credentials and a code from an approved authentication device
34. Audit and monitor access
AWS CloudTrail data events
Use cases:
• Perform security analysis
• Meet your IT auditing and compliance needs
• Take immediate action on activity
How it works:
• Capture S3 object-level requests
• Enable at the bucket level
• Logs delivered to your S3 bucket
• $0.10 per 100,000 data events
35. Monitor performance and operation
Amazon CloudWatch metrics for S3
• Generate metrics for data of your choice
• Entire bucket, prefixes, and tags
• Up to 1,000 groups per bucket
• 1-minute CloudWatch metrics
• Alert and alarm on metrics
• $0.30 per metric per month
36. CloudWatch Metrics for S3

Metric Name          Value
AllRequests          Count
PutRequests          Count
GetRequests          Count
ListRequests         Count
DeleteRequests       Count
HeadRequests         Count
PostRequests         Count
BytesDownloaded      MB
BytesUploaded        MB
4xxErrors            Count
5xxErrors            Count
FirstByteLatency     ms
TotalRequestLatency  ms
38. S3 Inventory
• Save time compared to repeated LIST API calls
• Daily or weekly delivery
• CSV file output, delivered to an S3 bucket
• Use case: trigger business workflows and applications such as secondary index garbage collection, data auditing, and offline analytics
• More information about your objects than the LIST API provides, such as replication status, multipart upload flag, and delete marker
• Simple pricing: $0.0025 per million objects listed
39. S3 Inventory
Eventually consistent rolling snapshot
• New objects may not be listed
• Removed objects may still be included

Name                Value Type  Description
Bucket              String      Bucket name. UTF-8 encoded.
Key                 String      Object key name. UTF-8 encoded.
Version Id          String      Version ID of the object
Is Latest           boolean     true if the object is the latest (current) version of a versioned object, otherwise false
Delete Marker       boolean     true if the object is a delete marker of a versioned object, otherwise false
Size                long        Object size in bytes
Last Modified       String      Last modified timestamp. ISO format: YYYY-MM-DDTHH:mm:ss.SSSZ
ETag                String      ETag in hex-encoded format
StorageClass        String      Valid values: STANDARD, REDUCED_REDUNDANCY, GLACIER, STANDARD_IA. UTF-8 encoded.
Multipart Uploaded  boolean     true if the object was uploaded using multipart upload, otherwise false
Replication Status  String      Valid values: REPLICA, COMPLETED, PENDING, FAILED. UTF-8 encoded.

Validate before you act: use HEAD Object
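Scheduling an inventory report can be sketched with boto3's `put_bucket_inventory_configuration`; the bucket ARN, Id, and field selection below are hypothetical:

```python
# Sketch: schedule a weekly CSV inventory report covering all object
# versions. The destination bucket ARN and the Id are hypothetical.
def inventory_config(dest_bucket_arn):
    return {
        "Id": "weekly-all-versions",
        "IsEnabled": True,
        "IncludedObjectVersions": "All",
        "Schedule": {"Frequency": "Weekly"},
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass",
                           "ETag", "IsMultipartUploaded", "ReplicationStatus"],
        "Destination": {"S3BucketDestination": {
            "Bucket": dest_bucket_arn,
            "Format": "CSV",
        }},
    }

def enable_inventory(bucket):
    import boto3  # requires AWS credentials; not executed here
    boto3.client("s3").put_bucket_inventory_configuration(
        Bucket=bucket, Id="weekly-all-versions",
        InventoryConfiguration=inventory_config("arn:aws:s3:::inventory-dest"),
    )
```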
40. Pulling it all together
• SaaS security & compliance solution, built on AWS
• On-premises NAS migration to S3 and Amazon Glacier – multi-PB migration
• Performant and scalable multi-tenant storage – 1.7 PB per month, 4,000 customers
• Leverages new AWS storage features:
• S3 tagging for scalability and lifecycle management
• S3 analytics to understand data access patterns
• Cross-region replication
• Glacier expedited retrieval for multi-region availability
“We couldn’t have completed the migration, optimized performance, cost optimized via lifecycle & classified our (4000+) customers without using S3’s new analytics, tagging and lifecycle features.”
41. Topics
Pick the right storage class for your use cases
Automate management tasks
Best practices to optimize S3 performance
Tools to help you manage storage
Storage migration, tiering, bursting
42. Data has gravity and underpins all workloads
…it’s easier to move processing to the data
Workloads: 4K/8K video, genomics, seismic, financial, logs, IoT
• 1 PB over the Internet: 22 years
• 100 PB over T3: 609 years
• 100 PB over 1 Gbps DX: 27 years
43. Data transfer into Amazon S3
• AWS Direct Connect
• AWS Snowball, AWS Snowball Edge, and AWS Snowmobile
• ISV connectors
• Amazon Kinesis Firehose
• S3 Transfer Acceleration
• AWS Storage Gateway
44. Faster upload over long distances
S3 Transfer Acceleration
Uploader → AWS edge location → S3 bucket, with optimized throughput
• Change your endpoint, not your code
• No firewall changes or client software
• Longer distance, larger files, more benefit
• Faster or free
• Global edge locations
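"Change your endpoint, not your code" maps to two boto3 calls in practice: enable acceleration on the bucket, then build a client that targets the accelerate endpoint. A sketch, with a hypothetical bucket name:

```python
# Sketch: enable Transfer Acceleration on a bucket, then point the client
# at the accelerate endpoint. Bucket name is hypothetical.
def accelerate_config(status="Enabled"):
    assert status in ("Enabled", "Suspended")
    return {"Status": status}

def accelerated_client(bucket):
    import boto3  # requires AWS credentials; not executed here
    from botocore.config import Config
    boto3.client("s3").put_bucket_accelerate_configuration(
        Bucket=bucket, AccelerateConfiguration=accelerate_config()
    )
    # All subsequent API calls are unchanged; only the endpoint differs.
    return boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```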
45. How fast is S3 Transfer Acceleration?
Chart: time (in hours) to upload 500 GB to a bucket in Singapore over the public Internet vs. S3 Transfer Acceleration, measured from edge locations including Rio de Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, and Singapore
The longer the distance and the larger the file, the more the benefit
Try it at s3speedtest.com
46. How Snowball moves data into and out of AWS
Create a job (job created) → Snowball in transit to you → connect the Snowball (delivered to you) → copy data to the Snowball → Snowball delivered to AWS → your data moved to Amazon S3 (job completed)
47. AWS Snowball Edge
Petabyte-scale hybrid device with onboard compute and storage
• 100 TB local storage
• Local compute equivalent to an Amazon EC2 m4.4xlarge instance
• 10GBase-T, 10/25Gb SFP28, and 40Gb QSFP+ copper and optical networking
• Ruggedized and rack-mountable
RE:INVENT 2016 LAUNCH
48. Hybrid capabilities beyond data migration
Migration: create job → copy data → moved to S3
Collection: collect data → create job → copy data → moved to S3
49. Case Study: Oregon State University
Use case:
• Collect and analyze oceanic and coastal images
• 60 TB of data per week
• Environmental and ocean ecosystem research
Architecture before Snowball:
• Transferred data with many small hard drives
• Used to take weeks to months to upload data
• $4MM+ in infrastructure investment
• Expensive and inefficient
Snowball lets OSU migrate TBs of data in days at a fraction of the cost
51. AWS Storage Gateway hybrid storage solutions
Use standard storage protocols to access AWS storage services
On-premises files, volumes, and tapes connect through AWS Storage Gateway to Amazon S3, Amazon Glacier, and Amazon EBS snapshots in the AWS Cloud, integrated with AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), AWS CloudTrail, and Amazon CloudWatch
52. Enabling cloud workloads
Move data to AWS storage for big data, cloud bursting, or migration
“Storage Gateway has the promise to transform the way we move data into the cloud. The NFS interface lets us easily integrate data files from analytical instruments, and the transparent S3 storage lets us easily connect our cloud-based applications and leverage the powerful storage capabilities of S3. With Storage Gateway, we can now unleash the full power of AWS on our instrument data.”
53. Backup, archive, and disaster recovery
Cost effective storage in AWS with local or cloud restore
“Tapes are a headache, prone to hardware failures, offsite storage costs, and constant maintenance needs. Storage Gateway provided the most cost-effective and simple alternative. We even got disaster recovery by using a bicoastal data center.”
54. Amazon Storage Partner Solutions
aws.amazon.com/backup-recovery/partner-solutions/
Note: represents a sample of storage partners
• Primary Storage – solutions that leverage file, block, object, and streamed data formats as an extension to on-premises storage
• Backup and Recovery – solutions that leverage Amazon S3 for durable data backup
• Archive – solutions that leverage Amazon Glacier for durable and cost-effective long-term data backup