In this session, we cover some of the recently announced features. We then talk about using S3 event sources, the various S3 storage classes, cross-region replication, and VPC endpoints.
2. New for 2015…
- Cross-region replication
- Amazon S3 Standard-IA
- AWS CloudTrail support for Amazon S3
- Amazon CloudWatch metrics for Amazon S3
- VPC endpoint for Amazon S3
- Amazon S3 bucket limit increase
- Event notifications
- Read-after-write consistency in all regions
6. VidShare video sharing service (diagram): an S3 event triggers processing that creates a thumbnail, updates the index, and updates the web app; metadata, thumbnails, and logs are stored alongside the clips.
7. Amazon S3 event notifications
Events delivered to an SNS topic, an SQS queue, or a Lambda function
• Notification when objects are created (via PUT, POST, COPY, or Multipart Upload) or deleted (DELETE)
• Filtering on prefixes and suffixes for all types of notifications
Fast · Integrated · Simple
9. Optimizing VidShare – roadmap
Launch
V2: Optimize on cost
V3: Expand globally
V4: Enterprise enablement
(Illustration: VidShare access frequency declines over time, from T to T+365 days.)
10. Choice of storage classes on Amazon S3
Standard (active data) · Standard - Infrequent Access (infrequently accessed data) · Amazon Glacier (archive data)
11. Standard-Infrequent Access storage – infrequently accessed data
Durable: 11 9’s of durability
Available: designed for 99.9% availability
High performance: same throughput as Amazon S3 Standard storage
Secure:
• Server-side encryption
• Use your own encryption keys
• KMS-managed encryption keys
Integrated:
• Lifecycle management
• Versioning
• Event notifications
• Metrics
Easy to use:
• No impact on user experience
• Simple REST API
• Single bucket
12. Standard-Infrequent Access storage – integrated with lifecycle management
- Transition Standard to Standard-IA
- Transition Standard-IA to Amazon Glacier storage
- Expiration lifecycle policy
- Versioning support
13. Save money on VidShare
Lifecycle policy
Standard Storage -> Standard-IA
<LifecycleConfiguration>
  <Rule>
    <ID>sample-rule</ID>
    <Prefix>documents/</Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>STANDARD_IA</StorageClass>
    </Transition>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
19. Amazon S3 cross-region replication
Automated, fast, and reliable asynchronous replication of data across AWS regions
Secure: remote replicas managed by separate AWS accounts
Lower latency: distribute data to regional customers
Compliance: store data hundreds of miles apart
22. Using Amazon S3 with VPC endpoints – Previously…
(Diagram: PUTs to mybucket traverse the Internet, through either an Internet gateway or a NAT instance.)
• Public IP on EC2 instances and IGW
• Private IP on EC2 instances and NAT
Amazon S3 VPC endpoints
Access Amazon S3 from your Amazon VPC using VPC endpoints
24. Amazon S3 VPC endpoints – benefits
High performance: improved throughput from VPC resources to Amazon S3; high availability
Lower cost: reduce cost by avoiding expensive NAT and Internet gateways
Simple: simple to set up, no need to manage NATs and Internet gateways
Secure: improved security, no need to route traffic through the Internet
25. Using Amazon S3 VPC endpoints
• Control access to buckets from specific Amazon VPC endpoints or specific VPCs
• Control which VPCs or VPC endpoints have access to your S3 buckets by using S3 bucket policies
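As a sketch of that second point, here is what such a bucket policy might look like, built as a Python dict so it serializes to the JSON S3 expects. The bucket name and VPC endpoint ID are hypothetical placeholders, not values from the talk:

```python
import json

# Sketch of an S3 bucket policy that denies all access unless requests
# arrive through one specific VPC endpoint. Names/IDs are placeholders.
VPCE_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Access-to-specific-VPCE-only",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::example-vidshare-bucket",
                "arn:aws:s3:::example-vidshare-bucket/*",
            ],
            # aws:sourceVpce identifies the VPC endpoint a request came through
            "Condition": {"StringNotEquals": {"aws:sourceVpce": "vpce-1a2b3c4d"}},
        }
    ],
}

print(json.dumps(VPCE_ONLY_POLICY, indent=2))
```

A Deny with `StringNotEquals` is used rather than an Allow so that requests arriving over any other path (Internet gateway, NAT) are rejected regardless of other grants.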
26. Amazon S3 audit logs
Demonstrate compliance, improve security
Log Amazon S3 API calls using AWS CloudTrail
Track bucket-level operations:
• Creation and deletion of buckets
• Changes to access control, lifecycle policy, cross-region replication policy, etc.
Integrated with Amazon CloudWatch
• Alarm if a specific API is called
Configure once per AWS account
• Track multiple services with AWS CloudTrail
27. Amazon S3 storage metrics
Understand your Amazon S3 buckets
Amazon CloudWatch metrics for Amazon S3
Bucket-level metrics include:
• Total bytes for Standard storage
• Total bytes for Standard-IA storage
• Total bytes for Reduced-Redundancy storage
• Total number of objects for a given S3 bucket
Alarm on S3 metrics
• Set thresholds for alarms
Daily metrics
• Metrics emitted daily, after midnight GMT
We have had a busy year…
- As of July: 350 significant services and features.
- In S3, our goal is to provide the right tools and capabilities to help you get the most out of your data on Amazon S3.
For this session today, we will focus on some of the key new capabilities that we have launched in Amazon S3 this year.
All of these are in ADDITION to our continued focus on the core fundamentals of:
- High security
- Durability
- Availability and performance
Standard-IA:
- To that end, just a few weeks ago we launched a new LOW COST storage class on Amazon S3.
Designed for data that is accessed infrequently, it is called “Standard-Infrequent Access”.
This new storage class offers the same great durability and performance as the Amazon S3 Standard storage class, with slightly lower availability…
Ideal for workloads that are “COLDER” and less frequently accessed.
Notifications:
Initiate processing on the objects as they arrive;
Capture information about the objects and
Log it for tracking or security purposes.
These customers have been asking for a reliable and scalable way to be notified when an S3 object is created or overwritten.
Last year, we introduced the ability to trigger event notifications when a new object is added to an Amazon S3 bucket.
Building further on this capability, we added the ability to trigger event notifications when objects are deleted from Amazon S3 buckets.
Delete event notifications can be used to add logic within your application or with AWS Lambda to build triggers like a script to clean up associated assets or maintain a separate index of your Amazon S3 objects.
We also added the ability to configure Amazon S3 buckets to
selectively provide event notifications based on object name prefixes and suffixes.
For example, you can choose to receive notifications on object names that start with "images/."
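Expressed as the structure that boto3’s `put_bucket_notification_configuration` accepts, a prefix/suffix filter of this kind might look like the following sketch; the function ARN, account ID, and filter values are illustrative assumptions:

```python
# Sketch: notification configuration that invokes a Lambda function only
# for newly created .jpg objects under the images/ prefix.
# The ARN and names below are hypothetical placeholders.
notification_config = {
    "LambdaFunctionConfigurations": [
        {
            "Id": "thumbnail-on-new-image",
            "LambdaFunctionArn": (
                "arn:aws:lambda:us-east-1:123456789012:function:make-thumbnail"
            ),
            # fire on any object-creation API (PUT, POST, COPY, multipart)
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {"Name": "prefix", "Value": "images/"},
                        {"Name": "suffix", "Value": ".jpg"},
                    ]
                }
            },
        }
    ]
}
```

The same `Filter` block applies to SNS topic and SQS queue configurations as well.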
Cross-Region Replication:
Keep your data hundreds of miles apart for compliance and regulatory purposes, OR
move your data closer to your end users.
Cross-Region Replication automatically replicates every object uploaded to a particular S3 bucket into a designated destination bucket located in a different AWS region.
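As an illustrative sketch (not the exact VidShare setup), a replication configuration in the shape boto3’s `put_bucket_replication` accepts might look like this; the IAM role ARN and bucket names are hypothetical:

```python
# Sketch of a cross-region replication configuration: replicate every
# object from the source bucket to a bucket in another region.
# Role ARN and bucket names are hypothetical placeholders.
replication = {
    # IAM role S3 assumes to replicate on your behalf
    "Role": "arn:aws:iam::123456789012:role/vidshare-replication-role",
    "Rules": [
        {
            "ID": "replicate-to-asia",
            "Prefix": "",  # empty prefix = replicate every object
            "Status": "Enabled",
            "Destination": {
                # destination bucket must live in a different AWS region
                "Bucket": "arn:aws:s3:::example-vidshare-ap-northeast-1",
                "StorageClass": "STANDARD",
            },
        }
    ],
}
```

Note that cross-region replication also requires versioning to be enabled on both the source and destination buckets.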
VPC Endpoints for S3:
We will also talk about VPC endpoints for Amazon S3. With VPC endpoints for Amazon S3, you no longer need to use an Internet gateway or manage NAT instances to establish connectivity from within your VPC to Amazon S3.
VPC endpoints provide an
easy-to-configure,
reliable, and
secure connection to S3 that does not require a gateway or NAT instances.
A plug on Amazon S3 growth:
Amazon S3 is growing continuously. We regularly peak at millions of requests per second and store trillions of objects. Given our scale, one of the things we think about is
HOW CAN WE HELP OUR CUSTOMERS MANAGE the billions and billions of objects they have on S3!
“How do we help you, our customer, GET MORE OUT OF THE DATA ON AMAZON S3?”
CloudWatch:
One key component of managing your data is understanding WHAT data you have on the platform to begin with, and HOW that data is being used.
We introduced new storage metrics for Amazon CloudWatch. These free metrics help you understand how your usage of S3 is changing over time. CloudWatch also lets you set alarms on these metrics to get alerts as usage changes.
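As a sketch, fetching one of these daily metrics could use boto3’s `get_metric_statistics`; the request parameters below show the shape, with a hypothetical bucket name and time window:

```python
import datetime

# Sketch: parameters for reading the daily BucketSizeBytes metric that S3
# publishes to CloudWatch. Bucket name and dates are hypothetical.
params = {
    "Namespace": "AWS/S3",
    "MetricName": "BucketSizeBytes",
    "Dimensions": [
        {"Name": "BucketName", "Value": "example-vidshare-bucket"},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    "StartTime": datetime.datetime(2015, 10, 1),
    "EndTime": datetime.datetime(2015, 10, 8),
    "Period": 86400,  # one datapoint per day, since metrics are emitted daily
    "Statistics": ["Average"],
}

# With credentials configured, the call would be:
# import boto3
# boto3.client("cloudwatch").get_metric_statistics(**params)
```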
CloudTrail:
We also introduced the ability to track API calls made to your Amazon S3 buckets using AWS CloudTrail. You can use CloudTrail logs to demonstrate compliance and improve the security of your S3 buckets.
Bucket limit increase:
- Customers such as enterprises, Software-as-a-Service providers, and web hosting providers told us that in SOME use cases, separating customers by bucket enables them to:
- Easily manage billing for their customers by scaling the number of buckets in their AWS accounts.
- Enterprises are moving thousands of applications to AWS.
- Separate departments by bucket, for example, for easier billing and management.
- Make administration easier and simpler; for example, I may want to set up different lifecycle policies for different customers.
We now allow you to request an increase beyond your default bucket limit of 100. Simply open a support ticket at the AWS Support Center page to increase your bucket limit.
Read-After-Write consistency for all regions:
S3 is a distributed system, and S3 is eventually consistent:
a change committed to S3 will EVENTUALLY be visible to all clients.
Example: delete and update operations are eventually consistent.
New objects now get read-after-write consistency in all regions AND all ENDPOINTS.
Big data workload example
We will be diving deep into each of the new capabilities we just saw. HOWEVER, instead of just talking through the list of capabilities, Baz and I figured it might be fun to see how we can leverage these capabilities in a real-world example.
So we have decided to leverage Amazon S3 to build a new product called VidShare. VidShare allows friends and families to share short video clips.
We think CUSTOMER EXPERIENCE is critical to the success of our application so we want to ensure our customers see low latency while uploading and playing back videos.
- We want to create thumbnails for clips that our customers can use to play back videos
We want to ensure our design scales to millions of customers.
<MORE…>
We think there's a lot of potential for VidShare and we have identified multiple customer segments.
For launch we want to:
- Ensure we can scale to millions of customers
We decided to focus on the US market at launch and we want to provide our target customers with the best possible user experience.
As part of the initial launch, we also want to create thumbnails for all video clips so that customers can discover and play back videos.
Once we have shipped the initial version, we expect to scale quickly, so we will optimize our product to reduce operating costs.
We will then shift our focus to global expansion. While staying true to our customer-first mission, we will ensure our global customers get the best possible experience by moving their data closer to them.
Finally, we will expand into a new market segment by making VidShare enterprise-ready.
Now that we know our roadmap and what we want to deliver at launch, let’s talk design…
As I mentioned earlier, the experience we want here is that a customer records a video, uploads it to VidShare, and shares it with another user of the service. We want to create a thumbnail for each video so that other users can identify the video and play it back.
Here’s how this works…
When a customer records and uploads a video clip, VidShare uploads the clip to an Amazon S3 bucket.
We will configure event notifications to trigger a Lambda function to process this new clip.
The Lambda function will:
1) Create the thumbnail and add the thumbnail to a different prefix within the bucket.
2) Update a DynamoDB index, which we will use to map the key for the raw video clip to its respective thumbnail.
3) Update the VidShare application to indicate progress.
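A minimal sketch of such a Lambda handler is below. The thumbnail prefix and key-mapping convention are hypothetical, and the actual image rendering, DynamoDB index update, and web-app callback are deliberately elided:

```python
import os
import urllib.parse

THUMB_PREFIX = "thumbnails/"  # hypothetical prefix for generated thumbnails

def thumbnail_key(video_key):
    """Map a raw video key like 'videos/cat.mp4' to its thumbnail key."""
    base, _ext = os.path.splitext(os.path.basename(video_key))
    return f"{THUMB_PREFIX}{base}.jpg"

def handler(event, context=None):
    """Sketch of the VidShare handler: for each S3 record in the event,
    work out where the thumbnail would go. Rendering the thumbnail,
    updating the DynamoDB index, and notifying the web app are elided."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes keys in event payloads, so decode before use
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        results.append((bucket, key, thumbnail_key(key)))
    return results
```

The handler reads the standard S3 event payload shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`), which is what S3 delivers to Lambda for object-created notifications.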
So let’s take a step back to talk about the event notifications we plan to use for VidShare…
SNS – push, email, and mobile alerts.
SQS – a good choice for triggering workflows that pull from a queue.
Lambda – a new service (in preview at launch): bring your code and run it in the cloud with zero administration, no instances to manage, and automatic scaling.
Let’s talk about the benefits, or what’s in it for you:
1st is simplicity – previously, to react to changes you had to proxy requests or poll the bucket, with fleets to manage; that was not particularly efficient and added delays. Notifications make it simple: you focus on how your application reacts.
2nd is speed – if you need processing to occur quickly when new objects arrive, on average notifications are sent in under one second.
3rd is integration – notifications are building blocks, a new way to connect storage in S3 with workflows. They emphasize the concept of event-based compute: you can architect applications in a new way, where blocks of code or workflows are invoked by changes in data. They are also a new way to extend existing applications by attaching new functionality driven by events.
Thanks Baz.
Now that we have used event notifications to trigger our Lambda function to generate and clean up thumbnails, let’s check back in with our roadmap.
So VidShare is going well; we have been growing steadily and now have over 1 PB of storage, and growing!
Let’s focus our attention on saving cost without having to compromise on our end-user experience. We want to ensure our users continue to see the same great low-latency experience we know and love about VidShare. Ideally, we also want to minimize any code changes so that we can keep our focus on feature requests from our customers.
After a bit of usage analysis, we have discovered that as clips get older, they are not watched nearly as frequently as newer videos. In fact, our usage data suggests that clips older than 30 days are watched less than once a month on average.
This is where the new storage class, Standard-Infrequent Access, can help us.
If you think about the typical lifecycle of data, newly created, active data is accessed very frequently.
In our example, take a new video clip you share with your friends and family. People will consume this new data actively: the new video will be played back, shared, and commented on very frequently.
As the video becomes older, a smaller number of people will engage with it; it will be LESS FREQUENTLY accessed.
S3 Standard-IA is a new LOW COST storage class on Amazon S3 that we launched just a few weeks ago. Designed for data that is accessed infrequently, it is called “Standard-Infrequent Access”. This new storage class offers the same great durability and performance as the Amazon S3 Standard storage class, with slightly lower availability: ideal for workloads that are “COLDER” and less frequently accessed.
If you don’t want to think about your data access patterns but just want high durability, availability, and performance from Amazon S3, you can simply select S3 Standard.
For data that is less frequently accessed, you can leverage Amazon S3 Standard-IA to save on cost while still benefiting from the same great durability and performance as S3 Standard.
At some point in time, your data will be ready to be archived because no one is actively interacting with it, and you need to archive it away for record keeping, etc.
In addition to transitioning your data to Standard-IA as its characteristics change, you can also use Amazon S3 Standard-IA for new data that fits the bill for infrequent access. For example, you can use the Standard-IA storage class to store detailed application logs that you analyze infrequently, and save on storage cost.
Point out retry success for availability…
Explain that availability is slightly lower while durability stays the same high level…
…Add how the policy is applied via the PUT API…
<NoncurrentVersionTransition>
  <NoncurrentDays>30</NoncurrentDays>
  <StorageClass>GLACIER</StorageClass>
</NoncurrentVersionTransition>
<NoncurrentVersionExpiration>
  <NoncurrentDays>180</NoncurrentDays>
</NoncurrentVersionExpiration>
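The same rules can also be applied programmatically. A sketch in boto3 terms follows; the bucket name is hypothetical, and the actual API call is shown commented out because it needs AWS credentials:

```python
# Sketch: the lifecycle rules from the slides expressed in the dict shape
# that boto3's put_bucket_lifecycle_configuration accepts.
lifecycle = {
    "Rules": [
        {
            "ID": "sample-rule",
            "Prefix": "documents/",
            "Status": "Enabled",
            # current versions: Standard -> Standard-IA at 30 days,
            # then -> Glacier at 365 days
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
            # noncurrent (old) versions: -> Glacier at 30 days,
            # expire at 180 days
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
        }
    ]
}

# Applying it would look like this (requires credentials, so not run here):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-vidshare-bucket", LifecycleConfiguration=lifecycle
# )
```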
How does the application handle the transition to Glacier?
Assume each video is 10 MB (0.01 GB), we have 100 million videos = 1 PB of storage
Includes GET request and retrieval cost
Lifecycle transition requests into Standard – Infrequent Access
Data retrievals
Minimum object size 128 KB
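A back-of-the-envelope sketch of the cost math above; all prices and access fractions here are assumed placeholders for illustration, not actual AWS pricing:

```python
# Worked example for the 1 PB VidShare estimate. Prices and usage
# fractions are hypothetical; substitute real rates before relying on this.
GB_PER_VIDEO = 0.01          # 10 MB per video
NUM_VIDEOS = 100_000_000     # 100 million videos
TOTAL_GB = GB_PER_VIDEO * NUM_VIDEOS  # ~1,000,000 GB = 1 PB

PRICE_STANDARD = 0.03        # $/GB-month (assumed)
PRICE_STANDARD_IA = 0.0125   # $/GB-month (assumed)
PRICE_IA_RETRIEVAL = 0.01    # $/GB retrieved (assumed)

ia_fraction = 0.8            # assume 80% of clips are older than 30 days
ia_gb = TOTAL_GB * ia_fraction
retrieved_gb = ia_gb * 0.1   # assume 10% of cold data is read each month

standard_only = TOTAL_GB * PRICE_STANDARD
with_ia = ((TOTAL_GB - ia_gb) * PRICE_STANDARD
           + ia_gb * PRICE_STANDARD_IA
           + retrieved_gb * PRICE_IA_RETRIEVAL)

print(f"Standard only:            ${standard_only:,.0f}/month")
print(f"With Standard-IA tiering: ${with_ia:,.0f}/month")
```

Even with retrieval charges included, tiering the cold 80% into Standard-IA roughly halves the monthly storage bill under these assumptions; the 128 KB minimum object size is irrelevant here since each video is 10 MB.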
We are happy with the growth we are seeing. Our analysis of VidShare’s usage patterns tells us that a large part of our customer base shares videos back and forth with friends and family in Asia.
We want to ensure the best possible user experience for these customers; let’s see how Cross-Region Replication can help us there.
Even though S3 provides 11 9’s of durability out of a single AWS region, some of our customers asked us to automate replication of objects between regions to help them achieve their compliance objectives, lower latency, and enhance access security.
Low latency: in certain use cases where the volume of data delivered isn’t high enough to benefit from use of a CDN, simply replicating your data closer to your end users can provide an improved low-latency experience.
Compliance: some of our customers were replicating their data across different regions to meet internal compliance and best-practice guidelines that required them to keep data hundreds of miles apart.
Security: Many customers told us they plan to use this feature to enhance access security by replicating data between buckets with separate owners.
With these couple of improvements, the launch of VidShare v1 was a success.
(Note: remove specific days from the timeline, just show elapsed time, and add a note that this is an illustration.)
You can use the Virtual Private Cloud to create a logically isolated section of the AWS Cloud, with full control over a virtual network that you define.
With VPC endpoints:
No need to manage Internet gateways or NAT instances.
We simplified access to S3 resources from within a VPC by introducing the concept of a VPC Endpoint. These endpoints are easy to configure, highly reliable, and provide a secure connection to S3 that does not require a gateway or NAT instances.
EC2 instances running in private subnets of a VPC can now have controlled access to S3 buckets, objects, and API functions that are in the same region as the VPC.
EndPoints:
A VPC endpoint enables you to create a private connection between your VPC and another AWS service without requiring access over the Internet, through a NAT instance, a VPN connection, or AWS Direct Connect. Endpoints are virtual devices. They are horizontally scaled, redundant, and highly available VPC components that allow communication between instances in your VPC and AWS services without imposing availability risks or bandwidth constraints on your network traffic.
NAT:
You can optionally use a network address translation (NAT) instance in a public subnet in your VPC to enable instances in the private subnet to initiate outbound traffic to the Internet, but prevent the instances from receiving inbound traffic initiated by someone on the Internet.
Internet Gateways:
An Internet gateway is a horizontally scaled, redundant, and highly available VPC component that allows communication between instances in your VPC and the Internet.
An Internet gateway serves two purposes:
To provide a target in your VPC route tables for Internet-routable traffic, and
To perform network address translation (NAT) for instances that have been assigned public IP addresses.
Let’s talk about the benefits, or what’s in it for you:
1st is simple – there is no need to configure and manage NATs and Internet gateways; VPC endpoints are quick and easy to set up.
2nd is performance and high availability – previously, if you wanted to ensure highly available connectivity to S3 from within your VPC, you needed to configure and manage multiple Internet gateways and NAT instances.
There is no additional charge for using Amazon Virtual Private Cloud, aside from the normal Amazon EC2 usage charges (e.g., $0.05 per VPN connection-hour).
There is no additional charge for using endpoints.
No action is needed to turn this on if you’re already using CloudTrail.
We’ve heard from customers that they’d like better insight into S3 capacity usage and performance. As a first stage, we recently started collecting and presenting capacity usage metrics in CloudWatch, AWS’s centralized monitoring service. This allows you to view S3 metrics in the same application where you monitor other services, with the same consistent interface, and to set alarms when certain usage thresholds are exceeded. The S3 CloudWatch metrics currently include total bytes for both the Standard and RRS storage classes, as well as the total number of objects. These totals are updated daily. You can expect us to add additional metrics over time and improve the level of granularity.
The metrics are currently available only in the CloudWatch console, updated on a daily basis for each of your buckets, and you can set alarms on them.