This presentation describes the basics of the Amazon S3 service. S3 provides secure, durable, highly scalable object-level storage for an unlimited amount of data, with individual objects ranging in size from 0 bytes to 5 TB. By default, data is stored across multiple redundant facilities (Availability Zones), and S3 is designed to keep providing access to your files even if two AZs are lost concurrently.
2. HELLO!
I am Vikas Arora
I love to share the knowledge I have gained
You can find me at:
@devopstechie
linkedin.com/in/devopstechie
devopstechie@outlook.com
3. AGENDA
▰ S3 101
▰ What is S3?
▰ Fundamental terminology
▰ Storage classes
▰ Lifecycle Policies
▰ Versioning and CRR
▰ Security and Encryption
▰ S3 Select
4. Amazon S3
▰Secure, durable, highly-scalable object level storage
▰Accessible via a simple web service interface
▰Store and retrieve any amount of data
▰Use alone or together with other AWS services
5. Fundamental terminology
Bucket
▰ Base building block
▰ Container for objects stored in S3
Object
▰ Fundamental entity stored in an S3 bucket
▰ Consists of data and metadata
▰ Identified using a key and version ID
(Figure: Bucket with objects)
6. STORAGE CLASSES
S3 Standard
• Frequent data access
• Low latency important
Standard-IA
• Data infrequently accessed
• Low latency important
• Cheaper than S3 Standard
• Retrieval fee applied
One Zone-IA
• Data kept in a single AZ (no multi-AZ redundancy)
• Low latency
• Retrieval fee applied
S3 RRS
• Non-critical data
• Easily reproducible
Glacier
• Long-term archival solution
• Retrieval requires 3-5 hours
• Cheapest storage solution
Continued…
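The slide's rules of thumb can be encoded as a small decision function. This is a hypothetical sketch: the `DataProfile` fields and the order of the checks are assumptions, not an AWS API; only the returned strings are real S3 `StorageClass` values. RRS is deliberately never returned, since the speaker notes describe it as deprecated.

```python
from dataclasses import dataclass

@dataclass
class DataProfile:
    """Hypothetical description of a dataset (not an AWS type)."""
    frequent_access: bool       # is the data read often?
    needs_low_latency: bool     # must reads return in milliseconds?
    easily_reproducible: bool   # could it be recreated if a zone were lost?

def suggest_storage_class(p: DataProfile) -> str:
    """Apply the slide's rules of thumb; returns an S3 StorageClass value."""
    if not p.needs_low_latency:
        return "GLACIER"        # long-term archive, retrieval takes hours
    if p.frequent_access:
        return "STANDARD"       # RRS skipped: the notes call it deprecated
    if p.easily_reproducible:
        return "ONEZONE_IA"     # single AZ, so keep only data you can recreate
    return "STANDARD_IA"        # infrequent access, multi-AZ, retrieval fee

print(suggest_storage_class(DataProfile(False, True, False)))  # STANDARD_IA
```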
10. SECURITY
IAM
• Restrict access to S3 at user, group and role level
Bucket Policy
• Incorporate restrictions without using IAM policies
ACL
• Grant basic read/write access to other AWS accounts
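As an example of a bucket policy restriction that needs no IAM policy, the sketch below builds a policy that denies any request not sent over TLS, using the `aws:SecureTransport` condition key. The bucket name `example-bucket` is hypothetical; the boto3 call is shown only in a comment so the snippet runs without AWS credentials.

```python
import json

def deny_insecure_transport_policy(bucket: str) -> dict:
    """Bucket policy rejecting every request that is not made over HTTPS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",  # every object in it
                ],
                # aws:SecureTransport is "false" for plain-HTTP requests
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }

policy_json = json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2)
# With boto3 this could be applied via:
#   boto3.client("s3").put_bucket_policy(Bucket="example-bucket", Policy=policy_json)
print(policy_json)
```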
12. S3 SELECT
Enables retrieving a subset of data from an object by using simple SQL expressions
Helps retrieve a small portion of an object from a large amount of structured data
Supports JSON and CSV file formats
Only supports SELECT … FROM queries with a WHERE condition
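S3 Select is invoked through the SelectObjectContent API. The sketch below builds keyword arguments for boto3's `select_object_content` for a hypothetical CSV object with `customer_id` and `country` columns, then emulates the filtering locally with the `csv` module so the example runs without AWS. The bucket, key and column names are assumptions, and the real call streams results back as an event stream rather than returning a list.

```python
import csv
import io

def select_request(bucket: str, key: str, country: str) -> dict:
    """Keyword arguments shaped for boto3's s3.select_object_content(**kwargs)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        # country is interpolated as-is here; validate/escape it in real code
        "Expression": f"SELECT s.customer_id FROM S3Object s WHERE s.country = '{country}'",
        # "USE" means the first CSV line is a header, so columns can be named
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }

def emulate_select(csv_text: str, country: str) -> list:
    """Local stand-in returning what the query above would select."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["customer_id"] for row in reader if row["country"] == country]

sample = "customer_id,country\n1,IN\n2,US\n3,IN\n"
print(select_request("example-bucket", "customers.csv", "IN")["Expression"])
print(emulate_select(sample, "IN"))  # ['1', '3']
```

Only the filtered rows travel back to the client, which is why S3 Select helps when a small slice of a large object is needed.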
13. TOPICS NOT COVERED
▰ Transfer Acceleration
▰ Cross-Origin Resource Sharing (CORS)
▰ Details of Encryption types
▰ Snowball
▰ Athena
So, here I am:
18 years in the corporate IT industry.
Currently working as a consulting AWS Solutions Architect with a US firm.
I love to share knowledge, so here I am, enthusiastically sharing what I know about S3 with you.
You can reach me with any queries through any of the methods mentioned.
For queries about professional assignments where I could help your organization, I have left my visiting card on the table; please help yourself.
We will cover two things:
S3 101, where we will cover the basics and some theoretical aspects of S3, such as:
What S3 is all about
What the fundamental building blocks of S3 are
The various storage classes it provides to suit your needs and budget
Lifecycle policies, akin to the SDLC in the software industry
How versioning helps prevent accidental deletion and manage multiple versions of the same document
How secure S3 is, and the various encryption mechanisms that safeguard your data from hackers
We will see demos of a few of these topics as well.
Last but not least, we will talk about a new S3 feature called S3 Select, which retrieves selected data from gigabytes of data stored in S3 in CSV or JSON format.
Need a good definition here to kick start the presentation
S3 is an object-based storage solution provided by Amazon. It is used to store pictures, web files, PDFs, and your Excel, Word or presentation files. You cannot install an operating system or a database on S3. By default, data is stored across multiple redundant facilities (Availability Zones), and S3 is designed to keep providing access to your files even if two AZs are lost concurrently.
Secure, durable, highly-scalable object level storage
Secure – Various mechanisms protect your data: SSL/TLS while data is in transit and encryption while data is at rest. You can further control who can access data using IAM, bucket policies and access control lists (ACLs).
Durable – Durability of eleven 9's (99.999999999%): if you store 10,000,000 objects, you can expect on average to lose a single object once every 10,000 years. Availability is 99.99%. Objects are stored redundantly across multiple facilities, so even if AWS loses one of its facilities, your data will still be available.
Highly scalable – S3 automatically scales to high request rates. For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
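Because the baseline request rates (at least 3,500 writes and 5,500 reads per second) apply per prefix, throughput grows by spreading keys across prefixes. This back-of-the-envelope sketch assumes the load divides evenly across prefixes, which real traffic may not; the function name is hypothetical.

```python
import math

PUT_RATE_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE requests per second
GET_RATE_PER_PREFIX = 5_500  # GET/HEAD requests per second

def prefixes_needed(target_puts_per_s: int, target_gets_per_s: int) -> int:
    """Minimum number of prefixes, assuming requests spread evenly across them."""
    return max(
        1,
        math.ceil(target_puts_per_s / PUT_RATE_PER_PREFIX),
        math.ceil(target_gets_per_s / GET_RATE_PER_PREFIX),
    )

print(prefixes_needed(10_000, 20_000))  # 4 (3 for the PUTs, 4 for the GETs)
```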
Accessible via a simple web service interface – Self-explanatory; we will see that as part of the demo.
Store and retrieve any amount of data
A virtually unlimited amount of data can be stored; individual objects range from a minimum of 0 bytes to a maximum of 5 TB.
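The 5 TB ceiling interacts with how objects are uploaded: a single PUT can carry at most 5 GB, so larger objects go through multipart upload, with parts of 5 MiB to 5 GiB (the last part may be smaller) and at most 10,000 parts. The planner below is a hypothetical helper, not part of any SDK, and the 8 MiB default part size is an assumption.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TiB  # largest object S3 stores
MIN_PART_SIZE = 5 * MiB    # minimum for every part except the last
MAX_PART_SIZE = 5 * GiB    # also the largest object a single PUT can upload
MAX_PARTS = 10_000

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on huge sizes."""
    return -(-a // b)

def plan_upload(size: int, part_size: int = 8 * MiB) -> tuple:
    """Return (method, part_size, part_count) for an object of `size` bytes."""
    if not 0 <= size <= MAX_OBJECT_SIZE:
        raise ValueError("S3 objects must be between 0 bytes and 5 TiB")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    if size <= part_size:
        return ("single PUT", size, 1)  # fits in one request (<= 5 GiB)
    # Grow the part size if the object would otherwise need > 10,000 parts.
    part_size = max(part_size, ceil_div(size, MAX_PARTS))
    return ("multipart", part_size, ceil_div(size, part_size))

print(plan_upload(100 * MiB))  # ('multipart', 8388608, 13)
```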
Use alone or together with other AWS services
It can be used as a standalone service to store and retrieve data through the web interface, or together with AWS services such as EC2 and CloudTrail, or with serverless technologies such as Lambda.
Objects of any kind are stored in buckets. A bucket is loosely equivalent to a folder on a hard drive, but it does not follow a hierarchical structure, because S3 is object storage rather than a file system.
Each object is stored with metadata like file type, size, date created etc.
Each object can be uniquely identified by its key and, if versioning is enabled, its version ID.
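To make the key/version ID distinction concrete, here is a toy in-memory model. It is a hypothetical teaching sketch, not how S3 is built: an unversioned bucket gives every object the version ID "null", so the key alone identifies it, while a versioned bucket assigns a fresh ID on every PUT and turns a delete into a delete marker. The random hex IDs are an assumption; real S3 version IDs are opaque strings.

```python
import uuid
from typing import Optional

class ToyBucket:
    """Teaching model of key + version ID addressing; not S3's implementation."""

    def __init__(self, versioning: bool = False):
        self.versioning = versioning
        # key -> list of (version_id, data), oldest first; data None = delete marker
        self._versions = {}

    def put(self, key: str, data: bytes) -> str:
        if not self.versioning:
            # Unversioned: the version ID is "null" and a PUT overwrites the object.
            self._versions[key] = [("null", data)]
            return "null"
        version_id = uuid.uuid4().hex
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key: str) -> None:
        if not self.versioning:
            self._versions.pop(key, None)  # the object is really gone
        else:
            # Versioned: only a delete marker is added; older versions survive.
            self._versions.setdefault(key, []).append((uuid.uuid4().hex, None))

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is None:
                raise KeyError(key)  # missing, or latest version is a delete marker
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id and data is not None:
                return data
        raise KeyError(f"{key} (version {version_id})")

bucket = ToyBucket(versioning=True)
first = bucket.put("report.pdf", b"draft")
bucket.put("report.pdf", b"final")
bucket.delete("report.pdf")             # an "accidental" delete
print(bucket.get("report.pdf", first))  # b'draft' is still recoverable
```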
4 types of storage classes:
S3 Standard – The default, for quick storage and retrieval of data. Can sustain the loss of two facilities concurrently.
IA – For less frequent but quick retrieval; 25-30% cheaper than Standard. AWS discourages frequent retrieval by applying a retrieval fee.
1Z-IA – Data is stored in only one zone. Cost is a further 20% less than IA.
RRS – Deprecated; AWS no longer recommends storing data in this class, but it is still available.
Glacier is not part of S3. It is an archival system, but it is tightly integrated with S3 and does not have a separate identity without S3.
Glacier – For long-term archival; retrieval normally takes 3-5 hours, though expedited retrieval is now available.
How do you uniquely identify an object when versioning is not enabled?
Then the key alone uniquely identifies the object.
How much cheaper is IA than Standard, 1Z-IA than IA, RRS than Standard or IA, and Glacier?
Roughly 20-25% at each step as you go from Standard → IA → Glacier.
1Z-IA – Eleven 9's durability and 99.5% availability; if the zone goes down, you lose access to your data until it is back.
RRS – Use only if the data can be recreated.
Difference between availability and durability:
Durability should address the question: "Will my data still be there in the future?"
whereas availability should address the question: "Can I access my data right now?"
Durability is the probability of not losing an object. Eleven 9's durability means that if you store 10,000,000 objects, you can expect on average to lose a single object once every 10,000 years.
RRS durability is 99.99%, an expected annual loss of 0.01% of objects; for a single object, that is roughly one loss every 10,000 years.
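The durability and availability figures can be made concrete with a little arithmetic. This sketch treats a durability of n nines as an independent per-object annual loss probability of 10^-n, a simplification of AWS's design target rather than a guarantee, and converts availability into expected downtime per 365-day year. The function names are hypothetical; exact fractions avoid float rounding.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year

def years_between_losses(num_objects: int, nines: int) -> Fraction:
    """Average years between single-object losses when storing num_objects.

    Durability of `nines` nines (11 -> 99.999999999%) is modeled as a
    per-object annual loss probability of 10**-nines.
    """
    expected_losses_per_year = num_objects * Fraction(1, 10 ** nines)
    return 1 / expected_losses_per_year

def downtime_minutes_per_year(availability_percent: str) -> Fraction:
    """Expected minutes per year during which data is unreachable."""
    unavailable_fraction = 1 - Fraction(availability_percent) / 100
    return unavailable_fraction * MINUTES_PER_YEAR

print(years_between_losses(10_000_000, 11))       # 10000 (eleven 9's, 10M objects)
print(years_between_losses(1, 4))                 # 10000 (RRS, 99.99%, one object)
print(float(downtime_minutes_per_year("99.99")))  # 52.56 (S3 Standard)
print(float(downtime_minutes_per_year("99.5")))   # 2628.0, about 43.8 hours (1Z-IA)
```

The contrast is the point of the slide: 1Z-IA keeps the same durability as Standard, but its lower availability translates into far more time per year when the data cannot be read.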
Transitions from one storage class to another can be governed by lifecycle rules. For example… (describe the diagram)
Minimum object size and storage duration in IA/1Z-IA: the minimum billable object size is 128 KB (also for RRS), and objects are billed for a minimum of 30 days of storage in both IA and 1Z-IA.
Is the duration measured from the object's creation date? Yes.
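A rule like the one on the diagram can be written as the configuration dict accepted by boto3's `put_bucket_lifecycle_configuration`. The `logs/` prefix, rule ID and day counts below are hypothetical. The two helper functions are also hypothetical: one reports which class an object should be in at a given age, counted from its creation date, and one checks S3's requirement that objects stay at least 30 days before moving to STANDARD_IA or ONEZONE_IA. S3 applies lifecycle actions asynchronously, so real transitions may lag the day count.

```python
from typing import Optional

# Hypothetical rule: objects under "logs/" move to Standard-IA after 30 days,
# to Glacier after 90 days, and are deleted after 365 days.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "archive-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}
# With boto3:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=LIFECYCLE_CONFIGURATION)

def storage_class_on_day(rule: dict, age_days: int) -> Optional[str]:
    """Class an object should be in age_days after creation; None once expired."""
    expiration = rule.get("Expiration", {}).get("Days")
    if expiration is not None and age_days >= expiration:
        return None
    current = "STANDARD"
    for t in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= t["Days"]:
            current = t["StorageClass"]
    return current

def check_ia_minimum(rule: dict) -> None:
    """Raise if a rule moves objects to an IA class earlier than 30 days."""
    for t in rule.get("Transitions", []):
        if t["StorageClass"] in ("STANDARD_IA", "ONEZONE_IA") and t["Days"] < 30:
            raise ValueError(f"{t['StorageClass']} transition must be >= 30 days")

rule = LIFECYCLE_CONFIGURATION["Rules"][0]
check_ia_minimum(rule)
print([storage_class_on_day(rule, d) for d in (0, 30, 90, 365)])
```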
If versioning is not enabled, the version ID is set to null.
CRR cannot be enabled unless versioning is enabled on both the source and destination buckets.
Why create an IAM role for CRR?
So that Amazon S3 can assume it to perform cross-region replication of objects on your behalf.
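The two CRR prerequisites can be sketched together: the trust policy that lets the S3 service assume the replication role, and a replication configuration that is only built when both buckets report versioning as "Enabled". The role ARN and bucket names are hypothetical, and the role would also need a permissions policy (reading from the source, replicating to the destination) that is not shown here. The boto3 call appears only in a comment.

```python
import json
from typing import Optional

# Trust policy that lets the S3 service assume the replication role.
S3_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

def replication_configuration(role_arn: str, dest_bucket: str,
                              source_status: Optional[str],
                              dest_status: Optional[str]) -> dict:
    """Build a CRR configuration; refuse unless both buckets have versioning on.

    *_status mirrors the "Status" returned by get_bucket_versioning:
    "Enabled", "Suspended", or None if versioning was never enabled.
    """
    if source_status != "Enabled" or dest_status != "Enabled":
        raise ValueError("CRR requires versioning on both source and destination")
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-everything",
                "Prefix": "",  # empty prefix = replicate all objects
                "Status": "Enabled",
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }

config = replication_configuration(
    "arn:aws:iam::111122223333:role/s3-crr-role",  # hypothetical role ARN
    "example-destination-bucket",                   # hypothetical bucket
    "Enabled", "Enabled")
# With boto3: s3.put_bucket_replication(Bucket="example-source-bucket",
#                                       ReplicationConfiguration=config)
print(json.dumps(config, indent=2))
```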
Demo if time permits
Can you restrict access at the object level?
Yes, by assigning tags to objects and then specifying a condition in the bucket policy that restricts access to objects with specific tags.
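A tag-based condition uses the S3 condition key `s3:ExistingObjectTag/<key>`. The sketch below builds a bucket policy that lets one principal read only objects tagged with a given key/value. The principal ARN, bucket name and tag are hypothetical. Note that this Allow statement does not by itself block reads granted elsewhere (for example by an IAM policy); a strict restriction would need a matching Deny.

```python
import json

def tag_restricted_read_policy(bucket: str, principal_arn: str,
                               tag_key: str, tag_value: str) -> dict:
    """Allow principal_arn to read only objects tagged tag_key=tag_value."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnlyTaggedObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # s3:ExistingObjectTag/<key> checks a tag already on the object
                "Condition": {
                    "StringEquals": {f"s3:ExistingObjectTag/{tag_key}": tag_value}
                },
            }
        ],
    }

policy = tag_restricted_read_policy(
    "example-bucket", "arn:aws:iam::111122223333:role/Analyst",  # hypothetical
    "classification", "public")
print(json.dumps(policy, indent=2))
```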
AWS Athena is a fully managed analytics service that lets you run arbitrary ANSI SQL-compliant queries (GROUP BY, HAVING, window and geospatial functions, SQL DDL and DML) against data in S3.
TA – Transfer Acceleration uses CloudFront's edge locations to speed up data transfers to and from S3.
CORS – Making data available across origins, i.e. to web applications served from other domains.