AWS Storage Options
and their anti-patterns
Senior Engineer:
Cloud, DevOps and Security
Slalom Consulting
10+ years in the IT space
Studied Database Administration
Started as a UNIX/Linux administrator
Worked at: Citigroup, Indigo, Bluecat Networks
In Cloud since 2014 (AWS)
Last 6 months, started dabbling in Google Cloud (GCP)
Passionate about cloud architecture, event-driven
security, and cloud best practices.
Services covered: S3, Glacier, EFS, FSx for Lustre/Windows,
EBS, Instance Store Volumes, Storage Gateway, Snowball
For each service:
Key Features
Durability
Availability
Scalability
Elasticity
Security
Anti-Patterns
S3
Amazon Simple Storage Service (Amazon
S3) is an object storage service that
offers industry-leading scalability, data
availability, security, and performance. S3
is easy to use with a simple web interface
to store and retrieve any amount of data
from anywhere on the web.
Most inexpensive storage:
Standard - $0.023/GB
Standard-IA - $0.0125/GB
One Zone-IA - $0.01/GB
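To make the per-GB prices above concrete, here is a minimal sketch comparing monthly storage cost across the three classes. The figures are the ones quoted on this slide; real S3 pricing varies by region and changes over time, and this ignores request and retrieval charges.

```python
# Monthly S3 storage cost per class, using the per-GB prices on this
# slide (illustrative only -- check current regional pricing).
PRICES_PER_GB = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
}

def monthly_cost(gb: float, storage_class: str) -> float:
    """Monthly storage cost in USD for `gb` gigabytes in a class."""
    return round(gb * PRICES_PER_GB[storage_class], 2)

print(monthly_cost(500, "STANDARD"))    # 11.5
print(monthly_cost(500, "ONEZONE_IA"))  # 5.0
```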
Key Features
● Server-side latencies severely
reduced
● Multipart upload on files >100 MB
● Amazon CloudSearch, DynamoDB or
RDS used for indexing (metadata)
● S3 Transfer Acceleration - CloudFront
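The >100 MB multipart guideline can be sketched as a simple upload planner. The one-part-per-100 MB split below is an assumption for illustration; S3 accepts part sizes from 5 MB to 5 GB and at most 10,000 parts per upload.

```python
import math

MULTIPART_THRESHOLD = 100 * 1024**2  # 100 MB, per the guideline above
PART_SIZE = 100 * 1024**2            # chunk size chosen for illustration

def plan_upload(size_bytes: int) -> int:
    """Number of upload parts; 1 means a plain single PUT."""
    if size_bytes <= MULTIPART_THRESHOLD:
        return 1
    return math.ceil(size_bytes / PART_SIZE)

print(plan_upload(50 * 1024**2))   # 1  -> single PUT
print(plan_upload(950 * 1024**2))  # 10 -> multipart upload
```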
Durability and Availability
● Error correction is built-in, and there
are no single points of failure
● Sustain the concurrent loss of data in
two facilities
● 11 nines (99.999999999%) of
durability and 4 nines (99.99%) of
availability
● Cross-region Replication
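A quick back-of-the-envelope shows what eleven nines means in practice, treating object losses as independent events with an annual loss probability of 1e-11 per object (a simplifying assumption):

```python
ANNUAL_LOSS_PROB = 1e-11  # 1 - 0.99999999999 (eleven nines)

def years_between_losses(num_objects: int) -> float:
    """Expected years between single-object losses, on average."""
    return 1 / (num_objects * ANNUAL_LOSS_PROB)

# Store 10,000 objects and expect to lose one roughly every
# ten million years, on average.
print(round(years_between_losses(10_000)))  # 10000000
```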
Scalability and Elasticity
● Virtually unlimited number of
objects in any bucket
● Automatically manage scaling and
distributing redundant copies of your
information to other servers in other
locations in the same region
Security
● Fine-grained access policies
● SSE (data at rest)
● Versioning
● MFA Delete
● Access Logging
Anti-Patterns
● File system
● Structured data with query
● Rapidly changing data
● Archival data
● Dynamic website hosting
Glacier
Amazon Glacier is an extremely low-cost
storage service (object store) that
provides highly secure, durable, and
flexible storage for data archiving and
online backup. Allows you to query data in
place and retrieve only the subset of data
you need from within an archive. Archives
are then stored in Vaults.
Most inexpensive backup on the market:
$0.004/GB
Key Features
● 3 different retrieval options:
Expedited: 1 - 5 minutes
Standard: 3 - 5 hours
Bulk: 5 - 12 hours
● Multipart upload for archives up to
about 40 TB
● Range retrievals on archives
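The three retrieval windows above map to the Expedited, Standard, and Bulk retrieval tiers, with Bulk the least expensive. A small sketch for picking the cheapest tier whose worst-case retrieval time still meets a deadline:

```python
# Glacier retrieval tiers with worst-case windows in hours,
# ordered cheapest first (Bulk is the least expensive option).
TIERS = [
    ("Bulk", 12),          # 5 - 12 hours
    ("Standard", 5),       # 3 - 5 hours
    ("Expedited", 5 / 60), # 1 - 5 minutes
]

def choose_tier(deadline_hours: float) -> str:
    """Cheapest tier whose worst-case retrieval meets the deadline."""
    for name, worst_case_hours in TIERS:
        if worst_case_hours <= deadline_hours:
            return name
    raise ValueError("no retrieval tier meets that deadline")

print(choose_tier(24))   # Bulk
print(choose_tier(6))    # Standard
print(choose_tier(0.5))  # Expedited
```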
Durability and Availability
● Systematic data integrity checks on
archives
● Data distributed across 3 AZs
● 11 nines (99.999999999%) of
durability and 4 nines (99.99%) of
availability
● Synchronously stores your data
across multiple facilities before
returning SUCCESS on uploading an
archive
Scalability and Elasticity
● Single archive is limited to 40 TB in
size
● No limit on total data stored
Security
● Fine-grained access policies
● SSE (data at rest)
● Vault Lock policies available (WORM)
allowing data to become immutable
● Access logging
Anti-Patterns
● Rapidly changing data
● Immediate access
EFS
Amazon Elastic File System provides a simple,
scalable, elastic file system for Linux-based
workloads (NFS) for use with AWS Cloud services and
on-premises resources.
It is also highly available and highly durable because
it stores data and metadata across multiple
Availability Zones in a Region. Designed for
applications that concurrently access data from
multiple EC2 instances and that require substantial
levels of aggregate throughput and input/output
operations per second (IOPS).
Key Features
● 2 Performance Modes:
● General Purpose - default mode,
built for most workloads; optimized
to burst to high-throughput levels
for short periods of time (burst
credits)
● Max I/O - for file systems accessed
by many EC2 instances
Durability and Availability
● Highly durable and highly available
● Each file system object (such as a
directory, file, or link) is redundantly
stored across multiple Availability
Zones within a Region
● Designed to be as highly durable and
available as Amazon S3
Scalability and Elasticity
● Highly Scalable
● No provisioning, allocating, or
administration
● Petabyte Scale
Security
● 3 levels of access control
● IAM permissions for API calls (users
or roles)
● Security groups for EC2 instances
and mount targets - enforcing rules
that define the traffic flow
● NFS level users, groups, and
permissions
● Encryption for data at rest (KMS) and
in transit (TLS)
Anti-Patterns
● Archival Data
● Relational database storage
● Temporary storage
FSx
Amazon FSx for Lustre/Windows is a fully managed
file system that is optimized for compute-intensive
workloads, such as high-performance computing,
machine learning, and media data processing
workflows. These workloads commonly require data
to be presented via a fast and scalable file system
interface, and typically have data sets stored on
long-term data stores like Amazon S3.
Key Features
● Storage for high-performance workloads
● Seamless access to S3 or on-premises
data
● Low latencies and high throughput
● POSIX-compliant (Lustre)
● SMB protocol and Windows NTFS,
Active Directory (AD) integration, and
Distributed File System (DFS) (Windows)
Durability and Availability
● Replicates your data within the
Availability Zone (AZ) it resides in to
protect it from component failure
● Continuously monitors for hardware
failures
● Automatically replaces infrastructure
components in the event of a failure
● Highly durable daily backups of your
file system (stored in S3) using the
Windows Volume Shadow Copy
Service (Windows)
Scalability and Elasticity
● 200 MB/s of baseline throughput per
TB of storage provisioned
● The size and performance of your file
system are determined when you
create the file system
● Tens of thousands of compute
instances can connect to a
filesystem
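The 200 MB/s-per-TB baseline translates directly into aggregate throughput for a provisioned file system. A quick sketch using the figure quoted above (baselines vary by deployment type, so treat the constant as the slide's number, not a universal one):

```python
# Baseline throughput per TB provisioned, from the slide above.
BASELINE_MBPS_PER_TB = 200

def aggregate_throughput_mbps(storage_tb: int) -> int:
    """Baseline aggregate throughput in MB/s for a provisioned size."""
    return storage_tb * BASELINE_MBPS_PER_TB

print(aggregate_throughput_mbps(5))   # 1000 MB/s for a 5 TB file system
print(aggregate_throughput_mbps(50))  # 10000 MB/s at 50 TB
```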
Security
● File systems are encrypted at rest
with keys managed by the service;
data is encrypted using an
XTS-AES-256 block cipher (Lustre)
● File systems are encrypted at rest
with keys managed by KMS; data in
transit is encrypted using SMB
Kerberos session keys (Windows)
● PCI DSS compliant and HIPAA eligible
Anti-Patterns
● Long-term, highly durable storage
(keep durable copies in S3)
● General-purpose Linux file sharing
(consider EFS)
EBS
Amazon Elastic Block Store (Amazon
EBS) volumes provide durable block-level
storage for use with EC2 instances.
Amazon EBS volumes are
network-attached storage that persists
independently from the running life of a
single EC2 instance. After an EBS volume
is attached to an EC2 instance, you can
use the EBS volume like a physical hard
drive. EBS is designed for workloads that
require persistent storage accessible by
single EC2 instances.
Performance
● General Purpose SSD (gp2) - GP2 is
the default EBS volume type for
Amazon EC2 instances
● Provisioned IOPS SSD (io1) - critical,
I/O intensive database and
application workloads, as well as
throughput-intensive database and
data warehouse workloads
● Throughput Optimized HDD (st1) -
throughput intensive workloads with
large datasets
● Cold HDD (sc1) - less frequently
accessed large, cold datasets
Durability and Availability
● Volume data is replicated across
multiple servers in a single
Availability Zone
● Annual failure rate (AFR) of between
0.1% and 0.2%
● Snapshots are recommended -
Stored on S3 (incremental)
● All EBS volume snapshot capabilities
are designed for 99.999% availability.
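The quoted AFR range is easiest to read as expected failures across a fleet; for example, with 1,000 volumes over a year you should plan for one or two unrecoverable failures (hence the snapshot recommendation):

```python
AFR_LOW, AFR_HIGH = 0.001, 0.002  # 0.1% - 0.2% annual failure rate

def expected_failures(num_volumes: int) -> tuple:
    """Expected (low, high) failed volumes per year for a fleet."""
    return (num_volumes * AFR_LOW, num_volumes * AFR_HIGH)

print(expected_failures(1000))  # (1.0, 2.0) -> plan for 1-2 failures/year
```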
Scalability and Elasticity
● Attach multiple volumes to one
instance
● Resizing a volume is done by using a
snapshot
Security
● Encryption at rest of EBS data
volumes, boot volumes and
snapshots
● Backed by KMS
● Encryption in transit occurs on the
servers that host EC2 instances
(between the host and the volume)
Anti-Patterns
● Temporary storage
● Multi-instance storage
● Highly durable storage
● Static data or web content
Instance Store Volumes
Amazon EC2 instance store volumes
(also called ephemeral drives) provide
temporary block-level storage for many
EC2 instance types. This storage is
located on disks that are physically
attached to the host computer. Ideal for
temporary storage of data that changes
frequently, such as buffers, caches,
scratch data, and other temporary
content, or for data that is replicated
across a fleet of instances.
Features
● SSD-Backed Storage-Optimized -
NoSQL databases like Cassandra
and MongoDB, scale-out
transactional databases, data
warehousing, Hadoop, and cluster file
systems.
● HDD-Backed Dense-Storage -
Massively Parallel Processing (MPP)
data warehousing, MapReduce and
Hadoop distributed computing,
distributed file systems, network file
systems, log or data-processing
applications
Durability and Availability
● Persists only during the life of the
associated EC2 instance
● If the EC2 instance is stopped and
restarted, terminates, or fails, all data
on the instance store volumes is lost
Scalability and Elasticity
● Storage capacity of Amazon EC2
local instance store volumes is
fixed and defined by the instance
type
● Scale the total amount of instance
store up or down by increasing or
decreasing the number of running
EC2 instances.
Security
● Instance store volumes can only be
mounted and accessed by the EC2
instances they belong to
● When you stop or terminate an
instance, the applications and data in
its instance store are erased, so no
other instance can have access to
the instance store in the future
Anti-Patterns
● Persistent storage
● Relational database storage
● Shared storage
● Snapshots
Storage Gateway
AWS Storage Gateway is a hybrid storage service that
enables your on-premises applications to seamlessly
integrate between an organization’s on-premises IT
environment and the AWS storage infrastructure. The
service enables you to securely store data in the AWS
Cloud for scalable and cost-effective storage. Storage
Gateway can be used for backup and archiving,
disaster recovery, cloud data processing, storage
tiering, and migration. It also supports industry
standard storage protocols that work with existing
applications.
Features
● NFS, SMB, iSCSI, or iSCSI-VTL
protocols used to interact with
existing apps.
● Fully managed cache
● Multi-part management, automatic
buffering, and delta transfers are
used across all gateway types, and
data compression is applied for all
block and virtual tape data.
Features
● 4 Gateway types
● File Gateway - file interface that
enables you to store files as objects
in Amazon S3 using NFS and SMB
● Gateway-cached volumes - storing a
portion of your data locally cached
for frequently accessed data
● Gateway-stored volumes - store your
primary data locally, while
asynchronously backing up to AWS
● Gateway-VTL - virtual tape library
that is able to communicate with your
backup software.
Durability and Availability
● 11 nines (99.999999999%) of
durability and 4 nines (99.99%) of
availability (due to data being stored
in S3/Glacier)
Scalability and Elasticity
● No practical limits (due to data being
stored in S3/Glacier)
Security
● Fine-grained access policies
● Encryption of data in transit to and
from (TLS)
● Encryption at rest (AES256)
● Between iSCSI devices -
Challenge-Handshake Authentication
Protocol (CHAP)
Anti-Patterns
● General-purpose primary storage -
the service is designed specifically
for backup, disaster recovery and
mirroring data to AWS
Snowball
AWS Snowball accelerates moving large
amounts of data into and out of AWS
using secure Snowball appliances. Ideal
for transferring anywhere from terabytes
to many petabytes of data in and out of
the AWS Cloud securely.
This is especially beneficial in cases
where you don’t want to make expensive
upgrades to your network infrastructure
or in areas where high-speed Internet
connections are not available or cost
prohibitive.
Key Features
● Transfers anywhere from terabytes
to many petabytes of data in and out
of the AWS Cloud securely
● 80 TB of data can be transferred
from your data source to the
appliance in about 2.5 days
Durability and Availability
● Once the data is imported to AWS -
11 nines (99.999999999%) of
durability and 4 nines (99.99%) of
availability
Scalability and Elasticity
● Each AWS Snowball appliance is
capable of storing 50 TB or 80 TB of
data
● Use multiple appliances in parallel to
transfer more data in the same
amount of time
Security
● Fine-grained access policies (IAM)
● All data loaded onto a Snowball
appliance is encrypted using 256-bit
encryption (KMS)
● Snowball is physically secured by
using an industry-standard Trusted
Platform Module (TPM) that uses a
dedicated processor designed to
detect any unauthorized
modifications to the hardware,
firmware, or software.
Anti-Patterns
● Data that can be transferred over the
Internet in less than one week
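That one-week rule of thumb can be sketched as a small calculator. The 80% sustained link-utilization factor below is an assumption for illustration:

```python
def transfer_days(data_tb: float, link_mbps: float,
                  utilization: float = 0.8) -> float:
    """Days needed to push data_tb terabytes over a link_mbps line."""
    bits = data_tb * 1e12 * 8                        # terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * utilization) # effective rate
    return seconds / 86400

def use_snowball(data_tb: float, link_mbps: float) -> bool:
    """Apply the rule of thumb: Snowball only if the wire takes > 1 week."""
    return transfer_days(data_tb, link_mbps) > 7

print(round(transfer_days(80, 1000), 1))  # 9.3 days for 80 TB on 1 Gbps
print(use_snowball(80, 1000))             # True
print(use_snowball(1, 1000))              # False -> just use the Internet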
Honorable Mention
Amazon CloudFront is a content-delivery
web service that speeds up the
distribution of your website’s dynamic,
static, and streaming content by making it
available from a global network of edge
locations.
Questions
References
AWS Storage Services Overview
Cloud Storage with AWS
Amazon S3
Amazon Glacier
Amazon Elastic File System (EFS)
Amazon Elastic Block Store
Amazon EC2 Instance Store
Amazon Storage Gateway
Amazon FSx for Lustre/Windows


Editor's Notes

  • #9 Hash algorithms are used for consistency checking. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years.
  • #31 As your file systems scales, your throughput can scale up to hundreds of GB/s.
  • #34 Typical use cases include relational and NoSQL databases (like Microsoft SQL Server and MySQL or Cassandra and MongoDB), Big Data analytics engines (like the Hadoop/HDFS ecosystem and Amazon EMR), stream and log processing applications (like Kafka and Splunk), and data warehousing applications (like Vertica and Teradata).
  • #36 Hadoop - a framework that allows for the distributed processing of large data sets across clusters of computers. General Purpose SSD (gp2) volumes are suitable for a broad range of transactional workloads, including dev/test environments, low-latency interactive applications, and boot volumes. Provisioned IOPS SSD (io1) volumes are backed by solid-state drives (SSDs) and are the highest-performance EBS storage option, designed for critical, I/O intensive database and application workloads, as well as throughput-intensive database and data warehouse workloads such as HBase, Vertica, and Cassandra; these volumes are ideal for both IOPS-intensive and throughput-intensive workloads that require extremely low latency. Throughput Optimized HDD (st1) volumes are backed by hard disk drives (HDDs) and are ideal for frequently accessed, throughput-intensive workloads with large datasets and large I/O sizes, such as MapReduce, Kafka, log processing, data warehouse, and ETL workloads. Cold HDD (sc1) volumes are backed by hard disk drives (HDDs) and provide the lowest cost per GB of all EBS volume types, ideal for less frequently accessed workloads with large, cold datasets. All volume types but io1 provide a burst model.
  • #37 EBS volumes are designed for an annual failure rate (AFR) of between 0.1 and 0.2 percent, where failure refers to a complete or partial loss of the volume, depending on the size and performance of the volume. This means, if you have 1,000 EBS volumes over the course of a year, you can expect unrecoverable failures with 1 or 2 of your volumes.
  • #38 The simplest approach is to create and attach a new EBS volume and begin using it together with your existing ones. However, if you need to expand the size of a single EBS volume, you can effectively resize it using a snapshot.
  • #40 Temporary storage - consider using local instance store volumes for needs such as scratch disks, buffers, queues, and caches. Multi-instance storage - Amazon EBS volumes can only be attached to one EC2 instance at a time; if you need multiple EC2 instances accessing volume data at the same time, consider using Amazon EFS as a file system. Highly durable storage - if you need very highly durable storage, use S3 or Amazon EFS: Amazon S3 Standard storage is designed for 99.999999999 percent (11 nines) annual durability per object, and snapshots of EBS volumes are saved in Amazon S3, giving you the durability of S3; EFS is designed for high durability and high availability, with data stored in multiple Availability Zones within an AWS Region. Static data or web content - if your data doesn't change that often, Amazon S3 might represent a more cost-effective and scalable solution for storing this fixed information; also, web content served out of Amazon EBS requires a web server running on Amazon EC2, whereas you can deliver web content directly out of Amazon S3 or from multiple EC2 instances using Amazon EFS.
  • #47 Persistent storage - if you need virtual disk storage similar to a physical disk drive for files or other data that must persist longer than the lifetime of a single EC2 instance, EBS volumes, Amazon EFS file systems, or Amazon S3 are more appropriate. Relational database storage - in most cases, relational databases require storage that persists beyond the lifetime of a single EC2 instance, making EBS volumes the natural choice. Shared storage - instance store volumes are dedicated to a single EC2 instance and can't be shared with other systems or users; if you need storage that can be detached from one instance and attached to a different instance, or the ability to share data easily, Amazon EFS, Amazon S3, or Amazon EBS are better choices. Snapshots - if you need the convenience, long-term durability, availability, and the ability to share point-in-time disk snapshots, EBS volumes are a better choice.
  • #49 It’s a virtual machine (VM) image that you install on a host in your data center or as an EC2 instance