This document provides an overview of Amazon Web Services including EC2, S3, and EMR. It discusses regions and availability zones in EC2, how to set up VPCs, different EC2 instance types, AMIs, key pairs, and the differences between EBS and instance store. It also covers S3 concepts like buckets, objects, storage classes, and access controls. Finally, it briefly introduces EMR and how it provides a managed Hadoop framework on EC2 instances with integration to S3 for storage. The document includes demos of working with EC2 instances and EBS volumes, S3 buckets, and creating an EMR cluster.
3. Regions and Availability Zones
• Amazon EC2 is hosted in multiple locations worldwide
• Each region is a separate geographic area
• Each region has multiple, isolated locations known as Availability Zones
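VPC (Virtual Private Cloud)
• A virtual data center in the cloud: a logically isolated section of AWS where you launch your resources
• You can create a public-facing subnet for your web servers and place your backend systems, such as databases or application servers, in a private subnet
• You can create a hardware virtual private network (VPN) connection between your corporate data center and AWS
• Assign a custom IP address range (CIDR block) to the VPC and to each subnet within it
• Create internet gateways to connect public subnets to the internet
• Leverage multiple layers of security, such as security groups and network access control lists (ACLs)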
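A minimal sketch of the public/private subnet layout described above, using Python's standard ipaddress module. The 10.0.0.0/16 range, the /24 subnet size, and the route-table dictionary (with the placeholder gateway ID igw-example) are assumptions for illustration, not AWS API calls. The sketch encodes two VPC facts: a subnet is public when its route table sends 0.0.0.0/0 to an internet gateway, and AWS reserves five addresses in every subnet.

```python
import ipaddress

# Assumed VPC CIDR block; 10.0.0.0/16 is a common choice, not a requirement.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC range into /24 subnets and take the first two.
_subnets = VPC_CIDR.subnets(new_prefix=24)
PUBLIC_SUBNET = next(_subnets)   # for web servers
PRIVATE_SUBNET = next(_subnets)  # for databases and application servers

# Hypothetical route tables: "local" keeps traffic inside the VPC;
# "igw-example" is a placeholder for an internet gateway ID.
ROUTE_TABLES = {
    PUBLIC_SUBNET: {str(VPC_CIDR): "local", "0.0.0.0/0": "igw-example"},
    PRIVATE_SUBNET: {str(VPC_CIDR): "local"},
}

def is_public(subnet):
    """A subnet is public if its default route points at an internet gateway."""
    return ROUTE_TABLES[subnet].get("0.0.0.0/0", "").startswith("igw-")

def usable_addresses(subnet):
    """AWS reserves five addresses per subnet: the first four and the last."""
    return subnet.num_addresses - 5

for name, subnet in [("web", PUBLIC_SUBNET), ("backend", PRIVATE_SUBNET)]:
    kind = "public" if is_public(subnet) else "private"
    print(name, subnet, kind, usable_addresses(subnet))
```

A /24 subnet therefore offers 251 usable addresses rather than 256.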
5. Amazon EC2
• Web service that provides secure, resizable compute capacity in the cloud
• Pay only for capacity that you actually use
• Choose Linux or Windows
✓ On-Demand Instances
Applications with spiky or unpredictable workloads, or applications being developed or tested on Amazon EC2
✓ Reserved Instances
Steady-state or predictable usage, with a one- or three-year commitment (optionally paid upfront for a larger discount)
✓ Spot Instances
Applications that have flexible start and end times and can tolerate interruption
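The rules of thumb above can be written as a small decision function. This is a hypothetical helper, not an AWS API: the Workload fields and the order of the checks are assumptions. Spot is checked first because it is typically the cheapest option when a workload can tolerate interruption.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    """Hypothetical summary of how a workload uses compute capacity."""
    predictable: bool           # steady-state usage you can forecast
    can_commit_long_term: bool  # willing to commit to a one- or three-year term
    interruptible: bool         # flexible start/end times, tolerates interruption

def suggest_purchase_option(w: Workload) -> str:
    """Apply the slide's rules of thumb, checking the typically cheapest fit first."""
    if w.interruptible:
        return "Spot"
    if w.predictable and w.can_commit_long_term:
        return "Reserved"
    # Spiky, unpredictable, or dev/test workloads fall back to On-Demand.
    return "On-Demand"

print(suggest_purchase_option(Workload(predictable=False, can_commit_long_term=False, interruptible=False)))
```

Real purchasing decisions also weigh prices, capacity needs, and mixes of options; the function only mirrors the three cases listed on the slide.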
6. Amazon Machine Image (AMI)
Provides the information required to launch an instance
An AMI includes the following:
✓ A template for the root volume for the instance (for example, an operating system, an application server, and applications)
✓ Launch permissions that control which AWS accounts can use the AMI to launch instances
✓ A block device mapping that specifies the volumes to attach to the instance when it's launched
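A toy model of the three AMI parts listed above, to show how launch permissions gate who may launch an image. The class names, fields, and account IDs are hypothetical and do not mirror the EC2 API; the permission rule (the owner, explicitly granted accounts, or anyone if the AMI is public) reflects how AMI launch permissions work.

```python
from dataclasses import dataclass, field

@dataclass
class BlockDeviceMapping:
    device_name: str      # hypothetical example: "/dev/xvda"
    volume_size_gib: int  # size of the volume attached at launch

@dataclass
class MachineImage:
    """Toy AMI: root volume template, launch permissions, block device mappings."""
    root_volume_template: str                              # e.g. OS + app server + apps
    owner_account: str
    launch_permissions: set = field(default_factory=set)   # account IDs granted launch
    public: bool = False
    block_device_mappings: list = field(default_factory=list)

    def can_launch(self, account_id: str) -> bool:
        # The owner can always launch; other accounts need a grant or a public AMI.
        return (self.public
                or account_id == self.owner_account
                or account_id in self.launch_permissions)

ami = MachineImage(
    root_volume_template="Linux + application server + web app",
    owner_account="111111111111",
    launch_permissions={"222222222222"},
    block_device_mappings=[BlockDeviceMapping("/dev/xvda", 8)],
)
print(ami.can_launch("222222222222"), ami.can_launch("333333333333"))
```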
Key Pair
Amazon EC2 uses public-key cryptography to encrypt and decrypt login information
✓ Public-key cryptography uses a public key to encrypt a piece of data, such as a password; the recipient then uses the private key to decrypt the data
✓ Amazon EC2 stores the public key, and you store the private key
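The encrypt-with-public, decrypt-with-private idea can be seen in textbook RSA with tiny primes. This is a toy for illustration only and is insecure; real EC2 key pairs use much larger RSA keys (or ED25519) with proper padding. For Windows instances, EC2 applies the same principle: it encrypts the administrator password with your public key, and you decrypt it with your private key.

```python
# Toy textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                # modulus shared by both keys: 3233
phi = (p - 1) * (q - 1)  # Euler's totient of n: 3120
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent, the modular inverse of e (2753); Python 3.8+

public_key = (e, n)
private_key = (d, n)

def encrypt(message: int, key: tuple) -> int:
    """Raise the message to the key's exponent modulo n."""
    exponent, modulus = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext: int, key: tuple) -> int:
    """Same operation as encrypt, applied with the other key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)

password = 65                                # stand-in: a password encoded as a number < n
ciphertext = encrypt(password, public_key)   # anyone with the public key can do this
recovered = decrypt(ciphertext, private_key) # only the private-key holder can undo it
print(ciphertext, recovered)                 # 2790 65
```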
11. Amazon EBS vs Amazon EC2 Instance Store
Amazon EBS
• Data stored on an Amazon EBS volume can persist independently of the life of the instance
• Storage is persistent
✓ Magnetic
✓ General Purpose (SSD)
✓ Provisioned IOPS (SSD)
Amazon EC2 Instance Store
• Data stored on a local instance store persists only as long as the instance is alive
• Physically attached to the host computer
• Storage is ephemeral
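A toy model contrasting the two storage lifecycles. The classes are hypothetical and only simulate the behavior described above: instance store contents are lost when an instance stops or terminates, while an EBS volume keeps its data unless it is flagged for deletion on termination. (In EC2, root EBS volumes default to deletion on termination; additionally attached volumes do not.)

```python
class EbsVolume:
    """Toy EBS volume: its data lives independently of any instance."""
    def __init__(self, delete_on_termination=False):
        self.data = {}
        self.delete_on_termination = delete_on_termination
        self.deleted = False

class Instance:
    """Toy instance with an ephemeral instance store and one attached EBS volume."""
    def __init__(self, ebs_volume):
        self.instance_store = {}
        self.ebs_volume = ebs_volume

    def stop(self):
        # Instance store contents are lost; the EBS volume stays attached with its data.
        self.instance_store.clear()

    def terminate(self):
        # Instance store contents are lost; the EBS volume is detached and
        # deleted only if its delete-on-termination flag is set.
        self.instance_store.clear()
        volume, self.ebs_volume = self.ebs_volume, None
        if volume is not None and volume.delete_on_termination:
            volume.data.clear()
            volume.deleted = True
        return volume

vol = EbsVolume()
inst = Instance(vol)
inst.instance_store["scratch"] = "temp files"
vol.data["db"] = "records"
inst.terminate()
print(inst.instance_store, vol.data)
```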
EBS Volumes: Larger and Faster
                          General Purpose (SSD)   Provisioned IOPS (SSD)
Maximum volume size       16 TB                   16 TB
Maximum IOPS per volume   10,000                  20,000
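A worked example of how a General Purpose (SSD) volume's baseline performance relates to its size. General Purpose (SSD) gp2 volumes get a baseline of 3 IOPS per GiB, with a floor of 100 IOPS; the cap defaults here to the 10,000 figure in the table (AWS has since raised it, so the cap is a parameter). The function name is illustrative.

```python
def gp2_baseline_iops(size_gib: int, max_iops: int = 10_000) -> int:
    """Baseline IOPS of a gp2 volume: 3 IOPS per GiB, at least 100, capped at max_iops."""
    return min(max(100, 3 * size_gib), max_iops)

for size in (10, 100, 1000, 5000):
    print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
```

With the 10,000 cap, the baseline stops growing at about 3,334 GiB; larger volumes add capacity but no extra baseline IOPS.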
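16. Amazon Simple Storage Service (S3)
• Store and retrieve any amount of data, any time, from anywhere on the web
• Highly scalable, reliable, fast, and durable
• Object-based storage: you upload files as objects
• Objects can range from 0 bytes to 5 TB in size
• Bucket names share a global namespace: each name must be unique across all AWS accounts, even though each bucket is created in a specific region
• Amazon S3 Standard is designed for 99.99% availability (the service level agreement commits to 99.9%)
• Designed for 99.999999999% (11 nines) durability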
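A worked example of what 11 nines of durability means in practice. The durability figure is an annual design target per object, not a guarantee; the object count is a hypothetical input. AWS's own illustration is that storing 10,000,000 objects should, on average, lose one object every 10,000 years.

```python
from decimal import Decimal

# Decimal avoids floating-point rounding on a number this close to 1.
durability = Decimal("0.99999999999")        # 11 nines: annual design target per object
annual_loss_probability = 1 - durability     # 1E-11

objects_stored = 10_000_000                  # hypothetical object count
expected_losses_per_year = objects_stored * annual_loss_probability  # 0.0001
years_per_lost_object = int(1 / expected_losses_per_year)            # 10000

print(f"Expected losses per year: {expected_losses_per_year.normalize()}")
print(f"On average, one object lost every {years_per_lost_object:,} years")
```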
17. Amazon S3 Concepts
• Amazon S3 stores data as objects within buckets
• An object is composed of a file and, optionally, any metadata that describes that file
• By default, each account could create up to 100 buckets at the time of writing; this limit can be raised on request
• You can control access to the bucket and its objects
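Before creating a bucket, its name has to satisfy S3's naming rules. A minimal sketch that checks a subset of the documented rules for general purpose buckets (3 to 63 characters; only lowercase letters, digits, dots, and hyphens; begins and ends with a letter or digit; no adjacent periods; not formatted like an IP address). It is not exhaustive: AWS also reserves certain prefixes and suffixes, so consult the S3 naming rules for the full list. The function name is hypothetical.

```python
import re

# Length 3-63, allowed characters, and letter/digit at both ends.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names shaped like an IPv4 address (e.g. 192.168.5.4) are not allowed.
_IP_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 general purpose bucket naming rules."""
    if not _NAME_PATTERN.fullmatch(name):
        return False  # wrong length, disallowed characters, or bad first/last character
    if ".." in name:
        return False  # adjacent periods are not allowed
    if _IP_PATTERN.fullmatch(name):
        return False  # must not look like an IP address
    return True

for candidate in ("my-log-bucket", "My-Bucket", "192.168.5.4"):
    print(candidate, is_valid_bucket_name(candidate))
```

A name that passes this check can still be rejected if another account already owns it.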
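18. Amazon Simple Storage Service (S3)
• Write once, read many: an object is replaced as a whole rather than edited in place
• Consistency: originally eventually consistent for overwrites and deletes; since December 2020, S3 provides strong read-after-write consistency for all PUT and DELETE operations
• Secure by default: new buckets and objects are private
• Use bucket policies, ACLs, or IAM policies to define access rules
• Cross-region replication copies objects to a bucket in another region
Storage Classes
✓ Standard
For frequently accessed data
✓ Standard-Infrequent Access (Standard-IA)
For long-lived but less frequently accessed data
✓ Glacier
For long-term archive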
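A common way to move data between these storage classes is a lifecycle rule. The dictionary below follows the shape of an S3 lifecycle configuration (Rules with Filter, Status, Transitions, and Expiration; storage class values STANDARD_IA and GLACIER), which is what boto3's put_bucket_lifecycle_configuration accepts as its LifecycleConfiguration argument. The rule itself (prefix logs/, 30/90/365 days) is a hypothetical example, and the helper only simulates which class the rule targets at a given object age; real transitions run asynchronously.

```python
import json

# Hypothetical rule: move objects under "logs/" to Standard-IA after 30 days,
# to Glacier after 90 days, and delete them after 365 days.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "archive-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

def storage_class_for_age(rule, age_days):
    """Storage class the rule targets at a given object age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"  # objects start in Standard unless uploaded elsewhere
    for transition in sorted(rule["Transitions"], key=lambda tr: tr["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current

print(json.dumps(LIFECYCLE_CONFIGURATION, indent=2))
rule = LIFECYCLE_CONFIGURATION["Rules"][0]
for age in (0, 45, 100, 400):
    print(age, storage_class_for_age(rule, age))
```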
21. Amazon EMR
• Managed Hadoop framework
• Makes it fast and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances
• Supported applications
✓ Hadoop, Hive, Hue, Pig, HBase, ZooKeeper, Spark, and more
Built-in support for resizing clusters, and integration with the Amazon EC2 Spot market to help lower costs
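Amazon EMR runs frameworks such as Hadoop MapReduce across a cluster. The sketch below simulates the map, shuffle, and reduce phases of the classic word-count job in a single Python process; it illustrates the programming model and is not EMR or Hadoop code. On a real cluster, mappers and reducers run in parallel on many instances (Hadoop Streaming, for instance, lets them be written as Python scripts), typically reading input from and writing output to Amazon S3.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(lines):
    mapped = chain.from_iterable(map(map_phase, lines))
    return reduce_phase(shuffle(mapped))

print(word_count(["the quick brown fox", "The lazy dog", "the end"]))
```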
22. Amazon S3 as Your Persistent Data Store
• Separate compute and storage
• Resize and shut down Amazon EMR clusters with no data loss
• Point multiple Amazon EMR clusters at the same data in Amazon S3