Cost Optimization with Spot Instances
ARUN SIRIMALLA
Overview of Amazon Web Services
Regions and Availability Zones
• Amazon EC2 is hosted in multiple locations worldwide
• Each region is a separate geographic area
• Each region has multiple, isolated locations known as Availability Zones
VPC
• Virtual datacenter in the cloud
• You can create a public-facing subnet for your web servers and
place your backend systems, such as databases or application servers, in
a private subnet
• You can create a hardware virtual private network (VPN) connection between your
corporate datacenter and AWS
• Assign a custom IP address range to each subnet
• Create internet gateways
• Leverage multiple layers of security
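The subnet layout described above can be sketched with Python's standard `ipaddress` module. This is only an illustration of carving a VPC range into a public and a private subnet; the `10.0.0.0/16` range, the `/24` subnet size, and the `tier_for` helper are assumptions made for the example and do not come from the talk.

```python
import ipaddress

# Hypothetical VPC address range; any private (RFC 1918) block would do.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the VPC range into /24 subnets and take the first two.
subnets = list(vpc.subnets(new_prefix=24))
public_subnet = subnets[0]   # e.g. web servers
private_subnet = subnets[1]  # e.g. databases, application servers


def tier_for(address: str) -> str:
    """Return which of the two example subnets an IP address falls into."""
    ip = ipaddress.ip_address(address)
    if ip in public_subnet:
        return "public"
    if ip in private_subnet:
        return "private"
    return "outside"


print(public_subnet, private_subnet)
print(tier_for("10.0.1.25"))
```

In a real VPC the public subnet is the one whose route table points at an internet gateway; the address arithmetic is the same either way.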
Amazon EC2
• Web service that provides secure, resizable compute capacity in the cloud
• Pay only for capacity that you actually use
• Choose Linux or Windows
✓ On-Demand Instances
Applications with spiky or unpredictable workloads, or applications being developed or tested on
Amazon EC2
✓ Reserved Instances
Applications with steady-state or predictable usage, where you can make an upfront payment
✓ Spot Instances
Applications that have flexible start and end times
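To make the trade-off between the three purchasing options concrete, here is a rough cost comparison. The hourly rates are placeholders chosen for round numbers, not real AWS prices, and `monthly_cost` and `savings_percent` are hypothetical helpers written for this sketch.

```python
# Placeholder hourly rates in USD per instance-hour (NOT real AWS prices).
RATES = {
    "on_demand": 0.10,
    "reserved": 0.06,   # assumed effective rate with the upfront payment amortized
    "spot": 0.03,
}


def monthly_cost(hourly_rate: float, instances: int, hours: int) -> float:
    """Total cost of running `instances` machines for `hours` hours each."""
    return round(hourly_rate * instances * hours, 2)


def savings_percent(baseline: float, alternative: float) -> float:
    """Percentage saved by paying `alternative` instead of `baseline`."""
    return round((1 - alternative / baseline) * 100, 1)


# Ten instances running for a 30-day month (720 hours).
costs = {name: monthly_cost(rate, 10, 720) for name, rate in RATES.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
print("spot vs on-demand:", savings_percent(costs["on_demand"], costs["spot"]), "% saved")
```

Actual Spot savings vary by instance type, region, and time, so the rates should be replaced with figures from the Spot price history before drawing conclusions.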
Amazon Simple Storage Service (S3)
The infinite hard drive in the cloud
• Store and retrieve any amount of data, any time, from
anywhere on the web
• Highly scalable, reliable, fast, and durable
• S3 is object-based storage: you upload files as objects
• Objects can range from 0 bytes to 5 TB
• Bucket names must be globally unique, although each bucket is created in a specific region
• Designed for 99.99% availability (S3 Standard)
• Designed for 99.999999999% (11 nines) durability
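The availability and durability percentages are easier to read as concrete numbers. The sketch below treats durability as an average annual per-object loss rate and availability as a fraction of the year, which is one common reading of these figures; the function names are made up for the example.

```python
from fractions import Fraction

DURABILITY = Fraction(99_999_999_999, 100_000_000_000)  # 99.999999999%
AVAILABILITY = Fraction(9_999, 10_000)                   # 99.99%


def expected_objects_lost_per_year(object_count: int) -> Fraction:
    """Average number of objects lost per year at the stated durability."""
    return object_count * (1 - DURABILITY)


def expected_downtime_minutes_per_year() -> Fraction:
    """Minutes per (365-day) year outside the stated availability."""
    minutes_per_year = 365 * 24 * 60
    return minutes_per_year * (1 - AVAILABILITY)


loss = expected_objects_lost_per_year(10_000_000)
print(float(loss))       # expected objects lost per year for 10 million objects
print(float(1 / loss))   # years per single expected object loss
print(float(expected_downtime_minutes_per_year()))
```

With ten million stored objects this works out to one expected object loss roughly every 10,000 years, and a little under an hour of unavailability per year.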
Amazon S3 concepts
• Amazon S3 stores data as objects within buckets
• An object is composed of a file and, optionally, any metadata that describes that file
• You can have up to 100 buckets in each account
• You can control access to the bucket and its objects
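One way to control access to a bucket and its objects is a bucket policy, which is a JSON document. The sketch below builds a read-only policy as a Python dictionary; the bucket name `example-bucket`, the account ID `123456789012`, and the `read_only_policy` helper are illustrative assumptions, not values from the talk.

```python
import json


def read_only_policy(bucket: str, principal_arn: str) -> dict:
    """Build a bucket policy letting one principal read every object in a bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowObjectRead",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                # The trailing /* targets the objects, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Hypothetical bucket name and account ID, for illustration only.
policy = read_only_policy("example-bucket", "arn:aws:iam::123456789012:root")
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document you would attach to the bucket through the console, CLI, or API.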
Elastic MapReduce
EMR
• Managed Hadoop framework
• Fast and cost-effective way to process vast amounts of data across
dynamically scalable Amazon EC2 instances
• Supported applications
✓ Hadoop, Hive, Hue, Pig, HBase, ZooKeeper, Spark, and more
• Built-in support for resizing clusters, and integrated with the Amazon EC2 Spot market
to help lower costs
Amazon S3 as your persistent data store
• Separate compute and storage
• Resize and shut down Amazon EMR
clusters with no data loss
• Point multiple Amazon EMR clusters
at the same data in Amazon S3
Amazon EMR Instance Groups
A collection of EC2 instances that perform a set of roles
• Master: the master instance group manages the entire Hadoop cluster
• Core: the core instance group contains all the core nodes of an EMR cluster
• Task: the task instance group contains all the task nodes of an EMR cluster
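The three roles above map onto the instance group definitions passed when a cluster is created. The sketch below only builds those definitions as plain dictionaries, in the shape that boto3's `run_job_flow` accepts under `Instances["InstanceGroups"]`; it does not call AWS. The `instance_group` helper, the `m4.large` instance type, the counts, and the bid price are assumptions for illustration. Placing only the task group on Spot is a common choice, since task nodes do not store HDFS data.

```python
def instance_group(name, role, instance_type, count, market="ON_DEMAND", bid_price=None):
    """Build one EMR instance group definition (role is MASTER, CORE, or TASK)."""
    if role not in ("MASTER", "CORE", "TASK"):
        raise ValueError(f"unknown role: {role}")
    group = {
        "Name": name,
        "InstanceRole": role,
        "Market": market,
        "InstanceType": instance_type,
        "InstanceCount": count,
    }
    if market == "SPOT" and bid_price is not None:
        # The API expects the bid as a string, in USD per instance-hour.
        group["BidPrice"] = bid_price
    return group


# Hypothetical cluster layout: On-Demand master and core, Spot task nodes.
instance_groups = [
    instance_group("master", "MASTER", "m4.large", 1),
    instance_group("core", "CORE", "m4.large", 2),
    instance_group("task", "TASK", "m4.large", 4, market="SPOT", bid_price="0.05"),
]

for group in instance_groups:
    print(group["InstanceRole"], group["Market"], group["InstanceCount"])
```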
• Instance fleets for EC2 Spot instances and Spot blocks
• Can automatically provision Spot and On-Demand capacity
• Spot instances receive a two-minute warning before termination
• A Spot block lets a job run without interruption for 1 to 6 hours
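On a plain EC2 Spot instance, the two-minute warning is exposed through the instance metadata path `spot/instance-action` as a small JSON document. The sketch below only parses such a document and computes the time remaining; fetching it from the metadata service is left out, the sample payload and timestamps are illustrative, and `seconds_until_interruption` is a hypothetical helper. On EMR the service handles the notice for you, so this is mainly relevant to your own EC2 workloads.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Return the seconds left before the action time stated in the notice."""
    notice = json.loads(notice_json)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Illustrative payload in the shape returned by the metadata service.
sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)

remaining = seconds_until_interruption(sample, now)
print(f"{remaining:.0f} seconds left to checkpoint and drain work")
```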
Create EMR clusters with instance fleets from the EMR console, the CLI, or the EMR API
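As a sketch of what an instance fleet request looks like through the API, the code below builds a task fleet definition in the shape boto3's `run_job_flow` accepts under `Instances["InstanceFleets"]`. It only constructs the dictionary and does not call AWS. The `task_fleet` helper, the instance types, the capacities, and the 20-minute provisioning timeout are assumptions for illustration; the optional block duration corresponds to the 1 to 6 hour Spot blocks mentioned above.

```python
VALID_BLOCK_MINUTES = (60, 120, 180, 240, 300, 360)  # Spot blocks: 1 to 6 hours


def task_fleet(instance_types, spot_units, on_demand_units=0,
               timeout_minutes=20, block_minutes=None):
    """Build a TASK instance fleet that mixes Spot and On-Demand capacity."""
    spot_spec = {
        "TimeoutDurationMinutes": timeout_minutes,
        # Fall back to On-Demand if Spot capacity is not fulfilled in time.
        "TimeoutAction": "SWITCH_TO_ON_DEMAND",
    }
    if block_minutes is not None:
        if block_minutes not in VALID_BLOCK_MINUTES:
            raise ValueError("block duration must be a whole number of hours, 1 to 6")
        spot_spec["BlockDurationMinutes"] = block_minutes
    return {
        "Name": "task-fleet",
        "InstanceFleetType": "TASK",
        "TargetOnDemandCapacity": on_demand_units,
        "TargetSpotCapacity": spot_units,
        "InstanceTypeConfigs": [
            {
                "InstanceType": instance_type,
                "WeightedCapacity": 1,
                # Bid up to the On-Demand price for this instance type.
                "BidPriceAsPercentageOfOnDemandPrice": 100.0,
            }
            for instance_type in instance_types
        ],
        "LaunchSpecifications": {"SpotSpecification": spot_spec},
    }


# Hypothetical fleet: 8 Spot units across two instance types, 2-hour Spot block.
fleet = task_fleet(["m4.large", "m4.xlarge"], spot_units=8, block_minutes=120)
print(fleet["TargetSpotCapacity"], [c["InstanceType"] for c in fleet["InstanceTypeConfigs"]])
```

Listing several instance types gives the fleet more Spot pools to draw from, which is the main practical difference from a single-type instance group.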
Price History
✓ Provides the price history of all instance types supported by EMR
Spot Bid Advisor
✓ Low – interruption
✓ Medium – Average lifetime is 2 days
Demo
1. Creating an EMR cluster with instance groups
2. Creating an EMR cluster with Spot instances
Thank you!
Upcoming Session
Deep Dive on EC2 and S3 – OCT 10
