© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Jarrod Spiga - Solutions Architect, Amazon Web Services
John Matheson – Cloud & Automation Platform Manager, News Corp Australia
September 5, 2017
Spot On!
Replacing Reserved Instances with Spot
Spot 101
Customers can bid on unused AWS capacity.
The tradeoff is that if capacity becomes constrained, your Spot Instances may be terminated after a two-minute warning.
Workloads running on Spot therefore need to be fault-tolerant, or able to be restarted at a later time.
Each combination of instance type (family and size), Availability Zone and region is a separate Spot market.
The Spot market is where the price of compute fluctuates based on supply and demand.
You’ll never pay more than your bid; you only ever pay the market price. When the market price exceeds your bid, you get two minutes to wrap up.
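The pricing rule above can be captured as a tiny decision function. This is a simplified model of the rule as the slide states it (prices in USD per hour); the helper name `spot_outcome` is hypothetical, and the sketch ignores billing granularity.

```shell
#!/usr/bin/env bash
# Simplified model of the Spot pricing rule (hypothetical helper).
# $1: your bid (USD/hour), $2: current Spot market price (USD/hour)
# Prints "pay <market price>" while the bid covers the market price,
# otherwise "interrupt" (the instance receives a two-minute warning).
spot_outcome() {
  awk -v bid="$1" -v market="$2" 'BEGIN {
    if (market + 0 <= bid + 0) printf "pay %s\n", market
    else print "interrupt"
  }'
}

# Example: spot_outcome 0.032 0.0128   prints "pay 0.0128"
```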
A Review of Spot Fleet
Launch many Spot Instances with one call.
Select whether you want instances from the cheapest market, or opt to diversify across markets to reduce the impact of market variability (the lowestPrice and diversified allocation strategies).
Weighting allows you to scale based on cores, memory, latency, etc.
This diversification option is what we are using to maximise availability!
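As a sketch of the Spot Fleet features above, the following prints a request configuration that uses the `diversified` allocation strategy and weights an m3.large as two units of capacity and an m3.medium as one. The role ARN, AMI ID and subnet ID are hypothetical placeholders. When weights are used, the fleet-level `SpotPrice` is a price per unit of capacity rather than per instance.

```shell
#!/usr/bin/env bash
# Sketch of a diversified, weighted Spot Fleet request configuration.
# All IDs and ARNs below are hypothetical placeholders.
fleet_config() {
  cat <<'EOF'
{
  "IamFleetRole": "arn:aws:iam::123456789012:role/my-spot-fleet-role",
  "AllocationStrategy": "diversified",
  "TargetCapacity": 4,
  "SpotPrice": "0.032",
  "LaunchSpecifications": [
    { "ImageId": "ami-12345678", "InstanceType": "m3.medium",
      "SubnetId": "subnet-11111111", "WeightedCapacity": 1 },
    { "ImageId": "ami-12345678", "InstanceType": "m3.large",
      "SubnetId": "subnet-11111111", "WeightedCapacity": 2 }
  ]
}
EOF
}

# Submitting the request (requires AWS credentials):
#   fleet_config > fleet.json
#   aws ec2 request-spot-fleet --spot-fleet-request-config file://fleet.json
```

With a target capacity of 4, the fleet could satisfy the request with, for example, two m3.large instances or four m3.medium instances, or a mix of both.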
Challenge 1 – ELB Registration
Q: How do we register a Spot Fleet Instance to an ELB/ALB?
A: Use EC2 User Data.
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws elb register-instances-with-load-balancer \
  --load-balancer-name my-loadbalancer --instances "$instance_id"
As there is no mechanism to automatically register a Spot Instance provisioned via a Spot Fleet request with an ELB/ALB, each instance must register itself so that load is distributed across the fleet.
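The command above only covers a Classic Load Balancer; an ALB registers targets in a target group through the `elbv2` API instead. Below is a minimal User Data sketch covering both cases, assuming a load balancer name or target group ARN is passed in (placeholders here). The `DRY_RUN=1` switch, which prints the command instead of calling AWS, is a convenience added for illustration.

```shell
#!/usr/bin/env bash
# User Data sketch: register this instance with a Classic ELB or an ALB target group.
# Load balancer names and target group ARNs are placeholders.

# Print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# $1: "elb" or "alb"
# $2: load balancer name (elb) or target group ARN (alb)
# $3: instance ID
register_instance() {
  case "$1" in
    elb) run aws elb register-instances-with-load-balancer \
           --load-balancer-name "$2" --instances "$3" ;;
    alb) run aws elbv2 register-targets \
           --target-group-arn "$2" --targets "Id=$3" ;;
    *)   echo "unknown load balancer type: $1" >&2; return 1 ;;
  esac
}

# Example (on the instance):
#   instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
#   register_instance elb my-loadbalancer "$instance_id"
```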
Challenge 2 – EIP Attachment
Q: How do we associate an Elastic IP Address to a Spot Fleet Instance?
A: Use EC2 User Data.
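instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
eip_allocation_id=$(aws ec2 allocate-address --domain vpc \
  --query AllocationId --output text)
aws ec2 associate-address --instance-id "$instance_id" \
  --allocation-id "$eip_allocation_id"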
This is required for services that need direct connectivity to the Internet, such as NAT hosts and proxy servers.
But it’s a bit more complicated…
An alternative is to enable automatic public IP addressing, but this is a subnet-level setting that applies to every instance launched into the subnet.
This use case has been raised with the service team and a feature request has been created.
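An address allocated at boot is not released when the Spot Instance terminates, so allocating a new Elastic IP on every launch leaves unused addresses behind. One possible approach, sketched here under assumptions, is to keep a small pool of pre-allocated Elastic IPs tagged with a hypothetical `Pool=proxy` tag and attach a free one at boot. `--no-allow-reassociation` makes the call fail rather than take an address that another instance has just claimed; a real script would retry.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: attach a free Elastic IP from a pre-allocated pool
# tagged Pool=proxy (tag key and value are placeholders).

# Print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Print the first allocation ID from whitespace-separated input on stdin.
first_free_allocation() {
  tr -s ' \t' '\n' | grep -m1 '^eipalloc-'
}

# $1: instance ID
attach_pool_eip() {
  local allocation_id
  allocation_id=$(aws ec2 describe-addresses \
    --filters Name=tag:Pool,Values=proxy \
    --query 'Addresses[?AssociationId==null].AllocationId' \
    --output text | first_free_allocation) || return 1
  run aws ec2 associate-address --instance-id "$1" \
    --allocation-id "$allocation_id" --no-allow-reassociation
}
```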
Challenge 3 – De-registration from an ELB
Q: How do we de-register a terminated Spot Instance from an ELB/ALB?
A: Run a script that polls the instance metadata for a termination time.
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
while true
do
  # curl -f prints nothing on HTTP 404, i.e. while no termination is scheduled
  if curl -sf http://169.254.169.254/latest/meta-data/spot/termination-time |
     grep -q '.*T.*Z'
  then
    aws elb deregister-instances-from-load-balancer \
      --load-balancer-name my-loadbalancer \
      --instances "$instance_id"
    break
  else
    sleep 5
  fi
done
We need a mechanism to ensure that ELB traffic is not routed to an instance that’s about to be terminated.
This is a well-documented process.
Challenge 3 – De-registration from an ELB
A (v2.0): Use a Lambda function to deregister the instance upon termination.
The previous solution does not cater for instances that are manually terminated. Also, it cannot be tested.
Finally, we can “think bigger” and build a solution that applies to ALL instances – on-demand or spot.
Whenever an EC2 instance changes state (including when it is terminated), an EC2 Instance State-change Notification event is raised in CloudWatch Events, and this event can trigger a Lambda function.
This function would first check that the instance was terminated, then would de-register the instance.
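Below is a sketch of wiring the state-change event to a Lambda function with the AWS CLI. The event pattern filters on the `terminated` state, so the function is only invoked for terminations; it can still double-check the state as described above. The rule name and function ARN are hypothetical, and the deregistration logic inside the function is not shown.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Match only EC2 state-change events for instances that reached "terminated".
EVENT_PATTERN='{"source":["aws.ec2"],"detail-type":["EC2 Instance State-change Notification"],"detail":{"state":["terminated"]}}'

# $1: rule name, $2: Lambda function ARN (both hypothetical)
create_termination_rule() {
  run aws events put-rule --name "$1" --event-pattern "$EVENT_PATTERN"
  # Allow CloudWatch Events to invoke the function; consider adding
  # --source-arn <rule ARN> to restrict the permission to this rule.
  run aws lambda add-permission --function-name "$2" \
    --statement-id "$1-invoke" --action lambda:InvokeFunction \
    --principal events.amazonaws.com
  run aws events put-targets --rule "$1" --targets "Id=1,Arn=$2"
}
```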
Q: But how do we ensure that requests don’t get directed to an instance that’s
about to be terminated?
A: Utilise the existing health check functionality within the ELB/ALB.
When our termination notice is posted to our Spot Instance, poison the health check!
Alternatively, you can use a scheduled CloudWatch Event to routinely initiate the Lambda function.
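Here is a minimal sketch of poisoning the health check, assuming the ELB/ALB health check requests a static file served from a hypothetical `HEALTH_FILE` path. Once the termination notice appears in instance metadata, the file is deleted so subsequent health checks fail.

```shell
#!/usr/bin/env bash
# Hypothetical path: assumes the ELB/ALB health check requests a static file
# (for example /health) served from this location.
HEALTH_FILE=${HEALTH_FILE:-/var/www/html/health}
TERMINATION_URL=http://169.254.169.254/latest/meta-data/spot/termination-time

# Succeeds if the text looks like a termination timestamp, e.g. 2015-01-05T18:02:00Z.
is_termination_notice() {
  printf '%s' "$1" | grep -qE '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:.]+Z$'
}

# Remove the health check file so subsequent health checks fail.
poison_health_check() {
  rm -f "$1"
}

# Poll every 5 seconds; curl -f prints nothing while the URL returns 404.
watch_for_termination() {
  local notice
  while true; do
    notice=$(curl -sf "$TERMINATION_URL")
    if is_termination_notice "$notice"; then
      poison_health_check "$HEALTH_FILE"
      break
    fi
    sleep 5
  done
}
```

This would be started in the background from User Data (`watch_for_termination &`). For it to take effect inside the two-minute window, the health check interval multiplied by the unhealthy threshold must be well under two minutes. For example, a 10-second interval with an unhealthy threshold of 2 marks the instance unhealthy after roughly 20 seconds.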
Challenge 4 – Spot Price Variability
Q: What happens if the Spot market price goes up beyond our bid price?
A: Handle the outage or run on-demand instances in parallel.
This needs to be considered if we are to have any guarantee of service, especially for production
environments.
Deploying diversified Spot Fleets helps greatly here, but there is still a chance that ALL markets in the
fleet could be outbid… Running on-demand instances in parallel is the typical answer.
But surely there is a better way?
Challenge 4 – Spot Price Variability
A (v2.0): Pre-empt the market.
Each Spot Fleet request has an associated CloudWatch metric called “EligibleInstancePoolCount”, which reports how many Spot pools the fleet request could fulfil capacity from.
We can configure a CloudWatch Alarm that triggers when the number of eligible pools drops below a certain threshold, say 2 pools.
Our On-Demand instances running in parallel can now be replaced with an AutoScaling group that typically has no instances running. When the alarm triggers, a Lambda function is invoked to manipulate the AutoScaling group configuration and provision On-Demand instances.
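Below is a sketch of the alarm and of the scaling action the Lambda function would take. The alarm action is assumed to be an SNS topic to which the Lambda function subscribes. The fleet request ID, topic ARN and group name are placeholders, and the On-Demand group's maximum size must be at least the requested desired capacity.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Alarm when fewer than 2 Spot pools remain eligible for the fleet.
# $1: Spot Fleet request ID, $2: SNS topic ARN (both hypothetical)
create_pool_count_alarm() {
  run aws cloudwatch put-metric-alarm \
    --alarm-name "spot-pools-low-$1" \
    --namespace AWS/EC2Spot --metric-name EligibleInstancePoolCount \
    --dimensions "Name=FleetRequestId,Value=$1" \
    --statistic Minimum --period 300 --evaluation-periods 1 \
    --threshold 2 --comparison-operator LessThanThreshold \
    --alarm-actions "$2"
}

# What the Lambda function would do: raise the normally empty On-Demand group.
# $1: Auto Scaling group name, $2: number of On-Demand instances
scale_on_demand() {
  run aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$1" --min-size "$2" --desired-capacity "$2"
}
```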
Challenge 5 – T2 Instances
Q: What happens if my workload runs on T2 Instance types?
A: Use larger instance types.
There are currently no Spot markets for T2 instance types, meaning that workloads may have to run on m3.medium or larger instance types to take advantage of Spot.
Deploying Spot Fleets across a diverse range of pools means that we don’t need to be constrained by a particular instance type. It’s very rare that a workload performs worse with more resources!
Challenge 5 – T2 Instances
Let’s look at an example. A t2.small instance type has an hourly rate of $0.032. This instance type features a single (burstable) vCPU and 2 GiB of memory.
Compare this with an m3.medium instance. This instance type gives you a single (dedicated) vCPU and 3.75 GiB of memory.
More predictable performance and more memory at 40% of the cost!
In all three AZs in the Sydney region, the market price of an m3.medium instance has not exceeded $0.020 over the last 3 months...
… and our average hourly rate in the most expensive AZ still ended up being $0.0128.
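The arithmetic behind the 40% figure is shown below, using only the prices quoted above (USD per hour).

```shell
#!/usr/bin/env bash
# Prices quoted on the slide, in USD per hour.
T2_SMALL_RATE=0.032        # t2.small hourly rate
M3_MEDIUM_SPOT_AVG=0.0128  # m3.medium average, most expensive Sydney AZ
M3_MEDIUM_SPOT_MAX=0.020   # m3.medium highest market price over 3 months

# Print $1 as a percentage of $2, to one decimal place.
pct_of() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b * 100 }'
}

pct_of "$M3_MEDIUM_SPOT_AVG" "$T2_SMALL_RATE"   # prints 40.0
pct_of "$M3_MEDIUM_SPOT_MAX" "$T2_SMALL_RATE"   # prints 62.5
```

Even at the highest market price seen, the m3.medium costs 62.5% of the t2.small rate.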
Challenge 5 – T2 Instances
Now compare a t2.small with an m3.large instance. The m3.large instance type gives you two (dedicated) vCPUs and 7.5 GiB of memory.
We can further diversify our fleet to also utilize m3.large instances in the Spot market, and still make savings over what would have been charged if we were running t2.small instances!
In all three AZs in the Sydney region, the market price of an m3.large instance has not exceeded $0.0318 over the last 3 months...
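To check the m3.large claim, the following compares hourly cost per vCPU and per GiB, using the worst-case m3.large market price rather than an average. The prices are the ones quoted on these slides.

```shell
#!/usr/bin/env bash
# Hourly cost per unit of resource (vCPU or GiB), to four decimal places.
unit_cost() {
  awk -v price="$1" -v units="$2" 'BEGIN { printf "%.4f\n", price / units }'
}

unit_cost 0.032 1      # t2.small per vCPU:        0.0320
unit_cost 0.0318 2     # m3.large (max) per vCPU:  0.0159
unit_cost 0.032 2      # t2.small per GiB:         0.0160
unit_cost 0.0318 7.5   # m3.large (max) per GiB:   0.0042
```

Even at its highest price, the m3.large costs roughly half as much per vCPU and about a quarter as much per GiB.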
Challenge 6 – Automation
Q: How do we automate all of this?
A: Enter the Automation team.
The advantage of using CloudWatch Events and Alarms as triggers for Lambda functions is that the Lambda functions should only need to be implemented in a single account, and can then be invoked by each stack.
That said, the team have spent a LOT of time working through the complexities of building Spot Fleet requests into CloudFormation stacks.
Development teams are still provided with the same baseline CloudFormation templates (utilizing ASGs) that have been provided in the past. However, a new tool has been written by the Automation team that takes these baseline templates and converts them for use with Spot Fleets.
Given the recent launch of CloudFormation StackSets, we’re about to start looking at ways where this
can be further simplified.
The Solution Going Forward: Deploy
1. A deployment plan has been initiated. We start with a standard ASG-based template (ASG, ELB, baked AMI ID, security groups, subnets, etc.).
2. A Lambda function (the conversion tool) is used to convert this ASG template into a skeleton Spot Fleet resource request.
3. A second Lambda function (InstanceTypes & Bids List) is triggered, generating a list of instance types similar to the one provided. It also calculates appropriate bid prices for those instance types.
4. The list is pushed to a third Lambda function (Dynamic Spot Fleet Template), which dynamically creates the Spot Fleet template and uploads it to S3.
5. The application CloudFormation stack is created.
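The slides do not say how the bid prices in step 3 are calculated. One simple policy, shown here purely as a hypothetical sketch and not as the team's method, is to bid up to the hourly rate of the original instance type multiplied by each candidate's weighted capacity. Under that policy the fleet never pays more per unit of capacity than the instance type it replaces.

```shell
#!/usr/bin/env bash
# Hypothetical bid policy: bid = baseline hourly rate x weighted capacity.
# Reads lines of "<instance-type> <weighted-capacity>" on stdin.
# $1: hourly rate of the instance type in the original ASG template.
bids_for() {
  awk -v base="$1" 'NF == 2 { printf "%s %.4f\n", $1, base * $2 }'
}

# Example: candidates similar to a t2.small baseline ($0.032/hour)
#   printf 'm3.medium 1\nm3.large 2\n' | bids_for 0.032
#   m3.medium 0.0320
#   m3.large 0.0640
```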
The Solution Going Forward: Stack
1. The CloudFormation stack provisions an ELB/ALB, and a Spot Fleet request is made.
2. The Spot Fleet request is fulfilled, and the Spot Instances register with their ELB via EC2 User Data.
3. If the market price for a Spot Instance exceeds the bid price, the instance is flagged for termination. The health check on the host is poisoned, and the ELB marks the instance as offline.
4. After two minutes, the Spot Instance is terminated. A CloudWatch Event (1-minute scheduled rule) triggers a Lambda function (TerminateEC2Instance) that ensures unhealthy instances are terminated and deregisters terminated instances from the ELB.
5. A replacement Spot Instance is provisioned and, again, registers itself with the ELB.
6. If the number of pools that the Spot Fleet can fulfil instances from gets low…
7. … a CloudWatch Alarm (EligibleInstancePoolCount) will trigger a Lambda function (ModifyOnDemandCapacity) that manipulates an On-Demand AutoScaling group (the On-Demand Fleet), which will commence provisioning On-Demand EC2 Instances to maintain capacity for the workload.
8. If a Spot Instance has not been able to be provisioned for more than 5 minutes (CloudWatch Alarm: Pending Capacity > 0 for > 5 min), an On-Demand instance is also added.
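The two triggers in steps 4 and 8 could be created roughly as follows. This is a sketch: rule, function and fleet identifiers are hypothetical, and the pending-capacity alarm is assumed to notify an SNS topic. "> 0 for > 5 min" is approximated as the Minimum of `PendingCapacity` over one 5-minute period exceeding zero. The Lambda function would also need a resource-based permission for `events.amazonaws.com`.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Step 4: invoke the cleanup function on a 1-minute schedule.
# $1: rule name, $2: Lambda function ARN (both hypothetical)
create_schedule_rule() {
  run aws events put-rule --name "$1" --schedule-expression "rate(1 minute)"
  run aws events put-targets --rule "$1" --targets "Id=1,Arn=$2"
}

# Step 8: alarm when pending capacity stays above zero for a 5-minute period.
# $1: Spot Fleet request ID, $2: SNS topic ARN (both hypothetical)
create_pending_alarm() {
  run aws cloudwatch put-metric-alarm \
    --alarm-name "spot-pending-$1" \
    --namespace AWS/EC2Spot --metric-name PendingCapacity \
    --dimensions "Name=FleetRequestId,Value=$1" \
    --statistic Minimum --period 300 --evaluation-periods 1 \
    --threshold 0 --comparison-operator GreaterThanThreshold \
    --alarm-actions "$2"
}
```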
Development Recommendations
Build Stateless Applications
If you can’t, persist state outside of the EC2 Instance using services such as DynamoDB, RDS, Aurora, ElastiCache, EFS or S3.
Poison Application Health Checks within the 2-minute warning period
Detect when a Spot Instance is scheduled for termination and cause the ELB/ALB to
think that the workload is out of service. That server will then be removed from the pool
of Healthy servers, allowing your application to gracefully handle a termination event.
Set your bid price appropriately
A bid price that is too low will introduce volatility into your workload and reduce the number of Spot markets that you can draw instances from.
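One way to ground the bid in data, as the Sydney price figures earlier in this deck were, is to look at the Spot price history for an instance type. `describe-spot-price-history` is a real EC2 API; the helper names and example dates are illustrative, and the lookup covers every AZ in the configured region unless `--availability-zone` is added.

```shell
#!/usr/bin/env bash
# Print the maximum of the whitespace-separated prices read from stdin.
max_price() {
  tr -s ' \t' '\n' | awk 'NF {
    if (!seen || $1 + 0 > max + 0) { max = $1; seen = 1 }
  } END { if (seen) print max }'
}

# Spot price history for one instance type since a start time
# (requires AWS credentials). $1: instance type, $2: e.g. 2017-06-01T00:00:00Z
spot_price_history() {
  aws ec2 describe-spot-price-history \
    --instance-types "$1" --product-descriptions "Linux/UNIX" \
    --start-time "$2" \
    --query 'SpotPriceHistory[*].SpotPrice' --output text
}

# Example: spot_price_history m3.medium 2017-06-01T00:00:00Z | max_price
```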
Reference Material
• EC2 Spot Instances - http://aws.amazon.com/ec2/spot/
• Spot Bid Advisor - http://aws.amazon.com/ec2/spot/bid-advisor/
• Getting Started with Spot - http://aws.amazon.com/ec2/spot/getting-started/
• Spot FAQs - http://aws.amazon.com/ec2/spot/faqs/
• Spot Testimonials - http://aws.amazon.com/ec2/spot/testimonials/
• Documentation: Using Spot Instances -
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html
• Documentation: Spot Fleet -
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html
Any Questions?

zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 

Recently uploaded (20)

Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 

AWS Cost Opt Meetup 2 - News Corp - Spot On deep dive

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Jarrod Spiga - Solutions Architect, Amazon Web Services John Matheson – Cloud & Automation Platform Manager, News Corp Australia September 5, 2017 Spot On! Replacing Reserved Instances with Spot
  • 2. Spot 101 Customers can bid on unused AWS capacity. The tradeoff is that if capacity becomes constrained, your Spot Instances may be terminated after a two-minute warning. Workloads running on Spot therefore need to be fault-tolerant, or able to be restarted at a later time.
  • 3. Spot 101 Each combination of instance family, instance size and Availability Zone within a region is a separate Spot market. In each Spot market, the price of compute fluctuates based on supply and demand. You'll never pay more than your bid; you only ever pay the market price. When the market price exceeds your bid, you get two minutes to wrap up.
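The pricing rule on this slide can be modelled in a few lines. This is a minimal sketch. It assumes a hypothetical series of hourly market prices and the bid-based, per-hour model described here. Each hour is charged at that hour's market price (never the bid), and the instance is interrupted at the first hour whose market price exceeds the bid. `spot_charge` is an illustrative helper, not an AWS tool.

```shell
# spot_charge BID PRICE...
# Hypothetical model of the bid-based Spot rule: every hour the instance
# runs is billed at that hour's market price (not the bid), and the
# instance is interrupted at the first hour whose price exceeds the bid.
# Prints "<hours run> <total cost>".
spot_charge() {
  bid=$1
  shift
  printf '%s\n' "$@" | awk -v bid="$bid" '
    $1 + 0 > bid + 0 { exit }   # market price above the bid: interrupted
    { hours++; total += $1 }    # otherwise this hour is billed at market price
    END { printf "%d %.4f\n", hours, total }'
}
```

Take a bid of $0.020 with hourly prices of $0.010, $0.012 and then $0.025. In that case `spot_charge 0.020 0.010 0.012 0.025` reports two hours and $0.0220, and the instance is interrupted in the third hour.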
  • 6. A Review of Spot Fleet Launch many Spot Instances with one call. Choose whether you want instances from the cheapest market, or diversify across markets to reduce the impact of market variability. Weighting allows you to scale based on cores, memory, latency, etc. This diversification option is what we are using to maximise availability!
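A diversified, weighted fleet can be expressed as the JSON config accepted by `aws ec2 request-spot-fleet --spot-fleet-request-config file://fleet.json`. This is a minimal sketch, assuming a single AMI and subnet. A real request would usually also set security groups, a key pair and user data. `fleet_config` is a hypothetical helper, and the ARN, AMI, subnet and weights below are placeholders.

```shell
# fleet_config ROLE_ARN TARGET_CAPACITY BID AMI SUBNET TYPE:WEIGHT...
# Prints a Spot Fleet request config (JSON) for `aws ec2 request-spot-fleet`.
# "diversified" spreads instances across every listed pool; WeightedCapacity
# is how many units of TARGET_CAPACITY one instance of that type counts for.
# With weighting, SpotPrice (the bid) is a price per unit of capacity per hour.
fleet_config() {
  role=$1 target=$2 bid=$3 ami=$4 subnet=$5
  shift 5
  specs="" sep=""
  for pair in "$@"; do
    itype=${pair%%:*}     # text before the colon, e.g. m3.large
    weight=${pair#*:}     # text after the colon, e.g. 2
    specs="${specs}${sep}{\"ImageId\": \"${ami}\", \"InstanceType\": \"${itype}\", \"SubnetId\": \"${subnet}\", \"WeightedCapacity\": ${weight}}"
    sep=", "
  done
  cat <<EOF
{
  "IamFleetRole": "${role}",
  "AllocationStrategy": "diversified",
  "TargetCapacity": ${target},
  "SpotPrice": "${bid}",
  "LaunchSpecifications": [${specs}]
}
EOF
}
```

For example, `fleet_config arn:aws:iam::123456789012:role/spot-fleet-role 4 0.032 ami-12345678 subnet-12345678 m3.medium:1 m3.large:2 > fleet.json` produces such a config. With a target capacity of 4, it can be met by four m3.medium instances, two m3.large instances, or a mix of the two.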
  • 7. Challenge 1 – ELB Registration Q: How do we register a Spot Fleet Instance with an ELB/ALB? A: Use EC2 User Data. instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id); aws elb register-instances-with-load-balancer --load-balancer-name my-loadbalancer --instances $instance_id; (This command applies to a Classic Load Balancer; an ALB uses aws elbv2 register-targets instead.) There is no mechanism to automatically register a Spot Instance provisioned via a Spot Fleet Request with an ELB/ALB, so registration has to be implemented by us to distribute load across the fleet.
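Here is a minimal User Data sketch that handles either a Classic Load Balancer or an ALB target group. It assumes IMDSv1-style metadata access (as used throughout these slides) and an AWS CLI with a default region configured. It also assumes an instance role allowed to call the matching `elasticloadbalancing` register action. `register_instance` is a hypothetical helper name.

```shell
# Instance metadata base URL (IMDSv1 style, as on the slides).
METADATA=http://169.254.169.254/latest/meta-data

# register_instance clb LOAD_BALANCER_NAME   (Classic Load Balancer)
# register_instance alb TARGET_GROUP_ARN     (ALB target group)
register_instance() {
  kind=$1
  target=$2
  # -f makes curl fail on HTTP errors instead of printing an error page.
  instance_id=$(curl -sf "$METADATA/instance-id") || return 1
  case $kind in
    clb) aws elb register-instances-with-load-balancer \
           --load-balancer-name "$target" --instances "$instance_id" ;;
    alb) aws elbv2 register-targets \
           --target-group-arn "$target" --targets "Id=$instance_id" ;;
    *) echo "unknown load balancer kind: $kind" >&2; return 1 ;;
  esac
}
```

In User Data this would run once at boot, for example `register_instance clb my-loadbalancer`.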
  • 8. Challenge 2 – EIP Attachment Q: How do we associate an Elastic IP Address with a Spot Fleet Instance? A: Use EC2 User Data. eip_allocation_id=$(aws ec2 allocate-address --domain vpc --query AllocationId --output text); aws ec2 associate-address --instance-id $instance_id --allocation-id $eip_allocation_id; (with $instance_id read from the instance metadata, as in Challenge 1). This is required for services that need direct connectivity to the Internet, such as NAT hosts and proxy servers. But it's a bit more complicated… An alternative is to enable automatic public IP addressing, but this is a subnet-wide setting. This use case has been raised with the service team and a feature request has been created.
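The slide does not spell out the complication. One likely issue is that allocating a new address at every launch keeps consuming Elastic IPs after the instances that used them are gone. This is a minimal sketch that can instead reuse a pre-allocated address. It assumes IMDSv1 metadata access, an AWS CLI with a default region configured, and an instance role allowed to allocate and associate addresses. Managing a pool of pre-allocated allocation IDs is a hypothetical approach, not a Spot Fleet feature, and `attach_eip` is an illustrative name.

```shell
METADATA=http://169.254.169.254/latest/meta-data

# attach_eip [ALLOCATION_ID]
# Associates an Elastic IP with this instance. A given ALLOCATION_ID
# (a pre-allocated address) is reused; otherwise a new VPC address is
# allocated, which stays allocated until someone releases it.
attach_eip() {
  alloc_id=${1:-}
  instance_id=$(curl -sf "$METADATA/instance-id") || return 1
  if [ -z "$alloc_id" ]; then
    alloc_id=$(aws ec2 allocate-address --domain vpc \
                 --query AllocationId --output text) || return 1
  fi
  # --allow-reassociation lets a replacement instance take over an address
  # that is still associated with the instance it replaces.
  aws ec2 associate-address --instance-id "$instance_id" \
    --allocation-id "$alloc_id" --allow-reassociation
}
```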
  • 9. Challenge 3 – De-registration from an ELB Q: How do we de-register a terminated Spot Instance from an ELB/ALB? A: Run a script that polls the instance metadata for a termination time. while true; do if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time | grep -q '.*T.*Z'; then instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id); aws elb deregister-instances-from-load-balancer --load-balancer-name my-loadbalancer --instances $instance_id; break; else sleep 5; fi; done We need a mechanism to ensure that ELB traffic is not routed to an instance that's about to be terminated. This is a well-documented process.
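Here is a slightly more robust version of the polling loop, written as a function that could be started in the background from User Data. It is a sketch assuming IMDSv1 metadata access and a Classic Load Balancer, and `watch_termination` is a hypothetical name. Before a notice is posted, the termination-time endpoint returns HTTP 404, which `curl -f` turns into empty output. Once a notice appears, the function de-registers the instance and returns rather than looping forever.

```shell
METADATA=http://169.254.169.254/latest/meta-data

# watch_termination LOAD_BALANCER_NAME [POLL_SECONDS]
# Polls for a Spot termination notice (a timestamp such as
# 2017-09-05T10:00:00Z); once it appears, de-registers this instance
# from the Classic Load Balancer and returns.
watch_termination() {
  lb=$1
  interval=${2:-5}
  while :; do
    if curl -sf "$METADATA/spot/termination-time" \
         | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]{8}Z$'; then
      instance_id=$(curl -sf "$METADATA/instance-id") || return 1
      aws elb deregister-instances-from-load-balancer \
        --load-balancer-name "$lb" --instances "$instance_id"
      return
    fi
    sleep "$interval"
  done
}
```

Started as `watch_termination my-loadbalancer &`, it polls every five seconds by default.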
  • 10. Challenge 3 – De-registration from an ELB A (v2.0): Use a Lambda function to de-register the instance upon termination. The previous solution does not cater for instances that are terminated manually. It also cannot be tested. Finally, we can "think bigger" and build a solution that applies to ALL instances, On-Demand or Spot. When an EC2 instance is terminated, an EC2 Instance State-change Notification event is raised in CloudWatch Events. A CloudWatch Events rule can execute a Lambda function whenever an EC2 instance changes state. This function first checks that the instance was terminated, then de-registers it. Q: But how do we ensure that requests don't get directed to an instance that's about to be terminated? Use the existing health check functionality within the ELB/ALB: when the termination notice is posted to our Spot Instance, poison the health check! Alternatively, you can use a scheduled CloudWatch Event to invoke the Lambda function routinely.
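Two pieces of this design can be sketched in shell. The first is the event pattern for a CloudWatch Events rule that matches terminated instances. The second is a health-check poisoner that deletes the file the load balancer polls once a termination notice appears. This is a minimal sketch. The helper names, the health-check file path, and the assumption that the health check requests a plain file served by the web server are all hypothetical.

```shell
METADATA=http://169.254.169.254/latest/meta-data

# Event pattern for a CloudWatch Events rule that fires when any EC2
# instance (On-Demand or Spot) enters the "terminated" state.
termination_event_pattern() {
  cat <<'EOF'
{"source": ["aws.ec2"], "detail-type": ["EC2 Instance State-change Notification"], "detail": {"state": ["terminated"]}}
EOF
}

# poison_health_check HEALTH_FILE
# If a Spot termination notice has been posted, delete the file that the
# load balancer's health check requests, so the check starts failing and
# the ELB/ALB takes the instance out of service. Returns 1 if no notice.
poison_health_check() {
  health_file=$1
  if curl -sf "$METADATA/spot/termination-time" | grep -q 'T.*Z'; then
    rm -f "$health_file"
    return 0
  fi
  return 1
}
```

The rule can be created with `aws events put-rule --name ec2-terminated --event-pattern "$(termination_event_pattern)"`. `aws events put-targets --rule ec2-terminated --targets Id=1,Arn=<lambda-arn>` then points it at the de-registration function, which also needs `aws lambda add-permission` so that CloudWatch Events may invoke it. On the instance, something like `until poison_health_check /var/www/html/health; do sleep 5; done &` would poison the check. Whether the load balancer reacts within two minutes depends on its health-check interval and unhealthy threshold.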
  • 11. Challenge 4 – Spot Price Variability Q: What happens if the Spot market price rises above our bid price? A: Handle the outage, or run On-Demand instances in parallel. This needs to be considered if we are to have any guarantee of service, especially in production environments. Deploying diversified Spot Fleets helps greatly here, but there is still a chance that ALL markets in the fleet could be outbid… Running On-Demand instances in parallel is the typical answer. But surely there is a better way?
  • 12. Challenge 4 – Spot Price Variability A (v2.0): Pre-empt the market. Each Spot Fleet request has an associated CloudWatch metric called "EligibleInstancePoolCount", which reports how many Spot pools the fleet request could fulfil capacity from. We can configure a CloudWatch Alarm that triggers when the number of eligible pools drops below a certain threshold, say 2 pools. The On-Demand instances running in parallel can now be replaced with an AutoScaling group that typically has no instances running. When the alarm triggers, a Lambda function is invoked to modify the AutoScaling group configuration and provision On-Demand instances.
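The alarm itself can be created with a single CLI call. This is a minimal sketch, assuming the fleet request ID is known. It also assumes the alarm notifies an SNS topic that in turn invokes the scaling Lambda function. The 300-second period, the alarm name and `create_pool_alarm` are assumptions. Sending the OK transition to the same topic lets the function scale back down when pools recover.

```shell
# create_pool_alarm FLEET_REQUEST_ID ACTION_ARN [MIN_POOLS]
# Creates a CloudWatch alarm that fires when the Spot Fleet request can
# draw on fewer than MIN_POOLS (default 2) eligible Spot pools.
create_pool_alarm() {
  fleet_id=$1
  action_arn=$2
  min_pools=${3:-2}
  aws cloudwatch put-metric-alarm \
    --alarm-name "spot-pools-low-$fleet_id" \
    --namespace AWS/EC2Spot \
    --metric-name EligibleInstancePoolCount \
    --dimensions "Name=FleetRequestId,Value=$fleet_id" \
    --statistic Minimum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold "$min_pools" \
    --comparison-operator LessThanThreshold \
    --alarm-actions "$action_arn" \
    --ok-actions "$action_arn"
}
```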
  • 13. Challenge 5 – T2 Instances Q: What happens if my workload runs on T2 instance types? A: Use larger instance types. There are currently no Spot markets for T2 instance types, meaning that workloads may have to run on m3.medium or larger instance types to take advantage of Spot. Deploying Spot Fleets across a diverse range of pools means that we don't need to be constrained to a particular instance type. It's very rare for a workload to perform worse with more resources!
  • 14. Challenge 5 – T2 Instances Let's look at an example. A t2.small instance has an On-Demand hourly rate of $0.032. This instance type features a single (burstable) vCPU and 2GiB of memory. Compare this with an m3.medium instance, which gives you a single (dedicated) vCPU and 3.75GiB of memory: more predictable performance and more memory at 40% of the cost! In all three AZs in the Sydney region, the market price of an m3.medium instance has not exceeded $0.020 over the last 3 months... … and our average hourly rate in the most expensive AZ still ended up being $0.0128.
  • 15. Challenge 5 – T2 Instances Now compare a t2.small with an m3.large instance. The m3.large instance type gives you two (dedicated) vCPUs and 7.5GiB of memory. We can further diversify our fleet to also use m3.large instances in the Spot market, and still save compared with what we would have been charged for t2.small instances! In all three AZs in the Sydney region, the market price of an m3.large instance has not exceeded $0.0318 over the last 3 months...
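The savings claims on these two slides reduce to ratios against the t2.small On-Demand rate of $0.032 per hour, using only the prices quoted above:

```latex
\begin{aligned}
\frac{\$0.0128}{\$0.032} &= 0.40 && \text{m3.medium average Spot rate vs. t2.small (a 60\% saving)}\\
\frac{\$0.020}{\$0.032} &= 0.625 && \text{m3.medium peak Spot price vs. t2.small}\\
\frac{\$0.0318}{\$0.032} &\approx 0.994 && \text{m3.large peak Spot price vs. t2.small}
\end{aligned}
```

Even at its three-month peak, an m3.large Spot Instance (two vCPUs) costs slightly less per hour than one t2.small.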
  • 16. Challenge 6 – Automation Q: How do we automate all of this? A: Enter the Automation team. The advantage of using CloudWatch Events and Alarms as triggers for Lambda functions is that the Lambda functions can be deployed once, in a single account, and invoked by each stack. That said, the team has spent a LOT of time working through the complexities of building Spot Fleet requests into CloudFormation stacks. Development teams are still provided with the same baseline CloudFormation templates (using ASGs) as in the past. However, the Automation team has written a new tool that takes these baseline templates and converts them for use with Spot Fleets. Given the recent launch of CloudFormation StackSets, we're about to start looking at ways to simplify this further.
  • 17. The Solution Going Forward: Deploy 1. A deployment plan is initiated. We start with a standard ASG-based template (ASG, ELB, baked AMI ID, security groups, subnets, etc.).
  • 18. The Solution Going Forward: Deploy 2. A Lambda function is used to convert this ASG template into a skeleton Spot Fleet resource request. [Diagram: Lambda Function (Conversion tool)]
  • 19. The Solution Going Forward: Deploy 3. A Lambda function is triggered that generates a list of instance types similar to the one provided. It also calculates appropriate bid prices for those instance types. [Diagram: Lambda Function (Conversion tool), Lambda Function (InstanceTypes & Bids List)]
  • 20. The Solution Going Forward: Deploy 4. The list is passed to a third Lambda function, which dynamically creates the Spot Fleet template and uploads it to S3. [Diagram: Lambda Function (Conversion tool), Lambda Function (InstanceTypes & Bids List), Lambda Function (Dynamic Spot Fleet Template)]
  • 21. The Solution Going Forward: Deploy 5. The application CloudFormation stack is created. [Diagram: Lambda Function (Conversion tool), Lambda Function (InstanceTypes & Bids List), Lambda Function (Dynamic Spot Fleet Template), CloudFormation (Application Stack)]
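The slides do not show how the "InstanceTypes & Bids List" function chooses types or bids. Purely as an illustration, here is a hypothetical rule. It keeps any instance type with at least the vCPUs and memory of the original. It bids the original type's On-Demand rate for each, so a Spot replacement never costs more per instance than what it replaces. The table format, the rule and `similar_types` are assumptions, not News Corp's tool.

```shell
# similar_types BASE_VCPU BASE_MEM_GIB BID < TABLE
# TABLE lines: "<instance-type> <vcpu> <memory-gib>" (hypothetical data).
# Prints "<instance-type> <bid>" for every type with at least the base
# vCPU count and memory, using the same per-instance bid for each.
similar_types() {
  awk -v vcpu="$1" -v mem="$2" -v bid="$3" '
    NF == 3 && $2 + 0 >= vcpu + 0 && $3 + 0 >= mem + 0 { print $1, bid }'
}
```

For the t2.small example (1 vCPU, 2GiB, $0.032), consider a table listing t2.micro, m3.medium and m3.large. The function returns m3.medium and m3.large, each with a bid of $0.032.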
  • 22. The Solution Going Forward: Stack 1. The CloudFormation stack provisions an ELB/ALB, and a Spot Fleet Request is made. [Diagram: Spot Fleet, Elastic Load Balancer]
  • 23. The Solution Going Forward: Stack 2. The Spot Fleet Request is fulfilled, and the Spot Instances register with their ELB via EC2 User Data. [Diagram: Spot Fleet, Elastic Load Balancer]
  • 24. The Solution Going Forward: Stack 3. If the market price for a Spot Instance exceeds the bid price, the instance is flagged for termination. The health check on the host is poisoned, and the ELB marks the instance as out of service. [Diagram: Spot Fleet, Elastic Load Balancer]
  • 25. The Solution Going Forward: Stack 4. After two minutes, the Spot Instance is terminated. A scheduled CloudWatch Event triggers a Lambda function that ensures unhealthy instances are terminated and de-registers terminated instances from the ELB. [Diagram: Spot Fleet, Elastic Load Balancer, CloudWatch Event (1 minute Scheduled Rule), Lambda Function (TerminateEC2Instance)]
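The scheduled clean-up can be approximated with three CLI calls. This is a shell sketch of the behaviour described on the slide, not the TerminateEC2Instance Lambda function itself. It assumes a Classic Load Balancer, and `sweep_unhealthy` is a hypothetical name. One caution: instances that are still booting also report OutOfService, so a real implementation would need a grace period before terminating anything.

```shell
# sweep_unhealthy LOAD_BALANCER_NAME
# Terminates instances that the Classic Load Balancer reports as
# OutOfService, then de-registers them from the load balancer.
sweep_unhealthy() {
  lb=$1
  ids=$(aws elb describe-instance-health --load-balancer-name "$lb" \
          --query "InstanceStates[?State=='OutOfService'].InstanceId" \
          --output text) || return 1
  # Nothing to do if no instance is out of service.
  [ -n "$ids" ] && [ "$ids" != "None" ] || return 0
  # $ids is intentionally unquoted so each ID becomes its own argument.
  aws ec2 terminate-instances --instance-ids $ids
  aws elb deregister-instances-from-load-balancer \
    --load-balancer-name "$lb" --instances $ids
}
```

A rule such as `aws events put-rule --name spot-sweeper --schedule-expression "rate(1 minute)"` provides the one-minute schedule.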
  • 26. The Solution Going Forward: Stack 5. A replacement Spot Instance is provisioned. Again, this Spot Instance registers itself with the ELB. [Diagram: Spot Fleet, Elastic Load Balancer, CloudWatch Event (1 minute Scheduled Rule), Lambda Function (TerminateEC2Instance)]
  • 27. The Solution Going Forward: Stack 6. If the number of pools that the Spot Fleet can fulfil instances from gets low… [Diagram: Spot Fleet, Elastic Load Balancer, CloudWatch Event (1 minute Scheduled Rule), Lambda Function (TerminateEC2Instance)]
  • 28. The Solution Going Forward: Stack 7. … a CloudWatch Alarm triggers a Lambda function that modifies an On-Demand AutoScaling group, which starts provisioning On-Demand EC2 instances to maintain capacity for the workload. [Diagram: Spot Fleet, On-Demand Fleet, Elastic Load Balancer, CloudWatch Alarm (EligibleInstancePoolCount), CloudWatch Event (1 minute Scheduled Rule), Lambda Function (TerminateEC2Instance), Lambda Function (ModifyOnDemandCapacity)]
  • 29. The Solution Going Forward: Stack 8. If a Spot Instance cannot be provisioned for more than 5 minutes, an On-Demand instance is also added. [Diagram: Spot Fleet, On-Demand Fleet, Elastic Load Balancer, CloudWatch Alarm (EligibleInstancePoolCount), CloudWatch Alarm (Pending Capacity > 0 for > 5 min), CloudWatch Event (1 minute Scheduled Rule), Lambda Function (TerminateEC2Instance), Lambda Function (ModifyOnDemandCapacity)]
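The decision inside a function like ModifyOnDemandCapacity can be small. This is a hypothetical sketch. It assumes the alarm state (`ALARM`, `OK` or `INSUFFICIENT_DATA`, as carried in the `NewStateValue` field of a CloudWatch alarm's SNS notification) has already been extracted. It also assumes the On-Demand AutoScaling group has a minimum size of zero. `modify_on_demand_capacity` is an illustrative name.

```shell
# modify_on_demand_capacity ASG_NAME ALARM_STATE DESIRED
# ALARM raises the On-Demand AutoScaling group to DESIRED instances;
# OK scales it back to zero; any other state leaves capacity unchanged.
modify_on_demand_capacity() {
  asg=$1
  state=$2
  desired=$3
  case $state in
    ALARM) count=$desired ;;
    OK) count=0 ;;
    *) return 0 ;;  # e.g. INSUFFICIENT_DATA
  esac
  aws autoscaling set-desired-capacity \
    --auto-scaling-group-name "$asg" --desired-capacity "$count"
}
```

Two alarms (EligibleInstancePoolCount and Pending Capacity) drive the same group. A real function should therefore confirm that neither is still in ALARM before scaling back to zero.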
  • 30. Development Recommendations Build stateless applications. If you can't, persist state outside of the EC2 instance using services such as DynamoDB, RDS, Aurora, ElastiCache, EFS or S3. Poison application health checks within the 2-minute warning period. Detect when a Spot Instance is scheduled for termination and cause the ELB/ALB to treat the workload as out of service. That server will then be removed from the pool of healthy servers, allowing your application to handle the termination event gracefully. Set your bid price appropriately. A bid price that is too low will introduce volatility into your workload and reduce the number of Spot markets that you can draw instances from.
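For "set your bid price appropriately", the Spot price history is a reasonable starting point, since EC2 keeps 90 days of it. Figures such as "has not exceeded $0.020 over the last 3 months" could be obtained this way. This is a minimal sketch, assuming the "Linux/UNIX" product description; VPC markets may instead report under "Linux/UNIX (Amazon VPC)". `max_spot_price` is a hypothetical helper.

```shell
# max_spot_price INSTANCE_TYPE AVAILABILITY_ZONE START_TIME
# Prints the highest Spot price recorded for one market since START_TIME
# (ISO 8601, e.g. 2017-06-05T00:00:00Z); exits non-zero if none found.
max_spot_price() {
  aws ec2 describe-spot-price-history \
    --instance-types "$1" \
    --availability-zone "$2" \
    --product-descriptions "Linux/UNIX" \
    --start-time "$3" \
    --query 'SpotPriceHistory[].SpotPrice' \
    --output text |
  tr -s ' \t' '\n\n' |
  awk 'NF { p = $1 + 0; if (!seen || p > max) { max = p; seen = 1 } }
       END { if (seen) printf "%.4f\n", max; else exit 1 }'
}
```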
  • 31. Reference Material • EC2 Spot Instances - http://aws.amazon.com/ec2/spot/ • Spot Bid Advisor - http://aws.amazon.com/ec2/spot/bid-advisor/ • Getting Started with Spot - http://aws.amazon.com/ec2/spot/getting-started/ • Spot FAQs - http://aws.amazon.com/ec2/spot/faqs/ • Spot Testimonials - http://aws.amazon.com/ec2/spot/testimonials/ • Documentation: Using Spot Instances - http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html • Documentation: Spot Fleet - http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html Any Questions?