2. Agenda
◦ Basics of Spot instances & microservices
◦ Why use Spot instances?
◦ How MetroGuild uses microservices & Spot instances
to deliver high availability & cost savings
◦ Q&A
3. The bill comes due
◦ Are you running production workloads?
◦ Are you using Spot instances?
◦ Infrastructure needs to scale with the business…
frugally!
◦ Your job is not just to make it work
◦ Check your org/BU costs
7. Spots are great!...Right?
◦ Interruptions are a reality!
◦ But… 95% of instances weren’t interrupted in 3 months
◦ We get a 2-minute warning before an instance is terminated
◦ Pricing was/is dynamic
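
The 2-minute warning mentioned above can be read from inside the instance. The following is a minimal sketch, not MetroGuild's implementation: it assumes the instance metadata service is reachable without a session token (IMDSv1 style), and the function names `fetch_notice` and `seconds_until_action` are hypothetical. The metadata path returns HTTP 404 until an interruption notice has been issued.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

# EC2 instance metadata path for Spot interruption notices.
# Assumption: metadata is reachable without an IMDSv2 session token.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def fetch_notice(url=SPOT_ACTION_URL, timeout=1.0):
    """Return the raw notice body, or None if no interruption is scheduled."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no notice issued yet
            return None
        raise


def seconds_until_action(body, now):
    """Parse a notice and return (action, seconds remaining before it happens)."""
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return notice["action"], (when - now).total_seconds()
```

A small loop that calls `fetch_notice()` every few seconds and, on a non-None result, drains connections or simply sends a notification would be one way to use the warning window.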
8. HMMM… MICROSERVICES ON CONTAINERS
Is my workload ready for Spot?
◦ STATELESS
◦ FAULT TOLERANT
◦ FLEXIBLE AZ + INSTANCE
◦ LOOSELY COUPLED
9. Spot Pricing/Bidding
◦ Remember, Spot is spare capacity from the on-demand pool
◦ If on-demand usage increases, you may lose your Spot instances
◦ Bidding was the norm earlier
◦ Predictable pricing after Nov 2017
14. We are a sales
accelerator for large
enterprises
What is MetroLeads?
Sourcing
Lead Management
Voice
CRM
Email and SMS
Sig management
Calendaring
Field Sales
Vertical centric
3rd party apps
MetroLeads Cocktail for Sales and Marketing
API
15.
Analytics & Campaigns
Voice platform
Fast response times
Uptime
Quick updates
Experimentation
Security & Compliance
High throughput & scaled capacity
Low latency
Geo distribution
High availability
Fast control plane
A/B Testing & Environments
Isolation, certificate maintenance, etc
Business vs Engineering Goals
Metroleads
CRM
Insights
Reports
DATA CAPTURE
API, Email, SMS, Calls, Web entries
Mobile App
iOS and Android
Web App
Expo App
USERS
16. ◦ Microservices architecture
◦ Containers (Docker) & Orchestration (Puppet)
◦ Highly available data sources – Cassandra & Elasticsearch
Clusters
◦ Data (API) delivery - ReactJS/React Native
◦ Data Security – Isolated environments and secure access
◦ Open Source software is preferred (Self hosted)
Key Tenets
17. ◦ Data Growth & Customer Growth
◦ Fluctuations of data (periodic spikes)
◦ Cost per customer was too high
Why did we even move to Spot?
19. ◦ We got outbid too often – Cost!
◦ No warnings – Data Loss!
◦ Code was not resilient to failure or to load caused by loss of
capacity – Broken windows!
◦ Lack of choice, i.e. we could not request classes of instances:
we were locked to one type and spent time deciding which one (we
even wrote a prediction algorithm)!
Initial Spot Hurdles – Circa ’13
20. ◦ Ease of deployment
◦ Auto registration with ALB/ELB – Target groups
◦ Moved to a real messaging queue (RabbitMQ) to
throttle load for non-SLA services
◦ Gracefully handled failures (not too graceful though)
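
The idea of using a queue to throttle load can be sketched in a few lines. This is an in-process stand-in using Python's standard library, not RabbitMQ itself and not the deck's actual code: producers enqueue as fast as they like, while a fixed number of workers pull messages at their own pace. The name `consume` and the `workers` parameter are hypothetical. With a real broker, a consumer prefetch limit (for example `basic_qos(prefetch_count=...)` in the pika client) plays a similar role.

```python
import queue
import threading


def consume(messages, handler, workers=2):
    """Process all messages with at most `workers` handlers running at once."""
    q = queue.Queue()
    for msg in messages:
        q.put(msg)  # producers can enqueue at any rate

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                msg = q.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            out = handler(msg)
            with lock:  # protect the shared results list
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the backlog sits in the queue rather than in the service, losing a Spot instance slows the drain rate instead of dropping requests, provided unprocessed messages are redelivered by the broker.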
Advent of ECS/ECR a la mode Docker
22. ◦ Using init-data, instance registers with cluster
◦ Installs docker on boot
◦ ECS cluster in daemon/replica mode deploys app
◦ Target groups route ALB traffic to docker containers on Spot
instances
◦ Data is routed via message queue to non-internet-facing
services
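
The first step of the recap, an instance registering itself with the cluster at boot, might look like the sketch below. It assumes an AMI where the ECS agent reads `/etc/ecs/ecs.config` (as the ECS-optimized AMIs do); installing Docker and the agent on other images is AMI-specific and is left out. The helper names and the cluster name `demo-cluster` are hypothetical.

```python
import base64
import re


def ecs_user_data(cluster_name):
    """Render a boot script that points the ECS agent at a cluster."""
    # Restrict to letters, digits, underscores, hyphens to keep the shell line safe.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", cluster_name):
        raise ValueError("unexpected characters in cluster name")
    lines = [
        "#!/bin/bash",
        # The ECS agent reads its cluster assignment from this file.
        f"echo ECS_CLUSTER={cluster_name} >> /etc/ecs/ecs.config",
    ]
    return "\n".join(lines) + "\n"


def encode_user_data(script):
    """Base64-encode the script, as launch specifications expect."""
    return base64.b64encode(script.encode()).decode()


if __name__ == "__main__":
    print(ecs_user_data("demo-cluster"))
```

Once the instance has joined, the service scheduler (daemon or replica) places containers on it and the target group starts routing ALB traffic to them.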
Deployment Notes - Recap
23. ECS Clusters – Backbone of Spot
Slow deployments, manual scaling, over-provisioned system
24. Benefits so far…
◦ Spot fleet allows us to request a class of instances and
maintains capacity - Overprovisioning
◦ Stable pricing (max we pay is on-demand cost)
◦ Transparency in cost savings & no cost spikes –
Predictable bills
◦ Early warnings - We just use them to notify us
◦ 90% of workloads on Spot
◦ Scale up/down capacity within minutes
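
Requesting "a class of instances" can be illustrated by building a Spot Fleet request that spreads capacity over several instance types and subnets (one subnet per AZ). This is a sketch under assumptions, not the configuration used in the talk: the function name and every ID, ARN, and instance type in the usage example are placeholders. The dictionary keys follow the `SpotFleetRequestConfig` structure accepted by the EC2 `RequestSpotFleet` API. No `SpotPrice` is set, which, as far as I know, leaves the maximum price at the on-demand price.

```python
from itertools import product


def spot_fleet_config(fleet_role_arn, image_id, instance_types,
                      subnet_ids, target_capacity):
    """Build a fleet request with one launch spec per (type, subnet) pair."""
    specs = [
        {"ImageId": image_id, "InstanceType": itype, "SubnetId": subnet}
        for itype, subnet in product(instance_types, subnet_ids)
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,
        # Spread instances across all listed pools instead of the cheapest one.
        "AllocationStrategy": "diversified",
        "LaunchSpecifications": specs,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    config = spot_fleet_config(
        "arn:aws:iam::123456789012:role/example-fleet-role",
        "ami-00000000",
        ["m4.large", "c4.large"],
        ["subnet-aaaa", "subnet-bbbb"],
        target_capacity=4,
    )
    # With boto3 installed and credentials configured, this could be sent as:
    #   boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)
    print(len(config["LaunchSpecifications"]))
```

More pools give the fleet more places to find replacement capacity when one instance type in one AZ is reclaimed.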
26. Key Takeaways
◦ Spot Fleets should be part of your architecture
◦ Spot Fleets can save a lot of money, but only if you change
your architecture
◦ Spot can handle all types of workloads, not just data
processing
◦ Trust AWS to innovate
27.
Thanks!
We are hiring!
You can find us at:
◦ www.metroguild.com
◦ careers@metroguild.com
◦ https://www.linkedin.com/in/arifamirani/