Dynamically scaling a political news and activism hub (up to 5x the traffic in 20 minutes)

On any given day, DailyKos can see traffic peaks of up to five times our base traffic, sometimes requiring us to scale out to double our backend app server capacity within a 10-20 minute window (sometimes at unpredictable times). In this talk, Susan Potter discusses DailyKos's use of autoscaling in EC2, from the essential components to some gotchas learned along the way.

Dynamically scaling a news & activism hub
(scaling out up to 5x the write-traffic in 20 minutes)
Susan Potter
April 26, 2019
Outline
Intro
Problem Outline
Before & After
AWS EC2 AutoScaling: An overview
Related Side Notes
Questions?
Intro
whoami
$ finger $(whoami)
Name: Susan Potter
Last login Sun Jan 18 18:30 1996 (GMT) on tty1
- 23 years writing software
- Server-side/backend/infrastructure engineering, mostly
- Likes: functional programming (e.g. Haskell)
Today:
- Build new backend services in Haskell
- I babysit a bloated Rails webapp
Previously: trading systems, SaaS products, CI/CD
In the cloud
Figure 1: Programming cloud infrastructures from the soy bean fields
Problem Outline

Recommended

Continuous Deployment with Amazon Web Services
Continuous Deployment with Amazon Web ServicesContinuous Deployment with Amazon Web Services
Continuous Deployment with Amazon Web ServicesJulien SIMON
 
Building CI-CD Pipelines for Serverless Applications
Building CI-CD Pipelines for Serverless ApplicationsBuilding CI-CD Pipelines for Serverless Applications
Building CI-CD Pipelines for Serverless ApplicationsAmazon Web Services
 
DevOps on AWS: Advanced Techniques for Amazon EC2 Deployments on AWS
DevOps on AWS: Advanced Techniques for Amazon EC2 Deployments on AWSDevOps on AWS: Advanced Techniques for Amazon EC2 Deployments on AWS
DevOps on AWS: Advanced Techniques for Amazon EC2 Deployments on AWSAmazon Web Services
 
Infrastructure as code with Amazon Web Services
Infrastructure as code with Amazon Web ServicesInfrastructure as code with Amazon Web Services
Infrastructure as code with Amazon Web ServicesJulien SIMON
 
Building Serverless APIs (January 2017)
Building Serverless APIs (January 2017)Building Serverless APIs (January 2017)
Building Serverless APIs (January 2017)Julien SIMON
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersAmazon Web Services
 
Deploy, Scale and Manage your Application with AWS Elastic Beanstalk
Deploy, Scale and Manage your Application with AWS Elastic BeanstalkDeploy, Scale and Manage your Application with AWS Elastic Beanstalk
Deploy, Scale and Manage your Application with AWS Elastic BeanstalkAmazon Web Services
 

More Related Content

What's hot

Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersAmazon Web Services
 
Running Open Source Platforms on AWS (November 2016)
Running Open Source Platforms on AWS (November 2016)Running Open Source Platforms on AWS (November 2016)
Running Open Source Platforms on AWS (November 2016)Julien SIMON
 
(ARC402) Deployment Automation: From Developers' Keyboards to End Users' Scre...
(ARC402) Deployment Automation: From Developers' Keyboards to End Users' Scre...(ARC402) Deployment Automation: From Developers' Keyboards to End Users' Scre...
(ARC402) Deployment Automation: From Developers' Keyboards to End Users' Scre...Amazon Web Services
 
Building Serverless APIs on AWS
Building Serverless APIs on AWSBuilding Serverless APIs on AWS
Building Serverless APIs on AWSJulien SIMON
 
Amazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAmazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAcquia
 
An introduction to serverless architectures (February 2017)
An introduction to serverless architectures (February 2017)An introduction to serverless architectures (February 2017)
An introduction to serverless architectures (February 2017)Julien SIMON
 
Building your own slack bot on the AWS stack
Building your own slack bot on the AWS stackBuilding your own slack bot on the AWS stack
Building your own slack bot on the AWS stackTorontoNodeJS
 
Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)Julien SIMON
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million Users Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million Users Amazon Web Services
 
Deployment and Management on AWS:
 A Deep Dive on Options and Tools
Deployment and Management on AWS:
 A Deep Dive on Options and ToolsDeployment and Management on AWS:
 A Deep Dive on Options and Tools
Deployment and Management on AWS:
 A Deep Dive on Options and ToolsDanilo Poccia
 
Deep Dive into Amazon ElastiCache Architecture and Design Patterns (DAT307) |...
Deep Dive into Amazon ElastiCache Architecture and Design Patterns (DAT307) |...Deep Dive into Amazon ElastiCache Architecture and Design Patterns (DAT307) |...
Deep Dive into Amazon ElastiCache Architecture and Design Patterns (DAT307) |...Amazon Web Services
 
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Amazon Web Services
 
개발자를 위한 Amazon Lightsail Deep-Dive
개발자를 위한 Amazon Lightsail Deep-Dive개발자를 위한 Amazon Lightsail Deep-Dive
개발자를 위한 Amazon Lightsail Deep-Dive창훈 정
 
(ARC311) Extreme Availability for Mission-Critical Applications | AWS re:Inve...
(ARC311) Extreme Availability for Mission-Critical Applications | AWS re:Inve...(ARC311) Extreme Availability for Mission-Critical Applications | AWS re:Inve...
(ARC311) Extreme Availability for Mission-Critical Applications | AWS re:Inve...Amazon Web Services
 
Configuration Management with AWS OpsWorks for Chef Automate
Configuration Management with AWS OpsWorks for Chef AutomateConfiguration Management with AWS OpsWorks for Chef Automate
Configuration Management with AWS OpsWorks for Chef AutomateAmazon Web Services
 
Deep Learning on AWS (November 2016)
Deep Learning on AWS (November 2016)Deep Learning on AWS (November 2016)
Deep Learning on AWS (November 2016)Julien SIMON
 
AWS Webcast - Backup & Restore for ElastiCache/Redis: Getting Started & Best ...
AWS Webcast - Backup & Restore for ElastiCache/Redis: Getting Started & Best ...AWS Webcast - Backup & Restore for ElastiCache/Redis: Getting Started & Best ...
AWS Webcast - Backup & Restore for ElastiCache/Redis: Getting Started & Best ...Amazon Web Services
 
AWS CloudFormation Best Practices
AWS CloudFormation Best PracticesAWS CloudFormation Best Practices
AWS CloudFormation Best PracticesAmazon Web Services
 

What's hot (20)

Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million Users
 
Running Open Source Platforms on AWS (November 2016)
Running Open Source Platforms on AWS (November 2016)Running Open Source Platforms on AWS (November 2016)
Running Open Source Platforms on AWS (November 2016)
 
(ARC402) Deployment Automation: From Developers' Keyboards to End Users' Scre...
(ARC402) Deployment Automation: From Developers' Keyboards to End Users' Scre...(ARC402) Deployment Automation: From Developers' Keyboards to End Users' Scre...
(ARC402) Deployment Automation: From Developers' Keyboards to End Users' Scre...
 
Building Serverless APIs on AWS
Building Serverless APIs on AWSBuilding Serverless APIs on AWS
Building Serverless APIs on AWS
 
Amazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAmazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and Hosting
 
An introduction to serverless architectures (February 2017)
An introduction to serverless architectures (February 2017)An introduction to serverless architectures (February 2017)
An introduction to serverless architectures (February 2017)
 
Building your own slack bot on the AWS stack
Building your own slack bot on the AWS stackBuilding your own slack bot on the AWS stack
Building your own slack bot on the AWS stack
 
SOA on Rails
SOA on RailsSOA on Rails
SOA on Rails
 
Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)Deep Dive: Amazon Relational Database Service (March 2017)
Deep Dive: Amazon Relational Database Service (March 2017)
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million Users Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million Users
 
Deployment and Management on AWS:
 A Deep Dive on Options and Tools
Deployment and Management on AWS:
 A Deep Dive on Options and ToolsDeployment and Management on AWS:
 A Deep Dive on Options and Tools
Deployment and Management on AWS:
 A Deep Dive on Options and Tools
 
CloudFormation Best Practices
CloudFormation Best PracticesCloudFormation Best Practices
CloudFormation Best Practices
 
Deep Dive into Amazon ElastiCache Architecture and Design Patterns (DAT307) |...
Deep Dive into Amazon ElastiCache Architecture and Design Patterns (DAT307) |...Deep Dive into Amazon ElastiCache Architecture and Design Patterns (DAT307) |...
Deep Dive into Amazon ElastiCache Architecture and Design Patterns (DAT307) |...
 
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS r...
 
개발자를 위한 Amazon Lightsail Deep-Dive
개발자를 위한 Amazon Lightsail Deep-Dive개발자를 위한 Amazon Lightsail Deep-Dive
개발자를 위한 Amazon Lightsail Deep-Dive
 
(ARC311) Extreme Availability for Mission-Critical Applications | AWS re:Inve...
(ARC311) Extreme Availability for Mission-Critical Applications | AWS re:Inve...(ARC311) Extreme Availability for Mission-Critical Applications | AWS re:Inve...
(ARC311) Extreme Availability for Mission-Critical Applications | AWS re:Inve...
 
Configuration Management with AWS OpsWorks for Chef Automate
Configuration Management with AWS OpsWorks for Chef AutomateConfiguration Management with AWS OpsWorks for Chef Automate
Configuration Management with AWS OpsWorks for Chef Automate
 
Deep Learning on AWS (November 2016)
Deep Learning on AWS (November 2016)Deep Learning on AWS (November 2016)
Deep Learning on AWS (November 2016)
 
AWS Webcast - Backup & Restore for ElastiCache/Redis: Getting Started & Best ...
AWS Webcast - Backup & Restore for ElastiCache/Redis: Getting Started & Best ...AWS Webcast - Backup & Restore for ElastiCache/Redis: Getting Started & Best ...
AWS Webcast - Backup & Restore for ElastiCache/Redis: Getting Started & Best ...
 
AWS CloudFormation Best Practices
AWS CloudFormation Best PracticesAWS CloudFormation Best Practices
AWS CloudFormation Best Practices
 

Similar to Dynamically scaling a political news and activism hub (up to 5x the traffic in 20 minutes)

Advanced technic for OS upgrading in 3 minutes
Advanced technic for OS upgrading in 3 minutesAdvanced technic for OS upgrading in 3 minutes
Advanced technic for OS upgrading in 3 minutesHiroshi SHIBATA
 
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Amazon Web Services
 
Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017Amazon Web Services
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesAmazon Web Services
 
Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017Amazon Web Services
 
Serverless on AWS : Understanding the hard parts at Froscon 2019
Serverless on AWS : Understanding the hard parts at Froscon 2019Serverless on AWS : Understanding the hard parts at Froscon 2019
Serverless on AWS : Understanding the hard parts at Froscon 2019Vadym Kazulkin
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Amazon Web Services
 
Scale, baby, scale!
Scale, baby, scale!Scale, baby, scale!
Scale, baby, scale!Julien SIMON
 
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017Amazon Web Services
 
Scaling up to Your First 10 Million Users
Scaling up to Your First 10 Million UsersScaling up to Your First 10 Million Users
Scaling up to Your First 10 Million UsersAmazon Web Services
 
Intro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesIntro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesAmazon Web Services
 
ATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing OperationsATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing OperationsAmazon Web Services
 
Creating scalable solutions with aws
Creating scalable solutions with awsCreating scalable solutions with aws
Creating scalable solutions with awsondrejbalas
 
Infrastructure as Data with Ansible for easier Continuous Delivery
Infrastructure as Data with Ansible for easier Continuous DeliveryInfrastructure as Data with Ansible for easier Continuous Delivery
Infrastructure as Data with Ansible for easier Continuous DeliveryCarlo Bonamico
 
Intro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesIntro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesAmazon Web Services
 
Relational Database Services on AWS - Bill Baldwin
Relational Database Services on AWS - Bill BaldwinRelational Database Services on AWS - Bill Baldwin
Relational Database Services on AWS - Bill BaldwinAmazon Web Services
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersAmazon Web Services
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersAmazon Web Services
 
T1 – Architecting highly available applications on aws
T1 – Architecting highly available applications on awsT1 – Architecting highly available applications on aws
T1 – Architecting highly available applications on awsAmazon Web Services
 
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013Amazon Web Services
 

Similar to Dynamically scaling a political news and activism hub (up to 5x the traffic in 20 minutes) (20)

Advanced technic for OS upgrading in 3 minutes
Advanced technic for OS upgrading in 3 minutesAdvanced technic for OS upgrading in 3 minutes
Advanced technic for OS upgrading in 3 minutes
 
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
Build Your Web Analytics with node.js, Amazon DynamoDB and Amazon EMR (BDT203...
 
Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017
 
Intro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute ServicesIntro to AWS: EC2 & Compute Services
Intro to AWS: EC2 & Compute Services
 
Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017Building Serverless Web Applications - DevDay Los Angeles 2017
Building Serverless Web Applications - DevDay Los Angeles 2017
 
Serverless on AWS : Understanding the hard parts at Froscon 2019
Serverless on AWS : Understanding the hard parts at Froscon 2019Serverless on AWS : Understanding the hard parts at Froscon 2019
Serverless on AWS : Understanding the hard parts at Froscon 2019
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 
Scale, baby, scale!
Scale, baby, scale!Scale, baby, scale!
Scale, baby, scale!
 
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017
 
Scaling up to Your First 10 Million Users
Scaling up to Your First 10 Million UsersScaling up to Your First 10 Million Users
Scaling up to Your First 10 Million Users
 
Intro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesIntro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute Services
 
ATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing OperationsATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing Operations
 
Creating scalable solutions with aws
Creating scalable solutions with awsCreating scalable solutions with aws
Creating scalable solutions with aws
 
Infrastructure as Data with Ansible for easier Continuous Delivery
Infrastructure as Data with Ansible for easier Continuous DeliveryInfrastructure as Data with Ansible for easier Continuous Delivery
Infrastructure as Data with Ansible for easier Continuous Delivery
 
Intro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesIntro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute Services
 
Relational Database Services on AWS - Bill Baldwin
Relational Database Services on AWS - Bill BaldwinRelational Database Services on AWS - Bill Baldwin
Relational Database Services on AWS - Bill Baldwin
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
T1 – Architecting highly available applications on aws
T1 – Architecting highly available applications on awsT1 – Architecting highly available applications on aws
T1 – Architecting highly available applications on aws
 
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
 

More from Susan Potter

Thinking in Properties
Thinking in PropertiesThinking in Properties
Thinking in PropertiesSusan Potter
 
Champaign-Urbana Javascript Meetup Talk (Jan 2020)
Champaign-Urbana Javascript Meetup Talk (Jan 2020)Champaign-Urbana Javascript Meetup Talk (Jan 2020)
Champaign-Urbana Javascript Meetup Talk (Jan 2020)Susan Potter
 
From Zero to Haskell: Lessons Learned
From Zero to Haskell: Lessons LearnedFrom Zero to Haskell: Lessons Learned
From Zero to Haskell: Lessons LearnedSusan Potter
 
Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)Susan Potter
 
From Zero to Application Delivery with NixOS
From Zero to Application Delivery with NixOSFrom Zero to Application Delivery with NixOS
From Zero to Application Delivery with NixOSSusan Potter
 
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016Susan Potter
 
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014Susan Potter
 
Ricon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak PipeRicon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak PipeSusan Potter
 
Functional Algebra: Monoids Applied
Functional Algebra: Monoids AppliedFunctional Algebra: Monoids Applied
Functional Algebra: Monoids AppliedSusan Potter
 
Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresSusan Potter
 
Distributed Developer Workflows using Git
Distributed Developer Workflows using GitDistributed Developer Workflows using Git
Distributed Developer Workflows using GitSusan Potter
 
Link Walking with Riak
Link Walking with RiakLink Walking with Riak
Link Walking with RiakSusan Potter
 
Writing Bullet-Proof Javascript: By Using CoffeeScript
Writing Bullet-Proof Javascript: By Using CoffeeScriptWriting Bullet-Proof Javascript: By Using CoffeeScript
Writing Bullet-Proof Javascript: By Using CoffeeScriptSusan Potter
 
Deploying distributed software services to the cloud without breaking a sweat
Deploying distributed software services to the cloud without breaking a sweatDeploying distributed software services to the cloud without breaking a sweat
Deploying distributed software services to the cloud without breaking a sweatSusan Potter
 
Designing for Concurrency
Designing for ConcurrencyDesigning for Concurrency
Designing for ConcurrencySusan Potter
 

More from Susan Potter (17)

Thinking in Properties
Thinking in PropertiesThinking in Properties
Thinking in Properties
 
Champaign-Urbana Javascript Meetup Talk (Jan 2020)
Champaign-Urbana Javascript Meetup Talk (Jan 2020)Champaign-Urbana Javascript Meetup Talk (Jan 2020)
Champaign-Urbana Javascript Meetup Talk (Jan 2020)
 
From Zero to Haskell: Lessons Learned
From Zero to Haskell: Lessons LearnedFrom Zero to Haskell: Lessons Learned
From Zero to Haskell: Lessons Learned
 
Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)
 
From Zero to Application Delivery with NixOS
From Zero to Application Delivery with NixOSFrom Zero to Application Delivery with NixOS
From Zero to Application Delivery with NixOS
 
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016
 
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014
 
Ricon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak PipeRicon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak Pipe
 
Functional Algebra: Monoids Applied
Functional Algebra: Monoids AppliedFunctional Algebra: Monoids Applied
Functional Algebra: Monoids Applied
 
Why Haskell
Why HaskellWhy Haskell
Why Haskell
 
Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For Datastores
 
Distributed Developer Workflows using Git
Distributed Developer Workflows using GitDistributed Developer Workflows using Git
Distributed Developer Workflows using Git
 
Link Walking with Riak
Link Walking with RiakLink Walking with Riak
Link Walking with Riak
 
Writing Bullet-Proof Javascript: By Using CoffeeScript
Writing Bullet-Proof Javascript: By Using CoffeeScriptWriting Bullet-Proof Javascript: By Using CoffeeScript
Writing Bullet-Proof Javascript: By Using CoffeeScript
 
Twitter4R OAuth
Twitter4R OAuthTwitter4R OAuth
Twitter4R OAuth
 
Deploying distributed software services to the cloud without breaking a sweat
Deploying distributed software services to the cloud without breaking a sweatDeploying distributed software services to the cloud without breaking a sweat
Deploying distributed software services to the cloud without breaking a sweat
 
Designing for Concurrency
Designing for ConcurrencyDesigning for Concurrency
Designing for Concurrency
 

Recently uploaded

Implementing Docker Containers with Windows Server 2019
Implementing Docker Containers with Windows Server 2019Implementing Docker Containers with Windows Server 2019
Implementing Docker Containers with Windows Server 2019VICTOR MAESTRE RAMIREZ
 
The Age of AI: Elevating Experiences & Delivering Customer Value!
The Age of AI: Elevating Experiences & Delivering Customer Value!The Age of AI: Elevating Experiences & Delivering Customer Value!
The Age of AI: Elevating Experiences & Delivering Customer Value!ISPMAIndia
 
"Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A...
"Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A..."Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A...
"Discovery and Delivery through Product IntelliGenAI framework" by Ramkumar A...ISPMAIndia
 
killingcamp longest common subsequence.pdf
killingcamp longest common subsequence.pdfkillingcamp longest common subsequence.pdf
killingcamp longest common subsequence.pdfssuser82c38d
 
Agile & Scrum, Certified Scrum Master! Crash Course
Agile & Scrum,  Certified Scrum Master! Crash CourseAgile & Scrum,  Certified Scrum Master! Crash Course
Agile & Scrum, Certified Scrum Master! Crash CourseRohan Chandane
 
The Game-Changer_ How Software Development Outsource Can Catapult Your Growth...
The Game-Changer_ How Software Development Outsource Can Catapult Your Growth...The Game-Changer_ How Software Development Outsource Can Catapult Your Growth...
The Game-Changer_ How Software Development Outsource Can Catapult Your Growth...emili denli
 
Automation for Bonterra Impact Management (fka Apricot)
Automation for Bonterra Impact Management (fka Apricot)Automation for Bonterra Impact Management (fka Apricot)
Automation for Bonterra Impact Management (fka Apricot)Jeffrey Haguewood
 
P1 Inspection Types in Municity 5 Smartsheet
P1 Inspection Types in Municity 5 SmartsheetP1 Inspection Types in Municity 5 Smartsheet
P1 Inspection Types in Municity 5 SmartsheetMatthewTHawley
 
Welcome to AltTask - the nexus where innovation converges with empowerment!
Welcome to AltTask - the nexus where innovation converges with empowerment!Welcome to AltTask - the nexus where innovation converges with empowerment!
Welcome to AltTask - the nexus where innovation converges with empowerment!alttaskcom
 
Role of DevOps in SaaS product Development.pdf.pptx
Role of DevOps in SaaS product Development.pdf.pptxRole of DevOps in SaaS product Development.pdf.pptx
Role of DevOps in SaaS product Development.pdf.pptxMindInventory
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
"Taking an idea to a Product in Health diagnostics" by Dr. Geetha Manjunath, ...
"Taking an idea to a Product in Health diagnostics" by Dr. Geetha Manjunath, ..."Taking an idea to a Product in Health diagnostics" by Dr. Geetha Manjunath, ...
"Taking an idea to a Product in Health diagnostics" by Dr. Geetha Manjunath, ...ISPMAIndia
 
Dynamically scaling a political news and activism hub (up to 5x the traffic in 20 minutes)

  • 1. Dynamically scaling a news & activism hub (scaling out up to 5x the write-traffic in 20 minutes) Susan Potter April 26, 2019
  • 2. Outline
      • Intro
      • Problem Outline
      • Before & After
      • AWS EC2 AutoScaling: An overview
      • Related Side Notes
      • Questions?
  • 4. whoami
      $ finger $(whoami)
      Name: Susan Potter
      Last login Sun Jan 18 18:30 1996 (GMT) on tty1
      - 23 years writing software
      - Server-side/backend/infrastructure engineering, mostly
      - Likes: functional programming (e.g. Haskell)
      Today:
      - Build new backend services in Haskell
      - I babysit a bloated Rails webapp
      Previously: trading systems, SaaS products, CI/CD
  • 5. In the cloud
      Figure 1: Programming cloud infrastructures from the soy bean fields
  • 8. Legacy/History
      • Deliver news, discussions, & campaigns to over two million users/day
      • Traffic varies significantly during the day
      • Heavy reads (Varnish saves our site every day)
      • Writes go to content publishing backends, which are slow and expensive (Perl, Ruby)
      • When news breaks or the newsletter is sent, our active users want to log in, comment, recommend, write their own story, etc., all of which are WRITES.
  • 13. Related Problems
      • Deployment method (Capistrano) has horrific failure modes during scale out/in events
      • Chef converged less and less, and the work to maintain it increased
      • Moving to dynamic autoscaling didn’t fix these directly, but our solution considered how it could help.
  • 16. Before: When I started (Sept 2016)
      • Only one problematic service was in a static autoscaling group ("static" meaning no scaling policies, manually modified by a human :gasp:)
      • Services used atrophying AMIs that may not converge due to external APT source dependencies changing in significant ways :(
      • Often AMIs didn’t successfully bootstrap within 15 minutes
  • 19. Today: All services in dynamic autoscaling groups
      • Frontend caching/routing layer
      • Both content publishing backends
      • Internal systems, e.g. logging, metrics, etc.
  • 22. AWS EC2 AutoScaling: An overview
  • 24. High-level Primitives
      • AutoScaling Group
      • Launch Configuration
      • Scaling Policies
      • Lifecycle Hooks
  • 28. AutoScaling Lifecycle
      Figure 2: Transition between instance states in the Amazon EC2 AutoScaling lifecycle
  • 29. AutoScaling Group: Properties
      • Min, max, desired
      • Launch configuration (exactly one pointer)
      • Health check type (EC2/ELB)
      • AZs
      • Timeouts
      • Scaling policies (zero or more)
  • 30. AutoScaling Group: Create via CLI
      declare -r rid="ResourceId=${asg_name}"
      declare -r rtype="ResourceType=auto-scaling-group"
      aws autoscaling create-auto-scaling-group \
        --auto-scaling-group-name "${asg_name}" \
        --launch-configuration-name "${lc_name}" \
        --min-size ${min_size:-1} \
        --max-size ${max_size:-9} \
        --default-cooldown ${cooldown:-120} \
        --availability-zones ${availability_zones} \
        --health-check-type "${health_check_type:-ELB}" \
        --health-check-grace-period "${grace_period:-90}" \
        --vpc-zone-identifier "${subnet_ids}" \
        --tags "${rid},${rtype},Key=LifeCycle,Value=alive,PropagateAtLaunch=false"
  • 31. AutoScaling Group: Enable metrics collection
      # After creation
      aws autoscaling enable-metrics-collection \
        --auto-scaling-group-name "${asg_name}" \
        --granularity "1Minute"
  • 32. AutoScaling Group: Querying instance IDs in ASG
      aws autoscaling describe-auto-scaling-groups \
        --output text \
        --region "${region}" \
        --auto-scaling-group-names "${asg_name}" \
        --query 'AutoScalingGroups[].Instances[].InstanceId'
  • 33. Launch Configuration: Properties
      • AMI
      • Instance type
      • User-data
      • Instance tags
      • Security groups
      • Block device mappings
      • IAM instance profiles
      Note: immutable after creation
  • 34. Launch Configuration: Create via CLI
      declare -r bdev="DeviceName=/dev/sda1"
      declare -r vtype="VolumeType=gp2"
      declare -r term="DeleteOnTermination=true"
      aws autoscaling create-launch-configuration \
        --launch-configuration-name "${lc_name}" \
        --image-id "${image_id}" \
        --iam-instance-profile "${lc_name}-profile" \
        --security-groups ${security_groups} \
        --instance-type ${instance} \
        --block-device-mappings "${bdev},Ebs={${term},${vtype},VolumeSize=${disk_size}}"
  • 35. Scaling Policies: Properties
      • Policy name
      • Metric type
      • Adjustment type
      • Scaling adjustment
  • 39. Scaling Policies: Create via CLI
      aws autoscaling put-scaling-policy \
        --auto-scaling-group-name "${asg_name}" \
        --policy-name "${scaling_policy_name}" \
        --adjustment-type ChangeInCapacity \
        --scaling-adjustment 1
  • 40. Scaling Policies: Attach Metric Alarm
      aws cloudwatch put-metric-alarm \
        --alarm-name Step-Scaling-AlarmHigh-AddCapacity \
        --metric-name CPUUtilization \
        --namespace AWS/EC2 \
        --statistic Average \
        --period 120 \
        --evaluation-periods 2 \
        --threshold 60 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --dimensions "Name=AutoScalingGroupName,Value=${asg_name}" \
        --alarm-actions "${policy_arn}"
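The alarm on slide 40 needs a `${policy_arn}`, which `put-scaling-policy` returns as `PolicyARN`. A minimal sketch of wiring the two calls together follows; the function names `create_scale_out_policy` and `wire_scale_out`, and the derived policy and alarm names, are hypothetical and not taken from the talk.

```shell
#!/usr/bin/env bash

# Create a +1 instance scale-out policy and print its ARN.
# (Function name is hypothetical; the CLI flags mirror slide 39.)
create_scale_out_policy() {
  local asg_name="$1" policy_name="$2"
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "${asg_name}" \
    --policy-name "${policy_name}" \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment 1 \
    --query 'PolicyARN' --output text
}

# Create the policy, then attach a CPU alarm that triggers it.
# Thresholds and periods are the values shown on slide 40.
wire_scale_out() {
  local asg_name="$1" policy_arn
  policy_arn="$(create_scale_out_policy "${asg_name}" "${asg_name}-scale-out")" || return 1
  aws cloudwatch put-metric-alarm \
    --alarm-name "${asg_name}-cpu-high" \
    --metric-name CPUUtilization \
    --namespace AWS/EC2 \
    --statistic Average \
    --period 120 \
    --evaluation-periods 2 \
    --threshold 60 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --dimensions "Name=AutoScalingGroupName,Value=${asg_name}" \
    --alarm-actions "${policy_arn}"
}
```

Calling `wire_scale_out "${asg_name}"` would create both resources; a matching scale-in pair would use a negative `--scaling-adjustment` and a lower-bound alarm.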
  • 41. Custom Metrics: Report metric data
      aws cloudwatch put-metric-data \
        --metric-name custom-metric-name \
        --namespace MyOrg/Custom \
        --unit Count \
        --value ${value} \
        --storage-resolution 1 \
        --dimensions "AutoScalingGroupName=${asg_name}"
  • 42. Lifecycle Hooks: Properties
      We don’t use these, but they are for adding hooks that provision software on newly launched instances and perform similar actions.
  • 45. EC2 Instance Bootstrapping
      • Chef converge bootstrapping took ~15 minutes
      • Improved bootstrapping by an order of magnitude with fully baked AMIs
      • Now we fully bake AMIs for each config and app change (5 minutes, one time per release per environment, a constant factor, using NixOS)
      Fully baking AMIs also gives us system reproducibility that convergent configuration systems like Chef couldn’t give us.
  • 48. Right-Size Instance Types per Service
      • We used to use whatever instance type was set before because $REASONS
      • Now we inspect each service’s resource usage in production in peak, typical, and overnight resting states to know how to size a service’s cluster.
      • We recommend this practice once you are on ASGs; otherwise you are dropping $$$ in AWS’s lap and potentially hurting your product’s UX
  • 51. Find Leading Indicator Metric for Dynamic Scale Out/In
      • Every service behaves differently under load
      • We initially scaled dynamically using policies based purely on CPU (a start, but not good enough for us)
      • Now we report custom metrics to AWS CloudWatch that are leading indicators that our cluster needs to scale out or in.
      This leads to more predictable performance on the site, even under traffic spikes.
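As a rough illustration of reporting a leading indicator, the sketch below publishes the share of busy application workers as a custom CloudWatch metric. The talk does not say which indicators DailyKos reports, so the indicator itself, the function names `busy_worker_pct` and `report_busy_workers`, and the metric name `busy-worker-pct` are all assumptions; only the `put-metric-data` flags come from slide 41.

```shell
#!/usr/bin/env bash

# Hypothetical leading indicator: percentage of app workers currently busy.
# Uses integer arithmetic; returns 0 when no workers are configured.
busy_worker_pct() {
  local busy="$1" total="$2"
  if [ "${total}" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( busy * 100 / total ))
}

# Publish the indicator against the ASG dimension so an alarm
# (and therefore a scaling policy) can be attached to it.
report_busy_workers() {
  local asg_name="$1" busy="$2" total="$3"
  aws cloudwatch put-metric-data \
    --metric-name busy-worker-pct \
    --namespace MyOrg/Custom \
    --unit Percent \
    --value "$(busy_worker_pct "${busy}" "${total}")" \
    --dimensions "AutoScalingGroupName=${asg_name}"
}
```

A worker-saturation style metric tends to rise before CPU does when requests block on I/O, which is one reason a service might prefer it over CPU alone; how the busy and total counts are collected depends on the app server.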
  • 54. Fail-Safe Semantics for Deploy
      • AMI artifacts built and tested
      • AMIs for each service uploaded and registered with AWS EC2
      • Brand new ASG + LC created referring to new AMI for release
      • Scaling policies from current/live ASG copied over to new ASG
      • Copy over min, max, and desired capacities from current to new
      • Wait for all desired instances to report app-level healthy
      • Add new ASG to ALB alongside current/old ASG
      • Remove current/old ASG from ALB
      • Set min=desired=0 in old ASG
      • Clean up stale ASG (not the one just replaced, but the one before it)
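The cutover half of the deploy steps (copy capacities, wait for health, swap ASGs on the ALB, scale the old ASG to zero) could be sketched as below. This is an illustration under assumptions, not the deploy script from the talk: it assumes the new ASG and its scaling policies already exist, that the ALB is addressed through a target group ARN, and `wait_for_app_healthy` is a hypothetical placeholder for a real app-level health probe.

```shell
#!/usr/bin/env bash

# Hypothetical placeholder: replace with a real probe that blocks until
# every desired instance in the ASG reports app-level healthy.
wait_for_app_healthy() {
  echo "TODO: poll app-level health for $1" >&2
}

# cutover OLD_ASG NEW_ASG TARGET_GROUP_ARN
# Runs in a subshell so `set -e` aborts the cutover at the first failing
# step and leaves the old ASG serving traffic.
cutover() (
  set -euo pipefail
  old_asg="$1" new_asg="$2" tg_arn="$3"

  # Copy min, max and desired capacity from the live ASG to the new one.
  caps="$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "${old_asg}" \
    --query 'AutoScalingGroups[0].[MinSize,MaxSize,DesiredCapacity]' \
    --output text)"
  read -r min max desired <<<"${caps}"
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "${new_asg}" \
    --min-size "${min}" --max-size "${max}" --desired-capacity "${desired}"

  wait_for_app_healthy "${new_asg}"

  # Add the new ASG to the ALB before removing the old one.
  aws autoscaling attach-load-balancer-target-groups \
    --auto-scaling-group-name "${new_asg}" --target-group-arns "${tg_arn}"
  aws autoscaling detach-load-balancer-target-groups \
    --auto-scaling-group-name "${old_asg}" --target-group-arns "${tg_arn}"

  # Keep the old ASG around at zero capacity so rollback stays cheap.
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "${old_asg}" --min-size 0 --desired-capacity 0
)
```

Attaching before detaching means there is never a moment with no healthy backends behind the ALB, and leaving the old ASG defined at zero capacity is what makes a scripted rollback quick.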
  • 64. Other stuff
      • DONE Script rollback (~1 minute to previous version)
      • TODO Implement canary deploy capability
      • TODO Check error rates and/or latencies haven’t increased before removing old ASG from ALB
      • REMINDER your max capacity should be determined by your backend runtime dependencies (it’s transitive)
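The REMINDER about max capacity can be made concrete with a little arithmetic. The sketch below assumes the limiting backend dependency is a database connection limit; that choice, the function name `max_instances`, and every number are hypothetical, since the talk does not name the dependency or its limits.

```shell
#!/usr/bin/env bash

# max_instances DB_MAX_CONNECTIONS RESERVED WORKERS_PER_INSTANCE POOL_PER_WORKER
# Largest ASG max size that cannot exhaust the database's connection limit.
max_instances() {
  local db_max="$1" reserved="$2" workers="$3" pool="$4"
  local per_instance=$(( workers * pool ))
  if [ "${per_instance}" -le 0 ]; then
    echo "workers and pool size must be positive" >&2
    return 1
  fi
  echo $(( (db_max - reserved) / per_instance ))
}

# 500 connections, 20 reserved for admin and migrations,
# 8 workers per instance with 5 connections each: (500 - 20) / 40
max_instances 500 20 8 5   # prints 12
```

The same reasoning applies transitively: if that database in turn depends on something with a lower ceiling, the lowest ceiling in the chain sets the ASG's `--max-size`.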
  • 69. LinkedIn /in/susanpotter, GitHub @mbbx6spp, Keybase @mbbx6spp, Twitter @SusanPotter