Dynamically scaling a news & activism hub
(scaling out up to 5x the write-traffic in 20 minutes)
Susan Potter
April 26, 2019
Outline
Intro
Problem Outline
Before & After
AWS EC2 AutoScaling: An overview
Related Side Notes
Questions?
Intro
whoami
$ finger $(whoami)
Name: Susan Potter
Last login Sun Jan 18 18:30 1996 (GMT) on tty1
- 23 years writing software
- Server-side/backend/infrastructure engineering, mostly
- Likes: functional programming (e.g. Haskell)
Today:
- Build new backend services in Haskell
- I babysit a bloated Rails webapp
Previously: trading systems, SaaS products, CI/CD
In the cloud
Figure 1: Programming cloud infrastructures from the soy bean fields
Problem Outline
Traffic
Legacy/History
• Deliver news, discussions, & campaigns to over two million users/day
• Traffic varies significantly during the day
• Heavy reads (Varnish saves our site every day)
• Writes go to content publishing backends, which are slow and expensive (Perl, Ruby)
• When news breaks or the newsletter goes out, our active users want to log in, comment,
recommend, write their own stories, etc., all of which are WRITES.
Related Problems
• Deployment method (Capistrano) has horrific failure modes during scale out/in
events
• Chef converged less and less often, and the work to maintain it increased
• Moving to dynamic autoscaling didn’t fix these directly, but our solution
considered how it could help.
Making me …
Before & After
Before: When I started (Sept 2016)
• Only one problematic service was in a "static" autoscaling group (no scaling policies,
capacity manually modified by a human :gasp:)
• Services used atrophying AMIs that may not converge due to external APT source
dependencies changing in significant ways :(
• Often AMIs didn’t successfully bootstrap within 15 minutes
Today: All services in dynamic autoscaling groups
• Frontend caching/routing layer
• Both content publishing backends
• Internal systems, e.g. logging, metrics, etc.
AWS EC2 AutoScaling: An overview
High-level Primitives
• AutoScaling Group
• Launch Configuration
• Scaling Policies
• Lifecycle Hooks
AutoScaling Lifecycle
Figure 2: Transition between instance states in the Amazon EC2 AutoScaling lifecycle
AutoScaling Group: Properties
• Min, max, desired
• Launch configuration (exactly one pointer)
• Health check type (EC2/ELB)
• AZs
• Timeouts
• Scaling policies (zero or more)
AutoScaling Group: Create via CLI
declare -r rid="ResourceId=${asg_name}"
declare -r rtype="ResourceType=auto-scaling-group"
# availability_zones is left unquoted so a space-separated list splits into words
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name "${asg_name}" \
  --launch-configuration-name "${lc_name}" \
  --min-size "${min_size:-1}" \
  --max-size "${max_size:-9}" \
  --default-cooldown "${cooldown:-120}" \
  --availability-zones ${availability_zones} \
  --health-check-type "${health_check_type:-ELB}" \
  --health-check-grace-period "${grace_period:-90}" \
  --vpc-zone-identifier "${subnet_ids}" \
  --tags \
    "${rid},${rtype},Key=LifeCycle,Value=alive,PropagateAtLaunch=false"
Autoscaling Group: Enable metrics collection
# After creation
aws autoscaling enable-metrics-collection \
  --auto-scaling-group-name "${asg_name}" \
  --granularity "1Minute"
Autoscaling Group: Querying instance IDs in ASG
aws autoscaling describe-auto-scaling-groups \
  --output text \
  --region "${region}" \
  --auto-scaling-group-names "${asg_name}" \
  --query 'AutoScalingGroups[].Instances[].InstanceId'
Launch Configuration: Properties
• AMI
• Instance type
• User-data
• Instance tags
• Security groups
• Block device mappings
• IAM instance profiles
Note: immutable after creation
Launch Configuration: Create via CLI
declare -r bdev="DeviceName=/dev/sda1"
declare -r vtype="VolumeType=gp2"
declare -r term="DeleteOnTermination=true"
# security_groups is left unquoted so a space-separated list splits into words
aws autoscaling create-launch-configuration \
  --launch-configuration-name "${lc_name}" \
  --image-id "${image_id}" \
  --iam-instance-profile "${lc_name}-profile" \
  --security-groups ${security_groups} \
  --instance-type "${instance}" \
  --block-device-mappings \
    "${bdev},Ebs={${term},${vtype},VolumeSize=${disk_size}}"
Scaling Policies: Properties
• Policy name
• Metric type
• Adjustment type
• Scaling adjustment
Scaling Policies: Create via CLI
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name "${asg_name}" \
  --policy-name "${scaling_policy_name}" \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment 1
Scaling Policies: Attach Metric Alarm
aws cloudwatch put-metric-alarm \
  --alarm-name Step-Scaling-AlarmHigh-AddCapacity \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 120 \
  --evaluation-periods 2 \
  --threshold 60 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --dimensions "Name=AutoScalingGroupName,Value=${asg_name}" \
  --alarm-actions "${policy_arn}"
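Side note on where `${policy_arn}` can come from: `put-scaling-policy` prints the policy's ARN, so one option is to capture it at creation time. A minimal sketch, assuming bash and configured AWS credentials; the function name `create_scale_policy` is made up for illustration and is not from the talk.

```shell
# Create a simple scaling policy and print only its ARN.
# Arguments: ASG name, policy name, adjustment (e.g. 1 adds one instance).
create_scale_policy() {
  local asg_name="$1" policy_name="$2" adjustment="$3"
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "${asg_name}" \
    --policy-name "${policy_name}" \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment "${adjustment}" \
    --query 'PolicyARN' \
    --output text
}

# Usage (placeholder names):
#   policy_arn="$(create_scale_policy "${asg_name}" "cpu-high-add-one" 1)"
#   ...then pass "${policy_arn}" to --alarm-actions.
```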
Custom Metrics: Report metric data
aws cloudwatch put-metric-data \
  --metric-name custom-metric-name \
  --namespace MyOrg/Custom \
  --unit Count \
  --value "${value}" \
  --storage-resolution 1 \
  --dimensions "AutoScalingGroupName=${asg_name}"
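One way to feed a metric like this continuously is a small reporting loop on each instance. This is only a sketch: the talk does not say which metric was reported or how it was sampled, so `report_metric`, `report_loop`, the `BusyWorkers` name, and the sampling command are all hypothetical.

```shell
# Publish one data point, tagged with the ASG name so an alarm can
# aggregate values across every instance in the group.
report_metric() {
  local asg_name="$1" metric_name="$2" value="$3"
  aws cloudwatch put-metric-data \
    --metric-name "${metric_name}" \
    --namespace MyOrg/Custom \
    --unit Count \
    --value "${value}" \
    --storage-resolution 1 \
    --dimensions "AutoScalingGroupName=${asg_name}"
}

# Every <interval> seconds, run the sampling command given after the first
# three arguments and report its output. Runs until killed.
report_loop() {
  local asg_name="$1" metric_name="$2" interval="$3"
  shift 3
  while true; do
    # A failed report is ignored so one bad call does not stop the loop.
    report_metric "${asg_name}" "${metric_name}" "$("$@")" || true
    sleep "${interval}"
  done
}

# Usage (count_busy_workers is a placeholder for your own sampler):
#   report_loop "${asg_name}" BusyWorkers 10 count_busy_workers
```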
Lifecycle Hooks: Properties
We don’t use these, but they exist for hooking in actions such as provisioning software
on newly launched instances.
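For reference, registering and completing a launch hook via the CLI might look roughly like this. It is a sketch, not something from our setup (we don't use hooks); the function names and the 300-second default are illustrative assumptions.

```shell
# Register a launch hook: new instances wait in Pending:Wait until the
# hook is completed or the heartbeat timeout expires (then ABANDON).
add_launch_hook() {
  local asg_name="$1" hook_name="$2" timeout_secs="${3:-300}"
  aws autoscaling put-lifecycle-hook \
    --auto-scaling-group-name "${asg_name}" \
    --lifecycle-hook-name "${hook_name}" \
    --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
    --heartbeat-timeout "${timeout_secs}" \
    --default-result ABANDON
}

# Signal that provisioning finished so the instance can go InService.
finish_launch_hook() {
  local asg_name="$1" hook_name="$2" instance_id="$3"
  aws autoscaling complete-lifecycle-action \
    --auto-scaling-group-name "${asg_name}" \
    --lifecycle-hook-name "${hook_name}" \
    --instance-id "${instance_id}" \
    --lifecycle-action-result CONTINUE
}
```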
Related Side Notes
EC2 Instance Bootstrapping
• Chef converge bootstrapping took ~15 minutes
• Improved bootstrapping by an order of magnitude with fully baked AMIs
• Now we fully bake AMIs for each config and app change (5 mins, one time per release
per environment, a constant factor, using NixOS)
Fully baking AMIs also gives us system reproducibility that convergent configuration
systems like Chef couldn’t give us.
Right-Size Instance Types per Service
• We used to use whatever instance type was set before because $REASONS
• Now we inspect each service’s resource usage in production in peak, typical, and
overnight resting states to know how to size a service’s cluster.
• Recommend this practice post-ASG or you are dropping $$$ in AWS’s lap and
potentially hurting your product’s UX
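A starting point for inspecting usage is to pull hourly CPU figures per ASG from CloudWatch and compare peak, typical, and overnight hours. A sketch under assumptions: `asg_cpu_stats` is a made-up helper, `date -d` needs GNU date, and CPU is only one of the resources worth checking.

```shell
# Print hourly average and maximum CPU for an ASG over the last N hours
# (default 24), one line per hour: timestamp, average, maximum.
asg_cpu_stats() {
  local asg_name="$1" hours="${2:-24}" start end
  end="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  start="$(date -u -d "${hours} hours ago" +%Y-%m-%dT%H:%M:%SZ)"  # GNU date
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=AutoScalingGroupName,Value=${asg_name}" \
    --start-time "${start}" \
    --end-time "${end}" \
    --period 3600 \
    --statistics Average Maximum \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Average, Maximum]' \
    --output text
}

# Usage: asg_cpu_stats "${asg_name}" 48
```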
Find Leading Indicator Metric for Dynamic Scale Out/In
• Every service behaves differently under load
• We initially started dynamically scaling using policies based purely on CPU (a start
but not good enough for us)
• Now we report custom metrics to AWS CloudWatch that are leading indicators that
our cluster needs to scale out or in.
Leads to more predictable performance on the site even under traffic spikes.
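Wiring a custom metric to a scaling policy can follow the same alarm pattern as the CPU example, just in the custom namespace. A sketch only: `metric_alarm`, the alarm naming scheme, and the 60-second period are assumptions, and the right metric and thresholds depend on the service.

```shell
# Create an alarm on a custom per-ASG metric that triggers a scaling policy.
# direction: "high" (scale out when >= threshold) or "low" (scale in when <=).
metric_alarm() {
  local asg_name="$1" metric_name="$2" direction="$3" threshold="$4" policy_arn="$5"
  local operator
  case "${direction}" in
    high) operator=GreaterThanOrEqualToThreshold ;;
    low)  operator=LessThanOrEqualToThreshold ;;
    *)    echo "direction must be 'high' or 'low'" >&2; return 1 ;;
  esac
  aws cloudwatch put-metric-alarm \
    --alarm-name "${asg_name}-${metric_name}-${direction}" \
    --metric-name "${metric_name}" \
    --namespace MyOrg/Custom \
    --statistic Average \
    --period 60 \
    --evaluation-periods 2 \
    --threshold "${threshold}" \
    --comparison-operator "${operator}" \
    --dimensions "Name=AutoScalingGroupName,Value=${asg_name}" \
    --alarm-actions "${policy_arn}"
}

# Usage (placeholder metric and ARNs):
#   metric_alarm "${asg_name}" BusyWorkers high 8 "${scale_out_policy_arn}"
#   metric_alarm "${asg_name}" BusyWorkers low  2 "${scale_in_policy_arn}"
```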
Fail-Safe Semantics for Deploy
• AMI artifacts built and tested
• AMIs for each service uploaded and registered with AWS EC2
• Brand new ASG + LC created referring to new AMI for release
• Scaling policies from current/live ASG copied over to new ASG
• Copy over min, max, and desired capacities from current to new
• Wait for all desired instances to report app-level healthy
• Add new ASG to ALB alongside current/old ASG
• Remove current/old ASG from ALB
• Set min=desired=0 in old ASG
• Clean up stale ASG (not old one, but older)
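The cut-over part of these steps could be scripted roughly as below. This is a hedged sketch, not our actual deploy script: the function names are invented, the ALB is assumed to be reached through a target group ARN, ASG-level `InService`/`Healthy` status stands in for the app-level health check, and copying scaling policies and cleaning up stale ASGs are left out.

```shell
# Copy min/max/desired from the live ASG to the new one.
copy_capacity() {
  local old_asg="$1" new_asg="$2" caps
  caps="$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "${old_asg}" \
    --query 'AutoScalingGroups[0].[MinSize,MaxSize,DesiredCapacity]' \
    --output text)" || return 1
  set -- ${caps}   # intentional word splitting: "<min> <max> <desired>"
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "${new_asg}" \
    --min-size "$1" --max-size "$2" --desired-capacity "$3"
}

# Poll until healthy in-service instances reach the desired capacity,
# or fail after roughly timeout_secs.
wait_for_healthy() {
  local asg="$1" timeout_secs="${2:-600}" waited=0 counts
  while [ "${waited}" -lt "${timeout_secs}" ]; do
    counts="$(aws autoscaling describe-auto-scaling-groups \
      --auto-scaling-group-names "${asg}" \
      --query "AutoScalingGroups[0].[DesiredCapacity, length(Instances[?LifecycleState=='InService' && HealthStatus=='Healthy'])]" \
      --output text)"
    set -- ${counts}   # "<desired> <healthy>"
    if [ "${1:-0}" -gt 0 ] && [ "${2:-0}" -ge "$1" ]; then
      return 0
    fi
    sleep 10
    waited=$((waited + 10))
  done
  echo "timed out waiting for ${asg} to become healthy" >&2
  return 1
}

# Any failure before the attach leaves the old ASG serving traffic untouched.
# A real script would also wait for target group health between attach and detach.
cut_over() {
  local old_asg="$1" new_asg="$2" target_group_arn="$3"
  copy_capacity "${old_asg}" "${new_asg}" || return 1
  wait_for_healthy "${new_asg}" || return 1
  aws autoscaling attach-load-balancer-target-groups \
    --auto-scaling-group-name "${new_asg}" \
    --target-group-arns "${target_group_arn}" || return 1
  aws autoscaling detach-load-balancer-target-groups \
    --auto-scaling-group-name "${old_asg}" \
    --target-group-arns "${target_group_arn}" || return 1
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "${old_asg}" \
    --min-size 0 --desired-capacity 0
}
```

Keeping the old ASG around at zero capacity (rather than deleting it) is what makes a quick rollback possible: scale it back up and swap the target group attachment.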
Other stuff
• DONE Script rollback (~1 minute to previous version)
• TODO Implement canary deploy capability
• TODO Check error rates and/or latencies haven’t increased before removing old ASG
from ALB
• REMINDER your max capacity should be determined by your backend runtime
dependencies (it’s transitive)
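To make the max-capacity reminder concrete: if each app instance opens a fixed number of connections to a shared backend, that backend's connection limit caps the ASG's max size. A sketch with made-up numbers; the helper names and the connection-pool model are assumptions, and real dependencies may be limited by something other than connections.

```shell
# Largest instance count one backend tolerates, given its connection limit,
# connections reserved for other clients, and connections per app instance.
max_instances_for_backend() {
  local backend_limit="$1" reserved="$2" per_instance="$3"
  if [ "${per_instance}" -le 0 ]; then
    echo "per_instance must be positive" >&2
    return 1
  fi
  echo $(( (backend_limit - reserved) / per_instance ))
}

# Smallest of the given numbers: the tightest dependency wins.
min_of() {
  local min="$1" n
  shift
  for n in "$@"; do
    if [ "${n}" -lt "${min}" ]; then
      min="${n}"
    fi
  done
  echo "${min}"
}

# Example with hypothetical limits:
#   db=$(max_instances_for_backend 500 50 25)      # 18
#   cache=$(max_instances_for_backend 1000 0 100)  # 10
#   max_size=$(min_of "${db}" "${cache}")          # 10
```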
Questions?
LinkedIn /in/susanpotter
GitHub @mbbx6spp
Keybase @mbbx6spp
Twitter @SusanPotter
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 
Scale, baby, scale!
Scale, baby, scale!Scale, baby, scale!
Scale, baby, scale!
 
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017
How Netflix Tunes Amazon EC2 Instances for Performance - CMP325 - re:Invent 2017
 
Scaling up to Your First 10 Million Users
Scaling up to Your First 10 Million UsersScaling up to Your First 10 Million Users
Scaling up to Your First 10 Million Users
 
Intro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesIntro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute Services
 
ATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing OperationsATC301-Big Data & Analytics for Manufacturing Operations
ATC301-Big Data & Analytics for Manufacturing Operations
 
Creating scalable solutions with aws
Creating scalable solutions with awsCreating scalable solutions with aws
Creating scalable solutions with aws
 
Infrastructure as Data with Ansible for easier Continuous Delivery
Infrastructure as Data with Ansible for easier Continuous DeliveryInfrastructure as Data with Ansible for easier Continuous Delivery
Infrastructure as Data with Ansible for easier Continuous Delivery
 
Intro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute ServicesIntro to AWS: Amazon EC2 and Compute Services
Intro to AWS: Amazon EC2 and Compute Services
 
Relational Database Services on AWS - Bill Baldwin
Relational Database Services on AWS - Bill BaldwinRelational Database Services on AWS - Bill Baldwin
Relational Database Services on AWS - Bill Baldwin
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
T1 – Architecting highly available applications on aws
T1 – Architecting highly available applications on awsT1 – Architecting highly available applications on aws
T1 – Architecting highly available applications on aws
 
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
Cloud Connected Devices on a Global Scale (CPN303) | AWS re:Invent 2013
 

More from Susan Potter

Thinking in Properties
Thinking in PropertiesThinking in Properties
Thinking in PropertiesSusan Potter
 
Champaign-Urbana Javascript Meetup Talk (Jan 2020)
Champaign-Urbana Javascript Meetup Talk (Jan 2020)Champaign-Urbana Javascript Meetup Talk (Jan 2020)
Champaign-Urbana Javascript Meetup Talk (Jan 2020)Susan Potter
 
From Zero to Haskell: Lessons Learned
From Zero to Haskell: Lessons LearnedFrom Zero to Haskell: Lessons Learned
From Zero to Haskell: Lessons LearnedSusan Potter
 
Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)Susan Potter
 
From Zero to Application Delivery with NixOS
From Zero to Application Delivery with NixOSFrom Zero to Application Delivery with NixOS
From Zero to Application Delivery with NixOSSusan Potter
 
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016Susan Potter
 
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014Susan Potter
 
Ricon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak PipeRicon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak PipeSusan Potter
 
Functional Algebra: Monoids Applied
Functional Algebra: Monoids AppliedFunctional Algebra: Monoids Applied
Functional Algebra: Monoids AppliedSusan Potter
 
Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresSusan Potter
 
Distributed Developer Workflows using Git
Distributed Developer Workflows using GitDistributed Developer Workflows using Git
Distributed Developer Workflows using GitSusan Potter
 
Link Walking with Riak
Link Walking with RiakLink Walking with Riak
Link Walking with RiakSusan Potter
 
Writing Bullet-Proof Javascript: By Using CoffeeScript
Writing Bullet-Proof Javascript: By Using CoffeeScriptWriting Bullet-Proof Javascript: By Using CoffeeScript
Writing Bullet-Proof Javascript: By Using CoffeeScriptSusan Potter
 
Deploying distributed software services to the cloud without breaking a sweat
Deploying distributed software services to the cloud without breaking a sweatDeploying distributed software services to the cloud without breaking a sweat
Deploying distributed software services to the cloud without breaking a sweatSusan Potter
 
Designing for Concurrency
Designing for ConcurrencyDesigning for Concurrency
Designing for ConcurrencySusan Potter
 

More from Susan Potter (17)

Thinking in Properties
Thinking in PropertiesThinking in Properties
Thinking in Properties
 
Champaign-Urbana Javascript Meetup Talk (Jan 2020)
Champaign-Urbana Javascript Meetup Talk (Jan 2020)Champaign-Urbana Javascript Meetup Talk (Jan 2020)
Champaign-Urbana Javascript Meetup Talk (Jan 2020)
 
From Zero to Haskell: Lessons Learned
From Zero to Haskell: Lessons LearnedFrom Zero to Haskell: Lessons Learned
From Zero to Haskell: Lessons Learned
 
Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)Functional Operations (Functional Programming at Comcast Labs Connect)
Functional Operations (Functional Programming at Comcast Labs Connect)
 
From Zero to Application Delivery with NixOS
From Zero to Application Delivery with NixOSFrom Zero to Application Delivery with NixOS
From Zero to Application Delivery with NixOS
 
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016
From Zero To Production (NixOS, Erlang) @ Erlang Factory SF 2016
 
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014
Scalaz By Example (An IO Taster) -- PDXScala Meetup Jan 2014
 
Ricon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak PipeRicon/West 2013: Adventures with Riak Pipe
Ricon/West 2013: Adventures with Riak Pipe
 
Functional Algebra: Monoids Applied
Functional Algebra: Monoids AppliedFunctional Algebra: Monoids Applied
Functional Algebra: Monoids Applied
 
Why Haskell
Why HaskellWhy Haskell
Why Haskell
 
Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For Datastores
 
Distributed Developer Workflows using Git
Distributed Developer Workflows using GitDistributed Developer Workflows using Git
Distributed Developer Workflows using Git
 
Link Walking with Riak
Link Walking with RiakLink Walking with Riak
Link Walking with Riak
 
Writing Bullet-Proof Javascript: By Using CoffeeScript
Writing Bullet-Proof Javascript: By Using CoffeeScriptWriting Bullet-Proof Javascript: By Using CoffeeScript
Writing Bullet-Proof Javascript: By Using CoffeeScript
 
Twitter4R OAuth
Twitter4R OAuthTwitter4R OAuth
Twitter4R OAuth
 
Deploying distributed software services to the cloud without breaking a sweat
Deploying distributed software services to the cloud without breaking a sweatDeploying distributed software services to the cloud without breaking a sweat
Deploying distributed software services to the cloud without breaking a sweat
 
Designing for Concurrency
Designing for ConcurrencyDesigning for Concurrency
Designing for Concurrency
 

Recently uploaded

Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 

Recently uploaded (20)

Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 

Dynamically scaling a news & activism hub to handle 5x traffic spikes

  • 1. Dynamically scaling a news & activism hub (scaling out up to 5x the write-traffic in 20 minutes) Susan Potter April 26, 2019
  • 2. Outline: Intro; Problem Outline; Before & After; AWS EC2 AutoScaling: An overview; Related Side Notes; Questions?
  • 4. whoami
      $ finger $(whoami)
      Name: Susan Potter
      Last login Sun Jan 18 18:30 1996 (GMT) on tty1
      - 23 years writing software
      - Server-side/backend/infrastructure engineering, mostly
      - Likes: functional programming (e.g. Haskell)
      Today:
      - Build new backend services in Haskell
      - I babysit a bloated Rails webapp
      Previously: trading systems, SaaS products, CI/CD
  • 5. In the cloud. Figure 1: Programming cloud infrastructures from the soy bean fields
  • 8. Legacy/History
      • Deliver news, discussions, & campaigns to over two million users/day
      • Traffic varies significantly during the day
      • Heavy reads (Varnish saves our site every day)
      • Writes go to content publishing backends, which are slow and expensive (Perl, Ruby)
      • When news breaks or the newsletter is sent, our active users want to log in, comment, recommend, write their own story, etc., which are all WRITES.
  • 13. Related Problems
      • Deployment method (Capistrano) has horrific failure modes during scale out/in events
      • Chef converged less and less often, and the work to maintain it increased
      • Moving to dynamic autoscaling didn't fix these directly, but our solution considered how it could help.
  • 16. Before: When I started (Sept 2016)
      • Only one problematic service was in a static autoscaling group (no scaling policies, manually modified by a human :gasp:, hence "static")
      • Services used atrophying AMIs that might not converge because external APT source dependencies changed in significant ways :(
      • Often AMIs didn't successfully bootstrap within 15 minutes
  • 19. Today: All services in dynamic autoscaling groups
      • Frontend caching/routing layer
      • Both content publishing backends
      • Internal systems, e.g. logging, metrics, etc.
  • 22. AWS EC2 AutoScaling: An overview
  • 24. High-level Primitives
      • AutoScaling Group
      • Launch Configuration
      • Scaling Policies
      • Lifecycle Hooks
  • 28. AutoScaling Lifecycle. Figure 2: Transition between instance states in the Amazon EC2 AutoScaling lifecycle
  • 29. AutoScaling Group: Properties
      • Min, max, desired
      • Launch configuration (exactly one pointer)
      • Health check type (EC2/ELB)
      • AZs
      • Timeouts
      • Scaling policies (zero or more)
  • 30. AutoScaling Group: Create via CLI
      declare -r rid="ResourceId=${asg_name}"
      declare -r rtype="ResourceType=auto-scaling-group"
      aws autoscaling create-auto-scaling-group \
        --auto-scaling-group-name "${asg_name}" \
        --launch-configuration-name "${lc_name}" \
        --min-size ${min_size:-1} \
        --max-size ${max_size:-9} \
        --default-cooldown ${cooldown:-120} \
        --availability-zones ${availability_zones} \
        --health-check-type "${health_check_type:-ELB}" \
        --health-check-grace-period "${grace_period:-90}" \
        --vpc-zone-identifier "${subnet_ids}" \
        --tags "${rid},${rtype},Key=LifeCycle,Value=alive,PropagateAtLaunch=false"
  • 31. Autoscaling Group: Enable metrics collection
      # After creation
      aws autoscaling enable-metrics-collection \
        --auto-scaling-group-name "${asg_name}" \
        --granularity "1Minute"
  • 32. Autoscaling Group: Querying instance IDs in ASG
      aws autoscaling describe-auto-scaling-groups \
        --output text \
        --region "${region}" \
        --auto-scaling-group-names "${asg_name}" \
        --query 'AutoScalingGroups[].Instances[].InstanceId'
  • 33. Launch Configuration: Properties
      • AMI
      • Instance type
      • User-data
      • Instance tags
      • Security groups
      • Block device mappings
      • IAM instance profiles
      Note: immutable after creation
  • 34. Launch Configuration: Create via CLI
      declare -r bdev="DeviceName=/dev/sda1"
      declare -r vtype="VolumeType=gp2"
      declare -r term="DeleteOnTermination=true"
      aws autoscaling create-launch-configuration \
        --launch-configuration-name "${lc_name}" \
        --image-id "${image_id}" \
        --iam-instance-profile "${lc_name}-profile" \
        --security-groups ${security_groups} \
        --instance-type ${instance} \
        --block-device-mappings "${bdev},Ebs={${term},${vtype},VolumeSize=${disk_size}}"
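Because a launch configuration is immutable after creation, changing the AMI or instance type means creating a new launch configuration under a new name and pointing the group at it. The sketch below shows one way that could look. The function names and the release-suffixed naming scheme are assumptions for illustration, not something the slides prescribe; the deploy flow later in the talk creates a brand new ASG per release instead.

```shell
# Hypothetical naming scheme: one launch configuration per service release.
lc_name_for_release() {
  local service="$1" release="$2"
  printf '%s-lc-%s\n' "${service}" "${release}"
}

# Point an existing ASG at a different, already created launch configuration.
# Only instances launched after this call use the new configuration.
swap_launch_configuration() {
  local asg_name="$1" new_lc_name="$2"
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "${asg_name}" \
    --launch-configuration-name "${new_lc_name}"
}
```

Running instances keep the configuration they were launched with, so an in-place swap like this only takes full effect as instances are replaced.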
  • 35. Scaling Policies: Properties
      • Policy name
      • Metric type
      • Adjustment type
      • Scaling adjustment
  • 39. Scaling Policies: Create via CLI
      aws autoscaling put-scaling-policy \
        --auto-scaling-group-name "${asg_name}" \
        --policy-name "${scaling_policy_name}" \
        --adjustment-type ChangeInCapacity \
        --scaling-adjustment 1
  • 40. Scaling Policies: Attach Metric Alarm
      aws cloudwatch put-metric-alarm \
        --alarm-name Step-Scaling-AlarmHigh-AddCapacity \
        --metric-name CPUUtilization \
        --namespace AWS/EC2 \
        --statistic Average \
        --period 120 \
        --evaluation-periods 2 \
        --threshold 60 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --dimensions "Name=AutoScalingGroupName,Value=${asg_name}" \
        --alarm-actions "${policy_arn}"
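The alarm above refers to `${policy_arn}`, which the slides do not show being set. One plausible way to obtain it is to capture the `PolicyARN` field that `put-scaling-policy` returns. The function name `scaling_policy_arn` is hypothetical.

```shell
# Create (or update) a simple scaling policy and print its ARN.
# put-scaling-policy returns the ARN in the PolicyARN field of its response.
scaling_policy_arn() {
  local asg_name="$1" policy_name="$2" adjustment="$3"
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "${asg_name}" \
    --policy-name "${policy_name}" \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment "${adjustment}" \
    --query 'PolicyARN' \
    --output text
}

# Usage (the policy name is a placeholder):
#   policy_arn="$(scaling_policy_arn "${asg_name}" scale-out-by-one 1)"
```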
  • 41. Custom Metrics: Report metric data
      aws cloudwatch put-metric-data \
        --metric-name custom-metric-name \
        --namespace MyOrg/Custom \
        --unit Count \
        --value ${value} \
        --storage-resolution 1 \
        --dimensions "AutoScalingGroupName=${asg_name}"
  • 42. Lifecycle Hooks: Properties. We don't use these, but they exist for adding hooks that provision software on newly launched instances and for similar actions.
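The talk does not use lifecycle hooks, so the following is only a sketch of what registering and completing a launch hook could look like with the AWS CLI. The function names, the 300-second default timeout, and the choice of `ABANDON` as the default result are assumptions.

```shell
# Register a launch hook: new instances pause in Pending:Wait until the hook
# is completed or the heartbeat timeout expires, then the default result applies.
add_launch_hook() {
  local asg_name="$1" hook_name="$2" timeout_seconds="${3:-300}"
  aws autoscaling put-lifecycle-hook \
    --auto-scaling-group-name "${asg_name}" \
    --lifecycle-hook-name "${hook_name}" \
    --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
    --heartbeat-timeout "${timeout_seconds}" \
    --default-result ABANDON
}

# Called once provisioning has succeeded, so the instance can go InService.
finish_launch_hook() {
  local asg_name="$1" hook_name="$2" instance_id="$3"
  aws autoscaling complete-lifecycle-action \
    --auto-scaling-group-name "${asg_name}" \
    --lifecycle-hook-name "${hook_name}" \
    --instance-id "${instance_id}" \
    --lifecycle-action-result CONTINUE
}
```

With fully baked AMIs there is little left to provision at launch, which is consistent with the talk not needing hooks.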
  • 45. EC2 Instance Bootstrapping
      • Chef converge bootstrapping took ~15 minutes
      • Improved bootstrapping by an order of magnitude with fully baked AMIs
      • Now we fully bake AMIs for each config and app change (5 mins, one time per release per environment, a constant factor, using NixOS)
      Fully baking AMIs also gives us system reproducibility that convergent configuration systems like Chef couldn't give us.
  • 48. Right-Size Instance Types per Service
      • We used to use whatever instance type was set before because $REASONS
      • Now we inspect each service's resource usage in production at peak, typical, and overnight resting states to know how to size a service's cluster.
      • We recommend this practice post-ASG, or you are dropping $$$ in AWS's lap and potentially hurting your product's UX
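The slides do not say how resource usage was inspected. As one possible starting point, CloudWatch can report average and maximum CPU for a whole group over a chosen window, which makes peak and overnight states easy to compare. The function name and the hourly period are assumptions, and CPU is only one of the resources worth checking.

```shell
# Hourly average and maximum CPU for every instance in an ASG over a window.
# start_time and end_time are ISO 8601 timestamps, e.g. 2019-04-19T00:00:00Z.
asg_cpu_stats() {
  local asg_name="$1" start_time="$2" end_time="$3"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=AutoScalingGroupName,Value=${asg_name}" \
    --start-time "${start_time}" \
    --end-time "${end_time}" \
    --period 3600 \
    --statistics Average Maximum \
    --output table
}
```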
  • 51. Find Leading Indicator Metric for Dynamic Scale Out/In
      • Every service behaves differently under load
      • We initially started dynamically scaling using policies based purely on CPU (a start, but not good enough for us)
      • Now we report custom metrics to AWS CloudWatch that are leading indicators that our cluster needs to scale out or in.
      This leads to more predictable performance on the site, even under traffic spikes.
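The talk does not name the leading indicators it reports. Purely as an illustration, the sketch below assumes a service whose saturation shows up as the share of busy application workers, and publishes that as a custom metric per ASG. The metric name `BusyWorkerPercent`, both function names, and the idea of counting workers are all hypothetical.

```shell
# Integer percentage of busy workers; prints 0 when there are no workers.
busy_percent() {
  local busy="$1" total="$2"
  if [ "${total}" -le 0 ]; then
    echo 0
  else
    echo $(( busy * 100 / total ))
  fi
}

# Publish one data point, tagged with the ASG name so an alarm can filter on it.
report_busy_percent() {
  local asg_name="$1" busy="$2" total="$3"
  aws cloudwatch put-metric-data \
    --metric-name BusyWorkerPercent \
    --namespace MyOrg/Custom \
    --unit Percent \
    --value "$(busy_percent "${busy}" "${total}")" \
    --dimensions "AutoScalingGroupName=${asg_name}"
}
```

An alarm on a metric like this can trigger a scaling policy before CPU climbs, which is the point of choosing a leading rather than a lagging indicator.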
  • 54. Fail-Safe Semantics for Deploy • AMI artifacts built and tested • AMIs for each service uploaded and registered with AWS EC2 • Brand new ASG + LC created referring to new AMI for release • Scaling policies from current/live ASG copied over to new ASG • Copy over min, max, and desired capacities from current to new • Wait for all desired instances to report app-level healthy • Add new ASG to ALB alongside current/old ASG • Remove current/old ASG from ALB • Set min=desired=0 in old ASG • Clean up stale ASG (not the old one just replaced, but the one before it) 26
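The ordering of these steps is what makes the deploy fail-safe: the live ASG keeps serving until the new one is known to be healthy. A minimal sketch of that ordering follows, using an in-memory stand-in instead of real AWS calls. `FakeCloud`, its method names, the ASG names, and the AMI id are all hypothetical, and a real version would poll the health check with a timeout rather than ask once.

```python
class FakeCloud:
    """In-memory stand-in for the AWS calls; every name here is hypothetical."""

    def __init__(self, live_asg, healthy=True):
        self.asgs = {
            live_asg: {"min": 2, "max": 10, "desired": 4,
                       "policies": ["scale-on-backlog"]},
        }
        self.alb = {live_asg}      # ASGs currently receiving traffic
        self.healthy = healthy     # what the app-level health check will report

    def create_asg(self, name, ami, template):
        # Copy capacities and scaling policies from the template ASG.
        self.asgs[name] = dict(template, ami=ami,
                               policies=list(template["policies"]))

    def all_instances_healthy(self, name):
        return self.healthy


def deploy(cloud, live, new, ami):
    """Roll out `ami` in a fresh ASG; the live ASG serves until the new one is healthy."""
    cloud.create_asg(new, ami, cloud.asgs[live])
    if not cloud.all_instances_healthy(new):
        # Fail safe: the new ASG was never attached, so traffic is unaffected.
        return False
    cloud.alb.add(new)         # new and old serve side by side briefly
    cloud.alb.discard(live)    # then the old ASG stops receiving traffic
    cloud.asgs[live].update(min=0, desired=0)   # drained, but still defined
    return True


cloud = FakeCloud("web-v1")
print(deploy(cloud, "web-v1", "web-v2", "ami-new"), sorted(cloud.alb))
```

Leaving the drained old ASG defined is what keeps a quick rollback possible, since only its capacities need restoring.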
  • 64. Other stuff • DONE Script rollback (~1 minute to previous version) • TODO Implement canary deploy capability • TODO Check error rates and/or latencies haven’t increased before removing old ASG from ALB • REMINDER your max capacity should be determined by your backend runtime dependencies (it’s transitive) 27
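The reminder about max capacity can be made concrete with a little arithmetic. The sketch below assumes the limiting dependency is a database with a fixed connection limit and that each instance holds a fixed-size connection pool; both are illustrative assumptions, as are the numbers and the dependency names.

```python
def max_instances(db_max_connections, reserved_connections, pool_size_per_instance):
    """Largest instance count whose connection pools still fit within the database limit."""
    usable = db_max_connections - reserved_connections
    if usable < pool_size_per_instance:
        raise ValueError("database cannot serve even one instance's pool")
    return usable // pool_size_per_instance


def asg_max(limits_by_dependency):
    """The ASG max is bounded by the tightest downstream dependency."""
    return min(limits_by_dependency.values())


# Assumed figures: 500 connections, 20 reserved for admin/migrations, pool of 16.
limits = {
    "database": max_instances(500, 20, 16),
    "search": 45,   # hypothetical limit derived the same way for another backend
}
print(asg_max(limits))
```

The same reasoning applies transitively: if the database itself depends on something with a lower ceiling, that ceiling is the one that bounds the ASG.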
  • 69. LinkedIn /in/susanpotter GitHub @mbbx6spp Keybase @mbbx6spp Twitter @SusanPotter 27