Advanced AWS techniques from the trenches of the Enterprise – Sourced Group

Every environment comes with its own set of unique challenges. Across our global client base, advanced techniques have emerged to solve common or, sometimes, very specific problems. Techniques such as a re-imagining of autonomous healing, advanced networking and proxying patterns, data exfiltration controls, and continuous delivery of networks will be covered. This fast-paced technical session will provide an in-the-trenches view of some of the solutions, discuss considerations at scale, include a demonstration, and provide actionable designs to take into your organisation. Join us while we present tips, strategies, and cutting-edge patterns from Sourced's battle-hardened consultants.

Speakers:
John Painter, Principal Consultant, Sourced Group
Brent Harrison, Consultant, Sourced Group

Published in: Technology

  1. 1. Advanced AWS Patterns from the trenches of the enterprise
  2. 2. OUR TEAM TODAY
     John Painter, Principal Consultant, john.painter@sourcedgroup.com
     Brent Harrison, Consultant, brent.harrison@sourcedgroup.com
     SYDNEY | TORONTO | VANCOUVER | KELOWNA
  3. 3. CONSULTING: Banks, Aviation, Telecom, FinTech, Media, Healthcare, Smartphone Manufacturer, Utilities
  4. 4. ENGINEERED SERVICES
     • Behind the firewall
     • BYO AWS Account
     • Guaranteed single tenancy
     • Multi-cloud options
     • Customer-controlled encryption
     • Customer retains custody of all data
     Over half a petabyte per year of Splunk throughput under management
  5. 5. PATTERN 1 – AUTO HEALING GEN2 AUTOMATED HEALING OF SINGLE INSTANCES WITH DEEP HEALTH CHECKING
  6. 6. THE BASIC AUTO-HEALING PATTERN
     1. Create ASG with min:max of 1:1
     2. Elastic Load Balancer (ELB) provides deep health checking
     Deep health checking != EC2 Auto Recovery
     Top Tip: Deep Health Check
     1. Script that checks multiple variables (e.g. process + disk space + memory) and opens/closes a port via Netcat
     2. Set the ELB to “port” type check
     Diagram label: Auto Scaling Group, min: 1, max: 1
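The deep health check in the Top Tip can be sketched in Python rather than with a shell script and Netcat. This is only an illustration under assumptions: `HealthPort`, `disk_ok`, the 10% free-space threshold, and the idea of calling `update` from a periodic loop are all hypothetical and not taken from the slides. The port stays open only while every check passes, so a TCP ("port" type) ELB check succeeds or fails accordingly.

```python
import shutil
import socket
from typing import Callable, Dict, Optional


def disk_ok(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """True when at least min_free_fraction of the volume at path is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


class HealthPort:
    """Keeps a TCP port listening only while all health checks pass."""

    def __init__(self, port: int, host: str = "0.0.0.0") -> None:
        self.host = host
        self.port = port
        self._sock: Optional[socket.socket] = None

    @property
    def is_open(self) -> bool:
        return self._sock is not None

    @property
    def bound_port(self) -> int:
        """Port actually bound (useful when port 0 was requested)."""
        if self._sock is None:
            raise RuntimeError("health port is closed")
        return self._sock.getsockname()[1]

    def _open(self) -> None:
        if self._sock is None:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((self.host, self.port))
            sock.listen(128)
            sock.setblocking(False)  # lets _drain() return when idle
            self._sock = sock

    def _drain(self) -> None:
        """Accept and close queued probes so the backlog never fills."""
        while self._sock is not None:
            try:
                conn, _ = self._sock.accept()
            except BlockingIOError:
                return
            conn.close()

    def _close(self) -> None:
        if self._sock is not None:
            self._sock.close()
            self._sock = None

    def update(self, checks: Dict[str, Callable[[], bool]]) -> bool:
        """Run every check, then open or close the port to match."""
        healthy = all(check() for check in checks.values())
        if healthy:
            self._open()
            self._drain()
        else:
            self._close()
        return healthy
```

In use, a small loop would call something like `port.update({"disk": disk_ok, "process": my_process_check})` every few seconds, where `my_process_check` is a callable you supply for the process or memory test.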
  7. 7. There are strong fiscal motivators to reduce tier-1 operational costs via the use of automated healing actions
  8. 8. ASGS, ETH0, AND STATIC IPS
     • LOTS of “cloud” applications want static networks
     • ASG instances receive dynamic IPs
     • Add a secondary interface (Elastic Network Interface – ENI) which maintains a fixed network address
     Diagram labels: Auto Scaling Group (min: 1, max: 1), Other cluster members/users, “Re-mappable” ENI
  9. 9. THE PREVIOUS APPROACH
     1. Virtual Private Cloud (VPC) with large subnets (e.g. > /24)
     2. ASG with min:max of 1:1
     3. Scripts call EC2 API on boot to “bring” a re-mappable interface to the instance
     ✓ Runs in the operating system, simple to pause apps for interface
     ✕ Lots of upstream/deployment co-ordination required
     ✕ Maintain support for multiple operating systems
     ✕ Incompatible with the increasing number of AWS Marketplace offerings
     ✕ Prone to failure if AWS API is under duress (which is also probably when you really want to be healing!)
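As a rough illustration of step 3 above, a boot script might look like the following sketch. The function name `claim_eni`, the retry count, and the delay are assumptions for illustration; `ec2` is expected to be a boto3 EC2 client (or any object with the same three methods), passed in so the logic can be exercised without AWS access. Every call here goes through the EC2 API, which is exactly the dependency the slide lists as a weakness.

```python
import time


def claim_eni(ec2, eni_id, instance_id, device_index=1, attempts=10, delay=2.0):
    """Move a fixed-address ENI onto this instance, detaching it first if needed.

    Returns the attachment id once the ENI is attached to instance_id.
    """
    for _ in range(attempts):
        reply = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])
        eni = reply["NetworkInterfaces"][0]
        attachment = eni.get("Attachment")

        # Already ours (for example after a reboot): nothing to do.
        if attachment and attachment.get("InstanceId") == instance_id:
            return attachment["AttachmentId"]

        # Free to take: attach it as a secondary interface.
        if eni["Status"] == "available":
            attached = ec2.attach_network_interface(
                NetworkInterfaceId=eni_id,
                InstanceId=instance_id,
                DeviceIndex=device_index,
            )
            return attached["AttachmentId"]

        # Still held by the previous (failed) instance: force it loose.
        if attachment and attachment.get("Status") == "attached":
            ec2.detach_network_interface(
                AttachmentId=attachment["AttachmentId"], Force=True
            )
        time.sleep(delay)

    raise TimeoutError(f"could not claim {eni_id} after {attempts} attempts")
```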
  10. 10. ONE ALTERNATIVE
      • Slow
      • Prone to AWS API/backplane duress
      • Does not understand the state of the operating system
      Diagram labels: Auto Scaling Group (min: 1, max: 1), Other cluster members/users, ASG Notifications, SQS/SNS, Lambda, AWS CLI
  11. 11. GOING BACK TO FIRST PRINCIPLES
      • What hands out IP addresses in AWS?
        – DHCP (VPC DHCP Options Group)
      • Where does the range of IPs come from?
        – Subnet size
      • Can we reduce the number of IPs available from DHCP to 1?
        – Provision an ENI in the subnet -> 1 less IP (for FREE!)
        – Provision lots of ENIs in a subnet and there will only be 1 IP left
  12. 12. AUTO-HEAL GEN2
      1. Create a Subnet for the Auto-Heal node (at the moment /28 is the smallest)
      2. Create enough ENIs to remove all but 1 IP from DHCP
      3. Create the normal ASG with min:max 1:1
      4. Create the ELB with deep health checking as per normal
      ✓ No scripts, co-ordination, or complexity inside the OS or the deployment framework
      ✓ Fully compatible with the wide range of black-box AMIs from AWS Marketplace
      ✕ Wastes address space (which may not be an issue depending on your network design and integration points)
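Step 2 comes down to simple arithmetic, sketched below. AWS reserves five addresses in every subnet (network address, VPC router, DNS, one reserved for future use, and broadcast), so a /28 has 11 assignable addresses and 10 placeholder ENIs leave exactly one. The function name is hypothetical, and the sketch assumes nothing else (for example load balancer nodes) allocates addresses in the same subnet.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of each subnet.
AWS_RESERVED_PER_SUBNET = 5


def enis_to_leave_one_address(cidr: str) -> int:
    """Number of placeholder ENIs that leaves a single address for DHCP."""
    subnet = ipaddress.ip_network(cidr)
    assignable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
    if assignable < 1:
        raise ValueError(f"{cidr} has no assignable addresses")
    return assignable - 1


print(enis_to_leave_one_address("10.0.0.0/28"))  # 16 - 5 - 1 = 10
```

A larger subnet works the same way but costs more placeholder ENIs, which is why the slide starts from the smallest subnet size available.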
  13. 13. Same technique can be used for “fixed clusters”, sets of quorum servers, container systems
  14. 14. PATTERN 2 - ADVANCED PROXY: A SCALABLE, HIGHLY AVAILABLE PROXY WITH ACTIVE DATA CONTROLS AND STATIC IP RANGES
  15. 15. EC2 with Outbound Internet Access
      ✕ Uncontrolled access to the internet
      ✕ Reactive techniques such as VPC flow logs + Lambda are not capable of running in real-time
      Diagram labels: Public Subnet (MNAT, EIP), Private Subnet (EC2 EC2 EC2)
      * Diagram simplified for clarity, excludes multi availability zone elements
  16. 16. ~HA Transparent Proxy Design
      • Limitation of VPC: Routes can only reference single interface
      ✓ Active control of traffic
      ✓ HTTP/S inspection
      ? Non-trivial engineering required
      ✕ Not truly HA
      ✕ Relatively low and finite throughput
      ✕ Prone to EC2 backplane saturation
      ✕ 100s Mb/s per EIP
      Diagram labels: Whitelist, Blacklist, IP List, EIP, ENI, Public Subnet (PROXY PROXY), Private Subnet (EC2 EC2 EC2)
      * Diagram simplified for clarity, excludes multi availability zone elements
  17. 17. Auto Scaling Proxy
      ✓ Active control of traffic
      ✓ Actively load balanced
      ✓ Truly HA
      ✓ ≈ Infinite bandwidth
      ✕ Variable public IPs
      Diagram labels: Availability Zone A, Availability Zone B, Public Subnet (×2), Private Subnet (×2), ASG, PROXY PROXY, EC2 EC2 EC2 (×2), “Auto Scaled” EIPs
  18. 18. Variable edge IPs are undesirable in the enterprise
  19. 19. Auto Scaling Proxies with Static IPs
      ✓ Active control of traffic
      ✓ Actively load balanced
      ✓ Static external IP addresses
      ? ≈ Infinite bandwidth requires co-ordination
      Diagram labels: Availability Zone A, Availability Zone B, Public Subnets (MNAT, EIP, ×2), Private Subnets (×2), ASG, PROXY (×6), EC2 EC2 EC2 (×2)
  20. 20. Why? ...and hang on, I still see EIPs?
      Scaling Increments:
      10GB @ $42/month/10GB
      100’s of Mb/s @ ~$210/month/100’s Mb/s
      • Provision 50/100/200Gb/s upfront
      • If you move in increments of +/-100Gb/s, see pattern 3.
      • Simple, HA, static IP proxies for a relatively low uplift in cost
      Diagram labels: Availability Zone A, Availability Zone B, Public Subnets (MNAT, EIP, ×2), Private Subnets (×2), ASG, PROXY (×6), EC2 EC2 EC2 (×2)
  21. 21. Complex Inspection Sandwich
      • Lots of vendor solutions can now support healing
      • Some even support scaling
      • Few support ENI/EIP handling
      Diagram labels: Availability Zone A, Availability Zone B, Public Subnets (MNAT, EIP, ×2), INSPECTION SANDWICH, Private Subnets (×2), EC2 EC2 EC2 (×2)
  22. 22. PATTERN 3 - AUTO SCALING ANYTHING A TECHNIQUE LEVERAGING EXISTING SERVICES TO AUTOSCALE ALMOST ANYTHING
  23. 23. The fiscal and operational benefits of Auto Scaling are well understood. Auto Scaling is currently limited to scaling EC2 instances. We want to apply scaling to entire solutions, not just EC2.
  24. 24. SCALING CLUSTERS? SCALING CELLS?
      • Enterprises have many applications that cannot scale on compute alone
        – Sharded databases
        – Life Sciences Clusters
        – Simulation Clusters
      • Organisations are starting to adopt “Cell Architecture” to account for scale
      • Auto Scaling ≠ Auto Healing
      Client Example: ~8000 instances connected in “rings” of 20 nodes via a cluster protocol + ~1500 Cassandra nodes. 50% variance in daily traffic volume. Ideal use-case for Auto Scale.
  25. 25. THE GENERAL CASE - CELL / SHARD / CLUSTER EC2 Node1 EC2 Node2 EC2 Node-n… CloudFormation Stack Health Check
  26. 26. THE GENERAL CASE - CELL / SHARD / CLUSTER EC2 Node1 EC2 Node2 EC2 Node-n… CloudFormation Stack Health Check EC2 Node1 EC2 Node2 EC2 Node-n… CloudFormation Stack Health Check
  27. 27. STEP 1 – INSTRUMENT THE SCALING METRIC … … CloudWatch Custom Metric Number of Users CloudWatch Alarm ScaleUp CloudWatch Alarm ScaleDown
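One way to picture Step 1 is as the request payloads for CloudWatch's `put_metric_data` and `put_metric_alarm` calls. The sketch below only builds the keyword arguments as plain dictionaries, so it runs without AWS access. The namespace `Custom/Shadow`, the metric name `NumberOfUsers`, the thresholds, the periods, and the ARNs are all placeholder assumptions, not values from the slides.

```python
def user_count_datum(count, metric="NumberOfUsers"):
    """One data point for put_metric_data(Namespace=..., MetricData=[...])."""
    return {"MetricName": metric, "Value": float(count), "Unit": "Count"}


def scaling_alarm(name, comparison, threshold, action_arn,
                  namespace="Custom/Shadow", metric="NumberOfUsers"):
    """Keyword arguments for put_metric_alarm.

    comparison is a CloudWatch ComparisonOperator such as
    "GreaterThanThreshold" or "LessThanThreshold".
    """
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Statistic": "Average",
        "Period": 60,            # seconds per evaluation window
        "EvaluationPeriods": 3,  # consecutive breaches before alarming
        "Threshold": float(threshold),
        "ComparisonOperator": comparison,
        "AlarmActions": [action_arn],
    }


# Hypothetical thresholds: add capacity above 1000 users, remove below 400.
scale_up = scaling_alarm("ScaleUp", "GreaterThanThreshold", 1000,
                         "arn:aws:example:scale-up-action")
scale_down = scaling_alarm("ScaleDown", "LessThanThreshold", 400,
                           "arn:aws:example:scale-down-action")
```

With boto3 these would be passed as `cloudwatch.put_metric_alarm(**scale_up)`. Keeping a gap between the two thresholds is a common way to avoid flapping between scale-up and scale-down.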
  28. 28. OPTION 1 – USE LAMBDA
      Diagram labels: Number of Users, ScaleUp, ScaleDown, SNS (×2), Build Lambda, Teardown Lambda, CloudFormation
  29. 29. WHY NOT LAMBDA?
      • Duplication of AWS engineering investment
      • Ongoing cost to maintain cadence with the growing features of Auto Scaling
        – Scheduled Scaling
        – Percentile Scaling
        – Machine Learning Scaling / Predictive Scaling
      • Lambda still needs a state machine
      • We don’t have healing
  30. 30. There are strong fiscal and complexity motivators to use native ASGs
  31. 31. STEP 2 – “SHADOW” ASG ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Shadow ASG
  32. 32. STEP 3 – ADD THE CFN LAMBDAS
      Auto Scaling SNS notifications:
      • EC2_INSTANCE_LAUNCH: Create Stack
      • EC2_INSTANCE_TERMINATE: Delete Stack
      Diagram labels: Number of Users, ScaleUp, ScaleDown, Shadow ASG (Shadow Shadow Shadow)
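The mapping in Step 3 could be handled by a single Lambda function along these lines. This is a sketch under assumptions: the stack naming scheme `shadow-<instance id>`, the `template_url` parameter, and the function names are hypothetical. `cfn` is expected to be a boto3 CloudFormation client and is passed in so the logic can be tested with a stand-in. Auto Scaling publishes its notifications to SNS as a JSON message whose `Event` field carries values such as `autoscaling:EC2_INSTANCE_LAUNCH`.

```python
import json


def stack_name_for(instance_id):
    """Hypothetical naming scheme tying one stack to one shadow instance."""
    return f"shadow-{instance_id}"


def handle_asg_notification(event, cfn, template_url):
    """Create or delete a CloudFormation stack per shadow instance event.

    Returns the list of (action, stack_name) pairs that were carried out.
    """
    actions = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        kind = message.get("Event", "")
        instance_id = message.get("EC2InstanceId")
        if not instance_id:
            continue  # e.g. autoscaling:TEST_NOTIFICATION carries no instance

        name = stack_name_for(instance_id)
        if kind == "autoscaling:EC2_INSTANCE_LAUNCH":
            cfn.create_stack(StackName=name, TemplateURL=template_url)
            actions.append(("create", name))
        elif kind == "autoscaling:EC2_INSTANCE_TERMINATE":
            cfn.delete_stack(StackName=name)
            actions.append(("delete", name))
    return actions
```

Deriving the stack name from the shadow instance id keeps the function stateless: the terminate event alone is enough to find the stack to delete.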
  33. 33. $5.76 per month per stack (Unoptimized)
  34. 34. Auto Scale ≠ Auto Heal
  35. 35. STEP 4 – HEALTH CHECK THE CLUSTERS ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Auto Scaling SNS EC2_INSTANCE_LAUNCH Create Stack EC2_INSTANCE_TERMINATE Delete Stack Shadow ASG
  36. 36. HEALING SCENARIO 1 – CLUSTER FAILS ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Auto Scaling SNS EC2_INSTANCE_LAUNCH Create Stack EC2_INSTANCE_TERMINATE Delete Stack Shadow ASG
  37. 37. HEALING SCENARIO 1 – SHADOW TERMINATED ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Auto Scaling SNS EC2_INSTANCE_LAUNCH Create Stack EC2_INSTANCE_TERMINATE Delete Stack Shadow ASG
  38. 38. HEALING SCENARIO 1 – ASG IS IMPACTED ScaleUp ScaleDown Number of Users … … Shadow Shadow Shadow ASG Desired: 3 Actual: 2
  39. 39. HEALING SCENARIO 1 – CLUSTER RESTORED ScaleUp ScaleDown Number of Users … … … Shadow Shadow Shadow Auto Scaling SNS EC2_INSTANCE_LAUNCH Create Stack EC2_INSTANCE_TERMINATE Delete Stack Shadow ASG
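The slides do not say how a failed cluster causes its shadow instance to be replaced. One possible mechanism, shown here purely as an assumption, is a periodic job that probes the cluster's health endpoint and marks the matching shadow instance unhealthy through the Auto Scaling `set_instance_health` call. The function name and the `cluster_is_healthy` callable are hypothetical; `autoscaling` is expected to be a boto3 Auto Scaling client.

```python
def sync_shadow_health(autoscaling, shadow_instance_id, cluster_is_healthy):
    """Mark the shadow instance unhealthy when its cluster fails the check.

    cluster_is_healthy is any zero-argument callable returning True or False,
    for example a probe of the cluster's health check endpoint.
    """
    if cluster_is_healthy():
        return "Healthy"

    # The ASG then terminates and replaces the shadow, which in turn fires
    # the TERMINATE (delete stack) and LAUNCH (create stack) notifications.
    autoscaling.set_instance_health(
        InstanceId=shadow_instance_id,
        HealthStatus="Unhealthy",
        ShouldRespectGracePeriod=False,
    )
    return "Unhealthy"
```

This follows the four healing slides in order: the cluster fails, the shadow is terminated, the ASG drops below its desired count, and the replacement shadow triggers a fresh stack.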
  40. 40. Continuous Delivery for Clusters Blue/Green Updates for Clusters at Huge Scale
  41. 41. CONTINUOUS DELIVERY FOR CLUSTERS
      • Using nothing but the ASG capacity, blue/green roll clusters of almost any size
      • Increment ASG in V2.0, wait for health check, decrement ASG in V1.0
      Diagram labels: V1.0, V2.0
  42. 42. AUTO SCALE ANYTHING
      • Solution works with many non-scaling AWS services
      • CloudFormation can use Custom Resources to create almost anything
      • The “Shadow” system only needs the scaling alarms from any CloudWatch metric and a health check endpoint. Decoupled and does not interact with the system in any way.
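The increment, wait, decrement loop can be sketched as follows. The three callables are hypothetical stand-ins: in practice `set_capacity` might wrap the Auto Scaling `set_desired_capacity` call and `wait_healthy` might poll the cluster health checks. The rollback on a failed health check is an added assumption, not something the slide specifies.

```python
def blue_green_roll(get_capacity, set_capacity, wait_healthy, old, new):
    """Shift capacity from the old ASG to the new ASG one cluster at a time.

    get_capacity(name) -> int, set_capacity(name, n), and
    wait_healthy(name, n) -> bool are supplied by the caller.
    Returns the (old, new) capacities recorded after each step.
    """
    steps = []
    while get_capacity(old) > 0:
        target = get_capacity(new) + 1
        set_capacity(new, target)              # increment V2.0
        if not wait_healthy(new, target):      # wait for health check
            set_capacity(new, target - 1)      # undo the failed increment
            raise RuntimeError(f"{new} failed its health check at {target}")
        set_capacity(old, get_capacity(old) - 1)  # decrement V1.0
        steps.append((get_capacity(old), get_capacity(new)))
    return steps
```

Because total capacity never drops below the starting count, the roll can be stopped at any step with both versions still serving.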
  43. 43. CUSTOM SCALING EXAMPLES. Metric: Database Throughput (ScaleUp Alarm, ScaleDown Alarm). Components: Shadow ASG, SNS, Lambda, CloudFormation. Scaled unit: RDS Read Slave.
  44. 44. CUSTOM SCALING EXAMPLES. Metric: Number of user sign-ups/logins (ScaleUp Alarm, ScaleDown Alarm). Components: Shadow ASG, SNS, Lambda, CloudFormation. Scaled unit: Application Shard.
  45. 45. CUSTOM SCALING EXAMPLES. Metric: CPU/Memory (ScaleUp Alarm, ScaleDown Alarm). Components: Shadow ASG, SNS, Lambda, CloudFormation. Scaled unit: VMWare Node.
  46. 46. CUSTOM SCALING EXAMPLES. Metric: CPU/Memory (ScaleUp Alarm, ScaleDown Alarm). Components: Shadow ASG, SNS, Lambda, CloudFormation. Scaled unit: Other infrastructure platforms.
  47. 47. CUSTOM SCALING EXAMPLES. Metric: Number of items in the queue (ScaleUp Alarm, ScaleDown Alarm). Components: Shadow ASG, SNS, Lambda, CloudFormation. Scaled unit: Life Sciences Application.
  48. 48. CUSTOM SCALING EXAMPLES. Metric: Number of planes currently in the air (ScaleUp Alarm, ScaleDown Alarm). Components: Shadow ASG, SNS, Lambda, CloudFormation. Scaled unit: Flight Analysis Stack.
  49. 49. CUSTOM SCALING EXAMPLES. Metric: Number of door entries (ScaleUp Alarm, ScaleDown Alarm). Components: Shadow ASG, SNS, Lambda, CloudFormation. Scaled unit: Trading Stack.
  50. 50. CUSTOM SCALING EXAMPLES. Metric: Order Volume (ScaleUp Alarm, ScaleDown Alarm). Components: Shadow ASG, SNS, Lambda, CloudFormation. Scaled unit: Number of robots on station.
  51. 51. Find Out MORE:
      Visit Us: At our booth or online – www.sourcedgroup.com
      Careers: www.sourcedgroup.com/careers
      In the news:
      • Computerworld (2016): Foreign Exchange Service OFX Embarks on Cloud Migration
      • Connecting the Australian Channel (2015): Meet the Partner who took Qantas to the AWS Cloud
      • The Australian Business Review (2015): Greater Buying Power lets Aussie bank on Adobe Experience Manager
      Our Awards:
      • AWS – Sydney Partners Summit - Invent & Simplify (2015)
      • AWS – Global - Customer Obsessed Partner (2014)
  52. 52. Thank you!
