How SOA happens
When customers love a service very, very much...

Here’s how it ends up
A certain kind of fun

To sum up
1

Services help you scale

2

SOA is an architecture style designed around services

3

A SOA is hard to manage...
What is SmartStack?
And how does it help?
1

Service(s) you want to deliver

2

Zookeeper registry to track
everything

3

Nerve checks health and updates Zookeeper...
MONORAIL
NERVE

MOBILE WEB
SYNAPSE

NERVE

SYNAPSE

ZOOKEEPER

+ /production/monorail/services/i-1234567 => {‘host’: 1.2.3...
haproxy

At the core of synapse

We get myriad benefits from haproxy
+ Stable and well-tested
+ Performs in-process connec...
To Recap

SmartStack in action

Abstraction and DRY

Why
SmartStack?

Automatic failure detection
Introspection
Distributed by design
Abstraction

+ The same code in the same language is always doing
discovery/registration
+ Your application doesn’t know a...
Automatic Failure Handling
You don’t have to wake up

+ Bad backends are automatically taken out of rotation
+ Useful duri...
Introspection
See what’s REALLY going on

Leverage the power of haproxy
+ status page that lets you see local
state
+ lots...
Distributed by Design
No central point of failure

+ Traffic flows directly between boxes -- no routing layer
+ Even if Sm...
The Impact
How SmartStack has changed Airbnb

100+

2K

3K

30

Services
using
SmartStack

Requests per
second

LOC
delete...
Spike : “Nerve and Synapse have greatly simplified my
life as an application developer, and have enabled me to
launch our ...
Future Direction

Is this project, like, done...?

1

Better resiliency: more graceful handling of zookeeper edge
cases

2...
I’m sold!
How do I get started?
Getting Started

1

install Vagrant

2

git clone https://github.com/airbnb/smartstack-cookbook.git

3

vagrant up

Where is the code?

https://github.com/airbnb/nerve.git
https://github.com/airbnb/synapse.git

Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013

Traditionally, IT organizations have treated infrastructure components like family pets. We name them, we worry about them, and we let them wake us up at 4:00 am. Amazon CTO Werner Vogels has dubbed these behaviors "server hugging" and called them antiquated in today's cloud infrastructures. In this breakout session, we discuss methods for getting away from server hugging and focusing instead on the overall status and life of the entire infrastructure. From making use of disposable on-demand infrastructure, to monitoring services rather than individual servers, to getting away from naming instances, this session helps you see your infrastructure for what it is: technology that you control.

Published in: Technology

Transcript of "Stop Worrying about Prodweb001 and Start Loving i-98fb9856 (ARC201) | AWS re:Invent 2013"

  1. 1. They Don't Hug Back! Or Why You Need To Stop Worrying About prodweb001 And Start Loving i-98fb9856 Chris Munns, Amazon Web Services November 13, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. 2. Why are we here? Old-school IT practices continue to weigh us down in the cloud. We need a way out.
  4. 4. “Everything now is a programmable resource. There are no physical things anymore. Things that you needed to do by walking to the datacenter, by hugging your servers, and believe me I’ve hugged servers enough in my life. They DO NOT hug you back.” Dr. Werner Vogels (Re:Invent 2012)
  5. 5. “But I love my servers!” - You (now) https://secure.flickr.com/photos/schluesselbein/4157426778/
  7. 7. “They hate you, actually, I honestly believe that they hate you. At least that is how they behaved towards me.” – Dr. Werner Vogels (Re:Invent 2012)
  8. 8. “But I love my servers!” “Well now I’m kind of sad.” - You (now) https://secure.flickr.com/photos/bensonkua/2687804310/
  9. 9. So where does server hugging come from?
  10. 10. NAMING THEM https://secure.flickr.com/photos/quinnanya/4464205726
  13. 13. So where does server hugging come from? Why do we name them? Because we have to know where to find them. Where do we need to find them?
  15. 15. Here Or here? https://secure.flickr.com/photos/arthur-caranta/2925352521
  16. 16. IF THIS THING IS OUT OF TAPE, YOU HAD A REALLY BAD DAY. https://secure.flickr.com/photos/stephendotcarter/6587082437
  19. 19. So where does server hugging come from? Why did we need to find them in person? Because we HAD to fix them. WHY?
  20. 20. So where does server hugging come from? We fixed them because: • Dead servers == dead space • Dead space == wasted $$$ • Dead servers == worse performance • Worse performance == lost $$$
  21. 21. So where else does server hugging come from?
  22. 22. SERVERS != OUR PETS https://secure.flickr.com/photos/thegirlsny/3877243166/
  24. 24. What we name our pets • Greek gods: Zeus, Thor, Hercules… • Elements: Hydrogen, Helium, Lithium… • Comic book heroes: Superman, Ironman… • Musicians, Cities, Countries, Movies • Prodweb01, Prodapi01… • Web01.prod, Web01.test… • Tacotruck01 • P1cfw01v03
  25. 25. P1cfw01v03 https://secure.flickr.com/photos/75898532@N00/3243666946/
  26. 26. [Image: eight icons labeled EC2] P1cfw01v03 https://secure.flickr.com/photos/verylastexcitingmoment/3118396767/
  27. 27. Waking when they cry: *** Nagios *** Notification Type: PROBLEM; Service: Web CPU; Host: web03.example.com; Address: 10.167.10.51; State: CRITICAL; Date/Time: Thu Oct 24 08:14:13 UTC 2013; Additional Info: CRITICAL – CPU LOAD 29
  28. 28. Hugging server babies and you • Is the site performing worse? • Are your customers impacted? • How impacted are they? • What are the other 20 web instances doing? • Did I really need to wake up at 4am for this? • If a server uses 100% of its CPU, should I care? • If this server is bad, how much work is there in fixing it? • Is there something custom about this server?
  29. 29. Server hugging bad practices • “Pet-ting” – caring about a server’s “name,” its well being, its individual status • “Snowflakes” – unique hosts in a common pool • “Model T-ing” – Hand-built one-off servers • “Names In Stone” – overuse of host names as a source of truth
  30. 30. In short, there are a lot of old-school, dated habits being taken to cloud infrastructure. And once you’ve brought them to the cloud, you lose out on a lot of the benefits of the cloud. Such as: • Dynamic scale up/down • Self healing infrastructures • Increased flexibility • Automation
  31. 31. https://secure.flickr.com/photos/tolomea/5113266973/
  32. 32. Letting go involves moving forward with some of the best of what AWS can offer you in terms of services and how you can work with them in some pretty incredible ways.
  33. 33. Letting go and loving the new way • Using Auto Scaling for everything • ENIs and EIPs • Tags are the new DNS • Deployment tools • Host-based configuration • Service registries
  34. 34. Sleeping through Infrastructure Recovery https://secure.flickr.com/photos/dominiqs/331702231
  35. 35. The things that should never wake you up • High CPU usage on anything • High memory usage on anything • Thread/process exhaustion • Filled disks • Not running software • Failed instances
  36. 36. Metrics:
  40. 40. Common actions taken when paged: 1. Look at logs 2. Look at graphs (steps 1 and 2 are both looking at past data) 3. Reboot/restart related application/instance. Why do this manually?
  42. 42. Traffic to our site vs. provisioned capacity, provisioned manually [Chart. Labels: Provisioned capacity; 76%; 24%.]
  43. 43. Traffic to our site vs. provisioned capacity with Auto Scaling [Chart. Label: Provisioned capacity.]
  44. 44. STONITH: “Shoot the other node in the head.” Don’t be afraid to kill a node with something wrong with it as a resolution to failure! With Auto Scaling it’s fine!
  45. 45. STONITH [Diagram built up over slides 45 to 54. Labels: Internet Gateway; ELB (×3); Web Instance (×3); Auto Scaling Group min=3; three Availability Zones; Virtual Private Cloud; AWS Cloud. Added step by step: CloudWatch, Alarm, Amazon SNS, Amazon SQS, Watcher Instance, EC2 API. Slide 53 shows one Web Instance fewer; slide 54 shows three Web Instances again and no Alarm.]
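The STONITH slides name CloudWatch, an alarm, Amazon SNS, Amazon SQS, a watcher instance, and the EC2 API, but the transcript does not show how the watcher decides what to shoot. The following is a minimal sketch of that decision only, assuming the queue receives the JSON envelope that SNS delivers to SQS, with the CloudWatch alarm JSON carried as a string in its `Message` field. The function names, the alarm name, and the sample message are hypothetical; the actual termination call is passed in as a plain function so the sketch runs without AWS access.

```python
import json


def instance_to_shoot(sqs_body):
    """Return the instance ID named by an alarm in ALARM state, else None.

    Assumes sqs_body is an SNS envelope whose "Message" field holds the
    CloudWatch alarm JSON as a string.
    """
    envelope = json.loads(sqs_body)
    alarm = json.loads(envelope["Message"])
    if alarm.get("NewStateValue") != "ALARM":
        return None
    for dim in alarm.get("Trigger", {}).get("Dimensions", []):
        if dim.get("name") == "InstanceId":
            return dim.get("value")
    return None


def handle(sqs_body, terminate):
    """Call terminate(instance_id) when the alarm points at a single instance."""
    instance_id = instance_to_shoot(sqs_body)
    if instance_id is not None:
        terminate(instance_id)
    return instance_id


# Hypothetical sample message shaped like an SNS-wrapped CloudWatch alarm.
sample = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "AlarmName": "web-cpu-high",
        "NewStateValue": "ALARM",
        "Trigger": {
            "MetricName": "CPUUtilization",
            "Dimensions": [{"name": "InstanceId", "value": "i-98fb9856"}],
        },
    }),
})

shot = []
handle(sample, shot.append)  # stand-in for a function that calls the EC2 API
```

On a real watcher instance, `terminate` could wrap an EC2 API call that terminates the instance; the Auto Scaling group would then launch a replacement to get back to its minimum size.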
  55. 55. Auto Scaling for everything! • You can use Auto Scaling for singular instances that don’t scale up or down – min = 1, max = 1 • Auto Scaling gives you the ability to specify multiple Availability Zones, even if you only need a single host – gives you multi-AZ failover • Auto Scaling supports notifications on instance creation/termination – Useful for configuring other resources, bootstrapping, and provisioning • Auto Scaling is free!
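As an illustration of the "min = 1, max = 1" point, here is a sketch of a CloudFormation resource for a single self-healing instance, built as a Python dictionary and printed as JSON. `MinSize`, `MaxSize`, `AvailabilityZones`, and `LaunchConfigurationName` are properties of `AWS::AutoScaling::AutoScalingGroup`; the logical names `AppASG` and `AppLaunchConfig` and the zone names are placeholders, not taken from the talk, and a group inside a VPC would normally also need its subnets specified.

```python
import json

# Hypothetical names: "AppASG", "AppLaunchConfig", and the zones are placeholders.
app_asg = {
    "AppASG": {
        "Type": "AWS::AutoScaling::AutoScalingGroup",
        "Properties": {
            # More than one zone, so a replacement can launch elsewhere.
            "AvailabilityZones": ["us-west-2a", "us-west-2b"],
            "LaunchConfigurationName": {"Ref": "AppLaunchConfig"},
            # Exactly one instance at all times: never scale, only replace.
            "MinSize": "1",
            "MaxSize": "1",
        },
    }
}

print(json.dumps(app_asg, indent=2))
```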
  56. 56. Auto Scaling for everything! • Make use of the user data or configuration management tools to do things like: – Re-attach an Amazon Elastic Block Store (EBS) volume with application data – Re-attach an Elastic Network Interface (ENI) – Update service registries – Update DNS – Notify other reliant applications of the new host
  57. 57. Elastic Network Interfaces/Elastic IPs. ENI: • Add additional interfaces to an instance • One or more secondary private IP addresses • Has its own MAC address • Can have Security Groups assigned • Tag-able • Free. EIP: • A static public IP address • Can be assigned to either an instance or an ENI • Doesn’t replace private IP • Small hourly charge when not attached to an instance
  58. 58. Elastic Network Interfaces Attaching multiple network interfaces to an instance is useful when you want to: • Create a management network. • Use network and security appliances in your Amazon Virtual Private Cloud (VPC). • Create dual-homed instances with workloads/roles on distinct subnets. • Create a low-budget, high-availability solution.
  60. 60. Healing a single instance [Diagram built up over slides 60 to 71. Labels: EC2 API; AWS CloudFormation; AWS Cloud; Virtual Private Cloud; Availability Zone; Internet Gateway; NAT Instance; App Instance; Auto-Scaling Group; Elastic Network Instance; EBS Volume; Instances.]
  72. 72. Healing a single instance "myENI" : { "Type" : "AWS::EC2::NetworkInterface", "Properties" : { "Tags": [{"Key":"Name","Value":"AppENI"}, {"Key":"Project","Value":"Blog"}], "Description": "Blog One Off App Server ENI.", "SubnetId": "subnet-d2286cb9", "PrivateIpAddress": "192.168.11.100" } }
  74. 74. Healing a single instance
          import boto.ec2
          import boto.utils

          # Connect to API
          conn = boto.ec2.connect_to_region('us-west-2')

          # Find the right ENI
          myfilters = {'tag:Name': 'AppENI', 'tag:Project': 'Blog'}
          myEni = conn.get_all_network_interfaces(filters=myfilters)

          # Attach ENI to instance
          myInstance = boto.utils.get_instance_metadata()['instance-id']
          conn.attach_network_interface(myEni[0].id, myInstance, device_index=1, dry_run=False)
  75. 75. Use tags as a source of “truth” in your infrastructure https://secure.flickr.com/photos/cambodia4kidsorg/260004685
  76. 76. DNS bad. Tags good. DNS: • 30-year-old technology • Only tells us a single thing about a host: a hostname-to-IP mapping • Potential for split brain/broken replicas • Caching issues, caching issues, caching issues. Tags: • Set by you the user, held in AWS and available via APIs • Key:Value is totally up to you • Can have several per resource • Free to implement and query
  77. 77. DNS bad. Tags good. DNS: Web03.example.com: – 10.167.10.51. Tags: i-933f81a4: – Name:Web – Env:Prod – Project:Blog – Owner:BobSmith – aws:autoscaling:groupName: ProdBlogWebsASG – aws:cloudformation:stack-name: BlogSiteProd
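To make the contrast with DNS concrete, here is a small sketch of selecting hosts by tag instead of by name. It assumes nothing about the speaker's tooling: `tag_filters` builds the `tag:<Key>` filter form that the deck's own ENI lookup passes to the EC2 API, and `matches` applies the same selection to a hypothetical in-memory inventory so the example runs without AWS access. The second instance ID is made up.

```python
def tag_filters(tags):
    """Turn {'Name': 'Web'} into {'tag:Name': 'Web'}, the EC2 filter form."""
    return {"tag:" + key: value for key, value in tags.items()}


def matches(resource_tags, wanted):
    """True when every wanted key/value pair is present on the resource."""
    return all(resource_tags.get(k) == v for k, v in wanted.items())


# Hypothetical inventory standing in for an API response.
inventory = {
    "i-933f81a4": {"Name": "Web", "Env": "Prod", "Project": "Blog", "Owner": "BobSmith"},
    "i-00000001": {"Name": "Web", "Env": "Test", "Project": "Blog"},
}

wanted = {"Name": "Web", "Env": "Prod"}
prod_web = sorted(i for i, tags in inventory.items() if matches(tags, wanted))
filters = tag_filters(wanted)
```

A single hostname could only have answered "what is the IP?"; the tag query answers "which instances are production web servers for the blog?" without any instance needing a memorable name.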
  78. 78. Tags as a source of truth • Tie various resources together • Billing reports • IAM resource-level permissions • Build automation • Deploy automation • Security resource grouping
  79. 79. Stop hand-crafting servers! https://secure.flickr.com/photos/ndrwfgg/115898387
  80. 80. Use automation! https://secure.flickr.com/photos/genewolf/147722350
  81. 81. AWS management tools [Spectrum diagram. Axis labels: Higher-level services / Convenience; Do it yourself / Control. Tools shown: AWS Elastic Beanstalk; AWS OpsWorks; AWS CloudFormation.]
  82. 82. Host-based configuration management Fabric
  83. 83. Host-based configuration management • All more or less accomplish the same things – File configuration, package/software installation, user management, run commands, interface with OS, process management • All have their own syntax that isn’t too dissimilar • Some rely on agents, some are agentless • Use HBCM alongside one of the tools from the previous slide • Spend the time required to learn them • Can’t scale easily without HBCM
  85. 85. “I don’t have time to learn Chef!?” “I wrote custom shell scripts instead!” https://secure.flickr.com/photos/45909111@N00/9374169461/
  86. 86. Go visit the AWS & Partner exhibits and ask for more info! https://secure.flickr.com/photos/45909111@N00/9374169461/
  87. 87. Making Use of Service Registries https://secure.flickr.com/photos/fringedbenefit/9178086713
  89. 89. NOT THAT KINDA REGISTRY! https://secure.flickr.com/photos/smartfinn/2651755337/
  90. 90. “A service registry is one of the fundamental pieces of service-oriented architecture (SOA) for achieving reuse. It refers to a place in which service providers can impart information about their offered services and potential clients can search for services.” - www.architecturejournal.net, Sept 2009
  91. 91. Service registry workflow 1. A new instance boots. 2. It registers itself with our “service registry.” 3. Changes to the service registry kick off changes on other systems related to the new instance. 4. Other instances now know about our new instance. 5. On instance termination, instance is deregistered, and other instances remove it from use.
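The workflow above can be sketched with a toy in-memory registry. This is only an illustration of the register, notify, deregister cycle; it is not how Zookeeper, Eureka, or SmartStack are implemented, and the class name, method names, and the second IP address are hypothetical.

```python
class ServiceRegistry:
    """Toy in-memory registry: instances register, watchers hear about changes."""

    def __init__(self):
        self._services = {}   # service name -> {instance_id: address}
        self._watchers = []   # callables invoked as watcher(service, instances)

    def watch(self, callback):
        self._watchers.append(callback)

    def _notify(self, service):
        snapshot = dict(self._services.get(service, {}))
        for callback in self._watchers:
            callback(service, snapshot)

    def register(self, service, instance_id, address):
        self._services.setdefault(service, {})[instance_id] = address
        self._notify(service)

    def deregister(self, service, instance_id):
        self._services.get(service, {}).pop(instance_id, None)
        self._notify(service)


registry = ServiceRegistry()
seen = []
# Steps 3-4: other systems learn about every change to the registry.
registry.watch(lambda service, instances: seen.append((service, sorted(instances))))

registry.register("web", "i-98fb9856", "10.167.10.51")   # steps 1-2
registry.register("web", "i-933f81a4", "10.167.10.52")   # hypothetical address
registry.deregister("web", "i-98fb9856")                 # step 5
```

After the last call, every watcher has been told that only `i-933f81a4` remains in the `web` pool, which is the point at which a load balancer or client would stop sending it traffic.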
  92. 92. Service registry examples: • Zookeeper • MuleSoft Anypoint Service Registry • Netflix Eureka • IBM WebSphere Service Registry and Repository • Airbnb SmartStack
  93. 93. Zookeeper “is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.” – zookeeper.apache.org – leader election – group membership – configuration maintenance – event notification – locking – priority queue mechanism
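Leader election is commonly built on Zookeeper by having each candidate create a sequentially numbered node and treating the lowest number as the leader. The sketch below only simulates that selection rule in plain Python; it does not talk to Zookeeper, and the node names and the helpers `sequence_number` and `elect_leader` are made up for illustration.

```python
from typing import List, Optional


def sequence_number(node_name: str) -> int:
    # Assumes names end in "-<digits>", e.g. "candidate-0000000003".
    return int(node_name.rsplit("-", 1)[1])


def elect_leader(node_names: List[str]) -> Optional[str]:
    # The candidate holding the lowest sequence number is the leader.
    if not node_names:
        return None
    return min(node_names, key=sequence_number)


candidates = ["candidate-0000000007", "candidate-0000000003", "candidate-0000000012"]
first_leader = elect_leader(candidates)

# When the leader's node disappears, the next-lowest candidate takes over.
candidates.remove(first_leader)
second_leader = elect_leader(candidates)
```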
  94. 94. Zookeeper [Diagram: a Zookeeper leader host, Zookeeper instances, and worker instances in an Auto Scaling group (min=2), spread across Availability Zones in a Virtual Private Cloud in the AWS Cloud]
  95. 95. Enough from me!
  96. 96. Customer Story: Airbnb SmartStack Martin Rhoads
  97. 97. Airbnb SmartStack Helping you build Service Oriented Architectures Martin Rhoads SRE @ Airbnb November 13, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  98. 98. Intros not at Re:Invent Igor Serebryany + SRE at Airbnb since 2012 + Built datacenter automation at SingleHop + Scientific computing at University of Chicago + Hobbies: welding, biking, long walks on the beach
  99. 99. Intros This guy is even more bearded than the last! Martin Rhoads + SRE at Airbnb + user of AWS since 2006 + First 10 employees at RightScale + Previously worked at Cloudscaling deploying OpenStack at Tier1s and Telcos + BioInformatics at UCSB + Obsessed with making things easier
  100. 100. SmartStack Helping you build SOA
  101. 101. Why do I need SOA? What are you trying to sell me? + The definitive way to scale your architecture + Allow different people to work on different code without stepping on toes + Separate deployment schedules + Separate machine and data requirements + Fail separately -- so you can have graceful degradation
  102. 102. How SOA happens When customers love a service very, very much...
  108. 108. Here’s how it ends up A certain kind of fun
  109. 109. To sum up 1. Services help you scale 2. SOA is an architecture style designed around services 3. A SOA is hard to manage 4. SmartStack makes managing SOA a breeze
  110. 110. What is SmartStack? And how does it help?
  111. 111. 1. Service(s) you want to deliver 2. Zookeeper registry to track everything 3. Nerve checks health and updates Zookeeper 4. Synapse routes between services [Diagram labels: SERVICE, NERVE, ZOOKEEPER, SYNAPSE]
  112. 112. MONORAIL NERVE MOBILE WEB SYNAPSE NERVE SYNAPSE ZOOKEEPER + /production/monorail/services/i-1234567 => {'host': 1.2.3.4, 'port': 5678} + /production/mobile_web/services/i-0abcdef => {'host': 5.6.7.8, 'port': 5678}
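The registry entries on this slide pair a path (environment, service, instance ID) with a small host/port record. A minimal sketch of building and reading such entries follows; it assumes the record is stored as JSON, which the slide does not confirm, and the helper names `registry_path`, `encode_entry`, and `decode_entry` are hypothetical rather than part of Nerve or Synapse.

```python
import json
from typing import Tuple


def registry_path(environment: str, service: str, instance_id: str) -> str:
    # Mirrors the path layout shown on the slide.
    return f"/{environment}/{service}/services/{instance_id}"


def encode_entry(host: str, port: int) -> bytes:
    # Assumption: the record is serialized as JSON bytes.
    return json.dumps({"host": host, "port": port}, sort_keys=True).encode("utf-8")


def decode_entry(payload: bytes) -> Tuple[str, int]:
    data = json.loads(payload.decode("utf-8"))
    return data["host"], int(data["port"])


path = registry_path("production", "monorail", "i-1234567")
payload = encode_entry("1.2.3.4", 5678)
host, port = decode_entry(payload)
```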
  113. 113. haproxy At the core of synapse We get myriad benefits from haproxy + Stable and well-tested + Performs in-process connectivity checks + Great introspection and logging + Lots of load-balancing algorithms (RR, least-conn) + Somewhat dynamically reconfigurable (stats socket)
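To make the two load-balancing algorithms named above concrete, here is a small sketch of round-robin and least-connections selection. It illustrates the general ideas only and is not haproxy's implementation; the classes `RoundRobin` and `LeastConnections` are hypothetical.

```python
import itertools
from typing import Dict, Iterable


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._cycle = itertools.cycle(list(backends))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest connections currently open."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.active: Dict[str, int] = {b: 0 for b in backends}

    def acquire(self) -> str:
        # Ties go to the backend listed first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.active[backend] -= 1


rr = RoundRobin(["a", "b"])
rr_order = [rr.pick() for _ in range(3)]

lc = LeastConnections(["a", "b"])
first = lc.acquire()   # both idle, so the first backend wins the tie
second = lc.acquire()  # "a" is busy, so "b" is chosen
lc.release("b")
third = lc.acquire()   # "b" is idle again while "a" still holds a connection
```

Round-robin ignores how busy a backend is, while least-connections favors whichever backend has finished its work, which tends to matter when request durations vary widely.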
  114. 114. To Recap SmartStack in action
  115. 115. Why SmartStack? • Abstraction and DRY • Automatic failure detection • Introspection • Distributed by design
  116. 116. Abstraction + The same code in the same language is always doing discovery/registration + Your application doesn’t know about nerve/synapse -- it only knows about its dependencies + Always consistent across your infrastructure
  117. 117. Automatic Failure Handling You don’t have to wake up + Bad backends are automatically taken out of rotation + Useful during both problems and routine maintenance/deploys + Push-based => very rapid detection; avoid those little blips + haproxy even routes around network partitions!
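Taking bad backends out of rotation can be sketched as a tracker that marks a backend down after a number of consecutive failed checks and brings it back after it passes again. This is a simplified polling illustration, not the push-based mechanism the slide describes and not Nerve's or haproxy's code; `HealthTracker` and its `fall`/`rise` thresholds are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List


class HealthTracker:
    """Track which backends are in rotation based on repeated health checks."""

    def __init__(self, backends: Iterable[str], check: Callable[[str], bool],
                 fall: int = 2, rise: int = 1) -> None:
        self._check = check
        self._fall = fall  # consecutive failures before removal
        self._rise = rise  # consecutive successes before re-adding
        self._healthy: Dict[str, bool] = {b: True for b in backends}
        # Count of consecutive results that contradict the current state.
        self._streak: Dict[str, int] = {b: 0 for b in self._healthy}

    def run_checks(self) -> None:
        for backend, healthy in self._healthy.items():
            ok = self._check(backend)
            if ok == healthy:
                self._streak[backend] = 0
                continue
            self._streak[backend] += 1
            threshold = self._fall if healthy else self._rise
            if self._streak[backend] >= threshold:
                self._healthy[backend] = not healthy
                self._streak[backend] = 0

    def in_rotation(self) -> List[str]:
        return [b for b, healthy in self._healthy.items() if healthy]


# Simulated check results, keyed by backend address.
status = {"1.2.3.4:5678": True, "5.6.7.8:5678": True}
tracker = HealthTracker(status, check=lambda backend: status[backend])

status["5.6.7.8:5678"] = False
tracker.run_checks()
after_one_failure = tracker.in_rotation()   # one failure is tolerated
tracker.run_checks()
after_two_failures = tracker.in_rotation()  # second failure removes it

status["5.6.7.8:5678"] = True
tracker.run_checks()
after_recovery = tracker.in_rotation()
```

Requiring more than one failure before removal trades a little detection speed for fewer false alarms; the thresholds here are arbitrary.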
  118. 118. Introspection See what’s REALLY going on Leverage the power of haproxy + status page that lets you see local state + lots of available integrations to gather global state + world-class logging for large-scale analysis
  119. 119. Distributed by Design No central point of failure + Traffic flows directly between boxes -- no routing layer + Even if SmartStack is stopped or broken, haproxy keeps traffic flowing + Zookeeper helps to avoid common pitfalls (like different backends in different network segments)
  120. 120. The Impact How SmartStack has changed Airbnb • 100+ services using SmartStack • 2K requests per second • 3K LOC deleted • 30 engineers using SmartStack
  121. 121. Our engineers love SmartStack Spike: “Nerve and Synapse have greatly simplified my life as an application developer, and have enabled me to launch our first Node.js services with very little ops overhead.” Sean: “Smart Stack has made deployment of new java services a matter of beer and 20 lines of ruby” Ben: “SmartStack is great! It helped me to discover services – and quit smoking” Barbara: “I love it!” Phillippe: “Distributed computing? And all this time I thought everything was running on one machine”
  122. 122. Future Direction Is this project, like, done...? 1. Better resiliency: more graceful handling of zookeeper edge cases 2. Better testing: improve on the current integration test suite 3. Dynamic registration: for services running on Mesos et al. 4. A push API for nerve: allow services to communicate coming downtime 5. An auto-scaling layer: use nerve information to determine load levels
  123. 123. I’m sold! How do I get started?
  124. 124. Getting Started 1. install Vagrant 2. git clone https://github.com/airbnb/smartstack-cookbook.git 3. vagrant up
  125. 125. Where is the code? https://github.com/airbnb/nerve.git https://github.com/airbnb/synapse.git
  126. 126. AWS re:Invent Pub Crawl Join the AWS Startup Team this evening at the AWS Pub Crawl When: Wednesday November 13, 5:30pm - 7:30pm Where: Canaletto at The Venetian, 2nd Floor Who Will Be There: Startups, the AWS Startup Team, Startup Launch Companies, and AWS re:Invent Hackathon winners
  127. 127. Startup Spotlight Sessions with Dr. Werner Vogels Thurs. Nov 14, Marcello Room 4406 SPOT 203 – Fireside Chats – Startup Founders, 1:30-2:30pm – Eliot Horowitz, CTO of MongoDB – Jeff Lawson, CEO of Twilio – Valentino Volonghi, Chief Architect of AdRoll SPOT 204 – Fireside Chats – Startup Influencers, 3:00-4:00pm – Albert Wenger, Managing Partner at Union Square Ventures – David Cohen, Founder and CEO of TechStars SPOT 101 – Startup Launches, 4:15-5:15pm – 5 companies powered by AWS launching at AWS re:Invent 2013
  128. 128. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.