Pets vs. Cattle: The Elastic Cloud Story

52,045 views

My recent presentation to the Chicago DevOps Meetup explains how we're moving from a "servers as pets" world to a "servers as cattle" world. Understanding this change is critical to success in cloud, DevOps, and delivering new value to the enterprise.

Published in: Technology, Business

  1. Pets vs. Cattle: The Elastic Cloud Story. DevOps Chicago Meetup, February 26, 2014. @randybias. CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*. * All unlicensed or borrowed works retain their original licenses.
  2. A Tale of Two Clouds
  3. Enterprise Computing Approach: GUI Driven, Ticket-Based, Hand-Crafted, Reserved, Scale-up, Smart Hardware, Proprietary, Traditional Dev, …
  4. Cloud Computing Approach: API Driven, Self-Service, Automated, On-demand, Scale-out, Smart Apps, Open Source, Agile DevOps, …
  5. Elastic Cloud Shifts Uptime Responsibility. Enterprise Model: 99.9% Applications (8h46m down) on 99.999% Infrastructure ($$$$). Cloud Model: 99.999% Applications (5m down) on 99% Infrastructure ($$).
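The downtime figures on slide 5 follow from simple arithmetic on the availability percentage. Below is a minimal sketch of that calculation; the function names and the choice of a 365.25-day year are assumptions made for illustration, not something the slides specify.

```python
# Assumption: an "average" year of 365.25 days; the slide does not say which year length it uses.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes per year a system may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def format_downtime(minutes: float) -> str:
    """Render minutes as 'XhYYm', rounded to the nearest minute."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h{mins:02d}m"


if __name__ == "__main__":
    for availability in (99.0, 99.9, 99.999):
        minutes = downtime_minutes_per_year(availability)
        print(f"{availability}% -> {format_downtime(minutes)} down per year")
```

Under these assumptions, 99.9% works out to roughly 8h46m per year and 99.999% to roughly 5 minutes, which matches the numbers quoted on the slide.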
  6. Elastic Cloud Origins. Elastic Private Cloud ≠ Enterprise Virtualization Private Cloud. Elastic & Virtualization 2.0 clouds are very different: different workloads, different architectures, different skills, different economics. Enterprise Virtualization Private Cloud = Virtual Infrastructure + Standardization, Automation, Chargeback, Self-Service: designed for server consolidation; IT admins manage infrastructure; ticket-based manual provisioning; improves virtualization value. Elastic Private Cloud = Elastic Public Cloud + On-premise Deployment: designed for agility; cloud admins manage services; self-service automated provisioning; delivers cloud value on-premise.
  7. What Companies Care About? Accelerating time to value. Four areas: Cloud Computing, Agile Development, Business Agility, Operational Discipline. Supporting items: Continuous Integration, Continuous Testing & Delivery, Agile Methodologies; IaaS / PaaS, Public / Private / Hybrid, Big Data / Analytics, Public APIs; Continuous Deployment, DevOps, Data Center & App Automation; Line of Business Enablement, New App Initiatives (Mobile, SaaS, etc.), Data Center Modernization.
  8. Elastic Cloud is a Mindset Change. bowzer.company.com (scale-up) vs. web001.company.com (scale-out). (Virtual) servers *are* cattle. Attribution: Bill Baker, Distinguished Engineer, Microsoft.
  9. Pets vs. Cattle Takes Off. Microsoft, Cloudscaling, CERN, IBM, Scalr, Rackspace, Red Hat. Scale-out, not UP in Cloud.
  10. (Some) Elastic Cloud Patterns. What follows are *some* Elastic Cloud Patterns. There are many more, but these are mine. Input, ideas, & other thoughts welcome via twitter / email.
  11. Big Failure Domains Make Big Craters
  12. Big Failure Domains Make Big Craters. Anti-Pattern.
  13. Smaller Failure Domains. Would you rather have the whole cloud down, or just a small bit of it for a short time?
  14. Loose Coupling. Synchronous, blocking calls mean cascading failures. Async, non-blocking calls mean failure in isolation.
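One way to picture the loose-coupling claim on slide 14 is a fan-out where each downstream call is non-blocking and bounded by a timeout, so a slow dependency fails on its own instead of stalling its neighbours. This is only a sketch: `call_service`, the service names, and the delay and timeout values are hypothetical stand-ins for real network calls.

```python
import asyncio


async def call_service(name: str, delay: float) -> str:
    """Hypothetical downstream call; `delay` simulates how long the service takes."""
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def call_with_timeout(name: str, delay: float, timeout: float = 0.05) -> str:
    """Bound the call so one slow dependency cannot hold up the caller."""
    try:
        return await asyncio.wait_for(call_service(name, delay), timeout)
    except asyncio.TimeoutError:
        return f"{name}: timed out"


async def fan_out() -> list:
    """Issue all calls concurrently; each one succeeds or fails in isolation."""
    return await asyncio.gather(
        call_with_timeout("fast-a", 0.0),
        call_with_timeout("slow-b", 1.0),  # assumed to be the misbehaving service
        call_with_timeout("fast-c", 0.0),
    )


if __name__ == "__main__":
    print(asyncio.run(fan_out()))
```

With synchronous, blocking calls made one after another, the slow service would delay everything queued behind it; here only its own result degrades.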
  15. Open Source Software. Excessive software taxation is the past. Black boxes create lock-in. You can always fork.
  16. Uptime in Software Self-management. Hardware fails. Software fails. People fail. Only software can measure itself & respond to failure in near real-time. Applications designed for 99.999% uptime can run anywhere.
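Slide 16's idea of software measuring itself and responding to failure can be sketched as a reconciliation step: look at the health of every instance, discard the failed ones, and start replacements until the desired count is met. Everything here is an illustrative assumption, including the `reconcile` name, the health map, and the `webNNN` naming borrowed from slide 8; a real system would call a cloud API where this sketch just adds a dictionary entry.

```python
from itertools import count


def reconcile(fleet: dict, desired: int, ids) -> dict:
    """Drop unhealthy instances and add replacements until `desired` are healthy.

    `fleet` maps instance name -> health flag (True means healthy).
    `ids` is an iterator of unused instance numbers (hypothetical naming scheme).
    """
    healthy = {name: True for name, ok in fleet.items() if ok}
    while len(healthy) < desired:
        # Stand-in for launching a fresh instance through a provisioning API.
        healthy[f"web{next(ids):03d}"] = True
    return healthy


if __name__ == "__main__":
    fleet = {"web001": True, "web002": False, "web003": True}
    print(reconcile(fleet, desired=3, ids=count(4)))
```

The failed instance is not repaired; it is replaced, which is the cattle mindset the talk describes.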
  17. Scale Out vs Scale Up. Vertical scaling: make boxes bigger (usually an HA pair). Horizontal scaling: make more boxes. (Diagram: boxes A and B grow larger under vertical scaling; boxes A, B, C ... N are added under horizontal scaling.)
  18. Circuit Breaker Pattern. Fallback mechanisms (e.g. cached data) ensure uninterrupted service while giving the failing service time to recover. When a failing service is detected, stop calling that API and serve fallback responses.
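The circuit breaker described on slide 18 can be sketched in a few lines. This is a minimal, single-threaded illustration rather than a production implementation; the class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Stop calling a failing service for a while and serve a fallback instead."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: skip the service entirely
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_live_prices, lambda: cached_prices)`, where both names are hypothetical. While the circuit is open the failing service receives no traffic, which is what gives it time to recover.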
  19. Buy from ODMs. ODMs operate their businesses on 3-10% margins. AMZN, GOOG, and Facebook buy direct without a middleman. Only a few enterprise vendors are pivoting to compete.
  20. Less Enterprise “Value” in x86 Servers. Generic servers rule. Full stop. Nothing is better because nothing else is *generic*. “... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP
  21. Fully Routed (L3) Networking. The largest cloud operators all run layer-3 routed networks with no VLANs. Cloud-ready apps don’t need or want VLANs. Enterprise apps can be supported on elastic clouds using Software-defined Networking (SDN).
  22. Software-defined Networking (SDN): “Network Virtualization”
     • x86 server is the new Linecard
     • network switch is the new ASIC
     • VXLAN (or NVGRE) is the new Chassis
     • SDN Controller is the new SUP Engine
  23. Flat Networking + SDNs. Flat + SDN co-exist & thrive together. (Diagram labels: Standard Security Group; Availability Zone; Physical Node; VMs; Virtual L2 Network; Virtual Private Cloud Networking; VPC Security Group; VPC Gateway; Internet.)
  24. RAIS instead of HA Pairs/Clusters
     • Redundant arrays of inexpensive services (RAIS)
     • Load balanced with no state sharing
     • Active … active … active … active …
     • On failure, connections are lost, but failures are rare
     • Rolling upgrades are easier, because each server is an island
     • Think: scale-out + fault isolation (sharding)
     • Ridiculously simple & scalable
     • Hardware failures are infrequent & impact a subset of traffic
     • Remaining capacity is (N-F)/N, where N = total, F = failed
     • 10 RAIS servers - 1 failure == 90% capacity
     • Most things retry anyway
     • Cascade failures are unlikely and failure domains are small
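The (N-F)/N capacity rule from slide 24 is easy to check numerically. The helper below is a small sketch; the function name and the input validation are additions for illustration only.

```python
def remaining_capacity(total: int, failed: int) -> float:
    """Fraction of capacity left in a service array: (N - F) / N."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return (total - failed) / total


if __name__ == "__main__":
    # The slide's own example: 10 RAIS servers with 1 failure.
    print(f"{remaining_capacity(10, 1):.0%}")
    # Larger arrays lose proportionally less to a single failure.
    for n in (4, 10, 50):
        print(n, f"{remaining_capacity(n, 1):.0%}")
```

The same formula shows why larger arrays of small, stateless services shrink the failure domain: each additional member reduces the share of traffic a single failure can affect.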
  25. Service Array (RAIS) Example. (Diagram labels: Backbone Routers; Cloud Access Switches; AZ (Spine) Switches; RAIS (NAT, LB, VPN); OSPF Route Announcements; Return Traffic (default or source NAT); API; Public IP Blocks; Cloud Control Plane.)
  26. Lots of Inexpensive 1RU Switches. Simple spine-and-leaf flat routed network. 1RU: 6K-30K VMs / AZ. Modular: 40K-200K VMs / AZ. (Diagram: racks attached to spine switches.)
  27. Direct-attached Storage (DAS). Cloud-ready apps manage their own data replication. DAS is the smallest failure domain possible with reasonable storage I/O. SAN == massive failure domain. SSDs will be the great equalizer.
  28. Elastic Block Device Services. EBS/EBD is a crutch: bigger failure domains (AWS outage anyone?), complex, sets high expectations. Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler. AWS EBS Outage: http://aws.amazon.com/message/65648/
  29. More Servers == More Storage I/O. >1M writes/second, triple-redundancy w/ Cassandra on AWS. Linear scale-out == linear costs for performance.
  30. Hypervisors are a Commodity. Cloud end-users want OS of choice, not HVs. Level up! Managing iron is for mainframe operators … hypervisors are bare metal APIs. Hypervisor of the future is open source, easily modifiable, & extensible.
  31. The Hypervisor of the Future May Be NO Hypervisor. LXC, Ironic, Bare Metal Cloud.
  32. Quiz Time
  33. Quiz Time (Pets or Cattle?): NIC bonding?
  34. Quiz Time (Pets or Cattle?): NIC bonding ➔
  35. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time?
  36. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time ➔
  37. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling?
  38. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling ➔
  39. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling; Design-for-Failure?
  40. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling; Design-for-Failure ➔
  41. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling; Design-for-Failure; 100% Uptime Goals?
  42. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling; Design-for-Failure; 100% Uptime Goals ➔
  43. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling; Design-for-Failure; 100% Uptime Goals; HA pairs for redundancy?
  44. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling; Design-for-Failure; 100% Uptime Goals; HA pairs for redundancy ➔
  45. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling; Design-for-Failure; 100% Uptime Goals; HA pairs for redundancy; Shared Nothing Architecture?
  46. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling; Design-for-Failure; 100% Uptime Goals; HA pairs for redundancy; Shared Nothing Architecture ➔
  47. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling; Design-for-Failure; 100% Uptime Goals; HA pairs for redundancy; Shared Nothing Architecture; Persistent Block Storage?
  48. Quiz Time (Pets or Cattle?): NIC bonding; Managing a Server at a Time; Auto-scaling; Design-for-Failure; 100% Uptime Goals; HA pairs for redundancy; Shared Nothing Architecture; Persistent Block Storage ➔
  49. Q & A. Randy Bias, Founder & CEO, Cloudscaling; Director, OpenStack Foundation. @randybias