The document discusses the shift from traditional enterprise computing to cloud computing, emphasizing the differences between elastic and traditional models. It highlights the importance of scaling out rather than scaling up, along with the need for agile practices, self-service, and automation in cloud environments. Key themes include uptime responsibility, failure domains, and the evolution of infrastructure management towards more efficient and flexible solutions.
Introduction to Elastic Cloud concepts and the DevOps Meetup context.
Comparison of traditional enterprise computing with cloud computing, focusing on drive, automation, scale, and flexibility.
Discussion on uptime responsibilities and differences between enterprise and elastic cloud models regarding failure.
Emphasis on what companies prioritize in cloud computing and the mindset change required for adopting elastic clouds.
The concept of treating servers as cattle rather than pets to focus on scale and agility in cloud environments.
Concept of failure domains; discusses both big and smaller failure domains and the importance of loose coupling.
The significance of open source software in creating flexible, non-lock-in cloud infrastructures and self-management.
Differences between vertical and horizontal scaling, and the operational models of companies that buy directly from ODMs.
Discussion on networking strategies, including L3 networking and software-defined networking in cloud environments.
Innovative approaches to redundancy (RAIS) and storage strategies to ensure high availability and minimize failure.
Impact of adding servers on storage I/O, the commoditization of hypervisors, and the future direction.
Speculative trends indicating the potential phasing out of hypervisors and a move towards bare metal solutions.
A quiz segment reinforcing key concepts discussed about pets versus cattle, scaling, and design-for-failure practices.
Wrap-up with a Q&A session with Randy Bias, allowing for audience interaction and discussion on the topics presented.
CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*
* All unlicensed or borrowed works retain their original licenses
Pets vs. Cattle:
The Elastic Cloud Story
DevOps Chicago Meetup
February 26, 2014
@randybias
6.
Elastic Cloud Origins
Enterprise Virtualization Private Cloud ≠ Elastic Private Cloud
Elastic & Virtualization 2.0 Clouds are very different.
Different workloads.
Different architectures.
Different skills.
Different economics.
Enterprise Virtualization Private Cloud = Virtual Infrastructure + Standardization, Automation, Chargeback, Self-Service
Designed for Server Consolidation
IT Admins manage Infrastructure
Ticket-based manual provisioning
Improves virtualization value
Elastic Private Cloud = Elastic Public Cloud + On-premise Deployment
Designed for Agility
Cloud Admins manage Services
Self-service automated provisioning
Delivers cloud value on-premise
7.
What Companies Care About?
ACCELERATING TIME TO VALUE
Cloud Computing: IaaS / PaaS; Public / Private / Hybrid; Big Data / Analytics; Public APIs
Agile Development: Continuous Integration; Continuous Testing & Delivery; Agile Methodologies
Operational Discipline: Continuous Deployment; DevOps; Data Center & App Automation
Business Agility: Line of Business Enablement; New App Initiatives (Mobile, SaaS, etc.); Data Center Modernization
8.
Elastic Cloud is a Mindset Change
Attribution: Bill Baker, Distinguished Engineer, Microsoft
bowzer.company.com (scale-up)
web001.company.com (scale-out)
(Virtual) Servers *are* cattle
9.
Pets vs. Cattle Takes Off
Microsoft
Cloudscaling
CERN
IBM
Scalr
Rackspace
Red Hat
Scale-out, not UP in Cloud
10.
(Some) Elastic Cloud Patterns
What follows are *some* Elastic Cloud Patterns
There are many more, but these are mine
Input, ideas, & other thoughts welcome via Twitter / email
16.
Uptime in Software Self-management
Hardware fails.
Software fails.
People fail.
Only software can measure itself & respond to failure in near real-time.
Applications designed for 99.999% uptime can run anywhere.
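To put the 99.999% figure in perspective, the arithmetic below converts an availability target into a yearly downtime budget. This is only a sketch: the function name `downtime_budget_minutes` and the 365-day year are assumptions made for illustration and do not come from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes per year a service may be down at the given availability (in percent)."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_budget_minutes(pct)
    print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At five nines the budget is a little over five minutes a year, which leaves almost no room for a human to notice and react. That is consistent with the slide's point that software has to measure itself and respond.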
17.
Scale Out vs Scale Up
Vertical Scaling
Make boxes bigger (usually an HA pair)
A B ➔ A B (bigger boxes)
Horizontal Scaling
Make more boxes
A B ➔ A B C ... N
18.
Circuit Breaker Pattern
Fallback mechanisms (e.g. cached data) ensure uninterrupted service while giving the service time to recover.
When a failing service is detected, stop calling that API and serve fallback responses.
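The two statements above can be sketched as a small wrapper around a remote call. This is a minimal illustration, not a production implementation: the class name `CircuitBreaker`, the thresholds `max_failures` and `reset_after`, and the `flaky_service` and `cache` stand-ins are all hypothetical and do not come from the slides.

```python
import time


class CircuitBreaker:
    """Stop calling a failing service for a while and serve fallbacks instead."""

    def __init__(self, call, fallback, max_failures=3, reset_after=30.0,
                 clock=time.monotonic):
        self.call = call                  # the real service call
        self.fallback = fallback          # e.g. a lookup in cached data
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def request(self, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Circuit is open: do not touch the failing service.
                return self.fallback(*args)
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = self.call(*args)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return self.fallback(*args)
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def flaky_service(key):
    # Hypothetical stand-in for a remote API that is currently down.
    raise ConnectionError("service unavailable")


cache = {"user:1": "cached profile"}  # hypothetical cached data
breaker = CircuitBreaker(flaky_service, cache.get, max_failures=2)
for _ in range(4):
    print(breaker.request("user:1"))
```

Every request in the loop returns the cached value. After the second failure the breaker stops calling `flaky_service` entirely until the cool-down has passed, which is what gives the service time to recover.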
19.
Buy from ODMs
ODMs operate their businesses on 3-10% margins.
AMZN, GOOG, and Facebook buy direct without a middleman.
Only a few enterprise vendors are pivoting to compete.
20.
Less Enterprise “Value” in x86 Servers
Generic servers rule. Full stop. Nothing is better because nothing else is *generic*.
“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP
21.
Fully Routed (L3) Networking
The largest cloud operators all run layer-3 routed networks with no VLANs.
Cloud-ready apps don’t need or want VLANs.
Enterprise apps can be supported on elastic clouds using Software-defined Networking (SDN).
22.
Software-defined Networking (SDN)
• x86 server is the new Linecard
• network switch is the new ASIC
• VXLAN (or NVGRE) is the new Chassis
• SDN Controller is the new SUP Engine
“Network Virtualization”
23.
Flat Networking + SDNs
Flat + SDN co-exist & thrive together
Diagram labels: Standard Security Group; Availability Zone (1, 2); VMs; Virtual L2 Network; Virtual Private Cloud Networking; VPC Security Group; Internet; VPC Gateway; Physical Node
24.
RAIS instead of HA Pairs/Clusters
Redundant arrays of inexpensive services (RAIS)
Load balanced with no state sharing
Active … active … active … active …
On failure, connections are lost, but failures are rare
Rolling upgrades are easier, because each server is an island
Think: scale-out + fault isolation (sharding)
Ridiculously simple & scalable
Hardware failures are infrequent & impact subset of traffic
(N-F)/N, where N = total, F = failed
10 RAIS servers - 1 failure == 90% capacity
Most things retry anyway
Cascade failures are unlikely and failure domains are small
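The (N-F)/N capacity rule above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `remaining_capacity` and the array sizes other than 10 are chosen here for illustration and are not from the slides.

```python
def remaining_capacity(total: int, failed: int) -> float:
    """Fraction of capacity left in a RAIS array: (N - F) / N."""
    if total <= 0 or not 0 <= failed <= total:
        raise ValueError("need total > 0 and 0 <= failed <= total")
    return (total - failed) / total


# One failed server in arrays of different sizes.
for n in (4, 10, 100):
    print(f"{n} RAIS servers - 1 failure == {remaining_capacity(n, 1):.0%} capacity")
```

With 10 servers the result matches the 90% figure from the slide. The larger the array, the smaller the share of traffic a single failure touches, which is one way to read the claim that failure domains are small.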
25.
Service Array (RAIS) Example
Backbone Routers
Cloud Access Switches
AZ (Spine) Switches
RAIS (NAT, LB, VPN)
OSPF Route Announcements
Return Traffic (default or source NAT)
API
Public IP Blocks
Cloud Control Plane
27.
Direct-attached Storage (DAS)
Cloud-ready apps manage their own data replication.
DAS is the smallest failure domain possible with reasonable storage I/O.
SAN == massive failure domain.
SSDs will be the great equalizer.
28.
Elastic Block Device Services
EBS/EBD is a crutch
Bigger failure domains (AWS outage anyone?), complex, sets high expectations
Sometimes you need a crutch.
When you do, overbuild the network, and make sure you have a smart scheduler.
AWS EBS Outage
http://aws.amazon.com/message/65648/
29.
More Servers == More Storage I/O
>1M writes/second, triple-redundancy w/ Cassandra on AWS
Linear scale-out == linear costs for performance
30.
Hypervisors are a Commodity
Cloud end-users want OS of choice, not HVs.
Level up! Managing iron is for mainframe operators.
… hypervisors are bare metal APIs
Hypervisor of the future is open source, easily modifiable, & extensible.
31.
The Hypervisor of the Future May Be NO Hypervisor
LXC
Ironic
Bare Metal Cloud
41-48.
Quiz Time
Pets or Cattle? The slides build this list one item at a time, posing each new item as a question and then marking its answer with an arrow (➔):
NIC bonding
Managing a Server at a Time
Auto-scaling
Design-for-Failure
100% Uptime Goals
HA pairs for redundancy
Shared Nothing Architecture
Persistent Block Storage