My presentation from Cloud Connect 2012 covering architectural and design patterns for open and scalable clouds. A technical deck targeted at business audiences with a technical bent.
Randy Bias: Cloud Pioneer, Founding Member of the OpenStack Foundation, and Technology Disruptor
1. Architectures for open and scalable clouds
February 14, 2012
Randy Bias, CTO & Co-founder
CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution
2. Our Perspective on Cloud Computing
It came from the large Internet players.
5. Tenets of Open & Scalable Clouds
1. Avoid vendor lock-in like bubonic plague
• See also Open Cloud Initiative (opencloudinitiative.org)
2. Simplicity scales, complexity fails
• 10x bigger == 100x more complex
3. TCO matters; measuring ROI is critical to success
4. Security is paramount ... but different
5. Risk acceptance over risk mitigation
6. Agility & iteration over big bang
6. This is a BIG Topic
• What I am covering today is patterns in:
• Hardware and software
• Networking, storage, and compute
• NOT covered today:
• Cloud operations
• Infrastructure software engineering
• Measuring success through operational excellence
• Security
9. Here we go ...
• Elements:
• Open APIs & protocols
• Open hardware
• Open networking
• Open source software (OSS)
• Combined with:
• Architectural patterns, best
practices, & de facto standards
• Operational excellence
15. Threads
• Small failure domains have less impact
• Loose-coupling minimizes cascade failures
• Scale-out over scale-up with exceptions
• More AND cheaper
• State synchronization is dangerous (remember CAP)
• Everything has an API
• Automation ONLY works w/ homogeneity & modularity
• Lowest common denominator (LCD) services (LBaaS vs F5aaS)
• People are the number one source of failures
16. Pattern:
Loose coupling
Synchronous, blocking calls mean cascading failures.
Async, non-blocking calls mean failure in isolation.
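The loose-coupling pattern above can be sketched with a non-blocking call plus a timeout: the caller degrades gracefully instead of hanging on a sick dependency. The service names and timings below are illustrative assumptions, not from the deck.

```python
import asyncio

# Hypothetical downstream service that has hung (simulated with a long sleep).
async def slow_dependency():
    await asyncio.sleep(10)
    return "data"

async def handle_request():
    # Non-blocking call with a timeout: the failure stays isolated here
    # instead of cascading up the call chain to every blocked caller.
    try:
        return await asyncio.wait_for(slow_dependency(), timeout=0.1)
    except asyncio.TimeoutError:
        return "fallback"  # degrade gracefully rather than block

print(asyncio.run(handle_request()))  # prints "fallback"
```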
17. Pattern:
Open source software
Excessive software taxation is the past.
You can always fork.
Black boxes create lock-in.
18. Pattern:
Uptime in software - self management
Hardware fails. Software fails. People fail.
Only software can measure itself & respond to failure in near real-time.
Applications designed for 99.999% uptime can run anywhere.
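The self-management pattern above reduces to a loop in which software checks its own health and repairs failures without a human in the path. This is a minimal sketch under assumed names (`Worker`, `supervise`), not an implementation from the deck.

```python
# Minimal self-healing sketch: the supervisor, not an operator, detects an
# unhealthy worker and restarts it in near real-time.
class Worker:
    def __init__(self):
        self.healthy = True
        self.restarts = 0

    def health_check(self):
        return self.healthy

    def restart(self):
        self.restarts += 1
        self.healthy = True

def supervise(worker, checks=3):
    # Poll the worker's own health signal and respond to failure automatically.
    for _ in range(checks):
        if not worker.health_check():
            worker.restart()

w = Worker()
w.healthy = False  # simulate a failure
supervise(w)
print(w.restarts, w.healthy)  # 1 True
```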
19. Pattern:
Scale-out, not UP
Scale Up: (Virtual*) servers are like pets. You name them, and when they get sick, you nurse them back to health.
garfield.company.com
attrib: Bill Baker, Distinguished Engineer, Microsoft
* added by yours truly ...
20. Pattern:
Scale-out, not UP
Scale Up: (Virtual*) servers are like pets. You name them, and when they get sick, you nurse them back to health.
garfield.company.com
Scale Out: (Virtual*) servers are like cattle. You number them, and when they get sick, you shoot them.
web001.company.com
attrib: Bill Baker, Distinguished Engineer, Microsoft
* added by yours truly ...
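The cattle pattern implies that servers are interchangeable and disposable: numbered, terminated on failure, and replaced with a fresh instance rather than repaired. A toy sketch, with hostnames assumed purely for illustration:

```python
from itertools import count

# Numbered, disposable "cattle" servers: never reuse a name, never repair.
_seq = count(1)

def new_server():
    # Each replacement gets the next number, e.g. web001, web002, ...
    return f"web{next(_seq):03d}.company.com"

fleet = [new_server() for _ in range(3)]
# web002 gets sick: shoot it and boot a fresh replacement.
fleet.remove("web002.company.com")
fleet.append(new_server())
print(fleet)  # ['web001.company.com', 'web003.company.com', 'web004.company.com']
```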
21. Pattern:
Buy from ODMs
ODMs operate their businesses on 3-10% margins.
AMZN, GOOG, and Facebook buy direct without a middleman.
Only a few enterprise vendors are pivoting to compete.
22. Pattern:
Less enterprise “value” in x86 servers
Generic servers rule. Full stop. Nothing is better because nothing else is *generic*.
“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP
23. Pattern:
Flat Networking
The largest cloud operators all run layer-3 routed, flat networks with no VLANs.
Cloud-ready apps don’t need or want VLANs.
Enterprise apps can be supported on open clouds using Software-defined Networking (SDN).
24. Pattern:
Software-defined Networking (SDN): “Network Virtualization”
• x86 server is the new Linecard
• network switch is the new ASIC
• VXLAN (or NVGRE) is the new Chassis
• SDN Controller is the new SUP Engine
25. Pattern:
Flat Networking + SDNs
Flat + SDN co-exist & thrive together.
[Diagram: two tenants sharing the same physical nodes behind the Internet: (1) standard availability-zone networking with VM security groups, and (2) VPC (Virtual Private Cloud) networking, with VMs on a virtual L2 network behind a gateway.]
26. Pattern:
RAIS instead of HA pairs/clusters
• Redundant arrays of inexpensive services (RAIS)
• Load balanced
• No state sharing
• On failure, connections are lost, but failures are rare
• Ridiculously simple & scalable
• Most things retry anyway
• Hardware failures are infrequent & impact a subset of traffic
• Capacity remaining = (N-F)/N, where N = total, F = failed
• Cascade failures are unlikely and failure domains are small
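The (N-F)/N formula above is worth making concrete: in a load-balanced array a single failure costs only a small slice of capacity, versus half the capacity when one member of an HA pair dies. The node counts below are illustrative.

```python
# Remaining capacity of a RAIS array per the slide's formula (N-F)/N,
# where N = total service instances and F = failed instances.
def remaining_capacity(n_total, n_failed):
    return (n_total - n_failed) / n_total

# One failure in a 20-node load-balanced array costs only 5% of capacity...
print(remaining_capacity(20, 1))  # 0.95
# ...versus 50% for a failed member of an HA pair.
print(remaining_capacity(2, 1))   # 0.5
```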
27. Service array (RAIS) example:
[Diagram: public IP blocks announced via OSPF from backbone routers down to a RAIS tier (NAT, LB, VPN), connected through cloud access switches to the AZ (spine) switches; return traffic uses default or source NAT, and an API-driven cloud control plane manages the array.]
30. Pattern:
Direct-attached Storage (DAS)
Cloud-ready apps manage their own data replication.
DAS is the smallest failure domain possible with reasonable storage I/O.
SAN == massive failure domain.
SSDs will be the great equalizer.
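"Cloud-ready apps manage their own data replication" can be sketched as a quorum write across independent DAS-backed nodes: any single disk is a small failure domain because the app tolerates its loss. This is an assumed, illustrative shape (the `replicated_write` helper and dict-backed nodes are stand-ins), not a real storage API.

```python
# App-managed replication over DAS: write to several independent nodes and
# succeed on a quorum, so one failed local disk doesn't fail the write.
def replicated_write(nodes, key, value, quorum=2):
    acks = 0
    for node in nodes:
        try:
            node[key] = value  # each dict stands in for one node's local DAS
            acks += 1
        except Exception:
            continue  # a failed node is skipped, not fatal
    return acks >= quorum

class DeadNode(dict):
    def __setitem__(self, k, v):
        raise IOError("disk failure")

nodes = [{}, DeadNode(), {}]
print(replicated_write(nodes, "k", "v"))  # True: 2 of 3 acks meet quorum
```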
31. Pattern:
Elastic Block Device Services
EBS/EBD is a crutch for poorly written apps.
It means bigger failure domains (AWS outage, anyone?), added complexity, and high expectations.
Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler.
32. Pattern:
More Servers == More Storage I/O
>1M writes/second, triple-redundancy w/ Cassandra on AWS.
Linear scale-out == linear costs for performance.
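The linear-scaling claim above is simple arithmetic: aggregate write throughput grows with node count, divided by the replication factor. The per-node numbers below are assumptions for illustration, not the actual Cassandra-on-AWS figures behind the slide.

```python
# Back-of-envelope for "more servers == more storage I/O": logical write
# throughput scales linearly with node count at a fixed replication factor.
def cluster_writes_per_sec(nodes, per_node_writes, replication_factor=3):
    # Each logical write is applied replication_factor times across the cluster.
    return nodes * per_node_writes / replication_factor

# Doubling the nodes doubles throughput (and, linearly, cost).
print(cluster_writes_per_sec(100, 10_000))  # ~333,333 logical writes/sec
print(cluster_writes_per_sec(200, 10_000))  # ~666,666 logical writes/sec
```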
33. Pattern:
Hypervisors are a commodity
Cloud end-users want OS of choice, not HVs.
Level up! Managing iron is for mainframe operators.
The hypervisor of the future is open source, easily modifiable, & extensible.