Architectures for open and scalable clouds

Randy Bias, Cloud Pioneer, Founding Member of OpenStack Foundation, and Technology Disruptor
February 14, 2012

Randy Bias, CTO & Co-founder




                               CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution
Our Perspective on Cloud Computing
      It came from the large Internet players.




A Story of Two Clouds




Tenets of Open & Scalable Clouds



1. Avoid vendor lock-in like the bubonic plague
  • See also Open Cloud Initiative (opencloudinitiative.org)

2. Simplicity scales, complexity fails
  • 10x bigger == 100x more complex
3. TCO matters; measuring ROI is critical to success
4. Security is paramount ... but different
5. Risk acceptance over risk mitigation
6. Agility & iteration over big bang
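
One way to read tenet 2 ("10x bigger == 100x more complex") is that the number of potential interactions between components grows roughly with the square of the component count. That reading is an assumption, not something the slide states. A minimal sketch counting pairwise links (the function name is illustrative):

```python
def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n components: n * (n - 1) / 2."""
    return n * (n - 1) // 2

small = pairwise_links(10)     # 45 possible pairs
large = pairwise_links(100)    # 4950 possible pairs

# 10x the components gives roughly 100x the possible interactions.
print(large / small)           # 110.0
```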



This is a BIG Topic



• What I am covering today is patterns in:
  • Hardware and software
  • Networking, storage, and compute
• NOT covered today:
  • Cloud operations
  • Infrastructure software engineering
  • Measuring success through operational excellence
  • Security



Open Clouds
  (briefly)




A Word on ‘Open’




Here we go ...



• Elements:
  • Open APIs & protocols
  • Open hardware
  • Open networking
  • Open source software (OSS)
• Combined with:
  • Architectural patterns, best practices, & de facto standards
  • Operational excellence

Open APIs & Protocols




Open Hardware




Open Networking
Published Networking Blueprints

Open Source Software




Open Cloud OS




Open & Scalable Cloud Patterns

Threads



• Small failure domains have less impact
• Loose coupling minimizes cascade failures
• Scale-out over scale-up, with exceptions
• More AND cheaper
• State synchronization is dangerous (remember CAP)
• Everything has an API
• Automation ONLY works w/ homogeneity & modularity
• Lowest common denominator (LCD) services (LBaaS vs F5aaS)
• People are the number one source of failures

Pattern: Loose coupling

Synchronous, blocking calls mean cascading failures.

Async, non-blocking calls mean failure in isolation.

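The loose-coupling pattern can be sketched with Python's asyncio. This is a minimal illustration, not anything from the talk: the service names, delays, and timeout are hypothetical stand-ins. Each dependency is called concurrently with a bounded wait, so a slow or broken one degrades only its own result.

```python
import asyncio

async def call_service(name: str, delay: float, fail: bool = False) -> str:
    """Stand-in for a remote call; `delay` and `fail` simulate a slow or broken dependency."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name}: ok"

async def guarded(name: str, delay: float, fail: bool = False, timeout: float = 0.1) -> str:
    """Bound the wait and turn any failure into a local, degraded result."""
    try:
        return await asyncio.wait_for(call_service(name, delay, fail), timeout)
    except (asyncio.TimeoutError, RuntimeError):
        return f"{name}: degraded"

async def handle_request() -> list:
    # The three calls run concurrently; one bad dependency cannot stall the others.
    return await asyncio.gather(
        guarded("auth", 0.01),
        guarded("billing", 1.0),                      # too slow: times out
        guarded("recommendations", 0.01, fail=True),  # raises: contained
    )

results = asyncio.run(handle_request())
print(results)
# ['auth: ok', 'billing: degraded', 'recommendations: degraded']
```

A synchronous version of `handle_request` would block on "billing" for the full second and would propagate the "recommendations" exception to the caller, which is the cascading behavior the slide warns about.
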
Pattern: Open source software

Excessive software taxation is a thing of the past.

You can always fork.

Black boxes create lock-in.

Pattern: Uptime in software - self management

Hardware fails. Software fails. People fail.

Only software can measure itself & respond to failure in near real-time.

Applications designed for 99.999% uptime can run anywhere.

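To put the 99.999% figure in perspective, here is the downtime budget each availability target allows per year. This is plain arithmetic; the function name is illustrative and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes per year a service may be down and still meet the target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} min/year")
```

At five nines the budget is a little over five minutes a year, too short for a person to notice, diagnose, and repair a failure by hand.
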
Pattern: Scale-out, not UP

Scale Up: (Virtual*) Servers are like pets
You name them and when they get sick, you nurse them back to health.
garfield.company.com

Scale Out: (Virtual*) Servers are like cattle
You number them and when they get sick, you shoot them.
web001.company.com

attrib: Bill Baker, Distinguished Engineer, Microsoft
* added by yours truly ...

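The "cattle" half of the analogy can be sketched as a pool of numbered, interchangeable servers where a failed member is discarded and replaced rather than repaired. The `Fleet` class is a hypothetical model for illustration; the only detail taken from the slide is the `web001.company.com` naming style.

```python
import itertools

class Fleet:
    """A pool of interchangeable, numbered servers (hypothetical model)."""

    def __init__(self, prefix: str, size: int) -> None:
        self.prefix = prefix
        self._ids = itertools.count(1)
        self.servers = {self._new_name() for _ in range(size)}

    def _new_name(self) -> str:
        # Names are just sequence numbers; nothing about a server is special.
        return f"{self.prefix}{next(self._ids):03d}.company.com"

    def replace(self, sick: str) -> str:
        """Discard a failed server and add a fresh one; no repair is attempted."""
        self.servers.remove(sick)
        fresh = self._new_name()
        self.servers.add(fresh)
        return fresh

fleet = Fleet("web", 3)                       # web001 .. web003
fresh = fleet.replace("web002.company.com")   # web004 takes its place
print(sorted(fleet.servers))
# ['web001.company.com', 'web003.company.com', 'web004.company.com']
```
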
Pattern: Buy from ODMs

ODMs operate their businesses on 3-10% margins.

AMZN, GOOG, and Facebook buy direct without a middleman.

Only a few enterprise vendors are pivoting to compete.

Pattern: Less enterprise “value” in x86 servers

Generic servers rule. Full stop. Nothing is better because nothing else is *generic*.

“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP

Pattern: Flat Networking

The largest cloud operators all run layer-3 routed, flat networks with no VLANs.

Cloud-ready apps don’t need or want VLANs.

Enterprise apps can be supported on open clouds using Software-defined Networking (SDN).

Pattern: Software-defined Networking (SDN)

“Network Virtualization”

• x86 server is the new Linecard
• network switch is the new ASIC
• VXLAN (or NVGRE) is the new Chassis
• SDN Controller is the new SUP Engine

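For readers unfamiliar with VXLAN, the encapsulation header is small enough to sketch. The slide does not go into packet formats; the layout below follows the VXLAN specification as later published in RFC 7348 (an 8-byte header carrying a 24-bit virtual network identifier), and the function names are illustrative.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" bit: the VNI field is valid

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given virtual network identifier."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # First 32-bit word: flags byte followed by 24 reserved bits.
    # Second 32-bit word: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def vxlan_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _, word = struct.unpack("!II", header[:8])
    return word >> 8

hdr = vxlan_header(5001)
print(hdr.hex())        # 0800000000138900
print(vxlan_vni(hdr))   # 5001
```

The 24-bit identifier allows about 16 million virtual networks, compared with the 4,094 usable IDs of a 12-bit VLAN tag.
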
Pattern: Flat Networking + SDNs

Flat + SDN co-exist & thrive together.

[Diagram labels: Internet; Availability Zone; Physical Node 1 and 2; VMs in a Standard Security Group; VMs in a VPC Security Group on a Virtual L2 Network with a VPC Gateway (Virtual Private Cloud Networking)]

Pattern: RAIS instead of HA pairs/clusters

• Redundant arrays of inexpensive services (RAIS)
  • Load balanced
  • No state sharing
  • On failure, connections are lost, but failures are rare
• Ridiculously simple & scalable
• Most things retry anyway
• Hardware failures are infrequent & impact only a subset of traffic
  • (N-F)/N remains in service, where N = total, F = failed
• Cascade failures are unlikely and failure domains are small

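The (N-F)/N figure can be checked with a small simulation. This is a sketch under stated assumptions: flows are spread over the array by hashing a flow key (the slide says only "load balanced"), and the node names and flow keys are made up.

```python
import zlib

def pick_node(flow: str, nodes: list) -> str:
    """Stateless placement: hash the flow key onto one of the service nodes."""
    return nodes[zlib.crc32(flow.encode()) % len(nodes)]

def surviving_fraction(total: int, failed: int) -> float:
    """(N - F) / N: the share of the array still in service."""
    return (total - failed) / total

nodes = [f"lb{i:02d}" for i in range(1, 11)]                       # N = 10
flows = [f"10.0.{i // 256}.{i % 256}:443" for i in range(10_000)]
placement = {flow: pick_node(flow, nodes) for flow in flows}

failed = {"lb03"}                                                  # F = 1
unaffected = sum(1 for node in placement.values() if node not in failed)

print(surviving_fraction(len(nodes), len(failed)))   # 0.9
print(unaffected / len(flows))                       # near 0.9; exact value depends on the hash
```

No node holds state for any other, so losing `lb03` drops only the connections that were hashed to it. Those clients retry and land on a surviving node.
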
Service array (RAIS) example:

[Diagram labels: Public IP Blocks; Backbone Routers; OSPF Route Announcements; RAIS (NAT, LB, VPN); Cloud Access Switches; API; Cloud Control Plane; Return Traffic (default or source NAT); AZ (Spine) Switches]

Pattern: Lots of inexpensive 1RU Switches

Simple spine-and-leaf flat routed network

[Diagram labels: Rack 1, Rack 2, Rack 3 (1RU design); multiple groups of racks (modular design)]

1RU: 6K-30K VMs / AZ
Modular: 40K-200K VMs / AZ

Pattern: Direct-attached Storage (DAS)

Cloud-ready apps manage their own data replication.

DAS is the smallest failure domain possible with reasonable storage I/O.

SAN == massive failure domain.

SSDs will be the great equalizer.

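What "apps manage their own data replication" might look like, in miniature: the application writes each record to several nodes, each with its own local disk, and treats the write as successful once enough of them acknowledge. The quorum rule, class, and names here are illustrative assumptions, not something the slide specifies.

```python
class Node:
    """One server with its own direct-attached disk (modelled as a dict)."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.disk = {}

    def write(self, key: str, value: bytes) -> bool:
        """Store the value locally; return False if this node is down."""
        if self.healthy:
            self.disk[key] = value
        return self.healthy

def replicated_write(nodes: list, key: str, value: bytes, quorum: int) -> bool:
    """Write to every replica; succeed if at least `quorum` acknowledge."""
    acks = sum(node.write(key, value) for node in nodes)
    return acks >= quorum

replicas = [Node("node001"), Node("node002", healthy=False), Node("node003")]
ok = replicated_write(replicas, "user:42", b"profile", quorum=2)
print(ok)   # True: two of three replicas acknowledged
```

Losing `node002` costs one copy of the data, not the whole storage tier, which is the small-failure-domain argument for DAS.
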
Pattern: Elastic Block Device Services

EBS/EBD is a crutch for poorly written apps.

Bigger failure domains (AWS outage anyone?), complex, sets high expectations.

Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler.

Pattern: More Servers == More Storage I/O

>1M writes/second, triple-redundancy w/ Cassandra on AWS

Linear scale-out == linear costs for performance

Pattern: Hypervisors are a commodity

Cloud end-users want OS of choice, not HVs.

Level up! Managing iron is for mainframe operators.

Hypervisor of the future is open source, easily modifiable, & extensible.

Open Cloud System
Simply Scaled, Production Ready

randyb@cloudscaling.com
@randybias