How DreamHost builds a Public Cloud with OpenStack

Carl Perry <carl.perry@dreamhost.com>
twitter/github/slideshare:edolnx irc:carlp@freenode
Well Hello!
•   I’m the Cloud Architect at DreamHost

•   We’ve been around since 1997

    •   We’re old enough to drive next year!

•   Spun off Inktank as a support company for Ceph last year

•   Launched DreamObjects, a Ceph-based S3 alternative, in September

•   This week we launched DreamCompute, our Public Cloud
Why?
“To empower entrepreneurs and developers”
http://www.flickr.com/photos/toywhirl/8050771631/
Design Tenets
•   Design for Reliability
    •   Maintenance is the norm, not the exception
•   Isolate tenants from each other by default
•   Modular equipment design
    •   Easy to expand
    •   Easy to upgrade
•   Automate Everything
Considerations
•   Scalability

•   Speed

•   Monitoring

•   Uptime

•   Security

•   Cost
Obstacles
http://www.flickr.com/photos/brewbooks/4206976341/
Storage
•   Must be shared; local storage prevents maintenance

•   Has to be cost effective

•   Has to be massively scalable

•   Must run on commodity hardware

•   Single solution for boot and additional volumes

•   Fully automatable
Networking

•   Must support IPv6

•   Tenants must be isolated from each other

•   Cannot be limited to a physical location within a data center

•   10Gb, lots of 10Gb

•   No single point of failure (core switches are so 1980)
Hypervisor

•   Simpler is better

•   Should run on Linux

•   Support for architectures that are not x86(_64) is a huge bonus

•   Must not require guest operating system modifications
1998 called, they are disappointed

•   We expect to operate this for more than 6
    months, so IPv6 is a requirement.

•   There are new and exciting problems to solve,
    but it’s past time

•   It’s a great way to piss off vendors

•   Best Part: Everything is Internet Addressable!
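
“Everything is Internet Addressable” means guests get globally routable IPv6 addresses rather than sitting behind NAT. As a minimal illustration (not from the deck), here is a standard-library Python check that a host resolves to a global IPv6 address; the hostname is a placeholder:

    import ipaddress
    import socket

    def global_ipv6_addresses(hostname):
        """Resolve a host and keep only its globally routable IPv6 addresses."""
        addrs = set()
        for info in socket.getaddrinfo(hostname, None, socket.AF_INET6):
            addr = ipaddress.ip_address(info[4][0])
            if addr.is_global:  # drops link-local and ULA space
                addrs.add(addr)
        return sorted(addrs, key=str)

    if __name__ == "__main__":
        # "guest.example.com" is a hypothetical instance name for illustration
        print(global_ipv6_addresses("guest.example.com"))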
Decision Time
http://www.flickr.com/photos/inafrenzy/5787848646/
Hypervisor
•   Scalability: No changes needed for 2-2000 VMs

•   Speed: Fast. Especially when using virtio drivers

•   Monitoring: Lots of support for existing
    systems, hooks for custom ones

•   Uptime: Kernel module and userspace app. Easy
    to patch. Supports live migration

•   Security: Built into kernel, lots of eyes.

•   Cost: Free
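
The hypervisor described here (kernel module plus userspace app, virtio, free, live migration) reads like KVM/QEMU managed through libvirt, though the slide doesn't name it. A minimal monitoring sketch, assuming the libvirt Python bindings and a local qemu:///system connection:

    import libvirt  # libvirt-python bindings, an assumption

    # A read-only connection to the local QEMU/KVM hypervisor is enough for monitoring
    conn = libvirt.openReadOnly("qemu:///system")

    # "Hooks for custom systems": walk the domains and report basic state
    for dom in conn.listAllDomains():
        state, maxmem, mem, vcpus, cputime = dom.info()
        print(f"{dom.name()}: state={state} vcpus={vcpus} mem={mem // 1024} MiB")

    conn.close()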
Storage
•   Scalability: Works from gigabytes to exabytes

•   Speed: Easy to deploy, IOPS limited by hardware

•   Monitoring: Userspace apps, easy to monitor
    health of hardware. Software monitoring getting
    better all the time

•   Uptime: Userspace apps. Designed for high
    availability

•   Security: Provides isolation layers, not directly
    accessible to tenants

•   Cost: Free*
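
The storage layer is Ceph (per the Inktank and DreamObjects mentions earlier). A minimal sketch of talking to a cluster with the python-rados bindings, assuming a standard /etc/ceph/ceph.conf and a pool named "rbd" (both assumptions):

    import rados  # python-rados, shipped with Ceph

    # Connect using the node's cluster configuration
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()

    # "Gigabytes to exabytes": usage figures come straight from the monitors
    stats = cluster.get_cluster_stats()
    print("kB used:", stats["kb_used"], "objects:", stats["num_objects"])

    # Round-trip one object through a pool; the pool name is an assumption
    ioctx = cluster.open_ioctx("rbd")
    ioctx.write_full("hello", b"dreamcompute")
    print(ioctx.read("hello"))
    ioctx.close()
    cluster.shutdown()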
Physical Networking Hardware
•   Scalability: Pizza boxes, just buy more

•   Speed: Based on Broadcom Trident platform

•   Monitoring: handled by the switch software (next slide)

•   Uptime: These guys make the switches for top
    tier OEMs

•   Security: handled by the switch software (next slide)

•   Cost: Extremely Affordable (about the cost of a
    server)
Physical Networking Software
•   Scalability: Designed for spine & leaf and fat-tree
    architectures. Runs Linux natively.

•   Speed: Limited only by hardware

•   Monitoring: It’s Linux!

•   Uptime: Designed to meet our model

•   Security: It’s Linux!

•   Cost: Extremely Affordable (fraction of
    hardware)
Logical Networking Software
•   Scalability: Scales out with the rest of the cluster

•   Speed: Low overhead

•   Monitoring: SNMP and SFLOW

•   Uptime: The control plane has no single point of
    failure. We designed around the HV node being
    the failure point.

•   Security: Everyone is on their own network.
    Shared NOTHING.

•   Cost: Worth Every Penny
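
“Everyone is on their own network” maps to the OpenStack networking service (called Quantum at the time of this talk, Neutron today) creating an isolated network per tenant. A minimal sketch with python-neutronclient, assuming the usual OS_* credential environment variables; the names and prefix are placeholders:

    import os
    from neutronclient.v2_0 import client

    # Credentials from the standard OpenStack environment variables (an assumption)
    neutron = client.Client(
        username=os.environ["OS_USERNAME"],
        password=os.environ["OS_PASSWORD"],
        tenant_name=os.environ["OS_TENANT_NAME"],
        auth_url=os.environ["OS_AUTH_URL"],
    )

    # One private network and IPv6 subnet per tenant; nothing is shared by default
    net = neutron.create_network({"network": {"name": "tenant-private"}})
    neutron.create_subnet({"subnet": {
        "network_id": net["network"]["id"],
        "ip_version": 6,                 # IPv6 per the design requirements
        "cidr": "2001:db8::/64",         # documentation prefix, placeholder only
    }})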
Who needs spanning tree?
[Diagram: North, West, and East pods interconnected by redundant QSFP+ spine switches, with SFP+ leaf switches and 10/100/1000 edge switches below each pod.]
Automation
•   Scalability: No Problem

•   Speed: High speed, low drag

•   Monitoring: Easy

•   Uptime: If the automation server goes down for
    maintenance, we keep running, just not changing

•   Security: No open ports!

•   Cost: Depends
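
The deck doesn't name the automation tool, but "no open ports" plus "we keep running, just not changing" describes a pull model: nodes fetch their desired state instead of listening for pushes. A purely hypothetical sketch of such a loop; the URL, interval, and apply step are all made up for illustration:

    import json
    import subprocess
    import time
    import urllib.request

    CONFIG_URL = "https://config.example.internal/node.json"  # hypothetical endpoint

    def apply(desired):
        """Apply desired state; here just ensure some packages are installed."""
        for pkg in desired.get("packages", []):
            subprocess.run(["apt-get", "install", "-y", pkg], check=False)

    while True:
        # The node reaches out; nothing listens locally, so no open ports
        try:
            with urllib.request.urlopen(CONFIG_URL) as resp:
                apply(json.load(resp))
        except OSError:
            pass  # config server down for maintenance: keep running, just not changing
        time.sleep(300)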
Internet & SAN Access
•   Scalability: Scales out with the rest of the cluster

•   Speed: Blazing

•   Monitoring: SNMP and SFLOW

•   Uptime: Using multiple switches, each in its own
    failure domain, to allow for maintenance/upgrades

•   Security: Proven in the harshest environments

•   Cost: Best in class
Wait...
Did you just say SAN?
“If only you had an Open Source Cloud Infrastructure Orchestration Platform”
                                                                  -Ron Pedde
HA Solution
•   Scalability: Somewhat Limited, but that’s OK

•   Speed: Impressive

•   Monitoring: Complicated

•   Uptime: Trusting the vendors on this one

•   Security: The enterprise better not be wrong

•   Cost: OUCH
Attention CTOs:
Avert your eyes now
HARDCORE HARDWARE
What Customers See
What Power Users See
Questions?
Will be at the booth to answer questions when not in sessions
(or leave a card - no SPAM I promise)
http://slideshare.net/edolnx/presentations
