Cloud Storage Futures      (previously: Designing Private & Public Clouds)

May 22nd, 2012

Randy Bias, CTO & Co-founder

@randybias



                               CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*
                                                   * All unlicensed or borrowed works retain their original licenses
Part 1:

The Two Cloud
 Architectures


      2
A Story of Two Clouds




                 Scale-out




Enterprise


             3
... Driven by Two App Types




                New Elastic Apps




Existing Apps


                4
Cloud Computing ... Disrupts




       Mainframe            Enterprise            Cloud
       Computing            Computing           Computing

       "Big Iron"          "Client-Server"       "Web"

1960                1980                 2000               2020



                                  5
Cloud Computing ... Disrupts




       Mainframe            Enterprise            Cloud
       Computing            Computing           Computing

       "Big Iron"          "Client-Server"       "Web"

1960                1980                 2000               2020

             Disruption
                                  5
Cloud Computing ... Disrupts




       Mainframe            Enterprise            Cloud
       Computing            Computing           Computing

       "Big Iron"          "Client-Server"         "Web"

1960                1980                 2000               2020

             Disruption               Disruption
                                  5
Cloud Computing ... Disrupts




       Mainframe            Enterprise            Cloud
       Computing            Computing           Computing

       "Big Iron"          "Client-Server"         "Web"

1960                1980                 2000               2020

             Disruption               Disruption      Disruption
                                  5
IT – Evolution of Computing Models

   SLA


  Scaling


 Hardware


 HA Type


 Software


Consumption


                Mainframe      Enterprise         Cloud
                “big-iron”   “client/server”   “scale-out”


                                 6
IT – Evolution of Computing Models

   SLA           99.999


  Scaling                    Vertical


 Hardware       Custom


 HA Type                     Hardware


 Software      Centralized


Consumption    Centralized
                Service


                Mainframe                 Enterprise         Cloud
                “big-iron”              “client/server”   “scale-out”


                                            6
IT – Evolution of Computing Models

   SLA           99.999                      99.9


  Scaling                    Vertical                     Horizontal


 Hardware       Custom                    Enterprise


 HA Type                     Hardware                     Software


 Software      Centralized                      Decentralized


Consumption    Centralized                 Shared
                Service                    Service


                Mainframe                 Enterprise                      Cloud
                “big-iron”              “client/server”                “scale-out”


                                            6
IT – Evolution of Computing Models

   SLA           99.999                      99.9                       Always On


  Scaling                    Vertical                     Horizontal


 Hardware       Custom                    Enterprise                   Commodity


 HA Type                     Hardware                     Software


 Software      Centralized                      Decentralized                 Distributed


Consumption    Centralized                 Shared                       Self-service
                Service                    Service


                Mainframe                 Enterprise                      Cloud
                “big-iron”              “client/server”                “scale-out”


                                            6
Enterprise Computing
(existing apps built in silos)




              7
Cloud Computing
(new elastic apps)




        8
Scale-out apps require elastic infrastructure


          Traditional apps    Elastic cloud-ready apps



APPS




INFRA


                        9
Scale-out Cloud Technology

          Traditional apps   Elastic cloud-ready apps



APPS




INFRA


                       10
Scale-out Principles



•   Small failure domains
•   Risk acceptance vs. risk mitigation
•   More boxes for throughput & redundancy
•   Assume app manages complexity:
    • Data replication
    • Assumes infrastructure is unreliable:
      • Server & data redundancy
    • Geo-distribution
    • Auto-scaling


                                  11
What’s a failure domain?



• “Blast radius” during a failure
• What is impacted?
• Public SAN failures:
  • FlexiScale SAN failure in 2007
  • UOL Brazil in 2011:
     • http://goo.gl/8ct9n
  • There are many more
• Enterprise HA ‘pairs’ typically support
  BIG failure domains




                                12
Two Diff Arches for Two Kinds of Apps

App

                                                   Elastic Scale-out
        Location of complexity




                                                         Cloud


                                      “Enterprise”
                                         Cloud


Infra
                                        Primary scaling dimension
                                 Up                                    Out

                                                  13
Part 2:

Storage Architectures
    Are Changing


          14
Two Diff Storages for Two Kinds of Clouds




                 “Classic”        “Scale-out”
                 Storage            Storage



    Uptime in Infra                       Uptime in apps
Every part is redundant               Minimal h/w redundancy
  Data mgmt in Infra                    Data mgmt in apps
Bigger SAN/NAS/DFS                    Smaller failure domains

                             15
Difference in Tiers


Tier    $     Purpose        Classic            Scale-out

                Mission
                           •SAN, then NAS   •On-demand SAN (EBS)
 1     $$$$
                Critical
                           •10-15K RPM      •DynamoDB (AWS)
                           •SSD             •Variable service levels

                           •NAS then SAN •DAS
 2      $$    Important
                           •7.2K RPM     •App / DFS to scale out


               Archive &   •Tape
 3      $
               Backups     •Nearline 5.4K   •Object Storage




                              16
The Biggest Difference is in Where Data
        Management Resides


• In scale-out systems, apps are
  managing the data:


  • Riak / Scale-out distributed data
     store
  • Hadoop+HDFS / Scale-out
     distributed computation systems
  • Cassandra / Scale-out distributed
     columnar database




                               17
Cassandra / Netflix use case



• 3 x Replication
• Linearly scaling performance
    • 50 - 300 nodes
• > 1M writes/second

• When is this perfect?
   • data size unknown
   • growth unknown
   • lots of elastic dynamism


                                 18
Cassandra / Netflix use case


• DAS (‘ephemeral store’)
• Per node perf is constant
    • disk
    • CPU
    • network
• Client write times constant
• Nothing special here




                                19
Cassandra / Netflix use case



•   On-demand & app-managed
•   Cost per GB/hr: $.006
•   Cost per GB/mo: $4.14
•   Includes: storage, DB, storage
    admin, network, network
    admin, etc. etc. etc.




                                     20
Part 3:

Scale-out Storage ...
   Now & Future


          21
Only Change is Certain




           22
There are a few basic approaches being
               taken ...




                   23
Scale-out SAN



• In-rack SAN == faster, bigger
  DAS w/ better stat-muxing
• Accept normal DAS failure rates
• Assume app handles data
  replication                            Dedicated Storage SW

• Like AWS ‘ephemeral storage’           9K Jumbo Frames
                                         SSD caches (ZIL/L2ARC)
• KT architecture                        No replication
• Customers didn’t “get it”              Max HW redundancy

 • “Ephemeral SAN” not well
    understood



                                    24
AWS EBS - “Block-devices-as-a-Service”

                                                                                   API

                                                                           Cloud Control System
                                                                                   EBS
                                                                                Scheduler
• Scale-out SAN (sort of)
• Block scheduler                                                                                   EBS
                                                                                                  Clusters
• Async replication                        Core Network
                                                                            1         1
 • Some failure tolerance                                                             2

• Scheduler:                                                                2         3


 • Allocates customer block                                                           4
                                                                                                    Inter-rack
                                                                                                  cluster async
   devices across many                                VM1   N1     1   3
                                                                            3         5            replication
   failure domains
                                    RACK




                                                            RACK
                                                                                      6

• Customer run RAID inside                                                            7
                                                                                                     Intra-rack
                                                                                                   cluster async
  VMs to increase             VM2   N2            2                                   8
                                                                                                    replication

  redundancy


                                             25
DAS + Big Data
          (Storage + Compute + DFS)


• Storage capability:
  • Replication
  • Disk & server failure
  • Data rebalancing
  • Data locality
     • rack awareness
  • Checksums (basic)
• Also:
  • Built in computation


                            26
Distributed File Systems (DFS) over DAS



• Storage capability:
  • Replication
  • Disk & server failure
  • Data rebalancing
  • Checksums (w/ btrfs)
  • Block devices
• Also:                                                    ✴
                                           ceph architecture
  • No computation



       ✴ Looks familiar doesn’t it?   27
Why is DFS at the Physical Layer
  Dangerous for Scale-out?

 DFS



                     ==




                28
DAS + Database Replication / Scaling



• Storage capability:
  • Async/Sync Replication
  • Server failure
  • Checksums (sort of)
• Also:
  • Std RDBMS
  • SQL i/f
  • Well understood



                             29
Object Storage



• Storage capability:
 • Replication
 • Disk & server failure
 • Data rebalancing
 • Checksums (sometimes)
• Also:
 • Looks like a big web app
 • Uses a DHT/CHT to ‘index’ blobs
 • Very simple


                              30
Where does OpenStorage Fit?

 Scale-out                               Virtual or
                 Purpose / Tier                                   Fit
 Solution                                Physical?
Scale-out SAN       Tier-1/2               Physical           In-rack SAN

                                                              EBS Clusters
     EBS             Tier-1                Physical
                                                            (scale-out SAN)
                                                            Reliable, bit-rot
DAS+BigData          Tier-2                 Virtual
                                                             resistant DAS
                                                            Reliable, bit-rot
  DAS+DFS            Tier-2            Physical / Virtual    resistant DAS
                                                              (unproven)
                                                            In-VM reliable
  DAS+DB             Tier-2                 Virtual
                                                                DAS
                                                            Reliable, bit-rot
Object Storage       Tier-3                Physical
                                                             resistant DAS




                                  31
Summarizing ZFS Value in Scale-out



•   Data integrity & bit rot an issue that few solve today
•   Most SAN/NAS solutions don’t ‘scale down’
•   Commodity x86 servers are winning
•   There are two scale-out places ZFS wins:


    • Small SAN clusters
    • Best DAS management




                                  32
Summary




   33
Conclusions / Speculations



•   Build the right cloud
•   Which means the right storage for *that* cloud
•   A single cloud might support both ...
•   Open storage can be used for both ...
    • ... WITH the appropriate design/forethought




                                 34
Q&A
@randybias




    35

2012 open storage summit keynote

  • 1.
    Cloud Storage Futures (previously: Designing Private & Public Clouds) May 22nd, 2012 Randy Bias, CTO & Co-founder @randybias CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution* * All unlicensed or borrowed works retain their original licenses
  • 2.
    Part 1: The TwoCloud Architectures 2
  • 3.
    A Story ofTwo Clouds Scale-out Enterprise 3
  • 4.
    ... Driven byTwo App Types New Elastic Apps Existing Apps 4
  • 5.
    Cloud Computing ...Disrupts Mainframe Enterprise Cloud Computing Computing Computing "Big Iron" "Client-Server" "Web" 1960 1980 2000 2020 5
  • 6.
    Cloud Computing ...Disrupts Mainframe Enterprise Cloud Computing Computing Computing "Big Iron" "Client-Server" "Web" 1960 1980 2000 2020 Disruption 5
  • 7.
    Cloud Computing ...Disrupts Mainframe Enterprise Cloud Computing Computing Computing "Big Iron" "Client-Server" "Web" 1960 1980 2000 2020 Disruption Disruption 5
  • 8.
    Cloud Computing ...Disrupts Mainframe Enterprise Cloud Computing Computing Computing "Big Iron" "Client-Server" "Web" 1960 1980 2000 2020 Disruption Disruption Disruption 5
  • 9.
    IT – Evolutionof Computing Models SLA Scaling Hardware HA Type Software Consumption Mainframe Enterprise Cloud “big-iron” “client/server” “scale-out” 6
  • 10.
    IT – Evolutionof Computing Models SLA 99.999 Scaling Vertical Hardware Custom HA Type Hardware Software Centralized Consumption Centralized Service Mainframe Enterprise Cloud “big-iron” “client/server” “scale-out” 6
  • 11.
    IT – Evolutionof Computing Models SLA 99.999 99.9 Scaling Vertical Horizontal Hardware Custom Enterprise HA Type Hardware Software Software Centralized Decentralized Consumption Centralized Shared Service Service Mainframe Enterprise Cloud “big-iron” “client/server” “scale-out” 6
  • 12.
    IT – Evolutionof Computing Models SLA 99.999 99.9 Always On Scaling Vertical Horizontal Hardware Custom Enterprise Commodity HA Type Hardware Software Software Centralized Decentralized Distributed Consumption Centralized Shared Self-service Service Service Mainframe Enterprise Cloud “big-iron” “client/server” “scale-out” 6
  • 13.
  • 14.
  • 15.
    Scale-out apps requireelastic infrastructure Traditional apps Elastic cloud-ready apps APPS INFRA 9
  • 16.
    Scale-out Cloud Technology Traditional apps Elastic cloud-ready apps APPS INFRA 10
  • 17.
    Scale-out Principles • Small failure domains • Risk acceptance vs. risk mitigation • More boxes for throughput & redundancy • Assume app manages complexity: • Data replication • Assumes infrastructure is unreliable: • Server & data redundancy • Geo-distribution • Auto-scaling 11
  • 18.
    What’s a failuredomain? • “Blast radius” during a failure • What is impacted? • Public SAN failures: • FlexiScale SAN failure in 2007 • UOL Brazil in 2011: • http://goo.gl/8ct9n • There are many more • Enterprise HA ‘pairs’ typically support BIG failure domains 12
  • 19.
    Two Diff Archesfor Two Kinds of Apps App Elastic Scale-out Location of complexity Cloud “Enterprise” Cloud Infra Primary scaling dimension Up Out 13
  • 20.
  • 21.
    Two Diff Storagesfor Two Kinds of Clouds “Classic” “Scale-out” Storage Storage Uptime in Infra Uptime in apps Every part is redundant Minimal h/w redundancy Data mgmt in Infra Data mgmt in apps Bigger SAN/NAS/DFS Smaller failure domains 15
  • 22.
    Difference in Tiers Tier $ Purpose Classic Scale-out Mission •SAN, then NAS •On-demand SAN (EBS) 1 $$$$ Critical •10-15K RPM •DynamoDB (AWS) •SSD •Variable service levels •NAS then SAN •DAS 2 $$ Important •7.2K RPM •App / DFS to scale out Archive & •Tape 3 $ Backups •Nearline 5.4K •Object Storage 16
  • 23.
    The Biggest Differenceis in Where Data Management Resides • In scale-out systems, apps are managing the data: • Riak / Scale-out distributed data store • Hadoop+HDFS / Scale-out distributed computation systems • Cassandra / Scale-out distributed columnar database 17
  • 24.
    Cassandra / Netflixuse case • 3 x Replication • Linearly scaling performance • 50 - 300 nodes • > 1M writes/second • When is this perfect? • data size unknown • growth unknown • lots of elastic dynamism 18
  • 25.
    Cassandra / Netflixuse case • DAS (‘ephemeral store’) • Per node perf is constant • disk • CPU • network • Client write times constant • Nothing special here 19
  • 26.
    Cassandra / Netflixuse case • On-demand & app-managed • Cost per GB/hr: $.006 • Cost per GB/mo: $4.14 • Includes: storage, DB, storage admin, network, network admin, etc. etc. etc. 20
  • 27.
    Part 3: Scale-out Storage... Now & Future 21
  • 28.
    Only Change isCertain 22
  • 29.
    There are afew basic approaches being taken ... 23
  • 30.
    Scale-out SAN • In-rackSAN == faster, bigger DAS w/ better stat-muxing • Accept normal DAS failure rates • Assume app handles data replication Dedicated Storage SW • Like AWS ‘ephemeral storage’ 9K Jumbo Frames SSD caches (ZIL/L2ARC) • KT architecture No replication • Customers didn’t “get it” Max HW redundancy • “Ephemeral SAN” not well understood 24
  • 31.
    AWS EBS -“Block-devices-as-a-Service” API Cloud Control System EBS Scheduler • Scale-out SAN (sort of) • Block scheduler EBS Clusters • Async replication Core Network 1 1 • Some failure tolerance 2 • Scheduler: 2 3 • Allocates customer block 4 Inter-rack cluster async devices across many VM1 N1 1 3 3 5 replication failure domains RACK RACK 6 • Customer run RAID inside 7 Intra-rack cluster async VMs to increase VM2 N2 2 8 replication redundancy 25
  • 32.
    DAS + BigData (Storage + Compute + DFS) • Storage capability: • Replication • Disk & server failure • Data rebalancing • Data locality • rack awareness • Checksums (basic) • Also: • Built in computation 26
  • 33.
    Distributed File Systems(DFS) over DAS • Storage capability: • Replication • Disk & server failure • Data rebalancing • Checksums (w/ btrfs) • Block devices • Also: ✴ ceph architecture • No computation ✴ Looks familiar doesn’t it? 27
  • 34.
    Why is DFSat the Physical Layer Dangerous for Scale-out? DFS == 28
  • 35.
    DAS + DatabaseReplication / Scaling • Storage capability: • Async/Sync Replication • Server failure • Checksums (sort of) • Also: • Std RDBMS • SQL i/f • Well understood 29
  • 36.
    Object Storage • Storagecapability: • Replication • Disk & server failure • Data rebalancing • Checksums (sometimes) • Also: • Looks like a big web app • Uses a DHT/CHT to ‘index’ blobs • Very simple 30
  • 37.
    Where does OpenStorageFit? Scale-out Virtual or Purpose / Tier Fit Solution Physical? Scale-out SAN Tier-1/2 Physical In-rack SAN EBS Clusters EBS Tier-1 Physical (scale-out SAN) Reliable, bit-rot DAS+BigData Tier-2 Virtual resistant DAS Reliable, bit-rot DAS+DFS Tier-2 Physical / Virtual resistant DAS (unproven) In-VM reliable DAS+DB Tier-2 Virtual DAS Reliable, bit-rot Object Storage Tier-3 Physical resistant DAS 31
  • 38.
    Summarizing ZFS Valuein Scale-out • Data integrity & bit rot an issue that few solve today • Most SAN/NAS solutions don’t ‘scale down’ • Commodity x86 servers are winning • There are two scale-out places ZFS wins: • Small SAN clusters • Best DAS management 32
  • 39.
  • 40.
    Conclusions / Speculations • Build the right cloud • Which means the right storage for *that* cloud • A single cloud might support both ... • Open storage can be used for both ... • ... WITH the appropriate design/forethought 34
  • 41.