Future of Cloud Storage
         AB Periasamy | CTO Gluster, Inc.
         Thu, June 9, 2011
Storage Transforming to Reflect Compute

 • Virtualized
 • Multi-tenant / shared
 • Automated
 • Commoditized, Standardized
 • Scale on Demand
 • In the Cloud
 • Scale Out
 • Free Software / OpenSource

  Storage must support the public and private cloud environment
 • Storage is the Achilles heel of full data center virtualization
 • Storage performance, availability, capacity, & interface are the Achilles heel of public cloud
 • Big Data, data migration, multi-site consistency are the Achilles heel of hybrid cloud
  Storage will look like the computing environment
 • Storage should be a commoditized, virtualized, centrally managed pool
 • Monolithic proprietary systems now challenged by nimble and shared network storage
 • Storage becomes a software problem
 • The “Google model” of storage


               Petascale Cloud Filesystem                                                        2
Disk Systems & Storage Interconnects



  FC (FCP)
  InfiniBand (SRP, iSER, NFSoRDMA)
  Ethernet (iSCSI, FCoE, AoE, NBD/DRBD, NFS, HTTP)

  DAS vs JBOD vs SAN
  SATA vs SAS vs SSD




Filesystems


   Ext3/4, XFS, Btrfs, ZFS

   NFS, OCFS2, Lustre, GFS, GPFS

   MogileFS, VMFS, SheepDog, Ceph

   GlusterFS




Future of Cloud Storage




      Filesystems vs Object Storage
           vs Big Data vs NoSQL




GlusterFS towards Unified Storage




       Unified Multi-Protocol Storage

   NAS + Objects + Big Data + SAN




GlusterFS Simple Commands

# gluster peer probe HOSTNAME
# gluster volume info
# gluster volume create VOLNAME [stripe COUNT] [replica COUNT] [transport tcp | rdma] BRICK …
# gluster volume delete VOLNAME
# gluster volume add-brick VOLNAME NEW-BRICK ...
# gluster volume rebalance VOLNAME start
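
The `volume create` synopsis above can be turned into a small helper that assembles the command line. This is only a sketch: it builds and prints the argument list and does not run `gluster`. The function name `volume_create_argv`, the volume name `vol0`, and the host names and brick paths are hypothetical, chosen for illustration.

```python
def volume_create_argv(volname, bricks, stripe=None, replica=None, transport=None):
    """Build the argument list for `gluster volume create` as laid out on the slide."""
    if not bricks:
        raise ValueError("at least one BRICK is required")
    if transport not in (None, "tcp", "rdma"):
        raise ValueError("transport must be 'tcp' or 'rdma'")
    # With replication, bricks are grouped into replica sets, so the brick
    # count has to be a multiple of the replica count.
    if replica and len(bricks) % replica != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")

    argv = ["gluster", "volume", "create", volname]
    if stripe:
        argv += ["stripe", str(stripe)]
    if replica:
        argv += ["replica", str(replica)]
    if transport:
        argv += ["transport", transport]
    return argv + list(bricks)


# Hypothetical two-node setup: one brick per server, mirrored.
bricks = ["server1:/export/brick1", "server2:/export/brick1"]
argv = volume_create_argv("vol0", bricks, replica=2, transport="tcp")
print(" ".join(argv))
# prints: gluster volume create vol0 replica 2 transport tcp server1:/export/brick1 server2:/export/brick1
```

Keeping the result as a list rather than a single string means it could be handed to `subprocess.run` without shell quoting issues, should one choose to execute it.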



Gluster Architecture Advantages
‘Google Storage’ for Everyone
 • Intelligence in the SW
 • Leverage commodity HW
 • Scale-out elastically
 • Replication for reliability
 • Software enables virtualization

Software only
No metadata server
 • Fully distributed architecture, no bottleneck
 • Gluster Elastic Hash
High performance global namespace
 • Scale out with linear performance
 • Hundreds of petabytes
 • 1 GbE, 10 GbE
High availability
 • Replication to survive hardware failure
 • Self-healing
 • Data stored in NFS-like native format
Stackable userspace design
 • No kernel dependencies, simple install
 • Match specific workload profiles
 • Early maturity and rich functionality
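
The "no metadata server" point can be illustrated with a toy hash-based placement scheme: every client computes a file's location from its name, so no central lookup service is consulted. This is a minimal sketch of the general idea only, not Gluster's actual Elastic Hash algorithm. The use of CRC32, the equal-sized ranges, and the names `assign_ranges` and `locate` are all assumptions made for illustration.

```python
import zlib

HASH_SPACE = 2**32  # CRC32 yields a 32-bit value


def assign_ranges(bricks):
    """Split the 32-bit hash space into contiguous ranges, one per brick."""
    span = HASH_SPACE // len(bricks)
    ranges = []
    for i, brick in enumerate(bricks):
        start = i * span
        # The last brick absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + span - 1
        ranges.append((start, end, brick))
    return ranges


def locate(filename, ranges):
    """Return the brick whose range contains the hash of the file name."""
    h = zlib.crc32(filename.encode("utf-8")) & 0xFFFFFFFF
    for start, end, brick in ranges:
        if start <= h <= end:
            return brick
    raise LookupError("hash value not covered by any range")


# Hypothetical bricks; any client holding the same range table gets the same answer.
bricks = ["server1:/export/brick1", "server2:/export/brick1", "server3:/export/brick1"]
ranges = assign_ranges(bricks)
for name in ["song.mp3", "image.vmdk", "report.pdf"]:
    print(name, "->", locate(name, ranges))
```

Because placement is a pure function of the file name and the range table, lookups need no shared state beyond that table. The trade-off is that adding a brick changes the ranges, which is why a rebalance step is needed after `add-brick`.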

Evolution of GlusterFS

 2006-2009 GlusterFS v1.0 – v3.0
 Distributed Filesystem capabilities with self-healing, synchronous
 replication, stripe, distribute (global name space)


 2010 GlusterFS v3.1
 Elastic Cloud capabilities


 2011 Q2 GlusterFS v3.2
 Geographic replication, enhanced monitoring, directory-level
 quotas (which also work as cloud usage billing APIs)


 2011 Q3/Q4
 Hadoop HDFS drop-in replacement, Unified File and Object
 Storage (Amazon S3 compatible), and near-CDP (continuous data protection).


Story of Gluster




 Photos:
 • 1st meeting room
 • 1st Office US
 • 1st Office Bengalooru (two photos)
1000s of Community Deployments




Fast Growing Commercial Deployments




www.gluster.org

          Thank You
Gluster Deployment




  Private Cloud                        Public Cloud




GlusterFS & OpenStack


   VM Image Storage – Answer to VMware VMFS

   Unified File & Object Storage – Application Data

   Geo-replication – Enable Hybrid Clouds




Partners Healthcare
Private Cloud: Centralized Storage as a Service

 • Over 500 TB
 • 9 Sun “Thumper” systems in cluster

 Problem
 • Capacity growth from 144 TB to 1+ PB
 • Multiple distributed users/departments
 • Multi OS access - Windows, Linux and Unix
 Solution
 • GlusterFS Cluster
 • Solaris/ZFS/x4500 w/ InfiniBand
 • Native CIFS/NFS access
 Benefits
 • Capacity on demand / pay as you grow
 • Centralized management
 • Higher reliability
 • OPEX decreased by 10X
Pandora Internet Radio

 • 1.2 PB of audio served per week
 • 13 million files
 • Over 50 GB/sec peak traffic

 Problem
 • Explosive user & title growth
 • As many as 12 file formats for each song
 • ‘Hot’ content and long tail
 Solution
 • Three data centers, each with a six-node GlusterFS cluster
 • Replication for high availability
 • 250+ TB total capacity
 Benefits
 • Easily scale capacity
 • Centralized management; one administrator to manage day-to-day operations
 • No changes to application
 • Higher reliability
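
The traffic figures on this slide can be put side by side with a quick calculation. This is a back-of-the-envelope sketch that assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and treats "over 50 GB/sec" as a lower bound. The variable names are illustrative only.

```python
# Rough comparison of average and peak throughput from the slide's figures.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
PB = 10**15
GB = 10**9
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds

served_per_week_bytes = 1.2 * PB
peak_gb_per_sec = 50  # "over 50 GB/sec", taken as a lower bound

average_gb_per_sec = served_per_week_bytes / SECONDS_PER_WEEK / GB
peak_to_average = peak_gb_per_sec / average_gb_per_sec

print(f"average: {average_gb_per_sec:.2f} GB/s")                # average: 1.98 GB/s
print(f"peak/average: at least {peak_to_average:.1f}x")         # peak/average: at least 25.2x
```

Under these assumptions the quoted peak is roughly 25 times the weekly average, which fits the slide's description of ‘hot’ content alongside a long tail.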
Cincinnati Bell Technology Solutions


 • Large scale VM storage
 • Low cost service delivery for enterprise customer
 • Drastic reduction in provisioning time

 Problem
 • Host a dedicated enterprise cloud solution
 • Large scale VMware environment
 • Need high availability
 Solution
 • Gluster for VM storage, NFS to clients
 • SAS drives on back-end
 • Replication for high availability
 Benefits
 • Storage provisioning from 6 wks. to 15 min.
 • Vendor agnostic storage
 • Low cost of service delivery
 • Elastic growth


Envoy Media

Public Cloud: Media Serving on AWS
 • Targeted media serving
 • 100% AWS hosted
 • Unpredictable traffic

 Problem
 • Limited scalability
 • Slow response to demand spikes
 • Manual data management
 Solution
 • Four EBS volumes under Gluster global namespace
 • Replication for high availability
 • EC2 for compute; S3 for backup
 Benefits
 • No change to application
 • Content immediately available to all servers
 • Automatic resource allocation
 • Lower cost (vs. colo and proprietary options)