GlusterFS 3.3 Deep-dive
                              AB Periasamy
                Office of the CTO, Red Hat

                        John Mark Walker
                  Gluster Community Guy
Topics
 Review
 Community and Evolution of GlusterFS
 Feature overview
      Granular locking
      Replication Improvements (AFR)
      Unified file and object storage
      HDFS compatibility



06/13/12
1. Quick Review



Simple Economics
        Simplicity, scalability, lower cost


  Virtualized     Multi-Tenant   Automated   Commoditized


Scale on Demand   In the Cloud   Scale Out   Open Source




What is GlusterFS, Really?
           Gluster is a unified, distributed
            storage system
             DHT, stackable, POSIX, Swift, HDFS




What Can You Store?
      Media – Docs, Photos, Video
      VM Filesystem – VM Disk Images
      Big Data – Log Files, RFID Data
      Objects – Long Tail Data



2. Community and GlusterFS Evolution

Community-led Features
      2009 – GlusterFS easier to use
      2010 – CLI, shell, glusterd
      2011 – Marker framework, geo-replication




GlusterFS in 2011
      Scale-out NAS
      Distributed and replicated
      NFS, CIFS and native GlusterFS
      User-space, stackable architecture

 → A good platform to build on

GlusterFS in 2011: The Gaps
 Object storage – popularized by S3
      Simplicity bias – GET & PUT
      Combined with RESTful API
      Used mostly in web-based applications


GlusterFS in 2011: The Gaps
 Big data, semi-structured data
   No Hadoop, MapReduce capabilities
 Structured data (databases)
      No MongoDB, Oracle, MySQL capability

GlusterFS in 2011: The Gaps
 VM image hosting difficulties
  Difficulty in self-heal, rebalancing
 Small files
      PHP-based web sites, primary email storage

3. Feature Overview



GlusterFS in 2012: Filling the Gaps
 Better replication
   Granular locking
   Proactive self-healing
   Quorum enforcement
 Synchronous translator API

Granular Locking
           Server fails, comes back
           Files evaluated
           Block-by-block until healed
                                          Blocks compared



           Virtual Disk 1-1   Virtual Disk 1-2   Virtual Disk 2-1   Virtual Disk 2-2

                     GlusterFS                     GlusterFS

                      Server 1                      Server 2
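
The block-by-block heal described on this slide can be sketched in a few lines. This is only an illustration, not GlusterFS code: `heal_blocks` and `BLOCK_SIZE` are hypothetical names, the block size is an arbitrary choice, and no locking is shown. The point is that only the blocks that differ are rewritten, so a large VM image does not have to be copied or held whole while it heals.

```python
import io

BLOCK_SIZE = 128 * 1024  # hypothetical block size, chosen only for illustration


def heal_blocks(good, stale, block_size=BLOCK_SIZE):
    """Copy only the differing blocks from `good` into `stale`.

    Both arguments are seekable binary file objects.
    Returns the number of blocks that had to be rewritten.
    """
    healed = 0
    offset = 0
    while True:
        good.seek(offset)
        good_block = good.read(block_size)
        if not good_block:
            break  # reached the end of the good copy
        stale.seek(offset)
        stale_block = stale.read(block_size)
        if stale_block != good_block:
            # Only this block is rewritten; matching blocks are left alone.
            stale.seek(offset)
            stale.write(good_block)
            healed += 1
        offset += len(good_block)
    # Drop any trailing data the stale copy has beyond the good copy.
    stale.truncate(offset)
    return healed


if __name__ == "__main__":
    good = io.BytesIO(b"a" * 10 + b"b" * 10 + b"c" * 5)
    stale = io.BytesIO(b"a" * 10 + b"x" * 10)
    print(heal_blocks(good, stale, block_size=10), "blocks rewritten")
```

In a real deployment each block range would presumably be locked only while it is compared and copied, which is what lets the virtual disk stay in use during the heal.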
Proactive Self-healing
              Performed server-to-server
              Recovered node queries peers
              [Diagram: distributed-replicated volume; "Distributed" runs down the servers, "Replicated" runs across]

              Server 1 - good                         Server 3 - good
                 Hidden: Symlink 1, Symlink 2, Symlink 3

              Server 2 - recovered                    Server 4 - good
               File 1                                 File 1
               File 2                                 File 2
               File 3                  Self-healing   File 3


Split Brain
           Nodes cannot see each other, but can
            all still write
           Often due to network outages
           Sometimes results in conflicts
           Up to 3.2, GlusterFS had no concept of
            “quorum”


Quorum Enforcement
           Which node has valid data?
           If quorum, keep writing, else stop
               Configurable option

  Server 1                         Server 2           Server 3

      -No quorum                       -Quorum            -Quorum
     -Stops writing                  -Keeps writing     -Keeps writing

                       Broken
                      Connection
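
A minimal sketch of the rule in this diagram, assuming "quorum" means a strict majority of the replica set. `has_quorum`, `may_write` and the `enforce` flag are hypothetical names for illustration, not GlusterFS options.

```python
def has_quorum(reachable, total):
    """Return True if `reachable` replicas (counting this one) are a strict majority."""
    return reachable > total // 2


def may_write(reachable, total, enforce=True):
    """Decide whether a replica keeps accepting writes.

    `enforce` stands in for the configurable option on the slide:
    with enforcement off, the replica always keeps writing.
    """
    if not enforce:
        return True
    return has_quorum(reachable, total)


if __name__ == "__main__":
    # Server 1 is cut off and sees only itself; Servers 2 and 3 still see each other.
    print("Server 1 may write:", may_write(1, 3))
    print("Server 2 may write:", may_write(2, 3))
    print("Server 3 may write:", may_write(2, 3))
```

With three replicas and one broken connection, the isolated server fails the check and stops writing, while the other two keep writing. That removes the conflicting writes that cause split brain.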



Quorum Enforcement
              After connection restored, self-heal kicks off


     Replica 1                          Replica 2         Replica 3

            -No quorum                     -Quorum            -Quorum
           -Stops writing   Self-heal    -Keeps writing     -Keeps writing




GlusterFS in 2012: Filling the Gaps
 Synchronous translator API
 Unified File and Object Storage (UFO)
 HDFS-compatible storage layer


Synchronous Translator API
 GlusterFS runs asynchronously
      Non-blocking I/O, for performance
 Writing code for async I/O is confusing


Synchronous Translator API
 3.3 introduces synchronous translators
      Easier to write
      Great for non-core operations
        E.g. background scrubbing
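
To show why the synchronous style is easier to write, here is a generic sketch that wraps a callback-based operation so the caller can use it as an ordinary blocking call. It is not the GlusterFS translator API (which is written in C): `async_read`, `sync_read` and `scrub` are hypothetical names, and the thread-plus-event mechanism is just one simple way to build such a wrapper.

```python
import threading


def async_read(path, callback):
    """Stand-in for a non-blocking operation: returns at once, reports via callback."""
    def work():
        callback(f"contents of {path}")
    threading.Thread(target=work).start()


def sync_read(path):
    """Synchronous wrapper: block the caller until the callback has fired."""
    done = threading.Event()
    result = {}

    def on_done(data):
        result["data"] = data
        done.set()

    async_read(path, on_done)
    done.wait()
    return result["data"]


def scrub(paths):
    """Linear, top-to-bottom logic, the way a background scrubber might be written."""
    return [sync_read(p) for p in paths]


if __name__ == "__main__":
    print(scrub(["file1", "file2"]))
```

The cost is that the caller waits for each operation to finish, which is why the slide recommends this style for non-core work rather than the main I/O path.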

Unified File and Object (UFO)
           S3, Swift-style object storage
           Access via UFO or Gluster mount
           [Diagram: two access paths to the same data]
           Client  --  HTTP Request (ID=/dir/sub/sub2/file)  -->  Proxy
           Client  --  NFS or GlusterFS Mount
           Mapping:  Account = Volume,  Container = Directory,  Object = File
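
The mapping in this diagram (account to volume, container to directory, object to file) can be sketched as a path translation. `object_to_path` is a hypothetical helper, and the `/mnt/gluster/<volume>` mount layout is an assumption made for the example.

```python
import posixpath


def object_to_path(mount_root, account, container, obj):
    """Map an account/container/object name to a volume/directory/file path.

    `mount_root` is assumed to hold one mounted volume per account.
    Object names may contain slashes, which become subdirectories.
    """
    prefix = posixpath.normpath(posixpath.join(mount_root, account, container))
    path = posixpath.normpath(posixpath.join(prefix, obj))
    # Refuse object names such as "../x" that would escape the container directory.
    if not path.startswith(prefix + "/"):
        raise ValueError(f"object name escapes its container: {obj!r}")
    return path


if __name__ == "__main__":
    print(object_to_path("/mnt/gluster", "vol1", "dir", "sub/sub2/file"))
```

Because the object is stored as an ordinary file, something uploaded over HTTP can be read through an NFS or GlusterFS mount, and a file written through a mount can be fetched as an object.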

Unified File and Object (UFO)
           Your gateway to the cloud
           Your data, accessed your way




HDFS Compatibility
           Run MapReduce jobs on GlusterFS
           Add unstructured data to Hadoop


              Hadoop Server                GlusterFS   GlusterFS
                 Local Disk
                 HDFS Connector            GlusterFS   GlusterFS
                 (Jar file)

Thank you!
                     AB Periasamy
       Office of the CTO, Red Hat
                   ab@redhat.com

               John Mark Walker
         Gluster Community Guy
          johnmark@redhat.com


Editor's Notes

  • #17 Previously, self-heal required a client mount that loaded the replication translator. A client mount is no longer needed, because the server now loads the replication translator itself, which enables server-to-server self-heal.