Atin Mukherjee
GlusterFS Hacker
GlusterFS – Architecture & Roadmap
GlusterFS Meetup
Feb 2015
07/02/2015 GlusterFS Meetup
Agenda
● Introduction to the Gluster community
● Current stable releases
● What is GlusterFS?
● Architecture
● GlusterFS 3.6 Features
● GlusterFS 3.7 Features planned
● GlusterFS 4.0 and beyond
● Q&A
Introduction to the Gluster community
● Different roles
● Users, testers, supporters, developers, editors, ...
● Different organizations
● Products based on / containing GlusterFS
● Service, consulting and support
● Integration in other (Open Source) projects
Introduction to the Gluster community
● Regular IRC meetings
● Discussions and support over mailing lists and on IRC
● Providing packages (RPMs, DEBs)
● Work with different Linux and BSD distributions to
improve portability and availability
● Infrastructure hosting for Gluster related projects
● Gerrit and Jenkins for code review and testing
● Gluster Forge for git/wiki hosting of projects
Introduction to the Gluster community
● Some numbers from 2014
● Approx. 175 IRC participants
● Two main mailing lists reach ~600 emails/month
● About 100 users and 60 developers actively posting to the lists
● Around 2200 patches merged in the master branch
● Patches from ~90 developers were included
Current stable releases
● Maintenance of three minor releases
● 3.6, 3.5 and 3.4
● Bugfixes only, non-intrusive features on high demand
● Patches get backported to fix reported bugs
Integration with GlusterFS
● More projects built and enhanced around the
GlusterFS ecosystem – dockit, gluster-deploy,
gluster-nagios, glusterfsiostat, puppet-gluster to
name a few.
● Improved integration with broader ecosystem
projects like Ambari, NFS-Ganesha,
OpenStack, oVirt and Samba.
What is GlusterFS?
● A general purpose scale-out distributed file system.
● Aggregates storage exports over network interconnect to
provide a single unified namespace.
● Filesystem is stackable and completely in userspace.
● Layered on disk file systems that support extended
attributes.
Typical GlusterFS Deployment
● Global namespace
● Scale-out storage building blocks
● Supports thousands of clients
● Access using GlusterFS native, NFS, SMB and HTTP protocols
● Linear performance scaling
GlusterFS Architecture – Foundations
● Software only, runs on commodity hardware
● No external metadata servers
● Scale-out with Elasticity
● Extensible and modular
● Deployment agnostic
● Unified access
● Largely POSIX compliant
Concepts & Algorithms
GlusterFS concepts – Trusted Storage Pool
● Trusted Storage Pool (cluster) is a collection of storage servers.
● Trusted Storage Pool is formed by invitation – “probe” a new
member from the cluster and not vice versa.
● Logical partition for all data and management operations.
● Membership information used for determining quorum.
● Members can be dynamically added and removed from the
pool.
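The invitation rule and the use of membership for quorum can be sketched in a few lines of Python. This is an illustrative model only, not glusterd's implementation: the class `TrustedPool`, its method names, and the strict-majority quorum rule are all assumptions made for the example.

```python
class TrustedPool:
    """Toy model of a trusted storage pool (hypothetical, not glusterd code)."""

    def __init__(self, first_node):
        self.members = {first_node}

    def probe(self, from_node, new_node):
        # Only an existing member may invite a new node, not vice versa.
        if from_node not in self.members:
            raise PermissionError(f"{from_node} is not in the pool")
        self.members.add(new_node)

    def detach(self, node):
        # Members can be removed dynamically.
        self.members.discard(node)

    def has_quorum(self, reachable):
        # Assumption: quorum means a strict majority of members is reachable.
        alive = self.members & set(reachable)
        return len(alive) > len(self.members) / 2


pool = TrustedPool("node1")
pool.probe("node1", "node2")
pool.probe("node2", "node3")
print(sorted(pool.members))
print(pool.has_quorum(["node1", "node2"]))
print(pool.has_quorum(["node3"]))
```

On a real cluster the corresponding operations are the CLI commands `gluster peer probe <hostname>` and `gluster peer detach <hostname>`, run from a node that is already in the pool.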
GlusterFS concepts – Trusted Storage Pool
[Diagram: a probe between Node1 and Node2 is accepted; Node 1 and Node 2 are peers in a trusted storage pool]
GlusterFS concepts – Trusted Storage Pool
[Diagram: Node1, Node2 and Node3 in a trusted storage pool; Detach removes a node from the pool]
GlusterFS concepts - Bricks
● A brick is the combination of a node and an export directory, e.g.
hostname:/dir
● Each brick inherits the limits of the underlying filesystem
● No limit on the number of bricks per node
● Ideally, each brick in a cluster should be of the same size
[Diagram: three storage nodes exporting 3 bricks (/export1 to /export3), 5 bricks (/export1 to /export5) and 3 bricks (/export1 to /export3)]
GlusterFS concepts - Volumes
● A volume is a logical collection of bricks.
● Volume is identified by an administrator provided name.
● Volume is a mountable entity and the volume name is
provided at the time of mounting.
– mount -t glusterfs server1:/<volname> /my/mnt/point
● Bricks from the same node can be part of different
volumes
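As a rough data-structure sketch of these ideas: a brick is a `node:/dir` pair, a volume is a named list of bricks, and one node can contribute bricks to several volumes. The class names, node names and the brick-to-volume assignment below are made up for illustration; only the mount syntax is taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Brick:
    node: str
    export_dir: str

    def __str__(self):
        # Bricks are written as hostname:/dir
        return f"{self.node}:{self.export_dir}"


@dataclass
class Volume:
    name: str
    bricks: list = field(default_factory=list)

    def mount_command(self, server, mount_point):
        # Mirrors the mount syntax shown on the slide.
        return f"mount -t glusterfs {server}:/{self.name} {mount_point}"


nodes = ["node1", "node2", "node3"]
# Assumption: each node gives one brick to each of two volumes.
music = Volume("music", [Brick(n, "/export/brick1") for n in nodes])
videos = Volume("videos", [Brick(n, "/export/brick2") for n in nodes])

print([str(b) for b in music.bricks])
print(videos.mount_command("node1", "/mnt/videos"))
```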
GlusterFS concepts - Volumes
[Diagram: Node1, Node2 and Node3 each export /export/brick1 and /export/brick2; two volumes, music and Videos, are built from these bricks]
Volume Types
➢ Type of a volume is specified at the time of volume
creation
➢ Volume type determines how and where data is placed
➢ Following volume types are supported in glusterfs:
a) Distribute
b) Stripe
c) Replication
d) Distributed Replicate
e) Striped Replicate
f) Distributed Striped Replicate
g) Dispersed
h) Distributed dispersed
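To make "how and where data is placed" concrete, here is a minimal sketch of placement in a distributed replicated volume: bricks are grouped into replica sets, and each file name is hashed to exactly one set, which then holds all copies. The function names are hypothetical, and MD5 modulo the number of sets is a simple stand-in chosen for the example; it is not the hash or layout scheme the DHT translator actually uses.

```python
import hashlib


def replica_sets(bricks, replica_count):
    """Group consecutive bricks into replica sets."""
    if len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a multiple of replica count")
    return [bricks[i:i + replica_count]
            for i in range(0, len(bricks), replica_count)]


def place(filename, bricks, replica_count):
    """Pick the replica set that stores every copy of `filename`."""
    sets = replica_sets(bricks, replica_count)
    # Assumption: MD5 modulo set count stands in for the real DHT hash.
    digest = hashlib.md5(filename.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(sets)
    return sets[index]


bricks = ["node1:/b", "node2:/b", "node3:/b", "node4:/b"]
for name in ["song.mp3", "movie.mkv", "notes.txt"]:
    print(name, "->", place(name, bricks, replica_count=2))
```

A pure Distribute volume is the special case `replica_count=1`, and a pure Replication volume is the case where all bricks form a single set.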
Distributed Replicated Volume
GlusterFS 3.6
● Better SSL support
● Heterogeneous bricks
● Erasure coding
● Meta translator
● Volume snapshots and user-serviceability
GlusterFS 3.6 contd.
● AFRv2
● RDMA transport for GlusterFS volumes
Better SSL support
● SSL support for management plane
● SSL for authorizing and authenticating access
to volumes.
● Paves way for fine-grained access to volumes
in the storage pool*.
● Makes self-service style management at a
volume-level possible*.
* - Not implemented yet; technically possible
Heterogeneous bricks
● Allows distribution of data to account for bricks
of different sizes
● Uniform distribution can potentially penalise
smaller bricks with more allocations
● Changes were made to the DHT (distribute)
translator
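One way to picture size-aware distribution is to give each brick a slice of the hash space proportional to its capacity, so a larger brick receives proportionally more files. The sketch below assumes a 32-bit hash space and a hypothetical helper `weighted_ranges`; it illustrates the idea and is not the DHT translator's code.

```python
def weighted_ranges(brick_sizes, hash_space=2**32):
    """Split [0, hash_space) into per-brick ranges proportional to size."""
    total = sum(brick_sizes.values())
    ranges, start, acc = {}, 0, 0
    for brick, size in brick_sizes.items():
        acc += size
        # Cumulative integer arithmetic keeps ranges contiguous and gap-free.
        end = hash_space * acc // total
        ranges[brick] = (start, end)
        start = end
    return ranges


# Sizes in arbitrary units: the large brick is three times the small one.
for brick, (lo, hi) in weighted_ranges({"small": 1, "large": 3}).items():
    print(brick, lo, hi, f"{(hi - lo) / 2**32:.0%} of hash space")
```

With equal weights every brick gets an equal slice, which is the uniform distribution that can fill a small brick before a large one.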
Erasure Coding
● Provides resilience to brick failures using erasure
codes
● Configurable redundancy and fault tolerance
● Reduces disk space consumption in comparison to
replicated volumes
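The space saving can be checked with simple arithmetic. The sketch below compares the usable fraction of raw capacity for replication and for a generic erasure-coded layout; the function names and the example configurations (3-way replication versus 6 bricks with redundancy 2) are assumptions chosen for illustration.

```python
def usable_fraction_replica(replica_count):
    """Every file is stored replica_count times."""
    return 1 / replica_count


def usable_fraction_dispersed(total_bricks, redundancy):
    """Data is spread over total_bricks, of which `redundancy` may fail."""
    if not 0 < redundancy < total_bricks:
        raise ValueError("redundancy must be between 1 and total_bricks - 1")
    return (total_bricks - redundancy) / total_bricks


# Both layouts below survive the loss of any two bricks.
print(f"replica 3:      {usable_fraction_replica(3):.0%} usable")
print(f"dispersed 4+2:  {usable_fraction_dispersed(6, 2):.0%} usable")
```

For the same tolerance of two brick failures, the erasure-coded layout in this example keeps two thirds of the raw capacity usable, against one third for 3-way replication.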
Meta xlator
● Provides a /proc like interface to GlusterFS
runtime
● Allows users to inspect internals of translators
present in GlusterFS runtime 'stack'.
● For example, cat /mnt/glusterfs/.meta/version
fetches the version of the glusterfs mount process
● tree /mnt/glusterfs/.meta/graphs/active
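Because the meta translator exposes its data as ordinary files, any program can read it the same way `cat` does. A minimal sketch, assuming a hypothetical helper `read_meta`; the `.meta` directory and the `version` entry are the ones named on the slide, and no particular file format is assumed.

```python
from pathlib import Path


def read_meta(mount_point, entry="version"):
    """Return the text of a file under <mount_point>/.meta."""
    return (Path(mount_point) / ".meta" / entry).read_text().strip()
```

On a client with a volume mounted at /mnt/glusterfs, `read_meta("/mnt/glusterfs")` would return whatever `cat /mnt/glusterfs/.meta/version` prints.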
AFRv2
● Refactored AFR implementation
● Improvements in the performance of the healing process
● Paves way for better introspection into
the healing process. More in 3.7
RDMA Support
● Minor fixes that made RDMA transport more
usable for GlusterFS volumes
GlusterFS 3.7
● Small file performance
● Data classification
● Bit-rot detection
● Better OpenStack integration – e.g. Manila
Small file performance
● Multi-threaded epoll – Transport layer
● Caching stat and xattr calls on small files –
Storage layer
● Migrate .glusterfs to SSDs – Physical layer
● Batching of RPCs per file access
Data Classification
● Mapping file characteristics to subvolume
characteristics
● File characteristics – size, age, access rate,
type (extension)
● Subvolume characteristics – physical location,
storage type (SSD, disk), encoding method
(deduplicated, erasure coded)
● User provided mappings via 'tags'
● Implementation using 'DHT over DHT' pattern
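The mapping from file characteristics to subvolume characteristics can be pictured as an ordered rule table that returns a tag. Everything in this sketch is hypothetical: the `FileInfo` fields, the thresholds and the tag names are invented for illustration and do not describe the planned 3.7 implementation.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    age_days: int


# Ordered (predicate, tag) rules; the first match wins.
# Thresholds and tags are made up for this example.
RULES = [
    (lambda f: f.age_days > 365, "erasure-coded"),
    (lambda f: f.size_bytes < 64 * 1024, "ssd"),
]
DEFAULT_TAG = "disk"


def classify(info):
    """Return the tag of the subvolume that should hold this file."""
    for predicate, tag in RULES:
        if predicate(info):
            return tag
    return DEFAULT_TAG


for f in [FileInfo("notes.txt", 1024, 1),
          FileInfo("backup.iso", 10**9, 400),
          FileInfo("movie.mkv", 10**9, 10)]:
    print(f.name, "->", classify(f))
```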
Bit-rot detection
● Silent disk corruption
● Useful for archival or WORM workloads
● Lazy, policy-based and incremental checksum
computation
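The general technique behind detecting silent corruption is to record a checksum for each file and later compare it with a freshly computed one. The sketch below is a generic illustration with hypothetical function names and an in-memory signature store; the choice of SHA-256 and the chunk size are assumptions, and the code does not describe how the 3.7 feature is implemented.

```python
import hashlib


def checksum(path, chunk_size=1 << 20):
    """Hash a file chunk by chunk so large files never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scrub(paths, signatures):
    """Sign files seen for the first time; report files whose hash changed."""
    corrupted = []
    for path in paths:
        current = checksum(path)
        stored = signatures.get(path)
        if stored is None:
            # Lazy signing: the checksum is recorded on first sight.
            signatures[path] = current
        elif stored != current:
            corrupted.append(path)
    return corrupted
```

This fits archival and WORM data well because such files are not expected to change after they are signed, so any mismatch points to corruption rather than a legitimate write.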
Better OpenStack integration
● Manila – File share as a service
● Cinder – Block storage as a service
● Swift – Object storage as a service
● Sahara – Hadoop as a service
● Targeted at the OpenStack Kilo release
GlusterFS 4.0 Vision
● To be the best in class distributed commodity
storage with unified access of data
GlusterFS 4.0 Vision
● Community scaling – design by community
● Node scaling
● Technology scaling
● Development process scaling
GlusterFS 4.0 Vision
● 'Thousand node glusterd'
● DHT scalability
● NSR – Log based, chain replication
● Better brick management
● Split Network
● ... and many more. See
http://www.gluster.org/community/documentation/index.php/Planning40
Q&A
Resources
Mailing lists:
Gluster-users@gluster.org
Gluster-devel@gluster.org
IRC:
#gluster and #gluster-dev on freenode
Links:
http://www.gluster.org
http://hekafs.org
http://forge.gluster.org
http://www.gluster.org/community/documentation/index.php/Arch
