This document provides an overview of the Gluster distributed storage system and its future directions. It discusses why Gluster is useful given ever-increasing data volumes, and defines Gluster as a scale-out distributed storage system that aggregates storage over a network to provide a unified namespace. It outlines typical deployments and architecture, and describes the various volume types (distributed, replicated, and dispersed). It also covers access mechanisms, features, use cases and monitoring integration. Finally, it discusses recent releases and new features in development, such as data tiering, bitrot detection and sharding, that improve performance and capabilities.
Why Gluster?
● 2.5+ exabytes of data are produced every day!
● 90% of the world's data was created in the last two years
● All of that data needs to be stored somewhere!
● Commoditization and democratization of storage are the way to go
source: http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
What is Gluster?
● Scale-out distributed storage system.
● Aggregates storage exports over network interconnects to provide a unified namespace.
● File, Object and Block interfaces.
● Layered on disk file systems that support extended attributes.
Gluster Architecture – Foundations
● Software only, runs on commodity hardware
● No external metadata servers
● Scale-out with Elasticity
● Extensible and modular
Volumes in Gluster
● Logical collection of exports aka bricks.
● Identified by an administrative name.
● Volume or a part of the volume used by clients for data
CRUD operations.
● Multiple volume types supported currently
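A minimal sketch of creating, starting and mounting a volume with the gluster CLI; the hostnames (server1..server3), brick paths and the volume name gv0 are placeholders, not part of the original deck:

    # Create a 3-way replicated volume from one brick on each of three peers
    gluster volume create gv0 replica 3 \
        server1:/data/brick1/gv0 server2:/data/brick1/gv0 server3:/data/brick1/gv0
    gluster volume start gv0
    gluster volume info gv0

    # Mount it on a client over the native FUSE protocol
    mount -t glusterfs server1:/gv0 /mnt/gv0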
Dispersed Volume
● Introduced in GlusterFS 3.6
● Erasure coding / RAID-5-like protection over the network
● “Disperses” data onto various bricks
● Algorithm: Reed-Solomon
● Non-systematic erasure coding
● Encoding / decoding done on the client side (creation example below)
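A sketch of creating a dispersed volume (syntax as of GlusterFS 3.6+); hostnames and brick paths are placeholders. With disperse 6 redundancy 2, any 2 of the 6 bricks can be lost without losing data:

    # 6 bricks total, 2 of them holding redundancy fragments
    gluster volume create gv-ec disperse 6 redundancy 2 \
        server{1..6}:/data/brick1/gv-ec
    gluster volume start gv-ec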
Features
● Scale-out NAS
  – Elasticity, quotas
● Data Protection and Recovery
  – Volume and file snapshots, User Serviceable Snapshots, geographic/asynchronous replication
● Archival
  – Read-only, WORM
● Native CLI / API for management (example commands below)
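Illustrative CLI invocations for a few of these features; the volume name, slave host, directories and sizes are placeholders:

    # Directory quota
    gluster volume quota gv0 enable
    gluster volume quota gv0 limit-usage /projects 10GB

    # Volume snapshot (requires thinly provisioned LVM bricks) and user-serviceable snapshots
    gluster snapshot create snap1 gv0
    gluster volume set gv0 features.uss enable

    # Asynchronous geo-replication to a remote slave volume (assumes ssh access is set up)
    gluster volume geo-replication gv0 backupsite::gv0-slave create push-pem
    gluster volume geo-replication gv0 backupsite::gv0-slave start

    # Archival: read-only / WORM volumes
    gluster volume set gv0 features.read-only on
    gluster volume set gv0 features.worm on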
Features
● Isolation for multi-tenancy
  – SSL for data/connection, encryption at rest
● Performance
  – Data, metadata and readdir caching
● Monitoring
  – Built-in I/O statistics, /proc-like interface for introspection (example below)
● Provisioning
  – puppet-gluster, gluster-deploy
● More..
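For example, the built-in I/O statistics and per-brick status can be queried from the CLI, and SSL can be enabled per volume; the volume name is illustrative:

    # I/O statistics (io-stats translator) and status
    gluster volume profile gv0 start
    gluster volume profile gv0 info
    gluster volume top gv0 read
    gluster volume status gv0

    # Transport security
    gluster volume set gv0 client.ssl on
    gluster volume set gv0 server.ssl on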
Translators in Gluster
● Translator = shared library
● Each translator is a self-contained functional unit.
● Translators can be stacked together to achieve the desired functionality.
● Translators are deployment agnostic – write once, use anywhere! (see the volfile fragment below)
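As an illustration, a simplified, hand-written client volfile fragment that stacks a replicate translator on top of two protocol/client translators might look like the following; the volume names, hosts and paths are made up for this example:

    volume gv0-client-0
        type protocol/client
        option remote-host server1
        option remote-subvolume /data/brick1/gv0
    end-volume

    volume gv0-client-1
        type protocol/client
        option remote-host server2
        option remote-subvolume /data/brick1/gv0
    end-volume

    volume gv0-replicate-0
        type cluster/replicate
        subvolumes gv0-client-0 gv0-client-1
    end-volume

In practice these graphs are generated by glusterd rather than written by hand.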
Data Tiering
● Policy-based data movement across hot and cold tiers
● New translator for identifying candidates for promotion/demotion
● Enables better utilization of different classes of storage devices such as SSDs (CLI example after the diagram below)
Data Tiering – translator stack (diagram): the Tier xlator sits above a hot DHT and a cold DHT subvolume. Each tier's bricks run the CTR (change time recorder) xlator, the POSIX xlator and the other server-side xlators on top of the brick storage, alongside a heat-data store; the hot tier additionally uses the replication xlator. Files are promoted into the hot tier and demoted to the cold tier based on heat.
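A sketch of the tiering workflow using the 3.7-era CLI (later releases moved this under "gluster volume tier"); hostnames and brick paths are placeholders:

    # Attach a replicated hot tier (e.g. SSD-backed bricks) to an existing volume
    gluster volume attach-tier gv0 replica 2 ssd1:/data/hotbrick ssd2:/data/hotbrick

    # Later, drain and detach the hot tier
    gluster volume detach-tier gv0 start
    gluster volume detach-tier gv0 commit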
Bitrot detection
● Detection of at-rest data corruption
● A checksum is associated with each file
● Asynchronous checksum signing
● Periodic data scrubbing
● Bitrot detection upon access (example commands below)
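Illustrative commands, using the 3.7 bitrot CLI as I recall it (verify option values against the release notes):

    gluster volume bitrot gv0 enable
    gluster volume bitrot gv0 scrub-frequency daily
    gluster volume bitrot gv0 scrub-throttle lazy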
Sharding
● Solves fragmentation in Gluster volumes
● Chunks files and places the chunks on any node that has space
● Suitable for large-file workloads requiring parallelism (options below)
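Sharding is enabled through per-volume options; the volume name and block size below are only examples:

    gluster volume set gv0 features.shard on
    gluster volume set gv0 features.shard-block-size 64MB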
Netgroups and Exports for NFS in 3.7
● More advanced authentication configuration based on an /etc/exports-like syntax (illustrated below)
● Support for netgroups
● Patches originally written at Facebook
● Forward-ported from 3.4 to 3.7
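The entries mirror /etc/exports syntax; a hypothetical entry restricting a volume export to a netgroup and a single host could look like the line below. The exact location of the exports and netgroups files under /var/lib/glusterd is an assumption here; check the 3.7 feature documentation.

    # /etc/exports-like entry for the Gluster NFS server (illustrative)
    /gv0 @trusted-netgroup(rw) client1.example.com(ro)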
NFS Ganesha improvements
● Supports active-active NFSv4 and NFSv4.1 with Kerberos
● pNFS support for Gluster
● New upcall infrastructure added in Gluster
● Gluster CLI to manage NFS-Ganesha (example below)
● High availability based on Pacemaker and Corosync
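With the 3.7 CLI integration, NFS-Ganesha can be managed roughly as follows, assuming the HA configuration (ganesha-ha.conf) is already in place; option names are as I recall them:

    # Enable NFS-Ganesha for the trusted storage pool and export a volume
    gluster nfs-ganesha enable
    gluster volume set gv0 ganesha.enable on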
Performance enhancements
● Small file
  – Multi-threaded epoll (tuning example below)
  – In-memory metadata caching on bricks
  – Improvements for directory listing
● Rebalance
  – Parallel rebalance
  – More efficient disk crawling
● Data tiering
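For example, the number of epoll event threads is tunable per volume on both the server and client side; the values are illustrative:

    gluster volume set gv0 server.event-threads 4
    gluster volume set gv0 client.event-threads 4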
TrashCan
● Protection from fat-finger deletions and truncations
● Deleted files are stored in a designated directory within the brick
● Can also capture deletions performed by maintenance operations such as self-heal and rebalance (options below)
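Illustrative options; the trash directory defaults to .trashcan inside the brick, and the option names are as in 3.7:

    gluster volume set gv0 features.trash on
    gluster volume set gv0 features.trash-dir .trashcan
    # Also capture files removed by internal operations such as self-heal and rebalance
    gluster volume set gv0 features.trash-internal-op on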
Arbiter Replication
● Data is replicated 2-way, metadata 3-way
● The additional metadata-only copy (the arbiter) is used for arbitration
● Greatly minimizes the possibility of split-brain
● Existing replica 2 volumes can be converted to arbiter volumes (creation example below)
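A sketch of creating an arbiter volume, where the third brick stores only metadata; hostnames and paths are placeholders:

    gluster volume create gv-arb replica 3 arbiter 1 \
        server1:/data/brick1/gv-arb server2:/data/brick1/gv-arb server3:/data/arbiter/gv-arb
    gluster volume start gv-arb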
Split-brain Resolution
● Existing behavior – EIO returned to the application
● Administrative policies to automatically resolve split-brain
● Users can view split-brained objects and resolve the split-brain (heal commands below)
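Illustrative heal commands for inspecting and resolving split-brained files; the file and brick names are placeholders:

    # List files in split-brain
    gluster volume heal gv0 info split-brain

    # Resolve by keeping the bigger file, or by choosing one brick as the source
    gluster volume heal gv0 split-brain bigger-file /dir/file.img
    gluster volume heal gv0 split-brain source-brick server1:/data/brick1/gv0 /dir/file.img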
Other major improvements
● Support for inode quotas
● Volume clone from snapshot
● Snapshot scheduling
● glusterfind – 'needle in a haystack' change detection (examples below)
● Loads of bug fixes
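Hedged sketches for a few of these; session, snapshot and volume names are placeholders:

    # Inode (object-count) quota on a directory
    gluster volume quota gv0 limit-objects /projects 10000

    # Clone a writable volume from a snapshot
    gluster snapshot clone gv0-clone snap1

    # glusterfind: list files changed since the last session run
    glusterfind create mysession gv0
    glusterfind pre mysession gv0 /tmp/changed-files.txt
    glusterfind post mysession gv0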
Features beyond GlusterFS 3.7
● HyperConvergence with oVirt
● Compression (at rest)
● De-duplication
● Overlay translator
● Multi-protocol support with NFS, FUSE and SMB
● Native ReST APIs for gluster management
● More integration with OpenStack, Containers
Hyperconverged oVirt – Gluster
● Server nodes are used both for virtualization and storage
● Supports both scaling up (adding more disks) and scaling out (adding more hosts)
(Diagram: VMs and the storage engine run on the same nodes, backed by a GlusterFS volume built from bricks on each host.)
GlusterFS Native Driver – OpenStack Manila
● Supports the certificate-based access type of Manila
● Provisions shares that use the 'glusterfs' protocol
● Multi-tenant
  – Separation using tenant-specific certificates
● Supports certificate chaining and cipher lists
GlusterFS Ganesha Driver for OpenStack Manila
(Diagram: a GlusterFS storage backend is exported to each tenant through a per-tenant service VM running an NFS-Ganesha server with the Gluster FSAL; each tenant's Nova VMs mount their shares from that tenant's service VM.)
Gluster 4.0
● Address higher scale
  – Not just higher node count, but also correctness and consistency at that scale
  – glusterd, DHT changes
● Support more heterogeneous environments
  – Multiple OSes, multiple storage types, multiple networks, NSR
● Increase deployment flexibility
  – e.g. data classification, multiple replication/erasure types and levels
New Style Replication
● Server-side replication
● Controlled by a designated “leader”, also known as the sweeper
● Advantages
  – Client network bandwidth usage is optimized for direct (FUSE) mounts
  – Avoidance of split-brain
DHTv2
● Improved scalability and performance for all directory-entry operations
● High consistency and reliability for conflicting directory-entry operations, and for layout repair
● Better performance for rebalance
Thousand node glusterd
● Scale glusterd to manage more than 1000 nodes
● Paxos/Raft for membership and configuration management
Gluster 4.0 – What's next?
● Code name for the release? Open to suggestions
● Submissions for feature proposals are still open!
● Implementation of key features in progress.
● Voting on feature proposals during design summit
● Tentatively planned for May 2016
Striped Volume
● Aggregation of chunks of files placed on various bricks (creation example below)
● Normally recommended for workloads involving very large files and parallel access
● The WIP sharding feature is likely to supersede striped volumes
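For reference, a striped volume was created with the stripe count keyword (now effectively deprecated in favor of sharding); hosts and paths are placeholders:

    gluster volume create gv-stripe stripe 4 server{1..4}:/data/brick1/gv-stripe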
GlusterFS concepts – Trusted Storage Pool
● a.k.a. cluster
● glusterd uses a membership protocol to form the trusted storage pool
● The trusted storage pool is invite-only
● Membership information is used for determining quorum
● Members can be dynamically added to and removed from the pool (peer commands below)
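Peers are added to and removed from the pool with the peer commands; the hostname is a placeholder:

    # From an existing member, invite a new node into the trusted storage pool
    gluster peer probe server4
    gluster peer status
    gluster pool list

    # Remove a member
    gluster peer detach server4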
GlusterFS concepts – Bricks
● A brick is the combination of a node and an export directory, e.g. hostname:/dir
● Each brick inherits the limits of the underlying filesystem
● No limit on the number of bricks per node
● Data and metadata are stored on bricks
(Diagram: three storage nodes and their export directories – one node with 3 bricks, one with 5 bricks, and one with 3 bricks, e.g. /export1 through /export5.)