This document provides an overview of the Gluster distributed storage system and its future directions. It discusses why Gluster is useful given ever-increasing data volumes, and defines Gluster as a scale-out distributed storage system that aggregates storage over a network to provide a unified namespace. It outlines typical deployments and architecture, and describes the various volume types (distributed, replicated, and dispersed). It also covers access mechanisms, features, use cases and monitoring integration. Finally, it discusses recent releases and new features in development, such as data tiering, bitrot detection and sharding, that improve performance and capabilities.
Why Gluster?
● 2.5+ exabytes of data are produced every day!
● 90% of the world's data was created in the last two years
● All of that data needs to be stored somewhere!
● Commoditization and democratization of storage are the way to go
source: http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
What is Gluster?
● Scale-out distributed storage system.
● Aggregates storage exports over network interconnects to provide a unified namespace.
● File, Object and Block interfaces.
● Layered on disk file systems that support extended attributes.
Gluster Architecture – Foundations
● Software only, runs on commodity hardware
● No external metadata servers
● Scale-out with Elasticity
● Extensible and modular
Volumes in Gluster
● Logical collection of exports aka bricks.
● Identified by an administrative name.
● Volume or a part of the volume used by clients for data
CRUD operations.
● Multiple volume types supported currently
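A minimal sketch of creating, starting and mounting a volume with the gluster CLI; the hostnames (server1..server3), brick paths and the volume name gv0 are placeholders, not part of the original deck:

    # Create a 3-way replicated volume from one brick on each of three peers
    gluster volume create gv0 replica 3 \
        server1:/data/brick1/gv0 server2:/data/brick1/gv0 server3:/data/brick1/gv0
    gluster volume start gv0
    gluster volume info gv0

    # Mount it on a client over the native FUSE protocol
    mount -t glusterfs server1:/gv0 /mnt/gv0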
Dispersed Volume
● Introduced in GlusterFS 3.6
● Erasure coding / RAID-5-like protection over the network
● “Disperses” data onto various bricks
● Algorithm: Reed-Solomon
● Non-systematic erasure coding
● Encoding / decoding done on the client side (creation example below)
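A sketch of creating a dispersed volume (syntax as of GlusterFS 3.6+); hostnames and brick paths are placeholders. With disperse 6 redundancy 2, any 2 of the 6 bricks can be lost without losing data:

    # 6 bricks total, 2 of them holding redundancy fragments
    gluster volume create gv-ec disperse 6 redundancy 2 \
        server{1..6}:/data/brick1/gv-ec
    gluster volume start gv-ec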
Features
● Scale-out NAS
  – Elasticity, quotas
● Data Protection and Recovery
  – Volume and file snapshots, User Serviceable Snapshots, geographic/asynchronous replication
● Archival
  – Read-only, WORM
● Native CLI / API for management (example commands below)
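Illustrative CLI invocations for a few of these features; the volume name, slave host, directories and sizes are placeholders:

    # Directory quota
    gluster volume quota gv0 enable
    gluster volume quota gv0 limit-usage /projects 10GB

    # Volume snapshot (requires thinly provisioned LVM bricks) and user-serviceable snapshots
    gluster snapshot create snap1 gv0
    gluster volume set gv0 features.uss enable

    # Asynchronous geo-replication to a remote slave volume (assumes ssh access is set up)
    gluster volume geo-replication gv0 backupsite::gv0-slave create push-pem
    gluster volume geo-replication gv0 backupsite::gv0-slave start

    # Archival: read-only / WORM volumes
    gluster volume set gv0 features.read-only on
    gluster volume set gv0 features.worm on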
Features
● Isolation for multi-tenancy
  – SSL for data/connection, encryption at rest
● Performance
  – Data, metadata and readdir caching
● Monitoring
  – Built-in I/O statistics, /proc-like interface for introspection (example below)
● Provisioning
  – puppet-gluster, gluster-deploy
● More..
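For example, the built-in I/O statistics and per-brick status can be queried from the CLI, and SSL can be enabled per volume; the volume name is illustrative:

    # I/O statistics (io-stats translator) and status
    gluster volume profile gv0 start
    gluster volume profile gv0 info
    gluster volume top gv0 read
    gluster volume status gv0

    # Transport security
    gluster volume set gv0 client.ssl on
    gluster volume set gv0 server.ssl on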
Translators in Gluster
● Translator = shared library
● Each translator is a self-contained functional unit.
● Translators can be stacked together to achieve the desired functionality.
● Translators are deployment agnostic – write once, use anywhere! (see the volfile fragment below)
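As an illustration, a simplified, hand-written client volfile fragment that stacks a replicate translator on top of two protocol/client translators might look like the following; the volume names, hosts and paths are made up for this example:

    volume gv0-client-0
        type protocol/client
        option remote-host server1
        option remote-subvolume /data/brick1/gv0
    end-volume

    volume gv0-client-1
        type protocol/client
        option remote-host server2
        option remote-subvolume /data/brick1/gv0
    end-volume

    volume gv0-replicate-0
        type cluster/replicate
        subvolumes gv0-client-0 gv0-client-1
    end-volume

In practice these graphs are generated by glusterd rather than written by hand.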
Data Tiering
● Policy-based data movement across hot and cold tiers
● New translator for identifying candidates for promotion/demotion
● Enables better utilization of different classes of storage devices such as SSDs (CLI example after the diagram below)
Data Tiering – translator stack (diagram): the Tier xlator sits above a hot DHT and a cold DHT subvolume. Each tier's bricks run the CTR (change time recorder) xlator, the POSIX xlator and the other server-side xlators on top of the brick storage, alongside a heat-data store; the hot tier additionally uses the replication xlator. Files are promoted into the hot tier and demoted to the cold tier based on heat.
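A sketch of the tiering workflow using the 3.7-era CLI (later releases moved this under "gluster volume tier"); hostnames and brick paths are placeholders:

    # Attach a replicated hot tier (e.g. SSD-backed bricks) to an existing volume
    gluster volume attach-tier gv0 replica 2 ssd1:/data/hotbrick ssd2:/data/hotbrick

    # Later, drain and detach the hot tier
    gluster volume detach-tier gv0 start
    gluster volume detach-tier gv0 commit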
Bitrot detection
● Detection of at-rest data corruption
● A checksum is associated with each file
● Asynchronous checksum signing
● Periodic data scrubbing
● Bitrot detection upon access (example commands below)
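Illustrative commands, using the 3.7 bitrot CLI as I recall it (verify option values against the release notes):

    gluster volume bitrot gv0 enable
    gluster volume bitrot gv0 scrub-frequency daily
    gluster volume bitrot gv0 scrub-throttle lazy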
Sharding
● Solves fragmentation in Gluster volumes
● Chunks files and places the chunks on any node that has space
● Suitable for large-file workloads requiring parallelism (options below)
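Sharding is enabled through per-volume options; the volume name and block size below are only examples:

    gluster volume set gv0 features.shard on
    gluster volume set gv0 features.shard-block-size 64MB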
Netgroups and Exports for NFS in 3.7
● More advanced authentication configuration based on an /etc/exports-like syntax (illustrated below)
● Support for netgroups
● Patches originally written at Facebook
● Forward-ported from 3.4 to 3.7
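The entries mirror /etc/exports syntax; a hypothetical entry restricting a volume export to a netgroup and a single host could look like the line below. The exact location of the exports and netgroups files under /var/lib/glusterd is an assumption here; check the 3.7 feature documentation.

    # /etc/exports-like entry for the Gluster NFS server (illustrative)
    /gv0 @trusted-netgroup(rw) client1.example.com(ro)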
NFS Ganesha improvements
● Supports active-active NFSv4 and NFSv4.1 with Kerberos
● pNFS support for Gluster
● New upcall infrastructure added in Gluster
● Gluster CLI to manage NFS-Ganesha (example below)
● High availability based on Pacemaker and Corosync
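With the 3.7 CLI integration, NFS-Ganesha can be managed roughly as follows, assuming the HA configuration (ganesha-ha.conf) is already in place; option names are as I recall them:

    # Enable NFS-Ganesha for the trusted storage pool and export a volume
    gluster nfs-ganesha enable
    gluster volume set gv0 ganesha.enable on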
Performance enhancements
● Small file
  – Multi-threaded epoll (tuning example below)
  – In-memory metadata caching on bricks
  – Improvements for directory listing
● Rebalance
  – Parallel rebalance
  – More efficient disk crawling
● Data tiering
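For example, the number of epoll event threads is tunable per volume on both the server and client side; the values are illustrative:

    gluster volume set gv0 server.event-threads 4
    gluster volume set gv0 client.event-threads 4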
TrashCan
● Protection from fat-finger deletions and truncations
● Deleted files are stored in a designated directory within the brick
● Can also capture deletions performed by maintenance operations such as self-heal and rebalance (options below)
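Illustrative options; the trash directory defaults to .trashcan inside the brick, and the option names are as in 3.7:

    gluster volume set gv0 features.trash on
    gluster volume set gv0 features.trash-dir .trashcan
    # Also capture files removed by internal operations such as self-heal and rebalance
    gluster volume set gv0 features.trash-internal-op on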
Arbiter Replication
● Data is replicated 2-way, metadata 3-way
● The additional metadata-only copy (the arbiter) is used for arbitration
● Greatly minimizes the possibility of split-brain
● Existing replica 2 volumes can be converted to arbiter volumes (creation example below)
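A sketch of creating an arbiter volume, where the third brick stores only metadata; hostnames and paths are placeholders:

    gluster volume create gv-arb replica 3 arbiter 1 \
        server1:/data/brick1/gv-arb server2:/data/brick1/gv-arb server3:/data/arbiter/gv-arb
    gluster volume start gv-arb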
Split-brain Resolution
● Existing behavior – EIO returned to the application
● Administrative policies to automatically resolve split-brain
● Users can view split-brained objects and resolve the split-brain (heal commands below)
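Illustrative heal commands for inspecting and resolving split-brained files; the file and brick names are placeholders:

    # List files in split-brain
    gluster volume heal gv0 info split-brain

    # Resolve by keeping the bigger file, or by choosing one brick as the source
    gluster volume heal gv0 split-brain bigger-file /dir/file.img
    gluster volume heal gv0 split-brain source-brick server1:/data/brick1/gv0 /dir/file.img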
Other major improvements
● Support for inode quotas
● Volume clone from snapshot
● Snapshot scheduling
● glusterfind – 'needle in a haystack' change detection (examples below)
● Loads of bug fixes
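Hedged sketches for a few of these; session, snapshot and volume names are placeholders:

    # Inode (object-count) quota on a directory
    gluster volume quota gv0 limit-objects /projects 10000

    # Clone a writable volume from a snapshot
    gluster snapshot clone gv0-clone snap1

    # glusterfind: list files changed since the last session run
    glusterfind create mysession gv0
    glusterfind pre mysession gv0 /tmp/changed-files.txt
    glusterfind post mysession gv0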
Features beyond GlusterFS 3.7
● HyperConvergence with oVirt
● Compression (at rest)
● De-duplication
● Overlay translator
● Multi-protocol support with NFS, FUSE and SMB
● Native ReST APIs for gluster management
● More integration with OpenStack, Containers
Hyperconverged oVirt – Gluster
● Server nodes are used both for virtualization and storage
● Supports both scaling up (adding more disks) and scaling out (adding more hosts)
(Diagram: VMs and the storage engine run on the same nodes, backed by a GlusterFS volume built from bricks on each host.)
GlusterFS Native Driver – OpenStack Manila
● Supports the certificate-based access type of Manila
● Provisions shares that use the 'glusterfs' protocol
● Multi-tenant
  – Separation using tenant-specific certificates
● Supports certificate chaining and cipher lists
GlusterFS Ganesha Driver for OpenStack Manila
(Diagram: a GlusterFS storage backend is exported to each tenant through a per-tenant service VM running an NFS-Ganesha server with the Gluster FSAL; each tenant's Nova VMs mount their shares from that tenant's service VM.)
Gluster 4.0
● Address higher scale
  – Not just higher node count, but also correctness and consistency at that scale
  – glusterd, DHT changes
● Support more heterogeneous environments
  – Multiple OSes, multiple storage types, multiple networks, NSR
● Increase deployment flexibility
  – e.g. data classification, multiple replication/erasure types and levels
New Style Replication
● Server-side replication
● Controlled by a designated “leader”, also known as the sweeper
● Advantages
  – Client network bandwidth usage is optimized for direct (FUSE) mounts
  – Avoidance of split-brain
DHTv2
● Improved scalability and performance for all directory-entry operations
● High consistency and reliability for conflicting directory-entry operations, and for layout repair
● Better performance for rebalance
Thousand node glusterd
● Scale glusterd to manage more than 1000 nodes
● Paxos/Raft for membership and configuration management
Gluster 4.0 – What's next?
● Code name for the release? Open to suggestions
● Submissions for feature proposals are still open!
● Implementation of key features in progress.
● Voting on feature proposals during design summit
● Tentatively planned for May 2016
Striped Volume
● Aggregation of chunks of files placed on various bricks (creation example below)
● Normally recommended for workloads involving very large files and parallel access
● The WIP sharding feature is likely to supersede striped volumes
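For reference, a striped volume was created with the stripe count keyword (now effectively deprecated in favor of sharding); hosts and paths are placeholders:

    gluster volume create gv-stripe stripe 4 server{1..4}:/data/brick1/gv-stripe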
GlusterFS concepts – Trusted Storage Pool
● a.k.a. cluster
● glusterd uses a membership protocol to form the trusted storage pool
● The trusted storage pool is invite-only
● Membership information is used for determining quorum
● Members can be dynamically added to and removed from the pool (peer commands below)
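Peers are added to and removed from the pool with the peer commands; the hostname is a placeholder:

    # From an existing member, invite a new node into the trusted storage pool
    gluster peer probe server4
    gluster peer status
    gluster pool list

    # Remove a member
    gluster peer detach server4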
GlusterFS concepts – Bricks
● A brick is the combination of a node and an export directory, e.g. hostname:/dir
● Each brick inherits the limits of the underlying filesystem
● No limit on the number of bricks per node
● Data and metadata are stored on bricks
(Diagram: three storage nodes and their export directories – one node with 3 bricks, one with 5 bricks, and one with 3 bricks, e.g. /export1 through /export5.)