Performance Characterization in Large Distributed File Systems with GlusterFS
1. The Future of Storage is Open for Business 4
GlusterFS Architecture – Key Value Propositions
Highly scalable storage
➢ Multi-petabyte clusters
➢ Geo-replication to disperse data
➢ Scale-up and scale-out
➢ No metadata bottlenecks – uses an algorithmic approach
Highly cost-effective
➢ Leverages commodity x86 servers
➢ No SAN
➢ Software only
➢ Process data and run analytics on the storage node
Highly flexible
➢ Physical, virtual, cloud and hybrid deployment models
➢ File and object access protocols
➢ No lock-in
Deployment agnostic
➢ Deploy on-premise, in the public cloud, or in a hybrid setup
Open & standards based
➢ NFS, CIFS, POSIX
➢ REST
GlusterFS Features
➢ Unidirectional asynchronous replication
➢ Directory and volume quotas
➢ Read-only and WORM (write-once-read-many) volumes
➢ Block device export
➢ I/O statistics
➢ Multi-tenancy, encryption, compression – work in progress
Use Cases – Current
➢ Unstructured data storage
➢ Archival
➢ Disaster recovery
➢ Virtual machine image store
➢ Cloud storage for service providers
➢ Content cloud
GlusterFS concepts
GlusterFS concepts – Trusted Storage Pool
➢ A Trusted Storage Pool (cluster) is a collection of storage servers.
➢ The pool is formed by invitation – you “probe” a new member from the cluster, not vice versa.
➢ It is the logical partition for all data and management operations.
➢ Membership information is used for determining quorum.
➢ Members can be dynamically added to and removed from the pool.
GlusterFS concepts – Trusted Storage Pool
[Diagram: Node1 sends a probe to Node2; Node2 accepts the probe. Node 1 and Node 2 are now peers in a trusted storage pool.]
GlusterFS concepts – Trusted Storage Pool
[Diagram: a trusted storage pool of Node1, Node2 and Node3; Node3 is then detached from the pool.]
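The probe/detach flow above maps directly onto the gluster CLI. A minimal sketch, assuming hostnames server1–server3 that resolve between the nodes, run from an existing pool member:

```shell
# Invite new members into the pool (run from a node already in it):
gluster peer probe server2
gluster peer probe server3

# List peers and their connection state:
gluster peer status

# Remove a member from the pool (it must hold no bricks of any volume):
gluster peer detach server3
```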
GlusterFS concepts – Bricks
➢ A brick is the combination of a node and an export directory, e.g. hostname:/dir
➢ Each brick inherits the limits of the underlying filesystem.
➢ There is no limit on the number of bricks per node.
➢ Ideally, each brick in a cluster should be of the same size.
[Diagram: three storage nodes exporting 3, 5 and 3 bricks respectively (/export1 through /export5).]
GlusterFS concepts – Volumes
➢ A volume is a logical collection of bricks.
➢ A volume is identified by an administrator-provided name.
➢ A volume is a mountable entity; the volume name is provided at mount time:
➢ mount -t glusterfs server1:/<volname> /my/mnt/point
➢ Bricks from the same node can be part of different volumes.
GlusterFS concepts – Volumes
[Diagram: two volumes, “music” and “videos”, each built from bricks (/export/brick1, /export/brick2) spread across Node1, Node2 and Node3.]
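A layout like the one pictured can be built with the gluster CLI. A sketch, assuming the node names resolve and the export directories already exist on formatted bricks:

```shell
# Each volume draws one brick from every node:
gluster volume create music \
    node1:/export/brick1 node2:/export/brick1 node3:/export/brick1
gluster volume create videos \
    node1:/export/brick2 node2:/export/brick2 node3:/export/brick2

# Volumes must be started before they can be mounted:
gluster volume start music
gluster volume start videos

# Inspect the resulting brick layout:
gluster volume info
```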
Volume Types
➢ The type of a volume is specified at volume creation time.
➢ The volume type determines how and where data is placed.
➢ The following volume types are supported in GlusterFS:
a) Distribute
b) Stripe
c) Replicate
d) Distributed Replicate
e) Striped Replicate
f) Distributed Striped Replicate
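Hedged examples of creating the basic types with the gluster CLI; the hostnames and brick paths below are placeholders:

```shell
# Distribute (the default): files are hashed across all bricks
gluster volume create dist-vol server1:/export/b1 server2:/export/b1

# Replicate: every file is stored on both bricks
gluster volume create repl-vol replica 2 server1:/export/b2 server2:/export/b2

# Stripe: each file is split across both bricks
gluster volume create stripe-vol stripe 2 server1:/export/b3 server2:/export/b3

# Distributed Replicate: 4 bricks form 2 replica pairs,
# and files are distributed across the pairs
gluster volume create dr-vol replica 2 \
    server1:/export/b4 server2:/export/b4 \
    server3:/export/b4 server4:/export/b4
```

The brick count must be a multiple of the replica (or stripe) count; brick order determines which bricks pair up as replicas.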
Distributed Volume
➢ Distributes files across the bricks of the volume.
➢ Directories are present on all bricks of the volume.
➢ A single brick failure results in loss of availability for the data on that brick.
➢ Removes the need for an external metadata server.
How does a replicated volume work?
Access Mechanisms
Gluster volumes can be accessed via the following mechanisms:
➢ FUSE-based native protocol
➢ NFS
➢ SMB
➢ libgfapi
➢ REST
➢ HDFS
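Typical client-side mount commands for the first three mechanisms; the server name, volume name and share name are placeholders (the SMB share name in particular depends on the local Samba configuration):

```shell
# Native (FUSE) mount:
mount -t glusterfs server1:/myvol /mnt/gluster

# NFS, via Gluster's built-in NFSv3 server:
mount -t nfs -o vers=3 server1:/myvol /mnt/nfs

# SMB/CIFS, via a Samba re-export of the volume:
mount -t cifs //server1/myvol-share /mnt/smb -o user=admin
```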
Access Mechanisms – How does FUSE work?
Access Mechanisms
[Diagram: clients reaching an RHS (Red Hat Storage) server – Windows clients via smbd (CIFS), NFSv3 clients, HTTP clients via Swift, Hadoop, qemu (KVM), and applications on an RHS client. On the client, the glusterfs client process sits below VFS/FUSE in the Linux kernel and talks to glusterfsd brick server processes on the RHS servers.]
FUSE-based native access
REST-based access
libgfapi
➢ Exposes APIs for accessing Gluster volumes.
➢ Reduces context switches by bypassing FUSE.
➢ QEMU is integrated with libgfapi.
➢ Integration of Samba with libgfapi is in progress.
➢ Both sync and async interfaces are available.
➢ Emerging bindings for various languages.
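For example, a QEMU built with Gluster support can address a VM image through libgfapi using a gluster:// URI, skipping the FUSE mount entirely; the server and volume names here are placeholders:

```shell
# Create a VM image directly on a Gluster volume via libgfapi:
qemu-img create gluster://server1/myvol/vm1.img 20G

# Boot a VM from that image (illustrative minimal invocation):
qemu-system-x86_64 -m 1024 -drive file=gluster://server1/myvol/vm1.img,if=virtio
```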
[Diagram: the access-mechanism stack again, now with libgfapi linking applications, Swift, qemu (KVM) and Hadoop directly to the glusterfsd brick servers, bypassing the FUSE/VFS path on the client.]
Translators in GlusterFS
➢ Building blocks for a GlusterFS process.
➢ Based on translators in GNU Hurd.
➢ Each translator is a functional unit.
➢ Translators can be stacked together to achieve the desired functionality.
➢ Translators are deployment agnostic – they can be loaded in either the client or server stack.
Customizable GlusterFS Client/Server Stack
[Diagram: GlusterFS clients stack translators (Read Ahead, I/O Cache, Distribute/Stripe, Replicate) below VFS and connect over GigE/10GigE (TCP/IP) or InfiniBand (RDMA) to Gluster servers; each server runs a Server translator over a POSIX translator on an ext4 filesystem (Brick 1 through Brick n).]
Provisioning GlusterFS
➢ design -> install -> verify -> monitor
➢ design principles:
➢ balanced hardware configuration – is the hardware sufficient?
➢ do not underfund the network; it is the cheapest of the three (CPU, storage, network)
➢ bricks per volume are limited -> make bricks big
➢ file size distribution affects the type of bottleneck
➢ network traffic increases for non-native protocols
➢ writes generate extra replication traffic
➢ install principles:
➢ provision the network for your use case
➢ configure storage to be future-ready
Recommended storage brick configuration (with XFS)
➢ 12 drives per RAID6 LUN, 1 LUN per brick
➢ hardware RAID stripe size 256 KB (default is 64 KB)
➢ pvcreate --dataalignment 2560k
➢ mkfs.xfs -i size=512 -n size=8192 -d su=256k,sw=10 /dev/vg_bricks/lvN
➢ mount options: inode64,noatime
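Putting the bullets together, a provisioning sketch for one brick, assuming the RAID6 LUN appears as /dev/sdb (the 2560 KB alignment is 10 data disks x 256 KB stripe):

```shell
# Align the physical volume to a full RAID6 stripe (10 data disks x 256 KB):
pvcreate --dataalignment 2560k /dev/sdb
vgcreate vg_bricks /dev/sdb
lvcreate -l 100%FREE -n lv1 vg_bricks

# XFS with 512-byte inodes (room for Gluster xattrs), 8 KB directory blocks,
# and stripe geometry matching the RAID layout:
mkfs.xfs -i size=512 -n size=8192 -d su=256k,sw=10 /dev/vg_bricks/lv1

# Mount with the recommended options and persist across reboots:
mkdir -p /export/brick1
mount -o inode64,noatime /dev/vg_bricks/lv1 /export/brick1
echo '/dev/vg_bricks/lv1 /export/brick1 xfs inode64,noatime 0 0' >> /etc/fstab
```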
Deploying the network
➢ if using only non-native protocols, separate Gluster and non-Gluster traffic onto separate VLANs
➢ isolates self-heal and rebalance traffic
➢ separates replica traffic from user traffic
➢ jumbo frames improve throughput, but require switch configuration
➢ bisection bandwidth matters – Gluster doesn't respect rack boundaries
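Illustrative iproute2 commands for the VLAN and jumbo-frame points above; the interface name, VLAN ID and addressing are assumptions, and the switch must be configured to match:

```shell
# Jumbo frames (the switch ports must also allow MTU 9000):
ip link set dev eth0 mtu 9000

# A tagged VLAN dedicated to Gluster traffic:
ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 192.168.100.11/24 dev eth0.100
ip link set dev eth0.100 up
```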
Capturing performance problems onsite
➢ top utility – press H to show per-thread CPU utilization; detects “hot-thread” problems where a single thread saturates its core
➢ NFS: nfsiostat and nfsstat utilities
➢ gluster volume profile – shows latency and throughput for Gluster RPC operations
➢ gluster volume top – shows which files and servers are hot
➢ Wireshark Gluster plug-in – isolates protocol-level problems
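The two gluster diagnostics mentioned above are driven like this (the volume name myvol is a placeholder):

```shell
# Per-brick latency/throughput statistics for each Gluster RPC operation:
gluster volume profile myvol start
gluster volume profile myvol info
gluster volume profile myvol stop

# Hottest files and per-brick performance:
gluster volume top myvol read        # files with the most reads
gluster volume top myvol write-perf  # write throughput per brick
```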
Resources
Mailing lists:
gluster-users@gluster.org
gluster-devel@nongnu.org
IRC:
#gluster and #gluster-dev on freenode
Links:
http://www.gluster.org
http://hekafs.org
http://forge.gluster.org
http://www.gluster.org/community/documentation/index.php/Arch
Questions?
Thank You!