Performance Characterization in a Large Distributed File System with GlusterFS

Neependra Khare
Founder at CloudYuga, Docker Captain in the Docker Community, Corporate Trainer on Container Technologies
The Future of Storage is Open for Business 4
GlusterFS Architecture – key value propositions

Highly scalable storage
➢ Multi-petabyte clusters
➢ Geo-replication to disperse data
➢ Scale-up and scale-out
➢ No metadata bottlenecks – uses an algorithmic approach

Highly cost-effective
➢ Leverages commodity x86 servers
➢ No SAN
➢ Software only
➢ Process data and run analytics on the storage node

Highly flexible
➢ Physical, virtual, cloud and hybrid deployment models
➢ File and object access protocols
➢ No lock-in

Deployment agnostic
➢ Deploy on-premise, in the public cloud or in a hybrid setup

Open & standards based
➢ NFS, CIFS, POSIX
➢ REST
GlusterFS Features
➢ Unidirectional asynchronous replication
➢ Directory and volume quotas
➢ Read-only and WORM volumes
➢ Block device volumes
➢ I/O statistics
➢ Multi-tenancy, encryption, compression – work in progress
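Directory quotas from the feature list above are driven entirely from the gluster CLI. A minimal sketch, assuming a volume named myvol and a directory /projects inside it (both hypothetical names):

```shell
# Enable the quota feature on the volume
gluster volume quota myvol enable

# Cap usage of the /projects directory (path is relative to the volume root)
gluster volume quota myvol limit-usage /projects 10GB

# Show configured limits and current usage
gluster volume quota myvol list
```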
Use Cases – Current
➢ Unstructured data storage
➢ Archival
➢ Disaster recovery
➢ Virtual machine image store
➢ Cloud storage for service providers
➢ Content cloud
GlusterFS concepts
GlusterFS concepts – Trusted Storage Pool
➢ A Trusted Storage Pool (cluster) is a collection of storage servers.
➢ The pool is formed by invitation – you “probe” a new member from the cluster, not vice versa.
➢ It forms the logical partition for all data and management operations.
➢ Membership information is used for determining quorum.
➢ Members can be dynamically added to and removed from the pool.
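The probe-by-invitation model above maps directly onto the peer commands. A sketch, assuming hosts named server1 and server2 (run from server1, which is already in the pool):

```shell
# Invite server2 into the trusted storage pool
gluster peer probe server2

# Verify pool membership and connection state
gluster peer status

# Remove a member again (it must not host bricks of active volumes)
gluster peer detach server2
```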
GlusterFS concepts – Trusted Storage Pool
[Diagram: Node1 sends a “probe” to Node2; Node2 answers “probe accepted”; Node1 and Node2 are now peers in a trusted storage pool.]
GlusterFS concepts – Trusted Storage Pool
[Diagram: a trusted storage pool of Node1, Node2 and Node3; Node3 is then detached, leaving Node1 and Node2 as peers.]
GlusterFS concepts – Bricks
➢ A brick is the combination of a node and an export directory, e.g. hostname:/dir.
➢ Each brick inherits the limits of the underlying filesystem.
➢ There is no limit on the number of bricks per node.
➢ Ideally, each brick in a cluster should be of the same size.
[Diagram: three storage nodes exporting 3, 5 and 3 bricks respectively (/export1, /export2, …).]
GlusterFS concepts – Volumes
➢ A volume is a logical collection of bricks.
➢ A volume is identified by an administrator-provided name.
➢ A volume is a mountable entity; the volume name is supplied at mount time:
  mount -t glusterfs server1:/<volname> /my/mnt/point
➢ Bricks from the same node can be part of different volumes.
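Creating and mounting a volume ties these concepts together. A sketch, assuming two pooled servers (server1, server2) each exporting /export/brick1:

```shell
# Build a volume named "music" from one brick on each server
gluster volume create music server1:/export/brick1 server2:/export/brick1

# Volumes must be started before they can be mounted
gluster volume start music

# Mount via the native FUSE client; any server in the pool can be named
mount -t glusterfs server1:/music /my/mnt/point
```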
GlusterFS concepts – Volumes
[Diagram: two volumes, “music” and “videos”, built from bricks /export/brick1 and /export/brick2 on Node1, Node2 and Node3.]
Volume Types
➢ The type of a volume is specified at volume creation time.
➢ The volume type determines how and where data is placed.
➢ The following volume types are supported in GlusterFS:
a) Distribute
b) Stripe
c) Replicate
d) Distributed Replicate
e) Striped Replicate
f) Distributed Striped Replicate
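The type follows from the keywords passed to volume create; with no keyword the volume is plain distribute. A sketch with hypothetical server and brick names:

```shell
# a) Distribute (default): files spread across the bricks
gluster volume create distvol server1:/export/b1 server2:/export/b1

# c) Replicate: every file stored on both bricks
gluster volume create replvol replica 2 server1:/export/b2 server2:/export/b2

# d) Distributed Replicate: 4 bricks = 2 replica pairs,
#    files distributed across the pairs
gluster volume create drvol replica 2 \
    server1:/export/b3 server2:/export/b3 server3:/export/b3 server4:/export/b3
```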
Distributed Volume
➢ Distributes files across the bricks of the volume.
➢ Directories are present on all bricks of the volume.
➢ A single brick failure makes the files on that brick unavailable.
➢ Removes the need for an external metadata server.
How does a replicated volume work?
Access Mechanisms
Gluster volumes can be accessed via the following mechanisms:
➢ FUSE-based native protocol
➢ NFS
➢ SMB
➢ libgfapi
➢ REST
➢ HDFS
Access Mechanisms – How does FUSE work?
Access Mechanisms
[Diagram: Windows (CIFS via smbd), NFSv3, HTTP (Swift), Hadoop and qemu (KVM) clients reach the RHS server over their respective protocols; the native client runs the glusterfs client process behind FUSE and the kernel VFS; on each server, glusterfsd serves a brick and communicates with the other RHS servers.]
FUSE-based native access
NFS
REST-based access
libgfapi
➢ Exposes APIs for accessing Gluster volumes.
➢ Reduces context switches by bypassing FUSE.
➢ QEMU is integrated with libgfapi.
➢ Integration of Samba with libgfapi is in progress.
➢ Both sync and async interfaces are available.
➢ Bindings for various languages are emerging.
[Diagram: the same access-mechanism stack, with libgfapi giving applications, Swift, Hadoop and qemu (KVM) a direct path to the glusterfsd brick server, bypassing FUSE.]
Translators in GlusterFS
➢ Building blocks of a GlusterFS process.
➢ Based on the translator concept in GNU Hurd.
➢ Each translator is a functional unit.
➢ Translators can be stacked together to achieve the desired functionality.
➢ Translators are deployment agnostic – they can be loaded in either the client or the server stack.
Customizable GlusterFS Client/Server Stack
[Diagram: the GlusterFS client stacks translators such as I/O cache, read-ahead, distribute/stripe and replicate beneath the VFS; clients reach Gluster servers over GigE/10GigE (TCP/IP) or InfiniBand (RDMA); on each server, server and POSIX translators sit on top of an ext4 brick (Brick 1 … Brick n).]
Provisioning GlusterFS
➢ design -> install -> verify -> monitor
➢ Design principles:
  ➢ Balance the hardware configuration – is the hardware sufficient?
  ➢ Do not underfund the network; it is the cheapest of the three (CPU, storage, network).
  ➢ Bricks per volume are limited, so make bricks big.
  ➢ The file size distribution affects the type of bottleneck.
  ➢ Non-native protocols increase network traffic.
  ➢ Replication adds extra traffic for writes.
➢ Install principles:
  ➢ Provision the network for your use case.
  ➢ Configure storage to be future-ready.
Recommended storage brick configuration (with XFS)
➢ 12 drives per RAID6 LUN, 1 LUN per brick
➢ Hardware RAID stripe size of 256 KB (the default is 64 KB)
➢ pvcreate --dataalignment 2560k
➢ mkfs.xfs -i size=512 -n size=8192 -d su=256k,sw=10 /dev/vg_bricks/lvN
➢ Mount options: inode64,noatime
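Put together, the brick preparation above looks roughly like this, assuming the RAID6 LUN appears as /dev/sdb and the brick is mounted at /export/brick1 (both hypothetical names):

```shell
# Align LVM data to the full RAID stripe (10 data disks x 256 KB = 2560 KB)
pvcreate --dataalignment 2560k /dev/sdb
vgcreate vg_bricks /dev/sdb
lvcreate -l 100%FREE -n lv1 vg_bricks

# 512-byte inodes leave room for Gluster's extended attributes;
# su/sw match the RAID stripe unit and width
mkfs.xfs -i size=512 -n size=8192 -d su=256k,sw=10 /dev/vg_bricks/lv1

mkdir -p /export/brick1
mount -o inode64,noatime /dev/vg_bricks/lv1 /export/brick1
```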
Deploying the network
➢ If only non-native protocols are used, put Gluster and non-Gluster traffic on separate VLANs:
  ➢ isolates self-heal and rebalance traffic
  ➢ separates replica traffic from user traffic
➢ Jumbo frames improve throughput but require switch configuration.
➢ Watch bisection bandwidth – Gluster doesn't respect rack boundaries.
Capturing performance problems onsite
➢ top – press H to show per-thread CPU utilization; detects “hot-thread” problems where a single thread saturates its core.
➢ NFS: the nfsiostat and nfsstat utilities.
➢ gluster volume profile – shows latency and throughput for Gluster RPC operations.
➢ gluster volume top – shows which files and servers are hot.
➢ Wireshark Gluster plug-in – isolates protocol-level problems.
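The two gluster commands above are start/stop style tools. A sketch of a typical session, assuming a volume named myvol and a brick server1:/export/brick1 (hypothetical names):

```shell
# Begin collecting per-brick latency and throughput statistics
gluster volume profile myvol start

# ... run the workload, then dump cumulative and interval stats
gluster volume profile myvol info

# Ten hottest files by read count on one brick
gluster volume top myvol read brick server1:/export/brick1 list-cnt 10

# Profiling adds overhead; turn it off when done
gluster volume profile myvol stop
```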
Resources
Mailing lists:
gluster-users@gluster.org
gluster-devel@nongnu.org
IRC:
#gluster and #gluster-dev on freenode
Links:
http://www.gluster.org
http://hekafs.org
http://forge.gluster.org
http://www.gluster.org/community/documentation/index.php/Arch
Questions?
Thank You!