Performance characterization in a large distributed file system with GlusterFS

GlusterFS talk at the quarterly Large Scale Production Engineering (LSPE) meetup at Yahoo! Bangalore.
http://www.meetup.com/lspe-in/events/108091572/

  1. GlusterFS Architecture – key value propositions
     The Future of Storage is Open for Business
     ➢ Highly scalable storage
        • Multiple-petabyte clusters
        • Geo-replication to disperse data
        • Scale-up and scale-out
        • No metadata bottlenecks – uses an algorithmic approach
     ➢ Highly cost-effective
        • Leverages commodity x86 servers
        • No SAN
        • Software only
        • Process data and run analytics on the storage node
     ➢ Highly flexible
        • Physical, virtual, cloud and hybrid deployment models
        • File and object access protocols
        • No lock-in
     ➢ Deployment agnostic
        • Deploy on-premises, in the public cloud or in a hybrid setup
     ➢ Open and standards based
        • NFS, CIFS, POSIX
        • REST
  2. GlusterFS Features
     ➢ Unidirectional asynchronous replication
     ➢ Directory and volume quotas
     ➢ Read-only and WORM volumes
     ➢ Block device
     ➢ I/O statistics
     ➢ Multi-tenancy, encryption, compression (work in progress)
  3. Use Cases – Current
     ➢ Unstructured data storage
     ➢ Archival
     ➢ Disaster recovery
     ➢ Virtual machine image store
     ➢ Cloud storage for service providers
     ➢ Content cloud
  4. GlusterFS concepts
  5. GlusterFS concepts – Trusted Storage Pool
     ➢ A Trusted Storage Pool (cluster) is a collection of storage servers.
     ➢ A Trusted Storage Pool is formed by invitation: you “probe” a new member from the cluster, not vice versa.
     ➢ The pool is the logical partition for all data and management operations.
     ➢ Membership information is used to determine quorum.
     ➢ Members can be dynamically added to and removed from the pool.
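Since membership decides quorum, here is a minimal sketch of a quorum check. It assumes a simple strict-majority rule; GlusterFS makes its server-quorum policy configurable, and the helper `has_quorum` is hypothetical, not GlusterFS code.

```python
def has_quorum(active_peers: int, pool_size: int) -> bool:
    """Return True if strictly more than half of the pool members are active.

    Hypothetical helper: models a strict-majority quorum rule only.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    if not 0 <= active_peers <= pool_size:
        raise ValueError("active_peers must be between 0 and pool_size")
    return 2 * active_peers > pool_size


# In a 3-node pool, losing one node keeps quorum; losing two does not.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False
# An even split of a 4-node pool leaves neither half with a majority.
print(has_quorum(2, 4))  # False
```

Requiring a strict majority means two halves of a partitioned pool can never both believe they hold quorum.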
  6. GlusterFS concepts – Trusted Storage Pool
     [Diagram: Node1 sends a probe to Node2; once the probe is accepted, Node1 and Node2 are peers in a trusted storage pool.]
  7. GlusterFS concepts – Trusted Storage Pool
     [Diagram: Node1, Node2 and Node3 form a trusted storage pool; a node is then detached from the pool.]
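A toy model of the invitation rule shown in the probe and detach diagrams: only an existing member can bring a new node into the pool, and detaching is also driven from inside the pool. The class `TrustedPool` and its methods are hypothetical; on a real cluster these operations are the `gluster peer probe` and `gluster peer detach` commands.

```python
class TrustedPool:
    """Toy model of trusted-storage-pool membership (hypothetical, not GlusterFS code)."""

    def __init__(self, founder: str) -> None:
        # A pool starts with the node on which it is created.
        self.members = {founder}

    def probe(self, from_node: str, new_node: str) -> None:
        # Membership is by invitation: the probe must come from an existing member.
        if from_node not in self.members:
            raise PermissionError(f"{from_node} is not a pool member and cannot probe")
        self.members.add(new_node)

    def detach(self, from_node: str, node: str) -> None:
        # Removal is also requested from a node that is already in the pool.
        if from_node not in self.members:
            raise PermissionError(f"{from_node} is not a pool member and cannot detach")
        self.members.discard(node)


pool = TrustedPool("node1")
pool.probe("node1", "node2")   # node1 invites node2
pool.probe("node2", "node3")   # any member can invite
pool.detach("node1", "node3")
print(sorted(pool.members))    # ['node1', 'node2']
```

An outside node calling `pool.probe("node4", "node4")` raises `PermissionError`, which is the "not vice versa" rule from the slides.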
  8. GlusterFS concepts – Bricks
     ➢ A brick is the combination of a node and an export directory, e.g. hostname:/dir.
     ➢ Each brick inherits the limits of the underlying filesystem.
     ➢ There is no limit on the number of bricks per node.
     ➢ Ideally, each brick in a cluster should be of the same size.
     [Diagram: three storage nodes exporting 3 bricks (/export1 to /export3), 5 bricks (/export1 to /export5) and 3 bricks (/export1 to /export3).]
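Because a brick is written as `hostname:/dir`, tooling often has to split that string into its node and export directory. A small sketch; the helper `parse_brick` is hypothetical, and it assumes an absolute directory and a host name without colons (so IPv6 literals are not handled).

```python
from typing import NamedTuple


class Brick(NamedTuple):
    host: str
    path: str


def parse_brick(spec: str) -> Brick:
    """Split 'hostname:/dir' into its node and export directory (hypothetical helper)."""
    host, sep, path = spec.partition(":")
    if not sep or not host or not path.startswith("/"):
        raise ValueError(f"expected hostname:/absolute/dir, got {spec!r}")
    return Brick(host, path)


print(parse_brick("server1:/export/brick1"))
# Brick(host='server1', path='/export/brick1')
```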
  9. GlusterFS concepts – Volumes
     ➢ A volume is a logical collection of bricks.
     ➢ A volume is identified by an administrator-provided name.
     ➢ A volume is a mountable entity; the volume name is provided at mount time:
         mount -t glusterfs server1:/<volname> /my/mnt/point
     ➢ Bricks from the same node can be part of different volumes.
  10. GlusterFS concepts – Volumes
      [Diagram: Node1, Node2 and Node3 each export /export/brick1 and /export/brick2; the bricks are grouped into two volumes, "music" and "Videos".]
  11. Volume Types
      ➢ The type of a volume is specified at volume creation time.
      ➢ The volume type determines how and where data is placed.
      ➢ The following volume types are supported in GlusterFS:
         a) Distribute
         b) Stripe
         c) Replicate
         d) Distributed Replicate
         e) Striped Replicate
         f) Distributed Striped Replicate
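For the replicated types, the order in which bricks are listed at creation time matters: in a distributed replicate volume, each group of `replica` consecutive bricks forms one replica set, and files are distributed across those sets. A sketch of that grouping rule; the function name is hypothetical.

```python
def replica_sets(bricks: list[str], replica: int) -> list[list[str]]:
    """Group bricks into replica sets of `replica` consecutive bricks."""
    if replica < 1:
        raise ValueError("replica count must be at least 1")
    if len(bricks) % replica != 0:
        raise ValueError("number of bricks must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


bricks = ["node1:/b1", "node2:/b1", "node1:/b2", "node2:/b2"]
for s in replica_sets(bricks, 2):
    print(s)
# ['node1:/b1', 'node2:/b1']
# ['node1:/b2', 'node2:/b2']
```

Listing bricks in alternating node order, as above, keeps the two copies of each file on different nodes; listing `node1:/b1, node1:/b2` first would place both copies of one set on node1.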
  12. Distributed Volume
      ➢ Distributes files across the bricks of the volume.
      ➢ Directories are present on all bricks of the volume.
      ➢ The failure of a single brick makes the files stored on that brick unavailable.
      ➢ Removes the need for an external metadata server.
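The distributed volume needs no metadata server because each file's location is computed from its name: every brick is assigned a range of a 32-bit hash space, and a file lives on the brick whose range contains the hash of its name. A minimal sketch under stated assumptions: it uses `zlib.crc32` as a stand-in for GlusterFS's own hash function and splits the space evenly, whereas the real layout is kept per directory in extended attributes and need not be even.

```python
import zlib

HASH_SPACE = 2 ** 32  # 32-bit hash space


def layout(bricks: list[str]) -> list[tuple[int, int, str]]:
    """Split the hash space into contiguous, equal ranges, one per brick."""
    step = HASH_SPACE // len(bricks)
    ranges = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs any remainder so the ranges cover the whole space.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + step - 1
        ranges.append((start, end, brick))
    return ranges


def locate(filename: str, ranges: list[tuple[int, int, str]]) -> str:
    """Return the brick whose range contains the file name's hash."""
    h = zlib.crc32(filename.encode("utf-8"))  # stand-in for GlusterFS's hash
    for start, end, brick in ranges:
        if start <= h <= end:
            return brick
    raise AssertionError("ranges do not cover the hash space")


ranges = layout(["server1:/export/brick1", "server2:/export/brick1", "server3:/export/brick1"])
for name in ["song.mp3", "movie.mkv", "notes.txt"]:
    print(name, "->", locate(name, ranges))
```

Any client can run this computation itself, so lookups need no central service; the trade-off noted on the slide is that files hashed to a failed brick are unavailable unless the volume is also replicated.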
  13. How does a replicated volume work?
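As a rough illustration only: in a replicated volume each write is sent to every brick in the replica set, and a read can be served by any brick holding a good copy. The toy model below (class and method names hypothetical) omits self-heal and the locking and change-tracking the real replicate translator uses to keep copies consistent.

```python
class ReplicaSet:
    """Toy model of synchronous replication across bricks (hypothetical)."""

    def __init__(self, bricks: list[str]) -> None:
        self.data = {b: {} for b in bricks}  # per-brick file contents
        self.up = set(bricks)                 # bricks currently reachable

    def write(self, name: str, content: bytes) -> int:
        """Send the write to every reachable brick; return how many received it."""
        for b in self.up:
            self.data[b][name] = content
        return len(self.up)

    def read(self, name: str) -> bytes:
        """Serve the read from the first reachable brick that has the file."""
        for b in sorted(self.up):
            if name in self.data[b]:
                return self.data[b][name]
        raise FileNotFoundError(name)


rs = ReplicaSet(["node1:/b1", "node2:/b1"])
rs.write("a.txt", b"hello")
rs.up.discard("node1:/b1")      # one brick fails
print(rs.read("a.txt"))         # b'hello': still readable from node2
```

Writes made while node1 is down reach only node2; repairing that gap when node1 returns is the job of self-heal.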
  14. Access Mechanisms
      Gluster volumes can be accessed via the following mechanisms:
      ➢ FUSE-based native protocol
      ➢ NFS
      ➢ SMB
      ➢ libgfapi
      ➢ REST
      ➢ HDFS
  15. Access Mechanisms – How does FUSE work?
  16. Access Mechanisms
      [Diagram: client access paths to an RHS (Red Hat Storage) server. Components shown: Windows clients through smbd (CIFS), NFS v3 clients, HTTP clients through Swift, an application, qemu (KVM) and Hadoop on an RHS client, the Linux kernel VFS and FUSE, the glusterfs client process, and glusterfsd brick servers on this and other RHS servers.]
  17. FUSE-based native access
  18. NFS
  19. REST-based access
  20. libgfapi
      ➢ Exposes APIs for accessing Gluster volumes directly from applications.
      ➢ Reduces context switches, since I/O does not pass through FUSE and the kernel.
      ➢ QEMU is integrated with libgfapi.
      ➢ Samba integration with libgfapi is in progress.
      ➢ Both synchronous and asynchronous interfaces are available.
      ➢ Bindings for various languages are emerging.
  21. [Diagram: the access-mechanism components shown on slide 16 (smbd (CIFS), NFS v3 clients, Swift HTTP clients, VFS, FUSE and the glusterfs client process, glusterfsd brick servers, an application, qemu (KVM), Hadoop), now with libgfapi added.]
  22. Translators in GlusterFS
      ➢ Translators are the building blocks of a GlusterFS process.
      ➢ They are based on the translators in GNU Hurd.
      ➢ Each translator is a functional unit.
      ➢ Translators can be stacked together to achieve the desired functionality.
      ➢ Translators are deployment agnostic: they can be loaded in either the client or the server stack.
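A minimal sketch of the stacking idea, assuming a single `read` operation: each translator holds a reference to the one below it, adds its own behavior, and passes calls down. The classes are hypothetical stand-ins loosely named after GlusterFS's posix, io-cache and trace translators; real translators are C shared objects implementing the full set of file operations.

```python
class Posix:
    """Bottom of the stack: reads from a backing store (here, a dict)."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self.files = files
        self.reads = 0  # counts reads that reach the backing store

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.files[path]


class IOCache:
    """Caches read results and forwards misses to the translator below."""

    def __init__(self, child) -> None:
        self.child = child
        self.cache: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        if path not in self.cache:
            self.cache[path] = self.child.read(path)
        return self.cache[path]


class Trace:
    """Logs each call, then passes it down unchanged."""

    def __init__(self, child) -> None:
        self.child = child
        self.log: list[str] = []

    def read(self, path: str) -> bytes:
        self.log.append(f"read {path}")
        return self.child.read(path)


disk = Posix({"/a": b"data"})
stack = Trace(IOCache(disk))  # the stacking order is chosen when the stack is built
stack.read("/a")
stack.read("/a")
print(disk.reads, stack.log)  # 1 ['read /a', 'read /a']
```

Building the stack as `IOCache(Trace(disk))` instead would log only cache misses: the same units give different behavior depending on how they are stacked.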
  23. Customizable GlusterFS Client/Server Stack
      [Diagram: client side, VFS and the GlusterFS client with Read Ahead, I/O Cache, Distribute / Stripe and Replicate translators; interconnect, GigE or 10GigE (TCP/IP) or InfiniBand (RDMA); server side, Gluster servers each running Server and POSIX translators over Ext4, exporting bricks 1 to n.]
  24. Provisioning GlusterFS
      ➢ Process: design -> install -> verify -> monitor
      ➢ Design principles:
         ➢ Balance the hardware configuration. Is the hardware sufficient?
         ➢ Do not underfund the network; it is the cheapest of the three (CPU, storage and network).
         ➢ The number of bricks per volume is limited, so make bricks big.
         ➢ The file-size distribution determines the type of bottleneck.
         ➢ Non-native protocols increase network traffic.
         ➢ Replication adds extra network traffic for writes.
      ➢ Install principles:
         ➢ Provision the network for your use case.
         ➢ Configure storage to be future-ready.
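A back-of-the-envelope sketch of the last two design points. It assumes that with the native FUSE client the client itself sends each write to every replica, while with NFS the client sends one copy to a Gluster NFS server, which then forwards it to every replica. The function is hypothetical and ignores protocol overhead and any replica that happens to live on the NFS server itself.

```python
def write_network_bytes(payload: int, replica: int, protocol: str) -> int:
    """Estimate bytes crossing the network for one write of `payload` bytes."""
    if protocol == "native":
        # The FUSE client replicates: one copy to each brick in the replica set.
        return payload * replica
    if protocol == "nfs":
        # Client -> NFS server, then NFS server -> each replica.
        return payload + payload * replica
    raise ValueError(f"unknown protocol: {protocol}")


GiB = 2 ** 30
print(write_network_bytes(GiB, 2, "native") / GiB)  # 2.0
print(write_network_bytes(GiB, 2, "nfs") / GiB)     # 3.0
```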
  25. Recommended storage brick configuration (with XFS)
      ➢ 12 drives per RAID 6 LUN, 1 LUN per brick
      ➢ Hardware RAID stripe size 256 KB (default 64 KB)
      ➢ pvcreate --dataalignment 2560k
      ➢ mkfs.xfs -i size=512 -n size=8192 -d su=256k,sw=10 /dev/vg_bricks/lvN
      ➢ Mount options: inode64,noatime
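The alignment values on this slide follow from the RAID geometry: a 12-drive RAID 6 LUN has two parity drives, leaving 10 data drives, so a full stripe is 256 KB × 10 = 2560 KB. That is why `pvcreate` aligns to 2560k and `mkfs.xfs` uses `su=256k` (stripe unit) and `sw=10` (stripe width in data drives). A small sketch of the calculation; the helper name is hypothetical and it covers only RAID 6.

```python
def raid6_alignment(drives: int, stripe_unit_kib: int) -> dict[str, str]:
    """Derive LVM/XFS alignment options for a RAID 6 LUN (two parity drives)."""
    if drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    data_drives = drives - 2
    full_stripe_kib = stripe_unit_kib * data_drives
    return {
        "pvcreate": f"--dataalignment {full_stripe_kib}k",
        "mkfs.xfs": f"-d su={stripe_unit_kib}k,sw={data_drives}",
    }


print(raid6_alignment(12, 256))
# {'pvcreate': '--dataalignment 2560k', 'mkfs.xfs': '-d su=256k,sw=10'}
```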
  26. Deploying the network
      ➢ If only non-native protocols are used, separate Gluster and non-Gluster traffic onto separate VLANs; this:
         ➢ isolates self-heal and rebalance traffic
         ➢ separates replica traffic from user traffic
      ➢ Jumbo frames improve throughput, but require switch configuration.
      ➢ Bisection bandwidth matters: Gluster doesn't respect rack boundaries.
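Why jumbo frames help: every Ethernet frame carries fixed overhead, so larger frames devote a larger share of the wire to payload. A back-of-the-envelope sketch, assuming IPv4 and TCP headers without options (40 bytes) and 38 bytes of Ethernet framing per frame (preamble and start delimiter 8, header 14, FCS 4, inter-frame gap 12), with no VLAN tag.

```python
ETH_FRAMING = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, inter-frame gap (bytes)
TCP_IP_HEADERS = 20 + 20       # IPv4 + TCP headers without options (bytes)


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire bytes that carry TCP payload for full-size frames."""
    return (mtu - TCP_IP_HEADERS) / (mtu + ETH_FRAMING)


for mtu in (1500, 9000):
    print(mtu, f"{payload_efficiency(mtu):.1%}")
# 1500 94.9%
# 9000 99.1%
```

A further benefit is about six times fewer packets per byte transferred, which reduces per-packet processing on the hosts. Every host and switch port on the path must use the same MTU, hence the switch configuration.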
  27. Capturing performance problems onsite
      ➢ top: press H to show per-thread CPU utilization; this detects “hot-thread” problems, where a single thread uses up its core
      ➢ NFS: the nfsiostat and nfsstat utilities
      ➢ gluster volume profile: shows latency and throughput for Gluster RPC operations
      ➢ gluster volume top: shows which files and servers are hot
      ➢ Wireshark Gluster plug-in: isolates problems
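`top -H` derives per-thread CPU use from `/proc`. A sketch of the same hot-thread check done by hand, assuming Linux's `/proc/<pid>/task/<tid>/stat` format, where `utime` and `stime` are fields 14 and 15 in clock ticks. The thread name (field 2) is in parentheses and may contain spaces, so the parser splits after the last `)`. Function names, the sample lines and the 90% threshold are hypothetical; on a live system the lines would come from reading those files twice, `interval_s` seconds apart, with `ticks_per_s` taken from `os.sysconf("SC_CLK_TCK")`.

```python
def thread_cpu_ticks(stat_line: str) -> tuple[str, int]:
    """Return (thread name, utime + stime in clock ticks) from a /proc .../stat line."""
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    name = stat_line[open_paren + 1:close_paren]
    rest = stat_line[close_paren + 1:].split()   # fields 3 onward
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return name, utime + stime


def hot_threads(before: dict[int, str], after: dict[int, str],
                interval_s: float, ticks_per_s: int = 100,
                threshold: float = 0.9) -> list[tuple[int, str, float]]:
    """Flag threads whose CPU use over the interval is at least `threshold` of one core."""
    hot = []
    for tid, line in after.items():
        if tid not in before:
            continue  # thread appeared mid-interval; skip it
        name, t1 = thread_cpu_ticks(line)
        _, t0 = thread_cpu_ticks(before[tid])
        share = (t1 - t0) / (ticks_per_s * interval_s)
        if share >= threshold:
            hot.append((tid, name, share))
    return hot


# Synthetic samples (hypothetical thread names), 5 seconds apart.
before = {
    1235: "1235 (worker 1) S 1 1234 1234 0 -1 0 0 0 0 0 1000 50 0 0",
    1236: "1236 (worker 2) S 1 1234 1234 0 -1 0 0 0 0 0 100 0 0 0",
}
after = {
    1235: "1235 (worker 1) S 1 1234 1234 0 -1 0 0 0 0 0 1480 60 0 0",
    1236: "1236 (worker 2) S 1 1234 1234 0 -1 0 0 0 0 0 110 0 0 0",
}
for tid, name, share in hot_threads(before, after, interval_s=5.0):
    print(tid, name, f"{share:.0%}")
# 1235 worker 1 98%
```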
  28. Resources
      Mailing lists: gluster-users@gluster.org, gluster-devel@nongnu.org
      IRC: #gluster and #gluster-dev on freenode
      Links:
         http://www.gluster.org
         http://hekafs.org
         http://forge.gluster.org
         http://www.gluster.org/community/documentation/index.php/Arch
  29. Questions? Thank you!
