Lisa 2015-gluster fs-hands-on

  1. November 8–13, 2015 | Washington, D.C. www.usenix.org/lisa15 #lisa15
     GlusterFS: A Scale-out Software-Defined Storage
     Rajesh Joseph, Poornima Gurusiddaiah
  2. Note
     ● This applies to the 3.7 version of GlusterFS; other versions might have variations
     ● The commands shown here work on CentOS; other distributions might have different commands or options
     ● At the right corner of the slides there is a link ("Demo") to the live demo
  3. GlusterFS Installation
     Installation via Repo
       Download the latest repo file from download.gluster.org
       wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/epel-7/x86_64/
       Install GlusterFS
       yum install glusterfs-server
     Installation via RPM
       Download the latest Gluster RPMs from download.gluster.org
       http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/epel-7/x86_64/
  4. GlusterFS Packages
     GlusterFS Server Packages: glusterfs, glusterfs-server, glusterfs-api, glusterfs-cli, glusterfs-libs
     GlusterFS Client Packages: glusterfs, glusterfs-client-xlators, glusterfs-libs, glusterfs-fuse
     GlusterFS Feature Packages: glusterfs-extra-xlators, glusterfs-ganesha, glusterfs-geo-replication, glusterfs-rdma
     GlusterFS Devel Packages: glusterfs-debuginfo, glusterfs-devel, glusterfs-api-devel
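     As a quick sketch of pulling in these package sets on CentOS with the Gluster repo enabled (yum resolves the library dependencies such as glusterfs and glusterfs-libs automatically):
     # yum install glusterfs-server glusterfs-cli    (on storage nodes)
     # yum install glusterfs glusterfs-fuse          (on client machines)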
  5. Ports used by GlusterFS
     UDP Ports
       111 – RPC
       963 – NFS lock manager (NLM)
     TCP Ports
       22 – sshd, used by geo-replication
       111 – RPC
       139 – NetBIOS service
       445 – CIFS protocol
       965 – NLM
  6. Ports used by GlusterFS
     TCP Ports
       2049 – NFS exports
       4379 – CTDB
       24007 – GlusterFS Daemon (Management)
       24008 – GlusterFS Daemon (RDMA port for Management)
       24009 onwards – one port per brick of every volume on the node (GlusterFS version < 3.4)
       49152 onwards – one port per brick of every volume on the node (GlusterFS version >= 3.4)
       38465-38467 – GlusterFS NFS service
       38468 – NFS Lock Manager (NLM)
       38469 – NFS ACL support
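     A minimal firewalld sketch for opening the core ports on CentOS 7; the brick range 49152-49160 is an assumption sized for up to 9 bricks per node, and the exact set of ports depends on which access protocols you use:
     # firewall-cmd --permanent --add-port=24007-24008/tcp
     # firewall-cmd --permanent --add-port=49152-49160/tcp
     # firewall-cmd --permanent --add-port=38465-38469/tcp --add-port=2049/tcp --add-port=111/tcp --add-port=111/udp
     # firewall-cmd --reload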
  7. Starting Gluster Server
     The Gluster server/service can be started with the following command
     # systemctl start glusterd
     The Gluster server should be started on all the nodes
     To automatically start GlusterFS on node startup use
     # systemctl enable glusterd
     or the chkconfig command
     # chkconfig glusterd on
  8. Setting up Trusted Storage Pool
     Use the gluster peer probe command to add a new node to the Trusted Storage Pool
     # gluster peer probe <Node IP/Hostname of new Node>
     Removing a node from the Trusted Storage Pool
     # gluster peer detach <Node IP/Hostname>
     Verify that the peer probe/detach succeeded by executing the following command on all the nodes
     # gluster peer status
  9. Creating Bricks
     Create a thinly provisioned volume (dm-thin)
     Create a Physical Volume (PV)
     # pvcreate /dev/sdb
     Create a Volume Group (VG) from the PV
     # vgcreate vgname1 /dev/sdb
     Create a Thin Pool
     # lvcreate -L 2T --poolmetadatasize 16G -T vgname1/thinpoolname1
     Create a thinly provisioned Logical Volume (LV)
     # lvcreate -V 1T -T vgname1/thinpoolname1 -n lvname1
  10. Creating Bricks
      Create a filesystem on the LV
      # mkfs.xfs -i size=512 /dev/mapper/vgname1-lvname1
      Mount it
      # mount /dev/mapper/vgname1-lvname1 /mnt/brick1
      And use it
      # mkdir /mnt/brick1/data
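      To make the brick mount persistent across reboots, an /etc/fstab entry along these lines can be added (assuming the mount point /mnt/brick1 has been created beforehand, e.g. with mkdir -p /mnt/brick1):
      /dev/mapper/vgname1-lvname1 /mnt/brick1 xfs defaults 0 0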
  11. Distribute Volume
      [Diagram: a client and three storage nodes, one brick each, with hash ranges [0, a], [a + 1, b] and [b + 1, c].
      File1 hashes to x, where 0 <= x <= a, so it lands on the first brick; File2 hashes to y, where b < y <= c, so it lands on the third brick.]
  12. Creating Volumes - Distribute
      Distributed volumes distribute files throughout the bricks in the volume
      It is advised to provide a nested directory under the brick mount point as the brick directory
      If the transport type is not specified, 'tcp' is used as default
      # gluster volume create <volume name> [transport <tcp|rdma|tcp,rdma>] <Node IP/hostname>:<brick path>... [force]
      e.g.
      # gluster volume create dist_vol host1:/mnt/brick1/data host2:/mnt/brick1/data
      Demo
  13. Replicate Volume
      [Diagram: a client writes File1, which is stored on the bricks of two storage nodes.]
  14. Creating Volumes - Replicate
      Replicated volumes provide file replication across n (replica) bricks
      The number of bricks must be a multiple of the replica count
      It is advised to place the bricks on different servers
      The replication is synchronous in nature, hence it is not advised to place bricks of a replica set in different geo locations as it may reduce performance drastically
      # gluster volume create <volume name> [replica <COUNT>] [transport <tcp|rdma|tcp,rdma>] <Node IP/hostname>:<brick path>... [force]
      e.g.
      # gluster volume create repl_vol replica 3 host1:/mnt/brick1/data host2:/mnt/brick1/data host3:/mnt/brick1/data
  15. Distribute Replicate Volume
      [Diagram: a client and four storage nodes whose bricks form two replica pairs; File1 is stored on both bricks of one pair, File2 on both bricks of the other.]
  16. Creating Volumes – Distribute Replicate
      Distributed replicated volumes distribute files across replicated sets of bricks in the volume
      The number of bricks must be a multiple of the replica count
      The brick order decides the replica sets and distribution sets
      # gluster volume create <volume name> [replica <COUNT>] [transport <tcp|rdma|tcp,rdma>] <Node IP/hostname>:<brick path>... [force]
      e.g.
      # gluster volume create repl_vol replica 3 host1:/mnt/brick1/data host2:/mnt/brick1/data host3:/mnt/brick1/data host1:/mnt/brick2/data host2:/mnt/brick2/data host3:/mnt/brick2/data
  17. Disperse Volume
      [Diagram: a client writes File1, which is split into data parts and a parity/redundancy part spread across the bricks of three storage nodes.]
  18. Creating Volumes – Disperse
      Dispersed volumes are based on erasure codes, providing space-efficient protection against disk or server failures
      The data protection offered by erasure coding can be represented as n = k + m
        n = total number of bricks, the disperse count
        k = total number of data bricks, the disperse-data count
        m = number of brick failures that can be tolerated, the redundancy count
      Any two of these counts need to be specified while creating the volume
      # gluster volume create <volume name> [disperse COUNT] [disperse-data COUNT] [redundancy COUNT] [transport tcp|rdma|tcp,rdma] <Node IP/hostname>:<brick path>... [force]
      E.g. 6 = 4 + 2, i.e. a 10MB file is encoded into 6 chunks of 2.5MB each (4 data + 2 redundancy) and stored across all 6 bricks (= 15MB on disk), but the volume can withstand the failure of any 2 bricks
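      As a concrete sketch of the 6 = 4 + 2 layout above (host1..host6 and the brick path are placeholder names, not from the slides):
      # gluster volume create disp_vol disperse 6 redundancy 2 host1:/mnt/brick1/data host2:/mnt/brick1/data host3:/mnt/brick1/data host4:/mnt/brick1/data host5:/mnt/brick1/data host6:/mnt/brick1/data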
  19. Sharded Volume
      [Diagram: a client writes File1, which is stored as shards GFID1.1, GFID1.2 and GFID1.3 on the bricks of three storage nodes.]
  20. Creating Volumes – Sharded
      A sharded volume is similar to a striped volume
      Unlike other volume types, sharding is a volume option which can be set on any volume
      # gluster volume set <volume name> features.shard on
      To disable sharding, it is advisable to create a new volume without sharding and copy the contents of this volume into the new one
      This feature is disabled by default and is beta in the 3.7.4 release
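      The shard size can also be tuned; a hedged example (the 64MB value is only an illustration, check the 3.7 documentation for the default and supported range):
      # gluster volume set <volume name> features.shard-block-size 64MB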
  21. Starting Volumes
      Volumes must be started before they can be mounted
      Use the following command to start a volume
      # gluster volume start <volname>
      e.g.
      # gluster volume start dist_vol
  22. Configuring Volume Options
      To view the current volume options
      # gluster volume info
      Volume options can be configured using the following command
      # gluster volume set <volname> <option> <value>
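      A hedged usage example (the volume name and the performance.cache-size value are illustrative; pick options to match your workload):
      # gluster volume set dist_vol performance.cache-size 256MB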
  23. Expanding Volume
      A volume can be expanded while the cluster is online and available
      Add the node to the Trusted Storage Pool
      # gluster peer probe <IP/hostname>
      Add bricks to the volume
      # gluster volume add-brick <VOLNAME> <Node IP/hostname>:<brick path>...
      In case of a replicated volume, the number of bricks added should be a multiple of the replica count
  24. Expanding Volume
      To change the replica count, the following command needs to be executed
      # gluster volume add-brick <VOLNAME> replica <new count> <Node IP/hostname>:<brick path>...
      The number of replica bricks added must be equal to the number of distribute sub-volumes
      e.g. to change replica 2 distribute 3 to replica 3 distribute 3 for the volume dist-repl
      # gluster volume add-brick dist-repl replica 3 host1:/brick1/brick1 host2:/brick1/brick1 host3:/brick1/brick1
      Rebalance the bricks
      # gluster volume rebalance <volname> <start | status | stop>
  25. Shrinking Volume
      Start removing a brick using the following command
      # gluster volume remove-brick <volname> BRICK start [force]
      You can view the status of the remove-brick operation using
      # gluster volume remove-brick <volname> BRICK status
      After the status shows complete, run the following command to remove the brick
      # gluster volume remove-brick <volname> BRICK commit
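      A worked sketch for removing one brick from the earlier dist_vol example (host2 and the brick path are just the names used on previous slides):
      # gluster volume remove-brick dist_vol host2:/mnt/brick1/data start
      # gluster volume remove-brick dist_vol host2:/mnt/brick1/data status
      # gluster volume remove-brick dist_vol host2:/mnt/brick1/data commit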
  26. Volume Self Healing
      In a replicated volume, when an offline brick comes back online, the updates on the online bricks need to be synced to it – Self Healing
      Files are healed by the Self-Heal Daemon (SHD)
        On-access
        On-demand
      SHD automatically initiates a heal every 10 minutes
      SHD can be turned on/off with the following command
      # gluster volume set <volname> cluster.self-heal-daemon <on | off>
  27. Volume Self Healing
      On-demand healing can be done by
      # gluster volume heal <volname>
      # gluster volume heal <volname> full
      # gluster volume heal <volname> info
      To enable/disable healing when a file is accessed from the mount point
      # gluster volume set <volname> cluster.data-self-heal off
      # gluster volume set <volname> cluster.entry-self-heal off
      # gluster volume set <volname> cluster.metadata-self-heal off
  29. Accessing Data
      A volume can be mounted on a local file-system
      The following protocols are supported for accessing a volume
        GlusterFS Native client – Filesystem in Userspace (FUSE)
        NFS – NFS-Ganesha, Gluster NFSv3
        SMB / CIFS
  30. GlusterFS Native Client
      Client machines should have the GlusterFS client packages installed
      Mount the started GlusterFS volume; any node from the Trusted Storage Pool can be used as the mount server
      # mount -t glusterfs host1:/dist-vol /mnt/glusterfs
      Use /etc/fstab for automatic mounting, e.g. to mount dist-vol append the following to /etc/fstab
      host1:/dist-vol /mnt/glusterfs glusterfs defaults,_netdev,transport=tcp 0 0
      Demo
  31. NFS Client
      Install the NFS client packages
      Mount the started GlusterFS volume via NFS; Gluster NFS supports only version 3
      # mount -t nfs -o vers=3 host1:/dist-vol /mnt/glusterfs
      Use /etc/fstab for automatic mounting
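      A hedged /etc/fstab entry for the NFS mount above (mount options beyond vers=3 and _netdev are a matter of taste):
      host1:/dist-vol /mnt/glusterfs nfs defaults,_netdev,vers=3 0 0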
  32. SMB Client
      For high availability and lock synchronization SMB uses CTDB
      Install the CTDB and GlusterFS Samba packages
      The GlusterFS Samba packages can be downloaded from http://download.gluster.org/pub/gluster/glusterfs/samba/
      Demo
  33. CTDB Setup
      Create an n-way replicated CTDB volume, where n is the number of nodes that will be used as Samba servers
      # gluster volume create ctdb replica 4 host1:/mnt/brick1/ctdb host2:/mnt/brick1/ctdb host3:/mnt/brick1/ctdb host4:/mnt/brick1/ctdb
      Replace META=all with META=ctdb in the below files on all the nodes
      /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
      /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
      Start the ctdb volume
      # gluster volume start ctdb
      Demo
  34. CTDB Setup
      On volume start the following entries are created in /etc/samba/smb.conf
        clustering = yes
        idmap backend = tdb2
      The CTDB configuration file /etc/sysconfig/ctdb is stored on all the nodes used as Samba servers
      Create an /etc/ctdb/nodes file on all the nodes used as Samba servers, listing their IPs, e.g.
        192.168.8.100
        192.168.8.101
        192.168.8.102
        192.168.8.103
      Demo
  35. CTDB Setup
      For IP failover create an /etc/ctdb/public_addresses file on all the nodes
      Add the virtual IPs that CTDB should create to this file, in the form
        <Virtual IP>/<routing prefix> <node interface>
      e.g.
        192.168.1.20/24 eth0
        192.168.1.21/24 eth0
      Demo
  36. Sharing Volumes over Samba
      Set the following options on the gluster volume
      # gluster volume set <volname> stat-prefetch off
      # gluster volume set <volname> server.allow-insecure on
      Edit /etc/glusterfs/glusterd.vol on each node and add the following, then restart the glusterd service on each node
        option rpc-auth-allow-insecure on
      Set the following option on the gluster volume
      # gluster volume set <volname> storage.batch-fsync-delay-usec 0
      Demo
  37. Sharing Volumes over Samba
      On GlusterFS volume start the following entry will be added to /etc/samba/smb.conf
        [gluster-VOLNAME]
        comment = For samba share of volume VOLNAME
        vfs objects = glusterfs
        glusterfs:volume = VOLNAME
        glusterfs:logfile = /var/log/samba/VOLNAME.log
        glusterfs:loglevel = 7
        path = /
        read only = no
        guest ok = yes
      Start SMBD
      # systemctl start smb
      Set the SMB password; this password is used during the SMB mount
      # smbpasswd -a username
      Demo
  38. Mounting Volumes using SMB
      Mount from a Windows system
      # net use <drive letter> \\<virtual IP>\gluster-VOLNAME
      e.g.
      # net use Z: \\192.168.1.20\gluster-dist-vol
      Mount from a Linux system
      # mount -t cifs //<virtual IP>/gluster-VOLNAME /mnt/cifs
      e.g.
      # mount -t cifs //192.168.1.20/gluster-dist-vol /mnt/cifs
      Demo
  39. Troubleshooting
      Log files
      The following command will give you the log file location
      # gluster --print-logdir
      The log dir contains logs for each GlusterFS process
        glusterd - /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
        Bricks - /var/log/glusterfs/bricks/<path extraction of brick path>.log
        Cli - /var/log/glusterfs/cmd_history.log
        Rebalance - /var/log/glusterfs/VOLNAME-rebalance.log
        Self-Heal Daemon (SHD) - /var/log/glusterfs/glustershd.log
        Quota - /var/log/glusterfs/quotad.log
  40. Troubleshooting
      Log files
      The log dir contains logs for each GlusterFS process
        NFS - /var/log/glusterfs/nfs.log
        Samba - /var/log/samba/glusterfs-VOLNAME-<ClientIP>.log
        NFS-Ganesha - /var/log/nfs-ganesha.log
        Fuse Mount - /var/log/glusterfs/<mountpoint path extraction>.log
        Geo-replication - /var/log/glusterfs/geo-replication/<master>
      Volume status
      # gluster volume status [volname]
  41. Troubleshooting
      Connectivity issues
        Check network connectivity
        Check that all necessary GlusterFS processes are running
        Check firewall rules
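      A hedged set of commands for those three checks (host2 and dist_vol are placeholder names from the earlier examples):
      # ping -c 3 host2                    (network connectivity to a peer)
      # gluster peer status                (peers should show "Peer in Cluster (Connected)")
      # gluster volume status dist_vol     (brick and self-heal daemon processes should be online)
      # firewall-cmd --list-all            (confirm the GlusterFS ports listed earlier are open)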
  42. Troubleshooting – Split Brain
      A scenario where, in a replicated volume, GlusterFS is not in a position to determine the correct copy of a file
      Three different types of split-brain
        Data split-brain
        Metadata split-brain
        Entry split-brain
      The only way to resolve split-brains is by manually inspecting the file contents on the backend and deciding which is the true copy
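      To list the affected files before inspecting them, the heal command can be used; later 3.7 releases also add CLI-based resolution policies (e.g. bigger-file, source-brick), which are worth checking in the release notes:
      # gluster volume heal <volname> info split-brain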
  43. Troubleshooting – Preventing Split Brain
      Configuring Server-Side Quorum
      The number of server failures that the trusted storage pool can sustain
      The server quorum ratio can be set by a volume option
      # gluster volume set all cluster.server-quorum-ratio <Percentage>
      e.g.
      # gluster volume set all cluster.server-quorum-ratio 51%
      All bricks on a node are brought down in case the server-side quorum is not met
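      Enforcement is also controlled per volume; as a hedged companion setting (check the 3.7 documentation for the exact semantics):
      # gluster volume set <volname> cluster.server-quorum-type server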
  44. Client-side Quorum
      [Diagram: a distributed replicated volume serving File1, File2 and File3; the replica pair that has lost client-side quorum becomes read-only while the other replica pairs keep serving data.]
  45. Troubleshooting – Preventing Split Brain
      Configuring Client-Side Quorum
      Determines the number of bricks that must be up to allow data modification
      Files become read-only in case of quorum failure
      Two types of client-side quorum
        Fixed – a fixed number of bricks should be up
        Auto – quorum conditions are determined by GlusterFS
      # gluster volume set <volname> cluster.quorum-type <fixed | auto>
      # gluster volume set <volname> cluster.quorum-count <count>
  46. Troubleshooting – Preventing Split Brain
      Configuring Client-Side Quorum
      Auto quorum type
        At least n/2 bricks need to be up, where n is the replica count
        If n is even and exactly n/2 bricks are up, then the first brick of the replica set should be up
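      A hedged example of pinning a fixed quorum on the earlier repl_vol (replica 3) so that writes require at least two live bricks:
      # gluster volume set repl_vol cluster.quorum-type fixed
      # gluster volume set repl_vol cluster.quorum-count 2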
  47. Community
      IRC channels:
        #gluster – for any gluster usage or related discussions
        #gluster-dev – for any gluster development related discussions
        #gluster-meeting – to attend the weekly meeting and bug triage
      Mailing lists:
        gluster-users@gluster.org – for any user queries or related discussions
        gluster-devel@gluster.org – for any gluster development related queries/discussions
  48. References
      www.gluster.org
      https://gluster.readthedocs.org/en/latest/
      https://github.com/gluster/gluster-tutorial
      http://images.gluster.org/vmimages
  49. Thanks

Editor's Notes

  • Repos for other major distributions will be available at http://download.gluster.org/pub/gluster/glusterfs/LATEST
  • Configure the firewall based on which GlusterFS features are being used
  • Most of these commands have various options that can improve performance; select them carefully based on your workloads.
  • When shrinking distributed replicated volumes, the number of bricks being removed must be a
    multiple of the replica count. For example, to shrink a distributed replicated volume with a
    replica count of 2, you need to remove bricks in multiples of 2 (such as 4, 6, 8, etc.). In
    addition, the bricks you are removing must be from the same sub-volume (the same replica
    set). In a non-replicated volume, all bricks must be available in order to migrate data and
    perform the remove brick operation. In a replicated volume, at least one of the bricks in the
    replica must be available
  • #gluster volume heal <VOLNAME> #trigger self-healing only on the files which require healing
    #gluster volume heal <VOLNAME> full #trigger self-healing on all the files on a volume
    #gluster volume heal <VOLNAME> info #view the list of files that need healing
  • Rpc-auth-allow-insecure allows SMBD to talk to gluster bricks on unprivileged ports
  • cluster.quorum-count → if quorum-type is "fixed", only allow writes if this many bricks are present
