
GlusterFS CTDB Integration




  1. GlusterFS / CTDB Integration
     v1.0 2013.05.14
     Etsuji Nakai, Senior Solution Architect, Red Hat K.K.
     Red Hat K.K. All rights reserved.
  2. $ who am i
     Etsuji Nakai (@enakai00)
     ● Senior solution architect and cloud evangelist at Red Hat K.K.
     ● The author of the "Professional Linux Systems" series: "Technology for Next Decade", "Deployment and Management", and "Network Management".
     ● Available in Japanese. Translation offers from publishers are welcome ;-)
  3. Contents
     ● CTDB Overview
     ● Why does CTDB matter?
     ● CTDB split-brain resolution
     ● Configuration steps for demo set-up
     ● Summary
  4. Disclaimer
     This document explains how to set up a clustered Samba server using GlusterFS and CTDB with the following software components.
     ● Base OS, Samba, CTDB: RHEL6.4 (or any of your favorite clones)
     ● GlusterFS: GlusterFS 3.3.1 (Community version)
     ● Since this is based on the community version of GlusterFS, you cannot receive commercial support from Red Hat for this configuration. If you need commercial support, please consider using Red Hat Storage Server (RHS). In addition, there are different conditions for a supportable configuration with RHS. Please consult sales representatives from Red Hat for details.
     Red Hat accepts no liability for the content of this document, or for the consequences of any actions taken on the basis of the information provided. Any views or opinions presented in this document are solely those of the author and do not necessarily represent those of Red Hat.
  5. CTDB Overview
  6. What's CTDB?
     TDB = Trivial Database
     ● Simple backend DB for Samba, used to store user info, file lock info, etc.
     CTDB = Clustered TDB
     ● Cluster extension of TDB, necessary for a configuration in which multiple Samba hosts provide the same filesystem contents.
     ● All clients see the same contents of the shared filesystem through the different Samba hosts.
  7. What's wrong without CTDB?
     Windows file locks are not shared among Samba hosts.
     ● Normally you would see a "Locked!" alert when someone else is opening the same file.
     ● Without CTDB, if others are opening the same file through a different Samba host from yours, you never see that alert.
     ● This is because file lock info is stored in the local TDB if you don't use CTDB.
     ● CTDB was initially developed as a shared TDB for multiple Samba hosts, to overcome the problem that file locks are not shared.
  8. Yet another benefit of CTDB
     Floating IPs can be assigned across hosts for transparent failover.
     ● When one of the hosts fails, its floating IP is moved to another host.
     ● Mutual health checking is done through the CTDB interconnect (so-called "heartbeat") network.
     ● CTDB can also be used for an NFS server cluster to provide the floating IP feature. (CTDB doesn't provide shared file locking for NFS, though.)
  9. Why does CTDB matter?
  10. Access path of GlusterFS native client
      The native client directly communicates with all storage nodes.
      ● Transparent failover is implemented on the client side. When the client detects a node failure, it accesses the replicated node.
      ● Floating IP is unnecessary by design for the native client.
      ● The native client sees the GlusterFS volume as a single filesystem; the real locations of files (file01, file02, file03, ...) are calculated on the client side.
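The idea of client-side placement can be sketched with a toy example. This is NOT GlusterFS's real DHT algorithm; the node names and the `locate_file` helper are made up for illustration. The point is only that a deterministic hash of the file name lets every client agree on a file's location without asking a lookup server:

```shell
#!/bin/sh
# Toy illustration of client-side file placement (not the actual
# GlusterFS elastic-hash algorithm): hash the file name and map it
# onto one of the storage nodes.

NODES="gluster01 gluster02 gluster03 gluster04"
NODE_COUNT=4

locate_file() {
    # cksum yields a deterministic CRC of the file name
    sum=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    idx=$(( sum % NODE_COUNT + 1 ))
    echo "$NODES" | cut -d' ' -f"$idx"
}

for f in file01 file02 file03; do
    echo "$f -> $(locate_file "$f")"
done
```

Because every client computes the same hash, all clients resolve the same file to the same node, which is why the native client needs no central metadata server.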
  11. CIFS/NFS use case for GlusterFS
      The downside of the native client is that it's not available for Unix/Windows.
      ● You need to rely on CIFS/NFS for Unix/Windows clients.
      ● In that case, Windows file lock sharing and the floating IP feature are not in GlusterFS. They should be provided by an external tool. CTDB is that tool ;-)
      ● A CIFS/NFS client connects to just one specified node; the GlusterFS storage node acts as a proxy "client". Different clients can connect to different nodes. DNS round-robin may work for that.
  12. Network topology overview without CTDB
      If you don't need floating IPs/Windows file locks, you can go without CTDB: Samba and glusterd simply run side by side on each storage node.
      ● NFS file lock sharing (NLM) is provided by GlusterFS's internal NFS server.
      ● Although it's not mandatory, you can separate the CIFS/NFS access segment from the GlusterFS interconnect for the sake of network performance.
  13. Network topology overview with CTDB
      If you use CTDB with GlusterFS, you need to add an independent CTDB interconnect (heartbeat) segment, separate from the GlusterFS interconnect and the CIFS/NFS access segment, for a reliable cluster.
      ● The reason will be explained later.
  14. Demo - Seeing is believing!
  15. CTDB split-brain resolution
  16. What's CTDB split-brain?
      When the heartbeat is cut off for any reason (possibly a network problem) while cluster nodes are still running, there must be some mechanism to choose which "island" should survive and keep running.
      ● Without this mechanism, the same floating IPs are assigned on both islands. This is not specific to CTDB; every cluster system in the world needs to take care of "split-brain".
      In the case of CTDB, a master node is elected through the "lock file" on the shared filesystem: the master takes an exclusive lock on the lock file, and the island with the master node survives. In the case of GlusterFS in particular, the lock file is stored on a dedicated GlusterFS volume, called the "lock volume".
      ● The lock volume is locally mounted on each storage node. If you share the CTDB interconnect with the GlusterFS interconnect, access to the lock volume is not guaranteed when the heartbeat is cut off, resulting in an unpredictable condition.
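The election mechanism boils down to taking an exclusive lock on a shared file. A minimal local sketch with flock(1), assuming a temporary file stands in for the real lockfile on the lock volume (the `contend` helper and node names are illustrative only):

```shell
#!/bin/sh
# Two contenders race for an exclusive, non-blocking lock on the same
# file; only one wins, just as only one CTDB node becomes the recovery
# master via the lockfile on the lock volume.

LOCKFILE=$(mktemp)   # stands in for /gluster/lock/lockfile

contend() {
    # fd 9 is opened on the lock file; flock -n fails immediately
    # if another process already holds the exclusive lock.
    (
        if flock -n 9; then
            echo "$1: became master"
            sleep 1    # hold the lock while the other contender tries
        else
            echo "$1: lost election"
        fi
    ) 9>>"$LOCKFILE"
}

contend nodeA &      # nodeA grabs the lock first...
sleep 0.3
contend nodeB        # ...so nodeB loses the election
wait
rm -f "$LOCKFILE"
```

This also shows why the lock volume must stay reachable during a heartbeat outage: if the surviving islands cannot reach the lock file at all, no deterministic winner can be chosen.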
  17. Typical volume config seen from a storage node

      # df
      Filesystem                        1K-blocks      Used Available Use% Mounted on
      /dev/vda3                           2591328   1036844   1422852  43% /
      tmpfs                                510288         0    510288   0% /dev/shm
      /dev/vda1                            495844     33450    436794   8% /boot
      /dev/mapper/vg_bricks-lv_lock         60736      3556     57180   6% /bricks/lock
      /dev/mapper/vg_bricks-lv_brick01    1038336     33040   1005296   4% /bricks/brick01
      localhost:/lockvol                   121472      7168    114304   6% /gluster/lock
      localhost:/vol01                    2076672     66176   2010496   4% /gluster/vol01

      # ls -l /gluster/lock/
      total 2
      -rw-r--r--. 1 root root 294 Apr 26 15:43 ctdb
      -rw-------. 1 root root   0 Apr 26 15:57 lockfile
      -rw-r--r--. 1 root root  52 Apr 26 15:56 nodes
      -rw-r--r--. 1 root root  96 Apr 26 15:04 public_addresses
      -rw-r--r--. 1 root root 218 Apr 26 16:31 smb.conf

      ● /gluster/lock is the locally mounted lock volume; /gluster/vol01 is the locally mounted data volume, exported with Samba.
      ● "lockfile" is the lock file used to elect the master.
      ● Common config files can be placed on the lock volume.
  18. What about sharing the CTDB interconnect with the access segment?
      No, it doesn't work.
      When the NIC for the access segment fails, the cluster detects the heartbeat failure and elects a master node through the lock file on the shared volume. However, if the node with the failed NIC holds the lock, it becomes the master although it cannot serve clients.
      ● In reality, CTDB event monitoring detects the NIC failure and the node goes into "UNHEALTHY" status, too.
  19. CTDB event monitoring
      CTDB provides a custom event monitoring mechanism which can be used to monitor application status, NIC status, etc.
      ● Monitoring scripts are stored in /etc/ctdb/events.d/
      ● They need to implement handlers for pre-defined events.
      ● They are called in the order of file name when some event occurs.
      ● In particular, the "monitor" event is issued every 15 seconds. If the "monitor" handler of some script exits with a non-zero return code, the node becomes "UNHEALTHY" and will be excluded from the cluster.
      ● For example, "10.interface" checks the link status of the NIC on which the floating IP is assigned.
      ● See README for details.

      # ls /etc/ctdb/events.d/
      00.ctdb       11.natgw           20.multipathd  41.httpd  61.nfstickle
      01.reclock    11.routing         31.clamd       50.samba  70.iscsi
      10.interface  13.per_ip_routing  40.vsftpd      60.nfs    91.lvs
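A custom events.d script might look like the following sketch. The file name (e.g. 42.gluster) and the specific check are made up for illustration; the structure follows the convention above: CTDB invokes each script with the event name as $1, and a non-zero exit from the "monitor" handler marks the node UNHEALTHY. MOUNTS_FILE is parameterized only so the check can be exercised locally; on a real node it would simply be /proc/mounts.

```shell
#!/bin/sh
# Hypothetical /etc/ctdb/events.d/42.gluster -- a sketch, not a script
# shipped with CTDB.

handle_event() {
    case "$1" in
        monitor)
            # Report UNHEALTHY if the locally mounted lock volume
            # has disappeared from the mount table.
            grep -q " /gluster/lock " "${MOUNTS_FILE:-/proc/mounts}" || return 1
            ;;
        # Other events (startup, shutdown, takeip, releaseip, ...)
        # are simply ignored by this script.
    esac
    return 0
}

if handle_event "${1:-monitor}"; then
    echo "monitor: healthy"
else
    echo "monitor: UNHEALTHY"
fi
```

Dropping such a script into /etc/ctdb/events.d/ (and making it executable) would add the check to the 15-second monitor cycle alongside the stock scripts listed above.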
  20. Configuration steps for demo set-up
  21. Step 1 - Install RHEL6.4
      Install RHEL6.4 on storage nodes.
      ● The Scalable File System Add-On is required for XFS.
      ● The Resilient Storage Add-On is required for the CTDB packages.
      Configure public key ssh authentication between nodes.
      ● This is for administrative purposes.
      Configure network interfaces as in the configuration pages.

      /etc/hosts:
      # NFS/CIFS access segment
      192.168.122.11  gluster01
      192.168.122.12  gluster02
      192.168.122.13  gluster03
      192.168.122.14  gluster04
      # CTDB interconnect
      192.168.2.11    gluster01c
      192.168.2.12    gluster02c
      192.168.2.13    gluster03c
      192.168.2.14    gluster04c
      # GlusterFS interconnect
      192.168.1.11    gluster01g
      192.168.1.12    gluster02g
      192.168.1.13    gluster03g
      192.168.1.14    gluster04g
  22. Step 1 - Install RHEL6.4 (cont.)
      Configure iptables on all nodes.

      # vi /etc/sysconfig/iptables
      # service iptables restart

      /etc/sysconfig/iptables:
      *filter
      :INPUT ACCEPT [0:0]
      :FORWARD ACCEPT [0:0]
      :OUTPUT ACCEPT [0:0]
      -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
      -A INPUT -p icmp -j ACCEPT
      -A INPUT -i lo -j ACCEPT
      -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
      -A INPUT -m state --state NEW -m tcp -p tcp --dport 111 -j ACCEPT           # portmap
      -A INPUT -m state --state NEW -m tcp -p tcp --dport 139 -j ACCEPT           # CIFS
      -A INPUT -m state --state NEW -m tcp -p tcp --dport 445 -j ACCEPT           # CIFS
      -A INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24050 -j ACCEPT   # bricks
      -A INPUT -m state --state NEW -m tcp -p tcp --dport 38465:38468 -j ACCEPT   # NFS/NLM
      -A INPUT -m state --state NEW -m tcp -p tcp --dport 4379 -j ACCEPT          # CTDB
      -A INPUT -j REJECT --reject-with icmp-host-prohibited
      -A FORWARD -j REJECT --reject-with icmp-host-prohibited
      COMMIT
  23. Step 2 - Prepare bricks
      Create and mount brick directories on all nodes. /dev/vdb holds two logical volumes: lv_lock (mounted on /bricks/lock, used for the lock volume) and lv_brick01 (mounted on /bricks/brick01, used for the data volume).

      # pvcreate /dev/vdb
      # vgcreate vg_bricks /dev/vdb
      # lvcreate -n lv_lock -L 64M vg_bricks
      # lvcreate -n lv_brick01 -L 1G vg_bricks
      # yum install -y xfsprogs
      # mkfs.xfs -i size=512 /dev/vg_bricks/lv_lock
      # mkfs.xfs -i size=512 /dev/vg_bricks/lv_brick01
      # echo "/dev/vg_bricks/lv_lock /bricks/lock xfs defaults 0 0" >> /etc/fstab
      # echo "/dev/vg_bricks/lv_brick01 /bricks/brick01 xfs defaults 0 0" >> /etc/fstab
      # mkdir -p /bricks/lock
      # mkdir -p /bricks/brick01
      # mount /bricks/lock
      # mount /bricks/brick01
  24. Step 3 - Install GlusterFS and create volumes
      Install GlusterFS packages on all nodes. Do not auto-start glusterd with chkconfig.

      # wget -O /etc/yum.repos.d/glusterfs-epel.repo <URL of glusterfs-epel.repo>
      # yum install -y rpcbind glusterfs-server
      # chkconfig rpcbind on
      # service rpcbind start
      # service glusterd start

      Configure the cluster and create volumes from gluster01. You need to specify the GlusterFS interconnect NICs (the gluster0Ng names).

      # gluster peer probe gluster02g
      # gluster peer probe gluster03g
      # gluster peer probe gluster04g
      # gluster vol create lockvol replica 2 \
          gluster01g:/bricks/lock gluster02g:/bricks/lock \
          gluster03g:/bricks/lock gluster04g:/bricks/lock
      # gluster vol start lockvol
      # gluster vol create vol01 replica 2 \
          gluster01g:/bricks/brick01 gluster02g:/bricks/brick01 \
          gluster03g:/bricks/brick01 gluster04g:/bricks/brick01
      # gluster vol start vol01
  25. Step 4 - Install and configure Samba/CTDB
      Install Samba/CTDB packages on all nodes. Do not auto-start smb and ctdb with chkconfig.

      # yum install -y samba samba-client ctdb

      If you use NFS, install the following packages, too.

      # yum install -y rpcbind nfs-utils
      # chkconfig rpcbind on
      # service rpcbind start

      Configure CTDB and Samba only on gluster01: mount the lock volume and create the following config files on the shared volume.

      # mkdir -p /gluster/lock
      # mount -t glusterfs localhost:/lockvol /gluster/lock

      /gluster/lock/ctdb:
      CTDB_PUBLIC_ADDRESSES=/gluster/lock/public_addresses
      CTDB_NODES=/etc/ctdb/nodes
      # Only when using Samba. Unnecessary for NFS.
      CTDB_MANAGES_SAMBA=yes
      # some tunables
      CTDB_SET_DeterministicIPs=1
      CTDB_SET_RecoveryBanPeriod=120
      CTDB_SET_KeepaliveInterval=5
      CTDB_SET_KeepaliveLimit=5
      CTDB_SET_MonitorInterval=15
  26. Step 4 - Install and configure Samba/CTDB (cont.)

      /gluster/lock/nodes (CTDB cluster nodes; specify the CTDB interconnect IPs):
      192.168.2.11
      192.168.2.12
      192.168.2.13
      192.168.2.14

      /gluster/lock/public_addresses (floating IP list):
      192.168.122.201/24 eth0
      192.168.122.202/24 eth0
      192.168.122.203/24 eth0
      192.168.122.204/24 eth0

      /gluster/lock/smb.conf (Samba config; "clustering = yes" is required):
      [global]
      workgroup = MYGROUP
      server string = Samba Server Version %v
      clustering = yes
      security = user
      passdb backend = tdbsam

      [share]
      comment = Shared Directories
      path = /gluster/vol01
      browseable = yes
      writable = yes
  27. Step 4 - Install and configure Samba/CTDB (cont.)
      Create symlinks to the config files on all nodes.

      # mv /etc/sysconfig/ctdb /etc/sysconfig/ctdb.orig
      # mv /etc/samba/smb.conf /etc/samba/smb.conf.orig
      # ln -s /gluster/lock/ctdb /etc/sysconfig/ctdb
      # ln -s /gluster/lock/nodes /etc/ctdb/nodes
      # ln -s /gluster/lock/public_addresses /etc/ctdb/public_addresses
      # ln -s /gluster/lock/smb.conf /etc/samba/smb.conf

      Set SELinux permissive for smbd_t on all nodes due to the non-standard smb.conf location.
      ● We'd better set an appropriate security context instead, but there's an open issue for using chcon with GlusterFS.

      # yum install -y policycoreutils-python
      # semanage permissive -a smbd_t
  28. Step 4 - Install and configure Samba/CTDB (cont.)
      Create the following script for starting/stopping services.

      #!/bin/sh
      function runcmd {
          echo exec on all nodes: $@
          ssh gluster01 $@ &
          ssh gluster02 $@ &
          ssh gluster03 $@ &
          ssh gluster04 $@ &
          wait
      }
      case $1 in
          start)
              runcmd service glusterd start
              sleep 1
              runcmd mkdir -p /gluster/lock
              runcmd mount -t glusterfs localhost:/lockvol /gluster/lock
              runcmd mkdir -p /gluster/vol01
              runcmd mount -t glusterfs localhost:/vol01 /gluster/vol01
              runcmd service ctdb start
              ;;
          stop)
              runcmd service ctdb stop
              runcmd umount /gluster/lock
              runcmd umount /gluster/vol01
              runcmd service glusterd stop
              runcmd pkill glusterfs
              ;;
      esac
  29. Step 5 - Start services
      Now you can start/stop services with the script.
      ● After a few moments, ctdb status becomes "OK" for all nodes.
      ● And floating IPs are configured on each node.

      # ./ start
      # ctdb status
      Number of nodes:4
      pnn:0     OK (THIS NODE)
      pnn:1     OK
      pnn:2     OK
      pnn:3     OK
      Generation:1489978381
      Size:4
      hash:0 lmaster:0
      hash:1 lmaster:1
      hash:2 lmaster:2
      hash:3 lmaster:3
      Recovery mode:NORMAL (0)
      Recovery master:1

      # ctdb ip
      Public IPs on node 0
      192.168.122.201 node[3] active[] available[eth0] configured[eth0]
      192.168.122.202 node[2] active[] available[eth0] configured[eth0]
      192.168.122.203 node[1] active[] available[eth0] configured[eth0]
      192.168.122.204 node[0] active[eth0] available[eth0] configured[eth0]
  30. Step 5 - Start services (cont.)
      Set the Samba password and check the shared directories via one of the floating IPs.
      ● The password DB is shared by all hosts in the cluster.

      # pdbedit -a -u root
      new password:
      retype new password:
      # smbclient -L <one of the floating IPs> -U root
      Enter root's password:
      Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-151.el6]

      Sharename       Type      Comment
      ---------       ----      -------
      share           Disk      Shared Directories
      IPC$            IPC       IPC Service (Samba Server Version 3.6.9-151.el6)
      Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-151.el6]

      Server               Comment
      ---------            -------

      Workgroup            Master
      ---------            -------
  31. Configuration hints
      ● To specify the GlusterFS interconnect segment, "gluster peer probe" should be done for the IP addresses on that segment.
      ● To specify the CTDB interconnect segment, IP addresses on that segment should be specified in "/gluster/lock/nodes" (symlinked from "/etc/ctdb/nodes").
      ● To specify the NFS/CIFS access segment, NIC names on that segment should be specified in "/gluster/lock/public_addresses" (symlinked from "/etc/ctdb/public_addresses"), associated with the floating IPs.
      ● To restrict NFS access to a volume, you can use the "nfs.rpc-auth-allow" and "nfs.rpc-auth-reject" volume options. (reject supersedes allow.)
      ● The following tunables in "/gluster/lock/ctdb" (symlinked from "/etc/sysconfig/ctdb") may be useful for adjusting the CTDB failover timings. See the ctdbd man page for details.
        CTDB_SET_DeterministicIPs=1
        CTDB_SET_RecoveryBanPeriod=300
        CTDB_SET_KeepaliveInterval=5
        CTDB_SET_KeepaliveLimit=5
        CTDB_SET_MonitorInterval=15
  32. Summary
  33. Summary
      CTDB is the tool that combines well with the CIFS/NFS use case for GlusterFS.
      Network design is crucial to realizing a reliable cluster, not only for CTDB but for every cluster in the world ;-)
      Enjoy!
      And one important piece of fine print...
      ● Samba is not well tested on large-scale GlusterFS clusters. The use of CIFS as a primary access protocol on Red Hat Storage Server 2.0 is not officially supported by Red Hat. This will be improved in future versions.