GlusterFS / CTDB Integration
v1.0 2013.05.14
Etsuji Nakai
Senior Solution Architect
Red Hat K.K.
$ who am i
 Etsuji Nakai (@enakai00)
● Senior solution architect and cloud evangelist at Red Hat K.K.
● The author of the “Professional Linux Systems” series.
● Available in Japanese. Translation offers from publishers are welcome ;-)
  - Professional Linux Systems: Technology for Next Decade
  - Professional Linux Systems: Deployment and Management
  - Professional Linux Systems: Network Management
Contents
 CTDB Overview
 Why does CTDB matter?
 CTDB split-brain resolution
 Configuration steps for demo set-up
 Summary
Disclaimer
 This document explains how to set up a clustered Samba server using GlusterFS and CTDB with the following software components.
● Base OS, Samba, CTDB: RHEL6.4 (or any of your favorite clones)
● GlusterFS: GlusterFS 3.3.1 (community version)
  http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/
 Since this is based on the community version of GlusterFS, you cannot receive commercial support from Red Hat for this configuration. If you need commercial support, please consider using Red Hat Storage Server (RHS). Note that the conditions for a supportable configuration with RHS are different; please consult a Red Hat sales representative for details.
 Red Hat accepts no liability for the content of this document, or for the consequences of any actions taken on the basis of the information provided. Any views or opinions presented in this document are solely those of the author and do not necessarily represent those of Red Hat.
CTDB Overview
What's CTDB?
 TDB = Trivial Database
● Simple backend DB for Samba, used to store user info, file lock info, etc.
 CTDB = Clustered TDB
● Cluster extension of TDB, necessary when multiple Samba hosts serve the same filesystem contents, so that all clients see the same contents through different Samba hosts.
[Diagram: multiple Samba hosts (Samba, Samba, Samba, ...) on top of a shared filesystem]
What's wrong without CTDB?
 Windows file locks are not shared among Samba hosts.
● Normally you would see an alert (as in the figure) when someone else has the same file open.
● Without CTDB, if others are opening the same file through a different Samba host, you never see that alert.
● This is because file lock info is stored in the local TDB if you don't use CTDB.
● CTDB was initially developed as a shared TDB for multiple Samba hosts to overcome this problem.
[Figure: two clients open xxx.xls through different Samba hosts; each host records “Locked!” in its local TDB, but the Windows file locks are not shared between hosts.]
Yet another benefit of CTDB
 Floating IPs can be assigned across hosts for transparent failover.
● When one of the hosts fails, its floating IP is moved to another host.
● Mutual health checking is done through the CTDB interconnect (so-called “heartbeat”) network.
● CTDB can also be used for an NFS server cluster to provide the floating IP feature. (CTDB doesn't provide shared file locking for NFS, though.)
[Diagram: hosts connected by the CTDB interconnect (heartbeat) network, each holding one of Floating IP#1 ... Floating IP#N; when a host fails, its floating IP moves to a surviving host.]
Why does CTDB matter?
Access path of GlusterFS native client
 The native client directly communicates with all storage nodes.
● Transparent failover is implemented on the client side. When the client detects a node failure, it accesses the replicated node.
● Floating IPs are unnecessary by design for the native client.
[Diagram: a GlusterFS native client sees the GlusterFS volume as a single filesystem containing file01, file02, file03; the real locations of the files across the storage nodes are calculated on the client side.]
CIFS/NFS use case for GlusterFS
 The downside of the native client is that it's not available for Unix/Windows.
● You need to rely on CIFS/NFS for Unix/Windows clients.
● In that case, Windows file lock sharing and the floating IP feature are not in GlusterFS itself; they have to be provided by an external tool.
 CTDB is the tool for it ;-)
[Diagram: a CIFS/NFS client connects to just one specified node, and that GlusterFS storage node acts as a proxy “client”. Different clients can connect to different nodes; DNS round-robin may work for that, as sketched below.]
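As an illustration, DNS round-robin can be set up with multiple A records for a single name; a hypothetical BIND zone fragment, assuming the floating IPs configured later in this deck:
; "storage" resolves to all four access-segment IPs in turn,
; spreading CIFS/NFS clients across the storage nodes.
storage  IN  A  192.168.122.201
storage  IN  A  192.168.122.202
storage  IN  A  192.168.122.203
storage  IN  A  192.168.122.204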
Network topology overview without CTDB
 If you don't need the floating IP/Windows file lock features, you can go without CTDB.
● NFS file lock sharing (DNLM) is provided by GlusterFS's internal NFS server.
 Although it's not mandatory, you can separate the CIFS/NFS access segment from the GlusterFS interconnect for the sake of network performance.
[Diagram: CIFS/NFS clients on the access segment; each storage node runs Samba and glusterd, joined by the GlusterFS interconnect.]
Network topology overview with CTDB
 If you use CTDB with GlusterFS, you need to add an independent CTDB interconnect (heartbeat) segment for a reliable cluster.
● The reason is explained later.
[Diagram: CIFS/NFS clients on the access segment; storage nodes joined by both the GlusterFS interconnect and a separate CTDB interconnect (heartbeat) segment.]
Demo - Seeing is believing!
http://www.youtube.com/watch?v=kr8ylOBCn8o
CTDB split-brain resolution
What's CTDB split-brain?
 When the heartbeat is cut off for any reason (possibly a network problem) while the cluster nodes are still running, there must be some mechanism to choose which "island" should survive and keep running.
● Without this mechanism, the same floating IPs are assigned on both islands. This is not specific to CTDB; every cluster system in the world needs to take care of "split-brain".
 In the case of CTDB, a master node is elected through the "lock file" on the shared filesystem, and the island with the master node survives. In the case of GlusterFS in particular, the lock file is stored on a dedicated GlusterFS volume called the "lock volume".
● The lock volume is locally mounted on each storage node. If you share the CTDB interconnect with the GlusterFS interconnect, access to the lock volume is not guaranteed when the heartbeat is cut off, resulting in an unpredictable condition.
[Diagram: storage nodes joined by the GlusterFS interconnect and the CTDB interconnect (heartbeat); the master takes an exclusive lock on the lock file on the lock volume.]
Typical volume config seen from a storage node
# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/vda3              2591328   1036844   1422852  43% /
tmpfs                   510288         0    510288   0% /dev/shm
/dev/vda1               495844     33450    436794   8% /boot
/dev/mapper/vg_bricks-lv_lock
                         60736      3556     57180   6% /bricks/lock
/dev/mapper/vg_bricks-lv_brick01
                       1038336     33040   1005296   4% /bricks/brick01
localhost:/lockvol      121472      7168    114304   6% /gluster/lock
localhost:/vol01       2076672     66176   2010496   4% /gluster/vol01
# ls -l /gluster/lock/
total 2
-rw-r--r--. 1 root root 294 Apr 26 15:43 ctdb
-rw-------. 1 root root   0 Apr 26 15:57 lockfile
-rw-r--r--. 1 root root  52 Apr 26 15:56 nodes
-rw-r--r--. 1 root root  96 Apr 26 15:04 public_addresses
-rw-r--r--. 1 root root 218 Apr 26 16:31 smb.conf
● /gluster/lock is the locally mounted lock volume; /gluster/vol01 is the locally mounted data volume, exported with Samba.
● "lockfile" is the lock file used to elect the master. Common config files (ctdb, nodes, public_addresses, smb.conf) can be placed on the lock volume.
What about sharing the CTDB interconnect with the access segment?
 No, it doesn't work.
 When the NIC for the access segment fails, the cluster detects the heartbeat failure and elects a master node through the lock file on the shared volume. However, if the node with the failed NIC holds the lock, it becomes the master even though it cannot serve clients.
● In practice, CTDB event monitoring also detects the NIC failure and the node goes into "CTDB UNHEALTHY" status.
CTDB event monitoring
 CTDB provides a custom event monitoring mechanism which can be used to monitor application status, NIC status, etc.
● Monitoring scripts are stored in /etc/ctdb/events.d/
● They need to implement handlers for pre-defined events (see the sketch after the listing below).
● They are called in file name order when an event occurs.
● In particular, the "monitor" event is issued every 15 seconds. If the "monitor" handler of some script exits with a non-zero return code, the node becomes "UNHEALTHY" and its floating IPs are failed over to healthy nodes.
● For example, "10.interface" checks the link status of the NIC on which the floating IP is assigned.
● See the README for details - http://bit.ly/14KOjlC
# ls /etc/ctdb/events.d/
00.ctdb       11.natgw           20.multipathd  41.httpd  61.nfstickle
01.reclock    11.routing         31.clamd       50.samba  70.iscsi
10.interface  13.per_ip_routing  40.vsftpd      60.nfs    91.lvs
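As an illustration, here is a minimal sketch of a custom event script; the name 99.mycheck is hypothetical (not one of the stock scripts above), and it marks the node UNHEALTHY whenever smbd is not running:
#!/bin/sh
# Hypothetical /etc/ctdb/events.d/99.mycheck (must be executable).
# CTDB invokes event scripts with the event name as the first argument.
case "$1" in
    monitor)
        # A non-zero exit from the "monitor" handler marks this node
        # UNHEALTHY, and its floating IPs fail over to healthy nodes.
        pgrep smbd > /dev/null || exit 1
        ;;
esac
exit 0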
Configuration steps for demo set-up
Step1 – Install RHEL6.4
 Install RHEL6.4 on the storage nodes.
● The Scalable File System Add-On is required for XFS.
● The Resilient Storage Add-On is required for the CTDB packages.
 Configure public key ssh authentication between the nodes (a sketch follows).
● This is for administrative purposes.
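A minimal sketch of the key setup, assuming root logins and the host names defined in /etc/hosts below (ssh-copy-id appends the public key to each peer's authorized_keys):
# ssh-keygen -t rsa
# for h in gluster01 gluster02 gluster03 gluster04; do ssh-copy-id root@$h; done
Repeat on each node that needs to run administrative commands against the others.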
 Configure network interfaces as in the configuration pages (/etc/hosts below).
/etc/hosts
# NFS/CIFS access segment
192.168.122.11  gluster01
192.168.122.12  gluster02
192.168.122.13  gluster03
192.168.122.14  gluster04
# CTDB interconnect
192.168.2.11    gluster01c
192.168.2.12    gluster02c
192.168.2.13    gluster03c
192.168.2.14    gluster04c
# GlusterFS interconnect
192.168.1.11    gluster01g
192.168.1.12    gluster02g
192.168.1.13    gluster03g
192.168.1.14    gluster04g
Step1 – Install RHEL6.4
 Configure iptables on all nodes
# vi /etc/sysconfig/iptables
# service iptables restart

/etc/sysconfig/iptables
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
# portmap
-A INPUT -m state --state NEW -m tcp -p tcp --dport 111 -j ACCEPT
# CIFS
-A INPUT -m state --state NEW -m tcp -p tcp --dport 139 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 445 -j ACCEPT
# Bricks
-A INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24050 -j ACCEPT
# NFS/NLM
-A INPUT -m state --state NEW -m tcp -p tcp --dport 38465:38468 -j ACCEPT
# CTDB
-A INPUT -m state --state NEW -m tcp -p tcp --dport 4379 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
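After restarting iptables, a quick sanity check (output omitted):
# iptables -L INPUT -n --line-numbers
The listing should include the portmap (111), CIFS (139, 445), brick (24007-24050), NFS/NLM (38465-38468), and CTDB (4379) ports.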
Step2 – Prepare bricks
 Create and mount the brick directories on all nodes.
# pvcreate /dev/vdb
# vgcreate vg_bricks /dev/vdb
# lvcreate -n lv_lock -L 64M vg_bricks
# lvcreate -n lv_brick01 -L 1G vg_bricks
# yum install -y xfsprogs
# mkfs.xfs -i size=512 /dev/vg_bricks/lv_lock
# mkfs.xfs -i size=512 /dev/vg_bricks/lv_brick01
# echo '/dev/vg_bricks/lv_lock /bricks/lock xfs defaults 0 0' >> /etc/fstab
# echo '/dev/vg_bricks/lv_brick01 /bricks/brick01 xfs defaults 0 0' >> /etc/fstab
# mkdir -p /bricks/lock
# mkdir -p /bricks/brick01
# mount /bricks/lock
# mount /bricks/brick01
[Diagram: /dev/vdb holds the volume group vg_bricks with logical volumes lv_lock and lv_brick01.]
/bricks/lock is used for the lock volume; /bricks/brick01 is used for the data volume.
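A quick way to confirm the bricks are mounted as intended (output omitted):
# df -h /bricks/lock /bricks/brick01
Both logical volumes should show up mounted on their brick directories.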
Step3 – Install GlusterFS and create volumes
 Install the GlusterFS packages on all nodes
# wget -O /etc/yum.repos.d/glusterfs-epel.repo \
  http://download.gluster.org/pub/gluster/glusterfs/3.3/3.3.1/RHEL/glusterfs-epel.repo
# yum install -y rpcbind glusterfs-server
# chkconfig rpcbind on
# service rpcbind start
# service glusterd start
Do not auto-start glusterd with chkconfig; it is started by the management script shown later.
 Configure the cluster and create the volumes from gluster01 (a sanity check follows)
# gluster peer probe gluster02g
# gluster peer probe gluster03g
# gluster peer probe gluster04g
# gluster vol create lockvol replica 2 \
    gluster01g:/bricks/lock gluster02g:/bricks/lock \
    gluster03g:/bricks/lock gluster04g:/bricks/lock
# gluster vol start lockvol
# gluster vol create vol01 replica 2 \
    gluster01g:/bricks/brick01 gluster02g:/bricks/brick01 \
    gluster03g:/bricks/brick01 gluster04g:/bricks/brick01
# gluster vol start vol01
Use the *g host names so that peer probe and the bricks are bound to the GlusterFS interconnect NICs.
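Before moving on, it may help to confirm the peers and volumes look as intended; a minimal sanity-check sketch (run on gluster01, output omitted):
# gluster peer status
# gluster vol info lockvol
# gluster vol info vol01
All peers should show up as connected, and both volumes as "Started".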
Step4 – Install and configure Samba/CTDB
 Install the Samba/CTDB packages on all nodes
# yum install -y samba samba-client ctdb
Do not auto-start smb and ctdb with chkconfig (ctdb is started by the management script shown later, and smbd is managed by CTDB).
 If you use NFS, install the following packages, too.
# yum install -y rpcbind nfs-utils
# chkconfig rpcbind on
# service rpcbind start
 Configure CTDB and Samba only on gluster01
# mkdir -p /gluster/lock
# mount -t glusterfs localhost:/lockvol /gluster/lock
● Create the following config files (this and the next page) on the shared volume.
/gluster/lock/ctdb
CTDB_PUBLIC_ADDRESSES=/gluster/lock/public_addresses
CTDB_NODES=/etc/ctdb/nodes
# Only when using Samba. Unnecessary for NFS.
CTDB_MANAGES_SAMBA=yes
# some tunables
CTDB_SET_DeterministicIPs=1
CTDB_SET_RecoveryBanPeriod=120
CTDB_SET_KeepaliveInterval=5
CTDB_SET_KeepaliveLimit=5
CTDB_SET_MonitorInterval=15
Step4 – Install and configure Samba/CTDB
/gluster/lock/nodes (the CTDB cluster nodes; list the IP addresses on the CTDB interconnect)
192.168.2.11
192.168.2.12
192.168.2.13
192.168.2.14
/gluster/lock/public_addresses (the floating IP list)
192.168.122.201/24 eth0
192.168.122.202/24 eth0
192.168.122.203/24 eth0
192.168.122.204/24 eth0
/gluster/lock/smb.conf (the Samba config; “clustering = yes” is required)
[global]
workgroup = MYGROUP
server string = Samba Server Version %v
clustering = yes
security = user
passdb backend = tdbsam
[share]
comment = Shared Directories
path = /gluster/vol01
browseable = yes
writable = yes
Step4 – Install and configure Samba/CTDB
 Create symlinks to the config files on all nodes (a verification sketch follows).
# mv /etc/sysconfig/ctdb /etc/sysconfig/ctdb.orig
# mv /etc/samba/smb.conf /etc/samba/smb.conf.orig
# ln -s /gluster/lock/ctdb /etc/sysconfig/ctdb
# ln -s /gluster/lock/nodes /etc/ctdb/nodes
# ln -s /gluster/lock/public_addresses /etc/ctdb/public_addresses
# ln -s /gluster/lock/smb.conf /etc/samba/smb.conf
 Set SELinux permissive for smbd_t on all nodes due to the non-standard smb.conf location.
● It would be better to set an appropriate security context, but there is an open issue with using chcon on GlusterFS:
● https://bugzilla.redhat.com/show_bug.cgi?id=910380
# yum install -y policycoreutils-python
# semanage permissive -a smbd_t
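A quick check that the relocated configs and the permissive domain are in place (a sketch; output omitted):
# ls -l /etc/sysconfig/ctdb /etc/samba/smb.conf
# semanage permissive -l
The ls output should show symlinks into /gluster/lock, and smbd_t should appear in the permissive domain list.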
Step4 – Install and configure Samba/CTDB
 Create the following script for starting/stopping the services
ctdb_manage.sh
#!/bin/sh
# Run the given command on all nodes in parallel and wait for completion.
function runcmd {
        echo exec on all nodes: $@
        ssh gluster01 $@ &
        ssh gluster02 $@ &
        ssh gluster03 $@ &
        ssh gluster04 $@ &
        wait
}
case $1 in
    start)
        runcmd service glusterd start
        sleep 1
        runcmd mkdir -p /gluster/lock
        runcmd mount -t glusterfs localhost:/lockvol /gluster/lock
        runcmd mkdir -p /gluster/vol01
        runcmd mount -t glusterfs localhost:/vol01 /gluster/vol01
        runcmd service ctdb start
        ;;
    stop)
        runcmd service ctdb stop
        runcmd umount /gluster/lock
        runcmd umount /gluster/vol01
        runcmd service glusterd stop
        runcmd pkill glusterfs
        ;;
esac
Step5 – Start services
 Now you can start/stop the services.
● After a few moments, "ctdb status" becomes "OK" for all nodes.
● And the floating IPs are configured on each node.
# ./ctdb_manage.sh start
# ctdb status
Number of nodes:4
pnn:0 192.168.2.11     OK (THIS NODE)
pnn:1 192.168.2.12     OK
pnn:2 192.168.2.13     OK
pnn:3 192.168.2.14     OK
Generation:1489978381
Size:4
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
Recovery master:1
# ctdb ip
Public IPs on node 0
192.168.122.201 node[3] active[] available[eth0] configured[eth0]
192.168.122.202 node[2] active[] available[eth0] configured[eth0]
192.168.122.203 node[1] active[] available[eth0] configured[eth0]
192.168.122.204 node[0] active[eth0] available[eth0] configured[eth0]
Step5 – Start services
 Set a Samba password and check the shared directories via one of the floating IPs (a client-side sketch follows).
● The password DB is shared by all hosts in the cluster.
# pdbedit -a -u root
new password:
retype new password:
# smbclient -L 192.168.122.201 -U root
Enter root's password:
Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-151.el6]
Sharename       Type      Comment
---------       ----      -------
share           Disk      Shared Directories
IPC$            IPC       IPC Service (Samba Server Version 3.6.9-151.el6)
Domain=[MYGROUP] OS=[Unix] Server=[Samba 3.6.9-151.el6]
Server               Comment
---------            -------
Workgroup            Master
---------            -------
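From a Linux client, a quick way to verify the share through a floating IP is a CIFS mount; a minimal sketch, assuming a CIFS mount helper (e.g. cifs-utils) is installed and using the root account created above:
# mount -t cifs //192.168.122.201/share /mnt -o user=root
# ls /mnt
# umount /mnt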
Configuration hints
 To specify the GlusterFS interconnect segment, "gluster peer probe" should be done with the host names/IP addresses on that segment.
 To specify the CTDB interconnect segment, the IP addresses on that segment should be listed in "/gluster/lock/nodes" (symlinked from "/etc/ctdb/nodes").
 To specify the NFS/CIFS access segment, the NIC names on that segment should be specified in "/gluster/lock/public_addresses" (symlinked from "/etc/ctdb/public_addresses"), associated with the floating IPs.
 To restrict NFS access to a volume, you can use the "nfs.rpc-auth-allow" and "nfs.rpc-auth-reject" volume options (reject supersedes allow; a sketch follows).
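For instance, a minimal sketch of restricting NFS access with these options, assuming the demo's vol01 (the client addresses are illustrative):
# gluster volume set vol01 nfs.rpc-auth-allow 192.168.122.*
# gluster volume set vol01 nfs.rpc-auth-reject 192.168.122.99
This allows NFS access to vol01 only from the access segment while rejecting one specific host.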
 The following tunables in "/gluster/lock/ctdb" (symlinked from "/etc/sysconfig/ctdb") may be useful for adjusting the CTDB failover timing. See the ctdbd man page for details.
● CTDB_SET_DeterministicIPs=1
● CTDB_SET_RecoveryBanPeriod=300
● CTDB_SET_KeepaliveInterval=5
● CTDB_SET_KeepaliveLimit=5
● CTDB_SET_MonitorInterval=15
Summary
Summary
 CTDB is the tool that pairs well with the CIFS/NFS use case for GlusterFS.
 Network design is crucial for a reliable cluster, not only for CTDB but for every cluster in the world ;-)
 Enjoy!
 And one important piece of fine print...
● Samba is not well tested on large-scale GlusterFS clusters. The use of CIFS as a primary access protocol on Red Hat Storage Server 2.0 is not officially supported by Red Hat. This will be improved in future versions.
WE CAN DO MORE
WHEN WE WORK TOGETHER
THE OPEN SOURCE WAY