SUSE Linux Enterprise High Availability
Cluster Multi-Device
Antoine Giniès
Project Manager / Release Manager
SUSE / aginies@suse.com
Expert Days Paris
Feb 2018
Refreshing Memory
3
SUSE Enterprise Server HA
Main Features
• Policy Driven Cluster
• Cluster Aware FS
• Continuous Data Replication
• Setup and Installation bootstrap
• Simple
4
HA Cluster Stack Architecture
5
RAID 0 | RAID 1 | RAID 10 | RAID Forever !
HA Storage quick overview
7
Doing HA storage
2 main solutions
• Cluster nodes have local storage, and each write request is sent over the
network (minimum of 2 nodes)
OR
• Redundant storage separate from the Cluster nodes
– SAN (Fibre Channel/iSCSI, etc.)
– SPOF !
8
High Availability – DRBD
• Distributed Replicated Block Device
• Master/slave resources are managed by the Pacemaker/Corosync software stack
• The SLE HA stack manages service ordering, dependencies, and failover
• Mirror of 2 block devices (RAID1)
• Active-Passive
[Diagram: Host1 and Host2 each run a Virtual IP and Apache on ext4 on top of DRBD in the kernel (Master on Host1, Slave on Host2), coordinated by Pacemaker + Corosync; failover moves the service to Host2.]
9
Data Replication – DRBD
DRBD can be thought of as a networked RAID1
DRBD allows you to create a mirror of two block devices that are
located at two different sites across the network
It mirrors data in real time, so replication occurs continuously,
and it works well for long-distance replication
SLE 12 HA SP2 and later ship DRBD 9
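A minimal sketch of a matching two-node DRBD resource definition (host names, IP addresses, and disk paths below are assumptions, not from the slides), typically placed in /etc/drbd.d/r0.res on both nodes:

  resource r0 {
    device    /dev/drbd0;     # replicated device the service (e.g. ext4 + Apache) sits on
    disk      /dev/sdb1;      # local backing disk on each node
    meta-disk internal;       # keep DRBD metadata on the backing disk
    on host1 { address 10.0.0.1:7789; }   # replication link endpoints
    on host2 { address 10.0.0.2:7789; }
  }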
10
Clustered LVM2 (cLVM2)
• Allows multiple nodes to use LVM2 on the shared disk
• cLVM2 coordinates LVM2 metadata
• Coordinates access to the shared data
• Multiple nodes accessing data on different dedicated VGs is safe
• Active-Active
[Diagram: Host1 and Host2 each run clvmd and share LVM2 metadata, coordinated by Pacemaker + Corosync + DLM, with access to Shared LUN 1 and Shared LUN 2.]
11
Data Replication – cLVM/cmirrord
● There are different types of LVs in CLVM: striped, mirrored, etc.
● CLVM extends LVM to support transparent management of volume groups
across the whole cluster
● With CLVM we can also create mirrored LVs to achieve data replication;
cmirrord is used to track the mirror log info across the cluster
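As a hedged example (the VG and LV names are assumptions), flagging a VG as clustered and creating a mirrored LV whose log cmirrord tracks could look like:

  vgchange -c y vg_cluster                                      # mark the VG as clustered (clvmd coordinates it)
  lvcreate --type mirror -m 1 -L 10G -n lv_mirror vg_cluster    # 2-leg mirrored LV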
12
Problem
Solution?
• All nodes have local storage, and each write request must be sent over the
network (minimum of 2 nodes)
• Complexity of making different nodes work together with multiple storage servers
• Poor CLVM2/cmirrord performance
Cluster MD
13
Cluster-MD
Cluster Multi-device
• Software-based RAID storage
• Redundancy at the device level
• NOT a cluster FS!
• Ensures data between mirrors is consistent
• Improved performance (vs. CLVM mirroring)
• RAID1 (redundancy)
• Device replacement at runtime
• On top of 2 SAN storages → no more SPOF
• Possible to have more than 2 SANs
[Diagram: two hosts, each running Cluster-MD with its own bitmap plus clvmd/lvmlockd, coordinated by Pacemaker + Corosync + DLM; the mirror spans Shared LUNs 1-4 across SAN1 and SAN2.]
14
Data Replication – Cluster-MD
Internals:
– Cluster MD keeps a write-intent bitmap for each cluster node
– During "normal" I/O access, we assume the clustered filesystem ensures that only one
node writes to any given block at a time
– With each node having its own bitmap, there is no locking
and no need to resync the array during normal operation
– Cluster MD only handles the bitmaps when resync/recovery etc. happens
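To see the per-node bitmaps in practice (the device name is taken from the demo later in this deck; output varies by mdadm version):

  mdadm --examine-bitmap /dev/vdd    # for a clustered array, prints one bitmap slot per cluster node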
15
DRBD VS Cluster-MD
• DRBD
– SAN storage
– 2 nodes only (+1 backup)
– Regular FS possible
– Primary/Primary with a cluster-aware FS
– RAID 0 (striping) / RAID 1 (mirroring)
• Cluster-MD
– RAID 1 (mirroring)
– SAN storage
– > 2 nodes
– Cluster-aware FS
How it works (resync)
17
Active/Active FS
• All nodes write to the same FS at the same time
• Cluster-aware FS (OCFS2 / GFS2)
• Each node can write to any block
• A locking service is mandatory (DLM)
• RAID1:
– No need for extra coordination
– 1 possible issue: 2 nodes writing the same block at the same time
! CLVM and Cluster-MD never do locking !
18
Cluster-MD VS CLVM (in details)
• Cluster-MD
– Resyncing a device or reading from a single device (wait for the resync to finish)
– Resync technical details:
• A bitmap of possibly-out-of-sync regions is stored
• A bit is set before writing and cleared at the end
– Updating the bitmap costs less than resyncing the whole array (faster recovery)
– The bitmap is stored on all the devices in the array
– But there is a separate bitmap for each cluster node
– Setting/clearing a bit is a single-node operation, a simple write to all disks
• CLVM DM-RAID1
– Resync technical details:
• A dirty region log is managed by dm-log-userspace (mark or clear a region)
• This is a user-space daemon
– Replicated around the cluster through a message
– The acknowledgment returns to the original node, and then to the kernel module
19
Cluster-MD VS CLVM
• Cluster-MD:
– Writes to a block of each storage device
– Waits for confirmation
• DM-RAID1
– Sends a message in user space to all nodes
– The acknowledgment returns to the original node
– Passes the info to the kernel module
Comparison
21
Comparison table

             Nodes                   Suitable   A/A or   RAID     FS              Shared Storage
                                     for Geo    A/P
DRBD         Supported (limited      Yes        A/P      RAID1    Classical       No, storage is
             to 2 nodes)                                                          dedicated to each node
CLVM         Limited by Pacemaker    No         A/A      RAID0,   Classical,      Yes
             & Corosync                                  RAID1    cluster-aware
Cluster-MD   Limited by Pacemaker    No         A/A,     RAID1    Cluster-aware   Yes
             & Corosync                         A/P
22
Data Replication – Performance Comparison
FIO test with sync engine
[Charts: average IOPS for read (left) and write (right) at 4k and 16k block sizes, comparing RawDisk, NativeRaid, Clustermd, and Cmirror.]
23
Data Replication – Performance Comparison
FIO test with libaio engine
[Charts: average IOPS for read (left) and write (right) at 4k and 16k block sizes, comparing RawDisk, NativeRaid, Clustermd, and Cmirror.]
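The exact fio job parameters are not shown on the slides; a sketch of a comparable run for the 4k random-write libaio case (all options here are assumptions) would be:

  fio --name=cmd-bench --filename=/dev/md0 --ioengine=libaio --direct=1 \
      --rw=randwrite --bs=4k --iodepth=16 --runtime=60 --group_reporting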
Extension of a Cluster-MD device
25
Cluster-MD Demo
• 3 Virtual Machines (ha1 ha2 ha3)
• SLE12SP3 HA ready for use
• Attached disks:
– 1 System
– 1 SBD
– 3 x 1 GB
– 3 x 2 GB
• Cluster-MD Deployment
26
Cluster-MD setup (step by step)
• Install the cluster-md-kmp and mdadm packages on all nodes
• Shared storage: fake shared storage between nodes (vdd-vdi)
• Create a CIB and use it
cib new cluster_md_demo
• CRM: DLM resource
primitive dlm ocf:pacemaker:controld op monitor interval='60' timeout='60'
group base-group dlm
clone base-clone base-group meta interleave=true target-role=Started
• Create the RAID1
mdadm --create /dev/md0 --bitmap=clustered --raid-devices=2 --level=mirror --spare-devices=1 /dev/vdd /dev/vde /dev/vdf
• Create /etc/mdadm.conf (using the UUID)
DEVICE /dev/vdd /dev/vde /dev/vdf /dev/vdg /dev/vdh /dev/vdi
ARRAY /dev/md0 metadata=1.2 spares=1 name=SLE12SP3ha3:0 UUID=c846e466:b7e15a4e:9ff54149:96b0dfb1
• Sync /etc/mdadm.conf to all nodes
• CRM: RAIDER primitive
primitive raider Raid1 params raidconf="/etc/mdadm.conf" raiddev="/dev/md0" force_clones=true \
op monitor timeout=20s interval=10 op start timeout=20s interval=0 op stop timeout=20s interval=0
• mkfs.ocfs2 --cluster-stack pcmk -L 'VMtesting' --cluster-name hacluster /dev/md0
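Once the raider resource is started, a quick sanity check on any node (a sketch; output will differ per setup):

  cat /proc/mdstat               # array state, clustered bitmap, resync progress
  mdadm --detail /dev/md0        # member devices, spare count, UUID for /etc/mdadm.conf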
27
Cluster-MD Extend Demo from 1 GB to 2 GB
• VDD VDE VDF = 1 GB | VDG VDH VDI = 2 GB
• Add more backend devices (VDG VDH VDI)
– mdadm --manage /dev/md0 --add DEV
• Declare the spare as failed, then remove it
– mdadm --manage /dev/md0 --fail /dev/vdX
– mdadm --manage /dev/md0 --remove /dev/vdX
• Still 2 active devices of 1 GB, and 3 spares of 2 GB
• Declare 1 active device failed & remove it → resync between 1 active (1 GB) and one of the 2 GB devices
• Once the sync is done, declare the last 1 GB active device failed; 1 spare of 2 GB will replace it
• Remove the last failed 1 GB device
• Grow the size of /dev/md0
– mdadm --grow /dev/md0 --size=max
• Resize the FS (tunefs.ocfs2 or gfs2_grow)
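As a sketch of that last step (the GFS2 mount point below is an assumption):

  tunefs.ocfs2 -S /dev/md0       # grow OCFS2 to the new device size
  gfs2_grow /mnt/shared          # GFS2 equivalent, run on the mounted FS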
28
Future of Cluster-MD
29
Cluster-MD RAID 10 (TP)
• mdadm --create /dev/md0 --bitmap=clustered --metadata=1.2 --raid-devices=2 --level=10 /dev/sda /dev/sdb
• RAID10 supports 3 layouts:
• Cluster-MD only supports the near layout (best performance)
NEAR layout:
a1 b1 c1 e1
 0  0  1  1
 2  2  3  3
 4  4  5  5
 6  6  7  7
 8  8  9  9

FAR layout:
a1 b1 c1 e1
 0  1  2  3
 4  5  6  7
 .  .  .
 3  0  1  2
 7  4  5  6

OFFSET layout:
a1 b1 c1 e1
 0  1  2  3
 3  0  1  2
 4  5  6  7
 7  4  5  6
 8  9 10 11
11  8  9 10
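A quick way to confirm the near layout on an existing array (output wording may vary by mdadm version):

  mdadm --detail /dev/md0 | grep -i layout    # e.g. "Layout : near=2"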
30
Question & Answer