Introduction to DRBD
Sudoers Barcelona
October 2013
Alba Ferrer
What is it?
Distributed Replicated Block Device
Software-based, shared-nothing replicated
storage solution mirroring the contents of block
devices
• In real time
• Transparently
• Synchronously/asynchronously
Kernel module
User space admin tools
• drbdsetup
• Used to configure the kernel module
• All parameters in command-line
• drbdmeta
• Create/dump/restore/modify DRBD metadata
(more on this later)
• drbdadm
• High-level, frontend for drbdsetup/drbdmeta
• Reads from /etc/drbd.conf
• Has a dry-run option (-d)
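
The tool hierarchy is easy to see with the dry-run option: drbdadm prints the low-level commands it would delegate to drbdsetup/drbdmeta without running them. A sketch, using the `mysql` resource from the configuration slide:

```shell
# Show what drbdadm would do, without doing it (-d = dry-run)
drbdadm -d up mysql
# The output is the sequence of drbdsetup/drbdmeta invocations
# that drbdadm would issue for resource "mysql"
```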
Resources
• A particular replicated storage device
• Resource name
• DRBD device: virtual block device (major=147).
The associated device node is always
/dev/drbdX (X = minor number)
• Disk configuration: local copy of the data
• Network configuration: comms with peer
Configuration
Per resource (/etc/drbd.d/mysql.res):
resource mysql {
  device    minor 0;    # /dev/drbd0
  disk      /dev/sdb;
  meta-disk internal;
  on alice {
    address 192.168.133.111:7000;
  }
  on bob {
    address 192.168.133.112:7000;
  }
  syncer {
    rate 10M;           # static resync rate of 10 MByte/s
  }
}
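
With that resource file in place on both nodes, a typical first bring-up looks like this (a sketch assuming the DRBD 8.3 series shown later in /proc/drbd; the mount point is illustrative):

```shell
# On both nodes: write the metadata, then attach and connect the resource
drbdadm create-md mysql
drbdadm up mysql

# On the node chosen as primary only (8.3 syntax): declare the local
# data authoritative, which starts the initial full sync to the peer
drbdadm -- --overwrite-data-of-peer primary mysql

# Watch the resync progress
cat /proc/drbd
```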
Configuration
Global (/etc/drbd.d/global_common.conf):
global {
  usage-count yes;
}
common {
  protocol C;
  disk {
    on-io-error detach;
  }
  syncer {
    al-extents 3833;
  }
}
Resource roles
• Primary: read and write ops
• Secondary: receives updates from primary,
disallows any other access.
• Promotion: from secondary to primary
drbdadm primary all
• Demotion: from primary to secondary
drbdadm secondary all
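
The promote/demote commands take either `all` or a resource name; a manual failover of the `mysql` resource is just the two commands in sequence (a sketch: the filesystem and mount point are assumptions, and the device must be unmounted before demotion):

```shell
# On the current primary (alice): release the device, then demote
umount /dev/drbd0
drbdadm secondary mysql

# On the peer (bob): promote, then mount
drbdadm primary mysql
mount /dev/drbd0 /var/lib/mysql
```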
Modes
• Single-primary
• Dual-primary (>= 8.0)
• Replication modes:
• Protocol A: asynchronous (write completes once it is on the local disk and in the local TCP send buffer)
• Protocol B: memory synchronous (write completes once it has reached the peer's RAM)
• Protocol C: synchronous (write completes once it is on both disks)
Features: efficient synchronization
• Synchronization != replication
• Remote dataset is inconsistent while a sync is running
• Useless for failover until the sync completes
• Service on the active node is unaffected
• Synchronization and replication happen at
the same time
Features: efficient synchronization
• Several successive writes to the same block on the active node cost only one synchronization write
• Blocks are synchronized in linear order
• Configure rate of sync
• Checksum-based synchronization
Features: data verification
• On-line device verification
• block-by-block data integrity check
between nodes
• Replication traffic integrity checking
• end-to-end message integrity checking
using cryptographic message digest
algorithms
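
Both checks are opt-in: each is enabled by naming a digest algorithm in the configuration, and on-line verification is then started by hand. A sketch for the 8.3 series (sha1 is one typical choice from the kernel's crypto API):

```
syncer {
  verify-alg sha1;          # enables on-line verification: drbdadm verify mysql
}
net {
  data-integrity-alg sha1;  # end-to-end check of every replicated write
}
```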
Features: disk
• Support for disk flushes
• Disk error handling strategies
• Passing (on-io-error pass_on)
• Masking (on-io-error detach)
• DIY (on-io-error call-local-io-error)
• Deal with outdated data
• DRBD won't promote an outdated
resource -> fencing
Features: replication
• Three-way replication
• Long distance replication with DRBD Proxy
• Not free
• Truck based replication
Split-brain
Split brain is the situation in which, due to a temporary
failure of all network links between the cluster nodes,
and possibly due to intervention by cluster management
software or human error, both nodes switched to the
primary role while disconnected.
Split-brain
• Configurable notifications
• Automatic recovery methods
• Discard modifications on 'younger' primary.
• Discard modifications on 'older' primary.
• Discard modifications on primary with
fewer changes.
• Graceful recovery if one primary had no
changes.
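
The notification hook and the recovery policies map onto the resource's handlers and net sections. A sketch for the 8.3 series, using the standard policy keywords (e.g. discard-zero-changes is the "graceful recovery if one primary had no changes" case):

```
handlers {
  # configurable notification: mail root when split brain is detected
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
}
net {
  # automatic recovery, keyed on how many nodes are primary afterwards
  after-sb-0pri discard-zero-changes;   # graceful if one side had no writes
  after-sb-1pri discard-secondary;      # the secondary's changes lose
  after-sb-2pri disconnect;             # give up; require manual resolution
}
```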
Metadata
• Various pieces of information about the data
DRBD keeps in a dedicated area
• The size of the DRBD device
• The generation identifier
• The activity log
• The quick-sync bitmap
Metadata
• Can be stored internally or externally
• Size (internal metadata, in 512-byte sectors):
root@bob:~ # blockdev --getsz /dev/drbd0
8388280
ceil(8388280 / 2^18) * 8 + 72 = 328 sectors
328 sectors = 0.16 MB
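
The arithmetic above can be checked with plain shell arithmetic: every 2^18 sectors of backing storage need 8 sectors of quick-sync bitmap, plus a fixed 72 sectors for the activity log and superblock.

```shell
#!/bin/sh
# Backing device size in 512-byte sectors, as reported by: blockdev --getsz
S=8388280
# ceil(S / 2^18) * 8 + 72, using integer arithmetic for the ceiling
MD=$(( (S + 262143) / 262144 * 8 + 72 ))
echo "$MD sectors = $(( MD * 512 )) bytes"   # 328 sectors = 167936 bytes
```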
What it’s not/What it can’t do
• It’s not a backup system
• It can’t add features to upper layers
• DRBD cannot auto-detect file system
corruption
• DRBD cannot add active-active clustering
capability to file systems like ext3 or XFS.
Limitations
• Only two nodes per resource (workarounds: stacked resources; lifted in DRBD 9)
• There is no automatic failover.
• Promotion/demotion is manual.
• Needs a CRM to be useful
PACEMAKER FTW
Operation
root@alice:/etc/drbd.d # cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 234a142f7cf5bb21ffa1e95afa4f31608089c8b8 build by buildsystem@linbit, 2012-09-12 14:27:28
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:152 nr:4 dw:156 dr:4017 al:5 bm:1 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
More info
• drbd.org
• www.drbd.org/home/mailinglists
• www.linbit.com
