Slides for the presentation of "NetRAID for the Linux Kernel" at the UKUUG LISA/Winter Conference on High-Availability and Reliability, Feb. 2004. The preprint of the full article is at http://www.academia.edu/2493525/NetRAID_for_the_Linux_Kernel .
4. The Numbers Game
100BT LAN = 10MB/s
1TB mirror @ 40MB/s = 25000s
≈ 7 hours!
Temporary network outages = frequent
permanent disk losses = infrequent
Adds up to a need for a changed paradigm
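A quick check of the resync arithmetic on this slide. The sketch assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a sustained transfer rate; the helper name `transfer_time_s` is ours, not the paper's. Under those assumptions, 1TB at 40MB/s takes 25000s (a little under 7 hours), and at the 10MB/s of a 100BT LAN it takes four times as long.

```python
TB = 10**12  # decimal terabyte (assumption: round numbers as on the slide)
MB = 10**6   # decimal megabyte


def transfer_time_s(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Seconds needed to copy size_bytes at a constant sustained rate."""
    return size_bytes / rate_bytes_per_s


full_resync_40 = transfer_time_s(1 * TB, 40 * MB)   # mirror at 40MB/s
full_resync_lan = transfer_time_s(1 * TB, 10 * MB)  # over a 100BT LAN

print(f"1TB @ 40MB/s: {full_resync_40:.0f} s = {full_resync_40 / 3600:.1f} h")
print(f"1TB @ 10MB/s: {full_resync_lan:.0f} s = {full_resync_lan / 3600:.1f} h")
```

The first line prints 25000 s = 6.9 h and the second 100000 s = 27.8 h, which is why a full resync after every network dropout is impractical.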
5. What's wrong with ordinary RAID1?
Full resync too slow over the net
Net dropouts too frequent, and each one triggers a full resync
Does not expect same disk to be restored
Network glitches are cable errors
Requires on-site administration!
Writes synchronous to both sides - too slow
Reads may be from the slow side
6. RAID vs netRAID
Classical            netRAID
small disks          large disks
physically close     physically dispersed
medium bandwidth     low bandwidth
infrequent dropouts  frequent dropouts
permanent losses     temporary losses
admin on hand        admin off-scene
7. Solutions
Replace drivers
Linux kernel NBD → ENBD
Linux kernel RAID1 → FR1
Replace assumptions
disk failure is permanent → disk failure is temporary
repair by inserting a new disk → repair by reinserting the old one
admin does repair → device repairs itself
cables never fail → cables often fail
8. ENBD
automatic reconnect after network outage
blocks (rather than returning errors) during a temporary outage
redundant channel connectivity
(partitionable)
accelerated: skips writes of blocks already equal on both sides
talks to soft RAID overlay driver
supports remote ioctls and removable devices
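One way to picture the "skip writes equal on both sides" acceleration: before shipping a block, compare a digest of the local data with a digest of the remote copy and send the data only if they differ. This is a toy model, not ENBD's wire protocol; `RemoteDisk`, `accelerated_write`, the 4KB block size and the use of MD5 as the digest are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size


class RemoteDisk:
    """Stand-in for the server side of a network block device (hypothetical)."""

    def __init__(self, nblocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(nblocks)]
        self.bytes_received = 0  # payload bytes actually sent over the "wire"

    def digest(self, n):
        # Assumption: MD5 digest of the stored block, as a cheap equality check.
        return hashlib.md5(self.blocks[n]).digest()

    def write(self, n, data):
        self.bytes_received += len(data)
        self.blocks[n] = data


def accelerated_write(remote, n, data):
    """Send block n only if the remote copy differs; return True if data was sent."""
    if hashlib.md5(data).digest() == remote.digest(n):
        return False  # already equal on both sides: skip the transfer
    remote.write(n, data)
    return True


disk = RemoteDisk(4)
accelerated_write(disk, 0, bytes(BLOCK_SIZE))  # skipped: remote block is identical
accelerated_write(disk, 1, b"x" * BLOCK_SIZE)  # sent: remote block differs
print(disk.bytes_received)
```

The digest comparison still costs a round trip per block, so in this model the saving comes mainly when most blocks already match, as during a resync after a short outage.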
9. FR1
full resync → intelligent partial resync
hot repair (automatic)
asynchronous writes eliminate latency
read from fastest (not there yet)
retain state across reboots (Paul Clements)
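The "intelligent partial resync" and "hot repair" items can be illustrated with a dirty-block bitmap: while the remote half is unreachable, writes still succeed locally and each written block number is marked; when the same disk comes back, only the marked blocks are copied. This is a simplified in-memory model under our own assumptions (class and method names are hypothetical, and a Python int serves as the bitmap), not FR1's code. It resyncs synchronously, ignores writes racing with the resync, and does not persist the bitmap across reboots.

```python
class BitmapMirror:
    """Toy two-way mirror with a dirty-block bitmap (hypothetical model, not FR1)."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.primary = [b""] * nblocks
        self.secondary = [b""] * nblocks
        self.secondary_online = True
        self.bitmap = 0  # bit n set => secondary has missed a write to block n

    def write(self, n, data):
        self.primary[n] = data
        if self.secondary_online:
            self.secondary[n] = data
        else:
            # Outage treated as temporary: mark the block instead of failing the mirror.
            self.bitmap |= 1 << n

    def detach_secondary(self):
        """Network dropout: stop mirroring but keep the component in the array."""
        self.secondary_online = False

    def reattach_secondary(self):
        """Hot repair: copy only blocks whose bits are set, then clear the bitmap."""
        dirty = [n for n in range(self.nblocks) if (self.bitmap >> n) & 1]
        for n in dirty:
            self.secondary[n] = self.primary[n]
        self.bitmap = 0
        self.secondary_online = True
        return dirty


m = BitmapMirror(8)
m.write(0, b"a")
m.detach_secondary()
m.write(3, b"b")
m.write(5, b"c")
m.write(3, b"d")                # rewriting a dirty block adds no resync work
print(m.reattach_secondary())   # [3, 5]
```

The resync cost is proportional to the number of distinct blocks written during the outage, not to the size of the disk.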
13. netRAID1 nuances
With mirrored journal
must preserve write ordering!
immediate takeover - no fsck!
Without mirrored journal
3x faster!
needs fsck
Detecting failure
private or public connectivity test?
12.2.1.3
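Why the mirrored-journal case must preserve write ordering: with asynchronous remote writes the remote copy lags behind, but if writes are shipped strictly in submission order it always holds a prefix of the local write history. A prefix is a state the local disk could itself have been left in by a crash, which the journal is designed to recover from. The FIFO queue below is a minimal sketch of that property under our own assumptions (`AsyncMirrorLink`, `submit` and `flush` are hypothetical names), not FR1's implementation.

```python
from collections import deque


class AsyncMirrorLink:
    """Ships queued remote writes strictly oldest-first (hypothetical model)."""

    def __init__(self):
        self.queue = deque()   # writes acknowledged locally, not yet sent
        self.remote_log = []   # writes applied on the remote side, in order

    def submit(self, block, data):
        # The local write completes at once; the remote copy is only queued.
        self.queue.append((block, data))

    def flush(self, max_writes=None):
        """Send up to max_writes queued writes (all if None); return how many were sent."""
        sent = 0
        while self.queue and (max_writes is None or sent < max_writes):
            self.remote_log.append(self.queue.popleft())
            sent += 1
        return sent


link = AsyncMirrorLink()
history = [(100, b"journal record"), (101, b"commit block"), (7, b"data in place")]
for block, data in history:
    link.submit(block, data)
link.flush(max_writes=2)                 # the link drops after two writes
print(link.remote_log == history[:2])    # True: the remote holds a prefix
```

If the queue were allowed to reorder, for example sending the commit block before the journal record, the remote could hold a commit with no record behind it, and takeover without fsck would no longer be safe.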
14. Summary
Component-based assembly
ENBD - remote network disk
FR1 - Fast RAID
neFS - any file system
easier to parcel out development
more testing
easier to slip individual parts into the kernel
FS agnostic
Work together for replication, failover, recovery
15. Bibliography
● Paul Clements & James E.J. Bottomley. High Availability Data Replication. Proc. Linux Symposium, July 2003, Ottawa, Ontario, Canada.
http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Clements-OLS2003.pdf
● P.T. Breuer et al. The Network Block Device. Linux Journal, issue 73.
http://www2.linuxjournal.com/lj-issues/issue73/3778.html