SlideShare a Scribd company logo
CIT 470: Advanced Network and System Administration Slide #1
CIT 470: Advanced Network and
System Administration
Disks
CIT 470: Advanced Network and System Administration Slide #2
Topics
1. Disk interfaces
2. Disk components
3. Performance
4. Reliability
5. Partitions
6. RAID
7. Failures and backups
CIT 470: Advanced Network and System Administration Slide #3
Volumes
A volume is a chunk of storage as seen by the server.
 A disk.
 A partition.
 A RAID set.
Storage is organized into layers:
Volumes
UNIX System Administration Handbook, 4/e
CIT 470: Advanced Network and System Administration Slide #4
Disk Interfaces
SAS
Serial attached SCSI (Small Computer Systems Interface)
SATA
Serial ATA (Advanced Technology Attachment)
Fibre Channel
High bandwidth (4 Gbps and up)
Can run SCSI or IP over fiber optic network cabling
iSCSI
SCSI over fast (e.g., 10 Gbps) IP network equipment.
USB and FireWire
SATA drives with converter to use via slow USB/FireWire
USB 3.0 at 5 Gbps can use full drive throughput
CIT 470: Advanced Network and System Administration Slide #5
SCSI
Small Computer Systems Interface
Fast, reliable, expensive.
Older versions
Original: SCSI-1 (1979) 5MB/s
Recent: SCSI-3 (2001) 320MB/s
Serial Attached SCSI (SAS)
Up to 64k devices
Up to 6 Gbps throughput, 12 Gbps by 2012
Can use SATA drives via SATA Tunneling Protocol
UNIX System Administration Handbook, 4/e
CIT 470: Advanced Network and System Administration Slide #6
IDE/ATA
Integrated Drive Electronics / AT attachment
Slower, less reliable, cheap.
Only allows 2 devices per interface.
Older versions
Original: IDE / ATA (1986) 26.4 Mbps
Recent: Ultra-ATA/133 1.064 Gbps
Serial ATA
Up to 128 devices.
1.5 Gbps (SATA-1), 3 Gbps (SATA-2),
6 Gbps (SATA-3)
SAS vs. SATA
SAS
• Higher RPM drives
– Up to 15 krpm
• Smaller drive capacity
– Up to 600GB for 15krpm
– Same as SATA for slower
• Higher drive reliability
• Higher drive cost
SATA
• Lower RPM drives
– Most 7200 rpm, some 10krpm
• Higher drive capacity
– Up to 1TB for 10krpm
– Up to 3TB for 7200rpm
• Lower drive reliability
• Lower drive cost
CIT 470: Advanced Network and System Administration Slide #7
CIT 470: Advanced Network and System Administration Slide #8
Hard Drive Components
CIT 470: Advanced Network and System Administration Slide #9
Hard Drive Components
Actuator
Moves arm across disk to read/write data.
Arm has multiple read/write heads (often 2/platter.)
Platters
Rigid substrate material.
Thin coating of magnetic material stores data.
Coating type determines areal density: Gbits/in2
Spindle Motor
Spins platters from 3600-15,000 rpm.
Speed determines disk latency.
Cache
8-64MB of cache memory
Write sequencing and read prefetching
CIT 470: Advanced Network and System Administration Slide #10
Disk Information: hdparm
# hdparm -i /dev/hde
/dev/hde:
Model=WDC WD1200JB-00CRA1, FwRev=17.07W17, SerialNo=WD-WMA8C4533667
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=off
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=no WriteCache=enabled
Drive conforms to: device does not report version:
* signifies the current active mode
CIT 470: Advanced Network and System Administration Slide #11
Disk Performance
Seek Time
Time to move head to desired track (3-8 ms)
Rotational Delay
Time until head over desired block (8ms for 7200)
Latency
Seek Time + Rotational Delay
Throughput
Data transfer rate (30-120 MB/s)
CIT 470: Advanced Network and System Administration Slide #12
Latency vs. Throughput
Which is more important?
Depends on the type of load.
Sequential access – Throughput
Multimedia on a single user PC
Random access – Latency
Most servers
Improving Disk Performance
1. Faster Disks
Upgrade to a 10krpm or 15krpm disk
2. More Caching
Larger caches at OS, controller, or disk level
3. More Spindles (Disks)
Spread disk access across multiple disks
CIT 470: Advanced Network and System Administration Slide #13
Faster Disks: Solid State Drives
• Much lower latency ~ 0.1ms
• Faster throughput ~ 250MB/s
• Much higher cost ~ $1.5-2/GB vs. $0.10/GB
• Flash is slow to erase, so first block write is fast, second
block write is slow. Fast SSDs erase asynchronously.
• Flash base
CIT 470: Advanced Network and System Administration Slide #14
Much Faster Disks: PCIe SSDs
• SSDs saturate 3 Gb/s
(300MB/s) SATA bus
• Solution: Use PCIe
• Fusion-IO ioDrive
– 1424 MB/s read
– 632 MB/s write
• Drawbacks
– Only for local disks
– Unreliable in RAID sets
CIT 470: Advanced Network and System Administration Slide #15
http://www.brentozar.com/
archive/2010/03/fusion-io-
iodrive-review-fusionio/
Disk Caching
Operating System Cache
– Typically gigabytes in size (uses spare RAM.)
– Power cutoff causes cached data to be lost.
– Use sync command to force write to disk.
Disk Controller Cache
– Motherboard controllers typically have no cache.
– Server RAID controllers often have 128MB-1GB write cache.
– Some are battery-backed to preserve data if power fails.
Disk Cache
– All SATA/SAS disks have 8-64MB cache.
– Read prefetching grabs data blocks before requested in anticipation
of sequential reads.
– Write ordering re-orders writes to minimize head movement (only
disk firmware knows drive geometry so higher level caches can’t.)
CIT 470: Advanced Network and System Administration Slide #16
Cache Types
Write-Back
– Write operations finish when data in cache.
– Disk writes happen at a later time (asynchronous.)
– Faster, but if power fails written data disappears.
Write-Through
– Write operations finish when data on disk.
– Disk writes happen at same time (synchronous.)
– Slower, but no data will be lost after write
complete.
CIT 470: Advanced Network and System Administration Slide #17
CIT 470: Advanced Network and System Administration Slide #18
Measuring Disk Performance
# hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 1954 MB in 2.00
seconds = 977.02 MB/sec
Timing buffered disk reads:
268 MB in 3.02 seconds = 88.66 MB/sec
• Select an appropriate measuring tool.
• Choose a time when disk is quiescent
– No background processes making disk accesses
CIT 470: Advanced Network and System Administration Slide #19
Whole Disk Reliability
MTTF/MTBF
Average time to failure.
Range from 1 to 1.5 million hours.
Annual Failure Rate (AFR)
Hours per year = 24 × 365 = 8760
AFR = (Hours/year) / MTTF = 0.88% to 0.58%
How many disks fail per year?
int(Disks × AFR)
ex: int(2000 × 0.58%) = 18 disks/year
CIT 470: Advanced Network and System Administration Slide #20
Bathtub Curve Model
Early phase: high failure rate from defects.
Constant failure rate phase: MTBF valid.
Wearout phase: high failure rate from wear.
CIT 470: Advanced Network and System Administration Slide #21
Whole Disk Reliability
Failures more likely on traumatic events.
Power on/off.
Annual Return Rates (ARR) > maker AFR
2-4% common
Up to 13%
Why is ARR > AFR?
Sysadmin tests harder than manufacturer tests.
Some manufacturers find 43% of returned drives
good by their standards.
SMART
Self-Monitoring, Analysis, and Reporting Tech
– Monitoring system for disk drives.
– Data collected by smartd under Linux.
Provide warnings before drive failure based on
– Increased heat or noise.
– Increased number of transient r/w failures.
SMART categories correlated with disk failure
– First scan error, reallocation errors, probational.
– Enable SMART in BIOS to use then
– Use smartctl -a /dev/sda to see SMART data.
CIT 470: Advanced Network and System Administration Slide #22
Unrecoverable Read Errors
Typical URE rate = 1 in 1016 bits
– Older drives had much higher rates.
Sector error rate = 1 in 2.4 × 1011 sectors
– 1016 bits / (512 bytes/sector × 8 bytes/bit)
Chance to read a sector correctly
– 1 - 1 / 2.4 × 1011 = 99.9999999995833%
Chance to read 1TB correctly
– (1 - 1 / 2.4 × 1011) ^ (1012 bytes / 512 bytes/sector)
– 99.2%
CIT 470: Advanced Network and System Administration Slide #23
CIT 470: Advanced Network and System Administration Slide #24
Partitions and the MBR
4 primary partitions.
One can be used as an
extended partition, which
is a link to an Extended
boot record on the 1st
sector of that partition.
Each logical partition is
described by its own EBR,
which links to the next
EBR.
CIT 470: Advanced Network and System Administration Slide #25
Extended Partitions and EBRs
There is only one extended partition.
– It is one of the primary partitions.
– It contains one or more logical partitions.
– EBRs describe the logical partitions.
– It should contain all disk space not used by the
other primary partitions.
EBRs contain two entries.
– The first entry describes a logical partition.
– The second entry points to the next EBR if there
are more logical partitions after the current one.
GUID Partition Table
• Replacement for the MBR
partition table
– Allows partitions larger than
2TB, unlike MBR.
– All addresses in LBA, not CHS
with various LBA hacks.
– Protective MBR allows
backwards compatibility.
• Used by Intel Macs.
• Can be used by Linux, BSD,
64-bit Windows.
CIT 470: Advanced Network and System Administration Slide #26
CIT 470: Advanced Network and System Administration Slide #27
Why Partition?
1. Separate OS from user files, to allow user
backups + OS upgrades w/o problems.
2. Have a faster swap area for virtual memory.
3. Improve performance by keeping filesystem
tables small and keeping frequently used together
files close together on the disk.
4. Limit the effect of disk full issues, often caused
by log or cache files.
5. Multi-boot systems with multiple OSes.
RAID
Redundant Array of Independent Disks
RAID capabilities:
– Improve capacity by using multiple disks
– Improve performance with striping
– Improve reliability with redundancy
• Mirroring
• Parity
CIT 470: Advanced Network and System Administration Slide #28
CIT 470: Advanced Network and System Administration Slide #29
RAID Levels
Level Min Description
JBOD 2 Merge disks for capacity, no striping.
RAID 0 2 Striped for performance + capacity.
RAID 1 2 Mirrored for fault tolerance.
RAID 3 3 Striped set with dedicated parity disk.
RAID 4 3 Block instead of byte level striping.
RAID 5 3 Striped set with distributed parity.
RAID 6 4 Striped set with dual distributed parity.
CIT 470: Advanced Network and System Administration Slide #30
Striping
• Distribute data across multiple disks.
• Improve I/O by accessing disks in parallel.
– Independent requests can be serviced in parallel by
separate disks.
– Single multi-block requests can be serviced by multiple
disks.
• Performance vs. reliability
– Performance increases with # disks.
– Reliability decreases with # disks.
CIT 470: Advanced Network and System Administration Slide #31
Parity
Store extra bit with each chunk of data.
7-bit data even parity odd parity
0000000 00000000 10000000
1011011 11011011 01011011
1100110 01100110 11100110
1111111 11111111 01111111
Odd parity
 add 0 if # of 1s is odd
 add 1 if # of 1s is even
Even parity
 add 0 if # of 1s is even
 add 1 if # of 1s is odd
CIT 470: Advanced Network and System Administration Slide #32
Error Detection with Parity
Even: every byte must have even # of 1s.
What if you read a byte with an odd # of 1s?
– It’s an error.
– An odd # of bits were flipped.
What if read a byte with an even # of 1s?
– It may be correct.
– It may be an error where an even # of bits are bad.
CIT 470: Advanced Network and System Administration Slide #33
Error Correction
XOR each block to get parity information.
XOR with parity block to retrieve missing block on bad drive.
CIT 470: Advanced Network and System Administration Slide #34
RAID 0: Striping, no Parity
Performance
Throughput = n × disk speed
Reliability
 Lower reliability.
 If one disk lost, entire set is lost.
 MTBF = (avg MTBF)/# disks
Capacity
n × disk size
CIT 470: Advanced Network and System Administration Slide #35
RAID 1: Disk Mirroring
Performance
– Reads are faster since read operations will
return after first read is complete.
– Writes are slower because write operations
return after second write is complete.
Reliability
– System continues to work after one disk dies.
– Doesn’t protect against disk or controller
failure that corrupts data instead of killing
disk.
– Doesn’t protect against human or software
error.
Capacity
– n/2 × disk size
CIT 470: Advanced Network and System Administration Slide #36
RAID 3: Striping + Dedicated Parity
Reliability
Survive failure of any 1 disk.
Performance
 Striping increases performance,
but
 Parity disk must be accessed on
every write.
 Parity calculation decreases write
performance.
 Good for sequential reads (large
graphics + video files.)
Capacity
(n-1) × disk size
CIT 470: Advanced Network and System Administration Slide #37
RAID 4: Stripe + Block Parity Disk
• Identical to RAID 3
except uses block
striping instead of
byte striping.
CIT 470: Advanced Network and System Administration Slide #38
RAID 5: Stripe + Distributed Parity
Reliability
Survive failure of any 1 disk.
Performance
 Fast reads (RAID 0), but
slow writes.
 Like RAID 4 but without
bottleneck of a single parity
disk.
 Still have to read blocks +
write parity block if alter
any data blocks.
Capacity
(n-1) × disk size
CIT 470: Advanced Network and System Administration Slide #39
RAID 6: Striped with Dual Parity
• Like RAID 5 but with two parity blocks.
• Can survive failure of two drives at once.
CIT 470: Advanced Network and System Administration Slide #40
Nested RAID Levels
Many RAID systems can use both
– Physical drives.
– RAID sets.
as RAID volumes:
– Allows admins to combine advantages of levels.
– Nested levels named by combination of levels,
e.g. RAID 01 or RAID 0+1
CIT 470: Advanced Network and System Administration Slide #41
RAID 01 (0+1)
Mirror of stripes.
If disk fails in RAID 0
array, can be tolerated by
using disk from other
RAID 0.
Cannot tolerate 2 disk
failures unless both from
same stripe.
CIT 470: Advanced Network and System Administration Slide #42
RAID 10 (1+0)
Stripe of mirrors.
Can tolerate all but one drive
can failing from each RAID
1 set.
Uses more disk space than
RAID 5 but provides higher
performance.
Highest capacity,
performance, and cost.
CIT 470: Advanced Network and System Administration Slide #43
RAID 51 (5+1)
Mirror of RAID 5s
Capacity = (n/2-1) ×
disk size
Min disks: 6
CIT 470: Advanced Network and System Administration Slide #44
RAID Failures
RAID sets still work after single disk failure
 Except RAID 0 and 6
 Operate in degraded mode
RAID set rebuilds after bad disk replaced
 Can take hours to rebuild parity/mirror data.
 Some hardware allows hot swapping, so server
doesn’t have to be rebooted to replace disk.
 Some hardware supports a hot spare disk that
will be used immediately on disk failure for
rebuild.
RAID 5 Write Hole
RAID 5 does incremental parity updates
– Read of old parity block
– Read of old block to be replaced
– Write of new data to old block
– Write of new parity block
Blocks can become out of sync with parity since
– You cannot write to two drives atomically
If a block goes bad, RAID 5 will not notice
– Until a drive fails and you rebuild the array
– Receiving garbage in place of that block.
CIT 470: Advanced Network and System Administration Slide #45
Hardware vs. Software RAID
Hardware RAID
– Dedicated storage processor, possibly cache.
– OS sees RAID volume as single hard disk.
– Easy to boot or rebuild after drive failure.
– Hardware RAID sets will not work on other controllers.
– SNIA Raid Disk Data Format (DDF) std may fix this in future.
Software RAID
– RAID controlled by operating system; will use 1-10% of CPU.
– Must configure OS to boot off either drive of mirror so system can boot if a
drive fails.
– Software RAID sets will work on any controller.
Fake RAID
– Hardware RAID without a dedicated storage processor; uses 1-10% CPU.
– Cheap, supports limited RAID types, may come with motherboard.
– Fake RAID sets will not work on other controllers.
CIT 470: Advanced Network and System Administration Slide #46
UREs and RAID
Ex: RAID 5 of 8 1TB disks (7TB storage)
– What happens if a disk fails and is replaced?
– RAID5 rebuild reads 7TB to reconstruct 1TB
Probability to read a sector correctly
– 1 - 1 / 2.4 × 1011 = 99.9999999995833%
Probability to rebuild RAID 5 array:
– (1 - 1 / 2.4 × 1011) ^ (7 × 1012 bytes / 512 bytes/sector)
– 94.5%
We need RAID 6 today.
CIT 470: Advanced Network and System Administration Slide #47
CIT 470: Advanced Network and System Administration Slide #48
You still need backups
Human and software errors
– RAID won’t protect you from rm –rf / or copying
over the wrong file.
System crash
– Crashes can interrupt write operations, leading to
situation where data is updated but parity not.
Correlated disk failures
– Accidents (power failures, dropping the machine) can
impact all disks at once.
– Disks bought at same time often fail at same time.
Hardware data corruption
– If a disk controller writes bad data, all disks will have the
bad data.
CIT 470: Advanced Network and System Administration Slide #49
References
1. Charles M. Kozierok, “Reference Guide—Hard Disk Drives,”
http://www.pcguide.com/ref/hdd/, 2005.
2. Adam Leventhal. Triple-Parity RAID and Beyond. In:
Communications of the ACM, Vol 53, No. 1. Jan 2010.
3. Evi Nemeth et al, UNIX System Administration Handbook, 3rd
edition, Prentice Hall, 2001.
4. Octane, “SCSI Technology Primer,”
http://arstechnica.com/paedia/s/scsi-1.html, 2002.
5. Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz Andre Barroso.
Failure Trends in a Large Disk Drive Populartion. In: FAST '07: 5th
USENIX Conference on File and Storage Technologies. 2007.
6. Bianca Schroder and Garth A. Gibson. Disk failures in the real
world: What does an MTTF of 1,000,000 hours mean? In: FAST '07:
5th USENIX Conference on File and Storage Technologies. 2007.
7. Wikipedia, RAID, http://en.wikipedia.org/wiki/RAID, 2010.

More Related Content

Similar to Disks.pptx

High Performance Computing for LiDAR Data Production
High Performance Computing for LiDAR Data ProductionHigh Performance Computing for LiDAR Data Production
High Performance Computing for LiDAR Data Production
MattBethel1
 
Hard disk PPT
Hard disk PPTHard disk PPT
Hard disk PPT
George Ranson
 
Chap2 5e u v2 - theory
Chap2 5e u v2 - theoryChap2 5e u v2 - theory
Chap2 5e u v2 - theorydd25251
 
Hpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server DatasheetHpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server Datasheet
美兰 曾
 
Unit3 ppt3 hard drive
Unit3 ppt3 hard driveUnit3 ppt3 hard drive
Unit3 ppt3 hard drive
FarhanMalik93
 
Introduction to Storage technologies
Introduction to Storage technologiesIntroduction to Storage technologies
Introduction to Storage technologies
Kaivalya Shah
 
04.01 file organization
04.01 file organization04.01 file organization
04.01 file organization
Bishal Ghimire
 
Hard Disk Componets
Hard Disk ComponetsHard Disk Componets
Hard Disk Componets
Pramod Ithape
 
Thiru
ThiruThiru
Computer Hardware Basics (Components to be understand)
Computer Hardware Basics (Components to be understand)Computer Hardware Basics (Components to be understand)
Computer Hardware Basics (Components to be understand)
Digvijay Singh Karakoti
 
Milestone Server And Storage Best Practice
Milestone   Server And Storage Best PracticeMilestone   Server And Storage Best Practice
Milestone Server And Storage Best Practicehypknight
 
505 kobal exadata
505 kobal exadata505 kobal exadata
505 kobal exadata
Kam Chan
 
SUN主机产品介绍.ppt
SUN主机产品介绍.pptSUN主机产品介绍.ppt
SUN主机产品介绍.ppt
PencilData
 
Dmx3 950-technical specifications
Dmx3 950-technical specificationsDmx3 950-technical specifications
Dmx3 950-technical specifications
Raghul P
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Odinot Stanislas
 
Storage Managment
Storage ManagmentStorage Managment
Storage Managment
Kasun Rathnayaka
 

Similar to Disks.pptx (20)

High Performance Computing for LiDAR Data Production
High Performance Computing for LiDAR Data ProductionHigh Performance Computing for LiDAR Data Production
High Performance Computing for LiDAR Data Production
 
Hard disk PPT
Hard disk PPTHard disk PPT
Hard disk PPT
 
HARD DISK DRIVE ppt
HARD DISK DRIVE pptHARD DISK DRIVE ppt
HARD DISK DRIVE ppt
 
P1 Unit 3
P1 Unit 3 P1 Unit 3
P1 Unit 3
 
Chap2 5e u v2 - theory
Chap2 5e u v2 - theoryChap2 5e u v2 - theory
Chap2 5e u v2 - theory
 
Hpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server DatasheetHpe Proliant DL325 Gen10 Server Datasheet
Hpe Proliant DL325 Gen10 Server Datasheet
 
Unit3 ppt3 hard drive
Unit3 ppt3 hard driveUnit3 ppt3 hard drive
Unit3 ppt3 hard drive
 
Introduction to Storage technologies
Introduction to Storage technologiesIntroduction to Storage technologies
Introduction to Storage technologies
 
Computer hardware ppt1
Computer hardware ppt1Computer hardware ppt1
Computer hardware ppt1
 
04.01 file organization
04.01 file organization04.01 file organization
04.01 file organization
 
Hard Disk Componets
Hard Disk ComponetsHard Disk Componets
Hard Disk Componets
 
Thiru
ThiruThiru
Thiru
 
Computer Hardware Basics (Components to be understand)
Computer Hardware Basics (Components to be understand)Computer Hardware Basics (Components to be understand)
Computer Hardware Basics (Components to be understand)
 
Milestone Server And Storage Best Practice
Milestone   Server And Storage Best PracticeMilestone   Server And Storage Best Practice
Milestone Server And Storage Best Practice
 
Technical
TechnicalTechnical
Technical
 
505 kobal exadata
505 kobal exadata505 kobal exadata
505 kobal exadata
 
SUN主机产品介绍.ppt
SUN主机产品介绍.pptSUN主机产品介绍.ppt
SUN主机产品介绍.ppt
 
Dmx3 950-technical specifications
Dmx3 950-technical specificationsDmx3 950-technical specifications
Dmx3 950-technical specifications
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
 
Storage Managment
Storage ManagmentStorage Managment
Storage Managment
 

More from hoangdinhhanh88

linux-lecture3.ppt
linux-lecture3.pptlinux-lecture3.ppt
linux-lecture3.ppt
hoangdinhhanh88
 
Chapter 9 TCP IP Reference Model.ppt
Chapter 9 TCP IP Reference Model.pptChapter 9 TCP IP Reference Model.ppt
Chapter 9 TCP IP Reference Model.ppt
hoangdinhhanh88
 
RemoteAdmin.pptx
RemoteAdmin.pptxRemoteAdmin.pptx
RemoteAdmin.pptx
hoangdinhhanh88
 
7_Chapter 7_Email.pptx
7_Chapter 7_Email.pptx7_Chapter 7_Email.pptx
7_Chapter 7_Email.pptx
hoangdinhhanh88
 
3_CHAP~2.PPT
3_CHAP~2.PPT3_CHAP~2.PPT
3_CHAP~2.PPT
hoangdinhhanh88
 
2_Chapter 2_DNS.pptx
2_Chapter 2_DNS.pptx2_Chapter 2_DNS.pptx
2_Chapter 2_DNS.pptx
hoangdinhhanh88
 
1.khai niem can ban
1.khai niem can ban1.khai niem can ban
1.khai niem can ban
hoangdinhhanh88
 
1 giới thiệu-cài đặt oracle
1 giới thiệu-cài đặt oracle1 giới thiệu-cài đặt oracle
1 giới thiệu-cài đặt oracle
hoangdinhhanh88
 
2 co ban ve sql
2 co ban ve sql2 co ban ve sql
2 co ban ve sql
hoangdinhhanh88
 

More from hoangdinhhanh88 (10)

linux-lecture3.ppt
linux-lecture3.pptlinux-lecture3.ppt
linux-lecture3.ppt
 
Chapter 9 TCP IP Reference Model.ppt
Chapter 9 TCP IP Reference Model.pptChapter 9 TCP IP Reference Model.ppt
Chapter 9 TCP IP Reference Model.ppt
 
RemoteAdmin.pptx
RemoteAdmin.pptxRemoteAdmin.pptx
RemoteAdmin.pptx
 
7_Chapter 7_Email.pptx
7_Chapter 7_Email.pptx7_Chapter 7_Email.pptx
7_Chapter 7_Email.pptx
 
3_CHAP~2.PPT
3_CHAP~2.PPT3_CHAP~2.PPT
3_CHAP~2.PPT
 
2_Chapter 2_DNS.pptx
2_Chapter 2_DNS.pptx2_Chapter 2_DNS.pptx
2_Chapter 2_DNS.pptx
 
1.khai niem can ban
1.khai niem can ban1.khai niem can ban
1.khai niem can ban
 
Dns
DnsDns
Dns
 
1 giới thiệu-cài đặt oracle
1 giới thiệu-cài đặt oracle1 giới thiệu-cài đặt oracle
1 giới thiệu-cài đặt oracle
 
2 co ban ve sql
2 co ban ve sql2 co ban ve sql
2 co ban ve sql
 

Recently uploaded

J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
PrashantGoswami42
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 

Recently uploaded (20)

J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 

Disks.pptx

  • 1. CIT 470: Advanced Network and System Administration Slide #1 CIT 470: Advanced Network and System Administration Disks
  • 2. CIT 470: Advanced Network and System Administration Slide #2 Topics 1. Disk interfaces 2. Disk components 3. Performance 4. Reliability 5. Partitions 6. RAID 7. Failures and backups
  • 3. CIT 470: Advanced Network and System Administration Slide #3 Volumes A volume is a chunk of storage as seen by the server.  A disk.  A partition.  A RAID set. Storage is organized into layers: Volumes UNIX System Administration Handbook, 4/e
  • 4. CIT 470: Advanced Network and System Administration Slide #4 Disk Interfaces SAS Serial attached SCSI (Small Computer Systems Interface) SATA Serial ATA (Advanced Technology Attachment) Fibre Channel High bandwidth (4 Gbps and up) Can run SCSI or IP over fiber optic network cabling iSCSI SCSI over fast (e.g., 10 Gbps) IP network equipment. USB and FireWire SATA drives with converter to use via slow USB/FireWire USB 3.0 at 5 Gbps can use full drive throughput
  • 5. CIT 470: Advanced Network and System Administration Slide #5 SCSI Small Computer Systems Interface Fast, reliable, expensive. Older versions Original: SCSI-1 (1979) 5MB/s Recent: SCSI-3 (2001) 320MB/s Serial Attached SCSI (SAS) Up to 64k devices Up to 6 Gbps throughput, 12 Gbps by 2012 Can use SATA drives via SATA Tunneling Protocol UNIX System Administration Handbook, 4/e
  • 6. CIT 470: Advanced Network and System Administration Slide #6 IDE/ATA Integrated Drive Electronics / AT attachment Slower, less reliable, cheap. Only allows 2 devices per interface. Older versions Original: IDE / ATA (1986) 26.4 Mbps Recent: Ultra-ATA/133 1.064 Gbps Serial ATA Up to 128 devices. 1.5 Gbps (SATA-1), 3 Gbps (SATA-2), 6 Gbps (SATA-3)
  • 7. SAS vs. SATA SAS • Higher RPM drives – Up to 15 krpm • Smaller drive capacity – Up to 600GB for 15krpm – Same as SATA for slower • Higher drive reliability • Higher drive cost SATA • Lower RPM drives – Most 7200 rpm, some 10krpm • Higher drive capacity – Up to 1TB for 10krpm – Up to 3TB for 7200rpm • Lower drive reliability • Lower drive cost CIT 470: Advanced Network and System Administration Slide #7
  • 8. CIT 470: Advanced Network and System Administration Slide #8 Hard Drive Components
  • 9. CIT 470: Advanced Network and System Administration Slide #9 Hard Drive Components Actuator Moves arm across disk to read/write data. Arm has multiple read/write heads (often 2/platter.) Platters Rigid substrate material. Thin coating of magnetic material stores data. Coating type determines areal density: Gbits/in2 Spindle Motor Spins platters from 3600-15,000 rpm. Speed determines disk latency. Cache 8-64MB of cache memory Write sequencing and read prefetching
  • 10. CIT 470: Advanced Network and System Administration Slide #10 Disk Information: hdparm # hdparm -i /dev/hde /dev/hde: Model=WDC WD1200JB-00CRA1, FwRev=17.07W17, SerialNo=WD-WMA8C4533667 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq } RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=off CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 AdvancedPM=no WriteCache=enabled Drive conforms to: device does not report version: * signifies the current active mode
  • 11. CIT 470: Advanced Network and System Administration Slide #11 Disk Performance Seek Time Time to move head to desired track (3-8 ms) Rotational Delay Time until head over desired block (8ms for 7200) Latency Seek Time + Rotational Delay Throughput Data transfer rate (30-120 MB/s)
  • 12. CIT 470: Advanced Network and System Administration Slide #12 Latency vs. Throughput Which is more important? Depends on the type of load. Sequential access – Throughput Multimedia on a single user PC Random access – Latency Most servers
  • 13. Improving Disk Performance 1. Faster Disks Upgrade to a 10krpm or 15krpm disk 2. More Caching Larger caches at OS, controller, or disk level 3. More Spindles (Disks) Spread disk access across multiple disks CIT 470: Advanced Network and System Administration Slide #13
  • 14. Faster Disks: Solid State Drives • Much lower latency ~ 0.1ms • Faster throughput ~ 250MB/s • Much higher cost ~ $1.5-2/GB vs. $0.10/GB • Flash is slow to erase, so first block write is fast, second block write is slow. Fast SSDs erase asynchronously. • Flash base CIT 470: Advanced Network and System Administration Slide #14
  • 15. Much Faster Disks: PCIe SSDs • SSDs saturate 3 Gb/s (300MB/s) SATA bus • Solution: Use PCIe • Fusion-IO ioDrive – 1424 MB/s read – 632 MB/s write • Drawbacks – Only for local disks – Unreliable in RAID sets CIT 470: Advanced Network and System Administration Slide #15 http://www.brentozar.com/ archive/2010/03/fusion-io- iodrive-review-fusionio/
  • 16. Disk Caching Operating System Cache – Typically gigabytes in size (uses spare RAM.) – Power cutoff causes cached data to be lost. – Use sync command to force write to disk. Disk Controller Cache – Motherboard controllers typically have no cache. – Server RAID controllers often have 128MB-1GB write cache. – Some are battery-backed to preserve data if power fails. Disk Cache – All SATA/SAS disks have 8-64MB cache. – Read prefetching grabs data blocks before requested in anticipation of sequential reads. – Write ordering re-orders writes to minimize head movement (only disk firmware knows drive geometry so higher level caches can’t.) CIT 470: Advanced Network and System Administration Slide #16
  • 17. Cache Types Write-Back – Write operations finish when data in cache. – Disk writes happen at a later time (asynchronous.) – Faster, but if power fails written data disappears. Write-Through – Write operations finish when data on disk. – Disk writes happen at same time (synchronous.) – Slower, but no data will be lost after write complete. CIT 470: Advanced Network and System Administration Slide #17
  • 18. CIT 470: Advanced Network and System Administration Slide #18 Measuring Disk Performance # hdparm -tT /dev/sda /dev/sda: Timing cached reads: 1954 MB in 2.00 seconds = 977.02 MB/sec Timing buffered disk reads: 268 MB in 3.02 seconds = 88.66 MB/sec • Select an appropriate measuring tool. • Choose a time when disk is quiescent – No background processes making disk accesses
  • 19. CIT 470: Advanced Network and System Administration Slide #19 Whole Disk Reliability MTTF/MTBF Average time to failure. Range from 1 to 1.5 million hours. Annual Failure Rate (AFR) Hours per year = 24 × 365 = 8760 AFR = (Hours/year) / MTTF = 0.88% to 0.58% How many disks fail per year? int(Disks × AFR) ex: int(2000 × 0.58%) = 18 disks/year
  • 20. CIT 470: Advanced Network and System Administration Slide #20 Bathtub Curve Model Early phase: high failure rate from defects. Constant failure rate phase: MTBF valid. Wearout phase: high failure rate from wear.
  • 21. CIT 470: Advanced Network and System Administration Slide #21 Whole Disk Reliability Failures more likely on traumatic events. Power on/off. Annual Return Rates (ARR) > maker AFR 2-4% common Up to 13% Why is ARR > AFR? Sysadmin tests harder than manufacturer tests. Some manufacturers find 43% of returned drives good by their standards.
  • 22. SMART Self-Monitoring, Analysis, and Reporting Tech – Monitoring system for disk drives. – Data collected by smartd under Linux. Provide warnings before drive failure based on – Increased heat or noise. – Increased number of transient r/w failures. SMART categories correlated with disk failure – First scan error, reallocation errors, probational. – Enable SMART in BIOS to use then – Use smartctl -a /dev/sda to see SMART data. CIT 470: Advanced Network and System Administration Slide #22
  • 23. Unrecoverable Read Errors Typical URE rate = 1 in 1016 bits – Older drives had much higher rates. Sector error rate = 1 in 2.4 × 1011 sectors – 1016 bits / (512 bytes/sector × 8 bytes/bit) Chance to read a sector correctly – 1 - 1 / 2.4 × 1011 = 99.9999999995833% Chance to read 1TB correctly – (1 - 1 / 2.4 × 1011) ^ (1012 bytes / 512 bytes/sector) – 99.2% CIT 470: Advanced Network and System Administration Slide #23
  • 24. CIT 470: Advanced Network and System Administration Slide #24 Partitions and the MBR 4 primary partitions. One can be used as an extended partition, which is a link to an Extended boot record on the 1st sector of that partition. Each logical partition is described by its own EBR, which links to the next EBR.
  • 25. CIT 470: Advanced Network and System Administration Slide #25 Extended Partitions and EBRs There is only one extended partition. – It is one of the primary partitions. – It contains one or more logical partitions. – EBRs describe the logical partitions. – It should contain all disk space not used by the other primary partitions. EBRs contain two entries. – The first entry describes a logical partition. – The second entry points to the next EBR if there are more logical partitions after the current one.
  • 26. GUID Partition Table • Replacement for the MBR partition table – Allows partitions larger than 2TB, unlike MBR. – All addresses in LBA, not CHS with various LBA hacks. – Protective MBR allows backwards compatibility. • Used by Intel Macs. • Can be used by Linux, BSD, 64-bit Windows. CIT 470: Advanced Network and System Administration Slide #26
  • 27. CIT 470: Advanced Network and System Administration Slide #27 Why Partition? 1. Separate OS from user files, to allow user backups + OS upgrades w/o problems. 2. Have a faster swap area for virtual memory. 3. Improve performance by keeping filesystem tables small and keeping frequently used together files close together on the disk. 4. Limit the effect of disk full issues, often caused by log or cache files. 5. Multi-boot systems with multiple OSes.
  • 28. RAID Redundant Array of Independent Disks RAID capabilities: – Improve capacity by using multiple disks – Improve performance with striping – Improve reliability with redundancy • Mirroring • Parity CIT 470: Advanced Network and System Administration Slide #28
  • 29. CIT 470: Advanced Network and System Administration Slide #29 RAID Levels Level Min Description JBOD 2 Merge disks for capacity, no striping. RAID 0 2 Striped for performance + capacity. RAID 1 2 Mirrored for fault tolerance. RAID 3 3 Striped set with dedicated parity disk. RAID 4 3 Block instead of byte level striping. RAID 5 3 Striped set with distributed parity. RAID 6 4 Striped set with dual distributed parity.
  • 30. CIT 470: Advanced Network and System Administration Slide #30 Striping • Distribute data across multiple disks. • Improve I/O by accessing disks in parallel. – Independent requests can be serviced in parallel by separate disks. – Single multi-block requests can be serviced by multiple disks. • Performance vs. reliability – Performance increases with # disks. – Reliability decreases with # disks.
  • 31. CIT 470: Advanced Network and System Administration Slide #31 Parity Store extra bit with each chunk of data. 7-bit data even parity odd parity 0000000 00000000 10000000 1011011 11011011 01011011 1100110 01100110 11100110 1111111 11111111 01111111 Odd parity  add 0 if # of 1s is odd  add 1 if # of 1s is even Even parity  add 0 if # of 1s is even  add 1 if # of 1s is odd
  • 32. CIT 470: Advanced Network and System Administration Slide #32 Error Detection with Parity Even: every byte must have even # of 1s. What if you read a byte with an odd # of 1s? – It’s an error. – An odd # of bits were flipped. What if read a byte with an even # of 1s? – It may be correct. – It may be an error where an even # of bits are bad.
  • 33. CIT 470: Advanced Network and System Administration Slide #33 Error Correction XOR each block to get parity information. XOR with parity block to retrieve missing block on bad drive.
  • 34. CIT 470: Advanced Network and System Administration Slide #34 RAID 0: Striping, no Parity Performance Throughput = n × disk speed Reliability  Lower reliability.  If one disk lost, entire set is lost.  MTBF = (avg MTBF)/# disks Capacity n × disk size
  • 35. CIT 470: Advanced Network and System Administration Slide #35 RAID 1: Disk Mirroring Performance – Reads are faster since read operations will return after first read is complete. – Writes are slower because write operations return after second write is complete. Reliability – System continues to work after one disk dies. – Doesn’t protect against disk or controller failure that corrupts data instead of killing disk. – Doesn’t protect against human or software error. Capacity – n/2 × disk size
  • 36. CIT 470: Advanced Network and System Administration Slide #36 RAID 3: Striping + Dedicated Parity Reliability Survive failure of any 1 disk. Performance  Striping increases performance, but  Parity disk must be accessed on every write.  Parity calculation decreases write performance.  Good for sequential reads (large graphics + video files.) Capacity (n-1) × disk size
  • 37. CIT 470: Advanced Network and System Administration Slide #37 RAID 4: Stripe + Block Parity Disk • Identical to RAID 3 except uses block striping instead of byte striping.
  • 38. CIT 470: Advanced Network and System Administration Slide #38 RAID 5: Stripe + Distributed Parity Reliability Survive failure of any 1 disk. Performance  Fast reads (RAID 0), but slow writes.  Like RAID 4 but without bottleneck of a single parity disk.  Still have to read blocks + write parity block if alter any data blocks. Capacity (n-1) × disk size
  • 39. CIT 470: Advanced Network and System Administration Slide #39 RAID 6: Striped with Dual Parity • Like RAID 5 but with two parity blocks. • Can survive failure of two drives at once.
  • 40. CIT 470: Advanced Network and System Administration Slide #40 Nested RAID Levels Many RAID systems can use both – Physical drives. – RAID sets. as RAID volumes: – Allows admins to combine advantages of levels. – Nested levels named by combination of levels, e.g. RAID 01 or RAID 0+1
  • 41. CIT 470: Advanced Network and System Administration Slide #41 RAID 01 (0+1) Mirror of stripes. If disk fails in RAID 0 array, can be tolerated by using disk from other RAID 0. Cannot tolerate 2 disk failures unless both from same stripe.
  • 42. CIT 470: Advanced Network and System Administration Slide #42 RAID 10 (1+0) Stripe of mirrors. Can tolerate all but one drive can failing from each RAID 1 set. Uses more disk space than RAID 5 but provides higher performance. Highest capacity, performance, and cost.
  • 43. CIT 470: Advanced Network and System Administration Slide #43 RAID 51 (5+1) Mirror of RAID 5s Capacity = (n/2-1) × disk size Min disks: 6
  • 44. CIT 470: Advanced Network and System Administration Slide #44 RAID Failures RAID sets still work after single disk failure  Except RAID 0 and 6  Operate in degraded mode RAID set rebuilds after bad disk replaced  Can take hours to rebuild parity/mirror data.  Some hardware allows hot swapping, so server doesn’t have to be rebooted to replace disk.  Some hardware supports a hot spare disk that will be used immediately on disk failure for rebuild.
  • 45. RAID 5 Write Hole RAID 5 does incremental parity updates – Read of old parity block – Read of old block to be replaced – Write of new data to old block – Write of new parity block Blocks can become out of sync with parity since – You cannot write to two drives atomically If a block goes bad, RAID 5 will not notice – Until a drive fails and you rebuild the array – Receiving garbage in place of that block. CIT 470: Advanced Network and System Administration Slide #45
  • 46. Hardware vs. Software RAID Hardware RAID – Dedicated storage processor, possibly cache. – OS sees RAID volume as single hard disk. – Easy to boot or rebuild after drive failure. – Hardware RAID sets will not work on other controllers. – SNIA Raid Disk Data Format (DDF) std may fix this in future. Software RAID – RAID controlled by operating system; will use 1-10% of CPU. – Must configure OS to boot off either drive of mirror so system can boot if a drive fails. – Software RAID sets will work on any controller. Fake RAID – Hardware RAID without a dedicated storage processor; uses 1-10% CPU. – Cheap, supports limited RAID types, may come with motherboard. – Fake RAID sets will not work on other controllers. CIT 470: Advanced Network and System Administration Slide #46
  • 47. UREs and RAID Ex: RAID 5 of 8 1TB disks (7TB storage) – What happens if a disk fails and is replaced? – RAID5 rebuild reads 7TB to reconstruct 1TB Probability to read a sector correctly – 1 - 1 / 2.4 × 1011 = 99.9999999995833% Probability to rebuild RAID 5 array: – (1 - 1 / 2.4 × 1011) ^ (7 × 1012 bytes / 512 bytes/sector) – 94.5% We need RAID 6 today. CIT 470: Advanced Network and System Administration Slide #47
  • 48. CIT 470: Advanced Network and System Administration Slide #48 You still need backups Human and software errors – RAID won’t protect you from rm –rf / or copying over the wrong file. System crash – Crashes can interrupt write operations, leading to situation where data is updated but parity not. Correlated disk failures – Accidents (power failures, dropping the machine) can impact all disks at once. – Disks bought at same time often fail at same time. Hardware data corruption – If a disk controller writes bad data, all disks will have the bad data.
  • 49. CIT 470: Advanced Network and System Administration Slide #49 References 1. Charles M. Kozierok, “Reference Guide—Hard Disk Drives,” http://www.pcguide.com/ref/hdd/, 2005. 2. Adam Leventhal. Triple-Parity RAID and Beyond. In: Communications of the ACM, Vol 53, No. 1. Jan 2010. 3. Evi Nemeth et al, UNIX System Administration Handbook, 3rd edition, Prentice Hall, 2001. 4. Octane, “SCSI Technology Primer,” http://arstechnica.com/paedia/s/scsi-1.html, 2002. 5. Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz Andre Barroso. Failure Trends in a Large Disk Drive Populartion. In: FAST '07: 5th USENIX Conference on File and Storage Technologies. 2007. 6. Bianca Schroder and Garth A. Gibson. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean? In: FAST '07: 5th USENIX Conference on File and Storage Technologies. 2007. 7. Wikipedia, RAID, http://en.wikipedia.org/wiki/RAID, 2010.