0
ECE 6160: Advanced Computer Networks Introduction to Storage Devices Instructor: Dr. Xubin (Ben) He Email:  [email_address...
Prev… <ul><li>Gigabit Ethernet </li></ul><ul><ul><li>MAC </li></ul></ul><ul><ul><li>PHY </li></ul></ul><ul><li>VLAN </li><...
Anatomy: Key components for a typical computer
Motivation: Who Cares About I/O? <ul><li>CPU Performance: 60% per year </li></ul><ul><li>I/O system performance limited by...
I/O Systems Processor Cache Memory - I/O Bus Main Memory I/O Controller Disk Disk I/O Controller I/O Controller Graphics N...
Storage Technology Drivers <ul><li>Driven by the prevailing computing paradigm </li></ul><ul><ul><li>1950s: migration from...
Outline <ul><li>Disk Basics/Disk History </li></ul><ul><li>Current Disk options </li></ul><ul><li>Disk fallacies and perfo...
Disk Device Terminology <ul><li>Several  platters , with information recorded magnetically on both  surfaces  (usually) </...
Tracks, Sectors, and Cylinders A  cylinder is comprised of the set of tracks described by all the heads (on separate platt...
Photo of Disk Head, Arm, Actuator Actuator Arm Head Spindle source: www.seagate.com Platters (12) {
Disk Device Performance <ul><li>Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead </li></ul><...
Disk Device Performance <ul><li>Average rotation time: </li></ul><ul><ul><li>Average distance sector from head? </li></ul>...
Data Rate: Inner vs. Outer Tracks  <ul><li>To keep things simple, orginally kept same number of sectors per track </li></u...
Disk Performance Model /Trends <ul><li>Capacity </li></ul><ul><ul><li>+ 100%/year (2X / 1.0 yrs) </li></ul></ul><ul><li>Tr...
State of the Art: Barracuda 180 <ul><ul><li>181.6 GB, 3.5 inch disk </li></ul></ul><ul><ul><li>12 platters, 24 surfaces </...
Disk Performance Example <ul><li>Calculate time to read 64 KB (128 sectors) for Barracuda 180 X using advertised performan...
Areal Density: linear density x track density  <ul><li>Bits recorded along a track </li></ul><ul><ul><li>Metric is  Bits P...
Areal Density <ul><ul><li>Areal Density =   BPI x TPI </li></ul></ul><ul><ul><li>Change slope 30%/yr to 60%/yr about 1991 ...
Historical Perspective <ul><li>1956 IBM Ramac — early 1970s Winchester </li></ul><ul><ul><li>Developed for mainframe compu...
Disk History Data  density Mbit/sq. in. Capacity of Unit Shown Megabytes 1973: 1. 7 Mbit/sq. in 140 MBytes 1979: 7. 7 Mbit...
Disk History 1989: 63 Mbit/sq. in 60,000 MBytes 1997: 1450 Mbit/sq. in 2300 MBytes source: New York Times, 2/23/98, page C...
1 inch disk drive! <ul><li>2003 IBM MicroDrive: </li></ul><ul><ul><li>1.7” x 1.4” x 0.2”  </li></ul></ul><ul><ul><li>1 GB,...
Disk Characteristics in 2003   Seagate Cheetah 3146807LW Hitachi TravelStar 7K60 IBM Microdrive ( 07N5574) Dimension 3.5”,...
Disk Characteristics in 2007   Seagate Cheetah ST3146707LW Hitachi TravelStar 7K60 Seagate Barracuda ES ST3750640NS Dimens...
Today’s Hard Drive (09/29/2008) (warehouse.com) <ul><li>Seagate Barracuda 7200.11 - hard drive - 750 GB - SATA-300  Hard d...
Fallacy: Use Data Sheet “Average Seek” Time <ul><li>Manufacturers needed standard for fair comparison (“benchmark”) </li><...
Fallacy: Use Data Sheet Transfer Rate <ul><li>Manufacturers quote the speed off the data rate off the surface of the disk....
Disk Performance Example: revision <ul><li>Calculate time to read 64 KB for UltraStar 72 again, this time using 1/3 quoted...
Future Disk Size and Performance <ul><li>Continued advance in capacity (60%/yr) and bandwidth (40%/yr). </li></ul><ul><li>...
What about FLASH <ul><li>Compact Flash Cards </li></ul><ul><ul><li>Intel Strata Flash </li></ul></ul><ul><ul><ul><li>16 Mb...
Disk Operations <ul><li>Seek: move head to track; </li></ul><ul><li>Rotation: wait for sector under head; </li></ul><ul><l...
Improving disk performance. <ul><li>Use large sectors to improve bandwidth </li></ul><ul><li>Use track caches and read ahe...
Use Arrays of Small Disks? 14” 10” 5.25” 3.5” 3.5” Disk Array:  1 disk design Conventional:  4 disk  designs Low End High ...
Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks) Capacity  Volume  Power Data Rate  I/O ...
Array Reliability <ul><li>MTTF: Mean Time To Failure: average time that  a non - repairable component will operate before ...
Redundant  Arrays of (Inexpensive) Disks <ul><li>Replicate data over several disks so that no data will be lost if one dis...
 
Levels of RAID <ul><li>Original RAID paper described five categories (RAID levels 1-5).  (Patterson et al, “A case for red...
RAID 0: Nonredundant (JBOD) <ul><li>High I/O performance. </li></ul><ul><ul><li>Data is not save redundantly. </li></ul></...
RAID 0: Striped Disk Array without Fault Tolerance RAID Level 0 requires a minimum of 2 drives to implement               ...
Redundant Arrays of Inexpensive Disks RAID 1: Disk Mirroring/Shadowing <ul><li>•  Each disk is fully duplicated onto its “...
RAID 1: Mirroring and Duplexing                                                                                           ...
RAID 4: Block Interleaved Parity <ul><li>Blocks: striping units  </li></ul><ul><li>Allow for parallel access by multiple I...
RAID 4: Independent Data disks with shared Parity disk                                                                    ...
Inspiration for RAID 5 <ul><li>RAID 4 works well for small reads </li></ul><ul><li>Small writes (write to one disk):  </li...
Redundant Arrays of Inexpensive Disks RAID 5: High I/O Rate Interleaved Parity Independent writes possible because of inte...
RAID 5: Block Interleaved Distributed-Parity <ul><li>Parity disk = (block number/4) mod 5  </li></ul><ul><li>Eliminate the...
RAID 5: Independent Data disks with distributed parity blocks                                                             ...
Comparison of RAID Levels (N disks, each with capacity of C) Small write problem High performance, reliability (N-1)xC Int...
Other RAIDs <ul><li>HP AutoRAID </li></ul><ul><li>AFRAID </li></ul><ul><li>RAPID </li></ul><ul><li>SwiftRAID </li></ul><ul...
Berkeley History: RAID-I <ul><li>RAID-I (1989)  </li></ul><ul><ul><li>Consisted of a Sun 4/280 workstation with 128 MB of ...
Upcoming SlideShare
Loading in...5
×

Storage devices and disk arrays

2,119

Published on

0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,119
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
66
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide
  • Ancestor of Java had no I/O CPU vs. Peripheral Primary vs. Secondary What maks portable, PDA exciting?
  • 47 to 26 MB/s (external)
  • Capacity: 80 GB * 2^5 = 2.5 TB BW 30 MB/s * 1.4^5 = 160 MB/s seek 6 / 1.1^5 = 4 ms RPM 10000 =&gt; 20000? (3 to 1.5) to read sequentially = 260 minutes or 4.4 hours! To read randomly = 316 days or 1 year
  • Transcript of "Storage devices and disk arrays"

    1. 1. ECE 6160: Advanced Computer Networks Introduction to Storage Devices Instructor: Dr. Xubin (Ben) He Email: [email_address] Tel: 931-372-3462
    2. 2. Prev… <ul><li>Gigabit Ethernet </li></ul><ul><ul><li>MAC </li></ul></ul><ul><ul><li>PHY </li></ul></ul><ul><li>VLAN </li></ul>
    3. 3. Anatomy: Key components for a typical computer
    4. 4. Motivation: Who Cares About I/O? <ul><li>CPU Performance: 60% per year </li></ul><ul><li>I/O system performance limited by mechanical delays (disk I/O) </li></ul><ul><ul><li>< 10% per year (IO per sec) </li></ul></ul><ul><li>Amdahl's Law: system speed-up limited by the slowest part! </li></ul><ul><ul><li>10% IO & 10x CPU => 5x Performance (lose 50%) </li></ul></ul><ul><ul><li>10% IO & 100x CPU => 10x Performance (lose 90%) </li></ul></ul><ul><li>I/O bottleneck: </li></ul><ul><ul><li>Diminishing fraction of time in CPU </li></ul></ul><ul><ul><li>Diminishing value of faster CPUs </li></ul></ul>
    5. 5. I/O Systems Processor Cache Memory - I/O Bus Main Memory I/O Controller Disk Disk I/O Controller I/O Controller Graphics Network interrupts
    6. 6. Storage Technology Drivers <ul><li>Driven by the prevailing computing paradigm </li></ul><ul><ul><li>1950s: migration from batch to on-line processing </li></ul></ul><ul><ul><li>1990s: migration to ubiquitous computing </li></ul></ul><ul><ul><ul><li>computers in phones, books, cars, video cameras, … </li></ul></ul></ul><ul><ul><ul><li>nationwide fiber optical network with wireless tails </li></ul></ul></ul><ul><li>Effects on storage industry: </li></ul><ul><ul><li>Embedded storage </li></ul></ul><ul><ul><ul><li>smaller, cheaper, more reliable, lower power </li></ul></ul></ul><ul><ul><li>Data utilities </li></ul></ul><ul><ul><ul><li>high capacity, hierarchically managed storage </li></ul></ul></ul>
    7. 7. Outline <ul><li>Disk Basics/Disk History </li></ul><ul><li>Current Disk options </li></ul><ul><li>Disk fallacies and performance </li></ul><ul><li>FLASH </li></ul><ul><li>Tapes </li></ul><ul><li>RAID </li></ul>
    8. 8. Disk Device Terminology <ul><li>Several platters , with information recorded magnetically on both surfaces (usually) </li></ul><ul><li>Actuator moves head (end of arm ,1/surface) over track ( “seek” ) , select surface , wait for sector rotate under head , then read or write </li></ul><ul><ul><li>“ Cylinder ”: all tracks under heads </li></ul></ul><ul><li>Bits recorded in tracks , which in turn divided into sectors (e.g., 512 Bytes) </li></ul>Platter Outer Track Inner Track Sector Actuator Head Arm
    9. 9. Tracks, Sectors, and Cylinders A cylinder is comprised of the set of tracks described by all the heads (on separate platters) at a single seek position
    10. 10. Photo of Disk Head, Arm, Actuator Actuator Arm Head Spindle source: www.seagate.com Platters (12) {
    11. 11. Disk Device Performance <ul><li>Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead </li></ul><ul><li>Seek Time? depends on tracks move arm, seek speed of disk </li></ul><ul><li>Rotation Time? depends on speed disk rotates, how far sector is from head </li></ul><ul><li>Transfer Time? depends on data rate (bandwidth) of disk (bit density), size of request </li></ul>Platter Arm Actuator Head Sector Inner Track Outer Track Controller Spindle
    12. 12. Disk Device Performance <ul><li>Average rotation time: </li></ul><ul><ul><li>Average distance sector from head? </li></ul></ul><ul><ul><li>1/2 time of a rotation </li></ul></ul><ul><ul><ul><li>10000 Revolutions Per Minute  166.67 Rev/sec </li></ul></ul></ul><ul><ul><ul><li>1 revolution = 1/ 166.67 sec  6.00 milliseconds </li></ul></ul></ul><ul><ul><ul><li>1/2 rotation (revolution)  3.00 ms </li></ul></ul></ul><ul><li>Average seek time? </li></ul><ul><ul><li>Sum all possible seek distances from all possible tracks / # possible </li></ul></ul><ul><ul><ul><li>Assumes average seek distance is random </li></ul></ul></ul><ul><ul><li>Disk industry standard benchmark </li></ul></ul><ul><ul><li>About 10ms (ATA) and 5ms (SCSI) </li></ul></ul>
    13. 13. Data Rate: Inner vs. Outer Tracks <ul><li>To keep things simple, orginally kept same number of sectors per track </li></ul><ul><ul><li>Since outer track longer, lower bits per inch </li></ul></ul><ul><li>Competition  decided to keep BPI the same for all tracks (“ constant bit density ”) </li></ul><ul><ul><li> More capacity per disk </li></ul></ul><ul><ul><li> More of sectors per track towards edge </li></ul></ul><ul><ul><li> Since disk spins at constant speed, outer tracks have faster data rate </li></ul></ul><ul><li>Bandwidth outer track 1.7X inner track! </li></ul><ul><ul><li>Inner track highest density, outer track lowest, so not really constant </li></ul></ul><ul><ul><li>2.1X length of track outer / inner, 1.7X bits outer / inner </li></ul></ul>
    14. 14. Disk Performance Model /Trends <ul><li>Capacity </li></ul><ul><ul><li>+ 100%/year (2X / 1.0 yrs) </li></ul></ul><ul><li>Transfer rate (BW) </li></ul><ul><ul><li>+ 40%/year (2X / 2.0 yrs) </li></ul></ul><ul><li>Rotation + Seek time </li></ul><ul><ul><li>– 8%/ year (1/2 in 10 yrs) </li></ul></ul><ul><li>MB/$ </li></ul><ul><ul><li>> 100%/year (2X / 1.0 yrs) </li></ul></ul><ul><ul><li>60GB/$300 => $5/GB =>0.5c/MB </li></ul></ul>
    15. 15. State of the Art: Barracuda 180 <ul><ul><li>181.6 GB, 3.5 inch disk </li></ul></ul><ul><ul><li>12 platters, 24 surfaces </li></ul></ul><ul><ul><li>24,247 cylinders </li></ul></ul><ul><ul><li>7,200 RPM; </li></ul></ul><ul><ul><li>7.4/8.2 ms avg. seek (r/w) </li></ul></ul><ul><ul><li>64 to 35 MB/s (internal) </li></ul></ul><ul><ul><li>0.1 ms controller time </li></ul></ul><ul><ul><li>10.3 watts (idle) </li></ul></ul>source: www.seagate.com Sector Track Cylinder Head Platter Arm Track Buffer Q:Calculate time to read 64 KB (128 sectors) for Barracuda 180, sector is on outer track. Latency = Queuing Time + Controller time + Seek Time + Rotation Time + Size / Bandwidth per access per byte { +
    16. 16. Disk Performance Example <ul><li>Calculate time to read 64 KB (128 sectors) for Barracuda 180 X using advertised performance; sector is on outer track </li></ul><ul><li>Disk latency = average seek time + average rotational delay + transfer time + controller overhead </li></ul><ul><li>= 7.4 ms + 0.5 * 1/(7200 RPM) + 64 KB / (64 MB/s) + 0.1 ms </li></ul><ul><li>= 7.4 ms + 0.5 /(7200 RPM/(60000ms/M)) + 64 KB / (64 KB/ms) + 0.1 ms </li></ul><ul><li>= 7.4 + 4.2 + 1.0 + 0.1 ms = 12.7 ms </li></ul>
    17. 17. Areal Density: linear density x track density <ul><li>Bits recorded along a track </li></ul><ul><ul><li>Metric is Bits Per Inch ( BPI ) along a track </li></ul></ul><ul><li>Number of tracks per surface </li></ul><ul><ul><li>Metric is Tracks Per Inch ( TPI ) </li></ul></ul><ul><li>Disk Designs Brag about bit density per unit area </li></ul><ul><ul><li>Metric is Bits Per Square Inch </li></ul></ul><ul><ul><li>Called Areal Density </li></ul></ul><ul><ul><li>Areal Density = BPI x TPI </li></ul></ul>
    18. 18. Areal Density <ul><ul><li>Areal Density = BPI x TPI </li></ul></ul><ul><ul><li>Change slope 30%/yr to 60%/yr about 1991 </li></ul></ul>
    19. 19. Historical Perspective <ul><li>1956 IBM Ramac — early 1970s Winchester </li></ul><ul><ul><li>Developed for mainframe computers, proprietary interfaces </li></ul></ul><ul><ul><li>Steady shrink in form factor: 27 in. to 14 in </li></ul></ul><ul><li>Form factor and capacity drives market, more than performance </li></ul><ul><li>1970s: Mainframes  14 inch diameter disks </li></ul><ul><li>1980s: Minicomputers,Servers  8”,5 1/4 ” diameter </li></ul><ul><li>PCs, workstations Late 1980s/Early 1990s: </li></ul><ul><ul><li>Mass market disk drives become a reality </li></ul></ul><ul><ul><ul><li>industry standards: SCSI, IPI, IDE </li></ul></ul></ul><ul><ul><li>Pizzabox PCs  3.5 inch diameter disks </li></ul></ul><ul><ul><li>Laptops, notebooks  2.5 inch disks </li></ul></ul><ul><ul><li>Palmtops didn’t use disks, so 1.8 inch diameter disks didn’t make it </li></ul></ul><ul><li>2000s: </li></ul><ul><ul><li>1 inch for cameras, cell phones? </li></ul></ul>
    20. 20. Disk History Data density Mbit/sq. in. Capacity of Unit Shown Megabytes 1973: 1. 7 Mbit/sq. in 140 MBytes 1979: 7. 7 Mbit/sq. in 2,300 MBytes source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”
    21. 21. Disk History 1989: 63 Mbit/sq. in 60,000 MBytes 1997: 1450 Mbit/sq. in 2300 MBytes source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even mroe data into even smaller spaces” 1997: 3090 Mbit/sq. in 8100 MBytes
    22. 22. 1 inch disk drive! <ul><li>2003 IBM MicroDrive: </li></ul><ul><ul><li>1.7” x 1.4” x 0.2” </li></ul></ul><ul><ul><li>1 GB, 3600 RPM, 13.3 MB/s, 15 ms seek </li></ul></ul><ul><ul><li>Digital camera, PalmPC? </li></ul></ul><ul><li>2003 USB mini flash drive </li></ul><ul><ul><li>75mmx28mmx12mm, 17g </li></ul></ul><ul><ul><li>256MB </li></ul></ul><ul><ul><li>0.5-1MB/s </li></ul></ul><ul><li>2006 MicroDrive? </li></ul><ul><ul><li>9 GB, 50 MB/s! </li></ul></ul><ul><ul><ul><li>Assuming it finds a niche in a successful product </li></ul></ul></ul><ul><ul><ul><li>Assuming past trends continue </li></ul></ul></ul>
    23. 23. Disk Characteristics in 2003   Seagate Cheetah 3146807LW Hitachi TravelStar 7K60 IBM Microdrive ( 07N5574) Dimension 3.5”, 1.5 lbs 2.5”, 0.2 lbs 1.7”, 0.6 lbs Interface Ultra 320 SCSI ATA-100 (Ultra) CF+ Capacity 146.8GB 60GB 1GB Data Transfer Rate 320MB/s 100MB/s 13.3MB/s Average seek time 4.7ms 10ms 15ms Spindle Speed 10000RPM 7200RPM 3600RPM Buffer 8MB 8MB 128KB Price (warehouse.com) $899.95 $299.95 $319.95
    24. 24. Disk Characteristics in 2007   Seagate Cheetah ST3146707LW Hitachi TravelStar 7K60 Seagate Barracuda ES ST3750640NS Dimension 3.5” 2.5”, 0.2 lbs 3.5” Interface Ultra 320 SCSI ATA-100 (Ultra) Serial ATA-300 Capacity 146.8GB 60GB 750GB Data Transfer Rate 320MB/s 100MB/s Average seek time 4.7ms 10ms Spindle Speed 10000RPM 7200RPM 7200RPM Buffer 8MB 8MB 16MB Price (warehouse.com) $259 $93.99 (80GB) $319.95
    25. 25. Today’s Hard Drive (09/29/2008) (warehouse.com) <ul><li>Seagate Barracuda 7200.11 - hard drive - 750 GB - SATA-300 Hard drive - 750 GB - internal - 3.5&quot; - SATA-300 - 7200 rpm - buffer: 32 MB, $116 </li></ul><ul><li>Seagate FreeAgent Pro - hard drive - 500 GB - FireWire/Hi-Speed USB/ eSATA Hard drive – 500 GB – external – 3.5” – USB 2.0, eSATA-300, FireWire interfaces – 7200 rpm – buffer: 32 MB, $100 </li></ul>
    26. 26. Fallacy: Use Data Sheet “Average Seek” Time <ul><li>Manufacturers needed standard for fair comparison (“benchmark”) </li></ul><ul><ul><li>Calculate all seeks from all tracks, divide by number of seeks => “average” </li></ul></ul><ul><li>Real average would be based on how data laid out on disk, where seek in real applications, then measure performance </li></ul><ul><ul><li>Usually, tend to seek to tracks nearby, not to random track </li></ul></ul><ul><li>Rule of Thumb: observed average seek time is typically about 1/4 to 1/3 of quoted seek time (i.e., 3X-4X faster) </li></ul><ul><ul><li>Barracuda 180 X avg. seek: 7.4 ms  2.5 ms </li></ul></ul>
    27. 27. Fallacy: Use Data Sheet Transfer Rate <ul><li>Manufacturers quote the speed off the data rate off the surface of the disk. </li></ul><ul><li>Sectors contain an error detection and correction field (can be 20% of sector size) plus sector number as well as data </li></ul><ul><li>There are gaps between sectors on track </li></ul><ul><li>Rule of Thumb: disks deliver about 3/4 of internal media rate (1.3X slower) for data </li></ul><ul><li>For example, Barracuda 180X quotes 64 to 35 MB/sec internal media rate </li></ul><ul><li> 47 to 26 MB/sec external data rate (74%) </li></ul>
    28. 28. Disk Performance Example: revision <ul><li>Calculate time to read 64 KB for UltraStar 72 again, this time using 1/3 quoted seek time, 3/4 of internal outer track bandwidth; ( 12.7 ms before ) </li></ul><ul><li>Disk latency = average seek time + average rotational delay + transfer time + controller overhead </li></ul><ul><li>= ( 0.33 * 7.4 ms) + 0.5 * 1/(7200 RPM) + 64 KB / ( 0.75 * 65 MB/s) + 0.1 ms </li></ul><ul><li>= 2.5 ms + 0.5 /(7200 RPM/(60000ms/M)) + 64 KB / ( 47 KB/ms) + 0.1 ms </li></ul><ul><li>= 2.5 + 4.2 + 1.4 + 0.1 ms = 8.2 ms (64% of 12.7) </li></ul>
    29. 29. Future Disk Size and Performance <ul><li>Continued advance in capacity (60%/yr) and bandwidth (40%/yr). </li></ul><ul><li>Slow improvement in seek, rotation (8%/yr). </li></ul><ul><li>Time to read whole disk. </li></ul><ul><li>Year Sequentially Randomly (1 sector/seek) </li></ul><ul><li>1990 4 minutes 6 hours </li></ul><ul><li>2000 12 minutes 1 week(!) </li></ul><ul><li>3.5” form factor make sense in 5 yrs? </li></ul><ul><ul><li>What is capacity, bandwidth, seek time, RPM? </li></ul></ul><ul><ul><li>Assume today 80 GB, 100 MB/sec, 6 ms, 10000 RPM </li></ul></ul><ul><ul><ul><li>838GB, 537MB/s, 4ms, 15000RPM </li></ul></ul></ul>
    30. 30. What about FLASH <ul><li>Compact Flash Cards </li></ul><ul><ul><li>Intel Strata Flash </li></ul></ul><ul><ul><ul><li>16 Mb in 1 square cm. (.6 mm thick) </li></ul></ul></ul><ul><ul><li>100,000 write/erase cycles. </li></ul></ul><ul><ul><li>Standby current = 100uA, write = 45mA </li></ul></ul><ul><ul><li>Compact Flash 256MB~=$120 512MB~=$542 </li></ul></ul><ul><ul><li>Transfer @ 3.5MB/s </li></ul></ul><ul><li>IBM Microdrive 1G~370 </li></ul><ul><ul><li>Standby current = 20mA, write = 250mA </li></ul></ul><ul><ul><li>Efficiency advertised in wats/MB </li></ul></ul><ul><li>VS. Disks </li></ul><ul><ul><li>Nearly instant standby wake-up time </li></ul></ul><ul><ul><li>Random access to data stored </li></ul></ul><ul><ul><li>Tolerant to shock and vibration (1000G of operating shock) </li></ul></ul>
    31. 31. Disk Operations <ul><li>Seek: move head to track; </li></ul><ul><li>Rotation: wait for sector under head; </li></ul><ul><li>Transfer:Move data to/from disks; </li></ul><ul><li>Overhead: </li></ul><ul><ul><li>Controller delay </li></ul></ul><ul><ul><li>Queuing delay </li></ul></ul>Access time = seek time + rotational delay + transfer time+overhead
    32. 32. Improving disk performance. <ul><li>Use large sectors to improve bandwidth </li></ul><ul><li>Use track caches and read ahead; </li></ul><ul><ul><li>Read entire track into on-controller cache </li></ul></ul><ul><ul><li>Exploit locality (improves both latency and BW) </li></ul></ul><ul><li>Design file systems to maximize locality </li></ul><ul><ul><li>Allocate files sequentially on disks (exploit track cache) </li></ul></ul><ul><ul><li>Locate similar files in same cylinder (reduce seeks) </li></ul></ul><ul><ul><li>Locate simlar files in near-by cylinders (reduce seek distance) </li></ul></ul><ul><li>Pack bits closer together to improve transfer rate and density. </li></ul><ul><li>Use a collection of small disks to form a large, high performance one--->disk array </li></ul><ul><li>Stripping data across multiple disks to allow parallel I/O, thus improving performance. </li></ul>
    33. 33. Use Arrays of Small Disks? 14” 10” 5.25” 3.5” 3.5” Disk Array: 1 disk design Conventional: 4 disk designs Low End High End <ul><li>Katz and Patterson asked in 1987: </li></ul><ul><ul><li>Can smaller disks be used to close gap in performance between disks and CPUs? </li></ul></ul>
    34. 34. Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks) Capacity Volume Power Data Rate I/O Rate MTTF Cost IBM 3390K 20 GBytes 97 cu. ft. 3 KW 15 MB/s 600 I/Os/s 250 KHrs $250K IBM 3.5&quot; 0061 320 MBytes 0.1 cu. ft. 11 W 1.5 MB/s 55 I/Os/s 50 KHrs $2K x70 23 GBytes 11 cu. ft. 1 KW 110 MB/s 3900 IOs/s ??? Hrs $150K Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability? 9X 3X 8X 6X
    35. 35. Array Reliability <ul><li>MTTF: Mean Time To Failure: average time that a non - repairable component will operate before experiencing failure. </li></ul><ul><li>Reliability of N disks = Reliability of 1 Disk ÷N </li></ul><ul><ul><li>50,000 Hours ÷ 70 disks = 700 hours </li></ul></ul><ul><li>Disk system MTTF: Drops from 6 years to 1 month! </li></ul><ul><li>Arrays without redundancy too unreliable to be useful! </li></ul><ul><li>Solution: redundancy. </li></ul>
    36. 36. Redundant Arrays of (Inexpensive) Disks <ul><li>Replicate data over several disks so that no data will be lost if one disk fails. </li></ul><ul><li>Redundancy yields high data availability </li></ul><ul><ul><li>Availability : service still provided to user, even if some components failed </li></ul></ul><ul><li>Disks will still fail </li></ul><ul><li>Contents reconstructed from data redundantly stored in the array </li></ul><ul><ul><li> Capacity penalty to store redundant info </li></ul></ul><ul><ul><li> Bandwidth penalty to update redundant info </li></ul></ul>
    37. 38. Levels of RAID <ul><li>Original RAID paper described five categories (RAID levels 1-5). (Patterson et al, “A case for redundant arrays of inexpensive disks (RAID)”, ACM SIGMOD, 1988) </li></ul><ul><li>Disk striping with no redundant now is called RAID0 or JBOD(Just a bunch of disks). </li></ul><ul><li>Other kinds have been proposed in literature, </li></ul><ul><li>Level 6 (P+Q Redundancy), Level 10, RAID53, etc. </li></ul><ul><li>Except RAID0, all the RAID levels trade disk capacity for reliability, and the extra reliability makes parallism a practical way to improve performance. </li></ul>
    38. 39. RAID 0: Nonredundant (JBOD) <ul><li>High I/O performance. </li></ul><ul><ul><li>Data is not save redundantly. </li></ul></ul><ul><ul><li>Single copy of data is striped across multiple disks. </li></ul></ul><ul><li>Low cost. </li></ul><ul><ul><li>Lack of redundancy. </li></ul></ul><ul><li>Least reliable: single disk failure leads to data loss. </li></ul>file data block 1 block 0 block 2 block 3 Disk 1 Disk 0 Disk 2 Disk 3
    39. 40. RAID 0: Striped Disk Array without Fault Tolerance RAID Level 0 requires a minimum of 2 drives to implement                                                                                                                 
    40. 41. Redundant Arrays of Inexpensive Disks RAID 1: Disk Mirroring/Shadowing <ul><li>•  Each disk is fully duplicated onto its “ mirror ” </li></ul><ul><li>Very high availability can be achieved </li></ul><ul><li>• Bandwidth sacrifice on write: </li></ul><ul><li>Logical write = two physical writes </li></ul><ul><ul><li>• Reads may be optimized, minimize the queue and disk search time </li></ul></ul><ul><li>• Most expensive solution: 100% capacity overhead </li></ul>recovery group Targeted for high I/O rate , high availability environments
    41. 42. RAID 1: Mirroring and Duplexing                                                                                                                  For Highest performance, the controller must be able to perform two concurrent separate Reads per mirrored pair or two duplicate Writes per mirrored pair. RAID Level 1 requires a minimum of 2 drives to implement
    42. 43. RAID 4: Block Interleaved Parity <ul><li>Blocks: striping units </li></ul><ul><li>Allow for parallel access by multiple I/O requests, high I/O rates </li></ul><ul><li>Doing multiple small reads is now faster than before. (allows small read requests to be restricted to a single disk). </li></ul><ul><li>Large writes(full stripe), update the parity: </li></ul><ul><li>P’ = d0’ + d1’ + d2’ + d3’; </li></ul><ul><li>Small writes(eg. write on d0), update the parity: </li></ul><ul><li>P = d0 + d1 + d2 + d3 </li></ul><ul><li>P’ = d0’ + d1 + d2 + d3 = P + d0’ + d0; </li></ul><ul><li>However, writes are still very slow since the parity </li></ul><ul><li>disk is the bottleneck. </li></ul>block 0 block 4 block 8 block 12 block 1 block 5 block 9 block 13 block 2 block 6 block 10 block 14 block 3 block 7 block 11 block 15 P(0-3) P(4-7) P(8-11) P(12-15)
    43. 44. RAID 4: Independent Data disks with shared Parity disk                                                                                                                  Each entire block is written onto a data disk. Parity for same rank blocks is generated on Writes, recorded on the parity disk and checked on Reads. RAID Level 4 requires a minimum of 3 drives to implement
    44. 45. Inspiration for RAID 5 <ul><li>RAID 4 works well for small reads </li></ul><ul><li>Small writes (write to one disk): </li></ul><ul><ul><li>Option 1: read other data disks, create new sum and write to Parity Disk </li></ul></ul><ul><ul><li>Option 2: since P has old sum, compare old data to new data, add the difference to P </li></ul></ul><ul><li>Small writes are limited by Parity Disk: Write to D0, D5 both also write to P disk. Parity disk must be updated for every write operation! </li></ul>D0 D1 D2 D3 P D4 D5 D6 P D7
    45. 46. Redundant Arrays of Inexpensive Disks RAID 5: High I/O Rate Interleaved Parity Independent writes possible because of interleaved parity D0 D1 D2 D3 P D4 D5 D6 P D7 D8 D9 P D10 D11 D12 P D13 D14 D15 P D16 D17 D18 D19 D20 D21 D22 D23 P . . . . . . . . . . . . . . . Disk Columns Increasing Logical Disk Addresses Example: write to D0, D5 uses disks 0, 1, 3, 4
    46. 47. RAID 5: Block Interleaved Distributed-Parity <ul><li>Parity disk = (block number/4) mod 5 </li></ul><ul><li>Eliminate the parity disk bottleneck of RAID 4 </li></ul><ul><li>Best small read, large read and large write performance </li></ul><ul><li>Can correct any single self-identifying failure </li></ul><ul><li>Small logical writes take two physical reads and two </li></ul><ul><li>physical writes. </li></ul><ul><li>Recovering needs reading all nonfailed disks </li></ul>Left Symmetric Distribution block 0 block 4 block 8 block 12 P(16-19) block 1 block 5 block 9 P(12-15) block 16 block 2 block 6 P(8-11) block 13 block 17 block 3 P(4-7) block 10 block 14 block 18 P(0-3) block 7 block 11 block 15 block 19
    47. 48. RAID 5: Independent Data disks with distributed parity blocks                                                                                                                  Each entire data block is written on a data disk; parity for blocks in the same rank is generated on Writes, recorded in a distributed location and checked on Reads. RAID Level 5 requires a minimum of 3 drives to implement
    48. 49. Comparison of RAID Levels (N disks, each with capacity of C) Small write problem High performance, reliability (N-1)xC Interleave parity 5 Write-related bottleneck High redundancy and better performance (N-1)xC Block-level parity 4 Low performance Easy to implement, high error recoverability (N-1)xC Bit(byte)-level parity 3 cost High performance, fastest write (N/2)xC Mirroring 1 No redundancy Maximum data transfer rate and sizes NxC Striping 0 Disadvantage Advantage Capacity Technique RAID level
    49. 50. Other RAIDs <ul><li>HP AutoRAID </li></ul><ul><li>AFRAID </li></ul><ul><li>RAPID </li></ul><ul><li>SwiftRAID </li></ul><ul><li>TickerTAIP </li></ul><ul><li>SMDA </li></ul><ul><li>…… </li></ul>
    50. 51. Berkeley History: RAID-I <ul><li>RAID-I (1989) </li></ul><ul><ul><li>Consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks and specialized disk striping software </li></ul></ul><ul><li>Today RAID is $19 billion dollar industry, 80% nonPC disks sold in RAIDs </li></ul>
    1. A particular slide catching your eye?

      Clipping is a handy way to collect important slides you want to go back to later.

    ×