RAID: High-Performance, Reliable Secondary Storage


  1. 1. High Performance, Reliable Secondary Storage Uğur Tılıkoğlu Gebze Institute of Technology
  2. 2. Overview  Introduction  Background  Disk Terminology  Data Paths  Technology Trends  Disk Array Basics  Data Striping and Redundancy  Basic RAID Organizations  Performance and Cost Comparisons  Reliability  Implementation Considerations 2
  3. 3. Overview  Advanced Topics  Improving Small Write Performance for RAID Level 5  Declustered Parity  Exploiting On-Line Spare Disks  Data Striping in Disk Arrays  Performance and Reliability Modeling  Opportunities for Future Research  Experience with Disk Arrays  Interaction among New Organizations  Scalability, Massively Parallel Computers and Small Disks  Latency 3
  4. 4. Introduction  RAID: Redundant Arrays of Inexpensive / Independent Disks  Improvements in microprocessors and memory systems require larger, higher-performance secondary storage systems  Microprocessor performance increases faster than disk performance  Disk arrays: multiple, independent disks  a large, high-performance logical disk 4
  5. 5. Background  Disk Terminology  Data Paths  Technology Trends 5
  6. 6. Disk Terminology 6
  7. 7. Data Paths 7
  8. 8. Technology Trends 8
  9. 9. Disk Array Basics  Data Striping and Redundancy  Basic RAID Organizations  Performance and Cost Comparisons  Reliability  Implementation Considerations 9
  10. 10. Data Striping and Redundancy  Data Striping  Distribute data over multiple disks  Service in parallel  More disks  More performance 10
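The slide does not spell out how logical blocks are assigned to disks. A minimal sketch of coarse-grained (round-robin) striping is below; the array size, striping unit, and the function name `locate` are hypothetical choices, not part of the original presentation.

```python
NUM_DISKS = 4          # hypothetical array size
STRIPE_UNIT = 8        # blocks per striping unit (hypothetical)

def locate(logical_block: int) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for a logical block
    under round-robin, coarse-grained striping."""
    stripe_unit_index = logical_block // STRIPE_UNIT   # which striping unit
    offset_in_unit = logical_block % STRIPE_UNIT
    disk = stripe_unit_index % NUM_DISKS               # units placed round-robin
    unit_on_disk = stripe_unit_index // NUM_DISKS
    return disk, unit_on_disk * STRIPE_UNIT + offset_in_unit

# A large sequential request touches all disks in parallel:
# logical blocks 0..31 fall into striping units 0..3, spread over disks 0..3.
```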
  11. 11. Data Striping and Redundancy  More disks  More unreliable  100 disks  1/100 the reliability of a single disk  Redundancy  Two categories  Granularity of data interleaving  Method of computing the redundant information and distributing it across the disk array 11
  12. 12. Data Striping and Redundancy  Data interleaving  Fine grained  Advantages:  all disks are accessed in parallel  high transfer rate  Disadvantages:  only one I/O request serviced at a time  all disks waste time positioning for every request 12
  13. 13. Data Striping and Redundancy  Data interleaving  Coarse grained  Advantages:  multiple small requests serviced simultaneously  large requests can still access all the disks 13
  14. 14. Data Striping and Redundancy  Redundancy  Two main problems  Computing the redundant information: Parity  Selecting a method for distributing the redundant information across the disk array 14
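As a small illustration of the first problem: in the parity-based levels (3-5), the redundant block is the bytewise XOR of the data blocks in a stripe. The sketch below (block contents are made up) shows the computation and why any single lost block can be recovered from the rest.

```python
from functools import reduce

def parity(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-sized blocks; this is the redundant (parity) block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]   # three data blocks of one stripe
p = parity(data)                                  # parity block stored on a fourth disk

# Any single lost block equals the XOR of the parity with the surviving blocks:
assert parity([p, data[1], data[2]]) == data[0]
```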
  15. 15. Basic RAID Organizations  Nonredundant (RAID Level 0)  Lowest cost  Best write performance  Not the best read performance (mirrored arrays can schedule reads more effectively)  Any single disk failure results in data loss 15
  16. 16. Basic RAID Organizations  Mirrored (RAID Level 1)  Uses twice the number of disks  Data is also written to the redundant disk  If a disk fails, the other copy is used 16
  17. 17. Basic RAID Organizations  Memory-Style ECC (RAID Level 2)  Contains Hamming-code parity disks  The number of parity disks grows with the logarithm of the number of data disks  Storage efficiency therefore increases as the number of data disks increases  Multiple parity disks are needed to identify the failed disk, but only one is needed to recover the lost data 17
  18. 18. Basic RAID Organizations  Bit-Interleaved Parity (RAID Level 3)  Data is interleaved bit-wise across the data disks  The disk controller can identify which disk has failed  A single parity disk is therefore sufficient  Read  all data disks, Write  all data disks + parity disk 18
  19. 19. Basic RAID Organizations  Block-Interleaved Parity (RAID Level 4)  Same as Level 3, but data is interleaved in blocks (striping units)  Reads & writes smaller than the striping unit  one disk  Parity update  new parity = old parity xor old data xor new data  Four I/Os: read old data and old parity, write new data and new parity  Bottleneck at the parity disk 19
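The four-I/O small write works because the parity can be updated from the old data and old parity alone, without reading the other data disks in the stripe. A minimal sketch of that update rule (the function name is illustrative):

```python
def small_write_parity_update(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """new parity = old parity XOR old data XOR new data; only the target
    data disk and the parity disk are touched."""
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

# The four I/Os of a small write: read old data, read old parity,
# write new data, write the parity returned above.
```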
  20. 20. Basic RAID Organizations  Block-Interleaved Distributed-Parity (RAID Level 5)  Solves the parity-disk bottleneck of Level 4  Best small-read, large-read, and large-write performance  Small writes are inefficient because of the read-modify-write cycle 20
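The slide does not show how the parity is rotated across the disks. One simple placement rule is sketched below with a hypothetical five-disk array; the underlying survey discusses several layouts (including the left-symmetric one), so treat this only as an illustration of the idea.

```python
NUM_DISKS = 5   # hypothetical array size

def parity_disk(stripe: int) -> int:
    """Place the parity of stripe i on disk i mod N, so parity traffic is
    spread over all disks instead of a single dedicated parity disk."""
    return stripe % NUM_DISKS

# Stripes 0..4 put their parity on disks 0,1,2,3,4; the data striping units
# of each stripe occupy the remaining four disks.
```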
  21. 21. Basic RAID Organizations  P + Q Redundancy (RAID Level 6)  Uses stronger codes that tolerate multiple (double) disk failures  Operates in much the same manner as Level 5  Small writes are even less efficient: six I/Os are needed to update both the P and Q information 21
  22. 22. Performance and Cost Comparisons  Ground Rules and Observations  Reliability, performance and cost  Disk arrays are throughput-oriented  I/Os per second per dollar  Configuration  RAID 5 can emulate RAID 1 and RAID 3 by appropriate choice of the striping unit 22
  23. 23. Performance and Cost Comparisons  Comparisons – Small Read & Writes 23
  24. 24. Performance and Cost Comparisons  Comparisons – Large Read & Writes 24
  25. 25. Performance and Cost Comparisons  Comparisons – RAID 3 & 5 & 6 25
  26. 26. Reliability  Basic Reliability  RAID 5  MTTF: mean time to failure, MTTR: mean time to repair  N: total number of disks, G: parity group size  100 disks, each with an MTTF of 200,000 hours, an MTTR of 1 hour, and a parity group size of 16  the mean time to failure of the system is about 3,000 years !!! 26
  27. 27. Reliability  Basic Reliability  RAID 6  MTTF: mean time to failure, MTTR: mean time to repair  N: total number of disks, G: parity group size  100 disks, each with an MTTF of 200,000 hours, an MTTR of 1 hour, and a parity group size of 16  the mean time to failure of the system is about 38,000,000 years !!! 27
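The MTTF formulas themselves did not survive into this transcript. The standard expressions from the underlying survey, which roughly reproduce the figures quoted on these two slides, are evaluated in the sketch below (the exact constants in the survey may differ slightly).

```python
disk_mttf = 200_000        # hours, per-disk mean time to failure
mttr = 1                   # hours, mean time to repair
n, g = 100, 16             # total disks, parity group size
HOURS_PER_YEAR = 8760

# RAID 5: data is lost when a second disk in the same parity group fails
# before the first failure has been repaired.
raid5_mttf = disk_mttf**2 / (n * (g - 1) * mttr)
print(raid5_mttf / HOURS_PER_YEAR)   # ~3,000 years

# RAID 6 (P+Q): data is lost only after a third failure in the group
# within the repair windows of the first two.
raid6_mttf = disk_mttf**3 / (n * (g - 1) * (g - 2) * mttr**2)
print(raid6_mttf / HOURS_PER_YEAR)   # tens of millions of years, in line with the slide
```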
  28. 28. System Crashes and Parity Inconsistency  System crash: power failure, operator error, hardware breakdown, software crash, etc.  Causes parity inconsistencies in both bit-interleaved and block-interleaved disk arrays  System crashes may occur more frequently than disk failures  To avoid the loss of parity on system crashes, information sufficient to recover the parity must be logged to non-volatile storage (NVRAM) before each write operation. 28
  29. 29. Uncorrectable Bit Errors  The exact meaning of an uncorrectable bit error is unclear  Either data is incorrectly written or the magnetic media is gradually damaged  Some manufacturers have developed an approach that monitors the warnings given by disks and notifies an operator when a disk appears to be about to fail. 29
  30. 30. Correlated Disk Failures  Environmental and manufacturing factors  Example: earthquake 30
  31. 31. Reliability Revisited  Double disk failure  System crash followed by a disk failure  Disk failure followed by an uncorrectable bit error during reconstruction 31
  32. 32. Reliability Revisited 32
  33. 33. Reliability Revisited 33
  34. 34. Reliability Revisited 34
  35. 35. Implementation Considerations  Avoiding Stale Data  When a disk fails, its logical sectors must be marked invalid; the invalid mark prevents users from reading corrupted data on the failed disk  When an invalid logical sector is reconstructed to a spare disk, it must be marked valid again. 35
  36. 36. Implementation Considerations  Regenerating Parity after a System Crash  Before servicing any write request, the corresponding parity sectors must be marked inconsistent  When bringing a system up from a system crash, all inconsistent parity sectors must be regenerated 36
  37. 37. Implementation Considerations  Operating with a Failed Disk  Demand reconstruction: access to a parity stripe with an invalid sector triggers reconstruction of the appropriate data immediately onto a spare disk. A background process scans the entire disk.  Parity sparing: before servicing a write request, the invalid sector is reconstructed and relocated to overwrite its corresponding parity sector 37
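Both demand reconstruction and the background scan rebuild a missing block the same way: by XORing the parity block with the surviving data blocks of the stripe. A minimal sketch (the function name is illustrative):

```python
from functools import reduce

def reconstruct(surviving_blocks: list[bytes]) -> bytes:
    """Rebuild the failed disk's block for one parity stripe: it is the XOR
    of the parity block and all surviving data blocks of that stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*surviving_blocks))

# Demand reconstruction would call this when a request touches a stripe with an
# invalid sector and write the result to the spare disk; the background process
# does the same for every remaining stripe.
```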
  38. 38. Implementation Considerations  Orthogonal RAID 38
  39. 39. Advanced Topics  Improving Small Write Performance for RAID Level 5  Declustered Parity  Exploiting On-Line Spare Disks  Data Striping in Disk Arrays  Performance and Reliability Modeling 39
  40. 40. Improving Small Write Performance for RAID Level 5  Buffering and Caching  Write buffering (asynchronous writes): collect small writes in a buffer and issue them as one large write  Read caching: reduces the four I/Os of a small write to three; the old data is read from the cache 40
  41. 41. Improving Small Write Performance for RAID Level 5  Floating Parity  Shortens the read-modify-write time  Requires many free blocks  The new parity block is written to the rotationally nearest unallocated block following the old parity block  Implemented in the disk controller 41
  43. 43. Improving Small Write Performance for RAID Level 5  Parity Logging  Delays the read of the old parity and the write of the new parity  The parity update (difference) is temporarily logged  Log entries are grouped so that large contiguous blocks of parity can be updated efficiently 43
  44. 44. Declustered Parity  Distributes the increased load uniformly over all disks 44
  45. 45. Exploiting On-Line Spare Disks  Distributed Sparing 45
  46. 46. Exploiting On-Line Spare Disks  Parity Sparing 46
  47. 47. Data Striping in Disk Arrays  Disk positioning time is wasted work  Idle time is wasted in the same way  Data striping (interleaving) distributes data among multiple disks  Researchers have studied the striping unit size that maximizes throughput 47
  48. 48. Data Striping in Disk Arrays  P: average disk positioning time  X: average disk transfer rate  L: concurrency  Z: request size  N: array size in disks 48
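The formula these parameters belong to is not reproduced in the transcript. The sketch below follows the Lee and Katz result cited in the underlying survey, which puts the optimal striping unit at roughly sqrt(P · X · (L − 1) · Z / N); the numeric example values are made up.

```python
from math import sqrt

def optimal_striping_unit(P, X, L, Z, N):
    """Lee and Katz's expression for the optimal striping unit size.
    P: average positioning time (s), X: transfer rate (bytes/s),
    L: concurrency, Z: request size (bytes), N: disks in the array."""
    return sqrt(P * X * (L - 1) * Z / N)

# Hypothetical numbers: 13 ms positioning, 2 MB/s transfer rate,
# 8 concurrent requests, 64 KB requests, 16 disks  roughly a 27 KB unit.
print(optimal_striping_unit(0.013, 2e6, 8, 64 * 1024, 16))
```

Note that with L = 1 (a single outstanding request) the expression goes to zero: fine-grained striping is best when there is no concurrency to exploit, which matches the fine- vs. coarse-grained trade-off described earlier.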
  49. 49. Performance and Reliability Modeling  Performance  Kim: response time equations  Kim & Tantawi: approximate service time equations  Chen & Towsley  Lee & Katz  Reliability  Markov models 49
  50. 50. Opportunities for Future Research  Experience with Disk Arrays  Interaction among New Organizations  Scalability, Massively Parallel Computers and Small Disks  Latency 50
