Overview of Redundant Disk Arrays
 


Some slides on the original design of RAID, a Redundant Array of Inexpensive Disks. Demonstrates the tradeoffs between the varying RAID levels and gives some historical context.


  • Meeting Notes (1/21/12 13:53): Invented around 1987.
  • Meeting Notes (1/21/12 13:53): Patterson: Berkeley. Gibson: currently at CMU. Katz: Berkeley.
  • Exploits a clever XOR trick to avoid reading data off all the disks to recalculate parity. Each small write touches 2 disks and takes 4 accesses (2 reads and 2 writes). Each small read requires only 1 access.

Overview of Redundant Disk Arrays: Presentation Transcript

  • Andrew Robinson, University of Michigan <androbin@umich.edu>. Redundant Arrays of Inexpensive Disks (RAID): What a cool idea!
  • Authors
    • David A Patterson
    • Garth Gibson
    • Randy H Katz
    • Officially published in 1988.
  • Overview
    • What is RAID?
    • Why bother?
    • What is RAID, really?
    • How well does it work?
    • How’s it holding up?
  • What is RAID?
    • Take a bunch of disks and make them appear as one disk.
    • Put data on all of them
    • Use all at once to gain performance
    • Duplicate data to gain reliability
    • Buy cheap disks to gain dollars
  • This seems like a lot of work…why bother?
  • Let’s go back to 1987
  • CPUs and Memory kept getting faster…
    • Exponential growth everywhere!
    • CPU Performance: 1.4X increase per year
      – More transistors
      – Better architecture
    • Memory Performance: 1.4-2X increase per year
      – Invention of caches
      – SRAM technology
  • … but disks did not.
    • It’s hard to make things spin exponentially faster every year (they tend to fly apart).
    • Disk seek time improved at a rate of approximately 7% a year.
    • Caching had been employed to buffer I/O activity; this works reasonably well for predictable workloads.
  • Slow I/O Makes Slow Computers
    • Amdahl’s Law describes the impact of only improving some pieces, while leaving others.
    • $S = \frac{1}{(1 - f) + f / k}$
      – S: the effective speedup
      – f: fraction of work in faster mode
      – k: speedup while in faster mode
  • …really slow.
    • If applications spend 10% of their time in I/O, when computers are 10 times faster, they will only appear about 5 times faster.
    • Something needed to be done.
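The figure on this slide can be checked directly against Amdahl's Law. The sketch below is only an illustration: the function name `amdahl_speedup` is made up here, and it assumes the 10% of time spent in I/O does not speed up at all while the remaining 90% runs k times faster.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Effective speedup S when a fraction f of the work runs k times faster."""
    return 1.0 / ((1.0 - f) + f / k)


# 10% of time is I/O (unchanged), so f = 0.9 of the work benefits from the faster CPU.
for k in (10, 100):
    print(f"CPU {k}x faster -> overall {amdahl_speedup(0.9, k):.2f}x faster")
# CPU 10x faster -> overall 5.26x faster
# CPU 100x faster -> overall 9.17x faster
```

Even an unbounded CPU speedup cannot push the overall figure past 1 / (1 - f) = 10x in this setting, which is why the I/O side had to improve.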
  • What should we do?
    • Single Large Expensive Disks (SLED) are not improving fast enough.
    • Larger memory or solid state drives weren’t practical.
    • Small personal hard drives are emerging… can we do something with those?
  • Inexpensive Disks Rock
  • Visual Comparison
  • Why didn’t someone do this before?
    • Standards like SCSI have finally allowed drive makers to integrate features seen in traditional mainframe controllers.
  • There is a problem…
    • A hundredfold increase in the number of disks means a hundredfold decrease in total reliability.
    • $\mathrm{MTTF}_{\mathrm{DiskArray}} = \frac{\mathrm{MTTF}_{\mathrm{SingleDisk}}}{n_{\mathrm{Disks}}}$
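To make the reliability formula concrete, here is a small sketch. The per-disk MTTF of 30,000 hours is an illustrative assumption, not a number taken from these slides, and the formula itself assumes disks fail independently at a constant rate.

```python
HOURS_PER_DAY = 24


def array_mttf(single_disk_mttf_hours: float, n_disks: int) -> float:
    """MTTF of an array with no redundancy: any single disk failure loses data."""
    return single_disk_mttf_hours / n_disks


# Assumed (hypothetical) per-disk MTTF, for illustration only.
ASSUMED_DISK_MTTF_HOURS = 30_000

for n in (1, 10, 100):
    hours = array_mttf(ASSUMED_DISK_MTTF_HOURS, n)
    print(f"{n:>3} disks: {hours:>8.0f} hours (~{hours / HOURS_PER_DAY:.1f} days)")
```

Under that assumption a 100-disk array without redundancy would be expected to fail within a couple of weeks, which is the motivation for the check disks introduced next.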
  • that’s all really nice, but what is RAID, really?
  • A couple levels… a single idea
    • RAID manages the tradeoff between performance and reliability
    • RAID comes in levels (RAID1 to RAID5)
    • These levels represent points in the performance-reliability space
  • Groups, Disks, and Check Disks
    • RAID organizes disks into reliability groups
    • Some of the disks in a group store error correcting data
      – D = Total disks with data
      – G = Disks in a group
      – C = Number of check disks in a group
  • Metrics
    • Useable Storage
      – Percent of storage that holds data, excluding parity information
    • Performance
      – Tough to make one number:
      – Reads, Writes, and Read-Modify-Write Access Patterns
      – Sequential and Random Data Distribution
  • RAID1 – The Naive Approach
    • Mirroring of all data
    • To read:
      – Use either disk
    • To write:
      – Send to both disks simultaneously
    • Minor read performance increase.
  • Evaluation
    • Pros
      – Reads can occur simultaneously
      – Seek times can improve with special controllers
      – Predictable performance
    • Cons
      – Useable storage is cut in half
      – All other performance metrics are left the same
    • Alright for large sequential jobs and transaction processing jobs
  • RAID2 – Bit Level Striping
    • Uses Hamming Code for Error Detection
    • Requires many check disks
      – For 10 data disks, 4 check disks
      – For 25 data disks, 5 check disks
    • Can detect errors, and determine the at-fault disk
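The check disk counts on this slide are consistent with the usual bound for a single-error-correcting Hamming code, where C check bits must satisfy 2^C >= G + C + 1. The sketch below assumes that bound is what the slide relies on; the function name is invented for illustration.

```python
def hamming_check_disks(data_disks: int) -> int:
    """Smallest C with 2**C >= data_disks + C + 1 (single-error-correcting Hamming bound)."""
    c = 0
    while 2 ** c < data_disks + c + 1:
        c += 1
    return c


for g in (10, 25):
    print(f"{g} data disks -> {hamming_check_disks(g)} check disks")
# 10 data disks -> 4 check disks
# 25 data disks -> 5 check disks
```

The count grows only logarithmically with the group size, which is why larger groups waste a smaller share of their disks on checks.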
  • RAID2 - Visually
  • Evaluation
    • Pros
      – Better useable storage, 71% for G=10, 83% for G=25
    • Cons
      – Dismal small random data access performance: 3-9% of RAID1 or SLED
    • Good for large sequential jobs, bad for transaction processing systems.
  • RAID3 – Byte Level Striping
    • Simpler parity error correction
    • Only a single check disk required for error detection
    • Cannot determine which disk failed, but that’s usually pretty obvious
    • Transfers of large contiguous blocks are good
  • RAID3
  • Evaluation
    • Pros
      – Even better useable storage, 91% for G=10, 96% for G=25
    • Cons
      – Small random data access performance: Just as bad as RAID2
    • Even better for large sequential jobs, bad for transaction processing systems.
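The useable storage percentages quoted for RAID1 through RAID3 all follow from G / (G + C), the share of disks in a group that hold data. A short sketch, using the group sizes and check disk counts given in the slides (one mirror per data disk for RAID1, 4 or 5 Hamming check disks for RAID2, one parity disk for RAID3); the helper name is an assumption for illustration.

```python
def usable_fraction(data_disks: int, check_disks: int) -> float:
    """Fraction of a group's disks that hold data rather than check information."""
    return data_disks / (data_disks + check_disks)


configs = [
    ("RAID1", 1, 1),          # every data disk has one mirror
    ("RAID2, G=10", 10, 4),   # Hamming code check disks
    ("RAID2, G=25", 25, 5),
    ("RAID3, G=10", 10, 1),   # single parity disk
    ("RAID3, G=25", 25, 1),
]

for name, g, c in configs:
    print(f"{name:<12} {usable_fraction(g, c):.0%}")
```

RAID4 and RAID5 also spend one disk's worth of capacity per group on parity, so by the same formula they match the RAID3 figures.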
  • What is parity?
    • Parity is calculated as an XOR of the data blocks.
    • XOR is reversible:
      – 1011 (A1) XOR 1100 (A2) => 0111 (AP) “parity”
      – 0111 (AP) XOR 1011 (A1) => 1100 (A2)
      – 0111 (AP) XOR 1100 (A2) => 1011 (A1)
    • This makes error detection and reconstruction possible!
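The same XOR arithmetic can be run as code. This is a minimal sketch that treats each block as a small integer; the helper names `parity` and `reconstruct` are made up for illustration, and real arrays apply the same operation across whole sectors.

```python
from functools import reduce
from operator import xor


def parity(blocks: list[int]) -> int:
    """XOR all blocks together to get the parity block."""
    return reduce(xor, blocks, 0)


def reconstruct(surviving_blocks: list[int], parity_block: int) -> int:
    """Rebuild the one missing data block from the survivors plus parity."""
    return parity(surviving_blocks + [parity_block])


a1, a2 = 0b1011, 0b1100
ap = parity([a1, a2])
print(f"{ap:04b}")                      # 0111

# Lose either disk and rebuild it from the other disk plus parity.
print(f"{reconstruct([a1], ap):04b}")   # 1100 (A2)
print(f"{reconstruct([a2], ap):04b}")   # 1011 (A1)
```

The same functions work unchanged for groups with more than two data blocks, as long as only one block is missing and the array knows which one.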
  • RAID4 – Block Level Striping
    • Like RAID3, but more parallel
    • Interleave data at sector level rather than bit level
    • Allows for servicing of multiple block requests by different drives
    • Still keeps all the parity information on a single drive
  • RAID4
  • Evaluation
    • Pros
      – Finally better small random access. Reads are fast!
    • Cons
      – Small writes and read-modify-writes are still slow.
    • Good for large sequential jobs, still not great for transaction processing systems.
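Small writes are slow because the parity block has to be kept in sync, but XOR means only the target data disk and the parity disk need to be touched: read the old data and old parity, then write the new data and new parity. A minimal sketch with blocks as integers follows; the function name and the sample values are illustrative assumptions.

```python
from functools import reduce
from operator import xor


def small_write_parity(old_data: int, old_parity: int, new_data: int) -> int:
    """New parity after overwriting one block, without reading the other data disks."""
    return old_parity ^ old_data ^ new_data


stripe = [0b1011, 0b1100, 0b0110]        # data blocks on three disks
parity_block = reduce(xor, stripe, 0)    # stored on the check disk

# Overwrite the block on disk 1: 2 reads (old data, old parity), 2 writes (new data, new parity).
new_block = 0b0101
new_parity = small_write_parity(stripe[1], parity_block, new_block)
stripe[1] = new_block

# The shortcut agrees with recomputing parity from every data disk.
assert new_parity == reduce(xor, stripe, 0)
print(f"{new_parity:04b}")
```

In RAID4 every such update lands on the same parity disk, so concurrent small writes queue up behind it.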
  • RAID5 – Block Level Striping with Distributed Parity
    • Instead of checksums on a single disk, we distribute them across all disks.
    • Allows us to support multiple writes per group
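One way to picture distributed parity is to rotate the parity block to a different disk on each stripe. The rotation rule below is an assumption chosen for illustration; the slides do not specify a layout, and real implementations use several different ones.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Disk holding parity for a stripe, rotating right to left (one hypothetical layout)."""
    return (n_disks - 1 - stripe) % n_disks


def layout(n_stripes: int, n_disks: int) -> list[list[str]]:
    """Per-stripe row of labels: 'P' on the parity disk, 'D' on data disks."""
    return [
        ["P" if disk == parity_disk(s, n_disks) else "D" for disk in range(n_disks)]
        for s in range(n_stripes)
    ]


for s, row in enumerate(layout(4, 4)):
    print(f"stripe {s}: {' '.join(row)}")
# stripe 0: D D D P
# stripe 1: D D P D
# stripe 2: D P D D
# stripe 3: P D D D
```

Because consecutive stripes keep parity on different disks, two small writes to different stripes can often proceed in parallel instead of queuing on one check disk.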
  • RAID5
  • Evaluation
    • Pros
      – Really good usable storage
      – Finally decent small random data access performance across the board!
    • Cons
      – Slightly worse write performance, data must be written to two disks simultaneously
    • Finally, a system that works well for both applications!
  • sounds complicated, how well does it work?
  • As a Whole
    • RAID has many different levels that achieve different tradeoffs in reliability and performance
    • Almost all of them will, for some (or many) use cases, outperform a SLED for the same cost.
  • Read-Modify-Write Per Disk Performance
  • wow, RAID sounds awesome, how’s it holding up?
  • Arriving back in 2012 now…
  • RAID has held up remarkably well
    • Data centers around the world use RAID technology.
    • The small, inexpensive disk is the de facto standard of storage
    • The ideas developed for RAID have been applied to many not-RAID things
  • Some open questions
    • What will become of RAID as new, super fast storage media start to become cost effective?
    • How does it fit in with massive internet-scale storage farms?
  • Take Aways
    • RAID offers a significant advantage over SLED for the same cost
      – RAID5 offers 10x improvement in performance, reliability, and power consumption while reducing size of array.
    • RAID allows for modular growth (add more disks)
    • Cost effective option to meet the challenge of exponential growth in processor and memory speeds
  • References
    • “A Case for Redundant Arrays of Inexpensive Disks” by David A Patterson, Garth Gibson, and Randy H Katz
    • “RAID: A Personal Recollection of How Storage Became a System” by Randy H Katz
    • Slides by David Luo and Ramasubramanian K.
    • Images generously borrowed from Wikipedia <http://en.wikipedia.org/wiki/RAID>
  • Thank you!