HUG Nov 2010: HDFS Raid - Facebook

Published in: Technology

  1. HDFS RAID
     - Dhruba Borthakur (dhruba@fb.com)
     - Rodrigo Schmidt (rschmidt@fb.com)
     - Ramkumar Vadali (rvadali@fb.com)
     - Scott Chen (schen@fb.com)
     - Patrick Kling (pkling@fb.com)
  2. Agenda
     - What is RAID
     - RAID at Facebook
     - Anatomy of RAID
     - How to Deploy
     - Questions
  3. What Is RAID
     - Contrib project in MAPREDUCE
     - Default HDFS replication is 3, which is too much at petabyte scale
     - RAID helps save space in HDFS
       - Reduces replication of “source” data
       - Keeps data safe using “parity” data
  4. Reed-Solomon Erasure Codes (diagram)
     - Replication: a source file of 10 blocks stored as three full copies. Tolerates 2 missing blocks, storage cost 3x
     - Reed-Solomon: a source file of 10 blocks plus a parity file of 4 blocks (P1 to P4). Tolerates 4 missing blocks, storage cost 1.4x
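The two storage costs on this slide follow from simple block counting. The symbols r (replication factor), k (source blocks per stripe) and m (parity blocks per stripe) are introduced here for illustration, and the second formula assumes the source blocks are stored once alongside the parity blocks:

```latex
\[
\text{cost}_{\text{replication}} = r = 3,
\qquad
\text{cost}_{\text{RS}} = \frac{k + m}{k} = \frac{10 + 4}{10} = 1.4
\]
```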
  5. RAID at Facebook
     - Reduces disk usage in the warehouse
     - Currently saving about 5PB with XOR RAID
     - Gradual deployment
       - Started with a few tables
       - Now used with all tables
     - Reed-Solomon RAID under way
  6. Saving 5PB at Facebook
  7. Anatomy of RAID
     - Server-side:
       - RaidNode
       - BlockFixer
       - Block placement policy
     - Client-side:
       - DistributedRaidFileSystem
       - Raid Shell
  8. Anatomy of RAID (diagram)
     - Components: DataNodes, NameNode, RaidNode, JobTracker, Raid File System
     - Labeled interactions: obtain missing blocks, get files to raid, create parity files, fix missing blocks, recover files while reading
  12. RaidNode
     - Daemon that scans the filesystem
     - Policy file used to provide file patterns
     - Generates parity files
       - Single thread
       - Map-Reduce job
     - Reduces replication of the source file
     - One thread to purge outdated parity files
       - If the source gets deleted
     - One thread to HAR parity files
       - To reduce inode count
  13. Block Fixer
     - Reconstructs missing/corrupt blocks
     - Retrieves a list of corrupt files from the NameNode
     - Source blocks are reconstructed by “decoding”
     - Parity blocks are reconstructed by “encoding”
  14. Block Fixer
     - Bonus: Parity HARs
       - One HAR block => multiple parity blocks
       - Reconstructs all necessary blocks
  15. Block Fixer Stats
  16. Erasure Code
     - ErasureCode: abstraction for erasure code implementations
         public void encode(int[] message, int[] parity);
         public void decode(int[] data, int[] erasedLocations, int[] erasedValues);
     - Current implementations
       - XOR Code
       - Reed-Solomon Code
     - Encoder/Decoder: uses ErasureCode to integrate with the RAID framework
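As a rough illustration of the two signatures above, here is a minimal single-parity XOR sketch. It is not the HDFS RAID source. The class name `XorCodeSketch` is hypothetical, and the array layout (message symbols followed by one parity symbol in `data`) is an assumption made for this example.

```java
import java.util.Arrays;

// Hypothetical sketch, not the HDFS RAID implementation.
class XorCodeSketch {
    // Assumption: parity has length 1 and receives the XOR of all message symbols.
    public void encode(int[] message, int[] parity) {
        int acc = 0;
        for (int symbol : message) {
            acc ^= symbol;
        }
        parity[0] = acc;
    }

    // Assumption: data holds the message symbols followed by the parity symbol,
    // and erasedLocations names exactly one index of data whose value was lost.
    public void decode(int[] data, int[] erasedLocations, int[] erasedValues) {
        if (erasedLocations.length != 1) {
            throw new IllegalArgumentException("single-parity XOR recovers exactly one erasure");
        }
        int erased = erasedLocations[0];
        int acc = 0;
        for (int i = 0; i < data.length; i++) {
            if (i != erased) {
                acc ^= data[i]; // XOR of every surviving symbol
            }
        }
        erasedValues[0] = acc;
    }

    public static void main(String[] args) {
        XorCodeSketch code = new XorCodeSketch();
        int[] message = {7, 1, 12, 5};
        int[] parity = new int[1];
        code.encode(message, parity);

        // Lay out message + parity, then pretend the symbol at index 2 was lost.
        int[] data = Arrays.copyOf(message, message.length + 1);
        data[message.length] = parity[0];
        data[2] = 0;

        int[] recovered = new int[1];
        code.decode(data, new int[] {2}, recovered);
        System.out.println("recovered matches original: " + (recovered[0] == message[2]));
    }
}
```

With XOR the same loop rebuilds either a lost source symbol or a lost parity symbol, which loosely mirrors the Block Fixer's "decoding" and "encoding" cases. A Reed-Solomon code would need Galois-field arithmetic and is not shown.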
  17. Block Placement (diagram)
     - Replication = 3: 10 source blocks, each stored three times; the copies of a block are dependent blocks. Tolerates any 2 errors
     - Replication = 1, Parity Length = 4: 10 source blocks plus parity blocks P1 to P4; all 14 are dependent blocks. Tolerates any 4 errors
  18. Block Placement
     - Raid introduces a new dependency between blocks in source and parity files
     - Default block placement is bad for RAID
       - Source/Parity blocks can be on a single node/rack
       - Parity blocks could co-locate with source blocks
     - Raid Block Policy
       - Source files: After RAIDing, disperse blocks
       - Parity files: Control placement of parity blocks to avoid source blocks and other parity blocks
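The parity-file rule above can be sketched as a simple node filter. This is an illustrative toy, not the Raid Block Policy code. The class and method names are hypothetical, nodes are plain strings, and rack awareness is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: place parity blocks away from the stripe's other blocks.
class ParityPlacementSketch {
    // Picks, in candidate order, nodes that hold neither a source block of the
    // stripe nor a previously placed parity block of the same stripe.
    static List<String> placeParity(List<String> candidates,
                                    Set<String> sourceNodes,
                                    int parityCount) {
        Set<String> used = new HashSet<>(sourceNodes);
        List<String> chosen = new ArrayList<>();
        for (String node : candidates) {
            if (chosen.size() == parityCount) {
                break;
            }
            if (used.add(node)) { // add() returns false when the node is already used
                chosen.add(node);
            }
        }
        if (chosen.size() < parityCount) {
            throw new IllegalStateException("not enough distinct nodes for this stripe");
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("n1", "n2", "n3", "n4", "n5");
        Set<String> sourceNodes = new HashSet<>(Arrays.asList("n1", "n3"));
        System.out.println(placeParity(nodes, sourceNodes, 2));
    }
}
```

A real policy would also have to weigh racks and free space. The sketch only shows the "avoid source blocks and other parity blocks" constraint.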
  19. DistributedRaidFileSystem
     - A filter file system implementation
     - Allows clients to read “corrupt” source files
       - Catches BlockMissingException, ChecksumException
       - Recreates missing blocks on the fly by using parity
     - Does not fix the missing blocks
       - Only allows the reads to succeed
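The read path described above can be pictured with an in-memory toy: a plain read that fails on a missing block, wrapped by a filtered read that rebuilds the block from parity without repairing the stored data. Everything here is hypothetical (the class names, the `MissingBlock` exception standing in for BlockMissingException, and the XOR parity). It is a sketch of the idea, not the Hadoop class.

```java
// Hypothetical sketch of a read-through-parity filter, using XOR parity.
class RaidReadSketch {
    // Stand-in for Hadoop's BlockMissingException (assumption for this example).
    static class MissingBlock extends Exception {
        MissingBlock(int index) {
            super("missing block " + index);
        }
    }

    private final int[][] blocks; // a null entry models a missing block
    private final int[] parity;   // position-wise XOR of all source blocks

    RaidReadSketch(int[][] blocks, int[] parity) {
        this.blocks = blocks;
        this.parity = parity;
    }

    boolean isMissing(int index) {
        return blocks[index] == null;
    }

    // Plain read: fails when the block is missing.
    int[] readRaw(int index) throws MissingBlock {
        if (blocks[index] == null) {
            throw new MissingBlock(index);
        }
        return blocks[index];
    }

    // Filtered read: on a missing block, rebuild its contents from the parity
    // and the surviving blocks. The stored block stays missing.
    int[] read(int index) {
        try {
            return readRaw(index);
        } catch (MissingBlock e) {
            int[] rebuilt = parity.clone();
            for (int i = 0; i < blocks.length; i++) {
                if (i == index) {
                    continue;
                }
                if (blocks[i] == null) {
                    throw new IllegalStateException("XOR parity cannot recover two missing blocks");
                }
                for (int j = 0; j < rebuilt.length; j++) {
                    rebuilt[j] ^= blocks[i][j];
                }
            }
            return rebuilt;
        }
    }

    public static void main(String[] args) {
        // Original blocks were {1,2}, {3,5}, {4,8}; the middle one is now missing.
        int[][] stored = {{1, 2}, null, {4, 8}};
        int[] parity = {1 ^ 3 ^ 4, 2 ^ 5 ^ 8};
        RaidReadSketch fs = new RaidReadSketch(stored, parity);
        int[] block = fs.read(1);
        System.out.println(block[0] + "," + block[1] + " still missing: " + fs.isMissing(1));
    }
}
```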
  20. RaidShell
     - Administrator tool
     - Recover blocks
       - Reconstruct missing blocks
       - Send reconstructed block to a data node
     - Raid FSCK
       - Report corrupt files that cannot be fixed by raid
     - Handy tool as a last resort to fix blocks
  21. Deployment
     - Single configuration file “raid.xml”
       - Specifies file patterns to RAID
     - In HDFS config file
       - Specify raid.xml location
       - Specify location of parity files (default: /raid)
       - Specify FileSystem, BlockPlacementPolicy
     - Starting RaidNode
       - start-raidnode.sh, stop-raidnode.sh
     - http://wiki.apache.org/hadoop/HDFS-RAID
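For orientation, a policy file might look roughly like the fragment below. The element and property names (`srcPath`, `policy`, `targetReplication`, `metaReplication`) are recalled from the HDFS-RAID wiki linked above and should be checked against the version in use. The host, path, policy name and values are placeholders, not settings from the talk.

```xml
<configuration>
  <!-- Placeholder prefix: files under this path are candidates for RAID. -->
  <srcPath prefix="hdfs://namenode.example.com:8020/user/warehouse/">
    <policy name="example-policy">
      <property>
        <!-- Replication to apply to source files once parity exists (assumed name). -->
        <name>targetReplication</name>
        <value>2</value>
      </property>
      <property>
        <!-- Replication of the parity files themselves (assumed name). -->
        <name>metaReplication</name>
        <value>2</value>
      </property>
    </policy>
  </srcPath>
</configuration>
```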
  22. Questions?
     - http://wiki.apache.org/hadoop/HDFS-RAID
  23. Limitations
     - RAID needs files with 3 or more blocks
       - Otherwise parity blocks negate the space saving
     - Need to HAR small source files
     - Replication of 1 reduces locality for MR jobs
       - Replication of 2 is not too bad
     - It's very difficult to manage block placement of Parity HAR blocks
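The "3 or more blocks" limit can be checked with a little block counting. The sketch below assumes XOR RAID with one parity block per stripe, a stripe length of 10, and source and parity files both kept at replication 2. The slides do not state these values, so treat them as illustrative parameters.

```java
// Hypothetical arithmetic sketch: block replicas stored with and without RAID.
class RaidSavingsSketch {
    static final int STRIPE_LENGTH = 10;     // assumption: source blocks per stripe
    static final int PLAIN_REPLICATION = 3;  // HDFS default, as stated in the slides
    static final int SOURCE_REPLICATION = 2; // assumption: replication after RAIDing
    static final int PARITY_REPLICATION = 2; // assumption: replication of parity blocks

    // Replicas stored for an un-raided file.
    static int plainCost(int blocks) {
        return blocks * PLAIN_REPLICATION;
    }

    // Replicas stored for a raided file: source replicas plus one XOR parity
    // block per stripe, itself replicated.
    static int raidedCost(int blocks) {
        int stripes = (blocks + STRIPE_LENGTH - 1) / STRIPE_LENGTH; // ceiling division
        return blocks * SOURCE_REPLICATION + stripes * PARITY_REPLICATION;
    }

    public static void main(String[] args) {
        for (int blocks = 1; blocks <= 4; blocks++) {
            System.out.println(blocks + " block(s): plain=" + plainCost(blocks)
                    + " raided=" + raidedCost(blocks));
        }
    }
}
```

Under these assumptions a 1-block file costs more when raided (4 replicas versus 3), a 2-block file breaks even (6 versus 6), and savings start at 3 blocks (8 versus 9).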
  24. File Stats
