HBase User Group #9: HBase and HDFS
 

HBase User Group #9: HBase and HDFS Presentation Transcript

  • 1. HBase and HDFS. Todd Lipcon, todd@cloudera.com, Twitter: @tlipcon, #hbase IRC: tlipcon. March 10, 2010
  • 2. Outline: HDFS Overview; HDFS meets HBase; Solving the HDFS-HBase problems (Small Random Reads, Single-Client Fault Tolerance, Durable Record Appends); Summary.
  • 3. HDFS Overview. What is HDFS? Hadoop’s Distributed File System, modeled after Google’s GFS: scalable, reliable data storage. All persistent HBase storage is on HDFS, so HDFS reliability and performance are key to HBase reliability and performance.
  • 4. HDFS Architecture
  • 5. HDFS Design Goals: store large amounts of data; data should be reliable; storage and performance should scale with the number of nodes. Primary use: bulk processing with MapReduce.
  • 6. Requirements for MapReduce. MR task outputs: large streaming writes of entire files. MR task inputs: medium-size partial reads. Each task usually has 1 reader and 1 writer, with 8-16 tasks per node, so DataNodes are usually servicing few concurrent clients. MapReduce can restart tasks with ease (they are idempotent).
  • 7. Requirements for HBase: all of the requirements of MapReduce, plus: constantly append small records to an edit log (the WAL); small random reads; many concurrent readers. Clients cannot restart → single-client fault tolerance is necessary.
  • 8. HDFS Requirements Matrix
        Requirement                     MR    HBase
        Scalable storage                yes   yes
        System fault tolerance          yes   yes
        Large streaming writes          yes   yes
        Large streaming reads           yes   yes
        Small random reads              -     yes
        Single client fault tolerance   -     yes
        Durable record appends          -     yes
  • 9. HDFS Requirements Matrix, marking which requirements today's HDFS handles well and which are problems
        Requirement                     MR    HBase   HDFS today
        Scalable storage                yes   yes     handled well
        System fault tolerance          yes   yes     handled well
        Large streaming writes          yes   yes     handled well
        Large streaming reads           yes   yes     handled well
        Small random reads              -     yes     problem
        Single client fault tolerance   -     yes     problem
        Durable record appends          -     yes     problem
  • 10. Solutions: ...turn that frown upside-down. Three approaches, on a spectrum from easy to hard: Configuration Tuning; HBase-side workarounds; HDFS Development/Patching.
  • 11. Small Random Reads: Configuration Tuning. HBase often has more concurrent clients than MapReduce. Typical problems: "xceiverCount 257 exceeds the limit of concurrent xcievers 256" → increase dfs.datanode.max.xcievers to 1024 (or greater); "Too many open files" → edit /etc/security/limits.conf to increase nofile to 32768.
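    A minimal sketch, assuming the Hadoop client library is on the classpath, of verifying that the raised xciever limit actually made it into the loaded configuration; the class name is illustrative, and in practice the value is set in hdfs-site.xml on each DataNode, with nofile raised in /etc/security/limits.conf for the DataNode user.

      import org.apache.hadoop.conf.Configuration;

      // Illustrative sanity check: confirm the configuration on the classpath
      // carries the raised limit before starting HBase region servers against it.
      public class XcieverCheck {
        public static void main(String[] args) {
          Configuration conf = new Configuration();
          conf.addResource("hdfs-site.xml");
          // Note the historical misspelling: the property name really is "xcievers".
          int xcievers = conf.getInt("dfs.datanode.max.xcievers", 256);
          if (xcievers < 1024) {
            System.err.println("dfs.datanode.max.xcievers = " + xcievers
                + "; raise it to 1024 or more for HBase workloads.");
          } else {
            System.out.println("dfs.datanode.max.xcievers = " + xcievers + " (OK)");
          }
        }
      }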
  • 12. Small Random Reads: HBase Features. The HBase block cache avoids the need to hit HDFS for many reads. Finer-grained synchronization in HFile reads (HBASE-2180) allows parallel clients to read data in parallel for higher throughput. Seek-and-read vs. the pread API (HBASE-1505): in current HDFS, these have different performance characteristics.
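    A minimal sketch of the two read styles the slide contrasts, assuming a sufficiently large file already in HDFS (the path and offsets are illustrative): seek-and-read moves the stream's shared position, so concurrent readers of one open stream must coordinate, while the positioned-read ("pread") overload takes an explicit offset and leaves the stream position alone.

      import java.io.IOException;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataInputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class ReadStyles {
        public static void main(String[] args) throws IOException {
          FileSystem fs = FileSystem.get(new Configuration());
          Path file = new Path("/hbase/example/hfile");   // illustrative path
          byte[] buf = new byte[64 * 1024];

          FSDataInputStream in = fs.open(file);
          try {
            // Stateful seek-and-read: moves the stream position, then reads.
            in.seek(1024 * 1024);
            in.readFully(buf, 0, buf.length);

            // Positioned read ("pread"): reads at an explicit offset without
            // touching the stream position, so parallel readers don't contend.
            in.readFully(2 * 1024 * 1024, buf, 0, buf.length);
          } finally {
            in.close();
          }
        }
      }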
  • 13. Small Random Reads: HDFS Development in Progress. Client↔DN connection reuse (HDFS-941, HDFS-380) eliminates TCP handshake latency and avoids restarting the TCP slow-start algorithm for each read. A multiplexed BlockSender (HDFS-918) reduces the number of threads and open files in the DataNode. Netty DataNode (hack in progress): non-blocking IO may be more efficient for high concurrency.
  • 14. Single-Client Fault Tolerance: what exactly do I mean? If a MapReduce task fails to write, the MR framework will restart the task; MR relies on idempotence, so task failures are not a big deal, and fault tolerance of a single client is not as important to MR. If an HBase region server fails to write, it cannot recreate the data easily, and HBase may access a single file for a day at a time → it must ride over transient errors.
  • 15. Single-Client Fault Tolerance: HDFS Patches. HDFS-127 / HDFS-927: clients used to give up after N read failures on a file, with no regard for time; this patch resets the failure count after successful reads. HDFS-630: fixes block allocation to exclude nodes the client knows to be bad; important for small clusters! Backported to 0.20 in CDH2. Various other write-pipeline recovery fixes are in 0.20.2 (HDFS-101, HDFS-793).
  • 16. Durable Record Appends: what exactly is the infamous sync()/append()? Well, it’s really hflush(). HBase accepts writes into memory (the MemStore) and also logs them to disk (the HLog / WAL). Each write needs to be on disk before claiming durability; hflush() provides this guarantee (almost). Unfortunately, it doesn’t work in Apache Hadoop 0.20.x.
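    A minimal sketch of the write-ahead-log pattern the slide describes, assuming a Hadoop release where hflush() exists and works (it is named sync() in stock Apache Hadoop 0.20.x and, as the slide says, is broken there); the path and edit bytes are illustrative, not HBase's real HLog format.

      import java.io.IOException;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class WalFlushSketch {
        public static void main(String[] args) throws IOException {
          FileSystem fs = FileSystem.get(new Configuration());
          FSDataOutputStream wal = fs.create(new Path("/hbase/.logs/example-wal"));
          try {
            byte[] edit = "row1/cf:col=value".getBytes("UTF-8");  // illustrative edit
            wal.write(edit);
            // Push the edit down the write pipeline before acknowledging the
            // client; this is the hflush() the talk is about (sync() on 0.20.x).
            wal.hflush();
          } finally {
            wal.close();  // a closed file is durable even without hflush()
          }
        }
      }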
  • 17. Durable Record Appends: HBase Workarounds. HDFS files are durable once closed, so HBase currently rolls the edit log periodically; after a roll, previous edits are safe.
  • 18. Durable Record Appends: HBase Workarounds (continued). HDFS files are durable once closed, and HBase currently rolls the edit log periodically, so edits before the last roll are safe. But this is not much of a workaround: a crash will lose any edits since the last roll, and rolling constantly results in small files, which is bad for NameNode metadata efficiency and triggers frequent flushes, which is bad for region server efficiency.
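    A minimal sketch of the roll-the-log workaround, not HBase's actual HLog code (class, path, and naming are illustrative): edits accumulate in the current file, and roll() closes it, which is the step that makes the already-written edits durable on an HDFS without a working hflush(). Rolling more often shrinks the window of edits at risk, but produces the many small files and extra flushes the slide warns about.

      import java.io.IOException;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class RollingLogSketch {
        private final FileSystem fs;
        private final Path logDir;
        private FSDataOutputStream current;
        private long seq = 0;

        public RollingLogSketch(FileSystem fs, Path logDir) throws IOException {
          this.fs = fs;
          this.logDir = logDir;
          roll();
        }

        // Append one edit to the current log file; on 0.20-era HDFS this data is
        // not yet guaranteed durable, since hflush() is unavailable or broken.
        public synchronized void append(byte[] edit) throws IOException {
          current.write(edit);
        }

        // Close the current file and start a new one; edits in the closed file
        // are now safe, while anything written afterward is at risk until the
        // next roll.
        public synchronized void roll() throws IOException {
          if (current != null) {
            current.close();
          }
          current = fs.create(new Path(logDir, "log." + (seq++)));
        }
      }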
  • 19. Durable Record Appends: HDFS Development. On Apache trunk: HDFS-265, a new append re-implementation for 0.21/0.22. It will work great, but it is essentially a very large set of patches and is not released yet, and running unreleased Hadoop is “daring”. In 0.20.x distributions: the HDFS-200 patch fixes bugs in the old hflush() implementation; not quite as efficient as HDFS-265, but good enough and simpler. Dhruba Borthakur from Facebook is testing and improving it, and Cloudera will test and merge it into CDH3.
  • 20. Summary. HDFS’s original target workload was MapReduce, and HBase has different (harder) requirements. Engineers from the HBase team plus Facebook, Cloudera, and Yahoo are working together to improve things. Cloudera will integrate all necessary HDFS patches in CDH3, available for testing soon. Contact me if you’d like to help test in April.
  • 21. todd@cloudera.com Twitter: @tlipcon #hbase IRC: tlipcon P.S. we’re hiring!