Hadoop Summit 2012 | HBase Consistency and Performance Improvements

The latest Apache HBase releases, 0.92 and 0.94, contain many correctness and performance improvements over prior releases. We discuss a couple of these improvements from a development and operations perspective. For correctness, we discuss the ACID guarantees of HBase, give a case study of problems with earlier releases, and give an overview of the implementation internals that were improved to fix the issues. For performance, we discuss recent improvements in 0.94 and how to monitor the performance of a cluster with new metrics.

  • We are going to talk about recent improvements in HBase for ACID consistency and performance. We are going to discuss customer cases, and also look at the internals of HBase to give you a taste of these issues.
  • This is what the data format looks like. How do we write it?
  • At Cloudera Support we have seen a few cases where HBase consistency can be a problem.
  • In some workflows it is desirable to bulk load data directly into HBase (ETL) instead of invoking Put() to add new records. Depending on the use case, it can also have performance advantages.
  • It was fixed in 0.92 (HBASE-4552) and backported to HBase 0.90.5 (for convenience, it is also available since CDH3u3).
  • Each read returns partial content for the same row. It can be empty data or an old version of the data.
  • It is also possible to monitor the logs and metrics before exposing the new data to users.
  • With HBASE-4552, a read-write lock was implemented so that the data is not visible to readers until the bulk upload is complete. The old method was also deprecated and a new one was implemented.
  • Once this lock is released, the data is available to the readers (Scan).
  • In our example, the system is an HBase-based email store that holds millions of emails, and an MR task runs concurrently to classify emails as spam.
  • MR users will find the counters familiar. In this example we are running a filter that scans only for a dataset of 500K records from a table with 50M rows.
  • Remember, the filter should always return 500K records.
  • Empty rows are not the only symptom of this behavior; depending on the number of versions, you can get old data too!
  • This is production, so you can’t stop the service just to try a workaround.

Transcript

  • 1. June 13, 2012. HBase Consistency and Performance Improvements. Esteban Gutierrez, Gregory Chanan. {esteban, gchanan}@cloudera.com
  • 2. Who We Are • Esteban Gutierrez – Customer Operations Engineer – Focused on HBase operations • Gregory Chanan – HBase developer – Currently focused on wire compatibility. ©2012 Cloudera, Inc. All Rights Reserved.
  • 3. Apache HBase. Apache HBase is a distributed, scalable column-oriented data store that runs on top of HDFS. It provides consistent, low latency, random read/write access.
  • 4. HBase Data Format. Columns: RowKey | header:from | header:subject | body:text. Rows: greg_email1 | sister@gmail.com | Father’s day card | <…>; greg_email2 | friend@gmail.com | Taco night | <…>.
  • 5. HBase Data Format. Column names are family:qualifier. [Same table as slide 4.]
  • 6. HBase Data Format. Column names are family:qualifier. [Same table as slide 4.] Column Families are a set of related columns that are physically stored together on disk.
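The logical layout on slides 4 to 6 can be pictured as nested maps: row key, then column family, then qualifier. The sketch below is plain Python, not the HBase client API; the `table` dict and the `cells_for_family` helper are hypothetical names used only for illustration.

```python
# Toy model of the logical layout: row key -> column family -> qualifier -> value.
# Plain Python dicts only; this is an illustration, not the HBase client API.
table = {
    "greg_email1": {
        "header": {"from": "sister@gmail.com", "subject": "Father's day card"},
        "body": {"text": "<…>"},
    },
    "greg_email2": {
        "header": {"from": "friend@gmail.com", "subject": "Taco night"},
        "body": {"text": "<…>"},
    },
}


def cells_for_family(table, family):
    """Return (row_key, 'family:qualifier', value) tuples for one column family,
    sorted by row key and then qualifier, to mimic a family being stored together."""
    cells = []
    for row_key in sorted(table):
        for qualifier, value in sorted(table[row_key].get(family, {}).items()):
            cells.append((row_key, f"{family}:{qualifier}", value))
    return cells


header_cells = cells_for_family(table, "header")
body_cells = cells_for_family(table, "body")
```

Grouping cells by family first is what makes it cheap to read only `header:*` columns without touching the `body` data.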
  • 7. HBase Write Path. [Diagram: HBase Client sends a Put to the HBase Server.]
  • 8. HBase Write Path. [Diagram: HBase Client, HBase Server, HLog.] (1) Write to HLog for disaster recovery.
  • 9. HBase Write Path. [Diagram: HBase Client, HBase Server, HLog, MemStore.] (1) Write to HLog for disaster recovery. (2) Write to MemStore (in-memory map).
  • 10. HBase Write Path. [Same as slide 9.]
  • 11. HBase Write Path. [Diagram: HBase Client, HBase Server, HLog, MemStore, with a second Put in flight.]
  • 12. HBase Write Path. [Diagram as on slide 11.] (1) Write to HLog for disaster recovery.
  • 13. HBase Write Path. [Diagram as on slide 11.] (1) Write to HLog for disaster recovery. (2) Write to MemStore (in-memory map).
  • 14. HBase Write Path. [Diagram as on slide 11, plus an HFile.] (1) Write to HLog for disaster recovery. (2) Write to MemStore (in-memory map). (3) Flush MemStore to disk as HFile.
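The three write-path steps can be sketched as a toy class. This is a simplified model under stated assumptions: `ToyRegion` and `flush_threshold` are hypothetical names, and flushing after a fixed number of cells is a stand-in for whatever size-based trigger the server actually uses.

```python
class ToyRegion:
    """Toy write path: append to a log, update an in-memory map, flush to a sorted file."""

    def __init__(self, flush_threshold=2):
        self.hlog = []        # write-ahead log, appended first so writes can be replayed
        self.memstore = {}    # in-memory map of (row, column) -> value
        self.hfiles = []      # flushed, immutable, sorted snapshots of the memstore
        self.flush_threshold = flush_threshold  # assumption: flush by cell count

    def put(self, row, column, value):
        self.hlog.append((row, column, value))   # 1. write to HLog
        self.memstore[(row, column)] = value     # 2. write to MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                         # 3. flush MemStore as an HFile

    def flush(self):
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


region = ToyRegion(flush_threshold=2)
region.put("greg_email1", "header:from", "sister@gmail.com")
region.put("greg_email1", "body:text", "<…>")
```

After the second `put` the memstore is empty again and one sorted file exists, while the log still holds both writes.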
  • 15. HBase Write Path: Compactions. As we write and flush, we eventually get a lot of HFiles. [Diagram: three HFiles.]
  • 16. HBase Write Path: Compactions. As we write and flush, we eventually get a lot of HFiles… [Diagram: four HFiles.] Merge these together in a “compaction”.
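Because each flushed file is already sorted, merging them is a streaming merge. The sketch below shows only that merge step, with hypothetical file contents; it does not model version pruning or any other work a real compaction may do.

```python
import heapq


def compact(hfiles):
    """Merge several already-sorted HFile-like lists into one sorted list."""
    return list(heapq.merge(*hfiles))


# Hypothetical flushed files, each sorted by (row, column).
hfile_a = [("greg_email1", "header:from", "sister@gmail.com")]
hfile_b = [("greg_email2", "header:from", "friend@gmail.com")]
hfile_c = [("greg_email1", "body:text", "<…>")]

compacted = compact([hfile_a, hfile_b, hfile_c])
```

A reader now consults one file instead of three to find the cells for `greg_email1`.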
  • 17. HBase ACID • HBase 0.90 guarantees ACID transactions within a single row, “with caveats” • HBase 0.92 guarantees ACID compliance within a single row
  • 18. What are ACID Transactions? • Atomicity – All parts of transaction complete or none complete • Consistency – Only valid data written to database • Isolation – Parallel transactions do not impact each other’s execution • Durability – Once transaction committed, it remains
  • 19. HBase ACID in 0.92 • “Any row returned by [a] scan will be a consistent view (i.e. that version of the complete row existed at some point in time)” [1] [1] http://hbase.apache.org/acid-semantics.html
  • 20. Histories from the Trenches. We have seen… • Atomic Bulk Uploads • Read ACID Compliance
  • 21. Atomic Bulk Upload • A common usage pattern in HBase is to upload data as fast as possible from external sources • HRegion.bulkLoadHFile() makes that possible
  • 22. Atomic Bulk Upload • Unfortunately importing Multiple Column Family HFiles is not an atomic operation
  • 23. Atomic Bulk Upload • Unfortunately importing Multiple Column Family HFiles was not an atomic operation
  • 24. Atomic Bulk Upload. Row 1, HRegion.bulkLoadHFile(), ≤ HBase 0.90.5. HFile1: header:to; HFile2: meta:labels; HFile3: body:text; HFile4: attach:file. Scan results over time: T1: sister@...; T2: sister@..., family; T3: sister@..., family, Hi…; T4: sister@..., family, Hi…, image/jpeg.
  • 25. Atomic Bulk Upload Workarounds • Implement application level validation of the imported data
  • 26. Atomic Bulk Upload. Row 1, HRegion.bulkLoadHFiles(), ≥ HBase 0.92. HFile1: header:to; HFile2: meta:labels; HFile3: body:text; HFile4: attach:file. [Table: scan rows T1 to T4, all empty.]
  • 27. Atomic Bulk Upload. Row 1, HRegion.bulkLoadHFiles(), ≥ HBase 0.92. HFile1: header:to; HFile2: meta:labels; HFile3: body:text; HFile4: attach:file. Scan results over time: T1 to T3: empty; T4: sister@..., family, Hi…, image/jpeg, …
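The difference between slides 24 and 27 can be reproduced with a toy region. This is a sketch, not HBase code: `ToyBulkLoadRegion` and its methods are hypothetical, and a plain `threading.Lock` stands in for the read-write lock. The `on_each_file` callback simulates a scan that happens to run between two family files.

```python
import threading


class ToyBulkLoadRegion:
    def __init__(self):
        self.families = {}            # family -> {row: value}
        self.lock = threading.Lock()  # simplified stand-in for a read-write lock

    def bulk_load_non_atomic(self, files, on_each_file=None):
        """Old behavior: each family file is visible as soon as it is added."""
        for family, data in files.items():
            self.families.setdefault(family, {}).update(data)
            if on_each_file:
                on_each_file()        # a concurrent scan could run here

    def bulk_load_atomic(self, files):
        """New behavior: all family files become visible together, under the lock."""
        with self.lock:
            for family, data in files.items():
                self.families.setdefault(family, {}).update(data)

    def scan_row(self, row):
        with self.lock:               # scans wait while an atomic load holds the lock
            return {f: d[row] for f, d in self.families.items() if row in d}


files = {
    "header": {"row1": "sister@..."},
    "meta": {"row1": "family"},
    "body": {"row1": "Hi…"},
    "attach": {"row1": "image/jpeg"},
}

old = ToyBulkLoadRegion()
seen = []
old.bulk_load_non_atomic(files, on_each_file=lambda: seen.append(old.scan_row("row1")))

new = ToyBulkLoadRegion()
new.bulk_load_atomic(files)
complete = new.scan_row("row1")
```

In the non-atomic run, `seen` grows from one family to four, matching T1 to T4 on slide 24; in the atomic run a scan can only observe the row before the load or after it.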
  • 28. Read ACID Compliance Issue • Some records missing • Results are used to update a user-facing application • Customer is not happy: “Where is my data?”
  • 29. Read ACID Compliance Symptoms. [MapReduce counters, including SPLIT_RAW_FILES.] Map-Reduce Framework, Map output records: Run 1 = 500,000.
  • 30. Read ACID Compliance Symptoms. [MapReduce counters, including SPLIT_RAW_FILES.] Map-Reduce Framework, Map output records: Run 1 = 500,000; Run 2 = 499,997.
  • 31. Read ACID Compliance Symptoms. [MapReduce counters, including SPLIT_RAW_FILES.] Map-Reduce Framework, Map output records: Run 1 = 500,000; Run 2 = 499,997; Run 3 = 500,001.
  • 32. Read ACID Compliance Symptoms. Map output records: Run 1 = 500,000; Run 2 = 499,997; Run 3 = 500,001. Columns: header:to | header:from | body:text. Rows as scanned (some cells missing): greg_email1: sister@..., greg@..., Hi…; greg_email2: sister@...; esteban_email3: esteban@..., Good news!..; esteban_email3: brother@...
  • 33. Read ACID Compliance Symptoms. Scale testing shows between 0.5% and 2% inconsistent results between runs. [Same counters and table as slide 32.]
  • 34. Read ACID Compliance • Seen only twice by Cloudera Support • Hard to detect if application level monitoring is not implemented
  • 35. Read ACID Compliance Workarounds • Re-try scan if not all CFs are present • Or use a single CF • Re-submit job if any inconsistency is found
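The first workaround, retrying when a row comes back without all of its column families, might look like the sketch below. `scan_with_retry` and the fake scanner are hypothetical; in a real job the callable would wrap whatever client call fetches the row.

```python
def scan_with_retry(scan_row, row_key, expected_families, max_attempts=3):
    """Call scan_row(row_key) until the result contains every expected family."""
    for _ in range(max_attempts):
        row = scan_row(row_key)
        if set(expected_families).issubset(row):
            return row
    raise RuntimeError(
        "row %r still incomplete after %d attempts" % (row_key, max_attempts)
    )


# Fake scanner that returns a partial row once, then the complete row.
responses = iter([
    {"header": {"to": "sister@..."}},
    {"header": {"to": "sister@..."}, "body": {"text": "Hi…"}},
])
row = scan_with_retry(lambda key: next(responses), "greg_email1", {"header", "body"})
```

This only catches missing families. It cannot detect a row that mixes an old value with a new one, which is one reason the slides treat it as a workaround.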
  • 36. Read ACID Compliance Long-Term Solution • Sometimes workarounds not possible -- SLAs! • Upgrade to 0.92+
  • 37. MVCC • HBase maintains ACID semantics using Multiversion Concurrency Control • Instead of overwriting state, create a new version of object with timestamp. Table: memStoreTs | RowKey | fam1:col1 | fam2:col2; t1 | row1 | val1 | val1.
  • 38. Multi Version Concurrency Control • HBase maintains ACID semantics using Multiversion Concurrency Control • Instead of overwriting state, create a new version of object with timestamp (“memStoreTs”). Table: memStoreTs | RowKey | fam1:col1 | fam2:col2; t2 | row1 | val2 | val2; t1 | row1 | val1 | val1. • Reads never have to block • “memStoreTs” is not externally visible! Different from external timestamp
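The idea on slides 37 and 38 can be sketched as a store that keeps every version and a read point that only advances once a write is complete. `ToyMvccStore` is a hypothetical, single-threaded model; it assumes writes complete in order and is not how the server code is structured.

```python
class ToyMvccStore:
    def __init__(self):
        self.versions = []    # list of (memstore_ts, row, {column: value})
        self.next_ts = 1
        self.read_point = 0   # highest memstore_ts whose write has completed

    def begin_put(self, row, cells):
        """Record a new version without overwriting older ones."""
        ts = self.next_ts
        self.next_ts += 1
        self.versions.append((ts, row, dict(cells)))
        return ts

    def complete_put(self, ts):
        """Make the version visible to readers (assumes in-order completion)."""
        self.read_point = max(self.read_point, ts)

    def read(self, row, read_point=None):
        """Return the newest version of the row at or below the read point."""
        if read_point is None:
            read_point = self.read_point
        visible = [v for v in self.versions if v[1] == row and v[0] <= read_point]
        return max(visible, key=lambda v: v[0])[2] if visible else None


store = ToyMvccStore()
t1 = store.begin_put("row1", {"fam1:col1": "val1", "fam2:col2": "val1"})
store.complete_put(t1)
t2 = store.begin_put("row1", {"fam1:col1": "val2", "fam2:col2": "val2"})
during = store.read("row1")   # t2 is not complete yet, so the t1 version is returned
store.complete_put(t2)
after = store.read("row1")
```

The reader never waits for the writer: it simply ignores versions above its read point, so it sees either all of `val1` or all of `val2`.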
  • 39. Review: HBase Write Path. [Diagram: HBase Client, HBase Server, HLog, MemStore, HFile.] (1) Write to HLog for disaster recovery. (2) Write to MemStore (in-memory map). (3) Flush MemStore to disk as HFile.
  • 40. Putting it together. Let’s go back to the beginning… MemStore: memstoreTs | RowKey | hdr:from | body:text; t1 | greg_email | wife | pick up kids.
  • 41. Putting it together. Let’s go back to the beginning… MemStore: memstoreTs | RowKey | hdr:from | body:text; t1 | greg_email | wife | pick up kids. And start a scan.
  • 42. Putting it together. Let’s go back to the beginning… MemStore: memstoreTs | RowKey | hdr:from | body:text; t2 | greg_email | coworker | bug report; t1 | greg_email | wife | pick up kids. And start a scan. And concurrently put.
  • 43. Putting it together. Let’s go back to the beginning… MemStore: memstoreTs | RowKey | hdr:from | body:text; t2 | greg_email | coworker | bug report; t1 | greg_email | wife | pick up kids. And start a scan. And concurrently put. Which causes a flush. HFile: RowKey | body:text; greg_email | bug report; greg_email | pick up kids.
  • 44. Putting it together. Now, scan needs to make sense of this… MemStore: memstoreTs | RowKey | hdr:from; t2 | greg_email | coworker; t1 | greg_email | wife. HFile: RowKey | body:text; greg_email | bug report; greg_email | pick up kids.
  • 45. Putting it together. Now, scan needs to make sense of this… [Same MemStore and HFile tables as slide 44.] But HFile has no timestamp!
  • 46. Putting it together. Now, scan needs to make sense of this… [Same MemStore and HFile tables as slide 44.] But HFile has no timestamp! Inconsistent Result: RowKey | hdr:from | body:text; greg_email | wife | bug report.
  • 47. Putting it together. [Same as slide 46.]
  • 48. Solution. Store the timestamp in the HFile. MemStore: memstoreTs | RowKey | hdr:from; t2 | greg_email | coworker; t1 | greg_email | wife. HFile: memStoreTs | RowKey | body:text; t2 | greg_email | bug report; t1 | greg_email | pick up kids. Correct Result: RowKey | hdr:from | body:text; greg_email | wife | pick up kids. Now we have all the information we need.
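The scenario on slides 44 to 48 can be replayed with a small function. This is an illustrative sketch: `scan_row` is hypothetical, cells are `(memstore_ts, column, value)` tuples, and a `None` timestamp models an HFile flushed without memstoreTs, which the scanner cannot filter.

```python
def scan_row(memstore, hfile, read_point):
    """Build one row from MemStore and HFile cells, each (memstore_ts, column, value).
    Cells with memstore_ts None cannot be filtered and are always treated as visible."""
    row, best_rank = {}, {}
    for ts, column, value in memstore + hfile:
        if ts is not None and ts > read_point:
            continue                      # written after the scan started
        rank = float("inf") if ts is None else ts
        if column not in row or rank > best_rank[column]:
            row[column] = value
            best_rank[column] = rank
    return row


memstore = [(2, "hdr:from", "coworker"), (1, "hdr:from", "wife")]

# Before the fix: the flushed cells carry no memstoreTs.
hfile_without_ts = [(None, "body:text", "bug report"), (None, "body:text", "pick up kids")]
inconsistent = scan_row(memstore, hfile_without_ts, read_point=1)

# After the fix: memstoreTs is stored in the HFile as well.
hfile_with_ts = [(2, "body:text", "bug report"), (1, "body:text", "pick up kids")]
consistent = scan_row(memstore, hfile_with_ts, read_point=1)
```

A scan that started at t1 mixes `wife` with `bug report` in the first case and returns the matching `wife` and `pick up kids` pair in the second.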
  • 49. Consistency • Only some of the consistency issues in 0.90 – e.g. HBASE-5121: MajorCompaction may affect scans correctness • Solution: Upgrade to 0.92/0.94
  • 50. Consistency to Performance • Initial community focus on correctness and consistency • HBase adoption growing – Number of customers – Size of deployment • Newer focus on performance
  • 51. Performance • Initial community focus on correctness and consistency • HBase adoption growing – Number of customers – Size of deployment • Newer focus on performance – 0.94 dubbed the “performance release”
  • 52. Performance Areas for Improvement • Read Path • Compactions • Write Path • HDFS level
  • 53. Performance Areas for Improvement • Read Path – Support checksums in HFile format (HBASE-5047) • Compactions – Delete out of TTL store files before compactions (HBASE-5199) • Write Path – HLog Compression (HBASE-4608) • HDFS level – Works with Hadoop 2.0 – See HBase and HDFS: Past, Present and Future • And much more!
  • 54. Performance Areas for Improvement. [Same as slide 53.]
  • 55. Read Path Performance: Checksums • HDFS stores checksum in separate file [Diagram: HFile, Checksum.] • So each file read actually requires two disk iops • HBase often bottlenecked by random disk iops
  • 56. Read Path Performance: Checksums • Solution: Store checksum in HFile block • Turn off HDFS-level checksum [Diagram: HFile, HFile Block, Chksum, Data.] • On by default (“hbase.regionserver.checksum.verify”) • Bytes per checksum (“hbase.hstore.bytes.per.checksum”) – default is 16K
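Keeping the checksum inside the block means one read returns both the data and what is needed to verify it. The sketch below illustrates that idea with CRC32 over 16K chunks; the byte layout (data followed by checksums) and the function names are assumptions for illustration, not the actual HFile block format.

```python
import zlib

BYTES_PER_CHECKSUM = 16 * 1024  # mirrors the 16K default mentioned on the slide


def write_block(data, bytes_per_checksum=BYTES_PER_CHECKSUM):
    """Return block bytes: the data followed by one 4-byte CRC32 per chunk."""
    checksums = b"".join(
        zlib.crc32(data[i:i + bytes_per_checksum]).to_bytes(4, "big")
        for i in range(0, len(data), bytes_per_checksum)
    )
    return data + checksums


def read_block(block, data_len, bytes_per_checksum=BYTES_PER_CHECKSUM):
    """Verify and return the data portion using only the block itself."""
    data, stored = block[:data_len], block[data_len:]
    for n, i in enumerate(range(0, data_len, bytes_per_checksum)):
        expected = int.from_bytes(stored[4 * n:4 * n + 4], "big")
        if zlib.crc32(data[i:i + bytes_per_checksum]) != expected:
            raise IOError("checksum mismatch in chunk %d" % n)
    return data


payload = b"x" * 40000
block = write_block(payload)
verified = read_block(block, len(payload))
```

A smaller bytes-per-checksum value localizes corruption more precisely at the cost of more checksum bytes per block.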
  • 57. Performance Areas for Improvement. [Same as slide 53.]
  • 58. Compaction Performance • Recall: Compactions • User can specify TTL per column family
  • 59. Compaction Performance • Recall: Compactions • User can specify TTL per column family • If all values in the HFile expired, delete rather than compact
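The TTL shortcut amounts to checking the newest cell in each file: if even that cell is past its TTL, the whole file can be dropped without being read and rewritten. In this sketch the `max_timestamp` field and `split_expired` helper are hypothetical names.

```python
def split_expired(store_files, ttl_seconds, now):
    """Partition store files into (expired, to_compact).
    Each store file is a dict with a 'max_timestamp' key: its newest cell, in seconds."""
    expired, to_compact = [], []
    for f in store_files:
        if now - f["max_timestamp"] > ttl_seconds:
            expired.append(f)      # every cell is past TTL: delete the file outright
        else:
            to_compact.append(f)   # at least one live cell: leave it for compaction
    return expired, to_compact


store_files = [
    {"name": "hfile_old", "max_timestamp": 100},
    {"name": "hfile_new", "max_timestamp": 900},
]
expired, to_compact = split_expired(store_files, ttl_seconds=500, now=1000)
```

Here `hfile_old` is 900 seconds old against a 500 second TTL, so it is deleted, and only `hfile_new` remains a compaction candidate.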
  • 60. Performance Areas for Improvement. [Same as slide 53.]
  • 61. HBase Performance Comparison. Test Setup: • Compare CDH4 to CDH3u4 • 5 node cluster running Yahoo Cloud Serving Benchmark (YCSB) • 5 million records • Two distributions of operations: – 100% write – 50% read, 50% write
  • 62. HBase Performance Results • 100% write workload: – 49% throughput improvement – 28% latency improvement • 50% write, 50% read workload: – 14% throughput improvement – 14% latency improvement
  • 63. HBase Performance Conclusion • Caveat: Need to run performance tests on your workload • But there is a compelling case to upgrade HBase to 0.92/0.94 and Hadoop 2.0
  • 64. Conclusion • Many consistency improvements in 0.92 / CDH4 • Performance improvements in 0.94 • 0.94 is wire compatible with 0.92, so will be in a CDH4 update
  • 65. References • HBase ACID Semantics. http://hbase.apache.org/acid-semantics.html • Apache HBase Meetup @ SU; Michael Stack. http://files.meetup.com/1350427/20120327hbase_meetup.pdf • HBase Internals; Lars Hofhansl. http://www.cloudera.com/resource/hbasecon-2012-learning-hbase-internals/ • HBase and HDFS: Past, Present, and Future; Todd Lipcon. http://www.cloudera.com/resource/hbasecon-2012-hbase-and-hdfs-past-present-future/
  • 66. Questions? Thanks for listening!