Apache HBase, Accelerated: In-Memory Flush and Compaction

Eshcar Hillel and Anastasia Braginsky (Yahoo!)

Real-time HBase application performance depends critically on the amount of I/O in the datapath. Here we’ll describe an optimization of HBase for high-churn applications that frequently insert/update/delete the same keys, such as high-speed queuing and e-commerce.

  1. 1. HBase Accelerated: In-Memory Flush and Compaction. Eshcar Hillel, Anastasia Braginsky, Edward Bortnikov | HBaseCon, San Francisco, May 24, 2016
  2. 2. Outline 2  Background  In-Memory Compaction › Design & Evaluation  In-Memory Index Reduction › Design & Evaluation
  3. 3. Motivation: Dynamic Content Processing on Top of HBase 3  Real-time content processing pipelines › Store intermediate results in a persistent map › Notification mechanism is prevalent  Storage and notifications on the same platform  Sieve – Yahoo’s real-time content management platform [pipeline diagram: Crawl → Docproc → Link Analysis, with crawl-schedule, content, and links queues; processing in Apache Storm, serving from Apache HBase]
  4. 4. Notification Mechanism is Like a Sliding Window 4  Small working set, but not necessarily a FIFO queue  Short life-cycle: a message is deleted after it is processed  High-churn workload: message state can be updated  Frequent scans to consume messages
  5. 5. HBase Accelerated: Mission Definition 5 Goal: Real-time performance in persistent KV-stores How: Use less in-memory space → less I/O
  6. 6. HBase Accelerated: Two Base Ideas 6 In-Memory Compaction  Exploit redundancies in the workload to eliminate duplicates in memory  Gain is proportional to the duplicate ratio In-Memory Index Reduction  Reduce the index memory footprint – less overhead per cell  Gain is proportional to the cell size Prolong in-memory lifetime before flushing to disk  Reduce the number of new files  Reduce the write amplification effect (overall I/O)  Reduce retrieval latencies
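
To make the duplicate-elimination idea concrete, here is a minimal Java sketch under simplifying assumptions (single column, no multi-versioning or delete markers, illustrative class name, not the actual HBase code): keeping only the latest cell per key means the amount of data that eventually reaches disk shrinks in proportion to the duplicate ratio.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Illustrative sketch: eliminating duplicates in memory. With N updates
 * spread over K distinct keys, only K cells survive, so the gain is
 * proportional to the duplicate ratio (N - K) / N.
 */
public class DuplicateElimination {

    /** Keeps only the latest value per row key; later entries in the list win. */
    static NavigableMap<String, String> compactInMemory(List<Map.Entry<String, String>> updates) {
        NavigableMap<String, String> latest = new TreeMap<>();
        for (Map.Entry<String, String> u : updates) {
            latest.put(u.getKey(), u.getValue()); // overwrites the older version of the same key
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> updates = List.of(
            new SimpleEntry<>("msg-1", "NEW"),
            new SimpleEntry<>("msg-1", "PROCESSING"),
            new SimpleEntry<>("msg-2", "NEW"),
            new SimpleEntry<>("msg-1", "DONE"));
        // 4 raw updates collapse to 2 cells: a 50% duplicate ratio means
        // roughly half as much data eventually needs to be flushed to disk.
        System.out.println(compactInMemory(updates)); // {msg-1=DONE, msg-2=NEW}
    }
}
```
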
  7. 7. Outline 7  Background  In-Memory Compaction › Design & Evaluation  In-Memory Index Reduction › Design & Evaluation In-Memory Compaction Design
  8. 8. HBase Writes 9  Random writes are absorbed in the active segment (Cm)  When the active segment is full › It becomes an immutable segment (snapshot) › A new mutable (active) segment serves writes › The snapshot is flushed to disk and the WAL is truncated  On-disk compaction reads a few files, merge-sorts them, and writes back new files [diagram: write → Cm (memory), prepare-for-flush → C’m, flush → Cd (HDFS); WAL]
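
A rough model of the baseline write path described on this slide, with illustrative names rather than the real HBase MemStore classes: writes fill a mutable active segment backed by a skip-list; when it reaches the flush size it becomes an immutable snapshot, a fresh active segment takes over, and the snapshot is written out as a new file, after which the WAL can be truncated.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Simplified model of the baseline write path (illustrative names only). */
public class BaselineMemStore {
    private final long flushSizeBytes;
    private ConcurrentNavigableMap<String, byte[]> active = new ConcurrentSkipListMap<>();
    private long activeSizeBytes = 0;

    public BaselineMemStore(long flushSizeBytes) {
        this.flushSizeBytes = flushSizeBytes;
    }

    /** Absorbs a random write into the active segment (Cm). */
    public synchronized void put(String rowKey, byte[] value) {
        active.put(rowKey, value);
        activeSizeBytes += rowKey.length() + value.length;
        if (activeSizeBytes >= flushSizeBytes) {
            prepareForFlushAndFlush();
        }
    }

    /** Snapshot the active segment (C'm), install a new one, flush to disk. */
    private void prepareForFlushAndFlush() {
        ConcurrentNavigableMap<String, byte[]> snapshot = active;
        active = new ConcurrentSkipListMap<>();   // new mutable segment serves writes
        activeSizeBytes = 0;
        writeNewFile(snapshot);                   // every flush creates a new on-disk file (Cd)
        truncateWal();                            // snapshotted entries are now durable on disk
    }

    private void writeNewFile(ConcurrentNavigableMap<String, byte[]> snapshot) {
        System.out.println("flushing " + snapshot.size() + " cells to a new file");
    }

    private void truncateWal() {
        System.out.println("WAL truncated");
    }
}
```
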
  9. 9. HBase Reads 10  Random reads are served from Cm, C’m, or Cd (via the block cache)  When data piles up on disk › Hit ratio drops › Retrieval latency goes up  Compaction re-writes small files into fewer, bigger files › Causes replication-related network and disk I/O [diagram: read → Cm, C’m (memory), block cache, Cd (HDFS)]
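
For completeness, a matching sketch of the read path (again illustrative, with the block cache and on-disk files reduced to plain in-memory maps): a get() probes the active segment, then the snapshot, then each on-disk file, so every extra small file on disk is another potential cache miss and disk read.

```java
import java.util.List;
import java.util.NavigableMap;

/** Simplified read path: memory first, then the on-disk files, newest first. */
public class BaselineRead {

    static byte[] get(String rowKey,
                      NavigableMap<String, byte[]> active,          // Cm
                      NavigableMap<String, byte[]> snapshot,        // C'm
                      List<NavigableMap<String, byte[]>> diskFiles) // Cd, via the block cache
    {
        byte[] v = active.get(rowKey);
        if (v == null && snapshot != null) {
            v = snapshot.get(rowKey);
        }
        if (v == null) {
            // Each additional file on disk is another lookup that may miss the cache.
            for (NavigableMap<String, byte[]> file : diskFiles) {
                v = file.get(rowKey);
                if (v != null) {
                    break;
                }
            }
        }
        return v;
    }
}
```
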
  10. 10. HBase In-Memory Compaction 12 [diagram: write → Cm (memory), in-memory-flush → compaction pipeline (C’m), flush → Cd (HDFS); WAL; block cache]
  11. 11. New Design: In-Memory Flush and Compaction 13  New compaction pipeline › The active segment is flushed to the pipeline › Pipeline segments are compacted in memory › Flush to disk only when needed [diagram: Cm (memory), in-memory-flush → compaction pipeline (C’m), prepare-for-flush, flush-to-disk → Cd (HDFS); WAL; block cache]
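
The sketch below illustrates the pipeline idea under the same simplified model (illustrative names; the real implementation is the compacting memstore tracked in HBASE-14920): an in-memory flush pushes the full active segment onto a pipeline, pipeline segments are merged in memory so duplicate keys are dropped, and data moves to disk only when a real flush is finally required.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Sketch of an in-memory compaction pipeline (illustrative names only). */
public class CompactionPipelineSketch {
    private final LinkedList<NavigableMap<String, byte[]>> pipeline = new LinkedList<>();

    /** In-memory flush: move the full active segment into the pipeline. */
    public synchronized void inMemoryFlush(NavigableMap<String, byte[]> activeSegment) {
        pipeline.addFirst(activeSegment);   // newest segment first
        compactInMemory();
    }

    /** Merge all pipeline segments into one, keeping the newest cell per key. */
    private void compactInMemory() {
        NavigableMap<String, byte[]> merged = new ConcurrentSkipListMap<>();
        // Iterate oldest to newest so newer segments overwrite older cells.
        for (int i = pipeline.size() - 1; i >= 0; i--) {
            merged.putAll(pipeline.get(i));
        }
        pipeline.clear();
        pipeline.addFirst(merged);
    }

    /** Flush to disk only when needed, e.g. under global memory pressure. */
    public synchronized List<NavigableMap<String, byte[]>> prepareForFlush() {
        List<NavigableMap<String, byte[]>> toFlush = new ArrayList<>(pipeline);
        pipeline.clear();
        return toFlush;
    }
}
```
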
  12. 12. New Design: In-Memory Flush and Compaction 14 Trade read cache (BlockCache) for write cache (compaction pipeline) [same architecture diagram, annotated with the read-cache/write-cache trade-off]
  13. 13. New Design: In-Memory Flush and Compaction 15 Trade read cache (BlockCache) for write cache (compaction pipeline); spend more CPU cycles for less I/O [same architecture diagram, annotated with the CPU-vs-I/O trade-off]
  14. 14. Outline 16  Background  In-Memory Compaction › Design & Evaluation  In-Memory Index Reduction › Design & Evaluation
  15. 15. Evaluation Settings: In-Memory Working Set 17  YCSB: compares the compacting vs. the default memstore  Small cluster: 3 HDFS nodes on a single rack, 1 RS › 1GB heap space, MSLAB enabled (2MB chunks) › Default: 128MB flush size, 100MB block cache › Compacting: 192MB flush size, 36MB block cache  High-churn workload, small working set › 128,000 records, 1KB value field › 10 threads running 5 million operations, various key distributions › 50% reads, 50% updates, target 1,000 ops/sec › 1% (long) scans, 99% updates, target 500 ops/sec  Measure average latency over time › Latencies accumulated over intervals of 10 seconds
  16. 16. Evaluation Results: Read Latency (Zipfian Distribution) 18 [chart; annotations: flush to disk, compaction, data fits into cache]
  17. 17. Evaluation Results: Read Latency (Uniform Distribution) 19 [chart; annotation: region split]
  18. 18. Evaluation Results: Scan Latency (Uniform Distribution) 20
  19. 19. Evaluation Settings: Handling Tombstones 21  YCSB: compares the compacting vs. the default memstore  Small cluster: 3 HDFS nodes on a single rack, 1 RS › 1GB heap space, MSLAB enabled (2MB chunks), 128MB flush size, 64MB block cache › Default: minimum 4 files for compaction › Compacting: minimum 2 files for compaction  High-churn workload, small working set with deletes › 128,000 records, 1KB value field › 10 threads running 5 million operations, various key distributions › 40% reads, 40% updates, 20% deletes (with a 50,000-update head start), target 1,000 ops/sec › 1% (long) scans, 66% updates, 33% deletes (with head start), target 500 ops/sec  Measure average latency over time › Latencies accumulated over intervals of 10 seconds
  20. 20. Evaluation Results: Read Latency (Zipfian Distribution) 22 (total 2 flushes and 1 disk compaction) (total 15 flushes and 4 disk compactions)
  21. 21. Evaluation Results: Read Latency (Uniform Distribution) 23 (total 3 flushes and 2 disk compactions) (total 15 flushes and 4 disk compactions)
  22. 22. Evaluation Results: Scan Latency (Zipfian Distribution) 24
  23. 23. Outline 25  Background  In-Memory Compaction › Design & Evaluation  In-Memory Index Reduction › Design & Evaluation In-Memory Index Reduction Design
  24. 24. New Design: Effective In-Memory Representation 26 [architecture diagram: Cm (memory), in-memory-flush → compaction pipeline (C’m), prepare-for-flush, flush-to-disk → Cd (HDFS); WAL; block cache]
  25. 25. New Design: Effective In-Memory Representation 27 [same architecture diagram]
  26. 26. Segment for Dynamic Updates 28 [diagram: cells A–G stored in MSLAB chunks, referenced by the segment index]
  27. 27. Exploit the Immutability of a Segment after Compaction 29  Current design › Data is stored in flat buffers, the index is a skip-list › All memory is allocated on-heap  New design: flat layout for the immutable segment’s index › Less overhead per cell › Manage (allocate, store, release) data buffers off-heap  Pros › Locality in access to the index › Reduced memory fragmentation › Significantly less GC work › Better utilization of memory and CPU
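
A minimal sketch of the flattening step, assuming the simplified string-keyed segments used in the earlier sketches (illustrative, not the actual cell-array code): once a segment is immutable, its skip-list index can be copied into a plain sorted array and served with binary search, removing the per-cell skip-list node overhead and keeping index entries adjacent in memory.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Flat, read-only index for an immutable segment (illustrative sketch). */
public class FlatImmutableIndex {
    private final String[] keys;     // sorted row keys
    private final byte[][] values;   // values; real segments keep cell data in MSLAB chunks

    /** Flatten a skip-list-indexed segment into array form. */
    public FlatImmutableIndex(ConcurrentSkipListMap<String, byte[]> segment) {
        int n = segment.size();
        keys = new String[n];
        values = new byte[n][];
        int i = 0;
        for (Map.Entry<String, byte[]> e : segment.entrySet()) { // entries arrive in sorted key order
            keys[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
    }

    /** Point lookup by binary search over the flat index. */
    public byte[] get(String rowKey) {
        int pos = Arrays.binarySearch(keys, rowKey);
        return pos >= 0 ? values[pos] : null;
    }
}
```
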
  28. 28. Read-Only Segment 30 [diagram: cells A–G stored in MSLAB chunks, referenced by a flattened (array-based) index]
  29. 29. New Design: Effective In-Memory Representation 31 Are there redundancies to compact? › No: flatten the index – less overhead per cell › Yes: flatten the index & compact – fewer cells & less overhead per cell [architecture diagram: Cm (memory), in-memory-flush → compaction pipeline (C’m), prepare-for-flush, flush-to-disk → Cd (HDFS); WAL]
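
A sketch of this decision, where the duplicate estimate and the 10% threshold are assumptions for illustration rather than the actual HBase heuristics:

```java
/** Choose between flattening only and flattening plus in-memory compaction. */
public class InMemoryCompactionPolicy {
    private static final double MIN_DUPLICATE_FRACTION = 0.1; // assumed threshold

    enum Action { FLATTEN_INDEX, FLATTEN_AND_COMPACT }

    static Action choose(long totalCells, long estimatedUniqueKeys) {
        double duplicateFraction = 1.0 - (double) estimatedUniqueKeys / totalCells;
        return duplicateFraction >= MIN_DUPLICATE_FRACTION
                ? Action.FLATTEN_AND_COMPACT   // fewer cells and less overhead per cell
                : Action.FLATTEN_INDEX;        // just less overhead per cell
    }
}
```
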
  30. 30. Evaluation Results: Read Latency 1K Cell (Small Cache) 32 [chart: Uniform (50% reads / 50% writes) read latency (µs) over time (seconds), skip-list based compaction vs. cell-array based compaction]
  31. 31. Evaluation Results: Read Latency 100Byte Cell (Small Cache) 33 [chart: Zipfian (50% reads / 50% writes) read latency (µs) over time (seconds), skip-list based compaction vs. cell-array based compaction]
  32. 32. Evaluation Results: Scan Latency (Uniform Distribution) 34 [chart: scan latency (µs) over time (seconds), skip-list based compaction vs. cell-array based compaction]
  33. 33. Status Umbrella Jira HBASE-14918 35  HBASE-14919 HBASE-15016 HBASE-15359 Infrastructure refactoring › Status: committed  HBASE-14920 new compacting memstore › Status: pre-commit  HBASE-14921 memory optimizations (memory layout, off-heaping) › Status: under code review
  34. 34. Summary 36  Feature intended for HBase 2.0.0  New design pros over the default implementation › Predictable retrieval latency by serving (mainly) from memory › Less compaction on disk reduces the write amplification effect › Less disk I/O and network traffic reduces the load on HDFS › New space-efficient index representation  We would like to thank the reviewers › Michael Stack, Anoop Sam John, Ramkrishna S. Vasudevan, Ted Yu
  35. 35. Evaluation Results: Write Latency 38
