HBase Accelerated:
In-Memory Flush and Compaction
Eshcar Hillel, Anastasia Braginsky, Edward Bortnikov | Hadoop Summit Europe, April 13, 2016
Motivation: Dynamic Content Processing on Top of HBase
 Real-time content processing pipelines
› Store intermediate results in persistent map
› Notification mechanism is prevalent
 Storage and notifications on the same platform
 Sieve – Yahoo’s real-time content management platform
[Diagram: Sieve pipeline – a crawl schedule drives Crawl → Docproc → Link Analysis; a Content Queue and Links feed Serving; processing runs on Apache Storm, storage on Apache HBase]
Notification Mechanism is Like a Sliding Window
 Small working set, but not necessarily a FIFO queue
 Short life-cycle – a message is deleted after it is processed
 High-churn workload – message state can be updated
 Frequent scans to consume messages
[Diagram: multiple producers enqueue messages; multiple consumers scan and consume them]
HBase Accelerated: Mission Definition
Goal
Real-time performance and less I/O in persistent KV-stores
How?
Exploit redundancies in the workload to eliminate duplicates in memory
Reduce the number of new files
Reduce the write amplification effect (overall I/O)
Reduce retrieval latencies
How much can we gain?
The gain is proportional to the amount of duplication in the workload
HBase Data Model
 Persistent map
› Atomic get, put, and snapshot scans
 Table – a sorted set of rows
› Each row is uniquely identified by a key
 Region – unit of horizontal partitioning
 Column family – unit of vertical partitioning
 Store – implements a column family within a region
 In production a region server (RS) can manage 100s-1000s of regions
› Regions share resources: memory, CPU, disk
[Diagram: a table of rows (key, Col1, Col2, Col3) partitioned horizontally into regions and vertically into column families]
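The model above is essentially a sorted map of maps. A toy Python sketch of it (class and method names are illustrative, not the HBase client API):

```python
# Toy sketch of the HBase data model: a persistent map where each row,
# identified by a unique key, holds {column family: {qualifier: value}}.
# SimpleTable and its methods are illustrative names, not the HBase API.
class SimpleTable:
    def __init__(self, families):
        self.families = set(families)   # column families are fixed up front
        self.rows = {}                  # row key -> {family: {qualifier: value}}

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

    def get(self, row, family, qualifier):
        return self.rows.get(row, {}).get(family, {}).get(qualifier)

    def scan(self, start, stop):
        """Scan rows in sorted key order over the range [start, stop)."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]

table = SimpleTable(families=["cf"])
table.put("row1", "cf", "col1", "v1")
table.put("row2", "cf", "col1", "v2")
```

A real region would hold one such sorted structure per store (column family); the sketch collapses this to a single map for brevity.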
Log-Structured Merge (LSM) Key-Value Store Approach
 Random I/O → Sequential I/O
 Random writes absorbed in RAM (Cm)
› Low latency
› Built-in versioning
› Write-ahead log (WAL) for durability
 Periodic batch merge to disk
› High-throughput I/O
 Disk components (Cd) are immutable
› Periodic merge & compact eliminates duplicate versions and tombstones
 Random reads served from Cm or the Cd cache
[Diagram: memory component Cm above disk components C1 and C2, which are merged into Cd]
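The LSM cycle above can be illustrated with a toy Python store. Thresholds and names are invented for the example; a real LSM engine works with sorted files and block I/O, not dicts:

```python
# Toy LSM store: random writes are absorbed in the mutable component Cm;
# when Cm reaches a threshold it is written out as an immutable, sorted
# component; compaction merges components and drops duplicate versions.
class TinyLSM:
    def __init__(self, flush_threshold=2):
        self.cm = {}     # mutable in-memory component (Cm)
        self.cds = []    # immutable "disk" components (Cd), newest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.cm[key] = value                  # absorbed in RAM, low latency
        if len(self.cm) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One sequential write of a sorted, immutable component.
        self.cds.insert(0, dict(sorted(self.cm.items())))
        self.cm = {}

    def get(self, key):
        if key in self.cm:                    # newest data first
            return self.cm[key]
        for cd in self.cds:                   # then components, newest to oldest
            if key in cd:
                return cd[key]
        return None

    def compact(self):
        # Merge-sort all disk components; newer versions shadow older ones.
        merged = {}
        for cd in reversed(self.cds):         # oldest first, newest overwrites
            merged.update(cd)
        self.cds = [merged]

lsm = TinyLSM()
lsm.put("a", 1)
lsm.put("b", 2)     # reaching the threshold triggers a flush
lsm.put("a", 9)     # a newer version of "a"
lsm.put("c", 3)     # second flush: two immutable components now exist
before_compact = len(lsm.cds)
lsm.compact()       # merges components, dropping the stale version of "a"
```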
HBase Writes
 Random writes absorbed in the active segment (Cm)
 When the active segment is full
› It becomes immutable (a snapshot, C'm)
› A new mutable (active) segment serves writes
› The snapshot is flushed to disk and the WAL is truncated
 Compaction reads a few files, merge-sorts them, and writes back new files
[Diagram: writes append to the WAL and Cm; prepare-for-flush swaps Cm for C'm; flush writes C'm to Cd on HDFS]
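The prepare-for-flush/flush sequence can be sketched as follows. Concurrency, file formats, and the real WAL protocol are omitted; names are illustrative:

```python
# Sketch of the HBase write path: every write is appended to the WAL for
# durability, then applied to the active segment (Cm). prepare_for_flush
# swaps in a fresh active segment; once the snapshot (C'm) is safely on
# disk, the WAL prefix it covers can be truncated.
class WriteStore:
    def __init__(self):
        self.wal = []          # write-ahead log entries
        self.cm = {}           # active (mutable) segment
        self.snapshot = None   # C'm: immutable segment being flushed
        self.flush_point = 0   # WAL entries covered by the snapshot
        self.disk = []         # flushed files (Cd)

    def write(self, key, value):
        self.wal.append((key, value))   # durable before the memory update
        self.cm[key] = value

    def prepare_for_flush(self):
        # In HBase this swap is atomic; writes continue into the new Cm.
        self.snapshot, self.cm = self.cm, {}
        self.flush_point = len(self.wal)

    def flush_to_disk(self):
        self.disk.append(dict(sorted(self.snapshot.items())))
        self.snapshot = None
        del self.wal[:self.flush_point]  # truncate: data is now persistent

store = WriteStore()
store.write("k1", "v1")
store.prepare_for_flush()
store.write("k2", "v2")   # lands in the new active segment
store.flush_to_disk()
```

Note how only the WAL prefix covered by the snapshot is truncated; entries for writes that arrived after the swap must survive until the next flush.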
HBase Reads
 Random reads served from Cm, C'm, or the Cd block cache
 When data piles up on disk
› Hit ratio drops
› Retrieval latency goes up
 Compaction re-writes small files into fewer, bigger files
› Causes replication-related network and disk I/O
[Diagram: reads consult Cm, C'm, and the block cache over Cd on HDFS]
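A read (and especially a scan) must merge several sorted components and keep only the newest version of each key. A toy sketch of that merge, using the standard-library `heapq.merge`:

```python
import heapq

# Sketch of a scan across several sorted components (Cm, C'm, Cd files):
# entries are merge-sorted and only the newest version of each key is
# returned. Components are ordered newest first; ties on a key are broken
# by tagging each entry with its component's age.
def merged_scan(components):
    """components: sorted (key, value) lists, newest component first."""
    tagged = [[(k, age, v) for k, v in comp]
              for age, comp in enumerate(components)]
    last_key = object()   # sentinel that equals no real key
    for key, age, value in heapq.merge(*tagged):
        if key != last_key:              # first hit per key is the newest
            yield key, value
            last_key = key

cm = [("a", "a-new"), ("c", "c1")]       # in-memory component, newest
cd = [("a", "a-old"), ("b", "b1")]       # on-disk component, older
```

The more components there are, the more work each scan does, which is why piling up small files hurts read latency.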
New Design: In-Memory Flush and Compaction
 New compaction pipeline
› The active segment is flushed to the pipeline
› Pipeline segments are compacted in memory
› Flush to disk only when needed
[Diagram: Cm is in-memory-flushed into the compaction pipeline; the pipeline is flushed to disk (Cd on HDFS) only when needed; WAL and block cache as before]
Trade read cache (BlockCache) for write cache (compaction pipeline)
More CPU cycles for less I/O
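A minimal sketch of the compacting memstore idea. All thresholds and names are invented for illustration; the real implementation tracks sizes in bytes and runs compaction on a background thread:

```python
# Toy sketch of the compacting memstore: the active segment is flushed
# into an in-memory compaction pipeline; pipeline segments are merged in
# RAM so duplicate versions never reach disk; a disk flush happens only
# when the compacted pipeline still exceeds a size limit.
class CompactingMemstore:
    def __init__(self, segment_size=2, flush_limit=4):
        self.active = {}
        self.pipeline = []        # immutable in-memory segments, newest first
        self.disk = []            # files flushed to "HDFS"
        self.segment_size = segment_size
        self.flush_limit = flush_limit

    def put(self, key, value):
        self.active[key] = value
        if len(self.active) >= self.segment_size:
            self._in_memory_flush()

    def _in_memory_flush(self):
        self.pipeline.insert(0, self.active)
        self.active = {}
        if len(self.pipeline) > 1:
            self._in_memory_compact()
        if sum(len(s) for s in self.pipeline) >= self.flush_limit:
            self._flush_to_disk()

    def _in_memory_compact(self):
        merged = {}
        for seg in reversed(self.pipeline):   # oldest first, newest overwrites
            merged.update(seg)                # duplicates eliminated in RAM
        self.pipeline = [merged]

    def _flush_to_disk(self):
        self.disk.append(dict(sorted(self.pipeline[0].items())))
        self.pipeline = []

    def get(self, key):
        for seg in [self.active] + self.pipeline:
            if key in seg:
                return seg[key]
        for f in reversed(self.disk):
            if key in f:
                return f[key]
        return None

m = CompactingMemstore()
for v in (1, 2):          # high-churn updates to the same two keys
    m.put("a", v)
    m.put("b", v)
churn_disk_files = len(m.disk)   # still 0: duplicates never reached disk
m.put("c", 3)
m.put("d", 3)             # now the compacted pipeline hits the limit
```

This is the gain the deck claims: on a high-churn working set, in-memory compaction absorbs the duplicate versions, so flushes and disk compactions become rarer.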
Evaluation Settings: In-Memory Working Set
 YCSB: compares the compacting vs. the default memstore
 Small cluster: 3 HDFS nodes on a single rack, 1 RS
› 1GB heap space, MSLAB enabled (2MB chunks)
› Default: 128MB flush size, 100MB block cache
› Compacting: 192MB flush size, 36MB block cache
 High-churn workload, small working set
› 128,000 records, 1KB value field
› 10 threads running 5 million operations, various key distributions
› 50% reads, 50% updates, target 1,000 ops
› 1% (long) scans, 99% updates, target 500 ops
 Measure average latency over time
› Latencies accumulated over intervals of 10 seconds
Evaluation Results: Read Latency (Zipfian Distribution)
[Graph: read latency over time; annotations mark a flush to disk, a disk compaction, and the point where the data fits into the cache]
Evaluation Results: Read Latency (Uniform Distribution)
[Graph: read latency over time; annotation marks a region split]
Evaluation Results: Scan Latency (Uniform Distribution)
Evaluation Settings: Handling Tombstones
 YCSB: compares the compacting vs. the default memstore
 Small cluster: 3 HDFS nodes on a single rack, 1 RS
› 1GB heap space, MSLAB enabled (2MB chunks), 128MB flush size, 64MB block cache
› Default: minimum 4 files for compaction
› Compacting: minimum 2 files for compaction
 High-churn workload, small working set with deletes
› 128,000 records, 1KB value field
› 10 threads running 5 million operations, various key distributions
› 40% reads, 40% updates, 20% deletes (with a 50,000-update head start), target 1,000 ops
› 1% (long) scans, 66% updates, 33% deletes (with head start), target 500 ops
 Measure average latency over time
› Latencies accumulated over intervals of 10 seconds
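The delete-heavy workload exercises tombstone handling: a delete writes a short-lived marker that shadows older versions, and compaction can drop both the shadowed versions and (when every segment takes part, i.e. a major compaction) the tombstone itself. A toy sketch, with invented names:

```python
TOMBSTONE = object()   # sentinel delete marker

# Toy sketch of tombstone handling: a delete writes a marker that shadows
# older versions of the key; compaction drops the shadowed versions and,
# because *all* segments take part in this compaction, the tombstone too.
class SegmentStore:
    def __init__(self):
        self.segments = [{}]            # newest first; segments[0] is active

    def put(self, key, value):
        self.segments[0][key] = value

    def delete(self, key):
        self.segments[0][key] = TOMBSTONE

    def seal(self):
        self.segments.insert(0, {})     # freeze the active segment

    def get(self, key):
        for seg in self.segments:       # newest version (or tombstone) wins
            if key in seg:
                return None if seg[key] is TOMBSTONE else seg[key]
        return None

    def compact(self):
        merged = {}
        for seg in reversed(self.segments):   # oldest first, newest overwrites
            merged.update(seg)
        # Dropping tombstones is safe only because every segment was merged.
        self.segments = [{k: v for k, v in merged.items()
                          if v is not TOMBSTONE}]

s = SegmentStore()
s.put("msg1", "payload")
s.seal()
s.delete("msg1")     # the consumer processed the message
s.compact()
```

In the notification workload, messages are deleted shortly after being written, so in-memory compaction can often cancel the put/delete pair before anything is flushed.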
Evaluation Results: Read Latency (Zipfian Distribution)
(total 2 flushes and 1 disk compaction)
(total 15 flushes and 4 disk compactions)
Evaluation Results: Read Latency (Uniform Distribution)
(total 3 flushes and 2 disk compactions)
(total 15 flushes and 4 disk compactions)
Evaluation Results: Scan Latency (Zipfian Distribution)
Work in Progress: Memory Optimizations
 Current design
› Data stored in flat buffers, index is a skip-list
› All memory allocated on-heap
 Optimizations
› Flat layout for the index of immutable segments
› Manage (allocate, store, release) data buffers off-heap
 Pros
› Locality in access to index
› Reduce memory fragmentation
› Significantly reduce GC work
› Better utilization of memory and CPU
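The flat-layout optimization can be illustrated briefly: once a segment is immutable, its concurrent skip-list index can be replaced by a plain sorted array searched with binary search, which improves locality and removes per-node overhead. A Python sketch (`bisect` stands in for the binary search over a flat buffer; names are illustrative):

```python
import bisect

# Sketch of the flat-index idea: an immutable segment needs no concurrent
# skip-list, so its index can be a plain sorted array with binary search.
# This improves access locality and avoids per-node allocation overhead.
class FlatSegment:
    def __init__(self, mutable_segment):
        # Built once, at the moment the segment becomes immutable.
        self.keys = sorted(mutable_segment)
        self.values = [mutable_segment[k] for k in self.keys]

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

seg = FlatSegment({"b": 2, "a": 1, "c": 3})
```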
Status: Umbrella JIRA HBASE-14918
 HBASE-14919, HBASE-15016, HBASE-15359 – infrastructure refactoring
› Status: committed
 HBASE-14920 new compacting memstore
› Status: pre-commit
 HBASE-14921 memory optimizations (memory layout, off-heaping)
› Status: under code review
Summary
 Feature intended for HBase 2.0.0
 New design pros over default implementation
› Predictable retrieval latency by serving (mainly) from memory
› Less compaction on disk reduces write amplification effect
› Less disk I/O and network traffic reduces load on HDFS
 We would like to thank the reviewers
› Michael Stack, Anoop Sam John, Ramkrishna S. Vasudevan, Ted Yu
Thank You
Evaluation Results: Write Latency
