HBase: Extreme makeover
Vladimir Rodionov
Hadoop/HBase architect
Founder of BigBase.org
HBaseCon 2014
Features & Internal ...

BigBase is a read-optimized version of the HBase NoSQL data store and is fully, 100% HBase-compatible. 100% compatibility means that upgrading from HBase to BigBase, or the other way around, does not involve data migration and can even be done without stopping the cluster (via rolling restart).

HBase: Extreme makeover

  1. HBase: Extreme makeover
     Vladimir Rodionov, Hadoop/HBase architect, Founder of BigBase.org
     HBaseCon 2014, Features & Internal Track
  2. Agenda
  3. About myself
     • Principal Platform Engineer @ Carrier IQ, Sunnyvale, CA
     • Prior to Carrier IQ, I worked @ GE, eBay, Plumtree/BEA.
     • HBase user since 2009.
     • HBase hacker since 2013.
     • Areas of expertise include (but are not limited to) Java, HBase, Hadoop, Hive, large-scale OLAP/analytics, and in-memory data processing.
     • Founder of BigBase.org
  4. What?
  5. BigBase = EM(HBase)
  6. BigBase = EM(HBase)
     EM(*) = ?
  7. BigBase = EM(HBase)
     EM(*) =
  8. BigBase = EM(HBase)
     EM(*) =
     Seriously?
  9. BigBase = EM(HBase)
     EM(*) =
     Seriously?
     for HBase
     It’s a Multi-Level Caching solution
  10. Real Agenda
      • Why BigBase?
      • Brief history of the BigBase.org project
      • BigBase MLC high-level architecture (L1/L2/L3)
      • Level 1 - Row Cache.
      • Level 2/3 - Block Cache RAM/SSD.
      • YCSB benchmark results
      • Upcoming features in R1.5, 2.0, 3.0.
      • Q&A
  11. HBase
      • Still lacks some of the original BigTable features.
      • Still cannot efficiently use all available RAM.
      • No good mixed storage (SSD/HDD) support.
      • Single-level caching only. Simple.
      • HBase + Large JVM Heap (MemStore) = ?
  12. BigBase
      • Adds Row Cache and block cache compression.
      • Efficiently uses all available RAM (TBs).
      • Supports mixed storage (SSD/HDD).
      • Has Multi-Level Caching. Not that simple.
      • Will move MemStore off heap in R2.
  13. BigBase History
  14. Koda (2010)
      • Koda - Java off-heap object cache, similar to Terracotta’s BigMemory.
      • Delivers 4x more transactions …
      • 10x better latencies than BigMemory 4.
      • Compression (Snappy, LZ4, LZ4HC, Deflate).
      • Disk persistence and periodic cache snapshots.
      • Tested up to 240GB.
  15. Karma (2011-12)
      • Karma - Java off-heap BTree implementation to support fast in-memory queries.
      • Supports extra-large heaps, hundreds of millions to billions of objects.
      • Stores 300M objects in less than 10G of RAM.
      • Block Compression.
      • Tested up to 240GB.
      • Off Heap MemStore in R2.
  16. Yamm (2013)
      • Yet Another Memory Manager.
        – Pure 100% Java memory allocator.
        – Replaced jemalloc in Koda.
        – Now Koda is 100% Java.
        – Karma is the next (still on jemalloc).
        – Similar to memcached slab allocator.
      • BigBase project started (Summer 2013).
  17. BigBase Architecture
  18. MLC – Multi-Level Caching
      • HBase 0.94: Disk, JVM, RAM, LRUBlockCache
  19. MLC – Multi-Level Caching
      • HBase 0.94: Disk, JVM, RAM, LRUBlockCache
      • HBase 0.96: Disk, JVM, RAM, Bucket cache
      • One level of caching: RAM (L2)
  20. MLC – Multi-Level Caching
      • HBase 0.94: Disk, JVM, RAM, LRUBlockCache
      • HBase 0.96: Bucket cache, JVM, RAM
      • One level of caching: RAM (L2) or DISK (L3)
  21. MLC – Multi-Level Caching
      • HBase 0.94: Disk, JVM, RAM, LRUBlockCache
      • HBase 0.96: Disk, JVM, RAM, Bucket cache
      • BigBase 1.0: Block Cache L3 (SSD), JVM, RAM, Row Cache L1, Block Cache L2
  22. MLC – Multi-Level Caching
      • HBase 0.94: Disk, JVM, RAM, LRUBlockCache
      • HBase 0.96: Disk, JVM, RAM, Bucket cache
      • BigBase 1.0: JVM, RAM, Row Cache L1, Block Cache L2, Block Cache L3 (Network)
  23. MLC – Multi-Level Caching
      • HBase 0.94: Disk, JVM, RAM, LRUBlockCache
      • HBase 0.96: Disk, JVM, RAM, Bucket cache
      • BigBase 1.0: JVM, RAM, Row Cache L1, Block Cache L2, Block Cache L3 (memcached)
  24. MLC – Multi-Level Caching
      • HBase 0.94: Disk, JVM, RAM, LRUBlockCache
      • HBase 0.96: Disk, JVM, RAM, Bucket cache
      • BigBase 1.0: JVM, RAM, Row Cache L1, Block Cache L2, Block Cache L3 (DynamoDB)
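
The L1/L2/L3 layering on the slides above can be read as a lookup chain: try the row cache, then the RAM block cache, then the SSD (or other non-RAM) block cache, and only then go to disk. The Python sketch below is a toy model of that idea, not BigBase code. The class, its method names, the use of plain dicts as stand-ins for off-heap and SSD stores, and the promotion of L3 hits into L2 are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Block = Dict[str, bytes]  # a block maps row keys to row data


class MultiLevelCache:
    """Toy three-level read path: L1 caches rows, L2/L3 cache blocks."""

    def __init__(self,
                 read_block_from_disk: Callable[[int], Block],
                 block_of: Callable[[str], int]) -> None:
        self.l1_rows: Dict[str, bytes] = {}    # row cache (RAM)
        self.l2_blocks: Dict[int, Block] = {}  # block cache (RAM)
        self.l3_blocks: Dict[int, Block] = {}  # block cache (SSD stand-in)
        self.read_block_from_disk = read_block_from_disk
        self.block_of = block_of               # hypothetical row -> block index
        self.trace: List[str] = []             # level that served each read

    def get(self, row_key: str) -> bytes:
        if row_key in self.l1_rows:
            self.trace.append("L1")
            return self.l1_rows[row_key]
        block_id = self.block_of(row_key)
        if block_id in self.l2_blocks:
            self.trace.append("L2")
            block = self.l2_blocks[block_id]
        elif block_id in self.l3_blocks:
            self.trace.append("L3")
            block = self.l3_blocks[block_id]
            self.l2_blocks[block_id] = block   # assumed: promote L3 hit to L2
        else:
            self.trace.append("disk")
            block = self.read_block_from_disk(block_id)
            self.l2_blocks[block_id] = block
            self.l3_blocks[block_id] = block
        value = block[row_key]
        self.l1_rows[row_key] = value          # populate the row cache
        return value


disk = {0: {"a": b"1", "b": b"2"}, 1: {"c": b"3"}}
cache = MultiLevelCache(lambda block_id: disk[block_id],
                        lambda key: 0 if key in ("a", "b") else 1)
cache.get("a")  # cold read, served from disk
cache.get("a")  # now in the row cache
cache.get("b")  # row not cached, but its block is in L2
print(cache.trace)  # ['disk', 'L1', 'L2']
```

The third read shows why the two cache types complement each other: a row that was never requested can still be served from RAM because a neighbor in the same block was.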
  25. BigBase Row Cache (L1)
  26. Where is BigTable’s Scan Cache?
      • Scan Cache caches hot rows data.
      • Complementary to Block Cache.
      • Still missing in HBase (as of 0.98).
      • It’s very hard to implement in Java (off heap).
      • Max GC pause is ~0.5-2 sec per 1GB of heap.
      • G1 GC in Java 7 does not resolve the problem.
      • We call it Row Cache in BigBase.
  27. Row Cache vs. Block Cache
      [diagram: a row of five HFile Blocks]
  28. Row Cache vs. Block Cache
  29. Row Cache vs. Block Cache
      [diagram labels: BLOCK CACHE, ROW CACHE]
  30. Row Cache vs. Block Cache
      [diagram labels: ROW CACHE, BLOCK CACHE]
  31. Row Cache vs. Block Cache
      [diagram labels: ROW CACHE, BLOCK CACHE]
  32. BigBase Row Cache
      • Off Heap Scan Cache for HBase.
      • Cache size: 100’s of GBs to TBs.
      • Eviction policies: LRU, LFU, FIFO, Random.
      • Pure 100% - compatible Java.
      • Sub-millisecond latencies, zero GC.
      • Implemented as RegionObserver coprocessor.
      [diagram labels: Row Cache, YAMM, Codecs, Kryo SerDe, KODA]
  33. BigBase Row Cache
      • Read through cache.
      • It caches rowkey:CF.
      • Invalidates key on every mutation.
      • Can be enabled/disabled per table and per table:CF.
      • New ROWCACHE attribute.
      • Best for small rows (< block size)
      [diagram labels: Row Cache, YAMM, Codecs, Kryo SerDe, KODA]
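
The row cache semantics listed above (read-through, keyed by rowkey:CF, invalidated on every mutation) can be sketched in a few lines. The Python below is a toy in-process illustration, not the BigBase coprocessor: the class name, the `load` callback standing in for a read from HBase, and the choice of LRU (one of the four eviction policies the slide lists) are assumptions.

```python
from collections import OrderedDict
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]        # (row key, column family)
Family = Dict[str, bytes]    # qualifier -> value


class RowCache:
    """Toy read-through cache keyed by (row key, column family), LRU eviction."""

    def __init__(self, capacity: int, load: Callable[[str, str], Family]) -> None:
        self.capacity = capacity
        self.load = load                      # hypothetical backing-store read
        self.entries: "OrderedDict[Key, Family]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, row: str, cf: str) -> Family:
        key = (row, cf)
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.load(row, cf)            # read through to the store
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value

    def on_mutation(self, row: str, cf: str) -> None:
        """Invalidate the cached entry whenever the row:CF is written."""
        self.entries.pop((row, cf), None)


store = {("row1", "cf"): {"q": b"v1"}}
row_cache = RowCache(capacity=2, load=lambda r, f: dict(store[(r, f)]))
row_cache.get("row1", "cf")             # miss: loaded from the store
row_cache.get("row1", "cf")             # hit
store[("row1", "cf")] = {"q": b"v2"}    # a write reaches the store...
row_cache.on_mutation("row1", "cf")     # ...and invalidates the cached copy
latest = row_cache.get("row1", "cf")    # miss again, returns the new value
```

Invalidating rather than updating on write keeps the cache logic simple at the cost of one extra miss after each mutation, which is a reasonable trade for a read-optimized workload.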
  34. Performance-Scalability
      • GET (small rows < 100 bytes): 175K operations per second per Region Server (from cache).
      • MULTI-GET (small rows < 100 bytes): > 1M records per second (network limited) per Region Server.
      • LATENCY: 99% < 1ms (for GETs) with 100K ops.
      • Vertical scalability: tested up to 240GB (the maximum available in Amazon EC2).
      • Horizontal scalability: limited by HBase scalability.
      • No more memcached farms in front of HBase clusters.
  35. BigBase Block Cache (L2, L3)
  36. What is wrong with Bucket Cache?
      • Scalability: LIMITED
      • Multi-Level Caching (MLC): NOT SUPPORTED
      • Persistence (‘offheap’ mode): NOT SUPPORTED
      • Low latency apps: NOT SUPPORTED
      • SSD friendliness (‘file’ mode): NOT FRIENDLY
      • Compression: NOT SUPPORTED
  39. What is wrong with Bucket Cache?
      • Scalability: LIMITED
      • Multi-Level Caching (MLC): NOT SUPPORTED
      • Persistence (‘offheap’ mode): NOT SUPPORTED
      • Low latency apps: ?
      • SSD friendliness (‘file’ mode): NOT FRIENDLY
      • Compression: NOT SUPPORTED
  42. Here comes BigBase
      • Scalability: HIGH
      • Multi-Level Caching (MLC): SUPPORTED
      • Persistence (‘offheap’ mode): SUPPORTED
      • Low latency apps: SUPPORTED
      • SSD friendliness (‘file’ mode): SSD-FRIENDLY
      • Compression: SNAPPY, LZ4, LZ4HC, DEFLATE
  48. Wait, there is more …
      • Scalability: HIGH
      • Multi-Level Caching (MLC): SUPPORTED
      • Persistence (‘offheap’ mode): SUPPORTED
      • Low latency apps: SUPPORTED
      • SSD friendliness (‘file’ mode): SSD-FRIENDLY
      • Compression: SNAPPY, LZ4, LZ4HC, DEFLATE
      • Non-disk-based L3 cache: SUPPORTED
      • RAM Cache optimization: IBCO
  50. BigBase 1.0 vs. HBase 0.98
      • Row Cache (L1): BigBase YES; HBase 0.98 NO
      • Block Cache RAM (L2): BigBase YES (fully off heap); HBase 0.98 YES (partially off heap)
      • Block Cache (L3) DISK: BigBase YES (SSD-friendly); HBase 0.98 YES (not SSD-friendly)
      • Block Cache (L3) NON DISK: BigBase YES; HBase 0.98 NO
      • Compression: BigBase YES; HBase 0.98 NO
      • RAM Cache persistence: BigBase YES (both L1 and L2); HBase 0.98 NO
      • Low Latency optimized: BigBase YES; HBase 0.98 NO
      • MLC support: BigBase YES (L1, L2, L3); HBase 0.98 NO (either L2 or L3)
      • Scalability: BigBase HIGH; HBase 0.98 MEDIUM (limited by JVM heap)
  51. YCSB Benchmark
  52. Test setup (AWS)
      • HBase 0.94.15 – RS: 11.5GB heap (6GB LruBlockCache on heap); Master: 4GB heap.
      • Clients: 5 (30 threads each), collocated with Region Servers.
      • Data sets: 100M and 200M. 120GB / 240GB approximately. Only 25% fits in a cache.
      • Workloads: 100% read (read100, read200, hotspot100), 100% scan (scan100, scan200) – zipfian.
      • YCSB 0.1.4 (modified to generate compressible data). We generated compressible data (with factor of 2.5x) only for scan workloads to evaluate effect of compression in BigBase block cache implementation.
      • Common – Whirr 0.8.2; 1 (Master + Zk) + 5 RS; m1.xlarge: 15GB RAM, 4 vCPU, 4x420 HDD
      • BigBase 1.0 (0.94.15) – RS: 4GB heap (6GB off heap cache); Master: 4GB heap.
      • HBase 0.96.2 – RS: 4GB heap (6GB Bucket Cache off heap); Master: 4GB heap.
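
The "only 25% fits in a cache" figure in the setup above can be reproduced from the other numbers on the slide. The short Python calculation below is a worked check under two assumptions that the slide does not state outright: the aggregate cache is the 6GB per-server cache summed over the 5 Region Servers, and the data is uncompressed. The variable names are illustrative.

```python
# Numbers taken from the test setup slide.
region_servers = 5
cache_gb_per_server = 6
dataset_gb = {"100M": 120, "200M": 240}

# Assumption: total cache is simply per-server cache times server count.
total_cache_gb = region_servers * cache_gb_per_server

fit_fraction = {name: total_cache_gb / size_gb
                for name, size_gb in dataset_gb.items()}

for name, fraction in fit_fraction.items():
    print(f"{name}: {fraction:.1%} of the data set fits in cache")
```

Under these assumptions the 25% figure matches the 100M data set; the 200M data set fits only half as much.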
  54. Benchmark results (RPS)
      [bar chart; systems: BigBase R1.0, HBase 0.96.2, HBase 0.94.15; workloads: read100, read200, hotspot100, scan100, scan200; y-axis 0 to 16000]
      Data labels in extraction order: 11405, 6123, 5553, 6265, 4086, 3850, 15150, 3512, 2855, 3224, 1500, 709, 820, 434, 228
  55. Average latency (ms)
      [bar chart; systems: BigBase R1.0, HBase 0.96.2, HBase 0.94.15; workloads: read100, read200, hotspot100, scan100, scan200; y-axis 0 to 800]
      Data labels in extraction order: 13, 24, 27, 23, 36, 39, 10, 44, 52, 48, 102, 223, 187, 375, 700
  56. 95% latency (ms)
      [bar chart; systems: BigBase R1.0, HBase 0.96.2, HBase 0.94.15; workloads: read100, read200, hotspot100, scan100, scan200; y-axis 0 to 1000]
      Data labels in extraction order: 51, 91, 100, 88, 124, 138, 38, 152, 197, 175, 405, 950, 729
  57. 99% latency (ms)
      [bar chart; systems: BigBase R1.0, HBase 0.96.2, HBase 0.94.15; workloads: read100, read200, hotspot100, scan100, scan200; y-axis 0 to 900]
      Data labels in extraction order: 133, 190, 213, 225, 304, 338, 111, 554, 632, 367, 811
  59. YCSB 100% Read
      [bar chart: per server, BigBase R1.0 vs. HBase 0.94.15, data sets 50M, 100M, 200M; y-axis 0 to 4000]
      Data labels (BigBase R1.0 / HBase 0.94.15): 50M: 3621 / 1308; 100M: 2281 / 1111; 200M: 1253 / 770
      • 50M = 2.77X
      • 100M = 2.05X
      • 200M = 1.63X
      • 50M = 40% fits cache
      • 100M = 20% fits cache
      • 200M = 10% fits cache
      • What is the maximum?
      • ~ 75X (hotspot 2.5/100)
      • 56K (BB) vs. 750 (HBase)
      • 100% in cache
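
The speedup factors on this slide follow directly from the chart's data labels. The Python check below assumes that the fused label `11111253` is really two values, 1111 and 1253, and that the labels come in (BigBase, HBase 0.94.15) pairs per data set; the fact that the computed ratios match the slide's 2.77X, 2.05X and 1.63X supports that reading. Variable names are illustrative.

```python
# Per-server throughput as (BigBase R1.0, HBase 0.94.15), read off the chart
# labels under the pairing assumption described above.
rps = {
    "50M": (3621, 1308),
    "100M": (2281, 1111),
    "200M": (1253, 770),
}

speedup = {name: round(bigbase / hbase, 2)
           for name, (bigbase, hbase) in rps.items()}
print(speedup)  # {'50M': 2.77, '100M': 2.05, '200M': 1.63}

# "What is the maximum?": 56K (BigBase) vs. 750 (HBase), all data in cache.
max_speedup = 56_000 / 750
print(round(max_speedup))  # 75
```

The advantage shrinks as the data set grows because a smaller share of reads can be served from cache, which is why the all-in-cache hotspot case gives the upper bound.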
  61. All data in cache
      • Setup: BigBase 1.0, 48G RAM, (8/16) CPU cores – 5 nodes (1 + 4)
      • Data set: 200M (300GB)
      • Test: Read 100%, hotspot
      • YCSB 0.1.4 – 4 clients
      • 40 threads – 100K
      • 100 threads – 168K
      • 200 threads – 224K
      • 400 threads – 262K
      [chart: Latency (ms), Hotspot (2.5/100 – 200M data); y-axis 0 to 8]
      Throughput:  100,000 | 168,000 | 224,000 | 262,000
      99% (ms):    1 | 2 | 3 | 7
      95% (ms):    1 | 1 | 2 | 3
      avg (ms):    0.4 | 0.6 | 0.9 | 1.5
      100K ops: 99% < 1ms
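
The thread counts, throughput and average latency on this slide are tied together by Little's law: in a closed-loop benchmark, throughput is roughly concurrency divided by average latency. The Python check below is a sanity calculation, assuming the thread counts are totals across the four clients and that the YCSB threads issue requests back to back with no think time; neither assumption is stated on the slide.

```python
# (threads, reported ops/sec, reported average latency in ms) from the slide.
runs = [
    (40, 100_000, 0.4),
    (100, 168_000, 0.6),
    (200, 224_000, 0.9),
    (400, 262_000, 1.5),
]

relative_errors = []
for threads, reported_ops, avg_latency_ms in runs:
    # Little's law: throughput = concurrency / latency (latency in seconds).
    predicted_ops = threads / (avg_latency_ms / 1000.0)
    error = abs(predicted_ops - reported_ops) / reported_ops
    relative_errors.append(error)
    print(f"{threads:>3} threads: predicted {predicted_ops:,.0f} ops/sec, "
          f"reported {reported_ops:,} ({error:.1%} off)")
```

The predictions land within a few percent of the reported figures, which suggests the throughput gains at higher thread counts are paid for almost entirely in latency.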
  62. What is next?
      • Release 1.1 (2014 Q2)
        – Support HBase 0.96, 0.98, trunk
        – Fully tested L3 cache (SSD)
      • Release 1.5 (2014 Q3)
        – YAMM: memory allocator compacting mode.
        – Integration with Hadoop metrics.
        – Row Cache: merge rows on update (good for counters).
        – Block Cache: new eviction policy (LRU-2Q).
        – File read posix_fadvise (bypass OS page cache).
        – Row Cache: make it available for server-side apps
  63. What is next?
      • Release 2.0 (2014 Q3)
        – HBASE-5263: Preserving cache data on compaction
        – Cache data blocks on memstore flush (configurable).
        – HBASE-10648: Pluggable Memstore. Off-heap implementation, based on Karma (off-heap BTree lib).
      • Release 3.0 (2014 Q4)
        – Real Scan Cache – caches results of Scan operations on immutable store files.
        – Scan Cache integration with Phoenix and with other 3rd-party libs that provide a rich query API for HBase.
  64. Download/Install/Uninstall
      • Download BigBase 1.0 from www.bigbase.org
      • Installation/upgrade takes 10-20 minutes
      • Beautification operator EM(*) is invertible: HBase = EM⁻¹(BigBase) (the same 10-20 min)
  65. Q & A
      Vladimir Rodionov, Hadoop/HBase architect, Founder of BigBase.org
      HBase: Extreme makeover
      Features & Internal Track
