Speaker: Vladimir Rodionov (bigbase.org)
This talk introduces a new implementation of multilayer caching in HBase called BigBase. BigBase has a big advantage over HBase 0.94/0.96 because of its ability to utilize all available server RAM in the most efficient way, and because of a novel implementation of an L3 cache on fast SSDs. The talk shows that different types of caches in BigBase work best for different types of workloads, and that a combination of these caches (L1/L2/L3) increases the overall performance of HBase by a very wide margin.
3. About myself
• Principal Platform Engineer @Carrier IQ, Sunnyvale, CA
• Prior to Carrier IQ, I worked @ GE, EBay, Plumtree/BEA.
• HBase user since 2009.
• HBase hacker since 2013.
• Areas of expertise include (but not limited to) Java, HBase, Hadoop, Hive, large-scale OLAP/Analytics, and in-memory data processing.
• Founder of BigBase.org
10. BigBase = EM(HBase)
• EM(*) = ? Seriously?
• EM for HBase: it's a Multi-Level Caching solution.
11. Real Agenda
• Why BigBase?
• Brief history of the BigBase.org project
• BigBase MLC high-level architecture (L1/L2/L3)
• Level 1 - Row Cache.
• Level 2/3 - Block Cache RAM/SSD.
• YCSB benchmark results
• Upcoming features in R1.5, 2.0, 3.0.
• Q&A
13. HBase
• Still lacks some of the original BigTable's features.
• Still not able to utilize all RAM efficiently.
• No good mixed storage (SSD/HDD) support.
• Single Level Caching only. Simple.
• HBase + Large JVM Heap (MemStore) = ?
14. BigBase
• Adds Row Cache and block cache compression.
• Efficiently utilizes all RAM (TBs).
• Supports mixed storage (SSD/HDD).
• Has Multi-Level Caching. Not that simple.
• Will move MemStore off heap in R2.
16. Koda (2010)
• Koda - Java off-heap object cache, similar to Terracotta's BigMemory.
• Delivers 4x more transactions …
• 10x better latencies than BigMemory 4.
• Compression (Snappy, LZ4, LZ4HC, Deflate).
• Disk persistence and periodic cache snapshots.
• Tested up to 240GB.
17. Karma (2011-12)
• Karma - Java off-heap BTree implementation to support fast in-memory queries.
• Supports extra-large heaps: 100s of millions to billions of objects.
• Stores 300M objects in less than 10G of RAM.
• Block Compression.
• Tested up to 240GB.
• Off-Heap MemStore in R2.
18. Yamm (2013)
• Yet Another Memory Manager.
– Pure 100% Java memory allocator.
– Replaced jemalloc in Koda.
– Now Koda is 100% Java.
– Karma is next (still on jemalloc).
– Similar to the memcached slab allocator.
• BigBase project started (Summer 2013).
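The slide above compares YAMM to the memcached slab allocator. As a rough on-heap illustration of that idea (not BigBase's actual code; class sizes and growth factor are made up), a slab allocator pre-carves memory into fixed-size chunks grouped by size class, so allocation and free are O(1) and external fragmentation is avoided at the cost of some internal fragmentation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy slab allocator in the spirit of memcached/YAMM (illustrative only).
// Memory is split into size classes; each class keeps a free list of
// fixed-size chunks, so alloc/free are O(1) and external fragmentation
// is avoided at the cost of some internal fragmentation.
class SlabAllocator {
    private final int[] classSizes;          // chunk size per class
    private final Deque<byte[]>[] freeLists; // free chunks per class

    @SuppressWarnings("unchecked")
    SlabAllocator(int smallest, int classes, double growth, int chunksPerClass) {
        classSizes = new int[classes];
        freeLists = new Deque[classes];
        int size = smallest;
        for (int i = 0; i < classes; i++) {
            classSizes[i] = size;
            freeLists[i] = new ArrayDeque<>();
            for (int j = 0; j < chunksPerClass; j++) {
                freeLists[i].push(new byte[size]); // pre-allocate the slab
            }
            size = (int) Math.ceil(size * growth);
        }
    }

    // Smallest class whose chunk size fits the request, or -1 if too big.
    int classFor(int bytes) {
        for (int i = 0; i < classSizes.length; i++) {
            if (classSizes[i] >= bytes) return i;
        }
        return -1;
    }

    byte[] allocate(int bytes) {
        int c = classFor(bytes);
        if (c < 0 || freeLists[c].isEmpty()) return null; // out of memory
        return freeLists[c].pop();
    }

    void free(byte[] chunk) {
        // chunk.length is exactly one of the class sizes by construction
        freeLists[classFor(chunk.length)].push(chunk);
    }
}
```

A real allocator like YAMM works over raw off-heap memory rather than byte arrays, but the size-class/free-list structure is the same.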
28. Where is BigTable's Scan Cache?
• Scan Cache caches hot row data.
• Complementary to Block Cache.
• Still missing in HBase (as of 0.98).
• It's very hard to implement in Java (off heap).
• Max GC pause is ~0.5-2 sec per 1GB of heap.
• G1 GC in Java 7 does not solve the problem.
• We call it Row Cache in BigBase.
34. BigBase Row Cache
• Off-heap Scan Cache for HBase.
• Cache size: 100s of GBs to TBs.
• Eviction policies: LRU, LFU, FIFO, Random.
• Pure, 100%-compatible Java.
• Sub-millisecond latencies, zero GC.
• Implemented as a RegionObserver coprocessor.
[Diagram: Row Cache stack — YAMM, Codecs, Kryo SerDe, KODA]
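Of the eviction policies listed, LRU is the classic default. As a minimal on-heap illustration of the policy (not BigBase's off-heap implementation), Java's LinkedHashMap in access order gives a capacity-bounded LRU cache:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: a LinkedHashMap in access order evicts the least
// recently used entry once the capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: gets/puts move entries to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // drop the head (least recently used) when full
    }
}
```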
35. BigBase Row Cache
• Read-through cache.
• Caches rowkey:CF.
• Invalidates the key on every mutation.
• Can be enabled/disabled per table and per table:CF.
• New ROWCACHE attribute.
• Best for small rows (< block size).
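The read-through, invalidate-on-mutation behavior described above can be sketched generically. This illustrates the caching pattern only, not BigBase's actual implementation; the loader function and the string "rowkey:CF" key format are assumptions for the sketch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Generic read-through cache keyed by "rowkey:CF", mirroring the Row Cache
// semantics on the slide: misses are loaded from the backing store, and any
// mutation of a row/column-family invalidates its cached entry.
class ReadThroughRowCache {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> loader; // e.g. a Get against HBase

    ReadThroughRowCache(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    private static String key(String rowKey, String family) {
        return rowKey + ":" + family;
    }

    // Read-through: serve from cache, loading from the store on a miss.
    byte[] get(String rowKey, String family) {
        return cache.computeIfAbsent(key(rowKey, family), loader);
    }

    // Called on every Put/Delete: drop the stale entry so the next read
    // goes back to the store.
    void invalidate(String rowKey, String family) {
        cache.remove(key(rowKey, family));
    }
}
```

In BigBase this logic runs inside a RegionObserver coprocessor, with the cache body held off heap by KODA rather than in a Java map.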
36. Performance / Scalability
• GET (small rows < 100 bytes): 175K operations per sec per Region Server (from cache).
• MULTI-GET (small rows < 100 bytes): > 1M records per second (network-limited) per Region Server.
• LATENCY: 99% < 1ms (for GETs) at 100K ops.
• Vertical scalability: tested up to 240GB (the maximum available in Amazon EC2).
• Horizontal scalability: limited by HBase scalability.
• No more memcached farms in front of HBase clusters.
38. What is wrong with Bucket Cache?
Scalability                        LIMITED
Multi-Level Caching (MLC)          NOT SUPPORTED
Persistence ('offheap' mode)       NOT SUPPORTED
Low latency apps                   NOT SUPPORTED
SSD friendliness ('file' mode)     NOT FRIENDLY
Compression                        NOT SUPPORTED
44. Here comes BigBase
Scalability                        HIGH
Multi-Level Caching (MLC)          SUPPORTED
Persistence ('offheap' mode)       SUPPORTED
Low latency apps                   SUPPORTED
SSD friendliness ('file' mode)     SSD-FRIENDLY
Compression                        SNAPPY, LZ4, LZ4HC, DEFLATE
50. Wait, there is more …
Scalability                        HIGH
Multi-Level Caching (MLC)          SUPPORTED
Persistence ('offheap' mode)       SUPPORTED
Low latency apps                   SUPPORTED
SSD friendliness ('file' mode)     SSD-FRIENDLY
Compression                        SNAPPY, LZ4, LZ4HC, DEFLATE
Non disk-based L3 cache            SUPPORTED
RAM Cache optimization             IBCO
52. BigBase 1.0 vs. HBase 0.98
                             BigBase                   HBase 0.98
Row Cache (L1)               YES                       NO
Block Cache RAM (L2)         YES (fully off heap)      YES (partially off heap)
Block Cache (L3) DISK        YES (SSD-friendly)        YES (not SSD-friendly)
Block Cache (L3) NON-DISK    YES                       NO
Compression                  YES                       NO
RAM Cache persistence        YES (both L1 and L2)      NO
Low Latency optimized        YES                       NO
MLC support                  YES (L1, L2, L3)          NO (either L2 or L3)
Scalability                  HIGH                      MEDIUM (limited by JVM heap)
64. What is next?
• Release 1.1 (2014 Q2)
– Support HBase 0.96, 0.98, trunk.
– Fully tested L3 cache (SSD).
• Release 1.5 (2014 Q3)
– YAMM: memory allocator compacting mode.
– Integration with Hadoop metrics.
– Row Cache: merge rows on update (good for counters).
– Block Cache: new eviction policy (LRU-2Q).
– File read posix_fadvise (bypass OS page cache).
– Row Cache: make it available for server-side apps.
65. What is next?
• Release 2.0 (2014 Q3)
– HBASE-5263: preserving cache data on compaction.
– Cache data blocks on memstore flush (configurable).
– HBASE-10648: Pluggable Memstore. Off-heap implementation, based on Karma (off-heap BTree lib).
• Release 3.0 (2014 Q4)
– Real Scan Cache: caches results of Scan operations on immutable store files.
– Scan Cache integration with Phoenix and with other 3rd-party libs providing rich query APIs for HBase.
66. Download/Install/Uninstall
• Download BigBase 1.0 from www.bigbase.org
• Installation/upgrade takes 10-20 minutes.
• The Beatification operator EM(*) is invertible: HBase = EM⁻¹(BigBase) (the same 10-20 min).
67. Q & A
Vladimir Rodionov
Hadoop/HBase architect
Founder of BigBase.org
HBase: Extreme Makeover
Features & Internals Track