
HBase: Extreme Makeover

Speaker: Vladimir Rodionov (bigbase.org)

This talk introduces a new multi-level caching implementation for HBase called BigBase. BigBase has a big advantage over HBase 0.94/0.96 because of its ability to utilize all available server RAM in the most efficient way, and because of a novel implementation of an L3 cache on fast SSDs. The talk shows that the different cache types in BigBase work best for different types of workloads, and that the combination of these caches (L1/L2/L3) increases the overall performance of HBase by a very wide margin.


HBase: Extreme Makeover

  1. 1. HBase: Extreme Makeover. Vladimir Rodionov, Hadoop/HBase architect, founder of BigBase.org. HBaseCon 2014, Features & Internals track.
  2. 2. Agenda  
  3. 3. About myself • Principal Platform Engineer @ Carrier IQ, Sunnyvale, CA • Prior to Carrier IQ, I worked at GE, eBay, Plumtree/BEA. • HBase user since 2009. • HBase hacker since 2013. • Areas of expertise include (but are not limited to) Java, HBase, Hadoop, Hive, large-scale OLAP/analytics, and in-memory data processing. • Founder of BigBase.org
  4. 4. What?  
  5. 5. BigBase  =  EM(HBase)  
  6. 6. BigBase  =  EM(HBase)   EM(*)  =  ?  
  7. 7. BigBase  =  EM(HBase)   EM(*)  =  
  8. 8. BigBase  =  EM(HBase)   EM(*)  =   Seriously?  
  9. 9. BigBase = EM(HBase)   EM(*) =   Seriously? It's a Multi-Level Caching solution for HBase.
  10. 10. Real Agenda • Why BigBase? • Brief history of the BigBase.org project • BigBase MLC high-level architecture (L1/L2/L3) • Level 1: Row Cache • Level 2/3: Block Cache RAM/SSD • YCSB benchmark results • Upcoming features in R1.5, 2.0, 3.0 • Q&A
  11. 11. HBase • Still lacks some of the original BigTable features. • Still not able to utilize all RAM efficiently. • No good mixed-storage (SSD/HDD) support. • Single-level caching only. Simple. • HBase + large JVM heap (MemStore) = ?
  12. 12. BigBase • Adds Row Cache and block cache compression. • Utilizes all RAM efficiently (TBs). • Supports mixed storage (SSD/HDD). • Has Multi-Level Caching. Not that simple. • Will move MemStore off heap in R2.
  13. 13. BigBase  History  
  14. 14. Koda (2010) • Koda: Java off-heap object cache, similar to Terracotta's BigMemory. • Delivers 4x more transactions and 10x better latencies than BigMemory 4. • Compression (Snappy, LZ4, LZ4HC, Deflate). • Disk persistence and periodic cache snapshots. • Tested up to 240GB.
  15. 15. Karma (2011-12) • Karma: Java off-heap BTree implementation to support fast in-memory queries. • Supports extra-large heaps: hundreds of millions to billions of objects. • Stores 300M objects in less than 10GB of RAM. • Block compression. • Tested up to 240GB. • Off-heap MemStore in R2.
  16. 16. Yamm (2013) • Yet Another Memory Manager. – Pure 100% Java memory allocator. – Replaced jemalloc in Koda. – Now Koda is 100% Java. – Karma is next (still on jemalloc). – Similar to the memcached slab allocator (see the sketch below). • BigBase project started (Summer 2013).
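The slide above compares YAMM to the memcached slab allocator. As a rough illustration of that idea only (none of the class or method names below come from YAMM or BigBase), a slab allocator reserves large off-heap slabs, carves each slab into fixed-size chunks for a given size class, and recycles chunks through per-class free lists, so allocations never create garbage on the Java heap:

    import java.nio.ByteBuffer;
    import java.util.ArrayDeque;
    import java.util.Deque;

    // Minimal sketch of a memcached-style slab allocator over off-heap memory.
    // Class and method names are illustrative, not YAMM/BigBase APIs.
    public class SlabAllocatorSketch {
        private static final int SLAB_SIZE = 1 << 20;                     // 1 MB slabs
        private static final int[] SIZE_CLASSES = {64, 128, 256, 512, 1024, 2048};

        // One free list of fixed-size chunks per size class.
        private final Deque<ByteBuffer>[] freeLists;

        @SuppressWarnings("unchecked")
        public SlabAllocatorSketch() {
            freeLists = new Deque[SIZE_CLASSES.length];
            for (int i = 0; i < SIZE_CLASSES.length; i++) {
                freeLists[i] = new ArrayDeque<ByteBuffer>();
            }
        }

        // Pick the smallest size class that fits the request.
        private int classFor(int size) {
            for (int i = 0; i < SIZE_CLASSES.length; i++) {
                if (size <= SIZE_CLASSES[i]) return i;
            }
            throw new IllegalArgumentException("object too large for slab classes: " + size);
        }

        // Carve a fresh off-heap slab into equal chunks of the class size.
        private void grow(int clazz) {
            ByteBuffer slab = ByteBuffer.allocateDirect(SLAB_SIZE);
            int chunk = SIZE_CLASSES[clazz];
            for (int off = 0; off + chunk <= SLAB_SIZE; off += chunk) {
                slab.position(off);
                slab.limit(off + chunk);
                freeLists[clazz].push(slab.slice());
            }
        }

        public ByteBuffer allocate(int size) {
            int clazz = classFor(size);
            if (freeLists[clazz].isEmpty()) grow(clazz);
            return freeLists[clazz].pop();
        }

        public void free(int size, ByteBuffer chunk) {
            chunk.clear();
            freeLists[classFor(size)].push(chunk);
        }

        public static void main(String[] args) {
            SlabAllocatorSketch a = new SlabAllocatorSketch();
            ByteBuffer b = a.allocate(100);      // served from the 128-byte class
            System.out.println("chunk capacity: " + b.capacity());
            a.free(100, b);
        }
    }

A real allocator such as YAMM also has to deal with concurrency, eviction of whole slabs, and moving slabs between size classes; this sketch omits all of that.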
  17. 17. BigBase  Architecture  
  18. 18. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache
  19. 19. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Disk, JVM RAM, Bucket Cache. One level of caching: RAM (L2)
  20. 20. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Bucket Cache, JVM RAM. One level of caching: RAM (L2) or DISK (L3)
  21. 21. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Disk, JVM RAM, Bucket Cache. BigBase 1.0: JVM RAM, Row Cache L1, Block Cache L2, Block Cache L3 on SSD
  22. 22. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Disk, JVM RAM, Bucket Cache. BigBase 1.0: JVM RAM, Row Cache L1, Block Cache L2, Block Cache L3 over the network
  23. 23. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Disk, JVM RAM, Bucket Cache. BigBase 1.0: JVM RAM, Row Cache L1, Block Cache L2, Block Cache L3 in memcached
  24. 24. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Disk, JVM RAM, Bucket Cache. BigBase 1.0: JVM RAM, Row Cache L1, Block Cache L2, Block Cache L3 in DynamoDB
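The diagrams above place the Row Cache (L1) and Block Cache (L2) in RAM and the L3 Block Cache on SSD or an external store. A minimal sketch of that read path, assuming hypothetical cache interfaces (none of these names are BigBase classes, and in reality L1 holds rows while L2/L3 hold HFile blocks rather than one uniform value type):

    import java.util.Optional;

    // Illustrative L1 -> L2 -> L3 -> disk lookup chain; all names are made up.
    public class MultiLevelCacheSketch {
        interface CacheLevel {
            Optional<byte[]> get(String key);
            void put(String key, byte[] value);
        }

        private final CacheLevel l1RowCache;      // off-heap RAM, row granularity
        private final CacheLevel l2BlockCache;    // off-heap RAM, block granularity
        private final CacheLevel l3BlockCache;    // SSD, network, memcached, or DynamoDB

        public MultiLevelCacheSketch(CacheLevel l1, CacheLevel l2, CacheLevel l3) {
            this.l1RowCache = l1;
            this.l2BlockCache = l2;
            this.l3BlockCache = l3;
        }

        // Check each level in turn; on a hit in a slower level, promote the value
        // into the faster level so later reads are served earlier in the chain.
        public byte[] read(String key) {
            Optional<byte[]> v = l1RowCache.get(key);
            if (v.isPresent()) return v.get();

            v = l2BlockCache.get(key);
            if (v.isPresent()) {
                l1RowCache.put(key, v.get());
                return v.get();
            }

            v = l3BlockCache.get(key);
            if (v.isPresent()) {
                l2BlockCache.put(key, v.get());
                return v.get();
            }

            byte[] fromDisk = loadFromHdfs(key);   // full miss: fall back to HDFS/HDD
            l2BlockCache.put(key, fromDisk);
            l3BlockCache.put(key, fromDisk);
            return fromDisk;
        }

        // Placeholder for the actual HFile read path.
        private byte[] loadFromHdfs(String key) {
            return new byte[0];
        }
    }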
  25. 25. BigBase  Row  Cache  (L1)  
  26. 26. Where is BigTable's Scan Cache? • Scan Cache caches hot row data. • Complementary to the Block Cache. • Still missing in HBase (as of 0.98). • It's very hard to implement in Java (on heap). • Max GC pause is ~0.5-2 sec per 1GB of heap. • G1 GC in Java 7 does not solve the problem. • We call it the Row Cache in BigBase.
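For scale, applying the slide's own figure of roughly 0.5-2 s of worst-case GC pause per 1 GB of heap, a 100 GB on-heap cache could stall a region server for roughly 50-200 s in the worst case, which is the motivation for keeping the cache off heap.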
  27. 27. Row Cache vs. Block Cache (diagram: HFile blocks)
  28. 28. Row  Cache  vs.  Block  Cache  
  29. 29. Row  Cache  vs.  Block  Cache   BLOCK  CACHE   ROW  CACHE  
  30. 30. Row  Cache  vs.  Block  Cache   ROW  CACHE   BLOCK  CACHE  
  31. 31. Row  Cache  vs.  Block  Cache   ROW  CACHE   BLOCK  CACHE  
  32. 32. BigBase Row Cache • Off-heap Scan Cache for HBase. • Cache size: 100s of GBs to TBs. • Eviction policies: LRU, LFU, FIFO, Random. • Pure 100%-compatible Java. • Sub-millisecond latencies, zero GC. • Implemented as a RegionObserver coprocessor (a hedged sketch follows below). (Component stack: Row Cache, YAMM, Codecs, Kryo SerDe, KODA)
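The slide above says the Row Cache is implemented as a RegionObserver coprocessor. A sketch of what such a coprocessor can look like against the HBase 0.98-era API; OffHeapRowCache and its methods are hypothetical placeholders, not BigBase code, and a real implementation would also hook deletes and other mutations:

    import java.io.IOException;
    import java.util.List;

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

    public class RowCacheObserverSketch extends BaseRegionObserver {

      // Hypothetical off-heap cache keyed by row key.
      private final OffHeapRowCache cache = new OffHeapRowCache();

      @Override
      public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                           Get get, List<Cell> results) throws IOException {
        List<Cell> cached = cache.get(get.getRow());
        if (cached != null) {
          results.addAll(cached);   // serve the row from the off-heap cache
          ctx.bypass();             // skip the normal MemStore/HFile read path
        }
      }

      @Override
      public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                            Get get, List<Cell> results) throws IOException {
        cache.put(get.getRow(), results);   // read-through: populate on miss
      }

      @Override
      public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                         Put put, WALEdit edit, Durability durability) throws IOException {
        cache.invalidate(put.getRow());     // invalidate the key on every mutation
      }

      // Hypothetical placeholder for the real off-heap (Koda/YAMM-backed) cache.
      static class OffHeapRowCache {
        List<Cell> get(byte[] row) { return null; }
        void put(byte[] row, List<Cell> cells) { }
        void invalidate(byte[] row) { }
      }
    }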
  33. 33. BigBase Row Cache • Read-through cache. • It caches rowkey:CF. • Invalidates the key on every mutation. • Can be enabled/disabled per table and per table:CF (see the example below). • New ROWCACHE attribute. • Best for small rows (< block size). (Component stack: Row Cache, YAMM, Codecs, Kryo SerDe, KODA)
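The slide mentions a new ROWCACHE table attribute for enabling the cache per table or per column family. A sketch of setting such an attribute with the standard 0.94/0.98-era admin API; the attribute name comes from the slide, while the table name and the accepted value "true" are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EnableRowCacheSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          byte[] table = Bytes.toBytes("usertable");
          HTableDescriptor desc = admin.getTableDescriptor(table);

          // Per-table switch; per-CF control would set the same attribute on the
          // corresponding HColumnDescriptor instead.
          desc.setValue("ROWCACHE", "true");

          admin.disableTable(table);
          admin.modifyTable(table, desc);
          admin.enableTable(table);
        } finally {
          admin.close();
        }
      }
    }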
  34. 34. Performance-Scalability • GET (small rows < 100 bytes): 175K operations per second per region server (from cache). • MULTI-GET (small rows < 100 bytes): > 1M records per second (network limited) per region server. • LATENCY: 99% < 1ms (for GETs) at 100K ops. • Vertical scalability: tested up to 240GB (the maximum available in Amazon EC2). • Horizontal scalability: limited by HBase scalability. • No more memcached farms in front of HBase clusters.
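The MULTI-GET figures above correspond to the standard batched-get client call, which fetches many small rows in a single round trip per region server. A minimal example with made-up table and row-key names:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiGetSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "usertable");
        try {
          // Batch many small-row lookups into one multi-get request.
          List<Get> gets = new ArrayList<Get>();
          for (int i = 0; i < 1000; i++) {
            gets.add(new Get(Bytes.toBytes("user" + i)));
          }
          Result[] results = table.get(gets);
          System.out.println("fetched " + results.length + " rows");
        } finally {
          table.close();
        }
      }
    }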
  35. 35. BigBase  Block  Cache  (L2,  L3)  
  36. 36. What is wrong with Bucket Cache? Scalability: LIMITED. Multi-Level Caching (MLC): NOT SUPPORTED. Persistence ('offheap' mode): NOT SUPPORTED. Low-latency apps: NOT SUPPORTED. SSD friendliness ('file' mode): NOT FRIENDLY. Compression: NOT SUPPORTED.
  42. 42. Here comes BigBase. Scalability: HIGH. Multi-Level Caching (MLC): SUPPORTED. Persistence ('offheap' mode): SUPPORTED. Low-latency apps: SUPPORTED. SSD friendliness ('file' mode): SSD-FRIENDLY. Compression: SNAPPY, LZ4, LZ4HC, DEFLATE.
  48. 48. Wait, there is more … Scalability: HIGH. Multi-Level Caching (MLC): SUPPORTED. Persistence ('offheap' mode): SUPPORTED. Low-latency apps: SUPPORTED. SSD friendliness ('file' mode): SSD-FRIENDLY. Compression: SNAPPY, LZ4, LZ4HC, DEFLATE. Non-disk-based L3 cache: SUPPORTED. RAM cache optimization: IBCO.
  50. 50. BigBase 1.0 vs. HBase 0.98 (BigBase / HBase 0.98): Row Cache (L1): YES / NO. Block Cache RAM (L2): YES (fully off heap) / YES (partially off heap). Block Cache (L3) DISK: YES (SSD-friendly) / YES (not SSD-friendly). Block Cache (L3) NON-DISK: YES / NO. Compression: YES / NO. RAM cache persistence: YES (both L1 and L2) / NO. Low-latency optimized: YES / NO. MLC support: YES (L1, L2, L3) / NO (either L2 or L3). Scalability: HIGH / MEDIUM (limited by JVM heap).
  51. 51. YCSB  Benchmark  
  52. 52. Test setup (AWS) • Common: Whirr 0.8.2; 1 (Master + ZK) + 5 RS; m1.xlarge: 15GB RAM, 4 vCPU, 4x420GB HDD. • HBase 0.94.15: RS: 11.5GB heap (6GB LruBlockCache on heap); Master: 4GB heap. • BigBase 1.0 (0.94.15): RS: 4GB heap (6GB off-heap cache); Master: 4GB heap. • HBase 0.96.2: RS: 4GB heap (6GB Bucket Cache off heap); Master: 4GB heap. • Clients: 5 (30 threads each), collocated with Region Servers. • Data sets: 100M and 200M rows, approximately 120GB / 240GB. Only 25% fits in cache. • Workloads: 100% read (read100, read200, hotspot100), 100% scan (scan100, scan200), zipfian. • YCSB 0.1.4 (modified to generate compressible data). Compressible data (factor of ~2.5x) was generated only for scan workloads, to evaluate the effect of compression in the BigBase block cache implementation.
  54. 54. Benchmark results (RPS), BigBase R1.0 / HBase 0.96.2 / HBase 0.94.15: read100: 11405 / 6123 / 5553; read200: 6265 / 4086 / 3850; hotspot100: 15150 / 3512 / 2855; scan100: 3224 / 1500 / 709; scan200: 820 / 434 / 228.
  55. 55. Average latency (ms), BigBase R1.0 / HBase 0.96.2 / HBase 0.94.15: read100: 13 / 24 / 27; read200: 23 / 36 / 39; hotspot100: 10 / 44 / 52; scan100: 48 / 102 / 223; scan200: 187 / 375 / 700.
  56. 56. 95% latency (ms), BigBase R1.0 / HBase 0.96.2 / HBase 0.94.15: read100: 51 / 91 / 100; read200: 88 / 124 / 138; hotspot100: 38 / 152 / 197; scan100: 175 / 405 / 950; scan200: 729 / … / ….
  57. 57. 99% latency (ms), BigBase R1.0 / HBase 0.96.2 / HBase 0.94.15: read100: 133 / 190 / 213; read200: 225 / 304 / 338; hotspot100: 111 / 554 / 632; scan100: 367 / 811 / …; scan200: ….
  59. 59. YCSB 100% Read, per server (BigBase R1.0 / HBase 0.94.15): 50M: 3621 / 1308; 100M: 2281 / 1111; 200M: 1253 / 770. • 50M = 2.77X (40% fits in cache) • 100M = 2.05X (20% fits in cache) • 200M = 1.63X (10% fits in cache) • What is the maximum? ~75X (hotspot 2.5/100): 56K (BigBase) vs. 750 (HBase), 100% in cache.
  61. 61. All data in cache • Setup: BigBase 1.0, 48GB RAM, (8/16) CPU cores, 5 nodes (1 + 4). • Data set: 200M rows (300GB). • Test: Read 100%, hotspot (2.5/100). • YCSB 0.1.4, 4 clients. • Throughput: 40 threads: 100K; 100 threads: 168K; 200 threads: 224K; 400 threads: 262K. • Latency (ms) at 100K / 168K / 224K / 262K ops: avg 0.4 / 0.6 / 0.9 / 1.5; 95%: 1 / 1 / 2 / 3; 99%: 1 / 2 / 3 / 7. • 100K ops: 99% < 1ms.
  62. 62. What is next? • Release 1.1 (2014 Q2) – Support HBase 0.96, 0.98, trunk. – Fully tested L3 cache (SSD). • Release 1.5 (2014 Q3) – YAMM: memory allocator compacting mode. – Integration with Hadoop metrics. – Row Cache: merge rows on update (good for counters). – Block Cache: new eviction policy (LRU-2Q). – File reads with posix_fadvise (bypass OS page cache). – Row Cache: make it available to server-side apps.
  63. 63. What is next? • Release 2.0 (2014 Q3) – HBASE-5263: preserving cache data on compaction. – Cache data blocks on MemStore flush (configurable). – HBASE-10648: pluggable MemStore; off-heap implementation based on Karma (off-heap BTree lib). • Release 3.0 (2014 Q4) – Real Scan Cache: caches results of Scan operations on immutable store files. – Scan Cache integration with Phoenix and with other 3rd-party libs that provide rich query APIs for HBase.
  64. 64. Download/Install/Uninstall • Download BigBase 1.0 from www.bigbase.org • Installation/upgrade takes 10-20 minutes. • The beautification operator EM(*) is invertible: HBase = EM^-1(BigBase) (the same 10-20 min).
  65. 65. Q & A. Vladimir Rodionov, Hadoop/HBase architect, founder of BigBase.org. HBase: Extreme Makeover, Features & Internals track.
