HBase Applications - Atlanta HUG - May 2014
HBase is good at various workloads, ranging from sequential range scans to purely random access. These access patterns can be translated into application types, usually falling into two major groups: entities and events. This presentation discusses the underlying implications and how to approach those use-cases. Examples taken from Facebook show how this has been tackled in real life.

Presentation Transcript

  • 1 HBase Applications: Selected Use-Cases around a Common Theme. Atlanta HUG - May 2014. Lars George, Cloudera EMEA Chief Architect
  • 2 About Me • EMEA Chief Architect @ Cloudera • Consulting on Hadoop projects (everywhere) • Apache Committer • HBase and Whirr • O'Reilly Author • HBase - The Definitive Guide • Now in Japanese! • Contact • lars@cloudera.com • @larsgeorge • The Japanese edition is out, too!
  • 3 The Content... • HBase - Strengths and weaknesses • Common use-cases and patterns • Focus on a specific type of application • Summary
  • 4 CONFIDENTIAL - RESTRICTED. HBase: Strengths and Weaknesses
  • 5 IOPS vs Throughput Mythbusters: It is all physics in the end; you cannot solve an I/O problem without reducing I/O in general. Parallelize access and read/write sequentially.
  • 6 HBase: Strengths & Weaknesses. Strengths: • Random access to small(ish) key-value pairs • Rows and columns stored sorted lexicographically • Adds table and region concepts to group related KVs • Stores and reads data sequentially • Parallelizes across all clients • Non-blocking I/O throughout
  • 7 HBase: Strengths & Weaknesses. Weaknesses: • Not optimized (yet) for 100% of the possible throughput of the underlying storage layer • And HDFS is not fully optimized either • Single-writer issue with WALs • Single-server hot-spotting with non-distributed keys
  • 8 Patterns • There are common patterns in many common use-cases, like programming patterns. • We need to extract these common patterns and make them repeatable. • Similar to the "Gang of Four" (Gamma, Helm, Johnson, Vlissides) or the "Three Amigos" (Booch, Jacobson, Rumbaugh)
  • 9 CONFIDENTIAL - RESTRICTED. Common Patterns
  • 10 HBase Dilemma: Although HBase can host many applications, they may require completely opposite features. (Diagram: Events vs. Entities, with Time Series and Message Store as the examples.)
  • 11 This talk (at this event) • Message Store • Information exchange between entities • Sending/receiving information is an event • Time Series • A sequence of data points measured at successive points in time, spaced at uniform intervals • Measuring a data point is an event
  • 12 Using HBase Strengths
  • 13 HBase "Indexes" (cont.) • Use primary keys, aka the row keys, as a sorted index • One sort direction only • Use a "secondary index" to get reverse sorting • Lookup table or the same table • Use secondary keys, aka the column qualifiers, as a sorted index within the main record • Use prefixes within a column family, or separate column families
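
To make the reverse-sort trick concrete, here is a minimal sketch (not from the talk) that writes each message twice: once under its natural key and once under an index key whose timestamp is inverted with Long.MAX_VALUE - ts, so a forward scan returns newest-first. The table name (msgs), column family, and key layout are all hypothetical.

    // Hypothetical sketch: a reverse-sorted secondary index via inverted timestamps.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ReverseIndexSketch {
      public static void storeMessage(Connection conn, String user,
          long ts, byte[] message) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("msgs"))) {
          // Main row: user + timestamp sorts oldest-first (one sort direction only).
          Put main = new Put(Bytes.add(Bytes.toBytes(user + "|"), Bytes.toBytes(ts)));
          main.addColumn(Bytes.toBytes("d"), Bytes.toBytes("body"), message);
          // Secondary "index" row: inverted timestamp sorts newest-first.
          Put index = new Put(Bytes.add(Bytes.toBytes(user + "|rev|"),
              Bytes.toBytes(Long.MAX_VALUE - ts)));
          index.addColumn(Bytes.toBytes("d"), Bytes.toBytes("body"), message);
          table.put(java.util.Arrays.asList(main, index));
        }
      }
    }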
  • 14 CONFIDENTIAL - RESTRICTED. Common Use-Cases
  • 15 Use-Case I: Messages
  • 16 HBase Message Store. Use-Case: • Store incoming messages in HBase, such as emails, SMS, MMS, IM • Constant updates of existing entities • e.g. email read, flagged, starred, moved, deleted • Reading of top-N entries, sorted by time • Newest 20 messages, last 20 conversations • Examples: • Facebook Messages
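
Reading the newest 20 messages then becomes a short forward scan over the inverted-timestamp rows from the sketch above. Again a hypothetical layout, and it assumes the HBase 2.x client (Scan.setLimit; on 1.x a PageFilter would be needed instead).

    // Hypothetical sketch: fetch the newest 20 messages for one user.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TopNSketch {
      public static void newestTwenty(Connection conn, String user) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("msgs"))) {
          Scan scan = new Scan()
              .setRowPrefixFilter(Bytes.toBytes(user + "|rev|")) // newest-first rows
              .setLimit(20);                                     // top-N only (2.x API)
          try (ResultScanner rs = table.getScanner(scan)) {
            for (Result r : rs) {
              System.out.println(Bytes.toStringBinary(r.getRow()));
            }
          }
        }
      }
    }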
  • 17 Problem Description • Records are of varying size • Large ones hinder smaller ones • Massive index issue • Users can sort and filter by everything • At the same time, reading top-N should be fast • But what to do for automated accounts? The 80/20 rule? • Only doable with heuristics • Only create minimal indexes • Create additional ones when the user asks for them • Cross-mailbox issues with conversations • Similar to the timeline in Facebook • Overall requirements for I/O
  • 18 Interlude I: Compaction Details. Write Amplification in HBase
  • 19 Compactions in HBase • Must happen to keep data in check • Combine small flush files into larger ones • Remove old data (during major compactions) • Two types: minor and major compactions • Minor ones are triggered with API mutation calls • Major ones are time-scheduled (or auto-promoted) • Both can be triggered manually if needed • Add extra background I/O that grows over time • Write amplification! • Has to be tuned for heavy-write systems
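
Triggering compactions manually goes through the Admin API; a minimal sketch:

    // Sketch: manually queue minor and major compactions via the Admin API.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;

    public class CompactSketch {
      public static void compact(Connection conn, String tableName) throws Exception {
        try (Admin admin = conn.getAdmin()) {
          admin.compact(TableName.valueOf(tableName));      // queue a minor compaction
          admin.majorCompact(TableName.valueOf(tableName)); // queue a major compaction
        }
      }
    }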
  • 20-37 Writes: Flushes and Compactions (animated diagram sequence; x-axis: time, older to newer; y-axis: file size in MB, 0-1000). Memstore flushes create store files HF1, HF2, HF3, ..., which compactions merge into ever larger files CF1, CF2, CF3; the first compactions are auto-promoted to major ones. Settings shown along the way:
    • hbase.hregion.memstore.flush.size = 128MB
    • hbase.hstore.compaction.min = 3 (hbase.hstore.compactionThreshold in 0.90)
    • hbase.hstore.compaction.max = 10
    • hbase.hstore.compaction.ratio = 1.2 (eliminate older-to-newer files until within the 120% ratio)
    • hbase.hstore.compaction.min.size = flush size
  • 38 Additional Notes #1. There are a few more settings for compactions: • hbase.hstore.compaction.max = 10: limit on the maximum number of files per compaction • hbase.hstore.compaction.max.size = Long.MAX_VALUE: exclude files larger than this setting (0.92+) • hbase.hregion.majorcompaction = 1d: scheduled major compactions
  • 39 Additional Notes #2 • hbase.hstore.compaction.kv.max = 10: limits internal scanner caching during reads of files to be compacted • hbase.hstore.blockingStoreFiles = 7: enforces an upper limit of files before compactions must catch up; blocks user operations! • hbase.hstore.blockingWaitTime = 90s: upper limit on blocking user operations
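
These settings normally land in hbase-site.xml on the region servers; purely as a sketch, the same keys set on a Configuration object (values are the defaults named on the slides):

    // Sketch: the compaction-related settings from the slides on a Configuration.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CompactionTuningSketch {
      public static Configuration tuned() {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.hstore.compaction.min", 3);        // min files per compaction
        conf.setInt("hbase.hstore.compaction.max", 10);       // max files per compaction
        conf.setFloat("hbase.hstore.compaction.ratio", 1.2f); // selection ratio (120%)
        conf.setInt("hbase.hstore.blockingStoreFiles", 7);    // block writes above this count
        conf.setLong("hbase.hregion.memstore.flush.size", 128L * 1024 * 1024); // 128MB
        return conf;
      }
    }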
  • 40 Write Fragmentation: Yo, where's the data at?
  • 41-48 Writes: Flushes and Compactions (diagram sequence, same axes). We are looking at two specific rows: one is never changed after insert ("Unique Row Inserts"), the other is mutated frequently ("Existing Row Mutations"). Across flushes and compactions, the frequently mutated row ends up fragmented over many store files, while the write-once row stays in a single location.
  • 49 Source: http://www.ngdata.com/visualizing-hbase-flushes-and-compactions/
  • 50 Compaction Summary • Compaction tuning is important • Do not be too aggressive, or write amplification is noticeable under load • Use timestamps/time ranges in Get/Scan to limit files
    Ratio / Effect:
    • 1.0: dampened; causes more store files, needs to be combined with effective Bloom filter usage (non-random keys)
    • 1.2: default value, a moderate setting
    • 1.4: more aggressive; keeps the number of files low, causes more auto-promoted major compactions to occur
  • 51 Interlude II: Bloom Filters. Call me maybe, baby?
  • 52 Background on Bloom Filters
  • 53 Background on Bloom Filters • Bit array of m bits, and k hash functions • HBase uses hash folding • Returns "no" or "maybe" only • Error rate is tunable, usually about 1% • At a 1% error rate the optimal k is about 7, at roughly 9.6 bits per key (diagram: m=18, k=3)
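
For reference, those numbers follow from the standard Bloom filter sizing formulas (n keys, m bits, k hash functions, target false-positive rate p):

    \[
      k_{\mathrm{opt}} = \frac{m}{n}\,\ln 2,
      \qquad
      \frac{m}{n} = -\,\frac{\ln p}{(\ln 2)^2}
    \]
    % For p = 0.01: m/n = -ln(0.01)/(ln 2)^2 ~ 9.6 bits per key,
    % and k_opt ~ 9.6 * ln 2 ~ 6.7, i.e. about 7 hash functions.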
  • 54 Seeking with Bloom Filters
  • 55 Read Time Series Entry • Event record is written once and never deleted or updated • Keeps the entire record in a specific location in the storage files • Use a time range to indicate what is needed • {Get|Scan}.setTimeRange() • Helps the system skip unnecessary (older) files • The Bloom filter helps for given row key(s) and column qualifiers • Can skip files not containing the requested details
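
A minimal sketch of the time-range hint; setTimeRange takes millisecond bounds, minimum inclusive and maximum exclusive. The events table name is hypothetical.

    // Sketch: limit a read to a time window so older store files can be skipped.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;

    public class TimeRangeSketch {
      public static Result lastDay(Connection conn, byte[] rowKey) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("events"))) {
          long now = System.currentTimeMillis();
          Get get = new Get(rowKey);
          get.setTimeRange(now - 24L * 3600 * 1000, now); // [min, max), ms precision
          return table.get(get);
        }
      }
    }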
  • 56 Writes: Flushes and Compactions (diagram, same setup): a single block read (64KB) suffices; the Bloom filter and/or time range eliminates all other store files
  • 57 Read Updateable Entity • Data is updated regularly, aging out at intervals • Reading an entity needs to read all details to reconstitute the current state • Deletes mask out attributes • Updates override (or complement) attributes • Bloom filters will have a hard time saying "no", since most files might contain entity attributes • A time filter on scans or gets also has few options to skip files, since older attributes might still be important
  • 58 Writes: Flushes and Compactions (diagram, same setup): the Bloom filter returns "yes" for all but two files, so 7+ block loads (64KB each) are needed
  • 59 Bloom Filter Options. There are three choices: • NONE: Duh! Use this when the Bloom filter is not useful for the use-case (default setting) • ROW: indexes only the row key; needs one entry per row key in the Bloom filter • ROWCOL: indexes the row and column key; requires an entry in the filter for every column cell (KeyValue)
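
The Bloom filter type is set per column family; a sketch using the HBase 2.x descriptor builders (table and family names hypothetical):

    // Sketch: create a table whose column family carries a ROW Bloom filter.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.regionserver.BloomType;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BloomSketch {
      public static void create(Connection conn) throws Exception {
        try (Admin admin = conn.getAdmin()) {
          ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
              .newBuilder(Bytes.toBytes("d"))
              .setBloomFilterType(BloomType.ROW)   // or ROWCOL, or NONE
              .build();
          admin.createTable(TableDescriptorBuilder
              .newBuilder(TableName.valueOf("msgs"))
              .setColumnFamily(cf)
              .build());
        }
      }
    }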
  • 60 How to decide?
  • 61 Bloom Filter Summary • They help a lot, but not always • Highly depends on write patterns • Keep an eye on size, since they are cached • HFile v2 helps here as it only loads root index info. "Bloom filters can get as large as 100 MB per HFile, which adds up to 2 GB when aggregated over 20 regions. Block indexes can grow as large as 6 GB in aggregate size over the same set of regions." Source: http://hbase.apache.org/book/hfilev2.html
  • 62 Interlude III: Write-ahead Log. The lonesome writer tale.
  • 63 Write-ahead Log - Data Flow
  • 64 Write-ahead Log - Overview • One file per region server • All regions have a reference to this file • Actually a wrapper around the physical file • The file is, in the end, a Hadoop SequenceFile • Stored in HDFS so it can be recovered after a server failure • There is a synchronization barrier that impacts all parallel writers, aka clients • Overall performance is BAD, maybe 10MB/s
  • 65 Write-ahead Log - Workarounds • Enable log compression: hbase.regionserver.wal.enablecompression • Disable the WAL for secondary records • Restore indexes or derived records from the main one • But be careful with the coprocessor hook, as it cannot access the currently replaying region • Work on upstream JIRAs • Multiple logs per server • Fix the single-writer issue in HDFS
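
Both workarounds, as a sketch: WAL compression is a server-side hbase-site.xml setting, while skipping the WAL is a per-mutation durability hint; skipped records are lost on a server crash, so they must be rebuildable from the main record. Table and qualifier names are hypothetical.

    // Sketch: WAL compression (server side) plus a per-Put WAL skip (client side).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalSketch {
      public static void example(Connection conn) throws Exception {
        // Server side (belongs in hbase-site.xml): compress WAL entries.
        Configuration conf = HBaseConfiguration.create();
        conf.setBoolean("hbase.regionserver.wal.enablecompression", true);

        // Client side: skip the WAL for a rebuildable secondary record.
        try (Table table = conn.getTable(TableName.valueOf("msgs_index"))) {
          Put put = new Put(Bytes.toBytes("index-row"));
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("ref"), Bytes.toBytes("main-row"));
          put.setDurability(Durability.SKIP_WAL); // lost on crash; restore from main record
          table.put(put);
        }
      }
    }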
  • 66 Back to the main theme... Yes, message stores.
  • 67 Schema • Every row is an inbox • Indexes as CFs or separate tables • Random updates and inserts cause storage file churn • Facebook used more than 4 or 5 schema iterations • Not really representative: pure blob storage • Evolved over time to be more HBase-like • Another customer iterated over various schemas at about the same time • Difficult to keep indexes up to date
  • 68 Facebook Messages. An interesting use-case…
  • 69 Facebook Messages - Statistics. Source: HBaseCon 2012, Anshuman Singh
  • 70  
  • 71  
  • 72   Schema 1
  • 73 Notes on Facebook Schema 1. This is basically the same as the NameNode, i.e. the application only writes edits, and those are merged with a snapshot of the data. The application does not use HBase as an operational store; all data is cached in memory. It occasionally writes large chunks, and reads only a few times to merge or recover.
  • 74 Notes on Facebook Schema 1. Three column families: • Snapshot, Actions, Keywords. Settings changes: • DFS block size: 256MB • Since large KVs are written • Efficiency of the HFile block index is a concern • Compaction ratio: 1.4 • Be more aggressive to clean up files • Split size: 2TB • Manage splitting manually • Major compactions: 3 days
  • 75   Schema 2
  • 76 Notes on Facebook Schema 2 • Eight column families • Snapshots per thread (user to user). Settings changes: • Block cache size: 55% • Cache more data on the HBase side • Blocking store files: 25 • Allow more files to be around • Compaction min size: 4MB • Reduce the number of unconditionally selected files • Major compactions: 14 days
  • 77 Schema 3
  • 78 Notes on Facebook Schema 3 • Eleven column families • Twenty regions per server • One hundred servers per cluster. Settings changes: • Block cache size: 60% • Cache more data on the HBase side • Region slop: 5% (from 20%) • Keep strict boundaries on regions per server
  • 79  
  • 80 Note the imbalance! Recall that flushes are interconnected and cause compaction storms.
  • 81 FB Messages Summary • Triggered many changes in HBase: • Changed compaction selection algorithm • Upper bounds on file sizes • Pools for small and large compactions • Online schema changes • Finer-grained metrics • Lazy seeking in files • Point-seek optimizations • …
  • 82 FB Messages Summary • Went from "snapshot" to a more proper schema • Needed to wait for the schema to settle • Could sustain warped load for a while • Eventually uses HBase more as a KV store • Tweaked settings depending on the schema • Tuned compactions from aggressive to relaxed • Changed block sizes to fit KV sizes • Strict limit on I/O • 100 servers • 20 regions per server • 50 million users per cluster
  • 83 Use-Case II: Time Series Database
  • 84 Events make big data big • The majority of use-cases deal with event-based data • Especially at the HDFS and MapReduce level • Machine scale vs. human scale • An event has attributes • Type • Identifier • Actor • Other attributes
  • 85 Events contd. • Accessing event data • Give me everything about event e_id1 • Give me everything in [t1,t2] • Give me everything for event type e_t1 in [t1,t2] • Give me everything for actor a1 in [t1,t2] • Give me everything for event type e_t1 by actor a1 in [t1,t2] • Aggregate based on some parameters (like the above) and report • Find events that match some other given criteria
  • 86 HBase and Time Series • Access patterns suited for HBase • Random access to event data or aggregate data • Serving... not real-time computing (that's Impala) • Schema design is the tricky part • OpenTSDB does this well (but is limited) • Key principle: • Collocate data you want to read together • Spread out as much as possible at write time • These two conflict in a lot of cases, so you decide on the trade-off
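
One way to see the trade-off in a row key, loosely modeled on OpenTSDB's layout (metric first, then a time bucket, then tags); the field widths and names here are hypothetical, not OpenTSDB's actual encoding:

    // Sketch: event row key of metric id, hourly time bucket, actor id.
    // Collocates one metric's recent data for fast range reads, but spreads
    // writes across regions only as far as there are distinct metrics.
    import org.apache.hadoop.hbase.util.Bytes;

    public class EventKeySketch {
      public static byte[] rowKey(int metricId, long eventTimeMs, int actorId) {
        long hourBucket = eventTimeMs / (3600 * 1000L); // one hour per key bucket
        return Bytes.add(Bytes.toBytes(metricId),
                         Bytes.toBytes(hourBucket),
                         Bytes.toBytes(actorId));
      }
    }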
  • 87 Time Series design patterns • Ingest • Flume or direct writes via the app • HDFS • Batch queries in Hive • Faster queries in Impala • No user-time serving • HBase • Serve individual events (OpenTSDB) • Serve pre-computed aggregates (OpenTSDB, FB Insights) • Solr • To make individual events searchable
  • 88 Time Series design patterns • Land data in HDFS and HBase • Aggregate in HDFS and write to HBase • HBase can do some aggregates too (counters) • Keep serveable data in HBase, then discard (TTL ftw) • Keep all data in HDFS for future use
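
The discard step can be expressed directly on the column family via a TTL (in seconds); a sketch, with the 30-day window and family name chosen for illustration:

    // Sketch: keep 30 days of serveable data in HBase; expired cells are
    // dropped during major compactions. The full history stays in HDFS.
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TtlSketch {
      public static ColumnFamilyDescriptor thirtyDayFamily() {
        return ColumnFamilyDescriptorBuilder
            .newBuilder(Bytes.toBytes("d"))
            .setTimeToLive(30 * 24 * 3600) // TTL in seconds
            .build();
      }
    }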
  • 89 The story with only HBase • Landing destination • Aggregates via counters • Serving end users • Event -> Flume/App -> HBase • Raw entry in HBase for the exact value • Multiple counter increments for aggregates • OSS implementation: OpenTSDB
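
The "raw entry plus counter increments" write path, as a sketch; the table, family, and qualifier names are hypothetical:

    // Sketch: write the raw event, then bump per-hour and per-day aggregates.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CounterSketch {
      public static void record(Connection conn, byte[] eventRow, byte[] payload,
          byte[] hourRow, byte[] dayRow) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("events"))) {
          Put raw = new Put(eventRow); // raw entry keeps the exact value
          raw.addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"), payload);
          table.put(raw);
          // Atomic server-side increments for the aggregates.
          table.incrementColumnValue(hourRow, Bytes.toBytes("a"), Bytes.toBytes("n"), 1L);
          table.incrementColumnValue(dayRow, Bytes.toBytes("a"), Bytes.toBytes("n"), 1L);
        }
      }
    }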
  • 90 Overall Summary
  • 91 Applications in HBase: requires working with schema peculiarities and implementation idiosyncrasies. What is important is to compute the write rate and un-optimize the schema to fit the given hardware; if hardware is no issue, then the optimum is achievable. Trifecta of good performance: compactions, Bloom filters, and key design (but also look out for Memstore and BlockCache settings).
  • 92 Questions?