Apache HBase is a rapidly evolving random-access distributed data store built on top of Apache Hadoop's HDFS and Apache ZooKeeper. Drawing from real-world support experience, this talk provides administrators with insight into improving HBase's availability and recovering from situations where HBase is not available. We share tips on the common root causes of unavailability, explain how to diagnose them, and prescribe measures for ensuring maximum availability of an HBase cluster. We discuss new features that improve recovery time, such as distributed log splitting, as well as supportability improvements. We also describe utilities, including new failure-recovery tools that we have developed and contributed, that can be used to diagnose and repair rare corruption problems on live HBase systems.
1. Improving HBase Availability and Repair
Jeff Bean, Jonathan Hsieh
{jwfbean,jon}@cloudera.com
6/13/12
2. Who Are We?
• Jeff Bean
  • Designated Support Engineer, Cloudera
  • Education Program Lead, Cloudera
• Jonathan Hsieh
  • Software Engineer, Cloudera
  • Apache HBase Committer and PMC member
Hadoop Summit 2012. 6/13/12. Copyright 2012 Cloudera Inc, All Rights Reserved
3. What is Apache HBase?
Apache HBase is a reliable, column-oriented data store that provides consistent, low-latency, random read/write access.
4. Fault Tolerance vs. High Availability
• Fault tolerant: the ability to recover service if a component fails, without losing data.
• Highly available: the ability to quickly recover service if a component fails, without losing data.
• Goal: minimize downtime!
5. HBase Architecture
• HBase is designed to be fault tolerant and highly available
  • It depends on other systems (App, MR, ZK, HDFS) to be as well.
• Replication for fault tolerance
  • Serve regions from any RegionServer
  • Failover HMasters
  • ZK quorums
  • HDFS block replication on DataNodes
• But replication doesn't guarantee high availability
  • There can still be software or human faults
7. Causes of Unexpected Maintenance Incidents
Unplanned maintenance: root causes from Cloudera Support
• Misconfiguration
• Metadata corruptions
• Network / HW problems
• SW problems
• Long recovery time
  • Automated and manual
Root-cause breakdown:
• Misconfig: 44%
• Repair Needed (HBase, ZK, MR, HDFS): 28%
• Fix HW/NW: 16%
• Patch Required: 12%
Source: Cloudera's production HBase support tickets (CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x)
8. Outline
• Where we were
  • HBase 0.90.x + Hadoop 0.20.x/1.0.x
  • Case studies
• Where we are today
  • HBase 0.92.x/0.94.x + Hadoop 2.0.x
  • Feature summary
• Where we are going
  • HBase 0.96.x + Hadoop 2.x
  • Feature preview
9. "[T]here are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – there are things we do not know we don't know."
—United States Secretary of Defense Donald Rumsfeld
WHERE WE WERE: CASE STUDIES
10. Best Practices to Avoid Hazards
Best practices can prevent HBase misconfigurations, the largest slice of unplanned maintenance.
(Root causes, per Cloudera Support: Misconfig 44%, Repair Needed 28%, Fix HW/NW 16%, Patch Required 12%. Source: Cloudera's production HBase support tickets, CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x)
11. Case #1: Memory Over-subscription Hazard
Misconfiguration:
• Too many MR slots
• MR slots too large
• "Arbitrary" processes
Bad outcome (Node A swaps under load; Node B can't connect to Node A):
• MapReduce tasks fail
• HDFS datanode operations time out
• Processes pause or become unresponsive
• HBase client operations fail
Masters take action:
• JobTracker blacklists the TaskTracker
• Jobs fail or run slow
• NameNode re-replicates blocks from Node A
12. Case #2, #3: Hazards of Abusing HDFS and ZK
Case #2: Millions of HDFS files
• Bad practice: 500,000 blocks per datanode
• SW bug: heartbeat thread blocks on IO, fails to create new blocks
• Bad outcome: RegionServers cannot access HDFS; HBase goes down
Case #3: Millions of ZK nodes
• Misconfiguration: millions of ZK znodes; 400MB snapshot
• SW bug: ZK fails to write snapshots
• Bad outcome: HBase goes down
• SW bug, worse outcome: HBase fails to restart
13. Case #4: Splitting Corruption from HW Failure
• HW failure: network failure (takes out the NN)
• SW bug: region split recovery leaves an incomplete split
• Result: HBase has region inconsistencies (overlaps / holes)
• Repair is manual, slow, and requires an expert: multiple 6-hour manual repair sessions
14. Case #5: Slow Recovery from HW Failure
• Human error: network HW failure
• SW error: on restart, the RS loses its hlog; ROOT and .META. assignment fails
• Manual repairs: HDFS and WALs
• 9-hour hlog splitting recovery: correct, but slow!
15. Initial Lessons
• Use best practices to avoid problems
  • Be conservative first
  • Avoid unstable features
• What can we do?
  • Fix the bugs
  • Recover from problems faster
  • Make people smarter to avoid hazards and misconfigurations
  • Make software smarter to prevent hazards and misconfigurations
16. "In war, then, let your great object be victory, not lengthy campaigns." -- Sun Tzu
WHERE WE ARE TODAY: HBASE 0.92.X + HADOOP 2.0.X
17. Goal: Reduce Unexpected Downtime by Recovering Faster
• Removing the SPOFs
  • HA HDFS
• Faster recovery
  • Improved hbck
  • Distributed log splitting
18. Problem: HDFS NN Goes Down Under HBase
• HBase depends on HDFS.
  • If HDFS is down, HBase goes down.
• Ramifications:
  • Forces the recovery mechanism
  • Caused some data corruptions
• Ideally we avoid having to do recovery at all.
19. HBase-HDFS HA Nodes
• NameNode (active metadata server) + NameNode (standby): active-standby hot failover
• HMaster (region metadata) + HMaster (hot standby)
• ZooKeeper quorum
• HDFS DataNodes and HBase RegionServers
20. HBase-HDFS HA Nodes: Transparent to HBase
• HMaster (region metadata) + HMaster (hot standby)
• NameNode (active): the HA pair appears as a single NameNode to HBase
• ZooKeeper quorum
• HDFS DataNodes and HBase RegionServers
21. HBase-HDFS HA Nodes: No More SPOF
• HMaster (active) + NameNode (active)
• ZooKeeper quorum
• HDFS DataNodes and HBase RegionServers
22. Recovery Operations
• If a network switch fails or if there is a power outage, HBase, ZK, and HA HDFS will fail
• We will always still rely on recovery mechanisms, and need to be able to recover quickly
  • Metadata invariants to fix metadata corruptions
  • Data consistency to restore ACID guarantees
23. HBase Metadata Corruptions
• Internal HBase metadata corruptions:
  • Prevent HBase from starting
  • Cause some regions to be unavailable
• Repairs are intricate and can cause extended periods of downtime.
(Root causes, per Cloudera Support: Misconfig 44%, Repair Needed 28%, Fix HW/NW 16%, Patch Required 12%)
24. HBase Metadata Invariants
Table integrity:
• Every key shall get assigned to a single region.
• Example region chain: ['', A), [A, B), [B, C), [C, D), [D, E), [E, F), [F, G), [G, '')
Region consistency:
• Metadata about regions should agree in HDFS, META, and region server assignment.
• A good region has a regioninfo entry in META, an assignment to a RegionServer, and a .regioninfo file in HDFS.
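The table-integrity invariant can be sketched as a check over the sorted region chain: adjacent regions must meet exactly, with no holes and no overlaps. A minimal illustration of the idea (this mirrors the invariant, not hbck's actual implementation):

```python
# Table-integrity sketch: every key must belong to exactly one region.
# Regions are (start_key, end_key) pairs; '' marks the open ends of the
# table. Illustrative only -- not hbck's real data structures.

def check_region_chain(regions):
    """Return a list of problems found: holes, overlaps, bad endpoints."""
    problems = []
    regions = sorted(regions)                 # order by start key
    if not regions:
        return ["no regions"]
    if regions[0][0] != '':
        problems.append("first region does not start at ''")
    if regions[-1][1] != '':
        problems.append("last region does not end at ''")
    for (s1, e1), (s2, e2) in zip(regions, regions[1:]):
        if e1 < s2:
            problems.append(f"hole between {e1!r} and {s2!r}")
        elif e1 > s2:
            problems.append(f"overlap: [{s1!r},{e1!r}) and [{s2!r},{e2!r})")
    return problems

good = [('', 'A'), ('A', 'B'), ('B', 'C'), ('C', '')]
bad  = [('', 'A'), ('B', 'C'), ('C', '')]     # [A,B) is covered by nothing
print(check_region_chain(good))               # []
print(check_region_chain(bad))
```

On the `good` chain the check returns no problems; on `bad` it reports the hole between 'A' and 'B', i.e. keys that no region would serve.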
25. Detecting and Repairing Corruption with hbck
• HBase 0.90 hbck:
  • Checks an HBase instance's internal invariants.
• HBase hbck today:
  • Checks and can fix problems in an HBase instance's internal invariants
  • In 0.90.7, 0.92.2, 0.94.0 (CDH3u4, CDH4)
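One of the repairs this enables is plugging a hole in the region chain by fabricating an empty region that spans the gap. A toy version of that idea, purely for intuition (real hbck must also update META, HDFS, and assignments):

```python
# Toy hole repair: when a key range is covered by no region, synthesize
# an empty region spanning the gap so the chain is complete again.
# Illustrative only -- not hbck's actual repair code.

def plug_holes(regions):
    """Return the region chain with any holes filled by empty regions."""
    regions = sorted(regions)
    fixed = []
    for (s1, e1), (s2, e2) in zip(regions, regions[1:]):
        fixed.append((s1, e1))
        if e1 < s2:                      # hole: no region covers [e1, s2)
            fixed.append((e1, s2))       # synthesize an empty region
    fixed.append(regions[-1])
    return fixed

broken = [('', 'A'), ('B', 'C'), ('C', '')]      # [A,B) is missing
print(plug_holes(broken))
```

After the repair, every key range is covered again and the table-integrity invariant holds.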
26. Case #4 Redux: Splitting Corruption
• HW failure: network failure (takes out the NN)
• SW bug: region split recovery leaves an incomplete split
• HBase has region inconsistencies (overlaps / holes)
• Repair is manual, slow, and requires an expert: multiple 6-hour manual repair sessions
27. Case #4 Redux: Splitting Corruption
• HW failure: network failure (takes out the NN)
• SW bug: region split recovery leaves an incomplete split
• HBase has region inconsistencies (overlaps / holes)
• Automated repair tool (minutes): fixes are quicker, and the operator can run them
28. Case #4 Redux: Splitting Corruption
• HW failure: network failure (takes out the NN)
• SW bug fixed: split recovery no longer leaves incomplete splits
• Only minor inconsistencies remain (bad assignments)
• Automated repair tool (seconds)
29. Data Consistency
• When a region server goes down, it tries to flush data in memory to HDFS.
  • If it cannot write to HDFS, it relies on the WAL/HLog.
• Recovery via the HLog is vital to prevent data loss
  • Understand the write path.
  • Recovery: HLog splitting.
  • Faster recovery: distributed HLog splitting.
30. Write Path (Put / Delete / Increment)
• The HBase client sends a Put to the Region Server.
• The Region Server appends the edit to its HLog, then applies it to the target HRegion's MemStore (backed by HStores).
31. Write Path (Put / Delete / Increment)
• Note: both regions on a Region Server write to the same HLog.
• Each Put is appended to the shared HLog, then applied to its own HRegion's MemStore.
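The write path above can be sketched in a few lines: one shared log per server, written before any in-memory state changes. The class and field names here are illustrative, not HBase's actual Java classes:

```python
# Sketch of the HBase write path: a region server appends every edit to
# one shared write-ahead log (the HLog) *before* applying it to the
# region's in-memory MemStore. Names are illustrative, not HBase's.

class RegionServerSketch:
    def __init__(self, region_names):
        self.hlog = []                                   # one WAL, shared
        self.memstores = {r: {} for r in region_names}   # per-region memory

    def put(self, region, key, value):
        # 1. Durability first: append the edit to the shared WAL.
        self.hlog.append((region, key, value))
        # 2. Then apply it to the region's MemStore (later flushed to
        #    on-disk HStores).
        self.memstores[region][key] = value

rs = RegionServerSketch(['regionA', 'regionB'])
rs.put('regionA', 'row1', 'x')
rs.put('regionB', 'row9', 'y')
print(len(rs.hlog))          # edits from both regions share one log
```

Because edits from all regions are interleaved in one log, replaying a dead server's log requires splitting it back out per region first, which is exactly the log-splitting step the next slides walk through.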
32. Log Splitting
The HMaster oversees three RegionServers; each RegionServer has an HLog (HLog1, HLog2, HLog3, …) and serves HRegions with in-memory MemStores.
33. Log Splitting
The RegionServers fail; their HLogs must be split per region so the edits can be replayed.
39. Log Splitting
HMaster: "Whew. I did a lot of splitting work. That took 9 hours!"
40. Log Splitting
HMaster: "RegionServers, here are your region assignments." (RegionServer4, RegionServer5, RegionServer6)
41. Log Splitting
Victory! The regions are reassigned to RegionServer4–6 and their MemStores are restored.
42. Can We Recover More Quickly?
• In the case study, this is all done serially by the master
  • The master took 9 hours to recover.
  • The 100 region server nodes were idle.
• Let's use the idle machines to do splitting in parallel!
• Distributed log splitting (HBASE-1364)
  • Introduced in 0.92.0 by Prakash Khemani (Facebook)
  • Included in CDH4 (0.92.1)
  • Backported to CDH3u3 (off by default)
43. Distributed Log Splitting
HMaster: "I'm the boss." (RegionServers holding HLog1, HLog2, HLog3, …)
44. Distributed Log Splitting
HMaster: "There is a lot of splitting work here, let's split it up."
45. Distributed Log Splitting
HMaster: "You guys do the work for me." (HLog1, HLog2, HLog3 are handed to RegionServer4, RegionServer5, RegionServer6)
46. Distributed Log Splitting
The RegionServers split the HLogs in parallel.
47. Distributed Log Splitting
HMaster: "Great, that took 5.4 minutes."
48. Distributed Log Splitting
HMaster: "Good job, here are your region assignments."
49. Distributed Log Splitting
Like a boss. The regions come back online on RegionServer4–6 with their MemStores restored.
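The speedup in the sequence above comes from a simple fact: the recovery work decomposes into one independent task per HLog file, so surviving servers can take tasks in parallel instead of the master working through them serially. A back-of-the-envelope sketch (the per-log timings are made up, not measurements):

```python
# Why distributed log splitting wins: splitting is one task per HLog
# file, so N surviving region servers can work N logs at once instead
# of the master doing them one after another. Timings are made up.

def serial_split_time(log_times):
    """Master splits one log at a time: total is the sum."""
    return sum(log_times)

def distributed_split_time(log_times, n_workers):
    """Greedy assignment: hand the next log to the least-loaded worker;
    total is the busiest worker's finish time."""
    workers = [0.0] * n_workers
    for t in sorted(log_times, reverse=True):
        workers[workers.index(min(workers))] += t
    return max(workers)

logs = [60.0] * 9                          # nine logs, one minute each
print(serial_split_time(logs))             # 540.0
print(distributed_split_time(logs, 3))     # 180.0
```

With nine equal logs and three workers the wall-clock time drops 3x; with the ~100 otherwise-idle region servers from the case study, the same decomposition is what turns a 9-hour serial recovery into minutes.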
50. Case #5 Redux: Network Failure and Slow Recovery
• Human error: network HW failure
• SW error: on restart, the RS loses its hlog; ROOT and .META. assignment fails
• Manual repairs: HDFS and WALs
• 9-hour hlog splitting recovery: correct, but slow!
51. Case #5 Redux: Network Failure and Slow Recovery
• Human error: network HW failure
• Fixed! On restart, ROOT and .META. assignment no longer fails
• Automatic repairs: HDFS and WALs
• 5.4-minute splitting recovery: correct and faster!
52. WHERE WE ARE GOING: HBASE 0.96 + HADOOP 2.X
53. Themes
• Minimizing planned downtime (HBase downtime distribution: planned vs. unplanned)
  • Changing configurations
  • Online schema change (experimental in 0.92, 0.94)
  • Rolling restarts
  • Wire compatibility
54. Table Unavailable When Changing Schema
• Changing table schema requires disabling the table
  • disable table, alter table schema, enable table
  • Schema includes compression, CFs, caching, TTL, versions.
• Goal: quickly change table and column configuration settings without having to disable HBase tables.
• Feature: Online Schema Change (HBASE-1730)
  • Included in, but considered experimental in, HBase 0.92/0.94.
  • Contributed by Facebook
55. Changing Server Configs and Software Updates
• A rolling restart is an operation for upgrading an HBase cluster to a compatible version while keeping HBase available and serving data.
  • Handles server config changes.
  • Handles code changes like hotfixes or compatible upgrades.
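The rolling-restart sequence shown on the following slides can be summarized as a plan: restart one daemon at a time so that regions can move to the surviving servers and a standby master can take over. A minimal sketch of that ordering (node names and the exact order are illustrative):

```python
# Sketch of a rolling restart: one daemon restarted at a time, so the
# cluster keeps serving throughout. Node names are illustrative.

def rolling_restart_plan(region_servers, masters):
    """Return the ordered restart steps for a rolling restart."""
    plan = []
    for rs in region_servers:          # each region server in turn;
        plan.append(('restart', rs))   # its regions move to the others
    for hm in masters:                 # then each master in turn; the
        plan.append(('restart', hm))   # standby takes over meanwhile
    return plan

plan = rolling_restart_plan(['RS1', 'RS2', 'RS3', 'RS4'], ['HM1', 'HM2'])
for step in plan:
    print(step)
```

At every step only one daemon is down, which is why the user, admin, and internal operations in the slides keep flowing, and also why every server must stay wire compatible with the mixed versions running mid-restart.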
56.–69. Rolling Restart (animation)
The cluster runs ZK, a client shell, HM1, HM2, and RS1–RS4, with admin, user, and internal operations continuing throughout. One daemon is restarted at a time:
• Slides 57–64: each RegionServer is restarted in turn (RS1, then RS2, RS3, RS4); the others keep serving its regions.
• Slides 65–68: each HMaster is restarted in turn (HM1, then HM2); the standby takes over in the meantime.
• Slide 69: the whole cluster is back up; availability was maintained throughout.
70. Rolling Restart Limitations
• There are limitations on rolling restarts
  • All servers and clients must be wire compatible
  • All must be able to read old data in the FS and ZK.
• Ramifications:
  • Only minor version upgrades possible
  • New features that change RPCs require custom compatibility shims.
  • Data format changes not possible across minor versions.
(Root causes, per Cloudera Support: Misconfig 44%, Repair Needed 28%, Fix HW/NW 16%, Patch Required 12%. Source: Cloudera's production HBase support tickets, CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x)
71. HBase Compatibility and Extensibility
• Coming in HBase 0.96
  • HBASE-5305 and friends
• Goals:
  • Allow API changes and persistent data structure changes while guaranteeing compatibility between different minor versions (0.96.0 -> 0.96.1)
  • HBase client-server compatibility between major versions (0.96.x -> 0.98.x)
72. HDFS Wire Compatibility
• Here in HDFS 2.0.x
  • HADOOP-7347 and friends
• Goals:
  • Allow API changes while guaranteeing wire compatibility between different minor versions
  • HDFS client-server compatibility between major versions
74. CONCLUSIONS
75. Improving How We Handle the Causes of Downtime
HBase downtime distribution: planned downtime is addressed by wire compatibility; unplanned downtime by best practices, hbck, and distributed log splitting.
Unplanned maintenance root causes (per Cloudera Support): Misconfig 44% (best practices), Repair Needed 28% (hbck), Fix HW/NW 16% (hbck and distributed log splitting), Patch Required 12% (wire compat).
Source: Cloudera's production HBase support tickets (CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x)
76. QUESTIONS?
jon@cloudera.com
Twitter: @jmhsieh
We're hiring!