Apache HBase is a distributed data store in production today at many enterprises and sites serving large volumes of near-real-time random accesses. By supporting a wide range of production Apache HBase clusters with diverse use cases and sizes over the past year, we've noticed several new trends, learned lessons, and taken action to improve the HBase experience. We'll present aggregated root-cause statistics on resolved support tickets from the past year. Comparing these with the previous year's shows an interesting shift away from problems internal to HBase (splitting, repairs, recovery time) toward user-inflicted problems, such as poor application architecture, that can be mitigated by tuning (bulk load, read/write latencies, and compaction policies). The talk will discuss several tuning tips used for a variety of production workloads running on HBase 0.92.x/0.94.x clusters with 10s to 100s of nodes, including settings and their justification for sizing clusters, tuning bulk loads, region counts, and memory settings. We'll also discuss recently added HBase features that alleviate these problems, including improved mean time to recovery, improved predictability, and improved performance.
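The abstract lists the tuning areas but not the settings themselves. One widely used technique for the region-count and write-distribution problems it mentions is row-key salting, which prevents sequential keys from hot-spotting a single region server. A minimal sketch (the key scheme and bucket count are hypothetical illustrations, not taken from the talk):

```python
import hashlib

# Row-key salting: prefix each key with a hash-derived bucket so
# sequential keys spread across pre-split regions instead of
# hot-spotting one region server.

NUM_BUCKETS = 16  # hypothetical: one bucket per pre-split region


def hash_bucket(row_key: str) -> int:
    # A stable hash (not Python's randomized hash()) so the same key
    # always maps to the same bucket across processes.
    digest = hashlib.md5(row_key.encode()).digest()
    return digest[0] % NUM_BUCKETS


def salted_key(row_key: str) -> str:
    """Prefix the key with a stable two-digit bucket number."""
    return f"{hash_bucket(row_key):02d}-{row_key}"


# Sequential, timestamp-like keys now scatter across buckets:
keys = [salted_key(f"event-{ts}") for ts in range(1000, 1010)]
```

Scans then fan out over all buckets in parallel; the trade-off is that a single logical range scan becomes NUM_BUCKETS smaller scans.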
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production (Cloudera, Inc.)
Walk through some of the best practices to keep in mind when it comes to upgrading your cluster, and learn how to leverage new Upgrade Wizard features in Cloudera Enterprise 5.3.
For most mission critical workloads, downtime is never an option. Any downtime can have a direct impact on revenue and lead to frantic calls in the middle of the night. For this reason, upgrading the software that powers these workloads can often be a daunting task. It can cause unpredictable issues without access to support. That’s why an enterprise-grade administration tool is crucial for running Hadoop in production. Hadoop consists of dozens of components, running across multiple machines, all with their own configurations. That can lead to a lot of complexity and uncertainty - especially when taking the upgrade plunge.
Cloudera Manager makes upgrades easy and is the only production-ready administration tool for Hadoop. Not only does Cloudera Manager feature zero-downtime rolling upgrades, but it also has a built-in Upgrade Wizard to make upgrades simple and predictable.
Impala 2.0 - The Best Analytic Database for Hadoop (Cloudera, Inc.)
A look at why SQL access in Hadoop is critical and the benefits of a native Hadoop analytic database, what’s new with Impala 2.0 and some of the recent performance benchmarks, some common Impala use cases and production customer stories, and insight into what’s next for Impala.
SQL and Machine Learning on Hadoop using HAWQ (pivotalny)
It is true to the point of being almost rhetorical to say:
"Many enterprises have adopted HDFS as the foundational layer for their data lakes. HDFS provides the flexibility to store any kind of data and, more importantly, it is infinitely scalable on commodity hardware."
But the conundrum to date has been finding a low-latency query engine for HDFS.
At Pivotal, we cracked that problem, and the answer is HAWQ, which we intend to open source this year. During this event, we will present and demo HAWQ's architecture, its powerful ANSI SQL features, and its ability to transcend traditional BI in the form of in-database analytics (machine learning).
What the Enterprise Requires - Business Continuity and Visibility (Cloudera, Inc.)
Cloudera Enterprise BDR delivers centralized disaster recovery for data and metadata, enabling you to prepare for disaster by moving data to your secondary site automatically. Cloudera Navigator 1.0 provides data governance capabilities such as verifying access privileges and auditing access to all data stored in Hadoop, which are critical for customers that are in highly regulated industries and have stringent compliance requirements.
This presentation will teach you how to:
- Centrally configure and manage replication workflows for files (HDFS) and metadata (Hive)
- Consistently meet or exceed SLAs and RTOs through simplified management and process automation
- Track access permissions and actual accesses to all data objects in Hive, HBase, and HDFS
- Answer the questions:
- Who has access to which data object(s)
- Which data objects were accessed by a user
- When was a data object accessed and by whom
- What data assets were accessed using a service
- Which device was used to access the data
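The audit questions above all reduce to grouping a single stream of access events by different fields. A toy sketch of that idea in Python (the event schema is invented for illustration and is not Navigator's actual log format):

```python
from collections import defaultdict

# Hypothetical audit events; a real audit log's schema differs.
events = [
    {"user": "alice", "object": "/data/sales", "service": "hive",  "time": "2014-01-05T10:00"},
    {"user": "bob",   "object": "/data/sales", "service": "hdfs",  "time": "2014-01-05T11:30"},
    {"user": "alice", "object": "/data/hr",    "service": "hbase", "time": "2014-01-06T09:15"},
]


def objects_accessed_by(user):
    """Which data objects were accessed by a user?"""
    return sorted({e["object"] for e in events if e["user"] == user})


def accesses_to(obj):
    """When was a data object accessed, and by whom?"""
    return [(e["time"], e["user"]) for e in events if e["object"] == obj]


def assets_by_service():
    """What data assets were accessed using each service?"""
    by_service = defaultdict(set)
    for e in events:
        by_service[e["service"]].add(e["object"])
    return dict(by_service)
```

At cluster scale the same group-bys run over the collected audit trail rather than an in-memory list, but the questions map to queries in exactly this way.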
This talk was given by Marcel Kornacker at the 11th meeting, on April 7, 2014.
Impala (impala.io) raises the bar for SQL query performance on Apache Hadoop. With Impala, you can query Hadoop data – including SELECT, JOIN, and aggregate functions – in real time to do BI-style analysis. As a result, Impala makes a Hadoop-based enterprise data hub function like an enterprise data warehouse for native Big Data.
Deep Dive - Usage of on-premises data gateway for hybrid integration scenarios (Sajith C P Nair)
Presentation delivered by Sajith C P, Integration Architect at the 2017 Global Integration Bootcamp, Bangalore.
https://www.biztalk360.com/gib2017-india/#speakers
In this session the speaker talked about the 'on-premises data gateway' as a secure, centralized gateway that can be used for accessing on-premises data from various Azure services. He took a deep dive into how it works, how to install it, and various methods to troubleshoot connectivity. He concluded the session with a few demos of its use in Azure Logic Apps, Microsoft Flow, Power Apps, and Power BI.
While you might be tempted to assume data is already safe in a single Hadoop cluster, in practice you have to plan for more. Questions like "What happens if the entire datacenter fails?" or "How do I recover into a consistent state of data, so that applications can continue to run?" are not at all trivial to answer for Hadoop. Did you know that HDFS snapshots do not treat open files as immutable? Or that HBase snapshots are executed asynchronously across servers and therefore cannot guarantee atomicity for cross-region updates (which includes tables)? There is no unified and coherent data backup strategy, nor is there tooling available for many of the included components to build such a strategy. The Hadoop distributions largely avoid this topic, as most customers are still in the "single use-case" or PoC phase, where data governance as far as backup and disaster recovery (BDR) is concerned is not (yet) important. This talk first introduces you to the overarching issue and difficulties of backup and data safety, looking at each of the many components in Hadoop, including HDFS, HBase, YARN, Oozie, the management components, and so on, and finally shows you a viable approach using built-in tools. You will also learn not to take this topic lightheartedly and what is needed to implement and guarantee continuous operation of Hadoop cluster-based solutions.
The Hadoop Distributed File System is the foundational storage layer in typical Hadoop deployments. Performance and stability of HDFS are crucial to the correct functioning of applications at higher layers in the Hadoop stack. This session is a technical deep dive into recent enhancements committed to HDFS by the entire Apache contributor community. We describe real-world incidents that motivated these changes and how the enhancements prevent those problems from reoccurring. Attendees will leave this session with a deeper understanding of the implementation challenges in a distributed file system and identify helpful new metrics to monitor in their own clusters.
HBase provides many features for multi-tenancy and isolation. However, operating these features requires integration into the broader operations of a cluster. This talk will cover some methods we use at Bloomberg for multi-tenancy and discuss some HBase-Oozie integration. Of particular interest is our work on an Oozie action for secure snapshot export -- this extends the HBase security model via Oozie, allowing self-service (non-hbase user) snapshot export on secure clusters.
Key topics:
* Bloomberg's Oozie HBase export snapshot action
* Oozie coordinated time based major compactions
* How we use LDAP with HBase (and why to take care with HADOOP-12291)
* Some of our multi-tenancy setups around monitoring for SLAs
* Suggesting HBase stays the course of being "just" a datastore -- and all projects following the Unix philosophy (this has made things like our Oozie integration much easier!)
Sharing metadata across the data lake and streams (DataWorks Summit)
Traditionally systems have stored and managed their own metadata, just as they traditionally stored and managed their own data. A revolutionary feature of big data tools such as Apache Hadoop and Apache Kafka is the ability to store all data together, where users can bring the tools of their choice to process it.
Apache Hive's metastore can be used to share the metadata in the same way. It is already used by many SQL and SQL-like systems beyond Hive (e.g. Apache Spark, Presto, Apache Impala, and via HCatalog, Apache Pig). As data processing changes from only data in the cluster to include data in streams, the metastore needs to expand and grow to meet these use cases as well. There is work going on in the Hive community to separate out the metastore, so it can continue to serve Hive but also be used by a more diverse set of tools. This talk will discuss that work, with particular focus on adding support for storing schemas for Kafka messages.
Speaker
Alan Gates, Co-Founder, Hortonworks
With the advent of Hadoop comes the need for professionals skilled in Hadoop administration, making Hadoop admin skills imperative for better career, salary, and job opportunities.
LinkedIn leverages the Apache Hadoop ecosystem for its big data analytics. Steady growth of the member base at LinkedIn, along with their social activities, results in exponential growth of the analytics infrastructure. Innovations in analytics tooling lead to heavier workloads on the clusters, which generate more data, which in turn encourage innovations in tooling and more workloads. Thus, the infrastructure remains under constant growth pressure. Heterogeneous environments, with a variety of hardware and diverse workloads, make the task even more challenging.
This talk will tell the story of how we doubled our Hadoop infrastructure twice in the past two years.
• We will outline our main use cases and historical rates of cluster growth in multiple dimensions.
• We will focus on optimizations, configuration improvements, performance monitoring and architectural decisions we undertook to allow the infrastructure to keep pace with business needs.
• The topics include improvements in HDFS NameNode performance, and fine tuning of block report processing, the block balancer, and the namespace checkpointer.
• We will reveal a study on the optimal storage device for HDFS persistent journals (SATA vs. SAS vs. SSD vs. RAID).
• We will also describe the Satellite Cluster project, which allowed us to double the number of objects stored on one logical cluster by splitting an HDFS cluster into two partitions, without the use of federation and with practically no code changes.
• Finally, we will take a peek at our future goals, requirements, and growth perspectives.
SPEAKERS
Konstantin Shvachko, Sr Staff Software Engineer, LinkedIn
Erik Krogen, Senior Software Engineer, LinkedIn
Presentation: “Big Data and MicroStrategy: Building a Bridge for the Elephant”
Intelligent engineering of an agile business requires the ability to connect the vast array of requirements, technologies and data that build up over time, while avoiding the pitfalls commonly encountered on the road to giving users comprehensive, yet nimble business analytics with MicroStrategy.
The Google generation, armed with iPads and Droid phones, brings big, bold ideas about how "Big Data" will solve the new wave of business problems; traditional users know that addressing them requires more than just embracing buzzwords like "sentiment," "R," and "Hadoop." Overall success requires building a bridge between the stable, proven, mature BI solutions in place today and the disruptive new world. Enabling deeper analytics, predictive modeling, and social media analysis, in combination with scalable self-service dashboards, reporting, and analytics, is no longer an idea but a must-do.
This informative presentation describes these business challenges and how an organization leveraged the Kognitio Analytical Platform under MicroStrategy to build such a bridge.
Hortonworks Technical Workshop - Operational Best Practices Workshop (Hortonworks)
Hortonworks Data Platform is a key component of a modern data architecture. Organizations rely on HDP for mission-critical business functions and expect the system to be constantly available and performant. In this session we will cover the operational best practices for administering the Hortonworks Data Platform, including initial setup and ongoing maintenance.
Introduction to Kudu - StampedeCon 2016 (StampedeCon)
Over the past several years, the Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap compared to traditional database technologies. With systems such as Impala and Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily-sized datasets.
Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but scan rates that are too slow for large scale data warehousing workloads.
This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. It will also describe Kudu, the new addition to the open source Hadoop ecosystem that fills the gap described above, complementing HDFS and HBase to provide a new option to achieve fast scans and fast random access from a single API.
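The rock-and-a-hard-place trade-off described above can be made concrete with a toy model: a column-oriented layout keeps each attribute contiguous, so an analytic scan touches only the bytes it needs, while a key-indexed row store answers point lookups directly. A simplified Python illustration (a teaching sketch, not Kudu's actual storage engine):

```python
# Toy contrast between a columnar layout (fast full-column scans)
# and a row store keyed by primary key (fast point lookups).

rows = [{"id": i, "value": i * 10, "flag": i % 2 == 0} for i in range(1000)]

# Columnar layout: one contiguous list per attribute.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Row store: hash index from primary key to the full row.
row_index = {r["id"]: r for r in rows}


def scan_sum(col_name):
    """Analytic scan: touch a single column end to end."""
    return sum(columns[col_name])


def point_lookup(pk):
    """Random access: fetch one complete row by key."""
    return row_index[pk]
```

The columnar scan never materializes whole rows, and the point lookup never walks a column; a system like Kudu aims to serve both access patterns from one table through a single API.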
From: DataWorks Summit Munich 2017 - 20170406
Chicago Data Summit: Geo-based Content Processing Using HBase (Cloudera, Inc.)
NAVTEQ uses Cloudera Distribution including Apache Hadoop (CDH) and HBase with Cloudera Enterprise support to process and store location content data. With HBase and its distributed and column-oriented architecture, NAVTEQ is able to process large amounts of data in a scalable and cost-effective way.
Fundamentals of big data, Hadoop project design, and a case study or use case.
General planning considerations and essentials of the Hadoop ecosystem and Hadoop projects.
This will provide the basis for choosing the right Hadoop implementation, integrating Hadoop technologies, driving adoption, and creating an infrastructure.
Building applications using Apache Hadoop, with a use case of Wi-Fi log analysis as a real-life example.
Multi-Tenant Operations with Cloudera 5.7 & BT (Cloudera, Inc.)
One benefit of Apache Hadoop is the ability to power multiple workloads, across many different users and departments, all within a single, shared cluster. Hear how BT is doing this today and learn about new features in Cloudera Manager to provide better visibility for multi-tenant operations.
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins... (EMC)
Pivotal has set up and operationalized a 1000-node Hadoop cluster called the Analytics Workbench. It takes special setup and skills to manage such a large deployment. This session shares how we set it up and how to manage it.
After this session you will be able to:
Objective 1: Understand what it takes to operationalize a 1000-node Hadoop cluster.
Objective 2: Understand how to set up and manage the day-to-day challenges of a large Hadoop deployment.
Objective 3: Have a view of the tools that are necessary to solve the challenges of managing a large Hadoop cluster.
Hitachi Data Systems Hadoop Solution. Customers are seeing exponential growth of unstructured data, from their social media websites to operational sources. Their enterprise data warehouses are not designed to handle such high volumes and varieties of data. Hadoop, the software platform that scales to process massive volumes of unstructured and semi-structured data by distributing the workload across clusters of servers, gives customers a new option to tackle data growth and deploy big data analysis to help better understand their business. Hitachi Data Systems is launching its latest Hadoop reference architecture, pre-tested with the Cloudera Hadoop distribution, to provide a faster time to market for customers deploying Hadoop applications. HDS, Cloudera, and Hitachi Consulting will present together and explain how to get there. Attend this WebTech and learn how to:
- Solve big-data problems with Hadoop
- Deploy Hadoop in your data warehouse environment to better manage your unstructured and structured data
- Implement Hadoop using the HDS Hadoop reference architecture
For more information on the Hitachi Data Systems Hadoop Solution, please read our blog: http://blogs.hds.com/hdsblog/2012/07/a-series-on-hadoop-architecture.html
Analysis of historical movie data by BHADRABhadra Gowdra
A recommendation system understands a person's taste and automatically finds new, desirable content for them based on the patterns in their likes and ratings of different items. In this paper, we propose a recommendation system, built on the Hadoop framework, for the large amount of data available on the web in the form of ratings, reviews, opinions, complaints, remarks, feedback, and comments about any item (product, event, individual, or service).
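As a minimal sketch of the idea (in plain Python rather than the paper's Hadoop pipeline), unseen items can be scored by similarity-weighted ratings from other users. The ratings data and the cosine-similarity choice below are illustrative assumptions, not the paper's actual method.

```python
# A toy user-based recommendation sketch: score items a user has not
# rated by the similarity-weighted ratings of other users.
from math import sqrt

ratings = {  # user -> {item: rating} (illustrative data)
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob":   {"matrix": 4, "inception": 5},
    "carol": {"titanic": 5, "inception": 2},
}

def cosine(u, v):
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

def recommend(user, k=1):
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("bob"))  # ['titanic'] — the only item bob hasn't rated
```

In the paper's setting, the same scoring step would be distributed over MapReduce rather than run in memory.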
Hbase in action - Chapter 09: Deploying HBasephanleson
Learning HBase, Real-time Access to Your Big Data, Data Manipulation at Scale, Big Data, Text Mining, HBase, Deploying HBase
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced that users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick and short hands-on introduction to ML with python’s scikit-learn library. The environment in CDSW is interactive and the step-by-step guide will walk you through setting up your environment, to exploring datasets, training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, what problems they can solve, and walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. The labs are run in the cloud; no installation is needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1 hour in). Basic knowledge of Python is highly recommended.
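A flavor of the hands-on portion can be sketched with scikit-learn's standard train/evaluate loop. The dataset and model below are illustrative choices, not necessarily the workshop's exact labs.

```python
# A minimal supervised-learning example with scikit-learn:
# load a popular dataset, split it, train a model, evaluate it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=500).fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

The same pattern (fit on a training split, score on a held-out split) applies to the other estimators the lecture covers.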
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS predominates is the specific durability requirements of HBase's write-ahead log (WAL), which HDFS guarantees correctly. However, with sufficient effort, HBase's use of HDFS for WALs can be replaced.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase that provides a sufficient level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library-implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
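The core durability rule RAFT gives such a Log Service can be shown in a few lines: an appended WAL entry only counts as durable once a strict majority of the replica group has persisted it. This toy model illustrates the quorum rule only; it is not Apache Ratis code.

```python
# A toy majority-quorum durability check, modeled on the RAFT commit rule.
def is_durable(acks, cluster_size):
    """An entry is committed once a strict majority has persisted it."""
    return acks >= cluster_size // 2 + 1

def append_entry(replicas, entry):
    # Simulate appending to each replica; a down replica is None.
    acks = 0
    for log in replicas:
        if log is not None:
            log.append(entry)
            acks += 1
    return is_durable(acks, len(replicas))

logs = [[], [], None]                   # 3-node group, one node down
print(append_entry(logs, "put row1"))   # True: 2 of 3 acknowledged
```

With two of three nodes down, the same append would return False and the write could not be acknowledged to the client, which is exactly the guarantee HBase needs from its WAL storage.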
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
Utilizing Apache NiFi, we read various open-data REST APIs and camera feeds to ingest crime and related data, streaming it in real time into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time time-series data sources. We can immediately query our data utilizing Apache Zeppelin against Phoenix tables, as well as Hive external tables mapped to HBase.
Apache Phoenix tables also make a great option since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables.
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
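As an illustration of why HBase suits real-time time-series sources like these, a common row-key design spreads writes across regions with a salt prefix and sorts the newest events first via a reversed timestamp. The bucket count and field names below are hedged assumptions for illustration, not the articles' actual schema.

```python
# A sketch of a salted, newest-first row key for time-series events.
import struct

SALT_BUCKETS = 8
MAX_LONG = 2**63 - 1

def row_key(incident_id: str, epoch_millis: int) -> bytes:
    salt = hash(incident_id) % SALT_BUCKETS     # spreads write load
    reversed_ts = MAX_LONG - epoch_millis       # newest sorts first
    return bytes([salt]) + struct.pack(">q", reversed_ts) + incident_id.encode()

k1 = row_key("DC-1001", 1_560_000_000_000)
k2 = row_key("DC-1001", 1_560_000_100_000)
print(k2 < k1)  # True: the later event sorts before the earlier one
```

A scan from the start of a salt bucket then returns the most recent incidents first, which matches the dashboard-style queries described above.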
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
Whilst HBase is the most logical answer for use cases requiring random, realtime read/write access to Big Data, applications that make the most of it are not trivial to design, nor is it the simplest system to operate. Because it depends on and integrates with other components of the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) and external systems (Kerberos, LDAP), and because its distributed nature requires infrastructure that runs like Swiss clockwork, many variables must be considered when observing anomalies or even outages. Adding to the equation, HBase is still an evolving product, with different release versions in concurrent use, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified cause and resolution action for each, drawn from my last 5 years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
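The pushdown idea can be illustrated with a toy filter: rather than returning every row for the client to sift through, the spatial predicate runs beside the data, as an HBase Filter or Accumulo Iterator would, and only matches are emitted. The row shape and predicate below are assumptions for illustration, not GeoMesa code.

```python
# A toy server-side bounding-box filter over (id, x, y) rows.
def bbox_filter(rows, min_x, min_y, max_x, max_y):
    """Simulate a pushed-down spatial predicate: yield only matching ids."""
    for rid, x, y in rows:
        if min_x <= x <= max_x and min_y <= y <= max_y:
            yield rid

rows = [("a", 1.0, 1.0), ("b", 5.0, 5.0), ("c", 2.5, 2.0)]
print(list(bbox_filter(rows, 0, 0, 3, 3)))  # ['a', 'c']
```

The payoff of pushdown is that only the two matching ids cross the network instead of all three rows, and the saving grows with table size.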
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world's library collection. This talk will provide an overview of how HBase is structured to serve this information, some of the challenges encountered in scaling to support the world catalog, and how they were overcome.
Many individuals and organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in a drastically increased desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
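As a rough preview of the table design the session walks through, a toy version keys each entry by (depth, path) so that the children of a directory form one contiguous, scannable key range in a sorted table. The key encoding below is an illustrative assumption, not the exact dirlist layout.

```python
# A toy dirlist: rows keyed by zero-padded depth + path sort so that
# listing a directory is a contiguous range scan over a sorted table.
from bisect import bisect_left, bisect_right

def key(path):
    depth = path.rstrip("/").count("/")
    return f"{depth:03d}{path}"

table = sorted(key(p) for p in [
    "/", "/etc", "/home", "/home/ann", "/home/bob", "/home/ann/notes.txt",
])

def list_children(parent):
    depth = parent.rstrip("/").count("/") + 1
    prefix = f"{depth:03d}{parent.rstrip('/')}/"
    lo = bisect_left(table, prefix)
    hi = bisect_right(table, prefix + "\xff")
    return [row[3:] for row in table[lo:hi]]   # strip the depth prefix

print(list_children("/home"))  # ['/home/ann', '/home/bob']
```

Because siblings share a key prefix, a key-value store like Accumulo answers "list this directory" with one range scan and no secondary index, which is the heart of the dirlist design.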
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
Data serves as the platform for decision-making at Uber. To facilitate data driven decisions, many datasets at Uber are ingested in a Hadoop Data Lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, at its most basic form, is about organizing data to balance efficient reading and writing of newer data. Data organization for efficient reading involves factoring in query patterns to partition data to ensure read amplification is low. Data organization for efficient writing involves factoring the nature of input data - whether it is append only or updatable.
At Uber we ingest terabytes into many critical tables, such as trips, that are updatable. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all analytical use cases across the entire company. Datasets such as trips constantly receive updates in addition to inserts. To ingest such datasets we need a critical component that keeps bookkeeping information about the data layout and annotates each incoming change with the location in HDFS where the data should be written. This component is called Global Indexing. Without it, all records are treated as inserts and re-written to HDFS instead of being updated, which duplicates data and breaks correctness for user queries. This component is key to scaling our jobs, which now handle greater than 500 billion writes a day in our ingestion systems, and it must provide strong consistency and high throughput for index writes and reads.
At Uber, we chose HBase as the backing store for the Global Indexing component, and it is critical in allowing us to scale our jobs to more than 500 billion writes a day in our current ingestion systems. In this talk, we will discuss data at Uber and expound on why we built the global index using Apache HBase and how it helps scale out our cluster usage. We'll give details on why we chose HBase over other storage systems, how and why we came up with a creative solution to load HFiles directly into the backend, circumventing the normal write path when bootstrapping our ingestion tables to avoid QPS constraints, as well as other lessons learned bringing this system into production at the scale of data that Uber encounters daily.
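The bookkeeping role described above can be modeled in a few lines: look up each incoming record in the index, tag it as an update to its existing file location or as an insert to a new one, and remember first-seen locations. This is a deliberately simplified model of the annotation step, not Uber's implementation.

```python
# A toy Global Index: maps record keys to their HDFS file location and
# classifies each incoming change as an insert or an update.
index = {}  # record_key -> file location (the "bookkeeping" store)

def annotate(record_key, next_file):
    if record_key in index:
        return ("update", index[record_key])   # rewrite the existing file
    index[record_key] = next_file              # first sight: a new insert
    return ("insert", next_file)

print(annotate("trip-42", "part-0001"))  # ('insert', 'part-0001')
print(annotate("trip-42", "part-0002"))  # ('update', 'part-0001')
```

In production the dict is replaced by HBase, which supplies the strong consistency and read/write throughput the abstract calls out.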
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
Recently, Apache Phoenix has been integrated with the Apache Omid (incubating) transaction processing service to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. On the other hand, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
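The snapshot-isolation model underlying Omid can be sketched in miniature: every write is versioned by its commit timestamp, and a reader at a given snapshot sees only the latest version committed at or before that snapshot. This toy model is illustrative only and is not Omid's API.

```python
# A toy multi-version store with snapshot reads.
store = {}  # key -> list of (commit_ts, value)

def write(key, value, commit_ts):
    store.setdefault(key, []).append((commit_ts, value))

def read(key, snapshot_ts):
    """Return the newest value committed at or before snapshot_ts."""
    visible = [(ts, v) for ts, v in store.get(key, []) if ts <= snapshot_ts]
    return max(visible)[1] if visible else None

write("balance", 100, commit_ts=5)
write("balance", 80, commit_ts=9)
print(read("balance", snapshot_ts=7))   # 100: the ts=9 write is invisible
print(read("balance", snapshot_ts=12))  # 80
```

This "read your snapshot, not the latest write" behavior is what lets analytics queries run consistently alongside a heavy transactional write stream.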
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real-time. This is a challenging endeavor when considering the variety of data sources which need to be collected and analyzed. Everything from application logs, network events, authentications systems, IOT devices, business events, cloud service logs, and more need to be taken into consideration. In addition, multiple data formats need to be transformed and conformed to be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.
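A tiny sketch of the conform step such flows perform: events from different sources are mapped onto one common shape before analysis. The source names, fields, and target schema below are illustrative assumptions, not Aetna's actual formats.

```python
# A toy normalizer: map heterogeneous security events to one schema.
def normalize(event):
    if event.get("source") == "auth":
        return {"user": event["uname"], "action": "login",
                "ok": event["result"] == "SUCCESS"}
    if event.get("source") == "cloud":
        return {"user": event["principal"], "action": event["op"],
                "ok": event["status"] < 400}
    raise ValueError("unknown source")

raw = [
    {"source": "auth", "uname": "kim", "result": "SUCCESS"},
    {"source": "cloud", "principal": "svc-api", "op": "PutObject", "status": 403},
]
print([normalize(e) for e in raw])
```

In NiFi this mapping would live in record readers/writers and processors rather than one function, but the conform-to-one-schema idea is the same.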
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail, as well as discuss the best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as geospatial analytics at scale and the project roadmap going forward.
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code to the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results are logged automatically as a byproduct of those lines of code, even if the party doing the training run makes no special effort to record them. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Project and Model components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
Extending Twitter's Data Platform to Google CloudDataWorks Summit
Twitter's Data Platform is built using multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. Our platform supports storage, compute, data ingestion, discovery and management, and various tools and libraries that help users with both batch and realtime analytics. Our Data Platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we were scaling our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use the cloud as another data center. We walk through our evaluation process and the challenges we faced supporting data analytics at Twitter scale on the cloud, and present our current solution. Extending Twitter's Data Platform to the cloud was a complex task, which we deep-dive into in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
Companies are increasingly moving to the cloud to store and process data. One of the challenges companies have is in securing data across hybrid environments with easy way to centrally manage policies. In this session, we will talk through how companies can use Apache Ranger to protect access to data both in on-premise as well as in cloud environments. We will go into details into the challenges of hybrid environment and how Ranger can solve it. We will also talk through how companies can further enhance the security by leveraging Ranger to anonymize or tokenize data while moving into the cloud and de-anonymize dynamically using Apache Hive, Apache Spark or when accessing data from cloud storage systems. We will also deep dive into the Ranger’s integration with AWS S3, AWS Redshift and other cloud native systems. We will wrap it up with an end to end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data and track where data is flowing.
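The tokenize/de-tokenize pattern mentioned above can be sketched with a keyed hash plus a reversal vault: tokens are deterministic (so joins still work on tokenized data) and reversible only through the vault. The key handling and token format here are illustrative assumptions; Ranger's actual masking and policy machinery differ.

```python
# A toy deterministic tokenizer with a vault for de-anonymization.
import hmac, hashlib

KEY = b"demo-secret"          # in practice, a managed secret
vault = {}                    # token -> original value

def tokenize(value: str) -> str:
    token = hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:16]
    vault[token] = value
    return token

def detokenize(token: str) -> str:
    return vault[token]

t = tokenize("123-45-6789")
print(t != "123-45-6789", detokenize(t))  # True 123-45-6789
```

Because the same input always yields the same token, tokenized columns remain joinable in the cloud, while de-tokenization stays gated behind access to the vault — the dynamic-policy part is what Ranger governs.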
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
Advanced Big Data processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of Non-Volatile Memory (NVM) and NVM Express (NVMe) based SSDs, these designs, along with the default Big Data processing models, need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies, enable real-time customer engagement
● Enhancing loss prevention capabilities, response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail, the implications to the broader Consumer Goods industry and share business drivers, use cases and benefits that are unfolding as an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images, describing possible ways that a retail store of the near future could operate: a deep learning system attached to a camera stream identifying various storefront situations, such as item stocks on shelves, a shelf in need of organization, or perhaps a wandering customer in need of assistance.
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies that are powering these applications today. Deep learning tools for research and development. Production tools to distribute that intelligence to an entire inventory of all the cameras situation around a retail location. Tools for exploring and understanding the new data streams produced by the computer vision systems.
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
Whole-genome shotgun based next-generation transcriptomics and metagenomics studies often generate 100 to 1,000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembling these data requires a solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieves near-linear scalability with respect to input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and that Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
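The partitioning idea — grouping reads that share k-mers, since they likely originate from the same molecule — can be shown in miniature. SpaRC does this at scale in Spark; this pure-Python sketch with made-up reads is only illustrative.

```python
# A toy read-clustering sketch: reads sharing any k-mer join one cluster.
def kmers(read, k=4):
    return {read[i:i + k] for i in range(len(read) - k + 1)}

def cluster(reads, k=4):
    clusters = []  # list of (kmer_set, member_reads)
    for r in reads:
        ks = kmers(r, k)
        for seen, members in clusters:
            if seen & ks:          # shared k-mer -> same molecule (heuristic)
                members.append(r)
                seen |= ks
                break
        else:
            clusters.append((set(ks), [r]))
    return [members for _, members in clusters]

reads = ["ACGTACGT", "TACGTTTT", "GGGGCCCC"]
print(cluster(reads))  # first two share 'TACG'; the third stands alone
```

Each cluster can then be assembled independently, which is what makes the downstream per-gene/per-genome optimization tractable.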
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
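The notion of semantics as "predictable inference" can be made concrete with link prediction over a tiny knowledge graph: once subClassOf carries an actual semantics (here, transitivity), new links follow predictably from the rule rather than from pattern matching alone. The triples below are illustrative.

```python
# A toy forward-chaining rule: subClassOf is transitive, so new links
# are *predictable* consequences of the semantics, not learned guesses.
triples = {
    ("cat", "subClassOf", "mammal"),
    ("mammal", "subClassOf", "animal"),
}

def infer(kb):
    kb = set(kb)
    changed = True
    while changed:                      # apply the rule to a fixpoint
        changed = False
        for a, p1, b in list(kb):
            for c, p2, d in list(kb):
                if p1 == p2 == "subClassOf" and b == c:
                    t = (a, "subClassOf", d)
                    if t not in kb:
                        kb.add(t)
                        changed = True
    return kb

print(("cat", "subClassOf", "animal") in infer(triples))  # True
```

A neuro-symbolic system trained over structures without such a semantics has no comparable guarantee that this link will be derived, which is the talk's point.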
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We finished with a lovely workshop in which participants tried to find different ways to think about quality and testing in the various parts of the DevOps infinity loop.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 3
Trends in Supporting Production Apache HBase Clusters
1. Trends in Supporting Production Apache HBase Clusters
Jonathan Hsieh | @jmhsieh | Software Engineer at Cloudera / HBase PMC Member
Kevin O’Dell | kevin.odell@cloudera | Systems Engineer at Cloudera
June 26, 2013
2. Who are we?
Jonathan Hsieh
• Cloudera: Software Engineer
• Apache HBase committer / PMC member
• Apache Flume founder
Kevin O’Dell
• Cloudera: Systems Engineer
• Apache HBase contributor
• Cloudera HBase Support Lead
6/26/13 Hadoop Summit / O'Dell, Hsieh
3. What is Apache HBase?
Apache HBase is a reliable, column-oriented data store that provides consistent, low-latency, random read/write access.
[Diagram: HBase alongside ZK, HDFS, MR, and applications]
4. HBase Architecture
• HBase is designed to be fault tolerant and highly available, and it depends on the systems beneath it to be as well.
• Replication for fault tolerance
• Regions can be served from any RegionServer
• Failover HMasters
• ZK quorums
• HDFS block replication on DataNodes
5. From the trenches at Cloudera Customer Operations
Trends in Supporting HBase
6. Customers in 2011-12 vs. 2012-13
0.90.x / CDH3 era:
• Red Hat 5.x
• Java JVM 1.6.13
• 4-8 disk machines
• 24-48 GB RAM
• Dual 4-core HT CPUs
• CDH3: Apache HBase 0.90, Apache Hadoop 0.20.x
0.92.x/0.94.x / CDH4 era:
• Red Hat 6.x
• Java JVM 1.6.31
• 12-15 disk machines
• 48-96 GB RAM
• Dual 6-core HT CPUs
• CDH4: Apache HBase 0.92/0.94, Apache Hadoop 2.0
7. Support Incidents 6/2011-6/2012
• Patched bug: patch delivered, or fixed in the next version
• Operational workaround: misconfiguration, schema design / tuning, hbck used to fix
• Network/HW/OS: problems with underlying systems
[Pie chart, 6/11-6/12 CDH3 / 0.90.x HBase support tickets: Workaround (config) 44%, Workaround (hbck) 28%, Net/HW/OS 16%, Patched 12%]
8. Comparing 6/11-6/12 to 6/12-6/13
[Pie chart, 6/11-6/12 CDH3 / 0.90.x HBase support tickets: Workaround (config) 44%, Workaround (hbck) 28%, Net/HW/OS 16%, Patched 12%]
[Pie chart, 6/12-6/13 CDH3+CDH4 HBase support tickets: Net/HW/OS 42%, Workaround (config/hbck) 36%, Patched 14%, Documentation 8%]
Callouts: the workaround slice is much smaller; the config and hbck workarounds were merged into one category; Documentation is a new category; the Net/HW/OS slice is bigger!
9. Comparing 2011 to 2012
• The majority of customers upgraded to CDH4.
• More customers, but a similar volume of support incidents.
• Shrunk CDH3’s largest trouble spots significantly.
• Larger number of issues due to underlying systems. This is actually a good thing!
[Pie chart, 6/12-6/13 CDH3+CDH4 HBase support tickets: Net/HW/OS 42%, Workaround (config/hbck) 36%, Patched 14%, Documentation 8%]
15. Upgrade Assistance
• Parcels: simplified distribution, flexibility of install location, side-by-side installs for rolling upgrades
• Rolling upgrades via CM: hot fixes, minor version upgrades
• Automated tests for upgrades and compatibility
16. Configuration / Feature
• Continuous bulk load: avoid it and use Puts instead
• Region tuning: updated defaults + CM
• GC tuning: updated defaults + CM
• Balancer: manual / custom tools
• Bad schema: trial and error
[Pie chart, 6/12-6/13 CDH3+CDH4 HBase support tickets: Net/HW/OS 42%, Workaround (config/hbck) 36%, Bug 14%, Documentation 8%]
17. CM helps
• Sanity checks on configurations
• Wizard based installation and setup
• Wizard based rolling upgrades (minor versions)
• Wizard based backup and disaster recovery strategies
19. Support improvement wishlist
• Improved “ergonomics”
• Better default configurations and guard rails
• “I’m sorry Dave, I can’t let you do that”
• Improved error messaging
• Suggest likely root causes in logs
• Improve the log signal-to-noise ratio
• More ops tooling and frameworks for app development
20. Good news
• All bug fixes go into the Apache versions before CDH
• HBase is maturing:
• Higher percentage of incidents caused by the underlying OS/HW/network
• More performance- and tuning-oriented questions
• Similar percentage of incidents caused by bugs
• We’re getting better:
• Lower percentage of incidents managed with workarounds
• More tools in place to help operational support: hbck, CM, defaults
• We can still do better!
25. Usability Concerns
• Administering HBase has been too hard.
• Difficult to see what is happening inside HBase
• Easy to make bad design decisions early without realizing it
• New developments:
• Metrics revamp
• HTrace
• Frameworks for schema design
27. HTrace
• Problem: where is time being spent inside HBase?
• Solution: the HTrace framework
• Inspired by Google’s Dapper
• Threaded through HBase and HDFS
• Tracks time spent in calls in a distributed system by tracking spans* on different machines
*Some assembly still required.
28. HBase Schemas
• HBase application developers must iterate to find a suitable HBase schema
• Schema is critical for performance at scale
• How can we make this easier? How can we reduce the expertise required to do it?
• Today:
• Lots of tuning knobs
• Developers need to understand column families, rowkey design, data encoding, …
• Some choices are expensive to change after the fact
29. Row key design techniques
• Numeric keys and lexicographic sort
• Store numbers big-endian.
• Pad ASCII numbers with 0’s.
• Use reversal to put the most significant traits first.
• Reverse URLs.
• Reverse timestamps to get the most recent first: (MAX_LONG - ts) so “time” gets monotonically smaller.
• Use composite keys so keys distribute nicely and work well with sub-scans
• Ex: User-ReverseTimeStamp
• Do not use the current timestamp as the first part of a row key!
Examples:
• Unpadded sort order: Row100, Row3, Row31 vs. padded: Row003, Row031, Row100
• URLs: blog.cloudera.com, hbase.apache.org, strataconf.com vs. reversed: com.cloudera.blog, com.strataconf, org.apache.hbase
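The padding and reversal techniques above can be sketched in plain Java. This is an illustrative sketch only; the class and method names below are made up for the example, not an HBase API:

```java
import java.util.Arrays;

// Illustrative row-key helpers: zero-padded numeric keys, reversed
// timestamps, and composite user + reverse-timestamp keys.
public class RowKeys {
    // Pad an ASCII number with leading zeros so lexicographic order
    // matches numeric order.
    static String padded(long n, int width) {
        return String.format("Row%0" + width + "d", n);
    }

    // (MAX_LONG - ts), zero-padded to a fixed width, makes newer
    // timestamps sort first lexicographically.
    static String reverseTs(long ts) {
        return String.format("%019d", Long.MAX_VALUE - ts);
    }

    // Composite key: user id first (groups sub-scans by user),
    // reverse timestamp second (newest entries first within a user).
    static String compositeKey(String user, long ts) {
        return user + "-" + reverseTs(ts);
    }

    public static void main(String[] args) {
        // Unpadded keys sort badly: "Row100" comes before "Row3".
        String[] unpadded = {"Row100", "Row3", "Row31"};
        Arrays.sort(unpadded);
        System.out.println(Arrays.toString(unpadded)); // [Row100, Row3, Row31]

        // Padded keys sort in numeric order.
        String[] paddedKeys = {padded(100, 3), padded(3, 3), padded(31, 3)};
        Arrays.sort(paddedKeys);
        System.out.println(Arrays.toString(paddedKeys)); // [Row003, Row031, Row100]

        // With a reverse timestamp, newer events sort before older ones.
        String newer = compositeKey("alice", 2000L);
        String older = compositeKey("alice", 1000L);
        System.out.println(newer.compareTo(older) < 0); // true
    }
}
```

The same trick is why the slides warn against a plain current timestamp as the key prefix: monotonically increasing prefixes hotspot one region, while a user-first composite key spreads writes and still keeps per-user scans cheap.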
32. Reliable / Highly Available
• Reliable: ability to recover service if a component fails, without losing data.
• Highly Available: ability to quickly recover service if a component fails, without losing data.
• Goal: minimize downtime!
33. Mean Time To Recovery (MTTR)
• Average time taken to automatically recover from a failure:
• Detection time
• Repair time
• Notification time
• Measure: HTrace (Dapper) infrastructure (0.96+)
[Timeline: Detect, Repair, Notify]
34. Reduce Detection Time
• Proactive notification of HMaster failure (0.95)
• Proactive notification of RS failure (0.95)
• Fast server failover (hardware)
[Timeline: Detect, Repair, Notify]
41. Reliable / Highly Available
• Reliable: ability to recover service if a component fails, without losing data.
• Highly Available: ability to quickly recover service if a component fails, without losing data.
• Goal: minimize downtime!
42. Reliable / Highly Available / Latency Tolerant
• Reliable: ability to recover service if a component fails, without losing data.
• Highly Available: ability to quickly recover service if a component fails, without losing data.
• Latency Tolerant: ability to perform and recover in a predictable amount of time, without losing data.
• New goal: predictable performance
43. Common causes of performance variability
• Compaction
• Garbage Collection
• Locality Loss
44. Compaction
• Compactions optimize read layout by rewriting files
• Reduce the seeks required to read a row
• Improve random read performance
• Age off expired or deleted data
• Assumes a uniformly distributed write workload
• But we have new workloads:
• Continuous bulk load write pattern
• Time-series write pattern
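A common mitigation for bulk-load-heavy clusters in this era was to rein in automatic compactions via hbase-site.xml. The property names below are real 0.92/0.94-era settings, but the values are illustrative starting points, not universal recommendations:

```xml
<!-- Sketch of compaction tuning in hbase-site.xml; tune per workload. -->
<configuration>
  <!-- Disable time-based major compactions and run them manually
       off-peak instead, avoiding major compaction storms after bulk loads. -->
  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
  </property>
  <!-- Minimum number of store files before a minor compaction is considered. -->
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>3</value>
  </property>
  <!-- Upper bound on the number of files rewritten in one minor compaction. -->
  <property>
    <name>hbase.hstore.compaction.max</name>
    <value>10</value>
  </property>
</configuration>
```

Setting hbase.hregion.majorcompaction to 0 and scheduling major compactions yourself is a widely used pattern for write-heavy and bulk-load workloads, since it keeps the expensive full rewrites out of peak hours.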
45. Compactions: Put workload
• Minor compactions optimize a subset of adjacent files
• Major compactions optimize all files
• Choosing: assume older files should be larger than newer files.
• If “new” files are “larger” than “older” files: major compaction
• Else, look at the newer files and select files for a minor compaction
[Diagram: newly flushed HFiles grouped into minor compactions, with an occasional major compaction]
46. Compactions: Bulkload workload
• Functionality for loading data en masse; intended for bootstrapping HBase tables
• New write workload: frequently ingest data only via bulk load
• Problem:
• Breaks the age/size assumption!
• Major compaction storms!
• Compactions unnecessarily rewrite large files.
[Diagram: newly bulk loaded and newly flushed HFiles each triggering major compactions]
47. Bulkload: Exploring Compactor
• Explore all compaction possibilities
• Choose the minor compaction that reduces the number of files while incurring the least IO
• “The best bang for the buck”
• The compaction workload is more manageable
[Diagram: exploring policy selecting minor compactions among newly bulk loaded and newly flushed HFiles]
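The selection idea above can be sketched as a toy model: enumerate contiguous windows of store files and pick the one that compacts the most files for the least IO, skipping any window where one file dwarfs the rest. This is a deliberately simplified illustration, not the actual HBase ExploringCompactionPolicy; the ratio parameter only mirrors the spirit of hbase.hstore.compaction.ratio:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of an "exploring" compaction selector over file sizes.
public class ExploringCompactor {
    // Among windows of at least minFiles adjacent files where no file is
    // oversized relative to the rest (size <= ratio * sum of the others),
    // prefer more files, then the smaller total rewrite (less IO).
    static List<Long> choose(List<Long> sizes, int minFiles, double ratio) {
        List<Long> best = new ArrayList<>();
        long bestIo = Long.MAX_VALUE;
        for (int start = 0; start < sizes.size(); start++) {
            for (int end = start + minFiles; end <= sizes.size(); end++) {
                List<Long> window = sizes.subList(start, end);
                long total = window.stream().mapToLong(Long::longValue).sum();
                // Skip windows with a file too big relative to the rest:
                // rewriting it is wasted IO (the bulk-load problem).
                boolean ok = window.stream().allMatch(s -> s <= ratio * (total - s));
                if (!ok) continue;
                if (window.size() > best.size()
                        || (window.size() == best.size() && total < bestIo)) {
                    best = new ArrayList<>(window);
                    bestIo = total;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One huge bulk-loaded file plus small flushed files: the huge file
        // is left out, avoiding a needless major rewrite.
        List<Long> sizes = List.of(1000L, 10L, 12L, 11L, 9L);
        System.out.println(choose(sizes, 3, 1.2)); // [10, 12, 11, 9]
    }
}
```

With a Put-only workload of similar-sized files the whole set qualifies, so this behaves like the old policy; it only diverges when bulk loads break the age/size assumption.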
Speaker notes:
HBase is a project that solves this problem. In a sentence, HBase is an open source, distributed, sorted map modeled after Google’s BigTable. Open source: Apache HBase is an open source project with an Apache 2.0 license. Distributed: HBase is designed to use multiple machines to store and serve data. Sorted map: HBase stores data as a map, and guarantees that adjacent keys will be stored next to each other on disk. HBase is modeled after BigTable, a system that is used for hundreds of applications at Google.
This pie chart is the product of analyzing critical production HBase tickets over the past 6 months: misconfiguration 44%, patch 12%, HW/NW 16%, repair 28%, meaning that correcting a misconfiguration was all it took to bring HBase back up again. As you can see, misconfigurations and bugs break the most HBase clusters. Fixing bugs is up to the community. Fixing misconfigurations is up to you, and is the focus of the next segment. Because it is hard to diagnose, a misconfiguration is not what you want to spend your time on. If your cluster is broken, it is probably a misconfiguration. This is a hard problem because the error messages are not tightly tied to the root cause.
Hannibal helped a lot with identifying balance issues.