TriHUG 3/14: HBase in Production

HBase In
Production
Hey we’re hiring!

Contents
● Bronto Overview
● HBase Architecture
● Operations
● Table Design
● Questions?

Bronto Overview
Bronto Software provides a cloud-based
marketing platform for organizations to drive
revenue through their email, mobile and social
campaigns

Bronto Contd.
● ESP for E-Commerce retailers
● Our customers are marketers
● Charts, graphs, reports
● Market segmentation
● Automation
● We are also hiring

Where We Use HBase
● High volume scenarios
● Realtime data
● Batch processing
● HDFS staging area
● Sorting/Indexing not a priority
○ We are working on this

HBase Overview
● Implementation of Google’s BigTable
● Sparse, sorted, versioned map
● Built on top of HDFS
● Row level ACID
● Get, Put, Scan
● Assorted RMW operations

Tables Overview
Tables are sorted (lexicographically) key value
pairs of uninterpreted byte[]s. Keyspace is
divided up into regions of keys. Each region is
hosted by exactly one machine.

R3R1
Server 1
Key Value
a byte[]
aa byte[]
b byte[]
bb byte[]
c byte[]
ca byte[]
R1: [a, b)
R2: [b, c)
R3: [c, d)
R2
Server 1
Table Overview

Operations
● Layers of complexity
● Normal failure modes
○ Hardware dies (or combust)
○ Human error
● JVM
● HDFS considerations
● Lots of knobs

Cascading Failure
1. High write volume fragments heap
2. GC promotion failure
3. Stop the world GC
4. ZK timeout
5. Receive YouAreDeadException, die
6. Failover
7. Goto 1

Useful Tunings
● MSLAB enabled
● hbase.regionserver.handler.count
○ Increasing puts more IO load on RS
○ 50 is our sweet spot
● JVM tuning
○ UseConcMarkSweepGC
○ UseParNewGC

Monitoring Tools
● Nagios for hardware checks
● Cloudera Manager
○ Reporting and health checks
○ Apache Ambari and MapR provide similar tools
● Hannibal + custom scripts
○ Identify hot regions for splitting

Table Design
● Table design is deceptively simple
● Main Considerations:
○ Row key structure
○ Number of column families
● Know your queries in advance

Additional Context
● SAAS environment
○ “Twitter clone” model won’t work
● Thousands of users millions, of attributes
● Skewed customer base
○ Biggest clients have 10MM+ contacts
○ Smallest have thousands

Row Keys
● Most important decision
● The only (native) index in HBase
● Random reads and writes are fast
○ Sorted on disk and in memory
○ Bloom filters speed read performance (not in use)

Hotspotting
● Associated with monotonically increasing
keys
○ MySql AUTO_INCREMENT
● Writes lock onto one region at a time
● Consequences:
○ Flush and compaction storms
○ $500K cluster limited by $10K machine

Row Key Advice
● Read/Write ratio should drive design
○ We pay a write time penalty for faster reads
● Identify queries you need to support
● Consider composite keys instead of indexes
● Bucketed/Salted keys are an option
○ Distribute writes across N buckets
○ Rebucketing is difficult
○ Requires N reads, slow workers

Variable Width Keys
customer_hash::email
● Allows scans for a single customer
● Hashed id distributes customers
● Sorted by email address
○ Could also use reverse domain for gmail, yahoo, etc.

Fixed Width Keys
site::contact::create::email
● FuzzyRowFilter
○ Can fix site, contact, and reverse_create
○ Can search for any email address
○ Could use a fixed width encoding for domain
■ Search for just gmail, yahoo, etc
● Distributes sites and users
● Contacts sorted by create date

Column Families
● Groupings of named columns
● Versioning, compression, TTL
● Different than BigTable
○ BigTable: 100s
○ HBase: 1 or 2

Column Family Example
Id d {VERSIONS => 2} s7 {TTL => 604800}
a (address) p (phone) o:3-27 (open) c:3-20 (click)
dfajkdh byte[] byte[]:555-5555 byte[]
hnvdzu9 byte[]:1234 St. XXXX
hnvdzu9 byte[]:1233 St.
hnvdzu9 XXXX byte[]
er9asyjk byte[]: 324 Ave
Column Family Example
● PROTIP: Keep CF and qualifier names short
○ They are repeated on disk for every cell
● “d” supports 2 versions of each column, maps to demographics
● “s7” has seven day TTL, maps to stats kept for 7 days.

MemStore
HDFS
s2s1 s3
f1
Column Families In Depth
MemStore
HDFS
s2s1
f2
my_table,,1328551097416.12921
bbc0c91869f88ba6a044a6a1c50.
● StoreFile(s) for each CF
in region
● Sparse
● One memstore per CF
○ Must flush together
● Compactions happen at
region level
(Region)
(family) (family)

Compactions
● Rewrites StoreFiles
○ Improves read performance
○ IO Intensive
● Region scope
● Used to take > 50 hours
● Custom script took it down to 18
○ Can (theoretically) run during the day

MemStore
HDFS
S1
f1
my_table,,
1328551097416.12921
bbc0c91869f88ba6a044a6a1c5
0.
(Region)
MemStore
HDFS
s
2
s
1 s3
f1
my_table,,1328551097416.12921
bbc0c91869f88ba6a044a6a1c50.
(Region)
Compaction Before and After
s4 s5 s6
Before After
K-Way
Merge

The Table From Hell
● 19 Column Families
● 60% of our region count
● Skewed write pattern
○ KB size store files
○ Frequent compaction storms
○ hbase.hstore.compaction.min.size (HBASE-5461)
● Moved to it’s own cluster

And yet...
● Cluster remained operational
○ Table is still in use today
● Met read and write demand
● Regions only briefly active
○ Rowkeys by date and customer

What saved us
● Keyed by customer and date
● Effectively write once
○ Kept “active” region count low
● Custom compaction script
○ Skipped old regions
● More hardware
● Were able to selectively migrate

Column Family Advice
● Bad choice for fine grained partitioning
● Good for
○ Similarly typed data
○ Varying versioning/retention requirements
● Prefer intra row scans
○ CF and qualifiers are sorted
○ ColumnRangeFilter

TriHUG 3/14: HBase in Production

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Viewers also liked

Viewers also liked (20)

Similar to TriHUG 3/14: HBase in Production

Similar to TriHUG 3/14: HBase in Production (20)

More from trihug

More from trihug (11)

Recently uploaded

Recently uploaded (20)

TriHUG 3/14: HBase in Production