Tales from Taming the Long Tail

Tales from Taming Tail Latencies
Deepankar Reddy, Ishan Chhabra
Rocket Fuel Inc.

Recap : Rocket Fuel Inc
◦ Programmatic Ad Tech firm
◦ Evaluates ~100B ad opportunities daily
◦ Each eval has strict SLA of 100 ms

Recap : Blackbird
Scalable collection storage API
▫ Backed by HBase
▫ Append only collections

Recap : Blackbird
Stores rich anonymized user data
◦ Historical behavior - ads viewed and clicked, pages visited, etc.
◦ Interests - third party and learned interests
◦ Feature vectors for various ML models
◦ etc ..

Blackbird Workload
◦ 80% Read - 20% Write workload
◦ 90 - 95 % cache hit ratio
◦ Record size (compressed protobufs) :
▫ Mean : 11 KB
▫ Median : 8 KB

Blackbird Workload
◦ 14 TB of Unreplicated Data
◦ 60 - 70 Nodes in a Data Center
◦ Strict SLA of 40 ms
◦ Current SLA violation rate: ~2%

Blackbird Workload
◦ Read latencies:
▫ Mean, Median : < 1ms
▫ 95th perc : 25 ms
◦ Write latencies:
▫ Mean, Median : < 1ms
▫ 95th perc : 1 ms

Blackbird Workload
[Diagram: the Rocket Fuel Moment Scoring Pipeline calling the Blackbird Service; labels: Anon. UserId, Tab]

People Based Marketing
◦ Efforts to cluster the identities of a user
▫ Probabilistic using Machine Learning
▫ Deterministic via integrations
◦ Reduces information loss
◦ Aligns with user’s buying patterns

New Blackbird Queries
[Diagram: the Rocket Fuel Moment Scoring Pipeline calling the Blackbird Data Service; labels: Anon UserID, UserID Clusters, request for all IDs in the cluster]

Translated SLA
[Plot: translated SLA]
◦ X axis: SLA violation rate for a single read
◦ Y axis: new SLA violation rate for multiple reads
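The relationship plotted on this slide can be sketched with a small calculation. This is a minimal sketch under two assumptions that the slides do not state: reads are independent, and a clustered request meets the SLA only if every one of its reads does. The function name and the cluster sizes are hypothetical.

```python
def translated_violation_rate(single_read_rate: float, reads: int) -> float:
    """Chance that at least one of `reads` independent reads misses the SLA.

    Assumption: reads are independent and the whole request violates the
    SLA as soon as any single read does.
    """
    return 1.0 - (1.0 - single_read_rate) ** reads


if __name__ == "__main__":
    # 2% is the single-read violation rate quoted earlier in the deck;
    # the cluster sizes below are hypothetical.
    for reads in (1, 2, 5, 10):
        rate = translated_violation_rate(0.02, reads)
        print(f"{reads:2d} reads -> {rate:.2%} violations")
```

Under these assumptions, a 2% single-read violation rate grows to roughly 10% for a cluster of five IDs, which is in line with the "98% to 90%" figure given later in the deck.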

Server Observed Read Latency
99P ~ 25 ms
95P ~ 8 ms

Client Observed Read Latency
99P > 100 ms
95P > 25 ms

[Plots: network-level time (ms), server-side time (ms), RS heap size (MB)]

Observations
◦ Client / Server times match
◦ Except during the longer mixed GCs

Co-ordinated omission
Borrowed from Gil Tene, "How NOT to Measure Latency", QCon SF 2015

◦ ~ Censorship of Data
▫ Not random but co-ordinated omission
◦ Censorship during a GC*
▫ Queueing time in app
▫ Time sitting in the network buffer
▫ Time in the responder
*with normal latencies
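The effect can be illustrated with a small simulation. This is a sketch only: the trace is hypothetical (a client that intends to send one request every 10 ms, sees 1 ms responses, and is stalled once for 1000 ms), and the back-filling loop follows the general idea of correcting a recorded value against an expected interval, as popularized by HdrHistogram. The function names are made up for the example.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; pct is in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def correct_for_omission(samples_ms, expected_interval_ms):
    """Back-fill the samples a stalled client never got to send.

    For every recorded value longer than the expected interval, add the
    latencies the skipped requests would have seen: value - interval,
    value - 2 * interval, and so on, down to one interval.
    """
    corrected = []
    for value in samples_ms:
        corrected.append(value)
        missing = value - expected_interval_ms
        while missing >= expected_interval_ms:
            corrected.append(missing)
            missing -= expected_interval_ms
    return corrected


# Hypothetical trace: 999 fast reads and one read stuck behind a 1 s pause.
recorded = [1] * 999 + [1000]
corrected = correct_for_omission(recorded, expected_interval_ms=10)

naive_p99 = percentile(recorded, 99)
corrected_p99 = percentile(corrected, 99)
print("naive p99 (ms):", naive_p99)
print("corrected p99 (ms):", corrected_p99)
```

The recorded trace contains a single slow sample, so its 99th percentile still looks like a fast read; once the omitted requests are accounted for, the pause dominates the tail.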

Server Side Latencies
◦ GCs don’t show up in in-process latencies
◦ The more stages of the “request pipeline” a measurement covers, the more likely it is to capture a GC
▫ Add time in network buffer
▫ Add time in responder
▫ Add time in transit etc...

Client Side Latencies
◦ Very important to slice requests per
server to see the patterns
◦ Even then, they will miss Young GCs
▫ Problems with moving averages (Yammer metrics)

Causes for Peaks
What’s causing Mixed GC:
[Plots: heap size (MB) and cache size (GB); annotations: ~25 GB and ~4-5 GB]
◦ Around 5 times in a GC cycle
◦ 5 x 5 = 25

Causes for Peaks
Mixed GC:
◦ LRU on heap is bad for GC
▫ All evicted blocks will be in Old Gen
▫ Old Gen cleanup => Mixed GC
◦ No GC optimizations can fix this

Why peaks are bad
◦ Not so bad in normal use cases
▫ If peaks occur rarely
◦ Bad enough in clustered reads
◦ Reduces the time available to other parts of our Moment Scoring Pipeline

Fix is to move the LRU cache off heap

Off Heap Block Cache
◦ An array of byte buffers (4MB size)
◦ Offset based free space management
◦ Re-use the buffers by overwriting
◦ HBaseCon Talk at 3:10-3:50pm

Off Heap Advantages
◦ Can scale to higher memory
▫ Reserving less for promotion failures
◦ Potentially could be on SSDs/NVMes
▫ Allows us to use denser boxes

Work for moving Off Heap
◦ HBase 1.0 copies data onto Heap
◦ Leads to too much Garbage
▫ GCing once every 2 - 3 secs

Work for moving Off Heap
◦ HBase 2.0 fixes this
▫ HBASE-11425
◦ Pulled patches from upstream on 1.1
◦ Encountered a few issues
▫ HBASE-15064, HBASE-15525 . . .

What about Young GC ?
◦ Any GC time above your SLA is bad
◦ Hard to see this with Yammer metrics
▫ Sliding Window smoothing
▫ Eliminates peaks in percentiles
◦ Use histograms without Averaging
▫ HDRHistogram / HBase-2.0 Fast Histogram ...
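Why a smoothed percentile hides a short pause can be shown with a toy trace. This is a sketch under stated assumptions: the traffic (1000 reads per second at 1 ms for 300 seconds, with 50 reads of 50 ms in a single second) is hypothetical, and a single percentile over the whole window merely stands in for smoothing; it is not how Yammer metrics is implemented.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; pct is in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def make_trace(seconds=300, per_second=1000, gc_second=120, slow=50):
    """Hypothetical trace: 1 ms reads, plus `slow` reads of 50 ms in one second."""
    trace = []
    for second in range(seconds):
        latencies = [1] * per_second
        if second == gc_second:
            latencies[:slow] = [50] * slow
        trace.append(latencies)
    return trace


trace = make_trace()

# One percentile over the whole window: the pause is diluted away.
whole_window_p99 = percentile([v for sec in trace for v in sec], 99)

# One percentile per second, no averaging across seconds: the pause stands out.
per_second_p99 = [percentile(sec, 99) for sec in trace]
worst_second_p99 = max(per_second_p99)

print("p99 over the whole window (ms):", whole_window_p99)
print("worst per-second p99 (ms):", worst_second_p99)
```

Keeping one histogram per interval and reporting the worst interval preserves the peak that a long window smooths out.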

What about Young GC ?
HDR Histogram
Yammer Histogram

Fixing measurements ?
More (precise) metrics
◦ Percentiles are confusing
▫ Sometimes very hard to reason about
◦ Used SLA-based violation counts
▫ E.g. time-bucketed counts
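A time-bucketed violation count could look like the sketch below. The function name, the sample format (timestamp in seconds, latency in ms), and the 60-second bucket width are assumptions made for illustration; only the 40 ms SLA comes from the deck.

```python
from collections import Counter


def violation_counts(samples, sla_ms, bucket_seconds):
    """Count SLA violations per time bucket.

    samples: iterable of (timestamp_seconds, latency_ms) pairs.
    Returns a dict mapping each bucket's start time to its violation count;
    buckets without violations are omitted.
    """
    counts = Counter()
    for timestamp, latency_ms in samples:
        if latency_ms > sla_ms:
            bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
            counts[bucket_start] += 1
    return dict(counts)


# Hypothetical samples checked against the 40 ms Blackbird SLA.
samples = [(0.5, 10), (12.0, 45), (13.9, 80), (61.0, 41), (75.0, 39)]
counts = violation_counts(samples, sla_ms=40, bucket_seconds=60)
print(counts)
```

A count per bucket answers the question the SLA actually asks (how many requests were late, and when) without any percentile arithmetic.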

How to fix Young GC ?
◦ Tried to get GC pause times << SLA
◦ Not possible with current heap sizes
◦ Need RS with smaller heaps
▫ Less promotion work
▫ Less Young cleanup work

◦ Have to run a lot of RegionServers
▫ Commodity servers are now multi-core with large RAM
◦ Slider makes this easier
◦ Load Balancing issue

◦ Smaller GC pauses => more freq GCs
◦ Also need to reduce garbage generation

Reducing Garbage generation
◦ Memstore / ConcurrentSkipList opportunities
◦ To use or not use Data Block Encoding
◦ Compress / Decompress Opts.
◦ Misc….

Processing Times
◦ Huge jump between the 95th and 99th percentiles
◦ Cache hit ratio is 95%
◦ The reads that miss the cache go to disk

How can we fix this gap?
1. Increase cache hit ratio
2. Make disk reads faster
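Why the cache hit ratio decides where the gap appears can be sketched with a deliberately crude two-point model. Everything here is an assumption for illustration: every cache hit takes one fixed latency, every miss takes one fixed disk latency, and the 1 ms and 25 ms values are placeholders, not measurements from the deck.

```python
def percentile_latency(hit_ratio, pct, cache_ms, disk_ms):
    """Latency at percentile `pct` in a two-point model.

    hit_ratio and pct are fractions in [0, 1]. Reads are sorted so that
    cache hits (fast) come first; a percentile at or below the hit ratio
    therefore lands on a cache hit, anything above it lands on a disk read.
    """
    return cache_ms if pct <= hit_ratio else disk_ms


# Placeholder latencies: 1 ms from cache, 25 ms from disk.
for hit_ratio in (0.95, 0.995):
    p95 = percentile_latency(hit_ratio, 0.95, cache_ms=1, disk_ms=25)
    p99 = percentile_latency(hit_ratio, 0.99, cache_ms=1, disk_ms=25)
    print(f"hit ratio {hit_ratio:.1%}: p95 = {p95} ms, p99 = {p99} ms")
```

In this model the 99th percentile stays a disk read until the hit ratio exceeds 99%, which is why the two options on the slide are either to raise the hit ratio or to make the disk reads themselves faster.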

Increasing Cache Hit Ratio
Using NVME cards
◦ Similar to SSDs, with higher throughput due to PCIe
◦ Support already in Bucket Cache
◦ Cost effective w.r.t RAM
▫ RAM ~ $10 per GB, NVMe ~ $1 per GB

Disk throughputs
Exploring move to SSDs
◦ SSD cost ~ SAS disk cost
▫ Depends on SSD grade
▫ Including SAS backplane costs
◦ HDFS can store 1 replica on SSDs and the other 2 on HDDs

Newer SLA Requirements
◦ Older single read SLA is not enough
▫ 98% translates to ~90% in the newer model
◦ Top two areas of improvements
▫ Garbage Collection in JVM
▫ Reads going to Disks

RS with 14 GB heap, pauses < 10 ms
[Chart: SLA violation counts]

Results (sunday)
