Deepankar Reddy and Ishan Chhabra (Rocket Fuel)
Rocket Fuel is a marketing technology company that participates in 120+ billion real-time bidding auctions daily to show the right ad to the right user at the right time for our clients. In this talk, we discuss our efforts to systematically identify the causes of long-tail read latencies and to reduce them.
5. Recap: Blackbird
Stores rich anonymized user data
◦ Historical behavior: ads viewed and clicked, pages visited, etc.
◦ Interests: third-party and learned interests
◦ Feature vectors for various ML models
◦ etc.
6. Blackbird Workload
◦ 80% read / 20% write workload
◦ 90-95% cache hit ratio
◦ Record size (compressed protobufs):
▫ Mean: 11 KB
▫ Median: 8 KB
7. Blackbird Workload
◦ 14 TB of unreplicated data
◦ 60-70 nodes per data center
◦ Strict SLA of 40 ms
◦ Current SLA violation rate: ~2%
8. Blackbird Workload
◦ Read latencies:
▫ Mean, median: < 1 ms
▫ 95th percentile: 25 ms
◦ Write latencies:
▫ Mean, median: < 1 ms
▫ 95th percentile: 1 ms
10. People Based Marketing
◦ Efforts to cluster a user's identities
▫ Probabilistically, using machine learning
▫ Deterministically, via integrations
◦ Reduces information loss
◦ Aligns with the user's buying patterns
11. New Blackbird Queries
Diagram: the Rocket Fuel Moment Scoring Pipeline resolves an anonymous user ID to its user ID cluster and sends the Blackbird Data Service a request for all IDs in the cluster.
12. Translated SLA
Chart: new SLA violation rate for multiple reads (Y axis) vs. SLA violation rate for a single read (X axis).
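The relationship plotted on this slide can be written down directly. A minimal sketch, assuming every ID in a cluster is fetched with an independent read, each read violates the SLA with the same probability p, and the clustered request violates the SLA if any one of its n reads does, so the clustered rate is 1 - (1 - p)^n. The function names are hypothetical.

```python
def clustered_violation_rate(single_read_rate: float, reads_per_request: int) -> float:
    """Probability that at least one of n independent reads violates the SLA."""
    if not 0.0 <= single_read_rate <= 1.0:
        raise ValueError("single_read_rate must be a probability")
    # P(all n reads meet the SLA) = (1 - p)^n, so P(any read violates) = 1 - (1 - p)^n.
    return 1.0 - (1.0 - single_read_rate) ** reads_per_request


def required_single_read_rate(target_rate: float, reads_per_request: int) -> float:
    """Inverse: the single-read violation rate needed to hit a clustered target."""
    return 1.0 - (1.0 - target_rate) ** (1.0 / reads_per_request)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(f"{n:>2} reads: {clustered_violation_rate(0.02, n):.1%}")
```

With the current ~2% single-read violation rate, five reads per request already push the clustered violation rate to about 9.6%. Real reads are not fully independent (a GC pause on one RegionServer slows every read it serves), so treat this as an approximation.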
21. Co-ordinated omission
◦ Akin to censoring of data
▫ The omission is not random but co-ordinated
◦ Censored during a GC*:
▫ Queueing time in the app
▫ Time sitting in the network buffer
▫ Time in the responder
*These requests are then recorded with normal latencies
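One standard correction for co-ordinated omission, the one HdrHistogram applies in `recordValueWithExpectedInterval`, assumes requests were meant to be issued at a fixed interval and back-fills the samples a stalled client never sent. A minimal Python sketch of that idea, assuming a closed-loop client that should issue one request every `expected_interval_ms`; the function names and numbers are illustrative, not taken from the talk.

```python
import math
from typing import Iterable, List


def correct_coordinated_omission(latencies_ms: Iterable[float],
                                 expected_interval_ms: float) -> List[float]:
    """Re-insert the samples a stalled closed-loop client never got to record."""
    corrected: List[float] = []
    for value in latencies_ms:
        corrected.append(value)
        if expected_interval_ms <= 0:
            continue
        # A request that took `value` ms blocked the requests that should have
        # started every `expected_interval_ms` behind it; each of those would
        # have waited one interval less than the one before.
        missing = value - expected_interval_ms
        while missing >= expected_interval_ms:
            corrected.append(missing)
            missing -= expected_interval_ms
    return corrected


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # 99 fast reads, then one read stuck behind a 200 ms GC pause,
    # with the client meant to send a request every 10 ms.
    raw = [1.0] * 99 + [200.0]
    fixed = correct_coordinated_omission(raw, 10.0)
    print("raw p99:", percentile(raw, 99))
    print("corrected p99:", percentile(fixed, 99))
```

Here the raw p99 is 1 ms because the pause shows up as a single slow sample; after back-filling the 19 requests that should have been sent during the pause, the p99 becomes 190 ms.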
22. Co-ordinated omission
Server-side latencies
◦ GCs don’t show up in in-process latencies
◦ The more stages of the request pipeline a measurement covers, the more likely it is to capture a GC
▫ Add time in the network buffer
▫ Add time in the responder
▫ Add time in transit, etc.
23. Co-ordinated omission
Client-side latencies
◦ Very important to slice requests per server to see the patterns
◦ Even these can miss young GCs
▫ Problems with moving averages (Yammer Metrics)
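Slicing by server matters because a GC on one RegionServer is diluted when its requests are pooled with everyone else's. A minimal sketch, assuming the client tags each latency sample with the server that answered it; the server names `rs1` to `rs3` and the latencies are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def per_server_percentiles(samples: List[Tuple[str, float]], pct: float) -> Dict[str, float]:
    """Group (server, latency_ms) samples by server and compute a percentile for each."""
    by_server: Dict[str, List[float]] = defaultdict(list)
    for server, latency_ms in samples:
        by_server[server].append(latency_ms)
    return {server: percentile(lat, pct) for server, lat in sorted(by_server.items())}


if __name__ == "__main__":
    # rs2 stalls on three reads (e.g. during a young GC); the others are healthy.
    samples = ([("rs1", 1.0)] * 100
               + [("rs2", 1.0)] * 97 + [("rs2", 60.0)] * 3
               + [("rs3", 1.0)] * 100)
    print("pooled p99:", percentile([lat for _, lat in samples], 99))
    print("per-server p99:", per_server_percentiles(samples, 99))
```

Pooled, the p99 is 1 ms; sliced per server, rs2's p99 is 60 ms, above the 40 ms SLA.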
24. Causes for Peaks
What’s causing Mixed GC:
Chart: heap size (MB) and cache size (GB)
◦ Heap size: ~25 GB
◦ Cache size: ~4-5 GB
◦ Around 5 times in a GC cycle: 5 x 5 = 25
25. Causes for Peaks
Mixed GC:
◦ An on-heap LRU cache is bad for GC
▫ By the time blocks are evicted, they have been promoted to the old gen
▫ Old gen cleanup => mixed GCs
◦ No GC tuning can fix this
26. Why peaks are bad
◦ Not so bad in normal use cases
▫ If peaks occur rarely
◦ Bad enough for clustered reads
◦ Leaves less time for the other stages of our Moment Scoring Pipeline
31. Off Heap Block Cache
◦ An array of byte buffers (4 MB each)
◦ Offset-based free-space management
◦ Reuse the buffers by overwriting
◦ HBaseCon talk at 3:10-3:50pm
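The three design bullets can be illustrated with a simplified slab-style cache: a few large buffers allocated once, free space tracked only as (buffer, offset) pairs, and evicted slots overwritten in place so the cache creates no garbage after start-up. A minimal Python sketch, assuming fixed-size slots and LRU eviction; `bytearray` stands in for the direct (off-heap) `ByteBuffer`s a JVM implementation would use, and `SlabBlockCache` is hypothetical, not HBase's BucketCache.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class SlabBlockCache:
    """Hypothetical sketch: preallocated buffers carved into fixed-size slots."""

    def __init__(self, num_buffers: int, buffer_size: int = 4 * 1024 * 1024,
                 slot_size: int = 64 * 1024) -> None:
        if num_buffers < 1 or buffer_size % slot_size != 0:
            raise ValueError("need >= 1 buffer and buffer_size divisible by slot_size")
        self.slot_size = slot_size
        # Allocated once, up front; a JVM version would use direct ByteBuffers.
        self.buffers: List[bytearray] = [bytearray(buffer_size) for _ in range(num_buffers)]
        # Free space is just a list of (buffer index, byte offset) pairs.
        self.free: List[Tuple[int, int]] = [
            (b, off) for b in range(num_buffers) for off in range(0, buffer_size, slot_size)
        ]
        # key -> (buffer index, offset, length), kept in least-recently-used order.
        self.index: "OrderedDict[str, Tuple[int, int, int]]" = OrderedDict()

    def put(self, key: str, block: bytes) -> None:
        if len(block) > self.slot_size:
            raise ValueError("block is larger than a slot")
        if key in self.index:
            self._release(key)
        if not self.free:
            # Evict the least recently used block; its slot goes back on the free list.
            self._release(next(iter(self.index)))
        buf, off = self.free.pop()
        # Overwrite the slot in place; the buffer itself is never reallocated.
        self.buffers[buf][off:off + len(block)] = block
        self.index[key] = (buf, off, len(block))

    def get(self, key: str) -> Optional[bytes]:
        location = self.index.get(key)
        if location is None:
            return None
        self.index.move_to_end(key)  # mark as most recently used
        buf, off, length = location
        return bytes(self.buffers[buf][off:off + length])  # copy out for the caller

    def _release(self, key: str) -> None:
        buf, off, _ = self.index.pop(key)
        self.free.append((buf, off))


if __name__ == "__main__":
    cache = SlabBlockCache(num_buffers=1, buffer_size=8, slot_size=4)  # two 4-byte slots
    cache.put("a", b"AAAA")
    cache.put("b", b"BB")
    cache.get("a")            # "a" is now most recently used
    cache.put("c", b"CCC")    # evicts "b" and overwrites its slot
    print(cache.get("b"), cache.get("c"), cache.get("a"))
```

Because the buffers are never grown or reallocated, evicting a block only returns an offset to the free list instead of leaving an object for the old generation. Production caches usually support several slot sizes to limit wasted space for variable-size blocks.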
32. Off Heap Advantages
◦ Can scale to more memory
▫ Less memory has to be reserved to guard against promotion failures
◦ Could potentially be placed on SSDs/NVMe
▫ Allows us to use denser boxes
34. Work for moving Off Heap
◦ HBase 1.0 copies data onto the heap
◦ Leads to too much garbage
▫ GC once every 2-3 seconds
35. Work for moving Off Heap
◦ HBase 2.0 fixes this
▫ HBASE-11425
◦ Backported the upstream patches to 1.1
◦ Encountered a few issues
▫ HBASE-15064, HBASE-15525, ...
36. What about Young GC?
◦ Any GC pause longer than your SLA is bad
◦ Hard to see with Yammer Metrics
▫ Sliding-window smoothing
▫ Eliminates peaks in percentiles
◦ Use histograms without averaging
▫ HdrHistogram / HBase 2.0 Fast Histogram ...
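The effect of smoothing on a short pause can be shown with synthetic numbers. A minimal sketch, assuming a trailing moving average over per-interval p99 values as a stand-in for sliding-window smoothing (this is not Yammer Metrics' actual reservoir algorithm) and a hypothetical workload in which one interval contains a 50 ms young GC.

```python
import math
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def per_interval_percentiles(intervals: List[List[float]], pct: float) -> List[float]:
    """One percentile per reporting interval, computed from that interval's raw samples."""
    return [percentile(samples, pct) for samples in intervals]


def moving_average(series: List[float], window: int) -> List[float]:
    """Trailing moving average, standing in for sliding-window smoothing."""
    smoothed: List[float] = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    normal = [1.0] * 100                 # a quiet interval: every read takes 1 ms
    gc = [1.0] * 95 + [50.0] * 5         # an interval with a 50 ms young GC
    intervals = [normal] * 4 + [gc] + [normal] * 5
    raw = per_interval_percentiles(intervals, 99)
    smoothed = moving_average(raw, window=5)
    print("worst raw p99:", max(raw))
    print("worst smoothed p99:", max(smoothed))
```

With these numbers the per-interval p99 peaks at 50 ms, above the 40 ms SLA, while the smoothed series never exceeds 10.8 ms, so the violation disappears from the dashboard.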
38. Fixing measurements?
More precise metrics
◦ Percentiles are confusing
▫ Sometimes very hard to reason about
◦ Used SLA-based violation counts
▫ e.g., time-bucketed counts
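Time-bucketed violation counts avoid percentile arithmetic entirely: each bucket reports how many requests exceeded the SLA and how many were served. A minimal sketch, assuming the 40 ms SLA from the workload slide, 60-second buckets, and that a request counts as a violation only when it takes strictly longer than the SLA; `bucketed_sla_violations` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bucketed_sla_violations(requests: Iterable[Tuple[float, float]],
                            sla_ms: float = 40.0,
                            bucket_seconds: int = 60) -> Dict[int, Tuple[int, int]]:
    """Map each bucket's start time (s) to (violations, total requests).

    `requests` yields (timestamp_s, latency_ms) pairs.
    """
    totals: Counter = Counter()
    violations: Counter = Counter()
    for timestamp_s, latency_ms in requests:
        bucket = int(timestamp_s // bucket_seconds) * bucket_seconds
        totals[bucket] += 1
        if latency_ms > sla_ms:
            violations[bucket] += 1
    return {b: (violations[b], totals[b]) for b in sorted(totals)}


if __name__ == "__main__":
    reqs = [(0.5, 1.0), (10.0, 45.0), (59.9, 2.0), (61.0, 100.0), (75.0, 40.0)]
    for start, (bad, total) in bucketed_sla_violations(reqs).items():
        print(f"t={start:>4}s  violations={bad}/{total}")
```

Unlike percentiles, these counts can be summed across servers and time windows without distortion, so per-RegionServer and per-data-center views come from the same numbers.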
40. How to fix Young GC?
◦ Tried to get GC pause times << SLA
◦ Not possible with current heap sizes
◦ Need RegionServers (RS) with smaller heaps
▫ Less promotion work
▫ Less young-gen cleanup work
41. How to fix Young GC?
◦ Have to run a lot of RegionServers
▫ Commodity servers now have many cores and large RAM
◦ Apache Slider makes this easier
◦ Load balancing becomes an issue
42. How to fix Young GC?
◦ Smaller GC pauses => more frequent GCs
◦ Need to reduce garbage generation as well
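To a first approximation, a young collection runs each time eden fills up, so the time between young GCs is eden size divided by allocation rate. A back-of-the-envelope sketch with hypothetical numbers (the eden sizes, allocation rates, and pause time below are assumptions, not measurements from the talk) shows why a smaller young generation alone is not enough.

```python
def young_gc_interval_s(eden_mb: float, alloc_rate_mb_per_s: float) -> float:
    """Seconds between young GCs if a collection runs whenever eden fills."""
    return eden_mb / alloc_rate_mb_per_s


def gc_time_fraction(pause_ms: float, interval_s: float) -> float:
    """Fraction of wall-clock time spent in young GC pauses."""
    return pause_ms / (interval_s * 1000.0)


if __name__ == "__main__":
    # Hypothetical baseline: 1 GB/s of allocation into a 2 GB eden.
    print(young_gc_interval_s(2048, 1024))
    # A smaller eden (e.g. to keep pauses short) fills twice as fast.
    print(young_gc_interval_s(1024, 1024))
    # Halving the allocation rate as well restores the baseline frequency.
    print(young_gc_interval_s(1024, 512))
    # A 10 ms pause every second costs 1% of wall-clock time.
    print(gc_time_fraction(10.0, 1.0))
```

Reducing garbage generation is the only knob here that lowers GC frequency without giving up the smaller young generation.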
43. How to fix Young GC?
Reducing garbage generation
◦ Memstore / ConcurrentSkipListMap opportunities
◦ Whether or not to use Data Block Encoding
◦ Compression/decompression optimizations
◦ Misc.
51. Increasing Cache Hit Ratio
Using NVMe cards
◦ Like SSDs, but with higher throughput due to PCIe
◦ Already supported by the BucketCache
◦ Cost-effective compared with RAM
▫ RAM ~$10 per GB, NVMe ~$1 per GB
52. Disk throughputs
Exploring a move to SSDs
◦ SSD cost ≈ SAS disk cost
▫ Depends on SSD grade
▫ Including SAS backplane costs
◦ HDFS can store 1 replica on SSD and the other 2 on HDDs
55. Newer SLA Requirements
◦ The older single-read SLA is not enough
▫ 98% translates to ~90% in the newer model (consistent with about five independent reads per request, since 0.98^5 ≈ 0.90)
◦ Top two areas for improvement
▫ Garbage collection in the JVM
▫ Reads going to disk
56. How to fix Young GC?
Chart: SLA violation counts for RegionServers with a 14 GB heap (pauses < 10 ms)