Tales from Taming the Long Tail

Deepankar Reddy and Ishan Chhabra (Rocket Fuel)

Rocket Fuel is a marketing technology company that participates in 120+ billion real-time bidding auctions daily to show the right ad to the right user at the right time for our clients. In this talk, we discuss our efforts to systematically identify the causes of long-tail read latencies and to reduce them.

Published in: Software

  1. 1. Tales from Taming Tail Latencies Deepankar Reddy, Ishan Chhabra Rocket Fuel Inc.
  2. 2. Recap : Rocket Fuel Inc ◦ Programmatic ad tech firm ◦ Evaluates ~100B ad opportunities daily ◦ Each evaluation has a strict SLA of 100 ms
  3. 3. Recap : Blackbird @RocketFuel
  4. 4. Recap : Blackbird Scalable collection storage API ▫ Backed by HBase ▫ Append only collections
  5. 5. Recap : Blackbird Stores rich anonymized user data ◦ Historical behavior - Ads viewed and clicked, pages visited, etc. ◦ Interest - Third party and learned interests ◦ Feature vectors for various ML models ◦ etc.
  6. 6. Blackbird Workload ◦ 80% Read - 20% Write workload ◦ 90 - 95 % cache hit ratio ◦ Record size (compressed protobufs) : ▫ Mean : 11 KB ▫ Median : 8 KB
  7. 7. Blackbird Workload ◦ 14 TB of Unreplicated Data ◦ 60 - 70 Nodes in a Data Center ◦ Strict SLA of 40 ms ◦ Current SLA violation rate @ ~2 %
  8. 8. Blackbird Workload ◦ Read latencies: ▫ Mean, Median : < 1ms ▫ 95th perc : 25 ms ◦ Write latencies: ▫ Mean, Median : < 1ms ▫ 95th perc : 1 ms
  9. 9. Blackbird Workload (diagram: Rocket Fuel Moment Scoring Pipeline; Blackbird Service; Anon. UserId Tab)
  10. 10. People Based Marketing ◦ Efforts to cluster identities of user ▫ Probabilistic using Machine Learning ▫ Deterministic via integrations ◦ Reduces information loss ◦ Aligns with user’s buying patterns
  11. 11. New Blackbird Queries (diagram: Rocket Fuel Moment Scoring Pipeline; Blackbird Data Service; Anon UserID; UserID Clusters; request for all IDs in the cluster)
  12. 12. Translated SLA (chart) X axis: SLA violation rate for a single read; Y axis: new SLA violation rate for multiple reads
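The relationship plotted on this slide can be approximated with a short calculation. The sketch below assumes that every read in a clustered request violates the SLA independently with the same probability, and that the request meets its SLA only if all of its reads do. Both assumptions, and the function name `clustered_violation_rate`, are illustrative and not taken from the talk.

```python
def clustered_violation_rate(p_single: float, n_reads: int) -> float:
    """Probability that at least one of n independent reads misses the SLA.

    Assumes independent reads with identical violation probability; real
    reads that land on the same paused RegionServer are correlated.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_reads


if __name__ == "__main__":
    # ~2% is the single-read violation rate quoted on slide 7.
    for n in (1, 2, 5, 10):
        print(n, round(clustered_violation_rate(0.02, n), 4))
```

Under these assumptions, a 2% single-read violation rate grows to roughly 9.6% for a cluster of five IDs, which is in line with the "98% translates to 90%" figure in the addendum.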
  13. 13. Observations
  14. 14. Server Observed Read Latency 99P ~ 25 ms 95P ~ 8 ms
  15. 15. Client Observed Read Latency 99P > 100 ms 95P > 25 ms
  16. 16. Why the difference?
  17. 17. (charts: Network Level Time (ms); Server Side Time (ms); RS Heap size (MB))
  18. 18. Observations ◦ Client / Server times match ◦ Except during the longer mixed GCs
  19. 19. Co-ordinated omission (figure borrowed from Gil Tene, "How Not to Measure Latency", QCon SF 2015)
  20. 20. Co-ordinated omission (figure borrowed from Gil Tene, "How Not to Measure Latency", QCon SF 2015)
  21. 21. Co-ordinated omission ◦ ~ Censorship of Data ▫ Not random but co-ordinated omission ◦ Censorship during a GC* ▫ Queueing time in app ▫ Time sitting in the network buffer ▫ Time in the responder *with normal latencies
  22. 22. Co-ordinated omission Server Side Latencies ◦ GCs don’t show up in in-process latencies ◦ The more stages of the “request pipeline” a measurement covers, the more likely it is to capture a GC ▫ Add time in network buffer ▫ Add time in responder ▫ Add time in transit etc...
  23. 23. Co-ordinated omission Client Side Latencies ◦ Very important to slice requests per server to see the patterns ◦ Even then, Young GCs will be missed ▫ Problems with moving avgs (Yammer metrics)
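The effect described on the co-ordinated omission slides can be illustrated with a small deterministic simulation. This is a minimal sketch under simplifying assumptions: a single FIFO server, a fixed service time, requests intended at a fixed interval, and one stop-the-world pause during which no request can start. All parameter values and the function name `simulate` are hypothetical.

```python
def simulate(n_requests=200, interval_ms=1.0, service_ms=0.5,
             pause_start_ms=50.0, pause_ms=100.0):
    """Return (in_process, end_to_end) latency lists in milliseconds.

    in_process: timer started when the handler starts running, so time
                spent waiting behind the pause is never recorded.
    end_to_end: timer started at the intended send time, so queueing
                caused by the pause is included.
    """
    in_process, end_to_end = [], []
    server_free_at = 0.0
    pause_end_ms = pause_start_ms + pause_ms
    for i in range(n_requests):
        intended = i * interval_ms
        start = max(intended, server_free_at)
        # A request that would start during the pause waits until it ends.
        if pause_start_ms <= start < pause_end_ms:
            start = pause_end_ms
        finish = start + service_ms
        server_free_at = finish
        in_process.append(finish - start)
        end_to_end.append(finish - intended)
    return in_process, end_to_end


if __name__ == "__main__":
    in_proc, e2e = simulate()
    sla_ms = 40.0  # SLA figure from slide 7
    print("max in-process latency:", max(in_proc))
    print("max end-to-end latency:", max(e2e))
    print("requests over SLA (end-to-end):", sum(x > sla_ms for x in e2e))
```

In this model the in-process timer never reports more than the service time, while the timer anchored at the intended send time shows the pause and the backlog that drains after it.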
  24. 24. Causes for Peaks What’s causing Mixed GC? (charts: heap size in MB, ~25 GB; cache size in GB, ~4-5 GB) ◦ Around 5 times in a GC cycle: 5 x 5 = 25
  25. 25. Causes for Peaks Mixed GC: ◦ LRU on heap is bad for GC ▫ All evicted blocks will be in Old Gen ▫ Old Gen cleanup => Mixed GC ◦ No GC optimizations can fix this
  26. 26. Why peaks are bad ◦ Not so bad in normal use cases ▫ If peaks occur rarely ◦ Bad enough in clustered reads ◦ Decreases the time available for other parts of our Moment Scoring Pipeline
  27. 27. Why peaks are bad
  28. 28. Why peaks are bad
  29. 29. Why peaks are bad
  30. 30. Fix is to move to Off Heap for LRU cache
  31. 31. Off Heap Block Cache ◦ An array of byte buffers (4MB size) ◦ Offset based free space management ◦ Re-use the buffers by overwriting ◦ HBaseCon Talk at 3:10-3:50pm
  32. 32. Off Heap Advantages ◦ Can scale to higher memory ▫ Reserving less for promotion failures ◦ Potentially could be on SSDs/NVMes ▫ Allows us to use denser boxes
  33. 33. Off Heap Tests
  34. 34. Work for moving Off Heap ◦ HBase 1.0 copies data onto Heap ◦ Leads to too much Garbage ▫ GCing once every 2 - 3 secs
  35. 35. Work for moving Off Heap ◦ HBase 2.0 fixes this ▫ HBASE-11425 ◦ Pulled patches from upstream on 1.1 ◦ Encountered a few issues ▫ HBASE-15064, HBASE-15525 . . .
  36. 36. What about Young GC ? ◦ Any GC time above your SLA is bad ◦ Hard to see this with Yammer metrics ▫ Sliding Window smoothing ▫ Eliminates peaks in percentiles ◦ Use histograms without Averaging ▫ HDRHistogram / HBase-2.0 Fast Histogram ...
  37. 37. What about Young GC ? HDR Histogram Yammer Histogram
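The way smoothing can hide a short pause can be shown with synthetic numbers. The sketch below is not Yammer Metrics' actual algorithm; it uses a plain exponentially weighted moving average over per-window 99th percentiles as a stand-in for any sliding-window smoothing. The window layout, the 80 ms pause, and all names are assumptions made for illustration.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def ewma(values, alpha):
    """Exponentially weighted moving average, seeded with the first value."""
    smoothed, current = [], values[0]
    for v in values:
        current = alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


def build_windows(n_windows=10, per_window=1000, gc_window=5, gc_hits=50):
    """Synthetic latencies: 1 ms everywhere, except 50 reads at 80 ms
    in one window to mimic a Young GC pause."""
    windows = []
    for w in range(n_windows):
        samples = [1.0] * per_window
        if w == gc_window:
            samples[:gc_hits] = [80.0] * gc_hits
        windows.append(samples)
    return windows


windows = build_windows()
raw_p99 = [percentile(w, 99) for w in windows]
smoothed_p99 = ewma(raw_p99, alpha=0.2)

if __name__ == "__main__":
    print("max raw p99:     ", max(raw_p99))
    print("max smoothed p99:", round(max(smoothed_p99), 2))
```

With these inputs the unsmoothed per-window p99 reaches 80 ms, twice the 40 ms SLA, while the smoothed series stays well below it.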
  38. 38. Fixing measurements ? More (precise) metrics ◦ Percentiles are confusing ▫ Sometimes very hard to reason about ◦ Used SLA based violation counts ▫ E.g. time-bucketed counts
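A time-bucketed SLA violation count is simple to compute from raw latency events. The following is a minimal sketch, assuming events arrive as (timestamp in seconds, latency in milliseconds) pairs; the event shape, the one-minute bucket, and the function name `violation_counts` are illustrative choices rather than details from the talk.

```python
from collections import Counter


def violation_counts(events, sla_ms=40.0, bucket_s=60):
    """Count reads slower than sla_ms per time bucket.

    events: iterable of (timestamp_seconds, latency_ms) pairs.
    Returns a dict mapping bucket start time (seconds) to violation count,
    ordered by bucket start. Buckets with no violations are omitted.
    """
    counts = Counter()
    for ts, latency_ms in events:
        if latency_ms > sla_ms:
            bucket_start = int(ts // bucket_s) * bucket_s
            counts[bucket_start] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [(0, 1.0), (10, 45.0), (59, 100.0), (60, 41.0), (125, 5.0)]
    print(violation_counts(sample))
```

Unlike a percentile, each count is additive across servers and time ranges, so the numbers can be summed or sliced per RegionServer without losing peaks.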
  39. 39. Fixing Young GC
  40. 40. How to fix Young GC ? ◦ Tried to get GC pause times << SLA ◦ Not possible with current heap sizes ◦ Need RS with smaller heaps ▫ Less promotion work ▫ Less Young cleanup work
  41. 41. How to fix Young GC ? ◦ Have to run a lot of RegionServers ▫ Commodity servers are multi core now with large RAM ◦ Slider makes this easier ◦ Load Balancing issue
  42. 42. How to fix Young GC ? ◦ Smaller GC pauses => more freq GCs ◦ Need to reduce garbage gen. also
  43. 43. How to fix Young GC ? Reducing Garbage generation ◦ Memstore / ConcurrentSkipList opportunities ◦ To use or not use Data Block Encoding ◦ Compress / Decompress optimizations ◦ Misc.
  44. 44. How to fix Young GC ? Results
  45. 45. How to fix Young GC ? Results
  46. 46. Reads going to Disk
  47. 47. Processing Times
  48. 48. Processing Times ◦ Huge bump between 95 & 99 percs ◦ Cache hit ratio 95 % ◦ We are going to disks for these
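The link between the cache hit ratio and the jump between the 95th and 99th percentiles can be stated as a one-line rule. The sketch below assumes a two-tier model in which every cache hit is faster than every disk read; the model and the function name `served_from` are simplifications introduced here.

```python
def served_from(percentile: float, cache_hit_ratio: float) -> str:
    """Which tier determines a latency percentile in a two-tier model.

    Assumes every cache hit is faster than every disk read, so the
    fastest cache_hit_ratio fraction of reads are exactly the cache hits.
    """
    if not 0.0 < percentile <= 100.0:
        raise ValueError("percentile must be in (0, 100]")
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be in [0, 1]")
    return "cache" if percentile / 100.0 <= cache_hit_ratio else "disk"


if __name__ == "__main__":
    for p in (50, 95, 99):
        print(p, served_from(p, cache_hit_ratio=0.95))
```

Under this model, a 95% hit ratio keeps the 95th percentile in cache but leaves the 99th percentile determined by disk reads; the 99th percentile only moves into cache once the hit ratio reaches 99%.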
  49. 49. How can we fix this gap? 1. Increase cache hit ratio 2. Make disk reads faster
  50. 50. Exploring SSD & NVMe
  51. 51. Increasing Cache Hit Ratio Using NVMe cards ◦ Like SSDs, but with higher throughput due to PCIe ◦ Support already in Bucket Cache ◦ Cost effective w.r.t. RAM ▫ RAM ~ $10 per GB, NVMe ~ $1 per GB
  52. 52. Disk throughputs Exploring move to SSDs ◦ SSD costs ~ SAS disk costs ▫ Depends on SSD grade ▫ Including SAS backplane costs ◦ HDFS can store 1 replica on SSDs, other 2 on HDDs
  53. 53. Thanks! ANY QUESTIONS?
  54. 54. Addendum
  55. 55. Newer SLA Requirements ◦ Older single read SLA is not enough ▫ 98% translates to 90% in the newer model ◦ Top two areas of improvement ▫ Garbage Collection in JVM ▫ Reads going to Disks
  56. 56. How to fix Young GC ? RS with 14 GB heap: pauses < 10 ms (chart: SLA violation counts)
  57. 57. How to fix Young GC ? Results (Sunday)
  58. 58. How to fix Young GC ? Results (Sunday)
  59. 59. Off Heap Tests
