Tales from Taming the Long Tail

HBaseCon
Tales from Taming Tail Latencies
Deepankar Reddy, Ishan Chhabra
Rocket Fuel Inc.
Recap: Rocket Fuel Inc.
◦ Programmatic ad tech firm
◦ Evaluates ~100B ad opportunities daily
◦ Each evaluation has a strict SLA of 100 ms
Recap: Blackbird @ Rocket Fuel
Recap: Blackbird
Scalable collection storage API
▫ Backed by HBase
▫ Append-only collections
Recap: Blackbird
Stores rich anonymized user data
◦ Historical behavior: ads viewed and clicked, pages visited, etc.
◦ Interests: third-party and learned interests
◦ Feature vectors for various ML models
◦ etc.
Blackbird Workload
◦ 80% read / 20% write workload
◦ 90-95% cache hit ratio
◦ Record size (compressed protobufs):
▫ Mean: 11 KB
▫ Median: 8 KB
Blackbird Workload
◦ 14 TB of unreplicated data
◦ 60-70 nodes in a data center
◦ Strict SLA of 40 ms
◦ Current SLA violation rate ~2%
Blackbird Workload
◦ Read latencies:
▫ Mean, median: < 1 ms
▫ 95th percentile: 25 ms
◦ Write latencies:
▫ Mean, median: < 1 ms
▫ 95th percentile: 1 ms
Blackbird Workload
[Figure: the Rocket Fuel Moment Scoring Pipeline calls the Blackbird service with an anonymized user ID]
People-Based Marketing
◦ Efforts to cluster a user's identities
▫ Probabilistically, using machine learning
▫ Deterministically, via integrations
◦ Reduces information loss
◦ Aligns with users' buying patterns
New Blackbird Queries
[Figure: the Rocket Fuel Moment Scoring Pipeline sends an anonymized user ID to the Blackbird Data Service, which uses the UserID clusters to request all IDs in the cluster]
Translated SLA
[Figure: X axis: SLA violation rate for a single read; Y axis: new SLA violation rate for multiple reads]
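If each read independently violates the SLA with probability p, a request that must read all n IDs in a user's cluster violates it with probability 1 − (1 − p)^n. The sketch below assumes independent reads, which real systems only approximate (several reads can land on the same paused RegionServer); the class and method names are hypothetical.

```java
public class TranslatedSla {
    /**
     * Probability that a request issuing n reads sees at least one SLA
     * violation, given a per-read violation rate p. Assumes the reads are
     * independent; correlated slow reads (e.g. several hitting the same
     * paused RegionServer) would change the result.
     */
    static double violationRate(double p, int n) {
        return 1.0 - Math.pow(1.0 - p, n);
    }

    public static void main(String[] args) {
        double p = 0.02; // ~2% single-read violation rate, as on the workload slide
        for (int n : new int[] {1, 2, 5, 10, 20}) {
            System.out.printf("n=%2d reads -> %.1f%% of requests violate the SLA%n",
                    n, 100 * violationRate(p, n));
        }
    }
}
```

For instance, with p = 0.02 and n = 5, 1 − 0.98^5 ≈ 9.6%, roughly the "98% translates to 90%" figure on the addendum slide; the slides do not state the actual cluster size.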
Observations
◦ Server-observed read latency
▫ 99P ~ 25 ms
▫ 95P ~ 8 ms
◦ Client-observed read latency
▫ 99P > 100 ms
▫ 95P > 25 ms
Why the difference?
[Figure: network-level time (ms), server-side time (ms), and RegionServer heap size (MB) over time]
Observations
◦ Client and server times match
◦ Except during the longer mixed GCs
Co-ordinated omission
© Borrowed from Gil Tene, "How Not to Measure Latency", QCon SF 2015
Co-ordinated omission
◦ ~ Censorship of data
▫ Not random, but co-ordinated omission
◦ What gets censored during a GC*:
▫ Queueing time in the app
▫ Time sitting in the network buffer
▫ Time in the responder
*with normal latencies
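To see the effect, here is a small simulation under assumed numbers: a synchronous client intends to send one read every 10 ms, reads normally take 1 ms, and a 200 ms GC pause starts at t = 100 ms. Measuring from the actual send time records the pause only once; measuring from the intended send time (one common correction for coordinated omission) also charges the requests the client held back. All names and constants are hypothetical.

```java
public class CoordinatedOmissionDemo {
    // Simulated time in milliseconds; all constants are illustrative assumptions.
    static final long INTERVAL = 10;      // intended gap between requests
    static final long SERVICE = 1;        // normal service time
    static final long PAUSE_START = 100;  // simulated stop-the-world GC window
    static final long PAUSE_END = 300;

    /**
     * Simulates a synchronous client that waits for each reply before sending
     * the next request. Returns [0] latency measured from the actual send time
     * (naive) and [1] latency measured from the intended send time (corrected).
     */
    static long[][] simulate(int requests) {
        long[] naive = new long[requests];
        long[] corrected = new long[requests];
        long prevDone = 0;
        for (int i = 0; i < requests; i++) {
            long intended = i * INTERVAL;
            // The client cannot send until the previous reply has arrived.
            long sent = Math.max(intended, prevDone);
            boolean stalled = sent >= PAUSE_START && sent < PAUSE_END;
            long done = (stalled ? PAUSE_END : sent) + SERVICE;
            naive[i] = done - sent;
            corrected[i] = done - intended;
            prevDone = done;
        }
        return new long[][] {naive, corrected};
    }

    static int countAbove(long[] latencies, long threshold) {
        int n = 0;
        for (long l : latencies) {
            if (l > threshold) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long[][] r = simulate(50);
        System.out.println("naive > 10 ms:     " + countAbove(r[0], 10));
        System.out.println("corrected > 10 ms: " + countAbove(r[1], 10));
    }
}
```

With 50 requests, the naive count of reads over 10 ms is 1, while the corrected count is 22: the pause delayed every request the client was supposed to send while it was blocked.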
Co-ordinated omission
Server Side Latencies
◦ GCs don't show up in in-process latencies
◦ The more stages of the request pipeline you measure, the more likely you are to capture a GC
▫ Add time in the network buffer
▫ Add time in the responder
▫ Add time in transit, etc.
Co-ordinated omission
Client Side Latencies
◦ Very important to slice requests per server to see the patterns
◦ Even these will miss young GCs
▫ Problems with moving averages (Yammer Metrics)
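A minimal sketch of per-server slicing, assuming each client-side sample records the RegionServer it was sent to (the Sample type and the synthetic data are hypothetical). Three slow reads on one server out of four are under 1% of all traffic, so the aggregate p99 stays at 2 ms, while that server's own p99 is 150 ms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PerServerLatency {
    /** One client-observed read (hypothetical shape). */
    static final class Sample {
        final String server;
        final long latencyMs;

        Sample(String server, long latencyMs) {
            this.server = server;
            this.latencyMs = latencyMs;
        }
    }

    /** Nearest-rank percentile for an integer percentage pct in [1, 100]. */
    static long percentile(List<Long> values, int pct) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (pct * sorted.size() + 99) / 100; // ceil(pct / 100 * size) in integer math
        return sorted.get(Math.max(rank, 1) - 1);
    }

    static List<Long> all(List<Sample> samples) {
        List<Long> out = new ArrayList<>();
        for (Sample s : samples) out.add(s.latencyMs);
        return out;
    }

    /** Groups latencies by server so one misbehaving RegionServer stands out. */
    static Map<String, List<Long>> byServer(List<Sample> samples) {
        Map<String, List<Long>> out = new TreeMap<>();
        for (Sample s : samples) {
            out.computeIfAbsent(s.server, k -> new ArrayList<>()).add(s.latencyMs);
        }
        return out;
    }

    /** Synthetic data: four servers, 100 reads each; rs3 has 3 reads stuck behind a pause. */
    static List<Sample> demoSamples() {
        List<Sample> samples = new ArrayList<>();
        for (String server : new String[] {"rs1", "rs2", "rs3", "rs4"}) {
            for (int i = 0; i < 100; i++) {
                boolean slow = server.equals("rs3") && i < 3;
                samples.add(new Sample(server, slow ? 150 : 2));
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<Sample> samples = demoSamples();
        System.out.println("aggregate p99 = " + percentile(all(samples), 99) + " ms");
        for (Map.Entry<String, List<Long>> e : byServer(samples).entrySet()) {
            System.out.println(e.getKey() + " p99 = " + percentile(e.getValue(), 99) + " ms");
        }
    }
}
```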
Causes for Peaks
What's causing mixed GCs:
[Figure: heap size (MB) and cache size (GB) over time, annotated ~25 GB and ~4-5 GB]
Around 5 times in a GC cycle: 5 x 5 = 25
Causes for Peaks
Mixed GC:
◦ An on-heap LRU cache is bad for GC
▫ All evicted blocks will be in the old gen
▫ Old gen cleanup => mixed GC
◦ No GC optimizations can fix this
Why peaks are bad
◦ Not so bad in normal use cases
▫ If peaks occur rarely
◦ Bad enough for clustered reads
◦ Leave less time for other parts of our Moment Scoring Pipeline
The fix is to move the LRU cache off heap
Off Heap Block Cache
◦ An array of byte buffers (4 MB each)
◦ Offset-based free space management
◦ Re-uses the buffers by overwriting them
◦ HBaseCon talk at 3:10-3:50pm
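The design above can be sketched as a toy cache: a few 4 MB direct ByteBuffers carved into fixed-size slots, free slots tracked by offset, and evicted slots reused by overwriting. This is not HBase's BucketCache; the fixed slot size, the LinkedHashMap LRU index, and all names are simplifying assumptions. Only the small index lives on the heap, so evictions do not leave large blocks behind in the old generation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy off-heap LRU cache: block bytes live in 4 MB direct ByteBuffers split
 * into fixed-size slots; the heap holds only a small key -> offset index.
 * Not HBase's BucketCache; fixed slots and names are simplifying assumptions.
 */
public class OffHeapLruCache {
    static final int BUFFER_SIZE = 4 * 1024 * 1024; // 4 MB per buffer

    private final int slotSize;
    private final ByteBuffer[] buffers;
    private final ArrayDeque<Long> freeOffsets = new ArrayDeque<>(); // global slot offsets
    // accessOrder=true: iteration order runs from least to most recently used.
    private final LinkedHashMap<String, long[]> index = new LinkedHashMap<>(16, 0.75f, true);

    OffHeapLruCache(int numBuffers, int slotSize) {
        if (numBuffers < 1 || slotSize < 1 || slotSize > BUFFER_SIZE) {
            throw new IllegalArgumentException("need >= 1 buffer and 1 <= slotSize <= 4 MB");
        }
        this.slotSize = slotSize;
        this.buffers = new ByteBuffer[numBuffers];
        for (int b = 0; b < numBuffers; b++) {
            buffers[b] = ByteBuffer.allocateDirect(BUFFER_SIZE);
            for (int off = 0; off + slotSize <= BUFFER_SIZE; off += slotSize) {
                freeOffsets.add((long) b * BUFFER_SIZE + off);
            }
        }
    }

    /** Positioned view of a slot; duplicate() leaves the shared buffer's position alone. */
    private ByteBuffer slot(long offset) {
        ByteBuffer view = buffers[(int) (offset / BUFFER_SIZE)].duplicate();
        view.position((int) (offset % BUFFER_SIZE));
        return view;
    }

    void put(String key, byte[] block) {
        if (block.length > slotSize) throw new IllegalArgumentException("block larger than slot");
        long[] old = index.remove(key);
        if (old != null) freeOffsets.add(old[0]);
        if (freeOffsets.isEmpty()) evictLeastRecentlyUsed();
        long offset = freeOffsets.poll();
        slot(offset).put(block); // overwrite whatever the slot held before
        index.put(key, new long[] {offset, block.length});
    }

    /** Copies the block onto the heap for simplicity (the HBase 1.0 behavior the deck calls out). */
    byte[] get(String key) {
        long[] loc = index.get(key); // also marks the entry as most recently used
        if (loc == null) return null;
        byte[] out = new byte[(int) loc[1]];
        slot(loc[0]).get(out);
        return out;
    }

    private void evictLeastRecentlyUsed() {
        Iterator<Map.Entry<String, long[]>> it = index.entrySet().iterator();
        Map.Entry<String, long[]> eldest = it.next();
        freeOffsets.add(eldest.getValue()[0]); // slot is reused by overwriting, never freed
        it.remove();
    }
}
```

For example, with one buffer and 2 MB slots there are only two slots, so inserting a third block evicts the least recently used one and writes into its slot.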
Off Heap Advantages
◦ Can scale to more memory
▫ Less memory reserved to absorb promotion failures
◦ Could potentially live on SSDs/NVMe
▫ Allows us to use denser boxes
Off Heap Tests
Work for moving Off Heap
◦ HBase 1.0 copies data onto the heap
◦ Leads to too much garbage
▫ GCing once every 2-3 secs
Work for moving Off Heap
◦ HBase 2.0 fixes this
▫ HBASE-11425
◦ Pulled the patches from upstream onto 1.1
◦ Encountered a few issues
▫ HBASE-15064, HBASE-15525 . . .
What about Young GC?
◦ Any GC pause longer than your SLA is bad
◦ Hard to see this with Yammer Metrics
▫ Sliding-window smoothing
▫ Eliminates peaks in percentiles
◦ Use histograms without averaging
▫ HdrHistogram / HBase 2.0 Fast Histogram ...
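A minimal sketch of a histogram without averaging: every value is counted in a 1 ms bucket, nothing is sampled or decayed, and the histogram is reset after each reporting interval. It is a simplified stand-in for HdrHistogram (not its encoding), and the synthetic data and names are assumptions. Averaging the per-interval p99s over a 60-interval window, as a crude stand-in for sliding-window smoothing, shrinks a 200 ms spike to about 4 ms.

```java
import java.util.Arrays;

public class IntervalHistogram {
    private final long[] counts; // counts[v] = samples that fell in the v ms bucket
    private long total;
    private long max;

    IntervalHistogram(int maxTrackedMs) {
        counts = new long[maxTrackedMs + 1];
    }

    void record(long latencyMs) {
        // Clamp into [0, cap]; values above the cap share the last bucket.
        int bucket = (int) Math.min(Math.max(latencyMs, 0L), counts.length - 1);
        counts[bucket]++;
        total++;
        max = Math.max(max, latencyMs);
    }

    /** Smallest bucket value v such that at least pct percent of samples are <= v. */
    long percentile(double pct) {
        if (total == 0) return 0;
        long needed = Math.max(1L, (long) Math.ceil(pct / 100.0 * total));
        long seen = 0;
        for (int v = 0; v < counts.length; v++) {
            seen += counts[v];
            if (seen >= needed) return v;
        }
        return counts.length - 1;
    }

    long max() {
        return max;
    }

    /** Start a fresh interval: no decay, no carry-over, no averaging. */
    void reset() {
        Arrays.fill(counts, 0L);
        total = 0;
        max = 0;
    }

    /** p99 of each of 60 synthetic intervals; interval 30 contains a 200 ms pause. */
    static long[] perIntervalP99() {
        IntervalHistogram h = new IntervalHistogram(1_000);
        long[] p99s = new long[60];
        for (int interval = 0; interval < 60; interval++) {
            for (int i = 0; i < 1_000; i++) {
                h.record(interval == 30 && i < 20 ? 200 : 1);
            }
            p99s[interval] = h.percentile(99);
            h.reset();
        }
        return p99s;
    }

    public static void main(String[] args) {
        long[] p99s = perIntervalP99();
        long worst = Arrays.stream(p99s).max().getAsLong();
        double smoothed = Arrays.stream(p99s).average().getAsDouble();
        System.out.printf("worst interval p99 = %d ms, averaged over the window = %.1f ms%n",
                worst, smoothed);
    }
}
```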
What about Young GC?
[Figure: HdrHistogram vs. Yammer histogram]
Fixing measurements?
More (precise) metrics
◦ Percentiles are confusing
▫ Sometimes very hard to reason about
◦ Used SLA-based violation counts
▫ E.g. time-bucketed counts
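SLA-based, time-bucketed violation counts are simple to compute: for each fixed time bucket, count the requests and the requests whose latency exceeded the SLA. The sketch assumes a 40 ms SLA (from the workload slide), 1-second buckets, and "violation" meaning strictly greater than the SLA; the class is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Counts requests and SLA violations per fixed time bucket.
 * A violation here means latency strictly greater than the SLA (an assumption).
 */
public class SlaViolationCounter {
    private final long slaMs;
    private final long bucketMs;
    // bucket start time (ms) -> {requests, violations}
    private final TreeMap<Long, long[]> buckets = new TreeMap<>();

    SlaViolationCounter(long slaMs, long bucketMs) {
        this.slaMs = slaMs;
        this.bucketMs = bucketMs;
    }

    void record(long timestampMs, long latencyMs) {
        long bucketStart = timestampMs - Math.floorMod(timestampMs, bucketMs);
        long[] c = buckets.computeIfAbsent(bucketStart, k -> new long[2]);
        c[0]++;
        if (latencyMs > slaMs) c[1]++;
    }

    long requests(long bucketStart) {
        long[] c = buckets.get(bucketStart);
        return c == null ? 0 : c[0];
    }

    long violations(long bucketStart) {
        long[] c = buckets.get(bucketStart);
        return c == null ? 0 : c[1];
    }

    public static void main(String[] args) {
        SlaViolationCounter counter = new SlaViolationCounter(40, 1_000);
        long[][] samples = {{100, 5}, {900, 45}, {1_500, 100}, {1_600, 40}, {2_999, 41}};
        for (long[] s : samples) counter.record(s[0], s[1]);
        for (Map.Entry<Long, long[]> e : counter.buckets.entrySet()) {
            System.out.printf("t=%d ms: %d/%d requests over SLA%n",
                    e.getKey(), e.getValue()[1], e.getValue()[0]);
        }
    }
}
```

Unlike a percentile, a count such as "1 of 2 requests over 40 ms in this second" maps directly onto the SLA and can be summed across servers and buckets.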
Fixing Young GC
How to fix Young GC?
◦ Tried to get GC pause times << SLA
◦ Not possible with current heap sizes
◦ Need RegionServers with smaller heaps
▫ Less promotion work
▫ Less young-gen cleanup work
How to fix Young GC?
◦ Have to run a lot of RegionServers
▫ Commodity servers are now multi-core with large RAM
◦ Apache Slider makes this easier
◦ Load-balancing issue
How to fix Young GC?
◦ Smaller GC pauses => more frequent GCs
◦ Also need to reduce garbage generation
How to fix Young GC?
Reducing garbage generation
◦ Memstore / ConcurrentSkipListMap opportunities
◦ To use or not to use Data Block Encoding
◦ Compression / decompression optimizations
◦ Misc.
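As one generic example of trimming garbage on the decompression path, the sketch below reuses a per-thread Inflater and output buffer instead of allocating new ones for every block. It is not HBase's implementation; it assumes zlib-compressed blocks whose uncompressed size the caller already knows, and the returned bytes are valid only until the next call on the same thread.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/**
 * Per-thread reusable decompressor: one Inflater and one growable output
 * buffer per thread, instead of fresh objects for every block read.
 * Illustrative sketch, not HBase's decompression path.
 */
public class ReusableDecompressor {
    private static final ThreadLocal<ReusableDecompressor> LOCAL =
            ThreadLocal.withInitial(ReusableDecompressor::new);

    private final Inflater inflater = new Inflater();
    private byte[] buffer = new byte[64 * 1024];

    static ReusableDecompressor forCurrentThread() {
        return LOCAL.get();
    }

    /**
     * Inflates a zlib-compressed block into the reused buffer and returns the
     * number of bytes written. Assumes the caller knows the uncompressed size.
     * The bytes stay valid only until the next call on this thread.
     */
    int decompress(byte[] compressed, int uncompressedLength) {
        if (buffer.length < uncompressedLength) {
            buffer = new byte[uncompressedLength]; // grows rarely, then is reused
        }
        inflater.reset(); // reuse the inflater state instead of allocating a new one
        inflater.setInput(compressed);
        int n = 0;
        try {
            while (n < uncompressedLength && !inflater.finished()) {
                int got = inflater.inflate(buffer, n, uncompressedLength - n);
                if (got == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input or preset dictionary: stop rather than spin
                }
                n += got;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed block", e);
        }
        return n;
    }

    byte[] buffer() {
        return buffer;
    }
}
```

The trade-off is that each worker thread pins one Inflater and one buffer for its lifetime, which suits a fixed pool of long-lived handler threads.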
How to fix Young GC?
Results
Reads Going to Disk
Processing Times
◦ Huge jump between the 95th and 99th percentiles
◦ Cache hit ratio is 95%
◦ These reads are going to disk
How can we fix this gap?
1. Increase cache hit ratio
2. Make disk reads faster
Exploring SSD & NVMe
Increasing Cache Hit Ratio
Using NVMe cards
◦ Like SSDs, but with higher throughput due to PCIe
◦ Already supported by the BucketCache
◦ Cost-effective w.r.t. RAM
▫ RAM ~ $10 per GB, NVMe ~ $1 per GB
Disk throughputs
Exploring move to SSDs
◦ SSD cost ~ SAS disk cost
▫ Depends on SSD grade
▫ Including SAS backplane costs
◦ HDFS can store 1 replica on SSD and the other 2 on HDDs (ONE_SSD storage policy)
Thanks!
ANY QUESTIONS?
Addendum
Newer SLA Requirements
◦ The older single-read SLA is not enough
▫ 98% translates to 90% in the newer model
◦ Top two areas for improvement
▫ Garbage collection in the JVM
▫ Reads going to disk
How to fix Young GC?
[Figure: RS with 14 GB heap, pauses < 10 ms; SLA violation counts]
How to fix Young GC?
Results (Sunday)
Off Heap Tests
Bootstrapping vs Venture Capital.pptx by
Bootstrapping vs Venture Capital.pptxBootstrapping vs Venture Capital.pptx
Bootstrapping vs Venture Capital.pptxZeljko Svedic
15 views17 slides
predicting-m3-devopsconMunich-2023.pptx by
predicting-m3-devopsconMunich-2023.pptxpredicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptxTier1 app
8 views24 slides

Recently uploaded(20)

Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium... by Lisi Hocke
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...
Lisi Hocke35 views
Generic or specific? Making sensible software design decisions by Bert Jan Schrijver
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
Bootstrapping vs Venture Capital.pptx by Zeljko Svedic
Bootstrapping vs Venture Capital.pptxBootstrapping vs Venture Capital.pptx
Bootstrapping vs Venture Capital.pptx
Zeljko Svedic15 views
predicting-m3-devopsconMunich-2023.pptx by Tier1 app
predicting-m3-devopsconMunich-2023.pptxpredicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptx
Tier1 app8 views
Airline Booking Software by SharmiMehta
Airline Booking SoftwareAirline Booking Software
Airline Booking Software
SharmiMehta9 views
Automated Testing of Microsoft Power BI Reports by RTTS
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI Reports
RTTS10 views
Top-5-production-devconMunich-2023-v2.pptx by Tier1 app
Top-5-production-devconMunich-2023-v2.pptxTop-5-production-devconMunich-2023-v2.pptx
Top-5-production-devconMunich-2023-v2.pptx
Tier1 app8 views
ADDO_2022_CICID_Tom_Halpin.pdf by TomHalpin9
ADDO_2022_CICID_Tom_Halpin.pdfADDO_2022_CICID_Tom_Halpin.pdf
ADDO_2022_CICID_Tom_Halpin.pdf
TomHalpin95 views
AI and Ml presentation .pptx by FayazAli87
AI and Ml presentation .pptxAI and Ml presentation .pptx
AI and Ml presentation .pptx
FayazAli8714 views
360 graden fabriek by info33492
360 graden fabriek360 graden fabriek
360 graden fabriek
info33492165 views
FOSSLight Community Day 2023-11-30 by Shane Coughlan
FOSSLight Community Day 2023-11-30FOSSLight Community Day 2023-11-30
FOSSLight Community Day 2023-11-30
Shane Coughlan7 views
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... by NimaTorabi2
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
NimaTorabi216 views
How Workforce Management Software Empowers SMEs | TraQSuite by TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuiteHow Workforce Management Software Empowers SMEs | TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuite
TraQSuite6 views
Quality Engineer: A Day in the Life by John Valentino
Quality Engineer: A Day in the LifeQuality Engineer: A Day in the Life
Quality Engineer: A Day in the Life
John Valentino7 views

Tales from Taming the Long Tail

  • 1. Tales from Taming Tail Latencies: Deepankar Reddy, Ishan Chhabra, Rocket Fuel Inc.
  • 2. Recap: Rocket Fuel Inc. ◦ Programmatic ad tech firm ◦ Evaluates ~100B ad opportunities daily ◦ Each evaluation has a strict SLA of 100 ms
  • 3. Recap: Blackbird @RocketFuel
  • 4. Recap: Blackbird ◦ Scalable collection storage API ▫ Backed by HBase ▫ Append-only collections
  • 5. Recap: Blackbird stores rich anonymized user data ◦ Historical behavior: ads viewed and clicked, pages visited, etc. ◦ Interests: third-party and learned ◦ Feature vectors for various ML models ◦ etc.
  • 6. Blackbird Workload ◦ 80% read / 20% write workload ◦ 90-95% cache hit ratio ◦ Record size (compressed protobufs): ▫ Mean: 11 KB ▫ Median: 8 KB
  • 7. Blackbird Workload ◦ 14 TB of unreplicated data ◦ 60-70 nodes in a data center ◦ Strict SLA of 40 ms ◦ Current SLA violation rate: ~2%
  • 8. Blackbird Workload ◦ Read latencies: ▫ Mean, median: < 1 ms ▫ 95th percentile: 25 ms ◦ Write latencies: ▫ Mean, median: < 1 ms ▫ 95th percentile: 1 ms
  • 9. Blackbird Workload (diagram: the Rocket Fuel Moment Scoring Pipeline queries the BlackBird Service by anonymized user ID)
  • 10. People-Based Marketing ◦ Efforts to cluster a user's identities ▫ Probabilistically, using machine learning ▫ Deterministically, via integrations ◦ Reduces information loss ◦ Aligns with the user's buying patterns
  • 11. New Blackbird Queries (diagram: Rocket Fuel Moment Scoring Pipeline → BlackBird Data Service; an anonymized user ID is mapped to its user ID cluster, and all IDs in the cluster are requested)
  • 12. Translated SLA (chart; x-axis: SLA violation rate for a single read; y-axis: resulting SLA violation rate for multiple reads)
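The translated-SLA curve follows from a simple probability argument. A minimal sketch, assuming the reads for one user cluster are independent and each misses the SLA with the same probability p: the clustered request misses if any of its n reads does, so its violation rate is 1 - (1 - p)^n. The function name and the choice of n = 5 are illustrative, not from the talk; GC pauses on a shared RegionServer make real misses correlated, so treat this as an approximation.

```python
def multi_read_violation_rate(single_read_violation: float, reads: int) -> float:
    """Probability that at least one of `reads` independent reads misses the SLA."""
    # The clustered request succeeds only if every individual read succeeds.
    return 1.0 - (1.0 - single_read_violation) ** reads


# ~2% single-read violation rate (slide 7) with a hypothetical 5 reads per cluster
print(round(multi_read_violation_rate(0.02, 5), 4))  # 0.0961
```

At five reads this gives roughly 9.6%, in line with the 98% → 90% translation quoted on slide 55.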
  • 14. Server-Observed Read Latency ◦ 99P: ~25 ms ◦ 95P: ~8 ms
  • 15. Client-Observed Read Latency ◦ 99P: > 100 ms ◦ 95P: > 25 ms
  • 17. (chart: network-level time (ms), server-side time (ms), and RegionServer heap size (MB))
  • 18. Observations ◦ Client / Server times match ◦ Except during the longer mixed GCs
  • 19. Coordinated omission (figure © borrowed from Gil Tene, "How NOT to Measure Latency", QCon SF 2015)
  • 20. Coordinated omission (figure © borrowed from Gil Tene, "How NOT to Measure Latency", QCon SF 2015)
  • 21. Coordinated omission ◦ A form of data censorship ▫ Not random, but coordinated omission ◦ Censorship during a GC* ▫ Queueing time in the app ▫ Time sitting in the network buffer ▫ Time in the responder (*with normal latencies)
  • 22. Coordinated omission: server-side latencies ◦ GCs don't show up in in-process latencies ◦ The more stages of the request pipeline a measurement covers, the more likely it is to capture a GC ▫ Add time in the network buffer ▫ Add time in the responder ▫ Add time in transit, etc.
  • 23. Coordinated omission: client-side latencies ◦ Very important to slice requests per server to see the patterns ◦ Even these will miss Young GCs ▫ Problems with moving averages (Yammer metrics)
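To see why in-process timers miss GC pauses, here is a toy simulation under strong simplifying assumptions: one request handler, a client that sends a request every 10 ms on a fixed schedule, 1 ms of service time, and a single 200 ms stop-the-world pause. All parameters and function names are hypothetical. The server-side timer starts when the handler begins work, so it never sees the pause; the client-side timer starts when the request was sent, so it includes the queueing the pause causes.

```python
def simulate(requests=100, interval_ms=10, service_ms=1,
             pause_start_ms=500, pause_ms=200):
    """Toy single-threaded server with one stop-the-world pause.

    Returns (server_side, client_side) latency lists in ms.
    """
    server_side, client_side = [], []
    free_at = 0  # time at which the server finishes its previous request
    for i in range(requests):
        sent = i * interval_ms                 # client sends on a fixed schedule
        start = max(sent, free_at)             # wait behind earlier requests (queueing)
        if pause_start_ms <= start < pause_start_ms + pause_ms:
            start = pause_start_ms + pause_ms  # no work happens during the pause
        end = start + service_ms
        free_at = end
        server_side.append(end - start)        # in-process timer: starts when work begins
        client_side.append(end - sent)         # client timer: includes queueing and pause
    return server_side, client_side


def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    s = sorted(values)
    rank = (p * len(s) + 99) // 100  # ceil(p/100 * n) in integer arithmetic
    return s[rank - 1]


server, client = simulate()
print(percentile(server, 99), percentile(client, 99))  # 1 192
```

Here the server-side p99 stays at 1 ms while the client-side p99 is 192 ms, even though every request did the same 1 ms of work.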
  • 24. Causes for Peaks: What's causing the mixed GCs? (chart: heap size in MB, cache size in GB) ◦ Heap ~25 GB, cache ~4-5 GB ◦ The cache turns over around 5 times in a GC cycle: 5 x 5 GB = 25 GB
  • 25. Causes for Peaks: Mixed GC ◦ An on-heap LRU cache is bad for GC ▫ All evicted blocks end up in the Old Gen ▫ Old Gen cleanup => mixed GC ◦ No GC optimizations can fix this
  • 26. Why peaks are bad ◦ Not so bad in normal use cases ▫ If peaks occur rarely ◦ Bad enough for clustered reads ◦ Reduces the time left for other parts of our Moment Scoring Pipeline
  • 30. The fix is to move the LRU cache off heap
  • 31. Off-Heap Block Cache ◦ An array of byte buffers (4 MB each) ◦ Offset-based free-space management ◦ Buffers are reused by overwriting ◦ HBaseCon talk at 3:10-3:50 pm
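The three bullets on this slide (fixed-size buffers, offset-based free space, reuse by overwriting) can be illustrated with a toy allocator. This is a hypothetical sketch, not HBase's BucketCache: Python bytearrays live on the Python heap, so they only stand in for the direct ByteBuffers a JVM implementation would allocate with ByteBuffer.allocateDirect. Blocks are appended at the current offset; when a buffer fills, the next buffer in the ring is recycled and every block it held is dropped from the index.

```python
class ToyOffHeapCache:
    """Hypothetical sketch: ring of fixed-size buffers with offset-based allocation."""

    def __init__(self, num_buffers: int = 4, buffer_size: int = 4 * 1024 * 1024):
        self.buffers = [bytearray(buffer_size) for _ in range(num_buffers)]
        self.buffer_size = buffer_size
        self.current = 0   # buffer currently being filled
        self.offset = 0    # next free byte in the current buffer
        self.index = {}    # key -> (buffer id, offset, length)
        self.keys_in = [set() for _ in range(num_buffers)]  # keys stored per buffer

    def put(self, key, block: bytes) -> None:
        n = len(block)
        if n > self.buffer_size:
            raise ValueError("block does not fit in one buffer")
        if self.offset + n > self.buffer_size:
            # Current buffer is full: advance to the next one and reuse it by overwriting.
            self.current = (self.current + 1) % len(self.buffers)
            self.offset = 0
            for old_key in self.keys_in[self.current]:
                self.index.pop(old_key, None)
            self.keys_in[self.current].clear()
        if key in self.index:
            # Key is being re-cached: forget its old location.
            self.keys_in[self.index[key][0]].discard(key)
        buf = self.buffers[self.current]
        buf[self.offset:self.offset + n] = block
        self.index[key] = (self.current, self.offset, n)
        self.keys_in[self.current].add(key)
        self.offset += n

    def get(self, key):
        loc = self.index.get(key)
        if loc is None:
            return None
        buf_id, off, n = loc
        return bytes(self.buffers[buf_id][off:off + n])


cache = ToyOffHeapCache(num_buffers=2, buffer_size=8)
for key, block in [("a", b"1234"), ("b", b"5678"), ("c", b"abcd"), ("d", b"efgh"), ("e", b"ijkl")]:
    cache.put(key, block)
print(cache.get("a"), cache.get("e"))  # None b'ijkl'
```

Evicting a whole buffer at a time keeps free-space tracking down to a single offset, at the cost of FIFO-like rather than true LRU eviction.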
  • 32. Off-Heap Advantages ◦ Can scale to more memory ▫ Less memory reserved as headroom against promotion failures ◦ Could potentially live on SSDs/NVMe ▫ Allows us to use denser boxes
  • 34. Work for moving off heap ◦ HBase 1.0 copies data onto the heap ◦ Leads to too much garbage ▫ A GC once every 2-3 secs
  • 35. Work for moving off heap ◦ HBase 2.0 fixes this ▫ HBASE-11425 ◦ Pulled patches from upstream into 1.1 ◦ Encountered a few issues ▫ HBASE-15064, HBASE-15525 . . .
  • 36. What about Young GC? ◦ Any GC pause longer than your SLA is bad ◦ Hard to see with Yammer metrics ▫ Sliding-window smoothing ▫ Eliminates peaks from percentiles ◦ Use histograms without averaging ▫ HdrHistogram / HBase 2.0 Fast Histogram ...
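A small illustration of how smoothing hides peaks, using made-up numbers: nine quiet one-second windows, then one window in which a GC pause stretched 5 of 100 reads to 200 ms. The per-window p99 computed from every recorded value keeps the spike, while a 5-window trailing moving average of those p99s flattens it to about 41 ms. The moving average is a deliberately simple stand-in for the sliding-window smoothing the slide attributes to Yammer metrics, not a model of that library's reservoirs; all names are hypothetical.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    s = sorted(values)
    rank = (p * len(s) + 99) // 100  # ceil(p/100 * n) in integer arithmetic
    return s[rank - 1]


def trailing_moving_average(xs, k):
    """Mean of the last k values at each position (fewer at the start)."""
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - k + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical latencies (ms): nine quiet windows, then one window hit by a GC pause
windows = [[1] * 100 for _ in range(9)] + [[1] * 95 + [200] * 5]
raw_p99 = [percentile(w, 99) for w in windows]   # p99 per window, no averaging
smoothed = trailing_moving_average(raw_p99, 5)   # smoothed view of the same p99s
print(max(raw_p99), max(smoothed))  # 200 40.8
```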
  • 37. What about Young GC? (chart: HdrHistogram vs. Yammer histogram)
  • 38. Fixing measurements? More (precise) metrics ◦ Percentiles are confusing ▫ Sometimes very hard to reason about ◦ Used SLA-based violation counts ▫ e.g., time-bucketed counts
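Time-bucketed SLA violation counts are simple to compute and avoid percentile arithmetic entirely. A minimal sketch, assuming latency samples arrive as (timestamp_ms, latency_ms) pairs and using the 40 ms read SLA from slide 7; the function name, the one-second bucket width, and the sample data are illustrative choices.

```python
from collections import Counter

SLA_MS = 40  # Blackbird read SLA (slide 7)


def violation_counts(samples, bucket_ms=1000, sla_ms=SLA_MS):
    """Count requests slower than the SLA per time bucket.

    samples: iterable of (timestamp_ms, latency_ms) pairs.
    Returns {bucket_start_ms: violations}, omitting buckets with none.
    """
    counts = Counter()
    for ts_ms, latency_ms in samples:
        if latency_ms > sla_ms:
            counts[ts_ms - ts_ms % bucket_ms] += 1
    return dict(counts)


samples = [(100, 5), (900, 55), (1500, 41), (1600, 40), (2500, 120), (2600, 300)]
print(violation_counts(samples))  # {0: 1, 1000: 1, 2000: 2}
```

Unlike percentiles, these counts can be summed across servers and across buckets without distortion.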
  • 40. How to fix Young GC? ◦ Tried to get GC pause times << SLA ◦ Not possible with current heap sizes ◦ Need RegionServers with smaller heaps ▫ Less promotion work ▫ Less Young Gen cleanup work
  • 41. How to fix Young GC? ◦ Have to run a lot of RegionServers ▫ Commodity servers now have many cores and large RAM ◦ Slider makes this easier ◦ Load balancing becomes an issue
  • 42. How to fix Young GC? ◦ Smaller GC pauses => more frequent GCs ◦ Also need to reduce garbage generation
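The trade-off on this slide can be written as a first-order estimate, assuming young collections are triggered whenever Eden fills and the allocation rate is roughly steady. Writing E for Eden size, r for the allocation rate, and T for the interval between young GCs (notation introduced here, not from the talk), halving Eden with an unchanged allocation rate halves the interval, which is why garbage generation has to come down as well.

```latex
T_{\text{young}} \approx \frac{E}{r},
\qquad
E' = \tfrac{E}{2},\; r' = r
\;\Longrightarrow\;
T'_{\text{young}} \approx \frac{E/2}{r} = \tfrac{1}{2}\,T_{\text{young}}
```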
  • 43. How to fix Young GC? Reducing garbage generation ◦ Memstore / ConcurrentSkipList opportunities ◦ To use or not to use data block encoding ◦ Compression / decompression optimizations ◦ Misc.
  • 44. How to fix Young GC? Results (chart)
  • 45. How to fix Young GC? Results (chart)
  • 48. Processing Times ◦ Huge bump between the 95th and 99th percentiles ◦ Cache hit ratio: 95% ◦ We go to disk for these
  • 49. How can we close this gap? 1. Increase the cache hit ratio 2. Make disk reads faster
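Why a 95% hit ratio produces a bump between p95 and p99 can be checked with a little arithmetic. A sketch, assuming every cache hit is faster than every miss: the overall p-quantile then falls in the miss population whenever p exceeds the hit ratio h, at quantile (p - h) / (1 - h) of the miss (disk) latency distribution. The function name is hypothetical.

```python
from typing import Optional


def miss_quantile(p: float, hit_ratio: float) -> Optional[float]:
    """Quantile of the cache-miss latency distribution that the overall
    p-quantile lands on, or None if that request is served from cache."""
    if p <= hit_ratio:
        return None
    return (p - hit_ratio) / (1.0 - hit_ratio)


print(round(miss_quantile(0.99, 0.95), 2))  # 0.8: overall p99 is roughly p80 of disk reads
print(miss_quantile(0.99, 0.99))            # None: p99 would be served from cache
```

This is why both fixes on slide 49 help: a higher hit ratio pushes p99 into the cached population, and faster disks shrink the miss distribution itself.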
  • 51. Increasing Cache Hit Ratio Using NVMe cards ◦ Similar to SSDs, with higher throughput due to PCIe ◦ Support already in Bucket Cache ◦ Cost-effective w.r.t. RAM ▫ RAM ~$10 per GB, NVMe ~$1 per GB
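Back-of-the-envelope arithmetic with the slide's prices: caching the full 14 TB unreplicated data set (slide 7) would cost about ten times as much in RAM as on NVMe. This sketch assumes 1 TB = 1000 GB and ignores replication, cache metadata overhead, and how capacity is spread across the 60-70 nodes; the function name is illustrative.

```python
PRICE_USD_PER_GB = {"ram": 10.0, "nvme": 1.0}  # approximate prices from the slide


def cache_cost_usd(data_tb: float, medium: str) -> float:
    """Cost of enough cache capacity to hold `data_tb` terabytes (1 TB = 1000 GB)."""
    return data_tb * 1000 * PRICE_USD_PER_GB[medium]


for medium in ("ram", "nvme"):
    print(medium, cache_cost_usd(14, medium))  # ram 140000.0, then nvme 14000.0
```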
  • 52. Disk Throughputs: Exploring a move to SSDs ◦ SSD costs ~ SAS disk costs ▫ Depends on SSD grade ▫ Including SAS backplane costs ◦ HDFS can store 1 replica on SSD and the other 2 on HDDs
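HDFS exposes this placement as the ONE_SSD storage policy (one replica on SSD, the rest on DISK). A sketch that only builds the commands, assuming the standard HDFS CLI and DataNode volumes tagged with [SSD] and [DISK] storage types in dfs.datanode.data.dir; the table path is hypothetical, and the talk does not say how the policy was applied. Setting a policy only affects where new blocks are placed; `hdfs mover` migrates existing blocks to match.

```python
import shlex


def one_ssd_commands(path: str) -> list:
    """Build (but do not run) the HDFS CLI commands for the ONE_SSD policy."""
    return [
        # Tag the directory: new blocks get one replica on SSD, the rest on DISK.
        ["hdfs", "storagepolicies", "-setStoragePolicy", "-path", path, "-policy", "ONE_SSD"],
        # Move already-written blocks so they satisfy the new policy.
        ["hdfs", "mover", "-p", path],
    ]


for cmd in one_ssd_commands("/hbase/data/default/blackbird"):  # hypothetical table directory
    print(shlex.join(cmd))
```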
  • 55. Newer SLA Requirements ◦ The older single-read SLA is not enough ▫ 98% translates to 90% in the newer model ◦ Top two areas of improvement ▫ Garbage collection in the JVM ▫ Reads going to disks
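Running the translated-SLA arithmetic backwards shows what the newer model demands. A sketch under the same assumption of independent reads: the 98% → 90% drop corresponds to roughly five reads per request, and keeping 98% end to end at five reads would need about 99.6% compliance per read. Function names are hypothetical.

```python
import math


def implied_reads(single_read_ok: float, overall_ok: float) -> float:
    """Number of independent reads n such that single_read_ok ** n == overall_ok."""
    return math.log(overall_ok) / math.log(single_read_ok)


def required_single_read_ok(overall_ok: float, reads: int) -> float:
    """Per-read SLA compliance needed for `reads` independent reads to meet overall_ok."""
    return overall_ok ** (1.0 / reads)


print(round(implied_reads(0.98, 0.90), 1))         # 5.2
print(round(required_single_read_ok(0.98, 5), 4))  # 0.996
```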
  • 56. How to fix Young GC? RegionServers with 14 GB heaps: pauses < 10 ms (chart: SLA violation counts)
  • 58. How to fix Young GC? Results (Sunday)
  • 59. How to fix Young GC? Results (Sunday)