HBase Low Latency 
Nick Dimiduk, Hortonworks (@xefyr) 
Nicolas Liochon, Scaled Risk (@nkeywal) 
Strata New York, October 17, 2014
Agenda 
• Latency, what is it, how to measure it 
• Write path 
• Read path 
• Next steps
What’s low latency 
• Meaning ranges from microseconds (High Frequency 
Trading) to seconds (interactive queries) 
• In this talk: milliseconds 
Latency is about percentiles 
• Average != 50th percentile 
• There are often orders of magnitude between the « average » and the « 95th 
percentile » 
• Post 99% = « magical 1% ». Work in progress here.
Measure latency 
YCSB - Yahoo! Cloud Serving Benchmark 
• Useful for comparisons between databases 
• Set of workloads already defined 
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation 
• More options related to HBase: autoflush, replicas, … 
• Latency measured in microseconds 
• Easier for internal analysis
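For quick internal measurements you can also sketch a percentile report directly against the client API. A minimal, hypothetical example (table and row names are invented; this is not a substitute for YCSB or PerformanceEvaluation):

  import java.util.Arrays;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.util.Bytes;

  public class LatencySample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "usertable");      // hypothetical table
      int n = 10000;
      long[] micros = new long[n];
      for (int i = 0; i < n; i++) {
        Get get = new Get(Bytes.toBytes("row-" + (i % 1000)));
        long start = System.nanoTime();
        table.get(get);                                  // one RPC per Get
        micros[i] = (System.nanoTime() - start) / 1000;
      }
      Arrays.sort(micros);
      // Report the percentiles, not just the mean
      System.out.println("p50 (us): " + micros[n / 2]);
      System.out.println("p95 (us): " + micros[(int) (n * 0.95)]);
      System.out.println("p99 (us): " + micros[(int) (n * 0.99)]);
      table.close();
    }
  }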
Why is it important 
Durability 
Availability 
Consistency
[Figure: where a write stops being durable. 0 = client buffer, 1 = server 
buffer, 2 = OS buffer, 3 = disk. HBase, like BigTable on GFS, syncs to the 
OS buffer of the distributed filesystem; a traditional DB engine syncs to disk.]
Durability
Consistency 
Two processes: P1, P2 
A counter is updated by P1: 
v1, then v2, then v3 
Eventual consistency allows P1 
and P2 to see these events in any 
order. 
Strong consistency allows only one 
order.
Google F1 paper, VLDB (2013) 
We store financial data and have hard requirements 
on data integrity and consistency. We also have a lot 
of experience with eventual consistency systems at 
Google. In all such systems, we find developers spend 
a significant fraction of their time building extremely 
complex and error-prone mechanisms to cope with 
eventual consistency and handle data that may be 
out of date. We think this is an unacceptable burden 
to place on developers and that consistency problems 
should be solved at the database level.
Consistency 
The BigTable design allows consistency by partitioning the data: each 
machine serves a subset of the data.
Availability 
• The contract is: « a client outside the cluster sees HBase as available even if 
there are partitions or failures within the HBase cluster » 
• There is a lot more to say, but it’s outside the scope of this talk 
(unfortunately)
Availability 
A partition or a machine failure 
appears to the client as a latency spike
Trade off 
• Maximizing the benefits while minimizing 
the cost 
• Implementation details count 
• Configuration counts
Write path 
• Two parts 
• Single put (WAL) 
• The client just sends the put 
• Multiple puts from the client (new behavior since 0.96) 
• The client is much smarter 
• Four stages to look at for latency 
• Start (establish tcp connections, etc.) 
• Steady: when expected conditions are met 
• Machine failure: expected as well 
• Overloaded system
Single put: communication & scheduling 
• Client: TCP connection to the server 
• Shared: multiple threads on the same client use the same TCP connection 
• Pooling is possible and does improve performance in some circumstances 
• hbase.client.ipc.pool.size 
• Server: multiple calls from multiple threads on multiple machines 
• Can become thousands of simultaneous queries 
• Scheduling is required
Single put: real work 
• The server must 
• Write into the WAL queue 
• Sync the WAL queue (HDFS flush) 
• Write into the memstore 
• The WAL queue is shared between all the regions/handlers 
• Sync is avoided if another handler already did the work 
• You may flush more than expected
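In code, a single put is one blocking call; a minimal sketch with the 0.96-era API (table, family and values are invented). The call returns only after the WAL append has been synced and the edit is in the memstore:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Durability;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SinglePut {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "t1");                     // hypothetical table
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      put.setDurability(Durability.SYNC_WAL);                    // append to the WAL and sync it (the usual default)
      table.put(put);                                            // blocks: WAL queue, HDFS flush, memstore write
      table.close();
    }
  }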
Single put: a small run 
Percentile Time in ms 
Mean 1.21 
50% 0.95 
95% 1.50 
99% 2.12
Latency sources 
• Candidate one: network 
• 0.5ms within a datacenter 
• Much less between nodes in the same rack 
Percentile Time in ms 
Mean 0.13 
50% 0.12 
95% 0.15 
99% 0.47
Latency sources 
• Candidate two: HDFS Flush 
Percentile Time in ms 
Mean 0.33 
50% 0.26 
95% 0.59 
99% 1.24 
• We can still do better: HADOOP-7714 & sons.
Latency sources 
• Millisecond world: everything can go wrong 
• JVM 
• Network 
• OS Scheduler 
• File System 
• All this goes into the post 99% percentile 
• Requires monitoring 
• Usually, using the latest version helps.
Latency sources 
• Split (and presplits) 
• Autosharding is great! 
• Puts have to wait 
• Impacts: seconds 
• Balance 
• Regions move 
• Triggers a retry for the client 
• hbase.client.pause = 100ms since HBase 0.96 
• Garbage Collection 
• Impacts: 10’s of ms, even with a good config 
• Covered in the read path part of this talk
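hbase.client.pause and the retry count are client-side settings, so a latency-sensitive application can override them in its own configuration. A hedged sketch, with illustrative values:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;

  public class LowLatencyClientConf {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Pause between client retries when a region moves or splits (default 100 ms since 0.96)
      conf.setLong("hbase.client.pause", 50);
      // Fail fast rather than hiding a long outage behind one very slow call (value illustrative)
      conf.setInt("hbase.client.retries.number", 10);
      HTable table = new HTable(conf, "t1");   // hypothetical table; conf applies to this client only
      table.close();
    }
  }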
From steady to loaded and overloaded 
• Number of concurrent tasks is a function of 
• Number of cores 
• Number of disks 
• Number of remote machines used 
• Difficult to estimate 
• Queues are doomed to happen 
• hbase.regionserver.handler.count 
• So for low latency 
• Pluggable scheduler since HBase 0.98 (HBASE-8884). Requires specific code. 
• RPC Priorities: since 0.98 (HBASE-11048)
From loaded to overloaded 
• MemStore takes too much room: flush, then blocks quite quickly 
• hbase.regionserver.global.memstore.size.lower.limit 
• hbase.regionserver.global.memstore.size 
• hbase.hregion.memstore.block.multiplier 
• Too many HFiles: block until compactions keep up 
• hbase.hstore.blockingStoreFiles 
• Too many WAL files: flush and block 
• hbase.regionserver.maxlogs
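These are RegionServer-side settings, configured in hbase-site.xml rather than in client code. A hedged sketch with illustrative values (defaults vary by version):

  <!-- hbase-site.xml (RegionServer); values illustrative -->
  <property>
    <name>hbase.regionserver.global.memstore.size</name>
    <value>0.4</value> <!-- fraction of heap used by all memstores before writes block -->
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.size.lower.limit</name>
    <value>0.95</value> <!-- forced flushes start at this fraction of the limit above -->
  </property>
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>10</value> <!-- block updates until compactions catch up -->
  </property>
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>32</value> <!-- too many WAL files forces flushes -->
  </property>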
Machine failure 
• Failure 
• Detect 
• Reallocate 
• Replay WAL 
• Replaying the WAL is NOT required for puts 
• hbase.master.distributed.log.replay 
• (default true in 1.0) 
• Failure = Detect + Reallocate + Retry 
• That’s in the range of ~1s for simple failures 
• Silent failures put you in the 10s range if the hardware does not help 
• zookeeper.session.timeout
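Both knobs live in hbase-site.xml; a shorter ZooKeeper session detects silent failures sooner but tolerates less GC and network jitter. A hedged sketch with illustrative values:

  <!-- hbase-site.xml; values illustrative -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>30000</value> <!-- ms; how long a silently dead RegionServer can go undetected -->
  </property>
  <property>
    <name>hbase.master.distributed.log.replay</name>
    <value>true</value> <!-- default true in 1.0; regions accept writes while WALs replay -->
  </property>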
Single puts 
• Millisecond range 
• Spikes do happen in steady mode 
• 100ms 
• Causes: GC, load, splits
Streaming puts 
HTable#setAutoFlushTo(false) 
HTable#put 
HTable#flushCommits 
• Like single puts, but 
• Puts are grouped and sent in the background 
• Load is taken into account 
• Does not block
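In code, streaming puts are just the three calls above; a minimal sketch with the 0.96-era API (table and data invented):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class StreamingPuts {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "t1");          // hypothetical table
      table.setAutoFlushTo(false);                    // buffer puts client-side
      for (int i = 0; i < 100000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v-" + i));
        table.put(put);                               // returns immediately; grouped and sent in the background
      }
      table.flushCommits();                           // push whatever is still buffered
      table.close();
    }
  }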
Multiple puts 
hbase.client.max.total.tasks (default 100) 
hbase.client.max.perserver.tasks (default 5) 
hbase.client.max.perregion.tasks (default 1) 
• Decouples the client from a latency spike of a region server 
• Increases the throughput by 50% compared to the old multiput 
• Makes splits and GC more transparent
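These limits are client-side, so they can be raised or lowered per application; a hedged sketch (values shown are the defaults listed above):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class MultiputLimits {
    public static void main(String[] args) {
      Configuration conf = HBaseConfiguration.create();
      conf.setInt("hbase.client.max.total.tasks", 100);     // in-flight batches overall
      conf.setInt("hbase.client.max.perserver.tasks", 5);   // per RegionServer: one slow server cannot absorb the client
      conf.setInt("hbase.client.max.perregion.tasks", 1);   // per region: keep pressure low on a hot region
      // pass 'conf' to the HTable / connection used for the streaming puts
    }
  }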
Conclusion on write path 
• Single puts can be very fast 
• It’s not a « hard real time » system: there are spikes 
• Most latency spikes can be hidden when streaming puts 
• Failures are NOT that difficult for the write path 
• No WAL to replay
And now for the read path
Read path 
• Get/short scan are assumed for low-latency operations 
• Again, two APIs 
• Single get: HTable#get(Get) 
• Multi-get: HTable#get(List<Get>) 
• Four stages, same as write path 
• Start (tcp connection, …) 
• Steady: when expected conditions are met 
• Machine failure: expected as well 
• Overloaded system: you may need to add machines or tune your workload
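The two read APIs side by side; a minimal sketch with the 0.96-era API (table and rows invented):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class Reads {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "t1");                    // hypothetical table
      // Single get: one RPC to one RegionServer
      Result one = table.get(new Get(Bytes.toBytes("row1")));
      // Multi-get: the client groups the Gets by RegionServer and
      // blocks until every group has answered
      List<Get> gets = new ArrayList<Get>();
      for (int i = 0; i < 100; i++) {
        gets.add(new Get(Bytes.toBytes("row-" + i)));
      }
      Result[] many = table.get(gets);
      System.out.println(one.isEmpty() + " / " + many.length);
      table.close();
    }
  }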
Multi get / Client 
Group Gets by 
RegionServer 
Execute them 
one by one
Multi get / Server
Multi get / Server
Storage hierarchy: a different view (access latency magnified) 
Dean/2009 
Memory is 100000x 
faster than disk! 
Disk seek = 10ms
Known unknowns 
• For each candidate HFile 
• Exclude by file metadata 
• Timestamp 
• Rowkey range 
• Exclude by bloom filter 
StoreFileScanner# 
shouldUseScanner()
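Bloom filters are a per-column-family schema choice; a hedged sketch of enabling a ROW bloom filter at table creation (0.96-era admin API, names invented; ROW is already the default in recent versions):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.regionserver.BloomType;

  public class CreateWithBloom {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t1")); // hypothetical table
      HColumnDescriptor fam = new HColumnDescriptor("f");
      fam.setBloomFilterType(BloomType.ROW);   // lets gets skip HFiles that cannot contain the row
      desc.addFamily(fam);
      admin.createTable(desc);
      admin.close();
    }
  }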
Unknown knowns 
• Merge sort results polled from Stores 
• Seek each scanner to a reference KeyValue 
• Retrieve candidate data from disk 
• Multiple HFiles => multiple seeks 
• hbase.storescanner.parallel.seek.enable=true 
• Short Circuit Reads 
• dfs.client.read.shortcircuit=true 
• Block locality 
• Happy clusters compact! 
HFileBlock# 
readBlockData()
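Short-circuit reads and parallel seek are enabled on the RegionServer side; a hedged hbase-site.xml sketch (the socket path is illustrative and must match the DataNode configuration):

  <!-- hbase-site.xml (RegionServer); values illustrative -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value> <!-- read local blocks directly, bypassing the DataNode process -->
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value> <!-- must match the DataNode setting -->
  </property>
  <property>
    <name>hbase.storescanner.parallel.seek.enable</name>
    <value>true</value> <!-- seek the HFiles of a store in parallel -->
  </property>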
BlockCache 
• Reuse previously read data 
• Maximize cache hit rate 
• Larger cache 
• Temporal access locality 
• Physical access locality 
BlockCache#getBlock()
BlockCache Showdown 
• LruBlockCache 
• Default, onheap 
• Quite good most of the time 
• Evictions impact GC 
• BucketCache 
• Offheap alternative 
• Serialization overhead 
• Large memory configurations 
http://www.n10k.com/blog/blockcache-showdown/ 
L2 off-heap BucketCache 
makes a strong showing
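A hedged sketch of an off-heap L2 BucketCache (0.98-era settings; sizes illustrative, and the JVM needs -XX:MaxDirectMemorySize at least as large as the cache):

  <!-- hbase-site.xml (RegionServer); values illustrative -->
  <property>
    <name>hbase.bucketcache.ioengine</name>
    <value>offheap</value> <!-- keep block evictions out of the Java heap -->
  </property>
  <property>
    <name>hbase.bucketcache.size</name>
    <value>8192</value> <!-- MB of off-heap cache in 0.98-era releases -->
  </property>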
Latency enemies: Garbage Collection 
• Use heap. Not too much. With CMS. 
• Max heap 
• 30GB (compressed pointers) 
• 8-16GB if you care about 9’s 
• Healthy cluster load 
• regular, reliable collections 
• 25-100ms pause on regular interval 
• An overloaded RegionServer suffers excessive GC
Off-heap to the rescue? 
• BucketCache (0.96, HBASE-7404) 
• Network interfaces (HBASE-9535) 
• MemStore et al (HBASE-10191)
Latency enemies: Compactions 
• Fewer HFiles => fewer seeks 
• Evict data blocks! 
• Evict Index blocks!! 
• hfile.block.index.cacheonwrite 
• Evict bloom blocks!!! 
• hfile.block.bloom.cacheonwrite 
• OS buffer cache to the rescue 
• Compacted data is still fresh 
• Better than going all the way back to disk
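The two cache-on-write settings above go in hbase-site.xml, so that index and bloom blocks written by flushes and compactions land directly in the BlockCache instead of being re-read from disk; a hedged sketch:

  <!-- hbase-site.xml (RegionServer) -->
  <property>
    <name>hfile.block.index.cacheonwrite</name>
    <value>true</value> <!-- newly written index blocks go straight into the cache -->
  </property>
  <property>
    <name>hfile.block.bloom.cacheonwrite</name>
    <value>true</value> <!-- same for bloom blocks -->
  </property>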
Failure 
• Detect + Reassign + Replay 
• Strong consistency requires replay 
• Locality drops to 0 
• Cache starts from scratch
Hedging our bets 
• HDFS Hedged reads (2.4, HDFS-5776) 
• Reads on secondary DataNodes 
• Strongly consistent 
• Works at the HDFS level 
• Timeline consistency (HBASE-10070) 
• Reads on « Replica Region » 
• Not strongly consistent
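A hedged sketch of a timeline-consistent get (requires region replicas and a release with HBASE-10070, i.e. 1.0+; names invented):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Consistency;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class TimelineGet {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "t1");              // hypothetical table with region replicas
      Get get = new Get(Bytes.toBytes("row1"));
      get.setConsistency(Consistency.TIMELINE);           // allow an answer from a replica region
      Result result = table.get(get);
      if (result.isStale()) {
        // served by a secondary replica: possibly behind the primary
      }
      table.close();
    }
  }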
Read latency in summary 
• Steady mode 
• Cache hit: < 1 ms 
• Cache miss: + 10 ms per seek 
• Writing while reading => cache churn 
• GC: 25-100ms pause on regular interval 
Network request + (1 - P(cache hit)) * (10 ms * seeks) 
• Same long tail issues as write 
• Overloaded: same scheduling issues as write 
• Partial failures hurt a lot
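To make the formula concrete with illustrative numbers: with a 0.5 ms network round trip, a 95% cache hit rate and two seeks on a miss, the expected read latency is roughly 0.5 + 0.05 * (10 ms * 2) = 1.5 ms; drop the hit rate to 50% and the same read averages about 10.5 ms, which is why the cache hit rate dominates everything else here.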
HBase ranges for 99% latency 

         | Put                  | Streamed Multiput | Get                  | Timeline get 
Steady   | milliseconds         | milliseconds      | milliseconds         | milliseconds 
Failure  | seconds              | seconds           | seconds              | milliseconds 
GC       | 10's of milliseconds | milliseconds      | 10's of milliseconds | milliseconds
What’s next 
• Less GC 
• Use fewer objects 
• Offheap 
✓Compressed BlockCache (HBASE-11331) 
• Preferred location (HBASE-4755) 
• The « magical 1% » 
• Most tools stop at the 99% latency 
• What happens after is much more complex
[Chart: Performance with Compressed BlockCache, enabled vs. disabled, expressed as 
times improvement per metric: throughput (ops/sec), latency (ms, p95), latency (ms, p99), 
cpu load. Setup: Total RAM: 24G, LruBlockCache Size: 12G, Data Size: 45G, 
Compressed Size: 11G, Compression: SNAPPY.]
Thanks! 
Nick Dimiduk, Hortonworks (@xefyr) 
Nicolas Liochon, Scaled Risk (@nkeywal) 
Strata New York, October 17, 2014

Editor's Notes

  1. This talk: assume get/short scan implies low-latency requirements No fancy streaming client like Puts; waiting for slowest RS.
  2. Gets grouped by RS, groups sent in parallel, block until all groups return. Network call: 10’s micros, in parallel
  3. Read path in full detail is quite complex See Lars's HBaseCon 2012 talk for the nitty-gritty Complexity optimizing around one invariant: 10ms seek
  4. Read path in full detail is quite complex See Lars's HBaseCon 2012 talk for the nitty-gritty Complexity optimizing around one invariant: 10ms seek
  5. Complexity optimizing around one invariant: 10ms seek Aiming for 100 microSec world; how to work around this? Goal: avoid disk at all cost!
  6. Goal: don’t go to disk unless absolutely necessary. Tactic: Candidate HFile elimination. Regular compactions => 3-5 candidates
  7. Mergesort over multiple files, multiple seeks More spindles = parallel scanning SCR avoids proxy process (DataNode) But remember! Goal: don’t go to disk unless absolutely necessary.
  8. “block” is a segment of an HFile Data blocks, index blocks, and bloom blocks Read blocks retained in BlockCache Seek to same and adjacent data become cache hits
  9. HBase ships with a variety of cache implementations Happy with 95% stats? Stick with LruBlockCache and modest heapsize Pushing 99%? Lru still okay, but watch that heapsize. Spent money on RAM? BucketCache
  10. GC is a part of healthy operation BlockCache garbage is of awkward size and age, which means pauses Pause time is a function of heap size More like ~16GiB if you’re really worried about 99% Overloaded: more cache evictions, time in GC
  11. Why generate garbage at all? GC are smart, but maybe we know our pain spots better? Don’t know until we try
  12. Necessary for fewer scan candidates, fewer seeks Buffer cache to the rescue “That’s steady state and overloaded; let’s talk about failure”
  13. Replaying WALs takes time Unlucky: no data locality, talking to a remote DataNode Empty cache “Failure isn’t binary. What about the sick and the dying?”
  14. Don’t wait for a slow machine!
  15. Reads dominated by disk seek, so keep that data in memory After cache miss, GC is the next candidate cause of latency “Ideal formula” P(cache hit): fn (cache size :: db size, request locality) Sometimes the jitter dominates
  16. Standard deployment, well designed schema Millisecond responses, seconds for failure recovery, and GC at a regular interval Everything we’ve focused on here is impactful of the 99% Beyond that there’s a lot of interesting problems to solve
  17. There’s always more work to be done Generate less garbage Compressed BlockCache Improve recovery time and locality
  18. Questions!