Nitin Verma, Pravin Mittal, and Maxim Lukiyanov (Microsoft)
This session presents our success story of enabling a large internal customer on Microsoft Azure's HBase service, along with the methodology and tools used to meet their high-throughput goals. We will also present how new features in HBase (such as BucketCache and MultiWAL) help our customers in the medium-latency/high-bandwidth cloud-storage scenario.
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
1. Optimizing HBase for Cloud Storage in Microsoft Azure HDInsight
Nitin Verma, Pravin Mittal, Maxim Lukiyanov
May 24th 2016, HBaseCon 2016
2. About Us
Nitin Verma
Senior Software Development Engineer – Microsoft, Big Data Platform
Contact: nitinver@microsoft
Pravin Mittal
Principal Software Engineering Manager – Microsoft, Big Data
Contact: pravinm@microsoft
Maxim Lukiyanov
Senior Program Manager – Microsoft, Big Data Platform
Contact: maxluk@microsoft
3. Outline
Overview of HBase Service in HDInsight
Customer Case Study
Performance Debugging
Key Takeaways
4. What is HDInsight HBase Service
On-demand cluster with a few clicks
Out-of-the-box performance
Supports both Linux & Windows
Enterprise SLA of 99.9% availability
Active health monitoring via telemetry
24/7 customer support
Unique Features
Storage is decoupled from compute
Flexibility to scale out and scale in
Write/read unlimited amounts of data irrespective of cluster size
Data is preserved and accessible even when the cluster is down or deleted
7. Azure Data Lake Storage: Built For Cloud
Maxim Lukiyanov, Ashit Gosalia
Secure: Must be highly secure to prevent unauthorized access (especially as all data is in one place).
Native format: Must permit data to be stored in its native format to track lineage and for data provenance.
Low latency: Must have low latency for high-frequency operations.
Multiple analytic frameworks: Must support multiple analytic frameworks (batch, real-time, streaming, machine learning, etc.); no one analytic framework can work for all data and all types of analysis.
Details: Must be able to store data with all details; aggregation may lead to loss of details.
Throughput: Must have high throughput for massively parallel processing via frameworks such as Hadoop and Spark.
Reliable: Must be highly available and reliable (no permanent loss of data).
Scalable: Must be highly scalable; when storing all data indefinitely, data volumes can quickly add up.
All sources: Must be able to ingest data from a variety of sources: LOB/ERP, logs, devices, social networks, etc.
9. Microsoft’s Real Time Analytics Platform
Modern self-service telemetry platform
Near real-time analytics
Product health and user engagement monitoring with custom dashboards
Performs large-scale indexing on HDInsight HBase
10. 4.01 million EVENTS PER SECOND AT PEAK
12.8 petabytes INGESTION PER MONTH
>500 million WEEKLY UNIQUE DEVICES AND MACHINES
450 + 2600 PRODUCTION + INT/SANDBOX SELF-SERVE TENANTS
1,600 STORAGE ACCOUNTS
500,000 AZURE STORAGE TRANSACTIONS / SEC
[Charts: Azure Storage traffic (TB ingress/hr) and Azure Storage transactions (millions/hr) across Table, Blob, and Queue, Feb 21 through Feb 23]
11. Results of Early HBase Evaluation
The customer had a very high throughput requirement for a key-value store
Performance was ~10X lower than their requirement
Bigger concern: throughput didn't scale when going from 15 to 30 nodes
12. Developing a Strategy
Understand the architecture
Run the workload
Collect metrics and profile the relevant components
Identify performance bottlenecks
Isolate/divide the problem (unit test); reproduce at lower scale
Make performance fixes
Fixed? If not, iterate
Iterative process; automation can save time
13. Pipeline of data ingestion
Data Ingestion Client App [PaaS]: 300 VMs, multiple storage accounts and queues
HDI Gateway & Load Balancer (2 VMs)
HBASE CLUSTER: REST servers and region servers; 30 x large worker nodes, 1000+ cores
Cloud Storage: medium latency, high bandwidth
REST request: batch size = 1000, row size = 1400 bytes
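For context, the slide only gives the workload shape: batches of 1000 rows of roughly 1400 bytes each, sent through the REST gateway. Below is a minimal sketch of what such a batched REST write could look like with the HBase REST client; the gateway host, port, table name, and column family are hypothetical placeholders, and the Put API shown (addColumn) is the HBase 1.x form.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.rest.client.Client;
import org.apache.hadoop.hbase.rest.client.Cluster;
import org.apache.hadoop.hbase.rest.client.RemoteHTable;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedRestIngest {
    public static void main(String[] args) throws Exception {
        // Point the REST client at the gateway / load balancer that fronts the
        // HBase REST servers (host and port are placeholders).
        Cluster cluster = new Cluster().add("hdi-gateway.example.net", 8080);
        Client client = new Client(cluster);
        RemoteHTable table = new RemoteHTable(client, "events");  // hypothetical table name

        byte[] family = Bytes.toBytes("d");                       // hypothetical column family
        byte[] payload = new byte[1400];                          // ~1400-byte row, as on the slide

        // One batch of 1000 puts, mirroring the workload shape described above.
        List<Put> batch = new ArrayList<>(1000);
        for (int i = 0; i < 1000; i++) {
            Put put = new Put(Bytes.toBytes("rowkey-" + i));
            put.addColumn(family, Bytes.toBytes("v"), payload);
            batch.add(put);
        }

        // A single REST round trip; the REST server fans the rows out to the
        // region servers that own them.
        table.put(batch);
        table.close();
    }
}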
14. Initial Iterations
1. High CPU utilization with REST being top consumer
2. GZ compression was turned ON in REST
3. Heavy GC activity on REST and REGION processes
4. Starvation of REGION process by REST [busy wait for network IO]
Throughput improved by 10-30% after each iteration
15. Initial Iterations (contd.)
5. REST server threads waiting on network IO
Collected tcpdump on all the nodes of the cluster
Insight from tcpdump:
The REST server was fanning out each batch request to all the region servers (1 through 30)
The slowest region server governed the throughput
Used a SALT_BUCKET scheme to improve locality (see the sketch below)
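The deck doesn't show the salting code itself; the following is a minimal sketch of the general row-key salting idea under illustrative assumptions (bucket count, separator, and hash choice are ours): a small, deterministic prefix derived from the key spreads otherwise-sequential keys across regions pre-split on the salt prefix.

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedRowKeys {
    // Number of salt buckets; typically aligned with the number of region servers.
    // 30 here is an assumption for the 30-node cluster on the slide.
    static final int SALT_BUCKETS = 30;

    // Prefix the original key with a stable, hash-derived bucket id so that
    // otherwise-sequential keys spread across regions pre-split on the prefix.
    static byte[] salted(String originalKey) {
        int bucket = (originalKey.hashCode() & Integer.MAX_VALUE) % SALT_BUCKETS;
        // Zero-padded prefix keeps ordering within each bucket, e.g. "07|device-123|..."
        return Bytes.toBytes(String.format("%02d|%s", bucket, originalKey));
    }

    public static void main(String[] args) {
        System.out.println(Bytes.toString(salted("device-123|2016-05-24T10:00")));
    }
}

Grouping the puts of a batch by salt prefix then keeps a single REST request from fanning out to every region server, so the slowest server no longer gates the whole batch.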
16. Improvement
Throughput improved by 2.75X
Measurement window = ~72 hours
Avg. Cluster CPU utilization = ~60%
But no scaling from a 30-node to a 60-node cluster
Time to get back to the architecture
17. Pipeline of data ingestion
Data Ingestion Client App [PaaS]: 300 VMs, multiple storage accounts and queues
HDI Gateway & Load Balancer (2 VMs)
HBASE CLUSTER: REST servers and region servers; 60 x large worker nodes, 1000+ cores
WASB: medium latency, high bandwidth
REST request: batch size = 1000, row size = 1400 bytes
Could the gateway be a bottleneck at such a high ingestion rate?
18. We Had a Gateway Bottleneck
And the guess was right!!
Collected perfmon data on the GW nodes
Core #0 was 100% busy
RSS (Receive Side Scaling) is a trick to balance the DPCs across cores
Performance improved, but not significantly
Both CPU and networking were bottlenecks
Time to scale up the gateway VM size
19. Configuring private gateway
We provisioned a custom gateway on large VMs using NGINX
We confirmed that the gateway issue was indeed fixed
The throughput problem was still not solved and continued to give us new puzzles
20. Pipeline of data ingestion
Data Ingestion Client App [PaaS]: 300 VMs, multiple storage accounts and queues
Custom NGINX gateway and load balancer (2 VMs)
HBASE CLUSTER: REST servers and region servers; 60 x D14 worker nodes, 1040 cores
WASB
Could the customer app be a bottleneck?
21. Pipeline of data ingestion
Data Ingestion Client App [PaaS]: 300 VMs, multiple storage accounts and queues
Custom NGINX gateway and load balancer (2 VMs): DISCONNECTED from HBase, RETURN 200 directly
HBASE CLUSTER: REST servers and region servers; 60 x D14 worker nodes, 1040 cores
WASB
22. New strategy
We divided the data pipeline into two parts and debugged them in isolation:
1) Client -> Gateway [solved]
2) REST -> Region Server -> WASB [unsolved]
For fast turnaround, we decided to use YCSB for debugging #2
We configured YCSB with the characteristics of the customer's workload
We ran YCSB locally inside the HBase cluster
23. YCSB Experiments
We suspected one of the following two:
1) REST
2) Azure Storage
We isolated the problem by replacing Azure Storage with local SSDs
We then compared the performance of REST vs. RPC
Results:
REST was clearly a bottleneck!
24. YCSB Experiments (contd.)
Root cause of bottleneck in REST:
• Profiling the REST Servers uncovered multiple threads that were blocked on
INFO/DEBUG logging.
• Limiting the logging to WARNING/ERROR level dramatically improved the REST
server performance and brought it very close to RPC.
Sample Stack:
Thread 11540: (state = BLOCKED)
- org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) @bci=12, line=204 (Compiled frame)
- org.apache.log4j.Category.forcedLog(java.lang.String, org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=14, line=391 (Compiled frame)
- org.apache.log4j.Category.log(java.lang.String, org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=34, line=856 (Compiled frame)
- org.apache.commons.logging.impl.Log4JLogger.debug(java.lang.Object) @bci=12, line=155 (Compiled frame)
- org.apache.hadoop.hbase.rest.RowResource.update(org.apache.hadoop.hbase.rest.model.CellSetModel, boolean) @bci=580, line=225 (Compiled frame)
- org.apache.hadoop.hbase.rest.RowResource.put(org.apache.hadoop.hbase.rest.model.CellSetModel, javax.ws.rs.core.UriInfo) @bci=60, line=318 (Interpreted frame)
- sun.reflect.GeneratedMethodAccessor27.invoke(java.lang.Object, java.lang.Object[]) @bci=48 (Interpreted frame)
- sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
- java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=57, line=606 (Compiled frame)
- com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=3, line=60 (Interpreted frame)
- com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(java.lang.Object, com.sun.jersey.api.core.HttpContext) @bci=16, line=205 (Interpreted frame)
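In practice the fix is a log4j configuration change on the REST servers (raising the org.apache.hadoop.hbase.rest loggers to WARN); the snippet below illustrates the equivalent change programmatically, using the package name from the stack above, while the usual place for this setting is log4j.properties rather than code.

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class RestServerLogLevel {
    public static void main(String[] args) {
        // Equivalent of setting "log4j.logger.org.apache.hadoop.hbase.rest=WARN"
        // in log4j.properties: INFO/DEBUG events no longer reach the appenders,
        // which is where the threads in the stack above were blocking.
        Logger.getLogger("org.apache.hadoop.hbase.rest").setLevel(Level.WARN);
    }
}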
25. YCSB Experiments (contd.)
RPC vs. REST after fixing the INFO message logging
We could saturate the SSD performance at 160K requests/sec throughput
This confirmed that the bottleneck in the REST server was solved
26. Back to Customer Workload
After limiting the logging level to WARN, throughput improved further by ~5.5X
This was a ~15X gain from the point where we started
The customer is happy and uses the HDInsight HBase service in production
They are able to meet their throughput goals with enough margin to scale further
27. Tools Utilized
Category | Tools on Windows | Tools on Linux for Java Process
System counters (CPU, memory, IO, process, etc.) | perfmon | mpstat, iostat, vmstat, sar, nload, glances
Networking | tcpdump | tcpdump
CPU profiling | kernrate, F1 sample, xperf | YourKit, jvmtop, jprof
CPU blocking issues | xperf, Concurrency Visualizer, PPA | jstack
Debugging large clusters | powershell, python, expect | bash, awk, python, screen, expect
29. Overcoming Storage Latency
HBase now has MultiWAL and BucketCache features
They are designed to minimize the impact of high storage latency
Parallelism and batching are the keys to hiding write latency (MultiWAL)
MultiWAL gives higher throughput with a lower number of region nodes
We achieve 500K inserts/sec with just 8 small region nodes for an IoT customer
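MultiWAL is a region-server setting rather than application code; the sketch below only illustrates the relevant hbase-site.xml keys (the number of WAL groups shown is an illustrative assumption), expressed through Configuration purely to keep the snippet self-contained.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MultiWalSettings {
    public static void main(String[] args) {
        // These keys normally live in the region servers' hbase-site.xml;
        // Configuration.set() is used here only to show the key names.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.wal.provider", "multiwal");           // multiple WAL pipelines per region server
        conf.set("hbase.wal.regiongrouping.numgroups", "4");  // illustrative group count
        System.out.println("WAL provider: " + conf.get("hbase.wal.provider"));
    }
}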
30. Overcoming Storage Latency (contd.)
What about read latency?
Caching and read-ahead are the keys to overcoming read latency
Cache-on-write helps applications that are temporal in nature
HDInsight VMs are backed with SSDs
The BucketCache feature can utilize the SSD as an L2 cache
BucketCache gives our customers a ~20X-30X gain in read performance
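Like MultiWAL, BucketCache is configured on the region servers; the keys below are the standard ones for a file-backed (SSD) L2 cache, with an illustrative path and size, again shown through Configuration only to keep the snippet self-contained.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BucketCacheSettings {
    public static void main(String[] args) {
        // Region-server settings (normally in hbase-site.xml); the SSD path and
        // cache size are illustrative placeholders.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.bucketcache.ioengine", "file:/mnt/ssd/bucketcache.data"); // file-backed L2 cache on local SSD
        conf.set("hbase.bucketcache.size", "8192");                               // cache size in MB (illustrative)
        conf.set("hbase.rs.cacheblocksonwrite", "true");                          // cache-on-write for temporal workloads
        System.out.println("BucketCache engine: " + conf.get("hbase.bucketcache.ioengine"));
    }
}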
31. Conclusion
The performance issue was quite complex; bottlenecks were hiding at several layers and components in the pipeline
Deeper engagement with customers helped in optimizing the HDInsight HBase service
The HDI team has been actively productizing the performance fixes
ADLS, MultiWAL, and BucketCache help in minimizing the latency impact
Understand the architecture and overall pipeline of data movement
Monitor the resource utilization of each layer in the pipeline
Profile the components with high resource utilization and identify hotspots
When resource utilization is low, identify blocking issues (if any)
Divide and Conquer – Develop a strategy to isolate the components that could be the culprit. Isolation makes debugging easier.
Iterative Process!!
Reproduced customer scenario with 30 worker nodes
Collected system metrics (CPU, Memory, IO, etc.) on all the worker nodes
Started our analysis with HBase
CPU consumption was very high on nearly all REST servers
We then profiled the REST servers and observed the following
Compression was ON by default (GZ filter) and was consuming ~70% of CPU
Heavy GC activity on REST and REGION servers; we had to tune certain GC-related parameters
REST servers were busy-waiting for network IOs; bumping the REGION server priority solved that issue
Tools like YourKit and JVMTop helped in uncovering efficiency issues
We noticed multiple threads in the REST server waiting on network IO
We performed a deep networking analysis using tcpdump and uncovered a locality issue with the row key
The REST server was fanning out each batch request to almost all the region servers
Overall throughput seemed to be governed by the slowest region server
We used a SALT_BUCKET scheme to improve the locality of batch requests
At this high ingestion rate, we suspected the HDI gateway of being a bottleneck and confirmed it by collecting perfmon data on both gateways
Core #0 was ~100% busy on both gateway nodes
Fixing RSS helped, but we started hitting network throttling
The network utilization on the gateway nodes (A2 instances) surpassed the Azure throttling limit
The custom GW gave us the ability to debug ingestion bottlenecks in the customer app
From custom GW rules, we could directly return success without sending data to the HBase cluster
We identified a few occasions where the client app wasn't sending enough load to HBase
We found scalability bottlenecks in the client app and fixed them with the customer's help
After the fixes, the client application was able to send ~2 Gbps of data (~15X more than before) to the GW nodes
But we couldn't push that much data into the HBase cluster
The next bottleneck was clearly in HBase