Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight

Nitin Verma, Pravin Mittal, Maxim Lukiyanov
May 24th 2016, HBaseCon 2016
About Us
Nitin Verma
Senior Software Development Engineer – Microsoft, Big Data Platform
Contact: nitinver@microsoft
Pravin Mittal
Principal Software Engineering Manager – Microsoft, Big Data
Contact: pravinm@microsoft
Maxim Lukiyanov
Senior Program Manager – Microsoft, Big Data Platform
Contact: maxluk@microsoft
Outline
 Overview of HBase Service in HDInsight
 Customer Case Study
 Performance Debugging
 Key Takeaways
What is HDInsight HBase Service
 On-demand cluster with a few clicks
 Out-of-the-box performance
 Supports both Linux & Windows
 Enterprise SLA of 99.9% availability
 Active health monitoring via telemetry
 24/7 customer support
Unique Features
 Storage is decoupled from compute
 Flexibility to scale out and scale in
 Write/read an unlimited amount of data irrespective of cluster size
 Data is preserved and accessible even when the cluster is down or deleted
Azure Data Lake Storage: Built For Cloud
Maxim Lukiyanov, Ashit Gosalia
 Secure: Must be highly secure to prevent unauthorized access (especially as all data is in one place).
 Native format: Must permit data to be stored in its ‘native format’ to track lineage and for data provenance.
 Low latency: Must have low latency for high-frequency operations.
 Multiple analytic frameworks: Must support multiple analytic frameworks: batch, real-time, streaming, machine learning, etc. No one analytic framework can work for all data and all types of analysis.
 Details: Must be able to store data with all details; aggregation may lead to loss of details.
 Throughput: Must have high throughput for massively parallel processing via frameworks such as Hadoop and Spark.
 Reliable: Must be highly available and reliable (no permanent loss of data).
 Scalable: Must be highly scalable. When storing all data indefinitely, data volumes can quickly add up.
 All sources: Must be able to ingest data from a variety of sources: LOB/ERP, logs, devices, social networks, etc.
Customer Case Study and Performance Optimization
Microsoft’s Real Time Analytics Platform
 Modern self-service telemetry platform
 Near real-time analytics
 Product health and user engagement monitoring with custom dashboards
 Performs large-scale indexing on HDInsight HBase
 4.01 million events per second at peak
 12.8 petabytes ingestion per month
 >500 million weekly unique devices and machines
 450 + 2600 production + INT/sandbox self-serve tenants
 1,600 storage accounts
 500,000 Azure Storage transactions/sec
[Charts: Azure Storage traffic, Feb-21 to Feb-23. Axes: ingress in TB/hr (0 to 100) and millions of transactions/hr (0 to 7000). Legend: Table, Blob, Queue.]
Results of Early HBase Evaluation
 Customer had very high throughput need for key-value store
 Performance was ~10X lower than their requirement
 Bigger concern: Throughput didn’t scale from 15 -> 30 nodes
Developing a Strategy
Flowchart (iterative process):
 Understand the architecture
 Run the workload
 Collect metrics & profile
 Profile relevant components
 Make performance fixes
 Isolate/divide the problem (unit test)
 Reproduce at lower scale
 Identify performance bottlenecks
 Fixed? (YES ends the loop)
 Automation can save time
Pipeline of data ingestion
[Diagram: Data Ingestion Client App [PaaS] (300 VMs, multiple storage accounts and queues) → HDI Gateway & Load Balancer → REST servers → region servers → cloud storage (medium latency, high bandwidth)]
 HBase cluster: 30 x large worker nodes, 1000+ cores
 REST request: batch size = 1000, row size = 1400 bytes
Initial Iterations
1. High CPU utilization with REST being top consumer
2. GZ compression was turned ON in REST
3. Heavy GC activity on REST and REGION processes
4. Starvation of REGION process by REST [busy wait for network IO]
 Throughput improved by 10-30% after each iteration
Initial Iterations (contd.)
5. REST server threads waiting on network IO
 Collected TCP dump on all the nodes of the cluster
Insight from tcpdump:
[Diagram: the REST server sends one BATCH to region servers 1, 2, 3 ... 30; one of them is marked SLOWEST]
 REST server was fanning out each batch request to all the region servers
 The slowest region server governed the throughput
 Used the SALT_BUCKET scheme to improve locality
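The slides do not show how the SALT_BUCKET scheme was applied. The following is a minimal sketch, assuming a one-byte salt derived from a hash of the row key and one bucket per region server. It illustrates why a fanned-out batch is only as fast as its slowest server, and how grouping rows by bucket lets each sub-batch target a single key range. The bucket count, the hash choice, and all function names are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 30  # assumption: one salt bucket per region server


def salt_bucket(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a row key to a stable bucket using a hash of the key."""
    digest = hashlib.md5(row_key).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def salted_key(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a one-byte salt so each bucket is a contiguous key range."""
    return bytes([salt_bucket(row_key, num_buckets)]) + row_key


def group_by_bucket(rows, num_buckets: int = NUM_BUCKETS):
    """Split one large batch into per-bucket sub-batches of (salted key, value)."""
    groups = defaultdict(list)
    for key, value in rows:
        bucket = salt_bucket(key, num_buckets)
        groups[bucket].append((salted_key(key, num_buckets), value))
    return dict(groups)


def fanout_latency(per_server_latencies_ms):
    """A request that waits on every server finishes when the slowest one does."""
    return max(per_server_latencies_ms)
```

With this layout, a client could send each sub-batch as its own request, so one slow region server delays only the rows in its own bucket rather than the whole batch.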
Improvement
 Throughput improved by 2.75X
 Measurement window = ~72 hours
 Avg. cluster CPU utilization = ~60%
 But no scaling from a 30-node to a 60-node cluster
 Time to get back to the architecture
Pipeline of data ingestion
[Diagram: Data Ingestion Client App [PaaS] (300 VMs, multiple storage accounts and queues) → HDI Gateway & Load Balancer → REST servers → region servers → WASB (medium latency, high bandwidth)]
 HBase cluster: 60 x large worker nodes, 1000+ cores
 REST request: batch size = 1000, row size = 1400 bytes
 Could the gateway (GW) be a bottleneck at such a high ingestion rate?
We Had a Gateway Bottleneck
And the guess was right!
 Collected perfmon data on the GW nodes
 Core #0 was 100% busy
 RSS (Receive Side Scaling) is a trick to balance the DPCs (deferred procedure calls) across cores
 Performance improved, but not significantly
 Both CPU and networking were bottlenecks
 Time to scale up the gateway VM size
Configuring private gateway
 We provisioned a custom gateway on large VMs using NGINX
 We confirmed that the gateway issue was indeed fixed
 The throughput problem was still not solved and continued to give us new puzzles
Pipeline of data ingestion
[Diagram: Data Ingestion Client App [PaaS] (300 VMs, multiple storage accounts and queues) → programmed NGINX as gateway and load balancer → REST servers → region servers → WASB]
 HBase cluster: 60 x D14 worker nodes, 1040 cores
 Could the customer app be a bottleneck?
Pipeline of data ingestion
[Diagram: Data Ingestion Client App [PaaS] (300 VMs, multiple storage accounts and queues) → programmed NGINX as gateway and load balancer → REST servers → region servers → WASB, annotated DISCONNECTED and RETURN 200]
 HBase cluster: 60 x D14 worker nodes, 1040 cores
New strategy
 We divided the data pipeline into two parts and debugged them in isolation
1) Client → Gateway [solved]
2) REST → Region → WASB [unsolved]
 For fast turnaround, we decided to use YCSB for debugging #2
 We configured YCSB with the characteristics of the customer’s workload
 We ran YCSB locally inside the HBase cluster
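The slides do not list the YCSB settings that were used. As a rough sketch of matching YCSB to the workload described earlier (row size = 1400 bytes, ingestion-heavy), the helper below builds CoreWorkload properties whose `fieldcount` × `fieldlength` equals the row size. The split into 14 fields, the insert-only mix, and the helper names are assumptions. The workload class name is the one used by older YCSB releases; newer releases moved it to the `site.ycsb` package.

```python
ROW_SIZE_BYTES = 1400  # row size quoted in the slides
FIELD_COUNT = 14       # assumption: how the row is split into columns


def ycsb_properties(record_count: int,
                    row_size: int = ROW_SIZE_BYTES,
                    field_count: int = FIELD_COUNT) -> dict:
    """Build YCSB CoreWorkload properties for an insert-only run."""
    if row_size % field_count != 0:
        raise ValueError("row_size must divide evenly across fields")
    return {
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": record_count,
        "operationcount": record_count,
        "fieldcount": field_count,
        "fieldlength": row_size // field_count,
        # Assumption: the customer workload is pure ingestion.
        "insertproportion": 1.0,
        "readproportion": 0.0,
        "updateproportion": 0.0,
        "scanproportion": 0.0,
    }


def to_properties_text(props: dict) -> str:
    """Render the properties in the key=value form that YCSB reads with -P."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"
```

Writing `to_properties_text(ycsb_properties(1_000_000))` to a file gives a workload file that can be passed to the YCSB client with `-P`.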
YCSB Experiments
 We suspected one of the following two:
1) REST
2) Azure Storage
 We isolated the problem by replacing Azure Storage with local SSDs
 We then compared the performance of REST vs. RPC
 Result: REST was clearly a bottleneck!
YCSB Experiments (contd.)
 Root cause of the bottleneck in REST:
• Profiling the REST servers uncovered multiple threads that were blocked on INFO/DEBUG logging.
• Limiting logging to the WARN/ERROR levels dramatically improved REST server performance and brought it very close to RPC.
Sample stack:
Thread 11540: (state = BLOCKED)
- org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) @bci=12, line=204 (Compiled frame)
- org.apache.log4j.Category.forcedLog(java.lang.String, org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=14, line=391 (Compiled frame)
- org.apache.log4j.Category.log(java.lang.String, org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=34, line=856 (Compiled frame)
- org.apache.commons.logging.impl.Log4JLogger.debug(java.lang.Object) @bci=12, line=155 (Compiled frame)
- org.apache.hadoop.hbase.rest.RowResource.update(org.apache.hadoop.hbase.rest.model.CellSetModel, boolean) @bci=580, line=225 (Compiled frame)
- org.apache.hadoop.hbase.rest.RowResource.put(org.apache.hadoop.hbase.rest.model.CellSetModel, javax.ws.rs.core.UriInfo) @bci=60, line=318 (Interpreted frame)
- sun.reflect.GeneratedMethodAccessor27.invoke(java.lang.Object, java.lang.Object[]) @bci=48 (Interpreted frame)
- sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
- java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=57, line=606 (Compiled frame)
- com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=3, line=60 (Interpreted frame)
- com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(java.lang.Object, com.sun.jersey.api.core.HttpContext) @bci=16, line=205 (Interpreted frame)
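In the stack above, the thread is blocked in log4j's `Category.callAppenders`, reached from a `debug` call on the PUT path. The sketch below is a Python analogue, not the HBase or log4j code: Python's `logging.Handler` also takes a lock around `emit`, and raising the logger level stops records before they reach the handler. The class and function names are hypothetical.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts records that reach the handler (and therefore its lock)."""

    def __init__(self):
        super().__init__()
        self.emitted = 0

    def emit(self, record):
        self.emitted += 1


def run_requests(level: int, n_requests: int = 1000) -> int:
    """Simulate a request loop that logs twice per request at DEBUG/INFO."""
    logger = logging.getLogger(f"rest.sketch.{level}")
    logger.propagate = False
    logger.setLevel(level)
    handler = CountingHandler()
    logger.addHandler(handler)
    for i in range(n_requests):
        logger.debug("PUT row %d", i)   # dropped early when level > DEBUG
        logger.info("PUT done %d", i)   # dropped early when level > INFO
    logger.removeHandler(handler)
    return handler.emitted
```

At DEBUG level every request passes through the handler lock twice; at WARNING level none do, which is the effect the logging change aimed for on the REST servers.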
YCSB Experiments (contd.)
 RPC vs. REST after fixing INFO message logging
 We could saturate the SSD performance at 160K requests/sec throughput
 This confirmed that the bottleneck in the REST server was solved
Back to Customer Workload
 After limiting the logging level to WARN, throughput improved further by ~5.5X
 This was a ~15X gain from the point where we started
 The customer is happy and uses the HDInsight HBase service in production
 They are able to meet the throughput goals with enough margin to scale further
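A quick check of how the quoted gains compound, assuming the earlier 2.75X figure is cumulative up to the salting step and that the starting point was about 10X below the requirement, as stated in the early evaluation. The function name is hypothetical.

```python
def cumulative_gain(*factors: float) -> float:
    """Multiply successive speedup factors into one overall gain."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


overall = cumulative_gain(2.75, 5.5)  # gains quoted in the slides
headroom = overall / 10.0             # starting point was ~10X below the requirement
```

Under these assumptions the overall gain is about 15X, which leaves roughly 1.5X headroom over the original throughput requirement.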
Tools Utilized
Category | Tools on Windows | Tools on Linux for Java Process
System counters (CPU, memory, IO, process, etc.) | perfmon | mpstat, iostat, vmstat, sar, nload, glances
Networking | tcpdump | tcpdump
CPU profiling | kernrate, f1 sample, xperf | YourKit, jvmtop, jprof
CPU blocking issues | xperf, Concurrency Visualizer, ppa | jstack
Debugging large clusters | PowerShell, python expect | bash, awk, python, screen, expect
New performance features in HBase
Overcoming Storage Latency
 HBase now has the MultiWAL and BucketCache features
 Both are designed to minimize the impact of high storage latency
 Parallelism and batching are the keys to hiding write latency (MultiWAL)
 MultiWAL gives higher throughput with a lower number of region nodes
 We achieved 500K inserts/sec with just 8 small region nodes for an IoT customer
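To see why parallelism and batching hide write latency, consider a deliberately simple model: each write-ahead log stream syncs one batch per storage round trip, so throughput scales with the number of streams and the batch size. This is an illustrative back-of-the-envelope model, not how HBase measures or schedules WAL syncs, and the numbers below are made up.

```python
def inserts_per_second(parallel_streams: int,
                       batch_size: int,
                       sync_latency_s: float) -> float:
    """Upper bound on inserts/sec when each stream syncs one batch per round trip."""
    if parallel_streams < 1 or batch_size < 1 or sync_latency_s <= 0:
        raise ValueError("streams and batch size must be >= 1, latency must be > 0")
    return parallel_streams * batch_size / sync_latency_s


# One stream, one row per sync, 10 ms storage latency (illustrative values).
single = inserts_per_second(1, 1, 0.010)
# Four streams, 100 rows per sync, same latency.
batched = inserts_per_second(4, 100, 0.010)
```

In this model the storage latency is unchanged, yet the second configuration moves 400 times as many rows per second as the first.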
Overcoming Storage Latency (contd.)
 What about read latency?
 Caching and read-ahead are the keys to overcoming read latency
 Cache-on-write helps applications that are temporal in nature
 HDInsight VMs are backed by SSDs
 BucketCache can use the SSD as an L2 cache
 BucketCache gives a ~20X-30X gain in read performance to our customers
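The slides do not give the configuration that HDInsight uses. As a hedged sketch, the helper below renders an `hbase-site.xml` fragment with the standard HBase property names for MultiWAL, a file-backed BucketCache, and cache-on-write. The SSD path and cache size are placeholder values, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Property names are standard HBase settings; the values are illustrative only.
SETTINGS = {
    "hbase.wal.provider": "multiwal",
    "hbase.bucketcache.ioengine": "file:/mnt/ssd/bucketcache.data",  # hypothetical path
    "hbase.bucketcache.size": 65536,  # MB, placeholder
    "hbase.rs.cacheblocksonwrite": "true",
}


def hbase_site_fragment(props: dict) -> str:
    """Render properties as an hbase-site.xml <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


fragment = hbase_site_fragment(SETTINGS)
```

Actual values depend on the VM size and on how much SSD space is available, so they should be checked against the HBase reference guide for the version in use.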
Conclusion
 The performance issue was quite complex: bottlenecks were hiding at several layers and components in the pipeline
 Deeper engagement with customers helped in optimizing the HDInsight HBase service
 The HDI team has been actively productizing performance fixes
 ADLS, MultiWAL and BucketCache help in minimizing the latency impact
Thank You!
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon646 views
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase by HBaseCon
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon608 views
HBaseCon2017 Transactions in HBase by HBaseCon
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
HBaseCon1.8K views
HBaseCon2017 Highly-Available HBase by HBaseCon
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
HBaseCon1.1K views
HBaseCon2017 Apache HBase at Didi by HBaseCon
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon996 views
HBaseCon2017 gohbase: Pure Go HBase Client by HBaseCon
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon1.7K views

Recently uploaded

DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J... by
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...Deltares
9 views24 slides
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium... by
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...Lisi Hocke
28 views124 slides
Unleash The Monkeys by
Unleash The MonkeysUnleash The Monkeys
Unleash The MonkeysJacob Duijzer
7 views28 slides
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra... by
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra....NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...Marc Müller
38 views62 slides
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports by
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug ReportsBushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug ReportsRa'Fat Al-Msie'deen
5 views49 slides
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - Geertsema by
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - GeertsemaDSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - Geertsema
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - GeertsemaDeltares
17 views13 slides

Recently uploaded(20)

DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J... by Deltares
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...
DSD-INT 2023 3D hydrodynamic modelling of microplastic transport in lakes - J...
Deltares9 views
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium... by Lisi Hocke
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...
Team Transformation Tactics for Holistic Testing and Quality (Japan Symposium...
Lisi Hocke28 views
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra... by Marc Müller
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra....NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
Marc Müller38 views
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports by Ra'Fat Al-Msie'deen
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug ReportsBushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - Geertsema by Deltares
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - GeertsemaDSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - Geertsema
DSD-INT 2023 Delft3D FM Suite 2024.01 1D2D - Beta testing programme - Geertsema
Deltares17 views
Advanced API Mocking Techniques by Dimpy Adhikary
Advanced API Mocking TechniquesAdvanced API Mocking Techniques
Advanced API Mocking Techniques
Dimpy Adhikary19 views
DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ... by Deltares
DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ...DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ...
DSD-INT 2023 Wave-Current Interaction at Montrose Tidal Inlet System and Its ...
Deltares10 views
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with... by sparkfabrik
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...
20231129 - Platform @ localhost 2023 - Application-driven infrastructure with...
sparkfabrik5 views
Dapr Unleashed: Accelerating Microservice Development by Miroslav Janeski
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice Development
Miroslav Janeski10 views
SUGCON ANZ Presentation V2.1 Final.pptx by Jack Spektor
SUGCON ANZ Presentation V2.1 Final.pptxSUGCON ANZ Presentation V2.1 Final.pptx
SUGCON ANZ Presentation V2.1 Final.pptx
Jack Spektor22 views
Software testing company in India.pptx by SakshiPatel82
Software testing company in India.pptxSoftware testing company in India.pptx
Software testing company in India.pptx
SakshiPatel827 views
DSD-INT 2023 European Digital Twin Ocean and Delft3D FM - Dols by Deltares
DSD-INT 2023 European Digital Twin Ocean and Delft3D FM - DolsDSD-INT 2023 European Digital Twin Ocean and Delft3D FM - Dols
DSD-INT 2023 European Digital Twin Ocean and Delft3D FM - Dols
Deltares7 views
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme... by Deltares
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...
DSD-INT 2023 Salt intrusion Modelling of the Lauwersmeer, towards a measureme...
Deltares5 views
Tridens DevOps by Tridens
Tridens DevOpsTridens DevOps
Tridens DevOps
Tridens9 views
AI and Ml presentation .pptx by FayazAli87
AI and Ml presentation .pptxAI and Ml presentation .pptx
AI and Ml presentation .pptx
FayazAli8711 views

Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight

  • 1. Optimizing HBase for Cloud Storage in Microsoft Azure HDInsight. Nitin Verma, Pravin Mittal, Maxim Lukiyanov. May 24th 2016, HBaseCon 2016
  • 2. About Us
      • Nitin Verma: Senior Software Development Engineer – Microsoft, Big Data Platform. Contact: nitinver@microsoft
      • Pravin Mittal: Principal Software Engineering Manager – Microsoft, Big Data. Contact: pravinm@microsoft
      • Maxim Lukiyanov: Senior Program Manager – Microsoft, Big Data Platform. Contact: maxluk@microsoft
  • 3. Outline
      • Overview of HBase Service in HDInsight
      • Customer Case Study
      • Performance Debugging
      • Key Takeaways
  • 4. What is HDInsight HBase Service
      • On-demand cluster with a few clicks
      • Out-of-the-box performance
      • Supports both Linux & Windows
      • Enterprise SLA of 99.9% availability
      • Active health monitoring via telemetry
      • 24/7 customer support
    Unique Features
      • Storage is decoupled from compute
      • Flexibility to scale out and scale in
      • Write/read an unlimited amount of data irrespective of cluster size
      • Data is preserved and accessible even when the cluster is down or deleted
  • 7. Azure Data Lake Storage: Built For Cloud (Maxim Lukiyanov, Ashit Gosalia)
      • Secure: Must be highly secure to prevent unauthorized access (especially as all data is in one place).
      • Native format: Must permit data to be stored in its 'native format' to track lineage and for data provenance.
      • Low latency: Must have low latency for high-frequency operations.
      • Multiple analytic frameworks: Must support multiple analytic frameworks: Batch, Real-time, Streaming, Machine Learning, etc. No one analytic framework can work for all data and all types of analysis.
      • Details: Must be able to store data with all details; aggregation may lead to loss of details.
      • Throughput: Must have high throughput for massively parallel processing via frameworks such as Hadoop and Spark.
      • Reliable: Must be highly available and reliable (no permanent loss of data).
      • Scalable: Must be highly scalable. When storing all data indefinitely, data volumes can quickly add up.
      • All sources: Must be able to ingest data from a variety of sources: LOB/ERP, logs, devices, social networks, etc.
  • 8. Customer Case Study and Performance Optimization
  • 9. Microsoft's Real Time Analytics Platform
      • Modern self-service telemetry platform
      • Near real-time analytics
      • Product health and user engagement monitoring with custom dashboards
      • Performs large-scale indexing on HDInsight HBase
  • 10. 4.01 million events per second at peak; 12.8 petabytes ingested per month; >500 million weekly unique devices and machines; 450 production + 2,600 INT/sandbox self-serve tenants; 1,600 storage accounts; 500,000 Azure Storage transactions/sec
      • [Charts: Azure Storage traffic, Feb-21 to Feb-23, in TB ingress/hr and in million transactions/hr for Table, Blob, and Queue]
  • 11. Results of Early HBase Evaluation
      • The customer had a very high throughput requirement for a key-value store
      • Performance was ~10X lower than their requirement
      • Bigger concern: throughput didn't scale when going from 15 to 30 nodes
  • 12. Developing a Strategy [flowchart of an iterative process; boxes listed in extracted order]
      • Understand the architecture
      • Run the workload
      • Collect metrics & profile
      • Profile relevant components
      • Make performance fixes
      • Isolate/divide the problem (unit test)
      • Reproduce at lower scale
      • Fixed? (exit on YES)
      • Identify performance bottlenecks
      • Automation can save time
  • 13. Pipeline of data ingestion [diagram]
      • Data Ingestion Client App [PaaS]: 300 VMs, multiple storage accounts and queues
      • HDI Gateway & Load Balancer
      • HBase cluster: REST servers and region servers on 30 x large worker nodes, 1000+ cores
      • Cloud Storage: medium latency, high bandwidth
      • REST request: batch size = 1000, row size = 1400 bytes
  • 14. Initial Iterations
      1. High CPU utilization, with REST being the top consumer
      2. GZ compression was turned ON in REST
      3. Heavy GC activity on REST and REGION processes
      4. Starvation of the REGION process by REST [busy wait for network IO]
      • Throughput improved by 10-30% after each iteration
  • 15. Initial Iterations (contd.)
      5. REST server threads waiting on network IO
      • Collected TCP dump on all the nodes of the cluster
      • Insight from tcpdump [diagram: a REST server sends each BATCH to region servers 1, 2, 3, ... 30; one region server is marked SLOWEST]
      • The REST server was fanning out each batch request to all the region servers
      • The slowest region server governed the throughput
      • Used the SALT_BUCKET scheme to improve locality
  • 16. Improvement
      • Throughput improved by 2.75X
      • Measurement window = ~72 hours
      • Avg. cluster CPU utilization = ~60%
      • But no scaling from a 30-node to a 60-node cluster
      • Time to get back to the architecture
  • 17. Pipeline of data ingestion [diagram]
      • Data Ingestion Client App [PaaS]: 300 VMs, multiple storage accounts and queues
      • HDI Gateway & Load Balancer
      • HBase cluster: REST servers and region servers on 60 x large worker nodes, 1000+ cores
      • WASB: medium latency, high bandwidth
      • REST request: batch size = 1000, row size = 1400 bytes
      • Could the GW be a bottleneck at such a high ingestion rate?
  • 18. We Had a Gateway Bottleneck
      • And the guess was right!
      • Collected perfmon data on GW nodes
      • Core #0 was 100% busy
      • RSS (Receive Side Scaling) is a trick to balance the DPCs (deferred procedure calls) across cores
      • Performance improved, but not significantly
      • Both CPU and networking were bottlenecks
      • Time to scale up the gateway VM size
  • 19. Configuring a private gateway
      • We provisioned a custom gateway on large VMs using NGINX
      • We confirmed that the gateway issue was indeed fixed
      • The throughput problem was still not solved and continued to give us new puzzles
  • 20. Pipeline of data ingestion [diagram]
      • Data Ingestion Client App [PaaS]: 300 VMs, multiple storage accounts and queues
      • Programmed NGINX as Gateway and Load Balancer
      • HBase cluster: REST servers and region servers on 60 x D14 worker nodes, 1040 cores
      • WASB
      • Could the customer app be a bottleneck?
  • 21. Pipeline of data ingestion [same diagram as slide 20, with the HBase cluster DISCONNECTED and the NGINX gateway set to RETURN 200]
  • 22. New strategy
      • We divided the data pipeline into two parts and debugged them in isolation:
          1) Client → Gateway [solved]
          2) REST → Region → WASB [unsolved]
      • For fast turnaround, we decided to use YCSB for debugging #2
      • We configured YCSB with the characteristics of the customer's workload
      • We ran YCSB locally inside the HBase cluster
  • 23. YCSB Experiments
      • We suspected one of the following two: 1) REST 2) Azure Storage
      • We isolated the problem by replacing Azure Storage with local SSDs
      • We then compared the performance of REST vs. RPC
      • Results: REST was clearly a bottleneck!
  • 24. YCSB Experiments (contd.)
      • Root cause of the bottleneck in REST:
          • Profiling the REST servers uncovered multiple threads that were blocked on INFO/DEBUG logging.
          • Limiting the logging to WARNING/ERROR level dramatically improved the REST server performance and brought it very close to RPC.
      • Sample stack:
          Thread 11540: (state = BLOCKED)
           - org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) @bci=12, line=204 (Compiled frame)
           - org.apache.log4j.Category.forcedLog(java.lang.String, org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=14, line=391 (Compiled frame)
           - org.apache.log4j.Category.log(java.lang.String, org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=34, line=856 (Compiled frame)
           - org.apache.commons.logging.impl.Log4JLogger.debug(java.lang.Object) @bci=12, line=155 (Compiled frame)
           - org.apache.hadoop.hbase.rest.RowResource.update(org.apache.hadoop.hbase.rest.model.CellSetModel, boolean) @bci=580, line=225 (Compiled frame)
           - org.apache.hadoop.hbase.rest.RowResource.put(org.apache.hadoop.hbase.rest.model.CellSetModel, javax.ws.rs.core.UriInfo) @bci=60, line=318 (Interpreted frame)
           - sun.reflect.GeneratedMethodAccessor27.invoke(java.lang.Object, java.lang.Object[]) @bci=48 (Interpreted frame)
           - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
           - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=57, line=606 (Compiled frame)
           - com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=3, line=60 (Interpreted frame)
           - com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(java.lang.Object, com.sun.jersey.api.core.HttpContext) @bci=16, line=205 (Interpreted frame)
  • 25. YCSB Experiments (contd.)
      • RPC vs. REST after fixing INFO message logging
      • We could saturate the SSD performance at 160K requests/sec throughput
      • This confirmed that the bottleneck in the REST server was solved
  • 26. Back to Customer Workload
      • After limiting the logging level to WARN, throughput improved further by ~5.5X
      • This was a ~15X gain from the point where we started
      • The customer is happy and uses the HDInsight HBase service in production
      • They are able to meet the throughput goals with enough margin to scale further
  • 27. Tools Utilized [table: category, tools on Windows, tools on Linux for Java process]
      • System counters (CPU, memory, IO, process, etc.): Windows: Perfmon. Linux: mpstat, iostat, vmstat, sar, nload, glances
      • Networking: Windows: tcpdump. Linux: tcpdump
      • CPU profiling: Windows: kernrate, f1 sample, xperf. Linux: YourKit, jvmtop, jprof
      • CPU blocking issues: Windows: xperf, concurrency visualizer, ppa. Linux: jstack
      • Debugging large clusters: Windows: powershell, python expect. Linux: bash, awk, python, screen, expect
  • 29. Overcoming Storage Latency
      • HBase now has MultiWAL and BucketCache features
      • They are designed to minimize the impact of high storage latency
      • Parallelism and batching are the keys to hiding write latency (MultiWAL)
      • MultiWAL gives higher throughput with a lower number of region nodes
      • We achieve 500K inserts/sec with just 8 small region nodes for an IoT customer
  • 30. Overcoming Storage Latency (contd.)
      • What about read latency?
      • Caching and read-ahead are the keys to overcoming read latency
      • Cache-on-write helps applications that are temporal in nature
      • HDInsight VMs are backed by SSDs
      • The BucketCache feature can utilize SSD as an L2 cache
      • BucketCache gives a ~20X-30X gain in read performance to our customers
  • 31. Conclusion
      • The performance issue was quite complex, with bottlenecks hiding at several layers and components in the pipeline
      • Deeper engagement with customers helped in optimizing the HDInsight HBase service
      • The HDI team has been actively productizing performance fixes
      • ADLS, MultiWAL and BucketCache help in minimizing the latency impact
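
The blocked-logging finding on slide 24 came from thread stacks. As a minimal sketch of how such dumps could be summarized across many threads, the Python below counts BLOCKED threads by their top frame. It assumes the "Thread N: (state = ...)" layout shown in the sample stack; the function name `blocked_top_frames` and the abbreviated sample text are hypothetical and not part of the original talk.

```python
import re
from collections import Counter

# Matches headers such as "Thread 11540: (state = BLOCKED)".
THREAD_HEADER = re.compile(r"^Thread (\d+): \(state = (\w+)\)")


def blocked_top_frames(dump: str) -> Counter:
    """Count BLOCKED threads, keyed by the method name of their top frame."""
    counts = Counter()
    awaiting_top_frame = False
    for raw_line in dump.splitlines():
        line = raw_line.strip()
        header = THREAD_HEADER.match(line)
        if header:
            # Only the first frame after a BLOCKED header is of interest.
            awaiting_top_frame = header.group(2) == "BLOCKED"
            continue
        if awaiting_top_frame and line.startswith("- "):
            # Keep "pkg.Class.method" and drop the argument list and suffix.
            counts[line[2:].split("(", 1)[0]] += 1
            awaiting_top_frame = False
    return counts


if __name__ == "__main__":
    # Abbreviated, hypothetical sample in the same layout as slide 24.
    sample = "\n".join([
        "Thread 11540: (state = BLOCKED)",
        " - org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) @bci=12, line=204 (Compiled frame)",
        "Thread 11541: (state = IN_NATIVE)",
        " - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor) @bci=0 (Interpreted frame)",
    ])
    for frame, count in blocked_top_frames(sample).most_common():
        print(count, frame)
```

If one frame dominates the count, as the log4j appender call did in the sample stack, that frame is a reasonable first suspect for lock contention.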

Editor's Notes

  1. Understand the architecture and overall pipeline of data movement. Monitor the resource utilization of each layer in the pipeline. Profile the components with high resource utilization and identify hotspots. When resource utilization is low, identify blocking issues (if any). Divide and conquer: develop a strategy to isolate the components that could be the culprit. Isolation makes debugging easier. It is an iterative process.
  2. Reproduced the customer scenario with 30 worker nodes. Collected system metrics (CPU, memory, IO, etc.) on all the worker nodes. Started our analysis with HBase. CPU consumption was very high on nearly all REST servers. We then profiled the REST servers and observed the following. Compression was ON by default (GZ filter) and was consuming ~70% CPU. There was heavy GC activity on REST and REGION servers, and we had to tune certain GC-related parameters. The REST server was busy-waiting for network IO; bumping the REGION server priority solved that issue. Tools like YourKit and JVMTop helped in uncovering efficiency issues.
  3. We noticed multiple threads in the REST server waiting on network IO. We performed a deep networking analysis using TCP dump and uncovered the locality issue with the key. The REST server was fanning out each batch request to almost all the region servers. Overall throughput seemed to be governed by the slowest region server. We used the SALT_BUCKET scheme to improve the locality of batch requests.
  4. At this high ingestion rate, we suspected the HDI gateway of being a bottleneck and confirmed it by collecting perfmon data on both gateways. Core #0 was ~100% busy on both gateway nodes. Fixing RSS helped, but we started hitting network throttling. The network utilization on the gateway nodes (A2 instances) surpassed the Azure throttling limit.
  5. The custom GW gave us the ability to debug the ingestion bottlenecks from the customer app. From custom GW rules, we could directly return success without sending data to the HBase cluster. We identified a few occasions where the client app wasn't sending enough load to HBase. After fixing scalability issues in the client application, it was able to send ~2 Gbps of data to the GW nodes. But we couldn't push 2 Gbps of data into the HBase cluster. The next bottleneck was clearly in HBase.
  6. The custom GW gave us the ability to debug the ingestion bottlenecks from the customer app. From custom GW rules, we could directly return success without sending data to the HBase cluster. We identified scalability bottlenecks in the client app and fixed them with the customer's help. The client application was now able to send ~15X the data to the GW nodes. But we couldn't push that much into the HBase cluster. The next bottleneck was clearly in HBase.
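
Note 3 says that each batch fanned out to almost all region servers and that the slowest server governed throughput. The Python below is only a toy model of that reasoning, not the team's actual fix: it treats a batch as finished when the slowest contacted server replies, and groups row keys by a salt bucket so that a sub-batch could target a single bucket. The CRC32 hash, the bucket count of 30, the latency figures, and the function names are all assumptions made for illustration; the source does not describe how SALT_BUCKET was applied.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 30  # hypothetical: one salt bucket per region server


def salt_bucket(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a row key to a bucket with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(row_key) % num_buckets


def group_by_bucket(row_keys, num_buckets: int = NUM_BUCKETS) -> dict:
    """Split one mixed batch into per-bucket sub-batches."""
    groups = defaultdict(list)
    for key in row_keys:
        groups[salt_bucket(key, num_buckets)].append(key)
    return dict(groups)


def batch_latency_ms(server_latencies_ms, servers_touched) -> float:
    """A batch completes only when the slowest contacted server has replied."""
    return max(server_latencies_ms[s] for s in servers_touched)


if __name__ == "__main__":
    # Hypothetical latencies: 29 healthy servers and one slow server.
    latencies = [10.0] * 29 + [200.0]
    print("fan-out to all servers:", batch_latency_ms(latencies, range(30)), "ms")
    print("single-server batch:  ", batch_latency_ms(latencies, [0]), "ms")
    keys = [f"device-{i}".encode() for i in range(1000)]
    sub_batches = group_by_bucket(keys)
    print("sub-batches produced:", len(sub_batches))
```

In this model a fan-out batch always pays the slow server's latency, while a batch confined to one bucket pays it only when it happens to land on that server, which is the locality effect the note describes.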