Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight

Nitin Verma, Pravin Mittal, and Maxim Lukiyanov (Microsoft)

This session presents our success story of enabling a large internal customer on Microsoft Azure's HBase service, along with the methodology and tools used to meet their high-throughput goals. We will also present how new features in HBase (like BucketCache and MultiWAL) are helping our customers in the medium-latency/high-bandwidth cloud-storage scenario.

Slide notes
  • Understand the architecture and overall pipeline of data movement

    Monitor the resource utilization of each layer in the pipeline

    Profile the components with high resource utilization and identify hotspots

    When resource utilization is low, identify blocking issues (if any)

    Divide and conquer – develop a strategy to isolate the components that could be the culprit; isolation makes debugging easier.

    It is an iterative process!
  • Reproduced the customer scenario with 30 worker nodes
    Collected system metrics (CPU, memory, IO, etc.) on all the worker nodes
    Started our analysis with HBase
    CPU consumption was very high on nearly all REST servers
    We then profiled the REST servers and observed the following:
    Compression was on by default (GZ filter) and was consuming ~70% of the CPU (see the configuration sketch after this note)
    Heavy GC activity on the REST and REGION servers; we had to tune certain GC-related parameters
    REST server threads were busy-waiting on network IO; bumping the REGION server process priority solved that issue
    Tools like YourKit and jvmtop helped in uncovering these efficiency issues
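
    The compression fix above amounts to removing the gzip filter that the REST server registers by default. Below is a minimal, illustrative sketch; it assumes the hbase.rest.filter.classes property (whose default includes GzipFilter in the HBase versions involved). In a real cluster the same change is made in hbase-site.xml on the REST nodes, and the GC tuning mentioned above is applied separately through JVM flags in hbase-env.sh.

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;

      public class RestCompressionOff {
          public static void main(String[] args) {
              Configuration conf = HBaseConfiguration.create();
              // Assumed property name: clearing it removes the default GzipFilter,
              // which was burning ~70% of the REST server's CPU on compression.
              conf.set("hbase.rest.filter.classes", "");
              // GC flags (e.g. HBASE_REST_OPTS in hbase-env.sh) are tuned outside
              // of this Configuration object.
              System.out.println("REST filters: [" + conf.get("hbase.rest.filter.classes") + "]");
          }
      }
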
  • We noticed multiple threads in the REST server waiting on network IO
    We performed a deep networking analysis using tcpdump and uncovered a locality issue with the row key
    The REST server was fanning out each batch request to almost all the region servers
    Overall throughput seemed to be governed by the slowest region server
    We used a SALT_BUCKET scheme to improve the locality of batch requests (see the sketch after this note)
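
    A minimal sketch of the salting idea above, with an assumed bucket count and a hypothetical per-batch attribute (sourcePartition) used to derive the salt. The point is that all rows of one batch share a prefix, so the batch lands on one or a few regions instead of fanning out to every region server; the customer's actual key scheme is not shown here.

      import java.nio.charset.StandardCharsets;

      public class SaltedKeySketch {
          // Assumption: bucket count roughly matches the number of region servers.
          private static final int SALT_BUCKETS = 30;

          // Build a salted row key whose prefix is derived from an attribute that
          // all rows in one batch share, so a whole batch maps to one bucket.
          static byte[] saltedKey(String sourcePartition, String originalKey) {
              int bucket = (sourcePartition.hashCode() & Integer.MAX_VALUE) % SALT_BUCKETS;
              return String.format("%02d|%s", bucket, originalKey)
                           .getBytes(StandardCharsets.UTF_8);
          }

          public static void main(String[] args) {
              System.out.println(new String(
                      saltedKey("ingest-queue-07", "device42|2016-05-24T10:00:00Z"),
                      StandardCharsets.UTF_8));
          }
      }
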

  • At this high ingestion rate, we suspected the HDI gateway was a bottleneck and confirmed it by collecting perfmon data on both gateways
    Core #0 was ~100% busy on both the gateway nodes
    Fixing RSS (Receive Side Scaling) helped, but we then started hitting network throttling
    The network utilization on the gateway nodes (A2 instances) surpassed the Azure throttling limit
  • The custom GW gave us the ability to debug the ingestion bottlenecks from the customer app
    From custom GW rules, we could directly return success without sending data to the HBase cluster (see the stub sketch after this note)
    We identified a few occasions where the client app wasn't sending enough load to HBase, and fixed those scalability bottlenecks with the customer's help
    After the fixes, the client application was able to send ~2 Gbps (roughly 15X more) to the GW nodes
    But we couldn't push that much data into the HBase cluster
    The next bottleneck was clearly in HBase
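
    A sketch of the "return success without forwarding" trick described above. The real setup used rules in the custom NGINX gateway; the stand-in below is a plain JDK HttpServer stub that drains each request and answers 200 OK, which is enough to measure how much load the client can actually generate once the HBase cluster is taken out of the picture.

      import com.sun.net.httpserver.HttpServer;
      import java.io.IOException;
      import java.io.InputStream;
      import java.net.InetSocketAddress;
      import java.util.concurrent.Executors;

      public class AckOnlyGateway {
          public static void main(String[] args) throws IOException {
              HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
              server.createContext("/", exchange -> {
                  // Drain and discard the request body, then acknowledge with
                  // 200 and no response body -- nothing is forwarded to HBase.
                  InputStream in = exchange.getRequestBody();
                  byte[] buf = new byte[8192];
                  while (in.read(buf) != -1) { /* discard */ }
                  exchange.sendResponseHeaders(200, -1);
                  exchange.close();
              });
              server.setExecutor(Executors.newFixedThreadPool(64)); // many concurrent clients
              server.start();
          }
      }
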
  • Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight

    1. Optimizing HBase for Cloud Storage in Microsoft Azure HDInsight – Nitin Verma, Pravin Mittal, Maxim Lukiyanov – May 24th 2016, HBaseCon 2016
    2. About Us – Nitin Verma, Senior Software Development Engineer, Microsoft Big Data Platform (Contact: nitinver@microsoft); Pravin Mittal, Principal Software Engineering Manager, Microsoft Big Data (Contact: pravinm@microsoft); Maxim Lukiyanov, Senior Program Manager, Microsoft Big Data Platform (Contact: maxluk@microsoft)
    3. Outline – Overview of HBase Service in HDInsight; Customer Case Study; Performance Debugging; Key Takeaways
    4. What is HDInsight HBase Service – On-demand cluster with a few clicks; out-of-the-box performance; supports both Linux & Windows; enterprise SLA of 99.9% availability; active health monitoring via telemetry; 24/7 customer support. Unique features: storage is decoupled from compute; flexibility to scale out and scale in; write/read unlimited amounts of data irrespective of cluster size; data is preserved and accessible even when the cluster is down or deleted
    5. (no slide text captured)
    6. Azure Data Lake Storage: Built for Cloud (Maxim Lukiyanov, Ashit Gosalia) – Secure: must be highly secure to prevent unauthorized access (especially as all data is in one place). Native format: must permit data to be stored in its 'native format' to track lineage and for data provenance. Low latency: must have low latency for high-frequency operations. Multiple analytic frameworks: must support multiple analytic frameworks (batch, real-time, streaming, machine learning, etc.); no one analytic framework can work for all data and all types of analysis. Details: must be able to store data with all details; aggregation may lead to loss of details. Throughput: must have high throughput for massively parallel processing via frameworks such as Hadoop and Spark. Reliable: must be highly available and reliable (no permanent loss of data). Scalable: must be highly scalable; when storing all data indefinitely, data volumes can quickly add up. All sources: must be able to ingest data from a variety of sources – LOB/ERP, logs, devices, social networks, etc.
    7. Customer Case Study and Performance Optimization
    8. Microsoft's Real Time Analytics Platform – Modern self-service telemetry platform; near real-time analytics; product health and user engagement monitoring with custom dashboards; performs large-scale indexing on HDInsight HBase
    9. Scale of the platform – 4.01 million events per second at peak; 12.8 petabytes ingestion per month; >500 million weekly unique devices and machines; 450 production + 2,600 INT/sandbox self-serve tenants; 1,600 storage accounts; 500,000 Azure Storage transactions/sec. (Charts: Azure Storage traffic in TB ingress/hr and transactions in millions/hr – Table, Blob, Queue – for Feb 21-23.)
    10. Results of Early HBase Evaluation – Customer had a very high throughput need for a key-value store; performance was ~10X lower than their requirement; bigger concern: throughput didn't scale from 15 -> 30 nodes
    11. Developing a Strategy – Understand the architecture; reproduce at lower scale; run the workload; collect metrics & profile; identify performance bottlenecks; isolate/divide the problem (unit test); profile relevant components; make performance fixes; repeat until fixed. Automation can save time; it is an iterative process.
    12. Pipeline of data ingestion – Data Ingestion Client App [PaaS] on 300 VMs -> multiple storage accounts and queues -> HDI Gateway & Load Balancer -> HBASE CLUSTER (REST servers and region servers on 30 x large worker nodes, 1000+ cores) -> Cloud Storage (medium latency, high bandwidth). REST request: batch size = 1000, row size = 1400 bytes. (A client-side sketch of one such REST batch follows.)
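
    For reference, this is roughly what one such REST batch write looks like from a Java client, using HBase's RemoteHTable REST client. The endpoint, table name, and column family below are placeholders, and the loop only approximates the batch size (1000 rows) and row size (~1400 bytes) called out on the slide; the actual ingestion app is a separate PaaS service.

        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.List;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.rest.client.Client;
        import org.apache.hadoop.hbase.rest.client.Cluster;
        import org.apache.hadoop.hbase.rest.client.RemoteHTable;
        import org.apache.hadoop.hbase.util.Bytes;

        public class RestBatchIngest {
            public static void main(String[] args) throws IOException {
                // Placeholder endpoint: in HDInsight the client talks to the
                // gateway / load balancer that fronts the REST servers.
                Cluster cluster = new Cluster();
                cluster.add("my-hbase-gateway.example.net", 8080);
                RemoteHTable table = new RemoteHTable(new Client(cluster), "events");

                byte[] family = Bytes.toBytes("d");
                byte[] payload = new byte[1400];          // ~1400-byte row value
                List<Put> batch = new ArrayList<>(1000);  // batch size = 1000
                for (int i = 0; i < 1000; i++) {
                    Put put = new Put(Bytes.toBytes("row-" + i));
                    put.addColumn(family, Bytes.toBytes("v"), payload);
                    batch.add(put);
                }
                table.put(batch);   // sends the whole batch through the REST endpoint
                table.close();
            }
        }
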
    13. Initial Iterations – 1. High CPU utilization, with REST being the top consumer; 2. GZ compression was turned on in REST; 3. Heavy GC activity in the REST and REGION processes; 4. Starvation of the REGION process by REST (busy wait for network IO). Throughput improved by 10-30% after each iteration.
    14. Initial Iterations (contd.) – 5. REST server threads waiting on network IO. We collected a TCP dump on all the nodes of the cluster; the insight from tcpdump was that the REST server was fanning out each batch request to all the region servers (REGION SERVER 1 ... 30), so the slowest region server governed the throughput. We used a SALT_BUCKET scheme to improve locality.
    15. Improvement – Throughput improved by 2.75X; measurement window = ~72 hours; avg. cluster CPU utilization = ~60%. But no scaling from a 30-node to a 60-node cluster: time to get back to the architecture.
    16. Pipeline of data ingestion (now WASB, 60 x large worker nodes, 1000+ cores; same REST request shape: batch size = 1000, row size = 1400 bytes). Could the gateway be a bottleneck at such a high ingestion rate?
    17. We Had a Gateway Bottleneck – and the guess was right! We collected perfmon data on the GW nodes; Core #0 was 100% busy. RSS is a trick to balance the DPCs across cores; performance improved, but not significantly. Both CPU and networking were bottlenecks: time to scale up the gateway VM size.
    18. Configuring a private gateway – We provisioned a custom gateway on large VMs using NGINX and confirmed that the gateway issue was indeed fixed. The throughput problem was still not solved and continued to give us new puzzles.
    19. Pipeline of data ingestion, with NGINX programmed as the gateway and load balancer (WASB, 60 x D14 worker nodes, 1040 cores). Could the customer app be a bottleneck?
    20. Pipeline of data ingestion with the HBase cluster disconnected: the NGINX gateway simply returns 200 to the client without forwarding the data.
    21. New strategy – We divided the data pipeline into two parts and debugged them in isolation: 1) Client -> Gateway [solved]; 2) REST -> Region -> WASB [unsolved]. For fast turn-around, we decided to use YCSB for debugging #2; we configured YCSB with the characteristics of the customer's workload and ran it locally inside the HBase cluster. (An illustrative YCSB invocation follows.)
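
    An illustrative YCSB load invocation of the kind used for this debugging, with placeholder values shaped to roughly match the workload (write-heavy, ~1400 bytes of field data per row). The binding name and exact parameters depend on the YCSB and HBase versions in use.

        # Illustrative only; table, column family, counts and thread count are assumptions.
        bin/ycsb load hbase10 -P workloads/workloada \
            -p table=usertable -p columnfamily=cf \
            -p recordcount=100000000 \
            -p fieldcount=14 -p fieldlength=100 \
            -threads 128
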
    22. YCSB Experiments – We suspected one of the following two: 1) REST, 2) Azure Storage. We isolated the problem by replacing Azure Storage with local SSDs and then compared the performance of REST vs. RPC. Result: REST was clearly the bottleneck!
    23. YCSB Experiments (contd.) – Root cause of the bottleneck in REST: profiling the REST servers uncovered multiple threads that were blocked on INFO/DEBUG logging. Limiting the logging to WARNING/ERROR level dramatically improved REST server performance and brought it very close to RPC. (A sketch of the fix follows.) Sample stack:
        Thread 11540: (state = BLOCKED)
        - org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) @bci=12, line=204 (Compiled frame)
        - org.apache.log4j.Category.forcedLog(java.lang.String, org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=14, line=391 (Compiled frame)
        - org.apache.log4j.Category.log(java.lang.String, org.apache.log4j.Priority, java.lang.Object, java.lang.Throwable) @bci=34, line=856 (Compiled frame)
        - org.apache.commons.logging.impl.Log4JLogger.debug(java.lang.Object) @bci=12, line=155 (Compiled frame)
        - org.apache.hadoop.hbase.rest.RowResource.update(org.apache.hadoop.hbase.rest.model.CellSetModel, boolean) @bci=580, line=225 (Compiled frame)
        - org.apache.hadoop.hbase.rest.RowResource.put(org.apache.hadoop.hbase.rest.model.CellSetModel, javax.ws.rs.core.UriInfo) @bci=60, line=318 (Interpreted frame)
        - sun.reflect.GeneratedMethodAccessor27.invoke(java.lang.Object, java.lang.Object[]) @bci=48 (Interpreted frame)
        - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
        - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=57, line=606 (Compiled frame)
        - com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=3, line=60 (Interpreted frame)
        - com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(java.lang.Object, com.sun.jersey.api.core.HttpContext) @bci=16, line=205 (Interpreted frame)
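
    The fix itself is just a logging-level change, made in production via the REST server's log4j.properties. The sketch below shows the same knob programmatically and why it helps: once the effective level is WARN, debug() calls return before ever reaching the callAppenders() lock visible in the stack above. (Logger and API names are the standard log4j 1.x ones shipped with HBase at the time.)

        import org.apache.log4j.BasicConfigurator;
        import org.apache.log4j.Level;
        import org.apache.log4j.Logger;

        public class QuietRestLogging {
            public static void main(String[] args) {
                BasicConfigurator.configure();
                // Raise the REST server's loggers to WARN so INFO/DEBUG events
                // are dropped before the synchronized callAppenders() path.
                Logger.getRootLogger().setLevel(Level.WARN);
                Logger.getLogger("org.apache.hadoop.hbase.rest").setLevel(Level.WARN);
                // This call now returns almost immediately instead of contending
                // on the appender lock.
                Logger.getLogger("org.apache.hadoop.hbase.rest").debug("dropped cheaply");
            }
        }
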
    24. YCSB Experiments (contd.) – RPC vs. REST after fixing the INFO message logging: we could saturate the SSDs at 160K requests/sec throughput, which confirmed that the bottleneck in the REST server was solved.
    25. Back to the Customer Workload – After limiting the logging level to WARN, throughput improved further by ~5.5X; this was a ~15X gain from the point where we started. The customer is happy and uses the HDInsight HBase service in production; they are able to meet their throughput goals with enough margin to scale further.
    26. Tools Utilized – System counters (CPU, memory, IO, process, etc.): Perfmon on Windows; mpstat, iostat, vmstat, sar, nload, glances on Linux. Networking: tcpdump on both. CPU profiling: kernrate, F1 sample, xperf on Windows; YourKit, jvmtop, jprof for Java processes on Linux. CPU blocking issues: xperf, concurrency visualizer, PPA on Windows; jstack on Linux. Debugging large clusters: PowerShell, Python on Windows; expect, bash, awk, Python, screen on Linux.
    27. New performance features in HBase
    28. Overcoming Storage Latency – HBase now has MultiWAL and BucketCache features, made to minimize the impact of high storage latency. Parallelism and batching are the keys to hiding write latency (MultiWAL); MultiWAL gives higher throughput with a smaller number of region nodes. We achieve 500K inserts/sec with just 8 small region nodes for an IoT customer. (A configuration sketch follows.)
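
    For reference, MultiWAL is enabled through standard hbase-site.xml properties; they are shown programmatically below only to keep the sketch self-contained, and the group count is an assumption rather than the value used for the IoT customer.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;

        public class MultiWalSketch {
            public static void main(String[] args) {
                Configuration conf = HBaseConfiguration.create();
                // Multiple write-ahead logs per region server, so WAL appends to
                // medium-latency cloud storage can proceed in parallel.
                conf.set("hbase.wal.provider", "multiwal");
                conf.setInt("hbase.wal.regiongrouping.numgroups", 4); // assumed value
                System.out.println(conf.get("hbase.wal.provider"));
            }
        }
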
    29. Overcoming Storage Latency (contd.) – What about read latency? Caching and read-ahead are the keys to overcoming read latency; cache-on-write helps applications that are temporal in nature. HDInsight VMs are backed with SSDs, and the BucketCache feature can utilize the SSD as an L2 cache. BucketCache gives a ~20X-30X gain in read performance to our customers. (A configuration sketch follows.)
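
    Similarly, putting BucketCache on the local SSD as an L2 block cache is a configuration change; the file path and sizes below are assumptions, and cache-on-write is the optional knob mentioned above for temporal access patterns.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;

        public class BucketCacheSketch {
            public static void main(String[] args) {
                Configuration conf = HBaseConfiguration.create();
                // Back the L2 block cache with a file on the VM's local SSD.
                conf.set("hbase.bucketcache.ioengine", "file:/mnt/ssd/bucketcache.data");
                conf.set("hbase.bucketcache.size", "98304");          // cache size in MB (~96 GB, assumed)
                conf.setBoolean("hbase.rs.cacheblocksonwrite", true); // cache-on-write for temporal reads
                System.out.println(conf.get("hbase.bucketcache.ioengine"));
            }
        }
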
    30. Conclusion – The performance issue was quite complex, with bottlenecks hiding at several layers and components in the pipeline. Deeper engagement with customers helped in optimizing the HDInsight HBase service; the HDI team has been actively productizing performance fixes. ADLS, MultiWAL and BucketCache help in minimizing the latency impact.
    31. Thank You!
