Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments

Jonathan Hsieh
Dima Spivak
Cloudera

Published in: Technology

  1. Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
     • Jonathan Hsieh | HBase Tech Lead @ Cloudera, Apache HBase PMC
     • Dima Spivak | HBase QE Lead @ Cloudera
     • Hadoop Summit EU, 16 Apr 2015. © Cloudera, Inc. All rights reserved.
  2. Who are we?
     • Jonathan Hsieh
       • Tech Lead, HBase Team @ Cloudera
       • Apache HBase PMC Member
       • Apache Flume founder
       • Contact: jon@cloudera.com, @jmhsieh
     • Dima Spivak
       • QE Lead, HBase Team @ Cloudera
       • Contact: dspivak@cloudera.com, @dimaspivak
  3. What is Apache HBase? Apache HBase is a consistent, low-latency, random-access, non-relational database built on Apache Hadoop.
  4. Some HBase Contributors, Users, and Providers
  5. Challenges as usage increases
     • How does one:
       • Isolate different application workloads?
       • Share datasets between different workloads?
       • Prepare for geographic redundancy and availability?
       • Manage cluster migrations?
       • Test and prototype (multi-)cluster deployments?
     • There are multiple solutions!
  6. Multiple Multi- Solutions
     • Multi-Cluster: using more than one cluster for an application.
     • Multi-Tenant: using one cluster for more than one application.
     • Multi-Container: using one machine to run [one or more] multi-node clusters.
  7. Multi-Cluster: Safety in numbers
  8. Multi-Cluster Deployments
     • Deploy multiple HBase cluster instances.
     • Motivation:
       • Isolating different workloads from each other.
       • Geographic disaster recovery, redundancy, and availability.
  9. Isolation
     • Isolation is usually done where many apps share one data center.
     • Two different workloads on the same dataset.
       • Run latency-sensitive workloads on the same set of data as an analytic MR workload.
     • Two disjoint application workloads and datasets.
       • Deploy OpenTSDB on HBase in the same data center, but as a separate cluster, to monitor the production HBase cluster.
  10. Isolation: Operational with Analytical access pattern
     • [Diagram: clusters linked by HBase replication. Labels: HBase client Get, Scan (low latency, isolated from full scans); HBase client Put, Incr, Append; bulk import; MapReduce with HBase scanner (high throughput).]
  11. Geographic Recovery, Redundancy, and Availability
     • Run multiple HBase clusters in multiple data centers.
       • Often using "podding" schemes.
     • Primarily for backups of data in case of data center outages.
     • Locality for performance.
     • Locality for compliance.
     • Availability while a data center is down.
     • Deploy with:
       • HBase replication: master-master, master-slave.
       • Multi-cluster clients.
  12. Master-Master Replication
     • Replicating data reduces the chance of data loss.
     • [Diagram: logs replicated between clusters.]
  13. HBase Multi-Cluster Client
     • High availability with eventual consistency when using replication.
     • Simple implementation.
     • Hedged operations: if the primary takes too long, go to the failover cluster.
     • Same HConnection interface, just a different factory: HConnectionManagerMultiClusterWrapper.getConnection(conf)
     • HBase.MCC to be available in Cloudera Labs. Work by Ted Malaska (Cloudera Solution Architect): https://github.com/tmalaska/HBase.MCC
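The hedged-operation idea on this slide can be sketched in a few lines. This is a minimal illustration in Python, not the HBase.MCC implementation: `hedged_call`, its parameters, and the rule for falling back on errors are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def hedged_call(primary, failover, hedge_after_s, pool):
    """Call `primary`; if it is slow or fails, also try `failover`.

    `primary` and `failover` are zero-argument callables standing in for
    the same read issued against two replicated clusters (hypothetical).
    """
    first = pool.submit(primary)
    done, _ = wait([first], timeout=hedge_after_s)
    if done and first.exception() is None:
        return first.result()  # primary answered within the hedge window

    # Primary is slow or has failed: issue the same request to the failover.
    second = pool.submit(failover)
    pending = {second} if done else {first, second}
    while pending:
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            if future.exception() is None:
                return future.result()  # first successful answer wins
    raise RuntimeError("both clusters failed")
```

Because the failover cluster is fed by replication, an answer from it may lag the primary, which is the eventual-consistency trade-off the slide mentions.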
  14. Multi-Tenant: We’re all in this together
  15. Multi-tenant deployments
     • Deploy multiple workloads on one cluster.
     • Motivation:
       • Better resource utilization.
       • Cost efficiency.
       • Simpler operations.
       • Shared data.
     • Multiple services on one cluster.
       • Running HBase, Spark, Impala, and MR on the same cluster.
  16. Security and namespaces
     • Challenges:
       • Resource management, prioritizing, and fairness.
       • Authentication and authorization.
     • Mechanisms:
       • HBase Security: authentication, and authorization for commands via ACLs.
       • Namespaces: isolate administrative domains for ACLs.
       • Proxy impersonation: Thrift proxy doAs and REST proxy doAs.
  17. Request Throttling
     • Idea: some tables or users get a limited budget of ops or throughput, while others do not.
     • Multiple workloads on one dataset.
       • Production/real-time user: unthrottled.
       • Analytic/ad hoc workloads user: throttled.
     • Caveat: if all users are throttled, we may not use all machine resources.
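One common way to give a user a limited budget of operations is a token bucket. The sketch below assumes that technique for illustration; it does not show how HBase implements throttling, and `TokenBucket`, `Throttler`, and the user names are hypothetical.

```python
class TokenBucket:
    """Refills at `rate_per_s` tokens per second, up to `burst` tokens."""

    def __init__(self, rate_per_s, burst, now=0.0):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.last = now

    def try_acquire(self, now, cost=1):
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class Throttler:
    """Users with a bucket are throttled; users without one are not."""

    def __init__(self):
        self.limits = {}  # user name -> TokenBucket

    def set_limit(self, user, rate_per_s, burst, now=0.0):
        self.limits[user] = TokenBucket(rate_per_s, burst, now)

    def admit(self, user, now, cost=1):
        bucket = self.limits.get(user)
        return True if bucket is None else bucket.try_acquire(now, cost)
```

The caller passes the clock in as `now`, which keeps the sketch deterministic. The slide's caveat is visible here: a throttled user is refused once its bucket is empty even if the machines are otherwise idle.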
  18. Request Scheduling
     • Idea: gets should have high priority, while scans should get deprioritized the more they are used (HBASE-10994).
     • Multiple workloads on one dataset.
       • Production real-time gets: immediately scheduled.
       • Analytic scan workloads: delay scheduled.
     • All resources are used.
     • Caveat: requires manual tuning.
     • [Diagram: request queues. Before: new requests delayed by long scan requests. After: rescheduled so new requests get priority.]
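A deadline-ordered queue is one way to picture "scans get deprioritized the more they are used". In the sketch below every call on the same scanner is pushed further into the future, while gets are due immediately. The square-root growth, the cap, and all names are illustrative assumptions, not details taken from the slide or confirmed against HBASE-10994.

```python
import heapq
import itertools
import math


class DeadlineScheduler:
    """Serve requests in order of deadline; heavy scanners get later deadlines."""

    def __init__(self, max_delay=5.0, weight=1.0):
        self.max_delay = max_delay  # cap on the added delay (assumed knob)
        self.weight = weight        # how fast the delay grows (assumed knob)
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order
        self._scan_calls = {}          # scanner id -> calls seen so far

    def submit(self, now, kind, name, scanner_id=None):
        delay = 0.0
        if kind == "scan":
            calls = self._scan_calls.get(scanner_id, 0)
            self._scan_calls[scanner_id] = calls + 1
            delay = min(self.max_delay, math.sqrt(calls * self.weight))
        heapq.heappush(self._heap, (now + delay, next(self._seq), name))

    def pop(self):
        """Return the name of the request with the earliest deadline."""
        return heapq.heappop(self._heap)[2]
```

Nothing is ever rejected, so all resources stay in use; `max_delay` and `weight` are the kind of values the slide's "requires manual tuning" caveat refers to.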
  19. Performance Isolation inside a cluster
     • Region Server Groups (under review).
       • Limit the performance impact that load on one table has on others (HBASE-6721).
     • Multiple workloads on multiple datasets on one HBase cluster.
       • Two separate apps on one cluster.
     • [Diagram: mixed workload versus isolated workload.]
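The effect of Region Server Groups can be pictured as a placement rule: a table's regions may land only on the servers in that table's group. The sketch below is a toy model of that rule under assumed names (`GroupAssigner`, `rs1` to `rs4`, the `default` group); it is not the HBASE-6721 API, which was still under review at the time of the talk.

```python
class GroupAssigner:
    """Toy model: regions of a table are placed only on its group's servers."""

    def __init__(self, groups, table_group, default_group="default"):
        self.groups = groups            # group name -> list of server names
        self.table_group = table_group  # table name -> group name
        self.default_group = default_group

    def servers_for(self, table):
        group = self.table_group.get(table, self.default_group)
        return self.groups[group]

    def assign(self, table, region_index):
        # Round-robin over the servers in the table's group.
        servers = self.servers_for(table)
        return servers[region_index % len(servers)]
```

With an `orders` table pinned to a `realtime` group, a heavy scan on any table in the default group never shares a region server with `orders`.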
  20. Service Isolation
     • Today, the easiest strategy for isolating a latency-sensitive HBase deployment from other services is static partitioning.
     • Future:
       • Improve IO isolation via YARN/Slider/Mesos.
       • Separate HBase actions into separate processes.
         • e.g. externalize compaction for better resource management.
     • [Diagram: in the multi-service deployment, every node runs YARN NM/MR, HBase RS, impalad, and HDFS DN. In the statically partitioned deployment, some nodes run HBase RS and HDFS DN, and the others run YARN NM/MR, impalad, and HDFS DN.]
  21. Multi-Container: My name is Jonah
  22. Multi-container deployments
     • Run a distributed HBase cluster on a single host.
     • Testing applications.
     • Use cases requiring quick cluster stand-up.
  23. Linux containers
     • cgroups (Linux kernel 2.6.24+).
     • Isolating resources (memory, CPU, networking).
     • Namespace isolation (filesystems, process trees).
  24. Virtual Machines vs Linux Containers
     • [Diagram: with virtual machines, a hypervisor on the host operating system runs several guest OSes, each with its own libraries and user processes. With containers, user processes run directly on the host operating system and its libraries.]
  25. Docker
     • User front-end for containers.
     • Container management (start, stop, pause).
       • docker run
     • Images (templates for containers).
       • docker commit
     • Registries (repository for images).
       • docker push
  26. Integration testing
     • Automate long-running tests from the hbase-it module.
       • $ hbase org.apache.hadoop.hbase.IntegrationTest…
     • Integration with the fault injection framework (Chaos Monkey).
  27. Starting container cluster
     • DNS server: dnsserver (10.0.0.2)
     • Node node-1 (10.0.0.3): Master
     • Node node-2 (10.0.0.4): Slave
     • Node node-3 (10.0.0.5): Slave
     • Node node-4 (10.0.0.6): Slave
     • Start cluster
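The layout on this slide (a DNS container first, then nodes on consecutive addresses) can be written down as a small planning function. This is a sketch only: `plan_cluster` is a hypothetical helper, and treating node-1 as the master with the rest as slaves is read off the diagram rather than stated in the slide text. It does not start any containers.

```python
import ipaddress


def plan_cluster(num_nodes, first_ip="10.0.0.2"):
    """Return the containers to start, in order, with name, IP, and role."""
    base = ipaddress.ip_address(first_ip)
    # The DNS container comes first so nodes can resolve each other by name.
    plan = [{"name": "dnsserver", "ip": str(base), "role": "dns"}]
    for i in range(1, num_nodes + 1):
        role = "master" if i == 1 else "slave"
        plan.append({"name": f"node-{i}", "ip": str(base + i), "role": role})
    return plan
```

Calling `plan_cluster(4)` reproduces the five containers in the diagram; each entry could then be handed to whatever tool actually launches the containers.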
  28. Automation
     • Replace fragile infrastructure.
     • Set up a distributed cluster as part of test execution.
  29. In progress
     • Extend this workflow to upstream Apache HBase (HBASE-12721).
     • Upstream integration testing (builds.apache.org).
     • Multi-cluster use cases (e.g. MCC, replication).
     • Upgrades.
  30. Conclusions: Multi multi multi
  31. Summary
     • Fancy table that summarizes our talk:
       Goal | Multi-Cluster | Multi-Tenant | Multi-Container
       Isolate workloads | One cluster per workload. | Region Server Groups. | cgroups.
  32. Summary
     • Fancy table that summarizes our talk:
       Goal | Multi-Cluster | Multi-Tenant | Multi-Container
       Isolate workloads | One cluster per workload. | Region Server Groups. | cgroups.
       Multiple workloads on same dataset (real-time vs analytic workload) | Separate cluster per workload. | Request throttling, request scheduling. | Containers as “VMs” or microservices.
  33. Summary
     • Fancy table that summarizes our talk:
       Goal | Multi-Cluster | Multi-Tenant | Multi-Container
       Isolate workloads | One cluster per workload. | Region Server Groups. | cgroups.
       Multiple workloads on same dataset (real-time vs analytic workload) | Separate cluster per workload. | Request throttling, request scheduling. | Containers as “VMs” or microservices.
       Reliability and availability | Disaster recovery, master-master replication, multi-cluster client. | Multiple tables with Region Server Groups. | More realistic testing.
  34. Summary
     • Fancy table that summarizes our talk:
       Goal | Multi-Cluster | Multi-Tenant | Multi-Container
       Isolate workloads | One cluster per workload. | Region Server Groups. | cgroups.
       Multiple workloads on same dataset (real-time vs analytic workload) | Separate cluster per workload. | Request throttling, request scheduling. | Containers as “VMs” or microservices.
       Reliability and availability | Disaster recovery, master-master replication, multi-cluster client. | Multiple tables with Region Server Groups. | More realistic testing.
       Cost savings | Disaster recovery. | One cluster, multiple use cases. | One machine, multiple nodes.
  35. Futures
     • We are seeing more and more deployments that are multi-cluster and/or multi-tenant.
       • Traditional workflows are giving way to hybrid ones.
       • More knobs to turn to optimize for performance and value.
     • Multi-container deployments are a way forward to make prototyping and testing these deployments easier.
  36. Thank you!
  37. HBaseCon 2015 is Coming! Thurs., May 7, in San Francisco. Presentations from the world’s biggest HBase operators: Bloomberg, Dropbox, eBay, Facebook, Google, Pinterest, Xiaomi, Yahoo!, and more! Seats are limited; register at hbasecon.com.