HBaseCon 2015: Multitenancy in HBase


Since 2013, Yahoo! has been successfully running multi-tenant HBase clusters. Our tenants run applications ranging from real-time processing (e.g. content personalization, ad targeting) to operational warehouses (e.g. advertising, content). Tenants are guaranteed an adequate level of resource isolation and security. This is achieved through open source and in-house developed HBase features such as region server groups, group-based replication, and group-based favored nodes.

Today, with growing adoption and new use cases, we are working towards scaling our HBase clusters to support petabytes of data without compromising performance or operability. A common tradeoff when scaling a cluster to this size is to increase the region size, which avoids having too many regions on the cluster. However, large regions hurt both performance and operability, mainly because region size determines (1) the granularity of load distribution and (2) the amount of write amplification due to compaction. We are therefore working towards enabling an HBase cluster to host at least a million regions.
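
To make the region-size tradeoff concrete, here is a small arithmetic sketch of how region count grows as region size shrinks. It uses the ~2.3PB table and the 80GB and 20GB region sizes mentioned in the slides; decimal units (1 GB = 10^9 bytes) and the 10GB data point are assumptions added for illustration.

```python
GB = 10**9   # assumption: decimal units
PB = 10**15


def region_count(table_bytes: int, region_bytes: int) -> int:
    """Number of regions needed to hold table_bytes, rounding up."""
    return -(-table_bytes // region_bytes)


table_bytes = 23 * PB // 10  # ~2.3 PB, the table size quoted in the slides

for region_gb in (80, 20, 10):
    count = region_count(table_bytes, region_gb * GB)
    print(f"{region_gb} GB regions -> {count} regions")
```

Under these assumptions, moving from 80GB to 20GB regions quadruples the region count for a single table, which is why smaller regions push a petabyte-scale cluster towards the million-region range.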

In this presentation, we will walk through the key features we have implemented as well as share our experiences working on multi-tenancy and scaling the cluster.

Published in: Software

  1. 1. HBase Scale and Multi-tenancy @ Y! ▪ Presented by Francis Liu, Vandana Ayyalasomayajula, Virag Kothari
  2. 2. Outline ▪ HBase @ Y! ▪ Group Favored Nodes ▪ Scaling to 1M Regions and beyond
  3. 3. Y! Grid ▪ Off-Stage Processing ▪ Hosted Service ▪ Multi-tenant
  4. 4. Y! HBase ▪ Hosted Multi-tenant Service ▪ Isolation › Isolated Deployment › Region Server Groups › Namespace ▪ Security › ACLs › Audit Logging ▪ Cross-Colo Replication
  5. 5. Isolated Deployment [Diagram: a Gateway/Launcher with HBase clients and an MR client; a Compute Cluster with JobTracker, Namenode, and TaskTracker/DataNode hosts running M/R tasks; an HBase Cluster with HBase Master, Zookeeper Quorum, Namenode, and RegionServer/DataNode hosts]
  6. 6. Region Server Groups - Overview ▪ Member Tables ▪ Resource Isolation ▪ Flexibility with configuration [Diagram: Group Foo = Region Servers 1…4 (RS1…RS4), each hosting Table1 and Table2; Group Bar = Region Servers 5…8 (RS5…RS8), each hosting Table3 and Table4; Configs]
  7. 7. Region Server Groups - Implementation [Diagram: HMaster with LoadBalancer, GroupBasedLoadBalancer (filter by group: foo, bar), GroupAdminEndpoint, GroupMasterObserver, and GroupInfoManager backed by a Group Table and a Group ZNode]
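
The "filter by group" step on this slide can be pictured as narrowing the candidate servers for a table to the members of that table's group before any balancing happens. The following is a minimal sketch of that idea only; `GroupInfo` and `servers_for_table` here are hypothetical names for illustration, not the actual HBase classes or signatures.

```python
from typing import Iterable, List, Set


class GroupInfo:
    """Hypothetical record of one region server group."""

    def __init__(self, name: str, servers: Iterable[str], tables: Iterable[str]):
        self.name = name
        self.servers = set(servers)
        self.tables = set(tables)


def servers_for_table(table: str, groups: List[GroupInfo], online: Set[str]) -> List[str]:
    """Return the online servers that belong to the group owning `table`."""
    for group in groups:
        if table in group.tables:
            return sorted(s for s in group.servers if s in online)
    raise KeyError(f"table {table!r} is not a member of any group")


groups = [
    GroupInfo("foo", ["rs1", "rs2", "rs3", "rs4"], ["Table1", "Table2"]),
    GroupInfo("bar", ["rs5", "rs6", "rs7", "rs8"], ["Table3", "Table4"]),
]
print(servers_for_table("Table3", groups, online={"rs1", "rs5", "rs6"}))
```

A balancer that only ever sees this filtered list cannot place a table's regions outside its group, which is the isolation property the slides describe.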
  8. 8. Namespace ▪ Analogous to Database ▪ Full Table Name: <table namespace>:<table name> ▪ e.g. my_ns:my_table ▪ Reserved namespaces › default – tables with no explicit namespace › hbase – system tables (e.g. hbase:meta, hbase:acl, etc.) ▪ Table Path: /<hbaseRoot>/data/<namespace>/<tableName>
  9. 9. Namespace ▪ Default Region Server Group ▪ Quota › Max Tables › Max Regions ▪ Per Tenant
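
The naming and path rules from the two Namespace slides can be sketched in a few lines. This is an illustration of the conventions stated above (the `<namespace>:<table>` name, the `default` namespace, and the `/<hbaseRoot>/data/<namespace>/<tableName>` path), not HBase's own code; the function names and the `NamespaceQuota` class are hypothetical.

```python
from typing import Tuple

DEFAULT_NAMESPACE = "default"


def split_table_name(full_name: str) -> Tuple[str, str]:
    """Split '<namespace>:<table>'; names without ':' fall into 'default'."""
    namespace, sep, table = full_name.partition(":")
    if not sep:
        return DEFAULT_NAMESPACE, full_name
    return namespace, table


def table_path(hbase_root: str, full_name: str) -> str:
    """Build /<hbaseRoot>/data/<namespace>/<tableName> for a table."""
    namespace, table = split_table_name(full_name)
    return f"{hbase_root.rstrip('/')}/data/{namespace}/{table}"


class NamespaceQuota:
    """Hypothetical per-namespace limits: max tables and max regions."""

    def __init__(self, max_tables: int, max_regions: int):
        self.max_tables = max_tables
        self.max_regions = max_regions

    def allows_new_table(self, tables: int, regions: int, new_regions: int) -> bool:
        """True if one more table with new_regions regions stays within limits."""
        return (tables + 1 <= self.max_tables
                and regions + new_regions <= self.max_regions)


print(table_path("/hbase", "my_ns:my_table"))
print(table_path("/hbase", "plain_table"))
```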
  10. 10. Replication ▪ Sinks are randomly picked ▪ Sources recover any queue ▪ Shared RPC Quality of Protection config source:
  11. 11. Replication + Group ▪ Region Server Group Aware ▪ Rule based API › Source: {namespace},[Table], [CF] › Slave: {Peer} › Effective Time [Diagram: a source cluster with Group Foo (Table1, Table2) and Group Bar, replicating to a peer cluster with Group Foo (Table1, Table2)]
  12. 12. Replication + Thrift ▪ Encryption via SASL ▪ 0.94 <-> 0.96+ interoperability
  13. 13. Favored Nodes ▪ What are Favored Nodes? › While writing data, we can pass a set of preferred hosts to the HDFS client to replicate data. › Preferred hosts => "Favored Nodes" › Usually 3 hosts: primary, secondary, tertiary. › Constraint: primary host on one rack, secondary and tertiary hosts on a different rack. ▪ Favored Nodes of regions are scattered across various groups. › No guarantees about data locality within a region server group.
  14. 14. Example [Diagram: RS Group - A with RS1…RS4 and DN1…DN4; RS Group - B with RS5…RS8 and DN5…DN8]
  15. 15. Example ▪ Locality is lost when region server RS1 dies. [Diagram: RS Group - A with RS2…RS4 and DN1…DN4, RS1 marked as dead; RS Group - B with RS5…RS8 and DN5…DN8]
  16. 16. Group Aware Favored Nodes ▪ Fix the data locality problem by › choosing favored nodes within the region server group › assigning regions only to favored nodes [Diagram: RS Group - A with RS1…RS4 and DN1…DN4; RS Group - B with RS5…RS8 and DN5…DN8]
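
As an illustration of group-aware favored node selection, the sketch below picks a primary, secondary, and tertiary host from the servers of a single group while honoring the rack constraint stated earlier. It assumes the secondary and tertiary share one rack that differs from the primary's rack, which is one reading of the slide; `pick_favored_nodes` and the host-to-rack mapping are hypothetical and not the HBase implementation.

```python
import random
from typing import Dict, List, Optional


def pick_favored_nodes(group_hosts: Dict[str, str],
                       rng: Optional[random.Random] = None) -> List[str]:
    """Pick [primary, secondary, tertiary] from one group's host -> rack map."""
    rng = rng or random.Random()

    # Index the group's hosts by rack.
    by_rack: Dict[str, List[str]] = {}
    for host, rack in group_hosts.items():
        by_rack.setdefault(rack, []).append(host)

    # A valid choice needs a primary rack plus a different rack that can
    # hold both the secondary and the tertiary (at least two hosts).
    rack_pairs = [
        (primary_rack, other_rack)
        for primary_rack in by_rack
        for other_rack, hosts in by_rack.items()
        if other_rack != primary_rack and len(hosts) >= 2
    ]
    if not rack_pairs:
        raise ValueError("group needs two racks, one with at least two hosts")

    primary_rack, other_rack = rng.choice(rack_pairs)
    primary = rng.choice(by_rack[primary_rack])
    secondary, tertiary = rng.sample(by_rack[other_rack], 2)
    return [primary, secondary, tertiary]


group_a = {"rs1": "rack1", "rs2": "rack1", "rs3": "rack2", "rs4": "rack2"}
print(pick_favored_nodes(group_a, random.Random(0)))
```

Because every candidate comes from the group's own host map, a region's block replicas stay on DataNodes inside its region server group, so reassigning the region to another favored node keeps its data local.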
  17. 17. FavoredGroupLoadBalancer ▪ Region server group aware ▪ Region assignment on favored nodes ▪ Region balancing done using Stochastic Load Balancer ▪ Favored Node Management › Generate favored nodes for regions › Favored nodes are inherited during region split/merge events. › Favored nodes do not change unless required.
  18. 18. Favored Node Management APIs ▪ Redistribute › Ability to expand region block replicas to newly added nodes. › Change favored nodes of regions such that replicas spread to newly added nodes [Diagram: RS Group - A with RS1…RS4 and DN1…DN4, before and after redistribute, with the newly added node RS5/DN5]
  19. 19. Favored Node Management APIs ▪ Complete_Redistribute › Ability to recreate the entire set of favored nodes in a balanced fashion › Balances the replica load evenly among all the nodes [Diagram: RS Group - A with RS1…RS4 and DN1…DN4, before and after complete redistribute, highlighting the host with the least number of replicas]
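
One way to picture a complete redistribute is a greedy pass that rebuilds every region's favored nodes from scratch, always choosing the hosts that currently carry the fewest replicas. The sketch below shows only that balancing idea; it ignores the rack constraint for brevity, and `complete_redistribute` is a hypothetical function, not the API behind the slide.

```python
import heapq
from typing import Dict, List


def complete_redistribute(regions: List[str], hosts: List[str],
                          replicas: int = 3) -> Dict[str, List[str]]:
    """Rebuild favored nodes for all regions, least-loaded hosts first."""
    if len(hosts) < replicas:
        raise ValueError("not enough hosts for the requested replica count")

    load = {host: 0 for host in hosts}  # replicas currently placed per host
    assignment: Dict[str, List[str]] = {}
    for region in regions:
        # Take the `replicas` hosts with the fewest replicas so far;
        # ties are broken by host name to keep the result deterministic.
        chosen = heapq.nsmallest(replicas, hosts, key=lambda h: (load[h], h))
        for host in chosen:
            load[host] += 1
        assignment[region] = chosen
    return assignment


hosts = ["rs1", "rs2", "rs3", "rs4", "rs5"]
regions = [f"region-{i}" for i in range(10)]
plan = complete_redistribute(regions, hosts)
for region in regions[:3]:
    print(region, plan[region])
```

With this greedy rule the per-host replica counts never differ by more than one, so a newly added host ends up with the same share of replicas as the existing ones.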
  20. 20. Enhancements ▪ Improvements to Stochastic Load Balancer (HBASE-13376) ▪ Improvements to Region Placement Maintainer Tool › Ability to view locality of a region on each of its favored nodes. › Ability to view primary, secondary and tertiary node distribution of region servers. ▪ Hadoop JIRAs › HDFS-7300 › HDFS-7795 ▪ Configuration changes made on the Hadoop side › Set "dfs.namenode.replication.considerLoad" to false in small clusters
  21. 21. Scaling to 1M and beyond (HBASE-11165) ▪ Store Petabytes of data ▪ Support mixed workload (batch and near real-time) ▪ Performance › Latency, throughput ▪ Operability › Load balancing, compactions, etc.
  22. 22. Experience at Scale ▪ Web Crawl Cache › ~2.3PB Table › 80GB regions -> 20GB regions › Batch workload ▪ Hot Regions ▪ Large compactions (Write amplification) ▪ Longer failover time ▪ Less Parallel/Imbalanced MapReduce Tasks ▪ Large MapReduce tasks
  23. 23. Scaling Region Count ▪ Master Region Management › Creation, Assign, Balance, etc. › Meta table ▪ Metadata › HDFS scalability › Zookeeper › Region Server density
  24. 24. Observations ▪ Assignment › ZK assignment - complex and more storage › High CPU usage on master ▪ Single hot meta › 7GB in size for 1M regions › Master writing at 400 ops/second › Longer scanning times ▪ HDFS › Longer directory creation time [Diagram: Master, Zookeeper, and region servers hosting the meta region and user regions, with assignment communication and write ops marked]
  25. 25. Enhancements - Assignment ▪ Assignment › ZK-less assignment (HBASE-11059) › Simpler › No involvement of ZK › Unlock region states (HBASE-11290) [Diagram: Master and region servers hosting user regions and the meta region]
  26. 26. Enhancements - Split Meta ▪ Split meta (HBASE-11288) › Distributed IO load › Distributed caching › Shorter scan time › Distributed compaction [Diagram: Master and several region servers, each hosting a meta region alongside user regions]
  27. 27. Enhancements - Hierarchical region dir ▪ Scaling namenode operations - Table dir has millions of region files ▪ Approach - Buckets within table directory ▪ E.g. 3 letters of bucket names gives 4k buckets ▪ Region dir creation time - 4k buckets:
      Table type        1M regions        5M                  10M
      normal table      20 mins           4 hours 23 minutes  Doesn't finish
      humongous table   15 mins 48 secs   1 hour 27 minutes   2hr 53 minutes
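
The bucket arithmetic on this slide works out if bucket names are three hexadecimal characters, since 16^3 = 4096. The sketch below assumes exactly that, and further assumes the bucket is taken from the first three characters of an MD5-based encoded region name; both assumptions and the resulting path layout are illustrative guesses, not the layout confirmed by the slides.

```python
import hashlib

BUCKET_CHARS = 3                  # "3 letters of bucket names"
NUM_BUCKETS = 16 ** BUCKET_CHARS  # 4096 if each letter is a hex digit (assumption)


def encoded_name(region_name: str) -> str:
    """Assumed encoding: hex MD5 digest of the region name."""
    return hashlib.md5(region_name.encode("utf-8")).hexdigest()


def region_dir(table_dir: str, region_name: str) -> str:
    """Hypothetical layout: <table_dir>/<bucket>/<encoded region name>."""
    encoded = encoded_name(region_name)
    bucket = encoded[:BUCKET_CHARS]
    return f"{table_dir}/{bucket}/{encoded}"


print(NUM_BUCKETS)
print(region_dir("/hbase/data/my_ns/my_table", "my_table,row-0001,1430000000000."))
```

With 1M regions spread over 4096 buckets, each bucket directory holds about 244 region directories on average instead of one table directory holding all of them.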
  28. 28. HBaseCon 2014 Thank You! (We’re Hiring)