Since 2013, Yahoo! has been successfully running multi-tenant HBase clusters. Our tenants run applications ranging from real-time processing (e.g., content personalization, ad targeting) to operational warehouses (e.g., advertising, content). Each tenant is guaranteed an adequate level of resource isolation and security. We achieve this through a combination of open source and in-house HBase features such as region server groups, group-based replication, and group-based favored nodes.
Today, with growing adoption and new use cases, we are working towards scaling our HBase clusters to support petabytes of data without compromising performance or operability. A common tradeoff when scaling a cluster to this size is to increase the region size, which avoids the problem of having too many regions on a cluster. However, large regions hurt both the performance and the operability of a cluster, mainly because region size determines two things: (1) the granularity of load distribution, and (2) the amount of write amplification caused by compaction. Keeping regions small at this scale means hosting far more of them, so we are working towards enabling a single HBase cluster to host at least a million regions.
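To make the tradeoff between region size and region count concrete, here is a rough back-of-the-envelope sketch. The data volume (1 PiB) and the two region sizes (10 GiB and 1 GiB) are illustrative assumptions, not figures from our clusters, and `region_count` is a hypothetical helper rather than an HBase API.

```python
GIB = 1024 ** 3
PIB = 1024 ** 5


def region_count(total_bytes: int, region_bytes: int) -> int:
    """Number of regions needed to hold total_bytes, rounded up."""
    # Ceiling division on integers avoids floating-point rounding.
    return -(-total_bytes // region_bytes)


# Assumed, illustrative sizes: one pebibyte of data, two candidate region sizes.
large_regions = region_count(PIB, 10 * GIB)
small_regions = region_count(PIB, 1 * GIB)

print(f"10 GiB regions: {large_regions:,}")
print(f" 1 GiB regions: {small_regions:,}")
```

Under these assumptions, shrinking regions from 10 GiB to 1 GiB raises the region count tenfold, to roughly one million, which is the scale targeted above.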
In this presentation, we will walk through the key features we have implemented and share our experience with multi-tenancy and with scaling the cluster.