Keynote: Apache HBase at Yahoo! Scale

Yahoo has long been involved in HBase and its community. In 2013, HBase was offered as a hosted service at Yahoo. Since then, adoption has grown rapidly, and today HBase is used by numerous teams across the company, enabling a diverse set of use cases ranging from near real-time processing to data warehousing.
This was made possible by HBase along with some enhancements to support multi-tenancy and scale. As our clusters continue to grow and use cases become more demanding, we are working toward supporting a million regions in a single cluster.
In this keynote, we paint a picture of where Yahoo! is today, the enhancements we have been working on to reach today’s scale, and the work needed to support a million regions and beyond.

  1. 1. Apache HBase at Yahoo Scale: Pushing the Limits (Francis Liu, HBase @ Yahoo)
  2. 2. HBase @
  3. 3. HBase @ Y! • Hosted multi-tenant clusters • 3 Production • 3 Sandbox • HBase-only • Off-stage use cases • Internal 0.98 releases • Security [Diagram: a compute cluster (Resource Mgr, Namenode, TaskTracker/Node Mgr, DataNodes, M/R tasks) alongside an HBase cluster (HBase Master, Zookeeper quorum, RegionServers co-located with DataNodes); HBase clients run in M/R tasks and on a gateway/launcher, with HTTP clients going through a REST proxy]
  4. 4. Workload Jungle
  5. 5. Multi-tenancy
  6. 6. Multi-tenancy at Scale • 35 Tenants • 800 RegionServers • 300k regions • RS Peak 115k requests/sec
  7. 7. Divide and Conquer RS RS…Group A RS RS RS…Group B RS RS RS…Group C RS RS RS…Group D RS RS RS…Group E RS
  8. 8. RegionServer Groups • Group membership • Table • RegionServer • Coarse isolation • Group customization • Namespace integration
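The divide-and-conquer idea behind RegionServer groups can be sketched in a few lines: tables and servers are each mapped to a named group, and a region may only be assigned to a server in its table's group, giving coarse isolation between tenants. This is an illustrative model, not the actual HBase rsgroup implementation; the class, method names, and hash-based placement are assumptions for the example.

```python
import zlib

# Illustrative sketch of RegionServer grouping (not HBase's rsgroup code):
# tables and servers map to named groups, and regions of a table may only
# be assigned to servers in that table's group.
class GroupedAssigner:
    def __init__(self):
        self.server_group = {}   # server -> group name
        self.table_group = {}    # table  -> group name

    def add_server(self, server, group):
        self.server_group[server] = group

    def add_table(self, table, group):
        self.table_group[table] = group

    def candidates(self, table):
        """Servers eligible to host regions of `table`."""
        group = self.table_group[table]
        return [s for s, g in sorted(self.server_group.items()) if g == group]

    def assign(self, region, table):
        servers = self.candidates(table)
        if not servers:
            raise RuntimeError("no live servers in group")
        # deterministic spread of regions across the group's servers
        return servers[zlib.crc32(region.encode()) % len(servers)]
```

Because the candidate set never crosses a group boundary, a noisy tenant in group A cannot push load onto group B's servers, which is the coarse isolation the slide refers to.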
  9. 9. Multi-tenancy at Scale • 800 RegionServers • 40 namespaces • 40 Region server groups • 4 to 100s of servers • Up to 2000+ regions per server • ~1 week rolling upgrade
  10. 10. Scaling to 10s of PBs (and Beyond) • Scale to millions of regions (and beyond) • Avoid large regions • Data locality • Network utilization • Datanode load • Performance
  11. 11. Filesystem Layout • Region directories under table directory • HDFS data structure bottleneck • Namenode hard limit of ~6.7 million entries
  12. 12. Filesystem Layout [Chart: create-file operations for a 5M-region table]
  13. 13. Filesystem Layout • Hierarchical table layout
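The hierarchical layout can be sketched as follows: instead of placing every region directory directly under the table directory, region directories are spread across a fixed set of hashed bucket directories, so no single directory ever holds millions of entries. The bucket count, hash choice, and paths here are illustrative assumptions, not HBase's actual on-disk layout.

```python
import hashlib

NUM_BUCKETS = 256  # illustrative; one hex-pair prefix = 256 buckets

def flat_path(table, region):
    # Flat layout: every region directory shares one parent directory.
    return f"/hbase/data/{table}/{region}"

def hierarchical_path(table, region):
    # Hierarchical layout: an extra level of hashed bucket directories
    # keeps each directory's entry count small.
    bucket = hashlib.md5(region.encode()).hexdigest()[:2]
    return f"/hbase/data/{table}/{bucket}/{region}"
```

With five million regions, the flat layout puts all five million directories under one parent (approaching the namenode's ~6.7M hard limit), while 256 buckets hold roughly 19.5k entries each.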
  14. 14. Performance Comparison: region directory creation time

          Test         | 1M Regions      | 5M Regions      | 10M Regions
          Normal Table | 20 mins         | 4 hours 23 mins | DNF
          Humongous    | 15 mins 48 secs | 1 hour 27 mins  | 2 hours 53 mins
  15. 15. ZK Region Assignment ▪ Lock thrashing ▪ ZK bottlenecks › List/mutate millions of znodes › Notification firehose ▪ State is kept in 3 places › Cached in Master › Zookeeper › Meta [Diagram: Master and RegionServers coordinating assignment of Region 1 and Region 2 through Zookeeper and the Meta region]
  16. 16. ZKLess Region Assignment ▪ ZK no longer involved ▪ Master approves all assignments ▪ State is persisted only in Meta ▪ State is updated by the Master [Diagram: Master assigning Region 1 and Region 2 to RegionServers, persisting state directly in the Meta region]
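The single-source-of-truth design can be sketched as a tiny state machine: region state lives in exactly one place (the meta table) and only the Master writes it, so the three copies (master cache, Zookeeper, meta) that previously had to be kept consistent collapse to one. The state names loosely follow HBase region states, but the classes and transition table are illustrative assumptions, not the real AssignmentManager.

```python
# Legal region-state transitions (simplified, illustrative).
VALID = {
    "OFFLINE": {"OPENING"},
    "OPENING": {"OPEN", "OFFLINE"},
    "OPEN":    {"CLOSING"},
    "CLOSING": {"OFFLINE"},
}

class Meta:
    """Stand-in for the meta table: the single source of truth."""
    def __init__(self):
        self.states = {}

class Master:
    """Only the Master mutates region state, and only in meta."""
    def __init__(self, meta):
        self.meta = meta

    def transition(self, region, new_state):
        cur = self.meta.states.get(region, "OFFLINE")
        if new_state not in VALID[cur]:
            raise ValueError(f"illegal transition {cur} -> {new_state}")
        self.meta.states[region] = new_state  # persisted once, in meta
```

With no znodes to list or mutate per region, the notification firehose and lock thrashing on Zookeeper disappear, which matches the latency numbers on the next slide.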
  17. 17. Performance Comparison: assignment time for 1 million regions

          Test              | Latency
          ZK                | 1 hr 16 mins
          ZK w/o force-sync | 11 mins
          ZKLess            | 11 mins
  18. 18. Single Meta Region ▪ Meta not splittable ▪ Large compactions ▪ Longer failover times
  19. 19. Splittable Meta Table ▪ Scale Horizontally › I/O load › Caching › RPC Load
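Splitting meta amounts to range-partitioning it on the region row key, like any other splittable table, so its I/O, caching, and RPC load spread across many region servers. The sketch below models that routing; the split keys and in-memory dicts are illustrative stand-ins for real meta regions.

```python
import bisect

# Sketch of a splittable meta table: meta is range-partitioned on the
# region row key instead of living in one unsplittable region.
class ShardedMeta:
    def __init__(self, split_keys):
        self.split_keys = sorted(split_keys)            # shard boundaries
        self.shards = [dict() for _ in range(len(split_keys) + 1)]

    def _shard(self, row_key):
        # Binary-search the boundaries to find the owning meta region.
        return bisect.bisect_right(self.split_keys, row_key)

    def put(self, row_key, state):
        self.shards[self._shard(row_key)][row_key] = state

    def get(self, row_key):
        return self.shards[self._shard(row_key)].get(row_key)
```

Each shard can then be hosted, cached, and compacted independently, which is what drives the scan-meta improvement in the comparison on the next slide.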
  20. 20. Performance Comparison: assignment time for 3 million regions

          Config         | Scan Meta | Assignment | Total
          1 Meta / 1 RS  | 56 min    | 19.79 min  | 75.79 min
          1 Meta / 1 RS  | 58.63 min | 28.16 min  | 86.79 min
          32 Meta / 3 RS | 2.91 min  | 12.56 min  | 15.47 min
          32 Meta / 3 RS | 3.6 min   | 12.54 min  | 16.4 min
  21. 21. Data Locality ▪ HDFS › Hadoop Distributed Filesystem ▪ Region Server › Serves Regions › Locality of a Region’s Data blocks
  22. 22. Favored Nodes ▪ HDFS › Dictate block placement on file creation ▪ HBase › Partially completed in Apache HBase › Select 3 favored nodes per Region › 1 node on-rack, 2 nodes off-rack › Restrict Region Assignment
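The selection rule above (one on-rack node, two off-rack nodes per region) can be sketched like this; the data structures and the random choice policy are assumptions for illustration, not HBase's exact favored-nodes algorithm.

```python
import random

def pick_favored_nodes(primary_rack, nodes_by_rack, rng=random):
    """Pick 3 favored datanodes for a region: 1 on the primary
    RegionServer's rack, 2 on other racks.  Favoring these nodes on
    file creation keeps a local replica available even after failures
    or region moves."""
    on_rack = list(nodes_by_rack.get(primary_rack, []))
    off_rack = [n for rack, nodes in nodes_by_rack.items()
                if rack != primary_rack for n in nodes]
    if not on_rack or len(off_rack) < 2:
        raise ValueError("not enough nodes to satisfy placement")
    return [rng.choice(on_rack)] + rng.sample(off_rack, 2)
```

Restricting region assignment to a region's favored nodes then preserves data locality even through the fault scenarios compared on the next slide.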
  23. 23. Favored Nodes – Fault Testing [Chart: locality under fault injection, control vs. favored nodes]
  24. 24. THANK YOU Icon Courtesy – iconfinder.com (under Creative Commons)
