Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity


Speakers: Dheeraj Kapur, Rajiv Chittajallu & Anish Mathew (Yahoo!)

In early 2013, Yahoo! introduced multi-tenancy to HBase to offer it as a platform service for all Hadoop users. A degree of customization per tenant (a user or a project) was achieved through RegionServer groups, namespaces, and customized configs for each tenant. This talk covers how to accommodate the diverse needs of individual tenants on the cluster, as well as operational tips and techniques that allow Yahoo! to automate the management of multi-tenant clusters at petabyte scale without errors.

Published in: Software, Technology

  1. 1. Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity. PRESENTED BY Dheeraj Kapur, Rajiv Chittajallu, Anish Mathew | May 5, 2014
  2. 2. Agenda (Topic: Speaker(s))
      • Overview of Hadoop stack and Grid Infrastructure at Yahoo: Rajiv Chittajallu
      • Application onboarding on Multi-Tenant HBase: Dheeraj Kapur
      • Automation for Compaction/Splits and Monitoring: Anish Mathew
      • Q&A: All Presenters
  3. 3. Hadoop at Yahoo
  4. 4. Hadoop Usage at Yahoo HBaseCon 2014 Browsers Mobile Devices Web Crawl Knowledge Graph 3rd Party Yahoo Grid Business Intelligence Tools (e.g. Tableau, MicroStrategy) Data Collection Asynchronous Data Processing Synchronous Serving User Events WCC Entity Feeds Content Feeds Source of truth for data* Serving Systems Home Run Search Mail Mobile Flickr Media Stream Ads Native Ads Display Ads Content systems Y! NoSql …
  5. 5. Grid Infrastructure at Yahoo: a multi-tenant, secure, distributed compute and storage environment, based on the Hadoop stack, for large-scale data processing
  6. 6. Grid Stack
  7. 7. Deployment Model [diagram labels: DataNode; NodeManager; NameNode; RM; DataNodes; RegionServers; NameNode; HBase Master; Nimbus; Supervisor; Administration, Management and Monitoring; ZooKeeper Pools; HTTP/HDFS/GDM Load Proxies; Applications and Data; Data Feeds; Data Stores; Oozie Server; HS2/HCat]
  8. 8. Network Architecture – 1G to node
  9. 9. Network Architecture – 10G [diagram labels: Node; tor; VC0 and VC1, each with spine0 to spine7 and leaf0 to leaf31; 40G (or 4 x 10G); Host; 10G]
  10. 10. HBase @ Yahoo
      • 7 clusters, 1500 region servers, 6 PB of data
      • Diverse use cases, 500+ tables, 100k regions
      • Rolling major compaction and split, and group rebalancing
      • RegionServer groups, namespaces and multi-region config system
  11. 11. Challenges
      • Customer onboarding and provisioning
      • Access management and table provisioning
      • Deployments
      • Customizing group configs
      • Rolling major compaction and splits
      • Group balancing
  12. 12. Use Cases
  13. 13. Use Cases (Yahoo’s Global Business)
      • Search: Web Cache, Query Analysis, Local Listings, Analytics
      • Y! Mail: Anti-spam, Log Analytics, Metadata Mgmt.
      • Cloud Platforms: Performance, Monitoring, OpenStack
      • Consumer Platforms: CMS, Social Data
      • Online Ads: Traffic Protection, Ads Data Mgmt.
      • P13N: Content P13N, Ad targeting
      • Mobile: Notifications, Flickr
      • Sales: eCommerce
  14. 14. Web Crawl Cache [diagram labels: Developers/Scientists; Poller; Fetcher; Ingestor; Extruder; Processing; Random Read; poll; fetch; launch; write; Compute Clusters (NM/DN nodes, HDFS NN, YARN RM); Clusters (RS/DN nodes, HDFS NN, HBase HM); r/w; insert; scan]
  15. 15. Customer Onboarding & Multi Tenancy
  16. 16. Customer Onboarding & Provisioning
      • Two identical environments (Prod and Non-Prod)
      • Applications are onboarded to Non-Prod for performance and integration testing
      • Once ready, they are provisioned on Prod
      • Performance results help in production onboarding
  17. 17. Namespaces
      • Allow tenants to create/drop/modify their own tables
      • Previously, only a super admin could do this
      • Quota management
      • Security administration
      • Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
  18. 18. RegionServer Groups
      • QoS is missing in HBase 0.94
      • Isolation is required in a multi-tenant environment
      • Multiple configs are required for different apps
      • Commands: group_add, group_balance, group_get, group_list, group_list_tables, group_list_transitions, group_move_servers, group_move_tables, group_of_server, group_of_table, group_remove
  19. 19. Multi Region Configs [diagram labels: SVN; Jenkins Build Farm; Master Repository; Slave Repository Colo A; Slave Repository Colo B; HBase Cluster A; HBase Cluster B. Steps: Fetch Group List; Generate Multi Configs; Merge Default Config & Push multi config; Sync Configs; Download Host Maps and Multi Region Config]
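
The "Merge Default Config" step in the slide above could look roughly like the following. This is a minimal sketch, not Yahoo's tooling: the function name, the group names and the property values are all hypothetical, and it assumes a group's config is simply the cluster default with that group's overrides applied on top.

```python
from typing import Dict

Config = Dict[str, str]


def build_group_configs(default: Config,
                        overrides: Dict[str, Config]) -> Dict[str, Config]:
    """Return one complete config per RegionServer group.

    Each group starts from a copy of the default config; keys present in
    the group's override map replace the default values.
    """
    return {group: {**default, **group_overrides}
            for group, group_overrides in overrides.items()}


# Hypothetical defaults and per-group overrides (values are illustrative).
default_config = {
    "hbase.hregion.max.filesize": "10737418240",
    "hbase.hregion.memstore.flush.size": "134217728",
}
group_overrides = {
    "default": {},
    "webcache": {"hbase.hregion.max.filesize": "21474836480"},
}

for group, config in build_group_configs(default_config, group_overrides).items():
    print(group, config)
```

Keeping overrides sparse means a change to the default config reaches every group that has not explicitly overridden that key.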
  20. 20. Compaction
      • Two kinds: minor and major
      • Minor: picks up a few smaller files and rewrites them as one
      • Major: picks up all files and rewrites them as one, dropping deleted and expired cells
  21. 21. Compaction file selection [chart: file size against file age, from older to younger, with a minCompactSize threshold; files are marked Excluded or Included]
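
The slide shows file selection only as a chart. As a rough illustration, here is a simplified sketch of ratio-based selection for a minor compaction. It is an approximation for intuition, not HBase's actual implementation: the function name and the default values are illustrative, sizes are in arbitrary units, and the parameters only loosely correspond to `hbase.hstore.compaction.ratio`, `hbase.hstore.compaction.min.size` and `hbase.hstore.compaction.max.size`.

```python
from typing import List


def select_for_minor_compaction(sizes: List[int],
                                ratio: float = 1.2,
                                min_compact_size: int = 128,
                                max_compact_size: int = 10_000,
                                min_files: int = 3,
                                max_files: int = 10) -> List[int]:
    """Pick store files for a minor compaction.

    `sizes` lists store file sizes ordered oldest first. Returns the
    indices of the selected files, or an empty list if too few qualify.
    """
    n = len(sizes)
    start = 0
    # Skip old files that are too large to be worth rewriting.
    while start < n and sizes[start] > max_compact_size:
        start += 1
    # Walk from older to younger files. A file is included once it is no
    # larger than ratio * (total size of the newer files in the window),
    # or once it falls under the minimum compaction size.
    while n - start >= min_files:
        newer_total = sum(sizes[start + 1:start + max_files])
        if sizes[start] <= max(min_compact_size, ratio * newer_total):
            break
        start += 1
    selected = list(range(start, min(n, start + max_files)))
    return selected if len(selected) >= min_files else []


# Oldest first: the two large old files are excluded, the three small
# younger files are included.
print(select_for_minor_compaction([1000, 500, 60, 50, 40], min_compact_size=10))
```

This matches the shape of the chart: large, old files tend to be excluded, while small, young files and anything below the minimum size are included.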
  22. 22. Compaction/Split Management
  23. 23. Config Parameters
      • hbase.hstore.blockingStoreFiles
      • hbase.hstore.compaction.max.size
      • hbase.hstore.compaction.min.size
      • hbase.hstore.compaction.ratio
      • hbase.hregion.max.filesize
      • hbase.hregion.memstore.flush.size
      • hbase.master.wait.on.regionservers.mintostart
  24. 24. Managed Compactions and Splits
      • Flexible scheduling
      • Custom logic per table and workload
  25. 25. Compaction and Splits Scheduler [diagram labels: Metrics; MySQL Metrics; Analyze; Region Specific Metrics; Server Metrics; Scheduling Parameters; HBaseCtl: Scheduler; HDFS; Publish; HBase Cluster A; HBase Cluster B; HBase Cluster C; Update Compaction/Split Statistics; ZooKeeper Coordination & Intermediate Store]
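
The deck does not describe the scheduling logic itself. As one possible shape for "custom logic per table", the sketch below plans a rolling batch of major compactions from region metrics. Everything in it is hypothetical: the `RegionMetric` and `TablePolicy` types, the policy fields and the one-region-per-server limit are assumptions made for illustration, not the HBaseCtl scheduler.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegionMetric:
    region: str
    table: str
    server: str
    store_files: int


@dataclass
class TablePolicy:
    min_store_files: int   # compact only regions with at least this many store files
    allowed_hours: range   # hours of the day (0-23) in which compaction may run


def plan_major_compactions(metrics: List[RegionMetric],
                           policies: Dict[str, TablePolicy],
                           hour: int,
                           per_server_limit: int = 1) -> List[str]:
    """Return the regions to major-compact in this scheduling round.

    Regions with the most store files go first. Tables without a policy,
    or outside their allowed window, are skipped. At most
    `per_server_limit` regions are picked per RegionServer, so the
    compaction rolls across the cluster instead of loading one server.
    """
    planned: List[str] = []
    per_server: Dict[str, int] = {}
    for m in sorted(metrics, key=lambda m: (-m.store_files, m.region)):
        policy = policies.get(m.table)
        if policy is None or hour not in policy.allowed_hours:
            continue
        if m.store_files < policy.min_store_files:
            continue
        if per_server.get(m.server, 0) >= per_server_limit:
            continue
        per_server[m.server] = per_server.get(m.server, 0) + 1
        planned.append(m.region)
    return planned
```

Calling the planner repeatedly with fresh metrics would work through the remaining regions one round at a time.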
  26. 26. Group Balancing
      • Scheduled group balance followed by rolling major compaction
      • Based on data locality
          – Find the data locality of each block of the store files
          – Move the region to the server where the maximum number of blocks is located
      • Helps after cluster upgrades and restarts
      • Helps after config changes for a region group
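
The locality rule above (move a region to the server that holds most of its blocks) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and input shapes are hypothetical, block locations are given as plain host lists rather than fetched from HDFS, and restricting candidates to the servers of the region's group is an assumption made here to keep the move inside the group.

```python
from collections import Counter
from typing import Iterable, List, Optional


def best_server_for_region(block_hosts: Iterable[List[str]],
                           group_servers: Iterable[str]) -> Optional[str]:
    """Pick the server in the group that holds the most blocks of a region.

    `block_hosts` has one entry per HDFS block of the region's store
    files, listing the hosts that hold a replica of that block.
    Returns None when no server in the group holds any block.
    """
    allowed = set(group_servers)
    counts = Counter(host
                     for hosts in block_hosts
                     for host in set(hosts)
                     if host in allowed)
    if not counts:
        return None
    # Highest block count wins; ties are broken by host name so the
    # result is deterministic.
    return min(counts, key=lambda host: (-counts[host], host))


blocks = [["rs1", "rs2", "dn9"], ["rs2", "rs3", "dn9"], ["rs2", "rs1", "dn8"]]
print(best_server_for_region(blocks, ["rs1", "rs2", "rs3"]))
```

Running a major compaction after the move, as the slide suggests, rewrites the store files locally and restores locality for blocks that were not already on the chosen server.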
  27. 27. Monitoring
  28. 28. Monitoring
      • Simon Metrics & Yahoo Monitoring As a Service (YMS)
      • OpenTSDB at Yahoo, replacing MySQL as the backend for YMS
  29. 29. Monitoring (cont.)
  30. 30. Monitoring (cont.): Metrics for Customers [diagram labels: Simon System; Other Systems for Analysis & Reporting; Jenkins Job: Merges and Formats Metrics; HBase; HBase Master; HDFS Master; Grid Snodes; Customer Dashboards. Flows: Upload data to HDFS; Memory Dump from Master; Region Server Metrics; Push compiled metrics to snodes; Fetch metrics]
  31. 31. Monitoring (cont.): OpenTSDB
      • Evaluating
      • Work is required to make it production ready at Yahoo
  32. 32. Thank You