Harmonizing Multi-tenant HBase Clusters for
Managing Workload Diversity
PRESENTED BY Dheeraj Kapur, Rajiv Chittajallu, Ani...

Speakers: Dheeraj Kapur, Rajiv Chittajallu & Anish Mathew (Yahoo!)

In early 2013, Yahoo! introduced multi-tenancy to HBase to offer it as a platform service for all Hadoop users. A certain degree of customization per tenant (a user or a project) was achieved through RegionServer groups, namespaces, and customized configs for each tenant. This talk covers how to accommodate the diverse needs of individual tenants on the cluster, as well as operational tips and techniques that allow Yahoo! to automate the management of multi-tenant clusters at petabyte scale without errors.

Published in: Software, Technology
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

  1. Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity. Presented by Dheeraj Kapur, Rajiv Chittajallu, Anish Mathew | May 5, 2014
  2. Agenda (topic, speaker)
     • Overview of Hadoop stack and Grid Infrastructure at Yahoo (Rajiv Chittajallu)
     • Application onboarding on Multi-Tenant HBase (Dheeraj Kapur)
     • Automation for Compaction/Splits and Monitoring (Anish Mathew)
     • Q&A (All Presenters)
  3. Hadoop at Yahoo
  4. Hadoop Usage at Yahoo (HBaseCon 2014)
     Diagram labels, as extracted: Browsers Mobile Devices Web Crawl Knowledge Graph 3rd Party Yahoo Grid Business Intelligence Tools (e.g. Tableau, MicroStrategy) Data Collection Asynchronous Data Processing Synchronous Serving User Events WCC Entity Feeds Content Feeds Source of truth for data* Serving Systems Home Run Search Mail Mobile Flickr Media Stream Ads Native Ads Display Ads Content systems Y! NoSql …
  5. Grid Infrastructure at Yahoo
     A multi-tenant, secure, distributed compute and storage environment, based on the Hadoop stack, for large-scale data processing
  6. Grid Stack
  7. Deployment Model
     Diagram labels: DataNode; NodeManager; NameNode; RM; DataNodes; RegionServers; NameNode; HBase Master; Nimbus; Supervisor; Administration, Management and Monitoring; ZooKeeper Pools; HTTP/HDFS/GDM Load Proxies; Applications and Data; Data Feeds; Data Stores; Oozie Server; HS2/HCat
  8. Network Architecture – 1G to node
  9. Network Architecture – 10G Node
     Diagram labels: VC0 and VC1, each with spine0 … spine7 and leaf0 … leaf31; tor; 40G (or 4 x 10G); Host …; 10G
  10. HBase @ Yahoo
      • 7 clusters, 1500 region servers, 6 PB of data
      • Diverse use cases, 500+ tables, 100k regions
      • Rolling major compaction and split, and group rebalancing
      • RegionServer groups, namespaces and multi region config system
  11. Challenges
      • Customer onboarding and provisioning
      • Access management and table provisioning
      • Deployments
      • Customizing group configs
      • Rolling major compaction and splits
      • Group balancing
  12. Use Cases
  13. Use Cases (Yahoo’s Global Business)
      • Search: Web Cache, Query Analysis, Local Listings, Analytics
      • Y! Mail: Anti-spam, Log Analytics, Metadata Mgmt.
      • Cloud Platforms: Performance, Monitoring, OpenStack
      • Consumer Platforms: CMS, Social Data
      • Online Ads: Traffic Protection, Ads Data Mgmt.
      • P13N: Content P13N, Ad targeting
      • Mobile: Notifications, Flickr
      • Sales: eCommerce
  14. Web Crawl Cache
      Diagram labels, as extracted: Developers/Scientists; Poller; Fetcher; Ingestor; Extruder; Processing; Random Read; poll; fetch; launch; write; Compute Clusters (NM DN, NM DN, NM DN, …; HDFS NN; YARN RM); Clusters (RS DN, RS DN, RS DN, …; HDFS NN; HBase HM); r/w; insert; scan
  15. Customer Onboarding & Multi Tenancy
  16. Customer Onboarding & Provisioning
      • Two identical environments (Prod and Non-Prod)
      • Applications are onboarded to Non-Prod for performance/integration testing
      • Once ready, they are provisioned on Prod
      • Performance results help in production onboarding
  17. Namespaces
      • Allow tenants to create/drop/modify their own tables
      • Previously only the super admin could do this
      • Quota management
      • Security administration
      • Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
  18. RegionServer Groups
      • QoS is missing in HBase 0.94
      • Isolation is required in a multi-tenant environment
      • Multiple configs are required for different apps
      • Commands: group_add, group_balance, group_get, group_list, group_list_tables, group_list_transitions, group_move_servers, group_move_tables, group_of_server, group_of_table, group_remove
  19. Multi Region Configs
      Diagram components: SVN; Jenkins Build Farm; Master Repository; Slave Repository Colo A; Slave Repository Colo B; HBase Cluster A; HBase Cluster B
      Diagram steps: Fetch Group List; Generate Multi Configs; Merge Default Config & Push multi config; Sync Configs; Download Host Maps and Multi Region Config
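
The "Generate Multi Configs" and "Merge Default Config" steps on slide 19 can be pictured as layering per-group overrides on top of a shared default. The sketch below is only an illustration of that idea, not Yahoo's tooling: the function name, the group name `webcache`, and all property values are hypothetical; only the property names appear in the deck.

```python
def build_group_configs(default_config, group_overrides, groups):
    """Return one complete config per RegionServer group.

    Each group starts from a copy of the shared defaults and then
    applies its own overrides, if it has any.
    """
    configs = {}
    for group in groups:
        merged = dict(default_config)               # copy, never mutate the defaults
        merged.update(group_overrides.get(group, {}))
        configs[group] = merged
    return configs


# Hypothetical values, for illustration only.
default_config = {
    "hbase.hregion.max.filesize": "10737418240",
    "hbase.hstore.compaction.ratio": "1.2",
}
group_overrides = {
    "webcache": {"hbase.hregion.max.filesize": "21474836480"},
}

configs = build_group_configs(default_config, group_overrides, ["default", "webcache"])
for group, conf in configs.items():
    print(group, conf)
```

Keeping the defaults in one place and storing only the differences per group means a change to a shared setting reaches every group on the next sync, while group-specific tuning stays isolated.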
  20. Compaction
      • Minor & Major
      • Minor picks up a couple of smaller files and rewrites them as one
      • Major drops deleted or expired cells, picks up all files, and rewrites them as one
  21. Compaction file selection
      Chart labels: File Size (axis); File Age, Older to Younger (axis); minCompactSize; Excluded; Included
  22. Compaction/Split Management
  23. Config Parameters
      • hbase.hstore.blockingStoreFiles
      • hbase.hstore.compaction.max.size
      • hbase.hstore.compaction.min.size
      • hbase.hstore.compaction.ratio
      • hbase.hregion.max.filesize
      • hbase.hregion.memstore.flush.size
      • hbase.master.wait.on.regionservers.mintostart
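
How the ratio, minimum size, and maximum size settings interact during minor compaction file selection can be sketched as follows. This is a simplified model of ratio-based selection, not HBase's actual code: the function and argument names are made up for illustration, `min_files` and `max_files` stand in for settings the slides do not list, and the default values are illustrative.

```python
def select_minor_compaction(sizes, ratio=1.2, min_compact_size=0,
                            max_compact_size=float("inf"),
                            min_files=3, max_files=10):
    """Pick store files for a minor compaction (simplified sketch).

    `sizes` lists store file sizes ordered oldest to newest.
    Returns the indices of the selected files, or [] if too few qualify.
    """
    # Skip the oldest files that are larger than the maximum compactable size.
    first = 0
    while first < len(sizes) and sizes[first] > max_compact_size:
        first += 1
    candidates = list(range(first, len(sizes)))

    # Walk from older to younger files. A file is excluded while it is larger
    # than both min_compact_size and ratio * (total size of the newer files
    # that would be compacted together with it).
    start = 0
    while len(candidates) - start >= min_files:
        newer = candidates[start + 1:start + max_files]
        newer_sum = sum(sizes[i] for i in newer)
        if sizes[candidates[start]] <= max(min_compact_size, ratio * newer_sum):
            break
        start += 1

    selected = candidates[start:start + max_files]
    return selected if len(selected) >= min_files else []


print(select_minor_compaction([1000, 100, 50, 30, 20], min_compact_size=10))
print(select_minor_compaction([60, 10, 10], min_compact_size=64))
print(select_minor_compaction([60, 10, 10], min_compact_size=0))
```

In the first call the large, old file is excluded and the four younger files are selected. The second and third calls differ only in the minimum size: a file below it is always eligible, which matches the Included region under minCompactSize on slide 21.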
  24. Managed Compactions and Splits
      • Flexible scheduling
      • Custom logic per table and workload
  25. Compaction and Splits Scheduler
      Diagram labels: Metrics; MySQL; Metrics Analyze; Region Specific Metrics; Server Metrics; Scheduling Parameters; HBaseCtl: Scheduler; HDFS; Publish; HBase Cluster A; HBase Cluster B; HBase Cluster C; Update Compaction/Split Statistics; Zookeeper Coordination & Intermediate Store
  26. Group Balancing
      • Scheduled group balance followed by rolling major compaction
      • Based on data locality
        – Find data locality of each block of store files
        – Move region to server where the maximum blocks are located
      • Helps after cluster upgrades and restarts
      • After config changes for a region group
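
The locality rule on slide 26 (find where each store file block lives, then move the region to the server holding the most blocks) can be sketched as below. This is a minimal illustration under stated assumptions: the function names and input shapes are hypothetical, candidate servers are assumed to be limited to the region's own RegionServer group, and a region is assumed to stay put unless another server holds strictly more of its blocks.

```python
from collections import Counter


def locality_counts(block_replicas, group_servers):
    """Count, per server in the group, how many of a region's blocks it holds.

    `block_replicas` has one entry per HDFS block: the hosts storing a replica.
    """
    counts = Counter()
    for replicas in block_replicas:
        for host in set(replicas) & group_servers:
            counts[host] += 1
    return counts


def plan_moves(region_blocks, assignment, group_servers):
    """Return {region: target_server} for regions that would gain locality."""
    servers = set(group_servers)
    moves = {}
    for region, block_replicas in region_blocks.items():
        counts = locality_counts(block_replicas, servers)
        if not counts:
            continue  # no block of this region lives on a group server
        current = assignment.get(region)
        # Highest block count wins; ties are broken by server name.
        best = min(counts, key=lambda host: (-counts[host], host))
        if counts[best] > counts.get(current, 0):
            moves[region] = best
    return moves


# Hypothetical example: hosts "x" and "y" are outside the group.
region_blocks = {
    "r1": [["rs1", "rs2", "x"], ["rs2", "rs3", "x"], ["rs2", "x", "y"]],
    "r2": [["rs1", "rs3", "x"]],
}
assignment = {"r1": "rs1", "r2": "rs1"}
print(plan_moves(region_blocks, assignment, ["rs1", "rs2", "rs3"]))
```

Here `r1` moves to `rs2`, which holds all three of its blocks, while `r2` stays on `rs1` because no other group server holds more of its data.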
  27. Monitoring
  28. Monitoring
      • Simon Metrics & Yahoo Monitoring As a Service (YMS)
      • OpenTSDB at Yahoo, replacing MySQL as backend for YMS
  29. Monitoring (cont.)
  30. Monitoring (cont.): Metrics for Customers
      Diagram labels: Simon System; Other Systems for Analysis & Reporting; Jenkins Job: Merges and Formats Metrics; HBase; HBase Master; HDFS Master; Grid Snodes; Customer Dashboards; Upload data to HDFS; Memory Dump from Master; Region Server Metrics; Push compiled metrics to snodes; Fetch metrics
  31. Monitoring (cont.): OpenTSDB
      • Evaluating
      • Work required to make it production-ready at Yahoo
  32. Thank You