April 2013 HUG: HBase as a Service at Yahoo!



HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable. Yahoo! has long used HBase in isolated, one-off deployments. A multi-tenant platform now makes it possible for all of our grid customers to take advantage of HBase's capabilities. We will give a brief overview of HBase and how it works (several of you asked for back-to-basics talks), and then spend most of our time on multi-tenancy with HBase.


Francis Christopher Liu, Software Engineer, Yahoo! and PPMC Member, Apache HCatalog

Vandana Ayyalasomayajula, Software Engineer, Yahoo! and PPMC Member, Apache HCatalog


  1. HBase as a Service at Yahoo!
     Bay Area HUG Presentation
     Francis Liu, Vandana Ayyalasomayajula
     April 17, 2013
  2. HBase Overview
     Apache HBase is an open-source, Bigtable-like, distributed, scalable, consistent, random-access key-value store built on Apache Hadoop.
     HBase Data Model (example table, column family "Info"; the table is sorted lexicographically on row keys):
       Rowkey   Email                    Age   Password
       Alice    alice@wonderland.com     23    trickedyou (ts1 = 1), newpassword (ts2 = 2)
       Bob      bob@myworld.com          25    Iambob
       Eve      hithere@getintouch.com   30    nice1pass
     Each cell can hold multiple versions, identified by timestamp (here ts2 > ts1).
     A cell value is identified by [1] row key, [2] column family, [3] column qualifier, and [4] timestamp/version.
     (Yahoo! Presentation, Confidential)
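The data model above can be pictured as a sorted, versioned map. The following is a minimal in-memory sketch in Python, not the HBase client API: the class `ToyTable` and its methods are hypothetical, values are plain strings rather than byte arrays, and Python string ordering stands in for HBase's byte-wise row-key ordering (the two agree for ASCII keys).

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of an HBase table:
    rowkey -> column family -> qualifier -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, qualifier, value, ts):
        # Column families are fixed by the schema; qualifiers are free-form.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][family][qualifier][ts] = value

    def get(self, row, family, qualifier, ts=None):
        """Return the newest version, or the newest version with timestamp <= ts."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        eligible = [t for t in versions if ts is None or t <= ts]
        return versions[max(eligible)] if eligible else None

    def row_keys(self):
        # Rows are kept in lexicographic order of their keys.
        return sorted(self.rows)


# Part of the example table from the slide (column family "Info").
t = ToyTable(["Info"])
t.put("Bob", "Info", "Email", "bob@myworld.com", ts=1)
t.put("Alice", "Info", "Email", "alice@wonderland.com", ts=1)
t.put("Alice", "Info", "Password", "trickedyou", ts=1)
t.put("Alice", "Info", "Password", "newpassword", ts=2)
t.put("Eve", "Info", "Age", "30", ts=1)

print(t.get("Alice", "Info", "Password"))        # newpassword (ts2 > ts1)
print(t.get("Alice", "Info", "Password", ts=1))  # trickedyou
print(t.row_keys())                              # ['Alice', 'Bob', 'Eve']
```

Because keys compare as strings, a key such as "row10" sorts before "row9", which is why row-key design matters for range scans.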
  3. HBase Distributed Mode
     Example table T1 (row key: name; value: role): Andy Arch, Brad Arch, Dheeraj Ops, Eleanor PgM, Francis Dev, Govind Dev, Rajiv Ops, Sumeet PM, Vandana Dev.
     • Table T1 is split into three regions: R1, R2, and R3.
     • Each region is served by a RegionServer collocated with a DataNode.
     • To find Sumeet's role, the client:
       1. contacts ZooKeeper (a separate cluster of ZK nodes) to retrieve the RegionServer hosting the -ROOT- region, which locates the .META. regions;
       2. queries the .META. server for the region that holds row key "Sumeet";
       3. reads the row from the RegionServer hosting that table region.
     Diagram: masters M1 and M2; RegionServers RS1, RS2, and RS3; RS1 serves T1R1, RS2 serves T1R2 and T1R3.
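The last lookup step, from row key to region to RegionServer, can be sketched as a search over region start keys. This is hypothetical Python: the split points "E" and "R" are made up for illustration, the `META` dict stands in for the .META. table (the ZooKeeper and -ROOT- hops are not modeled), and the `cache` argument reflects the fact that HBase clients cache region locations instead of repeating the full lookup on every request.

```python
import bisect

# Hypothetical layout of table T1. A region holds the rows whose key is
# >= its start key and < the next region's start key; "" means "no lower bound".
REGION_STARTS = ["", "E", "R"]           # split points "E" and "R" are made up
REGION_NAMES = ["T1R1", "T1R2", "T1R3"]

# Stand-in for the .META. table: region name -> hosting RegionServer.
META = {"T1R1": "RS1", "T1R2": "RS2", "T1R3": "RS2"}


def locate_region(row_key):
    """Return the region whose [start, next start) range contains row_key."""
    idx = bisect.bisect_right(REGION_STARTS, row_key) - 1
    return REGION_NAMES[idx]


def locate_server(row_key, cache):
    """Find the RegionServer for row_key, consulting a client-side cache first.

    A real client reaches .META. via ZooKeeper and -ROOT-; here the
    META dict plays that role."""
    region = locate_region(row_key)
    if region not in cache:
        cache[region] = META[region]     # the "expensive" lookup
    return cache[region]


cache = {}
for name in ["Andy", "Eleanor", "Sumeet", "Vandana"]:
    print(name, locate_region(name), locate_server(name, cache))
# Andy T1R1 RS1
# Eleanor T1R2 RS2
# Sumeet T1R3 RS2
# Vandana T1R3 RS2
```

Using `bisect_right` makes each start key inclusive, so a row key exactly equal to "E" lands in T1R2, matching the half-open ranges described in the comments.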
  4. HBase High-level Architecture
     (architecture diagram) Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
  5. HBase Operations
     • get()
     • put()
     • scan()
     • checkAndDelete()
     • checkAndPut()
     • increment()
     • … (see the HTable class for further details on operations)
     Caution:
     • No query language (no SQL queries)
     • No secondary indexes
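The operations listed above can be illustrated with a toy single-column store. The sketch below is hypothetical Python, not the HTable API: `check_and_put`, `increment`, and `scan` only mimic the semantics of `checkAndPut()`, `increment()`, and a start/stop-row `scan()`, and the single lock stands in for HBase's guarantee that mutations to a single row are atomic. Since rows are found only by key, any lookup that is not a row-key range has to scan, which is what the lack of secondary indexes means in practice.

```python
import threading


class ToyRowStore:
    """Hypothetical single-column store illustrating HTable-style semantics."""

    def __init__(self):
        self._data = {}                # rowkey -> value
        self._lock = threading.Lock()  # models per-row atomicity

    def put(self, row, value):
        with self._lock:
            self._data[row] = value

    def get(self, row):
        with self._lock:
            return self._data.get(row)

    def check_and_put(self, row, expected, value):
        """Write `value` only if the current value equals `expected`.

        expected=None means "the cell must not exist yet".
        Returns True if the write happened."""
        with self._lock:
            if self._data.get(row) != expected:
                return False
            self._data[row] = value
            return True

    def increment(self, row, amount=1):
        """Atomic read-modify-write on a counter; returns the new value."""
        with self._lock:
            new_value = self._data.get(row, 0) + amount
            self._data[row] = new_value
            return new_value

    def scan(self, start_row="", stop_row=None):
        """Return (row, value) pairs in key order; start inclusive, stop exclusive."""
        with self._lock:
            items = sorted(self._data.items())
        return [(r, v) for r, v in items
                if r >= start_row and (stop_row is None or r < stop_row)]


store = ToyRowStore()
print(store.check_and_put("alice", None, "v1"))     # True: cell did not exist
print(store.check_and_put("alice", "wrong", "v2"))  # False: value was "v1"
print(store.increment("hits"), store.increment("hits", 5))  # 1 6
print(store.scan("a", "b"))                         # [('alice', 'v1')]
```

`check_and_put` is the building block for optimistic updates: read a value, compute a new one, and write it back only if nobody changed the cell in between.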
  6. Multi-tenancy Motivation
     • Successful deployments
       ◦ C.O.R.E. (personalization engine)
       ◦ Web Crawl Cache
       ◦ etc.
     • Off-stage processing
     • Mutable data
     • Random read/write
  7. Metrics/Analytics Use Cases
     Diagram: three Collectors, Ingestion, HBase, Query Server.
  8. Dimension Store Use Case
     Diagram: Clickstream, Ad Campaign; HBase, HDFS, MapReduce, Hive, Pig.
  9. Incremental Processing Use Cases
     Diagram labels: Collection, Collector, Search, Events, Files; MapReduce, Storm, HDFS, HBase; Slow, Fast; On-stage, Off-stage processing; Serving Store.
  10. Hadoop at Yahoo!
     • Hosted Multi-tenant Service
     • Security
     • Job Queues
     • HDFS Quota
  11. HBase at Yahoo!
     • Hosted Multi-tenant Service
     • Security
     • Isolated Deployment
     • Region Server Group
     • Namespace
  12. Security
     • Authentication
       ◦ Kerberos (users, processes)
       ◦ Delegation tokens (MapReduce, YARN, etc.)
     • Authorization
       ◦ HBase ACLs (Read, Write, Create, Admin)
       ◦ Grant permissions to a user or a Unix group
       ◦ ACLs on a table, column family, or column
       ◦ Only the global admin can create/drop tables
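The authorization rules above (grants to a user or a Unix group, scoped to a table, a column family, or a single column) can be sketched as a permission check. This is hypothetical Python, not HBase's access controller: the `Grant` class and `is_allowed` function are invented for illustration, group principals carry an `@` prefix (the convention the HBase shell uses for group grants), and a grant with `table=None` models a global grant such as the global admin's.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Grant:
    principal: str                   # user name, or "@group" for a Unix group
    perms: FrozenSet[str]            # subset of {"R", "W", "C", "A"}
    table: Optional[str] = None      # None: global grant
    family: Optional[str] = None     # None: the whole table
    qualifier: Optional[str] = None  # None: the whole column family

    def covers(self, table, family, qualifier):
        """True if this grant's scope contains the requested scope."""
        if self.table is None:
            return True
        if self.table != table:
            return False
        if self.family is None:
            return True
        if self.family != family:
            return False
        return self.qualifier is None or self.qualifier == qualifier


def is_allowed(grants: Iterable[Grant], user: str, groups: Iterable[str],
               action: str, table: Optional[str] = None,
               family: Optional[str] = None, qualifier: Optional[str] = None) -> bool:
    """True if any grant to the user or one of the user's groups permits the action."""
    principals = {user} | {"@" + g for g in groups}
    return any(g.principal in principals and action in g.perms
               and g.covers(table, family, qualifier)
               for g in grants)


grants = [
    Grant("alice", frozenset("RW"), table="users"),
    Grant("@analysts", frozenset("R"), table="users", family="Info", qualifier="Email"),
    Grant("hbase_admin", frozenset("RWCA")),   # global admin
]

print(is_allowed(grants, "alice", [], "W", "users", "Info", "Password"))          # True
print(is_allowed(grants, "bob", ["analysts"], "R", "users", "Info", "Email"))     # True
print(is_allowed(grants, "bob", ["analysts"], "R", "users", "Info", "Password"))  # False
# In this model, creating or dropping tables requires a global "A" grant.
print(is_allowed(grants, "alice", [], "A"))        # False
print(is_allowed(grants, "hbase_admin", [], "A"))  # True
```

A narrower grant never authorizes a wider request: a column-level grant on `Info:Email` does not let its holder read the whole `Info` family.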
  13. Isolated Deployment
     Diagram: a Compute Cluster (JobTracker, NameNode, and TaskTracker/DataNode nodes running M/R tasks), a separate HBase Cluster (NameNode, HBase Master, ZooKeeper quorum, and RegionServer/DataNode nodes), and a Gateway/Launcher with the HBase client and MR client; HBase clients also appear alongside the M/R tasks.
  14. Region Server Groups
     • Member Region Servers
     • Member Tables
     • Resource Isolation
     • Flexibility with configuration
     Diagram: Group Foo, Region Servers 1 to 4 (RS1 to RS4), serving Table1 and Table2; Group Bar, Region Servers 5 to 8 (RS5 to RS8), serving Table3 and Table4.
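The isolation property of region server groups can be sketched as an assignment rule: a table's regions may only be placed on servers belonging to that table's group. This is hypothetical Python, not the actual group-based balancer: `GROUPS`, `group_of`, `assign`, and `move_table` are invented names, regions are simply dealt round-robin, and no attempt is made to balance load the way a production balancer would.

```python
from itertools import cycle

# Hypothetical group membership mirroring the diagram above.
GROUPS = {
    "foo": {"servers": ["RS1", "RS2", "RS3", "RS4"], "tables": {"Table1", "Table2"}},
    "bar": {"servers": ["RS5", "RS6", "RS7", "RS8"], "tables": {"Table3", "Table4"}},
}


def group_of(table):
    """Return the group that owns `table`."""
    for name, info in GROUPS.items():
        if table in info["tables"]:
            return name
    raise KeyError(f"{table} is not a member of any group")


def move_table(table, target_group):
    """Move a table to another group (analogous in spirit to group_move_tables)."""
    GROUPS[group_of(table)]["tables"].discard(table)
    GROUPS[target_group]["tables"].add(table)


def assign(regions_by_table):
    """Map each region to a server, using only servers of the table's group.

    Regions are dealt round-robin within each group."""
    by_group = {}
    for table, regions in regions_by_table.items():
        by_group.setdefault(group_of(table), []).extend(regions)
    assignment = {}
    for group, regions in by_group.items():
        servers = cycle(GROUPS[group]["servers"])
        for region in regions:
            assignment[region] = next(servers)
    return assignment


plan = assign({"Table1": ["T1R1", "T1R2", "T1R3"], "Table3": ["T3R1", "T3R2"]})
print(plan)
# {'T1R1': 'RS1', 'T1R2': 'RS2', 'T1R3': 'RS3', 'T3R1': 'RS5', 'T3R2': 'RS6'}
```

A busy table in group Bar can therefore never consume RegionServers that belong to group Foo, which is the resource isolation the slide refers to.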
  15. Region Server Groups
     Shell commands:
     • group_add
     • group_remove
     • group_move_servers
     • group_move_tables
     • create … { … CONFIGURATION=>{'hbase.rsgroup.name'=>'my_group'}}
  16. Region Server Groups
     Diagram components: HMaster; LoadBalancer and GroupBasedLoadBalancer; FilterByGroup; GroupAdminEndpoint; GroupMasterObserver; GroupInfoManager; Group Table; Group ZNode; example groups foo and bar.
  17. Namespace
     • Analogous to a database
     • Table name: <table namespace>.<table qualifier>
       ◦ e.g. my_ns.my_table
     • Reserved namespaces
       ◦ Default: tables with no explicit namespace
       ◦ System: tables guaranteed to be assigned before user tables
     • Table path: /<hbaseRoot>/data/<namespace>/<tableName>
       ◦ e.g. /hbase/data/my_ns/my_ns.my_table
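The naming rules above can be sketched as two small helpers. This is hypothetical Python that follows the convention shown on the slide, under stated assumptions: the namespace is split off at the first '.', the reserved default namespace is spelled `default`, and the on-disk path uses the full table name exactly as in the slide's example. The namespace feature as released in HBase 0.96 uses ':' as the separator instead (for example `my_ns:my_table`), so treat the '.' here as the convention proposed on this slide.

```python
DEFAULT_NAMESPACE = "default"   # assumed spelling of the reserved namespace


def split_table_name(full_name):
    """Split '<namespace>.<qualifier>' at the first '.'.

    Names without a '.' belong to the default namespace."""
    namespace, sep, qualifier = full_name.partition(".")
    if not sep:
        return DEFAULT_NAMESPACE, full_name
    return namespace, qualifier


def table_path(full_name, hbase_root="/hbase"):
    """Build /<hbaseRoot>/data/<namespace>/<tableName> following the slide's example."""
    namespace, _ = split_table_name(full_name)
    return f"{hbase_root}/data/{namespace}/{full_name}"


print(split_table_name("my_ns.my_table"))  # ('my_ns', 'my_table')
print(split_table_name("my_table"))        # ('default', 'my_table')
print(table_path("my_ns.my_table"))        # /hbase/data/my_ns/my_ns.my_table
```

Keeping every table of a namespace under one directory is what lets per-namespace policies (ACLs, quotas, a default server group) be applied to a tenant's whole project space.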
  18. Namespace + Security + Group + Quota
     A namespace has:
     • Tables
     • Namespace ACL
     • Default Region Server Group
     • Quota
       ◦ Max tables
       ◦ Max regions
     Diagram: a Namespace linked to its Group, Tables, Quota, and ACL.
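A namespace quota with the two limits above (max tables, max regions) can be sketched as checks run before any operation that adds tables or regions. This is hypothetical Python: `Namespace`, `QuotaExceededError`, and the method names are invented for illustration. Checking region splits as well as table creation is a design assumption, motivated by the fact that a split replaces one region with two daughters and so grows the region count without any new table.

```python
from dataclasses import dataclass, field


class QuotaExceededError(Exception):
    """Raised when an operation would exceed a namespace quota."""


@dataclass
class Namespace:
    name: str
    max_tables: int
    max_regions: int
    tables: dict = field(default_factory=dict)   # table name -> region count

    def region_count(self):
        return sum(self.tables.values())

    def create_table(self, table, num_regions=1):
        if table in self.tables:
            raise ValueError(f"{table} already exists in {self.name}")
        if len(self.tables) + 1 > self.max_tables:
            raise QuotaExceededError(f"{self.name}: max tables ({self.max_tables}) reached")
        if self.region_count() + num_regions > self.max_regions:
            raise QuotaExceededError(f"{self.name}: max regions ({self.max_regions}) exceeded")
        self.tables[table] = num_regions

    def split_region(self, table):
        # A split replaces one region with two daughters: a net gain of one.
        if self.region_count() + 1 > self.max_regions:
            raise QuotaExceededError(f"{self.name}: split would exceed max regions")
        self.tables[table] += 1


ns = Namespace("my_ns", max_tables=2, max_regions=10)
ns.create_table("t1", num_regions=4)
ns.create_table("t2", num_regions=5)
ns.split_region("t1")                 # 10 regions: still within quota
for attempt in (lambda: ns.create_table("t3"), lambda: ns.split_region("t2")):
    try:
        attempt()
    except QuotaExceededError as err:
        print("rejected:", err)
```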
  19. Namespace + Quota
     Diagram components: HMaster; TableNamespaceManager; Namespace Table; Namespace ZNodes; NamespaceController; ZKNamespaceManager; MasterCPHost; RegionCPHost.
  20. Conclusion
     • HBase enables new processing paradigms (vs. HDFS)
     • Namespaces provide tenants with a project space
     • Region Server Groups guarantee isolation
     • Namespace quotas limit the use of shared resources
     • Namespace ACLs help project-level administration
  21. References
     • http://hbase.apache.org/book/book.html
     • Region Server Group (HBASE-6721)
     • Namespace (HBASE-8015)