April 2013 HUG: HBase as a Service at Yahoo!

HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable. Yahoo! has long run HBase as isolated, one-off deployments. A multi-tenant platform now makes it possible for all of our grid customers to take advantage of HBase's capabilities. We will give a brief overview of HBase and how it works (several of you asked for back-to-basics talks), and then spend most of our time on multi-tenancy with HBase.

Presenters:

Francis Christopher Liu, Software Engineer, Yahoo! and PPMC Member, Apache HCatalog

Vandana Ayyalasomayajula, Software Engineer, Yahoo! and PPMC Member, Apache HCatalog

Transcript

  • 1. HBase as a Service at Yahoo! Bay Area HUG presentation by Francis Liu and Vandana Ayyalasomayajula, April 17, 2013.
  • 2. HBase Overview (Yahoo! Presentation, Confidential)
    ◦ Apache HBase is an open-source, Bigtable-like, distributed, scalable, consistent, random-access key-value store built on Apache Hadoop.
    ◦ HBase data model, shown with an example table whose column family is Info:
        Rowkey | Email                  | Age | Password
        Alice  | alice@wonderland.com   | 23  |
        Bob    | bob@myworld.com        | 25  | Iambob
        Eve    | hithere@getintouch.com | 30  | nice1pass
    ◦ The table is sorted lexicographically on row keys.
    ◦ Each cell can hold multiple versions, distinguished by timestamp (the slide uses ts1 = 1 and ts2 = 2, so ts2 > ts1); the versioned cells on the slide contain values such as 123, trickedyou and newpassword.
    ◦ A cell value is identified by [1] row key, [2] column family, [3] column qualifier and [4] timestamp/version.
  • 3. HBase Distributed Mode
    ◦ Example table T1 maps people to roles: Andy (Arch), Brad (Arch), Dheeraj (Ops), Eleanor (PgM), Francis (Dev), Govind (Dev), Rajiv (Ops), Sumeet (PM), Vandana (Dev).
    ◦ Table T1 is split into three regions, R1, R2 and R3. Each region is served by a RegionServer collocated with a DataNode; in the example, RS1 serves T1R1 and RS2 serves T1R2 and T1R3.
    ◦ Example lookup, "Find Sumeet's role with HBase":
        1. The client contacts ZooKeeper (a separate cluster of ZK nodes) to retrieve the RegionServer hosting the -ROOT- region.
        2. The -ROOT- region points to the .META. region for the row.
        3. The client queries the .META. server for the region that holds row key "Sumeet".
        4. The client reads the row from the RegionServer serving that table region.
  • 4. HBase High-level Architecture (diagram). Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
  • 5. HBase Operations§  get()§  put()§  scan()§  checkAndDelete()§  checkAndPut()§  increment()…check HTable class for further details on operationsCaution:§  No queries§  No secondary indexes55Yahoo! Presentation, Confidential
  • 6. Multi-tenancy Motivation
    ◦ Successful deployments: C.O.R.E. (personalization engine), Web Crawl Cache, etc.
    ◦ Off-stage processing
    ◦ Mutable data
    ◦ Random read/write
  • 7. Metrics/Analytics Use Cases (diagram components: several Collectors handling ingestion, HBase, a Query Server)
  • 8. Dimension Store Use Case (diagram components: Clickstream and Ad Campaign data, HDFS, HBase, MapReduce, Hive, Pig)
  • 9. Incremental Processing Use Cases (diagram components: Collector, Storm, MapReduce, HDFS, HBase; slow and fast paths; on-stage and off-stage processing; Collection, Store, Serving and Search stages; Events and Files as inputs)
  • 10. Hadoop at Yahoo!
    ◦ Hosted Multi-tenant Service
    ◦ Security
    ◦ Job Queues
    ◦ HDFS Quota
  • 11. HBase at Yahoo!
    ◦ Hosted Multi-tenant Service
    ◦ Security
    ◦ Isolated Deployment
    ◦ Region Server Group
    ◦ Namespace
  • 12. Security§  Authentication§  Kerberos (users, processes)§  Delegation Token (MapReduce, YARN, etc)§  Authorization§  HBase ACLs (Read, Write, Create, Admin)§  Grant permissions to User or Unix Group§  ACL for Table, Column Family or Column§  Only Global Admin can create/drop tables12
  • 13. Isolated Deployment (diagram: a Compute Cluster with a JobTracker, a NameNode and TaskTracker/DataNode nodes running M/R tasks that use HBase clients; a separate HBase Cluster with its own NameNode, HBase Master, ZooKeeper quorum and RegionServer/DataNode nodes; MR and HBase clients on a Gateway/Launcher)
  • 14. Region Server Groups
    ◦ Member region servers
    ◦ Member tables
    ◦ Resource isolation
    ◦ Flexibility with configuration
    ◦ Diagram: Group Foo (Region Servers RS1…RS4) serves Table1 and Table2; Group Bar (Region Servers RS5…RS8) serves Table3 and Table4.
  • 15. Region Server Groups
    ◦ group_add
    ◦ group_remove
    ◦ group_move_servers
    ◦ group_move_tables
    ◦ create … { … CONFIGURATION=>{'hbase.rsgroup.name'=>'my_group'}}
  • 16. Region Server Groups (architecture diagram components: HMaster; LoadBalancer and GroupBasedLoadBalancer, with FilterByGroup for groups foo and bar; GroupAdminEndpoint; GroupMasterObserver; GroupInfoManager with a Group table and a Group ZNode)
  • 17. Namespace§  Analogous to Database§  Table Name: <table namespace>.<table qualifier>§  i.e. my_ns.my_table§  Reserved namespaces§  Default – tables with no explicit namespace§  System – tables are guaranteed to be assigned prior to user tables§  Table Path: /<hbaseRoot>/data/<namespace>/<tableName>§  /hbase/data/my_ns/my_ns.my_table17
  • 18. Namespace + Security + Group + Quota
    ◦ Tables
    ◦ Namespace ACL
    ◦ Default Region Server Group
    ◦ Quota: max tables, max regions
    ◦ Diagram: a Namespace linked to its Group, Tables, Quota and ACL.
  • 19. Namespace + Quota (architecture diagram components: HMaster, TableNamespaceManager, Namespace table, Namespace ZNodes, ZKNamespaceManager, NamespaceController, MasterCPHost, RegionCPHost)
  • 20. Conclusion
    ◦ HBase enables new processing paradigms (vs. HDFS)
    ◦ Namespaces provide tenants with a project space
    ◦ Region Server Groups guarantee isolation
    ◦ Namespace quotas limit the use of shared resources
    ◦ Namespace ACLs help with project-level administration
  • 21. References
    ◦ http://hbase.apache.org/book/book.html
    ◦ Region Server Group (HBASE-6721)
    ◦ Namespace (HBASE-8015)
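The data model on slide 2 can be pictured as a sorted map of sorted maps: row key, then column, then timestamp. The sketch below is a plain-Java model written for illustration, not the HBase client API. The class VersionedTableSketch and its methods are hypothetical, the "family:qualifier" string key is a simplification, and which password value belongs to which timestamp is an assumption. It shows row keys kept in sorted order, newest-first versions per cell, a point-in-time read, and a range scan over [start, stop).

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;

// Plain-Java model of slide 2's data model. NOT the HBase client API;
// class and method names are hypothetical.
class VersionedTableSketch {
    // rowkey -> "family:qualifier" -> timestamp (newest first) -> value.
    // TreeMap keeps row keys sorted; for ASCII keys, String order equals HBase's byte order.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(family + ":" + qualifier,
                             c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(ts, value);
    }

    // Latest version of a cell, or null if the cell does not exist.
    String get(String row, String family, String qualifier) {
        NavigableMap<Long, String> versions = versions(row, family, qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Newest version whose timestamp is <= ts (a point-in-time read).
    String getAsOf(String row, String family, String qualifier, long ts) {
        NavigableMap<Long, String> versions = versions(row, family, qualifier);
        if (versions == null) return null;
        // Keys are sorted newest-first, so ceilingEntry(ts) yields the largest timestamp <= ts.
        Map.Entry<Long, String> e = versions.ceilingEntry(ts);
        return e == null ? null : e.getValue();
    }

    // Scan: row keys in [startRow, stopRow), in sorted order.
    NavigableSet<String> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false).navigableKeySet();
    }

    private NavigableMap<Long, String> versions(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(family + ":" + qualifier);
    }

    public static void main(String[] args) {
        VersionedTableSketch t = new VersionedTableSketch();
        t.put("Eve", "Info", "Email", 1L, "hithere@getintouch.com");
        t.put("Alice", "Info", "Email", 1L, "alice@wonderland.com");
        t.put("Bob", "Info", "Email", 1L, "bob@myworld.com");
        // Two versions of one cell; the value-to-timestamp assignment is hypothetical.
        t.put("Bob", "Info", "Password", 1L, "Iambob");
        t.put("Bob", "Info", "Password", 2L, "newpassword");
        System.out.println(t.scan("A", "Z"));                         // [Alice, Bob, Eve]
        System.out.println(t.get("Bob", "Info", "Password"));         // newpassword
        System.out.println(t.getAsOf("Bob", "Info", "Password", 1L)); // Iambob
    }
}
```

Real HBase compares row keys as unsigned bytes; for the ASCII keys used here, Java String ordering gives the same result, which is why the scan returns rows in the order shown on the slide.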
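Slide 3's lookup ends with one question: which region's key range contains this row key? A minimal sketch of that final step stores regions by start key and uses a floor lookup. It assumes hypothetical split keys "Eleanor" and "Rajiv" for table T1 (the slide does not give them); RegionLocatorSketch is an illustrative name. The real client reaches this point by way of ZooKeeper, -ROOT- and .META. and caches region locations, none of which is modeled here.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative model of the last step of slide 3's lookup: row key -> region -> RegionServer.
// Hypothetical names; the real client gets here via ZooKeeper, -ROOT- and .META.
class RegionLocatorSketch {
    static final class Region {
        final String name;
        final String server;
        Region(String name, String server) { this.name = name; this.server = server; }
    }

    // Each region covers [its start key, the next region's start key); "" marks the first region.
    private final TreeMap<String, Region> byStartKey = new TreeMap<>();

    void addRegion(String startKey, String name, String server) {
        byStartKey.put(startKey, new Region(name, server));
    }

    // The containing region is the one with the greatest start key <= rowKey.
    Region locate(String rowKey) {
        Map.Entry<String, Region> e = byStartKey.floorEntry(rowKey);
        if (e == null) throw new IllegalStateException("no region covers " + rowKey);
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLocatorSketch meta = new RegionLocatorSketch();
        // Split keys "Eleanor" and "Rajiv" are hypothetical; server placement follows slide 3.
        meta.addRegion("", "T1R1", "RS1");
        meta.addRegion("Eleanor", "T1R2", "RS2");
        meta.addRegion("Rajiv", "T1R3", "RS2");
        Region r = meta.locate("Sumeet");
        System.out.println(r.name + " on " + r.server); // T1R3 on RS2
    }
}
```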
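The checkAndPut() and increment() operations listed on slide 5 are atomic within a row. The sketch below models only those semantics with a synchronized in-memory map. It is not the HTable API: RowAtomicOpsSketch, its signatures and the column names are hypothetical, and counters are stored as decimal text rather than the 8-byte longs HBase uses. As with HBase's checkAndPut, an expected value of null means "the cell must not exist".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative model of slide 5's atomic single-row operations.
// NOT the HTable API: names and signatures are hypothetical.
class RowAtomicOpsSketch {
    private final Map<String, String> cells = new HashMap<>(); // key = row + "/" + column

    private static String key(String row, String column) { return row + "/" + column; }

    synchronized void put(String row, String column, String value) {
        cells.put(key(row, column), value);
    }

    synchronized String get(String row, String column) {
        return cells.get(key(row, column));
    }

    // Write only if the current value equals expected (null means "cell must be absent").
    synchronized boolean checkAndPut(String row, String column, String expected, String value) {
        if (!Objects.equals(cells.get(key(row, column)), expected)) return false;
        cells.put(key(row, column), value);
        return true;
    }

    // Atomically add delta to a counter (an absent cell counts as 0) and return the new value.
    // Stored as decimal text here for readability; HBase stores 8-byte longs.
    synchronized long increment(String row, String column, long delta) {
        String current = cells.get(key(row, column));
        long next = (current == null ? 0L : Long.parseLong(current)) + delta;
        cells.put(key(row, column), Long.toString(next));
        return next;
    }

    public static void main(String[] args) throws InterruptedException {
        RowAtomicOpsSketch t = new RowAtomicOpsSketch();
        System.out.println(t.checkAndPut("Bob", "Info:Password", null, "Iambob")); // true
        System.out.println(t.checkAndPut("Bob", "Info:Password", "wrong", "x"));   // false
        // Two threads incrementing the same counter never lose an update.
        Runnable bump = () -> { for (int i = 0; i < 1000; i++) t.increment("page1", "Stats:hits", 1); };
        Thread a = new Thread(bump), b = new Thread(bump);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(t.get("page1", "Stats:hits")); // 2000
    }
}
```

The synchronized methods stand in for HBase's per-row locking: the read, comparison and write happen as one step, so concurrent clients cannot interleave between the check and the put.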
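Slide 12's authorization rules can be read as a scope check: a grant on a table covers all of its column families, a grant on a family covers all of its columns, and a grant may name a user or a group. The sketch below is a hypothetical model of that reading, not HBase's AccessController. Groups are written with a leading @, the convention the HBase shell uses for group grants; AclSketch and the table, user and group names are made up.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of slide 12's ACL scopes; not HBase's AccessController.
class AclSketch {
    enum Action { READ, WRITE, CREATE, ADMIN }

    // family == null: grant covers the whole table; qualifier == null: the whole family.
    static final class Grant {
        final String subject, table, family, qualifier;
        final Set<Action> actions;

        Grant(String subject, String table, String family, String qualifier, Set<Action> actions) {
            this.subject = subject;
            this.table = table;
            this.family = family;
            this.qualifier = qualifier;
            this.actions = actions;
        }

        boolean covers(String t, String f, String q) {
            return table.equals(t)
                    && (family == null || family.equals(f))
                    && (qualifier == null || qualifier.equals(q));
        }

        // "@name" names a group; anything else names a single user.
        boolean appliesTo(String user, Set<String> groups) {
            return subject.startsWith("@") ? groups.contains(subject.substring(1)) : subject.equals(user);
        }
    }

    private final List<Grant> grants = new ArrayList<>();
    private final Set<String> globalAdmins;

    AclSketch(Set<String> globalAdmins) { this.globalAdmins = globalAdmins; }

    void grant(String subject, String table, String family, String qualifier, Set<Action> actions) {
        grants.add(new Grant(subject, table, family, qualifier, actions));
    }

    boolean isAllowed(String user, Set<String> groups, Action action,
                      String table, String family, String qualifier) {
        for (Grant g : grants) {
            if (g.appliesTo(user, groups) && g.actions.contains(action) && g.covers(table, family, qualifier)) {
                return true;
            }
        }
        return false;
    }

    // Per slide 12, only a global admin may create or drop tables.
    boolean canCreateOrDropTable(String user) { return globalAdmins.contains(user); }

    public static void main(String[] args) {
        AclSketch acl = new AclSketch(Set.of("hbase_admin"));
        acl.grant("@analysts", "users", "Info", null, EnumSet.of(Action.READ));
        acl.grant("alice", "users", null, null, EnumSet.of(Action.READ, Action.WRITE));
        Set<String> bobGroups = Set.of("analysts");
        System.out.println(acl.isAllowed("bob", bobGroups, Action.READ, "users", "Info", "Email"));  // true
        System.out.println(acl.isAllowed("bob", bobGroups, Action.WRITE, "users", "Info", "Email")); // false
        System.out.println(acl.isAllowed("alice", Set.of(), Action.WRITE, "users", "Info", "Age"));  // true
        System.out.println(acl.canCreateOrDropTable("alice"));                                       // false
    }
}
```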
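Slides 14 to 16 describe tables that belong to a group, with each table's regions allowed only on that group's servers. Below is a minimal sketch of that placement rule using simple round-robin inside each group. It is not the GroupBasedLoadBalancer from HBASE-6721. RsGroupSketch is hypothetical, its method names only echo the slide's group_move_servers and group_move_tables commands, and falling back to a group named "default" for unassigned tables is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of group-based region placement (slides 14-16).
// Hypothetical names; not the GroupBasedLoadBalancer code from HBASE-6721.
class RsGroupSketch {
    private final Map<String, List<String>> groupServers = new HashMap<>();
    private final Map<String, String> tableGroup = new HashMap<>();

    void addGroup(String group, List<String> servers) {
        groupServers.put(group, new ArrayList<>(servers));
    }

    // Counterpart of group_move_tables: the table's regions must now live in this group.
    void moveTable(String table, String group) {
        if (!groupServers.containsKey(group)) throw new IllegalArgumentException("unknown group " + group);
        tableGroup.put(table, group);
    }

    // Counterpart of group_move_servers: a server leaves one group and joins another.
    void moveServer(String server, String from, String to) {
        List<String> src = groupServers.get(from), dst = groupServers.get(to);
        if (src == null || dst == null) throw new IllegalArgumentException("unknown group");
        if (!src.remove(server)) throw new IllegalArgumentException(server + " is not in " + from);
        dst.add(server);
    }

    // Round-robin the table's regions across its group's servers only.
    Map<String, String> assign(String table, List<String> regions) {
        List<String> servers = groupServers.get(tableGroup.getOrDefault(table, "default"));
        if (servers == null || servers.isEmpty()) {
            throw new IllegalStateException("no servers available for " + table);
        }
        Map<String, String> plan = new LinkedHashMap<>();
        for (int i = 0; i < regions.size(); i++) {
            plan.put(regions.get(i), servers.get(i % servers.size()));
        }
        return plan;
    }

    public static void main(String[] args) {
        RsGroupSketch g = new RsGroupSketch();
        g.addGroup("Foo", List.of("RS1", "RS2", "RS3", "RS4"));
        g.addGroup("Bar", List.of("RS5", "RS6", "RS7", "RS8"));
        g.moveTable("Table1", "Foo");
        g.moveTable("Table3", "Bar");
        System.out.println(g.assign("Table1", List.of("r1", "r2", "r3", "r4", "r5")));
        // {r1=RS1, r2=RS2, r3=RS3, r4=RS4, r5=RS1}
        System.out.println(g.assign("Table3", List.of("r1", "r2")));
        // {r1=RS5, r2=RS6}
    }
}
```

Because the candidate server list is filtered by group before any balancing happens, a busy table in group Foo can never spill regions onto RS5…RS8. That is the resource-isolation property the slides describe.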
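The naming rules on slide 17 can be expressed directly. The hypothetical helper below splits a table name at the first dot, falls back to the default namespace when there is none, and builds the path in the slide's layout. Treat it as a sketch of the design as presented: released HBase versions (0.96 and later) use a colon as the separator, as in my_ns:my_table.

```java
// Illustrative parsing of "<namespace>.<qualifier>" table names as described on slide 17.
// Hypothetical helper; released HBase versions use ':' instead of '.'.
class NamespaceNameSketch {
    static final String DEFAULT_NS = "default";

    // Namespace part, or "default" for a table with no explicit namespace.
    static String namespaceOf(String tableName) {
        int dot = tableName.indexOf('.');
        return dot < 0 ? DEFAULT_NS : tableName.substring(0, dot);
    }

    // Qualifier part (the whole name when there is no namespace).
    static String qualifierOf(String tableName) {
        int dot = tableName.indexOf('.');
        return dot < 0 ? tableName : tableName.substring(dot + 1);
    }

    // /<hbaseRoot>/data/<namespace>/<tableName>, matching the slide's example path.
    static String tablePath(String hbaseRoot, String tableName) {
        return hbaseRoot + "/data/" + namespaceOf(tableName) + "/" + tableName;
    }

    public static void main(String[] args) {
        System.out.println(namespaceOf("my_ns.my_table"));          // my_ns
        System.out.println(qualifierOf("my_ns.my_table"));          // my_table
        System.out.println(tablePath("/hbase", "my_ns.my_table"));  // /hbase/data/my_ns/my_ns.my_table
        System.out.println(namespaceOf("legacy_table"));            // default
    }
}
```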
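Slides 18 and 19 attach a quota of maximum tables and maximum regions to each namespace. Below is a minimal sketch of the admission check. It assumes that quota is consumed when a table is created with its initial regions and released when the table is dropped. It is not the NamespaceController coprocessor from HBASE-8015, and NamespaceQuotaSketch, its methods and the limits in the example are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative namespace quota check (slides 18-19).
// Hypothetical names; not the NamespaceController coprocessor from HBASE-8015.
class NamespaceQuotaSketch {
    static final class Usage {
        final int maxTables;
        final int maxRegions;
        int tables;
        int regions;
        Usage(int maxTables, int maxRegions) { this.maxTables = maxTables; this.maxRegions = maxRegions; }
    }

    private final Map<String, Usage> usage = new HashMap<>();

    synchronized void createNamespace(String ns, int maxTables, int maxRegions) {
        usage.put(ns, new Usage(maxTables, maxRegions));
    }

    private Usage usageOf(String ns) {
        Usage u = usage.get(ns);
        if (u == null) throw new IllegalArgumentException("unknown namespace " + ns);
        return u;
    }

    // Admit the create only if neither limit would be exceeded, then record the new usage.
    synchronized boolean tryCreateTable(String ns, int initialRegions) {
        Usage u = usageOf(ns);
        if (u.tables + 1 > u.maxTables || u.regions + initialRegions > u.maxRegions) return false;
        u.tables++;
        u.regions += initialRegions;
        return true;
    }

    // Dropping a table returns its table slot and regions to the namespace.
    synchronized void dropTable(String ns, int regions) {
        Usage u = usageOf(ns);
        u.tables--;
        u.regions -= regions;
    }

    public static void main(String[] args) {
        NamespaceQuotaSketch q = new NamespaceQuotaSketch();
        q.createNamespace("my_ns", 2, 10);
        System.out.println(q.tryCreateTable("my_ns", 4)); // true  (1 table, 4 regions)
        System.out.println(q.tryCreateTable("my_ns", 8)); // false (12 regions > 10)
        System.out.println(q.tryCreateTable("my_ns", 6)); // true  (2 tables, 10 regions)
        System.out.println(q.tryCreateTable("my_ns", 1)); // false (3 tables > 2)
    }
}
```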