Hbasepreso 111116185419-phpapp02

  1. Introduction to HBase
     Gokuldas K Pillai (@gokool)
  2. HBase - The Hadoop Database
     • Based on Google’s BigTable (OSDI ’06)
     • Runs on top of Hadoop but provides real-time read/write access
     • Distributed column-oriented database
  3. HBase Strengths
     • Can scale to billions of rows × millions of columns
     • Relatively cheap and easy to scale
     • Random, real-time read/write access to very large data sets
     • Support for update and delete
  4. Who is using it
     • StumbleUpon / su.pr
       – Uses HBase as a real-time data storage and analytics platform
     • Twitter
       – Distributed read/write backup of all MySQL instances. Powers “people search”.
     • Powerset (now part of Microsoft)
     • Adobe
     • Yahoo
     • Ning
     • Meetup
     • More at http://wiki.apache.org/hadoop/Hbase/PoweredBy
  5. Key features
     • Column-oriented store
       – A table costs storage only for the data actually stored
       – NULLs in rows are free
     • Rows are stored in sorted order
     • Can scale to petabytes (at Google)
  6. Comparing to RDBMS
     • No joins
     • No query engine
     • No transactions
     • No column typing
     • No SQL, no ODBC/JDBC (HBql is there now)
  7. Data Model - Tables
     • Tables consist of rows and columns
     • Table cells are versioned (by timestamp)
     • Tables are sorted by row key
     • Table access is via the primary key (the row key)
     • A row update locks the row no matter how many columns are involved
  8. Column Families
     • A row’s columns are grouped into families
     • Column family members are identified by a common ‘printable’ prefix
     • Column families should be predefined
       – but column family members can be added dynamically
       – member names can be arbitrary bytes
     • All column family members are collocated on disk
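
The data model on slides 7 and 8 can be sketched as a small in-memory structure. This is a toy model in plain Python, not the HBase client API: the class name `ToyTable`, its methods, and the use of strings for row keys and integers for timestamps are all assumptions made for illustration.

```python
from bisect import insort


class ToyTable:
    """Toy model (not HBase code): sorted rows, predefined families, versioned cells."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed at creation
        self.row_keys = []             # kept sorted, mirroring sorted row order
        self.rows = {}                 # row key -> {(family, qualifier): {timestamp: value}}

    def put(self, row, column, value, ts):
        # A column is written as "family:qualifier"; the family must already exist,
        # but the qualifier (family member) can be anything, added on the fly.
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            insort(self.row_keys, row)
            self.rows[row] = {}
        self.rows[row].setdefault((family, qualifier), {})[ts] = value

    def get(self, row, column, ts=None):
        # Return the newest version, or the newest one at or before `ts`.
        family, _, qualifier = column.partition(":")
        versions = self.rows.get(row, {}).get((family, qualifier), {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self):
        # Rows always come back in row-key order.
        return list(self.row_keys)


table = ToyTable(["info"])
table.put("row-b", "info:email", "old@example.com", ts=1)
table.put("row-b", "info:email", "new@example.com", ts=2)
table.put("row-a", "info:name", "Ann", ts=1)
```

A cell that was never written takes no space in `rows`, which is the sense in which NULLs are free in this kind of store.
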
  9. Server Architecture
     • Similar to HDFS
       – HBaseMaster ~ NameNode
       – RegionServer ~ DataNode
     • HBase stores state via the Hadoop FS API
     • Can persist to:
       – Local filesystem
       – Amazon S3
       – HDFS (default)
  10. HBaseMaster
      What it does:
      • Bootstraps a new instance
      • Assigns regions and handles RegionServer problems
        – Each region from every table is assigned to a RegionServer
      • When machines fail, moves regions
      • When regions split, moves regions to balance load
      What it does NOT do:
      • Handle write requests (not a DB master)
      • Handle location-finding requests (handled by RegionServer)
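
The master's assignment role on slide 10 can be illustrated with a toy bookkeeping class. This is only a sketch under assumptions: `ToyMaster`, the least-loaded placement rule, and the server and region names are hypothetical and do not reflect how HBase actually balances regions.

```python
class ToyMaster:
    """Toy model (not HBase code): tracks which server carries which regions."""

    def __init__(self, servers):
        self.assignment = {s: [] for s in servers}  # server -> list of regions

    def _least_loaded(self):
        # Assumed policy: pick the server with the fewest regions (ties by name).
        return min(self.assignment, key=lambda s: (len(self.assignment[s]), s))

    def assign(self, region):
        server = self._least_loaded()
        self.assignment[server].append(region)
        return server

    def server_failed(self, server):
        # Move every region of the failed server onto the surviving servers.
        orphaned = self.assignment.pop(server)
        for region in orphaned:
            self.assign(region)


master = ToyMaster(["rs1", "rs2"])
for region in ["r1", "r2", "r3", "r4"]:
    master.assign(region)
master.server_failed("rs1")
```

Note that the class never touches row data: as the slide says, the master only moves regions around and is not on the read/write path.
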
  11. RegionServer
      • Carries the regions
      • Handles client read/write requests
      • Manages region splits (informs the Master)
  12. Regions
      • Horizontal partitioning
      • Every region has a subset of the table’s rows
      • A region is identified as
        – [table, first row (+, inclusive), last row (-, exclusive)]
      • A table starts on a single region
      • A region splits into two roughly equal-sized regions as it grows bigger, and so on
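
Region splitting and row-to-region lookup from slide 12 can be sketched as follows. This is a toy model, not HBase code: `ToyRegions` and its row-count threshold `max_rows_per_region` are assumptions (a real cluster splits on store size, not on a row count).

```python
from bisect import bisect_right


class ToyRegions:
    """Toy model (not HBase code): one table partitioned into row-key ranges."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.start_keys = [""]               # "" marks the start of the table
        self.regions = [[]]                  # sorted row keys held by each region

    def locate(self, row):
        # The owning region is the one with the greatest start key <= row.
        return bisect_right(self.start_keys, row) - 1

    def insert(self, row):
        i = self.locate(row)
        rows = self.regions[i]
        if row not in rows:
            rows.append(row)
            rows.sort()
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Cut the region in the middle; the upper half starts a new region.
        rows = self.regions[i]
        mid = len(rows) // 2
        self.regions[i:i + 1] = [rows[:mid], rows[mid:]]
        self.start_keys.insert(i + 1, rows[mid])

    def ranges(self):
        # (first row inclusive, last row exclusive); None means end of table.
        ends = self.start_keys[1:] + [None]
        return list(zip(self.start_keys, ends))


table = ToyRegions(max_rows_per_region=4)
for key in ["a", "b", "c", "d", "e"]:
    table.insert(key)
```

With a threshold of four rows, the fifth insert triggers the first split, so the table goes from one region to two.
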
  13. ZooKeeper
      • Master election and server availability
      • Cluster management
        – Assignment transaction state management
      • Client contacts ZooKeeper to bootstrap its connection to the HBase cluster
      • Region key ranges, region server addresses
      • Guarantees consistency of data across clients
  14. Workflow (client connecting for the first time)
      • Client → ZooKeeper (returns the location of -ROOT-)
      • Client → -ROOT- (returns the location of .META.)
      • Client → .META. (returns the RegionServer)
      • To avoid three lookups every time, the client caches this info
        – Re-cache on fault
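
The lookup chain and client-side cache on slide 14 can be sketched with two toy classes. Everything here is hypothetical: `ToyCluster` and `ToyClient` are stand-ins rather than HBase classes, and regions are keyed by the first character of the row key purely to keep the example short.

```python
class ToyCluster:
    """Toy stand-in for ZooKeeper, -ROOT- and .META.; counts remote lookups."""

    def __init__(self, region_servers):
        self.region_servers = dict(region_servers)  # first char of row -> server
        self.lookups = 0

    def zookeeper(self):
        self.lookups += 1
        return "-ROOT-"

    def root(self):
        self.lookups += 1
        return ".META."

    def meta(self, row):
        self.lookups += 1
        return self.region_servers[row[0]]


class ToyClient:
    """Toy client: three lookups on a cache miss, none on a hit."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = {}  # region id (first char) -> server

    def locate(self, row):
        region = row[0]
        if region not in self.cache:
            self.cluster.zookeeper()  # step 1: where is -ROOT-?
            self.cluster.root()       # step 2: where is .META.?
            self.cache[region] = self.cluster.meta(row)  # step 3: which server?
        return self.cache[region]

    def on_fault(self, row):
        # Drop the stale entry so that the next locate() walks the chain again.
        self.cache.pop(row[0], None)


cluster = ToyCluster({"a": "rs1", "b": "rs2"})
client = ToyClient(cluster)
client.locate("apple")    # cache miss: three lookups
client.locate("avocado")  # same region: served from the cache
```
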
  15. Write/Read Operation
      • Write request: Client → RegionServer → commit log (on HDFS), memstore
      • Flush to the filesystem when the memstore fills
      • Read request: Client → RegionServer
        – Look up the memstore first
        – If not found there, look up the flush files (in reverse chronological order)
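
The write and read paths on slide 15 can be sketched in a few lines. This is a toy model under assumptions, not RegionServer code: `ToyRegionStore` is a hypothetical name, the commit log is a plain list rather than a file on HDFS, and the memstore limit counts entries rather than bytes.

```python
class ToyRegionStore:
    """Toy model (not HBase code): commit log, memstore and flush files."""

    def __init__(self, memstore_limit=2):
        self.memstore_limit = memstore_limit  # hypothetical: entries, not bytes
        self.commit_log = []   # stands in for the commit log on HDFS
        self.memstore = {}     # recent writes, held in memory
        self.flush_files = []  # oldest first; each file is never modified again

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, then apply in memory
        self.memstore[key] = value
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        # Persist the memstore as a new sorted file and start with an empty one.
        if self.memstore:
            self.flush_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def read(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for flushed in reversed(self.flush_files):  # newest file first
            if key in flushed:
                return flushed[key]
        return None


store = ToyRegionStore(memstore_limit=2)
store.write("k1", "v1")
store.write("k2", "v2")   # memstore is full: flushed to the first file
store.write("k1", "v1b")  # newer value for k1 lives in the memstore
```

Checking the newest data first is what makes an update visible even though the older value still sits in an earlier flush file.
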
  16. Integration
      • Java HBase client API
      • High-performance Thrift gateway
      • A RESTful web service gateway (Stargate)
        – Supports XML and binary data encoding options
      • Cascading, Hive and Pig integration
      • HBase shell (JRuby)
      • TableInputFormat/TableOutputFormat for MapReduce
  17. Main Classes
      • HBaseAdmin
        – Create table, drop table, list and alter table
      • HTable
        – Put
        – Get
        – Scan
  18. Alternatives to HBase
      • Cassandra (from Facebook)
        – Based on Amazon’s Dynamo
        – No master-slave; peer-to-peer (P2P) instead
        – Tunable: consistency vs. latency
      • Yahoo’s PNUTS
        – Not open source
        – Works well for multi-DC / geographically dispersed servers
  19. References
      • Hadoop: The Definitive Guide
      • Cloudera website
      • http://wiki.hbase.apache.org
      • Lars George
        – http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
      • Comparing HBase, Cassandra and PNUTS
        – http://blog.amandeepkhurana.com/2010/05/comparing-pnuts-hbase-and-cassandra.html
      • ACID compliance of HBase
        – http://hbase.apache.org/docs/r0.89.20100621/acid-semantics.html
