A Soft Introduction
What is Hbase 
 Non-relational, distributed database 
 Column‐Oriented 
 Multi‐Dimensional 
 High Availability 
 High Performance 
 open source
Implementation 
 HBase modeled with an HBase master node 
orchestrating a cluster of one or more regionserver 
slaves. 
 The HBase master is responsible for bootstrapping a 
virgin install, for assigning regions to registered region 
servers, and for recovering region server failures. The 
master node is lightly loaded.
Installation 
 Download a stable release from an Apache Download 
Mirror and unpack it on your local filesystem. 
 you first need to tell HBase where Java is located on 
your system. i.e JAVA_HOME environment variable 
 you can set the Java installation that HBase uses by 
editing HBase’s conf/hbaseenv.sh, and specifying the 
JAVA_HOME
Continue 
 Add the HBase binary directory to your command-line 
path. For example: 
% exportHBASE_HOME=/home/hbase/hbase-x.y.z 
% export PATH=$PATH:$HBASE_HOME/bin 
 To get the list of HBase options, type: % hbase 
 To start a temporary instance of HBase 
% start-hbase.sh 
 launch the HBase shell by typing: % hbase shell 
type:
Difference Between Hadoop/HDFS 
and Hbase 
 HDFS is a distributed file system that is well suited for 
the storage of large files. 
 HBase, on the other hand, is built on top of HDFS and 
provides fast record lookups (and updates) for large 
tables. 
 HDFS has based on GFS file system. 
 Hbase is Distributed – uses HDFS for storage ,Column 
– Oriented , Multi Dimensional( Versions) , Storage 
System
Hbase is NOT 
 A sql Database – No Joins, 
datatypes , no (damn) sql 
 No Schema 
 No DBA needed 
no query engine, no
Programming with HBase 
 HBase shell 
 Scan, List, Create, Get, Put, Delete 
 Java API 
 Get, Put, Delete, Scan,… 
 Non-Java Clients 
 Thrift 
 REST 
 HBase MapReduce API 
 hbase.mapreduce.TableMapper; 
 hbase.mapreduce.TableReducer; 
 High Level Interface 
 Pig, Hive
Storage Model 
 Column – oriented database (column families) 
 Table consists of Rows, each which has a 
primary 
key(row key) 
 Each Row may have any number of columns 
 Table schema only defines Column familes(column 
family can have any number of columns) 
 Each cell value has a timestamp
Hbase shell 
 hbase(main):003:0> create 'test', 'cf' 
 0 row(s) in 1.2200 seconds 
 hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1' 
 0 row(s) in 0.0560 seconds 
 0 hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2' 
 0 row(s) in 0.0370 seconds 
 hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3' 
 0 row(s) in 0.0450 seconds
Hbase vs RDBMS 
RDBMS Hbase 
 Data layout: Row-oriented Column family oriented 
 Query language SQL Get/put/scan/etc * 
 Security 
 Max data size 
 Read/write 
Authentication/Authorization Work in Progress 
TBs 
1000s queries/second 
Hundrends of PBs 
Millions of queries /second
Use Cases 
 Facebook Analytics 
 Real-time counters of URLs shared, preferred links 
 Twitter 
 25 TB of message every month 
 Mozilla 
 Store crashes report, 2.5 million per day.
Hbase

Hbase

  • 1.
  • 2.
    What is Hbase  Non-relational, distributed database  Column‐Oriented  Multi‐Dimensional  High Availability  High Performance  open source
  • 3.
    Implementation  HBasemodeled with an HBase master node orchestrating a cluster of one or more regionserver slaves.  The HBase master is responsible for bootstrapping a virgin install, for assigning regions to registered region servers, and for recovering region server failures. The master node is lightly loaded.
  • 5.
    Installation  Downloada stable release from an Apache Download Mirror and unpack it on your local filesystem.  you first need to tell HBase where Java is located on your system. i.e JAVA_HOME environment variable  you can set the Java installation that HBase uses by editing HBase’s conf/hbaseenv.sh, and specifying the JAVA_HOME
  • 6.
    Continue  Addthe HBase binary directory to your command-line path. For example: % exportHBASE_HOME=/home/hbase/hbase-x.y.z % export PATH=$PATH:$HBASE_HOME/bin  To get the list of HBase options, type: % hbase  To start a temporary instance of HBase % start-hbase.sh  launch the HBase shell by typing: % hbase shell type:
  • 7.
    Difference Between Hadoop/HDFS and Hbase  HDFS is a distributed file system that is well suited for the storage of large files.  HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.  HDFS has based on GFS file system.  Hbase is Distributed – uses HDFS for storage ,Column – Oriented , Multi Dimensional( Versions) , Storage System
  • 8.
    Hbase is NOT  A sql Database – No Joins, datatypes , no (damn) sql  No Schema  No DBA needed no query engine, no
  • 9.
    Programming with HBase  HBase shell  Scan, List, Create, Get, Put, Delete  Java API  Get, Put, Delete, Scan,…  Non-Java Clients  Thrift  REST  HBase MapReduce API  hbase.mapreduce.TableMapper;  hbase.mapreduce.TableReducer;  High Level Interface  Pig, Hive
  • 10.
    Storage Model Column – oriented database (column families)  Table consists of Rows, each which has a primary key(row key)  Each Row may have any number of columns  Table schema only defines Column familes(column family can have any number of columns)  Each cell value has a timestamp
  • 11.
    Hbase shell hbase(main):003:0> create 'test', 'cf'  0 row(s) in 1.2200 seconds  hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'  0 row(s) in 0.0560 seconds  0 hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'  0 row(s) in 0.0370 seconds  hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'  0 row(s) in 0.0450 seconds
  • 12.
    Hbase vs RDBMS RDBMS Hbase  Data layout: Row-oriented Column family oriented  Query language SQL Get/put/scan/etc *  Security  Max data size  Read/write Authentication/Authorization Work in Progress TBs 1000s queries/second Hundrends of PBs Millions of queries /second
  • 13.
    Use Cases Facebook Analytics  Real-time counters of URLs shared, preferred links  Twitter  25 TB of message every month  Mozilla  Store crashes report, 2.5 million per day.