Atilim University hosted a presentation on HBase given by Dr. Ziya Karakaya and Mirwais Doost. The presentation covered what HBase is, its features and applications, how it compares to relational databases, its storage model, and architectural components. HBase is a column-oriented NoSQL database that runs on HDFS and is well-suited for sparse datasets. It provides horizontal scalability and supports features like consistent reads/writes and failure recovery through its use of write-ahead logging.
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. It is a column-oriented database management system that runs on top of HDFS.
Apache HBase is an open source NoSQL database that provides real-time read/write access to large data sets. HBase is natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN.
Chicago Data Summit: Apache HBase: An Introduction (Cloudera, Inc.)
Apache HBase is an open source distributed data-store capable of managing billions of rows of semi-structured data across large clusters of commodity hardware. HBase provides real-time random read-write access as well as integration with Hadoop MapReduce, Hive, and Pig for batch analysis. In this talk, Todd will provide an introduction to the capabilities and characteristics of HBase, comparing and contrasting it with traditional database systems. He will also introduce its architecture and data model, and present some example use cases.
Performance Analysis of HBASE and MONGODB (Kaushik Rajan)
Comparison of different NoSQL databases, namely HBase and MongoDB, at different workloads using the Yahoo! Cloud Serving Benchmark (YCSB)
Tools used
> HBase, MongoDB, Shell Scripting, YCSB, Hadoop Environment
> Tableau for Visualization
> LaTeX for documentation
Introduction to Apache HBase, MapR Tables and Security (MapR Technologies)
This talk will focus on two key aspects of applications that use the HBase APIs. The first part will provide a basic overview of how HBase works, followed by an introduction to the HBase APIs with a simple example. The second part will extend what we've learned to secure an HBase application running on MapR's industry-leading Hadoop.
Keys Botzum is a Senior Principal Technologist with MapR Technologies. He has over 15 years of experience in large scale distributed system design. At MapR his primary responsibility is working with customers as a consultant, but he also teaches classes, contributes to documentation, and works with MapR engineering. Previously he was a Senior Technical Staff Member with IBM and a respected author of many articles on WebSphere Application Server as well as a book. He holds a Masters degree in Computer Science from Stanford University and a B.S. in Applied Mathematics/Computer Science from Carnegie Mellon University.
The tech talk was given by Ranjeeth Kathiresan, Salesforce Senior Software Engineer, and Gurpreet Multani, Salesforce Principal Software Engineer, in June 2017.
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab (CloudxLab)
Big Data with Hadoop & Spark Training: http://bit.ly/2slpJqY
This CloudxLab Introduction to HBase tutorial helps you to understand HBase in detail. Below are the topics covered in this tutorial:
1) HBase - Data Models Examples
2) Bloom Filter
3) HBase - REST APIs
4) HBase - Hands-on Demos on CloudxLab
The project is focused on a comparison between HBase and Cassandra using YCSB. It is a data storage and management project performed at the National College of Ireland.
In this session you will learn:
HBase Introduction
Row & Column storage
Characteristics of a huge DB
What is HBase?
HBase Data-Model
HBase vs RDBMS
HBase architecture
HBase in operation
Loading Data into HBase
HBase shell commands
HBase operations through Java
HBase operations through MR
To know more, click here: https://www.mindsmapped.com/courses/big-data-hadoop/big-data-and-hadoop-training-for-beginners/
Apache HBase Internals you hoped you Never Needed to Understand (Josh Elser)
Covers numerous internal features, concepts, and implementations of Apache HBase. The focus will be driven from an operational standpoint, investigating each component enough to understand its role in Apache HBase and the generic problems each is trying to solve. Topics will range from HBase's RPC system to the new Procedure v2 framework, to filesystem and ZooKeeper use, to backup and replication features, to region assignment and row locks. Each topic will be covered at a high level, attempting to distill the often-complicated details down to the most salient information.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) helps avoid duplicate computations and thus can also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated, which can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
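The first optimization above, skipping vertices whose ranks have already converged, can be sketched in a few lines. This is a minimal illustration of that one idea, not the STICD implementation; the function name and tolerance are invented for the example.

```python
# Minimal power-iteration PageRank that stops recomputing a vertex once
# its rank has converged. Graph is a dict: vertex -> list of out-neighbors.

def pagerank_skip_converged(graph, damping=0.85, tol=1e-10, max_iter=100):
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    # PageRank pulls from in-links, so precompute in-neighbors once.
    in_nbrs = {v: [] for v in graph}
    for u, outs in graph.items():
        for v in outs:
            in_nbrs[v].append(u)
    converged = set()
    for _ in range(max_iter):
        changed = False
        new_ranks = dict(ranks)
        for v in graph:
            if v in converged:
                continue  # skip work for already-converged vertices
            r = (1 - damping) / n + damping * sum(
                ranks[u] / len(graph[u]) for u in in_nbrs[v] if graph[u])
            if abs(r - ranks[v]) < tol:
                converged.add(v)
            new_ranks[v] = r
            changed = True
        ranks = new_ranks
        if not changed:
            break
    return ranks
```

Note the trade-off the paragraph mentions: freezing a vertex is safe only once its in-neighbors have also stabilized, which is why combined schemes like STICD process components in topological order.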
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group ("MCG") expects demand to grow and supply to evolve, driven by institutional investment rotating out of offices and into work from home ("WFH"), alongside the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as advancing cloud services and edge sites, with the industry expected to see strong annual growth of 13% over the next four years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
6. Introduction to HBase: Structured data
In the past, data used to be less and was mostly structured. This data could easily be stored in a Relational Database (RDBMS).
8. Introduction to HBase: Semi-structured data
Then, the Internet evolved and huge volumes of structured and semi-structured data got generated. Apache HBase was the solution for this.
9. What is HBase?
HBase is a column-oriented database management system, derived from Google's NoSQL database Bigtable, that runs on top of HDFS.
1) Open-source project that is horizontally scalable
2) NoSQL database written in Java which performs faster querying
3) Well suited for sparse datasets (can contain missing or NA values)
11. Applications of HBase
Medical: HBase is used for genome sequences and for storing the disease history of people or of an area.
E-Commerce: HBase is used for storing logs about customer search history, and performs analytics and targeted advertisement for better business insights.
Sports: HBase stores match details and the history of each match, and uses this data for better prediction.
13. HBase vs RDBMS
HBase: does not have a fixed schema (schema-less); defines only column families. Works well with structured and semi-structured data. Can hold denormalized data (can contain missing or NA values). Built for wide tables that can be scaled horizontally.
RDBMS: has a fixed schema which describes the structure of the tables. Works well with structured data. Can store only normalized data. Built for tables that are hard to scale.
15. HBase column-oriented storage
[Diagram: rows (Row 1 to Row 3) span three column families, each holding column qualifiers (Col 1 to Col 9); a row key plus a column family, a column qualifier, and a cell address each value.]
16. HBase column-oriented storage
Row key: empid. Column families: "Personal data" (name, city, age) and "Professional data" (designation, salary).
empid | name | city | age | designation | salary
1 | Angela | Chicago | 31 | Big Data Architect | $70,000
2 | Dwayne | Boston | 35 | Web Developer | $65,000
3 | David | Seattle | 29 | Data Analytics | $55,000
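The employee table above maps onto HBase's storage model as cells addressed by (row key, column family, column qualifier). A plain nested dict is enough to sketch this; the helper names are invented for the example, and sparse rows simply omit qualifiers instead of storing NULLs.

```python
# Conceptual sketch of HBase's cell addressing -- not the HBase client API.
table = {}  # row key -> column family -> qualifier -> value

def put(row, family, qualifier, value):
    table.setdefault(row, {}).setdefault(family, {})[qualifier] = value

def get(row, family, qualifier):
    # Missing cells return None: no schema forces a value to exist.
    return table.get(row, {}).get(family, {}).get(qualifier)

# Rows from the slide; empid is the row key.
put("1", "personal", "name", "Angela")
put("1", "personal", "city", "Chicago")
put("1", "personal", "age", "31")
put("1", "professional", "designation", "Big Data Architect")
put("1", "professional", "salary", "$70,000")
put("2", "personal", "name", "Dwayne")  # row 2 left deliberately sparse
```

This is why HBase suits sparse datasets: an absent qualifier costs nothing, whereas a fixed RDBMS schema would store a NULL in every empty column.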
18. HBase Architectural Components
[Diagram: multiple Region Servers on top of HDFS; each hosts a Region with an HLog, store memory (MemStore), and HFiles.]
Apache ZooKeeper is used for monitoring. The HBase Master (HMaster) assigns regions and performs load balancing.
19. HBase Architectural Components - Regions
[Diagram: a client issues a get request to two Region Servers, each hosting two regions bounded by a startKey and an endKey.]
HBase tables are divided horizontally by row key range into "Regions". Regions are assigned to the nodes in the cluster, called "Region Servers". A region contains all rows in the table between the region's start key and end key. These servers serve data for reads and writes.
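The row-key-range routing described above can be sketched as a range search over sorted region boundaries. Region names and boundaries here are made up for illustration; an empty string stands for an open-ended boundary, as in HBase's first and last regions.

```python
# Route a row key to the region owning its [start_key, end_key) range.
import bisect

# Regions sorted by start key; each end key is the next region's start key.
regions = [("", "g", "region-1"), ("g", "p", "region-2"), ("p", "", "region-3")]
starts = [r[0] for r in regions]

def find_region(row_key):
    # Rightmost region whose start key <= row_key.
    i = bisect.bisect_right(starts, row_key) - 1
    start, end, name = regions[i]
    assert start <= row_key and (end == "" or row_key < end)
    return name
```

When a region grows too large it splits at a middle key, which in this model just means inserting a new (start, end, name) triple; the lookup logic is unchanged.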
20. HBase Architectural Components - HMaster
[Diagram: the client sends create/delete/update table requests to the HMaster, which assigns regions to region servers and monitors them.]
Region assignment and Data Definition Language operations (create, delete) are handled by the HMaster. It assigns and re-assigns regions for recovery or load balancing, and monitors all region servers. HBase has a distributed environment where the HMaster alone is not sufficient to manage everything; hence, ZooKeeper was introduced.
21. HBase Architectural Components - ZooKeeper
[Diagram: the active and inactive HMasters and the region servers send heartbeats to ZooKeeper.]
ZooKeeper is a distributed coordination service used to maintain server state in the cluster. It tracks which servers are alive and available, and provides server failure notification. The active HMaster sends a heartbeat signal to ZooKeeper indicating that it is active, and region servers send their status to ZooKeeper indicating they are ready for read and write operations.
22. HBase Architectural Components - ZooKeeper (continued)
[Same diagram: heartbeats from the HMasters and region servers to ZooKeeper.]
The inactive HMaster acts as a backup: if the active HMaster fails, it will come to the rescue.
23. HBase Architectural Components work together
[Diagram: one HMaster is active; ZooKeeper handles active HMaster selection and region server session heartbeats via ephemeral nodes.]
The active HMaster and the Region Servers connect to ZooKeeper with a session. ZooKeeper maintains ephemeral nodes for active sessions via heartbeats, indicating that the region servers are up and running.
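The ephemeral-node mechanism above can be modeled in a toy form: a session keeps its node alive only while heartbeats arrive within the session timeout; once a session expires, its node vanishes, which is how a backup HMaster learns it should take over. This is purely illustrative; the class, paths, and timeout are invented and this is not the ZooKeeper API.

```python
# Toy model of ZooKeeper ephemeral nodes kept alive by heartbeats.
class TinyZooKeeper:
    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}  # ephemeral node path -> last heartbeat time

    def heartbeat(self, node, now):
        # Each heartbeat renews the session that owns the node.
        self.last_heartbeat[node] = now

    def alive_nodes(self, now):
        # Ephemeral nodes disappear once their session times out.
        return {n for n, t in self.last_heartbeat.items()
                if now - t <= self.session_timeout}

zk = TinyZooKeeper(session_timeout=3.0)
zk.heartbeat("/hbase/master/active", now=0.0)
zk.heartbeat("/hbase/rs/server-1", now=0.0)
zk.heartbeat("/hbase/rs/server-1", now=4.0)  # server-1 renews its session
# By t=5.0 the active master has gone silent for longer than the timeout,
# so its ephemeral node is gone and a backup HMaster could take over.
```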
24. HBase Read or Write
[Diagram: client, ZooKeeper, and two Region Servers running on DataNodes.]
There is a special HBase catalog table called the META table, which holds the location of the regions in the cluster; the META location is stored in ZooKeeper. Here is what happens the first time a client reads or writes data to HBase: the client asks ZooKeeper for the Region Server that hosts the META table.
25. HBase Read or Write (continued)
The client then queries the META table on that server to get the Region Server for the row key, and caches this information along with the META table location.
26. HBase Read or Write (continued)
Finally, the client gets (or puts) the row from the corresponding Region Server.
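The three-step read path in slides 24 to 26 can be sketched as follows: the client asks ZooKeeper for the META location once, caches the region location it finds there, and afterwards talks straight to the right region server. All objects and server names here are stand-ins, not the HBase client API.

```python
# Sketch of the HBase client lookup path with a META cache.
ZOOKEEPER = {"meta_server": "rs-1"}            # where META lives
META = {("emp", "1"): "rs-2"}                  # (table, row key) -> region server

class Client:
    def __init__(self):
        self.meta_cache = {}
        self.zk_lookups = 0

    def locate(self, tbl, row):
        key = (tbl, row)
        if key not in self.meta_cache:
            self.zk_lookups += 1               # 1. ask ZooKeeper for META's server
            _ = ZOOKEEPER["meta_server"]       # 2. read META on that server
            self.meta_cache[key] = META[key]   #    and cache the region location
        return self.meta_cache[key]            # 3. talk to the region server

client = Client()
client.locate("emp", "1")
client.locate("emp", "1")  # served from cache, no second ZooKeeper lookup
```

The cache is what makes steady-state reads cheap: only the first access (or a cache miss after a region moves) pays for the ZooKeeper and META round trips.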
27. HBase Meta Table
[Diagram: the META table on one Region Server points to regions hosted on other Region Servers.]
The META table is a special HBase catalog table that maintains a list of all the Region Servers in the HBase storage system. It is used to find the Region for a given table key: its row key is (table, key, region) and its value is the region server.
28. HBase Write Mechanism
[Diagram: client, WAL, two MemStores in a region, and HFiles on an HDFS DataNode.]
Step 1: When a client issues a put request, the data is written to the write-ahead log (WAL). The WAL is a file used to store new data that is yet to be put on permanent storage; it is used for recovery in the case of failure.
29. HBase Write Mechanism (continued)
Step 2: Once data is written to the WAL, it is copied to the MemStore. The MemStore is the write cache that stores new data that has not yet been written to disk. There is one MemStore per column family per region.
30. HBase Write Mechanism (continued)
Step 3: Once the data is placed in the MemStore, the client receives an acknowledgment (ACK).
31. HBase Write Mechanism (continued)
Step 4: When the MemStore reaches its threshold, it dumps (commits) the data into an HFile. HFiles store the rows of data as sorted KeyValues on disk.
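The four write steps above can be sketched end to end: append to the WAL first, buffer in the MemStore, acknowledge the client, and flush to an immutable HFile once the MemStore crosses its threshold. The class, field names, and threshold are invented for illustration; real HBase flushes on MemStore size in bytes, not entry count.

```python
# Sketch of the HBase region write path: WAL -> MemStore -> ACK -> HFile flush.
class Region:
    def __init__(self, flush_threshold=3):
        self.wal = []        # write-ahead log, replayed for crash recovery
        self.memstore = {}   # in-memory write cache, one per column family
        self.hfiles = []     # immutable files flushed to "disk"
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        self.wal.append((row, value))   # 1. durable log entry first
        self.memstore[row] = value      # 2. then the write cache
        if len(self.memstore) >= self.flush_threshold:
            # 4. flush: write the sorted cache out as a new HFile
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore.clear()
        return "ACK"                    # 3. client gets its acknowledgment

region = Region()
for i in range(3):
    region.put(f"row-{i}", i)
```

Ordering matters here: because the WAL entry lands before the MemStore update, a crash after the ACK but before the flush loses nothing, which is the failure-recovery property the WAL exists for.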
33. HBase Features
- Scalable: data can be scaled across various nodes, as it is stored in HDFS.
- Automatic failure support: the Write Ahead Log across clusters provides automatic support against failure.
- Consistent read and write: HBase provides consistent reads and writes of data.
- Java API for client access: HBase provides an easy-to-use Java API for clients.