Atilim University
Big Data Analytics
Dr. Ziya Karakaya
Mirwais Doost
AGENDA
• What is HBase
• HBase Features
• Applications of HBase
• HBase vs RDBMS
• HBase Storage
• HBase Architectural Components
What is HBase
Introduction to HBase
In the past, data volumes were small and the data was mostly structured.
• Structured data: this data could easily be stored in a relational database (RDBMS)
Then the Internet evolved, and huge volumes of structured and semi-structured data were generated.
• Semi-structured data: storing and processing this data in an RDBMS became a major problem
• Solution: Apache HBase was the solution for this
What is HBase?
HBase is a column-oriented database management system modeled on Google's Bigtable, a NoSQL database. It runs on top of HDFS.
1. Open-source project that is horizontally scalable
2. NoSQL database written in Java that supports fast querying
3. Well suited to sparse datasets (rows can contain missing or NA values)
Applications of HBase
Medical
• HBase is used for storing genome sequences
• Storing the disease history of people or an area
E-Commerce
• HBase is used for storing logs of customer search history
• Performs analytics and targeted advertising for better business insights
Sports
• HBase stores match details and the history of each match
• Uses this data for better prediction
HBase vs RDBMS
HBase
• Does not have a fixed schema (schema-less); defines only column families
• Works well with structured and semi-structured data
• Can hold denormalized data (rows can contain missing or NA values)
• Built for wide tables that can be scaled horizontally
RDBMS
• Has a fixed schema that describes the structure of the tables
• Works well with structured data
• Typically stores normalized data
• Built for tables that are hard to scale
HBase Storage
HBase column-oriented storage
Row id | Column Family 1       | Column Family 2       | Column Family 3
       | Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7 | Col 8 | Col 9
Row 1  |
Row 2  |
Row 3  |
Labels: the Row id is the Row Key; Column Family 1 to 3 are Column Families; Col 1 to Col 9 are Column Qualifiers; each row and column intersection is a Cell.
Example:
Row Key (empid) | Personal data            | Professional data
                | name   | city    | age   | Designation        | salary
1               | Angela | Chicago | 31    | Big Data Architect | $70,000
2               | Dwayne | Boston  | 35    | Web Developer      | $65,000
3               | David  | Seattle | 29    | Data Analytics     | $55,000
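As a rough illustration of the storage layout above, the employee table can be modelled in plain Python as nested dictionaries: row key, then column family, then column qualifier, then cell value. This is only a toy sketch of the logical model. The family names `personal` and `professional`, the helper `get_cell`, and the extra row 4 are made up for illustration, and no real HBase client API is used.

```python
# Toy model of the logical layout: row key -> column family -> qualifier -> cell.
# Names here are illustrative assumptions, not part of any HBase API.
table = {
    "1": {"personal": {"name": "Angela", "city": "Chicago", "age": "31"},
          "professional": {"designation": "Big Data Architect", "salary": "$70,000"}},
    "2": {"personal": {"name": "Dwayne", "city": "Boston", "age": "35"},
          "professional": {"designation": "Web Developer", "salary": "$65,000"}},
    "3": {"personal": {"name": "David", "city": "Seattle", "age": "29"},
          "professional": {"designation": "Data Analytics", "salary": "$55,000"}},
}


def get_cell(table, row_key, family, qualifier):
    """Return the cell value, or None when the cell is absent (sparse row)."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


# A hypothetical sparse row: only one cell is stored, the rest are simply absent.
table["4"] = {"personal": {"city": "Denver"}}

print(get_cell(table, "1", "personal", "city"))
print(get_cell(table, "4", "professional", "salary"))
```

The second lookup returns None rather than failing, which mirrors how a sparse row simply has no cell for a missing value.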
HBase Architectural Components
• HMaster: the HBase Master assigns regions and handles load balancing
• Apache ZooKeeper: used for monitoring
• Region Servers: each Region Server hosts Regions, together with an HLog (write-ahead log), an in-memory store (MemStore), and HFiles
• HDFS: the storage layer underneath the Region Servers
HBase Architectural Components - Regions
• HBase tables are divided horizontally by row key range into “Regions”
• A region contains all rows in the table between the region's start key and end key
• Regions are assigned to the nodes in the cluster, called “Region Servers”
• These servers serve data for reads and writes
(Diagram: a client issues a get; Region Server 1 hosts Regions 1 and 2, Region Server 2 hosts Regions 3 and 4; each region spans a startKey to an endKey.)
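The row key range partitioning described for regions can be sketched with a sorted list of start keys and a binary search. This is a minimal sketch under stated assumptions: the boundary keys `g`, `n`, and `t` are hypothetical, each region is taken to cover its start key up to (but not including) the next start key, and the region-to-server mapping simply follows the diagram.

```python
import bisect

# Hypothetical region boundaries. The empty start key means "from the
# beginning of the table"; the last region runs to the end of the table.
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_NAMES = ["Region 1", "Region 2", "Region 3", "Region 4"]
REGION_TO_SERVER = {
    "Region 1": "Region Server 1",
    "Region 2": "Region Server 1",
    "Region 3": "Region Server 2",
    "Region 4": "Region Server 2",
}


def find_region(row_key):
    """Return the region whose key range contains row_key."""
    # bisect_right gives the position of the first start key greater than
    # row_key; the region just before that position owns the key.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_NAMES[index]


def find_server(row_key):
    """Return the Region Server that serves reads and writes for row_key."""
    return REGION_TO_SERVER[find_region(row_key)]


for key in ["apple", "g", "zebra"]:
    print(key, find_region(key), find_server(key))
```

Because rows are kept in key order, one binary search over the start keys is enough to route any row key to its region.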
HBase Architectural Components - HMaster
• Region assignment and Data Definition Language (DDL) operations (create, delete, update table) are handled by the HMaster
• The HMaster assigns and re-assigns regions for recovery or load balancing, and monitors all Region Servers
(Diagram: the client sends create, delete, and update table requests to the HMaster, which monitors the Region Servers and assigns regions to them.)
HBase is a distributed environment in which the HMaster alone is not sufficient to manage everything; hence ZooKeeper was introduced.
HBase Architectural Components - ZooKeeper
• ZooKeeper is a distributed coordination service that maintains server state in the cluster
• ZooKeeper tracks which servers are alive and available, and provides server failure notification
• The active HMaster sends a heartbeat signal to ZooKeeper to indicate that it is active, and the Region Servers send their status to ZooKeeper to indicate that they are ready for read and write operations
• The inactive HMaster acts as a backup: if the active HMaster fails, the inactive HMaster takes over
HBase Architectural Components work together
• Only one HMaster is active at a time; ZooKeeper handles active HMaster selection
• The active HMaster and the Region Servers each connect to ZooKeeper with a session
• ZooKeeper maintains an ephemeral node for each active session, kept alive by heartbeats, to indicate that the servers are up and running
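The heartbeat and ephemeral node idea can be illustrated with a small simulation. This is a toy stand-in, not the ZooKeeper API: the class `ToyCoordinator`, the node names, the timeout of 3 time units, and the "first live candidate wins" election rule are all simplifying assumptions made for this sketch.

```python
class ToyCoordinator:
    """Toy stand-in for ZooKeeper session tracking (not the ZooKeeper API)."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}  # ephemeral node name -> time of last heartbeat

    def heartbeat(self, node, now):
        # Creates the ephemeral node on first contact, refreshes it afterwards.
        self.last_heartbeat[node] = now

    def expire_sessions(self, now):
        """Delete ephemeral nodes whose session missed its heartbeat window."""
        expired = [node for node, seen in self.last_heartbeat.items()
                   if now - seen > self.session_timeout]
        for node in expired:
            del self.last_heartbeat[node]
        return expired

    def live_nodes(self):
        return sorted(self.last_heartbeat)


def elect_active_master(coordinator, candidates):
    """Pick the first candidate master that still has a live ephemeral node."""
    live = set(coordinator.live_nodes())
    for master in candidates:
        if master in live:
            return master
    return None


zk = ToyCoordinator(session_timeout=3)
for node in ["hmaster-a", "hmaster-b", "regionserver-1"]:
    zk.heartbeat(node, now=0)

# hmaster-a stops sending heartbeats; the other two keep going.
zk.heartbeat("hmaster-b", now=4)
zk.heartbeat("regionserver-1", now=4)

expired = zk.expire_sessions(now=4)
active = elect_active_master(zk, ["hmaster-a", "hmaster-b"])
print(expired, active, zk.live_nodes())
```

When the first master's session expires, its ephemeral node disappears and the backup master becomes the active one, which is the failover behaviour the slides describe.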
HBase Read or Write
There is a special HBase catalog table called the META table, which holds the location of the regions in the cluster. The location of the META table itself is stored in ZooKeeper.
Here is what happens the first time a client reads or writes data to HBase:
1. The client asks ZooKeeper for the Region Server that hosts the META table
2. The client queries the META table to get the Region Server for the row key it wants to access, and caches this information along with the META table location
3. The client gets the row from, or puts the row to, the corresponding Region Server
HBase Meta Table
• The META table is a special HBase catalog table that maintains a list of all the regions in the HBase storage system and the Region Servers that host them
• The META table is used to find the region for a given table key
• Structure: the row key is (table, key, region) and the value is the region server
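The first-access flow (ZooKeeper, then META, then the Region Server, with client-side caching) can be sketched as follows. This is a toy model under stated assumptions: the classes `ToyCluster` and `ToyClient`, the `employee` table, the split point `m`, and the request counters are all hypothetical and exist only to show which lookups the cache avoids.

```python
class ToyCluster:
    """Toy model of ZooKeeper plus the META table; layout is hypothetical."""

    def __init__(self):
        self.meta_location = "Region Server 1"  # what ZooKeeper would store
        # META rows: (table, start key, end key, region server).
        # An end key of None means "to the end of the table".
        self.meta_rows = [
            ("employee", "", "m", "Region Server 1"),
            ("employee", "m", None, "Region Server 2"),
        ]
        self.zookeeper_requests = 0
        self.meta_requests = 0

    def ask_zookeeper_for_meta(self):
        self.zookeeper_requests += 1
        return self.meta_location

    def query_meta(self, table, row_key):
        self.meta_requests += 1
        for row in self.meta_rows:
            name, start, end, _server = row
            if name == table and start <= row_key and (end is None or row_key < end):
                return row
        raise KeyError(row_key)


class ToyClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.meta_location = None  # cached META location
        self.region_cache = []     # cached META rows seen so far

    def locate(self, table, row_key):
        """Return the Region Server for row_key, using the cache when possible."""
        for name, start, end, server in self.region_cache:
            if name == table and start <= row_key and (end is None or row_key < end):
                return server
        if self.meta_location is None:
            self.meta_location = self.cluster.ask_zookeeper_for_meta()  # step 1
        row = self.cluster.query_meta(table, row_key)                   # step 2
        self.region_cache.append(row)
        return row[3]                                                   # step 3 target


cluster = ToyCluster()
client = ToyClient(cluster)
first = client.locate("employee", "angela")
second = client.locate("employee", "david")
third = client.locate("employee", "sam")
print(first, second, third, cluster.zookeeper_requests, cluster.meta_requests)
```

Only the first call contacts ZooKeeper, and the second call is answered entirely from the cache because its row key falls in an already cached region.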
HBase Write Mechanism
1. When a client issues a put request, the data is first written to the write-ahead log (WAL). The WAL is a file used to store new data that has not yet been put on permanent storage; it is used for recovery in case of failure.
2. Once the data is written to the WAL, it is copied to the MemStore. The MemStore is the write cache that stores new data that has not yet been written to disk. There is one MemStore per column family per region.
3. Once the data is placed in the MemStore, the client receives the acknowledgment (ACK).
4. When the MemStore reaches its threshold, it flushes the data into an HFile. HFiles store the rows of data as sorted KeyValues on disk.
(Diagram: the client writes to the WAL and the MemStores of a Region; MemStores flush to HFiles on an HDFS DataNode.)
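The four write steps above can be sketched as a small in-memory simulation. This is a simplified model, not HBase code: the class `ToyRegion` is hypothetical, the flush threshold counts cells rather than bytes, a single MemStore stands in for all column families, and the flush runs synchronously inside `put` purely to keep the example short.

```python
class ToyRegion:
    """Toy write path: WAL first, then MemStore, then flush to an 'HFile'."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # assumed: number of cells, not bytes
        self.wal = []        # append-only log of every put, in arrival order
        self.memstore = {}   # in-memory write cache: (row, column) -> value
        self.hfiles = []     # each flush produces one sorted, immutable list

    def put(self, row, column, value):
        self.wal.append((row, column, value))   # step 1: write to the WAL
        self.memstore[(row, column)] = value    # step 2: copy into the MemStore
        ack = "ACK"                             # step 3: acknowledge the client
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                        # step 4: flush at the threshold
        return ack

    def flush(self):
        # Cells are written out sorted by (row, column), like sorted KeyValues.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


region = ToyRegion(flush_threshold=2)
region.put("2", "personal:name", "Dwayne")
region.put("1", "personal:name", "Angela")   # reaches the threshold and flushes
region.put("3", "personal:name", "David")    # stays in the MemStore for now

print(len(region.wal), region.hfiles, region.memstore)
```

Note that the flushed file lists row 1 before row 2 even though row 2 was written first: the WAL keeps arrival order, while the flushed data is sorted by key.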
HBase Features
• Scalable: data can be scaled across various nodes as it is stored in HDFS
• Automatic failure support: the Write Ahead Log across clusters provides automatic support against failure
• Consistent read and write: HBase provides consistent reads and writes of data
• Java API for client access: provides an easy-to-use Java API for clients
Many thanks for your attention!
