HBase
Introduction
● HBase is a distributed, column-oriented database built on top of the Hadoop
Distributed File System (HDFS).
● It is an open-source project and is horizontally scalable.
● HBase's data model is similar to Google's Bigtable and is designed to provide
quick random access to huge amounts of structured data.
● It is a part of the Hadoop ecosystem that provides random, real-time read/write
access to data in HDFS.
HBase Architecture and Data Model
● An HBase table consists of rows and columns and has a third dimension,
version, to maintain the different values of a row and column intersection over
time.
● Example: a customer shopping online.
● This type of application requires real-time access.
● Thus, batch processing with Pig, Hive, or Hadoop's MapReduce is
not a reasonable implementation approach.
● HBase stores the data and provides real-time read and write access.
HBase Architecture and Data Model (cont’d)
● HBase uses a key/value structure to store the contents of an HBase table
● (row key, column family, column, timestamp) -> value
● Each value is the data to be stored at the intersection of the row, column, and
version
● Each key consists of the following elements
○ Row length
○ Row (sometimes called the row key)
○ Column family length
○ Column family
○ Column qualifier
○ Version
○ Key type
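The key/value structure described above can be sketched as a small in-memory model. This is only an illustration, not the HBase API: the class name ToyTable and its methods are hypothetical, and the key-length and key-type fields are left out for brevity.

```python
from typing import Dict, Optional, Tuple

# (row, column family, column qualifier, timestamp); hypothetical model only
CellKey = Tuple[str, str, str, int]


class ToyTable:
    """Toy stand-in for an HBase table: a map from a full cell key to a value."""

    def __init__(self) -> None:
        self.cells: Dict[CellKey, str] = {}

    def put(self, row: str, family: str, qualifier: str,
            timestamp: int, value: str) -> None:
        # Each distinct timestamp is stored as a separate version of the cell.
        self.cells[(row, family, qualifier, timestamp)] = value

    def get(self, row: str, family: str, qualifier: str) -> Optional[str]:
        # Collect every version of the requested cell, then return the newest.
        versions = [(key[3], value) for key, value in self.cells.items()
                    if key[:3] == (row, family, qualifier)]
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]
```

Writing the same row and column twice with different timestamps keeps both values; a plain read returns the one with the larger timestamp.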
HBase Architecture and Data Model (cont’d)
● Table is a collection of rows.
● Row is a collection of column families.
● Column family is a collection of columns.
● Column is a collection of key value pairs.
HBase Architecture and Data Model (cont’d)
● Create an HBase table
$ hbase shell
hbase> create 'my_table', 'cf1', 'cf2',
{SPLITS => ['250000', '500000', '750000']}
● my_table is stored in HBase
$ hadoop fs -ls -R /hbase
● Add data to the table
hbase> put 'my_table', '000700', 'cf1:cq1', 'data1'
hbase> put 'my_table', '000700', 'cf1:cq2', 'data2'
hbase> put 'my_table', '000700', 'cf2:cq3', 'data3'
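The three split points in the create command divide the table into four regions by row key. The sketch below shows how a row key maps to a region under that scheme; the function name region_for_row is hypothetical, and it assumes that a row equal to a split point belongs to the region that starts at that point.

```python
from bisect import bisect_right
from typing import List


def region_for_row(row_key: str, split_points: List[str]) -> int:
    """Return the index of the region whose key range contains row_key.

    Region 0 holds keys below the first split point; each later region
    starts at a split point (inclusive) and ends before the next one.
    """
    return bisect_right(split_points, row_key)


splits = ['250000', '500000', '750000']
# Row '000700' from the put commands sorts below '250000',
# so it lands in the first region (index 0).
first_region = region_for_row('000700', splits)
```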
HBase Architecture and Data Model (cont’d)
● Retrieve data from the table
○ hbase> get 'my_table', '000700', 'cf2:cq3'
● Scan a range of rows
○ hbase> scan 'my_table', {STARTROW => '000600', STOPROW => '000800'}
● Delete the oldest entry for a column by giving its timestamp
○ hbase> delete 'my_table', '000700', 'cf2:cq3', 1393866138714
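The scan command above reads a contiguous range of rows, which works because rows are kept sorted by row key. A minimal sketch of that range selection, assuming the usual scan semantics of an inclusive start row and an exclusive stop row (scan_rows is a hypothetical helper, not an HBase call):

```python
from bisect import bisect_left
from typing import List


def scan_rows(sorted_row_keys: List[str], start_row: str,
              stop_row: str) -> List[str]:
    """Return the row keys in [start_row, stop_row) from a sorted key list."""
    low = bisect_left(sorted_row_keys, start_row)   # first key >= start_row
    high = bisect_left(sorted_row_keys, stop_row)   # first key >= stop_row
    return sorted_row_keys[low:high]


row_keys = ['000100', '000600', '000700', '000800', '250000']
matched = scan_rows(row_keys, '000600', '000800')
```

Here matched contains '000600' and '000700' but not '000800', because the stop row is excluded.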
Use Cases for HBase
● A common use case for a data store such as HBase is to store the results from
a web crawler.
○ The row com.cnn.www corresponds to a website URL, www.cnn.com.
○ A column family, called anchor, is defined to capture the website URLs that provide links to the
row's website.
○ The anchoring website URLs are used as the column qualifiers.
○ Additional websites that provide links to www.cnn.com appear as additional column qualifiers.
○ The value stored in the cell is simply the text on the website that provides the link.
○ hbase> get 'web_table', 'com.cnn.www', {VERSIONS=> 2}
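The row com.cnn.www is the host name www.cnn.com with its parts reversed. A short sketch of that conversion follows; the function name reversed_domain_key is hypothetical. One commonly cited benefit is that pages from the same domain then sort next to each other.

```python
from typing import List


def reversed_domain_key(hostname: str) -> str:
    """Reverse the dot-separated parts of a host name to form a row key."""
    return ".".join(reversed(hostname.split(".")))


hosts: List[str] = ['www.cnn.com', 'edition.cnn.com', 'www.bbc.com']
# Sorting the reversed keys groups the two cnn.com hosts together.
keys = sorted(reversed_domain_key(host) for host in hosts)
```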
Use Cases for HBase (cont’d)
● This use case illustrates several important points.
1. It is possible to get to a billion rows and millions of columns in an HBase
table.
2. The row needs to be defined based on how the data will be accessed.
3. It may be advantageous to use the column qualifiers to actually store the
data of interest, rather than simply storing it in a cell.
● A second use case is the storage and search access of messages.
○ The row was defined to be the user ID.
○ The column qualifier was set to a word that appears in the message.
○ The version was the message ID.
○ The cell's content was the offset of the word in the message.
● This implementation allowed Facebook to provide auto-complete capability in
the search box and to return the results of the query quickly.
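The message layout above (row = user ID, column qualifier = word, version = message ID, cell value = offset) can be modeled with a plain dictionary. This is only an illustrative sketch: the helper names index_message and complete are hypothetical, words are split on whitespace, and only the first occurrence of a word in a message is recorded.

```python
from typing import Dict, List, Tuple

# (user ID, word, message ID) -> character offset of the word in the message
MessageIndex = Dict[Tuple[str, str, int], int]


def index_message(index: MessageIndex, user_id: str,
                  message_id: int, text: str) -> None:
    """Record each word of a message with the offset where it first appears."""
    lowered = text.lower()
    search_from = 0
    for word in lowered.split():
        position = lowered.find(word, search_from)
        # setdefault keeps the first occurrence if a word repeats.
        index.setdefault((user_id, word, message_id), position)
        search_from = position + len(word)


def complete(index: MessageIndex, user_id: str, prefix: str) -> List[str]:
    """Return the user's indexed words that start with prefix, sorted."""
    return sorted({word for (uid, word, _mid) in index
                   if uid == user_id and word.startswith(prefix)})


index: MessageIndex = {}
index_message(index, 'u1', 1, 'hello hbase world')
index_message(index, 'u1', 2, 'hadoop rocks')
suggestions = complete(index, 'u1', 'h')
```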
Use Cases for HBase (cont’d)
● A key strength of HBase is the ability to add new columns on demand, simply by
adding new column qualifiers.
● In an RDBMS implementation, new columns require the involvement of a DBA to
alter the structure of the table.
Other HBase Usage Considerations
● Java API
○ The shell commands are useful for exploring the data in an HBase environment and for illustrating
how the operations work.
○ In a production environment, the HBase Java API can be used to program the desired
operations and the conditions under which to execute them.
● Column family and column qualifier names
○ Keep the names of the column families and column qualifiers as short as possible.
○ The column family name and the column qualifier are stored as part of the key of each key/value
pair.
○ Three copies of each HDFS block are replicated across the Hadoop cluster, which triples the
storage requirement of those names.
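A rough calculation shows why short names matter. The sketch below adds up the key fields listed earlier for every cell and multiplies by the replication factor. The fixed field widths (2 bytes for the row length, 1 byte for the column family length, 8 bytes for the version, 1 byte for the key type) are assumptions for illustration, and the function names are hypothetical.

```python
def key_bytes(row_len: int, family_len: int, qualifier_len: int) -> int:
    """Approximate size of one cell key, using assumed fixed field widths."""
    row_length_field = 2      # assumed width of the row length field
    family_length_field = 1   # assumed width of the column family length field
    version_field = 8         # assumed width of the version (timestamp)
    key_type_field = 1        # assumed width of the key type
    return (row_length_field + row_len + family_length_field + family_len
            + qualifier_len + version_field + key_type_field)


def total_key_bytes(num_cells: int, row_len: int, family_len: int,
                    qualifier_len: int, replication: int = 3) -> int:
    """Key storage across the cluster: per-cell key size x cells x copies."""
    return key_bytes(row_len, family_len, qualifier_len) * num_cells * replication


# Short names such as 'cf1' and 'cq1' versus 20-character names,
# for one million cells with 6-character row keys.
short_names = total_key_bytes(1_000_000, 6, 3, 3)
long_names = total_key_bytes(1_000_000, 6, 20, 20)
```

Under these assumptions, each extra character in a name costs three bytes per cell once replication is counted.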
Other HBase Usage Considerations (cont’d)
● Defining rows
○ The definition of the row is the main mechanism for performing read/write operations on an HBase table.
○ The row needs to be constructed in such a way that the requested columns can be easily and
quickly retrieved.
● Avoid creating sequential rows
○ With sequential row keys, all the new users and their data are written to just one region, so the
workload is not distributed across the cluster as intended.
○ To avoid this, randomly assign a prefix to the sequential number.
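The prefixing idea can be sketched as follows. Note that this example swaps the random prefix for one derived from a hash of the sequential number, so that the same key can be recomputed when reading; that substitution, the function name salted_row_key, and the bucket count of 4 are all assumptions for illustration.

```python
import hashlib


def salted_row_key(sequential_id: int, buckets: int = 4) -> str:
    """Prefix a sequential ID with a hash-derived bucket number."""
    raw = f"{sequential_id:06d}"
    digest = hashlib.md5(raw.encode("utf-8")).digest()
    prefix = digest[0] % buckets   # bucket in the range 0..buckets-1
    return f"{prefix}-{raw}"


# Consecutive IDs receive different prefixes, so they no longer
# sort next to each other in a single region.
keys = [salted_row_key(user_id) for user_id in range(1000)]
prefixes_used = {key.split("-")[0] for key in keys}
```

The trade-off is that a range scan over the original sequence now has to visit every prefix.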
Other HBase Usage Considerations (cont’d)
● Versioning control
○ Control how long a version of a cell's contents will exist.
○ A Time to Live (TTL) can be set, after which any older versions will be deleted.
○ Minimum and maximum numbers of versions to maintain can also be set.
● ZooKeeper
○ HBase uses Apache ZooKeeper to coordinate and manage the various regions running on the
distributed cluster.
○ ZooKeeper is "a centralized service for maintaining configuration information, naming, providing
distributed synchronization, and providing group services."
○ Instead of building its own coordination service, HBase uses ZooKeeper.
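The version-control settings mentioned above (TTL, minimum and maximum number of versions) interact with one another. The sketch below is a simplified model of that interaction, not HBase's actual cleanup logic: it assumes that the newest max_versions entries are candidates, that entries older than the TTL are dropped, and that at least min_versions of the newest entries are kept regardless of age. The function name surviving_versions is hypothetical.

```python
from typing import List


def surviving_versions(timestamps: List[int], now: int, ttl: int,
                       min_versions: int, max_versions: int) -> List[int]:
    """Return the version timestamps that remain, newest first."""
    newest_first = sorted(timestamps, reverse=True)[:max_versions]
    kept = []
    for position, timestamp in enumerate(newest_first):
        within_ttl = now - timestamp <= ttl
        protected = position < min_versions   # kept even if expired
        if within_ttl or protected:
            kept.append(timestamp)
    return kept


versions = [10, 20, 30, 40]
# With a TTL of 65 at time 100, only the version written at 40 is young enough.
remaining = surviving_versions(versions, now=100, ttl=65,
                               min_versions=1, max_versions=3)
```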

HBase Introduction and Architecture
