The Hadoop database, a distributed, scalable, big data store.
Agenda
NoSQL Recap

Research Background 

Table Components

High Level Architecture

Data Modeling Patterns

Challenges
RDBMS vs NoSQL
Relational databases (PostgreSQL, MySQL…) have the flexibility to answer any question, which means they are optimized for none.

Scaling writes on an RDBMS typically takes a lot of engineering.

Most NoSQL databases take a different approach: they answer one or a few questions in an optimized, scalable manner.
RDBMS vs NoSQL
Relational databases are ACID (Atomic, Consistent, Isolated, Durable), which is a great property, but it makes them hard to scale.

The leader accepts writes and the followers serve reads. Scaling writes means sharding the database, which is difficult and has drawbacks.

[Diagram: one Leader with two Followers]
Research Background & Use Cases
HBase is primarily based on two seminal papers in distributed systems:

• The Hadoop File System 

• Bigtable: A Distributed Storage System for Structured Data
Use cases
• HBase is a good choice for consistent, linearly scalable reads and writes on large tables (billions of rows × millions of columns), scaling to petabytes of data without sacrificing performance (millisecond latency).

• Real-time messaging app

• Gmail-like email service

• Time-series data

• Firehose application
HBase Characteristics
• HBase is consistent

• HBase is a key-value store that supports random reads and writes, scans, and partial scans

• HBase data is sorted by row key

• Data is stored as byte arrays; it is the application's job to serialize and deserialize them to types or objects.
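The last two points interact: because the store only sees bytes and sorts them byte by byte, the way the application encodes a key decides its sort order. The sketch below is plain Python, not HBase client code; the `encode_user` and `decode_user` helpers and the 8-byte big-endian key layout are assumptions chosen for illustration.

```python
import struct

def encode_user(user_id: int, name: str):
    """Return (row_key, value) as raw bytes, the only form the store sees."""
    row_key = struct.pack(">Q", user_id)  # 8-byte big-endian unsigned integer
    value = name.encode("utf-8")
    return row_key, value

def decode_user(row_key: bytes, value: bytes):
    """Turn the stored bytes back into application types."""
    (user_id,) = struct.unpack(">Q", row_key)
    return user_id, value.decode("utf-8")

# Byte-wise ordering matches numeric ordering only for fixed-width,
# big-endian keys. Decimal strings sort as text instead.
ids = [300, 2, 10]
binary_keys = sorted(struct.pack(">Q", i) for i in ids)
text_keys = sorted(str(i).encode("ascii") for i in ids)

order_from_binary = [struct.unpack(">Q", k)[0] for k in binary_keys]
order_from_text = [int(k) for k in text_keys]
```

With these inputs the binary keys come back in numeric order, while the text keys come back as 10, 2, 300, which is worth keeping in mind when range scans depend on key order.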
High Level Architecture
Meta Table (has the mapping between keys and region servers)

Row Key (Key, Table)      Region Server Address
A..B, table xyz           192.168.1..
C…D, table bx             192.168.20..

Region Server 192.168.1..
  Region 1: table with row key A
  Region 2: table with row key B

Region Server 192.168.20..
  Region 1: table with row key A
  Region 2: table with row key B
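As a rough illustration of the lookup that the meta table enables, the sketch below keeps region start keys in sorted order and uses binary search to find the region that owns a row key. It is a simplified in-memory model for a single table, not HBase client code; the `META` entries and the `rs-N.example` addresses are hypothetical.

```python
from bisect import bisect_right

# Hypothetical meta entries for one table: (region start key, server address).
# Each region covers keys from its start key up to the next region's start key.
META = [
    (b"", "rs-1.example:16020"),
    (b"C", "rs-2.example:16020"),
    (b"M", "rs-3.example:16020"),
]
START_KEYS = [start for start, _ in META]

def locate_region_server(row_key: bytes) -> str:
    """Return the address of the server whose region contains row_key."""
    # Find the last region whose start key is <= row_key.
    index = bisect_right(START_KEYS, row_key) - 1
    return META[index][1]
```

Because the first region starts at the empty key, every row key falls into some region, and a client that knows the mapping can contact the owning server directly.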
Data Modeling
The most important aspect of HBase data modeling is how you define the row key.

Row keys must be random and evenly distributed so that all region servers share the load evenly and hot partitioning (hot spotting) is avoided.
Examples of bad row keys:

First name & last name ("John-Smith")

URL ("www.yahoo.com", "www.youtube.com")
Patterns to Avoid Hot Partitions
Hashing the key 

www.yahoo.com => 1b03577ed104f16aadc00a639d33cb44

www.youtube.com => ab3201c6103205c14f6e56b11b2fcd46

Salting

Adding a random number as a suffix of the key, so that writes for the same key are spread across partitions

www.yahoo.com-10240

www.yahoo.com-10213
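Both patterns can be sketched in a few lines of Python. The slide does not name the hash function, so MD5 here is an assumption, and the helper names and bucket count are hypothetical. The sketch also places the salt at the front of the key rather than at the end as in the examples above: since rows are sorted by their leading bytes, a prefix is what moves otherwise adjacent keys into different key ranges.

```python
import hashlib
import random

def hashed_key(key: str) -> str:
    """Replace the key with a fixed-length hex digest (MD5 is an assumption)."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()

def salted_key(key: str, buckets: int, rng: random.Random) -> str:
    """Prepend a random bucket number so writes spread over `buckets` ranges."""
    salt = rng.randrange(buckets)
    return f"{salt:02d}-{key}"

def all_salted_keys(key: str, buckets: int) -> list:
    """Every form the salted key can take; a reader has to check all of them."""
    return [f"{bucket:02d}-{key}" for bucket in range(buckets)]
```

The trade-off is on the read side: a hashed key can still be fetched with a single lookup but loses range scans over the original ordering, while a randomly salted key requires one lookup per bucket.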
Partial Scans & Data Modeling
HBase supports scans on a partial row key (a prefix) or on regexes.

Example:

Row keys:

yahoo.com

youtube.com

fb.com

A scan with y* returns both youtube.com and yahoo.com.
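A prefix scan is cheap because the keys are sorted: all keys sharing a prefix sit next to each other, so the scan is a contiguous slice. The following sketch models that with a sorted Python list and binary search; it is an in-memory illustration, not the HBase scan API, and `prefix_stop` and `prefix_scan` are hypothetical helper names.

```python
from bisect import bisect_left
from typing import List, Optional

def prefix_stop(prefix: bytes) -> Optional[bytes]:
    """Smallest key that sorts after every key starting with prefix.

    Returns None when no such bound exists (empty or all-0xff prefix).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None
    return trimmed[:-1] + bytes([trimmed[-1] + 1])

def prefix_scan(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    """Return the contiguous run of keys that start with prefix."""
    start = bisect_left(sorted_keys, prefix)
    stop_key = prefix_stop(prefix)
    if stop_key is None:
        stop = len(sorted_keys)
    else:
        stop = bisect_left(sorted_keys, stop_key)
    return sorted_keys[start:stop]

ROW_KEYS = sorted([b"yahoo.com", b"youtube.com", b"fb.com"])
```

Calling `prefix_scan(ROW_KEYS, b"y")` on this list yields the two keys that begin with `y` and skips `fb.com` without inspecting it.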
Partial Scans & Data Modeling
Suppose we want to create a service like Gmail to store emails:

Data modeling approach 1:

user_uuid:email_uuid 

example: 0db72126-59fb-4e70-85f9-c82fce62c1e5:f15fe2db-09cc-4456-bc9a-d87b2a81d1b2
Now, to see all emails for this user, we can do a partial scan on the user UUID prefix:

Scan: 0db72126-59fb-4e70-85f9-c82fce62c1e5:*
Partial Scans & Data Modeling
Data modeling approach 2: Adding Time into the key

user_uuid:year:month:email_uuid 

example: 0db72126-59fb-4e70-85f9-c82fce62c1e5:2020:07:f15fe2db-09cc-4456-b
d87b2a81d1b2
Now we can also scan by time range if we want:

Scan: 0db72126-59fb-4e70-85f9-c82fce62c1e5:2020:*

Or: 0db72126-59fb-4e70-85f9-c82fce62c1e5:2020:07*
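The two key layouts can be sketched as a pair of helpers: one that builds the composite row key and one that builds the scan prefix for a user, a user and year, or a user, year, and month. The function names and the short sample identifiers are hypothetical, and the filtering uses a plain Python list rather than an HBase scan.

```python
def email_row_key(user_uuid: str, year: int, month: int, email_uuid: str) -> str:
    """Build user_uuid:year:month:email_uuid with zero-padded date parts."""
    return f"{user_uuid}:{year:04d}:{month:02d}:{email_uuid}"

def scan_prefix(user_uuid: str, year=None, month=None) -> str:
    """Build the prefix for all of a user's emails, one year, or one month."""
    parts = [user_uuid]
    if year is not None:
        parts.append(f"{year:04d}")
        if month is not None:
            parts.append(f"{month:02d}")
    # The trailing colon keeps "user-a" from also matching "user-ab".
    return ":".join(parts) + ":"

def matching(keys, prefix):
    """Stand-in for a prefix scan over stored row keys."""
    return sorted(key for key in keys if key.startswith(prefix))

# Hypothetical sample data; real keys would use full UUIDs.
KEYS = [
    email_row_key("user-a", 2020, 7, "mail-1"),
    email_row_key("user-a", 2020, 11, "mail-2"),
    email_row_key("user-a", 2019, 7, "mail-3"),
    email_row_key("user-b", 2020, 7, "mail-4"),
]
```

Zero-padding the month matters here: without it, month 11 would sort before month 7 and a prefix for month 1 would also match months 10 to 12.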
Challenges
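HBase is hard to operate and maintain; it has a lot of components.

HBase is not friendly to delete-heavy workloads. It stores data in an LSM tree, so a delete does not remove data in place; it writes a tombstone marker, and tombstones accumulate until compaction cleans them up.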
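To make the tombstone cost concrete, here is a toy LSM-style store in Python. It is a teaching sketch under simplifying assumptions (dictionaries instead of sorted files, no write-ahead log), and the `TinyLSM` class is hypothetical rather than a model of HBase internals.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"

class TinyLSM:
    def __init__(self):
        self.memtable = {}   # recent writes, still mutable
        self.segments = []   # flushed immutable layers, oldest first

    def put(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        # Older layers are immutable, so a delete is just another write.
        self.memtable[key] = TOMBSTONE

    def flush(self):
        if self.memtable:
            self.segments.append(self.memtable)
            self.memtable = {}

    def get(self, key):
        # Check the newest layer first; the first hit wins.
        for layer in [self.memtable, *reversed(self.segments)]:
            if key in layer:
                value = layer[key]
                return None if value is TOMBSTONE else value
        return None

    def stored_entries(self):
        return len(self.memtable) + sum(len(s) for s in self.segments)

    def compact(self):
        # Merge all layers (newer overwrite older), then drop tombstones.
        self.flush()
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.segments = [live] if live else []
```

After a put, a flush, and a delete of the same key, the store holds two entries for a key that no longer exists from the reader's point of view. Reads have to skip over them until `compact()` runs, which is why delete-heavy workloads are costly.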
