HBase: Just the Basics

HBaseCon
Jesse Anderson – Curriculum Developer and Instructor
v1
What Is HBase?
©2014 Cloudera, Inc. All rights reserved.
• NoSQL datastore built on top of HDFS (Hadoop)
• An Apache Top Level Project
• Handles the various manifestations of Big Data
• Based on Google’s Bigtable paper
Why Use HBase?
• Storing large amounts of data (TB/PB)
• High throughput for a large number of requests
• Storing unstructured or variable column data
• Big Data with random reads and writes
When to Consider Not Using HBase?
• Your problem is not a Big Data problem (only use HBase with Big Data problems)
• You read straight through files
• You write all at once or append new files
• You do not need random reads or writes
• The access patterns of the data are ill-defined
HBase Architecture
How it works
Meet the Daemons
• HBase Master
• RegionServer
• ZooKeeper
• HDFS
  • NameNode/Standby NameNode
  • DataNode
Daemon Locations
[Diagram. Master Nodes: NameNode, Standby NameNode, three HBase Masters, three ZooKeepers. Slave Nodes: six DataNodes and six RegionServers.]
Tables and Column Families
Tables are broken into groupings called Column Families.
Column Family “contactinfo”: group data frequently accessed together and compress it
Column Family “profilephoto”: group photos with different settings
Rows and Columns
Row key | Column Family “contactinfo” | Column Family “profilephoto”
adupont | fname: Andre, lname: Dupont | (empty)
jsmith | fname: John, lname: Smith | image: <smith.jpg>
mrossi | fname: Mario, lname: Rossi | image: <mario.jpg>
Row keys identify a row
No storage penalty for unused columns
Each Column Family can have many columns
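One way to picture the layout above is as nested sorted maps: row key, then column family, then column, then value. The following is a minimal in-memory sketch of that idea in plain Java, not HBase code. The class `SparseTable` and its methods are hypothetical, and it uses strings where HBase stores byte arrays with timestamps.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a sparse, column-family-oriented table.
// It only illustrates the shape of the data; it is not the HBase client API.
class SparseTable {
    // row key -> column family -> column name -> value, all kept sorted
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    // Upsert one cell; missing rows and families are created on demand.
    void put(String rowKey, String family, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(column, value);
    }

    // Return one cell, or null when the row, family, or column is absent.
    String get(String rowKey, String family, String column) {
        Map<String, TreeMap<String, String>> families = rows.getOrDefault(rowKey, new TreeMap<>());
        return families.getOrDefault(family, new TreeMap<>()).get(column);
    }

    // Count the cells actually stored for a row; absent columns take no space.
    int cellCount(String rowKey) {
        int count = 0;
        for (TreeMap<String, String> columns : rows.getOrDefault(rowKey, new TreeMap<>()).values()) {
            count += columns.size();
        }
        return count;
    }
}
```

In this model, a row such as adupont that has no “profilephoto” columns simply has no entries for them, which mirrors the "no storage penalty for unused columns" point.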
Regions
A table is broken into regions. Regions are served by RegionServers.
First region:
Row key | Column Family “contactinfo”
adupont | fname: Andre, lname: Dupont
jsmith | fname: John, lname: Smith
Second region:
Row key | Column Family “contactinfo”
mrossi | fname: Mario, lname: Rossi
zstevens | fname: Zack, lname: Stevens
[Diagram: the cluster layout from the Daemon Locations slide.]
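Because rows are sorted by key, each region can be described by the first row key it holds, and finding the region for a row becomes a lookup in a sorted map. The sketch below assumes that model. `RegionLocator`, the split point "m", and the server names are hypothetical, and a real client resolves locations through ZooKeeper and HBase's meta table rather than a local map like this.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: map each region's start row key to the server hosting it.
class RegionLocator {
    // start key (inclusive) -> RegionServer name; the first region starts at "".
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionServer) {
        regionsByStartKey.put(startKey, regionServer);
    }

    // The region holding a row is the one with the greatest start key <= the row key.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row key: " + rowKey);
        }
        return entry.getValue();
    }
}
```

With regions starting at "" and "m", the rows adupont and jsmith resolve to the first server, and mrossi and zstevens to the second, matching the two tables on the slide.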
Write Path
[Diagram: a client alongside the cluster layout from the Daemon Locations slide.]
1. Which RegionServer is serving the Region?
2. Write to RegionServer
Read Path
[Diagram: a client alongside the cluster layout from the Daemon Locations slide.]
1. Which RegionServer is serving the Region?
2. Read from RegionServer
HBase API
How to access the data
No SQL Means No SQL
• Data is not accessed over SQL
• You must:
  • Create your own connections
  • Keep track of the type of data in a column
  • Give each row a key
  • Access a row by its key
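"Keep track of the type of data in a column" matters because values are stored as raw bytes with no type attached. The sketch below shows the idea with plain Java. `CellCodec` is a hypothetical helper, and HBase's own `Bytes` utility offers comparable conversions such as `Bytes.toBytes` and `Bytes.toString`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers showing why the application must remember column types:
// the stored bytes carry no type information of their own.
class CellCodec {
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // A long is written as 8 big-endian bytes.
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }
}
```

Decoding with the wrong type does not raise an error; it just yields the wrong value. For instance, the bytes written by `encodeLong(42L)` do not decode to the string "42".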
Types of Access
• Gets
  • Get a row’s data based on the row key
• Puts
  • Upsert a row with data based on the row key
• Scans
  • Find all matching rows based on the row key
  • Scan logic can be extended by using filters
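The three access types can be sketched against a sorted map, which is a rough stand-in for a table whose rows are ordered by key. This is an illustrative model only, not the HBase client API. `SortedRows` and the sample row "jsmythe" are hypothetical, and the prefix scan assumes keys never contain the character `\uffff`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a table whose rows are sorted by key.
class SortedRows {
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Put: insert the row, or overwrite it if the key already exists (an upsert).
    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Get: look up exactly one row by its key.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // Scan: return the keys in [startKey, stopKey), in sorted order.
    List<String> scan(String startKey, String stopKey) {
        return new ArrayList<>(rows.subMap(startKey, true, stopKey, false).keySet());
    }

    // Prefix scan expressed as a range scan that stops just past the prefix.
    List<String> scanPrefix(String prefix) {
        return scan(prefix, prefix + Character.MAX_VALUE);
    }
}
```

The point of the model is that a scan is cheap when the wanted rows sit next to each other in key order, which is why row key design drives what can be scanned efficiently.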
Gets
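import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
// table is an open table handle; ROW_KEY_BYTES, COLFAM_BYTES, and COLDESC_BYTES are byte[] constants.
Get g = new Get(ROW_KEY_BYTES);                              // request a single row by its key
Result r = table.get(g);                                     // fetch the row
byte[] byteArray = r.getValue(COLFAM_BYTES, COLDESC_BYTES);  // raw bytes of one column
String columnValue = Bytes.toString(byteArray);              // decode the bytes as a String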
Puts
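import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
// table is an open table handle; ROW_KEY_BYTES, COLFAM_BYTES, and COLDESC_BYTES are byte[] constants.
Put p = new Put(ROW_KEY_BYTES);                                     // target one row by its key
// addColumn is the HBase 1.0+ name; older clients call p.add with the same arguments.
p.addColumn(COLFAM_BYTES, COLDESC_BYTES, Bytes.toBytes("value"));   // set one column's value
table.put(p);                                                       // upsert the row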
HBase Schema Design
How to design
No SQL Means No SQL
• Designing schemas for HBase requires in-depth knowledge
• Schema design is ‘data-centric’, not ‘relationship-centric’
• You design around how data is accessed
• Row keys are engineered
Treating HBase like a traditional RDBMS will lead to abject failure!
Captain Picard
Row Keys
• A row key is more than the glue between two tables
• Engineering time is spent just on constructing a row key
• Contents of a row key vary by access pattern
• Often made up of several pieces of data
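As an illustration of a row key "made up of several pieces of data", the sketch below builds a composite key from a user ID and a reversed timestamp, so that a user's newest events sort first. The layout, the `#` delimiter, and the `RowKeys` class are assumptions chosen for this example, not something the slides prescribe. It also assumes non-negative timestamps and user IDs that do not contain `#`.

```java
import java.util.Locale;

// Hypothetical composite row key: <userId>#<reversed timestamp>.
class RowKeys {
    // Reversing the timestamp makes newer events sort first under ascending key order.
    static String eventKey(String userId, long timestampMillis) {
        long reversed = Long.MAX_VALUE - timestampMillis;
        // Zero-pad to 19 digits so lexicographic order matches numeric order.
        return userId + "#" + String.format(Locale.ROOT, "%019d", reversed);
    }
}
```

Which pieces go into the key, and in what order, depends on the access pattern. This layout suits "latest events for one user" but would be a poor fit for queries across all users by time.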
Schema Design
• Schema design does not start with an ERD
• Access patterns must be known before you design
• Denormalize to improve performance
• Fewer, bigger tables
Jesse Anderson
@jessetanderson