HBaseCon 2014-Just the Basics

My HBaseCon 2014 talk that introduced the basic concepts of HBase. This shows the basic workings of HBase, the programming API, and schema design.

Published in: Technology

  1. HBase: Just the Basics
     Jesse Anderson – Curriculum Developer and Instructor (v2)
  2. What Is HBase?
     • NoSQL datastore built on top of HDFS (Hadoop)
     • An Apache Top Level Project
     • Handles the various manifestations of Big Data
     • Based on Google’s Bigtable paper
     ©2014 Cloudera, Inc. All rights reserved.
  3. Why Use HBase?
     • Storing large amounts of data (TB/PB)
     • High throughput for a large number of requests
     • Storing unstructured or variable column data
     • Big Data with random reads and writes
  4. When to Consider Not Using HBase?
     • Only use with Big Data problems
     • Read straight through files
     • Write all at once or append new files
     • Not random reads or writes
     • Access patterns of the data are ill-defined
  5. HBase Architecture: How it works
  6. Meet the Daemons
     • HBase Master
     • RegionServer
     • ZooKeeper
     • HDFS
       • NameNode/Standby NameNode
       • DataNode
  7. Daemon Locations
     [Diagram: a cluster split into Master Nodes and Slave Nodes, showing the NameNode, Standby NameNode, ZooKeeper (x3), HBase Master, DataNode, and RegionServer daemons.]
  8. Tables and Column Families
     • Tables are broken into groupings called Column Families.
     • Column Family “contactinfo”: group data frequently accessed together and compress it
     • Column Family “profilephoto”: group photos with different settings
  9. Rows and Columns
     Row key   Column Family “contactinfo”    Column Family “profilephoto”
     adupont   fname: Andre, lname: Dupont    (empty)
     jsmith    fname: John, lname: Smith      image: <smith.jpg>
     mrossi    fname: Mario, lname: Rossi     image: <mario.jpg>
     • Row keys identify a row
     • No storage penalty for unused columns
     • Each Column Family can have many columns
  10. Regions
     • A table is broken into regions
     • Regions are served by RegionServers
     Row key   Column Family “contactinfo”
     adupont   fname: Andre, lname: Dupont
     jsmith    fname: John, lname: Smith

     Row key   Column Family “contactinfo”
     mrossi    fname: Mario, lname: Rossi
     zstevens  fname: Zack, lname: Stevens
     [Diagram: the cluster layout from slide 7.]
  11. Client Write Path
     [Diagram: a client alongside the cluster layout from slide 7.]
     1. Which RegionServer is serving the Region?
     2. Write to RegionServer
  12. Client Read Path
     [Diagram: a client alongside the cluster layout from slide 7.]
     1. Which RegionServer is serving the Region?
     2. Read from RegionServer
  13. HBase API: How to access the data
  14. No SQL Means No SQL
     • Data is not accessed over SQL
     • You must:
       • Create your own connections
       • Keep track of the type of data in a column
       • Give each row a key
       • Access a row by its key
  15. Types of Access
     • Gets
       • Gets a row’s data based on the row key
     • Puts
       • Upserts a row with data based on the row key
     • Scans
       • Finds all matching rows based on the row key
       • Scan logic can be increased by using filters
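  16. Gets
     // Build a Get for one row, identified by its row key bytes
     Get g = new Get(ROW_KEY_BYTES);
     Result r = table.get(g);
     // Read one cell by column family and column descriptor
     byte[] byteArray = r.getValue(COLFAM_BYTES, COLDESC_BYTES);
     // Values come back as raw bytes; the caller converts them
     String columnValue = Bytes.toString(byteArray);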
  17. Puts
     // Build a Put for one row, identified by its row key bytes
     Put p = new Put(ROW_KEY_BYTES);
     // Put.add(...) is the 2014-era call; HBase 1.0 and later name it addColumn(...)
     p.add(COLFAM_BYTES, COLDESC_BYTES, Bytes.toBytes("value"));
     table.put(p);
  18. HBase Schema Design: How to design
  19. No SQL Means No SQL
     • Designing schemas for HBase requires an in-depth knowledge
     • Schema Design is ‘data-centric’ not ‘relationship-centric’
     • You design around how data is accessed
     • Row keys are engineered
  20. “Treating HBase like a traditional RDBMS will lead to abject failure!” (Captain Picard)
  21. Row Keys
     • A row key is more than the glue between two tables
     • Engineering time is spent just on constructing a row key
     • Contents of a row key vary by access pattern
     • Often made up of several pieces of data: <group_id><email>
  22. Schema Design
     • Schema design does not start in an ERD
     • Access pattern must be known and ascertained
     • Denormalize to improve performance
     • Fewer, bigger tables
  23. Jesse Anderson, @jessetanderson
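
Slides 10 to 12 say that a table is broken into regions and that a client first asks which RegionServer is serving the region before it reads or writes. The sketch below models that lookup in plain Java with a sorted map; it is not the HBase client API. The class name `RegionLookupSketch`, the server names, and the split point at "m" are assumptions chosen to match the sample rows on slide 10, and `String` ordering stands in for HBase's byte-wise row key ordering (the two agree for ASCII keys).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of region lookup; not the HBase client API.
class RegionLookupSketch {
    // Maps each region's start row key to the RegionServer assumed to serve it.
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    void addRegion(String startKey, String serverName) {
        regionStartToServer.put(startKey, serverName);
    }

    // Step 1 of both the read and write path: pick the region whose
    // start key is the greatest start key less than or equal to rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> region = regionStartToServer.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch lookup = new RegionLookupSketch();
        lookup.addRegion("", "regionserver-1");  // first region starts at the empty key
        lookup.addRegion("m", "regionserver-2"); // assumed split point
        // Step 2 would send the read or write to the server printed here.
        System.out.println("adupont -> " + lookup.locate("adupont"));
        System.out.println("mrossi  -> " + lookup.locate("mrossi"));
    }
}
```

With the assumed split at "m", adupont and jsmith land in the first region and mrossi and zstevens in the second, which mirrors the two row groups on slide 10.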

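Slides 9, 15, and 21 describe sparse rows, the three access types (Get, Put as an upsert, Scan by row key), and composite row keys such as <group_id><email>. The following is a minimal sketch of those semantics using only the Java standard library; it is a teaching model, not HBase itself. The class `TableSketch`, the zero-padded group id, and the example.com addresses are all assumptions introduced for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of HBase access semantics; not the HBase client API.
class TableSketch {
    // Row key -> ("family:qualifier" -> value). Rows stay sorted by key,
    // and a column that was never written takes no space in the row's map.
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    // Put is an upsert: it creates the row if missing, otherwise overwrites the cell.
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(family + ":" + qualifier, value);
    }

    // Get reads one cell of one row by its exact row key; null if the cell is absent.
    String get(String rowKey, String family, String qualifier) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(family + ":" + qualifier);
    }

    // Scan returns every row whose key starts with the prefix, in key order.
    // Assumes row keys never contain the character U+FFFF.
    SortedMap<String, Map<String, String>> scanPrefix(String prefix) {
        return rows.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    // Composite key in the <group_id><email> shape; fixed-width padding (an assumption)
    // keeps all rows of one group adjacent in sort order.
    static String rowKey(int groupId, String email) {
        return String.format(Locale.ROOT, "%08d", groupId) + email;
    }

    public static void main(String[] args) {
        TableSketch table = new TableSketch();
        table.put(rowKey(7, "jsmith@example.com"), "contactinfo", "fname", "John");
        table.put(rowKey(7, "mrossi@example.com"), "contactinfo", "fname", "Mario");
        table.put(rowKey(12, "adupont@example.com"), "contactinfo", "fname", "Andre");
        // One prefix scan retrieves every member of group 7.
        System.out.println(table.scanPrefix(String.format(Locale.ROOT, "%08d", 7)).keySet());
    }
}
```

Putting the group id first is what lets a single prefix scan answer "all users in this group"; a different access pattern, such as lookup by email alone, would call for a differently ordered key, which is the point slide 21 makes about engineering row keys around access patterns.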