Apache HBase - Just the Basics

Jesse Anderson (Smoking Hand)

This early-morning session offers an overview of what HBase is, how it works, its API, and considerations for using HBase as part of a Big Data solution. It will be helpful for people who are new to HBase, and also serve as a refresher for those who may need one.

Published in: Software

  1. HBase - Just The Basics
  2. The Basics of HBase
  3. What Is Apache HBase?
       • NoSQL datastore built on top of HDFS
       • HBase is a column-family-oriented database
       • Based on the Google Bigtable paper
       • Uses HDFS for storage
       • Data can be retrieved quickly or batch processed with MapReduce
  4. HBase Use Cases
       • Need Big Data (TB/PB)
       • High throughput
       • Variable columns
       • Need random reads and writes
  5. HBase Architecture
  6. HBase Daemons
  7. NoSQL Table Architecture
  8. Column Families
  9. NoSQL Data
  10. Regions
  11. Write Path
  12. Read Path
  13. HBase API
  14. Accessing HBase
       • HBase has a Java API; it is the only first-class citizen
       • There are other programmatic interfaces
       • A REST interface allows HTTP access
       • A Thrift gateway allows non-Java programmatic access
       • There are non-native SQL interfaces: Apache Phoenix, Impala, Presto, Hive
  15. Puts

          setConf(HBaseConfiguration.create(getConf()));

          Connection connection = null;
          Table table = null;

          try {
              // Define the table and column family for the data
              TableName TABLE_NAME = TableName.valueOf("hbasetable");
              byte[] CF = Bytes.toBytes("colfamily");

              // Connect to the table
              connection = ConnectionFactory.createConnection(getConf());
              table = connection.getTable(TABLE_NAME);

              // Create a put and add columns to it
              Put p = new Put(Bytes.toBytes("rowkey"));
              p.addColumn(CF, Bytes.toBytes("columnqual"), Bytes.toBytes(42.0d));

              // Add the new column to the row
              table.put(p);
          } finally {
              // close everything down
              if (table != null) table.close();
              if (connection != null) connection.close();
          }
  16. Gets

          // connection and table are declared as on slide 15

          // Define the table and column family for the data
          TableName TABLE_NAME = TableName.valueOf("hbasetable");
          byte[] CF = Bytes.toBytes("colfamily");

          // Connect to the table
          connection = ConnectionFactory.createConnection(getConf());
          table = connection.getTable(TABLE_NAME);

          // Create a get with the row key you want
          Get g = new Get(Bytes.toBytes("rowkey"));

          // Get the row and the bytes for the cell
          Result result = table.get(g);
          byte[] value = result.getValue(CF, Bytes.toBytes("columnqual"));

          // Yes, your client will need to know the type of data in the cell
          double doubleValue = Bytes.toDouble(value);
  17. Architecting HBase Solutions
  18. Differences With RDBMS
       • Architecting for an RDBMS is about relationships or normalizing data
       • Architecting for HBase is about access patterns or denormalizing data
       • Questions to ask:
           • How is data being accessed?
           • What is the fastest way to read/write data?
           • What is the optimal way to organize data?
  19. Abject Failure
       • Treating HBase like a relational database will lead to abject failure
  20. Row Keys
       • Actual engineering goes into row key design
       • You only have one index or primary key
       • Getting this primary key right takes time and effort
       • You don't just use an auto-incrementing number
       • Multiple pieces of data are often in the row key
  21. Schema Design
       • Table schemas require design and thought
       • The access pattern should be known ahead of time
       • General best practices:
           • Fewer, bigger (denormalized) tables
           • Spend more time designing up front
           • Use bulk loading for incremental or time series data
  22. Conclusion
       • Jesse Anderson (Smoking Hand)
       • jesse@smokinghand.com
       • @jessetanderson
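
The comment on slide 16 that the client needs to know the type of data in the cell can be illustrated with a small sketch. HBase stores cell values as raw bytes, so the same eight bytes decode to very different values depending on the type the reader assumes. To keep the example runnable without an HBase dependency, the plain JDK `ByteBuffer` stands in for the HBase `Bytes` utility; the class and method names (`CellBytesSketch`, `encode`, `decodeAsDouble`, `decodeAsLong`) are hypothetical and are not part of the deck or of HBase.

```java
import java.nio.ByteBuffer;

final class CellBytesSketch {

    /** Encodes a double as 8 big-endian bytes (a stand-in for writing a cell value). */
    static byte[] encode(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array();
    }

    /** Reads the cell with the type that was actually written. */
    static double decodeAsDouble(byte[] cell) {
        return ByteBuffer.wrap(cell).getDouble();
    }

    /** Reads the same bytes with the wrong type; nothing in the bytes prevents this. */
    static long decodeAsLong(byte[] cell) {
        return ByteBuffer.wrap(cell).getLong();
    }

    public static void main(String[] args) {
        byte[] cell = encode(42.0d);
        System.out.println(decodeAsDouble(cell)); // the value that was written
        System.out.println(decodeAsLong(cell));   // the IEEE 754 bit pattern, not 42
    }
}
```

Because the stored bytes carry no type information, the type of each column has to be agreed on outside the table, for example in application code or in documentation of the schema.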

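
The row key advice on slide 20 (no plain auto-incrementing number, several pieces of data in one key) can be sketched as a composite key. This is only one possible layout, chosen for illustration: a one-byte salt derived from the id, the id itself, a separator, and a reversed timestamp so that newer rows sort first. The device id, the bucket count of 16, and the names `RowKeySketch` and `rowKey` are all assumptions that do not come from the deck. The sketch uses only the JDK so it runs without an HBase dependency.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RowKeySketch {

    // Hypothetical number of salt buckets; an assumption for illustration only.
    static final int BUCKETS = 16;

    /** Builds [1-byte salt][id bytes][0x00 separator][8-byte reversed timestamp]. */
    static byte[] rowKey(String deviceId, long timestampMillis) {
        byte[] id = deviceId.getBytes(StandardCharsets.UTF_8);
        // The salt is derived from the id, so a reader can recompute it from the id alone.
        byte salt = (byte) Math.floorMod(deviceId.hashCode(), BUCKETS);
        // Subtracting from Long.MAX_VALUE makes newer timestamps sort before older ones.
        long reversed = Long.MAX_VALUE - timestampMillis;
        return ByteBuffer.allocate(1 + id.length + 1 + Long.BYTES)
                .put(salt)
                .put(id)
                .put((byte) 0)
                .putLong(reversed)
                .array();
    }

    public static void main(String[] args) {
        byte[] older = rowKey("sensor-1", 1_000L);
        byte[] newer = rowKey("sensor-1", 2_000L);
        // HBase orders rows by unsigned lexicographic comparison of the key bytes.
        System.out.println(Arrays.compareUnsigned(newer, older) < 0);
    }
}
```

Whether a salt, a reversed timestamp, or a different ordering of the parts is appropriate depends on the access pattern, which is why the deck stresses knowing that pattern ahead of time.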