HBaseCon 2015: Just the Basics

New to HBase? This session will cover the basics of HBase in a very straightforward way—including architecture, API, and schema design.

  1. HBase - Just The Basics. Copyright © 2015 Smoking Hand LLC. All rights reserved.
  2. What Is Apache HBase?
     • NoSQL datastore built on top of the HDFS filesystem
     • HBase is a column-family-oriented database
     • Based on the Google BigTable paper
     • Uses HDFS for storage
     • Data can be retrieved quickly or batch processed with MapReduce
  3. HBase Use Cases
     • Need Big Data (TB/PB)
     • High throughput
     • Variable columns
     • Need random reads and writes
  4. HBase Architecture
  5. HBase Daemons
  6. NoSQL Table Architecture
  7. Column Families
  8. NoSQL Data
  9. Regions
  10. Write Path
  11. Read Path
  12. HBase API
  13. Accessing HBase
     • HBase has a Java API
     • It is the only first-class citizen
     • There are other programmatic interfaces
     • A REST interface allows HTTP access
     • A Thrift gateway allows non-Java programmatic access
     • There are non-native SQL interfaces: Apache Phoenix, Impala, Presto, Hive
  14. Puts

        setConf(HBaseConfiguration.create(getConf()));
        Connection connection = null;
        Table table = null;
        try {
            // Define the table and column family for the data
            TableName TABLE_NAME = TableName.valueOf("hbasetable");
            byte[] CF = Bytes.toBytes("colfamily");
            // Connect to the table
            connection = ConnectionFactory.createConnection(getConf());
            table = connection.getTable(TABLE_NAME);
            // Create a put and add columns to it
            Put p = new Put(Bytes.toBytes("rowkey"));
            p.addColumn(CF, Bytes.toBytes("columnqual"), Bytes.toBytes(42.0d));
            // Add the new column to the row
            table.put(p);
        } finally {
            // Close everything down
            if (table != null)
                table.close();
            if (connection != null)
                connection.close();
        }
  15. Gets

        // Define the table and column family for the data
        TableName TABLE_NAME = TableName.valueOf("hbasetable");
        byte[] CF = Bytes.toBytes("colfamily");
        // Connect to the table
        connection = ConnectionFactory.createConnection(getConf());
        table = connection.getTable(TABLE_NAME);
        // Create a get with the row key you want
        Get g = new Get(Bytes.toBytes("rowkey"));
        // Get the row and the bytes for the cell
        Result result = table.get(g);
        byte[] value = result.getValue(CF, Bytes.toBytes("columnqual"));
        // Yes, your client will need to know the type of data in the cell
        double doubleValue = Bytes.toDouble(value);
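The last comment on the Gets slide is worth a small illustration: a cell holds only bytes, so the reader has to apply the same type the writer used. The sketch below is stand-alone and does not use the HBase client. It assumes an 8-byte big-endian encoding via `java.nio.ByteBuffer` as a stand-in for the `Bytes` helper, and the class and method names (`CellTypeSketch`, `encodeDouble`, `decodeDouble`, `decodeLong`) are hypothetical.

```java
import java.nio.ByteBuffer;

final class CellTypeSketch {
    // Encode a double as 8 bytes (ByteBuffer defaults to big-endian order).
    static byte[] encodeDouble(double d) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(d).array();
    }

    // Decode the bytes with the type the writer intended.
    static double decodeDouble(byte[] cell) {
        return ByteBuffer.wrap(cell).getDouble();
    }

    // Decode the same bytes with the wrong type: no error, just a wrong value.
    static long decodeLong(byte[] cell) {
        return ByteBuffer.wrap(cell).getLong();
    }

    public static void main(String[] args) {
        byte[] cell = encodeDouble(42.0d);
        System.out.println("read as double: " + decodeDouble(cell));
        System.out.println("read as long:   " + decodeLong(cell));
    }
}
```

Nothing in the stored bytes records which interpretation is right, which is why the type has to be agreed on outside the table.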
  16. Architecting HBase Solutions
  17. Differences With RDBMS
     • Architecting for an RDBMS is about relationships or normalizing data
     • Architecting for HBase is about access patterns or denormalizing data
     • Questions to ask:
        • How is data being accessed?
        • What is the fastest way to read or write data?
        • What is the optimal way to organize data?
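To make "architecting for access patterns" concrete, here is a minimal sketch that models a table as a map sorted by row key, since HBase keeps rows in row-key order. It is not HBase code: `AccessPatternSketch`, `scanPrefix`, and the `user#date` key layout are hypothetical, and `String` ordering stands in for HBase's byte-wise ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class AccessPatternSketch {
    // Stand-in for a table: rows are kept sorted by row key.
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Return the values of all rows whose key starts with the prefix, in key order.
    // Assumes keys never contain Character.MAX_VALUE, used here as an upper bound.
    List<String> scanPrefix(String prefix) {
        SortedMap<String, String> range =
                rows.subMap(prefix, prefix + Character.MAX_VALUE);
        return new ArrayList<>(range.values());
    }

    public static void main(String[] args) {
        AccessPatternSketch table = new AccessPatternSketch();
        table.put("user42#2015-05-02", "logout");
        table.put("user7#2015-05-01", "login");
        table.put("user42#2015-05-01", "login");
        // The question "what did user42 do, in time order?" is one contiguous range.
        System.out.println(table.scanPrefix("user42#"));
    }
}
```

Because the key was chosen to match the question being asked, the read is a single range over adjacent rows rather than a join or a full scan.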
  18. Abject Failure
     • Treating HBase like a relational database will lead to abject failure
  19. Row Keys
     • Actual engineering goes into row key design
     • You only have one index or primary key
     • Getting this primary key right takes time and effort
     • You don't just use an auto-incrementing number
     • Multiple pieces of data are often in the row key
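As one possible illustration of putting multiple pieces of data in a row key, the sketch below builds a composite key from a salt byte, a user ID, and a reversed timestamp. This layout is a commonly used pattern, not something the slides prescribe: monotonically increasing keys such as auto-incrementing numbers tend to send all writes to one region, and a salt prefix is one way to spread them out. `RowKeySketch`, `BUCKETS`, the separator byte, and the field order are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RowKeySketch {
    static final int BUCKETS = 16; // hypothetical number of salt buckets

    // Layout: [salt (1 byte)] [userId bytes] [0x00 separator] [reversed timestamp (8 bytes)]
    // Assumes user IDs never contain a zero byte and timestamps are non-negative.
    static byte[] rowKey(String userId, long timestampMillis) {
        byte[] id = userId.getBytes(StandardCharsets.UTF_8);
        // The salt is derived from the ID, so one user's rows stay in one bucket.
        byte salt = (byte) Math.floorMod(userId.hashCode(), BUCKETS);
        // Reversing the timestamp makes newer rows sort before older ones.
        long reversed = Long.MAX_VALUE - timestampMillis;
        return ByteBuffer.allocate(1 + id.length + 1 + Long.BYTES)
                .put(salt)
                .put(id)
                .put((byte) 0)
                .putLong(reversed)
                .array();
    }

    public static void main(String[] args) {
        byte[] older = rowKey("user42", 1_000L);
        byte[] newer = rowKey("user42", 2_000L);
        // Unsigned byte-wise comparison approximates how row keys are ordered.
        System.out.println("newer sorts first: " + (Arrays.compareUnsigned(newer, older) < 0));
    }
}
```

The trade-off is that a salted key can no longer be scanned as one global range; a reader has to know the salt rule or query every bucket.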
  20. Schema Design
     • Table schemas require design and thought
     • The access pattern should be known ahead of time
     • General best practices:
        • Fewer, bigger (denormalized) tables
        • Spend more time designing up front
        • Use bulk loading for incremental or time series data
  21. Conclusion
     • Jesse Anderson (Smoking Hand)
     • jesse@smokinghand.com
     • @jessetanderson
