HBaseCon 2015: Just the Basics

New to HBase? This session covers the basics of HBase in a straightforward way, including architecture, API, and schema design.

  1. HBase - Just The Basics
     Copyright © 2015 Smoking Hand LLC. All rights Reserved
  2. What Is Apache HBase?
     - A NoSQL datastore built on top of HDFS (the Hadoop Distributed File System)
     - A column-family-oriented database
     - Based on Google's Bigtable paper
     - Uses HDFS for storage
     - Data can be retrieved quickly or batch-processed with MapReduce
  3. HBase Use Cases
     - Big data volumes (terabytes to petabytes)
     - High throughput
     - Variable columns
     - Random reads and writes
  4. HBase Architecture
  5. HBase Daemons
  6. NoSQL Table Architecture
  7. Column Families
  8. NoSQL Data
  9. Regions
  10. Write Path
  11. Read Path
  12. HBase API
  13. Accessing HBase
      - HBase has a Java API, and it is the only first-class citizen
      - There are other programmatic interfaces:
        - A REST interface allows HTTP access
        - A Thrift gateway allows non-Java programmatic access
      - There are non-native SQL interfaces: Apache Phoenix, Impala, Presto, and Hive
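  14. Puts

      // Imports this fragment needs (HBase 1.0+ client API)
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      import org.apache.hadoop.hbase.client.Put;
      import org.apache.hadoop.hbase.client.Table;
      import org.apache.hadoop.hbase.util.Bytes;

      // Assumes this runs in a class extending Configured (e.g. a Hadoop Tool),
      // which supplies getConf() and setConf()
      setConf(HBaseConfiguration.create(getConf()));

      Connection connection = null;
      Table table = null;

      try {
          // Define the table and column family for the data
          TableName TABLE_NAME = TableName.valueOf("hbasetable");
          byte[] CF = Bytes.toBytes("colfamily");

          // Connect to the table
          connection = ConnectionFactory.createConnection(getConf());
          table = connection.getTable(TABLE_NAME);

          // Create a put and add columns to it
          Put p = new Put(Bytes.toBytes("rowkey"));
          p.addColumn(CF, Bytes.toBytes("columnqual"), Bytes.toBytes(42.0d));

          // Add the new column to the row
          table.put(p);
      } finally {
          // Close everything down
          if (table != null) table.close();
          if (connection != null) connection.close();
      }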
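  15. Gets

      // Imports this fragment needs (HBase 1.0+ client API)
      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      import org.apache.hadoop.hbase.client.Get;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.util.Bytes;

      // As on the Puts slide, connection and table are declared beforehand
      // and closed in a finally block
      // Define the table and column family for the data
      TableName TABLE_NAME = TableName.valueOf("hbasetable");
      byte[] CF = Bytes.toBytes("colfamily");

      // Connect to the table
      connection = ConnectionFactory.createConnection(getConf());
      table = connection.getTable(TABLE_NAME);

      // Create a get with the row key you want
      Get g = new Get(Bytes.toBytes("rowkey"));

      // Get the row and the bytes for the cell
      Result result = table.get(g);
      byte[] value = result.getValue(CF, Bytes.toBytes("columnqual"));

      // Yes, your client will need to know the type of data in the cell.
      // getValue() returns null when the cell does not exist, so check before decoding.
      if (value != null) {
          double doubleValue = Bytes.toDouble(value);
      }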
  16. Architecting HBase Solutions
  17. Differences With RDBMS
      - Architecting for an RDBMS is about relationships, or normalizing data
      - Architecting for HBase is about access patterns, or denormalizing data
      - Questions to ask:
        - How is data being accessed?
        - What is the fastest way to read or write data?
        - What is the optimal way to organize data?
  18. Abject Failure
      - Treating HBase like a relational database will lead to abject failure
  19. Row Keys
      - Actual engineering goes into row key design
      - You only have one index: the row key, which acts as the primary key
      - Getting this key right takes time and effort
      - You don't just use an auto-incrementing number
      - Multiple pieces of data are often combined in the row key
  20. Schema Design
      - Table schemas require design and thought
      - The access pattern should be known ahead of time
      - General best practices:
        - Fewer, bigger (denormalized) tables
        - Spend more time designing up front
        - Use bulk loading for incremental or time series data
  21. Conclusion
      Jesse Anderson (Smoking Hand)
      jesse@smokinghand.com
      @jessetanderson
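
Slide 2 describes HBase as a column-family-oriented database based on the Bigtable paper. Its logical model is commonly described as a sparse, sorted map of row key -> column family -> column qualifier -> timestamp -> value. The Java sketch below models only that logical shape with nested `TreeMap`s so it runs without HBase; the class name is hypothetical, strings stand in for HBase's `byte[]` keys, and it says nothing about how HBase actually stores data on disk.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;

// Hypothetical in-memory model of HBase's logical layout:
// row key -> column family -> qualifier -> timestamp -> value.
// Strings stand in for HBase's byte[] keys to keep the sketch readable.
final class LogicalTableSketch {
    private final NavigableMap<String,
            NavigableMap<String,
                    NavigableMap<String, NavigableMap<Long, String>>>> rows = new TreeMap<>();

    // Store one version of a cell; versions are ordered by timestamp, newest first.
    void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(ts, value);
    }

    // Return the newest version of a cell, or null if the cell is absent.
    String getLatest(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Row keys are kept in sorted order, which is what makes range scans cheap.
    NavigableSet<String> rowKeys() {
        return rows.navigableKeySet();
    }

    public static void main(String[] args) {
        LogicalTableSketch t = new LogicalTableSketch();
        t.put("row2", "colfamily", "columnqual", 1L, "old");
        t.put("row2", "colfamily", "columnqual", 2L, "new");
        t.put("row1", "colfamily", "other", 1L, "x");
        System.out.println(t.getLatest("row2", "colfamily", "columnqual")); // new
        System.out.println(t.rowKeys()); // [row1, row2]
    }
}
```

Because rows stay sorted by key, reading a contiguous range of rows is cheap, which is one reason row key design (slide 19) matters so much.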
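
Slide 13 mentions the REST interface for HTTP access. As a sketch, assuming a REST server reachable at a hypothetical `hbase-rest.example.com` on port 8080 (the REST server's default), a single cell can be requested with a `GET` on `/<table>/<row>/<family>:<qualifier>`; asking for `application/octet-stream` returns the raw cell bytes rather than an XML or JSON envelope. The code only builds the request with Java 11's `java.net.http` API and does not send it; the helper name is hypothetical, and row keys containing `/` would need extra escaping that it does not do.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpRequest;

// Sketch: build (but do not send) a request for one cell from the HBase REST server.
// Assumptions: hypothetical host, default REST port 8080, no '/' inside row keys or qualifiers.
final class RestCellRequestSketch {

    static HttpRequest cellRequest(String hostPort, String table, String row,
                                   String family, String qualifier) {
        String path = "/" + table + "/" + row + "/" + family + ":" + qualifier;
        try {
            // The multi-argument URI constructor percent-encodes illegal characters such as spaces.
            URI uri = new URI("http", hostPort, path, null, null);
            return HttpRequest.newBuilder(uri)
                    // Ask for the raw cell bytes instead of an XML/JSON envelope.
                    .header("Accept", "application/octet-stream")
                    .GET()
                    .build();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for path " + path, e);
        }
    }

    public static void main(String[] args) {
        HttpRequest req = cellRequest("hbase-rest.example.com:8080",
                "hbasetable", "rowkey", "colfamily", "columnqual");
        System.out.println(req.method() + " " + req.uri());
        // Sending it would use java.net.http.HttpClient, e.g.
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofByteArray())
    }
}
```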
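
The Gets slide warns that the client must know the type of data in a cell. HBase stores cell values as untyped byte arrays: `Bytes.toBytes(42.0d)` writes the 8-byte big-endian IEEE 754 bit pattern of the double, and `Bytes.toDouble` reverses it. The sketch below reproduces that layout with `java.nio.ByteBuffer` (big-endian by default) so it runs without HBase on the classpath; the class and method names are hypothetical. Decoding the same bytes as a `long` succeeds but yields a meaningless number, which is the failure the slide warns about.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: cells are untyped byte arrays, so the reader must decode with the writer's type.
// ByteBuffer (big-endian by default) stands in for HBase's Bytes utility here.
final class CellEncodingSketch {

    // 8-byte big-endian IEEE 754 bit pattern, the same layout Bytes.toBytes(double) produces.
    static byte[] encodeDouble(double d) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(d).array();
    }

    // Inverse of encodeDouble, analogous to Bytes.toDouble(byte[]).
    static double decodeDouble(byte[] cell) {
        return ByteBuffer.wrap(cell).getDouble();
    }

    // Decoding the same bytes as a long "works" but yields the raw bit pattern, not 42.
    static long decodeAsLong(byte[] cell) {
        return ByteBuffer.wrap(cell).getLong();
    }

    public static void main(String[] args) {
        byte[] cell = encodeDouble(42.0d);
        System.out.println(cell.length);        // 8
        System.out.println(decodeDouble(cell)); // 42.0
        System.out.println(decodeAsLong(cell)); // raw bits, not 42
        // Writing the string "42.0" instead would store 4 UTF-8 bytes: a different cell value.
        System.out.println("42.0".getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```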
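
Slide 19 says row keys often combine multiple pieces of data and should not simply be auto-incrementing numbers; monotonically increasing keys send every new write to the same region. As a hypothetical example (not from the talk), a per-user event table could use a 16-byte key made of the user ID followed by `Long.MAX_VALUE - eventTimeMillis`. HBase sorts row keys by unsigned lexicographic byte comparison, emulated here with `Arrays.compareUnsigned` (Java 9+), so each user's events are stored together, newest first, and a prefix scan on the user ID reads them in one pass. The sketch assumes non-negative IDs and timestamps, for which big-endian byte order matches numeric order.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a hypothetical composite row key for per-user events:
// [userId: 8 bytes][Long.MAX_VALUE - eventTimeMillis: 8 bytes]
// Arrays.compareUnsigned emulates HBase's unsigned lexicographic row key ordering.
final class RowKeySketch {

    static byte[] eventKey(long userId, long eventTimeMillis) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(userId)                           // groups all of a user's rows together
                .putLong(Long.MAX_VALUE - eventTimeMillis) // reversed time: newest event sorts first
                .array();
    }

    // The 8-byte prefix a scan would use to read every event for one user.
    static byte[] userPrefix(long userId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(userId).array();
    }

    static boolean hasPrefix(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>(List.of(
                eventKey(7L, 1_000L), eventKey(7L, 3_000L),
                eventKey(3L, 2_000L), eventKey(7L, 2_000L)));
        keys.sort(Arrays::compareUnsigned); // the order HBase would store these rows in
        for (byte[] k : keys) {
            ByteBuffer b = ByteBuffer.wrap(k);
            long user = b.getLong();
            long time = Long.MAX_VALUE - b.getLong();
            System.out.println("user=" + user + " time=" + time
                    + " user7Prefix=" + hasPrefix(k, userPrefix(7L)));
        }
    }
}
```

Running `main` prints user 3's event first, then user 7's events from newest (time 3000) to oldest (time 1000), all three matching the user 7 prefix.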
