HBaseCon 2015: Just the Basics

  1. HBase - Just The Basics
     Copyright © 2015 Smoking Hand LLC. All rights Reserved
  2. What Is Apache HBase?
     - NoSQL datastore built on top of the HDFS filesystem
     - HBase is a column family oriented database
     - Based on the Google BigTable paper
     - Uses HDFS for storage
     - Data can be retrieved quickly or batch processed with MapReduce
  3. HBase Use Cases
     - Need Big Data (TB/PB)
     - High throughput
     - Variable columns
     - Need random reads and writes
  4. HBase Architecture
  5. HBase Daemons
  6. NoSQL Table Architecture
  7. Column Families
  8. NoSQL Data
  9. Regions
  10. Write Path
  11. Read Path
  12. HBase API
  13. Accessing HBase
      - HBase has a Java API
        - It is the only first-class citizen
      - There are other programmatic interfaces
        - A REST interface allows HTTP access
        - A Thrift gateway allows non-Java programmatic access
      - There are non-native SQL interfaces
        - Apache Phoenix, Impala, Presto, Hive
  14. Puts

          // Fragment from a method of a class that provides getConf()/setConf().
          // Uses org.apache.hadoop.hbase.{HBaseConfiguration, TableName},
          // org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Put, Table}
          // and org.apache.hadoop.hbase.util.Bytes.
          setConf(HBaseConfiguration.create(getConf()));
          Connection connection = null;
          Table table = null;
          try {
              // Define the table and column family for the data
              TableName TABLE_NAME = TableName.valueOf("hbasetable");
              byte[] CF = Bytes.toBytes("colfamily");

              // Connect to the table
              connection = ConnectionFactory.createConnection(getConf());
              table = connection.getTable(TABLE_NAME);

              // Create a put and add columns to it
              Put p = new Put(Bytes.toBytes("rowkey"));
              p.addColumn(CF, Bytes.toBytes("columnqual"), Bytes.toBytes(42.0d));

              // Add the new column to the row
              table.put(p);
          } finally {
              // Close everything down
              if (table != null) table.close();
              if (connection != null) connection.close();
          }
  15. Gets

          // connection and table are declared and closed as on the Puts slide.
          // Also uses org.apache.hadoop.hbase.client.{Get, Result}.

          // Define the table and column family for the data
          TableName TABLE_NAME = TableName.valueOf("hbasetable");
          byte[] CF = Bytes.toBytes("colfamily");

          // Connect to the table
          connection = ConnectionFactory.createConnection(getConf());
          table = connection.getTable(TABLE_NAME);

          // Create a get with the row key you want
          Get g = new Get(Bytes.toBytes("rowkey"));

          // Get the row and the bytes for the cell
          Result result = table.get(g);
          byte[] value = result.getValue(CF, Bytes.toBytes("columnqual"));

          // Yes, your client will need to know the type of data in the cell
          double doubleValue = Bytes.toDouble(value);
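
The comment on the Gets slide about the client needing to know the cell's type can be illustrated without a cluster. The sketch below is a plain-JDK stand-in and does not call HBase: the class `CellBytes` and its helpers are hypothetical names, and `ByteBuffer` is assumed to be an adequate substitute for the `Bytes` utility for this purpose. It shows that the same eight bytes decode to very different values depending on which type the reader assumes.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class: a stand-in for illustration, not part of HBase.
class CellBytes {
    // Encode a double as 8 bytes in big-endian order (ByteBuffer's default).
    static byte[] encodeDouble(double d) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(d).array();
    }

    // Interpret 8 bytes as a double.
    static double decodeDouble(byte[] b) {
        return ByteBuffer.wrap(b).getDouble();
    }

    // Interpret the same 8 bytes as a long.
    static long decodeLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] cell = encodeDouble(42.0d);
        // The cell carries no type tag, so both reads succeed;
        // only the first one recovers the value that was written.
        System.out.println("as double: " + decodeDouble(cell));
        System.out.println("as long:   " + decodeLong(cell));
    }
}
```

Nothing in the byte array records that it was written as a double, so the writer and every reader have to agree on the encoding out of band, for example in shared schema documentation or a common serialization layer.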
  16. Architecting HBase Solutions
  17. Differences With RDBMS
      - Architecting for an RDBMS is about relationships, or normalizing data
      - Architecting for HBase is about access patterns, or denormalizing data
      - Questions to ask:
        - How is data being accessed?
        - What is the fastest way to read or write data?
        - What is the optimal way to organize data?
  18. Abject Failure
      - Treating HBase like a relational database will lead to abject failure
  19. Row Keys
      - Actual engineering goes into row key design
      - You only have one index or primary key
      - Getting this primary key right takes time and effort
      - You don't just use an auto-incrementing number
      - Multiple pieces of data are often in the row key
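
As a rough illustration of putting multiple pieces of data in the row key, the sketch below builds a composite key in plain Java. The layout (a user id, a zero-byte separator, then `Long.MAX_VALUE` minus the timestamp), the class name `RowKeySketch`, and the sample ids are all assumptions made for this example and do not come from the slides. It relies on the fact that HBase orders rows by comparing key bytes lexicographically, which `Arrays.compareUnsigned` (Java 9+) mimics here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical row key layout for illustration only.
class RowKeySketch {
    // Key = <userId bytes> <0x00 separator> <Long.MAX_VALUE - epochMillis>.
    // Reversing the timestamp makes newer events sort before older ones
    // for the same user.
    static byte[] rowKey(String userId, long epochMillis) {
        byte[] id = userId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + 1 + Long.BYTES)
                .put(id)
                .put((byte) 0)
                .putLong(Long.MAX_VALUE - epochMillis)
                .array();
    }

    public static void main(String[] args) {
        byte[] older = rowKey("user42", 1_000L);
        byte[] newer = rowKey("user42", 2_000L);
        byte[] other = rowKey("user43", 1_000L);

        // Unsigned lexicographic comparison, the same ordering rule
        // applied to row keys.
        System.out.println("newer sorts before older: "
                + (Arrays.compareUnsigned(newer, older) < 0));
        System.out.println("user42 sorts before user43: "
                + (Arrays.compareUnsigned(older, other) < 0));
    }
}
```

With a layout like this, one user's rows sit next to each other and the most recent event comes first, so a "latest activity for a user" access pattern becomes a short scan. A different access pattern would call for a different key, which is why the key has to be designed around how the data is read.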
  20. Schema Design
      - Table schemas require design and thought
      - The access pattern should be known ahead of time
      - General best practices:
        - Fewer, bigger (denormalized) tables
        - Spend more time designing up front
        - Use bulk loading for incremental or time series data
  21. Conclusion
      - Jesse Anderson (Smoking Hand)
      - jesse@smokinghand.com
      - @jessetanderson